Blueprint & Other Guidelines for Paper Setters

“My Lord, increase me in knowledge.” (Holy Quran 20:114)

Learning Objectives

  • Explain what a blueprint is
  • Explain the purpose of a blueprint
  • Describe how a blueprint is made

Introduction

Students often come out of an exam hall complaining that:

  • Unimportant questions were given
  • Questions came from outside the syllabus
  • Too many questions came from a particular topic/chapter/unit
  • Not enough time to complete the paper; paper was too lengthy
  • There was no question from such an important unit/chapter
  • Most of the questions were too difficult
  • Paper was too easy, a piece of cake
  • Questions were too vague
  • This was not taught to us

The Blueprint

Written examinations are the most commonly employed method to assess knowledge in medical education and are used to test recall abilities as well as higher-order cognitive functions, such as the interpretation of data and problem-solving skills.

Blueprinting can be defined as the creation of a template to determine the content of a test; it is a detailed plan of action for a paper.

A blueprint is a two-way matrix that ensures that all aspects of the curriculum and educational domains are covered by assessment programs over a specified period of time. It is a chart that shows the placement of each question with respect to the objective and the content area that it tests.

It should be drawn up by paper setters for any formative or summative test, including MCQs, SEQs, OSPEs, structured vivas, etc. It should be drawn up for a small class test, term exam, or send-up as well as for university examinations.

It lists the number and type of questions across the course content, with relative weightage given to each topic/chapter/unit (depending on the examination). The weightage is determined by the number of learning objectives and their relative importance.

There may be subtle or huge differences between what is taught and emphasized in lectures, what is mentioned in the syllabus or textbooks, and what is assessed. All this has to be systematically organized.

Blueprinting is increasingly used in the field of medical education worldwide; the assessments are prepared in such a way that those students who have not met important learning outcomes are not able to graduate.

Content validity gauges the extent to which an assessment covers a representative sample of the material which should be assessed; for example, if examination questions cover the learning objectives of the syllabus, the examination is considered to have content validity. Other objectives of blueprinting are given below.

Objectives/Purpose of a Blueprint

Using an examination blueprint:

  • Increases the validity and reliability of the paper
  • Guides paper construction
  • Ensures proper weighting of marks for important topics
  • Aligns questions with learning objectives
  • Distributes questions according to clinical importance
  • Minimizes inter-examiner variation in selecting questions
  • Ensures uniform distribution of questions across the syllabus
  • Prevents over- or under-sampling of questions from a single topic
  • Ensures papers are well organized and test in-depth subject knowledge
  • Ensures an appropriate overall difficulty level for an average student
  • Ensures sufficient time is given for attempting the paper
  • Helps students focus on key areas of clinical/public health importance
  • Improves examination results

Criteria & Significance of a Good Assessment

Good Validity

  • Validity is a measure of the degree to which the assessment actually reflects the qualities in the candidate that it is intended to measure
  • Content Validity: Questions should be congruent with learning objectives of the course and their relative importance
  • Validity is the strength of the conclusions, inferences or propositions we draw from the assessment.

High Reliability

  • Reliability is the degree to which the result of a measurement, calculation, or specification can be depended on to be accurate; it is the degree of consistency of a measure. A test is reliable when it gives the same result repeatedly under the same conditions.
  • Individual marks would not change appreciably over many more assessments.

Reproducibility

  • This means that if a student were given a similar examination repeatedly (incorporating many different cases, many different written questions, etc.) he or she would come up with a similar mark on each occasion.
  • This constitutes the proof that the assessment is accurately reflecting the candidate’s actual ability, and not just their good luck or bad luck in meeting a particular group of questions, patients (in a clinical exam), or examiners.

Generalizability

  • The marks given for the sample of cases or questions really do represent the marks that would be obtained if a very much wider range of questions were to have been attempted.

Uniformity & Consistency

  • There should be planned and organized distribution of questions from the syllabus. Proper weightage should be given to each area
  • It also implies that papers should be of similar difficulty each year. It should not be that there is a difficult paper one year and an easy paper the next year

Feasibility

  • The state or degree of being easily or conveniently done
  • Questions should be at level of the students (undergraduate etc.)
  • Questions should be from the prescribed syllabus
  • Questions should be clearly written and unambiguous
  • An average student should be able to answer the questions in the allotted time

Acceptability

  • Both candidates and the examiners find the purposes and format of the assessment reasonable and acceptable
  • Item indices (Difficulty Index, Discrimination Index, etc.) should be determined after the paper for MCQs and SEQs

Objectivity

  • Should fulfill the learning objectives
  • Not influenced by personal feelings, interpretations, or prejudice; based on facts; unbiased

Transparency, Fairness and Accountability

  • Unfair and irregular practices in the exam system can impede the progress of worthy candidates
  • Right to information on the qualifications and experience of the paper setter(s)
  • A common request for information is to ask to see the marks given by examiners on the answer papers

https://www.thedailystar.net/opinion/perspective/making-public-exams-transparent-1519834

Making of the Blueprint

  • Learning objectives and the syllabus, especially the topics of the exam or test, are used to make a blueprint
  • Examination content should match the syllabus
  • Learning objectives of each topic should be well designed and preferably displayed beforehand
  • Learning objectives are designed according to the university syllabus, recommended textbooks, past papers, and other information relevant to undergraduate students
  • Learning objectives for each topic are scored according to quantity, clinical importance (impact on health and frequency/prevalence in society), and the time devoted to them
  • The relevant units and/or topics are then given weightage, i.e. the number of questions (SEQs, MCQs) allotted to them, based on the above information
  • Individual weightage for each unit, chapter, or topic is determined from the indicators of a blueprint

Indicators of a Blueprint

  1. Impact

Relative Impact Score:

  • Critical
  • Essential
  • Important
  • Need to Learn
  • Nice to learn
  • Trivial

The impact score is determined by the following factors:

  • Number of learning objectives
  • Number of lectures/time devoted to the area
  • Number of Pages
  • Incidence & prevalence in the society
  • (These factors are why the number of questions from CNS in the UHS Pharmacology paper has to be increased)

The final impact score ranges from 1 – 3, with a higher score implying greater weightage:

  • Impact score of 3 implies ‘Critical’ and ‘Essential’
  • Impact score of 2 implies ‘Important’ and ‘Need to Learn’
  • Impact score of 1 implies ‘Nice to learn’ and ‘Trivial’

  2. Frequency

This tells us about how frequently the topic/learning objective/question has been asked in previous formative and summative examinations.

It also ranges from 1 – 3.

  • Frequency score 1: the topic has been asked infrequently
  • Frequency score 2: the topic has been asked with moderate frequency
  • Frequency score 3: the topic has been asked frequently

Weightage of Each Content

The following steps are used to decide the weightage of each content area:

  1. Calculate I × F, i.e. the impact of the topic × the frequency of asking questions from that topic
  2. Calculate the sum of all I × F values; this sum is labeled “T”
  3. Calculate the weightage coefficient (W) as (I × F) / T
  4. Multiply the weightage coefficient (W) by the total number of items
  5. Calculate the adjusted weightage of each content area as per the total marks
  6. All this can be displayed in a table in MS Excel or MS Word
  7. The same can be displayed for MCQs, SEQs, etc.
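The calculation above can be sketched in a few lines of Python. The topic names, scores, and item count below are hypothetical, and the scores are oriented so that a higher I × F product earns a topic more questions:

```python
# Hypothetical topics with (Impact, Frequency) scores on a 1-3 scale,
# where higher values mean greater emphasis. Purely illustrative.
topics = {
    "CNS":          (3, 3),
    "ANS":          (2, 2),
    "Chemotherapy": (3, 2),
    "GIT":          (1, 1),
}
total_items = 40  # e.g. 40 MCQs on the paper (assumed)

# Steps 1-2: I x F for each topic, and their sum T
products = {t: i * f for t, (i, f) in topics.items()}
T = sum(products.values())  # 9 + 4 + 6 + 1 = 20

# Steps 3-4: weightage coefficient W = (I x F) / T, scaled by item count
allocation = {t: round(p / T * total_items) for t, p in products.items()}
print(allocation)  # {'CNS': 18, 'ANS': 8, 'Chemotherapy': 12, 'GIT': 2}
```

Step 5 then adjusts these counts to the marks distribution, and steps 6 – 7 lay the result out in a table.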

The difficulty level of each question should also be displayed, i.e. C1, C2, or C3 level (example ahead).

C1 (Recall), C2 (Understanding) and C3 (Analysis)

Levels of Cognition

C1. Recall. The student remembers or memorizes.

e.g. Enumerate 5 causes of fever.

This is the lowest level of knowledge.

C2. Understand. The student describes or explains.

e.g. Explain the findings of an ECG or a chest X-ray.

C3. Analysis. The student uses information gained from different sources to reach a diagnosis.

e.g. considering the history, examination, blood gas levels, and CT findings. Scenario based.

This is the level we should aim for in teaching and in assessment.

For a long time there were three levels of cognition, C1 – C3. In the revised Bloom’s taxonomy there are six levels.

The highest level is ‘Creation’: the student is asked to use knowledge to come up with his/her own plan or solution to a problem.

e.g. Design a management plan for a 30-year-old lady who has presented with high-grade fever 6 days after a delivery at home.

Displaying the cognitive level of each question is necessary so that, at the end, one can see whether the levels are appropriately distributed, i.e. not all questions should be at C1 level or at C3 level.

For early formative exams, like class tests, 50% of questions should be of C1 level, 40% of C2, and 10% of C3 level.

The percentages may be altered, e.g. by increasing the ratio of C3-level questions later in the session for term, send-up, and professional exams.
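As a quick sketch, such a split can be turned into question counts per level; the 20-item test size below is an assumption:

```python
# Split a paper's question count across cognitive levels using the
# 50/40/10 ratio suggested above for an early formative test.
def level_counts(total_questions, ratios):
    return {level: round(total_questions * r) for level, r in ratios.items()}

early_test = level_counts(20, {"C1": 0.50, "C2": 0.40, "C3": 0.10})
print(early_test)  # {'C1': 10, 'C2': 8, 'C3': 2}
```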

Cognitive competence goes far deeper than merely remembering facts. Exercise of these higher levels of cognition should be actively promoted and explicitly assessed.

Marks allotted to each question or part of the question pertaining to the topic or learning objective should be mentioned in a separate column.

Different segments of the content should be assessed in MCQs and SEQs; this is also beneficial in avoiding repetition.

In the last column, the time allotted for each question should be given, i.e. the approximate time in which we realistically expect an average student to answer the question.

It is also recommended that blueprints be prepared by different subject experts each time and be peer reviewed.

An example of a blueprint is given below:

Time Factor

As one can see in the above table, the expected time for answering each SEQ is given. This can be assessed by estimating the time required by a student to answer the SEQ (one can ask a demonstrator to attempt it). The same can be done with MCQs: on average 1 minute each; some should be answerable in 15 – 30 seconds, some in 30 seconds – 1 minute, while C3-level MCQs may take 1.5 minutes.
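Using this per-item rule of thumb, the total time for a paper can be estimated before it is set; the 45-item mix below is hypothetical:

```python
# Estimate total paper time (in minutes) from a hypothetical MCQ mix.
mcq_mix = [
    (10, 0.5),  # 10 quick-recall items at ~30 seconds each
    (30, 1.0),  # 30 standard items at ~1 minute each
    (5,  1.5),  # 5 C3-level scenario items at ~1.5 minutes each
]
total_minutes = sum(count * minutes for count, minutes in mcq_mix)
print(total_minutes)  # 42.5
```

A result far beyond the allotted time signals that the paper is too lengthy for an average student.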

Conclusion

In conclusion, constructing a test blueprint includes the following steps:

  1. Identify the content areas of the subject matter (sections) to be measured by the exam
  2. Identify the learning outcomes (domains) across each section to be measured by the exam
  3. Weight the sections and domains in terms of their relative importance
  4. Construct a spreadsheet by distributing the test items in accordance with the relative weights
  5. Annotate the relevant cells of the spreadsheet with their level of cognition
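These steps combine into the two-way matrix itself. The sketch below crosses hypothetical per-topic item counts with an assumed 50/40/10 cognitive-level split; all figures are illustrative:

```python
# Hypothetical blueprint matrix: rows are content areas, columns are
# cognitive levels. Topic item counts and the level split are assumptions.
topic_items = {"CNS": 18, "ANS": 8, "Chemotherapy": 12, "GIT": 2}
level_split = {"C1": 0.5, "C2": 0.4, "C3": 0.1}

blueprint = {
    topic: {level: round(n * r) for level, r in level_split.items()}
    for topic, n in topic_items.items()
}

for topic, row in blueprint.items():
    print(topic, row)
```

Each cell then holds the number of items for that topic at that cognitive level, which is exactly the sampling plan the blueprint is meant to make explicit.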

The question paper should be fairly distributed over the whole syllabus prescribed for the course during the academic semester. No question or part thereof should be out of the prescribed syllabus. Repetition of questions must be avoided.

The blueprint makes the assessment clear, explicit, and transparent to everyone involved in the process of learning. It makes assessment ‘fair’ to the students, as they can have a clear idea of what is being examined and can direct their learning efforts in that direction. Blueprints arising from these detailed specifications form an exact sampling plan for the content domain to be tested.

Other Examination Guidelines

When asking questions, action verbs from Bloom’s Taxonomy should be used. Action verbs used in learning objectives should also be used in question papers, like ‘enlist, explain, tabulate, describe, compare, enumerate, classify’, etc.

  • Learning objectives should be SMART (Specific, Measurable, Attainable, Relevant and Time-focused)

Use following verbs for the given cognitive levels:

C1: Enlist, Enumerate, Classify, Define, Describe, Name, Label

C2: Explain, Rationalize, Tabulate, Differentiate, Calculate

C3: Compare and Contrast, Justify, Discriminate

  • Avoid using ‘What’, ‘write down’, or ‘give’, etc. in questions; they are ambiguous. Students and examiners will not know whether one is being asked to define, enlist, describe, explain, etc.
  • Include the mark allocation for each question and parts of a question, with a more detailed breakdown where necessary
  • Proofread the text once again oneself and then pass the paper on to the reviser, preferably a senior colleague, for the final proofreading.
  • Pass the finalized draft of the paper to an external reviser, who has to proofread the text again, ensure that no test item is out of syllabus, and check that all set tasks are workable and that the paper can be completed in the set time.
  • One should be able to give satisfactory answers to the following questions:

Window Dressing

‘superficial or misleading presentation of something, designed to create a favourable impression.’

Verbosity

‘the fact or quality of using more words than needed’

The above picture shows questions that are NOT scenario based! They are simple questions that have undergone window dressing/verbosity. There is no link between the scenario and the lead question (underlined in red).

These lead questions could have been asked separately WITHOUT the preceding window dressing.

A true scenario-based question has to have a link between the stem and the lead question, whether for an MCQ or an SEQ.

Examples of scenario-based questions are given below:

Also read this blog on flaws in designing MCQs:

With great acknowledgments and thanks to my mentor in Medical Education Dr. Fahad (Dental Surgeon, Medical Educationist)
