
HMEF5053 Measurement and Evaluation in Education_vDec19

Published by nur adila, 2020-08-12 20:42:13


34  TOPIC 2 FOUNDATION OF ASSESSMENT: WHAT TO ASSESS?

Krathwohl, Bloom and Bertram (1973) developed the affective domain, which
deals with things related to emotion, such as feelings, values, appreciation,
enthusiasm, motivation and attitudes. The five major categories are listed from
the simplest behaviour to the most complex: receiving, responding, valuing,
organisation and characterisation (refer to Figure 2.5).

Figure 2.5: Krathwohl, Bloom and Bertram's taxonomy of affective learning outcomes
Source: Krathwohl et al. (1973)

These categories are further explained as follows:

(a) Receiving (A1)
The behaviours at the receiving level require the student to be aware, willing
to hear and focused or attentive. Verbs describing behaviours at the receiving
level include ask, listen, choose, describe, follow, give, hold, locate, name,
point to, select, reply and so forth. For example, the student:
(i) Listens to others with respect; and
(ii) Listens and remembers the names of other students.

(b) Responding (A2)
The behaviours at the responding level require the student to be an active
participant, attend and react to a particular phenomenon, willing to respond
and gain satisfaction in responding (motivation). Verbs describing
behaviours at the responding level include answer, assist, aid, comply,
conform, discuss, greet, help, label, perform, practise, present, read, recite,
report, select, tell, write and so forth. For example, the student:
(i) Participates in class discussion;
(ii) Gives a presentation; and
(iii) Questions new ideas, concepts or models in order to fully understand
them.

Copyright © Open University Malaysia (OUM)


(c) Valuing (A3)
This level relates to the worth or value a person attaches to a particular object,
phenomenon or behaviour. This ranges from simple acceptance to the more
complex state of commitment. Valuing is based on the internalisation of a set
of specified values, while clues to these values are expressed in the student
as overt behaviour and are often identifiable. Verbs describing behaviours at
the valuing level include demonstrate, differentiate, follow, form, initiate,
invite, join, justify, propose, read, report, select, share, study, work and so
forth. For example, the student:

(i) Demonstrates belief in the democratic process;

(ii) Is sensitive towards individual and cultural differences (values
diversity);

(iii) Shows the ability to solve problems;

(iv) Proposes a plan for social improvement; and

(v) Follows through with commitment.

(d) Organisation (A4)
At this level, people organise values into priorities by contrasting different
values, resolving conflicts between them and creating a unique value
system. The emphasis is on comparing, relating and synthesising
values. Verbs describing behaviours at the level of organisation are adhere,
alter, arrange, combine, compare, complete, defend, explain, formulate,
generalise, identify, integrate, modify, order, organise, prepare, relate,
synthesise and so forth. For example, the student:

(i) Recognises the need for balance between freedom and responsible
behaviour;

(ii) Accepts responsibility for his or her behaviour;

(iii) Explains the role of systematic planning in solving problems;

(iv) Accepts professional ethical standards;

(v) Creates a life plan in harmony with abilities, interests and beliefs; and

(vi) Prioritises time effectively to meet the needs of the organisation, family
and self.


(e) Characterisation (A5)
At this level, a person's value system controls his or her behaviour. The
behaviour is pervasive, consistent, predictable and most importantly,
characteristic of the student. Verbs describing behaviours at this level
include act, discriminate, display, influence, listen, modify, perform,
practise, propose, qualify, question, revise, serve, solve and verify. For
example, the student:

(i) Shows self-reliance when working independently;

(ii) Cooperates in group activities (displays teamwork);

(iii) Uses an objective approach in problem solving;

(iv) Displays a professional commitment to ethical practice on a daily basis;

(v) Revises judgement and changes behaviour in light of new evidence;
and

(vi) Values people for what they are and not how they look.
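The verb lists given for each level can be turned into a simple lookup. The sketch below is illustrative only (the labels A1 to A5 and the abbreviated verb lists come from this section; the function name is my own), and note that some verbs, such as "listen", legitimately appear at more than one level:

```python
# Illustrative sketch: mapping observable verbs to affective levels A1-A5,
# using (abbreviated) verb lists from this section. Not an official taxonomy API.
AFFECTIVE_VERBS = {
    "A1 Receiving": {"ask", "listen", "choose", "follow", "name", "select"},
    "A2 Responding": {"answer", "assist", "discuss", "present", "recite", "report"},
    "A3 Valuing": {"demonstrate", "justify", "propose", "share", "join"},
    "A4 Organisation": {"arrange", "compare", "integrate", "synthesise", "defend"},
    "A5 Characterisation": {"act", "influence", "listen", "revise", "serve", "verify"},
}

def affective_levels(verb):
    """Return every affective level whose verb list contains the given verb."""
    v = verb.lower()
    return [level for level, verbs in AFFECTIVE_VERBS.items() if v in verbs]

print(affective_levels("justify"))  # ['A3 Valuing']
print(affective_levels("listen"))   # matches both A1 and A5
```

Because the verb lists overlap, a verb alone cannot settle the level; the surrounding behaviour and context must decide.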

Table 2.3 shows how the affective taxonomy may be applied to a value such as
honesty. It traces the development of an affective attribute such as honesty from
the "receiving" level to the "characterisation" level, where the value becomes a
part of the individual's character.

Table 2.3: Affective Taxonomy for Honesty

Receiving (attending): Aware that certain things are honest or dishonest
Responding: Saying honesty is better and behaving accordingly
Valuing: Consistently (but not always) telling the truth
Organisation: Being honest in a variety of situations
Characterisation by a value or value complex: Honest in most situations, expects
others to be honest and interacts with others fully and honestly


SELF-CHECK 2.2

1. Explain the differences between characterisation and valuing
according to the affective taxonomy of learning outcomes.

2. "A student is operating at the responding level." What does this
mean?

ACTIVITY 2.3

The Role of Affect in Education
"Some say schools should be concerned only with content."
"It is impossible to teach content without also teaching affect."
"To what extent, if at all, should we be concerned with the assessment of
affective learning outcomes?"
In the myINSPIRE online forum, discuss the three statements in the
context of the Malaysian education system.

2.4 ASSESSING PSYCHOMOTOR LEARNING
OUTCOMES OR BEHAVIOUR

The psychomotor domain includes physical movement, coordination and use of
motor-skill areas. Development of these skills requires practice and is measured in
terms of speed, precision, distance, procedures and techniques in execution. There
are seven major categories listed in this domain from the simplest to the most
complex behaviour as shown in Figure 2.6.


Figure 2.6: Taxonomy of psychomotor learning outcomes
These learning outcomes are further explained as follows:
(a) Perception (P1)

This is the ability to use sensory cues to guide motor activity. It ranges from
sensory stimulation and cue selection to translation. Verbs describing these
types of behaviours include choose, describe, detect, differentiate,
distinguish, identify, isolate, relate, select and so forth. For example, the
student:
(i) Detects non-verbal communication cues from the coach;
(ii) Estimates where a ball will land after it is thrown and then moves to the
correct location to catch the ball;
(iii) Adjusts the heat of the stove to the correct temperature by the smell and
taste of food; and
(iv) Adjusts the height of the ladder in relation to the point on the wall.
(b) Set (P2)
This includes mental, physical and emotional sets. These three sets are
dispositions that predetermine a person's response to different situations
(sometimes called mindsets). Verbs describing "set" include begin, display,
explain, move, proceed, react, show, state and volunteer. For example, the
student:
(i) Knows and acts upon a sequence of steps in a manufacturing process;
(ii) Recognises his or her abilities and limitations; and
(iii) Shows the desire to learn a new process (motivation).
Note: This subdivision of the psychomotor domain is closely related to the
"responding" subdivision of the affective domain.


(c) Guided Response (P3)
These are the early stages in learning a complex skill, which include imitation
and trial and error. Adequacy of performance is achieved through practice. Verbs
describing "guided response" include copy, trace, follow, react, reproduce
and respond. For example, the student:

(i) Performs a mathematical equation as demonstrated;

(ii) Follows instructions when building a model of a kampung house; and

(iii) Responds to the hand signals of the coach while learning gymnastics.

(d) Mechanism (P4)
This is the intermediate stage in learning a complex skill. Learned responses
have become habitual and the movements can be performed with some
confidence and proficiency. Verbs describing "mechanism" include
assemble, calibrate, construct, dismantle, display, fasten, fix, grind, heat,
manipulate, measure, mend, mix and organise. For example, the student:

(i) Uses a computer;

(ii) Repairs a leaking tap;

(iii) Fixes a three-pin electrical plug; and

(iv) Rides a motorbike.

(e) Complex Overt Response (P5)
The skilful performance of motor acts that involve complex movement
patterns. Proficiency is indicated by a quick, accurate and highly coordinated
performance, requiring a minimum of energy. This category includes
performing without hesitation and automatic performance. For example,
players often utter sounds of satisfaction or expletives as soon as they hit a
tennis ball (like world famous tennis players Maria Sharapova and Serena
Williams) or a golf ball (golfers will immediately know they have hit a bad
shot!) because they can tell by the feel of the act and what the result will be.
Verbs describing "complex overt responses" include assemble, build,
calibrate, construct, dismantle, display, fasten, fix, grind, heat, manipulate,
measure, mend, mix, organise and sketch. For example, the student:

(i) Manoeuvres a car into a tight parallel parking spot;

(ii) Operates a computer quickly and accurately; and

(iii) Displays competence while playing the piano.


Note: Many of the verbs are the same as for "mechanism", but adverbs or
adjectives indicate that the performance is quicker, better and more
accurate.
(f) Adaptation (P6)
Skills are well developed and the individual can modify movement patterns
to fit special requirements. Verbs describing "adaptation" include adapt,
alter, change, rearrange, reorganise, revise and vary. For example, the
student:
(i) Responds effectively to unexpected experiences;
(ii) Modifies instruction to meet the needs of the students; and
(iii) Performs a task with a machine that it was not originally intended to do
(the machine is not damaged and there is no danger in performing the
new task).
(g) Origination (P7)
Creating a new movement or pattern to fit a particular situation or specific
problem. Learning outcomes emphasise creativity based on highly
developed skills. Verbs describing "origination" include arrange, build,
combine, compose, construct, create, design, initiate, make and originate.
For example, the student:
(i) Constructs a new theory;
(ii) Develops a new technique for goalkeeping; and
(iii) Creates a new gymnastic routine.
Table 2.4 shows how the psychomotor taxonomy may be applied to kicking a
football. It traces the development of the psychomotor skill of kicking a football
from the "perception" level until the "origination" level.


Table 2.4: Psychomotor Taxonomy for Kicking a Football

Perception: Able to estimate where the ball would land after it was kicked
Set: Shows the desire to learn and perform a kicking technique
Guided response: Able to kick the ball under guidance, through trial and error
or imitation
Mechanism: Able to kick the ball mechanically with some confidence and
proficiency
Complex overt response: Able to kick the ball skilfully using a proper technique
learnt
Adaptation: Able to modify the kicking technique to suit different situations
Origination: Able to create a new kicking technique

SELF-CHECK 2.3

1. Explain the differences between adaptation and guided response
according to the taxonomy of psychomotor learning outcomes.

2. "A student is operating at the origination level." What does this
mean?

2.5 IMPORTANT TRENDS IN WHAT TO
ASSESS

Since the influence of testing on curriculum and instruction is now widely
acknowledged, educators, policymakers and others are turning to alternative
assessment methods as a tool for educational reform. The call is to move away
from traditional objective and essay tests towards alternative assessments focusing
on authentic assessment and performance assessment (we will discuss these
assessment methods in Topics 5 and 6). Various techniques have been proposed to
assess learners more holistically, focusing on both the product and process of
learning (refer to Figure 2.7).


Figure 2.7: Trends in what to assess
Source: Dietel, Herman and Knuth (1991)

 Assessment of cognitive outcomes has remained the focus of most assessment
systems all over the world because it is relatively easier to observe and
measure.

 Each domain of learning consists of subdivisions, starting from the simplest
behaviours to the most complex, thus forming a taxonomy of learning
outcomes.

 When we evaluate or assess a human being, we are assessing or evaluating the
behaviour of a person.

 Every subject area has its unique repertoire of facts, concepts, principles,
generalisations, theories, laws, procedures and methods to be transmitted to
learners.


 There are six levels in Bloom's taxonomy of cognitive learning outcomes with
the lowest level termed knowledge, followed by five increasingly difficult
levels of mental abilities: comprehension, application, analysis, synthesis
and evaluation. The six levels in the revised version are remembering,
understanding, applying, analysing, evaluating and creating.

 Affective characteristics involve the feelings or emotions of a person. Attitudes,
values, self-esteem, locus of control, self-efficacy, interests, aspirations and
anxiety are all examples of affective characteristics.

 The five major categories of the affective domain from the simplest behaviour
to the most complex are receiving, responding, valuing, organisation and
characterisation.

 The psychomotor domain includes physical movement, coordination and use
of the motor-skill areas.

 The seven major categories of the psychomotor domain from the simplest
behaviour to the most complex are perception, set, guided response,
mechanism, complex overt response, adaptation and origination.

 The ideal situation is an alignment of objectives, instruction and assessment.

 The trend in assessment is to move away from traditional objective and essay
tests towards alternative assessments focusing on authentic assessment and
performance assessment.

Affective outcomes
Authentic assessment
Behaviour
Behavioural view
Bloom's taxonomy
Cognitive-constructivist
Cognitive outcomes
Holistic assessment
Psychomotor outcomes
The Helpful Hundred


Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching,
and assessing: A revision of Bloom's taxonomy of educational objectives.
Boston, MA: Allyn & Bacon.

Dietel, R., Herman, J., & Knuth, R. (1991). What does research say about
assessment? Retrieved from https://bit.ly/2ECzOXP

Dwyer, F. M. (1991). A paradigm for generating curriculum design oriented
research questions in distance education. Second American Symposium
Research in Distance Education. University Park, PA: Pennsylvania State
University.

Heinich, R., Molenda, M., Russell, J. D., & Smaldino, S. E. (2001). Instructional
media and technologies for learning (7th ed.). Englewood Cliffs, NJ: Prentice
Hall.

Krathwohl, D., Bloom, B., & Bertram, B. (1973). Taxonomy of educational
objectives, the classification of educational goals, handbook II: Affective
domain. New York, NY: David McKay.


Topic  Planning

3 Classroom
Tests

LEARNING OUTCOMES

By the end of the topic, you should be able to:
1. Describe the process of planning a classroom test;
2. Explain the purposes of a test and their importance in test planning;
3. Describe how learning outcomes to be assessed affect test planning;
4. Select the best item types for a test in line with learning outcomes;
5. Develop a table of specifications for a test;
6. Identify appropriate marking schemes for an essay test; and
7. Explain the general principles of constructing relevant test items.

 INTRODUCTION

In this topic, we will focus on the process of planning classroom tests. Testing is
part of the teaching and learning process. The importance of planning and writing
a reliable, valid and fair test cannot be overstated. Designing tests is an
important part of assessing students' understanding of course content and their
level of competency in applying what they have learnt. Whether you use low-
stakes quizzes or high-stakes mid-semester and final examination tests, careful
design will help provide more calibrated results. Assessments should reveal how
well students have learnt what teachers want them to learn while instruction
ensures that they learn it.


Thus, thinking about summative assessment at the end of a programme of teaching
is not enough. It is also helpful to think about assessment at every stage of the
planning process, because identifying the ways in which teachers will assess their
students will help clarify what it is that teachers want them to learn, and this in
turn will help determine the most suitable learning activities.

This topic will discuss the general guidelines applicable to most assessment
tools when planning a test. Topics 4 and 5 will discuss objective tests and essay
tests in detail, while authentic assessment tools such as projects and portfolios
will be discussed in their respective topics.

3.1 PURPOSES OF CLASSROOM TESTING

Tests can refer to traditional paper-and-pencil or computer-based tests, such as
multiple choice, short answer and essay tests. Tests provide teachers with objective
feedback as to how much students are learning and how much they have
understood what they have learnt. Commercially published achievement tests to
some extent can provide evaluation of the knowledge levels of individual students,
but provide only limited instructional guidance in assessing the wide range of
skills taught in any given classroom.

Teachers know their students and are therefore their best assessors. Tests
developed by individual teachers for use with their own classes are the most
instructionally relevant.
instructionally relevant. Teachers can tailor tests to emphasise the information
they consider important and to match the ability levels of their students. If
carefully constructed, classroom tests can provide teachers with accurate and
useful information about the knowledge retained by their students.

The key to this process is the test questions that are used to elicit evidence of
learning. Test questions and tasks are not just planning tools; they also form an
essential part of the teaching sequence. Incorporating the tasks into teaching and
using the evidence about the student learning to determine what happens next in
the lesson is truly an embedded formative assessment.


3.2 PLANNING A CLASSROOM TEST

A well-constructed test must have high-quality items. It is an instrument that
provides an accurate measure of a test taker's ability within a particular
domain. It is worth spending time writing high-quality items for tests. To
produce high-quality questions, test construction has to be properly planned.
Let us look at the following steps of planning a test (refer to Figure 3.1).

Figure 3.1: Planning a test

3.2.1 Deciding Its Purposes

The first step in test planning is to decide on the purpose of the test. Tests can be
used for many different purposes. If a test is to be used formatively, it should
indicate precisely what the student needs to study, and to what level. The purpose
of formative tests is to assess progress and to direct the learning process. These
tests cover a limited sample of content and learning outcomes. Teachers must
prepare a sufficient mix of easy and difficult items. These items are used to make
corrective prescriptions, such as practice exercises, for students who do not
perform satisfactorily in the tests.


If a test is to be used summatively, the coverage of content and learning outcomes
would be different from that of formative tests. Summative tests are normally
conducted at the end of a teaching and learning phase, for example, at the end of
a course. They are used to determine the studentsÊ mastery level of the course and
help teachers to decide whether a particular student can proceed to the next level
of his or her studies. The summative tests should therefore cover the whole content
areas and learning outcomes of the course, or should at least cover a representative
sample of the contents and learning outcomes of the course. The test items are also
varied in their levels of difficulty and complexity as defined by the learning
outcomes.

Tests can also serve a diagnostic purpose. Diagnostic tests are used to find out
what students know and do not know, and their strengths and weaknesses. They
are typically administered at the start of a new phase of education, such as when
students begin a new course, and normally cover the topics (content as well as
learning outcomes) that students will study in the upcoming course. The items
included in such a test are usually simple. Diagnostic tests are also used to
"diagnose" the learning difficulties encountered by students. When used for this
purpose, the tests cover specific content areas and learning outcomes and aim to
uncover the causes of the learning problems so that remediation can be
implemented.

3.2.2 Specifying the Intended Learning Outcomes

The focus of instruction in a course of study is not mere acquisition of knowledge
by students but more importantly on how they can use and apply the acquired
knowledge in different and meaningful situations. The latter has been referred to
as course learning outcomes (CLOs), which should cover the cognitive, affective
and psychomotor domains as explained in Topic 2. In other words, the emphasis
in instruction should be on the mastery of CLOs when teachers deliver the content
covered in the topics of the course. The syllabus of a course should therefore
present not only the relevant content areas in the form of topics but also indicate
the CLOs to be achieved. A course of study might have a number of topics but only
three to five CLOs. For instance, for an Educational Assessment course, there may
be 10 topics to be covered with four CLOs, which are spread across the 10 topics
as shown in Table 3.1.


Table 3.1: Mapping of Course Learning Outcomes Across Topics

CLO 1: Explain the different principles and theories of educational testing and
assessment (C2)
CLO 2: Compare the different methods of educational testing and assessment (C4)
CLO 3: Develop different assessment methods for use in the classroom (C3)
CLO 4: Critically evaluate the suitability of different assessment methods for use
in the classroom (C6)

In the original table, each of Topics 1 to 10 is marked with an "x" under the CLOs
it addresses.

Note: The parentheses indicate the levels of complexity according to Bloom's
taxonomy

In line with the principle of constructive alignment, assessment of a course should
also focus on the mastery of CLOs. CLOs are normally written in general terms.
Under each topic, the learning outcome is more specific and is often referred to as
an intended learning outcome (ILO). In assessing a topic of a course, it is
imperative that its ILO is clearly specified. Table 3.2 shows an example of a CLO
and its related ILO for a specific topic in the Educational Assessment course
(i.e. Portfolio Assessment).

Table 3.2: Example of Course Learning Outcome (CLO) and Intended
Learning Outcome (ILO)

CLO: Critically evaluate the suitability of different assessment methods for use in
the classroom (C6)

ILO: Critically evaluate the usefulness of portfolios as an assessment tool (C6)


A word of caution. Remember, not all ILOs can be assessed by tests. Tests are only
appropriate in assessing cognitive learning outcomes. For example, of the
following three intended learning outcomes (ILO), only ILO 1 can be assessed by
a test using an essay question. On the other hand, ILO 2, which belongs to the
psychomotor domain, is more appropriately assessed by practical work via teacher
observation, while ILO 3, which belongs to the affective domain, may be assessed
during the implementation of the class project via peer evaluation.

ILO 1 Explain the differences among the cognitive, affective and psychomotor
domains of learning outcomes.

ILO 2 Demonstrate the proper technique of executing a table tennis top-spin
in service.

ILO 3 Work collaboratively with other students in the team to complete the
class project.

SELF-CHECK 3.1

1. What type of learning outcome in Bloom's taxonomy can be
assessed by tests? Why?

2. How is the intended learning outcome (ILO) different from course
learning outcome (CLO)?

3.2.3 Selecting Best Item Types

Once the intended learning outcomes (ILOs) for the topics to be assessed have been
specified, the next step in planning a test is to select the best item types. Different
item types have different purposes and differ in their usefulness. Table 3.3
shows two common item types used in tests, multiple-choice and essay questions,
and their respective purposes and usefulness. Refer to Topics 4 and 5 for more
details.


Table 3.3: Item Types and Their Respective Purposes and Usefulness

1. Multiple-choice
 Tests factual knowledge
 Assesses a large number of items
 Can be scored rapidly, accurately and objectively

2. Essay
 Requires candidates to write an extended piece on a certain topic
 Assesses higher-order thinking skills such as analysing, synthesising and
evaluating in Bloom's taxonomy

It is thus imperative that the item types selected for assessment should be
relevant to the ILO being assessed. There must be a close match between the
ILOs and the types of items used. For example, if the ILO is to develop the
ability to organise ideas, a multiple-choice test would be a poor choice; the best
item type would be an essay question. Table 3.4 shows two intended learning
outcomes. Can you select the best item types to assess them?

Table 3.4: Examples of Intended Learning Outcomes

ILO 1: Discuss the usefulness of portfolios as an assessment tool in education.
ILO 2: Define what a portfolio is.

ILO 1 requires students to present a discussion. They need to thoroughly review,
examine, debate or argue the pros and cons of a subject. To do this, they need to
write an extended response. ILO 1 can only be assessed by an essay test. However,
ILO 2 merely requires students to identify a definition. A multiple-choice question
(MCQ) is good enough to perform the assessment task.
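As a rough sketch of this matching principle, one could key off the ILO's opening verb. The verb groupings below are illustrative assumptions based on Table 3.3 and the surrounding discussion, and the function name is hypothetical; real ILOs need human judgement:

```python
# Hedged sketch: suggesting a suitable item type from an ILO's opening verb.
# The verb groupings are illustrative assumptions, not a definitive rule.
ESSAY_VERBS = {"discuss", "evaluate", "justify", "defend", "devise",
               "develop", "examine", "illustrate"}
MCQ_VERBS = {"define", "name", "state", "identify", "differentiate"}

def best_item_type(ilo):
    """Suggest an item type from the first word of an intended learning outcome."""
    verb = ilo.split()[0].lower()
    if verb in ESSAY_VERBS:
        return "Essay"
    if verb in MCQ_VERBS:
        return "MCQ"
    return "Review manually"

print(best_item_type("Discuss the usefulness of portfolios."))  # Essay
print(best_item_type("Define what a portfolio is."))            # MCQ
```

The fallback branch matters: many verbs (e.g. "describe", "explain") can be assessed well by either item type depending on the depth expected.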


ACTIVITY 3.1

The following is a list of learning outcomes. Identify the best item type
to assess each of them.

No. Learning Outcome MCQ/Essay

1 Name the levels of Bloom's taxonomy and identify
the intellectual behaviour each refers to.

2 Devise a table of specification, complete with
information on what to assess and how to assess.

3 Discuss the strengths and weaknesses of using
essay questions as an assessment tool.

4 Defend the use of portfolios for classroom
assessment.

5 Define norm-referenced and criterion-referenced
assessments.

6 Explain the purposes of assessment in education.

7 Describe the process involved in planning a test.

8 Illustrate the use of item analysis in assessing the
quality of a MCQ.

9 Differentiate between formative and summative
assessments.

10 Develop appropriate scoring rubrics as marking
schemes for essay questions.

11 State the advantages and disadvantages of
multiple-choice items as an assessment tool.

12 Examine the usefulness of project work as an
assessment tool.


3.2.4 Developing a Table of Specifications

Making a test blueprint, or table of specifications, is the next important step for
teachers. The table presents the topics of the course, the cognitive complexity
levels of the test items according to Bloom's taxonomy, and the number of test
items, which corresponds to the number of hours devoted to the topics and
course learning outcomes in class. The decision on exactly how many test items
to include in a test is based on the importance of the topics and learning
outcomes as indicated by student learning time, the item types used, and the
amount of time available for testing.
A table of specifications is a two-way table with the cognitive complexity levels
across the top, and the topics and course learning outcomes to be covered by a test
and hours of interaction down one side. The item numbers associated with each
topic are presented under the complexity level as determined by the CLO.
Table 3.5 presents an example of a table of specifications with MCQs as the item
type. For ease of understanding, let us assume that the test will only cover the
first three complexity levels of Bloom's taxonomy, namely Knowledge (C1),
Comprehension (C2) and Application (C3).

Table 3.5: Table of Specifications: MCQ Item Type

In this example, the vertical columns on the left of the two-way table show a list of
the topics covered in class and the amount of time spent on those topics. The
amount of time spent on the topics as shown in the column „Hours of Interaction‰
is used as a basis to compute the weightage or percentage (% hours) and the marks
allocated. For a test with MCQs, the marks allocated also indicate the number of
test items for each topic.

In this hypothetical case, the teacher has spent 20 hours teaching the three topics,
of which 4 hours are allotted to Topic 1. Thus, 4 hours out of a total of 20 hours
amounts to 20 per cent, or six items out of the total of 30 items planned by the
teacher. Likewise, the weightage and marks allotted for Topics 2 and 3 are
computed in the same manner: the weightage and number of items for Topic 2
are 30 per cent and nine items respectively, while for Topic 3 they are 50 per cent
and 15 items respectively.
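The proportional allocation just described can be checked with a few lines of arithmetic. In this sketch, the 6 and 10 hours for Topics 2 and 3 are inferred from the stated percentages (30 and 50 per cent of 20 hours), and the function name is my own:

```python
def items_per_topic(hours, total_items):
    """Allocate test items to topics in proportion to hours of interaction."""
    total_hours = sum(hours.values())
    return {topic: round(h / total_hours * total_items)
            for topic, h in hours.items()}

# 20 teaching hours in total; 4 for Topic 1 as stated, the rest inferred
# from the 30% and 50% weightages given in the text.
hours = {"Topic 1": 4, "Topic 2": 6, "Topic 3": 10}
print(items_per_topic(hours, total_items=30))
# {'Topic 1': 6, 'Topic 2': 9, 'Topic 3': 15}
```

With ratios that do not divide evenly, rounding can make the per-topic counts sum to slightly more or less than the planned total, so the final allocation still needs a manual check.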

Based on the cognitive complexity level of the CLO for each topic, the teacher will
decide on the number of items to be included under each level. This information
is presented in the cells of the Item No. column. For example, since the cognitive
complexity level of CLO1 for Topic 1 is C2, the teacher has decided to have two
items at C1 (i.e. items 1 and 2) and four items at C2 (i.e. items 10, 11, 12 and 13). Of
course, he or she can decide to have all six items framed at C2, but not at C3.
For Topic 2, the number of items required is nine at C2. Again, the teacher has
decided to have some items at C1 (i.e. four items) and the rest at C2 (i.e. five items)
to make up the required number of items. Topic 3 seems to be the most important
topic and it requires 15 items, i.e. half of the total in the test, and the teacher has
decided to have three items at C1, six items at C2 and another six items at C3.
Overall, of the total 30 items in the test, 30 per cent of them are at C1, 50 per cent
at C2 and 20 per cent at C3. The teacher, of course, might have a reason for such a
distribution. Perhaps, he or she feels that this is the beginning of the course, and
he or she wants to focus on the understanding of the key concepts of the course.
Whatever it is, the decision is the prerogative of the teacher, who knows best
what and how he or she wants to assess the students.

Table 3.6 is another example of a table of specifications. The table focuses on essay
items.

Table 3.6: Table of Specifications: Essay Questions

Topic      Hours of Interaction   % Hours   Marks   Items
Topic 1    5                      10        10      2 items × 5 marks (Section A)
Topic 2    5                      10        10      2 items × 5 marks (Section A)
Topic 3    10                     20        20      2 items × 10 marks (Section B)
Topic 4    10                     20        20      2 items × 10 marks (Section B)
Topic 5    20                     40        40      2 items × 20 marks (Section C)
Total      50                     100       100     10 items

The first vertical column on the left presents the five topics identified for
assessment, followed by the hours of interaction for each topic in the second column.
Based on this information, the teacher can work out the weightage in terms of %
hours and marks for each topic. In this hypothetical case, the teacher has spent
50 hours teaching the five topics, of which 5 hours each are allotted to Topics 1 and
2. Thus, 5 hours out of a total of 50 hours amounts to 10 per cent, or 10 marks from
the total of 100 marks as planned by the teacher. Likewise, the weightage or
percentage and the marks allotted for the remaining topics are computed in the
same manner. The weightages for Topic 2, Topic 3, Topic 4 and Topic 5 are 10, 20,
20 and 40 per cent respectively.

Based on the marks allotted and the cognitive complexity levels of the CLOs, the
teacher then decides how he or she is going to distribute the marks according to
the levels of complexity. For example, for Topic 1, he or she can have one essay
item carrying 10 marks at C3, or two essay items, one at C2 and the other at C3,
each carrying 5 marks. In this hypothetical case, the teacher has decided to have two
items for Topic 1 and another two for Topic 2, each carrying 5 marks. These four
items make up Section A of the test. For Topic 3, he or she has decided to
distribute the 20 marks between two items, each carrying 10 marks. He or she has
done the same for Topic 4. This makes up Section B. Section C is allotted to
Topic 5, with two items each carrying 20 marks. This is just an example of how the
marks for each topic are distributed and the number of items decided. There can
be many other variations, of course.

So far, we have looked at the table of specifications in the form of a two-way table.
A table of specifications can be in the form of a three-way table with item types as
an additional level. Whatever the format, the table of specifications is a very useful
document in assessment. This kind of table ensures that a fair and
representative sample of items or questions appears in the test. Teachers cannot
measure every piece of content in the syllabus and cannot ask every question they
might wish to ask. A table of specifications allows the teacher to construct a test
which focuses on the key contents as defined by the weights in percentages given
to them. A table of specifications provides the teacher with evidence that a test
has content validity, that is, it covers what should be covered. This table also allows
the teacher to view the test as a whole.

The teacher, especially a newly trained one, is advised to have this table of
specifications, together with the subject syllabus, reviewed by a subject expert or
the subject head of department to check whether the test plan includes what it is
supposed to measure. In other words, it is important that the table of specifications
has content validity. To ensure this, the students should ideally not be given
choices in a test. Without choices, all students are thus assessed equally.

SELF-CHECK 3.2

What is a table of specifications?

ACTIVITY 3.2

1. Have you used a table of specifications?

2. Identify a course of your choice and prepare a table of specifications for a test.

Share your answers with your coursemates in the myINSPIRE online forum.

3.2.5 Constructing Test Items

Once a valid table of specifications has been prepared, the next step is constructing
the test items. While the different item types such as multiple choice, short answer,
true-false, matching and essay items are constructed differently, the following
principles apply to constructing test items in general.
(a) Make the instructions for each type of item simple and brief;
(b) Use simple and clear language in the questions;
(c) Write items that are appropriate for the learning outcomes to be measured;
(d) Do not provide clues or suggest the answer to one question in the body of another question;
(e) Avoid writing questions in the negative. If you must use negatives, highlight them, as they may mislead students into answering incorrectly;
(f) Specify the precision required of answers;
(g) Try as far as possible to write your own questions. If you need to use questions from other sources, check to make sure they fit the learning objectives and the requirements in the table of specifications; and
(h) If an item has been revised, recheck its relevance.

In writing test items, you must also consider the length of the test as well as the
reading level of your students. You do not want students to feel rushed and
frustrated because they are not able to demonstrate their knowledge of the material
in the allotted time. Some general guidelines regarding time requirements for
secondary school student test takers are shown in Table 3.7.

Table 3.7: Allotment of Time for Each Type of Question

Task                                    Approximate Time per Item
True-false items                        20–30 seconds
Multiple choice (factual)               40–60 seconds
Multiple choice (complex)               70–90 seconds
Matching (five stems/six choices)       2–4 minutes
Short answer                            2–4 minutes
Multiple choice (with calculations)     2–5 minutes
Word problems (simple math)             5–10 minutes
Short essays                            15–20 minutes
Data analysis/graphing                  15–25 minutes
Extended essays                         35–50 minutes

If you are combining multiple choice and essay items, these estimates may help
you decide how many items of each type to include. One mistake often made
by many educators is having too many questions for the time allowed.
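As a quick sanity check on test length, the midpoints of the ranges in Table 3.7 can be used to estimate the total time a draft paper demands (the item mix below is hypothetical):

```python
# Estimate total testing time from the item mix, using midpoints of the
# ranges in Table 3.7 (in seconds per item).
SECONDS_PER_ITEM = {
    "true_false": 25,          # 20-30 seconds
    "mcq_factual": 50,         # 40-60 seconds
    "mcq_complex": 80,         # 70-90 seconds
    "short_essay": 17.5 * 60,  # 15-20 minutes
}

plan = {"mcq_factual": 20, "mcq_complex": 10, "short_essay": 2}  # a draft paper

total_seconds = sum(SECONDS_PER_ITEM[kind] * n for kind, n in plan.items())
print(f"Estimated time: {total_seconds / 60:.0f} minutes")  # 65 minutes here
```

If the estimate exceeds the scheduled period, items should be cut before the paper is finalised rather than after students have sat it.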

Once your items are developed, make sure that you include clear directions to the
students. For the objective items, specify that they should select one answer for
each item and indicate the point value of each question, especially if you are
weighting sections of the test differently. For essay items, indicate the point value
and suggested time to be spent on the item (we will discuss essay questions in
more detail in Topic 5). If you are teaching a large class with close seating
arrangements and are giving an objective test, you may want to consider
administering several versions of your test to decrease the opportunities for
cheating. You can create versions of your test with different arrangements of the
items.
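One way to produce such versions is to reorder the items programmatically; a minimal sketch (item labels are placeholders), where seeding the generator per version keeps every version reproducible so its answer key can be regenerated:

```python
import random

# Create several versions of an objective test by reordering the items.
# Seeding the generator with the version label keeps each version
# reproducible, so its answer key can be regenerated later.
items = [f"Q{i}" for i in range(1, 31)]  # 30 placeholder items

versions = {}
for label in ("A", "B", "C"):
    rng = random.Random(label)
    shuffled = items[:]          # copy; keep the master order intact
    rng.shuffle(shuffled)
    versions[label] = shuffled

print(versions["A"][:5])  # first five items of version A
```

In practice, the order of the options within each MCQ can be shuffled the same way, as long as the answer key is regenerated alongside.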

More detailed guidelines on how to prepare and write multiple choice, short answer,
true-false, matching, essay, portfolio and project items will be discussed in Topics 4,
5, 6 and 7 respectively.

ACTIVITY 3.3

To what extent do you agree with the allotment of time for each type of
question shown in Table 3.7?

Justify your answer by discussing it with your coursemates in the
myINSPIRE online forum.

3.2.6 Preparing Marking Schemes

Preparing a marking scheme well in advance of the testing date will give teachers
ample time to review their questions and make changes to answers when
necessary.

The teacher should make it a habit to write a model answer which can be easily
understood by others. This model answer can be used by other teachers who act
as external examiners, if need be. For objective test items, the model answers are
simple: the marking scheme is just a list of answers with the marks allotted for
each. However, for essay items, the marking schemes can be a bit complicated and
require special skills and knowledge to prepare. They may take
the form of a checklist, a rubric or a combination of both. Refer to Topic 5 for a
detailed explanation of marking schemes.
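For objective items, such a marking scheme can be represented directly as an answer key with marks; a minimal sketch (item numbers, keys and marks are hypothetical):

```python
# A marking scheme for objective items: each item maps to its key
# response and the marks allotted (values here are hypothetical).
ANSWER_KEY = {1: ("C", 1), 2: ("A", 1), 3: ("B", 2)}

def score(responses):
    """Total marks for a dict of {item_number: chosen_option}."""
    return sum(marks
               for item, (key, marks) in ANSWER_KEY.items()
               if responses.get(item) == key)

print(score({1: "C", 2: "D", 3: "B"}))  # 1 + 0 + 2 = 3
```

Essay marking schemes cannot be reduced to a lookup like this, which is why they need the checklists or rubrics discussed in Topic 5.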

Coordination on the use of marking schemes should be done once the test answer
scripts are collected. Teachers should read the answers from a sample of scripts and
review the correct answers in the marking scheme. Teachers may sometimes find
that students have interpreted a test question in a way that is different from what
was intended. Students may come up with excellent answers that fall slightly
outside what has been asked. Consider giving these students marks accordingly.
Likewise, teachers should make a note in the marking scheme for any error made
early in an answer but carried through the rest of it; marks should not be deducted
again if the rest of the response is sound.

SELF-CHECK 3.3

Why is it necessary for a test to be accompanied by a marking scheme?

3.3 ASSESSING TEACHER’S OWN TEST

Regardless of the kind of tests teachers use, they can assess their effectiveness by
asking the following questions:

(a) Did I Test for What I Thought I Was Testing for?
If you wanted to know whether students could apply a concept to a new
situation, but mostly asked questions determining whether they could label
parts or define terms, then you tested for recall rather than application.

(b) Did I Test What I Taught?
For example, your questions may have tested the students' understanding of
surface features or procedures, while you had been lecturing on causation or
relation: not so much what the names of the bones of the foot are, but how
they work together when we walk.

(c) Did I Test for What I Emphasised in Class?
Make sure that you have asked most of the questions about the material you
feel is the most important, especially if you have emphasised it in class.
Avoid questions on obscure material that are weighted the same as questions
on crucial material.

(d) Is the Material I Tested for Really What I Wanted Students to Learn?
For example, if you wanted students to use analytical skills such as the ability
to recognise patterns or draw inferences, but only used true-false questions
requiring non-inferential recall, you might try writing more complex
true-false questions or MCQs.

Students should know what is expected of them. They should be able to identify
the characteristics of a satisfactory answer and understand the relative importance
of those characteristics. This can be achieved in many ways: you can provide
feedback on tests, describe your expectations in class, or post model solutions on
a class blog. Teachers are encouraged to make notes on the scripts. When exams
are returned to the students, the notes will help them understand their mistakes
and correct them.

SELF-CHECK 3.4

Describe the steps involved in planning a test.

• The first step in test planning is to decide on the purpose of the test. Tests can
be used for many different purposes.

• The next step is to consider the learning outcomes and their complexity levels
as defined by Bloom's taxonomy. The teacher will have to select the
appropriate knowledge and skills to be assessed and include more questions
about the most important learning outcomes.

• The learning outcomes that the teachers want to emphasise will determine not
only what material to include on the test, but also the specific form the test will
take.

• Making a test blueprint or table of specifications is the next important step that
teachers should take.

• The table describes the topics, the expected behaviour of the students and the
number of questions on the test, which corresponds to the number of hours
devoted to the topics in class.

• The table of specifications helps to ensure that there is a match between what
is taught and what is tested.

• Classroom assessment is driven by classroom teaching, which itself is driven
by learning outcomes.

• The test format used is one of the main driving factors in students' learning
behaviour.

Checklist
Complexity levels of Bloom's taxonomy
Course learning outcome (CLO)
Hours of interaction
Intended learning outcome (ILO)
Marking schemes
Rubrics
Table of specifications

Topic  How to Assess?

4 ă Objective
Tests

LEARNING OUTCOMES

By the end of the topic, you should be able to:
1. Define an objective test and list the different types of objective tests;
2. Construct short-answer questions;
3. Construct multiple-choice questions;
4. Develop true-false questions; and
5. Prepare matching questions.

INTRODUCTION

In Topic 2, we discussed the need to assess students holistically based on cognitive,
affective and psychomotor learning outcomes, and in Topic 3, we looked at the
steps involved in planning a class test.
In this topic, we will focus on using objective tests in assessing various kinds of
behaviour in the classroom. Four types of objective tests are examined and the
guidelines for the construction of each type of test are discussed. The advantages
and limitations of these types of objective tests are explained too.

4.1 WHAT IS AN OBJECTIVE TEST?

When objective tests were first used in 1845 by George Fisher in the United States,
they were not well received by society. However, over the years, they have
gained acceptance and are now widely used in schools, industries, businesses,
professional organisations, universities and colleges. In fact, they have become the
most popular format for assessing various types of human abilities, competencies
and socio-emotional attributes.
What is an objective test? An objective test is a written test consisting of items or
questions which require the respondent to answer by supplying a word, phrase or
symbol or by selecting from a list of possible answers. The former is referred to as
supply-type items while the latter is referred to as selection-type items. The
common supply-type items are short-answer questions and the selection-type
items are multiple-choice questions, true-false questions and matching questions.
The word "objective" here refers to the scoring: an objective item or question has
only one correct answer, so the marking cannot be influenced by
the personal preferences and prejudices of the marker. In other words, it is not
subjective and not open to varying interpretations. This is one of the reasons
why the objective test is popular in measuring human abilities, competencies and
many other psychological attributes such as personality, interest and attitude.
Figure 4.1 describes how objective tests have been used in Malaysian schools since
their inception and how they are used today.

Figure 4.1: Objective tests in Malaysian schools

Objective tests vary depending on how the questions are presented. The four
common types of questions used in most objective tests are multiple-choice
questions, true-false questions, matching questions and short-answer questions
(refer to Figure 4.2).

Figure 4.2: Common formats of objective tests

4.2 MULTIPLE-CHOICE QUESTIONS (MCQS)

Let us take a look at one of the most popular objective test formats: the
multiple-choice question.

4.2.1 What is a Multiple-choice Question?

Multiple-choice questions or MCQs are widely used in many different settings
because they can be used to measure low-level cognitive outcomes as well as more
complex cognitive outcomes. It is challenging to write test items to tap into higher-
order thinking. All the demands of good item writing can only be met when test
writers have been well-trained. Above all, test writers need to have expertise in the
subject area being tested so they can gauge the difficulty and content coverage of
the test items.

Among objective items, multiple-choice questions are the most difficult to prepare.
These questions have two parts:
(a) A stem that contains the question; and
(b) Four or five options, one of which contains the correct answer, called the key
response. Three-option multiple-choice questions are also gaining acceptance.

The other, incorrect options are called distractors. The stem may be presented as a
question or a statement, while the options can be words, phrases, numbers,
symbols and so forth. The role of the distractors is to attract the attention of
respondents who are not sure of the correct answer.
A traditional multiple-choice question (or item) is one in which a student chooses
one answer from a number of choices supplied (as illustrated in Figure 4.3).

Figure 4.3: Multiple-choice question
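The anatomy just described (stem, options, key response, distractors) maps naturally onto a small data structure; a minimal sketch with an illustrative geography item:

```python
from dataclasses import dataclass

# One way to represent the anatomy of an MCQ: stem, options, and the
# key response; the distractors are simply the non-key options.
@dataclass
class MCQ:
    stem: str      # question or incomplete statement
    options: list  # all alternatives shown to the student
    key: str       # the correct option (key response)

    def distractors(self):
        return [o for o in self.options if o != self.key]

q = MCQ(
    stem="A fertile area in the desert where the water table "
         "reaches the surface is called a/an",
    options=["Lake", "Mirage", "Oasis", "Polder"],
    key="Oasis",
)
print(q.distractors())  # ['Lake', 'Mirage', 'Polder']
```

Keeping items in a structured form like this also makes it easy to shuffle option order per test version while regenerating the answer key automatically.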
(a) The stem should:
(i) Be in the form of a question or a statement to be completed;
(ii) Be expressed clearly and concisely, avoiding poor grammar, complex syntax, ambiguity and double negatives;
(iii) Generally present a positive question (if a negative is used, it should be emphasised with italics or underlining);
(iv) Generally ask for one answer only (the correct or the best answer); and
(v) Include as many of the words common to all alternatives as possible.

(b) The options or alternatives should:
(i) Number either three, four or five per item, all mutually exclusive and not too long;
(ii) All follow grammatically from the stem and be parallel in grammatical form;
(iii) Be unambiguous and expressed simply enough to make clear the essential differences between them; and
(iv) Include an intended answer or key that is clearly correct to the informed, while the distractors should be definitely incorrect, but plausible.

SELF-CHECK 4.1

1. What is an objective test?
2. Why is the multiple-choice question (MCQ) a popular form of objective test?

4.2.2 Construction of Multiple-choice Questions

Did you know that MCQ test writing is a profession? By that, we mean that good
test writers are professionally trained in designing test items. Test writers have
knowledge of the rules of constructing items, but at the same time they have the
creativity to construct items that capture studentsÊ attention. Test items need to be
succinct but clear in meaning.

McKenna and Bull (1999) offered some guidelines for constructing stems for
multiple-choice questions. All the options in multiple-choice items need to be
plausible, but they also need to separate students of different ability levels. Let us
take a look at these guidelines.
(a) When writing stems, present a single, definite statement to be completed or
answered by one of the several given choices (see Example 4.1).

Example 4.1:

Weak Question:
World War II was:
A. The result of the failure of the League of Nations
B. Horrible
C. Fought in Europe, Asia and Africa
D. Fought during the period of 1939–1945

Improved Question:
In which of these time periods was World War II fought?
A. 1914–1917
B. 1929–1934
C. 1939–1945
D. 1951–1955

Note: In the weak question, there is no clue from the stem to what the
question is asking. The improved version identifies the question more clearly
and offers the student a set of homogeneous choices.

(b) When writing stems, avoid unnecessary and irrelevant material (see
Example 4.2).

Example 4.2:

Weak Question:
For almost a century, the Rhine river has been used by Europeans for a variety of
purposes. However, in recent years, the increased river traffic has resulted in
increased levels of diesel pollution in the waterway. Which of the following would
be the most dramatic result if, because of the pollution, the Council of Ministers of
the European Community decided to close the Rhine to all shipping?
A. Increased prices for Ruhr products
B. Shortage of water for Italian industries
C. Reduced competitiveness of the French Aerospace Industry
D. Closure of the busy river Rhine ports of Rotterdam, Marseilles and Genoa

Improved Question:
Which of the following would be the most dramatic result if, because of diesel
pollution from ships, the river Rhine was closed to all shipping?
A. Increased prices for Ruhr products
B. Shortage of water for Italian industries
C. Reduced competitiveness of the French Aerospace Industry
D. Closure of the busy river Rhine ports of Rotterdam, Marseilles and Genoa

Note: The weak question is too wordy and contains unnecessary material.

(c) When writing stems, use clear, straightforward language. Questions that are
constructed using complex wording may become a test of reading
comprehension rather than an assessment of a student's performance with
regard to a specific learning outcome (see Example 4.3).

Example 4.3:

Weak Question:
As the level of fertility approaches its nadir, what is the most likely
ramification for the citizenry of a developing nation?
A. A decrease in the labour force participation rate of women
B. A downward trend in the youth dependency ratio
C. A broader base in the population pyramid
D. An increased infant mortality rate

Improved Question:
A major decline in fertility in a developing nation is likely to produce
A. A decrease in the labour force participation rate of women
B. A downward trend in the youth dependency ratio
C. A broader base in the population pyramid
D. An increased infant mortality rate

Note: In the improved question, the word "nadir" is replaced with "decline"
and "ramification" with "produce". These are more straightforward words.

(d) When writing stems, use negatives sparingly. If negatives must be used,
capitalise, underscore or bold them (see Example 4.4).

Example 4.4:

Weak Question:
Which of the following is not a symptom of osteoporosis?
A. Decreased bone density
B. Frequent bone fractures
C. Raised body temperature
D. Lower back pain

Improved Question:
Which of the following is a symptom of osteoporosis?
A. Decreased bone density
B. Raised body temperature
C. Painful joints
D. Hair loss

Note: The improved question is stated in the positive so as to avoid the use
of the negative "not".

(e) When writing stems, put as much of the question in the stem as possible,
rather than duplicating material in each of the options (see Example 4.5).

Example 4.5:

Weak Question:
Theorists of pluralism have asserted which of the following?
A. The maintenance of democracy requires a large middle class
B. The maintenance of democracy requires autonomous centres of countervailing power
C. The maintenance of democracy requires the existence of a multiplicity of religious groups
D. The maintenance of democracy requires the separation of governmental powers

Improved Question:
Theorists of pluralism have asserted that the maintenance of democracy requires
A. A large middle class
B. The separation of governmental powers
C. Autonomous centres of countervailing power
D. The existence of a multiplicity of religious groups

Note: In the improved question, the phrase "maintenance of democracy" is
included in the stem so as not to duplicate it in each option.

(f) When writing stems, avoid giving away the answer because of grammatical
cues (see Example 4.6).

Example 4.6:

Weak Question:
A fertile area in the desert in which the water table reaches the ground surface
is called an
A. Mirage
B. Oasis
C. Lake
D. Polder

Improved Question:
A fertile area in the desert in which the water table reaches the ground surface
is called a/an
A. Lake
B. Mirage
C. Oasis
D. Polder

Note: The weak question uses the article "an", which identifies choice B as
the correct response. Ending the stem with "a/an" improves the question.

(g) When writing stems, avoid asking an opinion as much as possible.

(h) Avoid using the words "always" and "never" in the stem as test-wise
students are likely to rule such universal statements out of consideration.

(i) When writing distractors for single response MCQs, make sure that there is
only one correct response (see Example 4.7).

Example 4.7:

Weak Question:
What is the main source of pollution of Malaysian rivers?
A. Land clearing
B. Open burning
C. Solid waste dumping
D. Coastal erosion

Improved Question:
What is the main source of pollution of Malaysian rivers?
A. Open burning
B. Coastal erosion
C. Solid waste dumping
D. Carbon dioxide emission

Note: In the weak question, both options A and C can be considered correct.

(j) When writing distractors, use only plausible and attractive alternatives (see
Example 4.8).

Example 4.8:

Weak Question:
Who was the third Prime Minister of Malaysia?
A. Hussein Onn
B. Ghafar Baba
C. Mahathir Mohamad
D. Musa Hitam

Improved Question:
Who was the third Prime Minister of Malaysia?
A. Hussein Onn
B. Abdullah Badawi
C. Mahathir Mohamad
D. Abdul Razak Hussein

Note: In the weak question, B and D are not serious distractors.

(k) When writing distractors, if possible, avoid the choices "All of the above"
and "None of the above". If you do include them, make sure that they appear
as correct answers some of the time.

It is tempting to resort to these alternatives but their use can be flawed. To
begin with, they often appear as an alternative that is not the correct
response. If you do use them, be sure that they constitute the correct answer
part of the time. An "All of the above" alternative can be exploited by a test-
wise student who will recognise it as the correct choice after identifying only
two correct alternatives. Similarly, a student who can identify one wrong
alternative can then rule this response out. Clearly, the student's chance of
guessing the correct answer improves as he or she employs these techniques.
Although a similar process of elimination is not possible with "None of the
above", when this option is used as the correct answer, the question only
tests the student's ability to rule out the wrong answers, and this does not
guarantee that the student knows the correct one (Gronlund, 1988).

(l) Distractors based on common student errors or misconceptions are very
effective.

One technique for compiling distractors is to ask students to respond to
open-ended short answer questions, perhaps as formative assessments.
Identify which incorrect responses appear most frequently and use them as
distractors for a multiple-choice version of the question.

(m) Do not create distractors that are so close to the correct answer that they may
confuse students who really know the answer to the question. Distractors
should differ from the key in a substantial way, not just in some minor
nuance of phrasing or emphasis.

(n) Provide a sufficient number of distractors.

You will probably choose to use three, four or five alternatives in a
multiple-choice question. Until recently, it was thought that three or four
distractors were necessary for the item to be suitably difficult.

However, a study by Owen and Freeman suggests that three choices are
sufficient (Owen & Freeman, 1987). Clearly, the higher the number of
distractors, the less likely it is for the correct answer to be chosen through
guessing, provided that all alternatives are of equal difficulty.
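The effect of the number of alternatives on blind guessing is easy to quantify: with k equally plausible options, the chance of guessing an item correctly is 1/k:

```python
# With k equally plausible options, the chance of a blind guess being
# correct is 1/k, so more distractors lower the expected guessing score.
for k in (2, 3, 4, 5):  # from true-false up to a five-option MCQ
    print(f"{k} options: {100 / k:.1f}% expected score from pure guessing")
```

Moving from two options (true-false) to four halves the expected guessing score from 50 per cent to 25 per cent, which is why MCQ scores are less influenced by guessing than true-false scores.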

ACTIVITY 4.1

1. Do you agree that you should not use negatives in the stems of MCQs? Why?

2. Do you agree that you should not use "All of the above" and "None of the above" as distractors in MCQs? Why?

3. Select 10 multiple-choice questions in your subject area and analyse the distractors of each item using the guidelines mentioned earlier.

4. Suggest how you would improve weak distractors.

Share your answers with your coursemates in the myINSPIRE online forum.

4.2.3 Advantages of Multiple-choice Questions

Multiple-choice questions are widely used to measure knowledge outcomes and
various other types of learning outcomes. They are popular because of the following
reasons:
(a) Learning outcomes from simple to complex can be measured;
(b) Highly structured and clear tasks are provided;
(c) A broad sample of achievement can be measured;
(d) Incorrect alternatives or options provide diagnostic information;
(e) Scores are less influenced by guessing than true-false items;
(f) Scores are more reliable than subjectively scored items (such as essays);
(g) Scoring is easy, objective and reliable;
(h) Item analysis can reveal how difficult each item was and how well it
discriminated between the stronger and weaker students in the class;
(i) Performance can be compared from class to class and year to year;

(j) A lot of material can be covered very efficiently (about one item per minute of
testing time); and
(k) Items can be written so that students must discriminate among options that
vary in degree of correctness.
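The item analysis mentioned in point (h) can be sketched as follows: the difficulty index is the proportion of students answering an item correctly, and a simple discrimination index contrasts the top and bottom halves of the class (real analyses often use the top and bottom 27 per cent instead; the response data here are invented):

```python
# Simple item analysis: difficulty (p) and discrimination (D) indices.
# Rows are students sorted by total score (best first); 1 = correct.
results = [
    [1, 1, 1],  # strongest student
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],  # weakest student
]

n = len(results)
half = n // 2
upper, lower = results[:half], results[-half:]  # top and bottom groups

for item in range(len(results[0])):
    p = sum(row[item] for row in results) / n           # difficulty index
    d = (sum(r[item] for r in upper)
         - sum(r[item] for r in lower)) / half          # discrimination
    print(f"Item {item + 1}: difficulty={p:.2f}, discrimination={d:.2f}")
```

An item answered correctly mostly by the stronger students (high D) discriminates well; an item with D near zero, or negative, tells the teacher nothing about who has mastered the material and should be reviewed.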

4.2.4 Limitations of Multiple-choice Questions

While there are many advantages of using multiple-choice questions, there are also
many limitations. These limitations are:
(a) Constructing good items is time-consuming;
(b) It is frequently difficult to find plausible distractors;
(c) MCQs are not as effective for measuring some types of problem-solving
skills and the ability to organise and express ideas;
(d) Scores can be influenced by students' reading ability;
(e) There is a lack of feedback on individual thought processes; it is difficult to
determine why individual students selected incorrect responses;
(f) Students can sometimes read more into the question than was intended;
(g) They often focus on testing factual information and fail to test higher levels
of cognitive thinking;
(h) Sometimes, there is more than one defensible "correct" answer;
(i) They place a high degree of dependence on the student's reading ability
and the constructor's writing ability;
(j) They do not provide a measure of writing ability; and
(k) They may encourage guessing.

Last but not least, let us look at Figure 4.4 which highlights some procedural rules
when constructing multiple-choice questions.

Figure 4.4: Procedural rules for the construction of multiple-choice questions

SELF-CHECK 4.2

1. What are some advantages of using multiple-choice questions?
2. List some limitations or weaknesses of multiple-choice questions.

4.3 TRUE-FALSE QUESTIONS

The next type of objective test is the true-false question. Here, we will discuss the
rationale for its use as well as its limitations.


4.3.1 What are True-False Questions?

In the most basic format, true-false questions are those in which a statement is
presented and the student indicates in some manner whether the statement is true
or false. In other words, there are only two possible responses for each item and
the student chooses between them. A true-false question is a specialised form of
the multiple-choice format in which there are only two possible alternatives. These
questions can be used when the test designer wishes to measure a student's ability to identify whether statements of fact are accurate or not. True-false questions
can be used for testing knowledge and judgement in many subjects. When
grouped together, a series of true-false questions on a specific topic or scenario can
test a more complex understanding of an issue. They can be structured to lead a
student through a logical pathway and can reveal part of the thinking process
employed by the student in order to solve a given problem. Let us see
Example 4.9.

Example 4.9:

A whale is a mammal because it gives birth to its young. (True / False)

4.3.2 Advantages of True-False Questions

True-false questions can be written quickly and can cover a lot of content. They are well suited for testing students' recall or comprehension, and students can generally respond to many questions in a fairly short amount of time. From the teacher's perspective, these questions are easy to score; since scoring is objective, the scores are more reliable than for items that depend, at least partially, on the teacher's judgement. They are also generally easier and less time-consuming to construct than multiple-choice questions because there is no need to develop distractors.


4.3.3 Limitations of True-False Questions

However, true-false questions have a number of limitations, notably:
(a) Guessing

A student has a one-in-two chance of guessing the correct answer to a
question. Scores on true-false items tend to be high because of the ease of
guessing correct answers when the answers are not known. With only two
choices (true or false), the student can expect to guess correctly on half of the
items for which correct answers are not known. Thus, if a student knows the
correct answers to 10 questions out of 20 and guesses on the other 10, the
student can expect a score of 15. The teacher can anticipate scores ranging
from approximately 50 per cent for a student who did nothing but guess on
all items to 100 per cent for a student who knew the material.
(b) Tendency to Use the Original Text
Since these items are in the form of statements, there is sometimes a tendency
to take quotations from the text, expecting the student to recognise a correct
quotation or note a change (sometimes minor) in wording. There may also
be a tendency to include trivial or inconsequential material from the text.
Both of these practices are discouraged.
(c) Difficult to Set
It can be difficult to write a statement which is unambiguously true or false,
particularly for complex material.
(d) Unable to Discriminate Different Abilities
The format does not discriminate among students of different abilities as well
as other question types.
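The guessing arithmetic in (a) can be stated as a simple formula: a student who knows k of n answers and guesses the rest can expect a score of k plus half of the remaining items. A minimal sketch (the function name is illustrative, not from the text):

```python
# Expected raw score on a true-false test when every unknown item is guessed.
# With two options, each blind guess has a 1/2 chance of being correct.

def expected_score(known, total, options=2):
    guessed = total - known
    return known + guessed / options

print(expected_score(10, 20))  # knows 10 of 20, guesses the rest: 15.0
print(expected_score(0, 20))   # pure guessing: 10.0 (50 per cent)
print(expected_score(20, 20))  # knows everything: 20.0
```

The `options` parameter shows why multiple-choice items with four or five alternatives suffer less from guessing: `expected_score(10, 20, options=4)` is only 12.5.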


4.3.4 Suggestions for Constructing True-False
Questions

Here are some suggestions for constructing true-false questions:

(a) Include only one main idea in each item (see Example 4.10).

Example 4.10:

Poor Item: The study of biology helps us understand living organisms and predict the weather.
Better Item: The study of biology helps us understand living organisms.

(b) As in multiple-choice questions, use negatives sparingly. Avoid also double
negatives as they tend to contribute to the ambiguity of the statement.
Statement words like none, no and not should be avoided as far as possible
(see Example 4.11).

Example 4.11:

Poor Item: None of the steps in the experiment were unnecessary.
Better Item: All the steps in the experiment were necessary.

(c) Avoid broad, general statements. Most of these statements are false unless
qualified (see Example 4.12).

Example 4.12:

Poor Item: Short-answer questions are more favourable than essay questions in testing.
Better Item: Short-answer questions are more favourable than essay questions in testing factual information.


(d) Avoid long, complex sentences. Such sentences test reading comprehension in addition to the achievement being measured (see Example 4.13).

Example 4.13:

Poor Item: Despite the theoretical and experimental difficulties of determining the exact pH value of a solution, it is possible to determine whether a solution is acidic by the red colour formed on litmus paper when inserted into the solution.
Better Item: Litmus paper turns red in an acidic solution.

(e) Try using true-false questions in combination with other materials, such as
graphs, maps and written material. This combination allows for the testing
of more advanced learning.

(f) Avoid lifting statements directly from assigned readings, notes or other
course materials so that recall alone will not lead to a correct answer.

(g) In general, avoid the use of words which would signal the correct response to the test-wise student. Absolutes such as "none", "never", "always", "all" and "impossible" tend to be false, while qualifiers such as "usually", "generally", "sometimes" and "often" are likely to be true.

(h) A similar situation occurs with the use of "can" in a true-false statement. If the student knows of a single case in which something "can" be done, the statement would be true.

(i) Ambiguous or vague statements and terms, such as "largely", "long time", "regularly", "some" and "usually", are best avoided in the interest of clarity.
Some terms have more than one meaning and may be interpreted differently
by individuals.

(j) True statements should be about the same length as false statements. There
is a tendency to add details in true statements to make them more precise.

(k) Word the statement so precisely that it can be judged unmistakably as either
true or false.

(l) Statements of opinion should be attributed to some source.

(m) Keep statements short and use simple language structure.


(n) Avoid verbal clues (specific determiners) that can indicate the answer.
(o) Test important ideas rather than trivia.
(p) Do not present items in an easily learned pattern.

4.4 MATCHING QUESTIONS

Matching questions are used to measure a student's ability to identify the
relationship between two lists of terms, phrases, statements, definitions, dates,
events, people and so forth. In addition, one matching question can replace several
true-false questions.

4.4.1 Construction of Matching Questions

In developing matching questions, you have to identify two columns of material
listed vertically. The items in Column A (or I) are usually called premises and
assigned numbers (1, 2, 3 and so on) while items in Column B (or II) are called
responses and designated capital letters (A, B, C and so on). The student reads a
premise (Column A) and finds the correct response from among those in Column
B. The student then writes the letter of the correct response in the blank beside the premise in Column A.

An alternative is to have the student draw a line from the correct response to the
premise, but this is more time-consuming to score. One way to reduce the
possibility of guessing correct answers is to list a larger number of responses
(Column B) than premises (Column A), as shown in Example 4.14:

Example 4.14:
Directions: Column A contains statements describing selected Asian cities. For
each description, find the appropriate city in Column B. Each city in Column B can
be used only once.

Column A Column B
1. Ancient capital of Thailand: ___________ A. Ayutthaya
2. Largest city in Sumatera: ______________ B. Ho Chi Minh City
3. Capital of Myanmar: _________________ C. Karachi
4. Formerly known as Saigon: ___________ D. Medan
5. Former capital of Pakistan: ____________ E. Yangon
F. Hanoi
G. Surabaya


Another way to decrease the possibility of guessing is to allow responses to be
used more than once. Directions to the students should be very clear about the use
of responses.

Some psychometricians suggest giving no more than five to eight premises (Column A) in one set. For each premise, the student has to read through
the entire list of responses (or those still unused) to find the matching response.
For this reason, the shorter elements should be in Column B, rather than Column
A, to minimise the amount of reading needed for each item. Responses (Column
B) should be listed in logical order if there is one (chronological, by size and so on).
If there is no apparent order, the responses should be listed alphabetically.
Premises (Column A) should not be listed in the same order as the responses. Care
must be taken to ensure that the association keyed as the correct response is
unquestionably correct and that the numbered item could not be rightly associated
with any other choice.
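The conventions above (numbered premises, lettered responses, more responses than premises) also make matching items easy to score mechanically. The sketch below uses the pairings implied by Example 4.14; the helper name is illustrative, not from the module:

```python
# Key for Example 4.14: premise number -> correct response letter.
key = {1: "A", 2: "D", 3: "E", 4: "B", 5: "C"}
responses = ["A", "B", "C", "D", "E", "F", "G"]  # Column B

# More responses than premises reduces guessing by elimination.
assert len(responses) > len(key)

def score_matching(answers, key):
    """Count premises whose chosen letter matches the key."""
    return sum(1 for premise, letter in answers.items()
               if key.get(premise) == letter)

# A student who confuses Yangon (E) with Hanoi (F) on premise 3:
student = {1: "A", 2: "D", 3: "F", 4: "B", 5: "C"}
print(score_matching(student, key))  # 4 out of 5
```

Because each response letter appears in the key at most once here, the same scorer also works when directions allow responses to be reused; only the key changes.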

4.4.2 Advantages of Matching Questions

Like other types of assessments, there are advantages and disadvantages to
matching questions as well. Let us go through the advantages first.

(a) Matching questions are particularly good at assessing a student's
understanding of relationships. They can test recall by requiring a student to
match the following elements (McBeath, 1992):

(i) Definitions – Terms;

(ii) Historical events – Dates;

(iii) Achievements – People;

(iv) Statements – Postulates; and

(v) Descriptions – Principles.

(b) They can also assess a student's ability to apply knowledge by requiring a
test-taker to match the following:

(i) Examples – Terms;

(ii) Functions – Parts;

(iii) Classifications – Structures;


(iv) Applications – Postulates; and
(v) Problems – Principles.

(c) The matching question format is really a variation of the multiple-choice format.
If you find that you are writing MCQs which share the same answer choices,
you may consider grouping the questions into a matching item.

(d) Matching questions are generally easy to write and score when the content and objectives being tested are suitable for this format.

(e) Matching questions are highly efficient as a large amount of knowledge can
be sampled in a short amount of time.

4.4.3 Limitations of Matching Questions

There are also limitations when using this type of assessment, such as:
(a) Matching questions are limited to material that can be listed in two columns, and there may not be much material that lends itself to such a format;
(b) If there are four items in a matching question and the student knows the answer for three of them, the fourth item is a giveaway through elimination;
(c) Difficult to differentiate between effective and ineffective items;
(d) Often leads to testing of trivial facts or bits of information; and
(e) Often criticised for encouraging rote memorisation.

4.4.4 Suggestions for Constructing Good Matching
Questions

When assessing students, we must prepare quality questions. Here are some
suggestions for constructing good matching questions:
(a) Provide clear directions. They should explain how many times responses can be used;
(b) Keep the information in each column as homogeneous as possible;
(c) Include more responses than premises or allow the responses to be used more than once;
(d) Put the items with more words in Column A;


(e) Correct answers should not be obvious to those who do not know the content
being taught;

(f) There should not be keywords appearing in both the premise and response,
providing clues to the correct answer; and

(g) All of the responses and premises for a matching item should appear on the
same page.

SELF-CHECK 4.3

1. What are some advantages of matching questions?
2. List some limitations of matching questions.

4.5 SHORT-ANSWER QUESTIONS

A short-answer question is basically a supply-type item. It exists in two formats,
namely direct question and completion question formats. The following are
examples of short-answer questions (refer to Table 4.1):

Table 4.1: Direct Question versus Completion Question

Direct Question:
- Who was the first Prime Minister of Malaysia? (Answer: Tunku Abdul Rahman)
- What is the value of x in the equation 2x + 5 = 9? (Answer: 2)

Completion Question:
- The first Prime Minister of Malaysia was ___________. (Answer: Tunku Abdul Rahman)
- In the equation 2x + 5 = 9, x = ___________. (Answer: 2)

You may refer to Nitko (2004) for more examples.
