
HMEF5053 Measurement and Evaluation in Education_vDec19

Published by nur adila, 2020-08-12 20:42:13


TOPIC 4 HOW TO ASSESS? – OBJECTIVE TESTS

4.5.1 Strengths and Weaknesses of Short-answer Questions

Short-answer questions are generally used to measure simple learning
outcomes. They are used almost exclusively to measure memorised information
(except for learning outcomes on problem-solving in Mathematics and Science).
This is partly why the short-answer question is one of the easiest item types
to construct.

Another strength of short-answer questions is that they reduce the guessing
that often occurs with selection-type items. Learners must supply the correct
answer when they respond to the question. They must either recall the
information asked for or make the necessary computations to obtain the answer.
They cannot rely on partial knowledge to choose the correct answer from a list
of alternatives.

Many short-answer questions can be answered in a given period of time. A test
paper of short-answer questions can thus cover a fairly wide range of the
course content to be assessed. This enhances the content validity of the test.

One major weakness of short-answer questions is that they cannot be used to
measure complex learning outcomes such as organising ideas, presenting an
argument or evaluating information. Learners are simply required to provide a
word, phrase or symbol.

Scoring of answers to short-answer questions can also pose a problem. Unless
the question is carefully phrased, learners can provide answers of varying
degrees of correctness. For example, the answer to a question such as “When was
Malaysia formed?” could be either “In 1963” or “On 16 September 1963”. The
teacher has to decide whether learners who give the partial answer have the
same level of knowledge as those who provide the complete answer.

Besides, learners’ answers can also be contaminated by spelling errors. If
spelling is taken into consideration, the test scores of learners will reflect
their spelling ability as well as their knowledge of the content assessed. If
spelling is not considered in the scoring, the teacher has to decide whether
the misspelled word actually represents the correct answer.
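
The spelling decision described above can be made explicit before marking begins. The following is a minimal sketch, not a prescribed procedure. It assumes a hypothetical answer key and a hypothetical similarity threshold of 0.8, and it uses Python's standard difflib module to separate exact matches from near-miss spellings that the teacher should still review by hand.

```python
from difflib import SequenceMatcher

# Hypothetical threshold: how similar a response must be to the key
# before it is flagged as a probable misspelling rather than a wrong answer.
REVIEW_THRESHOLD = 0.8


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def classify_response(response: str, accepted: list[str]) -> str:
    """Return 'correct', 'review' (likely misspelling) or 'incorrect'."""
    cleaned = normalise(response)
    keys = [normalise(k) for k in accepted]
    if cleaned in keys:
        return "correct"
    # Highest similarity between the response and any accepted answer.
    best = max(SequenceMatcher(None, cleaned, k).ratio() for k in keys)
    return "review" if best >= REVIEW_THRESHOLD else "incorrect"
```

With the key ["Kuala Lumpur"], the response "Kuala Lumpor" is classified as "review", so the teacher still makes the final judgement on whether the misspelled word represents the correct answer. The threshold is an assumption and would need to be tuned for the subject and the length of the expected answers.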

Copyright © Open University Malaysia (OUM)


4.5.2 Guidelines on Constructing Short-answer Questions

Although short-answer questions are comparatively easier to construct than
other types of objective items, they are prone to a variety of defects which
should be avoided to ensure that they function as intended. The following are
some guidelines for the construction of short-answer questions.

(a) Word the question so that the intended answer is brief and specific. As far as
possible, the question should be phrased in such a way that only one answer
is correct (see Example 4.15).

Example 4.15:

Poor Item:
An animal that eats the flesh of other animals is ____________.
(Possible answers: a wolf, a lion, hungry, etc.)

Better Item:
An animal that eats the flesh of other animals is classified as ______________.
(One specific answer: carnivorous)

(b) Use direct questions instead of incomplete statements. The meaning of the
items is often clearer if they are phrased as direct questions (see
Example 4.16).

Example 4.16:

Poor Item:
The author of Alice in Wonderland was ______________.
(Possible answers: a story writer, a mathematician, an Englishman, buried in
1898)

Better Item:
What is the pen name of the author of Alice in Wonderland?
(Answer: Lewis Carroll)


(c) If the question requires a numerical answer, indicate the units in which the
answer is to be expressed (see Example 4.17).

Example 4.17:

Poor Item:
When did Columbus discover America?
(Possible answers: the 15th century, 1492)

Better Item:
In what year did Columbus discover America?
(Answer: 1492)

(d) For an incomplete statement type of question, put the blank towards the end
of the sentence (see Example 4.18).

Example 4.18:

Poor Item:
____________ is the capital of Malaysia.

Better Item:
The capital of Malaysia is _____________.

(Answer: Kuala Lumpur)

(e) For an incomplete statement type of question, limit blanks to one or two. If
there are more than two blanks in a statement, the question becomes
unintelligible or ambiguous (see Example 4.19).

Example 4.19:

Poor Item:
_________ and __________ are two methods of scoring _________.

Better Item:
Two different methods of scoring essay tests are the __________ and
___________ methods.

(Answers: analytic, holistic)


(f) Avoid irrelevant clues (see Example 4.20).

Example 4.20:

Poor Item:
A specialist in urban planning is called an ___________.
(Answer: urbanist)

Better Item:
A specialist in city planning is called a(n) ____________.
(Answer: urbanist)

(g) Do not copy statements verbatim from textbooks. When you copy material,
you encourage students to do rote memorisation.

(h) A completion item should omit important words, not trivial words. Use the
item to assess a student’s knowledge of an important fact or concept.

(i) Keep all the blanks of completion items the same length so as not to cue the
students to the possible answer.

SELF-CHECK 4.4

1. What are the strengths of short-answer questions?
2. Elaborate on some weaknesses of short-answer questions.

ACTIVITY 4.2

1. Select five true-false questions in your subject area and analyse
each item using the guidelines mentioned earlier.

2. Select five matching questions in your subject area and analyse
each item using the guidelines mentioned earlier.

3. Suggest how you would improve the weak items for each type of
question.

Share your answers with your coursemates in the myINSPIRE online
forum.

• An objective test is a written test consisting of items or questions which
require the respondent to select from a list of possible answers. An objective
item or question is “accurate” because it cannot be influenced by the personal
preferences and prejudices of the marker.

• Objective tests vary depending on how the questions are presented. The three
common types of questions used in most objective tests are multiple-choice
questions, matching questions and true-false questions.

• Multiple-choice questions have two parts: a stem that contains the question,
and three, four or five options with one of them containing the correct answer.
The correct option is called the key response and incorrect options are called
distractors.

• Multiple-choice questions are widely used because they can measure learning
outcomes from simple to complex. They are highly structured and clear tasks
are provided to test a broad sample of what has been learnt.

• Multiple-choice questions, however, are difficult to construct, tend to
measure low-level learning outcomes, lend themselves to guessing and do not
measure students’ writing ability.

• True-false questions are those in which a statement is presented and the
student indicates whether the statement is true or false.

• True-false questions can be written quickly and are easy to score. Since they
can be objectively scored, the scores are more reliable than for items that are
at least partially dependent on the teacher’s judgement.

• Avoid lifting statements directly from assigned readings, notes or other
course materials so that recall alone will not lead to a correct answer.

• Matching questions are used in measuring a student’s ability to identify the
relationship between two lists of terms, phrases, statements, definitions,
dates, events, people and so forth.

• To reduce the possibility of guessing correct answers in matching questions,
list a larger number of responses than premises and allow responses to be used
more than once.

• In writing test items, you must consider the length of the test or
examination as well as the reading level of your students.

• The two types of short-answer questions are direct questions and completion
questions.

Allotment of time
Alternatives
Distractors
Guessing
Matching questions
Multiple-choice questions
Objective tests
Premises
Responses
Short-answer questions
Stem
True-false questions



Topic 5 How to Assess? – Essay Tests

LEARNING OUTCOMES

By the end of the topic, you should be able to:
1. Define and list the criteria for an essay question;
2. Explain the formats of essay tests;
3. List the advantages and limitations of essay questions;
4. Construct well-written essay questions that assess the given learning
outcomes; and
5. Describe different types of marking schemes for essays.

INTRODUCTION

In Topic 4, we discussed in detail the use of objective tests in assessing students.
In this topic, we will examine a different type of test called the essay test. The essay
test is a popular technique for assessing learning and is used extensively at all
levels of education.

It is also widely used in assessing learning outcomes in business and professional
examinations. Essay questions are used because they challenge students to create
their own responses rather than simply selecting a response. Essay questions have
the potential to reveal students’ abilities to reason, create, analyse and synthesise,
which may not be effectively assessed using objective tests.


5.1 WHAT IS AN ESSAY QUESTION?

According to Stalnaker (1951), an essay question is “a test item which requires
a response composed by the examinee usually in the form of one or more
sentences of a nature that no single response or pattern of responses can be
listed as correct, and the accuracy and quality of which can be judged
subjectively only by one skilled or informed in the subject.” Although this
definition was provided a long time ago, it remains comprehensive. Elaborating
on it, Reiner, Bothell, Sudweeks and Wood (2002) argued that to qualify as an
essay question, an item should meet the following four criteria:

(a) The learner has to compose rather than select his or her response or answer.
In essay questions, students have to construct their own answer and decide
on what material to include in their response. Objective test questions (MCQ,
true-false, matching) on the other hand, require students to select the answer
from a list of possibilities.

(b) The response or answer the learner provides will consist of one or more
sentences. Students do not respond with a “yes” or “no” but instead have to
respond in the form of sentences. In theory, there is no limit to the length of
the answer. However, in most cases, its length is predetermined by the
demand of the question and the time limit allotted for the test question.

(c) There is no one single correct response or answer. In other words, the
question should be composed so that it does not ask for one single correct
response. For example, the question “Who killed JWW Birch?” assesses
verbatim recall or memory and not the ability to think. Hence, it cannot
qualify as an essay question. You can modify the question to “Who killed JWW
Birch? Explain the factors that led to the killing.” Now, this is an essay
question that assesses students’ ability to think and give reasons for the
killing supported with relevant evidence.

(d) The accuracy and quality of students’ responses or answers to essay
questions must be judged subjectively by a specialist in the subject. The
nature of essay questions is such that only specialists in the subject can judge
to what degree responses (or answers) to an essay question are complete,
accurate and relevant. Good essay questions encourage students to think
deeply and to produce answers that can be judged only by someone with
appropriate experience and expertise in the content area. Thus, content
expertise is essential for both writing and grading essay tests. By contrast,
the question “List three reasons for the opening of Penang by the British in
1786” requires students to recall a set list of items. The person marking or
grading the answer does not have to be a subject matter expert to know
whether the student has listed the three reasons correctly as long as the list
of three reasons is available as an answer key. For the question “To what
extent is commerce the main reason for the opening of Penang by the British
in 1786?”, a subject matter expert is needed to grade or mark the answer.

5.2 FORMATS OF ESSAY TESTS

Essay formats are usually classified into two groups: restricted response essay
questions and extended response essay questions. Both types are useful tools but
for different purposes.

(a) Restricted Response Essay Questions
Restricted response essay questions restrict or limit both the content and the
form of students’ answers. The following are three examples:

(i) Discuss two advantages and two disadvantages of essay questions in
measuring students’ performance.

(ii) List five guidelines for writing good essay items. For each guideline,
write a short statement explaining why it is useful in improving the
validity of essay assessment.

(iii) Distinguish the formative assessment from the summative assessment
in terms of their aims, the timing of the implementation and the content
coverage.

As shown in the examples, students are told specifically what they should
respond to and how. The questions indicate the number of points required
and/or the scope of the responses. Students’ responses can also be restricted
by including interpretative material (e.g. a graph, a paragraph describing a
particular problem or an extract from a literary work) and asking students to
respond to one or two questions based on it.

The restricted response questions are more structured and are useful for
measuring learning outcomes requiring the interpretation and application of
knowledge in a specific area. They narrow the focus of the assessment task
to a specific and well-defined performance. The nature of these questions
makes it more likely that the students will interpret each question the way it
is intended. The teacher is also in a better position to assess the correctness
of students’ answers when a question is focused and all students interpret it
in the same way. When the teacher is clear about what makes up correct
answers, scoring reliability and the validity of the scores improve. Although
restricting students’ responses makes it possible to measure more specific
learning outcomes, these same restrictions make the questions less valuable as
a measure of those learning outcomes emphasising integration, organisation
and originality. For higher-order learning outcomes, greater freedom of
response is needed.

(b) Extended Response Essay Questions
Extended response essay questions provide less structure and this promotes
greater creativity, integration and organisation of material. The following are
three examples:

(i) Examine to what extent essay questions are effective in measuring
students’ performance.

(ii) Evaluate the usefulness of multiple-choice questions as an assessment
tool in education.

(iii) “Research without theory is blind.” Discuss.

In responding to extended response essay questions, students are free to select
any information that they think pertinent, to organise the answer in accordance
with their best judgement, and to integrate and evaluate ideas as they deem
appropriate. This freedom enables them to demonstrate their ability to analyse
problems, organise their ideas, describe in their own words, and/or develop a
coherent argument. The extended response essay questions are therefore useful
in assessing higher-order thinking skills. They can also be used to assess
writing skills.

The freedom for students to respond to extended response essay questions can
cause some problems. First, there is usually no single correct answer to the
question. Students are free to choose the way to respond, and the degree of
correctness or merit of their answers can only be judged by a skilled subject-matter
expert. A large number of examiners is required if the assessment involves a big
student population. Inter-rater reliability in scoring can be an issue. Second, the
same freedom that enables the demonstration of creative expression and other
higher-order thinking skills makes the extended response essay question
inefficient for measuring more specific learning outcomes. Third, the extended
response essay questions require good writing skills on the part of the students.
This type of question is thus disadvantageous to students whose writing skills are
poor. Due to these limitations, it is often recommended that more restricted
response essay questions be used in place of extended response essay questions.


ACTIVITY 5.1

Select a few essay questions that have been used in tests or examinations.
To what extent do these questions meet the criteria of an essay question
as defined by Stalnaker (1951) and elaborated by Reiner et al. (2002)?

Discuss with your coursemates in the myINSPIRE online forum.

5.3 ADVANTAGES OF ESSAY QUESTIONS

Essay questions are used to assess learning because of the following reasons:

(a) Essay questions provide an effective way of assessing complex learning
outcomes. They allow one to assess students’ ability to synthesise, organise
and express ideas, and evaluate the worth of ideas. These abilities cannot be
effectively assessed directly with other paper-and-pencil test items.

(b) Essay questions allow students to demonstrate their reasoning. These
questions not only allow students to present an answer to a question but also
to explain how they have arrived at their conclusions. This allows teachers
to gain insight into a student’s way of viewing and solving problems. With
such insight, teachers can detect problems which students may have with
their reasoning process and help them overcome these problems.

(c) Essay questions provide authentic experiences. Constructing responses is
closer to real life than selecting responses as in the case of objective tests.
Problem solving and decision making are vital life competencies which
require the ability to construct a solution or decision rather than selecting a
solution or decision from a limited set of possibilities. In the work
environment, it is unlikely that an employer will give a list of „four options‰
for a worker to choose from when the latter is asked to solve a problem. In
most cases, the worker will be required to construct a response.


5.4 DECIDING WHETHER TO USE ESSAY QUESTIONS OR OBJECTIVE QUESTIONS

Keep in mind that essay questions should target higher-order thinking skills.
Even so, deciding whether to use essay questions or objective questions in
examinations can be problematic for some educators. In such a situation, one
has to go back to the objectives of assessment. What kinds of learning outcomes
do you intend to assess? Essay questions are generally suitable for assessing:
(a) Students’ understanding of subject matter or content; and
(b) Thinking skills that require more than simple verbatim recall of
information, by challenging the students to reason with their knowledge.

It is challenging to write test items to tap into higher-order thinking. However,
students’ understanding of subject matter or content, and many of the other
higher-order thinking skills, can also be assessed through objective items. When in
doubt about whether to use an essay question or an objective question, just
remember that essay questions are used to assess students’ ability to construct
rather than select answers.

To determine what type of test (essay or objective) to use, it is helpful to
examine the verb(s) that best describe the desired ability to be assessed
(refer to Topic 2).

These verbs indicate what students are expected to do and how they should
respond. They serve to focus students’ responses and channel them towards
the performance of specific tasks. Some verbs clearly indicate that students
need to construct rather than select their answer (such as to explain). Other
verbs indicate that the intended learning outcome is focused on students’
ability to recall information (such as to list). Recall is perhaps best
assessed through objectively scored items. Verbs that test for understanding of
subject matter or content or other forms of higher-order thinking, but do not
specify whether the student is to construct or select the response (such as to
interpret), can be assessed either by essay questions or objective items.


ACTIVITY 5.2

Consider the following verbs: compare, explain, arrange, apply, state,
classify, design, illustrate, describe, name, complete, choose and defend.
Decide which of the verbs in the list are best assessed by essay questions, by
objective tests, or by both objective and essay questions.

Post your answer on the myINSPIRE online forum.

5.5 LIMITATIONS OF ESSAY QUESTIONS

While essay questions are popular because they enable the assessment of higher-
order learning outcomes, this format of evaluating students in examinations has a
number of limitations which should be kept in mind.

(a) One purpose of testing is to assess a student’s mastery of subject matter. In
most cases, it is not possible to assess the student’s mastery of the complete
subject matter domain with just a few questions. Because of the time it takes
for students to respond to essay questions and for markers to mark students’
responses, the number of essay questions that can be included in a test is
limited. Therefore, using essay questions will limit the degree to which the
test is representative of the subject matter domain, thereby reducing content
validity. For instance, a test of 80 multiple-choice questions will most likely
cover more of the content domain than a test of three to four essay questions.

(b) Essay questions have limitations in reliability. While essay questions allow
students some flexibility in formulating their responses, the reliability of
marking or grading is questionable. Different markers or graders may vary
in their marking or grading of the same or similar responses (inter-scorer
reliability) and one marker can vary significantly in his or her marking or
grading consistency across questions depending on many factors (intra-
scorer reliability). Therefore, essay answers of similar quality may receive
notably different scores. Characteristics of the learner, length and legibility
of responses, and personal preferences of the marker or grader with regard
to the content and structure of the response are some of the factors that may
lead to unreliable marking or grading.


(c) Essay questions require more time for marking student responses. Teachers
need to invest a large amount of time to read and mark students’ responses
to essay questions. On the other hand, relatively little or no time is required
for teachers to score objective test items like multiple-choice items and
matching exercises.

(d) As mentioned earlier, one of the strengths of essay questions is that they
provide students with authentic experiences because students are challenged
to construct rather than select their responses. To what extent does the short
time normally allotted to a test affect student responses? Students have
relatively little time to construct their responses and this time limit does not
allow them to give appropriate attention to the complex process of
organising, writing and reviewing their responses. In fact, in responding to
essay questions, students use a writing process that is quite different from
the typical process that produces excellent writing (draft, review, revise and
evaluate). In addition, students usually have no resources to aid their writing
when answering essay questions (such as a dictionary or thesaurus). This
disadvantage may offset whatever advantage accrues from the fact that
responses to essay questions are more authentic than responses to
multiple-choice items.
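
Limitation (b) above notes that different markers may grade the same responses differently. One common way to quantify inter-scorer reliability is to have two markers grade the same scripts and then compute Cohen's kappa, which corrects their raw agreement for agreement expected by chance. The sketch below is illustrative only: the function name is hypothetical and the grades are invented for the example.

```python
from collections import Counter


def cohen_kappa(marker_a: list[str], marker_b: list[str]) -> float:
    """Chance-corrected agreement between two markers on the same scripts."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("Both markers must grade the same non-empty set of scripts.")
    n = len(marker_a)
    # Observed agreement: proportion of scripts given identical grades.
    observed = sum(a == b for a, b in zip(marker_a, marker_b)) / n
    # Expected agreement if each marker assigned grades independently
    # at his or her own observed rates.
    counts_a, counts_b = Counter(marker_a), Counter(marker_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both markers used one identical grade throughout
    return (observed - expected) / (1 - expected)


# Invented grades for six essay scripts (illustration only).
grades_a = ["A", "B", "B", "C", "A", "C"]
grades_b = ["A", "B", "C", "C", "B", "C"]
kappa = cohen_kappa(grades_a, grades_b)
```

In this invented example the two markers agree on four of the six scripts, and kappa works out to about 0.5. A value of 1 would indicate perfect agreement and a value near 0 would indicate agreement no better than chance. The same calculation applied to one marker's first and second marking of the same scripts gives a rough check on intra-scorer reliability.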

5.6 MISCONCEPTIONS ABOUT ESSAY QUESTIONS IN EXAMINATIONS

Other than the limitations of essay questions discussed earlier, there are also some
misconceptions about this form of assessment. These misconceptions are:

(a) By Their Very Nature, Essay Questions Assess Higher-order Thinking
Whether or not an essay item assesses higher-order thinking depends on the
design of the question and how students’ responses are scored. Not all essay
questions can assess higher-order thinking skills. Indeed, it is possible to
write essay questions that simply assess recall. Also, if a teacher designs an
essay question meant to assess higher-order thinking but then scores
students’ responses in a way that only rewards recall ability, that teacher is
not assessing higher-order thinking. Therefore, teachers must be well-trained
to design and write higher-order thinking questions.


(b) Essay Questions are Easy to Construct
Essay questions are easier to construct than multiple-choice items because
teachers do not have to create effective distractors. However, that does not
mean that good essay questions are easy to construct. They may be easier to
construct in a relative sense, but they still require a lot of effort and time.
Essay questions that are hastily constructed without much thought and
review usually function poorly.

(c) The Use of Essay Questions Eliminates the Problem of Guessing
One of the drawbacks of objective test items is that students sometimes
get the right answer by guessing which of the presented options is correct.
This problem does not exist with essay questions because students need
to generate the answer rather than identifying it from a set of options
provided. At the same time, the use of essay questions introduces bluffing,
another form of guessing. Some students are “good” at using various
methods of bluffing (vague generalities, padding, name-dropping) to add
credibility to an otherwise weak answer. Thus, the use of essay questions
changes the nature of the guessing that occurs, but does not eliminate it.

(d) Essay Questions Benefit All Students by Placing Emphasis on the Importance
of Written Communication Skills
Written communication is a life competency that is required for effective and
successful performance in many vocations. Essay questions challenge
students to organise and express subject matter and problem solutions in
their own words, thereby giving them a chance to practise written
communication skills that will be helpful to them in future vocational
responsibilities. At the same time, the focus on written communication skills
is also a serious disadvantage for students who have marginal writing skills
but know the subject matter being assessed. If students who are
knowledgeable in the subject obtain low scores because of their inability to
write well, the validity of the test scores will be diminished.

(e) Essay Questions Encourage Students to Prepare More Thoroughly
Some research seems to indicate that students are more thorough in
their preparation for examinations using essay questions than in their
preparation for objective examinations such as those using multiple-choice
questions. However, after an extensive review of existing literature and
research on this topic, Crooks (1988) concluded that students’ extent of
preparation is based more on the expectations teachers set upon them
(higher-order thinking and breadth and depth of content) than the type of
test questions they expect to be given in examinations.


SELF-CHECK 5.1

1. What are some limitations in the use of essay questions?
2. List some of the misconceptions about essay questions.

ACTIVITY 5.3

Compare the following two essay questions and decide which one
assesses higher-order thinking skills.
(a) “What are the major advantages and limitations of solar energy?”
(b) “Given its advantages and limitations, should governments spend
money developing solar energy?”

Post your answer on the myINSPIRE online forum.

5.7 GUIDELINES ON CONSTRUCTING ESSAY QUESTIONS

When constructing essay questions, whether they are for coursework assessments
or examinations, the most important thing is to ensure that students have a clear
idea of what they are expected to do after they have read the question or problem
presented.

Here are specific guidelines that can help you improve existing essay questions
and create new ones.

(a) Clearly Define the Intended Learning Outcome to be Assessed by the
Question
Knowing the intended learning outcome is crucial for designing essay
questions. In specifying the intended learning outcome, teachers clarify the
performance that students should be able to demonstrate as a result of what
they have learnt. The intended learning outcome typically begins with a verb
that describes an observable behaviour or action that students should
demonstrate. The focus is on what students should and should not be able to
do in the learning or teaching process. Reviewing a list of verbs can help to
clarify what ability students should demonstrate, thereby defining the
intended learning outcome to be assessed (refer to subtopic 4.8).


(b) Avoid Using Essay Questions for Intended Learning Outcomes that are
Better Assessed with Other Kinds of Assessment
Some types of learning outcomes can be more efficiently and more reliably
assessed with objective tests than with essay questions. Since essay questions
sample a limited range of subject matter or content, are more time-
consuming to score and involve greater subjectivity in scoring, the use of
essay questions should be reserved for learning outcomes that cannot be
better assessed by some other means. Let us look at Example 5.1.

Example 5.1:
Learning Outcome:
To be able to differentiate the reproductive habits of birds and amphibians.

Essay Question:
What are the differences in egg laying characteristics between birds and
amphibians?

Note: This learning outcome can be better assessed by an objective test.

Objective Item:
Which of the following differences between birds and amphibians is correct?

     Birds                       Amphibians
A    Lay a few eggs at a time    Lay many eggs at a time
B    Lay eggs                    Give birth
C    Do not incubate eggs        Incubate eggs
D    Lay eggs in nest            Lay eggs on land

(c) Clarity About the Task and Scope
Essay questions have two variable elements – the degree to which the task is
structured and the degree to which the scope of the content is focused. There
is still confusion among educators as to whether more structure (of the task
required) and more focus (on the content) are better than less structure and
less focus. When the task is more structured and the scope of content is more
focused, two problems are reduced:

(i) The problem of student responses containing ideas that were not meant
to be assessed; and

(ii) The problem of extreme subjectivity when scoring student answers or
responses.

Although more structure helps to avoid these problems, how much and what
kind of structure and focus to provide are dependent on the intended
learning outcome that is to be assessed by the essay question. The process of
writing effective essay questions involves defining the task and delimiting
the scope of the content in an effort to create an effective question that is
aligned with the intended learning outcome to be assessed by it (as
illustrated in Figure 5.1).

Figure 5.1: Alignment between content, learning activities
and assessment tasks

Source: Phillips, Ansary Ahmed and Kuldip Kaur (2005)
This alignment is absolutely necessary for obtaining students’ responses that
can be accepted as evidence that a student has achieved the intended learning
outcome. Hence, the essay question must be carefully and thoughtfully
written in such a way that it elicits student responses that provide the teacher
with valid and reliable evidence about the students’ achievement of the
intended learning outcome. Failure to establish adequate and effective limits
for students’ answers to the question may result in students setting their own
boundaries for their responses. This means that students might provide
answers that are outside the intended task or address only a part of the
intended task. If this happens, then the teacher is left with unreliable and
invalid information about the students’ achievement of the intended learning
outcome. Also, there is no basis for marking or grading students’ answers.
Therefore, it is the responsibility of the teacher to write essay questions in
such a way that they provide students with clear boundaries for their
answers or responses. Let us look at Example 5.2.

Example 5.2: Improving Clarity of Task and Scope of Essay Questions

Weak Essay Question:
Evaluate the impact of the Industrial Revolution on England.

The verb is “evaluate”, which is the task the student is supposed to do. The
scope of the question is the impact of the Industrial Revolution on England.
Very little guidance is given to students about the task of evaluating and the
scope of the task. A student reading the question may ask:

(i) The impact on what in England? The economy? Foreign trade? A
particular group of people? (The scope is not clear.)

(ii) Evaluate based on what criteria? The significance of the revolution? The
quality of life in England? Progress in technological advancements?
(The task is not clear.)

(iii) What exactly do you want me to do in my evaluation? (The task is not
clear.)

Improved Essay Question:
Evaluate the impact of the Industrial Revolution on the quality of family life
in England. Explain whether families were able to provide for the education
of their children.

The improved question determines the task for students by specifying a
particular unit of society in England affected by the Industrial Revolution
(family). The task is also determined by giving students a criterion for
evaluating the impact of the Industrial Revolution (whether or not families
were able to provide for their children’s education). Students are clearer
about what must be done to “evaluate”. They need to explain how family life
has changed and judge whether or not the changes are an improvement for
the children.

SELF-CHECK 5.2

1. When would you decide to use an objective item rather than an
essay question to assess learning?

2. What is the difference between the task and the scope of an essay
question?

(d) Questions that are Fair
One of the challenges that teachers face in composing essay questions is that
because of their extensive experience with the subject matter, they may be
tempted to demand unreasonable content expertise on the part of the
students. Hence, teachers need to make sure that their students can “be
expected to have adequate material with which to answer the question”
(Stalnaker, 1951). In addition, teachers should ask themselves if students can
be expected to adequately perform the thought processes which are required
of them in the task. For assessment to be fair, teachers need to provide their
students with sufficient instruction and practice in the subject matter
required for the thought processes to be assessed.

Another important element is to avoid using indeterminate questions. A
question is indeterminate if it is so unstructured that students can redefine
the problem and focus on some aspect of it with which they are thoroughly
familiar or if experts in the subject matter cannot agree that one answer is
better than another. One way to avoid indeterminate questions is to stay
away from vocabulary that is ambiguous. For example, teachers should
avoid using the verb “discuss” in an essay question. This verb is simply too
broad and vague. Moreover, teachers should also avoid including
vocabulary that is too advanced for students.

(e) Specify the Approximate Time Limit and Marks Allotted to Each Question
Specifying the approximate time limit helps students allocate their time in
answering several essay questions. Without such guidelines, students may
feel at a loss as to how much time to spend on a question. When deciding the
guidelines for how much time should be spent on a question, keep the slower
students and students with certain disabilities in mind. Also make sure that
students can be realistically expected to provide an adequate answer in the
given and/or suggested time. Similarly, state the marks allotted to each
question so that students can estimate how much they should write to
answer the question.

(f) Use Several Relatively Short Essay Questions Rather than One Long
Question
Only a very limited number of essay questions can be included in a test
because of the time it takes for students to respond to them and the time it
takes for teachers to grade the students’ responses. This creates a challenge
with regard to designing valid essay questions. Shorter essay questions are
better suited to assess the depth of student learning within a subject, whereas
longer essay questions are better suited to assess the breadth of student
learning within a subject. Hence, there is a trade-off when choosing between
several short essay questions and one long question. Focusing on assessing the
depth of student learning within a subject limits the assessment of the
breadth of student learning within the same subject. Meanwhile, focusing on
assessing the breadth of student learning within a subject limits the
assessment of the depth of student learning within the same subject. When
choosing between using several short essay questions and a long question, also
keep in mind that short essays are generally easier to mark than long essays.

(g) Avoid the Use of Optional Questions
Students should not be permitted to choose one essay question to answer
from two or more optional questions. The use of optional questions should
be avoided for the following reasons:
(i) Students may waste time deciding on an option; and
(ii) Some questions are likely to be harder than others, which could make the
comparative assessment of students’ abilities unfair.

The issue of the use of optional questions is debatable. It is often practised,
especially in higher education and students often demand that they be given
choices. The practice is acceptable if it can be assured that the questions have
equivalent difficulty levels and the tasks as well as the scope required by the
questions are equivalent.

Last but not least, let us improve the essay questions through preview and review.

Improving Essay Questions Through Preview and Review

The following steps can help you improve the essay item before and after you
administer it to your students.

PREVIEW (before handing out the essay question to the students)

Predict Students’ Responses
Try to respond to the question from the perspective of a typical student.
Evaluate whether students have the content knowledge and the skills
necessary to adequately respond to the question. After detecting possible
weaknesses of the essay questions, repair them before handing them out in the
exam.

Write a Model Answer
Before using a question, write model answer(s) or at least an outline of major
points that should be included in an answer. Writing the model answer allows
reflection on the clarity of the essay question. Furthermore, the model answer
serves as a basis for the grading of student responses. Once the model answer
has been written, compare its alignment with the question and the intended
learning outcome, and make changes as needed to assure that the intended
learning outcome, the question and the model answer are aligned with one
another.

Before using the question in a test, ask a knowledgeable person in the subject
to critically review the essay question, the model answer and the intended
learning outcome to determine how well they are aligned with each other.

REVIEW (after receiving the student responses)

Review Students’ Responses to the Essay Question
After students have answered the questions, carefully review the range of
answers given and the manner in which students seem to have interpreted the
question. Make revisions based on the findings. Writing good essay questions
is a process that requires time and practice. Carefully studying the students’
responses can help to evaluate students’ understanding of the question as well
as the effectiveness of the question in assessing the intended learning
outcomes.

In addition, you can use a checklist as shown in Figure 5.2 to check your essay
questions.

Figure 5.2: A checklist for writing essay questions

SELF-CHECK 5.3

1. Why should you specify the time allotted for answering each
question?

2. Why should you avoid optional questions?

3. What is meant when it is said that questions should be “fair”?

4. What should you do before and after administering a test?

5.8 VERBS DESCRIBING VARIOUS KINDS OF
MENTAL TASKS

Using the list suggested by Moss and Holder (1988), and Anderson and Krathwohl
(2001), Reiner et al. (2002) proposed the following list of verbs that describe mental
tasks to be performed (refer to Table 5.1).

Table 5.1: Verbs, Definitions and Examples

Verb: Analyse
Definition: Break material into its constituent parts and determine how the
parts relate to one another and to an overall structure or purpose.
Example: Analyse the meaning of the line “He saw a dead crow, in a drain, near
the post office” in the poem The Dead Crow.

Verb: Apply
Definition: Decide which abstractions (concepts, principles, rules, laws,
theories, generalisations) are relevant in a problem situation.
Example: Apply the principles of supply and demand to explain why the
consumer price index (CPI) in Malaysia has increased in the last three months.

Verb: Attribute
Definition: Determine a point of view, bias, value or intent underlying the
presented material.
Example: Determine the point of view of the author in the article about her
political perspective.

Verb: Classify
Definition: Determine the category to which something belongs.
Example: Classify the organisms into vertebrates and invertebrates.

Verb: Compare
Definition: Identify and describe points of similarity.
Example: Compare the role of the Dewan Rakyat and Dewan Negara.

Verb: Compose
Definition: Make or form by combining things, parts or elements.
Example: Compose an effective plan for solving flooding problems in Kuala
Lumpur.

Verb: Contrast
Definition: Bring out the points of difference.
Example: Contrast the contribution of Tun Hussein Onn and Tun Abdul Razak
Hussein to the political stability of Malaysia.

Verb: Create
Definition: Put elements together to form a coherent or functional whole;
reorganise elements into a new pattern or structure.
Example: Create a comprehensive solution for the traffic problems in Kuala
Lumpur.

Verb: Critique
Definition: Detect consistencies and inconsistencies between a product and
relevant external criteria; detect the appropriateness of a procedure for a
given problem.
Example: Judge which of the two methods is the better way of reducing high
absenteeism in the workplace.

Verb: Defend
Definition: Develop and present an argument to support a recommendation, to
maintain or revise a policy or programme, or to propose a course of action.
Example: Defend the government’s decision to raise fuel prices.

Verb: Define
Definition: Give the meaning of a word or concept; place it in the class to
which it belongs and distinguish it from other items in the same class.
Example: Define the term “chemical weathering”.

Verb: Describe
Definition: Give an account of; tell or depict in words; represent or
delineate by a word picture.
Example: Describe the contribution of Za’ba to the development of Bahasa
Melayu.

Verb: Design
Definition: Devise a procedure for accomplishing some task.
Example: Design an experiment to prove that 21 per cent of air is composed of
oxygen.

Verb: Differentiate
Definition: Distinguish relevant from irrelevant parts or important from
unimportant parts of presented material.
Example: Distinguish between supply and demand in determining price.

Verb: Explain
Definition: Make clear the cause or reason of something; construct a
cause-and-effect model of a system; tell “how” to do; tell the meaning of.
Example: Explain the causes of the First World War.

Verb: Evaluate
Definition: Make judgements based on criteria and standards; determine the
significance, value, quality or relevance of; give the good points and the bad
ones; identify and describe the advantages and limitations.
Example: Evaluate the contribution of the microchip in telecommunications.

Verb: Generate
Definition: Come up with alternative hypotheses, examples, solutions or
proposals based on criteria.
Example: Generate hypotheses to account for an observed phenomenon.

Verb: Identify
Definition: Recognise as being a particular person or thing.
Example: Identify the characteristics of the Mediterranean climate.

Verb: Illustrate
Definition: Use a word picture, a diagram, a chart or a concrete example to
clarify a point.
Example: Illustrate the use of catapults in the amphibious warfare of
Alexander.

Verb: Infer
Definition: Draw a logical conclusion from presented information.
Example: What can you infer happened in the experiment?

Verb: Interpret
Definition: Give the meaning of; change from one form of representation (such
as numerical) to another (such as verbal).
Example: Interpret the poetic line, “The sound of a cobweb snapping is the
noise of my life.”

Verb: Justify
Definition: Show good reasons for; give your evidence; present facts to
support your position.
Example: Justify the American entry into the Second World War.

Verb: List
Definition: Create a series of names or other items.
Example: List the major functions of the human heart.

Verb: Predict
Definition: Know or tell beforehand with precision of calculation, knowledge
or shrewd inference from facts or experience what will happen.
Example: Predict the outcome of a chemical reaction.

Verb: Propose
Definition: Offer for consideration, acceptance or action; suggest.
Example: Propose a solution for landslides along the North-South Highway.

Verb: Recognise
Definition: Locate knowledge in long-term memory that is consistent with
presented material.
Example: Recognise the important events in the road to independence in
Malaysia.

Verb: Recall
Definition: Retrieve relevant knowledge from long-term memory.
Example: Recall the dates of important events in Islamic history.

Verb: Summarise
Definition: Sum up; give the main points briefly.
Example: Summarise the ways in which man preserves food.

Verb: Trace
Definition: Follow the course of; follow the trail of; give a description of
progress.
Example: Trace the development of television in school instruction.

The definitions specify thought processes a person must perform to complete the
mental tasks. Note that this list is not exhaustive and local examples have been
introduced to illustrate the mental tasks required in each essay question.

ACTIVITY 5.4

Discuss the following with your coursemates in the myINSPIRE online
forum:
(a) Select some essay questions in your subject area and examine
whether the verbs used are similar to those in the list given in
Table 5.1. Do you think the tasks required by the verbs used are
appropriate? Justify.
(b) Do you think students are able to differentiate between the tasks
required in the verbs listed? Justify.
(c) Are teachers able to describe to students the tasks required by using
these verbs? Explain.

5.9 MARKING AN ESSAY

Marking or grading of essays is a notoriously unreliable activity. If we read an
essay at two different times, the chances are high that we will give the essay a
different grade each time. If two or more of us read the essay, our grades will likely
differ, often dramatically so. We all like to think we are exceptions, but study after
study of well-meaning and conscientious teachers shows that essay grading is
unreliable (Ebel, 1972; McKeachie, 1987). Eliminating the problem is unlikely, but
we can take steps to improve grading reliability. Using a scoring guide or marking
scheme helps control the shifting of standards that inevitably takes place as we read
a collection of essays and papers. The common types of marking scheme used in
scoring students’ responses to essay questions are diagrammatically presented as
follows (refer to Figure 5.3):

Figure 5.3: Types of marking scheme

A marking scheme may take the form of a checklist, a rubric or a combination of
both.

(a) Checklist
In a checklist, a score is awarded for every correct or relevant point in a
response. The sum of these individual scores provides the final score of the
response. Table 5.2 is an example of a checklist.

Table 5.2: Sample of a Checklist

Reference: Topic 5, Section 5.7, p. 74

Suggested answers: Strengths

• Essay questions provide an effective way of assessing complex
learning outcomes.

• Essay questions allow students to demonstrate their reasoning
and creativity.

• Essay questions provide authentic experiences because students
are given the opportunity to organise, write and review their
responses.

• Guessing is very much reduced.

(Accept any other appropriate answers.)

Marks allocation: Award 1 mark for each point. (1 mark × 4 = 4 marks)

This marking scheme can be used to assess students’ responses to an essay
question that asks for the strengths of essay questions as an assessment tool.
A checklist is easy to use. The teacher just needs to read through the student’s
response and count the number of valid points to calculate the marks. A
checklist is useful for assessing factual content and it is relatively easy to
construct. The teacher just needs to present a list of points required in the
response and decide on the marks for each point. However, a checklist with
a list of points does not provide for the assessment of intangible learning
outcomes such as “to discuss”, “to evaluate” or “to explain” and other
complexity levels of Bloom’s taxonomy. It also offers limited feedback for
formative purposes and students cannot use it as a guide for writing
assignments.
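
The checklist arithmetic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `score_checklist`, the short point labels and the cap of four creditable points are assumptions drawn from the sample in Table 5.2, not part of any official marking tool.

```python
def score_checklist(points_found, marks_per_point=1, max_points=4):
    """Sum checklist marks: one award per distinct valid point, capped.

    points_found    -- labels of the valid points the marker ticked
    marks_per_point -- marks awarded per point (Table 5.2 uses 1)
    max_points      -- most points that can earn credit (Table 5.2 uses 4)
    """
    distinct = set(points_found)  # a repeated point earns credit only once
    credited = min(len(distinct), max_points)
    return credited * marks_per_point


# Hypothetical labels for the points ticked in one student's response.
ticked = ["complex outcomes", "reasoning and creativity", "guessing reduced"]
print(score_checklist(ticked))  # prints 3
```

The cap reflects the "1 mark × 4 = 4 marks" allocation: a response that lists more than four acceptable strengths still earns at most 4 marks.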

(b) Rubric
The two most common approaches used in scoring rubrics are the holistic
and the analytic methods.

(i) Holistic Method (Global or Impressionistic Marking)
The holistic approach to scoring essay questions involves reading an
entire response and assigning it to a category identified by a score or
grade. This method involves considering the student’s answer as a
whole and judging the total quality of the answer relative to other
students’ responses or the total quality of the answer based on certain
criteria that have been developed.

Think of it as sorting into bins. You read the answer to a particular
question and assign it to the appropriate bin. The best answers go into
the “exemplary” bin, the good ones go into the “good” bin and the
weak answers go into the “poor” bin (refer to Table 5.3).

Table 5.3: Sample of a Marking Scheme Using the Holistic Method

Level of Achievement: 7–8 (Exemplary)
Descriptor:
• Addresses the question
• States a relevant argument
• Presents arguments in a logical order
• Uses acceptable style and grammar (no errors)

Level of Achievement: 5–6 (Good)
Descriptor:
• Combination of above traits, but less consistently represented (few errors)

Level of Achievement: 3–4 (Adequate)
Descriptor:
• Does not address the question explicitly, though does so tangentially
• States a somewhat relevant argument
• Presents some arguments in a logical order
• Uses adequate style and grammar (some errors)

Level of Achievement: 1–2 (Poor)
Descriptor:
• Does not address the question
• States no relevant arguments
• Is not clearly or logically organised
• Fails to use acceptable style and grammar

Level of Achievement: 0
Descriptor:
• Irrelevant response or no answer

Then, points are written on each paper appropriate to the bin it is in. It
is based on an overall impression. The holistic method is also referred
to as global or impressionistic marking.

One of the strengths of a holistic rubric is that students’ responses can be
scored quite quickly. The teacher needs only to read through the student’s
response and decide in which band of scores the response lies. This
rubric can provide an overview of student performance but it does not
provide detailed information about the student’s performance. It may also be
difficult to decide on a single overall score for the student’s response.

How best can a teacher use the holistic method in scoring students’
responses? Before he or she starts marking, the teacher can develop a
description of the type of response that would illustrate each category,
and then try out this draft version using several actual papers. After
reading and categorising all of the papers, it is a good idea to re-
examine the papers within a category to see if they are similar enough
in quality to receive the same points or grade. It may be faster to read
essays holistically and provide only an overall score or grade, but
students do not receive much feedback about their strengths and
weaknesses. Some instructors who use holistic scoring also write brief
comments on each paper to point out one or two strengths and/or
weaknesses so students will have a better idea of why their responses
received the scores they did.
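
The "sorting into bins" idea can be pictured as a simple lookup from a holistic score to its band. The sketch below assumes whole-number scores on the 0 to 8 scale of Table 5.3; the names `BANDS` and `band_for` are illustrative only.

```python
# Score bands copied from Table 5.3: (lowest score, highest score, label).
BANDS = [
    (7, 8, "Exemplary"),
    (5, 6, "Good"),
    (3, 4, "Adequate"),
    (1, 2, "Poor"),
    (0, 0, "Irrelevant response or no answer"),
]


def band_for(score):
    """Return the Table 5.3 band label for a whole-number holistic score."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the 0-8 scale")


print(band_for(6))  # prints Good
```

The lookup only records the decision; judging which band a response belongs to remains the marker's task, guided by the descriptors.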

(ii) Analytic Method
The analytic method of marking is the system most frequently used in
large-scale public examinations and also by teachers in the classroom.
Its basic tool is a two-dimensional table with the performance criteria
down the vertical column on the left and the performance levels across
the top row. The cells then present the performance descriptors as
shown in Table 5.4.

Table 5.4: Sample of a Marking Scheme Using the Analytic Method

Holistic scoring gives students a single, overall assessment score for
the response as a whole. Analytic scoring provides students with at
least a rating score for each criterion. For example, based on the rubric,
a student’s response may get 3 points for focus/organisation, 2 points
for elaboration and 4 points for mechanics, giving a total of 9 marks.
Alternatively, an analytic rubric may take the form of a weighted
rubric, whereby different weights (values) are assigned to different
criteria and the overall achievement is obtained by totalling the weighted
scores for the criteria. Refer to Table 5.5 for a sample of a weighted
analytic rubric.

Table 5.5: Sample of a Marking Scheme Using the Weighted Analytic Method

To use the rubric, the performance level achieved by the student is
multiplied by the weight to give a score for each criterion. For example,
for focus/organisation, the score is 3 × 1.25 = 3.75, for elaboration, the
score is 2 × 1.25 = 2.5 and for mechanics, the score is 4 × 0.5 = 2.0. This
gives the student a total of 8.25 marks out of 12.

The analytic rubric provides more detailed feedback on areas of
strength and weakness because the performance criteria are given and
each criterion can be weighted to reflect its relative importance in the
student’s response. Generic rubrics, which are not task specific, can also
be a useful aid to learning. Students can use them as a guide to
doing the assignments. As shown in Table 5.5, the performance
descriptors are stated in general terms and do not give away the
answers. However, an analytic rubric takes more time to create and use
than a holistic rubric. Moreover, it is important that each performance
level for each criterion is well-defined. Otherwise, different raters may
not arrive at the same score.
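
The weighted analytic calculation worked through above can be checked with a short Python sketch. The weights and performance levels are the ones quoted in the worked example; the highest level of 4 is an assumption inferred from the stated maximum of 12 marks, and the names used are illustrative.

```python
# Weights and levels taken from the worked example in the text.
weights = {"focus/organisation": 1.25, "elaboration": 1.25, "mechanics": 0.5}
levels = {"focus/organisation": 3, "elaboration": 2, "mechanics": 4}

MAX_LEVEL = 4  # assumption: inferred from the stated maximum of 12 marks


def weighted_total(levels, weights):
    """Multiply each criterion's level by its weight and add the results."""
    return sum(levels[criterion] * weights[criterion] for criterion in weights)


total = weighted_total(levels, weights)
maximum = weighted_total({c: MAX_LEVEL for c in weights}, weights)
print(total, maximum)  # prints 8.25 12.0
```

Setting every weight to 1 reproduces the unweighted analytic total of 3 + 2 + 4 = 9 marks.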

5.10 SUGGESTIONS FOR MARKING ESSAYS

Here are some suggestions for marking or scoring essays:

(a) Grade the papers anonymously. This will help control the influence of our
expectations of the student on the evaluation of the answer.

(b) Read and score the answers to one question before going on to the next
question. In other words, score all the students’ responses to Question 1
before looking at Question 2. This helps to keep one frame of reference and
one set of criteria in mind through all the papers, which results in more
consistent grading. It also prevents an impression that we form in reading
one question from carrying over to our reading of the student’s next answer.

(c) If a student has not done a good job on the first question, we may let this
impression influence our evaluation of the student’s second answer.
However, if other students’ papers come in between, we are less likely to be
influenced by the original impression.

(d) If possible, try to grade all the answers to one particular question without
interruption. Our standards might vary from morning to night or one day to
the next.

(e) Shuffle all the papers after each item is scored. Changing the order of papers
this way reduces the context effect and the possibility that a student’s score
may be the result of the location of the paper in relation to other papers.
If Rakesh’s “B” work always follows Jamal’s “A” work, then it might
look more like “C” work and his grade would be lower than if his paper were
somewhere else in the stack.

(f) Decide in advance how you are going to handle extraneous factors and be
consistent in applying the rule. Students should be informed about how you
treat such things as misspelled words, neatness, handwriting, grammar and
so on.

(g) Be on the alert for bluffing. Some students who do not know the answer may
write a well-organised coherent essay but one containing material irrelevant
to the question. Decide how to treat irrelevant or inaccurate information
contained in the students’ answers. We should not give credit for irrelevant
material. It is not fair to other students who may also have preferred to write
on another topic, but instead wrote on the required question.

(h) Write comments on the students’ answers. Teacher comments make essay
tests a good learning experience for students. They also serve to refresh your
memory of your evaluation should the student question the grade given.

(i) Be aware that the order in which papers are marked can have an impact
on the grades awarded. A marker may grow more critical (or more lenient)
after having read several papers, thus the early papers may receive lower (or
higher) marks than papers of similar quality that are scored later.

(j) Also, when students are directed to take a stand on a controversial issue, the
marker must be careful to ensure that the evidence and the way it is
presented are evaluated, not the position taken by the student. If the student
takes a position which differs from that of the marker, the marker must be
aware of his or her own possible bias in marking the essay.
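
Suggestions (a), (b) and (e) can be combined into a simple marking schedule: papers are identified by code rather than by name, every response to one question is marked before the next question, and the pile is reshuffled between questions. The following is a minimal sketch only; the function name `marking_schedule` and the paper codes are hypothetical.

```python
import random


def marking_schedule(paper_ids, num_questions, seed=None):
    """Return [(question number, shuffled paper order), ...].

    All papers are marked for Question 1, then the pile is reshuffled
    before Question 2, and so on, so that a paper's position in the
    pile varies from question to question.
    """
    rng = random.Random(seed)  # a seed makes the order reproducible
    order = list(paper_ids)    # copy so the caller's list is untouched
    schedule = []
    for question in range(1, num_questions + 1):
        rng.shuffle(order)
        schedule.append((question, list(order)))
    return schedule


# Hypothetical anonymised paper codes used in place of student names.
papers = ["P01", "P02", "P03", "P04", "P05"]
for question, order in marking_schedule(papers, num_questions=2, seed=7):
    print(f"Question {question}: {order}")
```

Shuffling does not remove order effects from any single pass through the pile, but it makes it less likely that the same paper is always read immediately after a much stronger or weaker one.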

ACTIVITY 5.4

1. Compare the analytical method and holistic method of marking
essays.

2. Which method is widely practised in your institution? Why?

3. Do you think there would be a difference in marking an answer
using the two methods? Justify your answer.

Post your answers on the myINSPIRE online forum.

• An essay question is a test item which requires a response composed by the
examinee, usually in the form of one or more sentences, of a nature that no single
response or pattern of responses can be listed as correct, and the accuracy and
quality of which can be judged subjectively only by one skilled or informed in
the subject matter.

• There are two types of essays based on their function: restricted response and
extended response essay questions.

• Essay questions provide an effective way of assessing complex learning
outcomes.

• Essay questions provide authentic experiences because constructing responses
is closer to real life than selecting responses.

• It is not possible to assess a student’s mastery of the complete subject matter
domain with just a few questions.

• Essay questions have two variable elements – the degree to which the task is
structured and the degree to which the scope of the content is focused.

• Whether or not an essay item assesses higher-order thinking depends on the
design of the question and how students’ responses are scored.

• Specifying the approximate time limit helps students allocate their time in
answering several essay questions.

• Avoid using essay questions for intended learning outcomes that are better
assessed with other kinds of assessment.

• Analytical marking is the system most frequently used in large-scale public
examinations and also by teachers in the classroom. Its basic tool is the marking
scheme with proper mark allocations for elements in the answer.

• The holistic approach to scoring essay questions involves reading an entire
response and assigning it to one of several categories, each given a score or
grade.

Analytical method              Holistic method
Checklist                      Marking scheme
Complex learning outcomes      Mental tasks
Constructed responses          Model answer
Essay                          Rubric
Grading                        Time consuming

Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching,
and assessing: A revision of Bloom’s taxonomy of educational objectives.
Boston, MA: Allyn & Bacon.

Crooks, T. J. (1988). The impact of classroom evaluation practices on
students. Review of Educational Research, 58(4), 438–481.

Ebel, R. L. (1972). Essentials of educational measurement. Oxford, England:
Prentice-Hall.

McKeachie, W. J. (1987). Can evaluating instruction improve teaching? New
Directions for Teaching and Learning, 31(1987), 3–7.

Moss, A., & Holder, C. (1988). Improving student learning: A guidebook for faculty
in all disciplines. Dubuque, IA: Kendall/Hunt.

Phillips, J. A., Ansary Ahmed, & Kuldip Kaur. (2005). Instructional design
principles in the development of an e-learning graduate course. Paper
presented at The International Conference in E-Learning. Bangkok, Thailand.

Reiner, C. M., Bothell, T. W., Sudweeks, R. R., & Wood, B. (2002). Preparing
effective essay questions. Stillwater, OK: New Forums Press.

Stalnaker, J. M. (1951). The essay type examination. In E. F. Lindquist (Ed.),
Educational measurement (pp. 495–530). Menasha, WI: George Banta.

Topic 6 Authentic Assessment

LEARNING OUTCOMES

By the end of the topic, you should be able to:
1. Define authentic assessment;
2. Explain how to use authentic assessment;
3. Explain the advantages and disadvantages of authentic assessment;
4. Describe the characteristics of authentic assessment; and
5. Compare authentic assessment with traditional assessment.

INTRODUCTION

Many teachers use traditional assessment tools such as multiple-choice tests and
essay-type tests to assess their students. How well do these multiple-choice or
essay tests really evaluate students’ understanding and achievement? These
traditional assessment tools do serve a role in the assessment of student outcomes.

However, assessment does not always have to involve paper and pencil, but can
instead be in the form of a project, an observation or a task that shows a student
has learnt the material. Are these alternative assessments more effective than the
traditional ones?

Some classroom teachers are using testing strategies that do not focus entirely on
recalling facts. Instead, they ask students to demonstrate the skills and concepts
they have learnt. Teachers may want to ask the students to learn how to apply their
skills to authentic tasks and projects or to have students demonstrate the
application of their knowledge in real life. The students must then be trained to
perform meaningful tasks that replicate real-world challenges. In other words,
students are asked to perform a task rather than select an answer from a ready-
made list.

This strategy of asking students to perform real-world tasks that demonstrate
meaningful application of essential knowledge and skills is called authentic
assessment. Let us learn more about authentic assessment in the following
subtopics.

ACTIVITY 6.1

The following are two assessment procedures, A and B. Which is an
authentic assessment and which is traditional assessment?

Assessment A
Students are asked to take a paper-and-pencil test on how to prepare
MCQs for an examination paper.

Assessment B
Students are asked to prepare MCQs for an examination paper,
administer them to a class of 30 students and then write a report.

Justify your answer in the myINSPIRE online forum.

6.1 WHAT IS AUTHENTIC ASSESSMENT IN
THE CLASSROOM?

Authentic assessment, in contrast to the more traditional assessment, encourages
the integration of teaching, learning and assessing. In the “traditional assessment
model”, teaching and learning are often separated from assessment. A test is
administered after knowledge or skills have been acquired. Authentic assessment
usually includes a task for students to perform and a rubric by which their
performance on the task will be assessed. Thus, doing science experiments, writing
stories and reports, and solving mathematical problems that have real-world
applications can all be considered examples of authentic assessment. Useful
achievement data can be obtained via authentic assessment.

Teachers can teach students how to do mathematics, history and science, not just
to know them. Then, to assess what the students have learnt, teachers can ask
students to perform the tasks that “replicate the challenges” faced by those using
mathematics, doing history or conducting a scientific investigation. Well-designed
traditional classroom assessments such as tests and quizzes can effectively
determine whether or not students have acquired a body of knowledge.

In contrast, authentic assessments ask students to demonstrate understanding by
performing a more complex task that is usually representative of more meaningful
application. These tasks ask students to analyse, synthesise and apply what they
have learnt in a substantial manner, and students create new meaning in the
process as well. In short, authentic assessment helps answer the question,
“How well can you use what you know?” whereas traditional testing helps answer
the question, “Do you know it?”

Traditional classroom assessments such as multiple-choice tests and
short-answer tests are just as important as authentic assessment. In fact,
authentic assessment complements traditional assessment. Authentic
assessment has been gaining acceptance among early childhood and primary
school teachers where traditional assessment may not be appropriate.

6.2 ALTERNATIVE NAMES FOR AUTHENTIC
ASSESSMENT

Did you know that authentic assessment is sometimes referred to as performance
assessment, alternative assessment and direct assessment?

It is called performance assessment or performance-based assessment because
students are asked to perform meaningful tasks. Performance assessment is “a test
in which the test taker actually demonstrates the skills the test is intended to
measure by doing real-world tasks that require those skills, rather than by
answering questions asking how to do them” (Vander Ark, 2013). Project-based
learning (PBL) and portfolio assignments are examples of performance
assessment. With performance assessment, teachers observe students while they
are performing in the classroom, and judge the level of proficiency demonstrated.
As authentic tasks are rooted in curriculum, teachers can develop tasks based on
what already works for them. Through this process, evidence-based assignments
such as portfolios become more authentic and more meaningful to students.

The term alternative assessment is sometimes used because authentic assessment
is an alternative to traditional assessments. Using checklists and rubrics in self and
peer evaluation, students participate actively in evaluating themselves and one
another. Alternative assessments measure performance in ways other than
traditional paper-and-pencil and short-answer tests. For example, a science
teacher in the Klang Valley may ask students to identify the different pollutants
in the Klang River and report them to the local environmental council.

Direct assessment is so called because authentic assessments are direct measures
that provide more direct evidence of meaningful application of knowledge and
skills. If a student does well on a multiple-choice test, we might infer indirectly
that the student could apply that knowledge in real-world contexts as well; but we
would be more comfortable making that inference from a direct demonstration of
that application, such as the river pollutants example mentioned earlier. We do
not just want students to know the content of the disciplines when they leave
school; we want them to apply the knowledge and skills they have learnt. Direct
evidence of student learning is tangible, visible and measurable, and tends to be
more compelling evidence of exactly what students have and have not learnt.
Teachers can directly look at students’ work or performances to determine what
they have learnt.

6.3 HOW TO USE AUTHENTIC ASSESSMENT?

Authentic assessments focus on the learning process, sound instructional
practices, and high-level thinking skills and proficiencies needed for success in the
real world, and, therefore, may offer students who have been exposed to them
huge advantages over those who have not. This helps students see themselves as
active participants, who are working on a task of relevance, rather than passive
recipients of obscure facts. It helps teachers by encouraging them to reflect on the
relevance of what they teach and provides results that are useful for improving
instruction.

The following lists the steps which you can take to create your own authentic
assessment:

(a) Identify which standards you want your students to meet through this
assessment;

(b) Choose a relevant task for this standard or set of standards, so that students
can demonstrate how they have or have not met the standards;

(c) Define the characteristics of good performance on this task. This will provide
useful information regarding how well students have met the standards; and

(d) Create a rubric or set of guidelines for students to follow so that they are able
to assess their work as they perform the assigned task.
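Steps (a) to (d) above can also be sketched in code. The following Python sketch is only an illustration and not a method prescribed in this module: the task (a short science report), the criterion names, the mark allocations and the function `score_response` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One characteristic of good performance (step c) and its maximum marks."""
    name: str
    max_marks: int


# Hypothetical rubric (step d) for an assumed science-report task (step b).
RUBRIC = [
    Criterion("Identifies the pollutants", 4),
    Criterion("Explains the likely sources", 3),
    Criterion("Presents the report clearly", 3),
]


def score_response(rubric, awarded):
    """Return (total, maximum) for one student's work.

    `awarded` maps criterion names to the marks a teacher gave;
    a criterion that was not marked counts as zero.
    """
    total = 0
    for criterion in rubric:
        marks = awarded.get(criterion.name, 0)
        if not 0 <= marks <= criterion.max_marks:
            raise ValueError(f"Marks out of range for: {criterion.name}")
        total += marks
    maximum = sum(c.max_marks for c in rubric)
    return total, maximum


# Marks one teacher might award to one student (invented for illustration).
student_marks = {
    "Identifies the pollutants": 3,
    "Explains the likely sources": 2,
    "Presents the report clearly": 3,
}
print(score_response(RUBRIC, student_marks))  # (8, 10)
```

Because the criteria and their mark limits are written down before any work is scored, students can use the same rubric to check their own work while they perform the task.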

Brady (2012) suggested some examples of authentic assessment strategies which
include the following:

(a) Exhibit an athletic skill;

(b) Produce a short musical, dance or drama;

(c) Publish a class brochure;

(d) Perform a role, an oral presentation or an artistic display;

(e) Plan or draw conceptual mind maps or flow charts;

(f) Demonstrate the use of ICT tools such as webpage creation or video editing;

(g) Construct models;

(h) Produce creative writing;

(i) Engage in peer teaching and evaluate teacher-student feedback; and

(j) Attempt unstructured tasks like problem-solving, open-ended questions,
formal and informal observations.

6.4 ADVANTAGES OF AUTHENTIC
ASSESSMENT

According to Wiggins (1990), while standardised, multiple-choice tests can be
valid indicators of academic performance, such tests often mislead students into
believing that learning requires cramming and mislead teachers into believing
that tests are after-the-fact, contrived and irrelevant.

A move towards more authentic tasks and outcomes improves teaching and
learning. In this respect, authentic assessment has many benefits, but the main
benefits are as follows:

(a) Authentic assessment provides parents and community members with
directly observable products and understandable evidence concerning their
children’s performance. The quality of students’ work is more discernible to
laypeople than when we must rely on abstract statistical figures.

(b) Authentic assessment uses tasks that reflect normal classroom activities or
real-life learning as a means of improving instruction, thus allowing teachers
to plan a comprehensive, developmentally-oriented curriculum based on
their knowledge of each child.

(c) Authentic assessment is consistent with the constructivist approach to
learning. This approach emphasises that students should use their previous
knowledge to build new knowledge structures, be actively involved in
exploration and inquiry through task-like activities, and construct meaning
from educational experience. Most authentic assessments engage students
and actively involve them with complex tasks that require exploration and
inquiry.

(d) Authentic assessment tasks assess how well students can
apply what they have learnt in real-life situations. An important school
outcome is the ability of the students to solve problems and lead a useful life,
rather than simply to answer questions about facts, principles and theories
they have learnt. In other words, authentic assessments require students to
demonstrate their ability to complete a task using their knowledge and skills
from several areas rather than simply recalling information or saying how to
do a task.

(e) Authentic assessment tasks require an integration of knowledge, skills and
abilities. Complex tasks, especially those that span longer periods, require
students to use different skills and abilities. Portfolios and projects, two
common tools in authentic assessment, require a student to use knowledge
from several different areas and many different abilities.

(f) Authentic assessment focuses on higher-order thinking skills such as
“applying, analysing, evaluating and creating”, which are found in Bloom’s
taxonomy. Authentic assessment evaluates thinking skills such as analysis,
synthesis, evaluation and interpretation of facts and ideas, skills which
standardised tests generally avoid.

(g) Embedding authentic assessment in the classroom allows for a wide range of
assessment strategies. It involves teacher-student collaboration in
determining assessment (student-structured tasks).

(h) Authentic assessment broadens the approach to student assessment.
Introducing authentic assessment along with traditional assessment
broadens the types of learning outcomes that a teacher can assess. It also
offers students a variety of ways of expressing their learning, thus enhancing
the validity of student evaluation.

(i) Authentic assessment focuses on students’ progress rather than on identifying
their weaknesses. Authentic assessment lets teachers assess the processes
students use as well as the products they produce. Many authentic tasks offer
teachers the opportunity to watch the way a student goes about solving a
problem or completing a task. Appropriate scoring rubrics help teachers
collect information about the quality of the processes and strategies students
use, as well as assess the quality of the finished product.

6.5 DISADVANTAGES OF AUTHENTIC
ASSESSMENT

Despite the usefulness of authentic assessment as an assessment tool, it has some
drawbacks as well. Some of the criticisms are as follows:

(a) High-quality Authentic Assessment Tasks Are Difficult to Develop
First, they must match the complex learning outcomes that are being
assessed. Teachers may decide that more than one learning outcome be
assessed by the same complex task. They must also be aware that not every
learning outcome can and should be assessed by authentic assessments. They
should only select those that can and should. In crafting the tasks for
assessment, teachers also have to decide if they want to assess the process,
the product or both. Of course, most important of all, the tasks developed
must allow for predetermined performance criteria. For that, the tasks must
possess special characteristics. Refer to subtopic 6.6 for details.

(b) High-quality Scoring Rubrics Are Difficult to Develop
This is especially true when teachers want to assess complex cognitive and
intangible affective learning outcomes or permit multiple answers and
products. Failure to develop a high-quality rubric will affect the validity and
reliability of assessment.

(c) Completing Authentic Assessment Tasks Takes a Lot of Time
Most authentic tasks take days, weeks or months to complete. For instance,
a research project might take a few weeks and this might reduce the amount
of instructional time.

(d) Scoring Authentic Assessment Tasks Takes a Lot of Time
The more complex the tasks, the more time teachers can expect to spend on
scoring. Complex tasks normally allow for many diverse outputs from the
students. It is time consuming to score these types of output. Besides,
assessment that focuses on the process requires that teachers monitor and
score the output at different stages in the implementation of the tasks.

(e) Scores from Tasks for Authentic Assessment May Have Lower Scorer
Reliability
With complex tasks and multiple possible outputs and answers, scoring depends
on teachers’ own competence. If two teachers are doing the assessment, they
may mark the same output or answer of a student quite differently. This is
not only frustrating to the student but also lowers the reliability and validity
of the assessment results. However, this problem can be reduced by having
well-defined rubrics and well-trained scorers to mark the students’ output.

(f) Authentic Assessments Have Low Reliability from the Content-sampling
Point of View
Normally, each authentic assessment task will only focus on specific subject-
matter content. As the task requires an extended period of time to complete,
it is not possible to have a wide content coverage as in the traditional
objective assessment formats which allow a broader content coverage in less
time.

(g) Completing Authentic Assessment Tasks May Be Discouraging to Less Able
Students
Complex tasks such as projects require students to sustain their interest and
intensity over a long period of time. They may be overwhelmed by the high
demands of the authentic assessment. Though group work may help by
permitting peers to share the work and use each other’s differing
knowledge and skills to complete the task, group work has its limitations in
assessment.
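The scorer-reliability concern in (e) can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions rather than a procedure prescribed in this module: two hypothetical teachers grade the same six pieces of student work, and their consistency is summarised by simple percent agreement and by Cohen's kappa, a commonly used index that corrects for the agreement expected by chance. The grades and the function names are invented for the example.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Proportion of pieces of work that both scorers graded identically."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("Both scorers must grade the same non-empty set of work")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Agreement between two scorers, corrected for chance agreement."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    categories = counts_a.keys() | counts_b.keys()
    # Chance agreement: probability that both scorers pick the same grade
    # if each simply followed their own overall grade distribution.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one and the same grade throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades given by two teachers to the same six projects.
teacher_a = ["A", "B", "B", "C", "A", "B"]
teacher_b = ["A", "B", "C", "C", "A", "A"]

print(round(percent_agreement(teacher_a, teacher_b), 2))  # 0.67
print(round(cohens_kappa(teacher_a, teacher_b), 2))       # 0.52
```

In this invented example the two teachers agree on four of the six projects, yet kappa is noticeably lower than the raw agreement because some of that agreement could have arisen by chance. Well-defined rubrics and scorer training are intended to raise both figures.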

In sum, criticism of authentic assessments generally involves both the informal
development of the assessments and the difficulty of ensuring test validity and
reliability, given the subjective nature of human scoring with rubrics compared
to computer scoring of multiple-choice test items. Many teachers shy away from
authentic assessments because these methodologies are time intensive to manage,
grade, monitor and coordinate. Teachers find it hard to provide a consistent
grading scheme. The subjective method of grading may lead to bias. Teachers also
find that this method is not practical for a big group of students.

Nevertheless, based on the value of authentic assessments to student outcomes,
the advantages of authentic assessments outweigh these concerns. For example,
once the assessment guidelines and grading rubric are created, they can be filed
away and used year after year. As Lindquist (1951) noted, there is nothing new
about this authentic assessment methodology. This is not some kind of radical
invention recently fabricated by the opponents of traditional tests to challenge the
testing industry. Rather it is a proven method of evaluating human characteristics
that has been in use for decades.

ACTIVITY 6.2

Is authentic assessment practised in your institution? How is it done? If
it is not being practised, explain why.

Share your experience with your coursemates in the myINSPIRE online
forum.

6.6 CHARACTERISTICS OF AUTHENTIC
ASSESSMENT

The main characteristics of authentic assessment have been summed up by Reeves,
Herrington and Oliver (2002) who then contrasted its methodology to that of
traditional assessment. According to Reeves et al. (2002), authentic assessment is
characterised by the following:

(a) Has Real-world Relevance
The assessment is meant to focus on the impact of one’s work in real or
realistic contexts.

(b) Requires Students to Define the Tasks and Sub-tasks Needed to Complete
the Activity
Problems inherent in the activities are open to multiple interpretations rather
than easily solved by the application of existing algorithms.

(c) Comprises Complex Tasks to Be Investigated by Students Over a Sustained
Period of Time
Activities are completed in days, weeks and months rather than minutes or
hours. They require significant investment of time and intellectual resources.

(d) Provides the Opportunity for Students to Examine the Task from Different
Perspectives, Using a Variety of Resources
The use of a variety of resources rather than a limited number of pre-selected
references requires students to distinguish relevant information from
irrelevant data.

(e) Provides the Opportunity to Collaborate
Collaboration is integral to the task, both within the course and in the real
world; the task is not one that an individual learner can achieve alone.

(f) Provides the Opportunity to Reflect
Assessments need to enable learners to make choices and reflect on their
learning, both individually and socially.

(g) Can Be Integrated and Applied Across Different Subject Areas and Lead
Beyond Domain-specific Outcomes
Assessments encourage interdisciplinary perspectives and enable students
to play diverse roles; thus, building robust expertise rather than knowledge
limited to a single well-defined field or domain.

(h) Authentic Activities Are Seamlessly Integrated with Assessment
Assessment of activities is seamlessly integrated with the major task in a
manner that reflects real-world assessment, rather than separate artificial
assessment removed from the nature of the task.

(i) Creates Values
The product, outcome or result of an assessment is polished and is valued by
the student in its own right, rather than being treated as preparation for
something else.

(j) Allows Competing Solutions and Diversity of Outcomes
Assessments allow a range and diversity of outcomes open to multiple
solutions of an original nature, rather than a single correct response obtained
by the application of rules and procedures.

6.7 DIFFERENCES BETWEEN AUTHENTIC AND
TRADITIONAL ASSESSMENTS

Assessment is authentic when we directly examine students’ performance on
worthy intellectual tasks. Traditional assessment, by contrast, relies on indirect or
“proxy items” that, though efficient, are simplistic substitutes from which we think
valid inferences can be made about the students’ performance at those valued
challenges (Wiggins, 1990). The differences can be summed up as in Table 6.1.

Table 6.1: Comparisons between Authentic and Traditional Assessments

Attribute: Reasoning and practice
Authentic Assessment: Schools must help students become proficient at
performing the tasks they will encounter when they leave school. To determine if
teaching is successful, the school must then ask students to perform meaningful
tasks that replicate real-world challenges to see if students are capable of
doing so.
Traditional Assessment: Schools must teach this body of knowledge and skills.
To determine if teaching is successful, the school must then test students to
see if they acquired the knowledge and skills.

Attribute: Assessment and curriculum
Authentic Assessment: Assessment drives the curriculum. That is, teachers first
determine the tasks that students will perform to demonstrate their mastery and
then a curriculum is developed that will enable students to perform those tasks
well, which would include the acquisition of essential knowledge and skills.
This has been referred to as planning backwards.
Traditional Assessment: The curriculum drives assessment. The body of knowledge
is determined first. That knowledge becomes the curriculum that is delivered.
Subsequently, the assessments are developed and administered to determine if
acquisition of the curriculum occurred.

Attribute: Types of assessment tasks
Authentic Assessment: Students are required to demonstrate understanding by
performing a more complex task usually representative of more meaningful
applications, such as carrying out a class project and keeping portfolios.
Traditional Assessment: Students are required to take tests, usually the
selection type, in which they are asked to select the correct answer from the
choices provided.

Attribute: Nature of assessment tasks
Authentic Assessment: Real-life tasks are assigned for learners to perform in
order to demonstrate their proficiency or competency.
Traditional Assessment: Tests that are contrived, e.g. MCQs, are used to assess
learners’ proficiency or understanding in a short period of time.

Attribute: Focus of assessment
Authentic Assessment: Construction or Application of Knowledge. Assessment
requires learners to be effective performers of the acquired knowledge.
Therefore, during assessment learners are asked to analyse, synthesise and apply
what they have learnt and create new meaning in the process.
Traditional Assessment: Recall or Recognition of Knowledge. In assessment,
learners are only required to reveal if they can recognise and recall, normally
facts that they have learnt out of context.

Attribute: Learners’ responses in assessment
Authentic Assessment: Learner Structured. Authentic assessments allow more
student choices and construction in determining what is presented as evidence of
proficiency. Even when students cannot choose their own topics or formats, there
are usually multiple acceptable routes towards constructing a product or
performance.
Traditional Assessment: Teacher Structured. What a student can and will
demonstrate has been carefully structured by the person(s) who developed the
test. A student’s attention will understandably be focused on and limited to
what is on the test.
