5.0 ANALYSIS OF TEST RESULTS
5.1 Total Score
No   Name                         Score
1    Uzayr Rayyan Ramadhan        28 / 30
2    Aiysar Nur Izdiyad           28 / 30
3    Qamarul Isyraq               27 / 30
4    Aimy Nur Husna Humairah      27 / 30
5    Nur Sumayyah Maisarah        27 / 30
6    Mohamad Najib                27 / 30
7    Said Lutfil Daiyan           26 / 30
8    Farah Syahzarina             26 / 30
9    Nur Iffah Musfirah           26 / 30
10   Zayyan                       26 / 30
11   Dhanalakshmii                26 / 30
12   Saidatul Amani Haninah       25 / 30
13   Mohamad Akiff Zaqwan         24 / 30
14   Ainur Syifa Syuhada          23 / 30
15   Fasihah Nor Rania            22 / 30
16   Muhammad Adam Firdaus        22 / 30
17   Syafiatul Asmaq              21 / 30
18   Nur Fairuz                   19 / 30
19   Muhammad Alif Asyraff        18 / 30
20   Muhammad Aqil                18 / 30
21   Nur Khaireen Eishal          18 / 30
22   Muhammad Aiman Haiqal        17 / 30
23   Nurul Iman Hannani           16 / 30
24   Muhammad Hasif               16 / 30
25   Sham Shatul Bharizon         15 / 30
26   Nur Khairuna Irdyna Syifa    13 / 30
27   Luqman Hakim                 9 / 30
28   Syafia Qaleesya              9 / 30
29   Muhammad Zulfikri            8 / 30
30   Siti Balqus                  8 / 30
31   Mohamad Aidil Naufal         7 / 30
32   Affiq Aisy Iskandar          4 / 30
Table 3: Total Score
Based on Table 3, the highest score obtained is 28, achieved by two students, Uzayr
Rayyan Ramadhan and Aiysar Nur Izdiyad. Nine students obtained intermediate scores of
between 13 and 19 points, and the lowest-scoring pupil, Affiq Aisy Iskandar, scored only
four points out of 30.
[Bar chart "Total Score": number of students (vertical axis, 0 to 12) in each band of marks (horizontal axis: 0 to 5, 6 to 10, 11 to 15, 16 to 20, 21 to 25, 26 to 30)]
Figure 1: Total Scores
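The bars in Figure 1 can be cross-checked by tallying the Table 3 scores into the same bands of marks. The following is a minimal Python sketch of that tally; the names scores, bands and band_label are illustrative and are not part of the original analysis.

```python
from collections import Counter

# Total scores (out of 30) of the 32 students, in Table 3 order.
scores = [28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22,
          21, 19, 18, 18, 18, 17, 16, 16, 15, 13, 9, 9, 8, 8, 7, 4]

# Bands of marks used on the horizontal axis of Figure 1 (both ends inclusive).
bands = [(0, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]


def band_label(score):
    """Return the label of the band that contains the given score."""
    for low, high in bands:
        if low <= score <= high:
            return f"{low} to {high}"
    raise ValueError(f"score {score} is outside the range 0 to 30")


# Count how many students fall into each band.
band_counts = Counter(band_label(s) for s in scores)

for low, high in bands:
    label = f"{low} to {high}"
    print(f"{label:>8}: {band_counts[label]} students")
```

Printing the counts band by band gives the heights that the bars in Figure 1 should have.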
Figure 1 shows that the Year 4 students of Sekolah Kebangsaan Kuala Kubu Bharu are
mixed-ability students with different levels of proficiency. Based on Table 3 and
Figure 1, a frequency histogram was built, and the mean, median, mode and standard
deviation of the students’ total scores were calculated. Figure 2 shows the frequency
histogram with a class interval of 5, while the values for the mean, median, mode and
standard deviation are shown in Table 4 below.
MEAN 19.6
MODE 26
MEDIAN 21.5
STANDARD DEVIATION 7.3
Table 4: The Values for Mean, Mode, Median and Standard Deviation of the Students’
Total Scores
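The values in Table 4 can be reproduced from the Table 3 scores. The sketch below uses Python's standard statistics module. It assumes the sample standard deviation (statistics.stdev), because the report does not say whether the sample or the population formula was used; the variable names are illustrative.

```python
import statistics

# Total scores (out of 30) of the 32 students, in Table 3 order.
scores = [28, 28, 27, 27, 27, 27, 26, 26, 26, 26, 26, 25, 24, 23, 22, 22,
          21, 19, 18, 18, 18, 17, 16, 16, 15, 13, 9, 9, 8, 8, 7, 4]

mean_score = statistics.mean(scores)      # arithmetic mean
median_score = statistics.median(scores)  # middle value of the sorted scores
mode_score = statistics.mode(scores)      # most frequent score
sd_score = statistics.stdev(scores)       # sample standard deviation (assumption)

print(f"MEAN               {mean_score:.1f}")
print(f"MODE               {mode_score}")
print(f"MEDIAN             {median_score}")
print(f"STANDARD DEVIATION {sd_score:.1f}")
```

Rounded to one decimal place, these results agree with the figures reported in Table 4.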
Figure 2: Histogram of frequency
The frequency histogram shows a random distribution: it has many peaks, which indicates
that the students are at many different levels of proficiency. Some students performed
very well and some performed very poorly.
5.2 Analysis of Section A Results
[Bar chart "Section A's Scores": score (vertical axis, 0 to 12) for each of the 32 students (horizontal axis, student's name, in the same order as Table 3)]
Figure 3: Section A’s Scores
will reveal it. In item analysis, the two statistics most commonly used to determine the
quality of an item are item difficulty and item discrimination. The difficulty index is a
measure of the proportion of examinees who responded to an item correctly, while the
discrimination index is a measure of how well the item discriminates between examinees
who are knowledgeable in the content area and those who are not.
No of       Difficulty   Justification   Discrimination   Justification   Final
Questions   Index                        Index                            Justification
SECTION A
1           0.75         best            0.78             excellent       retain
2           0.41         good            0.44             excellent       retain
3           0.78         best            0.67             excellent       retain
4           0.88         too easy        0.44             excellent       modify
5           0.78         best            0.67             excellent       retain
6           0.81         too easy        0.56             excellent       modify
7           0.75         best            0.44             excellent       retain
8           0.88         too easy        0.44             excellent       modify
9           0.94         too easy        0.11             poor            discard
10          0.75         best            0.44             excellent       retain
SECTION B
11          0.70         best            0.44             excellent       retain
12          0.50         best            0.78             excellent       retain
13          0.50         best            0.78             excellent       retain
14          0.09         too difficult   0.33             good            discard
15          0.34         good            0.44             excellent       retain
16          0.78         best            0.44             excellent       retain
17          0.34         good            0.67             excellent       retain
18          0.70         best            0.78             excellent       retain
19          0.30         good            0.33             good            retain
20          0.59         best            0.89             excellent       retain
SECTION C
21          0.70         best            0.89             excellent       retain
22          0.80         best            0.78             excellent       retain
23          0.63         best            1.00             excellent       retain
24          0.47         good            0.56             excellent       retain
25          0.69         best            0.56             excellent       retain
26          0.40         good            0.78             excellent       retain
27          0.84         too easy        0.56             excellent       modify
28          0.80         best            0.00             poor            discard
29          0.84         too easy        0.56             excellent       modify
30          0.81         too easy        0.44             excellent       modify
Table 8: Difficulty Index and Discrimination Index
6.1 Difficulty Index
According to Understanding Item Analyses (n.d.), the difficulty index is simply the
proportion of students who answered an item correctly. In this case, it is also the item
mean. It ranges between 0.0 and 1.0; the higher the value, the easier the question. The
ideal item has a difficulty index of about 0.5. The difficulty index is also called the
p-value.
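As a worked illustration of this definition, the short Python sketch below computes the difficulty index of a single item. The function name difficulty_index is illustrative; the counts are those reported for item 15 in Table 13, where 11 of the 32 students chose the correct answer.

```python
def difficulty_index(num_correct, num_examinees):
    """Proportion of examinees who answered the item correctly (the p-value)."""
    if num_examinees <= 0:
        raise ValueError("num_examinees must be positive")
    if not 0 <= num_correct <= num_examinees:
        raise ValueError("num_correct must be between 0 and num_examinees")
    return num_correct / num_examinees


# Item 15: 11 of the 32 students chose the correct answer (Table 13).
p_item_15 = difficulty_index(11, 32)
print(f"Difficulty index of item 15: {p_item_15:.2f}")
```

The result, 0.34 to two decimal places, matches the difficulty index listed for Question 15 in Table 8.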
[Bar chart "Difficulty Index for Question 1 - 30": difficulty index (vertical axis, 0 to 1) for each question (horizontal axis, Q1 to Q30)]
Figure 9
Difficulty Value   Quality         Recommendation
< 0.2              too difficult   discard/modify
0.2 - 0.5          good            retain
0.5 - 0.8          best            retain
> 0.8              too easy        discard/modify
Table 9
Based on the data in Figure 9 and Table 9, Question 1 (0.75), Question 3 (0.78),
Question 5 (0.78), Question 7 (0.75), Question 10 (0.75), Question 11 (0.7), Question 12
(0.5), Question 13 (0.5), Question 16 (0.78), Question 18 (0.7), Question 20 (0.59),
Question 21 (0.7), Question 22 (0.8), Question 23 (0.63), Question 25 (0.69) and
Question 28 (0.8) are categorised as “best” quality since they have a difficulty index
between 0.50 and 0.80, and they should definitely be retained. Question 2 (0.41),
Question 15 (0.34), Question 17 (0.34), Question 19 (0.3), Question 24 (0.47) and
Question 26 (0.4) are categorised as “good” quality since they have a difficulty value
between 0.20 and 0.50, and they should also be retained. Hence, I can conclude that most
of the upper and lower pupils were able to answer these questions.
On the other hand, Question 14 (0.09) is categorised as “too difficult” since it has a
difficulty value below 0.20, and it should be discarded. Question 4 (0.88), Question 6
(0.81), Question 8 (0.88), Question 9 (0.94), Question 27 (0.84), Question 29 (0.84) and
Question 30 (0.81) are categorised as “too easy” since they have a difficulty value
above 0.80, and they should be discarded or carefully reviewed. Therefore, I can conclude
that most pupils were unable to answer Question 14, while almost all of the upper and
lower pupils were able to answer the “too easy” questions.
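The grouping described above can be reproduced by applying the Table 9 bands to the difficulty indices in Table 8. The sketch below is one way to do this in Python. Table 9 does not say which band the boundary values belong to, so the code assumes that 0.5 and 0.8 count as “best”, which is how Questions 12, 13, 22 and 28 are treated in the discussion above. The names difficulty and difficulty_quality are illustrative.

```python
from collections import Counter

# Difficulty index of Questions 1 to 30, taken from Table 8.
difficulty = [0.75, 0.41, 0.78, 0.88, 0.78, 0.81, 0.75, 0.88, 0.94, 0.75,
              0.70, 0.50, 0.50, 0.09, 0.34, 0.78, 0.34, 0.70, 0.30, 0.59,
              0.70, 0.80, 0.63, 0.47, 0.69, 0.40, 0.84, 0.80, 0.84, 0.81]


def difficulty_quality(p):
    """Classify a difficulty index using the bands of Table 9.

    Assumption: the boundary values 0.5 and 0.8 are treated as "best".
    """
    if p < 0.2:
        return "too difficult"
    if p < 0.5:
        return "good"
    if p <= 0.8:
        return "best"
    return "too easy"


# Map each question number (1 to 30) to its quality label.
quality_by_question = {
    number: difficulty_quality(p)
    for number, p in enumerate(difficulty, start=1)
}

quality_counts = Counter(quality_by_question.values())
for quality in ("best", "good", "too difficult", "too easy"):
    questions = [n for n, q in quality_by_question.items() if q == quality]
    print(f"{quality}: {quality_counts[quality]} questions {questions}")
```

Under that boundary assumption, the question numbers printed for each band are the same as those listed in the two paragraphs above.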
6.2 Discrimination Index
According to Understanding Item Analyses (n.d.), the discrimination index refers to the
ability of an item to differentiate among students on the basis of how well they know
the material being tested. The possible range of the discrimination index is -1.0 to 1.0;
however, a discrimination value below 0.0 suggests a problem with the item. When an item
discriminates negatively, the most knowledgeable examinees are generally getting the
item wrong and the least knowledgeable examinees are getting it right. A negative
discrimination index may indicate that the item is measuring something other than what
the rest of the test is measuring.
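The report does not state the formula that was used for the discrimination index. A common choice is the upper-lower group method, D = (U - L) / n, where U and L are the numbers of correct responses in the upper and lower groups and n is the size of each group. The Python sketch below assumes that method. The group size of 9 and the response counts are hypothetical values chosen only for illustration.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Upper-lower discrimination index: D = (U - L) / n.

    Assumption: the upper and lower groups are the same size (group_size).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for count in (upper_correct, lower_correct):
        if not 0 <= count <= group_size:
            raise ValueError("counts must be between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


# Hypothetical item: 8 of 9 upper-group and 1 of 9 lower-group students are correct.
d_positive = discrimination_index(8, 1, 9)

# Hypothetical problem item: the lower group outperforms the upper group.
d_negative = discrimination_index(2, 5, 9)

print(f"Positive discrimination: {d_positive:.2f}")
print(f"Negative discrimination: {d_negative:.2f}")
```

The second call shows how a negative value arises: more students in the lower group than in the upper group answer the item correctly.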
[Bar chart "Discrimination Index for Question 1 - 30": discrimination index (vertical axis, 0 to 1.2) for each question (horizontal axis, Q1 to Q30), with the value labelled on each bar]
Figure 10
Discrimination Value   Quality     Recommendation
>0.4                   excellent   retain
0.30 - 0.39            good        can be improved
0.20 - 0.29            fair        need to review
<0.20                  poor        discard/modify
<0                     very poor   discard/modify
Table 10
Based on the data from Figure 10 and Table 10, most of the questions have an excellent
discrimination index, with values above 0.4. These questions should be retained, with
slight modification for some of them. Question 14 and Question 19, with a value of 0.33
each, are considered ‘good’ quality questions since their values are in the range between
0.30 and 0.39. These questions should be revised; Question 14 in particular has a good
discrimination value, but its difficulty index is too low. However, Question 9 and
Question 28 are considered ‘poor’ quality questions, with values of 0.11 and 0.0
respectively. These questions should also be reviewed to decide whether or not to
discard them.
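The same kind of check can be made for the discrimination index by applying the Table 10 bands to the values in Table 8. The following is a minimal Python sketch. Table 10 leaves a value of exactly 0.40 unassigned, so the code assumes it would count as excellent; no question in this test has that value. The names discrimination and discrimination_quality are illustrative.

```python
from collections import Counter

# Discrimination index of Questions 1 to 30, taken from Table 8.
discrimination = [0.78, 0.44, 0.67, 0.44, 0.67, 0.56, 0.44, 0.44, 0.11, 0.44,
                  0.44, 0.78, 0.78, 0.33, 0.44, 0.44, 0.67, 0.78, 0.33, 0.89,
                  0.89, 0.78, 1.00, 0.56, 0.56, 0.78, 0.56, 0.00, 0.56, 0.44]


def discrimination_quality(d):
    """Classify a discrimination index using the bands of Table 10.

    Assumption: a value of exactly 0.40 is treated as "excellent".
    """
    if d < 0:
        return "very poor"
    if d < 0.20:
        return "poor"
    if d < 0.30:
        return "fair"
    if d < 0.40:
        return "good"
    return "excellent"


# Map each question number (1 to 30) to its quality label.
disc_quality = {
    number: discrimination_quality(d)
    for number, d in enumerate(discrimination, start=1)
}

disc_counts = Counter(disc_quality.values())
# Questions that are not "excellent" are the ones that need attention.
flagged = {n: q for n, q in disc_quality.items() if q != "excellent"}

print(dict(disc_counts))
print(flagged)
```

The flagged questions are Questions 9, 14, 19 and 28, the same four that are singled out in the discussion above.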
6.3 Reliability Coefficient
Level of Agreement   Reliability Value
Perfect              1.00
Almost perfect       0.81 – 1.00
Substantial          0.61 – 0.80
Moderate             0.41 – 0.60
Fair                 0.21 – 0.40
Slight               0.00 – 0.20
Poor                 <0.00
Table 11
Table 11 shows the degree of reliability of a test. A test is said to be reliable when
the value is above 0.5. Reliability is the consistency of the measurement of a test: if
the test were administered again under the same conditions, a student would obtain a
score similar to that of the first test. For this test, the reliability value was
measured and is shown in Table 12 below, along with the variance, PQ value, average
difficulty index and average discrimination index. The reliability value is 0.93 and,
according to Table 11, the level of agreement is almost perfect.
AVERAGE DIFF. 0.65
AVERAGE DISC. 0.57
PQ 5.54
VARIANCE 53.02
RELIABILITY KR20 0.93
SEM 1.98
Table 12
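The reliability and SEM values in Table 12 are consistent with the standard Kuder-Richardson 20 formula, KR20 = (k / (k - 1)) * (1 - PQ / variance), and with SEM = standard deviation * sqrt(1 - KR20). The report does not show its working, so the Python sketch below is only a check under those assumptions. It takes k = 30 items and the PQ and variance values from Table 12; the variable names are illustrative.

```python
import math

num_items = 30          # k: number of questions in the test
sum_pq = 5.54           # PQ: sum of p * (1 - p) over all items (Table 12)
score_variance = 53.02  # variance of the students' total scores (Table 12)

# Kuder-Richardson 20 reliability coefficient.
kr20 = (num_items / (num_items - 1)) * (1 - sum_pq / score_variance)

# Standard error of measurement, using the standard deviation of the total scores.
sem = math.sqrt(score_variance) * math.sqrt(1 - kr20)

print(f"KR20 = {kr20:.2f}")
print(f"SEM  = {sem:.2f}")
```

Rounded to two decimal places, both results match the values reported in Table 12.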
6.4 Distractor Analysis
A distractor is a wrong response option in a multiple-choice question. For example, if
the correct answer is A, the remaining choices, B, C and D, are the distractors of the
question (Boon, Lee, & Aeria, 2017). A distractor can be good, bad or non-functional,
based on the number of responses from the upper and lower groups of students.
Item 15 *A B C D
Total Students 11 6 3 12
High Proficiency 6 0 0 5
Low Proficiency 1 5 2 3
Others 4 1 1 4
Table 13: Distractor Analysis for Item 15
For example, for item 15, the correct answer is A. Distractors B and C are good
distractors because more low proficiency students than high proficiency students chose
them as answers. However, distractor D is not a good distractor because more high
proficiency students than low proficiency students chose it as their response.
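The rule applied to item 15 can be written as a short check on the Table 13 counts. The Python sketch below is one possible formulation. The report does not define a non-functional distractor, so treating one that nobody in either group chose as non-functional is an assumption. The names responses and rate_distractor are illustrative.

```python
# Responses to item 15 by proficiency group, taken from Table 13.
responses = {
    "A": {"high": 6, "low": 1},
    "B": {"high": 0, "low": 5},
    "C": {"high": 0, "low": 2},
    "D": {"high": 5, "low": 3},
}
correct_option = "A"


def rate_distractor(high, low):
    """Rate a distractor from the number of high and low proficiency students who chose it.

    Assumption: a distractor chosen by nobody in either group is non-functional.
    """
    if high == 0 and low == 0:
        return "non-functional"
    if low > high:
        return "good"
    return "not good"


# Rate every option except the correct answer.
ratings = {
    option: rate_distractor(counts["high"], counts["low"])
    for option, counts in responses.items()
    if option != correct_option
}

for option, rating in ratings.items():
    print(f"Distractor {option}: {rating}")
```

For item 15 this rates B and C as good and D as not good, in line with the interpretation above.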
7.0 REFLECTION
From my findings, I have found that a few factors contribute to a good assessment and
need to be taken into consideration before constructing the test paper and administering
the test, to ensure the validity and reliability of the data collected. First and
foremost, I should check the students’ level of proficiency before constructing the test
so that the scores would follow a normal distribution. Secondly, while administering the
test, I should give the students a fixed time to answer it so that their responses would
be authentic; a few students made several attempts to perfect their responses, but I only
took the first attempt.