October 30, 2015
Relating STAR Reading™ and STAR Math™ to
Washington Smarter Balanced Assessments
performance
Quick reference guide to the STAR Assessments™
STAR Reading™—used for screening and progress-monitoring assessment—is a
reliable, valid, and efficient computer-adaptive assessment of general reading
achievement and comprehension for grades 1–12. STAR Reading provides
nationally norm-referenced reading scores and criterion-referenced scores. A STAR
Reading assessment can be completed without teacher assistance in about 10
minutes and repeated as often as weekly for progress monitoring.
STAR Math™—used for screening, progress-monitoring, and diagnostic
assessment—is a reliable, valid, and efficient computer-adaptive assessment of
general math achievement for grades 1–12. STAR Math provides nationally norm-referenced math scores and
criterion-referenced evaluations of skill levels. A STAR
Math assessment can be completed without teacher assistance in less than 15
minutes and repeated as often as weekly for progress monitoring.
STAR Reading and STAR Math received the highest possible ratings for screening and progress monitoring by
the National Center on Response to Intervention, are highly rated for progress monitoring by the National
Center on Intensive Intervention, and meet all criteria for scientifically based progress-monitoring tools set
by the National Center on Student Progress Monitoring.
All logos, designs, and brand names for Renaissance Learning’s products and services, including but not limited to Renaissance Learning,
Renaissance Place, STAR, STAR Assessments, STAR Math, STAR Math Enterprise, STAR Reading, and STAR Reading Enterprise are
trademarks of Renaissance Learning, Inc., and its subsidiaries, registered, common law, or pending registration in the United States and
other countries. All other product and company names should be considered the property of their respective companies and organizations.
© 2015 by Renaissance Learning, Inc. All rights reserved. Printed in the United States of America. This publication is protected by U.S. and
international copyright laws. It is unlawful to duplicate or reproduce any copyrighted material without authorization from the copyright
holder. For more information, contact:
RENAISSANCE LEARNING
P.O. Box 8036
Wisconsin Rapids, WI 54495-8036
(800) 338-4204
www.renaissance.com
Project purpose
Educators face many challenges; chief among them is making decisions regarding how to allocate limited
resources to best serve diverse student needs. A good assessment system supports teachers by providing
timely, relevant information that can help address key questions about which students are on track to meet
important performance standards and which students may need additional help. Different educational
assessments serve different purposes, but those that can identify students early in the school year as being at
risk of missing academic standards can be especially useful because they can help inform instructional decisions
that can improve student performance and reduce gaps in achievement. Assessments that can do that while
taking little time away from instruction are particularly valuable.
Indicating which students are on track to meet later expectations is one of the potential capabilities of a
category of educational assessments called “interim” (Perie, Marion, Gong, & Wurtzel, 2007). They are one of
three broad categories of assessment:
• Summative – typically annual tests that evaluate the extent to which students have met a set of
standards. Most common are state-mandated tests such as the Smarter Balanced Assessment (SBA).
• Formative – short and frequent processes embedded in the instructional program that support learning
by providing feedback on student performance and identifying specific things students know and can do
as well as gaps in their knowledge.
• Interim – assessments that fall in between formative and summative in terms of their duration and
frequency. Some interim tests can serve one or more purposes, including informing instruction,
evaluating curriculum and student responsiveness to intervention, and forecasting likely performance on
a high-stakes summative test later in the year.
This project focuses on the application of interim test results, notably their power to inform educators about
which students are on track to succeed on the year-end summative state test and which students might need
additional assistance to reach proficiency. Specifically, the purpose of this project is to explore statistical
linkages between Renaissance Learning interim assessments1 (STAR Reading and STAR Math) and the SBA. If
these linkages are sufficiently strong, they may be useful for:
1. The early identification of students at risk of failing to make yearly progress goals in reading and math,
which could help teachers decide to adjust instruction for selected students.
2. Forecasting percentages of students at each performance level on the state assessments sufficiently in
advance to permit redirection of resources and serve as an early warning system for administrators at the
building and district level.
Assessments
Smarter Balanced Assessment (SBA)
Groups of states have formed two related consortia, Smarter Balanced Assessment Consortium (SBAC) and
Partnership for Assessment of Readiness for College and Careers (PARCC), for developing assessments to test
Common Core State Standards (CCSS). Many states replaced their previous state summative tests with the SBAC
or PARCC assessments in the 2014–2015 school year.
1 For an overview of the STAR tests and how they work, please see the References section for a link to download The research foundation for
STAR Assessments report. For additional information, full technical manuals are available for each STAR assessment by contacting
Renaissance Learning at [email protected]
This report is concerned with the Smarter Balanced English language arts/literacy and mathematics
assessments in grades 3 through 8. The choice of these two school subjects was made because they coincide
with the content of the STAR interim assessments, STAR Reading and STAR Math.
The Smarter Balanced Assessment reports scaled scores to describe a student’s location on the achievement
continuum, ranging from 2114 to 2769 for ELA/literacy and 2189 to 2802 for mathematics in grades 3 through 8.
The SBA results are used to classify students into four achievement levels, each with descriptors that convey the
knowledge and skills expected of students at the differing levels of performance. In general, Level 1 indicates
the student has not met the achievement standard; Level 2 indicates the student has nearly met the
achievement standard; Level 3 indicates the student has met the achievement standard; and Level 4 indicates
the student has exceeded the achievement standard. The four achievement levels are defined by ranges of
students’ scaled scores, displayed for ELA/literacy and math in Tables 1a and 1b, respectively.2
Table 1a. SBA achievement level score ranges: ELA/literacy
Grade  Level 1    Level 2    Level 3    Level 4
3      2114-2366  2367-2431  2432-2489  2490-2623
4      2131-2415  2416-2472  2473-2532  2533-2663
5      2201-2441  2442-2501  2502-2581  2582-2701
6      2210-2456  2457-2530  2531-2617  2618-2724
7      2258-2478  2479-2551  2552-2648  2649-2745
8      2288-2486  2487-2566  2567-2667  2668-2769
Table 1b. SBA achievement level score ranges: Mathematics
Grade  Level 1    Level 2    Level 3    Level 4
3      2189-2380  2381-2435  2436-2500  2501-2621
4      2204-2410  2411-2484  2485-2548  2549-2659
5      2219-2454  2455-2527  2528-2578  2579-2700
6      2235-2472  2473-2551  2552-2609  2610-2748
7      2250-2483  2484-2566  2567-2634  2635-2778
8      2265-2503  2504-2585  2586-2652  2653-2802
STAR Reading™ and STAR Math™
STAR Assessments are nationally normed, computer-adaptive measures of general achievement. STAR Reading
and STAR Math are intended for use as interim assessments that can be administered at multiple points
throughout the school year for purposes such as screening, placement, progress monitoring, and outcomes
assessment. Renaissance Learning recommends that STAR tests be administered two to five times a year for
most purposes, and more frequently when used in progress-monitoring programs. Recent changes to the STAR
test item banks and software make it possible to test as often as weekly for short-term progress monitoring in
programs such as RTI (response to intervention).
2 For more detailed descriptors associated with each achievement level, see: http://www.smarterbalanced.org/wordpress/wp-content/uploads/2015/03/Reporting-ALDs-Final.pdf.
Method
Data collection
Analysis included the evaluation of correlations and statistical linkages between scores on Smarter Balanced
Assessments (SBA) and STAR Reading and STAR Math. Such analyses require matched data, with student
records that include both the SBA and STAR test scores. Using a secure data-matching procedure compliant
with the federal Family Educational Rights and Privacy Act (FERPA), staff from eight large districts in four
Smarter Balanced states (California, Oregon, Washington, and Connecticut) provided Renaissance Learning with
state test scores for students who had taken STAR Reading and/or STAR Math during the 2014–2015 school year.
Each record in the resulting matched data file included a student’s SBA scores as well as scores on any STAR
Reading or STAR Math tests taken during that same year.
Linkages between the STAR and SBA score scales were developed by applying equipercentile linking analysis
(Kolen & Brennan, 2004) at each grade. The SBA score scale was linked to the STAR score scale yielding a table of
equivalent SBA scores for each possible STAR score. This type of analysis requires that students take both
assessments at about the same time.
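The report's linking computations are not shown; the following is a minimal, unsmoothed sketch of equipercentile linking in Python, using synthetic data. All variable names and values here are hypothetical, and an operational analysis would add the smoothing and tie-handling conventions described in Kolen & Brennan (2004).

```python
import numpy as np

def equipercentile_link(star_sample, sba_sample, star_points):
    """Map each STAR score in star_points to the SBA score at the same
    percentile rank, using the two concurrent score distributions."""
    star_sorted = np.sort(np.asarray(star_sample, dtype=float))
    sba = np.asarray(sba_sample, dtype=float)
    points = np.asarray(star_points, dtype=float)
    # Percentile rank (0-100) of each STAR lookup point in the STAR distribution.
    ranks = 100.0 * np.searchsorted(star_sorted, points, side="right") / len(star_sorted)
    ranks = np.clip(ranks, 0.0, 100.0)
    # SBA score at the same percentile rank (linear interpolation).
    return np.percentile(sba, ranks)

# Hypothetical matched grade-3 concurrent sample, just to exercise the function.
rng = np.random.default_rng(0)
star = rng.normal(420, 182, 5000)
sba = 2411 + 0.45 * (star - 420) + rng.normal(0, 47, 5000)
for s in (333, 455, 562):  # a few STAR scores to look up
    print(s, round(float(equipercentile_link(star, sba, [s])[0]), 1))
```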
Sample
The matched STAR-SBA data was divided into two samples. Linking was completed using a concurrent sample,
which included STAR tests taken within 30 days before or after the mid-date of the SBA administration window.
The concurrent sample consisted of a total of 54,777 students with matched SBA ELA/literacy and STAR Reading
scores, and 50,752 students with matched SBA mathematics and STAR Math scores. Within the concurrent
sample, 10% of the students in each grade were reserved as a holdout sample, which was used exclusively to
evaluate the linking and was not included in the sample used to compute it.
STAR tests taken outside the +/- 30-day SBA window were included in a predictive sample, which was used to
evaluate the accuracy of using the linking results to predict SBA performance using STAR data from earlier in the
school year. In the predictive sample, STAR scaled scores were projected to the mid-date of the SBA testing
window using national growth norms (Renaissance Learning, 2015a, 2015b). National growth norms are based
on grade and initial performance, and are updated annually using a five-year period of data which includes
millions of students. They provide typical growth rates for students based on their starting STAR test score. For
each STAR score in the predictive sample, the number of weeks between the STAR administration date and the
SBA mid-date was calculated. The number of weeks between the two tests was multiplied by the student’s
expected weekly scaled score growth (based on national growth norms). The expected growth was then added
to the observed scaled score to determine the projected STAR score at the time of the SBA. If a student had
multiple STAR tests in the predictive sample, all the projected scores were averaged.
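As a concrete illustration of the projection arithmetic just described, the sketch below adds (weeks between tests × expected weekly growth) to each observed score and averages the projections. The dates and weekly growth values are hypothetical; in practice the growth rate comes from the national growth norms, looked up by grade and starting score.

```python
from datetime import date
from statistics import mean

def project_star_score(observed, test_date, sba_mid_date, weekly_growth):
    """Project an observed STAR scaled score forward to the SBA mid-date."""
    weeks = (sba_mid_date - test_date).days / 7.0
    return observed + weeks * weekly_growth

sba_mid = date(2015, 5, 15)
# (observed score, test date, expected weekly growth) -- hypothetical values.
tests = [(480.0, date(2014, 10, 1), 1.3),
         (520.0, date(2015, 1, 20), 1.2)]
# Average all projected scores when a student has multiple STAR tests.
projected = mean(project_star_score(s, d, sba_mid, g) for s, d, g in tests)
print(round(projected, 1))
```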
Tables 2a through 2d contain sample sizes and descriptive statistics for each subject and sample.
Table 2a. Descriptive statistics for STAR™ and SBA ELA/literacy test scores by grade (concurrent sample)
       Sample size                      SBA ELA/literacy    STAR Reading
Grade  Holdout    Linking    Total      M         SD        M        SD
3 1,107 9,961 11,068 2410.8 84.5 420.4 182.2
4 1,220 10,980 12,200 2465.2 90.7 547.8 223.6
5 1,091 9,817 10,908 2497.7 92.3 634.3 252.5
6 807 7,265 8,072 2525.9 86.9 727.1 275.8
7 632 5,688 6,320 2551.5 97.2 810.9 299.1
8 621 5,588 6,209 2570.9 96.6 878.7 306.5
Table 2b. Descriptive statistics for STAR™ and SBA mathematics test scores by grade (concurrent sample)
       Sample size                      SBA Mathematics     STAR Math
Grade  Holdout    Linking    Total      M         SD        M        SD
3 1,080 9,720 10,800 2431.0 80.2 602.7 107.5
4 1,058 9,524 10,582 2474.3 80.6 677.4 115.3
5 975 8,775 9,750 2494.3 92.1 730.1 123.4
6 785 7,067 7,852 2522.3 101.7 766.7 116.9
7 634 5,710 6,344 2539.0 110.1 792.3 126.1
8 542 4,882 5,424 2557.7 122.0 813.0 120.3
Table 2c. Descriptive statistics for STAR™ and SBA ELA/literacy test scores by grade (predictive sample)
                    SBA ELA/literacy    STAR Reading
Grade  Sample size  M         SD        M        SD
3 8,620 2422.1 86.4 450.1 169.3
4 8,603 2469.4 91.3 563.9 200.7
5 8,595 2502.9 93.0 653.7 232.7
6 8,511 2530.9 87.8 746.3 251.1
7 8,642 2555.5 95.7 829.4 271.3
8 8,864 2571.5 95.6 896.1 276.4
Table 2d. Descriptive statistics for STAR™ and SBA mathematics test scores by grade (predictive sample)
                    SBA Mathematics     STAR Math
Grade  Sample size  M         SD        M        SD
3 8,593 2433.5 79.6 617.3 84.6
4 8,571 2478.0 81.6 688.3 95.1
5 8,595 2498.4 92.2 741.1 102.4
6 8,575 2527.0 100.9 776.2 98.1
7 8,623 2541.5 108.0 797.1 100.6
8 8,859 2553.4 116.8 813.6 96.6
Results
Scale linkage
Equipercentile linking was used to develop linkages between STAR and SBA scale scores for reading and math.
The result of the analysis was a set of tables yielding equivalent SBA scores for each possible STAR score. These
results allow the user to look up the SBA ELA/literacy or mathematics test score that corresponds to every
possible STAR Reading or STAR Math score (see Figures 1a and 1b).
Figure 1a. Linkage of SBA ELA/literacy to the STAR Reading™ scale
Figure 1b. Linkage of SBA mathematics to the STAR Math™ scale
Correlations
Two sets of correlations were computed: one between the SBA scores and concurrent STAR scores, and another
between SBA scores and the SBA score equivalents (obtained from the linking). Tables 3a and 3b display these
correlations for reading and math respectively.
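Both sets are ordinary Pearson correlations over the matched records; a minimal sketch with hypothetical arrays (in practice the SBA score equivalents come from the linking tables):

```python
import numpy as np

sba = np.array([2380.0, 2425.0, 2460.0, 2510.0, 2390.0])        # observed SBA
star = np.array([350.0, 430.0, 470.0, 560.0, 380.0])            # concurrent STAR
sba_equiv = np.array([2385.0, 2430.0, 2455.0, 2505.0, 2395.0])  # from linking

r_concurrent = np.corrcoef(sba, star)[0, 1]      # SBA vs. concurrent STAR
r_equated = np.corrcoef(sba, sba_equiv)[0, 1]    # SBA vs. SBA score equivalents
print(round(r_concurrent, 2), round(r_equated, 2))
```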
For reading, the correlations between SBA ELA/literacy and STAR Reading scores averaged .82 and ranged from .81 to .83.
The correlations between SBA and SBA score equivalents were similar, averaging .83 and ranging from .82 to .85.
Table 3a. Pearson correlations between STAR Reading™ and SBA ELA/literacy (concurrent sample)
       SBA ELA/literacy score correlation with:
Grade  Concurrent STAR Reading scale scores    SBA ELA/literacy score equivalents
3 0.82 0.83
4 0.82 0.84
5 0.83 0.85
6 0.81 0.82
7 0.83 0.83
8 0.83 0.83
Average 0.82 0.83
For math, correlations between SBA mathematics and STAR Math scores were similar to those for reading, averaging .85
and ranging from .83 to .86. The correlations between SBA and SBA score equivalents averaged .87 and ranged
from .86 to .89.
Table 3b. Pearson correlations between STAR Math™ and SBA mathematics (concurrent sample)
       SBA Mathematics score correlation with:
Grade  Concurrent STAR Math scale scores    SBA Mathematics score equivalents
3 0.84 0.86
4 0.86 0.88
5 0.86 0.89
6 0.86 0.87
7 0.86 0.88
8 0.83 0.87
Average 0.85 0.87
STAR equivalents to SBA achievement level cut scores
A principal purpose of the linkage between STAR and SBA ELA/literacy and mathematics test scores was to
identify the scores on STAR Reading and STAR Math that are approximately equivalent to the cut-off scores that
separate achievement levels on the SBA tests. Tables 4a and 4b display those cut scores for reading and math in
grades 3 through 8, respectively.
Because the linking was done using a sample from just eight districts in four states, these cutscores should be
considered approximations that can be updated with greater precision as more data become available in the
future.
Table 4a. Equivalent STAR™ score achievement level ranges: Reading
Grade  Level 1  Level 2    Level 3     Level 4
3      0-332    333-454    455-561     562-1400
4      0-434    435-530    531-660     661-1400
5      0-483    484-599    600-852     853-1400
6      0-499    500-691    692-1015    1016-1400
7      0-576    577-772    773-1168    1169-1400
8      0-589    590-857    858-1249    1250-1400
Table 4b. Equivalent STAR™ score achievement level ranges: Math
Grade  Level 1  Level 2    Level 3    Level 4
3      0-546    547-620    621-686    687-1400
4      0-602    603-695    696-771    772-1400
5      0-692    693-781    782-826    827-1400
6      0-716    717-807    808-859    860-1400
7      0-745    746-830    831-885    886-1400
8      0-783    784-851    852-891    892-1400
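To illustrate how these linked cutscores would be applied, the sketch below maps a projected STAR Reading score to a forecasted SBA achievement level using the grade 3 cutscores from Table 4a. The function is illustrative only, not part of any STAR product.

```python
# Lower bounds of Levels 2, 3, and 4 for grade 3 reading (from Table 4a).
GRADE3_READING_CUTS = (333, 455, 562)

def forecast_level(projected_star, cuts=GRADE3_READING_CUTS):
    """Return the forecasted SBA achievement level (1-4) for a STAR score."""
    level = 1
    for cut in cuts:
        if projected_star >= cut:
            level += 1
    return level

print(forecast_level(500))  # -> 3, since 455-561 is the Level 3 range
```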
RMSEL and mean differences
Accuracy of the scale linkage was evaluated in two ways. The same scores used to complete the linking were used
to compute the root mean squared errors of linking (RMSEL). Additionally, the holdout sample (i.e., concurrent
scores not used to complete the linking) was used to evaluate differences between observed SBA scores and
SBA score equivalents. Tables 5a and 5b display these statistics by grade for reading and math, respectively.
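The report does not spell out the RMSEL formula; a plausible formulation, sketched below, is the root mean squared difference between observed SBA scores and their linked SBA score equivalents, with the holdout sample summarized by the mean, SD, minimum, and maximum of the same differences. The exact definition, and the sign convention of the differences, are assumptions here.

```python
import numpy as np

def rmsel(observed_sba, sba_equivalents):
    """Root mean squared difference between observed SBA scores and
    linked SBA score equivalents (assumed RMSEL formulation)."""
    d = np.asarray(observed_sba, float) - np.asarray(sba_equivalents, float)
    return float(np.sqrt(np.mean(d ** 2)))

def holdout_summary(observed_sba, sba_equivalents):
    """M, SD, Min, Max of holdout difference scores (cf. Tables 5a and 5b)."""
    d = np.asarray(observed_sba, float) - np.asarray(sba_equivalents, float)
    return {"M": d.mean(), "SD": d.std(ddof=1), "Min": d.min(), "Max": d.max()}
```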
Table 5a. Summary statistics from the ELA/literacy linkage (concurrent sample)
       Linking sample   Holdout sample difference scores
Grade  RMSEL            N        M        SD       Min     Max
3 31.16 1,107 0.69 49.87 -167 212
4 32.06 1,220 -0.56 52.49 -260 175
5 32.01 1,091 2.79 48.89 -137 177
6 32.55 807 -1.93 52.01 -241 170
7 35.07 632 -4.53 55.15 -275 135
8 34.83 621 4.99 55.67 -153 201
Table 5b. Summary statistics from the math linkage (concurrent sample)
       Linking sample   Holdout sample difference scores
Grade  RMSEL            N        M        SD       Min     Max
3      27.68            1,080    2.54     42.58    -210    193
4      25.68            1,058    -0.89    40.61    -159    177
5      28.00            975      -2.56    43.2     -148    162
6      33.30            785      0.86     48.68    -163    215
7      34.94            634      3.2      55.36    -196    420
8      40.83            542      0.87     63.67    -245    258
Classification accuracy
The predictive sample was used in analyses exploring the accuracy of using STAR tests taken earlier in the
school year to predict SBA performance based on STAR cutscores identified in the linking analysis.
Two sets of correlations were calculated to summarize the predictive power of the STAR test scores: raw
correlations between the projected STAR and observed SBA scale scores, and equated-score correlations
between the SBA score equivalents obtained from the linking and the observed SBA scores. The predictive
sample correlations were similar in magnitude to the correlations presented earlier for the concurrent sample,
indicating that projected STAR scores are reliable estimates of SBA performance. Tables 6a and 6b display these
correlations for reading and math, respectively.
For reading, the raw correlations averaged .84 (ranging from .83 to .85) and the correlations between SBA and
SBA score equivalents averaged .85 (ranging from .84 to .87).
Table 6a. Pearson correlations between projected STAR Reading™ scores and SBA ELA/literacy (predictive
sample)
       SBA ELA/literacy score correlation with:
Grade  Projected STAR Reading scale scores    SBA ELA/literacy score equivalents
3 0.84 0.86
4 0.84 0.86
5 0.85 0.87
6 0.83 0.84
7 0.84 0.85
8 0.84 0.84
Average 0.84 0.85
For math, the raw correlations averaged .87 (ranging from .83 to .88) and the correlations between SBA and SBA
score equivalents averaged .89 (ranging from .86 to .90).
Table 6b. Pearson correlations between projected STAR Math™ scores and SBA mathematics (predictive
sample)
       SBA mathematics score correlation with:
Grade  Projected STAR Math scale scores    SBA mathematics score equivalents
3 0.87 0.89
4 0.88 0.90
5 0.88 0.90
6 0.88 0.89
7 0.87 0.89
8 0.83 0.86
Average 0.87 0.89
Different evaluation data are presented for two-category (proficient vs. not proficient) versus four-category
(achievement level) projections. For the two-category projections, standard statistical classification diagnostics
were calculated. Because these can be quite complex in the multi-category situation, a simpler approach was
taken with four achievement levels.
Two-category proficiency status projections. Classification diagnostics were derived from counts of correct
and incorrect classifications that could be made when using STAR scores to predict whether or not a student
would be proficient on the SBA. The classification diagnostic formulas are outlined in Table 7a and the types of
classifications are summarized in Table 7b.
Table 7a. Descriptions of classification diagnostic accuracy measures
Measure                              Formula          Interpretation
Overall classification accuracy      (TP + TN) / N    Percentage of correct classifications
Sensitivity                          TP / (TP + FN)   Percentage of proficient students identified as such using STAR
Specificity                          TN / (TN + FP)   Percentage of not proficient students identified as such using STAR
Positive predictive value (PPV)      TP / (TP + FP)   Percentage of students STAR finds proficient who actually are proficient
Negative predictive value (NPV)      TN / (FN + TN)   Percentage of students STAR finds not proficient who actually are not proficient
Observed proficiency rate (OPR)      (TP + FN) / N    Percentage of students who achieve proficient
Projected proficiency rate (PPR)     (TP + FP) / N    Percentage of students STAR finds proficient
Proficiency status projection error  PPR – OPR        Difference between projected and observed proficiency rates
Table 7b. Schema for a fourfold table of classification diagnostic data

                               SBA Proficient                  SBA Not Proficient        Total
STAR estimate: Proficient      True Positive (TP)              False Positive (FP)       Projected Proficient (TP + FP)
STAR estimate: Not proficient  False Negative (FN)             True Negative (TN)        Projected Not (FN + TN)
Total                          Observed Proficient (TP + FN)   Observed Not (FP + TN)    N = TP + FP + FN + TN
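All of the Table 7a diagnostics reduce to simple arithmetic on the four cell counts of Table 7b; a minimal sketch with hypothetical counts:

```python
def classification_diagnostics(tp, fp, fn, tn):
    """Diagnostics from Table 7a, computed from the fourfold counts of Table 7b."""
    n = tp + fp + fn + tn
    opr = (tp + fn) / n                  # observed proficiency rate
    ppr = (tp + fp) / n                  # projected proficiency rate
    return {
        "accuracy": (tp + tn) / n,       # overall classification accuracy
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (fn + tn),
        "opr": opr,
        "ppr": ppr,
        "projection_error": ppr - opr,   # PPR - OPR
    }

# Hypothetical counts for one grade:
print(classification_diagnostics(tp=420, fp=80, fn=70, tn=430))
```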
Classification accuracy diagnostics are presented in Tables 8a and 8b for reading and math, respectively.
On average, students were correctly classified as either proficient or not (i.e., overall classification accuracy)
84% of the time for reading and 87% of the time for math. For reading, the forecasts were accurate between 83%
and 85% of the time depending on the grade, while for math they were accurate between 86% and 88% of the
time.
Sensitivity statistics (i.e., the percentage of proficient students correctly forecasted) averaged 86% for reading,
with a range from 83% to 88%. For math, sensitivity statistics averaged 84%, with a range from 81% to 88%.
Sensitivity is positively related to observed proficiency rate, with similar trends emerging across the two
metrics. Specificity statistics (i.e., the percentage of not proficient students correctly forecasted) were similar to
sensitivity, averaging 82% for reading and 88% for math. Specificity is negatively related to observed proficiency
rate, so grades with higher observed proficiency rates tend to have lower specificity.
Positive predictive values averaged 84% for reading and ranged from 82% to 85%. For math, positive predictive
values averaged 84% and ranged from 83% to 86%. Therefore, when STAR scores forecasted students to be
proficient, they actually were proficient 84% of the time for reading and 84% of the time for math.
For reading, negative predictive values were similar to positive predictive values, averaging 85% and ranging
from 83% to 86%. For math, negative predictive values were also similar to positive predictive values, averaging
88% and ranging from 87% to 89%. The negative predictive value results indicated that when STAR scores
forecasted that students were not proficient, they actually were not proficient 85% of the time for reading and
88% of the time for math.
For reading, differences between the observed and projected proficiency rates (i.e., proficiency status projection
error) indicated that projected STAR Reading scores tended to accurately predict proficiency rates across
grades. Positive values of proficiency status projection error indicate over-prediction and negative values
indicate under-prediction. For reading, proficiency status projection errors averaged 1% and ranged from 0% to
3%. For math, proficiency status projection errors averaged 0% and ranged from -2% to 3%.
Finally, the area under the ROC curve (AUC) is a summary measure of diagnostic accuracy. The National Center
on Response to Intervention has set an AUC of 0.85 or higher as indicating convincing evidence that an
assessment can accurately predict another assessment result or outcome. In this study, both STAR Reading and
STAR Math well exceeded that standard. For reading, the average AUC was .92, with a range from .91 to .93,
indicating that STAR scores did a very good job of discriminating between which students scored proficient on
the SBA for ELA/literacy and which did not. For math, the average AUC was .94, with a range from .93 to .95,
indicating that STAR scores did a very good job of discriminating between which students scored proficient on
the SBA for math and which did not.
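The report does not state which AUC implementation was used; any standard ROC routine computes the same quantity. A sketch using scikit-learn's roc_auc_score with hypothetical data:

```python
from sklearn.metrics import roc_auc_score

# y_true: 1 if the student scored proficient (Level 3 or 4) on the SBA, else 0.
# scores: projected STAR scaled scores used as the predictor.
y_true = [0, 0, 1, 0, 1, 1, 1, 0, 1, 0]                      # hypothetical
scores = [410, 445, 520, 430, 600, 575, 480, 400, 610, 455]  # hypothetical
print(roc_auc_score(y_true, scores))  # 0.5 = chance, 1.0 = perfect discrimination
```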
Table 8a. Classification diagnostics for reading
                                     Grade
Measure                              3     4     5     6     7     8
Overall classification accuracy 85% 84% 85% 83% 84% 84%
Sensitivity 83% 86% 88% 85% 86% 86%
Specificity 86% 81% 82% 80% 82% 81%
Positive predictive value (PPV) 84% 82% 85% 82% 85% 84%
Negative predictive value (NPV) 86% 86% 86% 83% 83% 84%
Observed proficiency rate (OPR) 46% 50% 52% 51% 54% 53%
Projected proficiency rate (PPR) 46% 53% 54% 53% 54% 54%
Proficiency status projection error 0% 3% 2% 2% 1% 1%
Area under the ROC curve 0.93 0.93 0.93 0.91 0.92 0.92
Table 8b. Classification diagnostics for math
                                     Grade
Measure                              3     4     5     6     7     8
Overall classification accuracy 86% 86% 88% 86% 88% 86%
Sensitivity 88% 88% 82% 81% 84% 82%
Specificity 83% 85% 92% 90% 90% 89%
Positive predictive value (PPV) 83% 83% 86% 85% 86% 83%
Negative predictive value (NPV) 88% 89% 89% 87% 89% 88%
Observed proficiency rate (OPR) 49% 47% 38% 42% 42% 39%
Projected proficiency rate (PPR) 52% 49% 36% 40% 41% 39%
Proficiency status projection error 3% 2% -2% -2% -1% 0%
Area under the ROC curve 0.94 0.94 0.95 0.94 0.95 0.93
Four-category achievement level projections. This section deals with the accuracy of forecasts of a more
finely grained classification of student performance according to the four SBA achievement levels.
Compared to the two-category analyses, the four-level analyses were more straightforward. They focused on
two components: 1) the percentage of agreement between classifications based on SBA test scores and those
based on projected STAR scores and 2) comparisons of the projected and actual percentages of students who
achieved each level on the SBA.
Table 9 lists two measures of accuracy of the achievement level classifications based on linked projected STAR
scores. The first is the percent of perfect agreement between classifications based on SBA scores and those
based on projected STAR scores. The second measure is the percent of agreement within +/- one achievement
level.
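Both measures follow directly from the paired level classifications; a minimal sketch with hypothetical levels:

```python
def agreement_rates(actual, forecasted):
    """Percent perfect agreement and agreement within one level (cf. Table 9)."""
    pairs = list(zip(actual, forecasted))
    exact = sum(a == f for a, f in pairs) / len(pairs)
    within_one = sum(abs(a - f) <= 1 for a, f in pairs) / len(pairs)
    return exact, within_one

# Six students' actual vs. forecasted levels (1-4), hypothetical:
print(agreement_rates([1, 2, 3, 4, 2, 3], [1, 3, 3, 3, 2, 4]))  # (0.5, 1.0)
```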
Classification accuracy was similar for reading and math. For reading, perfect agreement rates averaged 62%
and ranged from 61% to 63%. For math, they averaged 66% and ranged from 63% to 67%. The percent
agreement within +/- one achievement level was above 95% for reading and math in all grades. These trends in
classification errors suggest that STAR scores can very accurately predict a student’s general SBA performance
(within one level above or below their actual achievement level) but are less precise when used to predict a
student’s specific SBA achievement level.
Table 9. Accuracy of projected STAR™ scores for predicting SBA achievement levels
       Reading                                     Math
Grade  Perfect agreement   Within +/- one level    Perfect agreement   Within +/- one level
3 62% 97% 64% 99%
4 61% 96% 67% 99%
5 63% 98% 67% 99%
6 62% 98% 66% 98%
7 63% 98% 67% 99%
8 62% 99% 63% 97%
Tables 10a and 10b list the percentages of students who were in each SBA achievement level versus those
forecasted to be in each level for reading and math, respectively.
For both reading and math, the achievement level projection errors (i.e., differences between the percentage of
actual and forecasted achievement) averaged 0%, but tended to slightly over-predict moderate performance
(e.g., Levels 2 and 3) and slightly under-predict extreme performance (e.g., Levels 1 and 4). The math projection
errors ranged from -8% to 7%, and the reading errors ranged from -4% to 4%.
Table 10a. Actual versus forecasted percentages of students for each SBA ELA/literacy achievement level
       Level achieved           Level forecasted          Difference (forecasted – achieved)
Grade  1     2     3     4      1     2     3     4       1     2     3     4
3 28% 26% 23% 23% 25% 30% 23% 22% -3% 4% 0% -1%
4 29% 21% 24% 26% 25% 22% 26% 27% -4% 1% 2% 1%
5 27% 21% 31% 21% 23% 23% 35% 19% -4% 2% 4% -2%
6 19% 29% 35% 17% 15% 32% 38% 15% -4% 3% 3% -2%
7 23% 24% 36% 17% 19% 27% 39% 15% -4% 3% 3% -2%
8 20% 27% 36% 17% 16% 30% 40% 14% -4% 3% 4% -3%
Table 10b. Actual versus forecasted percentages of students for each SBA mathematics achievement level
       Level achieved           Level forecasted          Difference (forecasted – achieved)
Grade  1     2     3     4      1     2     3     4       1     2     3     4
3 26% 26% 28% 20% 18% 30% 35% 17% -8% 4% 7% -3%
4 21% 32% 27% 20% 16% 35% 34% 15% -5% 3% 7% -5%
5 34% 28% 17% 21% 29% 35% 19% 17% -5% 7% 2% -4%
6 27% 31% 21% 21% 23% 37% 23% 17% -4% 6% 2% -4%
7 29% 29% 23% 19% 27% 32% 26% 15% -2% 3% 3% -4%
8 34% 26% 19% 21% 33% 28% 20% 19% -1% 2% 1% -2%
Conclusions and applications
The equipercentile linking method was used to link the STAR Reading and STAR Math score scales to their
counterpart Smarter Balanced Assessment (SBA) score scales in grades 3 through 8. The result of each linkage
analysis was a table of approximately equivalent SBA scores for each possible STAR score in that grade. Using the tables of linked
scores, we identified STAR Reading and STAR Math scores that were linked to the cutscores for SBA achievement
levels (reported in Tables 4a and 4b). Because the linking was done using a sample from just eight districts in
four states, and may not be representative of the student population in each state included in the linking or all
SBA states, these cutscores should be considered approximations that can be updated with greater precision as
more data become available in the future.
Correlations indicated a strong relationship between the STAR and SBA tests. On average, the correlation
between SBA and concurrent STAR scores (i.e., STAR tests taken within +/- 30 days of the SBA mid-date) was .82
for reading and .85 for math. Similarly, the average correlation between SBA and predictive STAR scores (i.e.,
STAR tests taken earlier and projected to the SBA mid-date) was .84 for reading and .87 for math. When
projecting STAR scores to estimate SBA performance, students were correctly classified as either proficient or
not 84% of the time for reading and 87% for math.
The statistical linkages between STAR interim assessments and the SBA for ELA/literacy and mathematics
provide a means of forecasting student achievement on the SBA based on STAR scores obtained earlier in the
school year. Example STAR Reading and STAR Math reports that utilize the STAR-SBA linking are provided in the
Appendix. They include individualized reports, which compare each student’s STAR performance to the growth
trajectory that typically would lead to proficiency on the SBA, as well as group-level performance reports that
forecast the number of students expected to score at each achievement level on the SBA. Both types
of information can be used to help educators determine early and periodically which students are on track to
reach proficiency and make decisions accordingly.
References
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). New
York, NY: Springer Science+Business Media.
Perie, M., Marion, S., Gong, B., & Wurtzel, J. (2007). The role of interim assessments in a comprehensive
assessment system. Aspen, CO: Aspen Institute.
Renaissance Learning. (2015a). STAR Math: Technical manual. Wisconsin Rapids, WI: Author. Available from
Renaissance Learning by request to [email protected]
Renaissance Learning. (2015b). STAR Reading: Technical manual. Wisconsin Rapids, WI: Author. Available from
Renaissance Learning by request to [email protected]
Renaissance Learning. (2014). The research foundation for STAR Assessments: The science of STAR. Wisconsin
Rapids, WI: Author. Available online from http://doc.renlearn.com/KMNet/R003957507GG2170.pdf
Independent technical reviews of STAR Reading and STAR Math
U.S. Department of Education: National Center on Intensive Intervention. (2015a). Review of progress monitoring
tools [Review of STAR Math]. Washington, DC: Author. Available online from
http://www.intensiveintervention.org/chart/progress-monitoring
U.S. Department of Education: National Center on Intensive Intervention. (2015b). Review of progress monitoring
tools [Review of STAR Reading]. Washington, DC: Author. Available online from
http://www.intensiveintervention.org/chart/progress-monitoring
U.S. Department of Education: National Center on Response to Intervention. (2010a). Review of progress
monitoring tools [Review of STAR Math]. Washington, DC: Author. Available online from
https://web.archive.org/web/20120813035500/http://www.rti4success.org/pdf/progressMonitoringGOM.pdf
U.S. Department of Education: National Center on Response to Intervention. (2010b). Review of progress
monitoring tools [Review of STAR Reading]. Washington, DC: Author. Available online from
https://web.archive.org/web/20120813035500/http://www.rti4success.org/pdf/progressMonitoringGOM.pdf
U.S. Department of Education: National Center on Response to Intervention. (2011a). Review of screening tools
[Review of STAR Math]. Washington, DC: Author. Available online from
http://www.rti4success.org/resources/tools-charts/screening-tools-chart
U.S. Department of Education: National Center on Response to Intervention. (2011b). Review of screening tools
[Review of STAR Reading]. Washington, DC: Author. Available online from
http://www.rti4success.org/resources/tools-charts/screening-tools-chart
Appendix: Sample reports 3
Reading dashboard
The Reading Dashboard combines key data from across Renaissance’s products to provide a 360-degree view of
student performance and progress toward goals. At-a-glance data paired with actionable insights help teachers
drive student growth to meet standards and ensure college and career readiness. Educators can review data at
the student, group, or class level and across various timeframes to determine what’s working, what isn’t, and,
most importantly, where to focus attention to drive growth.
3 Reports are regularly reviewed and may vary from those shown as enhancements are made.
Performance reports focusing on the pathway to proficiency
The report graphs the student’s STAR Reading or STAR Math scores and trend line (projected growth) for easy
comparison with the pathway to proficiency.
Group performance reports
The Group Performance Report compares students' performance on the STAR Assessments to the pathway to proficiency for annual state tests and summarizes the results.
It helps you see how groups of students (whole class, for example) are progressing toward proficiency. The report displays the most current data as well as historical data as
bar charts so that you can see patterns in the percentages of students on the pathway to proficiency and below the pathway.
Growth proficiency chart
STAR Assessments’ Student Growth Percentiles and expected state assessment performance are viewable by district, school, grade, or class. In addition to Student Growth
Percentiles, other key growth indicators such as grade equivalency, percentile rank, and instructional reading level are also available to help educators identify best
practices that are having a significant educational impact on student growth.
High-level performance reports
Reports for administrators provide periodic, high-level forecasts of student performance on the state tests. They
include a performance outlook for each achievement level and options for how to group and list information.
Acknowledgments
The following experts have advised Renaissance Learning in the development of the STAR Assessments.
Thomas P. Hogan, Ph.D., is a professor of psychology and a Distinguished University
Fellow at the University of Scranton. He has more than 40 years of experience conducting
reviews of mathematics curricular content, principally in connection with the preparation of
a wide variety of educational tests, including the Stanford Diagnostic Mathematics Test,
Stanford Modern Mathematics Test, and the Metropolitan Achievement Test. Hogan has
published articles in the Journal for Research in Mathematics Education and Mathematical Thinking and Learning,
and he has authored two textbooks and more than 100 scholarly publications in the areas of measurement and
evaluation. He has also served as consultant to a wide variety of school systems, states, and other organizations
on matters of educational assessment, program evaluation, and research design.
James R. McBride, Ph.D., is vice president and chief psychometrician for Renaissance
Learning. He was a leader of the pioneering work related to computerized adaptive testing
(CAT) conducted by the Department of Defense. McBride has been instrumental in the
practical application of item response theory (IRT) and since 1976 has conducted test
development and personnel research for a variety of organizations. At Renaissance Learning,
he has contributed to the psychometric research and development of STAR Math, STAR
Reading, and STAR Early Literacy. McBride is co-editor of a leading book on the development of CAT and has
authored numerous journal articles, professional papers, book chapters, and technical reports.
Michael Milone, Ph.D., is a research psychologist and award-winning educational writer
and consultant to publishers and school districts. He earned a Ph.D. in 1978 from The Ohio
State University and has served in an adjunct capacity at Ohio State, the University of
Arizona, Gallaudet University, and New Mexico State University. He has taught in regular and
special education programs at all levels, holds a Master of Arts degree from Gallaudet
University, and is fluent in American Sign Language. Milone served on the board of directors
of the Association of Educational Publishers and was a member of the Literacy Assessment Committee and a
past chair of the Technology and Literacy Committee of the International Reading Association. He has
contributed to both readingonline.org and Technology & Learning magazine on a regular basis. Over the past 30
years, he has been involved in a broad range of publishing projects, including the SRA reading series,
assessments developed for Academic Therapy Publications, and software published by The Learning Company
and LeapFrog. He has completed 34 marathons and 2 Ironman races.