21 Accountability, Achievement, and Inequality in American Public Schools: A Review of the Literature 481
instruction.” 100% of teachers in schools below the AYP margin reported doing so, while 67% of teachers at lower risk of failing AYP did.

Despite their varying data sources and strategies, these studies are all observational. A more optimal approach to identifying test-based instruction and resultant score inflation is to use “self-monitoring assessments.” First introduced by Koretz and Beguin (2010), these assessments incorporate audit items into actual high-stakes tests. These audit items are sufficiently novel that they are not susceptible to test preparation techniques. Only one study has implemented such an approach: In New York State, students received multiple embedded items that attempted to “undo” the predictable features of the test. For example, if a state standard required students to understand positive and negative slopes but consistently only tested positive slopes, an audit item was included to test understanding of negative slopes. Koretz et al. (2016) report the results of this experiment, finding that marginal or “bubble” students (those closest to passing and thus most important to coach) were most likely to perform relatively worse on the audit items. This suggests that these students’ test-focused learning did not transfer to other assessments of the same skills.

Qualitative and survey studies have documented similar shifts in attention between students. For example, studies document that teachers focus on “bubble” students, those close to the proficiency cut score (Booher-Jennings 2005; Hamilton et al. 2007). All of these instructional shifts have been enabled, in part, by the new forms of testing technology and data that have proliferated in response to accountability pressure. In particular, there has been extensive growth in the use of benchmark assessments designed to help schools track and support students’ mastery of standards (Datnow and Hubbard 2015). Although there is no nationally representative data on benchmark assessment use, we can indirectly track their growth over time by considering the changing size of their market. In 2003, districts nationwide spent $212 million on tools related to benchmark assessments (Olson 2005); by 2011, this figure had risen to $2.2 billion (Lazarín 2014). Reflecting this rapid growth, a 2009 survey found that benchmark assessments were nearly universal among 62 of the country’s largest urban districts (Council of the Great City Schools 2011).

Typically administered three or more times a year, these exams have helped to transform school systems into testing and data-intensive environments (Lazarín 2014; Council of the Great City Schools 2015). Both teachers and administrators are now under pressure to practice data-driven decision-making. Large-scale evaluations of benchmark test use have failed to find clear positive or negative effects on state test scores (Konstantopoulos et al. 2013, 2016; Cordray et al. 2012; Slavin et al. 2013). Moreover, certain case studies provide reason to be skeptical of the extent to which teachers have actually embraced district efforts to promote data use (Means et al. 2010). Nevertheless, in some school contexts, benchmark test data and institutional pressure for “data-driven decision making” have reshaped teacher beliefs and behaviors in consequential ways (Booher-Jennings 2005; Marsh et al. 2006). Even so, these shifts have not typically produced “deep” changes in pedagogy. That is, they have not fundamentally changed how teachers engage students around instructional content, and didactic instruction continues to dominate classrooms (Diamond 2007). With these findings in mind, we turn to our review of test-based accountability’s apparent effects on students.

21.3.2 Student Outcome Consequences

The preceding review of instructional responses suggests that one needs to exercise considerable caution in interpreting any changes in students’ high-stakes test scores that are observed under accountability pressure. It is for this reason that we do not review the substantial body of work that concludes that K–12 accountability systems have had positive average effects on high-stakes state test scores (Chakrabarti 2007; Chiang 2009; Lauen and Gaddis 2012; Reback et al. 2014;
482 J. Mittleman and J. L. Jennings
Rockoff and Turner 2010; Rouse et al. 2013; Springer 2008; Winters and Cowen 2012). Instead, we focus our review on three broader domains of student outcomes: students’ knowledge and skills (as measured by low-stakes tests), students’ educational attainment and labor market outcomes (in terms of high school graduation, college enrollment, and earnings), and students’ identities (in terms of how social meaning attaches to test-based categorical inequalities).

21.3.2.1 Students’ Knowledge and Skills

Compared to the large literature assessing impacts on high-stakes tests, relatively few studies have assessed the impact of NCLB on students’ achievement on low-stakes tests. Dee and Jacob’s (2009) study of the effects of NCLB on National Assessment of Educational Progress (NAEP) scores found increases in state NAEP scores in 4th and 8th grade math, but no increases in reading for either 4th or 8th grade. Dividing these average effects into subgroups, Dee and Jacob (2009) identified larger positive effects on 4th grade math scores for Black and Hispanic students than for White students; at the same time, in 4th grade reading, only White students gained while Black and Hispanic students did not.

Like Dee and Jacob, Wong et al. (2009) found positive effects on 4th and 8th grade NAEP math scores. They also found evidence for positive effects on 4th grade NAEP reading scores, but only when states had high standards for proficiency. Reback et al. (2014) analyzed data from the Early Childhood Longitudinal Study-Kindergarten (ECLS-K), finding small positive effects of NCLB accountability pressure on the ECLS-K reading and science tests, but no effects on the math test.

Taken together, the existing evidence suggests that the positive effects of NCLB identified using high-stakes test scores do translate somewhat into gains on low-stakes tests that are less likely to be corrupted by score inflation. However, the evidence is equivocal: Low-stakes test score gains are found for some students on some subjects under certain circumstances.

Whereas the above studies focused on average effects by subject and student group, others have focused on the distributional impacts of accountability systems. That is, they explore the heterogeneous effects of accountability pressure on students across the test score distribution. Such effects are of particular interest because most current accountability systems rely on proficiency rates, a threshold measure of achievement. Since sanctions are a function of the proportion of students brought over the proficiency threshold, slightly increasing the scores of a small number of students (the “bubble” students discussed earlier in this chapter) can positively impact the school’s accountability rating.

A large body of evidence addresses this issue, finding mixed results on the extent to which teachers use data to target resources to students. One study found negative effects of accountability pressure on the lowest performing students in Chicago (Neal and Schanzenbach 2007), while another in Texas found positive effects for low-performing students as well as larger gains for marginal students (Reback 2008). In total, four studies identified positive effects on low-performing students (Jacob 2005; Springer 2008; Ladd and Lauen 2010; Dee and Jacob 2009), while another four found negative effects on high-performing students (Krieg 2008; Ladd and Lauen 2010; Dee and Jacob 2009; Reback 2008).

Two more recent studies attempt to make sense of variation in distributional effects across contexts and time periods (Jennings and Sohn 2014; Lauen and Gaddis 2016). These studies suggest that the mixed findings in the literature can be explained by three factors. First, because accountability pressure incentivizes schools to focus attention on students closest to the proficiency standard, the difficulty of the standard itself affects whether lower or higher performing students will gain most. Less difficult proficiency standards appear to decrease inequality in high-stakes achievement, while more difficult ones increase it. Second, when targeting students near proficiency, educators appear to emphasize test-specific skills. Therefore, the effect of accountability-induced targeting should differ across high- and low-stakes tests. For example,
Jennings and Sohn (2014) evaluated student scores on high- and low-stakes tests of similar skills administered within the same high-standards context. They found an inequality-increasing focus on students close to proficiency on the high-stakes tests, but no effects on inequality on the low-stakes tests. Finally, it appears that focusing attention on students close to proficiency is most pronounced in the lowest-performing schools. This may help explain why these effects have been identified more in locations with a higher fraction of low-performing schools.

21.3.2.2 Students’ Educational and Labor Market Outcomes

One major gap in our knowledge is understanding accountability’s effect on students’ later life outcomes. From A Nation at Risk to the Common Core standards movement, standards and accountability have been justified with appeals to the challenge of success in a knowledge-based economy. Therefore, one of the crucial assumptions motivating test-based accountability systems is that promoting test score gains will ultimately enhance students’ ability to succeed after high school. Surprisingly, this assumption remains almost entirely untested.

Deming et al. (2016) offer the first evidence on how accountability pressure impacts students’ trajectories up to and after high school graduation. Using longitudinal data from Texas, they compare cohorts within schools that faced different degrees of accountability pressure. The results are mixed. They find that students in high schools facing pressure to avoid a “Low-Performing” rating experienced several positive outcomes: They were more likely to graduate on time, accumulated more high school math credits, were more likely to attend and graduate from a four-year college, and they had higher earnings at age 25. However, the effects of accountability were not uniformly positive. Within those schools on the cusp of a “Recognized” rating, accountability pressure had no effect overall and appeared to cause significant negative long-term effects on poorly performing students. These negative long-term effects appear to be the result of poorly performing students being strategically funneled into special education.

21.3.2.3 Students’ Identities

High-stakes test score data divide students into multiple levels of proficiency based on their scores. Students can be labeled as commended, meeting the standard, or not meeting the standard. Teachers and schools use these scores for organizational purposes such as sorting students into advanced courses or remediation opportunities. States sometimes allocate scholarship opportunities based on these scores. Even beyond these institutionalized consequences, however, there is reason to believe that test score labels could come to have broad significance for students. Ever since the Pygmalion study found that providing randomly assigned performance labels to teachers could affect students’ subsequent performance on standardized tests (Rosenthal and Jacobsen 1968), research has found that arbitrary performance labels can have real educational consequences. In particular, prior research on students’ responses to tracking labels (Oakes 1985) suggests that high-stakes test scores plausibly affect student identities, engagement in school, peer dynamics, teacher and parental expectations, and future educational decisions.

Two quantitative studies have examined the impact of state accountability-based test score labels on students’ future achievement and decision-making. Papay et al. (2011) found that students earning an “advanced” label on Massachusetts’ exit exam were more likely to attend college than those who scored just below this cut score. Domina et al. (2016) find similar evidence of student responses to performance labels, documenting declines in test scores and grades after a student received a low-status performance label. These results do not directly address student beliefs and attitudes; still, they provide strong, indirect evidence that accountability labels do not go unnoticed by students.

Beyond the test performance labels used in accountability systems, there is evidence that increased state accountability pressure is also associated with a different type of label: ADHD diagnoses. Bokhari and Schneider (2011), for
example, determined that demanding state accountability laws increased prescriptions for stimulant drugs. King et al. (2014), contrasting stimulant use in the summer and school year, found the largest use differences for higher-SES children living in states with strict accountability policies. How accountability-induced diagnostic labels affect students over their life course remains an open question. What is clear, however, is that accountability systems have facilitated new forms of categorization, which have the potential to be internalized by students.

21.3.3 Policy Feedback Consequences

Finally, test-based accountability systems do not only impact teachers, students, and schools: They also affect public opinion in ways that may dynamically feed back into the classroom. Because measures themselves play a central role in constructing social problems (Espeland and Stevens 1998), test score data help shape the public understanding of educational achievement and inequality. In this way, the data produced by current accountability systems frame future policy debates, encouraging certain actions while forestalling others.

A growing body of research documents that the school quality indicators disseminated as part of accountability programs could have the unintended effect of reducing public support for schools. Kogan et al. (2015), for example, use data from Ohio tax referenda to demonstrate that voters in school districts that failed to meet AYP were 10% less likely to approve subsequent levies. These votes reduced district revenue by over 13% and disproportionately affected already impoverished districts. Barrows (2014) provides complementary evidence from school board elections in Florida. Applying a regression discontinuity design to the “A–F” letter grades that Florida assigns to local schools, Barrows finds that voters are significantly less likely to vote for school board incumbents in precincts wherein the closest elementary school scored a “B” rather than an “A”. Below the “A/B” threshold, school ratings did not have appreciable effects on voting, although there was suggestive evidence for a further penalty at the “D/F” threshold.

These studies suggest that school ratings information, independent of underlying school quality, affects broader support for local schools. Jacobsen et al. (2013) provide direct evidence for this effect. Using parent surveys in New York City before and after a large increase in the city’s standards, which caused 71% of the city’s schools to fall at least one letter grade on the city’s report card, they find significant, albeit small, decreases in reported parent satisfaction. A national survey experiment by the same authors (Jacobsen et al. 2014) revealed that it is not only the substance of school ratings that affects attitudes toward schools; the actual style in which ratings are presented (i.e., letter grades vs. percent proficient) can accentuate or undermine public approval of schools. Like Barrows (2014), the authors find that these style differences only impacted approval ratings for highly and poorly rated schools without any effect on middling schools.

Whereas the above studies focus on the effect of accountability ratings, Rhodes (2015) attempts to assess the impact of accountability systems as a whole. Combining original survey data with summary indicators of the strength of states’ accountability systems, Rhodes finds that system strength is associated with significantly lower reports of trust in government, less confidence in government efficacy, and more negative attitudes about schools. Despite her extensive use of individual and state-level controls, however, Rhodes’ results are difficult to interpret given the likelihood of reverse causality and omitted variables. Still, in combination with the other studies reviewed, Rhodes provides added cause for concern that efforts to spur school improvement through test-based accountability may have had the unintended consequence of undercutting future attempts at mobilizing broad-based support for reform.
21.4 Research on Alternatives to Test-Based Accountability

The above review suggests that, despite certain positive effects, the national shift toward test-based accountability has also been associated with considerable costs. As such, it is important to evaluate alternative mechanisms that could maintain accountability while avoiding some of the unintended consequences of the current test-based system. In this section, we briefly review research on three proposed alternatives: market-based accountability, professional accountability, and process-based accountability.

21.4.1 Market-Based Accountability

One alternative to regulatory test-based accountability is market-based accountability, by which families can “vote with their feet” and attend schools that better meet their preferences or needs. Such systems are often implemented alongside test-based ratings programs. Indeed, one of the main drivers for school improvement envisioned by NCLB was the coupling of market-based accountability with test-based accountability. In the first year that a school was identified as in need of improvement, the district was required to allow students to transfer out of that school into a better performing school within the district.

The experience of public school choice under NCLB, however, provides reason to be skeptical of the extent to which market-based accountability on its own can deliver on the promise of system-wide improvement. Remarkably, the most recent evidence suggests that only about 1% of eligible students took advantage of NCLB’s public school choice provisions (Stecher et al. 2010). Two facts help explain this low take-up rate. First, over a third of districts reported that they simply had no schools available for transfers, often because every school serving the relevant grade level was also failing AYP. Second, survey evidence suggests that school quality reports were not salient enough to parents to warrant action: In eight large districts, only 19% of parents knew that their child’s school was designated as needing improvement (Stecher et al. 2010).

These patterns do not appear to be limited only to NCLB. Henderson (2010) found a similar lack of response under Florida’s statewide school “A–F” grading system.

Moreover, families’ apparent unresponsiveness to NCLB-era school ratings is also consistent with research on other school rating and choice programs that preceded NCLB (Lauen 2007; Rich and Jennings 2015). This research demonstrates that information on school quality, in itself, was not enough to disrupt the socially and contextually constrained process of school enrollment.

An irony of NCLB and its predecessors is that, even though they failed in their intended goal of promoting student mobility, they had the unintended consequence of promoting staff mobility. Numerous studies have demonstrated that the “shock” of a negative accountability rating promotes attrition out of affected schools (Clotfelter et al. 2004; Sims 2009; Hanushek and Rivkin 2010; Feng et al. 2013). This attrition is particularly pronounced among experienced teachers (Sims 2009) and high value-added teachers (Feng et al. 2013). Li (2015) provides similar evidence for principals, showing that the onset of NCLB in North Carolina corresponded with high value-added principals moving to schools less likely to be sanctioned under the new system.

The limited mobility of students but strategic mobility of teachers and principals provides a sobering corrective to the narrative that competition will spur system-wide improvement. In systems marked by residential segregation and inequality in family resources, market-based accountability is unlikely to secure adequate opportunities for all students.

21.4.2 Professional Accountability

Given the risks of relying on families to enforce accountability through school choice, a potentially attractive alternative is to better empower the professionals within school systems through
professional accountability. Test-based accountability programs are motivated in part by a perceived need to resolve a principal-agent problem. Confidence in institutions, including public education, has eroded over the last four decades (Lipset and Schneider 1983). The public no longer trusts teachers and administrators to act in the best interests of students without oversight. By providing a way to monitor and incentivize teachers’ behavior, test-based accountability programs fill the gap of public distrust. An alternative approach to accountability, therefore, would be to address this distrust itself. One way to do this would be by shoring up teachers’ status as professionals: highly qualified, expertly trained in a specialized body of knowledge, and in need of sufficient autonomy to accomplish their work.

Elementary and secondary school teachers in America have fought recurring battles to assert their professional status (Ingersoll and Merrill 2011). As of 2013, only 34% of American teachers agreed that “the teaching profession is valued in society,” compared to 59% of teachers in Finland, 67% in Korea, and 84% in Malaysia (OECD 2015). American teachers’ perceptions of their poor standing find apparent validation in international comparisons of teacher salary and working conditions. Compared to 33 other OECD countries, the average salary of American teachers ranks 28th, despite the fact that American teachers rank 6th in terms of the total number of hours that they work (OECD 2014, 2015).

Cross-national comparison of the teaching profession, and of associated student outcomes, is a fraught exercise; disconfirming evidence can be found for nearly any generalization. Nevertheless, Goldhaber (2009, p. 97) suggests that the countries that perform best on international assessments typically train their teachers the way that America trains its doctors. These countries have relatively few training programs, more applicants than available slots, a high degree of standardization across programs, a promise of a relatively high permanent income after certification, and an extended period of post-licensure training under the supervision of more senior doctors. None of these conditions hold in the American teaching profession.

However, even though doctors appear to offer a model for the professionalization of teachers, this arguably reflects a misunderstanding of how professionalization has affected the quality and cost of American medicine. As Starr (1983) has convincingly shown, professional power is often not used to improve practice, but to deflect potential entrants to markets and control price. American physicians’ professional control has contributed to a health care system that is more costly than that of any other country in the world, while still being less effective on almost every quality measure available (Garber and Skinner 2008).

To be sure, professionalization in conjunction with other performance evaluation measures holds promise as a complement or supplement to test-based accountability. However, we are aware of no existing evidence that convincingly indicates that professionalization alone will drive improvements in student outcomes and reductions in cost.

21.4.3 Inspectorate or Process-Based Accountability

Finally, school reformers do not only look abroad for alternate models of selecting, training and compensating teachers; they also point to alternative systems of test-based accountability itself. One feature of accountability systems found in many countries, particularly in Europe, is formal school inspections. Although “the practice of school inspections varies considerably among and within countries,” 24 out of the 31 OECD countries surveyed in 2009 reported that school inspections were part of their accountability system (OECD 2011, p. 434). In such a system, trained external evaluators visit schools and assess them on a range of measures. The results of these visits are publicly reported, along with guidance for improvement, and may be tied to reward or sanction.

In theory, because they utilize expert judgment about a holistic range of factors, inspection systems provide accountability without the unintended consequences of a mechanistically test-based system. Under the English system,
for example, inspectors determine whether a school is Outstanding, Good, Satisfactory, or Failing based on 27 dimensions, including “the extent of pupils’ spiritual, moral, social, and cultural development” (Jerald 2012, p. 7). Despite this holistic approach, student test scores still play a central role in determining schools’ overall ratings. Moreover, these ratings still carry consequences: Since 1993, at least 230 schools have been shut down because of failure to improve after multiple inspections (Hussain 2015; Jerald 2012).

Despite the theoretical appeal of inspections, there is little evidence on the effect that they have on schools and students. Advocates for adopting an English-style inspectorate in America point to the fact that, on average, English schools designated as Failing require only 20 months and 3 or 4 follow-up inspections before they are upgraded to Satisfactory or better (Jerald 2012). This kind of evidence is clearly not sufficient for establishing causal effects. Existing work attempting to estimate the causal effect of inspection on student achievement has come to mixed conclusions. In Denmark, Luginbuhl et al. (2009) find no effect of being inspected. In England, Rosenthal (2004) finds a small negative effect of being inspected at all, whereas Allen and Burgess (2012) and Hussain (2015) find small to moderate positive effects of being inspected and receiving a “Fail” rating.

The mechanisms behind these effects are unclear. Advocates point to the detailed, actionable feedback that schools are supposed to receive after inspection (Jerald 2012). However, Hussain (2015) questions the impact that feedback, in itself, has on student achievement. All schools, Hussain notes, receive feedback, but this feedback apparently has no effect in schools not rated Failing. Nevertheless, Hussain (2015) argues that Fail ratings promote some genuine change in school practice, as he finds no evidence of “gaming” behavior and finds that effects persist even after students leave the Failing school. In another encouraging contrast to the American studies reviewed above, he also finds no evidence that a Failing rating promotes teacher mobility out of the affected school. Given this evidence, we cautiously conclude that inspectorate-style approaches to accountability could hold promise as schools work to design accountability and improvement efforts for the post-NCLB era.

21.5 Conclusion: The Future of Accountability and Accountability Research

Research on accountability systems has proliferated in the last two decades. Although sociologists have made important contributions to this body of evidence, much of the research reviewed above was conducted outside of our discipline, particularly by economists. With this in mind, we conclude with thoughts for future research, focusing on two areas that would especially benefit from sociological analysis.

An essential area for future research is better understanding how accountability policies are mediated by local context. In schools facing similar pressures and incentives, how do reactions and results differ? What formal and informal characteristics of school communities influence how accountability policies shape practice? These kinds of questions will only increase in salience as accountability under ESSA continues to become more locally differentiated. These are also the kinds of questions that sociologists are particularly well suited to address. Bryk and Schneider’s (2002) work on relational trust, for example, provides a model for understanding how and why the effects of reform vary across schools.

A second area for research is the examination of how student performance data comes to shape teacher practice. Understanding the factors that support effective data use would advance school improvement efforts by bolstering a key link in the chain prescribed by policymakers. Data use is more than a technical process, however. The ways in which teachers and school leaders understand and act upon data are also fundamentally social and value-laden processes. For student performance data to become intelligible, teachers and school leaders must engage in a process of commensuration that “changes the terms of what
can be talked about, how we value, and how we treat what we value” (Espeland and Stevens 1998, p. 315). Because of this, a sociological perspective has much to contribute to studies of data-driven school improvement efforts.

The new rating systems mandated under ESSA will provide especially interesting case studies of this process. As mentioned above, ESSA requires states to incorporate one nontraditional, nonacademic factor into their ratings: factors like student engagement, grit, and growth mindsets. How will these measures come to be understood and contested on the ground? In what ways will “soft” performance measures reproduce or disrupt the categorical inequalities long observed in “hard” measures? Questions like these call out for sociological analysis.

Since the 1980s, test-based accountability has played an increasingly central role in organizing American education. Despite some difference in emphases across administrations, there has been a clear consensus that government has a duty to hold schools accountable for standardized performance metrics. Even efforts to decentralize accountability, as in school choice programs, have taken for granted the proposition that schools would compete on the basis of common performance standards.

The future of this consensus is unclear. The popular backlash against the Common Core Standards movement and the election of a president who campaigned on school vouchers suggest that public support for standards and accountability may be reaching a breaking point. For those who have worked to highlight the limitations and unintended consequences of test-based accountability, the current moment of reevaluation holds great potential. The research reviewed in this chapter suggests a number of areas in which our current accountability system could be improved.

common baseline and ensuring that no student is allowed to remain beneath it. Whatever the future holds for accountability policy in the post-Obama era, we hope that this fundamental tenet of the standards and accountability movement persists.

References

Academic Benchmarks. (2016). Map displaying states’ adoption of CCSS. Academic Benchmark’s Common Core State Standards Adoption Map. http://academicbenchmarks.com/common-core-state-adoption-map/. Accessed 17 Apr 2017.
Allen, R., & Burgess, S. (2012). How should we treat under-performing schools? A regression discontinuity analysis of school inspections in England. CMPO Working Paper Number 12/287, Bristol University.
Barrows, S. (2014). Performance information and retrospective voting: Evidence from a school accountability regime. Harvard Program on Education Policy and Governance Working Paper Series (Working Paper PEPG 15-03).
Beveridge, T. (2009). No Child Left Behind and fine arts classes. Arts Education Policy Review, 111(1), 4–7.
Bokhari, F., & Schneider, H. (2011). School accountability laws and the consumption of psychostimulants. Journal of Health Economics, 30, 355–372.
Booher-Jennings, J. (2005). Below the bubble: “Educational Triage” and the Texas accountability system. American Educational Research Journal, 42(2), 231–268.
Bryk, A., & Schneider, B. (2002). Trust in schools: A core resource for improvement. New York: Russell Sage Foundation.
Bush, G. (1989, September 28). Joint statement on the education summit with the Nation’s Governors in Charlottesville, Virginia. Online by Gerhard Peters and John T. Woolley, The American Presidency Project. http://www.presidency.ucsb.edu/ws/?pid=17580. Accessed 2 Mar 2016.
Center on Education Policy. (2006). From the capital to the classroom: Year 4 of the No Child Left Behind Act. http://cep-dc.org/displayDocument.cfm?DocumentID=301. Accessed 2 Mar 2016.
Center on Education Policy. (2007). NCLB Year 5: Choices, changes, and challenges: Curriculum and instruction in the NCLB era. http://www.cep-dc.org/
displayDocument.cfm?DocumentID=312. Accessed 2
However, the current moment also carries Mar 2016.
considerable risk. Despite its shortcomings, the Chakrabarti, R. (2007). Vouchers, public school response,
test-based accountability movement enshrined
the principle that it is unacceptable for any child and the role of incentives: Evidence from Florida.
FRB of New York Staff Report, (306).
Chiang, H. (2009). How accountability pressure on failing
in America to be left behind by their school sys- schools affects student achievement. Journal of Public
tem. Accountability was promoted, at least in Economics, 93(9), 1045–1057.
part, as a tool for equity: a way of identifying a
Clotfelter, C. T., Ladd, H. F., Vigdor, J. L., & Diaz, R. A. (2004). Do school accountability systems make it more difficult for low-performing schools to attract and retain high-quality teachers? Journal of Policy Analysis and Management, 23(2), 251–271.
Common Core State Standards Initiative. (2016). Standards in your State. http://www.corestandards.org/standards-in-your-state/. Accessed 2 Mar 2016.
Cordray, D., Pion, G., Brandt, C., Molefe, A., & Toby, M. (2012). The impact of the Measures Of Academic Progress (MAP) program on student reading achievement: Final Report. U.S. Department of Education National Center for Education Evaluation and Regional Assistance. http://ies.ed.gov/ncee/edlabs/regions/midwest/pdf/REL_20134000.pdf. Accessed 2 Mar 2016.
Council of the Great City Schools. (2011). Using data to improve instruction in the great city schools: Documenting current practice. http://files.eric.ed.gov/fulltext/ED536742.pdf. Accessed 2 Mar 2016.
Council of the Great City Schools. (2015). Student testing in America’s great city schools: An inventory and preliminary analysis. http://www.cgcs.org/cms/lib/DC00001581/Centricity/Domain/87/Testing%20Report.pdf. Accessed 2 Mar 2016.
Datnow, A., & Hubbard, L. (2015). Teachers’ use of assessment data to inform instruction: Lessons from the past and prospects for the future. Teachers College Record, 117(4), 1–26.
Davidson, E., Reback, R., Rockoff, J., & Schwartz, H. L. (2015). Fifty ways to leave a child behind: Idiosyncrasies and discrepancies in states’ implementation of NCLB. Educational Researcher, 44(6), 347–358.
Dee, T. S., & Jacob, B. (2009). The impact of No Child Left Behind on student achievement (NBER Working Paper No. 15531). Cambridge, MA.
Dee, T. S., Jacob, B., & Schwartz, N. L. (2013). The effects of NCLB on school resources and practices. Educational Evaluation and Policy Analysis, 35(2), 252–279.
Deming, D. J., Cohodes, S., Jennings, J., & Jencks, C. (2016). School accountability, postsecondary attainment and earnings. Review of Economics and Statistics, 98(5), 848–862.
Diamond, J. B. (2007). Where the rubber meets the road: Rethinking the relationship between high-stakes testing policy and classroom instruction. Sociology of Education, 80(4), 285–313.
Diamond, J. B., & Spillane, J. P. (2004). High-stakes accountability in urban elementary schools: Challenging or reproducing inequality? Teachers College Record, 106(6), 1145–1176.
Domina, T., Penner, A. M., & Penner, E. K. (2016). “Membership has its privileges”: Status incentives and categorical inequality in education. Sociological Science, 3, 264–295.
Education Week. (2016, January 4). The Every Student Succeeds Act: Explained. http://www.edweek.org/ew/articles/2015/12/07/the-every-student-succeeds-act-explained.html. Accessed 2 Mar 2016.
Espeland, W. N., & Stevens, M. L. (1998). Commensuration as a social process. Annual Review of Sociology, 24, 313–343.
Feng, L., Figlio, D., & Sass, T. (2013). School accountability and teacher mobility. Working Paper. http://www2.gsu.edu/~tsass/pdfs/school%20accountability%20and%20teacher%20mobility%2004-12-2013%20TRS%20Clean.pdf. Accessed 2 Mar 2016.
Garber, A. M., & Skinner, J. (2008). Is American health care uniquely inefficient? (NBER Working Paper No. 14257). Cambridge, MA.
Goldhaber, D. (2009). Lessons from abroad: Exploring cross-country differences in teacher development systems and what they mean for U.S. policy. In D. Goldhaber & J. Hannaway (Eds.), Creating a new teaching profession (pp. 81–114). Washington, DC: Urban Institute Press.
Grissom, J. A., Nicholson-Crotty, S., & Harrington, J. R. (2014). Estimating the effects of No Child Left Behind on teachers’ work environments and job attitudes. Educational Evaluation and Policy Analysis, 36(4), 417–436.
Hamilton, L. S., & Stecher, B. M. (2007). Measuring instructional responses to standards-based accountability. Santa Monica: RAND Corporation.
Hamilton, L. S., Stecher, B. M., Marsh, J. A., McCombs, J. S., Robyn, A., Russell, J. L., Naftel, S., & Barney, H. (2007). Implementing standards-based accountability under No Child Left Behind: Responses of superintendents, principals, and teachers in three states. Santa Monica: RAND Corporation.
Hamilton, L. S., Stecher, B. M., & Yuan, K. (2008). Standards-based reform in the United States: History, research, and future directions. RAND Education. http://www.rand.org/pubs/reprints/RP1384.html. Accessed 2 Mar 2016.
Hannaway, J., & Hamilton, L. (2008). Performance-based accountability policies: Implications for school and classroom practices. Washington, DC: Urban Institute and RAND Corporation.
Hanushek, E. A., & Rivkin, S. G. (2010). The quality and distribution of teachers under the No Child Left Behind Act. Journal of Economic Perspectives, 24(3), 133–150.
Henderson, M. (2010). Does information help families choose schools? Evidence from a regression discontinuity design. Unpublished manuscript. Harvard University, Department of Government and Social Policy, Cambridge, MA.
Ho, A. D., & Haertel, E. H. (2006). Metric-free measures of test score trends and gaps with policy-relevant examples (CSE Report 665). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing (CRESST), Center for the Study of Evaluation, University of California, Los Angeles.
Hoffer, T. B. (2000). Accountability in education. In M. T. Hallinan (Ed.), Handbook of the sociology of education (pp. 529–543). New York: Springer.
Holcombe, R., Jennings, J. L., & Koretz, D. (2013). The roots of score inflation: An examination of opportunities in two states’ tests. In G. Sunderman (Ed.), Charting reform, achieving equity in a diverse nation (pp. 163–189). Greenwich: Information Age Publishing.
Hussain, I. (2015). Subjective performance evaluation in the public sector: Evidence from school inspections. The Journal of Human Resources, 50(1), 189–221.
Ingersoll, R., & Merrill, E. (2011). The status of teaching as a profession. In J. Ballantine & J. Spade (Eds.), School and society: A sociological approach to education (4th ed., pp. 181–189). Thousand Oaks: Pine Forge Press/SAGE Publications.
Jacob, B. A. (2005). Accountability, incentives, and behavior: Evidence from school reform in Chicago. Journal of Public Economics, 89, 761–796.
Jacob, B. (2007). Test-based accountability and student achievement: An investigation of differential performance on NAEP and state assessments (Working Paper 12817). Cambridge, MA: National Bureau of Economic Research.
Jacob, R. T., Stone, S., & Roderick, M. (2004). Ending social promotion: The response of teachers and students. Chicago: Consortium on Chicago School Research. Retrieved March 29, 2011, from http://www.eric.ed.gov/PDFS/ED483823.pdf
Jacobsen, R., Saultz, A., & Snyder, J. W. (2013). When accountability strategies collide: Do policy changes that raise accountability standards also erode public satisfaction? Educational Policy, 27(2), 360–389.
Jacobsen, R., Snyder, J. W., & Saultz, A. (2014). Informing or shaping public opinion? The influence of school accountability data format on public perceptions of school quality. American Journal of Education, 121(1), 1–27.
Jennings, J. L., & Bearak, J. M. (2014). “Teaching to the Test” in the NCLB Era: How test predictability affects our understanding of student performance. Educational Researcher, 43(8), 381–389.
Jennings, J. L., & Sohn, H. (2014). Measure for measure: How proficiency-based accountability systems affect inequality in academic achievement. Sociology of Education, 87(2), 125–141.
Jennings, J. L., Bearak, J. M., & Koretz, D. M. (2011). Accountability and racial inequality in American education. Paper presented at the annual meetings of the American Sociological Association, Las Vegas, NV.
Jerald, C. D. (2012). Education sector reports: On Her Majesty’s school inspection service. Washington, DC: Education Sector.
King, M. D., Jennings, J. L., & Fletcher, J. (2014). Medical adaptation to academic pressure: Schooling, stimulant use, and socioeconomic status. American Sociological Review, 79(6), 1–28.
Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? Santa Monica: RAND (Issue Paper IP-202). http://www.rand.org/publications/IP/IP202/. Accessed 4 June 2013.
Kogan, V., Lavertu, S., & Peskowitz, Z. (2015). Performance federalism and local democracy: Theory and evidence from school tax referenda. American Journal of Political Science, 60(2), 418–435.
Konstantopoulos, S., Miller, S., & van der Ploeg, A. (2013). The impact of Indiana’s system of interim assessments on mathematics and reading achievement. Educational Evaluation and Policy Analysis, 35(4), 481–499.
Konstantopoulos, S., Miller, S., van der Ploeg, A., & Li, W. (2016). Effects of interim assessments on student achievement: Evidence from a large-scale experiment. Journal of Research on Educational Effectiveness, 9(S1), 188–208.
Koretz, D., & Beguin, A. (2010). Self-monitoring assessments for educational accountability systems. Measurement, 8(2–3), 92–109.
Koretz, D., Jennings, J. L., Ng, H. L., Yu, C., Braslow, D., & Langi, M. (2016). Auditing for score inflation using self-monitoring assessments: Findings from three pilot studies. Educational Assessment, 21(4), 231–247.
Krieg, J. (2008). Are students left behind? The distributional effects of No Child Left Behind. Education Finance and Policy, 3, 250–281.
Ladd, H. F., & Lauen, D. L. (2010). Status versus growth: The distributional effects of accountability policies. Journal of Policy Analysis and Management, 29(3), 426–450.
Ladd, H. F., & Zelli, A. (2002). School-based accountability in North Carolina: The responses of school principals. Educational Administration Quarterly, 38(4), 494–529. https://doi.org/10.1177/001316102237670.
Lauen, D. L. (2007). Contextual explanations of school choice. Sociology of Education, 80(3), 179–209.
Lauen, D. L., & Gaddis, S. M. (2012). Shining a light or fumbling in the dark? The effects of NCLB’s subgroup-specific accountability on student achievement. Educational Evaluation and Policy Analysis, 34(2), 185–208.
Lauen, D., & Gaddis, M. (2016). Accountability pressure, academic standards, and educational triage. Educational Evaluation and Policy Analysis, 38(1), 127–147.
Lazarín, M. (2014). Testing overload in America’s schools. Center for American Progress. https://www.americanprogress.org/issues/education/report/2014/10/16/99073/testing-overload-in-americas-schools/. Accessed 2 Mar 2016.
Li, D. (2015). School accountability and principal mobility: How No Child Left Behind affects the allocation of school leaders (Harvard Business School Working Paper, No. 16-052). http://www.hbs.edu/faculty/Pages/item.aspx?num=50034. Accessed 2 Mar 2016.
Lipset, S. M., & Schneider, W. (1983). The confidence gap: Business, labor and government in the public mind. New York: Free Press.
Loeb, S., & Cunha, J. (2007). Have assessment-based accountability reforms influenced the career decisions of teachers? A report commissioned by the U.S. Congress as part of Title I, Part E, Section
1503 of the No Child Left Behind Act of 2001. https://cepa.stanford.edu/sites/default/files/Cunha_Accountability_Labor_Decisions.pdf. Accessed 2 Mar 2016.
Luginbuhl, R., Webbink, D., & Wolf, I. D. (2009). Do inspections improve primary school performance? Educational Evaluation and Policy Analysis, 31(3), 221–237.
Marsh, J. A., Pane, J. F., & Hamilton, L. S. (2006). Making sense of data-driven decision making in education: Evidence from recent RAND Research (OP-170). Santa Monica: RAND Corporation.
McDonnell, L. M. (2005). No Child Left Behind and the federal role in education: Evolution or revolution? Peabody Journal of Education, 80(2), 19–38.
McNeil, L. M. (2000). Contradictions of school reform: Educational costs of standardized testing. New York: Routledge.
Means, B., Padilla, C., & Gallagher, L. (2010). Use of education data at the local level: From accountability to instructional improvement. U.S. Department of Education. https://www2.ed.gov/rschstat/eval/tech/use-of-education-data/use-of-education-data.pdf. Accessed 2 Mar 2016.
Mehta, J. (2013). How paradigms create politics: The transformation of American educational policy, 1980–2001. American Educational Research Journal, 50(2), 285–324.
Meyer, J. W., & Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83(2), 340–363.
Meyer, J. W., & Rowan, B. (1978). The structure of educational organizations. In M. W. Meyer (Ed.), Environments and organizations. San Francisco: Jossey-Bass.
Murnane, R. J., & Papay, J. P. (2010). Teachers’ views on No Child Left Behind: Support for the principles, concerns about the practices. The Journal of Economic Perspectives, 24(3), 151–166.
Neal, D., & Schanzenbach, D. W. (2007). Left behind by design: Proficiency counts and test-based accountability (NBER Working Paper No. 13293).
Oakes, J. (1985). Keeping track. New Haven: Yale University Press.
OECD. (2011). How are schools held accountable? In Education at a Glance 2011: Highlights. Paris: OECD Publishing.
OECD. (2014). Indicator D3: How much are teachers paid? In Education at a Glance 2014: OECD Indicators. Paris: OECD Publishing.
OECD. (2015). Country note: United States of America: Key findings from the teaching and learning international survey (TALIS). Paris: OECD Publishing.
Olson, L. (2005, November 30). Benchmark assessments offer regular checkups on student achievement. Education Week. http://www.edweek.org/ew/articles/2005/11/30/13benchmark.h25.html. Accessed 2 Mar 2016.
Papay, J. P., Murnane, R. J., & Willett, J. B. (2011). How performance information affects human-capital investment decisions: The impact of test-score labels on educational outcomes (No. w17120). National Bureau of Economic Research.
Pedulla, J. J., Abrams, L. M., Madaus, G. F., Russell, M. K., Ramos, M. A., & Miao, J. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from a national survey of teachers. Boston: Lynch School of Education, Boston College.
Polikoff, M. S., McEachin, A. J., Wrabel, S. L., & Duque, M. (2014). The waive of the future? School accountability in the waiver era. Educational Researcher, 43(1), 45–54.
Reback, R. (2008). Teaching to the rating: School accountability and the distribution of student achievement. Journal of Public Economics, 92, 1394–1415.
Reback, R., Rockoff, J., & Schwartz, H. L. (2014). Under pressure: Job security, resource allocation, and productivity in schools under No Child Left Behind. American Economic Journal: Economic Policy, 6(3), 207–241.
Rhodes, J. H. (2015). Learning citizenship? How state education reforms affect parents’ political attitudes and behavior. Political Behavior, 37(1), 181–220.
Rich, P. M., & Jennings, J. L. (2015). Choice, information, and constrained options: School transfers in a stratified educational system. American Sociological Review, 80(5), 1069–1098.
Rockoff, J., & Turner, L. J. (2010). Short-run impacts of accountability on school quality. American Economic Journal: Economic Policy, 2(4), 119–147.
Rosenthal, L. (2004). Do school inspections improve school quality? Ofsted inspections and school examination results in the U.K. Economics of Education Review, 23(2), 143–151.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom: Teacher expectation and pupils’ intellectual development. Rinehart and Winston.
Rouse, E. R., Hannaway, J., Goldhaber, D., & Figlio, D. (2013). Feeling the Florida heat? How low-performing schools respond to voucher and accountability pressure. American Economic Journal: Economic Policy, 5(2), 251–281.
Shen, X. (2008). Do unintended effects of high-stakes testing hit disadvantaged schools harder? Doctoral dissertation, Stanford University.
Shepard, L. A. (1988, April). Should instruction be measurement driven? A debate. In Meeting of the American Educational Research Association, New Orleans.
Shepard, L. A., & Dougherty, K. (1991). Effects of high-stakes testing on instruction. In R. L. Linn (Ed.), The effects of high stakes testing. Annual meetings of the American Education Research Association and the National Council of Measurement in Education. Chicago, IL.
Sims, D. P. (2009). Going down with the ship? The effect of school accountability on the distribution of teacher experience in California (Urban Institute Working Paper). http://www.urban.org/research/publication/going-down-ship-effect-school-accountability-distribution-teacher-experience-california. Accessed 2 Mar 2016.
Slavin, R. E., Cheung, A., Holmes, G. C., Madden, N. A., & Chamberlain, A. (2013). Effects of a data-driven district reform model on state assessment outcomes. American Educational Research Journal, 50(2), 371–396.
Springer, M. G. (2008). The influence of an NCLB accountability plan on the distribution of student test score gains. Economics of Education Review, 27(5), 556–563.
Starr, P. (1983). The social transformation of American medicine: The rise of a sovereign profession and the making of a vast industry. New York: Basic Books.
Stecher, B. M., Vernez, G., & Steinberg, P. (2010). Reauthorizing No Child Left Behind: Facts and recommendations. RAND Education. http://www.rand.org/pubs/monographs/MG977.html. Accessed 2 Mar 2016.
Steinberg, M. P., & Donaldson, M. L. (2016). The new educational accountability: Understanding the landscape of teacher evaluation in the post-NCLB Era. Education Finance and Policy, 11(3), 340–359.
Sun, M., Saultz, A., & Ye, Y. (2014). Federal policy and the teacher labor market: Exploring the effects of NCLB on teacher turnover. Paper presented at the annual meeting of the Association for Education Finance and Policy, San Antonio, TX.
Taylor, G., Shepard, L., Kinner, F., & Rosenthal, J. (2002). A survey of teachers’ perspectives on high-stakes testing in Colorado: What gets taught, what gets lost (CSE Technical Report 588). Los Angeles: University of California. Retrieved September 20, 2010, from http://eric.ed.gov/PDFS/ED475139.pdf
The National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. An open letter to the American people. A report to the Nation and the Secretary of Education. http://files.eric.ed.gov/fulltext/ED226006.pdf. Accessed 2 Mar 2016.
U.S. Department of Education. (2002). Fact sheet on title I, Part A. https://www2.ed.gov/rschstat/eval/disadv/title1-factsheet.pdf. Accessed 2 Mar 2016.
Winters, M. A., & Cowen, J. M. (2012). Grading New York: Accountability and student proficiency in America’s largest school district. Educational Evaluation and Policy Analysis, 34(3), 313–327.
Wong, M., Cook, T. D., & Steiner, P. M. (2009). No Child Left Behind: An interim evaluation of its effects on learning using two interrupted time series each with its own non-equivalent comparison series. Institute for Policy Research (Working Paper 09–11), 18.
22 Methods for Examining the Effects of School Poverty on Student Test Score Achievement

Douglas Lee Lauen, Brian L. Levy, and E. C. Hedberg

D. L. Lauen (*), University of North Carolina, Chapel Hill, NC, USA
B. L. Levy, Harvard University, Cambridge, MA, USA
E. C. Hedberg, NORC at the University of Chicago, Chicago, IL, USA

© Springer International Publishing AG, part of Springer Nature 2018
B. Schneider (ed.), Handbook of the Sociology of Education in the 21st Century, Handbooks of Sociology and Social Research, https://doi.org/10.1007/978-3-319-76694-2_22

Abstract
Measuring school effects has been an important inquiry for sociologists of education for at least 50 years. This chapter summarizes current research on the relationship between school poverty and student achievement, which relies heavily on cross-sectional associations. We then propose that scholars consider longitudinal approaches to estimating school effects in which changes in school outcomes are related to changes in school contexts. We present illustrative examples of both cross-sectional and longitudinal analyses using a census of North Carolina students and schools. Cross-sectional models indicate a significant negative association between school poverty and achievement. Our preferred specification, a three-level model of time within students cross-nested within schools, finds no relationship between school poverty and achievement, which raises important questions about the validity of school poverty effects on student test score growth. This model does, however, suggest that variation in test score growth across schools may be greater than variation in test score growth across students, which opens important avenues for understanding the sources of this variation.

An important mode of sociological inquiry is seeking to understand the effects of groups on individual action. From the earliest days of the discipline, sociologists have investigated this question with many types of evidence, from quantitative counts and rates to interviews and observation. That individual behavior is shaped by social context is a central presumption of sociologists, one that separates our discipline from economics and psychology, which stress the role of individual drives and preferences.

Whether the focus is on schools as organizations, schooling as a set of implicit and explicit practices, or the educational system as a central institution in a system of stratification, sociologists of education have made significant contributions to our understanding of the ways schools affect students’ lives, both during the years of formal schooling and thereafter. Focusing on quantitative school effects studies in particular, we have learned that education is a key determinant of status attainment, that family background is a strong predictor of success in school, and that school impacts are
relatively small once family background is controlled (Blau and Duncan 1967; Duncan and Hodge 1963; Sewell et al. 1969, 1980).

As James Coleman and colleagues put it in the highly influential 1966 Equality of Educational Opportunity report, “schools are remarkably similar in the way they relate to the achievement of their pupils when the socioeconomic background of the students is taken into account. It is known that socioeconomic factors bear a strong relation to academic achievement. When these factors are statistically controlled, however, it appears that differences between schools account for only a small fraction of difference in pupil achievement” (Coleman et al. 1966, pp. 21–22). This claim raised important doubts about the suitability of schools as institutions that could ameliorate social inequality.

However, Coleman’s classic work also reported that “children from a given family background, when put in schools of different social composition, will achieve at quite different levels.” This finding suggested that proactive efforts to mix students by social background could have beneficial effects. In particular, the report argued that Black student achievement was more strongly related to school inputs than White student achievement: “The principal way in which the school environments of Negroes and Whites differ is in the composition of their student bodies, and it turns out that the composition of the student bodies has a strong relationship to the achievement of Negro and other minority pupils” (ibid., p. 22). This finding became an important rationale for desegregation and busing programs to integrate Black and White students during the 1960s and 1970s.

Today, as fifty years ago, there is widespread concern about the performance of segregated minority and high-poverty schools. High-poverty schools tend to have difficulty retaining experienced teachers, who prefer to teach in low-poverty schools with better working conditions (Boyd et al. 2005; Scafidi et al. 2007). Therefore, high-poverty schools tend to have more novice teachers, long-term substitutes, and out-of-field teachers (Clotfelter et al. 2007, 2009; Ingersoll 2002; Lankford et al. 2002). Absences and mobility of students and teachers are usually higher in high-poverty schools, making it more challenging to maintain continuity and coherence in learning across the school year (Allensworth et al. 2009). These factors contribute to diminished instructional capacity and worse curricular coverage (Johnson et al. 2012). In addition, there is the concern of negative classroom spillover effects: that students in these schools learn at slower rates due to the high prevalence of students with low initial achievement and high rates of learning disabilities and disruptive behavior (Hoxby 2000; Sacerdote 2011). Studies report that low-SES or high-poverty schools have lower test scores even once the family background of students is statistically controlled (e.g., Entwisle et al. 1994; Choi et al. 2008; Willms 1986; Battistich et al. 1995).

These concerns have provided a rationale for public policies to mix students by social background. These policies have included busing, school choice, magnet schools, and drawing school boundaries to create more diverse schools. Today, school assignment solely on the basis of race has been ruled unconstitutional, so integration plans that mix students by socioeconomic status are emerging as alternatives and have been implemented in dozens of districts including Wake County, NC; Cambridge, MA; and San Francisco (Kahlenberg 2012). Integrating students by income, however, is becoming harder because neighborhoods are getting more segregated by income (Jargowsky 1996; Watson 2009; Reardon and Bischoff 2011) and, within large districts, between-school segregation by free/reduced-price lunch eligibility increased by about 30% between 1990 and 2010 (Owens et al. 2016).

For these reasons, understanding the effects of school poverty on student achievement has been a rich area of research in the sociology of education. In addition, the theoretical and methodological underpinnings of our analysis in this chapter have deep roots in sociological analysis of many phenomena. At the most basic level, we want to understand whether social context has effects on individuals over and above their individual background. In short, we ask whether the setting, or context, in which an individual is embedded has a causal effect on their outcomes.
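To make the idea of a context effect "over and above" individual background concrete, the following is a minimal sketch, not the chapter's own specification, of the group-mean-centered decomposition commonly used in the contextual-effects literature: the within-school slope relates a student's own poverty status to achievement, the between-school slope relates school poverty share to school mean achievement, and their difference is the contextual effect. The data are synthetic and noiseless, and every name here (`contextual_effect`, `frl`, the score formula) is a hypothetical chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def slope(xs, ys):
    """OLS slope from regressing ys on xs with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def contextual_effect(records):
    """records: iterable of (school_id, frl, score); frl is 1 if eligible, else 0.

    Returns (within_slope, between_slope, contextual_effect).
    """
    by_school = defaultdict(list)
    for school, frl, score in records:
        by_school[school].append((frl, score))

    within_x, within_y, between_x, between_y = [], [], [], []
    for rows in by_school.values():
        share = mean(f for f, _ in rows)  # school poverty share
        avg = mean(y for _, y in rows)    # school mean score
        for f, y in rows:
            within_x.append(f - share)    # deviation from school mean
            within_y.append(y - avg)
            between_x.append(share)       # repeated per student, so larger
            between_y.append(avg)         # schools carry more weight

    b_within = slope(within_x, within_y)
    b_between = slope(between_x, between_y)
    return b_within, b_between, b_between - b_within


# Synthetic data (assumption): score = 50 - 5*frl - 10*school_poverty_share
flags = {"A": [0, 0, 0, 1], "B": [0, 0, 1, 1], "C": [0, 1, 1, 1]}
records = [
    (s, f, 50 - 5 * f - 10 * (sum(fs) / len(fs)))
    for s, fs in flags.items()
    for f in fs
]
b_within, b_between, b_context = contextual_effect(records)
print(round(b_within, 6), round(b_between, 6), round(b_context, 6))
```

By construction, the sketch recovers a within-school slope of -5, a between-school slope of -15, and a contextual effect of -10. With real observational data, the same difference could also reflect unobserved selection into schools, so it should be read as an association rather than a causal estimate.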
This chapter aims to both summarize existing work on cross-sectional contextual effects (Blau 1960; Blalock 1984; Iversen 1991; Raudenbush and Bryk 2002) and encourage further development and wider use of longitudinal approaches to estimating contextual effects (Bryk and Raudenbush 1988; Lauen and Gaddis 2013). We begin with the cross-sectional contextual effects case, in which we summarize the large methodological literature on how to examine the effects of school poverty on test scores at one point in time. This presentation is made with two caveats. First, we stress in this section that the absence of unobserved confounding is a strong, and likely untenable, assumption for drawing causal inferences from cross-sectional observational studies. In brief, it is difficult to rule out the possibility that adverse selection into high-poverty schools may be driving the residual associations often found from cross-sectional designs. In response to the first caveat, we turn to first two- and then three-level longitudinal contextual effects designs. Our section on longitudinal modeling includes examples of two-level (time within student) and three-level (time within student within school) models. It also covers the complication of the cross-nesting of students in schools over time.

Longitudinal designs examine the association between changes in school poverty and changes in student test scores. Not without their own complications, longitudinal designs provide a stronger basis for making causal inferences about sociological theories and public policy because (1) one can often establish temporal ordering, an important precondition for estimating causal effects, and (2) one can exploit techniques to disentangle fixed confounding factors from time-varying treatment effects. One can only draw causal conclusions from longitudinal designs with strong assumptions about the absence of time-varying confounding, among others, but arguably the assumptions one must make about the estimates from cross-sectional designs are stronger still (i.e., less likely to hold).

The second caveat is that we assume that our

this variable are also highly reliable. If this second set of assumptions is not met, such that the predictors were some psychometric scale and only a sample of students was available for each school, a multilevel latent variable model would be more appropriate (Lüdtke et al. 2008).

22.1 Data

The data used for this chapter come from one complete cohort from a statewide database of administrative records compiled by the North Carolina Department of Public Instruction and archived by Duke University’s North Carolina Education Records Data Center. We first observe students in third grade in 2006 and retain in the sample only those who were promoted to fourth and fifth grade in 2007 and 2008, respectively. There are more than 216,000 student-year observations over 3 years, with about 72,000 unique students observed in each year. Our measure of student poverty is whether the student is eligible for free or reduced-price lunch, a threshold that is actually 185% of the poverty line, adjusted for family size.1 About 47% of students fall below this income threshold. The sample is about 55% White, 28% Black, 10% Hispanic, 4% multiracial, 2% Asian, and 1% Native American. The outcome is a vertically equated math test score

1 Free or reduced-price lunch eligibility is not an ideal measure of family poverty or income, but it is the most widely available one in U.S. administrative data from school districts and states. One might want a continuous income measure from all parents in the school to explore the sensitivity of impacts to different income cutoffs. Unfortunately, family income is generally not available in administrative data. It is also a measure pegged above the poverty line rather than right at the poverty line. In addition, it is a measure that is disappearing. The “community eligibility” standard replaces individual eligibility with schoolwide eligibility for schools that meet the community eligibility threshold. Finally, it does not capture the three aspects of family SES: income, parental education, and parental occupation. In our experience, however, school-level correlations between percent free/reduced-price lunch and average SES or percent college-educated
predictor, poverty, is measured without error. parents are quite high, so even if they mean different
Since we are working with state census data things at the individual level, they correlate strongly at the
(described below), the school-level aggregates of school level.
496 D. L. Lauen et al.
Fig. 22.1 Distribution of school poverty of North Carolina elementary schools with a fifth grade sample, 2008
designed to measure growth across grade levels (mean = 349.6, SD = 10.6). This outcome measure is well suited to longitudinal analysis because it is designed to measure growth in achievement over time. The fifth grade sample includes more than 1300 elementary schools, with an average of 55 students per school. Our focal variable of interest is the school poverty rate, which is the proportion of students in the school who were eligible for free or reduced-price lunch. Due to the state’s economic diversity, the statewide nature of the data, and the sample size, we observe a great deal of variation in school poverty in the sample (mean = .52, SD = .23, interquartile range = .46, 5th percentile = .10, and 95th percentile = .96; see Fig. 22.1) and have plenty of statistical power to reliably estimate contextual effects.

22.2 Cross-Sectional Contextual Effects

Cross-sectional contextual effects models aim to estimate the mean difference in an outcome associated with a change in the group mean of a covariate while holding constant the individual value of the same covariate (Blau 1960; Blalock 1984; Iversen 1991). For example, does the school poverty rate have an effect on test scores holding constant student poverty? In this section we consider the cross-sectional contextual effects model, the meaning of the estimated parameters, and how to properly estimate the sampling variances of the effects. We pay special attention to how appropriate estimation procedures either increase or decrease the sampling variances. For the purposes of the exposition, we assume a balanced sample (where each school has the same number of students), but we note that our North Carolina data set is unbalanced.

Consider a hypothetical balanced sample where there are $i = \{1, 2, \ldots, n\}$ units (e.g., students) each in $j = \{1, 2, \ldots, m\}$ groups (e.g., schools), for a total sample size of $N = n \times m$. Next, assume that an academic student-level outcome, $y$, for unit $i$ in group $j$ is predicted by a student-level variable, $x$, and the average of this variable within each school, $\bar{x}_j$. For simplicity, we will assume that within each school there is the same relationship between $x$ and $y$. The focus of this section is on three types of relationships
22 Methods for Examining the Effects of School Poverty on Student Test Score Achievement 497
Fig. 22.2 Between, within, and contextual effects
in clustered data: between, within, and contextual. The next section will explore the sampling variance associated with these effects. The contextual effect is derived from two other effects, the between and within effects. We define each of these in turn and then show how they can be used to define the contextual effect.

To fix ideas, consider Fig. 22.2, a visual representation of positive between, within, and contextual effects, which plots six observations across three groups, each with a solid line indicating the within-group regression, together with the between-group regression fit through the group means. In this hypothetical example, the between-group slope is 0.75 and the within-group effect is 0.25. The contextual effect is then 0.5, represented by the long-dashed line that compares the outcome at the higher value of $x$ in group 2 with the outcome at the same value of $x$ in the group with the higher mean. That vertical difference is the mean difference in $y$ associated with a single unit change in the group mean. Conceptually, the contextual effect is the predicted difference in test score between two students who share the same individual poverty level, but who attend schools that differ by one unit of school poverty (Raudenbush and Bryk 2002, p. 141).

With this picture in mind, we now define each of the three effects and discuss coefficients generated from fitting various models to the North Carolina data. The first effect is the between effect, which is the effect of the group average of the covariate, $\bar{x}_j$, on the group average of the outcome, $\bar{y}_j$. For example, researchers may be interested in the relationship between a school’s average poverty level and school average math achievement. One way to estimate this slope is to calculate the averages of the covariate and outcome for all groups (schools) and then perform a simple OLS regression,

$$\bar{y}_j = \zeta_0 + \zeta_1 \bar{x}_j + v_j,$$

where the between effect is the estimate of $\zeta_1$.² Note that the error term of this model, $v_j$, is a combination of the between-group error term, $u_j$, and the average within-group error term, $\bar{e}_j$, so that $v_j = u_j + \bar{e}_j$.

Looking at the example data results in Table 22.1, the first model estimated is the “Between Effects Model.” In this model we see that among North Carolina 5th graders, the difference between a school with no poverty (i.e., the mean is 0) and a school that is completely impoverished (i.e., the mean is 1, or 100%) is about 11

2 With balanced data one can predict unit values from the group means, $y_{ij} = \kappa_0 + \kappa_1 \bar{x}_j + w_{ij}$, where $w_{ij} = u_j + e_{ij}$, and the slope of the group mean from this model is the same as in the between model, $\kappa_1 = \zeta_1$.
points. We can interpret gradients of this effect by multiplying by the proportion impoverished. For example, the difference between no poverty and 50% poverty is −11.3 × 0.5 ≈ −5.7 points.

Table 22.1 Cross-sectional ordinary least squares contextual models, 5th grade mathematics

                                      1                2                3                4
                                      OLS between      OLS within       OLS within and   OLS within and
                                      effects model    effects model    between effects  context effects
                                                                        model            model
Student poverty indicator                              −5.027 (0.067)                    −5.027 (0.072)
Group-mean centered student
  poverty indicator                                                     −5.027 (0.072)
School mean poverty                   −11.294 (0.511)                   −12.781 (0.136)  −7.754 (0.154)
Intercept                             359.617 (0.296)  357.827 (0.043)  361.486 (0.071)  361.486 (0.071)

Notes: N = 72,252 students nested in 1310 schools. All effects statistically significant at p < 0.001. Standard errors in parentheses. OLS = ordinary least squares

The second effect of interest is the within effect, which is the effect of the unit-level covariate on the outcome with all variance associated with the outcome at the group level removed. This can be accomplished by subtracting the group means of the outcome and predictor from the level-1 values, as in an econometric fixed effects model, or by entering dummy variables for each group except one. OLS can estimate the within effect with the following model that transforms each variable by subtracting the group means from $x$ and $y$,

$$x^{\dagger}_{ij} = x_{ij} - \bar{x}_j \quad \text{and} \quad y^{\dagger}_{ij} = y_{ij} - \bar{y}_j,$$

to estimate the model

$$y^{\dagger}_{ij} = \lambda_0 + \lambda_1 x^{\dagger}_{ij} + e_{ij},$$

where the within effect is the estimate of $\lambda_1$ and the within error term is $e_{ij}$.³ The “Within Effects Model” in Table 22.1 (column 2) shows a within-school poverty gap of five points, less than half as large as the between-school effect of 11.

The between and within effects can be combined in a single model that enters both the group mean of the unit covariate and the group-mean-centered value of the unit-level covariate into the regression

$$y_{ij} = \beta_0 + \beta_1 \bar{x}_j + \beta_2 (x_{ij} - \bar{x}_j) + w_{ij},$$

where $\beta_2 = \lambda_1$ and $\beta_1 = \zeta_1$ if the data are balanced. In Table 22.1, column 3, this model is estimated using OLS and our unbalanced data in the “OLS Within and Between Effects Model.”⁴ The contextual effect is the difference between the between effect and the within effect. It represents the difference in the outcome when the level-1 value is held constant and the group mean is increased by one unit. Another way to think of the contextual effect is that it is the between effect net of the within effect. To estimate the contextual effect directly, and compute a standard error of this estimate, we remove the group-mean-centered variable and replace it with an uncentered (or grand-mean-centered) level-1 variable,

$$y_{ij} = \gamma_0 + \gamma_1 \bar{x}_j + \gamma_2 x_{ij} + w_{ij},$$

which fits the data just as well. The within effect is $\gamma_2 = \beta_2 = \lambda_1$ and the contextual effect is $\gamma_1 = \beta_1 - \beta_2$. This model is represented in the fourth column of Table 22.1, “OLS Within and Context Effects Model,” where the effect of the group mean poverty is −7.75 points, the difference between the two effects shown in column 3: −12.781 − (−5.027) = −7.754.

We note, however, that the OLS model typically underestimates the sampling variance of group-level effects and overestimates the sampling variance of within-group effects. In our example, the standard error on the within effect coefficient is smaller in column 2 than in column 3. While estimating the contextual effect is a straightforward linear combination ($\beta_1 - \beta_2$), the sampling variance of the effect required for a statistical test, $V\{\beta_1 - \beta_2\} = V\{\beta_1\} + V\{\beta_2\} - 2 \times CV\{\beta_1, \beta_2\}$, is tedious because it requires the sampling covariance of $\beta_1$ and $\beta_2$, which is usually not reported or easily accessible in many software packages. We turn to this topic in the following section.

22.2.1 Methods to Estimate the Sampling Variance of Between, Within, and Contextual Effects

Ordinary least squares (OLS) regression makes the assumption that each observation is independently sampled. The data we use to estimate contextual effects are always clustered into groups. For example, in many surveys and interventions, schools are sampled and then students within the school are selected. This creates two sources of random error: the between-group residual, $u_j$, which is the difference between the average of the group and the average of group averages, and $e_{ij}$, which is the difference between each observation and its group average. Therefore, rather than using OLS to estimate context effects, the econometric approach is to represent these two sources of error with a random effects (RE) model estimated through feasible generalized least squares (FGLS), or in psychology, a mixed model, such as a Hierarchical Linear Model (HLM).⁵ In essence, these multilevel models use the data to partial out a random intercept, $u_j$, which generally produces a different parameter estimate of contextual effects than would be produced by an OLS model. Many econometricians express concern over the use of random effects models in lieu of a fixed effects model that carries fewer assumptions. In a fixed effects model, indicators for all but one cluster are included as covariates. However, this model removes all the variance associated with the cluster from the model, making the estimation of contextual effects from cross-sectional data impossible.⁶

In an appendix we sketch some important statistical details for FGLS. We show how the estimation relates to the conditional intraclass correlation, and how random effects estimates use both between- and within-cluster variation and thus generally lie between estimates produced by OLS and fixed effects estimates. Perhaps more familiar to sociologists of education are mixed or Hierarchical Linear Models (Raudenbush and Bryk 2002).⁷ The statistical details of the estimation are beyond the scope of this chapter, but essentially the variance components and design effects are both estimated in a single maximum likelihood step.⁸ We refer interested readers to Raudenbush and Bryk (2002), chapters 3, 13, and 14, for an extended treatment of maximum likelihood estimation and Bayesian methods. An advantage of this modeling framework is that it extends naturally into treating both intercepts and slopes as random, as we discuss below.

3 When estimating the within model through the dummy variable approach, degrees of freedom are calculated correctly because the number of dummies counts toward the number of regressors. When estimating the within model via demeaning, one must adjust the degrees of freedom to account for the number of groups, which increases the residual error variance, which in turn increases the standard errors. This step is taken into account by statistical software. Our estimates were produced by Stata’s xtreg command with the be, re, and fe options, which compute correct standard errors.

4 Note that the between effect in Model 3 differs from the one in Model 1 because Model 1 is a regression of the school averages, whereas Model 3 uses the individual-level test scores. If the data were balanced, the effects would be the same.

5 A third option, not outlined here, is to simply estimate OLS coefficients and use cluster-robust standard errors, also known as sandwich estimators. This produces standard errors that take into account clustering, but the coefficient estimate itself is produced from only student-level variation, so will generally differ from one produced by a random or fixed intercept model.

6 This is due to the transformation of variables in fixed effects models whereby the group mean is subtracted from each variable. In the case of “level-2” variables, this procedure renders the transformed variable into a constant of 0 (because the group mean of a “level-2” variable is the variable itself). This constant of 0 is collinear with the intercept constant of 1, rendering the model impossible to estimate.

7 Mixed models can be estimated using the HLM software (Raudenbush et al. 2004); proc mixed in SAS; mixed in Stata or SPSS; lme (in the nlme package) and lme4 in R; or other specialized software such as Mplus.

8 Mixed models are related to, and an extension of, ANOVA procedures (Raudenbush 1993). When restricted maximum likelihood is employed, equivalent estimates are obtained.
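The relationships among the between, within, and contextual effects defined in Section 22.2 can be checked numerically. The following is a minimal sketch in plain Python on simulated balanced data. The sample sizes, data-generating coefficients, and helper names (`inverse`, `ols`) are illustrative assumptions; this is not the chapter's Stata estimation and does not use the North Carolina data.

```python
import random

random.seed(0)

# Hypothetical balanced design (assumed values): m schools with n students each.
m, n = 100, 20
rows = []  # one (school j, x_ij, y_ij) tuple per student
for j in range(m):
    p_j = random.uniform(0.05, 0.95)   # school-level poverty propensity
    u_j = random.gauss(0.0, 3.0)       # between-school residual u_j
    for _ in range(n):
        x = 1.0 if random.random() < p_j else 0.0   # student poverty indicator
        y = 360.0 - 5.0 * x - 7.0 * p_j + u_j + random.gauss(0.0, 8.0)
        rows.append((j, x, y))

# School means of the covariate and the outcome.
xbar, ybar = [0.0] * m, [0.0] * m
for j, x, y in rows:
    xbar[j] += x / n
    ybar[j] += y / n


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    aug = [list(row) + [float(i == r) for i in range(k)] for r, row in enumerate(a)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def ols(X, y):
    """Return OLS coefficients and the classical covariance matrix s^2 (X'X)^-1."""
    k = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    inv = inverse(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    s2 = rss / (len(y) - k)
    return beta, [[s2 * v for v in row] for row in inv]


ys = [y for _, _, y in rows]
zeta, _ = ols([[1.0, xbar[j]] for j in range(m)], ybar)                 # between model
lam, _ = ols([[1.0, x - xbar[j]] for j, x, _ in rows],
             [y - ybar[j] for j, _, y in rows])                         # within model
beta, cov_b = ols([[1.0, xbar[j], x - xbar[j]] for j, x, _ in rows], ys)  # between + within
gamma, cov_g = ols([[1.0, xbar[j], x] for j, x, _ in rows], ys)           # contextual + within

# Sampling variance of beta_1 - beta_2 built from the covariance matrix.
var_diff = cov_b[1][1] + cov_b[2][2] - 2.0 * cov_b[1][2]

print(f"between:    zeta_1 = {zeta[1]:.3f}   beta_1 = {beta[1]:.3f}")
print(f"within:     lambda_1 = {lam[1]:.3f}  beta_2 = {beta[2]:.3f}  gamma_2 = {gamma[2]:.3f}")
print(f"contextual: gamma_1 = {gamma[1]:.3f}  beta_1 - beta_2 = {beta[1] - beta[2]:.3f}")
print(f"variance:   V(gamma_1) = {cov_g[1][1]:.4f}  from covariance = {var_diff:.4f}")
```

Because the simulated data are balanced, the between and within coefficients from the combined model match the separate between and within regressions, and the variance of the directly estimated contextual effect equals the value assembled from the variances and covariance of the two slopes. With unbalanced data, as in the chapter's sample, the first of these equalities need not hold.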
The model to estimate the between and within effects for the $i$th student in school $j$ is

$$y_{ij} = \beta_0 + \beta_1 (x_{ij} - \bar{x}_j) + e_{ij},$$

where

$$\beta_0 = \gamma_{00} + \gamma_{01} \bar{x}_j + u_j$$

and

$$\beta_1 = \gamma_{10}.$$

Equivalently, in mixed notation, we can write this model as

$$y_{ij} = \gamma_{00} + \gamma_{01} \bar{x}_j + \gamma_{10} (x_{ij} - \bar{x}_j) + u_j + e_{ij}.$$

In this model, the between effect is denoted $\gamma_{01}$, the slope of the group mean of the covariate, and the within effect is denoted $\gamma_{10}$, the slope of the group-mean-centered level-1 value of the covariate. The values of $\gamma_{00}$, $u_j$, and $e_{ij}$ are the intercept, between-group residual, and within-group residual, respectively.

The “Mixed Within and Between Effects Model” in column 3 of Table 22.2 presents the results using a mixed model estimated with restricted maximum likelihood (REML). We see that the within effect and its standard error are −5.027 (0.067), identical to those produced by the “Random Within and Between Effects Model” estimated with feasible generalized least squares (FGLS) (column 1). The between effects are close, but differ slightly (−11.722 compared to −11.801). In addition, there is a close correspondence between the contextual effects models estimated with FGLS and mixed models (compare columns 2 and 4 of Table 22.2). In general, we report virtually no differences in the estimates produced by REML and FGLS. We note, however, that the FGLS and multilevel model estimates of the contextual effects are not the same as the OLS estimates of the contextual effects (compare columns 2 and 4 of Table 22.2 to column 4 of Table 22.1). This is because the multilevel models take into account both within- and between-school variation in producing the contextual effect estimate, whereas the OLS estimate of the contextual effect is estimated only on student-level variation.

The credibility of these estimates depends on whether students are assigned to schools in a fashion that is as good as random. With observational data, we rely on assumptions of “conditional ignorability” or “no omitted confounders.” In short, these assumptions mean that once we condition on presumed confounds of treatment assignment and outcome, we can ignore the fact that students were not, in fact, assigned to schools through a random process. If we knew and could measure all confounds related to attending a high-poverty school, we could potentially adjust for these confounds and produce a credible estimate.

For example, in Table 22.2, column 5, we control for race/ethnicity, number of absences, and number of school moves. If these variables were sufficient to remove confounding, we could consider these estimates causal. This is not likely the case, as there are potentially many more confounds we should include in this model. Two omitted confounds we might wish to include are the quality of early childhood education and intrinsic motivation to learn math. Nonetheless, it is instructive to examine what happens to the poverty coefficients at the contextual and individual levels once we adjust for race/ethnicity, absences, and school moves: Both decline in absolute value. This suggests that race, absences, and/or school moves are either confounds or mediators depending on the logic of causal ordering, which is challenging to assess with cross-sectional data. To estimate the total effect of school poverty, we should adjust for confounds and should not adjust for mediators through which the school poverty effects operate. Adjusting for a mediator would essentially block a pathway through which school poverty affects the outcome, which is essential for conducting a mediation analysis, but is not appropriate for estimating the total effect of school poverty. By this logic, race/ethnicity is not likely a mediator because it is determined prior to entering school. School move is also measured prior to entering the school in this period since it measures whether a student is new to the school they currently attend, so it also could not be considered a mediator. Absences during the current school year could, however, be viewed as a confound or a mediator: a confound if absences only
Table 22.2 Cross-sectional random effects contextual models, 5th grade mathematics

                                1                2                3                4                5
                                FGLS random      FGLS random      REML mixed       REML mixed       REML mixed
                                within and       within and       within and       within and       within and
                                between effects  context effects  between effects  context effects  context effects
                                model                             model
Student poverty indicator                        −5.027 (0.067)                    −5.027 (0.067)   −3.680 (0.071)
Group-mean centered student
  poverty indicator             −5.027 (0.067)                    −5.027 (0.067)
School mean poverty             −11.722 (0.513)  −6.694 (0.519)   −11.801 (0.454)  −6.774 (0.459)   −4.353 (0.454)
American Indian                                                                                     −3.214 (0.320)
Asian                                                                                               −2.827 (0.206)
Black                                                                                               −4.811 (0.087)
Hispanic                                                                                            −2.013 (0.115)
Multiracial                                                                                         −1.711 (0.162)
Number of absences                                                                                  −0.017 (0.002)
School move                                                                                         −0.912 (0.105)
Intercept                       360.129 (0.293)  360.128 (0.294)  360.228 (0.259)  360.228 (0.259)  360.395 (0.256)
SD(u)                           4.376            4.376            3.790            3.790            3.736
SD(e)                           7.974            7.974            7.983            7.983            7.773
Rho                             0.231            0.231            0.184            0.184            0.188

Notes: N = 72,252 students nested in 1310 schools. All effects statistically significant at p < 0.001. Standard errors in parentheses. FGLS = feasible generalized least squares; REML = restricted maximum likelihood; SD(u) = standard deviation of the between-school error; SD(e) = standard deviation of the within-school error; rho = portion of total unexplained variation that lies between schools
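The Rho row of Table 22.2 follows from the two standard deviations reported above it. As a minimal sketch, assuming rho is computed as the between-school variance divided by the sum of the between- and within-school variances (consistent with the definition in the table notes), the reported values can be reproduced as follows. The function name `rho` and the dictionary layout are illustrative choices.

```python
def rho(sd_u, sd_e):
    """Share of unexplained variance that lies between schools."""
    return sd_u ** 2 / (sd_u ** 2 + sd_e ** 2)


# (SD(u), SD(e)) pairs copied from Table 22.2.
table_22_2 = {
    "FGLS, columns 1-2": (4.376, 7.974),
    "REML, columns 3-4": (3.790, 7.983),
    "REML with controls, column 5": (3.736, 7.773),
}

for label, (sd_u, sd_e) in table_22_2.items():
    print(f"{label}: rho = {rho(sd_u, sd_e):.3f}")
# FGLS, columns 1-2: rho = 0.231
# REML, columns 3-4: rho = 0.184
# REML with controls, column 5: rho = 0.188
```

These match the tabled values, so roughly a fifth of the unexplained variation in fifth grade math scores lies between schools in each specification.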
reflect family background or health, or a mediator if schools have some control over ensuring students attend school.

22.3 Longitudinal Contextual Effects

Due to the challenges in making causal inferences with cross-sectional data, in this section we consider a different empirical question: whether there is a relationship between changes in school poverty and changes in test scores. If school poverty has a causal effect on test scores, we would expect that students with higher exposure to high-poverty schools would have slower test score growth. This specification assumes there are changes in school poverty and changes in test score to examine. To conduct this analysis we link students to schools over time and measure student poverty, school poverty rate, and test score at each time point. We have seen in Fig. 22.1
that there is wide variation in school poverty across schools. We call these between-school differences in school poverty rates. A different question is whether student exposure to school poverty varies over time. We call these within-student differences in school poverty rates. These can change very little for students who remain in the same school and quite a bit for students who change schools. Student test scores also have within- and between-student components. Test scores can change due to differences in teacher quality, motivation, family inputs, and changes in context over time. But achievement test scores are strongly related within the same student over time, suggesting that a student’s ability to perform on standardized achievement tests may be largely fixed by the early elementary grades. For example, when we fit an unconditional growth model $y_{ti} = \beta_0 + \beta_1 \text{year} + u_{0i} + e_{ti}$, we estimate an ICC (the portion of test score variation that lies between students) of .82, which means that within-student variation around student-specific means is relatively small compared to variation in student-specific means around the grand mean of test scores. This suggests that test scores are fairly stable within the same students over time.

22.3.1 Two Level Growth Model (Time within Student)

We begin our exploration of how test score growth rates vary by school poverty with a two-level random effects linear growth model (time within students) fit to data on students in grades three through five:

$$y_{tij} = \beta_0 + \beta_1 \text{year}_t + \beta_2 x_{tij} + \beta_3 x_{tij}\,\text{year}_t + \beta_4 \bar{x}_{tj} + \beta_5 \bar{x}_{tj}\,\text{year}_t + u_{0i} + u_{1i}\,\text{year}_t + e_{ti}$$

Because this model now includes measures of time (year) and interactions with time, we call this a longitudinal contextual effects model. In the cross-sectional model, we had only one contextual effects parameter. Now we have two, $\beta_4$ and $\beta_5$. Our measure of time, year, is rescaled such that year = calendar year − 2006, so that time runs from 0 to 2, in increments of 1. Student and school poverty are now $x_{tij}$ and $\bar{x}_{tj}$. Both of these variables now have a $t$ subscript to denote that these can vary across time. The intercept is the expected test score for a non-poor student in a school with no poor students at baseline (in 2006). $\beta_2$ and $\beta_4$ are the estimates of baseline test score gaps between poor and non-poor students and between schools with no and all poor students, respectively. Based on prior research and the results presented above, we expect these to be negative. The primary coefficient of interest is $\beta_5$, which measures the annual expected test score growth difference between students in schools with no poor students and students in schools with all poor students. If $\beta_5$ is negative, then test score trajectories of students in high-poverty schools are shallower than the trajectories of students in low-poverty schools. Note that we include an interaction of year and $x$ to avoid biasing $\beta_5$. We include two random effects, a random intercept specific to each student, $u_{0i}$, and a random coefficient for time, $u_{1i}\,\text{year}_t$. The random intercept is the student-specific deviation from the grand mean, and the random effect for time is the student-specific deviation from the mean growth rate across all students.

Table 22.3, model 1 presents the results of the two-level contextual growth model with student-level random intercept and growth terms. We include controls for race/ethnicity, absences, and number of school moves and interactions of these controls with year. This model assumes that school poverty is a student-level characteristic and ignores the clustering of students within schools. Although these assumptions may not be tenable, the differences between this model and the cross-sectional model in Table 22.2, model 5 are notable. First, the relationship between school poverty and math test scores declines substantially in magnitude (from −4.3 to −2.1), as does the relationship between student poverty and baseline math test scores (−3.7 to −1.25). It is important to note that the main effects for student poverty and school poverty have different meanings in Tables 22.2 and 22.3. In Table 22.2, for example, the contextual effect of school poverty is a cross-sectional association at grade five, which includes the cumulative effect of school
Table 22.3 Longitudinal random effects contextual models, 3rd through 5th grade mathematics

                                          1                         2                         3
                                          Two-level longitudinal    Three-level longitudinal  Three-level longitudinal
                                          growth model with         growth model not          growth model that
                                          student-level random      accounting for partial    accounts for partial
                                          effects                   cross-nesting             cross-nesting
Fixed effects
Year                                      5.57*** [0.11]            5.49*** [0.16]            5.55*** [0.14]
Student poverty indicator                 −1.25*** [0.05]           −1.51*** [0.05]           −1.22*** [0.05]
Student poverty indicator * year          −0.10*** [0.03]           −0.13*** [0.03]           −0.13*** [0.03]
School mean poverty                       −2.06*** [0.11]           0.18 [0.22]               0.08 [0.21]
School mean poverty * year                −0.10 [0.06]              0.20 [0.15]               0.08 [0.15]
Constant                                  343.58*** [0.27]          342.59*** [0.31]          342.63*** [0.31]
Student random effects                    x                         x                         x
School random effects                                               x                         x
Accounts for partial cross-nesting                                                            x
Includes controls and interactions
  of controls with year                   x                         x                         x
Random effects
Student
  Sd(intercept)                           7.77                      7.28                      7.37
  Sd(slope)                               1.29                      0.53                      0.77
  Corr(intercept, slope)                  −0.24                     −0.36                     −0.29
School
  Sd(intercept)                                                     3.06                      2.33
  Sd(slope)                                                         1.24                      1.22
  Corr(intercept, slope)                                            −0.15                     −0.27
Sd(residual)                              3.83                      3.80                      3.79
Betw school ICC
  Initial status                                                    0.12                      0.07
  Growth                                                            0.85                      0.72
N                                         216,756                   216,756                   216,756

Notes: N = 216,756 student-year observations across 3 years cross-nested in 1310 schools. All models include controls for race/ethnicity, absences, and number of school moves and interactions of each control with year. Standard errors in brackets. ***p < 0.001, **p < 0.01, *p < 0.05
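The random effects in Table 22.3 can be turned into more interpretable quantities with simple arithmetic. The sketch below makes two assumptions that are not spelled out in the table: that student-level random effects are normally distributed, so that about 95% of them fall within 1.96 standard deviations of the mean, and that the between-school ICC for growth is the school slope variance divided by the sum of the school and student slope variances. The function names are illustrative.

```python
def plausible_range(mean, sd, z=1.96):
    """Interval expected to hold about 95% of normally distributed random effects."""
    return mean - z * sd, mean + z * sd


def school_share(sd_school, sd_student):
    """Share of slope variance lying between schools (assumed ICC definition)."""
    return sd_school ** 2 / (sd_school ** 2 + sd_student ** 2)


# Model 1 of Table 22.3: constant and Sd(intercept), year slope and Sd(slope).
base_lo, base_hi = plausible_range(343.58, 7.77)
grow_lo, grow_hi = plausible_range(5.57, 1.29)
print(f"baseline scores: {base_lo:.1f} to {base_hi:.1f}")   # 328.4 to 358.8
print(f"growth rates:    {grow_lo:.1f} to {grow_hi:.1f}")   # 3.0 to 8.1

# Models 2 and 3 of Table 22.3: school and student Sd(slope).
print(f"model 2 growth ICC: {school_share(1.24, 0.53):.2f}")  # 0.85
print(f"model 3 growth ICC: {school_share(1.22, 0.77):.2f}")  # 0.72
```

Under these assumptions the computed growth ICCs agree with the values printed in the table, and the model 1 ranges span roughly 328 to 359 score points at baseline and 3.0 to 8.1 points of annual growth.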
poverty from the past. In Table 22.3, the contextual effect of school poverty is the expected gap in third grade math test scores.

The average annual growth rate is 5.6 test score points. The coefficient on the interaction of school mean poverty and year, −0.10, is not large or statistically significant, which indicates this
model would produce test score trajectories for students in high-poverty schools that are quite similar to those for students in low-poverty schools. Poor students have slightly slower growth rates, a result that is statistically significant due to the large sample size, but one that is nonetheless quite small in magnitude (−0.10, or 2% of a year’s growth). The random effects predict that 95% of baseline test scores lie between 328 and 359 (344 ± 1.96 × 7.77) and that 95% of growth rates lie between 3.0 and 8.1 (5.57 ± 1.96 × 1.29).

Despite the improvement on a cross-sectional specification that the contextual growth model above offers, the two-level model is not correctly specified and makes assumptions about the structure of the data that are not likely to be valid. Assigning school-level variables as characteristics of the student ignores the clustering of students by demographic, economic, and other characteristics at the school level. Moreover, treating multiple students within a school as being independent observations is inaccurate and often explicitly contradictory to the structure of data sets such as ours that have repeated observations on students nested within schools.

22.3.2 Three-Level Growth Model (Time Within Student Within School)

A more appropriate specification is a three-level model of student achievement trajectories that treats time as level-1, students as level-2, and schools as level-3 (Bryk and Raudenbush 1988). Continuing the illustrative example using the North Carolina data, at level-1 our measurement model is:

$$y_{tij} = \pi_{0ij} + \pi_{1ij}\,\text{year}_t + \varepsilon_{tij}$$

where $y_{tij}$ is the achievement score at year $t$ for student $i$ in school $j$, $\pi_{0ij}$ is initial status or baseline test score for student $i$ in school $j$, $\pi_{1ij}$ is the linear growth rate for student $ij$, and $\varepsilon_{tij}$ is the residual disturbance. Our random intercept and random slope are the level-2 outcomes:

$$\pi_{0ij} = \beta_{00j} + \beta_{01j} x_{tij} + r_{0ij}$$

$$\pi_{1ij} = \beta_{10j} + \beta_{11j} x_{tij} + r_{1ij}$$

where $\beta_{00j}$ is the mean baseline test score within school $j$, $x_{tij}$ is a student-level time-varying covariate (e.g., poverty status), $r_{0ij}$ is a random intercept effect specific to student $ij$, $\beta_{10j}$ is the average growth rate of test scores within school $j$, and $r_{1ij}$ is a random growth rate effect specific to student $ij$. Finally, the school-level equation is:

$$\beta_{00j} = \gamma_{000} + \gamma_{001} \bar{x}_{tj} + u_{00j}$$

$$\beta_{10j} = \gamma_{100} + \gamma_{101} \bar{x}_{tj} + u_{10j}$$

where $\gamma_{000}$ is the mean baseline test score across schools, $\bar{x}_{tj}$ is the time-varying school-level mean of $x$ (e.g., school poverty rate), $u_{00j}$ is a random intercept effect specific to school $j$, $\gamma_{100}$ is the average growth rate of test scores across schools, and $u_{10j}$ is a random growth rate effect specific to school $j$.

Educational research analyzing contextual effects of schools often requires a specific type of multilevel model when using longitudinal data. Cross-sectional data observe students at only one point in time. Quite often in this type of design students appear in only one school, so the data have a hierarchical nesting structure (see Fig. 22.3). This is very rarely the case with longitudinal data, as many students change schools from year to year (or even within years). Thus, rather than students nesting perfectly within schools, students are cross-classified into multiple schools (see Fig. 22.4). Cross-classification complicates the estimation of school-level random effects, and to accurately estimate contextual effects with valid significance tests, we must account for the cross-classified structure of the data.

Table 22.4 provides data from a hypothetical sample to illustrate cross-classification. Each row represents a student, and there are four waves of data collected. Whereas some students are perfectly nested into a single school at level-3 (e.g., students 1, 5, 7, and 9), many students change schools at least once during the panel survey. There are two types of cross-classification.
22 Methods for Examining the Effects of School Poverty on Student Test Score Achievement 505
Fig. 22.3 Three-level data structure with perfect nesting
Fig. 22.4 Three-level data structure with partial cross-nesting
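The nesting patterns described for the hypothetical sample can be checked mechanically. The sketch below is a minimal illustration in Python, assuming a small panel of school IDs per wave patterned on Table 22.4; the values, the dictionary layout, and the function names are illustrative assumptions, not the authors' code. It flags students who stay in one school across all waves and tests whether every school is observed at every wave, which is one way to operationalize complete cross-classification; anything short of that is treated here as partial.

```python
# Hypothetical panel: student ID -> school ID at waves t1..t4.
# Values are illustrative, patterned on Table 22.4.
panel = {
    1: [1, 1, 1, 1],
    2: [1, 1, 1, 2],
    3: [1, 5, 5, 5],
    4: [1, 2, 1, 1],
    5: [2, 2, 2, 2],
    6: [2, 2, 3, 3],
    7: [2, 2, 2, 2],
    8: [3, 4, 4, 4],
    9: [3, 3, 3, 3],
    10: [3, 10, 10, 6],
}


def perfectly_nested(panel):
    """Return the students observed in exactly one school across all waves."""
    return sorted(s for s, schools in panel.items() if len(set(schools)) == 1)


def is_completely_cross_classified(panel):
    """True only if every school-by-wave cell contains at least one student."""
    n_waves = len(next(iter(panel.values())))
    schools = {school for seq in panel.values() for school in seq}
    observed = {
        (wave, school)
        for seq in panel.values()
        for wave, school in enumerate(seq)
    }
    return len(observed) == n_waves * len(schools)


nested = perfectly_nested(panel)
movers = sorted(set(panel) - set(nested))
share_movers = len(movers) / len(panel)
complete = is_completely_cross_classified(panel)

print("perfectly nested students:", nested)
print("students who change schools:", movers)
print("share changing schools:", share_movers)
print("completely cross-classified:", complete)
```

With these assumed values, students 1, 5, 7, and 9 come out as perfectly nested, and the panel is only partially cross-classified because several school-by-wave cells are empty.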
Complete cross-classification occurs when there are students in the survey for each permutation of schools by time period. Partial cross-classification, on the other hand, occurs when we observe only some of the potential permutations of schools by time in the data. This is the case in the hypothetical data displayed in Table 22.4, and in longitudinal data sets with several waves and/or multiple geographies in the sampling frame, this is the most common structure of the data. In fact, as the number of schools in the data set rises or the number of students observed per school falls, it becomes increasingly likely that the data set is partially cross-classified. Not surprisingly, this is the case with our administrative data on children from North Carolina, which has about 11% of students switching schools between years. For more details about cross-classification, readers may consult Raudenbush and Bryk (2002), chapter 12.

Table 22.3, model 2 presents a naïve 3-level model that does not account for the partially cross-nested structure of the data and instead
506 D. L. Lauen et al.
assumes perfect nesting.9 This model, like models 1 and 3, includes controls for race/ethnicity, absences, and number of school moves and interactions of these controls with year. By ignoring the partial cross-nesting, we implicitly assume that every student-school combination represents a unique observation; that is, when a student changes schools, he or she is assumed to be a new student rather than the same student in a different school.

Table 22.4 Hypothetical panel data set with cross-classification

Student ID   School ID_t1   School ID_t2   School ID_t3   School ID_t4
1            1              1              1              1
2            1              1              1              2
3            1              5              5              5
4            1              2              1              1
5            2              2              2              2
6            2              2              3              3
7            2              2              2              2
8            3              4              4              4
9            3              3              3              3
10           3              10             10             6

Unlike the 2-level model shown in the first column, this 3-level specification indicates that there is very little relationship between school poverty levels and baseline math scores (.18). We also observe a positive but insignificant coefficient on the interaction of school mean poverty and year, which indicates that students in high-poverty schools have growth rates quite similar to students in low-poverty schools. The coefficients on student poverty and student poverty growth rate differentials are quite similar in models 1 and 2. Turning to the random effects, we find that about 12% of the variation in student third grade test scores lies between schools. While we do not find evidence of strong effects of school poverty on test score growth, we find strong evidence that schools account for a great deal of the variation in test score growth rates. The standard deviation of student growth rates is 0.53; the standard deviation of school growth rates is more than twice as large at 1.24. This translates to a between-school ICC of 0.85,10 which means that 85% of the total variation in test score growth rates is accounted for by schools and only 15% is accounted for by students.

9 In Table 22.3, models 1 and 3 are estimated with R. Stata's typical coding of multilevel models is strictly hierarchical, with each unit fitting neatly into a single group. It is possible to fit cross-classified data in Stata using a specific notation, but large, unbalanced cross-classified data sets pose serious computational problems in this program. Instead, we turned to the lme4 package in R (Bates 2010), which is particularly well suited for computationally efficient analysis of large non-hierarchical data.

10 The between-school ICC for test score growth is calculated as $1.24^2 / (0.53^2 + 1.24^2) = 0.85$.

Table 22.3, model 3 presents the results from our 3-level longitudinal growth model of math test scores that accounts for partial cross-nesting. The parameter estimates and standard errors are quite similar to those in model 2; however, we observe clear differences in the random effects. In short, accounting for partial cross-nesting increases the portion of variation in initial status and growth due to students relative to that due to schools. Whereas student-specific factors account for only 15% of the variation in math score growth rates in model 2, they explain 28% of the variation in math score growth in model 3. In addition, schools account for less variation in baseline test scores relative to students: Between-school factors explain 12% of the unexplained variance in baseline math scores in model 2, but they only explain 7% in model 3.

Along with the differences we observe between the 3-level models that do and do not properly account for partial cross-nesting, there are also important differences between the 3-level models (models 2–3) and the 2-level model (model 1). The 3-level results are in clear contrast to those from our 2-level model, as we observe sign changes in the fixed coefficients for school poverty on baseline math scores and math score growth. There are also important differences in the random effects, with a small decrease in the standard deviation of the student random intercept and larger declines in the standard deviation of the student random growth coefficient. In fact, whereas less than one-tenth of baseline variance in test scores occurs between schools, over 70% of the variation in growth rates is between schools and potentially attributable to school-level effects rather than student effects (unconditional models yield similar results). Thus, despite the seemingly negligible impact of school poverty, the 3-level model reveals a substantial role for school-level effects on math score achievement trajectories. An important implication of these results is that 2-level models can overstate the role of student effects on achievement trajectories when ignoring school-level characteristics and clustering.

The adequacy of this 3-level model rests on a number of assumptions. First, to conclude that our parameter estimates represent mean causal effects, we must assume that, as stated above, we have properly adjusted for all confounds of the school poverty–test score relationship. This is a strong assumption, and it is unlikely to be satisfied in our present example. For instance, parental socioeconomic status (SES) likely affects both school assignment and test score growth (e.g., through investment in non-school educational resources or at-home educational experiences). Student poverty is related to SES, but is most certainly a flawed proxy in the sense that it captures only the bottom end of the income distribution and does not measure either occupation or parental education.

Our estimates of the effect of school poverty on test score growth will be unbiased if all unobserved confounds are fixed and have constant effects on the dependent variable (test scores) and focal independent variable (school poverty). The relationship between school poverty and baseline test scores would incorporate any time-invariant fixed effects. Thus, although baseline effects are sensitive to time-invariant, unobserved confounding, the relationship between school poverty and math score growth should be immune to this type of confounding. Of course, parental SES does vary over time, as do many other unobserved confounders such as marital disruption, and the impacts of school poverty on both baseline math scores and score growth are subject to bias from these omitted time-varying confounds.

A final assumption of the model is that the functional form of the relationship between school poverty and math scores is specified properly. In the present example, we assume that school poverty has only contemporaneous effects; for instance, school poverty at grade 4 affects achievement at grade 4 but not at grade 5. This may not be correct. Increasingly, researchers are examining the cumulative impact of prolonged exposure to contextual disadvantage. For instance, several analyses of the impact of neighborhood disadvantage conclude that sustained exposure to disadvantaged contexts has much more pernicious effects than episodic exposures (e.g., Sharkey and Elwert 2011; Wodtke et al. 2011).

22.4 Conclusion

There are many reasons to expect a strong negative relationship between school poverty and student test scores. As mentioned above, these include differences between high- and low-poverty schools in curriculum, instructional pacing, teacher quality, classroom disruptions, and other factors. On the other hand, it is widely known that there is much more variation in test score achievement levels within than between schools. This suggests that the common narrative of failing high-poverty schools may be overblown. Perhaps U.S. public schools are fairly homogeneous in the ways that matter the most for learning. For example, there may be very little variation in instructional time across schools, with instruction organized in age-graded classrooms of the same size, taught by teachers with very similar training, governed by fairly consistent state standards. In short, perhaps over time, schools have become institutionally isomorphic with nationwide expectations about how schooling should be organized (Meyer and Rowan 1977; DiMaggio and Powell 1983).

Our results show that improving model specification reduces the correlation between school poverty and test score. The between-school effect is large, at greater than one standard deviation in test score (the math test score standard deviation is 10.6). But some of this is accounted for by the effect of student poverty on test score and the fact that school poverty is an aggregate property of student poverty. The contextual effect of school
poverty, an estimate of the effect of school poverty net of student poverty, is still large at above two-thirds of a standard deviation in test score. Adjusting for race/ethnicity, absences, and number of school moves reduces this cross-sectional association a great deal, to below half a standard deviation. A longitudinal specification is not directly comparable to the cross-sectional model in that the effect of school poverty is now the effect on third grade score rather than fifth grade score. The two-level model reports an even smaller effect of school poverty on baseline (third grade) test scores. The three-level model's school poverty effect is indistinguishable from zero. The first conclusion we draw from this is that adjustment for student-level confounds and modeling framework (cross-sectional vs. longitudinal) matters a great deal to conclusions about the relevance of school poverty to student test scores. The second conclusion we draw from this is that when estimating the effects on student test scores, a three-level specification (time within student within schools) is better than a two-level specification (time within students). The reason for this is that variation in test score growth across schools may be greater than variation in test score growth across students. This perhaps suggests a greater potential to find school correlates of change in test scores rather than student correlates of change in test scores. For this exercise, we only explored one, finding that student growth rates do not vary with the poverty level of their schools. We note that this particular finding does not preclude discovering more promising school correlates of test score change. Third, accounting for partial cross-nesting of students in different schools over time has virtually no impact on fixed parameter estimates, though it does change the size of the random effects. For this reason, this difference could affect standard errors of coefficients. In our study we observe no difference in precision due to a relatively large sample size.

There are some limitations to note. The advantage of using administrative data to estimate contextual effects is that the data contain a census of all students in each school, which reduces standard errors. In addition, students can often be followed over time, and across different schools, which permits longitudinal analysis. A disadvantage is that many important confounds are not measured. National data sets have many more measures of family background, student motivation, and early childhood experiences than do administrative data. Another limitation is that within-student variation in school context is somewhat limited and what variation exists is produced by students who change schools. This itself introduces an endogeneity problem in that the decision to switch schools may be the result of school poverty. If this is the case, then controlling for school mobility may in fact block one of the pathways through which school poverty exerts influence on test scores, which could bias estimates. Methods for proper adjustment of time-dependent confounds are beyond the scope of this chapter, but have been developed by Robins (Robins 1999; Robins et al. 2000) and applied to this question by Lauen and Gaddis (2013). For the sake of ease of exposition, this chapter covers the three time point linear growth case. It is possible that results from a quadratic growth model estimated on grades 3–8 might be more appropriate. Results could vary by state. A national sample with better baseline and time-varying confounds would be a welcome improvement.

Finally, test scores themselves are quite stable within students over time because they are designed to have high reliability. This suggests that estimating correlates of test score changes may be challenging. In addition, test scores are not the only important outcome of schooling, so examining the cumulative effects of school contexts on non-test score outcomes is an obvious next step for future contextual effects research.
Appendix: The Feasible Generalized Least Squares Method for Random Intercept Models

The first method to estimate within and between effects in a single model was feasible generalized least squares (FGLS), which is a process of transforming the variables to control the error structure and then fitting OLS models to the new data. The procedure outlined here is called the Swamy-Arora (1972)11 method and is implemented in many software packages. This model requires estimates of conditional variance components.12 To define conditional variance components, consider that the total conditional variance of the outcome (conditional on the values of fixed predictors and their fixed effects) is a combination of the variances of the error terms from the between and within models, $\mathrm{var}(y_{ij} \mid X_{ij}) = \mathrm{var}(u_j) + \mathrm{var}(e_{ij})$. These two quantities are called variance components. Variance components can be rescaled to be an estimate of the intraclass correlation (ICC), defined as

$$\rho = \frac{\mathrm{var}(u_j)}{\mathrm{var}(u_j) + \mathrm{var}(e_{ij})},$$

which is the proportion of the total conditional or unconditional variation that exists between groups; this proportion is the intraclass correlation, ICC or ρ.13 The intraclass correlation is a measure of how much units within the same group resemble each other on their values of an outcome. The larger the ICC, the more correlated (i.e., more similar) two units are within the same group.

The ICC is an important parameter because it is a key contributor to the design effect (Kish 1965) of the variances of the between-group effects, contextual effects, and within effects. Design effects are measures of how much the sampling variance (the square of the standard error) of estimated effects of group level variables (such as between effects) changes due to the estimation strategy (i.e., generalized least squares compared to ordinary least squares). The use of OLS naively on the original data produces sampling variances that ignore the design effects, leading to understated standard errors and misleading tests of the null hypotheses.

Estimating a contextual effects model with feasible generalized least squares (FGLS) requires transforming the OLS equation with weights defined by a ratio of the variance components. For clustered data, the transformation de-means the value using a weight: For example, a variable z would be transformed by $z^*_{ij} = z_{ij} - \hat{\theta}\bar{z}_j$ using the group mean of z and the parameter θ as the weight. The parameter θ is based on the within and between variance components (Cameron and Trivedi 2005):

$$\hat{\theta} = 1 - \sqrt{\frac{\mathrm{var}(e_{ij})}{n\,\mathrm{var}(u_j) + \mathrm{var}(e_{ij})}}.$$
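The quantities defined above can be computed directly from two variance components. The following sketch, in Python, assumes illustrative variance components (0.23 between and 0.77 within, chosen only so that the ICC equals 0.23), a balanced design with n = 55 units per group, and the usual Kish form of the design effect, 1 + (n − 1)ρ. The function names are hypothetical; this is not the estimation routine of any particular package.

```python
import math


def icc(var_u, var_e):
    """Intraclass correlation: share of variance lying between groups."""
    return var_u / (var_u + var_e)


def theta(var_u, var_e, n):
    """GLS weight used for partial de-meaning (balanced groups of size n assumed)."""
    return 1.0 - math.sqrt(var_e / (n * var_u + var_e))


def design_effect(n, rho):
    """Kish-style design effect for a group-level estimate: 1 + (n - 1) * rho."""
    return 1.0 + (n - 1) * rho


def quasi_demean(values, theta_hat):
    """Transform one group's values as z* = z - theta_hat * group mean."""
    group_mean = sum(values) / len(values)
    return [v - theta_hat * group_mean for v in values]


# Assumed, illustrative inputs (not estimates taken from the chapter's tables).
var_u, var_e, n = 0.23, 0.77, 55

rho = icc(var_u, var_e)             # about 0.23
theta_hat = theta(var_u, var_e, n)  # about 0.760
deff = design_effect(n, rho)        # about 13.42

# Limiting cases: theta is 0 with no between-group variance (pooled OLS)
# and approaches 1 as between-group variance dominates (fixed effects).
theta_pooled = theta(0.0, 1.0, n)
theta_near_fe = theta(100.0, 1.0, n)

# The same ratio applied to the squared standard deviations of the growth
# rates reported in the chapter (0.53 for students, 1.24 for schools).
growth_icc = icc(1.24 ** 2, 0.53 ** 2)

print(rho, theta_hat, deff, theta_pooled, theta_near_fe, growth_icc)
```

Setting `theta_hat` to 1 in `quasi_demean` subtracts the full group mean, as a fixed effects estimator would, and setting it to 0 leaves the data untouched, as in pooled OLS.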
11 See Hill et al. (2008) for a general overview, and Amemiya (1985) for a more advanced treatment and history.

12 Note that we specify the conditional variance components, since the standard errors are based on the residual variance net of the model. Many other texts on education evaluations employ unconditional variance components because they are performing experiments, where the only impact of interest is the randomized experiments. However, here we are modeling observational data, and thus the standard errors are based on the residuals net of the model specified.

13 In many cases, such as randomized experiments, the ICC is a measure of how much the population variance occurs between groups. Estimates of these parameters for math and reading are available from Hedges and Hedberg (2007, 2013) and Hedberg and Hedges (2014). However, in contextual analysis, the ICC is a conditional parameter, noting how much of the variance in the outcome, net of predictors, occurs between groups.

Thus, a model that estimates both within and between effects can be computed using the following regression,

$$\left(y_{ij} - \hat{\theta}\bar{y}_j\right) = \beta_0\left(1 - \hat{\theta}\right) + \beta_1\left(\bar{x}_j - \hat{\theta}\bar{x}_j\right) + \beta_2\left(x_{ij} - \bar{x}_j\right) + \left(e_{ij} - \bar{e}_j\right).$$

As between-unit variation increases relative to within-unit variation, $\hat{\theta}$ approaches 1 and the random effects estimator converges to the fixed effects estimator, which demeans with the entire portion of the group mean. Conversely, as within-unit variation increases relative to between-unit variation, $\hat{\theta}$ approaches 0 and the random effects estimator converges to pooled
OLS, in which group averages are irrelevant. In other words, the random effects estimator uses the information in the data to determine how much of the group mean to include in the estimate, more if between effects are large and less if between effects are small.

The θ parameter is also directly related to how the standard error increases when using the correct model compared to the naïve OLS estimator. For example, if we examine the “Random Within and Between Effects Model” in column 1 of Table 22.2 we see that the standard error of the between effect (0.513) is much larger than the standard error of the between effect from the OLS model in column 3 of Table 22.1 (0.136). The random effects variance is about 0.513^2/0.136^2 = 14 times higher than the OLS variance. There are about 55 students per school, and the conditional ICC is 0.23, so the expected design effect is about 1 + (55–1)*0.23 = 13, which is consistent with the observed inflation in the standard error of the between effect coefficient.

References

Allensworth, E., Ponisciak, S., & Mazzeo, C. (2009). The schools teachers leave: Teacher mobility in Chicago public schools. Research Report, Consortium on Chicago School Research, University of Chicago. Retrieved from http://files.eric.ed.gov/fulltext/ED505882.pdf

Amemiya, T. (1985). Advanced econometrics. Cambridge, MA: Harvard University Press.

Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book

Battistich, V., Solomon, D., Kim, D., Watson, M., & Schaps, E. (1995). Schools as communities, poverty levels of student populations, and students’ attitudes, motives, and performance: A multilevel analysis. American Educational Research Journal, 32(3), 627. https://doi.org/10.2307/1163326.

Blalock, H. M. (1984). Contextual-effects models: Theoretical and methodological issues. Annual Review of Sociology, 10, 353–372.

Blau, P. M. (1960). Structural effects. American Sociological Review, 25, 178–193.

Blau, P. M., & Duncan, O. D. (1967). The American occupational structure. New York: Wiley.

Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2005). Explaining the short careers of high-achieving teachers in schools with low-performing students. American Economic Review, Papers and Proceedings, 95(2), 166–171.

Bryk, A. S., & Raudenbush, S. W. (1988). Toward a more appropriate conceptualization of research on school effects: A three-level hierarchical linear model. American Journal of Education, 97, 65–108.

Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.

Choi, K. H., Raley, R. K., Muller, C., & Riegle-Crumb, C. (2008). Class composition: Socioeconomic characteristics of coursemates and college enrollment. Social Science Quarterly, 89(4), 846–866.

Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2009). Are teacher absences worth worrying about in the U.S.? Education Finance and Policy, 4, 115–149.

Clotfelter, C. T., Ladd, H. F., Vigdor, J. L., & Wheeler, J. (2007). High-poverty schools and the distribution of teachers and principals. North Carolina Law Review, 85, 1345–1379.

Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.

DiMaggio, P. J., & Powell, W. W. (1983). The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields. American Sociological Review, 48(2), 147–160. https://doi.org/10.2307/2095101.

Duncan, O. D., & Hodge, R. W. (1963). Education and occupational mobility: A regression analysis. American Journal of Sociology, 68, 629–644.

Entwisle, D. R., Alexander, K. L., & Olson, L. S. (1994). The gender gap in math: Its possible origins in neighborhood effects. American Sociological Review, 59(6), 822–838.

Hedberg, E. C., & Hedges, L. V. (2014). Reference values of within-district interclass correlations of academic achievement by district characteristics results from a meta-analysis of district-specific values. Evaluation Review, 38(6), 546–582.

Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60–87.

Hedges, L. V., & Hedberg, E. C. (2013). Intraclass correlations and covariate outcome correlations for planning two- and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445–489.

Hill, R. C., Griffiths, W. E., & Lim, G. C. (2008). Principles of econometrics. Hoboken: Wiley.

Hoxby, C. (2000). Peer effects in the classroom: Learning from gender and race variation (NBER Working Paper No. 7867). Retrieved from http://www.nber.org/papers/w7867

Ingersoll, R. (2002). Out-of-field teaching, educational inequality, and the organization of schools: An exploratory analysis. CPRE Research Reports. Retrieved from http://repository.upenn.edu/cpre_researchreports/22
Iversen, G. R. (1991). Contextual analysis. Newbury Park: SAGE Publications.

Jargowsky, P. A. (1996). Take the money and run: Economic segregation in U.S. metropolitan areas. American Sociological Review, 61, 984–998.

Johnson, S. M., Kraft, M. A., & Papay, J. P. (2012). How context matters in high-need schools: The effects of teachers’ working conditions on their professional satisfaction and their students’ achievement. Teachers College Record, 114(10), 1–39.

Kahlenberg, R. D. (Ed.). (2012). The future of school integration: Socioeconomic diversity as an education reform strategy. New York: The Century Foundation.

Kish, L. (1965). Survey sampling. New York: Wiley.

Lankford, H., Loeb, S., & Wyckoff, J. (2002). Teacher sorting and the plight of urban schools: A descriptive analysis. Educational Evaluation and Policy Analysis, 24(1), 37–62.

Lauen, D. L., & Gaddis, S. M. (2013). Exposure to classroom poverty and test score achievement: Contextual effects or selection? American Journal of Sociology, 118(4), 943–979.

Lüdtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13(3), 203.

Meyer, J. W., & Rowan, B. (1977). Institutionalized organizations: Formal structure as myth and ceremony. American Journal of Sociology, 83, 340–363.

Owens, A., Reardon, S. F., & Jencks, C. (2016). Income segregation between schools and school districts. American Educational Research Journal, 53(4), 1159–1197.

Raudenbush, S. W. (1993). Hierarchical linear models and experimental design. Applied Analysis of Variance in Behavioral Science, 137, 459.

Raudenbush, S. W., Bryk, A. S., & Congdon, R. (2004). HLM 6 for windows [Computer software]. Lincolnwood: Scientific Software International.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. SAGE Publications.

Reardon, S. F., & Bischoff, K. (2011). Income inequality and income segregation. American Journal of Sociology, 116(4), 1092–1153.

Robins, J. M. (1999). Association, causation, and marginal structural models. Synthese, 121(1), 151–179.

Robins, J. M., Hernán, M. Á., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550–560.

Sacerdote, B. (2011). Peer effects in education: How might they work, how big are they and how much do we know thus far? In E. Hanushek, S. Machin, & L. Woessmann (Eds.), Handbook of the economics of education. Amsterdam: Elsevier.

Scafidi, B., Sjoquist, D. L., & Stinebrickner, T. R. (2007). Race, poverty, and teacher mobility. Economics of Education Review, 26(2), 145–159.

Sewell, W. H., Haller, A. O., & Portes, A. (1969). The educational and early occupational attainment process. American Sociological Review, 34, 82–92.

Sewell, W. H., Hauser, R. M., & Wolf, W. C. (1980). Sex, schooling, and occupational status. American Journal of Sociology, 86, 551–583.

Sharkey, P., & Elwert, F. (2011). The legacy of disadvantage: Multigenerational neighborhood effects on cognitive ability. American Journal of Sociology, 116(6), 1934–1981.

Swamy, P. A. V. B., & Arora, S. S. (1972). The exact finite sample properties of the estimators of coefficients in the error components regression models. Econometrica: Journal of the Econometric Society, 40, 261–275.

Watson, T. (2009). Inequality and the measurement of residential segregation by income in American neighborhoods. Review of Income and Wealth, 55(3), 820–844.

Willms, J. D. (1986). Social class segregation and its relationship to pupils’ examination results in Scotland. American Sociological Review, 51, 224–241.

Wodtke, G. T., Harding, D. J., & Elwert, F. (2011). Neighborhood effects in temporal perspective: The impact of long-term exposure to concentrated disadvantage on high school graduation. American Sociological Review, 76(5), 713–736.
23 School and Teacher Effects

Stephen L. Morgan and Daniel T. Shackelford

S. L. Morgan (*) · D. T. Shackelford (*)
Johns Hopkins University, Baltimore, MD, USA
e-mail: [email protected]; [email protected]

© Springer International Publishing AG, part of Springer Nature 2018
B. Schneider (ed.), Handbook of the Sociology of Education in the 21st Century, Handbooks of Sociology and Social Research, https://doi.org/10.1007/978-3-319-76694-2_23

Abstract

This chapter summarizes the extant sociological literature on the interactive nature of school and teacher effects on student learning. It explains why the most recent literature on teacher sorting demands the attention of more sociologists of education, and it demonstrates what is revealed about patterns of teacher sorting using the type of data most commonly analyzed by sociologists of education. Throughout, the chapter discusses the methodological requirements of research that can and cannot disentangle teacher effects from school effects, and it considers how teacher and school effects may be evolving in the changing landscape of K–12 education in the United States.

For studies of school performance and student learning, the sociology of education has a long history of research on the effects of teachers. Most of the specific literature on these effects predates the push to encourage effective teaching in the United States through accountability policies. In fact, as we will discuss in this chapter, sociologists have contributed very little to the debate on the validity of models and measures that seek to identify effective teachers, including methods that (1) infer effective teaching from growth in pupil test scores or (2) assess teacher performance through systematic classroom observation. Instead, these debates have been dominated by economists and policy researchers who have demonstrated little interest in drawing insight from the extant sociological literature on either teacher effects or school effects.

Although the lack of broad engagement among sociologists in the most recent debate on effective teaching might be considered a failing of the sociology of education, it also reflects a healthy skepticism about the worth of engagement in a debate over methods, such as value-added models (VAMs), thought very likely to fail on their own anyway. Even with this rationalization, now is the time for sociologists to join fellow social scientists and policy researchers in a reconstruction of the literature on teacher effects. Not only is there good reason to expect that the monitoring of effective teaching may have altered the relationships between teachers and other school actors, the debate itself appears to be in a phase of transition to more reasonable modes of analysis and interpretation. More scholars seem to recognize that teacher effects vary fundamentally because of their entanglement with effects generated by school and community differences. These encompassing contextual effects are familiar objects of study for sociologists of education,
514 S. L. Morgan and D. T. Shackelford
and as a result sociologists have an important sentences from Waller’s book, most commonly to
contribution to offer. demonstrate the choppy waters that teachers must
navigate when they seek to motivate listless stu-
In this chapter, we have several related aims: dents while accommodating parents and school
(1) to convey the contours of the extant sociologi- leaders. Yet, the focus on this single book in the
cal literature on teachers, (2) to consider the current collective memory often obscures the
interactive nature of school effects and teacher breadth of related research from early and
effects on student learning, (3) to explain why the mid-twentieth-century sociology of education.
most recent literature, largely outside of sociology, Consider just three examples of topics of
on teacher sorting should receive more study from this period of scholarship that, as we
attention from sociologists of education, (4) to will explain below, remain important to current
demonstrate what is revealed about patterns of debates on teacher effectiveness:
teacher sorting using the type of data most com-
monly analyzed by sociologists of education (the 1. Professionalism: Teachers should be profes-
most recent nationally representative survey of sionals, and mechanisms for the careful selec-
high school students conducted by the U.S. tion and training of teachers need to be further
Department of Education), and (5) to offer our developed (Myers 1934). Teachers differ a
perspective on the methodological and measure- great deal in their social origins (Carlson
ment requirements of research that can break new 1961), but they remain valued leaders in their
ground on unraveling the interrelationships communities (Buck 1960). Teachers retain
between school and teacher effects. their community leadership roles partly
because their out-of-school behavior is moni-
23.1 T hree Themes of Sociological tored and regulated by the community (Cook
Research on Teachers et al. 1938; Cook and Greenhoe 1940).
Relatedly, teacher satisfaction rests on mutu-
In this section, we recount three prominent ally respectful relations with the community
themes in sociological research on teachers, (Roth 1958). In large school systems, teachers
which can be discussed in a rough chronological move between vacancies in search of students
order. No review can hope to be comprehensive, who are easier to teach, typically with the con-
and we aim only to offer examples that demon- sequence that the schools with students who
strate longstanding sociological engagement on have the most social disadvantage receive
three topics—teachers as professionals embed- school instruction from the least experienced
ded in communities, teachers as inputs into stu- teachers (Becker 1952a). Fortunately, most
dent achievement models, and teachers as actors teachers remain active readers, including for
in schools with complex organizational structures professional development and the improve-
that are differentially effective. ment of their own teaching skills (Fisher
1958).
23.1.1 Teachers As Professionals
Embedded in Communities 2. Within-Classroom Performance: Teachers are
most effective when their social distance from
The most prominent early sociological research the pupils assigned to them is minimized, sug-
on school teachers is easily identified by the work gesting that teachers should be trained and
of Willard Waller, whose (1932) book The sorted in recognition of these challenges
Sociology of Teaching mapped the contours of (Bogardus 1928). But because of student het-
subsequent scholarship. To align their work with erogeneity, and the lack of an effective system
Waller’s legacy, contemporary sociologists still that allocates teachers to students with
frequently adorn their writing with insightful individual-specific needs, it is important for
all teachers to tailor their practices to the indi-
vidual situations of each student (Bogardus
23 School and Teacher Effects 515
1929; Becker 1952b). Matching effects aside, teachers who maintain a traditional, autocratic mode of instruction teach more content than do teachers who maintain a congenial, democratic mode of instruction (Brookover 1943).

3. Attitudes Toward School Leadership: Teachers must navigate conflicting pressures created by students, parents, and principals, and the relationships among teachers reflect their approaches to these pressures (Becker 1953; Gordon 1955). Teacher satisfaction is shaped by whether administrators conform to teachers’ expectations of appropriate administrative decision making (Bidwell 1955).

A more comprehensive review of the literature from this period is of limited value, and some of the early research does not meet our current standards of rigor. Nonetheless, some attention is instructive, as these examples demonstrate, to appreciate the provenance of many of the research themes found today in the sociology of education and in debates on teacher effectiveness. Although the three sets of conclusions summarized above range over multiple substantive domains, they are all consistent with the themes set down by Waller: Teachers are professionals, pursuing complicated goals, including their own professional development and career trajectories, which must be pursued within schools and communities with diverse actors and dynamic expectations.

23.1.2 Teachers As Inputs in Educational Production

Research on teacher effectiveness was pushed in a new direction by the 1966 study, Equality of Educational Opportunity (EEO), commonly referred to as “the Coleman report” (Coleman et al. 1966). In an attempt to document differences in all schooling “inputs,” following on the directive from the Civil Rights Act of 1964 to conduct a national study of educational opportunity, Coleman and his team launched a study of extraordinary importance (see Alexander and Morgan 2016; Gamoran and Long 2007; Sørensen and Morgan 2000).

Possibly because of the attention to resource differences across schools, as well as the compelling case made for the preeminence of family background as a determinant of student achievement, EEO’s attention to the study of teacher effects is often forgotten. In fact, it is not clear that EEO’s contributions were ever adequately appreciated. Ravitch (1993, p. 130) claims that its findings on teachers were “almost universally ignored by academic researchers and the press” after the report was released and in subsequent decades. In retrospect, and with another couple of decades of reflection, the core findings of EEO on teacher effects must be recognized as one of sociology’s most important contributions to the study of teachers.

For their work, Coleman and his team first tabulate differences in teacher characteristics and skills by the racial identities of students, separately by region of the country.1 The overall goal of EEO was to measure and report on such differences. Through linked surveys of students, teachers, and school administrators, Coleman and his team offer the following summaries of their primary findings on teachers (pp. 148 and 165, respectively):

1 For the specific numbers, see Tables 6a and 6b, pages 16–17, Tables 2.31.5 and 2.31.6, pages 124–25, Tables 2.33.1–8, pages 131–40, Tables 2.34.1–14, pages 149–62.

Compared to teachers of the average White student, teachers of the average [Black student]
• score lower on a test of verbal competence, and the difference is most pronounced in the Southern States.
• are neither more nor less likely to have advanced degrees.
• have slightly more teaching experience, and slightly more tenure in their present school.
• read more professional journals.
• are neither more nor less likely to have majored in an academic subject.
• if they are elementary teachers, were less likely to be trained in teacher’s colleges.
• more often are products of colleges that offer no graduate training.
• attended colleges with a much lower percent White in the student body.
• less often rate their college high in academic quality.
• less often are members of academic honorary societies, at least in the South.
• more often participate in teachers’ organizations, especially in the South.
• more often have attended institutes for the culturally disadvantaged.

Compared to the average White [pupil], the average [Black] pupil attends a school in which the teachers are
• neither more nor less likely to have high absenteeism rates.
• paid more in some regions and less in others; thus the national averages are about the same.
• more likely to have requested assignment to their particular school and to expect to make a lifelong career of teaching.
• less likely to wish to remain in their present school if given a chance to change, or to declare they would reenter teaching if the decisions could be made again.
• less likely to rate students high on academic motivation and ability.
• less likely to believe that the school has [a] good reputation with other teachers.
• less likely to prefer to teach in an academic high school.
• more likely to spend a substantial amount of time in class preparation.
• more likely to teach large classes.
• more likely to spend time counseling with students.
• somewhat more likely to have taught in the school the prior year.
• more likely to take a teacher’s examination as a condition of employment.

Racial differences were, therefore, complex when teacher characteristics are analyzed using all of these measures. The resolution of the complexity for Coleman and his team was to predict student achievement based on teacher characteristics, as part of the larger goal of shifting analysis away from a consideration of equality of inputs toward the capacity of inputs to generate more equality of student outcomes.

Here, the analysis is clear: Teacher characteristics are predictive, and more strongly for Black students than for White students (see Tables 3.25.2 and 3.25.3, p. 318). Perhaps most interesting, teachers’ verbal test scores (on a thirty-item vocabulary test) have independent predictive power, above and beyond teachers’ levels of education and experience. For this particular effect, Coleman and his colleagues conclude that “the teachers’ verbal skills have a strong effect, first showing at the sixth grade, indicating that between grades 3 and 6, the verbal skills of the teacher are especially important” (Coleman et al. 1966, p. 318). Altogether, EEO concludes that teachers are important, that their effects accumulate over years of schooling, and that the achievement of non-White students is especially responsive to teacher quality.

Because of its design, the Coleman report conceptualized teachers as a schooling “input,” reflecting the educational production methodology of the time. In this tradition, school environments are nominally additive, even if the subtlety of the writing sometimes implies genuine interactions. Regardless, in this type of teacher effects research, scholars have less use for characterizations of teachers as professionals embedded in communities, struggling to navigate institutional rules and social relations while working with heterogeneous populations of students. They are seen instead as actors with fixed characteristics and capacities, distributed across schools in ways that reflect their own interests as well as the opportunities and constraints in the labor market for teachers.

23.1.3 Teachers As Members of Differentially Effective Schools

With the maturation of the subfield of sociology of education, scholars continued to work on the three subjects from mid-twentieth-century work introduced above: professionalism (e.g., Blase
1986), within-classroom performance (e.g., Sieber and Wilder 1967), and attitudes toward the community and school leadership (e.g., Edgar and Warren 1969; Jessup 1978). Some existing questions received deeper examination, such as studies of student–teacher match advantages that leverage higher-quality data and refined conceptualizations. Alexander et al. (1987), for example, make the case that pupil–teacher background congruence, based on the match of the socioeconomic status of the teacher to that of the pupil’s family, promotes higher levels of achievement within the classroom.2 These studies have also evolved to align with emergent theoretical perspectives and alternative methodologies (e.g., Calarco 2011, 2014).

2 And such studies have continued. Crosnoe et al. (2004), for example, offer evidence of more general achievement gains that result from healthy relationships between students and teachers, which they measure as intergenerational bonding. Now, economists are very much interested in such effects, as we discuss below.

The major development, however, was the emergence of a developed perspective on schools as complex organizations. From early work, such as Larkin (1973), Bredo (1977), and Barnett (1984), that explored how school organization determines teacher behavior, a whole-school approach to modeling effectiveness developed from the 1980s onward. The emergent model came to see teachers not as learning inputs with fixed capacities for generating achievement, with effects variable only according to match differences across students with differing needs, but rather as vital core workers in schools with variable environments that delimit the range of possible performance. From this perspective, teacher effectiveness varies with administrative structures and the social resources that inhere in work networks (see Gamoran et al. 2000).

This enriched conceptualization of schools emerged from scholarly sources and in response to policy concerns. A preexisting interest in investigating schools as agents of the intergenerational reproduction of inequality was joined to new work on the social organization of schooling (see Hedges and Schneider 2005). The result was increased attention to the unintended and/or hidden consequences of some prominent educational practices, such as ability grouping and curriculum tracking, as well as new consideration of how organizational constraints can limit school functioning and teacher performance. This work was pursued as efforts to desegregate schooling had stalled, the standards-based reform movement was launched in hopes of preserving the international standing of U.S. educational institutions, and whole-school models of reform were initially crafted (later often relabeled “restructuring” and “school turnaround” models; see, e.g., Lee and Smith 1993, 1995).

Much could be written on the development and general contours of the effective schools literature in sociology from the 1980s through 2000, but we focus only briefly on the subset of this literature that has considered the role of teachers in delivering effective instruction. This literature includes pieces that model teacher commitment, efficacy, and satisfaction as a function of organizational form, leadership structure, and general workplace control (e.g., Bacharach et al. 1990; Bidwell et al. 1997; Ingersoll 1996; Lee et al. 1991; Raudenbush et al. 1992; Rosenholtz and Simpson 1990; Rowan et al. 1997). It also includes research that considers the social relations among teachers, and how these relations can be a resource for supporting a school’s mission to generate achievement as a collective project (e.g., Bidwell and Yasumoto 1999; Friedkin and Slater 1994; Yasumoto et al. 2001).

Although much variation exists in the particular arguments of these many studies, most valorize the school community’s capacity to develop and support effective teaching, even while the specific analysis of teacher practices is not usually a direct subject of study. A good example of this type of argument is the work on private, and especially Catholic, schools. Bryk et al. (1993) is the exemplar. Here, the notion of “subsidiarity” received particular emphasis as a broad ideological commitment that structures effective Catholic schools. As Bryk et al. (1993, pp. 301–02) write:

… subsidiarity means that the school rejects a purely bureaucratic conception of an organization. There are advantages to workplace specialization, and it is hard to imagine the conduct of complex work without established organizational procedures.
Subsidiarity, however, claims that instrumental considerations about work efficiency and specialization must be mediated by a concern for human dignity. Decentralization of school governance is not chosen purely because it is more efficient, although it does appear to have such consequences. Nor is it primarily favored because it creates organizations that are more client sensitive, although this also appears to be true. Rather, decentralization is predicated on the view that personal dignity and human respect are advanced when work is organized in small communities where dialogue and collegiality may flourish. At root is a belief that the full potential of human beings is realized in the social solidarity that can form around these small group associations.

This sort of writing, and explanatory style, is used to explain why Catholic schools are effective. Teachers are central to the mechanism that generates learning, but it is the organization itself that activates the mechanism.

Following the development of this sociological version of the effective schools literature, sociologists have moved toward more direct assessments of interventions that target teacher performance. In some cases, the connections to the effective schools literature are overt (e.g., Gamoran et al. 2003; Moller et al. 2013) while for others the attention is less direct (e.g., Hallinan 2008; Jennings and DiPrete 2010). Overall, the effective schools literature remains influential within sociology, and it is an important piece of the foundation on which a prevailing consensus would now appear to rest, and which we detail in the next section.

23.2 School Effects and Teacher Effects in Sociology: The Conventional Wisdom in Four Propositions

From the sociological literature on the effects of teachers, we are comfortable asserting that the following propositions are supported by enough convincing evidence to constitute the conventional wisdom of the field:

1. Teacher effects on student learning are real, and these effects vary according to the match of each teacher to each student.3

2. Teacher effects are a joint function of teachers’ skills and effort, the first of which is strongly shaped by experiences before entering the profession.4

3. School environments, which encompass both administrative structures and networks of social relations, shape both student effort and teacher effort.

4. Effective schools align student effort and teacher effort to advance student learning.

3 The recent economics literature, which has leveraged administrative data sources, is also relevant, especially for the claim of match effects. Egalite et al. (2015), for example, show that in Florida the race congruence of student–teacher pairing promotes small but positive effects, even though Winters et al. (2013) argue that gender congruence appears to have no substantial effects. See also Jackson (2013) for a broad treatment of teacher match effects, which demonstrates their importance with empirical results from North Carolina.

4 The economics literature is also consistent with the skills claim. Ehrenberg and Brewer (1994), analyzing the High School and Beyond data, show that teachers’ degrees have positive associations with achievement, perhaps indicating that teacher ability is important. More recently, Clotfelter et al. (2007), through an analysis of North Carolina administrative data, show that teacher experience, test scores, and licensure all have positive associations with achievement, although more for math than for reading. Kukla-Acevedo (2009) show that in a Kentucky school district teachers’ math preparation predicted fifth grade math achievement.

The joint implication of these propositions can be expressed as

$$\text{Learning}_i = f_i(\text{Teacher}_j, \text{Environment}_s) \tag{23.1}$$

where the learning of each student $i$ is an individual-specific function, $f_i(\cdot)$, under exposure to a teacher $j$ in school environment $s$. The challenge for analysis is that we typically observe a student’s achievement, and possibly a student’s achievement growth, for a small number of teachers in only one school. We want to know how
learning would differ if each student were exposed to alternative teachers in alternative school environments, after which we could form estimates for groups of individuals of different types, exposed to different types of teachers and in different school environments. Unfortunately, our observational data sets do not permit clean identification of these effects of interest because the institutional structure of schooling restricts individual students’ exposure to alternative teachers and schools.

In sociology, it is common to offer estimated regression equations of the form:

$$Y = a + b_T T + b_S S + b_X X + e \tag{23.2}$$

where $Y$ is a learning outcome measure, $T$ is one or more measures of teacher characteristics, $S$ is one or more measures of school environments, and $X$ is a set of student-level characteristics typically included as “control” variables. The terms such as $b_T$ are conformable vectors of estimated slope coefficients for the measures specified by the subscripts. If the analysis considers teacher effects directly, then $S$ is often regarded as a set of school-level controls. If the study is one of school environments, in which it is asserted that teacher effects are part of an unobserved mechanism, then $T$ may be excluded, often because suitable measures are unavailable.

Interpretations of results from estimated regression equations of this form are often developed with language that implies interactive effects, such that, for example, the estimates $b_T$ should be interpreted as conditional on values of $S$, or possibly even $b_S$. Such interpretations are usually developed as part of the overall conclusions of a study, when authors use theory and intuition to reason beyond their empirical models that usually have been specified as nominally additive. When reasoning beyond the data, few sociologists discuss their findings with explicit recognition of the individual-specific nature of Eq. 23.1, where the function that generates learning is itself individually variable. Instead, individual variability is usually thought to have been swept away by a lag specification for the outcome $Y$ along with measures in $X$, even if in some cases the putative teacher effects in $b_T$ are discussed as if they are conditional on $S$ and $X$. In research where measures in $T$ are unavailable, the reduced form school effects in $b_S$ are often discussed as if they encompass complex interactions with latent teacher effects, which could be directly estimated if suitable measures in $T$ were to become available.

Altogether, in sociology it is widely recognized that the effects of schools and the effects of teachers who work within schools cannot be separated easily in an empirical analysis. Outside of sociology, it is less clear that this point is recognized, as we will discuss below. That said, outside of sociology, especially in the work of economists, it is widely recognized that the joint distribution of students, teachers, and schools generates complex matching gains and deficits in the learning process. This recognition has led to a rich literature on teacher assignment, attrition, and sorting, which we present next.

5 We do not mean to imply that scholars did not study assignment and sorting patterns before the era of accountability arrived in the 1990s. One early careful study in sociology is Becker (1952a), as summarized above. And, in the wake of EEO, and after the U.S. Supreme Court ruled that racial balance in the teaching corps is a measure of unitary status in desegregating school districts, scholars became very much interested in the distribution of teachers across schools in the same area. For example, Greenberg and McCall (1974) show that in the San Diego school system teachers sorted across schools based on the socioeconomic status of students, given that the salaries available did not differ across the district. Studies such as this one led to deeper modeling of teachers’ revealed preferences and the possibilities for interventions to change their job search choices (see Antos and Rosen 1975; Levinson 1988).

23.3 The Distribution of Teachers Across and Within Schools

Since the Elementary and Secondary Education Act was reauthorized by the No Child Left Behind (NCLB) legislation, we have learned a great deal about teacher assignment and teacher sorting.5 This research has accumulated progressively, building on templates from the 1980s and 1990s of various types (e.g., how teachers
respond to desegregation remedies, how teachers are laid off as part of “reductions in force” studies). A growing source of motivation is to understand whether the nation’s teaching corps is strong enough, and stable enough, to support a schooling system that will allow the U.S. to remain competitive with the surging economies of international peers. More recently, as systems were developed by states to consider whether schools were making the adequate yearly progress (AYP) required for continuation under NCLB, some granular analysis of teacher effects across all schools has become possible. The interests of three groups then dovetailed: (1) those who hoped to develop new formulas for AYP that could replace threshold measures of proficiency with alternatives that recognize school differences in average student achievement growth; (2) those who hoped to develop models of achievement growth that could be used to identify teachers who are deserving of merit bonuses; and (3) those who hoped to use achievement growth models to determine the proportion of teachers who are grossly ineffective, yet protected from dismissal because of teacher tenure.

This literature is important for sociologists to absorb because it has implications for the conventional wisdom on school and teacher effects. Yet, it is impossible to review this vast literature both chronologically and by theme in a piece of this length. We have therefore grouped the studies by primary findings, ordered somewhat chronologically as they have been developed in the literature. With only a few exceptions (e.g., Ingersoll 2005; Kalogrides et al. 2013), this research has accumulated in journals that do not have a sociological focus. The primary findings are:

1. The student composition of schools (percent in poverty, proportion non-White, etc.) predicts both teacher attrition and teacher mobility (see Elfers et al. 2006; Feng 2014; Hanushek et al. 2004; Scafidi et al. 2007). Rates of exit are highest in schools with pupils who have greater social disadvantage, leaving the teaching corps in such schools comparatively young and inexperienced.

2. Across schools in the same geographic region, and frequently the same local education authority, teachers appear to be sorted by the student composition of schools, using standard measures of credentials and experience, and following the pattern first established for teacher attrition (see Allensworth et al. 2009; Boyd et al. 2005; Clotfelter et al. 2005, 2006, 2011; Feng 2010, 2014; Krei 1998; Lankford et al. 2002; Rice 2013).

3. These patterns of teacher attrition, mobility, and sorting may be a response to school management and working conditions, which vary with student composition, rather than a direct response to the greater challenges of teaching students from disadvantaged origins (see Horng 2009; Ingersoll and May 2012; Loeb et al. 2005; Ost and Schiman 2015).

4. Some policy interventions can make between-school sorting even more substantial. These effects have emerged in response to state incentives for hiring certified teachers, merit pay for teachers, class-size reductions, and the passage of accountability legislation (see Clotfelter et al. 2004; Goldhaber et al. 2007; Guarino et al. 2011; Jepsen and Rivkin 2009).

5. Salary inducements have not been effective at eliminating teacher sorting across schools, in part because of patterns of racial segregation (see Clotfelter et al. 2011; Feng 2014; Goldhaber et al. 2010). Nonetheless, there may be some scope for future change, and more results will be needed to examine the range of responses to alternative interventions (see Clotfelter et al. 2008; Fulbeck 2014; Fulbeck and Richards 2015).

6. Sorting may erode the capacity of resource differences across schools to mitigate the learning differences produced by family background (see Bastian et al. 2013; Ladd 2008; Rubenstein et al. 2007; but see also Player 2009).

7. Within schools, sorting is also present, following the same pattern of between-school sorting (see Clotfelter et al. 2005, 2006; Feng 2010; Kalogrides et al. 2013). This finding cannot be surprising to sociologists who know
the literature on the assignment of teachers to curriculum tracks.

8. Recent policy interventions have also generated additional sorting within schools, as school leaders have redistributed teachers to satisfy new challenges. For example, Fuller and Ladd (2013) show that in North Carolina, accountability legislation caused schools to move less credentialed teachers down to untested grades (kindergarten through second grade) and more credentialed teachers up to tested grades (third through fifth grade). For a study in ten Kentucky school districts, Barrett and Toma (2013) show that principals increased the class sizes of teachers they deemed effective based on their own assessments.

9. The most recent literature on sorting has been informed by value-added models of teacher effectiveness. VAMs attempt to identify effective teachers by average gains in their pupils’ test scores, not measures of teachers’ own characteristics or practices.6 As of the time of this writing, the implications of the VAM work for teacher sorting results are unclear.

6 For clear, simple, accurate, and balanced summaries of value-added modeling, see Corcoran and Goldhaber (2013) and Corcoran (2016). To understand the required assumptions with more depth, see Reardon and Raudenbush (2009). For studies that have defended and deployed VAMs, see Chetty et al. (2014a, b). For arguments against the use of VAMs, see Rothstein (2009, 2010) and Guarino, Reckase, and Wooldridge (2015). For work that compares the results of VAMs to various other types of teacher evaluation systems, see Grissom and Youngs (2016).

Some studies suggest that teachers with high value-added scores are more likely to remain in their schools (Boyd et al. 2011), although the pattern is stronger in schools with more advantaged students (Goldhaber et al. 2011).7 Other studies argue that the latter effects dominate (Steele et al. 2015), with effective teachers more likely to flee schools with larger proportions of students who identify as Black (Jackson 2009). Not inconsistent with this pattern, teachers’ value-added scores tend to increase after teachers enter new schools (Jackson 2013). Chingos and West (2011) suggest that, in Florida, VAMs indicate that effective teachers are more likely to be promoted to become principals while less effective teachers are more likely to be reassigned to low-stakes positions, consistent with research that does not utilize VAMs to measure effectiveness (see Fuller and Ladd 2013).

7 Jacob and Lefgren (2007) show that parents disproportionately prefer effective teachers in high poverty schools, perhaps because such teachers are comparatively rare.

Finally, some of the work on teacher sorting that is informed by VAMs has begun to wrestle with school context effects. Koedel (2009) argues that teachers have spillover effects on achievement in subjects that they do not teach while Jackson and Bruegmann (2009) find evidence of spillover effects through peer learning. Loeb et al. (2012) show that effective schools are able to hire the most effective teachers, as measured by VAMs, while Ferguson and Hirsch (2014) make the case that effective teachers are generated by effective schools.

Overall, the literature on teacher sorting, which now encompasses an older literature on teacher attrition and teacher mobility, raises important questions for sociological research on school and teacher effects. Have we deemphasized the older sociological perspective that conceived of teachers as valuable “inputs” with autonomous capacities to generate learning? Although sociologists have not wavered in their position that “teachers matter,” it may be the case that we have been too quick to assume that teachers are broadly similar in their potential, conditional on training, and that variation in any apparent teacher effects is almost entirely attributable to variation in their school environments. Not unrelated to this question, is it possible that schools that appear to be effective because of their administrative structures are instead only effective because they have been better able to attract teachers who are effective because of their own capacities? To begin to address questions
522 S. L. Morgan and D. T. Shackelford
such as these, we need to develop a deeper appreciation for the empirics of teacher sorting, and, in the next section, we advance this goal.

23.4 An Example of What a Typical Data Source Reveals About the Distribution of Teachers

Many of the most persuasive studies of teacher assignment and teacher sorting are based on district-level and state-level analyses of administrative data, usually only from states with the most sophisticated data systems that have welcomed academic research. It has been assumed by many researchers that what has been learned in these states is applicable to the nation as a whole, but surely this inference will be evaluated in the future. Furthermore, because of the focus on student testing in grades three through eight, in response to NCLB, most studies of teacher sorting consider only elementary schools; those studies that do consider middle school grades have less clear results.

Sociologists of education most commonly study secondary schools, in part because of their longstanding interest in proximate institutions that shape entry into the adult stratification order. Existing teacher effects research in sociology is therefore dominated by studies of high schools. Because of the mismatch with the teacher sorting literature, it is useful to consider what can be learned about teacher sorting from an analysis using the type of data most commonly analyzed by sociologists of education: a national sample of students nested within high schools, collected by the U.S. Department of Education, following on the template first established by Coleman and his colleagues for EEO.

In this chapter, we offer an analysis of the most recent nationally representative survey, which is the High School Longitudinal Study of 2009 (HSLS). Still ongoing, the HSLS is a sample of first-year public and private high school students in 2009, which includes linked survey instruments for students, parents, math and science teachers, counselors, and school administrators. In this chapter, we consider the distribution of math and science teachers across students in public high schools in 2009, merging to the HSLS data both funding and school characteristics from the 2009 through 2013 Common Core of Data.8

For the HSLS, each sampled first-year high school student is linked, through both an administrative list and a student response, to the teacher of the relevant math and/or science class in which the student was enrolled in fall 2009. These teachers are then asked to complete a questionnaire that assesses their class structure, their attitudes toward their school and its students, and their own qualifications.

The 753 public high schools sampled for the HSLS have student samples that range from 7 to 49 students, with a mode of 24 students. These students are matched to both math and science teachers, so that we have a total of 12,832 students matched to 3172 math teachers and 11,676 students matched to 2362 science teachers.9 When weighted appropriately, the responses of teachers can be used to estimate the distributional characteristics of the teacher–student match across first-year high school students in 2009 for two linked populations: all students enrolled in math classes in public schools and all students enrolled in science classes in public schools.10

8 Our analysis is related to, but distinct from, the most common prior analyses of national distributions of teachers. These prior studies, which have been discussed above, have frequently used the Schools and Staffing Surveys (SASS). Analysis of the SASS surveys allows for the modeling of teacher distributions across schools, but not directly of teacher distributions across students, since only school aggregate measures of student characteristics are available, and typically without detailed measures of the family backgrounds of students.

9 On average, we have 4.4 sampled students for each math teacher and 5.4 sampled students for each science teacher, with medians of 3 and 4 students, respectively. At the school level, the median number of math teachers is 4 across the 720 schools with sampled math teachers while the median number of science teachers is 3 across the 699 schools with sampled science teachers.

10 We exclude private schools from this analysis, mostly because the teacher sorting literature is very much focused on public schools. Of course, teachers do sort into private schools as well, and private schools have served as a valuable point of comparison in the effective schools research in sociology. A more comprehensive analysis should consider sorting by sector and type of school as well.
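
The difference between describing teachers "across students" and describing them across teachers can be illustrated with a minimal Python sketch. It assumes a hypothetical table of student-teacher links, each carrying a student sampling weight and one teacher attribute. The records, weights, and field names below are invented for illustration and are not HSLS values or HSLS variable names.

```python
from collections import namedtuple

# Hypothetical student-teacher link records (illustrative values only).
Link = namedtuple("Link", ["student_id", "teacher_id", "weight", "has_grad_degree"])

links = [
    Link("s1", "t1", 120.0, True),
    Link("s2", "t1", 80.0, True),
    Link("s3", "t2", 200.0, False),
    Link("s4", "t3", 100.0, False),
]


def student_weighted_share(records):
    """Weighted share of students whose linked teacher has a graduate degree."""
    total_weight = sum(r.weight for r in records)
    weight_with_degree = sum(r.weight for r in records if r.has_grad_degree)
    return weight_with_degree / total_weight


def teacher_level_share(records):
    """Unweighted share of distinct teachers who have a graduate degree."""
    by_teacher = {r.teacher_id: r.has_grad_degree for r in records}
    return sum(by_teacher.values()) / len(by_teacher)


print(f"{student_weighted_share(links):.3f}")  # 0.400
print(f"{teacher_level_share(links):.3f}")     # 0.333
```

In this toy example the two shares differ because the teacher with a graduate degree is linked to two of the four students. The student-weighted quantity is the kind of estimate that a students-as-units design targets, under the assumption that the weights represent the student population of interest.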
23 School and Teacher Effects 523
We first offer results on school-level climate, where teachers are the informants on the problems that their schools face, as well as teacher satisfaction with the level of support that is provided to meet their challenges. We consider the relationships that teacher-perceived climate and support have with the characteristics of the student populations of HSLS schools, measured by students’ socioeconomic status and performance on a standardized math test. We also consider the relationships that teacher-perceived climate and support have with per-pupil instructional expenditures, measured at the district level. This first portion of the analysis demonstrates that teachers who work in schools with disadvantaged student populations report that the learning climate is more challenging, because of the attitudes and behaviors of students and their parents, as well as available administrative and district support. We then turn toward an analysis of the distribution of teachers, measured by their preparation and experience, and assess the extent to which a pattern of teacher sorting is present among the math and science teachers of ninth graders.

23.4.1 School Climate As Reported by Teachers

Table 23.1 presents 32 partial correlation coefficients, bounded by −1 and 1, between the school-level or student-level variable listed in the first row of each panel and each of the teacher-level variables listed in the row labels of each of the
Table 23.1 Partial correlation coefficients for students’ socioeconomic status and algebra test scores in the ninth grade with teachers’ reports of resource problems and climate problems

                                                  Math teacher            Science teacher
                                                  Partial     Standard    Partial     Standard
                                                  correlation error       correlation error
School mean of SES with
  Resources and facilities are a problem          −0.138       0.035      −0.052       0.032
  Administrative support is a problem             −0.098       0.029       0.056       0.034
  Student attitudes and behavior are a problem    −0.385       0.028      −0.328       0.031
  Lack of parent support is a problem             −0.403       0.027      −0.384       0.030
Within-school SES with
  Resources and facilities are a problem          −0.003       0.013       0.003       0.017
  Administrative support is a problem              0.007       0.014      −0.002       0.015
  Student attitudes and behavior are a problem    −0.020       0.013      −0.015       0.013
  Lack of parent support is a problem             −0.030       0.013      −0.033       0.014
School mean of algebra test score with
  Resources and facilities are a problem          −0.163       0.032      −0.091       0.035
  Administrative support is a problem             −0.101       0.030       0.014       0.038
  Student attitudes and behavior are a problem    −0.380       0.028      −0.352       0.030
  Lack of parent support is a problem             −0.360       0.029      −0.382       0.032
Within-school algebra test score with
  Resources and facilities are a problem          −0.010       0.015      −0.001       0.023
  Administrative support is a problem              0.024       0.017       0.032       0.023
  Student attitudes and behavior are a problem    −0.040       0.015      −0.015       0.018
  Lack of parent support is a problem             −0.034       0.016      −0.015       0.016

Notes: The partial correlation coefficients are adjusted for school type (whether the high school is a charter or magnet school), and the data are weighted to the populations of ninth graders enrolled in math and science classes, respectively. The standard errors are heteroskedasticity-consistent and are adjusted for the clustering of students within teachers
Source: High School Longitudinal Study of 2009 (HSLS:09)
subsequent four rows. We offer partial correlation coefficients separately for the reports of math and science teachers, yielding 16 each.

These partial correlation coefficients are estimated by appropriately scaling coefficients from underlying regression models with students as the unit of analysis, and where we adjust the standard errors for the clustering of students within schools and teachers. In addition to specifying each model with one of the two focal variables as the outcome variable and one as a predictor variable (which is arbitrary, given the subsequent scaling of the underlying regression coefficients as partial correlation coefficients), the regression models also include indicator variables for magnet schools and charter schools, with regular public schools as the reference category. The number of magnet and charter schools is too small to permit evaluations of differential associations, and so the indicator variables simply adjust the partial correlation coefficients.

The first panel offers partial correlation coefficients for school mean SES with each of four scales of teacher attitudes about problems at their school. School mean SES is calculated as the mean of the sampled students’ SES values; each student’s value is a standardized composite of the available information on the “big five” variables: mother’s and father’s education, mother’s and father’s occupational prestige, and total family income (and where “mother” and “father” are nominal labels in many cases for those who are listed as parents and guardians). The four scales of problems for each teacher are based on agree/disagree responses for multiple underlying questions, which we group together to form the following scales:

Resources and facilities are a problem

• Lack of teacher resources and materials is a problem at this school
• Teaching is limited by shortage of computer hardware/software
• Teaching is limited by shortage of support for using computers
• Teaching is limited by shortage of textbooks for student use
• Teaching is limited by shortage of instructional equipment for students
• Teaching is limited by shortage of equipment for demonstrations
• Teaching is limited by inadequate physical facilities
• Teaching is limited by high student-to-teacher ratio

Administrative support is a problem

• Teaching is limited by inadequate professional learning opportunities
• Teaching is limited by inadequate administrative support
• Teaching is limited by lack of planning time
• Teaching is limited by lack of autonomy in instructional decisions

Student attitudes and behavior are a problem

• Student tardiness is a problem at this school
• Student absenteeism is a problem at this school
• Student class cutting is a problem at this school
• Students dropping out is a problem at this school
• Student apathy is a problem at this school
• Students coming unprepared to learn is a problem at this school
• Teaching is limited by uninterested students
• Teaching is limited by low morale among students
• Teaching is limited by disruptive students

Lack of parental support is a problem

• Lack of parental involvement is a problem at this school
• Teaching is limited by lack of parent/family support

All scales are factor scored and have acceptable measurement properties (e.g., Cronbach’s alpha estimates of reliability between 0.70 and 0.88).

For the first panel of Table 23.1, all partial correlation coefficients are in the expected directions, with slightly stronger relationships for math teachers. Schools with more advantaged student populations (i.e., higher values for school mean of SES) have fewer problems according to the teacher reports, with the associations stronger
for student and parent attitudes, behavior, and support than for resources, facilities, and administrative support.

How strong are these associations? Like all product-moment correlations, partial correlations are bounded by −1 and 1. Values for the strongest associations in Table 23.1 have partial correlation coefficients such as −0.4, which we interpret as moderately strong, given attenuation from measurement error for each pair of variables. Most of the other associations are much smaller in magnitude, typically near −0.1. One might regard these coefficients as too small to be interpreted, but we feel that they are meaningfully negative, usually more than twice the size of their standard errors, and would be larger in magnitude (probably between 25% and 50% larger) in the absence of random measurement error.

By our interpretive standards, values that are smaller in magnitude than their standard errors are the only estimated partial correlation coefficients that we think can be reasonably attributed to sampling error alone. Some partial correlation coefficients of this type are present in the second panel. These partial correlation coefficients are for within-school SES measures of each student with the resource and administrative support scales analyzed for the first panel of Table 23.1. In these cases, individual values for SES are deviated from the school-specific mean, and then all schools are pooled for the analysis. Students with high values for within-school SES are those who are well above their school’s mean. Given that the teacher attitudes that compose these two scales reference their entire school, we would not expect these partial correlation coefficients to deviate from zero, except as a result of sampling error. That is precisely what we see.11

11 These within-school scales of SES also have more measurement error, and so the correlation coefficients are further attenuated. Notice also that we do have meaningful but very small negative partial correlation coefficients for within-school SES with the student and parent attitude, behavior, and support scales. These coefficients suggest that there is a very slight tendency for teachers who are assigned to lower-SES students within their schools to report more challenges created by the attitudes and behavior of students and parents.

The third and fourth panels of Table 23.1 substitute the available HSLS test score for SES, which in this case is a test of algebra knowledge and skill. The values for these two panels are remarkably similar to the first two panels based on SES. The reason is straightforward: SES is strongly associated with the test score, both at the school level and for within-school variation.

Table 23.2 presents an analogous 32 partial correlation coefficients, using the same scales of problems reported by teachers, but using four district-level measures of expenditures. The first panel presents per-pupil instructional expenditures, and the third panel presents per-pupil instructional salary expenditures only. Both measures are drawn from the Common Core of Data, and averaged across the 4 years during which each student was (or would have been) enrolled in their school. The second and fourth panels are cost-adjusted versions of these two expenditure measures, using the same area-cost-adjustment procedure detailed in Morgan and Jung (2016).

Whether cost-adjusted or not, schools with higher levels of expenditures have slightly lower levels of teacher-reported problems. The partial correlation coefficients are close to −0.1 in most cases. But, in relative comparisons to the results from Table 23.1, an interesting difference is present. When considering teacher reports of student and parent attitudes, behavior, and support, the implied associations are substantially weaker than for the school mean of SES and the school mean of test scores. It is unknown whether the relative weakness of these relationships is genuine, or is instead attributable to the necessity of using district-average expenditure measures, rather than school-specific measures. Our interpretation is that the relative weakness of the relationships is genuine, since this is what one would expect based on extant research that demonstrates the weak predictive power of expenditure measures of all types (i.e., from EEO to more recent efforts, such as Morgan and Jung 2016). For the other two problem scales, which focus explicitly on resources, facilities, and administrator supports, the associations with resources are comparable to those with the school means of SES and test scores. This is also quite
Table 23.2 Partial correlation coefficients for district-level per pupil expenditures with teachers’ reports of resource
problems and climate problems
Math teacher Science teacher
Partial Standard Partial Standard
correlation error
correlation error
All instructional expenditures (per pupil) with −0.104 0.030 −0.035 0.055
Resources and facilities are a problem 0.034 0.049 0.049
Administrative support is a problem −0.036 0.035 0.042
Student attitudes and behavior are a problem −0.077 0.035 −0.088 0.036
Lack of parent support is a problem −0.081
All instructional expenditures (per pupil and −0.078
cost-adjusted) with
Resources and facilities are a problem −0.097 0.029 −0.056 0.055
Administrative support is a problem −0.050 0.034 0.010 0.049
Student attitudes and behavior are a problem −0.075 0.035 0.042
Lack of parent support is a problem −0.060 0.035 −0.092 0.040
Instructional salary expenditures (per pupil) with −0.055
Resources and facilities are a problem −0.125 0.031 0.049
Administrative support is a problem −0.054 0.032 −0.058 0.047
Student attitudes and behavior are a problem −0.106 0.033 0.044 0.039
Lack of parent support is a problem −0.111 0.033 0.035
Instructional salary expenditures (per pupil and −0.108
cost-adjusted) with −0.101
Resources and facilities are a problem
Administrative support is a problem −0.121 0.028 −0.082 0.050
Student attitudes and behavior are a problem 0.031 0.003 0.046
Lack of parent support is a problem −0.071 0.034 0.040
−0.104 0.033 −0.111 0.037
Notes: See Table 23.1 −0.070
Source: See Table 23.1 −0.093
sensible, even if the sizes of the relationships between actual resource expenditures and problems attributable to resources and facilities may be smaller than some readers would expect.

23.4.2 Teacher Sorting Across and Within Schools

The results provided in Tables 23.1 and 23.2 demonstrate that the HSLS generates reasonable results about how teacher reports of the problems faced by their schools are related to measures of expenditures, test scores, and the SES of students. The results suggest that teachers who work with disadvantaged student populations report that the learning climate is more challenging. For some teachers, the challenges may be rewarding, while for others the same challenges may represent a reason to seek employment in schools with simpler climates.

To assess teacher sorting directly, we now consider teacher characteristics, presenting 24 partial correlation coefficients in each of Tables 23.3 and 23.4, analogous to those already reported in Tables 23.1 and 23.2. Rather than use four scales of teacher-reported problems at their schools, each panel includes three measures of teacher training (whether they have graduate degrees, are certified, and are certified in math or science, respectively) as well as three measures of teacher experience (years since bachelor’s degree, years teaching at the current school, and years teaching math or science, respectively, at the high school level).

For Table 23.3, the partial correlation coefficients for the associations with school mean of SES and school mean of test scores are small but
Table 23.3 Partial correlation coefficients for students’ socioeconomic status and algebra test scores in the ninth grade with teachers’ training and experience

                                                  Math teacher            Science teacher
                                                  Partial     Standard    Partial     Standard
                                                  correlation error       correlation error
School mean of SES with
  Teacher has a graduate degree                    0.095       0.030       0.096       0.031
  Teacher is certified                             0.070       0.035       0.109       0.033
  Teacher is certified in math/science             0.076       0.034       0.105       0.032
  Years since bachelor’s degree                    0.003       0.028       0.048       0.033
  Years at current school                          0.079       0.029       0.077       0.032
  Years teaching math/science in high school       0.051       0.027       0.114       0.030
Within-school SES with
  Teacher has a graduate degree                    0.015       0.012       0.025       0.014
  Teacher is certified                             0.036       0.014       0.015       0.012
  Teacher is certified in math/science             0.042       0.014       0.020       0.013
  Years since bachelor’s degree                    0.039       0.012       0.032       0.014
  Years at current school                          0.032       0.013       0.027       0.013
  Years teaching math/science in high school       0.038       0.013       0.030       0.013
School mean of algebra test score with
  Teacher has a graduate degree                    0.083       0.029       0.099       0.033
  Teacher is certified                             0.058       0.037       0.138       0.034
  Teacher is certified in math/science             0.065       0.036       0.132       0.034
  Years since bachelor’s degree                    0.003       0.028       0.066       0.037
  Years at current school                          0.084       0.027       0.073       0.034
  Years teaching math/science in high school       0.047       0.025       0.099       0.033
Within-school algebra test score with
  Teacher has a graduate degree                    0.051       0.015       0.040       0.015
  Teacher is certified                             0.060       0.016       0.027       0.022
  Teacher is certified in math/science             0.064       0.016       0.036       0.022
  Years since bachelor’s degree                    0.068       0.015       0.023       0.016
  Years at current school                          0.076       0.016       0.038       0.014
  Years teaching math/science in high school       0.085       0.016       0.032       0.015

Notes: See Table 23.1
Source: See Table 23.1
meaningful, and perhaps slightly larger for SES than for test scores. Teachers in schools with high SES and high test scores are slightly more likely to have graduate degrees, be certified, and have more years of teaching experience. In addition, the partial correlation coefficients for within-school SES and within-school test scores are weak but meaningful because they are generally in the expected direction. Students who have comparatively high SES and high test scores in their schools are very slightly more likely to have teachers with stronger training and more experience, with the effect perhaps larger for science teachers than for math teachers. This pattern is consistent with the literature on teacher assignments and curriculum tracking, although perhaps weaker in magnitude than that literature would lead one to expect.

Table 23.4 presents evidence that schools situated in districts with higher levels of expenditures are also more likely to have teachers with stronger training, and, to a lesser extent, prior experience. The strongest partial correlation coefficients are for graduate degrees among teachers, which
Table 23.4 Partial correlation coefficients for district-level per pupil expenditures with teachers’ training and experience

                                                  Math teacher            Science teacher
                                                  Partial     Standard    Partial     Standard
                                                  correlation error       correlation error
All instructional expenditures (per pupil) with
  Teacher has a graduate degree                    0.151       0.026       0.161       0.037
  Teacher is certified                             0.056       0.032       0.013       0.043
  Teacher is certified in math/science             0.061       0.032       0.018       0.042
  Years since bachelor’s degree                    0.020       0.027       0.038       0.041
  Years at current school                          0.061       0.035       0.101       0.039
  Years teaching math/science in high school       0.032       0.028      −0.002       0.031
All instructional expenditures (per pupil and cost-adjusted) with
  Teacher has a graduate degree                    0.115       0.028       0.127       0.040
  Teacher is certified                             0.065       0.032       0.018       0.047
  Teacher is certified in math/science             0.069       0.032       0.021       0.047
  Years since bachelor’s degree                    0.016       0.028       0.028       0.044
  Years at current school                          0.077       0.036       0.139       0.041
  Years teaching math/science in high school       0.012       0.031       0.055       0.033
Instructional salary expenditures (per pupil) with
  Teacher has a graduate degree                    0.144       0.025       0.160       0.033
  Teacher is certified                             0.059       0.033       0.030       0.039
  Teacher is certified in math/science             0.066       0.033       0.034       0.039
  Years since bachelor’s degree                    0.006       0.027       0.026       0.036
  Years at current school                          0.058       0.034       0.083       0.038
  Years teaching math/science in high school       0.005       0.028       0.020       0.031
Instructional salary expenditures (per pupil and cost-adjusted) with
  Teacher has a graduate degree                    0.110       0.027       0.125       0.036
  Teacher is certified                             0.071       0.032       0.036       0.045
  Teacher is certified in math/science             0.076       0.031       0.039       0.044
  Years since bachelor’s degree                    0.004       0.028       0.015       0.040
  Years at current school                          0.074       0.035       0.125       0.040
  Years teaching math/science in high school       0.020       0.030       0.048       0.032

Notes: See Table 23.1
Source: See Table 23.1
may reflect a type of sorting where teachers with graduate degrees choose to work in, or are hired by, school districts with higher expenditures. We do not have data on individual teacher salaries, but it seems reasonable that the higher instructional expenditures in these school districts reflect higher salary offers to those hired with graduate degrees, or raises awarded to those who acquire graduate degrees during their employment.

Altogether, what have Tables 23.3 and 23.4 shown? On the one hand, the associations are all perhaps weaker than one would expect for this type of analysis, given the established literature on teacher sorting and the strong claims that have been developed based on administrative data, usually for elementary schools in selected states. On the other hand, most of the associations are in the expected direction, suggesting that at the high school level, in a national sample, teacher sorting of the expected pattern is present. Sorting is not confined to elementary schools, nor only detectable in states with comparatively rich
administrative data that has been made available to academic researchers.12

The implication of these patterns is that schools with the highest performance may well benefit from having the strongest teachers (who themselves benefit from higher levels of resources, more supportive administrative structures, and the opportunity to teach students who present fewer learning challenges and have more supportive home environments). Yet, with partial correlations of this magnitude, it is hard to make the case that we have developed evidence that high school teacher sorting is a powerful source of high school differences.

In this sense, the results can be considered somewhat encouraging for the school effects literature in sociology that has mostly ignored sorting dynamics. The caveat, of course, is that this analysis has only rather limited measures of teacher skill and quality. We cannot eliminate the possibility that a more substantial pattern of teacher sorting exists on the characteristics of teachers not measured by the HSLS instrument. And we cannot establish any connections at all to the most recent teacher sorting literature, which has used VAMs to attempt to identify effective teachers. It is possible that sorting would appear more dramatic if a valid measure of effectiveness were available, rather than simply measures of qualifications and crude measures of experience.

12 In the supplementary appendix, we offer four analogous tables (S1 through S4) for the 10-state saturated sample of schools in the HSLS. For the results reported in these additional tables, we include fixed effects for states in the underlying regression models. The results presented there demonstrate that the average within-state partial correlation coefficients are only slightly smaller in magnitude in nearly all cases of direct comparison to those in Tables 23.1 through 23.4, suggesting that these weak patterns of teacher sorting are characteristic of within-state relationships as well. This result implies, even though it is based on an analysis of only 10 states, that the weakness of the associations is not generated by suppression that is attributable to unspecified state-level differences in the results in Tables 23.1 through 23.4.

23.5 Conclusions

In this chapter, we first reviewed the long tradition of sociological research on teacher effects and school effects, with particular emphasis on the interaction between the two. We then considered the large literature on teacher attrition, mobility, and sorting, which has matured mostly outside of sociology. To assess the relevance of the sorting literature to the sociological literature, we then offered an empirical analysis of recent data on high school students and their math and science teachers. We showed that sorting dynamics are present in a national sample of ninth graders matched to their teachers, but we also concluded that the pattern of sorting is not so large that it presents a fundamental challenge to the sociological literature on school effects that typically ignores the dynamics of teacher sorting.

We conclude, in this section, with some thoughts on how teacher and school effects are likely to evolve, based on our interpretation of the current policy environment. Partly in response to the uncertainty of the value of in-service professional development, as well as the threat of new forms of alternative teacher certification, calls for a more deeply professionalized teaching corps for our public schools are now common. Sociologists will surely study how the teaching profession adapts in the coming decades in response to this new form of teacher mobilization, which seems poised to reshape preservice teacher training and enhance within-classroom autonomy. While it may be comforting to believe that these efforts will protect teachers from future evaluation metrics that are too narrow, this prediction may be too sanguine and is certainly premature. We think it is quite plausible that policymakers, administrative authorities, and parents will remain at least as interested in identifying teacher and school effects with simple output measures that can be used to allocate resources and choose from among competing schools. If so, then a new professionalization movement may not alter the relative distribution of teacher effects, by altering sorting patterns, even if the movement does succeed in
boosting teacher salaries and improving working teachers with enough care, even if we can take
conditions. pride in our greater relative attention to both the
organizational context of schooling and the
Changes in the distribution of teacher effects advantages and disadvantages conferred by dif-
may, however, arise from other sources. As of ferences in home environments. The greatest
this writing, the prospects are uncertain for immediate need, however, is not a shift in empha-
greater harmonization of curricular standards sis on the part of researchers, but rather a new and
across states, and across school districts within substantial commitment from federal and state
states. If the move toward more common stan- data collection agencies to pursue more complete
dards receives a new push from a policy shock or measurement of the features and activities of stu-
leadership change, then the effects of teachers dents, teachers, and schools. Available adminis-
may become easier to discern in studies that ana- trative data, which has effectively opened up
lyze comparable criterion-referenced test scores many important questions of academic interest
across schools. If these same test scores are to be and policy importance, does not adequately mea-
used for the evaluation of teacher performance, sure the home environments that strongly shape
then there is reason to expect a strengthening of student performance in school, and offers little
the dynamics that generate teacher sorting across granular data on the behavior of students.
schools. In this scenario, apparent school effects National data sources, patterned on EEO, are
may emerge, which in fact represent the accen- stronger in their measurement of the features of
tuation of the sorting of effective teachers toward students, their parents, and schools, but they do
schools with students who are easier to teach. not include sufficient information on the peda-
gogy and expertise of teachers or the learning
Consider how any such future sorting dynam- climates within classrooms. Without improve-
ics may interact with the most common school ments in available data, nifty new identification
effect analyzed recently: the effectiveness of strategies from methodologists are unlikely to
charter schooling. A consensus seems to have generate enough insight to enhance our under-
emerged (or nearly so) that the highest-quality standing of the complementarities that character-
charter schools are no worse than the non-charter ize both school and teacher effects.
alternatives in their vicinity, and frequently sub-
stantially better. What has never been effectively References
determined is how commonly any apparent char-
ter school effects are attributable to (1) their abil- Alexander, K. L., Entwisle, D. R., & Thompson, M. S.
ity to attract higher-quality teachers, (2) their (1987). School performance, status relations, and the
ability to motivate teachers of all types to devote structure of sentiment: Bringing the teacher back in.
substantially more effort, or (3) features of char- American Sociological Review, 52(5), 665–682.
ter schools that are separable from the effects of
their teachers, such as disciplinary policy and tar- Alexander, K. L., & Morgan, S. L. (2016). The Coleman
geted curricula. If charter schools increase in report at fifty: Its legacy and implications for future
number, while the velocity of teacher sorting research on equality of opportunity. RSF: The Russell
increases, then estimated charter school effects Sage Foundation Journal of the Social Sciences, 2(5),
may increase, as teachers, not just students, are 1–16.
creamed from traditional public schools.
Allensworth, E., Ponisciak, S., & Mazzeo, C. (2009). The
Altogether, it will be essential to devote schools teachers leave: Teacher mobility in Chicago
greater attention to developing study designs that public schools. Chicago: Consortium on Chicago
can estimate the interactive nature of teacher and School Research at the University of Chicago.
school effects, attuned to the underlying pro-
cesses that determine the job-seeking behavior of Antos, J. R., & Rosen, S. (1975). Discrimination in
teachers. The sociological literature on school the market for public school teachers. Journal
effects has not considered the distribution of of Econometrics, 3(2), 123–150. https://doi.
org/10.1016/0304-4076(75)90042-1.
Bacharach, S., Bamberger, P., & Conley, S. (1990). Professionals and workplace control: Organizational and demographic models of teacher militancy. Industrial and Labor Relations Review, 43(5), 570–586. https://doi.org/10.2307/2523329.
Barnett, B. G. (1984). Subordinate teacher power in school organizations. Sociology of Education, 57(1), 43–55. https://doi.org/10.2307/2112467.
Barrett, N., & Toma, E. F. (2013). Reward or punishment? Class size and teacher quality. Economics of Education Review, 35, 41–52. https://doi.org/10.1016/j.econedurev.2013.03.001.
Bastian, K. C., Henry, G. T., & Thompson, C. L. (2013). Incorporating access to more effective teachers into assessments of educational resource equity. Education Finance and Policy, 8(4), 560–580. https://doi.org/10.1162/EDFP_a_00113.
Becker, H. S. (1952a). The career of the Chicago public school teacher. American Journal of Sociology, 57(5), 470–477.
Becker, H. S. (1952b). Social-class variations in the teacher–pupil relationship. The Journal of Educational Sociology, 25(8), 451–465. https://doi.org/10.2307/2263957.
Becker, H. S. (1953). The teacher in the authority system of the public school. The Journal of Educational Sociology, 27(3), 128–141. https://doi.org/10.2307/2263223.
Bidwell, C. E. (1955). The administrative role and satisfaction in teaching. The Journal of Educational Sociology, 29(1), 41–47. https://doi.org/10.2307/2263350.
Bidwell, C. E., Frank, K. A., & Quiroz, P. A. (1997). Teacher types, workplace controls, and the organization of schools. Sociology of Education, 70(4), 285–307. https://doi.org/10.2307/2673268.
Bidwell, C. E., & Yasumoto, J. Y. (1999). The collegial focus: Teaching fields, collegial relationships, and instructional practice in American high schools. Sociology of Education, 72(4), 234–256. https://doi.org/10.2307/2673155.
Blase, J. J. (1986). Socialization as humanization: One side of becoming a teacher. Sociology of Education, 59(2), 100–113. https://doi.org/10.2307/2112435.
Bogardus, E. S. (1928). Teaching and social distance. The Journal of Educational Sociology, 1(10), 595–598. https://doi.org/10.2307/2961789.
Bogardus, E. S. (1929). Social case analysis and teaching. The Journal of Educational Sociology, 3(1), 3–6. https://doi.org/10.2307/2961155.
Boyd, D., Lankford, H., Loeb, S., Ronfeldt, M., & Wyckoff, J. (2011). The role of teacher quality in retention and hiring: Using applications to transfer to uncover preferences of teachers and schools. Journal of Policy Analysis and Management, 30(1), 88–110. https://doi.org/10.1002/pam.20545.
Boyd, D., Lankford, H., Loeb, S., & Wyckoff, J. (2005). Explaining the short careers of high-achieving teachers in schools with low-performing students. The American Economic Review, 95(2), 166–171.
Bredo, E. (1977). Collaborative relations among elementary school teachers. Sociology of Education, 50(4), 300–309. https://doi.org/10.2307/2112502.
Brookover, W. (1943). The social roles of teachers and pupil achievement. American Sociological Review, 8(4), 389–393.
Bryk, A. S., Lee, V. E., & Holland, P. B. (1993). Catholic schools and the common good. Cambridge, MA: Harvard University Press.
Buck, R. C. (1960). The extent of social participation among public school teachers. The Journal of Educational Sociology, 33(8), 311–319. https://doi.org/10.2307/2264408.
Calarco, J. M. (2011). “I need help!” Social class and children’s help-seeking in elementary school. American Sociological Review, 76(6), 862–882. https://doi.org/10.1177/0003122411427177.
Calarco, J. M. (2014). The inconsistent curriculum. Social Psychology Quarterly, 77(2), 185–209. https://doi.org/10.1177/0190272514521438.
Carlson, R. O. (1961). Variation and myth in the social status of teachers. The Journal of Educational Sociology, 35(3), 104–118. https://doi.org/10.2307/2264812.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014a). Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates. American Economic Review, 104(9), 2593–2632. http://www.aeaweb.org/aer/.
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014b). Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood. American Economic Review, 104(9), 2633–2679. http://www.aeaweb.org/aer/.
Chingos, M. M., & West, M. R. (2011). Promotion and reassignment in public school districts: How do schools respond to differences in teacher effectiveness? Economics of Education Review, 30(3), 419–433. https://doi.org/10.1016/j.econedurev.2010.12.011.
Clotfelter, C., Glennie, E., Ladd, H., & Vigdor, J. (2008). Would higher salaries keep teachers in high-poverty schools? Evidence from a policy intervention in North Carolina. Journal of Public Economics, 92, 1352–1370.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. (2005). Who teaches whom? Race and the distribution of novice teachers. Economics of Education Review, 24(4), 377–392. https://doi.org/10.1016/j.econedurev.2004.06.008.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher–student matching and the assessment of teacher effectiveness. The Journal of Human Resources, 41(4), 778–820.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2007). Teacher credentials and student achievement: Longitudinal analysis with student fixed effects. Economics of Education Review, 26(6), 673–682. https://doi.org/10.1016/j.econedurev.2007.10.002.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2011). Teacher mobility, school segregation, and pay-based policies to level the playing field. Education Finance and Policy, 6(3), 399–438. https://doi.org/10.1162/EDFP_a_00040.