Published by perpustakaanipgkrm, 2021-06-17 01:08:55

Handbook of Learning Disabilities

28

Exploratory and Confirmatory Methods
in Learning Disabilities Research

Robert D. Abbott
Dagmar Amtmann

Jeff Munson

This chapter explores a selection of exploratory and confirmatory methods used in our research on learning disabilities in reading and writing. We emphasize the strengths and limitations of each method within the contexts of competing theoretical explanations for the data and have minimized the number of equations included in the text, instead referring readers to the relevant sources in the statistical literature. Readers interested in single-subject designs, growth curve analysis, or qualitative research methods are directed to other chapters in this handbook as we minimize discussion of these methods.

Data Structure, Data Management Systems, and Missing Data

Researchers need to develop facility with a variety of statistical and data management software applications. Most researchers are familiar with the row (subject) by column (measurement) data structures commonly found in statistical packages such as SPSS. Such “flat file” databases are consistent with many types of research questions. Flat file databases, however, are limiting when data are nested, such as data on multiple children within each of several classrooms or repeated measures on individual children. Maintaining and preparing such data for analysis are more easily handled with “relational” databases such as Access. Such programs easily allow the linking of data across linked hierarchies such as repeated measurements on a child (fall, winter, and spring scores on the Process Assessment of the Learner (PAL) [Berninger, 2001] orthographic coding task), child measurements (gender, IQ), classroom variables (class size, teacher experience), and school characteristics (number of students, public/private, rural/suburban/urban). Using relational databases to manage the data in such hierarchies also reminds the researcher of the large increase in Type I error rates when the data are not treated as nested and the standard errors are incorrectly calculated (e.g., if a single-level regression analysis is being done and each child in the same classroom is assigned the same value on teacher experience) (Raudenbush & Bryk, 2002; Snijders & Bosker, 1999). In addition, in complex studies with multiple data collection sessions and multiple data sources, relational database structures can facilitate scheduling, data collection, and data management. For example, following is a description of a


472 METHODOLOGY

system developed by Munson to manage data within the Autism Program Project.

This system uses Microsoft Access and stores the information in a centralized secure location. Individual staff members then have access, on a password-protected basis, to the relevant sections of the database for their particular responsibilities. Integrating information such as appointment scheduling and data collection status provides up-to-date information to staff members across the project and minimizes the tracking of redundant information. For example, because one family may have 10 or more scheduled appointments, this information is entered into Access a single time and then automatically integrated into confirmation letters to the family, daily schedules for the reception staff, an Internet-based testing calendar, and comprehensive subject-specific data collection reports. This system also allows complex data entry and management tasks to be carried out on several computers simultaneously. This architecture gives staff who are working directly with families immediate access to clinically relevant data while simultaneously accomplishing accurate data entry. As a result, data are always saved in the correct format in the same central location for direct import into SPSS and other analysis packages. This system, shown in Figure 28.1, has been implemented across a local area network of 13 PCs in different locations. Ongoing appointment schedules are exported to an Internet-based calendar organized by both date and staff member. Data files and their documentation created for use across projects are placed on a password-protected web site. Other electronic documents, such as testing protocols, data collection forms, digital photos, and so on, are similarly centralized to provide ready access to all staff members. A web page serves as a master catalogue of these resources and currently has over 100 links to specific documents related to the project (e.g., Word, Excel, Access, PowerPoint, and SPSS files). Information from this shared database and document system is continually archived using the AutoSave software program. Immediately after any file is modified, a duplicate of this file is saved on another computer. The AutoSave program is configured to save the last three versions of every file. Furthermore, all data are backed up once a week on read–write CDs that are stored offsite.

Such a system is flexible, allows easy updating and transformation of data, ensures correct linking of up-to-date data across sources, and produces data files that are readily accessible for statistical analysis. Though many of these activities can be accomplished through the careful use of syntax files in SPSS, we often encounter researchers using multiple versions of their data, with each version containing some nonoverlapping subset of data revisions and

FIGURE 28.1. Example of an integrated relational database system.
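
As a rough illustration of the linked hierarchies discussed in this section (schools, classrooms, children, and repeated measurements), the following sketch uses Python's built-in sqlite3 module in place of Access. It is not the project database: every table name, column name, and value below is invented for illustration. Each level of the hierarchy is stored once in its own table, and a join produces the long-format file that an analysis package would import.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one table per level of the hierarchy.

    All names and values are hypothetical examples.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE school    (school_id INTEGER PRIMARY KEY, setting TEXT);
        CREATE TABLE classroom (class_id INTEGER PRIMARY KEY,
                                school_id INTEGER REFERENCES school(school_id),
                                teacher_experience INTEGER);
        CREATE TABLE child     (child_id INTEGER PRIMARY KEY,
                                class_id INTEGER REFERENCES classroom(class_id),
                                gender TEXT);
        CREATE TABLE score     (child_id INTEGER REFERENCES child(child_id),
                                occasion TEXT, value REAL);
    """)
    con.executemany("INSERT INTO school VALUES (?, ?)", [(1, "urban")])
    con.executemany("INSERT INTO classroom VALUES (?, ?, ?)",
                    [(10, 1, 12), (11, 1, 3)])
    con.executemany("INSERT INTO child VALUES (?, ?, ?)",
                    [(100, 10, "F"), (101, 10, "M"), (102, 11, "F")])
    con.executemany("INSERT INTO score VALUES (?, ?, ?)", [
        (100, "fall", 21.0), (100, "winter", 25.0), (100, "spring", 30.0),
        (101, "fall", 18.0), (101, "spring", 24.0),  # winter score is missing
        (102, "fall", 20.0), (102, "winter", 22.0), (102, "spring", 27.0),
    ])
    return con


def long_format_rows(con):
    """Join all four levels into one row per child per measurement occasion."""
    query = """
        SELECT s.setting, c.class_id, c.teacher_experience,
               k.child_id, k.gender, m.occasion, m.value
        FROM score m
        JOIN child k     ON k.child_id = m.child_id
        JOIN classroom c ON c.class_id = k.class_id
        JOIN school s    ON s.school_id = c.school_id
        ORDER BY k.child_id, m.rowid
    """
    return con.execute(query).fetchall()


con = build_demo_db()
rows = long_format_rows(con)
for row in rows:
    print(row)
```

In the joined output, teacher experience is repeated on every row for children in the same classroom. That repetition is the nesting that a single-level analysis would ignore, and the child with only two rows shows how a missing occasion appears in long format.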


transformations, making it difficult to identify a version with all the relevant correctly calculated variables.

Even with the best of management systems, missing data for a person occasionally occur during data collection, especially in repeated-measures designs or in survey research with large sample sizes. Current statistical approaches view the mechanisms underlying such missing data as either accessible or inaccessible (Graham & Schafer, 1999). Additionally, accessible mechanisms have been classified as either missing completely at random (MCAR) or missing at random (MAR) (Little & Rubin, 1987).

The limitations of traditional approaches to handling missing data are now widely recognized (Schafer, 1997). Listwise deletion, where a person with one or more missing observations is dropped from all analyses, assumes MCAR (the most stringent mechanism) and often dramatically reduces power. Pairwise deletion, where the person with missing observations is dropped only from analyses of variables on which the person is missing, often produces correlation/covariance matrices that are non-Gramian and creates problems for analyses that assume such matrices are the result of minor product moment multiplication. In practical terms, the magnitudes of the correlations may be inconsistent with each other because they are based on different subsets of the participants. Mean imputation, where the missing data point is replaced by the mean on the variable, biases the variances and covariances as estimates of population values.

Contemporary statistical approaches utilize all the available data to estimate the covariance matrix of the measurements (the focus in many analyses) using direct maximum likelihood methods (AMOS 4.0; Arbuckle & Wothke, 1999), multiple imputation (Schafer, 1997, 2001), or empirical Bayes (Raudenbush & Bryk, 2002) approaches. For most data, all of these approaches are superior to listwise, pairwise, and mean imputation methods (Schafer & Graham, 2002). Under certain assumptions, these contemporary approaches lead to the same estimated covariance matrix. For analyses that focus on individual data, such as repeated-measures analyses, growth curve modeling, and survival analyses, estimates of the individual-level missing data values are needed. Schafer (1997) recommends the following steps: (1) multiple imputed data sets are estimated (the number of data sets depends on the amount and pattern of missing data), (2) the same statistical models are run on each data set, and (3) the resulting parameter estimates and standard errors are then averaged, taking both within- and between-imputation variability into account. Schafer provides free programs for doing these estimations in a Windows environment (e.g., NORM) or in SAS (e.g., PAN for panel data; Schafer, 2001). Recent comparisons (Collins, Schafer, & Kam, 2001) of maximum likelihood and multiple imputation approaches suggest including all available auxiliary variables in the imputation procedure because doing so decreases the chance of omitting an important cause of missingness.

These approaches make maximal use of the available data. They do not create data. For example, in repeated-measures designs, the methods base their estimates on all the available data for all persons and do not delete the available data for a person just because the person is missing data on one variable at one time point. Modeling growth using hierarchical linear models allows growth to be modeled even when the data are not time structured (that is, when not all children are measured at the same times) and some children are missing data (Raudenbush & Bryk, 2002).

Exploratory Methods of Analysis

Once developed, a flexible data management system and missing data approach can be used to readily prepare data for graphical and quantitative exploratory analysis. Exploratory analyses can be used for many purposes. In this chapter we describe some of the exploratory methods we have used to (1) examine the consistency of the data with assumptions made by confirmatory inferential statistical methods, (2) explore measurement structures in the context of new theories of a construct or the relationships among constructs, and (3) develop a better understanding of the heterogeneous nature of growth in our growth modeling efforts.

Exploratory methods are often linked to the assumptions made when we are going to use a confirmatory method. For example, many of our instructional studies focus on modeling the growth of children during multisession tutorial interventions (Berninger et al., 1998) and comparing the effects of different interventions on parameters of the growth curve such as the slope. As a first step in the confirmatory modeling of the data, we examine the variances and covariances of repeated measures using HLM (Bryk & Raudenbush, 1992; Raudenbush, Bryk, Cheong, & Congdon, 2000). The aim is to decide whether the compound symmetry assumptions of homogeneous variances and spherical covariances are consistent with the data on the repeated measures, or whether these simplifying assumptions should not be made and heterogeneous variances and autoregressive-structured covariances should be included in the confirmatory model when standard errors are estimated. With respect to repeated-measures designs, examination of assumptions and the choice of subsequent analysis plans have been nicely summarized in recent chapters by Raudenbush (2001a, 2001b). Other examples include the exploration of data relative to such assumptions as normality of distributions of residuals and linearity of relationship.

Selected Graphical Methods of Exploratory Analysis

The uses of graphical exploratory methods have been greatly enhanced by the development of powerful desktop-based graphical interfaces. Starting with programs such as Data Desk and MacSpin, the options for desktop exploratory graphical analysis have been greatly expanded with the development of microprocessors and software such as Splus (MathSoft, 2000; StatSci, 2000), SYSTAT, and recent versions of SPSS. Such approaches as scatterplot brushing and pivot tables have become standard practice in our research for many reasons (e.g., to identify and interpret bivariate extreme scores). These approaches and many others are described in the manuals of the statistical packages as well as various books (e.g., Cook, 1998).

For example, in our work with growth models we have found the graphing capabilities of MLwiN (Goldstein et al., 1998) helpful. Plots of individual growth curve slopes with confidence intervals are available within MLwiN with a few “mouse” clicks. Serpentine graphs of individual growth curve slopes have been helpful as a way to explore the presence of qualitatively different subgroups of children with potentially different growth processes. Examination of the width of confidence intervals helps us identify children with highly variable data and begin to understand the sources of that variability. We have then used this information in subsequent studies to design varying instructional treatments and confirmatory data analyses.

Standard references for graphical representations for exploratory analyses include Tufte (1983, 1990, 1997), Cleveland (1993), and Wainer (1997). We have also found the book by Wilkinson (1999) helpful in thinking about potential graphical representations not yet appearing in the standard statistical packages.

Some Quantitative Methods for Exploratory Analysis

A variety of quantitative methods are available for exploratory analysis (Cook, 1998). Some approaches, such as using various symbols to plot the influence or leverage of points in scatter diagrams, combine graphical and quantitative analysis. Others, such as the examination of unplotted studentized residuals in linear regression or the sorting of Mahalanobis statistics in HLM (Raudenbush & Bryk, 2002), are largely based on quantitative analysis.

Since their development in the early 1900s, exploratory factor analysis (EFA) and exploratory principal components analysis (PCA) have been among the most widely used methods in psychological research. They have been used in the early stages of scale development (to examine the dimensionality of a set of questions thought to represent a unitary underlying dimension) and in explorations of covariance and correlation matrices among measures to examine the consistency of these relationships with the theoretical predictions derived from the theory of the constructs.
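
To make the idea of examining dimensionality concrete, here is a minimal sketch rather than a recommended procedure. It relies on a standard fact about PCA: the variance of the first principal component equals the largest eigenvalue of the correlation matrix, and its share of total variance is that eigenvalue divided by the number of measures. The correlation matrix is hypothetical (four items, every pair correlating .50), and the power-iteration helper is written only to keep the example free of third-party packages; in practice a statistical package would be used.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def largest_eigenvalue(matrix, iterations=200):
    """Approximate the largest eigenvalue of a symmetric matrix by power iteration."""
    vec = [1.0] + [0.0] * (len(matrix) - 1)  # arbitrary starting vector
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = sum(x * x for x in product) ** 0.5
        vec = [x / norm for x in product]
    # Rayleigh quotient of the (unit-length) converged vector
    return sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))


# Hypothetical correlation matrix for four items intended to tap one dimension.
R = [
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]

first_eigenvalue = largest_eigenvalue(R)
share = first_eigenvalue / len(R)  # total variance of standardized items = number of items
print(round(first_eigenvalue, 3), round(share, 3))
```

For this equal-correlation matrix the first eigenvalue is 1 + 3(.50) = 2.5, so the first component carries 62.5% of the total variance. A dominant first component is consistent with a single underlying dimension, although it does not by itself establish one.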

Although widely used, the exploratory uses of EFA and PCA have been criticized in terms of both theory and practice. Fabrigar, Wegener, MacCallum, and Strahan (1999) summarized and critiqued current practice in using EFA and PCA and offered guidelines for their appropriate use. Key decisions in the appropriate use of these methods include choices about (1) study design (e.g., including enough reliable indicators [3–5 is a common recommendation] of each of the theoretically expected dimensions; defining and sampling the appropriate population); (2) decisions to use EFA (scaling based on common variance estimated via communalities; factor scores only estimated) or PCA (scaling based on total variance; component scores linear combinations of the measured variables); (3) if using EFA, decisions about the loss function to be minimized (e.g., principal axis factor analysis vs. maximum likelihood factor analysis); (4) decisions about the number of factors or components to retain for subsequent examination; and (5) decisions about how to rotate the initial EFA factors or PCA components to more theoretically interpretable positions (e.g., orthogonal versus oblique rotations). These issues are explored more fully in books on factor analysis and latent variable modeling (Comrey, 1992; Loehlin, 1998).

We agree with Fabrigar, Wegener, MacCallum, and Strahan (1999), and we have found these methods useful at the early stages of our conceptualization and research. For example, in our current work on the use of morphological knowledge by poor and good readers, we have used exploratory PCA (Berninger, Abbott, Billingsley, & Nagy, 2001). In doing so, we first developed multiple measures of each of our theoretical dimensions of morphological knowledge, we administered these multiple measures to a diverse group of readers, and we then explored their measurement structure using PCA with varimax orthogonal rotation. We used PCA and varimax rotation because scores on each measure had a reasonable degree of reliability and shared common variance with scores on the other measures. We were interested in the independent contribution of each dimension in explaining the covariation of individual differences on the measures, so we used orthogonal rotation to retain the independence of the dimensions. Results of this analysis helped us to conceptualize the dimensions and develop new and more factorially pure measures of each dimension. We also explored changes in interpretation if we used EFA or varied the number of dimensions retained for interpretation. Consistent with the findings of others for data with multiple reliable indicators of each theoretical construct, we found that after rotation, interpretations were highly consistent whether the initial dimensions had been derived using PCA or EFA.

This discussion has emphasized the constant interplay of theory, exploratory methods, and subsequent use of confirmatory methods on new data. In our view, the use of exploratory methods should be guided by theory as much as possible, replicated and extended to a new data set to strengthen their use, and performed with tight control of Type I error rates so that overgeneralization is minimized. As we use exploratory methods, we constantly keep in mind implications for (1) the construct validity of our measures and manipulations, (2) the influences on the internal validity of the conclusions we draw from our study design, (3) the ways in which our statistical conclusion validity will be influenced by the assumptions we will be making in our confirmatory analyses, and (4) the effects of characteristics of our research context and participants on the external validity of our generalizations about contexts and participants.

Confirmatory Methods of Analysis

A variety of confirmatory methods are available to compare the consistency of data with models derived from competing theories. Methods for normally distributed measured dependent variables range from analyses based on standard univariate linear model analyses using analysis of variance (ANOVA) and regression to multivariate linear model analyses using such methods as multivariate analysis of variance (MANOVA) and discriminant functions (Lunneborg & Abbott, 1983). Methods for binary and categorical dependent variables range from survival methods to exact tests based on permutation and randomization. Methods for latent variable linear models


(Bentler, 1980; Bollen, 2002) range from confirmatory factor analysis (CFA) to structural equation modeling (SEM) of latent variables. In this chapter we can only briefly discuss a few confirmatory methods that we have found useful in our research. Because many of our research questions focus on the covariation of individual differences and growth over time in reading and writing processes, we have often used CFA and SEM with multiple indicators to provide a more complete mapping of the theoretical construct and a modeling of measurement error. We therefore focus our discussion on these methods. Again, we have provided readers with citations to overviews and statistical articles that elaborate our discussion.

Typically, the researcher allows the research questions and distributional properties of the data to guide the choice of a CFA method. Often, however, there is not a one-to-one correspondence between a research question and a confirmatory statistical method. For example, in a designed experiment including multiple dependent variables with normally distributed residuals, measured independent variables, and factorially designed randomly assigned treatments, the investigator has a wide choice of approaches ranging from methods used to test competing hypotheses about means (MANCOVA; structural equation modeling of mean structures) to methods used to test competing hypotheses about covariation (canonical correlation; structural equation modeling of covariance structures). While the measured variable linear model-based approaches are standard in research, using multiple indicator SEM methods provides an approach to modeling the reliability of the outcome, allows the modeling of correlated measurement error (perhaps due to method effects), and provides the opportunity to examine competing models of mediating processes.

Modeling of random and correlated measurement error in the SEM methods can provide more accurate estimates of the effects of the experimental treatments (Bollen, 1989). Rigdon (1994) illustrates how failing to model the measurement error in the predictors and criteria of regression models can change the researcher’s conclusions about the results. When there are multiple dependent variables in an experiment, the power of the statistical test can be dramatically affected by choice of analysis (Cole, Maxwell, Arvey, & Salas, 1993). If the multiple dependent variables are indicators of a latent variable, then SEM of mean structures provides a more powerful test of the hypothesis of mean differences due to treatment than does MANOVA. If the multiple dependent variables are not indicators of a latent variable, then MANOVA can have greater power. However, the power of the MANOVA test of treatment effects depends on the pattern of correlations among the dependent variables and the direction of the experimental effects (Cole et al., 1993). For example, if the experimental effects are concentrated in a single dimension, then significance tests based on the greatest characteristic root test (e.g., Roy’s gcr test) may be more powerful than those based on all roots (e.g., Wilks’ lambda).

Another use of an SEM framework for thinking about designed experiments is that the researcher can include potential mediators in the SEM model to directly model threats to internal validity that can threaten conclusions drawn from randomized experiments, such as resentful demoralization (when control subjects become aware they are not receiving the benefits of the treatment and consequently perform at a lower level than they would have) or poor fidelity of implementation of the treatment (Cook & Campbell, 1979; Shadish, Cook, & Campbell, 2002). Measuring fidelity of implementation and taking variations in fidelity into account in the statistical analysis is especially important in studies done in schools, where contexts and classroom dynamics may vary widely, leading to varying implementation of the instructional strategies.

As a context for exploring some of these confirmatory methods, consider a series of our research questions concerning the structural relationships among orthographic knowledge, phonological knowledge, and reading acquisition (Abbott & Berninger, 1995). These questions have included the following: Are covariations among measures of orthographic coding at the letter level, letter cluster level, and word level more consistent with a theoretical model hypothesizing that all these measures reflect a single dimension or a theory that hypothesizes that multiple dimensions are needed to model the covariances among these measures (CFAs of measures for varying developmental levels and genders)? Do orthographic measures account for unique variance in reading and writing acquisition beyond that accounted for by phonological measures and print experience (structural equation modeling of latent variables)? Is orthographic processing important only in word recognition in isolation or also when reading in context? Many of these questions relate to the construct validity of measures of orthographic knowledge. Others relate to the structural relationships of orthographic knowledge, phonological knowledge, and reading and writing (Abbott & Berninger, 1993).

Central to each of these questions is construct validity. In their classic paper, Cronbach and Meehl (1955) suggested that the evaluation of the construct validity of measures should consider both convergent evidence (Are the measures correlated with other measures they theoretically should be correlated with?) and discriminant evidence (Are the measures little correlated with other measures from which they are theoretically distinct?). A well-designed study investigating the construct validity of measures of orthographic knowledge would therefore include measures that are theoretically distinguishable as well as measures that are theoretically related. According to this view, if orthographic knowledge and phonological knowledge are the result of the same process, their correlation should be one, restricted only by our ability to reliably measure the constructs. Exploratory analyses can inform this research: Olsen, Forsberg, and Wise (1994) included measures of orthographic knowledge and phonological knowledge and found a correlation of r = .43 between phonological and orthographic factors in their exploratory PCA with oblique rotation. A second aspect of examining whether orthographic knowledge and phonological knowledge are identical is to examine whether, when used as predictors of a common criterion, they make unique contributions to the prediction of the criterion.

Though the results derived from such analyses are informative, analysis approaches based on measured variables can be enhanced by incorporating multiple measures of a construct into the research design. By including multiple measures of each construct, CFA allows the researcher to statistically test the degree to which the covariances among measures are consistent with relationships predicted by varying theories of the phenomenon. For example, the theory that hypothesizes that orthographic knowledge and phonological knowledge are identical processes would specify that the correlation between standardized factors would have been 1.00 in Olsen and colleagues’ (1994) study. CFA of multiple indicators allows us to extend this work by examining this relationship with measurement error modeled and to statistically compare the fit to the data of such a one-factor theory with that of a theory that hypothesizes the two factors are correlated but not identical.

In contrast to exploratory factor analysis, confirmatory factor analysis allows the researcher to statistically test different measurement models underlying the covariations among the indicators. If multiple methods of measuring conceptually similar and distinct constructs are included in a study, CFA can also be used to investigate method effects as well as covariation among the measures due to the underlying construct. In our example, developing measurement models based on multiple indicators will result in factors that are more reliable and provide a more complete operationalization of orthographic and phonological knowledge than can be achieved by a single measurement of each. Subsequent tests of structural relationships among these latent variables or factors will then provide a picture of the nature of relationships among the theoretical constructs less influenced by measurement error and the narrowness of a single measurement of a theoretical construct.

Many resources are available for guiding such latent variable structural modeling efforts (Bentler, 1980; Bentler & Wu, 2001; Bollen, 1989; Byrne, 2001, or earlier versions; Hayduk, 1987). The following discussion outlines some of the steps to consider when using latent variable SEM. First, the researcher clearly specifies the predictions derived from the competing theories of the meaning of the constructs and their structural relationships. To address questions about orthographic knowledge, phonological knowledge, and reading acquisition, for example, the researcher needs to decide on the measurement models for orthographic knowledge, phonological knowledge, and reading acquisition (Berninger, 1994). The competing hypotheses about the structural relationships among the constructs need to be specified; for example, two such theoretical hypotheses would be one that models the relationship between orthographic coding and reading as completely mediated through phonological coding and a second that includes an additional direct path from orthographic coding to reading, hypothesizing that not all of the relationship between orthographic coding and reading is accounted for by the relationship of orthographic and phonological coding.

Second, for an appropriately sized sample (Bentler & Yuan, 1999; Curran, West, & Finch, 1996; MacCallum, Browne, & Sugawara, 1996) the researcher needs to measure multiple reliable indicators of each construct, ideally using multiple methods of measurement. In deciding how many indicators to include in a study, the competing elements of practicality and the need to obtain multiple indicators of a factor must be balanced. Obtaining at least three measures of each construct allows the testing of most structural hypotheses, although four measures are necessary for some questions (e.g., to test whether two measures are parallel rather than just the equivalent). This principle will ensure that the paths in many models are identified, but some models (those with many covariances among factors or covariances among the residuals of the measured variables or nonrecursive relationships) and data structures (those with highly correlated redundant indicators) may have some underidentified paths (see Bollen, 1989, and Brito & Pearl, 2002, for a discussion of identification in structural equation models).

Third, the researcher needs to examine whether the measured indicators are reflecting the latent variables consistent with theory. Indicators should have significant loadings on their hypothesized latent variables, and the intercorrelations among latent variables should be within the range expected by theory. The fit of competing measurement models (e.g., 5 factors vs. 4 factors) can be hierarchically compared statistically and a decision made about the best-fitting measurement model. Most researchers use a combination of goodness-of-fit statistics (differences in χ²; Gonzalez & Griffin, 2001), incremental fit indices, model parsimony adjusted statistics, expected parameter change statistics, root mean square error of approximation (RMSEA), confidence intervals, and Lagrangian Multiplier tests to evaluate the fit of hierarchically related competing measurement models. The length of this (incomplete) list attests to current research interest in the development of fit indices of competing models! Byrne (2001) clearly discusses the strengths and weaknesses of this “smorgasbord.”

Fourth, the researcher compares the fit of competing theories of the structural relationship among the factors. We argue strongly that attention should focus on the relative fit of relevant, competing theoretical models. In most cases, adding post hoc paths to those specified in the theoretical models will improve the absolute fit in a sample but may not cross-validate well and will not necessarily lead to the population structure that generated the data (MacCallum, 1986; MacCallum, Roznowski, & Necowitz, 1992). In our experience, adding post hoc additional paths such as correlated measurement errors often improves the absolute fit of both of the competing models, and thus their relative fit remains the same. The researcher should also consider the possible existence of other theoretically meaningful “equivalent” models that achieve the same degree of overall fit to the matrix of covariances as the model being fitted (MacCallum, Wegener, Uchino, & Fabrigar, 1993). Carefully applying the approaches of MacCallum and colleagues (1993) or the tracing rules (Loehlin, 1998) will help us recognize some of the likely theoretical alternatives. The approaches described in Pearl (2000) and Spirtes, Glymour, and Scheines (1993) can also be used to identify equivalent models. Some of these alternative models can be removed from consideration if longitudinal data have been collected where the time structure of the data makes some theoretical statements about the relationships among variables untenable. Care must be taken that the tested models evaluate the relevant theoretical hypotheses. For example, cross-sectional panel SEM models evaluate autoregressive patterns of growth. However, without explicit modeling of individual growth in these latent variable models (Duncan, Duncan, Strycker, Li, & Alpert, 1999), some types of treatment effects (e.g., fan-spread growth) will not be detectable. Fifth, the generalization of the structural and measurement models should be examined for new (or hold-out) samples

relevant controls and baseline activation, measuring whether or not a brain region is activated beyond a chance level is a commonly used outcome. One approach would be to use χ² to evaluate the difference in the 2 × 2 table of proportions of dyslexic and typically developing readers showing activation in a brain region. Unless the researcher tested a large number of children, however, he or she might quickly encounter problems because of small sample size, the number of
(MacCallum, Roznowski, Mar, & Reith, cells relative to sample size, or low expected
1994) and models should be compared for values in a cell of the table (Cochran, 1954).
different groups where theory suggests that While various rules for using ␹2 have been
the relationships might differ. proposed (Agresti, 1990), they all rely on
using ␹2 and asymptotic theory to estimate
When group membership is categorical a p value. Rather than relying on the asymp-
(e.g., female/male), confirmatory tests that totic theory, it is now possible to calculate
the paths for male and females are equiva- an exact p value based on permutations of
lent can be carried out using multiple group the data (Agresti, 1992). These statistical
SEM (Byrne, 2001). When such hypotheses tests involve permuting the data and deter-
of moderation include continuous variables mining where the obtained data fall within
moderating relationships among constructs, the population of possible permutations of
then SEM needs to directly include product the data. It is easy to construct examples
terms in a similar fashion to multiple regres- where the asymptotic ␹2 test gives a p > .05
sion. Unfortunately, the process is much and the exact p < .05 or vice versa. Fisher il-
more complicated in moderated SEM (in lustrated such cases in his development of
part because the product terms are not mul- the exact test for the 2 × 2 table. Further-
tivariate normal) and specialized approach- more, even in this simple example, assump-
es need to be implemented (Li et al., 2001). tions about the marginals of the table are
important. For a 2 × 2 table, using
Another form of moderation (or interac- Barnard’s test that assumes only one of the
tion) arises when children have differential marginals is fixed results in greater power
growth curves within a treatment. We have for detecting alternative hypotheses than the
found that growth mixture modeling using more widely available but less powerful
SEM (Muthen, 2001a, 2001b) can be help- Fisher’s exact test that assumes that both
ful in differentiating children in our instruc- marginals are fixed.
tional experiments who benefit significantly
and quickly from instruction from children Permutation tests based on enumeration
who gained little from a particular instruc- are available today for a great variety of
tional intervention. types of data, and research questions includ-
ing ordered categorical (Agresti, Mehta, &
New Directions in Confirmatory Methods Patel, 1990) and recent statistical and com-
puting developments in Monte Carlo esti-
In current research, our confirmatory mod- mation have made accurate approximation
eling of data has been enhanced by three of exact tests and estimation of distribution-
statistical developments that might inform al characteristics and confidence intervals
others’ research. First, our research with for a variety of data types computationally
MRI (magnetic resonance imagery) brain feasible (Senchaudhuri, Mehta, & Patel,
scanning using PEPSI protocols (Richards et 1995). Given that much research about chil-
al., 2000) to examine processing differences dren with learning disabilities only includes
of children with phonological and ortho- a small number of children or measure-
graphic reading problems and typically de- ments with nonnormal error distributions,
veloping (age- and IQ-matched) children statistical tests that do not rely on asymp-
has benefited from the use of permutation- totic, large sample theory may be more ap-
based tests of differences in distributions. In propriate and in many cases may be more
this research, after taking into account the

480 METHODOLOGY

powerful when asymptotic assumptions are not consistent with the data. StatXact (2001) and LogXact (1993) are two statistical software packages that provide researchers with modern approaches to calculating these exact p values.

A second statistical innovation that we think will prove useful in the future to researchers in learning disabilities is the set of theorems and statistical approaches derived from graph-theoretic representations of relationships among constructs (Edwards, 1995; Pearl, 2000). These approaches provide support for both exploratory identification and examination of equivalent models (Spirtes et al., 1993; Spirtes, Richardson, Meek, & Scheines, 1998) and approaches to confirmatory testing of competing models (Shipley, 2000a, 2000b). Although these methods focus the researcher on equivalent models that have the same overall fit to the covariance matrix, they also suggest ways to describe the distinguishing theoretical predictions of these models (Pearl, 2000; Shipley, 2000a).

A third set of statistical tools that greatly enhances the planning of CFAs includes software packages that compute power for a variety of statistical methods. The ones we use include packages for linear model analyses (Power and Precision 2.0; Borenstein, Rothstein, & Cohen, 2001), rates and proportions (StatXact and LogXact), hierarchical models (Optimal Design; Raudenbush & Feng, 2003), and latent variable structural equation models (EQS 6.0, Bentler & Wu, 2001; Mplus, Muthen & Muthen, 2002). Use of these tools greatly assists in the planning of research designs with adequate power.

Summary

In this chapter we have provided an overview of the exploratory and confirmatory methods we have found useful in our research programs and described some of the strengths and weaknesses of these methods. Other research programs with different, or even the same, research questions may choose to use other methods. If we have touched on methods new to readers, we hope they will consider how these methods might help them clarify research questions and examine the consistency of data with the models derived from competing theoretical perspectives.

We have also pointed readers to a variety of statistical software applications. Each application has complementary strengths and weaknesses. We believe that our readers will benefit by learning about this diversity and using it in their attempts to answer a variety of exploratory and confirmatory research questions.

Acknowledgments

Robert D. Abbott’s work on this chapter was supported by (1) Statistics Core for the Learning Disciplinary Center: Links to Schools and Biology, Grant No. P50 HD33812-07; (2) Interventions for Component Writing Disabilities, Grant No. HD25858-11; and (3) the Statistics Core of the Autism Program Project: Neurobiology and Genetics of Autism, Grant No. HD 35465. Dagmar Amtmann’s work on this chapter was supported by Grant No. H224A930006 from the U.S. Department of Education, National Institute on Disability and Rehabilitation Research. Jeff Munson’s work was supported by Grant No. HD35465.

References

Abbott, R. D., & Berninger, V. W. (1993). Structural equation modeling of relationships among developmental skills and writing skills in primary and intermediate grade writers. Journal of Educational Psychology, 85, 478–508.

Abbott, R. D., & Berninger, V. W. (1995). Structural equation modeling and hierarchical linear modeling: Tools for studying the construct validity of orthographic process in reading and writing development. In V. W. Berninger (Ed.), The varieties of orthographic knowledge: Relationships to phonology, reading, and writing (pp. 321–354). Dordrecht, The Netherlands: Kluwer Academic.

Agresti, A. (1990). Categorical data analysis. New York: Wiley.

Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7, 131–177.

Agresti, A., Mehta, C. R., & Patel, N. R. (1990). Exact inference for contingency tables with ordered categories. Journal of the American Statistical Association, 85, 453–458.

Arbuckle, J. L., & Wothke, W. (1999). Amos 4.0 user’s guide. Chicago: Smallwaters.

Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31, 419–456.

Bentler, P. M., & Wu, E. (2003). EQS structural equations program manual, version 6.0. Encino, CA: Multivariate Software.

Bentler, P. M., & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181–197.

Berninger, V. W. (1994). Reading and writing acquisition: A developmental neuropsychological perspective. Madison, WI: Brown & Benchmark.

Berninger, V. W. (2001). Process assessment of the learner (PAL) test battery for reading and writing. San Antonio, TX: Psychological Corporation.

Berninger, V. W., Abbott, R. D., Billingsley, F., & Nagy, W. (2001). Processes underlying timing and fluency: Efficiency, automaticity, coordination, and morphological awareness. In M. Wolf (Ed.), Dyslexia, fluency, and the brain (pp. 383–414). Timonium, MD: York Press.

Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.

Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53, 605–634.

Borenstein, M., Rothstein, H., & Cohen, J. (2001). Power and Precision 2.0. Teaneck, NJ: Biostat.

Brito, C., & Pearl, J. (2002). A new identification condition for recursive models with correlated errors. Structural Equation Modeling, 9, 459–474.

Bryk, A., & Raudenbush, S. W. (1992). Hierarchical linear models in social and behavioral research: Applications and data analysis methods. Newbury Park, CA: Sage.

Byrne, B. M. (2001). Structural equation modeling with AMOS: Basic concepts, applications, and programming. Mahwah, NJ: Erlbaum.

Cleveland, W. S. (1993). Visualizing data. Summit, NJ: Hobart Press.

Cochran, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics, 10, 417–454.

Cole, D., Maxwell, S., Arvey, R., & Salas, E. (1993). Multivariate group comparisons of variable systems: MANOVA and structural equation modeling. Psychological Bulletin, 114, 174–184.

Collins, L. M., Schafer, J. L., & Kam, C.-M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330–351.

Comrey, A. L. (1992). A first course in factor analysis. Hillsdale, NJ: Erlbaum.

Cook, R. D. (1998). Regression graphics: Ideas for studying regressions through graphics. New York: Wiley.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis for field settings. Chicago: Rand-McNally.

Cronbach, L. J., & Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.

Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Erlbaum.

Edwards, D. (1995). Introduction to graphical modeling. New York: Springer-Verlag.

Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299.

Goldstein, H., Rasbash, J., Plewis, I., Draper, D., Browne, W., Yang, M., Woodhouse, G., & Healy, M. (1998). A user’s guide to MLwiN. Bath, UK: Multilevel Models Project.

Gonzalez, R., & Griffin, D. (2001). Testing parameters in structural equation modeling: Every “one” matters. Psychological Methods, 6, 258–269.

Graham, J. W., & Schafer, J. L. (1999). On the performance of multiple imputation for multivariate data with small sample size. In R. Hoyle (Ed.), Statistical strategies for small sample research (pp. 1–29). Thousand Oaks, CA: Sage.

Hayduk, L. (1987). Structural equation modeling with LISREL: Essentials and advances. Baltimore: Johns Hopkins University Press.

Li, F., Duncan, T. E., Duncan, S. C., Wallentin, F. Y., Acock, A. C., & Hops, H. (2001). Interaction models in latent growth curves. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 173–202). Mahwah, NJ: Erlbaum.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

Loehlin, J. C. (1998). Latent variable models: An introduction to factor, path, and structural analysis (3rd ed.). Mahwah, NJ: Erlbaum.

LogXact. (1993). Software for exact logistic regression. Cambridge, MA: Cytel Software.

Lunneborg, C., & Abbott, R. D. (1983). Elementary multivariate analysis for the behavioral sciences: Application of basic structure. New York: Elsevier Science.

MacCallum, R. C. (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 100, 107–120.

MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.

MacCallum, R. C., Roznowski, M., Mar, M., & Reith, J. V. (1994). Alternative strategies for cross-validation of covariance structure models. Multivariate Behavioral Research, 29, 1–32.

MacCallum, R. C., Roznowski, M., & Necowitz, L. B. (1992). Model modifications in covariance structure analysis: The problem of capitalization on chance. Psychological Bulletin, 114, 490–504.

MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114, 185–199.

MathSoft. (2000). S-PLUS-2000 user’s guide. Seattle, WA: Author.

Muthén, B. (2001a). Latent variable mixture modeling. In G. A. Marcoulides & R. E. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 1–34). Mahwah, NJ: Erlbaum.

Muthén, B. (2001b). Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class-latent growth modeling. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 289–322). Washington, DC: American Psychological Association.

Muthén, L. K., & Muthén, B. O. (2001). Mplus 2.0 user’s guide. Los Angeles: Muthén & Muthén.

Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9, 599–620.

Olsen, R., Forsberg, H., & Wise, B. (1994). Genes, environment, and the development of orthographic skills. In V. W. Berninger (Ed.), The varieties of orthographic knowledge: Theoretical and developmental issues (pp. 27–71). Dordrecht, The Netherlands: Kluwer Academic.

Pearl, J. (2000). Causality. Cambridge: Cambridge University Press.

Raudenbush, S. W. (2001a). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501–525.

Raudenbush, S. W. (2001b). Toward a coherent framework for comparing trajectories of individual change. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 33–64). Washington, DC: American Psychological Association.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.

Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. (2000). HLM-5: Hierarchical linear and nonlinear modeling. Chicago: Scientific Software International.

Raudenbush, S. W., & Feng, L. X. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological Methods, 6, 387–401.

Richards, T., Corinna, D., Serafini, S., Steury, K., Dager, S., Marro, K., Abbott, R. D., Maravilla, K., & Berninger, V. W. (2000). Effects of a phonologically-driven treatment for dyslexia on lactate levels as measured by proton MR spectroscopic imaging. American Journal of Neuroradiology, 21, 916–922.

Rigdon, E. E. (1994). Demonstrating the effects of unmodeled random measurement error. Structural Equation Modeling, 1, 375–380.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. London: Chapman & Hall.

Schafer, J. L. (2001). Multiple imputation with PAN. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 357–377). Washington, DC: American Psychological Association.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.

Senchaudhuri, P., Mehta, C. R., & Patel, N. R. (1995). Estimating exact p-values by the method of control variables, or Monte Carlo rescue. Journal of the American Statistical Association, 90, 640–648.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. New York: Houghton Mifflin.

Shipley, B. (2000a). Cause and correlation in biology: A user’s guide to path analysis, structural equations and causal inference. Cambridge, UK: Cambridge University Press.

Shipley, B. (2000b). A new inferential test for path models based on directed acyclic graphs. Structural Equation Modeling, 7, 206–218.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.

Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. New York: Springer-Verlag.

Spirtes, P., Richardson, T., Meek, C., & Scheines, R. (1998). Using path diagrams as a structural modeling tool. Sociological Methods and Research, 27, 182–225.

StatSci. (2000). S–PLUS guide to statistical and mathematical analysis, version 2000. Seattle, WA: StatSci.

StatXact. (2001). StatXact–5.0 statistical software. Cambridge, MA: Cytel Software.

Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.

Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press.

Tufte, E. R. (1997). Visual explanations: Images, quantities, evidence and narrative. Cheshire, CT: Graphics Press.

Wainer, H. (1997). Visual revelations: Graphical tales of fate and deception from Napoleon Bonaparte to Ross Perot. New York: Springer-Verlag.

Wilkinson, L. (1999). The grammar of graphics. New York: Springer-Verlag.
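
As an illustration of the contrast this chapter draws between asymptotic χ² p values and exact, permutation-based p values for a 2 × 2 table, the following minimal sketch may be useful. It is not the procedure implemented in StatXact or LogXact, and it does not implement Barnard’s test. The function names (`pearson_chi2_p`, `fisher_exact_p`, `monte_carlo_p`) and the cell counts (6, 2, 2, 6) are hypothetical and chosen only for illustration; they are not data from the studies cited. The sketch assumes a two-sided test, no continuity correction, and both marginals fixed.

```python
import random
from math import comb, erfc, sqrt


def pearson_chi2_p(a, b, c, d):
    """Asymptotic p value: Pearson chi-square on a 2 x 2 table, 1 df, no correction."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df the upper-tail probability of chi-square equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


def fisher_exact_p(a, b, c, d):
    """Exact two-sided p value: enumerate every table with the observed margins
    and sum the probabilities of tables no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def monte_carlo_p(a, b, c, d, reps=20000, seed=0):
    """Monte Carlo approximation: shuffle outcomes across children and count how
    often the top-left cell deviates from its expectation at least as much as observed."""
    rng = random.Random(seed)
    row1, n = a + b, a + b + c + d
    outcomes = [1] * (a + c) + [0] * (b + d)  # 1 = activation observed
    expected = row1 * (a + c) / n
    observed_dev = abs(a - expected)
    hits = 0
    for _ in range(reps):
        rng.shuffle(outcomes)
        if abs(sum(outcomes[:row1]) - expected) >= observed_dev - 1e-12:
            hits += 1
    return (hits + 1) / (reps + 1)


if __name__ == "__main__":
    # Hypothetical counts: rows are groups, columns are activation yes/no.
    table = (6, 2, 2, 6)
    print("asymptotic chi-square p:", round(pearson_chi2_p(*table), 4))
    print("exact p:", round(fisher_exact_p(*table), 4))
    print("Monte Carlo p:", round(monte_carlo_p(*table), 4))
```

With these hypothetical counts the asymptotic p value falls just below .05 (about .046), whereas the exact p value is about .13, which is the kind of disagreement described in the text. Readers who work in Python may prefer library routines such as `scipy.stats.fisher_exact` and `scipy.stats.barnard_exact` to hand-written code.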

29

Designs for
Applied Educational Research

Jean B. Schumaker
Donald D. Deshler

Following is an excerpt from a recent letter sent to a researcher at the University of Kansas Center for Research on Learning (KU-CRL).

You will never know what a difference the Content Enhancement Routines have made in my teaching and in the success that my students are having! Let me give you a little background. I have taught high school science for 18 years. As the makeup of our school district has changed in recent years, my classes have become increasingly diverse. For the most part, I did okay as the composition of my classes changed, but five years ago, when our school adopted inclusion for students with disabilities, things began to fall apart on me. If I tried to do things to meet the needs of the students with LD in my classes, I would lose my highly capable students. When I would use things that worked well in the past, more often than not, they didn’t work at all—especially with the students with disabilities. After a while, I felt totally frustrated. That following summer, I attended a district workshop on Content Enhancement Routines that were developed at your research center. As I sat through the workshop, a whole new set of possibilities opened up to me. When the school year began, I started using several of the routines I had learned in my classes—they changed how I thought about the content that I taught, how I viewed student learning, and what I saw my role being as a teacher. While I don’t always succeed with all of my students, I can’t begin to tell you how significant the changes have been! Rather than dreading teaching—especially the classes with the most difficult to teach students—I now look forward to the challenge of seeing if I can meet the needs of both the brightest as well as the ones who struggled the most. The things I learned about Content Enhancement have transformed my life as a teacher. More importantly, they have changed how students in my classes learn—and that’s why I got into teaching in the first place!

This type of letter is clearly encouraging and the kind of message any educational researcher longs to receive from those who use the products of his or her research and development work. However, for every letter like this that is received, there are probably many other unwritten ones that could be written by teachers who are frustrated because a supposed “research-validated practice” has fallen short of its expectations.

Not surprisingly, researchers at the KU-CRL do everything they can to prevent such
disappointment. Indeed, over the past 24 years, researchers at the KU-CRL have been devoted to the goal of creating instructional interventions that are palatable for teachers to use and sufficiently powerful to affect the performance of students with learning disabilities (LD). What is important to understand about their work is that in the process of designing and field-testing a broad array of instructional interventions, they have, more often than not, come up short on an initial attempt with regard to producing the results they had in mind. What often has seemed logical to those on the initial design team frequently falls considerably short of the mark when it is taken into the classroom. These “misfires” have led to the KU-CRL motto: “We must be willing to go back to the drawing board!” Indeed, going back to the drawing board on many occasions has been necessary to fine-tune an intervention to a point at which it finally starts to make sense to teachers and produces the kinds of results with students that KU-CRL researchers (and the students!) would deem worthwhile.

Not surprisingly, given the enormous complexity of student learning (especially for students who are saddled with a learning disability) and the complex dynamics that exist in most schools (particularly secondary schools), designing interventions that are powerful, practical, and robust is a great challenge. The magnitude of this challenge has been exacerbated in recent years with the expectations set forth in PL 105-17 that programming for students with disabilities be outcome based within the context of successfully mastering (and not merely gaining access to) the general education curriculum (Turnbull, Rainbolt, & Buchele-Ash, 1997). In essence, the requirements of the law demand that students with disabilities not only acquire a significant array of skills that will enable them to compete within rigorous general education classes but that they acquire these skills to such a degree that the skills can be generalized to a host of circumstances and settings and be maintained over time.

Since its inception in 1977, the staff of the KU-CRL (initially known as the Institute for Research on Learning Disabilities, one of five Office of Special Education Programs-funded LD research institutes) has had as its overriding goal to design an array of interrelated interventions that would enable adolescents with LD to compete within the context of mainstream environments. To realize that goal, they adopted five standards against which they judge the design and field-testing of new interventions. Each intervention must (1) be practical for teachers to use and perceived by them as being doable within their classrooms, (2) be easy for both the teacher and students to learn, (3) yield outcomes that are deemed to be meaningful in terms of real-world measures such as passing grades, (4) be sufficiently broad in its reach that it has a favorable impact on the performance of those without disabilities (especially if the intervention is used in the general education classroom), and (5) be sufficiently powerful to have a favorable impact on the performance of students with disabilities to such a degree as to enable them to compete within the context of the criterion environment (e.g., the general education classroom).

To design interventions that meet these five standards, several principles have guided KU-CRL research. First, all KU-CRL researchers are committed to designing interventions that will work within the complex realities of today’s schools and classrooms. Hence, from the beginning, interventions are designed with these realities in mind so that the gap between new interventions and the realities that define practice in today’s schools and classrooms, and the number of times the drawing board needs to be revisited, can be minimized. Second, KU-CRL researchers are committed to the principles of “participatory research” (Turnbull & Turnbull, 1989), in which researchers actively team with key stakeholders (students with LD, parents, practitioners, etc.) who have important perspectives about the nature of the intervention that is being considered. Thus, the perspectives of these stakeholders are carefully considered, and input from them is gathered at all stages during the design and testing of an intervention. Third, KU-CRL researchers are committed to using sound research methodologies and designs. Hence, they work with research-design experts to help ensure that research methodologies and designs are in alignment with the nature of the research question(s) being asked and the complex realities of the settings in which the
studies will be conducted. Fourth, KU-CRL researchers are committed to collecting a broad array of measures that will enable consumers to have a thorough understanding of the intervention’s effects at all stages during the teaching process. Thus, multiple measures that yield reliable and quantifiable results are used that, when taken as a whole, will tell a relatively complete story of the intervention’s effects.

Fifth, KU-CRL researchers are committed to a field-testing process that involves several stages. Initially, a new intervention might be tested under tightly controlled conditions to determine whether the intervention is viable and worthy of testing in more authentic conditions. Ultimately, the intervention is tested under circumstances that closely approximate the complexity and unpredictable nature of actual classrooms. Sixth, KU-CRL researchers are committed to a process of continual refinement of an intervention until gains of a magnitude that meets acceptable thresholds for social significance as well as statistical significance (if possible, given the research design being used) are achieved. To meet these standards, KU-CRL researchers follow the center’s motto of “being willing to go back to the drawing board.” In addition, they continue to make refinements in an intervention and test it further even years after the initial research has been completed. Seventh, KU-CRL researchers are committed to translating field-test versions of interventions that have been successfully validated into instructional materials or manuals that include all the necessary supports, activities, and procedures teachers need to use the interventions effectively in classrooms. This translation is time-consuming and resource intensive. Finally, KU-CRL researchers are committed to a process of bringing interventions to scale, that is, ensuring that schoolwide, districtwide, and national use of an intervention can take place successfully. As has been repeatedly documented (e.g., Elmore, 1996), seldom are validated educational practices brought to scale. To determine how new interventions work when they are used in schools across the nation, the KU-CRL staff has developed and maintained an international training network (ITN). The ITN currently consists of nearly 1,200 certified trainers who have been prepared to work with teachers and administrators by following known principles of professional development and school change and to assist them in successfully implementing interventions validated by KU-CRL researchers. These individuals continually provide KU-CRL researchers with information about how the instructional interventions are working (or not working, as the case may be) in the nation’s schools. As a result, additional adjustments can be made in the interventions and the formats of the instructional materials.

In short, these eight principles are at the heart of the intervention research that is conducted by KU-CRL researchers. These researchers believe that when these principles are applied in combination, they help to increase the level of impact that intervention research ultimately has on the performance of students with disabilities.

Indeed, these principles are especially important if researchers are to deal with the unique set of challenges that often accompany research that is conducted on students with LD, especially as these students become adolescents and move into secondary schools. Among the more significant challenges that need to be addressed within this research context are the following. First, sufficient numbers of certain subtypes of students (e.g., those with disabilities in math) are often difficult to find. In addition, these students often are transient or evidence high dropout rates. Therefore, locating sufficient numbers of these students to study is often difficult, especially in rigorous general education classes. As a result, researchers need to be creative in selecting and inventing new research designs that focus on small numbers of students. Second, the diversity and magnitude of problems that students experience are often greater than shown on paper. For example, many adolescents with LD demonstrate a broad array of significant personal/social problems that can have a profound influence on their overall functioning. Third, through the informed consent process, adolescents frequently choose not to participate in a study. Finally, because of the limited numbers of students available and prevailing limitations imposed by schools, randomly assigning students to groups is often not possible.

In light of these challenges and the pressing need to design interventions that will produce large effect sizes, KU-CRL researchers have employed a broad array of research designs that fully capitalize on the circumstances available to them and that compensate for the limitations that might be presented by prevailing conditions. The following sections discuss different research designs that have been used by KU-CRL researchers while developing and testing interventions that make a difference in the performance of students with LD (see Table 29.1 for a list of the designs described herein).

TABLE 29.1 Example Designs Used by KU-CRL Researchers

Single-subject designs
    Multiple-baseline across-students design
    Multiple-baseline across-teachers design
    Multiple-baseline across-skills design
    Multiple-baseline across-settings design
    Reversal design

Group designs
    Control-group design with students
    Control-group design with teachers
    Comparison-group design with student volunteers
    Comparison-group design for students with teacher volunteers
    Comparison-group design with counterbalanced conditions for students
    Control-group design with counterbalanced conditions for teachers

Combination designs
    Combination designs with students
    Combination designs with teachers and students

Single-Subject Designs

The first general type of design that has been used by KU-CRL researchers and affiliates is the single-subject design (Baer, Wolf, & Risley, 1968). This design belies its name because a methodologically sound study that employs this design must involve the inclusion of several subjects, not just a single subject. The design is especially useful with students with LD for several reasons. First, large numbers of subjects are not required in this design. Because large numbers of students with LD, and especially large numbers of students with the same types of deficits (e.g., students with writing deficits), are typically not available in the same school, this design can be used quite effectively if researchers have limited access to numerous research sites or limited resources. In addition, students with LD are often available for participation in intervention studies during the time they are assigned to the resource room. Typically, just a few students are present in this setting at a time. This design is especially useful in situations in which students with disabilities are receiving the intervention in a setting with just a few other students.

In addition, this design is especially useful when researchers are interested in monitoring the progress of individual students over a time period while an intervention is being implemented to determine the effects of the intervention on each student and to determine the number of trials required by each student to reach mastery. This has been a priority for KU-CRL investigators interested in teaching skills and strategies to students with disabilities because they have been interested in developing interventions that create large gains in skills within short time periods and in ensuring that all students with learning disabilities benefit from the intervention. That is, instead of being interested in measuring the mean pretest and posttest scores of a large group of students who have received a short-term intervention and determining that statistically significant gains were achieved, KU-CRL researchers have been interested in monitoring the progress of each student on each practice trial and ensuring that the student is performing at or above a criterion level at the end of the study that will enable him or her to succeed in the general education curriculum. Although statistical methods can be used in conjunction with this design (e.g., Scruggs, Mastropieri, & Casto, 1987; Swanson & Hoskyn, 1998), the differences between preintervention and postintervention performance can often be seen with the naked eye if the data are displayed in graph form. Thus, this design is useful when investigators are interested in studying an intervention that takes place over an extended period requiring multiple practice trials before a student reaches mastery at a level that is substantially different than the level at which the student was performing at the beginning of the study. It enables investigators to watch each student’s progress and make decisions about the time required to produce targeted gains.

This design is also especially helpful when investigators wish to monitor student progress on several measures at once. For example, if investigators wish to determine whether students can generalize the skills they are learning to different kinds of tasks, while also measuring their performance on a task on which they have had practice, this is a good design to use.

Also, this design is useful when all the students in a setting must receive the instruction. Often, school administrators are reluctant to allow researchers to do research if half of the students who will participate in the study are serving as control subjects and will not benefit. In other words, they are reluctant to give permission for students to lose valuable instructional time by participating in a study if they will not gain something from the investment of that time. With the single-subject design, all the students can potentially benefit.

Finally, this design has been useful when KU-CRL researchers have been interested in studying a new way of instructing teachers and when only a few teachers are available for participation in a study. This is often the case because just a few teachers of students with LD are employed by each school in a district. Thus, large numbers of special education teachers are often not available unless researchers have the resources to work with a large number of schools.

Multiple-Baseline Across-Students Design

KU-CRL researchers have used a variation of the multiple-baseline across-students design, called the multiple-probe across-students design (Horner & Baer, 1978), often because of the flexibility of use associated with it. Students can be taught individually or in small groups. They can be enrolled in a class in which the instruction takes place, or they can meet individually with a researcher according to their schedules. They can be enrolled in several classes across a school day or in the same class. In addition, the design can be used to measure generalization and maintenance of a newly learned skill or strategy.

The major disadvantage associated with this design is that the same measure must be gathered on each student several times; probably the smallest number of times that might be sufficient is six. This requirement can cause problems in an instructional setting where time is at a premium and students need to be spending their time learning versus being tested. However, if the tests are short or if they are integrated into the instruction in such a way that they provide formative feedback to the student, several tests might be acceptable to school personnel.

When this design has been used by KU-CRL researchers and affiliates, all participating students are given at least three baseline “probes” or tests in order to measure their baseline performance on a type of task that they would encounter in school. When their performance across probes is stable, some of the students receive instruction; others receive additional baseline probes. These other students serve as the controls for the students receiving the instruction. Once the students who have received instruction have mastered the skill being taught, the other students then receive instruction as well. Several tiers of “other students” can be employed within the design to serve as controls for students who have received the intervention before them. Several replications of the design can be completed in order to show that there is generality across a number of subjects.

An example study in which the multiple-probe across-students design has been used is a study conducted by Bulgren, Hock, Schumaker, and Deshler (1995) on the effects of instruction in a mnemonic strategy on student test performance. In this study, a total of 12 students participated in four replications of the design (3 students participated in each replication). For each probe, the students were given information to study for a test, and on the next day they were administered the test. In some of the probes, the information to be remembered was clearly specified; in other probes, students were given written passages containing information that they had to find independently and study. There were two conditions: (1) a baseline condition in which probe tests were given before the students received instruction; and (2) a postintervention condition, which occurred after the initial instruction in the strategy. Results showed that when the students learned a strategy, called the Paired-Associates Strategy (Bulgren & Schumaker, 1996), their ability to make study cards and their scores on probe tests improved dramatically. At the end of the study, most of the students were earning probe test scores above the 80% level (the “B” level in most classes), whereas when the study began the large majority of test scores fell in the failing range.

In some other studies in which the multiple-probe across-students design was used, a third condition was included to show whether students maintained their use of a strategy over time. For example, in a study by Hughes and Schumaker (1991) in which students were taught a test-taking strategy (Hughes, Schumaker, Deshler, & Mercer, 1988), students received probe tests every 2 weeks after instruction was terminated. The results showed that they maintained their use of the strategy by earning a mean score of 85% of the points available for as many as 11 weeks after instruction had been terminated. These researchers also showed that the students’ test grades in their general education classes improved. Four students’ average test grades improved by one letter grade, and two students’ average test grades improved by two letter grades.

In two other studies using the multiple-probe across-students design, the major target of the intervention was a decrease in errors in students’ performance versus an increase in student performance as targeted in the aforementioned studies. Lenz and Hughes (1990) focused on decreasing students’ errors in their oral reading of passages, and Schumaker and colleagues (1982) focused on decreasing students’ errors in their written work. Both studies demonstrated that instruction in a strategy can help students reduce the number of errors they produce.

In a final variation of this design, Hock, Pulvers, Deshler, and Schumaker (2001) kept track of students’ quiz and test scores in their general education classes (e.g., algebra class and biology class) while they were receiving a special type of tutoring, called Strategic Tutoring, in an after-school study club (see Study 2 in the article). What makes this study unique is that each student’s test grades were monitored in relation to a different course. The study shows that the students’ quiz and test scores in the targeted course dramatically increased after strategic tutoring began for five of the six students from a mean baseline test score of 50% to a mean postinstruction test score of 80% in general education classes. A mean effect size of 3.12 was achieved on student test performance, with one student’s effect size as high as 10.72. A follow-up condition demonstrated that four of the six students maintained their performance levels in the targeted classes after Strategic Tutoring was discontinued.

Multiple-Baseline Across-Teachers Design

This design is similar to the design described earlier except teachers are the subjects in the design instead of students. This design has been used in studies in which teachers have been taught how to use an innovative instructional practice. In order to use the design, observers visit the participating teachers’ classrooms and gather baseline data during at least three lessons in each teacher’s class. Next, some teachers receive instruction in the instructional practice, whereas others do not. Once the teachers who have received instruction show that they can use the instructional technique in their classes, at least one more baseline point is gathered for the other teachers, and then they receive training and implement the instruction.

Again, the major advantage of this design is that only a few teachers are needed to complete it. This is important in today’s world because teachers are busy, and they are often reluctant to add anything to their already full plates. The major disadvantage is that several observations need to be conducted in actual classrooms. Sometimes teachers are reluctant to have observers visit their classrooms several times.

An example study in which this design has been employed was conducted by Bulgren, Schumaker, and Deshler (1988). In this study, secondary general education teachers who were teaching inclusive subject-area courses were taught how to use an instructional routine, the Concept Mastery Routine, for teaching conceptual information to students. Before and after the instruction, their use of teaching behaviors associated with teaching concepts was measured through the use of observational checklists in their classes. Results showed that the teachers used a mean of 27% of the teaching behaviors before the instruction and 91% of the behaviors after the instruction. In each case, the teacher’s instruction improved only after he/she attended the workshop. Concomitantly, the unit test scores of students who were enrolled in the teachers’ classes also improved. Before their teachers were trained, students with disabilities earned a mean test score of 60%. After their teachers were trained, they earned a mean test score of 71%.

Another example study in which this design was applied focused on teacher implementation of the Recall Enhancement Routine in inclusive classes (Bulgren, Deshler, & Schumaker, 1997). In this study, teachers were taught how to use the routine to co-construct mnemonic devices with their students related to information that the students needed to learn. Again, the results showed that the teachers’ behavior improved only after attending the workshop on the routine (see Bulgren, Deshler, Schumaker, & Lenz, 2000; Bulgren, Lenz, Schumaker, Deshler, & Marquis, 2002, for additional applications of this design).

Multiple-Baseline Across-Skills Design

Researchers at the KU-CRL have used a variation of the multiple-baseline across-skills design called the multiple-probe across-skills design. This design is appropriate when students are to be taught several skills in a sequence (not simultaneously) and when the skills are mutually exclusive (i.e., if students learn one of the skills, their behavior related to the other skills is not expected to change). When this design is used, all students participating in the study have at least three baseline probe tests on all the skills to be taught. Then all students begin instruction in the first skill to be taught at the same time. Once a student reaches the mastery criterion on the first skill to be taught, another baseline probe on the second and third skills to be taught is given to him or her. As long as the student’s performance on the second skill to be taught is stable, instruction in the second skill begins. Once the student reaches the mastery criterion on the second skill, another baseline probe is given on the third skill to be taught. As long as the student’s performance on the third skill to be taught is stable, instruction in the third skill begins, and so forth for as many skills as are to be taught.

A major advantage associated with this design is that all the students can begin instruction at the same time. Some students do not need to be “waiting” for instruction to begin, and teachers need not find something else for those students to do while they wait. The major disadvantage associated with this design is that students need to take many tests, even more tests than they take with the multiple-baseline across-students design, because they have to take a test associated with each skill each time a probe is given.

One study in which this design was used was conducted by Schumaker, Deshler, Alley, Warner, and Denton (1982). In this study, the effects of instruction in the Multipass Strategy were determined. The strategy has three parts or substrategies: “Survey,” “Size-up,” and “Sort-out.” During “Survey,” students spend about 3 minutes getting an overview of a textbook chapter. During “Size-up,” students find the most important information in each section of the chapter and take notes on it. During “Sort-out,” students review the important information for a test. Thus, the behaviors required of the students in the three parts of the strategy are mutually exclusive. The results showed that when each student learned each part of the strategy, the student’s performance related to only that part of the strategy improved substantially. For example, one student performed an average of 33% of the Survey behaviors, 33% of the Size-up behaviors, and 0% of the Sort-out behaviors before instruction. After instruction in each part, he performed 100% of the required behaviors. When given a test over the information in the chapter that he studied without supervision, the student earned a score of 25% during baseline and 90% after instruction.

Another study employing this design was conducted by Clark, Deshler, Schumaker, Alley, and Warner (1984). Here, instead of using several parts of a strategy, the researchers used two different strategies to create the multiple-baseline effect. In this study, students were administered baseline probes associated with reading comprehension of two different types of reading passages: narrative and expository passages. Then they were taught the Visual Imagery Strategy (Schumaker, Deshler, Zemitzch, & Warner, 1993), which can be applied to narrative passages. Once they had mastered that strategy and their reading comprehension of narrative passages had increased, they were administered at least one probe test on an expository passage. Once their performance on this type of passage stabilized, they were taught the Self-Questioning Strategy (Schumaker, Deshler, Nolan, & Alley, 1994). Results showed that the students’ reading comprehension scores increased on each type of passage only after the students had received strategy instruction related to that type of passage. For example, during baseline, the average percentage of comprehension questions the students answered correctly on a narrative passage was 55%; after instruction, the average percentage correct was 69%.

Still another study in which the multiple-baseline across-skills design was used covered writing interventions (Schmidt, 1983; Schmidt, Deshler, Schumaker, & Alley, 1989). This study focused on the instruction of four writing strategies: the Sentence Writing Strategy (Schumaker & Sheldon, 1985; Sheldon & Schumaker, 1985), the Paragraph Writing Strategy (Schumaker & Lyerla, 1991), the Error Monitoring Strategy (Schumaker, Nolan, & Deshler, 1985), and the Theme Writing Strategy (Schumaker, in press). Students were taught the strategies in sequence, and writing samples were gathered in their resource room class as well as in their English classes and social studies classes throughout the study. Each writing sample was scored for complete sentences and types of sentences, organized paragraphs, errors, and organized themes (each of these measures corresponded to one of the strategies taught). Results showed that the students’ writing improved in each area after the strategy corresponding to that area was taught. In addition, five of the seven students generalized their use of all the strategies they had learned to their English and social studies assignments without prompting. The remaining students generalized their use of the strategies after receiving generalization instruction. The four students who received instruction in all four strategies received higher scores than did the average student on the district writing competency exam. The students’ skills were maintained into the next school year, when their English writing assignments were gathered and scored.

Multiple-Baseline Across-Settings Design

The multiple-baseline across-settings design is useful when students who fit the characteristics of the subject sample are present in several settings and when the intervention can be implemented in each of those settings. The way this design is typically used, the same student is present in all of the settings, and the intervention is applied in each setting in relation to that student across time. A requirement of the design, then, is that when the behavior of the student changes in one setting, it does not change in the other settings. KU-CRL researchers have found that this is a difficult requirement to meet because the skills and strategies that are often the focus of their interventions are taught to mastery, and students often generalize their use of the strategies to other settings without prompting. The students’ spontaneous generalization of a skill thus destroys the whole purpose of the design: to show control over the dependent variable. Thus, KU-CRL researchers have rarely used this design.

Nevertheless, KU-CRL researchers have created a variation of this design. This hybrid design might be called the multiple-baseline across-settings and across-students design. In this design, the intervention is implemented in several classes (settings) by the different teachers in those classes, and the behavior of a targeted student with LD is measured in each setting. This design avoids the possible generalization effect that is probable in the multiple-baseline across-settings design when only one student takes part in each replication of the design.

One study in which this hybrid design was used is a study conducted by Lenz, Alley, and Schumaker (1987) on the use of advance organizers in secondary content classes. In this study, seven subject-area teachers and a student with LD who was enrolled in one of each teacher’s classes participated. During baseline, the teachers’ use of advance organizers and the students’ oral reports of information they learned were measured in each lesson. Then the teachers were taught (at different times) how to use advance organizers at the beginning of each lesson. They then used advance organizers in their classes (settings). The results showed that when the number of advance organizer elements used by a teacher increased in a class and when the student was taught to attend to the advance organizer elements, the number of items the student orally reported at the end of the class period also increased.

Reversal Design

The reversal design is another design that has been rarely used by KU-CRL researchers. To use this design, researchers take baseline measures, then they implement the intervention. Next, they repeat the baseline condition, then they implement the intervention for a second time, and so forth, for as many times as they wish in order to demonstrate that the intervention causes a change in the behavior. Thus, a major requirement of this design is that the behavior to be changed will “reverse” or revert to baseline levels when the baseline condition is reinstituted. Again, because KU-CRL researchers have been focused on developing interventions that cause enduring changes in behavior (note the studies mentioned earlier showing maintenance of the behavior after the intervention was discontinued), they have rarely used this design.

One way this design has been adapted by KU-CRL researchers is exemplified by a study by Bulgren and colleagues (2000) (see Study 3 in the article). The researchers were studying the effects of a teacher’s use of an instructional routine in a general education science class. The instructional routine involves the use of analogies to help students understand a new concept by relating it to something they already understand. For example, a good analogy that might help students understand the functions of the parts of the eye involves showing students the parts of a camera. Bulgren et al. asked the teacher to choose four concepts to be taught; she chose epiglottis, pancreas, alveoli, and esophagus. Each concept was taught in a separate lesson. Every other concept was paired with the teacher’s use of the routine in an ABAB reversal design. The students took a test over the concept on the day following the lesson on the concept. Results showed that the students earned significantly higher scores on the tests about the concepts that were taught through the use of the routine than on the tests about the concepts that were taught with traditional means of instruction. After “epiglottis” was taught with the routine, for example, students earned an average test score of 83%. After “pancreas” was taught with traditional methods, students earned an average test score of 27%. After “alveoli” was taught with the routine, students earned an average test score of 70%. After “esophagus” was taught with traditional methods, students earned an average test score of 42%. Thus, this design can be effectively used to demonstrate experimental control when an instructional intervention can be applied to some sets of information and not applied to other sets of information in alternation to determine how well students learn that information.

Group Designs

Group designs are particularly useful in the field of education when researchers wish to compare the effects of one instructional procedure to the effects of another instructional procedure. KU-CRL researchers have often been interested in comparing the effects of an innovative instructional procedure to the effects of traditional instruction. Sometimes they have been interested in comparing the effects of a new professional development method for teachers to another method and showing that the new method produces results that are at least as good as the other method.

Usually, group designs have been used with an intervention that is relatively short in duration because the chance of getting all the subjects together several times is small. Students, for example, become ill, miss school, have extracurricular activities, have doctor’s appointments, have jobs after school, and generally have other priorities besides participating in research studies. Teachers are often busy, too, and the likelihood of ensuring that they will get together several times is small. Thus, group designs have some limitations if an intervention takes considerable time requiring several practice trials to mastery.

Sometimes, group designs have been used by KU-CRL researchers with relatively small numbers of student or teacher subjects. Other times, KU-CRL researchers have conducted studies involving hundreds of students. They have particularly used large-group designs when they have studied interventions that are appropriate for general education classes in which large numbers of students are enrolled and which include several students with LD. Because school administrators often do not allow students to be randomly assigned to groups for experimental purposes, KU-CRL researchers have relied, for the most part, on involving schools in which students are already randomly assigned to classes or on randomly assigning intact classes to the different experimental conditions.

Sometimes teachers will not allow themselves to be randomly assigned to groups. In these cases, to move forward with a study, the researchers have allowed the teachers to assign themselves to either the experimental or the comparison group. Of course, such an arrangement makes the researchers even more responsible for showing that the two groups are equivalent at the beginning of the study. With these general considerations in mind, some of the group designs that have been used by KU-CRL researchers and affiliates are described below.

Control-Group Design with Students

In the control-group design, students are randomly assigned to one of two groups: an experimental group and a control group. The researchers need to demonstrate that the two groups are equivalent in some way, and this is typically done by having students take a pretest and showing that the students’ scores on the pretest are similar. However, this is often not possible because of limitations imposed by school personnel in terms of using valuable instructional time. If giving a pretest is not possible, some other measure might be used by the researchers to demonstrate the equivalence of the groups. Students in the experimental group receive the intervention; students in the control group do not. Except for this difference, all other conditions should be the same for both groups. For example, they should receive instruction from the same teacher, the information covered should be the same, and the same amount of instructional time should be available to both groups. As explained previously, this design has been used infrequently by KU-CRL researchers. However, it has been used in some instances when the intervention could be compressed into a block of time, such as one or two class periods, and when measures could be taken on the effects of the intervention immediately.

One study in which a posttest-only control-group design was used focused on the effects of an advance organizer prior to student reading of a passage. In this study, Lenz (1983) recruited 46 students with learning disabilities and 51 normally achieving students to participate. Within each achievement group, students were randomly assigned to the experimental or the control group. They all took a social studies achievement test, and the researcher demonstrated that the groups were equivalent. Then all the students read three passages and took a test over each passage. Students in the experimental group received some instruction about how to attend to and use advance organizers and were given an advance organizer before reading each passage. The control group did not receive advance organizers. The results showed that the experimental students with LD answered significantly more questions about important information in the passages than the control students with LD. There were no differences between the groups of normally achieving students.

Bulgren, Schumaker, and Deshler (1994) used a variation of this design to study the effects of the Recall Enhancement Routine on the test performance of secondary students with disabilities and other students who were enrolled in general education classes. Forty-one students were recruited from two social studies classes. The students were stratified by grade level (seventh or eighth) and exceptionality (LD or non-LD).
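
Stratified random assignment of this kind, in which half of each stratum is drawn at random for the experimental group, might be sketched in Python as follows. This is a minimal illustration and not the procedure the researchers actually ran: the toy roster, the field names (`grade`, `status`), the fixed seed, and the choice to place the extra student of an odd-sized stratum in the control group are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_assign(students, strata_keys, seed=0):
    """Split each stratum at random into experimental and control halves.

    students    -- list of dicts, each with an "id" plus the stratifying fields
    strata_keys -- names of the fields that define a stratum
    seed        -- fixed seed so the illustration is reproducible (assumption)
    """
    rng = random.Random(seed)

    # Group student ids by their combination of stratifying values.
    strata = defaultdict(list)
    for student in students:
        key = tuple(student[k] for k in strata_keys)
        strata[key].append(student["id"])

    assignment = {}
    for key in sorted(strata):
        ids = list(strata[key])
        rng.shuffle(ids)
        # Assumption: with an odd-sized stratum, the extra student is a control.
        half = len(ids) // 2
        for sid in ids[:half]:
            assignment[sid] = "experimental"
        for sid in ids[half:]:
            assignment[sid] = "control"
    return assignment


# Hypothetical roster of 12 students (not data from any study).
cells = [(7, "LD")] * 2 + [(7, "non-LD")] * 4 + [(8, "LD")] * 3 + [(8, "non-LD")] * 3
roster = [
    {"id": i, "grade": grade, "status": status}
    for i, (grade, status) in enumerate(cells)
]

assignment = stratified_assign(roster, ("grade", "status"))
for student in roster:
    print(student["id"], student["grade"], student["status"], assignment[student["id"]])
```

Because the shuffle happens inside each stratum, every grade-by-exceptionality cell contributes to both groups, which is what allows the groups to be compared within a subgroup (for example, students with LD only) as well as overall.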

Half the students in each stratified group were randomly selected to participate in the experimental group. The other half participated in the control group. Students in both groups received the same lecture by the same teacher. At the end of the lecture, a review period covered the information in the lecture. During the review portion of the lesson, the experimental students participated in creating mnemonic devices to help them remember some of the information in the lecture. The control students simply participated in a traditional review of the same facts (the facts were repeated). The researchers demonstrated that the groups were equivalent by showing that the two groups correctly answered about the same number of questions about facts that were not reviewed during the review period. They also showed that students with LD in the experimental group correctly answered an average of 71% of the questions about reviewed facts, whereas students with LD in the control group correctly answered an average of 42% of the questions about reviewed facts. Students without LD in the experimental group correctly answered an average of 85% of the questions about reviewed facts, whereas students without LD in the control group correctly answered an average of 54% of the questions about reviewed facts. The differences between the groups were statistically significant, and they were socially significant in that so many more students earned passing scores when the intervention was used than when it was not used.

Bulgren and colleagues (2002) used another variation of this design to study the effects of a concept comparison routine on the test performance of secondary students enrolled in five general education science classes, including students with disabilities. The teachers of each class randomly assigned the students to an experimental or a control group. The intervention was a single lesson on “Tropical Diseases” in which two diseases, malaria and snail fever, were compared and contrasted. During the lesson, experimental students, who met in one classroom during their regularly scheduled class period, received the information through the use of the new routine. Control students, who met in another classroom at the same time, received the same information through the use of a traditional lecture. Procedural controls were in place for teacher and classroom variables. Results showed that the experimental students with disabilities and the whole group of experimental students earned significantly higher scores on a test on the information than did their counterparts in the control group. For example, experimental students with LD earned an average score of 71%, and control students with LD earned an average score of 57% on the test. Low achievers in the experimental group earned an average score of 86% and in the control group earned an average score of 63%. Normally achieving students in the experimental group earned an average score of 84% and in the control group earned an average score of 76%. Many more experimental students passed the test than control students.

Control-Group Design with Teachers

KU-CRL researchers have used the control-group design with teachers as well as students. For example, Kline, Deshler, and Schumaker (1992), as a part of a series of studies conducted to identify variables that produce implementation of empirically validated practices by teachers, conducted a study in which teachers were randomly assigned to an experimental group and a control group using a stratified method of assignment to control for the grade level at which the teachers were teaching. Both groups had received a 3-hour overview on strategy instruction and had participated in workshops on how to teach two learning strategies. As a part of this study, both groups next participated in a workshop on how to teach the FIRST-Letter Mnemonic Strategy (Nagel, Schumaker, & Deshler, 1986). The experimental teachers met in support teams once a month following the workshop. Support-team meetings were led by a school administrator. The control teachers did not participate in support-team meetings, but they had unlimited access to the administrator in charge of the training workshops and of ensuring implementation. Results showed that all the experimental teachers began the instruction, while fewer than half the control teachers did. The experimental teachers began the instruction
within 9 days of the workshop on average, whereas the control teachers began it within 13 days. The experimental teachers taught the strategy to more students, and they proceeded further through the instruction than did the control teachers. Thus, this design can be helpful in identifying differential effects in teacher-training methods.

Comparison-Group Design with Student Volunteers

This design involves the use of two groups of students. One group receives the intervention, and the other group does not. Students volunteer to participate in the intervention. Sometimes, such a sampling method may be necessary if the intervention takes considerable time. This may be especially necessary if the students are adolescents because they often want to be involved in decisions being made about what they learn and how they spend their time. Unfortunately, this method places a limitation on the research because the students who participate in the intervention are volunteers (i.e., committed to participating in the intervention) and are possibly different from other students who might not volunteer.

An example of a study in which this design was used was conducted by Lancaster, Schumaker, and Deshler (2002). It compared the effects of live instruction to the effects of computer-based instruction in the Self-Advocacy Strategy. The point of this study was to demonstrate that the computerized instruction was as effective as live instruction from a teacher. There were three groups of students. Students volunteered to participate either in instruction in the Self-Advocacy Strategy or in a comparison group. Students who volunteered to participate in the instruction were randomly selected to participate either in the live-instruction group (n = 8) or in the computerized instruction group (n = 8). Students’ use of the strategy was measured in two different ways. One measure calculated the number of responses students made to a series of questions related to individual education plans (IEPs). Prior to instruction, students in the comparison group averaged 11 responses, and students in the computerized and live-instruction groups averaged 17 and 12 responses, respectively. The posttest was given during each student’s formal IEP meeting, at which the same questions were asked. This time, students in the comparison group averaged 21 responses while students in the computerized and live-instruction groups averaged 62 and 61 responses, respectively. The second measure of student use of the strategy was a calculation of the percentage of goals each student contributed to his or her IEP during the conference. This was a posttest-only measure. Students in the comparison group contributed an average of 20% of the goals found on their IEPs. Students in the computerized and live-instruction groups contributed 66% and 79% of the goals on their IEPs, respectively. On a measure of knowledge of the Self-Advocacy Strategy, students in the live and computerized instruction groups earned average scores of 16% and 19% correct on the pretest and 94% and 97% correct on the posttest, respectively. The researchers concluded that the computerized instruction was as effective as the live instruction in the strategy in terms of knowledge of the strategy and actual performance in the IEP meeting.

Comparison-Group Design for Students with Teacher Volunteers

This design has been used by KU-CRL researchers and associates when some teachers are interested in testing an intervention by using it in their classrooms and other teachers are willing to participate in the study as comparison teachers only. Typically, the researchers have used this design in schools in which students are randomly assigned to classes. In other words, although teachers might be volunteering to use the intervention or not to use the intervention, the students in their classes have been randomly assigned into those classes and thus can be expected to be comparable samples. Of course, the researchers make sure that the student samples are truly equivalent through the use of a pretest or some other measure collected from school records.

Vernon, Schumaker, and Deshler have used this design in a series of studies focused on teaching Cooperative Thinking Strategies in inclusive general education classes. Cooperative Thinking Strategies are sets of behaviors that small groups of students can use to complete group tasks such as solving problems, deciding how to deal with a two-sided issue, and completing a large project. In many of the studies, 10 to 12 teachers volunteered to deliver the intervention (instruction in one of the Cooperative Thinking Strategies) in their classes, and another 10 to 12 teachers volunteered their classes as comparison classes. Measures taken in the classes before and after instruction included student knowledge of how to proceed on the particular type of group task, student performance of the behaviors involved in completing the group task, student performance of social skills while completing the group task, sociometric measures of student acceptance of each student in the class, and student opinions of group work. The data collected on students with disabilities were analyzed separately from the data collected on other students in the classes.

Results showed in each study that students with and without disabilities in the experimental classes knew significantly more about what they had to do and performed significantly more of the behaviors required than the students in the comparison classes after the experimental teachers implemented the instruction. For example, one study focused on teaching students a way of problem solving in groups. Twenty teachers and 392 students participated. At the beginning of the study, experimental and comparison students earned an average of 1% of the points on a test of knowledge of how to solve problems in groups. After the 10 experimental teachers taught their students a strategy for solving problems in groups, on average, the experimental students earned 75% of the points, and the comparison students earned 1% of the points on the knowledge test. Before instruction began, the student groups in the experimental and comparison classes performed an average of 34% of the problem-solving behaviors. After the instruction of the strategy in experimental classes, the experimental students performed an average of 84% of the problem-solving behaviors while the students in the comparison classes performed an average of 39% of the behaviors. Student groups in both sets of classes had had the same number of opportunities to practice problem solving, although the comparison classes had no instruction on the topic (Vernon, Schumaker, & Deshler, 1996).

Vernon, Schumaker, and Deshler have also used this design in a series of three studies focused on creating safe learning communities in inclusive elementary classes. The general idea associated with this work is to provide groups of students, including students with disabilities, ways to support each other during instruction. As in the Cooperative Thinking Strategies studies, two groups of teachers and their students have been involved in each community-building study. Each study has focused on one method for building a learning community. The results of the studies have been positive, showing that teacher use of the instruction leads to significant and substantial differences between experimental and comparison students with regard to their knowledge and their performance in class.

For example, one study focused on teaching students how to participate in class discussions in respectful and helpful ways. Twenty teachers and 372 students in the teachers’ inclusive classes participated. The results showed that students in the experimental classes knew significantly more information about how to create a classroom community, participated more frequently, and engaged in fewer behaviors that would disrupt a discussion (e.g., yell-outs, negative comments, and laughing at speakers) than did students in the comparison classes after the experimental classes had participated in the experimental lessons. At the beginning of the study, for example, students in the experimental classes and comparison classes earned 14% and 13% of the points, respectively, on a test of their knowledge about what they should be doing in discussions and concepts related to learning community. After the instruction, experimental students earned 77% of the points, and comparison students earned 16% of the points. After the instruction, experimental students with disabilities participated a total of 289 times in the final discussion in their classes, while comparison students with disabilities participated a total of 99 times. In the first discussion, they participated a total of 121 and
108 times, respectively (Vernon, Schumaker, & Deshler, 1999). In sum, these researchers have found that they can use this comparison-group design to study the effects of interventions in inclusive classes while monitoring the performance of students with disabilities as well as other students in the classes.

Comparison-Group Design with Counterbalanced Conditions for Students

This design has been used by KU-CRL researchers to demonstrate that a particular instructional method produces learning gains for students with regard to subject-matter content (e.g., science, social studies, and literature) in inclusive general education classes. In one study in which this design was used, Bulgren and colleagues (2000) studied the effects of analogical instruction in inclusive secondary classes. Eighty-three students in eight science classes participated. The students had been randomly assigned to their classes by school personnel at the beginning of the year. The intact classes were randomly assigned to one of two experimental conditions. In one condition, students received analogical instruction related to the concept of commensalism and traditional instruction related to the concept of pyramid of numbers. In the other condition, students received analogical instruction related to the concept of pyramid of numbers and traditional instruction related to the concept of commensalism. All the students received traditional instruction related to the concepts of food web and heterotroph. Both groups received the instruction on all four concepts within one class period. On the next day, they took a test on the information. The results showed, for example, that the students with LD who received the analogical instruction in association with the concept of pyramid of numbers earned significantly higher scores (M = 69%) than did the students who had traditional instruction on that concept (M = 40%). Similar differences were found for low achievers, normal achievers, and high achievers. There were no differences between the groups on test items related to the concepts of food web and heterotroph, which indicated that the groups were equivalent.

Control-Group Design with Counterbalanced Conditions for Teachers

KU-CRL researchers have used this design to show the effects of a particular training method or condition on teacher implementation of a validated practice. For example, in one study, Kline, Deshler, and Schumaker (1992) wanted to know the effects of providing teachers with all the materials and equipment that they would need to implement an intervention. All the participating teachers attended workshops on the Word Identification Strategy (Lenz, Schumaker, Deshler, & Beals, 1984) and the Paraphrasing Strategy (Schumaker, Denton, & Deshler, 1984). The teachers were randomly divided into two groups. After the Word Identification Strategy workshop, teachers in Group 1 received all the materials and equipment needed to teach the Word Identification Strategy. Teachers in Group 2 received only the instructor’s manual. After the Paraphrasing Strategy workshop, teachers in Group 2 received all the materials needed to teach the Paraphrasing Strategy. The teachers in Group 1 received only the instructor’s manual. Results showed that the teachers who received the materials began the instruction sooner and taught more students the strategy than did the teachers who did not receive the additional materials.

Combination Designs

Sometimes, KU-CRL researchers have combined single-subject designs with group designs. This combination approach is especially useful when researchers want to demonstrate improved performance across time as the result of an intervention as well as compare the effects of the intervention to the effects of some other type of instruction.

Combination Designs with Students

In some studies, KU-CRL researchers and affiliates have combined the multiple-probe across-students design with the group design to determine the effects of an intervention on student performance. In the group design, students are randomly selected into one of two groups: an experimental group
and a control group. In the multiple-probe design, the skills of students in the experimental group are measured across time, before and after they participate in the instructional intervention.

Van Reusen, Deshler, and Schumaker (1989) used this design to study the effects of live instruction of the Self-Advocacy Strategy (Van Reusen, Bos, Schumaker, & Deshler, 1994) on students’ performance in simulated and real IEP conferences. Students with LD were randomly assigned to an experimental and a control group. All the students received at least three tests on their performance of self-advocacy skills at the beginning of the study except for one control student. Then some of the students in the experimental group received instruction in the strategy. Once their performance had improved, another group of experimental students received instruction. Once their performance had improved, a third group of experimental students received the instruction. Meanwhile, the performance of the control-group students continued to be measured. Results showed that the baseline performance levels of the two groups of students were comparable and stable. The multiple-probe design demonstrated that the performance of the experimental students improved only after they received the instruction. Students who had received the instruction made a mean of 98 relevant contributions to their IEP conferences whereas students who did not receive the instruction made a mean of 42 relevant comments. This represented a statistically significant difference. Moreover, 86% of the goals appearing in the final IEPs of students who had received the training were goals specified by the students themselves compared to 13% of the goals appearing in the final IEPs of students who had not received the instruction.

Another example in which the multiple-baseline design was combined with a comparison-group design is a study that focused on the instruction of social skills in an inclusive sixth-grade class (Vernon & Schumaker, 1993). For the comparison part of the design, a second sixth-grade class participated. Students in both classes took social skills tests before and after the experimental class received instruction in the social skills. For the multiple-baseline part of the design, the different baselines were the four social skills taught in the experimental class. A different student in the experimental class was targeted for the measurement of each social skill in the multiple-baseline design. The intervention focused on the instruction of the four social skills in sequence, and the targeted experimental students’ performance of the social skills was measured before and after each skill was taught. Results showed that the selected students’ performance of each skill improved only after instruction in that skill had taken place. In addition, the students in the experimental class earned an average score of 45% on the social skills performance pretest and 86% on the posttest. Students in the comparison class earned average pretest and posttest performance scores of 44% and 44%, respectively.

A third study using more than one design was conducted by Schumaker and colleagues (1984). The focus of this study was teaching students a way to learn information from a textbook chapter that had been specially coded and for which a special type of audiotape had been made. Besides using a multiple-probe across-substrategies design to show that the students learned the substrategies, the researchers also used a multiple-probe across-students design to show the effects of learning the substrategies on unit test scores. They also used a reversal design to show the students’ performance on unit tests when they used the special tapes versus verbatim tapes. (Prior to some unit tests, the experimental students received the specially marked textbook chapters and special audiotapes; prior to other tests, the experimental students received a verbatim audiotape of the chapter.) Finally, they used a comparison-group design to show how the experimental students performed in comparison to other students in the same general education class on the tests. The comparison students received the verbatim audiotapes. Results showed that verbatim audiotapes produced no change in the comparison students’ test scores (M = 51%) when compared to baseline test scores (M = 56%). Experimental students earned substantially higher scores after using the specially designed audiotapes and marked text (M = 91%) than they did when they used verbatim tapes (M = 41%).
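
The logic of combining random group assignment with a staggered multiple-probe schedule, as described in this section, can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the function names, the number of tiers, the number of baseline probes, the spacing between tiers, and the mastery threshold are all hypothetical choices made for the example and are not taken from the studies cited above.

```python
import random


def assign_groups(student_ids, seed=0):
    """Randomly split students into an experimental and a control group."""
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def stagger_instruction(experimental, n_tiers=3, baseline_probes=3, probes_per_tier=2):
    """Map each experimental student to the probe index at which instruction starts.

    Students are dealt into n_tiers subgroups; each later tier starts
    instruction probes_per_tier sessions after the previous one
    (hypothetical spacing, chosen only for this example).
    """
    schedule = {}
    for tier in range(n_tiers):
        start = baseline_probes + tier * probes_per_tier
        for student in experimental[tier::n_tiers]:
            schedule[student] = start
    return schedule


def improved_only_after_instruction(scores, start, threshold):
    """True if every probe before `start` is below threshold and every later probe meets it."""
    before, after = scores[:start], scores[start:]
    return all(s < threshold for s in before) and all(s >= threshold for s in after)


if __name__ == "__main__":
    experimental, control = assign_groups(range(12), seed=1)
    schedule = stagger_instruction(experimental)
    for student in experimental:
        print(f"student {student}: instruction begins at probe {schedule[student]}")
    # Made-up probe scores for one student whose instruction starts at probe 3.
    print(improved_only_after_instruction([40, 45, 42, 85, 90], start=3, threshold=80))
```

In a sketch like this, the control group is simply probed on the same sessions without ever receiving a start index, which mirrors the continued measurement of control-group students while instruction is introduced tier by tier.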

Combination Designs with Teachers and Students

KU-CRL researchers have used this type of design when they are interested in tracking teacher performance in the classroom while also measuring the effects of teacher performance on student performance. Sometimes, they have used a multiple-probe across-teachers design to determine teacher effects and a group design to determine student effects.

Kline, Schumaker, and Deshler (1991) used such a combination of designs to determine the effects of training on teacher delivery of feedback to students and to determine the effects of different types of feedback on student learning. Eighteen teachers were randomly assigned to one of three groups: a group that received training in a feedback routine, a group that received training in the same feedback routine plus training in how to teach students to accept feedback, and a comparison group. Participating students were randomly selected students with learning disabilities in the participating teachers’ classes. All the teachers were taught to implement instruction in the Sentence Writing Strategy (Schumaker & Sheldon, 1985) in a daylong workshop. In addition, teachers in one group received explicit instruction in how to give feedback to students. Teachers in a second group received explicit instruction in how to give feedback to students and how to teach students to accept teacher feedback. Teachers in the third group (the comparison group) received consultation with regard to implementing the strategy instruction. The multiple-probe results showed that the teachers’ delivery of feedback improved concomitant with instruction in how to give feedback; the group results showed that the students of teachers in the two feedback groups reached mastery in significantly fewer lessons than did students of teachers in the comparison group. In fact, they required one-third fewer practice trials (10 vs. 15), on average, representing a difference of a week of instruction. There were no differences found between the number of errors made by students of comparison teachers on practice trials 1 and 2, while there were differences between the errors made on these trials for students whose teachers received the feedback training. Moreover, 7 of 11 teachers in the feedback groups who were still in the district were teaching the strategy in the next school year, whereas none of the comparison teachers had reinitiated the instruction that year. Thus, teacher success in teaching something new to students might have a profound effect on whether teachers use that instructional practice in the future.

In another variation of the combination design, Fisher, Deshler, and Schumaker (1999) studied the effects of computerized versus live instruction on the knowledge, construction of instructional materials, and behavior of teachers in their general education classrooms. Twenty-nine of 58 preservice teachers were randomly assigned to receive instruction on how to use the Concept Mastery Routine (Bulgren, Deshler, & Schumaker, 1993) through a computerized program; the remainder were assigned to receive live instruction. These teachers took two tests before and after the instruction: a knowledge test and a test of constructing materials associated with the routine. Of eight inservice teachers, four were randomly selected to receive the computerized training, and the other four were randomly selected to receive the live instruction. These teachers took the same tests as the preservice teachers. In addition, a multiple-probe across-teachers design with three replications was used with the inservice teachers to show their performance of the routine in their classes. The results showed no differences between the groups; that is, the computerized instruction was shown to be as effective as live instruction in terms of teacher knowledge of the routine, teacher construction of materials, and teacher use of the routine.

Conclusion

Researchers at the KU-CRL and their associates have been studying the problem of educating students with learning disabilities for the past 24 years from a number of angles. They have developed and validated instructional methods for teaching these students a variety of learning strategies so that they can meet the demands associated with their required general education courses. They have developed and validated methods for teachers of general education courses to
deliver the content in learner-friendly ways, ways that enable students to understand and remember the content. They have developed and validated methods for creating learning communities within classes and for teaching students how to work together in productive ways. For the most part, this research has been conducted in schools, and in many cases, regularly assigned teachers have delivered the instruction. To conduct their research under typical school conditions, KU-CRL researchers and affiliates have had to be creative in working with a variety of research designs. At times, they have had to sacrifice some of the strict rules of experimental design, like pure random assignment, to ensure that a research study was conducted. However, they have tried to substitute other types of controls within a study to help them demonstrate the effects of their intervention. They chose each design exemplified here for its fit to the problem at hand as well as its fit to the current situation in the schools. Such adaptations are needed if applied educational research is to produce outcomes and products that will be usable in today’s and tomorrow’s schools and that will substantially change the ways that students with disabilities are educated in the future.

Dedication

The authors wish to dedicate this chapter in honor and memory of Dr. Donald M. Baer, their teacher and supporter. His creativity and leadership in experimental design formed the springboard from which the studies reported in this article were born. The authors are grateful for his instruction, his example, and all aspects of his beautiful mind and wonderful heart.

References

Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1, 91–97.

Bulgren, J. A., Deshler, D. D., & Schumaker, J. B. (1993). The Concept Mastery Routine. Lawrence, KS: Edge Enterprises.

Bulgren, J. A., Deshler, D. D., & Schumaker, J. B. (1997). Use of a recall enhancement routine and strategies in inclusive secondary classes. Learning Disabilities Research and Practice, 12(4), 198–208.

Bulgren, J. A., Deshler, D. D., Schumaker, J. B., & Lenz, B. K. (2000). The use and effectiveness of analogical instruction in diverse secondary content classrooms. Journal of Educational Psychology, 92(3), 426–441.

Bulgren, J. A., Hock, M. F., Schumaker, J. B., & Deshler, D. D. (1995). The effects of instruction in a paired associates strategy on the information mastery performance of students with learning disabilities. Learning Disabilities Research and Practice, 10(1), 22–37.

Bulgren, J. A., Lenz, B. K., Schumaker, J. B., Deshler, D. D., & Marquis, J. G. (2002). The use and effectiveness of a comparison routine in diverse secondary content classrooms. Journal of Educational Psychology, 94(2), 356–371.

Bulgren, J. A., & Schumaker, J. B. (1996). The Paired-Associates Strategy. Lawrence: The University of Kansas Center for Research on Learning.

Bulgren, J., Schumaker, J. B., & Deshler, D. D. (1988). Effectiveness of a concept teaching routine in enhancing the performance of LD students in secondary-level mainstream classes. Learning Disability Quarterly, 11(1), 3–17.

Bulgren, J. A., Schumaker, J. B., & Deshler, D. D. (1994). The effects of a recall enhancement routine on the test performance of secondary students with and without learning disabilities. Learning Disabilities Research and Practice, 9(1), 2–11.

Clark, F. L., Deshler, D. D., Schumaker, J. B., Alley, G. R., & Warner, M. M. (1984). Visual imagery and self-questioning: Strategies to improve comprehension of written material. Journal of Learning Disabilities, 17(3), 145–149.

Elmore, R. F. (1996). Getting to scale with good educational practice. Harvard Educational Review, 66(1), 1–25.

Fisher, J. B., Deshler, D. D., & Schumaker, J. B. (1999). The effects of an interactive multimedia program on teachers’ understanding and implementation of an inclusive practice. Learning Disability Quarterly, 22(2), 127–142.

Hock, M. F., Pulvers, K. A., Deshler, D. D., & Schumaker, J. B. (2001). The effects of an after-school tutoring program on the academic performance of at-risk students and students with learning disabilities. Remedial and Special Education, 22(3), 172–186.

Horner, R. D., & Baer, D. M. (1978). Multiple-probe technique: A variation of the multiple-baseline design. Journal of Applied Behavior Analysis, 11(1), 189–196.

Hughes, C. A., & Schumaker, J. B. (1991). Test-taking strategy instruction for adolescents with learning disabilities. Exceptionality, 2, 205–221.

Hughes, C. A., Schumaker, J. B., Deshler, D. D., & Mercer, C. O. (1988). The Test-Taking Strategy: Instructor’s manual. Lawrence, KS: Edge Enterprises.

Kline, F. M., Schumaker, J. B., & Deshler, D. D. (1991). Development and validation of feedback routines for instructing students with learning
disabilities. Learning Disability Quarterly, 14(3), 191–207.

Kline, F. M., Deshler, D. D., & Schumaker, J. B. (1992). Implementing learning strategy instruction in class settings: A research perspective. In M. Pressley, K. R. Harris, & J. T. Guthrie (Eds.), Promoting academic competence and literacy in school (pp. 361–406). Orlando, FL: Academic Press.

Lancaster, P., Schumaker, J. B., & Deshler, D. D. (2002). The development and validation of an interactive hypermedia program for teaching a self-advocacy strategy to students with disabilities. Learning Disability Quarterly, 25(4), 277–302.

Lenz, B. K. (1983). The effect of advance organizers on the learning and retention of learning disabled adolescents within the context of a cooperative planning model. Unpublished doctoral dissertation, University of Kansas, Lawrence.

Lenz, B. K., Alley, G. R., & Schumaker, J. B. (1987). Activating the inactive learner through the presentation of advance organizers. Learning Disability Quarterly, 10(1), 53–67.

ACLD (Vol. 3, pp. 170–183). Syracuse, NY: Syracuse University Press.

Schumaker, J. B., Deshler, D. D., Nolan, S. M., & Alley, G. R. (1994). The Self-Questioning Strategy: Instructor’s manual. Lawrence: University of Kansas Center for Research on Learning.

Schumaker, J. B., Deshler, D. D., Zemitzch, A., & Warner, M. M. (1993). The Visual Imagery Strategy. Lawrence: University of Kansas Center for Research on Learning.

Schumaker, J. B., & Lyerla, K. D. (1991). The Paragraph Writing Strategy: Instructor’s manual. Lawrence: University of Kansas Institute for Research in Learning Disabilities.

Schumaker, J. B., Nolan, S. M., & Deshler, D. D. (1985). The Error Monitoring Strategy: Instructor’s manual. Lawrence: University of Kansas Institute for Research in Learning Disabilities.

Schumaker, J. B., & Sheldon, J. (1985). The Sentence Writing Strategy: Instructor’s manual. Lawrence: University of Kansas, Institute for Research on Learning Disabilities.

Scruggs, T. E., Mastropieri, M. A., & Casto, G.
Lenz, B. K., & Hughes, C. (1990). A word identifi- (1987). The quantitative synthesis of single-sub-
cation strategy for adolescents with learning dis- ject research: Methodology and validation. Re-
abilities. Journal of Learning Disabilities, 23(3), medial and Special Education, 8(2), 24–33.
149–158, 163. Sheldon, J., & Schumaker, J. B. (1985). The Sen-
Lenz, B. K., Schumaker, J. B., Deshler, D. D., & tence Writing Strategy: Student Lessons.
Beals, V. L. (1984). The Word Identification Lawrence, KS: Edge Enterprises.
Strategy: Instructor’s manual. Lawrence: Univer- Swanson, H.L., & Hoskyn, M. (1998). Experimen-
sity of Kansas Institute for Research in Learning tal intervention research on students with learn-
Disabilities. ing disabilities: A meta-analysis of treatment out-
Nagel, D. R., Schumaker, J. B., & Deshler, D. D. comes. Review of Educational Research, 68,
(1986). The FIRST-Letter Mnemonic Strategy: 277–321.
Instructor’s manual. Lawrence, KS: Edge Enter- Turnbull, H. R., & Turnbull, A. P. (1989). Report
prises. of consensus: Conference on principles of family
Schmidt, J. (1983). The effects of four generaliza- research, Lawrence, KS: Bureau of Child Re-
tion conditions on learning disabled adolescents’ search.
written language performance in the regular Turnbull, R., Rainbolt, K., & Buchele-Ash, A.
classroom. Unpublished doctoral dissertation, (1997). Individuals with Disabilities Education
University of Kansas, Lawrence. Act: Digest of significance of 1997 amendments.
Schmidt, J. L., Deshler, D. D., Schumaker, J. B., & Lawrence, KS: Beach Center on Families and Dis-
Alley, G. R. (1988/89). Effects of generalization ability
instruction on the written language performance Van Reusen, T., Bos, C., Schumaker, J. B., & Desh-
of adolescents with learning disabilities in the ler, D. D. (1994). The Self-Advocacy Strategy: In-
mainstream classroom. Reading, Writing, and structor’s manual. Lawrence, KS: Edge Enterpris-
Learning Disabilities, 4(4), 291–309. es, Inc.
Schumaker, J. B. (in press). The Theme Writing Van Reusen, A. K., Deshler, D. D., & Schumaker, J.
Strategy: Instructor’s manual. Lawrence, KS: B. (1989). Effects of a student participation strat-
Edge Enterprises. egy in facilitating the involvement of adolescents
Schumaker, J. B., Denton, P. H., & Deshler, D. D. with learning disabilities in the individualized ed-
(1984). The Paraphrasing Strategy: Instructor’s ucational program planning process. Learning
manual. Lawrence: University of Kansas Institute Disabilities, 1(2), 23–34.
for Research in Learning Disabilities. Vernon, D. S., Schumaker, J. B., & Deshler, D.D.
Schumaker, J. B., Deshler, D. D., Alley, G. R., Warn- (1993). Who benefits from social skills instruc-
er, M. M., & Denton, P. H. (1982). Multipass: A tion in the mainstream classroom? Exceptionality
learning strategy for improving reading compre- Education Canada, 3(1, 2), 9–38.
hension. Learning Disability Quarterly, 5,(3) Vernon, D. S., & Schumaker, J. B. (1996). The
295–304. LEARN Strategy and the THINK Strategy: Co-
Schumaker, J. B., Deshler, D. D., Alley, G. R., Warn- operative Thinking Strategies in the classroom
er, M. M., Clark, F. L., & Nolan, S. (1982). Error (Continuation Repotr No. SBIR R44MH47211-
Monitoring: A learning strategy for improving 04, National Institute of Mental Healyh). Lawer-
adolescent academic performance. In W. M. ence, KS: Edge Enterptrises.
Cruickshank & J. W. Lerner (Eds.), Best of

30

The Methods of Cluster Analysis and
the Study of Learning Disabilities

Deborah L. Speece

Since the condition of learning disabilities (LD) has been recognized, researchers have sought methods to understand the apparent heterogeneity of skills in children with the disorder. Fletcher and colleagues (1997) traced clinical interest to the late 1800s and Morris, Blashfield, and Satz (1986) attributed the first empirical effort to 1969. Both clinical/rational and empirical methods have been applied to identify homogeneous subtypes of children and there are many examples of each approach to classification research in LD (e.g., Boder, 1973; Lyon, 1985; Morris et al., 1998; Speece, McKinney, & Appelbaum, 1985; Wolf & Bowers, 1999). Although there are good reasons to approach classification from a clinical/rational perspective (Torgesen, 1982), the purpose of this chapter is to review the methods of empirical subtyping known as cluster analysis. The specific goals of this chapter are to describe the details of the method, illustrate the application of cluster analysis methods by reference to research examples in LD, assess the contribution of cluster analysis investigations to the study of LD, and suggest future directions. The examples are drawn primarily from my own work but also include research by other investigators. As such, the review of applied work is illustrative rather than exhaustive but is designed to cover the major methodological issues germane to using cluster analysis.

Part of the appeal of applying cluster analysis techniques to the study of LD is the possibility of making sense of a field that is beset by muddled constructs and definitional conundrums. It is frequently noted that LD is a multivariate phenomenon, but distinguishing the central attributes of the condition from important correlates has proven to be a difficult task (MacMillan, 1993). An apt analogy to the situation in LD was provided by Supreme Court Justice Potter Stewart in his discussion of pornography: “I shall not today attempt to define [pornography]. . . . But I know it when I see it” (Jacobellis v. Ohio, 378 U.S. 184 [1964], p. 8; retrieved from http://laws.findlaw.com/us/378/184.html, 10/15/2001). Similarly, there is little doubt in many minds about the existence of LD, but our ability to identify the constructs and develop a coherent classification remains a goal rather than a reality.

Of course, a statistical method can only provide some of the tools needed by investigators to develop a coherent classification, and in this chapter I present and discuss the



decisions required of the investigator. Cluster analysis is not a single method but, rather, encompasses a variety of approaches (Lorr, 1994). The present discussion is limited to hierarchical agglomerative methods in which each entity (in the present case, a child) starts as his or her own cluster and successive mergers are made until all participants are in a single cluster. Thus, there are always n – 1 possible cluster solutions and clusters will always be obtained. The investigator must decide, among other things, what point in the hierarchy best represents the true underlying structure of the data.

Skinner (1981) provided a framework for designing and evaluating classification research regardless of whether the approach was clinical or empirical. The three major elements of the framework are theory formulation, internal validity, and external validity. Theory formulation represents a number of issues including statement of the theory guiding the investigation, purpose of the classification, and selection of subjects, variables, procedures, and similarity measures, the latter pertaining to how subjects/clusters will be evaluated as similar for successive mergers to be made. Internal validity refers to the replicability of the cluster solution, and it is at this stage that cluster analysis procedures may seem to be “black magic” (Jain & Dubes, 1988). A number of procedures have been used to determine the critical questions of whether clusters exist in the data and, if so, how many. External validation is often viewed as the most interesting phase because it is here that cluster differences are assessed for usefulness and meaning. It is also the stage at which analysis returns to more familiar ground for many researchers with the use, for example, of multivariate analysis of variance (MANOVA) or analyses of variance (ANOVA) to assess cluster differences.

A cluster analysis study that addresses each aspect of Skinner’s (1981) framework would include the following design features: (1) subjects are selected from a known population and selection criteria are theoretically relevant; (2) both the classification variables and external validation procedures are theoretically based; (3) hypothesized subtypes are presented and the relationship of the subtypes to external validation procedures is proposed; (4) the investigator seriously considers the proposition that no clusters exist in the data set; (5) sample size is large enough to test proposed cluster solutions with a split sample (cross validation); and (6) external validation results are interpretable within the theoretical model. This design may exist in the larger literature on cluster analysis, but it does not yet exist in the cluster analysis literature on learning/reading disabilities. This does not mean that classification research using cluster analysis in LD has not yielded interpretable results. Rather, the quintessential study has yet to be conducted. The issues generated by Skinner’s three-stage framework are reviewed next with examples from LD research.

Theory Formulation

Specification of Theory

Even though cluster analysis is primarily an exploratory technique (Everitt & Dunn, 1983), the theoretical or conceptual basis of the study needs to be stated explicitly, as cluster analysis techniques will always yield clusters even with random data. A theoretical or conceptual framework provides the basis to develop hypothesized subtypes that should be obtained. The hypothesized subtypes can then be used as guideposts to assist in selecting possible cluster solutions, providing some confidence that the cluster solution obtained represents a meaningful, as opposed to random, partition of the data. There is no need for complete specification of the number of clusters and expected profiles because interesting, unanticipated subtypes may emerge that lead to further development of theory.

In addition to hypothesized subtypes, theory should also guide predictions about cluster differences (external validation). It is not rare for more recent investigations to offer hypothesized subtypes, but the external validation procedures are often not linked to theory on an a priori basis. Rather, the selection of procedures and the explanation of cluster differences have a decidedly post hoc flavor. Although this approach is often convincing, it is not a strong position. An exception to this criticism was provided by Feagans and Appelbaum (1986) in their study of oral-language subtypes of children

Cluster Analysis 503

with LD. They presented a conceptual basis for expecting a narrative language subtype and further specified that children with strong narrative skill also would exhibit better achievement, especially reading comprehension. Their hypotheses were confirmed: two narrative subtypes were identified and exhibited higher academic achievement compared to the other four subtypes across 3 years.

In a study of unselected kindergarten children, Speece, Roth, Cooper, and De La Paz (1999) hypothesized two oral-language subtypes, one representing narrative skills (based on Feagans & Appelbaum, 1986) and one representing phonological awareness skills based on the extant literature documenting the importance of this skill for early reading. They also predicted that if the subtypes were obtained, a subtype with strong phonological skills should also demonstrate better skills on word reading and spelling measures whereas a strong narrative subtype should exhibit better skills in listening comprehension. The proposed subtypes were obtained with mixed external validation results. The strong phonological subtype (which also had higher oral-language skills overall) did exhibit better reading and spelling skills compared to the other subtypes, but the strong narrative subtype had higher listening comprehension scores only in comparison to a subtype with low overall language skill. These results suggested that oral-language skills may not have a uniform influence on literacy skills, which is contrary to conventional wisdom of strong linkages between oral language and literacy.

Purpose

The purpose of a classification study guides selection of external validation procedures and is also tied to selection of subjects and classification measures. Classification systems may promote either communication with practitioners or prediction which is more related to scientific goals (Blashfield & Draguns, 1976). Blashfield and Draguns noted that these goals are in opposition because communication necessitates simplicity and ease of implementation, whereas prediction requires complexity and flexibility to examine scientific tenets. Both purposes are equally important. The work in LD, either implicitly or explicitly, has pursued predictive purposes rather than communication. Investigations in LD usually identify multiple subtypes and explore the external validity of cluster solutions by examining cluster differences on an independent set of measures. For example, Speece and colleagues (1985) identified seven behavioral subtypes based on general education teachers’ ratings and validated cluster differences on special education teacher ratings and observed classroom behavior. Morris and colleagues (1998) identified 10 cognitive/linguistic subtypes and used a variety of measures to validate them. The point is that there are usually too many subtypes identified to be useful in practice and external validation procedures are not typically related to practice.

Participants

As with any study, the issue here is the extent to which the results can be generalized to a known population. The interpretation of early classification work in LD is hampered by the use of school-identified samples, the problems of which are well known (Keogh & MacMillan, 1983; MacMillan & Speece, 1999). In some instances, the results with school-identified samples are theoretically compelling (e.g., Feagans & Appelbaum, 1986), and in others they provide hypotheses for further work (e.g., McKinney & Speece, 1986). In general, however, these types of studies demonstrate the feasibility of cluster analysis methods rather than provide generalizable classifications. One exception is the analysis of the Florida Longitudinal data set by Satz and Morris (1981). In that study an unselected group of males was used in a cluster analysis to identify subtypes with LD, thus avoiding several problems related to selection bias. Similarly, Speece and colleagues (1999) used an unselected group of kindergarten children in their study of oral-language subtypes and reading. When the goal of the study is more narrowly focused on subtypes of children with LD rather than identifying subtypes in the general population, care must be exercised in the definition and selection of subjects. Morris and colleagues (1998) used research criteria to define children as reading


disabled rather than rely on school identification procedures.

Classification Measures

A problem with some cluster analysis studies is that they represent a secondary analysis of an existing database. At issue is not the secondary analysis per se but, rather, the limitations imposed on sample definition and variable selection for both classification and external validity procedures. Several studies have been designed specifically as classification investigations which provide more freedom to select a coherent set of classification variables. Speece (1987) selected classification variables using an information-processing theoretical model for a study of reading disabilities and Speece and Cooper (1990) used a multiple domain approach (achievement, behavior, intelligence, and language) to classify at-risk and normally developing children.

Another issue is the number of variables to use. More is not necessarily better. There is no rule for subject-to-variable ratios as with some statistical methods (e.g., factor analysis) but unnecessary variables may add noise to the results, making it difficult to identify the structure if there is one (Milligan & Cooper, 1987). Using a Monte Carlo comparison of clustering algorithms, Price (1993) found that increasing the number of variables decreased the detectability of clusters and advised limiting variables to the smallest set possible.

Similarity Measures and Clustering Algorithms

Skinner (1981) placed the selection of similarity measure and algorithm under the internal validation stage of a classification study. However, these decisions are placed under theory formulation for the present discussion to emphasize the point that knowledge of the conceptual basis of the study can assist in making the appropriate decisions.

Similarity measures define how two entities will be judged as similar, and the cluster algorithm defines why these mergers are made. There are hundreds of similarity measures and algorithms (Blashfield & Aldenderfer, 1988), but the discussion can be simplified to a few defining issues. Regarding similarity measures, Cronbach and Gleser (1953) demonstrated that profiles contain three pieces of information: shape (i.e., the “ups and downs” of a profile), scatter (i.e., variation among profile points), and elevation (mean level of performance across the profile). Correlational similarity measures yield clusters based on shape, whereas distance similarity measures incorporate all three pieces of information (Skinner, 1978).

Operationally, a correlation similarity measure removes the mean and standard deviation from each profile to produce “shape” clusters, whereas a distance metric uses the original form of the data, usually standardized. A problem with distance measures is that shape, scatter, and elevation are confounded and may differentially affect cluster formation. Adams (1985) suggested that correlation may be useful in situations in which the sample has relatively low (or high) performance, which may be the case for a sample with reading/learning disabilities. This approach was used in a study of children with reading disabilities (Speece, 1987) and produced six subtypes based on information-processing variables. Skinner (1978) proposed that data first be evaluated according to shape with the resulting clusters reanalyzed by including scatter and elevation to determine the relative importance of these elements. This procedure was incorporated in two studies (Speece & Cooper, 1990; Speece et al., 1999). In both cases the “shape” clusters, although internally valid, did not yield meaningful profiles, but each produced an interpretable split when scatter and elevation data were evaluated. Figure 30.1 depicts the final six-cluster solution for the Speece and Cooper study. Each pair of clusters (i.e., 1–2, 3–4, 5–6) was formed by the split of the shape clusters. Examination of the profiles for each pair shows that the shape of each profile is similar while differing on elevation across the classification variables. Clusters 2, 3, and 5 were interpreted as variations on normal performance whereas cluster 1 was indicative of an LD profile, cluster 4 was suggestive of mild mental retardation, and cluster 6 was suggestive of deficient language processing. This study is presented in more depth in a later section of the chapter.

Choice of an algorithm is closely tied to


FIGURE 30.1. Mean profiles of clusters across the classification variables. RDG, reading achieve-
ment; MTH, math achievement; VIQ, verbal intelligence; NIQ, nonverbal intelligence; WRS, rating of
work-related skills; IPS, rating of interpersonal skills; PL1–PL4, level of scores on Preschool Language
Assessment Instrument; TLP, total prompts on Dynamic Assessment Task; RPT, residual posttest gain
score on Dynamic Assessment Task. Odds ratios for Clusters 1 through 6 are 252.0, 27.0, 1.0, 43.2,
7.7, and 31.5, respectively. From Speece and Cooper (1990). Copyright 1990 by the American Educa-
tional Research Association. Reprinted by permission.
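
The difference between correlational and distance similarity measures described in the section on similarity measures can be made concrete with a small sketch. The Python below is illustrative only: the two profiles are hypothetical standardized scores, not data from any study cited here, and the function names are assumptions introduced for this example. It computes elevation and scatter for each profile, then compares the Pearson correlation (sensitive to shape only) with the Euclidean distance (sensitive to shape, scatter, and elevation together).

```python
from math import sqrt
from statistics import mean, pstdev


def elevation(profile):
    """Mean level of performance across the profile."""
    return mean(profile)


def scatter(profile):
    """Variation among the profile points (population standard deviation)."""
    return pstdev(profile)


def pearson(a, b):
    """Correlation between two profiles; removes each profile's mean and scale."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def euclidean(a, b):
    """Distance between two profiles on the original (standardized) scores."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical z-score profiles for two children on four classification variables.
# child_b has the same "ups and downs" as child_a but sits 2 units lower.
child_a = [0.5, -0.5, 1.0, 0.0]
child_b = [-1.5, -2.5, -1.0, -2.0]

shape_similarity = pearson(child_a, child_b)   # identical shape
distance = euclidean(child_a, child_b)         # separated by elevation
elevation_gap = elevation(child_a) - elevation(child_b)
scatter_gap = scatter(child_a) - scatter(child_b)
```

Because the second profile is simply the first shifted down by two units, the correlation treats the two children as the same shape while the distance keeps them apart on elevation, which is the confound the text attributes to distance measures.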

the similarity measure and Blashfield and Aldenderfer (1988) provided an extended discussion of algorithms. Applied researchers are limited generally to algorithms available in statistical software programs (e.g., SAS and SPSS), but a variety of algorithms are available and they tend to be the ones evaluated in the methodological literature. In this regard, average linkage, complete linkage, and Ward’s (1963) minimum-variance algorithms are reasonable choices. The average linkage method combines entities such that members in one cluster have a greater mean similarity to each other than with all members in another cluster. With complete linkage, members of clusters are


more similar to each other than to members of any other cluster. Ward’s method joins members based on minimizing within cluster variance. One issue to consider is how the algorithms perform with similarity measures. With respect to recovering known structure in a data set (Monte Carlo studies), Ward’s method performs best with a distance metric, whereas average linkage provides best recovery with correlation as the similarity measure (Lorr, 1983; Scheibler & Schneider, 1985). Morey, Blashfield, and Skinner (1983) reported similar findings with a real data set with the additional finding that Ward’s method with Euclidean distance provided the best discriminatory power across methods.

The careful investigator must exert some caution in using these recommendations by investigating the influence of shape, scatter, and elevation and the performance of several algorithms. This is because the recommendations by methodologists are based primarily on analyses of data sets with known structures, which is not the case in applied situations.

Internal Validity

Having addressed the multiple issues prompted by the theory formulation, the next stage of analysis requires attention to the following questions: (1) are there clusters in the data set, (2) if so, how many, and (3) can they be replicated? These questions represent the most difficult phase of cluster analysis. The reason internal validation is difficult is because there is no safety net such as an F ratio or effect size estimate to protect the investigator from drawing foolish conclusions. The importance of good internal validity procedures was illustrated by Golden and Meehl (1980). They designed a study to test cluster analysis methods by attempting to detect biological sex via MMPI responses known to provide excellent discrimination between females and males. Of the six clustering methods evaluated, only three were judged as producing accurate partitions of the data. These findings, of course, did not bode well for the method, leading the authors to call for the development of consistency (internal validation) tests.

Number of Clusters

After the publication of the Golden and Meehl (1980) study, Milligan and Cooper (1985) evaluated the effectiveness of 30 statistical stopping rules in recovering the correct number of clusters. Three of the six best performing rules are incorporated in SAS (1999) and are referred to in the SAS manual as the pseudo F statistic, the pseudo t² statistic, and the Cubic Clustering Criterion. These rules serve as guidance functions to determine if clusters exist and how many may be viable. The downside of these procedures is threefold. First, the evaluation was based on a simulated data set with a known structure (Milligan & Cooper, 1985). Second, the evaluation was based on clustering with Euclidean distance so performance with other similarity measures is unknown. Third, the tests are conservative; whereas significant tests provide confidence that clusters exist, nonsignificant tests do not rule out the presence of clusters (Duda & Hart, 1973; Hawkins, Muller, & ten Krooden, 1982; Sarle, 1983). Even though these stopping rules are not perfect, Duda and Hart (1973) suggested that a “suspicious test is better than none” (p. 244). Early subtyping work in learning disabilities did not incorporate these methods, but more recent applied work has taken advantage of the stopping rules (e.g., Speece & Cooper, 1990; Speece et al., 1999).

In conjunction with the stopping rules, a good practice is to analyze a data set with several algorithms to get a sense of how many clusters may be present. Ward’s method, complete linkage, and average linkage typically are used. The logic underlying this procedure is that a cluster structure should not be algorithm dependent even though the algorithms use different rules for joining entities (Anderberg, 1973; Johnson & Wichern, 1982; Lorr, 1983). In practice, convergence on the number of clusters at this point is rare, but the procedure is helpful in narrowing the possibilities. Graphing the data is an indispensable method for examining clusters. Mean profiles across the variables used for classification can be plotted to assist in interpreting, for example, differences between a four- and five-cluster solution. Another technique is to plot cluster members (or cluster centroids) by the canonical discriminant functions derived from the data set to examine separation. Figure 30.2 represents such a plot for a three-cluster solution from the Speece and Cooper (1990) study. Morris and colleagues (1998) provided an example using cluster centroids.

After some determination is made on candidate solutions (generally several solutions are carried forward to the replication phase), it is necessary to assess correct cluster membership. Hierarchical techniques do not reassign members as the analysis proceeds, so it is possible that some members may have a better fit with another cluster. To test this possibility, a discriminant function can be derived for each cluster and the sample members can be “forecasted” into each cluster based on the fit of the data with the discriminant function. This procedure provides an index called posterior probability of membership in each cluster. Reassignment is necessary when membership probability is higher for a cluster other than for the original assignment. This general approach, outlined by Field and Shoenfeldt (1975), also has provided useful descriptive information on cluster membership of normally achieving children (Speece et al., 1985) and longitudinal subtype stability (McKinney & Speece, 1986). Another method to determine correct membership uses the k-means iterative clustering procedure, which is a nonhierarchical technique and requires the specification of the number of subtypes. The k-means procedure uses the centroids from the hierarchical solution to form clusters and membership agreement can be assessed between the two solutions (e.g., Morris et al., 1998; Speece & Cooper, 1990).

Replication

Because of the uncertainty associated with the statistical stopping rules, replication of a cluster structure is a requirement. Two types of replication are apparent in the literature in LD: single-sample and split-sample techniques. Single-sample methods are not as powerful and are used when sample size is small. Typically, a subset of the original sample is reclustered using procedures invoked with the full sample and membership agreement between the subsample and full sample is assessed with the kappa or Rand statistic. Another technique is to add sub-

FIGURE 30.2. Two-dimensional representation of a three-cluster solution based on performance of 112 first-grade children. From Speece (1993). Copyright 1993 by Paul H. Brookes. Reprinted by permission.
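
Membership agreement of the kind described under Replication can be summarized with the Rand statistic, the proportion of pairs of children on which two partitions agree (placed together in both, or placed apart in both). The sketch below is a minimal, unadjusted version written for illustration; the function name and the cluster labels are hypothetical, and it assumes both label lists refer to the same children in the same order (for a subsample comparison, only the children present in both analyses).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Proportion of child pairs treated the same way by two cluster solutions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must label the same children")
    if len(labels_a) < 2:
        raise ValueError("at least two children are needed to form a pair")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        together_a = labels_a[i] == labels_a[j]
        together_b = labels_b[i] == labels_b[j]
        # A pair agrees when both solutions join it or both solutions split it.
        if together_a == together_b:
            agreements += 1
        pairs += 1
    return agreements / pairs


# Hypothetical cluster assignments for six children from two analyses.
full_sample = [1, 1, 1, 2, 2, 2]
subsample = [1, 1, 2, 2, 2, 2]   # one child moves to the other cluster

agreement = rand_index(full_sample, subsample)

# Cluster numbers are arbitrary, so a pure relabeling counts as full agreement.
relabeled = rand_index([1, 1, 2, 2], [2, 2, 1, 1])
```

The statistic depends only on which children are grouped together, not on the cluster numbers, which is why it suits comparisons between separate cluster analyses. This unadjusted form does not correct for chance agreement.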


jects (e.g., normally achieving children descriptive summaries of the four language
added to original sample of children with profiles and asking the children’s teachers to
LD) and recluster the larger sample. The ad- classify the children into one of the four
dition of “noise” to the data set should not clusters (high average, low average, high
disturb the original solution if clusters are narrative, low overall). The teachers reliably
stable (Morris et al., 1998; Speece, 1987). classified the children, but this agreement
with empirical classifications was modest.
A powerful method of replication is to The teachers were best able to identify chil-
use two samples (or a split sample), each dren in the high average cluster and, to a
subjected to cluster analysis procedures and lesser extent, the low overall cluster but had
results for each sample compared (McIntyre difficulty identifying children in the other
& Blashfield, 1980; Morey et al., 1983). two subtypes. These clinical validity data
Conceptually this entails clustering samples supported the descriptive validity results
A and B, assigning members from B to A that incorporated reading and spelling vari-
based on A’s classification functions and ables not used in the cluster analysis.
comparing the membership concordance be-
tween the actual and forecasted results for There are many examples of descriptive
sample B. Applied examples of this tech- validation efforts in the LD literature (e.g.,
nique can be found in the Morey and col- Feagans & Appelbaum, 1986; Lyon, Stew-
leagues (1983) and Speece and Cooper art, & Freedman, 1982; Morris et al., 1986,
(1990) papers. Breckenridge (1989) extend- 1998). The validation procedures used by
ed the method by recommending a double Speece and Cooper (1990) with the clusters
cross-validation procedure. In addition to depicted in Figure 30.1 are described in de-
the foregoing sequence, the results would tail to provide a sense of the method.
also be examined when the order is reversed
(i.e., B-A-B). The sample was composed of 112 first-
grade children, 63 deemed at risk for school
Although possibly obvious at this point, failure and 49 normally achieving children.
it is important to state what does not consti- The classification measures included read-
tute evidence of internal validity: assessing ing and math achievement, teacher ratings
cluster differences on the variables used for of work-related and interpersonal skills,
classification. The logic is intuitive—as clus- four levels of classroom discourse skills that
ter analysis forms groups of subjects based represented increasing cognitive complexity,
on similarity across the classification vari- and two measures of learning potential (re-
ables, cluster differences on these variables sponse to instruction) operationalized by
are a foregone conclusion. In the odd circum- performance on a dynamic assessment task.
stance that differences are not obtained, it is In this study, the external validation of the
safe to say the solution is not functional. subtypes was examined by (1) comparison
on a set of achievement variables not used
External Validity in clustering, (2) relative risk analysis, and
(3) observed classroom behavior. The analy-
Skinner (1981) proposed that external vali- sis of the achievement variables was con-
dation could be evaluated by three types of ducted for each pair of clusters (1–2, 3–4,
validity procedures: predictive (differential 5–6) and confirmed that the profiles associ-
response to treatment), descriptive (differen- ated with normal performance (clusters 2,
tiation of clusters across variables that are 3, 5) had significantly higher achievement
independent of the classification variables), than the atypical profiles. The relative risk
and clinical (membership agreement be- analysis assessed the extent to which mem-
tween clusters and clinical judgment). Typi- bership in a cluster was associated with ele-
cally, cluster analysis studies in LD focus on vated risk of teacher referral to a teacher as-
descriptive validation. However, Lyon sistance team, a preliminary step to referral
(1985) provided initial data on the validity for identification for special education ser-
of reading disability subtypes by examining vices. This analysis produces an odds ratio
response to instruction. Speece and col- for each cluster (reported for each cluster in
leagues (1999) examined the clinical validi- the caption for Figure 30.1) and is interpret-
ty of oral-language subtypes by developing ed as the increased risk, compared to cluster
3 (baseline cluster), of being referred. For

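The split-sample replication logic described above (cluster samples A and B separately, forecast B's membership from A's solution, then compare the forecast with B's own solution) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the studies cited: nearest-centroid assignment stands in for the classification functions, the Rand index stands in for whatever concordance statistic a study adopts, and the data, labels, and function names (centroids, forecast, rand_index) are hypothetical.

```python
from itertools import combinations
from math import dist


def centroids(points, labels):
    """Mean profile of each cluster in a sample (here, sample A)."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(sum(col) / len(members) for col in zip(*members))
            for label, members in groups.items()}


def forecast(points, cents):
    """Assign each case to the nearest centroid (Euclidean distance)."""
    return [min(cents, key=lambda label: dist(p, cents[label])) for p in points]


def rand_index(labels_1, labels_2):
    """Share of case pairs that two partitions treat alike (together or apart)."""
    pairs = list(combinations(range(len(labels_1)), 2))
    agree = sum((labels_1[i] == labels_1[j]) == (labels_2[i] == labels_2[j])
                for i, j in pairs)
    return agree / len(pairs)


# Hypothetical toy data: two classification variables per child.
# The labels stand in for each sample's own cluster solution.
sample_a = [(1.0, 1.2), (0.8, 1.0), (5.0, 5.1), (5.2, 4.9)]
labels_a = ["low", "low", "high", "high"]
sample_b = [(0.9, 1.1), (5.1, 5.0), (1.1, 0.9), (4.8, 5.2)]
labels_b = [0, 1, 0, 1]  # B's own solution; label names are arbitrary

# Forecast B's membership from A's centroids, then compare with B's own solution.
forecast_b = forecast(sample_b, centroids(sample_a, labels_a))
agreement = rand_index(labels_b, forecast_b)
```

Double cross-validation in the sense of Breckenridge (1989) would repeat the same steps with the roles of A and B reversed.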

example, the odds ratio (OR) of 257 associ- tion evidence for the identified clusters was
ated with the LD profile (cluster 1) means obtained using an extensive and multivari-
that the risk of referral is 257 times that of ate approach that included both individual
the profile for the normal cluster (cluster 3). differences and contextual variables.
With the exception of cluster 5, each was
significant. In addition, the risk associated Assessing the Contribution of
with the LD profile was significantly greater Empirical Subtyping to the Study
than the risk for the other four clusters. of Learning Disabilities
Similarly, the OR for cluster 5 was signifi-
cantly lower. When I was asked by the editors to write a
chapter on cluster analysis my first reaction
The children also were observed in their was, “Are you sure you want one? I don’t
classrooms and two composites of class- think anyone is doing it anymore.” I said
room behavior, academic responding and this despite my own recent work using the
inappropriate responding, were used to as- method and knowledge of the paper by
sess cluster differences. Here we were inter- Robin Morris and his colleagues (1998). To
ested in differences among the clusters with check my perception that use of cluster
normal variation (clusters 2, 3, and 5) and analysis in applied contexts had declined, I
among the atypical profiles. Of most inter- conducted some electronic searches using
est to the present discussion are contrasts the PsycINFO database. The search was di-
among the atypical profiles. The LD profile vided into decades (1970–1979; 1980–
(cluster 1) exhibited less academic and more 1989; and 1990–2001) and only journal ar-
inappropriate behavior than did the lan- ticles were requested. Using the terms “clus-
guage disability profile (cluster 6). This may ter analysis” and “reading disabilities,” 0,
be a key to the elevated risk of referral for 8, and 7 articles were uncovered by decade,
cluster 1 and coincides with the extremely respectively. Broadening the parameters to
low teacher ratings of work-related skills cluster analysis and LD produced somewhat
(WRS) and interpersonal skills (IPS) in addi- higher numbers: 0, 18, and 22 articles.
tion to low achievement evident in the pro- More inclusive terms, “cluster analysis” and
file of classification variables. The profiles “learning,” produced 9, 46, and 102 papers
associated with mild mental retardation by decade. It would seem then that the use
(cluster 4) and language disability (cluster 6) of cluster analysis, while infrequent in the
did not differ on academic responding, but reading and LD literature, has at least re-
cluster 4 was more inappropriate than clus- mained steady and increased in other educa-
ter 6. tional areas. My initial perception appeared
more related to the fact that my current in-
In a second study, Cooper and Speece terests in classification would be considered
(1990) examined a more complex rendering to be in the rational, as compared to empiri-
of observed classroom behavior and includ- cal, realm (e.g., Speece & Case, 2001).
ed a third cohort of children who were as-
signed to one of the six original clusters by What may account for the infrequent use
the forecasting method described previous- of cluster analysis in the LD literature?
ly. Eight composite ecological arrangements There are several possibilities. First, as is ev-
were developed and risk for special educa- ident in this chapter, cluster analysis using
tion placement was assessed for at-risk chil- hierarchical, agglomerative methods is a
dren in each cluster and each behavioral complicated process. Surprisingly, an ad-
composite. The LD profile was at signifi- vanced statistical text written by an expert
cantly greater risk of placement when chil- in cluster analysis devoted only part of a
dren received fewer opportunities to read in chapter to the method (Everitt, 1996). It ap-
a small group. Two nonsignificant trends of pears that graduate statistics courses in edu-
elevated risk for this cluster were associated cation and psychology may not provide de-
with higher frequencies of classroom tailed exposure to the method. Thus, the
arrangements that featured independent interested user needs to seek other means of
work. Independent of cluster membership, instruction. Of course, it is possible for any-
exposure to higher levels of independent one with some knowledge of a statistical
work time was a significant risk factor for
the sample. To summarize, external valida-

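The odds ratios used in the relative risk analysis can be illustrated with a small calculation. The sketch below assumes a 2 x 2 table of referred and non-referred children for one cluster and for the baseline cluster; the counts and the function names are hypothetical and are not taken from Speece and Cooper (1990). It also computes the risk ratio for comparison, because an odds ratio compares odds rather than probabilities and approximates a ratio of risks only when referral is rare.

```python
def odds_ratio(referred, not_referred, base_referred, base_not_referred):
    """Odds of referral in a cluster divided by the odds in the baseline cluster."""
    return (referred * base_not_referred) / (not_referred * base_referred)


def risk_ratio(referred, not_referred, base_referred, base_not_referred):
    """Probability of referral in a cluster divided by that in the baseline cluster."""
    cluster_n = referred + not_referred
    base_n = base_referred + base_not_referred
    return (referred * base_n) / (base_referred * cluster_n)


# Hypothetical counts: 8 of 10 children referred in the cluster of interest,
# 2 of 20 referred in the baseline cluster.
cluster_or = odds_ratio(8, 2, 2, 18)   # (8/2) / (2/18)
cluster_rr = risk_ratio(8, 2, 2, 18)   # (8/10) / (2/20)
```

With these counts the odds ratio is 36 while the risk ratio is 8, which shows how far the two can diverge when the outcome is common.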

software package to produce clusters, but potential correspondence with attention
the use of the default methods programmed deficit disorder (ADD). Examination of the
in statistical packages comes with a great profile of cluster 1 in Figure 30.1 shows
deal of risk to the integrity of the results. that, in addition to low achievement, chil-
dren in this cluster also received extremely
A second reason may be the sheer amount low teacher ratings on behavior (WRS, IPS).
of work required and the uncertainty expe- It may be that this cluster of children repre-
rienced particularly in the internal valida- sents an overlap with ADD, a possibility
tion stage of analysis. Morris and colleagues that cannot be evaluated with the data set
(1998) described the process as requiring but should be considered in future studies.
“painstaking scrutiny” (p. 370). Although Entertaining the possibility that clusters
users of cluster analysis techniques are no overlap may be one method of better repre-
more noble than other researchers, compe- senting the structure of the population of
tent use does require acceptance of the pos- children with learning disabilities. Gordon
sibility that no clusters exist in the data, that (1999) provided detail on these methods of
several solutions may be plausible, and that cluster analysis.
conceptual clarity is rarely achieved.
Summary and Conclusions
A third reason offered by Fletcher and
colleagues (1997) in their review of subtypes Cluster analysis represents one tool needed
of dyslexia is that the work has had little by researchers interested in the classification
impact on research in reading disabilities, a of children who experience learning prob-
conclusion that can be generalized to work lems. The methods are complex and it is
in LD. This conclusion is due to all that has necessary to seek converging evidence to
been discussed in this chapter but can be re- support the number of clusters thought to
duced to the importance of a theoretical be present in the data. Although there is no
framework guiding the study. The need for definitive statistical model to guide these
a sound, theoretically driven classification analyses, the methods outlined in this chap-
model of LD is acute, but, so far, we have ter and elaborated in the methodological lit-
not been up to the task. For example, it is erature cited can provide some confidence
now clear that discrepancies between intelli- in the results. The applied research in the
gence and achievement do not provide good field of LD, whether within or across do-
coverage of reading disabilities (e.g., Fletch- mains, has provided insights into the multi-
er et al., 1994; Speece & Case, 2001) but variate nature of LD.
the critical classification variables have yet
to be determined. Even though advances However, a frank appraisal of the extant
have been made in understanding early empirical subtyping research would lead to
reading failure, the bulk of this evidence is the conclusion that these studies have not
limited to word reading in young children had a great deal of influence on the field. In
with little attention to comprehension or addition to the need for theory, rather than
older readers. Part of the problem is our method, to drive the investigation, Skinner
newness as a field. Even though interest in and Blashfield (1982) identified several oth-
the study of LD has at least a 100-year his- er factors that may limit the impact of clus-
tory (Torgesen, 1998) progress is slow and ter analysis research. Two that are particu-
we have had to overcome our share of false larly relevant to research in LD are (1) that
starts (Lyon, 1987). the research base is represented primarily by
single studies rather than programmatic ef-
Another possible reason for lack of im- forts in classification and (2) lack of integra-
pact is that researchers use methods that tion with clinical practice.
produce mutually exclusive clusters when
overlapping cluster structure may be a bet- These issues have more to do with the
ter representation of the data. Lorr (1994) conceptualization of the research problem
noted that overlap is a feature of classifica- than with the method. Practitioners as well
tion in psychiatry, but this possibility has as researchers require convincing evidence
not been studied via cluster analysis in LD that subtypes have meaning beyond the nar-
even though clinical experience would sug- row boundaries drawn by users of cluster
gest this position is viable. Take, for exam-
ple, the issue of classroom behavior and the


analysis. There is little research that exam- Anderberg, M. R. (1973). Cluster analysis for appli-
ines the utility of a classification in terms of cations. New York: Academic Press.
instructional or classroom factors (see
Speece, 1993, for expansion of this idea). Blashfield, R. K., & Aldenderfer, M. S. (1988). The
Predictive (response to treatment) and clini- methods and problems of cluster analysis. In J. R.
cal validity efforts will need to be included Nesselroade & R. B. Cattell (Eds.), Handbook of
in a comprehensive framework to extend multivariate experimental psychology (2nd ed.,
the usefulness of results from cluster analy- pp. 447–473). New York: Plenum Press.
sis investigations.
Blashfield, R. K., & Draguns, J. G. (1976). Evalua-
Efforts in this direction could be acceler- tive criterion for psychiatric classification. Jour-
ated by sharing classification functions nal of Abnormal Psychology, 85, 140–150.
among investigators (Speece & Cooper,
1991). Because of the time required to com- Boder, E. (1973). Developmental dyslexia: A diag-
plete a cluster analysis study and because of nostic approach based on three atypical reading-
developmental processes, it is likely that the spelling patterns. Developmental Medicine and
participants from the original sample may Child Neurology, 15, 663–687.
present different profiles by the time the
cluster structure is validated descriptively. Breckenridge, J. N. (1989). Replicating cluster
For example, the longitudinal analysis of analysis: Method, consistency, and validity. Mul-
the behavioral subtypes identified by Speece tivariate Behavioral Research, 24, 147–161.
and colleagues (1985) indicated that chil-
dren moved into more maladaptive sub- Cooper, D. H., & Speece, D. L. (1990). Maintaining
types over time (McKinney & Speece, at-risk children in regular education settings: Ini-
1986). Thus, further intervention validation tial effects of individual differences and class-
efforts on the original sample may not be room environments. Exceptional Children, 57,
appropriate. However, another sample 117–126.
could be drawn, measured on the same
variables or constructs, and cluster mem- Cronbach, L. J., & Gleser, G. C. (1953). Assessing
bership determined by the forecasting meth- similarity between profiles. Psychological Bul-
ods described earlier in the chapter. This letin, 50, 456–473.
approach could extend validation efforts to
other investigators who may be more inter- Duda, R. O., & Hart, P. E. (1973). Pattern classifi-
ested in instructional validation than in the cation and scene analysis. New York: Wiley.
derivation of subtypes (Speece & Cooper,
1991). Everitt, B. S. (1996). Making sense of statistics in
psychology: A second-level course. New York:
The best work in classification of LD us- Oxford University Press.
ing empirical methods is ahead of us. Care-
ful examination of the strengths and limita- Everitt, B. S., & Dunn, G. (1983). Advanced meth-
tions of previous work will assist in the ods of data exploration and modelling. London:
development of coherent, relevant, and Heinemann.
meaningful classifications.
Feagans, L., & Appelbaum, M. I. (1986). Valida-
Acknowledgment tion of language subtypes of learning disabled
children. Journal of Educational Psychology, 78,
Portions of this chapter were based on Speece 358–364.
(1990, 1994–95).
Field, H. S., & Schoenfeldt, L. F. (1975). Ward and
References Hook revisited: A two-part procedure for over-
coming a deficiency in the grouping of persons.
Adams, K. M. (1985). Theoretical, methodological, Educational and Psychological Measurement, 35,
and statistical issues. In B. P. Rourke (Ed.), Neu- 171–173.
ropsychology of learning disabilities: Essentials
of subtype analysis (pp. 17–39). New York: Guil- Fletcher, J. M., Morris, R., Lyon, G. R., Stuebing,
ford Press. K. K., Shaywitz, S. E., Shankweiler, D. P., Katz,
L., & Shaywitz, B. A. (1997). Subtypes of
dyslexia: An old problem revisited. In B. Blach-
man (Ed.), Foundations of reading acquisition
and dyslexia (pp. 95–114). Mahwah, NJ: Erl-
baum.

Fletcher, J. M., Shaywitz, S. E., Shankweiler, D. P.,
Katz, L., Liberman, I. Y., Stuebing, K. K., Francis,
D. J., Fowler, A. E., & Shaywitz, B. A. (1994).
Cognitive profiles of reading disabilities: Com-
parisons of discrepancy and low achievement def-
initions. Journal of Educational Psychology, 86,
6–23.

Golden, R. R., & Meehl, P. E. (1980). Detection of
biological sex: An empirical test of cluster meth-
ods. Multivariate Behavioral Research, 15,
475–496.

Gordon, A. D. (1999). Classification (2nd ed.).
New York: Chapman & Hall/CRC.

Hawkins, D. M., Muller, M. W., & ten Krooden, J.
A. (1982). Cluster analysis. In D. M. Hawkins


(Ed.), Topics in applied multivariate analysis (pp. Milligan, G. W., & Cooper, M. C. (1987). Method-
303–356). Cambridge, UK: Cambridge Universi- ology review: Clustering methods. Applied Psy-
ty Press. chological Measurement, 11, 329–354.
Jacobellis v. Ohio, 378 U.S. 184, (1964), retrieved
from http://laws.findlaw.com/us/378/184.html on Morey, L. C., Blashfield, R. K., & Skinner, H. A.
10/15/2001. (1983). A comparison of cluster analysis tech-
Jain, A. K., & Dubes, R. C. (1988). Algorithms for niques within a sequential validation framework.
clustering data. Englewood Cliffs, NJ: Prentice Multivariate Behavioral Research, 18, 309–329.
Hall.
Johnson, R. A., & Wichern, D. W. (1982). Applied Morris, R., Blashfield, R. K., & Satz, P. (1986). De-
multivariate techniques. Englewood Cliffs, NJ: velopmental classification of learning disabled
Prentice Hall. children. Journal of Clinical and Experimental
Keogh, B. K., & MacMillan, D. L. (1983). The log- Neuropsychology, 8, 371–392.
ic of sample selection: Who represents what? Ex-
ceptional Education Quarterly, 4, 84–96. Morris, R. D., Stuebing, K. K., Fletcher, J. M.,
Lorr, M. (1983). Cluster analysis for social scien- Shaywitz, S. E., Lyon, G. R., Shankweiler, D. P.,
tists. San Francisco: Jossey-Bass. Katz, L., Francis, D. J., & Shaywitz, B. A. (1998).
Lorr, M. (1994). Cluster analysis: Aims, methods, Subtypes of reading disability: Variability around
and problems. In S. Strack & M. Lorr (Eds.), Dif- a phonological core. Journal of Educational Psy-
ferentiating normal and abnormal personality chology, 90, 347–373.
(pp. 179–195). New York: Springer.
Lyon, G. R. (1985). Educational validation studies Price, L. J. (1993). Identifying cluster overlap with
of learning disability subtypes. In B. P. Rourke NORMIX population membership probabilities.
(Ed.), Neuropsychology of learning disabilities: Multivariate Behavioral Research, 8, 235–262.
Essentials of subtype analysis (pp. 228–253).
New York: Guilford Press. Sarle, W. S. (1983). Cubic clustering criterion (SAS
Lyon, G. R. (1987). Learning disabilities research: Technical Report No. A-108). Cary, NC: SAS In-
False starts and broken promises. In S. Vaughn & stitute.
C. S. Bos (Eds.), Research in learning disabilities:
Issues and future directions (pp. 69–80). Boston: SAS Institute (1999). SAS/STAT User’s guide: Ver-
College-Hill Press. sion 8 Edition. Cary, NC: Author.
Lyon, G. R., Stewart, N., & Freedman, D. (1982).
Neuropsychological characteristics of empirically Satz, P., & Morris, R. (1981). Learning disability
derived subgroups of learning disabled readers. subtypes: A review. In F.J. Pirozzolo & M. C.
Journal of Clinical Neuropsychology, 4, 343– Wittrock (Eds.), Neuropsychological and cogni-
365. tive processes in reading (pp. 109–141). New
MacMillan, D. L. (1993). Development of opera- York: Academic Press.
tional definitions in mental retardation: Similari-
ties and differences with the field of learning dis- Scheibler, D., & Schneider, W. (1985). Monte Carlo
abilities. In G. R. Lyon, D. B. Gray, J. F. tests of the accuracy of cluster analysis algo-
Kavanaugh, & N. A. Krasnegor (Eds.), Better un- rithms: A comparison of hierarchical and non-
derstanding learning disabilities: New views hierarchical methods. Multivariate Behavioral
from research and their implications for educa- Research, 20, 283–304.
tion and public policies (pp. 117–152). Balti-
more: Brookes. Skinner, H. A. (1978). Differentiating the contribu-
MacMillan, D. L., & Speece, D. L. (1999). Utility tion of elevation, scatter, and shape in profile
of current diagnostic categories for research and similarity. Educational and Psychological Mea-
practice. In R. Gallimore, L. Bernheimer, D. L. surement, 38, 297–308.
MacMillan, D. L. Speece, & R. R. Vaughn (Eds.),
Developmental perspectives on children with Skinner, H. A. (1981). Toward the integration of
high incidence disabilities (pp. 111–133). Mah- classification theory and methods. Journal of Ab-
wah, NJ: Erlbaum. normal Psychology, 20, 68–87.
McIntyre, R. M., & Blashfield, R. K. (1980). A normal Psychology, 20, 68–87.
nearest centroid technique for evaluating the Skinner, H. A., & Blashfield, R. K. (1982). Increas-
minimum variance clustering procedure. Multi- ing the impact of cluster analysis research: The
variate Behavioral Research, 15, 225–238. case of psychiatric classification. Journal of Con-
McKinney, J. D., & Speece, D. L. (1986). Academic sulting and Clinical Psychology, 50, 727–735.
consequences and longitudinal stability of behav-
ioral subtypes of learning disabled children. Jour- Speece, D. L. (1987). Information processing sub-
nal of Educational Psychology, 78, 365–372. types of learning disabled readers. Learning Dis-
Milligan, G. W., & Cooper, M. C. (1985). An ex- abilities Research, 2, 91–102.
amination of procedures for determining the
number of clusters in a data set. Psychometrika, Speece, D. L. (1993). Broadening the scope of clas-
50, 159–179. sification research: Conceptual and ecological
perspectives. In G. R. Lyon, D. B. Gray, J. F. Ka-
vanaugh, & N. A. Krasnegor (Eds.), Better un-
derstanding learning disabilities: New views from
research and their implications for education and
public policy (pp. 57–72). Baltimore: Brookes.

Speece, D. L., & Case, L. P. (2001). Classification in
context: An alternative approach to identifying
early reading disability. Journal of Educational
Psychology, 93, 735–749.

Speece, D. L., & Cooper, D. H. (1990). Ontogeny
of school failure: Classification of first grade chil-
dren at risk. American Educational Research
Journal, 27, 119–140.


Speece, D. L., & Cooper, D. H. (1991). Retreat, re- subgroups in research on learning disabilities. In
group or advance? An agenda for empirical clas- J. P. Das, R. F. Mulcahy, & A. E. Wall (Eds.),
sification research in learning disabilities. In Theory and research in learning disabilities (pp.
L.V. Feagans, E.J. Short, & L. Meltzer (Eds.), 111–131). New York: Plenum Press.
Subtypes of learning disabilities: Theoretical per- Torgesen, J. K. (1998). Learning disabilities: An his-
spectives and research (pp. 33–52). Hillsdale, NJ: torical and conceptual overview. In B. Y. L. Wong
Erlbaum. (Ed.), Learning about learning disabilities (pp.
3–34). San Diego, CA: Academic Press.
Speece, D. L., McKinney, J. D., & Appelbaum, M. Ward, J. H. (1963). Hierarchical grouping to opti-
I. (1985). Classification and validation of behav- mize an objective function. Journal of the Ameri-
ioral subtypes of learning-disabled children. Jour- can Statistical Association, 58, 236–244.
nal of Educational Psychology, 77, 67–77. Wolf, M., & Bowers, P. G. (1999). The double-
deficit hypothesis for the developmental dyslexi-
Speece, D. L., Roth, F. P., Cooper, D. H., & De La as. Journal of Educational Psychology, 91,
Paz, S. (1999). The relevance of oral language 415–438.
skills in early literacy: A multivariate analysis.
Applied Psycholinguistics, 20, 167–190.

Torgesen, J. K. (1982). The use of rationally defined

31

Neurobiological Indices of Dyslexia

Sally E. Shaywitz
Bennett A. Shaywitz

Dyslexia is characterized by an unexpected Stevenson, Gilger, & Pennington, 1992).
difficulty in reading in children and adults Longitudinal studies, both prospective
who otherwise possess the intelligence, mo- (Francis, Shaywitz, Stuebing, Shaywitz, &
tivation, and schooling considered necessary Fletcher, 1996; B. A. Shaywitz, Holford, et
for accurate and fluent reading (S. E. Shay- al., 1995) and retrospective (Bruck, 1992;
witz, 1998, 2003). Recent epidemiological Felton, Naylor, & Wood, 1990; Scarbor-
data indicate that, like hypertension and ough, 1984), indicate that dyslexia is a per-
obesity, dyslexia fits a dimensional model. sistent, chronic condition; it does not rep-
In other words, within the population, read- resent a transient “developmental lag”
ing ability and reading disability occur along (Figure 31.1). Over time, poor readers and
a continuum, with reading disability repre- good readers tend to maintain their relative
senting the lower tail of a normal distribu- positions along the spectrum of reading
tion of reading ability (Gilger, Borecki, ability (B. A. Shaywitz, Holford, et al.,
Smith, DeFries, & Pennington, 1996; S. 1995).
Shaywitz, Escobar, Shaywitz, Fletcher, &
Makuch, 1992). Dyslexia is both familial and heritable
(Pennington & Gilger, 1996). Family histo-
Dyslexia is perhaps the most common ry is one of the most important risk factors,
neurobehavioral disorder affecting children, with 23% to as many as 65% of children
with prevalence rates ranging from 5–10% who have a parent with dyslexia reported to
in clinic- and school-identified samples to have the disorder (Scarborough, 1990). A
17.5% in unselected population-based sam- rate among siblings of affected persons of
ples (S. E. Shaywitz, 1998). Previously, it approximately 40% and among parents
was believed that dyslexia affected boys pri- ranging from 27 to 49% (Pennington &
marily (Finucci & Childs, 1981); however, Gilger, 1996) provides opportunities for
more recent data indicate similar numbers early identification of affected siblings and
of affected boys and girls (Flynn & Rahbar, often for delayed but helpful identification
1994; S. Shaywitz, Shaywitz, Fletcher, & of affected adults. Replicated linkage stud-
Escobar, 1990; Wadsworth, DeFries, ies implicate loci on chromosomes 2, 3, 6,


FIGURE 31.1. Trajectory of reading skills over time in readers who are nonimpaired and those who
are dyslexic. Ordinate is Rasch scores (W scores) from the Woodcock–Johnson reading test (Woodcock
& Johnson, 1989) and abscissa is age in years. Both readers who are dyslexic and those who are non-
impaired improve their reading scores as they get older, but the gap between the readers who are
dyslexic and those who are nonimpaired remains. Thus dyslexia is a deficit and not a developmental
lag. Data from Francis et al. (1996). Copyright 2002 by Sally Shaywitz.

15, and 18 (Fisher & DeFries, 2002) for the words, in fact, can be decomposed into
transmission of phonological awareness phonological segments. Thus, it is this
deficits and subsequent reading problems. awareness that allows the reader to connect
Whether the differences in the genetic loci the letter strings (the orthography) to the
represent polygenic inheritance, different corresponding units of speech (phonological
cognitive paths to the same phenotype, or constituents) they represent. The awareness
different types of dyslexia is not clear. that all words can be decomposed into these
basic elements of language (phonemes) al-
Review of Theoretical Models lows the reader to decipher the reading
code.
Overwhelming converging evidence from a
number of lines of investigation indicates To read, a child has to develop the insight
that the central difficulty in dyslexia reflects that spoken words can be pulled apart into
a deficit within the language system. Investi- phonemes and that the letters in a written
gators have long known that speech enables word represent these sounds. As numerous
its users to create an indefinitely large num- studies have shown, however, such aware-
ber of words by combining and permuting a ness is largely missing in children and adults
small number of phonological segments, the with dyslexia (Brady & Shankweiler, 1991;
consonants and vowels that serve as the nat- Bruck, 1992; Fletcher et al., 1994; Liberman
ural constituents of the biological special- & Shankweiler, 1991; Rieben & Perfetti,
ization for language. An alphabetic tran- 1991; Shankweiler et al., 1995; Shankweiler,
scription (reading) brings this same ability Liberman, Mark, Fowler, & Fischer, 1979;
to readers, but only as they connect its arbi- Share, 1995; S. E. Shaywitz, 1996, 1998;
trary characters (letters) to the phonological Stanovich & Siegel, 1994; Torgesen, 1995;
segments they represent. Making that con- Wagner & Torgesen, 1987). Results from
nection requires an awareness that all large and well-studied populations with
reading disability confirm that in young
school-age children (Fletcher et al., 1994;


Stanovich & Siegel, 1994) as well as in ado- great difficulty in reading (S. E. Shaywitz,
lescents (S. E. Shaywitz et al., 1999) a deficit 1996).
in phonology represents the most robust and
specific (Morris et al., 1998) correlate of According to the model, a circumscribed
reading disability. Such findings form the ba- deficit in a phonological function blocks ac-
sis for the most successful and evidence- cess to higher-order processes and to the
based interventions designed to improve ability to draw meaning from text. The
reading (Report of the National Reading problem is that the affected reader cannot
Panel, 2000). While children and adults with use his or her higher-order linguistic skills to
a phonological deficit represent the majority access the meaning until the printed word
of cases of dyslexia, we note that other sub- has first been decoded and identified. For
types may, indeed, account for some cases of example, an individual who knows the pre-
dyslexia, for example, surface dyslexia cise meaning of the spoken word “appari-
(Coltheart, Curtis, Atkins, & Haller, 1993; tion” will not be able to use his knowledge
Coltheart, Masterson, Byng, Prior, & Rid- of the meaning of the word until he can de-
doch, 1983; Coltheart, Rastle, Perry, Lang- code and identify the printed word on the
don, & Ziegler, 2001), D- and L-type dyslex- page and will appear not to know the
ia (Bakker, 1992; Bakker, Licht, & van word’s meaning.
Strien, 1991), and dyslexia resulting from
deficits in naming speed in addition to The Phonological Deficit in Adolescence
phonological deficits (double-deficit hypoth- and Adult Life
esis [Wolf, 1991; Wolf, Bally, & Morris,
1986; Wolf & Bowers, 1999]). Other theo- Deficits in phonological coding continue to
ries of dyslexia have been proposed that are characterize readers with dyslexia even in
based on the visual system (Stein & Walsh, adolescence; performance on phonological
1997) and other factors, such as temporal processing measures contributes most to
processing of stimuli within these systems discriminating average readers and
(Talcott et al., 2000; Tallal, 2000). However, those with dyslexia, and average and superi-
these theories have generally not received or readers as well (S. E. Shaywitz et al.,
confirmatory support. 1999). Children with dyslexia neither spon-
taneously remit nor demonstrate a lag
Implications of the Phonological mechanism for “catching up” in the devel-
Model of Dyslexia opment of reading skills. Yet many readers
with dyslexia do become quite proficient in
Basically, reading comprises two main reading a finite domain of words that are in
processes—decoding and comprehension their area of special interest, usually words
(Gough & Tunmer, 1986). In dyslexia, a that are important for their careers—for ex-
deficit at the level of the phonological mod- ample, an individual who is dyslexic in
ule impairs the ability to segment the writ- childhood but, in adult life, becomes inter-
ten word into its underlying phonological ested in molecular biology and then learns
elements. As a result, the reader experiences to decode words that form a minivocabu-
difficulty, first in decoding the word and lary important in molecular biology. Such
then in identifying it. The phonological an individual, however, while able to de-
deficit is domain specific; that is, it is inde- code words in this domain, still exhibits evi-
pendent of other, nonphonological, abilities. dence of his early reading problems when he
In particular, the higher-order cognitive and has to read unfamiliar words, which he then
linguistic functions involved in comprehen- does accurately but not fluently and auto-
sion, such as general intelligence and rea- matically (Ben-Dror, Pollatsek, & Scarpati,
soning, vocabulary (Share & Stanovich, 1991; Bruck, 1985, 1990, 1992, 1998;
1995), and syntax (Shankweiler et al., Lefly & Pennington, 1991; S. E. Shaywitz et
1995), are generally intact. This pattern—a al., 1999). In adolescents, the rate of read-
deficit in phonological analysis contrasted ing as well as facility with spelling may be
with intact higher-order cognitive abilities— most useful clinically in differentiating aver-
offers an explanation for the paradox of age from poor readers. From a clinical per-
otherwise intelligent people who experience spective, these data indicate that as children
approach adolescence, a manifestation of

dyslexia may be a slow reading rate; in fact, children may learn to read words accurately, but they will not be fluent or automatic, reflecting the lingering effects of a phonological deficit (Lefly & Pennington, 1991). Because they are able to read words accurately (albeit very slowly), dyslexic adolescents and young adults may mistakenly be assumed to have “outgrown” their dyslexia. Data from studies of children with dyslexia who have been followed prospectively support the notion that in adolescents, the rate of reading as well as facility with spelling may be most useful clinically in differentiating average from poor readers in secondary school, college, and even graduate school. It is important to remember that these older students with dyslexia may be similar to their unimpaired peers on untimed measures of word recognition yet continue to suffer from the phonological deficit that makes reading less automatic, more effortful, and slow.

Review of the Research Literature on the Neurobiology of Reading

Anatomic Evidence

To a large degree these advances in understanding dyslexia have informed and facilitated studies examining the neurobiological underpinnings of reading and dyslexia. Historically, as early as 1891, the French neurologist Dejerine suggested that a portion of the left posterior brain region is critical for reading. Beginning with Dejerine, a large literature on acquired inability to read (alexia) describes neuroanatomic lesions most prominently centered in the parietotemporal area (including the angular gyrus, supramarginal gyrus, and posterior portions of the superior temporal gyrus) as a region pivotal in mapping the visual percept of the print onto the phonologic structures of the language system (Damasio & Damasio, 1983; Friedman, Ween, & Albert, 1993; Geschwind, 1965). Another posterior brain region, in the occipitotemporal area, was also described by Dejerine (1892) as critical in reading. More recently, a range of neurobiological investigations using postmortem brain specimens (Galaburda, Sherman, Rosen, Aboitiz, & Geschwind, 1985), brain morphometry (Filipek, 1996), and diffusion tensor MRI (magnetic resonance imaging) (Klingberg et al., 2000) supports the belief that there are differences in the temporo–parieto–occipital brain regions between dyslexic and nonimpaired readers.

Functional Brain Imaging

Rather than being limited to examining the brain in an autopsy specimen, or measuring the size of brain regions using static morphometric indices, functional imaging offers the possibility of examining brain function during performance of a cognitive task. In principle, functional brain imaging is quite simple. When an individual is asked to perform a discrete cognitive task, that task places processing demands on particular neural systems in the brain. Meeting those demands requires activation of neural systems in specific brain regions, and those changes in neural activity are, in turn, reflected by changes in brain metabolic activity, which, in turn, are reflected, for example, by changes in cerebral blood flow and in the cerebral utilization of metabolic substrates such as glucose. Some of the first functional imaging studies of dyslexia used positron emission tomography (PET; e.g., Gross-Glenn et al., 1991; Hagman et al., 1992). In practice, PET requires intraarterial or intravenous administration of a radioactive isotope to the subject so that cerebral blood flow or cerebral utilization of glucose can be determined while the subject is performing the task. Positron-emitting isotopes of nuclei of biological interest have short biological half-lives and are synthesized in a cyclotron immediately prior to testing, a factor that mandates that the time course of the experiment conform to the short half-life of the radioisotope.

Functional magnetic resonance imaging (fMRI) promises to supplant other methods for its ability to map the individual brain’s response to specific cognitive stimuli. Because it is noninvasive and safe, it can be used repeatedly, properties that make it ideal for studying humans, especially children. In principle, the signal used to construct MRI images changes by a small amount (typically on the order of 1–5%) in regions that are activated by a stimulus or task. The
increase in signal results from the combined effects of increases in tissue blood flow, volume, and oxygenation, though the precise contribution of each of these is still somewhat uncertain. MRI intensity increases when deoxygenated blood is replaced by oxygenated blood. A variety of methods can be used to record the changes that occur, but one preferred approach makes use of ultrafast imaging, such as echo planar imaging (EPI), in which complete images are acquired in times substantially shorter than a second. EPI can provide images at a rate fast enough to capture the time course of the hemodynamic response to neural activation and to permit a wide variety of imaging paradigms over large volumes of the brain. Details of fMRI are reviewed by Anderson and Gore (1997). Magnetic source imaging using magnetoencephalography (MEG) has emerged as a complementary functional imaging modality. It is useful in resolving the temporal sequences of cognitive processes, though its spatial resolution is much less precise than that of PET or fMRI.

Converging evidence using functional brain imaging in dyslexic readers also shows a failure of left-hemisphere posterior brain systems to function properly during reading (Brunswick, McCrory, Price, Frith, & Frith, 1999; Helenius, Tarkiainen, Cornelissen, Hansen, & Salmelin, 1999; Horwitz, Rumsey, & Donohue, 1998; Paulesu et al., 2001; Rumsey et al., 1992, 1997; Salmelin, Service, Kiesila, Uutela, & Salonen, 1996; B. A. Shaywitz et al., 2002; S. E. Shaywitz et al., 1998; Simos, Breier, Fletcher, Bergman, & Papanicolaou, 2000; Temple et al., 2001), as well as during nonreading visual processing tasks (Demb, Boynton, & Heeger, 1998; Eden et al., 1996). In addition, some functional brain imaging studies show differences in brain activation in frontal regions in dyslexic readers compared to readers who are nonimpaired. In some studies readers with dyslexia are more active in frontal regions (Brunswick et al., 1999; Rumsey et al., 1997; S. E. Shaywitz et al., 1998), and in others readers who are nonimpaired are more active in frontal regions (Corina et al., 2001; Georgiewa et al., 1999; Gross-Glenn et al., 1991; Paulesu et al., 1996).

The Research Program at the Yale Center for the Study of Learning and Attention

How Functional Brain Imaging Has Informed Research on Dyslexia

FUNCTIONAL MRI AND PHONOLOGICAL PROCESSING

Our research program has used fMRI to examine the functional organization of the brain for reading and reading disability. Initial studies focused on the identification of those cortical sites associated with various subcomponent operations in reading in readers who are nonimpaired. We next examined how the brain activation patterns of adults with dyslexia differed from those of adult readers who are nonimpaired. Most recently we have studied children with dyslexia and compared their brain imaging patterns with those of children who are nonimpaired readers. Before describing some of these results in more detail, we first review the rationale for the tasks we have used and the strategy employed to analyze the results of these measures.

THEORETICAL ISSUES IN TASK DESIGN

Most functional imaging studies, whether PET or fMRI, use a subtraction methodology in attempting to isolate brain/cognitive function relations (Friston, Frith, Liddle, & Frackowiak, 1993; Petersen & Fiez, 1993; Sergent, 1994). Reading can be considered to involve three component processes: orthographic, phonological, and lexical–semantic processing. In designing tasks, it is important that the decision and response components of both the experimental and the baseline tasks be comparable. In many of our studies we used five tasks: line orientation judgment, letter case judgment, single letter rhyme, nonword rhyme, and category judgment. The five tasks are ordered hierarchically. At the lowest level, the line orientation (L) judgment task (e.g., Do [\\\/] and [\\\/] match?) taps visual–spatial processing but makes no orthographic demands. Next, the letter case (C) judgment task (e.g., Do [bbBb] and [bbBb] match in the pattern of upper and lower case letters?) adds an orthographic processing demand but makes no phonological demands, because the stimulus items consist entirely of consonant strings and are, therefore, phonotactically impermissible. The third task, single letter rhyme (SLR) (e.g., Do the letters [T] and [V] rhyme?), while orthographically simpler than C, adds a phonological processing demand requiring the transcoding of the letters (orthography) into phonological structures, and then sufficient phonological analysis of those structures to determine that they do or do not rhyme. The fourth task, nonword rhyme (NWR) (e.g., Do [leat] and [jete] rhyme?), requires analysis of more complex structures. The fifth task, semantic category (SC) judgment (e.g., Are [corn] and [rice] in the same category?), also makes substantial demands on transcoding from print to phonology (Lukatela & Turvey, 1994; Van Orden, Pennington, & Stone, 1990) but requires in addition that the printed stimulus items activate particular word representations in the reader’s lexicon to arrive at the word’s meaning. In a typical set of reading tasks, the subject views two simultaneously presented stimulus displays, one above the other, and is asked to make a same/different judgment by pressing a response button if the displays are matched on a given cognitive dimension.

SEX DIFFERENCES

Our initial series of investigations focused on the identification of those cortical sites associated with various subcomponent operations in readers who are nonimpaired. Accordingly, we examined normal readers, 19 neurologically normal right-handed men and 19 women (B. A. Shaywitz, Shaywitz, et al., 1995). Of particular interest were differences in brain activation patterns during phonological processing in men compared to women. Figure 31.2 illustrates these differences: activation during phonological processing in men was more lateralized to the left inferior frontal gyrus (IFG); in contrast, activation during this same task in women resulted in a more bilateral pattern of activation of this region.

These findings provide the first clear evidence of sex differences in the functional


FIGURE 31.2. Sex differences in the brain during phonological processing. Composite fMRI images showing the distribution of brain activation patterns in men (left) and women (right) during the nonword rhyming task. In men, activation is lateralized to the left inferior frontal regions, but in women the same region is active bilaterally. Data from B. A. Shaywitz, Shaywitz, et al. (1995). Copyright 2002 by Sally Shaywitz.
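
The subtraction logic described under task design can be made concrete with a small sketch. The Python below is illustrative only and is not the analysis used in the studies cited: it assumes that each task is contrasted with the task one level below it in the hierarchy (L, C, SLR, NWR, SC), that activation in a single region is summarized as a mean signal value, and that the contrast is expressed as a percent signal change. The function names and the signal values are hypothetical, chosen so that the changes fall roughly in the small range typical of fMRI.

```python
from statistics import mean

# Task hierarchy from the text, ordered from lowest to highest level.
TASKS = ["L", "C", "SLR", "NWR", "SC"]

# Processing demand that each step up the hierarchy is meant to add,
# paraphrased from the task descriptions. Keys are (higher, lower).
ADDED_DEMAND = {
    ("C", "L"): "orthographic processing",
    ("SLR", "C"): "phonological processing of single letters",
    ("NWR", "SLR"): "phonological analysis of more complex structures",
    ("SC", "NWR"): "lexical-semantic processing",
}


def percent_signal_change(task_signal, baseline_signal):
    """Percent change of the mean task signal relative to the mean baseline."""
    base = mean(baseline_signal)
    return 100.0 * (mean(task_signal) - base) / base


def subtraction_contrasts(signals):
    """Contrast each task with the one just below it (an assumed pairing).

    `signals` maps a task label to a list of signal samples for one region.
    Returns a dict keyed by (higher_task, lower_task).
    """
    return {
        (higher, lower): percent_signal_change(signals[higher], signals[lower])
        for lower, higher in zip(TASKS, TASKS[1:])
    }


# Hypothetical signal samples for one region (arbitrary units, not real data).
example_signals = {
    "L": [1000.0, 1002.0, 998.0],
    "C": [1005.0, 1003.0, 1007.0],
    "SLR": [1020.0, 1018.0, 1022.0],
    "NWR": [1030.0, 1032.0, 1028.0],
    "SC": [1031.0, 1029.0, 1033.0],
}

contrasts = subtraction_contrasts(example_signals)
for pair, change in contrasts.items():
    print(f"{pair[0]} - {pair[1]}: {change:.2f}% ({ADDED_DEMAND[pair]})")
```

Because the decision and response components are meant to be comparable across tasks, whatever survives each subtraction is attributed to the one processing demand that the higher task adds. Other pairings (for example, contrasting every task with C) follow the same pattern by changing which baseline is passed to `percent_signal_change`.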

