A COMPARISON OF THREE TESTS OF MEDIATION
by
Rosalia E. Warbasse
A dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
(Counseling Psychology)
at the
UNIVERSITY OF WISCONSIN-MADISON
2009
© Copyright by Rosalia E. Warbasse 2009
All Rights Reserved
A dissertation entitled

A COMPARISON OF THREE TESTS OF MEDIATION

submitted to the Graduate School of the
University of Wisconsin-Madison
in partial fulfillment of the requirements for the
degree of Doctor of Philosophy

by

ROSALIA E. WARBASSE

Date of Final Oral Examination: July 31, 2009
Month & Year Degree to be awarded: December / May / August 2009

Approval Signatures of Dissertation Committee

Signature, Dean of Graduate School
A COMPARISON OF THREE TESTS OF MEDIATION
Rosalia E. Warbasse
Under the supervision of Professor William T. Hoyt
At the University of Wisconsin-Madison
A simulation study was conducted to evaluate the performance of three tests of
mediation: the bias-corrected and accelerated bootstrap (Efron & Tibshirani, 1993), the
asymmetric confidence limits test (MacKinnon, 2008), and a multiple regression approach
described by Kenny, Kashy, and Bolger (1998). The evolution of these methods is reviewed
and current recommendations for assessing mediation are discussed (Frazier, Tix, & Barron,
2004; Preacher & Hayes, 2004; Mallinckrodt, Abraham, Wei, & Russell, 2006). The three
tests of mediation were evaluated according to three criteria that should be of interest to
researchers as they choose among methods for testing mediation hypotheses: the statistical
standard, the conceptual standard, and the pragmatic standard. The statistical properties of the
three tests of mediation were evaluated by assessing the Type I error rates and power of the
tests. The conceptual standard was assessed by evaluating which method best captures the
conceptual meaning of mediation. The pragmatic standard was assessed by evaluating which
method was the minimally sufficient analysis to test the most basic, three-variable mediation
model.
I argue that for the most basic mediation model, the multiple regression approach
described by Kenny et al. (1998) provides the best balance of Type I error and power among
the three methods investigated; is a conceptually sound test of mediation in that it tests the
significance of each link in the hypothesized causal chain; and is the minimally sufficient
analysis for assessing mediation.
Acknowledgments
Thank you to the professors in the Department of Counseling Psychology at UW-
Madison. I am grateful for how the years in the department have shaped and formed me.
Thank you to Bill Hoyt, my advisor. Your dedication to excellence in teaching and
writing is inspirational. This degree and dissertation were completed in large part because of
your support as an advisor.
Thank you to my husband, Eric, for his continuous support and encouragement. I am
blessed to be with a spouse who sees and values the talents which I have been given.
Table of Contents

Chapter One
    Figure 1.1 Path Diagram for the Basic Mediation Model
Chapter Two
    Figure 2.1 The Basic Causal Chain Involved in Mediation
    Table 2.1 Methods Under the Multiple-Test Framework for Assessing Mediation
    Table 2.2 Methods Under the Single-Test Framework for Assessing Mediation
Chapter Three
    Figure 3.1 Pairwise Comparisons When No Indirect Effect Exists (ab=0)
    Figure 3.2 Pairwise Comparisons When an Indirect Effect Exists (ab>0)
    Table 3.1 Sample Table: Concordance Between Two Methods When No Indirect Effect Exists (ab=0)
Chapter Four
    Table 4.1 Type I Error Rates for BCa, ACL, and BT
    Figure 4.1 Type I Error Rates (a=0, b=0)
    Figure 4.2 Type I Error Rates (a=0, b=.14)
    Figure 4.3 Type I Error Rates (a=0, b=.39)
    Figure 4.4 Type I Error Rates (a=0, b=.59)
    Figure 4.5 Type I Error Rates (a=.14, b=0)
    Figure 4.6 Type I Error Rates (a=.39, b=0)
    Figure 4.7 Type I Error Rates (a=.59, b=0)
    Table 4.2 Power to Detect an Indirect Effect for BCa, ACL, and BT
    Figure 4.8 Power (a=.14, b=.14)
    Figure 4.9 Power (a=.14, b=.39)
    Figure 4.10 Power (a=.14, b=.59)
    Figure 4.11 Power (a=.39, b=.14)
    Figure 4.12 Power (a=.39, b=.39)
    Figure 4.13 Power (a=.39, b=.59)
    Figure 4.14 Power (a=.59, b=.14)
    Figure 4.15 Power (a=.59, b=.39)
    Figure 4.16 Power (a=.59, b=.59)
    Table 4.3 Possible Outcomes of Pairwise Comparisons
    Figure 4.17 Simplified Pairwise Comparisons When No Indirect Effect Exists (ab=0)
    Figure 4.18 Simplified Pairwise Comparisons When an Indirect Effect Exists (ab>0)
    Table 4.4 Concordance Between ACL and BT When No Indirect Effect Exists (ab=0)
    Table 4.5 Concordance Between BCa and BT When No Indirect Effect Exists (ab=0)
    Table 4.6 Concordance Between BCa and ACL When No Indirect Effect Exists (ab=0)
    Table 4.7 K for Condition 10 of ACL-BT Comparisons
    Table 4.8 Concordance Between ACL and BT When an Indirect Effect Exists (ab>0)
    Table 4.9 Concordance Between BCa and BT When an Indirect Effect Exists (ab>0)
    Table 4.10 Concordance Between BCa and ACL When an Indirect Effect Exists (ab>0)
    Table 4.11 Summary Statistics for % Agreement
    Figure 4.19 ACL-BT Disagreement: No Indirect Effect Exists (ab=0), N=50
    Figure 4.20 ACL-BT Disagreement: No Indirect Effect Exists (ab=0), N=100
    Figure 4.21 BCa-BT Disagreement: No Indirect Effect Exists (ab=0), N=50
    Figure 4.22 BCa-BT Disagreement: No Indirect Effect Exists (ab=0), N=100
    Figure 4.23 BCa-ACL Disagreement: No Indirect Effect Exists (ab=0), N=50
    Figure 4.24 BCa-ACL Disagreement: No Indirect Effect Exists (ab=0), N=100
    Figure 4.25 ACL-BT Disagreement: An Indirect Effect Exists (ab>0), N=50
    Figure 4.26 ACL-BT Disagreement: An Indirect Effect Exists (ab>0), N=100
    Figure 4.27 BCa-BT Disagreement: An Indirect Effect Exists (ab>0), N=50
    Figure 4.28 BCa-BT Disagreement: An Indirect Effect Exists (ab>0), N=100
    Figure 4.29 BCa-ACL Disagreement: An Indirect Effect Exists (ab>0), N=50
    Figure 4.30 BCa-ACL Disagreement: An Indirect Effect Exists (ab>0), N=100
Chapter Five
References
Appendix
    R syntax
    Table A.1 Summary of Simulation Conditions: No Indirect Effect Exists
    Table A.2 Summary of Simulation Conditions: An Indirect Effect Exists
    Table A.3 Optimal a and b for the Indirect Effect (ab)
    Table A.4 ACL-BT Comparisons: No Indirect Effect Exists (ab=0)
    Table A.5 BCa-BT Comparisons: No Indirect Effect Exists (ab=0)
    Table A.6 ACL-BCa Comparisons: No Indirect Effect Exists (ab=0)
    Table A.7 ACL-BT Comparisons: An Indirect Effect Exists (ab>0)
    Table A.8 BCa-BT Comparisons: An Indirect Effect Exists (ab>0)
    Table A.9 ACL-BCa Comparisons: An Indirect Effect Exists (ab>0)
    Figure A.1 ACL-BT Disagreement: No Indirect Effect Exists (ab=0), N=200
    Figure A.2 ACL-BT Disagreement: No Indirect Effect Exists (ab=0), N=500
    Figure A.3 BCa-BT Disagreement: No Indirect Effect Exists (ab=0), N=200
    Figure A.4 BCa-BT Disagreement: No Indirect Effect Exists (ab=0), N=500
    Figure A.5 BCa-ACL Disagreement: No Indirect Effect Exists (ab=0), N=200
    Figure A.6 BCa-ACL Disagreement: No Indirect Effect Exists (ab=0), N=500
    Figure A.7 ACL-BT Disagreement: An Indirect Effect Exists (ab>0), N=200
    Figure A.8 ACL-BT Disagreement: An Indirect Effect Exists (ab>0), N=500
    Figure A.9 BCa-BT Disagreement: An Indirect Effect Exists (ab>0), N=200
    Figure A.10 BCa-BT Disagreement: An Indirect Effect Exists (ab>0), N=500
    Figure A.11 BCa-ACL Disagreement: An Indirect Effect Exists (ab>0), N=200
    Figure A.12 BCa-ACL Disagreement: An Indirect Effect Exists (ab>0), N=500
Chapter One
Mediation analysis is used when researchers wish to study the mechanism(s) through
which one variable affects another variable. One of the reasons why mediation analysis is
important is that it can aid in theory building and specification (Judd & Kenny, 1981). For
example, in social psychology, mediator variables can describe the internal psychological
processes by which external events influence behavior (Kenny, Kashy, & Bolger, 1998).
Mediation analysis can also be used in program evaluation and outcome research to
understand the mechanisms by which treatment effects are generated (Judd & Kenny, 1981).
One example of a mediation model in counseling psychology posits that the relationship
between counseling condition (i.e., treatment or no treatment) and well-being is mediated by
social support. Under this model, one possible mediation hypothesis is that counseling
influences well-being by increasing social support (Frazier, Tix, & Barron, 2004). Once the
relationships among variables are understood, effort can be directed toward the variables that
have the most impact on the outcome of interest.
After a relationship between variables is identified, researchers often want to explore
how or why particular predictor (independent) variables influence certain criterion
(dependent) variables (Baron & Kenny, 1986; Kenny, Kashy, & Bolger, 1998). A mediation
model is a causal model that specifies a hypothesized causal sequence of relationships among
variables. To understand the process of mediation, it is helpful to diagram the relationships
between the variables.
Figure 1.1 Path Diagram for the Basic Mediation Model
Figure 1.1 represents the relationships between variables in the most basic mediation
model. The strength of the relationship between the two hypothetical variables, X and Y, is
represented by path c. The predictor variable, Variable X, is thought to influence the
criterion variable, Variable Y, through a third variable, a mediator variable, Variable M. The
effect of X on M is represented by path a, the effect of M on Y is represented by path b, and
the effect of X on Y controlling for M is represented by path c'. As can be seen from the
diagrams above, the total effect of X on Y (path c) can be partitioned into a direct effect (path
c') and an indirect effect (the product of paths a and b) so that c = ab+ c'. The indirect effect
is also referred to as the mediated effect by some (e.g. MacKinnon et al., 2002) because it
represents the portion of the effect of X on Y that is mediated by another variable.
Mathematically, the indirect effect (or the product of paths a and b) is equal to the difference
between the total effect of X on Y (path c) and the direct effect of X on Y (path c') (ab = c -
C).
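To make the decomposition concrete, the following R sketch (R is the language used for the simulation syntax in the Appendix) simulates data from a basic mediation model and verifies numerically that the product of coefficients equals the difference in coefficients. The parameter values, sample size, and variable names are arbitrary illustrations and are not the conditions used in this study.

# Minimal sketch: verify that ab = c - c' in the basic mediation model.
# All parameter values below are arbitrary and chosen only for illustration.
set.seed(123)
n <- 10000
x <- rnorm(n)
m <- 0.39 * x + rnorm(n)                # path a = .39
y <- 0.39 * m + 0.20 * x + rnorm(n)     # path b = .39, direct effect c' = .20

c_total <- coef(lm(y ~ x))[["x"]]       # path c  (Y regressed on X)
a_path  <- coef(lm(m ~ x))[["x"]]       # path a  (M regressed on X)
fit_xm  <- lm(y ~ x + m)
c_prime <- coef(fit_xm)[["x"]]          # path c' (effect of X controlling for M)
b_path  <- coef(fit_xm)[["m"]]          # path b  (effect of M controlling for X)

a_path * b_path                         # indirect effect ab
c_total - c_prime                       # difference c - c'; matches ab for OLS estimates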
There is support for a mediation hypothesis when the indirect path accounts for a
sizable proportion of the XY covariation. Full mediation occurs when the indirect path
accounts for all of the covariation between X and Y. Statistical evidence for full mediation is
provided when the predictor variable has no effect on the criterion variable when the
mediated effect is taken into account, or in other words, when path c' is not significantly
different from zero after controlling for the mediator. Partial mediation occurs when the
indirect path accounts for some but not all of the covariation between X and Y. When path
c' is reduced in size but is still significantly different from zero, then there is evidence for
partial mediation. This may indicate that other mediating variables are important in
explaining the relationship between X and Y (Baron & Kenny, 1986).
Researchers have discussed conceptualizing and quantifying the degree of mediation
as the proportion of the total effect that is mediated (ab/c) (MacKinnon, Warsi, & Dwyer,
1995; Shrout & Bolger, 2002). MacKinnon et al. (1995) conducted a simulation study that
compared four different estimators of the proportion of the total effect that is mediated.
Their study showed that estimates of the proportion mediated stabilized at sample sizes
of 500 and above. Because such large samples are needed to obtain accurate estimates,
MacKinnon et al. (1995) and Shrout and Bolger (2002) encourage researchers to be cautious
about using and interpreting them. Although quantifying the degree of mediation as the
proportion of the total effect that is mediated is intuitively appealing, in that it helps readers
gauge the practical importance of mediation, it is important to remember that this index
lacks precision at sample sizes typical of many research areas.
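The instability of the proportion mediated at smaller sample sizes is easy to see by simulation. The sketch below is an illustration only (the population values, sample sizes, and number of replications are arbitrary choices, not the design of this study); it draws repeated samples at two sample sizes and compares the spread of ab/c across samples.

# Minimal sketch: variability of the proportion mediated (ab/c) across repeated samples.
# Population paths (a = b = .39, c' = .20) and sample sizes are illustrative only.
prop_mediated <- function(n, a = 0.39, b = 0.39, c_prime = 0.20) {
  x <- rnorm(n)
  m <- a * x + rnorm(n)
  y <- b * m + c_prime * x + rnorm(n)
  a_hat <- coef(lm(m ~ x))[["x"]]
  b_hat <- coef(lm(y ~ x + m))[["m"]]
  c_hat <- coef(lm(y ~ x))[["x"]]
  (a_hat * b_hat) / c_hat                      # estimated proportion mediated
}
set.seed(1)
quantile(replicate(1000, prop_mediated(50)),  c(.10, .90))   # wide spread at N = 50
quantile(replicate(1000, prop_mediated(500)), c(.10, .90))   # much tighter at N = 500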
Establishing mediation involves testing whether the magnitude of the indirect path in
Figure 1.1 (from X to Y through M) is significantly different from zero. There are two major
frameworks for assessing mediation. One framework involves testing the significance of the
component paths that make up the mediation model. For the most basic mediation model this
involves testing the significance of paths a and b in Figure 1.1 to see if there is a significant
relationship between the predictor variable and the mediator variable and between the
mediator variable and the outcome variable. This framework assesses mediation by testing
each link in the hypothesized causal chain. If one link is not statistically significant, it can be
argued that mediation is not occurring as hypothesized. I refer to this framework as the
multiple-test framework because mediation is established by testing the significance of each
path between variables in a mediation sequence. This framework yields one effect size
estimate for each path tested. A second framework for testing mediation involves testing
whether the indirect effect is significant. For the most basic mediation model (Figure 1.1)
the indirect effect is estimated by either the product of paths a and b or the difference of the
regression coefficients c and c'. This framework yields a single effect size estimate. I refer to
this framework as the single-test framework because mediation is established by testing the
significance of the estimate for the indirect path.
Several methods for testing mediation exist within each of the two general
frameworks for assessing mediation. There has been a recent proliferation of articles
describing and comparing these methods (Frazier, Tix, & Barron, 2004; MacKinnon,
Lockwood, Hoffman, West, & Sheets, 2002; MacKinnon, Lockwood, & Williams, 2004;
Mallinckrodt, Abraham, Wei, & Russell, 2006; Shrout & Bolger, 2002). What is missing
from the literature is the recognition that mediation methods fall under two general
frameworks for assessing mediation. Because of this lack of recognition, heretofore there has
been little consideration of the conceptual adequacy of each as a framework for
understanding mediator relations. Researchers have recently recommended methods that fall
under the single-test framework (MacKinnon et al., 2002; MacKinnon et al., 2004;
Mallinckrodt et al., 2006; Preacher & Hayes, 2004) and have directed readers to these
methods (Frazier et al., 2004). A statistical argument has been made in favor of these
methods in that they often have greater statistical power than methods that fall under the
multiple-test framework. However, there has been little consideration of conceptual and
pragmatic issues that are also relevant to the choice of an analytic strategy for testing
mediation.
This lack of conceptual clarity may contribute to misleading statements in the
literature. For example, in their discussion of one multiple-test approach, Frazier, Tix, and
Barron (2004) stated, "it is not enough to show that the relation between the predictor and
outcome is smaller or no longer is significant when the mediator is added to the model.
Rather, one of several methods for testing the significance of the mediated effect should be
used" (p. 128). They imply that the multiple-test framework is not sufficient for establishing
mediation without explaining why they think this is so.
Preacher and Hayes (2004) declared that "a necessary component of mediation is a
statistically and practically significant indirect effect" (abstract). One of their main
arguments for assessing mediation through testing the significance of the indirect effect is
that these methods have been found to be more powerful than the methods that fall under the
multiple-test framework for assessing mediation.
The purpose of this dissertation is to explore the fundamental question of which
framework, the multiple-test framework or the single-test framework, is the most accurate
and helpful framework for assessing mediation from each of three perspectives. The two
frameworks will be evaluated according to three standards, framed as three questions: Which
framework has better statistical properties? Which framework makes the most conceptual
sense? And which framework is the most pragmatic?
The statistical standard is the one that has been addressed most thoroughly by the
existing research. This study will seek to replicate and extend the existing research by
conducting a simulation study that will compare Type I error rates and the statistical power
of three methods for conducting mediation analysis that have yet to be compared to one
another. Evaluating the two frameworks according to the conceptual standard will speak to
the question of whether one framework is superior to the other with respect to: (a) capturing
the meaning of mediation (that one variable impacts another variable through an intervening
variable) and (b) capturing information from the data that will be most helpful for the two
purposes for which mediation analysis is often employed (theory building and program
evaluation). The pragmatic standard applied to the two frameworks is the one articulated by
Wilkinson and the Task Force on Statistical Inference (1999): choose the minimally
sufficient analysis when deciding among quantitative methods. They suggest, "If the
assumptions and strength of a simpler method are reasonable for your data and research
problem, use it. Occam's razor applies to methods as well as to theories" (Wilkinson et al.,
1999, p. 598). Thus, the three methods chosen for the simulation
study will be evaluated not only according to their statistical accuracy in assessing mediation
but also according to their conceptual clarity, their ease of use, and the interpretability of
their results.
Chapter Two
In the midst of the recent proliferation of articles on mediation analysis and in the
ongoing debate as to which method should be employed by researchers, an important
distinction between methods has not been articulated. Missing from the discussion on
mediation is the recognition that mediation methods fall into two frameworks. One set of
methods assesses mediation by testing the significance of each link in a hypothesized causal
chain. Because this approach involves at least two significance tests, I refer to this
framework as the multiple-test framework. A second set of methods assesses mediation by
testing the significance of the indirect effect: the effect of the independent variable on the
dependent variable via the mediator or mediators. Because this approach involves a single
test of significance, I refer to this framework as the single-test framework.
It is important to recognize the fundamental difference between the two frameworks
for assessing mediation because this is the first point in the decision tree that researchers
should look to when trying to decide which mediation analysis to use. In order to aid
researchers in deciding which of the two frameworks of mediation analysis is superior to the
other, three questions will be explored. Which of the two frameworks makes the most
conceptual sense? Which of the two frameworks has better statistical properties? And, which
of the two frameworks makes more sense from a pragmatic point of view? In this chapter, I
will begin to address these questions by first reviewing the multiple-test and single-test
frameworks for assessing mediation and the methods that fall under each and then reviewing
simulation studies that have been conducted in order to compare various methods for
assessing mediation.
The Multiple-Test Framework for Assessing Mediation
The multiple-test framework assesses mediation by testing the significance of the
component paths that comprise the mediation model. The methods that fall under the
multiple-test framework are the methods of Kenny and colleagues (Judd & Kenny, 1981;
Baron & Kenny, 1986; Kenny et al., 1998). The paper that has been most frequently cited in
the area of mediation is Baron and Kenny (1986) (Preacher & Hayes, 2004). As of January
2007, that paper had been cited 7,708 times according to a Web of Knowledge citation search.
Since the publication of their papers, the methods of Kenny and colleagues have been the
standard methods used in conducting mediation analysis. Minor revisions have been made to
the methods over the years but they are generally based on using a series of multiple
regression analyses to estimate and test the path coefficients illustrated in Figure 1.1
(repeated below for convenience).
Figure 1.1 Path Diagram for the Basic Mediation Model
In this diagram, X represents the independent or predictor variable, M represents
the mediator variable, and Y represents the dependent or criterion variable. Path c represents
the total relationship between X and Y. Path a quantifies the relationship between X and M:
a one-unit change in X results in a change of a units in M. Path b quantifies the relationship
between M and Y: a one-unit change in M results in a change of b units in Y when X is
statistically controlled. Path c' represents the direct effect of X on Y when M is held constant
(Shrout & Bolger, 2002).
The following three regression analyses can be used to estimate paths a, b, c, and c'.
First, the criterion variable is regressed on the predictor variable in order to estimate
and test the significance of path c:

Y = B0(1) + cX + e(1)    (2.1)

Second, the mediator is regressed on the predictor variable in order to estimate and
test the significance of path a in the diagram:

M = B0(2) + aX + e(2)    (2.2)

Third, the criterion variable is regressed on both the predictor variable and the
mediator to estimate and test path b in the diagram; the significance of path c' can also be
determined from this equation:

Y = B0(3) + c'X + bM + e(3)    (2.3)

In these regression equations B0(1), B0(2), and B0(3) represent the regression intercepts
and e(1), e(2), and e(3) represent the residuals.
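In R (the language used for the simulation syntax in the Appendix), the three equations correspond to three calls to lm(). The sketch below assumes a data frame named dat with columns X, M, and Y (an illustrative name, not part of this study's syntax) and shows where each path estimate and its standard error come from; these quantities feed the significance tests discussed in the remainder of this chapter.

# Sketch: estimating paths a, b, c, and c' from Equations 2.1-2.3.
# Assumes an illustrative data frame `dat` with columns X, M, and Y.
eq1 <- lm(Y ~ X,     data = dat)   # Equation 2.1: estimates path c
eq2 <- lm(M ~ X,     data = dat)   # Equation 2.2: estimates path a
eq3 <- lm(Y ~ X + M, data = dat)   # Equation 2.3: estimates paths c' and b

c_est  <- coef(summary(eq1))["X", c("Estimate", "Std. Error")]   # path c
a_est  <- coef(summary(eq2))["X", c("Estimate", "Std. Error")]   # path a
cp_est <- coef(summary(eq3))["X", c("Estimate", "Std. Error")]   # path c'
b_est  <- coef(summary(eq3))["M", c("Estimate", "Std. Error")]   # path b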
Judd and Kenny (1981): The Test of Full Mediation
Kenny and colleagues have varied what they emphasize as important for
demonstrating mediation. Judd and Kenny (1981) discussed mediation in the context of
evaluation research and conducting a process analysis that specifies the causal mechanisms
responsible for treatment outcomes.
They stated that in order to validate a hypothesized mediation model, researchers
must provide evidence for three conclusions: that the predictor variable affects the criterion
variable (for without this relationship it makes little sense to posit a causal process between
the variables) (Conclusion I); that, when there is a series of variables in a mediation chain,
each variable affects the variable following it when variables prior to it are controlled
(Conclusion II); and that the predictor variable no longer affects the criterion variable once
the mediator variables are controlled (Conclusion III). The
third conclusion provides evidence for full mediation and indicates that the intervening
variables that have been specified in the model are sufficient for explaining the relationship
between the predictor and criterion variables. Judd and Kenny (1981) described the
regression analyses that would provide evidence for these conclusions in terms that apply to
models with more than one mediating variable. However, if we refer to the most basic
mediation model (a single mediating variable) illustrated in Figure 1.1, the three conclusions
would be supported if the c, a, and b regression coefficients were significantly different from
zero and if the c' coefficient was not significantly different from zero. Demonstrating that c'
is not significantly different from zero tells us that after controlling for the mediator
variables, a previously significant relationship between X and Y is no longer significant.
In Judd and Kenny's (1981) discussion of the three conclusions that provide evidence
for mediation, they discuss two exceptions that become important in later articles on
mediation. With respect to Conclusion I, they note that even if the predictor variable is not
shown to be related to the criterion variable, for example in a case where a treatment program
does not affect outcome, it still may be important to explore the mediating process in order to
understand why the treatment was ineffective. One possible scenario that may explain why
treatment does not appear to affect outcome is that the direct effect of the predictor variable
while controlling for the mediator variable (c') is cancelled out by the mediating process (i.e.
path ab is opposite in sign to c'). This phenomenon has traditionally been referred to as
suppression. MacKinnon, Krull, and Lockwood (2000) noted that suppression can be
considered a special case of mediation (where the direct and indirect paths have opposite
signs). Some researchers now treat suppression as such (Shrout & Bolger, 2002).
With respect to Conclusion III, Judd and Kenny (1981) note that if full mediation is not
demonstrated, partial mediation can be demonstrated by presenting evidence for Conclusions
I and II.
Baron and Kenny (1986): The Basic Plus Bivariate Test of Mediation
Baron and Kenny (1986) describe mediation in the context of the simplest causal
chain.
Figure 2.1 The Basic Causal Chain Involved in Mediation (Independent Variable to Mediator to Outcome Variable)
They discuss providing evidence for mediation in terms of meeting three conditions. The
first two conditions stipulated in Baron and Kenny (1986) are similar to the first two
conclusions articulated in Judd and Kenny (1981). Baron and Kenny's (1986) third condition
and Judd and Kenny's (1981) third conclusion differ in that Baron and Kenny describe in
more detail the phenomenon of partial mediation. They describe mediation as occurring
along a continuum with mediators that partly explain the relationship between X and Y on
one end of the continuum (partial mediation) and mediators that fully explain the relationship
between X and Y on the other end of the continuum (full mediation). If we refer back to
Figure 1.1, Baron and Kenny's three conditions are satisfied if the regression coefficients a,
b, and c are found to be significant. Baron and Kenny (1986) differ from Judd and Kenny
(1981) in that Baron and Kenny (1986) were more explicit about the fact that path c' need
not be nonsignificant in order to demonstrate mediation. This shift in emphasis is subtle;
although it is informative to test path c' because this test establishes the type of mediation
that is occurring (e.g. full versus partial mediation), the test of path c' is not critical to
establishing mediation.
Baron and Kenny (1986) go on to introduce a significance test for the indirect effect
of the predictor variable on the criterion variable via the mediator (the product of paths a and
b). Although this significance test is provided, Baron and Kenny do not require this step in
order to demonstrate mediation; only the regression analyses described above are required.
The indirect effect is tested for significance by dividing it by its standard error and
comparing the resulting z-score to the standard normal distribution. For example, the indirect
effect is significant at the p = .05 level if the absolute value of the z-score exceeds 1.96. The
standard error term that Baron and Kenny (1986) introduced to test the significance of the
indirect effect (ab) is Sobel's (1982) estimate with an added term:
√(b²s_a² + a²s_b² + s_a²s_b²)    (2.4)

The term s_a²s_b² is not included in Sobel's estimate, but its numerical value is usually
small. Both of these formulas for the standard error of the indirect effect assume multivariate
normality for the indirect effect and are best used for large sample sizes (e.g., samples of 200
or more).
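A minimal R sketch of this test follows, using the standard error in Equation 2.4; the data frame dat with columns X, M, and Y is an assumed illustration, and the same code with the two-term standard error gives Sobel's original test.

# Sketch: testing the indirect effect ab against the standard normal distribution.
# Assumes an illustrative data frame `dat` with columns X, M, and Y.
eq2 <- lm(M ~ X,     data = dat)
eq3 <- lm(Y ~ X + M, data = dat)
a   <- coef(eq2)[["X"]];  s_a <- coef(summary(eq2))["X", "Std. Error"]
b   <- coef(eq3)[["M"]];  s_b <- coef(summary(eq3))["M", "Std. Error"]

se_ab <- sqrt(b^2 * s_a^2 + a^2 * s_b^2 + s_a^2 * s_b^2)   # Equation 2.4
z     <- (a * b) / se_ab
p     <- 2 * pnorm(-abs(z))          # two-tailed p, assuming ab is normally distributed
c(indirect = a * b, z = z, p = p)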
Kenny, Kashy, and Bolger (1998): The Basic Test of Mediation
Kenny et al. (1998) summarize Judd and Kenny (1981) and Baron and Kenny (1986)
in four steps and depict mediation using Figure 1.1. The four steps are: (a) show that the
predictor is correlated with the criterion variable; (b) show that the predictor variable is
correlated with the mediator; (c) show that the mediator affects the criterion variable; (d) and
if the researcher wishes to provide evidence for full mediation, show that the effect of X on Y
controlling for M is not significantly different from zero. If only the first three steps are met,
then there is evidence for partial mediation.
Kenny et al. (1998) elaborate on the phenomenon of suppression that Judd and Kenny
(1981) briefly noted. To accommodate the possibility of suppression, Kenny et al. (1998)
relax the requirement that X be correlated with Y (Step 1). Shrout and Bolger (2002) suggest
that another condition when Step 1 should be relaxed is when the predictor variable is distal
to the criterion variable such as in longitudinal studies that track long-term processes.
Because the first and fourth steps are optional steps (the fourth is only necessary
when trying to establish full mediation), Kenny et al. (1998) conclude, "the essential steps in
establishing mediation are Steps 2 and 3." If we refer back to Figure 1.1, only the regression
coefficients a and b need to be significant in order to provide evidence for mediation. This
latest articulation of the methods of Kenny and colleagues has been referred to as the test of
joint significance by MacKinnon et al. (2002). The term joint significance refers to the
requirement that both paths a and b need to be tested and found to be statistically significant
in order to establish mediation. A word of caution: the phrase test of joint significance may
be misinterpreted by readers as a test of the product of the two paths a and b when it really
tests the significance of paths a and b separately. MacKinnon et al. (2002) attribute this
method to Cohen and Cohen (1983). Cohen and Cohen (1983) note that a mediation
hypothesis is supported when each of the component paths in a causal chain is found to be
significant.
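In practice, the basic test of mediation reduces to inspecting two p-values, one for path a and one for path b. A minimal R sketch follows; the data frame dat with columns X, M, and Y and the .05 alpha level are illustrative assumptions.

# Sketch: the basic test of mediation (test of joint significance) for paths a and b.
# Assumes an illustrative data frame `dat` with columns X, M, and Y.
p_a <- coef(summary(lm(M ~ X,     data = dat)))["X", "Pr(>|t|)"]   # test of path a
p_b <- coef(summary(lm(Y ~ X + M, data = dat)))["M", "Pr(>|t|)"]   # test of path b

mediation_supported <- (p_a < .05) && (p_b < .05)   # both links must be significant
mediation_supported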
A Summary of the Multiple-test Framework
The multiple-test framework, which is represented by the methods of Kenny and
colleagues, has evolved in the direction of becoming less restrictive over time. With each
iteration of their methods, Kenny and colleagues have required fewer paths to be significant
in order to provide evidence for mediation. Judd and Kenny (1981) required four paths to be
tested (a, b, c, and c'); Baron and Kenny (1986) required only three paths to be tested (a, b,
and c); and Kenny et al. (1998) required only two paths to be tested (a and b). Baron and
Kenny (1986) relaxed Judd and Kenny's requirement that path c' be shown to not be
significantly different from zero to allow for the possibility of partial mediation. Kenny et al.
(1998) relaxed Baron and Kenny's (1986) requirement that path c be significant to allow for
the possibility of suppression. It makes intuitive sense that what is necessary to make a case
for mediation is to show that the predictor variable affects the mediator (path a) and that the
mediator affects the criterion variable independent of the effects of the predictor variable
(path b). For the purposes of this dissertation I adopt a terminology that reflects the evolution
of the multiple-test framework. Working backwards in time and starting from the most
recent iteration, I refer to Kenny et al.'s (1998) method as the basic test of mediation since it
tests only the basic paths necessary for establishing mediation. I refer to Baron and Kenny's
(1986) iteration as the basic plus bivariate test, as it adds to the basic test of mediation a test
of the bivariate relationship between X and Y. I refer to Judd and Kenny's (1981) iteration
as the test of full mediation as it requires that mediator variables explain all of the XY
covariation.
Table 2.1
Methods Under the Multiple-Test Framework for Assessing Mediation

Test of Mediation                                               Regression Coefficients Tested for Significance
Basic test of mediation (Kenny et al., 1998)                    a, b
Basic plus bivariate test of mediation (Baron & Kenny, 1986)    a, b, c
Test of full mediation (Judd & Kenny, 1981)                     a, b, c, c'

Note. c' is tested in order to show that it does not differ significantly from zero.
Frazier et al. (2004), in their summary of the methods of Kenny and colleagues,
reiterated the three requirements of Baron and Kenny (1986) that paths a, b, and c be tested
for significance and listed as a fourth step the requirement that the difference between c and
c' be tested for significance. The stated purpose of this fourth step was to establish that
controlling for the mediator variable significantly decreased the relationship between the
predictor and criterion variables. Because the difference between c and c' (c - c') is equal to
the product of paths a and b (ab), they test the significance of c - c' by dividing ab by the
standard error term introduced by Baron and Kenny (1986). Frazier et al. (2004) attribute
this fourth step to Kenny and colleagues, but this fourth step was never required by them.
Frazier et al. (2004) conflate the multiple-test framework and the single-test framework by
requiring that the component paths of the mediation model be tested as well as the indirect
effect.
The Single-Test Framework for Assessing Mediation
The single-test framework assesses mediation by testing the significance of the
indirect effect (also referred to as the mediated effect; e.g., MacKinnon et al., 2002), either by
dividing the estimate of the indirect effect by its standard error or by bracketing the estimate
with a confidence interval. To review, the total correlation between X and Y can be
partitioned into the direct effect of X on Y (represented by path c') and the indirect effect of X
on Y that is mediated by a third variable, M. The indirect effect is represented in Figure 1.1 by
paths a and b. The total effect of X on Y is equal to the sum of the direct effect and the
indirect effect (c = ab + c'). Path a represents the relationship between X and M in that a
one-unit change in X results in a change of a units in M. Path b represents the relationship
between M and Y in that a one-unit change in M results in a change of b units in Y when X is
statistically controlled. Therefore, a one-unit change in X results in a change of ab units in Y
through the mediator.
Figure 1.1 Path Diagram for the Basic Mediation Model
The methods that fall under the single-test framework can be divided into three
categories: (a) methods that use the product of the regression coefficients a and b (ab) to
estimate the indirect effect, (b) methods that use the difference in coefficients c and c' (c - c')
to estimate the indirect effect, and (c) bootstrap or resampling methods that empirically
estimate the indirect effect and bracket the effect with an empirically derived confidence
interval. The coefficients used in the product of coefficients and difference in coefficients
approaches are obtained using the same regression equations that are the foundation of the
multiple-test framework (repeated below).
Y = B0(1) + cX + e(1)    (2.1)
M = B0(2) + aX + e(2)    (2.2)
Y = B0(3) + c'X + bM + e(3)    (2.3)

In these regression equations B0(1), B0(2), and B0(3) represent the regression intercepts
and e(1), e(2), and e(3) represent the residuals.
The product of coefficients methods use Equations 2.2 and 2.3 to estimate the
regression coefficients a and b. The difference in coefficients methods use Equations 2.1 and 2.3
to estimate the coefficients c and c'. The indirect effect is equally well quantified by the
product of coefficients a and b and the difference in coefficients c and c' which are
mathematically equivalent (ab = c - c'). Although one would assume that significance tests
of the indirect effect using products of coefficients methods and difference in coefficients
methods would converge and yield similar answers, they often do not because the formulas
used to estimate the standard error of the indirect effect and the methods used to test the
significance of the indirect effect differ in the two approaches.
Methods that Test the Significance of the Product of Coefficients a and b
One estimate for the standard error of the product of coefficients (ab) that is
commonly seen in the mediation literature is Sobel's estimate, which is the square root of the
term b²s_a² + a²s_b². Sobel (1982, 1986) was one of the first to derive the standard error
of the indirect effect; he used the multivariate delta method (a first-order Taylor series
approximation) to estimate the standard error of the indirect effect (MacKinnon et al., 1995).
Variations of Sobel's estimate have also been introduced into the mediation literature. One
variation that adds a second-order Taylor series term, s_a²s_b², to the original formula
was discussed by Baron and Kenny (1986), and a variation that subtracts
the term s_a²s_b² from the original formula was discussed by MacKinnon et al. (1995) and
MacKinnon et al. (2002). The indirect effect is tested for significance by dividing it by
Sobel's estimated standard error (or variations of Sobel's estimate) and comparing the
resulting z-score to the standard normal distribution. Mallinckrodt et al. (2006) call this
approach the Normal Theory approach.
Sobel's estimate assumes that the indirect effect, the product term ab, is distributed
normally. However, later researchers realized that this assumption did not hold because the
sampling distribution of the product of two normally distributed variables is usually not
normal. MacKinnon (retrieved January 13, 2007) listed the moments of the distribution of the
product of two normally distributed variables as having a skew of 1.15 and a kurtosis of 3.5
when paths a and b are small. Methods that assess mediation by testing the significance of
the indirect effect using any estimate of the standard error that assumes the normal
distribution of ab have low power and incorrect Type I error rates (MacKinnon et al., 2002)
because the distribution of ab is likely to be positively skewed.
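The non-normality of the product of two normally distributed estimates is easy to see by simulation. The sketch below is purely illustrative (the means and standard deviations are arbitrary choices); the skew and kurtosis values quoted above are MacKinnon's, not output of this code.

# Sketch: the product of two normally distributed estimates is not itself normal.
# The means and standard deviations below are arbitrary illustrations.
set.seed(42)
a_hat <- rnorm(100000, mean = 0.14, sd = 0.10)   # hypothetical sampling distribution of a
b_hat <- rnorm(100000, mean = 0.14, sd = 0.10)   # hypothetical sampling distribution of b
ab    <- a_hat * b_hat                           # sampling distribution of the product

hist(ab, breaks = 100)                           # visibly asymmetric, not bell-shaped
mean((ab - mean(ab))^3) / sd(ab)^3               # sample skewness is clearly positive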
MacKinnon and colleagues have developed several methods that attempted to take
into account the asymmetry of the distribution of the indirect effect. They are: the empirical
distribution of ab, the distribution of the product of two standard normal variables, and the
asymmetric confidence limits for the distribution of the product ab (renamed the M method
in MacKinnon et al., 2004; MacKinnon, 2008, reverted to calling it the asymmetric
confidence limits test). MacKinnon and colleagues' three methods take into
account the asymmetry of the sampling distribution of ab in various ways. For the first
method, the empirical distribution of ab, MacKinnon, Lockwood, and Hoffman (1998, as
cited in MacKinnon et al., 2002) used simulations to generate an empirical sampling
distribution for ab. Critical values for different significance levels were then determined
from this distribution. The second method, the distribution of the product of two standard
normal variables (MacKinnon et al., 1998 as cited in MacKinnon et al., 2002), involves
converting the estimates of a and b into z scores, multiplying the two z scores, and then
testing the significance of this product term by comparing it to the expected distribution of
the product of two normal random variables from Craig (1936, as cited in MacKinnon et al.,
2002). The third method, the asymmetric confidence limits method (MacKinnon &
Lockwood 2001, as cited in MacKinnon et al., 2002), also converts the estimates of a and b
into z scores. These z scores are then used to find critical values for the product of two
random variables from Meeker et al.'s (1981; as cited in MacKinnon et al., 2002) tables that
are then used to construct lower and upper confidence limits. If the confidence interval
that brackets the indirect effect does not include zero, then the effect is statistically significant.
(Footnote 1: The first two methods were described in a paper presentation and the third
method was described in an unpublished manuscript, MacKinnon & Lockwood, 2001. When
I requested the unpublished manuscript, the research assistant referred me to the MacKinnon
et al. 2004 article, which was the most recent iteration of their methods. The methods
described in the paper presentation and the unpublished manuscript are not described in any
available published articles.)
Methods that Test the Significance of the Difference in Coefficients c and c'
MacKinnon et al. (2002) noted that difference in coefficients methods vary in how
they derive the standard error of the indirect effect, the assumptions they make about the
properties of the variables, and the null hypotheses about intervening variables that are
tested. Two examples of estimates of the standard error of the difference in coefficients (c -
c') are McGuigan and Langholtz' estimate and Freedman and Schatzkin's estimate
(MacKinnon et al., 2002). Please refer to Table 1 of MacKinnon et al. (2002, p. 85) for a
table of standard errors and tests of significance for difference in coefficients methods and
products of coefficients methods.
Bootstrap or Resampling Approaches
The bootstrap approach is another approach to calculating an estimate of the indirect
effect. The significance of the indirect effect is determined by constructing a confidence
interval around it (Shrout & Bolger, 2002). Similar to how the methods of MacKinnon and
colleagues were inspired, this trend towards bootstrapping the indirect effect was prompted
by the finding that the product of two normally distributed variables is not normally
distributed (MacKinnon, Warsi, & Dwyer, 1995; MacKinnon & Lockwood, 1998).
Bootstrap procedures can be used to construct confidence intervals around any parameter
estimate, but they are most useful for parameter estimates with unknown sampling
distributions; when the expected sampling distribution is known, analytically derived
standard error formulas already perform well. Bootstrap methods are therefore often applied
when analytically derived formulas for estimating the standard error do not perform well.
Another advantage of bootstrap methodology is that it can be used with small samples
(e.g., sample sizes ranging from 20 to 80; Efron & Tibshirani, 1993).
Bootstrap methods calculate the standard error and confidence intervals of an
estimate through empirical rather than analytical approaches. Computers are used to generate
data sets (bootstrap samples) from the set of original observations by repeatedly sampling
from the original data set with replacement (Bollen & Stine, 1990). The number of bootstrap
samples that researchers have recommended if one intends to estimate the indirect effect and
confidence intervals around it have ranged from a minimum of 1,000 (Bollen & Stine, 1990)
to 10,000 (Mallinckrodt et al., 2006). An estimate of the standard error of the indirect effect
can be calculated from the bootstrap samples. The standard deviation of the sampling
distribution of the estimates of the indirect effect derived from repeated samples is the
bootstrap-estimated standard error (Shrout & Bolger, 2002). A 95% confidence interval can
be constructed around the indirect effect by adding to and subtracting from the point estimate
the product of 1.96 and the bootstrap-estimated standard error. However, using an estimate of
the standard error to construct confidence intervals is of limited use when the sampling
distribution is not symmetric. A more useful approach to constructing confidence intervals
around the indirect effect is to empirically identify the cutpoints that exclude (α/2 ×
100%) of the values from each tail of the distribution. For example, if α is set to .05,
confidence limits can be identified by examining the bootstrap distribution of the indirect
effect and identifying the values that mark the bottom and top 2.5% (Shrout & Bolger, 2002).
Shrout and Bolger (2002) refer to confidence intervals constructed with this latter method as
the bootstrap percentile. They also briefly mention another bootstrap method, the bias-
corrected bootstrap (Efron & Tibshirani, 1993, pp. 178-188) that yields even more accurate
confidence intervals with smaller samples. Efron and Tibshirani (1993) refer to this
approach as the bias-corrected and accelerated bootstrap.
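As a concrete illustration, the R sketch below resamples the data, recomputes the indirect effect in each bootstrap sample, and reads off both percentile and bias-corrected and accelerated confidence limits using the boot package. The data frame dat with columns X, M, and Y and the choice of 2,000 resamples are assumptions for illustration; this is not the syntax used for the simulations reported in the Appendix.

# Sketch: percentile and BCa bootstrap confidence intervals for the indirect effect ab.
# Assumes an illustrative data frame `dat` with columns X, M, and Y.
library(boot)

indirect <- function(data, indices) {
  d <- data[indices, ]                           # resample rows with replacement
  a <- coef(lm(M ~ X,     data = d))[["X"]]
  b <- coef(lm(Y ~ X + M, data = d))[["M"]]
  a * b
}

set.seed(2009)
boot_out <- boot(data = dat, statistic = indirect, R = 2000)
boot.ci(boot_out, conf = 0.95, type = c("perc", "bca"))   # zero outside the interval -> significant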
Comparisons of Empirically and Analytically Derived Confidence Intervals
Researchers have compared the performance of bootstrap methods to those that use
the delta method to estimate the standard error of the indirect effect (e.g. Sobel's estimate
and variations of it) with respect to bracketing the point estimate of the indirect effect with
confidence intervals (Bollen & Stine, 1990; Shrout & Bolger, 2002; Mallinckrodt et al.,
2006). Bollen and Stine (1990) used actual data from three samples to provide examples of
how the bootstrap-derived sampling distribution compared to the normal distribution and
how empirically derived confidence intervals compared with analytically derived ones.
Bollen and Stine's first two examples had small samples (N=20) and when a normal
distribution was superimposed over the bootstrap distribution, it was evident that the
bootstrap distributions were skewed. Because the bootstrap sampling distributions were
skewed, the bootstrap confidence intervals were asymmetric. The 90% confidence intervals
for the bootstrap method and the delta method led to the same conclusion for the first two
examples. Bollen and Stine's third example had a larger sample size (N=175).
Interestingly, the bootstrap distribution for this sample was similar to the normal distribution.
Confidence intervals constructed from the two methods were also similar. They concluded
that the delta method for estimating the standard error of the indirect effect (which is based
on a large-sample approximation) and a normal distribution may indeed work well for large
samples.
(Footnote 2: However, in the first example, the ninety-five percent confidence intervals led to
different conclusions. At α = .05 the bootstrap confidence intervals did not include zero while
the delta method confidence intervals did.)
Shrout and Bolger (2002) calculated point estimates of the indirect effect, bracketed
the point estimates with confidence intervals, and compared those derived analytically to
those derived empirically for two samples. The first sample was a hypothetical (simulated)
sample of N=80 with the parameters set at a = .4, b = .3, and c' = 0. The data for the second
sample was collected by Chen and Bargh (1997) and had a sample size of forty-six dyads.
Shrout and Bolger (2002) superimposed normal distributions on the bootstrap distributions
and found that the bootstrap distributions were skewed positively in both samples. In the
first example, the bootstrap and analytic estimates of the indirect effect were similar (.089
and .091 respectively) and were close to the population value, which was ab = .12. However,
the analytically derived confidence intervals (which were calculated using Sobel's estimate
with the added term s_a²s_b²) did include zero, whereas the bootstrap confidence intervals did
not. The bootstrap percentile accurately detected the indirect effect while the analytically
derived confidence intervals did not. In the second example, the bootstrap and analytical
methods led to the same conclusion—that there was no indirect effect. However, the
bootstrap percentiles narrowly included zero (-0.001, 0.49) and when Shrout and Bolger
corrected the intervals for bias, the new intervals did not include zero (0.004, 0.52). The
bias-corrected and accelerated confidence intervals are more accurate for small samples than
the usual bootstrap approach.
Mallinckrodt et al. (2006) compared what they termed the Normal Theory (NT)
method to the bootstrap method for assessing mediation. The Normal Theory method is
comprised of the three steps required by Baron and Kenny (1986) to provide evidence for
mediation as well as a fourth step of testing the significance of the indirect effect by using
Sobel's estimate and the standard normal distribution. Mallinckrodt et al. (2006) used data
collected by Mallinckrodt and Wei (2005) to draw a sample of 60 from a total sample of 430
students. Shrout and Bolger's (2002) results were replicated in Mallinckrodt et al. (2006) in
that they found that although the NT estimate and the bootstrap estimate of the indirect effect
were similar (in this case, they were equal to each other, both were 0.012), the NT confidence
intervals included zero (-0.002, 0.026) while the bootstrap intervals (0.0002, 0.0397) and the
bias-corrected intervals (0.0004, 0.0413) did not.3 (Also, it is unclear from the study whether
Mallinckrodt et al. (2006) used the bias-corrected bootstrap or the bias-corrected and
accelerated bootstrap. They stated that they used the bias-corrected bootstrap but imply that
they used the same bias-corrected bootstrap that Shrout and Bolger (2002) used, which is
really the bias-corrected and accelerated bootstrap.) One major contribution of Mallinckrodt
et al. (2006) to the bootstrap literature is that they provided syntax and descriptions for
running the normal bootstrap procedure and the bias-corrected procedure (when possible), for
six software programs: Amos 5.0, LISREL 8.54, EQS 6.1, Mplus 3.13, SAS 9.1, and SPSS
12.0.
(Footnote 3: If one used the basic test of mediation, which requires only that paths a and b
be significant, one would also reach the conclusion that an indirect effect exists.)
A Summary of the Single-test Framework
The single-test framework for assessing mediation assumes that mediation is best
assessed by testing the significance of the indirect effect. Three sets of approaches for
assessing mediation that fall under this framework are product of coefficients approaches,
difference-in-coefficients approaches, and bootstrap or resampling approaches. Table 2.2
provides a summary of the references and significance tests associated with these approaches.
Table 2.2
Methods Under the Single-Test Framework for Assessing Mediation

Product of coefficients approaches
    Sobel's (1982, 1986) estimate of the standard error of the indirect effect; methods that employ
    Sobel's estimate and variations of it were dubbed Normal Theory methods by Mallinckrodt et al. (2006).
        Sobel's estimate: z = ab / √(b²s_a² + a²s_b²)
        Variation of Sobel's estimate: z = ab / √(b²s_a² + a²s_b² + s_a²s_b²)
    Empirical distribution of ab (MacKinnon et al., 1998, as cited in MacKinnon et al., 2002)
        z' = ab / √(a²s_b² + b²s_a²), compared with empirically derived critical values
    Distribution of the product of two standard normal variables (MacKinnon et al., 1998, as cited in
    MacKinnon et al., 2002)
        P = z_a × z_b
    Asymmetric confidence limits approach (MacKinnon, 2008), also known as the M method
    (MacKinnon et al., 2004)
        Lower confidence limit = mediated effect + (lower critical value of the product distribution
        at the chosen Type I error rate) × s_ab
        Upper confidence limit = mediated effect + (upper critical value of the product distribution
        at the chosen Type I error rate) × s_ab

Difference-in-coefficients approaches
    McGuigan and Langholtz (1988)
        t(N-2) = (c - c') / √(s_c² + s_c'² - 2r_cc' s_c s_c')
    Freedman and Schatzkin (1992)
        t(N-2) = (c - c') / √(s_c² + s_c'² - 2s_c s_c' √(1 - r²_XM))

Bootstrap or resampling approaches
    Percentile bootstrap and bias-corrected and accelerated (BCa) bootstrap approaches (Efron &
    Tibshirani, 1993); the implementation of these approaches was described by Shrout and Bolger (2002)
    and Mallinckrodt et al. (2006).
        Percentile bootstrap: empirical determination of the confidence interval; for a chosen α level,
        take the (α/2) × 100% and (1 - α/2) × 100% percentiles of the bootstrap distribution of ab
        BCa bootstrap: confidence limits from the bias-corrected and accelerated algorithm provided by
        Efron and Tibshirani (1993)
A Review of Simulation Studies that Compared Methods of Assessing Mediation
The recent literature on mediation analysis has included several large-scale simulation
studies that compared the various methods of assessing mediation. Simulation studies
empirically demonstrate how well methods of mediation analysis perform under varying
values of parameters a, b, c', and sample size. In these studies researchers generate large
numbers of data sets (samples) drawn from populations with known values of paths a, b, and
c'. They then apply the mediation analyses they wish to study to these simulated data.
Because the population parameters are known, researchers can compare the methods with
respect to power, Type I error rates, accuracy in the estimate of the indirect effect, and
accuracy in the estimate of the standard error or confidence interval around the indirect
effect.
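The logic of such a study can be sketched compactly in R. The skeleton below is a simplified illustration (the test shown is the basic test of mediation, and the population values, sample size, and replication count are arbitrary), not the design or conditions of the present study: it generates samples from known population values of a, b, and c', applies the test to each sample, and tallies the rejection rate, which is the Type I error rate when ab = 0 and power when ab > 0.

# Sketch of a simulation estimating the rejection rate of one mediation test.
# Population paths, sample size, and replication count are illustrative only.
sim_rejection_rate <- function(a, b, c_prime, n, reps = 1000, alpha = .05) {
  rejections <- replicate(reps, {
    x <- rnorm(n)                                 # generate a sample with known paths
    m <- a * x + rnorm(n)
    y <- b * m + c_prime * x + rnorm(n)
    p_a <- coef(summary(lm(m ~ x)))["x", "Pr(>|t|)"]
    p_b <- coef(summary(lm(y ~ x + m)))["m", "Pr(>|t|)"]
    (p_a < alpha) && (p_b < alpha)                # basic test: both paths significant
  })
  mean(rejections)                                # proportion of samples rejecting ab = 0
}

set.seed(7)
sim_rejection_rate(a = 0,   b = .39, c_prime = 0, n = 200)   # Type I error rate (ab = 0)
sim_rejection_rate(a = .39, b = .39, c_prime = 0, n = 200)   # power (ab > 0)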
MacKinnon and colleagues have been prominent in conducting simulation studies on
mediation analysis (MacKinnon & Dwyer, 1993; MacKinnon et al., 1995; MacKinnon et al.,
2002; MacKinnon et al., 2004). In their early studies, MacKinnon and colleagues mainly
focused on checking the accuracy of estimates of the standard error of the indirect effect
derived analytically (MacKinnon & Dwyer, 1993; MacKinnon et al., 1995). MacKinnon et
al. (2002) conducted the first simulation study that compared all the methods that they could
locate (including methods that fall under both the multiple-test framework and the single-test
framework) for assessing mediation. These methods were compared on the basis of the
accuracy of estimates of the indirect effect, the accuracy of the estimates of the standard error
of the indirect effect, Type I error rate, and power. MacKinnon et al. (2004) restricted their
simulation study to methods that fall under the single-test framework but expanded their
study to include bootstrap methods.
Early Studies on Estimates of the Standard Error of the Indirect Effect
MacKinnon and Dwyer (1993) studied the performance of three estimates of the
standard error of the indirect effect. Two of the estimates they studied are estimates of the
standard error of the product term ab: Sobel's formula (which they designated σ_abDelta) and
Sobel's formula with the added term s_a²s_b² (which they designated σ_abExact). The third
estimate they studied was the McGuigan and Langholtz estimate of the standard error of the
difference in coefficients c - c' (which they designated σ_τ-τ'). MacKinnon and Dwyer set the
parameters of paths a and b equal to .7 and c' equal to .2. The sample sizes they studied were
N=10, 25, 50, 100, 200, 500, 1000, and 5000. They generated 100 replications for each
sample size. They modeled continuous and categorical independent and dependent variables.
(Because categorical dependent variables are rare in counseling psychology, only the results
from the first two conditions listed will be reviewed here.)
MacKinnon and Dwyer (1993) found that Sobel's estimate (σ_abDelta) and the variation
of Sobel's estimate (σ_abExact) yielded estimates of the standard error of the indirect effect that
were similar to the true standard error for both the continuous independent variable condition
and the binary independent variable condition. The McGuigan and Langholtz estimate (σ_τ-τ')
was an accurate estimate of the standard error only for the continuous independent variable
condition. For the binary independent variable condition the McGuigan and Langholtz
estimate was inflated. The estimates of the standard error were closest to the true standard
error at sample sizes of 50 and greater.
MacKinnon et al. (1995) expanded on their prior study by adding a fourth estimate of
the standard error of the indirect effect to the three that had already been studied in
MacKinnon and Dwyer (1993). The fourth estimate was a variation on Sobel's estimate that
subtracted the term sa²sb². To recap, the four estimates of the standard error of the indirect effect were Sobel's estimate, two variations on Sobel's estimate (one that added the term sa²sb² and one that subtracted the term sa²sb²), and the McGuigan and Langholtz estimate.
MacKinnon et al. (1995) also explored four estimators each of two indexes of the relative magnitude of mediation, which they designated the proportion of the total effect that is mediated (ab/c) and the ratio of the indirect to the direct effect (ab/c'). They simulated the four estimates of the standard error of the indirect effect, the four estimators of the proportion mediated, and the four estimators of the ratio for the continuous independent variable condition and the binary independent variable condition. They carried out 500 replications of the above study on eight sample sizes (N = 10, 25, 50, 100, 200, 500, 1000, and 5000) and 64 parameter combinations. Unfortunately, they did not list the parameter values they studied. They pooled their results across the 64 parameter value combinations and reported that the mean of ab was .16, the mean of the proportion mediated was .30, and the mean of the ratio of the indirect to the direct effect was .67.
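Because c = ab + c' in a single-mediator model estimated by ordinary least squares, both indexes can be computed directly from the three paths. The helper below (mediation_indexes is a hypothetical name used for illustration) shows the arithmetic:

    def mediation_indexes(a, b, c_prime):
        """Proportion of the total effect that is mediated (ab/c) and the
        ratio of the indirect to the direct effect (ab/c'), using c = ab + c'."""
        indirect = a * b
        total = indirect + c_prime
        return indirect / total, indirect / c_prime

    # With the MacKinnon and Dwyer (1993) values a = b = .7 and c' = .2:
    proportion, ratio = mediation_indexes(0.7, 0.7, 0.2)   # approx. .71 and 2.45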
MacKinnon et al.'s (1995) results supported the findings of MacKinnon and Dwyer
(1993). The authors noted that Sobel's estimate for the standard error of the indirect effect
generally performed the best across the continuous independent variable and the binary
independent variable conditions, although Sobel's estimate and the two variations of it
performed similarly (most likely because the term sa²sb² is usually quite small). The
McGuigan and Langholtz estimate again performed well for the continuous independent
variable condition but was inaccurate for the binary independent variable condition (the
estimate was usually two to three times larger than the true standard error in this latter
condition). Estimators for the two indexes of the relative magnitude of the indirect effect
(ab/c and ab/c') did not perform well. This result is not surprising because the ratio of two unstable estimates is itself highly variable. The point and variance estimates of the
proportion mediated tended to stabilize at sample sizes of 500 and above. Because it was
unclear from the initial study where the point and variance estimators of the ratio of the
indirect effect to the direct effect stabilized, the authors conducted another simulation study
with sample sizes of 2000, 3000, and 4000. Point and variance estimators of the ratio
seemed to stabilize at around 2000 for the continuous independent variable condition and
4000 for the binary independent variable condition.
Another innovation of MacKinnon et al. (1995) was that they calculated confidence
intervals for each sample based on the sample estimates of the standard error of the indirect
effect and determined the proportion of times the confidence interval fell to the left and to the
right of the true value of the indirect effect. For a given sample, the question was whether
the 95% confidence interval includes the actual parameter value. In theory, this should occur
95% of the time. Across samples, for those that did not include the parameter value,
MacKinnon et al. (1995) tabulated the number of times this occurred to the left (i.e. sample
estimate was less than the population value) and to the right (i.e. sample estimate was greater
than the population value). Because the population values were positive, confidence
intervals that fell to the left of the parameter value included values that were closer to zero or
were the wrong sign and confidence intervals that fell to the right of the parameter value
included values that were farther from zero than the parameter value. Sobel's estimate
generally had the most accurate error rate (the true value of the indirect effect fell to the left
or right of the confidence interval about 5% of the time). However, a key finding of this
study was that there was an asymmetry in the error rate in that most of the time the true value
of the indirect effect fell to the right of the confidence interval. (Only at N=5000 did the true
value of the mediated effect fall about equally to the left and to the right of the confidence
interval).
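The tallying procedure can be illustrated as follows; tally_coverage is a hypothetical helper (an assumption of this sketch, not the authors' code) that classifies each replication's interval as covering the parameter, missing it to the left, or missing it to the right:

    import numpy as np

    def tally_coverage(ci_lower, ci_upper, true_value):
        """Count how often a set of confidence intervals (one per replication)
        covers the true indirect effect, how often the interval falls entirely
        to the left of the parameter, and how often entirely to the right."""
        ci_lower, ci_upper = np.asarray(ci_lower), np.asarray(ci_upper)
        left = int(np.sum(ci_upper < true_value))    # interval entirely below the parameter
        right = int(np.sum(ci_lower > true_value))   # interval entirely above the parameter
        covered = len(ci_lower) - left - right
        return covered, left, right

For a nominal 95% interval, roughly 95% of replications should fall in the covered category, with the remaining misses split about evenly between the two sides; the asymmetry MacKinnon et al. (1995) observed is a departure from that even split.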
The first two simulation studies of MacKinnon and colleagues focused on exploring
the accuracy of estimates of the standard error of the indirect effect and did not go to the next
step of testing the significance of the indirect effect by dividing it by its standard error and
comparing the resulting value to the standard normal distribution. Moreover, MacKinnon
and colleagues only used "normal theory" estimates of the standard error of the indirect
effect. Presumably, the discovery of the asymmetry in the error rate of normal theory
estimates of the standard error of the indirect effect sparked the development of new methods
sensitive to the asymmetrical sampling distribution of the indirect effect. Examples of these
methods are the three methods developed by MacKinnon and colleagues that were reviewed
earlier: the empirical distribution of ab, the distribution of the product of two standard
normal variables, and the asymmetric confidence limits for the distribution of the product ab.
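One simple way to obtain limits that respect this asymmetry is to simulate the product of two normal variables centered at the sample estimates and take percentiles of the simulated products. The sketch below is written in that spirit (product_ci_montecarlo is an illustrative helper under these assumptions, not the tabled critical-value procedure developed by MacKinnon and colleagues):

    import numpy as np

    def product_ci_montecarlo(a, b, se_a, se_b, draws=100_000, level=0.95, seed=0):
        """Approximate an asymmetric confidence interval for the indirect effect
        ab by simulating the product of two normal variables centered at the
        sample estimates and taking percentiles of the simulated products."""
        rng = np.random.default_rng(seed)
        products = rng.normal(a, se_a, draws) * rng.normal(b, se_b, draws)
        tail = (1 - level) / 2
        return np.quantile(products, [tail, 1 - tail])

    # The resulting interval need not be symmetric around a * b
    lower, upper = product_ci_montecarlo(0.39, 0.39, 0.12, 0.12)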
Later Simulation Studies on Assessing Mediation
MacKinnon et al. (2002) expanded on their prior simulation studies in two important
ways. First, they conducted a comprehensive review of methods of mediation analysis that
include both the multiple-test and single-test methods for assessing mediation. Second, they
thoroughly explored the statistical properties of these methods by determining the accuracy
of point estimates of the indirect effect and their standard error and calculating the power and
Type I error rates of all the methods.
MacKinnon et al. (2002) reviewed fourteen methods of mediation analysis that they
classified into three families of methods. One family of methods, which they called the
causal steps approach, is made up of the three tests of mediation reviewed earlier in this
chapter under the multiple-test framework: the test of full mediation (Judd & Kenny, 1981),
the basic plus bivariate test of mediation (Baron & Kenny, 1986), and the basic test of
mediation (Kenny et al. 1998; Cohen & Cohen, 1983). MacKinnon et al. referred to this
basic test as the test of joint significance of a and b. The two other families of mediation
tests they reviewed are product of coefficients approaches and difference-in-coefficients
approaches, which both fall under the single-test framework for assessing mediation. They
reviewed seven tests of mediation in the product of coefficients family (three of which were
developed by MacKinnon and colleagues) and four tests of mediation in the difference-in-
coefficients family. All combinations of parameter values when paths a, b, and c' were
individually set at 0, 0.14, 0.39, and 0.59 were simulated. The sample sizes they studied
were N = 50, 100, 200, 500, and 1000. The dependent and mediator variables were always
modeled as continuous variables while the independent variable was modeled as a continuous
variable and a binary variable. All the possible combinations of path sizes for a, b, and c',
type of independent variable (continuous or binary), and sample size resulted in 640 different
conditions. Five hundred replications were conducted for each condition.
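The count of 640 conditions follows from crossing the four values of each path with the two types of independent variable and the five sample sizes (4 x 4 x 4 x 2 x 5 = 640); the brief enumeration below simply verifies this arithmetic (the variable names are illustrative):

    from itertools import product

    path_values = [0, 0.14, 0.39, 0.59]
    sample_sizes = [50, 100, 200, 500, 1000]
    iv_types = ["continuous", "binary"]

    conditions = list(product(path_values, path_values, path_values,
                              iv_types, sample_sizes))
    assert len(conditions) == 640    # 4 * 4 * 4 * 2 * 5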
MacKinnon et al. (2002) limit the discussion of their results to the condition where
the independent variable is modeled as a continuous variable because they report that the
results for the binary and the continuous independent variable conditions were similar.
MacKinnon et al.'s (2002) first set of tables (Tables 2-3) shows how estimates of the
standard error of the indirect effect derived by difference of coefficients approaches and
products of coefficients approaches performed. MacKinnon et al.'s second set of tables
(Tables 4-6) detail the Type I error rates and statistical power of each of the three families of
approaches for the conditions when a = b and c' = 0. They focus their discussion of results on the conditions where a = b and c' = 0 because these results were representative of results from all the other conditions. When the tabled results were not representative of the other conditions, these exceptions were discussed in the text (e.g., they discussed the conditions when results differed across values of c' in the text). MacKinnon et al.'s third set of tables (Tables 7-9) detail the Type I error rates of each of the three families of approaches when a ≠ b (for example, when a or b = 0 and the other is nonzero).
MacKinnon et al. (2002) found that most estimates of the standard error of the
indirect effect were accurate, except for three. The Freedman and Schatzkin (1992) and the
Clogg et al. (1992) standard error estimates were much smaller than the true standard error
for all conditions and Goodman's (1960) estimate often yielded undefined standard errors.4
The Type I error rate was calculated by tabulating the number of times that an
indirect effect was found when there was no indirect effect simulated. In mediation analysis,
Type I error can occur when both paths a and b are equal to zero (a = b = 0) or when either path a or b is equal to zero and the other path is nonzero (a = 0 and b ≠ 0; a ≠ 0 and b = 0). Because the alpha level is set to .05, the expected rate of Type I errors is 5 per 100 replications, or 25 per 500. Statistical power (which is equal to 1 - the Type II error rate) was calculated by tabulating the number of times an indirect effect was found when one was simulated (i.e., when a ≠ 0 and b ≠ 0).
(Footnote4: This last finding contradicted their 1995 results.)
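Both the Type I error rate and statistical power reduce, in code, to a rejection rate computed over replications; the hypothetical helper below illustrates the tabulation, assuming one p-value (or equivalent decision) per replication:

    import numpy as np

    def rejection_rate(p_values, alpha=0.05):
        """Proportion of replications in which the test rejected the null of no
        indirect effect.  When no indirect effect was simulated this is an
        empirical Type I error rate; when a and b are both nonzero it is
        empirical power."""
        return float(np.mean(np.asarray(p_values) < alpha))

    # With 500 replications and alpha = .05, about 25 rejections (a rate of .05)
    # are expected when no indirect effect is simulated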
One challenge with respect to comparing the Type I error rate across methods is that
MacKinnon et al. (2002) present their results on the Type I error rate of the fourteen tests of
mediation in two different places. They report the Type I error rate when both paths a and b
equal zero in Tables 4-6 and report the Type I error rate when either path a or b equals zero
and the other path is nonzero in Tables 7-9. This makes it difficult to compare the frequency
of Type I errors across methods. This separation of results led to two different sets of
performance rankings of the tests of mediation.
When only the conditions in which a = b = 0 are considered, MacKinnon et al. (2002)
ranked the distribution of the product of two standard normal variables (MacKinnon et al.
1998 as cited in MacKinnon et al., 2002) and the empirical distribution of ab (MacKinnon et
al., 1998 as cited in MacKinnon et al., 2002) as the top two methods with the most accurate
Type I error rates and the greatest statistical power (p. 95, 98).5 When all of the conditions
that can produce Type I errors are taken into account (a = b = 0; a = 0 and b ≠ 0; a ≠ 0 and b = 0), the mediation tests that have the best balance of Type I error and power are the basic test of mediation (Kenny et al., 1998; Cohen & Cohen, 1983), referred to as the test of joint significance by MacKinnon et al. (2002), and the asymmetric confidence limits test (MacKinnon & Lockwood, 2001, as cited in MacKinnon et al., 2002) (p. 99). This is because
when either path a or b is equal to zero and the other path is nonzero, the distribution of the
product of two standard normal variables method and the empirical distribution of ab method
have Type I error rates that are too high (values ranged from .09 to .89).
(Footnote5: Four methods are ranked according to performance but the latter two methods,
the two difference-in-coefficients methods, have caveats attached to their use.)
One ramification of the finding that the distribution of the product of two standard
normal variables method and the empirical distribution of ab method have high Type I error
rates is that if these two methods are utilized there is a large chance that a trivial or
nonexistent indirect effect will be found to be statistically significant. An example of this
can be seen in MacKinnon et al.'s (2002) Table 9. For the distribution of products test, in a
sample of size N = 100, when path a is zero and path b is .14, the Type I error rate is .18. The
chances of finding a nonexistent indirect effect to be statistically significant are even greater
when either path a or b is large and the other path is zero. In Table 9 of MacKinnon et al.
(2002), for the distribution of products test, in a sample of size N = 100, when path a is zero
and path b is .59, the Type I error rate is .67.
Two more findings of MacKinnon et al. (2002) were that the full test of mediation
(Judd & Kenny, 1981) and the basic plus bivariate test of mediation (Baron & Kenny, 1986)
have "low Type I error rates and the lowest statistical power in all conditions studied" (p. 96,
98 and in abstract). These findings are especially apparent in the tables that focus on the
condition where c' is zero (Tables 4-6). However, these findings are understandable given
that the above two tests of mediation require that the bivariate correlation between the
predictor and criterion variables be significant. Because many of the bivariate correlations
included in the simulation study are small when c' = 0 (e.g., r = .0196 when a = b = .14 and r = .1521 when a = b = .39), it is expected that the bivariate effect would not be significant,
particularly in smaller samples drawn from these "small" and "moderate" populations. The
simulation conditions that are the focus of MacKinnon et al.'s discussion section are also the
conditions that are least favorable to the full test of mediation (Judd & Kenny, 1981) and the
basic plus bivariate test of mediation (Baron & Kenny, 1986). MacKinnon et al. (2002) do
note that the basic plus bivariate test of mediation (Baron & Kenny, 1986) had greater power
as c' increased (which would lead to larger bivariate correlations between X and Y).6 Some
subsequent authors (e.g. Frazier et al., 2004) have emphasized the above two findings and
have failed to note MacKinnon et al.'s overall conclusion that the basic test of mediation
(Kenny et al. 1998; Cohen & Cohen, 1983), which they label the test of joint significance, is
the method that performed the best across all conditions (p.99 and in abstract).
MacKinnon et al. (2002) qualify their finding that the basic test of mediation
performed best out of all methods across all conditions with the criticism that "no parameter
estimate or standard error of the intervening variable effect is available for the joint test of the significance of a and β so that effect sizes and confidence intervals are not directly available" (p. 99). They therefore conclude that "other tests that are close to the joint significance test in accuracy such as the asymmetric confidence interval test may be preferable as they do include an estimate of the magnitude of the intervening variable
effect" (p. 99). This argument is persuasive only if one accepts the premise that the single-
test framework for assessing mediation is the framework of choice (in which case it is a
foregone conclusion). Advocates of the multiple-test framework for assessing mediation
might argue that an estimate of the indirect effect is not necessary or desirable. The basic test
of mediation (Kenny et al., 1998; Cohen & Cohen, 1983) does not aspire to provide an
estimate of the indirect effect.
(Footnote6: MacKinnon et al. (2002) also note that the Judd and Kenny (1981) method had
less power as c' increased. This is because the Judd and Kenny (1981) test is a test of full
mediation which occurs only when c' is not significantly different from zero.)
The major premise of the basic test of mediation is that only paths a and b are required to be
significant in order to provide evidence for mediation. Missing from MacKinnon et al.'s
(2002) evaluation of the fourteen tests of mediation that they reviewed is the argument for
why the single-test framework for assessing mediation is preferable to the multiple-test
framework for assessing mediation. That is, there is no conceptual or pragmatic argument
that shows why researchers studying mediation should focus on the product term (ab), rather
than looking at paths a and b individually.
Although their 2002 study concluded that the basic test of mediation performed best
across all simulation conditions, MacKinnon and colleagues chose not to include this test in
their next simulation study (MacKinnon et al., 2004) and instead focused on further
exploring the performance of the asymmetric confidence limits test (MacKinnon &
Lockwood, 2001).
(Footnote7: The basic test of mediation provides two effect sizes, one estimate for path a and
another estimate for path b. Confidence intervals can be constructed around these effect
sizes in the usual manner, using the standard error of the parameter estimate.)
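As an illustration of the footnote's point, the hypothetical helper below (path_ci is an assumed name, not a procedure from the sources cited) constructs a t-based confidence interval for a single path estimate from its standard error and degrees of freedom:

    from scipy import stats

    def path_ci(estimate, se, df, level=0.95):
        """t-based confidence interval for a single regression path, e.g. path a
        from the regression of M on X, or path b from the regression of Y on M
        controlling for X."""
        t_crit = stats.t.ppf(1 - (1 - level) / 2, df)
        return estimate - t_crit * se, estimate + t_crit * se

    # Illustrative only: a 95% interval for a path estimated as .35 with SE = .10 and 97 df
    lower, upper = path_ci(0.35, 0.10, 97)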