Published by ct.fironika, 2020-07-17 11:54:37

Test of Mediation


The main focus of MacKinnon et al. (2004) was to compare methods that bracketed
the indirect effect with confidence intervals. They classified methods that estimated
confidence intervals analytically as single-sample methods and classified methods that
estimated confidence intervals empirically as resampling methods. MacKinnon et al. (2004)
had two goals. The first goal was to compare the performance of two single-sample
methods: the asymmetric confidence limits method (MacKinnon & Lockwood, 2001) and a
method with confidence intervals calculated by using Sobel's formula for the standard error
of the mediated effect. They designated the former the M method because they used Meeker
et al.'s (1981) tables of the distribution of the product of two normal random variables to obtain
critical values to construct upper and lower confidence intervals. They designated the latter
the z method because the standard normal distribution is used to obtain critical values to
construct upper and lower confidence intervals. Their first goal was accomplished by
conducting a simulation study (Study 1 in the article) that modeled five different sample sizes
(N = 50, 100, 200, 500, and 1000) and simulated all combinations of four parameter values
for paths a and b (0, .14, .39, and .59) for a total of eighty different conditions. The direct
effect, path c', was always set to zero. Ten thousand replications were conducted for each of
the eighty conditions.
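The z method can be sketched concretely. The following is a minimal Python illustration (the computational work in this dissertation uses R; Python is used here only for exposition, and the path estimates and standard errors below are hypothetical):

```python
import math

def sobel_z_ci(a, se_a, b, se_b, z_crit=1.96):
    """z-method confidence interval for the indirect effect ab:
    Sobel's first-order standard error combined with standard
    normal critical values."""
    ab = a * b
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    return ab - z_crit * se_ab, ab + z_crit * se_ab

# Hypothetical path estimates and standard errors:
lo, hi = sobel_z_ci(a=0.39, se_a=0.10, b=0.59, se_b=0.12)
```

The M method keeps the same standard error but replaces the symmetric plus-or-minus 1.96 critical values with asymmetric critical values read from Meeker et al.'s (1981) tables of the distribution of the product.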
The results of Study 1 replicated MacKinnon et al.'s (2002) results which showed that
the M method was superior to the z method. This result was to be expected because it has
been well established that methods that use the standard normal distribution to test the
significance of the indirect effect or construct confidence intervals around it will be
inaccurate because the indirect effect does not have a normal sampling distribution.
However, even though the M method was superior to the z method, the Type I error rate of

the M method was still lower than the expected rate of .025 of true values falling to the left
and to the right of the 95% confidence intervals. This lower than expected Type I error rate
was found when both paths a and b were zero and when only path a or b was zero and the
other was nonzero. MacKinnon et al. (2004) posited two possible explanations for the lower
than expected Type I error rate. One explanation was that the sampling distribution of the
indirect effect (ab) was the product of two t distributions rather than two normal
distributions. The second explanation was that the distribution of the indirect effect may
vary according to different combinations of estimates of paths a and b and therefore basing
the estimation of confidence intervals on a single distribution (e.g. the distribution of the
product of two normal variables) may lead to inaccurate intervals.
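The non-normality of the sampling distribution of the indirect effect is easy to see by simulation. A brief Python sketch (illustrative only; it draws the product of two independent standard normals, corresponding to the null case a = b = 0, and shows the heavy tails that make normal-theory intervals inaccurate):

```python
import random
import statistics

random.seed(1)
n = 100_000
# Product of two independent standard normal draws:
products = [random.gauss(0, 1) * random.gauss(0, 1) for _ in range(n)]

mean = statistics.fmean(products)
sd = statistics.pstdev(products)
# Excess kurtosis: roughly 0 for a normal distribution, but the
# product of two standard normals has excess kurtosis 6.
excess_kurtosis = statistics.fmean(
    [((p - mean) / sd) ** 4 for p in products]
) - 3
```

The pronounced leptokurtosis is one reason intervals built from standard normal critical values miscover.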

The second goal of MacKinnon et al. (2004) was to compare three single-sample
methods with six resampling methods. The three single-sample methods were the M method
and the z method evaluated in the first study, along with a new method that MacKinnon et al.
(2004) introduced, which they designated the empirical-M method. The empirical-M method
was developed in response to the findings in Study 1. MacKinnon et al. (2004) hoped to
improve on the M method by constructing a table of critical values similar to the ones found
in Meeker et al. (1981) that was also based on the distribution of the indirect effect (ab) but
was empirically rather than analytically derived. For each combination of paths a and b
found in Meeker et al. (1981), 10,000 samples were generated that were each replicated
1,000 times. The distribution of the product term {ab) was used to determine values for upper
and lower 95% confidence intervals. This new table was used in the same manner as the
Meeker tables were used in the M method. The six resampling methods were the jackknife,

the percentile bootstrap, the bias-corrected bootstrap, the bootstrap-t, the bootstrap-Q, and
Monte Carlo.
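Of the resampling methods listed, the percentile bootstrap is the simplest to state: resample cases with replacement, re-estimate ab in each resample, and read the confidence limits off the empirical distribution. A self-contained Python sketch (the OLS estimation is coded by hand here, and the data are illustrative; this shows the technique, not the authors' implementation):

```python
import random

def indirect_estimate(x, m, y):
    """OLS estimate of ab: a from regressing M on X, b from
    regressing Y on M controlling for X (normal equations)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((u - mx) * (v - mm) for u, v in zip(x, m))
    sxy = sum((u - mx) * (w - my) for u, w in zip(x, y))
    smy = sum((v - mm) * (w - my) for v, w in zip(m, y))
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for ab."""
    rng = random.Random(seed)
    n = len(x)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot.append(indirect_estimate([x[i] for i in idx],
                                      [m[i] for i in idx],
                                      [y[i] for i in idx]))
    boot.sort()
    return (boot[int(n_boot * alpha / 2)],
            boot[int(n_boot * (1 - alpha / 2)) - 1])

# Illustrative data with a true indirect effect of 0.5 * 0.5 = 0.25:
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(500)]
m = [0.5 * xi + rng.gauss(0, 0.5) for xi in x]
y = [0.5 * mi + rng.gauss(0, 0.5) for mi in m]
lo, hi = percentile_bootstrap_ci(x, m, y, n_boot=500)
```

The bias-corrected variant differs only in shifting the two percentile cut points when the bootstrap distribution is not centered on the sample estimate.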

The second goal was accomplished by conducting a second simulation study (Study
2) that modeled four sample sizes (N = 25, 50, 100, and 200) and ten combinations of paths a
and b (a=b=0; a=0 and b=.14; a=0 and b=.39; a=0 and b=.59; a=b=.14; a=.14 and b=.39;
a=.14 and b=.59; a=b=.39; a=.39 and b=.59; a=b=.59) for a total of forty different conditions.
One thousand replications were conducted for each of the forty combinations. Six
resampling methods were then applied to the forty thousand samples generated. Type I error
was assessed by calculating the number of times the value of zero fell to the left or to the
right of the confidence intervals. The bias-corrected bootstrap method had Type I error rates
that were closest to .025. The accuracy of the confidence intervals was assessed by looking
at how many times the 80%, 90% and 95% confidence intervals fell outside the Bradley
robustness interval. The bias-corrected bootstrap method emerged as the most accurate
method. Type I error rate and power at the alpha=.05 level were calculated across the four
sample sizes modeled and the nine methods simulated. The bias-corrected bootstrap method
again emerged as the method with the most accurate Type I error rate and the most power.
MacKinnon et al. (2004) note that they also investigated the performance of the bias-
corrected and accelerated bootstrap but did not report these results because the methods had
results comparable to, but not better than, the bias-corrected bootstrap.
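The bias-corrected bootstrap, which performed best in Study 2, differs from the percentile bootstrap only in where the bootstrap distribution is cut. A Python sketch of the bias-correction step (this is the simpler BC interval, without the acceleration constant that the BCa adds; the function and variable names are mine):

```python
from statistics import NormalDist

def bias_corrected_ci(boot_stats, point_est, alpha=0.05):
    """Bias-corrected (BC) bootstrap interval: shift the percentile
    endpoints by z0, the standard normal quantile of the proportion
    of bootstrap estimates falling below the point estimate."""
    nd = NormalDist()
    b = sorted(boot_stats)
    prop = sum(s < point_est for s in b) / len(b)
    z0 = nd.inv_cdf(min(max(prop, 1e-6), 1 - 1e-6))  # guard 0 and 1
    z_lo = nd.inv_cdf(alpha / 2)
    z_hi = nd.inv_cdf(1 - alpha / 2)
    p_lo = nd.cdf(2 * z0 + z_lo)   # adjusted percentile positions
    p_hi = nd.cdf(2 * z0 + z_hi)
    lo = b[min(int(p_lo * len(b)), len(b) - 1)]
    hi = b[min(int(p_hi * len(b)), len(b) - 1)]
    return lo, hi
```

When the bootstrap distribution is centered on the point estimate, z0 = 0 and the interval reduces to the ordinary percentile interval; a skewed bootstrap distribution, such as that of ab, shifts both endpoints in the direction of the skew.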
A Summary of Recent Simulation Studies

Most of the simulation studies that have been conducted have focused on methods of
assessing mediation that fall under the single-test framework (MacKinnon & Dwyer, 1993;
MacKinnon et al., 1995; Shrout & Bolger, 2002; MacKinnon et al., 2004). Only MacKinnon

et al. (2002) have compared methods that fall under the single and multiple-test frameworks
to each other. In this review (MacKinnon et al., 2002), fourteen methods for assessing
mediation were compared to each other on statistical grounds such as Type I error rate and
power. Across all of the conditions that were simulated, the test that had the best balance of
Type I error and power was the basic test of mediation (Kenny et al., 1998; Cohen & Cohen,
1983). However, this finding may be somewhat lost given the prominence of single-test
framework methods in the literature. This finding has also been overlooked by researchers
who have suggested that the basic test of mediation is not sufficient for assessing mediation
and that what is necessary for establishing mediation is testing the significance of the indirect
effect (Frazier et al., 2004). Mallinckrodt et al. (2006) noted MacKinnon et al.'s (2002)
finding that the basic test of mediation provides the best balance between Type I and Type II
errors and yet suggested that most counseling psychologists are interested in estimating the
magnitude of the indirect effect and bracketing the estimate with a confidence interval and so
would most likely choose one of the methods under the single-test framework (p. 377).
Preacher and Hayes (2004) have been critical of Baron and Kenny (1986). They conjecture
that because Baron and Kenny (1986) require two significance tests (of paths a and b) instead
of one significance test (of ab), the former may incur more Type I and Type II errors than the
latter.

Directions for Future Research
Researchers wishing to conduct a mediation analysis today are faced with many
possibilities. What standards should researchers look to to guide their choice of method?
Researchers may be drawn to methods that fall under the single-test framework because
recent simulation studies have shown that these methods are powerful tests of mediation. In


this study I will revisit the statistical standard that has been an important focus of the

mediation literature thus far and examine the statistical performance of three mediation

methods that have yet to be compared to one another. I will also argue that researchers

should consider conceptual and pragmatic standards along with the statistical standard when

deciding on which mediation method to employ for their studies.

Researchers (Frazier et al., 2004; MacKinnon et al., 2002; Mallinckrodt et al., 2006)

have asserted that tests of mediation that yield an effect size for the indirect effect (i.e.

mediation analyses that fall under the single-test framework) are preferable to those that do

not. However, this is a debatable position - not one that can be taken for granted. The recent

trend in the mediation literature has been to focus on methods that fall under the single-test

framework. The apparent consensus in the research community in the movement towards the

single-test framework appears to support the assumption of the superiority of the single-test

framework to the multiple-test framework as a foregone conclusion. However, is there

enough evidence to support this conclusion? Few arguments for this position have been

clearly articulated and not enough evidence has been accumulated to support the movement

towards adopting the single-test framework as the preferred framework in mediation analysis.

When mediation methods have been compared to one another, the focus of the

comparisons has been on the statistical performance of these methods. Perhaps mediation

methods should be evaluated on more than simply the statistical standards that have so far

been used to assess their performance. Two other standards that could be considered when

evaluating mediation methods are the conceptual standard and the pragmatic standard.

In MacKinnon et al.'s (2002) comparison of fourteen methods for assessing

mediation a fundamental philosophical difference between the methods was not articulated.

The multiple-test framework and the single-test framework for assessing mediation differ in
how mediation is conceptualized. As long as the fundamental philosophical differences
between the two approaches of assessing mediation remain unarticulated the process of
trying to decide which of the many mediation methods to employ is blurred. Before
mediation analyses are compared to one another, the theoretical underpinnings of the two
frameworks in which the methods are nested should be evaluated. Does one framework make
more conceptual sense than the other?

The multiple-test framework conceptualizes mediation as a causal chain. This causal
chain is only as strong as its weakest link. Mediation is assessed by testing the strength (or
the significance) of each link in the hypothesized causal chain. If any link on the causal
chain is not significantly different from zero, then the mediation hypothesis is not supported.
Moreover, the multiple-test framework is interested in the effect size of each link in the
causal chain. Having some idea of the practical significance of each link helps researchers in
the two main tasks for which mediation analysis is helpful: theory building and specification
and program evaluation and outcome research.

The single-test framework assesses mediation by testing the significance of the
overall indirect effect. Using the tracing rule the indirect effect can be quantified as the
product of the path that links the predictor variable to the mediator (path a) and the path that
links the mediator to the criterion variable (path b). This approach yields an effect size for the
indirect effect and tests the significance of the indirect effect. Preacher and Hayes (2004)
stated, "a significance test associated with ab should address mediation more directly than a
series of separate significance tests not directly involving ab" (p. 719).
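The algebra behind this quantification is the standard decomposition of the total effect in the three-variable linear mediation model. A brief Python sketch (the parameter values are the medium and large effect sizes used elsewhere in this dissertation):

```python
def decompose_total_effect(a, b, c_prime):
    """In the linear mediation model, the total effect of X on Y
    is the direct effect c' plus the indirect effect ab."""
    indirect = a * b
    total = c_prime + indirect
    return total, indirect

# Medium a path, large b path, no direct effect:
total, indirect = decompose_total_effect(a=0.39, b=0.59, c_prime=0.0)
```

With c' = 0, the total effect of X on Y is carried entirely by the indirect path through M.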

Applying the conceptual standard to mediation analyses involves asking the question,
Does one framework for mediation analysis make more conceptual sense than the other?
When studying the mechanisms by which one variable affects another variable, does it make
more conceptual sense to test the significance of each link in a hypothesized causal chain or
to focus on the magnitude of the overall indirect effect?
Another standard that could be applied to mediation analyses is the pragmatic
standard. Wilkinson et al. (1999) have strongly recommended that when choosing between
quantitative methods, researchers should use the minimally sufficient analysis assuming that
the strength and assumptions of the simpler analysis are appropriate for their data. Cohen
(1990), nine years earlier, made a similar point. One of the major lessons Cohen had learned
in his years of applying statistics to psychology is that "simple is better" when it comes to
"the representation, analysis, and reporting of data" (p. 1305). He encouraged researchers to
stay in close touch with their data, understand the statistics that underlie their analysis, and
use the simplest statistical analysis that will serve the purpose of their study. Methods for
assessing mediation should not only be evaluated according to their statistical accuracy, they
should also be evaluated according to their ease of use and the interpretability of their results.
Applying the pragmatic standard to mediation methods involves asking the question, which
statistical analysis is the simplest analysis that is sufficient to establish that mediation is
occurring?
Overview of Research Questions
The purpose of this study is to extend the literature on mediation analysis by
comparing three mediation methods that are considered to be the strongest in their respective
branches of mediation and that have yet to be compared to one another through a simulation


study. The three methods that will be the focus of this dissertation are the basic test of

mediation (that falls under the multiple-test framework), the asymmetric confidence limits

method (that falls under the products of coefficient approach under the single-test

framework) and the bias-corrected and accelerated bootstrap method (that falls under

bootstrap approaches under the single-test framework). The difference-in-coefficients

approach that falls under the single-test framework will not be represented in this study as

these methods have performed poorly compared to product-of-coefficients approaches and

bootstrap approaches.

The three methods of mediation that will be compared in this study will be evaluated

according to three standards: statistical, conceptual, and pragmatic. Studies to date have

focused on comparing mediation methods with respect to their statistical properties.

Although statistical performance is an important consideration, other criteria should be

considered when deciding on which statistical analysis to use. The conceptual standard asks

researchers to consider which method best captures the conceptual meaning of mediation.

The pragmatic standard asks researchers to consider which among the available statistical

methods is the minimally sufficient analysis appropriate to their data and their research

question.

The Method and Results chapters of this dissertation will primarily address the

statistical standard that has thus far been the focus of prior simulation studies. The four

research questions that will be addressed are:

1. What are the Type I error rates of each mediation method across varying

population parameters when there is no indirect effect simulated?


2. What are the Type II error rates and power of each mediation method across

varying population parameters when an indirect effect is simulated?

3. How do the three methods of mediation compare to one another when there is no

indirect effect simulated? When the results of two mediation methods are compared directly

to one another, how often do they agree with one another? Disagree with one another? And, if

there is disagreement between two methods, is there a particular direction to the

disagreement?

4. How do the three methods of mediation compare to one another when an indirect

effect is simulated? When the results of two mediation methods are compared directly to one

another, how often do they agree with one another? Disagree with one another? And, if there is

disagreement between two methods, is there a particular direction to the disagreement?

The conceptual and pragmatic standards discussed earlier in this chapter will be

revisited in the Discussion chapter of this dissertation. Findings from the statistical analyses

will be used to illustrate points that will be made in the Discussion section with respect to the

conceptual standard and the pragmatic standard. One question that will be addressed that is

relevant to the conceptual standard is, Are there conditions where applying the multiple-test

framework or the single-test framework have important conceptual ramifications? In other

words, are there conditions where it is evident that one framework should be applied rather

than the other?

Questions that are relevant to the pragmatic standard are, How accessible were these

three mediation methods? Were the results easily interpretable? Do any of these mediation

methods speak to the practical significance of the relationships between variables? In the

research community there are strong incentives for researchers to use the most powerful test

that they can find. The pragmatic standard asks the questions, Can a mediation method be
too powerful? Does utilizing high-powered methods increase the chance that trivial
relationships between variables are found to be significant? The pragmatic standard takes
into consideration the importance of the practical significance, as well as the statistical
significance, of the relationship between variables.


Chapter 3

The purpose of this simulation study is to evaluate the statistical properties of three

tests of mediation that have yet to be compared to one another. The three mediation methods

that will be the focus of this study are: the basic test of mediation (Kenny et al., 1998), the

asymmetric confidence limits method (MacKinnon et al., 2004), and the bias-corrected and

accelerated bootstrap method (Efron & Tibshirani, 1993). These three methods will be

compared in two ways. The first way that the statistical properties of the three mediation

methods will be assessed is by comparing the results of the mediation analysis to the actual

conditions that were modeled. This is how the statistical properties of mediation methods

have been evaluated in past simulation studies (MacKinnon et al., 2002; MacKinnon et al.,

2004). By comparing the results of the mediation analysis to the actual conditions that were

modeled, the Type I error rate, the Type II error rate, and the power of each method can be

determined.

The conditions that will be modeled in this simulation study will vary along the

parameters of sample size, the effect size of path a, and the effect size of path b. The sample

sizes that will be modeled in this simulation study are 50, 100, 200, and 500. These sample

sizes were modeled by MacKinnon et al. (2002) and were chosen because they are

comparable to the most common sample sizes found in the social sciences. The effect sizes

of paths a and b that will be modeled are also those that were simulated by MacKinnon et al.

(2002): 0 (a condition that is included in order to evaluate Type I error rate), 0.14, 0.39, and

0.59. The latter three values represent small, medium, and large effect sizes, respectively.

The predictor, mediator, and criterion variables will all be modeled as continuous variables

with multivariate normal distributions. Previous simulation studies (MacKinnon et al., 2002;


MacKinnon et al., 1995) have shown that when variables have been modeled as both

categorical and continuous, the subsequent results have been comparable to each other.

Because continuous variables are more prevalent than categorical variables in the areas of

research of interest to counseling psychologists, continuous variables will be modeled in this

study.

The second way that the statistical properties of the three mediation methods will be

assessed is by comparing the methods to each other in a pairwise fashion. There are three

possible combinations of comparing two methods to one another: the basic test of mediation

will be compared to the asymmetric confidence limits method, the basic test of mediation

will be compared to the bias-corrected bootstrap method, and the asymmetric confidence

limits method will be compared to bias-corrected bootstrap method. These pairwise

comparisons will be carried out under two conditions: when an indirect effect exists (i.e.

paths a and b are nonzero) and when an indirect effect does not exist (i.e. when either path a

or path b = 0). These pairwise comparisons will be conducted in order to ascertain the extent

of agreement or disagreement between the results of two mediation methods and to ascertain

whether there is a pattern of agreement or disagreement between the results of the two

methods of mediation.

In order to explore the association between the results of two mediation methods, a 2

x 2 table will be constructed that will depict when each method detects an indirect effect and

when it does not. Each 2 x 2 table will have four quadrants: one quadrant will represent the

conditions when both methods detect a significant indirect effect; a second quadrant will

represent the condition when both methods do not detect an indirect effect; and the third and

fourth quadrants will represent the conditions when one method detects a significant effect

and the other doesn't (one method will have arrived at the correct conclusion and the other at

the incorrect conclusion). The four quadrants will take on different meanings depending on

whether or not an indirect effect was modeled. Under the conditions when an indirect effect

was not modeled, the quadrant which represents the instances when both methods detect a

significant indirect effect represents instances when both methods committed Type I errors.

Under the conditions when an indirect effect was modeled, the quadrant which represents the

instances when both methods detect a significant indirect effect represents the power of the

methods; the instances when both methods correctly detected an indirect effect. For

example, under the condition when an indirect effect does not exist, the 2 x 2 table will be set

up as follows:

Figure 3.1

Pairwise Comparisons When No Indirect Effect is Modeled (ab=0)

                                        Method 1
                        Significant                Non-significant
Method 2
  Significant           Both methods commit a      Method 1 - correct
                        Type I error.              Method 2 - Type I error

  Non-significant       Method 1 - Type I error    Both methods are correct.
                        Method 2 - correct

(Diagonal cells are areas of agreement; off-diagonal cells are areas of disagreement.)

Under the condition when an indirect effect does exist, the 2 x 2 table will represent

the following conditions:

Figure 3.2

Pairwise Comparisons When an Indirect Effect is Modeled (ab>0)

                                        Method 1
                        Significant                Non-significant
Method 2
  Significant           Both methods are           Method 1 - Type II error
                        correct.                   Method 2 - correct

  Non-significant       Method 1 - correct         Both methods commit a
                        Method 2 - Type II error   Type II error.

(Diagonal cells are areas of agreement; off-diagonal cells are areas of disagreement.)

For the purpose of convenience, all results from these pairwise comparisons will be

displayed in six tables (three tables each for when an indirect effect is simulated and when it

is not simulated). The tables will have the following format:

Table 3.1

Sample Table: Concordance Between Two Methods When No Indirect Effect Exists (ab=0)

Both commit        Both fail          Method 1 only -    Method 2 only -
Type I error       to reject H0       Type I error       Type I error

The pairwise comparisons will reveal the extent to which the results of two methods
agree or disagree with each other. The pairwise comparisons will also reveal whether there is
a consistent pattern of agreement or disagreement between the results of two methods. When
there is disagreement, I will look at which population parameters increase the rate of
disagreement. These are areas of study that have yet to be addressed in the literature on
mediation.
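Tallying these pairwise results is mechanical once each method's decisions are recorded. A Python sketch of the cross-tabulation (the decision lists here are hypothetical; the dissertation's own analyses are conducted in R):

```python
from collections import Counter

def concordance_table(sig_1, sig_2):
    """Cross-tabulate two methods' significance decisions into the
    four quadrants of the 2 x 2 comparison table."""
    counts = Counter(zip(sig_1, sig_2))
    return {
        "both_significant": counts[(True, True)],
        "method1_only": counts[(True, False)],
        "method2_only": counts[(False, True)],
        "neither": counts[(False, False)],
    }

# Hypothetical decisions across five replications:
table = concordance_table([True, True, False, False, True],
                          [True, False, False, True, True])
```

The diagonal entries ("both_significant" and "neither") are the areas of agreement; the off-diagonal entries are the areas of disagreement.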

The degree of 2 x 2 association in the pairwise comparisons can be measured using
Cohen's kappa (K). This coefficient is typically used as a measure of inter-rater agreement
but it can also be used as an index of agreement between dichotomous variables. In this study
the methods can be likened to raters that conclude whether or not there is a significant
indirect effect. Cohen's kappa takes into account the amount of agreement that can occur by
chance and is therefore considered to be a superior measure of agreement when compared to
a simple calculation of percent agreement. Cohen's kappa is defined as follows:

K = (po - pc) / (1 - pc)

where po is equal to the proportion of agreement between two raters (and in this
study, between two methods) and pc is equal to the proportion of units for which agreement
is expected by chance. When the observed agreement is equal to chance agreement, K = 0.
When observed agreement exceeds chance agreement, K > 0. When observed agreement is
less than chance agreement, K < 0. K is equal to 1 when there is perfect agreement between
raters.
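From the four quadrant counts, kappa follows directly. A Python sketch, with chance agreement computed from the marginal proportions of the 2 x 2 table:

```python
def cohens_kappa(both_sig, only_1, only_2, neither):
    """Cohen's kappa for a 2 x 2 agreement table between two
    mediation methods: (po - pc) / (1 - pc)."""
    n = both_sig + only_1 + only_2 + neither
    po = (both_sig + neither) / n          # observed agreement
    p1_sig = (both_sig + only_1) / n       # method 1 marginal
    p2_sig = (both_sig + only_2) / n       # method 2 marginal
    # Chance agreement: both significant by chance plus both
    # non-significant by chance.
    pc = p1_sig * p2_sig + (1 - p1_sig) * (1 - p2_sig)
    return (po - pc) / (1 - pc)
```

For example, perfect agreement yields K = 1, while agreement at exactly the chance rate yields K = 0.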
The Simulation Study

The R programming language (R Development Core Team, 2007) will be used for
this simulation study. In all, 64 conditions will be modeled representing all possible
combinations of sample size (N=50, 100, 200 and 500), and effect sizes for paths a and b (0,
0.14, 0.39, and 0.59). A Monte Carlo simulation study is useful when comparing the
performance of statistical methods because the accuracy of the methods can be checked
against the population parameters that were simulated (i.e. the results of the statistical
methods can be compared to the "answers"). In a Monte Carlo simulation, an artificial
population or pseudo-population is generated according to the population parameters set by
the researcher. Samples are drawn from this pseudo-population (pseudo-samples) in a
manner that resembles conducting replications of a study in the real world. In this study,
1000 pseudo-samples will be simulated for each of the 64 conditions. The three mediation
methods that will be examined in this study will be applied to the 64,000 pseudo-samples
drawn. For the bias-corrected bootstrap method, 1000 bootstrap samples of each of the

64,000 pseudo-samples will be drawn (the minimum number of replications suggested by
Shrout and Bolger, 2002).

The pseudo-samples in this study will be simulated using the R function mvrnorm
from the package MASS, which is loaded into R by using the command "library(MASS)"
(Venables and Ripley, 2002). This function simulates samples from a specified multivariate
normal distribution and uses the eigen decomposition for matrix decomposition. The bias-corrected
bootstrap method will be implemented by using the R package entitled boot (Ripley, 2007;
Davison & Hinkley, 1997) which is loaded onto R by using the command "library(boot)".
The bias-corrected bootstrap method is called "bca" by the package boot. MacKinnon, Fritz,
Williams, and Lockwood (2007) have provided a program (PRODCLIN) that implements the
asymmetric confidence limits method in a number of programming languages, including R.
The basic test of mediation does not require programs other than R or any additional R
packages in order to be implemented.
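The data-generating step of this Monte Carlo design can be mirrored outside R. A Python sketch of drawing one pseudo-sample from the mediation model (unit-variance normal errors are an assumption made here for illustration; the study itself uses mvrnorm with a specified multivariate normal distribution):

```python
import random

def pseudo_sample(n, a, b, c_prime=0.0, seed=None):
    """Draw one pseudo-sample (X, M, Y) from the three-variable
    mediation model: M = a*X + e1, Y = c'*X + b*M + e2, with X,
    e1, and e2 standard normal (an illustrative assumption)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    m = [a * xi + rng.gauss(0, 1) for xi in x]
    y = [c_prime * xi + b * mi + rng.gauss(0, 1)
         for xi, mi in zip(x, m)]
    return x, m, y

# One pseudo-sample from a medium-a, large-b condition:
x, m, y = pseudo_sample(n=500, a=0.39, b=0.59, seed=1)
```

Repeating this draw 1000 times per condition, and applying each mediation method to every pseudo-sample, reproduces the basic shape of the simulation described above.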


Chapter Four

This simulation study compared the performance of three methods of assessing

mediation across sixty-four conditions that varied across sample size, the strength of the

relationship between predictor variable and the mediator, and the strength of the relationship

between the mediator variable and the criterion variable. The three methods that were

compared to one another were the bias-corrected and accelerated bootstrap (BCa) method, the

asymmetric confidence limits (ACL) method, and the basic test of mediation (BT). Tables

A.l and A.2 (see Appendix) summarize the conditions that were simulated in this study.

Type I Error Rates

The Type I error rates of the three methods are displayed in Table 4.1 and in Figures

4.1 to 4.7. The three methods had similar Type I error rates across conditions in the larger

sample sizes (N = 200 and N = 500). The graphs of the Type I error rates of the three

methods show that there is more divergence between the three methods in the conditions

where the sample size is smaller and when the non-zero path is medium or large. The BCa

had a Type I error rate of .05 or higher in seventeen of the twenty-eight conditions in which

no indirect effect was modeled (61%), the ACL in fifteen of twenty-eight conditions (54%),

and the BT in eleven of twenty-eight conditions (39%). The mean Type I error rate for the

conditions where the Type I error rate was greater than or equal to .05 was .07 for BCa and

.06 for both ACL and BT.

Methods that assess mediation by testing the significance of the indirect effect (such

as BCa and ACL) may be more prone to committing a Type I error when either path a or path

b is large because if the other path is small but insignificant, the indirect effect (the product

term ab) may be found to be significant because of the influence of the large path. The fact

that the BCa method had the highest Type I error rate among the three methods when sample
sizes were small and either path a or b were medium or large is worthy of note because the
BCa is most recommended for use when sample sizes are small.


Table 4.1

Type I Error Rates for BCa, ACL, and BT

Condition    n      a      b      ab     BCa     ACL     BT
 1           50     0      0      0     0.010   0.002   0.001
 2          100     0      0      0     0.007   0.004   0.004
 3          200     0      0      0     0.007   0.001   0.001
 4          500     0      0      0     0.006   0.003   0.003

 5           50     0     0.14    0     0.019   0.007   0.005
 6          100     0     0.14    0     0.022   0.016   0.015
 7          200     0     0.14    0     0.042   0.025   0.023
 8          500     0     0.14    0     0.069   0.049   0.042

 9           50     0     0.39    0     0.087   0.054   0.040
10          100     0     0.39    0     0.078   0.053   0.047
11          200     0     0.39    0     0.064   0.056   0.053
12          500     0     0.39    0     0.039   0.036   0.032

13           50     0     0.59    0     0.077   0.062   0.049
14          100     0     0.59    0     0.082   0.068   0.066
15          200     0     0.59    0     0.064   0.058   0.056
16          500     0     0.59    0     0.066   0.058   0.056

17           50    0.14    0      0     0.018   0.007   0.006
18          100    0.14    0      0     0.034   0.019   0.019
19          200    0.14    0      0     0.051   0.037   0.033
20          500    0.14    0      0     0.066   0.048   0.042

33           50    0.39    0      0     0.069   0.041   0.033
34          100    0.39    0      0     0.070   0.052   0.044
35          200    0.39    0      0     0.074   0.062   0.059
36          500    0.39    0      0     0.051   0.047   0.045

49           50    0.59    0      0     0.099   0.083   0.070
50          100    0.59    0      0     0.071   0.067   0.061
51          200    0.59    0      0     0.065   0.055   0.054
52          500    0.59    0      0     0.034   0.028   0.028

Note. a = population path coefficient for effect of X on M; b = population path coefficient
for effect of M on Y (controlling for X); ab = product (indirect effect). Direct effects of X on
Y (path c') are zero for all conditions.

Figure 4.1
Type I error rates (a = 0, b = 0). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

Figure 4.2
Type I error rates (a = 0, b = .14). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

Figure 4.3
Type I error rates (a = 0, b = .39). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

Figure 4.4
Type I error rates (a = 0, b = .59). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

Figure 4.5
Type I error rates (a = .14, b = 0). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

Figure 4.6
Type I error rates (a = .39, b = 0). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

Figure 4.7
Type I error rates (a = .59, b = 0). [Line plot of Type I error rate against sample size for the BCa, ACL, and BT methods.]

67

Power

The three methods performed similarly to one another with respect to power. The power
rates of the three methods are displayed in Table 4.2 and in Figures 4.8 to 4.16. The BCa had
slightly more power than the other two methods across most conditions. The ACL and BT
methods were the most similar to one another in performance; the BT method had slightly
less power than the ACL method across most conditions.

A power level of .80 (Cohen, 1992) has been adopted as an ideal level of power to aim for in
planning research. The BCa method had power levels of .80 or higher in eighteen of the
thirty-six conditions in which an indirect effect was modeled (50%). Both the ACL and BT
approaches had power levels of .80 or higher in seventeen of the thirty-six conditions where
an indirect effect was present (47%). The conditions in which the three methods had less
than .80 power overlapped with one another. Only in one condition did the BCa achieve a
power level of .80 or higher while the other methods did not (see Condition 24 in Table 4.2).
The mean power level for the conditions in which power was greater than or equal to .80 was
.95 for the BCa, .96 for the ACL, and .95 for the BT (the mean for the BCa was brought
down by Condition 24 in Table 4.2).

Table 4.2

Power to Detect an Indirect Effect for BCa, ACL, and BT

Condition n a b ab BCa ACL BT

21 50 0.14 0.14 0.02 0.045 0.028 0.020
22 100 0.14 0.14 0.02 0.110 0.099 0.082
23 200 0.14 0.14 0.02 0.339 0.291 0.274
24 500 0.14 0.14 0.02 0.822 0.783 0.767

25 50 0.14 0.39 0.05 0.179 0.148 0.122
26 100 0.14 0.39 0.05 0.332 0.299 0.277
27 200 0.14 0.39 0.05 0.559 0.538 0.520
28 500 0.14 0.39 0.05 0.907 0.907 0.901

29 50 0.14 0.59 0.08 0.227 0.196 0.180
30 100 0.14 0.59 0.08 0.346 0.326 0.313
31 200 0.14 0.59 0.08 0.526 0.515 0.512
32 500 0.14 0.59 0.08 0.893 0.898 0.896

37 50 0.39 0.14 0.05 0.170 0.138 0.117
38 100 0.39 0.14 0.05 0.292 0.265 0.240
39 200 0.39 0.14 0.05 0.496 0.472 0.458
40 500 0.39 0.14 0.05 0.820 0.821 0.818

41 50 0.39 0.39 0.15 0.689 0.650 0.618
42 100 0.39 0.39 0.15 0.960 0.986 0.952
43 200 0.39 0.39 0.15 0.999 0.999 0.999
44 500 0.39 0.39 0.15 1.000 1.000 1.000

45 50 0.39 0.59 0.23 0.820 0.818 0.799
46 100 0.39 0.59 0.23 0.976 0.978 0.978
47 200 0.39 0.59 0.23 1.000 1.000 1.000
48 500 0.39 0.59 0.23 1.000 1.000 1.000

53 50 0.59 0.14 0.08 0.152 0.134 0.117
54 100 0.59 0.14 0.08 0.203 0.191 0.179
55 200 0.59 0.14 0.08 0.356 0.354 0.342
56 500 0.59 0.14 0.08 0.713 0.718 0.718

57 50 0.59 0.39 0.23 0.684 0.674 0.651
58 100 0.59 0.39 0.23 0.920 0.926 0.920
59 200 0.59 0.39 0.23 0.995 0.996 0.996
60 500 0.59 0.39 0.23 1.000 1.000 1.000

61 50 0.59 0.59 0.35 0.959 0.970 0.965
62 100 0.59 0.59 0.35 1.000 1.000 1.000
63 200 0.59 0.59 0.35 1.000 1.000 1.000
64 500 0.59 0.59 0.35 1.000 1.000 1.000


Note. a = population path coefficient for effect of X on M; b = population path coefficient
for effect of M on Y (controlling for X); ab = product (indirect effect). Direct effects of X on
Y (path c') are zero for all conditions.

Figure 4.8. Power (a = .14, b = .14) by sample size for the BCa, ACL, and BT tests.

Figure 4.9. Power (a = .14, b = .39) by sample size for the BCa, ACL, and BT tests.

Figure 4.10. Power (a = .14, b = .59) by sample size for the BCa, ACL, and BT tests.

Figure 4.11. Power (a = .39, b = .14) by sample size for the BCa, ACL, and BT tests.

Figure 4.12. Power (a = .39, b = .39) by sample size for the BCa, ACL, and BT tests.

Figure 4.13. Power (a = .39, b = .59) by sample size for the BCa, ACL, and BT tests.

Figure 4.14. Power (a = .59, b = .14) by sample size for the BCa, ACL, and BT tests.

Figure 4.15. Power (a = .59, b = .39) by sample size for the BCa, ACL, and BT tests.

Figure 4.16. Power (a = .59, b = .59) by sample size for the BCa, ACL, and BT tests.
Figures 4.8 to 4.16 show that, as expected, within each condition power levels increased as
sample size increased. Another pattern discernible through the graphs is that for the same
effect size for path ab, there is lower power in the conditions where path a is greater than
path b than there is for the conditions where path b is greater than path a. One example of
this can be seen when Conditions 29-32 are compared with Conditions 53-56, which are
depicted in Figures 4.10 and 4.14 respectively. Although the effect size of ab is the same for
all of the above conditions (ab = .08), the power to detect the indirect effect exceeds the
desirable .80 power level when N = 500, path a = .14, and path b = .59. In the reverse
condition where N = 500, path a = .59, and path b = .14, power is below the .80 level.
Reasons for this pattern are discussed in Kenny et al. (1998) and Hoyle and Kenny (1999).
One reason for this pattern is that the collinearity between the predictor variable and the
mediator variable influences the power to detect any relationship between the mediator
variable and the criterion variable. The more variance in the mediator that is accounted for
by the predictor, the less unique variance there is in the mediator to explain the variance in
the criterion variable. Kenny et al. (1998) provided an equation to aid in conducting power
analyses, which estimated that the effective sample size for the test of path b is approximately
N(1 - rXM²), where N is the total sample size and rXM is the correlation between the predictor
variable and the mediator variable. It is evident from this equation that as path a increases in
size, the sample size required to maintain a particular level of power for the test of path b
also increases. All three methods use multiple regression to estimate paths a and b; therefore,
all the methods are affected by collinearity between the predictor variable and the mediator
variable.
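The effect of this approximation is easy to see numerically. The following sketch (the helper name is mine) applies the Kenny et al. (1998) formula, noting that with standardized variables the X-M correlation equals path a:

```python
def effective_n(n, r_xm):
    """Approximate effective sample size for the test of path b,
    per Kenny et al. (1998): N(1 - r_XM^2)."""
    return n * (1 - r_xm ** 2)

# With standardized variables, r_XM equals path a, so the larger path a is,
# the smaller the effective sample size available for testing path b.
for a in (0.14, 0.39, 0.59):
    print(a, round(effective_n(500, a), 1))
```

At N = 500, raising path a from .14 to .59 shrinks the effective sample size for the test of path b by roughly a third, which is consistent with the lower power observed in the large-a, small-b conditions.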

Hoyle and Kenny (1999) showed that when Sobel's formula for the standard error of
the indirect effect is used to test the significance of the indirect effect, the test of ab is
maximized when a equals

a = √{[(ab)² - √((ab)² - (ab)⁴)] / [2(ab)² - 1]}     (4.1)

They calculated the optimal value of a in what they defined as their high-collinearity
condition (ab = .18) and their low-collinearity condition (ab = .12) and showed that the
optimal value of a is less than that of b in both conditions. Moreover, the magnitude of the
difference between the optimal values of paths a and b is greater in the high-collinearity
condition. From this pattern they concluded that the power to detect mediated effects is
highest when path b is larger than path a, particularly when there is a large mediated effect.
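Equation 4.1 can be checked numerically. The sketch below (function name mine) evaluates it for the two collinearity conditions just described and reproduces the reported pattern: the optimal a is smaller than the implied b, and the gap widens for the larger indirect effect.

```python
import math

def optimal_a(ab):
    """Value of path a that maximizes the Sobel test of the product ab
    (Equation 4.1)."""
    numerator = ab ** 2 - math.sqrt(ab ** 2 - ab ** 4)
    denominator = 2 * ab ** 2 - 1
    return math.sqrt(numerator / denominator)

for ab in (0.12, 0.18):  # low- and high-collinearity conditions
    a = optimal_a(ab)
    b = ab / a  # the b that pairs with the optimal a for this product
    # In both conditions a < b, and b - a is larger when ab = .18.
    print(f"ab = {ab}: optimal a = {a:.3f}, implied b = {b:.3f}")
```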
Pairwise Comparisons

Agreement between Methods
The performance of each of the three mediation methods was compared to the others
in a pairwise fashion in order to assess the extent of agreement and disagreement between the
methods and to ascertain whether there is a pattern of agreement or disagreement between the
methods. The ACL was compared to the BT, the BCa was compared to the BT, and the ACL
was compared to the BCa. The results of the methods were compared pairwise for each of
the 1000 replications in each of the 64 conditions. For each pair of methods, agreement and
disagreement were coded in the R syntax as shown in Table 4.3. In a two-tailed significance
test there are three possible outcomes: reject H0 and find a significant positive effect, reject

H0 and find a significant negative effect, and fail to reject H0. Thus when two methods are

used to test a hypothesis in the same sample, there are 3 x 3 or nine possible pairwise

outcomes. Three of these outcomes (codes 0, 1, and 6 in Table 4.3) represent agreement and

the other six represent disagreement (i.e. the two tests lead to different results in that sample).

Table 4.3

Possible Outcomes of Pairwise Comparisons

Code Outcome

0 Both methods fail to reject the null hypothesis.

1 Both methods conclude there is a significant positive effect.

2 Method 1 concludes there is a significant positive effect while Method 2 fails to reject the
null hypothesis.

3 Method 1 fails to reject the null hypothesis while Method 2 concludes there is a significant
positive effect.

4 Method 1 concludes there is a significant negative effect while Method 2 fails to reject the
null hypothesis.

5 Method 1 fails to reject the null hypothesis while Method 2 concludes that there is a
significant negative effect.

6 Both methods conclude there is a significant negative effect.

7 Method 1 concludes there is a significant negative effect while Method 2 concludes that
there is a significant positive effect.

8 Method 1 concludes that there is a significant positive effect while Method 2 concludes
that there is a significant negative effect.

Note: The numerical codes function as labels for the comparisons and do not have any
numerical value associated with them.
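The coding scheme in Table 4.3 can be expressed compactly. The following Python sketch is illustrative only (the study's actual coding was done in R); it maps each method's three-way decision (+1 = significant positive effect, -1 = significant negative effect, 0 = fail to reject the null hypothesis) to the codes above:

```python
# Codes from Table 4.3, keyed by (Method 1 decision, Method 2 decision).
CODES = {
    (0, 0): 0,    # both fail to reject
    (1, 1): 1,    # both significant positive
    (1, 0): 2,    # Method 1 positive, Method 2 fails to reject
    (0, 1): 3,    # Method 1 fails to reject, Method 2 positive
    (-1, 0): 4,   # Method 1 negative, Method 2 fails to reject
    (0, -1): 5,   # Method 1 fails to reject, Method 2 negative
    (-1, -1): 6,  # both significant negative
    (-1, 1): 7,   # Method 1 negative, Method 2 positive
    (1, -1): 8,   # Method 1 positive, Method 2 negative
}

def pairwise_code(decision1, decision2):
    """Return the Table 4.3 code for one replication's pair of decisions."""
    return CODES[(decision1, decision2)]

# Example: Method 1 finds a significant positive indirect effect,
# Method 2 fails to reject H0.
print(pairwise_code(1, 0))  # 2
```

Applying this function to each of the 1000 replications per condition and tallying the codes yields the frequencies reported in Tables A.4 to A.9.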

Tables A.4 to A.6 (appendix) display the results from the pairwise comparisons of the
three tests of mediation when no indirect effect was modeled. Most of the cases fell in the
comparison coded "0" in which both methods arrive at the correct conclusion. No cases fell
in the comparisons coded "7" and "8," the comparisons where one method finds a positive
significant effect and the other method finds a negative significant effect.
Tables A.7 to A.9 (appendix) display the results from the pairwise comparisons of the
three tests of mediation when an indirect effect was modeled. Most of the cases fell in the
comparisons coded "0" and "1," the comparisons in which both methods commit a Type II
error and both methods arrive at the correct conclusion, respectively. The majority of the
cases shift from the comparison coded "0" to the comparison coded "1" as the sample size
and the effect sizes of paths a and b increase. As in the conditions where no indirect effect
was modeled, no cases fell in the comparisons coded "7" and "8" (the comparisons where
one method finds a positive significant effect and the other method finds a negative
significant effect).
Examination of pairwise comparisons showed that when disagreements occurred it
was because one method rejected H0 and the other failed to reject (that is, no comparisons
fell into categories 7 and 8 in Table 4.3). When an indirect effect was positive in the
population (Tables A.7 to A.9), it was very rare for either method to reject H0 in the negative
direction (i.e., to conclude that a significant negative indirect effect was present). Thus it is
convenient to tabulate agreement and disagreement without distinguishing between positive
and negative rejections of Ho.
Figures 4.17 and 4.18 situate the comparisons between methods in a 2 x 2 grid that
highlights the areas of agreement and disagreement between the two methods and indicates
the type of error that occurs in each comparison. The codes of the pairwise comparisons that
comprised each category are listed in each quadrant.


Figure 4.17

Simplified Pairwise Comparisons When No Indirect Effect Exists (ab = 0)

                                   Method 1
                     Significant                Non-significant

Method 2
Significant          Both methods commit a      Method 1 - correct;
                     Type I error.              Method 2 - Type I error.
                     Codes: 1, 6, 7, 8          Codes: 3, 5

Non-significant      Method 1 - Type I error;   Both methods correctly fail
                     Method 2 - correct.        to reject H0.
                     Codes: 2, 4                Code: 0

Note: Codes within each cell refer to Table 4.3. Shading in the original figure distinguishes
the areas of agreement (both significant, or both non-significant) from the areas of
disagreement.


Figure 4.18

Simplified Pairwise Comparisons When an Indirect Effect Exists (ab > 0)

                                   Method 1
                     Significant                Non-significant

Method 2
Significant          Both methods correctly     Method 1 - Type II error;
                     reject H0.                 Method 2 - correct.
                     Code: 1                    Codes: 3, 7a

Non-significant      Method 1 - correct;        Both methods commit a
                     Method 2 - Type II error.  Type II error.
                     Codes: 2, 8a               Codes: 0, 4a, 5a, 6a

Note: Codes within each cell refer to Table 4.3. Shading in the original figure distinguishes
the areas of agreement from the areas of disagreement.
a Codes 4 to 8 involve one or both methods finding a significant negative indirect effect.
This type of error does not fit neatly into the categories in this figure. For the purpose of
simplifying the pairwise comparisons, the finding of a significant negative indirect effect has
been categorized as a Type II error.

Tables 4.4 to 4.6 tabulate agreement and disagreement for pairs of methods when no
indirect effect exists (see Figure 4.17). Overall percent agreement (Po) was over 95% for all
three pairwise comparisons in all 28 conditions, and usually much higher.
Tables 4.4 to 4.6 also show Cohen's (1960) kappa (K) for each comparison. Kappa
estimates the percent agreement (Po) corrected for chance, which is generally a more
accurate index of agreement than Po.
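Concretely, Po and kappa can be computed from each condition's 2 × 2 table of joint decisions across the 1000 replications. The sketch below (function name mine) shows the calculation, checked against Condition 5 of Table 4.4 (ACL vs. BT):

```python
def po_and_kappa(both_reject, both_fail, only1, only2):
    """Observed agreement Po and Cohen's (1960) kappa for a 2x2
    reject / fail-to-reject agreement table."""
    n = both_reject + both_fail + only1 + only2
    po = (both_reject + both_fail) / n
    # Chance agreement from the marginal rejection rate of each method.
    p1_reject = (both_reject + only1) / n
    p2_reject = (both_reject + only2) / n
    pe = p1_reject * p2_reject + (1 - p1_reject) * (1 - p2_reject)
    return po, (po - pe) / (1 - pe)

# Condition 5 of Table 4.4: 5 replications where both methods commit a
# Type I error, 993 where both fail to reject, 2 ACL-only, 0 BT-only.
po, kappa = po_and_kappa(5, 993, 2, 0)
print(round(po, 3), round(kappa, 3))  # 0.998 0.832
```

Because rejections are rare under the null hypothesis, chance agreement is very high, which is why kappa can be far below Po even when the two methods almost always agree.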


Table 4.4

Concordance between ACL and BT When No Indirect Effect Exists (ab=0)

C n a b  Both commit Type I error  Both fail to reject H0  ACL only - Type I error  BT only - Type I error  Po  K

1 50 0 0 1 998 1 0 0.999 0.666
2 100 0 0 4 996 0 0 1.000 1.000
3 200 0 0 1 999 0 0 1.000 1.000
4 500 0 0 3 997 0 0 1.000 1.000

5 50 0 0.14 5 993 2 0 0.998 0.832
6 100 0 0.14 14 983 2 1 0.997 0.902
7 200 0 0.14 23 975 2 0 0.998 0.957
8 500 0 0.14 41 950 8 1 0.991 0.896

9 50 0 0.39 39 945 15 1 0.984 0.822
10 100 0 0.39 47 947 6 0 0.994 0.937
11 200 0 0.39 53 944 3 0 0.997 0.971
12 500 0 0.39 32 964 4 0 0.996 0.939

13 50 0 0.59 49 938 13 0 0.987 0.876
14 100 0 0.59 66 932 2 0 0.998 0.984
15 200 0 0.59 56 942 2 0 0.998 0.981
16 500 0 0.59 56 942 2 0 0.998 0.981

17 50 0.14 0 6 993 1 0 0.999 0.923
18 100 0.14 0 18 980 1 1 0.998 0.946
19 200 0.14 0 33 963 4 0 0.996 0.941
20 500 0.14 0 41 951 7 1 0.992 0.907

33 50 0.39 0 33 959 8 0 0.992 0.888
34 100 0.39 0 44 948 8 0 0.992 0.912
35 200 0.39 0 59 938 3 0 0.997 0.974
36 500 0.39 0 45 953 2 0 0.998 0.977

49 50 0.59 0 70 917 13 0 0.987 0.908
50 100 0.59 0 61 933 6 0 0.994 0.950
51 200 0.59 0 54 945 1 0 0.999 0.990
52 500 0.59 0 28 972 0 0 1 1.000

Note. C = condition; a = population path coefficient for effect of X on M; b = population
path coefficient for effect of M on Y (controlling for X); ACL = asymmetric confidence
limits method; BT = basic test of mediation; P0 = percent agreement; K = Cohen's kappa.


Table 4.5

Concordance between BCa and BT When No Indirect Effect Exists (ab=0)

C n a b  Both commit Type I error  Both fail to reject H0  BCa only - Type I error  BT only - Type I error  Po  K

1 50 0 0 1 990 9 0 0.991 0.180
2 100 0 0 4 993 3 0 0.997 0.726
3 200 0 0 1 993 6 0 0.994 0.249
4 500 0 0 3 994 3 0 0.997 0.665

5 50 0 0.14 5 981 14 0 0.986 0.412
6 100 0 0.14 12 975 10 3 0.987 0.642
7 200 0 0.14 21 956 21 2 0.977 0.635
8 500 0 0.14 41 931 28 0 0.972 0.732

9 50 0 0.39 39 913 48 0 0.952 0.597
10 100 0 0.39 46 922 32 0 0.968 0.726
11 200 0 0.39 50 936 14 0 0.986 0.870
12 500 0 0.39 28 959 11 2 0.987 0.805

13 50 0 0.59 43 917 34 6 0.96 0.662
14 100 0 0.59 61 916 21 2 0.977 0.829
15 200 0 0.59 51 933 13 3 0.984 0.856
16 500 0 0.59 52 934 14 0 0.986 0.874

17 50 0.14 0 6 982 12 0 0.988 0.495
18 100 0.14 0 17 965 16 2 0.982 0.645
19 200 0.14 0 32 948 19 1 0.98 0.752
20 500 0.14 0 41 933 25 1 0.974 0.746

33 50 0.39 0 30 929 39 2 0.959 0.575
34 100 0.39 0 42 929 28 1 0.971 0.729
35 200 0.39 0 54 924 20 2 0.978 0.819
36 500 0.39 0 40 946 11 3 0.986 0.844

49 50 0.59 0 66 898 33 3 0.964 0.767
50 100 0.59 0 56 927 15 2 0.983 0.859
51 200 0.59 0 49 934 16 1 0.983 0.843
52 500 0.59 0 22 963 12 3 0.985 0.738

Note. C = condition; a = population path coefficient for effect of X on M; b = population
path coefficient for effect of M on Y (controlling for X); BCa = bias corrected and
accelerated bootstrap; BT = basic test of mediation; P0 = percent agreement; K = Cohen's
kappa.


Table 4.6

Concordance between BCa and ACL When No Indirect Effect Exists (ab=0)

C n a b  Both commit Type I error  Both fail to reject H0  BCa only - Type I error  ACL only - Type I error  Po  K

1 50 0 0 2 990 8 0 0.992 0.331
2 100 0 0 4 993 3 0 0.997 0.726
3 200 0 0 1 993 6 0 0.994 0.249
4 500 0 0 3 994 3 0 0.997 0.665

5 50 0 0.14 7 981 12 0 0.988 0.534
6 100 0 0.14 14 976 8 2 0.990 0.732
7 200 0 0.14 22 955 20 3 0.977 0.646
8 500 0 0.14 48 930 21 1 0.978 0.802

9 50 0 0.39 49 908 38 5 0.957 0.673
10 100 0 0.39 50 919 28 3 0.969 0.747
11 200 0 0.39 52 932 12 4 0.984 0.858
12 500 0 0.39 31 956 8 5 0.987 0.820

13 50 0 0.59 53 914 24 9 0.967 0.745
14 100 0 0.59 62 912 20 6 0.974 0.813
15 200 0 0.59 53 931 11 5 0.984 0.860
16 500 0 0.59 52 928 14 6 0.98 0.828

17 50 0.14 0 7 982 11 0 0.989 0.556
18 100 0.14 0 18 965 16 1 0.983 0.671
19 200 0.14 0 35 947 16 2 0.982 0.786
20 500 0.14 0 47 933 19 1 0.98 0.814

33 50 0.39 0 35 925 34 6 0.96 0.617
34 100 0.39 0 48 926 22 4 0.974 0.773
35 200 0.39 0 57 921 17 5 0.978 0.827
36 500 0.39 0 40 942 11 7 0.982 0.807

49 50 0.59 0 75 893 24 8 0.968 0.807
50 100 0.59 0 60 922 11 7 0.982 0.860
51 200 0.59 0 50 930 15 5 0.98 0.823
52 500 0.59 0 22 960 12 6 0.982 0.700

Note. C = condition; a = population path coefficient for effect of X on M; b = population
path coefficient for effect of M on Y (controlling for X); BCa = bias corrected and
accelerated bootstrap; ACL = asymmetric confidence limits method; P0 = percent agreement;
K = Cohen's kappa.

