6.8 ■ Estimation for the Binomial Distribution 181
Example 6.41 Hypertension We now return to the specific data set in Example 6.39. Suppose we
want to construct a 95% CI for the interobserver variability as defined by σ .
2
Solution Because there are 10 people and s = 8.178, the required interval is given by
2
( 9s χ 2 2 9 975 , 9s χ 2 9 025) = ( . ) . , ( ) . 3 87 27 26)
2
,
98 178 19 02 98.1178 270 = ( .
.
,.
,.
(
.
.
.
Similarly, a 95% CI for σ is given by 387. , 2726) = ( 1 97 522). Notice that the CI
,
for σ is not symmetric about s = 8.178, in contrast to the CI for µ, which was sym-
2
2
metric about x. This characteristic is common in CIs for the variance.
We could use the CI for σ to make decisions concerning the variability of the
2
Arteriosonde machine if we had a good estimate of the interobserver variability of
blood-pressure readings from a standard cuff. For example, suppose we know from
previous work that if two people are listening to blood-pressure recordings from a
standard cuff, then the interobserver variability as measured by the variance of the
set of differences between the readings of two observers is 35. This value is outside
the range of the 95% CI for σ (3.87, 27.26), and we thus conclude that the interob-
2
server variability is reduced by using an Arteriosonde machine. Alternatively, if this
prior variance were 15, then we could not say that the variances obtained from us-
ing the two methods are different.
Note that the CI for σ in Equation 6.15 is only valid for normally distributed
2
samples. If the underlying distribution is not normal, then the level of confidence
for this interval may not be 1 − α even if the sample size is large. This is different
from the CI for µ given in Equation 6.6, which will be valid for large n based on the
central-limit theorem, even if the underlying distribution is not normal.
REVIEW QUESTIONS 6D
1 What is the difference between a t distribution and a chi-square distribution? When R E V I E W
do we use each?
2 Suppose we have a normal distribution with mean = 0 and variance = 5. We draw a
2
sample of size 8 from this distribution and compute the sample variance, s . What is
the probability that s > 10?
2
6.8 Estimation for the Binomial Distribution
Point Estimation
Point estimation for the parameter p of a binomial distribution is discussed in this
section.
Example 6.42 Cancer Consider the problem of estimating the prevalence of malignant melanoma
in 45- to 54-year-old women in the United States. Suppose a random sample of 5000
women is selected from this age group, of whom 28 are found to have the disease.
Let the random variable X represent the disease status for the ith woman, where X = 1
i
i
if the ith woman has the disease and 0 if she does not; i = 1, . . . , 5000. The random
variable X was also defined as a Bernoulli trial in Definition 5.14. Suppose the preva-
i
lence of the disease in this age group = p. How can p be estimated?
n
We let X = X = ∑ i the number of women with malignant melanoma among the
i=1
n women. From Example 5.32, we have E(X) = np and Var(X) = npq. Note that X can
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 181 7/15/10 6:57:54 PM
182 C H A P T E R 6 ■ Estimation
also be looked at as a binomial random variable with parameters n and p because X
represents the number of events in n independent trials.
Finally, consider the random variable ˆ p = sample proportion of events. In our
example, ˆ p = proportion of women with malignant melanoma. Thus
1 n
ˆ p = X = ∑ X n
n i=1 i
Because ˆ p is a sample mean, the results of Equation 6.1 apply and we see that
Ep ˆ ) = ( EX ) ≡ ( i µ = p. Furthermore, from Equation 6.2 it follows that
Varp ˆ ) = ( 2 n = σ pqn and se p ˆ ) = pqn
(
Thus, for any sample of size n the sample proportion ˆ p is an unbiased estimator of
the population proportion p. The standard error of this proportion is given exactly
by pq n and is estimated by ˆˆ n . These principles can be summarized as follows:
pq
Equation 6.16 Point Estimation of the Binomial Parameter p
Let X be a binomial random variable with parameters n and p. An unbiased es-
timator of p is given by the sample proportion of events ˆ p. Its standard error is
pq
given exactly by pq n and is estimated by ˆˆ n.
Example 6.43 Estimate the prevalence of malignant melanoma in Example 6.42, and provide its
standard error.
Solution Our best estimate of the prevalence rate of malignant melanoma among 45- to
54-year-old women is 28/5000 = .0056. Its estimated standard error is
)
.0056 (.9944 5000 = .0011
Maximum-Likelihood Estimation
A better justification for the use of ˆ p to estimate the binomial parameter p is that ˆ p
is a maximum-likelihood estimator of p. To understand this concept, we need to
define a likelihood.
Definition 6.16 Suppose the probability-mass function of a discrete random variable X is a function
of k parameters p , . . . , p , denoted by p. If x , . . . , x is a sample of n independent
k
n
1
1
%
observations from X, then the likelihood of the sample x , . . . , x given p is denoted by
1
n
%
L(x|p) and represents the probability of obtaining our sample given specified values
˜ %
for the parameters p , . . . , p . It is given by
1 k
n (
i (
n n
( Prx | p) × L × Pr x | p) = ∏ Prx | p)
L x | p) = ( 1 % % i=1 %
%
%
Example 6.44 Diabetes Suppose we have a sample of 100 men, of whom 30 have diabetes and 70 do
not. If the prevalence of diabetes = p, then what is the likelihood of the sample given p?
Solution Each observation is a binary random variable where x = 1 if a man has diabetes and =
i
0 otherwise. In this example p is a single parameter p. Furthermore, Pr(X = 1) = p and
%
Pr(X = 0) = 1 − p. Thus, the likelihood of the sample is p (1 − p) = L(x | p). If X is a con-
70
30
˜
tinuous random variable, then because Pr(X = x) = 0 for specific values of x, a slightly
different definition of likelihood is used, where the probability of a specific observa-
tion given p is replaced by the probability density of that observation given p.
% %
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 182 7/15/10 6:57:58 PM
6.8 ■ Estimation for the Binomial Distribution 183
Definition 6.17 Suppose the probability-density function (f) of a continuous random-variable X is a
function of k parameters p ,... , p ≡ p. If x , . . . , x is a sample of n independent
k
1
1
n
observations, and x =(x ,... , x n ), then
1
n
L(| ) = xp f x |) L× p × f x | ) = ( p ∏ f x |)
(
p
(
% % 1 % n % i=1 i %
Example 6.45 Suppose we have n independent observations x , . . . , x from a normal distribution
1 n
with mean = µ and variance = σ . What is the likelihood of the sample?
2
Solution In this case, p = (,µσ 2 ). From the definition of a normal density, we have
%
2
fx() = ( 2πσ ) exp − ( /)( x − 12 i µ )
1
i
Thus the likelihood of the sample is
n
2
L(| ,µσ 2 ) = ∏ { 1 ( 2 πσ) exp −( 12) x ( i − ) µ }
x
% i=1
=
n
= 1 ( 2π ) n/ 2 σ n exp − ( 12 ) ∑ x( i − µ 2 σ) 2
=1
i
We have the following definition of a maximum-likelihood estimator.
Definition 6.18 Suppose we have a probability-mass function or a probability-density function that is
a function of k parameters p , . . . , p . The maximum-likelihood estimator (MLE)
1 k
of the parameters p , . . . , p are the values p , . . . , p , which maximize the likeli-
1 k 1,ML k,ML
hood. Heuristically, the MLE can be thought of as the values of the parameters (p)
that maximize the probability of the observed data given p. %
%
Example 6.46 Diabetes Find the MLE of the diabetes prevalence p in Example 6.44.
Solution The likelihood L = p (1 − p) . It is usually easier to maximize the log likelihood
70
30
rather than the likelihood itself. Hence,
log L = log p + 30 70 log(1 − p)
To maximize log L, we take the derivative of log L with respect to p and set the
expression to 0. We have
d log L dp = p − 30 70 (1 − p) = 0
or 30/p = 70/(1 − p)
or 30(1 − p) = 70p
or 30 = 100 p, or p ˆ ML = 30 100 = . 3
Thus, .3 is the MLE of p.
In general, if we have a binomial distribution with n observations of which k are
successes and n − k are failures, then
−
L(| p) = p ( − k 1 p) n k and theMLE p = of p ˆ ML = kn = p ˆ
x
%
The rationale for using the MLE is that in general, for a wide class of distributions,
as the sample size gets large the MLE is unbiased and has the smallest variance among
all unbiased estimators. Thus it is a useful general method of parameter estimation.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 183 7/15/10 6:58:00 PM
184 C H A P T E R 6 ■ Estimation
Example 6.47 Consider Example 6.45. Find the MLE of µ.
Solution From Example 6.45, we have
n
L(| ,µσ 2 ) ≡ L = 2 π 1 ( n/2 σ) n exp ( − 12)∑ x ( i − ) µ 2 σ 2
x
i=1
=
n
or log L =− ( n ) 2 log( ) − n log( ) −(12 )∑ x − ) µ i ( 2 σ 2 2
π
2
σ
i=1
We take the derivative log L with respect to µ and set the expression to 0 as follows:
n
2
d log L d µ x − ∑ i ( µ = ) σ = 0
i=1
2
or, because σ > 0, we have
n
i (
∑ x − ) = 0
µ
i=1
or
n
x − ∑ i n µ = 0
i=1
or
n
ˆ µ ML ∑ xn = x
=
i
= i 1
Thus the MLE of µ is x.
Interval Estimation—Normal-Theory Methods
Point estimation of the parameter p of a binomial distribution was covered in the pre-
vious two sections. How can an interval estimate of the parameter p be obtained?
Example 6.48 Cancer Suppose we are interested in estimating the prevalence rate of breast cancer
among 50- to 54-year-old women whose mothers have had breast cancer. Suppose that
in a random sample of 10,000 such women, 400 are found to have had breast cancer at
some point in their lives. We have shown that the best point estimate of the prevalence
rate p is given by the sample proportion ˆ p = ,000 = 400 10 .040 . How can an interval
estimate of the parameter p be obtained? (See the solution in Example 6.49.)
Let’s assume the normal approximation to the binomial distribution is valid—
whereby from Equation 5.14 the number of events X observed out of n women will
be approximately normally distributed with mean np and variance npq or, corre-
spondingly, the proportion of women with events = = ˆ p X n is normally distributed
with mean p and variance pq/n.
The normal approximation can actually be justified on the basis of the central-
limit theorem. Indeed, in the previous section we showed that ˆ p could be repre-
sented as an average of n Bernoulli trials, each of which has mean p and variance pq.
Thus for large n, from the central-limit theorem, we can see that ˆ p = X is normally
distributed with mean µ = p and variance σ /n = pq/n, or
2
.
Equation 6.17 ˆ p N ppq n∼ ( , )
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 184 7/15/10 6:58:02 PM
6.8 ■ Estimation for the Binomial Distribution 185
Alternatively, because the number of successes in n Bernoulli trials X = np ˆ
=
(which is the same as a binomial random variable with parameters n and p), if Equa-
tion 6.17 is multiplied by n,
.
Equation 6.18 XN np npq)
∼
,
(
This formulation is indeed the same as that for the normal approximation to the
binomial distribution, which was given in Equation 5.14. How large should n be
before this approximation can be used? In Chapter 5 we said the normal approxi-
mation to the binomial distribution is valid if npq ≥ 5. However, in Chapter 5 we
assumed p was known, whereas here we assume it is unknown. Thus we estimate
p by ˆ p and q by ˆ q =−1 ˆ p and apply the normal approximation to the binomial if
npq ˆˆ ≥ 5. Therefore, the results of this section should only be used if npq ˆˆ ≥ 5. An ap-
proximate 100% × (1 − α) CI for p can now be derived from Equation 6.17 using
methods similar to those given in Section 6.5. In particular, from Equation 6.17
we see that
1
+
Pr p − ( z 1 − 2 pq n < p < ˆ pz 1 α/ −α / 2 pq n) =− α
This inequality can be written in the form of two inequalities:
p − z 1 − 2 pq n < p ˆ and p < ˆ p z 1 α/ −α / 2 pqn
+
To explicitly derive a CI based on these inequalities requires solving a quadratic
equation for p in terms of ˆ p. To avoid this, it is customary to approximate pq n by
ˆˆ n and to rewrite the inequalities in the form
pq
ˆˆ
p − z 1 − 2 pq n < ˆˆ p ˆ and p < ˆ p z 1 α/ −α / 2 pqn
+
ˆˆ
We now add z 1−α/ 2 pq n to both sides of the first inequality and subtract this quan-
tity from both sides of the second inequality, obtaining
+ ˆ
p < p z 1 α − pqn and ˆ p − ˆˆ z 1 α − /2 /2 pq n < ˆˆ p
Combining these two inequalities, we get
ˆˆ n
ˆ p − z 1 − /2 pq p < ˆ pz+ 1 α −α /2 pq
ˆˆ n <
pq n =− α
or Pr p ˆ − z 1 − /2 pq n < ( ˆˆ p < ˆ pz 1 α −α /2 ˆˆ ) 1
+
The approximate 100% × (1 − α) CI for p is given by
ˆˆ np z+ (
ˆˆ
ˆ p − z 1 − /2 pq , ˆ 1 α −α /2 pqn )
Equation 6.19 Normal-Theory Method for Obtaining a CI for the Binomial Parameter p (Wald
Method)
An approximate 100% × (1 − α) CI for the binomial parameter p based on the
normal approximation to the binomial distribution is given by
ˆ p ± z 1 −α /2 ˆˆ
pqn
This method of interval estimation should only be used if npq ˆˆ ≥ 5.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 185 7/15/10 6:58:06 PM
186 C H A P T E R 6 ■ Estimation
Example 6.49 Cancer Using the data in Example 6.48, derive a 95% CI for the prevalence rate
of breast cancer among 50- to 54-year-old women whose mothers have had breast
cancer.
Solution ˆ p = .040 α = .05 z 1 −α /2 = . 196 n = 10 ,000
Therefore, an approximate 95% CI is given by
.040 196049610 ,000 ,.040 196049610,000
+
−
(.
.
,
)
(.
.
)
.
.
= (.040 − .004 , .040 + .004 = ) (.036 , .044 )
Suppose we know the prevalence rate of breast cancer among all 50- to 54-year-
old American women is 2%. Because 2% is less than .036 (the lower confidence limit),
we can be quite confident that the underlying rate for the group of women whose
mothers have had breast cancer is higher than the rate in the general population.
Interval Estimation—Exact Methods
The question remains: How is a CI for the binomial parameter p obtained when
either the normal approximation to the binomial distribution is not valid or a more
exact CI is desired?
Example 6.50 Cancer, Nutrition Suppose we want to estimate the rate of bladder cancer in rats that
have been fed a diet high in saccharin. We feed this diet to 20 rats and find that 2
2
develop bladder cancer. In this case, our best point estimate of p is ˆ p = = ..
1
20
However, because
)(
npq ˆˆ = 20 (2201820 ) = . < 1 8 5
the normal approximation to the binomial distribution cannot be used and thus
normal-theory methods for obtaining CIs are not valid. How can an interval
estimate be obtained in this case?
A small-sample method for obtaining confidence limits will be presented.
Equation 6.20 Exact Method for Obtaining a CI for the Binomial Parameter p (Clopper-Pearson
Method)
An exact 100% × (1 − α) CI for the binomial parameter p that is always valid is
given by (p , p ), where p , p satisfy the equations
1 2 1 2
α n n
−
k
1
PrX ≥ ( xp = | p ) = = ∑ p ( − p ) n k
1
2 kx k 1 1
=
α x n
∑
−
k
Pr(X ≤ ( | x p = p 2 ) = 2 = k=0 p 1 p 2 ) n k
( −
2
k
A rationale for this CI is given in our discussion of hypothesis testing for the
binomial distribution in Section 7.10 on page 247.
The main problem with using this method is the difficulty in computing expres-
sions such as
x n
∑ p (1 − p) n k
−
k
k=0 k
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 186 7/15/10 6:58:08 PM
6.8 ■ Estimation for the Binomial Distribution 187
Fortunately, special tables exist for the evaluation of such expressions, one of which
is given in Table 7 in the Appendix. This table can be used as follows:
Equation 6.21 Exact Confidence Limits for Binomial Proportions
(1) The sample size (n) is given along each curve. There are two curves corre-
sponding to each given sample size. One curve is used to obtain the lower
confidence limit and the other to obtain the upper confidence limit.
(2) If 0 ≤ ˆ p ≤ 5 . , then
(a) Refer to the lower horizontal axis, and find the point corresponding to ˆ p.
(b) Draw a line perpendicular to the horizontal axis, and find the two
points where this line intersects the two curves identified in (1).
(c) Read across to the left vertical axis; the smaller value corresponds to the
lower confidence limit and the larger value to the upper confidence limit.
(3) If .5 < p ˆ ≤ . 1 0, then
(a) Refer to the upper horizontal axis, and find the point corresponding to ˆ p.
(b) Draw a line perpendicular to the horizontal axis, and find the two
points where this line intersects the two curves identified in (1).
(c) Read across to the right vertical axis; the smaller value corresponds to
the lower confidence limit and the larger value to the upper confidence
limit.
Example 6.51 Cancer Derive an exact 95% CI for the probability of developing bladder cancer,
using the data given in Example 6.50.
Solution We refer to Table 7a in the Appendix (α = .05) and identify the two curves with n =
20. Because ˆ p = . ≤ 1 .5, we refer to the lower horizontal axis and draw a vertical line
at .10 until it intersects the two curves marked n = 20. We then read across to the left
vertical axis and find the confidence limits of .01 and .32. Thus the exact 95% CI =
(.01, .32). Notice that this CI is not symmetric about ˆ p = .10.
Another approach to solving this problem is to use the BINOMDIST function of
Excel. From Equation 6.20, we need to find values of p and p such that
1 2
Pr X ≥ ( p = 2| p ) = 1 . and PrX ≤ 025 ( 2| p = p ) = 025
.
2
However, Pr(X ≥ 2 | p = p ) = 1 − Pr(X ≤ 1| p = p ) = 1 − BINOMDIST(1, 20, p , TRUE)
1 1 1
and Pr(X ≤ 2 | p = p ) = BINOMDIST(2, 20, p , TRUE). Hence we set up a spreadsheet
2 2
in which the first column has values of p from .01 to 1.0 in increments of .01; the
1
second column has 1 − BINOMDIST(1, 20, p , TRUE); the third column has values
1
of p from .01 to 1.0 in increments of .01; and the fourth column has BINOMDIST
2
(2, 20, p , TRUE). An excerpt from the spreadsheet is shown in Table 6.7.
2
Usually with exact confidence limits accurate to a fixed number of decimal
places, we cannot exactly satisfy Equation 6.20. Instead, we use a more conservative
approach. We find the largest value of p so that Pr(X ≥ x | p = p ) ≤ α/2 and the small-
1 1
est value of p so that Pr(X ≤ x | p = p ) ≤ α/2. Based on Table 6.7 with α = .05, the
2 2
values of p and p that satisfy these inequalities are p = .01 and p = .32. Hence, the
1 2 1 2
95% CI for p is (.01, .32).
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 187 7/15/10 6:58:10 PM
188 C H A P T E R 6 ■ Estimation
Table 6.7 Evaluation of exact binomial confidence limits using Excel,
based on the data in Example 6.50
p 1 − BINOMDIST(1,20,p ,TRUE) p BINOMDIST(2,20,p ,TRUE)
1 1 2 2
0.01 0.017 0.25 0.091
0.02 0.060 0.26 0.076
0.03 0.120 0.27 0.064
0.04 0.190 0.28 0.053
0.05 0.264 0.29 0.043
0.06 0.340 0.30 0.035
0.07 0.413 0.31 0.029
0.08 0.483 0.32 0.023
0.09 0.548 0.33 0.019
0.1 0.608 0.34 0.015
Example 6.52 Health Promotion Suppose that as part of a program for counseling patients with
many risk factors for heart disease, 100 smokers are identified. Of this group, 10
give up smoking for at least 1 month. After a 1-year follow-up, 6 of the 10 pa-
tients are found to have taken up smoking again. The proportion of ex-smokers
who start smoking again is called the recidivism rate. Derive a 99% CI for the
recidivism rate.
Solution Exact binomial confidence limits must be used, because
npq ˆˆ = (. )(. ) = 10 64 24
. < 5
We refer to the upper horizontal axis of the chart marked α = .01 in Appendix Table
7b and note the point ˆ p = .60. We then draw a vertical line at .60 until it intersects
the two curves marked n = 10. We then read across to the right vertical axis and find
the confidence limits of .19 and .92. Thus the exact 99% CI = (.19, .92).
More extensive and precise exact binomial confidence limits are available in
Geigy Scientific Tables [4]. Also, calculation of exact binomial confidence limits are
available directly in some statistical packages, including Stata and indirectly using
Excel as previously discussed.
For example, we can also use the Stata command cii to obtain exact 99% confi-
dence limits for p. The general form of this command is
.cii n x, level(%).
where % is the level of confidence, n is the number of trials and x is the number of
successes. The results for the recidivism data are as follows:
.cii 10 6, level(99)
-- Binomial Exact --
Variable | Obs Mean Std. Err. [99% Conf. Interval]
--------------------+----------------------------------------------------------------------------------------------
| 10 .6 .1549193 .1909163 .9232318
R E V I E W We see that the 99% exact binomial confidence interval is (.19, .92), which is the
same as we obtained with Table 7b.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 188 7/15/10 6:58:10 PM
6.9 ■ Estimation for the Poisson Distribution 189
REVIEW QUESTIONS 6E
1 Suppose an experimental weight-loss program is provided for 40 overweight participants. R E V I E W R E V I E W
A participant is considered partially successful if he or she has lost 5 lb or more after
6 months. Suppose that 10 of the 40 participants are partially successful in losing weight.
(a) What is an estimate of the partial-success rate?
(b) Derive a 95% CI for the proportion of partial successes.
2 A participant is considered completely successful if he or she has lost 20 lb or more
after 6 months. Suppose that 4 of the 40 participants are completely successful at
losing weight.
(a) What is an estimate of the complete success rate?
(b) Derive a 95% CI for the proportion of participants who were completely
successful at losing weight.
6.9 Estimation for the Poisson Distribution
Point Estimation
In this section, we discuss point estimation for the parameter λ of a Poisson distribution.
Example 6.53 Cancer, Environmental Health A study in Woburn, Massachusetts, in the 1970s
looked at possible excess cancer risk in children, with a particular focus on leukemia.
This study was later portrayed in the book and movie titled A Civil Action. An impor-
tant environmental issue in the investigation concerned the possible contamination
of the town’s water supply. Specifically, 12 children (<19 years of age) were diag-
nosed with leukemia in Woburn during the period January 1, 1970 to December 31,
1979. A key statistical issue is whether this represents an excessive number of leu-
kemia cases, assuming that Woburn has had a constant 12,000 child residents (≤age
19) during this period and that the incidence rate of leukemia in children nationally
is 5 cases per 100,000 person-years. Can we estimate the incidence rate of childhood
leukemia in Woburn during the 1970s and provide a CI about this estimate?
We let X = the number of children who developed leukemia in Woburn during
the 1970s. Because X represents a rare event, we assume that X follows a Poisson
distribution with parameter µ = λT. We know from Chapter 4 that for a Poisson dis-
tribution, E(X) = λT, where T= time and λ = number of events per unit time.
Definition 6.19 A person-year is a unit of time defined as 1 person being followed for 1 year.
This unit of follow-up time is commonly used in longitudinal studies—that is,
studies in which the same individual is followed over time.
Example 6.54 Cancer, Environmental Health How many person-years accumulated in the Woburn
study in Example 6.53?
Solution In the Woburn study, 12,000 children were each followed for 10 years. Thus a total
of 120,000 person-years accumulated. This is actually an approximation, because
the children who developed leukemia over the 10-year period were only followed
up to the time they developed the disease. It is also common to curtail follow-up for
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 189 7/15/10 6:58:11 PM
190 C H A P T E R 6 ■ Estimation
other reasons, such as death or the development of other types of cancer. However,
the number of children for whom follow-up is curtailed for these reasons is probably
small and the approximation is likely to be accurate.
Finally, although children moved in and out of Woburn over the 10-year period,
we assume there was no net migration in or out of the area during the 1970s.
We now wish to assess how to estimate λ based on an observed number of events
X over T person-years.
Equation 6.22 Point Estimation for the Poisson Distribution
Let’s assume the number of events X over T person-years is Poisson-distributed
ˆ
/
with parameter µ = λT. An unbiased estimator of λ is given by λ= XT, where X
is the observed number of events over T person-years.
If λ is the incidence rate per person-year, T = number of person-years of fol-
low-up, and we assume a Poisson distribution for the number of events X over T
person-years, then the expected value of X is given by E(X) = λT. Therefore,
ˆ
E( ) = EX T
)
(
λ
= λ TT = λ
ˆ
Thus λ is an unbiased estimator of λ.
Example 6.55 Cancer, Environmental Health Estimate the incidence rate of childhood leukemia
in Woburn during the 1970s based on the data provided in Example 6.53.
Solution There were 12 events over 120,000 person-years, so the estimated incidence rate =
12/120,000 = 1/10,000 = 0.0001 events per person-year. Because cancer incidence
rates per person-year are usually very low, it is customary to express such rates per
5
100,000 (or 10 ) person-years—that is, to change the unit of time to 10 person-
5
ˆ
5
5
years. Thus if the unit of time = 10 person-years, then T = 1.2 and λ = 0.0001 (10 ) =
10 events per 100,000 person-years.
Interval Estimation
The question remains as to how to obtain an interval estimate for λ. We use a similar
approach as was used to obtain exact confidence limits for the binomial proportion
p in Equation 6.20. For this purpose, it is easier to first obtain a CI for µ = expected
number of events over time T of the form (µ ,µ ) and then obtain the corresponding
1 2
CI for λ from (µ /T, µ /T). The approach is given as follows:
1 2
Equation 6.23 Exact Method for Obtaining a CI for the Poisson Parameter l
An exact 100% × (1 − α) CI for the Poisson parameter λ is given by (µ /T, µ /T),
1 2
where µ ,µ satisfy the equations
1 2
α ∞
PrX ≥ ( x | µ = 1 ) = µ 2 = ∑ e −µ 1 µ 1 k k!
=
−1
x kx
=− ∑ e −µ 1 µ k 1 k!
1
=
k=0
α x
µ
( ≤PrX = x | µ µ 2 = ) = ∑ e − 2 µ k 2 k!
2 k =0
and x = observed number of events, T = number of person-years of follow-up.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 190 7/15/10 6:58:12 PM
6.9 ■ Estimation for the Poisson Distribution 191
As in obtaining exact confidence limits for the binomial parameter p, it is difficult to
exactly compute µ , µ to satisfy Equation 6.23. Table 8 in the Appendix provides the
2
1
solution to these equations. This table can be used to find 90%, 95%, 98%, 99%, or
99.8% CIs for µ if the observed number of events (x) is ≤ 50. The observed number of
events (x) is listed in the first column, and the level of confidence is given in the first
row. The CI is obtained by cross-referencing the x row and the 1 − α column.
Example 6.56 Suppose we observe 8 events and assume the number of events is Poisson-distributed
with parameter µ. Find the 95% CI for µ.
Solution We look at Table 8 under the x = 8 row and the 0.95 column to find the 95% CI for
µ = (3.45, 15.76).
We see this CI is not symmetric about x (8), because 15.76 − 8 = 7.76 > 8 − 3.45 = 4.55.
This is true for all exact CIs based on the Poisson distribution unless x is very large.
Example 6.57 Cancer, Environmental Health Compute a 95% CI for both the expected number
of childhood leukemias (µ) and the incidence rate of childhood leukemia per 10
5
person-years (λ) in Woburn based on the data provided in Example 6.53.
Solution We observed 12 cases of childhood leukemia over 10 years. Thus, from Table 8, refer-
ring to x = 12 and level of confidence 95%, we find that the 95% CI for µ = (6.20,
20.96). Because there were 120,000 person-years = T, a 95% CI for the incidence
.
620 20 96 620 20 96
.
.
.
5
rate = 120 000 120 000, , , events per person-year or 120 000 × 10 , 120 000 × 10 5
,
,
events per 10 person-years = (5.2, 17.5) events per 10 person-years = 95% CI for λ.
5
5
We can also use the Stata cii command to obtain an exact 95% CI for the inci-
dence rate (λ). The general syntax is
.cii py x, poisson
where py = number of person-years and x = number of events. The results for the
leukemia data are as follows:
.cii 120000 12, poisson
-- Poisson Exact --
Variable | Exposure Mean Std. Err. [95% Conf. Interval]
-------------------+-----------------------------------------------------------------------------------------------
| 120000 .0001 .0000289 .0000517 .0001747
We see that the 95% CI for λ = (5.2/10 , 17.5/10 ), which agrees with our results from
5
5
Table 8. Stata cannot be used if we just have available a number of events, without a
corresponding number of person-years as in Example 6.56.
Example 6.58 Cancer, Environmental Health Interpret the results in Example 6.57. Specifically,
do you feel there was an excess childhood leukemia risk in Woburn, Massachusetts,
relative to expected U.S. incidence rates?
Solution Referring to Example 6.53, we note that the incidence rate of childhood leukemia in
the United States during the 1970s was 5 events per 10 person-years. We denote this
5
rate by λ . Referring to Example 6.57, we see that the 95% CI for λ in Woburn = (5.2,
0
17.5) events per 10 person-years. The lower bound of the 95% CI exceeds λ (= 5),
5
0
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 191 7/15/10 6:58:13 PM
192 C H A P T E R 6 ■ Estimation
so we can conclude there was a significant excess of childhood leukemia in Woburn
during the 1970s. Another way to express these results is in terms of the standard-
ized morbidity ratio (SMR) defined by
incidencerateinWoburnfor childhood lleukemia 10 10 5
SMR = = = 2
U.S. incidenceratefor childhood lleukemia 510 5
If the U.S. incidence rate is assumed to be known, then a 95% CI for SMR is given by
.
52 17 5
.
1043 50). Because the lower bound of the CI for SMR is > 1, we conclude
5 , 5 = (. ,.
there is a significant excess risk in Woburn. We pursue a different approach in Chap-
ter 7, addressing this issue in terms of hypothesis testing and p-values.
Another approach for obtaining exact CIs for the Poisson parameter µ is to use
the POISSON function of Excel in a similar manner as we used for BINOMDIST in
Section 6.8 on page 187. This approach is useful if the observed number of events (x)
and/or if the desired level of confidence (1 − α) does not appear in Table 8. Specifi-
cally, Equation 6.23 can be written in the form
1 − POISSON(x − 1, µ , TRUE) = α/2
1
and
POISSON (x, µ , TRUE) = α/2
2
To solve these equations, we create columns of possible values for µ and µ and find
1 2
the largest value of µ and the smallest value of µ (perhaps to 2 decimal places of
1 2
accuracy) such that
1 − POISSON(x − 1, µ , TRUE) ≤ α/2
1
and
POISSON (x, µ , TRUE) ≤ α/2
2
The two-sided 100% × (1 − α) CI for µ is then (µ , µ ).
1 2
In some instances, a random variable representing a rare event over time is assumed
to follow a Poisson distribution but the actual amount of person-time is either unknown
or is not reported in an article from the literature. In this instance, it is still possible to
use Appendix Table 8 to obtain a CI for µ, although it is impossible to obtain a CI for λ.
Example 6.59 Occupational Health In Example 4.38, a study was described concerning the possible
excess cancer risk among employees with high exposure to ethylene dibromide in two
plants in Texas and Michigan. Seven deaths from cancer were reported over the period
1940−1975, whereas only 5.8 cancer deaths were expected based on mortality rates for
U.S. white men. Find a 95% CI for the expected number of deaths among the exposed
workers, and assess whether their risk differs from that of the general population.
Solution In this case, the actual number of person-years used in computing the expected
number of deaths was not reported in the original article. Indeed, the computation
of the expected number of deaths is complex because
(1) Each worker is of a different age at the start of follow-up.
(2) The age of a worker changes over time.
(3) Mortality rates for men of the same age change over time.
However, we can use Appendix Table 8 to obtain a 95% CI for µ. Because x = 7 events,
we have a 95% CI for µ = (2.81, 14.42). The expected number of deaths based on U.S.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 192 7/15/10 6:58:13 PM
6.10 ■ One-Sided CIs 193
mortality rates for white males = 5.8, which falls within the preceding interval. Thus we
conclude the risk among exposed workers does not differ from the general population.
Table 8 can also be used for applications of the Poisson distribution other than those
based specifically on rare events over time.
Example 6.60 Bacteriology Suppose we observe 15 bacteria in a Petri dish and assume the number
of bacteria is Poisson-distributed with parameter µ. Find a 90% CI for µ.
Solution We refer to the 15 row and the 0.90 column in Table 8 to obtain the 90% CI (9.25,
23.10).
6.10 One-Sided CIs
In the previous discussion of interval estimation, what are known as two-sided CIs
have been described. Frequently, the following type of problem occurs.
Example 6.61 Cancer A standard treatment exists for a certain type of cancer, and the patients
receiving the treatment have a 5-year survival rate of 30%. A new treatment is pro-
posed that has some unknown survival rate p. We would only be interested in using
the new treatment if it were better than the standard treatment. Suppose that 40 out
of 100 patients who receive the new treatment survive for 5 years. Can we say the
new treatment is better than the standard treatment?
One way to analyze these data is to construct a one-sided CI, where we are in-
terested in only one bound of the interval, in this case the lower bound. If 30% is
below the lower bound, then it is an unlikely estimate of the 5-year survival rate for
patients getting the new treatment. We could reasonably conclude from this that the
new treatment is better than the standard treatment in this case.
Equation 6.24 Upper One-Sided CI for the Binomial Parameter p — Normal-Theory Method
An upper one-sided 100% × (1 − α) CI is of the form p > p such that
1
Pr(p > p 1 ) = 1 − α
If we assume that the normal approximation to the binomial holds true, then
we can show that this CI is given approximately by
p > p − ˆ z 1 −α ˆˆ /
pq n
This interval estimator should only be used if npq ˆˆ ≥ 5.
To see this, note that if the normal approximation to the binomial distribution
holds, then ˆ p ∼ N (,ppqn ). Therefore, by definition
(
+
1
Pr ˆ p < p z 1 −α pqn) =− α
We approximate pq n by ˆˆ n and subtract z 1−α ˆˆ
pq n from both sides of the
pq
equation, yielding
ˆ p − z 1 −α pq p
ˆˆ n <
)
or p > p − ˆ z 1 −α pqn and Pr p >( p − ˆ z 1 −α pqn =− α
ˆˆ
ˆˆ
1
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 193 7/15/10 6:58:16 PM
194 C H A P T E R 6 ■ Estimation
Therefore, if the normal approximation to the binomial distribution holds, then
p > p − ˆ z 1 −α ˆˆ
pqn is an approximate upper 100% × (1 − α) one-sided CI for p.
Notice that z 1−α is used in constructing one-sided intervals, whereas z 1−α/ 2 was
used in constructing two-sided intervals.
Example 6.62 Suppose a 95% CI for a binomial parameter p is desired. What percentile of the
normal distribution should be used for a one-sided interval? For a two-sided
interval?
.
Solution For α = .05, we use z 1−.05 = z = 1.645 for a one-sided interval and z 1052 = z . 975 = 196
−
.
.95
for a two-sided interval.
Example 6.63 Cancer Construct an upper one-sided 95% CI for the survival rate based on the
cancer-treatment data in Example 6.61.
Solution First check that npq ˆˆ = (. )(. ) = 100 46 24 ≥ 5. The CI is then given by
Pr p > − .40 z .95 . (. ) 100 = .95
46
[
−
.
Pr p > .40 −1 645 049. (. ) ] = 95
Pr(p > .319 ) = .95
Because .30 is not within the given interval [that is, (.319, 1.0)], we conclude the
new treatment is better than the standard treatment.
If we were interested in 5-year death rates rather than survival rates, then a one-
sided interval of the form Pr(p < p ) = 1 − α would be appropriate because we would
2
only be interested in the new treatment if its death rate were lower than that of the
standard treatment.
Equation 6.25 Lower One-Sided CI for the Binomial Parameter p — Normal-Theory Method
The interval p < p such that
2
Pr(p < p ) = 1 − α
2
is referred to as a lower one-sided 100% × (1 − α) CI and is given approximately by
ˆˆ
p < pz 1 −α pq n
+ ˆ
This expression can be derived in the same manner as in Equation 6.24 by starting
with the relationship
Pr ˆ p >( p − z 1 −α pqn) =− α
1
ˆˆ
pq
If we approximate pq n by ˆˆ n and add z 1−α pq n to both sides of the equation,
we get
Pr p <( p z 1 −α pqn =− α
)
+ ˆ
1
ˆˆ
Example 6.64 Cancer Compute a lower one-sided 95% CI for the death rate using the cancer-
treatment data in Example 6.61.
Solution We have ˆ p = .6. Thus the 95% CI is given by
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 194 7/15/10 6:58:19 PM
6.11 ■ Summary 195
Pr p < + .61 .645 .(.) 100 = .95
64
Pr p <[ + .61 645 049)] 1 . ( . = . 95
Pr(p < .681 ) = .95
Because 70% is not within this interval [that is, (0, .681)], we can conclude the new
treatment has a lower death rate than the old treatment.
Similar methods can be used to obtain one-sided CIs for the mean and variance
of a normal distribution, for the binomial parameter p using exact methods, and for
the Poisson expectation µ using exact methods.
6.11 Summary
This chapter introduced the concept of a sampling distribution. This concept is
crucial to understanding the principles of statistical inference. The fundamental
idea is to forget about our sample as a unique entity and instead regard it as a ran-
dom sample from all possible samples of size n that could have been drawn from
the population under study. Using this concept, X was shown to be an unbiased
estimator of the population mean µ; that is, the average of all sample means over
all possible random samples of size n that could have been drawn will equal the
population mean. Furthermore, if our population follows a normal distribution,
then X has minimum variance among all possible unbiased estimators and is thus
called a minimum-variance unbiased estimator of µ. Another justification for the use of
X to estimate µ is that X is the maximum-likelihood estimator (MLE) of µ. An MLE is
the value ˆ µ, which maximizes the Pr(obtaining our sample | ˆ µ). A similar definition
holds for obtaining MLEs of parameters in other estimation problems. Finally, if our
population follows a normal distribution, then X also follows a normal distribution.
However, even if our population is not normal, the sample mean still approximately
follows a normal distribution for a sufficiently large sample size. This very important
idea, which justifies many of the hypothesis tests we study in the rest of this book,
is called the central-limit theorem.
The idea of an interval estimate (or CI) was then introduced. Specifically, a 95%
CI is defined as an interval that will contain the true parameter for 95% of all ran-
dom samples that could have been obtained from the reference population. The
preceding principles of point and interval estimation were applied to the following:
(1) Estimating the mean µ of a normal distribution
(2) Estimating the variance σ of a normal distribution
2
(3) Estimating the parameter p of a binomial distribution
(4) Estimating the parameter λ of a Poisson distribution
(5) Estimating the expected value µ of a Poisson distribution
The t and chi-square distributions were introduced to obtain interval estimates for
(1) and (2), respectively.
In Chapters 7 through 14, the discussion of statistical inference continues,
focusing primarily on testing hypotheses rather than on parameter estimation. In
this regard, some parallels between inference from the points of view of hypothesis
testing and CIs are discussed.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 195 7/15/10 6:58:20 PM
196 C H A P T E R 6 ■ Estimation
P ROB l E m S
Gastroenterology 6.12 Compute a 95% CI for the mean white blood count
Suppose we are asked to construct a list of treatment as- following admission.
signments for patients entering a study comparing different 6.13 Answer Problem 6.12 for a 90% CI.
treatments for duodenal ulcer. 6.14 What is the relationship between your answers to
6.1 Anticipating that 20 patients will be entered in the Problems 6.12 and 6.13?
study and two treatments will be used, construct a list of *6.15 What is the best point estimate of the percentage
random-treatment assignments starting in the 28th row of of males among patients discharged from Pennsylvania
the random-number table (Table 4 in the Appendix). hospitals?
6.2 Count the number of people assigned to each treat- *6.16 What is the standard error of the estimate obtained
ment group. How does this number compare with the ex- in Problem 6.15?
pected number in each group? *6.17 Provide a 95% CI for the percentage of males among
6.3 Suppose we change our minds and decide to enroll patients discharged from Pennsylvania hospitals.
40 patients and use four treatment groups. Use a computer
program (such as MINITAB or Excel) to construct the list of
random-treatment assignments referred to in Problem 6.1. Microbiology
6.4 Answer Problem 6.2 for the list of treatment assign- A nine-laboratory cooperative study was performed to
ments derived in Problem 6.3. evaluate quality control for susceptibility tests with 30-µg
netilmicin disks [6]. Each laboratory tested three standard
control strains on a different lot of Mueller-Hinton agar, with
Pulmonary Disease 150 tests performed per laboratory. For protocol control,
The data in Table 6.8 concern the mean triceps skin-fold each laboratory also performed 15 additional tests on each
thickness in a group of normal men and a group of men with of the control strains using the same lot of Mueller-Hinton
chronic airflow limitation [5]. agar across laboratories. The mean zone diameters for each
of the nine laboratories are given in Table 6.9.
*6.18 Provide a point and interval estimate (95% CI) for
Table 6.8 Triceps skin-fold thickness in normal men the mean zone diameter across laboratories for each type
and men with chronic airflow limitation of control strain, if each laboratory uses different media to
perform the susceptibility tests.
Group Mean sd n
*6.19 Answer Problem 6.18 if each laboratory uses a com-
Normal 1.35 0.5 40 mon medium to perform the susceptibility tests.
Chronic airflow limitation 0.92 0.4 32 *6.20 Provide a point and interval estimate (95% CI) for the
interlaboratory standard deviation of mean zone diameters
Source: Reprinted with permission of Chest, 85(6), 58S−59S, 1984. for each type of control strain, if each laboratory uses differ-
ent media to perform the susceptibility tests.
*6.21 Answer Problem 6.20 if each laboratory uses a com-
*6.5 What is the standard error of the mean for each group? mon medium to perform the susceptibility tests.
6.6 Assume that the central-limit theorem is applicable.
What does it mean in this context? 6.22 Are there any advantages to using a common medium
versus using different media for performing the susceptibil-
ity tests with regard to standardization of results across
6.7 Find the upper 1st percentile of a t distribution with 16 df. laboratories?
6.8 Find the lower 10th percentile of a t distribution with
28 df. Renal Disease
6.9 Find the upper 2.5th percentile of a t distribution with 7 df. A study of psychological and physiological health in a cohort of
6.10 What are the upper and lower 2.5th percentiles for a dialysis patients with end-stage renal disease was conducted
chi-square distribution with 2 df? What notation is used to [7]. Psychological and physiological parameters were initially
denote these percentiles? determined at baseline in 102 patients; these parameters were
Refer to the data in Table 2.11. Regard this hospital as typi- determined again in 69 of the 102 patients at an 18-month
cal of Pennsylvania hospitals. follow-up visit. The data in Table 6.10 were reported.
6.11 Compute a 95% CI for the mean duration of hospi- 6.23 Provide a point and interval estimate (95% CI) for the
talization. mean of each parameter at baseline and follow-up.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 196 7/15/10 6:58:20 PM
Problems 197
Table 6.9 Mean zone diameters with 30-µg netilmicin disks tested in nine separate laboratories
Type of control strain
E. coli S. aureus P. aeruginosa
Laboratory Different media Common medium Different media Common medium Different media Common medium
A 27.5 23.8 25.4 23.9 20.1 16.7
B 24.6 21.1 24.8 24.2 18.4 17.0
C 25.3 25.4 24.6 25.0 16.8 17.1
D 28.7 25.4 29.8 26.7 21.7 18.2
E 23.0 24.8 27.5 25.3 20.1 16.7
F 26.8 25.7 28.1 25.2 20.3 19.2
G 24.7 26.8 31.2 27.1 22.8 18.8
H 24.3 26.2 24.3 26.5 19.9 18.1
I 24.9 26.3 25.4 25.1 19.3 19.2
Table 6.10 Psychological and physiological parameters in patients with end-stage renal disease
Baseline (n = 102) 18-month follow-up (n = 69)
Variable Mean sd Mean sd
Serum creatinine (mmol/L) 0.97 0.22 1.00 0.19
Serum potassium (mmol/L) 4.43 0.64 4.49 0.71
Serum phosphate (mmol/L) 1.68 0.47 1.57 0.40
Psychological Adjustment to Illness (PAIS) scale 36.50 16.08 23.27 13.79
6.24 Do you have any opinion on the physiological and *6.27 What is the best point estimate for p, the probability
psychological changes in this group of patients? Explain. of a failure with the drug?
(Note: A lower score on the PAIS scale indicates worse *6.28 What is a 95% CI for p?
adjustment to illness.)
*6.29 Suppose we know penicillin G at a daily dose of 4.8
megaunits has a 10% failure rate. What can be said in com-
Ophthalmology, Hypertension paring the two drugs?
A study is conducted to test the hypothesis that people with
glaucoma have higher-than-average blood pressure. The Pharmacology
study includes 200 people with glaucoma whose mean SBP
is 140 mm Hg with a standard deviation of 25 mm Hg. Suppose we want to estimate the concentration (µg/mL)
of a specific dose of ampicillin in the urine after various pe-
6.25 Construct a 95% CI for the true mean SBP among riods of time. We recruit 25 volunteers who have received
people with glaucoma. ampicillin and find they have a mean concentration of 7.0
6.26 If the average SBP for people of comparable age is µg/mL with a standard deviation of 2.0 µg/mL. Assume
130 mm Hg, is there an association between glaucoma and the underlying population distribution of concentrations is
blood pressure? normally distributed.
*6.30 Find a 95% CI for the population mean concentration.
Sexually Transmitted Disease *6.31 Find a 99% CI for the population variance of the
Suppose a clinical trial is conducted to test the efficacy of a concentrations.
new drug, spectinomycin, for treating gonorrhea in females. *6.32 How large a sample would be needed to ensure
Forty-six patients are given a 4-g daily dose of the drug and that the length of the CI in Problem 6.30 is 0.5 µg/mL
are seen 1 week later, at which time 6 of the patients still if we assume the sample standard deviation remains at
have gonorrhea. 2.0 µg/mL?
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 197 7/15/10 6:58:21 PM
198 C H A P T E R 6 ■ Estimation
Environmental Health by 10 mm Hg after 1 month. Can we carry out some pro-
Much discussion has taken place concerning possible cedure to be sure we are not simply observing the placebo
health hazards from exposure to anesthetic gases. In one effect?
study conducted in 1972, 525 Michigan nurse anesthetists 6.42 What assumptions have you made to carry out the
were surveyed by mail questionnaires and telephone inter- procedure in Problem 6.41?
views to determine the incidence rate of cancer [8]. Of this Suppose we decide a better measure of the effectiveness
group, 7 women reported having a new malignancy other of the drug is the mean decrease in blood pressure rather
than skin cancer during 1971. than the measure of effectiveness used previously. Let
1
,
6.33 What is the best estimate of the 1971 incidence rate d = i x − i yi = ,... , 100, where x = DBP for the ith per-
i
i
from these data? son before taking the drug and y = DBP for the ith person
i
6.34 Provide a 95% CI for the true incidence rate. 1 month after taking the drug. Suppose the sample mean
of the d is +5.3 and the sample variance is 144.0.
A comparison was made between the Michigan report and i
the 1969 cancer-incidence rates from the Connecticut tu- 6.43 What is the standard error of d ?
mor registry, where the expected incidence rate, based on 6.44 What is a 95% CI for the population mean of d?
the age distribution of the Michigan nurses, was determined 6.45 Can we make a statement about the effectiveness of
to be 402.8 per 100,000 person-years. the drug?
6.35 Comment on the comparison between the observed 6.46 What does a 95% CI mean, in words, in this case?
incidence rate and the Connecticut tumor-registry data.
SIMULA TION
Obstetrics, Serology Draw six random samples of size 5 from the data in
A new assay is developed to obtain the concentration of Table 6.2.
M. hominis mycoplasma in the serum of pregnant women. 6.47 Compute the mean birthweight for each of the six
The developers of this assay want to make a statement on samples.
the variability of their laboratory technique. For this purpose,
10 subsamples of 1 mL each are drawn from a large serum 6.48 Compute the standard deviation based on the
sample for one woman, and the assay is performed on each sample of six means. What is another name for this
subsample. The concentrations are as follows: 2 , 2 , 2 , 2 , quantity?
4
4
3
5
2 2 2 , 2 , 2 , 2 . 6.49 Select the third point from each of the six samples,
5
3
4
4
4
5
*6.36 If the distribution of concentrations in the log scale and compute the sample sd from the collection of six third
to the base 2 is assumed to be normal, then obtain the best points.
estimate of the variance of the concentrations from these 6.50 What theoretical relationship should there be between
data. the standard deviation in Problem 6.48 and the standard
*6.37 Compute a 95% CI for the variance of the concentra- deviation in Problem 6.49?
tions. 6.51 How do the actual sample results in Problems 6.48
*6.38 Assuming the point estimate in Problem 6.36 is the and 6.49 compare?
true population parameter, what is the probability that a par-
ticular assay, when expressed in the log scale to the base 2,
is no more than 1.5 log units off from its true mean value for Obstetrics
a particular woman? Figure 6.4b plotted the sampling distribution of the mean
*6.39 Answer Problem 6.38 for 2.5 log units. from 200 samples of size 5 from the population of 1000
birthweights given in Table 6.2. The mean of the 1000
birthweights in Table 6.2 is 112.0 oz with standard deviation
Hypertension 20.6 oz.
Suppose 100 hypertensive people are given an antihyper- *6.52 If the central-limit theorem holds, what proportion of
tensive drug and the drug is effective in 20 of them. By the sample means should fall within 0.5 lb of the population
effective, we mean their DBP is lowered by at least 10 mm mean (112.0 oz)?
Hg as judged from a repeat blood-pressure measurement *6.53 Answer Problem 6.52 for 1 lb rather than 0.5 lb.
1 month after taking the drug. *6.54 Compare your results in Problems 6.52 and 6.53
6.40 What is the best point estimate of the probability p of with the actual proportion of sample means that fall in these
the drug being effective? ranges.
6.41 Suppose we know that 10% of all hypertensive pa- *6.55 Do you feel the central-limit theorem is applicable for
tients who are given a placebo will have their DBP lowered samples of size 5 from this population? Explain.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 198 7/15/10 6:58:21 PM
Problems 199
Hypertension, Pediatrics 6.59 Use interval-estimation methods to determine if
The etiology of high blood pressure remains a subject of the sex of successive births is predictable from the sex of
active investigation. One widely accepted hypothesis is that previous births.
excessive sodium intake adversely affects blood-pressure
outcomes. To explore this hypothesis, an experiment was SIMULA TION
set up to measure responsiveness to the taste of salt and
to relate the responsiveness to blood-pressure level. The Nutrition
protocol used involved giving 3-day-old infants in the new- Data Set VALID.DAT, on the Companion Website, provides
born nursery a drop of various solutions, thus eliciting the estimated daily consumption of total fat, saturated fat, and
sucking response and noting the vigor with which they alcohol as well as total caloric intake using two different
sucked—denoted by MSB (mean number of sucks per burst methods of dietary assessment for 173 subjects.
of sucking). The content of the solution was changed over 6.60 Use a computer to draw repeated random samples of
10 consecutive periods: (1) water, (2) water, (3) 0.1 molar size 5 from this population. Does the central-limit theorem seem
salt + water, (4) 0.1 molar salt + water, (5) water, (6) water, to hold for these dietary attributes based on samples of size 5?
(7) 0.3 molar salt + water, (8) 0.3 molar salt + water, (9) 6.61 Answer Problem 6.60 for random samples of size 10.
water, (10) water. In addition, as a control, the response of
the baby to the taste of sugar was also measured after the 6.62 Answer Problem 6.60 for random samples of size 20.
salt-taste protocol was completed. In this experiment, the 6.63 How do the sampling distributions compare based on
sucking response was measured over five different periods samples of size 5, 10, and 20? Use graphic and numeric
with the following stimuli: (1) nonnutritive sucking, that is, a methods to answer this question.
pure sucking response was elicited without using any exter-
nal substance; (2) water; (3) 5% sucrose + water; (4) 15% Infectious Disease
sucrose + water; (5) nonnutritive sucking.
A cohort of hemophiliacs is followed to elicit information on
The data for the first 100 infants in the study are given in the distribution of time to onset of AIDS following serocon-
Data Set INFANTBP.DAT, on the Companion Website. The version (referred to as latency time). All patients who sero-
format of the data is given in Data Set INFANTBP.DOC, on convert become symptomatic within 10 years, according to
the Companion Website. the distribution in Table 6.11.
Construct a variable measuring the response to salt. For
example, one possibility is to compute the average MSB for Table 6.11 Latency time to AIDS among
trials 3 and 4 − average MSB for trials 1 and 2 = average hemophiliacs who become HIV positive
MSB when the solution was 0.1 molar salt + water − aver-
age MSB when the solution was water. A similar index could Latency time (years) Number of patients
be computed comparing trials 7 and 8 with trials 5 and 6.
0 2
6.56 Obtain descriptive statistics and graphic displays for 1 6
these salt-taste indices. Do the indices appear to be normally 2 9
distributed? Why or why not? Compute the sample mean for 3 33
this index, and obtain 95% CIs about the point estimate.
4 49
6.57 Construct indices measuring responsiveness to sugar 5 66
taste, and provide descriptive statistics and graphical dis- 6 52
plays for these indices. Do the indices appear normally 7 37
distributed? Why or why not? Compute the sample mean
and associated 95% CIs for these indices. 8 18
9 11
6.58 We want to relate the indices to blood-pressure level. 10 4
Provide a scatter plot relating mean SBP and mean DBP,
respectively, to each of the salt-taste and sugar-taste indi-
ces. Does there appear to be a relation between the indices 6.64 Assuming an underlying normal distribution, compute
and blood-pressure level? We discuss this in more detail in 95% CIs for the mean and variance of the latency times.
our work on regression analysis in Chapter 11.
6.65 Still assuming normality, estimate the probability p that
a patient’s latency time will be at least 8 years.
Genetics 6.66 Now suppose we are unwilling to assume a normal
Data Set SEXRAT.DAT, on the Companion Website, lists distribution for latency time. Re-estimate the probability p
the sexes of children born in over 50,000 families with more that a patient’s latency time will be at least 8 years, and
than one child. provide a 95% CI for p.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 199 7/15/10 6:58:21 PM
200 C H A P T E R 6 ■ Estimation
Environmental Health The study reported that of a total of 1996 tests given to
We have previously described Data Set LEAD.DAT (on 40- to 49-year-old women, 156 yielded false-positive results.
the Companion Website), in which children were classi- 6.77 What does a false-positive test result mean, in words,
fied according to blood-lead level in 1972 and 1973 by in this context?
the variable lead_group, where 1 = blood-lead level < 40 6.78 Some physicians feel a mammogram is not cost-
µg/100 mL in both 1972 and 1973, 2 = blood-lead level effective unless one can be reasonably certain (e.g., 95%
≥ 40 µg/100 mL in 1973, 3 = blood-lead level > 40 µg/100 certain) that the false-positive rate is less than 10%. Can
mL in 1972 but < 40 µg/100 mL in 1973. you address this issue based on the preceding data? (Hint:
6.67 Compute the mean, standard deviation, standard error, Use a CI approach.)
and 95% CI for the mean verbal IQ for children with specific 6.79 Suppose a woman is given a mammogram every
values of the variable GROUP. Provide a box plot comparing 2 years starting at age 40. What is the probability that she
the distribution of verbal IQ for subjects with lead_group = will have at least one false-positive test result among 5
1, 2, and 3. Summarize your findings concisely. screening tests during her forties? (Assume the repeated
6.68 Answer Problem 6.67 for performance IQ. screening tests are independent.)
6.69 Answer Problem 6.67 for full-scale IQ. 6.80 Provide a two-sided 95% CI for the probability estimate
in Problem 6.79.
Cardiology
Data Set NIFED.DAT (on the Companion Website) was SIMULA TION
described earlier. We wish to look at the effect of each Nutrition
treatment separately on heart rate and systolic blood pres- On the computer, we draw 500 random samples of size 5
sure (SBP). from the distribution of 173 values of ln(alcohol DR [diet
6.70 Provide separate point estimates and 95% CIs for the record] + 1) in the Data Set VALID.DAT, where Alcoh_dr is
changes in heart rate and SBP (level 1 to baseline) for the the amount of alcohol consumed as reported by diet record
subjects randomized to nifedipine and propranolol, respec- by a group of 173 American nurses who recorded each
tively. Also provide box plots of the change scores in the food eaten on a real-time basis, over four 1-week periods
two treatment groups. spaced approximately 3 months apart over the course of
1 year. For each sample of size 5, we compute the sample
6.71 Answer Problem 6.70 for level 2 to baseline. mean x , the sample standard deviation s, and the test sta-
–
6.72 Answer Problem 6.70 for level 3 to baseline. tistic t given by
6.73 Answer Problem 6.70 for the last available level to x −µ
baseline. t = s n 0
6.74 Answer Problem 6.70 for the average heart rate (or
blood pressure) over all available levels to baseline. where n = 5 and µ = overall mean of ln(alcohol DR + 1)
0
over the 173 nurses = 1.7973.
Occupational Health 6.81 What distribution should the t-values follow if the
*6.75 Refer to Problem 4.23. Provide a 95% CI for the central-limit theorem holds? Assume µ is the population
0
expected number of deaths from bladder cancer over 20 mean for ln(Alcoh_dr + 1).
years among tire workers. Is the number of cases of bladder 6.82 If the central-limit theorem holds, then what percent-
cancer in this group excessive? age of t-values should exceed 2.776 in absolute value?
*6.76 Refer to Problem 4.24. Provide a 95% CI for the 6.83 The actual number of t-values that exceed 2.776 in
expected number of deaths from stomach cancer over 20 absolute value is 38. Do you feel the central-limit theorem is
years among tire workers. Is the number of cases of stom- applicable to these data for samples of size 5?
ach cancer in this group excessive?
Cardiovascular Disease
Cancer A study was performed to investigate the variability of cho-
The value of mammography as a screening test for breast lesterol and other lipid measures in children. The reported
cancer has been controversial, particularly among young within-subject standard deviation for cholesterol in children
women. A study was recently performed looking at the rate was 7.8 mg/dL [10].
of false positives for repeated screening mammograms 6.84 Suppose two total cholesterol determinations are
among approximately 10,000 women who were members obtained from one child, yielding an average value of 200
of Harvard Pilgrim Health Care, a large health-maintenance mg/dL. What is a two-sided 90% CI for the true mean
organization in New England [9]. total cholesterol for that child? (Hint: Assume the sample
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 200 7/15/10 6:58:22 PM
Problems 201
standard deviation of cholesterol for the child is known to Table 6.12 Home blood pressure recordings for one
be 7.8 mg/dL.) patient
6.85 Suppose an average of two total cholesterol determi-
nations is to be used as a screening tool to identify children Day SBP (mm Hg) DBP (mm Hg)
with high cholesterol. The investigators wish to find a value 1 121 87.5
c, such that all children whose mean cholesterol values 2 109 81
over two determinations are ≥c will be called back for fur-
ther screening, whereas children whose mean cholesterol 3 117.5 91.5
values are <c will not be followed any further. To determine 4 125 94
c, the investigators want to choose a value c such that 5 125 87.5
the lower one-sided 90% CI for µ if the observed average 6 129 90.5
cholesterol over two determinations = c would exclude 7 123 90
250 mg/dL. What is the largest value of c that satisfies this 8 118.5 85.5
requirement? 9 123.5 87.5
10 127 89
Endocrinology Mean 121.85 88.40
Refer to Data Set BONEDEN.DAT on the Companion sd 5.75 3.56
Wesbite. n 10 10
6.86 Assess whether there is a relationship between BMD
at the femoral neck and cigarette smoking using CI method- The doctor wants to assess whether the underlying mean
ology. (Hint: Refer to Section 6.6.) SBP for this woman is <140 or ≥140 mm Hg.
6.87 Assess whether there is a relationship between BMD 6.97 Provide a 95% CI for true mean SBP for this patient.
at the femoral shaft and cigarette smoking using CI method-
ology. (Hint: Refer to Section 6.6.) 6.98 Answer the doctor’s question given the result in Prob-
lem 6.97.
SIMULA TION Another issue the doctor wants to study is what the hyper-
6.88 Using the computer, generate 200 random samples tensive status of the patient usually is. A person is classified
as hypertensive on any one day if either his or her SBP is
from a binomial distribution with n = 10 and p = .6. Derive a ≥140 mm Hg or his or her DBP is ≥90 mm Hg.
large sample 90% CI for p based on each sample.
6.89 What percentage of the CIs include the parameter p? 6.99 What proportion of days would the woman be classi-
fied as hypertensive based on the preceding data?
6.90 Do you think that the large-sample binomial confidence- A person would be classified as hypertensive overall if his or
limit formula is adequate for this distribution? her probability of being hypertensive on an individual day (p)
6.91 Answer Problem 6.88 for a binomial distribution with is ≥20% based on a large number of days.
n = 20 and p = .6. 6.100 Develop 95% CI for p based on your answer to
6.92 Answer Problem 6.89 for a binomial distribution with Problem 6.99. (Hint: Use Appendix Table 7a.)
n = 20 and p = .6. 6.101 Would the person be classified as hypertensive
6.93 Answer Problem 6.90 for a binomial distribution with overall based on your answer to Problem 6.100? Why or
n = 20 and p = .6. why not? Explain your answer.
6.94 Answer Problem 6.88 for a binomial distribution with
n = 50 and p = .6. Sports Medicine
6.95 Answer Problem 6.89 for a binomial distribution with Injuries are common in football and may be related to a
n = 50 and p = .6. number of factors, including the type of playing surface, the
6.96 Answer Problem 6.90 for a binomial distribution with number of years of playing experience, and whether any pre-
n = 50 and p = .6. vious injury exists. A study of factors affecting injury among
Canadian football players was recently reported [11].
Hypertension The rate of injury to the upper extremity (that is, shoulder to
A patient who is taking antihypertensive medication is asked hand) on a dry field consisting of natural grass was 2.73 inju-
by her doctor to record her blood pressure at home to ries per 1000 games. Assume this rate is known without error.
check that it is in the normotensive range. On each of 10 6.102 The study reported 45 injuries to the upper extrem-
days, she took an average of two readings, with results as ity on a dry field consisting of artificial turf over the course
shown in Table 6.12. of 10,112 games. What procedure can be used to assess
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 201 7/15/10 6:58:22 PM
202 C H A P T E R 6 ■ Estimation
whether the risk of injury is different on artificial turf versus 6.107 What is the probability that the sample mean will be
natural grass? within 1 unit of the population mean?
6.103 Provide a 95% CI for the rate of injury to the upper 6.108 Suppose we want to choose a large enough sample
extremity on artificial turf. (Hint: Use the Poisson distribution.) so that the sample mean is within 1 unit of the population
Express each rate as the number of injuries per 1000 games. mean 99% of the time. What is the minimum sample size to
achieve this goal?
Hypertension
A hypertensive patient has been on antihypertensive medi- Radiology, Cancer
cation for several years. Her physician wants to monitor her A radiologist investigates whether a new (less costly)
blood pressure via weekly measurements taken at home. method for identifying esophageal cancer is as effective as
Each week for 6 weeks she takes several blood pressure the gold standard.
readings and averages the readings to get a summary blood He obtains the following test results: false positive = 0, true
pressure for the week. The diastolic blood pressure (DBP) positive = 46, false negative = 1, true negative =17.
results are shown in Table 6.13.
6.109 What is the sensitivity of the test?
Table 6.13 Weekly mean DBP readings 6.110 Provide a 95% CI for the sensitivity (two decimal
for an individual patient place accuracy is sufficient). (Hint: The following spread-
sheet might be useful.)
Week Mean DBP Mean DBP
(mm Hg) Week (mm Hg)
p BINOMDIST BINOMDIST
(45,47,p,TRUE) (46,47,p,TRUE)
1 89 4 84
2 88 5 82 0.80 1.000 1.000
3 81 6 89.5 0.81 0.999 1.000
0.82 0.999 1.000
Mean 85.75 0.83 0.998 1.000
sd 3.66
0.84 0.997 1.000
0.85 0.996 1.000
6.104 Her doctor is considering taking her off antihyperten- 0.86 0.993 0.999
sive medications but wants to be fairly certain that her “true” 0.87 0.988 0.999
DBP is less than 90 mm Hg. Use a statistical approach to 0.88 0.982 0.998
answer this question. (Hint: Consider a CI approach.) 0.89 0.972 0.996
The doctor takes the patient off antihypertensive medica- 0.90 0.956 0.993
tion and instructs her to measure her blood pressure for 0.91 0.933 0.988
3 consecutive weeks. The doctor will put the patient back 0.92 0.899 0.980
on antihypertensive medication if her mean DBP over the 3 0.93 0.850 0.967
weeks is ≥90 mm Hg. 0.94 0.782 0.945
6.105 Suppose there is no real change in the patient’s un- 0.95 0.688 0.910
derlying mean blood pressure regardless of whether she is on 0.96 0.566 0.853
medication. What is the probability that she will be put back 0.97 0.414 0.761
on antihypertensive medication? (Hint: Assume that the true 0.98 0.242 0.613
mean and standard deviation of DBP for the patient are the 0.99 0.080 0.376
same as the measured mean and standard deviation over the 0.995 0.023 0.210
6 weeks while the patient is on antihypertensive medication.) 0.999 0.001 0.046
6.106 Suppose we observe x events over T years for a 0.9995 0.000 0.023
random variable X that is Poisson-distributed with expected
value = µ. Note: These data were provided by Dr. Ori Preis.
(a) Show that x is the MLE of µ. Genetics
(b) Suppose λ is the number of events per year. What is
the MLE of λ? The estimation of allele probabilities is essential for the
closer quantitative identification of inheritance. It requires
Suppose we have a population with a normal distribution the probabilistic formulation of the applied model of inheri-
with mean = 50 and standard deviation = 10. tance. The hereditary disease phenylketonuria (PKU) is a
We draw a sample of 13 observations from this distribution. useful example. PKU follows a recessive form of inheritance.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 202 7/15/10 6:58:22 PM
References 203
Suppose there are two alleles at a gene locus denoted by 6.113 Provide a point estimate and 95% CI for the a allele
a and A where the possible genotypes are (aa), (aA), and frequency p.
(AA). An individual will only be affected if the genotype aa As an experiment, 10,000 people are completely geno-
appears (i.e., a recessive form of inheritance). typed, of whom 10 have the (aa) genotype, 630 have the
6.111 Suppose the probability of an a allele is p. If people (aA) genotype [i.e., either (aA) or (Aa)], and 9360 have the
mate randomly, then what is the probability of the (aa) geno- (AA) genotype.
type? 6.114 If the two alleles of an individual are independent
Suppose that on a population level it is impossible to geno- random variables, then provide a point estimate and a 95%
type large numbers of individuals. However, it is known that CI for the a allele frequency p.
among 10,000 people surveyed in the population, 11 have 6.115 Does genotyping a population provide more accu-
the PKU clinical phenotype. rate estimates of p than obtained by only having the clinical
6.112 Provide a point estimate and 95% CI for the prob- phenotype? Why or why not?
ability of having the PKU phenotype.
R EFERENCES
[1] Cochran, W. G. (1963). Sampling techniques (2nd ed.). [7] Oldenburg, B., Macdonald, G. J., & Perkins, R. J.
New York: Wiley. (1988). Prediction of quality of life in a cohort of end-stage
[2] SHEP Cooperative Research Group. (1991). Preven- renal disease patients. Journal of Clinical Epidemiology, 41(6),
tion of stroke by antihypertensive drug treatment in older 555−564.
persons with isolated systolic hypertension: Final results of [8] Corbett, T. H., Cornell, R. G., Leiding, K., & Endres,
the Systolic Hypertension in the Elderly Program (SHEP). J. L. (1973). Incidence of cancer among Michigan nurse-
JAMA, 265(24), 3255−3264. anesthetists. Anesthesiology, 38(3), 260−263.
[3] Mood, A., & Graybill, F. A., & Boes, D. C. (1974). [9] Elmore, J. G., Barton, M. B., Moceri, V. M., Polk, S.,
Introduction to the theory of statistics (3rd ed.). New York: Arena, P. J., & Fletcher, S. W. (1998). Ten year risk of false
McGraw-Hill. positive screening mammograms and clinical breast exami-
[4] Documenta Geigy scientific tables, 8th ed., vol. 2. nations. New England Journal of Medicine, 338, 1089−1096.
(1982). Basel: Ciba-Geigy. [10] Elveback, L. R., Weidman, W. H., & Ellefson, R. D.
[5] Arora, N. S., & Rochester, D. R (1984). Effect of (1980). Day to day variability and analytic error in deter-
chronic airflow limitation (CAL) on sternocleidomastoid mination of lipids in children. Mayo Clinic Proceedings, 55,
muscle thickness. Chest, 85(6), 58S−59S. 267−269.
[6] Barry, A. L., Gavan, T. L., & Jones, R. N. (1983). [11] Hagel, B. E., Fick, G. H., & Meeuwisse, W. H. (2003).
Quality control parameters for susceptibility data with 30 µg Injury risk in men’s Canada West University Football. Ameri-
netilmicin disks. Journal of Clinical Microbiology, 18(5), can Journal of Epidemiology, 157, 825−833.
1051−1054.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-006.indd 203 7/15/10 6:58:22 PM
7
Hypothesis Testing:
One-Sample Inference
7.1 Introduction
Chapter 6 discussed methods of point and interval estimation for parameters of various
distributions. However, researchers often have preconceived ideas about what the values
of these parameters might be and wish to test whether the data conform to these ideas.
Example 7.1 Cardiovascular Disease, Pediatrics A current area of research interest is the familial
aggregation of cardiovascular risk factors in general and lipid levels in particular.
Suppose the “average” cholesterol level in children is 175 mg/dL. A group of men
who have died from heart disease within the past year are identified, and the choles-
terol levels of their offspring are measured. Two hypotheses are considered:
(1) The average cholesterol level of these children is 175 mg/dL.
(2) The average cholesterol level of these children is >175 mg/dL.
This type of question is formulated in a hypothesis-testing framework by specify-
ing two hypotheses—a null and an alternative hypothesis. We wish to compare the
relative probabilities of obtaining the sample data under each of these hypotheses.
In Example 7.1, the null hypothesis is that the average cholesterol level of the chil-
dren is 175 mg/dL and the alternative hypothesis is that the average cholesterol level
of the children is >175 mg/dL.
Why is hypothesis testing so important? Hypothesis testing provides an ob-
jective framework for making decisions using probabilistic methods, rather than
relying on subjective impressions. People can form different opinions by looking
at data, but a hypothesis test provides a uniform decision-making criterion that is
consistent for all people.
In this chapter, some of the basic concepts of hypothesis testing are developed and
applied to one-sample problems of statistical inference. In a one-sample problem,
hypotheses are specified about a single distribution; in a two-sample problem, two
different distributions are compared.
7.2 General Concepts
Example 7.2 Obstetrics Suppose we want to test the hypothesis that mothers with low socio-
economic status (SES) deliver babies whose birthweights are lower than “normal.”
To test this hypothesis, a list is obtained of birthweights from 100 consecutive, full-
term, live-born deliveries from the maternity ward of a hospital in a low-SES area.
The mean birthweight (x) is found to be 115 oz with a sample standard deviation (s)
of 24 oz. Suppose we know from nationwide surveys based on millions of deliveries
204
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 204 7/15/10 7:32:55 PM
7.2 ■ General Concepts 205
that the mean birthweight in the United States is 120 oz. Can we actually say the
underlying mean birthweight from this hospital is lower than the national average?
Assume the 100 birthweights from this hospital come from an underlying nor-
mal distribution with unknown mean µ. The methods in Section 6.10 could be used
to construct a 95% lower one-sided confidence interval (CI) for µ based on the sam-
ple data—that is, an interval of the form µ < c. If this interval contains 120 oz (that is,
c ≥ 120), then the hypothesis that the mean birthweight in this hospital is the same
as the national average would be accepted. If the interval does not contain 120 oz
(c < 120), then the hypothesis that the mean birthweight in this hospital is lower
than the national average would be accepted.
Another way of looking at this problem is in terms of hypothesis testing. In
particular, the hypotheses being considered can be formulated in terms of null and
alternative hypotheses, which can be defined as follows:
Definition 7.1 The null hypothesis, denoted by H , is the hypothesis that is to be tested. The alter-
0
native hypothesis, denoted by H is the hypothesis that in some sense contradicts
1
the null hypothesis.
Example 7.3 Obstetrics In Example 7.2, the null hypothesis (H ) is that the mean birthweight in
0
the low-SES-area hospital (µ) is equal to the mean birthweight in the United States
(µ ). This is the hypothesis we want to test. The alternative hypothesis (H ) is that
1
0
the mean birthweight in this hospital (µ) is lower than the mean birthweight in the
United States (µ ). We want to compare the relative probabilities of obtaining the
0
sample data under each of these two hypotheses.
We also assume the underlying distribution is normal under either hypothesis.
These hypotheses can be written more succinctly in the following form:
Equation 7.1 H : µ = µ 0 vs. H : µ < µ 0
1
0
Suppose the only possible decisions are whether H is true or H is true. Actually, for
0
1
ease of notation, all outcomes in a hypothesis-testing situation generally refer to
the null hypothesis. Hence, if we decide H is true, then we say we accept H . If we
0
0
decide H is true, then we state that H is not true or, equivalently, that we reject H .
1
0
0
Thus four possible outcomes can occur:
(1) We accept H , and H is in fact true.
0 0
(2) We accept H , and H is in fact true.
0 1
(3) We reject H , and H is in fact true.
0 0
(4) We reject H , and H is in fact true.
0 1
These four possibilities are shown in Table 7.1.
Table 7.1 Four possible outcomes in hypothesis testing
Truth
Decision H 0 H 1
Accept H
0 H is true and H is accepted H is true and H is accepted
0
0
1
0
Reject H H is true and H is rejected H is true and H is rejected
0 0 0 1 0
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 205 7/15/10 7:32:55 PM
206 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
In actual practice, it is impossible, using hypothesis-testing methods, to prove that
the null hypothesis is true. Thus, in particular, if we accept H , then we have actually
0
failed to reject H .
0
If H is true and H is accepted, or if H is true and H is rejected, then the cor-
0
0
0
1
rect decision has been made. If H is true and H is rejected, or if H is true and H is
0
1
0
0
accepted, then an error has been made. The two types of errors are generally treated
differently.
Definition 7.2 The probability of a type I error is the probability of rejecting the null hypothesis
when H is true.
0
Definition 7.3 The probability of a type II error is the probability of accepting the null hypothesis
when H is true. This probability is a function of µ as well as other factors.
1
Example 7.4 Obstetrics In the context of the birthweight data in Example 7.2, a type I error
would be the probability of deciding that the mean birthweight in the hospital was
lower than 120 oz when in fact it was 120 oz. A type II error would be the probabil-
ity of deciding that the mean birthweight was 120 oz when in fact it was lower than
120 oz.
Example 7.5 Cardiovascular Disease, Pediatrics What are the type I and type II errors for the
cholesterol data in Example 7.1?
Solution The type I error is the probability of deciding that offspring of men who died from
heart disease have an average cholesterol level higher than 175 mg/dL when in fact
their average cholesterol level is 175 mg/dL. The type II error is the probability of
deciding that the offspring have normal cholesterol levels when in fact their choles-
terol levels are above average.
Type I and type II errors often result in monetary and nonmonetary costs.
Example 7.6 Obstetrics The birthweight data in Example 7.2 might be used to decide whether
a special-care nursery for low-birthweight babies is needed in this hospital. If H
1
were true—that is, if the birthweights in this hospital did tend to be lower than the
national average—then the hospital might be justified in having its own special-care
nursery. If H were true and the mean birthweight was no different from the U.S.
0
average, then the hospital probably does not need such a nursery. If a type I error is
made, then a special-care nursery will be recommended, with all the related extra
costs, when in fact it is not needed. If a type II error is made, a special-care nursery
will not be funded, when in fact it is needed. The nonmonetary cost of this decision
is that some low-birthweight babies may not survive without the unique equipment
found in a special-care nursery.
Definition 7.4 The probability of a type I error is usually denoted by α and is commonly referred
to as the significance level of a test.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 206 7/15/10 7:32:56 PM
7.3 ■ One-Sample Test for the Mean of a Normal Distribution: One-Sided Alternatives 207
Definition 7.5 The probability of a type II error is usually denoted by β.
Definition 7.6 The power of a test is defined as
1 − β = 1 − probability of a type II error = Pr(rejecting H |H true)
0 1
Example 7.7 Rheumatology Suppose a new drug for pain relief is to be tested among patients
with osteoarthritis (OA). The measure of pain relief will be the percent change in
pain level as reported by the patient after taking the medication for 1 month. Fifty
OA patients will participate in the study. What hypotheses are to be tested? What do
type I error, type II error, and power mean in this situation?
Solution The hypotheses to be tested are H : µ= 0 vs. H : µ> 0, where µ = mean % change in
1
0
level of pain over a 1-month period. It is assumed that a positive value for µ indi-
cates improvement, whereas a negative value indicates decline.
A type I error is the probability of deciding that the drug is an effective pain re-
liever based on data from 50 patients, given that the true state of nature is that the
drug has no effect on pain relief. The true state of nature here means the effect of the
drug when tested on a large (infinite) number of patients.
A type II error is the probability of deciding the drug has no effect on pain relief
based on data from 50 patients given that the true state of nature is that the drug is
an effective pain reliever.
The power of the test is the probability of deciding that the drug is effective
as a pain reliever based on data from 50 patients when the true state of nature is
that it is effective. It is important to note that the power is not a single number
but depends on the true degree of pain relief offered by the drug as measured
by the true mean change in pain-relief score (δ). The higher δ is, the higher the
power will be. In Section 7.5, we present methods for calculating power in more
detail.
The general aim in hypothesis testing is to use statistical tests that make α and β
as small as possible. This goal requires compromise because making α small involves
rejecting the null hypothesis less often, whereas making β small involves accepting
the null hypothesis less often. These actions are contradictory; that is, as α decreases,
β increases, and as α increases, β decreases. Our general strategy is to fix α at some
specific level (for example, .10, .05, .01, . . .) and to use the test that minimizes β or,
equivalently, maximizes the power.
7.3 One-Sample Test for the Mean of a Normal
Distribution: One-Sided Alternatives
Now let’s develop the appropriate hypothesis test for the birthweight data in Ex-
ample 7.2. The statistical model in this case is that the birthweights come from a
normal distribution with mean µ and unknown variance σ . We wish to test the
2
null hypothesis, H , that µ = 120 oz vs. the alternative hypothesis, H , that µ <
1
0
120 oz. Suppose a more specific alternative, namely H : µ 1 µ = 1 = 110 oz, is selected.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 207 7/15/10 7:32:56 PM
208 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
We will show that the nature of the best test does not depend on the value
chosen for µ provided that µ is less than 120 oz. We will also fix the α level at .05
1
1
for concreteness.
Example 7.8 A very simple test could be used by referring to the table of random digits in Table 4
in the Appendix. Suppose two digits are selected from this table. The null hypoth-
esis is rejected if these two digits are between 00 and 04 inclusive and is accepted if
these two digits are between 05 and 99. Clearly, from the properties of the random-
number table, the type I error of this test = α = Pr(rejecting the null hypothesis |
5
H true) = Pr(drawing two random digits between 00 and 04) = = .05. Thus the
0 100
proposed test satisfies the α-level criterion given previously. The problem with this
test is that it has very low power. Indeed, the power of the test = Pr(rejecting the null
5
hypothesis | H true) = Pr(drawing two random digits between 00 and 04) = = .05.
1 100
Note that the outcome of the test has nothing to do with the sample birth-
weights drawn. H will be rejected just as often when the sample mean birthweight
0
(x) is 110 oz as when it is 120 oz. Thus this test must be very poor because we would
expect to reject H with near certainty if x is small enough and would expect never
0
to reject H if x is large enough.
0
It can be shown that the best (most powerful) test in this situation is based on
the sample mean (x). If x is sufficiently smaller than µ , then H is rejected; other-
0 0
wise, H is accepted. This test is reasonable because if H is true, then the most likely
0 0
values of x tend to cluster around µ , whereas if H is true, the most likely values of
0 1
x tend to cluster around µ . By “most powerful,” we mean that the test based on the
1
sample mean has the highest power among all tests with a given type I error of α.
Definition 7.7 The acceptance region is the range of values of x for which H is accepted.
0
Definition 7.8 The rejection region is the range of values of x for which H is rejected.
0
For the birthweight data in Example 7.2, the rejection region consists of small
values of x because the underlying mean under the alternative hypothesis (µ ) is less
1
than the underlying mean under the null hypothesis. This type of test is called a
one-tailed test.
Definition 7.9 A one-tailed test is a test in which the values of the parameter being studied (in this
case µ) under the alternative hypothesis are allowed to be either greater than or less
than the values of the parameter under the null hypothesis (µ ), but not both.
0
Example 7.9 Cardiovascular Disease, Pediatrics The hypotheses for the cholesterol data in Ex-
ample 7.1 are H : µ 0 µ = 0 vs. H : µ 1 µ > 0 , where µ is the true mean cholesterol level for
children of men who have died from heart disease. This test is one-tailed because the
alternative mean is only allowed to be greater than the null mean.
In the birthweight example, how small should x be for H to be rejected? This
0
issue can be settled by recalling that the significance level of the test is set at α.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 208 7/15/10 7:32:59 PM
7.3 ■ One-Sample Test for the Mean of a Normal Distribution: One-Sided Alternatives 209
Suppose H is rejected for all values of x < c and accepted otherwise. The value c
0
should be selected so that the type I error = α.
It is more convenient to define test criteria in terms of standardized values rather
than in terms of x. Specifically, if we subtract µ and divide by S n, we obtain the
0
random variable t = X − ( µ 0 ) S ( n), which, based on Equation 6.5, follows a t n−1 dis-
tribution under H . We note that under H , based on the definition the percentiles of
0
0
a t distribution, Pr(t < t n−1 ,α ) = α. This leads us to the following test procedure.
Equation 7.2 One-Samplet t Test for the Mean of a Normal Distribution with Unknown Variance
(Alternative Mean < Null Mean)
To test the hypothesis H : µ 0 µ = 0 ,σ unknown vs. H : µ 1 µ < 0 ,σ unknown with a
significance level of α, we compute
x −µ
t = 0
s n
If t < t n−1,α , then we reject H .
0
If t ≥ t n−1,α , then we accept H .
0
Definition 7.10 The value t in Equation 7.2 is called a test statistic because the test procedure is
based on this statistic.
Definition 7.11 The value t n−1,α in Equation 7.2 is called a critical value because the outcome of the
test depends on whether the test statistic t < t n−1,α = critical value, whereby we reject
H or t ≥ t , whereby we accept H .
0 n−1,α 0
Definition 7.12 The general approach in which we compute a test statistic and determine the out-
come of a test by comparing the test statistic with a critical value determined by the
type I error is called the critical-value method of hypothesis testing.
Example 7.10 Obstetrics Use the one-sample t test to test the hypothesis H : µ = 120 vs. H : µ < 120
0
1
based on the birthweight data given in Example 7.2 and using a significance level
of .05.
Solution We compute the test statistic
x − µ
t = 0
s n
115 −120
=
24 100
−5
= =−208.
.
24
Using the TINV function of Excel, we see the critical value = t 99, .05 = −TINV(.05,99) =
−1.66. Because t = −2.08 < −1.66, it follows that we can reject H at a significance
0
level of .05.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 209 7/15/10 7:33:03 PM
210 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
Example 7.11 Obstetrics Use the one-sample t test to test the hypothesis given in Example 7.10
using a significance level of .01.
Solution Using Excel, the critical value is t 99, .01 = −TINV(.01,99) = −2.36. Because t = −2.08 >
−2.36, it follows that we accept H at significance level = .01.
0
If we use the critical-value method, how do we know what level of α to use? The
actual α level used should depend on the relative importance of type I and type II
errors because the smaller α is made for a fixed sample size (n), the larger β becomes.
Most people feel uncomfortable with α levels much greater than .05. Traditionally,
an α level of exactly .05 is used most frequently.
In general, a number of significance tests could be performed at different α
levels, as was done in Examples 7.10 and 7.11, and whether H would be accepted
0
or rejected in each instance could be noted. This can be somewhat tedious and is
unnecessary because, instead, significance tests can be effectively performed at all α
levels by obtaining the p-value for the test.
Definition 7.13 The p-value for any hypothesis test is the α level at which we would be indifferent
between accepting or rejecting H given the sample data at hand. That is, the p-value
0
is the α level at which the given value of the test statistic (such as t) is on the border-
line between the acceptance and rejection regions.
According to the test criterion in Equation 7.2, if a significance level of p is used,
then H would be rejected if t < t n−1, p and accepted if t ≥ t n−1, p . We would be indiffer-
0
ent to the choice between accepting or rejecting H if t = t n−1, p . We can solve for p as
0
a function of t by
Equation 7.3 p = Pr( t n−1 ≤ t)
Thus p is the area to the left of t under a t n−1 distribution.
The p-value can be displayed as shown in Figure 7.1.
Figure 7.1 Graphic display of a p-value
0.4
0.3 t n–1 distribution
Frequency 0.2
0.1
p-value
0.0
t 0
Value
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 210 7/15/10 7:33:04 PM
7.3 ■ One-Sample Test for the Mean of a Normal Distribution: One-Sided Alternatives 211
Example 7.12 Obstetrics Compute the p-value for the birthweight data in Example 7.2.
Solution From Equation 7.3, the p-value is
208)
Pr t ( 99 ≤− .
Using the TDIST function of Excel, we find this probability is given by TDIST
(2.08,99,1) = .020, which is the p-value. Note that the TDIST function can only be
used to obtain right-hand tail areas for positive values of t. However, from the sym-
metry of the t distribution, Pr(t 99 ≤− . 208 = ) Pr(t 99 ≥ . 2 08 ) = TDIST (2.08,99,1). The
third argument (1) indicates that the p-value for a one-tailed test is desired.
An alternative definition of a p-value that is useful in other hypothesis-testing
problems is as follows:
Definition 7.14 The p-value can also be thought of as the probability of obtaining a test statistic as
extreme as or more extreme than the actual test statistic obtained, given that the
null hypothesis is true.
We know that under the null hypothesis, the t statistic follows a t distribution.
n−1
Hence, the probability of obtaining a t statistic that is no larger than t under the null
hypothesis is Pr(t n−1 ≤ ) t = p-value, as shown in Figure 7.1.
Example 7.13 Cardiology A topic of recent clinical interest is the possibility of using drugs to
reduce infarct size in patients who have had a myocardial infarction within the
past 24 hours. Suppose we know that in untreated patients the mean infarct size is 25
(ck − g − EQ/m ). Furthermore, in 8 patients treated with a drug the mean infarct size
2
is 16 with a standard deviation of 10. Is the drug effective in reducing infarct size?
Solution The hypotheses are H : µ= 25 vs. H : µ< 25. The p-value is computed using Equation
0
1
7.3. First we compute the t statistic given by
16 − 25
t = =−255
.
10 8
The p-value is then given by p = Pr(t < −2.55). Referring to Table 5 in the Appendix,
7
we see that t = 2.365, and t = 2.998. Because 2.365 < 2.55 < 2.998, it follows
7, .975 7, .99
that 1 − .99 < p < 1 − .975 or .01 < p < .025. Using Excel, the exact p-value is given by
TDIST (2.55,7,1) = .019. Thus H is rejected and we conclude that the drug signifi-
0
cantly reduces infarct size (all other things being equal).
This can also be interpreted as the probability that mean infarct size among a
random sample of 8 patients will be no larger than 16, if the null hypothesis is true.
In this example, the null hypothesis is that the drug is ineffective, or in other words,
that true mean infarct size for the population of all patients with myocardial infarc-
tion who are treated with drug = true mean infarct size for untreated patients = 25.
The p-value is important because it tells us exactly how significant our results are
without performing repeated significance tests at different α levels. A question typi-
cally asked is: How small should the p-value be for results to be considered statisti-
cally significant? Although this question has no one answer, some commonly used
criteria follow.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 211 7/15/10 7:33:05 PM
212 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
Equation 7.4 Guidelines for Judging the Significance of a p-Value
If .01 ≤ p < .05, then the results are significant.
If .001 ≤ p < .01, then the results are highly significant.
If p < .001, then the results are very highly significant.
If p > .05, then the results are considered not statistically significant (sometimes
denoted by NS).
However, if .05 ≤ p < .10, then a trend toward statistical significance is some-
times noted.
Authors frequently do not specify the exact p-value beyond giving ranges of the type
shown here because whether the p-value is .024 or .016 is thought to be unimportant.
Other authors give an exact p-value even for results that are not statistically significant so
that the reader can appreciate how close to statistical significance the results have come.
With the advent of statistical packages such as Excel, MINITAB, and Stata, exact p-values
are easy to obtain. These different approaches lead to the following general principle.
Equation 7.5 Determination of Statistical Significance for Results from Hypothesis Tests
Either of the following methods can be used to establish whether results from
hypothesis tests are statistically significant:
(1) The test statistic t can be computed and compared with the critical value
t at an α level of .05. Specifically, if H : µ = µ vs. H : µ < µ is being
n−1, α 0 0 1 0
tested and t < t n−105 , then H is rejected and the results are declared statisti-
,.
0
cally significant (p < .05). Otherwise, H is accepted and the results are de-
0
clared not statistically significant (p ≥ .05). We have called this approach the
critical-value method (see Definition 7.12).
(2) The exact p-value can be computed and, if p < .05, then H is rejected and
0
the results are declared statistically significant. Otherwise, if p ≥ .05, then H
0
is accepted and the results are declared not statistically significant. We will
refer to this approach as the p-value method.
These two approaches are equivalent regarding the determination of statistical sig-
nificance (whether p < .05 or p ≥ .05). The p-value method is somewhat more precise
in that it yields an exact p-value. The two approaches in Equation 7.5 can also be
used to determine statistical significance in other hypothesis-testing problems.
Example 7.14 Obstetrics Assess the statistical significance of the birthweight data in Example 7.12.
Solution Because the p-value is .020, the results would be considered statistically significant
and we would conclude that the true mean birthweight is significantly lower in this
hospital than in the general population.
Example 7.15 Cardiology Assess the significance of the infarct-size data in Example 7.13.
Solution The p-value = Pr(t < −2.55). Using the TDIST function of Excel, we found that
7
p = .019. Thus the results are significant.
In writing up the results of a study, a distinction between scientific and statisti-
cal significance should be made because the two terms do not necessarily coincide.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 212 7/15/10 7:33:07 PM
7.3 ■ One-Sample Test for the Mean of a Normal Distribution: One-Sided Alternatives 213
The results of a study can be statistically significant but can still not be scientifically
important. This situation would occur if a small difference was found to be statisti-
cally significant because of a large sample size. Conversely, some statistically nonsig-
nificant results can be scientifically important, encouraging researchers to perform
larger studies to confirm the direction of the findings and possibly reject H with a
0
larger sample size. This statement is true not only for the one-sample t test but for
virtually any hypothesis test.
Example 7.16 Obstetrics Suppose the mean birthweight in Example 7.2 was 119 oz, based on a
sample of size 10,000. Assess the results of the study.
Solution The test statistic would be given by
119 −120
t = =−417
.
24 10 000
,
Thus the p-value is given by Pr(t 9999 < −4.17). Because a t distribution with 9999 de-
grees of freedom (df ) is virtually the same as an N(0,1) distribution, we can approxi-
mate the p-value by Φ(− . ).001. The results are thus very highly significant but
<417
are clearly not very important because of the small difference in mean birthweight
(1 oz) between this hospital and the national average.
Example 7.17 Obstetrics Suppose the mean birthweight in Example 7.2 was 110 oz, based on a
sample size of 10. Assess the results of the study.
Solution The test statistic would be given by
110 −120
t = =−132
.
24 10
The p-value is given by Pr(t < −1.32). From Appendix Table 5, because t 9,.85 = 1.100
9
and t 9,.90 = 1.383 and 1.100 < 1.32 < 1.383, it follows that 1 − .90 < p < 1 − .85 or .10
< p < .15. Using Excel, the p-value = TDIST (1.32,9,1) = .110. These results are not
statistically significant but could be important if the same trends were also apparent
in a larger study.
The test criterion in Equation 7.2 was based on an alternative hypothesis that
µ < µ . In many situations we wish to use an alternative hypothesis that µ > µ . In
0 0
this case H would be rejected if x, or correspondingly our test statistic t, were large
0
(> c) and accepted if t were small (≤ c), where c is derived next. To ensure a type I
error of α, find c such that
α = Pr > (t 0 ) = c H Pr c µ > (t µ = )
0
= − Pr µ ≤ (t µ = c 0 )
1
Because t follows a t distribution under H , we have
n−1 0
α =− Pr (t n − ≤ ) c or − 1 α = Pr (t n −1 1 ≤ ) c
1
Because Pr(t n − < t n 1 , −α ) = 1 − α, we have c = t n − −11 , α . Thus at level α, H is rejected if
−1
1
0
t > t n − −11 , α and accepted otherwise. The p-value is the probability of observing a test
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 213 7/15/10 7:33:08 PM
214 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
statistic at least as large as t under the null hypothesis. Thus, because t follows a t n−1
distribution under H , we have
0
p = t ( n − ≥ Pr t) =− Pr t ( n 1 ≤ t)
1
−1
The test procedure is summarized as follows.
Equation 7.6 One-Sample t Test for the Mean of a Normal Distribution with Unknown Variance
(Alternative Mean > Null Mean)
To test the hypothesis
H : µ = µ 0 vs. H : µ > µ 0
1
0
with a significance level of α, the best test is based on t, where
x −µ
t = 0
s n
If t > t n − −11 , α , then H is rejected
0
If t ≤ t n − −11 , α , then H is accepted
0
The p-value for this test is given by
p = Pr t ( n−1 > t)
The p-value for this test is depicted in Figure 7.2.
Example 7.18 Cardiovascular Disease, Pediatrics Suppose the mean cholesterol level of 10 chil-
dren whose fathers died from heart disease in Example 7.1 is 200 mg/dL and the
sample standard deviation is 50 mg/dL. Test the hypothesis that the mean choles-
terol level is higher in this group than in the general population.
Solution The hypothesis
H : µ 0 175 vs. H : µ = 1 > 175
is tested using an α level of .05. H is rejected if
0
t > t n − −11 , α = t 995
.
,
Figure 7.2 p-value for the one-sample t test when the alternative mean (µ ) > null mean (µ )
1 0
0.4
0.3 t n–1 distribution
Frequency 0.2
0.1 p-value
0.0
0 t
Value
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 214 7/15/10 7:33:11 PM
7.4 ■ One-Sample Test for the Mean of a Normal Distribution: Two-Sided Alternatives 215
In this case,
200 −175
t =
50 10
25
= = 158
.
.
15 81
From Table 5, we see that t 9,.95 = 1.833. Because 1.833 > 1.58, it follows that we accept
H at the 5% level of significance.
0
If we use the p-value method, the exact p-value is given by
p = Pr(t > 1.58)
9
Using Appendix Table 5, we find t 9,.90 = 1.383 and t 9,.95 = 1.833. Thus, because 1.383 <
1.58 < 1.833, it follows that .05 < p < .10. Alternatively, using the TDIST function of
Excel, we can get the exact p-value = Pr(t > 1.58) = TDIST (1.58,9,1) = .074. Because
9
p > .05, we conclude that our results are not statistically significant, and the null
hypothesis is accepted. Thus the mean cholesterol level of these children does not
differ significantly from that of an average child.
REVIEW QUESTIONS 7A
1 What is the difference between a type I and type II error? R E V I E W
2 What is the difference between the critical-value method and the p-value method of
hypothesis testing?
3 Several studies have shown that women with many children are less likely to get
ovarian cancer. In a new study, data are collected from 25 women ages 40−49 with
ovarian cancer. The mean parity (number of children) of these women is 1.8 with
standard deviation 1.2. Suppose the mean number of children among women in the
general population in this age group is 2.5.
(a) What test can be used to test the hypothesis that women with ovarian cancer have
fewer children than women in the general population in the same age group?
(b) Perform the test in Review Question 7A.3a using the critical-value method.
(c) What is the p-value based on the test in Review Question 7A.3a?
(d) What do you conclude from this study?
7.4 One-Sample Test for the Mean of a Normal
Distribution: Two-Sided Alternatives
In the previous section the alternative hypothesis was assumed to be in a specific
direction relative to the null hypothesis.
Example 7.19 Obstetrics Example 7.2 assumed that the mean birthweight of infants from a low-
SES-area hospital was either the same as or lower than average. Example 7.1 assumed
that the mean cholesterol level of children of men who died from heart disease was
either the same as or higher than average.
In most instances this prior knowledge is unavailable. If the null hypothesis is not
true, then we have no idea in which direction the alternative mean will fall.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 215 7/15/10 7:33:12 PM
216 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
Example 7.20 Cardiovascular Disease Suppose we want to compare fasting serum-cholesterol
levels among recent Asian immigrants to the United States with typical levels found
in the general U.S. population. Suppose we assume cholesterol levels in women ages
21−40 in the United States are approximately normally distributed with mean 190 mg/dL.
It is unknown whether cholesterol levels among recent Asian immigrants are higher
or lower than those in the general U.S. population. Let’s assume that levels among
recent female Asian immigrants are normally distributed with unknown mean
µ. Hence we wish to test the null hypothesis H : µ 0 µ = 0 = 190 vs. the alternative
hypothesis H : µ 1 µ ≠ 0 . Blood tests are performed on 100 female Asian immigrants
ages 21−40, and the mean level (x) is 181.52 mg/dL with standard deviation = 40 mg/dL.
What can we conclude on the basis of this evidence?
The type of alternative given in Example 7.20 is known as a two-sided alternative
because the alternative mean can be either less than or greater than the null mean.
Definition 7.15 A two-tailed test is a test in which the values of the parameter being studied (in this
case µ) under the alternative hypothesis are allowed to be either greater than or less
than the values of the parameter under the null hypothesis (µ ).
0
The best test here depends on the sample mean x or, equivalently, on the test
statistic t, as it did in the one-sided situation developed in Section 7.3. We showed in
Equation 7.2 that to test the hypothesis H : µ 0 µ = 0 versus H : µ 1 µ < 0 , the best test was
of the form: Reject H if t < t n−1,α and accept H if t ≥ t n−1,α . This test is clearly only
0
0
appropriate for alternatives on one side of the null mean, namely µ < µ . We also
0
showed in Equation 7.6 that to test the hypothesis
H : µ 0 µ = 0 vs. H : µ 1 µ > 0
the best test was correspondingly of the following form: Reject H if t > t n − −11 , α and
0
accept H if t ≤ t .
0 n − −11 , α
Equation 7.7 A reasonable decision rule to test for alternatives on either side of the null mean
is to reject H if t is either too small or too large. Another way of stating the rule is
0
that H will be rejected if t is either < c or > c for some constants c , c and H
2
2
1
1
0
0
will be accepted if c ≤ t ≤ c .
1 2
The question remains: What are appropriate values for c and c ? These values
2
1
are again determined by the type I error (α). The constants c , c should be chosen
2
1
such that
(
Equation 7.8 Pr reject H 0 H 0 true) = Pr t < ( c 1 or t > c H 0 true) )
2
= Pr t ( < c H 0 true) + ( t > cH 0 true) = α
Pr
1 1
2
Half of the type I error is assigned arbitrarily to each of the probabilities on the
left side of the second line of Equation 7.8. Thus, we wish to find c , c so that
1 2
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 216 7/15/10 7:33:14 PM
7.4 ■ One-Sample Test for the Mean of a Normal Distribution: Two-Sided Alternatives 217
Equation 7.9 Pr t < ( c H 0 ) =true Pr t > ( c H true ) = α 2
2
0
1
We know t follows a t distribution under H . Because t and t are the
n−1 0 n−1,α 2 n − −11 , α 2
lower and upper 100% × α/2 percentiles of a t distribution, it follows that
n−1
Pr t < ( t n − 2 ) = Pr t ( > t n 11 − α 2 ) = α 2
,
−1,α
Therefore,
c = 1 t n 1, α 2 = t − n 1 1 − 2 and c = 2 t n 1 1−α 2
−α
,
,
−
−
This test procedure can be summarized as follows:
Equation 7.10 One-Sample t Test for the Mean of a Normal Distribution with Unknown Variance
(Two-Sided Alternative)
To test the hypothesis H : µ = µ vs. H : µ ≠ µ , with a significance level of α, the
0 0 1 0
best test is based on t = ( x − µ 0 ) s ( n).
If t > t n − −11 , α 2
then H is rejected.
0
If t ≤ t n − −11 , α 2
then H is accepted.
0
The acceptance and rejection regions for this test are shown in Figure 7.3.
Figure 7.3 One-sample t test for the mean of a normal distribution (two-sided alternative)
t n–1 distribution = distribution of t under H 0
Frequency Rejection region Rejection region
Acceptance region
0
t n–1, �/2 0 t n–1, 1–�/2
Value
Example 7.21 Cardiovascular Disease Test the hypothesis that the mean cholesterol level of re-
cent female Asian immigrants is different from the mean in the general U.S. popula-
tion, using the data in Example 7.20.
Solution We compute the test statistic
x − µ
t = 0
s n
181 52 −190
.
=
40 100
.
−848
= =−212
.
4
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 217 7/15/10 7:33:17 PM
218 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
For a two-sided test with α = .05, the critical values are c = 1 t 99 025 , c = 2 t 99 975 .
,.
,.
.
From Table 5 in the Appendix, because t 99 975 < t 60 975 = 2 000, it follows that c <
,.
,.
2
2.000. Also, because c = −c it follows that c > −2.000. Because t = −2.12 < −2.000 < c ,
1
2
1
1
it follows that we can reject H at the 5% level of significance. We conclude that the
0
mean cholesterol level of recent Asian immigrants is significantly different from that
of the general U.S. population.
Alternatively, we might want to compute a p-value as we did in the one-sided
case. The p-value is computed in two different ways, depending on whether t is less
than or greater than 0.
Equation 7.11 p-Value for the One-Sample t Test for the Mean of a Normal Distribution
(Two-Sided Alternative)
x −µ
Let t = 0
s n
2 × t ( ≤ Pr t), if t ≤ 0
p = n−1
f
× 2 − 1 Pr t ( n−1 ≤ t), if t > 0
Thus, in words, if t ≤ 0, then p = 2 times the area under a t distribution to the
n−l
left of t; if t > 0, then p = 2 times the area under a t distribution to the right of t.
n−l
One way to interpret the p-value is as follows.
Equation 7.12 The p-value is the probability under the null hypothesis of obtaining a test sta-
tistic as extreme as or more extreme than the observed test statistic, where, be-
cause a two-sided alternative hypothesis is being used, extremeness is measured
by the absolute value of the test statistic.
Hence if t > 0, the p-value is the area to the right of t plus the area to the left of −t
under a t distribution.
n−l
However, this area simply amounts to twice the right-hand tail area because the
t distribution is symmetric around 0. A similar interpretation holds if t < 0.
These areas are illustrated in Figure 7.4.
Example 7.22 Cardiovascular Disease Compute the p-value for the hypothesis test in Example 7.20.
Solution Because t = −2.12, the p-value for the test is twice the left-hand tail area, or
p = × Pr t ( 99 <−2 12 ) =2 ( 212992,) = TDIST . 037
,
.
.
Hence, the results are statistically significant with a p-value of .037. Note that if the
third argument of TDIST is 2, then a two-sided p-value is obtained.
Finally, if n is large (say, >200), then the percentiles of the t distribution
(t > t n − −11 , α 2) used in determining the critical values in Equation 7.10 can be
approximated by the corresponding percentiles of an N(0,1) distribution (z 1−α/2 ).
Similarly, in computing p-values in Equation 7.11, if n > 200, then Pr(t n−1 ≤ ) t can be
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 218 7/15/10 7:33:19 PM
7.4 ■ One-Sample Test for the Mean of a Normal Distribution: Two-Sided Alternatives 219
Figure 7.4 Illustration of the p-value for a one-sample t test for the mean of a normal
distribution (two-sided alternative)
0.4 If t ≤ 0, p-value = 0.4 If t > 0, p-value =
twice the left-hand twice the right-hand
tail area t n–1 distribution t n–1 distribution tail area
0.3 0.3
Frequency 0.2 Frequency 0.2
0.1 0.1 p/2
p/2
0.0 0.0
t 0 0 t
Value Value
approximated by Pr[N(0,1) < t] = Φ(t). We used similar approximations in our work
on CIs for the mean of a normal distribution with unknown variance in Section 6.5
on page 172 in Chapter 6.
When is a one-sided test more appropriate than a two-sided test? Generally, the
sample mean falls in the expected direction from µ and it is easier to reject H using a
0
0
one-sided test than using a two-sided test. However, this is not necessarily always the
case. Suppose we guess from a previous review of the literature that the cholesterol
level of Asian immigrants is likely to be lower than that of the general U.S. popula-
tion because of better dietary habits. In this case, we would use a one-sided test of
the form H : µ 0 190 vs. H : µ = 1 < 190. From Equation 7.3, the one-sided p-value =
2
Pr(t 99 <− . 212 = ) ( .12 ,99 ,) = TDIST 1 .018 = 1 (two-sided p-value). Alternatively, sup-
2
pose we guess from a previous literature review that the cholesterol level of Asian
immigrants is likely to be higher than that of the general U.S. population because of
more stressful living conditions. In this case, we would use a one-sided test of the form
H : µ 0 190 vs. H : µ = 1 > 190. From Equation 7.6, the p-value = Pr(t > −2.12) = .982. Thus
99
we would accept H if we use a one-sided test and the sample mean is on the opposite
0
side of the null mean from the alternative hypothesis. Generally, a two-sided test is
always appropriate because there can be no question about the conclusions. Also, as
just illustrated, a two-sided test can be more conservative because you need not guess
the appropriate side of the null hypothesis for the alternative hypothesis. However, in
certain situations only alternatives on one side of the null mean are of interest or are
possible, and in this case a one-sided test is better because it has more power (that is,
it is easier to reject H based on a finite sample if H is actually true) than its two-sided
1
0
counterpart. In all instances, it is important to decide whether to use a one-sided or
a two-sided test before data analysis (or preferably before data collection) begins so as
not to bias conclusions based on results of hypothesis testing. In particular, do not
change from a two-sided to a one-sided test after looking at the data.
Example 7.23 Hypertension Suppose we are testing the efficacy of a drug to reduce blood pres-
sure. Assume the change in blood pressure (baseline blood pressure minus follow-up
blood pressure) is normally distributed with mean µ and variance σ . An appropriate
2
hypothesis test might be H : µ 0 0 vs. H : µ = 1 > 0 because the drug is of interest only if
it reduces blood pressure, not if it raises blood pressure.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 219 7/15/10 7:33:20 PM
220 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
In the rest of this text, we focus primarily on two-sided tests because they are
much more widely used in the literature.
In this section and in Section 7.3, we have presented the one-sample t test,
which is used for testing hypotheses concerning the mean of a normal distribution
when the variance is unknown. This test is featured in the flowchart in Figure 7.18
(p. 258) where we display techniques for determining appropriate methods of statis-
tical inference. Beginning at the “Start” box, we arrive at the one-sample t test box
by answering yes to each of the following four questions: (1) one variable of inter-
est? (2) one-sample problem? (3) underlying distribution normal or can central-limit
theorem be assumed to hold? and (4) inference concerning µ? and no to question
(5) σ known?
One-Sample z Test
In Equations 7.10 and 7.11, the critical values and p-values for the one-sample t test
have been specified in terms of percentiles of the t distribution, assuming the un-
derlying variance is unknown. In some applications, the variance may be assumed
known from prior studies. In this case, the test statistic t can be replaced by the test
statistic z = x − ( µ 0 ) ( σ n). Also, the critical values based on the t distribution can
be replaced by the corresponding critical values of a standard normal distribution.
This leads to the following test procedure:
Equation 7.13 One-Sample z Test for the Mean of a Normal Distribution with Known Variance
(Two-Sided Alternative)
To test the hypothesis H : µ = µ vs. H : µ ≠ µ with a significance level of α,
0
1
0
0
where the underlying standard deviation σ is known, the best test is based on
z = x − ( µ 0 ) ( σ n)
If z < z α or z > z 1 − α 2 2
then H is rejected.
0
z z
If z α ≤≤ 1− α 2 2
then H is accepted.
0
To compute a two-sided p-value, we have
p = z if z ≤ 2Φ() 0
− [
z ]
= 21 Φ() if z > 0
Example 7.24 Cardiovascular Disease Consider the cholesterol data in Example 7.21. Assume
that the standard deviation is known to be 40 and the sample size is 200 instead of
100. Assess the significance of the results.
Solution The test statistic is
181 52 −190
.
z =
40 200
−848
.
= =−300
.
.
2 828
We first use the critical-value method with α = 0.05. Based on Equation 7.13, the
critical values are −1.96 and 1.96. Because z = −3.00 < −1.96, we can reject H at a 5%
0
level of significance. The two-sided p-value is given by 2 × Φ(−3.00) = .003.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 220 7/15/10 7:33:23 PM
7.5 ■ The Power of a Test 221
Similarly, we can consider the one-sample z test for a one-sided alternative as
follows.
Equation 7.14 One-Sample z Test for the Mean of a Normal Distribution with Known Variance
(One-Sided Alternative) ( µ < µ )
0
1
To test the hypothesis H : µ = µ vs. H : µ < µ with a significance level of α,
0
0
0
1
where the underlying standard deviation σ is known, the best test is based on
z = x − ( µ 0 ) ( σ n)
If z < z , then H is rejected; if z ≥ z , then H is accepted. The p-value is given by
α
α
0
0
p = Φ(z).
Equation 7.15 One-Sample z Test for the Mean of a Normal Distribution with Known Variance
(One-Sided Alternative) ( µ > µ )
1
0
To test the hypothesis H : µ = µ vs. H : µ > µ with a significance level of α,
0
0
1
0
where the underlying standard deviation σ is known, the best test is based on
z = x − ( µ 0 ) ( σ n)
If z > z 1−α , then H is rejected; if z ≤ z 1−α , then H is accepted. The p-value is given
0
0
by p = 1 − Φ(z).
In this section we presented the one-sample z test, which is used for testing hypoth-
eses concerning the mean of a normal distribution when the variance is known. Begin-
ning at the “Start” box of the flowchart (Figure 7.18, p. 258), we arrive at the one-sample
z test box by answering yes to each of the following five questions: (1) one variable of
interest? (2) one-sample problem? (3) underlying distribution normal or can central-
limit theorem be assumed to hold? (4) inference concerning µ? and (5) σ known?
7.5 The Power of a Test
The calculation of power is used to plan a study, usually before any data have been
obtained, except possibly from a small preliminary study called a pilot study. Also,
we usually make a projection concerning the standard deviation without actually
having any data to estimate it. Therefore, we assume the standard deviation is
known and base power calculations on the one-sample z test as given in Equations
7.13, 7.14, and 7.15.
One-Sided Alternatives
Example 7.25 Ophthalmology A new drug is proposed to prevent the development of glaucoma
in people with high intraocular pressure (IOP). A pilot study is conducted with the
drug among 10 patients. After 1 month of using the drug, their mean IOP decreases by
5 mm Hg with a standard deviation of 10 mm Hg. The investigators propose to study
100 participants in the main study. Is this a sufficient sample size for the study?
Solution To determine whether 100 participants are enough, we need to do a power calculation.
The power of the study is the probability that we will be able to declare a significant
difference with a sample of size 100 if the true mean decline in IOP is 5 mm Hg
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 221 7/15/10 7:33:24 PM
222 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
with a standard deviation of 10 mm Hg. Usually we want a power of at least 80%
to perform a study. In this section we examine formulas for computing power and
addressing the question just asked.
In Section 7.4 (Equation 7.14) the appropriate hypothesis test was derived to test
H : µ 0 µ = 0 vs. H : µ 1 µ < 0
where the underlying distribution was assumed to be normal and the population
variance was assumed to be known. The best test was based on the test statistic z. In
particular, from Equation 7.14 for a type I error of α, H is rejected if z < z and H
α
0
0
is accepted if z ≥ z . The form of the best test does not depend on the alternative mean
α
chosen (µ ) as long as the alternative mean is less than the null mean µ .
0
1
Hence, in Example 7.2, where µ = 120 oz, if we were interested in an alternative
0
mean of µ = 115 oz rather than µ = 110 oz, then the same test procedure would
1
1
still be used. However, what differs for the two alternative means is the power of the
test = 1 − Pr(type II error). Recall from Definition 7.6 that
(
Power = Pr reject HH 0 false) = Pr Z < ( z µ α = µ 1)
0
X − µ
= Pr σ n 0 < α = z µ µ 1
= < ( X µ 0 +Pr α n µ = z σ µ 1 )
,
We know that under HX ~ N(µσ 2 n). Hence, on standardization of limits,
,
1
1
Power = Φ 0 ( α σ + − z µ n 1) σ ( n ) = Φ µ z α + ( µ 0 σ µ − 1 ) n
This power is depicted graphically in Figure 7.5.
Note that the area to the left of µ + z α σ n under the H distribution is the
0
0
significance level α, whereas the area to the left of µ + z α σ n under the H distribu-
0
1
tion is the power = 1 − β.
Figure 7.5 Illustration of power for the one-sample test for the mean of a normal distribution
with known variance (µ < µ )
1 0
Distribution of X Distribution of X
under H 1 under H 0
2
2
N(� , � /n) N(� , � /n)
0
1
Power =
1 – �
Frequency Acceptance
region
Rejection
region
0 � 1 � 0 Value
Area = �
� + z �/ n
�
0
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 222 7/15/10 7:33:26 PM
7.5 ■ The Power of a Test 223
Why should power concern us? The power of a test tells us how likely it is that a
statistically significant difference will be detected based on a finite sample size n, if
the alternative hypothesis is true—that is, if the true mean µ differs from the mean
under the null hypothesis (µ ). If the power is too low, then there is little chance of
0
finding a significant difference and nonsignificant results are likely even if real dif-
ferences exist between the true mean µ of the group being studied and the null mean
µ . An inadequate sample size is usually the cause of low power to detect a scientifi-
0
cally meaningful difference.
Example 7.26 Obstetrics Compute the power of the test for the birthweight data in Example 7.2
(p. 204) with an alternative mean of 115 oz and α = .05, assuming the true standard
deviation = 24 oz.
Solution We have µ = 120 oz, µ = 115 oz, α = .05, σ = 24, n = 100. Thus
0 1
Power = z + Φ (120 −115 ) 100 = 24 [ − Φ . 1 645 + (510) 24] ) = ( .438 = Φ 0 ) .669
.05
Therefore, there is about a 67% chance of detecting a significant difference using a
5% significance level with this sample size.
We have focused on the situation where µ < µ . We are also interested in power
1
0
when testing the hypothesis
H : µ 0 µ = 0 vs. H : µ 1 µ = 1 > µ 0
as was the case with the cholesterol data in Example 7.1. The best test for this situ-
ation was presented in Equation 7.15, where H is rejected if z > z 1−α and accepted if
0
z ≤ z 1 −α . Notice that if z > z 1−α , then
Equation 7.16 x − µ 0 > z
σ n 1 −α
If we multiply both sides of Equation 7.16 by σ n, and add µ , we can re-express
0
the rejection criteria in terms of x, as follows:
Equation 7.17 x > 0 + µ z 1 −α σ n
Similarly, the acceptance criteria, z ≤ z 1 −α , can also be expressed as
Equation 7.18 x ≤ 0 + µ z 1 −α σ n
Hence the power of the test is given by
Power = > ( X µ 0 + Pr 1 − σ n µ = z µ 1 ) =− Pr X <( 0 + µ z 1 −α α σ n µ = µ 1)
µ
1
µ σ + n − z µ µ − µ ( ) n
1 Φ
=− 0 1− σ α n 1 =− z 1 1− + 0 σ 1
1 Φ
α
Using the relationships Φ(−x) = 1 − Φ(x) and z =− z 1 − α , this expression can be
α
rewritten as
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 223 7/15/10 7:33:29 PM
224 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
Figure 7.6 Illustration of power for the one-sample test for the mean of a normal distribution
with known variance ( µ > µ )
1 0
Distribution of X Distribution of X
under H 0 under H 1
2
2
N(� , � /n) N(� , � /n)
1
0
Power = 1 – �
Frequency Acceptance Rejection
region
region
Area = �
0 � 0 � 1 Value
� + z 1–� �/ n
0
µ − ) n µ − ) n
µ
µ
Φ 1 − α +z ( 1 0 = Φ − α z + ( 1 0 if µ 1 > µ 0
σ σ
This power is displayed in Figure 7.6.
Example 7.27 Cardiovascular Disease, Pediatrics Using a 5% level of significance and a sample of
size 10, compute the power of the test for the cholesterol data in Example 7.18, with
an alternative mean of 190 mg/dL, a null mean of 175 mg/dL, and a standard deviation
(σ) of 50 mg/dL.
Solution We have µ = 175, µ = 190, α = .05, σ = 50, n = 10. Thus
0 1
Power = − Φ . + 1 645 ( − 190 175) 10 50
= − Φ( 1 1 645 15 10 50) = ( − Φ 0 696)
+
.
.
0 696) =−
=− (. 6 1 . 757 = . 243
1 Φ
Therefore, the chance of finding a significant difference in this case is only 24%.
Thus it is not surprising that a significant difference was not found in Example 7.18
because the sample size was too small.
The power formulas presented in this section can be summarized as follows:
Equation 7.19 Power for the One-Sample z Test for the Mean of a Normal Distribution with
Known Variance (One-Sided Alternative)
The power of the test for the hypothesis
H : µ = µ 0 vs. H : µ = µ 1
0
1
2
where the underlying distribution is normal and the population variance (σ ) is
assumed known is given by
Φ α µ 0 µ − 1 n σ Φ z +( − z ( 1 − α µ) = 0 − µ + 1 n σ )
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 224 7/15/10 7:33:30 PM
7.5 ■ The Power of a Test 225
Notice from Equation 7.19 that the power depends on four factors: α, |µ − µ |,
1
0
n, and σ.
Equation 7.20 Factors Affecting the Power
(1) If the significance level is made smaller (α decreases), z increases and hence
α
the power decreases.
(2) If the alternative mean is shifted farther away from the null mean (|µ − µ |
1
0
increases), then the power increases.
(3) If the standard deviation of the distribution of individual observations in-
creases (σ increases), then the power decreases.
(4) If the sample size increases (n increases), then the power increases.
Example 7.28 Cardiovascular Disease, Pediatrics Compute the power of the test for the choles-
terol data in Example 7.27 with a significance level of .01 vs. an alternative mean of
190 mg/dL.
Solution If α = .01, then the power is given by
+
Φ .01 + (190 175 ) 10 50 = Φ z − ( . 2 326 15 10 50)
−
=− Φ( 1
1
1
= = − Φ .377 ) 1 ( .377 ) =− .9158 ≈ %8
which is lower than the power of 24% for α = .05, computed in Example 7.27. What
does this mean? It means that if the α level is lowered from .05 to .01, the β error will
be higher or, equivalently, the power, which decreases from .24 to .08, will be lower.
Example 7.29 Obstetrics Compute the power of the test for the birthweight data in Example 7.26
with µ = 110 oz rather than 115 oz.
1
Solution If µ = 110 oz, then the power is given by
1
Φ . + [ 120 −110 10 24 ] 1 645 ( = Φ − ( 2 522) = . 994 ≈ 99%
.
)
which is higher than the power of 67%, as computed in Example 7.26 for µ = 115 oz.
1
What does this mean? It means that if the alternative mean changes from 115 oz
to 110 oz, then the chance of finding a significant difference increases from 67%
to 99%.
Example 7.30 Cardiology Compute the power of the test for the infarct-size data in Example 7.13
with σ = 10 and σ = 15 using an alternative mean of 20 (ck − g − EQ/m ) and α = .05.
2
Solution In Example 7.13, µ = 25 and n = 8. Thus, if σ = 10, then
0
Power = − Φ . + 1 645 ( − 25 20) 810 = ( − Φ 0 23)
.
.
1
=− Φ(. =− 591 = 409 ≈ 41%
0 23)
.
1
whereas if σ = 15, then
Power = − Φ . + 1 645 ( − 25 20) 815 = ( − Φ 0 702)
.
= − 759 = 241 24%
1 .
≈
.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 225 7/15/10 7:33:31 PM
226 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
What does this mean? It means the chance of finding a significant difference de-
clines from 41% to 24% if σ increases from 10 to 15.
Example 7.31 Obstetrics Assuming a sample size of 10 rather than 100, compute the power for the
birthweight data in Example 7.26 with an alternative mean of 115 oz and α = .05.
Solution We have µ = 120 oz, µ =115 oz, α = .05, σ = 24, and n = 10. Thus
0 1
Power = z + Φ (120 −115 ) 10 = 24 ( ( − Φ . 1 645 + 51024)
.05
= − Φ( 0 .986 ) =− .838 = .162
1
What does this mean? It means there is only a 16% chance of finding a significant
difference with a sample size of 10, whereas there was a 67% chance with a sample
size of 100 (see Example 7.26). These results imply that if 10 infants were sampled,
we would have virtually no chance of finding a significant difference and would
almost surely report a false-negative result.
For given levels of α (.05), σ (24 oz), n (100), and µ (120 oz), a power curve can
0
be drawn for the power of the test for various alternatives µ . Such a power curve is
1
shown in Figure 7.7 for the birthweight data in Example 7.2. The power ranges from
99% for µ = 110 oz to about 20% when µ = 118 oz.
Two-Sided Alternatives
The power formula in Equation 7.19 is appropriate for a one-sided significance test at
level α for the mean of a normal distribution with known variance. Using a two-sided
test with hypotheses H : µ 0 µ = 0 vs. H : µ 1 µ ≠ 0 , the following power formula is used.
Equation 7.21 Power for the One-Sample z Test for the Mean of a Normal Distribution (Two-Sided
Alternative)
The power of the two-sided test H : µ = µ vs. H : µ ≠ µ for the specific alternative
1
0
0
0
µ = µ , where the underlying distribution is normal and the population variance
1
(σ ) is assumed known, is given exactly by
2
µ − ) n µ − ) n
µ
µ
n
Φ 1 − 2 +z ( 0 1 + Φ − − z 1 α/ −α / 2 + ( 1 0
σ σ
Figure 7.7 Power curve for the birthweight data in Example 7.2
1.0
.9
Power = Pr(reject H 0 � = � 1 ) .7 given in Equation 7.14
.8
Power curve for
.6
the one-sided
.5
significance test
.4
for the birthweight
.3
data in Example 7.2
.2
.1
106 108 110 112 114 116 118 120
Birthweight
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 226 7/15/10 7:33:33 PM
7.5 ■ The Power of a Test 227
and approximately by
µ − µ n
Φ−z 1 −α/ 2 + 0 1
σ
To see this, note that from Equation 7.13 we reject H if
0
x − µ x − µ
z = 0 < z or z = 0 > z
α 2
σ n α / σ n 1 − / 2
If we multiply each inequality by σ n and add µ , we can re-express the rejection
0
criteria in terms of x, as follows:
Equation 7.22 x < 0 + µ z α / σ n or x > 0 + µ z 1 − / 2 σ n
α 2
The power of the test vs. the specific alternative µ = µ is given by
1
Equation 7.23 Power = Pr < ( X µ 0 + α / σ n = z 1 ) + µ > PrX ( µ 0 + µ 1 − / 2 σ n µ = z µ µ 1)
α 2
µ σ + n − z µ µ σ + n − z µ µ
α 2
= Φ 0 α / 1 +− 0 1− / 2 1
1 Φ
σ n σ n
( µ µ − ) n ( µ − µ ) n
= z α / + Φ 0 1 +− z 1− / 2 + + 0 1
1 Φ
α 2
σ σ
Using the relationship 1 − Φ(x) = Φ(−x), the last two terms can be combined as follows:
µ − µ ( ) n µ − µ ( ) ) n
Equation 7.24 Power = z α + Φ 0 1 + − Φ z 1 − /2 + 1 0
α/2
σ σ
Finally, recalling the relationship z = −z we have
α/2 1− α/2,
µ − µ ( ) n µ ( − − µ ) n
− Φ
Equation 7.25 Power = z 1 − 2 + 0 1 + − Φ z 1 α/ −α / 2 + 1 0
σ σ
Equation 7.25 is more tedious to use than is usually necessary. Specifically, if µ < µ ,
1
0
then the second term is usually negligible relative to the first term. However, if
µ > µ , then the first term is usually negligible relative to the second term. Therefore,
1
0
the approximate power formula in Equation 7.21 is usually used for a two-sided test
because it represents the first term in Equation 7.25 if µ > µ and the second term
0
1
in Equation 7.25 if µ > µ . The power is displayed in Figure 7.8. Note that the ap-
1
0
proximate power formula for the two-sided test in Equation 7.21 is the same as the
formula for the one-sided test in Equation 7.19, with α replaced by α/2.
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 227 7/15/10 7:33:36 PM
228 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
Figure 7.8 Illustration of power for a two-sided test for the mean of a normal distribution with
known variance
Power = 1 – � Power = 1 – �
Frequency Frequency Area = �/2
Area = �/2
0 � 1 � 0 Value 0 � 0 � 1 Value
Area = �/2 Area = �/2
� + z �/2 �/ n � + z 1–�/2 �/ n � + z �/2 �/ n � + z 1–�/2 �/ n
0
0
0
0
(a) � < � 0 (b) � > � 0
1
1
Example 7.32 Cardiology A new drug in the class of calcium-channel blockers is to be tested for
the treatment of patients with unstable angina, a severe form of angina. The effect
this drug will have on heart rate is unknown. Suppose 20 patients are to be studied
and the change in heart rate after 48 hours is known to have a standard deviation of
10 beats per minute. What power would such a study have of detecting a significant
difference in heart rate over 48 hours if it is hypothesized that the true mean change
in heart rate from baseline to 48 hours could be either a mean increase or a decrease
of 5 beats per minute?
Solution Use Equation 7.21 with σ = 10, | µ − µ | = 5, α = .05, n = 20. We have
0 1
Power = Φ − ( z 1052 + 52010 ) = − Φ 1 96 + ( . = 2 236) Φ 0(..276 = ) .609≈ .61
.
−.
/
Thus the study would have a 61% chance of detecting a significant difference.
7.6 Sample-Size Determination
One-Sided Alternatives
For planning purposes, we frequently need some idea of an appropriate sample size
for investigation before a study actually begins. One possible result of making these
calculations is finding out that the appropriate sample size is far beyond the finan-
cial means of the investigator(s) and thus abandoning the proposed investigation.
Obviously, reaching this conclusion before a study starts is much better than after it
is in progress.
What does “an appropriate sample size for investigation” actually mean?
Consider the birthweight data in Example 7.2. We are testing the null hypothesis
H : µ = µ vs. the alternative hypothesis H : µ = µ , assuming that the distribution of
0
1
0
1
birthweights is normal in both cases and that the standard deviation σ is known.
We are presumably going to conduct a test with significance level α and have some
idea of what the magnitude of the alternative mean µ is likely to be. If the test pro-
1
cedure in Equation 7.14 is used, then H would be rejected if z < z or equivalently
0 α
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 228 7/15/10 7:33:37 PM
7.6 ■ Sample-Size Determination 229
Figure 7.9 Requirements for appropriate sample size
H 1 H 0
distribution distribution
2
2
N(� , � /n) N(� , � /n)
0
1
Area = 1 – �
Frequency Acceptance
region
Rejection
region
0 � 1 � 0 Value
Area = �
� + z �/ n
�
0
if x < 0 + µ z σ α n and accepted if z ≥ z or equivalently if x ≥ 0 + µ z σ α n. Suppose
α
the alternative hypothesis is actually true. The investigator should have some idea
as to what he or she would like the probability of rejecting H to be in this instance.
0
This probability is, of course, nothing other than the power, or 1 − β. Typical values
for the desired power are 80%, 90%, …, and so forth. The problem of determining
sample size can be summarized as follows: Given that a one-sided significance test
will be conducted at level α and that the true alternative mean is expected to be µ ,
1
what sample size is needed to be able to detect a significant difference with prob-
ability 1 − β? The situation is displayed in Figure 7.9.
In Figure 7.9, the underlying sampling distribution of X is shown under the
null and alternative hypotheses, respectively, and the critical value µ + z α σ n has
0
been identified. H will be rejected if x < 0 + µ z σ α n. Hence the area to the left of
0
µ + z α σ n under the rightmost curve is α. However, we also want the area to the
0
left of µ + z α σ n under the leftmost curve, which represents the power, to be 1 − β.
0
These requirements will be met if n is made sufficiently large, because the variance of
each curve (σ /n) will decrease as n increases and thus the curves will separate. From
2
the power formula in Equation 7.19,
Power = Φ α + (z µ 0 − µ 1 n σ ) =− β 1
We want to solve for n in terms of α, β, |µ − µ |, and σ. To accomplish this, recall that
0 1
Φ(z ) = 1 − β and, therefore,
1−β
z + α µ 0 µ − 1 n σ = z 1 − β
Subtract z from both sides of the equation and multiply by σ/|µ − µ | to obtain
α 0 1
− z + ( z ) σ
n = α 1 − β
µ 0 − µ 1
Replace −z by z and square both sides of the equation to obtain
α 1−α
2
+ z ) σ 2
z ( 1 − 1 β −α
n = 2
µ
µ
1
( 0 − )
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 229 7/15/10 7:33:39 PM
230 C H A P T E R 7 ■ Hypothesis Testing: One-Sample Inference
Similarly, if we were to test the hypothesis
H : µ 0 µ = 0 vs . H : µ 1 µ = 1 > µ 0
as was the case with the cholesterol data in Example 7.1, using a significance level
of α and a power of 1 − β, then, from Equation 7.19, the same sample-size formula
would hold. This procedure can be summarized as follows.
Equation 7.26 Sample-Size Estimation When Testing for the Mean of a Normal Distribution
(One-Sided Alternative)
Suppose we wish to test
H : µ = µ 0 vs . H : µ = µ 1
1
0
where the data are normally distributed with mean µ and known variance σ .
2
The sample size needed to conduct a one-sided test with significance level α
and probability of detecting a significant difference = 1 − β is
σ 2 + z ) 2
n = z ( 1 − 1 −β α
µ
(µ 0 − ) 2
1
Example 7.33 Obstetrics Consider the birthweight data in Example 7.2. Suppose that µ = 120 oz,
0
µ = 115 oz, σ = 24, α = .05, 1 − β = .80, and we use a one-sided test. Compute the
1
appropriate sample size needed to conduct the test.
2
24 ( z + z ) 2
Solution n = 8 . . 95 = . ( . + 23 04 0841 645) 2 = 23 04((.175 ) 142 .3
=
.
.
6
25
The sample size is always rounded up so we can be sure to achieve at least the re-
quired level of power (in this case, 80%). Thus a sample size of 143 is needed to have
an 80% chance of detecting a significant difference at the 5% level if the alternative
mean is 115 oz and a one-sided test is used.
Notice that the sample size is very sensitive to the alternative mean chosen. We
2
see from Equation 7.26 that the sample size is inversely proportional to (µ − µ ) .
0 1
Thus, if the absolute value of the distance between the null and alternative means is
halved, then the sample size needed is four times as large. Similarly, if the distance
between the null and alternative means is doubled, then the sample size needed is
1/4 as large.
Example 7.34 Obstetrics Compute the sample size for the birthweight data in Example 7.2 if
µ = 110 oz rather than 115 oz.
1
Solution The required sample size would be 1/4 as large because (µ − µ ) = 100 rather than
2
0 1
25. Thus n = 35.6, or 36 people, would be needed.
Example 7.35 Cardiovascular Disease, Pediatrics Consider the cholesterol data in Example 7.1.
Suppose the null mean is 175 mg/dL, the alternative mean is 190 mg/dL, the stan-
dard deviation is 50, and we wish to conduct a one-sided significance test at the 5%
level with a power of 90%. How large should the sample size be?
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
CHE-ROSNER-10-0205-007.indd 230 7/15/10 7:33:41 PM