
for se2/n does that. When x̄ = 0, this coefficient is 1, because in that special case
b̂ is the value of ŷ at x̄. But when x̄ is far from 0, Σxi2 is much larger than SSx,
and so then the uncertainty in b̂ is much larger than that of ȳ.

To check on the plausibility of a contemplated value of b, we subtract it from
b̂ and divide by sb̂, the estimated standard deviation of b̂:

t STATISTIC FOR CONTEMPLATED VALUE OF b:

t = (b̂ – b)/sb̂.

Suppose that we want to check the plausibility of b = 10. Our test statistic will be

t = (b̂ – 10)/sb̂,

and we refer the value of the statistic to the t distribution with n – 2 degrees of
freedom, as we did in Section 12-5 for the slope m.

Similarly, we construct confidence limits for b using the t distribution. For
example, 90 percent limits are given by

b̂ ± (t multiplier) × sb̂,

where the multiplier is the two-sided 90 percent value from the t distribution
with n – 2 degrees of freedom. As before, when n is large (say n > 30), we can use
as an approximation values from the standard normal probability table in place
of the values from the t distribution with n – 2 degrees of freedom.
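These calculations are easy to carry out by computer. Here is a minimal Python sketch (ours, not from the book; the function name and the use of numpy and scipy are our own choices) that computes b̂, sb̂, the t statistic for a contemplated intercept, and two-sided confidence limits from data arrays x and y:

import numpy as np
from scipy import stats

def intercept_check(x, y, b_contemplated=0.0, conf=0.90):
    # Least-squares slope and intercept for the model y = b + mx.
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    SSx = np.sum((x - x.mean()) ** 2)
    m_hat = np.sum((x - x.mean()) * (y - y.mean())) / SSx
    b_hat = y.mean() - m_hat * x.mean()
    # Residual variance se2 with n - 2 degrees of freedom.
    se2 = np.sum((y - b_hat - m_hat * x) ** 2) / (n - 2)
    # Estimated standard deviation of b-hat: se2 * sum(xi^2) / (n * SSx).
    s_b = np.sqrt(se2 * np.sum(x ** 2) / (n * SSx))
    t_stat = (b_hat - b_contemplated) / s_b
    mult = stats.t.ppf(0.5 + conf / 2, df=n - 2)   # two-sided multiplier
    return b_hat, t_stat, (b_hat - mult * s_b, b_hat + mult * s_b)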

EXAMPLE 11 For the data in Example 3, check on the plausibility of the
contemplated value b = 0.
SOLUTION. From Example 3 we recall that

Thus the estimated variance of the intercept b̂ of the fitted line (whose slope is m̂ = –0.5) is

To check on the plausibility of b = 0, we compute

Because this value of t is near zero, which is the middle of the t distribution
with 2 degrees of freedom, we conclude that the contemplated value b = 0 is
plausible.

EXAMPLE 12 Temperatures below the earth’s surface. In Example 5 in Chapter
3 we looked at data on temperatures taken at various depths in an artesian well at
Grenoble, France, using a point 28 meters below the earth’s surface as the origin
for both temperature and depth. We add the point x = 0 and y = 0 to the four
points studied earlier, because we know that the temperature difference at the
origin is, in fact, zero:

From these data we compute the following summary statistics:

and

The coefficients for the least-squares line for predicting temperature from
depth are

and

so that the least-squares prediction line is

In our earlier look at these data we did not include the point (0, 0), but we
did make sure that our line passed through the origin. By fitting the line without
forcing it through the origin, we can check whether b = 0 is plausible. The
minimizing value of the sum of squared deviations is

and, using formula (12), we get

Taking the square root, we have sb̂ = 0.574. To get 95 percent confidence limits
for the true y intercept b, we compute

The value b = 0 falls near the middle of the interval formed by these limits. Thus
zero is indeed a plausible value for the y intercept.

To give a quantitative measure to the plausibility, we can compute

Interpolation in the t table with 3 degrees of freedom suggests that the chance of
a deviation from zero larger than 0.62 is about 0.58 when the true value of b is
zero. Thus, again, zero is a reasonable value for b.

Whether one prefers the general least-squares line just given or the line
through the origin depends on how firmly one believes in the linearity. If that has
no solid foundation, the general line is probably the better choice over the region
of measurement.

PROBLEMS FOR SECTION 12-6

In Example 2B we gave the equation

for predicting y, the square root of the distance required to stop a car traveling at a speed of x miles per
hour. Use the following summary statistics of the data to solve Problems 1 through 4:

1. Verify the values of the least-squares coefficients. Show your work.
2. Continuation. Find a 99 percent confidence interval for b using the two-sided multiplier from Table A-2

in Appendix III in the back of the book for the t distribution.
3. Continuation. How different would your interval be if you used the approximate multiplier from the

standard normal table?
4. In Example 2B it was argued that b = 0 should be the value for the y intercept if the true regression of y

on x really is linear. How plausible is this value of b?

* 12-7 PREDICTING NEW y VALUES

In addition to constructing tests and confidence limits for slope and intercept, we
often want to assess the variability of a new point or of an estimate of a value of
y based on a value of x. Even though the variabilities of the y’s around the line
may be the same for each x, the departure of a new value from the estimate
based on the fitted line does depend on the distance the new x is from the center
of the previous measurements, x̄. We explain here how to take this into account.

Making predictions from an estimated regression line is straightforward. We
take a new x value, say x0, and our predicted y value is ŷ0 = b̂ + m̂x0. When
we actually do get to observe the y value corresponding to x0, say y0, it will not
be equal to ŷ0, because we had to estimate the true regression parameters b and
m by b̂ and m̂. Also, y0 equals the true regression of y on x plus an error term,
which we write in the reverse of the usual order:

y0 = e0 + (b + mx0).

The difference between the observed value y0 and our prediction ŷ0 is

y0 – ŷ0 = e0 + (b – b̂) + (m – m̂)x0.

The variance of this difference is based on contributions from each of the three
parts on the right-hand side and takes into account the relationship between
b̂ and m̂. We give without proof the formula

variance of y0 – ŷ0 = σ2[1 + 1/n + (x0 – x̄)2/SSx].      (15)
To appreciate this formula we look at its three parts:

1. The first term σ2 is the contribution to variability because we are estimating
the result for a single observation.

2. The second term σ2/n is the contribution from the uncertainty of the level of
the line, which is essentially the variability of ȳ.

3. The third part is the extra uncertainty because we are not sure of the slope of
the line, and the farther from x̄ we go, the more this matters. That is reflected
in (x0 – x̄)2. When x0 = x̄, this third term vanishes.

If we substitute se2 for σ2 in equation (15), our estimate of the variance of y0
– ŷ0 is as follows:

ESTIMATED VARIANCE OF y0 – ŷ0:

estimated variance of y0 – ŷ0 = se2[1 + 1/n + (x0 – x̄)2/SSx].      (16)

The value of this estimated variance is a measure of how accurately we can
predict a new y value, y0, from a given x value, x0.

Let us note two special things about expression (16). First, the estimated
variance can never be smaller than se2(1 + 1/n), the value it takes when x0 = x̄.
When our original sample is large, we can ignore the 1/n term, and this
minimum value is approximately se2. Second, the farther x0 is from x̄, the larger
the estimated variance becomes. This allows us to be properly cautious about
predictions far beyond the range of x values in the sample used to calculate the
least-squares coefficients when the relation is actually linear. It does not allow
for the possibility that the true relation is nonlinear. Finally, when x0 = 0, the
last two terms of (16) become

se2[1/n + x̄2/SSx] = se2Σxi2/(nSSx).

This is just the estimated variance of b̂ given by formula (12).

Our general method for getting confidence limits is applicable for the
predicted value of a new observation. For 95 percent limits we take 95 percent
confidence limits for y0:

ŷ0 ± (t multiplier) × (estimated standard deviation of y0 – ŷ0),      (17)

where the multiplier comes from the t distribution with n – 2 degrees of freedom.
Because the estimated variance grows as x0 moves away from x̄, confidence limits
for an x value near x̄ are closer together than those for x values farther away.

The confidence limits in (17) give us a way to check whether a newly
observed y value, y0, is consistent with predictions based on a previously
computed least-squares regression line. For a value x0, if the 95 percent
confidence limits do not contain the new y value, y0, then we have grounds for
doubting that the new point is consistent with the least-squares line.
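A short Python sketch of formulas (16) and (17) (ours, not from the book), assuming the one-predictor quantities b̂, m̂, se2, x̄, SSx, and n have already been computed:

from scipy import stats

def predict_new_y(x0, b_hat, m_hat, se2, xbar, SSx, n, conf=0.95):
    y0_hat = b_hat + m_hat * x0                                # predicted y at x0
    var_pred = se2 * (1 + 1.0 / n + (x0 - xbar) ** 2 / SSx)    # formula (16)
    s_pred = var_pred ** 0.5
    mult = stats.t.ppf(0.5 + conf / 2, df=n - 2)               # two-sided t multiplier
    return y0_hat, (y0_hat - mult * s_pred, y0_hat + mult * s_pred)   # formula (17)

If a newly observed y0 falls outside the returned limits, we have grounds for doubting that the new point is consistent with the fitted line.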

EXAMPLE 13 For the data in Example 3, find the estimated variance of y0 – ŷ0
when x0 = 2.
SOLUTION. Recall that n = 4, x̄ = 2.5, SSx = 5, and se2 = 0.15. Then, substituting
in formula (16), we have

estimated variance of y0 – ŷ0 = 0.15[1 + 1/4 + (2 – 2.5)2/5] = 0.15(1.30) = 0.195.

(The contribution of the third term is almost negligible, because x0 = 2 is so
close to x̄ = 2.5.) Thus the estimated standard error of y0 – ŷ0, at x0 = 2, is
√0.195 ≈ 0.44.

EXAMPLE 14 Indianapolis 500 predictions. In Example 1 in Chapter 11, in our

calculations with the winning speeds from 1961 through 1970 for the
Indianapolis 500 auto race, we used the least-squares regression line to make
predictions for years beyond 1970. Is the winning speed of 158.59 miles per
hour for 1974 consistent with the least-squares prediction?
SOLUTION. The least-squares prediction for 1974 is

Useful summary statistics from previous calculations based on the data in Table
11-1 are

And here we have x0 = 1974 – 1960 = 14. Substituting these values into equation
(16) and taking the square root of both sides, we get

To get 95 percent limits for the new y value (the winning speed for 1974) we
need the value t = 2.306 (the two-sided 95 percent multiplier for 8 degrees of
freedom) from Table A-2 in Appendix III in the back of the book. Then the 95
percent confidence limits are

These limits are plotted in Fig. 12-6. The actual winning speed for 1974 was
158.59 miles per hour, and this lies within our interval, although not by much.
Thus the new point seems reasonably consistent with the estimated regression

based on data from 1961–1970.
To get a more quantitative assessment, we compute

Interpolating in the t table with 8 degrees of freedom gives a probability of a
larger departure than this of about 0.14.
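Instead of interpolating in the t table, the tail probability can be computed directly. A Python sketch (ours) with scipy; the value assigned to t_observed is only a placeholder for the critical ratio computed from the data:

from scipy import stats

t_observed = 1.64          # placeholder; use the ratio computed from the 1961-1970 data
df = 8                     # n - 2 = 10 - 2 for ten years of winning speeds
p_two_sided = 2 * stats.t.sf(abs(t_observed), df)
print(p_two_sided)         # the book reports about 0.14 for its computed ratio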

Figure 12-6 Predicting Indianapolis 500 winning speed for 1974.

We should, however, pause before trying to extrapolate with the estimated
regression line in order to make predictions far from 1961–1970. Although the
(x0 – x̄)2 term is intended to allow for uncertainty in the slope as we extrapolate,
that is not what worries us. The possibility exists that a whole new automotive
technology might be invented and that new machines with different engines and
different fuel would be allowed to race. Our formula does not allow for such
systematic changes in the process. This is a major danger in such extrapolations.

PROBLEMS FOR SECTION 12-7

Use the Indianapolis 500 data in Table 11-1 to solve Problems 1, 2, and 3. Useful summary statistics are

The least-squares regression line is

1. Based on the least-squares regression line, the retrospective prediction for the winning speed in 1950 is
118.30 miles per hour. Is this prediction consistent with the actual winning speed of 124.00 miles per
hour?

2. Continuation. Construct 95 percent confidence limits for the winning speed at the Indianapolis 500 auto
race in 1979.

3. Continuation. Look up the actual winning speed for the 1979 Indianapolis 500 in an almanac and see
whether or not it falls within your interval in Problem 2.
Problems 4 through 6 deal with the data on heights (x) and weights (y) of 15 high school sophomore

boys given in Table 11-3. Before doing any calculations, be sure to reverse the height and weight values for
subject 7. Some summary statistics for the resulting data are

4. Compute the least-squares regression coefficients for predicting weight (y) from height (x), and find se,
the estimated standard deviation of the regression error term.

5. Continuation. A new boy whose height is 65 inches is about to enter the sophomore class. What is your
prediction of his weight?

6. Continuation. Construct 95 percent confidence limits to go with your prediction in Problem 5.
The data on the distances of the planets from the sun in Table 3-2 are used to solve Problems 7, 8, and 9.

Useful summary statistics are

7. Predict the logarithm of the distance from the sun for a new 11th planet.
8. Continuation. Construct 99 percent confidence limits to go with the prediction in Problem 7.
9. Continuation. What values of d11 correspond to these limits?

Table 3-6 gives data on winning times for the 1500-meter run for men in the Olympics. Let

and

Summary statistics are

and the least-squares line is

Use this information to solve Problems 10, 11, and 12.
10. Predict the winning time for the 1500-meter run for the 1976 Olympics.
11. Continuation. Construct 95 percent confidence limits for your prediction in Problem 10.
12. Continuation. Look up the actual winning time for the 1976 Olympic 1500-meter race in an almanac

and see whether or not it falls within your interval in Problem 11.

12-8 SUMMARY OF CHAPTER 12

Equation numbers match those used earlier in the chapter.
1. The linear regression model for n pairs of (x, y) quantities (x1, y1), (x2, y2),
…, (xn, yn), is

where b and m are the unknown regression coefficients. The random errors
e1, e2, …, en are assumed to be independent and distributed according to the
normal probability distribution with mean 0 and variance σ2. This normality
assumption is rarely met in practice, but its use allows us to make
approximate tests and construct approximate confidence limits.
2. In terms of the special notation for sums of squared deviations, the least-
squares estimates for b and m are

m̂ = Sxy/SSx and b̂ = ȳ – m̂x̄,

and the minimum sum of squared residuals is

Ŝ = Σ(yi – b̂ – m̂xi)2.

3. The estimated variance of the random error terms is

se2 = Ŝ/(n – 2).

4. The proportion of the variability SSy explained by the linear regression line
is

r2 = 1 – Ŝ/SSy.

The quantity r2 is the square of the sample correlation coefficient,
r = Sxy/√(SSx SSy), which lies between –1 and +1.
5. The estimated variances of the least-squares coefficients are

se2/SSx for m̂ and se2Σxi2/(nSSx) for b̂.

6. The estimated variance for predicting a new y value at x = x0 is

se2[1 + 1/n + (x0 – x̄)2/SSx].      (16)

7. To test the plausibility of contemplated values of unknown parameters or of
new y values, we compute the critical ratio

(estimate – contemplated value)/(estimated standard deviation),

and we compare it to values from the t distribution with n – 2 degrees of
freedom. Confidence intervals for these unknown values take the form

estimate ± (t multiplier) × (estimated standard deviation),

where again the multiplier comes from the t distribution with n – 2 degrees
of freedom.
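All of the quantities in this summary can be computed together. The following Python sketch (ours; the function name is arbitrary) does so for a single predictor:

import numpy as np

def regression_summary(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    SSx = np.sum((x - x.mean()) ** 2)
    SSy = np.sum((y - y.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    m_hat = Sxy / SSx                               # least-squares slope
    b_hat = y.mean() - m_hat * x.mean()             # least-squares intercept
    S_min = np.sum((y - b_hat - m_hat * x) ** 2)    # minimum sum of squared residuals
    se2 = S_min / (n - 2)                           # estimated error variance
    r2 = 1 - S_min / SSy                            # proportion of variability explained
    var_m = se2 / SSx                               # estimated variance of the slope
    var_b = se2 * np.sum(x ** 2) / (n * SSx)        # estimated variance of the intercept
    return {"b": b_hat, "m": m_hat, "S": S_min, "se2": se2,
            "r2": r2, "var_m": var_m, "var_b": var_b}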

SUMMARY PROBLEMS FOR CHAPTER 12

Table 11-5 gives data on demand for corn feed. Summary statistics for these data
are

Use this information to solve Problems 1 through 5.
1. Compute a 95 percent confidence interval for the slope m in the regression
of demand (y) on aggregate value (x).
2. Continuation. Comment on the plausibility of the value m = 0.
3. Continuation. What proportion of variability of demand is explained by the
fitted least-squares line?
4. Continuation. An examination of the data in Table 11-5 shows a general
increase in demand over time that may be related to factors other than
aggregate value. Compute the regression of demand on year (minus 1948),
using the starting date of each crop year.
5. Continuation. What proportion of variability of demand is explained by the
fitted regression line in Problem 4? Compare this value with the one you

computed in Problem 3.

Table 12-1 gives data on stopping distances for a car traveling at various
speeds. Summary statistics are given at the bottom of the table. The least-squares
line for predicting the square root of the distance is

where y is the square root of the stopping distance and x = speed. Use this information to solve Problems
6 through 10.

6. Predict the value of y for a new observation for a car traveling at 35

miles per hour.

7. Continuation. Construct 90 percent limits for your predicted value in
Problem 6.

8. Predict the value of y if the new observation is for a car traveling at

70 miles per hour.

9. Continuation. Construct 90 percent limits for your predicted value in
Problem 8.

10. Continuation. Compare the sizes of the intervals formed by the limits in
Problems 7 and 9, and explain why one interval is larger than the other.

REFERENCES

S. Chatterjee and B. Price (1977). Regression Analysis by Example, Chapters 1 and 2. Wiley, New York.
F. Mosteller, R. E. K. Rourke, and G. B. Thomas, Jr. (1970). Probability with Statistical Applications,

second edition, Chapter 11. Addison-Wesley, Reading, Mass.
S. Weisberg (1980). Applied Linear Regression, Chapter 1. Wiley, New York.

13
Regression with Two Predictors






Learning Objectives

1. Developing the linear regression model for two predictors
2. Extending the method of least squares to fit linear regression equations with

two predictors
3. Comparing the effectiveness of one versus two predictors
4. Applying linear regression ideas to multiplicative models involving two

predictors

13-1 PREDICTING WITH TWO VARIABLES

Once we have used regression methods to get a linear relation

between a dependent variable, y, and one independent variable or predictor, x,
then the idea of using more than one predictor suggests itself. The next thing to
try is two predictors.

EXAMPLE 1 Picture frames. We all know the formula for the circumference of
a rectangle:

C = 2W + 2L,

where C is the circumference and W and L are the width and length.

Picture frames have a rectangular shape, and the employees in a frame shop
often use a crude measuring stick and this formula to estimate the circumference.
The stick measures distances only to the nearest inch.

To see how close this estimate comes to the measured circumference, the shop manager
performed an experiment with four frames. First, the width and length were
measured to the nearest inch with the measuring stick, and then the
circumference was measured to the nearest 1/10 of an inch using a tape measure.
Because these measured quantities differ from the true quantities, we label them
with the corresponding lower-case letters, w, l, and c. The results of the
experiment are as follows:

The difference between the measured circumference c and the estimated one 2w
+ 2l is a residual. The results in the final column show us that even in a simple
situation, prediction equations are subject to error.

In this example of a formula with two predictors, the equation C = 2W + 2L
represents a geometric law where we know exactly the form of the relation. In
the social and biological sciences, we are likely to have less assured forms for
our relationships. We may read that some variable y, such as a child’s
performance on a standardized test, is a “function of” other variables, u and w,
such as parents’ education and the child’s age. (Now w is a general variable, no
longer the width measurement in the example.) The expression “function of”
means the same in everyday life as in mathematics; that is, it means “depends on
in an as yet unspecified way.” In statistics we usually expect our models to

include some uncertainty, fuzz, or deviations that imply that although there may
be some general trend, the forecasts are not exact. Thus we have as a
mathematical model a regression equation:

y = b0 + b1u + b2w + e.      (1)

Here e is taken to be a random error, just as in linear regression with one
predictor. Let us take another example where our experience is a good guide.

EXAMPLE 2 Ski trip expenses. Suppose that we go on a ski trip. The total cost
of the trip will have three components: (a) the preparation costs, which include
purchases of maps, waxes, etc., (b) the daily expenditure (such as tow fees,
lodging, and meals) times the number of days for the trip, and (c) the automobile
expenses, the number of miles times the cost per mile. The preparation costs,
daily expenditures, and automobile costs per mile are roughly constant from trip
to trip, but the number of days and number of miles driven will vary if the trips
are to different ski areas and at different times. Table 13-1 contains the
information for five different trips.

We wish to estimate the coefficients b0, b1, and b2 in equation (1) for this
problem. Because we have already suggested an interpretation for each, we can
make some guesses as to their values. For example:

TABLE 13-1
Expenses for five different ski trips

Using these, we can get predictions for the total expenditures using the linear
relation

The five values of ŷ are listed in Table 13-1, along with the residuals, y – ŷ. No
residual exceeds $10, and thus it appears that we have a reasonable prediction
equation. The sum of squared residuals is 280.

In most problems we do not know what the model actually involves: what
the variables are, whether the equation is linear, whether there are only two
predictors, and so on. Even if we know that formula (1) gives the correct model,
we do not know the values of b0, b1 and b2 or even the reasonable guesses that

we used in this example. As in the case of one predictor, we need to estimate the
unknown regression coefficients. One way to estimate these uses an extension of
the method of least squares that we used in Chapters 11 and 12 for the case of
one predictor. Using a least-squares computer program for the data in Table 13-
1, we get the following estimates:

b̂0 = –5.2,  b̂1 = 78.9,  b̂2 = 0.176.

The predicted values using the least-squares equation are

ŷ = –5.2 + 78.9u + 0.176w,

and the values of ŷ are listed in Table 13-1 along with the new residuals y – ŷ.
Notice that the residuals from the least-squares line add to zero (the other
residuals do not) and that the sum of squared residuals for the least-squares
regression line, Ŝ = 116.54, is smaller than the sum of squares for the line based
on our guesses, 280. Both of
these results follow from properties of the method of least squares.

The least-squares result does not look quite as we expected. We had
tentatively labeled b0 as the preparation costs, but our estimate b̂0 = –5.2 is
negative! This may mean that our original interpretation was not quite correct, or
more likely that we need more than five observations to get a more accurate
estimate of b0. The estimates of the other two coefficients make more sense, and
they are reasonably close to our expectations.

PROBLEMS FOR SECTION 13-1

Draper and Smith (1980) gave an example of steam consumption in a factory, with
y = number of units of steam used per month,
u = average atmospheric temperature for month in degrees Fahrenheit,
w = number of operation days per month.

(Their regression coefficients are given on their p. 196; data on their pp. 615–616.)
1. Which variable is the dependent variable, and which are the independent ones?

2. Continuation. If we have a prediction equation for this problem of the form

what signs do you expect b1 and b2 to have? Explain your answer.
3. Continuation. For a set of 25 observations, Draper and Smith computed the fitted equation

Do the coefficients agree in sign with your predictions from Problem 2?
4. Continuation. Using the equation in Problem 3, estimate the value of y if u = 68°F and w = 10 days.

What value of y will you predict if u = 50°F and w = 20 days?
5. Continuation. The predictions for y using the equation in Problem 3 do not coincide exactly with the

observed values; that is, an error term is associated with the equation. If you could use more variables
than u and w to predict y, which one variable from the following list would you choose to add? Why?
a) average wind velocity (in miles per hour)
b) calendar days per month
c) days below 32°F
6. From a sample of 100 freshmen at a major university, administrators wish to predict first-year grade-
point average (GPA, with values between 0 and 4) using verbal and mathematical scores on the
Scholastic Aptitude Test (SAT). If

what signs do you expect b1 and b2 to have in the following regression equation?

7. Continuation. The least-squares equation computed from data for the 100 students is

Do the coefficients agree in sign with your predictions in Problem 6?
8. Continuation. A student enters the university with SAT scores

Is the predicted GPA using the equation from Problem 7 a reasonable one? Explain your answer.
9. Continuation. What other variables might you wish to use in addition to SAT scores for predicting GPA?

13-2 LEAST-SQUARES COEFFICIENTS FOR TWO PREDICTORS

When we have just three variables, a dependent variable y and two independent
variables u and w, our observations come in triples:

and so on. We regard a triple as a single observation. The general observation is

where the subscript i refers to the ith observation, and i runs from 1 through n,
the total number of observations.

Just as in the case of one predictor, the least-squares method for two
predictors minimizes the sum of squared residuals. We plan to fit an equation of
the form

For specified values of b0, b1, and b2, and a particular observation (yi, ui, wi), the
predicted value of yi is

ŷi = b0 + b1ui + b2wi,

and the residual is

di = yi – ŷi.

(We use the notation di for distance instead of ri for residuals because of possible
confusion with the correlation coefficients.) We wish to choose the b's to
minimize the sum of squares of the d's; that is, we minimize

S = Σdi2 = Σ(yi – b0 – b1ui – b2wi)2.      (3)

As we did in the one-predictor situation of Chapter 12, we use hats to label the
b's that minimize S for the sample of n observations, as b̂0, b̂1, and b̂2. The
minimum value of S is as follows:

Ŝ = Σ(yi – b̂0 – b̂1ui – b̂2wi)2.      (4)
The values b̂0, b̂1, and b̂2 are called the least-squares estimates of the unknown
regression coefficients, and the least-squares predicted value for the ith
observation is

ŷi = b̂0 + b̂1ui + b̂2wi.

As we might expect, finding the values of b̂0, b̂1, and b̂2 to minimize
expression (3) requires more effort than in the least-squares problem with a
single predictor. Let us look at this minimization problem numerically in an
example.

EXAMPLE 3 Least squares for the ski trip expenses. For the five observations
dealing with the expenses for ski trips in Table 13-1, find the values of b0, b1 and
b2 that minimize S.

SOLUTION. Let us begin by assuming that we know the value of b0 = –5.2.
Then we need only find b̂1 and b̂2 to minimize

S = Σ(yi + 5.2 – b1ui – b2wi)2.

One way to find b̂1 and b̂2 is to look at several possible values of b1 (say 71, 75,
79, and 83) and several possible values of b2 (say 0.16, 0.18, 0.20, and 0.22),
assuming for a moment these were in the neighborhood of the solution. Then we
can compute S for each of the combinations of b1 and b2. These 16 values are
given in Table 13-2. We see that the smallest value of S, 118.8, in this table is
that for b1 = 79 and b2 = 0.18. As a next step toward getting the minimized value
of Ŝ = 116.54 we gave in Section 13-1, we need finer grids of values for b1 near
b1 = 79 (say 78.8, 78.9, 79, 79.1, 79.2) and b2 near b2 = 0.18 (say 0.174, 0.176,
0.178, 0.180, 0.182, 0.184). Using these, we might arrive at the least-squares
values b̂1 = 78.9 and b̂2 = 0.176.

TABLE 13-2
Values of S = Σ(yi – b0 – b1ui – b2wi)2 for

the data in Table 13-1, using b0 = –5.2

For simplicity, the calculations in Table 13-2 for S are based on the
minimizing value b̂0 = –5.2. If we did not know this in advance, we would need
to do the same calculations for several different values of b0 to find the
minimizing value of b0 as well.
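This grid calculation is easy to program. A minimal Python sketch of the idea (ours, not the book's; the arrays y, u, and w stand for the total costs, days, and miles of Table 13-1, which are not reproduced here):

import numpy as np

def grid_minimize_S(y, u, w, b0, b1_values, b2_values):
    # Evaluate S = sum (y - b0 - b1*u - b2*w)^2 over every (b1, b2) pair in the grid.
    y, u, w = (np.asarray(a, float) for a in (y, u, w))
    best_b1, best_b2, best_S = None, None, np.inf
    for b1 in b1_values:
        for b2 in b2_values:
            S = np.sum((y - b0 - b1 * u - b2 * w) ** 2)
            if S < best_S:
                best_b1, best_b2, best_S = b1, b2, S
    return best_b1, best_b2, best_S

# First the coarse grid of Table 13-2, then a finer grid around the winner:
# grid_minimize_S(y, u, w, -5.2, [71, 75, 79, 83], [0.16, 0.18, 0.20, 0.22])
# grid_minimize_S(y, u, w, -5.2, [78.8, 78.9, 79.0, 79.1, 79.2],
#                 [0.174, 0.176, 0.178, 0.180, 0.182, 0.184])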

FORMULAS FOR b̂0, b̂1, AND b̂2

To find the least-squares coefficients numerically by using a grid, as we did in
Example 3, is both time-consuming and wasteful. As in the case of simple linear
regression, the minimization of S can be done algebraically, and Appendix II in
the back of the book contains formulas for the least-squares coefficients
b̂0, b̂1, and b̂2. To use these formulas in practice, one will almost certainly
want to use a calculator. Some pocket calculators have special buttons for
computing these coefficients directly.

DOING LEAST SQUARES BY HIGH-SPEED COMPUTER

Fortunately, we have a better way to get the least-squares regression coefficients
—we can use the high-speed computer. Because so many people want to
compute least-squares regression coefficients in practical situations, almost
every high-speed computer has a built-in computer program for doing regression
calculations. To use these programs, usually we need only give the computer a
list of our n observations (the triples: yi, ui, and wi), tell it which is the dependent
variable and which are the predictors, and then indicate that we want to compute
the least-squares coefficients for the regression of y on u and w. A good
computer program not only will compute b̂0, b̂1, and b̂2 but also will give us Ŝ,
the predicted values ŷi, the residuals yi – ŷi, and other useful information.
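To illustrate concretely the kind of output such a program produces, here is a minimal Python sketch using numpy (ours; it is not the program the book refers to):

import numpy as np

def two_predictor_regression(y, u, w):
    y, u, w = (np.asarray(a, float) for a in (y, u, w))
    X = np.column_stack([np.ones(len(y)), u, w])     # columns for b0, b1, b2
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares b0-hat, b1-hat, b2-hat
    fitted = X @ coefs                               # predicted values
    residuals = y - fitted
    S_min = np.sum(residuals ** 2)                   # the minimum sum of squares, S-hat
    return coefs, fitted, residuals, S_min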

PROBLEMS FOR SECTION 13-2

The data in Example 1 for the circumferences of picture frames are used for Problems 1, 2, and 3. For
Problems 1 and 2, we take b0 = 0.

1. Compute the value of S for the prediction formula ĉ = 2w + 2l, which uses b1 = b2 = 2.
2. Find the four values of S corresponding to b1 = 2.0, 2.2 and b2 = 1.8, 2.0. Which values of b1 and b2

give the smallest value of S?
3. Compare the best predictor from Problem 2 with the performance of the least-squares estimator

computed by using formulas in Appendix II in the back of the book. Note that the least-squares formula
allows b̂0 to be nonzero.

4. For the data in Table 13-1, find the value of S for b0 = –5.2, b2 = 0.176, and (a) b1 = 78.8, (b) b1 = 78.9,
and (c) b1 = 79.1.

5. Continuation. Compare the values of S from (a) and (c) in Problem 4 with the least-squares value from
(b).

6. Consider the following data for predicting first-year GPA from SAT scores:

Using a least-squares computer program or the formulas from Appendix II in the back of the book, find
the least-squares equation for predicting GPA from SAT scores.
7. Continuation. Compute the value of Ŝ, using equation (4), for your equation in Problem 6.
8. Continuation. Compute the value of S for the equation

and compare it with the value of Ŝ from Problem 7.

13-3 ONE VERSUS TWO PREDICTORS

When we fit a two-predictor linear regression equation of the form

we often wish to compare our estimated coefficients with those that we would
have computed using only one predictor, either u or w, and an estimating
equation of the form

or one of the form

If we estimate the coefficients b0, b1, and b2 from actual data using least squares,
the estimated coefficient b̂1 ordinarily will differ from the coefficient of u fitted
in the one-predictor equation, b̂2 will differ from the coefficient of w fitted in the
one-predictor equation, and b̂0 will differ from both one-predictor intercepts.

EXAMPLE 4 Using the data in Table 13-1, compute the least-squares regression
coefficients for y versus u, and y versus w. Compare these with the coefficients
in Example 2.

SOLUTION. For the regression of y on u, we have the following summary
statistics:

Thus the least-squares regression coefficients are

The fitted equation is

Similarly, for the regression of y on w, we have
Thus,

and the fitted equation is
The coefficients using each predictor separately differ from those we

computed earlier:

Thus we see that the coefficients depend on which predictors are allowed to
enter the equation. Even the signs change.

MINIMIZING S

Our aim in estimating the coefficients in the equation

y = b0 + b1u + b2w + e

was to find a “good fit.” The device we chose for achieving this aim was the
method of least squares. This method requires minimizing

S = Σ(yi – ŷi)2,

where ŷi is the predicted value. We can also compute the minimum value of S if
ŷi is a predicted value based on only one of the two variables u and w.

FACT: The predicted values ŷi based on the least-squares coefficients b̂0, b̂1, and b̂2 always produce a value of S at least
as small as that produced by the predicted values based on a linear equation using only one of the
variables.

EXAMPLE 5 Compare the values of S for the one-predictor regression
equations for the data in Table 13-1 with the value Ŝ based on the least-squares
equation with two predictors.
SOLUTION. From Example 2, we have Ŝ = 116.54. For the one-predictor
equation with the predictor u,

we have SSy = 60,530 and SSu = 10; so

Similarly, for the other one-predictor equation with the predictor w,

we have

If we had to choose only one predictor in this example, using u does a much
better job than using w, because S′ is much smaller than S*. But Ŝ is still smaller
(but not much smaller) than S′.

THE PROPORTION OF VARIABILITY EXPLAINED

We can easily extend the idea of Chapter 12 regarding the proportion of
variability in the dependent variable y explained by the fitted regression line. For
one predictor variable, we saw that the proportion of SSy that is removed by the
fitted regression is as follows:

R2 = (SSy – Ŝ)/SSy = 1 – Ŝ/SSy.      (5)

We use R2 in place of the r2 of Chapter 12 because we wish to use formula (5)
with more than one predictor. Indeed, the same formula works for two predictors
if Ŝ is the minimum sum of squared deviations:

R2 = 1 – Ŝ/SSy.

Just as we saw in Chapter 12, R2 must be between 0 and 1; that is,

0 ≤ R2 ≤ 1.

Let us use a special notation so that we can compare the values of R2 for
different equations: R2u for the regression of y on u alone, R2w for the regression
of y on w alone, and R2u,w for the regression of y on both u and w.
In this notation it is understood that we are predicting y. If there is any ambiguity
about which variable is being predicted, we need a more complicated notation.
From the FACT about Ŝ noted earlier in this section and formula (5), we can see
the following:

FACT RESTATED: R2u,w ≥ R2u and R2u,w ≥ R2w.

This result means that by using a second predictor variable in our regression

equation we never explain a smaller proportion of the variability in the
dependent variable y.
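This comparison is easy to carry out numerically. A short Python sketch (ours) computes R2 for any chosen set of predictor columns, so R2u, R2w, and R2u,w can be compared directly:

import numpy as np

def r_squared(y, predictors):
    # predictors is a list of arrays; an intercept column is always included.
    y = np.asarray(y, float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, float) for p in predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    S_min = np.sum((y - X @ coefs) ** 2)
    SSy = np.sum((y - y.mean()) ** 2)
    return 1 - S_min / SSy

# r_squared(y, [u, w]) is never smaller than r_squared(y, [u]) or r_squared(y, [w]).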

EXAMPLE 6 Compare R2 values for regression equations fitted to the data in
Table 13-1.
SOLUTION. From Example 5 we have that

and

As we expected, R2u,w > R2u and R2u,w > R2w.
Because R2u is almost the same as R2u,w in value, and both are extremely

close to the maximum value of 1, we can say that in terms of the variability of y
explained, the addition of the predictor w to the regression of y on u has very
little effect. Recall from Example 4 that the estimated regression coefficient for u
also changed only a little when w was included in the equation, from 77.5 to
78.9. The addition of w did have a much larger impact on the estimate b0 (going
from 18 to –5.2), even though b̂1 and R2 changed very little. Appendix II in the

back of the book contains an alternative formula for the calculation of R2u,w.

MULTIPLE CORRELATION COEFFICIENT

We have already recognized that for one variable R2 = r2 and r is the correlation
coefficient. What is new here is that when two variables predict y, R is called the
multiple correlation coefficient. What is being correlated is yi with its
forecasted value ŷi, which is equal to b̂0 + b̂1ui + b̂2wi.

Whereas r can range between –1 and +1 and thus can be negative, R takes
only values between 0 and 1. The formulas automatically make the sign positive
when we fit the regression coefficients b̂0, b̂1, and b̂2.
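A quick numerical check of this interpretation, sketched in Python under the assumption that data arrays y, u, and w are already defined:

import numpy as np

# y, u, w: data arrays (assumed already defined)
X = np.column_stack([np.ones(len(y)), u, w])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coefs
R = np.corrcoef(y, y_hat)[0, 1]    # multiple correlation coefficient, between 0 and 1
print(R ** 2)                      # agrees with the proportion of variability explained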

PROBLEMS FOR SECTION 13-3

Table 13-3 contains data for 50 of the largest U.S. cities concerning their cancer mortality (y), the length of
time their drinking water has been fluoridated (u), and the percentage of population age 65 or older (w). We
are interested in examining the relationship between cancer mortality and the fluoridation of water. The data
in Table 13-3 are used for the following 12 problems. Summary data are

1. Using the first 10 and last 10 observations, make a plot of y against u.
2. Using the same observations, plot y against w.
3. Continuation. (a) In the plots of Problems 1 and 2, does y seem to be related to either variable? If so,

describe how. (b) Are there any special features of the plots that you can note?
4. Compute the regression of y on u.
5. Compute the regression of y on w.
6. Continuation. Draw the estimated regression lines from Problems 4 and 5 on your plots from Problems

1 and 2. Do the relationships seem linear? Do there appear to be any outliers? If so, which cities are
they?

*7. Use a least-squares computer program to compute the values of R2 for the two regressions in Problems 4
and 5.

*8. Using the formulas from Appendix II in the back of the book, or a least-squares computer program,
calculate the fit of the two-predictor regression equation, with predictors u and w.

*9. Continuation. Use a least-squares computer program to calculate Ŝ and R2 for your equation in Problem
8.

*10. Continuation. Use a least-squares computer program to compare the R2 values for the two-predictor
regression equation and the regression of y on u.

TABLE 13-3
For the 50 largest cities in 1970:
cancer mortality, number of years city water
supply has been fluoridated, percent age 65 or more



Note: Indianapolis, Nashville, and Jacksonville are not included because their boundaries changed markedly
between 1960 and 1970; therefore the table lists 50 of the 53 largest cities.
Source: Data collected by W. Mason from various U.S. Government publications.

*11. Continuation. Interpret the equation in Problem 8, explaining the effect of adding w to the regression of
y on u.

12. Continuation. (a) Do your answers to Problems 1 through 11 shed any light on the possible causal links
between the fluoridation of water supplies and cancer? (b) What other variables might you want to
include in a further analysis of these data?

13-4 MULTIPLICATIVE FORMULAS WITH TWO PREDICTORS

Not all formulas with two predictor or independent variables have the nice
additive form of our first examples. Some other examples of familiar formulas
with two independent (predictor) variables are the following:

The first two equations are examples of mathematical laws; the third is the
ideal gas law from physics. By taking logarithms of both sides of these
equations, we can write them in an additive form:

In all three of these equations we are predicting the value of a dependent
variable, y, from the values of two independent variables, u and w, using a linear
relation of the form

For the three examples, the variables are

The coefficients are

In the social and biological sciences it is often natural to cast models with
two or more predictors in a multiplicative form. Taking logarithms of both sides
of the equation is thus a common practice.
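For instance, a multiplicative relation y = k·u^b1·w^b2 can be fitted by ordinary least squares after taking logarithms. A Python sketch of the idea (ours; u_raw, w_raw, and y_raw stand for assumed arrays of positive measurements):

import numpy as np

# Model: y_raw = k * u_raw**b1 * w_raw**b2,
# so ln(y_raw) = ln k + b1*ln(u_raw) + b2*ln(w_raw).
y = np.log(y_raw)
u = np.log(u_raw)
w = np.log(w_raw)
X = np.column_stack([np.ones(len(y)), u, w])
b0_hat, b1_hat, b2_hat = np.linalg.lstsq(X, y, rcond=None)[0]
k_hat = np.exp(b0_hat)        # back-transformed multiplicative constant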
EXAMPLE 7 Computing the volume of trees. In estimating the volume V of a
cylinder from πr2H, we could have errors in measurement for r and H. But more
generally, just as in linear regression with one predictor, we rarely expect, even
without measurement error, perfect relationships; we expect only general trends.
For example, foresters often want to predict the volume of tree trunks to be cut

for lumber. To a first, and very rough, approximation, we might conjecture that
tree trunks are roughly cylindrical, which suggests the possibility of using the
equation

V = πr2H.      (7)

Actually, large tree trunks are tapered, being wide at the bottom and narrow at
the top. Thus, another possible estimation equation is that for a cone:

V = (π/3)r2H,      (8)

where r is the radius at the bottom of the cone.
In practice, foresters measure the circumference of a tree trunk at breast

height and thus get a measure of the radius r. Table 13-4 contains measurements
on 10 tree trunks; their volumes were determined by cutting the trees down,
submerging them in a tank of water, and measuring the amount of water
displaced. Note that the tree volume is much more like that of a cone than that of
a cylinder. The magnitude of the difference between the cone formula prediction
and the true volume increases as the volume increases, and the predictions are
too small for the small trees and too large for the large ones. This can be seen in
Table 13-4. Perhaps we can find a better relationship for predicting volume than
either formula (7) or formula (8).

TABLE 13-4
Measurements on 10 tree trunks

Note: The radius was measured to the nearest whole inch, and the r value is the number of inches divided
by 12.

We have just seen that taking logarithms in multiplicative equations like
formulas (7) and (8) leads to linear equations of the form

y = b0 + b1u + b2w.

By taking logarithms of both sides for formulas (7) and (8), we get two such
equations, where

y = log V,  u = log r,  w = log H,  b1 = 2,  b2 = 1.

The value of b0 is log π for formula (7) and log(π/3) for formula (8).
Our prediction V* from the formula for the cone, expression (8),

underestimated when the volume was small and overestimated when it was
large. Using a two-predictor linear regression equation, we can try to choose
values of b0, b1 and b2 to get better predictions for volume than the formula for a
cone provides. For our calculations in this example we use natural logarithms,
because in scientific problems they are more often used than common
logarithms. The natural logarithms of the values of the three variables are listed
in Table 13-5. We use the notation ln for logarithms to the base e and log for
logarithms to the base 10. Table 13-5 also lists the predictions in the ln scale
from our approximating formula for the volume of a cone.

TABLE 13-5
Natural logarithms of radius, height,
and volume for the data on 10 trees in Table 13-4

The sum of squared residuals using this equation is

By using a computer program (or the formulas in Appendix II in the back of
the book), we find that the least-squares fitted regression line for this problem is
and the minimizing value of the sum of squared residuals is

As we expected, Ŝ is less than S*. The proportion of variability in y

explained by this equation is

a value very close to 1. The numerical values we give here come from rounding
values from the output of our computer program. If we use the formulas in
Appendix II, our answers may differ slightly. The constant term b0, which we
compute using the formula

b̂0 = ȳ – b̂1ū – b̂2w̄,

is sensitive to relatively small shifts in the values of the other two coefficients b1
and b2. When b1 changes by 0.20 (from 1.44 to 1.64) and b2 by 0.06 (from 0.883
to 0.823), the constant term b0 goes from 0.152 to 0.427. The reason is that the
three variables y, u, and w for the 10 observations in Table 13-5 are highly
interrelated. This can be seen by looking at the R2 values for the regressions of y
on u and y on w,

and then looking at the square of the correlation between u and w: 0.986. Not
only is y almost perfectly linearly related to u and to w, but also the relationship
between u and w is almost linear. Although u and w are measuring different
quantities (log-radius and log-height), the two are closely related for our range of
observation. Thus we have trouble getting stable coefficients that are unaffected
by minor changes in the data.

When two predictors are highly related, as in this example, we say that they
are nearly collinear, and we must be careful how we interpret our estimated
regression coefficients.

The improvement of the two-variable least-squares equation over the one-
variable equations is measured by the differences in the R2 values:
1. The improvement due to the addition of variable w, given that u is already in
the equation

2. The improvement due to the addition of variable u, given that w is already in
the equation

Looking at these differences in R2’s cannot tell us if we need to add the variable
to the equation. To answer this question we need the estimated standard errors of
the least-squares coefficients in the two-variable equation. For this problem, our
computer program gives these:

We use the term standard error here in place of standard deviation because most
computer programs that produce regression results do so. Thus, estimated
standard error means estimated standard deviation.

To check on the plausibility of b1 = 0 in the two-variable equation we
compute the statistic

t = b̂1/(estimated standard error of b̂1) = 2.51.

Because we have n = 10 observations in this example, and because we have
estimated 3 parameters (b0, b1 and b2), we compare the statistic t to values of the
t distribution with n – 3 = 10 – 3 = 7 degrees of freedom. From Table A-2 in
Appendix III in the back of the book for 7 degrees of freedom we find t = 2.36
gives a probability of 0.95 (two-sided), or the probability that a value exceeds
2.36 is 0.025. Because our observed value of 2.51 exceeds the tabled value 2.36,

we have reasonable grounds for concluding that b1 is different from zero. We
conclude that we should not drop the predictor u from the equation.

Similarly, to check on the plausibility of b2 = 0, we compute

We conclude that we should not drop variable w from the equation.
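Both checks follow the same recipe, which a short Python sketch (ours) makes explicit; the degrees of freedom are n – 3 because three constants are fitted:

from scipy import stats

def check_zero(coef_hat, std_error, n, k=3, conf=0.95):
    t_ratio = coef_hat / std_error
    critical = stats.t.ppf(0.5 + conf / 2, df=n - k)    # two-sided critical value
    return t_ratio, critical, abs(t_ratio) > critical   # True means zero is implausible

# For the tree data: n = 10 observations and k = 3 fitted constants, so 7 d.f.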

A Cautionary Note. In this example we first looked at the data in several
forms, fitted three regressions and computed the corresponding R2 values, and
then computed the t statistics for the coefficients b1 and b2. The tail probabilities
associated with the t tests do not have the clear-cut interpretation that we get for
a single t test where the data have not been examined previously.

We are using the probabilities associated with the t values as approximate
guides. The more ways that we look at the data, the less meaning such tail
probabilities have in subsequent tests. This point is even more important to keep
in mind when we have more than two predictors.

PROBLEMS FOR SECTION 13-4

Using a dozen hen’s eggs from the grocery store, Dempster measured the long (L) and short (W) diameters
and the volume (V). Dempster’s goal was similar to the one we had in Example 7; he wished to see if the
formula for an ellipsoid of revolution,

V = kLW2,

where k = π/6, was a reasonable approximation for measuring the volume of eggs. By taking logarithms, the
ellipsoid formula becomes

y = ln k + u + 2w,      (9)

where

u = ln L,  w = ln W, and y = ln V.

Dempster’s data are given in Table 13-6, using natural logarithms.
TABLE 13-6

Measurements for a dozen hen’s eggs

† u = ln L, w = ln W, and y = ln V (natural logarithms).
Source: Adapted from A. P. Dempster (1969). Elements of Continuous Multivariate Analysis, p. 151.
Addison-Wesley, Reading, Mass.

1. Using the data in Table 13-6 and formula (9), calculate the predicted values of y.
2. Continuation. Plot the predicted values from Problem 1 against the observed. Do the points fall roughly

along a straight line?
3. Continuation. Compute the sum of squared residuals using the prediction equation (9).
4. Continuation. What proportion of the variability in y is explained by equation (9)?
5. The actual least-squares fit for the multiple regression of y on u and w is

with R2 = 0.621. How much of an improvement is the least-squares equation fit over equation (9), in
terms of proportion of variability explained?

6. The estimated standard errors for the coefficients of u and w in the least-squares equation are 0.267 and
0.546. Compute the t ratios for b̂1 and b̂2, and check on the plausibility of (a) b1 = 0, (b) b2 = 0.

13-5 SUMMARY OF CHAPTER 13

Formula numbers correspond to those used earlier in the chapter.
1. With a dependent variable, y, and two predictors, u and w, the linear
regression model is

y = b0 + b1u + b2w + e,      (1)

where e is a random error with zero mean and unknown variance σ2.

2. The least-squares estimates of the coefficients in (1), b̂0, b̂1, and b̂2, yield
the minimum sum of squared residuals:

Ŝ = Σ(yi – b̂0 – b̂1ui – b̂2wi)2.      (4)

3. The proportion of variability of the dependent variable, y, explained by the
least-squares fit is

R2 = 1 – Ŝ/SSy.      (5)
As with a single predictor, 0 ≤ R2 ≤ 1. If we use two predictors in our
regression equation, we always get a value of R2 at least as large as the
value from using only one of them.

4. R, the positive square root of R2, is called the multiple correlation coefficient.

5. Multiplicative formulas with two or more predictors can be changed into
formulas with linear equations by taking logarithms.

6. To check on the plausibility that a true regression coefficient, bi, takes the
value zero, we compute the t ratio

t = b̂i/(estimated standard error of b̂i),

and compare the statistic with values from tables of the t distribution with n
– 3 d.f.

SUMMARY PROBLEMS FOR CHAPTER 13

1. In an investigation of the deterrent effects of punishment on homicide rates, a
sociologist planned to use state data to predict

from

If the certainty of punishment (variable u) and the severity (variable w)
actually deter homicides, what signs do you expect to find for b1 and b2 in
the following regression?

2. Continuation. List three additional variables that might have effects on
homicide rates in addition to certainty and severity of punishment.

3. Continuation. An analysis carried out using data from 48 states for 1960 and
the model from Problem 1 yielded the following least-squares coefficients
and estimated standard errors:

Check the plausibility of (a) b1 = 0, (b) b2 = 0.
4. Use the formulas from Appendix II in the back of the book or a least-squares

computer program to fit an equation of the form y = b0 + b1u + b2w to the
following data:

5. Continuation. Find the value of Ŝ and compute the proportion of variability
explained, R2.

6. A study predicting freshman GPA using SAT mathematics and verbal scores
yields the following least-squares estimates for data from 20 college
students:

Test the plausibility that (a) b1 = 0, (b) b2 = 0.

TABLE 13-7
Employment, price
deflator, and GNP by years

Note: The data have been centered on the sample means ȳ = 65,317, ū = 101.7, w̄ = 387,698.
Source: Adapted from Tables 1 and 2 in J. W. Longley (1967). An appraisal of least squares programs for
the electronic computer from the point of view of the user, Journal of the American Statistical Association
62:819–841.

7. In a study predicting

from

for the years 1947 to 1962, the data in Table 13-7 on page 401 were
gathered. Use a least-squares computer program to get an estimated equation
for the linear regression of y on u and w.
8. Continuation. Test the plausibility that (a) b1 = 0, (b) b2 = 0.
9. Continuation. Calculate Ŝ and R2 for your equation in Problem 7.
10. Continuation. Compare the R2 value from Problem 9 with the value of R2 for
the regression of y on u alone.

REFERENCES

S. Chatterjee and B. Price (1977). Regression Analysis by Example, Chapter 3. Wiley, New York.
N. R. Draper and H. Smith (1980). Applied Regression Analysis, second edition, Chapter 4. Wiley, New

York.
F. Mosteller and J. W. Tukey (1977). Data Analysis and Regression, Chapter 12. Addison-Wesley, Reading,

Mass.

14
Multiple Regression






Learning Objectives

1. Extending the regression model and least squares to more than two
predictors

2. Interpreting the output from computer programs for multiple regression
analysis

3. Choosing subsets of predictors to be included in a regression model
4. Examining plots of residuals from a multiple regression analysis for patterns

that relate to omitted predictors

14-1 MANY PREDICTORS

All of the basic ideas discussed so far carry over to the situation with more than
two predictors. For example, the regression model for five predictors is

y = b0 + b1t + b2u + b3v + b4w + b5x + e,      (1)

where t, u, v, w, and x are the values of the predictors and e is a random error
with mean 0 and unknown variance σ2. This is the multiple regression model.
Our observations now come in sextuples:

(y1, t1, u1, v1, w1, x1), (y2, t2, u2, v2, w2, x2), …, (yn, tn, un, vn, wn, xn).

In this chapter we discuss aspects of fitting multiple regression models to
data and how to choose predictors for inclusion in the models being fitted. The
calculations for multiple regression are complicated and extremely time-
consuming unless one has a high-speed computer available. Thus we develop the
material in this chapter as if we had available a computer program suitable for
multiple regression—one that will do all of the calculations we require.

In Section 14-2, after describing some aspects of choosing multiple
regression models, we develop a detailed practical example. Throughout this
example we introduce new ideas and illustrate how they are used in the analysis
of data. We conclude the chapter with a series of real-life examples that will
guide us in discussing the interpretation and use of multiple regression.

COMPUTER OUTPUT AND ANALYSES

The least-squares estimators for the regression coefficients in equation (1)
minimize

S = Σ(yi – b0 – b1ti – b2ui – b3vi – b4wi – b5xi)2.      (2)

We label them b̂0, b̂1, b̂2, b̂3, b̂4, and b̂5. Our multiple regression computer
program will compute these estimated coefficients as well as the following:

a) the minimum value of S,
b) the predicted values ŷi,
c) the residuals yi – ŷi,
d) the estimated standard error for each of the regression coefficients,
e) the proportion of variability explained by the estimated regression equation:

R2 = 1 – Ŝ/SSy.

Some multiple regression programs produce even more information, such as
plots of residuals similar to those we used when we had a single predictor. It
takes considerable training and guidance in order to be able to interpret even
those items listed here, such as residuals, and so we concentrate on these in this
chapter.

USING t TESTS FOR REGRESSION COEFFICIENTS

It is possible to use a t test to see if a coefficient in the regression equation
differs substantially from zero or if it is near enough to zero for us to consider
the possibility of omitting that variable. For example, to check whether we wish
to include the variable u in equation (1), we can compute the t ratio

t = b̂2/(estimated standard error of b̂2).      (5)

These t values often are included as part of standard computer output. The
number of degrees of freedom depends on the total number of observations
available, n, and the number of constants fitted, k, and equals n – k. We might
want to keep a variable in a regression equation, however, even if the coefficient
is not distinctly different from zero. For example, we may have theoretical
reasons or experience showing that a variable should be kept. If a regression

relating automobile fuel consumption to several variables did not give a
significant coefficient to miles driven, we would still leave it in.
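A Python sketch (ours) of this computation, applied to all of the coefficients at once; coefs and std_errors stand for the estimates and standard errors reported by the regression program:

import numpy as np
from scipy import stats

def coefficient_t_ratios(coefs, std_errors, n):
    coefs = np.asarray(coefs, float)
    std_errors = np.asarray(std_errors, float)
    k = len(coefs)                              # number of fitted constants
    t = coefs / std_errors
    p = 2 * stats.t.sf(np.abs(t), df=n - k)     # two-sided tail probabilities
    return t, p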

PROBLEMS FOR SECTION 14-1

1. This chapter does not give computational formulas for obtaining multiple regression equations. Why
not?

2. To predict a person’s weight from other variables, we might use a multiple regression based on
y = weight in pounds
t = height in inches
u = chest circumference in inches
v = waist circumference in inches
w = hip circumference in inches
x = sex (0 if male, 1 if female)
Write out the sextuple of measurements for your own body.

3. How many constants are to be fitted in equation (1)?
4. What does e measure in equation (1)?
5. If you had a regression equation and could measure e for many observations, what would the average

value of e be, approximately?
6. Because e varies from one measurement to another, how can we estimate its variance?
7. If ei is the error for the ith measurement, write equation (2) in terms of e’s.
8. Explain the distinction between bi and b̂i. Do we usually know or find bi and b̂i?
9. Explain the distinction between yi and ŷi. Do we usually know or find yi and ŷi?
10. What measure of fit of the regression equation to the data is proposed?

14-2 CHOOSING A MULTIPLE REGRESSION EQUATION

LOOKING AT THE DATA

The data for multiple regression analyses are more extensive and more complex
than those for the other statistical methods discussed in this book, but we cannot
rely on automated computer programs to do all of our work for us. We still need
to look at the raw data, as well as at various numbers produced as by-products of
our analyses. After we fit a multiple regression model to our data, we should
look at the residuals. They may have patterns. A typical way to look at residuals
is to plot them against the estimated value of the y variable, ŷ, or against the
predictors. By doing this we may find a pattern in the residuals. If there is a
curvilinear trend, we might either introduce transformations into the variables or
add a quadratic term to the prediction.
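A Python sketch (ours, using matplotlib) of such residual plots, assuming the fitted values, residuals, and predictor columns have already been computed:

import matplotlib.pyplot as plt

def residual_plots(fitted, residuals, predictors, names):
    # One panel for residuals vs. fitted values, plus one panel per predictor.
    # Assumes at least one predictor column is supplied.
    panels = len(predictors) + 1
    fig, axes = plt.subplots(1, panels, figsize=(4 * panels, 3))
    axes[0].scatter(fitted, residuals)
    axes[0].set_xlabel("fitted value")
    axes[0].set_ylabel("residual")
    for ax, x, name in zip(axes[1:], predictors, names):
        ax.scatter(x, residuals)
        ax.set_xlabel(name)
    plt.tight_layout()
    plt.show()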

There may also be one or more very wild observations. If that happens, then

at least one residual should be extremely large compared with the others. Then
we may want to consider setting aside such points and redoing the multiple
regression analysis on the remaining points.

FOCUSING ON A SUBSET OF PREDICTORS

How can we decide which variables actually belong in the regression equation?
Expression (1) has five predictors, but could we actually settle for only two or
three? We have several possible approaches for focusing on a subset of the
possible predictors.

We might compute a t ratio for each of the estimated regression coefficients,
as in equation (5). In each, we take the estimated regression coefficient and
divide it by the estimate of its standard error. If the ratios are small, we are often
inclined to omit the corresponding predictor from our equation. We must resist,
however, the desire to blindly conduct significance tests at a prespecified level
like 0.05 on the t ratio for every possible predictor. Especially with social
science data, we may be unwise to delete these slight variables unless other
variables are doing the job. These variables could have useful effects even
though the effects are too small and our data are too few to prove firmly that
they are useful variables. Remember that tail probabilities do not have the usual
simple interpretation when we look at the data over and over again and then
conduct several t tests.

As an alternative we might wish to run regression analyses on the same set of
data to see how well we do when we eliminate some variables from the
regression equation. Generally speaking, we prefer equations with fewer
variables. Often we cannot readily tell the effect of removing a variable. To
determine this we compare the sums of squared deviations, Ŝ, for two different
equations. We can do this directly by looking at Ŝ or R2, remembering that as we
delete variables from the equation, Ŝ increases in value, and thus R2 decreases.

If we use 5 variables, we will have fitted 6 regression coefficients (including
the constant term, b0), and therefore we will have given up 6 degrees of freedom.
This means that a more appropriate quantity to look at than Ŝ is the estimate of
the error variance, σ2, from the multiple regression model (1):

σ̂2 = Ŝ/(n – 6).
