Problem 1 (9 Points) Use the data in BWGHT.RAW for this problem.

(i) How many women are in the sample, and how many report smoking during pregnancy? (2)

(ii) Among women who smoked during pregnancy, what is the average number of cigarettes smoked per day? (1)

(iii) Find the average of fatheduc in the sample. Why are only 1,192 observations used to compute this average? (2)
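
A minimal pure-Python sketch of the bookkeeping behind parts (i) to (iii). It assumes the data are already loaded as a list of dicts keyed by the BWGHT variable names `cigs` (cigarettes smoked per day during pregnancy) and `fatheduc`, with `None` marking a missing value. Treating `cigs > 0` as "smoked during pregnancy" is an assumption, and the sample rows are made up.

```python
from statistics import mean


def summarize_bwght(rows):
    """Counts and averages for parts (i)-(iii); rows are dicts, None = missing."""
    smokers = [r for r in rows if r["cigs"] > 0]  # assumption: cigs > 0 means smoker
    # Only observations with a non-missing fatheduc enter its average,
    # so the effective sample for (iii) can be smaller than the full sample.
    fatheduc_obs = [r["fatheduc"] for r in rows if r["fatheduc"] is not None]
    return {
        "n_women": len(rows),
        "n_smokers": len(smokers),
        "avg_cigs_smokers": mean(r["cigs"] for r in smokers) if smokers else None,
        "n_fatheduc": len(fatheduc_obs),
        "avg_fatheduc": mean(fatheduc_obs) if fatheduc_obs else None,
    }


# Hypothetical rows, not taken from BWGHT.RAW.
rows = [
    {"cigs": 0, "fatheduc": 12},
    {"cigs": 10, "fatheduc": None},
    {"cigs": 20, "fatheduc": 16},
    {"cigs": 0, "fatheduc": None},
]
print(summarize_bwght(rows))
```

With the real data, comparing `n_fatheduc` with `n_women` is one way to see why fewer observations enter the average in (iii).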

(iv) Report the average family income and its standard deviation in dollars. (4)
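
A sketch of the unit conversion for (iv), assuming faminc is recorded in thousands of dollars (check the variable description that accompanies the data). Both the mean and the standard deviation scale by the same factor as the data, so multiplying the reported summary statistics by 1,000 gives the same result as converting every observation first.

```python
from statistics import mean, stdev


def income_summary_dollars(faminc_thousands):
    """Mean and sample standard deviation of family income, converted to dollars."""
    dollars = [1000 * v for v in faminc_thousands]  # assumption: faminc is in $1,000s
    return mean(dollars), stdev(dollars)  # stdev uses the n - 1 divisor


avg, sd = income_summary_dollars([10, 20, 30])  # hypothetical values
print(avg, sd)
```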

Problem 2 (9 Points) Use the data in WAGE2.RAW to estimate a simple regression explaining monthly salary (wage) in terms of IQ score (IQ).

(i) Estimate a simple regression model where a one-point increase in IQ changes wage by a constant dollar amount. Use this model to find the predicted increase in wage for an increase in IQ of 20 points. Does IQ explain most of the variation in wage? (5)
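
For a simple regression the OLS estimates have closed forms, so a short pure-Python sketch shows what (i) asks for: the slope b1 is the predicted change in wage per IQ point, 20·b1 is the predicted change for 20 points, and R² is the share of the sample variation in wage explained by IQ. The function name and the toy data are illustrative only.

```python
def simple_ols(x, y):
    """Return (intercept, slope, R-squared) for the regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - ybar) ** 2 for yi in y)  # total sum of squares
    return b0, b1, 1 - ssr / sst


# Toy data standing in for (IQ, wage); not from WAGE2.RAW.
b0, b1, r2 = simple_ols([1, 2, 3, 4], [2, 4, 5, 8])
predicted_change_20 = 20 * b1  # level-level model: the effect is linear in the IQ change
print(b0, b1, r2, predicted_change_20)
```

For the last question in (i), compare R² with one: it is the fraction of the sample variation in wage that IQ accounts for.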

(ii) Now, estimate a model where each one-point increase in IQ has the same percentage effect on wage. If IQ increases by 20 points, what is the approximate percentage increase in predicted wage? (4)
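
In the log-level model log(wage) = β0 + β1IQ + u, the approximation %Δwage ≈ 100·β1·ΔIQ is what (ii) asks for. The sketch below, using a hypothetical slope of 0.01 rather than an estimate, also computes the exact change 100·[exp(β1·ΔIQ) − 1] to show how far the approximation drifts for a 20-point change.

```python
import math


def pct_change_log_level(beta1, delta_x):
    """Approximate and exact % change in y when x changes by delta_x in log(y) = b0 + b1*x."""
    approx = 100 * beta1 * delta_x
    exact = 100 * (math.exp(beta1 * delta_x) - 1)
    return approx, exact


approx, exact = pct_change_log_level(0.01, 20)  # hypothetical slope
print(round(approx, 2), round(exact, 2))  # 20.0 22.14
```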

Problem 3 (3 Points) Which of the following can cause OLS estimators to be biased?

(i) Heteroskedasticity.

(ii) Omitting an important variable.

(iii) A sample correlation coefficient of .95 between two independent variables both included in the model.

Problem 4 (3 Points) Which of the following can cause the usual OLS t statistics to be invalid (that is, not to have t distributions under H0)?

(i) Heteroskedasticity.

(ii) A sample correlation coefficient of .95 between two independent variables that are in the model.

(iii) Omitting an important explanatory variable.

Problem 5 (18 Points) The file CEOSAL2.RAW contains data on 177 chief executive officers and can be used to examine the effects of firm performance on CEO salary.

(i) Estimate a model relating annual salary to firm sales and market value. Make the model of the constant elasticity variety for both independent variables. Report the results in standard form. (8)
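
A constant elasticity model puts both the dependent variable and the regressors in logarithms, so each slope is an elasticity. Written in the notation of the other equations in this problem set, and assuming mktval (the variable used in part (iv)) measures market value, the model for (i) is:

```latex
\log(salary) = \beta_0 + \beta_1 \log(sales) + \beta_2 \log(mktval) + u
```

Here β1 is the elasticity of salary with respect to sales and β2 the elasticity with respect to market value, each holding the other fixed.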

(ii) Add profits to the model from part (i). Why can this variable not be included in logarithmic form? Would you say that these firm performance variables explain most of the variation in CEO salaries? (3)

(iii) Add the variable ceoten to the model in part (ii). What is the estimated percentage return for another year of CEO tenure, holding other factors fixed? (2)

(iv) Find the sample correlation coefficient between the variables log(mktval) and profits. Are these variables highly correlated? What does this say about the OLS estimators? (5)
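
A pure-Python sketch of the sample correlation coefficient for (iv), applied after taking logs of market value (which must be positive, unlike profits). The numbers are hypothetical. With the real data, a correlation close to ±1 signals multicollinearity, which inflates the sampling variances of the affected OLS estimators but does not by itself bias them.

```python
import math


def sample_corr(x, y):
    """Pearson sample correlation coefficient between two equal-length lists."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


mktval = [500.0, 1200.0, 3000.0, 8000.0]  # hypothetical market values
profits = [40.0, 90.0, 260.0, 700.0]      # hypothetical profits
r = sample_corr([math.log(v) for v in mktval], profits)
print(round(r, 3))
```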

Problem 6 (7 Points) Consider an equation to explain salaries of CEOs in terms of annual firm sales, return on equity (roe, in percent form), and return on the firm’s stock (ros, in percent form):

log(salary) = β0 + β1log(sales) + β2roe + β3ros + u.

(i) In terms of the model parameters, state the null hypothesis that, after controlling for sales and roe, ros has no effect on CEO salary. State the alternative that better stock market performance increases a CEO’s salary. (2)

(ii) Using the data in CEOSAL1.RAW, the following equation was obtained by OLS (standard errors in parentheses):

log(salary) = 4.32 + .280 log(sales) + .0174 roe + .00024 ros
             (.32)  (.035)            (.0041)     (.00054)
n = 209, R² = .283.

By what percentage is salary predicted to increase if ros increases by 80 points? (2)
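
Because the dependent variable is in logs and ros enters in levels, the approximate percentage change in predicted salary is 100 times the ros coefficient (β3 in the model above) times the change in ros. Plugging in the reported estimate:

```latex
\%\Delta \widehat{salary} \approx 100 \cdot \hat{\beta}_3 \cdot \Delta ros = 100 \times .00024 \times 80 = 1.92
```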

(iii) Test the null hypothesis that ros has no effect on salary against the alternative that ros has a positive effect. Carry out the test at the 5% significance level. (3)
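
A sketch of the one-sided t test for (iii), using only the reported coefficient and standard error on ros. The degrees of freedom are n − k − 1 = 209 − 3 − 1 = 205. With that many, the 5% one-sided critical value is close to the standard normal value 1.645, which the sketch uses as an approximation (the exact t value with 205 degrees of freedom is about 1.65).

```python
def one_sided_t_test(coef, se, crit=1.645):
    """t statistic for H0: beta = 0 against H1: beta > 0; reject if t exceeds crit."""
    t_stat = coef / se
    return t_stat, t_stat > crit


t_stat, reject = one_sided_t_test(0.00024, 0.00054)  # ros coefficient and its standard error
print(round(t_stat, 3), reject)  # 0.444 False
```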

Problem 7 (12 Points) The following model can be used to study whether campaign expenditures affect election outcomes:

voteA = β0 + β1log(expendA) + β2log(expendB) + β3prtystrA + u,

where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A’s party).

(i) In terms of the parameters, state the null hypothesis that a 1% increase in A’s expenditures is offset by a 1% increase in B’s expenditures. (2)

(ii) Estimate a model using the data in VOTE1.RAW that directly gives the t statistic for testing the hypothesis in part (i). What do you conclude? (Use a two-sided alternative.) (10)
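
The standard way to obtain this t statistic directly is to reparameterize so that the tested combination becomes a single coefficient. Assuming the null from (i) is written as H0: β1 + β2 = 0, define θ1 = β1 + β2 and substitute β1 = θ1 − β2:

```latex
\begin{aligned}
voteA &= \beta_0 + (\theta_1 - \beta_2)\log(expendA) + \beta_2 \log(expendB) + \beta_3\, prtystrA + u \\
      &= \beta_0 + \theta_1 \log(expendA) + \beta_2\,[\log(expendB) - \log(expendA)] + \beta_3\, prtystrA + u
\end{aligned}
```

Regressing voteA on log(expendA), the difference log(expendB) − log(expendA), and prtystrA then reports θ̂1 and its standard error directly, so the usual t statistic on log(expendA) tests H0: θ1 = 0.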

Problem 8 (20 Points) Consider a model where the return to education depends on the amount of work experience (and vice versa):

log(wage) = β0 + β1educ + β2exper + β3educ · exper + u.

(i) State the null hypothesis that the return to education does not depend on the level of exper. (1)

(ii) Use the data in WAGE2.RAW to test the hypothesis in (i) against your stated alternative. Carry out the test at a 5% significance level. (3)

(iii) Let θ1 denote the return to education (in decimal form) when exper = 10: θ1 = β1 + 10β3. Obtain θ̂1 and a 95% confidence interval for θ1. (8)
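
The same reparameterization idea gives θ̂1 and its standard error from a single regression. Substituting β1 = θ1 − 10β3 into the model:

```latex
\begin{aligned}
\log(wage) &= \beta_0 + (\theta_1 - 10\beta_3)\, educ + \beta_2\, exper + \beta_3\, educ \cdot exper + u \\
           &= \beta_0 + \theta_1\, educ + \beta_2\, exper + \beta_3\, educ \cdot (exper - 10) + u
\end{aligned}
```

Regressing log(wage) on educ, exper, and educ·(exper − 10) makes the coefficient on educ equal to θ̂1. A 95% confidence interval is then θ̂1 ± c·se(θ̂1), where c is the 97.5th percentile of the t distribution with n − 4 degrees of freedom (about 1.96 when n − 4 is large).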

(iv) Now estimate the model

log(wage) = β0 + β1educ + β2exper + β3tenure + β4married + β5black + β6south + β7urban + u.

Holding other factors fixed, what is the approximate difference in monthly salary between blacks and nonblacks? (2)

(v) Extend the model from part (iv) to allow wages to differ across four groups of people: married and black, married and nonblack, single and black, and single and nonblack. What is the estimated wage differential between married blacks and married nonblacks? (6)
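
One way to allow four groups, sketched under the assumption that the extension adds the interaction marrblck = married·black to the model in (iv); the name marrblck and its coefficient β8 are illustrative. With single nonblacks as the base group, the predicted log wage shifts by β4·married + β5·black + β8·marrblck, so the gap between married blacks and married nonblacks is β5 + β8. The coefficient values below are hypothetical, not estimates.

```python
def group_shift(married, black, b_married, b_black, b_marrblck):
    """Shift in predicted log(wage) relative to the base group (single, nonblack)."""
    marrblck = married * black  # interaction dummy: 1 only for married blacks
    return b_married * married + b_black * black + b_marrblck * marrblck


# Hypothetical coefficients on married, black, and marrblck.
coefs = dict(b_married=0.19, b_black=-0.24, b_marrblck=0.06)
gap = group_shift(1, 1, **coefs) - group_shift(1, 0, **coefs)  # equals b_black + b_marrblck
print(f"approximate differential: {100 * gap:.1f} percent")
```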

Problem 9 (3 Points) Which of the following are consequences of heteroskedasticity?

(i) The OLS estimators, β̂j, are inconsistent.

(ii) The usual F statistic no longer has an F distribution.

(iii) The OLS estimators are no longer BLUE.

Problem 10 (3 Points) Consider a linear model to explain monthly beer consumption:

beer = β0 + β1inc + β2price + β3educ + β4female + u
E(u|inc, price, educ, female) = 0
Var(u|inc, price, educ, female) = σ²·price

Write the transformed equation that has a homoskedastic error term.
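
Since the error variance is σ²·price, dividing every term of the equation, including the intercept, by √price yields an error u/√price with constant variance σ². A minimal sketch of that transformation for one observation; the dict keys follow the variable names in the problem, and the sample row is made up.

```python
import math


def transform_beer_row(row):
    """Divide each term of the beer equation by sqrt(price) (WLS with h = price)."""
    w = 1 / math.sqrt(row["price"])
    return {
        "beer": row["beer"] * w,      # transformed dependent variable
        "const": w,                   # the intercept's regressor 1 becomes 1/sqrt(price)
        "inc": row["inc"] * w,
        "price": row["price"] * w,    # simplifies to sqrt(price)
        "educ": row["educ"] * w,
        "female": row["female"] * w,
    }


print(transform_beer_row({"beer": 10, "inc": 30, "price": 4, "educ": 12, "female": 1}))
```

OLS on the transformed variables, with the `const` column in place of the usual intercept, is weighted least squares and estimates the same β's.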

Problem 11 (12 Points) Use the data in VOTE1.RAW for this problem.

Compute the Breusch-Pagan test for heteroskedasticity in a model with voteA as the dependent variable and prtystrA, democA, log(expendA), and log(expendB) as independent variables. Use the F statistic version. Estimate a regression model that directly gives the F statistic. What are the null and alternative hypotheses in this F test? Is there evidence of heteroskedasticity at the 10% significance level?
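
The F version of the Breusch-Pagan test regresses the squared OLS residuals û² on the same k = 4 regressors and uses the overall F statistic of that auxiliary regression, F = (R²/k) / [(1 − R²)/(n − k − 1)], where R² comes from the auxiliary regression. A sketch of the formula with hypothetical inputs; the function name is illustrative.

```python
def bp_f_stat(r2_aux, n, k):
    """Breusch-Pagan F statistic from the R-squared of the regression of u_hat^2 on k regressors."""
    return (r2_aux / k) / ((1 - r2_aux) / (n - k - 1))


f = bp_f_stat(0.05, 100, 4)  # hypothetical R-squared and sample size
print(round(f, 3))  # 1.25
```

Compare the result with the F(k, n − k − 1) critical value at the 10% level, or read the p-value that regression software reports for the auxiliary regression.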

Problem 12 (7 Points)

Use the data in HPRICE1.RAW for this problem.

Consider the model

log(price) = β0 + β1log(lotsize) + β2log(sqrft) + β3bdrms + u.

Obtain the heteroskedasticity-robust standard errors for this equation. Discuss any important differences from the usual standard errors. What does this suggest about heteroskedasticity in this model?
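
To see what the robust standard errors change, here is a sketch for the simple-regression case, where the usual and the heteroskedasticity-robust (White, HC0) variances of the slope have short closed forms: σ̂²/SSTx versus Σ(xi − x̄)²ûi²/SSTx². It uses toy data and one regressor rather than the three-regressor model above. Some packages (Stata's `robust` option, for example) also apply a degrees-of-freedom adjustment, so their numbers can differ slightly from HC0.

```python
import math


def usual_and_robust_se(x, y):
    """Slope, usual SE, and White (HC0) robust SE for the simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sigma2_hat = sum(r ** 2 for r in resid) / (n - 2)  # valid only under homoskedasticity
    se_usual = math.sqrt(sigma2_hat / sst_x)
    # Robust version weights each squared residual by its squared x deviation.
    var_robust = sum((xi - xbar) ** 2 * r ** 2 for xi, r in zip(x, resid)) / sst_x ** 2
    return b1, se_usual, math.sqrt(var_robust)


b1, se_usual, se_robust = usual_and_robust_se([1, 2, 3, 4], [2, 4, 5, 8])  # toy data
print(b1, se_usual, se_robust)
```

Large gaps between the two sets of standard errors are informal evidence that heteroskedasticity matters for inference; small gaps suggest it is not a major concern.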

Problem 13 (10 Points) Use the data in CEOSAL1.RAW for this problem.

Consider the following model to explain salaries of CEOs in terms of annual firm sales, return on equity (roe), and a dummy variable, rospos, which is equal to one if ros > 0 and equal to zero if ros ≤ 0 (ros: return on the firm’s stock, in percent form):

log(salary) = β0 + β1log(sales) + β2roe + β3rospos + u.

Generate the dummy variable rospos and apply RESET to this model. State the unrestricted test equation, the null hypothesis, and the alternative hypothesis of functional form misspecification. Is there evidence of functional form misspecification?
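
A sketch of the two computational pieces of Problem 13: building rospos from ros, and the RESET F statistic. RESET adds the squared and cubed fitted values from the original regression and tests their joint significance, so q = 2 and the unrestricted model has k = 5 regressors. The F statistic is shown in R-squared form, F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)], with hypothetical R-squared values; the helper names are illustrative.

```python
def make_rospos(ros):
    """1 if ros > 0, 0 if ros <= 0."""
    return [1 if r > 0 else 0 for r in ros]


def f_stat_r2(r2_ur, r2_r, n, k_ur, q):
    """F statistic for q exclusion restrictions, R-squared form (same dependent variable)."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k_ur - 1))


print(make_rospos([-1.5, 0.0, 3.2]))  # [0, 0, 1]
# Hypothetical R-squareds; n = 209 is the CEOSAL1 sample size reported in Problem 6.
f_reset = f_stat_r2(0.32, 0.30, 209, 5, 2)
print(round(f_reset, 3))
```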

Problem 14 (4 Points) Decide whether you agree or disagree with each of the following statements:

(i) Like cross-sectional observations, we can assume that most time series observations are independently distributed.

(ii) The OLS estimator in a time series regression is unbiased under the first three Gauss-Markov assumptions.

(iii) A trending variable cannot be used as the dependent variable in multiple regression analysis.

(iv) Seasonality is not an issue when using annual time series observations.
