ERSH 8320
Lab #11: 11/12/2008
Regression with Multiple Categorical Independent Variables
Curvilinear Regression
In this exercise, we:
1. Use effect coding in an analysis with multiple categorical IVs.
2. Use the SPSS regression curve fitting procedure.
Exercise #1: Multiple categorical IVs in SPSS.
We will run the regression with two categorical IVs that was used as an example in class today. The data
can be found in the car.sav data file. The effect-coded variables are already included in the file.
The data are from Neter (1996, p. 705):
A consumer organization studied the effect of age of automobile owner on size of cash offer for a used
car by utilizing 12 persons in each of three age groups (young, middle, elderly) who acted as the owner
of a used car. A medium price, six-year-old car was selected for the experiment, and the ‘owners’
solicited cash offers for this car from 36 dealers selected at random from dealers in the region.
Randomization was used in assigning the dealers to the ‘owners’. The offers (in hundreds of dollars) can
be found on the class website.
Effect coding in file:
Age group: A1 is for young, A2 is for middle. Elderly is the remainder group (coded -1 on both A1 and A2).
Gender group: G1 is for male. Female is the remainder group (coded -1 on G1).
A1G1 is A1 x G1; A2G1 is A2 x G1. Together, these two products make up the interaction.
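As a minimal sketch of how the five predictors in the file relate to the two original factors, the coding can be written out in Python. The level labels ("young", "male", and so on) and the function name are assumptions made for illustration; car.sav may store the groups under different values.

```python
# Effect codes as described in the handout. The remainder group of each
# factor (elderly, female) is coded -1 on every code for that factor.
# The string labels are hypothetical; car.sav may use other values.
AGE_CODES = {
    "young": (1, 0),
    "middle": (0, 1),
    "elderly": (-1, -1),  # remainder group
}
GENDER_CODES = {
    "male": 1,
    "female": -1,  # remainder group
}


def effect_code(age, gender):
    """Return the predictors A1, A2, G1, A1G1, A2G1 for one case."""
    a1, a2 = AGE_CODES[age]
    g1 = GENDER_CODES[gender]
    # The interaction codes are the products of the main-effect codes.
    return {"A1": a1, "A2": a2, "G1": g1, "A1G1": a1 * g1, "A2G1": a2 * g1}


if __name__ == "__main__":
    for age in AGE_CODES:
        for gender in GENDER_CODES:
            print(age, gender, effect_code(age, gender))
```

Printing all six age-by-gender combinations is a quick way to compare the codes against the columns in the data file.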
First, run the regression model using the regression package:
1. Go to Analyze…Regression…Linear.
2. Put Y in the dependent box.
3. Put A1, A2, G1, A1G1, A2G1 in the independents box.
4. Click OK.
5. What do the following terms mean:
Parameter        Estimated value        Interpretation
Intercept (a)
b1 – A1
b2 – A2
b3 – G1
b4 – A1G1
b5 – A2G1
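One way to check your interpretations: the model with all five codes has as many parameters as there are age-by-gender cells, so its fitted values are the cell means, and each coefficient can be written as a deviation built from those means. The sketch below uses hypothetical cell means (not the car.sav values), and the function names are illustrative only.

```python
# Hypothetical cell means in hundreds of dollars (NOT the car.sav values).
CELL_MEANS = {
    ("young", "male"): 21.0, ("young", "female"): 21.0,
    ("middle", "male"): 30.0, ("middle", "female"): 26.0,
    ("elderly", "male"): 24.0, ("elderly", "female"): 22.0,
}
AGES = ("young", "middle", "elderly")
GENDERS = ("male", "female")


def effect_coefficients(means):
    """Coefficients of the full effect-coded model, computed from cell means."""
    # Intercept: unweighted mean of the six cell means.
    grand = sum(means[(a, g)] for a in AGES for g in GENDERS) / 6
    age_mean = {a: sum(means[(a, g)] for g in GENDERS) / 2 for a in AGES}
    male_mean = sum(means[(a, "male")] for a in AGES) / 3
    b1 = age_mean["young"] - grand   # young vs. overall mean
    b2 = age_mean["middle"] - grand  # middle vs. overall mean
    b3 = male_mean - grand           # male vs. overall mean
    # Interaction: what is left in the cell after the main effects.
    b4 = means[("young", "male")] - grand - b1 - b3
    b5 = means[("middle", "male")] - grand - b2 - b3
    return {"a": grand, "b1": b1, "b2": b2, "b3": b3, "b4": b4, "b5": b5}


def predict(coefs, age, gender):
    """Predicted value for one cell from the effect-coded equation."""
    a1, a2 = {"young": (1, 0), "middle": (0, 1), "elderly": (-1, -1)}[age]
    g1 = 1 if gender == "male" else -1
    return (coefs["a"] + coefs["b1"] * a1 + coefs["b2"] * a2 + coefs["b3"] * g1
            + coefs["b4"] * a1 * g1 + coefs["b5"] * a2 * g1)


if __name__ == "__main__":
    coefs = effect_coefficients(CELL_MEANS)
    print(coefs)
    for cell, mean in CELL_MEANS.items():
        print(cell, mean, predict(coefs, *cell))
```

The predicted value for every cell, including the remainder groups, matches its cell mean, which is why no separate coefficients are needed for elderly or female.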
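Is the interaction significant? How about the main effect of age group? We cannot tell directly from the
regression output, because each of these effects is carried by two coefficients (A1 and A2 for age; A1G1
and A2G1 for the interaction), and the individual t tests do not test the pair jointly. Therefore, we
will use the GLM package next: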
1. Go to Analyze…General Linear Model…Univariate.
2. Put Y in the dependent variable box.
3. Put both Age and Gender (not the effect coded variables) in the fixed factors box.
4. Click OK.
Now, is the interaction significant? How about the main effect of age group?
You should be able to answer these questions using your ANOVA knowledge from ERSH 8310.
Exercise #2: Curvilinear regression.
We wish to decide which regression model fits the data best: a linear model or a curvilinear
model.
From Pedhazur, p. 522:
Suppose that we are interested in the effect of time spent in practice on the performance of a visual
discrimination task. Subjects are randomly assigned to different levels of practice, following which a test
of visual discrimination is administered, and the number of correct responses is recorded for each
subject. As there are six levels, the highest-degree polynomial possible for these data is the fifth. Our
aim, however, is to determine the lowest-degree polynomial that best fits the data.
SPSS has a built-in curvilinear regression procedure that we will use. The data are saved in the file
curve.sav. X is the time variable. Y is the score on the visual discrimination task.
1. Go to Analyze…Regression…Curve Estimation.
a. Put the task score (Y) in the dependents box.
b. Put time (X) in the independent box.
c. Select the linear, quadratic, and cubic functions.
d. Be sure there is a check by “Include constant in equation” (for the intercept).
e. Be sure there is a check by “plot models” (to see visually how each model fits).
f. Press OK.
2. After you receive the output, the next step is to figure out which model fits best. We can think
of this process as fitting a series of nested models, so we can use the change in R² to test
whether adding a term significantly improves the fit. Complete the following boxes:
a. Testing for Quadratic
i. Full model: quadratic
ii. Reduced model: linear
Model        R²        Numerator df
Full (quadratic)
Reduced (linear)
Hypothesis test:
F= =
p-value (use “=fdist(F, num_df, denom_df)” in Excel):
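The hypothesis test above compares two nested models through the change in R². A minimal sketch of that F statistic follows, assuming the standard R²-change formula; the function name and the example numbers are hypothetical and are not taken from curve.sav. Here k is the number of predictors in a model (1 for linear, 2 for quadratic, 3 for cubic) and n is the number of subjects.

```python
def r2_change_f(r2_full, r2_reduced, k_full, k_reduced, n):
    """F test for the R-squared change between two nested regression models.

    F = [(R2_full - R2_reduced) / (k_full - k_reduced)]
        / [(1 - R2_full) / (n - k_full - 1)]
    Returns (F, numerator df, denominator df).
    """
    num_df = k_full - k_reduced
    denom_df = n - k_full - 1
    f = ((r2_full - r2_reduced) / num_df) / ((1.0 - r2_full) / denom_df)
    return f, num_df, denom_df


if __name__ == "__main__":
    # Hypothetical values for illustration only (NOT the curve.sav results).
    f, num_df, denom_df = r2_change_f(
        r2_full=0.75, r2_reduced=0.50, k_full=2, k_reduced=1, n=18
    )
    print(f, num_df, denom_df)
```

The three returned values are the arguments that the Excel formula in the handout expects, in the same order.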
If the quadratic term is significant, we go on to test whether the cubic term is significant as well.
If the quadratic term is not significant, we stop with the linear model.
Let’s test the cubic anyway:
b. Testing for Cubic
i. Full model: Cubic
ii. Reduced model: Quadratic
Model        R²        Numerator df
Full (cubic)
Reduced (quadratic)
Hypothesis test:
F= =
p-value (use “=fdist(F, num_df, denom_df)” in Excel):
Does the graph confirm what you found?
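For readers who want to see what the curve estimation step computes, the sketch below fits linear, quadratic, and cubic polynomials by ordinary least squares and reports R² for each. It is not SPSS's implementation: it solves the normal equations directly using only the Python standard library, and the X and Y values are hypothetical stand-ins for curve.sav.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def polyfit_r2(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, R-squared).

    Coefficients are ordered from the intercept up to the x**degree term.
    """
    cols = [[xi ** p for xi in x] for p in range(degree + 1)]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * yi for u, yi in zip(ci, y)) for ci in cols]
    coefs = solve(xtx, xty)
    fitted = [sum(b * xi ** p for p, b in enumerate(coefs)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - ss_res / ss_tot


# Hypothetical practice times and scores (NOT the curve.sav data).
X = [2, 4, 6, 8, 10, 12]
Y = [4, 7, 13, 16, 19, 20]

if __name__ == "__main__":
    for name, degree in (("linear", 1), ("quadratic", 2), ("cubic", 3)):
        coefs, r2 = polyfit_r2(X, Y, degree)
        print(name, [round(b, 4) for b in coefs], round(r2, 4))
```

Because the models are nested, R² can only stay the same or increase as terms are added, which is why the comparison rests on whether each increase is significant rather than on R² alone.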