The words you are searching are inside this book. To get more targeted content, please make full-text search by clicking here.

Chapter 5 # 1 Scatter Diagrams • Scatter diagrams are used to demonstrate correlation between two quantitative variables. • Often, this correlation is

Discover the best professional documents and content resources in AnyFlip Document Base.
Search
Published by , 2016-02-21 04:45:04

Scatter Diagrams Correlation Classifications

Chapter 5 # 1 Scatter Diagrams • Scatter diagrams are used to demonstrate correlation between two quantitative variables. • Often, this correlation is

Scatter Diagrams Correlation Classifications

• Scatter diagrams are used 19 • Correlation can be Regression Plot
to demonstrate 18 classified into three basic
correlation between two Weight 17 categories Weight 19
quantitative variables. 16 18
15 • Linear 17
• Often, this correlation is 14 16
linear. 13 • Nonlinear 15
12 14
• This means that a straight • No correlation 13
line model can be 20 21 22 23 24 12
developed. 21 22 23 24
Length 20

Length

Chapter 5 # 1 Chapter 5 # 2

Correlation Classifications Correlation Classifications

• Two variables may be Curvilinear Correlation
correlated but not through
a linear model. result 500 • Two quantitative 50
400 variables may not be 40
• This type of model is 300 correlated at all C4 30
called non-linear 200 20
100 5 10 15 10
• The model might be one sample
of a curve. 0 0
0 -10
-20
Chapter 5 # 3 -30
-40

0 5 10 15 20 25
Animalno

Chapter 5 # 4

Linear Correlation Linear Correlation

• Variables that are Regression Plot
correlated through a
linear relationship can Weight 19 • Negatively correlated Regression Plot
display either positive 18 variables vary as
or negative correlation 17 opposites 4
16
• Positively correlated 15 • As the value of one Student GPA 3
variables vary directly. 14 variable increases the
13 other decreases
12
21 22 23 24
20
Length

2 10 20 30 40
0
Hours Worked Chapter 5 # 6

Chapter 5 # 5

Strength of Correlation Strength of Correlation

• Correlation may be strong, Regression Plot • When the data is Regression Plot
moderate, or weak. distributed quite close
to the line the
• You can estimate the Student GPA 4 correlation is said to Final Exam Score 95
strength be observing the 3 be strong 90
variation of the points 85
around the line • The correlation type is 80
independent of the 75
• Large variation is weak strength. 70
correlation 65
60
2 55 65 75 85 95
0 50
Midterm Stats Grade
55

10 20 30 40

Hours Worked

Chapter 5 # 7 Chapter 5 # 8

The Correlation Coefficient Interpreting r

• The strength of a linear relationship is measured • The sign of the correlation coefficient tells us the
by the correlation coefficient direction of the linear relationship

• The sample correlation coefficient is given the  If r is negative (<0) the correlation is negative. The
symbol “r” line slopes down

• The population correlation coefficient has the  If r is positive (> 0) the correlation is positive. The
symbol “ρ”. line slopes up

Chapter 5 # 9 Chapter 5 # 10

Interpreting r Cautions

• The size (magnitude) of the correlation • The correlation coefficient only gives us an
coefficient tells us the strength of a linear indication about the strength of a linear
relationship relationship.

 If | r | > 0.90 implies a strong linear association • Two variables may have a strong curvilinear
 For 0.65 < | r | < 0.90 implies a moderate linear relationship, but they could have a “weak” value
for r
association
 For | r | < 0.65 this is a weak linear association Chapter 5 # 12

Chapter 5 # 11

Fundamental Rule of Correlation Setting

• Correlation DOES NOT imply causation • A chemical engineer would like to determine if a
relationship exists between the extrusion
– Just because two variables are highly correlated does temperature and the strength of a certain
not mean that the explanatory variable “causes” the formulation of plastic. She oversees the
response production of 15 batches of plastic at various
temperatures and records the strength results.
• Recall the discussion about the correlation
between sexual assaults and ice cream cone sales Chapter 5 # 14

Chapter 5 # 13 The Experimental Data

The Study Variables Temp 120 125 130 135 140
Str 18 22 28 31 36
• The two variables of interest in this study are the strength Temp 145 150 155 160 165
of the plastic and the extrusion temperature. Str 40 47 50 52 58

• The independent variable is extrusion temp. This is the Chapter 5 # 16
variable over which the experimenter has control. She
can set this at whatever level she sees as appropriate.

• The response variable is strength. The value of “strength”
is thought to be “dependent on” temperature.

Chapter 5 # 15

The Scatter Plot

• The scatter diagram for Scatter diagram of Strength vs Temperature Conclusions by Inspection
the temperature versus 60
strength data allows us to • Does there appear to be a relationship between the
deduce the nature of the Strength (psi) 50 study variables?
relationship between these
two variables 40 • Classify the relationship as: Linear, curvilinear, no
relationship
30
• Classify the correlation as positive, negative, or no
20 130 140 150 160 170 correlation
120 Temperature (F)
• Classify the strength of the correlation as strong,
What can we conclude simply from the scatter diagram? moderate, weak, or none

Chapter 5 # 17 Chapter 5 # 18

Computing r Computing r

r = n 1 1 Σ x−x  y −y  r = n 1 1 Σ[(Z x )(Z y )]
− sx sy −

df z-scores z-scores
for y data
for x data
Chapter 5 # 19
Chapter 5 # 20

Computing r - Example Classifying the strength of linear
correlation
See example handout for the plastic strength versus
extrusion temperature setting •The strength of a linear correlation between the response
and the explanatory variable can be assigned based on r
Chapter 5 # 21
These classifications are discipline dependent
Classifying the strength of linear
correlation Chapter 5 # 22

For this class the following criteria are adopted: Scatter Diagrams and Statistical
If |r| > 0.90 then the correlation is strong Modeling and Regression
If |r| < 0.65 then the correlation is weak
If 0.65 < |r| < 0.90 then the correlation is • We’ve already seen that the best graphic for
illustrating the relation between two quantitative
moderate variables is a scatter diagram. We’d like to take
this concept a step farther and, actually develop a
Chapter 5 # 23 mathematical model for the relationship between
two quantitative variables

Chapter 5 # 24

The Line of Best Fit Plot Using the Line of Best Fit to Make
Predictions

• Since the data appears to Strength 60 • Based on this graphical Strength 60
be linearly related we can model, what is the 50
find a straight line model 50 predicted strength for 40
that fits the data better plastic that has been 30
than all other possible 40 extruded at 142 degrees? 20
straight line models. 30
120 130 140 150 160 170
• This is the Line of Best 20 Temp
Fit (LOBF) 120 130 140 150 160 170
Temp Chapter 5 # 26

Chapter 5 # 25

Using the Line of Best Fit to Make Using the Line of Best Fit to Make
Predictions Predictions

• Given a value for the • Based on this graphical Strength 60
predictor variable, determine model, at what
the corresponding value of temperature would I need 50
the dependent variable to extrude the plastic in
graphically. order to achieve a strength 40
of 45 psi? 30
• Based on this model we
would predict a strength of 20
appx. 39 psi for plastic 120 130 140 150 160 170
extruded at 142 F
Temp
Chapter 5 # 27
Chapter 5 # 28

Using the Line of Best Fit to Make Computing the LSR model
Predictions
• Given a LSR line for bivariate data, we can use that
• Locate 45 on the response line to make predictions.
axis (y-axis)
• How do we come up with the best linear model
• Draw a horizontal line to from all possible models?
the LOBF
Chapter 5 # 30
• Drop a vertical line down
to the independent axis The straight line model

• The intercepted value is • Any straight line is completely defined by two
the temp. required to parameters:
achieve a strength of 45
psi  The slope – steepness either positive or negative
 The y-intercept – this is where the graph crosses the
Chapter 5 # 29
vertical axis
Bivariate data and the sample linear
regression model Chapter 5 # 32

• For example, look at
the fitted line plot of
powerboat
registrations and the
number of manatees
killed.

• It appears that a linear
model would be a
good one.

yˆ = b o + b1 x
Chapter 5 # 31

The Parameter Estimators Calculating the Parameter Estimators

• In our model “b0” is the estimator for the • The equation for the LOBF is:
intercept. The true value for this parameter is β0
yˆ = bo + b1x
• “b1” estimates the slope. The true value for this
parameter is β1 Chapter 5 # 34

Chapter 5 # 33 Computing the Intercept Estimator

Calculating the Parameter Estimators • The intercept estimator is computed from the
variable means and the slope:
• To get the slope estimator we use:
b0 = y − b1x
= n Σ (x y )− Σ x ⋅ Σ y
n Σ x 2 − (Σ x )2 • Realize that both the slope and intercept
( )b1 estimated in these last two slides are really point
estimates for the true slope and y-intercept
or
Chapter 5 # 36
b1 = r  sy 
sx

Chapter 5 # 35

Revisit the manatee example Computing the estimators

Look at the summary statistics and correlation So the slope is:
coefficient data from the manatee example

Variable N Mean SEMean StDev b1 = r  sy 
Boats 10 74.10 2.06 6.51 sx
Deaths 10 55.80 5.08 16.05
b1 = 0.921 16.05  = 2.27
Minitab correlation coefficient output  6.51 
Correlations: Boats, Deaths
Pearson correlation of Boats and Deaths = 0.921
P-Value = 0.000

Chapter 5 # 37 Chapter 5 # 38

Computing the estimators Put it together

And the intercept is calculated using the slope • In general terms any old linear regression
information along with the variable means: equation is:
response = intercept + slope(predictor)
b0 = y − b1x
• Specifically for the manatee example the sample
= 55.8 − 2.27 (74.1) regression equation is:
Deaths = -112.7 + 2.27(boats)
= −112.4
Chapter 5 # 40
Chapter 5 # 39

The slope estimate The slope estimate

• b1 is the estimated slope of the line • In our example the estimated slope is 2.27
• The interpretation of the slope is, “The amount of change • This is interpreted as, “For each additional 10,000 boats

in the response for every one unit change in the registered, an additional 2.27 more manatees are killed
independent variable.”

Chapter 5 # 41 Chapter 5 # 42

The intercept estimate The intercept Estimate

• Recall the sample regression model: • Sometimes this value is meaningful. For example
resting metabolic rate versus ambient temperature in
“b0” is the estimated y- intercept Centigrade (oC)

yˆ = b0 + b1x • Sometimes it’s not meaningful at all.
• This is an example where the y-intercept just serves to
The interpretation of the y-intercept is, “The
value of the response when the control (or make the model fit better. There can be no such thing as
independent) variable has a value of 0.” a –112.7 manatees killed

Chapter 5 # 43 Chapter 5 # 44

Regression Output Regression Output

Use the minitab regression output for the • The sample regression equation is:
manatee example to predict the expected ManateesKilled = -112.7 + 2.27(boats)
number of manatees killed when the number of
power boat registrations is 750,000 (x = 75) • So:
ManateesKilled = -112.7 + 2.27(75) = 57.6
Chapter 5 # 45
• This means that we expect between 57 and 58
Regression Output manatees killed in a year where 750,000 power
boats are registered.
Use the minitab regression output for the
manatee example to predict the expected Chapter 5 # 46
number of manatees killed when the number of
power boat registrations is 850,000 (x = 85) Regression Output

Chapter 5 # 47 • The sample regression equation is:
ManateesKilled = -112.7 + 2.27(boats)

• So:
ManateesKilled = -112.7 + 2.27(85) = 80.25

• This means that we expect between 80 and 81
manatees killed in a year where 750,000 power
boats are registered.

Chapter 5 # 48

Regression Output Cardinal Rule of Regression

STOP!! YOU HAVE VIOLATED • NEVER NEVER NEVER NEVER NEVER NEVER
THE CARDINAL RULE OF REGRESSION predict a response value from a predictor value that is
outside of the experimental range.
Chapter 5 # 49
• The only predictions we can make (statistically) are
predictions for responses where powerboat registrations
are between 670,000 and 840,000.

• This means that our prediction for the year when 850,000
powerboats were registered is garbage

Chapter 5 # 50

Regression Estimates The r2 Value
The coefficient of determination
• If r2 is, say, 0.857 we can conclude that 85.7% of the
• r2 is called the coefficient of determination. variability in the response is explained by the
• r2 is a proportion, so it is a number between 0 and 1 variability in the independent variable.

inclusive. • This leaves 100 - 85.7 = 14.3% left unexplained. It’s
• r2 quantifies the amount of variation in the response that is only the unexplained variation that is incorporated into
the “uncertainty”
due to the variability in the predictor variable.
• r2 values close to 0 mean that our estimated model is a Chapter 5 # 52

poor one while values close to 1 imply that our model
does a great job explaining the variation

Chapter 5 # 51

Scatter of Points and r2

r2 and the correlation coefficient r2 = 0.848 r2 = 0.992

• r2 is related to the correlation coefficient Chapter 5 # 54
• It’s just the square of r
• The interpretation as the proportion of variation in the

response that is explained by the variation in the
predictor variable makes it an important statistic

Chapter 5 # 53


Click to View FlipBook Version