Worksheet 6 (§2.7)
Boxplots and Methods for Detecting Outliers
This worksheet continues the discussion begun in Discussion Sheet 6. It covers the material from §2.7 of
the textbook. If you haven’t already done so, complete Discussion Sheet 6 and read §2.7. This discussion
continues with a modified form of the boxplot, the form that we will use from now on. This modified
boxplot will give us yet another way to identify outliers.
Sex Cholesterol
FEM 155
FEM 163
FEM 167
FEM 167
FEM 171
FEM 187
FEM 196
FEM 198
FEM 198
FEM 212
FEM 215
FEM 232
FEM 234
FEM 234
FEM 238
FEM 257
FEM 271
FEM 271
FEM 309
FEM 405
Inner and Outer Fences
A modified form of a boxplot developed by the statistician John Tukey uses what he called "fences". There
are inner fences and outer fences. To find the upper inner fence, take 1.5 times the IQR (the interquartile
range, QU − QL) and add it to the upper quartile. The result, which need not be one of the data values, is
called the upper inner fence:
Upper Inner Fence = QU + 1.5 × IQR
The lower inner fence is found the same way except by subtracting from the lower quartile:
Lower Inner Fence = QL − 1.5 × IQR
There are also what Tukey called "outer fences". These are made by using 3 times the IQR:
Upper Outer Fence = QU + 3 × IQR
Lower Outer Fence = QL − 3 × IQR
These fences are useful in making a modified form of a boxplot and identifying outliers.
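The fence formulas above can be sketched in a few lines of Python. This is only an illustration: it assumes the quartiles are the medians of the lower and upper halves of the sorted data, and the textbook's quartile convention may differ, which would shift the fences slightly. The function names quartiles and tukey_fences are made up for this sketch.

```python
from statistics import median

# Cholesterol values from the table above.
cholesterol = [155, 163, 167, 167, 171, 187, 196, 198, 198, 212,
               215, 232, 234, 234, 238, 257, 271, 271, 309, 405]


def quartiles(values):
    """Return (QL, QU) as medians of the lower and upper halves.

    Assumption: median-of-halves quartiles; with an odd number of
    values the middle value is left out of both halves.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two data values")
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


def tukey_fences(values):
    """Return the four fences as a dictionary."""
    q_l, q_u = quartiles(values)
    iqr = q_u - q_l
    return {
        "lower_outer": q_l - 3 * iqr,
        "lower_inner": q_l - 1.5 * iqr,
        "upper_inner": q_u + 1.5 * iqr,
        "upper_outer": q_u + 3 * iqr,
    }


fences = tukey_fences(cholesterol)
for name, value in fences.items():
    print(name, value)
```

Running the sketch prints the four fences, which can be used to check hand calculations once the quartile convention has been confirmed against the textbook.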
1. Find the upper and lower inner fences for these data. Find the upper and lower outer fences.
Modified Boxplot
The textbook uses what some call a modified boxplot, although Tukey just called it a boxplot or a "box and
whiskers" display. The only difference is that, rather than extending each whisker to the farthest data value:
1. The upper whisker is drawn to the upper inner fence.
2. The lower whisker is drawn to the lower inner fence.
3. Any data values between the upper inner fence and the upper outer fence are marked with a dot.
4. Any data values between the lower inner fence and the lower outer fence are marked with a dot.
5. Any data values above the upper outer fence are marked with an asterisk.
6. Any data values below the lower outer fence are marked with an asterisk.
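The marking rules in this list can be expressed as a small function. This is a sketch, not the textbook's procedure: the name plot_marker is made up, the fence values in the usage lines are hypothetical numbers chosen only for illustration, and a value lying exactly on a fence is treated as inside it, which the rules above do not specify.

```python
def plot_marker(value, lower_outer, lower_inner, upper_inner, upper_outer):
    """Return how a data value is marked in the modified boxplot.

    "asterisk": beyond an outer fence
    "dot":      between an inner fence and the matching outer fence
    "none":     inside the inner fences (no special mark)

    Assumption: a value exactly on a fence counts as inside that fence.
    """
    if value < lower_outer or value > upper_outer:
        return "asterisk"
    if value < lower_inner or value > upper_inner:
        return "dot"
    return "none"


# Hypothetical fences, not computed from the worksheet data:
# outer fences at -5 and 15, inner fences at 0 and 10.
for v in [4, 12, -7, 10]:
    print(v, plot_marker(v, -5, 0, 10, 15))
```

Checking the outer fences first keeps the two conditions simple, since any value beyond an outer fence is also beyond the matching inner fence.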
2. Make a modified boxplot for these data. It is this plot that we will mean whenever we refer to a boxplot
or box and whiskers plot in this course.
Outliers
We now have two methods for determining outliers: the z-score method (item 1) and the fence method
(items 2 and 3).
1. Any data value that has a z-score with an absolute value greater than 3 (that is, any value more than
3 standard deviations above or below the mean) is considered an outlier.
2. Any data point between the inner and outer fences (either upper or lower) Tukey considered a "suspect
outlier".
3. Any data point beyond the outer fences (either upper or lower) Tukey considered a "very suspect
outlier".
Outliers are not necessarily wrong data. However, they must be checked carefully to see whether the data were
recorded incorrectly or entered incorrectly in a computer program. If not, they are just highly unusual values.
Special attention will have to be paid to them in any analyses done with the data.
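The z-score rule (item 1 above) can be sketched as follows. This is a minimal illustration that assumes the sample standard deviation (divisor n − 1) is the one intended; the function name z_score_outliers and the threshold parameter are made up for this sketch.

```python
from statistics import mean, stdev

# Cholesterol values from the table at the start of the worksheet.
cholesterol = [155, 163, 167, 167, 171, 187, 196, 198, 198, 212,
               215, 232, 234, 234, 238, 257, 271, 271, 309, 405]


def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value.

    Assumption: z = (x - mean) / s, where s is the sample standard
    deviation (n - 1 divisor).
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # All values are identical, so no value can be unusual.
        return []
    return [x for x in values if abs((x - m) / s) > threshold]


print(z_score_outliers(cholesterol))
```

Because the cutoff is a strict "greater than 3", a value with a z-score of exactly 3 is not flagged by this sketch.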
3. Identify any outliers that can be identified by the z-score method.
4. Find any suspect outliers and very suspect outliers using the inner and outer fences. How do these
results compare with what you got in #3?
Suggested Homework: 2.39, 2.40, 2.42
Homework for which solutions will be posted: 2.40