DATA MANAGEMENT AND ANALYSIS

We will continue looking at the mean and median of data sets in this unit, and we will add the concepts of percentile and the related quartile that will help us understand our sets of data and how they are distributed. These concepts will use what we know about percentages to position an observation relative to other observations in the data set.

At the end of this unit students will be able to:
1. Explain the meaning of percentile and predict the number of data elements above or below a given percentile of a data set.
2. Explain the relationships among percentiles, quartiles and the median of a data set.
3. Describe a data set distribution and represent it graphically by comparing the mean, median and various percentiles of a data set. Then, draw conclusions regarding the distribution.
4. Compute percentiles and quartiles using formulas and a calculator or functions and spreadsheets.
5. Make appropriate use of spreadsheet software to assist in descriptions and comparisons of distributions of data sets.
Percentiles are in everyday use; they are commonly used to report where a score stands relative to the other scores on a test. Percentiles are closely related to percentages. In fact, they are divider marks for percentages of the data. The median of a data set may also be called the 50th percentile, as it roughly divides the data into two groups that each have 50% of the data.

Example 1: Hamda scored at the 75th percentile on a standardized test. How good was her score in relation to the other students taking the test?
Hamda scored higher than 75% of the students taking the test, and 25% of the students scored higher than or equal to Hamda's score.
In general, we can say approximately 75% of the students scored less than Hamda's score, and approximately (100 - 75)% = 25% of the students scored more than Hamda's score.

Example 2: Ahmad scored at the 60th percentile in the math test. What does this mean?
A. Ahmad scored 60 out of 100 in the math test.
B. Ahmad performed better than 60% of the students in the math test.
The correct answer is B. In general, we can say approximately 60% of the students scored less than Ahmad's score, and approximately (100 - 60)% = 40% of the students scored more than Ahmad's score.

In general, we say that a number is the kth percentile if it is larger than k% of the data.

A percentile can be thought of as dividing the data set values into two separate groups: approximately k% of the values below it and approximately (100 - k)% above it.
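The "two groups" idea can be turned into a quick count. The sketch below is only an illustration: the function name `split_by_percentile` and the class size of 200 are assumptions made for this example and do not come from the unit.

```python
def split_by_percentile(n, k):
    """Approximate how many of n values lie below and above the k-th percentile."""
    below = round(n * k / 100)   # about k% of the values
    above = n - below            # about (100 - k)% of the values
    return below, above


# Hypothetical class size, chosen only for illustration.
students = 200
print(split_by_percentile(students, 75))  # (150, 50)
print(split_by_percentile(students, 60))  # (120, 80)
```

The counts are approximate because a percentile rarely splits a real data set into exact percentages.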
Practice
1. Suppose the 15th percentile of the data set is 71.75. Approximately what percent of the data is less than 71.75? _________________________________________________
2. Suppose the 15th percentile of the data set is 71.75. Approximately what percent of the data is more than 71.75? _________________________________________________

➤ How to calculate percentile rank (position)?
In this course we will use the following formula to find the percentile rank, where P is the percentile and n is the number of data values:

$\text{Rank} = \frac{P}{100}(n + 1)$

Example 3: Consider the data shown.

Data   5 20 22 24 24 26 28 30 30 32 32 35 36 36 37 37 37 38 40 40 41 50 52
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

I. Find the value that corresponds to the 50th percentile (the median).
➤ The data is written in order from smallest to largest.
➤ There are 23 numbers, and the index (position) of each number is also shown.
➤ Use the formula to find the rank (position), with n = 23 and P = 50:

$\text{Rank} = \frac{P}{100}(n + 1) = \frac{50}{100}(23 + 1) = 12$

➤ The value is in the 12th position. The median (the 50th percentile) is 35.
II. Find the value that corresponds to the 40th percentile.
➤ The data is written in order from smallest to largest.
➤ There are 23 numbers, and the index (position) of each number is also shown.
➤ Use the formula to find the rank (position), with n = 23 and P = 40:

$\text{Rank} = \frac{P}{100}(n + 1) = \frac{40}{100}(23 + 1) = 9.6$

➤ Write the number 9.6 as 9 + 0.6, take the 9th number (which is 30), and then add 0.6 times the distance to the 10th number (which is 32).

Data   5 20 22 24 24 26 28 30 30 32 32 35 36 36 37 37 37 38 40 40 41 50 52
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

$30 + 0.6 \times (32 - 30) = 30 + 0.6(2) = 30 + 1.2 = 31.2$

The 40th percentile is the number 31.2.

Practice
Consider the data shown.

Data  10 10 25 26 26 28 32 32 32 33 36 37 40 41 41 42 42 42 43 45 55 60 61 64 75
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

I. Find the median using the formula.
II. Find the 25th percentile.
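The rank formula and the interpolation step from Example 3 can be sketched in Python. This is a minimal sketch, not the course's official tool: the function name `percentile_value` is an assumption, and clamping to the smallest or largest value when the rank falls outside the data is a choice made here that the unit does not discuss.

```python
def percentile_value(sorted_data, p):
    """Value at the p-th percentile, using Rank = p/100 * (n + 1)."""
    n = len(sorted_data)
    # Split the rank into its whole part and its fractional part.
    whole, remainder = divmod(p * (n + 1), 100)
    whole = int(whole)
    fraction = remainder / 100
    # Assumption: clamp ranks that fall outside positions 1..n.
    if whole < 1:
        return sorted_data[0]
    if whole >= n:
        return sorted_data[-1]
    lower = sorted_data[whole - 1]   # value at position `whole` (1-based)
    upper = sorted_data[whole]       # value at the next position
    return lower + fraction * (upper - lower)


# Data from Example 3, already ordered from smallest to largest.
data = [5, 20, 22, 24, 24, 26, 28, 30, 30, 32, 32, 35,
        36, 36, 37, 37, 37, 38, 40, 40, 41, 50, 52]

print(percentile_value(data, 50))            # rank 12, the median
print(round(percentile_value(data, 40), 2))  # rank 9.6, between 30 and 32
```

For the Example 3 data this reproduces the median of 35 and the 40th percentile of 31.2 found by hand.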
III. Find the 75th percentile.
IV. Find the 30th percentile.

The three quartiles that divide a data set into four quarters are:
➤ The 25th percentile is called the first quartile, and is written: Q1
➤ The 50th percentile is called the second quartile, and is written: Q2
➤ The 75th percentile is called the third quartile, and is written: Q3

Quartiles are measures of position and they divide the data into quarters. The median is the second quartile, Q2.
➤ How to find the quartiles?

Example 1: Consider the data below.

Data  10 10 25 26 26 28 32 32 32 33 36 37 40 41 41 42 42 42 43 45 55 60 61 64 75
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

I. Find the median, Q2 (n = 25, P = 50).

$\text{Rank} = \frac{P}{100}(n + 1) = \frac{50}{100}(25 + 1) = 13$

Q2 is in position 13, so $Q_2 = 40$.

II. Find Q1 (n = 25, P = 25).

$\text{Rank} = \frac{P}{100}(n + 1) = \frac{25}{100}(25 + 1) = 6.5$

Q1 is in position 6.5, so $Q_1 = 28 + 0.5(32 - 28) = 30$.

III. Find Q3 (n = 25, P = 75).

$\text{Rank} = \frac{P}{100}(n + 1) = \frac{75}{100}(25 + 1) = 19.5$

Q3 is in position 19.5, so $Q_3 = 43 + 0.5(45 - 43) = 44$.
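As a cross-check on the hand calculation, Python's standard library offers `statistics.quantiles`. As far as we understand its documentation, the `"exclusive"` method places the quartiles using the same (n + 1) position rule as the formula in this unit. The sketch below assumes Python 3.8 or later.

```python
from statistics import quantiles

# Data from Example 1, already ordered from smallest to largest.
data = [10, 10, 25, 26, 26, 28, 32, 32, 32, 33, 36, 37, 40,
        41, 41, 42, 42, 42, 43, 45, 55, 60, 61, 64, 75]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)
```

Other tools may use a different position rule by default, so their quartiles can differ slightly from the ones computed with this unit's formula.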
OR: Note that we could have used the following:
• The 25th percentile (Q1) is the median of the lower half of the data.
• The 75th percentile (Q3) is the median of the upper half of the data.

Data  10 10 25 26 26 28 32 32 32 33 36 37 40 41 41 42 42 42 43 45 55 60 61 64 75
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

The median of the lower half = 30
Median = 40
The median of the upper half = 44

Practice
Find the median (Q2), the 25th percentile (Q1), and the 75th percentile (Q3) of the data shown.
10 33 35 35 37 38 42 44 45 49 50 53 55 57 58

Note: When n is an odd number, you will get the same results if you use the formula $\text{Rank} = \frac{P}{100}(n + 1)$ to find the quartiles.
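The median-of-halves shortcut can be sketched as follows. The function name `quartiles_by_halves` is an assumption for illustration, and leaving the middle value out of both halves when n is odd is the convention assumed here, since it is the one that matches the 25-value example.

```python
from statistics import median


def quartiles_by_halves(sorted_data):
    """Return (Q1, Q2, Q3) using the median of each half of the data."""
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]
    # For odd n the middle value is left out of both halves (assumed convention).
    upper = sorted_data[half + 1:] if n % 2 else sorted_data[half:]
    return median(lower), median(sorted_data), median(upper)


# The 25-value data set used in the quartile example.
data = [10, 10, 25, 26, 26, 28, 32, 32, 32, 33, 36, 37, 40,
        41, 41, 42, 42, 42, 43, 45, 55, 60, 61, 64, 75]

print(quartiles_by_halves(data))
```

For this data set the shortcut gives 30, 40 and 44, the same as the rank formula. When n is even the two methods can give slightly different quartiles, so it is worth stating which one is being used.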
82 DATA MANAGEMENT AND ANALYSIS Example 2: The scores of 15 students in a math test are ordered from smallest to largest. Practice 1. Approximately what percent of the data is greater than the second quartile? 2. Approximately what percent of the data is less than the second quartile? 3. Approximately what percent of the data is between and ? 4. Approximately what percent of the data is between and ? 5. Approximately what percent of the data is between and ? 6. Approximately what percent of the data is less than ? 7. Approximately what percent of the data is more than ? Quartiles divide the data into quarters.
A five-number summary consists of the smallest data value (minimum), the first quartile, the median, the third quartile and the largest data value (maximum).

The Interquartile Range (IQR) is the range of the middle 50% of values when ordered from smallest to largest. It is the distance from Q1 to Q3. The IQR is calculated by finding the difference between the third quartile (Q3) and the first quartile (Q1):

$IQR = Q_3 - Q_1$

Example: For the given five-number summary table, find the Interquartile Range (IQR).

Minimum | Q1 | Median | Q3 | Maximum
7 | 29 | 32 | 37 | 55

The interquartile range is $IQR = Q_3 - Q_1 = 37 - 29 = 8$.
Practice
1. For the given five-number summary table, find the Interquartile Range (IQR).

Minimum | Q1 | Median | Q3 | Maximum
1,278 | 2,876 | 3,569 | 5,489 | 7,755

2. Your instructor has the midterm exam grades. He told you that the 25th percentile (Q1) is 50, and that the interquartile range of the grades (IQR) is 24. Find the 75th percentile (Q3) of the midterm exam grades.
3. Sarah has a data set. She told you that the 75th percentile (Q3) is 92 and that the interquartile range of the data (IQR) is 36. Find the 25th percentile (Q1) of the data.

An outlier is a data value that lies outside the overall pattern of a distribution. In other words, an outlier is an unusual observation in a data set, either much larger or much smaller than most of the other data values.

Why is it important to find the outliers of any given data?
1. They might be inaccurate data values.
2. They can indicate a remarkable occurrence.
3. They can heavily influence the values of some summary statistics, like the mean, range and the standard deviation.
➤ Detecting Outliers:
Outliers are typically detected by multiplying the IQR by 1.5 and then checking whether any data values lie more than that distance below Q1 or above Q3. By calculating $Q_1 - (1.5 \times IQR)$ and $Q_3 + (1.5 \times IQR)$, you are determining lower and upper limits (fences) for the data. Any value outside these limits is an outlier.

Lower Fence: $Q_1 - (1.5 \times IQR)$
Upper Fence: $Q_3 + (1.5 \times IQR)$

Example: Consider the following data: 45, 48, 72, 77, 80, 81, 82, 83, 83, 84, 84, 85, 87, 94, 105.

Minimum | Q1 | Median | Q3 | Maximum
45 | 77 | 83 | 85 | 105

a) Find the Interquartile Range (IQR).
$IQR = Q_3 - Q_1 = 85 - 77 = 8$

b) Are there any outliers? If so, what are they?
To find if there are any outliers, we follow the steps below:
➤ Calculate the Lower Fence: $Q_1 - 1.5 \times IQR = 77 - 1.5 \times 8 = 77 - 12 = 65$
➤ Calculate the Upper Fence: $Q_3 + 1.5 \times IQR = 85 + 1.5 \times 8 = 85 + 12 = 97$
➤ Check which numbers in the given data are below the lower fence or above the upper fence.
The numbers 45 and 48 are below the lower fence (65), and the number 105 is above the upper fence (97). Hence the dataset has 3 outliers, which are 45, 48 and 105.

Practice
1. Given the following data: 32, 65, 68, 72, 75, 80, 81, 82, 84, 84, 84, 86, 87, 94, 97.
The five-number summary table is:

Minimum | Q1 | Median | Q3 | Maximum
32 | 72 | 82 | 86 | 97

a) Find the Interquartile Range (IQR).
b) Are there any outliers? If so, what are they?

2. Given the following data: 65, 68, 69, 72, 79, 80, 81, 82, 86, 87, 88, 89, 91, 98, 102.
The five-number summary table is:

Minimum | Q1 | Median | Q3 | Maximum
65 | 72 | 82 | 89 | 102

a) Find the Interquartile Range (IQR).
b) Are there any outliers? If so, what are they?

Note: In any given dataset, if there is no number below the lower fence or above the upper fence, then the dataset does not contain any outlier.
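The fence rule from the worked example (data 45, 48, ..., 105 with Q1 = 77 and Q3 = 85) can be sketched in a few lines of Python. The function names `fences` and `find_outliers` are assumptions for illustration; the quartiles are taken as given from the five-number summary instead of being computed.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Worked example data, with Q1 = 77 and Q3 = 85 from its five-number summary.
data = [45, 48, 72, 77, 80, 81, 82, 83, 83, 84, 84, 85, 87, 94, 105]

print(fences(77, 85))               # (65.0, 97.0)
print(find_outliers(data, 77, 85))  # [45, 48, 105]
```

An empty list from `find_outliers` corresponds to the note above: no value lies outside the fences, so the data set has no outliers.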
3. Given the following data: 61, 70, 71, 74, 75, 79, 81, 82, 83, 84, 84, 88, 88, 94, 123.
The five-number summary table is:

Minimum | Q1 | Median | Q3 | Maximum
61 | 74 | 82 | 88 | 123

a) Find the Interquartile Range (IQR).
b) Are there any outliers? If so, what are they?

A Box-and-Whisker plot, also known as a Boxplot, is another way to graph a data distribution. It helps you see where data is concentrated, and it helps determine if the extreme values are outliers.

➤ Boxplot parts:
1. The box goes from Q1 to Q3, with a vertical line inside it for the median.
2. The left (lower) whisker extends from Q1 to the smallest data value that is not an outlier.
3. The right (upper) whisker extends from Q3 to the largest data value that is not an outlier.
4. The outliers are separated from the whiskers and are marked by star, dot, or cross symbols.

Boxplots visually show the distribution and skewness of numerical data by displaying the five-number summary of the data.
➤ Steps to construct a Boxplot:
In this example we will go through the steps needed to construct a Boxplot.

Example 1: The table below shows the amount of sodium in 20 different cereal products:

Data     0  50  70 100 130 140 140 150 160 180 180 180 190 200 200 210 210 220 290 340
Index    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20

➤ Step 1: Find the five-number summary

Q1: $\frac{25}{100} \times (20 + 1) = 5.25$; the 5th number is 130 and the 6th number is 140.
$Q_1 = 130 + 0.25(140 - 130) = 132.5$

Median: $\frac{50}{100} \times (20 + 1) = 10.5$; the 10th number is 180 and the 11th number is 180.
$\text{Median} = 180 + 0.5(180 - 180) = 180$

Q3: $\frac{75}{100} \times (20 + 1) = 15.75$; the 15th number is 200 and the 16th number is 210.
$Q_3 = 200 + 0.75(210 - 200) = 207.5$

Min | Q1 | Median | Q3 | Max
0 | 132.5 | 180 | 207.5 | 340

Key points:
A Boxplot is a chart of the five-number summary of the given data. The Boxplot is unique because it features the following:
• The horizontal axis covers all data values.
• The box covers the middle 50% of the data values.
• The median, drawn as a vertical line inside the box, represents the center.
• Each whisker covers 25% of the data values.
  1. The lower whisker covers the lower 25% of the values.
  2. The upper whisker covers the upper 25% of the values.
• Outliers are indicated on a Box-and-Whisker plot by the "star" symbol.
➤ Step 2: Draw the fences
The fences are located $1.5 \times IQR$ below Q1 and above Q3 respectively.

$IQR = Q_3 - Q_1 = 207.5 - 132.5 = 75$
Left (Lower) Fence: $Q_1 - 1.5 \times IQR = 132.5 - 1.5 \times 75 = 20$
Right (Upper) Fence: $Q_3 + 1.5 \times IQR = 207.5 + 1.5 \times 75 = 320$

➤ Step 3: Identify the outliers
Is there any value below 20? Yes; 0 is an outlier.
Is there any value above 320? Yes; 340 is an outlier.
➤ Step 4: Draw the whiskers and the outliers
• The right whisker is a line extended from Q3 to the largest number that is not an outlier. In our example, from 207.5 to 290.
• The left whisker is a line extended from Q1 to the smallest number that is not an outlier. In our example, from 132.5 to 50.
• Mark the outliers as x outside the fences. In our example 0 and 340 are outliers.

[Figure: Boxplot of the sodium data, with the fences at 20 and 320 and the whiskers labelled.]

Note:
• Approximately half of the data values of the Boxplot are between 132.5 and 207.5 (between Q1 and Q3).
• The median (or middle of the data) is at 180.
• The fences are at 20 and at 320, and any value outside that range is an outlier.
• There are two outliers in this data: 0 and 340.
• The whiskers are left from Q1 to 50, and right from Q3 to 290.
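The four construction steps can be collected into one short Python sketch that produces the numbers needed to draw the plot by hand. The names `percentile_value` and `boxplot_summary` are assumptions for illustration, and the sketch assumes the rank falls strictly between positions 1 and n, which holds for the quartiles of this data set.

```python
def percentile_value(sorted_data, p):
    """Value at the p-th percentile, using Rank = p/100 * (n + 1)."""
    n = len(sorted_data)
    whole, remainder = divmod(p * (n + 1), 100)
    whole = int(whole)
    # Assumes 1 <= whole < n, which is true for the quartiles used below.
    lower, upper = sorted_data[whole - 1], sorted_data[whole]
    return lower + (remainder / 100) * (upper - lower)


def boxplot_summary(sorted_data):
    """Steps 1 to 4: quartiles, fences, outliers and whisker ends."""
    q1 = percentile_value(sorted_data, 25)
    med = percentile_value(sorted_data, 50)
    q3 = percentile_value(sorted_data, 75)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in sorted_data if low_fence <= x <= high_fence]
    outliers = [x for x in sorted_data if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),  # ends of the two whiskers
        "outliers": outliers,
    }


# Sodium data from Example 1, already ordered.
sodium = [0, 50, 70, 100, 130, 140, 140, 150, 160, 180,
          180, 180, 190, 200, 200, 210, 210, 220, 290, 340]

for name, value in boxplot_summary(sodium).items():
    print(name, value)
```

For the sodium data this gives quartiles 132.5, 180 and 207.5, fences at 20 and 320, whisker ends at 50 and 290, and outliers 0 and 340, in line with the steps above.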
Example 2: The grades of a group of 15 students on a Math test are as follows:
84 89 66 82 76 79 72 98 75 80 76 55 77 68 69
Find the five-number summary and construct the Boxplot for the given data.

➤ To find the five-number summary we must arrange the numbers in ascending order:

Data  55 66 68 69 72 75 76 76 77 79 80 82 84 89 98
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Five-number summary:
Q1: $\frac{25}{100}(15 + 1) = 4$, so $Q_1 = 69$
Median: $\frac{50}{100}(15 + 1) = 8$, so Median $= 76$
Q3: $\frac{75}{100}(15 + 1) = 12$, so $Q_3 = 82$

Min | Q1 | Median | Q3 | Max
55 | 69 | 76 | 82 | 98

➤ IQR = 82 - 69 = 13
➤ Fences:
Lower Fence: $69 - 1.5 \times 13 = 49.5$
Upper Fence: $82 + 1.5 \times 13 = 101.5$
➤ Outliers: No outliers

[Figure: Boxplot of the grades, labelled Min, Q1, Median, Q3 and Max.]
Practice
Draw the Box-and-Whisker plot for the following data: 1, 5, 7, 8, 10, 12, 13, 15, 19, 34

Min | Q1 | Median | Q3 | Max | IQR | Upper fence | Lower fence | Outliers
____ | ____ | ____ | ____ | ____ | ____ | ____ | ____ | ____

➤ Reading and analyzing Box-and-Whisker Plots:
Box-and-Whisker plots are an excellent way to visualize the distribution of a data set.

What information can we extract from a Boxplot?
• Key numerical values such as the five-number summary: Minimum, First Quartile, Median, Third Quartile, and Maximum.
• The values of the outliers, if there are any.
• Whether the data are spread out or clustered together.
• Whether the data is symmetrical or skewed, and the direction of the skewness:
  ➤ When the median is in the middle of the box, and the whiskers are about the same length on both sides of the box, the distribution is symmetric.
  ➤ When the median is closer to the bottom of the box, and the whisker is shorter on the lower end of the box, the distribution is positively skewed (skewed right).
  ➤ When the median is closer to the top of the box, and the whisker is shorter on the upper end of the box, the distribution is negatively skewed (skewed left).

Example 1: The following Boxplot represents the heights of 40 students in a statistics class.

[Figure: Boxplot of the heights, with the Minimum, First Quartile, Median, Third Quartile and Maximum marked and 25% of the data in each of the four sections.]

• Minimum value = 59
• Q1, first quartile = 64.5
• Q2, second quartile or median = 66
• Q3, third quartile = 70
• Maximum value = 77
Practice
1) For the given Box-and-Whisker plot, answer the following questions:
a) Approximately what percentage of data is less than the first quartile? _______
b) Approximately what percentage of data is greater than the third quartile? _________
c) Approximately what percentage of data is less than the second quartile? ___________
d) Approximately what percentage of data is between the first quartile and the maximum value? __________
e) Approximately what percentage of data is between the minimum value and the second quartile? _____________

Key points:
• Each quarter has approximately 25% of the data.
• The median (second quartile) marks the mid-point of the data, and this is shown by the vertical line that divides the box into two parts. In this example, the median is 66.
• The "box" represents the middle 50% of the data.
• The spreads of the four quarters are 64.5 - 59 = 5.5 (first quarter), 66 - 64.5 = 1.5 (second quarter), 70 - 66 = 4 (third quarter), and 77 - 70 = 7 (fourth quarter). So, the second quarter has the smallest spread and the fourth quarter has the largest spread.
• Range = maximum value - minimum value = 77 - 59 = 18
• Interquartile Range: $IQR = Q_3 - Q_1 = 70 - 64.5 = 5.5$
• In general, when the median is in the middle of the box, and the whiskers are about the same length on both sides of the box, the distribution is symmetric. When the median is closer to the bottom of the box, and the whisker is shorter on the lower end of the box, the distribution is positively skewed (skewed right). When the median is closer to the top of the box, and the whisker is shorter on the upper end of the box, the distribution is negatively skewed (skewed left).
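The spread calculations in the key points can be reproduced from a five-number summary alone. The sketch below uses the heights summary (59, 64.5, 66, 70, 77); the function name `quarter_spreads` is an assumption for illustration.

```python
def quarter_spreads(minimum, q1, median, q3, maximum):
    """Width of each of the four quarters defined by a five-number summary."""
    points = [minimum, q1, median, q3, maximum]
    return [right - left for left, right in zip(points, points[1:])]


# Five-number summary of the heights example.
summary = (59, 64.5, 66, 70, 77)
spreads = quarter_spreads(*summary)

print(spreads)                           # [5.5, 1.5, 4, 7]
print("range:", summary[4] - summary[0])  # 18
print("IQR:", summary[3] - summary[1])    # 5.5

# Quarter numbers (1 to 4) with the smallest and the largest spread.
print(spreads.index(min(spreads)) + 1, spreads.index(max(spreads)) + 1)  # 2 4
```

A narrow quarter means the data is concentrated there, and a wide quarter means it is spread out, which is how a boxplot is read for clustering and skewness.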
2) The Box-and-Whisker plot below shows the grade distribution of the students in a class.
a) The five-number summary of the data is:

Min | Q1 | Median | Q3 | Max
____ | ____ | ____ | ____ | ____

b) The outlier is ________________
c) The range of the data is ___________
d) The interquartile range of the data is _______________________
e) The shape of the distribution is ___________

3) The numbers of messages sent by 20 students in 1 day are shown in the Box-and-Whisker plot.
a) Write the five-number summary.

Min | Q1 | Median | Q3 | Max
____ | ____ | ____ | ____ | ____

b) The interquartile range of the data is ______________________
c) Approximately what percentage of students sent fewer than 13 messages? __________
d) Approximately what percentage of students sent more than 7 messages? __________
e) Approximately how many students sent 7 to 18 messages per day? __________
f) Approximately how many students sent more than 10 messages? __________
g) Approximately how many students sent exactly 11 messages per day? __________

4) The accompanying Box-and-Whisker plot represents the cost, in dollars, of 28 CDs at a music store.
a) The price of the most expensive CD at the store is ____________________
b) What is the range of the costs? ___________________
c) The median price of the CD is ____________________
d) Which cost represents the 75th percentile? _________
e) Approximately what percentage of the CDs cost above $26? ______________
f) Approximately what percentage of the CDs cost below $20.50? ___________________
g) How many CDs cost between $14.50 and $26.00? ___________________
h) How many CDs cost less than $14.50? __________________________
i) The shape of the distribution is ___________
➤ Double Box-and-Whisker Plots
Boxplots can be used to compare the distributions of two datasets.

Example 1: The double Box-and-Whisker plot shows the heights of the boys and girls in Grade 7. Answer the following questions:

a) What is the height of the tallest student in the class?
The tallest student in the class is a boy (71 inches).
b) What is the height of the shortest girl in the class?
The shortest girl in the class is 61 inches tall.
c) The upper quartile height for the girls is the same as the median height for the boys (67 inches).
d) Approximately what percentage of the girls are taller than 67 inches?
Approximately 25% of the girls are taller than 67 inches, since the height 67 inches represents the third quartile in the girls' heights.
e) Approximately what percentage of the boys are shorter than 66 inches?
Approximately 25% of the boys are shorter than 66 inches, since the height 66 inches represents the first quartile in the boys' heights.
f) Approximately what percentage of girls are in the range of 64-70 inches tall?
Approximately 75% of girls are in the range of 64-70 inches, since 64 and 70 represent the first quartile and the maximum respectively.
g) Who has the bigger interquartile range?
IQR (girls) = 67 - 64 = 3 inches and IQR (boys) = 69 - 66 = 3 inches; therefore, girls and boys have the same interquartile range.
h) Who has the smaller range?
Range (boys) = 71 - 63 = 8 inches and Range (girls) = 70 - 61 = 9 inches; therefore, the boys have the smaller range.
i) Sara's height is at the 80th percentile in her class. Her height falls in the range of 65-67 inches. (True/False). Explain.
False, because the height 67 inches represents the 75th percentile in the girls' heights; therefore, Sara's height is in the interval 67-70.
j) In general, who do you think are taller, boys or girls? Why?
Boys are taller in general in this class, because approximately 50% of the boys are taller than 67 inches (the boys' second quartile), whereas approximately 75% of the girls are shorter than 67 inches (the girls' third quartile).
Practice
Mr. Ahmed gave two math quizzes to his students this week. The Box-and-Whisker plots represent the scores of the students in both tests.

[Figure: Box-and-Whisker plots of the scores for Test 1 and Test 2.]

a) The range of Test 1 scores is equal to the range of Test 2 scores. (True/False)
b) The median score of Test 2 is the same as the median score of Test 1. (True/False)
c) The interquartile range of Test 1 is greater than the interquartile range of Test 2. (True/False)
d) Approximately 75% of the students scored more than 75 in Test 1, whereas approximately 75% of the students scored more than 80 in Test 2. (True/False)
e) Which five-number summary value is the same for both tests? ___________________
f) The scores in Test 2 are most spread out between 55 and 65. (True/False)
g) Sara scored 65 in both tests. In which test did Sara perform better? Explain. __________________________________________________________
h) In which test do you think that the students performed better? Why? _____________________________
i) Comment on the shape of the distribution for the scores of both tests. _______________________
Practice (mixed questions)
1) In an experiment, a large class of students guessed the answers to a quiz with 10 True/False questions. For each student, the number of correct answers was counted. The column chart shows the data collected.

[Figure: Column chart "Guessing the Answers to 10 T/F Questions". Horizontal axis: Number of Correct Answers (0 to 10). Vertical axis: Frequency (0 to 8).]

Answer the following questions about the data shown in the chart.
I. How many students guessed exactly 7 answers correctly? _______________
II. How many students guessed all 10 answers correctly? _________________
III. How many students guessed all 10 answers incorrectly? ________________
IV. How many students were in the experiment? _________________________
V. Show that the median is the 18th data item. ______________________________ Median = ______
VI. Show that Q1 is the 9th data item. __________________________________ Q1 = ______
VII. Show that Q3 is the 27th data item. _________________________________ Q3 = ______
VIII. Write the five-number summary of the data:

Minimum | Q1 | Median | Q3 | Maximum
____ | ____ | ____ | ____ | ____
IX. Construct the Box-and-Whisker plot for the given data.

2. The Box-and-Whisker plot shows the weekly earnings ($) of two different workers. Which of the following statements is true?

[Figure: Box-and-Whisker plots of the weekly earnings of Worker A and Worker B.]

a) The lower quartile of worker B is equal to the median of worker A.
b) The median of worker B is greater than the median of worker A.
c) The earnings range of worker A is greater than the earnings range of worker B.
d) The median of earnings is the same for both workers.
2) On a test worth 100 points, here is the five-number summary of the scores:

Minimum | Q1 | Median | Q3 | Maximum
55 | 75 | 78 | 81 | 97

a) Are the grades spread out evenly across the whole range of the data? _____________________________________________________________
b) Where are the data values clustered together, and where are the data values spread out more? _____________________________________________________________
c) What is the range? _____________________________________
d) What percentage of the test scores are between 55 and 97? ______________
e) What is the interquartile range? _____________________________________
f) What percentage of the test scores are between 75 and 81? ______________

3) On a test worth 100 points, here is the five-number summary of the scores:

Minimum | Q1 | Median | Q3 | Maximum
55 | 75 | 88 | 94 | 97

a) Are the grades spread out evenly across the whole range of the data? _____________________________________________________________
b) Where are the data values clustered together, and where are the data values spread out more? _____________________________________________________________
c) What is the interquartile range? _____________________________________
d) What percentage of the test scores are between 94 and 97? ______________
4) The following chart has data from an experiment of 80 measurements (n = 80). The chart has 8 columns and 10 rows. The data is arranged in order from the top left corner to the bottom right corner. So, the first column has the first 10 data elements, the second column has the 11th data element through the 20th, and so on.
a) The 53rd data value is in the 6th column and the 3rd row; what is it? _______
b) Find the 10th percentile of the data using the formula. _____________________________________________________________
c) Find the 90th percentile of the data using the formula. _____________________________________________________________
d) Use the formula to find the median, the first quartile, Q1, and the third quartile, Q3. Then, fill in the five-number summary table below. _____________________________________________________________

Minimum | Q1 | Median | Q3 | Maximum
____ | ____ | ____ | ____ | ____

e) Write the four data intervals defined by the five-number summary. Each of these intervals contains 25% of the data. _________________________________________________
f) Which of these intervals is the shortest? __________________
g) Which of these intervals is the longest? __________________
h) Where is the data most spread out, and where is it the most concentrated? _____________________________________________________________
i) Find the IQR (interquartile range) for this data. _______________________
j) Find the range for this data. _______________________
k) Find the range of the middle 50% of this data. ______________________

5) Here is a partial list of the data, showing the 4 smallest values and the 4 largest values:
35, 40, 42, 45, …, 60, 66, 75, 78
A Box-and-Whisker plot is being constructed from the data. The plot so far is given below:
a) Find the five-number summary from the plot and/or the data:

Minimum | Q1 | Median | Q3 | Maximum
____ | ____ | ____ | ____ | ____

b) Find the interquartile range: $IQR =$ _________________________________
c) Find the upper fence: $Q_3 + 1.5 \times IQR =$ _______________________________
d) Find the lower fence: $Q_1 - 1.5 \times IQR =$ _______________________________
e) What are the outliers in the data? ____________________
f) Select the graphs that show the left-hand and right-hand whiskers:

6) Construct a Box-and-Whisker plot for the following ordered data. A five-number summary table and an axis have been provided.

Data   8 10 10 12 12 12 15 16 18 18 18 18 25 30 38
Index  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

a) The position of the median is $\text{Rank} = \frac{P}{100}(n + 1) =$ ____. The median is ______.
b) For Q1: The median of the lower half of the data is ________ (this is Q1)
c) For Q3: The median of the upper half of the data is ________ (this is Q3)
d) The five-number summary is:

Minimum | Q1 | Median | Q3 | Maximum
____ | ____ | ____ | ____ | ____

e) The interquartile range: $IQR =$ ___________________________________
f) Compute the quantities associated with the lower and upper fences.
Lower fence: $Q_1 - 1.5 \times IQR =$ _______________________________________
Upper fence: $Q_3 + 1.5 \times IQR =$ _______________________________________
g) Draw a Box-and-Whisker plot.
h) Identify any outliers in the data set. _______________

7) Consider the data: 998, 72, 431, 443, 796, 334, 376, 498, 457, 458, 225
I. Find (a) the median, (b) Q1, (c) Q3, and (d) the interquartile range.
Order the data first: _____, _____, _____, _____, _____, _____, _____, _____, _____, _____, _____
(a) _____________________________________________________________
(b) _____________________________________________________________
(c) _____________________________________________________________
(d) _____________________________________________________________
II. Compute the quantities associated with the lower and upper fences.
Lower fence: $Q_1 - 1.5 \times IQR =$ _____________________________________________
Upper fence: $Q_3 + 1.5 \times IQR =$ _____________________________________________
III. Identify any outliers in the data set. _________________________
Excel Lesson Four

Open the accompanying Excel lesson file to start learning.

Objectives: By the end of this lesson you will be able to use Excel functions to calculate the following:
- Percentiles
- Quartiles
- IQR
- Upper and Lower Fences
- Min
- Max
- Number of values in a dataset (count)