
Pra-U STPM Maths(T) Semester 3 2022 CC039332c

Published by PENERBITAN PELANGI SDN BHD, 2023-09-26 20:50:18




Mathematics Semester 3 STPM

PREFACE

This book, ‘Pre-U STPM Text Mathematics (T) – Semester 3’, is one in a series of three books specially written to meet the requirements of the new revised Mathematics (T) syllabus in the STPM Examination, which will take effect from 2012. Under the new system introduced by the Malaysian Examinations Council (MEC), the new Form Six curriculum will be spread over three semesters, with candidates sitting for an examination at the end of each semester, and also coursework. This is to enhance the teaching and learning orientation of Form Six, so as to be in line with the orientation of teaching and learning in colleges and universities.

The new syllabus fulfils the requirements of this new system, and this book covers all the topics specified in the syllabus of Mathematics (T) for Semester 3. It is also suitable for use by students pursuing a foundation course in science or matriculation programmes at local universities.

This book seeks to fulfil the aims and objectives set out in the new mathematics syllabus, i.e. to develop the understanding of mathematical concepts and mathematical thinking, and to acquire skills in problem solving and the applications of mathematics related to science and technology. This will give students an adequate foundation before they proceed to programmes in the field of science and technology at institutions of higher learning.

The contents of this book, in six chapters based on the syllabus, are organised and planned in a systematic manner so as to make learning more effective. Each new topic in a chapter is clearly presented via a simple and practical approach, which leads to a better overall picture and understanding of the topic concerned. The concepts presented in each subtopic include relevant explanations in detail, followed by the all-important worked examples, presented clearly in a step-by-step manner. This approach presumes that the student has only basic prior mathematical knowledge and skills up to the SPM level.

The questions at the end of each subtopic are planned in such a way that the student will be able to test his understanding of, and apply, the concepts learned to solve basic problems. They are also suitable for the coursework. A summary of the concepts and other important formulae is given at the end of each chapter. A revision exercise covering all the subtopics in the chapter is also included. The questions chosen are all planned to be of a standard equivalent to those in the STPM examination.

To help the student in his revision and to assess his preparedness for the examination, a sample STPM examination paper is included at the end of the book, together with complete worked solutions.

This book may be used as a textbook in the classroom, for coursework or by the student studying on his own. It is hoped that both teachers and students alike will benefit from using this book, and find the learning process both effective and enjoyable.


CONTENTS

Chapter 1 DATA DESCRIPTION
1.1 Data Representation
1.2 Measures of Central Tendency
1.3 Measures of Dispersion
1.4 The Shape of a Distribution

Chapter 2 PROBABILITY
2.1 Counting Techniques
2.2 Probability

Chapter 3 PROBABILITY DISTRIBUTIONS
3.1 Discrete Random Variables
3.2 Continuous Random Variables
3.3 Binomial Distribution
3.4 Poisson Distribution
3.5 Normal Distribution

Chapter 4 SAMPLING AND ESTIMATION
4.1 Sampling
4.2 Estimation

Chapter 5 HYPOTHESIS TESTING
5.1 Hypothesis Tests
5.2 Testing Population Mean
5.3 Testing Population Proportion

Chapter 6 CHI-SQUARED TESTS
6.1 The Chi-Squared Distribution
6.2 Goodness-Of-Fit Tests
6.3 Tests of Independence

STPM Model Paper (954/3)
Answers


Mathematical Notation

Miscellaneous symbols
=    is equal to
≠    is not equal to
≡    is identical to or is congruent to
≈    is approximately equal to
<    is less than
≤    is less than or equal to
>    is greater than
≥    is greater than or equal to
∞    infinity
∴    therefore

Set notation
∈    is an element of
∉    is not an element of
∅    empty set
{x | …}    set of x such that …
N    set of natural numbers, {0, 1, 2, 3, …}
Z    set of integers
Z+    set of positive integers
Q    set of rational numbers
R    set of real numbers
[a, b]    closed interval {x | x ∈ R, a ≤ x ≤ b}
(a, b)    open interval {x | x ∈ R, a < x < b}
[a, b)    interval {x | x ∈ R, a ≤ x < b}
(a, b]    interval {x | x ∈ R, a < x ≤ b}
∪    union
∩    intersection

Data description
x1, x2, …    observations
f1, f2, …    frequencies with which the observations x1, x2, … occur
x̄    sample mean
s²    sample variance, $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
µ    population mean
σ²    population variance

Probability
A    an event A
A′    complement of an event A, or the event not A
P(A)    probability of an event A
P(A | B)    probability of event A given event B

Probability distributions
X    a random variable X
x    value of a random variable X
Z    standardised normal random variable
z    value of the standardised normal random variable Z
f(x)    value of the probability density function of a continuous random variable X
F(x)    value of the cumulative distribution function of a continuous random variable X
E(X)    expectation of a random variable X
Var(X)    variance of a random variable X
B(n, p)    binomial distribution with parameters n and p
Po(λ)    Poisson distribution with parameter λ
N(µ, σ²)    normal distribution with mean µ and variance σ²
χ²_ν    chi-squared distribution with ν degrees of freedom

Sampling and estimation
µ̂    unbiased estimate of the population mean
σ̂²    unbiased estimate of the population variance
p    population proportion
p̂    sample proportion


CHAPTER 1 DATA DESCRIPTION

Learning Outcome
(a) Identify discrete, continuous, ungrouped and grouped data.
(b) Construct and interpret stem-and-leaf diagrams, box-and-whisker plots, histograms and cumulative frequency curves.
(c) State the mode and range of ungrouped data.
(d) Determine the median and interquartile range of ungrouped and grouped data.
(e) Calculate the mean and standard deviation of ungrouped and grouped data from raw data and from given totals such as $\sum_{i=1}^{n}(x_i - a)$ and $\sum_{i=1}^{n}(x_i - a)^2$.
(f) Select and use the appropriate measures of central tendency and measures of dispersion.
(g) Calculate the Pearson coefficient of skewness.
(h) Describe the shape of a data distribution.

Bilingual Keywords
back-to-back stem plots – plot batang belakang ke belakang
box-and-whisker plot – plot kotak dan misai
continuous data – data selanjar
discrete data – data diskrit
interquartile range – julat antara kuartil
lower quartile – kuartil bawah
mean – min
median – median
mode – mod
negatively skewed distribution – taburan terpesong negatif
outliers – nilai-nilai luar biasa
positively skewed distribution – taburan terpesong positif
standard deviation – sisihan piawai
stem-and-leaf diagram – gambar rajah batang dan daun
symmetrical distribution – taburan bersimetri
ungrouped data – data tak terkumpul
upper quartile – kuartil atas


1.1 Data Representation

Types of data

A variable is a characteristic that can take different values. Examples of variables are the type of vehicles produced by a company, the grade of eggs sold in a hypermarket, the number of calls received by a company in a day and the heights of students in a class.

The observed values or measurements of a variable are called data. They are either qualitative or quantitative data. Quantitative data are divided into discrete and continuous data.

Discrete data

Discrete data can take only exact values. For example, the numbers of calls received by a company each day over 25 days are natural numbers. However, there may be other data, such as the sizes of shoes, 5, 5½, 6, 6½, 7, which are not necessarily natural numbers.

Continuous data

Continuous data do not take exact values but lie in a certain interval and are measured to a certain degree of accuracy, which depends on the measuring instrument used. One example of continuous data is the height of a student, 132 cm (correct to the nearest cm), which lies in the interval 131.5 cm ≤ height < 132.5 cm. Another example is the mass of a durian recorded as 1.2 kg, which lies in the interval 1.15 kg ≤ mass < 1.25 kg.

Ungrouped data

Raw data are recorded in the sequence in which they become available. For example, the numbers of male children in 10 families chosen at random are as follows.

1 4 2 0 2 3 3 2 1 4

These raw data are called ungrouped data.

Grouped data

When the data set is large, the data are summarised by grouping them into classes. Consider the following example. The lengths of 40 leaves collected from a tree are measured (to the nearest cm) and the following results are obtained.

 8 12 15 16 14 20 14  9 17 19
12 10 11 14 10 17 15 17 15  9
18 15 14 12 14 11 13 13 18 13
11 12 11 16 12 13 18 12 14 12

These data may be grouped into classes with intervals 8 – 10, 11 – 13, 14 – 16, 17 – 19 and 20 – 22 as follows:

Length of leaf (cm)    8 – 10    11 – 13    14 – 16    17 – 19    20 – 22
Frequency                 5         15         12          7          1

Table 1.1

This is called a frequency distribution and the data presented are called grouped data. When the raw data are grouped in a frequency distribution table, the original values of the data are lost.
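The grouping that produces Table 1.1 can be checked with a short program. The following is only a minimal sketch in Python; the names `lengths`, `classes` and `group_into_classes` are illustrative and do not come from the text. Each class is treated as a pair of inclusive limits, which is appropriate here because the lengths are recorded to the nearest cm.

```python
# The 40 leaf lengths (cm, to the nearest cm) from the example above.
lengths = [
    8, 12, 15, 16, 14, 20, 14, 9, 17, 19,
    12, 10, 11, 14, 10, 17, 15, 17, 15, 9,
    18, 15, 14, 12, 14, 11, 13, 13, 18, 13,
    11, 12, 11, 16, 12, 13, 18, 12, 14, 12,
]

# Class intervals of Table 1.1 as (lower limit, upper limit), both inclusive.
classes = [(8, 10), (11, 13), (14, 16), (17, 19), (20, 22)]


def group_into_classes(data, classes):
    """Return the frequency of each class, in the order the classes are given."""
    frequencies = [0] * len(classes)
    for value in data:
        for index, (lower, upper) in enumerate(classes):
            if lower <= value <= upper:
                frequencies[index] += 1
                break  # each observation belongs to exactly one class
    return frequencies


frequencies = group_into_classes(lengths, classes)
for (lower, upper), frequency in zip(classes, frequencies):
    print(f"{lower:>2} - {upper:<2}  {frequency}")
```

The printed frequencies agree with Table 1.1. Note that only the class counts are kept, which illustrates why the original values are lost once the data are grouped.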


Example 1
State whether the data obtained from the observation or measurement of each of the following variables are discrete or continuous data.
(a) The age of a person.
(b) The number obtained when a fair die is thrown.
(c) The time taken to run 100 metres.
(d) Set of negative integers, {–1, –2, –3, …}.
(e) The number of robberies reported per day.
(f) The diameter of a tennis ball.

Solution:
(a) Continuous data, because age is a measurement of time, which is a continuous variable.
(b) Discrete data, because the possible numbers when a fair die is thrown are 1, 2, 3, 4, 5, 6.
(c) Continuous data.
(d) Discrete data.
(e) Discrete data.
(f) Continuous data.

Stem-and-leaf diagrams (Stemplots)

In statistics, a data set is usually presented in a diagram to obtain a clear picture of the data. Suppose the marks for 16 students in an English Language test are shown below.

54 48 83 37 46 28 51 72 65 51 50 33 63 53 57 61

A diagram used to represent ungrouped data is called a stem-and-leaf diagram. By assuming that the ‘stem’ represents tens and the ‘leaf’ represents units, for the first six data, 54, 48, 83, 37, 46, 28, we obtain a partial stem-and-leaf diagram:

Stem (tens) | Leaf (units)
     2      | 8
     3      | 7
     4      | 6 8
     5      | 4
     6      |
     7      |
     8      | 3

Table 1.2

The complete stem-and-leaf diagram is shown below.

Stem (tens) | Leaf (units)
     2      | 8              (28 is the lowest mark)
     3      | 7 3
     4      | 6 8
     5      | 4 1 1 0 3 7
     6      | 5 3 1
     7      | 2
     8      | 3              (83 is the highest mark)

Table 1.3


After arranging the ‘leaves’ in ascending order and giving a key, we have the final stem-and-leaf diagram as follows:

Stem (tens) | Leaf (units)
     2      | 8
     3      | 3 7
     4      | 6 8
     5      | 0 1 1 3 4 7
     6      | 1 3 5
     7      | 2
     8      | 3

Key: 4 | 6 means 46

Table 1.4

The stem-and-leaf diagram gives a good picture of the shape of the data distribution.

Example 2
The heights of students (to the nearest cm) in a class are given below.

152 145 153 142 155 157 156 149 144 157 150 147
155 154 152 153 151 148 151 152 147 156 146 144

Construct a stem-and-leaf diagram for the heights of these students.

Solution:
Since there would be too many leaves on each stem, we use multiple stems as follows:

Stem | Leaf
 14  | 2
 14  | 5 4 4
 14  | 7 7 6
 14  | 8 9
 15  | 0 1 1
 15  | 2 3 2 3 2
 15  | 5 5 4
 15  | 7 6 7 6

After arranging the ‘leaves’ in ascending order, we obtain the following stem-and-leaf diagram.

Stem | Leaf
 14  | 2
 14  | 4 4 5
 14  | 6 7 7
 14  | 8 9
 15  | 0 1 1
 15  | 2 2 2 3 3
 15  | 4 5 5
 15  | 6 6 7 7

Key: 14 | 8 means 148 cm
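The construction of a stem-and-leaf diagram with tens as stems and units as leaves can be sketched in Python. This is only an illustration under stated assumptions: the function name `stem_and_leaf` is hypothetical, the data are non-negative whole numbers, and one stem is used per ten, so the multiple stems of Example 2 are not handled.

```python
def stem_and_leaf(data):
    """Return a dict mapping each stem (tens) to its leaves (units) in ascending order."""
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 46 -> stem 4, leaf 6
        rows.setdefault(stem, []).append(leaf)
    # Keep stems that have no leaves so that the stems remain equally spaced.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


# Marks of the 16 students in the English Language test.
marks = [54, 48, 83, 37, 46, 28, 51, 72, 65, 51, 50, 33, 63, 53, 57, 61]

diagram = stem_and_leaf(marks)
for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 4 | 6 means 46")
```

Sorting the data before splitting each value means the leaves are already in ascending order, so the output corresponds to the final diagram in Table 1.4 rather than to the intermediate Table 1.3.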


Example 3
The numbers of days in a year on which a group of 20 students are absent from school are given below.

10 22 5 15 18 12 8 16 14 12 21 17 16 14 13 11 9 19 14 11

Display the data using an appropriate stem-and-leaf diagram.

Solution:
The completed stem-and-leaf diagram is as follows:

Stem | Leaf
  4  | 1
  7  | 1 2
 10  | 0 1 1 2 2
 13  | 0 1 1 1 2
 16  | 0 0 1 2
 19  | 0 2
 22  | 0

Key: 16 | 1 means 17 days

Reminder: For a stem-and-leaf diagram,
(a) the stems must be equally spaced,
(b) a key must be given.

Back-to-back stem-and-leaf diagrams

Two sets of data can be compared by drawing a back-to-back stem-and-leaf diagram, as shown in the following example.

Example 4
The ages of teachers in School X and School Y are shown below.

School X
50 41 32 36 26 36 53 53 33 30
25 38 52 48 54 45 47 47 38 24
24 30 47 49 52 50 45 44 47 44

School Y
39 42 45 37 28 25 53 33 22 40
41 51 49 25 44 24 33 35 37 24
25 35 24 26 29 53 26 23 28 38

Construct a back-to-back stem-and-leaf diagram for the above data, and give a comment on the ages of the teachers in School X and School Y.

Solution:

             School X |   | School Y
              4 4 5 6 | 2 | 8 5 2 5 4 4 5 4 6 9 6 3 8
      0 8 8 0 3 6 6 2 | 3 | 9 7 3 3 5 7 5 8
4 7 4 5 9 7 7 7 5 8 1 | 4 | 2 5 0 1 9 4
        0 2 4 2 3 3 0 | 5 | 3 1 3


After rearranging the ‘leaves’ in ascending order, the stem-and-leaf diagram below is obtained.

             School X |   | School Y
              6 5 4 4 | 2 | 2 3 4 4 4 5 5 5 6 6 8 8 9
      8 8 6 6 3 2 0 0 | 3 | 3 3 5 5 7 7 8 9
9 8 7 7 7 7 5 5 4 4 1 | 4 | 0 1 2 4 5 9
        4 3 3 2 2 0 0 | 5 | 1 3 3

Key (School X): 5 | 2 means 25
Key (School Y): 3 | 7 means 37

Comment: On the whole, the teachers in School X are older than the teachers in School Y.

Exercise 1.1

1. State whether the data obtained from each variable below is discrete or continuous.
(a) The number of A’s for STPM Physics from 2000 to 2010.
(b) The travelling time from a student’s house to his school.
(c) The weight of each of the boxers taking part in the 2000 Olympic Games in Sydney.
(d) The size of shirts sold by the shop ‘Smartshoppe’.
(e) The number of pens produced every day at a factory.

2. Construct stem-and-leaf diagrams for the following data.
(a) The lengths of a straight line, as estimated by 22 students, correct to the nearest mm, are
10.5, 8.5, 8.6, 8.1, 7.3, 4.4, 6.6, 6.6, 7.9, 8.7, 8.3, 6.0, 8.7, 7.5, 7.9, 6.0, 9.1, 7.2, 8.4, 8.1, 8.6, 9.3
(b) The weights of 26 taxi drivers, correct to the nearest kg, are
69, 47, 63, 66, 62, 71, 76, 81, 68, 59, 63, 70, 58, 65, 52, 62, 52, 67, 54, 74, 65, 59, 69, 72, 74, 60
(c) A group of students participate in a test to measure their reaction times. The results, measured to the nearest 0.01 second, are shown below.
0.12, 0.15, 0.19, 0.18, 0.20, 0.12, 0.22, 0.24, 0.15, 0.12, 0.15, 0.19, 0.18, 0.20, 0.12, 0.22, 0.24, 0.15, 0.16, 0.15, 0.19, 0.18, 0.21, 0.15, 0.21, 0.19, 0.21, 0.22, 0.21, 0.18
(d) The times (in hours) spent surfing the internet per day by a group of teachers, recorded to the nearest 0.1 hour, are shown below.
3.5, 3.8, 6.2, 6.5, 4.2, 4.8, 4.2, 5.6, 3.7, 3.8, 4.9, 5.2, 0.2, 1.2, 5.8, 5.6, 3.6, 3.9, 3.2, 3.1, 3.0, 2.8, 2.9, 2.4, 2.1, 2.8, 0.3, 0.8


3. Write down the meaning of the circled number in the stemplot below.

Stem | Leaf
  4  | 8
  5  | 2 3
  5  | 6 8 9
  6  | 2 2 3 4
  6  | 5 6 7 7 8 9
  7  | 1 2 4

(a) The stemplot represents the time taken to travel from station A to station B, with key 5 | 9 meaning 5.9 hours.
(b) The stemplot represents the mass of a chemical substance in grams (correct to 2 decimal places), with key 5 | 9 meaning 0.59 g.

4. Draw back-to-back stemplots for each of the following cases. Give a comment in each case.
(a) 20 boys and 20 girls take part in a competition to test how long a person can hold his or her breath. The times recorded are correct to the nearest second.

Girls Boys
18 16 18 22 21 14 11 20 22 16 22 25 17 22 19 22 25 22 19 17 19 23 15 16 19 23 16 15 23 9 21 16 24 22 21 22 18 16 18 19

(b) The Mathematics and Chemistry marks for 20 students are shown in the following table.

Mathematics    77 84 52 78 67 65 73 73 57 42 79 66 71 62 64 32 86 44 45 59
Chemistry      31 45 62 61 82 44 34 50 53 77 57 46 67 53 43 74 68 59 58 46

Histograms

Histograms look like bar charts but they are not bar charts. In a bar chart, the length of a bar is proportional to the frequency. In a histogram, the area of a rectangle is proportional to the frequency. All rectangles are drawn side by side, and any empty space between two rectangles means that the class has zero frequency. For example, the histogram in Figure 1.1 shows the heights (in cm) of 26 footballers in a school. Note that there is not a single player whose height lies between 155 cm and 157 cm.

[Figure 1.1: histogram of frequency (0 to 8) against height (cm), with the height axis marked from 150 to 160]

For a histogram, area of each rectangle ∝ frequency


(a) Histograms with equal class width

Area of rectangle = class width × height of rectangle

If the class width is the same for every class, then area of rectangle ∝ height of rectangle, because the class width is constant. Hence, the height of a rectangle represents the frequency of the corresponding class.

Example 5
The frequency distribution of the marks obtained by 134 students in an examination is given below.

Mark         20 – 29   30 – 39   40 – 49   50 – 59   60 – 69   70 – 79   80 – 89
Frequency       22        18        22        24        14        14        20

Construct a histogram to represent the data.

Solution:
Class boundaries are 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5.
Class widths are 10, 10, 10, 10, 10, 10, 10.

Since the class widths of all classes are the same, frequency can be used as the heights of the rectangles. Now, we draw the histogram by following the steps below.
1. Mark the class boundaries on the horizontal axis.
2. Mark frequency on the vertical axis.
3. For every class, draw a rectangle with the same height as the class frequency.

[Figure: Histogram for the marks of 134 students, with frequency (0 to 24) on the vertical axis and marks (class boundaries 19.5 to 89.5) on the horizontal axis]

(b) Histograms with different class widths

If the classes of a histogram do not all have the same width, the height of a rectangle is no longer the same as the frequency. To determine the height of a rectangle when the class widths differ, use the following formula.

Frequency density = Frequency / Class width

Then use ‘frequency density’ as the height of a rectangle.


Example 6
On a certain day, 542 cars are parked in the parking lots of a shopping complex. The parking duration of each car (to the nearest minute) is shown in the table below. Draw a histogram to represent this information.

Time (minutes)   Frequency
6 – 25               62
26 – 61              72
62 – 81              90
82 – 105            120
106 – 113            45
114 – 149           108
150 – 197            30
198 – 297            15

Solution:
Class boundaries are 5.5, 25.5, 61.5, 81.5, 105.5, 113.5, 149.5, 197.5, 297.5.
Class widths are 20, 36, 20, 24, 8, 36, 48, 100.

Since the class widths are different, we should calculate the frequency density for each class first.

Class        Class boundary    Class width   Frequency   Frequency density
6 – 25       5.5 – 25.5             20           62            3.1
26 – 61      25.5 – 61.5            36           72            2
62 – 81      61.5 – 81.5            20           90            4.5
82 – 105     81.5 – 105.5           24          120            5
106 – 113    105.5 – 113.5           8           45            5.6
114 – 149    113.5 – 149.5          36          108            3
150 – 197    149.5 – 197.5          48           30            0.6
198 – 297    197.5 – 297.5         100           15            0.15

The histogram for the parking duration of each car in the parking lots of the shopping complex is shown below.

[Figure: histogram of frequency density (0 to 6.0) against time (minutes), with the time axis marked from 5.5 to 305.5]
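The frequency density column of Example 6 can be reproduced with the following minimal sketch in Python. The names `class_limits`, `frequencies` and `frequency_density_table` are illustrative only. The sketch assumes, as in the example, that the data are recorded to the nearest unit, so each class boundary lies 0.5 beyond the stated class limit.

```python
# Class limits (minutes, to the nearest minute) and frequencies from Example 6.
class_limits = [(6, 25), (26, 61), (62, 81), (82, 105),
                (106, 113), (114, 149), (150, 197), (198, 297)]
frequencies = [62, 72, 90, 120, 45, 108, 30, 15]


def frequency_density_table(class_limits, frequencies, half_unit=0.5):
    """Return rows of (lower boundary, upper boundary, class width, frequency density)."""
    rows = []
    for (lower_limit, upper_limit), frequency in zip(class_limits, frequencies):
        lower_boundary = lower_limit - half_unit
        upper_boundary = upper_limit + half_unit
        width = upper_boundary - lower_boundary
        rows.append((lower_boundary, upper_boundary, width, frequency / width))
    return rows


table = frequency_density_table(class_limits, frequencies)
for lower, upper, width, density in table:
    print(f"{lower:>6} - {upper:<6}  width {width:>5}  density {density:.3f}")
```

The unrounded densities for the classes 106 – 113 and 150 – 197 are 5.625 and 0.625; the table in the solution shows them rounded to 5.6 and 0.6.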


Exercise 1.2

1. Draw a histogram for each of the frequency tables below.

(a)
x     f
5     4
6     11
7     9
8     22
9     10
10    4

(b)
x          f
5 – 9      8
10 – 14    14
15 – 19    29
20 – 24    40
25 – 29    38
30 – 34    22
35 – 39    9

2. Draw a histogram for each of the frequency distributions below.

(a)
Age (x years)    Number of applicants
0 ≤ x < 5               85
5 ≤ x < 10             122
10 ≤ x < 15            107
15 ≤ x < 20            243
20 ≤ x < 25            128
25 ≤ x < 30             66
30 ≤ x < 35             42
35 ≤ x < 40             25

(b)
Distance from house (x km)    Number of workers
0 ≤ x < 2                            48
2 ≤ x < 4                            92
4 ≤ x < 6                           166
6 ≤ x < 8                            65
8 ≤ x < 12                           63
12 ≤ x < 20                          48
20 ≤ x < 40                          32


3. The waiting times for 80 patients who are seeking treatment from a doctor in a clinic are shown below.

Waiting time (minutes)    Number of patients
0 – 3.5                          12
3.5 – 10.5                       18
10.5 – 14.0                      15
14.0 – 17.5                       6
17.5 – 24.5                       8
24.5 – 35.0                      15
35.0 – 49.0                       6

Display the above data on a histogram.

4. The following table shows the monthly wage distribution for workers in United Paper Company.

Wage (RM x)            Number of workers
600 ≤ x < 800                  5
800 ≤ x < 1 000               11
1 000 ≤ x < 1 200             23
1 200 ≤ x < 1 400              6
1 400 ≤ x < 1 600             18
1 600 ≤ x < 1 800              9
1 800 ≤ x < 2 000             11

(a) Draw a histogram to illustrate the workers’ wage distribution in the company.
(b) Every worker contributes RM50 for insurance. Besides that, every worker is given accommodation with a monthly rental of RM150. The rental payment for every worker is deducted directly from the wage. Construct a net income distribution table for the workers in the company. Hence, draw a histogram to represent this distribution. Compare the two histograms drawn in (a) and (b).

Cumulative frequency curves

A frequency distribution can also be displayed by a cumulative frequency curve (ogive). The commonly used cumulative frequency curve is the ‘less than’ cumulative frequency curve. Consider the following frequency distribution of the marks obtained by 50 students in a mathematics test.

Mark       Frequency
50 – 54        2
55 – 59        3
60 – 64        8
65 – 69       12
70 – 74       15
75 – 79        6
80 – 84        3
85 – 89        1

Table 1.5


In order to explain the ‘less than’ cumulative frequency, we consider the class 60 – 64. The ‘less than’ cumulative frequency for the class 60 – 64 is 2 + 3 + 8 = 13. The upper boundary for this class is 64.5 marks. Therefore, the cumulative frequency shows that 13 students obtain less than 64.5 marks.

For the class 65 – 69, the ‘less than’ cumulative frequency is 2 + 3 + 8 + 12 = 25. Therefore, 25 students obtain less than 69.5 marks.

By continuing the process for the other classes, a ‘less than’ cumulative frequency distribution is obtained as shown in Table 1.6.

Upper boundary    Cumulative frequency
< 49.5                    0
< 54.5                    2
< 59.5                    5
< 64.5                   13
< 69.5                   25
< 74.5                   40
< 79.5                   46
< 84.5                   49
< 89.5                   50

Table 1.6 ‘Less than’ cumulative frequency distribution

After a cumulative frequency distribution is constructed, a cumulative frequency curve can be plotted by following the steps below.
1. Mark the boundaries on the horizontal axis.
2. Mark the cumulative frequency on the vertical axis.
3. Plot cumulative frequency against upper boundary for every class.
4. Join all points with a smooth curve.

The cumulative frequency curve for the marks distribution of 50 students in a mathematics test is shown in Figure 1.2.

[Figure 1.2: cumulative frequency curve, with cumulative frequency (0 to 50) on the vertical axis and marks (boundaries 49.5 to 89.5) on the horizontal axis]

Example 7
Plot a cumulative frequency curve based on the following frequency distribution.

Mark             Frequency
0 ≤ x < 20           9
20 ≤ x < 40         29
40 ≤ x < 60         42
60 ≤ x < 80         26
80 ≤ x < 100        14


Solution:
Class boundaries are 0, 20, 40, 60, 80, 100.
Class widths are 20, 20, 20, 20, 20.

To draw a cumulative frequency curve, we need to find the cumulative frequency first. The cumulative frequency table is shown below.

Mark       Cumulative frequency
x < 0               0
x < 20              9
x < 40             38
x < 60             80
x < 80            106
x < 100           120

Based on the cumulative frequency table, we can draw the cumulative frequency curve as shown.

[Figure: cumulative frequency curve, with cumulative frequency (0 to 120) on the vertical axis and marks (0 to 100) on the horizontal axis]

Exercise 1.3

1. Plot cumulative frequency curves for the data.

(a)
x    2 – 4   4 – 6   6 – 8   8 – 10   10 – 12   12 – 14
f      3       5      13       13        10         4

(b)
Time (minutes)    Frequency
5 – 14                6
15 – 24               8
25 – 34              12
35 – 44              16
45 – 54               8
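A ‘less than’ cumulative frequency table such as the one in the solution to Example 7 is a running total of the class frequencies. The following minimal sketch in Python uses the standard-library function `itertools.accumulate`; the variable names `boundaries`, `frequencies`, `cumulative` and `points` are illustrative only.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Example 7.
boundaries = [0, 20, 40, 60, 80, 100]
frequencies = [9, 29, 42, 26, 14]

# No observation lies below the first boundary, so the table starts at 0;
# each later entry adds the frequency of one more class.
cumulative = [0] + list(accumulate(frequencies))

# Points to plot: cumulative frequency against the corresponding boundary.
points = list(zip(boundaries, cumulative))
for boundary, total in points:
    print(f"x < {boundary:<3}  {total}")
```

The last cumulative frequency always equals the total number of observations, which gives a quick check on the arithmetic before the curve is drawn.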


2. The number of criminal cases reported per day over a duration of 400 days is summarised below.

Number of criminal cases per day    Number of days
0 – 4                                      5
5 – 9                                     37
10 – 14                                   87
15 – 19                                  121
20 – 24                                   77
25 – 29                                   42
30 – 34                                   21
35 – 39                                   10

Construct a cumulative frequency table and plot a cumulative frequency curve. Based on the estimates, there are 3 unreported cases for every reported case. Construct an estimated frequency table for the number of criminal cases that actually occur and draw a cumulative frequency curve for this distribution.

3. Construct a cumulative frequency table based on the given data in the following table. Hence, plot a cumulative frequency curve.

Time taken to solve a question (x minutes)    Number of candidates
10 ≤ x < 15                                            5
15 ≤ x < 20                                           28
20 ≤ x < 25                                           69
25 ≤ x < 30                                          126
30 ≤ x < 35                                           84
35 ≤ x < 40                                           45
40 ≤ x < 50                                           66

4. The distribution of petrol usage (in litres) for a journey of 100 km by a car is shown in the following table.

Petrol usage (x litres)    Number of cars
6 ≤ x < 7                         5
7 ≤ x < 8                        18
8 ≤ x < 9                        10
9 ≤ x < 10                       55
10 ≤ x < 11                      97
11 ≤ x < 12                      42
12 ≤ x < 13                      26

Construct a cumulative frequency table, and plot a cumulative frequency curve.


1.2 Measures of Central Tendency

Measures of central tendency are used to determine the central values of a data set. The commonly used measures of central tendency are the mode, median and mean.

Mode (Ungrouped data)

The mode of a set of data is the value that occurs the greatest number of times in the set. For the data set 10, 11, 12, 13, 14, 14, 15, 16, the mode is 14 because 14 is the value that occurs most frequently. A data set that has one value that occurs with the highest frequency is said to be unimodal.

For the data set 5, 5, 5, 6, 7, 8, 9, 9, 9, 10, the two values 5 and 9 both occur with the same highest frequency, which is three times. Hence the modes for this data set are 5 and 9. This data set is said to be bimodal.

All sets of numerical data have a mean and a median but not all of them have modes. For example, the data set 1, 2, 4, 5, 6, 8, 9, 11 has no mode.

Mode is a suitable and useful measure of central tendency in the shoe business. If the manager of a shoe shop wants to stock the shoes which are most saleable, he needs to know the mode so that sufficient orders of the desired size are made.

Median (Ungrouped data)

The median is the value at the centre of a data set after the data set is arranged in ascending or descending order.

Example 8
Find the median for each set of data shown below.
(a) 75, 67, 48, 66, 89, 51, 70
(b) 75, 67, 48, 66, 189, 51, 70

Solution:
(a) Arrange the numbers in ascending order:
48, 51, 66, 67, 70, 75, 89
There are 3 numbers on each side of 67. Since 67 is the middle number after the numbers are arranged in ascending order, 67 is the median.
(b) Arrange the numbers in ascending order:
48, 51, 66, 67, 70, 75, 189
Therefore, 67 is the median for this set of numbers.

Note that for Example 8(b), even though there is an extreme value, 189, in the set, the median remains the same as in Example 8(a).
For a data set with an even number of observations, the median cannot be obtained directly because there is no single value at the centre. In this case, we obtain the median by finding the average of the two observations which are nearest to the centre.
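The ideas of mode and median for ungrouped data described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names `modes` and `median` are hypothetical, and the sketch adopts the convention (an assumption consistent with the example 1, 2, 4, 5, 6, 8, 9, 11 above) that a data set in which every value occurs exactly once has no mode.

```python
from collections import Counter


def modes(data):
    """Return all values that occur with the highest frequency, in ascending order.

    Assumed convention: if no value is repeated, the data set has no mode
    and an empty list is returned.
    """
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


def median(data):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd n: a single value at the centre
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average the two nearest


print(modes([10, 11, 12, 13, 14, 14, 15, 16]))    # unimodal data set
print(modes([5, 5, 5, 6, 7, 8, 9, 9, 9, 10]))     # bimodal data set
print(median([75, 67, 48, 66, 89, 51, 70]))       # Example 8(a)
print(median([75, 67, 48, 66, 189, 51, 70]))      # Example 8(b)
```

Replacing 89 by 189 changes only the largest value, not the value at the centre, which is why the two medians printed for Example 8 are the same.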


Example 9
Find the median for the following set of data.
14, 16, 17, 17, 18, 21, 23, 27, 29, 29, 30, 32

Solution:
The number of observations in the data is 12. Therefore, there is no single number in the middle of the set. To find the median, we find the average of the 6th and 7th observations.
6th observation = 21
7th observation = 23
Therefore, the median = ½(21 + 23) = 22

In general, if a set of data has n observations, x1, x2, x3, …, xn, which have been arranged in ascending order, then
(i) the median is the $\left(\frac{n+1}{2}\right)$th observation when n is odd,
(ii) the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations when n is even.

Median (Grouped data)

For a grouped frequency distribution, the median is the $\left(\frac{\sum f}{2}\right)$th observation. An estimate of the median may be obtained
(a) by calculation,
(b) using a cumulative frequency curve.

Estimation of median by calculation

Consider the following example.

Example 10
Find the median of the following grouped frequency distribution.

Class        1 – 2   3 – 4   5 – 6   7 – 8   9 – 10   11 – 12   13 – 14   15 – 16
Frequency      5       8      22      18       36        29        20        12

Solution:

Class boundary    Frequency, f    Cumulative frequency
0.5 – 2.5               5                  5
2.5 – 4.5               8                 13
4.5 – 6.5              22                 35
6.5 – 8.5              18                 53
8.5 – 10.5             36                 89
10.5 – 12.5            29                118
12.5 – 14.5            20                138
14.5 – 16.5            12                150


Number of observations = 150
Median = $\left(\frac{150}{2}\right)$th observation = 75th observation

From the cumulative frequency table, the class that contains the 75th observation is 8.5 – 10.5. Therefore, the median class is 8.5 – 10.5.

[Diagram: the median class runs from the lower boundary 8.5 to the upper boundary 10.5 and contains 36 observations; the median m lies 22 observations above the lower boundary]

Width of the median class = 10.5 – 8.5 = 2
Position of the median from the lower class boundary = (22/36) × 2 ≈ 1.2
Therefore, median = 8.5 + 1.2 = 9.7

In general,

[Diagram: the median class runs from the lower boundary Lm to the upper boundary Um, with the median m between them; the cumulative frequency is Fm – 1 at Lm, ½(∑f) at m, and Fm (the cumulative frequency up to the median class) at Um]

By proportion,

\[ \frac{m - L_m}{U_m - L_m} = \frac{\frac{1}{2}\sum f - F_{m-1}}{F_m - F_{m-1}} \]

Since $U_m - L_m = c$ and $F_m - F_{m-1} = f_m$,

\[ m = L_m + \left[\frac{\frac{1}{2}\sum f - F_{m-1}}{f_m}\right] c \]

where
Lm = lower boundary of median class,
Fm – 1 = cumulative frequency before median class,
fm = frequency of median class,
c = width of median class.
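The formula for the median of grouped data can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `grouped_median` and its parameters are hypothetical, the classes are given by a list of consecutive class boundaries (one more boundary than there are classes), and the median class is taken to be the first class whose cumulative frequency reaches ½∑f.

```python
def grouped_median(boundaries, frequencies):
    """Estimate the median of grouped data as L_m + ((sum(f)/2 - F_{m-1}) / f_m) * c."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("the total frequency must be positive")
    target = total / 2                 # position of the median, (sum f)/2
    cumulative_before = 0              # F_{m-1}: cumulative frequency before the current class
    for index, frequency in enumerate(frequencies):
        if cumulative_before + frequency >= target:
            lower_boundary = boundaries[index]                   # L_m
            width = boundaries[index + 1] - boundaries[index]    # c
            return lower_boundary + (target - cumulative_before) / frequency * width
        cumulative_before += frequency
    raise ValueError("the median class could not be located")


# Data of Example 10: class boundaries 0.5, 2.5, ..., 16.5 and the class frequencies.
boundaries_10 = [0.5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5]
frequencies_10 = [5, 8, 22, 18, 36, 29, 20, 12]

estimate_10 = grouped_median(boundaries_10, frequencies_10)
print(round(estimate_10, 1))
```

For Example 10 the unrounded estimate is 8.5 + (22/36) × 2, which agrees with the value 9.7 obtained above when rounded to one decimal place.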


The calculation of an estimate of the median of grouped data is illustrated in the following example.

Example 11
Find the median for the data in the following grouped frequency distribution.

Class           Frequency, f
0 ≤ x < 5             7
5 ≤ x < 10           27
10 ≤ x < 15          35
15 ≤ x < 20          54
20 ≤ x < 25          63
25 ≤ x < 30          43
30 ≤ x < 35          25
35 ≤ x < 40          17
40 ≤ x < 45           9
45 ≤ x < 50           4

Solution:
Total frequency = 7 + 27 + 35 + 54 + 63 + 43 + 25 + 17 + 9 + 4 = 284
Median = $\left(\frac{284}{2}\right)$th observation = 142nd observation

Total frequency up to class 15 ≤ x < 20 is 123.
Total frequency up to class 20 ≤ x < 25 is 186.
Therefore, the median is located in the class 20 ≤ x < 25. Class 20 ≤ x < 25 is called the median class.

Lower boundary of median class, Lm = 20
Cumulative frequency before the median class, Fm – 1 = 123
Frequency of the median class, fm = 63
Width of the median class, c = 25 – 20 = 5

\[ \text{Median} = L_m + \left[\frac{\frac{1}{2}\sum f - F_{m-1}}{f_m}\right] c = 20 + \left[\frac{\frac{1}{2}(284) - 123}{63}\right] 5 = 20 + \left(\frac{142 - 123}{63}\right) 5 = 21.51 \]

Estimation of median using a cumulative frequency curve

The median of grouped data may be estimated from a cumulative frequency curve. In this method, we plot a cumulative frequency curve and read the observation which has the cumulative frequency ½(∑f), as shown in Figure 1.3.

[Figure 1.3: cumulative frequency curve, with cumulative frequency on the vertical axis and the data on the horizontal axis; the median is read off on the horizontal axis at the cumulative frequency ∑f/2]


Example 12
100 earthworms are collected from a garden. The lengths (to the nearest millimetre) of the earthworms are recorded as shown in the table below.

Length (mm)    Number of earthworms
95 – 109       2
110 – 124      8
125 – 139      17
140 – 154      26
155 – 169      24
170 – 184      16
185 – 199      6
200 – 214      1

Construct a cumulative frequency table and plot a cumulative frequency curve for the information. Estimate the median length of the worms.

Solution:
The class boundaries are 94.5, 109.5, 124.5, 139.5, 154.5, 169.5, 184.5, 199.5, 214.5.

Length of earthworms (x mm)    Cumulative frequency
x < 94.5                       0
x < 109.5                      2
x < 124.5                      10
x < 139.5                      27
x < 154.5                      53
x < 169.5                      77
x < 184.5                      93
x < 199.5                      99
x < 214.5                      100

[Figure: cumulative frequency curve; vertical axis: cumulative frequency, 0 to 100; horizontal axis: length of earthworms (mm), 94.5 to 214.5; the median, 153, is marked.]

Median is the $\left(\frac{100}{2}\right)$th = 50th observation.
From the graph, median ≈ 153 mm.
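The cumulative frequency table and the graph reading of Example 12 can be reproduced as a rough check. The sketch below is an illustration only: the helper `read_off` is a hypothetical name, and it joins the plotted points with straight lines, which is an assumption. A smooth hand-drawn curve may give a slightly different reading.

```python
from itertools import accumulate

# Data from Example 12 (lengths to the nearest mm)
boundaries = [94.5, 109.5, 124.5, 139.5, 154.5, 169.5, 184.5, 199.5, 214.5]
freqs = [2, 8, 17, 26, 24, 16, 6, 1]

# Cumulative frequency at each boundary, starting from 0 at the lowest boundary
cum_freqs = [0] + list(accumulate(freqs))


def read_off(position, boundaries, cum_freqs):
    """Return the data value at a given cumulative frequency, joining the
    plotted points with straight lines (an assumption of this sketch)."""
    for i in range(1, len(cum_freqs)):
        lo, hi = cum_freqs[i - 1], cum_freqs[i]
        if hi > lo and lo <= position <= hi:
            fraction = (position - lo) / (hi - lo)
            return boundaries[i - 1] + fraction * (boundaries[i] - boundaries[i - 1])
    raise ValueError("position lies outside the cumulative frequency range")


median = read_off(cum_freqs[-1] / 2, boundaries, cum_freqs)
print(cum_freqs)         # [0, 2, 10, 27, 53, 77, 93, 99, 100]
print(round(median, 1))  # 152.8
```

The straight-line reading of about 152.8 mm agrees with the value of roughly 153 mm read from the curve.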


Exercise 1.4
1. Find the median for the following data.
(a) 4, 6, 8, 5, 6, 10, 7
(b) 12, 7, 10, 5, 11, 9, 7, 10, 8, 6
(c) 0.80, 0.50, 0.75, 0.88, 0.55, 0.42, 0.088, 2.00

2. Find the median and mode of the data represented by each of the stemplots below.
(a) Stem Leaf 1 2 3 4 5 6 0 1 4 5 7 3 8 9 2 4 5 8 1 6 6 4 5
Key: 3 | 8 means 38
(b) Stem Leaf 10 15 20 25 30 35 2 3 1 4 3 3 4 1 2 3 3 3 0 2 4 1
Key: 15 | 4 means 19

3. The following table shows the scores obtained when a die is thrown 30 times. Find the mode and median score.

Score        1   2   3   4   5   6
Frequency    4   3   5   7   6   5

4. The table below shows a frequency distribution of the masses of 48 female students in a college where the masses are given correct to the nearest kg.

Mass (kg)    Frequency
40 – 44      8
45 – 49      14
50 – 54      20
55 – 59      5
60 – 64      1

Find the median mass of the female students.

5. The following frequency distribution shows the time taken by 35 students to complete their classes' notice board beautification project.

Time (minutes)    Frequency
20 – 29           4
30 – 39           7
40 – 49           10
50 – 59           8
60 – 69           4
70 – 79           2

Find the median time taken to complete the project.


6. Plot a cumulative frequency curve for the frequency distribution in Question 4 and estimate the median mass.

7. Plot a cumulative frequency curve for the frequency distribution in Question 5 and estimate the median time.

8. Estimate the median diameter of the pencils produced in a factory as shown below by drawing a cumulative frequency curve.

Diameter (cm)    Frequency
0.47 – 0.48      4
0.49 – 0.50      7
0.51 – 0.52      11
0.53 – 0.54      6
0.55 – 0.56      5

Mean (Ungrouped data)
The mean (arithmetic mean) of a set of data is the sum of the values divided by the total number of observations.

Mean, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ or $\bar{x} = \frac{\sum x}{n}$

where $\sum_{i=1}^{n} x_i$ = sum of values of all observations,
$n$ = total number of observations.

Example 13
Find the mean of the numbers 12, 18, 13, 10, 6, 23, 16.

Solution:
Mean, $\bar{x} = \frac{\sum x}{n} = \frac{12 + 18 + 13 + 10 + 6 + 23 + 16}{7} = 14$


Mean (Grouped data)
Consider data given in the form of a frequency distribution with single-valued classes. If $x_1, x_2, x_3, \ldots, x_n$ are observations and $f_1, f_2, f_3, \ldots, f_n$ represent the frequencies of the observations respectively, then the mean, $\bar{x}$, is given by

$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fx}{\sum f}$

Example 14
Calculate the mean of the frequency distribution.

x    4   5    7    10   11   15   17
f    3   12   23   10   14   8    2

Solution:

x     Frequency, f    fx
4     3               12
5     12              60
7     23              161
10    10              100
11    14              154
15    8               120
17    2               34
      ∑f = 72         ∑fx = 641

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{641}{72} = 8.90$

Consider data given in the form of a frequency distribution where the classes are not single-valued. We no longer know the actual values of the individual observations. Consequently, we cannot find the sum of the values accurately. Therefore, we find the approximate value of this sum using the mid-points of the classes. The mean, $\bar{x}$, is given by

$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fx}{\sum f}$,

where $x_i$ (or $x$) is the mid-point of the class.
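Both versions of the grouped mean use the same formula, $\bar{x} = \frac{\sum fx}{\sum f}$, and differ only in what $x$ stands for. The sketch below is illustrative: the function names `grouped_mean` and `midpoints` are assumptions, and the second data set is hypothetical, chosen only to show the mid-point step.

```python
def grouped_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f


def midpoints(classes):
    """Mid-point of each class given as a (lower, upper) pair."""
    return [(lo + hi) / 2 for lo, hi in classes]


# Single-valued classes: data from Example 14
x = [4, 5, 7, 10, 11, 15, 17]
f = [3, 12, 23, 10, 14, 8, 2]
mean_14 = grouped_mean(x, f)
print(round(mean_14, 2))  # 8.9

# Classes that are not single-valued: hypothetical data for illustration
classes = [(1, 4), (5, 8), (9, 12)]
class_freqs = [2, 5, 3]
mean_classes = grouped_mean(midpoints(classes), class_freqs)
print(mean_classes)  # 6.9
```

Keeping the mid-point step separate makes it clear that the only approximation lies in replacing each class by a single representative value.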


Example 15
Calculate the mean of the following frequency distribution.

Class      Frequency, f
1 – 4      5
5 – 8      13
9 – 12     31
13 – 16    19
17 – 20    8
21 – 24    4

Solution:
For every class, we use the mid-point to represent the class. This means that class 1 – 4 is represented by $\frac{1 + 4}{2} = 2.5$. Similarly, we find the mid-points of the other classes. The following table is constructed to calculate the mean.

Class      Mid-point, x    Frequency, f    fx
1 – 4      2.5             5               12.5
5 – 8      6.5             13              84.5
9 – 12     10.5            31              325.5
13 – 16    14.5            19              275.5
17 – 20    18.5            8               148.0
21 – 24    22.5            4               90.0
                           ∑f = 80         ∑fx = 936

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{936}{80} = 11.7$

Coding method
In the examples discussed above, we only studied frequency distributions where the observations have small values. In actual situations, the values can be very large or very small. A method which transforms the original data is used to simplify the calculation. This method is called the coding method. The technique is to subtract a number from every observation and, if possible, divide the new value by another number to obtain a new set of data with simpler values.

So, the original data $x_1, x_2, \ldots, x_n$ with respective frequencies $f_1, f_2, \ldots, f_n$ are transformed to $y_1, y_2, \ldots, y_n$ with the same respective frequencies $f_1, f_2, \ldots, f_n$ by using the formula

$y = \frac{x - k}{h}$


This formula is called the coding formula, where $k$ is called the assumed mean and $h$ is called the scaling factor, usually the class width.

By definition,

$\bar{y} = \frac{\sum fy}{\sum f} = \frac{\sum f\left(\frac{x - k}{h}\right)}{\sum f} = \frac{1}{h}\left(\frac{\sum fx - \sum fk}{\sum f}\right) = \frac{1}{h}\left(\frac{\sum fx}{\sum f} - \frac{k\sum f}{\sum f}\right) = \frac{1}{h}(\bar{x} - k)$

$h\bar{y} = \bar{x} - k$

Therefore, $\bar{x} = h\bar{y} + k$.

Since the assumed mean and scaling factor only serve to simplify the calculation, we can choose any assumed mean and scaling factor as long as the calculation becomes simple.

Example 16
The mass of each of 300 packets of flour produced by a factory is labelled 1 kg. When the mass of each packet is measured to the nearest 0.001 kg, the following frequency distribution is obtained.

Mass (x kg)          Number of packets
0.980 ≤ x < 0.985    8
0.985 ≤ x < 0.990    22
0.990 ≤ x < 0.995    35
0.995 ≤ x < 1.000    67
1.000 ≤ x < 1.005    94
1.005 ≤ x < 1.010    48
1.010 ≤ x < 1.015    20
1.015 ≤ x < 1.020    6

Calculate the mean mass of the packets.


Solution:
The mid-point of the class 1.000 ≤ x < 1.005, that is 1.0025, is chosen as the assumed mean and $\frac{1}{200}$ is chosen as the scaling factor.

Mass (x kg)          Mid-point, x    y = 200(x – 1.0025)    Frequency, f    fy
0.980 ≤ x < 0.985    0.9825          –4                     8               –32
0.985 ≤ x < 0.990    0.9875          –3                     22              –66
0.990 ≤ x < 0.995    0.9925          –2                     35              –70
0.995 ≤ x < 1.000    0.9975          –1                     67              –67
1.000 ≤ x < 1.005    1.0025          0                      94              0
1.005 ≤ x < 1.010    1.0075          1                      48              48
1.010 ≤ x < 1.015    1.0125          2                      20              40
1.015 ≤ x < 1.020    1.0175          3                      6               18
                                                            ∑f = 300        ∑fy = –129

Mean of y, $\bar{y} = \frac{\sum fy}{\sum f} = \frac{-129}{300} = -0.43$

Since $y = 200(x - 1.0025)$,

$\bar{x} = \frac{\bar{y}}{200} + 1.0025 = \frac{-0.43}{200} + 1.0025 = 1.00035$

Therefore, the mean mass of the packets of flour is 1.00035 kg.

Exercise 1.5
1. Determine the mean of the following data.
(a) 3, 5, 7, 4, 5, 9, 6
(b) 11, 6, 9, 4, 10, 8, 6, 9, 7, 5

2. Find the mean of the data in the following frequency tables.
(a)
x    0   1   2    3    4    5
f    3   6   18   22   17   7
(b)
x    0 – 1   2 – 3   4 – 5   6 – 7   8 – 9   10 – 11   12 – 13
f    2       8       13      9       11      6         3


3. Find the mean of the data in the following frequency tables.
(a)
x    3   4    5    6    7    8    9
f    8   12   25   28   39   16   6
(b)
x    83   84   85   86   87   88   89
f    8    12   25   28   39   16   6
What is the difference between the two means?

4. To obtain grade A, Raymond must achieve an average of at least 75 marks in four tests. If his average mark for the first three tests is 72, calculate the lowest mark he must get in his fourth test in order to obtain grade A.

5. The mean of 5 numbers is 6, and the mean of 4 other numbers is 15. Find the mean of the 9 numbers together.

6. By choosing a suitable assumed mean and scaling factor, use the coding method to find the mean of the data in each frequency table given below.
(a)
x    5.0   5.2   5.4   5.6   5.8
f    6     17    21    13    4
(b)
x    26   27   28   29   30   31   32   33
f    22   18   24   35   29   15   8    6

7. Calculate the mean of the data in the following frequency table.

x    5 – 6   7 – 8   9 – 10   11 – 12   13 – 14   15 – 16   17 – 18
f    10      27      41       32        23        11        4

Deduce the mean of each of the frequency distributions below:

x    125 – 126   127 – 128   129 – 130   131 – 132   133 – 134   135 – 136   137 – 138
f    10          27          41          32          23          11          4

x    25 000 – 26 000   27 000 – 28 000   29 000 – 30 000   31 000 – 32 000   33 000 – 34 000   35 000 – 36 000   37 000 – 38 000
f    10                27                41                32                23                11                4

8. The table below shows the cumulative frequency for the distribution of the number of accidents per year in 210 factories.

Accident cases    Number of factories
< 0               0
< 10              25
< 20              77
< 30              134
< 40              167
< 50              199
< 60              210

Find the mean number of accidents per year.


9. The following table shows the cumulative frequency for the mass of fishes caught in a lake for one month.

Mass (g)    Number of fishes
≥ 300       100
≥ 350       94
≥ 400       72
≥ 450       48
≥ 500       22
≥ 550       22
≥ 600       0

Find the mean mass of the fishes caught in the lake.

10. On a particular day, the number of books in 40 racks in a library is recorded in a frequency table as shown below.

Number of books    31 – 35   36 – 40   41 – 45   46 – 50   51 – 55   56 – 60
Number of racks    4         6         10        13        5         2

Find the mean number of books in each rack. Give your answer correct to two significant figures.

11. The time taken by a computer to calculate a certain problem is used as the standard speed of the computer and is also known as the 'bench mark'. Bench marks for 203 computers tested are shown in the table below.

Bench mark (x seconds)     Number of computers
0.00180 ≤ x < 0.00185      3
0.00185 ≤ x < 0.00190      15
0.00190 ≤ x < 0.00195      27
0.00195 ≤ x < 0.00200      45
0.00200 ≤ x < 0.00205      72
0.00205 ≤ x < 0.00210      30
0.00210 ≤ x < 0.00215      11

Using the substitution y = 10 000(x – 0.002025), obtain a distribution for y. Calculate the mean of y and hence find the mean of x.


1.3 Measures of Dispersion
Measures of dispersion are used to determine the spread of a data set. The measures of dispersion discussed here are the range, the interquartile range, the standard deviation and the variance.

Range (Ungrouped data)
The range is the difference between the largest value and the smallest value in a data set. For example, in a mathematics test, if the highest mark is 83 and the lowest mark is 12, then the range is 83 – 12 = 71.

Range = Largest observation – Smallest observation

The range is the simplest measure of dispersion because it gives the difference between two extreme values. If the range is big, then the set of data has observations which vary a lot. On the other hand, if the range is small, then the set of data contains observations which are close to each other.

The range is not a good measure of dispersion because it depends only on the highest value and the lowest value of a set of data. If there is an unusual (extreme) observation, then the range will give a misleading picture of the data.

Example 17
The heights (in cm) of a group of students in a class are given below:
150 152 155 160 148 149 151 153 156 155 149 161 155 157 153
Find the range.

Solution:
Largest value = 161 cm
Smallest value = 148 cm
Range = largest value – smallest value = 161 cm – 148 cm = 13 cm

Interquartile range (Ungrouped data)
The range of a data set ignores the spread of the internal values and also cannot be used with grouped data. A suitable measure of the spread of the data set is naturally associated with the median and the values at the positions one-quarter and three-quarters of the way through the ordered data. These values are called the lower and upper quartiles respectively.

Quartiles
Quartiles are values which divide a set of data arranged in ascending or descending order into four equal parts as shown below.

10  15  [17]  20  25  [29]  30  35  [38]  40  45

First quartile, Q1 = 17
Median or second quartile, Q2 = 29
Third quartile, Q3 = 38

The first quartile, also known as the lower quartile, is a number such that 25% of the data are less than this number.


The third quartile, also known as the upper quartile, is a number such that 75% of the data are less than this number. The second quartile, also called the median, is the middle observation of the data set.

The quartiles of ungrouped data are usually determined as follows:
1. Arrange the data in ascending order, and find the value of pn, where p is the proportion and n is the size of the data set.
2. If pn is not an integer, round it up to the next integer and determine the corresponding ordered value.
3. If pn is an integer, say r, determine the average of the rth and (r + 1)th ordered values.

Example 18
Find all the quartiles of the data set.
23, 47, 32, 34, 42, 35, 44, 36, 52, 40, 42, 46

Solution:
Arrange the data in ascending order, marking the positions of Q1, Q2 and Q3 with vertical bars:
23, 32, 34, | 35, 36, 40, | 42, 42, 44, | 46, 47, 52

$\frac{1}{4}n = \frac{12}{4} = 3$, $\frac{1}{2}n = 6$ and $\frac{3}{4}n = \frac{36}{4} = 9$ are integers.

First quartile, $Q_1 = \frac{1}{2}(x_3 + x_4) = \frac{1}{2}(34 + 35) = 34.5$
Median, $Q_2 = \frac{1}{2}(x_6 + x_7) = \frac{1}{2}(40 + 42) = 41$
Third quartile, $Q_3 = \frac{1}{2}(x_9 + x_{10}) = \frac{1}{2}(44 + 46) = 45$

Example 19
Find the first quartile, median and the third quartile of each of the following data sets.
(a) 114, 120, 133, 138, 145, 148, 151
(b) 15, 19, 20, 22, 25, 27, 27, 28, 31, 32

Solution:
(a) 114, 120, 133, 138, 145, 148, 151, with Q1, Q2 and Q3 marked at 120, 138 and 148 respectively.


$\frac{1}{4}n = \frac{7}{4} = 1.75$, $\frac{1}{2}n = 3.5$, $\frac{3}{4}n = 5.25$.
First quartile, $Q_1 = x_2 = 120$
Median, $Q_2 = x_4 = 138$
Third quartile, $Q_3 = x_6 = 148$

(b) 15, 19, 20, 22, 25, 27, 27, 28, 31, 32
$\frac{1}{4}n = \frac{10}{4} = 2.5$, $\frac{1}{2}n = 5$, $\frac{3}{4}n = 7.5$
First quartile, $Q_1 = x_3 = 20$
Median, $Q_2 = \frac{1}{2}(x_5 + x_6) = \frac{1}{2}(25 + 27) = 26$
Third quartile, $Q_3 = x_8 = 28$

Interquartile range
The interquartile range is the difference between the third quartile and the first quartile, $Q_3 - Q_1$. This value gives the range of the middle 50% of the data arranged in order. The interquartile range avoids giving a misleading picture when there are one or two extreme values in the data.

Example 20
The table below shows the number of fish reared in each of 25 houses. Find the median and the interquartile range.

Number of fish    0   1   2   3   4   5
Frequency         1   5   8   7   3   1

Solution:
Median = $\left(\frac{25 + 1}{2}\right)$th observation = 13th observation = 2 fish
First quartile, $Q_1 = x_7$ = 2 fish
Third quartile, $Q_3 = x_{19}$ = 3 fish
Interquartile range = $Q_3 - Q_1$ = 3 – 2 = 1 fish
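The three-step rule for the quartiles of ungrouped data translates directly into a short function. This is a sketch under stated assumptions: the name `quantile_pn` is hypothetical, and `p` is assumed to be 0.25, 0.5 or 0.75 so that `p * n` is computed exactly. Statistical software often uses other quartile conventions, so library results may differ from this rule.

```python
import math


def quantile_pn(data, p):
    """Quartile rule from the text: sort the data and compute p * n.
    If p * n is not an integer, round up and take that ordered value;
    if p * n is an integer r, average the r-th and (r + 1)-th ordered values.
    Assumes p is 0.25, 0.5 or 0.75."""
    ordered = sorted(data)
    n = len(ordered)
    pn = p * n
    if pn != int(pn):
        r = math.ceil(pn)
        return ordered[r - 1]          # r-th value (lists are 0-indexed)
    r = int(pn)
    return (ordered[r - 1] + ordered[r]) / 2


example_18 = [23, 47, 32, 34, 42, 35, 44, 36, 52, 40, 42, 46]
example_19b = [15, 19, 20, 22, 25, 27, 27, 28, 31, 32]
# Example 20: expand the frequency table into individual observations
example_20 = [0] * 1 + [1] * 5 + [2] * 8 + [3] * 7 + [4] * 3 + [5] * 1

q18 = [quantile_pn(example_18, p) for p in (0.25, 0.5, 0.75)]
q19b = [quantile_pn(example_19b, p) for p in (0.25, 0.5, 0.75)]
q20 = [quantile_pn(example_20, p) for p in (0.25, 0.5, 0.75)]
print(q18)               # [34.5, 41.0, 45.0]
print(q19b)              # [20, 26.0, 28]
print(q20[2] - q20[0])   # 1
```

The last line reproduces the interquartile range of 1 fish found in Example 20.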


Interquartile range (Grouped data)
Using calculation
For grouped data with total frequency $\sum f$,

First quartile, $Q_1 = \left(\frac{1}{4}\sum f\right)$th observation
Third quartile, $Q_3 = \left(\frac{3}{4}\sum f\right)$th observation

The method of interpolation used in determining the median can be used to determine the quartiles too. For example, to determine the first quartile, $Q_1$, we take the class which contains the $\left(\frac{1}{4}\sum f\right)$th observation.

[Figure: the class containing $Q_1$, from lower boundary $L$ to upper boundary $U$; the cumulative frequency is $F_B$ at $L$, $\frac{1}{4}\sum f$ at $Q_1$, and $F_A$ (the cumulative frequency up to the class containing $Q_1$) at $U$.]

Hence,

$\frac{Q_1 - L}{U - L} = \frac{\frac{1}{4}\sum f - F_B}{F_A - F_B}$

$Q_1 - L = \left[\frac{\frac{1}{4}\sum f - F_B}{F_A - F_B}\right](U - L)$

Since $U - L = c$ and $F_A - F_B = f$,

$\therefore\ Q_1 = L + \left[\frac{\frac{1}{4}\sum f - F_B}{f}\right]c$

where
$L$ = lower class boundary of the class containing the first quartile,
$F_B$ = cumulative frequency before the class containing the first quartile,
$f$ = frequency of the class containing the first quartile,
$c$ = width of the class containing the first quartile.

The third quartile can be similarly determined by the formula

$Q_3 = L + \left[\frac{\frac{3}{4}\sum f - F_B}{f}\right]c$

where
$L$ = lower boundary of the class containing the third quartile,
$F_B$ = cumulative frequency before the class containing the third quartile,
$f$ = frequency of the class containing the third quartile,
$c$ = width of the class containing the third quartile.

The interquartile range is given by
Interquartile range = $Q_3 - Q_1$
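The two quartile formulas differ only in the proportion used, so one function can serve both. The following is a minimal sketch with a hypothetical function name, `grouped_quantile`, applied to the height distribution of Example 21 that follows.

```python
def grouped_quantile(boundaries, freqs, p):
    """Estimate the value at position p * sum(f) by linear interpolation
    inside the class that contains it: L + ((p * sum(f) - F_B) / f) * c."""
    target = p * sum(freqs)
    cum_before = 0                      # F_B
    for i, f in enumerate(freqs):
        if f > 0 and cum_before + f >= target:
            lower = boundaries[i]                       # L
            width = boundaries[i + 1] - boundaries[i]   # c
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("p must lie between 0 and 1 and some frequency must be positive")


# Height data from Example 21 (class boundaries 150, 155, ..., 180)
boundaries = [150, 155, 160, 165, 170, 175, 180]
freqs = [15, 32, 68, 52, 24, 12]
q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(round(q1, 2), round(q3, 2))  # 160.28 168.58
print(round(q3 - q1, 2))           # 8.31
```

The unrounded difference is about 8.31 cm. Subtracting the rounded quartiles, 168.58 – 160.28, gives 8.30 cm, so a small discrepancy in the last digit comes from rounding only.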


Example 21
The following table shows the heights of a group of students. Calculate the estimates of the first quartile, third quartile and the interquartile range.

Height (cm)    150 – 155   155 – 160   160 – 165   165 – 170   170 – 175   175 – 180
Frequency      15          32          68          52          24          12

Solution:
Total number of students, $\sum f$ = 15 + 32 + 68 + 52 + 24 + 12 = 203

First quartile, $Q_1 = \left(\frac{203}{4}\right)$th observation = 50.75th observation
Total frequency up to the class 155 – 160 = 47
Total frequency up to the class 160 – 165 = 115
Therefore, the first quartile is in the class 160 – 165.

Lower boundary of the class containing the first quartile, $L$ = 160
Cumulative frequency before the class containing the first quartile, $F_B$ = 47
Frequency of the class containing the first quartile, $f$ = 68
Width of the class containing the first quartile, $c$ = 165 – 160 = 5

First quartile, $Q_1 = L + \left[\frac{\frac{1}{4}\sum f - F_B}{f}\right]c = 160 + \left[\frac{\frac{1}{4}(203) - 47}{68}\right](5) = 160.28$ cm

Third quartile, $Q_3 = \left[\frac{3}{4}(203)\right]$th observation = 152.25th observation
Total frequency up to the class 160 – 165 = 115
Total frequency up to the class 165 – 170 = 167
Therefore, the third quartile is in the class 165 – 170.

Lower boundary of the class containing the third quartile, $L$ = 165
Cumulative frequency before the class containing the third quartile, $F_B$ = 115
Frequency of the class containing the third quartile, $f$ = 52
Width of the class containing the third quartile, $c$ = 5

Third quartile, $Q_3 = L + \left[\frac{\frac{3}{4}\sum f - F_B}{f}\right]c = 165 + \left[\frac{\frac{3}{4}(203) - 115}{52}\right](5) = 168.58$ cm


Interquartile range = $Q_3 - Q_1$ = 168.58 – 160.28 = 8.3 cm

Using a cumulative frequency curve
After a cumulative frequency curve is drawn, we can determine the median and quartiles directly from the curve as shown in the example below.

Example 22
The table below shows the distribution of the weights of babies (in kg) born in a hospital from January to June.
(a) Construct a cumulative frequency table and plot a cumulative frequency curve.
(b) Determine the median and the interquartile range.

Weight (kg)    0.0 – 1.0   1.0 – 2.0   2.0 – 3.0   3.0 – 4.0   4.0 – 5.0   5.0 – 6.0
Number         12          233         442         185         96          32

Solution:
(a)
Weight (kg)             < 0.0   < 1.0   < 2.0   < 3.0   < 4.0   < 5.0   < 6.0
Cumulative frequency    0       12      245     687     872     968     1000

Based on this table, we draw a cumulative frequency curve as shown below.

[Figure: cumulative frequency curve; vertical axis: cumulative frequency, 0 to 1 000; horizontal axis: weight (kg), 1.0 to 6.0; Q1 and Q3 are marked on the weight axis.]

(b) Median, $Q_2$ = 2.5 kg
First quartile, $Q_1 = \frac{1}{4}(1000)$th observation = 250th observation = 2.0 kg


Third quartile, $Q_3 = \frac{3}{4}(1000)$th observation = 750th observation = 3.3 kg
Interquartile range = 3.3 – 2.0 = 1.3 kg

Exercise 1.6
1. Find the range and interquartile range for each data set below.
(a) 9, 11, 8, 9, 4, 10, 12, 8, 16, 5, 9, 8
(b) 58, 62, 37, 59, 46, 51, 40, 33, 55, 43, 22, 63, 54
(c) 2.7, 3.1, 3.2, 3.2, 3.4, 3.5, 3.8, 3.9, 4.0, 4.0, 4.3

2. The following data shows the masses (to the nearest g) of 20 eggs collected on a certain day by a farmer.
87, 94, 103, 101, 89, 79, 90, 111, 108, 92, 98, 110, 96, 89, 98, 112, 95, 106, 109, 93
(a) Find the median, first quartile and third quartile of the masses of the eggs.
(b) A roti canai seller orders eggs which are of medium mass. If 50% of the eggs of medium mass are delivered, what is the range of their masses?

3. Find the median and interquartile range for the data in the stemplots below.
(a) Stem Leaf 1 2 3 4 4 5 7 0 2 6 6 7 8 8 9 5 6 9 9
Key: 3 | 8 means 38 cm
(b) Stem Leaf 15 20 25 30 35 0 2 3 4 1 1 2 2 3 3 4 0 1 2 3 4 4 0 1 1 1 3 1 2 2 3 4
Key: 15 | 2 means 17 kg

4. Find the range and interquartile range of each of the following frequency distributions.
(a)
Score        1   2   3   4    5    6    7   8   9   10
Frequency    0   2   6   10   12   15   6   5   2   1
(b)
Number of children    0   1   2   3   4   5
Frequency             2   4   6   7   2   1


5. The cumulative frequency curve below shows the cumulative expenditure of 260 employees in the public sector who buy luxury items in the month of December.

[Figure: cumulative frequency curve; vertical axis: number of employees, 0 to 300; horizontal axis: expenditure (RM), 50 to 300.]

Determine the median and interquartile range.

6. The cumulative frequency curve below shows the number of bags reported lost each day by passengers of an airline.

[Figure: cumulative frequency curve; vertical axis: cumulative frequency (days), 0 to 200; horizontal axis: number of bags, 5 to 40.]

Determine the first quartile and the third quartile. The workers in an airline company know that the number of lost bags is high on busy days. If 30% of the days are assumed to be busy, find the least number of bags lost per day on busy days.

7. The cumulative frequency curve below represents the cumulative frequency of the loads carried by 800 lorries during a spot check by the Road Transport Department.

[Figure: cumulative frequency curve; vertical axis: cumulative frequency (number of lorries), 0 to 800; horizontal axis: load (tonnes), 10 to 40.]


(a) Find the median and the interquartile range.
(b) If the Road Transport Department reports that 35% of the lorries are overloaded, determine the maximum load that is approved for lorries.

8. The cumulative frequency curve below shows the daily revenues of a supermarket.

[Figure: cumulative frequency curve; vertical axis: cumulative frequency (%), 0 to 100; horizontal axis: earnings (RM1 000), 75 to 120; the values 87, 93, 102 and 106 are marked on the graph.]

(a) Find the median and interquartile range.
(b) The manager of the supermarket finds that the sales for 10% of the selling days are 'bad' because there are no profits after deducting costs. Determine the level of sales which is considered 'bad'.
(c) If the manager finds that 15% of the selling days are 'good', what is the level of sales considered 'good'?

9. The cumulative frequency table below shows the travelling times to school for 472 students in a school.

Time (minutes)          < 0   < 5   < 10   < 15   < 20   < 25   < 30   < 35
Cumulative frequency    0     102   255    340    402    435    457    472

Calculate estimates of the first quartile, third quartile and the interquartile range for the travelling times to school.

10. The marks obtained by 134 students in an examination are recorded in the following table.

Marks        20 – 29   30 – 39   40 – 49   50 – 59   60 – 69   70 – 79   80 – 89
Frequency    22        18        22        24        14        14        20

Calculate estimates of the first quartile, third quartile and interquartile range.


11. The table below shows the distances of javelin throws made by an athlete during a certain period of time.

Distance (x m)       Frequency
18.0 ≤ x < 19.0      3
19.0 ≤ x < 20.0      8
20.0 ≤ x < 21.0      18
21.0 ≤ x < 22.0      36
22.0 ≤ x < 23.0      16
23.0 ≤ x < 24.0      3

Calculate estimates of the lower quartile, upper quartile and interquartile range.

12. A frequency table is given below.

x    5 – 7   8 – 10   11 – 13   14 – 16   17 – 19   20 – 22   23 – 25   26 – 28
f    4       8        16        28        29        19        10        3

Plot a cumulative frequency curve. Hence, determine the median and the interquartile range.

13. The table below shows a frequency distribution of the weights, to the nearest kg, of 156 students in a school.

Weight (kg)    40 – 44   45 – 49   50 – 54   55 – 59   60 – 64   65 – 69   70 – 74
Frequency      9         6         21        54        54        9         3

(a) Construct a cumulative frequency table and plot a cumulative frequency curve.
(b) How many students weigh less than 57 kg?
(c) How many students are heavier than 61 kg?
(d) Estimate the median and the interquartile range.

14. There are 8481 candidates who sit for a Mathematics paper in the STPM examination in a certain year and the results obtained are shown in the following table.

Marks       Number of candidates
0 – 10      48
11 – 20     216
21 – 30     822
31 – 40     1057
41 – 50     1492
51 – 60     1683
61 – 70     1522
71 – 80     1011
81 – 90     522
91 – 100    108

(a) Plot a cumulative frequency curve.
(b) Estimate the range of marks obtained by the middle 80% of the candidates.
(c) If 55 marks is fixed as the passing mark, how many candidates will pass?


Standard deviation (Ungrouped data)
Deviation is the difference between values. In statistics, deviation refers to the difference between an observation and the mean of the data. Since the mean is the central value for all observations in a set of data, deviation is an effective measure of how close or how far an observation is from the mean.

The standard deviation of a set of data is the square root of the mean of the squares of the deviations from the mean. Algebraically, the standard deviation, $s$, for $n$ observations is given by

$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

The standard deviation is the most important measure of dispersion and is often used. If the spread of a set of data is wide, then the standard deviation is also large. If a set of data has no spread at all, that is, when all the values of the observations are the same, then the standard deviation is zero.

Example 23
Find the standard deviation of the data 4, 5, 6, 7, 8, 9, 10.

Solution:
Mean, $\bar{x} = \frac{\sum x}{n} = \frac{4 + 5 + 6 + 7 + 8 + 9 + 10}{7} = \frac{49}{7} = 7$

x     x – x̄    (x – x̄)²
4     –3       9
5     –2       4
6     –1       1
7     0        0
8     1        1
9     2        4
10    3        9
               ∑(x – x̄)² = 28

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2$

Standard deviation = 2

If the mean is not an integer, the calculation of the standard deviation using the formula $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ becomes complicated because $(x - \bar{x})$ and $(x - \bar{x})^2$ involve decimal numbers. To overcome this problem, we will derive an alternative formula for $s$.


An alternative form for $\sum (x - \bar{x})^2$ is derived as follows.

$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i^2 - 2\bar{x}x_i + \bar{x}^2)$
$= \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + \sum_{i=1}^{n}\bar{x}^2$
$= \sum_{i=1}^{n}x_i^2 - 2\bar{x}(n\bar{x}) + n\bar{x}^2$   (since $\bar{x} = \frac{\sum_{i=1}^{n}x_i}{n}$, we have $n\bar{x} = \sum_{i=1}^{n}x_i$)
$= \sum_{i=1}^{n}x_i^2 - 2n\bar{x}^2 + n\bar{x}^2$
$= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2$

In short, $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$.

Hence, standard deviation,

$s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n}}$

$s = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$

$s = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$

Example 24
Find the mean and standard deviation of the data.
13, 23, 35, 6, 28, 35, 48, 12, 37

Solution:
Number of observations, n = 9

x     13    23    35     6    28    35     48     12    37      ∑x = 237
x²    169   529   1225   36   784   1225   2304   144   1369    ∑x² = 7785

Mean, $\bar{x} = \frac{\sum x}{n} = \frac{237}{9} = 26.3$

Standard deviation, $s = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{7785}{9} - \left(\frac{237}{9}\right)^2} = \sqrt{171.56} = 13.1$
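The definition and the alternative form should give the same standard deviation, which is easy to confirm numerically. The sketch below uses the data of Example 24; the function names `sd_definition` and `sd_shortcut` are illustrative assumptions.

```python
import math


def sd_definition(data):
    """Standard deviation from the definition: sqrt(sum((x - mean)^2) / n)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_shortcut(data):
    """Standard deviation from the alternative form: sqrt(sum(x^2) / n - mean^2)."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum(x * x for x in data) / n - mean ** 2)


data = [13, 23, 35, 6, 28, 35, 48, 12, 37]   # Example 24
mean = sum(data) / len(data)
print(round(mean, 1))                 # 26.3
print(round(sd_definition(data), 1))  # 13.1
print(round(sd_shortcut(data), 1))    # 13.1
```

Both functions divide by n, matching the formula in the text. Python's standard library function `statistics.pstdev` uses the same divisor and can serve as an independent check.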


Standard deviation (Grouped data)
Consider data given in the form of a frequency distribution with single-valued classes. If $x_1, x_2, x_3, \ldots, x_n$ represent different values of a data set and $f_1, f_2, f_3, \ldots, f_n$ represent the respective frequencies, then the standard deviation, $s$, for the set of data is

$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

where $\bar{x}$ represents the mean of the data.

Example 25
Find the mean and standard deviation of the data below.

Observation    Frequency
3              4
5              22
8              35
9              36
12             17
15             6

Solution:

Observation, x    Frequency, f    fx            x – x̄    f(x – x̄)²
3                 4               12            –5.5     121.00
5                 22              110           –3.5     269.50
8                 35              280           –0.5     8.75
9                 36              324           0.5      9.00
12                17              204           3.5      208.25
15                6               90            6.5      253.50
                  ∑f = 120        ∑fx = 1020             ∑f(x – x̄)² = 870

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1020}{120} = 8.5$

Standard deviation, $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{870}{120}} = 2.69$ (three significant figures)


Consider data given in the form of a frequency distribution where the classes are not single-valued. The mid-point of each class is used to represent the respective class value. This means that if $x_1, x_2, x_3, \ldots, x_n$ represent the mid-points of the classes and $f_1, f_2, f_3, \ldots, f_n$ represent the frequencies of the corresponding classes, then the standard deviation is

$s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}}$

Example 26
Find the mean and standard deviation of the frequency distribution below.

Class        Frequency
0 – 9.9      5
10 – 19.9    13
20 – 29.9    23
30 – 39.9    31
40 – 49.9    16

Solution:
We use the mid-point of the class to represent the class interval. For example, 4.95 $\left[= \frac{1}{2}(0 + 9.9)\right]$ is used to represent the class 0 – 9.9.

Class        Mid-point, x    Frequency, f    fx               x – x̄     f(x – x̄)²
0 – 9.9      4.95            5               24.75            –24.55    3013.51
10 – 19.9    14.95           13              194.35           –14.55    2752.13
20 – 29.9    24.95           23              573.85           –4.55     476.16
30 – 39.9    34.95           31              1083.45          5.45      920.78
40 – 49.9    44.95           16              719.20           15.45     3819.24
                             ∑f = 88         ∑fx = 2595.60              ∑f(x – x̄)² = 10 981.82

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{2595.60}{88} = 29.50$

Standard deviation, $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{10\,981.82}{88}} = \sqrt{124.79} = 11.17$


Note that the formula used to find the standard deviation is cumbersome. Hence, we derive an alternative formula for the standard deviation so that the calculation is simpler.

$\sum f(x - \bar{x})^2 = \sum fx^2 - 2\sum fx\bar{x} + \sum f\bar{x}^2$
$= \sum fx^2 - 2\bar{x}\left(\sum fx\right) + \bar{x}^2\sum f$
$= \sum fx^2 - 2\bar{x}\left(\bar{x}\sum f\right) + \bar{x}^2\sum f$   (since $\bar{x} = \frac{\sum fx}{\sum f}$)
$= \sum fx^2 - \bar{x}^2\sum f$

Standard deviation, $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{\sum fx^2 - \bar{x}^2\sum f}{\sum f}}$

Hence, $s = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2}$

Example 27
By referring to the data given in Example 25, find the standard deviation.

Solution:

Observation, x    Frequency, f    fx            fx²
3                 4               12            36
5                 22              110           550
8                 35              280           2240
9                 36              324           2916
12                17              204           2448
15                6               90            1350
                  ∑f = 120        ∑fx = 1020    ∑fx² = 9540

Standard deviation, $s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{9540}{120} - \left(\frac{1020}{120}\right)^2} = \sqrt{7.25} = 2.69$
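Examples 25 and 27 compute the same standard deviation by two routes. The sketch below carries out both in one function so that the agreement can be seen directly; the function name `grouped_mean_sd` is a hypothetical choice made for this illustration.

```python
import math


def grouped_mean_sd(values, freqs):
    """Return (mean, sd by definition, sd by the alternative formula)
    for values x with frequencies f."""
    total_f = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total_f
    # Definition: sqrt(sum(f * (x - mean)^2) / sum(f))
    sd_def = math.sqrt(
        sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total_f
    )
    # Alternative form: sqrt(sum(f * x^2) / sum(f) - mean^2)
    sd_alt = math.sqrt(
        sum(f * x * x for x, f in zip(values, freqs)) / total_f - mean ** 2
    )
    return mean, sd_def, sd_alt


# Data from Examples 25 and 27
x = [3, 5, 8, 9, 12, 15]
f = [4, 22, 35, 36, 17, 6]
mean, sd_def, sd_alt = grouped_mean_sd(x, f)
print(mean, round(sd_def, 2), round(sd_alt, 2))  # 8.5 2.69 2.69
```

For classes that are not single-valued, the same function applies with the class mid-points passed as `values`.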


Example 28
By referring to the frequency distribution given in Example 26, find the standard deviation.

Solution:

Class        Mid-point, x    Frequency, f    fx               fx²
0 – 9.9      4.95            5               24.75            122.51
10 – 19.9    14.95           13              194.35           2905.53
20 – 29.9    24.95           23              573.85           14 317.56
30 – 39.9    34.95           31              1083.45          37 866.58
40 – 49.9    44.95           16              719.20           32 328.04
                             ∑f = 88         ∑fx = 2595.60    ∑fx² = 87 540.22

Standard deviation, $s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{87\,540.22}{88} - \left(\frac{2595.60}{88}\right)^2} = \sqrt{124.79} = 11.17$

Coding method
The calculation of the standard deviation involves squaring numbers and hence results in large numbers. Just as in the calculation of the mean, we can use the coding method to transform the original data to a simpler form.

We know that the coding formula is $y = \frac{x - k}{h}$, where $k$ is the assumed mean and $h$ is the scaling factor.

Standard deviation of y, $s_y = \sqrt{\frac{\sum f(y - \bar{y})^2}{\sum f}}$

$s_y^2 = \frac{\sum f\left(\frac{x - k}{h} - \frac{\bar{x} - k}{h}\right)^2}{\sum f} = \frac{1}{h^2}\cdot\frac{\sum f(x - \bar{x})^2}{\sum f}$

$s_y^2 = \frac{1}{h^2}s_x^2$

$s_x = hs_y$

Therefore, the standard deviation of the original values = h × standard deviation of the coded values y.


Example 29
Find the standard deviation of the population of people in 150 towns as shown in the table below.

Population           Number of towns
100 000 – 109 999    4
110 000 – 119 999    5
120 000 – 129 999    13
130 000 – 139 999    26
140 000 – 149 999    37
150 000 – 159 999    21
160 000 – 169 999    17
170 000 – 179 999    16
180 000 – 189 999    11

Solution:
Let the assumed mean, k = 104 999.5 and the scaling factor, h = 10 000, so that $y = \frac{x - 104\,999.5}{10\,000}$.

Population           Mid-point, x    Number of towns, f    y    fy           fy²
100 000 – 109 999    104 999.5       4                     0    0            0
110 000 – 119 999    114 999.5       5                     1    5            5
120 000 – 129 999    124 999.5       13                    2    26           52
130 000 – 139 999    134 999.5       26                    3    78           234
140 000 – 149 999    144 999.5       37                    4    148          592
150 000 – 159 999    154 999.5       21                    5    105          525
160 000 – 169 999    164 999.5       17                    6    102          612
170 000 – 179 999    174 999.5       16                    7    112          784
180 000 – 189 999    184 999.5       11                    8    88           704
                                     ∑f = 150                   ∑fy = 664    ∑fy² = 3508

$s_y^2 = \frac{\sum fy^2}{\sum f} - \left(\frac{\sum fy}{\sum f}\right)^2 = \frac{3508}{150} - \left(\frac{664}{150}\right)^2 = 3.7913$

$s_y = 1.9471$

$s_x = 10\,000\,s_y = 19\,471$

Standard deviation of the population = 19 471
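The coding relations $\bar{x} = h\bar{y} + k$ and $s_x = hs_y$ can be verified against a direct calculation on the uncoded mid-points. The following sketch uses the data of Example 29; the helper `mean_sd` is a hypothetical name introduced only for this check.

```python
import math


def mean_sd(values, freqs):
    """Mean and standard deviation of values with the given frequencies."""
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n
    variance = sum(f * v * v for v, f in zip(values, freqs)) / n - mean ** 2
    return mean, math.sqrt(variance)


# Example 29: class mid-points 104 999.5, 114 999.5, ..., 184 999.5
midpoints = [104999.5 + 10000 * i for i in range(9)]
freqs = [4, 5, 13, 26, 37, 21, 17, 16, 11]
k, h = 104999.5, 10000            # assumed mean and scaling factor

# Coded values y = (x - k) / h, which are 0, 1, ..., 8 here
y = [(x - k) / h for x in midpoints]
mean_y, sd_y = mean_sd(y, freqs)

# Decode: mean of x = h * mean of y + k, sd of x = h * sd of y
mean_x = h * mean_y + k
sd_x = h * sd_y
print(round(sd_y, 4))   # 1.9471
print(round(sd_x))      # 19471

# Direct calculation on the original mid-points, for comparison
direct_mean, direct_sd = mean_sd(midpoints, freqs)
print(math.isclose(sd_x, direct_sd, rel_tol=1e-6))      # True
print(math.isclose(mean_x, direct_mean, rel_tol=1e-9))  # True
```

The direct route squares numbers of the order of $10^5$, while the coded route works with the integers 0 to 8, which is the practical point of the coding method.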


Example 30
Sugar is packed in bags labelled 20 kg. Fifty bags are examined and the mass, x kg, of each bag is determined. The following results are obtained:
$\sum (x - 20) = 21.4$, $\sum (x - 20)^2 = 43.5$.
Find the mean and standard deviation.

Solution:
Let y = x – 20. Then $\sum y = 21.4$ and $\sum y^2 = 43.5$.

$\bar{y} = \frac{\sum y}{n} = \frac{21.4}{50} = 0.428$

$s_y^2 = \frac{\sum y^2}{n} - \bar{y}^2 = \frac{43.5}{50} - 0.428^2 = 0.6868$

$s_y = 0.829$

Therefore, $\bar{x} = \bar{y} + 20 = 0.428 + 20 = 20.428$ and $s_x = s_y = 0.829$.

Hence, the mean mass is 20.43 kg and the standard deviation is 0.83 kg.

Variance
Variance is the square of the standard deviation, that is, variance = (standard deviation)². Hence, standard deviation = $\sqrt{\text{variance}}$.

Variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$ for ungrouped data.

Variance, $s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$ for grouped data.

The variance does not have the same dimension as the other statistical measures. For example, if the original data has cm as its unit, then the mean, median and mode are also in cm, while the unit of the variance is cm². Because of this, variance is seldom used for comparison between data. In most calculations, the standard deviation is more commonly used because it has the same dimension as the original data.
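Example 30 works from summary totals rather than raw data, and the same steps can be followed in a few lines. This is a minimal sketch; the variable names are illustrative assumptions, and the shift of 20 is the value subtracted in the example.

```python
import math

# Summary totals from Example 30, where y = x - 20
n = 50
sum_y = 21.4
sum_y2 = 43.5
shift = 20

mean_y = sum_y / n
var_y = sum_y2 / n - mean_y ** 2     # variance = (standard deviation)^2
sd_y = math.sqrt(var_y)

# Subtracting a constant shifts the mean but leaves the spread unchanged
mean_x = mean_y + shift
sd_x = sd_y

print(round(mean_x, 3))  # 20.428
print(round(var_y, 4))   # 0.6868
print(round(sd_x, 3))    # 0.829
```

Here the scaling factor is 1, so $s_x = s_y$; with a scaling factor $h$ the decoded standard deviation would be $hs_y$ and the variance would scale by $h^2$.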

