
Pra-U STPM Maths(T) Semester 3 2022 CC039332c


Published by PENERBITAN PELANGI SDN BHD, 2023-09-26 20:50:18


Mathematics Semester 3 STPM, Chapter 5: Hypothesis Testing

This is a one-tailed test with a critical region at the left tail. To locate the z value, we look for an area of 0.01 in the normal distribution table. From the table, the z value is approximately $-2.33$.

The critical region: $z < -2.33$.

Step 4: Calculate the value of the test statistic.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3850 - 4100}{680/\sqrt{25}} = -1.838$$

Step 5: Make a decision.
To make a decision, we compare the value of the test statistic to the critical value. This value of $z = -1.838$ is greater than the critical value of $-2.33$ and thus it falls in the nonrejection region. We do not reject $H_0$ that the average monthly salary of an executive is RM4100.

Note: Example 5: Normal population and small sample, known population variance.

Example 6
The length of a particular type of iron nail produced by a manufacturer has standard deviation 6.8 mm. The target length for an iron nail is 38 mm. A supervisor measures the lengths of a random sample of 100 nails and obtains a sample mean length of 39.4 mm. Test whether the mean length is on target. Use the 5% significance level.

Solution:
Let $\mu$ be the mean length of an iron nail and $\bar{x}$ be the corresponding sample mean.
Given information: $\mu = 38$ mm, $\sigma = 6.8$ mm, $n = 100$, $\bar{x} = 39.4$ mm.
We are going to test whether the mean length of nails meets the target length of 38 mm. The significance level $\alpha$ is given as 0.05. The following are the five basic steps in testing the hypothesis.

Step 1: State the null hypothesis and the alternative hypothesis.
$H_0: \mu = 38$ mm, $H_1: \mu \neq 38$ mm.

Step 2: Specify the significance level.
$\alpha = 0.05$.

Step 3: Select an appropriate probability distribution and determine the critical regions.
The population standard deviation $\sigma$ is known and the sample size is large. Hence, the sampling distribution of $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. We will use the normal distribution to perform the test.
This is a two-tailed test with two critical regions, one at each tail. Since the total area of the critical regions is 0.05, the area of the critical region at each tail is 0.025. To locate the z values, we look for areas of 0.025 and 0.975 in the normal distribution table. From the table, the z values are $-1.96$ and $1.96$.

The critical regions: $z < -1.96$ and $z > 1.96$.

Step 4: Calculate the value of the test statistic.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{39.4 - 38}{6.8/\sqrt{100}} = 2.059$$

Step 5: Make a decision.
To make a decision, we compare the value of the test statistic to the critical value. This value of $z = 2.059$ is greater than the critical value of 1.96 and thus it falls in the critical region. We reject $H_0$ and conclude that the mean length of the iron nails produced by the manufacturer does not meet the target length of 38 mm.

Note: Example 6: Large sample, known population variance.

Population mean, variance unknown
In practice, the population variance $\sigma^2$ is usually not known. However, as long as the sample size is large, the normal approximation for the sample mean $\bar{X}$ remains valid even if $\sigma$ is replaced by the estimate $\hat{\sigma}$, where $\hat{\sigma}^2$ is the unbiased estimate of $\sigma^2$. We can then apply the general test procedure using the test statistic

$$Z = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}.$$

Note: $\hat{\sigma}^2 = \frac{n}{n-1}s^2$, where $s^2$ is the sample variance given by $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$.

Example 7
A sample of 60 watches of a particular brand is checked for accuracy at 10:00:00 hours. Let $\mu$ denote the true mean watch reading when the actual time is 10:00:00 hours. The resulting sample mean and sample standard deviation are 10:00:01.2 hours and 1.8 seconds respectively. Use a significance level of 1% to decide whether the evidence from the sample suggests the watches are fast.

Solution:
Let $\bar{x}$ be the mean watch reading for the sample.
Given information: $\mu = $ 10:00:00 hours, $s = 1.8$ seconds, $n = 60$, $\bar{x} = $ 10:00:01.2 hours.
$\hat{\sigma}$ is calculated as

$$\hat{\sigma} = \sqrt{\frac{n}{n-1}s^2} = \sqrt{\frac{60}{59} \times 1.8^2} = 1.815 \text{ seconds}$$

We are going to test whether the mean watch reading is fast. The significance level $\alpha$ is 0.01. We carry out a hypothesis test using the following five steps.

Step 1: State the null hypothesis and the alternative hypothesis.
$H_0: \mu = $ 10:00:00 hours, $H_1: \mu > $ 10:00:00 hours.

Step 2: Specify the significance level.
$\alpha = 0.01$.

Step 3: Select an appropriate probability distribution and determine the critical region.

The population standard deviation is not known and the sample size is large. We will use the normal distribution to perform the test.
This is a one-tailed test with a critical region at the right tail. To locate the z value, we look for an area of 0.01 in the upper tail of the normal distribution table. From the table, the z value is approximately 2.33.

The critical region: $z > 2.33$.

Step 4: Calculate the value of the test statistic, working in seconds after 10:00:00.

$$z = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{1.2 - 0}{1.815/\sqrt{60}} = 5.121$$

Step 5: Make a decision.
To make a decision, we compare the value of the test statistic to the critical value. This value of $z = 5.121$ is greater than the critical value of 2.33 and thus it falls in the critical region. We reject $H_0$ and conclude that the true average watch reading is faster than the actual time of 10:00:00 hours.

Relationship between hypothesis tests and confidence intervals
Hypothesis testing is closely related to estimation by a confidence interval. For the case of a normal population with unknown mean $\mu$ and known $\sigma^2$, both hypothesis testing and confidence interval estimation are based on the random variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

It turns out that testing $H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$ at a significance level $\alpha$ is equivalent to calculating a $100(1 - \alpha)\%$ confidence interval for $\mu$. If $\mu_0$ is outside the $100(1 - \alpha)\%$ confidence interval, then $H_0$ is rejected at level $\alpha$, and if $\mu_0$ is inside the $100(1 - \alpha)\%$ confidence interval, then $H_0$ is not rejected at level $\alpha$.

Exercise 5.2
1. Determine the critical region and the critical value for z at $\alpha = 0.02$ in the hypothesis test of $H_0: \mu = 50$ against $H_1: \mu < 50$.
2. Find the critical values of $\bar{x}$ given that $H_0: \mu = 850$, $H_1: \mu \neq 850$, $\sigma = 36$, $n = 80$. Use $\alpha = 0.05$.
3. Given that $H_0: \mu = 250$, $H_1: \mu \neq 250$ with $n = 80$, $\bar{x} = 248$, $\sigma = 10.7$. State whether the null hypothesis $H_0$ should or should not be rejected at $\alpha = 0.02$.
4. For the hypothesis test $H_0: \mu = 16.8$, $H_1: \mu > 16.8$, what is your conclusion at $\alpha = 0.1$ if $n = 45$, $\bar{x} = 17.5$, $s = 2.5$?
5. In a random sample of 20 observations from a normally distributed population with standard deviation $\sigma = 27.6$, the sample mean is $\bar{x} = 110$. Use $\alpha = 0.01$ to perform the following hypothesis test: $H_0: \mu = 120$, $H_1: \mu < 120$. Next, change the sample size to 200 and perform the test again. Explain why you make different decisions as the sample size increases.

6. An electrical company manufactures light bulbs whose lifetimes, measured in hours, are approximately normally distributed. The company claims that the light bulbs have a mean lifetime of 1000 hours with a standard deviation of 86 hours. A customer suspects that the mean lifetime may be lower. He tests a random sample of 35 light bulbs and finds that the average lifetime is 960 hours. At the significance level of 5%, does the data provide evidence against the company's claim that the mean lifetime of a light bulb is 1000 hours?
7. A manufacturer claims that the mean fat content of his hot dogs is 10%. Assume that the percentage fat content is normally distributed with a standard deviation of 3%. A consumer group, concerned about the fat content of hot dogs, submits a random sample of 30 hot dogs to a laboratory for analysis. The laboratory result shows that the mean fat content of the hot dogs is 12%. Carry out an appropriate hypothesis test, using a significance level of 1%, in order to advise the consumer group as to the validity of the manufacturer's claim.
8. Nine babies are born in a hospital on a particular day. The weights, in kg, of the babies are recorded as follows:
   2.9 3.2 3.0 2.8 3.4 2.7 3.1 3.0 2.8
   Assume that the sample is from a normally distributed population with standard deviation 0.3 kg. Carry out a test, at the 5% significance level, of the hypothesis that the population mean weight is 3 kg.
9. A car manufacturer advertises that a car has a fuel consumption of 20 km/l. A random sample of 30 cars gives a mean petrol mileage of 18.7 km/l with standard deviation 3.81 km/l. Assume that the sample is from a normal population. Test the manufacturer's claim at the 5% significance level.
10. The random variable X has a normal distribution with standard deviation $\sigma = 21$. A random sample of size 16 gives the sample mean $\bar{x} = 89$. Determine a 90% confidence interval for the population mean. Then, test $H_0: \mu = 95$ against $H_1: \mu \neq 95$ using $\alpha = 0.1$. Explain how this confidence interval can be used to test the hypothesis.
11. Production records show that a machine makes coins with a mean diameter of 24.5 mm. An inspector selects a random sample of 100 coins and obtains a mean diameter of 25.3 mm with a standard deviation of 2.57 mm. Determine, at the 1% significance level, whether the machine has slipped out of normal operation.
12. The national mean cholesterol level is approximately 220 units. 50 patients with high cholesterol levels (over 265) participate in a drug study and are treated with a new drug. After treatment the sample mean is 235 and the sample standard deviation is 41. One question of interest is whether people taking this new drug still have a mean cholesterol level that exceeds the national average. What conclusion would you draw from this study by using a significance level of 2%?
13. A random sample of 40 steel bars is taken from one of the production lines. It is found that the sample mean and sample standard deviation are 28 kg and 4.5 kg respectively. Investigate the claim that the mean mass of a steel bar is 30 kg. Use a significance level of 5%.
14. It is claimed that a car owner drives, on average, more than 25 000 km per year. To test this claim, a random sample of 60 car owners are asked to keep a record of the distance they travel. Would you agree with this claim if the random sample shows an average of 26 500 km and a standard deviation of 7590 km? Use a significance level of 5%.
15. A production supervisor measures the filled volume of a random sample of 80 cans of mango juice labelled as containing 350 ml. The sample has mean volume 348.2 ml and standard deviation 5.9 ml. Let $\mu$ represent the mean volume for all cans of mango juice recently filled by this machine. The supervisor tests $H_0: \mu = 350$ against $H_1: \mu \neq 350$ at a significance level of 1%.
    (a) Find the critical values in ml.
    (b) Explain whether the mean filled volume differs from 350 ml.
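
The five-step procedure used in Examples 6 and 7, and the link between a two-tailed test and a confidence interval, can be sketched in Python. This is a minimal illustration rather than part of the text: the function names `z_test_mean` and `confidence_interval` are hypothetical, and exact normal quantiles from the standard library are used in place of the rounded table values.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(xbar, mu0, sd, n, alpha, tail):
    """z-test of H0: mu = mu0 (hypothetical helper, not from the text).

    sd is the population standard deviation sigma, or the estimate
    sigma-hat when sigma is unknown and the sample is large.
    tail is 'left', 'right' or 'two'. Returns (z, reject_H0).
    """
    z = (xbar - mu0) / (sd / sqrt(n))       # Step 4: test statistic
    std = NormalDist()                      # standard normal N(0, 1)
    if tail == "left":
        reject = z < std.inv_cdf(alpha)
    elif tail == "right":
        reject = z > std.inv_cdf(1 - alpha)
    else:
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    return z, reject                        # Step 5: decision

def confidence_interval(xbar, sd, n, alpha):
    """100(1 - alpha)% confidence interval for the population mean."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 6: sigma known, two-tailed test at the 5% level.
z6, reject6 = z_test_mean(39.4, 38, 6.8, 100, 0.05, "two")
ci6 = confidence_interval(39.4, 6.8, 100, 0.05)

# Example 7: sigma estimated from s = 1.8, right-tailed test at the 1% level.
# The sample mean is expressed in seconds after 10:00:00.
sigma_hat = sqrt(60 / 59 * 1.8 ** 2)
z7, reject7 = z_test_mean(1.2, 0, sigma_hat, 60, 0.01, "right")

print(f"Example 6: z = {z6:.2f}, reject H0: {reject6}")
print(f"Example 6: 95% CI = ({ci6[0]:.2f}, {ci6[1]:.2f})")
print(f"Example 7: z = {z7:.2f}, reject H0: {reject7}")
```

For Example 6 the hypothesised mean of 38 lies outside the 95% confidence interval, which agrees with rejecting $H_0$ at the 5% level.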

5.3 Testing Population Proportion

Often we have to perform a hypothesis test about a population proportion. Evidence concerning the value of a population proportion is provided by the sample proportion. We shall discuss hypothesis tests about the population proportion for small samples, where direct evaluation of binomial probabilities is required. We shall also discuss hypothesis tests about the population proportion for large samples, using the normal approximation to the binomial distribution.

Population proportion, small sample
Let the parameter p represent the unknown proportion of a population that possesses a certain characteristic. If an independent observation is randomly obtained from the population, it can be taken as having a probability p of showing that particular characteristic. If a random sample of n observations is taken from the population, the number of observations exhibiting the characteristic of interest can be regarded as a random variable X obtained from a binomial experiment, that is, $X \sim B(n, p)$.

When the binomial parameter p, the success probability in a binomial experiment, is to be tested using the hypothesis testing procedure, we hypothesise that this parameter equals some specified value $p_0$. The hypothesis testing problem then becomes testing the null hypothesis $H_0$ that $p = p_0$ against an alternative hypothesis, which may be one of the usual one-sided or two-sided alternatives: $p < p_0$, $p > p_0$, or $p \neq p_0$.

The appropriate random variable on which we base our decision criterion is the binomial random variable X. Values of X that provide significant evidence that the success probability differs from $p_0$ lead to the rejection of the null hypothesis.

Consider the hypotheses $H_0: p = p_0$, $H_1: p < p_0$. We use the binomial distribution with $p = p_0$ and $q = 1 - p_0$ to determine $P(X \le x)$. The value of x represents the number of successes in our sample of size n. If $P(X \le x) < \alpha$, we reject $H_0$ as the result is significant at the significance level $\alpha$.

Likewise, for the hypotheses $H_0: p = p_0$, $H_1: p > p_0$, we obtain $P(X \ge x)$. If this probability is less than $\alpha$, we reject $H_0$.

Lastly, for the hypotheses $H_0: p = p_0$, $H_1: p \neq p_0$, we calculate $P(X \le x)$ if $x < np_0$, or $P(X \ge x)$ if $x > np_0$. If the probability is less than $\alpha/2$, we reject $H_0$.

The steps for testing a null hypothesis about a proportion versus various alternatives are:
1. State $H_0: p = p_0$ and $H_1: p < p_0$, $p > p_0$, or $p \neq p_0$
2. Specify the significance level
3. Determine the critical region
4. Calculate the appropriate binomial probability
5. Make a decision
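
The small-sample procedure above can be sketched in Python using exact binomial probabilities. This is only an illustrative sketch: the helper names `binom_pmf`, `binom_tail` and `exact_test` are hypothetical, and the data (1 or 8 successes in 10 trials with $p_0 = 0.5$) are made up for the illustration and do not come from the text.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_tail(x, n, p0, tail):
    """P(X <= x) if tail == 'left', otherwise P(X >= x), for X ~ B(n, p0)."""
    ks = range(0, x + 1) if tail == "left" else range(x, n + 1)
    return sum(binom_pmf(k, n, p0) for k in ks)

def exact_test(x, n, p0, alpha, alternative):
    """Exact binomial test of H0: p = p0 (hypothetical helper).

    alternative is 'less', 'greater' or 'two-sided'.
    Returns (tail probability, reject_H0).
    """
    if alternative == "less":
        prob, level = binom_tail(x, n, p0, "left"), alpha
    elif alternative == "greater":
        prob, level = binom_tail(x, n, p0, "right"), alpha
    else:
        # Two-sided: use the tail on the same side as x, compared with alpha/2.
        tail = "left" if x < n * p0 else "right"
        prob, level = binom_tail(x, n, p0, tail), alpha / 2
    return prob, prob < level

# Illustrative data only: x = 1 success in n = 10 trials, H1: p < 0.5.
prob_less, reject_less = exact_test(1, 10, 0.5, 0.05, "less")

# Illustrative data only: x = 8 successes in n = 10 trials, H1: p != 0.5.
prob_two, reject_two = exact_test(8, 10, 0.5, 0.05, "two-sided")

print(f"P(X <= 1) = {prob_less:.4f}, reject H0: {reject_less}")
print(f"P(X >= 8) = {prob_two:.4f}, reject H0: {reject_two}")
```

In the first case the tail probability is below $\alpha$, so $H_0$ is rejected. In the second it is above $\alpha/2$, so $H_0$ is not rejected.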

Example 8
An airline claims that, on average, 6% of its flights are delayed each day. On a given day, of 20 flights, 2 are delayed. Test the hypothesis that the proportion of delayed flights is 6% at the significance level of 0.05.

Solution:
Step 1: State the null hypothesis and the alternative hypothesis.
$H_0: p = 0.06$, $H_1: p > 0.06$.

Step 2: Specify the significance level.
$\alpha = 0.05$.

Step 3: Select an appropriate probability distribution and determine the critical region.
We have
$np = 20 \times 0.06 = 1.2$
$nq = 20 \times 0.94 = 18.8$
Since $np < 5$, the sample size is small. We will use the binomial distribution to evaluate the probability directly.
This is a one-tailed test with a critical region in the right tail. The number of delayed flights can be considered an outcome of a binomial experiment with $p = 0.06$ and $n = 20$.
The critical region: all x values such that $P(X \ge x) < 0.05$.

Step 4: Calculate the appropriate binomial probability.
We have $x = 2$ and $n = 20$,

$$P(X \ge 2) = \sum_{x=2}^{20} \binom{20}{x} 0.06^x (1 - 0.06)^{20 - x} = 1 - \sum_{x=0}^{1} \binom{20}{x} 0.06^x (1 - 0.06)^{20 - x} = 1 - 0.660 = 0.340$$

Step 5: Make a decision.
To make a decision, we compare the value of the binomial probability to the significance level. This value of 0.340 is greater than the significance level of 0.05 and thus x = 2 falls in the nonrejection region. We do not reject $H_0$ and conclude that there is insufficient reason to question the airline's claim.

Population proportion, large sample
When the sample size n is large, the sample proportion $\hat{p} = \frac{x}{n}$ is approximately normally distributed with mean and standard deviation equal to $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$ respectively. Hence, for a large sample, we use the normal distribution to perform a hypothesis test about the population proportion p. The sample size is large if $np > 5$ and $nq > 5$. The test statistic is then given by

$$Z = \frac{\hat{p} - p}{\sqrt{pq/n}}.$$

For a two-tailed test at the significance level $\alpha$, the critical regions are given as $z < -z_{\alpha/2}$ and $z > z_{\alpha/2}$, whereas the critical region for a one-tailed test is either $z < -z_{\alpha}$ or $z > z_{\alpha}$.

Example 9
A botanist has produced a new variety of hybrid rice grain that has a better ability to resist stem borer than other varieties. He knows that 82% of the seeds from the parent plants germinate. He claims the hybrid has the same germination rate. 300 seeds from the hybrid plant are tested and 233 germinate. Test the botanist's claim at the 2% significance level.

Solution:
Let p be the proportion of seeds from the new hybrid plant that germinate and $\hat{p}$ be the corresponding proportion for the sample.
Given information: $p = 0.82$, $n = 300$, and $\hat{p} = \frac{233}{300}$.
We are going to test whether the claim by the botanist is valid. The significance level $\alpha$ is given as 0.02. The following are the five steps in testing the hypothesis.

Step 1: Formulate the null hypothesis and the alternative hypothesis.
$H_0: p = 0.82$, $H_1: p \neq 0.82$.

Step 2: Specify the significance level.
$\alpha = 0.02$.

Step 3: Select an appropriate probability distribution and determine the critical regions.
We have
$np = 300 \times 0.82 = 246$
$nq = 300 \times 0.18 = 54$
Since np and nq are both greater than 5, the sample size is large. We will use the normal distribution.
This is a two-tailed test with two critical regions, one in each tail. Since the total area of the critical regions is 0.02, the area of the critical region in each tail is 0.01. To locate the z values, we look for areas of 0.01 and 0.99 in the normal distribution table. From the table, the z values are $-2.33$ and $2.33$.

The critical regions: $z < -2.33$ and $z > 2.33$.

Step 4: Calculate the value of the test statistic.

$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.7767 - 0.82}{\sqrt{\frac{0.82(1 - 0.82)}{300}}} = -1.952$$

Step 5: Make a decision.
We compare the value of the test statistic to the critical value. This value of $z = -1.952$ is greater than the critical value of $-2.33$ and thus it falls in the nonrejection region. We do not reject $H_0$, which states that the rate of germination for the hybrid plant is 0.82.

Example 10
A manufacturing company has submitted a claim that 85% of components produced by a certain process are non-defective. A new process is introduced to lower the proportion of defective components below the current 15%. In a sample of 100 components produced with the new process, 7 are defective. Is this evidence sufficient to conclude that the process has been improved? Use the 5% significance level.

Solution:
Let p be the proportion of defective components produced by the new process and $\hat{p}$ be the corresponding proportion for the sample.
Given information: $p = 0.15$, $n = 100$, $\hat{p} = \frac{7}{100}$.
We are going to test whether the evidence is sufficient to conclude that the process has improved. The significance level $\alpha$ is given as 0.05. The following are the five steps in testing the hypothesis.

Step 1: Formulate the null hypothesis and the alternative hypothesis.
$H_0: p = 0.15$, $H_1: p < 0.15$.

Step 2: Specify the significance level.
$\alpha = 0.05$.

Step 3: Select an appropriate probability distribution and determine the critical region.
We have
$np = 100 \times 0.15 = 15$
$nq = 100 \times 0.85 = 85$
Since np and nq are both greater than 5, the sample size is large. We will use the normal distribution.

This is a one-tailed test with a critical region located at the left tail. The area of the critical region is 0.05. To locate the z value, we look for an area of 0.05 in the normal distribution table. From the table, the z value is $-1.645$.

The critical region: $z < -1.645$.

Step 4: Calculate the value of the test statistic.

$$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.07 - 0.15}{\sqrt{\frac{0.15(1 - 0.15)}{100}}} = -2.240$$

Step 5: Make a decision.
We compare the value of the test statistic to the critical value. This value of $z = -2.240$ is smaller than the critical value of $-1.645$ and thus it falls in the critical region. We reject $H_0$: at the 5% significance level, the evidence is sufficient to conclude that the process has been improved.

Exercise 5.3
1. In a random sample of 20 independent observations obtained from a binomial distribution, a student wants to test $H_0: p = 0.3$ versus $H_1: p \neq 0.3$. The student decides to reject $H_0$ if $X \le 3$ or $X \ge 11$. Find $P(X \le 3)$ and $P(X \ge 11)$.
2. Suppose that X = 13 is the number of occurrences in a sample of 20 independent observations obtained from a binomial population. Test the following hypotheses at the 10% significance level: $H_0: p = 0.8$, $H_1: p < 0.8$.
3. A hunter claims that he hits 70% of the wildfowl he shoots at. On one day he guns down 6 of the 12 wildfowl he aims at. Using the 5% significance level for a hypothesis test, what is your conclusion on this claim?
4. If there is no gender bias in trainee selection, then the pool of potential trainees is 50% male and 50% female. In a sample of 10 trainees, it is found that there are only two women trainees. Is there evidence of gender bias in trainee selection? Use the 10% significance level for the hypothesis test.
5. A magazine claims that 45% of its readers do not trust an advertisement on a certain health food product. In a poll of 20 randomly sampled magazine readers conducted two years later, 11 state that they do not trust the advertisement. At the 5% significance level, is there evidence to support the claim that the percentage of readers who do not trust the advertisement has increased?

6. 15% of mothers in a country who gave birth last year were under 20 years of age. A sociologist claims that the proportion of births to mothers under 20 years of age is decreasing. He selects a random sample of 25 births this year and finds that 2 of them are to mothers under 20 years of age. Use a significance level of 1% to test the sociologist's claim.
7. At a certain college, it is estimated that around 30% of the students drive cars to class. In a random sample of 20 college students, 7 are found to drive cars to class. Determine, at the 5% significance level, whether the estimate that 30% of the students drive to class is valid.
8. Suppose that there are 38 occurrences in a sample of 60 independent observations obtained from a binomial distribution. Find the z-value of the test statistic for the hypothesis test of $H_0: p = 0.5$ against $H_1: p \neq 0.5$.
9. Suppose that X = 8 is an observation obtained from a random sample of size n = 25. The sample is drawn from a binomial distribution. Consider the following test of hypothesis: $H_0: p = 0.4$, $H_1: p < 0.4$. Compute the exact probability $P(X \le 8)$, and compare it with the corresponding probability using a normal approximation.
10. A construction firm claims that the job of covering floors with tiles is completed in 80% of the houses in a new housing estate. Would you agree with this claim if a random survey of new houses in this estate shows that 32 out of 50 have the floor job completed? Use a significance level of 5%.
11. A sample survey indicates that, out of 650 births, 327 are boys and the rest are girls. Do these figures confirm the hypothesis that the sex ratio is 50 to 50? Use a significance level of 10%.
12. A club claims that it receives a 12% response rate from its mailings to club members. A random sample of 200 shows that only 18 members respond. Test the claim at a significance level of 10%.
13. A study estimates that 47 per cent of people in a city wear spectacles. However, you believe the proportion of people in the city who wear spectacles is less than 47 per cent. You conduct a study and find that, among 250 randomly selected people in the city, 114 of them wear spectacles. Test, at the 5% significance level, the estimate that 47 per cent of people in the city wear spectacles.
14. The maker of a patented medicine claims that it is effective in curing 80% of patients suffering from a cold. From a sample of 100 patients using this medicine, it is found that only 72 are cured. Determine whether the claim is valid at the 5% significance level.
15. In a controlled laboratory experiment, scientists discover that 35% of a certain type of rat subjected to a cancer-causing chemical injection later develop cancerous tumours. Would we have reason to believe that the proportion of rats developing tumours when subjected to this injection has increased if a repeated experiment finds that 26 of 60 rats develop tumours? Use the 5% level of significance.
16. A random sample of 120 recent donations at a certain blood bank reveals that 55 are of type O blood. Does this suggest that the actual percentage of type O donations differs from 38%, the percentage of the population having type O blood? Carry out an appropriate hypothesis test using a significance level of 10%. Would your conclusions have been different if a significance level of 5% had been used?
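
The large-sample test of a proportion used in Examples 9 and 10 can be sketched in Python as follows. This is a minimal sketch, not part of the text: the function name `z_test_proportion` is hypothetical, and exact normal quantiles replace the table values. Because the code uses the unrounded $\hat{p} = 233/300$, the Example 9 statistic differs slightly in the third decimal place from a hand calculation that rounds $\hat{p}$ to 0.7767.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(x, n, p0, alpha, alternative):
    """Large-sample z-test of H0: p = p0 (hypothetical helper).

    alternative is 'less', 'greater' or 'two-sided'.
    Returns (z, reject_H0).
    """
    # The normal approximation is used only when np > 5 and nq > 5.
    if n * p0 <= 5 or n * (1 - p0) <= 5:
        raise ValueError("sample too small for the normal approximation")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std = NormalDist()
    if alternative == "less":
        reject = z < std.inv_cdf(alpha)
    elif alternative == "greater":
        reject = z > std.inv_cdf(1 - alpha)
    else:
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    return z, reject

# Example 9: 233 of 300 seeds germinate, H1: p != 0.82, alpha = 0.02.
z9, reject9 = z_test_proportion(233, 300, 0.82, 0.02, "two-sided")

# Example 10: 7 of 100 components defective, H1: p < 0.15, alpha = 0.05.
z10, reject10 = z_test_proportion(7, 100, 0.15, 0.05, "less")

print(f"Example 9:  z = {z9:.2f}, reject H0: {reject9}")
print(f"Example 10: z = {z10:.2f}, reject H0: {reject10}")
```

The check on np and nq at the top mirrors the rule for deciding whether the sample is large; for a small sample the binomial probabilities should be evaluated directly instead.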

Summary
1. A hypothesis test or significance test is a method of using sample data as evidence to test a statistical hypothesis about a population parameter.
2. A null hypothesis is a statement about a population parameter that is assumed to be true until it is rejected on the strength of evidence obtained from a sample. An alternative hypothesis is a statement about a population parameter that is accepted if the null hypothesis is rejected.
3. A test statistic is a random variable whose value is used to determine whether a null hypothesis is rejected in a hypothesis test.
4. A critical region is the set of values that leads to the rejection of the null hypothesis in favour of the alternative hypothesis.
5. (a) A Type I error occurs if the null hypothesis is rejected when the null hypothesis is true. The probability of making this error is called the significance level of the test, denoted by $\alpha$.
   (b) A Type II error occurs if the null hypothesis is not rejected when in fact the null hypothesis is false. The probability of making this error is denoted by $\beta$.
6. A hypothesis test which has a one-sided critical region in either the left or the right tail is called a one-tailed test. A hypothesis test which has two critical regions, one at each tail, is called a two-tailed test.
7. The general test procedure is as follows:
   • State the null and alternative hypotheses
   • Specify the significance level
   • Select an appropriate probability distribution and determine the critical region(s)
   • Calculate the value of the test statistic
   • Make a decision
8. To test a hypothesis about a population mean with known variance, the test statistic is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where the population is normal if the sample size is small.
9. To test a hypothesis about a population mean with unknown variance where the sample is large, the test statistic is $Z = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$.
10. To test a hypothesis about a population proportion, where the sample size is small, the critical region(s) is (are) as follows:
    (a) All x values such that $P(X \le x) < \alpha$ for $p < p_0$
    (b) All x values such that $P(X \ge x) < \alpha$ for $p > p_0$
    (c) All x values such that $P(X \le x) < \alpha/2$ when $x < np_0$ and all x values such that $P(X \ge x) < \alpha/2$ when $x > np_0$ for $p \neq p_0$
11. To test a hypothesis about a population proportion, where the sample size is large, the test statistic is $Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$.
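
The critical values for one-tailed and two-tailed z-tests described in the summary are read from a normal distribution table in this chapter. As a rough cross-check, they can also be computed with Python's standard library; the helper name `critical_values` below is hypothetical and the sketch assumes a standard normal test statistic.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical z value(s) for a z-test at significance level alpha.

    tail is 'left', 'right' or 'two' (hypothetical helper).
    Returns a tuple with one value (one-tailed) or two values (two-tailed).
    """
    std = NormalDist()  # standard normal N(0, 1)
    if tail == "left":
        return (std.inv_cdf(alpha),)
    if tail == "right":
        return (std.inv_cdf(1 - alpha),)
    z = std.inv_cdf(1 - alpha / 2)  # alpha is split equally between tails
    return (-z, z)

left_1pct = critical_values(0.01, "left")   # left-tailed test, alpha = 0.01
two_5pct = critical_values(0.05, "two")     # two-tailed test, alpha = 0.05
left_5pct = critical_values(0.05, "left")   # left-tailed test, alpha = 0.05

print([round(v, 3) for v in left_1pct])
print([round(v, 3) for v in two_5pct])
print([round(v, 3) for v in left_5pct])
```

Rounded to the precision of the table, these agree with the values $-2.33$, $\pm 1.96$ and $-1.645$ used in the worked examples.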

STPM PRACTICE 5
1. A random sample of 70 observations taken from a normally distributed population, with standard deviation $\sigma = 7.2$, gives the sample mean $\bar{x} = 60.8$. Test, at the 5% significance level, $H_0: \mu = 60$ against $H_1: \mu > 60$.
2. A random sample of 103 observations is taken from a certain population. It is found that its sample mean is 189 with a standard deviation of 29.7. The null hypothesis $H_0: \mu = 200$ is to be tested against $H_1: \mu \neq 200$. If the test is performed at the 2% significance level, what conclusion do you draw?
3. Suppose a random sample of 20 independent observations is obtained from a binomially distributed population and the number of successes is 8. Use a significance level of 1% to test $H_0: p = 0.20$ against $H_1: p \neq 0.20$.
4. It is given that the number of successes is x = 85 and the number of independent observations is n = 300 for a random sample obtained from a binomially distributed population. Use a significance level $\alpha = 0.05$ to perform the hypothesis test $H_0: p = 0.32$ versus $H_1: p < 0.32$.
5. A person claims that the weather forecasts by a meteorologist are no better than the outcomes of tossing a fair coin. If a head is obtained then there will be no rain, and if a tail is obtained then there will be rain. He records the weather for 50 randomly chosen days. The meteorologist's forecast is correct on 34 of these days.
   (a) Write the hypotheses clearly.
   (b) Use a significance level of 1% to test the claim of the person.
6. An environmental department wants to determine whether a cleanup project at a lake has been effective. This is to be done by recording dissolved oxygen content (in ppm, parts per million) in the lake, with higher values indicating less pollution. Prior to the cleanup project, the mean dissolved oxygen reading around the lake was reported as 9.80. Six months after the initiation of the cleanup, a random sample of 80 readings gives the mean and standard deviation as 9.95 ppm and 0.51 ppm respectively.
   (a) State the null and alternative hypotheses for a hypothesis test.
   (b) Carry out the test at a significance level of 5%.
7. A shopkeeper realises that 20% of customers buy a drink from the storage. During the renovation of the shop a new storage was installed. He picks a random sample of 20 customers and finds that only 1 customer has bought a drink from the new storage. Use a significance level of 5% to test whether there has been a change in the proportion of customers buying a drink from the storage.
8. A real estate agent claims that 60% of all apartments being built today are 3-bedroom units. To test this claim, a sample of 50 new apartments is inspected and it is found that the proportion of these apartments that are 3-bedroom units is 75%. Perform a hypothesis test at a significance level of 2%.
9. The contents of a random sample of 9 containers of a particular paint are 5.2, 4.7, 4.6, 5.3, 5.1, 4.8, 4.9, 5.4, and 4.8 litres. Assume that the content is normally distributed with standard deviation 0.2 litre. Use a 10% significance level to determine whether the mean content of the containers is 5 litres.
10. The mean height of male students in a certain college has been 164.9 centimetres with a standard deviation of 14.3 centimetres. Is there strong reason to believe that there has been a change in the mean height of male students if a random sample of 90 male students in the college has a mean height of 168.5 centimetres? Assume that the standard deviation remains the same and use a significance level of 5%.

11. An exit poll by a news station of 800 people in a certain parliamentary constituency finds 390 voting for candidate A. Does the data support the hypothesis that candidate A receives 50% of the parliamentary constituency's votes at a significance level of 10%?
12. According to an article published 2 years ago, the mean number of cigarettes smoked per day by adults who were daily smokers was 11.6. To determine whether adults who are daily smokers nowadays smoke less than the general population of daily smokers in the past, a random sample of 100 adults who are current daily smokers is taken and the number of cigarettes each smokes on a randomly selected day is recorded. The data give a sample mean of 10.8 cigarettes and a standard deviation of 3.9 cigarettes. Perform, at the 5% significance level, a test to determine whether current adults who are daily smokers smoke less than the general population of daily smokers two years ago.
13. A pizza shop in a district advertises that their average delivery time to the district is at most 15 minutes. A sample of 60 delivery times has an average of 18 minutes. The true standard deviation of delivery times is 8 minutes. Is there sufficient evidence to reject the shop's claim at the 5% significance level?
14. For 150 bottles labelled as "350-millilitre", the average amount of soft drink filled by the machine is 348.5 millilitres. The population standard deviation is known to be 11.5 millilitres. If the expected amount (per bottle) of the soft drink filled by the machine is not equal to 350 millilitres, then the machine is "out of control" and must be shut down for repairs. Test whether the machine is out of control at the 5% significance level.
15. A certain type of medicine is known to be 60% effective in relieving arthritis pain. To determine whether a new and somewhat more expensive patent medicine is superior in alleviating arthritis pain, a sample of 500 patients is taken and it is found that the new medicine provides relief to 318 patients. Carry out a test at a significance level of 5%.
16. The hourly output of french fries from a certain brand of frying machine is advertised to be 60 kilograms. For the new machine purchased by a fast food restaurant, tests are run for 50 different one-hour periods, producing an average output of 57.2 kilograms with a standard deviation of 6.8 kilograms. At the 5% level of significance, does the fast food restaurant management have grounds for complaint?
17. A random sample of 80 observations obtained from a population produces the results $\sum_{i=1}^{80} x_i = 2880$ and $\sum_{i=1}^{80} x_i^2 = 129\,276$.
(a) Calculate unbiased estimates of the population mean and variance.
(b) Determine a 95% confidence interval for the population mean.
(c) Test, at the 5% significance level, $H_0: \mu = 34$ against $H_1: \mu \neq 34$.
(d) Explain the relationship between the confidence interval obtained in (b) and the result of the test in (c).
18. The mean lifetime of a sample of 200 picture tubes produced by a manufacturer is found to be 14 800 hours with a standard deviation of 1060 hours. Use a significance level of 5% to test, on the basis of this data, whether the mean lifetime of all such tubes made is 15 000 hours as claimed by the manufacturer.
19. In a sequence of 500 rolls of a dice, 'ones' are obtained 77 times. Test, at a significance level of 10%, the hypothesis that the dice is unbiased.
20. A traffic authority is concerned with the increasing traffic congestion in a city. The authority knows that, five years ago, the mean time per journey between the city centre and the nearby town during peak hours was 32.5 minutes with a standard deviation of 19.7 minutes. This year, for a random sample of 150 car drivers, the mean travelling time is 36.8 minutes. At a significance level of 5%, is the authority justified in worrying that traffic congestion has got worse this year?

21. A large chain of telecommunication service providers introduces a training programme for counter staff to increase their productivity and morale. The management believes that the mean time for collecting a customer's phone-bill payment should be 120 seconds. After the training programme, the times to collect 90 bills are found to have a mean of 113.5 seconds and a standard deviation of 35 seconds.
(a) Determine, at the 5% significance level, whether, after the training programme, the mean time to collect a bill is less than 120 seconds.
(b) What assumption is necessary for the test in part (a) to be justifiable?
22. The proportion of members of a certain badminton club who are able to explain the rules of badminton correctly is p. A random sample of 10 members of the badminton club is selected and 4 members are able to explain the rules correctly. Test the null hypothesis $H_0: p = 0.7$ against the alternative hypothesis $H_1: p < 0.7$ at the 10% significance level.
23. For a binomial distribution B(15, p), it is required to test $H_0: p = 0.25$ versus $H_1: p > 0.25$.
(a) Using a significance level of 5%, determine the critical region for this test. The area of the critical region at the left end should be as close as possible to 0.05.
(b) Find the actual significance level of this test and the critical value.
(c) Draw a conclusion from this test.
24. In a class, 25 students are selected randomly and their reaction times, in seconds, to a particular experiment are measured. The mean reaction time for the students is 6.2 seconds with a variance of 4 s².
(a) Find the 95% confidence interval for the mean reaction time for all students.
(b) State any assumptions necessary to make this estimate valid.
(c) Based on your answer in (a), would the null hypothesis that the true mean is 7.0 seconds be rejected at the 5% significance level? Why?
25. A peanut factory claims that at most 6% of the peanut shells contain no nuts. 100 peanuts are selected at random and it is found that 9 of them are empty. Test at the 5% level of significance whether or not the claim made by the factory is true.
26. Scores from a standardised memory test of all the secondary students in a school are normally distributed with mean μ and standard deviation σ = 18. The scores of a random sample of 60 such students have a mean of 76. Perform a test to determine whether μ is less than 80, using the 5% significance level.
27. A random sample of 65 bags of groundnut, each labelled as 100 grams, is weighed. The mean weight of the bags is 99 grams with an estimated standard deviation of 4.21 grams. Test whether the mean weight of all bags of groundnut packed is less than 100 grams. Use the 5% level of significance.
28. A machine fills bottles with cooking oil. Prior to maintenance of the machine, the volume of cooking oil in a bottle could be modelled by a normal distribution with mean 554 ml and standard deviation 3.5 ml. Following the new setting of the machine, the mean volume of cooking oil of a random sample of 12 bottles is 555.1 ml.
(a) Carry out a test, at the 10% significance level, to decide whether the mean volume of cooking oil filled by the machine has changed. Assume that the distribution of volume is still normal with the variance unchanged.
(b) Find the largest significance level such that there is evidence that the mean volume of the cooking oil has increased.
29. Steel rods produced on a production line are supposed to average 25.2 mm in diameter. Assume the diameters of the steel rods are normally distributed with standard deviation 0.11 mm. It is desired to check that the mean diameter of the rods is under control within certain limits. Suppose a random sample of 80 steel rods is taken and the diameter of each rod is measured. Test at the 2% level to determine the limits of the mean diameter such that the steel rods produced would be acceptable.

Mathematics Semester 3 STPM
CHAPTER 6 CHI-SQUARED TESTS
Learning Outcome
(a) Identify the shape, as well as the mean and variance, of a chi-squared distribution with a given number of degrees of freedom.
(b) Use the chi-squared distribution tables.
(c) Identify the chi-squared statistic.
(d) Use the result that classes with small expected frequencies should be combined in a chi-squared test.
(e) Carry out goodness-of-fit tests to fit prescribed probabilities and probability distributions with known parameters.
(f) Carry out tests of independence in contingency tables (excluding Yates correction).
Bilingual Keywords
chi-squared distribution – taburan khi kuasa dua
chi-squared statistic – statistik khi kuasa dua
chi-squared test – ujian khi kuasa dua
contingency table – jadual kontingensi
degree of freedom – darjah kebebasan
expected frequencies – kekerapan jangkaan
goodness-of-fit test – ujian kebagusan penyuaian
observed frequencies – kekerapan cerapan
test of independence – ujian ketakbersandaran

6.1 The Chi-Squared Distribution
The hypothesis tests discussed in the last chapter each involve a null hypothesis stated in terms of a population parameter and a test statistic having a known probability distribution. They are called parametric tests. However, not all ideas can be stated in terms of population parameters. In this chapter, we shall discuss a non-parametric test called the chi-squared test, which is performed using the chi-squared distribution.
Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the sampling distribution of the statistic
$$\chi^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma^2}$$
is called the chi-squared distribution with $n - 1$ degrees of freedom. The probability density function is given by
$$f(\chi^2_\nu) = c\,(\chi^2_\nu)^{\frac{\nu}{2} - 1}\, e^{-\frac{\chi^2_\nu}{2}}$$
where $c$ is a constant, $\chi^2_\nu$ is the chi-squared statistic with $\nu$ degrees of freedom and $e$ is the base of the natural logarithm. The constant $c$ is a normalising factor chosen so that the area under the chi-squared curve is equal to one.
Examples of chi-squared distributions with various degrees of freedom are shown in the figure below. The curve for degrees of freedom $\nu = n - 1 = 3 - 1 = 2$ represents the distribution of chi-squared values computed from all possible samples of size 3. Likewise, the curve for degrees of freedom equal to 10 corresponds to the distribution for samples of size 11.
[Figure: chi-squared density curves, $f(\chi^2_\nu)$ against $\chi^2_\nu$, for $\nu$ = 1, 2, 3, 5 and 10]

The chi-squared distribution has the following properties:
• The values of $\chi^2$ cannot be negative.
• The curve is not symmetric.
• The curves are all positively skewed.
• As $\nu$ gets larger, the degree of skewness decreases.
• The mean of the distribution is equal to the number of degrees of freedom: $\mu = \nu$.
• The variance is equal to two times the number of degrees of freedom: $\sigma^2 = 2\nu$.
• When the number of degrees of freedom is greater than or equal to 2, the maximum of the curve occurs at $\chi^2_\nu = \nu - 2$.
• As the number of degrees of freedom increases, the chi-squared curve approaches a normal distribution.
The area under the curve between 0 and a particular chi-squared value is the cumulative probability associated with that chi-squared value. For example, in the graph of the chi-squared distribution with 6 degrees of freedom below, the shaded area represents the cumulative probability associated with a chi-squared statistic equal to x; that is, it is the probability that the value of a chi-squared statistic will fall between 0 and x.
[Figure: probability density of the chi-squared distribution with 6 degrees of freedom, with the area between 0 and x shaded]
The $\chi^2$-distribution table gives values of $\chi^2$ for various values of $\alpha$ and $\nu$, where $\alpha$ is the area under the curve to the left of the tabulated value (chosen according to the significance level of the test) and $\nu$ is the number of degrees of freedom. The areas, $\alpha$, are the column headings; the degrees of freedom, $\nu$, are given in the left column, and the table entries are the $\chi^2$ values. Hence the $\chi^2$ value with 6 degrees of freedom, leaving an area of 0.05 to the left, is $\chi^2_6 = 1.635$. Owing to the lack of symmetry, we must also use the table to find $\chi^2_6 = 12.592$ for $\alpha = 0.95$, that is, the value leaving an area of 0.05 to the right.
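The table look-ups described above can be checked numerically. The sketch below is not part of the syllabus method; it assumes the standard power series for the regularised lower incomplete gamma function, and the function names `chi2_cdf` and `chi2_quantile` are hypothetical names introduced only for this illustration.

```python
import math


def chi2_cdf(x, v):
    """Return P(X <= x) for a chi-squared variable with v degrees of freedom.

    Assumption: evaluated with the power series of the regularised lower
    incomplete gamma function P(a, t), where a = v/2 and t = x/2.
    """
    if x <= 0:
        return 0.0
    a, t = v / 2.0, x / 2.0
    term = 1.0 / a            # series term for n = 0
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= t / (a + n)   # t^n / (a (a+1) ... (a+n))
        total += term
        n += 1
    # prefactor t^a e^(-t) / Gamma(a), computed through logarithms
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))


def chi2_quantile(p, v):
    """Return x such that P(X <= x) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, v) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


left_area = chi2_cdf(1.635, 6)      # close to 0.05
right_cut = chi2_quantile(0.95, 6)  # close to the tabulated 12.59
print(round(left_area, 3), round(right_cut, 2))
```

In an examination the printed table is still the intended tool; a computation like this is only a way to confirm that a value has been read from the correct row and column.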

Critical values for the χ²-distribution
If X has a χ²-distribution with ν degrees of freedom, then for each pair of values of p and ν, the tabulated value of x is such that P(X < x) = p.
[Figure: chi-squared density curve with the area p to the left of x shaded]
ν \ p   0.01       0.025      0.05      0.9     0.95    0.975   0.99    0.995   0.999
1       0.0001571  0.0009821  0.003932  2.706   3.841   5.024   6.635   7.879   10.83
2       0.02010    0.05064    0.1026    4.605   5.991   7.378   9.210   10.60   13.82
3       0.1148     0.2158     0.3518    6.251   7.815   9.348   11.34   12.84   16.27
4       0.2971     0.4844     0.7107    7.779   9.488   11.14   13.28   14.86   18.47
5       0.5543     0.8312     1.145     9.236   11.07   12.83   15.09   16.75   20.51
6       0.8721     1.237      1.635     10.64   12.59   14.45   16.81   18.55   22.46
7       1.239      1.690      2.167     12.02   14.07   16.01   18.48   20.28   24.32
8       1.647      2.180      2.733     13.36   15.51   17.53   20.09   21.95   26.12
9       2.088      2.700      3.325     14.68   16.92   19.02   21.67   23.59   27.88
10      2.558      3.247      3.940     15.99   18.31   20.48   23.21   25.19   29.59
11      3.053      3.816      4.575     17.28   19.68   21.92   24.73   26.76   31.26
12      3.571      4.404      5.226     18.55   21.03   23.34   26.22   28.30   32.91
13      4.107      5.009      5.892     19.81   22.36   24.74   27.69   29.82   34.53
14      4.660      5.629      6.571     21.06   23.68   26.12   29.14   31.32   36.12
15      5.229      6.262      7.261     22.31   25.00   27.49   30.58   32.80   37.70
16      5.812      6.908      7.962     23.54   26.30   28.85   32.00   34.27   39.25
17      6.408      7.564      8.672     24.77   27.59   30.19   33.41   35.72   40.79
18      7.015      8.231      9.390     25.99   28.87   31.53   34.81   37.16   42.31
19      7.633      8.907      10.12     27.20   30.14   32.85   36.19   38.58   43.82
20      8.260      9.591      10.85     28.41   31.41   34.17   37.57   40.00   45.31
21      8.897      10.28      11.59     29.62   32.67   35.48   38.93   41.40   46.80
22      9.542      10.98      12.34     30.81   33.92   36.78   40.29   42.80   48.27
23      10.20      11.69      13.09     32.01   35.17   38.08   41.64   44.18   49.73
24      10.86      12.40      13.85     33.20   36.42   39.36   42.98   45.56   51.18
25      11.52      13.12      14.61     34.38   37.65   40.65   44.31   46.93   52.62
26      12.20      13.84      15.38     35.56   38.89   41.92   45.64   48.29   54.05
27      12.88      14.57      16.15     36.74   40.11   43.19   46.96   49.65   55.48
28      13.56      15.31      16.93     37.92   41.34   44.46   48.28   50.99   56.89
29      14.26      16.05      17.71     39.09   42.56   45.72   49.59   52.34   58.30
30      14.95      16.79      18.49     40.26   43.77   46.98   50.89   53.67   59.70

Example 1
The curve of the chi-squared distribution with ν = 3 degrees of freedom is shown below. Find the critical value of χ² such that the area in the shaded region is 0.025.
[Figure: chi-squared density curve, f(χ²) against χ², with the right tail shaded]
Solution:
Look it up in the table by proceeding down the left column entitled ν, degrees of freedom, to ν = 3. Then move to the right until the column labelled 0.975 is found. The result is 9.348. Thus we have P(χ² > 9.348) = 0.025.
Exercise 6.1
1. Find the 95th percentile of the chi-squared distribution with 9 degrees of freedom.
2. Using the chi-squared distribution table, find
(a) $P(\chi^2_7 < 18.48)$,
(b) $P(\chi^2_{13} > 19.81)$,
(c) $P(\chi^2_{21} > 32.67)$.
3. Given ν and α, find the critical value(s) for each case.
(a) α = 0.01, ν = 10
(b) α = 0.05, ν = 20
(c) α/2 = 0.05 in each tail, ν = 15
[Figures: chi-squared density curves with the shaded critical regions for cases (a), (b) and (c)]

4. Using the chi-squared distribution table, find the value of k such that
(a) $P(\chi^2_{25} < k) = 0.01$
(b) $P(\chi^2_{11} > k) = 0.95$
(c) $P(k < \chi^2_{18} < 9.39) = 0.04$
5. (a) Find the mean and the standard deviation of a chi-squared distribution with 8 degrees of freedom.
(b) Which one of the following chi-squared distributions looks the most like a normal distribution?
(i) A chi-squared distribution with 1 degree of freedom
(ii) A chi-squared distribution with 2 degrees of freedom
(iii) A chi-squared distribution with 5 degrees of freedom
(iv) A chi-squared distribution with 10 degrees of freedom
6.2 Goodness-Of-Fit Tests
The chi-squared test can be used to test how well observed frequencies fit expected frequencies. Observed frequencies are the actual frequencies observed in a random sample. Expected frequencies are theoretical frequencies based on a distribution under the null hypothesis, which is presumed to be true until statistical evidence indicates otherwise.
As an example, what would we expect when flipping a coin 12 times? If the coin is fair, we would expect six heads and six tails. If we observe one head and eleven tails in this experiment, is this outcome attributable merely to chance, or is it due to the coin being biased? The chi-squared test can help to provide an answer.
Before discussing the chi-squared test, we have several assumptions to make. First, frequency data is used to represent the actual number of elements in each category. Second, categories are mutually exclusive; that is, whatever is being tallied can be in only one cell and cells cannot overlap. Third, categorical data is a grouping of data according to similar characteristics in a way that shows the frequency of each category.
Let us look at an example to see how we use the chi-squared test to determine whether the frequencies observed across the categories differ significantly from what is expected theoretically. Consider the tossing of a six-sided dice. We have the null hypothesis that the dice is fair, which is equivalent to the hypothesis that the distribution of outcomes is uniform. Suppose that the dice is thrown 60 times and each outcome is recorded. The observed frequency $o_i$ for each face of the dice is shown in the table below:
Face       1    2    3    4    5    6
o_i        12   8    14   7    9    10
The chi-squared test compares the observed frequencies $o_i$ with the corresponding expected frequencies $e_i$. The table above lists the observed frequencies, and the expected frequencies need to be determined. To calculate the expected frequency for each outcome, we make use of the hypothesis that the outcome of a fair dice is uniformly distributed. Since the probability of each outcome is one-sixth and there are a total of 60 rolls of the dice, we have
Expected frequency $e_i = \frac{1}{6} \times 60 = 10$

Note that the expected frequencies are anticipated only in a theoretical sense. It is not practical to expect the observed frequencies to match the expected frequencies perfectly. The table below lists the observed and expected frequencies for each category:
Face       1    2    3    4    5    6
o_i        12   8    14   7    9    10
e_i        10   10   10   10   10   10
Now, we need to decide whether the observed frequencies are reasonably close to the expected frequencies or really different from them. The hypothesis to be tested is how well the observed frequencies fit a given pattern or a theoretical distribution. The test is called a goodness-of-fit test.
A useful measure of the overall discrepancy between the observed and expected frequencies is the chi-squared test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$
where $\chi^2$ is a value of a random variable $X^2$ whose sampling distribution is closely approximated by the chi-squared distribution with $k - 1$ degrees of freedom, and $k$ is the number of categories. The symbols $o_i$ and $e_i$ represent the observed and expected frequencies respectively for the $i$th category.
For the chi-squared goodness-of-fit test, the number of degrees of freedom is the number of independent free choices which can be made in allocating values to the expected frequencies. In this example of tossing a dice, there are six expected frequencies (one for each face, 1 to 6). Only five of the expected frequencies can vary independently; the sixth must take whatever value is required to satisfy the constraint on the total frequency. Thus, the degrees of freedom ν = number of categories – number of constraints. Here there are six categories and one constraint, so ν = 6 – 1 = 5.
To calculate the chi-squared test statistic, we first subtract the expected frequency $e_i$ from the observed frequency $o_i$. Then we square the difference and divide the squared difference by the expected frequency $e_i$, before finally adding the quotients. This is done in the table below:
Face   o_i   e_i   (o_i – e_i)   (o_i – e_i)²   (o_i – e_i)²/e_i
1      12    10    2             4              0.4
2      8     10    –2            4              0.4
3      14    10    4             16             1.6
4      7     10    –3            9              0.9
5      9     10    –1            1              0.1
6      10    10    0             0              0
                                                χ² = 3.4
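The tabulated calculation can also be sketched in a few lines of Python. This is only an illustration of the formula above; the function name `chi_squared_statistic` is a hypothetical name chosen for this sketch.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed frequencies for faces 1 to 6 from 60 throws of the dice
observed = [12, 8, 14, 7, 9, 10]

# Under the null hypothesis of a fair dice, each face is expected 60/6 times
total = sum(observed)
expected = [total / len(observed)] * len(observed)

chi2 = chi_squared_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # six categories, one constraint

print(round(chi2, 1), degrees_of_freedom)  # 3.4 5
```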

The value of χ² with 5 degrees of freedom is therefore 3.4. In the goodness-of-fit test, if the observed frequencies are the same as the expected frequencies, then χ² = 0. Thus, if the χ² value is small, there is a high degree of compatibility between the expected and observed frequencies, indicating a good fit. If the χ² value is large, there is a low degree of matching between the two sets of frequencies and the fit is poor. This also implies that the critical region falls in the right tail of the chi-squared distribution.
At the 10% significance level, we find $\chi^2_5 = 9.236$ using the χ² table. Since the calculated value χ² = 3.4 is less than 9.236, we do not reject the hypothesis that the outcomes of the dice are uniformly distributed. In other words, there is no evidence that the dice is not fair.
[Figure: chi-squared curve with ν = 5; the critical region lies to the right of 9.236]
Note: To perform a chi-squared test, the expected frequency for each category should be at least 5. This restriction may require combining adjacent categories, resulting in a reduction of the number of degrees of freedom.
Example 2
A quality supervisor at a glass manufacturing factory inspects a random sample of 60 sheets of glass to check for any minor defects. The number of flaws in each glass sheet is recorded. The results are as follows:
Number of flaws       0    1    2    3
Observed frequency    32   15   9    4
Use a 5% significance level to test the hypothesis that these data follow a Poisson distribution with mean 0.75.
Solution:
A test procedure is as follows.
Step 1: State the hypotheses
H0 : The number of flaws has a Poisson distribution with mean 0.75
H1 : The number of flaws does not have a Poisson distribution with mean 0.75
Step 2: Specify the significance level
Here α = 0.05
Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared goodness-of-fit test to determine whether the observed sample frequencies differ significantly from the expected frequencies specified in the null hypothesis. Under H0,
$$P(X = x_i) = \frac{e^{-0.75}\, 0.75^{x_i}}{x_i!}, \quad x_i = 0, 1, 2, 3,$$
which gives the probability associated with each class. The corresponding expected frequency is obtained by multiplying the appropriate Poisson probability by the sample size n = 60.
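The Poisson probabilities and expected frequencies just described can be sketched as follows, together with the combining of small classes mentioned in the Note above. The helper names `poisson_pmf` and `merge_small_tail` are hypothetical, and treating the last class as "3 or more" is an assumption made so that the probabilities sum to one. Because the code keeps full precision, its expected frequencies differ slightly from values obtained with probabilities rounded to three decimal places.

```python
import math


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def merge_small_tail(observed, expected, minimum=5.0):
    """Combine the last classes until the final expected frequency >= minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] < minimum:
        last_exp = exp.pop()
        last_obs = obs.pop()
        exp[-1] += last_exp
        obs[-1] += last_obs
    return obs, exp


n, mean = 60, 0.75
observed = [32, 15, 9, 4]

# Classes 0, 1, 2, then "3 or more" as the remaining probability
probabilities = [poisson_pmf(k, mean) for k in range(3)]
probabilities.append(1.0 - sum(probabilities))
expected = [n * p for p in probabilities]

merged_observed, merged_expected = merge_small_tail(observed, expected)
print([round(e, 2) for e in expected])
print(merged_observed, [round(e, 2) for e in merged_expected])
```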

The probabilities and expected frequencies are as follows:
x_i          P(X = x_i)   e_i
0            0.472        28.32
1            0.354        21.24
2            0.133        7.98
3 or more    0.041        2.46
If an expected frequency is less than 5, two or more classes can be combined. In the above situation the expected frequency in the last class is less than 5, so we combine the last two classes to get
Number of flaws   Observed frequency   Expected frequency
0                 32                   28.32
1                 15                   21.24
2 or more         13                   10.44
The chi-squared value can now be calculated:
$$\chi^2 = \sum \frac{(o - e)^2}{e} = \frac{(32 - 28.32)^2}{28.32} + \frac{(15 - 21.24)^2}{21.24} + \frac{(13 - 10.44)^2}{10.44} = 2.94$$
Step 4: Determine the critical region
Here we have 3 classes, so the chi-squared statistic has 3 – 1 = 2 degrees of freedom. Using a significance level of 0.05, from the chi-squared distribution table, the critical value of $\chi^2_2$ is 5.991.
Step 5: Make a decision
As χ² = 2.94 < 5.991, we conclude that there is no real evidence to suggest that the data do not follow a Poisson distribution.
Example 3
The table below shows the frequency distribution of 50 telephone call lengths (in minutes). Determine whether there is significant evidence, at the 5% significance level, to reject the null hypothesis that the call length has a normal distribution with mean 14 minutes and standard deviation 6.4 minutes.
Call length (in minutes)   Frequency
0 – 5                      4
5 – 10                     9
10 – 15                    16
15 – 20                    13
20 – 25                    5
25 – 30                    3

Solution:
We proceed with the steps of a test procedure as follows:
Step 1: State the hypotheses
H0 : The telephone call lengths follow a normal distribution
H1 : The telephone call lengths do not follow a normal distribution
Step 2: Specify the significance level
Here α = 0.05
Step 3: Select the appropriate test statistic and calculate its value
Use the chi-squared goodness-of-fit test to determine whether the observed sample frequencies differ significantly from the expected frequencies specified in the null hypothesis. The expected frequency for each class (category) listed in the given table can be obtained from a normal curve. The z values corresponding to the boundaries of the second class are
$$z_1 = \frac{5 - 14}{6.4} = -1.406, \qquad z_2 = \frac{10 - 14}{6.4} = -0.625$$
From the normal table, the area between $z_1 = -1.406$ and $z_2 = -0.625$ is
$$P(-1.406 < Z < -0.625) = P(Z < -0.625) - P(Z < -1.406) = 0.266 - 0.08 = 0.186$$
Thus, the expected frequency for the second class is $e_2 = 0.186 \times 50 = 9.3$.
The expected frequency for the first class interval is obtained by using the total area under the normal curve to the left of the boundary 5. For the last class interval, we use the total area to the right of the boundary 25. All other expected frequencies are found by the method described above for the second class. The expected frequency in each class is summarised in the table below.
Class boundaries   o_i   e_i
0 – 5              4     4
5 – 10             9     9.3
10 – 15            16    14.8
15 – 20            13    13.2
20 – 25            5     6.6
25 – 30            3     2.2
Adjacent classes whose expected frequencies are less than 5 are combined: the first two classes become "Below 10" and the last two become "Above 20". As a result, the total number of classes is reduced from 6 to 4. The following table shows the detailed calculations for the chi-squared value.
Class boundaries   o_i   e_i    (o_i – e_i)   (o_i – e_i)²   (o_i – e_i)²/e_i
Below 10           13    13.3   –0.3          0.09           0.0068
10 – 15            16    14.8   1.2           1.44           0.0973
15 – 20            13    13.2   –0.2          0.04           0.0030
Above 20           8     8.8    –0.8          0.64           0.0727
                                                             χ² = 0.180
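The expected frequencies for the four combined classes can also be reproduced without a normal table. The sketch below assumes the standard identity Φ(z) = ½(1 + erf(z/√2)) for the normal cumulative probability; the function name `normal_cdf` is a hypothetical name for this illustration. Because it does not round the table values, the statistic comes out slightly smaller than 0.180, although the decision is unaffected.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X < x) for a normal variable, using the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


mean, sd, n = 14.0, 6.4, 50

# Boundaries between the four combined classes:
# below 10, 10 to 15, 15 to 20, above 20
boundaries = [10.0, 15.0, 20.0]
observed = [13, 16, 13, 8]

# Cumulative probabilities at the boundaries; the open-ended first and
# last classes take all of the area in the corresponding tail
cumulative = [0.0] + [normal_cdf(b, mean, sd) for b in boundaries] + [1.0]
expected = [n * (cumulative[i + 1] - cumulative[i]) for i in range(len(observed))]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 1) for e in expected])
print(round(chi2, 3))
```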

Step 4: Determine the critical region
There are 4 classes after combining, so the number of degrees of freedom is 4 – 1 = 3. Using a significance level of 0.05, the critical value of chi-squared is 7.815.
Step 5: Make a decision
As χ² = 0.180 < 7.815, we have no reason to reject the null hypothesis and conclude that the normal distribution offers a good fit for the distribution of telephone call lengths.
Exercise 6.2
1. Assume that a chi-squared goodness-of-fit test is conducted. Determine the critical value of the chi-squared test statistic for each of the following cases.
(a) Number of categories = 7, α = 0.01
(b) Number of categories = 10, α = 0.10
2. A random sample of 500 observations is obtained and distributed into 4 categories as follows:
Category   1    2     3     4
x_i        49   263   146   42
Use α = 0.05 to test the null hypothesis H0 : p1 = 0.10, p2 = 0.50, p3 = 0.30, p4 = 0.10.
3. Three coins are tossed 150 times, and the observed frequencies of 0, 1, 2 and 3 heads per toss are 14, 43, 67 and 26 times respectively. Use a 5% significance level to test whether the three coins are balanced.
4. An experiment is to draw a card from a regular deck of 52 cards that has been thoroughly shuffled and to record whether it is a spade, heart, diamond or club. This process is repeated 40 times, each time replacing the card just drawn. After 40 trials, 9 spades, 13 hearts, 11 diamonds and 7 clubs are obtained. Test the hypothesis that the deck is honest at the 10% significance level.
5. Each package of beans sold in the supermarket is supposed to mix red beans, mung beans, black beans and black-eyed beans in the ratio 5:3:1:1. A random sample of 400 mixed beans selected from these packages is found to contain 210 red beans, 124 mung beans, 30 black beans and 36 black-eyed beans. Test the hypothesis that the packages contain the mixed beans in the ratio 5:3:1:1 at the 0.05 significance level.
6. A boy buys a bag of 100 jelly beans. This bag has 5 different colours of jelly beans in it. Assume all five colours are equally likely to be put in the bag. The boy is curious about the colour distribution and opens the bag. He finds that he has 17 brown, 24 yellow, 10 red, 31 green and 18 white. Test the hypothesis that the colours of the jelly beans occur with equal frequency at a significance level of 5%.
7. The number of road accidents per week at a junction is monitored by the public traffic department. The table below shows the frequency of accidents per week in 60 weeks.
Number of accidents   0    1    2    3
Observed frequency    28   15   12   5
Test the hypothesis that the data follow a Poisson distribution with mean 0.9 at the 5% significance level.

8. The following frequency distribution table represents the number of days during a year that a total of 50 employees at a company are absent from work due to illness. It is thought that the data follow a normal distribution with population mean μ = 7 and standard deviation σ = 3.
Number of days absent   Number of employees
0 – 3                   4
3 – 6                   13
6 – 9                   24
9 – 12                  7
12 – 15                 2
Test the goodness of fit between the observed class frequencies and the corresponding expected frequencies of a normal distribution at the 5% significance level.
9. A paper shop has several retail stores in a city. The following table shows the number of boxes shipped per day for the last 100 days.
Number of packages shipped   Number of days
0 – 5                        5
5 – 10                       13
10 – 15                      28
15 – 20                      23
20 – 25                      18
25 – 30                      10
30 – 35                      3
Use a 5% significance level to test the goodness of fit between the observed class frequencies and the corresponding expected frequencies of a normal distribution with mean 16.4 and standard deviation 7.2.
10. The table below shows the number of rain days in January for the years from 1953 to 2004.
Number of rain days   0   1   2    3    4   5
Observed frequency    9   7   14   15   6   1
Test the hypothesis that the recorded data may be fitted by the Poisson distribution with mean 2 at the 10% significance level.
11. A recent study reports the number of hours of personal computer usage per week for a sample of 60 persons. Excluded from the study are people who work in an office and use the computer as part of their work.
1.1   6.7   2.2   2.6    9.8   6.4   4.9   5.2   4.5   9.3
7.9   4.6   4.3   4.5    9.3   5.3   6.3   8.8   6.5   0.6
5.2   6.6   9.3   4.3    6.3   2.1   2.7   0.4   5.1   5.6
5.4   4.8   2.1   10.1   1.3   5.6   2.4   2.4   4.7   1.7
2.0   6.7   3.7   3.3    1.1   2.7   6.7   6.5   4.3   9.7
7.7   5.2   1.7   8.5    4.2   5.5   9.2   8.5   6.0   8.1
It is thought that the data follow a normal distribution with mean 7.4 and standard deviation 2.7. Test the hypothesis at the 5% significance level.

6.3 Tests of Independence
When two attributes (variables) are observed for each element of a random sample, the data can be simultaneously classified with respect to these attributes in a two-way classification table called a contingency table. We can then determine whether there is a significant association between the two attributes.
Suppose we take a random sample of 200 persons and classify them by gender and by whether they own handphones. The observed frequencies are presented in the following 2 × 2 contingency table.
          Own handphone (yes)   Own handphone (no)   Total
Male      70                    60                   130
Female    30                    40                   70
Total     100                   100                  200
A contingency table can be of any size. In general, a contingency table with r rows and c columns is denoted as an r × c table. The row and column totals in the above table are called marginal frequencies. It is common practice to refer to each possible outcome of an experiment as a cell. Hence in our example we have four cells.
Let us test the hypothesis of independence between a person's gender and a person's possession of a handphone. To perform this test, we first calculate the expected frequencies for each of the four cells of the above 2 × 2 contingency table under the assumption that the hypothesis is true.
Let M represent the event that an individual selected from the sample is male. Let Y represent the event that an individual selected owns a handphone. If M and Y are independent events, as assumed under the hypothesis, then $P(M \cap Y) = P(M)P(Y)$. But $P(M \cap Y) = \frac{e_{11}}{200}$, $P(M) = \frac{130}{200}$ and $P(Y) = \frac{100}{200}$. Thus, we have
$$\frac{e_{11}}{200} = \left(\frac{130}{200}\right)\left(\frac{100}{200}\right)$$
which we can rearrange as
$$e_{11} = \frac{130 \times 100}{200} = \frac{(\text{First row total})(\text{First column total})}{\text{Total sample size}}$$
where $e_{11}$ is the expected frequency for the cell in row 1 and column 1.
The general formula for obtaining the expected frequency of any cell is given by
$$\text{Expected frequency} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$$
The expected frequency for each cell is recorded in parentheses beside the actual observed value in the table shown below.
          Own handphone (yes)   Own handphone (no)   Total
Male      70 (65)               60 (65)              130
Female    30 (35)               40 (35)              70
Total     100                   100                  200
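The expected-frequency formula applies cell by cell, so it can be sketched as a short function. The name `expected_frequencies` and the list-of-rows layout are assumptions made for this illustration only.

```python
def expected_frequencies(table):
    """Expected cell counts under independence.

    Each cell is (row total)(column total) / (total sample size).
    `table` is a list of rows of observed frequencies, without totals.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Rows: Male, Female; columns: own handphone (yes), own handphone (no)
observed = [[70, 60],
            [30, 40]]

expected = expected_frequencies(observed)
print(expected)  # [[65.0, 65.0], [35.0, 35.0]]
```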

273 Mathematics Semester 3 STPM Chapter 6 Chi-squared Tests 6 Note that the expected frequencies in any row or column add up to the appropriate marginal total. We need to calculate only the one expected frequency in the top row of the table and then find the others by subtraction. The number of degrees of freedom associated with the chi-squared test used here is equal to the number of cell frequencies that may be filled in freely when we are given the marginal totals and the grand total, and in this illustration that number is 1. A simple formula providing the correct number of degrees of freedom is ν = (r – l)(c – l). Hence, for our example, v = (2 – 1)(2 – 1) = 1 degree of freedom. We want to measure how much the observed frequencies differ collectively, from their corresponding expected frequencies. We do this with the chi-squared test statistic χ2 = k ∑ i=1 (oi – ei ) 2 ei , where the summation extends over all the cells in the r × c contingency table. We have χ2 = (70 – 65)2 65 + (60 – 65)2 65 + (30 – 35)2 35 + (40 – 35)2 35 = 2.1978 Using a chi-squared table, we can see that for ν = 1, the critical value for 5% significance level is χ2 1 = 3.841. Since the calculated value for χ2 of 2.1978 does not fall within the critical region, we do not reject the hypothesis that there is no relationship between a person’s gender and the person’s possession of a handphone. Example 4 The following data show the attitude of housewives in various parts of the country to a certain brand of detergent. Attitude North Central South Like Indifferent Dislike 46 25 16 21 58 37 31 35 42 Test the hypothesis that the attitude to new introduced detergent is independent of geographical area of residence at the 1% significance level Solution: The given table is arranged to include the row and column totals. 
Attitude      North   Central   South   Total
Like          46      21        31      98
Indifferent   25      58        35      118
Dislike       16      37        42      95
Total         87      116       108     311

Step 1: State the hypotheses.
H0 : There is no association between attitude and location.
H1 : There is an association between attitude and location.

Step 2: Specify the significance level.
Given α = 0.01.

Step 3: Select the appropriate test statistic and calculate its value.
Use the chi-squared test for independence to determine whether there is any significant association between the two categorical variables. As with the goodness-of-fit test described earlier, the key idea of the chi-squared test for independence is a comparison of observed and expected frequencies. The expected frequency for each cell of the table can be generated using the following formula:

$$\text{Expected frequency} = \frac{(\text{Row total})(\text{Column total})}{\text{Total sample size}}$$

In fact, for a 3 × 3 contingency table, only the four expected values in the top two rows and first two columns need to be calculated; the remaining five expected values are found by subtraction. For example, the expected frequency for attitude "Like" and area "North" is

$$\frac{98 \times 87}{311} = 27.41.$$

In this way, the table of both observed and expected frequencies (in parentheses) is as shown below.

Attitude      North        Central      South        Total
Like          46 (27.41)   21 (36.55)   31 (34.04)   98
Indifferent   25 (33.01)   58 (44.01)   35 (40.98)   118
Dislike       16 (26.58)   37 (35.44)   42 (32.98)   95
Total         87           116          108          311

The number of degrees of freedom is $\nu = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
The chi-squared test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \frac{(46 - 27.41)^2}{27.41} + \frac{(21 - 36.55)^2}{36.55} + \frac{(31 - 34.04)^2}{34.04} + \frac{(25 - 33.01)^2}{33.01} + \frac{(58 - 44.01)^2}{44.01} + \frac{(35 - 40.98)^2}{40.98} + \frac{(16 - 26.58)^2}{26.58} + \frac{(37 - 35.44)^2}{35.44} + \frac{(42 - 32.98)^2}{32.98} = 33.5057$$

Step 4: Determine the critical region.
From the chi-squared table, the critical value of $\chi^2$ for 4 degrees of freedom at the 1% level is 13.28. The critical region is $\chi^2 > 13.28$.

Step 5: Make a decision.
As the calculated value 33.51 is greater than the critical value 13.28, there is sufficient evidence to reject H0; that is, attitude to the new detergent and geographical area of residence are not independent.
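The five steps of Example 4 can be traced in code. This is a minimal sketch under stated assumptions, not the textbook's own method: the function name `chi_squared_independence` is illustrative, and the critical value 13.28 is copied from the chi-squared table quoted in the example rather than computed. Because the expected frequencies are not rounded to two decimal places here, the statistic comes out at about 33.5, slightly different from the hand-calculated 33.5057.

```python
def chi_squared_independence(observed):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency = (row total)(column total) / (sample size)
            e = row_totals[i] * col_totals[j] / n
            statistic += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Observed frequencies from Example 4: columns are North, Central, South.
observed = [
    [46, 21, 31],  # Like
    [25, 58, 35],  # Indifferent
    [16, 37, 42],  # Dislike
]

# Critical value for 4 degrees of freedom at the 1% level, taken from the table.
CRITICAL_VALUE = 13.28

statistic, dof = chi_squared_independence(observed)
reject_h0 = statistic > CRITICAL_VALUE
print(f"chi-squared = {statistic:.2f}, degrees of freedom = {dof}, reject H0: {reject_h0}")
```

Hard-coding the critical value keeps the sketch dependent only on the standard library; a statistics package could supply it instead.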

Exercise 6.3

1. An experiment has 500 observations and the data are classified into a 4 × 6 contingency table. Suppose we conduct a chi-squared test of independence at the 1% significance level. Assume the calculated value of the chi-squared test statistic is 39.2.
(a) Determine the number of degrees of freedom.
(b) Find the critical value for the chi-squared test of independence.
(c) Determine whether the value of the chi-squared test statistic falls into the critical region.

2. The following 3 × 2 contingency table contains observed values for a sample of size 250. Determine whether the row and column variables are independent using the chi-squared test with α = 0.025.

     X    Y
A    25   32
B    37   63
C    55   38

3. A research group performs a study on gender and handedness (right- or left-handed). 800 individuals are randomly chosen from a very large population. The following contingency table displays the distribution of the two categories.

         Right-handed   Left-handed
Male     344            72
Female   352            32

Test the hypothesis that gender is independent of handedness at the 5% significance level.

4. Consider a sample of 200 customers. For each customer, we have information on gender and preference of food. A contingency table for these data is shown below.

         Indian   Japanese   Western
Male     40       20         50
Female   20       50         20

Carry out a test, at the 5% significance level, to determine whether there is any association between gender and preference of food.

5. In an experiment to study the association between diabetes and smoking habits, the following data are obtained for 150 individuals.

              Nonsmokers   Moderate smokers   Heavy smokers
Diabetes      25           30                 18
No diabetes   40           21                 16

Using a 1% significance level, test the hypothesis that there is no association between cigarette smoking and the risk of diabetes.

6. A camera manufacturer has four suppliers of lenses. The table below shows the numbers of good and defective lenses supplied by the suppliers.

             Good   Defective
Supplier 1   95     5
Supplier 2   180    15
Supplier 3   134    16
Supplier 4   138    7

Test, at the 5% significance level, whether the supplier is associated with the lens quality. What is your advice to the purchasing department based on the test result?

7. The table shows the result of a taste test in which a random sample of 500 people in two age groups is asked which of four formulations of a chocolate drink they prefer.

Age group   Formulation A   Formulation B   Formulation C   Formulation D
7 – 25      30              69              116             78
26 – 50     28              36              70              73

Use a 0.01 significance level to test whether the preference for the different formulations changes with age.

8. Fruit trees are subject to a bacteria-caused disease. Several different treatments for this disease are adopted. Treatment A: no action taken; treatment B: careful removal of clearly affected branches; and treatment C: frequent spraying of the leaves with an antibiotic in addition to careful removal of clearly affected branches. There are a few different outcomes from the disease. Outcome 1: tree dies in the same year as the disease is noticed; outcome 2: tree dies 2 to 4 years after the disease is noticed; outcome 3: tree survives beyond 4 years. A group of 200 trees are each assigned to one of the treatments and over the next few years the outcome is recorded. The results are displayed in the following contingency table.

          Treatment
Outcome   A    B    C
1         37   24   17
2         16   20   32
3         3    15   36

Determine whether there is any substantial evidence to conclude that outcome is independent of treatment. Use a 5% significance level for this test.

9. The table below shows the observed distribution of blood types A, B, AB and O in three samples of Malays living in Kedah, Selangor and Johor.

Blood type   Kedah   Selangor   Johor
A            14      205        41
B            16      184        37
AB           3       51         11
O            17      232        51

Test, at the 5% significance level, whether the distribution of blood type is different across the three states.

10. A manufacturer operates four assembly machines on three separate shifts daily. The table below gives the number of machine breakdowns recorded in the past year.

               Machine 1   Machine 2   Machine 3   Machine 4
First shift    75          89          43          28
Second shift   90          108         63          59
Third shift    141         175         121         141

Determine whether these data provide sufficient evidence, at the 2.5% significance level, to infer that machine breakdown is independent of shift.

Summary

1. The chi-squared distribution has one parameter, called the number of degrees of freedom.
2. The chi-squared distribution curve lies to the right of the vertical axis and is skewed to the right.
3. In a goodness-of-fit test, we test the null hypothesis that the observed frequencies follow a certain pattern or theoretical distribution.
4. In a test of independence, we test the null hypothesis that two attributes are independent.
5. General test procedure in a chi-squared test:
• State the hypotheses.
• Specify the significance level.
• Calculate the value of the chi-squared test statistic $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$ (combine any adjacent classes where necessary).
• Determine the critical region based on the number of degrees of freedom and the significance level.
• Make a decision.

STPM PRACTICE 6

1. (a) Find $P(0.83 < \chi^2_5 < 12.8)$.
(b) Determine the value of k such that $P(6.447 < \chi^2_{21} < k) = 0.049$.

2. A departmental store sells men's shirts and stocks these shirts in five different sizes: S, M, L, XL and XXL. The number of shirts sold each week is recorded.

Size               S    M    L    XL   XXL
Number of shirts   21   24   39   25   13

Test, at a 10% significance level, the hypothesis that the number of shirts sold is uniformly distributed across the sizes.

3. Three identical dice are thrown 150 times. The number of dice whose scores on the top faces at each throw are odd is recorded. The results are as follows:

Number of odd scores   0    1    2    3
Frequency              33   59   43   15

Using a 5% significance level, test the hypothesis that all three dice are unbiased.

4. Cars heading to a certain junction may go straight, turn left or turn right. A road transport department officer asserts that 60% of the cars will go straight at the intersection, and of the remaining 40%, equal proportions will turn left and right. One hundred cars are randomly monitored and it is found that 51 cars go straight, 17 cars turn left and 32 cars turn right. Test, at the 5% significance level, the hypothesis that the proportions of cars going straight, turning left and turning right do not differ significantly from those asserted by the officer.

5. A pharmaceutical company conducts a trial on 200 patients to determine the effectiveness of a new cough remedy. Of these patients, 100 are randomly selected to be given the standard cough remedy and the remaining 100 are assigned the new cough remedy. The results are recorded as shown.

              Standard cough remedy   New cough remedy
No relief     53                      37
Some relief   34                      44
Full relief   13                      19

Carry out a test, at a significance level of 5%, to investigate whether the two cough remedies are equally effective.

6. A football fan keeps a record of the goals scored per match by his favourite team. The results are shown below.

Goals obtained per match   0    1    2    3    4   5   6
Number of matches          11   16   25   14   7   5   2

Using a 5% significance level, perform a test of the hypothesis that the number of goals per match has a Poisson distribution with mean 2.2.

7. The following table gives the cumulative frequency distribution of the lives (in years) of 40 notebook batteries tested by a battery manufacturer.

Battery life not greater than   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0
Cumulative frequency            0     2     3     7     22    32    37    40

Based on previous experience, it is believed that a normal distribution with mean 3.5 years and standard deviation 0.7 year provides a good approximation. Perform a chi-squared test, at the 5% significance level, to determine whether the normal distribution gives a good fit for these data.

8. The table below shows the frequency distribution of marks for a paper obtained by 178 candidates.

Mark, x       Number of candidates
50 < x < 60   5
40 < x < 50   19
30 < x < 40   34
20 < x < 30   63
10 < x < 20   47
0 < x < 10    10

The population mean and standard deviation of the distribution of marks for the paper are 26.0 and 11.5 respectively. Test, at the 10% significance level, the hypothesis that the distribution of marks for the paper is normal.

9. A botanist sows three seeds in each of 80 pots. The number of seeds which germinate in each pot is recorded. The results for all 80 pots are given in the following table.

Number of seeds germinated   0    1    2    3
Number of pots               25   20   29   6

Using a 1% significance level, test the hypothesis that the data may be fitted by the binomial distribution B(3, 0.4).

10. The distribution of marks for a paper in an examination has mean μ and standard deviation σ. Each candidate is assigned one of the five grades A, B, C, D, E as follows:

Mark, x                                                   Grade
$x > \mu + \frac{3\sigma}{2}$                             A
$\mu + \frac{\sigma}{2} < x < \mu + \frac{3\sigma}{2}$    B
$\mu - \frac{\sigma}{2} < x < \mu + \frac{\sigma}{2}$     C
$\mu - \frac{3\sigma}{2} < x < \mu - \frac{\sigma}{2}$    D
$x < \mu - \frac{3\sigma}{2}$                             E

The table below summarises the grades of a random sample of 198 candidates.

Grade                  A    B    C    D    E
Number of candidates   17   55   81   33   12

Determine, at the 1% significance level, the adequacy of the normal distribution $N(\mu, \sigma^2)$ as a model for these data.

11. The lengths (in millimetres) in a random sample of 50 leaves of a certain plant are recorded as follows:

145 133 125 157 165 138 143 151 148 132
155 136 144 158 147 152 140 148 146 150
138 177 165 118 154 126 163 121 140 168
163 135 147 153 146 140 173 142 135 138
156 147 142 128 144 145 151 135 161 150

Test the hypothesis that the leaf length can be approximately modelled by the normal distribution $N(146, 13^2)$. Use a 0.05 significance level.

12. The table below shows the number of individuals exposed to a certain virus and the number of individuals who develop the disease.

                          Development of disease
Exposure to virus         Yes    No
Yes                       44     116
No                        19     128

Conduct a test of hypothesis, at the 1% significance level, to determine whether there is an association between exposure to the virus and development of the disease.

13. The table below shows the number of males and females in each of three employment categories at a manufacturing company.

         Managerial   Support   Worker
Male     10           39        285
Female   6            52        624

Using a 1% significance level, test whether there is any association between gender and employment category.

14. A researcher in a study of heart disease in males links subjects to socioeconomic status and smoking habits. The results are summarised in the contingency table below.

                 Socioeconomic status
Smoking habits   High   Middle   Low
Current          66     29       55
Former           119    27       36
Never            88     12       30

Perform a chi-squared test of association between smoking habits and socioeconomic status. Use a significance level of 2.5%.

15. A hypermarket wants to study the relationship between the method of payment and the age group of its customers. A random sample of 250 customers is taken and the results are summarised in the table below.

                 Age group
Payment method   18 – 25   26 – 35   36 – 45   Over 46
Card             18        36        25        30
Cash             14        27        33        67

Carry out a test, at the 5% significance level, to find out whether the method of payment is independent of age group.

16. The School of Biological Sciences of a university records the level of exposure to a certain pollutant and the number of brain abnormalities for laboratory mice. The data are summarised in the table below.

                                 Number of brain abnormalities
Level of exposure to pollutant   0 – 2   3 – 4   5 – 6
High                             12      18      39
Medium                           8       7       13
Low                              7       8       8

Test, at the 5% significance level, whether there is an association between the level of exposure to the pollutant and the number of brain abnormalities found in the laboratory mice.

17. The table below summarises the number of hours of sleep at night for a random sample of adults of different age groups.

            Number of hours of sleep
Age group   Less than 6   6 to 8   More than 8
25 – 44     41            85       70
45 – 54     34            77       62
> 55        76            69       43

Carry out a test, at the 1% significance level, to determine whether the number of hours of sleep is independent of the age of an adult.

18. A plant expert collects samples of rice from a large field of 600 plots. One part of his investigation is based on the sterility observed and the genotype used for each plot.

             Genotypes
Sterility    I     II    III   IV
No problem   30    21    19    16
Moderate     102   90    120   77
Severe       18    39    11    57

Test, at a 1% significance level, whether sterility is independent of genotype.

19. Some years ago, a college offered three different courses: science, arts and commerce. The contingency table below shows how 180 new students, classified by gender, registered for each course.

         Course
Gender   Science   Arts   Commerce
Male     45        20     35
Female   18        34     28

Use a chi-squared test at the 5% significance level to examine whether choice of course is independent of gender.

20. A researcher investigates a belief that a person's skin reaction when exposed to ultraviolet light is related to the person's eye colour. The eye colour and skin reaction of 200 randomly selected people are recorded as follows:

                Eye colour
Skin reaction   Brown   Blue   Others
None            22      28     17
Mild            41      25     8
Strong          33      7      19

Use a chi-squared test, at the 5% significance level, to test whether skin reaction to ultraviolet light is independent of a person's eye colour.

21. A survey is carried out on television programmes watched by adults during their leisure time. The survey involves 85 randomly picked adults of different genders. The results are summarised in the following contingency table.

         Drama   Comedy   Sports
Male     20      25       11
Female   17      10       2

Carry out a chi-squared test, at the 5% significance level, to investigate whether the selection of programmes is independent of gender.

22. The grades in a mathematics paper for a particular term are as follows:

Grade                A    B    C    D
Number of students   11   18   32   19

Test the hypothesis, at the 5% level of significance, that the distribution of grades is uniform. (Hint: Test the grades in the ratio 1 : 1 : 1 : 1.)

23. In an experiment to study the relationship between smoking and hypertension, the following data are taken on 90 individuals.

                  Smokers   Non-smokers
Hypertension      27        10
No hypertension   24        29

Test the hypothesis that there is no relationship between smoking habits and hypertension. Use a 5% level of significance.

MATHEMATICS (T) PAPER 3
(One and a half hours)

Instruction to candidates:
Answer all questions in Section A and any one question in Section B. Answers may be written either in English or Bahasa Malaysia. All necessary working should be shown clearly. Scientific calculators may be used. Programmable and graphic display calculators are prohibited.

Section A [45 marks]
Answer all questions in this section.

1. The number of patients seeking treatment at a private clinic is recorded over a period of 20 days as follows:

55 48 63 75 33 25 52 56 55 74
65 90 63 49 20 48 34 65 49 55

(a) Find the median and interquartile range. [4 marks]
(b) Determine whether there are outliers. [3 marks]
(c) Draw a box-and-whisker plot to represent the data. [3 marks]
(d) Comment on the shape of the frequency distribution. [2 marks]

2. The events A and B are such that P(A) = 0.6, P(B) = 0.5 and P(A ∪ B) = 0.8. Show that events A and B are independent. The event C is such that P(A ∪ C) = 0.25, P(B ∪ C) = 0.85 and P(A ∩ C) = 3P(B ∩ C). Find P(C). [5 marks]

3. The number of mosquito larvae in a particular pond can be modelled by a Poisson distribution with a mean of 80 larvae per 1000 ml of water.
(a) Calculate the probability that there are at most 3 larvae in a random sample of 100 ml of water. [3 marks]
(b) What is the most likely number of larvae in a random sample of 100 ml of water? [5 marks]

4. The masses of a certain type of battery have mean 1.48 kg and standard deviation 0.55 kg.
(a) Find the probability that the mass of a randomly selected battery is between 1.3 kg and 1.6 kg.
(b) Find the probability that the mean mass of 25 randomly selected batteries is between 1.3 kg and 1.6 kg. [4 marks]

5. A bag contains a very large number of balls of the same size but of various colours. The proportion of blue balls is p. It is required to test the null hypothesis H0 : p = 0.53 against the alternative hypothesis H1 : p < 0.53. A random sample of 10 balls is taken and it is noted that 3 balls are blue. Should the null hypothesis be rejected at a significance level of 10%? [6 marks]

6. Before making a class arrangement, a primary school headmaster takes a random sample of 60 new students and compares the pre-arithmetic achievement levels of those who attended kindergarten and those who did not. The results are shown in the contingency table below.

                  Below level   At level   Above level
No kindergarten   10            7          7
Kindergarten      7             18         11

Test the hypothesis that there is an association between attending kindergarten and pre-arithmetic achievement. Use a 0.05 level of significance to perform the test. [10 marks]

Section B [15 marks]
Answer any one question in this section.

7. The life-span, t (in days), of a certain insect has the cumulative distribution function F given by

$$F(t) = \begin{cases} 1, & t > 3 \\ ct^2(18 - t^2), & 0 < t \le 3 \\ 0, & t \le 0 \end{cases}$$

(a) Determine the value of the constant c. [2 marks]
(b) Find the probability that the life-span is more than 2 days. [3 marks]
(c) Calculate the median. [5 marks]
(d) Find the probability density function of the life-span of the insect, and sketch its graph. [5 marks]

8. An ambulance service serves a district area. The time taken, in minutes, from receiving an emergency call to the arrival of an ambulance at the location requested was recorded for a random sample of 80 instances, and the results were summarised by $\sum x = 1570$ and $\sum x^2 = 31\,810$.
(a) Obtain unbiased estimates of the mean and the variance of the waiting time. [4 marks]
(b) Construct a symmetrical 95% confidence interval for the mean waiting time. [5 marks]
(c) Explain how this confidence interval can be used to infer the result of a two-tailed hypothesis test, at the 5% significance level, that the mean waiting time is the target of 19 minutes. [2 marks]
(d) It is now decided to examine, at the 5% significance level, whether the mean waiting time is less than 19 minutes. [4 marks]

285 1 Data Description Exercise 1.1 1. (a) discrete (b) continuous (c) continuous (d) discrete (e) discrete 2. (a) Stem Leaf 4 5 6 7 8 9 10 4 0 0 6 6 2 3 5 9 9 1 1 3 4 5 6 6 7 7 1 3 5 Key: 7|2 means 7.2 mm (b) Stem Leaf 45 50 55 60 65 70 75 80 2 2 2 4 3 4 4 0 2 2 3 3 0 0 1 2 3 4 4 0 1 2 4 4 1 1 Key: 50|2 means 52 kg (c) Stem Leaf 1 1 1 1 2 2 2 2 2 2 2 5 5 5 5 5 5 6 8 8 8 8 9 9 9 9 0 0 1 1 1 1 2 2 2 4 4 Key: 1|9 means 0.19 second (d) Stem Leaf 0 0 1 1 2 2 3 3 4 4 5 5 6 6 2 3 8 2 1 4 8 8 9 0 1 2 5 6 7 8 8 9 2 2 8 9 2 6 6 8 2 5 Key: 4|8 means 4.8 3. (a) 6.3 hours, [6.0 – 6.4] hours (b) 0.63 g, [0.60 – 0.64] g 4. (a) Girls Boys 0 0 1 1 1 1 1 0 0 1 1 1 0 0 1 1 0 0 0 0 9 11 13 15 17 19 21 23 25 1 1 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 Key: 1|15 Key: 21|1 means 16 means 22 Comment: Boys hold their breath longer than girls. (b) Chemistry Mathematics 4 1 6 6 5 4 3 9 8 7 3 3 0 8 7 2 1 7 4 2 3 4 5 6 7 8 2 2 4 5 2 7 9 2 4 5 6 7 1 3 3 7 8 9 4 6 Key: 6|4 Key: 5|2 means 46 means 52 Comment: Students achieved higher marks in Mathematics compared to Chemistry. Exercise 1.2 1. (a) f 0 x 4 5 6 7 8 9 10 8 12 16 20 24 (b) 0 10 4.5 9.5 14.5 19.5 24.5 29.5 34.5 39.5 20 30 40 f x ANSWERS

286 Mathematics Semester 3 STPM Answers 2. (a) 0 Number of applicants 50 100 5 Age (years) 10 15 20 25 30 35 40 150 200 250 (b) 0 2 48 92 166 65 12 3.2 468 20 12 40 50 100 150 200 Number of workers Distance (km) 31 1 2 3. 0 4 7.0 14.0 21.0 28.0 35.0 42.0 49.0 8 12 12 9 15 6 4 5 1.5 16 Number of patients 3.5 10.5 17.5 24.5 31.5 38.5 45.5 Time (minutes) 4. (a) Number of workers 400 Net income (RM) 600 800 1 000 1 200 1 4001 600 1 800 0 4 8 12 16 20 24 (b) Net income (RM) Number of workers 400  x  600 5 600  x  800 11 800  x  1 000 23 1 000  x  1 200 6 1 200  x  1 400 18 1 400  x  1 600 9 1 600  x  1 800 11 The two histograms drawn in (a) and (b) are the same except the income classes in (b) are RM200 less than the one in (a). Number of workers 400 Net income (RM) 600 800 1 000 1 200 1 4001 600 1 800 0 4 8 12 16 20 24 Exercise 1.3 1. (a) 2 0 10 20 30 40 50 4 6 8 10 Cumulative frequency 12 14 x (b) 0.5 0 10 20 30 40 50 10.5 20.5 30.5 Time (minutes) Cumulative frequency 40.5 50.5 60.5 2. 0 4.5 9.5 14.5 19.5 24.5 29.5 34.5 39.5 40 80 120 Number of days Number of crime cases Upper boundary ‘Less than’ cumulative frequency  0.5 0  4.5 5  9.5 42  14.5 129  19.5 250  24.5 327  29.5 369  34.5 390  39.5 400

287 Mathematics Semester 3 STPM Answers 0 4.5 9.5 14.5 19.5 24.5 29.5 34.5 39.5 100 200 300 400 Cumulative frequency Number of crime cases Actual number of crime cases in a day Numbers of days 0 – 16 5 20 – 36 37 40 – 56 87 60 – 76 121 80 – 96 77 100 – 116 42 120 – 136 21 140 – 156 10 Number of actual crime cases 0 18 38 58 78 98 118 138 158 100 200 300 400 Cumulative frequency 3. Upper boundary ‘Less than’ cumulative frequency  10 0  15 5  20 33  25 102  30 228  35 312  40 357  50 423 0 10 15 20 25 30 35 40 45 50 100 200 300 400 Cumulative frequency Time (minutes) 4. Upper boundary ‘More than’ cumulative frequency  6 253  7 248  8 230  9 220  10 165  11 68  12 26  13 0 Petrol usage (l) 0 6 7 8 9 10 11 12 13 100 200 300 Cumulative frequency Exercise 1.4 1. (a) 6 (b) 8.5 (c) 0.65 2. (a) 42, 56 (b) 26.5, 28 3. 4, 4 4. 50.0 kg 5. 46.0 minutes 6. 0 10 39.5 49.5 54.5 59.5 64.5 44.5 20 30 40 50 Cumulative frequency Mass (kg) Median mass = 50.0 kg 7. 0 10 29.519.5 39.5 49.5 59.5 69.5 20 30 40 79.5 Cumulative frequency Time (minutes) Median time = 46.0 minutes

288 Mathematics Semester 3 STPM Answers 8. 0 10 0.465 0.505 0.525 0.545 0.565 0.485 20 30 40 Diameter (cm) Cumulative frequency Median diameter = 0.515 cm Exercise 1.5 1. (a) 5.571 (b) 7.5 2. (a) 2.89 (b) 6.385 3. (a) 6.119 (b) 86.119; Difference between two means = 80 4. 84 5. 10 6. (a) 5.374 (b) 28.879 7. 10.58 The classes in the second distribution are the values in the first distribution + 120. Therefore, the mean = 130.581. For the third distribution, mean = 30 581. 8. 26.333 9. 454 g 10. 45 11. y Number of computers –2 3 –1.5 15 –1.0 27 –0.5 45 0 72 0.5 30 1.0 11 Mean for y = – 0.256 Mean for x = 0.0019994 Exercise 1.6 1. (a) 12, 2.5 (b) 41, 18 (c) 1.6, 0.8 2. (a) 97 g, 91 g, 107 g (b) 91 g – 107 g 3. (a) 27 cm, 25 cm (b) 27 kg, 9 kg 4. (a) 8, 2 (b) 5, 2 5. RM145, RM69 6. 14 bags, 26 bags; 16 bags 7. (a) 20 tonnes, 9 tonnes (b) 22.5 tonnes 8. (a) RM97 000, RM10 000 (b) RM87 000 (c) RM106 000 9. 5.52 minutes, 16.13 minutes, 10.60 minutes 10. 35.89, 69.86, 33.97 11. 20.56 m, 21.94 m, 1.38 m 12. x f Cumulative frequency  4.5 0 0  7.5 4 4  10.5 8 12  13.5 16 28  16.5 28 56  19.5 29 85  22.5 19 104  25.5 10 114  28.5 3 117 0 (fif ) 40 5.5 15.5 10.5 20.5 25.5 80 120 Cumulative frequency 1 4 (fif ) 3 4 x 17.0, 6.5 13. (a) Mass (kg) Cumulative frequency  39.5 0  44.5 9  49.5 15  54.5 36  59.5 90  64.5 144  69.5 153  74.5 156 0 Median 39.5 44.5 49.5 54.5 Q 57 1 Q3 59.5 61 64.5 69.5 74.5 40 80 120 160 Cumulative frequency Mass (kg) (b) 60 (c) 52 (d) 58.5 kg; 6.5 kg

289 Mathematics Semester 3 STPM Answers 14. (a) 0 2 000 20.510.5 30.5 40.5 50.5 60.5 70.5 80.5 90.5 100.5 4 000 6 000 8 000 1 000 3 000 5 000 7 000 9 000 Cumulative frequency Marks (b) 27.6% – 78.3% (c) 4 089 candidates Exercise 1.7 1. 1.49 2. 16.162 3. 280.6 cm 4. 2.61 km 5. 4.09 years 6. X : 7.32 minutes; Y : 9.01 minutes 7. 140.67, 64.52 8. RM160 000 9. RM88 199.50, RM30 541.10 10. 41.12, 0.96 11. 82.80, 6.67 Exercise 1.8 1. 15, 15, 14.4 0 4 10 12 14 16 17 11 13 15 8 12 2 6 10 14 Frequency Number of eggs Negatively skewed distribution. 2. (a) Time (minutes) 0 10 19.5 29.5 34.5 39.5 49.5 59.5 69.5 74.5 79.5 89.5 94.5 20 30 Number of students (b) Positively skewed distribution. 3. (a) 0 Frequency 2 4 19.5 Length (mm) 9.5 29.5 49.5 69.5 89.5 39.5 59.5 79.5 99.5 6 8 10 (b) Mean . mode (c) Positively skewed distribution. 4. (a) 10 639.5 649.5 659.5 669.5 679.5 689.5 694.5 699.5 704.5 714.5 724.5 734.5 20 30 40 50 60 70 80 90 Life-span (hours) Density of number of bulks (b) Negatively skewed distribution; (c) median 5. (a) Mean = 125.2 mm Hg, median = 122.8 mm Hg (b) Positively skewed distribution (Median < Mean) Exercise 1.9 1. 2.75 2. –0.30 3. (a) 0.43 (b) –0.52 Exercise 1.10 1. (a) 1 2 3 4 5 (b) 10 11 12 13 14 15 16 (c) 0 1 2 3 4 5 6 7 (d) 46 47 48 49 50 51 52 2. (a) Stem Leaf 80 81 82 83 84 85 1 2 2 2 5 9 0 0 2 3 5 7 9 9 1 2 2 5 9 9 9 0 1 5 7 8 3 Key: 80|2 means 8.02

290 Mathematics Semester 3 STPM Answers (b) 8.0 8.1 8.2 8.3 8.4 8.5 8.6 3. (a) 2.33 minutes, 1.46 minutes, 3.14 minutes (b) 0 1 2 3 4 1.46 2.33 3.14 5 6 7 8 9 10 (c) Positively skewed distribution. 4. 10 12 43 20 40 60 80 100 120 30 50 70 90 110 140130 62 79 121 ‘Outlier’ values = 134 and 145 5. Group A : Symmetrical Group B : Negatively skewed 6. 0 20 40 50 60 80 Girls Boys The distribution of the girls’ marks was skewed positively while the distribution of the boys’ marks was skewed negatively. As a whole, the boys’ performance was better than the girls’. 7. January July Median = 54 Q1 = 35 Q3 = 72 Median = 16 Q1 = 6 Q3 = 33 0 2010 4030 50 60 8070 90 100 110 January July The distribution of ABCD company’s share sales in January was negatively skewed but in July, the distribution of sales was positively skewed. ABCD company’s share sales is higher in January compared to July. STPM Practice 1 1. (a) 171 g, 23 g (b) 174.5 g, 8.5 g 2. (a) Stem Leaf 12 13 14 15 16 8 9 1 4 0 2 2 2 3 4 4 5 6 7 7 9 9 0 0 0 1 1 3 4 4 7 7 9 1 3 Key: 12|8 means 12.8% (b) Modes: 14.2%, 15.0%, Range: 3.5% (c) 14.8%, 1.1% 3. (a) Stem Leaf 25 30 35 40 45 50 55 60 0 1 3 3 4 0 0 0 1 1 1 2 2 2 3 4 0 0 1 2 2 2 2 2 2 4 1 1 1 3 3 4 3 0 1 0 Key: 25|3 = 28 years (b) 37 years, 35 years (c) 35.5 years, 10 years 4. 13.63 hours, 16.98 hours 5. 744, 16 6. (a) Age (years) Cumulative frequency 0 504030 9.0 12.5 17.0 2010 20 40 60 80 100 120 140 160 180 200 (b) 12.5, 8.24 7. Height (cm) Cumulative frequency 0 113 121.5 129 138 147.5 100 110 120 130 140 150 160 170 40 100 200 300 400 129.0 cm, 16.5 cm

291 Mathematics Semester 3 STPM Answers 8. (a) Cumulative frequency 10 0 First quartile 1000 hours Median Time (minutes) 1015 hours Third quartile 8580 90 95 100 105 110 115 120 125 20 30 40 50 60 70 80 90 100 (b) 96.5 minutes, 9 minutes (c) 64 9. Cumulative frequency 0 2.995 3.195 3.395 3.595 3.795 3.995 4.195 4.395 4.595 4.795 4.995 0.1 0.2 0.3 0.4 0.43 0.5 0.6 3.85 3.46 0.7 0.8 0.75 0.9 1.0 Time (hours) Median 3.495 (a) 3.495 hours (b) 32% 10. (a) 515.6 g, 9.0 g (b) 40% 11. Machine A: 200 g, 2.32 g Machine B: 200 g, 4.90 g Machine A is more reliable 12. 20% 13. (a) 1.7 (b) 8.9 minutes 14. (a) 105 hours (b) 104.25 hours, 7.886 hours (c) 33 50 hours 15. (a) 34.45, 13.65 (b) 66.7% 16. (a)Cumulative frequency Length of metal rods (cm) 20 119.996 119.998 120.000 120.002 120.004 40 60 80 0 10 30 50 70 90 100 (b) 119.9996 (c) 119.9998 (d) 0.002 17. 4.53 kg 18. 15.817 mm Diameter (mm) Cumulative frequency (%) 20 15.55 15.65 15.75 15.85 15.95 16.05 16.15 40 60 80 100 0 15.874 mm. This adjustment can be said to be satisfactory because the standard deviation is only 0.118. 19. 17.53 hours, 2.22 hours 20. 1.832 m, 0.16 m 21. 5.3, 2.465 22. (a) Number of people ( millions) Age (years) 1 0 2 20 40 60 80 100 120 3 4 5 3.85 4.64.4 3.55 2.9 1.0 (b) 35 years (c) 38.25 years

292 Mathematics Semester 3 STPM Answers 23. (a) Mass (kg) Frequency 9.54.5 14.5 19.5 24.5 29.5 34.5 10 0 20 30 40 (b) 17, 8 Increment in mass (kg) Cumulative frequency 20 0 4.5 9.5 29.5 14.5 24.5 19.5 34.5 Q1 M Q3 40 60 80 100 (c) 17.85, 5.57 (d) Median, because the graph is skewed positively. 24. (a) 20.2 litres, 20.6 litres (b) Almost symmetrical, median × mean 25. (a) Length of leaves (cm) Frequency 2.2 2.7 3.2 3.7 4.2 4.7 5.2 5.7 6.2 10 20 30 Negatively skewed. (b) 4.405 cm, 0.91 cm (c) 3.45 cm  length of leaves  3.95 cm (d) Mid-point (cm) Frequency Length (cm) Cumulative frequency 1.7 0  1.95 0 2.2 3  2.45 3 2.7 5  2.95 8 3.2 8  3.45 16 3.7 12  3.95 28 4.2 18  4.45 46 4.7 24  4.95 70 5.2 20  5.45 90 5.7 8  5.95 98 6.2 2  6.45 100 Median = 4.54 cm 26. (a) The distribution is positively skewed (b) Median = 40.9 27. (a) Number of families Annual income 10 1 40 37 5.5 2.670.66 10 000 20 000 30 000 20 30 40 (b) 0.795 28. (a) 30 mm, 27 mm (b) 88 mm (c) 0 Boundary 'Outlier' value 10 30 50 70 90 100 20 40 60 80 29. (a) 713.5, 50.5 (b) 619.5 659.5 699.5 739.5 779.5 639.5 679.5 719.5 759.5 799.5 30. (a) X : Median = 200 Interquartile range = 40 Y : Median = 160 Interquartile range = 60 (b) (i) Distribution of group X is negatively skewed while the distribution of group Y is symmetrical. (ii) Median of group X is more than the median of group Y. (c) The ranges for the two distributions are the same. 31. (a) Time (minutes) Number of children ,10 0 ,20 4 ,30 14 ,40 31 ,50 40 ,60 46 ,70 49 ,80 50

293 Mathematics Semester 3 STPM Answers 10 20 30 40 50 0 Cumulative frequency 10.5 20.5 30.5 40.5 50.5 Time (minutes) 60.5 70.5 80.5 (b) Median  36 minutes Interquartile range  19 minutes (c) Affects both the median and interquartile range. New median  37 minutes, new interquartile range  18 minutes (d) (i) 88.5% (ii)  38 minutes 32. 58.4, 9.2 33. 163.1 cm, 1.3 cm 34. (a) 82.1 s (b) 80.5 s, 85.1 s, 89.3 s (c) 16% 35. 1.6; The distribution of the data is positively skewed. 36. The median average ranking scores for males and females are the same. The average ranking scores of the males are skewed slightly towards hygienic status and the range is small. The average ranking scores of females show a wider range compared to that of the males, and are highly skewed towards hygienic status. 37. (a) 6.025 minutes (b) Time of arrival (t) Cumulative frequency < 0754 0 < 0756 1 < 0758 9 < 0800 22 < 0802 53 < 0804 74 < 0806 80 (c) Median time = 0801 hours, Mode time = 0801 hours (d) Q1 ≈ 0800 hours, Q3 ≈ 0803 hours Semi-interquartile range = 1.5 minutes (e) 72.5% 38. (a) Median, Q2 = 37, Q1 = 28, Q3 = 51 (b) Since Q3 − Q2 . Q2 – Q1 , this distribution is positively skewed. 2 Probability Exercise 2.1 1. 6 2. 48 3. 30 4. 54, 378 5. (a) 676 000 (b) 468 000 6. 240 7. 720 (a) 120 (b) 20 8. 60 9. 560 10. 24 360 11. 2520 (a) 120 (b) 1800 (c) 360 12. (a) 256 (b) 6561 13. 21 14. (a) {cde, cdf, cdg, cef, ceg, cfg, def, deg, dfg, efg} (b) 10 (c) 5 C3 = 10 15. 230 300 16. 1764 17. (a) 120 (b) 336 (c) 360 (d) 1 (e) 35 (f) 1 (g) 126 (h) 1 18. (a) 350 (b) 150 (c) 105 19. 63 20. (a) 25C5 = 25! 20! 5! (b) 25C20 = 25! 5! 20! Selecting 5 paintings is the same as selecting 20 paintings. The number of ways to do each is the same. 21. 2520 22. 24 23. (a) 11 880 (b) 495 24. 210 Exercise 2.2 1. (a) S = {win, lose, draw} (b) (i) A = {win} (ii) B = {win, draw} 2. (a) S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}. (b) (i) E = {4, 8} (ii) E = {1, 3, 5, 7, 9, 11} (iii) E = {φ} 3. 
   (a) S = {HH, HT, TH, TT}
   (b) B = {HT, TH, TT}
4. 1/3
5. (a) S = {R1, R2, Y1, Y2, Y3, Y4, G1, G2, G3}
   (b) (i) 4/9  (ii) 2/3
6. (a) Let B, G1 and G2 denote the blue ball and the two green balls respectively. The tree diagram is as follows:
       First ball   Second ball   Outcome
       B            G1            BG1
       B            G2            BG2
       G1           B             G1B
       G1           G2            G1G2
       G2           B             G2B
       G2           G1            G2G1
       S = {BG1, BG2, G1B, G1G2, G2B, G2G1}
   (b) 1/3
7. (a) A = {HH, HT, TH}; 3/4
   (b) B = {HT, TH, TT}; 3/4
   (c) C = {TT}; 1/4
   (d) A ∩ B = {HT, TH}; 1/2

   (e) A ∪ B = {HH, HT, TH, TT}; 1
   (f) A′ = {TT}; 1/4
8. 5/9
9. 8/15
10. Let O = event that the number on the up face is odd,
    E = event that the number on the up face is even,
    H = event that the up face is a head,
    T = event that the up face is a tail.
    First cast   First toss   Second toss   Outcome
    O            H                          OH
    O            T                          OT
    E            H            H             EHH
    E            H            T             EHT
    E            T            H             ETH
    E            T            T             ETT
    S = {OH, OT, EHH, EHT, ETH, ETT}
    (a) 2/3  (b) 1/3  (c) 1/3
    (a) and (b) make up the whole sample space.
11. (a) The outcomes in the sample space S are the 52 cards in the deck.
    (b) Let E be the event that a red face card is chosen, symbolically
        E = {J◆, Q◆, K◆, A◆, J♥, Q♥, K♥, A♥}
    (c) 8/52
12. (a) The collectively exhaustive list of the possible outcomes of tossing two dice:
        Black die / Green die   1        2        3        4        5        6
        1                       (1, 1)   (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)
        2                       (2, 1)   (2, 2)   (2, 3)   (2, 4)   (2, 5)   (2, 6)
        3                       (3, 1)   (3, 2)   (3, 3)   (3, 4)   (3, 5)   (3, 6)
        4                       (4, 1)   (4, 2)   (4, 3)   (4, 4)   (4, 5)   (4, 6)
        5                       (5, 1)   (5, 2)   (5, 3)   (5, 4)   (5, 5)   (5, 6)
        6                       (6, 1)   (6, 2)   (6, 3)   (6, 4)   (6, 5)   (6, 6)
    (b) E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
    (c) 1/6
13. (a) The events “ace” and “king” are mutually exclusive as any given card cannot be both an ace and a king.
    (b) The events “ace” and “spade” are not mutually exclusive as a given card can be both an ace and a spade.
    (c) 2/13
    (d) 4/13
14. (a) 1, 2, 3, 4, 5 and 6
    (b) {1}, {2}, {3}, {4}, {5} and {6}
    (c) {1, 2, 3, 4, 5, 6}
    (d) Yes
    (e) Yes
    (f) 1/6
15. (a) White die / Black die   1        2        3        4        5        6
        1                       (1, 1)   (1, 2)   (1, 3)   (1, 4)   (1, 5)   (1, 6)
        2                       (2, 1)   (2, 2)   (2, 3)   (2, 4)   (2, 5)   (2, 6)
        3                       (3, 1)   (3, 2)   (3, 3)   (3, 4)   (3, 5)   (3, 6)
        4                       (4, 1)   (4, 2)   (4, 3)   (4, 4)   (4, 5)   (4, 6)
        5                       (5, 1)   (5, 2)   (5, 3)   (5, 4)   (5, 5)   (5, 6)
        6                       (6, 1)   (6, 2)   (6, 3)   (6, 4)   (6, 5)   (6, 6)
    (b) Events E2 and E3 are exhaustive events.
    (c) E2′ and E3′ are mutually exclusive events.
    (d) P(E2 ∪ E3) = 1 and P(E2′ ∩ E3′) = 0
16. (a) 0.6  (b) 0.3  (c) 0.4  (d) 0.1
17. (a) 0.7  (b) 0.9  (c) 0.1
18.
    (a) 3/5  (b) 7/10  (c) 4/5  (d) 0  (e) 3/10  (f) 2/5
19. 0.35
20. (a) 0.325
    (b) (i) 0.8125  (ii) 0.375
21. (a) 0.75  (b) 0.81
22. Yes
23. (a) 0.48  (b) 0.52  (c) 0.77
    No, since P(A)P(C) ≠ P(A ∩ C)
24. Yes
25. 0.70
26. (a) 0.417  (b) 0.807  (c) 0.4751
27. (a) 0.042  (b) 0.012  (c) 0.988  (d) 0.05
28. 0.25

STPM Practice 2
1. 12
2. 75
3. 2600
4. 24
5. 720
6. (a) 42  (b) 151 200  (c) 24  (d) 20  (e) 56  (f) 1  (g) 7  (h) 1
7. 1320
8. 20
9. 303 600
10. 60
11. 56
12. 126
13. 1
14. (a) S = {1, 2, 3, 4, 5, 6}
    (b) P(A) = 1/6, P(B) = 1/2

15. (a) 1/2  (b) 1/8  (c) 3/8  (d) 0
16. 6225
17. Let E be the event that a spade is chosen, symbolically
    (a) {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K}
    (b) 1/4
18. (a) (i) 36³ = 46 656  (ii) 42 840
    (b) (i) 0.9182, i.e. 92% of PINs have no repeated symbols.
        (ii) 0.0818, i.e. 8% of PINs have repeated symbols.
19. (a) (i) 40 320  (ii) 10 080
    (b) 1/8
20. (a) 6/11  (b) 4/11  (c) 1/11
21. 65%
22. (a) 21/40  (b) 11/60
23. 83/120
24. (a) Yes  (b) 0.50
25. (a) No  (b) 4/13
26. (a) 0.19  (b) 0.15  (c) 0.90
        Tree diagram:
        M (60/100): I (15/60), A (45/60)
        F (40/100): I (4/40), A (36/40)
    (d) 15/19
        Tree diagram:
        I (19/100): M (15/19), F (4/19)
        A (81/100): M (45/81), F (36/81)
27. 0.3509
28. (a) 7/8  (b) 3/10
29. (a) 0.62  (b) 0.38
30. (a) (i) 0.042  (ii) 0.988  (iii) 0.05
    (b) (i) No  (ii) No
31. (a) 0.45, 0.25, 0.1125
    (b) (i) No  (ii) Yes
    (c) 0.45; If A and B are independent, then P(A | B) = P(A)
32. (a) 0.62  (b) 0.9032
33. (a) 1/30  (b) 3/10  (c) 1/3
34. (a)                      Some improvement   No improvement   Total
        Treatment received   150                450              600
        Placebo received     100                300              400
        Total                250                750              1 000
    (b) (i) 0.25  (ii) 0.9
    (c) Yes
        Let event T = “a volunteer received treatment”
        event I = “a volunteer showed some improvement”
        Event T and event I are independent since P(T) × P(I) = P(T ∩ I)
35. (a) 2/7  (b) 5/28
36. (a) 0.8765  (b) 0.9974
37. 0.624
38. (a) 0.166  (b) 0.398  (c) 0.844
39. (a) 125/126  (b) 14/125
40. (a) 0.3
    (b) (i) The events E′ and F′ are not mutually exclusive.
        (ii) The events E′ and F′ are not independent.
41. (a) (i) 0.2  (ii) 0.18  (iii) 0.38  (iv) 0.58
    (b) (i) A and B are not mutually exclusive.
        (ii) A and B are not independent.

3 Probability Distributions

Exercise 3.1
1. (a) 0.45  (b) 0.65
2. (a) 0.4  (b) 0.5
3. a = 1/9, b = 2/9
   w          0     1     2     3     4
   P(W = w)   1/9   1/9   3/9   2/9   2/9
   [Graph: probability P against x, with the vertical axis marked 1/9, 2/9 and 3/9]
4. (a) 1/54
   (b) x          1      2     3      4
       P(X = x)   2/27   1/6   8/27   25/54
   (c) 41/54
   (d) 25/27
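
The probability tables in answers 3 and 4 of Exercise 3.1 can be sanity-checked: every entry should be non-negative and the entries should sum to exactly 1. The Python sketch below is only an illustration and is not taken from the book; the helper name `is_valid_pmf` and the variable names are assumptions introduced here, and the fractions are copied from the two tables.

```python
from fractions import Fraction


def is_valid_pmf(probs):
    """Return True if every probability is non-negative and they sum to 1."""
    return all(p >= 0 for p in probs) and sum(probs) == 1


# Exercise 3.1, answer 3: P(W = w) for w = 0, 1, 2, 3, 4
q3_probs = [Fraction(1, 9), Fraction(1, 9), Fraction(3, 9),
            Fraction(2, 9), Fraction(2, 9)]

# Exercise 3.1, answer 4(b): P(X = x) for x = 1, 2, 3, 4
q4_probs = [Fraction(2, 27), Fraction(1, 6), Fraction(8, 27),
            Fraction(25, 54)]

print(is_valid_pmf(q3_probs))  # True
print(is_valid_pmf(q4_probs))  # True
```

Exact fractions are used instead of floating-point numbers so that the sum can be compared with 1 without rounding error. In the table for answer 4, the last two entries add up to 41/54 and the last three add up to 25/27, which agrees with the values listed for parts (c) and (d), although the wording of those parts is not shown here.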
