Pra-U STPM Maths(T) Semester 3 2022 CC039332c

Published by PENERBITAN PELANGI SDN BHD, 2023-09-26 20:50:18

Mathematics Semester 3 STPM
Chapter 4 Sampling and Estimation

Table 4.2
0.145  0.472  0.274  0.559  0.108  0.790  0.102  0.243  0.047  0.001

We select the first two digits after the decimal point of each number, so 0.145 yields 14. Hence, the student who has been assigned the number 14 is selected. The random number 0.472 yields 47, but no student has been assigned this number, so we ignore it and look at the next random number in the table. We continue this process until the second student is selected.

2. Systematic sampling
In systematic sampling, each member is chosen from a population at a regular interval of time, order or space. For example, we can select the first 5 students from the school register of each class of fifth formers.

3. Stratum sampling
In stratum sampling, a population is categorised into groups that have the same characteristics. For example, students can be categorised into strata according to the mode of transport they take to school: walking, cycling, riding a bus or driving. We then select a student from each group.

4. Stratified sampling
In stratified sampling, a population is divided into strata. Each stratum is a representation of the respective population. For example, in a particular state, schools can be stratified into zones according to their geographical location. A school is then chosen from each zone and a sample of students is selected from the chosen schools.

Sampling distributions
The value of a population parameter is always constant; for example, there is only one value of the population mean, $\mu$. However, we would expect different samples of the same size taken from the same population to give different values of the sample mean $\bar{X}$, defined by

$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n),$$

where $X_1, X_2, \dots, X_n$ is a random sample. Consequently, $\bar{X}$ is a random variable. The probability distribution of the sample mean $\bar{X}$ is called the sampling distribution of the mean.
In general, the probability distribution of a sample statistic is called the sampling distribution of the statistic. It specifies the possible values of the statistic and their probabilities.

Example 1
Consider a sample of size 2 taken with replacement from the set of integers {0, 2, 4, 6, 8}. The total number of samples is 5 × 5 = 25, and they are listed below.

(0, 0), (2, 0), (4, 0), (6, 0), (8, 0)
(0, 2), (2, 2), (4, 2), (6, 2), (8, 2)
(0, 4), (2, 4), (4, 4), (6, 4), (8, 4)
(0, 6), (2, 6), (4, 6), (6, 6), (8, 6)
(0, 8), (2, 8), (4, 8), (6, 8), (8, 8)

Find the mean of each sample. Hence, find the probability distribution of the sample mean and illustrate it in a diagram.

Solution:
The mean of each sample is calculated. For example, the mean of (0, 0) is $\frac{1}{2}(0 + 0) = 0$ and the mean of (2, 0) is $\frac{1}{2}(2 + 0) = 1$. These sample means, $\bar{x}$, are listed below.

0, 1, 2, 3, 4
1, 2, 3, 4, 5
2, 3, 4, 5, 6
3, 4, 5, 6, 7
4, 5, 6, 7, 8

The probability distribution of the sample mean is shown below.

$\bar{x}$                0     1     2     3     4     5     6     7     8
$P(\bar{X} = \bar{x})$   0.04  0.08  0.12  0.16  0.20  0.16  0.12  0.08  0.04

[Figure: bar chart of $P(\bar{X} = \bar{x})$ against $\bar{x}$ for $\bar{x}$ = 0, 1, ..., 8, with heights from 0.04 up to 0.20.]

Mean and standard deviation of the sample mean
The mean and standard deviation of the sampling distribution of the mean are called the mean and standard deviation of $\bar{X}$, and are denoted by $\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$ respectively. In other words, they are the mean and standard deviation of the means of all samples of the same size selected from a population.

A random sample of size $n$ taken from an infinite population (or from a finite population with replacement) with mean $\mu$ and variance $\sigma^2$ has mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Example 2
A population consists of three numbers: 0, 2 and 4.
(a) Find the mean $\mu$ and variance $\sigma^2$.
(b) Find and tabulate the probability distribution of the sample mean for a sample of size 3 taken with replacement from the population.
(c) Find the mean and variance of the sample mean, $\bar{X} = \frac{1}{3}(X_1 + X_2 + X_3)$.
(d) Verify that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Solution:
(a) $\mu = E(X) = \dfrac{0 + 2 + 4}{3} = 2$

$E(X^2) = \dfrac{0^2 + 2^2 + 4^2}{3} = \dfrac{20}{3}$

$\sigma^2 = E(X^2) - [E(X)]^2 = \dfrac{20}{3} - (2)^2 = \dfrac{8}{3}$

(b) Consider the numbers 0, 0 and 2. There are 3 ways of selecting a sample consisting of these numbers, i.e. $(x_1, x_2, x_3)$ = (0, 0, 2), (0, 2, 0) or (2, 0, 0). The mean of each sample selected is calculated. For example, the mean of (0, 0, 2) is $\dfrac{0 + 0 + 2}{3} = \dfrac{2}{3}$.
On the other hand, there are 3! = 6 ways of selecting a sample consisting of the numbers 0, 2 and 4. The mean of (0, 2, 4) is $\dfrac{0 + 2 + 4}{3} = 2$.
The outcomes of all possible samples of size 3 are put into the table as follows:

$(x_1, x_2, x_3)$                        (0, 0, 0)  (2, 2, 2)  (4, 4, 4)  (0, 0, 2)  (0, 0, 4)
$\bar{x} = \frac{1}{3}(x_1 + x_2 + x_3)$  0          2          4          2/3        4/3
Number of samples                        1          1          1          3          3

$(x_1, x_2, x_3)$                        (2, 2, 0)  (2, 2, 4)  (4, 4, 0)  (4, 4, 2)  (0, 2, 4)
$\bar{x} = \frac{1}{3}(x_1 + x_2 + x_3)$  4/3        8/3        8/3        10/3       2
Number of samples                        3          3          3          3          6

The probability distribution of the sample mean $\bar{X}$ is listed in the table below. The total number of samples is $\sum f = 27$.

$\bar{x} = \frac{1}{3}(x_1 + x_2 + x_3)$  0     2     4     2/3   4/3   8/3   10/3
Number of samples, $f$                   1     7     1     3     6     6     3
$P(\bar{X} = \bar{x})$                   1/27  7/27  1/27  3/27  6/27  6/27  3/27

(c) $E(\bar{X}) = \sum \bar{x}\,P(\bar{X} = \bar{x})$
$= 0\left(\frac{1}{27}\right) + 2\left(\frac{7}{27}\right) + 4\left(\frac{1}{27}\right) + \frac{2}{3}\left(\frac{3}{27}\right) + \frac{4}{3}\left(\frac{6}{27}\right) + \frac{8}{3}\left(\frac{6}{27}\right) + \frac{10}{3}\left(\frac{3}{27}\right)$
$= 2$

$E(\bar{X}^2) = \sum \bar{x}^2 P(\bar{X} = \bar{x})$
$= 0^2\left(\frac{1}{27}\right) + 2^2\left(\frac{7}{27}\right) + 4^2\left(\frac{1}{27}\right) + \left(\frac{2}{3}\right)^2\left(\frac{3}{27}\right) + \left(\frac{4}{3}\right)^2\left(\frac{6}{27}\right) + \left(\frac{8}{3}\right)^2\left(\frac{6}{27}\right) + \left(\frac{10}{3}\right)^2\left(\frac{3}{27}\right)$
$= \dfrac{44}{9}$

$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = \dfrac{44}{9} - (2)^2 = \dfrac{8}{9}$

(d) $E(\bar{X}) = \mu = 2$
$\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} = \dfrac{8}{3(3)} = \dfrac{8}{9}$
Hence, it is verified that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Example 3
X is a random variable with the following probability distribution:
P(X = 0) = 0.5, P(X = 1) = 0.3, P(X = 2) = 0.2
(a) Find the mean $\mu$ and variance $\sigma^2$.
(b) Find and tabulate the probability distribution of all possible sample means for a sample of size 2 taken with replacement from the population.
(c) Find the mean and variance of the sample mean, $\bar{X} = \frac{1}{2}(X_1 + X_2)$.
(d) Verify that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Solution:
(a) $\mu = E(X) = \sum x\,P(X = x) = 0(0.5) + 1(0.3) + 2(0.2) = 0.7$
$E(X^2) = \sum x^2 P(X = x) = 0^2(0.5) + 1^2(0.3) + 2^2(0.2) = 1.1$
$\sigma^2 = E(X^2) - [E(X)]^2 = 1.1 - (0.7)^2 = 0.61$

(b) Consider the numbers 0 and 1. They can be selected in two ways, i.e. $(x_1, x_2)$ = (0, 1) or (1, 0).
The probability of selecting the sample (0, 1) is $P(X_1 = 0) \times P(X_2 = 1) = 0.5 \times 0.3 = 0.15$.
The probabilities of the remaining samples are worked out in the same way. Now, for each sample, calculate the mean. For example, the mean of (0, 1) is $\frac{1}{2}(0 + 1) = 0.5$.

The outcomes of all possible samples of size 2 are put into the table as follows:

$(x_1, x_2)$                        (0,0)  (0,1)  (0,2)  (1,0)  (1,1)  (1,2)  (2,0)  (2,1)  (2,2)
$\bar{x} = \frac{1}{2}(x_1 + x_2)$   0      0.5    1      0.5    1      1.5    1      1.5    2
Probability of the sample           0.25   0.15   0.10   0.15   0.09   0.06   0.10   0.06   0.04

The probability distribution of the sample mean $\bar{X}$ is shown below.

$\bar{x} = \frac{1}{2}(x_1 + x_2)$   0     0.5   1     1.5   2
$P(\bar{X} = \bar{x})$              0.25  0.30  0.29  0.12  0.04

(c) $E(\bar{X}) = \sum \bar{x}\,P(\bar{X} = \bar{x}) = 0(0.25) + 0.5(0.30) + 1(0.29) + 1.5(0.12) + 2(0.04) = 0.7$
$E(\bar{X}^2) = \sum \bar{x}^2 P(\bar{X} = \bar{x}) = 0^2(0.25) + 0.5^2(0.30) + 1^2(0.29) + 1.5^2(0.12) + 2^2(0.04) = 0.795$
$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = 0.795 - (0.7)^2 = 0.305$

(d) $E(\bar{X}) = \mu = 0.7$
$\dfrac{\sigma^2}{n} = \dfrac{0.61}{2} = 0.305$
$\therefore \mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} = 0.305$
Hence, it is verified that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

A random sample of size $n$ taken without replacement from a finite population of size $N$ with mean $\mu$ and variance $\sigma^2$ has mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$.

Example 4
A population X consists of three numbers {1, 4, 7}.
(a) Find the mean $\mu$ and variance $\sigma^2$.
(b) Find and tabulate the probability distribution of the sample mean for a sample of size 2 taken without replacement from the population.
(c) Find the mean and variance of the sample mean, $\bar{X} = \frac{1}{2}(X_1 + X_2)$.
(d) Verify that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$.

Solution:
(a) $\mu = E(X) = \dfrac{1 + 4 + 7}{3} = 4$
$E(X^2) = \dfrac{1^2 + 4^2 + 7^2}{3} = 22$
$\sigma^2 = E(X^2) - [E(X)]^2 = 22 - (4)^2 = 6$

(b) As the sampling is done without replacement, the possible samples are (1, 4), (1, 7), (4, 1), (4, 7), (7, 1) and (7, 4).
The sample means of all possible samples of size 2 are shown in the table as follows:

$(x_1, x_2)$                        (1, 4)  (1, 7)  (4, 1)  (4, 7)  (7, 1)  (7, 4)
$\bar{x} = \frac{1}{2}(x_1 + x_2)$   2.5     4       2.5     5.5     4       5.5

The probability distribution of the sample mean $\bar{X}$ is listed in the table below. The total number of samples is $\sum f = 6$.

$\bar{x} = \frac{1}{2}(x_1 + x_2)$   2.5  4    5.5
Number of samples, $f$              2    2    2
$P(\bar{X} = \bar{x})$              1/3  1/3  1/3

(c) $E(\bar{X}) = \sum \bar{x}\,P(\bar{X} = \bar{x}) = 2.5\left(\frac{1}{3}\right) + 4\left(\frac{1}{3}\right) + 5.5\left(\frac{1}{3}\right) = 4$
$E(\bar{X}^2) = \sum \bar{x}^2 P(\bar{X} = \bar{x}) = 2.5^2\left(\frac{1}{3}\right) + 4^2\left(\frac{1}{3}\right) + 5.5^2\left(\frac{1}{3}\right) = 17.5$
$\mathrm{Var}(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = 17.5 - (4)^2 = 1.5$

(d) $E(\bar{X}) = \mu = 4$
$\sigma^2 = 6$, $N = 3$ and $n = 2$
$\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n} \cdot \dfrac{N - n}{N - 1} = \dfrac{6}{2} \cdot \dfrac{3 - 2}{3 - 1} = 1.5$
Hence, it is verified that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$.
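The enumeration used in Examples 2 and 4 can also be checked by computer. The following is a minimal Python sketch, not part of the original text; the function names `sampling_distribution` and `mean_and_variance` are illustrative choices. It assumes every ordered sample is equally likely, as in those two examples, and uses exact fractions so that the results can be compared directly with $\sigma^2/n$ and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def sampling_distribution(population, n, replace=True):
    """Return {sample mean: probability}, assuming equally likely ordered samples."""
    if replace:
        samples = product(population, repeat=n)   # sampling with replacement
    else:
        samples = permutations(population, n)     # sampling without replacement
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = sum(counts.values())
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var


# Example 2: population {0, 2, 4}, n = 3, with replacement
dist_ex2 = sampling_distribution([0, 2, 4], 3, replace=True)
mean_ex2, var_ex2 = mean_and_variance(dist_ex2)

# Example 4: population {1, 4, 7}, n = 2, without replacement
dist_ex4 = sampling_distribution([1, 4, 7], 2, replace=False)
mean_ex4, var_ex4 = mean_and_variance(dist_ex4)

print(mean_ex2, var_ex2)  # 2 8/9
print(mean_ex4, var_ex4)  # 4 3/2
```

The printed values agree with parts (c) and (d) of both examples: 8/9 equals $\sigma^2/n$ with $\sigma^2 = 8/3$ and $n = 3$, and 3/2 equals $\frac{6}{2}\cdot\frac{3-2}{3-1}$.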

The standard deviation of the sampling distribution of the mean is usually referred to by statisticians as the standard error of the sample mean.

If a random sample of size $n$ is taken from a population with mean $\mu$ and standard deviation $\sigma$, then the standard error of the sample mean is $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

If the sample size $n$ is large, then the standard error is small. Hence, the sample mean will be close to the population mean.

[Figure 4.1: two sampling distributions of the mean centred at $\mu$, one for sample size $m$ and one for sample size $n$, where $m < n$.]

Example 5
A population has mean $\mu$ and standard deviation $\sigma = 200$. A random sample A of size $n_A = 10$ and another random sample B of size $n_B = 100$ are taken from the population. Find the standard error for each sample and comment on your results.

Solution:
The standard error for sample A is $\dfrac{\sigma}{\sqrt{n_A}} = \dfrac{200}{\sqrt{10}} = 63.25$.
The standard error for sample B is $\dfrac{\sigma}{\sqrt{n_B}} = \dfrac{200}{\sqrt{100}} = 20$.
Observe that when the sample size increases from 10 to 100, the standard error decreases from 63.25 to 20. In general, a larger sample size should be chosen to reduce the standard error when conducting a sampling project.

Exercise 4.1
1. A population consists of the numbers 1, 4 and 7.
(a) Find the mean $\mu$ and variance $\sigma^2$.
(b) Find and tabulate the probability distribution of the sample mean, $\bar{X} = \frac{1}{3}(X_1 + X_2 + X_3)$, for a sample of size 3 taken with replacement.
(c) Find the mean and variance of the sample mean.
(d) Verify that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.
2. (a) Find the mean $\mu$ and variance $\sigma^2$ for each of the populations with the following probability distributions.
(i) P(X = 0) = 0.6, P(X = 1) = 0.3, P(X = 2) = 0.1
(ii) P(X = –3) = 0.4, P(X = 2) = 0.3, P(X = 4) = 0.3
(b) Find the mean and variance of the sample mean for samples of size 2.
(c) Verify that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.

3. A population consists of the numbers 1, 4, 7 and 8.
(a) Find the mean $\mu$ and variance $\sigma^2$.
(b) Find and tabulate the probability distribution of the sample mean, $\bar{X} = \frac{1}{2}(X_1 + X_2)$, for a sample of size 2 taken without replacement.
(c) Find the mean and variance of the sample mean.
(d) Verify that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}\left(\dfrac{N - n}{N - 1}\right)$.
(e) Write the formula for $\mathrm{Var}(\bar{X})$ when $N \to \infty$.
4. X is a discrete random variable with the following probability distribution:

$x$          –2    –1   0    1    2
$P(X = x)$   1/12  1/4  1/3  1/4  1/12

(a) Find the mean $\mu$ and variance $\sigma^2$ of X.
(b) Find and tabulate the probability distribution of the sample mean for a sample of size 2 taken with replacement.
(c) Find the mean and variance of the sample mean.
(d) Verify that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.
(e) Write down the mean and variance of the sample mean for a sample of size 3 taken with replacement.
5. A population has mean $\mu$ and standard deviation $\sigma = 50$. A random sample A of size $n_A = 38$ and another random sample B of size $n_B = 80$ are taken from the population. Find the standard error for each sample and comment on your results.

Shape of the sampling distribution of the mean

Sampling from a normal population

[Figure 4.2: (a) a normal population; sampling distributions of the mean for sample sizes (b) n = 2, (c) n = 5 and (d) n = 30. Each panel plots f(x) against x.]

Figure 4.2(a) shows a population with a normal distribution. The sampling distribution of the mean is then also normally distributed for any sample size, as shown in diagrams (b), (c) and (d).

If a random sample of size $n$ is taken from a normal population with mean $\mu$ and variance $\sigma^2$, then the sampling distribution of the mean is also normal, with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.

Notice that $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ irrespective of the sample size $n$.

Example 6
A sample of size 15 is taken from a normal population with mean 60 and variance 20. Find the probability that the sample mean is less than 58.

Solution:
The population is $X \sim N(60, 20)$. Then $\bar{X} = \dfrac{1}{15}\sum_{i=1}^{15} X_i \sim N\left(60, \dfrac{20}{15}\right)$.

$P(\bar{X} < 58) = P\left(Z < \dfrac{58 - 60}{\sqrt{20/15}}\right) = P(Z < -1.732) = 0.0416$

Example 7
The length of a type of worm is normally distributed with mean 15 cm and variance 30 cm². Find the probability that the length of a worm selected at random lies between 14 cm and 16 cm. A sample of 25 worms is chosen randomly and the mean length is calculated. Find the probability that the mean length lies between 14 cm and 16 cm.

Solution:
Let X be the length of a worm. Then $X \sim N(15, 30)$. Standardising with $Z = \dfrac{X - \mu}{\sigma}$,

$P(14 < X < 16) = P\left(\dfrac{14 - 15}{\sqrt{30}} < Z < \dfrac{16 - 15}{\sqrt{30}}\right) = P(-0.183 < Z < 0.183) = 0.1452$

The sampling distribution of the mean length is $\bar{X} = \dfrac{1}{25}\sum_{i=1}^{25} X_i \sim N\left(15, \dfrac{30}{25}\right)$. Standardising with $Z = \dfrac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}$,

$P(14 < \bar{X} < 16) = P\left(\dfrac{14 - 15}{\sqrt{30/25}} < Z < \dfrac{16 - 15}{\sqrt{30/25}}\right) = P(-0.913 < Z < 0.913) = 0.6388$
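As a cross-check on the table look-ups in Examples 6 and 7, the same probabilities can be computed with Python's standard-library `statistics.NormalDist`. This is a sketch added for illustration, not part of the original text; the helper name `prob_between` is an assumption. Setting `n=1` gives the distribution of a single observation.

```python
from math import sqrt
from statistics import NormalDist


def prob_between(mu, variance, n, low, high):
    """P(low < sample mean < high) for samples of size n from N(mu, variance)."""
    dist = NormalDist(mu, sqrt(variance / n))  # the sample mean is N(mu, variance / n)
    return dist.cdf(high) - dist.cdf(low)


# Example 6: P(sample mean < 58) for n = 15 from N(60, 20)
p_ex6 = NormalDist(60, sqrt(20 / 15)).cdf(58)

# Example 7: one worm (n = 1) versus the mean of 25 worms, population N(15, 30)
p_single = prob_between(15, 30, 1, 14, 16)
p_mean25 = prob_between(15, 30, 25, 14, 16)

print(round(p_ex6, 4), round(p_single, 4), round(p_mean25, 4))
```

The values differ from the worked answers only in the last decimal places, because the worked solutions round z to three decimals before using the table.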

Example 8
A sample of size $n$ is taken from a normally distributed population with mean 80 and variance 36. Find the value of $n$ if the probability that the sample mean exceeds 78 is 0.975.

Solution:
The population is $X \sim N(80, 36)$. Then $\bar{X} \sim N\left(80, \dfrac{36}{n}\right)$.
Given that $P(\bar{X} > 78) = 0.975$,

$P\left(Z > \dfrac{78 - 80}{\sqrt{36/n}}\right) = 0.975$

From the standardized normal table, $P(Z < 1.96) = 0.975$. Hence, $P(Z > -1.96) = 0.975$.

$\therefore \dfrac{78 - 80}{\sqrt{36/n}} = -1.96$
$\sqrt{\dfrac{36}{n}} = \dfrac{-2}{-1.96}$
$n = 36\left(\dfrac{1.96}{2}\right)^2 = 34.57$

Therefore, $n = 35$.

Example 9
A large random sample of size $n$ is taken from a normal population with mean 80 and standard deviation 5. Find the least value of $n$ if the sample mean
(a) exceeds the population mean by at least 2 with a probability of less than 0.01,
(b) differs from the population mean by at least 2 with a probability of less than 0.01.

Solution:
The population is $X \sim N(80, 25)$. Then $\bar{X} \sim N\left(80, \dfrac{25}{n}\right)$.

(a) $P(\bar{X} - 80 > 2) < 0.01$

$P\left(\dfrac{\bar{X} - 80}{\sqrt{25/n}} > \dfrac{2}{\sqrt{25/n}}\right) < 0.01 \Rightarrow P\left(Z > \dfrac{2}{\sqrt{25/n}}\right) < 0.01$

From the standardized normal table, $P(Z < 2.326) = 0.99$, so $P(Z > 2.326) = 0.01$.

Hence, $\dfrac{2}{\sqrt{25/n}} > 2.326$
$\sqrt{\dfrac{n}{25}} > \dfrac{2.326}{2}$
$n > 25\left(\dfrac{2.326}{2}\right)^2 = 33.8$

Therefore, the least value of $n$ is 34.

(b) $P(|\bar{X} - 80| > 2) < 0.01$

$P\left(\left|\dfrac{\bar{X} - 80}{\sqrt{25/n}}\right| > \dfrac{2}{\sqrt{25/n}}\right) < 0.01 \Rightarrow P\left(|Z| > \dfrac{2}{\sqrt{25/n}}\right) < 0.01$

$P\left(Z < -\dfrac{2}{\sqrt{25/n}}\right) + P\left(Z > \dfrac{2}{\sqrt{25/n}}\right) < 0.01$

By symmetry, $P\left(Z > \dfrac{2}{\sqrt{25/n}}\right) < 0.005$.

From the standardized normal table, $P(Z < 2.576) = 0.995$, so $P(Z > 2.576) = 0.005$.

Hence, $\dfrac{2}{\sqrt{25/n}} > 2.576$
$\sqrt{\dfrac{n}{25}} > \dfrac{2.576}{2}$
$n > 25\left(\dfrac{2.576}{2}\right)^2 = 41.47$

Therefore, the least value of $n$ is 42.

Sampling from any population

[Figure 4.3: (a) a population with a uniform distribution on (a, b); sampling distributions of the mean for sample sizes (b) n = 2, (c) n = 5 and (d) n = 30. (e) Another population; sampling distributions of the mean for sample sizes (f) n = 2, (g) n = 5 and (h) n = 30. Each panel plots f(x) against x.]

Diagram (a) shows a population with a uniform distribution, i.e. one with the probability density function $f(x) = \dfrac{1}{b - a}$ over the interval $(a, b)$. Diagrams (b), (c) and (d) show the sampling distribution of the mean for samples of sizes n = 2, 5 and 30 respectively. Notice that as $n$ gets larger, the distribution approaches a normal distribution.

Diagram (e) illustrates another population with the probability density function shown. Diagrams (f), (g) and (h) show the sampling distribution of the mean for samples of sizes n = 2, 5 and 30 respectively. Notice that as the sample size $n$ increases, the distribution again approaches a normal distribution. This behaviour is true in general and is stated by the central limit theorem below.

In random sampling from any population with mean $\mu$ and variance $\sigma^2$, where the sample size $n$ is sufficiently large, the sample mean has an approximately normal distribution with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.

In general, the sample size is considered sufficiently large if $n \geq 30$.

The central limit theorem is a very important theorem in statistics, as it enables us to make inferences about the population mean without knowing the form of the population distribution. We only need to study a sample taken from the population in order to draw inferences about the population parameter. This helps to overcome constraints such as time, money and resources when conducting sampling research.

Example 10
The random variable X has the probability distribution shown in the table.

$x$          0    1    2    3
$P(X = x)$   0.3  0.2  0.4  0.1

(a) Find E(X) and Var(X).
(b) If the mean of 30 observations of X is denoted by $\bar{X}$, find the probability that $\bar{X}$ lies between 1 and 1.4.

Solution:
(a) Construct the table of values of $x\,P(X = x)$ and $x^2 P(X = x)$ for the distribution of X.

$x$               0    1    2    3
$P(X = x)$        0.3  0.2  0.4  0.1
$x\,P(X = x)$     0    0.2  0.8  0.3
$x^2 P(X = x)$    0    0.2  1.6  0.9

$E(X) = \sum x\,P(X = x) = 0 + 0.2 + 0.8 + 0.3 = 1.3$
$E(X^2) = \sum x^2 P(X = x) = 0 + 0.2 + 1.6 + 0.9 = 2.7$
$\mathrm{Var}(X) = 2.7 - (1.3)^2 = 1.01$

(b) Since n = 30 is sufficiently large, according to the central limit theorem, the sample mean is approximately

$\bar{X} = \dfrac{1}{30}\sum_{i=1}^{30} X_i \sim N\left(1.3, \dfrac{1.01}{30}\right)$.

$P(1 < \bar{X} < 1.4) = P\left(\dfrac{1 - 1.3}{\sqrt{1.01/30}} < Z < \dfrac{1.4 - 1.3}{\sqrt{1.01/30}}\right) = P(-1.635 < Z < 0.545) = 0.6560$

Example 11
The random variable X has the probability density function

$f(x) = \begin{cases} \dfrac{1}{3}, & 0 \leq x \leq 3, \\ 0, & \text{otherwise.} \end{cases}$

(a) Find the mean and variance of X.
(b) If the mean of 30 observations of X is denoted by $\bar{X}$, find the probability that $\bar{X}$ lies between 1.25 and 1.65.

Solution:
(a) $E(X) = \int_0^3 \dfrac{1}{3}x\,dx = \dfrac{1}{6}\left[x^2\right]_0^3 = \dfrac{1}{6}[9 - 0] = 1.5$
$E(X^2) = \int_0^3 \dfrac{1}{3}x^2\,dx = \dfrac{1}{9}\left[x^3\right]_0^3 = \dfrac{1}{9}[27 - 0] = 3$
$\mathrm{Var}(X) = 3 - (1.5)^2 = 0.75$

(b) Since n = 30 is sufficiently large, according to the central limit theorem, the sample mean is approximately

$\bar{X} = \dfrac{1}{30}\sum_{i=1}^{30} X_i \sim N\left(1.5, \dfrac{0.75}{30}\right)$.

$P(1.25 < \bar{X} < 1.65) = P\left(\dfrac{1.25 - 1.5}{\sqrt{0.75/30}} < Z < \dfrac{1.65 - 1.5}{\sqrt{0.75/30}}\right) = P(-1.581 < Z < 0.949) = 0.7717$
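The central limit theorem approximation in Example 11 can be compared with a simulation. The sketch below is an illustration only and is not part of the original text: the seed, the number of trials and the names `sample_mean`, `simulated` and `approx` are arbitrary choices. It repeatedly draws 30 observations from the uniform distribution on [0, 3], records how often the sample mean falls between 1.25 and 1.65, and compares that relative frequency with the normal approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(2022)     # fixed seed so that the run is repeatable
n, trials = 30, 20_000        # sample size and number of simulated samples


def sample_mean():
    """Mean of n independent observations from the uniform distribution on [0, 3]."""
    return fmean(rng.uniform(0, 3) for _ in range(n))


# Relative frequency of sample means falling in (1.25, 1.65)
hits = sum(1.25 < sample_mean() < 1.65 for _ in range(trials))
simulated = hits / trials

# Normal approximation from the central limit theorem: N(1.5, 0.75 / 30)
clt = NormalDist(1.5, sqrt(0.75 / n))
approx = clt.cdf(1.65) - clt.cdf(1.25)

print(f"simulated = {simulated:.3f}, normal approximation = {approx:.4f}")
```

With this many trials the simulated proportion should typically fall within about 0.01 of the approximate value 0.7717, which suggests that n = 30 is already large enough for a uniform population.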

Example 12
A sample of size 30 is taken from a population with a binomial distribution with parameters n = 10 and p = 0.5. Find the probability that the sample mean exceeds 5.5.

Solution:
Given $X \sim B(10, 0.5)$, then $E(X) = np = 10(0.5) = 5$ and $\mathrm{Var}(X) = npq = 10(0.5)(0.5) = 2.5$.
Since the sample size 30 is sufficiently large, according to the central limit theorem, the sample mean is approximately

$\bar{X} = \dfrac{1}{30}\sum_{i=1}^{30} X_i \sim N\left(5, \dfrac{2.5}{30}\right)$.

$P(\bar{X} > 5.5) = P\left(Z > \dfrac{5.5 - 5}{\sqrt{2.5/30}}\right) = P(Z > 1.732) = 0.0416$

Example 13
A sample of size $m$ is taken from a population with a binomial distribution with parameters n = 20 and p = 0.4. It is found that 4% of the sample means are less than 7.5. Find the value of $m$.

Solution:
Given $X \sim B(20, 0.4)$, then $E(X) = 20(0.4) = 8$ and $\mathrm{Var}(X) = 20(0.4)(1 - 0.4) = 4.8$.
Assuming $m$ is sufficiently large, according to the central limit theorem,

$\bar{X} = \dfrac{1}{m}\sum_{i=1}^{m} X_i \sim N\left(8, \dfrac{4.8}{m}\right)$.

$P(\bar{X} < 7.5) = 0.04$
$P\left(Z < \dfrac{7.5 - 8}{\sqrt{4.8/m}}\right) = 0.04$

From the standardized normal table, $P(Z < 1.751) = 0.96$. Hence, $P(Z < -1.751) = 0.04$.

$\therefore \dfrac{7.5 - 8}{\sqrt{4.8/m}} = -1.751$
$\sqrt{\dfrac{m}{4.8}} = \dfrac{-1.751}{-0.5}$
$m = 4.8\left(\dfrac{1.751}{0.5}\right)^2 = 58.86$

Therefore, $m = 59$.
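Examples 8, 9 and 13 all solve the same relation, $\dfrac{d}{\sqrt{\sigma^2/n}} = z$, for the sample size, where $d$ is the distance between the population mean and the cut-off value and $z$ is the standard score for the required tail probability. The Python sketch below is an added illustration; the function name `sample_size` and its parameters are assumptions, not notation from the text. It uses `statistics.NormalDist().inv_cdf` in place of the normal table and rounds up to the next whole number, as the worked solutions do.

```python
from math import ceil
from statistics import NormalDist


def sample_size(variance, distance, tail_prob):
    """Solve distance / sqrt(variance / n) = z for n, rounded up.

    z is the standard score with upper-tail probability tail_prob.
    """
    z = NormalDist().inv_cdf(1 - tail_prob)
    return ceil(variance * (z / distance) ** 2)


n_ex8 = sample_size(36, 2, 0.025)    # Example 8: P(sample mean > 78) = 0.975
n_ex9a = sample_size(25, 2, 0.01)    # Example 9(a): one-sided, probability 0.01
n_ex9b = sample_size(25, 2, 0.005)   # Example 9(b): two-sided, 0.005 in each tail
m_ex13 = sample_size(4.8, 0.5, 0.04) # Example 13: P(sample mean < 7.5) = 0.04

print(n_ex8, n_ex9a, n_ex9b, m_ex13)  # 35 34 42 59
```

For the two-sided condition in Example 9(b), the tail probability passed to the function is half of 0.01, matching the step $P(Z > \cdot) < 0.005$ in the solution.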

Mean and standard deviation of the sample proportion
The population proportion is the ratio of the number of elements in a population with a particular feature to the total number of elements in the population. The sample proportion gives a similar ratio for a sample.

If a random sample of size $n$ is taken from a population and the number of elements with a particular characteristic is X, then the sample proportion is $\hat{p} = \dfrac{X}{n}$.

Just like the sample mean $\bar{X}$, the sample proportion $\hat{p}$ is a random variable. The probability distribution of the sample proportion is called its sampling distribution.

Shape of the sampling distribution of the proportion

The sampling distribution of the proportion is approximately normal with mean $p$ and standard deviation $\sqrt{\dfrac{p(1 - p)}{n}}$, provided $n$ is sufficiently large.

In general, the sample size is sufficiently large if $np > 5$ and $n(1 - p) > 5$.
For a more accurate answer, an adjustment called the "continuity correction" is applied as shown below.
(a) $P(\hat{p} \geq a) = P\left(\hat{p} > a - \dfrac{1}{2n}\right)$
(b) $P(\hat{p} \leq a) = P\left(\hat{p} < a + \dfrac{1}{2n}\right)$
(c) $P(\hat{p} > a) = P\left(\hat{p} > a + \dfrac{1}{2n}\right)$
(d) $P(\hat{p} < a) = P\left(\hat{p} < a - \dfrac{1}{2n}\right)$
In each case, the left-hand side refers to the discrete sample proportion and the right-hand side to its normal approximation.

Example 14
A company manufactures a type of light bulb. It is known that 5% of the bulbs are defective. Find the probability that a sample of 400 light bulbs chosen at random will yield a proportion of defective bulbs of
(a) at least 4%,
(b) at most 5.5%,
(c) more than 4.5%,
(d) less than 4.8%.

Solution:
Let $\hat{p}$ be the proportion of defective light bulbs in the sample.
$\mu_{\hat{p}} = p = 0.05$
$\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{0.05 \times 0.95}{400}}$
$\hat{p} \sim N\left(0.05, \dfrac{0.05 \times 0.95}{400}\right)$, i.e. $\hat{p} \sim N(0.05, 0.00011875)$

(a) $P(\hat{p} \geq 0.04) = P\left(\hat{p} > 0.04 - \dfrac{1}{2(400)}\right) = P(\hat{p} > 0.03875)$
$= P\left(Z > \dfrac{0.03875 - 0.05}{\sqrt{0.00011875}}\right) = P(Z > -1.032) = 0.8490$

(b) $P(\hat{p} \leq 0.055) = P\left(\hat{p} < 0.055 + \dfrac{1}{2(400)}\right) = P(\hat{p} < 0.05625)$
$= P\left(Z < \dfrac{0.05625 - 0.05}{\sqrt{0.00011875}}\right) = P(Z < 0.574) = 0.7171$

(c) $P(\hat{p} > 0.045) = P\left(\hat{p} > 0.045 + \dfrac{1}{2(400)}\right) = P(\hat{p} > 0.04625)$
$= P\left(Z > \dfrac{0.04625 - 0.05}{\sqrt{0.00011875}}\right) = P(Z > -0.344) = 0.6346$

(d) $P(\hat{p} < 0.048) = P\left(\hat{p} < 0.048 - \dfrac{1}{2(400)}\right) = P(\hat{p} < 0.04675)$
$= P\left(Z < \dfrac{0.04675 - 0.05}{\sqrt{0.00011875}}\right) = P(Z < -0.298) = 0.3828$

The standard deviation of the sampling distribution of the proportion, $\sqrt{\dfrac{p(1 - p)}{n}}$, is also known as the standard error of the sample proportion. The larger the sample size, the smaller the standard error. Thus, the sample proportion will be closer to the population proportion.

Example 15
The proportion of students of a secondary school who wear glasses is 0.35. A study is carried out to determine the proportion of students who wear glasses by choosing a sample A of 30 students and another sample B of 50 students. Find the standard error of the proportion for these two samples and comment on your results.

Solution:
For sample A, $n_A = 30$ and the standard error is $\sigma_{P_A} = \sqrt{\dfrac{0.35(1 - 0.35)}{30}} = 0.0871$.
For sample B, $n_B = 50$ and the standard error is $\sigma_{P_B} = \sqrt{\dfrac{0.35(1 - 0.35)}{50}} = 0.0675$.
When the sample size increases from 30 to 50, the standard error of the sample proportion of students who wear glasses decreases from 0.0871 to 0.0675. Hence, the larger the sample size, the smaller the standard error.
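The four continuity-corrected probabilities of Example 14 and the standard errors of Example 15 follow the same pattern, so they can be wrapped in two small functions. The Python sketch below is an added illustration; the names `proportion_probs` and `standard_error`, and the dictionary keys, are assumptions rather than notation from the text.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(p, n):
    """Standard error of the sample proportion, sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


def proportion_probs(p, n, a):
    """Normal approximations, with continuity correction 1/(2n), for the sample proportion."""
    dist = NormalDist(p, standard_error(p, n))
    c = 1 / (2 * n)
    return {
        "at_least": 1 - dist.cdf(a - c),   # P(p_hat >= a)
        "at_most": dist.cdf(a + c),        # P(p_hat <= a)
        "more_than": 1 - dist.cdf(a + c),  # P(p_hat > a)
        "less_than": dist.cdf(a - c),      # P(p_hat < a)
    }


# Example 14: p = 0.05, n = 400
ex14a = proportion_probs(0.05, 400, 0.04)["at_least"]
ex14b = proportion_probs(0.05, 400, 0.055)["at_most"]
ex14c = proportion_probs(0.05, 400, 0.045)["more_than"]
ex14d = proportion_probs(0.05, 400, 0.048)["less_than"]

# Example 15: p = 0.35 with n = 30 and n = 50
se_a = standard_error(0.35, 30)
se_b = standard_error(0.35, 50)

print(round(ex14a, 4), round(ex14b, 4), round(ex14c, 4), round(ex14d, 4))
print(round(se_a, 4), round(se_b, 4))
```

Small differences in the fourth decimal place from the worked answers are expected, since the worked solutions round z to three decimals before reading the table.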

Exercise 4.2
1. Find the probability that a random sample of size 30 taken from a population with a binomial distribution with parameters n = 9 and p = 0.5 will have a mean value exceeding 5.
2. A random variable X has the probability distribution shown in the table.

$x$          1    3    5    7
$P(X = x)$   0.2  0.4  0.3  0.1

(a) Find E(X) and Var(X).
(b) Find the probability that the mean of a random sample of size 30 lies between 3 and 4.
3. A random variable X has the probability density function

$f(x) = \begin{cases} \dfrac{1}{3}, & 3 \leq x \leq 6, \\ 0, & \text{otherwise.} \end{cases}$

(a) Find the mean and variance of X.
(b) A random sample of 30 observations of X is taken. Find the probability that the sample mean lies between 4.2 and 4.7.
4. A random variable X has a normal distribution with mean 60 and standard deviation 4. The mean of 15 observations is denoted by $\bar{X}$.
(a) Find the probability that X lies between 58 and 61.
(b) Find the probability that the sample mean $\bar{X}$ lies between 58 and 61.
5. The height of a type of plant is normally distributed with mean 21 cm and standard deviation 90 cm. The heights of a randomly selected sample of 10 plants are measured and the mean height is calculated. Find the probability that the sample mean is between 18 cm and 27 cm.
6. A random sample of size $n$ is taken from a normal distribution with mean 74 and variance 36. Find the value of $n$ if the probability that the sample mean exceeds 73 is 0.854.
7. A random sample of size $n$ is taken from a normal distribution with mean $\mu$ and variance 1. Find the least value of $n$ if the probability that the sample mean lies within 0.10 of the mean $\mu$ is more than 0.95.
8. A computer disc manufacturer finds that 3% of the discs produced are defective. Find the probability that a randomly selected sample of 500 discs contains
(a) at least 5% defects,
(b) more than 5% defects,
(c) at most 3% defects,
(d) less than 3% defects.
9. A die is modified so that the number six appears once in every five tosses on average. Find the probability that the proportion of sixes that appear in 60 tosses of the die is
(a) less than 25%,
(b) at most 28%,
(c) at least 18%,
(d) more than 30%.
10. The proportion of students of a secondary school who take tuition in mathematics is 0.48. A study is carried out to determine the proportion of students who take tuition in mathematics by choosing a sample A of 30 students and another sample B of 80 students. Find the standard error of the sample proportion for these two samples and comment on your results.

4.2 Estimation

Point estimation
More often than not, a population parameter is unknown because it is not possible to study every member of the population. However, we can study a subset of the population and obtain information from it. From the sample information, we can then estimate the population parameter.
A statistic intended for estimating a parameter is called a point estimator. The value of the statistic used to estimate the population parameter is called a point estimate. A point estimator has to satisfy certain conditions, one of which is that it must be unbiased.

A statistic is an unbiased estimator for a population parameter $\theta$ if the expected value of the statistic is $\theta$, i.e. U is an unbiased estimator for $\theta$ if $E(U) = \theta$.

The value of an unbiased estimator is called an unbiased estimate.

Unbiased estimates for population mean and variance

If $x_1, x_2, \dots, x_n$ are the values of a sample of size $n$ taken from a population, then an unbiased estimate for the population mean $\mu$ is the sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

An unbiased estimate for the population variance $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{ns^2}{n - 1}, \quad \text{where } s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}.$$

Note: The unbiased estimate $\hat{\sigma}^2$ can be written in other forms, as shown below.
(a) $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
(b) $\hat{\sigma}^2 = \dfrac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$
(c) $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
For grouped data, replace $n$ with $\sum f$ in the formulae above.

Example 16
Find the unbiased estimates for the population mean and variance from the following sample data:
(a) 46, 48, 51, 50, 45, 53, 50, 48
(b) $\sum x = 120$, $\sum x^2 = 2102$, $n = 8$
(c) $\sum x = 128$, $\sum (x - \bar{x})^2 = 312$, $n = 8$
(d)
$x$   20  21  22  23  24  25
$f$   4   14  17  26  20  9

Solution:
(a) Unbiased estimate for the population mean
$= \bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{46 + 48 + 51 + 50 + 45 + 53 + 50 + 48}{8} = \dfrac{391}{8} = 48.88$

$\sum x^2 = 46^2 + 48^2 + 51^2 + 50^2 + 45^2 + 53^2 + 50^2 + 48^2 = 19\,159$

Unbiased estimate for the population variance
$= \hat{\sigma}^2 = \dfrac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)} = \dfrac{8(19\,159) - 391^2}{8(8 - 1)} = \dfrac{391}{56} = 6.982$

(b) Unbiased estimate for the population mean
$= \bar{x} = \dfrac{1}{n}\sum x = \dfrac{120}{8} = 15$

Unbiased estimate for the population variance
$= \hat{\sigma}^2 = \dfrac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)} = \dfrac{8(2102) - 120^2}{8(8 - 1)} = \dfrac{2416}{56} = 43.14$

(c) Unbiased estimate for the population mean
$= \bar{x} = \dfrac{1}{n}\sum x = \dfrac{128}{8} = 16$

Unbiased estimate for the population variance
$= \hat{\sigma}^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{312}{8 - 1} = 44.57$

(d) Construct the table of values of $fx$ and $fx^2$.

$x$      20    21    22    23      24      25
$f$      4     14    17    26      20      9      $\sum f = 90$
$fx$     80    294   374   598     480     225    $\sum fx = 2051$
$fx^2$   1600  6174  8228  13 754  11 520  5625   $\sum fx^2 = 46\,901$

Unbiased estimate for the population mean
$= \bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{2051}{90} = 22.79$

Unbiased estimate for the population variance
$= \hat{\sigma}^2 = \dfrac{\left(\sum f\right)\left(\sum fx^2\right) - \left(\sum fx\right)^2}{\left(\sum f\right)\left(\sum f - 1\right)} = \dfrac{90(46\,901) - 2051^2}{90(90 - 1)} = \dfrac{14\,489}{8010} = 1.809$

Unbiased estimate for population proportion

If a random sample of size $n$ is taken from a population in which some elements have a specific characteristic, then the unbiased estimate for the population proportion with that characteristic is $\dfrac{x}{n}$, where $x$ is the number of sample elements with the characteristic.

Example 17
A study is conducted in a town to find the number of residents who own a credit card. Out of a random sample of 120 residents, 25 own a credit card. What is the unbiased estimate for the proportion of town residents who own a credit card?

Solution:
The sample size is $n = 120$.
The number of residents in the sample who own a credit card is $x = 25$.
Therefore, the unbiased estimate for the proportion of town residents who own a credit card is $\hat{p} = \dfrac{25}{120} = 0.21$.

Exercise 4.3
1. Find the unbiased estimates for the population mean and variance from the following sample data:
(a) 19.30, 19.61, 18.27, 18.90, 19.14, 19.90, 18.76, 19.10
(b) $n = 12$, $\sum x = 282$, $\sum (x - \bar{x})^2 = 48.72$
(c) $n = 34$, $\sum x = 330$, $\sum x^2 = 23\,700$
(d)
$x$   1   2   3   4   5
$f$   12  18  28  25  17
2. A lecturer wants to know the proportion of students in a university who wear glasses. Out of a random sample of 50 students, 20 wear glasses. What is the unbiased estimate for the proportion of students in the university who wear glasses?

Interval Estimation
Another technique for estimating a population parameter $\theta$ is to construct a confidence interval $(a, b)$ around the point estimate and to specify that the confidence interval $(a, b)$ contains $\theta$ with a certain probability, called the confidence level.

Confidence interval for the normal population mean $\mu$ with known variance $\sigma^2$

If $\bar{X} = \dfrac{1}{n}(X_1 + X_2 + \dots + X_n)$ is the mean of a random sample taken from a normal population with known variance $\sigma^2$, then a $(1 - \alpha)100\%$ symmetrical confidence interval for the population mean $\mu$ is $\left(\bar{X} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$.

Notes:
(a) The areas involved are illustrated in Figure 4.4.

[Figure 4.4: (a) standard normal curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area $\alpha/2$ in each tail, so that $1 - \alpha = P(-z_{\alpha/2} < Z < z_{\alpha/2})$; (b) the corresponding curve with central area $1 - \alpha$ between $a$ and $b$, so that $1 - \alpha = P(a < \mu < b)$.]

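(b) $\bar{x}$ is the sample mean.
(c) The confidence interval is written as $\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$.
(d) The length of the interval is known as the width, $w = 2z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.
(e) The standard error is $\dfrac{\sigma}{\sqrt{n}} = \dfrac{w}{2z_{\alpha/2}}$.

Proof:

[Figure 4.5: normal curve with central area $1 - \alpha$ between $a$ and $b$, centred at $\mu$, and area $\alpha/2$ in each tail.]

A $(1 - \alpha)100\%$ symmetrical confidence interval for $\mu$ is an interval $(a, b)$ such that $P(a < \mu < b) = 1 - \alpha$.
Let $z_{\alpha/2}$ be the standardized score such that $P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$.
The sample mean satisfies $\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$. Hence,

$P\left(-z_{\alpha/2} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

$P\left(-z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

$P\left(-\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

$P\left(\bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} > \mu > \bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

Rearranging,

$P\left(\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

Therefore, $a = \bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ and $b = \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.
Hence, a $(1 - \alpha)100\%$ symmetrical confidence interval for $\mu$ is $\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$.

Example 18
A random sample of 10 fish is caught from a pond and their lengths, in cm, are measured. The results are as follows:
9.1, 9.1, 11.3, 10.7, 9.8, 10.2, 10.1, 9.7, 9.9 and 9.5.
The lengths of the fish in the pond are normally distributed with variance 4. Find a 95% symmetrical confidence interval for the mean length of all the fish in the pond.

[Figure 4.6: standard normal curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area $\alpha/2$ in each tail.]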

Solution:
Let $\bar{x}$ be the mean length of the sample.
$\bar{x} = \frac{\sum x}{n} = \frac{9.1 + 9.1 + 11.3 + 10.7 + 9.8 + 10.2 + 10.1 + 9.7 + 9.9 + 9.5}{10} = 9.94$ cm
For a 95% symmetrical confidence interval,
95% = (1 − α)100%, so 1 − α = 0.95, α = 0.05 and $\frac{\alpha}{2} = 0.025$.
From the standardized normal table, $z_{\alpha/2} = z_{0.025} = 1.96$.
Given that σ² = 4, we have σ = 2, and n = 10.
Therefore, a 95% symmetrical confidence interval
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(9.94 \pm (1.96)\frac{2}{\sqrt{10}}\right) = (9.94 \pm 1.24) = (8.70, 11.18)$

Example 19
The scores of an IQ test are known to have a normal distribution with mean µ and variance σ². The scores of a random sample of 68 people who have taken the IQ test are selected, and a 98% symmetrical confidence interval for µ is worked out to be (88.2, 102.8). Find the sample mean $\bar{x}$ and σ. Hence, find a 92% symmetrical confidence interval for µ.
Solution:
For a 98% symmetrical confidence interval,
(1 − α)100% = 98%, so 1 − α = 0.98, α = 0.02 and $\frac{\alpha}{2} = 0.01$.
From the standardized normal table, $z_{\alpha/2} = z_{0.01} = 2.326$.
The 98% symmetrical confidence interval for the mean score of the IQ test is $\left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$. Therefore,
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \bar{x} - (2.326)\frac{\sigma}{\sqrt{68}} = 88.2$ …… (1)
and
$\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \bar{x} + (2.326)\frac{\sigma}{\sqrt{68}} = 102.8$ …… (2)

(1) + (2): $2\bar{x} = 191.0$. Therefore $\bar{x} = 95.5$.
(2) − (1): $2\left(2.326\frac{\sigma}{\sqrt{68}}\right) = 14.6$, so $\sigma = \frac{\sqrt{68}\,(14.6)}{2(2.326)} = 25.88$.
For a 92% symmetrical confidence interval,
(1 − α)100% = 92%, so 1 − α = 0.92, α = 0.08 and $\frac{\alpha}{2} = 0.04$.
From the standardized normal table, $z_{\alpha/2} = z_{0.04} = 1.751$.
The population standard deviation is σ = 25.88 and the sample size is n = 68.
Therefore, a 92% symmetrical confidence interval for the mean score of the IQ test
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(95.5 \pm (1.751)\frac{25.88}{\sqrt{68}}\right) = (95.5 \pm 5.5) = (90.0, 101.0)$

Interpretation of a confidence interval
Consider the scores of a normal population with mean µ = 4.5 and standard deviation σ = 2.87. In a sampling study of the scores, the mean score $\bar{x}$ of a randomly chosen sample of 40 participants is found to be 3.975. Thus, a 90% confidence interval for the mean score is
$\left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(3.975 \pm (1.645)\frac{2.87}{\sqrt{40}}\right) = (3.975 \pm 0.7465) = (3.23, 4.72)$.
Therefore, we are 90% confident that the interval (3.23, 4.72) contains the population mean µ.
[Figure 4.7: the interval from 3.23 to 4.72 on a number line, labelled "µ lies in this interval".]
If we repeat the sampling with another 10 randomly selected samples of size 40 each, 10 confidence intervals will be obtained. For a 90% confidence level, we can expect about 0.9 × 10 = 9 of these confidence intervals to contain the population mean µ = 4.5.
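The repeated-sampling interpretation above can be checked by simulation. The sketch below is only an illustration, not part of the textbook's method: it draws many samples of size 40 from N(4.5, 2.87²) and counts how often the 90% interval contains µ. The function name `coverage`, the number of trials and the seed are assumptions chosen for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level, trials=10_000, seed=0):
    """Fraction of simulated known-variance intervals that contain mu."""
    rng = random.Random(seed)                      # fixed seed, illustrative only
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    half_width = z * sigma / sqrt(n)               # estimation error of each interval
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

# Population of the passage: mu = 4.5, sigma = 2.87, samples of size 40.
print(coverage(4.5, 2.87, 40, 0.90))
```

The printed fraction should lie close to 0.90. With only 10 samples, as in the text, the number of intervals containing µ varies from run to run, so 9 out of 10 is an expected value rather than a guarantee.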

Table 4.3 below shows the 10 samples and the confidence intervals obtained in a sampling study.

Sample   Sample mean   90% confidence interval
1        4.64          (3.89, 5.39)
2        4.56          (3.81, 5.31)
3        3.96          (3.21, 4.71)
4        5.12          (4.37, 5.87)
5        4.24          (3.49, 4.99)
6        3.44          (2.69, 4.19)
7        4.60          (3.85, 5.35)
8        4.08          (3.33, 4.83)
9        5.20          (4.45, 5.95)
10       4.88          (4.13, 5.63)
Table 4.3

Figure 4.8 is a line segment representation of the confidence intervals.
[Figure 4.8: the 10 confidence intervals drawn as line segments against a scale from 2.50 to 6.50, with a reference line at µ = 4.50.]
Notice that 9 out of the 10 confidence intervals contain the mean µ = 4.5; only the sixth interval (2.69, 4.19) does not contain µ.

Confidence interval for the population mean µ with known variance σ² based on a large sample
By the central limit theorem, the distribution of the sample mean is approximately normal, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
If $\bar{x}$ is the mean of a random sample of a sufficiently large size n (n ≥ 30) taken from any population with mean µ and known variance σ², then a (1 − α)100% symmetrical confidence interval for µ is $\left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$.
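The known-variance interval can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `z_interval` is hypothetical, and the critical value comes from the standard library's `NormalDist` rather than from a printed table, so results may differ from the worked examples in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(xbar, sigma, n, level):
    """Symmetrical confidence interval xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # upper alpha/2 critical value
    r = z * sigma / sqrt(n)                        # half-width (estimation error)
    return xbar - r, xbar + r

# Example 18: ten fish lengths, population variance 4 (so sigma = 2), 95% level.
lengths = [9.1, 9.1, 11.3, 10.7, 9.8, 10.2, 10.1, 9.7, 9.9, 9.5]
low, high = z_interval(fmean(lengths), 2, len(lengths), 0.95)
print(f"({low:.2f}, {high:.2f})")

# Second part of Example 19: xbar = 95.5, sigma = 25.88, n = 68, 92% level.
low92, high92 = z_interval(95.5, 25.88, 68, 0.92)
print(f"({low92:.1f}, {high92:.1f})")
```

Both intervals agree with the worked solutions, (8.70, 11.18) and (90.0, 101.0), to the precision quoted there.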

Example 20
A factory manufactures steel rods with mean mass µ kg and standard deviation 15 kg. A random sample of 50 steel rods has a total mass of 2775 kg. Determine a 94% symmetrical confidence interval for µ.
Solution:
The sample mean is $\bar{x} = \frac{2775}{50} = 55.5$.
For a 94% symmetrical confidence interval,
(1 − α)100% = 94%, so α = 0.06 and $\frac{\alpha}{2} = 0.03$.
From the standardized normal table, $z_{\alpha/2} = z_{0.03} = 1.881$.
Hence a 94% symmetrical confidence interval for µ
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(55.5 \pm (1.881)\frac{15}{\sqrt{50}}\right) = (55.5 \pm 4.0) = (51.5, 59.5)$

Confidence interval for the population mean µ with unknown variance σ² based on a large sample
If $\bar{x}$ is the mean of a random sample of a sufficiently large size n (n ≥ 30) taken from any population with mean µ and unknown variance σ², then a (1 − α)100% symmetrical confidence interval for µ is $\left(\bar{x} \pm z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right)$, where $\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ is the unbiased estimate for the population variance σ².
Note:
(a) n must be sufficiently large, i.e. n ≥ 30.
(b) The unbiased estimate for the population variance σ² is $\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.

Example 21
A random sample of size 100 is taken from a population and the data are summarized as follows:
$\sum x = 108$ and $\sum (x - \bar{x})^2 = 74.8$
Find a 97% symmetrical confidence interval for the population mean µ.
Solution:
The population variance is unknown, so the unbiased estimate $\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ is used to estimate the value of σ².
The sample mean is $\bar{x} = \frac{\sum x}{n} = \frac{108}{100} = 1.08$.
The unbiased variance is $\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{74.8}{100 - 1} = 0.756$.
Therefore, $\hat{\sigma} = \sqrt{0.756} = 0.869$.

For a 97% symmetrical confidence interval,
97% = (1 − α)100% ⇒ α = 0.03 and $\frac{\alpha}{2} = 0.015$.
From the standardized normal table, $z_{\alpha/2} = z_{0.015} = 2.170$.
Hence, a 97% symmetrical confidence interval for the population mean µ
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right) = \left(1.08 \pm (2.17)\frac{0.869}{\sqrt{100}}\right) = (1.08 \pm 0.19) = (0.89, 1.27)$

Example 22
A random sample of size 100 is taken from a population and the data are summarized below.
$\sum x = 1000$, $\sum (x - \bar{x})^2 = 170.8$
Find a
(a) 97% symmetrical confidence interval,
(b) 99% symmetrical confidence interval,
for the population mean.
Solution:
(a) For a 97% symmetrical confidence interval,
97% = (1 − α)100%, so α = 0.03 and $\frac{\alpha}{2} = 0.015$.
From the standardized normal table, $z_{\alpha/2} = z_{0.015} = 2.170$.
The sample mean is $\bar{x} = \frac{\sum x}{n} = \frac{1000}{100} = 10$.
The unbiased variance is $\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{170.8}{100 - 1} = 1.7253$.
Therefore, $\hat{\sigma} = \sqrt{1.7253} = 1.3135$.
Hence, a 97% symmetrical confidence interval for the population mean
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right) = \left(10 \pm (2.17)\frac{1.3135}{\sqrt{100}}\right) = (10 \pm 0.285) = (9.715, 10.285)$
(b) For a 99% symmetrical confidence interval,
(1 − α)100% = 99%, so α = 0.01 and $\frac{\alpha}{2} = 0.005$.

From the standardized normal table, $z_{\alpha/2} = z_{0.005} = 2.576$.
Hence, a 99% symmetrical confidence interval for the population mean
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right) = \left(10 \pm (2.576)\frac{1.3135}{\sqrt{100}}\right) = (10 \pm 0.338) = (9.66, 10.34)$

Example 23
The lengths of steel rods manufactured by a factory have a distribution with a nominal value of 50 cm. During a routine check, a random sample of 150 steel rods is taken and the data are summarized as follows:
$\sum (x - 50) = -12$ and $\sum (x - 50)^2 = 1408$
Find a 90% symmetrical confidence interval for the mean length of the rods manufactured by the factory.
Solution:
Let u = x − 50. Then $\sum (x - 50) = \sum u = -12$ and $\sum (x - 50)^2 = \sum u^2 = 1408$.
The mean is $\bar{u} = \frac{\sum u}{n} = -\frac{12}{150}$.
Now x = u + 50. Hence, $\bar{x} = 50 + \bar{u} = 50 - \frac{12}{150} = 49.92$.
The unbiased variance is
$\hat{\sigma}_u^2 = \frac{n\sum u^2 - (\sum u)^2}{n(n - 1)} = \frac{150(1408) - (-12)^2}{150(150 - 1)} = 9.443$
Since x = u + 50, the shift does not change the spread, so $\hat{\sigma}_x = \hat{\sigma}_u = \sqrt{9.443} = 3.0729$.
For a 90% symmetrical confidence interval,
(1 − α)100% = 90%, so α = 0.10 and $\frac{\alpha}{2} = 0.05$.
From the standardized normal table, $z_{\alpha/2} = z_{0.05} = 1.645$.

Hence, a 90% symmetrical confidence interval for the mean length of the steel rods
$= \left(\bar{x} \pm z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right) = \left(49.92 \pm (1.645)\frac{3.0729}{\sqrt{150}}\right) = (49.92 \pm 0.41) = (49.51, 50.33)$

Example 24
A population has unknown mean µ and unknown variance. Ah Beng takes a random sample of 40 observations and obtains a mean of 26.8 and an unbiased variance of 35.65, while Gowry takes a random sample of 60 observations and obtains a mean of 19.7 and an unbiased variance of 42.98. Combining these results to give a single sample, determine a 98% symmetrical confidence interval for the population mean µ.
Solution:
In order to obtain the combined mean and combined variance, we need the overall total of these 100 observations and also the overall total of their squares.
The overall total is $\sum x + \sum y = n_x\bar{x} + n_y\bar{y} = 40(26.8) + 60(19.7) = 2254$.
∴ The combined mean is $\bar{x}_c = \frac{2254}{100} = 22.54$.
Rearranging the unbiased variance formula $s^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$, the sum of squares is
$\sum x^2 = (n - 1)s^2 + \frac{(\sum x)^2}{n} = (n - 1)s^2 + \frac{(n\bar{x})^2}{n}$.
Hence the combined sum of squares is
$\sum c^2 = \left((40 - 1)(35.65) + \frac{(40 \times 26.8)^2}{40}\right) + \left((60 - 1)(42.98) + \frac{(60 \times 19.7)^2}{60}\right) = 55\,941.17$.
The unbiased estimate for the population variance from the combined sample of 100 observations is
$\hat{\sigma}_c^2 = \frac{1}{100 - 1}\left(55\,941.17 - \frac{2254^2}{100}\right) = 51.8789$
∴ A 98% confidence interval for the population mean µ is
$\left(22.54 - 2.326\sqrt{\frac{51.8789}{100}},\ 22.54 + 2.326\sqrt{\frac{51.8789}{100}}\right) = (20.9, 24.2)$

Sample size in the estimation of population mean
In a sampling study, a constraint often faced is the determination of the sample size. If the sample size is small, it may not be representative of the population; hence the results may not be accurate. On the other hand, a large sample may not be easily obtainable. For example, in the research to find a drug that can cure AIDS, new drugs are tested on AIDS patients. A large sample may be difficult to obtain as patients may be reluctant to take new drugs which are untested and whose effectiveness has not been proven. Apart from this, obtaining a large sample may be time consuming and costly. A main concern in estimation theory is to reduce the estimation error. In general, a larger sample size reduces the estimation error.
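To see how quickly the estimation error falls as the sample grows, the half-width of the known-variance interval, $z_{\alpha/2}\,\sigma/\sqrt{n}$, can be tabulated for a few sample sizes. The sketch below is an illustration only: the function name `estimation_error` is hypothetical, σ = 2 is borrowed from Example 18, and the sample sizes are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

def estimation_error(sigma, n, level):
    """Half-width z_{alpha/2} * sigma / sqrt(n) of a symmetrical interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

# sigma = 2 as in Example 18; the sample sizes are illustrative choices.
for n in (10, 40, 160, 640):
    print(n, round(estimation_error(2, n, 0.95), 3))
```

Because the error is proportional to $1/\sqrt{n}$, each fourfold increase in the sample size only halves the error, which is why very precise estimates can require impractically large samples.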

[Figure 4.9: (a) the confidence interval from $\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ with µ inside it; (b) the same interval with the error $r = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ marked on each side of $\bar{x}$.]
The estimation error is $r = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Hence, the sample size is $n = \left(\frac{z_{\alpha/2}\,\sigma}{r}\right)^2$.

Example 25
The income of a fresh graduate of a university is known to be normally distributed with standard deviation RM1000. Find the least sample size that must be taken so that the mean income is estimated with an error within RM500 at a confidence level of 95%.
Solution:
For a confidence level of 95%, α = 0.05 ⇒ $z_{\alpha/2} = z_{0.025} = 1.96$.
The estimation error is r = 500 and the standard deviation is σ = 1000.
Hence, $n = \left(\frac{z_{\alpha/2}\,\sigma}{r}\right)^2 = \left(\frac{1.96 \times 1000}{500}\right)^2 = 15.37$
Therefore, the minimum sample size that should be taken is 16 (rounding up to the next integer).

Confidence interval for the population proportion
If $\hat{p}$ is the proportion of a random sample of a sufficiently large size n with a particular characteristic, then a (1 − α)100% symmetrical confidence interval for the population proportion with the particular characteristic is
$\left(\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$.

Example 26
A supervisor wants to know the proportion of defective electronic components produced by a machine. He inspects a random sample of 200 electronic components and finds that 40 of them are defective. Find a
(a) 95% symmetrical confidence interval,
(b) 98% symmetrical confidence interval,
for the proportion of defective electronic components produced by the machine.
Solution:
The sample proportion of defective electronic components is $\hat{p} = \frac{x}{n} = \frac{40}{200} = 0.2$.

(a) For a 95% symmetrical confidence interval,
(1 − α)100% = 95%, so α = 0.05 and $\frac{\alpha}{2} = 0.025$.
From the standardized normal table, $z_{\alpha/2} = z_{0.025} = 1.96$.
The 95% symmetrical confidence interval for the proportion of defective electronic components produced by the machine
$= \left(0.2 \pm 1.96\sqrt{\frac{(0.2)(0.8)}{200}}\right) = (0.2 \pm 0.055) = (0.145, 0.255)$
(b) For a 98% symmetrical confidence interval,
(1 − α)100% = 98%, so α = 0.02 and $\frac{\alpha}{2} = 0.01$.
From the standardized normal table, $z_{\alpha/2} = z_{0.01} = 2.326$.
The 98% symmetrical confidence interval for the proportion of defective electronic components produced by the machine
$= \left(0.2 \pm 2.326\sqrt{\frac{(0.2)(0.8)}{200}}\right) = (0.2 \pm 0.0658) = (0.134, 0.266)$

Sample size in the estimation of population proportion
A (1 − α)100% confidence interval for the population proportion p with a sample proportion $\hat{p}$ is $\left(\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$.
The term $z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ is known as the estimation error.
[Figure 4.10: (a) the confidence interval from $\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ to $\hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ with p inside it; (b) the same interval with the error r marked on each side of $\hat{p}$.]
The estimation error is $r = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. Hence, the sample size is $n = \left(\frac{z_{\alpha/2}}{r}\right)^2 \hat{p}(1 - \hat{p})$.
Usually, the value of $\hat{p}$ is known only after the sample has been taken, that is, after n has been decided. To overcome this problem, the value of $\hat{p}$ can be estimated in one of the following ways:
(a) Conduct an initial sampling test on a pilot sample to determine the value of $\hat{p}$.
(b) Use an existing value of $\hat{p}$.

(c) Guess the value of $\hat{p}$ after consulting people who have knowledge of the population.
(d) If methods (a) to (c) are unsuccessful, we can use the maximum value of $\hat{p}(1 - \hat{p})$, found by completing the square:
$\hat{p}(1 - \hat{p}) = \hat{p} - \hat{p}^2 = \frac{1}{4} - \frac{1}{4} + \hat{p} - \hat{p}^2 = \frac{1}{4} - \left(\frac{1}{4} - \hat{p} + \hat{p}^2\right) = \frac{1}{4} - \left(\frac{1}{2} - \hat{p}\right)^2$
Hence, the maximum value of $\hat{p}(1 - \hat{p})$ is $\frac{1}{4}$, attained when $\hat{p} = \frac{1}{2}$.
Therefore, $n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{r}\right)^2 = \left(\frac{z_{\alpha/2}}{2r}\right)^2$

Example 27
A study is conducted in a particular school to determine the proportion of pupils who smoke. It is found that 8 out of 120 pupils selected at random smoke. Find the least sample size so that the difference between the sample proportion and the population proportion is less than 0.05 at a confidence level of 99%.
Solution:
From the initial study conducted, the proportion of pupils who smoke is $\hat{p} = \frac{8}{120} = 0.07$ (to two decimal places).
For a confidence level of 99%, α = 0.01 ⇒ $z_{\alpha/2} = z_{0.005} = 2.575$.
The estimation error is r = 0.05.
The sample size is $n = \left(\frac{z_{\alpha/2}}{r}\right)^2 \hat{p}(1 - \hat{p}) = \left(\frac{2.575}{0.05}\right)^2 (0.07)(1 - 0.07) = 172.66$
Rounding up to the next integer, the least value of n is 173.

Example 28
An IT company in Johor is interested in the proportion of students in the state who own a computer. Find the least sample size so that the difference between the sample proportion and the population proportion is less than 0.1 at a confidence level of 98%.
Solution:
Since the company does not have any information about the proportion of students in Johor who own a computer, we use the maximum value $p(1 - p) = \frac{1}{4}$.
For a confidence level of 98%, α = 0.02 ⇒ $z_{\alpha/2} = z_{0.01} = 2.326$.
The sample size is $n = \left(\frac{z_{\alpha/2}}{2r}\right)^2 = \left(\frac{2.326}{2 \times 0.1}\right)^2 = 135.2569$
Hence, the least value of n is 136.
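The two sample-size formulas, $n = (z_{\alpha/2}\,\sigma / r)^2$ for a mean and $n = (z_{\alpha/2}/r)^2\,\hat{p}(1 - \hat{p})$ for a proportion, can be sketched as small Python helpers. The function names are hypothetical, and the critical value is computed with the standard library's `NormalDist` instead of being read from a table. When no prior estimate of the proportion is supplied, the sketch falls back on the worst case $\hat{p}(1 - \hat{p}) = 1/4$.

```python
from math import ceil
from statistics import NormalDist

def z_value(level):
    """Upper alpha/2 critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def sample_size_mean(sigma, r, level):
    """Least n with z * sigma / sqrt(n) <= r."""
    return ceil((z_value(level) * sigma / r) ** 2)

def sample_size_proportion(r, level, p_hat=None):
    """Least n with z * sqrt(p(1-p)/n) <= r; worst case 1/4 if p_hat is None."""
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil((z_value(level) / r) ** 2 * pq)

print(sample_size_mean(1000, 500, 0.95))          # Example 25
print(sample_size_proportion(0.05, 0.99, 0.07))   # Example 27, p_hat rounded to 0.07
print(sample_size_proportion(0.1, 0.98))          # Example 28, no prior estimate
```

These calls reproduce the answers 16, 173 and 136 of Examples 25, 27 and 28. Note that Example 27 rounds $\hat{p}$ to 0.07 before substituting; passing the unrounded value 8/120 gives 166 instead, so the answer is sensitive to how early the estimate is rounded.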

Exercise 4.4
1. A study is conducted on 12 worms caught in a garden. Their lengths, in cm, are recorded as shown below.
9.5, 9.5, 11.2, 10.6, 9.9, 11.1, 10.9, 9.8, 10.1, 10.2, 10.9, 11.0
It is known that the lengths of the worms are normally distributed with standard deviation 4 cm. Find a 95% confidence interval for the mean length of worms in the garden.
2. The heights of 100 students chosen at random from a secondary school in Kedah are recorded. A 95% confidence interval constructed from this sample is (177.22 cm, 179.18 cm). Find the
(a) sample mean,
(b) standard deviation of the heights of students in Kedah,
(c) 98% confidence interval for the mean height of students in Kedah.
3. A random sample of size 120 is taken from a normal distribution and the results are summarized below.
$n = 120$, $\sum x = 1008$, $\sum (x - \bar{x})^2 = 172.8$
Find the confidence interval of
(a) 97%,
(b) 99%,
for the population mean.
4. On a routine check in a manufacturing process, a supervisor finds that 45 out of a random sample of 300 products are defective. Determine the 95% and 98% confidence intervals for the proportion of defective products manufactured by the factory.
5. A study is conducted in Perak to determine the proportion of students who own handphones. From a random sample of 400 students, it is found that 136 own handphones. Find the least sample size to be taken so that the sample percentage of students who own handphones in Perak is within ±2% of the population percentage at a confidence level of 95%.
6. The scores of a statistics test are normally distributed with standard deviation 38. Find the least sample size to be taken so that the mean score of the population can be estimated within 10 marks of the sample mean at a confidence level of 95%.
7. A sample is taken at random from a population with mean µ and standard deviation 45. Find the least sample size to be taken so that µ can be estimated within 8 marks of the sample mean at a confidence level of 96%.
8. A study is conducted in Kuala Lumpur to determine the proportion of people who own credit cards. From a random sample of 40 people, it is found that 18 people own credit cards. Find the least sample size to be taken so that the proportion of people who own credit cards in Kuala Lumpur differs from the sample proportion by not more than 0.08 at a confidence level of 99%.
9. A study is conducted in Penang to determine the proportion of students who own computers. From a random sample of 50 students, it is found that 24 own a computer. Find the least sample size to be taken so that the proportion of students who own computers in Penang differs from the sample proportion by not more than 0.05 at a confidence level of 95%.

10. A sports company manager is interested in the proportion of students who play badminton in Perak. A random sample of size n is to be chosen. Find the least sample size to be taken so that the proportion of students who play badminton in Perak differs from the sample proportion by not more than 0.1 at a confidence level of 90%.
11. A sports company manager is interested in the proportion of students who play squash in Pahang. A random sample of size n is to be chosen. Find the least sample size to be taken so that the proportion of students who play squash in Pahang is estimated to differ from the sample proportion by not more than 0.2 at a confidence level of 94%.
12. A random sample of size 25 is taken from a normal population with standard deviation 3.5. The mean of the sample is 14.8.
(a) Find a 95% symmetrical confidence interval for the population mean µ.
(b) What sample size is required to obtain a 98% symmetrical confidence interval of width at most 1.5?
(c) What confidence level would correspond to the confidence interval (13.75, 15.85) based on the above sample of 25?
13. The error made when a certain measuring instrument is used to measure the body length of a bee of a particular species has a normal distribution with mean 0 mm and standard deviation 0.8 mm.
(a) Calculate the probability that the error made when the instrument is used once is numerically less than 0.4 mm.
(b) The body length of a bee is measured ten times with the instrument. Calculate the probability that the mean of the measurements is within 0.4 mm of the true length.
(c) The mean of another ten measurements is 1.93 mm. Determine a 98% symmetrical confidence interval for the true body length of the bee.
14. In a study of the proportion of students who attend school without pocket money, a random sample of n students gave a 99% symmetrical confidence interval of (0.1908, 0.2892). Determine
(a) the sample proportion,
(b) the standard error,
(c) the sample size n.
15. The life span of a rabbit has a normal distribution with mean 9.2 years and standard deviation 1.8 years. Find the probability that a randomly selected rabbit has a life span between 8.1 and 9.3 years. A random sample of 36 rabbits is selected. Find the probability that the mean life span is
(a) between 8.1 and 9.3 years,
(b) within three quarters of a standard deviation of the population mean,
(c) within 0.5 year of the population mean,
(d) at least 0.5 year lower than the population mean.
16. A farm manager wishes to estimate the damage to the Agar (Aquilaria) trees of his plantations due to flooding and pests.
(a) According to previous records, the standard deviation of the number of damaged trees is 350. Calculate the number of trees required so that he is 98% confident that the estimate is within 100 trees of the true mean.
(b) The standard deviation of the number of damaged trees is actually 385. Based on the sample size obtained in (a), determine the symmetrical confidence level for the estimate to be within 100 trees of the true mean.

17. A shop sells durians supplied by two durian orchards, A and B. A random sample of 20 durians from orchard A is chosen and the masses, in kg, are measured. The results are recorded in the table below.
2.99  2.83  3.04  2.97  3.01  2.89  2.86  2.91  2.82  2.95
2.97  2.92  3.01  2.94  2.99  2.97  2.87  2.89  2.95  3.01
(a) Determine the 95% symmetrical confidence interval for the mean mass of durians from orchard A, given that the masses of durians are known to be normally distributed with standard deviation 0.045 kg.
(b) The mean mass of a random sample of 60 durians from orchard B is 2.79 kg. Determine the 95% symmetrical confidence interval for the mean mass of durians from orchard B, given that the masses of durians are also known to be normally distributed with standard deviation 0.045 kg.
A customer finds that the mass of a durian is 2.93 kg. What conclusion can he draw about which orchard the durian came from? Explain your reasoning.
18. According to old records, the standard deviation of the mathematics marks of a certain examination is 15.5, but the mean µ fluctuates. An examiner wants to estimate the mean mark of all the candidates, but he only has a sample of 100 candidates, which gives a mean of 28.8.
(a) What assumptions about the marks must be made in order to calculate a confidence interval for µ?
(b) Assuming the above assumptions are justified, calculate a 95% symmetrical confidence interval for µ.
(c) Later it was discovered that the actual value of µ was 45.5. What conclusion can be drawn about this sample?
(d) Determine the smallest sample size needed to estimate the mean mark within 5 marks of the actual mean with probability 0.95.
19. A student calculated three confidence intervals to be (15.87, 18.33), (15.70, 18.50) and (15.97, 18.23) but forgot to label them. All he knew was that they were based on 94%, 96% and 98% symmetrical confidence levels.
(a) State, with a reason, which interval is the 96% one.
(b) Estimate the standard error.
(c) What was the unbiased estimate of the mean in this case?
20. The table below shows the number of hand phones owned by 30 randomly chosen households.
Number of hand phones (x)   1   2   3   4   5   6   7   8   9
Number of households (f)    2   4   5   8   4   3   2   1   1
(a) Calculate the unbiased estimates of the mean and variance of the number of hand phones.
Thirty more households were randomly sampled and the sample had a mean of 3.6 hand phones and $\frac{\sum (x - \bar{x})^2}{30} = 1.5$.
(b) Treating the 60 results as a single sample, obtain unbiased estimates of the population mean and the population variance.
(c) State, with a reason, which of these two sets of estimates you would prefer to use.
(d) Obtain a 95% symmetrical confidence interval for the mean number of hand phones in the population.

Summary
1. If $\bar{X}$ is the sample mean taken from an infinite population (or a finite population with replacement) with mean µ and variance σ², then $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.
2. If $\bar{X}$ is the sample mean taken from a finite population of size N without replacement with mean µ and variance σ², then $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$.
3. Central limit theorem: For any population with mean µ and variance σ², and a sample of sufficiently large size n (n ≥ 30), the sample mean is approximately normal, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
4. For a population with proportion p and a sample of size n (n ≥ 30), the sample proportion is approximately normal, $\hat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)$.
5. The unbiased estimate for the population mean is the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
6. The unbiased estimate for the population variance is $\hat{\sigma}^2 = \frac{ns^2}{n - 1}$ with $s^2 = \frac{\sum (x - \bar{x})^2}{n}$.
7. The unbiased estimate for the population proportion is the sample proportion, $\hat{p} = \frac{x}{n}$.
8. A (1 − α)100% symmetrical confidence interval for the population mean µ with known standard deviation σ is $\left(\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$.
9. A (1 − α)100% symmetrical confidence interval for the population mean µ with unknown standard deviation σ, based on a large sample, is $\left(\bar{x} \pm z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right)$, where $\hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ is the square root of the unbiased estimate for the population variance.
10. A (1 − α)100% symmetrical confidence interval for the population proportion p is $\left(\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$, where $\hat{p} = \frac{x}{n}$ is the sample proportion.
11. The estimation error of a (1 − α)100% symmetrical confidence interval for the population mean is $r = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
12. The estimation error of a (1 − α)100% symmetrical confidence interval for the population proportion is $r = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$.
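As a closing illustration of summary items 6 and 9, the sketch below builds the large-sample interval from summary statistics and pools two samples in the way Example 24 does. It is a sketch under stated assumptions rather than a prescribed method: the function names `large_sample_interval` and `combine` are hypothetical, and the critical value is taken from the standard library's `NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_interval(mean, unbiased_var, n, level):
    """Interval mean +/- z_{alpha/2} * sqrt(unbiased_var / n), for large n."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    r = z * sqrt(unbiased_var / n)
    return mean - r, mean + r

def combine(n1, mean1, var1, n2, mean2, var2):
    """Pool two samples given each size, mean and unbiased variance."""
    n = n1 + n2
    total = n1 * mean1 + n2 * mean2
    # Recover each sum of squares from s^2 = (sum x^2 - n * mean^2) / (n - 1).
    sum_sq = ((n1 - 1) * var1 + n1 * mean1 ** 2) + ((n2 - 1) * var2 + n2 * mean2 ** 2)
    mean = total / n
    var = (sum_sq - n * mean ** 2) / (n - 1)
    return n, mean, var

# Example 21: n = 100, sum x = 108, sum (x - xbar)^2 = 74.8, 97% level.
lo21, hi21 = large_sample_interval(108 / 100, 74.8 / 99, 100, 0.97)
print(f"({lo21:.2f}, {hi21:.2f})")

# Example 24: pool the samples of 40 and 60 observations, 98% level.
n, mean, var = combine(40, 26.8, 35.65, 60, 19.7, 42.98)
lo24, hi24 = large_sample_interval(mean, var, n, 0.98)
print(n, round(mean, 2), round(var, 4), f"({lo24:.1f}, {hi24:.1f})")
```

The results agree with the worked solutions: (0.89, 1.27) for Example 21, and a combined mean of 22.54, combined unbiased variance of about 51.88 and interval (20.9, 24.2) for Example 24.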

STPM PRACTICE 4
1. The table below shows the frequency distribution of the ages, X (in years), of 250 expectant mothers in a hospital.
x   18–  20–  22–  24–  26–  28–  30–  32–  34–  36–  38–
f   14   36   42   57   48   26   17   7    2    0    1
18– denotes 18 ≤ x < 20, 20– denotes 20 ≤ x < 22, and so forth. f is the number of mothers.
(a) Find, to the nearest 0.1 year, the mean and standard deviation of X.
(b) If the 250 expectant mothers are a random sample taken from a large population, find a 95% symmetrical confidence interval for the mean age of all expectant mothers.
2. In a sampling study conducted in town A, it is found that 28 out of 80 students chosen at random prefer the MIMI drink.
(a) Find a 95% symmetrical confidence interval for the proportion of students from town A who prefer MIMI.
A similar study is conducted in town B and it is found that 45 out of 100 students chosen at random prefer MIMI.
(b) Find a 95% symmetrical confidence interval for the proportion of students from town B who prefer MIMI.
(c) Based on these two results, can it be concluded that more students from town B prefer MIMI than those from town A?
3. A machine is used to fill fruit juice into tins and the nominal mass of the tins is 100 grams. The machine's efficiency is monitored periodically, and in a random sample of 10 tins whose masses are examined, the measurements in grams are recorded as follows:
113, 128, 112, 98, 127, 108, 105, 120, 118, 115
(a) Find the unbiased estimate for the mean mass of the population of tins filled by the machine.
(b) Find a 95% symmetrical confidence interval for the mean mass of the population of tins filled by the machine.
4. A normal population has mean µ and variance σ², where both of these values are unknown. A random sample is taken from the population and the values are shown below.
3.2, 1.6, 0.9, 4.1, 3.7
(a) Find the mean and standard deviation of the sample.
(b) Find the unbiased estimate for σ².
(c) Find a 95% symmetrical confidence interval for µ.
(d) Explain what is meant by 'unbiased estimate' in part (b).
(e) Explain what is meant by '95% symmetrical confidence interval' in part (c).
5. The scores of a test for a random sample of 250 students are summarized below.
$\sum x = 11\,872$, $\sum x^2 = 646\,193$
(a) Find a 90% symmetrical confidence interval for the mean score of the students who sat for the test.
(b) It is known that the scores of all students who sat for the test have mean 45.292 and standard deviation 18.761. Find the probability that a sample of 250 students has a mean score exceeding the sample mean found above.

6. The perimeter of a type of tree has mean 82.4 cm and standard deviation 2.8 cm. A random sample of size n is taken from this population and its mean is worked out.
(a) Show that the smallest value of n is 137 so that the standard deviation of the sample means is less than 0.24 cm.
(b) Estimate the percentage of samples of size 137 that have a mean between 82.1 cm and 82.6 cm.
7. A study is done to find out the proportion of students who take mathematics tuition in a town. From a random sample of 200 students, 136 are found to take mathematics tuition.
(a) Estimate the proportion of students who take mathematics tuition.
(b) Obtain a 95% symmetrical confidence interval for the proportion of students who take mathematics tuition.
(c) Find the sample size that must be taken so that, with probability 0.95, the proportion of students who take mathematics tuition is estimated with an error of less than 3%.
8. (a) The random variable X has a normal distribution. Find the mean and standard deviation of X if $P(X > 10) = 0.1$ and $P(9 < X < 10) = 0.2$.
(b) The random variable Y has a normal distribution with mean 10 and standard deviation 2. A random sample of size n is chosen from Y and the sample mean $\bar{Y}$ is calculated. Find the smallest n such that $P(\bar{Y} > 10.1) < 0.01$.
9. A random sample of a type of thread manufactured by a company is examined and its thickness recorded in the table below.
Thickness           72.5  77.5  82.5  87.5  92.5  97.5  102.5  107.5
Number of threads   6     18    32    57    102   51    25     9
(a) Find the unbiased estimates for the mean and standard deviation of the thickness of the threads manufactured by the company.
(b) Find a 95% symmetrical confidence interval for the mean thickness of the threads manufactured by the company.
10. The 95% symmetrical confidence interval for the mean life span, measured in hours, of a type of battery obtained from a random sample of size 36 is (1023.3 hours, 1101.7 hours). Assuming that the life span of the batteries is normally distributed, find
(a) the mean and standard deviation of the mean life span of the batteries,
(b) a 99% symmetrical confidence interval for the mean life span of the batteries.
11. In a survey, 45 out of 100 parents of students in a secondary school are in favour of their children studying in the science stream.
(a) Determine the smallest sample size needed to estimate the proportion of parents who are in favour of their children studying in the science stream with an error of not more than 0.06 at a confidence level of 95%.
(b) State how the sample size is affected if
(i) the error is less than 0.06 with the confidence level unchanged,
(ii) the confidence level is less than 95% with the error unchanged.

234 Mathematics Semester 3 STPM Chapter 4 Sampling and Estimation 4 12. The age of people suffering from lung cancer in country A is normally distributed with mean 61 years and standard deviation 5 years. (a) Find the probability that 10 randomly chosen people suffering from lung cancer have a mean age of less than 63 years. (b) Find the probability that 5 randomly chosen people suffering from lung cancer have a total age of more than 310 years. (c) The ages of 10 randomly chosen people suffering from lung cancer in country B are as follows: 60, 64, 65, 59, 55, 57, 62, 63, 66, 48 Assuming the ages of people suffering from lung cancer are normally distributed, determine the 95% symmetrical confidence interval for the mean age of sufferers in country B. Hence determine whether the mean age differs from country A, stating your reason. 13. A random sample of size 36 is taken from a normal population with standard deviation of 2.5. The mean of a sample is 15.8. (a) Find a 95% symmetrical confidence interval for the population mean µ. (b) What sample size is required to obtain a 98% symmetrical confidence interval of width of at most 1.4? (c) What confidence level would correspond to the confidence interval of (14.62, 15.95) based on the above sample of 36? 14. A hard disc manufacturer claims that 10% of its products are defective. A random sample of 50 hard discs is inspected. (a) State the sampling distribution. (b) Find the probability that the sample proportion of defective hard discs produced is (i) between 8% and 15%, (ii) at least 3% lower than the population proportion, (iii) within three quarters standard deviation of the population proportion. (c) Determine the 90th percentile. 15. A company manufactures a drug with a potency which is normally distributed with mean µ mgcm–3 and standard deviation s mgcm–3. The potency of a random sample of 25 drugs are given by ∑x = 127.2 and ∑(x – x –) 2 = 85.8228. Calculate the unbiased estimate for µ and s. 
Based on your estimates, calculate (a) the probability that a drug chosen at random has a potency of more than 5.25 mg cm⁻³, (b) the 90th percentile for the potency of the drug. If (x̄ − 0.55, x̄ + 0.55) is the 96% symmetrical confidence interval for µ, determine the minimum size of the sample required.
16. A population has unknown mean µ and unknown variance σ². Alex takes a random sample of 45 observations and obtains a mean of 18.5 and an unbiased variance of 36.65, while Ginny takes a random sample of 55 observations and obtains a mean of 16.5 and an unbiased variance of 42.68. By combining their results to give a single sample, obtain a 98% symmetrical confidence interval for µ.

17. A market research poll is being conducted in a certain place. Of a random sample of 250 people asked, 200 say they support party A. (a) Obtain a 96% symmetrical confidence interval for the proportion of people who support party A and interpret this confidence interval. (b) Explain why an interval is more informative than a point estimate.
18. In an inspection of grocery stores in various places of a country, it was found that 84 out of 500 loaves of white bread being sold contained more than the amount of preservative permitted under government law. (a) Estimate the percentage of white bread in the country that contains more than the permitted amount of preservative. (b) Give a 96% symmetrical confidence interval for the percentage of white bread that contains more than the permitted amount of preservative. (c) Determine the sample size needed so that the percentage of white bread in the country that contains more than the permitted amount of preservative can be estimated with an error of less than 5% with a probability of 0.95.
19. In a recent survey, a manager of a marketing firm wants to estimate people's spending on hand phones. A random sample of 100 people was interviewed and a 95% symmetrical confidence interval of (RM1200, RM1400) was obtained. Find the sample mean x̄. The manager thinks that the confidence interval is too wide and wants a confidence interval of width RM150. (a) Using the same value of x̄, find the confidence limits in this case. (b) Find the confidence level of the interval found in (a). (c) The manager is still not satisfied and now wants to know how large a sample would be required to obtain a 95% symmetrical confidence interval of width not more than RM150. Find the smallest sample size.
20. In a particular town, 46% of the townsfolk use taxis to get to their destinations.
A random sample of 100 people is asked whether they prefer to use taxis, and if at least 55 of them are in favour of using taxis then more taxis will be put on the road. (a) Determine the sampling distribution of the sample proportion of people who use the taxi service. (b) Find the probability that more taxis will be put on the road.
21. A survey is conducted by a college administrator to determine the amount of time spent by its students on physical activities during their leisure time. It was found that the total time spent per week has a mean of 13.8 hours and a standard deviation of 4.5 hours. For a selected random sample of 50 students, find the probability that (a) the sample mean is less than 14 hours, (b) the sample mean is within half an hour of the population mean.
22. According to a report of a particular country, 75% of the people support the concept of free tertiary education. A survey was carried out on a random sample of 100 people regarding this idea. (a) State the sampling distribution. (b) Find the probability that the sample proportion supporting free tertiary education is (i) at least 5% lower than the population proportion, (ii) within half a standard deviation of the population proportion.

23. In a survey done in a particular village, 85 out of a random sample of 100 people oppose the release of genetically bred mosquitoes into the environment. (a) Determine the smallest sample size needed to estimate the population proportion with an error of not more than 0.05 at a confidence level of at least 90%. State the assumption made. (b) State the effect on the sample size if (i) the confidence level is 95% with the error unaltered, (ii) the error is 0.06 with the confidence level unaltered.
24. A random variable X has a normal distribution with mean 188 and standard deviation 6. A random sample of size n is chosen from the population. (a) State the sampling distribution of the sample mean. (b) Find the value of n if the probability that the sample mean is less than 189 is 0.841.
25. A population distribution has a mean of 119 and a variance of 250. If 30 samples, each of size 25, are taken from this population, (a) calculate the probability that the sample mean is more than 121, (b) determine the number of samples with mean less than 121.
26. An automobile company which also sells Global Positioning System (GPS) units claims that the proportion of cars installed with GPS is 0.32. In a survey, a random sample of 125 cars is inspected. (a) Assuming the claim is true, calculate the probability that the sample proportion is within 0.05 of the population proportion. (b) If 30 cars from the random sample were found to have GPS installed, construct a 95% confidence interval for the proportion of cars installed with GPS. (c) If the selling price, y (RM), for the 30 cars from the random sample is summarised by $\sum_{i=1}^{30} y_i = 4\,637\,250$ and $\sum_{i=1}^{30} y_i^2 = 9.56036 \times 10^{11}$, construct a 95% confidence interval for the average selling price of cars that are installed with GPS.
27.
The concentrations of a trace element in 7 randomly chosen samples of liquid from certain soft drinks were taken and listed, in milligrams per litre, as follows: 236.6, 232.5, 234.2, 233.9, 240.8, 236.7, 237.3. (a) Calculate the unbiased point estimates of the mean and the variance of the population from which this sample was drawn. (b) Estimate the proportion of the population of concentrations that are more than 235 milligrams per litre.
28. A sample of size 600 is taken from a population with a proportion p = 0.56. Determine the value of k such that P(|p̂ − p| > k) = 0.025, where p̂ is a sample proportion.
29. A random sample of 100 adults is taken from a normal population with a variance of 36. Determine the value of k if the symmetric k% confidence interval for the population mean is (5.412, 7.386).
30. 30 random samples are taken from a population which is assumed to have the binomial distribution B(9, 0.5). (a) Determine the sampling distribution of the sample mean. (b) Find the probability that the sample mean is more than 5. (c) If the mean of a random sample is 4.38, obtain a 90% confidence interval for the population mean.

Mathematics Semester 3 STPM
CHAPTER 5 HYPOTHESIS TESTING

Learning Outcome
(a) Explain the meaning of a null hypothesis and an alternative hypothesis.
(b) Explain the meaning of the significance level of a test.
(c) Carry out a hypothesis test concerning the population mean for a normally distributed population with known variance.
(d) Carry out a hypothesis test concerning the population mean in the case where a large sample is used.
(e) Carry out a hypothesis test concerning the population proportion by direct evaluation of binomial probabilities.
(f) Carry out a hypothesis test concerning the population proportion using a normal approximation.

Bilingual Keywords
alternative hypothesis – hipotesis alternatif
critical region – rantau genting
critical value – nilai genting
hypothesis testing – pengujian hipotesis
null hypothesis – hipotesis nol
one-tailed test – ujian satu hujung
rejection region – rantau penolakan
significance level – aras keertian
test statistic – statistik ujian
two-tailed test – ujian dua hujung
Type I error – ralat jenis I
Type II error – ralat jenis II

5.1 Hypothesis Tests
In the last chapter we discussed an important type of statistical inference: estimation of parameters, where our objective is to estimate the unknown true value of a parameter. In this chapter we introduce another important type of statistical inference: testing of statistical hypotheses, where we are interested in examining whether the data from a random sample support or refute a conjecture about the true value of a parameter. As an example, the manufacturer of a certain water filter may claim that the filtered water has, on average, a pH value of 6.4. As another example, an insurance company may claim that the percentage of the population who buy health insurance this year has increased compared to last year's 12%. In the first example, the test concerns a hypothesis about the population mean. In the second example, the test concerns a hypothesis about the population proportion.
The truth or falsity of a statistical hypothesis is never known to us with absolute certainty unless we examine the entire population. This, of course, would be impractical in most situations. Instead, we examine a random sample from the population to produce evidence that either supports or refutes the hypothesis. Evidence from the sample that is inconsistent with the stated hypothesis leads to the rejection of the hypothesis. A hypothesis test or significance test is a method of using sample data as evidence to test a statistical hypothesis about a population parameter.
The null and alternative hypotheses
The structure of hypothesis testing is formulated with a null hypothesis denoted by H0 and an alternative hypothesis denoted by H1. Usually H0 specifies a particular value for a population parameter, and H1 specifies a range of values.
In general, the null hypothesis H0 states that there is no difference between the claim and reality, whereas the alternative hypothesis H1 states that there is a statistically significant difference between the claim and reality. In a hypothesis test, H0 is assumed to be true and information obtained from a sample statistic is used to determine whether there is strong evidence to reject H0.
Example 1
A hypothesis test is performed to determine whether the mean pH value of filtered water from a certain type of water filter is 6.4. State the null hypothesis and alternative hypothesis for the hypothesis test.
Solution:
Let µ denote the mean pH value of filtered water. The null hypothesis is that the mean pH value is 6.4, i.e. H0 : µ = 6.4. The alternative hypothesis is H1 : µ ≠ 6.4.
Example 2
A hypothesis test is performed to determine whether the percentage of the population who buy health insurance this year has increased compared to last year's 12%. State the null hypothesis and alternative hypothesis for the hypothesis test.
Solution:
Let p denote the proportion of the population who buy health insurance this year. The null hypothesis is that this year's percentage remains the same as last year's: H0 : p = 0.12. The alternative hypothesis is that this year's percentage has increased: H1 : p > 0.12.
Note: The null hypothesis H0 is usually stated using the equality sign.

Test statistics
A test statistic is a random variable whose value is used to determine whether a null hypothesis is rejected in a hypothesis test. The choice of a test statistic depends on the assumed probability distribution and the hypothesis in question. Consider the following example. A study claims that 20% of drivers in a city run red lights. We choose, at random, 20 drivers from the city. If more than 7 drivers admit to running red lights, it indicates a higher percentage. In this case, we are essentially testing the null hypothesis that 20% of drivers run red lights against the alternative hypothesis that the percentage is higher. This can be written as follows: H0 : p = 0.2, H1 : p > 0.2.
The test statistic on which we base our decision is the random variable X, the number of drivers in our sample who admit to running red lights. The possible values of X, from 0 to 20, are divided into two groups: those less than or equal to 7 and those greater than 7. All possible values greater than 7 constitute what is called the critical region. The set of values that leads to the rejection of H0 in favour of H1 is called a critical region or rejection region. Thus, if x > 7, we reject H0 in favour of the alternative hypothesis H1. If x ≤ 7, we fail to reject H0. This decision criterion is illustrated in the figure below.
[Figure 5.1: Number line for x from 0 to 20 with critical value 7. H0 (p = 0.2) is not rejected for x ≤ 7; the critical region x > 7 leads to rejection of H0 in favour of p > 0.2.]
Type I and Type II errors
The decision procedure described above could lead to either of two wrong conclusions. We may reject H0 when in fact H0 is true, that is, when the percentage of drivers running red lights has not increased. This may occur because we happen to select a group of drivers who behave in this way. Alternatively, we may not reject H0 when in fact H0 is false, that is, when running red lights is getting worse.
A Type I error occurs when a true H0 is rejected; a Type II error occurs when a false H0 is not rejected. It is obvious that we cannot completely avoid making these errors. Our goal is to keep the probability of making these errors relatively small. In testing any statistical hypothesis, there are four possible outcomes that determine whether our decision is correct or in error. These four possibilities are listed in the following table.

                      H0 is true          H0 is false
Do not reject H0      Correct decision    Type II error
Reject H0             Type I error        Correct decision
Table 5.1
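As a rough numerical illustration of the red-light example, the probability of a Type I error under the rule "reject H0 if more than 7 of the 20 sampled drivers admit to running red lights" can be evaluated directly from the binomial distribution B(20, 0.2). The sketch below is illustrative only; the function name binomial_upper_tail is a hypothetical helper, not something defined in the text.

```python
from math import comb

def binomial_upper_tail(n, p, c):
    """Return P(X > c) for X ~ B(n, p) (hypothetical helper name)."""
    # Sum the binomial probabilities for x = 0, 1, ..., c, then take the complement.
    lower = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(c + 1))
    return 1 - lower

# Type I error probability: H0 (p = 0.2) is true but X > 7 is observed.
alpha = binomial_upper_tail(20, 0.2, 7)
print(round(alpha, 4))
```

Moving the critical value c changes this probability, so the same helper can be used to see how a stricter or looser decision rule affects the chance of rejecting a true H0.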

The probability of making a Type I error is called the significance level and is denoted by the Greek letter α. In our example, a Type I error will occur when more than 7 drivers in the sample run red lights although p = 0.2, that is, when an unusual sample happens to be taken. Hence, if X is the number of drivers who run red lights,
P(Type I error) = P(X > 7 when p = 0.2) = $\sum_{x=8}^{20} \binom{20}{x} (0.2)^x (1 - 0.2)^{20-x}$ = $1 - \sum_{x=0}^{7} \binom{20}{x} (0.2)^x (1 - 0.2)^{20-x}$ = 1 − 0.9679 = 0.0321.
We say that the null hypothesis is being tested at a significance level of 3.21%.
The probability of making a Type II error is denoted by the Greek letter β. It is impossible to calculate this probability unless a specific value is stated in the alternative hypothesis. We shall not discuss the determination of β. It can be shown that, for a fixed sample size, a decrease in the probability of one error will usually result in an increase in the probability of the other error. However, we can reduce both types of errors by increasing the sample size.
Example 3
A food company produces 250 g boxes of corn flakes. It periodically conducts a statistical test to decide whether the mean net mass of all boxes is 250 g. The null and alternative hypotheses are stated below. H0 : µ = 250, H1 : µ ≠ 250. The results of carrying out the hypothesis test lead to no rejection of the null hypothesis. Comment on the conclusion, stating the type of error made or whether it is a right decision, if (a) µ is in fact 250 g, (b) µ is in fact not 250 g.
Solution:
(a) If in fact µ = 250 g, the null hypothesis is true. Thus, by not rejecting the null hypothesis, we have made a right decision. This means that the packaging machine is functioning properly.
(b) If in fact µ ≠ 250 g, the null hypothesis is false. Thus, by not rejecting the null hypothesis, we have committed a Type II error.
This means that the packaging machine is out of control even though the inspected output sample indicates that it is performing satisfactorily.
One-tailed and two-tailed tests
Consider the null hypothesis that the mean weight of newborn babies in a certain city is 3 kg. We test H0 : µ = 3 against H1 : µ ≠ 3 or H1 : µ < 3 or H1 : µ > 3. Only one of these alternative hypotheses can be used at a time. We examine each case in turn.
Two-tailed test (H1 : µ ≠ 3)
A random sample of n = 64 newborn babies is taken. Assume that the standard deviation of the population is 1.60 kg. From the central limit theorem, we know that the sampling distribution of X̄ is approximately normal with standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.6}{8} = 0.2$.

A sample mean that falls close to the hypothesised value of 3 kg would be considered evidence in favour of H0. Conversely, a sample mean that is significantly less than or more than 3 kg would be evidence favouring H1. A critical region, indicated by the shaded areas in Figure 5.2, is arbitrarily chosen to be X̄ < 2.7 and X̄ > 3.3. If the sample mean X̄ falls inside the critical region, H0 is rejected; otherwise H0 is not rejected.
[Figure 5.2: Normal curve of x̄ centred at µ = 3.0, with the nonrejection region between 2.7 and 3.3 and a critical region of area α/2 shaded in each tail.]
The significance level of the test is equal to the total of the areas shaded in the two tails of the normal distribution. We have α = P(X̄ < 2.7) + P(X̄ > 3.3). The z values corresponding to x̄1 = 2.7 and x̄2 = 3.3 are
$z_1 = \frac{2.7 - 3.0}{0.2} = -1.5$, $z_2 = \frac{3.3 - 3.0}{0.2} = 1.5$.
Hence, α = P(Z < −1.5) + P(Z > 1.5) = 2P(Z < −1.5) = 0.1336. That is to say, 13.36% of all samples of size 64 would lead to the rejection of H0 when it is true.
One-tailed test (H1 : µ < 3 or µ > 3)
The critical region for the alternative hypothesis H1 : µ < 3 lies entirely in the left tail of the normal distribution, while that for the alternative hypothesis H1 : µ > 3 lies entirely in the right tail, as shown in Figure 5.3.
[Figure 5.3(a): Normal curve of x̄ centred at µ = 3.0 with the critical region in the left tail. Figure 5.3(b): Normal curve of x̄ centred at µ = 3.0 with the critical region in the right tail.]
In testing a hypothesis about a continuous population, it is common to choose the value of α to be 1%, 5% or 10%.
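The two-tailed significance level worked out above for the critical region X̄ < 2.7 or X̄ > 3.3 can be reproduced numerically. The following is a minimal sketch using Python's standard-library statistics.NormalDist; the variable names are illustrative assumptions, while the values µ = 3, σ = 1.6 and n = 64 are taken from the text.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 3.0, 1.6, 64          # hypothesised mean, population s.d., sample size
standard_error = sigma / sqrt(n)      # standard deviation of the sample mean
xbar_dist = NormalDist(mu0, standard_error)  # sampling distribution of the mean under H0

lower_tail = xbar_dist.cdf(2.7)       # P(sample mean < 2.7)
upper_tail = 1 - xbar_dist.cdf(3.3)   # P(sample mean > 3.3)
alpha = lower_tail + upper_tail       # two-tailed significance level
print(round(alpha, 4))
```

For a one-tailed alternative with the same critical value, only one of the two tail areas would contribute, so the significance level would be roughly half of the two-tailed value.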

The choice of a one-tailed or a two-tailed alternative hypothesis depends on the conclusion to be drawn if H0 is rejected. The position of the critical region can only be finalised once H1 has been stated. For example, a manufacturer of electric kettles conducts a test on the incoming heating elements from a supplier. He needs some criterion for deciding whether the average number of damaged elements has increased, because he is concerned about incurring an unnecessary financial loss. He sets up the hypothesis that there is no change in the damage rate and tests this against the alternative hypothesis that the new shipment received has a higher defective rate. Such an alternative hypothesis will result in a one-tailed test with the critical region located in the right tail.
Example 4
A consumer group suspects that a local store's 100 g packages of dried mangoes actually weigh less than 100 g. The group takes a random sample of 50 such packages and finds that the mean mass for the sample is 99 g. (a) State the null and alternative hypotheses for the hypothesis test. (b) Determine whether the hypothesis test is a two-tailed test or a one-tailed test, and the location of the critical region.
Solution:
(a) H0 : µ = 100 g, H1 : µ < 100 g.
(b) The group is concerned only with whether the mass of the packages is less than 100 g, and thus tests against the alternative hypothesis that the mean mass is lower. Such an alternative hypothesis will result in a one-tailed test with the critical region falling in the extreme left tail of the distribution.
Exercise 5.1
1. A defendant who has been indicted for committing a crime stands trial in a court. Based on the evidence, the judge will decide whether the defendant is innocent or guilty. State the null and alternative hypotheses for the above court case.
2. A new drug is developed by a pharmaceutical company.
It is the responsibility of the government to judge the safety and effectiveness of this drug before allowing it to be sold to the public. State the null and alternative hypotheses for this case.
3. A multiple-choice test consisting of 20 questions, each with five possible answers of which one is correct, is given to a student. The purpose of this test is to determine the student's familiarity with the subject. A score of 8 or more correct answers will convince the examiner that the student has knowledge of the subject being tested and is not simply guessing. Determine the null and alternative hypotheses for setting up the hypothesis test.
4. A manufacturer of sports equipment has developed a new synthetic fishing line claimed to have a mean breaking strength of 8 kg. A random sample of 30 lines is tested and found to have a mean breaking strength of 7.9 kg. Determine the null and alternative hypotheses for setting up the hypothesis test.
5. The recommended adequate intake of calcium for adults is 1000 mg per day. We want to carry out a hypothesis test of whether the average adult on a diet gets a daily intake of less than 1000 mg of calcium. State the null and alternative hypotheses.

6. There are two wrong conclusions which you could draw in a hypothesis test: Type I and Type II errors. Identify the Greek letter used to denote the probability of each type of error.
7. The null hypothesis H0 : p = 0.35 is tested against the alternative hypothesis H1 : p > 0.35, where p is a population proportion. (a) Suppose that the decision procedure leads to nonrejection of the null hypothesis when in fact it is false. What type of error is committed? (b) Suppose that the decision procedure leads to the rejection of the null hypothesis when in fact it is true. What type of error is committed?
8. Consider the null hypothesis, H0 : A new teaching technique and the conventional classroom procedure are equally effective, versus the alternative hypothesis, H1 : A new teaching technique is either inferior or superior to the conventional procedure. Describe the decisions that would result in Type I and Type II errors if H0 is tested.
9. For each of the following statements, what happens to the likelihood that we reject the null hypothesis? (a) The closer the value of a sample mean is to the value stated in the null hypothesis. (b) The further the value of a sample mean is from the value stated in the null hypothesis.
10. Suppose someone reads a news report stating that children in his country watch, on average, 5 hours of television per week. In order to test this claim, he conducts a study on a random group of 30 children and finds that the children in the group spend, on average, 4.6 hours watching television per week. (a) State the null and alternative hypotheses. (b) If the decision "reject the null hypothesis" is adopted, what decision error could be committed? (c) If the decision "fail to reject the null hypothesis" is adopted, what decision error could be made? (d) State whether the test is a one-tailed test or a two-tailed test.
11. The normal curve for testing a null hypothesis H0 : µ = 50 is shown below.
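[Figure: Normal curve of x̄ centred at µ = 50, with an area of 0.025 shaded in each tail, below 25 and above 75.]
Determine the (a) rejection region, (b) nonrejection region, (c) critical values, (d) significance level.
12. The normal curve for testing a null hypothesis H0 : µ = 32 is shown below.
[Figure: Normal curve of x̄ centred at µ = 32, with an area of 0.01 shaded in the left tail, below 31.]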

Determine the (a) rejection region, (b) nonrejection region, (c) critical value, (d) significance level.
13. It is given that H0 : μ = 70 is tested against H1 : μ > 70. State whether the test is a one-tailed test or a two-tailed test. Assuming that the sample mean is normally distributed, sketch a normal curve and indicate the rejection and nonrejection regions with an arbitrarily chosen critical value.
14. A hypothesis test is stated as H0 : μ = 15 versus H1 : μ ≠ 15. State whether the test is a one-tailed test or a two-tailed test. Assuming that the sample mean is normally distributed, sketch a normal curve and indicate the rejection and nonrejection regions with arbitrarily chosen critical values.
15. The pass rate of the driving test in a country is reported to be 0.65. To test whether this claim is true, a researcher selects a random sample of 20 driving test candidates. If the number of candidates passing the driving test in the sample is anywhere from 9 to 17, the researcher decides not to reject the null hypothesis that p = 0.65; otherwise, he concludes that p ≠ 0.65. Use the binomial distribution to determine the significance level of the test.
16. A record in a country reveals that the population mean life span of its people last year was 68 years with a standard deviation of 7.7 years. A random sample of 50 recorded deaths in that country during this year shows an average life span of 70.1 years. If the critical region is x̄ > 70.1, find the corresponding value of the test statistic.
5.2 Testing Population Mean
Evidence concerning the value of a population mean is provided by the sample mean. In the case where the population variance is known, and a small sample is taken from a normal population or a large sample is taken from any population, the normal distribution is used to test a hypothesis about a population mean.
In the case where the population variance is unknown and a large sample is taken from any population, the hypothesis test about a population mean can be carried out using a normal distribution as an approximation.
Population mean, variance known
We recall that the sampling distribution of the (sample) mean is normal for samples (of any size) drawn from a normal population, and is approximately normal for large samples drawn from any population. The sampling distribution has mean $\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$, where μ and σ² are the mean and variance of the population from which we pick random samples of size n. Suppose the population has unknown mean μ and known variance σ². Consider the hypotheses H0 : μ = μ0, H1 : μ ≠ μ0. Under the critical value approach, the significance level α is predetermined. The value of α corresponds to the total area of the critical region. Under the null hypothesis, μ = μ0, and the test statistic $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ has a standard normal distribution, N(0, 1). Thus, for a given α, the critical values of the random variable Z are $-z_{\alpha/2}$ and $z_{\alpha/2}$. We have the probability

$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$
This expression can be used to indicate a nonrejection region for the null hypothesis H0. Hence, if $-z_{\alpha/2} < z < z_{\alpha/2}$, we do not reject H0. On the other hand, if the calculated value of the test statistic falls in the critical region, that is, $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$, H0 is rejected. For a fixed significance level α, the critical regions and critical values are as shown in Figure 5.4.
[Figure 5.4: Standard normal curve φ(z) with a central nonrejection region of area 1 − α between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and a critical region of area α/2 in each tail.]
The following two examples illustrate how hypothesis tests are performed for the case in which the population variance is known.
Example 5
A survey made by the Human Resource Ministry states that the average monthly salary of an executive is RM4100 with a standard deviation of RM680. However, a sample of 25 executives selected recently gives an average monthly salary of RM3850. Assuming that the monthly salary of an executive is normally distributed, test, at the 1% significance level, whether the ministry's claim is too high.
Solution:
Let μ be the mean monthly salary of an executive claimed by the Human Resource Ministry and x̄ be the corresponding sample mean. Given information: μ = RM4100, σ = RM680, n = 25, x̄ = RM3850. We are going to test whether the ministry's claim of monthly salary is too high. The significance level α is 0.01. We carry out a hypothesis test using the following five steps.
Step 1: State the null hypothesis and the alternative hypothesis. H0 : μ = RM4100, H1 : μ < RM4100.
Step 2: Specify the significance level. α = 0.01.
Step 3: Select an appropriate probability distribution and determine the critical region. The population standard deviation σ is known, and the sample size is small but the population distribution is normal. Hence, the sampling distribution of X̄ is normal with mean µ and standard deviation $\frac{\sigma}{\sqrt{n}}$. We will use the normal distribution to perform the test.
