REVISION SERIES: STATISTICS

4. A sample of the monthly amounts invested by employees in the Harith Company's profit-sharing plan was organized into a frequency distribution table for further study.

| Amount Invested (RM) | Number of Employees |
|---|---|
| 30 - 34 | 3 |
| 35 - 39 | 7 |
| 40 - 44 | 11 |
| 45 - 49 | 22 |
| 50 - 54 | 40 |
| 55 - 59 | 24 |
| 60 - 64 | 9 |
| 65 - 69 | 4 |

Calculate the variance and standard deviation.

5. Based on the answer in (4), calculate Pearson's Coefficient of Skewness 1 (PCS 1) when mean = 29.9 and mode = 33.83, and interpret the result.

6. Calculate the mean deviation from the mean of the given data (given mean = 64):

45 55 63 76 67 84 75 48 62 65

7. The estimated life spans (burning hours) of 100 Brand A lightbulbs are stated below.

| Life span of bulbs (in hours) | Brand A |
|---|---|
| 0 – 50 | 15 |
| 50 – 100 | 20 |
| 100 – 150 | 18 |
| 150 – 200 | 25 |
| 200 – 250 | 22 |
| Total | 100 |

Based on the table above, find the mean and standard deviation.
8. Hani Zulaikha has just launched her new scarf called KekNi. She is interested in determining the number of customers who visit her shop to buy the KekNi scarf within one month.

| Day | Number of customers |
|---|---|
| 1 – 5 | 11 |
| 6 – 10 | 18 |
| 11 – 15 | 32 |
| 16 – 20 | 24 |
| 21 – 25 | 20 |
| 26 – 30 | 15 |

Pearson's Coefficient of Skewness is used to measure the skewness of the distribution. You are required to:
i. Calculate the variance and standard deviation of the number of customers within one month.
ii. Determine the type of distribution by computing Pearson's Coefficient of Skewness 2 (PCS 2) and draw the skewness graph.
iii. Nilofar has launched her new scarf called Nurlufa, with a mean of 27.36 and a standard deviation of 10.58 within one month. Ascertain which scarf founder is more consistent in producing a new collection.
CHAPTER 5
CORRELATION AND REGRESSION

5.1 Concept of Correlation
5.2 Scatter Diagram
5.3 Linear Coefficient of Correlation
5.4 Concept of Regression
5.1 Concept of correlation

• Correlation is the degree of relationship between one variable (the dependent variable) and one or more related variables (the independent variables).
• It helps to determine how well a linear or other equation explains the relationship between the variables.
• Correlation analysis measures the degree (linear strength) of the relationship between the two variables studied.
• The correlation is very high when the value of the correlation coefficient (r) is between 0.90 and 1.00.
5.2 Construct scatter diagram

A scatter diagram forms certain patterns, either increasing or decreasing, that indicate a relationship between two variables.

Strength of correlation

Positive correlation:
• As x increases, y also increases.

Negative correlation:
• As x decreases, y increases, and vice versa.

No correlation:
• No relationship between the two variables.
• The points do not show any pattern or are randomly spread.

Perfect positive correlation:
• x and y move in the same direction in a fixed proportion. It is indicated by r = +1.

Perfect negative correlation:
• x and y move in opposite directions in a fixed proportion. It is indicated by r = -1.

[Scatter diagrams: a plot of Y against X accompanies each of the five cases above.]
5.3 Calculate linear coefficient of correlation

5.3.1 Pearson's product moment correlation coefficient

• Used to determine the correlation for quantitative data.
• The value of the correlation lies between -1.0 and 1.0.
• Positive relationship: an increase in one variable is accompanied by an increase in the other variable, and vice versa.
• Negative relationship: an increase in one variable is accompanied by a decrease in the other variable, and vice versa.
• Zero or close to zero: no linear relationship between the two variables.

Formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

Where
r = correlation coefficient
n = number of observations
∑xy = sum of the products of x and y
∑x² = sum of the squares of the values of variable x
(∑x)² = square of the sum of all the values of variable x
(∑y² and (∑y)² are defined in the same way for variable y)

Value and strength of correlation:

| Strength of correlation | Value of correlation coefficient, r |
|---|---|
| Very strong positive correlation | 0.8 to 1.0 |
| Strong positive correlation | 0.6 to 0.79 |
| Moderate positive correlation | 0.4 to 0.59 |
| Weak positive correlation | 0.1 to 0.39 |
| Zero correlation | 0.0 |
| Weak negative correlation | -0.1 to -0.39 |
| Moderate negative correlation | -0.4 to -0.59 |
| Strong negative correlation | -0.6 to -0.79 |
| Very strong negative correlation | -0.8 to -1.0 |

Source: Faizah, Lau, Phang & Zainuddin, 2019
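The Pearson formula can be sketched in a few lines of Python. This is only an illustrative check, not part of the original text: the function names `pearson_r` and `strength` are hypothetical, the data are those of Example 5.1 later in this chapter, and the handling of values that fall between the table's bands (for example 0.795) is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def strength(r):
    """Label r using the lower bound of each band in the strength table.
    Assumption: values between two bands take the lower band's label."""
    size = abs(r)
    if size >= 0.8:
        label = "very strong"
    elif size >= 0.6:
        label = "strong"
    elif size >= 0.4:
        label = "moderate"
    elif size >= 0.1:
        label = "weak"
    else:
        return "zero or negligible correlation"
    sign = "positive" if r > 0 else "negative"
    return f"{label} {sign} correlation"

# Data of Example 5.1: tuition sessions (x) and Mathematics marks (y)
x = [1, 3, 4, 7, 6, 9, 5, 4, 5, 9]
y = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]

r = pearson_r(x, y)
print(round(r, 3), strength(r))
```

Working from the raw sums mirrors the hand calculation, so each intermediate quantity (∑x, ∑xy, and so on) can be compared with the working table.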
5.3.2 Spearman's rank correlation coefficient

• Used for qualitative (ranked) data.
• Measures the association between two variables that are at least of ordinal scale.

Formula:

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} $$

Steps:
• List all the n subjects (or observations).
• Rank the x and y variables.
• Arrange the data in ascending order.
• Determine the value of d_i (the difference between the two ranks) for each subject.
• Square each d_i to obtain d_i².
• Sum all the values of d_i².

5.4 Show concept of regression

• Regression analysis is a statistical technique for determining the equation that relates two variables.
• The least squares method can be used to determine the relationship between two variables.
• The linear regression equation can be written in the form y = a + bx. In the equation, x is the independent variable, y is the dependent variable, and a and b are two constants.
• A regression line with a positive slope shows that the two variables have a direct relationship: if x increases, y increases as well, and vice versa.
• The equation can be used to make forecasts.
Formula:

$$ b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} $$

$$ a = \frac{\sum y}{n} - b\left[\frac{\sum x}{n}\right] = \bar{y} - b\bar{x} $$

$$ y = a + bx $$

Example 5.1:
The following data show the number of tuition sessions attended by ten students and their Mathematics marks.

| No. of tuition sessions (x) | 1 | 3 | 4 | 7 | 6 | 9 | 5 | 4 | 5 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks (y) | 30 | 32 | 35 | 40 | 48 | 50 | 52 | 55 | 57 | 61 |

Based on the above data:
i. Construct a scatter diagram and describe the relationship between the two variables.
ii. Calculate the Pearson coefficient of correlation and interpret the result.
iii. Create the regression function.

Solution:
i. Scatter diagram:

[Scatter diagram: "The number of tuition sessions attended by a student and its Mathematics marks", y (0 to 80) against x (0 to 10)]

There is a positive relationship between x and y.
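ii. Pearson coefficient of correlation:

| x | y | xy | x² | y² |
|---|---|---|---|---|
| 1 | 30 | 30 | 1 | 900 |
| 3 | 32 | 96 | 9 | 1024 |
| 4 | 35 | 140 | 16 | 1225 |
| 7 | 40 | 280 | 49 | 1600 |
| 6 | 48 | 288 | 36 | 2304 |
| 9 | 50 | 450 | 81 | 2500 |
| 5 | 52 | 260 | 25 | 2704 |
| 4 | 55 | 220 | 16 | 3025 |
| 5 | 57 | 285 | 25 | 3249 |
| 9 | 61 | 549 | 81 | 3721 |
| ∑x = 53 | ∑y = 460 | ∑xy = 2598 | ∑x² = 339 | ∑y² = 22252 |

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

$$ = \frac{10(2598) - 53(460)}{\sqrt{[10(339) - (53)^2][10(22252) - (460)^2]}} $$

$$ = \frac{25{,}980 - 24{,}380}{\sqrt{[3390 - 2809][222520 - 211600]}} $$

$$ = \frac{1{,}600}{\sqrt{[581][10920]}} = \frac{1{,}600}{\sqrt{6{,}344{,}520}} = 0.635 $$

(Strong positive correlation)

iii. Regression using the least squares method:

$$ b = \frac{\sum xy - \dfrac{(\sum x)(\sum y)}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{2598 - \dfrac{(53)(460)}{10}}{339 - \dfrac{(53)^2}{10}} = \frac{2598 - 2438}{339 - 280.9} = \frac{160}{58.1} = 2.754 $$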
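$$ a = \frac{\sum y}{n} - b\left[\frac{\sum x}{n}\right] = \frac{460}{10} - 2.754\left[\frac{53}{10}\right] = 46 - 2.754[5.3] = 31.404 $$

$$ y = a + bx $$

$$ y = 31.404 + 2.754x $$

Example 5.2:
The following table shows the scores for Management and Economics in a test.

| Student | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Management | 40 | 60 | 90 | 75 | 80 | 80 | 65 |
| Economics | 70 | 75 | 62 | 60 | 80 | 90 | 55 |

Based on the above data, calculate Spearman's rank correlation coefficient.

Solution:

| Student | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Management | 40 | 60 | 90 | 75 | 80 | 80 | 65 |
| Rank, Px | 7 | 6 | 1 | 4 | 2.5 | 2.5 | 5 |
| Economics | 70 | 75 | 62 | 60 | 80 | 90 | 55 |
| Rank, Py | 4 | 3 | 5 | 6 | 2 | 1 | 7 |
| di | 3 | 3 | -4 | -2 | 0.5 | 1.5 | -2 |
| di² | 9 | 9 | 16 | 4 | 0.25 | 2.25 | 4 |

∑di² = 44.5

$$ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(44.5)}{7(7^2 - 1)} = 1 - \frac{267}{7(48)} = 1 - \frac{267}{336} \approx 0.21 $$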
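The least squares and Spearman calculations in Examples 5.1 and 5.2 can be checked with a short Python sketch. The function names (`least_squares`, `average_ranks`, `spearman_rs`) are hypothetical helpers introduced here. The ranking convention, where rank 1 is the highest score and tied scores share the average of their positions, is inferred from the ranks shown in Example 5.2.

```python
def least_squares(x, y):
    """Return (a, b) for the regression line y = a + bx."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * (sum_x / n)
    return a, b

def average_ranks(values):
    """Rank 1 = largest value; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first_position = ordered.index(v) + 1   # first rank held by v
        ties = ordered.count(v)                 # how many values equal v
        ranks.append(first_position + (ties - 1) / 2)
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Example 5.1: tuition sessions (x) and marks (y)
tuition = [1, 3, 4, 7, 6, 9, 5, 4, 5, 9]
marks = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
a, b = least_squares(tuition, marks)
print(f"y = {a:.3f} + {b:.3f}x")

# Example 5.2: Management and Economics scores
management = [40, 60, 90, 75, 80, 80, 65]
economics = [70, 75, 62, 60, 80, 90, 55]
print(round(spearman_rs(management, economics), 3))
```

The code carries b at full precision rather than rounding it to 2.754 before computing a. For this data both approaches give the same value of a to three decimal places.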
REVISION EXERCISE 5

1. Briefly describe the concept of correlation.

2. The following table shows the number of units produced and the number of defects for product A in a month.

| No. of units ('00) (x) | 30 | 32 | 35 | 40 | 48 | 50 | 52 | 55 | 57 | 61 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of defects (y) | 1 | 0 | 2 | 5 | 2 | 4 | 6 | 5 | 7 | 8 |

Based on the above data:
i. Construct a scatter diagram.
ii. Based on (i), how would you describe the relationship between the two variables?
iii. Calculate the Pearson coefficient of correlation and interpret it.
iv. Calculate the regression function.

3. Ten students sat for the Statistics and Business Mathematics examinations. The results are shown in the table below.

| Student | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Statistics | 55 | 57 | 89 | 78 | 92 | 70 | 65 | 69 | 80 | 49 |
| Mathematics | 60 | 70 | 70 | 60 | 81 | 90 | 59 | 80 | 75 | 55 |

Based on the above table:
i. Calculate Pearson's product moment correlation coefficient.
ii. Draw a scatter diagram for the above data and explain the type of relationship between the variables.

4. The following table shows the number of items produced per month, x, and the production cost per month, y, at MJ Factory. Estimate the linear regression equation y = a + bx using the method of least squares.

| No. of items, x ('00) | 26 | 44 | 53 | 29 | 77 | 80 | 20 | 40 | 67 | 86 | 17 | 61 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Production cost, y (RM'000) | 42 | 60 | 69 | 47 | 91 | 98 | 39 | 55 | 85 | 104 | 37 | 77 |
5. The data below show the number of workers and the number of units produced for Item Y at nine companies for a month.

| Company | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| No. of workers | 50 | 60 | 55 | 60 | 80 | 40 | 83 | 42 | 70 |
| No. of units ('000) | 30 | 63 | 40 | 50 | 60 | 30 | 70 | 50 | 60 |

Based on the above data:
i. Construct a scatter diagram.
ii. Describe the relationship between the two variables.
iii. Calculate the Pearson coefficient of correlation and interpret the result.
iv. Formulate the regression equation.

6. The following data show the interest rate and the number of car sales.

| Interest rate % (x) | 2.5 | 2.6 | 2.7 | 2.8 | 2.9 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 |
|---|---|---|---|---|---|---|---|---|---|---|
| Car sales (y) | 320 | 320 | 350 | 400 | 480 | 500 | 520 | 550 | 570 | 610 |

Based on the above data:
i. Construct a scatter diagram and describe the relationship between the two variables.
ii. Calculate the Pearson coefficient of correlation and interpret the result.
iii. Create the regression function.

7. The table below shows the number of customers and the commission gained by seven agents at MM Insurance.

| Agents | AA | BB | CC | DD | EE | FF | GG |
|---|---|---|---|---|---|---|---|
| No. of customers, x | 10 | 7 | 10 | 13 | 11 | 14 | 9 |
| Commission, y (RM'00) | 12 | 10 | 14 | 16 | 15 | 19 | 11 |

Based on the above data:
i. Draw a scatter diagram.
ii. Calculate Pearson's product moment correlation coefficient between the number of customers and the commission gained. Comment on your result.
iii. Calculate the regression equation using the least squares method.
FOCUS ON FINAL EXAM 5

1. The following table shows the marks obtained by 6 students (1, 2, 3, 4, 5 and 6) in the Statistics and Organizational Behaviour subjects in an examination.

| Student | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Statistics | 66 | 91 | 83 | 70 | 66 | 54 |
| Organizational Behaviour | 80 | 100 | 78 | 68 | 80 | 60 |

Based on the above data, simplify the Spearman's rank correlation coefficient.

2. The following table shows the marks obtained by 6 students (A, B, C, D, E, F) in the Entrepreneurship and Marketing subjects in an examination.

| Student | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Entrepreneurship | 65 | 90 | 80 | 73 | 77 | 54 |
| Marketing | 80 | 100 | 77 | 78 | 81 | 89 |

Based on the above data, simplify the Spearman's rank correlation coefficient.

3. Below is the expenditure incurred by Rizz & Man Holding on Research and Development (R&D). Also shown is the total profit earned over 6 consecutive years.

| Year | R&D Expenditure (RM Million) | Total Profit (RM Million) |
|---|---|---|
| 2012 | 2 | 20 |
| 2013 | 3 | 25 |
| 2014 | 5 | 34 |
| 2015 | 4 | 30 |
| 2016 | 11 | 40 |
| 2017 | 5 | 31 |

Express the linear regression equation for the data above using the least squares method and interpret the relationship between R&D expenditure and total profit based on your answer.
4. The table below shows the interest rates for car loans and the number of customers who applied for a loan in a month from a finance company.

| Interest rate (%) | 6.0 | 6.2 | 6.5 | 6.8 | 7.0 | 7.2 | 7.5 | 7.8 | 8.0 | 8.2 | 8.4 | 8.7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No. of applicants | 80 | 80 | 78 | 75 | 70 | 60 | 60 | 55 | 50 | 48 | 45 | 40 |

Draw the scatter diagram for the above data.

5. A production manager collected the data below on production cost and the quantity produced for 10 consecutive days.

| Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Quantity ('000 units) | 10 | 13 | 20 | 18 | 17 | 15 | 16 | 14 | 11 | 12 |
| Cost (RM'000) | 20 | 28 | 38 | 35 | 33 | 30 | 34 | 29 | 23 | 25 |

By using the least squares method, calculate the regression equation for cost.

6. A store manager wishes to find out whether there is a relationship between the age of his employees and the number of sick days they take each year. The data for the sample are shown below.

| Age, x | 18 | 26 | 39 | 48 | 53 | 58 |
|---|---|---|---|---|---|---|
| Days, y | 16 | 12 | 9 | 5 | 6 | 2 |

Express the relationship between these variables by using Pearson's product-moment coefficient of correlation and interpret the result.
CHAPTER 6
PROBABILITY CONCEPT

6.1 SAMPLE SPACES AND PROBABILITY
6.2 ADDITIONAL RULES FOR PROBABILITY
6.3 MULTIPLICATION RULES FOR PROBABILITY
6.1 Sample space and probability

• Probability: The likelihood that an event will occur.
• Experiment: A process that, when performed, results in one and only one of many observations.
• Outcome: One of the possible results of a random experiment.
• Sample space: The set of all possible outcomes of a random experiment.
• Event: A set of outcomes of a random experiment. It can consist of one outcome or more than one outcome.
Example 6.1:

| Experiment | Outcomes | Sample space |
|---|---|---|
| Toss a coin | Head, Tail | S = {Head, Tail} |
| Roll a dice | 1, 2, 3, 4, 5, 6 | S = {1, 2, 3, 4, 5, 6} |
| Toss a coin twice | HH, HT, TH, TT | S = {HH, HT, TH, TT} |
| Netball competition | Win, Lose | S = {Win, Lose} |
| Taking a quiz | Pass, Fail | S = {Pass, Fail} |
| Choose a lecturer | Male, Female | S = {Male, Female} |

6.2 Probability of events using:

6.2.1 Tree diagram

• Provides a visual representation of an experiment that involves a series of activities.
• Ensures that all logical possibilities are considered.
• Using a tree diagram, the possible outcomes can be displayed in a clearer way.
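The branches of a tree diagram can also be enumerated programmatically. The sketch below is illustrative only and assumes a box with 10 red and 8 blue balls and two draws with replacement, so the branch probabilities stay the same on both draws. The variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed setting: 10 red and 8 blue balls, two draws WITH replacement
counts = {"Red": 10, "Blue": 8}
total = sum(counts.values())

# Probability of each branch on a single draw
branch = {colour: Fraction(n, total) for colour, n in counts.items()}

# Each path through the tree is a pair (first draw, second draw);
# its probability is the product of the branch probabilities.
tree = {
    (first, second): branch[first] * branch[second]
    for first, second in product(branch, repeat=2)
}

for outcome, probability in tree.items():
    print(outcome, probability)

# The probabilities of all final outcomes must add up to 1
print("Total:", sum(tree.values()))
```

Using `Fraction` keeps the results as exact fractions, which makes them easy to compare with values worked out by hand.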
Example 6.2:
A box contains 10 red balls and 8 blue balls. Two balls are picked from the box, one after the other, with replacement. Based on the information given, construct the tree diagram.

Solution:

[Tree diagram: the first draw is Red (10/18) or Blue (8/18); from each branch, the second draw is Red (10/18) or Blue (8/18).]

| 1st draw | 2nd draw | Outcome | Probability |
|---|---|---|---|
| Red | Red | Red, Red | 10/18 × 10/18 = 25/81 |
| Red | Blue | Red, Blue | 10/18 × 8/18 = 20/81 |
| Blue | Red | Blue, Red | 8/18 × 10/18 = 20/81 |
| Blue | Blue | Blue, Blue | 8/18 × 8/18 = 16/81 |

6.2.2 Venn Diagram

• It is an illustration that uses overlapping or separate circles to show the logical relationships between two or more sets of items.
• It helps to visualize the similarities and differences between two or more sets.
Example 6.3:
An experiment has eight equally likely outcomes: 32, 40, 64, 68, 71, 79, 89, 97.
Event Y = {40, 68, 79, 89}
Event Z = {68, 71, 89, 97}
i. Represent the events using a Venn diagram.
ii. Calculate the probabilities below:
a. P(Y)
b. P(Z)
c. P(Y ∩ Z)

Solution:
i. Draw a Venn diagram:

[Venn diagram in sample space S: Y only = 40, 79; Y ∩ Z = 68, 89; Z only = 71, 97; outside both circles = 32, 64]

ii. Calculate the probability:
a. Y = {40, 68, 79, 89}
P(Y) = 4/8 = 1/2
b. Z = {68, 71, 89, 97}
P(Z) = 4/8 = 1/2
c. Y ∩ Z = {68, 89}
P(Y ∩ Z) = 2/8 = 1/4
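The regions of the Venn diagram in Example 6.3 correspond directly to set operations, so the answers can be checked with Python sets. This is a minimal sketch; the helper name `prob` is hypothetical and it relies on the outcomes being equally likely, as the example states.

```python
from fractions import Fraction

S = {32, 40, 64, 68, 71, 79, 89, 97}   # sample space
Y = {40, 68, 79, 89}
Z = {68, 71, 89, 97}

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

print("Y only:     ", sorted(Y - Z))
print("Y and Z:    ", sorted(Y & Z))
print("Z only:     ", sorted(Z - Y))
print("Outside both:", sorted(S - (Y | Z)))

print("P(Y) =", prob(Y))
print("P(Z) =", prob(Z))
print("P(Y and Z) =", prob(Y & Z))
```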
6.2.3 Two-way table

• Also called a frequency table or contingency table.
• It is another way to display the frequencies of two different categorical variables.
• One category is shown in the rows while the other is shown in the columns.

Example 6.4:
A random sample of two thousand students was asked whether they had ever bought books at MPH Bookstore. A total of 800 students have bought books at MPH Bookstore. Out of 800 female students, 500 have never bought books at MPH Bookstore.
Based on the above information:
i. Construct a two-way table.
ii. Calculate the probability that a student selected at random is male.
iii. Calculate the probability that a student selected at random has never bought books at MPH Bookstore.
iv. Calculate the probability that a selected female student has bought books at MPH Bookstore.

Solution:
i. Two-way table:

| | Have bought | Have never bought | Total |
|---|---|---|---|
| Male | 500 | 700 | 1,200 |
| Female | 300 | 500 | 800 |
| Total | 800 | 1,200 | 2,000 |
ii. Probability that a male student is selected at random:
P(M) = n(M) / n(S) = 1,200 / 2,000 = 0.6

iii. Probability that the selected student has never bought books at MPH Bookstore:
n (male and female students who have never bought books) = 1,200
n (all students) = 2,000
P = 1,200 / 2,000 = 0.6

iv. Probability that a female student has bought books at MPH Bookstore:
n (female students who have bought books) = 300
n (all female students) = 800
P = 300 / 800 = 0.375

6.2.4 Complementary events

• Two complementary events are always mutually exclusive.
• Ā (A complement): the event that consists of all the outcomes of an experiment that are not in A.

Example 6.5:
M&M chocolate candies come in various colours, and the different colours occur in different proportions. The table below shows the probability that a candy drawn at random from a packet of M&M's has a given colour. Calculate the probability for brown.

| Colour | Yellow | Red | Orange | Green | Blue | Brown |
|---|---|---|---|---|---|---|
| Probability | 0.2 | 0.1 | 0.3 | 0.1 | 0.2 | (i) |
Solution:
Brown is the complement of the other five colours taken together, so using the probabilities in the table:
P(Brown) = 1 − 0.2 − 0.1 − 0.3 − 0.1 − 0.2
P(Brown) = 0.1

6.2.5 Classical Formula

• All outcomes in the sample space are equally likely to occur.
• Equally likely events: events with the same probability of occurring.
• P(A) = n(A) / n(S)

Example 6.6:
A dice is rolled once. Calculate the probability that:
i. The outcome is a number less than 5.
ii. A number from 4 to 6 is obtained.

Solution:
i. S = {1, 2, 3, 4, 5, 6}, n(S) = 6
A = number less than 5
A = {1, 2, 3, 4}, n(A) = 4
P(A) = n(A) / n(S) = 4/6 = 2/3

ii. S = {1, 2, 3, 4, 5, 6}, n(S) = 6
A = number from 4 to 6
A = {4, 5, 6}, n(A) = 3
P(A) = n(A) / n(S) = 3/6 = 1/2
6.3 Additional rules for probability

Mutually exclusive events
• Event A and event B cannot occur together.
• Example: Tossing a coin; getting either a head or a tail.
• P(A ∪ B) = P(A) + P(B)
[Venn diagram: two separate circles, P(A) and P(B)]

Non-mutually exclusive events
• Event A and event B can happen at the same time.
• Example: College students are allowed to take either Statistics, Mathematics, or both courses this semester.
• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
[Venn diagram: two overlapping circles, P(A) and P(B), with the overlap P(A and B)]

Example 6.7:
Identify whether the following events are mutually exclusive or non-mutually exclusive. Explain your reason.
1. Select a student in a class: the student is female, and the student is male.
2. Roll a dice: get an even number and get a number less than three.

Solution:
1. Mutually exclusive; a student cannot be both female and male at the same time.
2. Non-mutually exclusive; the number 2 is both an even number and a number less than 3.
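The two addition rules can be illustrated with the dice events of Example 6.7. The sketch below is a hedged illustration with hypothetical helper names (`prob`, `mutually_exclusive`, `p_union`). It applies the general rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which reduces to the mutually exclusive rule when the intersection is empty.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                      # sample space for one roll of a dice
even = {n for n in S if n % 2 == 0}         # {2, 4, 6}
less_than_three = {n for n in S if n < 3}   # {1, 2}
odd = S - even                              # {1, 3, 5}

def prob(event):
    """Classical probability for equally likely outcomes."""
    return Fraction(len(event), len(S))

def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcome."""
    return len(a & b) == 0

def p_union(a, b):
    """Addition rule; the last term is 0 for mutually exclusive events."""
    return prob(a) + prob(b) - prob(a & b)

print(mutually_exclusive(even, less_than_three))   # they share the outcome 2
print(p_union(even, less_than_three))              # 3/6 + 2/6 - 1/6
print(mutually_exclusive(even, odd))
print(p_union(even, odd))                          # 3/6 + 3/6 - 0
```

Counting the union directly, as in `prob(even | less_than_three)`, gives the same value, which is a quick way to confirm that the shared outcome has not been counted twice.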
Example 6.8:
In a kitchen, there are 8 packs of nasi lemak, 5 packs of buns, 2 packs of fried rice, and 1 pack of curry puffs. If a child selects one pack at random, find the probability that it is a pack of nasi lemak or a pack of fried rice.

Solution:
P (Nasi lemak or Fried rice) = P (Nasi lemak) + P (Fried rice)
= 8/16 + 2/16
= 10/16
= 5/8

Example 6.9:
In a tuition centre, there are 8 teachers and 32 students. Five teachers and twenty students are female. If one person is selected randomly, calculate the probability that the person is a teacher or a female.

Solution:
P (teacher or female) = P (teacher) + P (female) − P (female teacher)
= 8/40 + 25/40 − 5/40
= 28/40
= 7/10

6.4 Multiplication rules for Probability

The multiplication rule gives the probability of two or more events that occur consecutively.

Independent events
• The occurrence of one event has no effect on the probability of the occurrence of another event.
• P(A ∩ B) = P(A) × P(B)

Dependent events
• The occurrence of the first event affects the occurrence of the second event; the probability is changed.
• Usually happens in sampling without replacement.
• P(A and B) = P(A) × P(B after A), or
• P(A ∩ B) = P(A) × P(B | A)
• When P(B | A) = P(B), the two events are independent.

Example 6.10:
A box contains 2 green buttons, 4 black buttons and 3 yellow buttons. A button is selected and replaced before the next button is selected. Calculate:
i. The probability of selecting 3 black buttons.
ii. The probability of selecting 1 yellow button and then a green button.
iii. The probability of selecting 2 black buttons and then one yellow button.
Solution:
i. Probability of selecting 3 black buttons
= P (black and black and black)
= P(black) × P(black) × P(black)
= 4/9 × 4/9 × 4/9
= 64/729

ii. Probability of selecting 1 yellow button and then a green button
= P (yellow and green)
= P(yellow) × P(green)
= 3/9 × 2/9
= 6/81
= 2/27

iii. Probability of selecting 2 black buttons and then one yellow button
= P (black and black and yellow)
= P(black) × P(black) × P(yellow)
= 4/9 × 4/9 × 3/9
= 48/729
= 16/243

Example 6.11:
The classification of employees at YSY Enterprise by gender and status is shown as follows:

| | Single (S) | Married (MA) | Total |
|---|---|---|---|
| Male (M) | 8 | 17 | 25 |
| Female (F) | 6 | 13 | 19 |
| Total | 14 | 30 | 44 |

If one of the employees is selected randomly to attend a training, calculate the probability that this employee is a female and single.

Solution:
P (F and S) = P(F) × P(S | F)
P(F) = 19/44
P(S | F) = 6/19
P (F and S) = (19/44)(6/19) = 0.136
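Both multiplication rules can be checked in Python for Examples 6.10 and 6.11. The sketch below is illustrative; the helper `p_sequence` and the variable names are hypothetical. Example 6.10 samples with replacement, so each draw uses the same probabilities, while Example 6.11 uses the conditional form P(F) × P(S | F) read from the two-way table.

```python
from fractions import Fraction
from math import prod

# Example 6.10: draws with replacement, so successive draws are independent
buttons = {"green": 2, "black": 4, "yellow": 3}
total_buttons = sum(buttons.values())

def p_sequence(colours):
    """P(first colour and then second colour and ...) with replacement."""
    return prod(Fraction(buttons[c], total_buttons) for c in colours)

print(p_sequence(["black", "black", "black"]))
print(p_sequence(["yellow", "green"]))
print(p_sequence(["black", "black", "yellow"]))

# Example 6.11: two-way table of employees by gender and status
staff = {("M", "S"): 8, ("M", "MA"): 17, ("F", "S"): 6, ("F", "MA"): 13}
n = sum(staff.values())
n_female = staff[("F", "S")] + staff[("F", "MA")]
n_single = staff[("M", "S")] + staff[("F", "S")]

p_f = Fraction(n_female, n)                        # P(F)
p_s_given_f = Fraction(staff[("F", "S")], n_female)  # P(S | F)
p_f_and_s = p_f * p_s_given_f                      # P(F and S)
print(p_f_and_s, round(float(p_f_and_s), 3))

# Independence check from the text: independent only if P(S | F) = P(S)
p_s = Fraction(n_single, n)
print("Independent:", p_s_given_f == p_s)
```

For this table P(S | F) = 6/19 differs slightly from P(S) = 14/44, so by the stated criterion gender and status are not exactly independent in this sample.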
REVISION EXERCISE 6

1. Two paper clips are picked without replacement from a box of five red and three blue paper clips.
i. Construct a tree diagram to illustrate the probabilities of the events.
ii. Calculate the probability of each of the final outcomes.
iii. Find the probability that both paper clips will be red.
iv. Find the probability that at least one paper clip will be red.

2. For each of the following experiments, identify the sample space:
i. One toss of a coin and one roll of a dice.
ii. The gender of the children if a family has three children.
iii. A bag contains three buttons that are labelled AA, BB and CC. Two buttons are selected at random (without replacement) from this bag.

3. Of the 200 bottles produced by two machines, had defects. Eighty of the total bottles were produced on Machine A and 10 of these 80 are defective. Based on the information given:
i. Construct a tree diagram to show the probabilities of the events.
ii. Calculate the probability of each of the final outcomes.
iii. If a bottle is selected at random, calculate the probability that it is defective.
iv. If a bottle is selected at random, calculate the probability that it is defective or from Machine A.

4. The table below shows the population by ethnic group in Taman Surya, Petra Jaya, Kuching.

| Malay | Iban | Bidayuh | Melanau | Other Bumiputera | Chinese | Others |
|---|---|---|---|---|---|---|
| 6,069 | 7,520 | 2,070 | 1,343 | 1,665 | 5,989 | 1,630 |

If one person is selected randomly, calculate the probability that this person is:
i. Iban
ii. Chinese
FOCUS ON FINAL EXAM 6

1. A bag contains four red balls, six green balls, two white balls and three black balls. Outline the probability that a ball chosen at random from the bag is:
i. A white ball
ii. A red or white ball
iii. Not a black ball

2. A bag contains 4 yellow balls and 7 purple balls. Atikah picks a ball at random from the bag and replaces it in the bag. She mixes the balls in the bag and then picks another ball at random from the bag. Calculate the probability that Atikah picks a yellow ball in her second draw using the diagram.

3. In a class of 19 students, 8 boys are asked if they have lunch at school. 4 girls eat lunch at school, while 5 boys do not eat lunch at school. Based on the above information, you are required to draw a two-way table to illustrate the results and write the following probabilities of:
i. Choosing a boy who eats his lunch at school.
ii. Choosing a girl who does not eat her lunch at school.
iii. Choosing a boy, and he eats lunch.
iv. Choosing a girl, and she does not eat lunch.

4. There are 12 red marbles and 16 blue marbles in a bag. A marble is drawn at random. Visualize the answer for the probability that the marble drawn is red.

5. Adlan's favourite meal is pasta, followed by a cake as dessert. Adlan's mother cooks pasta once a week. If she cooks pasta, the probability that Adlan gets cake is 3/5. If she does not cook pasta, the probability that Adlan gets cake for dessert is 1/4. Draw the tree diagram for the above information.
6. A coin is tossed twice. Construct the tree diagram to find the probability of getting at least one number.

7. In hospital unit A, there are 9 nurses and 6 doctors. 3 nurses and 2 doctors are female. The total number of male staff is 10. Represent the events by using a two-way table.

8. Ratu has a bag containing 6 black balls and 4 yellow balls. He picks a ball from the bag at random, replaces it and randomly picks again.
i. Draw a tree diagram to represent the probabilities.
ii. Calculate the probability that Ratu picks no yellow ball.
iii. Calculate the probability that Ratu picks at least one black ball.

9. 50 people were asked if they have ever been to France or Spain. 18 people have been to France, 23 people have been to Spain and 6 people have been to both. Convert the data to a Venn diagram to represent the information.

10. Zara has opened a new restaurant in Behrang. She wants to know the meal preferences of the customers who eat in her restaurant. From her survey, 45% of the customers ordered salad, of which 60% also ordered juice. On the contrary, 15% of those who did not order salad ordered juice.
i. Draw a tree diagram from the information above to show the probabilities.
ii. Calculate the probability that:
a. A customer orders a salad and juice
b. A customer orders juice
c. A customer orders salad given that they ordered juice
d. A customer did not order juice
CHAPTER 7
ESTIMATION AND HYPOTHESIS

7.1 THE CONCEPT OF ESTIMATION THEORY
• Types of Estimation
• Confidence Interval for Mean when standard deviation is known and unknown

7.2 INITIATE HYPOTHESIS TESTING
• Null hypothesis and Alternative hypothesis
• One-tailed and two-tailed test
• z-Test and t-Test for mean of population
• Steps in hypothesis testing
7.1 Concept of Estimation Theory

• Theory of estimation: The part of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component.
• Estimation: The process of determining parameter values from empirical data.
• Objective of estimation: To determine the approximate value of a population parameter on the basis of a sample statistic, at a specified confidence level.
• The confidence level must be established in advance of the statistical testing because the margin of error, as well as the necessary scope of the testing, depends on it.
• Confidence levels of 90%, 95% or 99% are normally used.
7.2 Types of Estimation

Point estimation
• Is a single value of a statistic.
• Serves as the best estimate of an unknown population parameter.
• Example: The sample mean (x̄) is a point estimate of the population mean μ. The sample proportion p is a point estimate of the population proportion P.

Interval estimation
• Defined by two numbers between which a population parameter is said to lie.
• Example: a < μ < b is an interval estimate of the population mean μ. This means that the population mean is greater than a but less than b.

7.3 Hypothesis Testing

• Statistical hypothesis: An assumption about the value of a population parameter. This assumption may or may not be true.
• Hypothesis testing: Refers to the procedures used by researchers or statisticians to accept or reject statistical hypotheses.
• The best way to determine whether a hypothesis is true or false is to examine the entire population.
7.4 Functions of Hypothesis

• Test the truth or accuracy of a theory.
• Enhance the objectivity and purpose of research work.
• Help a researcher in prioritizing data collection, hence providing focus for the study.

7.5 Characteristics of Hypothesis

• Be simply stated, clear and precise.
• Consistent with most known facts.
• Must explain the facts that give rise to the need for explanation.
• State the relationship between variables.
• Capable of being tested.
7.6 Types of Hypotheses

Null hypothesis (denoted by H0)
• States that a population parameter (such as the mean or standard deviation) is equal to a hypothesized value.
• Means that there is no statistically significant difference between the variables examined.
• An initial claim, based on previous analyses or specialized knowledge, that the researcher sets out to disprove.

Alternative hypothesis (HA)
• Opposite of the null hypothesis.
• States that the population parameter is smaller than, greater than or different from the hypothesized value in the null hypothesis.
• Refers to what the researcher might believe to be true or hope to prove true.

7.7 Steps in Hypothesis Testing

STEP 1: Formulate the Null and Alternative Hypotheses
i. The null hypothesis (H0): A statistical hypothesis claiming that there is no difference between two parameters.
ii. The alternative hypothesis (HA): A statistical hypothesis claiming that there is a difference between two parameters.

Hypothesis-testing common phrases:

| > | < | = | ≠ |
|---|---|---|---|
| Is greater than | Is less than | Is equal to | Is not equal to |
| Is above | Is below | Is no different from | Is different from |
| Is higher than | Is lower than | Has not changed from | Has changed from |
| Is longer than | Is shorter than | Is the same as | Is not the same as |
| Is bigger than | Is smaller than | | |
| Is increased | Is decreased or reduced from | | |

Source: Bluman A.G, 2018
Hypothesis symbols:

| Hypothesis | Symbol |
|---|---|
| H0 | =, ≥, ≤ |
| H1 | ≠, >, < |

STEP 2: Specify the Significance Level, α

i. Find the significance level and critical value (C.V):

| Significance level, α | Confidence level |
|---|---|
| 0.01 (1%) | 0.99 (99%) |
| 0.05 (5%) | 0.95 (95%) |
| 0.10 (10%) | 0.90 (90%) |

ii. Critical and noncritical regions:

[Diagram of the critical and noncritical regions]
Summary of Hypothesis Testing and Critical Value (CV), where k is the hypothesized value:

| | Two-tailed test | Right-tailed test | Left-tailed test |
|---|---|---|---|
| H0 | μ = k | μ = k | μ = k |
| H1 | μ ≠ k | μ > k | μ < k |
| C.V at α = 0.10 | ±1.65 | +1.28 | -1.28 |
| C.V at α = 0.05 | ±1.96 | +1.65 | -1.65 |
| C.V at α = 0.01 | ±2.58 | +2.33 | -2.33 |

STEP 3: Identify the Test Statistic

| Test statistic | Sample size, n | Population standard deviation, σ | Formula |
|---|---|---|---|
| z-test | n ≥ 30 | Known | $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ |
| t-test | n < 30 | Unknown | $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ |

Here x̄ is the sample mean, μ is the hypothesized population mean, σ is the population standard deviation, s is the sample standard deviation and n is the sample size.

STEP 4: Formulate the Decision Rule

Two-tailed test
• Do not reject H0: C.V (left) ≤ Z ≤ C.V (right)
• Reject H0: Z < C.V (left) or Z > C.V (right)

Right-tailed test
• Do not reject H0: Z ≤ C.V
• Reject H0: Z > C.V

Left-tailed test
• Do not reject H0: Z ≥ C.V
• Reject H0: Z < C.V

STEP 5: Make the Decision
• H0 is not rejected if the value of the test statistic falls in the noncritical region.
• H0 is rejected if the value of the test statistic falls in the critical region, that is, beyond the critical value.
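The decision rules of Step 4 can be written as a small lookup-and-compare function. The sketch below is only an illustration: the name `decide` and the dictionary `CRITICAL_Z` are hypothetical, the critical values are copied from the summary table in this section, and the z values passed in the usage lines are made-up inputs, not results from the text.

```python
# Critical values of z copied from the summary table
# (two-tailed values are stored as the positive bound)
CRITICAL_Z = {
    "two":   {0.10: 1.65, 0.05: 1.96, 0.01: 2.58},
    "right": {0.10: 1.28, 0.05: 1.65, 0.01: 2.33},
    "left":  {0.10: -1.28, 0.05: -1.65, 0.01: -2.33},
}

def decide(z, alpha, tail):
    """Apply the Step 4 decision rule for a z test statistic."""
    cv = CRITICAL_Z[tail][alpha]
    if tail == "two":
        reject = z < -cv or z > cv
    elif tail == "right":
        reject = z > cv
    else:  # left-tailed test
        reject = z < cv
    return "Reject H0" if reject else "Do not reject H0"

# Hypothetical test statistics, for illustration only
print(decide(2.10, 0.05, "two"))     # 2.10 lies beyond +1.96
print(decide(-1.50, 0.05, "left"))   # -1.50 is not below -1.65
print(decide(1.65, 0.05, "right"))   # equal to the C.V, so not beyond it
```

Keeping the critical values in one table makes it harder to pair a two-tailed hypothesis with a one-tailed critical value by mistake.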
Example 7.1
A manager at GB Factory claims that the employees in the factory are working above the average number of hours. A random sample of thirty employees has a mean of 9.5 hours. With a population mean of 8.2 hours, a standard deviation of 3.2 and a 5% significance level, is there sufficient evidence to support the manager's claim?

Solution:

Step 1: State the hypotheses
H0: μ = 8.2
H1: μ > 8.2

Step 2: Significance level
α = 0.05, Z 0.05 = 1.65

Step 3: Test statistic

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{9.5 - 8.2}{3.2 / \sqrt{30}} = 2.225 $$

Step 4: Decision rule
If α = 0.05, Z 0.05 = 1.65
Reject H0 if Z > 1.65

Step 5: Make the decision
Reject H0, since Z = 2.225 > 1.65. Based on the sample of size 30, with a right-tailed test and α = 0.05, there is sufficient evidence to support the manager's claim.
Example 7.2
A researcher claims that all students at Epsom College spend more than 2.5 hours on sport per week. A sample of 20 students selected randomly from the college produced a mean of 4.6 hours per week. With a standard deviation of 4 and using a 5% significance level, test the researcher's claim.

Solution:

Step 1: State the hypotheses
H0: μ ≤ 2.5
H1: μ > 2.5

Step 2: Significance level
α = 0.05, df = 20 − 1 = 19
t 0.05 = 1.729

Step 3: Test statistic

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{4.6 - 2.5}{4 / \sqrt{20}} = 2.348 $$

Step 4: Decision rule
If α = 0.05 and df = 19, t 0.05 = 1.729
Reject H0 if t > 1.729

Step 5: Make the decision
Reject H0, since t = 2.348 > 1.729. The researcher's claim that the students spend more than 2.5 hours on sport per week is supported.
REVISION EXERCISE 7

1. Define estimation in statistics.

2. Illustrate the rejection and non-rejection regions on the sampling distribution for each of the following hypothesis tests about μ. Assume a normal distribution.
i. A two-tailed test; α = 0.01; n = 30.
ii. A left-tailed test; α = 0.05; n = 50.

3. State the null hypothesis and alternative hypothesis for each of the following examples. Identify whether it is a case of a two-tailed, a left-tailed, or a right-tailed test.
i. To test if the mean amount of time spent per week reading story books by all teenagers is different from 8.6 hours.
ii. To test if the mean of total expenses per week by all students at MM Cafeteria is less than RM100.

4. A sample of 120 people has a mean age of 20, with a population standard deviation, σ, of 5. You are required to test the hypothesis that the population mean is 16.2 at α = 0.05.

5. A random sample of 40 adults revealed that they spend 6.2 hours each week on Facebook. The standard deviation for the sample is 0.5. Do these adults browse more than 7 hours a week on average? At the 1% significance level, set up the hypotheses.

6. A manager at GG Factory wants to know if the number of sick days an employee takes per year is greater than 5 on average. A random sample of 32 employees at the factory had a mean of 5.6. The standard deviation of the population is 1.2. Is there enough evidence to support this claim at the 1% significance level?

7. A sample of 60 students at Suria College has a mean age of 21. With a population standard deviation of 5, you are required to test the hypothesis that the population mean is 19.2 at α = 0.05.
FOCUS ON FINAL EXAM 7
1. A principal at CBA School claims that the students in his school have above-average intelligence. A random sample of thirty students has a mean score of 111. Is there sufficient evidence to support the principal's claim? The mean IQ of the population is 100 with a standard deviation of 15. Use a 0.05 level of significance to justify your answer.
2. It is claimed that Politeknik Tuanku Syed Sirajuddin's students need to spend an average of RM7 a day on food. A random sample of 30 students has been selected to test this claim. Would you agree with this claim if the random sample showed an average of RM9 per day and a standard deviation of RM4.80 per day? Use a 0.05 level of significance to justify your answer.
3. A new laboratory technician reads a report that the average number of students using the computer laboratory per hour was 16. To test this hypothesis, he selected 8 hours at random and kept track of the number of students who used the lab in each hour. The results were as follows.
   20 24 18 16 16 19 21 23
   At α = 0.05, test the claim that the average is actually 16, with a suitable diagram. The standard deviation is given as 2.97.
ANSWERS

REVISION EXERCISE 1
1. Descriptive statistics: the process of collecting, organizing, summarizing, presenting and analysing data.
2. Inferential statistics
3. Descriptive, Inferential
4. Primary; Secondary; Secondary; Secondary; Secondary; Secondary
5. Secondary data is published data gathered by other parties.
6. Distinct categories of variables based on some characteristic or attribute.
7. Population; Sample
8. Population: 1,000 households in Taman Indah; Sample: 500 households in Taman Indah.
9. i. Qualitative; ii. Qualitative; iii. Quantitative; iv. Quantitative; v. Quantitative; vi. Qualitative; vii. Quantitative; viii. Qualitative; ix. Quantitative; x. Qualitative
10. i. Continuous; ii. Discrete; iii. Continuous; iv. Continuous; v. Continuous; vi. Continuous; vii. Discrete; viii. Continuous; ix. Discrete; x. Continuous
11. Internet, observation
12. i. Experimental; ii. Telephone survey; iii. Personal interview; iv. Mailed questionnaire

FOCUS ON FINAL EXAM 1
1. Observation, experiment, personal interview, mailed questionnaire, internet survey
2. i. Primary; ii. Secondary; iii. Secondary; iv. Primary; v. Secondary
3. i. Quantitative; ii. Qualitative; iii. Quantitative; iv. Quantitative; v. Qualitative
4. Sample: a subset or representative of the population; Population: all the subjects/items that the researcher wants to study.
5. Direct questionnaire. Advantages: i. Cheap; ii. Covers a large area. Disadvantages: i. Quite low response; ii. Bias
6. i. Quantitative; ii. Qualitative; iii. Qualitative; iv. Qualitative; v. Qualitative
7. i. Quantitative; ii. Quantitative; iii. Qualitative; iv. Quantitative
REVISION EXERCISE 2
1.
2. The age distribution of the workers in Borneo Manufacturing company is as follows.

   Lower class    Cumulative frequency     Cumulative frequency
   boundaries     for 'less than' ogive    for 'more than' ogive
    9.5              0                       700
   19.5             85                       615
   29.5            205                       495
   39.5            430                       270
   49.5            565                       135
   59.5            670                        30
   69.5            700                         0

3. Range = 17.5; k = 5; c = 3.5
4. Range = 75; k = 6; c = 13

   Class Interval    Frequency    Cumulative frequency
   19 – 31             2            2
   32 – 44             7            9
   45 – 57            11           20
   58 – 70             8           28
   71 – 83            12           40
   84 – 96             5           45
   Total              45

FOCUS ON FINAL EXAM 2
1. Range = 807; k = 7; c = 115
2. Range = 130; k = 6; c = 22
3. Histogram (figure not reproduced; class boundaries 0.5, 2.5, 4.5, 6.5, 8.5, 10.5)
4. a = 6; b = 34; c = 12; d = 8; e = 0.06; f = 0.23; g = 0.12; h = 0.08; i = 6; j = 63
5. Range = 29; k = 6; c = 5

REVISION EXERCISE 3
1. Central tendency
2. Mean = 61.67; Median = 60; Mode = 60
3. Median = 53.75; Mode = 78
4. Mean = 5.316; Median = 4.989; Mode = 4.526
5. Mean = 11.38; Mode = 9; Median = 11.5
6. Mean = 8.5; Mode = 7 and 1; Median = 7.5
7. Mean = Mode = A; Median = B
8. Mean = 77.25; Median = 78; Mode = 77
9. Mean = 34.63; Median = 32; Mode = 27.33

FOCUS ON FINAL EXAM 3
1. Mean = 1138.68; Median = 1172.92
2. Mean = 51.04; Median = 51.625
3. Mean = 152.5; Median = 152
4. Mean = 6.588
5. Positively skewed / right; Negatively skewed / left
6. Mean = 159.6; Median = 161.295; Mode = 163.869
7. Median = 104.5; Mode = 105; Mean = 103
8. Mean = 161.7; Median = 161.08; Mode = 160.38
9. Mean = 15.88; Median = 15.34; Mode = 13.68

[Figure: Histogram of monthly expenditure on food by a sample of 100 families in a town. Horizontal axis: class boundaries (0 to 250); vertical axis: frequency (0 to 40).]
REVISION EXERCISE 4
1. Mean deviation = 2.513; Variance = 9.30478; SD = 3.05037; PCS 1 = 0.25901; right
2. Range = 3.7
3. Mean = 45.75; Standard deviation = 6.218; Coefficient of variation = 13.59%
4. PCS 1 = 1.5625, right; PCS 2 = 2.8125, right
5. Mean = 16.41; Variance = 78.908; SD = 8.88; CV = 54.12
6. Mean = 20.733; Median = 20; Mode = 18; positively skewed
7. Mean < Median < Mode: skewed to the left; Mean = Median = Mode: symmetrical; Mode < Median < Mean: skewed to the right
8. Mean = 31.78; Median = 30.25; Variance = 116.47; SD = 10.79; PCS 2 = 0.43
9. Variance = 129.901; Standard deviation = 11.4; CV = 32.92

FOCUS ON FINAL EXAM 4
1. SD = 930.199; PCS 2 = -0.11
2. SD = 7.51; PCS 2 = -0.078
3. MD = 3.04
4. Variance = 56.43; SD = 7.51
5. -0.52; negatively skewed
6. MD = 9.4
7. Mean = 134.5; SD = 69.16
8. Variance = 54.9; Standard deviation = 7.41; Skewed to the right; PCS 2 = 0.219; Nilofar = 38.7%; Hani = 46.7%

REVISION EXERCISE 5
1. Correlation
2. Positive; r = 0.87; y = 9.6886 + 0.2106x
3. r = 0.484; positive linear correlation
4. b = 0.978; a = 18.1
5. r = 0.7966; a = 5.29; b = 0.7508
6. r = 0.936; strong positive relationship; b = 347.879; a = -564.243
7. r = 0.956; positive linear correlation; b = 6.907; a = -59.157
FOCUS ON FINAL EXAM 5
1. p = 0.54
2. p = 0.6; Positive
3. y = 10 + 2x
4. r = -0.986
5. r = 0.983; y = 4.06 + 1.74x
6. r = -0.979

REVISION EXERCISE 6
1. ii. W,W = 0.36; W,B = 0.27; B,W = 0.27; B,B = 0.10; iii. 0.36; iv. 0.9
2. i. [H,1], [H,2], [H,3], [H,4], [H,5], [H,6], [T,1], [T,2], [T,3], [T,4], [T,5], [T,6]
   ii. [G,G,G], [G,B,G], [G,B,B], [B,G,B], [B,G,G], [B,B,B], [B,B,G], [G,G,B]
   iii. [AB], [AC], [BA], [BC], [CA], [CB]
3. ii. AD = 0.05; AP = 0.35; BD = 0.2; BP = 0.4; iii. 0.25; iv. 0.6
4. i. 0.286; ii. 0.228

FOCUS ON FINAL EXAM 6
1. i. 2/15; ii. 2/5; iii. 4/5
2. 4/11
3. i. 3/19; ii. 7/19; iii. 3/8; iv. 7/11
4. 3/7
5. [P,C], 3/35; [P,NC], 2/35; [NP,C], 6/28; [NP,NC], 18/28
6. [P,P], 1/4; [P,N], 1/4; [N,P], 1/4; [N,N], 1/4
7. Two-way table:

             Nurses    Doctors    Total
   Male        6         4         10
   Female      3         2          5
   Total       9         6         15

8. i. [B,B] 9/25; [B,Y] 6/25; [Y,B] 6/25; [Y,Y] 4/25; ii. 9/25; iii. 21/25
9. Venn Diagram (figure not reproduced; labels recovered from the diagram: F 18, S 6, 23, 3)
10. i. 0.27 e. 0.3525 f. 0.766 g. 0.6475

REVISION EXERCISE 7
1. Estimation is the process of determining parameter values from empirical data collected from a sample.
2. i. Two-tailed test; α = 0.01; n = 100. ii. Left-tailed test; α = 0.05; n = 50.
3. i. Two-tailed; H0: μ = 8.6; H1: μ ≠ 8.6; ii. Left-tailed; H0: μ = 100; H1: μ < 100
4. H0: μ = 16.2; H1: μ ≠ 16.2; α = 0.05; z = 8.33; reject H0
5. H0: μ = 7; H1: μ > 7; α = 0.01; z = -10.12; do not reject H0, since z is far below the right-tail critical value of 2.33
6. H0: μ = 5; H1: μ > 5; α = 0.01; z = 2.83; reject H0
7. H0: μ = 19.2; H1: μ ≠ 19.2; α = 0.05; z = 2.79; reject H0
FOCUS ON FINAL EXAM 7
1. H0: μ ≤ 100; H1: μ > 100; z = 4.0166; α = 0.05; reject H0.
2. H0: μ = 7; H1: μ > 7; z = 2.28; α = 0.05; reject H0.
3. H0: μ = 16; H1: μ ≠ 16; t = 3.45; α = 0.05; reject H0.
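The test statistics listed for Focus on Final Exam 7 can be reproduced with a minimal Python sketch. The helper name standardized_statistic is an assumption made for illustration, and the figures are taken from the three questions as stated; the right-tail z critical value comes from Python's standard-library NormalDist.

```python
import math
from statistics import NormalDist, mean


def standardized_statistic(sample_mean, mu0, sd, n):
    """Return (sample_mean - mu0) / (sd / sqrt(n)), used for both z and t here."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


# Question 1: IQ scores, n = 30, sample mean 111, mu0 = 100, sd = 15
z1 = standardized_statistic(111, 100, 15, 30)

# Question 2: daily spending, n = 30, sample mean 9, mu0 = 7, sd = 4.80
z2 = standardized_statistic(9, 7, 4.80, 30)

# Question 3: lab usage over 8 hours, mu0 = 16, sd given as 2.97
lab_counts = [20, 24, 18, 16, 16, 19, 21, 23]
t3 = standardized_statistic(mean(lab_counts), 16, 2.97, len(lab_counts))

# Right-tail critical value for a z-test at alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - 0.05)

print(round(z1, 4), round(z2, 2), round(t3, 2), round(z_critical, 3))
# 4.0166 2.28 3.45 1.645
```

For question 3 the sample is small (n = 8), so the statistic is compared against a t-table value with df = 7 rather than against z_critical.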