Published by zainahzaira, 2023-08-17 01:51:22

REVISION SERIES: STATISTICS

EBOOK

ZAINAH BINTI SEMAN
SHARIFAH MAHANI BINTI SYID ASSIMIE


PREFACE

The Revision Series for Statistics e-book helps students gain a better understanding of statistical concepts and techniques. It also shows students how to handle statistical data and interpret it effectively, using simple notes, formulas, and exercises. Through revision exercises and a focus on final exam questions, students can improve their knowledge of concepts and calculation techniques while becoming familiar with past-year questions.

This revision book consists of seven chapters: Introduction to Statistics, Data Presentation, Central Tendency, Dispersion and Skewness, Correlation and Regression, Probability Concept, and Estimation and Hypothesis Testing. Each chapter begins with summary notes and examples for quick revision, continues with revision exercises, and ends with a focus on final exam questions so that students can practise the concepts and formulas learned for each topic.


ABSTRACT

The Revision Series for Statistics helps students revise related topics in statistics quickly. This e-book offers simple explanations with examples, revision exercises, and final exam exercises. It enables students to understand each concept, apply the formulas to statistical data, and be well prepared for examinations. We hope this e-book serves its purpose in helping students gain a better understanding of the course.


TABLE OF CONTENTS

01 INTRODUCTION TO STATISTICS
Definition of Statistics
Types of Statistics
Sources of Data
Types of Data
Scale of Measurement
Statistical Terms
Data Collection Methods

02 DATA PRESENTATION
Frequency Distribution Tables
Quantitative Data

03 CENTRAL TENDENCY
Measure of Central Tendency
Measure of Central Tendency: Ungrouped Data
Measure of Central Tendency: Grouped Data
Relationship between Mean, Median and Mode

04 DISPERSION AND SKEWNESS
Measure of Dispersion
Measure of Dispersion for Ungrouped Data
Measure of Dispersion for Grouped Data
Coefficient of Variation

05 CORRELATION AND REGRESSION
Concept of Correlation
Scatter Diagram
Linear Coefficient of Correlation
Concept of Regression

06 PROBABILITY CONCEPT
Sample Space and Probability
Addition Rules for Probability
Multiplication Rules for Probability

07 ESTIMATION AND HYPOTHESIS TESTING
The Concept of Estimation Theory
Initiate Hypothesis Testing


CHAPTER 1: INTRODUCTION TO STATISTICS

1.1 Definition of Statistics
1.2 Types of Statistics
1.3 Sources of Data
1.4 Types of Data
1.5 Scale of Measurement
1.6 Statistical Terms
1.7 Data Collection Methods


1.1 Definition of statistics

• Statistics: the scientific techniques and methods used to collect, organize, summarize, present, and analyze data in order to obtain relevant information, draw reliable conclusions, and make effective decisions.

1.2 Types of statistics

Descriptive statistics
• Data are collected, arranged, summarized and presented in a meaningful way, such as charts, graphs and tables.
• Used to describe and summarize the data already known from the sample.

Inferential statistics
• Makes inferences or generalizations about a larger population by analyzing a sample of that population.
• Used to draw conclusions from the data obtained from the sample.

1.3 Sources of data

Primary data
• Gathered from primary sources, for example by interviewing respondents to obtain their responses.
• Must be gathered when the data needed are not available from secondary sources.
• Advantages:
  • More precise and in line with the research objective.
  • Enables the researcher to explain the data collection method and its limitations.
• Disadvantages:
  • The data collection process requires more time, manpower and money.

Secondary data
• Published data collected by other parties or agencies.
• Other sources: bulletins, journals, newspapers and other publications.
• Advantages:
  • Easily accessible.
  • Cheap because no fieldwork is required.
  • Saves time.
• Disadvantages:
  • May lack accuracy because the measurement procedure used by the original collector is not known.
  • May not fulfil specific needs and objectives.


1.4 Types of data

Qualitative data
• Describes a characteristic or attribute (non-numerical).
• Examples: marital status, food preference and occupation.

Quantitative data
• Can be counted or measured (numerical).
• Examples: weights, test scores and temperatures.
• Can be divided into two groups:
  • Discrete: counted values, usually whole numbers.
  • Continuous: measured values, which may include decimals.

1.5 Scale of measurement

• Nominal: categorical data. Examples: gender, car brand.
• Ordinal: can be categorized and ranked. Examples: Likert scale questions, top three scorers.
• Interval: can be categorized and ranked, is evenly spaced, and has no meaningful zero. Examples: test score, temperature.
• Ratio: can be categorized and ranked, is evenly spaced, and has a true zero. Examples: weight, age.


1.6 Statistical terms

Population
• Includes all subjects that are being surveyed.
• Refers to the entire set of subjects that are of interest in the survey.

Sample
• A group of individuals chosen from a population.
• Refers to a subset of the population that is representative of that population.

Types of sampling techniques

Probability sampling
• Simple random sampling
• Systematic sampling
• Stratified sampling
• Cluster sampling

Non-probability sampling
• Convenience sampling
• Snowball sampling
• Judgemental sampling
• Quota sampling


1.7 Data collection methods

• Observation
• Experiments
• Direct/mailed questionnaire
• Telephone surveys
• Internet survey
• Personal interviews

1.7.1 Observation

Allows the researcher to record what actually occurs without communicating with the respondents.

Advantages:
• Obtains more accurate, valid and objective data.
• Useful for clarifying the current behaviour of respondents.

Disadvantages:
• Requires a highly qualified and unbiased observer.
• Does not reveal the respondents' intentions to the researcher.
• Past or future happenings cannot be predicted from present observations.


1.7.2 Experiments

• A controlled study used to determine cause-and-effect relationships.
• The study is supervised by researchers and involves two groups: a control group and an experimental group.
• Example: the researchers provide a formula sheet for the exam to the first class (the experimental group) but not to the second class (the control group).

1.7.3 Mailed questionnaire

• A questionnaire is posted to the respondent with a stamped addressed envelope attached.
• Selected respondents answer the questions in the questionnaire, then return it to the researcher within a certain period of time.

Advantages:
• Cheaper than personal interviews.
• Covers a wider geographical area.
• No interviewer influence.
• Respondents have more time to think of a proper response.

Disadvantages:
• Low response rate.
• Only very simple questions can be asked.
• Some people may have difficulty reading or understanding the questions.


1.7.4 Telephone surveys

• The interviewer uses a prepared questionnaire to obtain the respondent's responses through a phone call.
• Normally short in duration.

Advantages:
• Cheaper than personal interviews.
• The interviews can be monitored to ensure that the specified procedures and purposes are followed during the survey.

Disadvantages:
• Limited to respondents who can be contacted by telephone.
• The response rate is normally lower than for face-to-face interviews.
• Only a few questions can be asked.
• Selected respondents may not answer when the calls are made.
• Some respondents cannot be studied because of unlisted numbers and cell phones.
• The interviewer's tone of voice might affect the response.

1.7.5 Internet survey

• A questionnaire is given to a target sample, and respondents answer the questions over the World Wide Web.
• Online surveys can be delivered by email, embedded in a website, shared on social media, etc.

Advantages:
• Faster response.
• Responses can come from anywhere in the world.
• Low cost compared to a telephone survey.

Disadvantages:
• Questionable data reliability.
• Limited access to certain populations.


1.7.6 Personal interview

The researcher gathers the data in a face-to-face interview. The interviewer asks the questions from a questionnaire, then records the respondent's responses.

Advantages:
• Obtains in-depth responses.
• Allows the interviewer to clarify any doubts the respondents have.
• The reactions of the respondents can be observed.
• An experienced interviewer is able to tell if a respondent is providing false information.
• Data collection normally yields a high response rate.

Disadvantages:
• Requires well-trained interviewers.
• Selection of respondents may be biased.
• Expensive compared to a telephone interview.


REVISION EXERCISE 1

1. Statistics can be categorized into two types. Explain TWO [2] categories of statistics.

2. _______________ statistics refers to the procedures used to organize and summarize data, while _______________ statistics refers to the procedures of taking a sample from a population and making estimates about the population.

3. _____________________ are techniques used to summarize or describe numeric data while __________________ are used to interpret the meaning of data. (Inferential / Descriptive)

4. For each of the sources described below, decide whether it is Primary or Secondary:
i. The data from postal questionnaires.
ii. The data from the Bank Negara Malaysia.
iii. Students' enrolment for the year 2021 at Politeknik Kuching Sarawak.
iv. Statistics of overseas trade including import and export price indices.
v. International Business textbook.
vi. A newspaper article dated 1st September 2021 about healthy lifestyle.

5. Explain secondary data and give TWO [2] examples.

6. Explain qualitative data and give THREE [3] examples.

7. Fill in the blanks with suitable terms:
i. The entire group of interest for a statistical conclusion is referred to as ______________.
ii. A subgroup that is representative of a population is referred to as _____________________.

8. A researcher wants to know the total expenditure on food of each household in Taman Indah. It is assumed that Taman Indah has 1000 households. The researcher divided all the houses in Taman Indah equally into 10 blocks and interviewed every house in 5 randomly chosen blocks. Based on the above information, state the population and sample for this study.


9. Identify the type of variable (Qualitative or Quantitative) for each of the following statements:
i. The countries of origin of immigrants.
ii. The reasons people use taxis.
iii. Number of workers at Sego Enterprise.
iv. Number of employees at Kolej Komuniti Kuching.
v. Revenue of Malaysia Airline Berhad.
vi. The items sold at the school canteen.
vii. The number of kids in a family.
viii. The most popular colours of cars.
ix. The marks scored in a Statistics test.
x. The favourite fruit eaten by the students in a class.

10. Determine whether each of the following statements describes discrete or continuous data:
i. The weight of an individual.
ii. Number of houses in Shah Alam.
iii. Quantity of petrol sold by petrol stations in Satok.
iv. Time taken to finish a test.
v. The pulse rates of a group of athletes at rest.
vi. The height of a group of 18-year-old students.
vii. Number of cups of coffee drunk per day.
viii. Length of 500 bottles produced in a factory.
ix. Number of students in the Commerce Department.
x. Yearly incomes of college professors.

11. The fastest way to collect data is through __________ and a study that involves tracking behaviour over a period is referred to as ____________.

12. Recommend the best data collection method in each of the following situations:
i. The research technique used to determine the causal relationships between variables.
ii. The instrument uses a series of questions and other prompts to collect information from respondents.
iii. A method that involves two-way systematic communication between an interviewer and the respondents.
iv. The researcher sends out the questionnaires to respondents with a request to complete them and return them by post.


FOCUS ON FINAL EXAM 1

1. State FIVE (5) data collection methods used to obtain the required data from respondents.

2. Identify the type of data (primary or secondary) for each of the following:
i. Questionnaire
ii. Websites
iii. Blog
iv. Survey
v. Magazine

3. Identify whether the following variables are qualitative or quantitative:
i. Time taken to revise a particular subject.
ii. Qualification of the candidate for a particular job.
iii. Number of voting papers in a ballot box.
iv. Annual profit of Mantop Trading.
v. State of health of people in Town A.

4. Define the following terms:
i. Sample
ii. Population
iii. Primary data
iv. Secondary data
v. Discrete quantitative data

5. Explain TWO (2) advantages and TWO (2) disadvantages of any one of the data collection methods.

6. State the type of data (qualitative or quantitative) for each of the following characteristics:
i. Number of family members
ii. Blood glucose level
iii. Eye colour
iv. Marital status
v. Ethnicity

7. Identify whether each statement describes quantitative or qualitative data:
i. The weight of the new-born baby is 3.45 kg.
ii. Adam goes swimming four times a week.
iii. Sabrina has curly brown hair.
iv. Brian has one elder brother and two younger sisters.


CHAPTER 2: DATA PRESENTATION

2.1 Frequency Distribution Tables
Elements of frequency distribution tables:
• Number of classes
• Class interval
• Frequency, cumulative frequency
• Class boundaries
• Midpoint
• Relative frequency

2.2 Quantitative Data
• Histogram
• Frequency polygon
• Ogive


2.1 Construct frequency distribution tables

1. Number of classes, k
• $k = 1 + 3.3 \log_{10} n$, where n = number of data values
• The number of classes must be a whole number

2. Class interval
• Consists of a lower class limit and an upper class limit
• Class width (size) = $\dfrac{\text{Range}}{\text{Number of classes}}$
• Range = Highest value - Lowest value
• Lower class limit of the first class = Lowest data value
• Upper class limit = Lower class limit + Class width - 1 (data without decimal places)
• Upper class limit = Lower class limit + Class width - 0.1 (data with one decimal place)

3. Frequency (f)
• The number of occurrences of values within a particular class

4. Cumulative frequency (cf)
• The sum of all the frequencies up to and including that class

5. Class boundaries
• Lower boundary of a class = $\dfrac{\text{Upper limit of previous class} + \text{Lower limit of class}}{2}$
• Upper boundary of a class = $\dfrac{\text{Upper limit of class} + \text{Lower limit of next class}}{2}$

6. Midpoint (x)
• $x = \dfrac{\text{Lower limit of class} + \text{Upper limit of class}}{2}$

7. Relative frequency
• Relative frequency = $\dfrac{\text{Frequency of each class}}{\text{Total frequency}}$
• The sum of the relative frequencies must equal 1.


2.2 Organize quantitative data

2.2.1 Histograms

• A graphical representation of a grouped frequency distribution, with bars representing frequencies.
• Constructed using the class boundaries and frequencies of the classes.

Steps to construct a histogram:
• Identify the class boundaries and frequency for each class
• Mark the class boundaries on the horizontal axis (x)
• Mark the frequency on the vertical axis (y)
• Draw a vertical bar to show the frequency for each class

2.2.2 Frequency polygon

• Drawn by connecting the midpoints of every class with one line.

Steps for drawing a polygon:
• Complete a histogram
• Add two more classes with zero frequency, one at each end of the histogram
• Mark the midpoint of the top of each histogram bar
• Connect the midpoints using straight lines
• Make sure that the polygon starts and ends on the x-axis


2.2.3 Ogives (cumulative frequency distribution)

• A graph of a cumulative frequency table or cumulative relative frequency distribution.
• Two types:
  • 'Less than' ogive: an increasing function that rises to the right.
  • 'More than' ogive: a decreasing function that falls to the right.

Example 2.1:
ZZ Enterprise sells its product online. The data shows the sales in units for 45 days.

19 46 81 60 32 45 62 44 73
35 56 59 65 56 40 57 67 48
80 77 61 55 24 90 40 76 68
82 88 34 76 94 83 49 52 66
87 80 49 71 32 54 78 79 92

Based on the data provided:
i. Determine the range, the number of classes and the class width.
ii. Construct a frequency distribution table.
iii. Construct a histogram and frequency polygon.
iv. Construct a 'less than' ogive and a 'more than' ogive.

Solution:
i. Range, number of classes and class width

Range = Highest value - Lowest value = 94 - 19 = 75

Number of classes: $k = 1 + 3.3 \log_{10} 45 = 1 + 3.3(1.6532) = 6.46 \approx 6$ classes

Class width = $\dfrac{75}{6} = 12.5 \approx 13$
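The calculation in part (i) can be checked with a short Python sketch. The function name `class_plan` is only illustrative, and the rounding choices are assumptions drawn from the worked answer: k is rounded to the nearest whole number, and the class width is rounded up so that the classes cover the whole range.

```python
import math

# Daily sales (units) for the 45 days in Example 2.1.
sales = [19, 46, 81, 60, 32, 45, 62, 44, 73, 35, 56, 59, 65, 56, 40,
         57, 67, 48, 80, 77, 61, 55, 24, 90, 40, 76, 68, 82, 88, 34,
         76, 94, 83, 49, 52, 66, 87, 80, 49, 71, 32, 54, 78, 79, 92]

def class_plan(data):
    """Return (range, number of classes, class width) for raw data."""
    data_range = max(data) - min(data)
    k = 1 + 3.3 * math.log10(len(data))           # k = 1 + 3.3 log10(n)
    num_classes = round(k)                        # assumption: nearest whole number
    width = math.ceil(data_range / num_classes)   # assumption: round the width up
    return data_range, num_classes, width

print(class_plan(sales))  # (75, 6, 13)
```

Rounding the width up rather than down is the safer choice here: a width of 12 with 6 classes would span only 72 units, which is less than the range of 75.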


ii. Construct a frequency distribution table

Class interval | Frequency | Cumulative frequency | Class boundaries | Midpoint | Relative frequency
19 - 31 | 2 | 2 | 18.5 - 31.5 | 25 | 0.04
32 - 44 | 7 | 9 | 31.5 - 44.5 | 38 | 0.16
45 - 57 | 11 | 20 | 44.5 - 57.5 | 51 | 0.24
58 - 70 | 8 | 28 | 57.5 - 70.5 | 64 | 0.18
71 - 83 | 12 | 40 | 70.5 - 83.5 | 77 | 0.27
84 - 96 | 5 | 45 | 83.5 - 96.5 | 90 | 0.11
Total | ∑ = 45 | | | | ∑ = 1.00

iii. Construct a histogram and frequency polygon

[Figure: Histogram and frequency polygon, "Sales in Units of ZZ Enterprise in 45 Days". x-axis: units, marked at the class boundaries 18.5, 31.5, 44.5, 57.5, 70.5, 83.5, 96.5; y-axis: frequency.]
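As a cross-check on the table in part (ii), the following Python sketch rebuilds each column from the raw data. The helper name `frequency_table` is hypothetical, and the sketch assumes whole-number data, so each upper limit is the lower limit plus the width minus 1 and each boundary lies 0.5 beyond the class limit.

```python
# Daily sales (units) for the 45 days in Example 2.1.
sales = [19, 46, 81, 60, 32, 45, 62, 44, 73, 35, 56, 59, 65, 56, 40,
         57, 67, 48, 80, 77, 61, 55, 24, 90, 40, 76, 68, 82, 88, 34,
         76, 94, 83, 49, 52, 66, 87, 80, 49, 71, 32, 54, 78, 79, 92]

def frequency_table(data, first_lower, width, num_classes):
    """Build the rows of a frequency distribution table for whole-number data."""
    rows = []
    cumulative = 0
    n = len(data)
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1                 # data without decimal places
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        rows.append({
            "interval": (lower, upper),
            "frequency": freq,
            "cumulative": cumulative,
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "relative": round(freq / n, 2),
        })
    return rows

table = frequency_table(sales, first_lower=19, width=13, num_classes=6)
for row in table:
    print(row)
```

The third row should report boundaries of 44.5 and 57.5, because the lower boundary of a class is always the upper boundary of the class before it.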


iv. Construct a 'less than' ogive and a 'more than' ogive

Class boundary | Cumulative frequency 'less than' | Cumulative frequency 'more than'
18.5 | 0 | 45
31.5 | 2 | 43
44.5 | 9 | 36
57.5 | 20 | 25
70.5 | 28 | 17
83.5 | 40 | 5
96.5 | 45 | 0

[Figure: 'Less than' and 'more than' ogives for sales in units of ZZ Enterprise. x-axis: units, marked at the class boundaries 18.5, 31.5, 44.5, 57.5, 70.5, 83.5, 96.5; y-axis: cumulative frequency.]
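The two cumulative columns can be derived from the class frequencies alone. This is a minimal Python sketch using the boundaries and frequencies from Example 2.1; the variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Example 2.1.
boundaries = [18.5, 31.5, 44.5, 57.5, 70.5, 83.5, 96.5]
frequencies = [2, 7, 11, 8, 12, 5]

# 'Less than': running total of frequencies, starting from 0 at the first boundary.
less_than = [0] + list(accumulate(frequencies))

# 'More than': total frequency minus the 'less than' value at each boundary.
total = less_than[-1]
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>5}  {lt:>3}  {mt:>3}")
```

Plotting each column against the boundaries gives the two ogives. At every boundary the two values add up to the total frequency of 45.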


REVISION EXERCISE 2

1. LY Berhad recently conducted training for its workers. The management wants to determine whether the training has improved its workers' productivity. The productivity of LY's workers after the training is shown below:

Productivity (units per hour) | Number of workers
1 - 2 | 18
3 - 4 | 25
5 - 6 | 15
7 - 8 | 9
9 - 10 | 7
11 - 12 | 3

Based on the above table, construct a histogram.

2. The age distribution of the workers at Borneo Manufacturing is shown as follows:

Age (years) | Number of workers
10 - 19 | 85
20 - 29 | 120
30 - 39 | 225
40 - 49 | 135
50 - 59 | 105
60 - 69 | 30

Construct a 'less than' ogive and a 'more than' ogive.

3. The following data shows the weight (kilograms) of guava sold in November 2018:

18.1 21.4 9.5 13.5 5.2 13.2 11.2 8.6 9.1 3.9
8.1 6.7 10.5 9.6 7.2 11.5 8.1 7.9 4.5 14.9

Based on the data provided, calculate the range, number of classes and class width.

4. The marks obtained by 45 candidates in a speaking test are shown below:

61 46 81 55 44 62 45 94 60
35 56 92 48 67 57 40 32 65
80 77 19 68 76 40 90 56 73
82 88 34 66 52 78 83 87 76
24 80 49 59 79 49 54 32 71

Based on the data:
i. Calculate the range, number of classes and class width.
ii. Construct a frequency distribution table that consists of class interval, tally, frequency, cumulative frequency, class boundaries, midpoint, and relative frequency.


FOCUS ON FINAL EXAM 2

1. The table below shows the estimated distance (in kilometres) from students' houses to Politeknik Tuanku Syed Sirajuddin.

12 420 402 490 434 371 211 29 417 200
27 300 150 803 393 400 30 140 493 145
10 610 290 709 97 315 397 519 112 131
108 207 90 15 19 171 591 111 207 161
500 103 170 187 150 141 817 144 171 417

i. Approximate the value of the range, number of classes and size of class interval for the data.
ii. Construct a frequency distribution table consisting of class interval, frequency, midpoint and class boundaries.

2. The data below show the number of calories listed for selected foods and beverages.

120 100 130 130 200 150 120 210 120 190
210 170 180 130 110 190 100 160 125 90
90 125 110 150 130 190 80 115 100 175

i. Approximate the value of the range, number of classes and size of class interval for the data.
ii. Construct a frequency distribution table consisting of class interval, frequency, midpoint and class boundaries.

3. Given below is the frequency distribution of monthly expenditure on food by a sample of 100 families in a town. Present the data as a histogram and a frequency polygon.

Expenditure | Frequency
0 - 50 | 7
50 - 100 | 24
100 - 150 | 30
150 - 200 | 27
200 - 250 | 8
250 - 300 | 4

4. The following table of petrol mileage for 100 taxis is incomplete. Fill in the values of a, b, c, d, e, f, g, h, i and j in the table.

Kilometres per litre | No. of taxis | Relative frequency | Cumulative frequency
6 - 8 | (a) | (e) | (i)
8 - 10 | 23 | (f) | 29
10 - 12 | (b) | 0.34 | (j)
12 - 14 | 17 | 0.17 | 80
14 - 16 | (c) | (g) | 92
16 - 18 | (d) | (h) | 100


5. An insurance company researcher conducted a survey on the number of car thefts in a large city for 30 days. The data are as follows.

52 62 51 50 69 58 77 66 53 57
75 56 55 67 73 79 59 68 65 72
57 51 63 69 75 65 53 78 66 55

i. Determine the range, number of classes and class size for the data above.
ii. Construct a frequency distribution table consisting of class interval, frequency, midpoint and class boundaries.


CHAPTER 3: CENTRAL TENDENCY

3.1 Measure of Central Tendency
3.2 Measure of Central Tendency: Ungrouped Data
3.3 Measure of Central Tendency: Grouped Data
3.4 Relationship: Mean, Median, Mode


3.1 Measure of Central Tendency

• Usually called the AVERAGE.
• A value located at the central position within a set of observations.
• The measures most often used for central tendency are the mean, the median and the mode.

Measures of central tendency:
• Mean: the average of a set of quantitative data, found by adding up all the data values and dividing by the number of values in the set.
• Median: the middle value of an ordered list of the data.
• Mode: the value that appears most frequently in a set of data.


3.2 Measure of Central Tendency for Ungrouped Data

Mean
Formula: $\bar{x} = \dfrac{\sum x}{n}$, where n = number of data values and $\sum x$ = the total of the data values.
Example: In a statistics quiz, the scores of six students are 35, 53, 57, 58, 67 and 90. Calculate the mean.
Solution: $\bar{x} = \dfrac{35 + 53 + 57 + 58 + 67 + 90}{6} = \dfrac{360}{6} = 60$

Median
Steps:
1. Location of median = $\dfrac{n + 1}{2}$ in the ordered data.
2. Value of median: if the number of data values is even, take the average of the two middle values.
Example: In a management quiz, the scores of six students are 40, 55, 57, 58, 70 and 89. Calculate the median.
Solution:
1. Location of median = $\dfrac{6 + 1}{2} = 3.5$
2. Ordered data: 40, 55, 57, 58, 70, 89. Median = $\dfrac{57 + 58}{2} = 57.5$

Mode
Steps:
1. Arrange the data in ascending order.
2. Mode = the value that occurs most frequently.
Example: In an English test, the scores of six students are 57, 51, 57, 78, 67 and 92. Determine the mode.
Solution: Ordered data: 51, 57, 57, 67, 78, 92. Mode = 57 (most occurrences)
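The three worked examples above can be verified with Python's standard `statistics` module. This is only a checking aid; the variable names are illustrative and the scores are those given in the examples.

```python
import statistics

# Scores from the three worked examples in section 3.2.
quiz_scores = [35, 53, 57, 58, 67, 90]          # statistics quiz
management_scores = [40, 55, 57, 58, 70, 89]    # management quiz
english_scores = [57, 51, 57, 78, 67, 92]       # English test

mean_score = statistics.mean(quiz_scores)            # 360 / 6
median_score = statistics.median(management_scores)  # average of the 3rd and 4th values
mode_score = statistics.mode(english_scores)         # most frequent value

print(mean_score, median_score, mode_score)  # 60 57.5 57
```

`statistics.median` sorts the data itself, so the scores do not need to be arranged in ascending order first.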


3.3 Measure of Central Tendency for Grouped Data

Mean
$\bar{x} = \dfrac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
Where
$x_i$ = midpoint of the class
$f_i$ = frequency of the class

Median
$\tilde{x} = L_m + \left[ \dfrac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right] \times C$
Step 1: Obtain the cumulative frequencies.
Step 2: Determine the location of the median class interval using the cumulative frequency column.
Where
$\sum f$ = sum of frequencies
$L_m$ = lower class boundary of the median class
$f_m$ = frequency of the median class
$\sum f_{m-1}$ = cumulative frequency before the median class
$C$ = median class size

Mode
$\hat{x} = L + \left[ \dfrac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \right] \times C$
Where
$L$ = lower boundary of the class containing the mode
$f_0$ = frequency of the class containing the mode
$f_1$ = frequency of the class before the class containing the mode
$f_2$ = frequency of the class after the class containing the mode
$C$ = size of the class containing the mode


3.4 Relationship among mean, median and mode

Relationship | Distribution
Mode < Median < Mean | Positively skewed / skewed to the right
Mean = Median = Mode | Symmetrical / zero skewness: the data is evenly distributed
Mean < Median < Mode | Negatively skewed / skewed to the left

[Figures: positively skewed, symmetrical and negatively skewed distribution curves, each marked with the positions of the mean, median and mode.]


Example 3.1:
The data below are collected from 50 students' prepaid reload expenses for a month.

Total expenses | Frequency
1 - 20 | 2
21 - 40 | 10
41 - 60 | 18
61 - 80 | 12
81 - 100 | 5
101 - 120 | 3

Based on the data given, you are required to:
i. Calculate the mean
ii. Calculate the median
iii. Calculate the mode
iv. Describe the skewness of the data

Solution:

Total expenses | Class boundaries | $f_i$ | $x_i$ | $f_i x_i$ | Cumulative frequency | Position of data
1 - 20 | 0.5 - 20.5 | 2 | 10.5 | 21 | 2 | 1-2
21 - 40 | 20.5 - 40.5 | 10 | 30.5 | 305 | 12 | 3-12
41 - 60 | 40.5 - 60.5 | 18 | 50.5 | 909 | 30 | 13-30
61 - 80 | 60.5 - 80.5 | 12 | 70.5 | 846 | 42 | 31-42
81 - 100 | 80.5 - 100.5 | 5 | 90.5 | 452.5 | 47 | 43-47
101 - 120 | 100.5 - 120.5 | 3 | 110.5 | 331.5 | 50 | 48-50
Total | | ∑ = 50 | | ∑ = 2,865 | |

i. Mean
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{2865}{50} = 57.3$


ii. Median
Location of median = 50 / 2 = 25
Median class = 41 - 60
$\tilde{x} = 40.5 + \left[ \dfrac{25 - 12}{18} \right] \times 20 = 54.94$

iii. Mode
$\hat{x} = L + \left[ \dfrac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)} \right] \times C$
$\hat{x} = 40.5 + \left[ \dfrac{18 - 10}{(18 - 10) + (18 - 12)} \right] \times 20 = 51.93$

iv. Describe the skewness of the data
Mode = 51.93, Median = 54.94, Mean = 57.3
Mode < Median < Mean, so the distribution is positively skewed.
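The grouped-data formulas from section 3.3 can be applied to Example 3.1 in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, class limits are whole numbers so each lower boundary is the lower limit minus 0.5, and the class size is the upper limit minus the lower limit plus 1.

```python
# Classes from Example 3.1: (lower limit, upper limit, frequency).
classes = [(1, 20, 2), (21, 40, 10), (41, 60, 18),
           (61, 80, 12), (81, 100, 5), (101, 120, 3)]

def grouped_mean(classes):
    """Sum of f * midpoint divided by the total frequency."""
    total_f = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / total_f

def grouped_median(classes):
    """L + [(sum(f)/2 - cumulative frequency before the class) / f] * C."""
    location = sum(f for _, _, f in classes) / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= location:            # this is the median class
            return (lo - 0.5) + (location - cumulative) / f * (hi - lo + 1)
        cumulative += f

def grouped_mode(classes):
    """L + [(f0 - f1) / ((f0 - f1) + (f0 - f2))] * C for the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                   # first class with the highest frequency
    f0 = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0             # no class before: treat as 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0
    lo, hi, _ = classes[i]
    return (lo - 0.5) + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * (hi - lo + 1)

mean = grouped_mean(classes)
median = grouped_median(classes)
mode = grouped_mode(classes)
print(round(mean, 2), round(median, 2), round(mode, 2))  # 57.3 54.94 51.93

# Mode < Median < Mean indicates a positively skewed distribution.
print(mode < median < mean)  # True
```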


REVISION EXERCISE 3

1. Describe the measures of central tendency.

2. The following data are gathered from a student's six-month prepaid reload record.
60, 65, 55, 60, 70, 60
Based on the above data, calculate the mean, median, and mode.

3. The data below shows the test scores of a student.
25, 78, 56, 78, 89, 47, 12, 45
Based on the above data, calculate:
i. Median
ii. Mode

4. The number of hours per week that 500 college students spend studying is shown as follows:

Hours per week | Number of students
0 - 1 | 55
2 - 3 | 87
4 - 5 | 145
6 - 7 | 90
8 - 9 | 73
10 - 11 | 35
12 - 13 | 15

Based on the above information, calculate:
i. Mean
ii. Median
iii. Mode

5. Calculate the mean, mode and median of the ungrouped data below:
12 9 8 14 13 15 11 9

6. Calculate the mean, mode and median of the ungrouped data below:
5 8 7 11 11 13 6 7


7. Calculate the mean, median and mode for the following data.
A A B C D F E A B B F E B A C A A B

8. The following table shows a student's scores for eight subjects taken at Polytechnic Kuching Sarawak.

Subject | Marks
Statistics | 89
Microeconomic | 85
Marketing | 77
Business Accounting | 62
Islamic Studies | 87
English | 77
Principle of Management | 78
Mathematics | 79

Based on the above data, calculate:
i. Mean
ii. Median
iii. Mode

9. The table below shows the age of PKS staff involved in the Merdeka Jogathon 2020, held on 1 January 2020 at Padang Merdeka.

Age (years) | No. of staff
15 - 21 | 2
22 - 28 | 7
29 - 35 | 6
36 - 42 | 3
43 - 49 | 1
50 - 56 | 5

Calculate the value of the mean, median and mode.


FOCUS ON FINAL EXAM 3

1. The frequency distribution below shows the number of positive Covid-19 cases in Pahang from 1st March 2022 to 31st March 2022.

Number of positive Covid-19 cases | Number of days
383 - 653 | 8
654 - 924 | 2
925 - 1195 | 6
1196 - 1466 | 7
1467 - 1737 | 4
1738 - 2008 | 4

i. From the table, calculate the mean and median.

2. The frequency distribution below shows the monthly amount invested by employees of Kejora company under the company's profit-sharing plan.

Amount invested (RM) | Number of employees
30 - 34 | 3
35 - 39 | 7
40 - 44 | 11
45 - 49 | 22
50 - 54 | 40
55 - 59 | 24
60 - 64 | 9
65 - 69 | 4

i. From the table, calculate the mean and median.


3. The data gives the weight (in pounds) of a sample of 10 students of Class A in a certain college.
138, 146, 168, 146, 161, 164, 158, 126, 173, 145
State the value of the mean and median.

4. State the mean number of tyres purchased annually by each individual from the following data.

Number of tyres purchased | No. of people
1 | 2
2 | 4
4 | 8
5 | 3
7 | 3
8 | 2
9 | 2
10 | 4
12 | 6

5. Identify the skewness of the TWO (2) distributions and indicate the location of the mean, median and mode for each distribution.
i. [Figure: distribution curve]
ii. [Figure: distribution curve]


6. Heights of 90 female students from Brilliant College are recorded in the following table:

Height (cm) | Number of students
140.05 - 145.04 | 8
145.05 - 150.04 | 13
150.05 - 155.04 | 12
155.05 - 160.04 | 7
160.05 - 165.04 | 20
165.05 - 170.04 | 16
170.05 - 175.04 | 9
175.05 - 180.04 | 5

Calculate:
i. Mean
ii. Median
iii. Mode

7. Identify the median, mode and mean.
95, 103, 105, 110, 104, 105, 112, 90

8. The following table shows the height of 50 students:

Height (cm) | Number of students
145 - 149 | 1
150 - 154 | 2
155 - 159 | 16
160 - 164 | 19
165 - 169 | 5
170 - 174 | 6
175 - 179 | 1

Calculate:
i. Mean
ii. Median
iii. Mode


9. Hani Zulaikha has just launched her new scarf called KekNi. She wants to determine the number of customers who visit her shop to buy the KekNi scarf within one month.

Day | Number of customers
1 - 5 | 11
6 - 10 | 18
11 - 15 | 32
16 - 20 | 24
21 - 25 | 20
26 - 30 | 15

Based on the above information, measure the central tendency by computing:
i. Mean
ii. Median
iii. Mode


CHAPTER 4: DISPERSION AND SKEWNESS

4.1 Measure of Dispersion
4.2 Measure of Dispersion: Ungrouped Data
4.3 Measure of Dispersion: Grouped Data
4.4 Coefficient of Variation
4.5 Measures of Skewness


4.1 Explain the measurement of dispersion

• The dispersion of a distribution provides additional information on the reliability of the measure of central location.
• If the data is widely dispersed, the central value is considered less representative (distribution B).
• Little dispersion means the central value is more reliable (distribution A).
• It is useful for comparing the spread of data.

[Figure: Distribution of A and B]


4.2 Calculate the measure of dispersion for ungrouped data

Range
• The difference between the highest and lowest values in a set of data.
• Range = Highest value - Lowest value

Mean deviation
• Measures the 'average' distance between each observation and the mean in a set of observations.
• Mean deviation = $\dfrac{\sum |x - \bar{x}|}{n}$

Variance
• Measures the spread of the numbers in a set of data around the mean.
• Large variance: the data is widely dispersed around the mean.
• Formula: $s^2 = \dfrac{1}{n-1} \sum (x - \bar{x})^2$
• Formula: $s^2 = \dfrac{1}{n-1} \left[ \sum x^2 - \dfrac{(\sum x)^2}{n} \right]$

Standard deviation
• Measures how close the data is to the mean.
• High standard deviation: the data are more widely dispersed.
• Formula: $\sigma = \sqrt{s^2}$

Where
n = number of observations or values
x = observation or value
$\bar{x} = \dfrac{\sum x}{n}$ = mean
$s^2$ = variance
$\sum$ = sum

Example 4.1:
Calculate the range, mean deviation, variance and standard deviation for the data below:
5 2 3 7 10 13 9


Solution:

i. Range
Range = Highest value - Lowest value = 13 - 2 = 11

ii. Mean deviation
$\bar{x} = \dfrac{49}{7} = 7$
Mean deviation = $\dfrac{|5-7| + |2-7| + |3-7| + |7-7| + |10-7| + |13-7| + |9-7|}{7}$
= $\dfrac{2 + 5 + 4 + 0 + 3 + 6 + 2}{7} = \dfrac{22}{7} = 3.14$

iii. Variance
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{49}{7} = 7$
$s^2 = \dfrac{(5-7)^2 + (2-7)^2 + (3-7)^2 + (7-7)^2 + (10-7)^2 + (13-7)^2 + (9-7)^2}{7-1}$
$s^2 = \dfrac{(-2)^2 + (-5)^2 + (-4)^2 + (0)^2 + (3)^2 + (6)^2 + (2)^2}{6}$
$s^2 = \dfrac{4 + 25 + 16 + 0 + 9 + 36 + 4}{6} = \dfrac{94}{6} = 15.67$

iv. Standard deviation
$\sigma = \sqrt{s^2} = \sqrt{15.67} = 3.96$
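The four results for Example 4.1 can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1 as the formulas in this section do. The variable names are illustrative, and the shortcut formula is included only as a cross-check.

```python
import statistics

data = [5, 2, 3, 7, 10, 13, 9]   # Example 4.1
n = len(data)

data_range = max(data) - min(data)
mean = statistics.mean(data)                              # 49 / 7
mean_deviation = sum(abs(x - mean) for x in data) / n     # sum|x - mean| / n
variance = statistics.variance(data)                      # divides by n - 1
std_dev = statistics.stdev(data)                          # square root of the variance

# Shortcut formula: [sum(x^2) - (sum x)^2 / n] / (n - 1)
shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(data_range, mean, round(mean_deviation, 2),
      round(variance, 2), round(std_dev, 2))   # 11 7 3.14 15.67 3.96
```

Both variance formulas give the same value, 94 / 6, so either can be used in an exam depending on which sums are easier to compute.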


4.3 Calculate the measure of dispersion for grouped data

Range
• The difference between the upper boundary of the highest class and the lower boundary of the lowest class.
• Range = Upper boundary (highest class) - Lower boundary (lowest class)

Mean deviation
• Computed by multiplying the absolute difference between the midpoint of each class and the mean by the class frequency, summing these products, and dividing by the total frequency.
• Mean deviation = $\dfrac{1}{\sum f} \sum f |x - \bar{x}|$

Variance
• The sum of the squared distances of the measurements from the mean, divided by (n - 1).
• Formula: $s^2 = \dfrac{1}{\sum f - 1} \sum f (x - \bar{x})^2$
• Formula: $s^2 = \dfrac{1}{\sum f - 1} \left[ \sum f x^2 - \dfrac{(\sum f x)^2}{\sum f} \right]$

Standard deviation
• The square root of the variance.
• Formula: $\sigma = \sqrt{s^2}$


4.4 Calculate the coefficient of variation
• Measures changes that have occurred in a population over time.
• Compares the variability of two populations.
• Expressed as a percentage of the mean.
• A smaller relative variation implies more consistency.
• Formula: $CV = \dfrac{\sigma}{\bar{x}} \times 100\%$

4.5 Calculate the measures of skewness

In the conditions below, $\hat{x}$ is the mode, $\tilde{x}$ is the median and $\bar{x}$ is the mean.

If Mean − Mode > 0 ($\hat{x} < \tilde{x} < \bar{x}$)
• Positively skewed / skewed to the right.
• Most of the observations are smaller than the mean and the large values are more widely distributed.

If Mean − Mode = 0 ($\hat{x} = \tilde{x} = \bar{x}$)
• Symmetrical / zero skewness, which means the data is evenly distributed.
• The mean, median and mode are equal.

If Mean − Mode < 0 ($\bar{x} < \tilde{x} < \hat{x}$)
• Negatively skewed / skewed to the left.
• Most of the observations are larger than the mean and the small values are more widely distributed.
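As an illustration of the coefficient of variation, the sketch below compares the Example 4.1 data with a second data set. The second data set and the helper name `coefficient_of_variation` are hypothetical, introduced only for this illustration; the standard deviation uses the (n − 1) form, matching the variance formula in these notes.

```python
import statistics


def coefficient_of_variation(values):
    """Return the standard deviation as a percentage of the mean.

    Hypothetical helper for illustration; uses the sample (n - 1)
    standard deviation from Python's statistics module.
    """
    return statistics.stdev(values) / statistics.mean(values) * 100


set_a = [5, 2, 3, 7, 10, 13, 9]  # data from Example 4.1
set_b = [6, 7, 7, 8, 7, 6, 8]    # hypothetical data with the same mean

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)

print(f"CV of set A = {cv_a:.2f}%")
print(f"CV of set B = {cv_b:.2f}%")

# The set with the smaller CV is the more consistent one
more_consistent = "A" if cv_a < cv_b else "B"
print(f"Set {more_consistent} is more consistent")
```

Both sets have a mean of 7, but set B varies far less around it, so its coefficient of variation is smaller.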


Measure of skewness:
Pearson’s coefficient of skewness is commonly used to measure the skewness of a distribution.

• Pearson’s coefficient of skewness 1 (PCS 1): $PCS\,1 = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}} = \dfrac{\bar{x} - \hat{x}}{\sigma}$
• Pearson’s coefficient of skewness 2 (PCS 2, which avoids using the mode): $PCS\,2 = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \dfrac{3(\bar{x} - \tilde{x})}{\sigma}$
• Zero skewness = Distribution is symmetrical.
• Positive skewness = Distribution is skewed to the right or positively skewed.
• Negative skewness = Distribution is skewed to the left or negatively skewed.

Example 4.2:
The data below show the monthly prepaid reload expenses of 50 students.

| Total Expenses | Frequency |
|---|---|
| 1 – 20 | 2 |
| 21 – 40 | 10 |
| 41 – 60 | 18 |
| 61 – 80 | 12 |
| 81 – 100 | 5 |
| 101 – 120 | 3 |

Based on the above data, calculate:
i. Range
ii. Mean deviation
iii. Variance
iv. Standard deviation
v. Pearson’s coefficient of skewness I
vi. Pearson’s coefficient of skewness II
vii. Comment on the distribution in questions (v) and (vi)


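Solution:

| Class interval | Class boundaries | $f_i$ | $x_i$ | $f_i x_i$ | $\lvert x_i - \bar{x} \rvert$ | $f_i \lvert x_i - \bar{x} \rvert$ | $(x_i - \bar{x})^2$ | $f_i (x_i - \bar{x})^2$ |
|---|---|---|---|---|---|---|---|---|
| 1 – 20 | 0.5 – 20.5 | 2 | 10.5 | 21 | 46.8 | 93.6 | 2190.24 | 4380.48 |
| 21 – 40 | 20.5 – 40.5 | 10 | 30.5 | 305 | 26.8 | 268.0 | 718.24 | 7182.4 |
| 41 – 60 | 40.5 – 60.5 | 18 | 50.5 | 909 | 6.8 | 122.4 | 46.24 | 832.32 |
| 61 – 80 | 60.5 – 80.5 | 12 | 70.5 | 846 | 13.2 | 158.4 | 174.24 | 2090.88 |
| 81 – 100 | 80.5 – 100.5 | 5 | 90.5 | 452.5 | 33.2 | 166.0 | 1102.24 | 5511.2 |
| 101 – 120 | 100.5 – 120.5 | 3 | 110.5 | 331.5 | 53.2 | 159.6 | 2830.24 | 8490.72 |
| Total | | ∑ = 50 | | ∑ = 2,865 | | ∑ = 968.0 | | ∑ = 28,488 |

Mean: $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{2865}{50} = 57.3$

i. Range
Range = 120.5 − 0.5 = 120

ii. Mean deviation
Mean deviation $= \dfrac{1}{\sum f_i} \left[ \sum f_i \, |x_i - \bar{x}| \right] = \dfrac{1}{50} [968] = 19.36$

iii. Variance
$s^2 = \dfrac{1}{\sum f_i - 1} \sum f_i (x_i - \bar{x})^2 = \dfrac{1}{50 - 1} (28{,}488) = 581.39$

iv. Standard deviation
$\sigma = \sqrt{s^2} = \sqrt{581.39} = 24.11$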
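The grouped-data working for Example 4.2 can be reproduced with a short Python sketch. It assumes, as the solution table does, that each class is represented by its midpoint and that class boundaries lie 0.5 beyond the class limits; the variable names are illustrative only.

```python
import math

# (lower limit, upper limit, frequency) for each class in Example 4.2
classes = [
    (1, 20, 2),
    (21, 40, 10),
    (41, 60, 18),
    (61, 80, 12),
    (81, 100, 5),
    (101, 120, 3),
]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]
total = sum(freqs)

# Mean = sum of (frequency x midpoint) divided by total frequency
mean = sum(f * x for f, x in zip(freqs, midpoints)) / total

# Range = upper boundary of highest class - lower boundary of lowest class
data_range = (classes[-1][1] + 0.5) - (classes[0][0] - 0.5)

# Mean deviation = sum of f|x - mean| divided by total frequency
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, midpoints)) / total

# Variance = sum of f(x - mean)^2 divided by (total frequency - 1)
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (total - 1)

std_dev = math.sqrt(variance)

print(f"Mean               = {mean:.2f}")
print(f"Range              = {data_range:.0f}")
print(f"Mean deviation     = {mean_deviation:.2f}")
print(f"Variance           = {variance:.2f}")
print(f"Standard deviation = {std_dev:.2f}")
```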


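The mode (51.93) and median (54.94) used below are obtained from the grouped-data formulas for the mode and median in Chapter 3, applied to the class 41 – 60.

v. Pearson’s coefficient of skewness I
$PCS\,1 = \dfrac{\text{mean} - \text{mode}}{\text{standard deviation}} = \dfrac{57.3 - 51.93}{24.11} = 0.22$

vi. Pearson’s coefficient of skewness II
$PCS\,2 = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \dfrac{3(57.3 - 54.94)}{24.11} = 0.294$

vii. Both coefficients are positive, so the distribution is skewed to the right or positively skewed.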
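The sketch below shows one way to arrive at the median, mode and both skewness coefficients for Example 4.2. The median and mode formulas used here are the usual grouped-data interpolation formulas and are an assumption on my part, since they are not restated in this chapter; the mean and standard deviation are taken from the solution above, and the helper `describe` is hypothetical.

```python
# Class boundaries and frequencies from Example 4.2
boundaries = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5, 120.5]
freqs = [2, 10, 18, 12, 5, 3]
n = sum(freqs)
width = 20  # every class has the same width

# Median (assumed formula): L + ((n/2 - F) / f) * c, where the median class
# is the first class whose cumulative frequency reaches n/2
cumulative = 0
for i, f in enumerate(freqs):
    if cumulative + f >= n / 2:
        median = boundaries[i] + (n / 2 - cumulative) / f * width
        break
    cumulative += f

# Mode (assumed formula): L + (d1 / (d1 + d2)) * c, where the modal class
# is the class with the highest frequency
m = freqs.index(max(freqs))
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d2 = freqs[m] - (freqs[m + 1] if m < len(freqs) - 1 else 0)
mode = boundaries[m] + d1 / (d1 + d2) * width

# Mean and standard deviation as found in the solution
mean = 57.3
std_dev = 24.11

pcs1 = (mean - mode) / std_dev
pcs2 = 3 * (mean - median) / std_dev


def describe(pcs):
    """Hypothetical helper: name the skewness from the sign of the coefficient."""
    if pcs > 0:
        return "positively skewed"
    if pcs < 0:
        return "negatively skewed"
    return "symmetrical"


print(f"Median = {median:.2f}, Mode = {mode:.2f}")
print(f"PCS 1 = {pcs1:.2f} ({describe(pcs1)})")
print(f"PCS 2 = {pcs2:.2f} ({describe(pcs2)})")
```

Because the code keeps the unrounded median and mode, the third decimal place of each coefficient may differ slightly from a hand calculation that rounds intermediate values.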


REVISION EXERCISE 4

1. The table below shows the number of hours per week that 500 college students spend studying.

| Hours per week | 0-1 | 2-3 | 4-5 | 6-7 | 8-9 | 10-11 | 12-13 |
|---|---|---|---|---|---|---|---|
| Number of students | 55 | 87 | 145 | 90 | 73 | 35 | 15 |

Based on the above information, compute:
i. Mean deviation
ii. Variance
iii. Standard deviation
iv. PCS 1 and explain the distribution

2. Find the range for the data below:
5.1 2.2 3.4 5.7 4.6 4.3 5.9

3. The data below show the ages of residents at Taman Indah.

| Class | Frequency | xi |
|---|---|---|
| 30 – 34 | 3 | 32 |
| 35 – 39 | 7 | 37 |
| 40 – 44 | 11 | 42 |
| 45 – 49 | 25 | 47 |
| 50 – 54 | 9 | 52 |
| 55 – 59 | 5 | 57 |

Based on the above data, calculate:
i. Mean
ii. Standard deviation
iii. Coefficient of variation

4. The lengths of stay on the paediatric ward of Borneo Medical Centre were arranged in the form of a frequency distribution. The mean length of stay was 18 days, the median was 15 days and the mode was 13 days. The standard deviation was computed to be 3.2 days. Identify the skewness of the distribution based on Pearson’s coefficient of skewness 1 and Pearson’s coefficient of skewness 2.

5. Given below is the working experience of the JP’s academic staff.

| Years of experience | No. of staff |
|---|---|
| 1 – 5 | 7 |
| 6 – 10 | 12 |
| 11 – 15 | 16 |
| 16 – 20 | 9 |
| 21 – 25 | 10 |
| 26 – 30 | 6 |

From the above data, calculate the standard deviation and the coefficient of variation.

6. The data below show the daily expenses of DPM’s students.
37 33 33 32 29 28 28 23 22 22
22 21 21 21 20 20 19 19 18 18
18 18 16 15 14 14 14 12 9 6

Based on the above data:
i. Calculate the mean
ii. Calculate the median
iii. Calculate the mode
iv. Identify the type of skewness of the distribution.

7. State and sketch the distribution for each of the following conditions:

| Condition | Distribution | Diagram |
|---|---|---|
| $\bar{x} < \tilde{x} < \hat{x}$ | | |
| $\bar{x} = \tilde{x} = \hat{x}$ | | |
| $\hat{x} < \tilde{x} < \bar{x}$ | | |


8. Given below is the overtime period worked by some employees at a manufacturing company at Samajaya Industrial Park for the year 2020.

| Overtime Period (minutes) | Frequency |
|---|---|
| 15 – 21 | 25 |
| 22 – 28 | 32 |
| 29 – 35 | 26 |
| 36 – 42 | 2 |
| 43 – 49 | 10 |
| 50 – 56 | 12 |

Based on the above data, calculate:
i. Mean
ii. Median
iii. Variance
iv. Standard deviation
v. Pearson’s coefficient of skewness II

9. Given below are the ages of the PKS staff involved in Merdeka Jogathon 2020, held on 1 January 2020 at Padang Merdeka.

| Age (years) | No. of staff |
|---|---|
| 15 – 21 | 2 |
| 22 – 28 | 7 |
| 29 – 35 | 6 |
| 36 – 42 | 3 |
| 43 – 49 | 1 |
| 50 – 56 | 5 |

i. Calculate the variance and standard deviation.
ii. Calculate the coefficient of variation.


FOCUS ON FINAL EXAM 4

1. The frequency distribution below shows the number of positive Covid-19 cases in Pahang from 1st March 2022 to 31st March 2022.

| Number of positive Covid-19 cases | Number of days |
|---|---|
| 383 – 653 | 8 |
| 654 – 924 | 2 |
| 925 – 1195 | 6 |
| 1196 – 1466 | 7 |
| 1467 – 1737 | 4 |
| 1738 – 2008 | 4 |

By using the data above, calculate Pearson’s coefficient of skewness 2 (PCS 2).

2. The frequency distribution below shows the monthly amount invested by employees of Kejora company under the company’s profit-sharing plan.

| Amount invested (RM) | Number of employees |
|---|---|
| 30 – 34 | 3 |
| 35 – 39 | 7 |
| 40 – 44 | 11 |
| 45 – 49 | 22 |
| 50 – 54 | 40 |
| 55 – 59 | 24 |
| 60 – 64 | 9 |
| 65 – 69 | 4 |

i. From the table, calculate the mean and median.
ii. From the data in (2i), calculate Pearson’s coefficient of skewness 2 (PCS 2).

3. Calculate the mean deviation for the following data:
4 7 11 3 1

