Published by hanaafarhana28, 2022-01-16 03:19:42

QMT181

TOPIC 1 - TOPIC 6

QMT181: Introduction to Statistics

TOPIC 1: INTRODUCTION TO STATISTICS

Nurul Fatin Azara Binti Zulkarnain
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM)

LEARNING OUTCOMES
• Explain the meaning and application of statistics.
• Recognize two types of statistics: descriptive statistics and inferential statistics.
• Understand statistical terms.
• Identify types of data.
• Identify the scales of measurement.

WHAT IS STATISTICS?

Collecting → Organizing → Analysing → Interpreting → Presenting

• Statistics involves the use of scientific procedures and methods for collecting, organizing, analysing, interpreting and presenting data as useful information, in order to draw valid conclusions and make effective decisions.
• These statistical processes form part of the decision-making process in many organizations.
• Statistics teaches people to use a limited sample to draw intelligent and accurate conclusions about a greater population.

TERMS IN STATISTICS

Population
• Refers to the pool of individuals or things from which a statistical sample is drawn for a study.
• Any selection of individuals grouped together by a common feature can be said to be a population.
• Designates the complete set of items that are of interest in the research.

Sample
• Refers to a small part of the population that is used for study.
• Designates a subset of items that are chosen from the population.

E.g.: In a study on the reading habits of secondary school children in Malaysia, the population would consist of all secondary school children in Malaysia, while the sample may consist of 1,000 secondary students randomly selected from the 14 states in Malaysia.

TERMS IN STATISTICS

Parameter
• A summary measure for the entire population is called a parameter.
• Characteristic symbols:
N – Population size
μ – Mean
σ – Standard deviation

Statistic
• A summary measure computed from sample data.
• Characteristic symbols:
n – Sample size
m – Mean
s – Standard deviation

E.g.: In a country of 10 million students, if we compute the mean English oral score of all 10 million students and find that it is 60, this value is a population parameter. If 10,000 students are randomly selected from the 10 million students in the country and the average score of their English oral test is calculated, this value is a statistic.

VARIABLES
• Measure the characteristics of the population that the researcher wants to study.
• An attribute that describes a person, place, thing, or idea.
• The value of the variable can "vary" from one entity to another.
• Can be divided into qualitative and quantitative variables.
E.g. Variables of interest may be the monthly income of respondents, respondents’ age, gender, level of education, number of children, and type of house owned by respondents.
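To make the parameter/statistic distinction concrete, here is a minimal Python sketch. It assumes a small simulated population (10,000 hypothetical scores generated at random, not real data) as a stand-in for the 10 million students. The population mean and standard deviation play the role of μ and σ; the same measures computed on a random sample of 100 students play the role of m and s.

```python
import random
import statistics

# Hypothetical population of 10,000 English oral scores (0-100); a stand-in for the
# 10 million students in the text. The fixed seed only makes the run repeatable.
rng = random.Random(2022)
population = [rng.randint(0, 100) for _ in range(10_000)]

# Parameters: summary measures of the whole population.
N = len(population)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)   # population SD divides by N

# Statistics: the same measures computed from a simple random sample of 100 students.
sample = rng.sample(population, 100)
n = len(sample)
m = statistics.mean(sample)
s = statistics.stdev(sample)            # sample SD divides by n - 1

print(f"Parameter: N = {N}, mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Statistic: n = {n}, m = {m:.2f}, s = {s:.2f}")
```

Rerunning with a different seed changes m and s but not the definitions: a parameter describes the whole population, while a statistic varies from sample to sample.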

TERMS IN STATISTICS

Census
• Measures a variable for every unit in the population.
• An attempt to gather information about every member of the population.
• It is a straightforward way to get the most accurate, thorough information.
• Possible if the population of the study is small.
• Malaysia undertakes a census every 10 years.

Sample Survey
• Involves a group of respondents from the population.
• Sampling gathers information only about a part, the sample, to represent the whole.
• The results are used to make inferences or generalizations about the population.
• Necessary if the population is large.
• Reduces cost and time, and the results may be as accurate as those of a census if the sample is selected using a proper sampling technique.

TYPES OF STATISTIC
• Descriptive Statistics
• Inferential Statistics

TYPES OF STATISTIC

Descriptive statistics
• Data are compiled, organized, summarized, and presented in suitable visual forms which are easy to understand and suitable for use.
• Various tables, graphs, charts and diagrams are used to exhibit the information obtained from the data.
• Numerical summaries such as the mean, median and mode are used.
• E.g. In 2011, 79% of United States adults used the internet.
• Raw data are transformed into meaningful forms so that users can grasp the main features of the data just by taking a quick look at visual presentations.

Inferential statistics
• Makes generalizations about a population by analysing a sample.
• Makes inferences about the population based on measurements obtained from the sample.
• The procedure is to select a sample from the population, measure the variables of interest, analyse the data, interpret the output and draw conclusions based on the data analysis.
• E.g. Probability distributions, hypothesis testing, correlation testing.
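The numerical summaries listed under descriptive statistics can be computed with Python's standard `statistics` module. This sketch reuses the 30 Probability and Statistics marks from the stem-and-leaf and frequency distribution examples in these notes.

```python
import statistics

# Marks of 30 students in the Probability and Statistics course (from these notes).
marks = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54,
         37, 15, 56, 17, 77, 43, 16, 15, 72, 72,
         25, 30, 24, 46, 47, 46, 45, 38, 81, 49]

mean = statistics.mean(marks)      # arithmetic mean: sum of values / number of values
median = statistics.median(marks)  # middle of the sorted data (average of the two middle values here)
mode = statistics.mode(marks)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
# mean = 48.13, median = 46.5, mode = 72
```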

SOURCES OF DATA

Primary Data
• Collected by the user himself.
• A “first hand” data source.
• E.g. Interview, survey.

Secondary Data
• Previously collected and summarized by other parties for their own use; the current user obtains these data from their reports.
• E.g. Newspaper reports, annual reports of departments, and published reports by organizations.

TYPES OF VARIABLES
• Qualitative Variable
• Quantitative Variable

TYPES OF VARIABLES

Quantitative or Numerical
• Measured with a numerical scale.
• Yields a numerical response.
• E.g. How tall are you? The answer is numerical.
Quantitative variables are either:
• Discrete: a numerical response which arises from a counting process. E.g. How many children do you have?
• Continuous: a numerical response which arises from a measuring process. E.g. How tall are you? What is your weight?

Qualitative or Categorical
• Measured with a non-numerical scale.
• Yields a categorical response.
• E.g. Are you a Malaysian? The answer is only ‘Yes’ or ‘No’.

SCALE OF MEASUREMENT

Nominal
• Basic characteristics: Numbers are assigned to classify objects.
• Examples: ID number, gender, programme.
• Descriptive statistics: Frequency, percentages, mode.
• Inferential statistics: Chi-square, binomial test.

Ordinal
• Basic characteristics: Numbers are assigned to indicate the relative positions of ordered objects.
• Examples: Social class, qualification, job position, level of satisfaction.
• Descriptive statistics: Median, percentile, ranking.
• Inferential statistics: Spearman’s rank correlation, ANOVA.

Interval
• Basic characteristics: Numbers are assigned to indicate the magnitude of differences between objects. Normally used in multiple-choice responses.
• Examples: Age, income, attitudes, opinions.
• Descriptive statistics: Range, mean, variance, standard deviation.
• Inferential statistics: Pearson’s correlation, t-tests, regression.

Ratio
• Basic characteristics: The zero point is fixed; numbers are assigned to indicate the actual value or amount of the variable.
• Examples: Length, weight, income, cost, sales quantity, amount of expenditure.
• Descriptive statistics: Range, mean, variance, standard deviation.
• Inferential statistics: Coefficient of variation; almost all statistical analysis methods can be used.

SCALE OF MEASUREMENT

• Nominal: name of restaurant.
• Ordinal: preference ranking among the restaurants.
• Interval: preference rating of food quality on a scale of 1 to 10.
• Ratio: money spent in the last two months at the respective restaurants.

Nominal    Ordinal    Interval    Ratio
A Cafe     1          8.8         RM 200
B Cafe     3          1.0         RM 0
C Cafe     2          7.1         RM 88

QMT181: Introduction to Statistics

TOPIC 1: INTRODUCTION TO STATISTICS

LEARNING OUTCOMES
• The types of sampling techniques and how they differ from each other.
• Steps in carrying out the major probability sample designs.
• The strengths and weaknesses of the various types of sampling techniques.

SAMPLING
• Sampling is the process of selecting a sample from a population.
• Sampling is preferable when the population is large.
• The sample must be selected in such a way that it accurately represents its population.
• Samples can be selected randomly or non-randomly.

SAMPLING TECHNIQUES
Sampling techniques can be classified as random sampling (probability sampling) and non-random sampling (non-probability sampling) methods.

The sampling process:
Defining the population → Establishing the sampling frame → Determining the sample size → Specifying the sampling method → Selecting the sample

PROBABILITY SAMPLING TECHNIQUE
• Also known as the random sampling technique.
• Probability sampling techniques are used when a researcher plans to make inferences about the population. The sample is selected based on known probabilities.

SIMPLE RANDOM SAMPLING
• A simple random sample is selected from the population in such a way that each item has the same chance of being selected.
• The sample is drawn randomly from a sampling frame.
• We can use a table of random numbers, computer software, a calculator with a random number generator, or the lottery method to choose the sample.
• Members of the population can be selected using either of two methods: the lottery method or the random number table method.

Lottery Method:
1. List down the members of the population.
2. Write numbers on pieces of card, each number representing one unit of the population.
3. Put the cards in a container and draw one card at a time without replacement until the required number of sample units has been selected.
4. The units of the population drawn out are the selected sample.

E.g. Mariah Agency wants to assess clients’ views about the quality of its service last month. These include assessing satisfaction with regard to speed of service and hours of operation. The agency has a list of the names of all its clients for last month in its records. The total number of clients is 20. The agency wants to choose 5 clients from the 20 clients using the lottery method.
1. Get the list of clients’ names. This is the sampling frame.
2. Give each client a number from 01 to 20.
3. Print each number on a separate piece of paper and place the numbers in a box. Mix the numbers well and, with your eyes closed, pull out a number. Record the number.
4. Repeat the process of pulling out a number until you get 5 different numbers.
5. The 5 numbers represent the sample of clients who will be asked about their views.
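A sketch of the lottery method in Python, assuming the sampling frame is simply a list of client numbers. The helper name `lottery_sample` is hypothetical; in practice `random.sample(frame, n)` performs the same draw without replacement in a single call.

```python
import random

def lottery_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame without replacement.

    Shuffling a copy of the frame plays the role of mixing the numbered cards in the box;
    taking the first n cards is then a draw without replacement.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    box = list(frame)        # one "card" per unit in the frame
    rng.shuffle(box)         # mix the cards well
    return box[:n]           # no card can be drawn twice

# Mariah Agency: clients numbered 01 to 20; choose 5 of them.
clients = [f"{i:02d}" for i in range(1, 21)]
chosen = lottery_sample(clients, 5, seed=7)
print(sorted(chosen))
```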

SIMPLE RANDOM SAMPLING

Table of Random Numbers Method:
Assume you have the test scores for a population of 200 students. Each student has been assigned a number from 001 to 200. We want to randomly sample only 5 of the students by using a table of random numbers.
1. Randomly point to a starting spot in the table. Assume we land on 75636 (3rd column, 2nd row).
2. Keep in mind that we are looking for numbers whose first three digits are from 001 to 200 (representing students).
3. Continue down the column until you find 5 numbers whose first three digits are less than or equal to 200.
4. From this table, we arrive at 070 (07015), 038 (03811), 045 (04594), 055 (05542), …

SYSTEMATIC SAMPLING

The first sample unit is chosen using simple random sampling, while the subsequent units are chosen at a fixed interval, k, from the sampling frame:

$k = \frac{N}{n}$

where N = population size and n = sample size. If k is calculated to be a number with decimals, it should be rounded up to the nearest integer.

Example:
Imran Agency wants to assess clients’ views about the quality of its service last month. These include assessing satisfaction with regard to speed of service and hours of operation. The agency has a list of the names of all its clients for last month in its records. The total number of clients is 20. Describe the steps needed to choose 5 clients from the 20 clients using systematic sampling.

Steps:
1. Get the list of clients’ names. This is the sampling frame.
2. Give each client a number from 01 to 20.
3. Calculate the interval size, $k = \frac{N}{n} = \frac{20}{5} = 4$.
4. Use simple random sampling to choose one number from the first four numbers, 01 to 04.
5. If the number chosen is 02, then the subsequent sample units are 06, 10, 14 and 18.
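The systematic sampling steps can be sketched in Python as below, using the rounding-up rule stated above. `systematic_sample` is a hypothetical helper, not a library function. With that rounding rule, the last step can occasionally pass N, in which case the helper simply returns fewer than n units.

```python
import math
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the unit numbers (1..N) chosen by systematic sampling.

    k = N / n, rounded up when it has decimals (the rule stated in these notes).
    The starting unit is chosen at random from 1..k unless `start` is given.
    """
    k = math.ceil(N / n)
    if start is None:
        start = random.Random(seed).randint(1, k)   # simple random choice among 1..k
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Every k-th unit from the start. When k was rounded up, the last step can pass N,
    # in which case fewer than n units come back.
    return list(range(start, N + 1, k))[:n]

# Imran Agency: N = 20 clients, n = 5, so k = 4; a random start of 02 gives:
print(systematic_sample(20, 5, start=2))   # [2, 6, 10, 14, 18]
```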

STRATIFIED SAMPLING

The population is divided into subgroups called strata. Within each stratum, a simple random sample is selected.

Example:
Zikry Agency wants to assess clients’ views about the quality of its service last year. These include assessing satisfaction with regard to speed of service and hours of operation. Its clients can be grouped according to race: Malay, Chinese and Indian. There are 50 Malays, 40 Chinese and 10 Indians. The agency wants to choose a sample of 10 clients.
1. The population is divided into three strata: Malay, Chinese and Indian.
2. Calculate the sample size for each stratum.
Malay: $\frac{50}{100} \times 10 = 5$
Chinese: $\frac{40}{100} \times 10 = 4$
Indian: $\frac{10}{100} \times 10 = 1$
3. A separate sampling frame is needed for each stratum; thus, one name list for the Malays, one for the Chinese and another for the Indians is required. Then, use simple random sampling to choose 5 Malays from the 50 Malays, 4 Chinese from the 40 Chinese and 1 Indian from the 10 Indians.

CLUSTER SAMPLING

The population is divided into subgroups (clusters), and a number of clusters are then selected randomly from them. This method is useful when no sampling frame is available or the sampling frame would be too costly to generate.

Example:
Kai Agency wants to assess potential clients’ views about the agency. Since the agency is in Perlis, all households in Perlis are considered its potential clients. The agency does not have a complete list of households in Perlis. Describe the steps needed to choose households from the population using cluster sampling if all households are in the areas Kangar, Arau or Padang Besar.
Steps:
1. Let Kangar, Arau and Padang Besar represent the clusters.
2. Use simple random sampling to choose 1 area out of the 3 areas.
3. If Kangar is chosen, all households in Kangar are members of the sample.
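A sketch of the Zikry Agency stratified design in Python: proportional allocation, then a simple random sample within each stratum. The client codes (M01, C01, I01, and so on) are hypothetical placeholders for the three name lists, and the helper names are illustrative. Rounding to the nearest integer is an assumption; for other stratum sizes, the rounded allocations may not add up exactly to n and would need adjusting.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Sample size for each stratum: (stratum size / population size) x n, rounded."""
    N = sum(strata_sizes.values())
    return {name: round(size / N * n) for name, size in strata_sizes.items()}

def stratified_sample(frames, n, seed=None):
    """Simple random sample within each stratum, using proportional allocation."""
    rng = random.Random(seed)
    sizes = {name: len(frame) for name, frame in frames.items()}
    alloc = proportional_allocation(sizes, n)
    return {name: rng.sample(frames[name], alloc[name]) for name in frames}

# One (hypothetical) sampling frame per stratum.
frames = {
    "Malay":   [f"M{i:02d}" for i in range(1, 51)],
    "Chinese": [f"C{i:02d}" for i in range(1, 41)],
    "Indian":  [f"I{i:02d}" for i in range(1, 11)],
}
print(proportional_allocation({"Malay": 50, "Chinese": 40, "Indian": 10}, 10))
sample = stratified_sample(frames, 10, seed=1)
print(sample)
```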

MULTI-STAGE SAMPLING

Multi-stage sampling is very useful if the study involves geographical divisions.

As an example, suppose we want to study the average monthly income of Petronas pump stations throughout Peninsular Malaysia.
Steps:
1. Peninsular Malaysia can be divided into geographical divisions, i.e. 12 states. These states are clusters. Select states randomly, say 6 states.
2. The selected states are subdivided into districts. Then, select districts randomly.
3. The selected districts are further divided into towns or cities.
4. Then, select towns or cities randomly.
5. From the selected towns or cities, take a census for the study.
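A sketch of this multi-stage design in Python, assuming a hypothetical nested frame of states, districts, towns and pump stations (all names are placeholders, not real places, and the numbers of districts, towns and stations are made up). Stage 1 treats states as clusters, the next stages narrow down to districts and towns, and the last stage takes a census of every station in the selected towns.

```python
import random

# Hypothetical frame: state -> district -> town -> pump stations in that town.
frame = {
    f"State {s}": {
        f"District {s}.{d}": {
            f"Town {s}.{d}.{t}": [f"Station {s}.{d}.{t}.{p}" for p in range(1, 4)]
            for t in range(1, 5)
        }
        for d in range(1, 4)
    }
    for s in range(1, 13)
}

def multistage_sample(frame, n_states, n_districts, n_towns, seed=None):
    rng = random.Random(seed)
    selected = []
    # Stage 1: states are the clusters; select some of them at random.
    for state in rng.sample(sorted(frame), n_states):
        # Stage 2: select districts at random within each selected state.
        for district in rng.sample(sorted(frame[state]), n_districts):
            # Stage 3: select towns or cities at random within each selected district.
            for town in rng.sample(sorted(frame[state][district]), n_towns):
                # Stage 4: take a census of every station in the selected town.
                selected.extend(frame[state][district][town])
    return selected

stations = multistage_sample(frame, n_states=6, n_districts=2, n_towns=2, seed=3)
print(len(stations))  # 6 states x 2 districts x 2 towns x 3 stations = 72
```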


QMT181: Introduction to Statistics

TOPIC 1: NON PROBABILITY SAMPLING TECHNIQUE AND DATA COLLECTION

NON-PROBABILITY SAMPLING TECHNIQUE
• Any procedure in which elements do not have an equal chance of being included in a sample.
• Non-probability sampling techniques are used when sampling frames are difficult to obtain.
• Non-probability sampling is considerably less expensive than probability sampling, but its results are of limited value for making inferences about the population.

CONVENIENCE SAMPLING
• The selection of elements or sampling units is left primarily to the interviewers.
• Respondents are selected because they happen to be in the right place at the right time.
Example:
A researcher wants to conduct a study at the entrance of an airport at 10 AM. The passengers who arrive there at 10 AM will be selected as respondents for the research. The researcher can interview these respondents or distribute a questionnaire for them to answer.

JUDGEMENTAL SAMPLING
• The population elements are selected based on the judgement of the researcher.
• Usually, the sample must conform to a certain criterion set by the researcher.
• For example, in a study about homeless people, the researcher may want to talk only to those who are homeless.

QUOTA SAMPLING
• The population is divided into subgroups.
• Samples are chosen non-randomly from these subgroups to obtain the required quota.
• The selection is done so that the sample contains a certain number of items (quota) with given characteristics.
• Generally, the researcher arbitrarily selects a predetermined number of individuals from different classes of the population, usually based on occupation, age, style, dressing, etc.
• The selection of samples is up to the researcher’s choice, following the given quotas.
• The process is quite similar to convenience sampling, but it differs in the flexibility the researcher has to choose the respondents he wants, provided they fulfil the stated specifications.

SNOWBALL SAMPLING
• Snowball sampling is a method in which a researcher identifies one member of some population of interest, speaks to him, and then asks that person to identify others in the population that the researcher might speak to.
• This person is then asked to refer the researcher to yet another person, and so on.
• This procedure is applied until the researcher obtains the required number of respondents. For example, it can be used in a study of the behaviour of drug addicts.
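Snowball sampling can be pictured as following referrals outward from one initial contact. The sketch below assumes a hypothetical referral network stored as a dictionary and visits people in breadth-first order; in a real study the order depends on who refers whom, and the network is discovered one interview at a time rather than known in advance.

```python
from collections import deque

def snowball_sample(referrals, first_contact, required):
    """Follow referrals from an initial contact until `required` respondents are found.

    referrals: hypothetical mapping from each person to the people they can refer.
    Returns fewer than `required` people if the referral chain runs out.
    """
    sample = []
    seen = {first_contact}
    queue = deque([first_contact])
    while queue and len(sample) < required:
        person = queue.popleft()
        sample.append(person)                        # interview this person
        for referred in referrals.get(person, []):   # ask for further contacts
            if referred not in seen:                 # do not approach anyone twice
                seen.add(referred)
                queue.append(referred)
    return sample

network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": []}
print(snowball_sample(network, "A", 4))  # ['A', 'B', 'C', 'D']
```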

DATA COLLECTION

TYPES OF QUESTIONS

Types of questions that can be used in a questionnaire:

• Open-ended questions: Questions asking about respondents’ opinions concerning some issues related to the study.
E.g. What do you think of the new rules that have been imposed on all the staff in the company?

• Close-ended questions: Consist of dichotomous and multiple-choice questions. E.g.
i. Are you married? O Yes O No
ii. Your CGPA: O Less than 2.00 O 2.00 – 3.00 O More than 3.00

• Scaled questions.
E.g. Do you agree that Mathematics and Science should be taught in English in schools?

1. Strongly Disagree 2. Disagree 3. Neutral 4. Agree 5. Strongly Agree

GUIDELINES TO WRITE A QUESTIONNAIRE

Write questions according to the following guidelines:

1. Use short questions.
2. Use simple language.
3. Ask about only one issue per question. Do not write questions that ask about two things at once.
4. Arrange questions in a logical order.
5. Use clear terms. If necessary, define terms that are not familiar to the respondents.
6. Avoid personal questions.
7. Avoid sensitive questions or words that may offend the respondents, their organisation or their ethnic group.
8. Avoid questions that require respondents to make calculations.
9. Use more closed-ended questions. Minimize the use of open-ended questions.
10. A questionnaire checklist can be constructed to ensure all required data are included.

QMT181: Introduction to Statistics

TOPIC 11: ORGANIZING & PRESENTING DATA

LEARNING OUTCOMES
• Organize qualitative data and quantitative data into a frequency table from raw data.
• Draw pie, bar and component bar charts.
• Draw graphs of quantitative data, such as the stem-and-leaf plot, histogram and ogive, and use these graphs to understand the problem and make decisions.

DATA ORGANIZATION AND PRESENTATION
• After data are collected, they are ready to be organized and presented in an understandable form such as tables, charts or graphs.
• Data presentation is an essential step before further statistical analysis can be done.
• It is a visual way to look at the data, see what is happening in the data and make interpretations about it.
• It is usually the best way to show the data to others, especially non-statisticians.

METHODS FOR ORGANIZING AND PRESENTING QUALITATIVE DATA
Qualitative data:
• One variable: Frequency Distribution Table, Pie Chart, Bar Chart (Vertical, Horizontal)
• Two variables: Contingency Table (two-way table), Bar Chart (Cluster/Multiple, Stacked/Component)

FREQUENCY DISTRIBUTION TABLE
• A frequency table is a table consisting of columns and rows.
• A frequency table is a simple table in which data are collected and arranged in simple classes or categories.
• A frequency distribution summarizes data into classes and their frequencies.

E.g. The data below show the race of 50 students majoring in Statistics at One College. Construct a frequency table for these data.

I M C I M M M M M I
I M M O C I C I C M
C O M M M O M M M C
M I O C C M M I O I
C M C M I C C C C C

Note: M – Malay, C – Chinese, I – Indian, O – Others

Frequency Table:
Race (class)   Frequency
Malay          20
Chinese        15
Indian         10
Other          5
Total          50
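The tally in this example can be checked in Python with `collections.Counter`. Each string below is one row of the raw data above, with the spaces removed.

```python
from collections import Counter

# Raw data, one string per row (M = Malay, C = Chinese, I = Indian, O = Others).
rows = ["IMCIMMMMMI",
        "IMMOCICICM",
        "COMMMOMMMC",
        "MIOCCMMIOI",
        "CMCMICCCCC"]
labels = {"M": "Malay", "C": "Chinese", "I": "Indian", "O": "Other"}

# Count every code across all rows.
counts = Counter(code for row in rows for code in row)

print(f"{'Race':<8}{'Frequency':>10}")
for code, race in labels.items():
    print(f"{race:<8}{counts[code]:>10}")
print(f"{'Total':<8}{sum(counts.values()):>10}")
```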

RELATIVE FREQUENCY DISTRIBUTION TABLE
• A frequency table can be extended to a relative frequency table.
• A relative frequency table shows the fraction of the total number of observations that falls in each category.
• The percentage for each category is also presented in this table, obtained by multiplying its relative frequency by 100.

The relative frequency of class j is defined as

$\text{Relative frequency of class } j = \frac{\text{Frequency of class } j}{\text{Total frequency}} = \frac{f_j}{\sum f}$

E.g. The data below show the race of 50 students majoring in Statistics at One College. Construct a relative frequency table for these data.

Race      Frequency   Relative Frequency   Percentage
Malay     20          20 / 50 = 0.4        40%
Chinese   15          15 / 50 = 0.3        30%
Indian    10          10 / 50 = 0.2        20%
Other     5           5 / 50 = 0.1         10%
Total     50          1                    100%
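A short Python sketch of the relative frequency and percentage columns, computed from the frequencies in the table above with $f_j / \sum f$.

```python
frequencies = {"Malay": 20, "Chinese": 15, "Indian": 10, "Other": 5}
total = sum(frequencies.values())

relative = {race: f / total for race, f in frequencies.items()}    # f_j / sum(f)
percentage = {race: rf * 100 for race, rf in relative.items()}     # relative frequency x 100

for race in frequencies:
    print(f"{race:<8} {frequencies[race]:>3} {relative[race]:>5.2f} {percentage[race]:>6.1f}%")
print(f"{'Total':<8} {total:>3} {sum(relative.values()):>5.2f} {sum(percentage.values()):>6.1f}%")
```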

BAR CHART (HORIZONTAL BAR CHART OR VERTICAL BAR CHART)
• The simplest graphical presentation.
• A bar chart is a chart made of bars whose heights represent the frequencies of the respective categories.
• For a single qualitative variable, there are two options: the horizontal bar chart and the vertical bar chart.

VERTICAL BAR CHART
[Figure: Vertical Bar Chart for 50 students’ race (Malay 20, Chinese 15, Indian 10, Other 5); x-axis: Race; y-axis: No of Students]

HORIZONTAL BAR CHART
[Figure: Horizontal Bar Chart for 50 students’ race (Malay 20, Chinese 15, Indian 10, Other 5); vertical axis: Race; horizontal axis: No of Students]

PIE CHART
• Another graphing technique that can be applied to a single qualitative variable is the pie chart.
• A pie chart is a circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
• To construct a pie chart, the angle of each category must be calculated using this formula:

$\text{Angle}_j = \frac{f_j}{\sum f} \times 360^\circ$
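The angle formula can be applied to the race frequencies in a few lines of Python. Multiplying by 360 before dividing by the total is algebraically the same as the formula, and it keeps these particular results exact.

```python
frequencies = {"Malay": 20, "Chinese": 15, "Indian": 10, "Other": 5}
total = sum(frequencies.values())

# Angle of each sector = (f / total) x 360 degrees, computed as f x 360 / total.
angles = {race: f * 360 / total for race, f in frequencies.items()}

for race, angle in angles.items():
    print(f"{race:<8} {angle:6.1f} degrees")
print(f"{'Total':<8} {sum(angles.values()):6.1f} degrees")
```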

PIE CHART

E.g. The data below show the race of 50 students majoring in Statistics at One College. Construct a pie chart for these data.

Race      No of Students   Percentage   Angle
Malay     20               40%          144º
Chinese   15               30%          108º
Indian    10               20%          72º
Other     5                10%          36º
Total     50               100%         360º

[Figure: Pie Chart for 50 students’ race: Malay 40%, Chinese 30%, Indian 20%, Other 10%]

CONTINGENCY TABLE
• A contingency table is a cross tabulation that displays the joint distribution of two qualitative variables.
• Also known as a two-way table.

CONTINGENCY TABLE

E.g. The data below show the race and gender of 50 students majoring in Statistics at One College. Construct a contingency table for these data.

I, Ma   M, Ma   C, Fe   I, Ma   M, Fe   M, Fe   M, Fe   M, Fe   M, Fe   I, Ma
I, Ma   M, Fe   M, Fe   O, Ma   C, Fe   I, Fe   C, Fe   I, Ma   C, Fe   M, Ma
C, Fe   O, Fe   M, Fe   M, Fe   M, Ma   O, Fe   M, Ma   C, Ma   M, Fe   C, Fe
M, Ma   I, Fe   O, Ma   C, Fe   C, Ma   M, Fe   M, Ma   M, Fe   O, Fe   I, Ma
C, Fe   M, Ma   C, Ma   M, Fe   I, Fe   C, Fe   C, Ma   I, Fe   C, Ma   C, Ma

Note: M – Malay, C – Chinese, I – Indian, O – Others; Ma – Male, Fe – Female

Contingency Table:
Race      Male   Female
Malay     7      13
Chinese   6      9
Indian    6      4
Other     2      3

BAR CHART (CLUSTER/MULTIPLE BAR CHART & STACKED/COMPONENT BAR CHART)
• Data on two qualitative variables, summarized in a contingency table, can be represented by a graph.
• Two common graphing techniques can be used for these data: the Cluster/Multiple Bar Chart and the Stacked/Component Bar Chart.
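Returning to the contingency table example, the cross tabulation can be reproduced in Python by counting (race, gender) pairs with `collections.Counter`. The raw pairs are copied from the example data, written without the space after each comma.

```python
from collections import Counter

# Raw (race, gender) pairs from the example, ten per row.
raw = """I,Ma M,Ma C,Fe I,Ma M,Fe M,Fe M,Fe M,Fe M,Fe I,Ma
I,Ma M,Fe M,Fe O,Ma C,Fe I,Fe C,Fe I,Ma C,Fe M,Ma
C,Fe O,Fe M,Fe M,Fe M,Ma O,Fe M,Ma C,Ma M,Fe C,Fe
M,Ma I,Fe O,Ma C,Fe C,Ma M,Fe M,Ma M,Fe O,Fe I,Ma
C,Fe M,Ma C,Ma M,Fe I,Fe C,Fe C,Ma I,Fe C,Ma C,Ma"""

pairs = [tuple(item.split(",")) for item in raw.split()]
table = Counter(pairs)   # (race code, gender code) -> number of students

races = {"M": "Malay", "C": "Chinese", "I": "Indian", "O": "Other"}
print(f"{'Race':<8}{'Male':>6}{'Female':>8}")
for code, name in races.items():
    print(f"{name:<8}{table[(code, 'Ma')]:>6}{table[(code, 'Fe')]:>8}")
```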

CLUSTER/MULTIPLE BAR CHART
[Figure: Cluster Bar Chart for 50 students’ race according to gender; x-axis: Race; y-axis: Number of Students; paired bars for Male and Female]

STACKED/COMPONENT BAR CHART
[Figure: Stacked Bar Chart for 50 students’ race according to gender; x-axis: Race; y-axis: Number of Students; each bar split into Male and Female]

METHODS FOR ORGANIZING AND PRESENTING QUANTITATIVE DATA
Quantitative data can be organized and presented using:
• Stem-and-leaf plot
• Frequency distribution table
• Histogram
• Frequency polygon
• Ogive

STEM-AND-LEAF PLOT
• This display separates data entries into leading digits, known as the stem, and trailing digits, known as the leaf.
• The stem is on the left side of the column and the leaf on the right side.

Stem | Leaf
4 | 5
5 | 5 2 3 8 4 8
6 | 8 2 5 3 0 0 4 8 2
7 | 5 1 2 7 4 2 0
8 | 2 0 4 4 0
9 | 1 2

STEM-AND-LEAF PLOT

E.g. The data below shows the marks for the Probability and Statistics course for 30 students. Construct a stem-and-leaf plot.

51 70 79 75 72 55 25 38 74 54
37 15 56 17 77 43 16 15 72 72
25 30 24 46 47 46 45 38 81 49

Step 1: Rearrange the data from the smallest to the highest value.
15 15 16 17 24 25 25 30 37 38
38 43 45 46 46 47 49 51 54 55
56 70 72 72 72 74 75 77 79 81

Step 2: Split each data value into two sets of digits (stem | leaf).
1|5 1|5 1|6 1|7 ... 7|5 7|7 7|9 8|1

Step 3: Construct the stem-and-leaf plot.
Stem | Leaf
1 | 5, 5, 6, 7
2 | 4, 5, 5
3 | 0, 7, 8, 8
4 | 3, 5, 6, 6, 7, 9
5 | 1, 4, 5, 6
6 |
7 | 0, 2, 2, 2, 4, 5, 7, 9
8 | 1

The stem-and-leaf display shows that the distribution is skewed to the left.
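The three steps can be mirrored in a short Python sketch, assuming whole-number marks from 0 to 99 so that the tens digit is the stem and the units digit is the leaf (`stem_and_leaf` is an illustrative name):

```python
from collections import defaultdict

marks = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54,
         37, 15, 56, 17, 77, 43, 16, 15, 72, 72,
         25, 30, 24, 46, 47, 46, 45, 38, 81, 49]


def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    plot = defaultdict(list)
    for value in sorted(data):          # Step 1: arrange in ascending order
        stem, leaf = divmod(value, 10)  # Step 2: split, e.g. 15 -> 1 | 5
        plot[stem].append(leaf)
    # Step 3: list every stem in range, including empty ones such as 6,
    # so that gaps in the data remain visible.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}


for stem, leaves in stem_and_leaf(marks).items():
    print(stem, "|", ", ".join(map(str, leaves)))
```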

FREQUENCY DISTRIBUTION

• A frequency table summarizes the data collected by forming intervals of values and indicating the number of observations that fall into each interval.
• This frequency table with class intervals is known as the frequency distribution of grouped data.

Steps:
Step 1: Decide on the number of classes.
Step 2: Determine the class interval.
Step 3: Set the class intervals.
Step 4: Tally and count the number of items in each class.

E.g. The data below shows the marks for the Probability and Statistics course for 30 students.
51 70 79 75 72 55 25 38 74 54
37 15 56 17 77 43 16 15 72 72
25 30 24 46 47 46 45 38 81 49

Step 1: Decide on the number of classes, k, using the formula $2^k > n$, where n = number of observations, and take the smallest k that satisfies it. Here n = 30: $2^4 = 16$ is too small, while $2^5 = 32 > 30$, so k = 5.

Step 2: Determine the class interval. The class interval should be the same for all classes and can be calculated using the formula $i \ge \frac{H - L}{k}$, where H = highest data value, L = lowest data value and k = number of classes. Here $\frac{81 - 15}{5} = 13.2$, so the class interval is rounded up to i = 14.

Step 3: Set the class intervals.
Class intervals: 15-28, 29-42, 43-56, 57-70, 71-84

Step 4: Tally and count the number of items in each class, using the arranged data:
15 15 16 17 24 25 25 30 37 38
38 43 45 46 46 47 49 51 54 55
56 70 72 72 72 74 75 77 79 81

Marks    Frequency
15-28    7
29-42    4
43-56    10
57-70    1
71-84    8
Total    30
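Steps 1 to 4 can be sketched in Python for whole-number data. One assumption is made beyond the notes: the class interval is taken as the smallest whole number strictly greater than (H − L)/k, which gives 14 here and guarantees that the highest value still falls inside the last class even when the division is exact (`frequency_distribution` is an illustrative name):

```python
marks = [51, 70, 79, 75, 72, 55, 25, 38, 74, 54,
         37, 15, 56, 17, 77, 43, 16, 15, 72, 72,
         25, 30, 24, 46, 47, 46, 45, 38, 81, 49]


def frequency_distribution(data):
    """Return [(lower limit, upper limit, frequency), ...] for integer data."""
    n = len(data)
    # Step 1: smallest number of classes k with 2^k > n.
    k = 1
    while 2 ** k <= n:
        k += 1
    # Step 2: smallest whole-number width i with i > (H - L) / k (assumption above).
    low, high = min(data), max(data)
    width = (high - low) // k + 1
    table = []
    for j in range(k):
        # Step 3: class limits, starting at the lowest value.
        lower = low + j * width
        upper = lower + width - 1
        # Step 4: tally the observations that fall inside this class.
        count = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, count))
    return table


for lower, upper, count in frequency_distribution(marks):
    print(f"{lower}-{upper}: {count}")
```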

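FREQUENCY DISTRIBUTION

• Class limits: the lowest and highest values of a class (the lower and upper class limits). For the class 15-28, the lower class limit is 15 and the upper class limit is 28.
• Class boundaries: the values that separate neighbouring classes so that there are no gaps between them. Each class has a lower class boundary and an upper class boundary, each lying halfway between a limit of the class and the nearest limit of the adjacent class.
• Class interval: the size of the class, Class interval = upper class boundary − lower class boundary.
• Midpoint: the centre value of the class.

For the first class, 15-28, the class boundaries, midpoint and class interval are:
Lower class boundary = $\frac{14 + 15}{2} = 14.5$, upper class boundary = $\frac{28 + 29}{2} = 28.5$
Midpoint = $\frac{15 + 28}{2} = 21.5$ OR $\frac{14.5 + 28.5}{2} = 21.5$
Class interval = 28.5 − 14.5 = 14, or equivalently the difference between consecutive lower class limits: 29 – 15 = 14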
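The same calculation for any class can be sketched in Python, assuming whole-number data so that adjacent class limits are 1 apart (the `gap` parameter and the name `class_details` are illustrative):

```python
def class_details(lower_limit, upper_limit, gap=1):
    """Boundaries, midpoint and class interval for one class.

    `gap` is the distance between the upper limit of one class and the lower
    limit of the next (1 for whole-number classes such as 15-28, 29-42).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "midpoint": (lower_limit + upper_limit) / 2,
        "interval": upper_boundary - lower_boundary,
    }


print(class_details(15, 28))
```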

HISTOGRAM

• Histograms are used to describe numerical data that have been grouped into a simple or relative frequency distribution.
• A histogram is a series of bars, each representing the frequency of observations within a class interval.
• The graph looks like a bar chart, but the bars in a histogram touch each other.
• Class boundaries are used in the construction of a histogram, and the height of each bar equals the frequency of its class. The X-axis represents the class boundaries and the Y-axis represents the frequency.

FREQUENCY POLYGON

• To draw a frequency polygon, put a dot above the midpoint of each class interval at a height equal to the frequency of that class.
• The dots are then connected to form the polygon.
• Two additional classes with zero frequency are added, one before the first class and one after the last class, so that the polygon starts and ends on the X-axis.
• If a histogram is available, the frequency polygon is obtained by connecting the midpoints of the tops of the histogram bars.
• In the graph, the X-axis represents the midpoints whereas the Y-axis represents the frequency.

CUMULATIVE FREQUENCY DISTRIBUTION AND OGIVE

• Cumulative frequency is used to determine the number of observations that lie above or below a particular value in a data set.
• The cumulative frequency is calculated from a frequency distribution table.
• Each cumulative frequency is obtained by adding the frequency of a class to the total of the frequencies of all the classes before it.
• The last value is always equal to the total number of observations, since by then all frequencies have been added.
• There are two types of cumulative frequency distribution: the 'less than' and the 'more than' cumulative distribution.
• Generally, the 'less than' cumulative distribution is used.
• An ogive is a graph (line chart) of a cumulative frequency distribution.
• There are two ways of constructing an ogive (cumulative frequency curve), one for each type of cumulative frequency distribution.
• To construct an ogive, the X-axis represents the class boundaries and the Y-axis represents the cumulative frequency. For a 'less than' ogive, each cumulative frequency is plotted against the upper class boundary of its class.
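The points of a 'less than' ogive can be sketched in Python. The frequencies below (7, 4, 10, 1, 8) are those obtained by tallying the 30 marks into the classes 15-28, ..., 71-84; the names `less_than_ogive`, `boundaries` and `frequencies` are illustrative:

```python
from itertools import accumulate

# Class boundaries and frequencies of the 30 marks (classes 15-28, ..., 71-84).
boundaries = [14.5, 28.5, 42.5, 56.5, 70.5, 84.5]
frequencies = [7, 4, 10, 1, 8]


def less_than_ogive(boundaries, frequencies):
    """Points (class boundary, 'less than' cumulative frequency)."""
    # The curve starts at the lower boundary of the first class with 0;
    # every later point pairs an upper class boundary with its running total.
    cumulative = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, cumulative))


for boundary, cf in less_than_ogive(boundaries, frequencies):
    print(f"less than {boundary}: {cf}")
```

As stated above, the last cumulative frequency equals the total number of observations (30).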

QMT181: Introduction to Statistics
TOPIC 3: NUMERICAL DESCRIPTIVE MEASURES
Nurul Fatin Azara Binti Zulkarnain
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM)

LEARNING OUTCOMES
At the end of this chapter, you should be able to:
• Estimate the measures of central tendency for ungrouped and grouped data
• Estimate the measures of position for ungrouped and grouped data
• Estimate the measures of dispersion for ungrouped and grouped data
• Estimate the measures of skewness

MEASURES OF CENTRAL TENDENCY

Mean, Median, Mode

• A measure of central tendency is used to represent a group of data (a data distribution).
• Central tendency is a single value at the centre of the data that can be taken as a summary value for the data set.
• E.g. If we have data about the ages of children in a playground, we can find one value to represent the ages of the children in the playground.
• A measure of central tendency can be either the mean value of the data, the middle value of the data (median) or the most frequent value of the data (mode).

GROUP OF DATA

• Grouped data: presented in the form of a frequency distribution table.
• Ungrouped data: raw data. The data give information on each member of the population or sample individually, or are not given in the form of a frequency table.

ARITHMETIC MEAN OR MEAN ($\bar{x}$)

• The mean (average, or arithmetic mean) measures the centre of the data.
• The mean is the average of a set of measurements and is the most commonly used measure of central tendency. To find it:
✓ Add all of the values together.
✓ Divide by the number of values to obtain the mean.
• Disadvantage: very sensitive to outliers or extreme values (values that are very small or very large relative to the majority of the values in a data set).

Ungrouped data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad i = 1, 2, 3, \ldots, n$
where $\bar{x}$ = mean value, $x_i$ = data value, n = sample size.

Grouped data: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
where $\bar{x}$ = mean value, $x_i$ = midpoint value (or the value itself when the table lists single values), $f_i$ = frequency value, $\sum f_i$ = n = sample size, and the sums run over all classes.

MEAN FOR UNGROUPED DATA

The following are the weights (kg) of all ten employees of a small company:
53 62 61 87 48 45 80 57 63 62
Find the mean weight of these employees.

$\bar{x} = \frac{\sum x_i}{n} = \frac{53 + 62 + 61 + 87 + 48 + 45 + 80 + 57 + 63 + 62}{10} = \frac{618}{10} = 61.8$ kg

MEAN FOR GROUPED DATA

E.g. 1: Calculate the mean age for the following data:

Age (years) (x)   No. of students (f)   fx
18                10                    180
19                28                    532
20                32                    640
21                15                    315
22                5                     110
                  Σf = 90               Σfx = 1777

The given data is grouped data since it is tabulated as a frequency table. Note also that the x value is the value of the variable (i.e. age), so the formula used is:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1777}{90} = 19.74$
The value 19.74 means the average age of the students is 19.74 years.

E.g. 2: Calculate the arithmetic mean for the given grouped data.

Monthly Income (RM)   No. of employees (f)   Midpoint (x)   fx
700 - 799             8                      749.5          5996
800 - 899             13                     849.5          11043.5
900 - 999             14                     949.5          13293
1000 - 1099           10                     1049.5         10495
1100 - 1199           25                     1149.5         28737.5
1200 - 1299           4                      1249.5         4998
                      Σf = 74                               Σfx = 74563

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{74563}{74} = 1007.61$
Therefore, the average income of the 74 employees is RM 1,007.61.
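Both mean formulas can be checked with a short Python sketch. The function names are illustrative; for the income example, each class is represented by its midpoint, e.g. (700 + 799)/2 = 749.5:

```python
def mean_ungrouped(values):
    """x-bar = (sum of x) / n."""
    return sum(values) / len(values)


def mean_grouped(values, frequencies):
    """x-bar = sum(f * x) / sum(f); x is the value itself or the class midpoint."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / sum(frequencies)


weights = [53, 62, 61, 87, 48, 45, 80, 57, 63, 62]
ages, students = [18, 19, 20, 21, 22], [10, 28, 32, 15, 5]
midpoints = [749.5, 849.5, 949.5, 1049.5, 1149.5, 1249.5]
employees = [8, 13, 14, 10, 25, 4]

print(round(mean_ungrouped(weights), 2))             # weight example (kg)
print(round(mean_grouped(ages, students), 2))        # age example (years)
print(round(mean_grouped(midpoints, employees), 2))  # income example (RM)
```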

MEDIAN ($\tilde{x}$)

• The median is the middle value of the data arranged in ascending order.
• Interpretation: 50% of the observations have a value less than the median, while the other 50% have a value more than the median.
• Properties of the median:
1) It always exists and it is always unique.
2) Extreme values do not affect its value.

MEDIAN FOR UNGROUPED DATA (ODD)

Steps:
1. Arrange the data in ascending order.
2. Find the position of the median: Position of median = (n + 1)/2, where n is the number of observations.
3. Locate the median value from the arranged data.

E.g. Calculate the median of: 4, 20, 6, 12, 8, 26, 21
1. 4, 6, 8, 12, 20, 21, 26
2. Position of median = (7 + 1)/2 = 4
3. From the arranged data, the 4th value is 12. Therefore, the median of the data is 12.
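The (n + 1)/2 position rule works for both odd and even n: when the position is a whole number the median is that value, otherwise it is the average of the two values on either side. A minimal Python sketch (`median_ungrouped` is an illustrative name), checked against the odd example above and the even example of 8 children's heights:

```python
def median_ungrouped(values):
    """Median via the (n + 1)/2 position rule."""
    data = sorted(values)                 # Step 1: ascending order
    position = (len(data) + 1) / 2        # Step 2: position of the median
    below = data[int(position) - 1]       # value at the whole-number position (1-based)
    if position.is_integer():
        return below                      # odd n: a single middle value
    return (below + data[int(position)]) / 2  # even n: average the two middle values


print(median_ungrouped([4, 20, 6, 12, 8, 26, 21]))
print(median_ungrouped([120, 107, 101, 123, 111, 116, 129, 105]))
```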

MEDIAN FOR UNGROUPED DATA (EVEN)

Steps:
1. Arrange the data in ascending order.
2. Find the position of the median: Position of median = (n + 1)/2, where n is the number of observations.
3. Locate the median value from the arranged data.

E.g. The following are the heights (cm) of 8 children in a school. Calculate the median:
120, 107, 101, 123, 111, 116, 129, 105
1. 101, 105, 107, 111, 116, 120, 123, 129
2. Position of median = (8 + 1)/2 = 4.5
3. Position 4.5 lies between the 4th value (111) and the 5th value (116), so the median is (111 + 116)/2 = 113.5 cm.
50% of the 8 children are shorter than 113.5 cm and the other 50% are taller than 113.5 cm.

MEDIAN FOR GROUPED DATA

Steps:
1. Obtain the cumulative frequencies.
2. Determine the location of the median class, using the position $\frac{\sum f}{2}$.
3. Find the median value using the formula:
$\tilde{x} = L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) C$
where $L_m$ = lower class boundary of the median class, $\sum f_{m-1}$ = cumulative frequency of the class before the median class, $f_m$ = frequency of the median class, C = class size of the median class.

MEDIAN FOR GROUPED DATA

E.g. 1: Calculate the median number of sick days for the following grouped data:

Number of sick days   Number of students   Cumulative frequency
0                     12                   12
1                     14                   26
2                     10                   36
3                     5                    41
4                     3                    44

Step 1: Obtain the cumulative frequencies (last column of the table).
Step 2: Find the position of the median = $\frac{\sum f}{2} = \frac{44}{2} = 22$
Step 3: Look in the cumulative frequency column for the value that is equal to or next greater than 22, which is 26.
Therefore, the median class is 1 and the median is also 1 sick day.

E.g. 2: Calculate the median for the following data:

Marks   Class boundaries   Number of students   Cumulative frequency
30-39   29.5-39.5          2                    2
40-49   39.5-49.5          3                    5
50-59   49.5-59.5          10                   15
60-69   59.5-69.5          14                   29
70-79   69.5-79.5          7                    36
80-89   79.5-89.5          3                    39
90-99   89.5-99.5          1                    40

Step 1: Obtain the cumulative frequencies (last column of the table).
Step 2: Find the position of the median = $\frac{\sum f}{2} = \frac{40}{2} = 20$
Step 3: Look in the cumulative frequency column for the value that is equal to or next greater than 20, which is 29. The median class is 59.5 – 69.5.
Step 4: So, $L_m = 59.5$, $\sum f_{m-1} = 15$, $f_m = 14$, C = 10:
$\tilde{x} = 59.5 + \left( \frac{20 - 15}{14} \right) 10 = 63.07$

50% of the students have marks less than 63.07 and 50% of the students have marks more than 63.07.
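The grouped-median formula can be sketched in Python, assuming equal class sizes and classes given by their lower boundaries (`median_grouped` is an illustrative name):

```python
def median_grouped(lower_boundaries, frequencies, width):
    """Grouped median: L_m + ((sum f / 2 - sum f_(m-1)) / f_m) * C."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("empty distribution")
    half = total / 2                       # Step 2: position of the median
    cumulative = 0                         # cumulative frequency before the class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:         # Step 3: first class reaching sum f / 2
            return lower + (half - cumulative) / f * width  # Step 4
        cumulative += f


lower = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
students = [2, 3, 10, 14, 7, 3, 1]
print(round(median_grouped(lower, students, 10), 2))
```

Applied to the 30 marks (lower boundaries 14.5, 28.5, 42.5, 56.5, 70.5; frequencies 7, 4, 10, 1, 8; class size 14), the formula gives about 48.1, close to the value read from the ogive below.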

MEDIAN FOR GROUPED DATA

Instead of using the formula, the median value for grouped data can be estimated using the ogive. The procedure is:
Step 1: Obtain the cumulative frequencies.
Step 2: Draw the ogive.
Step 3: Determine the location of the median, using the position $\frac{\sum f}{2}$.
Step 4: Determine the median value: locate this position on the cumulative frequency axis (Y-axis) and move to the right until the curve is touched. Then move down and read the median value on the class boundaries axis (X-axis).

E.g. Using the ogive for the marks of 30 students:
Step 2: Find the position of the median = $\frac{\sum f}{2} = \frac{30}{2} = 15$
Step 3: Locate the value 15 on the Y-axis and read the corresponding median value on the X-axis. From the ogive, the median is estimated to be 48.5 marks.

MODE

• The mode is the value that appears most frequently in a data set.
• Properties of the mode:
1. It does not always exist (e.g. when every value occurs only once), and there can be more than one mode.
2. Extreme values do not affect its value.
• The method for finding the mode depends on whether the data are ungrouped or grouped.

MODE FOR UNGROUPED DATA

E.g. Find the modal value for the given data:
17, 14, 16, 20, 14, 18, 14, 12, 14
The value 14 occurs most frequently (4 times), so the mode is 14.
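Finding the mode of ungrouped data amounts to counting how often each value occurs. A minimal Python sketch that returns every most-frequent value (so bimodal data give two modes), assuming the convention that a data set in which no value repeats has no mode (`modes` is an illustrative name):

```python
from collections import Counter


def modes(values):
    """All most-frequent values, in ascending order; [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: every value occurs once, so no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([17, 14, 16, 20, 14, 18, 14, 12, 14]))
print(modes([17, 14, 16, 20, 14, 18, 14, 12, 14, 17, 11, 17, 10, 17]))
```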

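MODE FOR UNGROUPED DATA

E.g. Find the modal value for the given data:
17, 14, 16, 20, 14, 18, 14, 12, 14, 17, 11, 17, 10, 17
The values 14 and 17 both occur most frequently (4 times each), so the modes are 14 and 17. A data set that has two modal values is known as a bimodal distribution.

MODE FOR GROUPED DATA ($\hat{x}$)

Step 1: Determine the location of the modal class, which is the class with the highest frequency.
Step 2: Find the mode value using the formula:
$\hat{x} = L_m + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) C$
where $L_m$ = lower class boundary of the modal class, $\Delta_1$ = frequency of the modal class − frequency of the class before it, $\Delta_2$ = frequency of the modal class − frequency of the class after it, C = class size of the modal class.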

MODE FOR GROUPED DATA

E.g. 1: Compute the mode value for the following data:

Number of sick days   Number of students
1                     12
2                     14
3                     10
4                     5
5                     3

The mode value is 2, which means the most common number of sick days among the students is 2.

E.g. 2: Compute the modal value for the following data:

Marks   Class boundaries   Number of students
30-39   29.5-39.5          2
40-49   39.5-49.5          4
50-59   49.5-59.5          10
60-69   59.5-69.5          13
70-79   69.5-79.5          8
80-89   79.5-89.5          2
90-99   89.5-99.5          1

The modal class for the above data is 60-69, since this class has the highest frequency, so $L_m = 59.5$, $\Delta_1 = 13 - 10 = 3$, $\Delta_2 = 13 - 8 = 5$, C = 69.5 − 59.5 = 10. Substituting these values into the formula, we obtain
$\hat{x} = 59.5 + \left( \frac{3}{3 + 5} \right) 10 = 63.25$
which means the most common mark is estimated to be 63.25.
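The grouped-mode formula can be sketched in Python. Two assumptions are made that the notes do not cover: if several classes tie for the highest frequency, the first one is used, and a missing neighbouring class (before the first or after the last class) counts as frequency 0 (`mode_grouped` is an illustrative name):

```python
def mode_grouped(lower_boundaries, frequencies, width):
    """Grouped mode: L_m + (d1 / (d1 + d2)) * C."""
    # Step 1: modal class = class with the highest frequency (first one if tied).
    m = max(range(len(frequencies)), key=lambda j: frequencies[j])
    # Assumption: a missing neighbouring class counts as frequency 0.
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    d1 = frequencies[m] - before
    d2 = frequencies[m] - after
    # Step 2: interpolate inside the modal class.
    return lower_boundaries[m] + d1 / (d1 + d2) * width


lower = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
students = [2, 4, 10, 13, 8, 2, 1]
print(mode_grouped(lower, students, 10))
```

For the 30 marks (lower boundaries 14.5, ..., 70.5; frequencies 7, 4, 10, 1, 8; class size 14) the formula gives about 48.1, which agrees with the histogram estimate that follows.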

MODE FOR GROUPED DATA

Instead of using the formula, the mode value for grouped data can be estimated using the histogram. The procedure is:
Step 1: Draw the histogram.
Step 2: Determine the location of the modal class by looking for the highest bar.
Step 3: Determine the modal value by drawing two lines:
• First line: from the top-left corner of the highest bar to the top-left corner of the bar immediately to its right.
• Second line: from the top-right corner of the highest bar to the top-right corner of the bar immediately to its left.
The two lines intersect at one point. Move down from the intersection and read the mode value on the class boundaries axis (X-axis).

From the histogram, the modal mark is estimated to be 48.1, which means the most common mark is about 48.1.

RELATIONSHIP BETWEEN MEAN, MEDIAN, MODE

The shape of a distribution can be determined by using the three values of mean, median and mode. Three common shapes of a distribution are:
1) Symmetrical distribution: mean = median = mode
2) Positively-skewed distribution (skewed to the right): mode < median < mean
3) Negatively-skewed distribution (skewed to the left): mode > median > mean


MEASURES OF POSITION

QUARTILES

Lowest | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Highest

• The first quartile / lower quartile (Q1): 25% of the data are less than the first quartile value and 75% are more than it.
• The second quartile / median (Q2): 50% of the data are less than the second quartile value and 50% are more than it.
• The third quartile / upper quartile (Q3): 75% of the data are less than the third quartile value and 25% are more than it.

QUARTILES FOR UNGROUPED DATA (ODD)

Step 1: Arrange the data in ascending order (i.e. lowest to highest).
Step 2: Find the positions of the quartiles:
Position of $Q_1 = \frac{n + 1}{4}$, position of $Q_2 = \frac{n + 1}{2}$, position of $Q_3 = \frac{3(n + 1)}{4}$
where Q1 = first quartile, Q2 = second quartile, Q3 = third quartile, n = sample size.
Step 3: Find the quartile value using:
$Q = x_{(k)} + d\,(x_{(k+1)} - x_{(k)})$
where k is the whole-number part of the position, d is its decimal part, and $x_{(k)}$ is the k-th value of the arranged data.

E.g. Compute the first quartile, the second quartile and the third quartile for the following data:
12, 15, 8, 20, 5, 25, 30, 23, 17
1. 5, 8, 12, 15, 17, 20, 23, 25, 30
2. Positions of quartiles: $Q_1 = \frac{9 + 1}{4} = 2.5$, $Q_2 = \frac{9 + 1}{2} = 5$, $Q_3 = \frac{3(9 + 1)}{4} = 7.5$
3. Quartile values: Q1 = 8 + 0.5(12 − 8) = 10, Q2 = 17, Q3 = 23 + 0.5(25 − 23) = 24

QUARTILES FOR UNGROUPED DATA (EVEN)

The steps are the same as for an odd number of observations.

E.g. Compute the first quartile, the second quartile and the third quartile for the following data:
14, 23, 28, 38, 49, 51, 57, 65, 71, 84
1. 14, 23, 28, 38, 49, 51, 57, 65, 71, 84
2. Positions of quartiles: $Q_1 = \frac{10 + 1}{4} = 2.75$, $Q_2 = \frac{10 + 1}{2} = 5.5$, $Q_3 = \frac{3(10 + 1)}{4} = 8.25$
3. Quartile values: Q1 = 23 + 0.75(28 − 23) = 26.75, Q2 = 49 + 0.5(51 − 49) = 50, Q3 = 65 + 0.25(71 − 65) = 66.5
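The position-and-interpolation rule above can be sketched in Python, assuming at least 3 observations so that every quartile position falls inside the data (`quartile_ungrouped` is an illustrative name). Note that software packages use several different quartile conventions, so their results may differ slightly from this (n + 1)/4 rule:

```python
def quartile_ungrouped(values, i):
    """i-th quartile (i = 1, 2, 3) using position i(n + 1)/4 and interpolation."""
    data = sorted(values)                  # Step 1: ascending order
    position = i * (len(data) + 1) / 4     # Step 2: position of Q_i
    whole = int(position)
    fraction = position - whole
    value = data[whole - 1]                # value at the whole-number position (1-based)
    if fraction == 0:
        return value
    # Step 3: value + decimal part x (next value - value)
    return value + fraction * (data[whole] - value)


odd = [12, 15, 8, 20, 5, 25, 30, 23, 17]
even = [14, 23, 28, 38, 49, 51, 57, 65, 71, 84]
print([quartile_ungrouped(odd, i) for i in (1, 2, 3)])
print([quartile_ungrouped(even, i) for i in (1, 2, 3)])
```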

QUARTILES FOR GROUPED DATA

• Quartiles for grouped data can be computed by formula or estimated from an ogive.

The formula that can be used to compute the quartiles is:
$Q_i = L_{Q_i} + \left( \frac{\frac{i \sum f}{4} - \sum f_{Q_i - 1}}{f_{Q_i}} \right) C$
where $L_{Q_i}$ = lower class boundary of the first, second or third quartile class, $\sum f_{Q_i - 1}$ = cumulative frequency before the respective quartile class, $f_{Q_i}$ = frequency of the respective quartile class, C = class size of the respective quartile class, and i = 1, 2, 3 represents the first, second and third quartiles.

E.g. Calculate the first quartile, second quartile and third quartile for the following grouped data:

Monthly Income (RM)   Class boundaries   Number of Employees   Cumulative Frequency
700 - 799             699.5 – 799.5      10                    10
800 - 899             799.5 – 899.5      12                    22
900 - 999             899.5 – 999.5      25                    47
1000 - 1099           999.5 – 1099.5     8                     55
1100 - 1199           1099.5 – 1199.5    17                    72
1200 - 1299           1199.5 – 1299.5    8                     80

Position of $Q_1 = \frac{\sum f}{4} = \frac{80}{4} = 20$
Locate Q1 in the cumulative frequency column: from the table, the first quartile class is 799.5 – 899.5.
$Q_1 = L_{Q_1} + \left( \frac{\frac{\sum f}{4} - \sum f_{Q_1 - 1}}{f_{Q_1}} \right) C = 799.5 + \left( \frac{20 - 10}{12} \right) 100 = 882.83$
which means 25% of the employees have a monthly income less than RM 882.83, while 75% of the employees have a monthly income more than RM 882.83.

QUARTILES FOR GROUPED DATA

Using the same monthly income data (Σf = 80):

Position of $Q_2 = \frac{2 \sum f}{4} = \frac{80}{2} = 40$
Locate Q2 in the cumulative frequency column: from the table, the second quartile class is 899.5 – 999.5.
$Q_2 = L_{Q_2} + \left( \frac{\frac{2 \sum f}{4} - \sum f_{Q_2 - 1}}{f_{Q_2}} \right) C = 899.5 + \left( \frac{40 - 22}{25} \right) 100 = 971.50$
which means 50% of the employees have a monthly income less than RM 971.50, while 50% of the employees have a monthly income more than RM 971.50.

Position of $Q_3 = \frac{3 \sum f}{4} = \frac{240}{4} = 60$
Locate Q3 in the cumulative frequency column: from the table, the third quartile class is 1099.5 – 1199.5.
$Q_3 = L_{Q_3} + \left( \frac{\frac{3 \sum f}{4} - \sum f_{Q_3 - 1}}{f_{Q_3}} \right) C = 1099.5 + \left( \frac{60 - 55}{17} \right) 100 = 1128.91$
which means 75% of the employees have a monthly income less than RM 1128.91, while the other 25% of the employees have a monthly income more than RM 1128.91.
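The grouped-quartile formula can be sketched in Python for equal class sizes (`quartile_grouped` is an illustrative name). With i = 2 it reduces to the grouped-median formula:

```python
def quartile_grouped(lower_boundaries, frequencies, width, i):
    """Grouped quartile Q_i = L + ((i * sum f / 4 - cf before) / f) * C."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("empty distribution")
    target = i * total / 4                 # position of Q_i
    cumulative = 0                         # cumulative frequency before the class
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= target:       # first class whose cumulative frequency reaches it
            return lower + (target - cumulative) / f * width
        cumulative += f


lower = [699.5, 799.5, 899.5, 999.5, 1099.5, 1199.5]
employees = [10, 12, 25, 8, 17, 8]
for i in (1, 2, 3):
    print(f"Q{i} = RM {quartile_grouped(lower, employees, 100, i):.2f}")
```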

RANGE

• The range is the difference between the largest and smallest values in a data set. The formulas for this measurement are:
Range for ungrouped data = Largest value – Smallest value
Range for grouped data = Upper boundary of the last class – Lower boundary of the first class

E.g. Find the range for the following data:
12, 18, 20, 34, 8, 42, 30, 58, 40
Range = Largest value – Smallest value = 58 – 8 = 50

E.g. Find the range for the following data:
x   2    4    6    8    10
f   13   10   15   12   9
Range = Largest value – Smallest value = 10 – 2 = 8

RANGE FOR GROUPED DATA

E.g. Find the range for the following data:

Daily Sales (RM'000)   Class Boundaries   No. of shops
3 – 5                  2.5 – 5.5          12
6 – 8                  5.5 – 8.5          20
9 – 11                 8.5 – 11.5         13
12 – 14                11.5 – 14.5        8

Range = Upper boundary of last class – lower boundary of first class = 14.5 – 2.5 = 12
Thus, the range is RM 12,000.

INTERQUARTILE RANGE AND QUARTILE DEVIATION

• The interquartile range is the difference between the third quartile and the first quartile values in a data set. The formula is the same for ungrouped and grouped data:
Interquartile Range (IQR) = Q3 − Q1
• The quartile deviation measures the spread (dispersion) of the quartiles. The formula is:
Quartile deviation (QD) = $\frac{Q_3 - Q_1}{2}$
Quartile deviation is sometimes known as the semi-interquartile range.

INTERQUARTILE RANGE AND QUARTILE DEVIATION

E.g. 1: Compute the interquartile range and the quartile deviation for the following data:
15, 7, 23, 41, 28, 37, 46, 25, 14, 17

1. Calculate the first quartile and the third quartile.
Rearrange the data: 7, 14, 15, 17, 23, 25, 28, 37, 41, 46
Position of Q1 = (n + 1)/4 = (10 + 1)/4 = 2.75, so Q1 = 14 + 0.75(15 – 14) = 14.75
Position of Q3 = 3(n + 1)/4 = 3(10 + 1)/4 = 8.25, so Q3 = 37 + 0.25(41 – 37) = 38
2. Therefore,
Interquartile range = Q3 − Q1 = 38 – 14.75 = 23.25
Quartile deviation = $\frac{Q_3 - Q_1}{2} = \frac{23.25}{2} = 11.625$

E.g. 2: Compute the interquartile range and the quartile deviation for the following grouped data:

No. of cars owned   No. of families   Cumulative frequency
1                   14                14
2                   18                32
3                   16                48
4                   2                 50

Compute the positions of the lower and upper quartiles:
Position of $Q_1 = \frac{\sum f}{4} = \frac{50}{4} = 12.5$, position of $Q_3 = \frac{3 \sum f}{4} = 3(12.5) = 37.5$
From the cumulative frequency column, Q1 = 1 car and Q3 = 3 cars.
Therefore,
IQR = Q3 − Q1 = 3 – 1 = 2 cars
QD = $\frac{Q_3 - Q_1}{2} = \frac{2}{2} = 1$ car
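Range, interquartile range and quartile deviation for ungrouped data can be computed together in Python. This sketch reuses the (n + 1)/4 position rule with interpolation from the quartile section and assumes at least 3 observations (`quartile` and `dispersion_summary` are illustrative names):

```python
def quartile(sorted_data, i):
    """i-th quartile of already-sorted data via the i(n + 1)/4 position rule."""
    position = i * (len(sorted_data) + 1) / 4
    whole = int(position)
    fraction = position - whole
    value = sorted_data[whole - 1]
    if fraction == 0:
        return value
    return value + fraction * (sorted_data[whole] - value)


def dispersion_summary(values):
    """Range, interquartile range (Q3 - Q1) and quartile deviation ((Q3 - Q1) / 2)."""
    data = sorted(values)
    q1, q3 = quartile(data, 1), quartile(data, 3)
    return {
        "range": data[-1] - data[0],
        "IQR": q3 - q1,
        "QD": (q3 - q1) / 2,
    }


print(dispersion_summary([15, 7, 23, 41, 28, 37, 46, 25, 14, 17]))
```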

BOX-AND-WHISKERS DISPLAY (BOX PLOT)

• Another way to describe the distribution of data is the box-and-whisker plot, sometimes called a box plot, which is developed from the measures of position.
• It is based on a five-number summary: the smallest value of the measurements, the first quartile, the median, the third quartile and the largest value of the measurements.
• It can be used to determine the shape of the data distribution.

E.g. The following data show the daily rates (RM) of 25 hotel rooms in a city:
70 78 89 90 98 102 104 108 110 112
112 115 115 116 118 118 120 120 124 126
126 126 128 128 200
Construct a box-and-whisker display and determine whether the distribution is skewed to the right, skewed to the left or symmetrical.

Smallest value of the measurements = 70
Q1 = 103
Median / Q2 = 115
Q3 = 125
Largest value of the measurements = 200

Smallest value   Q1    Q2    Q3    Largest value
70               103   115   125   200
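The five-number summary can be sketched in Python using the same (n + 1)/4 quartile rule as the notes (`quartile` and `five_number_summary` are illustrative names). The script also prints the distances Q2 − Q1 and Q3 − Q2 that the box-plot shape criteria in this section compare:

```python
def quartile(sorted_data, i):
    """i-th quartile of already-sorted data via the i(n + 1)/4 position rule."""
    position = i * (len(sorted_data) + 1) / 4
    whole = int(position)
    fraction = position - whole
    value = sorted_data[whole - 1]
    if fraction == 0:
        return value
    return value + fraction * (sorted_data[whole] - value)


def five_number_summary(values):
    """(smallest, Q1, Q2, Q3, largest)."""
    data = sorted(values)
    return (data[0], quartile(data, 1), quartile(data, 2),
            quartile(data, 3), data[-1])


rates = [70, 78, 89, 90, 98, 102, 104, 108, 110, 112,
         112, 115, 115, 116, 118, 118, 120, 120, 124, 126,
         126, 126, 128, 128, 200]
smallest, q1, q2, q3, largest = five_number_summary(rates)
print(smallest, q1, q2, q3, largest)
print("Q2 - Q1 =", q2 - q1, " Q3 - Q2 =", q3 - q2)
```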

BOX-AND-WHISKERS DISPLAY (BOX PLOT)

A box plot can show the shape of a data distribution by comparing:
1) Symmetrical distribution: the distance from Q1 to Q2 is equal to the distance from Q2 to Q3.
2) Positively-skewed distribution (skewed to the right): the distance from Q1 to Q2 is much shorter than the distance from Q2 to Q3.
3) Negatively-skewed distribution (skewed to the left): the distance from Q1 to Q2 is longer than the distance from Q2 to Q3.


VARIANCE & STANDARD DEVIATION

• The variance measures the average squared deviation of the values from the mean (for sample data, the sum of the squared deviations is divided by n − 1).
• It determines how the values vary around the mean value.
• The standard deviation is the square root of the variance.

Ungrouped data
Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$

Grouped data
Variance: $s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f - 1}$ or $s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{\sum f}}{\sum f - 1}$
Standard deviation: $s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f - 1}}$ or $s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{\sum f}}{\sum f - 1}}$

VARIANCE & STANDARD DEVIATION FOR UNGROUPED DATA

E.g. For the following data, calculate the variance and the standard deviation:
3, 6, 9, 12, 15, 18
1. $\sum x^2 = 3^2 + 6^2 + 9^2 + 12^2 + 15^2 + 18^2 = 819$
2. $\sum x = 3 + 6 + 9 + 12 + 15 + 18 = 63$
3. n = 6

Variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{819 - \frac{63^2}{6}}{6 - 1} = 31.5$
Standard deviation, $s = \sqrt{31.5} = 5.612$
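The shortcut variance formula can be sketched in Python and cross-checked against the standard library's `statistics.variance`, which also computes the sample variance with divisor n − 1 (`sample_variance` is an illustrative name):

```python
import math
import statistics


def sample_variance(values):
    """Shortcut formula: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [3, 6, 9, 12, 15, 18]
variance = sample_variance(data)
sd = math.sqrt(variance)
print(variance, round(sd, 3))
# Cross-check against the standard library's sample variance.
print(math.isclose(variance, statistics.variance(data)))
```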

VARIANCE & STANDARD DEVIATION FOR GROUPED DATA

E.g. Calculate the variance and the standard deviation for the following data:

Number of children (x)   Number of families (f)   fx          fx²
0                        5                        0           0
1                        12                       12          12
2                        14                       28          56
3                        10                       30          90
4                        9                        36          144
                         Σf = 50                  Σfx = 106   Σfx² = 302

Variance: $s^2 = \frac{\sum f x^2 - \frac{(\sum f x)^2}{\sum f}}{\sum f - 1} = \frac{302 - \frac{106^2}{50}}{50 - 1} = 1.577$
Standard deviation: $s = \sqrt{1.577} = 1.256 \approx 1$ child

COEFFICIENT OF VARIATION (CV)

• The coefficient of variation measures the ratio of the standard deviation to the mean, expressed as a percentage:
$CV = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100\%$
• The CV can be used to compare two or more sets of data measured in different units.
• When comparing coefficients of variation, the larger the CV, the larger the variability of the data (i.e. the data are less consistent).

If the CV of distribution A is greater than the CV of distribution B, then distribution A is more dispersed (more spread out) than distribution B. That is, distribution B is more consistent (less dispersed, more stable, more uniform) than distribution A.
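Returning to the grouped variance example, the grouped shortcut formula can be sketched in Python (`grouped_variance` is an illustrative name):

```python
import math


def grouped_variance(values, frequencies):
    """(sum f x^2 - (sum f x)^2 / sum f) / (sum f - 1)."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(values, frequencies))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


children = [0, 1, 2, 3, 4]
families = [5, 12, 14, 10, 9]
variance = grouped_variance(children, families)
print(round(variance, 3), round(math.sqrt(variance), 3))
```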

COEFFICIENT OF VARIATION (CV)

E.g. The mean and standard deviation of the time taken (in minutes) by a group of Finance students to answer the same quiz were 16.45 and 5.42 respectively, whereas for the Statistics students, the mean and standard deviation were 14.7 and 6.11. Determine which group of students was relatively more consistent in the time taken to answer the quiz.

• $CV_{Finance} = \frac{5.42}{16.45} \times 100 = 32.95\%$
• $CV_{Statistics} = \frac{6.11}{14.7} \times 100 = 41.56\%$

Since CV Finance < CV Statistics, the Finance students are more consistent.

COEFFICIENT OF SKEWNESS

• The coefficient of skewness measures the skewness of the data, and hence the shape of the data distribution. The shape is determined as follows:
1) Symmetrical distribution: skewness = 0
2) Positively-skewed distribution (skewed to the right): skewness > 0 (positive)
3) Negatively-skewed distribution (skewed to the left): skewness < 0 (negative)

Basically, there are two formulas that can be used for estimating the coefficient of skewness of the data:
Pearson's Coefficient of Skewness I: $CS_I = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{x} - \hat{x}}{s}$
Pearson's Coefficient of Skewness II: $CS_{II} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(\bar{x} - \tilde{x})}{s}$
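The CV and the two Pearson coefficients are one-line formulas. A minimal Python sketch (all function names are illustrative); the `shape` helper applies the sign rule above, and its exact comparison with 0 is a simplification that ignores floating-point noise:

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: sd / mean x 100."""
    return sd / mean * 100


def pearson_skewness_mode(mean, mode, sd):
    """Pearson's coefficient of skewness I: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Pearson's coefficient of skewness II: 3(mean - median) / sd."""
    return 3 * (mean - median) / sd


def shape(skewness):
    """Classify the distribution by the sign of the coefficient of skewness."""
    if skewness > 0:
        return "positively skewed"
    if skewness < 0:
        return "negatively skewed"
    return "symmetrical"


cv_finance = coefficient_of_variation(5.42, 16.45)
cv_statistics = coefficient_of_variation(6.11, 14.7)
print(f"CV Finance = {cv_finance:.2f}%, CV Statistics = {cv_statistics:.2f}%")
```

The two skewness functions reproduce the worked examples that follow: (4 − 5)/0.5 = −2 and 3(2.55 − 1.89)/1.23 ≈ 1.61.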

COEFFICIENT OF SKEWNESS

1. Given that the mean, mode and standard deviation of a set of data are 4, 5 and 0.5 respectively, find the Pearson coefficient of skewness and explain the distribution.
Skewness = $\frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{4 - 5}{0.5} = -2$
The skewness value is negative, thus the distribution of the data set is negatively skewed.

2. Given that the mean, median and standard deviation of daily sales of Be-Tec electronic are 2.55, 1.89 and 1.23 respectively, calculate the coefficient of skewness and state the shape of the distribution of daily sales.
Skewness = $\frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(2.55 - 1.89)}{1.23} = 1.61$
The skewness value is positive, thus the distribution of daily sales is positively skewed.

MEASURES OF DISPERSION AND SKEWNESS

The following data show the monthly sales (RM '000) of 30 sundry shops taken randomly from Town X:

Monthly Sales (RM '000)   Number of Shops
5 < 7                     8
7 < 9                     12
9 < 11                    5
11 < 13                   2
13 < 15                   3

a) Compute the mean monthly sales of the 30 sundry shops.
b) Calculate the median monthly sales and explain its meaning.
c) Find the standard deviation of monthly sales.
d) Determine the shape of the data distribution using the coefficient of skewness.
e) If, in Town B, the mean and standard deviation of monthly sales are RM 12,000 and RM 3,000 respectively, sundry shops in which town have more dispersed monthly sales? Show this using an appropriate calculation.

QMT181: Introduction to Statistics
TOPIC 3: CORRELATION AND REGRESSION
Nurul Fatin Azara Binti Zulkarnain
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM)

LEARNING OUTCOMES
At the end of this chapter, you should be able to:
• Determine the least squares regression line, y = a + bx
• Interpret a and b
• Use the regression model for prediction purposes

SIMPLE LINEAR REGRESSION

• Simple linear regression describes a linear relationship involving only two variables.
• One is the dependent variable (y) while the other is the independent variable (x).
• The dependent variable is the variable in regression that cannot be controlled or manipulated.
• Besides helping to determine the type of relationship between two variables, regression analysis also helps predict the dependent variable when the independent variable is given.

Linear regression equation:
$y = a + bx$
a is the Y-intercept of the regression line.
b is the slope, which indicates the change in E(Y) (the mean of the response variable) for every unit increase in X.

[Figure: regression line with the Y-intercept a and the slope b marked]
E.g. Interpreting the slope b when Final Marks are regressed on Test Marks:
• For every 1 mark increase in Test Marks, the average Final Marks increases by 0.5.
• For every 10 marks increase in Test Marks, the average Final Marks increases by 5.

SIMPLE LINEAR REGRESSION

[Figure: scatter plot]

The values of a and b can be obtained using the least squares method:
$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ or $b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
$a = \frac{\sum y}{n} - b \frac{\sum x}{n}$ or $a = \bar{y} - b \bar{x}$

SIMPLE LINEAR REGRESSION

Slope b
• If the slope has a positive value, a positive relationship exists between the two variables: for every unit increase in the independent variable (x), the dependent variable (y) increases by b units.
• If the slope has a negative value, −b, a negative relationship exists: for every unit increase in the independent variable (x), the dependent variable (y) decreases by b units.

Intercept a
• When the independent variable x = 0, the dependent variable y would be a.

Example:
A lecturer wants to know the relationship between the number of study hours in a week and the GPA obtained by 10 students selected randomly from a class. The data below gives the following results:

Number of study hours   GPA
9                       3.20
7                       3.00
10                      3.15
6                       2.84
7                       2.98
8                       3.05
12                      3.48
4                       2.01
5                       2.28
6                       2.90

Number of study hours (x)   GPA (y)   x²    xy
9                           3.2       81    28.8
7                           3         49    21
10                          3.15      100   31.5
6                           2.84      36    17.04
7                           2.98      49    20.86
8                           3.05      64    24.4
12                          3.48      144   41.76
4                           2.01      16    8.04
5                           2.28      25    11.4
6                           2.9       36    17.4
Σx = 74                     Σy = 28.89   Σx² = 600   Σxy = 222.2

$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10(222.2) - (74)(28.89)}{10(600) - 74^2} = 0.1606$
$a = \frac{\sum y}{n} - b \frac{\sum x}{n} = \frac{28.89}{10} - 0.1606 \left( \frac{74}{10} \right) = 1.70056$

Therefore, the regression equation is: y = 1.70056 + 0.1606x
• a = 1.70056 means that when the number of study hours is zero, the estimated GPA is 1.70.
• b = 0.1606 means that for every one-hour increase in study hours, the GPA is estimated to increase by 0.1606.

Estimate the GPA obtained by Amir if he studies for 11 hours in a week.
When x = 11, y = 1.70056 + 0.1606(11) = 3.47
Therefore, Amir's GPA is estimated to be 3.47 if he studies for 11 hours in a week.
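The least squares calculation can be sketched in Python with the first pair of formulas above (`least_squares` is an illustrative name):

```python
def least_squares(x, y):
    """Return (a, b) for y = a + bx using the least squares formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


hours = [9, 7, 10, 6, 7, 8, 12, 4, 5, 6]
gpa = [3.20, 3.00, 3.15, 2.84, 2.98, 3.05, 3.48, 2.01, 2.28, 2.90]
a, b = least_squares(hours, gpa)
print(f"y = {a:.4f} + {b:.4f}x")
print(f"Predicted GPA for 11 hours: {a + b * 11:.2f}")
```

Because b is not rounded to 0.1606 before a is computed, the intercept comes out at about 1.7008 rather than 1.70056; the predicted GPA for 11 hours still rounds to 3.47.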

SIMPLE LINEAR REGRESSION

Example:

A school headmaster plans to increase the tuition class period (hours) in order to decrease the number of students who fail their examination. He randomly collected information from other schools in town for his plan. The table below shows the data gathered.

School   Tuition class period (hours)   Number of students who fail
A        8                              17
B        10                             13
C        18                             5
D        20                             4
E        10                             7
F        5                              21
G        4                              19
H        5                              6

i) Determine the regression equation line using the least squares method.
ii) Estimate the number of students who fail if there were twelve hours of tuition class period.

QMT181 – Introduction to Statistics
TOPIC 4: INDEX NUMBER
Nurul Fatin Azara Binti Zulkarnain
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM)

LEARNING OUTCOMES
At the end of this chapter, you should be able to:
• Describe the term index.
• Compute index numbers using the unweighted index methods.
• Compute index numbers using the weighted index methods.

WHY COMPUTE INDICES?
• Indices provide a convenient way to express the change in the total of a heterogeneous group of items.
• Indices (percent changes) are easier to comprehend than actual numbers. Which is easier to grasp: RM 345,651,289,560 or 10%?

INDEX
An index number expresses the relative change in price, quantity, or value compared to a base period.

UNWEIGHTED: deals with items that have the same level of importance.
• Simple Relative Index
• Average Relative Index
• Simple Aggregate Index

WEIGHTED: deals with items that have different levels of importance.
• Weighted Relative Index
• Weighted Aggregate Index
• Laspeyres' Index
• Paasche's Index

SIMPLE RELATIVE INDEX NUMBER

A simple relative price index number is the ratio of the price of a single item in a given period t to its price in the base period 0. Similarly, a simple relative quantity index number is the ratio of the quantity of a single item in a given period t to its quantity in the base period 0.

Simple relative price index number:
$I = \dfrac{P_t}{P_0} \times 100$
where
$P_t$ = price of the item in the current year,
$P_0$ = price of the item in the base year.

Simple relative quantity index number:
$I = \dfrac{Q_t}{Q_0} \times 100$
where
$Q_t$ = quantity of the item in the current year,
$Q_0$ = quantity of the item in the base year.

Example:
The table below shows the price of fish from 2001 to 2005. Taking 2002 as the base year, find the simple relative price index of fish for 2001 and 2005. Explain the results obtained.

Year   Selling price per kg (RM)
2001   5.00
2002   6.50
2003   7.50
2004   8.00
2005   10.00

Base year = 2002, current year = 2001:
$I_{2001} = \dfrac{P_{2001}}{P_{2002}} \times 100 = \dfrac{5.00}{6.50} \times 100 = 76.92\%$

Base year = 2002, current year = 2005:
$I_{2005} = \dfrac{P_{2005}}{P_{2002}} \times 100 = \dfrac{10.00}{6.50} \times 100 = 153.85\%$

The fish price in 2001 was 23.08% (100 − 76.92) lower than the price in 2002. However, the fish price increased by 53.85% (153.85 − 100) from 2002 to 2005.
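As a quick check on the arithmetic, here is a minimal Python sketch of the simple relative index formula. The helper name `simple_relative_index` is our own illustration, not part of the course notes. It reproduces the fish-price results and the iron production result (800 against 450 tons).

```python
def simple_relative_index(current, base):
    """Simple relative index: (current / base) * 100, in percent."""
    return current / base * 100


fish_price = {2001: 5.00, 2002: 6.50, 2003: 7.50, 2004: 8.00, 2005: 10.00}
base_year = 2002
for year in (2001, 2005):
    index = simple_relative_index(fish_price[year], fish_price[base_year])
    print(year, f"{index:.2f}%")

# Iron production (millions of tons), base year 2001
iron = {2001: 450, 2003: 800}
print(f"{simple_relative_index(iron[2003], iron[2001]):.2f}%")
```

The same function serves for prices and quantities, since only the ratio of the current value to the base value matters.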

SIMPLE RELATIVE INDEX NUMBER

Example:
The table below shows the quantity of iron produced by a country, in millions of tons, from 2000 to 2003. Find the simple relative quantity index for 2003 with 2001 as the base year.

Year   Production of iron (millions of tons)
2000   300
2001   450
2002   520
2003   800

Base year = 2001, current year = 2003:
$I_{2003} = \dfrac{Q_{2003}}{Q_{2001}} \times 100 = \dfrac{800}{450} \times 100 = 177.78\%$

The quantity of iron produced increased by 77.78% (177.78 − 100) from 2001 to 2003.

AVERAGE RELATIVE INDEX NUMBER

An average relative price index number measures the average change in the prices of several items: it is the sum of the price relatives of all the items divided by the number of items. Likewise, an average relative quantity index number is the sum of the quantity relatives of all the items divided by the number of items.

Average relative price index number:
$I = \dfrac{1}{n}\sum \dfrac{P_t}{P_0} \times 100$
where
$P_t$ = price of the item in the current year,
$P_0$ = price of the item in the base year,
n = number of items involved.

Average relative quantity index number:
$I = \dfrac{1}{n}\sum \dfrac{Q_t}{Q_0} \times 100$
where
$Q_t$ = quantity of the item in the current year,
$Q_0$ = quantity of the item in the base year,
n = number of items involved.

AVERAGE RELATIVE INDEX NUMBER

Example:
The table shows the prices per kilogram and quantities for three different food items, namely A, B and C, in 2001 and 2003. Find the average relative price and quantity indices for these three items in 2003, taking 2001 as the base year.

Food item   Price per kilogram    Quantity
            2001     2003         2001    2003
A           2.00     3.00         140     195
B           4.00     3.20         750     950
C           15.00    30.00        490     670

Base year = 2001, current year = 2003.

Average relative price index:
$I_{2003} = \dfrac{1}{3}\left[\dfrac{3.00}{2.00} + \dfrac{3.20}{4.00} + \dfrac{30.00}{15.00}\right] \times 100 = \dfrac{1}{3}\left[150 + 80 + 200\right] = 143.33\%$

Average relative quantity index:
$I_{2003} = \dfrac{1}{3}\left[\dfrac{195}{140} + \dfrac{950}{750} + \dfrac{670}{490}\right] \times 100 = \dfrac{1}{3}\left[139.29 + 126.67 + 136.73\right] = 134.23\%$

The average price of these three food items increased by 43.33% from 2001 to 2003.
The average quantity of these three food items increased by 34.23% from 2001 to 2003.

SIMPLE AGGREGATE PRICE INDEX NUMBER

A simple aggregate price index number for a given time t is calculated by expressing the total of the item prices in year t as a percentage of the total of the item prices in the base year. By the same definition, a simple aggregate quantity index number is calculated by expressing the total of the item quantities in year t as a percentage of the total of the item quantities in the base year.

Simple aggregate price index number:
$I = \dfrac{\sum P_t}{\sum P_0} \times 100$

Simple aggregate quantity index number:
$I = \dfrac{\sum Q_t}{\sum Q_0} \times 100$

where
$\sum P_t$ = total price of a group of items in the current year,
$\sum P_0$ = total price of a group of items in the base year,
$\sum Q_t$ = total quantity of a group of items in the current year,
$\sum Q_0$ = total quantity of a group of items in the base year.
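A minimal Python sketch of the average relative index, assuming the items are listed in the same order (A, B, C) in every list; the function and variable names are illustrative. It reproduces both results of the food-item example.

```python
def average_relative_index(current, base):
    """Mean of the item relatives (current_i / base_i), times 100."""
    relatives = [c / b for c, b in zip(current, base)]
    return sum(relatives) / len(relatives) * 100


price_2001 = [2.00, 4.00, 15.00]   # items A, B, C
price_2003 = [3.00, 3.20, 30.00]
qty_2001 = [140, 750, 490]
qty_2003 = [195, 950, 670]

print(f"price:    {average_relative_index(price_2003, price_2001):.2f}%")
print(f"quantity: {average_relative_index(qty_2003, qty_2001):.2f}%")
```

Each item contributes its own ratio equally, so a cheap item whose price rises by 50% counts as much as an expensive one that rises by 50%.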

SIMPLE AGGREGATE INDEX NUMBER

Example:
The table shows the prices per kilogram and quantities for three different food items, namely A, B and C, in 2001 and 2003. Calculate the simple aggregate price index and the simple aggregate quantity index, taking 2001 as the base year.

Food item   Price per kilogram    Quantity
            2001     2003         2001    2003
A           2.00     3.00         140     195
B           4.00     3.20         750     950
C           15.00    30.00        490     670

Simple aggregate price index number:
$I_{2003} = \dfrac{\sum P_t}{\sum P_0} \times 100 = \dfrac{3.00 + 3.20 + 30.00}{2.00 + 4.00 + 15.00} \times 100 = \dfrac{36.20}{21.00} \times 100 = 172.38\%$

The total price of these three food items increased by 72.38% from 2001 to 2003.

Simple aggregate quantity index number:
$I_{2003} = \dfrac{\sum Q_t}{\sum Q_0} \times 100 = \dfrac{195 + 950 + 670}{140 + 750 + 490} \times 100 = \dfrac{1815}{1380} \times 100 = 131.52\%$

The total quantity of these three food items increased by 31.52% from 2001 to 2003.

EXERCISE
The table below shows the prices and price indices for Mr. Loo's various expenditures. (Note: 2000 = 100.)

Expenditure   Price (RM)         Price index
              2000     2003
Food          900      1089      121
Transport     x        480       120
Clothing      300      336       z
Utility       600      y         117
Rental        250      275       110

Find the values of x, y and z.
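The simple aggregate index differs from the average relative index only in where the division happens: the totals are formed first, then a single ratio is taken. A minimal Python sketch (illustrative names), using the food-item data from the worked example:

```python
def simple_aggregate_index(current, base):
    """Sum of current-period values as a percentage of the sum of base-period values."""
    return sum(current) / sum(base) * 100


price_2001 = [2.00, 4.00, 15.00]   # items A, B, C
price_2003 = [3.00, 3.20, 30.00]
qty_2001 = [140, 750, 490]
qty_2003 = [195, 950, 670]

print(f"price index:    {simple_aggregate_index(price_2003, price_2001):.2f}%")
print(f"quantity index: {simple_aggregate_index(qty_2003, qty_2001):.2f}%")
```

Because prices are summed before dividing, the most expensive item (C, which doubles from 15.00 to 30.00) dominates the aggregate price index. This is why the aggregate result (172.38%) is higher than the average relative result (143.33%) for the same data.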


WEIGHTED RELATIVE INDEX NUMBER

• A weighted relative price index number is the ratio of the price of a single item in a given period t to its price in the base period 0, weighted by the item's importance, W.
• A weighted relative quantity index number is defined in the same way, as the ratio of the quantity of a single item in a given period t to its quantity in the base period 0, weighted by the item's importance, W.

Weighted relative price and quantity index numbers:
$I_P = \dfrac{\sum \left(\dfrac{P_t}{P_0} \times W\right)}{\sum W} \times 100, \qquad I_Q = \dfrac{\sum \left(\dfrac{Q_t}{Q_0} \times W\right)}{\sum W} \times 100$

where
$P_t$/$Q_t$ = price/quantity of the item in the current year,
$P_0$/$Q_0$ = price/quantity of the item in the base year,
W = weight of the item.

WEIGHTED RELATIVE INDEX NUMBER

Example:
The table shows the prices, in Ringgit Malaysia, of 4 items in 2002 and 2003, along with the weight of each item. Calculate the weighted relative price index for 2003, using 2002 as the base year.

Items   2002     2003     Weight
A       52.00    55.12    4
B       24.00    24.72    3
C       141.80   184.34   1
D       87.60    100.74   2

$I_{2003} = \dfrac{\sum \left(\dfrac{P_{2003}}{P_{2002}} \times W\right)}{\sum W} \times 100 = \dfrac{\dfrac{55.12}{52.00}(4) + \dfrac{24.72}{24.00}(3) + \dfrac{184.34}{141.80}(1) + \dfrac{100.74}{87.60}(2)}{4 + 3 + 1 + 2} \times 100$
$= \dfrac{1.06(4) + 1.03(3) + 1.30(1) + 1.15(2)}{10} \times 100 = \dfrac{10.93}{10} \times 100 = 109.3\%$

The weighted relative price of these four items increased by 9.3% from 2002 to 2003.

WEIGHTED AGGREGATE INDEX NUMBER

• To show the level of importance of each item, a weight, $W_i$, can be assigned to every item.

Weighted aggregate price and quantity index numbers:
$I_P = \dfrac{\sum P_t W}{\sum P_0 W} \times 100, \qquad I_Q = \dfrac{\sum Q_t W}{\sum Q_0 W} \times 100$

where
$P_t$/$Q_t$ = price/quantity of the item in the current year,
$P_0$/$Q_0$ = price/quantity of the item in the base year,
W = weight of the item.
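The weighted relative index for items A to D can be reproduced with a minimal Python sketch; the function name and the assumption that all lists share the same item order are ours.

```python
def weighted_relative_index(current, base, weights):
    """Weighted mean of the relatives (current / base), times 100."""
    weighted_sum = sum(c / b * w for c, b, w in zip(current, base, weights))
    return weighted_sum / sum(weights) * 100


price_2002 = [52.00, 24.00, 141.80, 87.60]    # items A, B, C, D
price_2003 = [55.12, 24.72, 184.34, 100.74]
weights = [4, 3, 1, 2]

print(f"{weighted_relative_index(price_2003, price_2002, weights):.1f}%")
```

Item C has the largest price rise (30%) but the smallest weight, so it pulls the index up far less than it would in an unweighted average.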

WEIGHTED AGGREGATE INDEX NUMBER

Example:
The table shows the prices per kilogram and quantities for three different food items, namely A, B and C, in 2001 and 2003. Find the weighted aggregate price and quantity indices for these three items in 2003 with 2001 as the base year.

Food item   Price per kg       Quantity       Weight
            2001     2003      2001    2003
A           2.00     3.00      140     195     6
B           4.00     3.20      750     950     4
C           15.00    30.00     490     670     2

$I_{2003} = \dfrac{\sum P_{2003}W}{\sum P_{2001}W} \times 100 = \dfrac{3.00(6) + 3.20(4) + 30.00(2)}{2.00(6) + 4.00(4) + 15.00(2)} \times 100 = \dfrac{90.8}{58} \times 100 = 156.55\%$

The weighted aggregate price of these three food items increased by 56.55% from 2001 to 2003.

$I_{2003} = \dfrac{\sum Q_{2003}W}{\sum Q_{2001}W} \times 100 = \dfrac{195(6) + 950(4) + 670(2)}{140(6) + 750(4) + 490(2)} \times 100 = \dfrac{6310}{4820} \times 100 = 130.91\%$

The weighted aggregate quantity of these three food items increased by 30.91% from 2001 to 2003.

LASPEYRES INDEX NUMBER

Laspeyres' price index uses the base-year quantities as the weights, whereas Laspeyres' quantity index uses the base-year prices as the weights.

Laspeyres price index number:
$I = \dfrac{\sum P_t Q_0}{\sum P_0 Q_0} \times 100$
where
$P_t$ = price of the item in the current year,
$P_0$ = price of the item in the base year,
$Q_0$ = quantity of the item in the base year.

Laspeyres quantity index number:
$I = \dfrac{\sum Q_t P_0}{\sum Q_0 P_0} \times 100$
where
$Q_t$ = quantity of the item in the current year,
$Q_0$ = quantity of the item in the base year,
$P_0$ = price of the item in the base year.
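A minimal Python sketch of the weighted aggregate index for the food-item example (weights 6, 4 and 2). The function name is illustrative; the same function computes the price index or the quantity index depending on which values are passed in.

```python
def weighted_aggregate_index(current, base, weights):
    """Sum of current*weight as a percentage of the sum of base*weight."""
    numerator = sum(c * w for c, w in zip(current, weights))
    denominator = sum(b * w for b, w in zip(base, weights))
    return numerator / denominator * 100


weights = [6, 4, 2]                 # items A, B, C
price_2001 = [2.00, 4.00, 15.00]
price_2003 = [3.00, 3.20, 30.00]
qty_2001 = [140, 750, 490]
qty_2003 = [195, 950, 670]

print(f"price:    {weighted_aggregate_index(price_2003, price_2001, weights):.2f}%")
print(f"quantity: {weighted_aggregate_index(qty_2003, qty_2001, weights):.2f}%")
```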

PAASCHE INDEX NUMBER

Paasche's price index uses the current-year quantities as the weights, and Paasche's quantity index uses the current-year prices as the weights.

Paasche price index number:
$I = \dfrac{\sum P_t Q_t}{\sum P_0 Q_t} \times 100$
where
$P_t$ = price of the item in the current year,
$P_0$ = price of the item in the base year,
$Q_t$ = quantity of the item in the current year.

Paasche quantity index number:
$I = \dfrac{\sum Q_t P_t}{\sum Q_0 P_t} \times 100$
where
$Q_t$ = quantity of the item in the current year,
$Q_0$ = quantity of the item in the base year,
$P_t$ = price of the item in the current year.

LASPEYRES INDEX NUMBER & PAASCHE INDEX NUMBER

Example:
The table shows the quantity (in tons) and price (in thousands of ringgit) of three items, A, B and C, produced by a company in 2001 and 2002. Taking 2001 as the base year, find the Laspeyres index and the Paasche index for the price and quantity of the items produced in 2002.

Items   2001                2002
        Price   Quantity    Price   Quantity
A       30      20          35      30
B       40      32          40      25
C       10      60          12      90

Laspeyres index number:

$I_{2002} = \dfrac{\sum P_t Q_0}{\sum P_0 Q_0} \times 100 = \dfrac{35(20) + 40(32) + 12(60)}{30(20) + 40(32) + 10(60)} \times 100 = \dfrac{2700}{2480} \times 100 = 108.87\%$

The total price of these three items increased by 8.87% from 2001 to 2002.

$I_{2002} = \dfrac{\sum Q_t P_0}{\sum Q_0 P_0} \times 100 = \dfrac{30(30) + 25(40) + 90(10)}{20(30) + 32(40) + 60(10)} \times 100 = \dfrac{2800}{2480} \times 100 = 112.90\%$

The total quantity of these three items increased by 12.90% from 2001 to 2002.
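The Laspeyres results can be checked with a minimal Python sketch; the function names are our own, and the lists follow the item order A, B, C. Note that both indices use base-year (2001) values as the weights.

```python
def laspeyres_price_index(p0, pt, q0):
    """Sum(Pt*Q0) / Sum(P0*Q0) * 100: base-year quantities are the weights."""
    return sum(p * q for p, q in zip(pt, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100


def laspeyres_quantity_index(q0, qt, p0):
    """Sum(Qt*P0) / Sum(Q0*P0) * 100: base-year prices are the weights."""
    return sum(q * p for q, p in zip(qt, p0)) / sum(q * p for q, p in zip(q0, p0)) * 100


# Items A, B, C; prices in thousands of ringgit, quantities in tons
p2001, q2001 = [30, 40, 10], [20, 32, 60]
p2002, q2002 = [35, 40, 12], [30, 25, 90]

print(f"Laspeyres price index:    {laspeyres_price_index(p2001, p2002, q2001):.2f}%")
print(f"Laspeyres quantity index: {laspeyres_quantity_index(q2001, q2002, p2001):.2f}%")
```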

LASPEYRES INDEX NUMBER & PAASCHE INDEX NUMBER

Using the same data for items A, B and C in 2001 and 2002 (quantity in tons, price in thousands of ringgit, 2001 as the base year):

Paasche index number:

$I_{2002} = \dfrac{\sum P_t Q_t}{\sum P_0 Q_t} \times 100 = \dfrac{35(30) + 40(25) + 12(90)}{30(30) + 40(25) + 10(90)} \times 100 = \dfrac{3130}{2800} \times 100 = 111.79\%$

The total price of these three items increased by 11.79% from 2001 to 2002.

$I_{2002} = \dfrac{\sum Q_t P_t}{\sum Q_0 P_t} \times 100 = \dfrac{30(35) + 25(40) + 90(12)}{20(35) + 32(40) + 60(12)} \times 100 = \dfrac{3130}{2700} \times 100 = 115.93\%$

The total quantity of these three items increased by 15.93% from 2001 to 2002.

EXERCISE
The price per kilogram and quantity produced of vegetables for 2019 and 2020 are given as follows:

Items   2019                              2020
        Price (RM)   Quantity ('00 kg)    Price (RM)   Quantity ('00 kg)
A       2.50         173.6                3            203.3
B       4.50         160.6                3            210.4
C       6.50         148.3                10           187.9

Using 2019 as the base year,
i) Calculate the average relative price index for 2020.
ii) Calculate the Laspeyres quantity index for 2020. Explain the meaning of the value calculated.
iii) Calculate the Paasche price index for 2020.
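A matching minimal Python sketch for the Paasche indices (illustrative names, item order A, B, C). The only change from Laspeyres is that the current-year (2002) values become the weights, which is why the two methods give different answers for the same data.

```python
def paasche_price_index(p0, pt, qt):
    """Sum(Pt*Qt) / Sum(P0*Qt) * 100: current-year quantities are the weights."""
    return sum(p * q for p, q in zip(pt, qt)) / sum(p * q for p, q in zip(p0, qt)) * 100


def paasche_quantity_index(q0, qt, pt):
    """Sum(Qt*Pt) / Sum(Q0*Pt) * 100: current-year prices are the weights."""
    return sum(q * p for q, p in zip(qt, pt)) / sum(q * p for q, p in zip(q0, pt)) * 100


# Items A, B, C; prices in thousands of ringgit, quantities in tons
p2001, q2001 = [30, 40, 10], [20, 32, 60]
p2002, q2002 = [35, 40, 12], [30, 25, 90]

print(f"Paasche price index:    {paasche_price_index(p2001, p2002, q2002):.2f}%")
print(f"Paasche quantity index: {paasche_quantity_index(q2001, q2002, p2002):.2f}%")
```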

QMT181: Introduction to Statistics
TOPIC 5: TIME SERIES
Nurul Fatin Azara Binti Zulkarnain
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM)

LEARNING OUTCOMES
At the end of this chapter, you should be able to:
• Define the four components of a time series.
• Compute a moving average.
• Determine a linear trend equation.
• Use a trend to forecast future time periods and to develop seasonally adjusted forecasts.
• Determine and interpret a set of seasonal indexes.

INTRODUCTION TO TIME SERIES
• Time series data are collected on a real-life quantity that a researcher is interested in.
• The data are analysed using a computer to give graphical and numerical output.
• The output of the analysis tells us more about the real-life condition.
• The interval between observations can be hourly, daily, weekly, monthly or yearly.
• Time series analysis studies the pattern generated by past data and projects this pattern into the future to produce a forecast.

Time series components:
• Trend
• Cyclical variations
• Seasonal variations
• Irregular variations

TREND
• Trend is the overall long-term direction of the series (usually over more than 10 years).
• The main factors influencing the trend are changes in technology and population.
• The reasons for studying the trend pattern:
Ø It allows the modeller to understand the historical pattern in the data and to determine a suitable model to apply.
Ø It enables the modeller to project the past pattern or trend into the future for forecasting, on the assumption that what happened before will continue to occur in the future.
Ø It enables the modeller to isolate or remove the trend component from the actual data.
• Some examples of trend are:
Ø An increasing number of road accident deaths from year to year.
Ø A declining rate of infant deaths from year to year due to modern medicine.

SEASONAL VARIATION
• A seasonal pattern consists of regular fluctuations occurring within a specific period of time not exceeding one year.
• The fluctuations repeat in the following periods with the same regular pattern.
• Examples:
Ø Weather: the price of fish increases in some months compared with other months.
Ø School holidays: more people travel or go on vacation, which tends to increase the demand for fuel and leaves hotels and flights fully booked.
Ø Festive seasons: the demand for essential goods such as certain food items (flour, eggs and cornflakes), public transport and health care increases.

CYCLICAL VARIATION
• A cycle occurs when a series follows an up-and-down pattern that is not seasonal.
• It refers to the rises and falls of the series over an unspecified period of time, usually around a long-run trend (2 to 10 years).
• The main cause of cyclical variation is the interaction of numerous combinations of factors influencing the economy (prosperity, recession, depression and recovery).
• Some examples of cyclical variation are:
Ø A declining rate of share prices due to economic recession.
Ø A decreasing employment rate due to an economic downturn.

[Figure: a cyclical pattern showing the 1st, 2nd and 3rd cycles]

IRREGULAR VARIATION
• It is unpredictable movement that takes place by chance or randomly.
• Irregular fluctuations occur due to unforeseen events such as natural disasters, wars, etc.
• Some examples of irregular variation:
Ø A decreasing number of tourists flying to New York due to the attack on the World Trade Center on 11 September 2001.
Ø A loss of sales due to floods in Segamat in February 2006.

MOVING AVERAGE METHOD
A non-linear trend can be obtained using the moving averages method. The moving average method is one of the most popular approaches for smoothing out time series data. A moving average consists of a series of averages, where each average is the mean value of the time series over a fixed interval of time.

Simple moving average
Used when the period consists of an odd number of terms. Examples:
Ø Five days a week
Ø Seven days a week
Ø Three shifts a day
Ø Three seasons (terms) a year

Semi averages
Used when the period consists of an even number of terms. Examples:
Ø Four quarters a year
Ø Four weeks a month

FORECAST THE TREND OF OBSERVED VARIABLE USING MOVING AVERAGE METHOD

Example:
The annual profits (in RM'000) of the Easy Company are given as follows:

Year              2001   2002   2003   2004   2005   2006   2007
Profit (RM'000)   280    320    375    350    380    400    410

Determine the trend values using a 3-year moving average method, correct to 2 decimal places.

Year   Profit (RM'000)   3-year moving total        3-year MA, T (Trend) = total ÷ 3
2001   280
2002   320               280 + 320 + 375 = 975      975 / 3 = 325.00
2003   375               1045                       348.33
2004   350               1105                       368.33
2005   380               1130                       376.67
2006   400               1190                       396.67
2007   410
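The 3-year moving average can be sketched in a few lines of Python. This is a minimal illustration (the function name is ours), assuming an odd window length k so that each average sits on the middle year of its window.

```python
def moving_average(values, k):
    """Simple k-term moving averages (k odd); result i is centred on values[i + k // 2]."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


profit = [280, 320, 375, 350, 380, 400, 410]   # 2001 to 2007, RM'000
trend = moving_average(profit, 3)

# The first average is centred on 2002, the middle year of 2001 to 2003
for year, t in zip(range(2002, 2007), trend):
    print(year, f"{t:.2f}")
```

With k = 3 there is no trend value for the first and last year, which matches the blank cells for 2001 and 2007 in the table.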

[Figure: Time Series of Profit (RM'000) and Trends of Easy Company, plotting Profit (RM'000) and the 3-year moving average T (Trend) for 2001 to 2007]

FORECAST THE TREND OF OBSERVED VARIABLE USING MOVING AVERAGE METHOD

Example:
The following table shows the quarterly sales (RM thousand) made by a sports equipment company in Segamat, Johor.

Quarter   2010   2011   2012
1         45     50     57
2         66     73     78
3         57     60     65
4         44     49     56

Determine the trend values using a 4-quarter centred moving average method, correct to 3 decimal places.

FORECAST THE TREND OF OBSERVED VARIABLE USING MOVING AVERAGE METHOD

Year   Quarter   Sales, Y   4Q-MA    Centred-MA (Trend)
2010   1         45
       2         66
                            53
       3         57                  53.625
                            54.25
       4         44                  55.125
                            56
2011   1         50                  56.375
                            56.75
       2         73                  57.375
                            58
       3         60                  58.875
                            59.75
       4         49                  60.375
                            61
2012   1         57                  61.625
                            62.25
       2         78                  63.125
                            64
       3         65
       4         56

For example, the first two 4-quarter moving averages are (45 + 66 + 57 + 44)/4 = 53 and (66 + 57 + 44 + 50)/4 = 54.25, so the first centred moving average is (53 + 54.25)/2 = 53.625. The next one is (54.25 + 56)/2 = 55.125.

[Figure: Time Series and Trend of Sports Equipment Sales, plotting Sales, Y and the Centred-MA Trend by quarter for 2010 to 2012]
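For an even period such as four quarters, each 4-quarter average falls between two quarters, so adjacent pairs are averaged again to centre them. A minimal Python sketch (the function name and the `period` parameter are our own):

```python
def centred_moving_average(values, period=4):
    """Moving averages over `period` terms (even), then averages of adjacent pairs to centre them."""
    ma = [sum(values[i:i + period]) / period for i in range(len(values) - period + 1)]
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]


sales = [45, 66, 57, 44,   # 2010 Q1 to Q4
         50, 73, 60, 49,   # 2011
         57, 78, 65, 56]   # 2012
trend = centred_moving_average(sales)

# trend[0] lines up with 2010 Quarter 3 (position period // 2 in the series)
print(trend)
```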


MEASURING SEASONAL VARIATION

Seasonal variations are fluctuations that coincide with certain seasons and are repeated year after year. Understanding seasonal fluctuations helps in planning sufficient goods and materials on hand to meet varying seasonal demand. Seasonal variation can be estimated using a model known as the multiplicative model. Using the multiplicative model, we assume

Actual data (Y) = Trend (T) × Seasonal variation (S) × Cyclical variation (C) × Irregular variation (I)
or simply Y = T × S × C × I.

Taking Y ≈ T × S, we obtain

$S = \dfrac{Y}{T} \times 100\%$

S is given as a percentage. Its value can therefore be greater than, less than or equal to 100%. S > 100% means the actual data (Y) are above the trend, S < 100% means they are below the trend, and S = 100% means they are exactly equal to the trend.

MEASURING SEASONAL INDEX

Since S is a seasonal variation caused by seasonal factors that occur within a short time period (usually a year) and repeat in each of the following years, we determine only one adjusted S value for each time period. For example, for Quarter 1, 2005, Quarter 1, 2006 and Quarter 1, 2007 we have only one adjusted S value for Quarter 1, irrespective of the year. This adjusted S value is commonly known as the seasonal index.

The seasonal index for a time period t is the adjusted value of the seasonal variations of time period t.

FORECAST THE TREND OF OBSERVED VARIABLE USING MOVING AVERAGE METHOD

Example:
The table below shows electrical usage data for a particular water power company for the years 2007 to 2009.

Year   Quarter   Kilowatt hours (millions)
2007   1         1071
       2         648
       3         480
       4         746
2008   1         965
       2         661
       3         501
       4         768
2009   1         1065
       2         667
       3         486
       4         780

a) Determine the trend values using the moving average method. Give your answers correct to 4 decimal places.

Year   Quarter   Kilowatt hours (millions), Y   4-Q MA    Centred-MA, Trend T
2007   1         1071
       2         648
                                                736.25
       3         480                                      723.0000
                                                709.75
       4         746                                      711.3750
                                                713.00
2008   1         965                                      715.6250
                                                718.25
       2         661                                      721.0000
                                                723.75
       3         501                                      736.2500
                                                748.75
       4         768                                      749.5000
                                                750.25
2009   1         1065                                     748.3750
                                                746.50
       2         667                                      748.0000
                                                749.50
       3         486
       4         780

b) Plot the time series data and trend values in one graph.

[Figure: Time Series and trend of electrical usage, plotting Kilowatt hours (millions) and the Trend by quarter for 2007 to 2009]

c) Calculate the seasonal variations and indices for each quarter.

Year   Quarter   Y      Centred-MA, Trend T   Seasonal variation (Y/T × 100%)
2007   3         480    723.0000              66.3900
       4         746    711.3750              104.8673
2008   1         965    715.6250              134.8472
       2         661    721.0000              91.6782
       3         501    736.2500              68.0475
       4         768    749.5000              102.4683
2009   1         1065   748.3750              142.3083
       2         667    748.0000              89.1711

c) (continued) Arranging the seasonal variations by quarter:

Year             Quarter 1   Quarter 2   Quarter 3   Quarter 4
2007                                     66.3900     104.8673
2008             134.8472    91.6782     68.0475     102.4683
2009             142.3083    89.1711
Average          138.5777    90.4246     67.2187     103.6678
Seasonal index   138.6162    90.4497     67.2374     103.6966

Total of averages = 399.8890; required total = 4 × 100 = 400
Correction factor = 400 / 399.8890 = 1.000278
Seasonal index = average × correction factor (for example, 138.5777 × 1.000278 = 138.6162)

d) Explain the meaning of the seasonal index for all quarters.
• The electrical usage for the first quarter has increased by 38.6162% (138.6162 − 100) due to seasonal influences.
• The electrical usage for the second quarter has decreased by 9.5503% (100 − 90.4497) due to seasonal influences.
• The electrical usage for the third quarter has decreased by 32.7626% (100 − 67.2374) due to seasonal influences.
• The electrical usage for the fourth quarter has increased by 3.6966% (103.6966 − 100) due to seasonal influences.
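The whole seasonal-index procedure (centred trend, S = Y/T × 100%, averaging by quarter, correction factor) can be sketched in Python. This is a minimal illustration with our own variable names; it assumes the series starts in Quarter 1 and keeps full precision throughout, so individual indices may differ from the hand-rounded table in the fourth decimal place.

```python
kwh = [1071, 648, 480, 746,    # 2007 Q1 to Q4
       965, 661, 501, 768,     # 2008
       1065, 667, 486, 780]    # 2009
period = 4

# 4-quarter moving averages, then centred by averaging adjacent pairs
ma = [sum(kwh[i:i + period]) / period for i in range(len(kwh) - period + 1)]
trend = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

# trend[j] lines up with observation j + period // 2 (2007 Q3 for j = 0);
# the series starts in Quarter 1, so position % period gives the quarter (0 = Q1)
by_quarter = [[] for _ in range(period)]
for j, t in enumerate(trend):
    pos = j + period // 2
    by_quarter[pos % period].append(kwh[pos] / t * 100)   # S = Y / T x 100%

averages = [sum(s) / len(s) for s in by_quarter]
factor = period * 100 / sum(averages)    # correction factor: required total / actual total
indices = [avg * factor for avg in averages]

for q, (avg, idx) in enumerate(zip(averages, indices), start=1):
    print(f"Q{q}: average {avg:.4f}, seasonal index {idx:.4f}")
print(f"correction factor = {factor:.6f}")
```

Multiplying by the correction factor forces the indices to add up to exactly 400, so the seasonal effects cancel out over a full year.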

e) Forecast the electrical usage for the fourth quarter of 2010.

FORECASTING

Forecasting is the process of predicting the future using data recorded in the past. One main purpose of the seasonal index is to forecast the value of the dependent variable in the coming periods so that decisions can be made.

From the multiplicative model,
Y = T × S × C × I
Y ≈ T × S

Therefore, to forecast the value of Y for time period t:
Forecast Y for time period t = estimated trend for time period t × seasonal index for time period t

$Y_t = T_t \times \dfrac{S_t}{100}$

where $S_t$ is the seasonal index, in percent, for the season that period t falls in.

FORECASTING

Change in trend = (last trend value − first trend value) / (number of trend values − 1):
$\text{Change in trend} = \dfrac{T_{\text{last}} - T_{\text{first}}}{n - 1} = \dfrac{748 - 723}{8 - 1} = 3.5714$

Trend for the forecast period = last trend value + (change in trend × number of periods ahead). The last trend value, 748.0000, belongs to 2009 Quarter 2, and 2010 Quarter 4 is 6 quarters later:
$T_{2010,4} = 748 + (3.5714 \times 6) = 769.4284$

The seasonal index for Quarter 4 is its average seasonal variation multiplied by the correction factor: 103.6678 × (400 / 399.8890) = 103.6966. Hence
$Y_{2010,4} = T_{2010,4} \times \dfrac{S_4}{100} = 769.4284 \times \dfrac{103.6966}{100} = 797.8711$

The electrical usage for the fourth quarter of 2010 is forecast to be 797.8711 million kilowatt hours.
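The two forecasting steps (extend the trend by its average change, then apply the seasonal index) fit in one small function. A minimal Python sketch; the name `trend_forecast` and its parameters are our own, and the seasonal index is passed in percent.

```python
def trend_forecast(first_trend, last_trend, n_trend, steps_ahead, seasonal_index):
    """Extend the trend by its average change per period, then apply the seasonal index (in %)."""
    change = (last_trend - first_trend) / (n_trend - 1)   # average change in trend per period
    trend = last_trend + change * steps_ahead
    return trend, trend * seasonal_index / 100


# Electrical usage: 8 trend values (723.0000 to 748.0000), 2010 Q4 is 6 quarters
# after 2009 Q2, and the Quarter 4 seasonal index is 103.6966
trend_q4, forecast_q4 = trend_forecast(723.0, 748.0, 8, 6, 103.6966)
print(f"trend {trend_q4:.4f}, forecast {forecast_q4:.4f}")
```

Because the change in trend is not rounded here, the trend comes out as 769.4286 rather than 769.4284, and the forecast agrees with the hand result to about 0.01.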

EXAMPLE

The table below shows the number of rejected bulbs recorded by a certain factory in a city.

Year   First term (Jan-Apr)   Second term (May-Aug)   Third term (Sept-Dec)
2010   250                    210                     200
2011   280                    240                     250
2012   370                    390                     390
2013   360                    350                     350

a) Determine the trend values using the moving average method. Give your answers correct to 3 decimal places.

Year   Term   Y     3-T moving total          3-T MA (Trend, T)
2010   1      250
       2      210   250 + 210 + 200 = 660     660 / 3 = 220.000
       3      200   210 + 200 + 280 = 690     230.000
2011   1      280   720                       240.000
       2      240   770                       256.667
       3      250   860                       286.667
2012   1      370   1010                      336.667
       2      390   1150                      383.333
       3      390   1140                      380.000
2013   1      360   1100                      366.667
       2      350   1060                      353.333
       3      350

b) Plot the time series data and trend values in one graph.

[Figure: Time series and trend of number of rejected bulbs, plotting Y and the 3-T MA (Trend, T) by term for 2010 to 2013]

c) Calculate the seasonal variations and indices for each term.

Year   Term   Number of rejected bulbs, Y   3-T MA (Trend, T)   Seasonal variation (Y/T × 100%)
2010   2      210                           220.000             95.455
       3      200                           230.000             86.957
2011   1      280                           240.000             116.667
       2      240                           256.667             93.506
       3      250                           286.667             87.209
2012   1      370                           336.667             109.901
       2      390                           383.333             101.739
       3      390                           380.000             102.632
2013   1      360                           366.667             98.182
       2      350                           353.333             99.057

c) (continued) Arranging the seasonal variations by term:

Year             Term 1    Term 2    Term 3
2010                       95.455    86.957
2011             116.667   93.506    87.209
2012             109.901   101.739   102.632
2013             98.182    99.057
Average          108.250   97.439    92.266
Seasonal index   109.008   98.121    92.912

Total of averages = 297.955; required total = 3 × 100 = 300
Correction factor = 300 / 297.955 = 1.007
Seasonal index = average × correction factor (for example, 97.439 × 1.007 = 98.121)

d) Explain the meaning of the seasonal index for the second term.
The number of rejected bulbs for the second term has decreased by 1.879% (100 − 98.121) due to seasonal influences.

e) Forecast the number of rejected bulbs for the second term of 2014.

Change in trend = (last trend value − first trend value) / (number of trend values − 1):
$\text{Change in trend} = \dfrac{353.333 - 220.000}{10 - 1} = 14.815$

The last trend value, 353.333, belongs to 2013 Term 2, and 2014 Term 2 is 3 terms later:
$T_{2014,2} = 353.333 + (14.815 \times 3) = 397.778$

$Y_{2014,2} = T_{2014,2} \times \dfrac{S_2}{100} = 397.778 \times \dfrac{98.121}{100} = 390.304$

The number of rejected bulbs for the second term of 2014 is forecast to be 390.304 ≈ 390 bulbs.
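For an odd number of terms per year no centring is needed: each 3-term average already sits on the middle term. The sketch below runs the full procedure for the rejected-bulbs data in Python (variable names are our own). It keeps the unrounded correction factor (about 1.006864 instead of 1.007), so it gives seasonal indices of about 108.993, 98.108 and 92.899 and a forecast of about 390.25, slightly different from the hand-rounded values.

```python
bulbs = [250, 210, 200,   # 2010 terms 1 to 3
         280, 240, 250,   # 2011
         370, 390, 390,   # 2012
         360, 350, 350]   # 2013
k = 3  # three terms per year, so a 3-term moving average spans one year

# 3-term moving averages; trend[j] is centred on observation j + 1
trend = [sum(bulbs[i:i + k]) / k for i in range(len(bulbs) - k + 1)]

# Seasonal variation S = Y / T x 100%, grouped by term (0 = Term 1)
by_term = [[] for _ in range(k)]
for j, t in enumerate(trend):
    pos = j + 1
    by_term[pos % k].append(bulbs[pos] / t * 100)

averages = [sum(s) / len(s) for s in by_term]
factor = k * 100 / sum(averages)               # unrounded correction factor
indices = [avg * factor for avg in averages]

# Forecast 2014 Term 2: the last trend value (2013 Term 2) is three terms earlier
change = (trend[-1] - trend[0]) / (len(trend) - 1)
trend_2014_t2 = trend[-1] + change * 3
forecast_2014_t2 = trend_2014_t2 * indices[1] / 100

print("seasonal indices:", [round(i, 3) for i in indices])
print(f"forecast for 2014 Term 2: {forecast_2014_t2:.3f}")
```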
353.333 99.057

QMT181: Introduction to Statistics
TOPIC 6: PROBABILITY
Nurul Fatin Azara Binti Zulkarnain
Faculty of Computer and Mathematical Sciences
Universiti Teknologi MARA (UiTM)

LEARNING OUTCOMES
At the end of this chapter, you should be able to:
• Define the set theories of probability.
• Apply the addition and multiplication rules for probability.
• Use the counting rules.
• Construct a tree diagram.
• Understand Bayes' Theorem.

INTRODUCTION TO PROBABILITY

Probability is the likelihood or chance that a particular event will occur. Probability is the basis of inferential statistics; it is the language we use to model uncertainty. Its value ranges between 0 and 1:

0 (impossibility) ___________________________________ 1 (certainty)

REVIEW ON SET THEORY

A set is a group of objects. A set is denoted by a capital letter such as A, B, C, etc. A set is written as
A = {a, b, c, d, e} or B = {2, 4, 6, 8, 10}

A Venn diagram is a diagram used to portray operations on sets.

[Figure: Venn diagram with universal set U]

REVIEW ON SET THEORY

The intersection of two sets A and B is the set of all elements belonging to both A and B. The intersection of A and B is denoted as A ∩ B.
If A = {5, 3, 8, 9, 7}, B = {4, 8, 5, 9}, then A ∩ B = {5, 8, 9}.

[Figure: Venn diagram of A and B within U, with the shaded region A ∩ B = {5, 8, 9}]

The union of two sets A and B is the set of all elements belonging to A or B or both. The union of A and B is denoted as A ∪ B.
If A = {2, 4, 6, 8}, B = {1, 2, 3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5, 6, 8}.

[Figure: Venn diagram of A and B within U, with the shaded region A ∪ B]

REVIEW ON SET THEORY

The complement of set A is the set of all elements of the universal set U that do not belong to A. The complement of A is denoted as A'.
If U = {4, 8, 12, 16, 20, 24, 28} is a universal set and A = {4, 8, 12}, then A' = {16, 20, 24, 28}.

[Figure: Venn diagram with A = {4, 8, 12} inside U and the shaded region A' = {16, 20, 24, 28}]

Two sets A and B are said to be disjoint, or mutually exclusive, if they have no elements in common, that is, A ∩ B = { } = ∅.
If A = {3, 4, 5, 7, 8}, B = {9, 11, 13}, then A and B are disjoint sets since A ∩ B = ∅.

[Figure: Venn diagram showing the disjoint sets A = {3, 4, 5, 7, 8} and B = {9, 11, 13} within U]

REVIEW ON SET THEORY

Subset. A set A is a subset of another set B if all the elements in A are also in B, written A ⊂ B.
If A = {1, 2, 3} and B = {1, 2, 3, 4, 5}, then A is a subset of B.

[Figure: Venn diagram showing A = {1, 2, 3} inside B = {1, 2, 3, 4, 5} within U]

BASIC TERMS IN PROBABILITY

Experiment. An experiment is a process of obtaining an outcome, or a process that generates a set of data.

Sample space. A sample space is a complete list of all the possible outcomes of an experiment. Each possible result of such a study is represented by one and only one point in the sample space, which is usually denoted by S.

Events. An event is a set of outcomes resulting from an experiment, taken from the sample space. An event is a subset of a sample space.
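Python's built-in `set` type supports each of the set operations reviewed above, which makes it easy to check small examples. In this minimal sketch the sets from the different examples are given distinct letters (C, D, E, and so on, our own choice) so that they can coexist in one script. Python's `<=` tests "is a subset of" (equality allowed), while `<` tests for a proper subset.

```python
# Intersection: elements in both sets
A, B = {5, 3, 8, 9, 7}, {4, 8, 5, 9}
intersection = A & B

# Union: elements in either set or both
C, D = {2, 4, 6, 8}, {1, 2, 3, 4, 5}
union = C | D

# Complement: elements of the universal set U that are not in E
U = {4, 8, 12, 16, 20, 24, 28}
E = {4, 8, 12}
complement = U - E

# Disjoint (mutually exclusive) sets have an empty intersection
F, G = {3, 4, 5, 7, 8}, {9, 11, 13}
disjoint = F.isdisjoint(G)

# Subset: every element of H is also in K
H, K = {1, 2, 3}, {1, 2, 3, 4, 5}
subset = H <= K

print(sorted(intersection), sorted(union), sorted(complement), disjoint, subset)
```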

BASIC TERMS IN PROBABILITY

• Rolling a dice is an experiment.
Ø Sample space: S = {1, 2, 3, 4, 5, 6}
Ø Event: obtaining a "3" from rolling a dice.
• Tossing a coin is an experiment.
Ø Sample space: S = {Heads, Tails}
Ø Event: obtaining a "Head".
• Drawing a ball from a box containing two red balls and two blue balls is an experiment.
Ø Sample space: S = {two red balls, two blue balls}
Ø Event: drawing a blue ball from the box.
• Checking every 20th unit of a product from a production line to see whether it is defective or not is also an experiment.
Ø Sample space: S = {defective, non-defective}
Ø Event: obtaining a defective unit.
• Interviewing 30 randomly selected employees in a company and asking them whether they are satisfied with their present job is also an experiment.
Ø Sample space: S = {yes, no}
Ø Event: an employee answering "no".

PROBABILITY OF EVENT

The probability of an event A is defined as the ratio of the number of ways event A can occur to the total number of possible outcomes in S. It is denoted by P(A):

$P(A) = \dfrac{\text{number of ways } A \text{ can occur}}{\text{total number of possible outcomes}} = \dfrac{n(A)}{n(S)}, \qquad 0 \le P(A) \le 1$

The probability of an event A occurring therefore ranges from 0 to 1.

PROBABILITY OF EVENT

Example 1:
A dice is rolled. What is the probability of obtaining a number greater than 3?

Solution:
Sample space S = {1, 2, 3, 4, 5, 6}; n(S) = 6.
Let A = the event of obtaining a number greater than 3. Thus, A = {4, 5, 6} and n(A) = 3.
Therefore, using the definition of the probability of an event:
$P(\text{number greater than } 3) = \dfrac{n(A)}{n(S)} = \dfrac{3}{6} = \dfrac{1}{2}$

Example 2:
A vase has 10 marbles. Two marbles are red, three are green and five are blue. If an experimenter randomly selects 1 marble from the vase, what is the probability that it will be green?

Solution:
$P(\text{Green}) = \dfrac{\text{number of green marbles}}{\text{total number of marbles}} = \dfrac{3}{10} = 0.3$

Consider the experiment of rolling a dice. Find
a) the probability of getting an even number.
b) the probability of getting an odd number.
c) the probability of getting a number greater than 4.
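The classical definition P(A) = n(A)/n(S) can be sketched by counting outcomes in Python. This is a minimal illustration (the helper name `probability` is ours) that assumes every listed outcome is equally likely; the standard-library `fractions.Fraction` type keeps answers exact, such as 1/2 rather than 0.5.

```python
from fractions import Fraction


def probability(sample_space, in_event):
    """Classical probability n(A) / n(S), assuming equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = sum(1 for outcome in outcomes if in_event(outcome))
    return Fraction(favourable, len(outcomes))


# Example 1: a dice is rolled; A = number greater than 3
dice = range(1, 7)
p_greater_than_3 = probability(dice, lambda x: x > 3)

# Example 2: each marble is treated as one equally likely outcome
vase = ["red"] * 2 + ["green"] * 3 + ["blue"] * 5
p_green = probability(vase, lambda colour: colour == "green")

print(p_greater_than_3, p_green, float(p_green))
```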

PROPERTIES OF PROBABILITY

1. The sum of the probabilities of all outcomes in the sample space is 1.
2. The probability of an event A is between 0 and 1, i.e. $0 \le P(A) \le 1$. If P(A) = 0, the event cannot occur, and if P(A) = 1, the event is certain to occur.
3. If A denotes that event A occurs and A′ denotes that event A does not occur, then $P(A) + P(A') = 1$, or $P(A') = 1 - P(A)$.

ADDITION RULES OF PROBABILITY

Mutually exclusive events: Two events A and B are mutually exclusive if they cannot both occur at the same time, namely when $A \cap B = \emptyset$.
If A and B are any two mutually exclusive events, then

$P(A \cup B) = P(A) + P(B)$
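The three properties and the rule for mutually exclusive events can be checked on a fair dice, where each outcome has probability 1/6. A sketch under that assumption; the events A = {1, 2} and B = {6} are hypothetical choices for illustration:

```python
from fractions import Fraction

# Fair dice: every outcome has probability 1/6.
p = {outcome: Fraction(1, 6) for outcome in range(1, 7)}


def P(event):
    """Probability of an event (a set of outcomes) as the sum of its outcome probabilities."""
    return sum((p[o] for o in event), Fraction(0))


A = {1, 2}                 # hypothetical event: "less than 3"
A_complement = set(p) - A  # A': every outcome of S that is not in A

# Property 1: the outcome probabilities add up to 1.
assert P(set(p)) == 1
# Property 2: every event has probability between 0 and 1.
assert 0 <= P(A) <= 1
# Property 3 (complement rule): P(A') = 1 - P(A).
assert P(A_complement) == 1 - P(A)

# Addition rule for mutually exclusive events (A and B share no outcome).
B = {6}
assert A.isdisjoint(B)
assert P(A.union(B)) == P(A) + P(B)
print(P(A), P(A_complement), P(A.union(B)))  # 1/3 2/3 1/2
```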


Example:
A dice is rolled. What is the probability of obtaining a "3" or a "5"?

Solution:
Sample space S = {1, 2, 3, 4, 5, 6} and n(S) = 6.
Let A = event of obtaining a "3" and B = event of obtaining a "5".
Thus, A = {3}, n(A) = 1, and B = {5}, n(B) = 1.

Note that A and B cannot occur together. Therefore A and B are mutually exclusive events. Applying the addition rule for mutually exclusive events:

$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$

we obtain:

$P(A \cup B) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \approx 0.33$

Example:
One white dice and one black dice are rolled. The sample space is
S = {1W1B, 1W2B, 1W3B, 1W4B, 1W5B, 1W6B,
2W1B, 2W2B, 2W3B, 2W4B, 2W5B, 2W6B,
3W1B, 3W2B, 3W3B, 3W4B, 3W5B, 3W6B,
4W1B, 4W2B, 4W3B, 4W4B, 4W5B, 4W6B,
5W1B, 5W2B, 5W3B, 5W4B, 5W5B, 5W6B,
6W1B, 6W2B, 6W3B, 6W4B, 6W5B, 6W6B}
n(S) = 36
Find the probability that:
a) The white dice shows a number smaller than 3
b) The sum of the dice is greater than 9
c) The white dice shows a number smaller than 3 or the sum of the dice is greater than 9
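The white/black dice questions can be checked by enumerating all 36 ordered pairs (white, black), assuming both dice are fair. The helper name `prob` is illustrative:

```python
from fractions import Fraction
from itertools import product

# Each outcome is (white, black); 6 x 6 = 36 equally likely outcomes.
S = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = n(event) / n(S) for a collection of outcomes taken from S."""
    return Fraction(len(event), len(S))


A = {(w, b) for (w, b) in S if w < 3}      # a) white dice shows less than 3
B = {(w, b) for (w, b) in S if w + b > 9}  # b) sum greater than 9

print(len(S))            # 36
print(prob(A))           # 1/3
print(prob(B))           # 1/6
print(A.isdisjoint(B))   # True: a white 1 or 2 gives a sum of at most 8
print(prob(A.union(B)))  # c) 1/2, equal to P(A) + P(B)
```

Because no outcome satisfies both conditions, part c) reduces to the mutually exclusive addition rule.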


The probability that a company executive will travel by plane is 2/3 and that he will travel by train is 1/5. What is the probability of his travelling by train or plane?

Not mutually exclusive events: Two or more events that can occur together (i.e. $A \cap B \neq \emptyset$).
If A and B are any two events that are not mutually exclusive, then

$P(A \text{ or } B) = P(A \cup B) = P(A) + P(B) - P(A \cap B)$
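Subtracting $P(A \cap B)$ corrects for double counting: outcomes in both events are counted once in P(A) and again in P(B). A sketch on a fair dice, with the overlapping events "even number" and "greater than 3" chosen here purely for illustration:

```python
from fractions import Fraction

S = set(range(1, 7))  # one roll of a fair dice
A = {2, 4, 6}         # even number
B = {4, 5, 6}         # greater than 3; A and B overlap in {4, 6}


def P(event):
    return Fraction(len(event), len(S))


# Adding P(A) and P(B) counts the shared outcomes 4 and 6 twice,
# so the intersection is subtracted once.
lhs = P(A.union(B))
rhs = P(A) + P(B) - P(A.intersection(B))
print(lhs, rhs)  # 2/3 2/3
```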


Example:
A and B are two events with $P(A \cap B) = \frac{1}{12}$, $P(B) = \frac{1}{2}$ and $P(A \cup B) = \frac{3}{4}$. Find P(A).

Solution:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$\frac{3}{4} = P(A) + \frac{1}{2} - \frac{1}{12}$

$P(A) = \frac{3}{4} - \frac{1}{2} + \frac{1}{12} = \frac{1}{3}$

Example:
A construction company is bidding for two contracts, A and B. The probability that the company will get contract A is 3/5, whereas the probability that it will get contract B is 1/3. In addition, the probability that the company will get both contracts is 1/8. What is the probability that the company will get contract A or B?

Solution:
$P(A) = \frac{3}{5}$, $P(B) = \frac{1}{3}$, $P(A \cap B) = \frac{1}{8}$

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{5} + \frac{1}{3} - \frac{1}{8} = \frac{97}{120}$
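The general addition rule is a one-line formula, and with $P(A \cap B) = 0$ it reduces to the mutually exclusive case. A sketch with exact fractions; `union_prob` is a hypothetical helper, and treating plane and train travel as mutually exclusive is an assumption made here for the executive question:

```python
from fractions import Fraction as F


def union_prob(p_a, p_b, p_a_and_b=F(0)):
    """General addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

    With the default P(A ∩ B) = 0 it reduces to the mutually exclusive case.
    """
    return p_a + p_b - p_a_and_b


# Plane/train question, assuming he cannot travel by both (mutually exclusive).
print(union_prob(F(2, 3), F(1, 5)))           # 13/15

# Construction company: contracts A and B.
print(union_prob(F(3, 5), F(1, 3), F(1, 8)))  # 97/120

# Rearranging the rule to find P(A) from P(A ∪ B), P(B) and P(A ∩ B).
p_union, p_b, p_both = F(3, 4), F(1, 2), F(1, 12)
p_a = p_union - p_b + p_both
print(p_a)                                    # 1/3
```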

On a TV quiz show, a contestant is asked to pick an integer at random from the first 100 consecutive positive integers, that is, the integers 1 through 100. If the number picked is divisible by 12 or 9, the contestant will win a free trip to the Bahamas. What is the probability that the contestant will win the trip?

MULTIPLICATION RULES OF PROBABILITY

Independent Events: Two events A and B are said to be independent if the occurrence of event A has no influence on the occurrence of event B.
If the events A and B are independent, then the probability of both A and B occurring is

$P(A \cap B) = P(A) \times P(B)$

Note that if A and B are independent, $P(B \mid A) = P(B)$ and also $P(A \mid B) = P(A)$.

For example, a coin is tossed twice. The second toss of the coin is not influenced by the first toss, so tossing the coin the first time and tossing it the second time are two independent events.
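The coin example can be verified by listing the four equally likely ordered outcomes of two tosses of a fair coin and checking that $P(A \cap B) = P(A) \times P(B)$. A sketch; the event names are illustrative:

```python
from fractions import Fraction
from itertools import product

# Two tosses of a fair coin: 4 equally likely ordered outcomes.
S = list(product("HT", repeat=2))


def P(event):
    return Fraction(len(event), len(S))


A = {s for s in S if s[0] == "H"}  # head on the first toss
B = {s for s in S if s[1] == "H"}  # head on the second toss

both = A.intersection(B)
print(P(both), P(A) * P(B))    # 1/4 1/4, so the tosses are independent
print(P(both) / P(A) == P(B))  # True: P(B | A) = P(B)
```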


Victor flies from San Francisco to New York via Chicago. He takes United Airlines from San Francisco to Chicago, and Beta Airlines from Chicago to New York. The probability that the United Airlines plane lands safely is 0.95, and the probability that the Beta Airlines plane lands safely is 0.98. Find the probability that:

a) Victor lands safely in Chicago and New York.
Solution:
$P(\text{Chicago} \cap \text{New York}) = P(\text{Chicago}) \times P(\text{New York}) = 0.95 \times 0.98 = 0.931$

b) Victor lands safely in Chicago, but has difficulty in New York.
Solution:
$P(\text{Chicago} \cap \text{New York}') = P(\text{Chicago}) \times P(\text{New York}') = 0.95 \times (1 - 0.98) = 0.019$

c) Victor has difficulty landing in both Chicago and New York.
Solution:
$P(\text{Chicago}' \cap \text{New York}') = P(\text{Chicago}') \times P(\text{New York}') = (1 - 0.95) \times (1 - 0.98) = 0.001$

If the probability that person A will be alive in 20 years is 0.7 and the probability that person B will be alive in 20 years is 0.5, what is the probability that they will both be alive in 20 years?
Solution:
$P(A \cap B) = P(A) \times P(B) = 0.7 \times 0.5 = 0.35$

A fair die is tossed twice. Find the probability of getting a 4 or 5 on the first toss and a 1, 2 or 3 on the second toss.
Solution:
$P(\text{first toss}) = P(4 \text{ or } 5) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3}$

$P(\text{second toss}) = P(1, 2 \text{ or } 3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}$

$P(A \cap B) = \frac{1}{3} \times \frac{1}{2} = \frac{1}{6}$
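The three independence examples are direct applications of $P(A \cap B) = P(A) \times P(B)$ and the complement rule. A sketch that reproduces the numbers, treating each pair of events as independent as the examples do; floating-point products are rounded for display only:

```python
from fractions import Fraction

# Victor's flights, treating the two landings as independent events.
p_chicago, p_new_york = 0.95, 0.98
safe_both = p_chicago * p_new_york                 # a) 0.931
safe_then_trouble = p_chicago * (1 - p_new_york)   # b) 0.019
trouble_both = (1 - p_chicago) * (1 - p_new_york)  # c) 0.001

# Persons A and B alive in 20 years (independent).
both_alive = 0.7 * 0.5                             # 0.35

# Fair die tossed twice: 4 or 5 first, then 1, 2 or 3.
p_first = Fraction(2, 6)
p_second = Fraction(3, 6)
p_dice = p_first * p_second                        # 1/6

for value in (safe_both, safe_then_trouble, trouble_both, both_alive):
    print(round(value, 3))
print(p_dice)
```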


A number is picked at random from the digits 1, 2, ..., 9, and a coin and a dice are tossed. Find the probability of picking an odd digit, getting a head on the coin, and getting a multiple of 3 on the dice.

Dependent Events: Two events are dependent when the occurrence or non-occurrence of one event affects the probability of the occurrence of the other event. The probability of the second event depends on the occurrence or non-occurrence of the first event.

For example, consider a bag containing 3 red balls and 4 blue balls. One ball is withdrawn from the bag and not replaced. A second ball is then withdrawn. Find the probability that the second ball is red.

Solution:
There are two situations to be considered:
1) The first ball is red and is not replaced. The probability that the first ball is red is 3/7. Two red balls then remain among the six left, so the probability that the second ball is red is 2/6.
2) The first ball is blue and is not replaced. The probability that the first ball is blue is 4/7. Three red balls then remain among the six left, so the probability that the second ball is red is 3/6.

Hence, the probability of the second ball being red depends on the outcome of the first draw. In this case, the two events are called dependent. This is why conditional probability is introduced.
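Drawing without replacement can be simulated exactly by labelling the balls and listing every ordered pair of draws (7 × 6 = 42, all equally likely). The sketch below, with hypothetical labels R1 to R3 and B1 to B4, reproduces the two conditional values above and also combines the two situations into the overall probability that the second ball is red:

```python
from fractions import Fraction
from itertools import permutations

# Label the balls so that each ordered pair of draws is equally likely.
bag = ["R1", "R2", "R3", "B1", "B2", "B3", "B4"]
draws = list(permutations(bag, 2))  # without replacement: 7 x 6 = 42 ordered pairs


def P(event):
    return Fraction(len(event), len(draws))


first_red = [d for d in draws if d[0].startswith("R")]
first_blue = [d for d in draws if d[0].startswith("B")]
second_red = [d for d in draws if d[1].startswith("R")]


def conditional(event, given):
    """P(event | given) = P(event ∩ given) / P(given)."""
    both = [d for d in given if d in event]
    return P(both) / P(given)


print(P(first_red), conditional(second_red, first_red))    # 3/7 1/3 (= 2/6)
print(P(first_blue), conditional(second_red, first_blue))  # 4/7 1/2 (= 3/6)
print(P(second_red))                                       # 3/7
```

The last line agrees with $\frac{3}{7} \times \frac{2}{6} + \frac{4}{7} \times \frac{3}{6} = \frac{18}{42} = \frac{3}{7}$.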


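Conditional Probability: If A and B are two dependent events, then the probability of the occurrence of event A given that event B has already occurred is denoted by P(A | B) and defined as:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$. Similarly, $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, where $P(A) > 0$.

The Rule of Multiplication for Dependent Events

Let A and B be two dependent events. Since $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, then

$P(A \cap B) = P(A \mid B) \times P(B)$

This is known as the general rule of multiplication. Similarly, from $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, we can obtain

$P(A \cap B) = P(B \mid A) \times P(A)$

If two events are independent, $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.

Thus,
$P(A \cap B) = P(A) \times P(B)$

Example:
Consider an experiment of throwing a fair dice. Let A be the event of getting an even number and B the event that the number obtained is greater than 3. Find P(A | B).

Solution:
S = {1, 2, 3, 4, 5, 6}; A = {2, 4, 6}; B = {4, 5, 6}

[Venn diagram: universal set U = S; A only: 2; A ∩ B: 4, 6; B only: 5; outside A and B: 1, 3]

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/6}{3/6} = \frac{2}{3}$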
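The dice example can be reproduced with sets, which also lets us check the general rule of multiplication and confirm that A and B are not independent. A sketch assuming a fair dice; `conditional` is a hypothetical helper:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one throw of a fair dice
A = {2, 4, 6}           # even number
B = {4, 5, 6}           # greater than 3


def P(event):
    return Fraction(len(event), len(S))


def conditional(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    if P(b) == 0:
        raise ValueError("conditioning event must have positive probability")
    return P(a & b) / P(b)


p_a_given_b = conditional(A, B)
print(p_a_given_b)                     # 2/3
# General rule of multiplication: P(A ∩ B) = P(A | B) x P(B).
print(P(A & B) == p_a_given_b * P(B))  # True
# A and B are not independent here, because P(A | B) != P(A).
print(p_a_given_b == P(A))             # False
```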


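Example:
A and B are two events with $P(A) = \frac{1}{3}$, $P(B) = \frac{1}{2}$ and $P(A \cup B) = \frac{3}{4}$. Find:
a) P(B | A)
b) P(A′ | B′)

Solution:
First, from the addition rule, $P(A \cap B) = P(A) + P(B) - P(A \cup B) = \frac{1}{3} + \frac{1}{2} - \frac{3}{4} = \frac{1}{12}$.

a) $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/12}{1/3} = \frac{1}{4}$

b) $P(A' \mid B') = \frac{P(A' \cap B')}{P(B')}$. Since $A' \cap B' = (A \cup B)'$,

$P(A' \mid B') = \frac{P((A \cup B)')}{P(B')} = \frac{1 - P(A \cup B)}{1 - P(B)} = \frac{1 - \frac{3}{4}}{1 - \frac{1}{2}} = \frac{1}{2}$

Example:
The probability for Ali to obtain grade A in Mathematics is 0.5 and the probability for Abu to obtain grade A in Mathematics is 0.4. If the probability for Ali to obtain grade A in Mathematics given that Abu has obtained grade A in Mathematics is 0.7, determine the probability for:
1) Ali and Abu to obtain grade A in Mathematics.
2) Abu to obtain grade A in Mathematics given that Ali has obtained grade A in Mathematics.

Solution:
Let A = Ali obtains grade A and B = Abu obtains grade A.
P(A) = 0.5; P(B) = 0.4; P(A | B) = 0.7

1) $P(A \cap B) = P(A \mid B) \times P(B) = 0.7 \times 0.4 = 0.28$

2) $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.28}{0.5} = 0.56$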
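Both examples chain the addition rule, the complement rule and the definition of conditional probability. A sketch that recomputes each answer, using exact fractions for the first example and floats (rounded for display) for the second; the variable names are illustrative:

```python
from fractions import Fraction as F

# Example with P(A) = 1/3, P(B) = 1/2, P(A ∪ B) = 3/4.
p_a, p_b, p_union = F(1, 3), F(1, 2), F(3, 4)
p_both = p_a + p_b - p_union                     # addition rule rearranged: 1/12
p_b_given_a = p_both / p_a                       # a) 1/4
# A' ∩ B' = (A ∪ B)', so P(A' ∩ B') = 1 - P(A ∪ B).
p_notA_given_notB = (1 - p_union) / (1 - p_b)    # b) 1/2

# Ali (A) and Abu (B): P(A) = 0.5, P(B) = 0.4, P(A | B) = 0.7.
ali, abu, ali_given_abu = 0.5, 0.4, 0.7
ali_and_abu = ali_given_abu * abu                # 1) 0.28
abu_given_ali = ali_and_abu / ali                # 2) 0.56

print(p_both, p_b_given_a, p_notA_given_notB)    # 1/12 1/4 1/2
print(round(ali_and_abu, 2), round(abu_given_ali, 2))  # 0.28 0.56
```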


Let A and B be events from an experiment with $P(B) = \frac{3}{5}$, $P(B \mid A) = \frac{2}{3}$ and $P(A \cap B) = \frac{2}{5}$. Calculate P(A) and P(A | B′). Then determine whether A and B are independent events, and whether A and B are mutually exclusive events.

