The Textile Institute
Manual of Textile Technology

90073951 7

Practical Statistics for the Textile Industry: Part I

G A V Leaf MSc CText FTI FRSS

Foreword by Professor C S Whewell BSc PhD CChem FRSC CText FTI(Hon) CCoI FSDC

The Textile Institute
10 Blackfriars Street
Manchester M3 5DR
England

C S Whewell BSc PhD CChem FRSC CText FTI(Hon) CCoI FSDC - Coordinator
P W Harrison BSc CText FTI MIInfSc
Contents

1. Introduction
1.1 Some Statistical Problems
1.2 Populations, Samples, Variation
1.3 Variables
1.4 Random Variation; Uncertainty

2. Patterns in Data
2.1 Frequency Distributions (1)
2.2 Frequency Distributions (2)
2.3 Relative Frequencies
2.4 Histograms
2.5 Probability Curves
2.5.1 Probability
2.6 Some Types of Probability-density Curve

3.
3.1 Statistics
3.2 Averages: the Arithmetic Mean
3.3 Measures of Variability
3.3.1 The Range
3.3.2 The Mean Deviation
3.3.3 The Variance
3.3.4 The Standard Deviation
3.3.5 The Coefficient of Variation (CV)
3.4 Estimation and Estimators
3.5 The Calculation of Means and Variances
3.5.1 An Algebraic Identity
3.5.2 Linear Transformations
3.6 Calculations Involving the Use of a Frequency Distribution

4.
4.2 The Definition of Probability
4.3 The Scale of Probability
4.4 Some Examples of Probabilities
4.5 Compound Events
4.6 The Addition Rule: Simple Form
4.6.1 Extension to More than Two Events
4.6.2 Exhaustive Events
4.7 The Addition Rule: General Form
4.8 Conditional Probability
4.9 The Multiplication Rule
4.10 Statistical Independence
4.11 Some Probability Calculations

5.
5.2 The Mean and Variance of a Probability Distribution
5.3 The Geometric Distribution
5.3.1 The General Form
5.4 The Binomial Distribution
5.4.1 The General Form
5.5 The Poisson Distribution
5.5.1 The General Form
5.5.2 The Mean and Standard Deviation
5.6 The Normal Distribution
5.6.1 The General Form
5.6.2 The Standard Normal Distribution
5.6.3 Tables of the Normal Distribution
5.7 A Simple Test for Normality
5.8 The Normal Approximation to the Binomial
5.8.1 The Continuity Correction
5.9 The Normal Approximation to the Poisson

6.
6.1 Repeated Sampling
6.2 The Mean and Variance of a Function of Random Variables
6.3 Some Special Cases
6.3.1 Linear Functions
6.3.2 Products
6.3.3 Quotients
6.4 The Central-limit Theorem
6.5 The Sampling Distribution of the Mean
6.6 The χ²-Distribution

7.
7.1 Introduction
7.1.1 Point Estimates
7.1.2 Accuracy of a Point Estimate
7.1.3 Precision of a Point Estimate
7.1.4 Interval Estimation
7.2 Confidence Limits for μ, Large Sample Available
7.3 The Interpretation of Confidence Intervals
7.4 Confidence Limits for μ, Small Sample Available
7.5 Choosing the Sample Size
7.6 Confidence Limits for the Difference between Two Means
7.7 Confidence Limits for Matched Pairs
7.8 Confidence Limits for σ² and for σ
7.9 Confidence Limits for the Ratio of Two Variances
7.10 Confidence Limits for a Proportion
7.10.1 How Many Observations are Needed?
7.11 Confidence Limits for the Difference between Two Proportions

8.
8.1 Introduction
8.2 Test for a Single Mean: Large Sample Available
8.2.1 Hypotheses
8.2.2 A General Principle
8.2.3 The Significance Level
8.2.4 The Interpretation of a Significance Test
8.2.5 Single-tail and Double-tail Tests
8.2.6 Statistical and Practical Significance
8.2.7 Errors and the Choice of Sample Size
8.3 Test for a Single Mean: Small Sample Available
8.4 Test for the Difference between Two Means: Independent Samples
8.4.1 The Case of Large Samples
8.4.2 The Case of Small Samples
8.5 Test for the Difference between Two Means: Matched Samples
8.6 Test for a Single Variance
8.7 Test for the Difference between Two Variances
8.8 Test for a Single Proportion

Answers to Problems
Table A1
Table A2
Table A3
Table A4
Table A5
Table A5 (continued) α = 0.025
Table A5 (continued) α = 0.01
1. Introduction

1.1 Some Statistical Problems

Consider the following problems that might face a manufacturer of, say, men's outerwear garments.

Example 1.1 The manufacturer marks the size of the garments he produces according to chest girth. In order to design the garments, therefore, he needs to know, among many other things, the average chest girth of the men in the population to whom he is hoping to sell the garments. To determine this average exactly, the chest girth of every man in the population would have to be measured. But, since there are probably several million men in the population, this would be a hopelessly impracticable procedure - it would take far too long to do and would be very expensive.

The only reasonable alternative is to measure a relatively small number of men, perhaps a few thousand, chosen at random from the population. These few men are called a sample, and the average chest girth of the sample can be found. However, this is not what the manufacturer wants - he would like to know the average chest girth of the population. There is therefore a problem, which can be stated as follows: 'From the data of a sample, what can be said about the average value of the population from which the sample was drawn?'***

Note: The symbol *** is used throughout the text to denote the end of an example. In some instances, the example is continued later.

Example 1.2 The manufacturer buys yarn from a spinner to knit into fabric. It has been agreed between them that each consignment of yarn delivered to the manufacturer should have an average linear density inside the tolerance range 36±1 tex. When a batch of yarn is delivered, how can the manufacturer decide whether or not the average linear density does, in fact, lie within the tolerances?

To find the average linear density of the whole consignment would again be impracticable, since the standard test for linear density is destructive; when all the yarn had been tested, there would be none left to knit! The manufacturer is therefore forced to test only a small fraction of the yarn delivered, and then two questions arise. Firstly, how many standard linear-density tests should be carried out; and, secondly, how should the test results be used to decide whether or not the yarn delivery meets the agreed specification?***

Example 1.3 The manufacturer knows, from past experience, that usually 3% of the garment blanks he produces are defective for one reason or another. He is fairly happy with this situation because he realizes that no mass-production process has yet been invented that will not produce some defective items. However, he would not want the level of defectives to go any higher. He wants to control the level of defectives, i.e., to detect quickly any increase in the number of defectives being produced so that remedial action can be taken. To do this, the garment blanks must be inspected, but the rate of production is too high to enable every blank to be examined. Once again, the manufacturer is compelled to inspect only a small number of the many garments he produces. How should the results of this inspection be used to detect a change in the level of defectives being produced?***

1.2 Populations, Samples, Variation

The situations described above have several things in common. Firstly, in each case, a population, or aggregate, of things can be distinguished. In Example 1.1, the population is a human one, and this corresponds to the normal, everyday use of the word. In Example 1.2, the delivery of yarn is a population or, more precisely, the totality of standard linear-density tests that could be made on the yarn is a population. In a sense, this population does not exist because the tests have not been performed, but it is obviously possible to imagine they could be done. The population in Example 1.3 consists of the garment blanks the manufacturer will make in the future; again this is an imaginary population because the blanks do not yet exist.

A second common feature in the problems described was that it was impossible or impracticable to examine every member of the population, so that questions concerning the population have to be answered by using only the incomplete information provided by examining just the few members contained in a sample.

It is this problem, of trying to say something meaningful about populations when only the results of a sample are available, that is tackled by the methods of statistics.

Why does this problem arise? The answer lies in the fact that natural things, and man-made things, seem inevitably to vary one from another. No one has yet persuaded a sheep or a cotton plant to produce fibres that all have the same length, weight, and other properties. The manufacturing processes that men have devised seem incapable of mass-producing articles that are identical in every respect. If variation did not exist in a population, i.e., if every member of the population were identical in every way, it would be sufficient to examine only one member in order to know everything about the population. But, when the members of the population vary, and only a few of them have been examined, our knowledge of the population is incomplete and uncertain. One of the objectives of statistical methods is to measure the degree of this uncertainty.

1.3 Variables

The science of statistics therefore deals with quantities that vary. Such quantities are called variables; examples are the masses of consecutive metre lengths of a woven cloth, the number of stoppages per hour on a spinning frame, and the percentage impurity in separate batches of a certain chemical. All these variables share the property that they can have numbers associated with them, called the values of the variable, and statistical methods confine themselves to dealing with such numerical data.

Variables are of several different kinds, depending on how the values associated with them have been obtained. There are three principal methods by which numerical data can be generated.

(i) By counting. Possibly the simplest way of obtaining numerical data is to count the number of times an event happens. The values obtained by this procedure can only be whole numbers, or integers, and the variables whose values are found in this way are called discrete or discontinuous. Examples of discrete variables are:
(a) the number of faults in 100-m lengths of cloth;
(b) the number of absentees per day in a certain factory;
(c) the number of severely damaged fibres in a sample of shrink-resist-treated fibres.

(ii) By measuring. Many variables have values that are found by comparing the variable with a standard measuring scale. Examples are:
(a) the lengths of a certain type of garment;
(b) the times taken to repair stopped machines;
(c) the linear density of a yarn.

Since the scales used in such cases are continuous, the variables whose values are found in this way are called continuous. In theory, the values can be measured to as many decimal places as we please, but in practice they are expressed to a known and limited degree of accuracy. Thus, for example, the result of a linear-density test on a piece of yarn may finally be stated as 36.2 tex, and it will be understood that the 'exact' value of the linear density lies between 36.15 tex and 36.25 tex. Similarly, if the masses of a certain type of garment are measured to the nearest 5 g, then a garment whose mass is recorded as 320 g will have an 'exact' mass between 317.5 g and 322.5 g.

(iii) By subjective assessment. There are some properties of textile materials that are of considerable importance but which cannot yet be quantified either by counting or by measuring against a continuous scale. Examples are the 'handle' of a fabric and the comfort in wear of a garment. No satisfactory method of measuring properties like this has yet been devised, and it is probable that a completely objective measure is impossible, since what one person may find pleasant or comfortable, another may not. Such properties can therefore generally only be discussed in terms of opinions or preferences, and the objective of any investigation concerning them will usually be to discover whether there is any consensus or agreement in the opinions expressed by several observers.

To achieve this, it is preferable to have some kind of numerical data to analyse, and to generate this the observers may, for example, each be asked to compare several fabrics and place them in order of preference according to handle. This procedure is called ranking, and once it has been done the articles being compared can be given numbers; the number 1 might be assigned to the best fabric, 2 to the next best, and so on. These numbers can be the subject of statistical analysis in some cases, but it is important to realize that they are quite different in kind from the numbers associated with counting or measuring.

1.4 Random Variation; Uncertainty

It has been seen that statistical methods have been devised as an aid in drawing conclusions about populations by using only the limited information contained in a sample, and that this situation arises because of variation.

However, in this context, two kinds of variation must be distinguished. Sometimes variation is deliberately introduced into a manufactured article. For example, in spinning a slub yarn, the linear density of the yarn is made to vary in a systematic manner. Because this variation is regular, the properties of the yarn would be predictable with certainty, and no statistical problem would exist, were it not for the additional presence of other sources of variation of a different, non-regular, kind. All sorts of things can happen when a yarn is being spun, which contribute to the variation of its properties along its length. The raw material can vary, the ambient temperature may fluctuate, the sliver may occasionally slip on the drafting rollers, and so on.

These sources of variation are too numerous to consider separately and in detail, and they tend to be intermittent in operation. They are largely outside the control of the spinner and constitute a kind of nuisance factor that has to be tolerated. Such sources of variation seem to affect all production processes and also occur in natural phenomena. It is the occurrence of these random or chance sources of variation that gives rise to uncertainty in the kind of situations described in Section 1.1 and has led to the development of statistical methods.

Variables that are affected by random sources of variation are called random variables. Although the numerical values associated with random variables apparently occur haphazardly, there is usually an underlying pattern in the variation. It is the existence of such patterns that allows a solution to the basic problem of statistics, i.e., of making meaningful statements about populations when only the results of a sample are available. The detection and description of patterns in numerical data are therefore of fundamental importance in the development of statistical methods, and it is with methods for doing so that we begin.
2. Patterns in Data

2.1 Frequency Distributions (1)

Example 2.1 The data shown in Table 2.1 were obtained by counting the number of end-breaks per hour on one side of a certain spinning frame during 100 consecutive hours. As they appear in the table, the data are difficult to comprehend, and some means of helping the brain to understand them is needed.

Pictorial representation of data is often helpful in this respect, and a natural thing to do with this particular set of data is to plot them on a graph in the order in which they were obtained. Such a plot is shown in Fig. 2.1. (The data were written down in rows, the sequence starting 0,0,1,2,... and ending ...2,1,0,1.) This particular set of data is an example of a time series, since the data were obtained at regular intervals, and consequently the order in which they were obtained might be important. For example, there could be a systematic periodic variation in the data, which would be revealed by a graph of this kind. However, there is no apparent pattern of that sort in this particular set of data.

The plot does reveal, however, that the smallest number of end-breaks per hour was 0 and that the largest was 5. Since the variable, end-breaks per hour, can take only integer values, the only possible values appearing in the data set will be 0, 1, 2, 3, 4, and 5. The data can therefore be reduced, or summarized, by counting the number of times each of these values appears, and this has been done in Table 2.2. The best way to do this is to work through the data of Table 2.1 one by one and put a mark opposite the appropriate value in Table 2.2. It helps to group the marks in fives, as shown in the table. When all the data have been dealt with, the marks are counted (this is where the grouping in fives is helpful) to give the frequencies for each value of the variable. Table 2.2 is an example of a frequency distribution; it shows how the data are distributed among the possible values of the variable.

The production of a table of frequencies usually makes the data more comprehensible, for two reasons. Firstly, there are fewer individual figures to comprehend; in this example, the 100 original data have been reduced to just six frequencies. Secondly, a pattern often emerges in the frequencies. Here, for example, the frequencies rise to a maximum at a value of one end-break per hour and then decrease very rapidly at higher values of the variable. In other words, the vast majority of hours (93 out of 100) had two or fewer end-breaks and very few hours had more than two.

This pattern can be illustrated graphically. Two suitable representations are shown in Fig. 2.2, the bar chart and the frequency polygon. In the bar chart, vertical lines are drawn whose lengths are proportional to the frequency of the value at the foot of the 'bar'. The frequency polygon is formed by plotting the frequencies against the values of the variable, as in an ordinary graph, and then joining the plotted points by straight lines. Both representations reveal the pattern of the frequencies described earlier, and which is used is mainly a matter of personal preference.***
Fig. 2.1 Plot of data in Table 2.1 (number of end-breaks per hour against hour number)
Table 2.1: Numbers of End-breaks per Hour on One Side of a Certain Spinning Frame

(100 hourly values, recorded in rows)

Table 2.2: Frequency Distribution of the Data in Table 2.1

Number of end-breaks    Number of hours
per hour                (frequency)
0                        34
1                        46
2                        13
3                         4
4                         2
5                         1
Total                   100
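The tally-and-count procedure that produced Table 2.2 is easy to automate. A minimal Python sketch (the short data list below is illustrative only, not the full 100 hourly values of Table 2.1):

```python
from collections import Counter

# Illustrative end-break counts for a few hours (not the data of Table 2.1).
end_breaks = [0, 0, 1, 2, 1, 0, 1, 1, 2, 0, 1, 3, 0, 1, 1]

# Count how many times each value occurs - this is exactly what the
# tally marks of Table 2.2 do.
freq = Counter(end_breaks)

# Print the frequency distribution in ascending order of the variable.
for value in sorted(freq):
    print(value, freq[value])
```

`Counter` replaces the manual grouping of marks in fives; the frequencies it returns are the second column of a table like Table 2.2.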
Fig. 2.2 Bar chart and frequency polygon for the data of Table 2.2
2.2 Frequency Distributions (2)

Example 2.2 The data of Example 2.1 were especially easy to deal with because the possible values of the variable appearing in the data were limited to six. Consider now the data in Table 2.3, which are the results of an extensive series of standard linear-density tests carried out on a large delivery of worsted yarn. This time the variable is a continuous one, since in theory the linear density, or count, of a yarn can be estimated to many decimal places. In practice, the accuracy of the results is limited because of experimental errors, and tex values are often recorded to one decimal place, as here.

Table 2.3: Results of Linear-density Tests (tex) on a Consignment of Worsted Yarn

31.3 31.3 31.5 31.3 31.3 32.0 31.9 31.8 33.1 3>J.6
30.2 31.2 29.6 32.7 32.7 31.8 30.2 31.8 30.5 30.5
31.4 30.6 31.4 31.5 30.1 30.3 31.2 30.7 30.9 30.9
30.9 30.1 32.4 32.8 31.6 31.8 31.7 29.5 30.7 31.6
30.6 31.4 31.0 31.0 30.5 30.5 31.0 29.1 30.2 31.1
29.•. 30.6 32.2 30.4 32.1 31.7 31.5 31.7 31.4 30.4
31.5 30.4 31.3 31.9 31.1 31.9 32.0 31.6 30.3 32.1
31.0 31.4 33.1 30.6 31.2 32.2 32.6 31.9 32.2 31.3
30.7 30.9 30.7 32.3 32.7 31.3 32.5 31.3 31.3 31.5
31.9 31.0 31.0 32.3 31.5 29.8 32.4 31.7 31.6 32.0
30.6 30.8 31.1 32.1 29.9 31.6 30.6 30.6 31.1 31.3
32.4 31.1 29.7 31.2 30.6 31.5 31.0 31.1 31.2 ));0
31.1 30.8 30.9 31.6 30.6 30.4 30.9 29.7 30.2 31.6
30.3 29.4 30.0 30.0 32.8 31.9 30.7 31.7 31.8 30.1
31.0 30.8 32.1 30.8 31.1 32.5 31.7 30.5 30.5 31.5
31.1 31.2 31.4 29.5 31.5 31 ? 31.4 30.1 32.2 30.5
31.2 30.9 30.6 31.2 30.3 30.6 31.8 31.4 30.6 31.3
30.9 31.2 30.2 29.6 31.2 29.9 30.5 31.1 30.8 31.8
31.4 29.3 31.2 31.1 31.1 31.0 31.0 30.7 31.3 30.7
31.0 30.2
Table 2.4: Frequency Distribution for the Data of Table 2.3

Class (tex)   Mid-point   Frequency
29.1-29.5       29.3          5
29.6-30.0       29.8         11
30.1-30.5       30.3         26
30.6-31.0       30.8         44
31.1-31.5       31.3         54
31.6-32.0       31.8         29
32.1-32.5       32.3         15
32.6-33.0       32.8          6
33.1-33.5       33.3          2
Total                       192
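Grouping continuous values into classes can likewise be automated. A minimal sketch, assuming the class limits of Table 2.4; the data list is a small illustrative subset of tex values, not the full set of tests:

```python
# A few illustrative linear-density results (tex), not the data of Table 2.3.
values = [31.3, 29.6, 30.2, 31.5, 32.7, 29.1, 33.1, 31.2, 30.9, 31.8]

# Lower limits of the classes in Table 2.4: 29.1, 29.6, ..., 33.1.
lower_limits = [29.1 + 0.5 * k for k in range(9)]

def class_index(x):
    """Return the index of the class whose range of recorded values contains x."""
    for i, lo in enumerate(lower_limits):
        # Values are recorded to one decimal place, so the class 29.1-29.5
        # contains every recorded value from lo up to lo + 0.4 inclusive.
        if lo <= x <= lo + 0.4 + 1e-9:    # small tolerance for float error
            return i
    raise ValueError(f"{x} lies outside the classes")

# Tally each value into its class, as the marks in Table 2.4 do.
freq = [0] * len(lower_limits)
for x in values:
    freq[class_index(x)] += 1

for lo, f in zip(lower_limits, freq):
    print(f"{lo:.1f}-{lo + 0.4:.1f}: {f}")
```

Because the classes are defined on values recorded to one decimal place, every observation falls into exactly one class - the unambiguity requirement discussed in comment (i) below.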
This time, the data were not obtained in any particular order, and we therefore proceed immediately to form a frequency distribution.

An examination of the data reveals that the smallest recorded value was 29.1 and the largest 33.1. Since the data are recorded to one decimal place, the values of the variable that could appear in the data are therefore 29.1, 29.2, 29.3, ..., 32.9, 33.0, 33.1. There are thus 41 possible values, and, if the number of times each of these values appears in the data were counted, as was done for Example 2.1, there would still be a large number of frequencies to examine, and little would have been gained. The way out of this difficulty is to group the values into small classes and then count the frequencies of the values falling in each class. Experience indicates that between 8 and 15 classes is ideal; except for this recommendation, there are no special rules for helping to choose the classes. Bearing this in mind, a convenient set of classes for the present data is shown in the first column of Table 2.4. The procedure for finding the frequencies is exactly the same as for Example 2.1 except that in this case a mark is placed alongside the class in which each value lies.

When the marks have been counted to give the frequencies, the pattern of the data becomes clear. It is seen that the data tend to concentrate in the class 31.1-31.5 and that there are relatively few observations at the extreme ends of the distribution. This is a common pattern for continuous variables, though not by any means the only one that occurs.

Several comments about the construction of frequency distributions like that in Table 2.4 can be made.

(i) It is essential that the classes are defined unambiguously, i.e., it must be clear into which class any observation falls. Thus in Table 2.4 the classes run

29.1-29.5
29.6-30.0
30.1-30.5

and so on, and there is no doubt in which class a value like 29.5 lies. But classes defined as follows are sometimes seen:

29.0-29.5
29.5-30.0
30.0-30.5

etc. These classes are ambiguous, since a value like 29.5 could be placed either in the first class or in the second, and this should be avoided.

(ii) It is preferable, though not essential, to make all classes of equal width. The width of the classes is a quantity needed in certain calculations, and its value is best found by subtracting corresponding points in adjacent classes. For example, the width of the first class in Table 2.4 is given by the difference between the lower ends of the first and second classes, i.e., by 29.6 - 29.1 = 0.5. It will be found that the width of all classes in Table 2.4 is 0.5.

(iii) The mid-points of the classes are also needed for purposes of calculation. They are the centres of the ranges of values defining the classes. For example, the mid-point of the third class in Table 2.4 is ½(30.1 + 30.5) = 30.3.
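As a sketch of the grouping procedure just described (not from the book; the function name and the short sample below are illustrative, not the 192 values of Table 2.3), the counting can be done in a few lines of Python. The true class boundaries are placed midway between recordable values (29.05, 29.55, ...), mirroring the classes 29.1-29.5, 29.6-30.0, ... of Table 2.4, so that no recorded value can fall on a boundary, which is the unambiguity requirement of comment (i):

```python
# Sketch (not from the book): grouping values into classes of width 0.5
# and counting the frequency in each class, as done by hand for Table 2.4.

def frequency_distribution(data, start, width, n_classes):
    """Count the values falling in each class [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = int((x - start) // width)   # index of the class containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

sample = [29.3, 30.2, 30.7, 31.1, 31.2, 31.4, 31.8, 32.2, 30.9, 31.0]
counts = frequency_distribution(sample, 29.05, 0.5, 9)
print(counts)   # one count per class, the class 29.1-29.5 first
```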
2. Patterns in Data

2.3 Relative Frequencies
The sum of the frequencies in Table 2.4 is equal to the total number of values in Table 2.3, i.e., 192. Sometimes it is necessary to compare two distributions that have been formed from data with different total frequencies. For example, Table 2.5 shows the frequency distribution of Table 2.4, together with the distribution of linear density in a subsequent delivery of the same type of yarn. The classes are defined in the same way as before but, for shortness, are denoted only by their mid-points.
Table 2.5: Frequency Distributions of Linear Density (tex) for Two Deliveries of the Same Type of Worsted Yarn

  Class            Frequency                Relative frequency (%)
  mid-point    1st delivery  2nd delivery   1st delivery  2nd delivery
  29.3              5             -              2.6           -
  29.8             11             5              5.7           3.6
  30.3             26             6             13.5           4.3
  30.8             44            18             22.9          12.9
  31.3             54            29             28.1          20.7
  31.8             29            42             15.1          30.0
  32.3             15            23              7.8          16.4
  32.8              6            12              3.1           8.6
  33.3              2             4              1.0           2.9
  33.8              -             1              -             0.7
  Total
  frequencies     192           140             99.8         100.1
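As an aside not in the original text, the relative-frequency columns of Table 2.5 can be reproduced with a short Python sketch; the function name is ours, and the dashes in the table are treated as zero counts here:

```python
# Sketch (not from the book): percentage relative frequencies for two
# deliveries with different total frequencies, as in Table 2.5.

def relative_frequencies(freqs):
    """Percentage relative frequency of each class: 100 * f / total."""
    total = sum(freqs)
    return [100 * f / total for f in freqs]

first = [5, 11, 26, 44, 54, 29, 15, 6, 2, 0]    # total 192
second = [0, 5, 6, 18, 29, 42, 23, 12, 4, 1]    # total 140
rel_first = relative_frequencies(first)
rel_second = relative_frequencies(second)
print(round(sum(rel_first), 6), round(sum(rel_second), 6))   # -> 100.0 100.0
```

Rounding each value to one decimal place reproduces the last two columns of Table 2.5; the rounded values need not sum to exactly 100%, as the table's totals of 99.8 and 100.1 show.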
The tests were made to investigate the difference between the two yarn consignments, so a comparison of the frequency distributions shown in the second and third columns of Table 2.5 is called for. However, because the total frequencies are not the same, it is not valid to compare individual frequencies. To facilitate such comparisons, the actual frequencies have been converted to relative frequencies by using the equation

relative frequency = actual frequency / total frequency,

and these values are shown in the last two columns of Table 2.5. They have been multiplied by 100 to convert them to percentage relative frequencies; note that these figures do not add up to 100% because the individual relative frequencies have been rounded to one place of decimals. The relative frequencies show the proportion (or percentage) of observations falling in each class.

It is valid to compare the two sets of relative frequencies. It can be seen that the two distributions are quite similar, except that the centre of the second distribution is displaced relative to that of the first, suggesting that the average linear density of the second delivery was greater than that of the first.

2.4 Histograms

A pictorial representation of frequency distributions like those in Table 2.5 is helpful, and for this purpose a diagram called a histogram is the most useful. Fig. 2.3 shows the histogram for the data of the first delivery in Table 2.5. The horizontal axis is divided into segments corresponding to the widths of the classes shown in Table 2.5. These classes are identified on the horizontal axis by their mid-points. Above each segment, a rectangle is constructed whose area is proportional to the relative frequency of the class represented by the segment.

It is very important to remember that it is the AREAS of the rectangles that represent frequencies. Because relative frequencies have been used, the total area of the rectangles is equal to unity, or 100% if percentage relative frequencies are employed.

In order to draw the rectangles, it is necessary to know their heights. Suppose hi is the height of the rectangle representing the ith class. Its width will be equal to the width wi of the class, so the area of the rectangle is hi wi. This has to be proportional to the relative frequency Fi of the class, i.e.,

hi wi ∝ Fi . (2.1)
A special case occurs if all the class widths are the same, i.e., if wi = constant. This constant can then be incorporated in the proportionality constant implied in relation (2.1), which then gives

hi ∝ Fi ,

i.e., the heights of the rectangles are proportional to the relative frequencies. However, it must not be forgotten that it is still basically the areas of the rectangles that represent frequencies. This special case occurs in the frequency distribution of Table 2.5, and that is why a vertical scale of relative frequencies is shown in Fig. 2.3; it is a useful aid in constructing the rectangles.

An example in which the class widths are not constant is shown in Table 2.6. These data refer to the size of textile establishments in Great Britain in 1961, as measured by the number of employees they had. Because the majority of establishments had fewer than 100 employees, the class widths are relatively small at low values of the variable and become larger as the variable increases. The original data are presented in the first two columns of Table 2.6 and the percentage relative frequencies in the third column. It is misleading to compare these frequencies to discern the pattern in the data because of the varying class widths. These widths are given in the penultimate column and have been found by subtracting the starting points of adjacent classes; thus the width of the first class is 25 - 11 = 14, and so on. Note that the last class is open-ended and thus has an infinite width. The ratios Fi/wi are shown in the final column, and these are proportional to the heights of the rectangles forming the histogram. A vertical scale of these values is helpful in constructing the rectangles, as shown in Fig. 2.4. This diagram is the histogram of the data, and their pattern is now clear.
Table 2.6: Size of Textile Establishments in Great Britain with More than 10 Employees (June, 1961)

  Number of       Number of         % Relative        Class
  employees       establishments    frequency, Fi     width, wi    Fi/wi
  11-24                875              15.74             14        1.12
  25-99               2568              46.20             75        0.61
  100-499             1878              33.78            400        0.085
  500-999              184               3.31            500        0.0066
  1000-1999             38               0.68           1000        0.0007
  2000 or more          16               0.29             ∞          0
  Total               5559             100.00
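A sketch (not from the book) of the height calculation for Table 2.6: each height is Fi/wi, so that area, not height, represents relative frequency. The frequencies and finite widths below are those printed in the table; the open-ended last class is given height zero:

```python
# Sketch (not from the book): histogram heights F_i / w_i for the
# unequal class widths of Table 2.6.

freqs = [875, 2568, 1878, 184, 38, 16]     # establishments per class
widths = [14, 75, 400, 500, 1000]          # finite class widths
total = sum(freqs)                         # 5559 establishments in all
rel = [100 * f / total for f in freqs]     # % relative frequencies F_i
heights = [r / w for r, w in zip(rel, widths)] + [0.0]   # open-ended class
print([round(h, 4) for h in heights])
# -> [1.1243, 0.6159, 0.0845, 0.0066, 0.0007, 0.0]
```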
2.5 Probability Curves

The previous sections have shown that it is possible to find patterns in data by forming frequency distributions and that a picture of the pattern is provided by a histogram. The frequency distribution of Table 2.4 and its associated histogram in Fig. 2.3 refer to a sample of 192 linear-density tests made on a large delivery of yarn. The population from which this sample was drawn consists of all the count tests that could have been made on the yarn in the delivery. This number of tests would be finite but very large, and it is often convenient to think of populations as being infinite in size.

The choice of classes in constructing the frequency distribution was governed by two things, namely, the number of data available and the accuracy to which they were expressed. It would have been impossible, for example, to have made the class widths smaller than 0.1, since the data were expressed only to the first decimal place. However, theoretically a continuous variable like yarn linear density can be measured to any accuracy we please. Imagine this has been done; then it would be possible to reduce the widths of the classes. But then many of the classes would be empty because of the limited amount of data. So imagine further that the number of tests performed is considerably increased. A frequency distribution could now be formed with much smaller classes, and consequently its histogram would consist of narrower rectangles; we should have progressed from a histogram like that of Fig. 2.5a (corresponding to the original histogram of Fig. 2.3) to something like Fig. 2.5b. It can be seen how much smoother that of Fig. 2.5b is. If we imagine this process to continue, as the sample size tends to infinity (i.e., as the sample tends to become the population) and the accuracy of the measurements also increases, in the limit the outline of the histogram will become indistinguishable from the smooth curve of Fig. 2.5c.

Thus we can think of the smooth curve as describing the population from which the sample represented by the histogram was drawn. Put another way, the histogram of Fig. 2.5a is an estimate of the population curve of Fig. 2.5c. Statistical theory provides a range of smooth curves of different shapes that can be used to represent populations. When some sample data are available, the sample histogram can be matched against one of the standard theoretical patterns, thus revealing the kind of population from which the sample came. Put rather crudely, this is the start of the process of making statements about populations when only the results of the sample are known.
[Fig. 2.4 (the histogram of the data of Table 2.6) and Fig. 2.5 (histograms (a) and (b) approaching the smooth population curve (c) as the sample size and the measuring accuracy increase) appear here.]
2.5.1 Probability

It will be recalled that the areas of the rectangles forming a histogram represent relative frequencies. Thus, in Fig. 2.5a, the shaded area represents the relative frequency, or proportion, of values in the sample lying between x = a and x = b. (Here x denotes the variable being considered, i.e., yarn linear density in tex in Fig. 2.3.) As the sample size is increased, this area eventually becomes the shaded area under the smooth curve in Fig. 2.5c, and it then represents the proportion of values in the population lying between x = a and x = b. These proportions are called probabilities, and the smooth curve of Fig. 2.5c is a probability-density curve.
If the height of the probability-density curve at any value of x is given by y = f(x), as shown in Fig. 2.6, then the probability that x lies between a and b, given by the shaded area, is

Pr(a < x < b) = ∫_a^b f(x) dx.

As a consequence of the way the original histograms were constructed, by using relative frequencies, the total area under the histogram, and hence under the probability-density curve, is equal to 1.0 (or to 100% if percentage relative frequencies are used). Thus, if xmin and xmax are, respectively, the smallest and largest possible values of the variable x, then

Pr(xmin ≤ x ≤ xmax) = ∫_{xmin}^{xmax} f(x) dx = 1.0.

Note that xmin can be -∞ and xmax can be +∞. Probabilities are of fundamental importance in statistics, and more will be said about them in Chapter 4.

2.6 Some Types of Probability-density Curve

The shapes, or patterns of variability, revealed by frequency distributions and histograms can be quite varied, but generally speaking they fall into a relatively small number of recognizable types. Some typical probability-density curves found in practice are shown in Fig. 2.7.
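As a numerical aside not in the original text, these two properties of a density curve can be checked in code: the total area equals 1, and Pr(a < x < b) is the area between a and b. The standard normal density is used here as an example of f(x), with a simple trapezoidal approximation to the integrals:

```python
# Sketch (not from the book): checking that a probability-density curve
# encloses unit area, and evaluating Pr(a < x < b) as an area.
import math

def normal_pdf(x):
    """Standard normal probability-density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def area_under(f, a, b, n=10000):
    """Trapezoidal approximation to the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

print(round(area_under(normal_pdf, -8.0, 8.0), 6))   # total area -> 1.0
print(round(area_under(normal_pdf, -1.0, 1.0), 4))   # Pr(-1 < x < 1) -> 0.6827
```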
[Fig. 2.7: Some typical probability-density curves, including (a) rectangular, (b) normal, (c) skew, (d) J-shaped, and (f) compound or mixed forms.]
M0asuring accuracy
i f1creases
2.5.1 Prob~_bil ity ty ~ f (x)
It will be recalled that the areas of the rectangles )
forming a histogram represent relative frequencies.
Thus, in Fig. 2.5a, the shaded area represents the
relative frequency, or porportion, of values in the
sample lying bet',;eenx : a and x : b. (Here x
denotes the variable being considered, i.e., yarn
linear density in tex in Fig. 2.3)-As the sample
size is increased, this area eventually becomes the
shaded area under the smooth curve in Fig.,2.5c,
and it then represents the proportion of values i'2.
~~'pulation lying bet',;cen x : a and' x ~ b. These
proportions are called probabilities, and the
smooth curve of Fig. 2.5c is a probability-density
~.
- 2. Patterns in Data Pr\Xmin~X"Xmax) 0 JXInJI.f(x)dx = 1.0 .
If the :leight of the probJbilit)'-density curve ill xmin
any value of x is given by y = f (x) , as shOlvn in
Fig. 2.6, the'l I.heproDilll1lltylhatx lies hel"cen Note thllt +xllIill can be - ~), c1nd XHlJ.X can b~ (,Q.
J alld 0, qlVCtI l)y tIll: shJdcd ilr°L'u. 15 Pr",babilities ure of fundamenl"l importance in
st..t'istlC>, und 1Il0re"ill be said abuut thelllin
JIi Clilpter 4.
Pr(d,x...b) I(x)d". 2.0 Some Types of Probability-density Curve
J The shapeo or patterns of variability revealed by
frequency distributions and histograms can be quite
As a consequence of the way the original histrograms varied, but generally speaking they fall into a
were constructed, by using relative frequencies. the relatively small number of recognizable types, Some
total area under the histogram, and hence under the typical probability-density curves found in practice
probability-density curve, is equal to 1.0 (or to ar;: shown in Fig. 2.7.
100% if percentage relatlve frequencles are used).
Thus, if xmin and Xmax are, respectively, the
smallest and largest possible values of the
variable x, then
Skew ~f(X)
LJ_~.
~- •. x
f) Compound. f (x)
or mixed
Fig. 2.7 (a) and (b) are examples of a class of distributions known as symmetrical because of their symmetry about a central value of the variable x, as indicated by the vertical dotted lines. The distribution in Fig. 2.7 (a) is called rectangular, or uniform, for obvious reasons, while that in Fig. 2.7 (b) is an example of perhaps the most common and useful symmetrical distribution, called the normal or Gaussian. This distribution occurs frequently in statistical theory, and many sets of data tend to follow this kind of pattern; for example, the histogram of yarn linear density in Fig. 2.3 looks as though it could be of the normal type.

The distributions of Fig. 2.7 (c) are called skew. The curves rise to a peak, or maximum, just as the normal distribution does, but they are not symmetrical about a central value. They occur quite often in practice, and Fig. 2.2 shows an example of this type.

Fig. 2.7 (d) shows some extremely skew distributions, often referred to as J-shaped. They do not occur as frequently as types (b) and (c), but Fig. 2.4 shows an example.

The distribution in Fig. 2.7 (e) is rare, but examples can be found, such as that in Problem 4 at the end of this chapter.

Sometimes frequency distributions occur that appear to be mixtures of two or more of the above standard types. For example, if two batches of wool fibres of different qualities are thoroughly mixed, the resulting distribution of fibre diameters would probably have two peaks as a consequence of combining two distributions of type (b), as shown in Fig. 2.7 (f). When this kind of thing is found, it is usually prudent to exercise caution in drawing conclusions from the data.
PROBLEMS FOR CHAPTER 2

1. The following are the results of counting the number of warp breakages during the weaving of 92 standard lengths of a certain kind of cloth.

2 3 0 1 0 1 2 1 1 2
1 0 1 2 3 4 3 4 2 3
4 1 3 4 1 2 0 3 0 1
3 1 2 2 5 0 3 2 3 1
1 2 3 0 2 1 1 3 4 2
0 2 1 3 1 2 2 1 3 3
2 4 0 1 2 1 3 0 5 2
1 3 2 3 0 3 5 4 1 0
2 1 3 2 1 0 1 0 4
0 1

Construct a frequency distribution for these data, calculate the relative frequencies, and draw a frequency polygon.
2. The data below are the results of breaking-strength tests (in gf) on a yarn.

491 507 501 512 501 508 513 511 496 505
507 491 503 503 499 501 493 490 498 511
514 493 494 507 499 501 491 499 513 485
488 496 512 508 508 523 496 523 502 499
498 504 510 500 501 498 513 505 508 514
493 501 491 509 524 497 506 507 490 516
480 505 482 498 505 513 517 483 504
509 501 513 519 490 497 508 502 499 506
526 501 487 512 511 500 178 491 502 490
511 516 512 497 506 509 493 510 475 503

Construct a frequency table and draw a histogram.
3. Rayon yarn is wound on metal spools that are made to a specified mass of 226 g, with a 'tolerance range' of 3 g. A random sample of 100 spools was found to have the following masses (g):

206 210 231 235 225 225 223 210 212 218
227 211 208 230 228 2?3 230 228 208 226
209 228 210 208 206 210 227 215 213 210
218 208 226 227 207 207 226 226 232 226
227 225 228 227 209 2?5 234 209 223 210
233 217 227 210 228 210 225 229 210 231
226 208 224 216 210 217 227 226 219
228 208 225 212 210 224 208 209 223 230
207 230 209 220 223 206 206 226 209 222
227 211 218 227 207 209 226 229 225
232 2?9

Construct a frequency distribution with a class interval of 3 g and draw the corresponding histogram. Comment on the performance of the spool-making machines and on the size of the 'tolerance range'.*
4. Yarn is wound on large spools, which are run at high speeds. At times the spools run erratically, and when this occurs the operation is stopped and the spools are doffed short of their intended load. The following data refer to a sample of 147 spools, in which the frequency of doffing is given for the percentage of the intended load at doffing, arranged in classes as shown.

Class (%)   0-5  5-15  15-25  25-35  35-45  45-55  55-65  65-75  75-85  85-95  95-100
Frequency    45     4     10      2     11      5     10      6      2      1      51

Construct the appropriate histogram and comment on the data.*

*From 'Statistical Methods for the Process Industries' by Maurice H. Belz (Macmillan, London), reproduced by kind permission of the publishers.
3.1 Statistics that Summarize a Distribution

In the last chapter, methods were described that allow the main features of the pattern of variability in a set of data to be found. These methods are not always sufficient, however, for two reasons. First, it is often necessary to compare two or more sets of data, and such comparisons are facilitated if the distributions can be characterized by a few numbers, or statistics, that summarize the data in a concise way. Second, the methods of Chapter 2 can only be used if a relatively large amount of data is available. In many practical applications the samples are small, often consisting only of 5 or 10 observations, say, and such small quantities of data are not amenable to frequency-distribution analysis.

Summarizing statistics can be of many kinds, depending on the features of the data it is desired to highlight, but only the two most common of these will be dealt with here. These are

(i) averages, which measure the central value of a distribution, and
(ii) measures of variability.

3.2 Averages: the Arithmetic Mean

Several different averages can be defined, each of which is of use in certain circumstances, but by far the most common is the arithmetic mean. This is defined by the equation

arithmetic mean = (sum of observed values)/(number of observations). (3.1)

At this point, it is convenient to introduce a notation that simplifies the statement of equations like Equation (3.1). The variable under consideration will usually be denoted by a letter at the end of the alphabet, e.g., x, y, u, v. If n values of the variable are available, they will be denoted (if the variable is x) by

x1, x2, ..., xn.

The arithmetic mean of such a set of values is denoted by x̄ (read 'x-bar'), and Equation (3.1) becomes

x̄ = (x1 + x2 + ... + xn)/n. (3.2)

This equation can itself be shortened by introducing the symbol Σ to indicate 'the sum of'. Thus

x̄ = (1/n) Σ_{i=1}^{n} xi.

In most cases, the limits of the summation are obvious, and then we write simply

Σx = x1 + x2 + ... + xn   and   x̄ = Σx/n.

Example 3.1 The following are the lengths (in cm) of a sample of six garment blanks chosen at random from a large batch of similar blanks:

54.5, 53.0, 55.7, 51.8, 54.2, 52.4.

For this example, n = 6, and

Σx = 54.5 + 53.0 + 55.7 + 51.8 + 54.2 + 52.4 = 321.6.

Therefore the mean length of the sample of garments was

x̄ = 321.6/6 = 53.6 cm. ***

3.3 Measures of Variability

3.3.1 The Range

Several measures of variability have been devised, of which the simplest is the range. This is defined as the difference between the largest and the smallest values in the sample. Thus, in Example 3.1, the largest observation was 55.7 cm and the smallest 51.8 cm; the range of the sample was therefore 55.7 - 51.8 = 3.9 cm.

The sample range is very easy to calculate, and for this reason it is often used in routine quality-control work, but as a general measure of variability it suffers from the fact that it uses only the two extreme values in the sample and takes no account of how the other values vary.

3.3.2 The Mean Deviation

A more sophisticated measure of variability, which does not suffer from this defect, is the mean deviation. The derivation of this quantity is best explained by reference to Fig. 3.1, which shows the results of the sample of garment lengths (Example 3.1) plotted against a scale. The arithmetic mean value is also indicated. The diagram illustrates the way in which the mean measures a 'central value' and how the individual sample values vary about, or deviate from, the mean value.

The mean deviation is the arithmetic mean of these deviations, i.e., it is the average distance of the plotted points in Fig. 3.1 from the point representing the mean value. Only one of the deviations is indicated in the diagram, that for the value 54.5. Since the mean is 53.6, the deviation corresponding to this value is 54.5 - 53.6 = 0.9. In general, the deviation corresponding to the value xi is

di = |xi - x̄|,

the vertical lines indicating that the positive value of the deviation is always taken. The mean deviation is then

d̄ = (1/n) Σ_{i=1}^{n} |xi - x̄|.

Example 3.1 (continued) For the sample of garment lengths, the calculation of the mean deviation is shown in Table 3.1. The sample values are shown in the first column. Their mean has already been calculated, giving x̄ = 53.6, and the second column shows the differences x - x̄. Note that this column adds up to zero, the positive differences cancelling the negative ones. This will always be the case, whatever the values of x, and this is the reason why only the positive values of the deviations, shown in the third column, are averaged. ***
[Fig. 3.1: The garment lengths plotted against a scale from 51.5 to 56.0 cm; the mean (53.6 cm) and the deviation of the value 54.5 are indicated.]

Table 3.1: Calculation of the Mean Deviation

    x      x - x̄    d = |x - x̄|
  54.5      0.9        0.9
  53.0     -0.6        0.6
  55.7      2.1        2.1
  51.8     -1.8        1.8
  54.2      0.6        0.6
  52.4     -1.2        1.2
                  Σd = 7.2

  d̄ = Σd/n = 7.2/6 = 1.2 cm
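The Table 3.1 calculation can be checked with a short sketch (ours, not part of the original text):

```python
# Sketch (not from the book): the mean deviation of the six garment
# lengths of Example 3.1, as computed by hand in Table 3.1.
lengths = [54.5, 53.0, 55.7, 51.8, 54.2, 52.4]      # cm
mean = sum(lengths) / len(lengths)                  # 321.6 / 6 = 53.6
deviations = [abs(x - mean) for x in lengths]       # |x - mean|, third column
mean_deviation = sum(deviations) / len(deviations)  # 7.2 / 6 = 1.2
print(round(mean, 1), round(mean_deviation, 1))     # -> 53.6 1.2
```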
3.3.3 The Variance

The mean deviation is quite a good measure of variability and is, in fact, often used for characterizing the irregularity of yarns*. It is easy to understand and is relatively easy to calculate. However, it has the major disadvantage that taking only positive values of the deviations leads to mathematical difficulties in developing a theory of statistics based on the mean deviation as a measure of variability. For this reason, another measure has been devised, which still uses the concept of deviations but which does not lead to the mathematical difficulties mentioned above. This is the variance, which is defined as the mean value of the squares of the deviations. Denoting the sample variance by s², we have

s² = Σ(x - x̄)²/(n - 1). (3.4)

(Since the variance is a mean value, a divisor n would be expected in this equation. The reason why n - 1 is used instead will be discussed in Section 3.4.)

Example 3.1 (continued) The deviations for the sample of garment lengths are given in Table 3.1. Their sum of squares is

Σd² = Σ(x - x̄)² = 0.9² + (-0.6)² + 2.1² + (-1.8)² + 0.6² + (-1.2)² = 10.62.

In this example, n = 6, and Equation (3.4) therefore gives

s² = 10.62/5 = 2.124 cm²,

and this is the sample variance for the garment lengths. ***

*See, for example, 'Evenness Testing in Yarn Production: Part I'.

3.3.4 The Standard Deviation

Since the variance is the mean value of the squared deviations, it has units that are the square of the units of the original variable. Thus, since the garment lengths were measured in cm, the variance has units cm². It is obviously desirable that a measure of variability should have the same units as the data from which it is calculated, and this can be achieved by taking the square root of the variance. This square root is called the standard deviation, i.e.,

s = √s².

Example 3.1 (continued) Since s² = 2.124 cm², the standard deviation of the garment-length data is s = √2.124 = 1.46 cm. ***

3.3.5 The Coefficient of Variation (CV)

It is sometimes necessary to compare the variability of two or more sets of data that have quite different mean values. For example, it may be desired to compare the linear-density variation, or irregularity, of a 20-tex yarn with that of a 30-tex yarn. To compare variances or standard deviations directly in such circumstances may not be justified, since it might be expected that a fine yarn would be less irregular, and hence have a smaller standard deviation, than a coarse one. This difficulty is overcome by using the coefficient of variation, which is a measure of relative variability. It is defined by expressing the standard deviation as a percentage of the arithmetic mean, i.e.,

C = 100 s/x̄.

Example 3.1 (continued) For the garment-length data, x̄ = 53.6 cm and s = 1.46 cm, so the coefficient of variation is

C = 100 × 1.46/53.6 = 2.72%. ***

A word of caution is necessary at this point. The coefficient of variation is independent of the units in which the variable is measured, but its value does depend on the zero of the scale of measurement. Thus it is quite legitimate to compare the CVs of two variables that are, say, masses, one of which is measured in ounces and the other in grams. This is because 0 gram represents exactly the same weight as 0 ounce. But it is not permissible to compare the CVs of two temperature variables if one temperature is measured in °F and the other in °C, since 0°F is not the same temperature as 0°C. Similarly, it would be wrong to compare the CVs of two yarn-count variables, one of which is measured in a direct counting system (e.g., tex) and the other in an indirect system (e.g., cotton count).
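The calculations of Sections 3.3.3 to 3.3.5 can be reproduced with a short sketch (ours, not part of the original text), recovering s² = 2.124 cm², s = 1.46 cm, and C = 2.72% for the garment-length data:

```python
# Sketch (not from the book): sample variance with divisor (n - 1),
# standard deviation, and coefficient of variation for Example 3.1.
import math

x = [54.5, 53.0, 55.7, 51.8, 54.2, 52.4]   # garment lengths, cm
n = len(x)
mean = sum(x) / n                          # 53.6 cm
ss = sum((v - mean) ** 2 for v in x)       # sum of squared deviations, 10.62
variance = ss / (n - 1)                    # 10.62 / 5 = 2.124 cm^2
sd = math.sqrt(variance)                   # standard deviation, cm
cv = 100 * sd / mean                       # coefficient of variation, %
print(round(variance, 3), round(sd, 2), round(cv, 2))   # -> 2.124 1.46 2.72
```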
3.4 Estimation and Estimators

An explanation for the use of (n - 1) as a divisor in Equation (3.4), rather than the expected n, is necessary. The reason lies in the basic problem of statistics discussed in Chapter 1, namely that an attempt is being made to say something meaningful about a population when only the results of a sample, chosen at random from the population, are available. Now the mean and the variance of a sample, such as those calculated in the preceding sections, have no great intrinsic value, except for what they tell us about the corresponding quantities calculated for the whole population. In fact, the sample statistics are used as estimators of the corresponding population statistics, and a very clear distinction must be made between values calculated from a sample and the values that would be obtained if the whole population were measured. This distinction is emphasized by the use of different letters to denote population statistics and their sample estimates. Thus, a population mean is usually denoted by μ, its sample estimate by x̄. A population variance will be denoted by σ², its sample estimate by s².

An obviously desirable property of an estimator is that it should provide as good an estimate as possible of the corresponding population statistic. Now it can be shown that, if the divisor n is used in Equation (3.4), the resulting estimate of the population variance σ² will, on average, tend to be slightly too small, and that this bias can be removed by changing the divisor to (n - 1).

It must be emphasized that, once a sample has been chosen, the sample mean x̄ and the sample variance s² can be calculated, and their values are therefore known. The values of the population mean μ and the population variance σ² remain unknown, but a very reasonable question to ask is: given x̄ and s², what can be said about the values of μ and σ²? The answers to questions of this kind will be dealt with in later chapters.

3.5 The Calculation of Means and Variances

The methods of calculating means and variances used in introducing the statistics in Sections 3.2 and 3.3 are not necessarily the best or most convenient in practice. There are some aids to calculation that are often useful, even when an electronic calculator is being employed.

3.5.1 An Algebraic Identity

It has been seen that the variance of a sample of size n is given by

s² = Σ(x - x̄)²/(n - 1).

The most difficult part of the calculation of s² is finding the sum of squares of deviations, Σ(x - x̄)². However, it can be shown that

Σ(x - x̄)² = Σx² - (Σx)²/n. (3.7)

This is an important identity that will be used repeatedly, since it avoids the necessity to find all the individual deviations.

3.5.2 Linear Transformations

If the original variable is denoted by x, the idea is to transform the values of x into values of a new variable y, which is related to x by an equation of the type

y = m(x - a). (3.8)

The values of m and a are chosen so as to make the values of y as simple as possible.

Example 3.1 (continued) The values of the garment lengths were

x = 54.5, 53.0, 55.7, 51.8, 54.2, 52.4.

For a set of data like this, it is best to choose a equal to a value in the centre of the range of values, say, a = 53.5; since the data are given to one decimal place, choosing m = 10 will remove the decimals. Hence a convenient transformation for these data is

y = 10(x - 53.5). (3.9)

The values of y corresponding to the given x-values are then

y = 10, -5, 22, -17, 7, -11,

which are easier numbers to use for calculation than the original x-values. ***

Of course, when the mean ȳ and the variance sy² of the y-values have been found, it is necessary to reverse the transformation to find the mean and variance of the x-values. To do this, it can be shown that

ȳ = m(x̄ - a),

and therefore

x̄ = a + ȳ/m.

Also,

sy² = m² sx²,

so that

sx² = sy²/m²   and   sx = sy/m. (3.12)

These results show that, once ȳ and sy have been calculated, it is an easy matter to find x̄ and sx.

Example 3.1 (continued) The complete calculation for the garment-length data is shown in Table 3.2.

Step I shows the original data in column (1), the transformed data in (2), and the squares of the transformed values in (3). The sums of the values in columns (2) and (3) are Σy and Σy², respectively.

Step II calculates the mean of the y-values by using Equation (3.2), written in terms of y rather than x.

Step III finds the sum of squares of the y deviations by using Equation (3.7), again written in y's rather than x's.

Step IV uses the definition of variance, Equation (3.4),
3.4), to find the variance of the y's .
3.5.2 Linear Transfonnations
.?~ calculatES the standar< deviation of the y's.
Calculating the variance by using Equations (3.7)
and (3.4) involves c~nputing the v~lue Df ex', i.e., ~ reverses the transformation by using
the Sum of squares of the individual values of the Equations (3.lOa) and (3.12) to find the mean and
variable. If these values contain many digits, e.g., stancard deviation of the original x values, which
lf they are measured to several decimal places, such are seen to be identical with those obtained
a calculation can bec~ne very lengthy and thus be previously.
prone to error. A calculator can overflow, for
example. Fortunately, there is a way of reducing the
dlfficulty of the arithmetic involved, by the use of
a linear transformation.
== -'1~[ 21
-.....• •..----
Table 3.2: Calculation of Mean and Standard Deviation of a Small Sample

Step I
         (1) x     (2) y = 10(x-53.5)     (3) y²
         54.5            10                 100
         53.0            -5                  25
         55.7            22                 484
         51.8           -17                 289
         54.2             7                  49
         52.4           -11                 121
                     Σy = 6             Σy² = 1068

Step II    ȳ = Σy/n = 6/6 = 1.0

Step III   Σ(y-ȳ)² = Σy² - (Σy)²/n
                   = 1068 - 6²/6
                   = 1068 - 6
                   = 1062

Step IV    s_y² = Σ(y-ȳ)²/(n-1) = 1062/5 = 212.4

Step V     s_y = √212.4 = 14.6

Step VI    Since y = 10(x-53.5),
           x̄ = 53.5 + ȳ/10 = 53.5 + 1.0/10 = 53.6 cm
           and
           s_x = s_y/10 = 14.6/10 = 1.46 cm
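The whole of Table 3.2 can be traced in a few lines of code. The sketch below is illustrative only (the function name and its interface are ours, not the book's); it applies the transformation (3.8), the identity (3.7), and the reversal equations (3.10a) and (3.12) to the garment-length data of Example 3.1.

```python
from math import sqrt

def mean_sd_via_transform(xs, a, m):
    """Sample mean and standard deviation of xs, computed by first
    coding the data as y = m*(x - a), working with the simpler
    y-values, and then reversing the transformation."""
    ys = [m * (x - a) for x in xs]        # Equation (3.8)
    n = len(ys)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    y_bar = sum_y / n
    ss = sum_y2 - sum_y ** 2 / n          # identity (3.7), written in terms of y
    s_y = sqrt(ss / (n - 1))              # Equation (3.4), divisor n - 1
    return a + y_bar / m, s_y / m         # reverse via (3.10a) and (3.12)

# Garment-length data of Example 3.1 (cm)
x_bar, s_x = mean_sd_via_transform([54.5, 53.0, 55.7, 51.8, 54.2, 52.4], a=53.5, m=10)
print(round(x_bar, 2), round(s_x, 2))     # 53.6 1.46, as in Table 3.2
```

With modern floating-point arithmetic the transformation is rarely needed for accuracy, but the code mirrors the hand calculation step by step.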
3.6 Calculations Involving the Use of a Frequency Distribution

When large amounts of data are available, the methods of the last section can still be used to calculate summarizing statistics. However, the first stage in the analysis of large samples is often the formation of a frequency distribution, and this can be used to shorten the necessary arithmetic, provided that the classes are all of the same width. As an example, consider the frequency distribution of Table 2.4, which is reproduced in the first three columns of Table 3.3.

Once the frequency distribution is formed, the identity of the individual members of a class is lost, and it is assumed for the purpose of calculation that every member of a class is equal to the mid-point of the class. The error resulting from making this assumption is usually negligible. Hence the values of the variable are taken to be those in column (2) of Table 3.3. These can be made more amenable by transforming them, by using a linear transformation of the form

    y = m(x - a).

For use with frequency distributions, it is best to choose a equal to one of the mid-points in column (2); in this example, a = 31.3 has been chosen. The best value for m is

    m = 1/(class width).

Table 3.3: Calculation of Mean and Standard Deviation for the Data of Table 2.3

Step I
    (1) Class (tex)   (2) Class mid-pt (x)   (3) f   (4) y = (x-31.3)/0.5   (5) fy   (6) fy²
    29.1-29.5               29.3                5            -4               -20       80
    29.6-30.0               29.8               11            -3               -33       99
    30.1-30.5               30.3               26            -2               -52      104
    30.6-31.0               30.8               44            -1               -44       44
    31.1-31.5               31.3               54             0                 0        0
    31.6-32.0               31.8               29             1                29       29
    32.1-32.5               32.3               15             2                30       60
    32.6-33.0               32.8                6             3                18       54
    33.1-33.5               33.3                2             4                 8       32
                                          n = 192                       Σfy = -64  Σfy² = 502

Step II    ȳ = Σfy/n = -64/192 = -0.333

Step III   Σf(y-ȳ)² = Σfy² - (Σfy)²/n
                    = 502 - (-64)²/192
                    = 502 - 21.33
                    = 480.67

Step IV    s_y² = Σf(y-ȳ)²/(n-1) = 480.67/191 = 2.517

Step V     s_y = √2.517 = 1.586

Step VI    Since y = (x-31.3)/0.5,
           x̄ = 31.3 + 0.5ȳ = 31.3 + 0.5 × (-0.333) = 31.13 tex
           and
           s_x = 0.5s_y = 0.5 × 1.586 = 0.79 tex
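Table 3.3 can be checked with a short grouped-data sketch (illustrative only; the helper name is ours, not the book's), which treats every member of a class as equal to its mid-point and codes the mid-points with y = m(x - a):

```python
from math import sqrt

def grouped_mean_sd(midpoints, freqs, a, m):
    """Mean and standard deviation from a frequency distribution,
    using the coded variable y = m*(x - a)."""
    ys = [m * (x - a) for x in midpoints]
    n = sum(freqs)
    sum_fy = sum(f * y for f, y in zip(freqs, ys))
    sum_fy2 = sum(f * y * y for f, y in zip(freqs, ys))
    y_bar = sum_fy / n                       # mean of the coded values
    ss = sum_fy2 - sum_fy ** 2 / n           # grouped form of identity (3.7)
    s_y = sqrt(ss / (n - 1))                 # standard deviation of the y's
    return a + y_bar / m, s_y / m            # reverse the transformation

# Class mid-points and frequencies from Table 3.3 (counts in tex)
mids = [29.3, 29.8, 30.3, 30.8, 31.3, 31.8, 32.3, 32.8, 33.3]
freqs = [5, 11, 26, 44, 54, 29, 15, 6, 2]
x_bar, s_x = grouped_mean_sd(mids, freqs, a=31.3, m=1 / 0.5)
print(round(x_bar, 2), round(s_x, 2))        # 31.13 0.79, as in Table 3.3
```

The same helper could be applied, with different mid-points and frequencies, to any equal-width frequency distribution.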
In the example, therefore,

    m = 1/0.5 = 2,

and the transformation is thus

    y = 2(x - 31.3).   (3.13)

The values of the transformed variable are shown in column (4), and we now proceed to find the mean and variance of these y-values.

If the y-values are denoted by y_1, y_2, ..., y_k, and the corresponding frequencies by f_1, f_2, ..., f_k, the grand total of all the y-values is

    f_1y_1 + f_2y_2 + ... + f_ky_k = Σf_iy_i,

and the total number of observations is

    n = f_1 + f_2 + ... + f_k.

Therefore the arithmetic mean is

    ȳ = Σf_iy_i / Σf_i = Σfy/n.   (3.14)

Further, to each value of y there is a corresponding deviation from the mean, (y_i - ȳ), so that the sum of squares of the deviations is

    Σf_i(y_i - ȳ)²,

and the variance of the y-values is therefore

    s_y² = Σf_i(y_i - ȳ)²/(n - 1).   (3.15)

It can be shown also that

    Σf(y - ȳ)² = Σfy² - (Σfy)²/n,

and the use of this identity avoids the necessity of having to find all the individual deviations.

The complete calculation is shown in Table 3.3. Step I gives in column (5) the products fy, which are formed by multiplying the values in column (3) by the corresponding values in column (4). Column (6) is formed by multiplying the values in column (5) by the corresponding values in column (4). The sums of the entries in columns (3), (5), and (6) are n (= Σf), Σfy, and Σfy², respectively. The remaining steps in the calculation are similar to those described in the preceding section.

Exercises

1. An experiment was performed to determine the wettability of a certain fabric. The test consists in measuring the time taken for water to penetrate the fabric. The test was repeated 10 times, with the following results (minutes).
    5.5, 5.1, 6.4, 5.7, 6.0, 5.6, 6.3, 6.2, 5.7, 6.0
Calculate the mean and the standard deviation of the wetting time.

2. Felting tests were carried out on some merino-wool samples. The results are expressed as the diameter (in mm) of a felted ball produced during the test and are given below.
    27.0, 25.8, 26.2, 27.2, 27.4, 26.3, 27.1
Calculate the mean and the standard deviation of these data.

3. Five standard count tests were carried out on a cone of yarn, with the following results (in tex).
    41.3, 41.1, 40.9, 41.2, 41.3
Calculate the mean and the standard deviation of the yarn count.

4. Strength tests were made on a yarn, with the following results (in gf).
    592, 608, 616, 610, 610, 612, 591, 606, 607
Calculate the mean and the standard deviation of the yarn strength.

5. The numbers of warp breakages that occurred during the weaving of fixed lengths of a certain cloth were counted, with the following results.
    Number of warp breaks per length of cloth:  0   1   2   3   4   5   6
    Frequency:                                 15  26  21  19   8   3   0
Calculate the mean and the standard deviation of the number of breaks per length of cloth.

6. The following table shows the distribution of the number of defectives in samples of 100 chosen at random from the production of a certain garment over a period of time.
    Number of defectives:  0   1   2   3   4   5   6   7   8
    Number of samples:     4  11  17  30  24  15  10   7   4
Calculate the mean and the standard deviation of the number of defectives per sample.

7. The distribution of the lengths of fibres chosen at random from a large batch is shown below.

    Length     Frequency
    8.1-8.2        3
    8.3-8.4        8
    8.5-8.6       24
    8.7-8.8       32
    8.9-9.0       26
    9.1-9.2        6
    9.3-9.4        1

Calculate the mean and the standard deviation of the fibre lengths.

8. The table below shows the distribution of the permanganate values (in parts per million) of the effluent from a factory over 60 days.

    Permanganate value   Number of days
    6.0-6.4                    1
    6.5-6.9                    4
    7.0-7.4                    9
    7.5-7.9                   14
    8.0-8.4                   15
    8.5-8.9                    9
    9.0-9.4                    5
    9.5-9.9                    2
    10.0-10.4                  1

Calculate the mean and the standard deviation of the permanganate value.

9. The times taken (in seconds) by an operative to replace a bobbin on a spinning frame are shown below.
    Time (sec):           15.9-16.0   16.1-16.2   16.3-16.4   ...
    Number of occasions:     ...          7           17      ...
Calculate the mean and standard deviation of the replacement time.

10. The numbers of minor accidents sustained in a year by a work force were as follows.
    Number of accidents:  0   1   2   3   4   5   6
    Number of employees: 16  19  30  14  10   3   1
Find the mean and variance of the number of accidents per employee.
4. Probability

4.1 The Need for Probability

It has repeatedly been emphasized that the science of statistics is concerned with making statements about populations when the only information available is a random sample drawn from the population. The nature of this problem, i.e., of drawing conclusions about a whole (the population) when only a part of it (the sample) has been examined, suggests that there is likely to be some uncertainty about the conclusions, and a measure of the degree of this uncertainty would obviously be a useful addendum. Such a measure is provided by probability theory.

We all use the concept of probability in our everyday lives. We often make statements like 'It will probably rain today', or 'The chances are that I shall be on holiday in August', or 'It is probable that he will arrive by train'. However, although such statements imply that one is not absolutely certain that it will rain, or that one will be on holiday in August, or that a friend will travel by train, they are rather vague about the extent of the certainty felt about each situation, and comments of this kind certainly would not be good enough for scientific or technological work. For example, a managing director is not going to be very impressed by a chemist who tells him that 'The chances are that the new finish I have developed is better than the old one', whereas he might well be convinced by the statement 'I am 95% sure that the new finish is better than the old one'. What is needed, therefore, is a precise definition of what is meant by probability, one that will provide a numerical value (like the figure 95% in the above statement) for the chance or probability that a certain conclusion is correct.

Several approaches to such a definition are available. The most useful for statistical purposes is developed in the next section.

4.2 The Definition of Probability

Consider first a very simple experiment, or trial, which consists in tossing a coin and noting whether or not it falls with heads uppermost. A natural question to ask is 'What is the probability that the coin will fall heads uppermost in a single toss or trial?' Probability theory was first developed to answer questions of this kind, which arose in connection with games of chance, and the classical approach would be as follows.

There are only two possible results for the trial, heads or tails. If the coin is a fair one, i.e., is not loaded or biassed in any way, there is no reason to believe that one result is more likely to happen than the other, and the results are said to be equally likely. Of the two equally likely results, only one is 'heads', so the probability of heads is 1/2. Similarly, the probability of throwing a 4 when a perfect die is rolled is 1/6, since there are six possible equally likely results, only one of which is a 4.

This kind of argument is quite reasonable so long as one is considering only 'perfect' systems like fair coins and unloaded dice, with a finite number of possible results. But suppose the coin in the above experiment was biassed in some (unknown) way. There are still two possible results to the trial, heads or tails, but they are no longer equally likely, and consequently the above argument fails. However, there is still a certain chance or probability of getting heads when the loaded coin is tossed, and a gambler would be interested to know what it was.

An alternative approach is as follows. Imagine the trial or experiment of tossing the coin is repeated many times and the number of occasions on which it falls heads uppermost is noted. For example, suppose that in 1000 tosses heads were uppermost on 532 occasions. It seems natural to suggest that the probability of heads on any subsequent toss is equal to the proportion or relative frequency of trials that previously resulted in heads, i.e.,

    Pr(heads) = 532/1000 = 0.532.

The concept of probability as a relative frequency is of very wide application and is the one usually adopted in statistical work. We shall therefore give the following general definition.

Definition. If a trial is repeated n times and an event E occurs in f of these repetitions, then an estimate of the probability that E will occur in any future trial is

    Pr(E) = f/n.   (4.1)

It should be noted that Equation (4.1) provides an estimate of the probability that E will occur in a future trial. The reason for this (considering the coin-tossing example again) is that, if another 1000 tosses were carried out, it is very unlikely that exactly 532 heads would again be found, so that a different value for Pr(heads) would be given by Equation (4.1). However, it is reasonable to suppose that, in the long run, as the number of trials, n, becomes larger and larger, the ratio f/n will settle down and become closer and closer to a fixed value, which can be regarded as the exact probability that E will occur. Experience tends to confirm that this is what would happen, and, this being so, the values of Pr(E) obtained from limited numbers of trials can be regarded as estimates of the exact probability, in much the same way that the sample mean x̄ is regarded as an estimate of the population mean μ.

4.3 The Scale of Probability

Probabilities are measured on a scale that is itself a consequence of the definition summarized in Equation (4.1). Consider two extreme cases. First, suppose that the event E is impossible. Then, however many trials are carried out, E never occurs and thus f = 0 always. Consequently, Equation (4.1) gives

    Pr(an impossible event) = 0.

At the other extreme, suppose E is certain to happen in any trial. Then, every time a trial is made, E will occur, so that f = n always. Hence

    Pr(a certain event) = 1.

The probabilities of all other events lie between these extremes. Sometimes the values of probabilities are multiplied by 100 to convert them to percentages. Thus, the estimate of the probability of heads given in the last section can be written either as 0.532 or as 53.2%.
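The relative-frequency definition of Equation (4.1) can be sketched in code. This is an illustrative sketch (the function name is ours, not the book's), applied to the 1000-toss example of Section 4.2:

```python
def estimate_probability(outcomes, event):
    """Relative-frequency estimate of Equation (4.1): Pr(E) = f/n,
    where f counts the trials in which the event E occurred."""
    f = sum(1 for outcome in outcomes if event(outcome))
    return f / len(outcomes)

# The coin-tossing example of Section 4.2: 532 heads in 1000 tosses
tosses = ['H'] * 532 + ['T'] * 468
print(estimate_probability(tosses, lambda o: o == 'H'))    # 0.532
```

Repeating the experiment would give a slightly different estimate each time; only as n grows does f/n settle towards the exact probability.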
4.4 Some Examples of Probabilities

Example 4.1
Suppose it is known from past experience that a manufacturing process usually makes 2.7% defective articles. As will become apparent in a later chapter, the design of a quality-control scheme for this process requires a knowledge of the probability of getting a defective when a single article is drawn at random from the production line. In the terminology of probability, a trial here consists in selecting an article at random. Assuming the process is running normally, if such a trial were repeatedly made, then in the long run a proportion of 0.027 (or 2.7%) of such repetitions would result in a defective being found. Hence, according to our definition, the probability of finding a defective in a single trial is 0.027 (or 2.7%).

Example 4.2
Table 4.1 shows, in columns (1) and (2), the frequency table of Table 2.2, referring to the number of end-breaks per hour on one side of a spinning frame. A row has been added to the table, which indicates that 6 or more end-breaks per hour were never observed in this particular sample of hours. Since there is no reason why 6 or more end-breaks should not occur in an hour, this final row has been included so that all possible values of the variable have been accounted for.

Column (3) shows the relative frequencies, which, according to the definition of probability, are estimates of the probabilities that the given numbers of end-breaks will occur in any future hour (provided that the working conditions remain the same). For example, the probability that any subsequent hour will contain one end-break is estimated as

    Pr(1) = 0.46.

Note that the sum of all the probabilities is unity, i.e.,

    Pr(0)+Pr(1)+Pr(2)+Pr(3)+Pr(4)+Pr(5)+Pr(6 or more) = 1.0.   (4.2)

The table shows how this total probability of 1.0 is distributed among the possible values of the variable; hence columns (1) and (3) together constitute a probability distribution.

Table 4.1: Distribution of End-breaks on a Spinning Frame

    (1) Number of       (2) Number of hours    (3) Relative frequency
    end-breaks per hour     (frequency)            (probability)
    0                          34                     0.34
    1                          46                     0.46
    2                          13                     0.13
    3                           4                     0.04
    4                           2                     0.02
    5                           1                     0.01
    6 or more                   0                     0.00
    Total                     100                     1.00

Example 4.3
The data of Table 2.3 are the results (in tex) of count tests on a large delivery of yarn. Table 4.2 shows the corresponding frequency distribution in columns (1) and (3); this was previously given in Table 2.4.

The relative frequencies are shown in column (4), and these are the estimates of the probabilities that a further count test on this delivery would have a value falling in a particular class. For example, the probability that a count test has a result between 29.6 and 30.0 is estimated to be 0.057; that its value would be between 31.1 and 31.5 is 0.281; and so on. Again the relative frequencies show how the probabilities are distributed among the possible values of the variable. Note again that the sum of column (4) should be 1.000; rounding-off errors cause it to be slightly different in this case.

Table 4.2: Distribution of Results (in tex) of Tests on a Large Delivery of Yarn

    (1) Class   (2) Class boundaries   (3) Frequency   (4) Relative frequency   (5) Cumulative
        (tex)       (tex)                                  (probability)            probability
    29.1-29.5   29.05-29.55                 5                 0.026                 0.026
    29.6-30.0   29.55-30.05                11                 0.057                 0.083
    30.1-30.5   30.05-30.55                26                 0.135                 0.218
    30.6-31.0   30.55-31.05                44                 0.229                 0.447
    31.1-31.5   31.05-31.55                54                 0.281                 0.728
    31.6-32.0   31.55-32.05                29                 0.151                 0.879
    32.1-32.5   32.05-32.55                15                 0.078                 0.957
    32.6-33.0   32.55-33.05                 6                 0.031                 0.988
    33.1-33.5   33.05-33.55                 2                 0.010                 0.998
    Total                                 192                 0.998
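Columns (4) and (5) of Table 4.2 can be reproduced directly from the frequencies in column (3). The sketch below is illustrative (variable names are ours); rounding each relative frequency to three places before accumulating also shows why the cumulative column ends at 0.998 rather than 1.000:

```python
freqs = [5, 11, 26, 44, 54, 29, 15, 6, 2]   # column (3) of Table 4.2
n = sum(freqs)                               # 192 count tests in all

rel = [round(f / n, 3) for f in freqs]       # column (4): estimated probabilities
cum, running = [], 0.0
for r in rel:
    running += r
    cum.append(round(running, 3))            # column (5): cumulative probabilities

print(rel[1], cum[2], cum[-1])               # 0.057 0.218 0.998
```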
There is a further point to be made about this example, which deals with a continuous variable. The data were recorded to one decimal place. Thus, if a test result is recorded as 31.2, it is to be understood that the 'exact' value lies between 31.15 and 31.25. Now the first of the classes in Table 4.2 is defined as containing recorded values between 29.1 and 29.5, as shown in column (1). The range of exact values corresponding to this is from 29.05 to 29.55, and these are the class boundaries shown in column (2). The probabilities of column (4) can now be interpreted in terms of exact values rather than recorded values. For example, the probability that the exact value of a future test will lie between 30.05 and 30.55 is estimated as 0.135.

4.5 Compound Events

So far we have concentrated on estimating probabilities in relatively simple situations, such as the probability that there will be exactly one end-break on a spinning frame in a given hour, or the probability that a single article chosen at random will be defective. In practice, situations arise that are more complex than this. For example, it may be necessary to calculate the probability of finding three or more end-breaks in a given hour, or the probability of finding exactly five defectives in a sample of 100 articles chosen from a large batch. These more complex situations are usually described as being composed of compound events, and the calculation of the associated probabilities can usually be carried out by using one or two general rules. No formal proof of the rules will be given here, but they will be illustrated by practical examples.

4.6 The Addition Rule: Simple Form

Consider the data of Table 4.1 and suppose it is required to estimate the probability that there will be fewer than two end-breaks in a given hour. Remembering the basic definition of probability as a relative frequency, we would proceed as follows.

    Pr(fewer than two end-breaks)
    = Pr(no end-breaks or one end-break)
    = (number of hours with no or with one end-break) / (total number of hours observed)
    = [(number of hours with no end-break) + (number of hours with one end-break)] / (total number of hours observed)
    = (number of hours with no end-break)/(total number of hours observed)
      + (number of hours with one end-break)/(total number of hours observed)
    = Pr(no end-breaks) + Pr(one end-break)
    = 0.34 + 0.46, from Table 4.1
    = 0.80.

This is an example of the simplest form of the addition rule of probability. The essential feature of the compound event (no end-break or one end-break) is that its two components, (no end-break) and (one end-break), cannot occur simultaneously in any given hour. Such events are said to be mutually exclusive. The general statement of the addition rule is as follows.

Addition rule: simple form. If E_1 and E_2 are mutually exclusive events, then

    Pr(E_1 or E_2) = Pr(E_1) + Pr(E_2).   (4.3)

4.6.1 Extension to More than Two Events

The addition rule can obviously be extended to more than two events. In this case, it becomes as follows: if E_1, E_2, ..., E_k are k mutually exclusive events (i.e., no two of them can occur simultaneously), then

    Pr(E_1 or E_2 or ... or E_k) = Pr(E_1) + Pr(E_2) + ... + Pr(E_k).   (4.4)

Example 4.4. Refer again to Table 4.1 to find the probability of a given hour containing between 1 and 4 end-breaks. We find that

    Pr(between 1 and 4 end-breaks)
    = Pr(1 or 2 or 3 or 4 end-breaks)
    = Pr(1) + Pr(2) + Pr(3) + Pr(4)
    = 0.46 + 0.13 + 0.04 + 0.02
    = 0.65.

Example 4.5. Using the count-test results of Table 4.2, find the probability that the exact value of a future random count test will be less than 30.55. Using x to denote yarn count, we have

    Pr(x < 30.55) = Pr(x < 29.05) + Pr(29.05 < x < 29.55) + Pr(29.55 < x < 30.05) + Pr(30.05 < x < 30.55)
                  = 0 + 0.026 + 0.057 + 0.135
                  = 0.218.

This procedure for finding the probability that the value of a variable will be less than (or greater than) a given value is an important one. The cumulative probabilities of column (5) in Table 4.2 are calculated in this way and are estimates of the probability that a count test will be less than 29.55, less than 30.05, less than 30.55, and so on.

4.6.2 Exhaustive Events

If E_1, E_2, ..., E_k are the only possible results of a trial, they are said to be exhaustive. In any single trial, one of them is bound to occur, so that

    Pr(E_1 or E_2 or ... or E_k) = 1.

If the events are also mutually exclusive, Equation (4.4) then gives

    Pr(E_1) + Pr(E_2) + ... + Pr(E_k) = 1.

This explains why the probabilities in Table 4.1 add up to 1.0, since one of the values listed in column (1) must occur in any given hour.

An important deduction from this is as follows. Suppose Ē denotes the event that E does not occur. Then in any trial it is certainly true that either E occurs or it does not; hence

    Pr(E) + Pr(Ē) = 1,  i.e.,  Pr(Ē) = 1 - Pr(E).

4.7 The Addition Rule: General Form

The simple form of the addition rule is concerned only with mutually exclusive events, i.e., with combinations of events that can never occur simultaneously. However, there are many situations where this is not true. For example, return to the data of Table 4.1 and define two events:

    Event E_1: fewer than two end-breaks occur;
    Event E_2: between 1 and 4 end-breaks occur.
The probabilities of these events have already been found in Section 4.6. They are

Pr(E1) = 0.80;
Pr(E2) = 0.65.

Now suppose we calculate the probability of the compound event (E1 or E2) using the simple form of the addition rule, Equation (4.3); we get

Pr(E1 or E2) = Pr(E1) + Pr(E2)
= 0.80 + 0.65
= 1.45,

which is obviously wrong, since no probability can exceed 1.0.

The explanation for this erroneous result is that E1 and E2 are not mutually exclusive: they can occur simultaneously. For, if one end-break occurs in an hour, then E1 is true (since fewer than 2 end-breaks have occurred) and E2 is also true (because between 1 and 4 end-breaks have occurred). The situation is illustrated in Fig. 4.1, which shows that the simultaneous event (E1 and E2) has been counted twice in adding together Pr(E1) and Pr(E2). It must therefore be subtracted again. Consequently, the general form of the addition rule is as follows.

Fig. 4.1 The events E1, E2, (E1 or E2), and (E1 and E2) marked on the scale of possible results (number of end-breaks per hour)

Addition rule: general form If E1 and E2 are any two events, then

Pr(E1 or E2) = Pr(E1) + Pr(E2) - Pr(E1 and E2). (4.6)

Returning to the above problem, we see that

Pr(E1 and E2) = Pr(one end-break) = 0.46.

Therefore, from Equation (4.6),

Pr(E1 or E2) = 0.80 + 0.65 - 0.46 = 0.99.

4.8 Conditional Probability

The data shown in Table 4.3 were obtained from an experiment designed to investigate the damaging effect of a shrink-resist (SR) treatment on wool fibres. A large batch of fibres was divided into two equal parts. One part was given a low level of SR treatment, while the other part was given a high level of the treatment. Random samples of fibres from each part were then examined, and each fibre was assessed as either severely damaged or mildly damaged. The table shows the numbers of fibres in each category.

Table 4.3

                Level of SR treatment
Damage        High(H)   Low(L)   Totals
Severe(S)        497      230      727
Mild(M)          152      424      576
Totals           649      654     1303

It can be seen immediately that the proportion of fibres treated at the high level that were severely damaged was

497/649 = 0.77;

similarly, the proportion of fibres treated at the low level that were severely damaged was

230/654 = 0.35.

These proportions are so different that one would inevitably reach the conclusion (which was, of course, expected) that the proportion of damaged fibres depends upon the level of treatment used.

Further, since proportions or relative frequencies can be interpreted as probabilities, it is seen that the probability that any single fibre will be damaged depends on the level of SR treatment. Such probabilities are called conditional, and the notation

Pr(severe damage/high level)

is used to denote the probability that a fibre will be severely damaged, given that it has received a high level of SR treatment.

4.9 The Multiplication Rule

The above conditional probability can be calculated as follows:

Pr(severe damage/high level)
= (number of fibres treated at high level that were severely damaged) / (total number of fibres treated at high level)
= [(number of fibres treated at high level that were severely damaged) / (total number of fibres examined)] / [(total number of fibres treated at high level) / (total number of fibres examined)].

Now each of the ratios in the above expression can be interpreted as a probability estimate. The numerator is the probability that a fibre selected at random was both treated at a high level and severely damaged. The denominator is the probability that a randomly selected fibre was treated at the high level. We can therefore write

Pr(severe damage/high level) = Pr(severe damage and high level) / Pr(high level).

This is an example of the multiplication rule of probability, whose general statement is as follows.

The multiplication rule If E1 and E2 are two events that can occur together, the probability that E2 will occur, given that E1 has already occurred, is

Pr(E2/E1) = Pr(E1 and E2) / Pr(E1). (4.8)
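As an illustration only (the text does its arithmetic by hand), the conditional probability just defined can be computed from the counts of Table 4.3 with a short Python sketch; the variable names are my own:

```python
# Table 4.3 counts: (damage, treatment level) -> number of fibres.
counts = {
    ("severe", "high"): 497, ("severe", "low"): 230,
    ("mild", "high"): 152,   ("mild", "low"): 424,
}
total = sum(counts.values())  # 1303 fibres examined in all

def pr(event):
    """Probability that a randomly chosen fibre satisfies `event`,
    estimated as a relative frequency over all fibres examined."""
    return sum(n for key, n in counts.items() if event(key)) / total

pr_high = pr(lambda k: k[1] == "high")                      # Pr(high level)
pr_severe_and_high = pr(lambda k: k == ("severe", "high"))  # joint probability

# The multiplication rule, Equation (4.8):
# Pr(severe damage / high level) = Pr(severe and high) / Pr(high)
pr_severe_given_high = pr_severe_and_high / pr_high
```

The result is 497/649 = 0.77, the same proportion obtained directly from the high-level column of the table.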
An equivalent form that is often useful is

Pr(E1 and E2) = Pr(E1) × Pr(E2/E1).

Note that, since E1 and E2 are interchangeable on the left-hand side of Equation (4.8) (since the event (E1 and E2) is the same as the event (E2 and E1)), they must be interchangeable on the right, i.e.,

Pr(E1 and E2) = Pr(E2) × Pr(E1/E2).

It should be appreciated that Pr(E2/E1) and Pr(E1/E2) mean quite different things. For example, if, in Table 4.3, H denotes high level of treatment and S denotes severe damage, then Pr(S/H) is the probability that a fibre, known to have been treated at the high level, will be severely damaged. On the other hand, Pr(H/S) is the probability that a fibre, known to be severely damaged, was treated at the high level. In general,

Pr(E2/E1) ≠ Pr(E1/E2).

4.10 Statistical Independence

The data of Table 4.4 were collected by a company that receives similar parts from two suppliers, A and B. Large samples of the parts provided by each supplier were inspected, and the table shows the numbers of parts that were, or were not, defective.

Table 4.4: Numbers of Defective and Non-defective Parts

                      Supplier
                    A      B    Totals
Non-defective(N)   471    648    1119
Defective(D)        16     24      40
Totals             487    672    1159

These data can be treated in the same way as those in Table 4.3. For example, the proportion of defectives made by supplier A was

Pr(D/A) = 16/487 = 0.033,

while the proportion of defectives made by supplier B was

Pr(D/B) = 24/672 = 0.036.

These proportions of defectives are so nearly equal that for all practical purposes we see that it does not matter which supplier is used - they are both equally good (or bad!). Thus, the proportion of defectives is independent of the supplier. This contrasts strongly with the situation in Table 4.3, where the proportion of severely damaged fibres was highly dependent on the level of SR treatment.

Furthermore, the probability of a part being defective, irrespective of which supplier it came from, is given by the totals in the final column of the table, i.e.,

Pr(D) = 40/1159 = 0.035.

Hence it is seen that

Pr(D/A) ≈ Pr(D).

When this situation occurs, the events D and A are said to be statistically independent. The above is an example of the general definition of independence.

Definition The event E1 is statistically independent of the event E2 if

Pr(E1/E2) = Pr(E1) and Pr(E2/E1) = Pr(E2).

For independent events, the multiplication rule of Equation (4.8) becomes

Pr(E1 and E2) = Pr(E1) × Pr(E2). (4.9)

This rule can be extended to more than two independent events. Thus, if E1, E2, ..., Ek are independent events, then

Pr(E1 and E2 and ... and Ek) = Pr(E1) × Pr(E2) × ... × Pr(Ek). (4.10)

Example 4.4 A manufacturing process is used for making articles whose specified mass is 340 g. Occasionally the process drifts off target, and a quality-control scheme has been introduced to detect when this happens. The scheme consists in regularly selecting a random sample of four articles and finding the average mass of the sample, which can then be compared with the specified value. Because of the inevitable sampling variations, the sample averages will not often be exactly equal to the specified value, even when the process is running correctly, and the quality controller has to decide whether or not the difference is large enough to indicate that the process has drifted off target. This problem will be considered in more detail in a later chapter, but one way of interpreting regular inspection results of this kind will be considered here.

Table 4.5 shows the results of the last twelve sample averages in the order in which they were obtained.

Table 4.5

Sample No.   Sample mean (g)   Deviation from specification (g)   High(H) or Low(L)
    1             342                      2                            H
    2             338                     -2                            L
    3             337                     -3                            L
    4             341                      1                            H
    5             343                      3                            H
    6             342                      2                            H
    7             339                     -1                            L
    8             343                      3                            H
    9             341                      1                            H
   10             342                      2                            H
   11             342                      2                            H
   12             342                      2                            H

Imagine the quality controller has just made the twelfth inspection and is looking at the sequence of sample averages. He notes that the differences between the sample averages and the specified value, shown in the third column, are all small and that none of them, by itself, would justify adjusting the process. However, he also notices that the last five averages are all higher than the nominal. Is this an indication that the process has drifted off target and is tending to produce articles that are too heavy? This is a typical statistical problem, and it is answered by calculating the probability of the last five samples being high if the process is operating correctly.

To do this, we note that, if the process is making articles of the correct average mass, then we should expect about half the sample averages to be high, and half to be low. Consequently, the probability that any single sample average will be high is 1/2. Thus, if Hn denotes the event that the nth sample average is high, we have

Pr(H1) = Pr(H2) = ... = Pr(H12) = 1/2.

Now the events H1, H2, ..., H12 will be independent if the process is running correctly, for there is no reason then to suppose that a high result at any one inspection will influence the result of the next inspection. Therefore, the compound event of getting the last five samples high is given by Equation (4.10), i.e.,

Pr(H8 and H9 and H10 and H11 and H12)
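The independence check on Table 4.4 can also be done in a few lines of Python; this sketch is an illustration added here (the variable names are my own), comparing the conditional defect rates with the overall rate:

```python
# Table 4.4 counts of parts from suppliers A and B.
non_defective = {"A": 471, "B": 648}
defective = {"A": 16, "B": 24}

total = sum(non_defective.values()) + sum(defective.values())  # 1159 parts

# Overall defect rate Pr(D), and the conditional rates Pr(D/A), Pr(D/B).
pr_d = sum(defective.values()) / total
pr_d_given_a = defective["A"] / (defective["A"] + non_defective["A"])
pr_d_given_b = defective["B"] / (defective["B"] + non_defective["B"])
```

The three rates come out as 0.035, 0.033, and 0.036: nearly equal, which is exactly the condition Pr(D/A) ≈ Pr(D/B) ≈ Pr(D) that marks D and the choice of supplier as statistically independent for practical purposes.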
= Pr(H8) × Pr(H9) × Pr(H10) × Pr(H11) × Pr(H12)
= 1/2 × 1/2 × 1/2 × 1/2 × 1/2
= 0.03125.

The quality controller would now argue as follows. The probability of getting the last five averages high is only about 0.03 if the process is working on target, i.e., it is quite rare to get five high averages in succession if the process is running as it should be. When this rare event actually happens, it must therefore make us suspicious that the process has drifted, and it would be a good idea to do some further checks immediately.

Example 4.5 A knitting machine is producing garment blanks that have to be rejected if
(a) they are outside the mass tolerances, or
(b) they contain holes, caused by dropped stitches.
Past experience has shown that, on average, 5% of the blanks are outside mass tolerances and 3% of them contain holes. If these are the only faults causing rejection, what proportion of the production will be rejected?

We begin by noting that it is possible for a garment both to be outside tolerances and to contain holes, so that the events

E1: a garment is outside mass tolerances,
E2: a garment contains holes,

are not mutually exclusive. However, it is reasonable to assume that E1 and E2 are independent, since it is difficult to see a strong connection between incorrect mass and dropped stitches.

Now a garment is rejected if either E1 or E2 (or both) occurs. Hence the probability that a garment will be rejected is

Pr(rejection) = Pr(E1 or E2)
= Pr(E1) + Pr(E2) - Pr(E1 and E2), from Equation (4.6),
= Pr(E1) + Pr(E2) - Pr(E1) × Pr(E2),

after using Equation (4.9), since E1 and E2 are independent. But Pr(E1) = 0.05 and Pr(E2) = 0.03, so that

Pr(rejection) = 0.05 + 0.03 - 0.05 × 0.03 = 0.0785.

Since probabilities are interpreted as relative frequencies, this calculation suggests that 7.85% of the garment blanks will have to be rejected. ***

Example 4.6 A machine is protected by three fuses, A, B, and C, and the machine will function only if all three fuses are operative. Past experience has indicated that during the time taken to complete a cycle of operations the probabilities that the fuses will burn out are 0.01, 0.02, and 0.04, respectively. What is the probability that the machine will stop during a given cycle because of a blown fuse?

In this example, it is easiest to find the probability that the machine will not stop. The reason for this is that there is only one set of circumstances in which the machine will not stop, namely, if none of the fuses burns out, whereas there are several sets of circumstances that would cause the machine to stop (i.e., if A burns out, or if B does, or if A and C both do, and so on).

Let A denote the event that fuse A fails and Ā the event that it does not. Then Equation (4.5) gives Pr(A) + Pr(Ā) = 1, so that

Pr(Ā) = 1 - Pr(A) = 1 - 0.01 = 0.99.

Similarly,

Pr(B̄) = 0.98 and Pr(C̄) = 0.96.

Now the machine continues to run only if none of the fuses burns out. Assuming the fuses to behave independently, the multiplication rule, Equation (4.10), gives

Pr(machine does not stop) = Pr(Ā and B̄ and C̄)
= Pr(Ā) × Pr(B̄) × Pr(C̄)
= 0.99 × 0.98 × 0.96
= 0.931392.

Finally, we know that

Pr(machine stops) + Pr(machine does not stop) = 1,

so that

Pr(machine stops) = 1 - 0.931392 = 0.068608 (6.9%).

In practical terms, this means that about 6.9% of all machine cycles will be interrupted by blown fuses. ***

1. A piece of equipment will function only if three components A, B, and C are all working. The probability that A will fail during a given year is 5%, that B will fail is 15%, and that C will fail is 10%. What is the probability that the equipment will fail before the end of the year?

2. A garment is assembled from three components A, B, and C, and each garment consists of two A components, two B components, and one C component. It is known from past experience that 5% of the A components are faulty, 3% of the B components are faulty, and 4% of the C components are faulty. If components are selected at random for assembly, what proportion of the assembled garments will contain
(a) no faulty components;
(b) exactly one faulty component;
(c) at least two faulty components?

3. Garments are made up by sewing together four pieces of fabric, two of type A and two of type B. The proportion of pieces of type A that contain holes is 2%; the proportion of pieces of type B containing holes is 3%. A garment is scrapped if it contains two or more pieces with holes in them. If pieces are selected at random for making-up, calculate the proportion of scrapped garments.

4. A batch of 100 garments contains five defectives and arrives for final inspection in a factory. If items are selected at random from the batch and inspected, calculate the probability that
(i) the first two items inspected will be defective;
(ii) the first defective found will be the fourth to be inspected;
(iii) there will be just one defective in the first three items inspected.
(iv) The factory runs an acceptance sampling scheme, which states that a batch will be accepted if no defectives are found in the first five items inspected. Calculate the probability that the above batch will be accepted.

5. A room contains three machines A, B, and C. The probability that A will stop during a certain day is 3/4, the probability that B will stop is 1/2, and the probability that C will stop is 1/3.
(a) Calculate the respective probabilities that 0, 1, 2, and 3 machines will stop during the day.
(b) What is the most probable number of stopped machines?
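The complement argument of Example 4.6 translates directly into code. The following Python sketch (an illustration added here, not part of the original text) repeats the fuse calculation:

```python
# Example 4.6: the machine runs only if all three fuses survive the cycle.
# Burn-out probabilities for fuses A, B, and C during one cycle:
p_burn = [0.01, 0.02, 0.04]

# Pr(machine does not stop) is the product of the survival probabilities
# 0.99, 0.98, and 0.96, by the multiplication rule for independent events.
pr_runs = 1.0
for p in p_burn:
    pr_runs *= (1.0 - p)

# The complement gives the probability of a stoppage in a given cycle.
pr_stops = 1.0 - pr_runs
```

Enumerating every combination of blown fuses that stops the machine would require seven separate terms; working through the single "nothing fails" case and taking the complement is much shorter, which is why the text recommends it.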
5. Some Standard Probability Distributions

5.1 Introduction

The general approach to the solution of the basic problem of statistics (of making statements about a population by using the results of a sample) is called statistical inference, and it depends on there being a recognizable pattern in the data being considered. Methods for revealing such patterns in large samples were dealt with in Chapter 2. When a pattern has been detected, it can usually be likened to one of a number of standard, or theoretical, patterns, which then becomes a model for the data and which is assumed to represent the population from which the sample was drawn. Since the properties of the theoretical patterns are known, a step has thus been taken towards being able to make reasonable statements about the population. If the sample is small, there will not be enough data to reveal the underlying pattern, but in such cases previous experience with similar data will often suggest that a pattern exists and what type it is.

In order to carry out the procedures of statistical inference, it is necessary to have available a stock of standard patterns, or distributions, that can act as models for real data. In this chapter, a number of the more commonly used standard distributions will be given, together with examples of the kind of circumstances in which they might be expected to apply. It will become apparent that each type of distribution is defined by certain parameters, and individual distributions of the same type are distinguished by different numerical values of the defining parameters. For example, one very important distribution, the normal distribution, is completely defined once the values of its mean and standard deviation are known. In any particular investigation, the actual values of the parameters in the population being considered are unknown (there would be no problem of statistical inference if they were known). Thus, in many of the problems with which we shall be concerned in later chapters, the main interest will be in making statements about the possible population values of the defining parameters.

All the standard distributions with which we shall be concerned have a mean and a variance, and, as mentioned above, a major task of statistical analysis is often to make statements about, or estimates of, these quantities. It is therefore necessary to be able to find the mean and variance of the theoretical distributions. The distributions we shall consider fall into two distinct classes - those referring to discrete variables, and those appropriate for continuous variables.

Consider first a discrete variable x, which can take the integer values 0, 1, 2, ..., xi, ..., k. The distribution of x is usually defined by a formula for calculating the probability that x = xi, i.e.,

Pr(x = xi), 0 ≤ xi ≤ k.

Now, in Section 3.6, it was explained that the mean of a large sample can be calculated from a frequency table by using the formula (Equation 3.14)

x̄ = Σ fixi / n.

However, the relative frequency fi/n has been defined as an estimate of probability. Consequently, as the sample becomes larger and larger, so that it eventually includes the whole population, the relative frequency in the above formula becomes the probability, and the mean becomes the population mean, μ. Hence

μ = Σ (from xi = 0 to k) xi Pr(x = xi). (5.1)

A similar argument, starting with Equation (3.15), shows that the population variance is

σ² = Σ (from xi = 0 to k) (xi - μ)² Pr(x = xi). (5.2)

Fig. 5.1 A typical continuous distribution
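Equations (5.1) and (5.2) can be applied mechanically once the probabilities Pr(x = xi) are listed. The Python sketch below is an illustration added here; the small distribution it uses is made up for the purpose and is not taken from the text:

```python
# A hypothetical discrete distribution: Pr(x = xi) for xi = 0, 1, 2, 3.
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# Sanity check: the probabilities of a complete distribution sum to 1.
assert abs(sum(dist.values()) - 1.0) < 1e-9

# Population mean, Equation (5.1): sum of xi * Pr(x = xi).
mean = sum(x * p for x, p in dist.items())

# Population variance, Equation (5.2): sum of (xi - mean)^2 * Pr(x = xi).
variance = sum((x - mean) ** 2 * p for x, p in dist.items())
```

For this made-up distribution the two sums give a mean of 1.7 and a variance of 0.81; any discrete distribution defined by such a table is handled the same way.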
For continuous variables, the probability distribution is a smooth curve of the kind discussed in Section 2.5, defined by a probability-density function f(x) (-∞ < x < ∞), and Fig. 5.1 shows a typical example. Recalling that areas under such curves represent probabilities, we see that the probability that the value of the variable x will fall in the small interval (xi, xi + dxi) is given by the shaded area in Fig. 5.1. If the width dxi of the interval is very small, the shaded area will be almost a rectangle whose height is equal to the value of f(x) at x = xi, i.e., f(xi). Hence

Pr(xi < x < xi + dxi) = f(xi)dxi.

This is the analogue of the probability Pr(x = xi) in the discrete case. The contribution to the mean of the values in the interval (xi, xi + dxi) is therefore xif(xi)dxi, and the sum of all these contributions is the integral of this quantity over all values of xi, i.e.,

μ = ∫ xi f(xi) dxi.

In a similar way, the variance of a continuous distribution is found to be

σ² = ∫ (xi - μ)² f(xi) dxi.

Example 5.1 As part of a quality-control scheme, a manufacturing process is inspected at regular intervals so as to detect when it drifts off target. If a shift of a certain size takes place, a single inspection is not certain to detect it immediately but has a 30% chance of doing so. It is obviously of interest to the quality controller to know how quickly the shift might be detected. As a start towards solving this problem, suppose we calculate the probability that the process shift will be detected at the third inspection after it occurred.

Since the probability that any individual inspection will successfully detect the shift is 30%, we have, with an obvious notation,

Pr(S) = 0.3,

and the probability that the inspection will fail to detect the shift is

Pr(F) = 0.7.

Now, if the shift is to be first detected at the third inspection, the first two inspections must have failed to detect it. Hence

Pr(shift detected on third inspection)
= Pr(first inspection fails and second inspection fails and third inspection succeeds)
= Pr(F) × Pr(F) × Pr(S),

on using the multiplication rule, Equation (4.10), and making the reasonable assumption that successive inspections are independent. Therefore, if x denotes the number of the inspection at which the shift is detected,

Pr(x = 3) = 0.7 × 0.7 × 0.3 = 0.147.

This calculation can be repeated for different values of x (= 1, 2, 3, ...), and the probabilities thus generated are an example of the geometric distribution. ***

In fact, there is a general form of the distribution. The successive inspections in the above example are instances of what are generally called trials. With this nomenclature, the general form of the geometric distribution can be stated as follows.

5.3.1 The Geometric Distribution: General Form

Suppose a trial has two possible outcomes, success or failure. Let the probability of success in any single trial be constant and equal to p. Imagine a succession of independent trials; then the probability that the first success will occur at the rth trial is

Pr(x = r) = p(1-p)^(r-1), (5.5)

for r = 1, 2, 3, .... From this, it is seen that the distribution is completely defined when the value of p is known.

The mean and standard deviation of the general geometric distribution, found by using Equation (5.5) in Equations (5.1) and (5.2), are

(a) mean μ = 1/p; (5.6)
(b) standard deviation σ = (1-p)^(1/2)/p. (5.7)

Example 5.1 (continued) In this example, p = 0.3, and Equation (5.5) gives the probability values shown in Table 5.1.

Table 5.1

  r    Pr(x = r)      r    Pr(x = r)
  1      0.300        9      0.017
  2      0.210       10      0.012
  3      0.147       11      0.008
  4      0.103       12      0.006
  5      0.072       13      0.004
  6      0.050       14      0.003
  7      0.035       15      0.002
  8      0.025      ≥16      0.006

This distribution is illustrated by the bar chart shown in Fig. 5.2. From the table, we see that the chances of detecting a shift on the first sample are 0.3, on the second sample 0.21, on the third sample 0.147, and so on. The probability of not detecting the shift until the tenth sample is small, equal to 0.012.

The mean and standard deviation of this particular geometric distribution are, from Equations (5.6) and (5.7),

μ = 1/0.3 = 3.33 and σ = (0.7)^(1/2)/0.3 = 2.79.

Note that the mean value of a variable that can take only integer values need not be an integer. ***
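The entries of Table 5.1 are easy to regenerate. This Python sketch (an illustration added here, with variable names of my own choosing) applies Equations (5.5), (5.6), and (5.7) with p = 0.3:

```python
# Geometric distribution, Equation (5.5): probability that the first
# success (here, first detection of the shift) occurs at the r-th trial.
p = 0.3

def pr_geometric(r, p):
    return p * (1.0 - p) ** (r - 1)

# Reproduce the body of Table 5.1, rounded to three decimals.
table = {r: round(pr_geometric(r, p), 3) for r in range(1, 16)}

mean = 1.0 / p                 # Equation (5.6): mu = 1/p
sd = (1.0 - p) ** 0.5 / p      # Equation (5.7): sigma = sqrt(1-p)/p
```

The rounded probabilities match the table (0.300, 0.210, 0.147, ...), and the mean of 1/0.3 = 3.33 inspections illustrates the closing remark: an integer-valued variable need not have an integer mean.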
5.4 The Binomial Distribution

5.4.1 The General Form

Like the geometric distribution, the binomial distribution is concerned with a sequence of independent trials, the probability of success at each trial being always equal to p. We now envisage that a fixed number n of trials is carried out, and the binomial distribution gives the probability of getting exactly r successes (0 ≤ r ≤ n) in the n trials.

If x now denotes the number of successes in n independent trials, then it can be shown that

Pr(x = r) = [n! / (r!(n-r)!)] p^r (1-p)^(n-r), (5.8)

where r! (read r factorial) = 1 × 2 × 3 × ... × r.

The binomial distribution is defined when n and p are known, and some typical distributions are shown in Fig. 5.3. This diagram illustrates the fact that, when n is small and p is near to 0 or near to 1, the binomial distribution is skew (e.g., the curve labelled n = 10, p = 0.1). When p = 0.5, the distribution is symmetrical about its mid-point for all values of n (e.g., when n = 10, p = 0.5 in Fig. 5.3), and it also tends to be symmetrical when n is large, even though p may be close to 0 or to 1 (e.g., when n = 100, p = 0.1 in Fig. 5.3). This tendency to symmetry for large values of n is of considerable importance and will be mentioned again in Section 5.7.

The mean and standard deviation of the binomial distribution are

μ = np, and (5.9)
σ = √(np(1-p)). (5.10)

Fig. 5.3 Some typical binomial distributions (n = 10, p = 0.1; n = 10, p = 0.5; n = 100, p = 0.1)

Example 5.2 A manufacturer supplies large batches of articles. When a batch is ready for dispatch to a customer, it is inspected for defectives. It is too expensive to inspect every article in the batch, and it has been suggested that a sample of 50 articles should be randomly chosen from each batch and inspected, and that, if two or more defectives are found in the sample, the batch should be rejected. Past experience has shown that the manufacturing process usually produces 2.7% defectives, and this is considered acceptable. Is the suggested scheme a sensible one?

Whether or not the scheme is reasonable depends on how frequently a sample of size 50 will contain two or more defectives when the process is running satisfactorily. For, if the rule leads to the rejection of too many satisfactory batches, it is obviously not sensible. We therefore calculate the probability of finding two or more defectives in a sample of size 50 when the process is making 2.7% defectives.

The procedure of selecting a random sample of 50 articles from a batch can be thought of as doing 50 separate trials, each consisting in choosing an article at random. If the batch is very large compared with the size of the sample, the probability that any article chosen for the sample will be a defective is constant and equal to 0.027 (2.7%). These are just the conditions in which a binomial distribution applies, with n = 50 and p = 0.027, and Equation (5.8) can be used to generate the necessary probabilities.

If x denotes the number of defectives in a sample, we have to find

Pr(x ≥ 2) = Pr(x = 2 or x = 3 or ... or x = 50)
= Pr(x = 2) + Pr(x = 3) + ... + Pr(x = 50)

by the addition rule of probability. Therefore, to find Pr(x ≥ 2) we need to put r = 2, 3, ..., 50 successively in Equation (5.8) and add the results together. This is a very tedious calculation; it can, however, be much shortened by noting that

Pr(x ≥ 2) + Pr(x < 2) = 1

and that

Pr(x < 2) = Pr(x = 0 or x = 1) = Pr(x = 0) + Pr(x = 1),

so that

Pr(x ≥ 2) = 1 - Pr(x = 0) - Pr(x = 1).

Equation (5.8) gives

Pr(x = 0) = [50!/(0! 50!)] (0.027)^0 (0.973)^50 = (0.973)^50 = 0.2545.

(Note that 0! is defined to be equal to 1, and that a^0 = 1, where a is any number.)
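The complement shortcut of Example 5.2 can be sketched in Python; this is an illustration added here (using the standard-library `math.comb` for the binomial coefficient), not part of the original text:

```python
from math import comb

# Example 5.2: n = 50 articles sampled, each defective with p = 0.027.
n, p = 50, 0.027

def pr_binomial(r, n, p):
    """Binomial probability of exactly r successes, Equation (5.8)."""
    return comb(n, r) * p ** r * (1.0 - p) ** (n - r)

pr_0 = pr_binomial(0, n, p)        # = 0.973**50
pr_1 = pr_binomial(1, n, p)        # = 50 * 0.027 * 0.973**49

# Pr(x >= 2) via the complement, instead of summing 49 separate terms.
pr_2_or_more = 1.0 - pr_0 - pr_1
```

The sketch gives Pr(x = 0) = 0.2545 and Pr(x = 1) = 0.3531, agreeing with the hand calculation, and a rejection probability near 0.392 for the proposed two-defective rule.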
Also,

Pr(x = 1) = [50!/(1! 49!)] (0.027)^1 (0.973)^49
= 50 × 0.027 × (0.973)^49
= 0.3531.

Therefore,

Pr(x ≥ 2) = 1 - 0.2545 - 0.3531 = 0.3924.

This calculation shows that, if the process always produced batches containing the satisfactory level of 2.7% defectives, then about 39% of all such satisfactory batches would be rejected by the inspection plan. The proposed plan is obviously not a sensible one, since it results in such a high proportion of wrong decisions.

An obvious way to modify the scheme so as to avoid this difficulty is to allow more defectives in the sample before a batch is rejected. By proceeding as above, Table 5.2 can be constructed.

Table 5.2: The Cumulative Binomial Distribution when n = 50, p = 0.027

  r    Pr(x ≥ r)
  0     1.0000
  1     0.7455
  2     0.3924
  3     0.1524
  4     0.0458
  5     0.0111
  6     0.0022
  7     0.0004
  8     0.0001

From this it is seen that, if the rule were modified to reject a batch if three or more defectives are found in the sample, then about 15% of the satisfactory batches would be rejected. This is still too many wrong decisions, and a rule that rejects a batch when four or more defectives are found in a sample is more sensible, since then only about 4.5% of acceptable batches would be incorrectly rejected.

This example is allied to a branch of quality control known as sampling inspection, which will be dealt with in more detail in a later chapter. ***

Example 5.3 A chemist developed a new softening agent N that he considered would be more effective than the standard treatment S for a particular type of fabric. To test his claim he carried out an experiment. The treatments were applied to separate randomly chosen specimens of the fabric, and the treated specimens were then submitted to each of ten assessors, who were asked to state which fabric they preferred. Seven out of the ten assessors preferred the fabric treated by the new process. Is this sufficient evidence to justify the claim that the new treatment is better than the standard?

The argument used to answer this question is a common one in statistical inference and is an attempt to put into objective terms what we instinctively feel about situations of this kind. If all ten assessors had preferred the new treatment, we should probably have had little doubt that it was better than the standard, and even if nine out of the ten preferred N we should still probably feel fairly confident in concluding that N was better. On the other hand, if, say, five or fewer of the ten assessors had preferred N, then we should probably conclude that N is not better than S. The particular experiment we are concerned with had a result somewhere between these extremes, and the issue is not so clear-cut. While seven out of ten assessors preferring N appears to be some kind of evidence in favour of N, it would be preferable if there were some way of measuring its strength. This kind of situation occurs frequently in the interpretation of experimental data, and the argument used is a common one. The basic approach is to enquire whether the result actually obtained in the experiment could easily have arisen purely by chance.

Suppose there really is no difference in the effectiveness of the treatments N and S. Then the assessors cannot really distinguish between them and would therefore state their preferences at random. Since they have only two choices, the probability that any assessor will choose N as his preference is 1/2. Each assessor can be regarded as a separate independent 'trial', and an N preference as a 'success'. Consequently, the conditions in which the binomial distribution applies are met, with n = 10 and p = 0.5.

This allows us to calculate the probability of getting seven N preferences when there really is no difference between the treatments. In fact, if seven N preferences are evidence in favour of the new treatment, eight, nine, or ten such 'successes' would be even more favourable experimental results. Consequently, we find the probability of getting seven or more N preferences purely by chance. If x is the number of N preferences given by the ten assessors, we have

Pr(x ≥ 7 / p = 0.5, n = 10)
= Pr(x = 7 or x = 8 or x = 9 or x = 10 / p = 0.5, n = 10)
= Pr(x = 7 / p = 0.5, n = 10) + ... + Pr(x = 10 / p = 0.5, n = 10).

Using Equation (5.8) with p = 0.5, n = 10, we find

Pr(x ≥ 7 / p = 0.5, n = 10) = 0.1172 + 0.0440 + 0.0098 + 0.0010 = 0.1720.

The probability of getting seven or more N preferences purely by chance is thus quite high. Another way of looking at this is to imagine the experiment repeated indefinitely, there being no real difference between the treatments. Then about 17% of all such experiments would result in seven or more N preferences being given.

The particular experimental result obtained is thus seen to be not at all unusual and could easily be obtained by chance; consequently, the experiment does not provide very strong evidence to suggest that the new treatment is better than the standard. ***

5.5 The Poisson Distribution

5.5.1 The General Form

The two distributions so far considered refer to circumstances in which a succession of separate, independent trials is imagined to occur, each trial having only two possible results, namely, 'success' or 'failure'. However, there are many situations where these conditions do not apply. One important case is when events occur randomly in space or time, and there is theoretically no limit to the number of events that can happen in a trial, or unit. Examples are the number of faults in 10-m lengths of cloth, or the number of end-breaks per hour on a spinning frame. Provided that the events (faults, end-breaks, etc.) occur independently and only one at a time, the appropriate distribution is often the Poisson.

For this distribution, the variable x is the number of events per unit of time or space and can take the values 0, 1, 2, .... The probability that exactly r events will occur is given by

Pr(x = r) = e^(-m) m^r / r!, r = 0, 1, 2, ..., (5.11)

where m is the parameter of the distribution.
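The tail probability in Example 5.3 is a four-term binomial sum; the following Python sketch (an illustration added here, not part of the original text) reproduces it exactly:

```python
from math import comb

# Example 5.3: under "no real difference", the number of N preferences
# among ten assessors is binomial with n = 10, p = 0.5.
n, p = 10, 0.5

# Probability of seven or more N preferences arising purely by chance,
# summing Equation (5.8) for r = 7, 8, 9, 10.
pr_7_or_more = sum(comb(n, r) * p ** r * (1 - p) ** (n - r)
                   for r in range(7, n + 1))
```

The exact value is 176/1024 = 0.1719, which the text's sum of rounded terms reports as 0.1720: far too large for seven preferences out of ten to count as strong evidence against chance.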
5. SomSetandarPd robab1iity Distri butior.-s-----------·--------
~!:....I~~and Standard Deviation
The defining parameter of the Poisson distribution is m; i.e., once the numerical value of m is known, Equation (5.11) can be used to calculate the probability that r events will occur in a unit of time or space, for r = 0, 1, 2, .... Some typical distributions, with different values of m, are shown in Fig. 5.4. The diagrams show that when m is small the Poisson distribution is highly skewed (e.g., the curve labelled m = 1). As m increases, the distribution tends to become symmetrical, as can be seen from the curves labelled m = 5 and m = 10.

It can be shown that the parameter m is equal to the mean of the distribution and that it is also equal to the variance. Thus we have

μ = m,   (5.12)

σ = m^(1/2).   (5.13)

Example 5.4.
A factory contains a large number of similar machines which stop randomly at an average rate of 4.2 per day. In a normal working day of eight hours, the mechanic can deal with up to six stoppages; if more than six breakdowns occur, he works overtime, subject to a maximum of four hours. What is the average amount of overtime he works?

If the mechanic can deal with six stoppages in eight hours, the average time needed to restart a stopped machine is 8/6 = 1 1/3 hours, and in four hours' overtime he can deal with up to three extra breakdowns. Table 5.3 shows all possible cases. The first column shows all possible numbers of stoppages. If the breakdowns occur randomly, the probabilities in the second column can be calculated from the Poisson distribution with m = 4.2, i.e.,

Pr(x = r) = e^(-4.2) (4.2)^r / r!,

as shown in the table. For the first row,

Pr(x ≤ 6) = Pr(x = 0 or 1 or 2 or 3 or 4 or 5 or 6)
          = Pr(x = 0) + Pr(x = 1) + ... + Pr(x = 6)
          = e^(-4.2)(4.2)^0/0! + e^(-4.2)(4.2)^1/1! + ... + e^(-4.2)(4.2)^6/6!
          = 0.0150 + 0.0630 + 0.1323 + 0.1852 + 0.1944 + 0.1633 + 0.1143
          = 0.8675.

Similarly, for r = 7,

Pr(x = 7) = e^(-4.2)(4.2)^7/7! = 0.0686.

Table 5.3

Number of stoppages   Probability of   Hours of
in a day              this number      overtime
6 or fewer            0.8675           0
7                     0.0686           1 1/3
8                     0.0360           2 2/3
9 or more             0.0279           4

The final column in the table shows the hours of overtime necessary to deal with each number of stoppages. This, together with the column of probabilities, constitutes a probability distribution of hours of overtime. Therefore, by using Equation (5.1), the mean number of overtime hours per day is

0 x 0.8675 + 1 1/3 x 0.0686 + 2 2/3 x 0.0360 + 4 x 0.0279 = 0.2991 hour ≈ 18 minutes. ***
Fig. 5.4 Poisson distributions Pr(x = r) for different values of m
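The figures in Table 5.3 can be verified with a short calculation. The following Python sketch (the function name is ours) evaluates Equation (5.11) directly:

```python
from math import exp, factorial

def poisson_pmf(r, m):
    # Pr(x = r) = e^(-m) m^r / r!  -- Equation (5.11)
    return exp(-m) * m ** r / factorial(r)

m = 4.2  # average stoppages per day

# Probabilities for Table 5.3
p6_or_fewer = sum(poisson_pmf(r, m) for r in range(7))
p7 = poisson_pmf(7, m)
p8 = poisson_pmf(8, m)
p9_or_more = 1 - p6_or_fewer - p7 - p8

# Mean overtime hours, as in Equation (5.1): sum of (hours x probability)
mean_overtime = 0 * p6_or_fewer + (4 / 3) * p7 + (8 / 3) * p8 + 4 * p9_or_more
print(round(p6_or_fewer, 4))  # about 0.8675
print(round(mean_overtime, 2))  # about 0.30 hour, i.e., roughly 18 minutes
```

Working with the unrounded probabilities gives a mean of 0.2992 hour, agreeing with the tabulated 0.2991 to within rounding.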
5.6 The Normal Distribution

5.6.1 The General Form
We turn now to probably the most important and most widely used distribution in statistics. There are several reasons why the normal (or Gaussian) distribution occupies such a central position. One is that many continuous variables do, in fact, have distributions that are normal, or very nearly so (though it must be emphasized that there is not necessarily anything wrong with a variable whose distribution is not normal). Another reason is that the distribution occurs naturally at many points in the development of the mathematical theory of statistics.

Unlike the geometric, binomial, and Poisson distributions, which are concerned with discrete variables, the normal distribution refers to a continuous variable. It is therefore defined by a probability-density function f(x), as described in Sections 2.5 and 5.2. For the normal distribution,

f(x) = {1/[σ(2π)^(1/2)]} exp{-(x-μ)²/2σ²},   -∞ < x < ∞,   (5.14)

where μ and σ are, respectively, the mean and standard deviation of the distribution. These are the parameters that characterize the distribution; once their numerical values are known, the distribution is completely defined, and the notation N(μ,σ²) is often used to refer to a normal distribution with mean μ and variance σ². Thus N(10,25) is a normal distribution with a mean of 10 and a variance of 25. There is therefore an infinite number of normal distributions, some of which are shown in Fig. 5.5. The curves are symmetrical about their means; theoretically they extend to infinity in both directions, never quite reaching the horizontal axis.

5.6.2 The Standard Normal Distribution
We have seen that there is an infinite number of normal distributions, each with its own mean and standard deviation. However, it is possible to transform any one of them into a unique standard form, by using a linear transformation of the kind discussed in Section 3.5, and it will be seen later that there are very positive advantages in doing this.

Suppose the variable x has a mean μx = μ and a standard deviation σx = σ, and consider the linear transformation

u = (x-μ)/σ.   (5.15)

Then, using Equations (3.10) and (3.12) and an obvious notation, we find that

μu = (μx-μ)/σ = 0, since μx = μ,

and that

σu = σx/σ = 1, since σx = σ.

Further, it can be shown that, if x has a normal distribution, then u will also be normally distributed. Consequently the transformation of Equation (5.15) turns an N(μ,σ²) variable into an N(0,1) variable. The distribution N(0,1), with zero mean and unit variance and standard deviation, is called the standard normal distribution.

The transformation of any normal distribution N(μ,σ²) into the standard normal distribution N(0,1) is shown diagrammatically in Fig. 5.6. In particular, any value of x (= X, say) is transformed into a value of u (= U) given by Equation (5.15), i.e.,

U = (X-μ)/σ.   (5.16)

Another way of regarding the variable u that is often useful is as follows. Rearranging Equation (5.16) gives

X - μ = Uσ.

Now X - μ is the deviation of the value X from the mean μ, and the above equation shows that this can be equated to U standard deviations.

Fig. 5.6 Transformation of N(μ,σ²) into N(0,1)

5.6.3 Tables of the Normal Distribution
It was explained in Section 2.5 that areas under probability curves represent probabilities. Thus the area shown shaded in Fig. 5.6(a) is equal to the probability Pr(x ≥ X). It is important to appreciate clearly what this means. If there is a population that is represented by the distribution N(μ,σ²) and a member of this population is chosen at random, then the shaded area gives the probability that the numerical value of the member chosen will be greater than a specified value X. Alternatively, because probabilities are interpreted as relative frequencies, the proportion of members of the population having values greater than X is equal to the shaded area.

The area can be calculated by evaluating the integral

∫_X^∞ f(x) dx = ∫_X^∞ {1/[σ(2π)^(1/2)]} exp{-(x-μ)²/2σ²} dx   (5.17)

for the case of a normal distribution. Now this integration is not easy, and has to be carried out numerically for each combination of the values of μ, σ, and X. The calculation of normal probabilities in this way would therefore be a very tedious procedure. The tedium can be overcome, however, by producing tables of the areas, though a problem still exists because there is an infinite number of normal distributions, and to tabulate areas under them all would obviously be impossible. The difficulty is overcome by using the standard normal distribution, which we have seen is obtained from N(μ,σ²) by the transformation

u = (x-μ)/σ, so that du = dx/σ.

The integral in Equation (5.17) therefore becomes

∫_X^∞ {1/[σ(2π)^(1/2)]} exp{-(x-μ)²/2σ²} dx = ∫_U^∞ {1/(2π)^(1/2)} exp(-u²/2) du.

In other words, the shaded area to the right of X in Fig. 5.6(a) is equal to the shaded area to the right of U in Fig. 5.6(b). This shows that it is only necessary to tabulate areas under the standard normal curve, and such tables are given in the Appendix.

Tables A1 and A2 refer to values of the areas α in the right-hand tail of the distribution. Table A1 enables α = Pr(u ≥ U) to be found when U is known, while Table A2 shows the values of U for given values of α. There is no need to tabulate more than this; other probabilities likely to be needed can easily be found when the right-hand tail area is known. The following examples will illustrate this point.

Example 5.5.
The chest girths of a large sample of men were measured, and the mean and standard deviation of the measurements were found to be

mean = 96 cm; standard deviation = 8 cm.

It is required to estimate the proportion of men in the population with chest girths

(i) greater than 104 cm;
(ii) less than 100 cm;
(iii) less than 90 cm;
(iv) between 100 cm and 110 cm.

Since the sample was large, we can suppose that the mean and standard deviation of the sample are good estimates of the corresponding parameters in the population, i.e.,

μ ≈ 96 cm; σ ≈ 8 cm.

Further, we shall make the reasonable assumption that chest girth has a normal distribution (this assumption could be checked by using the methods of Section 5.7).

(i) It is always helpful to draw a sketch for problems of this type, and the appropriate sketch for this case is shown in Fig. 5.7(i). The proportion of men with chest girths greater than 104 cm is shown shaded, and we see that it is a right-hand tail area given directly by the tables. We have the basic equations

u = (x-μ)/σ and Pr(x ≥ X) = Pr(u ≥ U) = Pr(u ≥ (X-μ)/σ).

Hence

Pr(x > 104) = Pr(u > (104-96)/8) = Pr(u > 1.0) = 0.1587

from Table A1. Thus, about 16% of men have chest girths greater than 104 cm.

(ii) The sketch for this part is Fig. 5.7(ii). The required area is shown shaded; it is not a right-hand tail area and therefore cannot be found directly in the table. However, since the total area under the curve is unity, we have

Pr(x < 100) + Pr(x ≥ 100) = 1, and therefore Pr(x < 100) = 1 - Pr(x ≥ 100).

Now Pr(x ≥ 100) is the unshaded area, and this can be found directly. We have

Pr(x ≥ 100) = Pr(u ≥ (100-96)/8) = Pr(u ≥ 0.5) = 0.3085. Hence

Pr(x < 100) = 1 - 0.3085 = 0.6915,

so that about 69% of men are estimated to have chest girths smaller than 100 cm.

(iii) Fig. 5.7(iii) shows the sketch for this problem. The area required lies in the left-hand tail, and, proceeding as usual, we find

Pr(x < 90) = Pr(u < (90-96)/8) = Pr(u ≤ -0.75).

The negative value of U simply means that we are dealing with a left-hand tail area. To find this area, we note that the standard normal distribution is symmetrical about the point u = 0. Therefore the area to the left of -U is equal to the area to the right of +U, as can be seen in Fig. 5.8, so that Pr(u ≤ -U) = Pr(u ≥ +U); hence

Pr(u ≤ -0.75) = Pr(u ≥ +0.75).

Now Table A1 does not give the value of this probability directly. However, the values for U = 0.74 and for U = 0.76 are given; in fact, when U = 0.74, α = 0.2296, and, when U = 0.76, α = 0.2236. The value for U = 0.75 is obtained by linear interpolation, i.e., when U = 0.75,

α = 1/2(0.2296 + 0.2236) = 0.2266,

which suggests that just under 23% of men in the population have chest girths less than 90 cm.

(iv) This case is sketched in Fig. 5.7(iv). The required area, shown shaded, is not a tail area; however, we have

Pr(100 < x < 110) = Pr(x ≥ 100) - Pr(x ≥ 110),

and the two probabilities on the right-hand side of this equation are right-hand tail areas. The value of Pr(x ≥ 100) has already been found to be 0.3085 in (ii), while

Pr(x ≥ 110) = Pr(u ≥ (110-96)/8) = Pr(u ≥ 1.75) = 0.0401.

Hence

Pr(100 < x < 110) = 0.3085 - 0.0401 = 0.2684,

i.e., about 27% of men have chest girths between 100 cm and 110 cm. ***

Example 5.6.
A wearing trial was carried out to estimate the average life of a certain kind of garment. One hundred people were each issued with a garment and asked to wear it for ten hours a day, five days a week. At the end of three weeks, four wearers reported that their garment was worn out, and after four weeks a further 16 wearers considered their garment was finished. Assuming the lives of the garments are normally distributed, estimate the mean and standard deviation of the garment life, and hence estimate the proportion of garments with lives longer than 300 hours.

Let μ and σ be the mean and standard deviation of garment life. It will be convenient to work in units of hours, so that each week is equivalent to 50 hours' wear. At the end of three weeks, i.e., 150 hours, four garments had failed, and the situation is as shown in Fig. 5.9(a). The shaded area represents the four garments worn out at the end of three weeks. For a left-hand tail area of 0.04, we find, from Table A2, that U = -1.75, the minus sign being introduced because a left-hand tail area is involved. The transformation equation U = (X-μ)/σ therefore gives

-1.75 = (150-μ)/σ.   (a)

After four weeks, or 200 hours, a total of 4 + 16 = 20 garments were worn out, and the corresponding sketch is Fig. 5.9(b). For a left-hand tail area of 0.2, Table A2 gives U = -0.84, so that

-0.84 = (200-μ)/σ.   (b)

Equations (a) and (b) are simultaneous equations to solve for μ and σ, leading to

μ = 246 hours, σ = 55 hours.

The proportion of garments with lives longer than 300 hours is given by the shaded area in Fig. 5.9(c). Thus

Pr(life ≥ 300) = Pr(u ≥ (300-246)/55) = Pr(u ≥ 0.98) = 0.1635

from Table A1, i.e., about 16% of garments are expected to have lives longer than 300 hours.

The advantage of this approach, which, of course, relies on the assumption that garment lives are normally distributed, is that a reasonable estimate of the average life can be made long before all the garments are worn out. ***
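The garment-life estimate can be sketched in Python, with statistics.NormalDist standing in for Tables A1 and A2 (the exact values differ slightly from the two-decimal table entries):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal; plays the role of Tables A1 and A2

# Observed left-hand tail areas: 4% failed by 150 h, 20% by 200 h
u1 = z.inv_cdf(0.04)  # about -1.75
u2 = z.inv_cdf(0.20)  # about -0.84

# Solve the simultaneous equations u1 = (150 - mu)/sigma, u2 = (200 - mu)/sigma
sigma = (200 - 150) / (u2 - u1)
mu = 150 - u1 * sigma

# Proportion of garments lasting longer than 300 hours
p300 = 1 - z.cdf((300 - mu) / sigma)
print(round(mu), round(sigma), round(p300, 3))
```

The two equations differ only in the constants 150 and 200, so subtracting them isolates σ at once; μ then follows from either equation.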
Fig. 5.9 Distribution of garment life, showing the tail areas at 150, 200, and 300 hours

In matching a coloured pattern during dyeing, a colour difference of zero would, of course, be the ideal. However, for a particular dyeing it has been agreed between the dyer and his customer that colour differences of up to 5 colour-difference units will be acceptable, i.e., the measured colour difference must be within the range ±5 units, where a - sign has been used to indicate that the dyed fabric is lighter than the standard and a + sign indicates that it is darker.

If a dyeing is too light, the fabric has to be reprocessed at an extra cost of 25% of the basic dyeing cost; if it is too dark, the fabric must be stripped and redyed at an additional cost of 50%. Past experience has indicated that repeat batches of this dyeing produce colour differences that have a normal distribution with a standard deviation of 4 units. What colour difference should the dyer aim at in order to minimize the average cost of dyeing?

Suppose c is the basic cost of dyeing. Then, if a dye batch is originally too light, the total cost will be 1.25c, while, if it is originally too dark, the total cost will be 1.5c. Let μ be the required objective colour difference; then the distribution of colour differences in repeat batches is as shown in Fig. 5.10. The dyeing tolerances are also shown, together with the costs. Now suppose α1 is the probability of getting a too-dark batch, and α2 the probability of getting a too-light one. These probabilities are given by the shaded areas in Fig. 5.10. Obviously, the chance of getting an acceptable batch is 1-α1-α2. These probabilities can be associated with the costs, so that the average cost c̄ of a dye batch is, by using a result analogous to Equation (5.1),

c̄ = (1-α1-α2)c + 1.5cα1 + 1.25cα2,

i.e.,

c̄/c = 1 + 0.5α1 + 0.25α2.   (5.18)
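Assuming the costs and σ = 4 given above, the average-cost ratio of Equation (5.18) can be evaluated directly for any aim value μ. This Python sketch uses the exact normal c.d.f., so fourth-decimal figures may differ slightly from values interpolated in Table A1:

```python
from statistics import NormalDist

u = NormalDist()  # standard normal
sigma = 4.0       # standard deviation of colour differences between repeat batches

def cost_ratio(mu):
    # Average cost / basic cost = 1 + 0.5*a1 + 0.25*a2 (Equation (5.18)),
    # where a1 = Pr(too dark) = Pr(x > +5), a2 = Pr(too light) = Pr(x < -5)
    a1 = 1 - u.cdf((5 - mu) / sigma)   # right-hand tail beyond +5
    a2 = u.cdf((-5 - mu) / sigma)      # left-hand tail below -5
    return 1 + 0.5 * a1 + 0.25 * a2

for aim in [0.0, -0.5, -1.0, -1.5]:
    print(f"{aim:+.1f}  {cost_ratio(aim):.4f}")
```

The ratio is very flat near its minimum, so small differences in the aim value around one unit light change the average cost only in the fourth decimal.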
The values of α1 and α2 depend on the position of μ relative to the tolerances ±5 and can be found from tables of the normal distribution by calculating the appropriate values of U for the two tolerance limits, i.e.,

U1 = (5-μ)/4 and U2 = (5+μ)/4,

the left-hand tail area α2 being equal, by symmetry, to the right-hand tail area beyond U2. Table 5.4 shows these values of U for a range of values of μ and the corresponding values of α1 and α2 obtained from Table A1. (Note that, for values of U such as 1.275, the values of α have been found by interpolating in Table A1.) The final column of Table 5.4 shows the ratio c̄/c calculated by using Equation (5.18). From this, it is seen that the minimum value of c̄ occurs when μ = -1.0 unit. This suggests that, in order to minimize the average dye cost per batch, the dyer should aim to make the dyeings lighter than the standard by one colour-difference unit.

Table 5.4: Mean Cost of Dyeing for Different Values of μ

  μ     U1     U2      α1      α2      c̄/c
  0    1.25   1.25   0.1056  0.1056  1.0792
 -0.1  1.275  1.225  0.1013  0.1103  1.0782
 -0.2  1.3    1.2    0.0968  0.1151  1.0772
 -0.3  1.325  1.175  0.0926  0.1200  1.0763
 -0.4  1.35   1.15   0.0885  0.1251  1.0755
 -0.5  1.375  1.125  0.0846  0.1303  1.0749
 -0.6  1.4    1.1    0.0808  0.1357  1.0743
 -0.7  1.425  1.075  0.0771  0.1412  1.0738
 -0.8  1.45   1.05   0.0735  0.1469  1.0734
 -0.9  1.475  1.025  0.0701  0.1527  1.0732
 -1.0  1.5    1.0    0.0668  0.1587  1.0730
 -1.1  1.525  0.975  0.0637  0.1645  1.0731
 -1.2  1.55   0.95   0.0606  0.1711  1.0731
 -1.3  1.575  0.925  0.0577  0.1775  1.0732
 -1.4  1.6    0.9    0.0548  0.1841  1.0734

***

5.7 A Simple Test for Normality
Many of the common methods used for drawing conclusions from experimental data depend on being able to assume that the data have come from a normal population. It is therefore of some interest to have a quick method for checking whether this assumption is justified.

One of the simplest procedures is illustrated in Table 5.5. The first column shows the data, which are the lengths (in cm) of a sample of nine garments chosen at random. The data are next arranged in ascending order of magnitude, as shown in the second column; these ordered values will be denoted by X(i). In the third column, values of i/(n+1) are shown, where n is the sample size and i = 1, 2, ..., n. These values of i/(n+1) are regarded as cumulative normal probabilities, as shown in Fig. 5.11, and the final column shows the corresponding values of the standard normal variate u read off from Table A1.

Table 5.5

Garment      In ascending
lengths xi   order X(i)     i/(n+1)   u
48.3         48.3           0.1       -1.28
52.7         49.4           0.2       -0.84
56.2         51.3           0.3       -0.52
49.4         52.7           0.4       -0.25
53.2         53.2           0.5        0.00
55.1         55.1           0.6        0.25
51.3         55.2           0.7        0.52
55.2         56.2           0.8        0.84
58.2         58.2           0.9        1.28
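The plotting positions can be generated without tables. This Python sketch reproduces the u column of Table 5.5 (the data are the nine garment lengths above):

```python
from statistics import NormalDist

# Garment lengths (cm), as in the first column of Table 5.5
x = [48.3, 52.7, 56.2, 49.4, 53.2, 55.1, 51.3, 55.2, 58.2]
n = len(x)

# For the i-th ordered value, u is the standard normal variate whose
# cumulative probability is i/(n+1)
for i, xi in enumerate(sorted(x), start=1):
    ui = NormalDist().inv_cdf(i / (n + 1))
    print(f"{xi:5.1f}  {ui:+.2f}")
# Plotting X(i) against u gives roughly a straight line when the
# sample comes from a normal population.
```

Plotting these pairs (by hand or with any plotting package) gives the normal-probability plot of Fig. 5.12.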
Fig. 5.12 Plot of X(i) against u for the garment-length data

Fig. 5.13 Plot of X(i) against u for the linear-density data
The values of X(i) are now plotted against the values of u, as in Fig. 5.12, and, if the plotted points fall close to a straight line, as in this case, the sample can be regarded as having come from a normal population.

If a frequency table is available, the test is slightly modified in that the upper end of each class interval is plotted against the value of u corresponding to (0.5 + Σf)/(n+1), where Σf is the cumulative frequency up to the ith class. As an example of this, consider the count data of Table 2.3, whose frequency distribution is shown in Table 2.4. The complete calculation is shown in Table 5.6, and the graph of X(i) against u is shown in Fig. 5.13. Again the points fall close to a straight line, suggesting that the population of linear-density values in tex was normal.

Table 5.6

Upper end      f    Cumulative      (0.5+Σf)/(n+1)   u
of class X(i)       frequency Σf
29.55           5     5             0.028            -1.92
30.05          11    16             0.085            -1.38
30.55          26    42             0.220            -0.78
31.05          44    86             0.448            -0.14
31.55          54   140             0.728             0.60
32.05          29   169             0.878             1.16
32.55          15   184             0.956             1.70
33.05           6   190             0.987             2.22
33.55           2   192             0.997             2.75

(n = 192)

Estimates of the population mean μ and its standard deviation σ can be found from the straight lines drawn on the graphs. An estimate of μ is the value of X(i) when u = 0. From Fig. 5.13, this gives an estimate of 31.15 tex, compared with the calculated value x̄ = 31.13 tex. An estimate of σ is given by one-third of the difference between the values of X(i) when u = ±1.5. Thus, from Fig. 5.13, when u = 1.5, X(i) = 32.55, and, when u = -1.5, X(i) = 29.95. Thus an estimate of σ is

(32.55 - 29.95)/3 = 0.87 tex.

5.8 The Normal Approximation to the Binomial
It was pointed out in Section 5.4 that, as the number of trials n increases, the binomial distribution becomes more and more symmetrical. In fact, it can be shown that it tends to approach a normal distribution with the same mean and standard deviation, i.e., with μ = np, σ = [np(1-p)]^(1/2). This means that in certain circumstances binomial probabilities, which are often tedious to calculate by using the basic binomial Equation (5.8), can be approximated to by normal probabilities read from tables of the standard normal distribution.

The principal condition that must be satisfied is that n is sufficiently large. A number of rules for deciding whether this is so can be found in the literature, but a good general one is that n must be greater than the larger of 9p/(1-p) and 9(1-p)/p, i.e.,

n > 9 max{p/(1-p), (1-p)/p}.   (5.19)

Suppose, for example, that p = 0.32. Then

p/(1-p) = 0.32/0.68 = 0.471 and (1-p)/p = 0.68/0.32 = 2.125.
Taking the larger of these two ratios, we find that we must have

n > 9 x 2.125 = 19.1,

i.e., if n is 20 or more, normal distribution tables can be used to calculate approximations to binomial probabilities.

5.8.1 The Continuity Correction
In calculating the approximations, account must be taken of the fact that, whereas the binomial variate is discrete, taking only integer values, normal variates are continuous. This is allowed for by noting that, if a continuous variate is measured and recorded to the nearest integer, say, r, then its exact value lies in the range from r - 1/2 to r + 1/2. Therefore, provided that n is large enough,

bPr(x = r) ≈ nPr(r - 1/2 ≤ x ≤ r + 1/2),   (5.20)

where bPr( ) denotes a binomial probability and nPr( ) a normal probability. The situation is illustrated in Fig. 5.14(a), where the binomial distribution has been represented by a histogram with rectangles centred on x = r for r = 0, 1, 2, ..., n. Drawn on the same diagram is the approximating normal distribution; it is apparent that the area under the latter between r - 1/2 and r + 1/2 is approximately equal to the area of the shaded rectangle, and this is what Equation (5.20) expresses.

Fig. 5.14 The normal approximation to the binomial distribution, with the continuity correction

Fig. 5.14(b) shows that the sum of the areas of the rectangles centred on r, r+1, ..., n, which represents the probability Pr(x ≥ r), is approximated to by the area under the normal distribution to the right of x = r - 1/2. Thus

bPr(x ≥ r) ≈ nPr(x ≥ r - 1/2).

One of a manufacturer's customers operates a sampling-inspection scheme on consignments of articles received from the manufacturer. This requires that a sample of 300 articles be drawn at random and inspected, and that not more than 13 defectives shall be found. The manufacturer's process usually produces 3% defectives; what is the probability that a consignment containing this proportion of defectives will be rejected by the customer?

Provided that the consignment is a large one, containing many times 300 articles, the situation envisaged is a binomial one, i.e., each article chosen for the sample can be thought of as a trial, and the probability that it will be a defective is constant and equal to 0.03. Hence n = 300, p = 0.03. Now a consignment will be rejected if it contains more than 13 defectives, i.e., if it contains 14 or more defectives. Thus

Pr(rejection) = bPr(x ≥ 14).

The calculation of this binomial probability would be very tedious if the basic binomial equation were used, so we investigate the possibility of using the normal approximation; to do this we have to make sure that Condition (5.19) is satisfied. In this example,

p/(1-p) = 0.03/0.97 and (1-p)/p = 0.97/0.03.

Of these the second is obviously the greater, so we require

n > 9 x 0.97/0.03 = 291.

Since n = 300, Condition (5.19) is satisfied, and the normal approximation can be used. The approximating normal distribution has mean np = 300 x 0.03 = 9 and standard deviation [np(1-p)]^(1/2) = (300 x 0.03 x 0.97)^(1/2) = 2.95. Hence

bPr(x ≥ 14) ≈ nPr(x ≥ 13.5) = nPr(u ≥ (13.5-9)/2.95) = nPr(u ≥ 1.53) = 0.0630

from Table A1. This implies that, if the manufacturer consistently sends consignments containing 3% defectives, then just over 6% of them will be rejected by the customer. Whether this situation is acceptable to the manufacturer is, of course, dependent on many other factors.
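For this example the quality of the approximation can be checked directly, since the full binomial sum is easy by machine (a Python sketch):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.03

# Exact probability of rejection: bPr(x >= 14)
exact = sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(14, n + 1))

# Normal approximation with the continuity correction: nPr(x >= 13.5)
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sigma).cdf(13.5)

print(round(exact, 4), round(approx, 4))
```

The two values agree to about the second decimal place, which is ample for a decision of this kind.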
5.9 The Normal Approximation to the Poisson
A similar normal approximation can be used when the mean m of a Poisson distribution is greater than about 5. The approximating normal distribution has mean m and standard deviation √m. A continuity correction similar to that explained in dealing with the binomial distribution is applied.
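A quick Python sketch comparing an exact Poisson tail with its normal approximation (m = 9 is our choice for illustration, comfortably above 5):

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

m = 9.0   # Poisson mean
r = 14

# Exact Poisson tail: Pr(x >= 14)
exact = 1 - sum(exp(-m) * m ** k / factorial(k) for k in range(r))

# Approximating normal distribution has mean m and standard deviation sqrt(m);
# the continuity correction replaces r by r - 1/2
approx = 1 - NormalDist(m, sqrt(m)).cdf(r - 0.5)

print(round(exact, 4), round(approx, 4))
```

As with the binomial case, the approximation improves as the mean increases, since the Poisson distribution becomes more nearly symmetrical.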