Handbook on Material and Energy Balance Calculations
Chapter 3 Statistical Concepts Applied to Measurement and Sampling 75

Figure 3.11 Normal distribution plot of ceramic strength data.

Table 3.9 Excerpted lines from the table of normal distribution R2 values.

sample size 1.0% 2.5% 5.0% 10.0% 15.0% 25.0%
400 0.9901 0.9916 0.9928 0.9940 0.9946 0.9954
500 0.9917 0.9931 0.9940 0.9949 0.9955 0.9962

As can be seen, an R2 of 0.9813 is not good enough. For sample sizes in the 400-500 range,
fewer than 1% of all genuinely normal datasets have an R2 below 0.99. Therefore, we conclude
that a normal distribution would be a poor fit for the ceramic data.

Assignment. Suppose that the experimenters found a flaw in the preparation of the ceramic
samples that cast doubt on the validity of any results below a strength of 510. Re-evaluate the
normal distribution analysis when these data points are omitted.

3.3 Basic Applications of Inferential Statistics to Measurement

We defined inferential statistics in section 3.1 as the process by which the measurements in a
sample are used to make statements about the (unknown) population being measured. Now we
will study the use of inferential statistics to answer two basic questions about sets of
measurements:

• Given a set of measurements, can you produce an interval that you are "confident" will
contain the mean of the population being measured?

• Given one or more sets of measurements, can you tell if the underlying populations are
"significantly" different in mean or variance from a target amount, or from each other?

In addition to these two questions, we will also discuss the difference between random error
and systematic error in measurements, and show how these errors are propagated through a
calculation. But first, we consider the concept of statistical independence of measurements.

Statistical independence in a dataset means that knowing the value of any one measurement
gives you no information about any other measurement. When we make measurements, we want
each of the measurements to be representative of the whole population. In other words, we would
like to maximize the overall information content in the measurements. When some or all of the
measurements are "linked" to each other in some way, the overall information amount will be


reduced. Even worse, the data collected will give a distorted picture of the whole population. The
calculations of this section are all invalid unless your sample is statistically independent, and so in
analyzing data, you should examine your measurement process at the beginning to ensure
independence of your experimental results.

The best way to detect violations of independence is to make a lag plot of the data. Consider
the revised furnace data plotted in Figures 3.5 and 3.6, where the cycle time and sampling interval
coincided roughly with a power-on or a power-off occurrence. Thus, if the measurement at a
particular time was 938 °C, the next measurement would be right around 949 °C. To make a lag
plot, set the x-value equal to an element in the sample, and a y-value equal to the next element in
the sample. The effect of this is that each point plotted has the form (x1, x2), (x2, x3), (x3, x4), .... That
is, each point has as its x-value one of the measurements, with its y-value the next measurement in
the series.
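The lag-plot construction is easy to sketch in code. The following Python fragment is not from the Handbook: the two furnace-style series are simulated for illustration, and the lag-1 Pearson correlation is used as a numeric stand-in for eyeballing the plot.

```python
import random

def lag_pairs(xs):
    """Pair each measurement with the next: (x1, x2), (x2, x3), ..."""
    return list(zip(xs[:-1], xs[1:]))

def lag1_correlation(xs):
    """Pearson correlation between successive measurements; a strong
    correlation corresponds to a visible pattern in the lag plot."""
    a, b = xs[:-1], xs[1:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

random.seed(1)

# Dependent series: each reading drifts at most 1 degree from the last,
# like the 20-minute-cycle furnace data (simulated, not the book's data).
dependent = [940.0]
for _ in range(199):
    dependent.append(dependent[-1] + random.uniform(-1.0, 1.0))

# Independent series: every reading drawn fresh from the same range.
independent = [random.uniform(935.0, 950.0) for _ in range(200)]

print(lag1_correlation(dependent))    # close to 1: lag plot shows a line
print(lag1_correlation(independent))  # close to 0: no pattern
```

A correlation near zero does not prove independence, but a correlation near 1, as in the first series, is exactly the "within 2 °C of the measurement before" behavior described below.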

Now suppose the cycle time was twenty minutes and the sampling time was an hour.
Measurements would be taken at about the same point in the cycling sequence. Thus, if the
measurement at a particular time was 938 °C, the next measurement might be 939 °C or maybe
937 °C. Figure 3.12 shows an example of a lag plot for this situation.

[Figure: "Test for Independence #1". Lag plot of temperature at time t versus temperature at time t-1, with guide lines y = x - 2 and y = x + 2; both axes run from 935 to 950 °C.]

Figure 3.12 Lag plot of hourly furnace measurements, 20 minute cycles. Pattern indicates lack of
statistical independence.

The Figure 3.12 lag plot shows that almost all the points lie between the lines y = x - 2 and y
= x + 2. Almost every measurement is within 2 °C of the measurement before. We defined
statistical independence to mean that knowing the value of one measurement gave you no
information about the values of subsequent measurements. Obviously, that's not the case here. In
most cases, one measurement predicts the next measurement with an error of less than 2 °C.

The most common cause of non-independence is poor experimental design when repeatedly
measuring a quantity over time. In the example above, since the measurement interval is a certain
multiple of the cycle time, we are likely to get similar values on repeated measurements. If we had
a cycle time of 7 minutes instead of 20 minutes, each measurement would come in a different place
in the cycle, and there wouldn't be the repetition in measurements we see above. For example,
Figure 3.13 shows a lag plot for the furnace example with a 7-minute cycle rather than a 20-minute
cycle. There's no visible pattern. In addition, we wouldn't see any pattern if the sample time was
taken at random intervals.


[Figure: "Test for Independence #2". Lag plot of temperature at time t versus temperature at time t-1; both axes run from 935 to 950 °C, with the points scattered and no visible pattern.]

Figure 3.13 Lag plot of hourly furnace measurements, 7-minute cycles. Lack of pattern indicates
statistical independence of measurements.

Since a loss of independence means that there is a relationship between consecutive
measurements, the simplest way to check your data for independence is a lag plot. If you see a
pattern in the lag plot, then that pattern means you can predict one measurement from the
measurement before, and thus you have a problem with independence. If you see no pattern, then
the independence assumption is reasonable.

The theoretical effect of a lack of independence is that all the logic underlying the statistical
inference described in this section fails, so you don't know how reliable your confidence intervals
are anymore. The practical effect varies from situation to situation. In the simplest case, where the
lag plot shows a linear pattern with a positive slope, the sample standard deviation underestimates
the population standard deviation. In turn, this means that confidence intervals will be too narrow.
That is, you may think that there's a 95% chance that your interval will contain the population
mean, but in reality there's only (say) a 60% chance, little better than flipping a coin. We will
assume all of the example datasets in this section have statistical independence.

Our main teaching example for the rest of this section is the measurement of the heat capacity
(Cp) of silicon oxynitride, Si2ON2. A chemistry student, under the supervision of a faculty
member, makes the measurements. The student heats the ceramic to 100 °C, lets it cool to 25 °C in
a calorimeter, and measures the heat given up during the cooling process. The experiment is
repeated with a starting temperature of 90 °C. The difference in heat given up from the two
temperatures is divided by the 10 degree difference in starting temperature to estimate the heat
capacity at 95 °C, with units of J/(mol·K). The method is prone to error when done by an
unskilled person, so she repeats the experiment 10 times, with the results shown in Table 3.10. See
also the "Si2ON2" worksheet in workbook StatTools.xls.

Table 3.10 Ten measurements of the heat capacity of Si2ON2 at 95 °C, J/(mol·K).

76 82 89 87 75 71 72 86 74 88

How should she use this data set to estimate the heat capacity of Si2ON2 at 95 °C? The most
obvious choice would be to use the mean of the 10 measurements, which is 80 J/(mol·K). But she
knows that 80 J/(mol·K) is unlikely to be the exact heat capacity of the object, since that would
require all of the measurement errors to cancel each other out exactly.


The student decides to trade precision for accuracy. That is, instead of reporting a single
number as her estimate, she decides to provide a range of values she feels are likely to contain the
true heat capacity somewhere inside them. She might choose to say, "Since my measurements are
between 71 and 89 J/(mol·K), I'm pretty sure they're not all too high or all too low. I'm fairly
certain that the true heat capacity of Si2ON2 at 95 °C is between 71 and 89 J/(mol·K)."

However, that strikes her as too pessimistic. If 80% of her measurements were greater than
72 J/(mol·K), and 80% of her measurements were less than 88 J/(mol·K), she can still be pretty
confident that the true heat capacity is really between 72 and 88 J/(mol·K). But where to draw the
line? How confident should she be? Inferential statistics provides a (relatively) simple method
that does a good job converting repeated measurements to a range of values in which one can be
specifically confident that the true measured quantity lies.

3.3.1 Sampling Distributions of the Mean and the Central Limit Theorem

The two key ideas we need to understand to attack the problem above are the ideas of a
sampling distribution and the Central Limit Theorem. We will use an expanded set of heat
capacity data to help understand these ideas. Suppose that each of 200 students in a chemistry
class make 10 measurements of the heat capacity of Si2ON2 at 95 °C. That's 2000 measurements
in all. (The data are on the "ChemClass" worksheet). Figure 3.14 shows a histogram of those
2000 measurements.

[Figure: histogram titled "Class Measurements of Cp of Si2ON2"; the tallest bins reach about 350 counts.]

Figure 3.14 Histogram of 2000 measurements of Si2ON2 heat capacity made by 200 students.

At least at first glance, this looks like a uniform distribution. Now let's look at a different
histogram. For this one, assume that the professor of the chemistry class didn't want to go to the
trouble of recording all 2000 measurements her students made, and had each student report only
the average of his or her 10 measurements. Figure 3.15 shows a histogram based on these 200
means. In contrast to the histogram of all individual measurements, the histogram of means looks
like a normal distribution.

This second histogram is the one we're really interested in. It's a simplified version of
something we'll never actually see in real life, the sampling distribution of the mean. The
sampling distribution of the mean is the distribution of every possible sample mean taken from a
given population. In the above example, each student contributes one mean to the histogram.
Theoretically, every chemistry student, past, present, or future, could contribute one sample
mean to the above histogram. That's why we say we'll never actually see a complete sampling
distribution in real life.


Figure 3.15 Histogram of 200 mean student heat capacity measurements.

It's important to understand the difference between a sample distribution and a sampling
distribution. Here's how to keep them straight. The sample distribution is a histogram whose bins
contain counts of actual measurements. The sampling distribution is a histogram whose bins
contain counts of how often a calculation based on some measurements (the mean) lies in that bin.
The sampling distribution is statistically important, as we shall show.

First, though, we need to look at the other new idea of this section, the Central Limit
Theorem. The histogram of individual heat capacities looks like a uniform distribution, but the
histogram of mean heat capacities looks normal. This makes sense. Most samples "should" have
an average near the overall mean, so a histogram of averages — that is, the sampling distribution
of the mean — should be clustered around the population mean. This also implies that the
sampling distribution should have less variation than the distribution of the original population. If
"most" of the sampling distribution is near the center, then few values will be far away.

The Central Limit Theorem (CLT) makes these ideas precise. It says:

1. The sampling distribution of the mean is approximately normal, and becomes more
normal as the sample size increases.

2. The sampling distribution has the same mean as the population from which the samples
are drawn. In the context of repeated measurements of the same quantity, this corresponds
to the statement that the mean of the sampling distribution is the true value of the measured
quantity.

3. The standard deviation of the sampling distribution is called the standard error, and is
smaller than the standard deviation of the population by a factor of 1/√n, where n is the size
of a single sample. That is, standard error = σ/√n.
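The three claims can be checked by simulation. This Python sketch is my own illustration, not from the Handbook: it draws 200 samples of 10 readings each from a hypothetical uniform population resembling the class data, then compares the mean and spread of the 200 sample means against what the CLT predicts.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: readings uniform on (70, 90) J/(mol*K),
# echoing the roughly uniform histogram of the 2000 class measurements.
a, b = 70.0, 90.0
pop_mean = (a + b) / 2            # 80
pop_sd = (b - a) / 12 ** 0.5      # ~5.77 for a uniform distribution

# 200 "students", each reporting the mean of a sample of n = 10 readings.
n = 10
means = [statistics.mean(random.uniform(a, b) for _ in range(n))
         for _ in range(200)]

# Claim 2: the mean of the sampling distribution is the population mean.
print(statistics.mean(means))     # close to 80
# Claim 3: the standard error is pop_sd / sqrt(n).
print(statistics.stdev(means))    # close to 5.77 / sqrt(10), about 1.83
```

A histogram of `means` would also look roughly normal (claim 1), even though the underlying population is uniform.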

The CLT stated that the sampling distribution is "approximately" normal. The rule of thumb
is, the closer the original population is to a normal distribution, the closer the sampling distribution
will be to normal for a given sample size. The original population of 2000 heat capacity
measurements is uniform rather than normal, but it's roughly symmetric, and that's enough for
application of the CLT. This is why the sampling distribution in the expanded heat capacity
project seems close to normal, as shown by the histogram. Figure 3.16 confirms this in the form of
a normal distribution plot.


[Figure: "Normal Distribution of Cp Means". Normal distribution plot of actual versus theoretical percentile values, both axes running from 74 to 86; trendline: actual Cp = 1.005(theoretical Cp) - 0.44, R2 = 0.9961.]

Figure 3.16 Normal distribution plot of the 200 student means. Heat capacity of Si2ON2
measured at 95 °C, in J/(mol·K). Text box displays edited version of the Trendline equation.

The value of R2 indicates that the normal distribution gives an "excellent" fit to the data, and
the trendline equation confirms this with a slope close to 1 and an intercept very close to zero.

Now suppose that the original population of 2000 measurements was far from normal, which
could occur if the apparatus used to measure heat capacity was faulty. It never gave measurements
below 76 J/(mol·K), and occasionally gave extremely large ones, up to 100 J/(mol·K). The
results would then be biased (see worksheet "ChemClassSkew"). The distribution of all
measurements would show this skew. Figure 3.17 shows the histogram of all 2000 (skewed) heat
capacity measurements.

[Figure: histogram "Skewed Class Measurements of Cp of Si2ON2". Bins run from (76, 79] up to (97, 100] J/(mol·K), with the counts piled heavily at the low end; the tallest bin reaches about 1200.]

Figure 3.17 Histogram of 2000 student heat capacity measurements using biased equipment.

So now the question is: can the mean of each student's sample be adequately represented by a
normal distribution plot as was done with the non-skewed results? Figure 3.18 shows a histogram
of the mean of each student's sample. The skewness is still obvious, but to be sure, we make a
distribution plot, as shown in Figure 3.19.


[Figure: histogram "Mean of Skewed Student Cp". Bins run from (79.5, 80.25] up to (83.25, 84] J/(mol·K), and the distribution is visibly skewed.]

Figure 3.18 Histogram of 200 mean student measurements using biased equipment.

With such a non-normal population, the sampling distribution is visibly not normal yet. We
say "not normal yet" because the Central Limit Theorem says that the sampling distribution
becomes increasingly normal as the sample size increases. For the sampling distribution of the
mean to be approximately normal when the original population is very skewed, we would need
samples of size 30 or 40; that is, each student would have to repeat the measurement 30 or 40
times before the sampling distribution would be normal. Textbooks often suggest that with
samples of 25 or 30 the sampling distribution can always be assumed normal, but populations with
very extreme values, or with very lopsided histograms, will require larger samples than that to
feel confident about the sampling distribution's normality.

[Figure: "Normal Distribution of Skewed Cp Mean". Normal distribution plot of actual versus theoretical percentile values, both axes running from 78 to 83; trendline: actual Cp = 0.9867(theor Cp) + 1.10, R2 = 0.9594.]

Figure 3.19 Normal distribution plot of 200 mean student measurements, biased equipment.
Values in text box have not been properly rounded off.

In the introduction to this example, we stated that the instrument had bias, so we knew that the
data were skewed. The experimenter seldom knows about a source of bias beforehand, and instead
analyzes the data to see if bias exists. The point of this example was to discuss the effects of
skewed populations on the sampling distribution. Later in the Chapter, we will show how to apply
statistical analysis to a set of data to detect bias. If the source of the skewness is systematic error,
as currently described, then the sampling distribution has uncorrectable error, and is useless in
obtaining a reliable estimate of heat capacity. The source of bias should be determined and
eliminated and the experiments be redone.


3.3.2 Confidence Intervals

Let's assume for now that the histogram of 200 means (worksheet "ChemClass") in Figure
3.15 is exactly normal, and that it is exactly equal to the sampling distribution of the mean for the
whole population. The mean of the sampling distribution is 80.12, and the standard deviation is
1.91. Our assumptions, combined with the Central Limit Theorem, imply that the population mean
is also 80.12, and the standard deviation of the population is 1.91 × √10 ≈ 6.04. We now use Excel
to compute a range of values that would contain 95% of all the sample means.

To do this, we use a new Excel function, NORMINV(probability, mean, stdev).
NORMINV is the inverse function to the NORMDIST function we studied in section 3.2.
NORMDIST took an x-value and returned the proportion of the distribution that lay below that x.
NORMINV does the reverse: given a proportion of the distribution, it tells you what value of x
corresponds to that proportion.

We already know the mean and standard deviation to plug into NORMINV, but we need to
know what number(s) to use for the probability. To see how NORMINV works, let's apply it to
the ChemClass data. We write in any cell the following expression: =NORMINV(0.75, 80.12,
1.91), and the value of 81.41 appears. Looking at the histogram in Figure 3.15, we can see that
81.41 lies to the right of the center. What NORMINV is telling us is that 75% of the distribution
lies below 81.41, and the remaining 25% lies above 81.41.

How can we use this to find our middle 95%? When we draw a normal distribution chart for
a distribution, 95% of the distribution is in the central region, which leaves 5% for the tails at left
and right. Since the normal distribution is symmetric, that's 2.5% in each tail. Thus, the left end
of our range is the x-value that cuts off the bottom 2.5% of the distribution, and the right end of our
range is the x-value that cuts off the top 2.5% (bottom 97.5%) of the distribution. This is what we
write in the worksheet:

Left end of range = NORMINV(0.025, 80.12, 1.91) = 76.38

Right end of range = NORMINV(0.975, 80.12, 1.91) = 83.86

That is, in our sampling distribution, 95% of all sample means lie between 76.38 and 83.86.
Figure 3.20 shows this, where the central shaded area represents 95% of the distribution.
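Outside Excel, the same middle-95% range can be reproduced with Python's `statistics.NormalDist`, whose `inv_cdf` method plays the role of NORMINV. This is a sketch for illustration, not part of the Handbook's worksheets.

```python
from statistics import NormalDist

# Sampling distribution assumed in the text: mean 80.12,
# standard deviation (standard error) 1.91.
dist = NormalDist(mu=80.12, sigma=1.91)

# inv_cdf(p) plays the role of Excel's NORMINV(p, mean, stdev).
print(round(dist.inv_cdf(0.75), 2))    # 81.41
print(round(dist.inv_cdf(0.025), 2))   # 76.38, left end of the middle 95%
print(round(dist.inv_cdf(0.975), 2))   # 83.86, right end of the middle 95%
```

The 0.025 and 0.975 arguments cut off the bottom and top 2.5% tails, leaving the central 95% between the two returned values.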

[Figure: "Normal Distribution of Mean Cp". Graph of the normal pdf over mean Cp of Si2ON2 from the class, x-axis from 74 to 86, with the middle 95% of the distribution shaded.]

Figure 3.20 Graph of pdf for normal distribution with mean 80.12 and standard deviation 1.91.
Because the pdf is a mathematical idealization, the y-values have no meaning in and of themselves,
and so there is no label on the y-axis. The shaded region represents the middle 95% of the
distribution.

We want to reword this statement to find an interval of heat capacities we are confident will
contain the true, unknown heat capacity of Si2ON2 at 95 °C. First, recall from the CLT that the
mean of the sampling distribution should be the same as the true heat capacity of the object. Next,


because of the symmetry of the situation, the center of the interval (76.38, 83.86) is also the center
of the distribution: ½(76.38 + 83.86) = 80.12. Finally, the distance from 80.12 to the boundaries
of the interval is 83.86 - 80.12 = 80.12 - 76.38 = 3.74. Thus, 95% of the time, the distance from
the true heat capacity to the sample mean heat capacity is less than 3.74.

How do we use this to answer our original question? In real life, we don't know the true heat
capacity, we know the sample mean. For our original sample of 10 heat capacity measurements,
the mean was 80.0. Therefore, we can be 95% confident that the (unknown) true heat capacity is
between 80.0 - 3.74 and 80.0 + 3.74, or (76.26, 83.74). Thus, we have (finally!) answered the
question we asked back at the start of section 3.3: an estimate of the true heat capacity of Si2ON2
at 95 °C is that it is in the range 80.0 ± 3.74 J/(mol·deg) = (76.26 to 83.74), and we are 95%
confident that the true heat capacity really is in that range. Our interval of (76.26, 83.74) is called
a 95% confidence interval for this heat capacity.

A back-of-the-envelope shortcut follows from this example. Note that 3.74/1.91 = 1.96 ≈ 2.
That is, when we added and subtracted 3.74, that was about the same as adding or subtracting 2
standard errors. Thus, many people regard "the sample mean ± 2 standard errors" as a quick-and-
dirty 95% confidence interval.

All of this may seem somewhat speculative. After all, the above calculation was based on a
whole stack of assumptions that won't be true in real life:

• We made up a fictitious class of 200 students, each making the same 10 measurements the
original student made, so that we could look at an (approximate) sampling distribution.
However, in real life, we'll usually have just one set of samples, not 200.

• We claimed that the sampling distribution was exactly normal. However, in real life, it'll
only be approximately normal, and, as noted in the previous bullet, we won't even have an
approximate sampling distribution to look at.

• We claimed that the 200 "student" means had exactly the same average as the heat
capacity we were trying to estimate, and that the standard error was exactly the population
standard deviation over the square root of n. Again, in real life, we don't have the
sampling distribution to make even these simplifying statements.

For these reasons, we can't calculate confidence intervals for real situations using the method
we used above. However, we can get surprisingly far in spite of this.

In the first place, we have the Central Limit Theorem, which tells us that our sampling
distribution, even if we can't see it, is approximately normal as long as our sample size is "large
enough" and/or our population is close to normal itself. Although we can't check either of these
directly, the news is usually good as long as we are working with measurements. In most cases,
repeated measurements of a single quantity are roughly symmetric and clustered around their
center, which would guarantee the CLT applies even for relatively small samples.

In the second place, we don't know the population mean — but we have an estimate for it
with our sample mean. And we don't know the population standard deviation — but we have an
estimate for it with our sample standard deviation. So we can use these sample values to do our
calculation.

Since our assumption of normality of the sampling distribution is probably close to, but not
exactly, true, and since having to use the sample mean and standard deviation in place of the
population values adds uncertainty to the accuracy of our estimates, we'll have to make our
confidence interval slightly wider than 2 standard errors in each direction to remain 95% confident
that we really have the population mean somewhere in that interval.

The correct number of standard errors is determined by something called the t-distribution.
We will not attempt to explain the t-distribution here. Instead, we will simply use Excel's
TINV(probability, degrees of freedom) function as a black box to calculate how much to multiply
the standard error by to allow us to compute a confidence interval.


Unfortunately, the probability in the TINV function doesn't work the same way as the
probability in the NORMINV function. If you want to compute a 95% confidence interval, TINV
wants you to use the probability associated with the tails. Thus, for a 95% confidence interval,
there would be 5% of the distribution left in the tails, so you'd use probability = 0.05 in TINV.
"Degrees of freedom" will be part of our unexplained black box; for confidence intervals, degrees
of freedom = n - 1, the sample size minus 1.

With this, we are ready to compute the "real" confidence interval for the 10 heat capacity
measurements reported in Table 3.10 (and worksheet "Si2ON2"). They have mean 80.0 and
standard deviation 7.12. The steps for computing a 95% confidence interval are:

1. Convert the sample standard deviation to an estimate of the standard error by dividing by
√n: 7.12/√10 = 2.25.

2. Use TINV to determine how many standard errors we need for our confidence interval:
=TINV(0.05, 9) = 2.26.

3. Add and subtract TINV × estimated standard error from your sample mean: 80.0 ±
2.26(2.25) = 80.0 ± 5.09 = [74.91 J/(mol·deg), 85.09 J/(mol·deg)].
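The three steps can be sketched in Python. Since the standard library has no TINV equivalent, the two-tailed critical value TINV(0.05, 9) = 2.262 below is taken from a standard t-table; the rest is the arithmetic of the steps above, not the Handbook's own worksheet.

```python
import math

# 95% confidence interval for the 10 heat-capacity measurements
# of Table 3.10.
n = 10
xbar = 80.0      # sample mean, J/(mol*K)
s = 7.12         # sample standard deviation
t_crit = 2.262   # TINV(0.05, 9), from a standard t-table

std_err = s / math.sqrt(n)        # step 1: about 2.25
half_width = t_crit * std_err     # steps 2-3: about 5.09
low, high = xbar - half_width, xbar + half_width

print(round(std_err, 2))              # 2.25
print(round(low, 2), round(high, 2))  # 74.91 85.09
```

For a different confidence level or sample size, only `t_crit` changes; the degrees of freedom stay n - 1.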

It shouldn't be too disappointing that the precision of this confidence interval is less than the
precision of our range (76.38, 83.86) when we took the mean of 200 samples (worksheet
"ChemClass"). That more precise interval was based on considerably more information (200
sample means instead of one sample of 10, plus all those unrealistic assumptions). Since in real
life we typically know a lot less than we assumed for our example above, our confidence intervals
are correspondingly wider.

Equation [3.6] can find a specified confidence interval for situations where the Central Limit
Theorem lets us believe the sampling distribution is approximately normal. Recall that x̄ stands
for the sample mean, s for the sample standard deviation, α is the (decimal form of the) confidence
level, and n is the sample size.

α confidence interval = x̄ ± TINV(1 − α, n − 1) × s/√n [3.6]

In the heat capacity example, we focused on finding a 95% confidence interval, which is by
far the most common level of confidence for finding an interval. This is purely custom, and
probably arose historically because it's so easy to estimate for large samples (just add ± 2 standard
errors). The next most common level of confidence used is 99%, followed by 90%.

EXAMPLE 3.5 — Finding a 90% Confidence Interval

Use the calculation method from the heat capacity example to compute a 90% confidence
interval.

Data. n = 10, x̄ = 80, s = 7.12.

Solution. Using Equation [3.6], a 90% confidence interval would be 80 ± TINV(0.1, 9) × 7.12/√10
= 75.87 J/(mol·deg) to 84.13 J/(mol·deg).

Assignment. Find a 99% confidence interval for the same data.

Note that the 90% confidence interval is more precise (i.e., narrower) than the 95%
confidence interval we found above, and, if you do the assignment, you'll find that the 99%
confidence interval is the least precise (i.e., widest) of all. This makes sense. You are working
with the same 10 measurements each time. The only way to increase your confidence that the
interval you calculate actually contains the quantity you're estimating is to make it wider.


Equation [3.6] can be used to help you understand what sample size you might need to
achieve a certain level of precision in your estimate. Suppose you wanted to be able to estimate
the mean heat capacity of the substance to within ±0.5 J/(mol·deg) with 90% confidence. Since
we're only interested in the ± part of the formula, we can ignore the sample mean x̄ and set up an
equation to solve for the sample size n:

0.5 = TINV(0.1, n − 1) × 7.12/√n

√n = TINV(0.1, n − 1) × 7.12/0.5

However, we have an n on each side of the equation, and there's no way to dig it out of the
right-hand side using algebra. To complete the calculation, we use the fact that for large n, the
t-distribution is nearly identical to the normal distribution. One way to understand why this is so is
that the larger the sample size, the more it should resemble the actual population, so the
assumptions of the Central Limit Theorem (i.e., normality) are much closer to reality.

In terms of Excel functions, for large n, NORMINV((1 + α)/2, 0, 1) ≈ TINV(1 − α, n − 1), and we
don't need to know n to use NORMINV:

√n ≈ NORMINV(0.95, 0, 1) × 7.12/0.5 = 23.42

n ≈ 549

As a general formula, if ±δ is the desired precision for a confidence interval, α is the desired
confidence level (as a decimal), and s is the sample standard deviation, then your sample size can
be calculated by Equation [3.7].

Approximate sample size n ≈ (NORMINV((1 + α)/2, 0, 1) × s/δ)² [3.7]

We now have a chicken-and-egg problem here. This formula tells us how big to make our
sample to get a desired precision, but the formula involves s, which requires us to have already
taken a sample. The way around this is to take a small preliminary sample first, with 10 or 20
elements, use that to estimate s, then plug that value of s into the above formula to figure out the
total sample size you'll need. Or, if you are repeating an experiment that has been done before,
you can use the standard deviation from the previous version of the experiment in the formula
above to estimate n.

As a final comment, note that n and δ are inversely related in the above formula. That is, to
reduce the width of the confidence interval δ, you need to increase n. It makes sense that to get a
more precise estimate (reduced δ), you need more information (increased n).
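Equation [3.7] is easy to wrap in a small function. This Python sketch is my own illustration, not from the Handbook; it uses `NormalDist().inv_cdf` in place of Excel's NORMINV and rounds the result up to a whole number of measurements.

```python
import math
from statistics import NormalDist

def approx_sample_size(delta, alpha, s):
    """Equation [3.7]: n ~ (NORMINV((1 + alpha)/2, 0, 1) * s / delta)**2,
    rounded up, where delta is the desired +/- precision, alpha the
    confidence level (decimal), and s the sample standard deviation."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return math.ceil((z * s / delta) ** 2)

# +/-0.5 J/(mol*K) at 90% confidence with s = 7.12, as in the text:
print(approx_sample_size(0.5, 0.90, 7.12))   # 549
```

Note the inverse relationship discussed above: halving `delta` roughly quadruples the required n, since delta appears squared in the denominator.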

EXAMPLE 3.6 — Relationship Between Sample Size and Interval Width Using the Si2ON2 Example.

(i) Suppose the student made 20 measurements instead of 10, and got a mean of x̄ = 80 and
standard deviation s = 7.12. How would that change her 95% confidence interval? (ii) Suppose
she wanted her 95% confidence interval to have a width of no more than δ = ±0.5. Assuming her
standard deviation remained around 7.12, how many measurements should she make to achieve
this precision?

Solution. (i) Equation [3.6] gives us 80 ± TINV(0.05, 19) × 7.12/√20 = (76.67 to 83.33). As
expected, the result is a narrower confidence interval than the original interval from a sample size
of just 10.


(ii) Equation [3.7] gives us the "correct" number of measurements to obtain the desired level of
precision: n ≈ (NORMINV(0.975, 0, 1) × 7.12/0.5)² = 778.96, rounded up to n = 779.
Unfortunately, the "correct" number of measurements for a desired level of precision is
often impossibly large.

Assignment. The published value for the heat capacity of Si2ON2 is given to three decimal places:
that is, with an implied precision of δ = 0.0005. Find the appropriate sample size to achieve this
precision using the value of s = 7.12 and a confidence level of 90%. Do you believe that
somebody actually did that many measurements to obtain this number?

3.3.3 Treatment of Errors

Section 1.8 briefly discussed the treatment of significant figures in calculations. Basically, the
significant figures of a number are from the first non-zero digit from the left to either the last digit
to the right (zero or non-zero) if there is a decimal point, or the last non-zero digit to the right if
there is no decimal point. Significant figures in a numeric result reflect the precision of the
measurement. For example, the following recording of repeated measurements of the same length
(cm) would not be correct: 2.0128, 2.0256, 2.0389, 2.0423, and 2.0056. From the data, we can see
that the first uncertain decimal place is the second digit. So every number above should be
rounded to the second digit to the right of the decimal point. As written, the data claim a
precision that was never actually attained. Precision can be improved by better measurement
methods, but not by how we record the data.
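For instance, rounding the recorded lengths above to the first uncertain decimal place is a one-line operation (a minimal Python sketch):

```python
# Repeated measurements of the same length (cm). The scatter shows that the
# second decimal place is already uncertain, so later digits are meaningless.
readings = [2.0128, 2.0256, 2.0389, 2.0423, 2.0056]
recorded = [round(x, 2) for x in readings]
print(recorded)  # [2.01, 2.03, 2.04, 2.04, 2.01]
```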

You may have noticed that values are often cited throughout this Handbook to more than the
acceptable number of significant figures. One reason is that Excel tends to display many more
significant figures than are justified, and we often show Excel's display as-is. The other reason is
that it's common practice to carry one or two "extra" significant figures throughout a series of
calculations, and then round off the final answer. However, since one can't be sure when the
answer is "final", it's tempting to keep one more significant figure in a result than is justified. We
explain this as a caveat for the reader.

We now extend our discussion from significant figures to errors. Using the statistical
concepts and principles introduced previously in this chapter, we will show how to identify
systematic errors and how to compare the accuracy and precision of two different measurement
methods. Further, by a thorough study of the propagation of errors, the rationale for the rules
governing the treatment of significant figures in section 1.8 will become clear.

Before we study error treatment, it is convenient to discuss the related concepts of precision
and accuracy, which are descriptive in nature. Consider a situation where four students (A, B, C
and D) tested their analytical ability by electrodepositing all of the copper from 100.00 mL of a
standard solution on a platinum electrode, and weighing it. The standard solution contained
exactly 0.2000 mol/L of Cu. Figure 3.21 shows their results as a table and as a diagram.

For A, the results are very close and consistent, but the mean value is different from the true
value. A's measurements are precise but not accurate. For B, the results are quite different from
each other, but the mean is very close to the true value. B's results are accurate but not precise.
For C, the results are neither precise nor accurate. For D, the results are accurate and at the same
time precise. In summary, precision tells us how consistent the measurements are and accuracy
tells us the proximity to the true value.

To gain further insight into precision and accuracy, we need to go beyond the descriptive level
and introduce the concept of error. In measurements, error is defined to be the difference between
the measured value and the true value. Suppose the true value is μ and the measured result is x;
then the error e = x − μ.


Measurement of Copper Concentration (mol/L); true value = 0.2000 mol/L

Student A: 0.2032  0.2034  0.2033  0.2032  0.2034
Student B: 0.2065  0.1937  0.1902  0.2098  0.2001
Student C: 0.2177  0.2023  0.1994  0.1892  0.2165
Student D: 0.1998  0.2004  0.2002  0.1999  0.1997

Figure 3.21 Table and a schematic representation of the analytical results of copper concentration
made by four students. Open points are measurements and solid points are means.

In practice, it is important to identify the sources of error. For this purpose, we need to
introduce one more quantity: x₀. Here x₀ represents the average value of all the possible
measurements (infinitely many!) under the same conditions. It is not intended to represent the
average of the measurements you have actually carried out, which, as always, will be denoted as
x̄. Based on this notation, error can be decomposed into two parts:

e = (x − x₀) + (x₀ − μ)     [3.8]

The first part is called random error, which measures the variability within the data set,
regardless of how close the data are to the true value. You may recall random error as the
quantity being estimated by the standard deviation, as defined earlier by Equation [3.3]:

s = √[Σ(xᵢ − x̄)² / (n − 1)]

The second part is called systematic error, which measures the bias, or the overall proximity
to the true value. In Figure 3.21, random error exists in all four data sets, but systematic error
exists only in data set A and possibly C. The linkage between random/systematic errors and
precision/accuracy is rather obvious. It is safe to say that precision corresponds to the size of the
random error and accuracy corresponds to the size of systematic error. The word bias is also used
to describe systematic error.

When we talked about confidence intervals above, we said we were "95 % (or whatever %)
confident that the confidence interval contains the population mean μ". However, that was in the
context of the usual statistical assumption that measurements are free of bias, that is, μ = x₀.
Properly speaking, we should be saying that we are "95 % confident that the confidence interval
contains the mean of all measurements x₀".

Random error and systematic error have very different origins. Random error is the
cumulative result of many sources of small disturbances that are generally not controlled. As a
result, random error can be minimized, but not completely eliminated. In contrast, sources of
systematic error can be identified and removed.

It is desirable to have a measurement method that is both precise and accurate. But in
practice, we often need to start from an available method and try to improve it by identifying and
removing systematic errors and minimizing the associated random errors.

With just one measurement, or even sometimes with just one set of measurements, it is
impossible to tell whether systematic error exists or not. The identification comes from a


comparison. The comparison can be in several different forms. The easiest situation is when, as in
the Cu2+ example, we know in advance the exact value of the quantity being measured. Typically
this value is from the measurements using a standard method. The basic procedure is to form a
confidence interval around the sample mean, and see if the exact value of the quantity is in the
confidence interval or not. If the exact value is not in the confidence interval, then we conclude
there is likely to be bias in the measurement process. We do this so we can select the best
instruments and measurement technique for future tests. Let's look at the confidence intervals we
get from students A, B, C, and D again to demonstrate the logic.

We begin by forming 95 % confidence intervals for the measurements made by the four
different students, using the formula:

x̄ ± t₀.₀₅,₄ · s/√5

or, in Excel formulas, =AVERAGE(range) ± TINV(0.05, 4)*STDEV(range)/SQRT(5).

Student    95 % Confidence interval (mol/L)
A          (0.2032, 0.2034)
B          (0.1898, 0.2103)
C          (0.1900, 0.2200)
D          (0.1996, 0.2004)

These data show that only A has biased measurements. A is 95 % confident that x₀ is between
0.2032 and 0.2034. However, the known correct value, μ = 0.2000, is not in A's confidence
interval. Therefore, we are at least 95 % confident that μ ≠ x₀. Thus, we can be at least 95 %
confident that student A has a systematic error. A formal way of stating this conclusion would be
to say, "there is significant evidence at the 5 % level that A's measurements have systematic error."
To estimate A's systematic error, we take the difference between A's mean and the known true
value: 0.2033 − 0.2000 = 0.0033 mol/L.

Figure 3.21 shows that student C might also have had systematic error. We cannot conclude
this from the evidence at hand, however. Since C is 95 % confident that x₀ is between 0.1900 and
0.2200, and the true value of μ = 0.2000 is also between 0.1900 and 0.2200, we cannot conclude
with any certainty that μ ≠ x₀. This is because with only five measurements and with the amount
of uncertainty in C's measurements, we can't tell whether the mean measurement is high because
of random error or systematic error.
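The four confidence intervals above can be reproduced with a few lines of code (a sketch in Python rather than Excel; we assume SciPy is available for the t quantile, and the function name is ours):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

data = {
    "A": [0.2032, 0.2034, 0.2033, 0.2032, 0.2034],
    "B": [0.2065, 0.1937, 0.1902, 0.2098, 0.2001],
    "C": [0.2177, 0.2023, 0.1994, 0.1892, 0.2165],
    "D": [0.1998, 0.2004, 0.2002, 0.1999, 0.1997],
}

def conf_int(x, level=0.95):
    # x_bar +/- t * s / sqrt(n), the same formula as the Excel version.
    n = len(x)
    half = t.ppf(1 - (1 - level) / 2, n - 1) * stdev(x) / sqrt(n)
    return mean(x) - half, mean(x) + half

for student, x in data.items():
    lo, hi = conf_int(x)
    flag = "" if lo <= 0.2000 <= hi else "  <- mu outside: likely bias"
    print(f"{student}: ({lo:.4f}, {hi:.4f}){flag}")
```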

EXAMPLE 3.7 — Heat Capacity Systematic Error.

Consider the measurement of the heat capacity of Si2ON2 at 95 °C made by the student
(worksheet "Si2ON2" in workbook StatTools.xls). The published value for that heat capacity is
85.356 J/(mol·deg). (i) Is there evidence of systematic error in her measurement at the 5%
significance level? (ii) At the 1% significance level? (iii) At the 10% significance level?

Data. The 95 % confidence interval reported above was (74.91 to 85.09).

Solution. (i) Since the true value for this heat capacity is not in the student's 95 % CI, we conclude
that there is significant evidence at the 5 % level that the student has systematic error in her
measurements.

(ii) A 99 % confidence interval is (74.13 to 85.87) (this is the assignment following Example
3.6). Since the true value for this heat capacity is in the student's 99% CI, we conclude that there
is not significant evidence at the 1% level that the student has systematic error in her
measurements. This difference in conclusion from part (i) is why you should always state the
significance level of your result, since sometimes different conclusions are reached at different
significance levels, as here.


Whether you choose to believe the answer to (i) or (ii) is usually decided by the consequences
of making the wrong decision. Would it be worse to believe (i) but be wrong, or would it be worse
to believe (ii) but be wrong? In the first case, you might end up buying extra equipment, or
repeating the experiment, in the incorrect belief that you had a systematic error in the original
experiment. In the second case, you stick with your original experimental results, in the incorrect
belief that you had no systematic error. In this example, we'd regard the second error as worse,
and so to avoid making that error we would accept the first result and redesign/redo the
experiment.

(iii) Since 90 % confidence intervals are narrower than 95 % confidence intervals, and since
the true value was already outside the 95 % confidence interval, we can immediately conclude that
it was also outside the 90 % confidence interval, and thus at the 10 % significance level that there
was systematic error.

Assignment. Return to the "Steel" worksheet. The steel company has a target level of copper
content of 0.157 %. According to their set of 50 samples, is there significant evidence at the 5 %
level that they are missing this target?

You may be confused about our use of 95 % and/or 5 % in the paragraphs above. In the
previous subsection, we described intervals as being at a "95 % confidence level". In the
paragraphs above, we talk about "significant evidence at the 5 % level". The reason for the change
from 95 % to 5 % is the different focus of each calculation. For confidence intervals, we are
focusing on the fact that we're pretty confident that our interval captures the true value of what
we're measuring, so our language of "95 % confidence interval" reflects that. In other words,
we're confident that there is only one chance in 20 that the true value lies outside the stated range.
However, in the examples above, the focus is on how unlikely it is that student A's measurement
process could be correct, so our language of "significant evidence at the 5 % level" reflects that. In
other words, there is only one chance in 20 that the student's result is unbiased. The simple rule of
thumb is, when you talk about a confidence interval, you use the large number (e.g., 95 %); when
you talk about a significance level, you use the small number (5 %). Upcoming examples will help
you understand this difference in language.

Returning to measurement errors, we may not know the exact true value in advance, but we
may have a reference method or standard method that is known to be free from systematic error.
We can carry out the measurements using both the reference method and the method under
examination. Based on the two data sets, a statistical analysis will tell whether the method under
study has systematic error.

EXAMPLE 3.8 — Ore Assays.

You are a small silver mining operation that has traditionally contracted out the assay of ores
to The Adit Corporation. You are satisfied that Adit's assays are accurate. However, you have
been approached by a salesperson from Batches and Sons, a startup in the assay business. Batches
and Sons is willing to do the same assays for a lower cost. You decide to test their accuracy
against that of Adit by sending each company ten different samples from the same ore. Based on
the data below (troy ounces per ton), is there evidence of systematic error on the part of Batches
and Sons?

Data. Table 3.11 shows the silver assay reported by the two companies (measured in oz/ton).

Table 3.11 Measurements of silver in ore by two companies.

Sample 1 2 3 4 5 6 7 8 9 10
Adit 9.4 14 13.7 10.5 13.4 14.2 8.1 7.7 16.4 8.7

Batches 11.4 12.3 13.5 13.2 12.5 14.5 7.9 6.2 17.7 7.5
( A - B ) -2 1.7 0.2 -2.7 0.9 -0.3 0.2 1.5 -1.3 1.2


Solution. Since each sample is measured twice, once by Adit and once by Batches, and since we
are sure that Adit makes accurate measurements, we can take the difference between the Adit and
Batches measurements (A − B) as a measurement of how far the Batches measurement is from the
accurate measurement. If the Batches measurements were accurate, then the expected difference
between the two companies would be zero. Therefore, we will compute a 90% confidence interval
for the average difference (A − B), and see if zero is in the resulting confidence interval. We have
x̄ = −0.06, s = 1.51, and t₀.₁₀,₉ = 1.833. So −0.06 ± 1.833·1.51/√10 = (−0.93, 0.81). Since the
interval contains zero, there is no significant evidence at the 10% level of a mean difference
between the measurements of Adit and the measurements of Batches. Thus, there is no significant
evidence at the 10% level of systematic error on the part of Batches.
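A sketch of this paired comparison in code (Python; SciPy is assumed for the t quantile, and the variable names are ours):

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

adit    = [9.4, 14.0, 13.7, 10.5, 13.4, 14.2, 8.1, 7.7, 16.4, 8.7]
batches = [11.4, 12.3, 13.5, 13.2, 12.5, 14.5, 7.9, 6.2, 17.7, 7.5]

# Paired differences A - B; if Batches is unbiased these center on zero.
d = [a - b for a, b in zip(adit, batches)]
n = len(d)
half = t.ppf(0.95, n - 1) * stdev(d) / sqrt(n)   # 90 % two-sided interval
lo, hi = mean(d) - half, mean(d) + half
print(f"({lo:.2f}, {hi:.2f})")  # (-0.93, 0.81): zero inside, no evidence of bias
```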

Assignment. A new pH meter appears on the market. Due to its ability to store digital pH values,
you plan to replace your standard meter with new ones. First, you want to check to see if this new
device has any systematic bias. You prepare 8 aqueous solutions of different pH and measure
them using both the standard and the new meter. Table 3.12 shows the results. Find out whether
the new pH-meter has systematic error at a significance level of 0.05.

Table 3.12 Measurement of acidities using two methods.

Sample 1 2 3 4 5 6 7 8

Standard measurement 1.88 2.57 4.02 5.49 7.12 8.29 9.77 11.23

New measurement 1.80 2.49 4.03 5.42 7.09 8.18 9.78 11.14

In considering the above examples, remember that using the t statistic depends on two
assumptions. First, the measurements should be independent of one another — that is, values of
one measurement should give you no information about the values of other measurements.
Second, the Central Limit Theorem should be applicable, meaning either the samples are fairly
large (30+) and/or the samples indicate a normal distribution of the population in question.
Normal distribution plots of the differences A − B in Example 3.8 (not shown) indicate that the
normality assumption is reasonable there. You should check this yourself for the Assignment.

If systematic error is detected, further analysis is required to identify the source of the
systematic error. It is possible that several sources of systematic errors coexist. It is also possible
that one or more of these systematic errors cannot be conveniently eliminated. In such cases, we
can at least estimate the magnitude of the error and make corrections accordingly.

We often need to improve the precision of measurements. The statistical procedure to check
whether the precision of a new set of measurements has been improved is known as the F-test. We
illustrate the procedure by an example referring back to the Si2ON2 heat capacity example given
earlier. Suppose the professor wants a more precise heat capacity measurement, and buys a new
furnace with more precise temperature control. The student makes 10 more measurements, with
the results shown in Table 3.13.

Table 3.13 Ten heat capacity measurements J/(mol·deg) of Si2ON2 at 95 °C using new
equipment.

78 80 91 79 77 79 87 82 76 88

The standard deviation of the original measurements in Table 3.10 is 7.12, while the standard
deviation of the new measurements in Table 3.13 is 5.17. Thus, the new measurements have a
smaller standard deviation. However, each sample size is relatively small, only 10 each. Do we
have statistically significant evidence to conclude from these that the new equipment has greater
precision than the original?


Excel's FTEST(array1, array2) can be used to calculate the probability, based on two
samples, that their underlying populations have the same variance (which is the same as testing to
see if they have the same standard deviation). Using the data in Tables 3.10 and 3.13 as input, the
FTEST function in cell D52 of the "Si2ON2" worksheet returns 0.35. That is, there's a 35%
chance that the precision of the two instruments is identical, and only random fluctuation in their
measurements is responsible for the improvement we saw. Using the common 95 % standard for
making our decision, we would only conclude the variances were different if there was less than a
5 % chance they could be the same. Therefore, there's no significant evidence at the 5 % level that
the new instrument is more precise than the original.

EXAMPLE 3.9 — Improving Measurement Precision.

A metallurgist who was not satisfied with the standard method for the determination of nickel
content in an alloy designed a new method. To see whether his new method has better precision,
he measured the nickel content of an alloy five times using the standard method and six times
using the new method. At the 0.05 level, is his new method more precise than the old?

Data. Standard (%): 6.32, 6.43, 6.18, 6.25, 6.55.

New method (%): 6.50, 6.53, 6.55, 6.47, 6.49, 6.57.

Solution. Excel's FTEST returns 0.011. Thus, there's only a 1.1% chance that the precision of the
standard method is the same as that of the new method. Since the standard method has a standard
deviation of 0.15, while the new method has a standard deviation of 0.04, we conclude that the new
method is significantly more precise at the 0.05 level.
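Outside Excel, the same two-tailed test can be reproduced from the F distribution (a sketch assuming SciPy is available; variable names are ours):

```python
from statistics import variance
from scipy.stats import f

standard = [6.32, 6.43, 6.18, 6.25, 6.55]
new      = [6.50, 6.53, 6.55, 6.47, 6.49, 6.57]

# Excel's FTEST: larger sample variance on top, doubled upper-tail area.
v1, v2 = variance(standard), variance(new)
F = max(v1, v2) / min(v1, v2)
df_num = (len(standard) if v1 >= v2 else len(new)) - 1
df_den = (len(new) if v1 >= v2 else len(standard)) - 1
p = 2 * f.sf(F, df_num, df_den)
print(f"p = {p:.3f}")  # small p: the precisions differ at the 0.05 level
```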

Assignment. Two students, A and B, used two different methods to titrate a copper-containing
solution to determine its concentration. Their results are:

A (mol/L Cu): 0.2074, 0.2068, 0.2077, 0.2059.

B (mol/L Cu): 0.2089, 0.2133, 0.2111, 0.2074, 0.2118, 0.2095, 0.2104.

Decide at the 0.05 level whether student A's precision is significantly higher than that of
student B.

Our final examples examine the situation where there are two competing measurement
methods, neither of which can be regarded as a reference method. Although you can't determine if
either measurement system has systematic error, you can at least test to see if the two measurement
methods have similar values for x0. If they don't, then you can conclude at least one method has
systematic error, even if you don't know which one, and you can do additional
measurements/experimentation to try to pin this down.

The Excel function for this is TTEST(array1, array2, tails, type). Array1 and array2 are
the two sets of measurements being compared. Since we are using this in the context of testing to
see if two measurement methods give different results, without caring which is bigger or smaller,
the appropriate value for tails is 2. (You would use tails = 1 if you were testing to see if one
particular method resulted in measurements that were consistently smaller than the other method;
that is, if you were testing for a particular signed difference, rather than just a difference.)

Type has three values, the first of which ("paired" = 1) we will not discuss here. The other
two values concern how Excel estimates the standard deviation of the two datasets combined. If
you run an FTEST and find no significant evidence at the 1% level of a difference in the standard
deviations of the two sets of measurements, then use type = 2; on the other hand, if there is
significant evidence of a difference, use type = 3.
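The same decision logic carries over to SciPy's `ttest_ind`, whose `equal_var` flag plays the role of Excel's type (a sketch with made-up data; the helper name is ours):

```python
from statistics import variance
from scipy.stats import f, ttest_ind

def compare_means(x, y, f_level=0.01):
    # Step 1: two-tailed F-test on the variances, as FTEST computes it.
    v1, v2 = variance(x), variance(y)
    F = max(v1, v2) / min(v1, v2)
    df1 = (len(x) if v1 >= v2 else len(y)) - 1
    df2 = (len(y) if v1 >= v2 else len(x)) - 1
    p_f = 2 * f.sf(F, df1, df2)
    # Step 2: pooled t-test (type = 2) unless the variances differ at the
    # 1 % level, in which case use the unequal-variance form (type = 3).
    return ttest_ind(x, y, equal_var=(p_f >= f_level)).pvalue

# Hypothetical readings from two measurement methods:
method1 = [80.1, 81.0, 79.4, 83.2, 78.8, 82.1, 80.3, 79.9]
method2 = [84.2, 86.1, 83.3, 85.0, 87.2, 84.4, 85.1, 86.3]
print(compare_means(method1, method2))  # small p: the methods disagree
```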

Consider a comparison of the results of the original ten measurements of heat capacity of
Si2ON2 at 95 °C versus the ten measurements made with the new improved furnace. We showed
above that there was not significant evidence of a difference in the standard deviations in the two


sets of measurements, so we cannot conclude that the new furnace gave measurements that are
more precise. Now we will see if there is a difference in accuracy between the two sets of
measurements. That is, we're not comparing either one against the known reference value for the
heat capacity, we're just comparing the two sets of measurements against each other. This would
be the situation, for example, if we didn't know the heat capacity in the beginning.

The estimate of x₀ from the original equipment was 80.0 and the estimate from the new
equipment is 81.7. Since the standard deviations are not significantly different, we use type = 2 in
the TTEST function. The result in cell D53 of the "Si2ON2" worksheet shows there is a
probability of 0.55 = 55% that the true value of x₀ is the same for both furnaces. Thus, the data
show no significant evidence of a difference in accuracy for the two data sets. The purchase of a
new furnace appears not to improve the results. Although there was a reduction of the standard
deviation with the new furnace, and although the new mean increased from the original, which we
know to be in the correct direction, the experimental error in each set of measurements is still so
large that statistically the improvements with the new equipment could still be a result of being
lucky the second time around, rather than conclusively "better".

EXAMPLE 3.10 — The Professor Tries Again.

The ceramic engineering professor digs even deeper into her grant money to buy a calorimeter
less dependent on the skill of the user. She hopes this will improve the accuracy and precision.
Her student dutifully makes 10 more measurements. Is the third set of measurements better than
the originals? Better than the second set of measurements?

Data. The third set of ten measurements is: 86 94 86 85 86 85 85 86 83 86. This
dataset has mean x̄ = 86.2 J/(mol·deg) and standard deviation s = 2.90 J/(mol·deg).

Solution. We first compare the above (third) set of measurements against the original. The FTEST
returns a value of 0.013 = 1.3 %. Since 1.3 % is smaller than 5 %, we can conclude that there is
significant evidence at the 5 % level that the third measurements are more precise than the
originals. In addition, the TTEST returns a value of 0.020 = 2.0 % (again less than 5 %), so we can
conclude that there is significant evidence at the 5 % level that the third measurements differ from
the originals in accuracy. (Note again that this test, by itself, does not establish the third
measurements as more accurate than the originals, only that the accuracies differ. It takes
comparison with the known reference value to document that the third set is more accurate). Thus,
the experimental data show that the third set of measurements is both more precise than, and
differs in accuracy from, the originals. As a technical point, even though the difference in
precision was significant at the 5 % level, it was not significant at the 1 % level. That means that
in the TTEST, we still used type = 2, not type = 3.

We now compare the third set of measurements against the second set. The FTEST returns a
value of 0.10 = 10 % > 5%, so there is not significant evidence at the 5 % level to conclude that the
third measurements are more precise than the second. However, the TTEST returns a value of
0.027 = 2.7 % < 5 %, so there is significant evidence at the 5 % level for a difference in accuracy
between the second and third set of measurements. By comparison with the reference value, we
can conclude that the third set of measurements is more accurate, but not more precise, than the
second set of measurements.

Assignment. For the datasets in Example 3.10 and for the datasets in the Assignment following it,
determine if there is significant evidence for a difference in accuracy.

3.3.4 Error Propagation

In engineering, one often calculates a quantity from several other quantities that are measured
directly. For example, the volume of a cylinder can be calculated from its height and radius, which
are measurement results. Since each measurement has random (and possibly systematic) error, the
calculated volume also has errors that can be traced back to the original measurements. The


transfer of errors from the initial measurements to final calculated result through one or several
intermediate calculating stages is called error propagation. In section 1.8, we discussed briefly
how to decide how many significant figures to carry when we do a calculation. Essentially, we
were tracking the propagation of errors. A statistical study of error propagation will help us to
understand the rationale behind the way to report the number of significant figures in an answer.

We treat propagations of random error and systematic error differently (Taylor 1997). This is
because propagation of random error is a statistical process, and so involves standard deviations
and variances. Systematic error, on the other hand, is by definition non-random, and so its
propagation is governed by elementary calculus.

Random error propagation. Since the function of a random variable is still a random
variable, the problem of random error propagation is equivalent to the determination of the
standard deviation of the new random variable, which is a function of the random error(s).
Depending on the form of the function, three different situations are discussed here.

(a) Linear combinations. When the relationship between the final calculated quantity y and
the independent measurements x₁, x₂, ... is y = b₀ + b₁x₁ + b₂x₂ + ..., the standard deviation of y is:

σ_y = √[(b₁σ₁)² + (b₂σ₂)² + ...]     [3.9]

where σ₁, σ₂, ... are the standard deviations of x₁, x₂, ....

EXAMPLE 3.11 — Linear Random Error Propagation.

A chemical engineer was interested in the concentration change of a catalyst after a reaction.
He measured the catalyst concentration before and after the reaction. He knew from previous
experience that the standard error of both measurements was 0.01 mmol/L. What was the standard
error of the calculated concentration change?

Solution. Denote the concentrations before and after the reaction as x₁ and x₂ respectively, and
denote the concentration change as y. Obviously the relation between them is y = x₂ − x₁.
According to Equation [3.9], σ_y = √[(−1·σ₁)² + (σ₂)²] = 0.014 mmol/L.
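Equation [3.9] and this example translate directly into code (a minimal sketch; the function name is ours):

```python
from math import sqrt

def linear_sigma(terms):
    # Equation [3.9]: sigma_y = sqrt(sum of (b_i * sigma_i)^2),
    # where each term is a (coefficient, standard deviation) pair.
    return sqrt(sum((b * s) ** 2 for b, s in terms))

# y = x2 - x1: coefficients -1 and +1, each with sigma = 0.01 mmol/L.
sigma_y = linear_sigma([(-1, 0.01), (1, 0.01)])
print(round(sigma_y, 3))  # 0.014
```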

Assignment. In a titration experiment, we are interested in the total titrant used. We record the
initial and final readings on the burette. It is known that the standard errors associated with each of
the two readings are 0.02 ml. What is the standard error associated with the calculated amount of
titrant?

(b) Multiplicative expressions. When the relationship between the final calculated quantity y
and the independent measurements x₁, x₂, x₃ and x₄ is y = b·x₁x₂/(x₃x₄), we can calculate the
relative standard deviation as:

σ_y/y = √[(σ₁/x₁)² + (σ₂/x₂)² + (σ₃/x₃)² + (σ₄/x₄)²]     [3.10]

The above formula is invalid if the quantities are not independent. For example, if the
relationship between y and x is y = x³, then we cannot re-write this as y = x·x·x and calculate the
relative standard deviation using the formula above, because the three copies of x are not
independent. Instead, the general formula for y = xⁿ is

σ_y/y = n·σ_x/x     [3.11]


EXAMPLE 3.12 — Multiplicative Random Error Propagation.

An engineer wants to know the volume of a cylinder. From past experience, he knows that
his measurements of linear quantities have a standard deviation of 1.0 mm. He measures the radius
of the cylinder as 52.1 mm and the height as 99.3 mm. What is the relative standard deviation and
standard deviation of the calculated volume?

Solution. The relative standard deviations are 1.0/52.1 = 1.919 % for the radius and 1.0/99.3 =
1.007 % for the height. V = πr²h. We use Equation [3.11] for r², and Equation [3.10] to combine
that with h. The relative standard deviation of the volume is:

σ_V/V = √[(2σ_r/r)² + (σ_h/h)²] = √[2²(1.919 %)² + (1.007 %)²] = 3.97 %.

The calculated volume is 846 788 mm³, of which 3.97 % is 33 600 mm³, the standard
deviation of the calculated volume. We should report the volume as 847 × 10³ mm³. Alternately,
report it as 846 788 ± 33 600 mm³, or as being between 813 × 10³ and 880 × 10³ mm³.
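A quick check of this example in code (a sketch; the variable names are ours):

```python
from math import pi, sqrt

r, h, sigma_len = 52.1, 99.3, 1.0

volume = pi * r**2 * h                       # about 846 788 mm^3
# Equation [3.11] for the r^2 factor (n = 2), combined with h via [3.10]:
rel_sigma = sqrt((2 * sigma_len / r) ** 2 + (sigma_len / h) ** 2)
print(f"relative sigma = {rel_sigma:.2%}")   # 3.97%
print(f"sigma_V = {rel_sigma * volume:.0f} mm^3")
```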

Assignment. If we assume ideal gas behavior, we can use PV = nRT as the equation of state, in
which P denotes pressure, V denotes gas volume, n denotes the molar amount, T denotes absolute
temperature and R is a constant known to six significant figures. If we know the values of P, V, and
T, we can always calculate the amount present in the system. Suppose we measure P, V and T with
relative standard errors of 0.2 %, 0.3 % and 0.4 %, respectively. Calculate the relative standard
error associated with the calculated amount of gas.

(c) Other functions. In general, if y = f(x), then the standard deviation of y is

σ_y = |dy/dx|·σ_x     [3.12]

EXAMPLE 3.13 — Other Random Error Propagation.

The photometric method is often used to determine the composition of a solution. Two
quantities, absorbance A and transmittance T, are related by A = −log(T). If the measured value of
transmittance is 0.842 with a standard error of 0.002, what is the standard deviation of the
calculated absorbance?

Solution. dA/dT = −(log e)/T = −0.434/T.

So σ_A = |dA/dT|·σ_T = |0.002 × (−0.434)/0.842| = 0.001.
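Equation [3.12] can also be applied numerically when the derivative is awkward to take by hand (a sketch; the central-difference step and function names are ours):

```python
from math import log10

def propagate_sigma(func, x, sigma_x, eps=1e-6):
    # Equation [3.12]: sigma_y = |dy/dx| * sigma_x, with dy/dx
    # approximated by a central difference.
    dydx = (func(x + eps) - func(x - eps)) / (2 * eps)
    return abs(dydx) * sigma_x

absorbance = lambda T: -log10(T)   # A = -log(T)
sigma_A = propagate_sigma(absorbance, 0.842, 0.002)
print(round(sigma_A, 3))  # 0.001
```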

Assignment. The rate of a chemical reaction follows the model r = e~ ', where r is the reaction
rate (mol/(L·min)) and T is the absolute temperature. Suppose you measure T = 295 K with a
standard deviation of 2 K. What is the calculated reaction rate and what is its standard deviation?

Systematic error propagation. It is much easier to calculate the propagation of systematic
error because systematic error is not a random variable. The calculation relies solely on
elementary calculus. We are just using the linear approximation to the derivative, dy/dx ≈ Δy/Δx,
where Δx and Δy represent the systematic errors in x and y respectively. Once the approximating
equation is written, we can solve for Δy or Δy/y. Three situations are discussed.


(a) Linear combinations. When the relationship between the final calculated quantity y and the
independent measurements x₁, x₂, ... is y = b₀ + b₁x₁ + b₂x₂ + ..., the systematic error of y is:

Δy = b₁Δx₁ + b₂Δx₂ + ...     [3.13]

The sign of Δy can be either positive or negative.

(b) Multiplicative expressions. When the relationship between the final calculated quantity y
and the independent measurements x₁, x₂, x₃ and x₄ is y = b·x₁x₂/(x₃x₄), the relative systematic
error is:

Δy/y = Δx₁/x₁ + Δx₂/x₂ − Δx₃/x₃ − Δx₄/x₄     [3.14]

If y = xⁿ, then

Δy/y = n·Δx/x     [3.15]

(c) Other functions. In general, if y = f(x), then the systematic error of y is simply:

Δy = Δx·(dy/dx) [3.16]

This is the only one of the three rules for which systematic error and random error are treated
the same.
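The three rules can be sketched in a few lines of Python (the helper names are ours, not from the text). The example applies Eq. [3.14] to the ideal gas assignment, where, unlike with random errors, the signs let systematic errors partially cancel:

```python
def linear_sys_error(b, dx):
    """Eq. 3.13: for y = b0 + b1*x1 + b2*x2 + ..., dy = sum(bi * dxi).
    Signs are kept, so errors can partially cancel."""
    return sum(bi * dxi for bi, dxi in zip(b, dx))

def multiplicative_rel_sys_error(rel_dx, exponents):
    """Eqs. 3.14 and 3.15 combined: for y = prod(xi**ni),
    dy/y = sum(ni * dxi/xi)."""
    return sum(n * r for n, r in zip(exponents, rel_dx))

# Ideal gas: n = PV/(RT), relative systematic errors 0.2%, 0.3%, 0.4%.
# Exponents are +1 for P and V, -1 for T.
rel_dn = multiplicative_rel_sys_error([0.002, 0.003, 0.004], [1, 1, -1])
print(round(rel_dn, 4))  # 0.001, i.e. 0.1%; the signs matter here
```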

EXAMPLE 3.14 — The Difference Between Propagation of Random and Systematic Errors.

In an earlier example, we looked at multiplicative random error propagation. Suppose the 1 mm
error was systematic error rather than random error. What is the change in the resulting volume
calculation?

Solution. Using Equations [3.14] and [3.15], we have ΔV/V = 2(1/52.1) + 1/99.3 = 0.048. So the change in
the resulting volume calculation is (846,788 mm³)(0.048) ≈ 40,600 mm³.

Assignment. Repeat the assignment concerning the ideal gas law PV = nRT (following Example
3.13) if the percentages given are relative systematic errors ΔP/P, ΔV/V and ΔT/T, rather than relative
random errors.

3.4 Curve Fitting

In the earlier part of this Chapter, we measured one thing repeatedly. For example, we
measured the temperature of a furnace every hour for two days. Alternatively, we could have
measured the temperature of an uncontrolled furnace as a function of a range of voltages across the
heating elements. In this section, we look at data from measuring a certain property once, but for
many different conditions. As an example, consider the data in Table 3.14. This shows the yield
of a production process operated at different temperatures. All of the variability is assumed to be
caused by the effect of temperature on yield. The data and supporting calculations are on
worksheet "Yield" in workbook StatTools.xls.

Table 3.14 Yield data at different temperatures.

Temperature (°C) 80 86 100 82 90 99 81 96 94 93

Yield (kg) 8.3 9.3 10.7 9.5 10.5 11.7 8.4 10.5 11.3 10.8


The measurements are one yield for each temperature. Rather than try to understand how a
single variable is behaving, our goal is to determine if a functional relationship exists between two
variables, and if so, what it is. How do we find a formula that predicts the yield for a given
process temperature? The first step (as usual) is to use Excel to chart the data. Figure 3.22 (called
an x - у scatter plot), contains the points from Table 3.14 as well as the regression line for this
dataset.


Figure 3.22 Plot of reaction yield versus temperature, with linear regression line created by
Excel's Trendline tool.

If you mentally change the line in Figure 3.22 — tilt the slope, or shift it up a little — you can
make the line closer to some of the points in the graph, but that makes it farther away from others.
Although the regression line in this example doesn't pass through any points of the sample dataset,
it seems to be a good "middle of the road" line. In this section, we will be seeking ways to find
lines and curves that are at the smallest possible average distance from the measured points. In
previous sections, we emphasized that when studying how data varies from the center, or how it
deviates from an ideal value, statisticians prefer to measure those variations or deviations using the
squared distance rather than absolute distance. Combining these two ideas, we can state the
fundamental goal for this section: given a set of measurements for which we are trying to find a
mathematical relationship or formula, we seek a formula that makes the sum of the squared
differences between the measurements and the formula's predicted values as small as possible.
Naturally, we want the least complex equation possible.

Since it gets tedious to say, "the sum of the squared differences between the measurements
and the formula's predicted values", we usually abbreviate this as the sum of squares. If we're
trying to minimize the sum of squares, calculus has ways of minimizing functions using
derivatives. In this section, all the techniques we use have calculus as their justification, but we
will leave all the calculus details aside. Instead, we will focus on Excel's tools to solve regression
problems.

3.4.1 Simple Linear Regression and Excel's Trendline Tool

The first function we will study is at once the simplest and the most commonly used
functional relationship between two variables: a straight line. Simple linear regression is the
name of the process that finds the best straight line for a given pair of variables. That's what was
done in Figure 3.22. We'll use the yield measurement data set to explore various aspects of simple
linear regression.


Of the (at least) four ways Excel will do simple linear regression, the best for a beginner is to
use the Trendline tool. Start with an x-y scatter plot of two variables (here, temperature and yield).
Then right-click on a chart point and choose the Trendline menu option. Click on the Linear icon
in the resulting dialog box. Then click on the Options tab, and check the Display equation and
Display R-squared boxes at the bottom before hitting OK. The Trendline tool adds a straight line
to the graph, provides a formula for that line, and provides a quantity called R² that we've already
discussed in sections 3.2 and 3.3. We'll talk first about the formula, then about R².

Although we will always let Excel calculate regression formulas for us, calculations to find
the slope and intercept of a regression line are not difficult:

Slope: m = [n·Σ(xᵢyᵢ) - (Σxᵢ)(Σyᵢ)] / [n·Σ(xᵢ²) - (Σxᵢ)²] [3.17]

Intercept: b = [Σyᵢ - m·Σxᵢ] / n [3.18]

where all sums run from i = 1 to n,

and n represents the number of points in the dataset. Results from Excel's SLOPE and
INTERCEPT functions (based on the above formulas) should be reported with the same number
of significant figures as x or y, whichever is smaller. The results for this example (cells N10 and
N11 on worksheet "Yield") are: m = 0.14118115 and b = -2.6204216, displayed with way too
many significant figures. Excel's Trendline tool may display more or fewer significant figures than
are justified. If Trendline has overly truncated the number of significant figures, use the Format
Data Labels / Number option to increase them. Use scientific notation if appropriate.
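Equations [3.17] and [3.18] are easy to verify outside Excel. Here is a short Python sketch (the helper name is ours) applied to the Table 3.14 data:

```python
temps  = [80, 86, 100, 82, 90, 99, 81, 96, 94, 93]
yields = [8.3, 9.3, 10.7, 9.5, 10.5, 11.7, 8.4, 10.5, 11.3, 10.8]

def slope_intercept(x, y):
    """Least-squares slope and intercept, Eqs. 3.17 and 3.18."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

m, b = slope_intercept(temps, yields)
print(round(m, 5), round(b, 4))  # 0.14118 -2.6204, as on worksheet "Yield"
```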

The statistical importance of minimizing the sum of squares requires a deeper look into the
calculations of this example. For now, we'll focus on Equations [3.17] and [3.18], and show how
they minimize the sum of squared differences. Consult Table 3.15 and Figure 3.23. In addition to
the temperature and yield measurements, Table 3.15 contains the yields predicted by the linear
regression equation for each temperature x. The row underneath shows the differences between the
measured yields and the yields predicted by the formula. These differences are often called the
errors or the residuals at each point. You can also see these differences in Figure 3.23, which
pictures them as vertical bars from each point.*

Table 3.15 Details of the regression residuals in the temperature versus yield example. "Yield,
calculated", "Difference" and "Difference squared" may be slightly off because of roundoff.

Temp, °C 80 86 100 82 90 99 81 96 94 93
Yield, kg 8.3 9.3 10.7 9.5 10.5 11.7 8.4 10.5 11.3 10.8
Yield, calc'd. 8.67 9.52 11.50 8.96 10.09 11.36 8.82 10.93 10.65 10.51
Difference -0.37 -0.22 -0.80 0.54 0.41 0.34 -0.42 -0.43 0.65 0.29
Difference sqd 0.140 0.049 0.636 0.295 0.171 0.118 0.172 0.187 0.422 0.084

The vertical lines show the distances from the measured points to the equation line. The
better the equation reflects the measurements, the smaller the sum of those distances will be.
However, statisticians prefer working with squared quantities to absolute quantities, so we try to
make the sum of the squared distances as small as possible. The sum of squared distances (SSD)

* Remember, we stated that the temperature has no uncertainty. All error is ascribed to the yield.


is the sum of values in the last row of Table 3.15 = 2.28. When we edit the regression line
equation to contain the proper number of significant figures plus one, we obtain the following least
squares representation of yield as a function of temperature:

Yield, kg = 0.141 (temperature, °C) - 2.62

There's an extra significant figure in the equation, so after you calculate the yield, round it off
to 2 significant figures. What makes y = 0.141t - 2.62 special among all possible straight lines is
that no other line has a sum of squared distances smaller than 2.28.
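This minimality claim can be checked numerically. A short Python sketch (helper name ours) using the Table 3.14 data; any perturbation of the slope or intercept raises the sum of squares:

```python
temps  = [80, 86, 100, 82, 90, 99, 81, 96, 94, 93]
yields = [8.3, 9.3, 10.7, 9.5, 10.5, 11.7, 8.4, 10.5, 11.3, 10.8]

def sum_sq_dist(m, b):
    """Sum of squared vertical distances from the points to y = m*t + b."""
    return sum((y - (m * t + b)) ** 2 for t, y in zip(temps, yields))

best = sum_sq_dist(0.1411811, -2.6204216)   # the least-squares line
print(round(best, 2))                       # 2.28
# Tilting the slope or shifting the intercept always makes it worse:
assert sum_sq_dist(0.15, -2.6204216) > best
assert sum_sq_dist(0.1411811, -2.5) > best
```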

[Figure: "Indication of Error in Yield Data" plot, yield (kg) vs. temperature (°C), showing the regression line y = 0.1412x - 2.6204 with R² = 0.8149 and vertical error bars at each point.]

Figure 3.23 Plot of temperature versus yield data with regression line, unedited regression
equation, and error bars.

We now introduce some terminology to explain what R² means and how to interpret it. First
we calculate the mean of the yields: ȳ = 10.1. Recall from section 3.1 the formula for the variance of
a set of data, applied here to the yield, y:

Variance: s² = [1/(n - 1)]·Σ(yᵢ - ȳ)²

The 1/(n - 1) was to average the sum of squared differences, so if we multiply the variance by
n - 1 we're left with Σ(yᵢ - ȳ)². This quantity is referred to as the total sum of squares (SST).

Since the SST is the sum of all the squared differences between individual yields yᵢ and their
mean ȳ, the SST is often described as the total variation in y. For our example, SST = 12.3, which
can be easily computed in Excel by using =VAR(range of cells with yield)*9. See also row
C20:M20 in worksheet "Yield".

The sum of squares we computed from the regression line can be written algebraically as
Σ(yᵢ - (0.14xᵢ - 2.6))², and is referred to as the sum of squared errors (SSE). The word "error"
is used here to indicate the difference between the "true" measured yield and the yield resulting
from the regression equation. As noted above, SSE = 2.28 for this example. Excel does not
require you to do all the work shown in Table 3.15 to compute SSE; we included the extra
information for learning purposes. All Excel requires are the original yields and the yields
predicted by the formula. From these, you can use the Excel function =SUMXMY2(range of cells
with measured yields, range of cells with predicted yields). Finally, the difference between SST
and SSE is the regression sum of squares (SSR); that is, SSR = SST - SSE.


With this vocabulary, we can talk about the definition and interpretation of R². First, since
SST = SSR + SSE, we think of SSR and SSE splitting the total variation in yield (SST) into two
parts: variation that can be explained by the regression line (SSR) and variation still left
unexplained (SSE). R², the coefficient of determination, is defined as R² = SSR/SST = 1 - SSE/SST. Thus,
R² is the proportion of the total variation in yield that can be explained by the regression equation.
In our example, SSR = 12.3 - 2.3 = 10.0, so R² = 10.0/12.3 = 0.81.

In this example, temperature and the regression equation explain 81% of the variation in yield.
The remaining 19% of variation in yield remains unexplained. Either we need a better formula
using a different functionality of temperature, or additional reaction variables not yet included, to
better explain the variation in yield from one experiment to the next.

Since R² is the proportion of the total variation in the dependent variable that can be explained
by the independent variable and the model, R² = 1 is the best possible value, and the closer R² is to
1, the better the job the model does of explaining variation in the dependent variable.

The coefficient of determination is closely related to correlation, usually represented by the
variable r. The correlation is a measure of how well the y variable can be modeled using the given
function of x. In the yield case, we used a straight line. The correlation can be positive or
negative; it always has the same sign as the slope of the corresponding regression line. Since r² =
R², r lies between -1 and +1. A correlation of +1 would mean that the points lie exactly on a line
of positive slope; a correlation of -1 would mean that the points lie exactly on a line of negative
slope; and a correlation of 0 would mean that there is no linear relationship between the two
variables. The main advantage of the correlation over the coefficient of determination is that it's
signed, so that it tells you more about the nature of the relationship between x and y than R². The
main advantage of R² is that we can use R² for a regression with any number of independent
variables, while the correlation is only defined for pairs of variables.
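The relationship r² = R² is easy to confirm numerically. A Python sketch (helper name ours) on the yield data of Table 3.14:

```python
temps  = [80, 86, 100, 82, 90, 99, 81, 96, 94, 93]
yields = [8.3, 9.3, 10.7, 9.5, 10.5, 11.7, 8.4, 10.5, 11.3, 10.8]

def correlation(x, y):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

r = correlation(temps, yields)
print(round(r, 3), round(r * r, 3))  # 0.903 0.815; r is positive, like the slope
```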

EXAMPLE 3.15 — Modeling the Heat Capacity of TiOx.

The heat capacity of TiOx was measured over a range of temperature, with the results shown
in the "CpTiOx" worksheet. The temperature is measured in degrees kelvin and heat capacity is
measured in cal/mol-deg. Find and interpret the regression line and coefficient of determination
for this data.

Data. These data are in columns B and C of worksheet "CpTiOx" in workbook StatTools.xls.

Solution. As usual, the first step is to chart the data, as shown in Figure 3.24. The Trendline tool
gives a linear least squares line equation: y = 0.0378x + 26.322, and the coefficient of
determination R² = 0.9166. This means that the linear model explains 91.7% of the variation in
heat capacity at different temperatures.

Two observations come to mind immediately on viewing the data. First, there are two points
notably farther from the regression line than the other points. Second, the trend of data points
seems more of a curved line than a straight line. We will deal with both of these observations in a
later section.

Assignment. The "Tempering" worksheet contains data on the efficiency of a condenser's ability
to remove water vapor from warm humid air. The embedded WordPad document has more
information on the details and intent of the experiment. Make a graph of mass condensed (y-axis)
vs. t, and use Excel's Trendline tool to find the regression linear model for these data, as well as
R². Does the result look like it can be improved on?


Figure 3.24 Heat capacity data on titanium oxide with linear regression line and coefficient of
determination. Lower text box is reformatted with actual variables and a more appropriate number
of significant figures. Note the two points that seem unusually far from the trend.

In the paragraph describing R² for the "Yield" dataset, we made the comment, ". . . we need a
better formula using a different functionality of temperature . . . to better explain the variation in
yield". The formula: Yield, kg = 0.141(temperature, °C) - 2.62 is the best possible linear
formula. However, there might be non-linear formulas that do a better job than the linear formula.
The Trendline tool offers four other models to choose from.*

• Logarithmic: y = a·ln(t) + b
• Polynomial: y = a + bt + ct² (order = 2); y = a + bt + ct² + dt³ (order = 3); and so forth
up to order 6.
• Power: y = a·t^b
• Exponential: y = a·e^(bt)
We will discuss the relative merits of these and alternative models in Section 3.4.4 below,
"Choosing Models".

EXAMPLE 3.16 — Non-linear Models for the CpTiOx Data.

Use the Trendline tool to examine the fit of a quadratic (polynomial order 2) and a power
model for the CpTiOx data.

Data. See data in columns B and C in worksheet "CpTiOx" in workbook StatTools.xls.
Solution. First, we clear the linear trendline from the chart, and add new trendlines for each of the
cited models. Excel will display the equation and R² on the chart, as shown in Figure 3.25. The
quadratic model is represented by the dashed line, and has a formula of y = -0.000141x² + 0.154x
+ 2.95. The power model is the solid line, with a formula of y = 4.37x^0.376. Of the three models for
this data set (these two plus the linear model in Example 3.15), the best fitting model is the
quadratic model. This is true both visually and using the coefficient of determination, where the
quadratic model explains 96.5% of the variation in heat capacity, versus 93.9% for the power
model and 91.7% for the linear model.
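The worksheet data aren't reproduced here, but the mechanics of both fits can be sketched in Python (NumPy) on synthetic points generated from the quadratic model above: a quadratic via polyfit, and a power law y = a·x^b via a straight-line fit in log-log coordinates.

```python
import numpy as np

T = np.array([300., 350., 400., 450., 500., 550.])
Cp = -0.000141 * T**2 + 0.154 * T + 2.95      # synthetic data from the quadratic fit

# Quadratic model: least-squares polynomial of order 2
c2, c1, c0 = np.polyfit(T, Cp, 2)
print(round(c2, 6), round(c1, 3), round(c0, 2))   # -0.000141 0.154 2.95

# Power model y = a*x**b: linear regression on log(y) vs log(x)
b_pow, log_a = np.polyfit(np.log(T), np.log(Cp), 1)
a_pow = np.exp(log_a)
```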

Be aware that Trendline's Regression equation may have more or fewer significant figures
than justified by the data. If so, reformat the text box containing the equation to give the
appropriate number of significant figures plus one, and round off the answer later.

* Trendline sometimes decides that one or more of these models are inappropriate, and disallows its use.


[Figure: "Heat Capacity Data for TiOx" plot, Cp (cal/mol·deg) vs. temperature (K) from 300 to 550 K, showing the power model y = 4.37x^0.376 (R² = 0.939) and the quadratic model y = -0.000141x² + 0.154x + 2.95 (R² = 0.965).]

Figure 3.25 Heat capacity of TiOx fitted with quadratic and power models. Solid line refers to
power model.

Assignment. Use the Trendline tool to find a cubic (polynomial order 3) and a logarithmic model
for the "CpTiOx" data, and a quadratic and cubic model for the "Tempering" data.

Excel has several other methods for finding regression models, including the SLOPE and
INTERCEPT functions and the LINEST function. These methods will not be covered in this
Chapter. Consult the Excel Help features for more information about these if you're interested.

3.4.2 Using Solver to Develop Single-Variable Regression Models

The common feature of all the model types available in Trendline is that they are algebraic
formulas derived from calculus and/or linear algebra to calculate automatically the best model of
each of these types. Suppose, however, you wish to fit a model that isn't one of the types
supported by Trendline.

Here's an example. Molten iron dissolves both silicon and oxygen when it is in contact with a
slag. The equation relating the composition of the dissolved elements and temperature is given as
a composition function equation:

log(%Si) + 2·log(%O) = composition function = CF = m/T + b

In this case, what is the relationship between dissolved silicon and oxygen in molten iron with
respect to 1/T? The data obtained in 7 tests is shown in bordered cells in worksheet "LstSqrSiOx".
You wish to regression-fit the data to a model of the form y = m/T + b, which is not one of those in
the Trendline list.

One way to find a model of the form y = m/T + b is to use Excel to create two new variables,
the first equal to the composition function CF, the second equal to 1/T. Then plot 1/T versus the
sum of the logs and use the graph and Trendline. This is how the graph was made in the
"LstSqrSiOx" worksheet. The formula shown is an edited version of the Trendline result.

A second way to find a model is by using Excel's Solver tool. The Solver tool can minimize
or maximize some quantity by varying the constants in a range of cells. In this case, we want to
minimize the sum of squared differences between the actual values of the CF and the values of CF
calculated by the model, which we'll do by asking Solver to vary the values of m and b. The
details of how to do this are in the embedded WordPad document in the "LstSqrSiOx" worksheet.
If unfamiliar with the use of Solver, please consult your Excel manual, and the SuperSolver User's
Guide on the Handbook CD.


Using either method, the resulting model is something close to CF = -28,700/T + 10.7. Using
this formula, the sum of squared errors SSE = 0.00450. Since the total variation in CF is SST =
0.279, we get SSR = 0.275 and R² = 98.4%. The two-term model explains over 98% of the
variation in composition function.
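The transformation trick, creating x = 1/T so the model becomes a straight line, can be sketched in Python. The temperatures below are hypothetical stand-ins for the seven tests (the worksheet data aren't reproduced here), and the CF values are generated from the reported model just to show the mechanics:

```python
import numpy as np

T = np.array([1750., 1800., 1850., 1900., 1950., 2000., 2050.])  # hypothetical, K
CF = -28700.0 / T + 10.7        # composition function from the fitted model

m, b = np.polyfit(1.0 / T, CF, 1)   # linear regression on the new variable x = 1/T
print(round(m), round(b, 1))        # -28700 10.7
```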

EXAMPLE 3.17 — An Asymptotic Model for the Heat Capacity of TiOx.

Use Solver to find a model for the CpTiOx data of the form Cp = a - b·e^(-T/100).

Data. See worksheet "CpTiOx" in workbook StatTools.xls.

Solution. Consult the worksheet, starting near cell AJ16. The table of values is in column AK,
along with the coefficients we want fitted (a, b) and the quantity we want to minimize (the sum of
squares SSE, cell AK20). Solver requires initial estimates of a and b, which should be reasonably
close to the values Solver seeks (however, Solver often converges even with poor initial estimates).
For an initial value of a, we note that a model of the form y = a - b·e^(-T/100) is exponentially
asymptotic to the horizontal line Cp = y = a. The data show Cp asymptotic to about 48, so that's
our starting value (i.e., initial estimate) for a. We then estimate b from one of the points, which
gives a reasonable starting value for b of 220. We then use these values of a and b in the equation
to calculate Cp in column AL (heading is "Calc'd Cp"). Solver is then invoked to minimize SSE
(cell AK20) by changing the values of a and b. Solver gives us a = 46.0, b = 201, and a resulting
R² of 0.960. Solver stores the setup on the worksheet, so if you invoke Solver again, a dialog box
will appear showing the entries. The parameter "Adjusted R²" in the text box will be explained in
a later section. Figure 3.26 shows a graph of the data and fitted model.
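Because the decay constant (100) is fixed, this model is actually linear in a and b once the exponential is computed, so for illustration it can also be fitted by ordinary least squares without Solver. A sketch on synthetic points generated from the fitted model (the worksheet data aren't reproduced here):

```python
import numpy as np

T  = np.array([300., 350., 400., 450., 500., 550.])
Cp = 46.0 - 201.0 * np.exp(-T / 100.0)        # synthetic "measured" Cp

# Design matrix for Cp = a*1 + b*(-exp(-T/100)); least squares gives a and b
X = np.column_stack([np.ones_like(T), -np.exp(-T / 100.0)])
a, b = np.linalg.lstsq(X, Cp, rcond=None)[0]
print(round(a, 1), round(b, 1))               # 46.0 201.0
```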

[Figure: "Asymptotic Model for Cp of TiOx" plot, Cp vs. temperature (K) from 300 to 550 K, with the fitted curve Cp = 46.0 - 201.4·exp(-T/100), R² = 0.960, Adj. R² = 0.958.]

Figure 3.26 Heat capacity of TiOx fitted by Solver to an asymptotic model.

Assignment. A graph of the tempering data (worksheet "Tempering") shows a curve that looks like
a mirror image of the graph of y = √x that's been shifted 40 units to the right. Thus,
we might try a model of the form: mass condensed = (40 - t)^b, where the "40 - t" is the
shift/reflection and b is the unknown root. Use Solver to find a model of this form for the
tempering data, and compute R². How does this model compare to the quadratic and cubic models
found in an earlier assignment?

3.4.3 Multiple Linear and Non-linear Regression

So far, we've looked at various relationships between a variable and one other factor (linear
and non-linear). However, sometimes we know, or suspect, that there may be multiple factors that

* One or any other number may be used as a denominator for the exponential factor fraction. Using 100
keeps the exponential from being an extremely tiny number.


affect a certain process variable. Corrosion is one of those processes, and is typically studied by
exposing samples to an environment and measuring the amount of corrosion. In one set of tests,
100 cm2 galvanized steel samples were exposed to humid air in a corrosion test box for 300 days.
The % relative humidity and temperature columns are self-explanatory. "Thickness" is the
thickness of the galvanized coating, in μm. "Weight gain" indicates the amount of corrosion, and
is the increase in mass, in mg, for a sample. The data are in Table 3.16 and in worksheet "Corros"
in workbook StatTools.xls.

Table 3.16 Corrosion data for galvanized steel samples. Temperature is Celsius.

%RH. Temp Thickness Weight gain %RH Temp Thickness Weight gain
80.7 25.0 88.4 42.3 59.1 17.8 88.3 11.5
78.4 27.5 87.7 38.7 58.5 16.9 81.1 10.7
76.8 24.8 88.6 37.5 57.3 20.1 93.4 12.4
61.6 22.8 88.0 27.8 49.3 18.7 88.6 8.3
61.1 21.1 87.8 16.9 50.8 19.5 85.9 6.0
60.6 22.8 86.9 19.7 49.4 21.0 70.7 9.2
60.3 23.3 92.0 19.2 50.3 18.2 77.4 6.7
62.0 23.5 93.0 21.2 52.1 19.7 78.5 8.9
58.5 23.2 86.8 14.9 57.1 20.6 82.7 15.1
58.5 19.0 81.0 13.8 70.0 17.6 90.6 14.4
58.3 17.5 88.2 15.8

The goal is to analyze the data to find a statistical relationship between the weight gain, the
galvanized layer thickness, and the two environmental factors. If we don't have a typecast
mathematical model for a process, we start with a linear model. We can develop a regression
model for weight gain y as a linear function of the different input variables (x1 = % relative
humidity, x2 = temperature, and x3 = zinc thickness). There are three independent variables and we
solve for four parameters:

y = a1x1 + a2x2 + a3x3 + b

Finding regression models with more than one independent variable is called multiple
regression. Here, it is multiple linear regression. Clearly, we can't use Trendline to find a model,
since we can't plot all four variables in a single two-dimensional graph. We could set it up using
Solver in the fashion described earlier, but Solver does not automatically produce some useful
statistical parameters. A better option is to use Excel's statistical Regression procedure located in
Tools/Data Analysis. An embedded WordPad document on the "Corros" worksheet explains
how to use the Regression tool. Excel lets you name the variables in the output. Label each
column of data, and include the column labels when you select the data. Then place a check mark
in the "Labels" dialog box. Below, we will focus on the resulting output.

A partial display of the output (Table 3.17) shows some summary statistics and the four
parameters in the regression equation. Excel obtained the following regression equation:

Wt. gain (mg) = -54.0 + 0.788(%RH) + 1.428(°C) - 0.070(μm)

The Regression tool also returns R² = 0.929. "Multiple R" is just the square root of R², and
has no meaning of its own apart from the meaning it derives from R². We will discuss "Adjusted R
Square" in the next subsection.
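The same fit can be reproduced outside Excel. The sketch below runs ordinary least squares (NumPy) on the Table 3.16 data; if the table has been transcribed correctly, the coefficients should match Table 3.17 (about -54.0, 0.788, 1.428, -0.070) with R² ≈ 0.929.

```python
import numpy as np

# %RH, temperature (Celsius), zinc thickness (um), weight gain (mg) -- Table 3.16
data = np.array([
    [80.7, 25.0, 88.4, 42.3], [78.4, 27.5, 87.7, 38.7], [76.8, 24.8, 88.6, 37.5],
    [61.6, 22.8, 88.0, 27.8], [61.1, 21.1, 87.8, 16.9], [60.6, 22.8, 86.9, 19.7],
    [60.3, 23.3, 92.0, 19.2], [62.0, 23.5, 93.0, 21.2], [58.5, 23.2, 86.8, 14.9],
    [58.5, 19.0, 81.0, 13.8], [58.3, 17.5, 88.2, 15.8], [59.1, 17.8, 88.3, 11.5],
    [58.5, 16.9, 81.1, 10.7], [57.3, 20.1, 93.4, 12.4], [49.3, 18.7, 88.6,  8.3],
    [50.8, 19.5, 85.9,  6.0], [49.4, 21.0, 70.7,  9.2], [50.3, 18.2, 77.4,  6.7],
    [52.1, 19.7, 78.5,  8.9], [57.1, 20.6, 82.7, 15.1], [70.0, 17.6, 90.6, 14.4],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1], data[:, 2]])
y = data[:, 3]
coef = np.linalg.lstsq(X, y, rcond=None)[0]   # intercept, RH, temp, thickness
resid = y - X @ coef
r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(np.round(coef, 4), round(r2, 3))
```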

We mentioned earlier that a linear model is sometimes inadequate to express the data. We
may know that the data changes exponentially, or has some other functional variation that involves
polynomial factors. The Regression tool can also be used in these cases.

Table 3.17 Results of Excel regression analysis of the galvanized steel corrosion data.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.96366
R Square 0.92864
Adjusted R Square 0.91605
Standard Error 3.04933
Observations 21

Coefficients
Intercept -53.981629
% rel. hum. 0.7883615
Temperature (°C) 1.4278573
Thickness -0.0699328

The Regression tool has another useful feature which appears if you check Residuals on the
Regression dialog box. Excel calculates a table of y values predicted by the regression equation
and the residual value compared to the measured value. If your original data was sorted according
to ascending or descending values of certain variables, the residual values can tell you if the larger
residuals appear at one or the other extreme of measurement. Other diagnostic tools are also
available.

EXAMPLE 3.18 — Using the Regression Tool to Find Non-Linear Models.

Find the best quadratic and cubic models for the tempering data (see assignment after
Example 3.16).

Data. See worksheet "Tempering" in workbook StatTools.xls.

Solution. The data is in column format in the "Tempering" worksheet, starting in column O. The
Regression tool requires that data be in adjacent columns. In addition to the two variables shown
above, we have added two additional columns, consisting of the square and the cube of the
temperature. Always label data columns and check the Labels box on the Regression dialog
screen. Include the column labels when you select the data.

A quadratic model expressing mass as a function of temperature would have the form M = a +
bT + cT². Thus, you can think of it as a multiple regression model with two independent variables,
T and T². In the Regression tool dialog box, choose O6:O15 as the Input Y range, and P6:Q15
as the Input X range. The resulting model is M = 4.13 + 0.0127T - 0.00274T².

The cubic model can be found the same way, only this time you have to include column R in
the Input X range. The resulting model is M = 4.35 - 0.0615T + 0.00198T² - 0.0000773T³. As
usual, express the equation parameters to the correct number of significant figures plus one, and
round off the answer later.
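The "extra columns" trick is just a design matrix with T, T² and T³ as separate regressors. A sketch on synthetic points generated from the cubic model above (the Tempering worksheet itself isn't reproduced here):

```python
import numpy as np

t = np.array([0., 4., 8., 12., 16., 20., 24., 28., 32., 36.])
M = 4.35 - 0.0615 * t + 0.00198 * t**2 - 0.0000773 * t**3   # synthetic mass data

# Multiple regression with columns 1, t, t^2, t^3 (the added worksheet columns)
X = np.column_stack([np.ones_like(t), t, t**2, t**3])
coef = np.linalg.lstsq(X, M, rcond=None)[0]
print(np.round(coef, 7))   # recovers 4.35, -0.0615, 0.00198, -0.0000773
```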

Assignment. Use the Regression tool on the data (next page) to predict the molality bC of product C
as a function of the molalities of A and B in a solution. Use linear regression to see if bC is
adequately expressed as a linear function of the sum of bA and bB. Should the intercept be zero?


bA 0.1717 0.3016 0.1274 0.0679 0.1872 0.2222 0.2040 0.3045 0.2727 0.2120
bB 0.2231 0.2323 0.2288 0.1960 0.2425 0.2288 0.2727 0.2550 0.2310 0.2222
bC 0.0412 0.0696 0.0294 0.0138 0.0559 0.0460 0.0732 0.0910 0.0656 0.0466

bA 0.2910 0.3030 0.6760 0.1470 0.4850 0.9568 0.4545 0.5100 0.5250 0.9696
bB 0.2323 0.2392 0.2646 0.1843 0.2392 0.1818 0.2550 0.3465 0.2727 0.2862
bC 0.0727 0.0806 0.2068 0.0258 0.1133 0.1481 0.1251 0.2354 0.1719 0.3440

bA 0.4850 0.8585 0.8320 1.0682 0.4656 0.3640 0.6060 0.7548 0.2940 0.6363
bB 0.3030 0.2912 0.2450 0.2134 0.2704 0.3030 0.3060 0.3045 0.2929 0.3922
bC 0.3420 0.2162 0.2707 0.1515 0.1414 0.2379 0.3027 0.1047 0.4152 0.1964

3.4.4 Using Solver and Excel's SSD Tool to Find Equation Coefficients

Sometimes we need to fit data to a phenomenological equation where the variables and factors
cannot be mathematically separated. In Section 2.3, for example, the van der Waals (vdW)
equation of state for a gas utilized two factors, a and b. Equation [3.19] expresses the vdW
equation in terms of gas compressibility z, density ρ, and temperature in kelvin.

z = 1/(1 - bρ) - aρ/(RT) [3.19]

If the density is expressed as mol/L, the units for a are L²·bar/mol², and the units for b are
L/mol. R, the gas constant, is 0.0831447 L·bar/(mol·K), and the compressibility is unitless. In
order to use the vdW equation of state, we need values of a and b that minimize the difference
between tabular (correct) values of P, T and V, and those calculated by the vdW. Excel's
Regression tool cannot be used to find a and b because the format of the vdW equation does not
allow separation of variables as required by Regression. Instead, Solver can find the "best" values
of a and b by minimizing the sum of the squares of the differences (SSD) between tabular values
and those calculated by the vdW. The SSD term was introduced earlier in Section 3.4.1. Once a
and b are found, the vdW equation can be revised to express P in terms of V and T.

We illustrate this technique for air by finding a and b based on data from NIST, as shown on
worksheet air-vdW in workbook StatTools.xls. Column H of the worksheet contains the z values
calculated from the P, V, and p table values.

The first step is to select a place on the worksheet (anywhere you like) for the sought
coefficients a and b. Put the gas constant R there too; that makes it easier to reference the
formula's parameters when they're in the same place. It's handy but not required to place them all
close to each other.

You may wish to assign names for the cells designated to hold the coefficients and the gas
constant. It's not required, but it'll help you to avoid a possible "absolute reference" mistake, and it
will also make your formulas more readable. To assign a name, select the cell, then place the
cursor on the dropdown arrow next to the name box containing the cell's address. Then type the
desired name for this cell and press Enter. You can't use standard Excel addresses as a cell name,
so you can't type just 'a' or 'b'. So, just for instance, cell J4 is named CoefA, cell K4 is named
CoefB, and the gas constant cell is named ConstR.

106 Chapter 3 Statistical Concepts Applied to Measurement and Sampling

We want a vdW equation which, given ρ and T, would return the most accurate z values.
Enter a vdW equation for z in the worksheet; in cell I8 it is
=1/(1-CoefB*E8) - CoefA*E8/(ConstR*D8), where column D holds T (K) and column E holds
the density (mol/L).

The calculated vdW z is initially based on zero values for a and b, which is equivalent to ideal
gas behavior. Drag the formula you've just entered down to create corresponding formula entries
for each row of your data set.

Now enter the criterion to determine the a and b coefficients. When a and b are properly
found, the values in our calculated column (column I) will be as close as possible to their
predefined counterparts from the NIST table (column H). For this, we use the sum of the squares
of differences (SSD):

Σ(xi - yi)² = sum of the squares of differences [3.20]

Designate a cell (we used cell J6) and enter a formula using the following template:
=SUMXMY2(RangeOfPredefinedValues, RangeOfColumnOfInterest)

Here, the vdW z values are in the column of interest, and the table values of z are the
predefined values. Therefore the equation entered in cell J6 is: =SUMXMY2(H8:H91,I8:I91),
although since we are squaring the differences, it makes no difference in which order we list the
range columns. Alternatively, you can click on the Formula Wizard button (fx), select the function
above, and enter the range of x and y values from your worksheet by following the prompts.
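For readers working outside Excel, the SUMXMY2 computation is easy to reproduce. Here is a sketch in Python with NumPy; the sample values are hypothetical stand-ins, not the worksheet data.

```python
# SSD (sum of squared differences), the quantity Excel's SUMXMY2 returns.
import numpy as np

def ssd(x, y):
    """Sum of squared differences between two equal-length arrays."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2))

table_z = [0.99922, 0.99925, 0.99941]   # "predefined" values (hypothetical)
vdw_z   = [1.00000, 0.99950, 0.99900]   # calculated values (hypothetical)

# Because the differences are squared, argument order doesn't matter:
print(ssd(table_z, vdw_z))
print(ssd(vdw_z, table_z))
```
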

Now everything is set up for analysis. Go to the Tools menu and select the "Solver" menu
item. In the Solver Parameters box:

"Set Target Cell" = your cell containing the SUMXMY2 formula (here, J6);

"Equal To" = "Min" option;

"By Changing Cells" = your a and b coefficient cells (select both of them at one time).
The Solver Parameters box will appear similar to the diagram below.

(Screenshot of the Solver Parameters dialog box: Set Target Cell = $J$6, Equal To = Min,
By Changing Cells = $J$4:$K$4.)

Leave the Solver Options in their default entries. Leave the CoefA and CoefB cells vacant.
Now click the Solve button. Solver will then find values for a and b which cause the SUMXMY2
function to be as small as possible. Over the range 1 - 50 bar and 260 - 2000 K, the values are
a = 1.2803 and b = 0.04254. The worksheet also shows the same technique applied to other
versions of the vdW equation.
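The Solver step can be mimicked outside Excel as well. The sketch below (Python with SciPy) minimizes the same SSD; since the NIST table is not reproduced here, synthetic z values generated from the coefficients quoted above stand in for the table data.

```python
# Sketch of the Solver step: find vdW a and b by minimizing the SSD
# between "table" z values and z from the vdW equation,
#     z = 1/(1 - b*rho) - a*rho/(R*T)
# Synthetic data stand in for the NIST table used in the worksheet.
import numpy as np
from scipy.optimize import least_squares

R = 0.08314471  # gas constant, L·bar/(mol·K)

def z_vdw(params, T, rho):
    a, b = params
    return 1.0 / (1.0 - b * rho) - a * rho / (R * T)

# Synthetic "table" generated with the coefficients found in the text
T = np.linspace(260.0, 2000.0, 84)
rho = 1.0 / np.linspace(20.0, 170.0, 84)   # density, mol/L
a_true, b_true = 1.2803, 0.04254
z_table = z_vdw((a_true, b_true), T, rho)

# Start from a = b = 0 (ideal gas), as the worksheet does
fit = least_squares(lambda p: z_table - z_vdw(p, T, rho), x0=[0.0, 0.0])
a_fit, b_fit = fit.x
print(a_fit, b_fit)
```

Here least_squares minimizes the sum of squared residuals directly, which is the same criterion Solver applies to the SUMXMY2 cell.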

3.4.5 Choosing Among Models
We've shown three different methods in Excel for finding regression equations for data:
Trendline, Solver, and Regression. Our purpose now is to examine different aspects of


discovering, evaluating, and choosing among competing models. We now introduce the statistical
term adjusted R² (R²adj) to guide our choice.

Consider three models of the CpTiOx data found using the Trendline tool for the linear and
quadratic models, and Solver for the asymptotic model:

• Linear: y = 0.0377x + 26.4. R² = 0.917

• Quadratic: y = -0.000141x² + 0.154x + 2.95. R² = 0.964

• Asymptotic: y = 46.0 - 20.1e^(x/m). R² = 0.960

Which model should we choose as the best model? On one hand, the larger R² is, the better the
model explains the variation in the data. On the other hand, in science we always prefer simpler
models to more complex ones. Is an improvement in R² from 0.917 to 0.964 enough to prefer the
quadratic model to the linear model, or should we stick with the simpler model? Similarly, is an
improvement from 0.960 to 0.964 enough to prefer the quadratic model to the asymptotic model?

It turns out that adding a variable to a regression model always increases R². R²adj is a
parameter designed to help you decide whether that increase is "enough". Is the improvement in a
model from adding a variable worth the additional complexity? Or, more generally, is a particular
"larger" model with lots of variables superior to a "smaller" model with fewer variables? Although
in general we will rely on the Regression tool to calculate R²adj for us, working one example by
hand will help explain what R²adj does:

R²adj = 1 - [SSE/(n - k - 1)] / [SST/(n - 1)] [3.21]

where n as usual represents the number of points in the dataset, while k represents the number of
independent variables.

Recall that R² was the percentage of variation in the dependent variable explained by the
regression model. R²adj has no direct interpretation of that sort; however, it is still true that models
with larger R²adj are considered superior to those with smaller R²adj. We will use R²adj to compare
the quadratic and asymptotic models for the CpTiOx data. The output of the Regression tool on
worksheet "CpTiOx" has the necessary information for the quadratic model. The ANOVA table
tells us that for the quadratic model, SSE = 6.10 and SST = 173.23 (cells X19 and X20). Also, the
quadratic model has k = 2 independent variables, x and x², and n = 22 points. Therefore,
R²adj = 1 - [6.10/(22 - 2 - 1)] / [173.23/(22 - 1)] = 0.961. On the other hand, the asymptotic
model (developed with the use of Solver for Exercise 3.18) has SSE = 6.98 and SST = 173.23
(cells AK20 and AK22). It has k = 1 independent variable, so
R²adj = 1 - [6.98/(22 - 1 - 1)] / [173.23/21] = 0.958. Thus, although the asymptotic model is
algebraically simpler than the quadratic model, the quadratic model has a slightly larger R²adj,
which is interpreted as saying the improvement in the quadratic model is enough to justify its
additional complexity. However, the improvement in R²adj is small, and considering the scatter in
the data*, the two models are about equivalent.
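Equation [3.21] is simple to script. The sketch below reproduces the two hand calculations above; the function name is ours, not part of any library.

```python
# Adjusted R-squared, Equation [3.21]:
#   R2_adj = 1 - [SSE/(n - k - 1)] / [SST/(n - 1)]

def adjusted_r2(sse, sst, n, k):
    """n = number of data points, k = number of independent variables."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))

# Quadratic model for the CpTiOx data: SSE = 6.10, SST = 173.23, k = 2
print(round(adjusted_r2(6.10, 173.23, 22, 2), 3))   # → 0.961

# Asymptotic model: SSE = 6.98, k = 1
print(round(adjusted_r2(6.98, 173.23, 22, 1), 3))   # → 0.958
```
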

EXAMPLE 3.19 — Removing the Thickness Variable from the Galvanized Corrosion Example.

We introduced the Regression tool to model the weight gain of galvanized steel under
different corrosion conditions. R²adj for the linear regression model was 0.916. For reasons we'll
discuss below, there is reason to wonder whether the thickness variable is important to the model,

* Do you remember the two points that were rather far from the trend? We will consider the validity of these
points in a later section, after which we can re-compare the asymptotic and quadratic models.


or if instead it can be removed from the model to simplify it. Compute and interpret R²adj for a
model where weight gain depends only on % relative humidity and temperature.

Data. See Table 3.16 and worksheet "Corros".

Solution. If we use the Regression tool to see if weight gain depends only on % relative humidity
and temperature, the resulting model has R²adj = 0.9195 (results on the "Corros" worksheet).
Therefore, since R²adj for the larger model is actually smaller than for the smaller model, the
smaller model is preferred.

Assignment. Examine the linear, quadratic, and cubic models in the "Tempering" example
(worksheet "Tempering") and report which model would be preferred according to R²adj.

Hypothesis testing. The results of the Regression tool can also be used to judge whether or not a
model is useful overall, as well as the utility of individual terms within the model. A full
discussion of hypothesis testing on regression models is beyond the scope of this text. However,
here is a brief guide to important information included in the Regression results.

Hypothesis testing begins with certain key assumptions. These assumptions are that the
residuals from the model should be independent and normally distributed with mean 0 and some
constant standard deviation σ. If you look at the residuals for a model and these assumptions do
not seem valid, then the inferences below will likely be invalid as well. Assuming that the
residuals are OK, the next thing to look at is the overall significance of the model.

In the table labeled "ANOVA", look at the cell called "Significance F". We are interested in
a term called the probability value, or p-value, of a hypothesis test for the whole model. Without
going into detail here, the usual rule of thumb is that as long as the p-value is below 0.05, there is
significant evidence at the 5 % level that the model as a whole is valid. As in section 3.3,
alternative standards are 0.01 (significant evidence at the 1 % level) and 0.10 (significant evidence
at the 10 % level).

If the overall model is deemed significant, then it's reasonable to check individual terms in the
model. Look for the column labeled "p-value" in the untitled table below the ANOVA table in the
Regression tool output. If a variable has a p-value above 0.05, the interpretation is that there is not
significant evidence at the 5 % level that that variable should be included in the model. A different
interpretation of the same situation would be that the data do not provide enough evidence to
distinguish that variable's coefficient from 0 and, of course, if that coefficient were 0, then there
would be no point in including that variable in the model. If one or more of the variables is not
significant, that may mean that the model would be OK without that variable; or, it can result from
an experiment that measures several independent variables in only a narrow range of possible
values. Other explanations are also possible.

EXAMPLE 3.20 — Hypothesis Testing for the Galvanized Steel Model with Three Independent Variables.

Use the above recommendations on hypothesis tests to analyze the galvanized steel model
with three independent variables, and verify that the thickness of the galvanized layer is not a
statistically significant variable.

Solution. Example 3.19 used R²adj to show that the model was "better" when the thickness variable
was omitted. There is another way to show the same thing. When we introduced the Regression
tool, we recommended that you check off the "Residuals" box. This is to make the residuals
available for diagnostic purposes. Figure 3.27 shows four graphs based on the residuals used to
evaluate the assumptions for hypothesis testing of regression models.

Although 22 points is not enough to see a pattern in some of these plots, they indicate no
problems with the assumptions for a hypothesis test. The plot of residuals vs. time and the residual
lag plot both show no patterns that would indicate a problem with the independence assumption.
The normal distribution plot of the residuals shows a reasonably linear pattern in the points and, as


we'll see below, the R2 for the normal distribution plot can be interpreted as consistent with a
normal distribution for the residuals. Finally, the plots of residuals vs. time and vs. the predicted y-
values indicate no changes in the standard deviation with time or with the size of y. We proceed
under the assumption that the hypothesis tests for these data will be valid. Table 3.18 shows
relevant parts of the Regression tool output. The model equation using all three variables is:

Wt. gain, mg = 0.788(%RH) + 1.428(t) - 0.070(x) - 53.98

(Figure panels: Residuals vs. Time; Predicted y vs. Residuals; Residual Lag Plot; Normal
Distribution Plot of Residuals.)

Figure 3.27 Diagnostic graphs for regression residuals, galvanized steel example.
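Two of these diagnostics reduce to simple numbers as well as plots: a residual mean near zero supports the zero-mean assumption, and a negligible lag-1 correlation supports independence. A sketch of those checks, using synthetic stand-in residuals rather than the worksheet values:

```python
# Numerical counterparts of two residual diagnostics in Figure 3.27.
# The residuals here are synthetic stand-ins, not the worksheet data.
import numpy as np

rng = np.random.default_rng(7)
resid = rng.normal(0.0, 3.0, size=21)    # 21 stand-in residuals

mean = resid.mean()                      # should be near zero
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]  # lag-1 correlation
print(round(mean, 2), round(lag1, 2))
```
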

Table 3.18 Partial results of the Regression tool on the galvanized steel example.

ANOVA              df        SS          MS           F          Significance F
Regression          3     2057.033    685.67779    73.741248     5.96648E-10
Residual           17      158.0733     9.2984295
Total              20     2215.107

                   Coefficients   Stand. Error     t Stat       P-value
Intercept           -53.981629    10.8771403    -4.96285124     0.00012
% rel. hum.           0.7883615    0.10565496     7.4616611     9.3E-07
Temperature (°C)      1.4278573    0.30538734     4.67556148    0.00022
Thickness            -0.0699328    0.13733445    -0.50921521    0.61715

The "Significance F" p-value is nearly zero. The key is that it is less than 0.05, which
indicates that the model is statistically significant. The p-values for the intercept and the first two
variables are similarly small, but p for the thickness variable is much larger than 0.05. That
indicates that the thickness variable is making no significant contribution to the model, and could
possibly be omitted without ill effects (this was the motivation for our treatment of this model in
Example 3.19, using R²adj). Normally, one would expect that the galvanized thickness would be a
factor in corrosion rate, but if the galvanized layer in the population is so thick that it is intact
throughout, it may substantially prevent oxidation of the iron in the 300-day exposure.
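The p-values in Table 3.18 can be reproduced from the reported F and t statistics. A sketch using SciPy's distribution functions, with the degrees of freedom taken from the ANOVA table (3 regression, 17 residual):

```python
# Reproduce the p-values in Table 3.18 from the reported statistics.
from scipy import stats

F = 685.67779 / 9.2984295          # F = MS(regression) / MS(residual)
p_overall = stats.f.sf(F, 3, 17)   # the "Significance F" p-value
print(F, p_overall)                # F ≈ 73.74, p ≈ 6e-10

# Two-sided p-value for the thickness coefficient from its t statistic
p_thickness = 2 * stats.t.sf(abs(-0.50921521), 17)
print(p_thickness)                 # ≈ 0.617, not significant at 5 %
```
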

Regression was performed again, this time omitting the thickness data. The equation is:

Wt. gain, mg = 0.768(%RH) + 1.431(t) - 58.82

Assignment. Suppose the formula of dissolved product C obtained by reaction between A and B
was AB2. In the assignment for Example 3.18, you looked at a model that predicted the molality of
C as the sum of molalities A and B, which required two coefficients and an intercept (which you
may have set to zero). Use the recommendations on hypothesis tests to compare your linear model
to one where bC is expressed as a product of bA × (bB)². Hint: should the intercept be set at zero?

Using common sense and/or theory to choose among models. When you have to choose
between models, statistical criteria are sometimes insufficient by themselves, or may be superseded
by non-statistical criteria. The quadratic model was (slightly) preferable to the asymptotic model
for the CpTiOx data if judged by R²adj. But this may not be the end of the story.

When comparing two or more models, ask yourself if theory or experience gives you any
expectations about what the shape of the relationship among the variables should be. In the
CpTiOx example, the vertex of the quadratic model occurs at 0.154/(2 × 0.000141) ≈ 548 K. If you
were to use the quadratic formula to try to predict the heat capacity for temperatures above those
for which you have data, the negative sign on the T² term would predict that the heat capacity
would start falling. On the other hand, the asymptotic model says that no matter how high the
temperature, the heat capacity will never exceed about 46 cal/(mol·deg) (the value of parameter
a). If either theory or experience gives you a strong belief that one of these behaviors is the
"correct" one, it narrows down your choice of model significantly, and can even cause you to
overrule the results of R²adj. However, if we know nothing about how the model should behave,
there is little to choose between these models, since R²adj differs by only 0.003 out of 1.

These considerations are particularly important in trying to find a model that represents the
behavior of a real industrial process parameter. If we know nothing about what controls the
process, we seek whatever type of equation best fits the data while giving a high value of R²adj.
This is called an empirical model. If a process was governed primarily by thermodynamic
limitations, we would seek to fit the results to an equation format consistent with how an
equilibrium constant varied with temperature. If the process was governed primarily by chemical
kinetics, we would seek to fit the results to an equation format specific to a reaction rate constant.
Models based on physical-reality equations are called phenomenological models. On this basis,
for the assignment to Example 3.20, an A and B concentration product model is most appropriate
for the concentration of AB2 in solution rather than linear concentrations of A and B.

We explore this concept a bit further by looking at experimental data obtained as a function of
time. There is a completely separate field within statistics called "time series" in which the data


are completely dependent. Recall that in Section 3.3, we defined statistically independent data to
mean that a given measurement value was completely independent of the previously taken
samples. Suppose we measure the temperature at one-minute intervals of an object placed in a
furnace that is at 800 °C. If we know its temperature at 11 and 12 minutes, we can predict very
accurately the temperature at 13 minutes.

We mentioned in section 3.3 that a lag plot was a good way to indicate the statistical
independence of a population. What would a lag plot look like for a typical data set of the
temperature of an object, initially at 30 °C, placed in a furnace at 800 °C? Figure 3.28 shows such
a plot, where the temperature is measured to ±1 °C.

(Lag plot of temperature #2 vs. temperature #1; the Trendline equation is y = 0.981x + 14.5
with R² = 0.9979.)

Figure 3.28 Lag plot for temperature of object placed in hot furnace. Temperature measured at
one-minute intervals. Text box shows the Trendline equation for the data. Temperature #1 is the
object temperature, while temperature #2 is its temperature one minute later.

The data are almost perfectly dependent; a straight line is a very good representation of the
data. Of course, we really don't need a lag plot to tell us this. However, a lag plot can indicate
outlier data, as shown by the point at temperature #1 ≈ 220 °C, which should be closer to 200 °C.
Possibly some external source, such as static electricity or a line surge, affected the measurement.
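The lag-plot slope in Figure 3.28 is no accident: for Newton's law data taken at one-minute intervals, T(t+1) = Tf(1 - e^k) + e^k·T(t), so consecutive readings fall on a line of slope e^k. A sketch demonstrating this, using parameter values consistent with the furnace example later in this section:

```python
# Lag plot for Newton's-law heating: consecutive one-minute readings
# T(t) and T(t+1) are linearly related with slope e^k.
import numpy as np

k, Tf, Ti = -0.01762, 845.5, 27.0        # illustrative values
t = np.arange(0, 120)                    # minutes
T = Tf - (Tf - Ti) * np.exp(k * t)       # noise-free Newton's law series

T1, T2 = T[:-1], T[1:]                   # lagged pairs
slope, intercept = np.polyfit(T1, T2, 1) # line through the lag plot
print(slope, intercept)                  # slope = e^k ≈ 0.9825
```

With measurement noise added, the fitted slope scatters around e^k, which is consistent with the 0.981 slope in the figure.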

One of the applications of statistics to time-series data is the prediction of the final
temperature of an object that is subject to a heat flux. Suppose a fixed voltage is applied to a small
laboratory furnace with a temperature controller. If the controller fails or the power relay sticks
closed, the furnace temperature will continue to rise. The furnace manufacturer may want to know
how hot the furnace will get to be sure that if the controller fails, the furnace heating elements
won't burn out and the insulation won't be ruined. Is it feasible to measure the furnace
temperature rise for an hour or two, and use the data to predict its final temperature (i.e., the
temperature at infinite time)?

The rate at which an object heats up when placed in a furnace (or cools down when taken out)
is expressed as Newton's law of heating (or cooling):

Tt = Tf - (Tf - Ti)e^(kt) [3.22]

where Tt represents the temperature at any time t, Tf is the final temperature, Ti is the initial object
temperature, and k is a constant that represents the way heat is transferred between the object and
its surroundings. k may fluctuate somewhat with time because the heat transfer mechanism
changes as the temperature approaches Tf. Therefore, actual temperature-change data usually
deviates somewhat from Newton's law.


The first question to answer is how well Newton's law applies to the furnace heating situation.
As a test, one of the furnaces was taken from inventory and connected to the 110 V line. The
initial temperature was 27 °C. The temperature was measured for 500 minutes, by which time it
was obvious that a steady (final) temperature was reached. Worksheet "TempRise" shows the
results. Clearly, the furnace reached an apparent final temperature of about 850 °C in about 300
minutes (or a bit less). There is some "flutter" in the data once the furnace has reached the final
temperature, probably due to minor variations in line voltage or changes in ambient conditions.

To model the data, we seek values of k and Tf that minimize the sum of the squares of the
differences between the data and a Newton's law fit. The analysis needs only the first 300 minutes
of data because the furnace reached its final temperature before that. Solver minimized the value
of SSE (by minimizing the function SUMXMY2) by varying k and Tf as shown in the worksheet.
Solver found a solution at k = -0.01762 and Tf = 845.5 °C. Figure 3.29 shows the data and the
superimposed model curve (to 300 minutes) using the above two parameters.
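This fit can be sketched the same way as the vdW example. Since worksheet "TempRise" is not reproduced here, synthetic data generated from the fitted parameters stand in for the measurements:

```python
# Fit Newton's law, T(t) = Tf - (Tf - Ti)*exp(k*t), for k and Tf by
# minimizing the SSD. Synthetic data stand in for worksheet "TempRise".
import numpy as np
from scipy.optimize import least_squares

Ti = 27.0
t = np.arange(0.0, 300.0)                  # the first 300 minutes
k_true, Tf_true = -0.01762, 845.5
T_data = Tf_true - (Tf_true - Ti) * np.exp(k_true * t)

def residuals(params):
    k, Tf = params
    return T_data - (Tf - (Tf - Ti) * np.exp(k * t))

fit = least_squares(residuals, x0=[-0.01, 800.0])
k_fit, Tf_fit = fit.x
print(k_fit, Tf_fit)
```

With real (noisy) data the recovered k and Tf would deviate somewhat from the generating values, which is exactly the effect explored in the truncated-range table later in this section.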

(Chart of the measured temperature rise over 500 minutes, with the model curve superimposed
for the first 300 minutes; a statistical analysis of the first 300 points gave k = -0.01762 and
Tf = 845.5 °C, with R² = 0.9978.)

Figure 3.29 Temperature rise of an uncontrolled laboratory furnace connected to line voltage. A
smooth solid line was drawn through the measured temperatures, while the dashed line (first 300
minutes) represents a model based on Newton's law using statistically determined parameters.

The temperature rise of the furnace seems to be well represented by Newton's law because the
dashed and solid lines are very close during the first 300 minutes. However, linearizing the data is
a better way to compare the fit. We do this by using a logarithmic version of Equation [3.22]:

ln[(Tt - Tf)/(Ti - Tf)] = kt [3.23]

where the quantity (Tt - Tf)/(Ti - Tf) is defined as the relative temperature, or RT. Figure 3.30
shows the actual value of ln|RT|/k vs. time using k = -0.01762 and Tf = 845.5. This affirms that
Newton's law is overall valid. The "flutter" in the data above 200 minutes is caused by small
fluctuations in the temperature measurements, which are magnified by the structure of the RT
function.

Newton's law states that infinite time is required to reach the final temperature, but variations in line
voltage, ambient temperature, and instrument sensitivity mean that the difference between the furnace
temperature and the theoretical final temperature disappears in a finite time (here less than 300 minutes).


(Plot of ln|RT|/k vs. time for the first 300 minutes; the points follow the theoretical line of
slope 1.)

Figure 3.30 Comparison between heating time calculated from statistical analysis of actual
furnace temperature rise data (k = -0.01762 and Tf = 845.5) and theoretical line (dashed) based on
perfect adherence to Newton's law. The slope of the theoretical line should be (and is) 1.

If we stopped here, we would be tempted to conclude that Newton's law adequately
represented the data. If this were true, a statistical analysis of the first 50 or 75 minutes of
temperature rise could accurately predict the final furnace temperature. The trouble is that the
scales for Figures 3.29 and 3.30 aren't correctly chosen to properly evaluate this assumption.
What we really need is a difference plot that shows more clearly how far the actual data are from a
Newton's law model. For this, we plot the residuals between the actual and Newton's law
temperature values. We talked about residuals in Section 3.4.1 and other places, and defined them
as the difference between the actual data and what the model predicts. Figure 3.31 shows a plot of
the temperature residuals for the actual and Newton's law model. This shows clearly the
significant difference between the actual furnace temperature and that predicted by statistically
fitting 300 minutes of temperature rise data to Newton's law.

Recall that our original objective was to use the early temperature rise measurements to
statistically analyze the data and calculate Tf. However, the data show that the greatest deviation
from Newton's law is at the early times, hence use of that data (for example, the first 40 minutes or
so) is not likely to provide a valid population for prediction of Tf. One possibility is to base our
analysis on "intermediate temperature" data, say between 40 and 100 minutes, to avoid the large
residuals in the first 40 minutes. We can test this and other hypotheses by taking several ranges of
data from the worksheet, and calculating Tf and k. The table below shows some sample results.

Time range, min            30-75          30-90          30-110
Temp range, deg         355° - 634°    355° - 687°    355° - 728°
Newton's law final Tf      895°           880°           854°
k                        -0.0163        -0.0167        -0.0176

These results are rather disappointing. They show that even if we skip the first 30 minutes of
data, we still have to use data out to 100 minutes in order to get a predicted final temperature
within 10 °C of the actual value of 846 °C. This is because the early-rise portion of the
temperature excursion has a large effect on the predicted final temperature, and the early-rise
portion has the worst Newton's law fit. An early-rise portion might be acceptable if we were
interested in a final temperature prediction of ±50 °C, but even then, we would have to confirm
this by tests on several furnaces.


(Plot of the actual - model temperature residuals vs. time for the first 300 minutes.)

Figure 3.31 Temperature residuals from the fit of a Newton's law model to the first 300 minutes
of the temperature rise of an uncontrolled laboratory furnace. The results indicate a significant
deviation from Newton's law for the first 100 minutes.

One might ask if we can estimate how confident we should be that using an early-rise data set
and a Newton's law model could predict the final temperature to within (say) ±25 °C. The answer
is "yes", but what we're asking about here is actually a prediction interval rather than a confidence
interval. This is a valid statistical question, but it is beyond the scope of this chapter. Our main
point here is that a chart of residuals may be the best indicator of the limitations of a model to fit a
data set.

We now turn our attention to a different type of time-series data analysis, where the final
value is known ahead of time, and we are interested in using early-time data to predict the time
required to attain a set fraction of the final value.

EXAMPLE 3.21 — Selecting a Model for the Hydrogen Reduction of NiO.

A company recovers nickel oxide from an aqueous precipitate, and calcines it to produce NiO
granules. These granules are sold to nickel producers, who reduce the NiO to Ni with hydrogen.
The company must specify to their customers how long it takes to reduce 95 % of the NiO to Ni.
They seek to improve the way they model the test results in hopes of minimizing the time required
to carry out the tests. The current test practice is to measure the reduction of a sample until it
reaches a fractional reduction of at least 0.8. Tests are carried out in triplicate. The composite
results of the three tests are plotted, and fitted to a quadratic equation, which is used (by
extrapolation) to estimate the time required for a higher value of fractional reduction. However,
this method wasn't very good at estimating the actual time for higher reduction; a quadratic
equation consistently predicted a shorter time than lab test results carried out until reduction was
nearly complete. The plant engineer seeks a better way of estimating the reduction kinetics to
overcome two problems with the current system (tests take too long, and results don't extrapolate
very well). He decides to try to fit the reduction data to physical reality (i.e., a kinetic model)
instead of the quadratic (empirical) equation.

Examine the data with a kinetic model of the type kr·t = 1 - (1 - X)^(1/3) (the so-called "interface
control" model) to see if it gives a better fit than the quadratic model. Here, X = fractional
reduction, t = time, and kr is the reaction rate constant (in reciprocal time). In this example, X must


eventually reach one, but we don't know how long it will take. (Note that 1/kr is the time for X to
reach one.)

Data. Worksheet "NiORedn" shows the results of a set of triplicate reduction tests. You will need
to look at this worksheet to follow the solution.

Solution. The first step is to plot the data for each test, as shown in Figure 3.32.

Figure 3.32 Chart of three reduction tests on NiO granules. Smooth line drawn by Excel through
the data points.

The Trendline tool on the composite data set found the following quadratic equation for the
relationship between fractional reduction X and time:

t(seconds) = 1611X² + 750X + 46. R² = 0.9892.

The first task in testing the kinetic equation fit is to calculate kr to see if it is nearly constant.
A plot of the term [1 - (1 - X)^(1/3)]/t vs. X showed little trend with X (or t, for that matter). The
average value of kr calculated from the early-time tests (18 tests, for X < 0.4) was 0.000259 sec⁻¹.
The resulting kinetic equation was used to see how well the early-time composite test results
would predict the time required to reduce 95 % of the NiO. Figure 3.33 shows the results.

These results indicate that the kinetic model is superior to the quadratic model in representing
the data. The kinetic model predicts the reduction process up to higher reduction extents more
accurately and in less time than the quadratic model. This saves time in characterizing the
reducibility of the granules.
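Both extrapolations quoted in Figure 3.33 follow directly from the fitted equations. A quick check (the small difference from the quoted 2437 s presumably comes from rounding kr to three digits):

```python
# Predicted time to reach fractional reduction X = 0.95 for each model.
X = 0.95

# Empirical quadratic fit from the composite data:
t_quad = 1611 * X**2 + 750 * X + 46          # seconds
print(round(t_quad))                          # ≈ 2212 s

# Interface-control (kinetic) model, kr*t = 1 - (1 - X)^(1/3):
kr = 0.000259                                 # sec^-1, early-time average
t_kin = (1 - (1 - X) ** (1 / 3)) / kr
print(round(t_kin))                           # ≈ 2439 s (text quotes 2437 s)
```
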

Assignment. In the temperature rise example discussed earlier, we noted that k changed rapidly
with time before leveling out at about -0.018. See if you can find an equation that expresses k as a
function of time (or temperature) that can be used in Newton's law to make the early-time data
useful for predicting Tf.


Figure 3.33 Comparison between quadratic and kinetic model equations to represent the data on
reduction of NiO with H2. Using all data (X < 0.86), the quadratic model predicts 2212 seconds to
reach a fractional reduction of 0.95. Using only X < 0.4 data, the kinetic model predicts that 2437
seconds are required.

This discussion is useful for making one general point: there is rarely a single "correct"
regression model for a situation. For one thing, you rarely have complete information on the
variables you've measured; for another, you rarely are able to measure all conceivable variables
that might affect the situation at hand. As a result, unless guided by theory or experience, using
regression usually gives you a collection of reasonable models, rather than a single perfect one.
And remember the earlier warning about extrapolating an empirical model beyond the range of
data that was used to develop the model parameters. Extrapolation is less dangerous with a
phenomenological model than with an empirical one.

3.4.6 Polynomial vs. Rational Function Models

Historically, polynomial models are among the most frequently used empirical models for
fitting functions. We have already mentioned some of the advantages and disadvantages of using a
polynomial equation to fit experimental or tabular data. As mentioned, they have poor
extrapolatory and asymptotic properties. The degree of the polynomial must be high in order to
model complicated data, so one may be tempted to settle for an unsatisfactory fit to avoid a large
number of parameters.

If a polynomial function is inadequate, you might consider a rational function model (RFM).
A rational function is simply the ratio of two polynomial functions, as shown in Equation [3.24].
The constant term in the denominator of a rational function is usually set to 1. Clearly, a
polynomial function is simply a rational function with a constant in the denominator.

y = (A0 + A1x + A2x² + ...) / (1 + B1x + B2x² + B3x³ + ...) [3.24]

Section 4.6.4 of (NIST-RFM 2006) gives a case study of the application of an RFM to fit the

thermal expansion of copper to temperature.
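Fitting an RFM is an ordinary nonlinear least-squares problem of the kind shown earlier. The sketch below fits the simplest case, y = (A0 + A1·x)/(1 + B1·x), to synthetic data generated from assumed coefficients (the copper case study uses higher-order polynomials):

```python
# Fit a simple rational function model y = (A0 + A1*x)/(1 + B1*x)
# by least squares; synthetic data generated from assumed coefficients.
import numpy as np
from scipy.optimize import least_squares

def rfm(params, x):
    A0, A1, B1 = params
    return (A0 + A1 * x) / (1.0 + B1 * x)

x = np.linspace(0.0, 10.0, 40)
true = (2.0, 1.5, 0.3)                     # assumed, for illustration only
y = rfm(true, x)

fit = least_squares(lambda p: y - rfm(p, x), x0=[1.0, 1.0, 0.1])
print(fit.x)
```

Note that an RFM is linear in the A coefficients but not the B coefficients, so Excel's Regression tool cannot fit it directly; Solver (or a nonlinear least-squares routine like the one above) is required.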


3.4.7 Outliers

When a set of data is first observed graphically, you may notice that some points don't seem
to follow the trend of the rest of the points. We call these points outliers. There is no widely
accepted quantitative definition of "outlier", so we use a vague qualitative definition. There are
two ways a point can be an outlier. Either it lies at a great distance from the other points in the
dataset, or the dataset contains a trend identifiable by regression, but the point in question lies far
from that trend.

A full treatment of outliers is beyond the scope of this work. Outliers of the first type are
often related to experimental difficulties such as mis-measurements, mis-recordings, or poor
experimental design. We will talk a little about outliers of the second type, points that do not fit a
regression model. Look back at Figures 3.24 and 3.25 showing the heat capacity data on titanium
oxide fitted with a linear, quadratic, and power model obtained by using the Trendline tool. Please
review the prior statistical analysis and worksheet "CpTiOx" and the text near the figures.

We already mentioned that two points seem to lie "above" the general trend of the rest of the
points. Further, it looks as though the linear and quadratic model lines have been dragged towards
these two points. (We ignore the power model since its fit was not as good as the quadratic
model). The quadratic curve (Figure 3.25) fits well for points below 350 К and above 475 K, but
in between, the curve looks too high for all but the exceptional points. Possibly these two points
are outliers. They are identified on worksheet CpTiOx in green-shaded cells.

If you suspect you have outliers, the Regression tool can give some useful quantitative
information if you check the "Standardized Residuals". Table 3.19 shows residual information for
the above points and model; to save space, we have omitted the points at each end of the dataset,
but their values in this table resemble those of the non-exceptional points included. The first four
columns are those provided by the Regression tool; we added the last. The two exceptional points
appear in this table as observations 8 and 15. We want to focus on the Standard Residuals column,
whose meaning we now explain.

Table 3.19 Partial set of standardized residuals from Excel's Regression tool for the quadratic
model using the CpTiOx data.

RESIDUAL OUTPUT

Observation  Predicted Y   Residuals   Std Residuals   Chance of seeing
     6       40.01034526   -0.500345   -0.928265602        100.0%
     7       40.17116243    0.0688376    0.127710904       100.0%
     8       41.08288294    1.3971171    2.592001581        19.0%
     9       41.64004659   -0.520047   -0.964816499        100.0%
    10       42.19791484   -0.127915   -0.237314013        100.0%
    11       42.27951988   -0.31952    -0.592789301        100.0%
    12       43.2046142    -0.544614   -1.01039556         100.0%
    13       43.52540885   -0.055409   -0.102797283        100.0%
    14       43.87321909   -0.343219   -0.636757252        100.0%
    15       43.98015139    1.5698486    2.912461819          7.6%
    16       44.2277849    -0.067785   -0.125757937        100.0%
    17       44.5654696    -0.44547    -0.82645753         100.0%

In this example, the standard deviation of the residuals is 0.539. Recall the interpretation of
the standard deviation: it represents the "average" distance between each point and the mean. So
points whose (absolute) residual is greater than 0.539 are relatively far away from the curve, while
those with smaller residuals are relatively close to the curve. On this scale, observations 8 and 15
are quite far away from the curve. For example, observation 8 is 1.397/0.539 = 2.592 times as far
from the curve as average, while observation 15 is 1.570/0.539 = 2.912 times as far from the curve
as average.
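The standardized-residual calculation is easy to reproduce outside Excel. The sketch below uses invented x-y data (not the CpTiOx worksheet) and scales each residual by the sample standard deviation of the residuals, as described above:

```python
# Standardized residuals for a quadratic fit: residual / sd(residuals).
import numpy as np

# Illustrative data, loosely in the style of heat-capacity measurements.
x = np.array([300., 325., 350., 375., 400., 425., 450., 475., 500.])
y = np.array([38.2, 39.1, 40.0, 40.6, 42.5, 41.9, 42.6, 43.2, 43.9])

coeffs = np.polyfit(x, y, 2)              # quadratic model
residuals = y - np.polyval(coeffs, x)
std_resid = residuals / np.std(residuals, ddof=1)

for xi, z in zip(x, std_resid):
    flag = "  <-- far from the curve" if abs(z) > 2.0 else ""
    print(f"x = {xi:5.0f}: standardized residual = {z:+.3f}{flag}")
```

Because least-squares residuals sum to zero, the standardized residuals have mean zero and sample standard deviation one by construction.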


An additional criterion can lead to a different way to identify outliers. Recall that the
assumptions for hypothesis testing of regression curves included that the residuals were normally
distributed with mean 0. If that assumption were true here, how likely would it be that observation
8 would be 2.592 standard deviations from 0? We can compute this as
2*(1-NORMDIST(2.592,0,1,TRUE)) ≈ 0.0095, or only about a 1 % chance. Observation 15 is even
less likely: 2*(1-NORMDIST(2.912,0,1,TRUE)) ≈ 0.0036, or only about a 0.3 % chance.
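The same tail probabilities can be checked in Python; `scipy.stats.norm.cdf` plays the role of NORMDIST here:

```python
# Two-tailed probability of a standard-normal value at least this far
# from 0: the Python analogue of Excel's 2*(1-NORMDIST(z,0,1,TRUE)).
from scipy.stats import norm

def two_tailed_p(z):
    return 2 * (1 - norm.cdf(z))

print(f"obs 8  (z = 2.592): p = {two_tailed_p(2.592):.4f}")   # ~0.0095
print(f"obs 15 (z = 2.912): p = {two_tailed_p(2.912):.4f}")   # ~0.0036
```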

Based on calculations like these, one can establish a rule of thumb about how large a
standardized residual should be for that point to be considered a possible outlier. While different
people use different standards, the two most common are to use standardized residuals of |2.5| or
|3.0| as cutoffs to determine potential outliers. By the first rule, since both observations 8 and 15
have standardized residuals of more than 2.5, both would be considered outliers; by the second
rule, since neither has standardized residuals more than 3.0, neither would be considered outliers.

We close with two warnings. First, it's tempting to remove outliers from a dataset. In the
CpTiOx example, removing the two potential outliers results in a new quadratic model that fits the
remaining points very well, better than the first curve did. Figure 3.34 shows the results of outlier
removal.

[Figure: chart titled "Heat Capacity of TiOx Omitting Outliers"; heat capacity (y-axis, 36-46) vs. temperature, K (x-axis, 300-550).]

Figure 3.34 Linear and quadratic model for CpTiOx data with two outliers removed.

However, you want to be very careful about removing points from a dataset. Outliers might
be caused by experimental error, in which case removing them is completely justified; on the
other hand, they might indicate that you don't understand the situation as well as you think you do.
The current controversy surrounding measurements of climate change illustrates this point. The
temperature "outliers" may be precursors of a real trend. In general, you should have some non-
statistical criterion before removing outliers from a dataset. If you publish your results in a
journal, and you have removed points as outliers, you should still report those points and your
actions to permit readers to reach their own judgment about the appropriateness of including or
excluding these points.

Our second warning is that the rule of 2.5-standard-residuals-as-outlier seems a bit risky to us.
True, it's unlikely that any particular observation is this far from the regression curve, but the
more points you have, the more likely it is that one of them is notably off. The "Chance of Seeing"
column in the table above reports the probability that you might see a point this far off somewhere
among the 22 measurements in this dataset even if everything is working as it should. As you can
see, there's about a 19 % chance of an outlier as far away as observation 8, and even observation 15
has a 7.6 % chance of occurring. Given the usual statistical cutoff of 5 %, neither of these would
be regarded as all that rare an occurrence.
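The "Chance of Seeing" column is just the complement rule applied across the whole dataset. A short sketch, assuming 22 independent residuals that really are standard normal:

```python
# Probability that at least one of n well-behaved residuals lies
# at least z standard deviations from zero.
from scipy.stats import norm

def chance_of_seeing(z, n=22):
    p_single = 2 * (1 - norm.cdf(z))      # two-tailed, one point
    return 1 - (1 - p_single) ** n        # at least one of n points

print(f"obs 8  (z = 2.592): {chance_of_seeing(2.592):.1%}")   # ~19 %
print(f"obs 15 (z = 2.912): {chance_of_seeing(2.912):.1%}")   # ~7.6 %
```

The larger the dataset, the more probable it becomes that at least one honest point lands far from the curve, which is why a fixed cutoff like |2.5| grows riskier as n grows.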


Thus, our overall recommendation on outliers is to look at the standard residuals, and if you
find points with standard residuals of ±2.75 or ±3, then you should further investigate these points
as possible outliers and potentially removable points. If nothing else, these points indicate the
need to repeat the measurements. Ultimately, it all boils down to a question of judgment.

3.4.8 Warnings

This brief section is dedicated to two warnings related to misuses of regression.

Warning #1: Beware the Perils of Extrapolation. Consider again the temperature-yield
situation with a linear model: y = 0.141t - 2.603. What does this model tell you about the reaction
when the reaction temperature is at 0 °C? Is the resulting yield likely to be -2.603 kg? What does
that even mean? In this example, all your temperature measurements were between 80 °C and 100
°C. As a result, any predictions made by your model are based only on information from that
range of temperatures, and you have no idea whatsoever how the reaction would behave at 0 °C,
and only a guess at 70 °C. As we've stated earlier, be extremely cautious about extrapolating
regression curves beyond the range of the measurements you've made.

Warning #2: Correlation is not Causation. Returning to the galvanized steel example, we
decide to see if adding another variable would help improve our understanding of the weight gain.
Choosing at random, we pick the closing index of the NASDAQ stock market for each of the days
at question. Figure 3.35 indicates a startling degree of affinity between these two variables.

[Figure: scatter chart titled "Relationship between NASDAQ and Weight Gain"; weight gain (y-axis) vs. daily close of NASDAQ index, 1990-2060 (x-axis); linear trendline with slope 0.6012 and R² = 0.7401.]
Figure 3.35 Putative relationship between stock market closings and corrosion of galvanized steel.

Recall our interpretation of R2: an R2 of 0.7401 means that the variation in the NASDAQ
index explains 74% of the variation in weight gain. Obviously, that's not what's happening. By
coincidence, the NASDAQ during this period tended to be a little higher when the weight gain was
higher, and a little lower when the weight gain was lower. That doesn't mean that there is any
causal or explanatory relationship between the two variables. The point of this example is that
regression is a mathematical calculation. It is up to the user to assign meaning, if any, to the
results. As a rule, no statistical calculation, by itself, can establish causation. They can validate
theories that explain the relationship, or suggest to the researcher where an unexpected relationship
might exist, but no more.

3.5 Experimental Design

Up to this point, we have been focusing on how to analyze data and how to interpret the
results. Now it's time to continue our journey into a very important area, experimental design.
Here we start with no data, and plan to obtain it by experiments. The goal may be to uncover a


relationship or seek an optimum. An experimental research project of this type is certainly not
alien to an engineer. A chemical engineer might want to identify the key factors that determine the
yield of a reaction. A metallurgist trying to make durable alloys may want to test the susceptibility
to corrosion of different alloys under a variety of conditions. In essence, experimental design helps
us to fulfill our goals, and at the same time, a proper design can minimize our experimental efforts.
A thorough treatment of experimental design is well beyond the scope of this chapter, and
interested readers can consult one of the references at the end of the Chapter. In this section, we
are going to introduce some of the basic concepts in factorial experimental design using examples.

3.5.1 Factorial Design

There are many ways to examine how various conditions can change the outcome of a
situation. Suppose a chemical engineer wants to find the optimal conditions for the production of a
fine chemical according to the following equation:

A + B → D

Reaction temperature, pressure, catalyst loading, and agitation might influence the reaction
yield. Anything that influences the yield is called a factor. The reaction yield, which is the
quantity of interest here, is called the response.

For a specific factor, its possible values are called the levels of this factor. A quantitative
factor takes numeric values, while a qualitative factor is described as present/absent, on/off or
open/closed. Every factor must take a pre-specified value when carrying out an experiment. Each
different combination of experimental conditions is called a treatment. Clearly, each treatment
corresponds to an experiment run. An experimental design matrix is a tabulation of all the runs
together with the respective level specifications of all factors.
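Enumerating a design matrix is mechanical. A short sketch with three hypothetical two-level factors (the factor names and levels below are invented for illustration):

```python
# Build a complete factorial design matrix: every combination of levels.
from itertools import product

factors = {
    "temperature": ["low", "high"],
    "pressure":    ["low", "high"],
    "catalyst":    ["low", "high"],
}
runs = list(product(*factors.values()))   # one tuple per treatment

for i, treatment in enumerate(runs, start=1):
    print(f"run {i}:", dict(zip(factors, treatment)))
print("total runs:", len(runs))           # 2**3 = 8 treatments
```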

An example of a qualitative factorial experimental design might involve the way agitation
affects yield. The factors could be the type of impeller and shape of the tank. The impeller levels
could be a chain and a paddle, and the tank levels could be a round and a square tank. This is the
simplest type of factorial experiment, at two levels of two factors. It is called a 2 × 2, or a 2²
factorial experiment, requiring four runs to produce four responses. An example of a quantitative
factorial design might involve the effect of temperature, pressure, and catalyst loading. Here the
simplest type of experiment would be two levels of three factors, or a 2³ design. The levels
would be a "high" and a "low" setting for each factor.

Factorial experiment designs obtain the largest amount of information possible with the
smallest number of runs. Consider the situation where two reactants form a desired product, with
temperature and time being the most important factors. The desired response is the percent of
theoretical yield, with a goal of determining the factor values that give the greatest percent yield.
People not familiar with factorial experimental design may inadvertently design a set of
experiments that are inefficient in terms of time and effort. It is tempting to adopt an experimental
design matrix that studies one factor at a time (the so-called OFAT method). A hypothetical
application of OFAT experimental design is described below.

The time and temperature must be related by some sort of response surface (Wikipedia 2006),
which may be flat, smoothly curved, or irregular and steeply curved. The investigator is initially
unaware of the exact shape of the response surface, but suspects that it is steeply curved, so
several, maybe a dozen, experiments may be needed. To use the OFAT method, he first selects
one factor to hold constant while varying the other. He then uses the factor that gives the greatest
yield, and conducts another set of experiments holding that constant while varying the other. He
first holds the temperature constant at 265 °C and measures the yield at different reaction times,
with the following results.

time, min. 8 11 14 17 20 23 26
yield, % 69 79 82 80 71 56 35


The maximum yield appears to be at 14 minutes, so he keeps the reaction time at 14 minutes
and measures the yield at different temperatures.

temperature, °C 220 230 240 250 260 270 280

yield, % 51 66 78 85 86 77 56

Figure 3.36 shows the results of the second (14-minute) set of OFAT experiments. They
indicate an estimated overall maximum yield of 86 % at 260 °C and 14 minutes reaction time.

[Figure: % theoretical yield at 14 minutes (y-axis, 50-90) vs. temperature, °C (x-axis, 210-290).]

Figure 3.36 Results of second set of OFAT experiments on reaction yield. Point at 265 °C came
from the first set of variable-time experiments. Smooth line drawn through points by Excel's
charting tool.

Unbeknownst to the investigator, the actual response surface for this process can be depicted
by contour lines of % theoretical yield, as shown in Figure 3.37. This sort of response surface is
one of the four types mentioned in Figures 3.9-3.12 (NIST-RS). Visual inspection indicates that
the maximum yield is in the middle of the 85 % yield ellipse, possibly approaching 90 % at 17
minutes and 255 °C.

[Figure: contour plot titled "Iso-yield Contour Lines"; temperature, °C (y-axis, 220-290) vs. time, min (x-axis, 10-30).]

Figure 3.37 Contour lines of percent theoretical reaction yield on a surface defined by
temperature and reaction time. The diamond markers indicate the 265 °C OFAT results (run first)
while the square markers indicate the 14-minute results. The OFAT experiments indicate an
apparent maximum yield of 86% at 14 minutes and 260 °C.


The fourteen OFAT tests have clearly come close to the actual maximum yield. Had the
investigator chosen the initial set of tests at 16 minutes instead of 14 minutes, he would have
gotten even closer. An OFAT design matrix works better when the response surface has yield
contour lines more nearly circular, but much worse when they are elliptical and slanted (or any
other strange shape). Now we ask: can a factorially-designed experiment get results as good as or
better than the OFAT results, but with fewer experiments?

For the rest of the Chapter, we'll concentrate on answering this question, mainly by
introducing examples of different sorts. First we look at an example of a quantitative factorial
experimental design on hydrogen precipitation of nickel from solution. The yield of nickel might
involve three factors, temperature (T), pressure (P), and catalyst loading (C), each at two levels.
This is a 2 × 2 × 2, or 2³, design, requiring 8 runs to investigate every combination. If the engineer runs
experiments under every possible treatment, the design is called a complete factorial design, or
factorial design for brevity. In contrast, if he runs only a fraction of all the possible treatments, the
design is called a fractional factorial design. Fractional designs will be introduced in the next
subsection. For a design, if the experimenter makes the same number of replicates for every treatment he
runs, the design is called balanced; otherwise, it is called unbalanced. A balanced design is much
easier to work with in terms of the ease of data analysis.

The first step is to draw up a design matrix and assign the high and low level values for each
factor. Suppose the hydrometallurgist sets the temperature levels at 50 °C and 80 °C; pressure
levels at 3 and 4 atm; and catalyst loads at 1% and 2%*. He determines the reaction yield for every
treatment with one replication per treatment. The runs were made in random order. Table 3.20
shows the design matrix, the factors, and results of average percent yield of two replications. Here,
yield represents the percent of total nickel present that precipitates.

Table 3.20 Experimental results on nickel precipitation yield.

run        1   2   3   4   5   6   7   8
T (°C)    50  50  50  50  80  80  80  80
P (atm)    3   3   4   4   3   3   4   4
C (%)      1   2   1   2   1   2   1   2
Yield (%) 48  55  51  58  73  85  76  90

Now let's analyze the results. Some of the questions we want to answer are: how can the
hydrometallurgist estimate the effect of switching the temperature from 50 degrees to 80 degrees in
terms of reaction yield? Similarly, how can he estimate the effect of pressure and catalyst loading?
Can he find the optimum yield conditions?

The 2³ design required 8 experimental runs. Half were carried out at 50 degrees, and the
other half at 80 degrees. There is an obvious one-to-one correspondence between these two halves,
namely run 1 corresponds to run 5, run 2 corresponds to run 6 and so on. By correspondence, we
mean in the two related runs, temperature is the only factor whose level has been changed. We
first estimate the temperature effect using the data in run 1 and run 5 only, giving 73% - 48% =
25%, which means that by changing temperature from 50 to 80 degrees while holding pressure
constant at 3 atm and catalyst loading at 1%, the reaction yield increases by 25%. The effect of
temperature can also be estimated from the data in runs 2 and 6, which gives 85% - 55% = 30%;
or from runs 3 and 7, which gives 25%; or from runs 4 and 8, which gives 32%. We should realize
that there is no particular reason to choose any one of the four values since they all estimate the
same quantity, the temperature effect. Therefore, the best thing we can do here is take their
average: (25% + 30% + 25% + 32%)/4 = 28%. To make the correct calculation, it is critical to
match a run with its unique partner, so that only the factor whose effect you are estimating differs

* Naturally, the experimenter must have some preliminary information before setting the factor levels.


in the pair of runs you're looking at. If the matching is wrong, the result will certainly be incorrect
since within the wrong pair, factors other than temperature have been changed.

Following the same reasoning, we can estimate the pressure effect as 3.5%, and the catalyst
loading effect as 10%. Try to work out these numbers yourself, and if you get different answers,
check whether your matching of pairs is correct.

The calculations in the example can be significantly simplified if a new notation is adopted
for a 2-level matrix. We use - to represent the lower level of a factor and + for the higher level.
Based on this notation, Table 3.20 turns into Table 3.21.

Table 3.21 Design matrix with new notation.

run        1   2   3   4   5   6   7   8
T (°C)     -   -   -   -   +   +   +   +
P (atm)    -   -   +   +   -   -   +   +
C (%)      -   +   -   +   -   +   -   +
Yield (%) 48  55  51  58  73  85  76  90

The benefits of this new notation are not limited to convenience and simplicity. Rather, it
brings in additional insights into our analysis of data. To see this point, let's rework the example
above. The effect of temperature can be calculated as follows:

(-48% - 55% - 51% - 58% + 73% + 85% + 76% + 90%)/4 = 28% [3.25]

The sign before each number comes from the sign of the temperature factor in the
corresponding experimental run. The divisor 4 comes from the 4 pairs of data. Similarly, we can
work out the effects of pressure and catalyst loading. The algebraic interpretation of the signs -
and + gives us a valid formal procedure in general. To work out the effect of a factor, we can
ignore the signs of other factors, and combine the sign of this factor with the corresponding
response value, then sum up all the values and divide the sum with the number of pairs. This
method of calculating effect sizes is called contrasts.

This notation also leads to a simple implementation of these calculations in Excel. Instead of
recording columns of just +'s and -'s, record +1's and -1's. Assume for convenience that the ±1's
for T were in cells T1:T8 and the yields were in cells W1:W8. In a convenient empty cell, type in
the formula =SUM(T1:T8*W1:W8)/4, then, instead of hitting the Enter key, hit control-shift-Enter
to enter it as an array formula. The result, 28, should appear in the cell.
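The same contrast works in any environment that supports elementwise products. A NumPy sketch, with the signs and yields transcribed from Table 3.21:

```python
# Main effects by contrasts: signed sum of responses / number of pairs.
import numpy as np

T = np.array([-1, -1, -1, -1, +1, +1, +1, +1])
P = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
C = np.array([-1, +1, -1, +1, -1, +1, -1, +1])
yield_pct = np.array([48, 55, 51, 58, 73, 85, 76, 90])

def effect(signs, response):
    """Combine each sign with its response, sum, divide by pair count."""
    return np.sum(signs * response) / (len(response) // 2)

print("T effect:", effect(T, yield_pct))   # 28.0
print("P effect:", effect(P, yield_pct))   # 3.5
print("C effect:", effect(C, yield_pct))   # 10.0
```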

In essence, the effect of a factor tells us how sensitive the response is to this factor. In the
above example, the effect of temperature is more significant than that of pressure. So if you want
to optimize the yield, temperature is the key factor. Conceptually the effects of factors are similar
to the coefficients in regression. Recall from section 3.4, if we model a data set using the equation
y = b₀ + b₁x₁ + b₂x₂, where x₁ and x₂ are explanatory variables, the coefficients b₁ and b₂ tell us the
sensitivities of the response variable y with respect to x₁ and x₂. If there is evidence that the model
is not sufficient, we might introduce an interaction term x₁x₂ to capture the non-additive features
contained in the data. This naturally brings up a question: can we define an interaction term in the
experimental design setting?

The answer to the above question is yes. For the above example, we can define the interactive
two-factor effects TP, PC, and TC, which represent temperature-pressure, pressure-catalyst and
temperature-catalyst interactions, respectively. These two-factor effects measure the non-additive
interaction between the two factors. Take TC as an example. As we know, the main effect of
catalyst loading was 10%. This was obtained by averaging over all the different levels of the
temperature factor and pressure factor. Since we are estimating TC, the pressure factor can be
ignored here. Let's examine the effect of catalyst loading under different temperatures. When T =
50 °C, the effect of catalyst loading can be estimated as (55% - 48% + 58% - 51%)/2 = 7%.
When T = 80 °C, the effect of catalyst loading can be estimated as (85% - 73% + 90% - 76%)/2 =


13%. Clearly, the effect of catalyst loading depends on the level of temperature. When the
temperature is higher, the increase of catalyst loading can enhance the reaction yield more
significantly. The TC interactive effect can be estimated as (13% - 7%)/2 = 3%. Physically, this
can be interpreted as saying that the catalyst is more active at higher temperature.

There is an easier way to make the same calculation, once you understand what the numbers
and their signs should be. First, we rearrange the calculation of the TC effect:

TC = [(85% - 73% + 90% - 76%)/2 - (55% - 48% + 58% - 51%)/2] / 2

   = (48% - 55% + 51% - 58% - 73% + 85% - 76% + 90%)/4 = 3%

This is shown better if you expand Table 3.21 by incorporating the term TC, whose signs are
determined by multiplying the signs for T and C. If the signs of T and C are the same, then the
sign for TC is +; otherwise, the sign for TC is -. We obtain Table 3.22. The signs for the
calculation of the TC effect can now be read directly from the table.

Table 3.22 Design matrix augmented by a TC interaction column.

run        1   2   3   4   5   6   7   8
T (°C)     -   -   -   -   +   +   +   +
P (atm)    -   -   +   +   -   -   +   +
C (%)      -   +   -   +   -   +   -   +
TC         +   -   +   -   -   +   -   +
Yield (%) 48  55  51  58  73  85  76  90

EXAMPLE 3.22 — Calculating the Pressure-Catalyst Interaction.

Follow the algebraic method described above to estimate the two-factor interactive effect PC
using the data in Table 3.21.

Solution. First, we construct an expanded table to establish the signs of each term.

Table 3.23 Design matrix with PC column added.
run        1   2   3   4   5   6   7   8
T (°C)     -   -   -   -   +   +   +   +
P (atm)    -   -   +   +   -   -   +   +
C (%)      -   +   -   +   -   +   -   +
TC         +   -   +   -   -   +   -   +
PC         +   -   -   +   +   -   -   +
Yield (%) 48  55  51  58  73  85  76  90

Then the two-factor interactive effect PC can be estimated as:

(48% - 55% - 51% + 58% + 73% - 85% - 76% + 90%)/4 = 0.5%
Assignment. Calculate the two-factor and three-factor interactive effects TP and TPC.
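The interaction columns need not be typed by hand: elementwise multiplication of the sign vectors generates them, reproducing the TC and PC values worked out above. A NumPy sketch, with the arrays transcribed from Table 3.21:

```python
# Interaction effects: multiply sign columns elementwise, then contrast.
import numpy as np

T = np.array([-1, -1, -1, -1, +1, +1, +1, +1])
P = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
C = np.array([-1, +1, -1, +1, -1, +1, -1, +1])
yield_pct = np.array([48, 55, 51, 58, 73, 85, 76, 90])

def effect(signs, response):
    """Signed sum of responses divided by the number of pairs."""
    return np.sum(signs * response) / (len(response) // 2)

print("TC interaction:", effect(T * C, yield_pct))   # 3.0
print("PC interaction:", effect(P * C, yield_pct))   # 0.5
```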

Now we have estimated the main effects of temperature, pressure and catalyst loading, their
two-factor interactive effects and the three-factor interactive effect. In total, we have estimated 7
quantities from the 8 experimental runs. This can be generalized to a design with n factors. In
general, if there are n factors involved with 2 levels each, the total number of experimental runs is

