Handbook_on_Material_and_Energy_Balance_Calculat1

Published by alphacentauryc137, 2022-06-29 20:46:27

Chapter 1 Dimensions, Units, and Conversion Factors 25

1.17 The thermal conductivity of aluminum at 32 °F is 117 Btu · h⁻¹ · ft⁻¹ · °F⁻¹. Use U-Converter
to find the equivalent value in terms of cal · sec⁻¹ · cm⁻¹ · °C⁻¹.

1.18 A column of water 30 cm high and 20 mm in diameter is in a cylinder supported by a piston.
Calculate the force in N required to prevent the piston from moving, and calculate the pressure on
the piston face in mm Hg and psi.

1.19 A government surplus auction is being held to dispose of some scrap titanium. A prospective
bidder examined the scrap, which was in the form of a conical pile of broken pieces. The
circumference of the pile was estimated to be 18 m, and the angle of the pile surface was 30° from
the horizontal. The bulk density was estimated to be 60 %. Estimate the mass of the titanium.

1.20 The Reynolds number is a dimensionless number defined for a fluid flowing in a pipe as

Re = Duρ/μ

where D is pipe diameter, u is fluid velocity, ρ is fluid density, and μ is fluid viscosity. When the
value of Re is less than about 2100, the flow is laminar, and if above 2100, the flow is turbulent.

Sulfuric acid flows through a pipe with an inner diameter of 3.067 inches, and an average
velocity of 0.52 ft/s. At the fluid temperature of 25 °C, the density of H2SO4 is 1826 kg/m3, the
viscosity is 19 centipoises [1 cP = 1 × 10⁻³ kg/(m · s)]. Without using your calculator (relying on
integers and orders-of-magnitude), determine if the flow is laminar or turbulent.
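
A mental estimate for Problem 1.20 can be checked afterwards with a few lines of code. The following is a minimal sketch, not part of the Handbook: the function names and the conversion constants are illustrative choices, and the 2100 transition value is the approximate one quoted above.

```python
# Unit conversions (exact definitions)
IN_TO_M = 0.0254       # metres per inch
FT_TO_M = 0.3048       # metres per foot
CP_TO_PA_S = 1.0e-3    # kg/(m*s) per centipoise


def reynolds_number(diameter_m, velocity_m_s, density_kg_m3, viscosity_pa_s):
    """Re = D*u*rho/mu; dimensionless when all inputs are in SI units."""
    return diameter_m * velocity_m_s * density_kg_m3 / viscosity_pa_s


def flow_regime(re, transition=2100.0):
    """Classify the flow with the approximate transition value from the text."""
    return "laminar" if re < transition else "turbulent"


# Data from Problem 1.20 (sulfuric acid at 25 degC)
re_acid = reynolds_number(3.067 * IN_TO_M, 0.52 * FT_TO_M, 1826.0, 19.0 * CP_TO_PA_S)
regime = flow_regime(re_acid)
print(f"Re = {re_acid:.0f} ({regime})")
```

Converting every quantity to SI before forming the group is what makes the result dimensionless; mixing inches, ft/s, and kg/m3 directly would give a meaningless number.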

1.21 An instrument has been developed to measure the mass flow rate of a gas, and is calibrated in
kg/min. Another instrument has been developed to measure the volumetric flow rate, and display
it in L/s. These two instruments are to be used to measure the moisture content of flowing air, in
terms of mg/L. Assume that dry air has a volume fraction of 78 % N2, 21 % O2, and 1 % Ar. Set
up a spreadsheet solution to calculate the moisture content from a number derived from the
readings of these instruments, and plot a graph from 0 to 100 mg/L. If the desired precision of the
moisture content reading is 2 significant figures, how many significant figures are required from
the instrument readings?

1.22 A balloon of volume 1.00 m3 has a mass of 0.50 g, and is filled with helium. Helium has a
specific volume of 6.10 m3/kg, and air has a specific volume of 0.861 m3/kg. Calculate the mass
capable of being lifted by the balloon.

1.23 Calculate a conversion factor to convert the heat of combustion of a gaseous fuel from units
of calories per mole to Btu per standard cubic foot.

1.24 The composition of dry air was listed in a recent reference handbook as follows:

gas:     N2      O2      Ar     CO2
φ, %:    78.10   20.95   0.92   0.03

Calculate the mass fraction of each constituent, and the mass fraction of each element.
Calculate the molar mass of dry air, and compare to the value given in Section 1.6.3.

CHAPTER 2

Thermophysical and Related Properties of Materials

An important part of material and energy balances is accounting for the alterations that take place
when materials are subjected to changes in their physical environment. This Chapter examines the
relationship between the pressure, temperature and volume of a substance. The response of a
substance to these variables is quantified by an equation of state. We will also look at the effect of
pressure and temperature on the stability of two different forms of a substance. The Chapter
concludes with a brief introduction to the properties of solutions. The General References listed
earlier in the Handbook (page 605) are good sources of thermophysical property data.

2.1 State of a System and Properties of a Substance

The portion of the universe set aside for consideration is called a system. The system
boundary, through which energy and material may pass, may be real or imaginary. The system
initially consists of a definite amount of matter, composed of one or more specific substances. Each
substance may exist in various forms. We designate these forms of a substance as phases. A
phase is a homogeneous, physically distinct, aggregate of matter which is mechanically separable
from any other phases that may be present. Mixtures of different phases are separated by phase
boundaries. A phase may exist over a range of pressure or temperature, or to use the
thermodynamic term, in various states. The state may be characterized by a variety of
macroscopic properties.

The macroscopic state of a system can be defined in terms of three intensive observable
properties (or "variables of state"), and one extensive variable. The intensive variables are
composition, pressure, and temperature, while a typical extensive variable is volume. For a single-
phase system, the properties of the system are the same as the properties of the phase. Each of the
properties of a phase of a given substance in a defined state has only one value, and these
properties always have the same value for a given state, regardless of how the substance arrived at
that state. In fact, a property can be defined as any quantity that depends on the state of the system
and is independent of the path (that is, the prior history). The state is therefore defined when these
properties are specified. Section 2.2 defines the minimum number of properties that must be
specified in order to fix the state of the substance.

There are two classes of state properties: intensive and extensive. An intensive property is
independent of the mass, while an extensive property varies directly with mass. Thus if a phase is
divided in half, the intensive property remains the same, but each half has one half the value of the
extensive property. Temperature, pressure, and density are intensive properties, while mass and
volume are extensive. Of course, such properties as specific (or molar) volume are intensive
because they are expressed per unit mass (or per mole).

Systems can be described according to the state of aggregation of the contained substances. A
homogeneous system is one where the properties of the contents are the same throughout, with no
apparent surfaces of discontinuity. A homogeneous system contains only one phase. A
heterogeneous system contains two or more phases, which are separated from each other by phase
boundaries. There may be large (abrupt) changes in properties across phase boundaries. If surface
tension, molecular adsorption on surfaces, etc. are not involved, the phase boundaries may be
considered surfaces of discontinuity of infinitesimal thickness.

Frequently we will refer not only to the property of a substance, but to the properties of a
system, which is an assemblage of phases. In that case, the value of the specified property has
significance for the entire system. For example, if the temperature of a system is specified, then all
phases are in thermal equilibrium, and the temperature is uniform throughout. (If a system
temperature is not specified, then different phases in the system might be at different
temperatures). If a system is in mechanical equilibrium, all macroscopic forces are balanced and
the pressure is uniform throughout unless the system exists over a considerable difference of
elevation, in which case the influence of gravity may affect the pressure. Even then, there is no
tendency for the pressure at any location to change, and in almost all cases, the effect of
gravitational force differences is negligible. The state of chemical equilibrium will be dealt with in
later chapters.

We further define a system in terms of the permeability of its boundary to matter or energy.
An open system can exchange energy and matter with the surroundings. A closed system can
exchange energy but not matter. An isolated system cannot exchange matter or energy. In a
heterogeneous system, each phase is an open system.

Quite often, we will designate the phases in a system using two special terms: fluid phases
and condensed phases. The term fluid refers to the gas and liquid state. The term condensed refers
to the liquid and solid state. A glass is a condensed form of matter between a liquid and a solid
state. We begin our discussion of state properties in Section 2.3 with the gas phase.

2.2 The Gibbs Phase Rule

Thermodynamic functions involving the free energy of phases were used by J. W. Gibbs to
show that the number of phases that can be present at a particular temperature, pressure, and phase
composition is limited. The state of a system is defined in terms of certain intensive (composition,
temperature, pressure, and specific volume), and extensive (mass, amount, volume) properties.
When a certain minimum number of properties are specified, all other macroscopic properties are
fixed. In the case of an open system, these considerations are expressed by the familiar Gibbs
phase rule, which relates the number of components C, number of phases Φ, and degrees of
freedom D. D represents the number of intensive variables that can be specified independently for
an open system at rest. The meaning of C in the phase rule is that it is the minimum number of
independent chemical entities required to define the chemical nature of the system.

D = C + 2 − Φ [2.1]
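
Equation [2.1] is simple enough to tabulate for the systems discussed in Sections 2.2.2 to 2.2.4. The sketch below is illustrative only: the function names are hypothetical, and the component count for a reactive system uses the rule stated in Section 2.2.4 (species minus independent reactions).

```python
def degrees_of_freedom(components, phases):
    """Gibbs phase rule, Equation [2.1]: D = C + 2 - (number of phases)."""
    d = components + 2 - phases
    if d < 0:
        raise ValueError("more phases than the phase rule allows")
    return d


def components_reactive(species, independent_reactions):
    """C for a reactive system: species minus independent reactions."""
    return species - independent_reactions


# Cases taken from Sections 2.2.2 to 2.2.4 of this Chapter
cases = {
    "Hg vapor, one phase": degrees_of_freedom(1, 1),
    "O2-H2O-Ar gas, open, inert": degrees_of_freedom(3, 1),
    "one component, liquid + vapor": degrees_of_freedom(1, 2),
    "one component at the triple point": degrees_of_freedom(1, 3),
    "NaCl-H2O, three phases": degrees_of_freedom(2, 3),
    "N2-H2-NH3 gas, inert": degrees_of_freedom(3, 1),
    "N2-H2-NH3 gas, at equilibrium": degrees_of_freedom(components_reactive(3, 1), 1),
}

for label, d in cases.items():
    print(f"{label}: D = {d}")
```

The last two entries show the loss of one degree of freedom when the ammonia reaction is allowed to reach equilibrium.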

2.2.1 Consequences of the Phase Rule for Non-Reactive and Reactive Systems

In principle, there is no such thing as a "non-reactive" system. For practical purposes, we
define "non-reactive" to mean that there is no discernable change in chemical composition during a
finite time period. Non-reactive (inert) substances may or may not be in their most stable
thermodynamic state, but if not, reaction kinetics are so slow that no reaction is observed. Sections
2.2.2 and 2.2.3 discuss the application of the phase rule to non-reactive systems.

Application of the phase rule requires that the number of phases be known, either by
measurement or by observation. The rule does not predict when, or if, any new phases will or
might form as a result of changing a system's properties. The rule applies to stable and metastable
systems, and so long as D = 0, the system is uniquely characterized. The rule applies irrespective
of the mass of a system so long as gravitational forces are negligible, but at least one extensive
variable must be set to specify the system completely. When a system is closed, its mass is fixed
and the overall composition is therefore no longer a variable.

Electrical, magnetic, surface, gravitational, and similar effects are ignored. The phase rule in the
Handbook is applied only to macroscopic properties.

The calculation of the number of degrees of freedom is different in reacting vs. non-reacting
systems. When chemical reactions occur, a certain stoichiometric relationship is imposed on the
limits of compositional variance. This "uses up" one of the degrees of freedom. The value of D
must be decreased by one for every independent chemical reaction that occurs. Section 2.2.4
presents the phase rule for reactive systems. The phase rule discussion in this Chapter is not meant
to cover the subject in depth in view of the many references and texts that do (Ferguson 1966,
Hillert 1998).

2.2.2 Application of the Phase Rule to One-Phase Non-Reactive Systems

Consider a system composed initially of mercury vapor atoms. This is a one-component, one-
phase system (i.e., Φ = 1 and C = 1). Since a one-component, one-phase system is homogeneous,
the composition is fixed, and hence the state of such a system depends on the temperature,
pressure, and one other intensive variable (such as specific volume). According to the phase rule,
D = 2, so only two of these three state variables can be arbitrarily specified. The value of the third
variable depends on the other two. The relationship between the properties of the phase is
expressed as an equation of state, with the most well-known being the ideal gas law (PVm = RT).
However, the system is not yet completely specified; for that, we need to specify an extensive
variable, such as mass or volume. Therefore, only closed systems can be completely specified.

Now consider an open system containing three species, O2(g), H2O(g), and Ar(g) in a
temperature range where the tendency for water vapor to dissociate or the monomer element O(g)
to form, is vanishingly small, and H2O(l) has no tendency to form. C is the number of species, so
C = 3, Φ = 1, and D = 4. Any four of the temperature, pressure, or composition values may be
changed. If the temperature and pressure are fixed, two composition variables remain. Specifying
the composition of any two of the three component species fixes the other by difference; the
system is then invariant. In an open system, the system can exchange O2(g), H2O(g), and Ar(g)
with the surroundings in order to effect a composition or volume change, and exchange energy to
effect a temperature change.

Now suppose we close the system. Since the composition is fixed, two of the composition
variables are no longer independent, hence D = 2. It is feasible to consider this (non-reactive)
system as having one component, an O2-H2O-Ar gas mixture of a certain composition. The system
state is then defined by specifying any two of the three state variables; the system is bivariant.

One way to think of a one-phase one-component system is that its properties lie on a P-Vm-T
surface. The fact that it's a surface, and not a volume or a line is because of the phase rule. The
shape of the surface is governed by the nature of the component's equation of state.

2.2.3 Application of the Phase Rule to Multi-Phase Non-Reactive Systems

Recall that C = 1 for a single-substance open system. A P-Vm-T surface exists for the gas,
liquid, and each stable solid phase of the substance. The liquid and vapor P-Vm-T surfaces
intersect at a line, along which two phases are present, e.g., liquid and vapor. This line is given a
special designation: the VLE line, which stands for vapor-liquid equilibrium. Both phases are in
physical* and thermal equilibrium. According to the phase rule, when Φ = 2, D = 1, and only one
of the intensive state variables is capable of being independently varied. This is called a univariant
system. If the temperature of a two phase system is selected as the independent variable, the
pressure and all other intensive variables (such as the specific volume of each phase) are set by the
system's equation of state, which defines the relationship between these two dependent variables.
The maximum number of phases of a one-component invariant system (i.e., when D = 0) is 3. The
temperature and pressure of a three-phase one-component system are fixed at the so-called triple
point. If the extensive variable mass is defined, so will be the extensive variable volume. Such a
system is invariant.

* Physical equilibrium means that when the system is at rest, there is no tendency for a physical
transformation between the existing phases. When the system is disturbed, physical (but not chemical)
transformations occur to bring the system back into physical equilibrium.

The phase rule applies to multiple component systems, such as, for example, a system of two
components, NaCl and H2O. For a univariant system, D = 1 so Φ = 3. Three possible phases are
solid NaCl, a saturated aqueous solution of NaCl, and a vapor phase of pure H2O (i.e., two
condensed phases and a gas). Alternatively, the three phases could be solid NaCl, solid ice, and a
saturated aqueous solution of NaCl. Here, there are three condensed phases but no gas. In the first
situation, a small change in temperature results in a small change in pressure (which is the vapor
pressure of the solution), the specific volume and mass of solid NaCl, and a commensurate change
in the aqueous phase composition. The equation of state for this system relates the change in
vapor pressure of a saturated solution with temperature. Note that the phase rule in its familiar
form applies to a system even though all of the components are not present in each phase. In the
second situation we may independently vary the temperature or pressure, according to the
combined equations of state of the three condensed phases.

2.2.4 Application of the Phase Rule to Reactive Systems

For non-reactive systems, it's feasible to consider a defined species assemblage as a single
component. For a reactive system, we must avoid this practice. The chemical component used in
connection with the application of the phase rule is most often a molecular species or an atomic
element. In that case, C = the number of species minus the number of independent chemical
reactions occurring between the species. In the case where all possible independent reactions
proceed to equilibrium, the number of component species = the number of constituent elements.

Consider a system containing N2(g), H2(g), and NH3(g). Practically speaking, there is only
one independent chemical reaction between the three species:

½ N2(g) + 1½ H2(g) → NH3(g) [2.2]

At room temperature, the reaction rate is so slow that a mixture of the three species is inert.
In that case, the phase rule application is identical to the O2-H2O-Ar system discussed in Section
2.2.2. There are three constituents, so according to the phase rule, D = 3 + 2 − 1 = 4. However, at
elevated temperatures, especially in the presence of a catalyst, the reaction can proceed to
equilibrium, and the composition of the gas mixture responds to any change in temperature or
pressure. Since there are two elements, C = the number of elements = 2, giving D = 3. A degree
of freedom is lost because the equilibrium constant imposes a composition constraint. This
constraint is absent when the species are inert. This system's P-Vm-T surface is defined by the
equation of state of the three gases and the value of Keq.

An additional (stoichiometry) constraint is imposed by closing the system, thus fixing the
composition and the amount of each substance in the reactor. For example, if the system contained
initially only NH3, and it was heated to a point where the reverse of Equation [2.2] initiated, the
gas mixture would have nH2 = 3nN2. This would be true irrespective of whether or not chemical
equilibrium was reached or what the pressure was. In this situation, the system is bivariant. An
additional constraint may be applied by defining the extent to which a certain reactant species is
consumed by a certain chemical reaction, or by specifying that the system reaches chemical
equilibrium. The system then becomes univariant, and is represented on the P-Vm-T surface as a
line of variable temperature. The application of the phase rule to reactive systems will be
discussed in more detail in Chapter 6.

The next few sections of this Chapter apply the above concepts to non-reactive systems, and
introduce various equations of state. The degree-of-freedom concept introduced by the phase rule
is applied somewhat differently to determine the number of system variables that must be specified
to make a material balance on non-reacting (Chapter 4) and reacting (Chapter 6) systems.

2.3 The Gas Phase

A gas uniformly fills any container, is easily compressed, and mixes completely with any
other gas. The properties of a gas are greatly affected by pressure and temperature. At sufficiently
low temperature, a gas will condense to a liquid; the condensation temperature increases with
pressure. Condensation does not occur if the pressure is above a value called the critical pressure.
The isothermal relationship between gas pressure and volume is given by Boyle's law.

PVm = k [2.3]

The value of k for most gases varies slightly with pressure. For one mole of air at 300.0 K
and 1.000 atm, k = 24.610 liter-atm.

The isobaric gas volume varies nearly linearly with temperature according to Charles' law:

Vm/T = b [2.4]

The value of b for air at 1 atm shows very little variation with temperature, going from
0.08200 L · mol⁻¹ · deg⁻¹ at 260 K to 0.08207 L · mol⁻¹ · deg⁻¹ at 2000 K.

2.3.1 The Ideal Gas Law

Boyle's law and Charles' law can be combined to derive a very simple relationship between
the molar volume, temperature and pressure:

PVm = RT [2.5]

where P is the absolute pressure of the gas, Vm is the molar volume of the gas, T is the absolute
temperature, and R is a constant with units that depend on the pressure and volume units. This is
called the ideal gas law, and is but one of several equations of state for gases. The gas constant R
is also employed in many other relationships, with different values and appropriate units. A list of
the values and units of R is given on the inside front cover, and can be converted to other units by
the program U-Converter. The value of R in L · atm/(mol · K) is 0.0820575.

An ideal gas is a theoretical substance in which the particles are assumed to be totally free of
mutual interaction. Most gases involved in materials production and processing are at sufficiently
high temperature and/or at sufficiently low pressure such that deviations from the ideal gas law are
small. For example, at room temperature and pressure, the molar volumes of He, H2, N2, O2 and
CH4 deviate from +0.08 to -0.21 % of the values predicted by Equation [2.5]. These deviations
are even less at elevated temperature. However, there are some cases where application of the
ideal gas law can lead to significant error. Two notable examples encountered in material
processing are pressurized steam and molecular gases undergoing cryogenic liquefaction.

For material balances involving gases, it is convenient to convert the listed volumes to a
certain temperature and pressure: the standard temperature and pressure, abbreviated STP. This
Handbook uses 273.15 K (0 °C) and 101.325 kPa (1 atm) as STP. In recognition of the slight
deviations from the ideal gas law, the standard temperature is often rounded off to 273 K. There
are alternate conventions for STP, such as 1 bar for the standard pressure, or 60 °F for the standard
temperature.

One of the consequences of Equation [2.5] is that at a given temperature and pressure, the
molar volume of all ideal gases is equal. At STP, the ideal gas molar volume is 0.0224140 m3, or
22.414 L. Similarly, the molar density at STP is 44.615 mol/m3. As an approximation, the molar
volume is sometimes rounded off to 0.0224 m3, or more commonly, 22.4 L*, a useful number to
memorize. Rearranging Equation [2.5] gives a convenient way to make calculations on the change
in volume as temperature and pressure of an ideal gas change:

$$\frac{P_1 V_1}{T_1} = \frac{P_2 V_2}{T_2}$$ [2.6]

* Remember that the molar volume in the AES is 359.04 ft3/lb-mol, often rounded off to 359 ft3.
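
The STP figures quoted above follow directly from Equation [2.5], and Equation [2.6] can be rearranged for the new volume. A minimal sketch, assuming the SI value R = 8.314462618 m3 · Pa/(mol · K) and exact unit definitions; the function and variable names are illustrative, not taken from the Handbook or U-Converter:

```python
R_SI = 8.314462618         # m3*Pa/(mol*K)
T_STP = 273.15             # K
P_STP = 101325.0           # Pa
MOL_PER_LBMOL = 453.59237  # mol in one lb-mol
FT3_PER_M3 = 1.0 / 0.3048**3


def molar_volume(temperature_k, pressure_pa):
    """Ideal gas molar volume, Vm = RT/P, in m3/mol (Equation [2.5])."""
    return R_SI * temperature_k / pressure_pa


def volume_after_change(v1, p1, t1, p2, t2):
    """Equation [2.6] solved for V2; P and T must be absolute values."""
    return v1 * p1 * t2 / (t1 * p2)


vm_stp = molar_volume(T_STP, P_STP)                  # m3/mol
molar_density_stp = 1.0 / vm_stp                     # mol/m3
vm_stp_aes = vm_stp * MOL_PER_LBMOL * FT3_PER_M3     # ft3/lb-mol

# Doubling the absolute temperature at constant pressure doubles the volume.
v_doubled = volume_after_change(22.414, 101.325, 273.15, 101.325, 546.30)

print(f"Vm(STP) = {vm_stp * 1000:.3f} L/mol")
print(f"molar density (STP) = {molar_density_stp:.3f} mol/m3")
print(f"Vm(STP) = {vm_stp_aes:.2f} ft3/lb-mol")
```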

EXAMPLE 2.1 — Removal of Air by a Vacuum Pump.

A vacuum furnace has a radius of 1.00 m and a height of 2.00 m. The air initially inside is at
a temperature of 302 К and a pressure of 760.0 mm Hg, and must be pumped out until the internal
pressure is 1.00 mm Hg. Calculate the mass of air that will be removed by the pump.

Solution. The furnace volume is π(1.00)²(2.00) = 6.28 m3. To apply Equation [2.5] we need a value of R
in units of m3 and mm Hg, but no such value is listed inside the Handbook cover, or in U-
Converter. Therefore, we need a new value of R to work the example in the specified units. The
conversion procedure outlined in Chapter 1 gives R = 0.06236 m3 · mm Hg/(mol · K).

760 mm Hg(6.28 m3) = n[0.06236 m3 · mm Hg/(mol · K)](302 K)

Solving, the furnace contained initially 253.4 moles of air. Repeating the calculation at 1 mm
Hg, the furnace contains 0.33 moles of air. Thus, the pump removes 253.1 moles of air.

In Chapter 1, we calculated that the molar mass of dry air was 0.02897 kg/mol. Therefore, the
mass of air removed is (253.1 mol)(0.02897 kg/mol) = 7.33 kg.
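
The arithmetic of Example 2.1 can be reproduced as a short script. This is a sketch under stated assumptions: R is converted from its SI value using 1 atm = 760 mm Hg = 101 325 Pa, the furnace volume is not rounded to 6.28 m3 (so the mole counts differ from the text in the last digit), and all names are illustrative.

```python
import math

R_SI = 8.314462618               # m3*Pa/(mol*K)
PA_PER_MMHG = 101325.0 / 760.0   # Pa per mm Hg
R_MMHG = R_SI / PA_PER_MMHG      # m3*mmHg/(mol*K)
M_AIR = 0.02897                  # kg/mol, molar mass of dry air from Chapter 1


def moles_ideal(pressure, volume, temperature, gas_constant):
    """n = PV/(RT); units of P and V must match those of the gas constant."""
    return pressure * volume / (gas_constant * temperature)


furnace_volume = math.pi * 1.00**2 * 2.00                      # m3
n_initial = moles_ideal(760.0, furnace_volume, 302.0, R_MMHG)  # mol at 760 mm Hg
n_final = moles_ideal(1.00, furnace_volume, 302.0, R_MMHG)     # mol at 1 mm Hg
mass_removed = (n_initial - n_final) * M_AIR                   # kg

print(f"R = {R_MMHG:.5f} m3*mmHg/(mol*K)")
print(f"air removed = {n_initial - n_final:.1f} mol = {mass_removed:.2f} kg")
```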

Assignment. Pure CO2 is flowing through a duct at the rate of 45 kg/min. Calculate the volume
and molar flow rate at 1 bar and 350 K. Assume ideal behavior.

EXAMPLE 2.2 — Gas Volume and Flowrate.

A gas of composition φCO2 = 60.0 % and φCO = 40.0 % is flowing through a duct at the rate
of 10.0 kg/s. The duct widens from an area of 2.00 m2 to 3.00 m2. Upstream of the enlargement
point the pressure and temperature are 120.0 kPa and 402 K. Downstream of the enlargement
point the pressure is 101.0 kPa and 382 K. Calculate the linear velocity of the gas just above and
just below the enlargement point.

Solution. Following the procedure outlined in Chapter 1, the molar mass of the gas is 0.03761
kg/mol. Therefore, the molar flow rate is 265.9 mol/s. Application of Equation [2.5] at the
upstream side gives

120 000 Pa(V) = 265.9 mol[8.314 m3 · Pa/(mol · K)](402 K)

The volume flow rate is then 7.406 m3/s. The volume downstream of the enlargement point
can be obtained by using Equation [2.6]:

120 000(7.406)/402 = 101 000(V2)/382
V2 = 8.36 m3/s

Above the enlargement point, the velocity is (7.406 m3/s)/(2 m2) = 3.70 m/s, and below the
enlargement point the velocity is (8.36 m3/s)/(3 m2) = 2.79 m/s.
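
The same steps can be scripted, which makes it easy to rerun the example for other compositions or duct sizes. A minimal sketch: the molar masses (44.01 and 28.01 g/mol) are assumed values chosen to reproduce the 0.03761 kg/mol quoted in the solution, and the variable names are illustrative.

```python
R_SI = 8.314462618   # m3*Pa/(mol*K)

# Assumed molar masses in kg/mol; volume fraction equals mole fraction for an ideal gas
MOLAR_MASS = {"CO2": 0.04401, "CO": 0.02801}
VOLUME_FRACTION = {"CO2": 0.600, "CO": 0.400}

mass_flow = 10.0     # kg/s
mixture_molar_mass = sum(VOLUME_FRACTION[g] * MOLAR_MASS[g] for g in VOLUME_FRACTION)
molar_flow = mass_flow / mixture_molar_mass                     # mol/s

# Upstream volumetric flow from Equation [2.5]
p_up, t_up, area_up = 120.0e3, 402.0, 2.00                      # Pa, K, m2
v_up = molar_flow * R_SI * t_up / p_up                          # m3/s

# Downstream volumetric flow from Equation [2.6]
p_down, t_down, area_down = 101.0e3, 382.0, 3.00                # Pa, K, m2
v_down = v_up * (p_up / p_down) * (t_down / t_up)               # m3/s

u_up = v_up / area_up                                           # m/s
u_down = v_down / area_down                                     # m/s

print(f"M = {mixture_molar_mass:.5f} kg/mol, n = {molar_flow:.1f} mol/s")
print(f"upstream: {v_up:.3f} m3/s, {u_up:.2f} m/s")
print(f"downstream: {v_down:.2f} m3/s, {u_down:.2f} m/s")
```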

Assignment. The moisture content of dry air increases from wH2O = 0 to 4.3 % as it passes over a
moist solid. The duct diameter is constant, as are the temperature and pressure of the gas. Use
Excel to make calculations of the ratio of the inlet/outlet velocity of the gas as a function of wH2O.
Plot the result. Can you determine a mathematical relationship between the variables?

Sometimes a flow rate will be presented in apparently contradictory terms. For example, a
flow rate is reported as "250 m3/min (STP) at 150 °C and 1.5 atm". What has happened is that the
actual flow at 150 °C and 1.5 atm has been corrected to STP (using Equation [2.5]) so that it is
easier to determine the mass (or molar) flow.
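
To illustrate the "250 m3/min (STP) at 150 °C and 1.5 atm" case, the sketch below converts the STP-corrected flow back to the actual volumetric flow and to a molar flow. It assumes ideal behavior and the Handbook's STP (273.15 K, 1 atm, 22.414 L/mol); the function names are hypothetical.

```python
T_STP = 273.15       # K
P_STP_ATM = 1.0      # atm
VM_STP = 0.022414    # m3/mol, ideal gas molar volume at STP


def actual_flow(flow_stp, temperature_c, pressure_atm):
    """Volumetric flow at process conditions from a flow stated at STP."""
    temperature_k = temperature_c + 273.15
    return flow_stp * (temperature_k / T_STP) * (P_STP_ATM / pressure_atm)


def molar_flow_from_stp(flow_stp):
    """Molar flow follows directly from the STP flow and the STP molar volume."""
    return flow_stp / VM_STP


flow_actual = actual_flow(250.0, 150.0, 1.5)   # m3/min at 150 degC and 1.5 atm
flow_molar = molar_flow_from_stp(250.0)        # mol/min

print(f"actual flow = {flow_actual:.1f} m3/min")
print(f"molar flow = {flow_molar:.0f} mol/min")
```

The molar flow needs only the STP figure, which is the reason flows are often reported that way.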

A term used often in dealing with gas mixtures is pressure fraction, or partial pressure, which
is that portion of the absolute pressure contributed by one species of the mixture. Dalton's law
states that the total pressure P of a system is the sum of the partial pressures of the individual
species. Adding a new gas to a mixture of gases in a system of fixed volume will not change the
partial pressures of the existing gases; P will increase, and the mole fraction x of each existing
gas will decrease. For a mixture of gases A, B, C, etc.:

$$p_B = \frac{n_B P}{n_A + n_B + n_C + \cdots} = x_B P$$ [2.7]

$$p_B = \varphi_B P$$ (ideal behavior only) [2.8]
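
A short numerical illustration of Dalton's law may help. The sketch below assumes ideal behavior, uses the simplified dry-air composition from Problem 1.21, and adds a hypothetical 0.50 mol of CO2 at constant volume and temperature, so that the total pressure scales with the total amount of gas. The function name is illustrative.

```python
def partial_pressures(moles, total_pressure):
    """Equation [2.7]: p_i = n_i * P / (sum of all n), for each species i."""
    n_total = sum(moles.values())
    return {gas: n / n_total * total_pressure for gas, n in moles.items()}


# One mole of simplified dry air at 1 atm
air = {"N2": 0.78, "O2": 0.21, "Ar": 0.01}
p_total_before = 1.0                                  # atm
p_before = partial_pressures(air, p_total_before)

# Add 0.50 mol CO2 at fixed V and T: P rises in proportion to total moles
mixture = dict(air, CO2=0.50)
p_total_after = p_total_before * sum(mixture.values()) / sum(air.values())
p_after = partial_pressures(mixture, p_total_after)
x_n2_after = mixture["N2"] / sum(mixture.values())

print(f"total pressure: {p_total_before:.2f} -> {p_total_after:.2f} atm")
print(f"p(N2): {p_before['N2']:.3f} -> {p_after['N2']:.3f} atm")
print(f"x(N2) after addition: {x_n2_after:.3f}")
```

The partial pressure of each original gas is unchanged, while its mole fraction falls as the total pressure rises.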

2.3.2 Non-Ideal Gas Behavior

The ideal gas law is actually a limiting law, in that the quantity PVm/T approaches a constant as
the mass density approaches zero. Therefore, in general, the ideal gas law can be restated as:

PVm/RT = z [2.9]

where Vm is the molar volume and z is the compressibility factor. At high temperatures, z → 1,
and the gas approaches ideal behavior. At STP, z for air = 0.9994, so for all practical purposes, air
can be treated as an ideal gas at that condition. The compressibility factor varies with temperature
and pressure and is tabulated in standard reference works for common gases. The deviation from
ideality depends significantly on the nature of the gas. For example, He and Ar obey the ideal gas
law much better than does CO2 or H2O.

Different formalisms have been proposed to express the equation of state for gases that
deviate significantly from ideality. One of the more common formalisms is the van der Waals
(vdW) equation of state:

$$P = \frac{RT}{V - b} - \frac{a}{V^2}$$ [2.10]

where V refers to the molar volume. The constants a and b are derived either from the critical
temperature and pressure of the gas, or by a statistical fit of experimental data. Another form of
the vdW equation involves the compressibility z.

The virial equation of state expresses the quantity PV/RT as a power series in the inverse of
volume:

$$\frac{PV}{RT} = z = 1 + \frac{B}{V} + \frac{C}{V^2} + \cdots$$ [2.11]

where V refers to the molar volume, and B, C, etc. are functions of temperature. Compressibility
factors and vdW equation coefficients are tabulated in the General Reference Section, along with
other equations of state. However, even though these equations are better than the ideal gas law,
they too are limiting laws, and become increasingly in error as the pressure increases.

EXAMPLE 2.3 — Compressibility of Steam.

Calculate the compressibility factor of steam from table data for an absolute pressure of 10.0
bar (1000 kPa) from 200 to 300 °C, and over the same range using the vdW equation. Plot the results.

Data.

t, °C      200     220     240     260     280     300
V, L/kg    205.9   216.9   227.6   237.9   248.0   258.0

van der Waals constants: a = 5.48 × 10⁶ [atm(cm3/g-mol)²]
                         b = 30.6 (cm3/g-mol)*

* These values were obtained from the critical temperature and pressure values for steam.

Solution. The context of this problem indicates it is intended for a spreadsheet solution. The data
table is extended by making calculations to convert °C to K, and Vm to cm3/mol by multiplying the
volume in L/kg times the molar mass of water, 18.016 g. For example, Vm(H2O) at 200 °C (473 K)
= 205.9(18.016) = 3709.5 cm3. The compressibility z at 200 °C is calculated as:

$$z = \frac{1000 \times 3709.5}{8314 \times 473} = 0.943$$

where the gas constant R = 8314 cm3 · kPa · mol⁻¹ · K⁻¹.

The units in the vdW equation indicate that a pressure in atm is convenient; P = 9.869 atm.
At 200 °C, the pressure is:

$$P = \frac{82.06 \times 473}{V - 30.6} - \frac{5.48 \times 10^{6}}{V^{2}}$$

where the gas constant R = 82.06 cm3 · atm · mol⁻¹ · K⁻¹.

Excel's Goal Seek tool was used to search for a value of Vm such that P = 9.869. The
resulting values of P, V, and T were used to calculate z for each temperature. The results are
shown on the table below, and plotted in Figure 2.1.

The results show that the vdW equation is better than the ideal gas equation (where z = 1) as
an equation of state for steam, but still falls short of an adequate representation of the measured
behavior of steam.

                 T, K        473     493     513     533     553     573

Table values     V, cc/mol   3709    3908    4100    4286    4468    4648
                 z           0.943   0.953   0.961   0.967   0.972   0.976

van der Waals    V, cc/mol   3819    3992    4164    4335    4506    4677
                 z           0.971   0.974   0.976   0.978   0.980   0.982

[Plot: "Compressibility of Steam at 1000 kPa"; z (0.94 to 1.00) versus t, °C (200 to 300); series: Table and vdW]

Figure 2.1 Comparison between the compressibility factor z for superheated steam at 1000 kPa
(9.869 atm) as obtained from tabular values, and as calculated from the van der Waals equation.
An ideal gas has z = 1.
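
The Goal Seek step in Example 2.3 can also be carried out in code. The sketch below swaps in a simple bisection search for Goal Seek (a substitution, not the Handbook's procedure) to find the vdW molar volume at P = 9.869 atm, using the constants and data given in the example; the function and variable names are illustrative. Temperatures are taken as t + 273, as in the text.

```python
# Constants and data as given in Example 2.3
R_ATM = 82.06      # cm3*atm/(mol*K)
R_KPA = 8314.0     # cm3*kPa/(mol*K)
A_VDW = 5.48e6     # atm*(cm3/mol)^2
B_VDW = 30.6       # cm3/mol
M_H2O = 18.016     # g/mol
P_KPA = 1000.0
P_ATM = 9.869

T_CELSIUS = [200, 220, 240, 260, 280, 300]
V_L_PER_KG = [205.9, 216.9, 227.6, 237.9, 248.0, 258.0]


def vdw_pressure(v, t):
    """van der Waals pressure (atm) for molar volume v (cm3/mol) at t (K)."""
    return R_ATM * t / (v - B_VDW) - A_VDW / v**2


def vdw_molar_volume(p, t, rel_tol=1e-10):
    """Gas-phase root of the vdW equation by bisection.

    The bracket runs from half the ideal-gas volume (vdW pressure above p)
    to the ideal-gas volume (vdW pressure below p for these conditions).
    """
    high = R_ATM * t / p
    low = 0.5 * high
    while (high - low) > rel_tol * high:
        mid = 0.5 * (low + high)
        if vdw_pressure(mid, t) > p:
            low = mid      # pressure too high: the root lies at larger volume
        else:
            high = mid
    return 0.5 * (low + high)


results = []
for t_c, v_mass in zip(T_CELSIUS, V_L_PER_KG):
    t_k = t_c + 273
    v_table = v_mass * M_H2O                      # L/kg * g/mol = cm3/mol
    z_table = P_KPA * v_table / (R_KPA * t_k)
    v_vdw = vdw_molar_volume(P_ATM, t_k)
    z_vdw = P_ATM * v_vdw / (R_ATM * t_k)
    results.append((t_k, v_table, z_table, v_vdw, z_vdw))
    print(f"{t_k} K: table V={v_table:.0f} z={z_table:.3f}; vdW V={v_vdw:.0f} z={z_vdw:.3f}")
```

Bisection is slower than Newton's method but cannot jump onto the liquid-like or unstable roots of the cubic, which is why it was chosen for this sketch.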

Assignment. The table below lists tabular values for the volume of steam at the pressure where
the steam is in equilibrium with liquid water (called saturated steam). Calculate the
compressibility factor for saturated steam from the data, and compare to the vdW equation values.

Saturated steam

t, °C      20      40      60      80      100     120     140     160
V, L/kg    57840   19550   7679    3409    1673    891.5   508.5   306.8
P, kPa     2.337   7.375   19.92   47.36   101.3   198.5   361.4   618.1

A better agreement between the vdW and table values for a non-ideal gas can be obtained by
calculating a and b via a statistical fit of the tabular values over a limited range of temperature and
pressure of a gas. This restricts the general applicability of the vdW equation to the range of
conditions over which the values of a and b were derived. A regression procedure for deriving
equation coefficients from tabular data is described in Chapter 3.

2.4 Condensed Phases

The effect of temperature and pressure on the volume of liquids or solids is much less than for
gases. The effect of temperature is usually much more important than that of pressure, and is
expressed as a coefficient of thermal expansion α. As a material gains thermal energy, the atoms
vibrate and behave as if they had a larger atomic radius. The average atomic or molecular distance
increases, and so too does the overall material dimension. The coefficient of expansion is an
indication of the strength of the atomic bonds. For example, metals with high melting points have
low values of α. The effect of temperature on volume is given by:

$\alpha = \frac{1}{V}\left(\frac{\partial V}{\partial T}\right)_P \quad (\mathrm{K^{-1}})$ [2.12]

For solids, α is expressed as a linear coefficient (units of K⁻¹) to express the change in length
of a solid object as a function of a change in temperature. The volume coefficient is about three
times the linear one: α(volume) ≈ 3α(linear). Strictly speaking, the value of α should apply to a
single crystal and would be dependent on the orientation, but in practice, the value is cited for
polycrystalline materials. Values of α are tabulated in some of the General Reference Section
citations for a specific temperature, but in actuality, α is a function of temperature. The design
of piping for plants carrying hot fluids requires careful attention to the expansion that can occur
as processes start and stop.

EXAMPLE 2.4 — Thermal Expansion of Titanium.

Calculate the increase in length of a 15 m titanium pipe subjected to a temperature increase
from 25 to 100 °C on the assumption that α remains constant over this temperature range.
Data. The value of α is 8.6 × 10⁻⁶ K⁻¹ at 25 °C.
Solution. The fractional increase in length is given by:

Fractional length increase (m/m) = (8.6 × 10⁻⁶)(100 - 25) = 6.45 × 10⁻⁴

A 15 m pipe would then increase by 15(6.45 × 10⁻⁴) = 9.68 × 10⁻³ m, or 9.7 mm.
Assignment. Calculate the increase in pipe diameter if it was initially 35 cm.
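
The arithmetic of Example 2.4 can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes, as the example does, that α is constant over the temperature range.

```python
# Linear thermal expansion: delta_L = L0 * alpha * delta_T (alpha assumed constant).
ALPHA_TI = 8.6e-6  # linear expansion coefficient of titanium at 25 C, 1/K


def length_increase(length_m, alpha_per_k, t_start_c, t_end_c):
    """Return the increase in length (m) for a temperature change in deg C or K."""
    return length_m * alpha_per_k * (t_end_c - t_start_c)


delta_l = length_increase(15.0, ALPHA_TI, 25.0, 100.0)
print(f"Increase in length: {delta_l * 1000:.2f} mm")
```

Because the coefficient is linear, the same function applies to any linear dimension of the pipe.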

For liquids, the thermal expansion is expressed as the volume (or cubic) coefficient of
expansion. The density of water from 0 to 100 °C is shown in Figure 2.2. Clearly, the volume
coefficient of expansion is not a constant over the temperature range. It is common to express the
density as a function of temperature rather than using the volume coefficient of expansion. The
density of water is adequately expressed as follows (t in °C), with an accuracy of ±0.5 kg/m3:

Density of water (kg/m3 or g/L) = -0.00357t² - 0.0691t + 1000.5 [2.13]


Interestingly, water's density is a maximum at about 4 °C, which is not accounted for by
Equation [2.13].
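
As a rough illustration of why the volume coefficient of expansion of water is not constant, the sketch below evaluates Equation [2.13] and estimates a mean volume coefficient between two temperatures from the density ratio. The function names are illustrative, and the mean coefficient is an approximation based on the fitted equation, not on measured data.

```python
# Density of water from Equation [2.13] (t in deg C, result in kg/m3).
def water_density(t_c):
    return -0.00357 * t_c**2 - 0.0691 * t_c + 1000.5


def mean_volume_expansion(t1_c, t2_c):
    """Mean volume coefficient of expansion (1/K) between t1 and t2.

    For a fixed mass, V is proportional to 1/density, so
    (V2 - V1)/V1 = density(t1)/density(t2) - 1.
    """
    return (water_density(t1_c) / water_density(t2_c) - 1.0) / (t2_c - t1_c)


print(f"Density at 35 C: {water_density(35.0):.1f} kg/m3")
print(f"Mean coefficient,  0 to  20 C: {mean_volume_expansion(0.0, 20.0):.2e} 1/K")
print(f"Mean coefficient, 80 to 100 C: {mean_volume_expansion(80.0, 100.0):.2e} 1/K")
```

The fitted equation gives a mean coefficient that is several times larger near 100 °C than near room temperature.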

[Figure 2.2 plot: "Density of Water"; density, kg/m3 (950 to 1000) versus temperature, °C (0 to 100).]

Figure 2.2 Density of water as a function of temperature.

2.5 Vapor-Liquid Equilibrium (VLE)

It is well known that gases will condense to a liquid or a solid if cooled sufficiently. Our most
common experience is with water vapor which we see condensing on cool surfaces and
evaporating from warm ones. All liquid and solid elements (and most compounds) have a finite
vapor pressure that increases exponentially with temperature. Eventually, a temperature is reached
such that the stable phase at all pressures is the gas. Vaporization and condensation processes are
very common in materials production and processing, so it's important to understand how these
processes occur. Our discussion of vapor-liquid equilibrium (sometimes called vapor saturation)
uses water as an example, but the principles are the same for other species.

First, consider ice. If we place ice in an evacuated vessel at a temperature below the freezing
point of water, a very small amount of ice will evaporate until a certain partial pressure of H2O is
reached. The vaporization of H2O from ice is called sublimation, a phenomenon common to all
solids. According to the Gibbs phase rule (Section 2.2), the water vapor-ice system has one degree
of freedom, which means that the vapor pressure of ice is a function of temperature alone. For
many engineering calculations, the logarithm of the vapor pressure of a substance is approximately
a linear function of reciprocal absolute temperature. Equation [2.14] can be used to calculate the
vapor pressure of ice. In vacuum systems where the last traces of water vapor must be removed, a
"cold trap" containing liquid nitrogen is used to lower the vapor pressure of water to a vanishingly
small amount.

log(pH2O, ice), kPa = -2665/T + 9.54 [2.14]

For units of torr, add 0.875 to the equations. For units of atm, subtract 2.006.
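
A minimal sketch of Equation [2.14], including the unit offsets quoted above, is given below. The function name and the dictionary of offsets are illustrative. Evaluating the equation at 77 K (the normal boiling point of liquid nitrogen) is an extrapolation far outside the range of the fit, so the result only indicates the order of magnitude involved in a cold trap.

```python
# Vapor pressure of ice from Equation [2.14]; T in K.
# Offsets added to log10(p) for other pressure units, as stated in the text.
LOG_OFFSET = {"kPa": 0.0, "torr": 0.875, "atm": -2.006}


def ice_vapor_pressure(t_k, unit="kPa"):
    """Return the vapor pressure of ice in the requested unit."""
    return 10.0 ** (-2665.0 / t_k + 9.54 + LOG_OFFSET[unit])


print(f"Triple point (273.16 K): {ice_vapor_pressure(273.16):.3f} kPa")
print(f"At 253 K: {ice_vapor_pressure(253.0, 'torr'):.3f} torr")
# Extrapolation only: liquid-nitrogen cold trap at about 77 K.
print(f"At 77 K: {ice_vapor_pressure(77.0):.1e} kPa")
```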

Next, suppose liquid water is added to the ice-vapor system. At 0.01 °C (the triple point),
water, ice and steam are present at a steam pressure of 611 Pa (or about 6.03 × 10⁻³ atm). The
system is invariant, so it has zero degrees of freedom. Above the triple point, the vapor pressure of
liquid water increases exponentially with temperature. If the temperature is raised high enough,
the vapor pressure of water eventually becomes so high that the specific volumes of steam and
water become equal, and there is no visible difference between the two phases. This is called the
critical point, which for water is at 374.15 °C and 22 120 kPa (218.3 atm). A mixture of water and
steam is a univariant system, with one degree of freedom.

Finally we consider steam. Steam in equilibrium with water is given a special name:
saturated steam. If saturated steam alone is present in a closed system, any decrease in
temperature or increase in pressure will cause some steam to condense to water, and the steam
remains saturated with water. If saturated steam is heated, or the pressure is dropped, it is no
longer saturated with liquid, and is given a different name: superheated (or dry) steam. The
deviation of steam from the ideal gas law is significant at pressures above about 10 bar, as was
shown in Example 2.3. The properties of saturated and superheated steam (and water up to 374
°C) are so important that special tables (steam tables) have been created that list the
thermophysical and thermodynamic properties of water, saturated steam, and superheated steam.
Hard-copy steam tables have been largely supplanted by steam calculation programs
(ChemicaLogic 2003, Archon 2001, MegaWatSoft 2008).

In this discussion, water refers to the liquid form of H2O, and steam to the vapor form.

Equation [2.15] gives a simple 2-term equation for the vapor pressure of water. The accuracy
can be improved by adding a third term (in T or log T). Depending on the accuracy required,
Equations [2.15] or [2.16] can be used between the triple point and normal boiling point of water.
Please copy these two equations to the inside cover of your Handbook for easy future reference.

log(pH2O), kPa = -2256/T + 8.061 [2.15]

log(pH2O), kPa = -2571/T - 0.00304T + 10.029 [2.16]

For units of torr, add 0.875 to the equations. For units of atm, subtract 2.006. For bar,
subtract 2. Above 380 K, consult steam tables, use one of the steam calculation programs, or see
an expanded chart and equations of water vapor pressure on the Handbook CD in folder Charts.
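
The two vapor pressure equations are easy to compare numerically. The sketch below, with illustrative function names, evaluates Equations [2.15] and [2.16] at a few temperatures between the triple point and the normal boiling point.

```python
# Vapor pressure of liquid water, kPa, with T in K.
def p_water_2term(t_k):
    """Equation [2.15]."""
    return 10.0 ** (-2256.0 / t_k + 8.061)


def p_water_3term(t_k):
    """Equation [2.16]."""
    return 10.0 ** (-2571.0 / t_k - 0.00304 * t_k + 10.029)


for t_k in (273.16, 300.0, 325.0, 350.0, 373.15):
    print(f"{t_k:7.2f} K   [2.15]: {p_water_2term(t_k):8.3f} kPa"
          f"   [2.16]: {p_water_3term(t_k):8.3f} kPa")
```

At 350 K the two equations return the 41.2 kPa and 41.6 kPa values used in Examples 2.6 and 2.5 respectively.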

EXAMPLE 2.5 — Evaporation of Water in a Closed Vessel.

1.00 kg of water is placed into a vessel of 1.0 m3 volume, and heated to 350 K. Calculate the
pressure in the vessel, and the mass of water and steam.

Solution. We can ignore the volume occupied by the liquid water because of the size of the vessel
and the number of significant figures given in the problem statement. The vapor pressure of water
from Equation [2.16] is 41.6 kPa. Application of the ideal gas law gives:

Amount = 41 600(1.0)/[(8.314)(350)] = 14.3 mol steam

The mass of steam is 14.3(18.0) = 257 g. By difference, the mass of water is 743 g. This
occupies 0.074 % of the vessel volume, confirming that the volume of water in the vessel is
negligible in comparison to the total volume. This would not be true for a much smaller vessel.

Assignment. a) Calculate the mass of steam assuming the volume of steam is better represented by
the vdW equation than the ideal gas equation. b) Calculate the temperature where the mass of
steam is 0.85 kg (assume ideal behavior).
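
The ideal gas calculation of Example 2.5 can be sketched as follows. The function name is hypothetical. The sketch neglects the liquid volume, as the example does, and additionally assumes that if the charge of water is smaller than the amount needed to saturate the vessel, all of it evaporates.

```python
# Split of water between liquid and vapor in a closed vessel (ideal gas vapor).
R = 8.314      # J/(mol*K)
M_H2O = 18.0   # g/mol, the value used in the example


def steam_in_vessel(p_sat_pa, volume_m3, t_k, total_water_g):
    """Return (moles of steam, mass of steam in g, mass of liquid in g)."""
    n_sat = p_sat_pa * volume_m3 / (R * t_k)     # moles needed to saturate the vessel
    m_steam = min(n_sat * M_H2O, total_water_g)  # cannot exceed the water charged
    return m_steam / M_H2O, m_steam, total_water_g - m_steam


moles, mass_steam, mass_liquid = steam_in_vessel(41600.0, 1.0, 350.0, 1000.0)
print(f"{moles:.1f} mol steam, {mass_steam:.0f} g steam, {mass_liquid:.0f} g water")
```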

Figure 2.3 shows a plot of the vapor pressure of water and ice vs. temperature. An expanded
phase diagram is shown in the Handbook CD in folder Charts. A log scale was used for pressure
because a linear scale would not have displayed the lower temperature values in a visible manner.
Other volatile substances have characteristics similar to those of water. The solid curved line
represents the vapor pressure of saturated steam. This line is often called the Vapor-Liquid
Equilibrium line, or VLE line, and we'll use this term often in the Handbook. The intersection of
the vertical solid line and the VLE at 273 K represents the triple point for H2O, where ice, water,
and steam are all present. The vertical dashed line at 373 K represents the normal boiling
temperature of H2O in an open container at an ambient pressure of one atm.

The significance of Figure 2.3 is illustrated by following the conditions outlined by two
different simple paths: A→B→C, and A→D→E. Certain conditions of pressure and temperature
are attainable by placing water into a cylinder fitted with a piston whose position can vary and
thereby maintain a specified pressure. Assume that heat can be added or removed from the
system. Path A→B→C is at a constant pressure of 900 kPa (about 9 atm). Path A→D→E is at a
constant temperature of 360 K.* The pressure at point A exceeds the vapor pressure of water, so
there is no vapor phase in the cylinder.

Now suppose heat is added to the system under isobaric conditions, starting at point A (365
K). This causes the water temperature to increase along line A→B at 900 kPa. Water is the only
phase initially present as the temperature increases. Once the water temperature reaches point B
(440 K), adding more heat will not increase the temperature, but instead will cause water to
evaporate, creating vapor in the cylinder, and causing the piston to rise. The piston will continue
to rise because the added heat will produce more vapor. Eventually (still at point B, 440 K), all of
the water will have been transformed to saturated steam, with a large increase in volume. From
that point on, if more heat is added the cylinder will contain superheated steam.

[Figure 2.3 plot: "Vapor Pressure of H2O"; pressure, Pa (log scale) versus T, K (250 to 650); labeled features: triple point (0.01 °C, 611 Pa), normal boiling point, critical point, superheated steam region, and paths A→B→C and A→D→E.]

Figure 2.3 Phase diagram showing the pH2O (in Pa) as a function of temperature. The heavy
curved line represents the saturation condition for steam (the VLE line). The heavy vertical line
represents the phase boundary between ice and water.

Now suppose the pressure is reduced, starting at point A, while keeping the temperature
constant at 365 K (i.e. an isothermal process). Water is the only phase as pressure decreases
towards point D. The piston does not move until point D is reached. At point D (about 0.6 atm),
any attempt to lower the pressure on the piston below 0.6 atm causes water to evaporate and the
piston to rise, but the pressure will remain constant at 0.6 atm. There is a large increase in volume
at point D. Eventually the cylinder (still at point D) will contain only saturated steam. Heat must be
supplied to provide the heat of vaporization to keep the temperature at 365 K and the pressure at
0.6 atm. Once all the water has vaporized, further lowering of pressure towards point E results in
the cylinder being filled with superheated steam.

Any other path from point A to the superheated steam region has the same result: no
evaporation until the P-T condition is that of saturated steam. Then, the water vaporizes at the
saturation (VLE) line, which requires heat to be supplied if the process is isothermal or isobaric.
After that, any combination of decreased P or increased T brings the system to the region of
superheated steam. Going in the opposite direction, there is no condensation of water along a path
from the superheated steam region towards the water region until the VLE condition is reached.
Heat must be removed during condensation for an isothermal or an isobaric process. If
condensation or evaporation is isobaric, it must also be isothermal, and vice versa.

* A constant-pressure process is an isobaric (or isostatic) process. A constant-temperature process is an
isothermal process. A constant-volume process is isochoric.

The upper limit on the saturation line is the critical point of H2O, where the volume of the
steam after vaporization is the same as that of the water that is evaporating. The fluid above the
critical point is called a supercritical fluid, and has interesting properties, such as the ability to
dissolve large quantities of minerals that are otherwise nearly insoluble.

The process of boiling is a special case of evaporation that takes place under certain
circumstances. At points B and D, slowly adding heat causes evaporation to occur predominantly
at the phase boundary between the gas and fluid. If heat is added or the piston is raised rapidly,
bubbles of vapor will nucleate and grow below the surface as evaporation takes place at the newly
created interfacial bubble area in the bulk of the fluid. The rise of these bubbles agitates the fluid,
and gives the characteristic boiling appearance. The normal boiling point is that temperature
where the saturation pressure is one standard atmosphere. The actual boiling point changes with
atmospheric pressure, such that in a ski lodge high in the Rockies, it's impossible to get a really
hot cup of coffee. In the other direction, water in the boiler of a power plant boils at temperatures
well above 100 °C to produce high-pressure steam.

2.5.1 Mixtures of Condensable and Non-Condensable Gases

A number of terms have been developed to describe the properties of gas mixtures containing
one or more condensable species. These terms are used most often in connection with the
condensation and evaporation of water, and the moisture content of gas mixtures containing steam.
The dew point temperature (dpt) is defined as the temperature at which water would begin
condensing from a moist gas; this is of course the VLE temperature. For a given gas composition,
the dpt increases with pressure. At a given pressure, the dpt tells when moisture could form during
the cooling of a process off-gas.

The relative humidity is a term used to describe how far the conditions are from the dew
point. When the p(H2O) is less than the saturation value, it has a certain humidity relative to that at
the dew point, which is expressed as a percent of the saturation humidity. Knowledge of the
relative humidity is important in processes that use large amounts of ambient air.

EXAMPLE 2.6 — Humidity and Dew Point.

Moist air at a total pressure of 100.0 kPa and 350 K has a volume fraction of steam of 20.0 %.
Calculate a) the relative humidity; b) the dew point of the gas; and c) the effect of total pressure on
the dpt and relative humidity at 350 K. Equation [2.15] is adequate for this example.

Solution. a). The saturation pH2O at 350 K is 41.2 kPa. Assuming ideal gas behavior, p(H2O) =
20.0 kPa. The % relative humidity (%RH) is given by:

%RH = 20.0(100)/41.2 = 48.5 %

b). The dew point temperature is the saturation temperature at pH2O = 20.0 kPa. From
Equation [2.15], at 20.0 kPa the saturation temperature (dpt) is 334 K, or 61 °C. Therefore, the dpt
is about 16 K below the gas temperature.

c). The solution to this part is aided by reviewing the earlier discussion of Figure 2.3. The
need for several calculations means that a spreadsheet solution is preferred. The relative humidity
is easily calculated by following the procedure used in part a). As the total pressure increases, so
does the pH2O. Table 2.1 shows the results for a range in total pressures between 25 and 225 kPa.
Somewhere between 200 and 225 kPa, the relative humidity reaches 100 %, so the value at 225
kPa is unstable. This is understandable since the saturation pressure of steam is 41.2 kPa for the
gas conditions.


Table 2.1 % Relative humidity at 350 K for moist air having φH2O = 20 %. The conditions used
in the calculations for part a) are those in the 100 kPa column.

P total, kPa    25      50      75      100     125     150     175     200     225
pH2O, kPa       5       10      15      20      25      30      35      40      45
%RH             11.9%   23.8%   35.7%   47.6%   59.5%   71.4%   83.3%   95.2%   107.1%

The dpt is calculated by rearranging Equation [2.15] as follows:

Dew point, K = 2256/[8.061 - log(pH2O)]

This formula was used in an Excel worksheet to calculate the dpt for total pressures between
20 and 300 kPa. Table 2.2 and Figure 2.4 show the results. As the pressure increases, so too does
the pH2O, and with that, the dpt goes up. Based on the original moist air temperature of 350 K (77
°C), Figure 2.4 indicates that the total pressure could increase to about 2.15 atm without danger of
condensing water.

Table 2.2 Dew point temperature of moist air having φsteam = 20 %. The conditions used in the
calculations for part a) are those in the 100 kPa column.

P total, kPa     25      50      75      100     140     180     220     260
P total, atm     0.247   0.493   0.740   0.987   1.382   1.776   2.171   2.566
Dew point, K     306     319     327     333     341     346     351     355
Dew point, °C    33      46      54      60      68      73      78      82

[Figure 2.4 plot: "Dew Point of Moist Air"; dew point temperature, °C versus total pressure, atm (0 to 3).]

Figure 2.4 Relationship between total pressure and dpt of moist air having φsteam = 20 %. Dew
point at 350 K (77 °C) is at 2.15 atm (218 kPa) as indicated by the □ symbol.
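
The calculations of Example 2.6 can be sketched in Python instead of a spreadsheet. The function names are illustrative, the gas is assumed ideal, and the saturation pressure is taken from Equation [2.15]. Values computed this way may differ slightly in the last digit from those in Tables 2.1 and 2.2.

```python
import math

# Relative humidity and dew point of moist air, based on Equation [2.15].
def p_sat_kpa(t_k):
    """Saturation vapor pressure of water, kPa (Equation [2.15])."""
    return 10.0 ** (-2256.0 / t_k + 8.061)


def dew_point_k(p_h2o_kpa):
    """Equation [2.15] solved for T at a given partial pressure of H2O."""
    return 2256.0 / (8.061 - math.log10(p_h2o_kpa))


def relative_humidity_pct(p_h2o_kpa, t_k):
    return 100.0 * p_h2o_kpa / p_sat_kpa(t_k)


PHI_H2O = 0.20   # volume fraction of steam
T_GAS_K = 350.0

for p_total in (25, 50, 75, 100, 125, 150, 175, 200):
    p_h2o = PHI_H2O * p_total  # ideal gas: partial pressure = volume fraction * P
    print(f"P = {p_total:3d} kPa   pH2O = {p_h2o:5.1f} kPa"
          f"   %RH = {relative_humidity_pct(p_h2o, T_GAS_K):5.1f}"
          f"   dpt = {dew_point_k(p_h2o):5.1f} K")
```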

Assignment. The moist air in this example is cooled to 320 K, which condenses some water. The
air is then separated and reheated to 350 K. Calculate the %RH at 350 K.

2.5.2 Software for Making Dew Point and Humidity Calculations

The technology of measuring and expressing the amount of moisture in air is called
psychrometry. A common way to express the various relationships discussed earlier is a
psychrometric chart, in which the moisture content of air (usually expressed as g H2O/kg of dry
air) is plotted against temperature. Percent relative humidity lines are superimposed on the
diagram, along with thermal data. Psychrometric charts are available in chemical and mechanical
engineering reference works, and can be downloaded from the internet. More useful are programs
that make psychrometric calculations such as the PsyCalc 98 program (Linric 1998), which is
available on-line. Chapter 7 covers the subject of the thermal effects of the condensation and
evaporation of water.


EXAMPLE 2.7 — Moisture Content of Clay Dryer Streams.

Moist clay is dried by passing hot dry air across the clay in a rotary dryer. A sketch of the
process devices and streams is shown below. P = 0.975 atm. The ambient air flowrate is 1000
m3/min at 35 °C and 18% RH and is heated to 145 °C. The moist air leaving the dryer is at 73 °C
and 87% RH. Use the PsyCalc 98 program to: a) calculate the mass of water removed from the
clay per m3 of ambient air entering the heater, and b) calculate the quantity of ambient air that
should be added to the moist air such that its dpt is 28 °C.

[Process sketch: Ambient air (35 °C, 18 %RH) → Air heater → Hot air (145 °C) → Clay dryer → Moist air (73 °C, 87 %RH). Wet clay enters the clay dryer and dry clay leaves it.]

Solution. The PsyCalc program was used to generate several characteristics of the ambient and
moist air streams. These characteristics (after some editing) are:

             P           dbt        RH        dpt         H2O/dry air
Amb. air     98.80 kPa   35.0 °C    18.0 %    7.17 °C     6.47 g/kg
Moist air    98.80 kPa   73.0 °C    87.0 %    69.75 °C    285.03 g/kg

             H2O/amb air   pH2O        wH2O          φH2O          Sp vol/dry air
Amb. air     7.15 g/m3     1.041 kPa   6431 PPM      10,261 PPM    0.9046 m3/kg
Moist air    194.22 g/m3   30.87 kPa   221,810 PPM   312,430 PPM   1.467 m3/kg

Some explanation is needed before proceeding with the solution. The term "dbt" refers to
"dry bulb temperature", as measured by a dry thermometer. It is equivalent to the temperature
measured normally, but is given the designation "dry bulb" in contrast to the "wet bulb"
temperature (wbt) measured by a thermometer surrounded by wet gauze in a flowing air stream.
The wbt was used as an indication of the %RH before modern psychrometers were available. At
100 %RH, the wbt and dbt are the same.

The columns containing the words "dry air" refer to the mass of water per kg of dry
(moisture-free) air, or specific volume (actual cubic meters of moist air per kg of dry moisture-free
air). The pH2O refers to the partial pressure of water vapor in the gas, and is equivalent to the
vapor pressure of H2O at the dpt (i.e., at the VLE temperature). The terms wH2O and φH2O refer
to the mass and volume fraction of H2O in the moist air, expressed in parts per million. These
definitions serve to illustrate an important standard used by psychrometric charts and software: the
basis is one kg of dry (moisture-free) air, which occupies 0.7738 m3 at STP and contains 34.53
moles. With a little arithmetic, it is possible to convert some of the table values to others, but for
the most accurate calculations on water vapor, actual rather than ideal gas behavior must be used.
For example, water vapor in moist air at 73 °C has a compressibility of 0.994. Note that the
PsyCalc program calculates the specific volume of dry air at any temperature and pressure if you
set the dew point to -61 °C, which is effectively "dry air". In addition, the vapor pressure of water
at VLE can be obtained by either setting the dpt = dbt, or %RH = 100.

To return to the solution of part a), we choose a basis of 1 m3 ambient gas. From the data
table, φH2O = 0.01026*, which amounts to 0.9897 m3 of dry ambient air entering the air heater.
PsyCalc showed that the specific volume of ambient air was 0.9046 m3/kg, the reciprocal of which
is 1.1055 kg dry air/m3 ambient air. In addition, the ambient air contained 7.15 g H2O. The moist
air contained 285.03 x 1.1055 = 315 g H2O, so the water picked up from the clay is 315 - 7.15 =
308 g per m3 of ambient air. The ratio of moist air/ambient air flow is
1.467/0.9046 = 1.6217. On the original basis of 1000 m3/min of ambient flow, the moist air flow
is 1621.7 m3/min and the mass of water removed from the clay is 308 kg/min.

* The φH2O value can also be obtained by dividing the pH2O by the total pressure.
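
The water balance for part a) can be checked with the short sketch below. It uses only the PsyCalc output listed in the table above. The variable names are illustrative, and the basis is the dry air, which passes through the heater and dryer unchanged.

```python
# Water balance on the clay dryer, per kg of dry air (values from the PsyCalc table).
SP_VOL_AMBIENT = 0.9046   # m3 ambient air per kg dry air
SP_VOL_MOIST = 1.467      # m3 moist air per kg dry air
W_AMBIENT = 6.47          # g H2O per kg dry air, ambient air
W_MOIST = 285.03          # g H2O per kg dry air, moist air
AMBIENT_FLOW = 1000.0     # m3/min of ambient air

dry_air_per_m3 = 1.0 / SP_VOL_AMBIENT                         # kg dry air per m3 ambient air
water_removed_g_per_m3 = (W_MOIST - W_AMBIENT) * dry_air_per_m3
moist_air_flow = AMBIENT_FLOW * SP_VOL_MOIST / SP_VOL_AMBIENT  # m3/min
water_removed_kg_per_min = water_removed_g_per_m3 * AMBIENT_FLOW / 1000.0

print(f"Water removed: {water_removed_g_per_m3:.0f} g per m3 of ambient air")
print(f"Moist air flow: {moist_air_flow:.1f} m3/min")
print(f"Water removed: {water_removed_kg_per_min:.0f} kg/min")
```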

The solution to part b) requires use of the "flow" version of PsyCalc. The moist air will be
blended with ambient air such that the mixed stream will have a dpt of 28 °C. The flow version
allows input of actual or standard flow units,

Assignment. Use PsyCalc to determine the amount of water that would condense out of the moist
air if it were cooled to 60 °C.

A word of warning about using any commercially developed program that doesn't include the
source code: Always double-check your answer with an alternate method if it's going to be used
for a critical application. In critical applications, many engineers will purchase programs from
two different vendors and use both to assure that the calculations are done correctly.

The discussion so far has focused on the air/H20 system, mainly because it is so commonly
encountered, which makes it a good example. The principles are the same for all systems, and
vapor saturation processes are common in many material processing operations. For example, two
substances may be separated if their vapor pressures are much different. Figure 2.5 shows the
vapor pressure of zinc and cadmium over a pressure range of 10 orders of magnitude. The distance
between the VLE lines is an indication of the effectiveness of a condensation process for
separating one metal from a gas mixture of the two.

Figure 2.5 Vapor pressure of pure Cd and pure Zn between about 144 °C and the normal boiling
points of the metals. The variables chosen for the diagram axes were selected to linearize the
relationships.

The diagram shows that the VLE pressure difference decreases with temperature, from a
factor of about 13 just above the melting point of zinc to a factor of about five at 1000 K. A
separation process based on selective condensation of zinc is therefore favored by operating at as
low a temperature as possible, but in practice, other factors such as heat transfer and equipment
size would play a role in selecting the best operating temperature. Another important factor
affecting the outcome of a separation process is that the zinc and cadmium are alloyed, which
lowers both their vapor pressures. As cadmium is removed from the alloy, its vapor pressure
decreases, while that of the zinc increases. This is discussed in more detail in Section 2.7.

2.6 Effect of Pressure on Phase Transformation Temperatures

Solid and liquid phases are in equilibrium at the melting (or freezing) point, which by
convention is listed at 1 atm pressure. However, pressure has an effect on the freezing point and
vapor pressure of a liquid, and on the transformation temperature of one form of a solid to another.
The effect of external pressure on phase transformation temperature is being used in various
materials processing operations to produce crystalline materials that aren't otherwise stable. In the
case of the freezing point, the relationship is:

$\frac{dT}{dP} = \frac{T(V_l - V_s)}{\Delta H_f}$ [2.17]

where ΔH_f is the heat of fusion of the substance and V_l and V_s are the respective molar volumes of
liquid and solid. Equation [2.17] is one form of the Clapeyron equation. From knowledge of the
densities of the liquid and solid phases, and the heat of fusion, it is possible to determine the
variation of freezing point of the substance with pressure.

Qualitatively, if V_l is larger than V_s, then dT/dP will be positive, and the melting point will
increase with applied pressure. This is the case for the majority of solids. However, if the liquid
has the greater density, an increase of pressure will cause the freezing point to decrease. A very
few substances, notably water, bismuth, and antimony, exhibit this type of behavior.

EXAMPLE 2.8 — Effect of Pressure on the Freezing Point of Water.

Calculate the effect of pressure on the freezing point of water.

Data. At the freezing point of water (0 °C, 273.15 K) the density of water is 999.84 kg/m3, and the
density of ice is 916.84 kg/m3. ΔH_f is 6.01 kJ/mol.

Solution. The molar volumes of water and ice are 1.8019 × 10⁻⁵ m3/mol and 1.9650 × 10⁻⁵ m3/mol
respectively. Equation [2.17] becomes:

$\frac{dT}{dP} = \frac{273.15(1.8019 \times 10^{-5} - 1.9650 \times 10^{-5})}{6010} = -7.4 \times 10^{-8}\ \mathrm{deg \cdot m^3 \cdot J^{-1}}$

Recognizing that 1 Pa⁻¹ = 1 m3·J⁻¹, the freezing point of water drops by 7.4 × 10⁻⁸ deg per Pa
increase. Since dT/dP is small, it may be assumed to be constant over a considerable pressure
range. So for example at a pressure of 1 MPa (about 10 atm), ice melts at 273.08 K. If a small
object (such as an ice skate) is pressed against a block of ice at 0 °C, a thin film of water will form.

Assignment. Find a reference for the densities of rhombic and monoclinic sulfur at their transition
temperature. Given that the transition temperature increase is 0.035 deg·atm⁻¹, calculate the heat
of transformation.
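
The Clapeyron calculation of Example 2.8 can be sketched as follows. The function name is hypothetical, and a molar mass of 18.015 g/mol for H2O is assumed in order to convert the densities to molar volumes.

```python
# Clapeyron slope dT/dP = T*(V_l - V_s)/dH_f for the ice/water transition.
M_H2O = 18.015e-3   # kg/mol (assumed molar mass)


def clapeyron_slope(t_k, rho_liquid, rho_solid, molar_mass, dh_fusion):
    """Return dT/dP in K/Pa; densities in kg/m3, dh_fusion in J/mol."""
    v_liquid = molar_mass / rho_liquid   # m3/mol
    v_solid = molar_mass / rho_solid     # m3/mol
    return t_k * (v_liquid - v_solid) / dh_fusion


slope = clapeyron_slope(273.15, 999.84, 916.84, M_H2O, 6010.0)
t_melt_1mpa = 273.15 + slope * 1.0e6   # slope treated as constant up to 1 MPa

print(f"dT/dP = {slope:.2e} K/Pa")
print(f"Melting point at 1 MPa: {t_melt_1mpa:.2f} K")
```

The negative slope reflects the fact that ice is less dense than water. For most substances the same function returns a positive value.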

The vapor pressure of a liquid or solid is also affected by external pressure, imposed for
example by the introduction of an inert gas. The effect of pressure on the vapor pressure of a
liquid is given by the Poynting equation:

$\frac{dp}{dP} = \frac{V_l}{V_v}$ [2.18]

where P is the total pressure on the liquid, p is the vapor pressure, and V_l and V_v are the molar
volumes of the liquid and the vapor. Assuming the vapor to behave like an ideal gas, and replacing
dp/dP by Δp/ΔP, Equation [2.18] may be written as:

$\frac{\Delta p}{p} = \frac{V_l}{RT}\,\Delta P$ [2.19]

EXAMPLE 2.9 — Effect of Pressure on the Vapor Pressure of Water.

Calculate the change in vapor pressure of water at 35 °C when it evaporates into a space
containing an insoluble gas at various pressures from 100 to 2000 kPa (or 1 to 20 atm). Express
the results as a graph.

Solution. We start with a detailed calculation at 1 atm, and repeat at other pressures. From
Equation [2.16], p(H2O) = 5.61 kPa. The density of water from Equation [2.13] is 993.7 kg/m3
at 35 °C, which gives a molar volume of 1.813 × 10⁻⁵ m3/mol. From the units used (pressure in Pa,
volume in m3, amount-of-substance in moles, and temperature in K), the value of R is 8.314 m3
· Pa/(mol · K). The units on Δp and p are immaterial so long as they are the same.

When water vaporizes into an empty chamber, the total pressure P is equal to the vapor
pressure of water, i.e. pH2O = P = 5610 Pa at 35 °C (308.15 K). If an inert gas is added such that
its partial pressure is 101 330 Pa (1 atm), the external pressure increases by a ΔP of approximately
101 300 Pa. Assuming that pH2O is not altered greatly, Equation [2.19] is then:

$\frac{\Delta p}{5610} = \frac{(1.813 \times 10^{-5})(1.013 \times 10^{5})}{8.314(308.15)}$

which gives Δp = 4 Pa. The pH2O is thus 5610 + 4 = 5614 Pa. Figure 2.6 shows the results of a
series of calculations in terms of the absolute increase in pH2O and the percent change in pH2O.
The effect of external pressure on a substance's vapor pressure is seen to be relatively small, but can
be significant in distillation processes operating at high pressures where knowing the exact vapor
pressure is critical to designing a separation reactor. A similar set of calculations at 50 °C gave
absolute increases about double those at 35 °C, but the percent change was virtually the same.

[Figure 2.6 plot: "Effect of External Pressure on Vapor Pressure of Water"; Δp, Pa (left axis, 0 to 80) and % increase in pH2O (right axis, 0.00 % to 1.50 %) versus external pressure, kPa (200 to 2000).]

Figure 2.6 Results of calculations showing the effect of external pressure on the vapor pressure of
water at 35 °C. The external pressure is exerted by an inert gas.
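
The series of calculations behind Figure 2.6 can be sketched with Equation [2.19]. The function name is illustrative, and the sketch keeps the example's assumptions: ideal vapor, constant liquid molar volume, and a vapor pressure that is not altered greatly by the correction.

```python
# Increase in vapor pressure of water at 35 C caused by an inert gas (Equation [2.19]).
R = 8.314            # m3*Pa/(mol*K)
P_VAPOR = 5610.0     # vapor pressure of water at 35 C, Pa
V_LIQUID = 1.813e-5  # molar volume of liquid water at 35 C, m3/mol
T_K = 308.15


def vapor_pressure_increase(p_pa, v_liquid, t_k, delta_p_ext_pa):
    """Return delta_p = p * V_l * delta_P / (R*T), in the units of p."""
    return p_pa * v_liquid * delta_p_ext_pa / (R * t_k)


dp_1atm = vapor_pressure_increase(P_VAPOR, V_LIQUID, T_K, 101300.0)
print(f"1 atm of inert gas: delta p = {dp_1atm:.1f} Pa")

for ext_kpa in range(200, 2001, 200):
    dp = vapor_pressure_increase(P_VAPOR, V_LIQUID, T_K, ext_kpa * 1000.0)
    print(f"{ext_kpa:5d} kPa   delta p = {dp:5.1f} Pa   ({100.0 * dp / P_VAPOR:.2f} %)")
```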

Assignment. A chamber of 1.000 m3 volume is filled with saturated steam plus a small amount of
water at 100 °C. An inert gas is then added to the chamber until the total pressure inside is 20 atm.
Calculate the normal boiling point of the water, and the mass of water evaporating or condensing
because of the introduction of the inert gas.


2.7 Steam and Air Property Calculators

Water, steam and air are so commonly encountered in materials processing that special tables
and software have been developed to assist in making material and energy balances involving
these substances. To a first approximation, we often treat steam and air as ideal gases, with
acceptable error at pressures below 1 atm and temperatures above 25 °C. This approximation is
unsatisfactory for accurate calculations, or at elevated pressures. The non-ideal behavior of steam
at elevated pressure was discussed in Section 2.3, and in detail in Example 2.3. Data in that
example came from a steam table, which gives the properties of water and steam over a wide range
of temperature and pressure. Data for air is available from NIST (Lemmon 2000), and is on the
Handbook CD (in folder Air) as an embedded WordPad document in workbook AirPVT.xls.

The steam and air tables are cumbersome to use, and often require interpolation to find the
value needed for a particular situation. Much better are calculational programs available from
various vendors. Some vendors have demo versions, while others can be accessed free on-line,
such as the steam programs mentioned in the Appendix and in Section 2.5. These programs are
used in later chapters where energy balances are discussed. There is no student program for air,
but commercial types can be purchased from various vendors and the NIST (Lemmon 2002).

2.8 Properties of Solutions

We have already discussed solutions of gases (such as air), where for ideal gases, the partial
pressure of a species is directly proportional to the mole (or volume) fraction. In this section we
describe condensed-phase solutions, which are very common in materials processing. Very often,
the first metallic phase produced in the extractive flowsheet is a liquid solution, containing one or
more impurities that must subsequently be removed. Other solution phases are slags (mixtures of
molten oxides), mattes (mixtures of molten sulfides), and molten salts. The final product of a
materials production process is often a solid solution, such as an iron-carbon alloy.

The earlier calculations on vapor-liquid equilibria were confined to pure substances. In
general, the vapor phase in equilibrium with a solution phase will contain all of the species of the
solution phase so long as the solution phase species do not react to form complex compounds.
Each solute will exhibit a unique vapor pressure that is a function of temperature and solution
composition. The composition of the vapor phase will seldom be the same as the composition of
the solution phase. The relationship between the composition of the vapor phase and the
composition of the solution depends mainly on the vapor pressure of the pure substances, and how
these substances interact in the solution phase. A considerable body of scientific knowledge and
data are available on the thermodynamic properties of solution phases (see citations in General
References, page 605), but for the purposes of this Handbook, only two types of solution behavior
will be discussed: an ideal and a regular solution. The first type of solution comprises substances
that are very similar in properties, such as straight-chain hydrocarbons and a few very-similar
metals. The second type allows for a symmetrical deviation from ideal behavior, which expands
its applicability.

2.8.1 Ideal Solutions — Raoult's Law

We have already introduced the concept of an ideal solution when we discussed mixtures
(solutions) of ideal gases. The partial pressure of each gas was a function of the mole fraction of
the gas in the mixture. For condensed-phase solutions, we describe ideal behavior in terms of the
vapor pressure of the pure component and its partial pressure in the dissolved state. For a
two-component system A and B, where p° is the vapor pressure of the pure component:

$p_A = p^\circ_A\,x_A$ [2.20]

where xA refers to the mole fraction of component A in the condensed-phase solution. Equation
[2.20] means, for example, that the vapor pressure of an ideal solution component of mole fraction
0.5 is equal to half of the vapor pressure of the pure component.

Equation [2.20] is an expression of Raoult's law (strictly speaking, it is a limiting law as x →
1). Unfortunately, Raoult's law has very limited applicability in describing the behavior of most
solutions encountered in materials processing. A constituent conforms more closely to Equation
[2.20] as its mole fraction approaches unity, i.e., as it becomes the solvent constituent. We are
forced to use Raoult's law when we have no other information about a system.

Consider the Cu - Ni system, which exhibits complete solid and liquid solubility*. The two
elements are very similar in chemical and physical properties, so to a first approximation, the
relationship between vapor phase composition and liquid solution (alloy) composition can be
approximated by application of Raoult's law. The vapor pressure of the pure liquid elements is
given by:

$\log p^\circ_{\mathrm{Cu}} = -15\,900/T + 5.60$ [2.21]

$\log p^\circ_{\mathrm{Ni}} = -20\,300/T + 6.43$ [2.22]

where p is expressed in atm, and temperature in K. The equations are valid from 2000 K to the
normal boiling point (3157 K for Ni and 2838 K for Cu). Suppose we are interested in
determining the relationship between the composition of the liquid and the gas phases at 2600 K.
At xCu = 1, the gas phase is pure Cu and p°Cu = 0.305 atm. At xNi = 1, the gas phase is pure Ni
and p°Ni = 0.0419 atm. For Cu - Ni solutions, the partial pressure of each species can be
estimated from Raoult's law.

At xCu = 0.5, pCu = 0.5(0.305) = 0.1525 atm. Similarly, pNi = 0.5(0.0419) = 0.02095 atm.
The alloy vapor pressure is pNi + pCu, or 0.1735 atm. For the gas phase, φCu = 87.9 %, balance
Ni. The xCu in the gas is thus 1.76 times the xCu in the liquid.
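
The arithmetic above is easy to script. The following is a minimal Python sketch, assuming only Equations [2.21] and [2.22] and Raoult's law; the function names are illustrative and are not part of the Handbook software.

```python
def p_pure_atm(a, b, temp_k):
    """Pure-element vapor pressure in atm from log10(p) = a/T + b."""
    return 10.0 ** (a / temp_k + b)


def raoult_partials(x_cu, temp_k):
    """Partial pressures (atm) of Cu and Ni over an ideal Cu-Ni liquid alloy."""
    p0_cu = p_pure_atm(-15900.0, 5.60, temp_k)  # Equation [2.21]
    p0_ni = p_pure_atm(-20300.0, 6.43, temp_k)  # Equation [2.22]
    return p0_cu * x_cu, p0_ni * (1.0 - x_cu)   # Raoult's law, Equation [2.20]


p_cu, p_ni = raoult_partials(x_cu=0.5, temp_k=2600.0)
p_total = p_cu + p_ni
y_cu = p_cu / p_total  # mole (volume) fraction of Cu in the gas phase

print(f"pCu = {p_cu:.4f} atm, pNi = {p_ni:.5f} atm, Ptotal = {p_total:.4f} atm")
print(f"Cu in gas = {100 * y_cu:.1f} %, enrichment = {y_cu / 0.5:.2f}")
```

Changing `x_cu` or `temp_k` repeats the calculation for any other alloy composition or temperature within the valid range of the vapor pressure equations.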

We can use a similar type of calculation to determine the composition of the alloy phase when
the partial pressures of the two elements are the same at 2600 K. The defining relationships are:

p°Cu(xCu) = p°Ni(xNi)

xCu + xNi = 1

With appropriate substitutions, we find that xCu = 0.120, and the partial pressures of each
element are 0.0367 atm.
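
The substitution can be written out explicitly. The short derivation below is a sketch that uses only the two defining relationships and the pure-element vapor pressures at 2600 K quoted earlier; the results are rounded to two significant figures.

```latex
\begin{align*}
p^\circ_{\mathrm{Cu}}\,x_{\mathrm{Cu}} &= p^\circ_{\mathrm{Ni}}\,(1 - x_{\mathrm{Cu}}) \\
x_{\mathrm{Cu}} &= \frac{p^\circ_{\mathrm{Ni}}}{p^\circ_{\mathrm{Cu}} + p^\circ_{\mathrm{Ni}}}
  = \frac{0.0419}{0.305 + 0.0419} \approx 0.12 \\
p_{\mathrm{Cu}} = p_{\mathrm{Ni}} &= p^\circ_{\mathrm{Cu}}\,x_{\mathrm{Cu}}
  \approx 0.305 \times 0.12 \approx 0.037\ \text{atm}
\end{align*}
```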

EXAMPLE 2.10 — Vapor and Liquid Phase Composition for the Cu - Ni System.

Calculate the composition of the vapor and liquid alloy phases for Ptotal = 0.02 atm. Assume
the liquid alloy obeys Raoult's law.

Solution. We know that the sum of pNi + pCu = 0.02 atm, but we don't know the temperature.
However, we know that the temperature will be between 2178 K (xCu = 1) and 2497 K (xNi = 1).
We can write Raoult's law for Cu and Ni between these temperatures. For example, at 2250 K,
p°Cu = 0.0341 atm and p°Ni = 0.00256 atm. A recast Raoult's law expression for each element is:

xCu = 29.3(pCu)

xNi = 391(pNi)

Substituting in that xCu = 1 - xNi and pCu = 0.02 - pNi, we find that pCu = 0.01886 and pNi
= 0.00114 atm. Then in the alloy xCu = 0.552 and by difference, xNi = 0.448. Also, in the gas
xCu = 0.943 and by difference, xNi = 0.057. This calculation is easily repeated for other
temperatures, with the results shown in the table below (the 2250 K column comes from the
preceding arithmetic). Figure 2.7 shows a chart of the results.

* This behavior is sometimes called "miscible in all proportions".

T, K           2178    2200    2250    2300    2350    2400    2450    2497
p°Ni, atm      0.0013  0.0016  0.0026  0.004   0.0062  0.0094  0.0139  0.02
p°Cu, atm      0.0199  0.0236  0.0341  0.0486  0.0682  0.0944  0.1289  0.1707
pNi, atm       0       0.0003  0.0011  0.0026  0.0048  0.0082  0.0132  0.02
pCu, atm       0.02    0.0197  0.0189  0.0174  0.0152  0.0118  0.0068  0
xCu (alloy)    1       0.8368  0.5522  0.3582  0.2226  0.125   0.0527  0
xCu (gas)      1       0.987   0.9427  0.8711  0.7594  0.5901  0.3397  0

Figure 2.7 Isobaric vapor-liquid equilibria for the copper-nickel system at Ptotal = 0.02 atm.
Liquid alloy assumed to follow Raoult's law. The compositions of the gas and liquid phases are
connected by a horizontal (i.e., isothermal) tie line. The lower line (0 symbol) represents the
composition of the saturated liquid at Ptotal = 0.02 atm.
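
The repeated calculation behind the table can be sketched in Python. This is an illustration only: it assumes Raoult's law and Equations [2.21] and [2.22], and it uses the closed form xCu = (Ptotal - p°Ni)/(p°Cu - p°Ni), which follows from pCu + pNi = Ptotal with xNi = 1 - xCu. The function names are hypothetical.

```python
def p_pure_atm(a, b, temp_k):
    """Pure-element vapor pressure in atm from log10(p) = a/T + b."""
    return 10.0 ** (a / temp_k + b)


def isobaric_point(temp_k, p_total=0.02):
    """Return (xCu in alloy, xCu in gas) at temp_k for a fixed total pressure."""
    p0_cu = p_pure_atm(-15900.0, 5.60, temp_k)  # Equation [2.21]
    p0_ni = p_pure_atm(-20300.0, 6.43, temp_k)  # Equation [2.22]
    # Raoult's law for both elements, summed to the total pressure
    x_cu = (p_total - p0_ni) / (p0_cu - p0_ni)
    y_cu = x_cu * p0_cu / p_total
    return x_cu, y_cu


for temp_k in (2200, 2250, 2300, 2350, 2400, 2450):
    x_cu, y_cu = isobaric_point(temp_k)
    print(f"{temp_k} K  xCu(alloy) = {x_cu:.4f}  xCu(gas) = {y_cu:.4f}")
```

The temperatures must lie between the two pure-element limits (about 2178 K and 2497 K at 0.02 atm); outside that range the closed form returns mole fractions that are not physically meaningful.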

Assignment. Calculate and plot the isothermal VLE diagram for the system at 2250 K.

2.8.2 Non-Ideal Solutions — Activity Coefficients

As mentioned earlier, solution phases encountered in materials processing are seldom ideal,
and some are very far from ideal. One way to describe the behavior of a solute is to define how far
it deviates from ideality. A parameter that describes such deviation is called the activity coefficient
and has the symbol γ. Equation [2.20] (Raoult's law) is altered to include an activity coefficient to
define the "extent of non-ideality". In terms of constituent A:

$a_A = \text{activity of A} = \gamma_A\,x_A = p_A / p^\circ_A$ [2.23]

If Raoult's law is valid, γ = 1. Where it is not, γ is some function of composition and
temperature. The activity coefficient may be greater than one, in which case the solution
component is said to deviate positively from Raoult's law. A negative deviation occurs when γ is
less than one. In cases where the deviation from ideality is fairly symmetrical across the range of
composition, a one-parameter equation has been found to be useful — the regular solution
approximation. For a two-component solution of A and B:

$RT \ln \gamma_A = a\,x_B^2$ [2.24]

where a is a constant, independent of temperature and composition, with the same value for both
constituents of a binary solution*. We see that as xA → 1 (i.e., xB → 0), γA → 1, in conformance
with the observation that the solvent constituent becomes more ideal as its composition increases.

* The regular solution approximation can be applied to ternary and higher-order systems, but is beyond the
scope of this Handbook.

Another consequence of the regular solution approximation is that as temperature increases,
the system becomes more ideal, which is in agreement with experience. Although very few
systems follow the regular solution formalism across the entire composition range, it may be quite
adequate over a limited range. Then, a should be derived from experimental data in the
composition range of interest.

At the other extreme of composition (i.e., as xA → 0), the activity coefficient of the dilute
solute (here, γA) approaches a limiting value, designated γ°A. The use of a fixed activity coefficient
for a very dilute solute (i.e., a reference state) is designated as the Henry's law approximation, and
γ° is called the Henry's law activity coefficient. Henry's law is thus a limiting law for the activity
coefficient of a regular solution constituent at infinite dilution. Quite often, the Henry's law
coefficient is satisfactory for dilute solutions up to a mole fraction of solute of 0.02.
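
For a regular solution, the Henry's law coefficient follows directly from Equation [2.24] by letting the solute mole fraction go to zero. The derivation below is a sketch of that limit, using the same symbols as Equation [2.24].

```latex
\begin{align*}
RT \ln \gamma_A &= a\,x_B^2 \\
x_A \to 0 \;\Rightarrow\; x_B \to 1: \qquad RT \ln \gamma^\circ_A &= a \\
\gamma^\circ_A &= \exp\!\left(\frac{a}{RT}\right)
\end{align*}
```

As an illustration, with the Cd - Mg value a = -4100 quoted in Example 2.11 and R = 1.987 at 973 K, this gives γ° of about exp(-2.12), or roughly 0.12, for either constituent at infinite dilution.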

To summarize, for every non-ideal solution, the solvent species activity coefficient
approaches Raoult's law behavior as its mole fraction approaches one, while the solute species
activity coefficient approaches Henry's law as its mole fraction approaches zero. The regular
solution formalism codifies the way that the activity coefficient varies with composition, and
conforms to Raoult's and Henry's law at the extremes.

EXAMPLE 2.11 — Evaporation from Liquid Cd - Mg Alloys at 700 °C.

Mixtures of Cd and Mg are heated to 700 °C to form a liquid solution. A 10.00 kg melt
contained initially wCd = 40.0 %. The melt lost 2.5 g by evaporation. Calculate the composition
of the vapor.

Data. Cd - Mg liquid alloys obey the regular solution approximation, with a = -4100. The vapor
pressure equations for liquid Cd and Mg are:

$\log p^\circ_{\mathrm{Cd}} = -5250/T + 5.05$        $\log p^\circ_{\mathrm{Mg}} = -6900/T + 5.10$

Solution. The loss of 2.5 g will not make a significant change in the alloy composition. The
original (and final) alloy had xCd = 0.126, and contained nCd = 35.58 and nMg = 246.86. At 700
°C (973 K), p°Cd = 0.451 atm, and p°Mg = 0.0102 atm. The regular solution equations for Cd and
Mg are based on Equation [2.24]:

$1.987(973)\ln \gamma_{\mathrm{Cd}} = -4100(0.874)^2$        $1.987(973)\ln \gamma_{\mathrm{Mg}} = -4100(0.126)^2$

Solving, γCd = 0.198 and γMg = 0.967.

The partial pressures of Cd and Mg are calculated from Equation [2.23]:

pCd = 0.451(0.198)(0.126) = 0.0113 atm

pMg = 0.0102(0.967)(0.874) = 0.00862 atm

The Ptotal of the gas is 0.020 atm, with xCd = 0.567, balance Mg. The gas contains wCd =
85.8%, balance Mg.
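
The steps of this example can be checked with a short script. The following Python sketch assumes the regular solution form of Equation [2.24], the vapor pressure equations given in the Data, and standard atomic masses for Cd and Mg (the atomic masses and all names in the code are assumptions made for illustration, not taken from the example).

```python
import math

R_CAL = 1.987                   # gas constant, cal/(mol K), as used in the example
M_CD, M_MG = 112.41, 24.305     # g/mol, standard atomic masses (assumed)
A_REG = -4100.0                 # regular solution constant from the Data
TEMP_K = 973.0


def regular_gamma(a, x_other, temp_k):
    """Activity coefficient from RT ln(gamma_A) = a * x_B**2 (Equation [2.24])."""
    return math.exp(a * x_other ** 2 / (R_CAL * temp_k))


# 10.00 kg melt with 40.0 mass % Cd
n_cd = 4000.0 / M_CD
n_mg = 6000.0 / M_MG
x_cd = n_cd / (n_cd + n_mg)
x_mg = 1.0 - x_cd

p0_cd = 10.0 ** (-5250.0 / TEMP_K + 5.05)  # atm
p0_mg = 10.0 ** (-6900.0 / TEMP_K + 5.10)  # atm

# Equation [2.23]: p = p0 * gamma * x
p_cd = p0_cd * regular_gamma(A_REG, x_mg, TEMP_K) * x_cd
p_mg = p0_mg * regular_gamma(A_REG, x_cd, TEMP_K) * x_mg

y_cd = p_cd / (p_cd + p_mg)                                   # mole fraction in vapor
w_cd = y_cd * M_CD / (y_cd * M_CD + (1.0 - y_cd) * M_MG)      # mass fraction in vapor

print(f"pCd = {p_cd:.4f} atm, pMg = {p_mg:.5f} atm")
print(f"vapor: xCd = {y_cd:.3f}, wCd = {100 * w_cd:.1f} %")
```

Small differences in the last digit relative to the hand calculation come from rounding of intermediate values.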

Assignment. What is the composition of a Cd-Mg alloy that has pCd = pMg at 973 K?

2.8.3 Solutions of Gases in Condensed Phases

Practically every known solid or liquid will dissolve one or more of the common elemental
gases (N2, O2, H2, S2, etc.), sometimes forming a reaction product. This section will deal with gases
that do not form strong bonds or dissociate into ions upon dissolving. HCl, for example,
dissociates and ionizes in water, and as such, is not covered here.

Gases typically dissolve in two forms: molecular and atomic. The first form, molecular
dissolution, occurs when the solute has the same molecular structure in solution as it does in the
gas phase. Examples are O2 and N2 in water. On the other hand, CO2 reacts slightly with water to
form carbonic acid, while also being present as a molecular solute. If the amount of dissolved gas is
very small, the solubility (at a given temperature) is directly proportional to the partial pressure of
the gas in equilibrium with the solution. The equation expressing the relationship between
composition of dissolved gas and its partial pressure is similar to Henry's law equation [2.24],
where the Henry's law constant γ° is often designated by a different symbol. Here we will
continue to use γ°:

$x_{\mathrm{gas}} = p_{\mathrm{gas}} / \gamma^\circ$ [2.25]

At 307 K, the mole fraction of O2 in water at pO2 = 1 atm is 2.0 x 10^-5, so γ° = 5.0 x 10^4. At
25 °C and pCO2 = 1 atm, the molality of CO2 is 0.033 while the molality of HCO3- = 1.2 x 10^-4.
Therefore, ignoring the minor amount of ionization, γ° of CO2 in water = 30 on a molality basis.
γ° increases with increasing temperature, so the solubility of gases in water is less at higher
temperatures. This explains why dissolved gases evolve (as tiny bubbles) when water is heated.

Gases dissolve in metals as atoms, so the solubility of a diatomic gas is proportional to the
square root of the gas pressure. This behavior gives rise to Sieverts' law:

$x_{\mathrm{gas}} = K_S\,(p_{\mathrm{gas}})^{1/2}$ [2.26]

where KS is the Sieverts' law constant. (Note that KS is just 1/γ°). Quite often the composition is
expressed in practical units such as mass fraction expressed as parts per million. This is
satisfactory because when very dilute, the ratio of wppm/x is a constant that can be incorporated in
KS. In addition, since log(KS) is a linear function of reciprocal temperature, it is common to
express Sieverts' law in logarithmic form. For example, the solubility of hydrogen in liquid iron is
given by:

$\log(w_{\mathrm{ppm}}\,\mathrm{H}) = -1637/T + 2.313 + \tfrac{1}{2}\log(p_{\mathrm{H_2}})$ [2.27]

where the pressure is in atm units.
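
Equation [2.27] is convenient to evaluate in code. The Python sketch below assumes the logarithm is base 10 and that temperature is in K; the function name and the 1873 K test temperature are illustrative choices, not values from the text.

```python
import math


def h_in_liquid_fe_ppm(temp_k, p_h2_atm):
    """Dissolved hydrogen in liquid iron (mass ppm) from Equation [2.27]."""
    log_w = -1637.0 / temp_k + 2.313 + 0.5 * math.log10(p_h2_atm)
    return 10.0 ** log_w


w_1atm = h_in_liquid_fe_ppm(1873.0, 1.0)
w_quarter = h_in_liquid_fe_ppm(1873.0, 0.25)

print(f"pH2 = 1.00 atm: {w_1atm:.1f} ppm H")
print(f"pH2 = 0.25 atm: {w_quarter:.1f} ppm H")
```

Reducing the hydrogen pressure by a factor of four halves the dissolved hydrogen, which is the square-root dependence expressed by Sieverts' law.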

EXAMPLE 2.12 — Volumetric Solubility of CO2 in Water.

One liter of water is removed from a capped bottle at 10 °C where the pCO2 inside the bottle
was 2.3 atm. Calculate the ambient volume of CO2 dissolved in the water at pCO2 = 2.3 and 1 atm.

Data. wCO2 in water at 10 °C and 1 atm pCO2 = 0.23 %.

Solution. The composition term given as data must be converted to mole fraction. xCO2 in water
at 10 °C and 1 atm pCO2 = 0.000943, so γ° for CO2 = 1060. The composition will increase by a
factor of 2.3 at 2.3 atm, but the ambient volume will decrease by the same factor. One liter of
water is 55.5 moles, so nCO2 = 0.0523 at 1 atm. The volume of dissolved CO2 is obtained by
application of the ideal gas law, Equation [2.3], where R = 0.08206 L · atm/(mol · K). The
volume of dissolved CO2 is 1.21 L at 1 atm and 10 °C, and is the same at 2.3 atm.
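
The reasoning in this example can be sketched in Python. The script assumes molar masses of 44.01 g/mol for CO2 and 18.015 g/mol for water (not given in the example), treats the solution as dilute so that the moles of CO2 equal the mole fraction times 55.5 mol of water, and uses the ideal gas law. All names are illustrative.

```python
M_CO2, M_H2O = 44.01, 18.015  # g/mol (assumed molar masses)
R_L_ATM = 0.08206             # L atm/(mol K)
TEMP_K = 283.15               # 10 deg C
N_WATER = 55.5                # mol of water in one liter

# Convert 0.23 mass % CO2 (at pCO2 = 1 atm) to mole fraction
w_co2 = 0.0023
x_1atm = (w_co2 / M_CO2) / (w_co2 / M_CO2 + (1.0 - w_co2) / M_H2O)
gamma0 = 1.0 / x_1atm         # Henry's law constant from Equation [2.25] at 1 atm


def dissolved_volume_l(p_co2_atm):
    """Volume (L) of the dissolved CO2, measured at p_co2_atm and 10 deg C."""
    x_co2 = p_co2_atm / gamma0               # Equation [2.25]
    n_co2 = N_WATER * x_co2                  # dilute-solution approximation
    return n_co2 * R_L_ATM * TEMP_K / p_co2_atm


print(f"xCO2 at 1 atm = {x_1atm:.6f}, gamma0 = {gamma0:.0f}")
for p in (1.0, 2.3):
    print(f"pCO2 = {p} atm: V = {dissolved_volume_l(p):.2f} L")
```

Because the dissolved amount rises in proportion to pressure while the gas volume falls in proportion to pressure, the two effects cancel and the computed volume does not depend on pCO2.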

Assignment. The solubility of O2 and N2 in water at 1 atm pressure and 0 °C is:

wN2 = 0.0029 % wO2 = 0.0069 %

Calculate the mole fraction of O2 and N2 in ice water when it is in equilibrium with air at
Ptotal = 1 atm.

2.8.4 The Solubility Limit

Our discussion so far has been on systems where the extent of solution was not limited. For
example, there is no practical limit to the solubility of N2 in water; as we increase pN2, the
solubility increases, and no new species form. Similarly, at 700 °C, Cd-Mg alloys display a
continuous series of liquid solutions. However, a great many processes employ phases that are
sparingly soluble in each other. We consider first the limited solubility between two species that
do not form compounds, such as silver and copper. When a mole of Cu and a mole of Ag are
placed in contact at 900 K and allowed to equilibrate, the Ag will take Cu into solid solution while
Cu takes Ag into solid solution. The solubility limit of Cu in Ag at 900 K is xCu = 0.05, while the
limit for Ag in Cu is xAg = 0.02. These values simply reflect the ability of one element's atoms to
fit into the atomic structure of the other's. The solubility of non-reacting species is usually greater
in the liquid phase than in the solid phase.

For reactive species, the solubility may be limited by the formation of a compound. In the
Ca-Mg system, the compound CaMg2 is so stable that the solid solubility of Ca in Mg and vice
versa is negligible. Even for liquid alloys, the solubility limit for Ca in Mg at 650 °C is xCa =
0.17. A similar compound-formation solubility limit may occur in gas-metal systems. For
example, because of the formation of iron nitride, Fe4N, the solubility limit of N in Fe at 500 K is
wN = 0.0029 %. This occurs at 1780 atm. There is no apparent solubility limit of N in liquid Fe.

The most common way to depict these relationships is by means of a phase diagram. These
are readily available in materials science texts, materials property handbooks (ASM 1992), and
database software cited in General References.

2.8.5 The Solubility of Ionic Species in Water; the Solubility Product

The use of aqueous processing techniques is extremely important in materials engineering.
We are all very familiar with water's ability to dissolve solid compounds, such as salt (NaCl), and
we know that aqueous solutions can react with other substances to form new solutions. In this
section, we will confine our discussion to substances of very low (ionic) solubility in water.

Sometimes it is reported that a substance is "insoluble in water", but in fact, this is never
strictly true. When we place a carefully weighed piece of limestone in distilled water for a few
days, then remove it and weigh it again, we may not detect any change in mass. However, we
know that limestone must be at least somewhat soluble in water because over geologic times,
water flowing underground has dissolved enough limestone to leave large caves and caverns.
Analysis of ground water using highly sensitive techniques shows the presence of almost every
element in the periodic table.

The behavior of slightly soluble ionic solids is very important industrially. The fact that
calcium sulfate is slightly less soluble in hot water than in cold water causes it to coat steam boiler
tubes, reducing thermal efficiency. The fact that calcium oxide is sparingly soluble means that
calcium ions in aqueous solution can be used to "scrub" the SO2 from a gas to form a precipitate
that can be filtered from solution. Finally, the very low solubility of most heavy metal sulfides
means that wastewater streams containing heavy metals can be purified by adding sulfide ions to
the wastewater.

When an ionic substance is placed in contact with pure water, it dissolves. The dissolved
solid dissociates completely into separate hydrated cations and anions. For example, when CaF2 is
placed in water, two F- ions form for every Ca2+ ion. As dissolution proceeds, the concentration of
these ions increases until a dynamic equilibrium is reached between the solute ions and
the remaining solid. Dissolution stops when the solution is saturated, i.e., the solubility limit has
been reached. We can designate the extent of solubility at the saturation limit by use of a solubility
product, Ksp, which is a product of the concentrations of the solute ions. For CaF2:

$K_{sp} = [\mathrm{Ca^{2+}}][\mathrm{F^-}]^2$ [2.28]

where [Ca2+] and [F-] are expressed in molarity units, or mol/L. This is virtually equivalent to
molality in very dilute solutions.

EXAMPLE 2.13 — The Solubility of CaF2 in Water.
Calculate the solubility of CaF2 in water at 25 °C.

Data. The Ksp for CaF2 at 25 °C is 4.0 x 10^-11.

Solution. The stoichiometry of CaF2 dissolution produces three ions, as described above. Let y =
mol/L of CaF2 that dissolves, then y mol/L of Ca2+ and 2y mol/L of F- form. From Equation
[2.28]:

$4.0 \times 10^{-11} = (y)(2y)^2 = 4y^3$

Solving, y = 0.000215 mol/L. Thus, the solubility of CaF2 in water at 25 °C is 2.15 x 10^-4
mol/L, or 17 mg/L.
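
The cube-root solution is a one-line calculation. The following Python sketch assumes a molar mass of 78.07 g/mol for CaF2 (not stated in the example) to convert to mg/L; the variable names are illustrative.

```python
K_SP_CAF2 = 4.0e-11   # solubility product at 25 deg C, from the Data
M_CAF2 = 78.07        # g/mol (assumed molar mass)

# Ksp = (y)(2y)^2 = 4 y^3, so y = (Ksp / 4)^(1/3)
y_mol_per_l = (K_SP_CAF2 / 4.0) ** (1.0 / 3.0)
mg_per_l = y_mol_per_l * M_CAF2 * 1000.0

print(f"solubility = {y_mol_per_l:.3e} mol/L = {mg_per_l:.0f} mg/L")
```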

Assignment. At a higher temperature, the solubility was found to be 21 mg/L. Calculate the value
of Ksp for CaF2.

It's very important to distinguish between the solubility of a given species at saturation and its
solubility product. Ksp has only one value for a species at a given temperature. The saturation
solubility of the species in pure water is unique because of the stoichiometry of dissolution.
However, if the water is not pure, and especially if common ions are present prior to the
introduction of the solid into the water, the saturation solubility has an infinite number of possible
values at a given temperature*.
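
The effect of a common ion can be illustrated numerically. The Python sketch below is a simplified illustration, not a method from the Handbook: it assumes concentrations can be used in place of activities, that CaF2 is the only source of Ca2+, and that the water already contains a hypothetical fluoride concentration c (mol/L). It then solves y(2y + c)^2 = Ksp for the solubility y by bisection.

```python
def caf2_solubility(k_sp, fluoride_initial=0.0):
    """Solve y * (2y + c)**2 = k_sp for y (mol/L) by bisection.

    fluoride_initial is c, the F- concentration (mol/L) already present
    in the water before any CaF2 dissolves (a hypothetical input).
    """
    def excess(y):
        return y * (2.0 * y + fluoride_initial) ** 2 - k_sp

    lo, hi = 0.0, 1.0          # excess(lo) < 0 and excess(hi) > 0 for small k_sp
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


K_SP_CAF2 = 4.0e-11
for c in (0.0, 0.001, 0.01):
    y = caf2_solubility(K_SP_CAF2, c)
    print(f"initial F- = {c:<6} mol/L  ->  CaF2 solubility = {y:.3e} mol/L")
```

With no fluoride present, the result matches the pure-water solubility of Example 2.13; each added level of fluoride gives a different, lower saturation solubility even though Ksp is unchanged.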

The above discussion is intended to give only a brief introduction to the subject of ionic
dissolution and solubility limit. Solubility calculations can be more complex if two-element solids
form more than two ionic species when they dissolve, or when the ions interact with water to
change the pH.

2.9 Summary

Thermophysical properties of substances and solutions are available in one of the General
Reference works cited on page 605. The state of a phase or a system is described by the properties
of that phase or system, and if a sufficient number of state properties are defined, the phase or
system is said to be specified, or fixed. The properties of a fixed system are related by equations
of state, either derived from fundamental principles or empirical in nature. The most common
equation of state for a gas is the ideal gas law, which is closely approximated at the elevated
temperatures and ambient pressures encountered in many industrial processes. One mole of an
ideal gas occupies a volume of 22.414 L at STP.

The vapor pressure of a condensed phase is uniquely described by a mathematical relationship
with temperature and total pressure. The logarithm of the vapor pressure is nearly linear with
respect to reciprocal temperature. The difference between the vapor pressures of substances may
be employed to separate them during a refining process. Pressure has a very small effect on the
volume of condensed phases, but the effect of temperature is large enough to be significant in
certain situations. The external pressure has a similarly small effect on the temperature of a phase
transformation in a one-component system, and an external pressure created by an inert gas has a
small effect on the vapor pressure of a substance.

Gases may deviate significantly from ideal behavior at elevated pressure. Most commonly,
high-pressure steam and compressed air are non-ideal. If departure from ideality is limited, the
P-V-T relationship may be described by the van der Waals equation of state. For the most accurate
P-V-T description, extensive tables are available that relate the thermophysical properties of
non-ideal gases. Computer programs for making calculations on steam are available on the web.

In a mixture of gases, each species has a pressure (its partial pressure) which is determined by
the total pressure, and the volume fraction of that species. The tendency of a gas species to
condense from a mixture is indicated by its dew point temperature (dpt), or the percent relative
humidity of the condensable gas. A gas mixture is saturated with the condensable gas at the dew
point (where the %RH is 100 %).

* The concentration units for Ksp used here are mol/L, but are equivalent to molality at high dilution.

Condensed-phase solutions are very common in materials processing and extraction
metallurgy. Ideal solutions are those rare cases in which the solute species obey Raoult's law,
which specifies that the vapor pressure of a solute be in direct proportion to its mole fraction. In
some cases, the solute species obey the regular solution formalism, which requires the use of an
activity coefficient. Very dilute solutions, such as gases dissolved in water, can be described by
Henry's law, which uses a fixed value of the activity coefficient for the solute species. The
pressure-composition relationship for diatomic gases that dissolve atomically in metals (such as
H2, N2, etc.) is given by Sieverts' law, which requires use of the square root of gas pressure.

Ionic solids dissolve in water to form ionic solutions. The solubility product can be used to
calculate the solubility of sparingly soluble solids. The presence of other ions can have a great
effect on the solubility of an ionic solid.

References and Further Reading

Archon Engineering, Steam Tables, [Online], http://www.archoneng.com/steam.html. 2001.

Baker, Hugh, editor, ASM Handbook, vol. 3, Alloy Phase Diagrams, ASM International, 1992.

ChemicaLogic Corporation, SteamTab Companion, [Online].
http://www.chemicalogic.com/steamtab/companion/default.htm. November 2003.

Ferguson, F. D. and Jones, T. K., The Phase Rule, Butterworths, 1966.
Hillert, Mats, Phase Equilibria, Phase Diagrams, and Phase Transformations: Their
Thermodynamic Basis. Cambridge University Press, 1998.

Lemmon, E. W., Jacobsen, R. T., Penoncello, S. G., and Friend, D. G., "Thermodynamic
Properties of Air and Mixtures of Nitrogen, Argon and Oxygen from 60 to 2000 K at Pressures to
2000 MPa", Text and Tables, JPCRD, Vol. 29, No. 3, 2000.

Lemmon, E. W., McLinden, M. O., and Huber, M. L., NIST Reference Fluid Thermodynamic and
Transport Properties Database (REFPROP), vers. 7.0, NSRDS 23, 2002.

Linric Company, PsyCalc 98, [Online]. http://www.linric.com/webpsysi.htm and
http://linricsoftw.webl27.discountasp.net/webpsycalc.aspx. 1998.

Material Properties, Engineering Toolbox, [Online].
http://www.engineeringtoolbox.com/material-properties-t_24.html.

MegaWatSoft, Steam97Web [Online].
http://www.steamtablesonline.com/steam97web.aspx. 2008.

U. S. Environmental Protection Agency on-line course, Basic Concepts in Environmental Sciences,
Module 2: Characteristics of Gases, Module 3: Characteristics of Particles, Module 4: Liquid
Characteristics, http://www.epa.gov/air/oaqps/eog/bces/. December 2009.

Wikipedia contributors, "Phase (matter)", "Thermodynamic equilibrium", "Gibbs phase rule",
"Ideal gas", "Ideal gas law", "Equation of state", "Standard conditions for temperature and
pressure", "Compressibility", "Compressibility chart", "Van der Waals equation", "Thermal
expansion", "Evaporation", "Vapor pressure", "Dew point", "Relative humidity",
"Psychrometrics", "Steam", "Water (data page)", "Activity (chemistry)", "Raoult's law", "Regular
solution", "Henry's law", "Activity coefficient". Wikipedia, The Free Encyclopedia, July 2010,
http://en.wikipedia.org/wiki/Main_Page.

Exercises

2.1 Calculate the mass of air which would occupy 1.00 m3 at STP, assuming air consists of φN2 =
0.79 and φO2 = 0.21. Calculate the volume occupied by one kg of air at 100 kPa and 1000 °C.

2.2 A mass spectrometer was used to analyze a sample of dry air. The results were:

wC = 0.000125; wN = 0.755267; wO = 0.231781; wAr = 0.012827

Calculate the volume percent of each molecular species in the sample, and the molar mass of
the air.

2.3 CH4 occupies 1.00 m3 of volume at STP. How many moles of CH4 are present?

2.4 Liquid oxygen has a density of 1142 kg/m3 at its normal boiling point of -183 °C. What is the
percent increase in volume at -183 °C after evaporation? The vdW constants for O2 are:

a = 1.36 L^2 · atm/mol^2 b = 0.0318 L/mol

2.5 The volume fraction of argon in dry air is about 0.93 %. Argon can be separated by fractional
distillation of liquid air; normally, a recovery of 92 % is attained. How many kg of air are required
to produce 1 m3 Ar at STP?

2.6 Air is used to cool an engine. 145 kg of air enters the cooler via a duct at a pressure of 131
kPa and 42 °C, and exits at 105 kPa and 78 °C. Calculate the linear velocity of air in the incoming
and exit stream if the diameter of the duct is 128 cm.

2.7 A titanium tank of inside diameter 1.10 m and depth 2.30 m contains 2005 L of water at 25 °C.
The water (and of course the tank) is heated to 100 °C. Is there any danger of the water
overflowing the tank?

2.8 A steel tube of diameter 0.500 in and length 10.0 ft is filled with water at 25 °C and 15.0 psia.
How much additional water can enter the tube if the pressure increases to 200.0 psia?

2.9 One kg of air at 300 K and 20 bar pressure has a volume of 0.0428 m3. Calculate the
compressibility factor z.

2.10 Use one of the on-line or demo steam property calculators to calculate the change in volume
for 1 kg of water evaporating to saturated steam over the range 30 ° to 130 °C. Calculate the
percentage difference between the actual volume of the steam and that predicted by the ideal gas
law.

2.11 350 L of air at 1.28 atm, 44.0 %RH and 325 K is bubbled through a large amount of water at
325 K. Calculate the ambient volume of the vapor-saturated air leaving the water. Check your
answer with the PsyCalc (or other) program.

2.12 Calculate the volume fraction of oxygen in steam-saturated air at 321 K and 785 mm Hg.

2.13 126 L of steam-saturated air at 136 kPa and 305 K is added to a similar volume of dry air at
136 kPa and 305 K. Calculate the % RH.

2.14 Steam-saturated air at 1.35 atm and 305 K is expanded to 1.04 atm. Calculate the %RH and
dew point. Check your answer with the PsyCalc program.

2.15 A gas has the following composition: φCd = 0.0160, φZn = 0.0025, remainder N2. Estimate
the dew point of this gas at a total pressure of 105 kPa, assuming the condensed phases obey
Raoult's law.

2.16 Examine one of the General Reference books, or search the internet, to find the
thermophysical properties of graphite and diamond. Use the data to calculate the effect of pressure
on the transformation temperature of graphite to diamond.

2.17 The vapor pressure of zinc was measured in equilibrium with liquid Mg-Zn alloys at 650 °C,
with the following results:

xZn 0.1 0.2 0.3 0.4
pZn, atm 0.00091 0.00229 0.00455 0.00750

Make calculations to determine if liquid Zn conforms to the regular solution approximation in
this system. The vapor pressure of liquid zinc is given by: log(p°Zn), atm = -6120/T + 5.19.

2.18 Copper iodate Cu(IO3)2 has a solubility in water at 25 °C of 3.3 x 10^-3 mol/L. During
dissolution, it ionizes to Cu2+ and IO3-. Calculate the Ksp for Cu(IO3)2.

2.19 Liquid Fe at 1840 K is exposed to N2 at 1 atm until equilibrium is reached. The metal was
sampled and found to contain wN = 0.046 %. Calculate the pN2 in equilibrium with the molten Fe
for a wN = 0.023 %.

CHAPTER 3

Statistical Concepts Applied to Measurement and Sampling

Engineers prepare material and energy balances for a variety of reasons and at various stages in the
life of a process or plant. At the design stage, planners make various assumptions concerning
efficiency of reactions, heat loss rates, etc., and they compute theoretical balances to set the
process flow rates, temperatures, efficiencies, equipment sizes and so on. The balances are exact
but theoretical, and satisfy all equations precisely. They might also conduct sensitivity analyses,
which investigate how sensitive the conclusions are to the various assumptions made. Throughout
the design stage, designers must estimate the errors associated with the material property and
equipment specifications.

Once a plant or process is built and operating, managers may wish to optimize its control or
economic performance. This means the operator needs to compute material and/or energy
balances on the real system using actual measurements of flow rates, chemical compositions,
temperatures, and the like. Then, using those real-world balances, the operator can adjust various
process factors to improve performance. Any recommendation for a process change based on plant
measurement must recognize that errors can occur when taking and analyzing samples.

A theoretical material or energy balance often requires the use of tabular data, such as the
solubility of a gas in water, or the enthalpy of superheated steam. In some cases, it's difficult to
incorporate tables into computerized process simulations. It's therefore necessary to develop
equations that can faithfully reproduce the tabular data, and this in turn requires the data to be
statistically analyzed to determine how many equation terms and what kind, are required.

Real-world material balances require samples from heterogeneous materials, indirect
measurements or calculations of flow rates, temperature measurements under extreme
environmental situations, and sometimes-arduous chemical analyses, all of which have some
degree of error associated with them. Therefore, actual balances may have considerable
uncertainty in the results. For example, the presence of unaccounted-for or trace elements in a
material mixture can give unknown or unacceptable errors in the material and energy balances.

Finally, in process analysis, we often take repeated samples in preparation for an
improvement campaign. We should take enough samples to get representative results so we can
make a valid statistical analysis, or use the results to develop a process model. In doing this, we
are always faced with the question of how many samples are enough, or if there is some source of
bias that may be affecting the results.

This Chapter introduces basic statistical concepts and shows how to use statistical tools to
analyze experimental or plant data. Statistics deals with the collection, analysis, interpretation, and
presentation of numerical data. Descriptive statistical methods help summarize or describe data
(for example by computing the average daily temperature or rainfall to describe a climatic area).
Inferential statistical methods allow us to characterize a larger system from a smaller body of data.
This is exactly what we are trying to do when we take samples to compute energy and material
balances for a physical process. Accordingly, after two introductory sections on basic ideas, we
will focus on inferential statistical tools that are important for process-oriented engineers.

The focus is on data analysis, using visual techniques to display the data before making
calculations. The Chapter coverage is not a substitute for a course in engineering statistics, and the
ideas mentioned here are not covered in full detail. Several reference texts are cited at the end of
the Chapter for persons seeking a more in-depth coverage. The NIST/SEMATECH e-Handbook
of Statistical Methods (NIST-Gen 2006) provided many of the ideas and some of the examples

Chapter 3 Statistical Concepts Applied to Measurement and Sampling 55

used in the Chapter. The principal computational and graphing tool is Excel, which we used in
Chapters 1 and 2. Excel has many built-in statistical tools that we will use throughout the Chapter.
Please consult the end-of-Chapter references for help with Excel's tools.

3.1 Basic Statistical Concepts and Descriptive Tools.

This section introduces the main statistical tools that are commonly used to describe or
summarize a set of measurements. These tools constitute the foundation for the more complex
tools developed later in the Chapter. We show how to use Excel's tools to carry out statistical
analysis in this document and in workbook StatTools.xls on the Handbook CD (folder Statistics).

An example can help make things concrete. Copper is an impurity in flat-rolled steel.
Steelmakers are concerned with the copper content because if it is too high, the steel is more
difficult to process, leading to a higher cost to the producer. Table 3.1 shows a company's
measurements of copper content of steel in 50 batches produced over a three-day period.

Table 3.1 %Cu in 50 batches of steel.

0.205 0.143 0.113 0.219 0.173 0.143 0.133 0.167 0.190 0.169
0.224 0.154 0.064 0.137 0.172 0.237 0.180 0.167 0.120 0.150
0.155 0.123 0.131 0.129 0.173 0.179 0.246 0.114 0.159 0.210
0.125 0.176 0.197 0.145 0.138 0.155 0.154 0.141 0.175 0.166
0.179 0.178 0.170 0.236 0.207 0.150 0.150 0.120 0.179 0.149

The 50 measurements of %Cu comprise a statistical sample. This sample is listed in
worksheet "Steel" in the Excel statistics workbook (StatTools.xls) on the Handbook CD. In
general, statisticians use the word sample to refer to any collection of objects being measured in an
experiment. It is possible each object will be measured in more than one way. For example, you
might measure not just the copper content, but the amount of other impurities in each batch, the
finishing temperature of each batch, the inclusion count of each ingot, etc. Each such
measurement is referred to as a variable.

It's impossible to hold 50 three-decimal numbers in one's head, or comprehend all the fine
detail in the above set of numbers. As a result, people look for ways to summarize a set of
numbers in a way that retains the useful information. For example, the company could calculate
the average composition of three days' steel production, or note that five batches had copper
contents above 0.21%. Calculations based on information in the sample are called statistics, and
things like averages are called summary or descriptive statistics. In addition, because a picture is
usually better than a descriptive text, people often find ways to present such data graphically, using
things like bar charts or x-y plots. A number of standardized procedures have been developed to
display and analyze experimental data (NIST-Gen 2006).

The engineer is usually not interested in the sample for its own sake. In the example above,
the company probably doesn't care about the specific fact that the tenth batch contained 0.172
%Cu. Rather, they are more likely to be interested in demonstrating that the copper content falls
(or almost certainly will fall) within a desired composition range. In addition, they may want to
search for a statistically significant relationship between the tensile strength of the steel and the
%Cu. Statisticians use the words population or process to refer to the collection of all objects
similar enough to the sample that the measurements of the sample objects should give a good idea
about the (unknown) measurements of objects not in the sample. The collection of every batch of
steel made by your company would be a population. In real life, one never knows everything
about the population or process. The steel company may want to develop a correlation between
tensile strength and copper content on each batch of steel rather than running tensile strength tests
on all batches.

The more important function of statistics is the process of making statements about the
population using only the information in a sample, i.e., a portion of the population. This process is
called inferential statistics. Some questions that the company might try to answer based on this
sample of 50 measurements are: Can they estimate the largest %Cu likely to occur in a month's
worth of (say) 240 batches? If the steelmaking process is changed, and another 50 batches are
analyzed, how do they know if the change has significantly affected the copper content? They may
also be interested in developing a model based on statistical data that can predict steel ductility as a
function of the amount of all of its alloying ingredients. We will spend the rest of Section 3.1
talking about important descriptive statistics and graphs, while addressing questions of inferential
statistics in later sections.

3.1.1 Histograms and Frequency Distributions

When test results are presented in a journal article or at a conference, the author often uses a
frequency distribution or a histogram to display the measured variable. A frequency distribution is
a table summarizing the different values occurring in the variable by subdividing the total range of
values into sub-ranges (called bins), and counting the number of values in each sub-range. Table
3.2 is a frequency distribution for the copper content data presented earlier in Table 3.1.

Table 3.2 Frequency distribution table for %Cu in steel.

Range       (0.06, 0.09]  (0.09, 0.12]  (0.12, 0.15]  (0.15, 0.18]  (0.18, 0.21]  (0.21, 0.24]  (0.24, 0.27]
Frequency        1             4            15            20             5             4             1

This shows there was one %Cu measurement between 0.06 and 0.09, 4 measurements
between 0.09 and 0.12, etc. The math symbol ")" or "(" in an interval means don't include the
boundary value, while " ] " or "[" means do include the boundary value. Thus, our measurement of
0.180 %Cu is included in (0.15, 0.18], not (0.18, 0.21].
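
The counting rule behind Table 3.2 can be sketched in a few lines of Python. This is only an illustration and not part of the Excel workbook; the function name frequency_table and the choice to work in integer thousandths (so that boundary values such as 0.180 are compared exactly) are assumptions made for this sketch.

```python
from bisect import bisect_left

# %Cu values copied from Table 3.1
CU = [
    0.205, 0.143, 0.113, 0.219, 0.173, 0.143, 0.133, 0.167, 0.190, 0.169,
    0.224, 0.154, 0.064, 0.137, 0.172, 0.237, 0.180, 0.167, 0.120, 0.150,
    0.155, 0.123, 0.131, 0.129, 0.173, 0.179, 0.246, 0.114, 0.159, 0.210,
    0.125, 0.176, 0.197, 0.145, 0.138, 0.155, 0.154, 0.141, 0.175, 0.166,
    0.179, 0.178, 0.170, 0.236, 0.207, 0.150, 0.150, 0.120, 0.179, 0.149,
]

def frequency_table(values, edges_milli):
    """Count values in right-closed bins (lo, hi]; edges are given in thousandths."""
    counts = [0] * (len(edges_milli) - 1)
    for v in values:
        m = round(v * 1000)  # integer thousandths avoid floating-point edge effects
        if not edges_milli[0] < m <= edges_milli[-1]:
            raise ValueError(f"{v} falls outside the bins")
        # bisect_left finds the first edge >= m, so a boundary value joins the lower bin
        counts[bisect_left(edges_milli, m) - 1] += 1
    return counts

EDGES = list(range(60, 271, 30))  # 0.06, 0.09, ..., 0.27 expressed in thousandths
counts = frequency_table(CU, EDGES)
relative = [c / len(CU) for c in counts]

for lo, hi, c, r in zip(EDGES, EDGES[1:], counts, relative):
    print(f"({lo / 1000:.2f}, {hi / 1000:.2f}]  {c:2d}  {r:.0%}")
```

Dividing each count by the sample size, as in the last column printed, gives the relative frequency of each bin.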

Frequency distributions permit us to display large sets of numbers in simpler, more organized
fashion. They also help us begin seeing patterns in the numbers that can be lost in the sheer mass
of data. For example, it's clear that most of the %Cu measurements are near the center, with a few
values on the small side and a few values on the high side. Statisticians sometimes use the phrase
"distribution of the data" to refer to qualitative descriptions like this.

Sometimes, instead of reporting the number of measurements in a given range, researchers
report the relative frequency, that is, the percentage of the sample that falls in each bin. Table 3.3
shows a relative frequency distribution for the copper data.

Table 3.3 Relative frequency distribution table for %Cu in steel.

Range       (0.06, 0.09]  (0.09, 0.12]  (0.12, 0.15]  (0.15, 0.18]  (0.18, 0.21]  (0.21, 0.24]  (0.24, 0.27]
Rel. Freq.       2%            8%           30%           40%           10%            8%            2%

The next step is to pick the ranges for a frequency distribution. Statisticians use a particular
kind of column chart called a histogram to represent frequency distributions and relative frequency
distributions. You'll see column charts used in newspapers and textbooks to represent a variety of
data, but most of those column charts are not histograms. What makes a column chart into a
histogram is that the y-axis of the histogram always has to have either a frequency or a relative
frequency. Thus, a histogram is exactly what you'd get when you used a frequency distribution or
a relative frequency distribution to make a column chart.

The instructions for creating a frequency distribution table are on an embedded WordPad
document in worksheet "Steel", of StatTools.xls, which contains the copper content data presented
in Table 3.1. The document describes how to determine the number of bins, their width, and where
the first bin's boundary should be. Subsequent examples are based on material from this
document. The instructions for making a histogram are contained in the same document. In
addition to producing a chart, Excel's histogram tool also gives additional statistical parameters for
the data. Figure 3.1 shows the two types of copper distribution histograms.

[Figure: two column charts, each titled "Histogram of %Cu in 50 Steel Batches". Left panel: frequency versus %Cu bin; right panel: relative frequency versus %Cu bin. Bins run from (0.06, 0.09] to (0.24, 0.27].]

Figure 3.1 Histograms based on Tables 3.2 and 3.3.

Notice that the histograms look pretty similar. That's because all we've done is change the
vertical scale. The purpose of a histogram is to allow the analyst to see how the measurements of
the variable are distributed. The "shape" of the data is much more important than the scale. Some
typical questions one might answer from a histogram are: are the measurements mostly clumped
in the center, with just a few at each end, or are they spread out more evenly? Are the
measurements symmetric around the center, or are the measurements on the right much further
away from the center than the measurements on the left? The "shape" of the histogram data can
help the analyst decide how best to analyze the data, as will be seen in section 3.3.

There is no single best histogram for most datasets. For the examples above, instead of
grouping our measurements from 0.06 to 0.09, 0.09 to 0.12, and so forth, we could instead have
grouped them from 0.06 to 0.10, 0.10 to 0.14, and so on up to 0.22 to 0.26. Histograms can have
different numbers of bins, different bin widths, and different placement of bins. You have the
choice of allowing Excel to make these decisions for you or to make them yourself by telling Excel
what you want. A WordPad document embedded in the "Steel" worksheet of workbook
StatTools.xls takes you through both the default histogram, where Excel makes all the decisions, as
well as the individualized histogram, like the ones above where the authors made all the decisions.
The document goes into detail on selecting bin size, range, etc. Different decisions about bins can
result in rather different looking histograms. If time permits, it's a good idea to try different
choices to see which best displays the data. Because of this plurality of histograms, they should be
thought of as an approximate, qualitative presentation of the data, rather than as an exact,
quantitative description. Excel also has a number of other chart display options that may give
additional insight into the distribution of results.
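
As a rough illustration of how bin choices change the picture, the sketch below counts the Table 3.1 data twice: once with the 0.03-wide bins of Table 3.2 and once with the 0.04-wide alternative mentioned above. The helper names bin_counts and text_histogram are hypothetical, and the text bars are only a stand-in for Excel's column chart.

```python
from bisect import bisect_left

# %Cu values copied from Table 3.1
CU = [
    0.205, 0.143, 0.113, 0.219, 0.173, 0.143, 0.133, 0.167, 0.190, 0.169,
    0.224, 0.154, 0.064, 0.137, 0.172, 0.237, 0.180, 0.167, 0.120, 0.150,
    0.155, 0.123, 0.131, 0.129, 0.173, 0.179, 0.246, 0.114, 0.159, 0.210,
    0.125, 0.176, 0.197, 0.145, 0.138, 0.155, 0.154, 0.141, 0.175, 0.166,
    0.179, 0.178, 0.170, 0.236, 0.207, 0.150, 0.150, 0.120, 0.179, 0.149,
]

def bin_counts(values, edges_milli):
    """Count values in right-closed bins (lo, hi]; edges are given in thousandths."""
    counts = [0] * (len(edges_milli) - 1)
    for v in values:
        m = round(v * 1000)
        if not edges_milli[0] < m <= edges_milli[-1]:
            raise ValueError(f"{v} falls outside the bins")
        counts[bisect_left(edges_milli, m) - 1] += 1
    return counts

def text_histogram(values, edges_milli):
    """Print one row of '#' marks per bin as a crude histogram."""
    rows = zip(edges_milli, edges_milli[1:], bin_counts(values, edges_milli))
    for lo, hi, c in rows:
        print(f"({lo / 1000:.2f}, {hi / 1000:.2f}] {'#' * c} {c}")

narrow = list(range(60, 271, 30))  # 0.06 to 0.27 in steps of 0.03 (Table 3.2)
wide = list(range(60, 261, 40))    # 0.06 to 0.26 in steps of 0.04 (alternative)

text_histogram(CU, narrow)
print()
text_histogram(CU, wide)
```

With the wider bins, more than half of the batches land in the single bin (0.14, 0.18], so the central clump looks sharper than it does with seven bins. Neither picture is wrong; they simply emphasize different features of the same data.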

EXAMPLE 3.1 — A Histogram of Ceramic Strength Measurements.

A researcher measured the strength of different batches of silicon nitride (Si3N4). These
measurements were part of a larger experiment whose goal was to determine how to manufacture a
product of maximum strength (NIST-Cer, 2006). Use the instructions from worksheet "Steel" to
design a frequency distribution for this data, and plot a corresponding histogram.

Data. 480 measurements appear on the "CerStr" worksheet of workbook StatTools.xls. The data
range from a low of 345.294 to a high of 821.654, and are measured to three decimal places.

Solution. We start by defining the bin properties.

(a) How many bins should we use? Log2 480 = 8.91, so we consider using about 9 bins.

(b) How wide should each bin be? (821.654 - 345.294)/9 = 52.93. Since most folks have a
preference for round numbers, we round this off to a bin width of 50. Since we're rounding down,
we'll have more than 9 bins.

(c) Where should the boundaries of the first bin be? As usual, there is less guidance here than
for the other questions, and you may want to try several different choices and see if there's a major
difference in the resulting histogram. However, we will stick with the idea that round numbers are
easily understood, and use (300, 350] for the first bin. Table 3.4 shows the resulting frequency
distribution and Figure 3.2 the histogram. The shape of this data is roughly symmetric, but with a
longer tail on the left than on the right.

Table 3.4 Frequency distribution of the strength of 480 batches of silicon nitride.

Range        Frequency    Range        Frequency
(300, 350]        1       (600, 650]      116
(350, 400]        1       (650, 700]       97
(400, 450]        4       (700, 750]       97
(450, 500]        5       (750, 800]       40
(500, 550]       26       (800, 850]        3
(550, 600]       90

[Figure: column chart titled "Histogram of Ceramic Strengths"; y-axis: frequency; x-axis: strength of ceramic, in bins from (300, 350] to (800, 850].]

Figure 3.2 Histogram based on Table 3.4.
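
The three bin-design steps (a) to (c) can be retraced with a short calculation. The sketch below uses only the sample size and the reported minimum and maximum, not the 480 raw measurements; the variable names are illustrative, and the rounded width of 50 and first edge of 300 are the same judgment calls made in the solution.

```python
import math

n = 480                       # number of strength measurements
low, high = 345.294, 821.654  # reported minimum and maximum

# (a) rule of thumb: roughly log2(n) bins
suggested_bins = math.log2(n)

# (b) raw bin width from the data range, then a rounder value chosen by eye
raw_width = (high - low) / round(suggested_bins)
width = 50

# (c) start at a round number below the minimum and step until the maximum is covered
first_edge = 300
edges = [first_edge]
while edges[-1] < high:
    edges.append(edges[-1] + width)
n_bins = len(edges) - 1

print(f"suggested bins: {suggested_bins:.2f}")
print(f"raw width:      {raw_width:.2f}")
print(f"edges:          {edges}")
print(f"bins used:      {n_bins}")
```

Because the width was rounded down and the first edge was moved below the minimum, the final count of bins exceeds the nine suggested by the rule of thumb, which matches the number of rows in Table 3.4.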

Assignment. A WO3 reduction process aims to produce metallic tungsten. Owing to product
porosity, a certain percentage of oxygen will remain in the reduced product. Measurements of the
remaining oxygen from 351 batches of reduced WO3 are shown on worksheet "WO" in
StatTools.xls. The numbers are presented as %O x 10^3; thus, the first measurement of 52.6
represents 0.0526 %O. Use Excel to make a histogram for this data. Justify your choices of bin
number, bin width, and bin boundaries.

3.1.2 Mean, Standard Deviation, and Variance

The three most important summary statistics of a population are the mean, standard deviation,
and variance. The first of our summary statistics, the mean, is just another word for the ordinary
arithmetic average. Add up all the numbers, and divide by how many of them there are.
Algebraically, if $x_1$ represents the first number, $x_2$ the second number, etc., and the symbol for the
mean of x is $\bar{x}$, Equation [3.1] gives the formula for the mean.

Sample mean = $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ [3.1]

In statistics, n represents the number of points, so another name for n is sample size. The
mean is the most commonly used summary statistic when people try to define the center of a
variable. The mean defines the center of a variable in the sense of a center of mass; that is, if
uniform masses were placed on a yardstick, one for each measurement in the appropriate location,
the mean is where the fulcrum would be to get the yardstick to balance.

The other common type of summary statistic is a measure of dispersion, which quantifies how
spread out the set of numbers is from its center. The most common statistical measures of
dispersion are standard deviation and variance. Since their formulas are not quite intuitive, we'll
spend a paragraph or two deriving them.

An intuitive measure of "closeness" would be to find the difference between each of the
points and the center — in this case, the mean — and average those differences together. The
formula for that would be:

$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$

The trouble here is that all the positive differences will cancel all the negative differences, so
the average is always 0, which doesn't say much. The obvious way to fix this is to use absolute
values in the formula to ensure that all of our differences are positive:

$\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$

This is a perfectly reasonable idea but, for reasons beyond the scope of this text, statisticians
prefer a different way of ensuring the differences become positive, which is to square them. So the
formula now looks like this:

$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

This is almost what we want. However, it can be shown mathematically that using this
formula on a sample will, on average, slightly underestimate the "true" dispersion in the population,
and that the correct fix is to divide by n - 1 rather than by n. Thus, we finally reach the formula for the
variance of a sample, whose symbol is $s^2$:

Sample variance = $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ [3.2]

However, the units for the variance are rarely convenient. For example, if x were some
variable measured in kilograms, then the variance would have units of kg^2, which are hard to
interpret, to say the least. Therefore, the standard deviation is the square root of the variance, and
will have the same units as the original variable x:

Sample standard deviation = $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ [3.3]

There are two key ideas about these measures of dispersion that are important going forward.
First, although it is not literally true, the standard deviation is still best thought of as the average
difference between individual measurements and the mean of those measurements. Second, the
usual way to measure or describe how much a dataset varies around a given value is to use the sum
of the squared differences from that value. This concept will appear again in sections 3.3 and 3.4.
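
The chain of reasoning that leads to Equations [3.2] and [3.3] can be checked numerically. The sketch below uses a small hypothetical dataset (eight made-up values, not taken from the Handbook) so that every intermediate result is easy to verify by hand.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, chosen for easy arithmetic
n = len(data)
mean = sum(data) / n             # Equation [3.1]

# Step 1: signed differences cancel, so their average says nothing about spread
mean_signed_dev = sum(x - mean for x in data) / n

# Step 2: absolute values prevent the cancellation
mean_abs_dev = sum(abs(x - mean) for x in data) / n

# Step 3: squaring also prevents cancellation; this sum of squares is reused below
sum_sq = sum((x - mean) ** 2 for x in data)
var_divide_by_n = sum_sq / n

# Step 4: dividing by n - 1 gives the sample variance, Equation [3.2]
sample_var = sum_sq / (n - 1)

# Step 5: the square root restores the original units, Equation [3.3]
sample_std = math.sqrt(sample_var)

print(f"mean                  = {mean}")
print(f"mean signed deviation = {mean_signed_dev}")
print(f"mean abs deviation    = {mean_abs_dev}")
print(f"variance (divide by n)   = {var_divide_by_n}")
print(f"variance (divide by n-1) = {sample_var:.3f}")
print(f"standard deviation       = {sample_std:.3f}")
```

The divide-by-n result is smaller than the divide-by-(n - 1) result for any dataset with more than one distinct value, and the gap shrinks as n grows.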

Excel has formulas for the mean, variance, and standard deviation. Consider the %Cu data on
worksheet "Steel" on workbook StatTools.xls. The formula for the mean in Excel is
"AVERAGE(range)". The compositions are in cells A3 through E12, so the relevant formula in
Excel is "=AVERAGE(A3:E12)". Either type the formula in (see cell E15), or use Excel's "Insert
Function" feature. To do the latter, click on the little fx to the left of the cell reference box. In the
resulting dialog box, pick "Statistical" from the Function Category box, and then click on
"AVERAGE" in the box to the right. Then either type A3:E12 in the resulting dialog box, or use
the mouse to select these cells. Help is available from the Insert Function dialog box. No matter
how you do it, you should get 0.163 as the mean (of %Cu) in cell E15.

How many significant digits should the mean have? Each measurement is reported to 3
digits. Suppose, for the moment, that all three digits are significant. According to the calculation
guidelines in section 1.8, the mean should also have three significant digits. If Excel does not
display the correct number of significant digits, click on the relevant cell, and choose
Format/Cells... Click on the Number tab at the top, then the Number entry in the Category list.
Finally, choose the correct number of decimal places.

The Excel formula for standard deviation is "=STDEV(range)", while the formula for the
variance is "=VAR(range)". These formulas were entered on the "Steel" worksheet, cells E16 and
E17. The standard deviation of the %Cu data is 0.0358, while the variance is 0.00128. These are
also expressed with three significant digits, since all our original numbers had three significant
digits and none of our operations reduces the number of significant digits.
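
For readers who want to cross-check the Excel results outside the workbook, the sketch below repeats the three calculations with Python's standard statistics module, whose variance and stdev functions use the n - 1 divisor of Equations [3.2] and [3.3]. The "g" format used for three significant digits is one convenient choice, not the only one.

```python
import statistics

# %Cu values copied from Table 3.1
CU = [
    0.205, 0.143, 0.113, 0.219, 0.173, 0.143, 0.133, 0.167, 0.190, 0.169,
    0.224, 0.154, 0.064, 0.137, 0.172, 0.237, 0.180, 0.167, 0.120, 0.150,
    0.155, 0.123, 0.131, 0.129, 0.173, 0.179, 0.246, 0.114, 0.159, 0.210,
    0.125, 0.176, 0.197, 0.145, 0.138, 0.155, 0.154, 0.141, 0.175, 0.166,
    0.179, 0.178, 0.170, 0.236, 0.207, 0.150, 0.150, 0.120, 0.179, 0.149,
]

mean = statistics.mean(CU)          # counterpart of =AVERAGE(A3:E12)
std_dev = statistics.stdev(CU)      # counterpart of =STDEV(A3:E12), n - 1 divisor
variance = statistics.variance(CU)  # counterpart of =VAR(A3:E12), n - 1 divisor

# ".3g" keeps three significant digits, matching the reporting convention above
print(f"mean     = {mean:.3g}")
print(f"std dev  = {std_dev:.3g}")
print(f"variance = {variance:.3g}")
```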

3.1.3 Median, Percentile and Quantile

Percentiles and quantiles are closely related measures of location, which means they locate a
particular measurement within a set of values. We use both words interchangeably in this chapter.
A general definition will follow, but first we'll look at a quick example to help explain the idea. The
most common percentile is the median, an alternative to the mean as a measure of the center of a
dataset. The median is supposed to be the number "in the middle of the data", or the number that
separates the bottom 50% of the data from the top 50% of the data. We want to find the median of
the %Cu data. For convenience, Table 3.5 shows the 50 copper impurity measurements, sorted
smallest to largest, across then down.

Table 3.5 Sorted %Cu data, n = 50.

0.064 0.113 0.114 0.120 0.120 0.123 0.125 0.129 0.131 0.133
0.137 0.138 0.141 0.143 0.143 0.145 0.149 0.150 0.150 0.150
0.154 0.154 0.155 0.155 0.159 0.166 0.167 0.167 0.169 0.170
0.172 0.173 0.173 0.175 0.176 0.178 0.179 0.179 0.179 0.180
0.190 0.197 0.205 0.207 0.210 0.219 0.224 0.236 0.237 0.246

If we want to separate the bottom 50% of the data from the top 50% of the data, we should be
looking somewhere around the 25th and 26th numbers in the sorted list, which we've highlighted
above. Looking first at 0.159, it is close to the median, but not quite right: 24 numbers are below
it, but 25 are above it, so it isn't quite in the middle. 0.166 suffers from the same problem: 25
numbers below, 24 above. Thus, for the median, we'll pick the average of 0.159 and 0.166, which,
to three decimal places, is 0.163. This will be our median, and, since it falls between the 25th and
26th numbers in the sorted list, it fits our criterion of separating the bottom 50% of the data from
the upper 50%. You may have noticed that the median is the same as the mean for this example.
That suggests the histogram is roughly symmetric around the mean: by definition there are just as
many values above the median as below, and since the median equals the mean, there are also just
as many values above the mean as below. In most distributions, the mean and median won't be the same.
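
The median search described above can be reproduced directly. This sketch sorts the Table 3.1 values, picks out the 25th and 26th entries, and averages them; the final line confirms that the standard library's statistics.median follows the same convention for an even sample size.

```python
import statistics

# %Cu values copied from Table 3.1
CU = [
    0.205, 0.143, 0.113, 0.219, 0.173, 0.143, 0.133, 0.167, 0.190, 0.169,
    0.224, 0.154, 0.064, 0.137, 0.172, 0.237, 0.180, 0.167, 0.120, 0.150,
    0.155, 0.123, 0.131, 0.129, 0.173, 0.179, 0.246, 0.114, 0.159, 0.210,
    0.125, 0.176, 0.197, 0.145, 0.138, 0.155, 0.154, 0.141, 0.175, 0.166,
    0.179, 0.178, 0.170, 0.236, 0.207, 0.150, 0.150, 0.120, 0.179, 0.149,
]

ordered = sorted(CU)
n = len(ordered)

# With n = 50, the middle falls between the 25th and 26th sorted values
# (list indices 24 and 25, because Python counts from zero).
lower_mid = ordered[n // 2 - 1]
upper_mid = ordered[n // 2]
median = (lower_mid + upper_mid) / 2

below = sum(1 for x in CU if x < median)
above = sum(1 for x in CU if x > median)

print(f"25th value = {lower_mid}, 26th value = {upper_mid}")
print(f"median = {median:.4f}  ({below} values below, {above} above)")
print(statistics.median(CU) == median)
```

The unrounded median is 0.1625, which the text reports as 0.163 after rounding to three decimal places.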

Conceptually, the kth percentile is the value in the dataset that separates the lower k % of the
data from the upper (100 - k) % of the data. Thus, the median is the same as the 50th percentile.
There are a number of devils in the details of actually finding a percentile, based primarily on the
problem that for most sample sizes and most percentages, there's no value that separates exactly k
% below from exactly (100 - k) % above. As we showed above, we can solve that problem by using
numbers not in the dataset, but in the right place on the number line. However, there is no
universal agreement among statisticians on how to choose a number from outside the dataset. In
the example above, 0.160 and 0.164 would also separate the bottom 50% from the top 50% of the
data; we only picked 0.163 because it felt "middler" than these others.

We will sidestep these details and rely on Excel's PERCENTILE and PERCENTRANK
functions. The PERCENTRANK works as an inverse to the PERCENTILE function; that is,
PERCENTILE starts with a percentage and returns to the user the number in the dataset
corresponding to that percentage, while PERCENTRANK takes a number in the dataset and
returns the percentage corresponding to that number. Because the median is the most commonly
computed percentile, Excel has a MEDIAN function, which for this example returns 0.163, as
shown on the "Steel" worksheet (cell E18).

EXAMPLE 3.2 — Percentiles of the %Cu Data using Excel

Find and understand the 40th percentile of the %Cu measurements. Also, find the approximate
percentile corresponding to a measurement of 0.179.

Data. The data are on the "Steel" worksheet of workbook StatTools.xls.

Solution. The format of the PERCENTILE function is =PERCENTILE(array of numbers,
percentage as a decimal). As before, you can call the PERCENTILE function by clicking on
the little fx near the cell reference box. Therefore, the formula that will tell us the 40th percentile
for the %Cu data is =PERCENTILE(A3:E12, 0.4) = 0.152. (Actually, Excel reports it as 0.1524,
but we've reduced the number of significant digits back to 3). Thus, the value 0.152 (cell G20)
separates the lower 40% of the %Cu measurements from the upper 60% of the measurements.

To see why that makes sense, consider 40% of 50 = 20. Thus, the 40th percentile should be
greater than or equal to 20 of the %Cu measurements, and less than or equal to 50 - 20 = 30 of the
%Cu measurements. Looking at the actual numbers, place 20 contains 0.150, while place 21
contains 0.154. Thus, the 40th percentile falls between these two numbers, to separate the lower 20
from the upper 30 measurements.

The format of the PERCENTRANK function is =PERCENTRANK(array of numbers,
value, significant digits), where the last argument is optional. Therefore, we use
=PERCENTRANK(A3:E12, 0.179, 3) = 73.4%. To see why this makes sense, there are 36
measurements in the %Cu dataset smaller than 0.179, and 36/50 = 72%.

Notice that Excel is making a choice here. 0.179 occurs three times in the %Cu dataset, and
Excel could potentially return the percentage corresponding to all numbers less than or equal to
0.179, which would be 39/50 = 78%. Or, it could return the average value between 72% and 78%.
However, Excel apparently interpolates between the percentages corresponding to just before and
after the first of the occurrences of 0.179, or 72% and 74%.

There are two additional details about PERCENTRANK. First, if no significant digits are
specified, it will return the answer to three significant digits. Second, PERCENTRANK still
works even if the value given to it is not in the array. For example, the %Cu data contains no
measurement of 0.215, but the formula =PERCENTRANK(A3:E12, 0.215) still returns a
percentage between the percentages for 0.210 and 0.219, both of which are in the dataset.
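
The sketch below is one way to reproduce the two numbers in this example without Excel. It rests on two assumptions that are not stated in the Handbook: that PERCENTILE interpolates linearly at position p(n - 1) in the sorted data, and that PERCENTRANK divides the count of smaller values by n - 1, interpolates for values not in the data, and truncates to the requested digits. The function names percentile_inc and percent_rank are hypothetical. Under these assumptions the results agree with the 0.1524 and 73.4% quoted above, but the code should be read as an illustration rather than as Excel's definitive algorithm.

```python
import math

# %Cu values copied from Table 3.1
CU = [
    0.205, 0.143, 0.113, 0.219, 0.173, 0.143, 0.133, 0.167, 0.190, 0.169,
    0.224, 0.154, 0.064, 0.137, 0.172, 0.237, 0.180, 0.167, 0.120, 0.150,
    0.155, 0.123, 0.131, 0.129, 0.173, 0.179, 0.246, 0.114, 0.159, 0.210,
    0.125, 0.176, 0.197, 0.145, 0.138, 0.155, 0.154, 0.141, 0.175, 0.166,
    0.179, 0.178, 0.170, 0.236, 0.207, 0.150, 0.150, 0.120, 0.179, 0.149,
]

def percentile_inc(values, p):
    """Assumed rule: interpolate linearly at position p*(n - 1) in the sorted data."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def percent_rank(values, x):
    """Assumed rule: (count of values below x)/(n - 1), interpolated if x is absent."""
    s = sorted(values)
    if not s[0] <= x <= s[-1]:
        raise ValueError("x lies outside the data range")
    below = sum(1 for v in s if v < x)
    if x in s:
        return below / (len(s) - 1)
    lo, hi = s[below - 1], s[below]  # data values that bracket x
    rank_lo = sum(1 for v in s if v < lo) / (len(s) - 1)
    rank_hi = below / (len(s) - 1)
    return rank_lo + (x - lo) / (hi - lo) * (rank_hi - rank_lo)

p40 = percentile_inc(CU, 0.4)
rank_179 = percent_rank(CU, 0.179)
rank_179_truncated = math.floor(rank_179 * 1000) / 1000  # keep three digits, no rounding
rank_215 = percent_rank(CU, 0.215)                       # value not present in the data

print(f"40th percentile        = {p40:.4f}")
print(f"percent rank of 0.179  = {rank_179_truncated:.1%}")
print(f"percent rank of 0.215  = {rank_215:.3f}")
```

The n - 1 divisor explains why the reported rank of 0.179 sits between 72% and 74%: 36 of the other 49 measurements are smaller.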

Assignment. Find the 75th percentile of the ceramic strength data, and explain why that value is
reasonable. Also, find the percentile corresponding to a strength of 550, using 5 significant digits,
and explain why that value is reasonable.

3.2 Distributions of Random Variables

Statisticians use the word distribution in two senses. The first use is qualitative, and describes
the features of a dataset displayed in a histogram. Looking at the Figure 3.1 histograms, the %Cu-
in-steel histogram looks "bell-shaped", while the ceramic histogram (Figure 3.2) looks "left-
skewed" because it has a longer "tail" on the left than on the right.

The second use of the word distribution is more precise. Here, a distribution is a
mathematical formula whose plotted shape or pattern serves as an idealized model for the shape or
pattern seen in a histogram of a sample variable. Typically, the fit between a mathematical
formula and the histogram of an actual set of measurements is never perfect. However, as is true
of all mathematical models, the goal of a mathematical model is to be an adequate approximation
of the real data rather than an exact match for it.

Statisticians use the term random variable to refer to a measurement process where the value
of any individual measurement cannot be precisely predicted in advance, but for which there is a
(potentially unknown) mathematical distribution that can be used to predict the likelihood of an
individual measurement having a particular value. Identifying the best model distribution for a
random variable is an important step in inferential statistics. In particular, as shown in the next
section, most inferential methods assume that a particular model or class of models can
approximate the sample variable. The mathematical formulas of these idealized distributions will
let us calculate the probability that a given measurement falls into a given range of values. For our
purposes, we can use a very informal definition of probability or likelihood: the probability or
likelihood of a random event is the percentage of time that event is expected to occur.

In this section, we will describe two of the most important mathematical distributions for
engineering purposes, the uniform and the normal distributions. These are both continuous
distributions, which are used to model variables that can, at least in theory, be measured to any
degree of precision. Other important distributions for engineering data include the exponential
distribution and the Weibull distribution, both of them useful in describing the times between
consecutive occurrences of events.

There are also discrete distributions, which are used to model variables where the
measurements are counts, and therefore whole numbers. Examples of discrete distributions with
broad utility include the binomial and the Poisson distributions. However, the uniform and normal
distributions are sufficient for the material presented in the rest of this Handbook, and so will be
the only two described in detail. In addition, we will give a rule of thumb for deciding how
imperfectly a model can fit a dataset and still be a useful approximation for that dataset. We will
use the first-mentioned distribution, the uniform distribution, to introduce a number of ideas that
are useful for all distributions.

3.2.1 The Uniform Distribution

Some processes operate very steadily, or are controlled to do so, such that important process
variables seem unaffected by time or other process variables. Suppose an engineer was monitoring
a large furnace whose temperature was controlled by a device that sent power to the furnace when
it dropped below a certain temperature, and shut off power when it reached a certain temperature.
A recording device measured the temperature at hourly intervals over the course of two days and
recorded 48 measurements (in °C). The cycle time for the controller was about seven minutes,
which is much less than the time interval between each temperature measurement. Thus, the
measurements were taken at all different times in the control cycle. The resulting measurements
ranged from a low of 938 °C to a high of 949 °C, as shown on the "Furnace" worksheet of
StatTools.xls. The range is apparently 11°.

Table 3.6 and Figure 3.3 show a frequency distribution and a histogram of the resulting
measurements. Although the data are reported to the nearest whole degree, the actual temperatures
could occur in a slightly different range. For example, the first bin might be more properly written
(937.5, 939.5]. In worksheet "Furnace", Excel writes the first bin simply as 938-939, which is the
way Table 3.6 shows it.

Table 3.6 Frequency distribution of hourly furnace measurements.

Bin Frequency Relative Frequency Cumulative %
938-939 8 16.7% 16.7%
940-941 8 16.7% 33.3%
942-943 9 18.8% 52.1%
944-945 8 16.7% 68.8%
946-947 9 18.8% 87.5%
948-949 6 12.5% 100.0%

We should think of the relative frequencies in Table 3.6 as probabilities. For example,
according to the 48 measurements, 18.75% were in the 942-943 bin. Therefore, if we were to
make a 49th measurement, we might predict that there was an 18.75% chance that the resulting
temperature would be 942 or 943. Similarly, there'd be a 12.5% chance that the resulting
temperature would be 948 or 949.
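
The Relative Frequency and Cumulative % columns of Table 3.6 follow directly from the bin counts. The short sketch below recomputes them from the six frequencies; the variable names are illustrative only.

```python
from itertools import accumulate

bins = ["938-939", "940-941", "942-943", "944-945", "946-947", "948-949"]
freq = [8, 8, 9, 8, 9, 6]  # counts from Table 3.6

n = sum(freq)
relative = [f / n for f in freq]                   # share of the sample in each bin
cumulative = [c / n for c in accumulate(freq)]     # running share up to and including each bin

print(f"{'Bin':<9}{'Freq':>5}{'Rel %':>8}{'Cum %':>8}")
for b, f, r, c in zip(bins, freq, relative, cumulative):
    print(f"{b:<9}{f:>5}{100 * r:>8.2f}{100 * c:>8.2f}")
```

Read as probabilities, each relative frequency is an estimate of the chance that the next hourly reading lands in that bin.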

[Figure: column chart titled "Histogram of Hourly Furnace Temperatures"; x-axis: temperature, °C, in bins 938-939, 940-941, 942-943, 944-945, 946-947, 948-949.]

Figure 3.3 Histogram of hourly furnace temperatures, with cumulative percentage line added.

However, it's unlikely that these 48 measurements reflect exactly, perfectly, the distribution
of all possible temperature measurements that might be made over time. Instead, they're just a
sample from this population of temperatures. So, rather than believe these percentages represent
exact percentages, we look for a mathematical model that describes the general situation we see in
this example.

The histogram shows a distribution that seems to be uniform in height. One bar is a little
higher, another a little lower, but in general each bin seems to contain about 8 measurements, or
about 16.7% of the total data set. The histogram seems to say that the chances of our hypothetical
49th temperature being assigned to the first bin are about the same as the chances of it being
assigned to the second bin, or the third bin, or any other bin. So a simple summary of this
distribution is that there's about a 16.7% chance that a new observation will fall into any particular
bin. The uniform distribution is the mathematical distribution that models situations like this,
where a randomly chosen member of the population is equally likely to be found in any part of the
distribution.

As was mentioned in the introduction to this section, the uniform distribution applies to
continuous data. Although our temperature measurements are given to the nearest integer, this
represents rounding of temperatures that actually range from 937.50 °C to 949.49 °C and which, if
measured more precisely, could be displayed with many more significant digits. Therefore, as we
work with this example, we will assume that the actual range of temperatures in the furnace is the
interval [937.5 °C, 949.5 °C), or 12°, not just the integers 938 to 949.*

The histogram in Figure 3.3 contains a new feature, a line called the cumulative percentage
line, which is based on the Cumulative % column in the frequency distribution. Excel will
calculate these values if you check the Cumulative Percentage box in the Histogram tool. First,
let's understand the numbers. In the first bin, we have 8 measurements, which is 8/48 = 16.67% of
the whole data set. In the second bin, we have 8 more measurements, and combined with the first
bin, that's 16/48 = 33.33% of the whole data set. Adding the third bin brings us to 25/48 = 52.08%
of the data set. Thus, the cumulative percentage for each bin is the percentage of the data set
contained in that bin plus the bins before it. (Numbers in the "Cumulative %" column may not
reflect the exact sum of the numbers shown in the "Relative Frequency" column because of
rounding.) The line added to the histogram plots these percentages, using the scale on the
right-hand side of the graph. The square "points" on the Cumulative % line are plotted at the median
x-value of each histogram column.
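The arithmetic behind the Relative Frequency and Cumulative % columns can be sketched in a few lines of Python. This is only an illustration that reuses the bin counts from Table 3.6; the variable names are our own and are not part of StatTools.xls.

```python
# Bin labels and counts copied from Table 3.6 (48 hourly furnace measurements).
bins = ["938-939", "940-941", "942-943", "944-945", "946-947", "948-949"]
counts = [8, 8, 9, 8, 9, 6]

total = sum(counts)

# Relative frequency of each bin, in percent.
relative = [100.0 * c / total for c in counts]

# Cumulative percentage: this bin plus all bins before it.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(100.0 * running / total)

for b, c, r, cum in zip(bins, counts, relative, cumulative):
    print(f"{b}  {c:2d}  {r:5.1f}%  {cum:5.1f}%")
```

Because each cumulative value is computed from the running count rather than by adding rounded percentages, it does not accumulate rounding error.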

The plotted function is the cumulative distribution function (cdf), usually denoted F(x), and is
an idealization of the cumulative percentage line. While the cumulative percentage for a bin is the
fraction of the data that lie in or below that bin, F(x) is defined as the probability that a randomly
chosen element of the population will be ≤ x. The uniform distribution is so simple that we can
figure out the cdf for the temperature example using common sense.

First, the cumulative percentage line looks pretty close to perfectly straight. The idea of the
uniform distribution should lead you to the same conclusion. If the data above were perfectly
uniform, you'd have 8 measurements in each of the 6 bins, and so the line would be going up by
exactly 16.67% for each column.

So, the cdf — the idealized cumulative percentage line — should be a straight line. To see
which straight line, we just have to think about what it is supposed to mean. If the smallest
possible temperature is 937.5 °C, then F(937.5) would be the percentage of the data that was
smaller than 937.5 °C, which is 0%. Similarly, the highest possible temperature is 949.5 °C, so
F(949.5) would be the percentage of data that was smaller than 949.5, which is 100%, or 1.
Therefore, the cdf for this example is a straight line that passes through the points (937.5, 0) and
(949.5, 100). Some algebra shows you that the formula of that function is F(x) = (100/range)(x -
937.5). (Remember, the range is actually 12°.)

We said above that the value F(x) of the cdf was defined to be the probability that a randomly
occurring temperature was less than or equal to x. However, the cdf can be used to find the
probability that a randomly occurring temperature falls into any range between 937.5 and 949.5.
Here are two examples.

First, consider finding the probability that the temperature is between 940 °C and 948 °C.
The cdf tells us that the probability that the temperature is < 948 is (100/12)(948 - 937.5) = 87.50
%. However, the range we're interested in should not include any temperatures below 940 °C, so
we need to subtract off the probability (100/12)(940 - 937.5) = 20.83 % that the temperature falls
in that range. Therefore, using the notation "Pr(range)" to refer to the probability that a
temperature falls into a given range, we can write the above calculation as:

Pr(940 < x < 948) = Pr(x < 948) - Pr(x < 940) = F(948) - F(940) = 66.67 %

* When Excel created the histogram, the first bin was displayed like this on the worksheet: 938-939.
Mathematically, the first bin should be displayed like this: [937.5 °C, 939.5 °C) because that's the set of
real numbers that would round off to integer measurements of 938 or 939, the values currently contained in
the first bin. If you're focusing on what Excel is doing, you could write (937, 939] for the first bin, (939,
941] for the second bin, etc.

Suppose instead that we wanted to know the probability that the temperature falls outside this
range. Since all the temperatures in our population are between 937.5 °C and 949.5 °C, there is a
100% chance that a newly measured temperature is in that range. From the previous paragraph,
there's a 66.67 % chance of finding a temperature between 940 °C and 948 °C. Therefore, that
leaves a 33.33 % chance of finding a temperature outside that range. That is,

Pr(x < 940 or x > 948) = 1 - Pr(940 < x < 948) = 100% - 66.67% = 33.33%
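The two calculations above can be checked with a short Python sketch. The function name uniform_cdf is our own, and it returns a fraction between 0 and 1 rather than a percentage; the limits 937.5 °C and 949.5 °C are the assumed continuous range discussed earlier.

```python
LOW, HIGH = 937.5, 949.5  # assumed continuous range of furnace temperatures, °C


def uniform_cdf(x, low=LOW, high=HIGH):
    """Fraction of a uniform population on [low, high] that lies at or below x."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Probability of a temperature between 940 °C and 948 °C.
p_inside = uniform_cdf(948) - uniform_cdf(940)

# Probability of a temperature outside that range.
p_outside = 1.0 - p_inside

print(f"inside: {100 * p_inside:.2f}%  outside: {100 * p_outside:.2f}%")
```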

You may have wondered about the significant figures in the percentages above. Since all of
these percentages are calculated either (i) as the ratio of two integers or (ii) using the cdf formula,
there is arbitrarily high precision. Thus, you may choose whatever number of decimal places you
like for these percentages.

The second function we will use is the probability density function (pdf), denoted as f(x). The
pdf is the derivative of the cdf, so for the temperature example:

\[ f(x) = \frac{d}{dx}\left(\frac{x - 937.5}{12}\right) = \frac{1}{12} \]

The pdf gives an alternative method of computing probabilities. We can use the Fundamental
Theorem of Calculus to write:

\[ \Pr(940 < x < 948) = F(948) - F(940) = \int_{940}^{948} f(x)\,dx = \int_{940}^{948} \frac{1}{12}\,dx = 0.6667 \]

In particular, the probability that a measurement falls into a particular range is found by
integrating the pdf over that range.

What would happen if we integrated the pdf for the temperatures over its entire range?
According to the rule above, we should get the probability that a new temperature measurement
from our population has a temperature between 937.5 and 949.5. But since all of our measured
temperatures are in that range, that probability is 100% = 1. Checking the calculation, we get

\[ \int_{937.5}^{949.5} \frac{1}{12}\,dx = \left.\frac{x}{12}\right|_{937.5}^{949.5} = 1 \]

This illustrates a general feature of pdfs: if you integrate them over the entire range of values
of their population, you should get 1.
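As a numerical cross-check of these integrals, the sketch below integrates the uniform pdf with a simple midpoint rule. The helper names (uniform_pdf, integrate) and the choice of 1000 subintervals are illustrative assumptions, not part of the text's Excel workflow.

```python
def uniform_pdf(x, low=937.5, high=949.5):
    """Uniform pdf: constant 1/(high - low) inside the range, zero outside."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, n=1000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


p_940_948 = integrate(uniform_pdf, 940, 948)    # probability of 940 < x < 948
p_total = integrate(uniform_pdf, 937.5, 949.5)  # integral over the whole range

print(round(p_940_948, 4), round(p_total, 4))
```

For a constant pdf the midpoint rule is exact apart from floating-point rounding, so the two results agree with the analytical values of 0.6667 and 1.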

Other than computing probabilities, the importance of the pdf is that its graph should resemble
the histogram of the data being modeled. The graph of the function f(x) = 1/12 on the range 937.5
< x < 949.5 is just a horizontal line segment at y = 0.08333. If the data above were perfectly
uniform, each of the columns in the histogram would have exactly the same height, since each
would contain exactly 8 measurements. The columns in the actual histogram are roughly level,
with about 8 measurements each. Even though the y-values in a graph of the pdf and in the
histogram are quite different, the flat shape is the same. Any time you have data whose histogram
basically looks flat across the tops of the columns, a uniform distribution is a reasonable choice of
model for that data.

The name "probability density function" needs clarification. The pdf does not give the
probability of a particular measurement occurring: it is the integral of the pdf, (i.e., the cdf), that
gives probabilities. Visually, since you can think of the integral of f(x) over some range as the area
under the graph of f(x) over that range, you can think of probabilities as given by areas under f(x).

EXAMPLE 3.3 — Uniformity of Vermiculite Particles.

A company uses vermiculite in a chemical process, and considers buying vermiculite from a
new supplier. However, the process has stringent requirements as to the diameter of the
vermiculite particles. The supplier sells "fine" grade vermiculite with a nominal particle diameter
of 2.0 mm. The company obtains a sample, and measures the diameter of 1000 particles by
passing them through successively smaller meshes. Based on these measurements, it concludes
that the diameters of the supplier's "fine" vermiculite are uniformly distributed with a range of
1.85 mm to 2.20 mm. (a) Write down the cdf and pdf for this situation; and (b) calculate the
percentage of vermiculite particles that are outside the desired range of 1.9-2.1 mm.

Solution: (a) The cdf is F(x) = (x - 1.85)/(2.20 - 1.85) = (1/0.35)(x - 1.85). The pdf is the
derivative of F, which is f(x) = 1/0.35 = 2.86.

(b) The percentage of particles that are too small is F(1.90) = (1/0.35)(1.90 - 1.85) = 14.3%.
The percentage of particles that are too large is 1 - F(2.10) = 1 - (1/0.35)(2.10 - 1.85) = 28.6%.
Thus the total percentage of undesirable vermiculite particles is 42.9%.
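The solution to Example 3.3 can be reproduced with the following Python sketch. The variable names are our own; the numbers are those given in the example.

```python
low, high = 1.85, 2.20  # range of particle diameters, mm


def cdf(x):
    """Uniform cdf for the vermiculite diameters, as a fraction."""
    return (x - low) / (high - low)


pdf_height = 1.0 / (high - low)  # constant pdf value on [low, high]

too_small = cdf(1.90)            # fraction below the desired range
too_large = 1.0 - cdf(2.10)      # fraction above the desired range
total_out = too_small + too_large

print(f"pdf = {pdf_height:.2f}")
print(f"too small = {100 * too_small:.1f}%")
print(f"too large = {100 * too_large:.1f}%")
print(f"total outside = {100 * total_out:.1f}%")
```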

Assignment: Suppose temperatures obtained from the cycling furnace ranged from 887 °C to 903
°C, and are still distributed uniformly. (i) Write down the cdf and pdf for this situation; and (ii)
calculate the probability that a new temperature measurement will fall between 898 °C and 900 °C.

It seems fairly clear that the furnace temperature data conforms to a uniform distribution
model. However, with some of the datasets and distributions to come, it will be less visually
obvious from a histogram how good the "fit" of a particular model might be. A graphical
technique that gives a different look at how well a particular distributional model fits a dataset is
called a distribution plot. Distribution plots are not usually necessary with the uniform
distribution. In general, it's pretty easy to look at a histogram and say either, "Yes, the tops of the
bars are close to each other" or "No, the tops of the bars are much too uneven to be considered
'flat'". The simple algebraic formula for the uniform distribution makes it a good choice for
explaining how to make a distribution plot.

The basic idea behind making a distribution plot is to graph the actual data, sorted smallest to
largest, against the theoretical percentile values for the distribution being tested, in this case, the
uniform distribution. Let's look at a concrete example. If the temperature data is sorted, the 15th
temperature in the list is 941 °C.* Using Excel's PERCENTRANK function, we see that 941 °C is
the 29.7th percentile of the temperatures. That is, 29.7% of temperatures would be smaller than
941 °C, and 70.3% would be larger.

If we want to use a uniform distribution as a model for this data, what does that distribution
predict the 29.7th percentile value to be? That is, if the data were exactly uniform, what number
would separate the lower 29.7 % of the data from the upper 70.3 % of the data? Recall that the cdf
gives the percentage of data below a particular value, so the theoretical 29.7th percentile would be
the number x that solves the equation F(x) = (l/12)(x - 937.5) = 0.297. Solving this equation gives
x = 941.064. This (small) piece of evidence favors the uniform distribution as a model for the
temperature data because the actual 29.7th percentile, 941 °C, is pretty close to the theoretical
percentile value predicted by the model (941.064 °C).
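Solving F(x) = p for x amounts to inverting the cdf, which for the uniform distribution is a single line of arithmetic. A minimal sketch follows; the function name uniform_percentile is hypothetical.

```python
def uniform_percentile(p, low=937.5, high=949.5):
    """Value x at which the uniform cdf on [low, high] equals the fraction p."""
    return low + p * (high - low)


# Theoretical 29.7th percentile for the furnace model.
theoretical = uniform_percentile(0.297)
print(round(theoretical, 3))
```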

* The procedure for sorting a set of variables is described in an embedded document in the "Steel" worksheet
of StatTools.xls.

The distribution plot is a graph of the theoretical percentile values on the x-axis against the
actual percentile values on the y-axis. For the 15th item in this dataset, we would plot the point
(941.064, 941). The better the fit between the data and the uniform distribution, the closer the
actual percentiles should be to the theoretical percentiles. If each pair of numbers is nearly the
same, the result is a set of points very close to a line of slope = 1. Figure 3.4 shows what the
furnace data looks like. The extent of deviation of points from a straight line of slope = 1 indicates
the departure from the assumed distribution. The text box shows the equation of the best-fitting
line. If the furnace data was perfectly uniform, the slope would be 1 with an intercept of zero.
However, the slope and intercept by themselves are inadequate to determine how well the
distribution fits the uniform model. A better indicator is the value of R2.

[Uniform Distribution Plot for Furnace Data; x-axis: theoretical percentile value, °C; y-axis: actual
percentile value, °C; trendline text box: actual = 0.985(theor) + 14.48, R2 = 0.9908]

Figure 3.4 Uniform distribution plot for the furnace data. Text box obtained by using Excel's
Trendline tool.

The text box on the uniform distribution plot refers to a value for R2 (the coefficient of
determination). Excel offers to calculate this value as part of the Trendline tool for x-y scatter
plots. We will discuss the definition and meaning of R2 in Section 3.4, but we introduce it here to
make the uniform-or-not-uniform decision more precise. R2 will always be a number between 0
and 1, and the closer it is to 1, the closer the points in the underlying graph are to a straight line (or
whatever type of line the Trendline tool is creating). The furnace temperature had R2 = 0.9908,
which is very close to one. This indicates an "excellent" fit to the uniform distribution.
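The construction of a uniform distribution plot and its trendline can be sketched in Python as follows. Several things here are assumptions made for illustration: the percentile-rank convention (fraction of values strictly below, divided by n - 1) is a simple stand-in and is not guaranteed to match Excel's PERCENTRANK in every detail; the function name is our own; and the sample is a hypothetical, perfectly uniform set of 48 values rather than the actual furnace data.

```python
def distribution_plot_fit(data, low, high):
    """Fit a least-squares line to (theoretical, actual) percentile pairs.

    Returns (slope, intercept, r_squared) for a uniform model on [low, high].
    """
    values = sorted(data)
    n = len(values)

    # Theoretical percentile value for each observation (assumed rank rule).
    theor = []
    for v in values:
        below = sum(1 for u in values if u < v)
        p = below / (n - 1)
        theor.append(low + p * (high - low))

    mean_x = sum(theor) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in theor)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(theor, values))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical sample: 48 values spaced evenly across the assumed range.
sample = [937.5 + 12.0 * i / 47 for i in range(48)]
slope, intercept, r2 = distribution_plot_fit(sample, 937.5, 949.5)
print(round(slope, 3), round(intercept, 3), round(r2, 4))
```

For this idealized sample the fitted line has a slope of 1, an intercept of zero, and R2 = 1, which is the limiting case described in the text.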

On the other hand, if the proposed model is a poor fit for the dataset under consideration,
there will be systematic differences between the theoretical and actual percentiles, and the
distribution plot may show a significant deviation from a line of slope = 1. To illustrate, consider
carefully the way the controller operates. Recall that the furnace controller turns on the power
when the temperature gets too low, and turns the power off when the furnace reaches a sufficiently
high temperature. We measured the temperature every hour. Suppose in that situation, the time
interval between each power-on occasion was 22 minutes. Since 22 is not an integer fraction of
60, the point of temperature sampling might occur anywhere in the cycle time interval. What
would the data look like if the furnace took roughly five minutes to heat up and fifteen minutes to
cool down, for a twenty-minute cycle time? Then our hourly measurements might occur more
often at the same point in the cycle. The histogram would have one high bar, and the rest would be
low. On the other hand, what if the cycle time was twenty minutes heating and twenty minutes
cooling? If the first temperature measurement was taken right around when the power is just
turned on, the measured temperature would be near the coolest temperature in our range. An hour
later, the measurement would be taken when the power was just turned off, so the measured
temperature would be near the highest temperature in the range. Figure 3.5 shows that the
resulting histogram looks "U-shaped", since most of the measurements are at either extreme, and
relatively few measurements are in the middle. Certainly this is a contrived case, but it does show
how measurements taken at identical intervals may give misleading results.
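A small simulation makes the contrived case concrete. The sketch below assumes, purely for illustration, that the furnace heats linearly from 937.5 °C to 949.5 °C in twenty minutes and then cools linearly over the next twenty minutes; the real heating and cooling curves are not given in the text.

```python
T_MIN, T_MAX = 937.5, 949.5   # assumed temperature limits, °C
HEAT_MIN, COOL_MIN = 20, 20   # minutes of heating and of cooling per cycle


def furnace_temp(t_min):
    """Temperature at time t_min (minutes) for an assumed triangular cycle."""
    phase = t_min % (HEAT_MIN + COOL_MIN)
    if phase < HEAT_MIN:
        # Power on: temperature rising.
        return T_MIN + (T_MAX - T_MIN) * phase / HEAT_MIN
    # Power off: temperature falling.
    return T_MAX - (T_MAX - T_MIN) * (phase - HEAT_MIN) / COOL_MIN


# 48 hourly samples, the first taken just as the power turns on.
hourly = [furnace_temp(60 * k) for k in range(48)]
distinct = sorted(set(hourly))
print(distinct)
```

With a 40-minute cycle, hourly samples alternate between the start of heating and the start of cooling, so only the two extreme temperatures are ever recorded. This is the limiting form of the "U-shaped" histogram.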

[Histogram of Hourly Furnace Temperatures; x-axis: temperature, °C, in bins 938-939 through 948-949]

Figure 3.5 Histogram for furnace temperatures when the measurement time consistently
overlapped with heating and cooling cycle times.

Clearly, the uniform model is not a good one for this data set because the points in the
uniform distribution plot (Figure 3.6) look more S-shaped than straight. The slope of the best-
fitting straight line is not close to one, and the intercept is far from zero. The fact that R2 is close to
one (i.e., 0.952) tells us that a straight line gives a good representation of the data, but by itself
does not tell us that the uniform distribution is a poor choice.

[Uniform Distribution Plot for Revised Furnace Data; x-axis: theoretical percentile value, °C; y-axis:
actual percentile value, °C; trendline text box: actual = 1.21(theor) - 198.6, R2 = 0.952]

Figure 3.6 Uniform distribution plot for the revised furnace data, where hourly measurement
times coincided closely with either turn-on or turn-off controller times.

Although the actual furnace temperature is uniformly distributed, poor experimental
procedure in selecting the times for temperature measurement makes it seem like the temperature
fluctuation data do not fit a uniform distribution. The way to avoid this is to sample the furnace
temperature at random intervals. Suppose you were sampling a furnace with a mean cycling time
of 30 minutes, and wanted to take 48 temperature samples over a period of 24 hours. You could
generate a series of 48 random numbers between 15 and 45, and use these as the interval between
temperature measurements. In Excel, select a cell and type in =RANDBETWEEN(15,45), and
press Enter. Pressing F9 will generate a random number between these limits. Each time you
press F9, a new value will appear. If your "Calculation" option is set to automatic, Excel activates
this procedure any time you make a significant change in the worksheet. To get a stable set of
sample intervals, fill down for 48 rows, select the column of numbers, and use Copy and Paste
Special/Numbers and Formats to save the sampling intervals for the test procedure.
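The same random sampling schedule can be sketched outside Excel. In the Python example below, random.randint plays the role of RANDBETWEEN (both include the end points); the fixed seed is a hypothetical choice that freezes the schedule, much as Paste Special does in the worksheet.

```python
import random

random.seed(0)  # hypothetical seed so that the schedule is reproducible

# 48 random intervals, in whole minutes, between 15 and 45 inclusive.
intervals = [random.randint(15, 45) for _ in range(48)]

# Convert the intervals into elapsed sampling times (minutes from the start).
times = []
elapsed = 0
for gap in intervals:
    elapsed += gap
    times.append(elapsed)

print(intervals[:5], times[:5])
```

Because the intervals average about 30 minutes, the 48 samples span roughly 24 hours, although the exact total varies from one random schedule to the next.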

We mentioned earlier that a value of R2 close to one does not by itself confirm any particular
distribution choice. However, there are ways to use R2 for this purpose. Worksheet "RSqCutoffs"
in workbook StatTools.xls has a table of values of R2 to help you decide how well the data fit a
particular distribution model. Table 3.7 shows a selected portion from the worksheet for a uniform
distribution.

Table 3.7 Excerpted lines from the table of uniform distribution R2 values.

sample size    1.0%      2.5%      5.0%      10.0%     15.0%     25.0%
40             0.9378    0.9479    0.9564    0.9651    0.9703    0.9761
50             0.9492    0.9586    0.9655    0.9724    0.9761    0.9807

The numbers at the top of the table ("1%", "2.5%", etc.) represent the proportion of time a
genuinely uniform data set would have an R2 smaller than the number shown in the row below.
Based on the worksheet table, if we had 100 different samples of uniform data, each of size 500,
then we'd expect 99 of them to have a uniform distribution plot with an R2 above 0.9950, and only
1 to have a smaller R2. This table can be used as a guide to how good a fit the uniform distribution
is to a particular dataset. Looking at the revised furnace data first, it has a sample size of 48 and an
R2 of 0.952. Using the n = 50 row in Table 3.7 for comparison, 0.952 < 0.9586, so fewer than
2.5% of all truly uniform datasets would have an R2 so small. That makes the uniform model look
pretty unlikely. On the other hand, the original furnace data had a sample size of 48 and an R2 of
0.9908, so using either the n = 40 or the n = 50 row, we see that the furnace data had a better R2
than at least 25% of all genuinely uniform datasets. This is good evidence in favor of the original
furnace data having a uniform distribution.

More generally, you should use the following criteria:
• If the uniform distribution plot has an R2 greater than the 25% cutoff value in the table,
the uniform distribution is an excellent fit for the data set.
• If the value of R2 falls between the 15% and the 25% cutoffs, the uniform distribution is a
good fit.
• If the value of R2 falls between the 5% and the 15% cutoffs, the uniform distribution is a
moderately good fit.
• If the value of R2 falls between the 1% and the 5% cutoffs, the uniform distribution is a
poor fit.
• If the value of R2 falls below the 1% cutoff, the uniform distribution is an unacceptable
fit.
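These criteria can be written as a small lookup, sketched below in Python. The cutoffs are the n = 50 row of Table 3.7; the function name and the treatment of values that fall exactly on a cutoff are our own assumptions, since the text does not say which category a boundary value belongs to.

```python
# Cutoff values of R2 for a sample size of 50, copied from Table 3.7.
CUTOFFS_N50 = {1.0: 0.9492, 2.5: 0.9586, 5.0: 0.9655,
               10.0: 0.9724, 15.0: 0.9761, 25.0: 0.9807}


def rate_uniform_fit(r2, cutoffs=CUTOFFS_N50):
    """Return the verbal rating of a uniform distribution plot's R2."""
    if r2 > cutoffs[25.0]:
        return "excellent"
    if r2 >= cutoffs[15.0]:
        return "good"
    if r2 >= cutoffs[5.0]:
        return "moderately good"
    if r2 >= cutoffs[1.0]:
        return "poor"
    return "unacceptable"


print(rate_uniform_fit(0.9908))  # original furnace data
print(rate_uniform_fit(0.952))   # revised furnace data
```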

Using this language, the uniform distribution is an excellent fit for the original furnace data,
but a poor fit for the revised furnace data (but remember that the timing of the temperature
measurements flawed the revised furnace data). The Excel details for making a uniform
distribution plot, as well as for calculating R2, are in a WordPad document on the "Furnace"
worksheet of workbook StatTools.xls.

We introduced several general statistical concepts while discussing the uniform distribution.
These concepts are applicable to any distribution, as we'll show next in Section 3.2.2.

• The cumulative distribution function (cdf) F(x) gives the percentage of the population
that measures less than or equal to x.

• The probability density function (pdf) f(x) can be integrated to find the probability that an
element of the population lies in a particular range, and has a graph that should resemble the
histogram of the variable being modeled by that distribution.

• A distribution plot can help the user decide if the distribution is a good model for the
variable being considered. The most important criterion is the value of R2.

To clarify the use of R2 as a criterion for rejecting or accepting a particular distribution, note
that we are not plotting (or calculating) R2 from a chart of sample data vs. some other experimental
variable. Instead, we are plotting actual percentile values vs. those expected if the samples fit a
certain type of distribution.

3.2.2 The Normal Distribution

Unquestionably, the most important statistical distribution is the normal distribution, often
called the "bell curve". As an example of a situation that might have a normal distribution,
consider a natural gas (NG) producer aiming to produce NG with a heating value of 1050 Btu/ft3.
(All volumes are STP). They do this by blending refined NG from different gas fields that have
different amounts of combustible and inert constituents. They are able to measure the heating
value to ±2 Btu, and over a month, gather a sample of 120 measurements that turn out to have a
range of 1020 to 1074 Btu/ft3. The data are in worksheet "NG" in workbook StatTools.xls.

The normal distribution is a useful model in any situation where the measurements are
symmetric around some center point, and values near the center are more likely than values at the
extremes. Normal distributions can have any mean. Since the (ideal) normal distribution is
symmetric, the mean will always lie at the center of the distribution. Normal distributions can also
have any standard deviation. Recall from the previous section that the standard deviation was a
measure of how far the average element of a dataset is from the mean. The greater the standard
deviation, the greater the average distance from the center. In terms of a normal curve, that means
that the standard deviation governs how wide the central "hump" is. The larger the standard
deviation, the wider the central hump.

Equation [3.5] gives a normal distribution pdf with mean μ and standard deviation σ.

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \]    [3.5]

When working with theoretical distributions defined by mathematical formulas such as the
above, statisticians use Greek letters for quantities like mean and standard deviation to distinguish
them from means and standard deviations defined from samples. Thus, whenever you see "μ", it's
an unknown population mean, while "x̄" tells you it's a sample mean, even without any other
context or explanation. The same distinction holds for σ versus s.

Excel's mean and standard deviation formulas were used on the NG population to calculate μ
= 1048.1 and σ = 10.98. When we replace the ideal model represented by [3.5] with a real
(reasonable) model, the heating values of the gas produced by the company might be:

\[ f(x) = \frac{1}{10.98\sqrt{2\pi}} \exp\left[-\frac{(x-1048.1)^2}{2(10.98)^2}\right] \]

where x represents the heating value measured in a random batch, 1048.1 represents x̄, and 10.98
represents s. Figure 3.7 shows a plot of this curve added to the histogram, using rounded values of
1048 as the mean and 11 as the standard deviation. The y-values were suitably scaled so that the
pdf fits neatly over the histogram.
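Equation [3.5] is easy to evaluate directly. The Python sketch below uses the sample values 1048.1 and 10.98 quoted above; the function name normal_pdf is our own.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal probability density, Equation [3.5]."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


MEAN, SD = 1048.1, 10.98  # sample mean and standard deviation of the NG data

peak = normal_pdf(MEAN, MEAN, SD)              # height of the curve at its center
one_sd_out = normal_pdf(MEAN + SD, MEAN, SD)   # height one standard deviation out

print(round(peak, 5), round(one_sd_out, 5))
```

The peak height is 1/(σ√(2π)), so a larger standard deviation gives a lower and wider hump, as described earlier in this section.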

The normal curve is asymptotic to the x-axis to the left and right. That means that, at least in
theory, there is no limit to how far away an observation can be from the mean. In terms of natural
gas blending, the normal model includes the possibility that a heating value measurement could be
zero or even negative, which obviously makes no sense. However, the normal model also says the
likelihood of such a measurement occurring is so astronomically tiny as to be negligible.
According to the normal distribution for this sample, even a measurement below 1000 Btu/ft3
would be extremely unlikely. If the company were to continue measuring samples of natural gas
at a rate of 4 samples per day, they should expect a measurement of less than 1000 Btu/ft3 about
once every 115 years.
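The "once every 115 years" estimate can be checked with the normal cdf, written here in terms of the complementary error function in Python's math module. This sketch assumes the unrounded sample values (mean 1048.1, standard deviation 10.98) and 365 days per year; using the rounded values 1048 and 11 gives a somewhat shorter interval.

```python
import math


def normal_cdf(x, mean, sd):
    """Normal cdf, expressed with erfc for accuracy far out in the lower tail."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


p_low = normal_cdf(1000, 1048.1, 10.98)  # chance of one reading below 1000 Btu/ft3
samples_per_year = 4 * 365               # 4 samples per day
years_between = 1.0 / (p_low * samples_per_year)

print(f"p = {p_low:.2e}, about once every {years_between:.0f} years")
```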

[Histogram for Heating Values of Blended NG; x-axis: heating value (Btu/ft3), in bins (1018, 1024]
through (1072, 1078]; right-hand scale: cumulative %, up to 100%]

Figure 3.7 Histogram of natural gas heating values, with a superimposed cumulative % curve.

The pdf for the normal distribution is obviously a more complex function than the pdf for the
uniform distribution. The method for finding the probability of a measurement being in a
particular range is still just the integral of f over that range; however, the pdf for the normal
distribution is perhaps the most common example of a function with no closed-form
anti-derivative. That is, if you wanted to find the probability that the heating value of a particular batch
of natural gas was between 1040 and 1042, you would still set up the integral:

\[ \int_{1040}^{1042} f(x)\,dx = \int_{1040}^{1042} \frac{1}{10.98\sqrt{2\pi}} \exp\left[-\frac{(x-1048.1)^2}{2(10.98)^2}\right] dx \]

but you couldn't evaluate it analytically using an anti-derivative. (The probability could, however,
be estimated directly from the data. Since 10 of 120 measurements were either 1040 or 1042,
approximately 10% of the distribution lies in the range 1040 to 1042).

Calculating probabilities with normal distributions is usually accomplished with the cdf
instead of the pdf. Although the previous paragraph tells us we can't write down a simple (or even a
complicated) formula for the cdf, its importance is such that most scientific calculators, and
Excel, have high-accuracy approximations for the normal cdf built in. Figure 3.8 shows the
normal curve for the natural gas heating values. It was generated using rounded values of mean =
1048 and a standard deviation = 11.

We now show how to use Excel's tools to make probability calculations, based on these
values (not the actual, slightly different sample mean and standard deviation calculated by Excel).
We first find the probability that the measured heating value in a batch is less than 1060 BTU.

Figure 3.8 The NG histogram with an ideal normal pdf curve superimposed.

A diagram is helpful before using NORMDIST. It's easy to draw the necessary diagram by
hand, rather than by using Excel. Just put down an x-axis and sketch a bell-shaped curve above it.
Now put in the distribution mean (in this case, 1048) in the center, and the boundaries of the region
you're trying to find (in this case, 1060 down to a value where the curve closely approaches the
x-axis). Now just shade in the part of the normal curve corresponding to what you're trying to
calculate. Figure 3.9 shows the same thing, but using Excel's charting tools. Here, we want to
find the probability that a particular measurement is below 1060, so we've shaded in the region
underneath the curve to the left of 1060.

[NG Normal Plot; x-axis: heating value, Btu/ft3, from 1010 to 1090; y-axis scale from 0 to 0.04]

Figure 3.9 pdf of normal model for natural gas data. Shaded region shows the area of interest in
calculating the probability that any given sample would have a heating value less than 1060 BTU.

In Excel, the cdf for the normal distribution is =NORMDIST(x, mean, std dev,
cumulative), where x is the value at which the cdf is to be evaluated; mean and std dev are the
mean and standard deviation of the cdf; and cumulative should be set to TRUE for the cdf. If
cumulative is set to FALSE, NORMDIST gives you the pdf instead. Since the cdf always gives
the probability of a value being less than or equal to x, we can see that the probability is just the
value of the cdf at 1060, which is =NORMDIST(1060, 1048, 11, TRUE) = 0.862 (cell J3 of
worksheet NG). That is, according to the normal distribution, 86.2 % of the heating values of all
natural gas samples, measured or unmeasured, should be less than 1060 Btu/ft3.

Now suppose we wanted to calculate the probability that the measured heating value in a
batch would round off to 1040 (±2 Btu). The heating value would have to measure between 1038
and 1042. For this, imagine a shaded region on Figure 3.9 between 1038 and 1042. We want the
difference between the value of the cdf at x = 1042 and x = 1038, which is:

=NORMDIST(1042,1048,11,TRUE) - NORMDIST(1038,1048,11,TRUE) = 0.111

If you prefer thinking algebraically, then remember that the cdf F(x) is the same as the Excel
function NORMDIST with cumulative = TRUE and recall that probabilities are found by
integrating the pdf:

\[ \int_{1038}^{1042} f(x)\,dx = F(1042) - F(1038) \]

Finally, suppose we want to know the probability that the measured heating value in a batch
exceeds 1070 BTU. Imagine a shaded region to the right of 1070. We know that the area to the
left can be found using the cdf. Recall from the uniform distribution that the total probability for
the whole range of values is 1. That means that the shaded area to the right of 1070 is the
difference between 1 and the unshaded area on the left. Therefore, we write (in any cell):

= 1 - NORMDIST(1070,1048,11,TRUE) = 0.0228
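For readers working outside Excel, the three NORMDIST calculations above can be reproduced with the error function in Python's standard library. The function normal_cdf below is our own stand-in for NORMDIST with cumulative = TRUE; it is a sketch, not a description of how Excel computes the value internally.

```python
import math


def normal_cdf(x, mean, sd):
    """Normal cdf: probability of a value less than or equal to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


MEAN, SD = 1048, 11  # rounded values used in the text

p_below_1060 = normal_cdf(1060, MEAN, SD)
p_1038_1042 = normal_cdf(1042, MEAN, SD) - normal_cdf(1038, MEAN, SD)
p_above_1070 = 1.0 - normal_cdf(1070, MEAN, SD)

print(f"{p_below_1060:.3f}  {p_1038_1042:.3f}  {p_above_1070:.4f}")
```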

Although we will be using Excel to do all probability calculations with normal distributions,
there are certain basic facts to remember. For any normal curve, with any mean and any standard
deviation,

• About two-thirds of a normally distributed population lies within one standard
deviation of the mean (exact figure is 68.3%).

• About 95% of a normally distributed population lies within two standard deviations of
the mean (exact figure is 95.4%).

• Virtually all of a normally distributed population lies within three standard deviations
of the mean (exact figure is 99.7%).
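
These three figures can be confirmed numerically, and the check also shows that they do not depend on the mean or standard deviation chosen. The sketch below assumes Python's statistics.NormalDist; the function fraction_within is a hypothetical helper introduced only for this illustration.

```python
from statistics import NormalDist

def fraction_within(dist, k):
    """Fraction of the population within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

standard_normal = NormalDist(mu=0, sigma=1)
heating_value = NormalDist(mu=1048, sigma=11)

for k in (1, 2, 3):
    a = fraction_within(standard_normal, k)
    b = fraction_within(heating_value, k)
    print(f"within {k} sd: {a:.3%} (standard normal), {b:.3%} (heating value)")
```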

We can use these guidelines to give a second, approximate answer to the probability that a
new batch would have a heating value in excess of 1070 Btu/ft3. The mean is 1048, and the standard
deviation is 11, so 1070 is two standard deviations above the mean. We know that about 95% of
the natural gas measurements are within two standard deviations of the mean, so about 5% lie
outside that region. That 5% is split between the tail on the left, below 1026, and the tail on the
right, above 1070. Since the normal distribution is symmetric, that leaves half of 5%, or 2.5%, in the
upper tail. (Compare with the exact answer of 2.28% above.)

Given the importance of the normal distribution, how can we judge if it is a good model for a
dataset? Just seeing a bell-shaped histogram may be misleading. The way to check for normality
is to use a normal distribution plot. Recall from the previous section that a distribution plot is a
plot of the theoretical percentile values for the distribution being tested against the actual
percentiles from the data. A how-to guide for making a normal distribution plot using Excel is
embedded as a WordPad file in the "NG" worksheet. Figure 3.10 shows the normal distribution
plot for the natural gas heating value data (from the "NG" worksheet in the StatTools.xls workbook).

Recall that if the normal distribution is a good model, each theoretical percentile will be very
close to the percentile from the actual data. The plotted points will lie close to a line with a slope
of 1 and an intercept of zero. Figure 3.10 shows a slope of 1.010 and an intercept of -10.2.
However, the slope and intercept of a normal distribution plot do not by themselves indicate the
validity of a normal distribution fit. A more important statistical indicator is the value of R², which
was discussed earlier in connection with the uniform distribution plot.
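
For readers who want to see the mechanics of a normal distribution plot outside Excel, here is a minimal sketch in Python. Several things in it are assumptions and not taken from the workbook: the data are synthetic (120 random draws, not the "NG" worksheet values), the percentile of the i-th sorted point is taken as (i - 0.5)/n, the theoretical values come from a normal distribution fitted with the sample mean and standard deviation, and fit_line is a hypothetical helper that mimics what Excel's Trendline tool reports.

```python
import random
from statistics import NormalDist, mean, stdev

# Synthetic stand-in data, NOT the workbook data: 120 draws from N(1048, 11).
random.seed(1)
actual = sorted(random.gauss(1048, 11) for _ in range(120))
n = len(actual)

# Theoretical percentile values from a normal fitted to the sample.
# Assumed plotting position for the i-th sorted point: (i - 0.5) / n.
fitted = NormalDist(mean(actual), stdev(actual))
theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = fit_line(theoretical, actual)
print(f"actual = {slope:.3f} * theoretical + {intercept:.1f}, R2 = {r_squared:.4f}")
```

Plotting actual against theoretical would give the picture itself; the numbers printed here are the ones the text discusses (slope, intercept, and R²).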

[Figure 3.10 plot: "Normal Distribution Plot for Heating Value of NG". Horizontal axis: theoretical heating value; vertical axis: actual heating value; both run from 1020 to 1080. Trendline text box: actual Btu/ft3 = 1.010(theor Btu/ft3) - 10.2, R² = 0.9932.]

Figure 3.10 A normal distribution plot for the theoretical percentile heating values of natural gas
vs. the actual percentile heating values (units of Btu/ft3). Text box equation developed using
Excel's Trendline tool. Textbox equation values have not been properly rounded off.

Earlier we used the value of R² to characterize more precisely how well a uniform distribution
fit a data set. We can also use R² for a normal distribution in the same way. Table 3.8 shows
relevant lines from the "RSqCutoffs" worksheet. The sample size for the natural gas example is
120. If necessary, we could interpolate to find values for R² that were 40% of the way between n =
100 and n = 150. However, in this case, our value of R² = 0.9932 is above the 25% cutoff in both
lines. Thus, a normal model is an "excellent" fit.

Table 3.8 Excerpted lines from the table of normal distribution R² values.

sample size    1.0%     2.5%     5.0%     10.0%    15.0%    25.0%
100            0.9636   0.9699   0.9744   0.9787   0.9814   0.9846
150            0.9744   0.9786   0.9818   0.9850   0.9868   0.9890
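
Had interpolation been necessary, it could be done as in the sketch below, which uses the two rows of Table 3.8. It assumes simple linear interpolation in the sample size n, and the names interpolate_row and cutoffs_n120 are hypothetical and used only for this illustration.

```python
levels = ["1.0%", "2.5%", "5.0%", "10.0%", "15.0%", "25.0%"]
cutoffs_n100 = [0.9636, 0.9699, 0.9744, 0.9787, 0.9814, 0.9846]
cutoffs_n150 = [0.9744, 0.9786, 0.9818, 0.9850, 0.9868, 0.9890]

def interpolate_row(n, n_low, row_low, n_high, row_high):
    """Linearly interpolate each cutoff between two tabulated sample sizes."""
    weight = (n - n_low) / (n_high - n_low)  # 0.4 for n = 120
    return [low + weight * (high - low) for low, high in zip(row_low, row_high)]

cutoffs_n120 = dict(zip(levels, interpolate_row(120, 100, cutoffs_n100, 150, cutoffs_n150)))

r_squared = 0.9932  # value from the Figure 3.10 trendline
above_25_percent_cutoff = r_squared > cutoffs_n120["25.0%"]

for level, cutoff in cutoffs_n120.items():
    print(f"{level:>6}: {cutoff:.4f}")
print("R2 above the 25% cutoff:", above_25_percent_cutoff)
```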

EXAMPLE 3.4 — Evaluation of the Normal Distribution for Ceramic Strength Data.

Examine the dataset on the strength of a silicon nitride ceramic (worksheet "CerStr") and the
histogram shown in Figure 3.2 of Example 3.1. Does a normal distribution adequately represent
the data?

Solution. Although the histogram doesn't look perfectly normal, there is a hump in the middle and
a tail on the left, so perhaps the lack of tail on the right can be forgiven. The best approach is to
analyze the data by a normal distribution plot, as shown in Figure 3.11. The trendline has a slope
close to 1, but at the extremes, the points seem to be rather far from the line. To be more certain,
we consult a table of values for R².

