of V, and Y = X o R. Y is said to be induced by X and R. In the above, "o" represents
the composition of X and R. Here X and Y can be viewed as row vectors whose
components are the values of the membership functions as:
X = {u/mX(u) │ u ∈ U}
Y = {v/mY(v) │ v ∈ V}
As discussed before, X o R is the max-min product of the vector X and the relation
matrix R:
X o R = {v / [max (min (mX (u), mR (u, v))) │ u ∈ U] │ v ∈ V}
= ∪V {v / maxU [min (mX(u), mR(u, v))]}
Example.
       X                  R                 Y
                 ⎡ 0.8  0.9  0.2 ⎤
[0.2  1  0.3] o  ⎢ 0.6  1    0.4 ⎥  =  [0.6  1  0.4]
                 ⎣ 0.5  0.8  1   ⎦
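For readers who wish to experiment, the following is a minimal Python sketch of max-min composition that reproduces the example above (the function and variable names are our own, not from the text):

# A short sketch of max-min composition, reproducing the example above.

def max_min(x, r):
    """y_v = max over u of min(x_u, r_uv), for each column v of r."""
    return [max(min(xu, row[v]) for xu, row in zip(x, r))
            for v in range(len(r[0]))]

X = [0.2, 1.0, 0.3]
R = [[0.8, 0.9, 0.2],
     [0.6, 1.0, 0.4],
     [0.5, 0.8, 1.0]]
print(max_min(X, R))    # [0.6, 1.0, 0.4], as computed above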
Fuzzy Inference
1. "if A then B" and X → Y
Fuzzy inference is based on fuzzy implication and the compositional rule of
inference discussed above. The basic steps are as follows. Let A and X be fuzzy sets
of a universe U, and B and Y be fuzzy sets of a universe V. Suppose that we are
given:
a) Implication: "if A then B"
b) Premise: X is true
and we want to determine:
c) Conclusion: Y.
To achieve the above, perform the following two steps:
Step 1. Compute the fuzzy implication, "if A then B," as a fuzzy relation R = A ×
B.
Step 2. Induce Y by Y = X o R.
Using the membership functions for A, B, and X, we can compute the membership
function for Y as follows:
Y =XoR
= X o (A × B)
= {v / [max (min (mX(u), mA(u), mB(v))) │ u ∈ U] │ v ∈ V}
= ∪V {v / maxU [min (mX(u), mA(u), mB(v))]}.
2. "if A and B then C" and X and Y → Z
An extension of the above is (A and B) => C, i.e., "if A and B then C": given that X
and Y are true, we derive conclusion Z. Let A and X be fuzzy sets of a universe U, B
and Y be fuzzy sets of a universe V, and C and Z be fuzzy sets of a universe of W.
Then Z is computed as follows:
Z = (X × Y) o (A × B × C)
= {w / [max (min (mX(u), mY(v), mA(u), mB(v), mC(w))) │ u ∈ U, v ∈ V]│w ∈ W}
= ∪W {w / maxU×V [min (mX(u), mY(v), mA(u), mB(v), mC(w))]}.
Example. "if A then B" and X → Y
Let the universe: U = {1, 2, 3, 4, 5}. We define two fuzzy sets:
small = {1/1, 2/0.8, 3/0.6, 4/0.4, 5/0.2}
large = {1/0.2, 2/0.4, 3/0.6, 4/0.8, 5/1}
From these two fuzzy sets, we can derive other fuzzy sets as, for example:
not large = {1/0.8, 2/0.6, 3/0.4, 4/0.2}
very small = {1/1, 2/0.64, 3/0.36, 4/0.16, 5/0.04}
not very small = {2/0.36, 3/0.64, 4/0.84, 5/0.96}
Here the membership function of "very" X is computed from the membership
function of X, mX(u), by {u/(mX(u))² │ u ∈ U}. Now let our problem be described as
follows:
Our implication: if A then B, where A = small and B = large.
Our premise: X = not large.
What can we conclude for Y?
In this example, we first compute R by A × B, then determine the answer Y by Y = X
o R.
Step 1. Derive R from "if A then B" by A × B, where A = small and B = large:
    ⎡ 0.2  0.4  0.6  0.8  1   ⎤
    ⎢ 0.2  0.4  0.6  0.8  0.8 ⎥
R = ⎢ 0.2  0.4  0.6  0.6  0.6 ⎥
    ⎢ 0.2  0.4  0.4  0.4  0.4 ⎥
    ⎣ 0.2  0.2  0.2  0.2  0.2 ⎦
Step 2. Induce Y by Y = X o R, where X = not large:
            X                           R
                              ⎡ 0.2  0.4  0.6  0.8  1   ⎤
                              ⎢ 0.2  0.4  0.6  0.8  0.8 ⎥
Y = [0.8  0.6  0.4  0.2  0] o ⎢ 0.2  0.4  0.6  0.6  0.6 ⎥
                              ⎢ 0.2  0.4  0.4  0.4  0.4 ⎥
                              ⎣ 0.2  0.2  0.2  0.2  0.2 ⎦
= [0.2 0.4 0.6 0.8 0.8]
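The same mechanics can be scripted. The following sketch reproduces Steps 1 and 2 for this example using the same max-min logic as the earlier composition sketch (all names are our own):

# Verifying the example: R = A × B by min, then Y = X o R by max-min.

small = [1.0, 0.8, 0.6, 0.4, 0.2]              # A, over U = {1, ..., 5}
large = [0.2, 0.4, 0.6, 0.8, 1.0]              # B
not_large = [round(1 - m, 2) for m in large]   # X = [0.8, 0.6, 0.4, 0.2, 0.0]

# Step 1: mR(u, v) = min(mA(u), mB(v))
R = [[min(a, b) for b in large] for a in small]

# Step 2: mY(v) = max over u of min(mX(u), mR(u, v))
Y = [max(min(x, row[v]) for x, row in zip(not_large, R)) for v in range(5)]
print(Y)    # [0.2, 0.4, 0.6, 0.8, 0.8], matching the text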
In the above inference, we followed the basic steps of fuzzy inference discussed
earlier to compute R = A × B, then to determine Y by X o R. We can also obtain the
same result for Y by using the following formula, which was discussed after the basic
steps:
Y = X o R
  = X o (A × B)
  = ∪V {v / maxU [min (mX(u), mA(u), mB(v))]}.
Note that as a special case of the fuzzy inference discussed here, when X = A and
A is a normal fuzzy set (i.e., the maximum degree of A is 1), we can show that Y = X
o R = X o (A × B) = A o (A × B) is equal to B as follows:
Y = A o (A × B)
  = {v / [max (min (mA(u), mA(u), mB(v))) │ u ∈ U] │ v ∈ V}
  = {v / [max (min (mA(u), mB(v))) │ u ∈ U] │ v ∈ V}
  = ∪V {v / maxU [min (mA(u), mB(v))]}
  = ∪V {v / [min (maxU (mA(u)), mB(v))]}
  = ∪V {v / [min (1, mB(v))]}     (since A is normal)
  = ∪V {v / mB(v)}
  = B.
5.5 Fuzzy Control
Control refers to the control of various physical, chemical, or other numeric
characteristics, such as temperature, electric current, flow of liquid or gas, motion of
machines, various business and financial quantities (e.g., flow of cash, inventory
control), and so forth. A control system can be abstracted as a box for which inputs
are flowing into it, and outputs are emerging from it. Parameters can be included as
parts of inputs or within the box, i.e., the control system.
For example, consider a system that controls some kind of temperature
distribution by heat and possibly cooling sources. The inputs may be the current
temperature distribution and its time derivatives, and a parameter may be the target
temperature distribution. The outputs can be the amounts of the heat and cooling
sources to be applied. The control problem in general is to develop a formula or
algorithm for mapping from the inputs and parameters to the outputs.
Fuzzy control is a control technique based on fuzzy logic. Given input, typically
system measurements such as temperature, we are to determine output such as an
amount of the heat source to control the system. In fuzzy control, the rules, input,
and/or output may involve fuzziness, leading to the use of fuzzy logic.
The basic idea of fuzzy control is to apply fuzzy inference to control problems. In
fuzzy control, the control box includes fuzzification, fuzzy inference using fuzzy
if-then rules, and defuzzification procedures. Fuzzy rules can include human
descriptive judgements, such as "if the temperature is moderately high and the
pressure is very low, then the output is medium." Although fuzzy control is based on
fuzzy inference, simplified methods are used in practice to keep computation time
reasonable.
5.5.1 Fuzzy Control Basics
In a fuzzy control system, we have a set of fuzzy control rules in the format of "if x
= small, then z = big," or "if x = small and y = medium, then z = big." Here x and y are
the input variables and z is the output variable. Given specific values of x and y,
our task is to determine a value of z using applicable control rules and fuzzy
inference.
Commonly used fuzzy variables and their membership functions
We define fuzzy variables that can represent values of the input and output variables.
A commonly used set of seven fuzzy variables follows:
NB = Negative Big
NM = Negative Medium
NS = Negative Small
ZO = Zero
PS = Positive Small
PM = Positive Medium
PB = Positive Big
Or, the two mediums, NM and PM, may be omitted, resulting in the following set of
five fuzzy variables. This smaller set of fuzzy variables is simpler, but it results in
less fine or delicate control. For simplicity, we will use this five-variable
version hereafter.
NB = Negative Big
NS = Negative Small
ZO = Zero
PS = Positive Small
PB = Positive Big
The next step is to define membership functions for these fuzzy variables.
Defining a membership function is up to us, and the selection of membership
functions affects the control performance. What membership function we choose
depends on many factors, such as the type of application, how much fine control is
required, how fast the control must be performed, and so on. A rule of thumb is that a
simpler membership function requires less computation time but allows less fine control.
There are two categories of membership functions. One is continuous and the
other discrete. The following, Fig. 5.14, shows an example of the continuous
membership function. In this example, each fuzzy variable's membership function
has a triangular shape (plus the zero membership function outside of the triangle, i.e.,
the bottom line segments). There are other common continuous membership
functions such as trapezoids (rather than triangles) and bell shapes.
In Fig. 5.14, the membership function value or degree of variable PS is 1 when
normalized x = 0.5; PS is 0.5 when normalized x = 0.25 or 0.75; and PS is 0 when x
≥ 1 or x ≤ 0. Mathematically, a triangular membership function m(x) can be represented
as:
m(x) = max [ (a − |x − b|) ⁄ a , 0 ],
where a > 0 and b are constants. b determines the x value for the apex or the
symmetric point, and 2a represents the width of the triangle base. For example, in Fig.
5.14, for the membership function for variable NS, we choose a = 2 and b = - 2 for
not normalized, and a = 0.5 and b = - 0.5 for normalized.
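As an illustration, the triangular membership function and the normalized five-variable set of Fig. 5.14 can be coded as follows. This is a sketch under the stated choices of a and b; the names tri, apex, and fuzzify are our own:

# Triangular membership m(x) = max((a - |x - b|)/a, 0) and the normalized
# five-variable set (apexes at -1, -0.5, 0, 0.5, 1, and a = 0.5).

def tri(x, b, a=0.5):
    """Triangular membership with apex at b and base width 2a."""
    return max((a - abs(x - b)) / a, 0.0)

apex = {'NB': -1.0, 'NS': -0.5, 'ZO': 0.0, 'PS': 0.5, 'PB': 1.0}

def fuzzify(x):
    """Degrees of all five fuzzy variables for a normalized input x."""
    return {name: tri(x, b) for name, b in apex.items()}

print(fuzzify(-0.375))  # NS: 0.75, ZO: 0.25, others 0 (cf. the example below)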
Fig. 5.14. Graphical representations of continuous triangular membership functions for five
fuzzy variables.
The following is an example of discrete membership functions.
Table representations of discrete membership functions for five fuzzy variables
x       -4    -3    -2    -1     0     1     2     3     4
m(x):
NB       1   0.5     0     0     0     0     0     0     0
NS       0   0.5     1   0.5     0     0     0     0     0
ZO       0     0     0   0.5     1   0.5     0     0     0
PS       0     0     0     0     0   0.5     1   0.5     0
PB       0     0     0     0     0     0     0   0.5     1
Example.
Assume that acceleration of a space shuttle is between -4G and +4G (G represents the
gravitational acceleration). Using five fuzzy variables, the triangular membership
functions are represented by Fig. 5.14. A negative acceleration of -1.5 G, that is, the
normalized value of -0.375, for example, is represented as NS with the degree of
0.75 and ZO with the degree of 0.25 (and NB, PS, and PB with the degree of 0).
Typical fuzzy control setup
We will describe a typical fuzzy control setup in the following. At each time interval
our fuzzy control system receives specific values for two inputs, E and ∆E, and yields
one output, W:
E and ∆E → Fuzzy control system → W
That is, E and ∆E are our input variables, and W is our output variable. For example,
at some specific time, E may be 3 and ∆E may be 0, and then W may be determined
as -2.2. After a short time interval, the values of E and ∆E change, and a new value
of W is computed. This process continues over a certain time period until control has
been achieved.
Suppose that T represents the value to be controlled by the system. If we are to
control temperature, T will be the temperature; to control speed, T will be the speed,
etc. Then T0 is the target T, and E is the difference between T and T0.
E = T - T0: the general expression for temperature difference
Since T and E change step by step over time, we can define T and E at each time
period n as Tn and En, respectively.
En = Tn - T0: the temperature difference at time period n
Then ∆E is defined as follows (∆E represents the time derivative of E at time period
n).
∆E = En - En-1: the changing rate of E at time period n
Just as E is the difference between the current and target values, rather than the
current control value itself, W is a deviation from the current output value. For
example, suppose that temperature is controlled by a heat source, and the amount of
heat at time period n is Zn; then
Zn+1 = Zn + W.
Note. In some books, E is defined as E = T0 - T, i.e., the sign will be reversed. Using
this definition, the sign of ∆E is also reversed: ∆E = En - En-1 = (T0 - Tn) - (T0 - Tn-1)
= -(Tn - Tn-1). In the fuzzy if-then table discussed below, the terms of "Negative" and
"Positive" for E and ∆E will be interchanged (e.g., replace NB with PB). The table
entries for W remain the same for this definition of E.
Fuzzy if-then rules that derive W from E and ΔE
We set up a table for fuzzy if-then rules that derive W from E and ΔE in terms of the
fuzzy variables. The following is such a table.
Fuzzy if-then rule table for (E, ΔE) → W
                               ∆E
     W       NB       NS       ZO       PS       PB
 ────────────────────────────────────────────────────────
     NB                        PB   ← (Rule 6)
     NS                        PS
 E   ZO      PB       PS       ZO       NS       NB
     PS       ↑                NS   ← (Rule 8)
     PB   (Rule 1)             NB   ← (Rule 9)
This table represents nine rules corresponding to the nine entries in the table. For
example, "if E = ZO and ΔE = NB, then W = PB" may be called Rule 1. The remaining
four entries in the same horizontal line of the table may be called Rules 2, 3, 4 and 5.
The remaining four entries in the vertical line may be called Rules 6, 7, 8 and 9. That
is,
Rule 1: if E is ZO and ∆E is NB then W is PB,
or
Rule 2: if E is ZO and ∆E is NS then W is PS,
or
:
:
Rule 8: if E is PS and ∆E is ZO then W is NS,
or
Rule 9: if E is PB and ∆E is ZO then W is NB.
System response phases
System response in fuzzy control, i.e., the behavior of the value to be controlled with
respect to time, is shown in Fig. 5.15. The response is a series of oscillating (irregular)
cycles with decaying amplitude, where each cycle consists of four phases, I through IV.
Phase I of Cycle 1, i.e., the beginning of control, is near point a1 in the figure, where
E has the most negative value and ∆E is near zero. Hence, Rule 6 in the if-then rule
table applies to the region around point a1. Around point b1, E is near zero and ∆E is
a large positive number. Hence, Rule 5 in the if-then rule table applies to this point.
Similarly, around point c1, E is large and positive and ∆E is near zero, so Rule 9
applies; around point d1, E is near zero and ∆E is most negative, and Rule 1 applies
to this point. After Phases I, II, III, and IV of Cycle 1, the four phases repeat for Cycle
2, with smaller amplitude. Hence, near point a2, E is small and negative, ∆E is near
zero, and Rule 7 applies.
The following fuzzy if-then rule table shows some points in Fig. 5.15 and their
major corresponding rules. As we will see later in a case study, typically more than
one rule is applied to a point, which is a major feature of fuzzy systems in general.
Points a3, b3, c3, and so on will continue in the same fashion, forming a shrinking
spiral in the table. Whether a3 or any subsequent point converges as a major rule to
Rule 3, where E = ZO and ∆E = ZO, depends on how fast the amplitude decreases.
Perhaps a3 stays on Rule 7, the same rule as for a2, and b3 converges to Rule 3, E =
ZO and ∆E = ZO.
Fig. 5.15 System response in fuzzy control
                               ∆E
     W       NB       NS       ZO       PS       PB
 ─────────────────────────────────────────────────────────
     NB                       PB a1
     NS                       PS a2
 E   ZO    PB d1    PS d2     ZO      NS b2    NB b1
     PS                       NS c2
     PB                       NB c1
In the fuzzy if-then rule table, the nine entries shown, which represent the phases
of different cycles, are considered the major entries required to achieve the required
control. The empty entries of the table (such as E = NB and ∆E = NB) are considered
not important for several reasons. First, for fuzzy control as depicted in Fig. 5.15,
cases in which both E and ∆E take extreme values (e.g., E = NB and ∆E = NB; E =
PB and ∆E = NB) do not occur. That is, when one of E and ∆E takes an extreme value,
the other is near zero. Second, as we will see soon, this if-then rule table is used in
conjunction with membership functions like Fig. 5.14. Each value of a fuzzy variable
(e.g., ∆E = ZO) does not represent a single point, but instead covers a wide range (e.g.,
ZO covers x = -2 to 2) with varying degrees. That is, each rule in the table covers
wide ranges of E and ∆E. A third reason is, as we will see in the case study in the next
subsection, W is typically chosen as a deviation (a small additive term) from the
current system output, rather than the system output itself. For such W, the effect of
W is not as critical as the system output itself.
These reasons allow us to have fewer rules to perform the required control, and
normally, simple assumptions are made for these empty entries. For example, assume
W = 0 (not W = ZO) for all these empty entries. When W represents a deviation from
the current system output, W = 0 means to keep the current system output. In certain
applications, some of the empty entries are filled in, as for example: E = PS and ∆E
= NB then W = NS, E = PB and ∆E = NS then W = PS, E = PS and ∆E = NS then W
= ZO, etc.
A cookbook recipe to compute output, W, from two inputs, E and ∆E
Now with all this predetermined information, we can compute W for the given values
of E and ∆E.
1. Fuzzification.
Look at Fig. 5.14 and find which fuzzy variables (NB, etc.) apply to the given
specific value of E (x in the figure) and to what degree. Repeat the same for ∆E.
2. Fuzzy inference.
a) Look at the fuzzy if-then rule table above and find which rules apply for the
fuzzy variable combinations found for E and ∆E in Step 1. Let us call these rule
numbers i and j (e.g., if Rules 8 and 9 are applicable, then i = 8 and j = 9).
b) Compute the weight (firing strength), αi, of each rule found in Substep (a) in
the form of
αi = min(mF1(E), mF2(ΔE)) = mF1(E) ∧ mF2(ΔE)
where F1 and F2 are the fuzzy variables found in Step 1, and ∧ takes the
minimum of the operand membership functions.
c) Find the membership function for W associated with each rule in the form of
mi(W) = αi ∧ mF3(W)
where F3 is the fuzzy variable found for W in the fuzzy if-then rule table, and
mF3(W) is the membership function corresponding to that fuzzy variable in Fig.
5.14 (let x be W).
Note that in these substeps, we employ the fuzzy implication formula in the
form of "if E and ΔE then W" = E × ΔE × W, taking the minimum of the
membership functions of these variables, E, ΔE, and W.
d) Compute the membership function for W, mT(W), in the following form:
mT(W) = max(mi(W), mj(W)) = mi(W) ∨ mj(W),
where ∨ takes the maximum of the operand membership functions. Note that since
these rules are combined in the form of Rule i or Rule j, we take the max of these
membership functions.
3. Defuzzification.
The above mT(W) gives the fuzzy version of the solution, i.e., the answer for
output W as a (membership) function of W. (For example, an answer may be to
produce output W in the form of -4 to -1 with a degree of 0.5, and so on.) For
practical output, however, we need a specific single value, W0, as a system output
to perform the control. For this purpose, we compute the "mean" of W weighted
by mT(W) or the "center of gravity" of mT(W) as W0 as follows:
W0 = [ ∫ W ⋅ mT(W) dW ] ⁄ [ ∫ mT(W) dW ]
Here the ∫ symbol represents ordinary integration, rather than fuzzy union (note
"dW" at the end of the expression). This process of evaluating the center of
gravity is called a defuzzification procedure.
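To make the recipe concrete, the following is a minimal end-to-end Python sketch of Steps 1 through 3. It assumes the normalized triangular membership functions of Fig. 5.14, uses only the nine rules of the table above, and approximates the integrals by summing narrow rectangles; all names are our own:

# A minimal sketch of the cookbook recipe: fuzzification, fuzzy inference,
# and center-of-gravity defuzzification, for two inputs and one output.

apex = {'NB': -1.0, 'NS': -0.5, 'ZO': 0.0, 'PS': 0.5, 'PB': 1.0}

def tri(x, b, a=0.5):
    """Triangular membership function with apex b and half-width a."""
    return max((a - abs(x - b)) / a, 0.0)

# The nine rules: (fuzzy variable of E, fuzzy variable of dE) -> variable of W
RULES = {('ZO', 'NB'): 'PB', ('ZO', 'NS'): 'PS', ('ZO', 'ZO'): 'ZO',
         ('ZO', 'PS'): 'NS', ('ZO', 'PB'): 'NB',
         ('NB', 'ZO'): 'PB', ('NS', 'ZO'): 'PS',
         ('PS', 'ZO'): 'NS', ('PB', 'ZO'): 'NB'}

def infer(e, de, n=1000):
    """Crisp output W0 (normalized) for normalized inputs e and de."""
    # Step 1: fuzzification, keeping only the nonzero degrees
    fe = {v: tri(e, b) for v, b in apex.items() if tri(e, b) > 0}
    fde = {v: tri(de, b) for v, b in apex.items() if tri(de, b) > 0}

    # Step 2: each applicable rule contributes min(alpha_i, m_F3(W));
    # the rules are combined by taking the max
    def m_total(w):
        deg = 0.0
        for (v1, v2), v3 in RULES.items():
            if v1 in fe and v2 in fde:
                alpha = min(fe[v1], fde[v2])          # firing strength
                deg = max(deg, min(alpha, tri(w, apex[v3])))
        return deg

    # Step 3: defuzzification by the center of gravity over [-1, 1],
    # approximating the integrals with narrow rectangles
    ws = [-1 + 2 * i / n for i in range(n + 1)]
    num = sum(w * m_total(w) for w in ws)
    den = sum(m_total(w) for w in ws)
    return num / den if den else 0.0

# The case study values E = 3, dE = 0, normalized by dividing by 4:
print(infer(3 / 4, 0))   # about -0.5595; times 4 gives W0 = -2.238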
5.5.2 Case Study: Controlling Temperature with a Variable Heat
Source
In the following, we will illustrate the basic procedure of fuzzy control discussed
above by using a simple example. Our case study problem is described below.
Problem description
We have one type of system measurement, temperature, T. Let Tn be the temperature
at time period n, and T0 be the target temperature. From these values, we compute the
following:
E = T - T0:        the general expression for temperature difference
En = Tn - T0:      the temperature difference at time period n
∆E = En - En-1:    the changing rate of E at time period n
We also have one type of system output, the changing rate of heat source, W. W
represents a small difference from the current heat source. If the current heat source
is Zn, then Zn+1 = Zn + W.
Our case study problem is as follows: given two input values, E = (the difference
between the current temperature and the target temperature) and ΔE = (the time
derivative of the difference), we are to determine output value, W = (the changing
rate of heat source).
Suppose that (not normalized) E = 3 and ∆E = 0.
Step 1. Fuzzification.
By looking at the not normalized x = 3 in Fig. 5.14 (for E = 3), we find that E
is PB (Positive Big) with the degree 0.5 and PS with the degree 0.5. Similarly,
∆E is ZO with the degree 1.0.
Step 2. Fuzzy inference
a) Hence, in the fuzzy if-then rule table, two rules are applicable: Rules
8 and 9. (Rule 8: if E is PS and ∆E is ZO, then W is NS; Rule 9: if E
is PB and ∆E is ZO, then W is NB.)
b) For each of these two rules, we compute the weight (firing strength):
α8 = mPS(E) ∧ mZO(ΔE) = 0.5 ∧ 1.0 = 0.5
α9 = mPB(E) ∧ mZO(ΔE) = 0.5 ∧ 1.0 = 0.5
c) We find the membership function for W associated with each rule as:
m8(W) = α8 ∧ mNS(W) = 0.5 ∧ mNS(W) (see Fig. 5.16)
m9(W) = α9 ∧ mNB(W) = 0.5 ∧ mNB(W) (see Fig. 5.17)
d) The membership function for W, mT(W), is obtained as the max of the
above two intermediate membership functions, m8(W) and m9(W)
(see Fig. 5.18).
mT(W) = m8(W) ∨ m9(W).
Step 3. Defuzzification.
We compute the center of gravity of mT(W) as W0 (see Fig. 5.19). The
numerator of W0 = ∫ W⋅mT(W) dW = ∫ from -4 to -1 of W⋅(0.5) dW + ∫ from -1 to 0
of W⋅(-W/2) dW = (0.5)[W²/2] from -4 to -1 + (-0.5)[W³/3] from -1 to 0 = -47/12.
The denominator of W0 = ∫ mT(W) dW = ∫ from -4 to -1 of (0.5) dW + ∫ from -1 to 0
of (-W/2) dW = (0.5)[W] from -4 to -1 + (-0.5)[W²/2] from -1 to 0 = 7/4.
Hence, W0 = (-47/12)/(7/4) = -47/21 ≈ -2.2381.
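As a quick check, the two integrals can be verified with exact fractions (a verification sketch of our own):

# Checking the integrals above; mT(W) = 0.5 on [-4, -1] and -W/2 on [-1, 0].
from fractions import Fraction as F

num = F(1, 2) * (F(-1)**2 - F(-4)**2) / 2 + F(-1, 2) * (F(0)**3 - F(-1)**3) / 3
den = F(1, 2) * (F(-1) - F(-4)) + F(-1, 2) * (F(0)**2 - F(-1)**2) / 2
print(num, den, float(num / den))   # -47/12  7/4  -2.238095...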
Fig. 5.16. m8(W) = 0.5 ∧ mNS(W)
Fig. 5.17. m9(W) = 0.5 ∧ mNB(W)
Fig. 5.18. The membership function for W, mT(W) = max(m8(W), m9(W))
Fig. 5.19. Final output, W0, as the center of gravity of mT(W)
Possible extensions of this simple case study are multiple system measurements
(rather than a single measurement of temperature) and/or outputs (rather than a single
output of W).
Programming considerations
Writing a short program for a simple problem such as the case study discussed here
is a good way to understand the basics of fuzzy control. Integrations may be carried
out by using a simple formula. One such formula would be to add up narrow
rectangular areas, where each rectangle has width dW. Or, when the membership
functions for fuzzy variables are triangular, as in our discussion, or have another
piecewise-linear form, the integration may be determined analytically by adding the
areas of trapezoids and triangles.
For simulation, a simple assumption can be made about how the value of W0 affects
the system measurement being controlled. For example, ∆T, the change in
temperature T, may be assumed to be proportional to, or a linear function of, W0.
Starting with an initial T, at each time step we would compute a new value of T as
the old T + ∆T, then compute new values of E, ∆E, and W0. We may then be able to
see how the temperature converges to the target temperature - a simulation of fuzzy control.
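A sketch of such a simulation follows, reusing infer() from the recipe sketch in the previous subsection. The plant model (∆T proportional to W0), the gain k, the normalization ranges, and the starting values are arbitrary assumptions for illustration only:

# A toy closed-loop simulation; all constants are assumed for illustration.

T0, T, k = 70.0, 82.0, 1.0        # target, initial temperature, plant gain
E_prev = T - T0
for step in range(25):
    E = T - T0
    dE = E - E_prev
    E_prev = E
    # normalize inputs into [-1, 1] (assumed ranges: |E| <= 16, |dE| <= 4)
    e = max(-1.0, min(1.0, E / 16))
    de = max(-1.0, min(1.0, dE / 4))
    W0 = 4 * infer(e, de)          # de-normalize the output
    T += k * W0                    # assumed plant response: dT = k * W0
    print(f"step {step:2d}: T = {T:6.2f}")
# T ratchets down toward the target (70), a toy fuzzy control simulation.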
For commercial applications, the principles of fuzzy control discussed here may
not actually operate inside individual machines, since fuzzy control is too costly and
time-consuming for real-time control. Instead, input-to-output mappings are
determined using fuzzy control at the factories, these mappings are recorded on a
computer chip, and the machines operate based on this information.
5.5.3 Extended Fuzzy if-then Rules Tables
Note on the four “corner regions” of fuzzy if-then rule tables
Up to this point, all the entries of the four corner regions of fuzzy if-then tables are
assumed to be 0 for simple implementations. For more fine-tuned control, these corner
regions can have nonzero values, as in the following example.
                                   ΔE
     W       NB      NM      NS      ZO      PS      PM      PB
 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
     NB                              PB      PM
     NM                              PM
     NS                              PS      ZO              NM
 E   ZO      PB      PM      PS      ZO      NS      NM      NB
     PS      PM              ZO      NS
     PM                              NM
     PB                      NM      NB
In this improved table, six nonzero entries are added in the two corner regions. Note
that the same entries (e.g., PM) appear along the diagonal lines. In this subsection,
we discuss the significance of the four corner regions.
1. For the simple implementation discussed earlier, we assume W = 0 for all entries in
   all four corner regions (we call this the original table). The justifications are
   as follows:
   i) The regions correspond to less critical points in the control process.
   ii) The table with zero corner regions still covers a wide range (e.g., ZO =
       -0.33 to 0.33). Since multiple rules are used, even if W = 0 for some rules,
       W ≠ 0 for some other rules, and the final W ≠ 0.
   iii) W is typically, e.g., a changing rate of a heat source, rather than the heat
        source itself. W = 0 means to keep the current heat source, and this works
        well for many applications. Incidentally, if W represents the heat source
        itself, we cannot set W = 0; it would probably mean disastrous control.
2. An improved table over W = 0.
The literature (e.g., Lee, p. 413) suggests adding six entries (e.g., if E = NB and
ΔE = PS then W = PM; other corner entries remain 0) for possible finer control
as shown in the above table (an improved table). This is based on fine-tuning
considerations in the system response, as explained below.
In the following, Points refer to the points in Fig. 5.20.
Point a: E = NB, ΔE = 0 → W = PB (original table).
Point b: E = NB, ΔE = PS. To accelerate temperature increase, we may set W
= PM instead of 0.
Point c: Typically, there is some time lag between fuzzy control and physical
target systems. The added W = PM at Point b may cause over-shooting.
To correct this situation, we may start an early adjustment before Point
d by: E = NS, ΔE = PB → W = NM instead of 0.
Point d: E = ZO, ΔE = PB → W = NB (original table)
Fig. 5.20. System response points for finer fuzzy control.
The resulting typical system response may look as follows.
Fig. 5.21. System response with improved fuzzy control (improved versus original curves).
3. Further extensions for even finer tuning. Fill in the four corner regions with
nonzero entries diagonally. For seven variables, a total of 7 × 7 = 49 entries:
                                   ΔE
     W       NB      NM      NS      ZO      PS      PM      PB
 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
     NB      PB      PB      PB      PB      PM      PS      ZO
     NM      PB      PB      PB      PM      PS      ZO      NS
     NS      PB      PB      PM      PS      ZO      NS      NM
 E   ZO      PB      PM      PS      ZO      NS      NM      NB
     PS      PM      PS      ZO      NS      NM      NB      NB
     PM      PS      ZO      NS      NM      NB      NB      NB
     PB      ZO      NS      NM      NB      NB      NB      NB
For five fuzzy variables, 5 × 5 = 25 entries.
                           ΔE
     W       NB      NS      ZO      PS      PB
 ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
     NB      PB      PB      PM      PS      ZO
     NS      PB      PB      PS      ZO      NS
 E   ZO      PB      PS      ZO      NS      NB
     PS      PM      ZO      NS      NB      NB
     PB      ZO      NS      NB      NB      NB
For example, for E = PB, ∆E = NB, PB and NB cancel out and W = ZO.
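For implementation, such a table can be stored directly as a lookup structure, as in the following sketch (the encoding is our own):

# The filled-in five-variable table as a lookup structure: given the fuzzy
# variables found for E and dE, the consequent variable for W is read off.

VARS = ['NB', 'NS', 'ZO', 'PS', 'PB']
TABLE = [  # rows: E = NB..PB; columns: dE = NB..PB
    ['PB', 'PB', 'PM', 'PS', 'ZO'],
    ['PB', 'PB', 'PS', 'ZO', 'NS'],
    ['PB', 'PS', 'ZO', 'NS', 'NB'],
    ['PM', 'ZO', 'NS', 'NB', 'NB'],
    ['ZO', 'NS', 'NB', 'NB', 'NB'],
]

def rule(e_var, de_var):
    return TABLE[VARS.index(e_var)][VARS.index(de_var)]

print(rule('PB', 'NB'))   # 'ZO': PB and NB cancel out, as noted above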
An extension of the case study for which both E and ∆E are nonzero
Example. E = 3 and ∆E = -0.5, rather than E = 3 and ∆E = 0.
The basic principle is the same as discussed in the cookbook recipe, p. 149.
Step 1. Each of E and ∆E is represented by two fuzzy variables: E is the same
        as before, i.e., PB with 0.5 and PS with 0.5. ∆E is now ZO with 0.75 and
        NS with 0.25.
Step 2. (a) The fuzzy if-then rule table has to be extended to have additional
        entries (tentatively called Rules 10 and 11):
Rule 10. if E = PS and ∆E = NS then W = ZO.
Rule 11. if E = PB and ∆E = NS then W = NS.
The rest of the algorithm is performed in the same way.
(b) Compute the firing strengths, α8, α9, α10, α11.
(c) Find the membership function for W associated with each rule,
m8(W), . . ., m11(W).
(d) Determine mT(W) by the max operations on m8(W), . . ., m11(W).
Step 3. Compute the center of gravity.
5.5.4 A Note on Fuzzy Control Expert Systems
There are many control problems in which rules are expressed in descriptive rather
than numeric expressions. For example, "if the speed is moderately fast, then slightly
reduce the fuel amount" is a descriptive expression, while "if the speed is 200
km/hour, then reduce the fuel amount by 6%" is a numeric expression. Many
real-world control rules are described in descriptive expressions, because this is the
way experts perform their operations. Numeric rules are not used for several reasons.
For example, the number of rules required in numeric form may be large, the rules
may be so complex that some sort of approximations have to be used (e.g.,
linearization of nonlinear expressions), and so forth. Even if we have numeric rules,
obtaining numeric input can be either difficult or not economical. For these situations,
descriptive rules with either numeric or descriptive input can be used.
For example, consider parallel parking a car. Solving this problem by, say, using
a set of differential equations and measuring the distance, the angular velocity of the
steering, etc., is difficult. Instead, we may have been using something like a set of fuzzy
rules, such as "if the distance to the next car is PS (Positive Small), then keep the
same speed and rotate the steering PS."
As we have seen in this chapter, fuzzy control can easily incorporate descriptive
rules in the system. For example, "if the speed is moderately fast (PS) and the angle
is sharp to the left (NM: Negative Medium), then slightly reduce (NS) the fuel
amount" can be a fuzzy rule described by an experienced expert. Membership
functions can be defined for these fuzzy variables, fuzzy inference can be performed,
and output can be computed. In a sense, this type of fuzzy control quantifies
descriptive rules for numeric computation. This is the basic idea of implementing
human operators' descriptive knowledge in the form of fuzzy control. Fuzzy control
based on this idea has been the most successful among fuzzy systems in terms of
real-world applications, and the number of applications is likely to increase in the future.
5.6 Hybrid Systems
One of the most active recent trends is the use of various forms of hybrid systems
combining fuzzy logic and other areas such as neural networks and genetic
algorithms.
Fuzzy - neural network hybrid systems
Much current research suggests using fuzzy logic and neural networks as
complementary techniques. The fundamental concept of such hybrid systems is to
complement each other's weaknesses, thus creating new problem-solving approaches.
For example, fuzzy systems have no machine learning capabilities. Nor do they have
the memory or pattern recognition capabilities that neural networks do. The
backpropagation neural network model, for instance, can memorize and recognize a
potentially huge number of input patterns by storing much less information in the
weights. (For example, if there are 100 input units, and each unit can have either
0 or 1, then 2¹⁰⁰ ≈ 10³⁰ different input patterns are possible.)
Fuzzy systems with neural networks may add such capabilities, and in fact, recent
commercial applications of neural networks in Japan are mostly tied to fuzzy control.
The current stage of such neural network systems is relatively simple for real-world
applications, however, and some people say that their functions are mostly "tuning"
rather than "learning." Several applications have shown the advantages of neural
networks in mapping the non-linear behavior of systems to predict future states,
monitor the system behavior, and anticipate failures. (For more, see Jang et al.,
1997.)
Fuzzy - genetic algorithm hybrid systems
Applications of genetic algorithms combined with fuzzy control are being
investigated not only at the academic level but also at the commercial level. Genetic
algorithms are particularly well-suited for tuning the membership functions in terms
of placing them in the universe of discourse. A properly configured genetic
algorithm/fuzzy architecture searches the complete universe of discourse and finds
adequate solutions according to the fitness function.
Fuzzy - PID hybrid systems
For certain applications, fuzzy and PID systems are employed together as a hybrid
controller. A PID controller can be used for approximate and fast control, while a
fuzzy system either tunes the PID gains or schedules the most appropriate PID
controller for better performance.
5.7 Fundamental Issues
Problems and limitations of fuzzy systems
1) Stability
Stability is a major issue for fuzzy control. As described below, there is no
theoretical guarantee that a general fuzzy system will not become chaotic and will
stay stable, although extensive experience suggests that such a possibility is
extremely slim.
2) Lack of learning capability
As mentioned before, fuzzy systems lack capabilities of machine learning, and
neural network-like memory and pattern recognition. This is why hybrid systems,
particularly neuro-fuzzy systems, are becoming popular for certain applications.
3) Determining or tuning good membership functions and fuzzy rules is not always
easy. Even after extensive testing, it is difficult to say how many membership
functions are really required. Questions like why a particular fuzzy expert system
needs so many rules, or when a developer can stop adding more rules are not
easily answered.
4) There exists a general misconception of the term "fuzzy" as imprecise or
imperfect. Many professionals think of fuzzy logic as "magical," without a firm
mathematical foundation.
5) Verification and validation of a fuzzy expert system generally requires extensive
testing with hardware in the loop. Such a luxury may not be affordable by all
developers.
Stability, controllability and observability
The notion of stability is well-established in the classical control theory, and for a
given linear system, several criteria of stability can be applied and necessary
computations can be performed to obtain results. Similarly, the notion of
controllability and observability is firmly established in modern state space theory.
Using a linearized set of equations, proper parameters can be computed to show that
the system behavior meets these criteria well. Because of the complexity of the
mathematical analysis for fuzzy logic, stability theory requires further study, and
issues such as controllability and observability have yet to be defined for fuzzy
control systems.
Fuzzy versus probability theories
The continuous, rather than crisp, transition between 0 and 1 in fuzzy sets
and logic is similar to probability theory. Additionally, the technique of deriving
membership functions using relative frequency distributions confuses developers
and creates an impression that fuzzy logic is another form of probability theory. This
sometimes raises debates about how fuzzy theory differs from probability theory.
The most fundamental difference is in their basic ideas. In probability theory, we
deal with the chance of occurrence, e.g., getting heads when flipping a coin, winning
a lottery, or being involved in a car accident. The membership degree of
fuzzy set theory is not probability, but plausibility. For example, suppose that
someone's membership degree in a set of young people is 0.7. This does not mean
that this person is young 70% of the time and old the remaining 30% of the time, as
a probability would suggest. Rather, it means that this person is fairly young, to the
degree of 70%, right now and all the time. In other words, the fundamental difference between
fuzzy and probability theories is that the former deals with deterministic plausibility,
while the latter deals with the likelihood of nondeterministic, stochastic events.
From a practical point of view, fuzzy systems have led to numerous new real
world applications, which would not have been realized by using probability theory.
5.8 Additional Remarks
A bit of history
Fuzzy set theory was introduced in 1965 by Lotfi A. Zadeh at the University of
California at Berkeley. In 1974, E.H. Mamdani et al. at the University of London
demonstrated applications of fuzzy set theory to control problems. But the concepts
were known in a relatively small research community until Hitachi of Japan used
fuzzy control for the new Sendai Subway in 1986. The performance improvement
was significant; it reduced the stop-gap distance by 2.5 times, doubled the comfort
index, and saved 10% in power consumption. The number of practical fuzzy
application systems has exploded since then. World-wide industrial and commercial
applications appear likely to increase significantly in the near future. Academic
interest in fuzzy theory has also been growing recently, as indicated by the first
issue of the IEEE Transactions on Fuzzy Systems in February 1993.
Significance of fuzzy control
As stated at the beginning of this chapter, fuzzy logic allows decision making under
fuzzy information and rules. It also allows us to represent descriptive or qualitative
expressions. For control problems that involve fuzziness and descriptive
expressions, fuzzy control is typically simpler, faster, less costly, and more robust
than traditional mathematical approaches. Fig. 5.22 shows the approximate position
for which fuzzy control is most useful. In many systems (e.g., "humanistic" systems),
using classical control makes precision either impossible or inappropriate. Here are
some comparisons for typical situations.
Typical classic versus fuzzy control systems
                              Classic Control             Fuzzy Control
 ─────────────────────────────────────────────────────────────────────────────
 Input, output, and           Numeric                     Numeric + descriptive
 intermediate values
 Algorithm                    Single, e.g., a             Multiple (front-end if-then
                              differential equation       rules may select algorithms)
 Robustness                   Weak                        Good
Fig. 5.22. Approximate domain for which fuzzy control best fits.
Generic categories of fuzzy system applications
The following is a list of different categories and their fuzzy system application types.
 Category                 Application Area Examples
 ────────────────────────────────────────────────────────────────────────
 Control                  Control is the most widely applied category today. The
                          majority of the industrial applications in the next table
                          are in this category.
 Pattern recognition      Image (e.g., optical character recognition), audio, and
                          signal processing.
 Quantitative analysis    Operations research, statistics, management.
 Inference                Expert systems for diagnosis, planning, and prediction;
                          natural language processing; intelligent interfaces;
                          intelligent robots; software engineering.
 Information retrieval    Databases.
A partial list of application areas of fuzzy systems
The following is a list of selected application areas and examples.
 Field                   Applications
 ────────────────────────────────────────────────────────────────────────
 Transportation          Subways, helicopters, elevators, traffic control,
                         highway tunnel-air control
 Automobiles             Transmissions, cruise control, engines, brakes
 Consumer electronics    Washing machines, driers, refrigerators, vacuum
                         cleaners, rice cookers, televisions, VCRs, air
                         conditioners, kerosene fan heaters, microwave ovens,
                         shower systems, video cameras
 Robotics
 Computers
 Other industries        Steel, chemical, power generation, construction,
                         nuclear, aerospace
 Engineering             Electrical, mechanical, civil, environmental, geophysics
 Medicine
 Management              Credit evaluation, damage/risk assessment, stock picking,
                         marketing analysis, production management, scheduling,
                         decision support systems
Further Reading
If I had to choose one, I would pick Terano's book for a general introduction to fuzzy
systems and applications. Lee's article is a clear tutorial on fuzzy control. Zadeh's 1965
paper is the seminal article on fuzzy sets, from which many areas of fuzzy systems
later derived.
J.-S.R. Jang, C.T. Sun and E. Mizutani, Neuro-Fuzzy and Soft Computing,
Prentice-Hall, Upper Saddle River, NJ, 1997.
Y. Jin, Advanced Fuzzy Systems Design and Applications, Physica-Verlag, 2003.
C.C. Lee, "Fuzzy Logic in Control Systems: Fuzzy Logic Controller, Parts I and II."
IEEE Transactions on Systems, Man and Cybernetics, 20, 2 (March/April, 1990)
404-435.
T. Munakata and Y. Jani, "Fuzzy Systems: An Overview," Communications of the
ACM, Vol. 37, No. 3 (March, 1994), 69-76.
T. Terano, K. Asai, and M. Sugeno, Fuzzy Systems Theory and Its Applications,
Academic Press, San Diego, 1992.
W. Pedrycz and F. Gomide, An Introduction to Fuzzy Sets: Analysis and Design, MIT
Press, 1998.
L.A. Zadeh, "Fuzzy Set," Information and Control, Vol. 8, 1965, 338-353.
L.A. Zadeh, "Fuzzy Algorithms," Information and Control, Vol. 12, 1968, 94-102.
L.A. Zadeh, "Outline of a New Approach to the Analysis of Complex Systems and
Decision-Making Approach," IEEE Transactions on Systems, Man and Cybernetics,
Vol. SME-3, No. 1, January, 1973, 28-44.
H.-J. Zimmermann, Fuzzy Set Theory and Its Applications, 4th Ed., Springer, 2005.
Journals
IEEE Transactions on Fuzzy Systems.
Fuzzy Sets and Systems, Elsevier (sponsored by the IFSA, International Fuzzy
Systems Association).
International Journal of Approximate Reasoning, Elsevier (affiliated with the
NAFIPS, North American Fuzzy Information Processing Society).
Many other journals, magazines and conference proceedings in AI and its
applications carry articles in fuzzy systems.
6 Rough Sets
6.1 Introduction
Rough set theory is a relatively new mathematical and AI technique, introduced by
Zdzislaw Pawlak of the Warsaw University of Technology in the early 1980s. The area
remained unknown to most of the computing community until recently. Rough set
set theory is particularly useful for discovering relationships in data. This process is
commonly called knowledge discovery or data mining. It is also suited to reasoning
about imprecise or incomplete data.
Rough sets, meaning approximation sets, are built on ordinary sets. We recall that
fuzzy sets are a generalization of ordinary sets. In this regard, rough sets and fuzzy
sets have a common ground. However, their ways of deviating from ordinary sets are
different, and their primary application objectives are also different. These two
approaches can be used in their own domains independently, or they can be
complementary. Briefly, fuzzy sets allow partial membership to deal with gradual
changes or uncertainties. Rough sets, on the other hand, allow multiple memberships
to deal with indiscernibility.
Rough set theory is commonly compared to other techniques such as statistical
analysis, particularly discriminant analysis, and machine learning in classical AI.
While there are no mathematical proofs to show which technique is most suitable for
what types of problems, it appears that each technique has specific strengths for
problems of certain kinds. For example, rough sets might do a better job than
statistical analysis when the underlying data distribution deviates significantly from
a normal distribution, since there is no such distribution assumption in rough sets.
Also, perhaps rough sets could be better than statistical analysis when the sample size
is small, since any distribution can hardly be defined in such a case. Machine learning
can broadly be defined as the process in which computers acquire their knowledge
and improve their skills by themselves. Under this broad definition, the data mining
aspect of rough sets can be regarded as a technique of machine learning. Rough sets,
however, employ an approach different from those found in classical machine
learning.
A major feature of rough set theory in terms of practical applications is the
classification of empirical data and subsequent decision making. The primary
application domains of rough sets have been in symbolic approaches such as data and
decision analysis, databases, knowledge based systems, and machine learning. There
are also recent interests in rough control, applications of rough set theory to control
problems. Practical application areas include: engineering disciplines such as civil,
electrical, chemical, mechanical and transportation; pharmacology; medicine; and
operations research. Specific application examples include: cement kiln, aircraft
pilot performance evaluation, hydrology, and switching circuits.
The basic idea of rough sets
Raw data is often very detailed, yet disorganized, incomplete and imprecise. To
understand and use the data, we derive underlying knowledge about the data, i.e.,
what it represents. Such knowledge can be represented in many forms. Rules are the
most common form of representing knowledge. Other forms include equations and
algorithms.
In many situations, we may not need detailed data to derive conclusions for actions.
Instead, "coarse" or "rough" data or data sets may be sufficient. In certain situations,
such approximate rough data may be even better than detailed data. Too much detail
is often confusing. Rough data can be more efficient, effective, and robust, and may
uncover the underlying characteristics.
The rough sets methodology gives a new technique for reasoning from imprecise
and ambiguous data. The technique can efficiently perform knowledge acquisition
and machine learning. It performs these by lowering the degree of precision in data,
based on a rigorous mathematical theory. By selecting the right roughness or
precision of data, we will find the underlying characteristics of data. In this chapter
we give a brief introduction to rough set theory. Shorter introductions can be
found in Pawlak 1988 and 1995. A thorough treatment of the theory, especially its
theoretical foundation, is given in Pawlak 1991.
List of selected symbols in this chapter
Symbol Page Meaning
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
U×V 165 The cartesian product of two sets, U and V.
R* 167, 169 The partition induced by an equivalence relation R:
R* = {X1, X2, ...., Xn}, where Xi is an equivalence
class of R. Xi is also called an elementary set of an
approximation space S = (U, R).
R*1 ⋅ R*2 167 The product of two partitions, R*1 and R*2, is the
partition induced by R1 ∩ R2.
S = (U, R) 171 An approximation space, where U is a finite set of
objects and R ⊆ U × U is an equivalence relation on
U. If u, v ∈ U and (u, v) ∈ R, we say that u and v are
indistinguishable in S. R is called an indiscernibility
relation.
S(X)         171   The lower approximation of X in S = ∪Xi⊆X Xi
S̄(X)         171   The upper approximation of X in S = ∪Xi∩X≠∅ Xi
POSS(X)      172   The positive region of X in S = S(X)
BNDS(X)      172   The boundary region of X in S = S̄(X) - S(X)
NEGS(X)      172   The negative region of X in S = U - S̄(X)
a            174   The confidence factor
K            176   A knowledge representation system, where
                   K = (U, C, D, V, ρ); U is a set of objects; C is a set of
                   condition attributes; D is a set of action (or decision)
                   attributes; V = ∪a∈F Va, and Va is the domain of
                   attribute a ∈ F, where F = C ∪ D; ρ: U × F → V for
                   every u ∈ U and a ∈ F is an information function. ρ
                   can also be represented as ρu: F → V by ρu(a) = ρ(u, a)
                   for every u ∈ U and a ∈ F.
G            180   An equivalence relation defined on U for any subset
                   G of C or D, such that (ui, uj) ∈ G if and only if
                   ρ(ui, g) = ρ(uj, g) for every g ∈ G.
POSS(B*)     183   The positive region of partition B* in S = ∪Yj∈B* S(Yj)
                   = ∪Yj∈B* [ ∪Xi⊆Yj Xi ]
BNDS(B*)     183   Boundary region = ∪Yj∈B* (S̄(Yj) - S(Yj)) = ∪Yj∈B*
                   [ ∪Xi∩Yj≠∅ Xi - ∪Xi⊆Yj Xi ]
NEGS(B*)     183   Negative region = U - ∪Yj∈B* S̄(Yj) = U - ∪Yj∈B*
                   [ ∪Xi∩Yj≠∅ Xi ]
γA(B)        184   The dependency of B on A = ⏐POSS(B*)⏐ ⁄ ⏐U⏐, where
                   ⏐ ⏐ denotes the cardinality, A ⊆ C, the set of
                   condition attributes, and B ⊆ D, the set of decision
                   attributes
A →γ B       184   The dependency of B on A, where A, B, and γ are as
                   defined above
βA(B)        185   The discriminant index of B on A = ⏐POSS(B*) ∪
                   NEGS(B*)⏐ ⁄ ⏐U⏐
σA(B)        185   The significance of B on A, defined as σA(B) =
                   γC(B) - γC−A(B)
Cˆ           186   A reduct or relative reduct of C. (A subset B of C is
                   independent if there is no other subset B' of C such
                   that B' ⊂ B and B̃' = B̃. B is a reduct of C if B is a
                   maximal independent subset. This is extended to a
                   relative reduct by considering subset B to be an
                   independent set with respect to D, where there is no
                   other subset B' that satisfies POSB'(D*) = POSB(D*).)
RED(C)       186   The collection of all reducts of C
REDD(C)      186   The collection of all relative reducts of C with D
CORE(C)      187   Core of C = ∩B∈RED(C) B
CORED(C)     188   Relative core of C = ∩B∈REDD(C) B
6.2 Review of Ordinary Sets and Relations
For the convenience of the reader, we will briefly review some basics on sets and
relations which will be used in this chapter. A review on the topics with a different
focus is given in Subsection 5.3.1. These topics are typically covered in college
discrete mathematics courses and a reader who is familiar with these materials may
skip this section. (A reader who needs more details may see, e.g., C.L. Liu, Elements
of Discrete Mathematics, 2nd Ed., McGraw-Hill, 1985, or similar discrete
mathematics books.)
Given two sets, U and V, we define the cartesian product U × V as U × V = {(u,
v) | u ∈ U, v ∈ V}, where (u, v) represents an ordered pair. That is, U × V is the set of
all such ordered pair elements where u is chosen for every element of U and v is
chosen for every element of V. For example, suppose that U is the set of neckties a
man has: U = {red-tie, blue-tie}, and V is the set of shirts he owns: V = {white-shirt,
grey-shirt, pink-shirt}. Let us call them red, blue, white, grey and pink for simplicity.
U × V in this example then is {(red, white), (red, grey), (red, pink), (blue, white),
(blue, grey), (blue, pink)}. In general, if U has m elements and V has n elements, then
U × V has m × n elements. A binary relation, or simply relation, R from U to V is
a subset of U × V. In the above example, {(red, white), (red, grey), (blue, white),
(blue, pink)} is a binary relation. Fig. 6.1 shows these concepts. In particular, if U =
V, the cartesian product becomes U × U. A binary relation is then a subset of U × U and is
said to be a relation on U (Fig. 6.2).
We can define various kinds of relations on U when they satisfy specific
characteristics. In particular, R is an equivalence relation if it is:
(1) reflexive, i.e., (u, u) ∈ R for every u ∈ U.
(2) symmetric, i.e., (u, v) ∈ R implies (v, u) ∈ R for every u, v ∈ U.
(3) transitive, i.e., (u, v) ∈ R and (v, w) ∈ R imply (u, w) ∈ R for every u, v, w ∈ U.
Fig. 6.1. The cartesian product and a binary relation defined on two sets.
Fig. 6.2. The cartesian product and a binary relation defined on one set.
Example 1.
Let NY = New York, LA = Los Angeles, SF = San Francisco, MO = Montreal, TO
= Toronto, MC = Mexico City, and U = {NY, LA, SF, MO, TO, MC}. Then,
R ={(u, v)│ u and v are in the same country} = {(NY, NY), (LA, LA), (SF, SF), (MO,
MO), (TO, TO), (MC, MC), (NY, LA), (LA, NY), (NY, SF), (SF, NY), (LA, SF),
(SF, LA), (MO, TO), (TO, MO)} is an equivalence relation.
Example 2.
U = a group of people.
R1 = {(u, v)│ u and v have the same last name} is an equivalence relation.
R2 = {(u, v)⏐ u and v have the same birthday} is an equivalence relation.
R3 = {(u, v)⏐ u and v have the same sex} is an equivalence relation.
R4 = R1 ∩ R2 = {(u, v)│ u and v have the same last name and the same
birthday} is an equivalence relation.
R5 = R1 ∩ R2 ∩ R3 = {(u, v)│ u and v have the same last name, the same
birthday, and the same sex} is an equivalence relation.
Example 3.
U = {a, b, c, d, e, f, g}, R = {(a, a), (b, b), (c, c), (d, d), (e, e), (f, f), (g, g), (a, b), (b,
a), (c, d), (d, c), (e, f), (f, e), (e, g), (g, e), (f, g), (g, f)} is an equivalence relation. As
with any mathematical treatment, such a representation can be an abstract form
of specific cases such as Examples 1 and 2. Fig. 6.3(a) is a diagram representation
of this relation.
Generally, a partition of a set U is a set of nonempty subsets of U, {X1, X2,..., Xk},
where X1 ∪ X2 ∪...∪ Xk = U and Xi ∩ Xj = ∅ for i ≠ j. That is, a partition divides a set
into a collection of disjoint subsets. These subsets are called blocks of the partition.
In particular, when we have an equivalence relation R on U, we can partition U so
that every two elements in each subset are related and any two elements in different
subsets are unrelated. We say the partition is induced by the equivalence relation R,
and denote the partition as R*. The subsets are called the equivalence classes. Fig.
6.3(a) is an example of an equivalence relation, and Fig. 6.3(b) is the partition
induced by the equivalence relation. Sets X1, X2, and X3 are equivalence classes.
Let R1 and R2 be two equivalence relations on U, and R*1 and R*2 be the
corresponding partitions induced by R1 and R2. The product of two partitions, R*1
and R*2, denoted R*1 ⋅ R*2, is defined as the partition induced by R1 ∩ R2. In other
words, in the partition R*1 ⋅ R*2, two elements a and b are in the same block
(equivalence class) if a and b are in the same block of R*1 and also in the same block
of R*2. In Example 2, in the partition induced by R4 = R1 ∩ R2, two persons will be
in the same block if they have the same last name and the same birthday.
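These ideas are straightforward to code. The following sketch partitions the universe of Example 3 and then forms the product of two partitions; the helper names and the attribute values in the product example are our own illustrations:

# Partitions induced by equivalence relations, using Example 3's relation.

def partition(universe, related):
    """Group the universe into blocks, assuming related() is an
    equivalence relation (so comparing to one representative suffices)."""
    blocks = []
    for u in universe:
        for block in blocks:
            if related(u, block[0]):
                block.append(u)
                break
        else:
            blocks.append([u])
    return blocks

U = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
pairs = {('a','b'), ('c','d'), ('e','f'), ('e','g'), ('f','g')}
r = lambda u, v: u == v or (u, v) in pairs or (v, u) in pairs
print(partition(U, r))    # [['a', 'b'], ['c', 'd'], ['e', 'f', 'g']]

# Product of two partitions: same block in R1* . R2* means related under
# both R1 and R2 (here R1, R2 compare two made-up attributes).
attr1 = {'a': 1, 'b': 1, 'c': 2, 'd': 2, 'e': 3, 'f': 3, 'g': 3}
attr2 = {'a': 1, 'b': 2, 'c': 1, 'd': 1, 'e': 1, 'f': 1, 'g': 2}
r12 = lambda u, v: attr1[u] == attr1[v] and attr2[u] == attr2[v]
print(partition(U, r12))  # [['a'], ['b'], ['c', 'd'], ['e', 'f'], ['g']]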
Fig. 6.3 (a) An equivalence relation. (b) The partition induced by the equivalence relation.
6.3 Information Tables and Attributes
Rough set theory deals with data expressed in two-dimensional or matrix form tables,
called information tables. In this section, we will discuss some terminology
associated with information tables.
Information tables
For rough set theory, the input information is given in the form of a two-dimensional
table (i.e., matrix), called an information table (or decision table). The following is
a simple example of an information table.
Example 4.
Table 1. Symptoms and heart problems of patients
Universe Condition Attributes Decision Attribute
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Person Temperature Blood Pressure Heart Problem
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Adams normal low no
Brown normal low no
Carter normal medium yes
Davis high medium no
Evans high high yes
Ford high high yes
As we see in this example, the columns of an information table are divided into three
sections; the universe U, condition attributes (or simply attributes), and decision
attributes (or simply decisions). The universe is the set of elements under
consideration as in ordinary sets. In the above example, the universe contains 6
patients. We can have any number of condition attributes; in this example, we have
two, Temperature and Blood Pressure. Similarly, we can have any number of
decision attributes, although one is all that is required. In the above example, we have
one decision attribute, Heart Problem. If we want, we can add more, as for example,
Stroke Problem, Diabetes Problem, and so on. Rows of an information table are
called entities (objects, or sometimes examples). The entities can be labeled by the
elements of the universe as, for example, Adams, etc. Each element (patient in the
above table) is characterized by its condition and decision attribute values.
The previous table is kept simple for illustration purposes. For practical applications,
the table size would be much larger. For example, there may be 10,000 patients,
twenty condition attributes, and the range of each attribute may be much richer than
simply "normal" and "high." Also, the application domains can be in many other
areas such as analysis of consumer and industrial products, process control, and so on.
Given input in this form, we would like to derive various conclusions as our output.
Possible types of conclusions include: how the decision attributes depend on the
condition attributes; whether any condition attributes are redundant, i.e., can be
eliminated without affecting the decision making; and the derivation of underlying
rules governing the relationship from the condition attributes to the decision attributes.
In the above example, let Adams = a, Brown = b, Carter = c, Davis = d, Evans =
e, and Ford = f, for simplicity. Then the universe U is {a, b, c, d, e, f}. Equivalence
relations can be defined by the condition and decision attributes, as for example,
R1 = {(u, v)│ u and v have the same Temperature}
R2 = {(u, v)⏐ u and v have the same Blood Pressure}
R3 = {(u, v)⏐ u and v have the same Heart Problem}
R4 = R1 ∩ R2 = {(u, v)│ u and v have the same Temperature and Blood Pressure }.
The universe can then be partitioned by these equivalence relations. For example,
R4*, the partition induced by the equivalence relation R4 = R1 ∩ R2 = {(u, v)│ u and
v have the same Temperature and Blood Pressure}, is R4* = R1* ⋅ R2* = {X1, X2, X3,
X4}, where X1 = {a, b}, X2 = {c}, X3 = {d}, X4 = {e, f}. Sets X1, X2, X3, and X4 are
called the equivalence classes.
Concepts
So far we have focused our attention primarily on condition attributes. For decision
attributes, we can define equivalence relations and determine the partitions in the
same way as for condition attributes. For example, we can define the following
equivalence relation.
R3 = {(u, v)⏐ u and v have the same Heart Problem}
The partition induced by this relation is:
R3* = {Y1, Y2}, where Y1 = {a, b, d} and Y2 = {c, e, f}.
In general, such sets in a partition are called concepts (Y1 and Y2 in the above
example). The concept Y1 corresponds to the set of all patients with no heart problem,
and Y2 to those with a heart problem.
In rough set theory, we are interested in finding mappings from the partitions
induced by the condition attributes to the partitions induced by decision attributes.
Rule induction
We saw that R4*, the partition induced by the equivalence relation R4 = R1 ∩ R2 = {(u,
v)│ u and v have the same Temperature and Blood Pressure}, is:
R4* = {X1, X2, X3, X4}, where X1 = {a, b}, X2 = {c}, X3 = {d}, X4 = {e, f}.
In the above, we had R3*, the partition induced by R3 = {(u, v)⏐ u and v have the same
Heart Problem} as:
R3* = {Y1, Y2}, where Y1 = {a, b, d} and Y2 = {c, e, f}.
From these, we can derive rules as follows:
if X1 = {a, b}, then Y1 = {a, b, d}.
if X2 = {c}, then Y2 = {c, e, f}.
if X3 = {d}, then Y1 = {a, b, d}.
if X4 = {e, f}, then Y2 = {c, e, f}.
In words,
if Temperature is normal and Blood Pressure is low, then no Heart Problem.
if Temperature is normal and Blood Pressure is medium, then yes Heart Problem.
if Temperature is high and Blood Pressure is medium, then no Heart Problem.
if Temperature is high and Blood Pressure is high, then yes Heart Problem.
Or, these rules can be simplified as follows:
if Blood Pressure is low, then no Heart Problem.
if Temperature is normal and Blood Pressure is medium, then yes Heart Problem.
if Temperature is high and Blood Pressure is medium, then no Heart Problem.
if Blood Pressure is high, then yes Heart Problem.
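The derivation of these rules can also be sketched in code (reusing the hypothetical table and the partition helper above; the containment test is valid only for a consistent table such as Table 1):

    def induce_rules(table, cond_attrs, dec_attrs):
        """Pair each condition class X with the concept Y containing it."""
        return [(X, Y)
                for X in partition(table, cond_attrs)
                for Y in partition(table, dec_attrs)
                if X <= Y]   # X maps entirely into the concept Y

    for X, Y in induce_rules(table, ['Temp', 'BP'], ['Heart']):
        print(sorted(X), '->', sorted(Y))
    # ['a', 'b'] -> ['a', 'b', 'd'], etc., matching the four rules above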
6.4 Approximation Spaces
In the previous Table 1, information processing is straightforward, since each
elementary set (equivalence class) in the partition induced by the two condition
attributes maps to an elementary set (a concept) in the partition induced by the
decision attribute. In general, this may not be the case. That is, elements in an
elementary set may map to different concepts. Dealing with such information tables
is the core of rough set theory, and we will discuss some basics of these topics in this
section.
Inconsistent information tables
Now consider Table 2, where a patient, Gill, is added to Table 1. Based on Table 1,
we previously had the rule: "if Temperature is high and Blood Pressure is high, then
yes Heart Problem." With the addition of Gill, this rule is no longer true. Such a table is
called inconsistent. That is, an inconsistent information table contains entities whose
condition attribute values are the same, but lead to different concepts.
TABLE 2. (Addition of a patient to Table 1)
Universe   Condition Attributes            Decision Attribute
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Person     Temperature   Blood Pressure    Heart Problem
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Adams      normal        low               no
Brown      normal        low               no
Carter     normal        medium            yes
Davis      high          medium            no
Evans      high          high              yes
Ford       high          high              yes
Gill       high          high              no
To deal with inconsistent tables, we introduce some terminology, discussed in the
following.
Approximation spaces and lower and upper approximations
As their name implies, rough sets are sets that cannot be clearly ascertained or
defined. However, their rough (approximate) versions can be constructed. We will
define approximation spaces, which lead to the concept of rough sets.
Let U be a finite set of objects and R ⊆ U × U be an equivalence relation on U.
Then, S = (U, R) is called an approximation space. If u, v ∈ U and (u, v) ∈ R, we say
that u and v are indistinguishable in S. R is called an indiscernibility relation.
Indiscernibility relations are the main concept of rough sets. For example, in Table
2 above, suppose R = R2 = {(u, v) │ u and v have the same Blood Pressure}. Then R2
is an indiscernibility relation. Patients a (Adams) and b (Brown), for example, are
indiscernible under this equivalence relation; so are elements c and d; so are e, f, and
g. We can define other indiscernibility relations by choosing other equivalence
relations, such as R4 = R1 ∩ R2 = {(u, v)│ u and v have the same Temperature and the
same Blood Pressure}.
Let R* = {X1, X2, ..., Xn} denote the partition induced by R, where Xi is an
equivalence class of R. Xi is also called an elementary set of S. Any finite union of
elementary sets is called a definable set.
Let X be any subset of U. Then we define the following:
S̲(X) = ∪Xi⊆X Xi   (the lower approximation of X in S)
S̄(X) = ∪Xi∩X≠∅ Xi   (the upper approximation of X in S)
In words, S̲(X) is the union of all the elementary sets of S that are totally included
in (i.e., subsets of) X. S̄(X) is the union of all the elementary sets of S that contain
at least one element of X. In the following, when the meaning is clear from the
context, we sometimes write S̲(X) simply as S̲ and S̄(X) as S̄.
Example 5.
In Table 2, let us choose R2, the equivalence relation for the same Blood Pressure, as
our relation R. Then S = (U, R2) is the approximation space. The partition induced by
R2 is R2* = {X1, X2, X3}, where X1 = {a, b}, X2 = {c, d}, and X3 = {e, f, g} are the
equivalence classes, or elementary sets, of S (Fig. 6.4a). X1 ∪ X2 = {a, b, c, d} is a
definable set of S. Suppose that we have a new concept (for example, Stroke Problem
= no) for which X = {b, c, d}. Then S̲(X) = X2 and S̄(X) = X1 ∪ X2 (Fig. 6.4b).
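The two approximations are direct to compute from the elementary sets. A minimal sketch, with the elementary sets of Example 5 written out by hand (the function names lower and upper are our own):

    def lower(elementary, X):
        """Lower approximation: union of the elementary sets totally
        included in X."""
        return set().union(*(Xi for Xi in elementary if Xi <= X))

    def upper(elementary, X):
        """Upper approximation: union of the elementary sets containing
        at least one element of X."""
        return set().union(*(Xi for Xi in elementary if Xi & X))

    # Example 5: elementary sets of S = (U, R2) in Table 2, and X = {b, c, d}:
    elementary = [{'a', 'b'}, {'c', 'd'}, {'e', 'f', 'g'}]
    X = {'b', 'c', 'd'}
    print(lower(elementary, X))   # {'c', 'd'}            = X2
    print(upper(elementary, X))   # {'a', 'b', 'c', 'd'}  = X1 ∪ X2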
Fig. 6.4 (a) The partition induced by the equivalence relation of the same Blood Pressure. (b)
The lower and upper approximations of X = {b, c, d}.
Three distinct regions in an approximation space: positive,
boundary and negative
Using the lower and upper approximations discussed above, we can characterize the
approximation space S = (U, R) in terms of the concept X with three distinct regions
defined as follows:
1. the positive region: POSS(X) = S̲(X)
2. the boundary region: BNDS(X) = S̄(X) - S̲(X)
3. the negative region: NEGS(X) = U - S̄(X)
The lower and upper approximations, and the positive, boundary, and negative
regions are the most important notions in rough set theory.
In Table 2, suppose X = {b, c, d}; then the three distinct regions are (Fig. 6.5):
POSS(X) = S̲(X) = X2
BNDS(X) = S̄(X) - S̲(X) = X1 ∪ X2 - X2 = X1
NEGS(X) = U - S̄(X) = X3
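Continuing the sketch (lower and upper are the helpers from the previous listing), the three regions follow directly from the definitions:

    def regions(elementary, X):
        """Return (POS, BND, NEG) of X in the approximation space."""
        low, up = lower(elementary, X), upper(elementary, X)
        U = set().union(*elementary)
        return low, up - low, U - up

    pos, bnd, neg = regions(elementary, {'b', 'c', 'd'})
    print(pos, bnd, neg)   # {'c','d'}, {'a','b'}, {'e','f','g'}: X2, X1, X3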
Fig. 6.5. The positive, boundary, and negative regions of X = {b, c, d}.
In general, a diagrammatic interpretation of the three regions is given in Fig. 6.6:
given the universe U, a concept X, and its lower and upper approximations (top),
the positive, boundary, and negative regions follow (bottom).
Fig. 6.6. A diagram interpretation of the lower and upper approximations and the positive,
boundary, and negative regions.
Rule induction on an approximation space
For any concept, rules induced from its positive region (i.e., lower approximation)
are called certain, since they are certainly valid. On the other hand, for any concept,
rules induced from the boundary region of the concept are called uncertain. For an
uncertain rule, we can define the confidence factor α. Let Xi be an elementary set in
the boundary region and Yj be a concept. The confidence factor for a rule derived
from Xi and Yj is:
α = |Xi ∩ Yj| / |Xi|
In words, the confidence factor is (the number of elements that are in the elementary
set under consideration and that satisfy the concept for the rule) / (the total number
of elements in the elementary set under consideration).
Example 6.
In Table 2, certain rules can be:
if X1 = {a, b}, then Y1 = {a, b, d, g}.
if X2 = {c}, then Y2 = {c, e, f}.
if X3 = {d}, then Y1 = {a, b, d, g}.
In words,
if Temperature is normal and Blood Pressure is low, then no Heart Problem.
if Temperature is normal and Blood Pressure is medium, then yes Heart Problem.
if Temperature is high and Blood Pressure is medium, then no Heart Problem.
Uncertain rules and their confidence factors can be:
if X4 = {e, f, g}, then Y1 = {a, b, d, g} with α = |{g}| / |X4| = 1/3 = 0.33.
if X4 = {e, f, g}, then Y2 = {c, e, f} with α = |{e, f}| / |X4| = 2/3 = 0.67.
In words,
if Temperature is high and Blood Pressure is high, then no Heart Problem with the
confidence factor = 0.33.
if Temperature is high and Blood Pressure is high, then yes Heart Problem with the
confidence factor = 0.67.
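These confidence factors can be checked mechanically; a small sketch with the Table 2 sets written out by hand:

    # The boundary elementary set and the two concepts from Table 2:
    X4 = {'e', 'f', 'g'}        # Temperature high, Blood Pressure high
    Y1 = {'a', 'b', 'd', 'g'}   # no Heart Problem
    Y2 = {'c', 'e', 'f'}        # yes Heart Problem
    print(len(X4 & Y1) / len(X4))   # 0.333..., alpha for the "no" rule
    print(len(X4 & Y2) / len(X4))   # 0.666..., alpha for the "yes" rule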
Definability and rough sets
As a special case, if BND = ∅, i.e., if S̲(X) = S̄(X), then X is a definable set in S (for
example, the two concepts in Table 1 are definable sets). Otherwise, if BND ≠ ∅, or
equivalently if S̲(X) ≠ S̄(X), then X is said to be undefinable, or a rough set. (The two
concepts in Table 2, representing "no" and "yes" Heart Problem, are rough sets.)
Although there is some ambiguity in the literature about what exactly a "rough set" is,
this can be considered a standard definition of a rough set.
Generally, there are four kinds of situations for rough sets, based on whether
S̲(X) = ∅ and whether S̄(X) = U, defined as follows:
1. S̲(X) ≠ ∅ and S̄(X) ≠ U: X is roughly definable (Fig. 6.7a).
2. S̲(X) ≠ ∅ and S̄(X) = U: X is internally definable (or externally undefinable)
(Fig. 6.7b).
3. S̲(X) = ∅ and S̄(X) ≠ U: X is externally definable (or internally undefinable)
(Fig. 6.7c).
4. S̲(X) = ∅ and S̄(X) = U: X is totally non-definable (or totally undefinable) (Fig.
6.7d).
Fig. 6.7 Four types of definable and undefinable situations of rough sets. (a) Roughly definable.
(b) Internally definable. (c) Externally definable. (d) Totally nondefinable.
The following is an intuitive meaning of the above classification in terms of the
positive, boundary, and negative regions.
1. Roughly definable. There are elements in U that definitely belong to X (they are
the elements in the positive region). Similarly, there are elements in U of which
we can say that they definitely do not belong to X, i.e., they definitely belong to -X,
the complement of X (they are the elements in the negative region).
2. Internally definable. There are elements in U that definitely belong to X, but
there are no elements in U of which we can assert that they do not belong to X
(since there is no negative region).
3. Externally definable. This is the opposite of internally definable. There are no
elements in U for which we can say that they definitely belong to X (since there
is no positive region), but there are elements in U for which we can say that they
do not belong to X.
4. Totally non-definable. We cannot decide for any element of U whether it
definitely belongs to X or -X.
Properties of S̲ and S̄
Every union of elementary sets (i.e., equivalence classes) is a definable set X, since
in this case S̲(X) = S̄(X) = X and BND = ∅, which is precisely the definition of
definable. Also, the following properties hold:
1. S̲(∅) = S̄(∅) = ∅
   S̲(U) = S̄(U) = U
2. S̲(X) ⊆ X ⊆ S̄(X)
3. S̲(X ∪ Y) ⊇ S̲(X) ∪ S̲(Y)
4. S̲(X ∩ Y) = S̲(X) ∩ S̲(Y)
5. S̄(X ∪ Y) = S̄(X) ∪ S̄(Y)
6. S̄(X ∩ Y) ⊆ S̄(X) ∩ S̄(Y)
7. S̲(U - X) = U - S̄(X)
8. S̄(U - X) = U - S̲(X)
9. S̲(S̲(X)) = S̄(S̲(X)) = S̲(X)
10. S̄(S̄(X)) = S̲(S̄(X)) = S̄(X)
6.5 Knowledge Representation Systems
A formal definition of knowledge representation systems (KRS)
In rough set theory, a knowledge representation system (KRS) is formally defined as
a quintuple: an aggregate of objects, attributes and their values, and an information
function. KRSs are typical applications of rough sets.
K = (U, C, D, V, ρ),
where:
U is a set of objects;
C is a set of condition attributes;
D is a set of action (or decision) attributes;
V = ∪a∈F Va, and Va is the domain of attribute a ∈ F, where F = C ∪ D;
ρ: U × F → V is an information function. ρ can also be represented as
ρu: F → V with ρu(a) = ρ(u, a) for every u ∈ U and a ∈ F.
The information function ρ can be defined in a more restricted way as ρ: U × {a}
→ Va for each a ∈ F. The previous definition, ρ: U × F → V, can
theoretically include many invalid mappings, since V contains all possible values of
all attributes. For example, (Adams, temperature) → "none" is probably an invalid
mapping; "none" is probably a value of another attribute, say, coughing, rather than
temperature. The second definition, ρ: U × {a} → Va, eliminates such invalid mappings.
Which definition to employ is one's choice, probably depending on the specific
problem. If there are no errors, invalid mappings will not occur, and if |V| is small,
the use of V is manageable.
Generally, how to represent human knowledge is a difficult problem, and it is a
major problem in AI research today. Human knowledge is so complex that there is
probably no simple answer as to the best form in which to represent it. The two most
popular methods for representing knowledge in knowledge-based systems are
rule-based and frame-based. The KRS here is closely related to rule-based
systems. The condition, premise, or "if" part of an if-then rule corresponds to the
condition attributes, and the action, conclusion, or "then" part of the if-then rule
corresponds to the decision attributes in this KRS.
As an example, consider the problem of representing "knowledge" about how various
factors affect heart and stroke problems. Many factors are conceivable;
some have obvious close connections to these diseases while others do not. Some
possible factors are: laboratory test results, such as temperature, pulse, blood
pressure, blood tests (e.g., good and bad cholesterol), urinalysis, EKG, etc.; symptoms
such as chest pain, dizziness, etc.; diet, i.e., what one eats and drinks; and so forth. Then
C, the set of condition attributes, is the set of these factors, for example, C =
{temperature, pulse, . . ., EKG}. (For simplicity, we have dropped symptoms, diet,
etc.) If there are 25 of these condition attributes, |C| = 25. The set of decision
attributes, D, has two elements: D = {heart_problem, stroke_problem}, so |D| = 2.
F = C ∪ D in this case is F = {temperature, pulse, . . ., EKG; heart_problem,
stroke_problem}, and |F| = 27. Let us assume that the first attribute, temperature, is
measured as either normal, slightly_high, or high. Then Va for attribute a =
temperature is the domain of the values of this attribute, i.e., Vtemperature = {normal,
slightly_high, high}. Although |Vtemperature| = 3 in this example, |Va| can be of any size.
Similarly, we may define other domains, for example, Vpulse = {low, medium,
high}, . . ., Vstroke_problem = {none, moderate, high}. V is the union of all these Va's, e.g.,
V = {normal, slightly_high, high, low, medium, . . .}. A short note on the
definition of V: the elements of V are distinct, since V is a set. Hence, for example,
"high" in Vpulse = {low, medium, high} will not appear in V a second time after the
"high" of Vtemperature; that is, V is not {normal, slightly_high, high, low, medium,
high, . . .}. This means that |V| ≤ Σa∈F |Va|. In certain applications, it may be
desirable to distinguish high pulse from high temperature. In such a case, we can
define Vtemperature = {normal_temperature, slightly_high_temperature,
high_temperature} and Vpulse = {low_pulse, medium_pulse, high_pulse}.
Suppose that information for these condition and decision attributes is collected
for 10,000 men. Some of the information can be unknown or inaccurate. In such a
case, for example, "unknown" can be added as a possible attribute value. For
example, there may not be any EKG test result for Man No. 7,825. The 10,000 men
are the elements of set U, which can be represented as U = {u1, u2, . . ., u10,000}. Now
we can construct the Cartesian product U × F. In set form, U × F is the set of
ordered pairs {(u1, temperature), . . ., (u10,000, stroke_problem)}. (How many
elements are in U × F? Yes, there are 270,000.) Or, in matrix or two-dimensional
array form, we have 10,000 rows corresponding to the 10,000 men and 27 columns
corresponding to the 25 condition plus 2 decision attributes.
Now, using the collected information of 270,000 values (some of which may be
unknown) for the 10,000 men, we can fill in the U × F matrix, e.g., (u1,
temperature) = slightly_high, . . ., (u10,000, stroke_problem) = none. Each row of this
matrix, representing a specific man, is called an object (or entity). This matrix, with
its specific element values, can be viewed as a mapping from U × F to V, denoted
as an information function ρ: U × F → V. Combining all these components, we have
our KRS, K = (U, C, D, V, ρ). (While this part of the book was being prepared, a
Harvard report was published in the New England Journal of Medicine. Based on a
six-year study of 45,000 men, it said that eating fish does not particularly help to
avoid heart problems. Can you think of how this study could be formulated as
a KRS?)
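As a data structure, the quintuple translates almost verbatim into code. A minimal sketch (the class and the tiny example values are our own invention, not the study itself):

    from dataclasses import dataclass

    @dataclass
    class KRS:
        """K = (U, C, D, V, rho), following the definition in the text."""
        U: frozenset   # objects
        C: frozenset   # condition attributes
        D: frozenset   # decision (action) attributes
        V: frozenset   # union of all attribute domains Va
        rho: dict      # information function: (u, a) -> value

    # A two-object fragment in the spirit of the example (values invented):
    rho = {('u1', 'temperature'): 'slightly_high',
           ('u1', 'stroke_problem'): 'none',
           ('u2', 'temperature'): 'normal',
           ('u2', 'stroke_problem'): 'moderate'}
    K = KRS(U=frozenset({'u1', 'u2'}),
            C=frozenset({'temperature'}),
            D=frozenset({'stroke_problem'}),
            V=frozenset(rho.values()),
            rho=rho)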
We note that our KRS format can represent many types of applications. Here are
some examples:
Diagnosis: The above heart and stroke problem is in this category. The same
concept can be applied to engineering, business, and other
medical problems.
Prediction: e.g., predicting the stock market in finance.
Control: Simply stated, control is input-output mapping. For example,
to control room temperature, the temperature from a sensor is
the input and a heating or cooling source is the output. The
condition attributes correspond to the input and the decision
attributes to the output.
Machine learning: Discovering underlying rules from information functions.
A knowledge representation system (KRS) example
We will consider the following fictitious example to illustrate a knowledge
representation system.
U is a set of 8 persons (objects): U = {u1, u2,..., u8}. Here u1, u2,..., are symbolic
representations of Adams, Brown, and so on.
C is a set of 3 condition attributes: C = {Temp, Blood-P (Blood-Pressure), Vision
(Eyesight)}
D is a set of 2 decision attributes: D = {Heart-Risk, Health (General-Health)}
Fig. 6.8. An information function ρ from U × F to V.
The domains of individual attributes are given by:
VTemp = {below, normal, above}
VBlood-P = {low, average, high}
VVision = {near, standard, far}
VHeart-Risk = {none, slight, serious}
VHealth = {poor, good, excellent}
We have:
F = C ∪ D = {Temp, Blood-P, Vision; Heart-Risk, Health}
V = ∪a∈F Va ={below, normal, above; low, average, high; near, standard, far; none,
slight, serious; poor, good, excellent}
ρ: U × F → V, i.e., ρ: {(u1, Temp),..., (u8, Health)} → {below,..., excellent} is an
information function (Fig. 6.8).
The following Table 3 shows one such information function.
Table 3. An example information function in information table form for a knowledge
representation system
U        C                               D
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Person   Temp     Blood-P   Vision      Heart-Risk   Health
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
u1       normal   low       far         slight       poor
u2       below    average   standard    serious      excellent
u3       above    low       near        serious      good
u4       normal   average   near        slight       excellent
u5       normal   low       far         none         good
u6       above    high      near        serious      good
u7       above    average   standard    serious      excellent
u8       below    average   standard    none         good
6.6 More on the Basics of Rough Sets
Equivalence relations on attributes
As discussed before, for any subset G of C or D, we can define an equivalence
relation G̃ on U such that (ui, uj) ∈ G̃ if and only if ρ(ui, g) = ρ(uj, g) for every
g ∈ G. That is, G̃ = {(ui, uj) | ui and uj have the same value for every attribute g ∈
G}. In words, we group the elements based on the values of specific attributes;
elements ui and uj are related if all the values of the specified attributes are the same,
and not related otherwise. We denote the partition induced by the equivalence relation
G̃ by G*.
Example 7. (From Table 3)
G = {Temp}.
In this case, there is only one element (attribute) in G, which is denoted as g = Temp.
ρ(u1, Temp) = ρ(u4, Temp) = ρ(u5, Temp) = normal
ρ(u2, Temp) = ρ(u8, Temp) = below
ρ(u3, Temp) = ρ(u6, Temp) = ρ(u7, Temp) = above
Hence, G̃ = {(u1, u1), (u2, u2), ..., (u8, u8); (u1, u4), (u1, u5), (u4, u5), (u4, u1), (u5, u1),
(u5, u4), (u2, u8), (u8, u2), (u3, u6), (u3, u7), (u6, u7), (u6, u3), (u7, u3), (u7, u6)}.
Fig. 6.9 shows G*, the partition of U into three subsets induced by G̃.
Fig. 6.9. The partition G* induced by an equivalence relation G̃, where (ui, uj) ∈ G̃ if and
only if ρ(ui, Temp) = ρ(uj, Temp) in Table 3.
Example 8. (From Table 3)
G = {Temp, Blood-P}
In addition to ρ(ui, Temp) = ρ(uj, Temp) in the previous example we have:
ρ(u1, Blood-P) = ρ(u3, Blood-P) = ρ(u5, Blood-P) = low
ρ(u2, Blood-P) = ρ(u4, Blood-P) = ρ(u7, Blood-P) = ρ(u8, Blood-P) = average
ρ(u6, Blood-P) = high
For (ui, uj) to be in G̃, ρ(ui, g) = ρ(uj, g) must hold for every g in G, i.e., in our
example, for both g = Temp and g = Blood-P. This means that we take the
intersection of the two equivalence relations, one for Temp and the other for
Blood-P:
G̃ = {(u1, u1), (u2, u2), ..., (u8, u8); (u1, u5), (u5, u1), (u2, u8), (u8, u2)}.
The partition induced by G̃ is the product of the partitions induced by the two
equivalence relations for Temp and Blood-P: G* = {X1, X2, X3, X4, X5, X6}, where X1
= {u1, u5}, X2 = {u4}, X3 = {u2, u8}, X4 = {u3}, X5 = {u6}, X6 = {u7}. Fig. 6.10 shows
the partition; the superposed dashed lines correspond to the partition induced by
Blood-P.
Fig. 6.10. The partition induced by an equivalence relation G̃, where (ui, uj) ∈ G̃ if and
only if ρ(ui, Temp) = ρ(uj, Temp) and ρ(ui, Blood-P) = ρ(uj, Blood-P) in Table 3.
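Example 8 can be verified with the partition helper from the earlier sketch; the dictionary below is our own transcription of Table 3:

    attrs = ['Temp', 'Blood-P', 'Vision', 'Heart-Risk', 'Health']
    rows = {
        'u1': ('normal', 'low',     'far',      'slight',  'poor'),
        'u2': ('below',  'average', 'standard', 'serious', 'excellent'),
        'u3': ('above',  'low',     'near',     'serious', 'good'),
        'u4': ('normal', 'average', 'near',     'slight',  'excellent'),
        'u5': ('normal', 'low',     'far',      'none',    'good'),
        'u6': ('above',  'high',    'near',     'serious', 'good'),
        'u7': ('above',  'average', 'standard', 'serious', 'excellent'),
        'u8': ('below',  'average', 'standard', 'none',    'good'),
    }
    table3 = {u: dict(zip(attrs, vals)) for u, vals in rows.items()}

    print(partition(table3, ['Temp', 'Blood-P']))
    # [{'u1','u5'}, {'u2','u8'}, {'u3'}, {'u4'}, {'u6'}, {'u7'}]:
    # the six classes X1, ..., X6 of G* (in a different order)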
POS, BND, and NEG regions of a partition
Let A ⊆ C, i.e., A is a set of some condition attributes, as, e.g., A = {Temp, Blood-P}.
Let B ⊆ D, i.e., B is a set of some decision attributes as, e.g., B = {Heart-Risk} or B
= {Heart-Risk, Health}. Let A* = {X1, ..., Xn} and B* = {Y1, ..., Ym} denote the
partitions on U induced by the equivalence relations Ã and B̃, respectively. (These
are for A and B in place of G in the previous example.) We will be interested in
determining to what extent the partition B* as a whole can be characterized or
approximated by the partition A* (Fig. 6.11).
Fig. 6.11. The characterization of partition B* on partition A*.
Example 9. The partitions induced by A = {Temp} and B = {Heart-Risk} in
Table 3.
Let A = {Temp}. As we saw before, Ã = {(u1, u1), (u2, u2), ..., (u8, u8); (u1, u4), (u1, u5),
(u4, u5), (u4, u1), (u5, u1), (u5, u4), (u2, u8), (u8, u2), (u3, u6), (u3, u7), (u6, u7), (u6, u3), (u7,
u3), (u7, u6)}, and A* = {X1, X2, X3} is given in Fig. 6.12(a). Let B = {Heart-Risk}.
Then similarly, B̃ = {(u1, u1), (u2, u2), ...}, and B* = {Y1, Y2, Y3} is given in Fig.
6.12(b).
Fig. 6.12 (a) The partition A* = {X1, X2, X3} induced by Ã, where A = {Temp}. (b) The partition
B* = {Y1, Y2, Y3} induced by B̃, where B = {Heart-Risk}.
In terms of the lower approximation S̲(Yj) and upper approximation S̄(Yj) of Yj ∈ B*
in the approximation space S = (U, Ã), we define the positive, boundary, and
negative regions of the partition B* as follows:
POSS(B*) = ∪Yj∈B* S̲(Yj) = ∪Yj∈B* [∪Xi⊆Yj Xi]
BNDS(B*) = ∪Yj∈B* (S̄(Yj) - S̲(Yj)) = ∪Yj∈B* [∪Xi∩Yj≠∅ Xi - ∪Xi⊆Yj Xi]
NEGS(B*) = U - ∪Yj∈B* S̄(Yj) = U - ∪Yj∈B* [∪Xi∩Yj≠∅ Xi]
Note that POSS(B*), BNDS(B*), and NEGS(B*) defined above are mutually disjoint.
Note also that the argument type of B* in POSS(B*), etc., defined here is different
from the argument type of X in POSS(X), etc., defined in Section 6.4. X is a set of
elements, while B* is a partition induced by an equivalence relation, i.e., a set of sets
of elements.
We may understand these as two ways of defining POS, etc. That is, for X, we
define POSS(X) = S̲(X). For B*, we define POSS(B*) as the union of S̲(Y) over Y ∈
B*. The latter case can be described more generally: if F = {X1, X2, ..., Xn} is a
family of subsets of U, then POSS(F) = POSS(X1) ∪ POSS(X2) ∪ . . . ∪ POSS(Xn).
In both cases, POSS(X) and POSS(B*), the results are sets of elements, i.e., they are
of the same type and consistent, and will not cause any problem. Incidentally, this
type of situation is common in mathematics. For example, a function may be defined
on different types of arguments: one for a scalar, another for a vector or a matrix.
Of course, the function must be defined for each of the different argument types.
Example 10. POSS (B*), BNDS (B*), and NEGS (B*) applied to Example 9.
S̲(Y1) = ∪Xi⊆Y1 Xi = ∅, S̲(Y2) = X3, S̲(Y3) = ∅
S̄(Y1) = ∪Xi∩Y1≠∅ Xi = X1, S̄(Y2) = X2 ∪ X3, S̄(Y3) = X1 ∪ X2
POSS(B*) = ∪Yj∈B* S̲(Yj) = S̲(Y1) ∪ S̲(Y2) ∪ S̲(Y3) = ∅ ∪ X3 ∪ ∅ = X3
BNDS(B*) = ∪Yj∈B* (S̄(Yj) - S̲(Yj)) = (X1 - ∅) ∪ (X2 ∪ X3 - X3) ∪ (X1 ∪ X2 - ∅)
= X1 ∪ X2
NEGS(B*) = U - ∪Yj∈B* S̄(Yj) = U - (X1 ∪ (X2 ∪ X3) ∪ (X1 ∪ X2)) = ∅
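A sketch of these definitions in code, reusing the lower, upper, and partition helpers and the table3 transcription from the earlier listings (Example 10 is reproduced):

    def regions_of_partition(A_star, B_star, U):
        """POS, BND, and NEG of the partition B* over the elementary sets A*."""
        pos = set().union(*(lower(A_star, Y) for Y in B_star))
        bnd = set().union(*(upper(A_star, Y) - lower(A_star, Y) for Y in B_star))
        neg = U - set().union(*(upper(A_star, Y) for Y in B_star))
        return pos, bnd, neg

    U = set(table3)
    A_star = partition(table3, ['Temp'])         # X1, X2, X3 of Example 9
    B_star = partition(table3, ['Heart-Risk'])   # Y1, Y2, Y3 of Example 9
    print(regions_of_partition(A_star, B_star, U))
    # ({'u3','u6','u7'}, {'u1','u2','u4','u5','u8'}, set()), i.e.,
    # POS = X3, BND = X1 ∪ X2, NEG = ∅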
Attribute dependencies
Various measures can be defined to represent how much B, a set of decision
attributes, depends on A, a set of condition attributes. In the following, we state some
of these measures. Probably the most common measure is the dependency.
Dependency γA (B)
The dependency of B on A, denoted as γA (B), is a plausible measure of how much
B depends on A and is defined as follows.
γA(B) = |POSS(B*)| / |U|
where S = (U, Ã) is the approximation space, and | | denotes the cardinality (i.e., the
number of elements) of a set. Note that 0 ≤ γA(B) ≤ 1. In particular,
1. γA (B) = 1: B is totally dependent on A, i.e., A functionally determines B.
2. γA (B) = 0: A and B are totally independent of each other.
3. 0 < γA (B) < 1: B is roughly dependent on A.
In general, the dependency of B on A can be denoted by A →γ B. For example, A →1
B if B is totally dependent on A.
Example 11. γA (B) applied to Example 10.
γA(B) = |POSS(B*)| / |U| = |X3| / |U| = 3 / 8 = 0.375.
i.e., {Temp} →0.375 {Heart-Risk}.
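In code, the dependency is one line on top of the positive region (continuing the previous listing; Example 11 is reproduced):

    def dependency(A_star, B_star, U):
        """gamma_A(B) = |POS_S(B*)| / |U|."""
        pos, _, _ = regions_of_partition(A_star, B_star, U)
        return len(pos) / len(U)

    print(dependency(A_star, B_star, U))   # 0.375: {Temp} ->0.375 {Heart-Risk}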
Example 12. (From Table 3)
{Temp, Blood-P, Vision} →0.5 {Heart-Risk}
{Temp, Blood-P} →0.5 {Heart-Risk}
We can say that, to determine {Heart-Risk}, the knowledge in {Temp, Blood-P} or
{Temp, Blood-P, Vision} is not sufficient, since γ = 0.5. Also, {Vision} is superfluous,
since its removal does not affect the dependency.
As a special case of the dependency, when we choose A to be a single condition
attribute a, γ{a}(B) is a measure of how much B depends on that specific
condition attribute. That is, γ{a}(B) gives the importance level of a in determining
B. In the following, we simply state the definitions of the other measures.
Discriminant index βA (B)
βA(B) = |POSS(B*) ∪ NEGS(B*)| / |U| = |U - BNDS(B*)| / |U|
The discriminant index is a measure of the degree of certainty in determining
whether the elements of U belong to the concepts of B* or not. It can also be
interpreted as a measure of how much uncertainty is removed by selecting S = (U, Ã).
Significance σA (B)
The significance of B on a specific condition attribute a can be defined by using the
dependencies as follows:
σ{a}(B) = γC(B) - γC-{a}(B)
In words, the significance of B on a is the difference between the dependency of B on
the set of all condition attributes C and the dependency of B on the set of all condition
attributes except the specific attribute a. That is, the significance measures the
importance level of an attribute by considering how deleting the attribute from
the entire set of condition attributes affects the dependency. γC-{a}(B) is a sort of
"complement dependency" of {a} with respect to C.
We can further extend the significance for {a} to A, a set of any number of
condition attributes:
σA(B) = γC(B) - γC-A(B)
This measure indicates the importance of the set of condition attributes A, rather
than of a single attribute a.
The dependency and discriminant index are "direct" measures, focusing on one or
more condition attributes. The significance, on the other hand, is "complementary";
that is, it considers the entire set of condition attributes. Which measure or measures
to use depends on the specific application. For example, if γ{a}(B) is equal to
1, then B totally depends on a; hence, no other measures may need to be
determined.
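Sketches of these two measures on the same basis (the helper names are ours; partition, regions_of_partition, and dependency are from the earlier listings):

    def discriminant_index(A_star, B_star, U):
        """beta_A(B) = |U - BND_S(B*)| / |U|."""
        _, bnd, _ = regions_of_partition(A_star, B_star, U)
        return (len(U) - len(bnd)) / len(U)

    def significance(table, C, A, D, U):
        """sigma_A(B) = gamma_C(B) - gamma_{C-A}(B); A = {a} gives the
        single-attribute case."""
        B_star = partition(table, D)
        g_full = dependency(partition(table, C), B_star, U)
        g_rest = dependency(partition(table, [c for c in C if c not in A]),
                            B_star, U)
        return g_full - g_rest

    print(discriminant_index(A_star, B_star, set(table3)))  # 0.375 for A = {Temp}
    print(significance(table3, ['Temp', 'Blood-P', 'Vision'], ['Vision'],
                       ['Heart-Risk'], set(table3)))        # 0.0: Vision superfluous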
Reducts and elimination of superfluous attributes
In a knowledge representation system, each entity is described by the attribute values
of C, the set of condition attributes. (For example, in Table 3, object u1 is described
by: Temp = normal, Blood-P = low, and Vision = far.) Some attributes in C can be
redundant and thus can be eliminated.
Let B be a non-empty subset of C. B is called a dependent set of attributes if there
exists a proper subset B' ⊂ B such that B̃' = B̃, i.e., B' →1 B; otherwise, B is called
an independent set, or minimal set. B is said to be a reduct of C if B is a maximal
independent set of condition attributes. (Maximal means that the addition of any
attribute to B would make the new B dependent.) A reduct of C is denoted Ĉ. Ĉ induces
the same partition as C. In general, more than one reduct of C can be identified. The
collection of all reducts of C is denoted RED(C).
For example, consider the following Table 4.
Table 4. Illustration of dependent/independent sets, reducts and the core
U        C                                                    D
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Person   Temp     Blood-P   Vision      EKG     Cholesterol   Heart-Risk
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
u1       normal   low       far         . . .   . . .         slight
u2       below    average   standard    . . .   . . .         serious
.        . . .    . . .     . . .       . . .   . . .         . . .
.        . . .    . . .     . . .       . . .   . . .         . . .
Here we consider C = {Temp, Blood-P, Vision, EKG, Cholesterol} (but not D =
{Heart-Risk}). Let B1 = {Temp, Blood-P, Vision} and B2 = {Temp, Blood-P}.
Suppose that B1 and B2 lead to the same equivalence relation on U, i.e., they induce
the same partition on U; then B1 is a dependent set. B2 is an independent set if there
is no proper subset of B2 that has the same equivalence relation. In other words, B2
is an independent set if the deletion of any attribute of B2 results in a different
equivalence relation. Furthermore, B2 is a reduct of C if B2 is a maximal independent
set, i.e., if the addition of any attribute, such as Vision, EKG, or Cholesterol, would
make the set dependent. In other words, B2 has the same equivalence relation as C,
i.e., it induces the same partition as C. Suppose that B3 = {Blood-P, EKG,
Cholesterol} is another reduct of C, and that B2 and B3 are the only reducts of C. Then
RED(C) = {B2, B3}.
We can extend the above definitions of dependent and independent sets and of reducts
to take into account the set of decision attributes D. We do this with the notion of
positive regions. Let again B be a non-empty subset of C. B is called a dependent set
with respect to D if there exists a proper subset B' ⊂ B such that POSB'(D*) = POSB
(D*); otherwise, B is regarded as an independent set with respect to D. B is said to
be a relative reduct of C if B is a maximal independent set with respect to D. The
collection of all such relative reducts is denoted REDD(C). We note that for any
reduct or relative reduct Ĉ of C, C →γ D always implies Ĉ →γ D, i.e., C can be
reduced to Ĉ without a loss of information.
For example, in Table 4, these extensions take into account the set D =
{Heart-Risk} in terms of POSB(D*). For example, B2 = {Temp, Blood-P} is an
independent set with respect to D if the deletion of any attribute results in a different
POSB(D*). B2 is a relative reduct of C if POSB2(D*) = POSC(D*). Suppose that B2
and B3 are the only relative reducts of C. Then REDD(C) = {B2, B3}.
Example 13. (From Table 3)
Ĉ = {Temp, Blood-P} is the only relative reduct of C = {Temp, Blood-P, Vision}.
C can be reduced to Ĉ; Table 3 can be transformed into another, equivalent and
simpler, table (Table 5).
Table 5. Reduced knowledge representation system
U        C                  D
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Person   Temp     Blood-P   Heart-Risk   Health
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
u1       normal   low       slight       poor
u2       below    average   serious      excellent
u3       above    low       serious      good
u4       normal   average   slight       excellent
u5       normal   low       none         good
u6       above    high      serious      good
u7       above    average   serious      excellent
u8       below    average   none         good
The core of C is defined as the set of condition attributes belonging to the
intersection of all reducts of C:
CORE(C) = ∩B∈RED(C) B
For example, suppose C = {Temp, Blood-P, Vision, EKG, Cholesterol}, and that the
only reducts of C are {Temp, Blood-P, EKG} and {Blood-P, EKG, Cholesterol}. Then
CORE(C) = {Blood-P, EKG}.
When the deletion of a condition attribute from C results in an equivalence relation
different from the one defined for C, the condition attribute is called indispensable.
In equation form, a condition attribute a ∈ C is indispensable
if C̃a ≠ C̃, where Ca = C - {a}. The core of C is equal to the set of all indispensable
attributes in C. When dealing with an information table, a common problem is to
identify the most essential condition attributes. The core, i.e., the set of all the
indispensable condition attributes, is necessary in order to have the same equivalence
relation as C, although it is not sufficient. A reduct is sufficient to have the same
equivalence relation as C. When there are many reducts, the selection of a reduct is not
necessarily obvious. We can employ various criteria, such as selecting a reduct with the
smallest number of attributes, or selecting a reduct that contains the most common
attributes for the specific application under consideration, and so forth.
As before, we can extend the above definitions to take into account D, the set of
action attributes. The relative core is the set of condition attributes belonging to the
intersection of all relative reducts of C:
CORED(C) = ∩B∈REDD(C) B
A condition attribute a ∈ C is said to be indispensable with respect to D if POSC-{a}
(D*) ≠ POSC(D*). The relative core of C is equal to the set of all indispensable
condition attributes with respect to D. The core can be easily determined from a KRS.
The core is a subset of every reduct, i.e., every reduct is a superset of the core. Hence,
it is advantageous to start with the core when searching for a reduct.
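For small attribute sets, reducts and the core can be found by brute force. A sketch with our own helper names, reusing partition from the earlier listings (it computes plain reducts via the partition test; relative reducts with respect to D would use positive regions instead):

    from itertools import combinations

    def same_partition(table, attrs, ref_attrs):
        """True if attrs and ref_attrs induce the same partition of U."""
        def canon(parts):
            return {frozenset(X) for X in parts}
        return canon(partition(table, attrs)) == canon(partition(table, ref_attrs))

    def reducts(table, C):
        """All minimal subsets of C inducing the same partition as C
        (exhaustive search; feasible only for small C)."""
        found = []
        for r in range(1, len(C) + 1):
            for B in combinations(sorted(C), r):
                if same_partition(table, list(B), list(C)) and \
                   not any(F <= set(B) for F in found):
                    found.append(set(B))
        return found

    def core(table, C):
        """Intersection of all reducts = the indispensable attributes."""
        return set(C).intersection(*reducts(table, C))

    print(reducts(table3, ['Temp', 'Blood-P', 'Vision']))  # [{'Temp', 'Blood-P'}]
    print(core(table3, ['Temp', 'Blood-P', 'Vision']))     # {'Temp', 'Blood-P'}

For Table 3, the sole plain reduct returned happens to coincide with the relative reduct of Example 13.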
Example 14. (From Table 3)
C = {Temp, Blood-P, Vision}, D = {Heart-Risk, Health}.
Let D' = {Heart-Risk}. Then the relative core of C with respect to D' is:
CORED'(C) = {Temp}
There are two relative reducts of the set C with respect to D':
B1 = {Temp, Blood-P}, B2 = {Temp, Vision},
i.e., REDD'(C) = {B1, B2}, and CORED'(C) = B1 ∩ B2.
6.7 Additional Remarks
Implementation considerations
There are many ways to employ rough sets depending on specific applications. The
following are some possible considerations.
1. Representation of input data and information tables. Since rough sets deal with
information tables, computer processing requires storing the raw data represented
as information tables. The simplest data structure for an information table is
a two-dimensional array, since the table is a two-dimensional matrix. When the
size of the table is not known in advance, however, a linked list is more flexible
for dynamically allocating memory. A linked list with row-wise and column-wise
pointers can be used to access the table in the horizontal and vertical directions.
Typically, an array is easier to program, while a linked list is more flexible for
information tables of variable size.
2. Preparation and analysis of input data. Input data can be carefully prepared
manually, or sometimes through other pre-processing techniques such as
statistical analysis. We try to include the minimum necessary and sufficient
information for the particular application. The condition attributes can be
manually arranged in decreasing order of importance, if this is known.
3. Discretization of input data. Raw data is often given as numeric values, such as
99.7, rather than the descriptive ones used in the examples in this chapter (for
example, Temp = below, normal, or above). When raw data is in numeric form,
we usually need to pre-process it by assigning each value to one of several
discrete intervals, a sort of "quantization" of continuous data. This quantization
process is called discretization. For programming purposes, descriptive values
such as "below" do not necessarily have to be used exactly as they appear.
Instead, discrete numeric values can be associated with them, for example,
below = 1, normal = 2, and above = 3. This numeric representation is often
easier for programming.
4. Analysis of condition attributes. For example, determining indispensable
attributes, reducts, and the core (with or without respect to a decision attribute).
Suppose that we arrange the condition attributes in decreasing order of
importance, based on intuition or some sort of pre-processing. We can start from
the least important attribute. We drop it and check whether the resulting partition
is the same as the original one containing all the condition attributes. This
requires exhaustive comparisons by the computer. If the result is the same, the
attribute is dispensable; otherwise it is indispensable. We can repeat this process
for the remaining attributes, each time by choosing one attribute. At the end of
this process, we know whether each attribute is dispensable or indispensable for
the entire set of condition attributes. The set of all the indispensable attributes is
the core, that is, the set of absolutely necessary condition attributes (although it
may not be sufficient).
A reduct is a set of sufficient condition attributes equivalent to the original
data, and we can work on it to derive rules. Several scenarios are possible for
determining a reduct, depending on the specific application.
(a) If all the attributes are indispensable, we cannot drop any of the attributes.
The core is the only reduct.
(b) If some attributes are indispensable while others are not, we can start from
the core to find a reduct. We pick one of the dispensable attributes, and keep
adding one at a time, until the resulting partition is the same as the original
one with all the input attributes. This process requires exhaustive
comparisons.
(c) In another extreme case, there may be no indispensable attributes, that is,
the core is empty. In this case, we can start from a set of some dispensable
attributes, adding or dropping one at a time until we find a reduct.
In practice, it is often necessary to limit the number of attributes and the
number of possible values each attribute can take. For example, if there are
seven attributes and each attribute can take one of four values, the number of
possible combinations is 4⁷ = 16,384. This number may be too big; we may