13.4 Adaptive Lattice-Ladder Filters 931
bm (n), we will make use of the two matrix inversion identities for a matrix of the
form
A = [Al1 A12] (13.4.21)
A21 A22
where A, All, and A22 are square matrices. The inverse of A is expressible in two
different forms, namely,
A- 1 _ [All + AIlA12A221A21AIl - A1-11A12A-2-21] (13.4.22)
- -A-2-21 A21Al-11 A2-21
and A- 1-11 - A-1-11A12A2-21 ]
A-1 _ [ (13.4.23)
- -A221A21AIl A221 A21AIl A12A2l + A221
where All and A12 are defined as
A- l1 = All - A12A221A21
(13.4.24)
Order-Update Recursions. Now, let us use the formula in (13.4.22) to obtain the
inverse of Rm+1 (n) by using the form in (13.4.20). First, we have
A22 = v(n) - V~(n)R;;;1(n)Vm(n) (13.4.25)
= v(n) + b~(n)V~(n) = E!(n)
and (13.4.26)
Hence,
or, equivalently, 0] _1_0 bm(n) ]
+ E~(n) [bm(n)] [bH(n) E~(n) (13.4.27)
P (n) = [Pm(n) m
m+1 0 1 1
E~(n)
1]
By substituting n -1 for n in (13.4.27) and postmultiplying the result by -Qm+1 (n),
we obtain the order update for amen). Thus,
=am+1 (n) -Pm+1 (n - 1)Qm+1 (n)
1)= o[Pm(n - 0] [-Q};:)]
0 ...
_ 1 [bm(n-1)][bH(n_1) 1] Qm+1(n) (13.4.28)
E~(n-1) 1 m
= [amen)] _ km+1(n) [bm(n -1)]
o E~(n - 1)
1
932 Chapter 13 Adaptive Filters
where the scalar quantity km+l (n) is defined as
(13.4.29)
The reader should observe that (13.4.28) is a Levinson-type recursion for the predic-
tor coefficients.
To obtain the corresponding order update for bm (n), we use the matrix inversion
formula in (13.4.23) for the inverse of Rm+l (n), along with the form in (13.4.11). In
this case, we have
A- ll = q(n) - QmH (n)R;;;1(n -1)Qm(n) (13.4.30)
= q(n) + a~(n)Q~(n) = E~(n)
and
(13.4.31)
Hence,
or, equivalently,
[~ [am~n)]=Pm+l(n) (13.4.32)
Pm(nO_1)] + E}(n) [1 a!:(n)]
Now, if we postmultiply (13.4.32) by -Vm+l(n), we obtain
(13.4.33)
where
(13.4.34)
The proof of (13.4.34) and its relation to (13.4.29) is left as an excercise for the
reader. Thus, (13.4.28) and (13.4.33) specify the order-update equations for am(n)
and bm(n), respectively.
The order-update equations for E~(n) and E!(n) may now be obtained. From
the definition of E~(n) given by (13.4.8), we have
(13.4.35)
13.4 Adaptive Lattice-Ladder Filters 933
By substituting for am+l (n) from (13.4.28) into (13.4.35), we obtain
Ef+!(n) = q(n) + [a~(n) 0] [Q~(n)] - :m+!(n) [b~(n -1) l]Q~+! (n)
m ... Em(n - 1)
= Ef (n) _ Ikm+! (n)1 2 (13.4.36)
m E~(n-1)
Similarly, by using (13.4.17) and (13.4.33), we obtain the order update for E!+! (n),
in the form
Eb (n)=E b (n_1)_lkm+1 (n)1 2 (13.4.37)
m+l m Et(n)
The lattice filter is specified by two coupled equations involving the forward and
backward errors fm(n, n -1) and gm(n, n - 1), respectively. From the definition of
the forward error in (13.4.1), we have
fm+l (n, n - 1) = x(n) + a~+l (n - 1)Xm+! (n - 1) (13.4.38)
Substituting for a~+l (n - 1) from (13.4.28) into (13.4.38) yields
=fm(n,n-1)- km+l (n - 21) x [x(n-m-1)+b~(n-2)Xm(n-1) ]
b
Em(n - )
km+l (n - 1) gm(n -l,n - 2) (13.4.39)
= fm(n,n -1) - b
Em(n - 2)
To simplify the notation, we define (13.4.40)
fm(n) = fm(n, n - 1)
gm(n) = gm(n, n -1)
Then, (13.4.39) may be expressed as
=fm+l(n) fm(n) - kmE~+(l n(n - 1) -1) (13.4.41)
_ 2) gm(n
Similarly, beginning with the definition of the backward error given by (13.4.13),
we have
=gm+l (n, n - 1) x(n - m - 1) + b~+l (n - l)Xm+! (n) (13.4.42)
934 Chapter 13 Adaptive Filters
Substituting for bm+1 (n - 1) from (13.4.33) and simplifying the result, we obtain
k~+1 (n - 1) (13.4.43)
gm+l(n,n-1)=gm(n-1,n-2)- f !m(n,n-1)
Em(n - 1)
or, equivalently,
k~+1 (n - 1) (13.4.44)
gm+1(n)=gm(n-1)- f !m(n)
Em(n - 1)
The two recursive equations in (13.4.41) and (13.4.44) specify the lattice filter
illustrated in Figure 13.4.1 where, for notational convenience, we have defined the
reflection coefficients for the lattice as
'j{f (n) = -km(n) (13.4.45)
m E!_1(n-1)
'j{b (n) = -k~(n)
m E~_1 (n)
The initial conditions on the order updates are
!o(n) = go(n) = x(n)
= = = +Et (n) Eg(n) Ln (13.4.46)
w n - 1Ix(l)1 2 wEt (n - 1) Ix(n)1 2
1=0
We note that (13.4.46) is also a time-update equation for Et (n) and Eg(n).
fm-j(n) ---_rF------~
(a)
go(n) gj(n) g2(n)
'----
~ Stage Stage Stage Stage
1 2 3 M
fo(n) fJ(n) fz(n) ----
(b)
Figure 13.4.1 Least-squares lattice filter.
13.4 Adaptive lattice-Ladder Filters 935
lime-Update Recursions. Our goal is to determine a time-update equation for km (n),
which is necessary if the lattice filter is to be adaptive. This derivation will require
time-update equations for the prediction coefficients. We begin with the form
(13.4.47)
The time-update equation for Vm+l (n) is
(13.4.48)
The time-update equations for the prediction coefficients are determined as follows.
From (13.4.6), (13.4.7), and (13.3.14), we have
=amen) -Pm(n - l)Qm(n)
= - ~ [Pm(n - 2) - Km(n -l)X~(n - l)Pm(n - 2)]
+x [wQm(n -1) x(n)X~(n - 1)] (13.4.49)
= amen - 1) - Km(n - 1) [x(n) + a~(n - l)Xm(n - 1)]
where Km(n - 1) is the Kalman gain vector at iteration n - 1. But, from (13.4.38),
we have
+x(n) a~(n -l)Xm(n -1) = fm(n, n -1) == fm(n)
Therefore, the time-update equation for amen) is
(13.4.50)
In a parallel development, using (13.4.15), (13.4.16), and (13.3.14), we obtain the
time-update equations for the coefficients of the backward predictor, in the form
(13.4.51)
Now, from (13.4.48) and (13.4.50), the time-update equation for km+l (n) is
= -km+l (n) [WV~+l (n - 1) + x*(n - m - l)X~+l (n)]
= wkm+1(n -1) - wV~+l(n -1) [Km(~ -1)] fm(n)
+ x*(n - m - l)X~+l (n) [am(}_1)]
- x*(n - m - l)X~+l (n) [Km(~ _ 1)] fm(n) (13.4.52)
936 Chapter 13 Adaptive Filters
But
(13.4.53)
and
V~+l (n - 1) [Km(~ _ 1)] = V~ (n - 2)Km(n - 1)
V~ (n - 2}Pm(n - 2)X;;'(n - 1) (13.4.54)
=
w+JLm(n-1)
-b:: (n - 2)X;;' (n - 1)
+=
w JLm(n - 1)
g;;'(n - 1) - x*(n - m - 1)
+= -=-----~--
W JLm(n -1)
where JLm(n) is as previously defined in (13.3.13). Finally,
Xt (n) [ 0 ] = X~(n - 1}Pm(n - 2)X;;'(n - 1) = --'J-L-m-(n----1--)'-- (13.4.55)
+ +m+l Km(n - 1)
W JLm(n - 1) W JLm(n -1)
Substituting the results of (13.4.53), (13.4.54), and (13.4.55) into (13.4.52) we obtain
the desired time-update equation in the form
(13.4.56)
It is convenient to define a new variable
(13.4.57)
Clearly, am(n) is real-valued and has a range 0 < am(n) < 1. Then, the time-update
equation (13.4.56) becomes
(13.4.58)
Order Update for «m (n). Although am (n) can be computed directly for each value
of m and for each n, it is more efficient to use an order-update equation, which is
determined as follows. First, from the definition of Km(n) given in (13.3.12), it is
easily seen that
(13.4.59)
13.4 Adaptive Lattice-Ladder Filters 937
To obtain an order-update equation for amen), we need an order-update equation
for the Kalman gain vector Km(n). But Km+l (n) may be expressed as
0]o= ([Pm(n) + _1 [bm(n)] [bH (n) 1]) [ X~(n) ]
0 E~(n) 1 m x*(n -m)
+= [Km(n)] g~(n, n) [bm(n)] (13.4.60)
o E~(n)
1
The term gm (n, n) may also be expressed as
= +gm(n, n) x(n - m) b~(n)Xm(n) (13.4.61)
+= x(n - m) [b~(n -1) - K~(n)gm(n)] Xm(n)
= +x(n - m) b~(n - l)Xm(n) - gm(n)K~(n)Xm(n)
= gm(n) [1- K~(n)Xm(n)]
=am(n)gm(n)
Hence, the order-update equation for Km(n) in (13.4.60) may also be written as
K (n) = [Km(n)] + am(n)g~(n) [bm(n)] (13.4.62)
m+l 1
0 E~(n)
By using (13.4.62) and the relation in (13.4.59), we obtain the order-update equa-
tion for am (n) as follows:
=Clm+l (n) 1 - X~+l (n)Km+l (n) = 1 - [X~ (n)x(n - m)]
x ([Km(n)] + am(n)g~(n) [bm(n)])
o E~(n) 1
=am (n) - am(n;g~(n) [X~ (n) x(n - m)] [bm1(n)]
Em(n)
am(n)g~(n)
= amen) - b gm(n, n)
Em(n)
= a (n) _ a~(n)lgm(n)12 (13.4.63)
m E~(n)
Thus, we have obtained both the order-update and time-update equations for the
basic least-squares lattice shown in Figure 13.4.1. The basic equations are (13.4.41)
and (13.4.44) for the forward and backward errors, usually called the residuals,
(13.4.36) and (13.4.37) for the corresponding least-squares errors, the time-update
/
938 Chapter 13 Adaptive Filters
Equation (13.4.58) for km(n), and the order-update Equation (13.4.63) for the pa-
rameter O!m(n). Initially, we have
E~(-1) = E!{-1) = E!{-2) = E > 0 (13.4.64)
= =fm{-1) gm(-1) km(-l) = 0
O!m{-1) = 1, O!-t(n) = O!_t(n -1) = 1
Joint Process Estimation. The last step in the derivation is to obtain the least-squares
estimate of the desired signal d(n) from the lattice. Suppose that the adaptive filter
has m + 1 coefficients, which are determined to minimize the average weighted
squared error
~m+t = Lwn n - 1Iem+l(l,n)1 2 (13.4.65)
1=0
where
(13.4.66)
The linear estimate
(13.4.67)
which will be obtained from the lattice by using the residuals gm (n), is called the joint
process estimate.
From the results of Section 13.3.1, we have already established that the coeffi-
cients of the adaptive filter that minimize (13.4.65) are given by the equation
(13.4.68)
We have also established that hm(n) satisfies the time-update equation given in
(13.3.27).
Now, let us obtain an order-update equation for hm(n). From (13.4.68) and
(13.4.27), we have
°oJ _1_ [bm{n)Jhm+t(n) = [Pmo(n)
[Dm(n)J + E~(n) 1 [bmH(n) 1]Dm+l(n) (13.4.69)
...
We define a complex-valued scalar quantity Dm(n) as
(13.4.70)
Then, (13.4.69) may be expressed as
h (n) = [hm(n)J + ED~m((nn)) [bm{n)J (13.4.71)
m+l
0 1
13.4 Adaptive Lattice-Ladder Filters 939
The scalar 8m (n) satisfies a time-update equation that is obtained from the time-
update equations for bm{n) and Dm{n), given by (13.4.51) and (13.3.17), respec-
tively. Thus,
8m{n) = [b~{n -1) - K~{n)g~{n) 1] [wDm+l{n -1) +d{n)X~+l{n)]
= w8m{n -1) + [b~{n -1) l]X~+l{n)d{n) (13.4.72)
- wg~{n)[K~{n) 0]Dm+l{n-1)-g~{n)d{n)[K~{n) O]X~+l{n)
But (13.4.73)
[b~ (n -1) 1] X~+l (n) = x*{n - m) + b~ (n - l)X~{n) = g~{n)
Also,
[K~{n) 0] D m+ l { n -1) = w + 1 [X~{n)Pm{n -1) 0] [Dm~n..-1)]
ILm{n)
(13.4.74)
= 1 X~{n)hm{n - 1)
+w ILm{n)
The last term in (13.4.72) may be expressed as
[K~ (n) 0] [x:'{n)] = 1 Xl (n)Pm{n - l)X* (n)
... W+ILm{n) m m
(13.4.75)
ILm (n)
=
Upon substituting the results in (13.4.73-13.4.75) into (13.4.72), we obtain the desired
time-update equation for 8m {n) as
(13.4.76)
Order-update equations for am{n) and gm{n) have already been derived. With
=eo{n) d{n), the order-update equation for em{n) is obtained as follows:
= =em{n) em{n, n -1) d{n) - h~{n -l)Xm{n)
= d{n) - [h~_l{n -1) 0] [Xm.~l.{n)]
- 8m - 1(n - 1) [bmI - 1(n - 1) 1] Xm{n)
b
Em _ 1(n -1)
= ) 8m-l{n -l)gm-l{n) (13.4.77)
em-l {n - E!_l (n - 1)
/
940 Chapter 13 Adaptive Filters
,+,..1 ,~ ,+,..., ,+ den)
+ el(n) + e2(n) ..+ e3(n) + eM-I(n) +
,+ -\.~
den) +..1 ,~ +,..1
-
- -
-&o-(n--]) &I(n-]) &2(n-]) &M-2(n-]) &M-I(n-])
E~(n-]) E~(n-]) Et-2(n-]) Et-I(n-l)
ES(n-])
...... gM-I(n)
IM-I(n)
- go(n) gl(n) g2(n) gM-2(n)
x(n) Stage Stage Stage
2 M-]
]
......
lo(n) f.(n) i2(n) IM-2(n)
Figure 13.4.2 Adaptive RLS lattice-ladder filter.
Finally, the output estimate d(n) of the least-squares lattice is
(13.4.78)
But h~+1 (n - 1) is not computed explicitly. By repeated substitution of the order-
update equation for hm+1 (n) given by (13.4.71) into (13.4.78), we obtain the desired
expression for d(n) in the form
(13.4.79)
In other words, the output estimate d(n) is a linear weighted sum of the backward
residuals gk (n).
The adaptive least-squares lattice/joint-process (ladder) estimator is illustrated
in Figure 13.4.2. This lattice-ladder structure is mathematically equivalent to the RLS
direct-form FIR filter. The recursive equations are sumarized in Table 13.5. This is
called the a priori form of the RLS lattice-ladder algorithm in order to distinguish
it from another form of the algorithm, called the a posteriori form, in which the
coefficient vector hM (n) is used in place of hM (n -1) to compute the estimate d (n). In
many adaptive filtering problems, such as channel equalization and echo cancellation,
the a posteriori form cannot be used, because hM (n) cannot be computed prior to
the computation of d(n).
We now describe a number of modifications that can be made to the "conven-
tional" RLS lattice-ladder algorithm given in Table 13.5.
13.4 Adaptive lattice-ladder Filters 941
TABLE 13.5 A Priori Form of the RLS Lattice-Ladder Algorithm
Lattice predictor: Begin with n = 1 and compute the order updates for m = 0,1, ... , M - 2
= +km+l (n -1) wkm+l(n - 2) am(n - 2)fm(n -l)g~(n - 2)
'X~+1 (n - 1) = km+l(n -1)
E!(n - 2)
k~+l (n -1)
E~(n-1)
fm+!(n) = fm(n) +'X~+l(n -l)gm(n -1)
=gm+l(n) gm(n -1) +:J[~+!(n -l)fm(n)
Ef (n _ 1) = Ef (n _ 1) _ Ikm+! (n - 1)12
m+l m E!(n - 2)
Eb (n _ 1) = Eb (n _ 2) _ Ikm+! (n - 1)12
m+l m E~(n -1)
am+l (n - 1) _ ( - 1) a;,(n - l)lgm(n _1)12
- am n -
b
Em(n -1)
=Ladder filter: Begin with n 1 and compute the order updates for m = 0,1, ... , M - 1
= +8m(n -1) w8m(n - 2) am(n -l)g~(n -l)em(n -1)
~m(n -1) = 8m (n -1)
E!(n -1)
+em+l(n) = em(n) ~m(n -l)gm(n)
Initialization
ao(n - 1) = I, eo(n) = d(n), fo(n) = go(n) = x(n)
Et (n) = Eg(n) = wEt (n - 1) + Ix(n)1 2
am (-l) = 1, km (-l) = 0
E!(-l) = E~(O) = E > 0; 8m(-1) = 0
Modified RLS Lattice Algorithms. The recursive equations in the RLS lattice algo-
rithm given in Table 13.5 are by no means unique. Modifications can be made to
some of the equations without affecting the optimality of the algorithm. However,
some modifications result in algorithms that are more numerically robust when fixed-
point arithmetic is used in the implementation of the algorithms. We give a number
of basic relationships that are easily established from the above developments.
First, we have a relationship between the a priori and a posteriori error residuals.
A priori errors:
fm(n, n -1) == fm(n) = x(n) + a~(n -1)Xm(n - 1) (13.4.80)
=gm(n, n - 1) == gm(n) x(n - m) + b~(n - 1)Xm(n)
942 Chapter 13 Adaptive Filters (13.4.81)
A posteriori errors:
+!m{n, n) = x{n) a~{n)Xm{n -1)
gm{n, n) = x{n - m) + b~{n)Xm{n)
The basic relations between (13.4.80) and (13.4.81) are (13.4.82)
=!m{n, n) cxm{n - 1)!m{n)
gm{n, n) = CXm {n)gm (n)
These relations follow easily by using (13.4.50) and (13.4.51) in (13.4.81).
Second, we may obtain time-update equations for the least-squares forward and
backward errors. For example, from (13.4.8) and (13.4.50) we obtain
E~{n) = q{n) + a~{n)Q~{n)
=q{n) + [a~{n -1) - K~{n -1)!m{n)] [wQ~{n -1) +x*{n)Xm{n -1)]
(13.4.83)
Similarly, from (13.4.17) and (13.4.51) we obtain
(13.4.84)
Usually, (13.4.83) and (13.4.84) are used in place of the sixth and seventh equations
in Table 13.5.
Third, we obtain a time-update equation for the Kalman gain vector, which is
not explicitly used in the lattice algorithm, but which is used in the fast FIR filter
algorithms. For this derivation, we also use the time-update equations for the forward
and backward prediction coefficients given by (13.4.50) and (13.4.51). Thus, we have
- [ 0 0 ] [ x*{n) ]
0 Pm-l{n -1) X~_I{n -1)
+ E~_1l (n) [ 1] [ amH- 1(n) ] [ X~_xl*({nn)- ]
am-J (n) 1
1)
=[ 0 ] + !~-J (n, n) [ 1 ]
K m-l{n-l)
EL1{n) am -l{n)
== [Cm-J{n)] (13.4.85)
cmm{n)
13.4 Adaptive lattice-ladder Filters 943
where, by definition, Cm-l (n) consists of the first (m - 1) elements of Km{n) and
cmm{n) is the last element. From (13.4.60), we also have the order-update equation
for Km{n) as
(13.4.86)
By equating (13.4.85) to (13.4.86), we obtain the result
C (n) - g~-l (n, n) (13.4.87)
mm - ...E..:b.:.:..-;(-n=)---
m-l
and, hence,
(13.4.88)
By substituting for bm-l (n) from (13.4.51) into (13.4.88), we obtain the time-update
equation for the Kalman gain vector in (13.4.85) as
(13.4.89)
There is also a time-update equation for the scalar am{n). From (13.4.63), we
have
am (n ) = am-l (n) - a~_1{n)lgm_l{n)12
b
Em- 1(n)
=a m-l (n)[1 - cmm{n)gm-l (n)] (13.4.90)
A second relation is obtained by using (13.4.85) to eliminate Km-l (n) in the expres-
sion for am (n). Then,
am{n) = 1 - X~{n)Km{n)
- (-1) [1- f~_l{n,n)fm-l{n)] (13.4.91)
- am-l n 1
Em- 1(n)
By equating (13.4.90) to (13.4.91), we obtain the desired time-update equation for
am{n) as
1 1::'_1 (n,n)lm_l (n) ]
am-l{n) = am-l{n - 1) [ - E!_I(n) (13.4.92)
1 () ()
- Cmm n gm-l n
944 Chapter 13 Adaptive Filters
Finally, we wish to distinguish between two different methods for updating the
reflection coefficients in the lattice filter and the ladder part: the conventional (ilJdi-
rect) method and the direct method. In the conventional (indirect) method,
= -I km+l{n) (13.4.93)
E!{n _ 1)
Xm+l (n)
'j{b (n) = _ km* +l (n) (13.4.94)
m+l E~{n)
(13.4.95)
where km+l (n) is time-updated from (13.4.58), 8m (n) is updated according to (13.4.76),
and E~{n) and E!{n) are updated according to (13.4.83) and (13.4.84). By substi-
tuting for km+l (n) from (13.4.58) into (13.4.93), and using (13.4.84) and the eighth
equation in Table 13.5, we obtain
2») _= _Xl n
m+l ( )
km+l{n -1) ( WEE~!{{nn- - 1) c¥m{n -1E)!~m{n{n-)g1;)'{n -1)
E!{n - 2)
1) (1 _= 'J(/ (n _ c¥m{n -l)lgm{n _1)1 2 )
m+l E~{n - 1)
c¥m{n -1)!m{n)g;'{n -1)
E~{n - 1)
= 'j{1 (n -1) _ c¥m{n -l)!m+l{n)g;'{n -1) (13.4.96)
m+l E~{n - 1)
which is a formula for directly updating the reflection coefficients in the lattice. Sim-
ilarly, by substituting (13.4.58) into (13.4.94), and using (13.4.83) and the eighth
equation in Table 13.5, we obtain
X~+l (n) = X~+l (n - 1) _ c¥m{n - 1~~:~)gm+l (n) (13.4.97)
Finally, the ladder gain can also be updated directly according to the relation
gm{n) = gm{n -1) - C¥m{n)g;~~~~m+l{n) (13.4.98)
The RLS lattice-ladder algorithm that uses the direct update relations in (13.4.96-
13.4.98) and (13.4.83-13.4.84) is listed in Table 13.6.
An important characteristic of the algorithm in Table 13.6 is that the forward
and backward residuals are fed back to time-update the reflection coefficients in the
lattice stage, and em+1(n) is fed back to update the ladder gain gm (n). For this reason,
this RLS lattice-ladder algorithm has been called the error-feedback form. A similar
form can be obtained for the a posteriori RLS lattice-ladder algorithm. For more
details on the error-feedback form of RLS lattice-ladder algorithms, the interested
reader is referred to Ling, Manolakis, and Proakis (1986).
13.4 Adaptive Lattice-Ladder Filters 945
TABLE 13.6 Direct Update (Error-Feedback) Form of the A Priori RLS Lattice-Ladder
Algorithm
= =Lattice predictor: Begin with n 1 and compute the order updates for m 0, 1, ... , M - 2
':if;! (n -1) = Xl (n _ 2) _ am(n - 2)fm+l(n -1)g;;'(n - 2)
E~(n - 2)
m+l m+l
=X b (n -1) X b (n _ 2) _ am(n - 2)f:'(n -1)gm+l(n -1)
E~(n -1)
m+l m+l
= +fm+l(n) fm(n) X~+1 (n - 1)gm(n -1)
gm+1(n) =gm(n-1)+X~+1(n-1)fm(n)
=E~+I(n -1) wE~+l(n -2) +am+l(n -2)lfm+l(n _1)12
( 1) _ ( 1) a~(n - 1)lgm(n _1)1 2
am+l n - - am n - - E~ (n _ 1)
=E!+I(n -1) wE!+l(n -2) +am+l(n -1)lgm+l(n _1)1 2
Ladder filter: Begin with n = 1 and compute the order updates m = 0, 1, ... , M -1
I: (n -1) = I: (n _ 2) _ am(n -1)g;;'(n -1)em+l(n -1)
,)m 'jm E~(n -1)
= +em+l(n) em(n) ;m(n -1)gm(n)
Initialization
=ao(n - 1) 1, eo(n) = d(n), fo(n) = go(n) = x(n)
=E6 (n) = Eg(n) wE6 (n -1) + Ix(n)12
=am (-1) = 1, X!(-1) = ~(~(-1) 0
E!(-1) = E~(O) = E > 0
Fast RLS Algorithms. The two versions of the fast RLS algorithms given in Sec-
tion 13.3.3 follow directly from the relationships that we have obtained in this sec-
tion. In particular, we fix the size of the lattice and the associated forward and
backward predictors at M - 1 stages. Thus, we obtain the first seven recursive equa-
tions in the two versions of the algorithm. The remaining problem is to determine the
time-update equation for the Kalman gain vector, which was determined in (13.4.85-
13.4.89). In version B of the algorithm, given in Table 13.3, we used the scalar am (n)
to reduce the computations from 10M to 9M. Version A of the algorithm, given in
Table 13.2, avoids the use of this parameter. Since these algorithms provide a direct
updating of the Kalman gain vector, they have been called fast Kalman algorithms
(for reference, see Falconer and Ljung (1978) and Proakis (1989».
Further reduction of computational complexity to 7M is possible by directly
updating the following alternative (Kalman) gain vector (see Carayannis, Manolakis,
and Kalouptsidis (1983» defined as
(13.4.99)
946 Chapter 13 Adaptive Filters
Several fast algorithms using this gain vector have been proposed, with complexi-
ties ranging from 7M to 10M. Table 13.7 lists the FAEST (Fast A Posteriori Error
Sequential Technique) algorithm with a computational complexity 7M (for a deriva-
tion, see Carayannis, Manolakis, and Kalouptsidis (1983; 1986) and Problem 13.7).
In general, the 7M fast RLS algorithms and some variations are very sensi-
tive to round-off noise and exhibit instability problems (Falconer and Ljung (1978),
Carayannis, Manolakis, and Kalouptsidis (1983; 1986), and Cioffi and Kailath (1984».
The instability problem in the 7M algorithms has been addressed by Slock and
Kailath (1988; 1991), and modifications have been proposed that stabilize these algo-
rithms. The resulting stabilized algorithms have a computational complexity ranging
from 8M to 9M. Thus, their computational complexity is increased by a relatively
small amount compared to the unstable 7M algorithms.
To understand the stabilized fast RLS algorithms, we begin by comparing the
fast RLS algorithm given in Table 13.3 and the FAEST algorithm in Table 13.7. As
TABLE 13.7 FAEST Algorithm
=!M-I (n) x(n) + a~_l (n -1)XM- 1(n -1)
- !M-I(n)
!M-I(n,n) = aM-l (n- 1)
=aM-l (n) aM-l (n - 1) - K M- 1(n -1)/M-l (n, n)
+E'_l (n) = wE'_1 (n -1) !M-I (n)!;'_1 (n, n)
]+ !;'_l(n) [
KM(n)=[CM-1(n)]=[_ 0 1]
K M_I (n-l) wE'_I(n-l) aM_I(n-l)
cMM(n)
gM-l (n) = -wE~_1 (n - 1)cMM(n)
K M- 1(n) = CM- I (n) - bM- 1(n - l)cMM(n)
_ = _ (n -1) + --",-I,;!/M,:-..-..:I..(.n:.)..1..:2...,--
aMen) aM-I
wEM_I(n -1)
= +aM-I (n) aMen) gM-l(n)CMM(n)
_ n) = gM-l ((n))
gM-I (n, aM-l n
+Et_l (n) = wEt_1 (n - 1) gM-I (n)g;'_l (n, n)
= K+bM-1(n) bM-1(n - 1) M- I (n)gM-I(n, n)
=eM(n) den) - h:w.(n - I)XM(n)
_ = -eM_(n-)
eM(n,n) aMen)
= +hM(n) hM(n - 1) KM(n)eM(n, n)
Initialization: Set all vectors to zero
E'_I(-I) = Et_I(-I) =E > 0
aM-I(-I) = 1
13.4 Adaptive lattice-ladder Filters 941
indicated, there are two major differences between these two algorithms. First, the
FAEST algorithm uses the alternative (Kalman) gain vector instead of the Kalman
gain vector. Second, the fast RLS algorithm computes the a priori backward predic-
tion error gM-l (n) through FIR filtering using the backward prediction coefficient
vector bm-l(n - 1), whereas the FAEST algorithm computes the same quantity
through a scalar operation by noticing that the last element of the alternative gain
vector, cMM(n), is equal to -wEt_lgM-l(n). Since these two algorithms are al-
gebraically equivalent, the backward prediction errors calculated in different ways
should be identical if infinite precision is used in the computation. Practically, when
finite-precision arithmetic is used, the backward prediction errors computed using
different formulas are only approximately equal. In what follows, we denote them
by gi{~l and g~-l (n), respectively. The superscripts (f) and (s) indicate that they
are computed using the filtering approach and scalar operation, respectively.
There are other quantities in the algorithms that can also be computed in dif-
ferent ways. In particular, the parameter aM-l (n) can be computed from the vector
quantities KM-l(n) and XM-1(n) as
(13.4.100)
or from scalar quantities. We denote these values as ai{~l (n) and a~)_l (n), respec-
tively. Finally, the last element of KM(n), denoted as c~~ (n), may be computed from
the relation -gi{~l (n)
-(f) ( ) = wE b ( n - 1 ) (13.4.101)
cMM n M
_
1
The two quantities in each of the three pairs [gi{~l (n), g~)-l (n)], [ai{~l (n),
a~)_l (n)], and [c~~(n), c~M(n)] are algebraically equivalent. Hence, either of the
two quantities or their linear combination (of the form kf3(s) + (1 - k)f3(f) , where f3
represents any of the three parameters) are algebraically equivalent to the original
quantities, and may be used in the algorithm. Slock and Kailath (1988; 1991) found
that by using the appropriate quantity or its linear combination in the fast RLS
algorithm, it was sufficient to correct for the positive feedback inherent in the fast
RLS algorithms. Implementation of this basic notion leads to the stabilized fast RLS
algorithm given in Table 13.8.
We observe from Table 13.8 that the stabilized fast RLS algorithm employs con-
stants ki' i = 1,2, ... , 5, to form five linear combinations of the three pairs of quanti-
ties just described. The best values of the ki found by Slock and Kailath resulted from
computer search, and are given as kl = 1.5, k2 = 2.5, k3 = 1, k4 = 0, ks = 1. When
ki = 0 or 1, we use only one of the quantities in the linear combination. Hence, some
of the parameters in the three pairs need not be computed. It was also found that
the stability of the algorithm is only slightly affected if ai{~l (n) is not used. These
simplifications result in the algorithm given in Table 13.9, which has a complexity of
8M and is numerically stable.
The performance of the stabilized fast RLS algorithms depends highly on proper
initialization. On the other hand, an algorithm that uses gi{~l (n) in its computations
948 Chapter 13 Adaptive Filters
TABLE 13.8 The Stabilized Fast RLS Algorithm
+!M-l (n) = x(n) a:w.-l (n - 1)XM- 1(n - 1)
!M-I (n)
!M-I (n, n) = a- M-I (n- 1)
aM-l(n) = aM-l(n -1) - KM-1(n -1)!M-l(n, n)
_ () = !;'_I (n)
CMI n f
wEM_1(n -1)
[ CM(,) -l (n) ] = [ K- 0 ] + C- M! (n) [ 1 ]
cMM(n)
M- I (n - 1) aM-l (n - 1)
g~~l (n) = x(n - M + 1) + b~_l (n - I)XM- 1(n)
(/)*
-(f)() gM_l(n)
cMM n = - wEb (n -1)
M-l
cMM(n) = k4c~~(n) + (1 - k4)c~~(n)
KM(n) = [~M-l(n)J
cMM(n)
gM(s) -l (n) = -w EbM-l (n - 1)-c(Ms)M* (n) i = 1,2,5
g<j}_! (n) = k;g~~1 (n) + (1 - k; )g~)-l (n),
CK M- 1(n) = M - 1 (n) - bM - I (n - 1)cMM(n)
= +aM(n) aM-l (n - 1) CMI (n)!M-I (n)
a-(M..)_I (n) = a-M (n) + gM(.•)-I (n )-C(Ms)M ( n)
a~~! (n) = +1 K:w.-l (n)XM - 1(n)
aM-I (n) = k3a~~1 (n) + (1 - k3)a~)_1 (n)
+E~_l (n) = wE'_1 (n -1) !M-I (n)!'M_I (n, n)
[or, /1 =..!.- f 1 _ I(7)I(n)12]
WE M_1(n-l) a~_l(n)
EM_1(n)
(i)
=g(O (n n) - gM-l(n) i 1,2
M-I ' - aM-I (n)'
+bM - 1(n) = bM - 1(n - 1) K- M - 1(n)gM(I)_I (n, n)
Et_l (n) = wEt_I (n - 1) + g~~1 (n)g~~l (n, n)
eM(n) = d(n) - h:w.(n - 1)XM(n)
eM(n, n) = -eM_(-n-)
aM(n)
hM(n) = hM(n -1) + KM (n)eM (n, n)
13.4 Adaptive Lattice-ladder Filters 949
TABLE 13.9 A Simplified Stabilized Fast RLS Algorithm
+!M-l (n) = x(n) a:w.-l (n - 1)XM- 1(n - 1)
!M-l (n, n) = !M-l (n)
( 1)
aM-l n-
aM-l (n) = aM-l (n - 1) - K M- l (n - 1)!M-l (n, n)
_ () = !!.,-1 (n)
eMl n f
wEM _ 1(n -1)
]+KM(n)=[CM-l(n)]=[_ 0 !!.,_l(n) [ 1 ]
K M- 1(n - 1) WE~_l (n - 1) aM-l (n - 1)
cMM(n)
g~~l (n) = x(n - M + 1) + b:w.-l (n - 1)XM- 1(n)
g~~1 (n) = -wEt_l (n - 1)cMM(n)
= =gYJ-l (n) kig~~1 (n) + (1 - ki )gi1-l (n), i 1,2
K M- 1(n) = CM-1(n) - bM-1(n -1)cMM(n)
= +eXM(n) eXM-l (n - 1) CMl (n)!M-l (n)
eXM-l(n) = eXM(n) + g~~1 (n)~M(n)
+E'_1 (n) = WE'_1 (n - 1) !M-l (n)!!.,_l (n, n)
(i) 1, 2
gM(i) -l ( n, n ) = g_M-l ((n)) , . =
aM-l n
I
+b M- 1(n) = b M- 1(n - 1) K- M- l (n)gM(I)_l (n, n)
Et_l (n) = WEt_l (n - 1) + g~~1 (n)g~~l (n, n)
eM(n) = d(n) - h:w.(n - 1)XM(n)
eM(n, n) = -e_M(-n-)
aM(n)
= +hM(n) hM(n -1) KM (n)eM (n, n)
is not critically affected by proper initialization (although, enventually, it will di-
verge). Consequently, we may initially use g~~l(n) in place of g~_l(n) (or their
linear combination) for the first few hundred iterations, and then switch to the form
for the stabilized fast RLS algorithm. By doing so, we obtain a stabilized fast RLS
algorithm that is also insensitive to initial conditions.
13.4.2 Other Lattice Algorithms
Another type of RLS lattice algorithm is obtained by normalizing the forward and
J Jbackward prediction errors through division of the errors by E~ (n) and E! (n),
respectively, and multiplication by Jam(n -1) and Jam(n), respectively. The re-
sulting lattice algorithm is called a square-root or angle-and-power normalized RLS
950 Chapter 13 Adaptive Filters
lattice algorithm. This algorithm has a more compact form than the other forms
of RLS lattice algorithms. However, the algorithm requires many square-root op-
erations, which can be computationally complex. This problem can be solved by
using CORDIC processors, which compute a square root in N clock cycles, where
N is the number of bits of the computer word length. A description of the square-
root/normalized RLS lattice algorithm and the CORDIC algorithm is given in the
book by Proakis et al. (2002).
It is also possible to simplify the computational complexity ofthe RLS algorithms
described in the previous section at the expense of compromising the convergence
rate. One such algorithm is called the gradient-lattice algorithm. In this algorithm
each stage of the lattice filter is characterized by the output-input relations
fm{n) = fm-l{n) - km{n)gm-l{n -1) (13.4.102)
gm{n) = gm-l{n -1) - k~{n)fm-l{n)
where km(n) is the reflection coefficient in the m th stage ofthe lattice and fm (n) and
gm{n) are the forward and backward residuals.
This form of the lattice filter is identical to the Levinson-Durbin algorithm, except
that now km(n) is allowed to vary with time so that the lattice filter adapts to the time
variations in the signal statistics. The reflection coefficients {km{n)} maybe optimized
by employing the method of least squares, which results in the solution
These coefficients can also be updated recursively in time. The ladder coefficients are
computed recursively in time by employing an LMS-type algorithm that is obtained
by applying the mean-square-error criterion. A description of this algorithm is given
in the paper by Griffiths (1978) and in the book by Proakis et al. (2002).
13.4.3 Properties of Lattice-Ladder Algorithms
The lattice algorithms that we have derived in the two previous subsections have
a number of desirable properties. In this subsection, we consider the properties of
these algorithms and compare them with the corresponding properties of the LMS
algorithm and the RLS direct-form FIR filtering algorithms.
Convergence Rate. The RLS lattice-ladder algorithms basically have the same con-
vergence rate as the RLS direct-form FIR filter structures. This characteristic be-
havior is not surprising, since both filter structures are optimum in the least-squares
sense. Although the gradient lattice algorithm retains some of the optimal character-
istics of the RLS lattice, nevertheless the former is not optimum in the least-squares
sense and, hence, its convergence rate is slower.
13.4 Adaptive lattice-ladder Filters 951
5 0.0 Channel-correlation matrix Optimum
Eigenvalue ratio = ii 800 900
5
ii-tap equalizer, noise variance = 0.00 i
i
Gradient lattice algorithm
c!: -1.0 Least-squares lattice algorithm
~ iOO 200 300 400 500 600 700
Number of iterations
i
..:: -2.0
o
.3
-3.0 0
Figure 13.4.3 Learning curves for RLS lattice, gradient lattice, and
LMS algorithms for adaptive equalizer of length M = 11. (From
Digital Communications by John G. Proakis. ©1989 by McGraw-
Hill Book Company. Reprinted with permission of the publisher.)
For comparison purposes, Figures 13.4.3 and 13.4.4 illustrate the learning curves
for an adaptive equalizer of length M = 11, implemented as an RLS lattice-ladder
filter, a gradient lattice-ladder filter, and a direct-form FIR filter using the LMS algo-
rithm, for a channel autocorrelation matrix that has eigenvalue ratios of Amax/Amin =
11 and Amax/Amin = 21, respectively. From these learning curves, we observe that the
gradient lattice algorithm takes about twice as many iterations to converge as the op-
timum RLS lattice algorithm. Furthermore, the gradient lattice algorithm provides
significantly faster convergence than the LMS algorithm. For both lattice structures,
the convergence rate does not depend on the eigenvalue spread of the correlation
matrix.
Computational Requirements. The RLS lattice algorithms described in the previous
subsection have a computational complexity that is proportional to M. In contrast,
the computational complexity of the RLS square-root algorithms is proportional to
M2. On the other hand, the direct-form fast algorithms, which are a derivative of the
lattice algorithm, have a complexity proportional to M, and they are a little more
efficient than the lattice-ladder algorithms.
In Figure 13.4.5 we illustrate the computational complexity (number of complex
multiplications and divisions) of the various adaptive filtering algorithms that we
have described. Clearly, the LMS algorithm requires the fewest computations. The
fast RLS algorithms in Tables 13.3 and 13.9 are the most efficient of the RLS algo-
rithms shown, closely followed by the gradient lattice algorithm, then the RLS lattice
algorithms and, finally, the square-root algorithms. Note that for small values of M,
there is little difference in complexity among the rapidly convergent algorithms.
Numerical Properties. In addition to providing fast convergence, the RLS and gra-
dient lattice algorithms are numerically robust. First, these lattice algorithms are
952 Chapter 13 Adaptive Filters
Channel-correlation matrix
Eigenvalue ratio = 21
g 0.0 ll-tap equalizer, noise variance = 0.001
OJ Gradient lattice algorithm
Least-squares lattice algorithm
~
§~ -1.0
-OJ 900
E
.e.
S -2.0
"o"'
OIl
.3 o 100 200 300 400 500 600 700 800
-3.0 Number of iterations
Figure 13.4.4 Learning curves for RLS lattice, gradient lattice,
and LMS algorithms for adaptive equalizer of length M = 11.
(From Digital Communications by John G. Proakis. ©1989 by
McGraw-Hill Book Company. Reprinted with permission of the
publisher.)
700
.'9'"" 600 Direct-form
RLS square-root
.;':": algorithm
'6 FastRLS
algorithm
"0
§ 500
''0""
.~
·c.c~.. 400
"3
E
><
cO.J.
E
300
0
<.)
".0"..'.
1l
ZE 200
100
~~~~~===:::L:============L:M=s~algOrithm
O 5 10 15 20 25 M
M - Length of filter
Figure 13.4.S Computational complexity of adaptive filter algo-
rithms.
13.4 Adaptive lattice-ladder Filters 953
=TABLE 13.10 Numerical Accuracy, in Terms of Output MSE for Channel with Amax/Amin 11
and w = 0.975, MSE x 10-3
Algorithm
Number of bits RLS Fast Conventional Error feedback
(including sign) square root RLS RLS lattice RLS lattice LMS
16 2.17 2.17 2.16 2.16 2.30
13 2.33 2.21 3.09 2.22 2.30
11 6.14 3.34 25.2 3.09 19.0
9 75.3 a 365 31.6 311
o Algorithm did not converge.
numerically stable, which means that the output estimation error from the computa-
tional procedure is bounded when a bounded error signal is introduced at the input.
Second, the numerical accuracy of the optimum solution is also relatively good when
compared to the LMS and the RLS direct-form FIR algorithms.
For purposes of comparison, we illustrate in Table 13.10 the steady-state average
squared error or (estimated) minimum MSE obtained through computer simulation
from the two RLS lattice algorithms and the direct-form FIR filter algorithms de-
scribed in Section 13.2. The striking result in Table 13.10 is the superior performance
obtained with the RLS lattice-ladder algorithm, in which the reflection coefficients
and the ladder gain are updated directly according to (13.4.96-13.4.98). This is the
error-feedback form of the RLS lattice algorithm. It is clear that the direct updat-
ing of these coefficients is significantly more robust to round-off errors than all the
other adaptive algorithms, including the LMS algorithm. It is also apparent that
the two-step process used in the conventional RLS lattice algorithm to estimate the
reflection coefficients is not as accurate. Furthermore, the estimation errors that are
generated in the coefficients at each stage propagate from stage to stage, causing
additional errors.
The effect of changing the weighting factor w is illustrated in the numerical
results given in Table 13.11. In this table, we give the minimum (estimated) MSE
TABLE 13.11 Numerical Accuracy, in Terms of Output MSE, of A Priori Least-Squares Lattice Al-
gorithm with Different Values of the Weighting Factor w, MSE, x 10-3
Algorithm
W = 0.99 W = 0.975 w = 0.95
Number of bits Error Error Error
with sign Conventional feedback Conventional feedback Conventional feedback
16 2.14 2.08 2.18 2.16 2.66 2.62
13 7.08 2.11 3.09 2.22 3.65 2.66
11 39.8 3.88 25.2 3.09 15.7 2.78
9 750 44.1 365 31.6 120 15.2
954 Chapter 13 Adaptive Filters
obtained with the conventional and error-feedback forms of the RLS lattice algo-
rithm. We observe that the output MSE decreases with an increase in the weigh~ing
factor when the precision is high (13 bits and 16 bits). This reflects the improvement
in performance obtained by increasing the observation interval. As the number of
bits of precision is decreased, we observe that the weighting factor should also be
decreased in order to maintain good performance. In effect, with low precision, the
effect of a longer averaging time results in a larger round-off noise. Of course, these
results were obtained with time-invariant signal statistics. If the signal statistics are
time-variant, the rate of the time variations will also influence the choice of w.
In the gradient lattice algorithm, the reflection coefficients and the ladder gains
are also updated directly. Consequently, the numerical accuracy of the gradient
lattice algorithm is comparable to that obtained with the direct update form of the
RLS lattice.
Analytical and simulation results on numerical stability and numerical accuracy
in fixed-point implementation of these algorithms can be found in Ling and Proakis
(1984), Ling, Manolakis, and Proakis (1985; 1986), Ljung and Ljung (1985), and
Gardner (1984).
Implementation Considerations. As we have observed, the lattice filter structure is
highly modular and allows for the computations to be pipelined. Because of the
high degree of modularity, the RLS and gradient lattice algorithms are particularly
suitable for implementation in VLSI. As a result of this advantage in implementa-
tion and the desirable properties of stability, excellent numerical accuracy, and fast
convergence, we anticipate that adaptive filters will be increasingly implemented as
lattice-ladder structures in the near future.
13.5 Summary and References
We have presented adaptive algorithms for direct-form FIR and lattice filter struc-
tures. The algorithms for the direct-form FIR filter consisted of the simple LMS
algorithm due to Widrow and Hoff (1960) and the direct-form, time-recursive least-
squares algorithms, including the conventional RLS form given by (13.3.23-13.3.27),
the square-root RLS forms described by Bierman (1977), Carlson and Culmone
(1979), and Hsu (1982), and the RLS fast Kalman algorithms, one form of which was
described by Falconer and Ljung (1978), and other forms later derived by Carayannis,
Manolakis, and Kalouptsidis (1983), Proakis (1989), and Cioffi and Kailath (1984).
Of these algorithms, the LMS algorithm is the simplest. It is used in many appli-
cations where its slow convergence is adequate. Of the direct-form RLS algorithms,
the square-root algorithms have been used in applications where fast convergence
is required. The algorithms have good numerical properties. The family of stabi-
lized fast RLS algorithms is very attractive from the viewpoint of computational
efficiency. Methods to avoid instability due to round-off errors have been proposed
by Hsu (1982), Cioffi and Kailath (1984), Lin (1984), Eleftheriou and Falconer (1987),
and Slock and Kailath (1988; 1991).
The adaptive lattice-ladder filter algorithm derived in this chapter is the optimum
RLS lattice-ladder algorithms (in both conventional and error-feedback form). Only
the a priori form of the lattice-ladder algorithm was derived, which is the form most
Chapter 13 Problems 955
often used in applications. In addition, there is an a posteriori form of the RLS
lattice-ladder algorithms (both conventional and error-feedback), as described by
Ling, Manolakis, and Proakis (1986). The error-feedback form of the RLS lattice-
ladder algorithm has excellent numerical properties, and is particularly suitable for
implementation in fixed-point arithmetic and in VLSI.
In the direct-form and lattice RLS algorithms, we used exponential weighting
into the past in order to reduce the effective memory in the adaptation process. As an
alternative to exponential weighting, we may employ finite-length uniform weighting
into the past. This approach leads to the class of finite-memory RLS direct-form and
lattice structures described in Cioffi and Kalaith (1985) and Manolakis, Ling, and
Proakis (1987).
In addition to the various algorithms that we have presented in this chapter, there
has been considerable research into efficient implementation of these algorithms
using systolic arrays and other parallel architectures. The reader is referred to Kung
(1982), and Kung, Whitehouse, and Kailath (1985).
Problems
13.1 Use the least-squares criterion to determine the equations for the parameters of the
FIR filter model in Figure 13.1.2, when the plant output is corrupted by additive
noise, w(n).
13.2 Determine the equations for the coefficients of an adaptive echo canceler based on
the least-squares criterion. Use the configuration in Figure 13.1.8 and assume the
presence of a near-end echo only.
13.3 If the sequences WI (n), wz(n), and w3(n) in the adaptive noise-canceling system
shown in Figure 13.1.14 are mutually uncorrelated, determine the expected value of
the estimated correlation sequences rvv(k) and ryv(k) contained in (13.1.26).
13.4 Prove the result in (13.4.34).
13.5 Derive the equation for the direct update of the ladder gain given by (13.4.98).
13.6 Derive the equation for the reflection coefficients in a gradient lattice-algorithm
given in (13.4.103).
13.7 Derive the FAEST algorithm given in Table 13.7 by using the alternative Kalman
gain vector
instead of the Kalman gain vector KM (n).
13.8 The tap-leaky LMS algorithm proposed by Gitlin, Meadors, and Weinstein (1982)
may be expressed as
where 0 < w < I, Il is the step size, and XM(n) is the data vector at time n. Deter-
mine the condition for the convergence ofthe mean value of hM(n).
956 Chapter 13 Adaptive Filters
13.9 The tap-leaky LMS algorithm in Problem 13.8 may be obtained by minimizing the
cost function
= +8(n) le(n)1 2 cllhM(n)112
where c is a constant and e(n) is the error between the desired filter output and the
actual filter output. Show that the minimization of 8(n) with respect to the filter
coefficient vector hM(n) leads to the following tap-leaky LMS algorithm.
+ = +hM(n 1) (1 - ~c)hM(n) ~e(n)X~(n)
13.10 For the normalized LMS algorithm given by (13.2.31), determine the range of values
13.11 of step size ~ to ensure the stability of the algorithm in the mean-square-error sense.
13.U By using the alternative Kalman gain vector given in Problem 13.8, modify the a
priori fast least-squares algorithms given in Tables 13.2 and 13.3, and thus reduce the
number of computations.
Consider the random process
+x(n) = gv(n) w(n), =n 0, 1, ... , M - 1
where v(n) is a known sequence, g is a random variable with E[g] = 0, and
E[g2] = G. The process w(n) is a white noise sequence with
yww(m) = a;'8(m)
Determine the coefficients of the linear estimator for g, that is,
= LM-I
g h(n)x(n)
n=O
that minimizes the mean-square error
8 = e[(g _ g)2]
13.13 Recall that an FIR filter can be realized in the frequency-sampling form with system
function
H(z)= 1 -M M-I H
-z ~"
M k=O 1- k
ej2n:k/Mz-l
= HI(Z)H2(Z)
where HI (z) is the comb filter and H2(Z) is the parallel bank of resonators.
(a) Suppose that this structure is implemented as an adaptive filter using the LMS
algorithm to adjust the filter (DFT) parameters Hk. Give the time-update equa-
tion for these parameters. Sketch the adaptive filter structure.
(b) Suppose that this structure is used as an adaptive channel equalizer, in which the
desired signal is
LM-I 2Jrk
W k =M-
d(n) = Ak COSWkn,
k=O
Chapter 13 Problems 957
13.14 With this form for the desired signal, what advantages are there in the LMS adaptive
algorithm for the DFT coefficients Hk over the direct-form structure with coefficients
h(n)? (Hint: Refer to Proakis (1970).)
Consider the performance index
J = h2 - 40h + 28
Suppose that we search for the minimum of J by using the steepest-descent
algorithm
+1
h(n 1) = h(n) - 2~g(n)
where g(n) is the gradient.
(a) Determine the range of values of ~ that provides an overdamped system for the
adjustment process.
(b) Plot the expression for J as a function of n for a value of ~ in this range.
13.15 Consider the noise-canceling adaptive filter shown in Figure 13.1.14. Assume that the
additive noise processes are white and mutually uncorrelated, with equal variances
a~. Suppose that the linear system has a known system function
1
H(z) = 1 - z1z-1
13.16 Determine the optimum weights of a three-tap noise canceler that minimizes the
MSE.
Determine the coefficients al and a2 for the linear predictor shown in Figure P13.16,
given that the autocorrelation yxx(n) of the input signal is
0<a<1
+ e(n)
x(n)---t----------_-(
Figure P13.16
958 Chapter 13 Adaptive Filters
13.17 Determine the lattice filter and its optimum reflection coefficients corresponding to
13.18 the linear predictor in Problem 13.16.
Consider the adaptive FIR filter shown in Figure P13.18. The system C(z) is cha~ac
terized by the system function
1
C(z) = 1 + 0.9z-l
Determine the optimum coefficients of the adaptive FIR filter B(z) = bo + bICl
that minimize the MSE. The additive noise is white with variance a~ = 0.1.
wen)
x ( n ) - -......-+l C(z) Adaptive >--+-~e(n)
FIR
filter
Figure P13.18
13.19 In the gradient lattice algorithm, the forward and backward prediction errors are
given by (13.4.102).
(a) Show that the minimization of the least-squares error
= Ln
+8!;,S w n - I [lfm(l)1 2 Igm{l)1 2]
1=0
with respect to the reflection coefficients {km(n)} results in the equation given
by (13.4.103).
(b) To determine an equation for the recursive computation of the reflection coeffi-
cients given by (13.4.103) define
um(n) = wum(n -1) +2fm-l(n)g~_l{n -1)
+ +vm{n) = wvm{n - 1) Ifm-l {n)1 2 Igm-l (n - 1)12
so that km(n) = um{n)/vm(n). Then, show that km{n) can be computed recur-
sively by the relation
km{n) = km(n _ 1) + fm(n)g~_l (n - +1) g!(n)fm-l (n)
wvm(n -1)
13.20 Consider the adaptive predictor shown in Figure P13.16.
(a) Determine the quadratic performance index and the optimum parameters for
the signal
=x{n) . "n14T + w{n)
sm
where w{n) is white noise with variance a~ = 0.1.
Chapter 13 Problems 959
13.21 (b) Generate a sequence of 1000 samples of x(n), and use the LMS algorithm to
adaptively obtain the predictor coefficients. Compare the experimental results
with the theoretical values obtained in part (a). Use a step size of Il :::: follmax.
(e) Repeat the experiment in part (b) for N = 10 trials with different noise se-
quences, and compute the average values of the predictor coefficients. Comment
on how these results compare with the theoretical values in part (a).
An autoregressive process is described by the difference equation
+x(n) = 1.26x(n - 1) - 0.81x(n - 2) w(n)
13.22 <a) Generate a sequence of N = 1000 samples of x(n), where w(n) is a white noise
sequence with variance a~ = 0.1. Use the LMS algorithm to determine the
parameters of a second-order (p = 2) linear predictor. Begin with al (0) =
a2(0) = o. Plot the coefficients al (n) and a2(n) as a function of the iteration
number.
(b) Repeat part (a) for 10 trials, using different noise sequences, and superimpose
the 10 plots of al (n) and a2 (n) .
(e) Plot the learning curve for the average (over the 10 trials) MSE for the data in
part (b).
A random process x(n) is given as
x(n) = s(n) + w(n) wo=rr:/4, ¢=O
= sin(won + ¢) + w(n),
where w(n) is an additive white noise sequence with variance a~ = 0.1.
(a) Generate N = 1000 samples of x(n) and simulate an adaptive line enhancer of
length L = 4. Use the LMS algorithm to adapt the ALE.
(b) Plot the output of the ALE.
(e) Compute the autocorrelation YxAm) ofthe sequence x(n).
(d) Determine the theoretical values of the ALE coefficients and compare them
with the experimental values.
(e) Compute and plot the frequency response of the linear predictor (ALE).
(I) Compute and plot the frequency response of the prediction-error filter.
(g) Compute and plot the experimental values of the autocorrelation ree (m) of the
output error sequence for 0 :::: m < 10.
(h) Repeat the experiment for 10 trials, using different noise sequences, and super-
impose the frequency response plots on the same graph.
(i) Comment on the result in parts (a) through (h).
CHAPTER
14
Power Spectrum
Estimation
In this chapter we are concerned with the estimation of the spectral characteristics
of signals characterized as random processes. Many of the phenomena that occur in
nature are best characterized statistically in terms of averages. For example, mete-
orological phenomena such as the fluctuations in air temperature and pressure are
best characterized statistically as random processes. Thermal noise voltages gener-
ated in resistors and electronic devices are additional examples of physical signals
that are well modeled as random processes.
Due to the random fluctuations in such signals, we must adopt a statistical view-
point, which deals with the average characteristics of random signals. In particular,
the autocorrelation function ofa random process is the appropriate statistical average
that we will use for characterizing random signals in the time domain, and the Fourier
transform of the autocorrelation function, which yields the power density spectrum,
provides the transformation from the time domain to the frequency domain.
Power spectrum estimation methods have a relatively long history. For a histori-
cal perspective, the reader is referred to the paper by Robinson (1982) and the book
by Marple (1987). Our treatment of this subject covers the classical power spectrum
estimation methods based on the periodogram, originally introduced by Schuster
(1898), and by Yule (1927), who originated the modern model-based or paramet-
ric methods. These methods were subsequently developed and applied by Walker
(1931), Bartlett (1948), Parzen (1957), Blackman and Tukey (1958), Burg (1967),
and others. We also describe the method of Capon (1969) and methods based on
eigenanalysis of the data correlation matrix.
960
14.1 Estimation of Spectra from Finite-Duration Observations of Signals 961
14.1 Estimation of Spectra from Finite-Duration Observations of Signals
The basic problem that we consider in this chapter is the estimation of the power
density spectrum of a signal from observation of the signal over a finite time interval.
As we will see, the finite record length of the data sequence is a major limitation
on the quality of the power spectrum estimate. When dealing with signals that are
statistically stationary, the longer the data record, the better the estimate that can be
extracted from the data. On the other hand, if the signal statistics are nonstationary,
we cannot select an arbitrarily long data record to estimate the spectrum. In such
a case, the length of the data record that we select is determined by the rapidity of
the time variations in the signal statistics. Ultimately, our goal is to select as short
a data record as possible that still allows us to resolve the spectral characteristics of
different signal components in the data record that have closely spaced spectra.
One of the problems that we encounter with classical power spectrum estimation
methods based on a finite-length data record is the distortion of the spectrum that
we are attempting to estimate. This problem occurs in both the computation of the
spectrum for a deterministic signal and the estimation of the power spectrum of a
random signal. Since it is easier to observe the effect of the finite length of the data
record on a deterministic signal, we treat this case first. Thereafter, we consider only
random signals and the estimation of their power spectra.
14.1.1 Computation of the Energy Density Spectrum
Let us consider the computation of the spectrum of a deterministic signal from a finite
sequence of data. The sequence x(n) is usually the result of sampling a continuous-
time signal Xa (t) at some uniform sampling rate Fs. Our objective is to obtain an
estimate of the true spectrum from a finite-duration sequence x(n).
Recall that if x(t) is a finite-energy signal, that is,
then its Fourier transform exists and is given as
From Parseval's theorem we have
(14.1.1)
The quantity IXa (F)1 2 represents the distribution of signal energy as a function
of frequency, and hence it is called the energy density spectrum of the signal, that is,
(14.1.2)
--_.-. .
962 Chapter 14 Power Spectrum Estimation
as described in Chapter 4. Thus the total energy in the signal is simply the integral
of Sxx(F) over all F [i.e., the total area under Sxx(F) l.
It is also interesting to note that Sxx(F) can be viewed as the Fourier tranSform
of another function, Rxx (r), called the autocorrelation function of the finite-energy
signal Xa (t), defined as
(14.1.3)
Indeed, it easily follows that
(14.1.4)
so that Rxx(r) and Sxx(F) are a Fourier transform pair.
Now suppose that we compute the energy density spectrum of the signal xa(t)
from its samples taken at the rate Fs samples per second. To ensure that there is
no spectral aliasing resulting from the sampling process, the signal is assumed to be
prefiltered, so that, for practical purposes, its bandwidth is limited to B hertz. Then
the sampling frequency Fs is selected such that Fs > 2B.
The sampled version of xa(t) is a sequence x(n), -00 < n < 00, which has a
Fourier transform (voltage spectrum)
X(w) = L00 x(n)e- jwn
n=-oo
or, equivalently,
X(f) = L00 x(n)e- j2rr!n (14.1.5)
n=-oo
Recall that X(f) can be expressed in terms of the voltage spectrum of the analog
signal Xa (t) as
X(F) = Fs L00 Xa(F - kFs) (14.1.6)
k=-oo
where f = F/ Fs is the normalized frequency variable.
In the absence of aliasing, within the fundamental range IFI ~ Fs/2, we have
(14.1.7)
Hence the voltage spectrum of the sampled signal is identical to the voltage spectrum
of the analog signal. As a consequence, the energy density spectrum of the sampled
signal is
(14.1.8)
14.1 Estimation of Spectra from Finite-Duration Observations of Signals 963
We can proceed further by noting that the autocorrelation of the sampled signal,
which is defined as
rxx(k) = L00 x*(n)x(n + k) (14.1.9)
n=-oo
has the Fourier transform (Wiener-Khintchine theorem)
Sxx(f) = L00 rxx(k)e-j27rkf (14.1.10)
k=-oo
Hence the energy density spectrum can be obtained by the Fourier transform of the
autocorrelation of the sequence {x (n)}.
The relations above lead us to distinguish between two distinct methods for
computing the energy density spectrum of a signal xa(t) from its samples x(n). One
is the direct method, which involves computing the Fourier transform of (x(n)} ,
and then
Sxx(f) = IX(f)12
~ I_~oox(n)'-'''''-I' (14.1.11)
The second approach is called the indirect method because it requires two steps. First,
the autocorrelation rxx(k) is computed from x(n) and then the Fourier transform of
the autocorrelation is computed as in (14.1.10) to obtain the energy density spectrum.
In practice, however, only the finite-duration sequence x(n), 0 ::;: n ::;: N - 1, is
available for computing the spectrum of the signal. In effect, limiting the duration
of the sequence x(n) to N points is equivalent to multiplying x(n) by a rectangular
window. Thus we have
x(n) = x(n)w(n) = { ~:n), O::;:n::;:N-1 (14.1.12)
otherwise
From our discussion of FIR filter design based on the use of windows to limit the
duration of the impulse response, we recall that multiplication of two sequences is
equivalent to convolution of their voltage spectra. Consequently, the frequency-
domain relation corresponding to (14.1.12) is
X(f) = X(f) * W(f) (14.1.13)
11/2
= X(a)W(f - a)da
-1/2
Recall from our discussion in Section 10.2.1 that convolution of the window
function W (f) with X(f) smooths the spectrum X(f), provided that the spectrum
W(f) is relatively narrow compared to X(f). But this condition implies that the
window w(n) be sufficiently long (Le., N must be sufficiently large) such that W(f)
964 Chapter 14 Power Spectrum Estimation
is narrow compared to X (f). Even if W (f) is narrow compared to X(f), the con-
volution of X(f) with the sidelobes of W(f) results in sidelobe energy in X(I>., in
frequency bands where the true signal spectrum X(f) = O. This sidelobe energy is
called leakage. The following example illustrates the leakage problem.
EXAMPLE 14.1.1
A signal with (voltage) spectrum
X(f)={1. IfI 50.1
otherwise
O.
is convolved with the rectangular window of length N = 61. Determine the spectrum of X(f)
given by (14.1.13).
Solution. The spectral characteristic W (f) for the length N = 61 rectangular window is
illustrated in Fig. 10.2.2. Note that the width of the main lobe of the window function is
t:l.w = 4Jl"/61 or t:l.f = 2/61, which is narrow compared to X(f).
The convolution of X(f) with W(f) is illustrated in Fig. 14.1.1. We note that energy has leaked into the frequency band 0.1 < |f| ≤ 0.5, where X(f) = 0. A part of this is due to the width of the main lobe in W(f), which causes a broadening or smearing of X̃(f) outside the range |f| ≤ 0.1. However, the sidelobe energy in X̃(f) is due to the presence of the sidelobes of W(f), which are convolved with X(f). The smearing of X̃(f) for |f| > 0.1 and the sidelobes in the range 0.1 ≤ |f| ≤ 0.5 constitute the leakage.
Figure 14.1.1 Spectrum obtained by convolving an N = 61 rectangular window with the ideal lowpass spectrum in Example 14.1.1. [Plot: relative energy density in dB (0 down to −100) versus frequency in cycles/sample (0 to 0.5).]
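The leakage mechanism is easy to reproduce numerically. The following Python sketch is an illustration of ours, not part of the original text; it assumes a zero-phase (centered) rectangular window and a discretized frequency grid, and approximates the convolution integral (14.1.13):

```python
import numpy as np

# Sketch of the leakage in Example 14.1.1; parameter choices are illustrative.
N = 61
f = np.linspace(-0.5, 0.5, 2049)            # frequency grid, cycles/sample
# Spectrum of a zero-phase rectangular window: the Dirichlet kernel
num, den = np.sin(np.pi * N * f), np.sin(np.pi * f)
safe = np.where(np.abs(den) < 1e-12, 1.0, den)
W = np.where(np.abs(den) < 1e-12, float(N), num / safe)
X = np.where(np.abs(f) <= 0.1, 1.0, 0.0)    # ideal lowpass spectrum X(f)
df = f[1] - f[0]
X_tilde = np.convolve(X, W, mode="same") * df   # approximates (14.1.13)
# The energy of X_tilde in 0.1 < |f| <= 0.5, where X(f) = 0, is the leakage.
```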
Just as in the case of FIR filter design, we can reduce sidelobe leakage by selecting
windows that have low sidelobes. This implies that the windows have a smooth time-
domain cutoff instead of the abrupt cutoff in the rectangular window. Although such
window functions reduce sidelobe leakage, they result in an increase in smoothing or
Figure 14.1.2 Spectrum obtained by convolving an N = 61 Blackman window with the ideal lowpass spectrum in Example 14.1.1. [Plot: relative energy density in dB (0 down to −100) versus frequency in cycles/sample (0 to 0.5).]
broadening of the spectral characteristic X̃(f). For example, the use of a Blackman window of length N = 61 in Example 14.1.1 results in the spectral characteristic X̃(f) shown in Fig. 14.1.2. The sidelobe leakage has certainly been reduced, but the
spectral width has been increased by about 50%.
The broadening of the spectrum being estimated due to windowing is particularly
a problem when we wish to resolve signals with closely spaced frequency components.
For example, the signal with spectral characteristic X(f) = X₁(f) + X₂(f), as shown in Fig. 14.1.3, cannot be resolved as two separate signals unless the width of the window function is significantly narrower than the frequency separation Δf. Thus
we observe that using smooth time-domain windows reduces leakage at the expense
of a decrease in frequency resolution.
It is clear from this discussion that the energy density spectrum of the windowed sequence {x̃(n)} is an approximation of the desired spectrum of the sequence {x(n)}.
Figure 14.1.3 Two narrowband signal spectra. [Plot: X(f) = X₁(f) + X₂(f) versus f, 0 to 0.5, with the two components separated by Δf.]
The energy density spectrum obtained from {x̃(n)} is

\tilde{S}_{xx}(f) = |\tilde{X}(f)|^2 = \left|\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi fn}\right|^2        (14.1.14)
The spectrum given by (14.1.14) can be computed numerically at a set of N frequency points by means of the DFT. Thus

\tilde{X}(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j2\pi kn/N}        (14.1.15)

Then

\tilde{S}_{xx}\!\left(\frac{k}{N}\right) = \tilde{S}_{xx}(f)\Big|_{f=k/N} = |\tilde{X}(k)|^2        (14.1.16)
and hence

\tilde{S}_{xx}\!\left(\frac{k}{N}\right) = \left|\sum_{n=0}^{N-1} \tilde{x}(n)\,e^{-j2\pi kn/N}\right|^2        (14.1.17)

which is a distorted version of the true spectrum S_xx(k/N).
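As a quick illustration of (14.1.15)-(14.1.17), the spectral samples can be computed with an FFT; the test sequence below is an arbitrary choice of ours, not prescribed by the text:

```python
import numpy as np

# DFT samples of the energy density spectrum at f = k/N, per (14.1.15)-(14.1.17).
N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 0.2 * n)   # an arbitrary finite-duration test sequence
Xk = np.fft.fft(x)                # N-point DFT, (14.1.15)
S = np.abs(Xk) ** 2               # energy density samples at f = k/N, (14.1.17)
```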
14.1.2 Estimation of the Autocorrelation and Power Spectrum
of Random Signals: The Periodogram
The finite-energy signals considered in the preceding section possess a Fourier trans-
form and are characterized in the spectral domain by their energy density spectrum.
On the other hand, the important class of signals characterized as stationary random
processes do not have finite energy and hence do not possess a Fourier transform.
Such signals have finite average power and hence are characterized by a power density
spectrum. If x(t) is a stationary random process, its autocorrelation function is

\gamma_{xx}(\tau) = E[x^*(t)\,x(t+\tau)]        (14.1.18)
where E[·] denotes the statistical average. Then, via the Wiener-Khintchine theorem, the power density spectrum of the stationary random process is the Fourier transform of the autocorrelation function, that is,

\Gamma_{xx}(F) = \int_{-\infty}^{\infty} \gamma_{xx}(\tau)\,e^{-j2\pi F\tau}\,d\tau        (14.1.19)
In practice, we deal with a single realization of the random process from which we
estimate the power spectrum of the process. We do not know the true autocorrelation
function γ_xx(τ) and, as a consequence, we cannot compute the Fourier transform in
(14.1.19) to obtain Γ_xx(F). On the other hand, from a single realization of the
random process we can compute the time-average autocorrelation function
R_{xx}(\tau) = \frac{1}{2T_0}\int_{-T_0}^{T_0} x^*(t)\,x(t+\tau)\,dt        (14.1.20)
where 2To is the observation interval. If the stationary random process is ergodic in
the first and second moments (mean and autocorrelation function), then
\gamma_{xx}(\tau) = \lim_{T_0\to\infty} R_{xx}(\tau) = \lim_{T_0\to\infty} \frac{1}{2T_0}\int_{-T_0}^{T_0} x^*(t)\,x(t+\tau)\,dt        (14.1.21)
This relation justifies the use of the time-average autocorrelation R_xx(τ) as an estimate of the statistical autocorrelation function γ_xx(τ). Furthermore, the Fourier transform of R_xx(τ) provides an estimate P_xx(F) of the power density spectrum, that is,
P_{xx}(F) = \int_{-T_0}^{T_0} R_{xx}(\tau)\,e^{-j2\pi F\tau}\,d\tau = \frac{1}{2T_0}\int_{-T_0}^{T_0}\left[\int_{-T_0}^{T_0} x^*(t)\,x(t+\tau)\,dt\right] e^{-j2\pi F\tau}\,d\tau = \frac{1}{2T_0}\left|\int_{-T_0}^{T_0} x(t)\,e^{-j2\pi Ft}\,dt\right|^2        (14.1.22)
The actual power density spectrum is the expected value of P_xx(F) in the limit as T_0 → ∞,

\Gamma_{xx}(F) = \lim_{T_0\to\infty} E[P_{xx}(F)] = \lim_{T_0\to\infty} E\left[\frac{1}{2T_0}\left|\int_{-T_0}^{T_0} x(t)\,e^{-j2\pi Ft}\,dt\right|^2\right]        (14.1.23)
From (14.1.20) and (14.1.22) we again note the two possible approaches to computing P_xx(F): the direct method as given by (14.1.22), or the indirect method, in which we obtain R_xx(τ) first and then compute the Fourier transform.
We shall consider the estimation of the power density spectrum from samples
of a single realization of the random process. In particular, we assume that xa(t)
is sampled at a rate Fs > 2B, where B is the highest frequency contained in the
power density spectrum of the random process. Thus we obtain a finite-duration
sequence x(n), 0 ≤ n ≤ N − 1, by sampling xa(t). From these samples we compute
the time-average autocorrelation sequence
r'_{xx}(m) = \frac{1}{N-m}\sum_{n=0}^{N-m-1} x^*(n)\,x(n+m), \qquad m = 0, 1, \ldots, N-1        (14.1.24)
and for negative values of m we have r'_xx(m) = [r'_xx(−m)]^*. Then, we compute the Fourier transform

P'_{xx}(f) = \sum_{m=-N+1}^{N-1} r'_{xx}(m)\,e^{-j2\pi fm}        (14.1.25)
The normalization factor N − m in (14.1.24) results in an estimate with mean value

E[r'_{xx}(m)] = \frac{1}{N-m}\sum_{n=0}^{N-m-1} E[x^*(n)\,x(n+m)] = \gamma_{xx}(m)        (14.1.26)
where γ_xx(m) is the true (statistical) autocorrelation sequence of x(n). Hence r'_xx(m) is an unbiased estimate of the autocorrelation function γ_xx(m). The variance of the estimate r'_xx(m) is approximately

\text{var}[r'_{xx}(m)] \approx \frac{N}{(N-m)^2}\sum_{n=-\infty}^{\infty}\left[|\gamma_{xx}(n)|^2 + \gamma_{xx}^*(n-m)\,\gamma_{xx}(n+m)\right]        (14.1.27)
which is a result given by Jenkins and Watts (1968). Clearly,

\lim_{N\to\infty} \text{var}[r'_{xx}(m)] = 0        (14.1.28)

provided that

\sum_{n=-\infty}^{\infty} |\gamma_{xx}(n)|^2 < \infty
Since E[r'_xx(m)] = γ_xx(m) and the variance of the estimate converges to zero as N → ∞, the estimate r'_xx(m) is said to be consistent.
For large values of the lag parameter m, the estimate r'_xx(m) given by (14.1.24) has a large variance, especially as m approaches N, because fewer data points enter into the estimate at large lags. As an alternative to (14.1.24) we
can use the estimate
r_{xx}(m) = \frac{1}{N}\sum_{n=0}^{N-m-1} x^*(n)\,x(n+m), \qquad 0 \le m \le N-1

r_{xx}(m) = \frac{1}{N}\sum_{n=|m|}^{N-1} x^*(n)\,x(n+m), \qquad m = -1, -2, \ldots, 1-N        (14.1.29)
which has a bias of |m|\,\gamma_{xx}(m)/N, since its mean value is

E[r_{xx}(m)] = \frac{1}{N}\sum_{n=0}^{N-|m|-1} E[x^*(n)\,x(n+m)] = \frac{N-|m|}{N}\,\gamma_{xx}(m) = \left(1 - \frac{|m|}{N}\right)\gamma_{xx}(m)        (14.1.30)
However, this estimate has a smaller variance, given approximately as

\text{var}[r_{xx}(m)] \approx \frac{1}{N}\sum_{n=-\infty}^{\infty}\left[|\gamma_{xx}(n)|^2 + \gamma_{xx}^*(n-m)\,\gamma_{xx}(n+m)\right]        (14.1.31)
We observe that r_xx(m) is asymptotically unbiased, that is,

\lim_{N\to\infty} E[r_{xx}(m)] = \gamma_{xx}(m)        (14.1.32)

and its variance converges to zero as N → ∞. Therefore, the estimate r_xx(m) is also a consistent estimate of γ_xx(m).
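The two lag estimates differ only in their normalization, which the following sketch makes explicit (the function name and usage are ours, not from the text):

```python
import numpy as np

def autocorr_estimates(x):
    """Return the unbiased estimate r'_xx(m) of (14.1.24) and the biased,
    lower-variance estimate r_xx(m) of (14.1.29), for lags m = 0, ..., N-1.
    Negative lags follow from conjugate symmetry."""
    x = np.asarray(x)
    N = len(x)
    # Raw lag sums: sum_{n=0}^{N-m-1} x*(n) x(n+m)
    raw = np.array([np.sum(np.conj(x[:N - m]) * x[m:]) for m in range(N)])
    r_unbiased = raw / (N - np.arange(N))   # divide by N - m, (14.1.24)
    r_biased = raw / N                      # divide by N, (14.1.29)
    return r_unbiased, r_biased
```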
We shall use the estimate rxx(m) given by (14.1.29) in our treatment of power
spectrum estimation. The corresponding estimate of the power density spectrum is
P_{xx}(f) = \sum_{m=-(N-1)}^{N-1} r_{xx}(m)\,e^{-j2\pi fm}        (14.1.33)
If we substitute for rxx(m) from (14.1.29) into (14.1.33), the estimate Pxx(f) can also
be expressed as
P_{xx}(f) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi fn}\right|^2 = \frac{1}{N}\,|X(f)|^2        (14.1.34)
where X(f) is the Fourier transform of the sample sequence x(n). This well-known
form of the power density spectrum estimate is called the periodogram. It was
originally introduced by Schuster (1898) to detect and measure "hidden periodicities"
in data.
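In code, the periodogram is a one-line operation on top of an FFT. A minimal sketch of ours follows; it assumes the FFT length is at least the data length:

```python
import numpy as np

def periodogram(x, L=None):
    """Periodogram (14.1.34) evaluated at f = k/L via an FFT of length
    L >= N; choosing L > N merely zero-pads, which interpolates the estimate."""
    x = np.asarray(x)
    N = len(x)
    L = N if L is None else L
    X = np.fft.fft(x, n=L)        # zero-pads x when L > N
    return np.abs(X) ** 2 / N     # normalize by the data length N, not L
```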
From (14.1.33), the average value of the periodogram estimate P_xx(f) is

E[P_{xx}(f)] = E\left[\sum_{m=-(N-1)}^{N-1} r_{xx}(m)\,e^{-j2\pi fm}\right] = \sum_{m=-(N-1)}^{N-1} E[r_{xx}(m)]\,e^{-j2\pi fm} = \sum_{m=-(N-1)}^{N-1}\left(1 - \frac{|m|}{N}\right)\gamma_{xx}(m)\,e^{-j2\pi fm}        (14.1.35)
The interpretation that we give to (14.1.35) is that the mean of the estimated spectrum is the Fourier transform of the windowed autocorrelation function

E[r_{xx}(m)] = w_B(m)\,\gamma_{xx}(m)        (14.1.36)

where the window function w_B(m) is the (triangular) Bartlett window. Hence the mean of the estimated spectrum is

E[P_{xx}(f)] = \sum_{m=-\infty}^{\infty} w_B(m)\,\gamma_{xx}(m)\,e^{-j2\pi fm} = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\,W_B(f-\alpha)\,d\alpha        (14.1.37)
where WB (f) is the spectral characteristic of the Bartlett window. The relation
(14.1.37) illustrates that the mean of the estimated spectrum is the convolution of
the true power density spectrum Γ_xx(f) with the Fourier transform W_B(f) of the
Bartlett window. Consequently, the mean of the estimated spectrum is a smoothed
version of the true spectrum and suffers from the same spectral leakage problems,
which are due to the finite number of data points.
We observe that the estimated spectrum is asymptotically unbiased, that is,
\lim_{N\to\infty} E\left[\sum_{m=-(N-1)}^{N-1} r_{xx}(m)\,e^{-j2\pi fm}\right] = \sum_{m=-\infty}^{\infty} \gamma_{xx}(m)\,e^{-j2\pi fm} = \Gamma_{xx}(f)
However, in general, the variance of the estimate P_xx(f) does not decay to zero as N → ∞. For example, when the data sequence is a Gaussian random process, the variance is easily shown to be (see Problem 14.4)

\text{var}[P_{xx}(f)] = \Gamma_{xx}^2(f)\left[1 + \left(\frac{\sin 2\pi fN}{N\sin 2\pi f}\right)^2\right]        (14.1.38)
which, in the limit as N → ∞, becomes
\lim_{N\to\infty} \text{var}[P_{xx}(f)] = \Gamma_{xx}^2(f)        (14.1.39)
Hence we conclude that the periodogram is not a consistent estimate of the true power density spectrum (i.e., it does not converge to the true power density spectrum).

In summary, the estimated autocorrelation r_xx(m) is a consistent estimate of the true autocorrelation function γ_xx(m). However, its Fourier transform P_xx(f), the periodogram, is not a consistent estimate of the true power density spectrum. We observed that P_xx(f) is an asymptotically unbiased estimate of Γ_xx(f), but for a finite-duration sequence, the mean value of P_xx(f) contains a bias, which from (14.1.37) is evident as a distortion of the true power density spectrum. Thus the
estimated spectrum suffers from the smoothing effects and the leakage embodied
in the Bartlett window. The smoothing and leakage ultimately limit our ability to
resolve closely spaced spectra.
The problems of leakage and frequency resolution that we have just described,
as well as the problem that the periodogram is not a consistent estimate of the power
spectrum, provide the motivation for the power spectrum estimation methods de-
scribed in Sections 14.2, 14.3, and 14.4. The methods described in Section 14.2 are
classical nonparametric methods, which make no assumptions about the data se-
quence. The emphasis of the classical methods is on obtaining a consistent estimate
of the power spectrum through some averaging or smoothing operations performed
directly on the periodogram or on the autocorrelation. As we will see, the effect of
these operations is to reduce the frequency resolution further, while the variance of
the estimate is decreased.
The spectrum estimation methods described in Section 14.3 are based on some
model of how the data were generated. In general, the model-based methods that
have been developed over the past few decades provide significantly higher resolution
than do the classical methods.
Additional methods are described in Sections 14.4 and 14.5. In Section 14.4, we
consider filter bank methods for the estimation of the power spectrum. The methods
described in Section 14.5 are based on an eigenvalue/eigenvector decomposition of
the data correlation matrix.
14.1.3 The Use of the DFT in Power Spectrum Estimation
As given by (14.1.14) and (14.1.34), the estimated energy density spectrum S̃_xx(f) and the periodogram P_xx(f), respectively, can be computed by use of the DFT, which in turn is efficiently computed by an FFT algorithm. If we have N data points, we compute as a minimum the N-point DFT. For example, the computation yields samples of the periodogram

P_{xx}\!\left(\frac{k}{N}\right) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi nk/N}\right|^2, \qquad k = 0, 1, \ldots, N-1        (14.1.40)

at the frequencies f_k = k/N.
In practice, however, such a sparse sampling of the spectrum does not provide a
very good representation or a good picture of the continuous spectrum estimate
Pxx(f). This is easily remedied by evaluating Pxx(f) at additional frequencies.
Equivalently, we can effectively increase the length of the sequence by means of
zero padding and then evaluate Pxx (f) at a more dense set of frequencies. Thus
if we increase the data sequence length to L points by means of zero padding and
evaluate the L-point DFT, we have

P_{xx}\!\left(\frac{k}{L}\right) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi nk/L}\right|^2, \qquad k = 0, 1, \ldots, L-1        (14.1.41)
We emphasize that zero padding and evaluating the DFT at L > N points does
not improve the frequency resolution in the spectral estimate. It simply provides
us with a method for interpolating the values of the measured spectrum at more
frequencies. The frequency resolution in the spectral estimate Pxx (f) is determined
by the length N of the data record.
EXAMPLE 14.1.2
A sequence of N = 16 samples is obtained by sampling an analog signal consisting of two
frequency components. The resulting discrete-time sequence is
x(n) = \sin 2\pi(0.135)n + \cos 2\pi(0.135 + \Delta f)n, \qquad n = 0, 1, \ldots, 15

where Δf is the frequency separation. Evaluate the power spectrum P_xx(f) = (1/N)|X(f)|² at the frequencies f_k = k/L, k = 0, 1, ..., L − 1, for L = 8, 16, 32, and 128, and for the values Δf = 0.06 and Δf = 0.01.
Solution. By zero padding, we increase the length of the data sequence and thereby obtain the power spectrum estimate P_xx(k/L). The results for Δf = 0.06 are plotted in Fig. 14.1.4.

Figure 14.1.4 Spectra of two sinusoids with frequency separation Δf = 0.06, for L = 8, 16, 32, and 128.

Note that zero padding
does not change the resolution, but it does have the effect of interpolating the spectrum P_xx(f). In this case the frequency separation Δf is sufficiently large that the two frequency components are resolvable.

The spectral estimates for Δf = 0.01 are shown in Fig. 14.1.5. In this case the two spectral components are not resolvable. Again, the effect of zero padding is to provide more interpolation, thus giving us a better picture of the estimated spectrum. It does not improve the frequency resolution.

Figure 14.1.5 Spectra of two sinusoids with frequency separation Δf = 0.01, for L = 8, 16, 32, and 128.
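This example can be reproduced with a few lines of code. The sketch below is ours; it evaluates P_xx(k/L) by a direct DTFT sum, which matches (14.1.41) for every L, including L < N:

```python
import numpy as np

# Sketch reproducing Example 14.1.2.
N = 16
n = np.arange(N)
for delta_f in (0.06, 0.01):
    x = np.sin(2 * np.pi * 0.135 * n) + np.cos(2 * np.pi * (0.135 + delta_f) * n)
    for L in (8, 16, 32, 128):
        k = np.arange(L)
        E = np.exp(-2j * np.pi * np.outer(k, n) / L)  # e^{-j 2 pi n k / L}
        P = np.abs(E @ x) ** 2 / N                    # P_xx(k/L), (14.1.41)
        # For delta_f = 0.06 the two peaks separate once L is large enough
        # to interpolate them; for delta_f = 0.01 they never separate,
        # since resolution is set by N, not by L.
```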
When only a few points of the periodogram are needed, the Goertzel algorithm
described in Chapter 8 may provide a more efficient computation. Since the Goertzel
algorithm has been interpreted as a linear filtering approach to computing the DFT, it
is clear that the periodogram estimate can be obtained by passing the signal through
a bank of parallel tuned filters and squaring their outputs (see Problem 14.5).
14.2 Nonparametric Methods for Power Spectrum Estimation
The power spectrum estimation methods described in this section are the classi-
cal methods developed by Bartlett (1948), Blackman and Tukey (1958), and Welch
(1967). These methods make no assumption about how the data were generated and
hence are called nonparametric.
Since the estimates are based entirely on a finite record of data, the frequency
resolution of these methods is, at best, equal to the spectral width of the rectangular
window of length N, which is approximately 1/N at the -3-dB points. We shall be
more precise in specifying the frequency resolution of the specific methods. All the
estimation techniques described in this section decrease the frequency resolution in
order to reduce the variance in the spectral estimate.
First, we describe the estimates and derive the mean and variance of each. A
comparison of the three methods is given in Section 14.2.4. Although the spectral
estimates are expressed as a function of the continuous frequency variable f, in
practice, the estimates are computed at discrete frequencies via the FFT algorithm. The FFT-based computational requirements are considered in Section 14.2.5.
14.2.1 The Bartlett Method: Averaging Periodograms
Bartlett's method for reducing the variance in the periodogram involves three steps.
First, the N-point sequence is subdivided into K nonoverlapping segments, where
each segment has length M. This results in the K data segments
x_i(n) = x(n + iM), \qquad i = 0, 1, \ldots, K-1, \quad n = 0, 1, \ldots, M-1        (14.2.1)
For each segment, we compute the periodogram

P_{xx}^{(i)}(f) = \frac{1}{M}\left|\sum_{n=0}^{M-1} x_i(n)\,e^{-j2\pi fn}\right|^2, \qquad i = 0, 1, \ldots, K-1        (14.2.2)
Finally, we average the periodograms for the K segments to obtain the Bartlett power
spectrum estimate [Bartlett (1948)]
P_{xx}^{B}(f) = \frac{1}{K}\sum_{i=0}^{K-1} P_{xx}^{(i)}(f)        (14.2.3)
The statistical properties of this estimate are easily obtained. First, the mean value is

E[P_{xx}^{B}(f)] = \frac{1}{K}\sum_{i=0}^{K-1} E[P_{xx}^{(i)}(f)] = E[P_{xx}^{(i)}(f)]        (14.2.4)
From (14.1.35) and (14.1.37) we have the expected value for the single periodogram as

E[P_{xx}^{(i)}(f)] = \sum_{m=-(M-1)}^{M-1}\left(1 - \frac{|m|}{M}\right)\gamma_{xx}(m)\,e^{-j2\pi fm} = \frac{1}{M}\int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\left(\frac{\sin\pi(f-\alpha)M}{\sin\pi(f-\alpha)}\right)^2 d\alpha        (14.2.5)
where

W_B(f) = \frac{1}{M}\left(\frac{\sin\pi fM}{\sin\pi f}\right)^2        (14.2.6)

is the frequency characteristic of the Bartlett window

w_B(m) = \begin{cases} 1 - \dfrac{|m|}{M}, & |m| \le M-1 \\ 0, & \text{otherwise} \end{cases}        (14.2.7)
From (14.2.5) we observe that the true spectrum is now convolved with the
frequency characteristic WB (f) of the Bartlett window. The effect of reducing the
length of the data from N points to M = N / K results in a window whose spectral
width has been increased by a factor of K. Consequently, the frequency resolution
has been reduced by a factor K.
In return for this reduction in resolution, we have reduced the variance. The
variance of the Bartlett estimate is

\text{var}[P_{xx}^{B}(f)] = \frac{1}{K^2}\sum_{i=0}^{K-1}\text{var}[P_{xx}^{(i)}(f)] = \frac{1}{K}\,\text{var}[P_{xx}^{(i)}(f)]        (14.2.8)
If we make use of (14.1.38) in (14.2.8), we obtain

\text{var}[P_{xx}^{B}(f)] = \frac{1}{K}\,\Gamma_{xx}^2(f)\left[1 + \left(\frac{\sin 2\pi fM}{M\sin 2\pi f}\right)^2\right]        (14.2.9)
Therefore, the variance of the Bartlett power spectrum estimate has been reduced
by the factor K.
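A minimal sketch of the Bartlett procedure follows (the function name and the truncation of leftover samples are our choices):

```python
import numpy as np

def bartlett_psd(x, K):
    """Bartlett estimate (14.2.1)-(14.2.3): split x into K nonoverlapping
    segments of length M = N // K, average their periodograms, and return
    the estimate at the frequencies f = k/M, k = 0, ..., M-1."""
    x = np.asarray(x)
    M = len(x) // K
    segments = x[:K * M].reshape(K, M)                 # x_i(n) = x(n + iM)
    P = np.abs(np.fft.fft(segments, axis=1)) ** 2 / M  # per-segment (14.2.2)
    return P.mean(axis=0)                              # average, (14.2.3)
```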
14.2.2 The Welch Method: Averaging Modified Periodograms
Welch (1967) made two basic modifications to the Bartlett method. First, he allowed
the data segments to overlap. Thus the data segments can be represented as
x_i(n) = x(n + iD), \qquad n = 0, 1, \ldots, M-1, \quad i = 0, 1, \ldots, L-1        (14.2.10)
where iD is the starting point for the ith sequence. Observe that if D = M, the
segments do not overlap and the number L of data segments is identical to the
number K in the Bartlett method. However, if D = M/2, there is 50% overlap
between successive data segments and L = 2K segments are obtained. Alternatively,
we can form K data segments each of length 2M.
The second modification made by Welch to the Bartlett method is to window
the data segments prior to computing the periodogram. The result is a "modified"
periodogram

\tilde{P}_{xx}^{(i)}(f) = \frac{1}{MU}\left|\sum_{n=0}^{M-1} x_i(n)\,w(n)\,e^{-j2\pi fn}\right|^2, \qquad i = 0, 1, \ldots, L-1        (14.2.11)

where U is a normalization factor for the power in the window function and is selected as

U = \frac{1}{M}\sum_{n=0}^{M-1} w^2(n)        (14.2.12)
The Welch power spectrum estimate is the average of these modified periodograms, that is,

P_{xx}^{W}(f) = \frac{1}{L}\sum_{i=0}^{L-1} \tilde{P}_{xx}^{(i)}(f)        (14.2.13)
The mean value of the Welch estimate is

E[P_{xx}^{W}(f)] = \frac{1}{L}\sum_{i=0}^{L-1} E[\tilde{P}_{xx}^{(i)}(f)] = E[\tilde{P}_{xx}^{(i)}(f)]        (14.2.14)
But the expected value of the modified periodogram is

E[\tilde{P}_{xx}^{(i)}(f)] = \frac{1}{MU}\sum_{n=0}^{M-1}\sum_{m=0}^{M-1} w(n)\,w(m)\,E[x_i(n)\,x_i^*(m)]\,e^{-j2\pi f(n-m)} = \frac{1}{MU}\sum_{n=0}^{M-1}\sum_{m=0}^{M-1} w(n)\,w(m)\,\gamma_{xx}(n-m)\,e^{-j2\pi f(n-m)}        (14.2.15)

Since

\gamma_{xx}(n) = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\,e^{j2\pi\alpha n}\,d\alpha        (14.2.16)
substitution for γ_xx(n) from (14.2.16) into (14.2.15) yields

E[\tilde{P}_{xx}^{(i)}(f)] = \frac{1}{MU}\int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\left[\sum_{n=0}^{M-1}\sum_{m=0}^{M-1} w(n)\,w(m)\,e^{-j2\pi(n-m)(f-\alpha)}\right] d\alpha = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\,W(f-\alpha)\,d\alpha        (14.2.17)
where, by definition,

W(f) = \frac{1}{MU}\left|\sum_{n=0}^{M-1} w(n)\,e^{-j2\pi fn}\right|^2        (14.2.18)

The normalization factor U ensures that

\int_{-1/2}^{1/2} W(f)\,df = 1        (14.2.19)
The variance of the Welch estimate is
\text{var}[P_{xx}^{W}(f)] = \frac{1}{L^2}\sum_{i=0}^{L-1}\sum_{j=0}^{L-1} E[\tilde{P}_{xx}^{(i)}(f)\,\tilde{P}_{xx}^{(j)}(f)] - \left\{E[P_{xx}^{W}(f)]\right\}^2        (14.2.20)
In the case of no overlap between successive data segments (L = K), Welch has
shown that
\text{var}[P_{xx}^{W}(f)] = \frac{1}{L}\,\text{var}[\tilde{P}_{xx}^{(i)}(f)] \approx \frac{1}{L}\,\Gamma_{xx}^2(f)        (14.2.21)
In the case of 50% overlap between successive data segments (L = 2K), the
variance of the Welch power spectrum estimate with the Bartlett (triangular) window, also derived in the paper by Welch, is

\text{var}[P_{xx}^{W}(f)] \approx \frac{9}{8L}\,\Gamma_{xx}^2(f)        (14.2.22)
Although we considered only the triangular window in the computation of the
variance, other window functions may be used. In general, they will yield a different
variance. In addition, one may vary the data segment overlapping by either more or
less than the 50% considered in this section in an attempt to improve the relevant
characteristics of the estimate.
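A compact sketch of the Welch procedure follows (ours; the Hann window is an arbitrary default, and SciPy's scipy.signal.welch offers a production implementation of the same idea):

```python
import numpy as np

def welch_psd(x, M, D, window=None):
    """Welch estimate (14.2.10)-(14.2.13): segments x_i(n) = x(n + iD)
    of length M, each windowed before its periodogram and normalized
    by the window power U of (14.2.12)."""
    x = np.asarray(x)
    w = np.hanning(M) if window is None else np.asarray(window)
    U = np.sum(w ** 2) / M                    # power normalization (14.2.12)
    starts = range(0, len(x) - M + 1, D)      # D = M: no overlap; D = M // 2: 50%
    P = [np.abs(np.fft.fft(x[s:s + M] * w)) ** 2 / (M * U)  # (14.2.11)
         for s in starts]
    return np.mean(P, axis=0)                 # average, (14.2.13)
```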
14.2.3 The Blackman and Tukey Method: Smoothing the Periodogram
Blackman and Tukey (1958) proposed and analyzed the method in which the sample
autocorrelation sequence is windowed first and then Fourier transformed to yield
the estimate of the power spectrum. The rationale for windowing the estimated
autocorrelation sequence rxx (m) is that, for large lags, the estimates are less reliable
because a smaller number (N - m) of data points enter into the estimate. For values
of m approaching N, the variance of these estimates is very high, and hence these
estimates should be given a smaller weight in the formation of the estimated power
spectrum. Thus the Blackman-Tukey estimate is
P_{xx}^{BT}(f) = \sum_{m=-(M-1)}^{M-1} r_{xx}(m)\,w(m)\,e^{-j2\pi fm}        (14.2.23)

where the window function w(m) has length 2M − 1 and is zero for |m| ≥ M. With this definition for w(m), the limits on the sum in (14.2.23) can be extended to (−∞, ∞). Hence the frequency-domain equivalent expression for (14.2.23) is the
convolution integral
P_{xx}^{BT}(f) = \int_{-1/2}^{1/2} P_{xx}(\alpha)\,W(f-\alpha)\,d\alpha        (14.2.24)
where P_xx(f) is the periodogram. It is clear from (14.2.24) that the effect of windowing the autocorrelation is to smooth the periodogram estimate, thus decreasing
the variance in the estimate at the expense of reducing the resolution.
The window sequence w(m) should be symmetric (even) about m = 0 to ensure that the estimate of the power spectrum is real. Furthermore, it is desirable to select the window spectrum to be nonnegative, that is,

W(f) \ge 0, \qquad |f| \le 1/2        (14.2.25)

This condition ensures that P_xx^BT(f) ≥ 0 for |f| ≤ 1/2, which is a desirable property
for any power spectrum estimate. We should indicate, however, that some of the
window functions we have introduced do not satisfy this condition. For example,
in spite of their low sidelobe levels, the Hamming and Hann (or Hanning) windows
do not satisfy the property in (14.2.25) and, consequently, may result in negative
spectrum estimates in some parts of the frequency range.
The expected value of the Blackman-Tukey power spectrum estimate is

E[P_{xx}^{BT}(f)] = \int_{-1/2}^{1/2} E[P_{xx}(\alpha)]\,W(f-\alpha)\,d\alpha        (14.2.26)

where from (14.1.37) we have

E[P_{xx}(\alpha)] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\,W_B(\alpha-\theta)\,d\theta        (14.2.27)
and W_B(f) is the Fourier transform of the Bartlett window. Substitution of (14.2.27) into (14.2.26) yields the double convolution integral

E[P_{xx}^{BT}(f)] = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\,W_B(\alpha-\theta)\,W(f-\alpha)\,d\theta\,d\alpha        (14.2.28)
Equivalently, by working in the time domain, the expected value of the Blackman-Tukey power spectrum estimate is

E[P_{xx}^{BT}(f)] = \sum_{m=-(M-1)}^{M-1} E[r_{xx}(m)]\,w(m)\,e^{-j2\pi fm} = \sum_{m=-(M-1)}^{M-1} \gamma_{xx}(m)\,w_B(m)\,w(m)\,e^{-j2\pi fm}        (14.2.29)
where the Bartlett window is
w_B(m) = \begin{cases} 1 - \dfrac{|m|}{N}, & |m| < N \\ 0, & \text{otherwise} \end{cases}        (14.2.30)
Clearly, we should select the window length for w(m) such that M ≪ N; that is, w(m) should be narrower than w_B(m) to provide additional smoothing of the periodogram. Under this condition, (14.2.28) becomes

E[P_{xx}^{BT}(f)] \approx \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\,W(f-\theta)\,d\theta        (14.2.31)
since

\int_{-1/2}^{1/2} W_B(\alpha-\theta)\,W(f-\alpha)\,d\alpha = \int_{-1/2}^{1/2} W_B(\alpha)\,W(f-\theta-\alpha)\,d\alpha \approx W(f-\theta)        (14.2.32)
The variance of the Blackman-Tukey power spectrum estimate is

\text{var}[P_{xx}^{BT}(f)] = E\{[P_{xx}^{BT}(f)]^2\} - \{E[P_{xx}^{BT}(f)]\}^2        (14.2.33)

where the mean can be approximated as in (14.2.31). The second moment in (14.2.33) is

E\{[P_{xx}^{BT}(f)]^2\} = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} E[P_{xx}(\alpha)\,P_{xx}(\theta)]\,W(f-\alpha)\,W(f-\theta)\,d\alpha\,d\theta        (14.2.34)
On the assumption that the random process is Gaussian (see Problem 14.5), we
find that
E[P_{xx}(\alpha)\,P_{xx}(\theta)] = \Gamma_{xx}(\alpha)\,\Gamma_{xx}(\theta)\left\{1 + \left[\frac{\sin\pi(\theta+\alpha)N}{N\sin\pi(\theta+\alpha)}\right]^2 + \left[\frac{\sin\pi(\theta-\alpha)N}{N\sin\pi(\theta-\alpha)}\right]^2\right\}        (14.2.35)
Substitution of (14.2.35) into (14.2.34) yields
E\{[P_{xx}^{BT}(f)]^2\} = \left[\int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\,W(f-\theta)\,d\theta\right]^2 + \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\,\Gamma_{xx}(\theta)\,W(f-\alpha)\,W(f-\theta)\left\{\left[\frac{\sin\pi(\theta+\alpha)N}{N\sin\pi(\theta+\alpha)}\right]^2 + \left[\frac{\sin\pi(\theta-\alpha)N}{N\sin\pi(\theta-\alpha)}\right]^2\right\} d\alpha\,d\theta        (14.2.36)
The first term in (14.2.36) is simply the square of the mean of P_xx^BT(f), which is to be subtracted out according to (14.2.33). This leaves the second term in (14.2.36), which constitutes the variance. For the case in which N ≫ M, the functions sin π(θ+α)N/[N sin π(θ+α)] and sin π(θ−α)N/[N sin π(θ−α)] are relatively narrow compared to W(f) in the vicinity of θ = −α and θ = α, respectively. Therefore,

\int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\,W(f-\theta)\left\{\left[\frac{\sin\pi(\theta+\alpha)N}{N\sin\pi(\theta+\alpha)}\right]^2 + \left[\frac{\sin\pi(\theta-\alpha)N}{N\sin\pi(\theta-\alpha)}\right]^2\right\} d\theta \approx \frac{\Gamma_{xx}(-\alpha)\,W(f+\alpha) + \Gamma_{xx}(\alpha)\,W(f-\alpha)}{N}        (14.2.37)
With this approximation, the variance of P_xx^BT(f) becomes

\text{var}[P_{xx}^{BT}(f)] \approx \Gamma_{xx}^2(f)\left[\frac{1}{N}\int_{-1/2}^{1/2} W^2(\theta)\,d\theta\right] = \Gamma_{xx}^2(f)\left[\frac{1}{N}\sum_{m=-(M-1)}^{M-1} w^2(m)\right]        (14.2.38)

where in the last step we made the approximation

\int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\,\Gamma_{xx}(-\alpha)\,W(f-\alpha)\,W(f+\alpha)\,d\alpha \approx 0        (14.2.39)