A Note on the Poor–Verdú Upper Bound for the Channel Reliability Function

Fady Alajaji, Senior Member, IEEE, Po-Ning Chen, Senior Member, IEEE, and Ziad Rached, Student Member, IEEE

Abstract—In an earlier work, Poor and Verdú established an upper bound for the reliability function of arbitrary single-user discrete-time channels with memory. They also conjectured that their bound is tight for all coding rates. In this note, we demonstrate via a counterexample involving memoryless binary erasure channels (BECs) that the Poor–Verdú upper bound is not tight at low rates. We conclude by examining possible improvements to this bound.

Index Terms—Arbitrary channels with memory, binary erasure channels (BECs), channel coding, channel reliability function, information spectrum, probability of error.

Manuscript received August 30, 2000; revised May 28, 2001. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Council of Taiwan, R.O.C., under Grant NSC 89-2213-E-009-237. The material in this correspondence was presented in part at the IEEE International Symposium on Information Theory, Washington, DC, June 2001.
F. Alajaji and Z. Rached are with the Department of Mathematics and Statistics, Queen's University, Kingston, ON K7L 3N6, Canada (e-mail: [email protected]; [email protected]).
P.-N. Chen is with the Department of Communication Engineering, National Chiao-Tung University, Hsin Chu, Taiwan, R.O.C. (e-mail: [email protected]).
Communicated by P. Narayan, Associate Editor for Shannon Theory.
Publisher Item Identifier S 0018-9448(02)00036-6.

I. INTRODUCTION
Consider an arbitrary input $\mathbf{X}$ defined by a sequence of finite-dimensional distributions [9]
$$\mathbf{X} \triangleq \left\{X^n = \left(X_1^{(n)}, \ldots, X_n^{(n)}\right)\right\}_{n=1}^{\infty}.$$
Denote by
$$\mathbf{Y} \triangleq \left\{Y^n = \left(Y_1^{(n)}, \ldots, Y_n^{(n)}\right)\right\}_{n=1}^{\infty}$$
the corresponding output induced by $\mathbf{X}$ via the channel
$$\mathbf{W} \triangleq \left\{W^n = P_{Y^n|X^n}: \mathcal{X}^n \to \mathcal{Y}^n\right\}_{n=1}^{\infty}$$
which is an arbitrary sequence of $n$-dimensional conditional distributions from $\mathcal{X}^n$ to $\mathcal{Y}^n$, where $\mathcal{X}$ and $\mathcal{Y}$ are the input and output alphabets, respectively. We assume throughout that $\mathcal{X}$ is finite and that $\mathcal{Y}$ is arbitrary.

In [8], Poor and Verdú established an upper bound for the reliability function $E^*(R)$ of $\mathbf{W}$. They then conjectured that this bound is tight for all code rates. However, no known proof could substantiate this conjecture. In this work, we demonstrate via a counterexample that their original upper bound formula is not necessarily tight at low rates. A possible improvement to this bound is then addressed.

Previous related work mainly involved the establishment of upper and lower bounds for $E^*(R)$. In [1], [4], [6], and [10] (cf. also the references therein), the authors examined $E^*(R)$ for discrete memoryless channels (DMCs). More specifically, they presented three upper bounds (sphere packing, space partitioning, and straight line) and two lower bounds (random coding and expurgated) for $E^*(R)$ and employed them to show that $E^*(R)$ is convex and can be exactly determined via a simple expression at high rates (for $R$ beyond some critical rate). The determination of $E^*(R)$ at low rates, where it is conjectured to be convex, is still an unsolved problem, even for the simple memoryless binary symmetric channel (BSC). In [5], Egarmin extended the expressions of the random coding lower bound and the space partitioning upper bound for $E^*(R)$ to discrete finite-alphabet channels with modulo-additive irreducible Markov noise. He also proved that the two bounds coincide asymptotically (in the block length $n$) at high rates. In [7], Han derived an information-spectrum-based lower bound for $E^*(R)$ for arbitrary (not necessarily stationary, ergodic, etc.) channels with memory. In addition to the general upper bound provided by Poor and Verdú, Chen et al. [3] derived another information-spectrum upper bound for $E^*(R)$ for arbitrary channels as a consequence of their result providing a general expression for the asymptotic largest minimum distance of block codes.
II. PRELIMINARIES

Definition 1 (Channel Block Code): An $(n, M)$ code for channel $\mathbf{W}$ with input alphabet $\mathcal{X}^n$ and output alphabet $\mathcal{Y}^n$ is a pair of mappings
$$f: \{1, 2, \ldots, M\} \to \mathcal{X}^n \quad \text{and} \quad g: \mathcal{Y}^n \to \{1, 2, \ldots, M\}.$$
Its average error probability is given by
$$P_e(n, M) \triangleq \frac{1}{M} \sum_{m=1}^{M}\ \sum_{y^n:\, g(y^n) \neq m} W^n\!\left(y^n \mid f(m)\right).$$

Definition 2 (Channel Reliability Function [8]): For any $R > 0$, define the channel reliability function $E^*(R)$ for a channel $\mathbf{W}$ as the largest scalar $\beta > 0$ such that there exists a sequence of $(n, M_n)$ codes with
$$\beta \le \liminf_{n \to \infty} -\frac{1}{n} \log_2 P_e(n, M_n)$$
and
$$R < \liminf_{n \to \infty} \frac{1}{n} \log_2 M_n. \quad (1)$$

With this definition, we next derive a slightly different but equivalent expression of the Poor–Verdú upper bound.

Definition 3: Fix $R > 0$. For an input $\mathbf{X}$ and a channel $\mathbf{W}$, the large-deviation spectrum of the channel is defined as
$$\bar{e}_{\mathbf{X}}(R) \triangleq \liminf_{n \to \infty} -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right]$$
where
$$i_{X^n W^n}(x^n; y^n) = \log_2 \frac{W^n(y^n \mid x^n)}{P_{Y^n}(y^n)}$$
is the channel information density.

Theorem 1 (Poor–Verdú Upper Bound to $E^*(R)$ [8, eq. (14)]): The channel reliability function satisfies
$$E^*(R) \le \liminf_{n \to \infty}\, \sup_{\mathbf{X}}\, -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right] \quad (2)$$
$$= \sup_{\mathbf{X}}\, \bar{e}_{\mathbf{X}}(R) \quad (3)$$
for $R > 0$.
Proof: Inequality (2) is actually given by [8, eq. (14)]; so we only need to prove equality (3). For any $\mathbf{X}$
$$-\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right] \le \sup_{\mathbf{X}}\, -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right]$$
which implies that for any $\mathbf{X}$
$$\bar{e}_{\mathbf{X}}(R) \le \liminf_{n \to \infty}\, \sup_{\mathbf{X}}\, -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right].$$
Accordingly, we have
$$\sup_{\mathbf{X}}\, \bar{e}_{\mathbf{X}}(R) \le \liminf_{n \to \infty}\, \sup_{\mathbf{X}}\, -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right].$$
On the other hand, the finite alphabet assumption ensures the existence of $\hat{X}^n$ such that
$$-\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{\hat{X}^n W^n}(\hat{X}^n; \hat{Y}^n) \le R\right] = \sup_{\mathbf{X}}\, -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right]$$
where $\hat{Y}^n$ is the channel output due to channel input $\hat{X}^n$. Let $\hat{\mathbf{X}}$ be the triangular-array process having $\hat{X}^n$ as its $n$-dimensional marginal (for each $n$). Then
$$\sup_{\mathbf{X}}\, \bar{e}_{\mathbf{X}}(R) \ge \bar{e}_{\hat{\mathbf{X}}}(R) = \liminf_{n \to \infty}\, \sup_{\mathbf{X}}\, -\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right].$$

In the previous theorem, the range of the supremum operation includes all possible inputs. However, it is straightforward from the proof of the Poor–Verdú upper bound (e.g., [8, eq. (14)]) that one can place both a uniformity restriction and the asymptotic condition of (1) on the input to yield a (possibly) better bound. This is illustrated in the next corollary.

Corollary 1: The channel reliability function satisfies
$$E^*(R) \le E_{\mathrm{PV}}(R) \triangleq \sup_{\mathbf{X} \in \mathcal{Q}(R)} \bar{e}_{\mathbf{X}}(R)$$
for any $R > 0$, where
$$\mathcal{Q}(R) \triangleq \left\{\mathbf{X}: \text{each } X^n \text{ in } \mathbf{X} \text{ is uniformly distributed over its support } \mathcal{S}(X^n), \text{ and } R < \liminf_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{S}(X^n)|\right\}.$$

Remark: An observation that upholds the result of the above corollary is that for a channel $\mathbf{W}$ and any input $\mathbf{X}$ uniformly distributed over its support and satisfying
$$\liminf_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{S}(X^n)| < R \quad (4)$$
the channel large-deviation spectrum satisfies $\bar{e}_{\mathbf{X}}(R) = 0$. This is justified as follows. Let $M_n \triangleq |\mathcal{S}(X^n)|$. We then observe that
$$i_{X^n W^n}(x^n; y^n) = \log_2 \frac{P_{X^n, Y^n}(x^n, y^n)}{P_{X^n}(x^n)\, P_{Y^n}(y^n)} = \log_2 \frac{P_{X^n, Y^n}(x^n, y^n)}{\frac{1}{M_n} \sum_{\tilde{x}^n \in \mathcal{S}(X^n)} P_{X^n, Y^n}(\tilde{x}^n, y^n)} = \log_2 M_n + \log_2 \frac{P_{X^n, Y^n}(x^n, y^n)}{\sum_{\tilde{x}^n \in \mathcal{S}(X^n)} P_{X^n, Y^n}(\tilde{x}^n, y^n)} \le \log_2 M_n.$$
Hence, by (4)
$$\Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le R\right] \ge \Pr\left[\frac{1}{n}\, i_{X^n W^n}(X^n; Y^n) \le \frac{1}{n} \log_2 M_n\right] = 1$$
for $n$ infinitely often, which immediately gives $\bar{e}_{\mathbf{X}}(R) = 0$. Consequently, when maximizing $\bar{e}_{\mathbf{X}}(R)$ over all $\mathbf{X}$ that are uniformly distributed over their support, one only needs to consider those $\mathbf{X}$ violating (4); this justifies the upper bound formula in Corollary 1.
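To see the remark in action, consider (as an assumed toy example, not the construction used later) a BEC input $X^n$ uniform over the two-word support $\{0\cdots0,\, 1\cdots1\}$, so that $\frac{1}{n}\log_2|\mathcal{S}(X^n)| \to 0 < R$ and (4) holds. Since $i_{X^n W^n} \le \log_2 M_n = 1$ bit, the event in Definition 3 has probability one for every $n \ge 1/R$, forcing $\bar{e}_{\mathbf{X}}(R) = 0$. A minimal Monte-Carlo sketch in Python:

```python
# Toy check (assumed parameters): with X^n uniform over {00...0, 11...1}
# on a BEC, the normalized information density never exceeds 1/n, so the
# event {(1/n) i <= R} has probability one once n >= 1/R.
import random

def normalized_density(n: int, eps: float, rng: random.Random) -> float:
    # By symmetry the density does not depend on which word was sent:
    # it is 1 bit if any position survives the channel (the output then
    # identifies the word), and 0 bits if every position is erased.
    erased = [rng.random() < eps for _ in range(n)]
    return (0.0 if all(erased) else 1.0) / n

rng = random.Random(0)
n, eps, R = 50, 0.3, 0.1
hits = sum(normalized_density(n, eps, rng) <= R for _ in range(10_000))
print(hits / 10_000)  # prints 1.0, so -(1/n) log2 Pr[...] = 0 for this n
```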
III. LOOSENESS OF $E_{\mathrm{PV}}(R)$ AT LOW RATES

In this section, we provide a counterexample in terms of a binary erasure channel (BEC) with crossover probability $\varepsilon$ ($0 < \varepsilon < 1$), which proves the looseness of $E_{\mathrm{PV}}(R)$ at low rates.

Denote by $\tilde{\mathbf{X}}$ the input to the BEC, where $\tilde{X}^n$ is uniformly distributed over $\{0, 1\}^n$. Then for any $s > 0$
$$\Pr\left[\frac{1}{n}\, i_{\tilde{X}^n W^n}(\tilde{X}^n; \tilde{Y}^n) \le R\right] = \Pr\left[2^{-s\, i_{\tilde{X}^n W^n}(\tilde{X}^n;\, \tilde{Y}^n)} \ge 2^{-nsR}\right]$$
$$\le 2^{nsR}\, E\!\left[2^{-s\, i_{\tilde{X}^n W^n}(\tilde{X}^n;\, \tilde{Y}^n)}\right] \quad \text{(Markov's inequality)}$$
$$= 2^{nsR}\, \left(E\!\left[2^{-s\, i_{\tilde{X}W}(\tilde{X}_1;\, \tilde{Y}_1)}\right]\right)^n$$
where $\tilde{Y}^n$ is the channel output due to the input $\tilde{X}^n$, and the last equality holds since the channel is memoryless and the input is i.i.d. This implies
$$E_{\mathrm{PV}}(R) \triangleq \sup_{\mathbf{X} \in \mathcal{Q}(R)} \bar{e}_{\mathbf{X}}(R) \ge \bar{e}_{\tilde{\mathbf{X}}}(R) \ge \sup_{s>0}\left[-sR - \log_2 E\!\left(2^{-s\, i_{\tilde{X}W}(\tilde{X}_1;\, \tilde{Y}_1)}\right)\right] = \sup_{s>0}\left[-sR - \log_2\!\left(\varepsilon + (1-\varepsilon)\, 2^{-s}\right)\right]$$
$$= \begin{cases} R \log_2 \dfrac{R}{1-\varepsilon} + (1-R) \log_2 \dfrac{1-R}{\varepsilon}, & \text{for } 0 < R < 1 - \varepsilon \\[2pt] 0, & \text{for } R \ge 1 - \varepsilon. \end{cases} \quad (5)$$

Observe that (5) is exactly the space partitioning upper bound $E_{\mathrm{par}}(R)$ for the BEC, which is given by [1]
$$E_{\mathrm{par}}(R) \triangleq \sup_{s>0}\, \sup_{P_X}\left[-sR - \log_2 \sum_{y \in \mathcal{Y}} \left(\sum_{x \in \mathcal{X}} P_X(x)\, P_{Y|X}^{1/(1+s)}(y|x)\right)^{1+s}\right] = \sup_{s>0}\left[-sR - \log_2\!\left(\varepsilon + 2^{-s}(1-\varepsilon)\right)\right].$$

Hence, the conjectured tightness of $E_{\mathrm{PV}}(R)$ can be disproved via the looseness of the space partitioning upper bound. We then conclude from [1, Theorem 10.7.3] that
$$E_{\mathrm{PV}}(R) > E^*(R) \qquad \text{for } 0 < R < 1 - \sqrt{\varepsilon}$$
(see the Appendix).
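The exponent in (5) can be checked numerically. Under the uniform input, each unerased BEC output symbol contributes exactly one bit of information density, so the probability in Definition 3 is an exact binomial tail; the following Python sketch (parameter values are illustrative assumptions) evaluates its normalized exponent and compares it with the closed form in (5):

```python
# For the uniform-input BEC, (1/n) i(X^n; Y^n) equals the fraction of
# unerased positions, a Binomial(n, 1-eps)/n variable.  Compute the
# exact tail in the log domain and compare with eq. (5) as n grows.
from math import lgamma, log, log2

def log2_binom_pmf(n: int, k: int, p: float) -> float:
    """log2 of C(n, k) p^k (1-p)^(n-k), computed stably via lgamma."""
    lc = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return (lc + k * log(p) + (n - k) * log(1 - p)) / log(2)

def spectrum_term(eps: float, R: float, n: int) -> float:
    """-(1/n) log2 Pr[(1/n) i(X^n; Y^n) <= R] for the BEC."""
    terms = [log2_binom_pmf(n, k, 1 - eps) for k in range(int(R * n) + 1)]
    m = max(terms)  # log-sum-exp to avoid underflow at large n
    return -(m + log2(sum(2 ** (t - m) for t in terms))) / n

eps, R = 0.3, 0.2
for n in (100, 1000, 10000):
    print(n, spectrum_term(eps, R, n))
print(R * log2(R / (1 - eps)) + (1 - R) * log2((1 - R) / eps))  # eq. (5)
```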
Remarks:

• It can be shown by Cramér's theorem [2] that for DMCs
$$\bar{e}_{\tilde{\mathbf{X}}}(R) = \sup_{s>0}\left[-sR - \log_2 E\!\left(2^{-s\, i_{\tilde{X}W}(\tilde{X}_1;\, \tilde{Y}_1)}\right)\right] \quad (6)$$
for a channel input $\tilde{\mathbf{X}}$ uniformly distributed over its entire space. The memoryless BEC, however, is indeed a peculiar channel for which the space partitioning upper bound $E_{\mathrm{par}}(R)$ is actually equal$^1$ to (6). For example, in the case of the memoryless BSC, (6) is numerically observed to be strictly less than $E_{\mathrm{par}}(R)$ (and the straight-line bound); hence, the above simple technique used to disprove the tightness of $E_{\mathrm{PV}}(R)$ at low rates certainly does not apply to the BSC. Furthermore, since (6) is equal to $E_{\mathrm{par}}(R)$ for the BEC, we also cannot use this technique to disprove the tightness of $E_{\mathrm{PV}}(R)$ at high rates [since $E_{\mathrm{par}}(R)$ is tight at high rates].

$^1$This property actually holds for all memoryless $q$-ary ($q \ge 2$) erasure channels with input alphabet $\{0, 1, \ldots, q-1\}$, output alphabet $\{0, 1, \ldots, q-1, e\}$, and crossover probability $\varepsilon$. So the Poor–Verdú bound is also loose at low rates for this entire family of channels.

• One may suspect that the looseness of $E_{\mathrm{PV}}(R)$ at low rates is due to the fact that in its formula, the range of the supremum operation includes all the inputs in $\mathcal{Q}(R)$, which may not be necessary. From the proofs of the Poor–Verdú upper bound in [8] and Theorem 1, we can further restrict the condition on the input to yield that for any $\delta > 0$
$$E^*(R) \le E^{(\delta)}_{\mathrm{PV}}(R) \triangleq \sup_{\mathbf{X} \in \mathcal{P}(R, \delta)} \bar{e}_{\mathbf{X}}(R)$$
where
$$\mathcal{P}(R, \delta) \triangleq \left\{\mathbf{X}: X^n \text{ is uniform over its support } \mathcal{S}(X^n), \text{ and } R < \liminf_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{S}(X^n)| < R + \delta\right\}.$$
Clearly, $E^{(\delta)}_{\mathrm{PV}}(R)$ is no greater than $E_{\mathrm{PV}}(R)$ since the former involves an additional restriction on the choice of $\mathbf{X}$. Also, note that the uniform input over $\{0, 1\}^n$, which is used to disprove the tightness of $E_{\mathrm{PV}}(R)$, does not belong to $\mathcal{P}(R, \delta)$ for $0 < R < 1 - \delta$; hence, a possible improvement over $E_{\mathrm{PV}}(R)$ may be rendered from $E^{(\delta)}_{\mathrm{PV}}(R)$. We show next, however, via another counterexample, that $E^{(\delta)}_{\mathrm{PV}}(R)$ is still not tight at rates close to zero.

Claim: Consider a BEC with crossover probability $\varepsilon$, and fix $\delta > 0$. Then for $0 < R < 1/k$
$$E^{(\delta)}_{\mathrm{PV}}(R) \ge \sup_{s>0}\left[-sR - \frac{1}{k} \log_2\!\left(\varepsilon^k + 2^{-s}\left(1 - \varepsilon^k\right)\right)\right]$$
where $k \triangleq \lceil 1/\delta \rceil$.

Proof: Let $\hat{\mathbf{X}}$ be block-wise independent with block size $k$; i.e.,
$$P_{\hat{X}^n}(x_1^n) = \left(\prod_{i=1}^{\omega} P_{\hat{X}^k}\!\left(x_{(i-1)k+1}^{ik}\right)\right) \times P_{\hat{X}^j}\!\left(x_{\omega k+1}^n\right)$$
where $\omega \triangleq \lfloor n/k \rfloor$, $j \triangleq n - \omega k$, $P_{\hat{X}^j}$ is the $j$-dimensional marginal of $P_{\hat{X}^k}$, and $P_{\hat{X}^k}$ is equally distributed over a set consisting of the all-zero sequence $\mathbf{0}$ and the all-one sequence $\mathbf{1}$ of dimension $k$ ($P_{\hat{X}^k}(\mathbf{0}) = P_{\hat{X}^k}(\mathbf{1}) = 1/2$). Then
$$\liminf_{n \to \infty} \frac{1}{n} \log_2 |\mathcal{S}(\hat{X}^n)| = \liminf_{n \to \infty} \frac{1}{n} \log_2 2^{\lceil n/k \rceil} = \frac{1}{k}$$
which implies that $\hat{\mathbf{X}} \in \mathcal{P}(R, \delta)$ for $0 < R < 1/k$, and
$$E^{(\delta)}_{\mathrm{PV}}(R) \triangleq \sup_{\mathbf{X} \in \mathcal{P}(R, \delta)} \bar{e}_{\mathbf{X}}(R) \ge \bar{e}_{\hat{\mathbf{X}}}(R).$$
Observe that under $\hat{\mathbf{X}}$, the BEC (when the last, possibly incomplete, block is excluded) is transformed into a DMC with transition probability described by
$$P_{\hat{Y}^k|\hat{X}^k}\!\left(y^k \mid \mathbf{0}\right) = (1-\varepsilon)^{v_0(y^k)}\, \varepsilon^{k - v_0(y^k)} \cdot \mathbf{1}\{v_1(y^k) = 0\} \quad (7)$$
and
$$P_{\hat{Y}^k|\hat{X}^k}\!\left(y^k \mid \mathbf{1}\right) = (1-\varepsilon)^{v_1(y^k)}\, \varepsilon^{k - v_1(y^k)} \cdot \mathbf{1}\{v_0(y^k) = 0\} \quad (8)$$
where $v_0(y^k)$ and $v_1(y^k)$, respectively, represent the number of $0$'s and $1$'s in $y^k \in \{0, e, 1\}^k$, and $\mathbf{1}\{\cdot\}$ is the set indicator function. Since $P_{\hat{Y}^k|\hat{X}^k}$ only depends on $v_0$ and $v_1$, we can rewrite (7) and (8) as
$$P_{V|\hat{X}^k}(v_0, v_1 \mid \mathbf{0}) = \binom{k}{v_0} \theta^{v_0} \varepsilon^k \cdot \mathbf{1}\{v_1 = 0\}$$
and
$$P_{V|\hat{X}^k}(v_0, v_1 \mid \mathbf{1}) = \binom{k}{v_1} \theta^{v_1} \varepsilon^k \cdot \mathbf{1}\{v_0 = 0\}$$
where $\theta \triangleq (1-\varepsilon)/\varepsilon$. Therefore,
$$\Pr\left[\frac{1}{n}\, i_{\hat{X}^n W^n}(\hat{X}^n; \hat{Y}^n) \le R\right] = \Pr\left[\frac{1}{n}\left(\sum_{i=1}^{\omega} i_{\hat{X}W}\!\left(\hat{X}_i^k; \hat{Y}_i^k\right) + i_{\hat{X}W}\!\left(\hat{X}_{\omega k+1}^n; \hat{Y}_{\omega k+1}^n\right)\right) \le R\right]$$
$$\le \Pr\left[\frac{1}{(\omega+1)k}\left(\sum_{i=1}^{\omega} i_{\hat{X}W}\!\left(\hat{X}_i^k; \hat{Y}_i^k\right) + i_{\hat{X}W}\!\left(\hat{X}_{\omega k+1}^n; \hat{Y}_{\omega k+1}^n\right)\right) \le R\right] \quad (9)$$
$$\le \Pr\left[\frac{1}{(\omega+1)k} \sum_{i=1}^{\omega} i_{\hat{X}W}\!\left(\hat{X}_i^k; \hat{Y}_i^k\right) \le R\right] \quad (10)$$
where (9) holds since $1/n \ge 1/[(\omega+1)k]$ and the information density sum is nonnegative with probability one, and (10) follows because
$$i_{\hat{X}W}\!\left(\hat{X}_{\omega k+1}^n; \hat{Y}_{\omega k+1}^n\right) \ge 0 \quad \text{with probability one.}$$
Accordingly,
$$-\frac{1}{n} \log_2 \Pr\left[\frac{1}{n}\, i_{\hat{X}^n W^n}(\hat{X}^n; \hat{Y}^n) \le R\right] \ge -\frac{1}{k(\omega+1)} \log_2 \Pr\left[\frac{1}{\omega+1} \sum_{i=1}^{\omega} \frac{1}{k}\, i_{\hat{X}W}\!\left(\hat{X}_i^k; \hat{Y}_i^k\right) \le R\right].$$
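Before finishing the proof, one can sanity-check by brute-force enumeration that the per-block moment $E\!\left[2^{-s\, i_{\hat{X}W}(\hat{X}^k;\, \hat{Y}^k)}\right]$ collapses to $\varepsilon^k + 2^{-s}(1 - \varepsilon^k)$, the quantity appearing in the Claim: the information density of a block is $0$ bits when the block is completely erased and $1$ bit otherwise. A sketch with assumed parameter values:

```python
# Brute-force check that E[2^{-s i(X^k;Y^k)}] = eps^k + 2^{-s}(1 - eps^k)
# for the two-codeword (all-zero / all-one) block input over the BEC.
from itertools import product
from math import log2

def block_moment(eps: float, k: int, s: float) -> float:
    total = 0.0
    for x in "01":                               # equiprobable codewords
        for y in product("0e1", repeat=k):       # every BEC output block
            if any(sym not in ("e", x) for sym in y):
                continue                         # inconsistent: P(y|x) = 0
            p_y_x = eps ** y.count("e") * (1 - eps) ** y.count(x)
            all_erased = all(sym == "e" for sym in y)
            # P(y) = P(y|x) if y is all erasures (both codewords possible),
            # otherwise only one codeword is consistent and P(y) = P(y|x)/2.
            p_y = p_y_x if all_erased else 0.5 * p_y_x
            i_density = log2(p_y_x / p_y)        # 0 bits or 1 bit
            total += 0.5 * p_y_x * 2 ** (-s * i_density)
    return total

eps, k, s = 0.4, 3, 1.7
print(block_moment(eps, k, s), eps ** k + 2 ** (-s) * (1 - eps ** k))
```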
The proof is completed by noting that $\{i_{\hat{X}W}(\hat{X}_i^k; \hat{Y}_i^k)\}_{i=1}^{\omega}$ is independent and identically distributed (i.i.d.), and hence we can apply Cramér's theorem [2] to obtain
$$\bar{e}_{\hat{\mathbf{X}}}(R) \ge \sup_{s>0}\left[-sR - \frac{1}{k} \log_2 \sum_{y^k \in \mathcal{Y}^k}\, \sum_{x^k \in \mathcal{X}^k} P_{\hat{X}^k}(x^k)\, P^{1-s}_{\hat{Y}^k|\hat{X}^k}(y^k \mid x^k)\, P^{s}_{\hat{Y}^k}(y^k)\right]$$
$$= \sup_{s>0}\left[-sR - \frac{1}{k} \log_2 \sum_{y^k \in \mathcal{Y}^k} \frac{1}{2}\left(P^{1-s}_{\hat{Y}^k|\hat{X}^k}(y^k \mid \mathbf{0}) + P^{1-s}_{\hat{Y}^k|\hat{X}^k}(y^k \mid \mathbf{1})\right) P^{s}_{\hat{Y}^k}(y^k)\right]$$
$$= \sup_{s>0}\left[-sR - \frac{1}{k} \log_2\!\left(\varepsilon^k + \frac{1}{2^{1+s}} \sum_{v_0=1}^{k} \binom{k}{v_0} \theta^{v_0} \varepsilon^k + \frac{1}{2^{1+s}} \sum_{v_1=1}^{k} \binom{k}{v_1} \theta^{v_1} \varepsilon^k\right)\right]$$
$$= \sup_{s>0}\left[-sR - \frac{1}{k} \log_2\!\left(\varepsilon^k + 2^{-s}\left(1 - \varepsilon^k\right)\right)\right]$$
where the last equality holds since $\sum_{v=1}^{k} \binom{k}{v} \theta^{v} \varepsilon^k = \varepsilon^k\left[(1+\theta)^k - 1\right] = 1 - \varepsilon^k$.

Based on the above claim, we can take $R \downarrow 0$ to obtain
$$\lim_{R \downarrow 0} E^{(\delta)}_{\mathrm{PV}}(R) \ge \lim_{R \downarrow 0}\, \sup_{s>0}\left[-sR - \frac{1}{k} \log_2\!\left(\varepsilon^k + 2^{-s}\left[1 - \varepsilon^k\right]\right)\right] = -\frac{1}{k} \log_2 \varepsilon^k = -\log_2 \varepsilon$$
which is strictly greater than $\lim_{R \downarrow 0} E^*(R) = -\log_2(\varepsilon)/2$ (see the Appendix). Consequently, $E^{(\delta)}_{\mathrm{PV}}(R)$ is not tight at rates close to zero.
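Numerically, the gap established by the claim is visible already for moderate parameter values (an illustrative sketch; the chosen $\varepsilon$ and $\delta$ are assumptions):

```python
# Evaluate the Claim's lower bound on E_PV^(delta)(R) for rates near 0
# and compare with lim_{R->0} E*(R) = -log2(eps)/2 from the Appendix.
from math import ceil, log2

def claim_bound(eps: float, delta: float, R: float,
                s_max: float = 200.0, steps: int = 100_000) -> float:
    k = ceil(1 / delta)
    assert 0 < R < 1 / k, "the Claim only covers 0 < R < 1/k"
    return max(-s * R - log2(eps ** k + 2 ** (-s) * (1 - eps ** k)) / k
               for s in (i * s_max / steps for i in range(1, steps + 1)))

eps, delta = 0.25, 0.5                    # assumed values; k = 2
for R in (0.1, 0.01, 0.001):
    print(R, claim_bound(eps, delta, R))
print(-log2(eps), -log2(eps) / 2)         # limit -log2(eps) = 2 > 1 = E*(0+)
```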
• The previous remarks, together with the remark following Corollary 1, indicate that when bounding the reliability function of a channel by its large-deviation spectrum, one should always consider inputs whose normalized support size ultimately achieves the considered code rate. Any small deviation of the asymptotic normalized support size from the code rate could lead to a loose upper bound (at low rates). As a consequence, the best upper bound that can be readily obtained from the proofs of the Poor–Verdú upper bound in [8] and Theorem 1 is exactly
$$E^*(R) \le \inf_{\delta > 0} E^{(\delta)}_{\mathrm{PV}}(R). \quad (11)$$

Further investigation of the tightness of (11) at low rates for the BEC is an interesting future work.

APPENDIX

Lemma 1: For a BEC with crossover probability $\varepsilon$
$$E_{\mathrm{par}}(R) > E_{\mathrm{sl}}(R)$$
for $0 < R < 1 - \sqrt{\varepsilon}$, where $E_{\mathrm{sl}}(R)$ represents a straight-line upper bound for the channel reliability function.

Proof: First recall that for a BEC, the low-rate reliability function can be written as
$$E_L(R) = \sup_{s \ge 0}\, \max_{P_X}\left[-sR - s \log_2 \sum_{i,\, j \in \{0, 1\}} P_X(i)\, P_X(j) \left(\sum_{k \in \{0, e, 1\}} \sqrt{W(k|i)\, W(k|j)}\right)^{1/s}\right] = \sup_{s \ge 0}\left[-sR - s \log_2 \frac{1 + \varepsilon^{1/s}}{2}\right].$$
In the limit as $R \to 0$, the sphere-packing upper bound $E_U(R)$ coincides with $E_L(R)$ [1, p. 410]. Hence,
$$E_U(0) = E_L(0) = \sup_{s \ge 0}\, -s \log_2 \frac{1 + \varepsilon^{1/s}}{2}.$$
It is easy to check that $-s \log_2([1 + \varepsilon^{1/s}]/2)$ is increasing in $s$. Therefore,
$$E_U(0) = \lim_{s \to \infty} -s \log_2 \frac{1 + \varepsilon^{1/s}}{2} = \lim_{s \to \infty} -\frac{\log_2\!\left(\left[1 + \varepsilon^{1/s}\right]/2\right)}{1/s} = -\log_2 \sqrt{\varepsilon}$$
where the last equality follows by l'Hôpital's rule [11].

Now [1, Theorem 10.7.3] indicates that any line segment between a point on the sphere-packing upper bound and a point on the space partitioning upper bound is an upper bound for the channel reliability. Construct the straight-line upper bound by taking the point $(0, -\log_2 \sqrt{\varepsilon})$ from the sphere-packing upper bound and the point $(1 - \sqrt{\varepsilon},\, E_{\mathrm{par}}(1 - \sqrt{\varepsilon}))$ from the space partitioning upper bound $E_{\mathrm{par}}(R)$. This straight-line upper bound is of the form
$$E_{\mathrm{sl}}(R) = R\, \frac{E_{\mathrm{par}}(R_0) + \log_2 \sqrt{\varepsilon}}{R_0} - \log_2 \sqrt{\varepsilon}$$
for $0 < R < R_0$, where $R_0$ satisfies $E_{\mathrm{par}}(R_0) = E_{\mathrm{sl}}(R_0)$, which gives $R_0 = 1 - \sqrt{\varepsilon}$. Substituting $R_0 = 1 - \sqrt{\varepsilon}$ into the above equation yields
$$E_{\mathrm{sl}}(R) = R \log_2 \frac{\sqrt{\varepsilon}}{1 + \sqrt{\varepsilon}} - \log_2 \sqrt{\varepsilon}, \qquad \text{for } 0 < R < 1 - \sqrt{\varepsilon}.$$
The proof is then completed by noting that $E_{\mathrm{par}}(R)$ is strictly convex in its domain and that $E_{\mathrm{sl}}(R)$ is tangent to it at $R_0$; hence $E_{\mathrm{par}}(R) > E_{\mathrm{sl}}(R)$ for $0 < R < 1 - \sqrt{\varepsilon}$.
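The strict gap of Lemma 1, and the fact that the straight line meets $E_{\mathrm{par}}(R)$ exactly at $R_0 = 1 - \sqrt{\varepsilon}$, are easy to confirm numerically (a sketch under an assumed $\varepsilon$):

```python
# Verify E_par(R) > E_sl(R) on (0, 1 - sqrt(eps)) and that the two
# curves coincide at R0 = 1 - sqrt(eps).
from math import log2, sqrt

def E_par(eps: float, R: float) -> float:
    return R * log2(R / (1 - eps)) + (1 - R) * log2((1 - R) / eps)

def E_sl(eps: float, R: float) -> float:
    r = sqrt(eps)
    return R * log2(r / (1 + r)) - log2(r)

eps = 0.36
R0 = 1 - sqrt(eps)                        # = 0.4 for this eps
for R in (0.05, 0.15, 0.25, 0.35, R0):
    print(round(R, 2), round(E_par(eps, R), 6), round(E_sl(eps, R), 6))
# At R0 both values coincide; below R0, E_par strictly exceeds E_sl.
```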
REFERENCES

[1] R. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1988.
[2] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation, and Estimation. New York: Wiley, 1990.
[3] P.-N. Chen, T.-Y. Lee, and Y. Han, "Distance-spectrum formulas on the largest minimum distance of block codes," IEEE Trans. Inform. Theory, vol. 46, pp. 869–885, May 2000.
[4] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic, 1981.
[5] M. Egarmin, "Upper and lower bounds for the probability of error in coding for discrete channels," Probl. Pered. Inform., vol. 5, no. 1, pp. 23–39, 1969.
[6] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[7] T. S. Han, Information-Spectrum Methods in Information Theory (in Japanese). Tokyo, Japan: Baifukan, 1998.
[8] H. V. Poor and S. Verdú, "A lower bound on the probability of error in multi-hypothesis testing," IEEE Trans. Inform. Theory, vol. 41, pp. 1992–1994, Nov. 1995.
[9] S. Verdú and T. S. Han, "A general formula for channel capacity," IEEE Trans. Inform. Theory, vol. 40, pp. 1147–1157, July 1994.
[10] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.
[11] W. R. Wade, An Introduction to Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1995.
Strong Law of Large Numbers and Shannon–McMillan Theorem for Markov Chain Fields on Trees

Weiguo Yang and Wen Liu

Abstract—We study the strong law of large numbers and the Shannon–McMillan theorem for Markov chain fields on trees. First, we prove the strong law of large numbers for the frequencies of occurrence of states and ordered couples of states for Markov chain fields on trees. Then, we prove the Shannon–McMillan theorem with almost everywhere (a.e.) convergence for Markov chain fields on trees. We prove the results on a Bethe tree and then just state the analogous results on a rooted Cayley tree. In the proof, a new technique for establishing strong limit theorems in probability theory is applied.

Index Terms—Bethe tree, Markov chain fields, random fields, rooted Cayley tree, Shannon–McMillan theorem, strong law of large numbers.

Manuscript received November 1, 1999; revised April 21, 2001.
W. Yang is with the Jiangsu University of Science and Technology, Zhenjiang 212013, China (e-mail: [email protected]).
W. Liu is with the Hebei University of Technology, Tianjin 300130, China.
Communicated by I. Csiszár, Associate Editor for Shannon Theory.
Publisher Item Identifier S 0018-9448(02)00060-3.

I. INTRODUCTION

A tree is a graph $G = \{T, E\}$ which is connected and contains no circuits. Given any two vertices $\sigma \neq t \in T$, let $\overline{\sigma t}$ be the unique path connecting $\sigma$ and $t$. Define the graph distance $d(\sigma, t)$ to be the number of edges contained in the path $\overline{\sigma t}$.

We discuss mainly a Bethe tree $T_{B, N}$ on which each vertex has $N + 1$ neighboring vertices. For simplicity, we investigate only $T_{B, 2}$ (see Fig. 1) in this correspondence.

[Fig. 1. Bethe tree $T_{B, 2}$.]
[Fig. 2. Cayley tree $T_{C, 2}$ (i.e., binary tree).]

To index the vertices on $T_{B, 2}$, we first fix any one vertex as the "root" and label it by $0$. A vertex is said to be on the $n$th level if the path linking it to the root has $n$ edges. We denote by $L_m$ the set of vertices on the $m$th level, and by $T^{(n)}$ the subtree of $T$ containing the vertices from level $0$ (the root) to level $n$; in particular, $T^{(0)} = L_0$ is the root.

We also discuss a rooted Cayley tree $T_{C, 2}$ (i.e., a binary tree, see Fig. 2). In a Cayley tree $T_{C, 2}$, the root has only two neighbors and all other vertices have three neighbors just as in $T_{B, 2}$. When the context permits, $T_{B, 2}$ and $T_{C, 2}$ are both denoted simply by $T$.

We use $(n, j)$ to denote the $j$th vertex at the $n$th level. Thus, $(n, j)$ has neighbors $(n+1, 2j-1)$, $(n+1, 2j)$, and $(n-1, \lceil j/2 \rceil)$, where $\lceil c \rceil$ is the smallest integer not less than $c$.

We denote by $|B|$ the number of vertices in a subgraph $B$. It is easy to see that if $T$ is a Bethe tree $T_{B, 2}$
$$|T^{(n)}| = 3 \cdot 2^n - 2 \quad (1)$$
and if $T$ is a Cayley tree $T_{C, 2}$
$$|T^{(n)}| = 2^{n+1} - 1. \quad (2)$$
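As a quick sanity check of the vertex counts (1) and (2), one can grow both trees level by level (a small illustrative sketch, not part of the correspondence):

```python
# Count vertices of T^(n) by growing the tree level by level and
# compare with the closed forms (1) and (2).
def tree_size(n: int, root_degree: int) -> int:
    """|T^(n)|: the root has root_degree children; every other vertex
    has exactly two children (three neighbors including its parent)."""
    total, level = 1, root_degree          # number of level-1 vertices
    for _ in range(n):
        total += level
        level *= 2
    return total

for n in range(6):
    assert tree_size(n, 3) == 3 * 2 ** n - 2        # Bethe tree, eq. (1)
    assert tree_size(n, 2) == 2 ** (n + 1) - 1      # Cayley tree, eq. (2)
print("formulas (1) and (2) verified for n = 0..5")
```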
Let $\Omega = \{0, 1\}^T$, and let $\mathcal{F}$ be the smallest Borel field containing all cylinder sets in $\Omega$. Let $X = \{X_t,\, t \in T\}$ be the stochastic process defined on the measurable space $(\Omega, \mathcal{F})$; that is, for any $\omega = \{\omega(t),\, t \in T\}$, define
$$X_t(\omega) = \omega(t), \qquad t \in T. \quad (3)$$

Let $\mu$ be a probability measure on the measurable space $(\Omega, \mathcal{F})$. We will call $\mu$ a random field on tree $T$.

Definition 1 (see [5]): Let $\mu$ be a probability measure on the measurable space $(\Omega, \mathcal{F})$. If
$$\mu(\omega(j) \mid \omega(k),\, k \in T - \{j\}) = \mu(\omega(j) \mid \omega(k),\, k \in N(j)) \quad (4)$$
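Property (4) states that the conditional law of a single site given all other sites depends only on its neighborhood $N(j)$. For a field built by propagating a Markov transition matrix from the root along the edges, this can be verified by brute force on a small tree; the following sketch (with an assumed transition matrix and root law on $T^{(2)}$ of the rooted Cayley tree) checks it at an interior site:

```python
# Brute-force check of the Markov property (4): build mu on T^(2) of a
# rooted binary tree by running a two-state Markov chain from the root,
# then verify that mu(w(j) | all other sites) depends only on w at N(j).
from itertools import product

EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]  # parent -> child
NBRS = {1: (0, 3, 4)}                # neighbors of the interior site j = 1
PI = (0.5, 0.5)                      # assumed root law
P = ((0.8, 0.2), (0.3, 0.7))         # assumed parent->child transitions

def mu(w):
    p = PI[w[0]]
    for u, v in EDGES:
        p *= P[w[u]][w[v]]
    return p

def cond(j, w):                      # mu(w(j) | w(k), k != j)
    return mu(w) / sum(mu(w[:j] + (b,) + w[j + 1:]) for b in (0, 1))

j = 1
seen = {}                            # (w(j), w on N(j)) -> conditional prob
for w in product((0, 1), repeat=7):
    key = (w[j],) + tuple(w[k] for k in NBRS[j])
    val = round(cond(j, w), 12)
    # configurations agreeing at j and on N(j) must agree on the conditional
    assert seen.setdefault(key, val) == val
print("property (4) holds at site", j)
```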