$E(f|\mathfrak{Y})$ $\mu$-a.e. and in $L^1$. For, observe that for every $A \in \mathfrak{Y}$
$$\Big| \int_A E(f_n|\mathfrak{Y})\,d\mu - \int_A E(f|\mathfrak{Y})\,d\mu \Big| \le \int_A \big|E(f_n|\mathfrak{Y}) - E(f|\mathfrak{Y})\big|\,d\mu \le \int_A E(|f_n - f|\,\big|\,\mathfrak{Y})\,d\mu, \quad\text{by (3)},$$
$$= \int_A |f_n - f|\,d\mu \le \int_X |f_n - f|\,d\mu \to 0$$
as $n \to \infty$ by the Dominated Convergence Theorem. But then,
$$\int_A E(f_n|\mathfrak{Y})\,d\mu = \int_A f_n\,d\mu \to \int_A f\,d\mu = \int_A E(f|\mathfrak{Y})\,d\mu$$
for every $A \in \mathfrak{Y}$, again by the Dominated Convergence Theorem. This shows that $E(f_n|\mathfrak{Y}) \to E(f|\mathfrak{Y})$ in $L^1$.

(11) $f_n \uparrow f$ $\mu$-a.e. $\Rightarrow$ $E(f_n|\mathfrak{Y}) \uparrow E(f|\mathfrak{Y})$ $\mu$-a.e. For, we can assume without loss of generality that $f_n \ge 0$ $\mu$-a.e. for $n \ge 1$. Note that $\lim_{n\to\infty} E(f_n|\mathfrak{Y})$ exists and is $\mathfrak{Y}$-measurable. Thus for $A \in \mathfrak{Y}$ we have, by a suitable application of the Monotone Convergence Theorem twice,
$$\int_A \lim_{n\to\infty} E(f_n|\mathfrak{Y})\,d\mu = \lim_{n\to\infty} \int_A E(f_n|\mathfrak{Y})\,d\mu = \lim_{n\to\infty} \int_A f_n\,d\mu = \int_A f\,d\mu = \int_A E(f|\mathfrak{Y})\,d\mu.$$
Therefore, the desired conclusion follows.

(12) (Jensen's Inequality) If $\varphi : \mathbb{R} \to \mathbb{R}$ is a convex (= concave upward) function bounded below and $\varphi(f) \in L^1(X)$ for an $f \in L^1(X)$, then
$$\varphi\big(E(f|\mathfrak{Y})\big) \le E\big(\varphi(f)\,\big|\,\mathfrak{Y}\big).$$
In fact, note that, since $\varphi$ is convex,
$$\varphi(t) = \sup\big\{\alpha t + \beta : \alpha, \beta \in \mathbb{R} \text{ are such that } \alpha s + \beta \le \varphi(s),\ s \in \mathbb{R}\big\}, \quad t \in \mathbb{R}.$$
Hence, for each such pair $\alpha, \beta$,
$$E\big(\varphi(f)\,\big|\,\mathfrak{Y}\big) \ge E(\alpha f + \beta|\mathfrak{Y}) = \alpha E(f|\mathfrak{Y}) + \beta.$$
Taking the supremum w.r.t. $\alpha, \beta \in \mathbb{R}$ on the RHS suitably, we get $E(\varphi(f)|\mathfrak{Y}) \ge \varphi(E(f|\mathfrak{Y}))$.

(13) $E(\cdot|\mathfrak{Y}) : L^p(X) \to L^p(\mathfrak{Y})$ is a projection for $p \ge 1$. For, since $\mu$ is a probability measure and hence $L^p \subseteq L^1$, $E(\cdot|\mathfrak{Y})$ is defined on all of $L^p$ for $p \ge 1$. If $p = \infty$, then for $f \in L^\infty(X)$
$$|E(f|\mathfrak{Y})| \le E(|f|\,\big|\,\mathfrak{Y}) \le \|f\|_\infty,$$
implying that $\|E(f|\mathfrak{Y})\|_\infty \le \|f\|_\infty$. If $1 \le p < \infty$, then it follows from (12) that
$$\|E(f|\mathfrak{Y})\|_p^p = \int_X |E(f|\mathfrak{Y})|^p\,d\mu \le \int_X E(|f|^p\,\big|\,\mathfrak{Y})\,d\mu = \int_X |f|^p\,d\mu = \|f\|_p^p.$$

(14) $E(\cdot|\mathfrak{Y}) : L^2(X) \to L^2(\mathfrak{Y})$ is the orthogonal projection. For, if $f \in L^2(X)$ and $g \in L^2(\mathfrak{Y})$, then, since $\int_X E(fg|\mathfrak{Y})\,d\mu = \int_X fg\,d\mu$ and $E(fg|\mathfrak{Y}) = E(f|\mathfrak{Y})\,g$, we have
$$(f, g)_2 = \int_X fg\,d\mu = \int_X E(fg|\mathfrak{Y})\,d\mu = \int_X E(f|\mathfrak{Y})\,g\,d\mu = \big(E(f|\mathfrak{Y}), g\big)_2,$$
where $(\cdot,\cdot)_2$ is the inner product in $L^2(X)$. This is enough to obtain the conclusion.

(15) (Hölder's Inequality) If $\frac1p + \frac1q = 1$, $p, q \ge 1$, then
$$E(|fg|\,\big|\,\mathfrak{Y}) \le E(|f|^p\,\big|\,\mathfrak{Y})^{\frac1p}\, E(|g|^q\,\big|\,\mathfrak{Y})^{\frac1q}, \quad f \in L^p(X),\ g \in L^q(X).$$
For, we can assume $0 < E(|f|^p|\mathfrak{Y}), E(|g|^q|\mathfrak{Y}) < \infty$ $\mu$-a.e. without loss of generality. If "$p = 1$ and $q = \infty$" or "$p = \infty$ and $q = 1$," then the inequality is immediate. Now suppose $1 < p, q < \infty$. Then
$$\frac{|fg|}{E(|f|^p|\mathfrak{Y})^{\frac1p}\, E(|g|^q|\mathfrak{Y})^{\frac1q}} \le \frac1p \frac{|f|^p}{E(|f|^p|\mathfrak{Y})} + \frac1q \frac{|g|^q}{E(|g|^q|\mathfrak{Y})},$$
since $st \le \frac1p s^p + \frac1q t^q$ for $s, t \ge 0$. By taking the conditional expectation $E(\cdot|\mathfrak{Y})$ it holds that
$$\frac{E(|fg|\,|\,\mathfrak{Y})}{E(|f|^p|\mathfrak{Y})^{\frac1p}\, E(|g|^q|\mathfrak{Y})^{\frac1q}} \le \frac1p + \frac1q = 1,$$
or $E(|fg||\mathfrak{Y}) \le E(|f|^p|\mathfrak{Y})^{\frac1p} E(|g|^q|\mathfrak{Y})^{\frac1q}$, i.e., the desired inequality is obtained.

(16) $E(f|\mathfrak{Y}) \circ S = E(f \circ S|S^{-1}\mathfrak{Y})$, where $(f \circ S)(x) = f(Sx)$ for $x \in X$. For, both sides are $S^{-1}\mathfrak{Y}$-measurable, and for $A \in \mathfrak{Y}$ it holds that
$$\int_{S^{-1}A} \big(E(f|\mathfrak{Y}) \circ S\big)(x)\,\mu(dx) = \int_{S^{-1}A} E(f|\mathfrak{Y})(Sx)\,\mu(dx) = \int_A E(f|\mathfrak{Y})\,d\mu = \int_A f\,d\mu = \int_A f\,d(\mu \circ S^{-1}) = \int_{S^{-1}A} f \circ S\,d\mu,$$
since $S$ is measure preserving. Hence $E(f|\mathfrak{Y}) \circ S = E(f \circ S|S^{-1}\mathfrak{Y})$.

(17) $0 \le P(A|\mathfrak{Y}) \le 1$.
(18) $A \in \mathfrak{Y} \Rightarrow P(A|\mathfrak{Y}) = 1_A$.
(19) $A \subseteq B \Rightarrow P(A|\mathfrak{Y}) \le P(B|\mathfrak{Y})$.
(20) $P(A^c|\mathfrak{Y}) = 1 - P(A|\mathfrak{Y})$.
(21) $P(A|\{\emptyset, X\}) = \mu(A)$.
(22) If $A_n \in \mathfrak{X}$ $(n \ge 1)$ are disjoint, then
$$P\Big(\bigcup_{n=1}^{\infty} A_n \,\Big|\, \mathfrak{Y}\Big) = \sum_{n=1}^{\infty} P(A_n|\mathfrak{Y}) \quad \mu\text{-a.e.}$$
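On a finite probability space, the conditional expectation given the $\sigma$-algebra generated by a partition is just a block-wise weighted average, so properties such as (12), (15), (17) and (20) can be checked numerically. A minimal sketch (Python with NumPy; the space, the measure and the partition are illustrative choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite probability space X = {0,...,7} with probability vector mu.
mu = rng.dirichlet(np.ones(8))

# A sub-sigma-algebra generated by a partition of X into three blocks.
blocks = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]

def cond_exp(f):
    """E(f|Y): on each block, the mu-weighted average of f."""
    g = np.empty_like(f, dtype=float)
    for b in blocks:
        g[b] = np.dot(mu[b], f[b]) / mu[b].sum()
    return g

f = rng.normal(size=8)
g = rng.normal(size=8)

# (12) conditional Jensen for the convex function phi(t) = t**2:
assert np.all(cond_exp(f) ** 2 <= cond_exp(f ** 2) + 1e-12)

# (15) conditional Hoelder with p = q = 2:
lhs = cond_exp(np.abs(f * g))
rhs = np.sqrt(cond_exp(f ** 2)) * np.sqrt(cond_exp(g ** 2))
assert np.all(lhs <= rhs + 1e-12)

# (17)/(20): P(A|Y) = E(1_A|Y) lies in [0,1] and P(A^c|Y) = 1 - P(A|Y).
indicator = (np.arange(8) < 5).astype(float)   # A = {0,...,4}
pA = cond_exp(indicator)
assert np.all((0 <= pA) & (pA <= 1))
assert np.allclose(cond_exp(1 - indicator), 1 - pA)
print("conditional Jensen and Hoelder verified on a finite space")
```

The tolerances only guard against floating-point rounding; the inequalities themselves hold exactly.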
In fact, the above statements (17)–(22) are almost obvious.

Let $\{\mathfrak{Y}_n\}$ be an increasing sequence of $\sigma$-subalgebras of $\mathfrak{X}$ such that $\mathfrak{Y}_n \uparrow \mathfrak{Y}$, i.e., $\mathfrak{Y}_n \subseteq \mathfrak{Y}_{n+1}$ $(n \ge 1)$ and $\sigma\big(\bigcup_{n=1}^\infty \mathfrak{Y}_n\big) = \mathfrak{Y}$, where $\sigma(\cdot)$ is the $\sigma$-algebra generated by $\{\cdot\}$. We say that $\{f_n\} \subseteq L^1(X)$ is a martingale relative to $\{\mathfrak{Y}_n\}$ if
$$E(f_{n+1}|\mathfrak{Y}_n) = f_n, \quad n \ge 1.$$
For instance, if $\mathfrak{Y}_n \uparrow \mathfrak{Y}$, $f \in L^1(X)$ and $f_n = E(f|\mathfrak{Y}_n)$, $n \ge 1$, then $\{f_n\}$ is a martingale relative to $\{\mathfrak{Y}_n\}$. $\{f_n\} \subseteq L^1(X)$ is said to be a submartingale relative to $\{\mathfrak{Y}_n\}$ if $f_n$ is $\mathfrak{Y}_n$-measurable for $n \ge 1$ and
$$E(f_{n+1}|\mathfrak{Y}_n) \ge f_n, \quad n \ge 1.$$
If "$\ge$" is replaced by "$\le$" in the above, then $\{f_n\}$ is called a supermartingale relative to $\{\mathfrak{Y}_n\}$.

(23) (Martingale Convergence Theorem) If $\mathfrak{Y}_n \uparrow \mathfrak{Y}$ and $f \in L^1(X)$, then $E(f|\mathfrak{Y}_n) \to E(f|\mathfrak{Y})$ $\mu$-a.e. and in $L^1$.
For, let $f \in L^1(X)$. We can assume that $\mathfrak{Y} = \mathfrak{X}$ and hence $E(f|\mathfrak{Y}) = f$. We first prove the convergence in $L^1$. Let $\varepsilon > 0$ be arbitrary. Since $\bigcup_{n=1}^\infty L^1(\mathfrak{Y}_n)$ is dense in $L^1(X)$, there exist $k \ge 1$ and $g \in L^1(\mathfrak{Y}_k)$ such that $\|f - g\|_1 < \varepsilon$. For $n \ge k$ we have $\mathfrak{Y}_k \subseteq \mathfrak{Y}_n$ and $E(g|\mathfrak{Y}_n) = g$, so that
$$\|E(f|\mathfrak{Y}_n) - f\|_1 \le \|E(f|\mathfrak{Y}_n) - E(g|\mathfrak{Y}_n)\|_1 + \|E(g|\mathfrak{Y}_n) - g\|_1 + \|g - f\|_1 \le 2\|f - g\|_1 < 2\varepsilon.$$
This shows that $E(f|\mathfrak{Y}_n) \to E(f|\mathfrak{Y})$ in $L^1$. As to $\mu$-a.e. convergence, we proceed as follows. For any $\varepsilon > 0$, with $g$ and $k$ as above we have $|E(f|\mathfrak{Y}_n) - f| \le |E(f-g|\mathfrak{Y}_n)| + |f - g|$ for $n \ge k$, so that
$$\mu\Big(\Big[\limsup_{n} |E(f|\mathfrak{Y}_n) - f| > 2\sqrt{\varepsilon}\Big]\Big) \le \mu\Big(\Big[\sup_{n} |E(f-g|\mathfrak{Y}_n)| > \sqrt{\varepsilon}\Big]\Big) + \mu\big(\big[|f-g| > \sqrt{\varepsilon}\big]\big)$$
$$\le \frac{1}{\sqrt{\varepsilon}} \int_X |f-g|\,d\mu + \frac{1}{\sqrt{\varepsilon}} \int_X |f-g|\,d\mu < 2\sqrt{\varepsilon},$$
by the maximal inequality for martingales and Chebyshev's inequality. Therefore $E(f|\mathfrak{Y}_n) \to E(f|\mathfrak{Y})$ $\mu$-a.e.

(24) Let $\mathfrak{Y}_n \uparrow \mathfrak{Y}$, $f \in L^1(X)$ and $\varphi : \mathbb{R} \to \mathbb{R}$ be an increasing convex function. If ...

(25) ... $\le \liminf_{n\to\infty} \|f_n\|_1$.

For the proof of (24) and (25) we refer to Rao [2].
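The martingale $E(f|\mathfrak{Y}_n)$ of (23) can be simulated with the dyadic $\sigma$-algebras of $[0,1)$, where each conditional expectation is a block average. A small sketch (Python/NumPy; the test function and the discretization level are illustrative choices):

```python
import numpy as np

# Discretize [0,1) into 2**10 equal atoms carrying uniform measure;
# Y_n is the sigma-algebra generated by the 2**n dyadic intervals of level n.
N = 10
x = (np.arange(2 ** N) + 0.5) / 2 ** N
f = np.sin(2 * np.pi * x) + x ** 2        # an integrable test function

def dyadic_cond_exp(f, n):
    """E(f|Y_n): average of f over each dyadic interval of level n."""
    return np.repeat(f.reshape(2 ** n, -1).mean(axis=1), 2 ** (N - n))

l1 = [np.mean(np.abs(dyadic_cond_exp(f, n) - f)) for n in range(N + 1)]
l2 = [np.sqrt(np.mean((dyadic_cond_exp(f, n) - f) ** 2)) for n in range(N + 1)]

# E(f|Y_n) -> f: at level N each atom is its own block, so the error is 0,
# and the L^2 errors decrease monotonically (nested orthogonal projections).
assert l1[-1] < 1e-12
assert all(b <= a + 1e-12 for a, b in zip(l2, l2[1:]))
print([round(e, 4) for e in l1])
```

The $L^2$ monotonicity reflects property (14): each $E(\cdot|\mathfrak{Y}_n)$ is an orthogonal projection onto a nested subspace.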
1.3. T h e Kolmogorov-Sinai e n t r o p y With the preparation given in the previous section we define and study the Kolmogorov-Sinai entropy. Again let (X, X, /x, 5) be a fixed dynamical system and 2) be a cr-subalgebra of X. A 2J -partition is a finite 2J-measurable partition 21 of X, i.e., a = {Au...
, An} C 2J, Aj n Ak = 0 ( J V *) and U A3- = X . Denote by V{Z» i=i the set of all 2>partitions. Let 21,53 e 73(2J). We let 21V OS = {A n B : A € 21, B € 05},
S - 1 ! * = { S " 1 ; ! : A 6 21},
which are clearly 2>partitions. 21 < 93 means that 93 is finer than 21, i.e., each A £ 2t can be expressed as a union of some elements in OS.
1.3. The Kolmogorov-Sinai entropy
17
Definition 1. Let 21 = {Au ... , An} e V(X). The entropy H (21) of a ■partition 2t is defined by n
ff(2l) = ~5>(A3)log/zUi) i=i
= -£>(A)lo gjU (4). The entropy function 1(21) o/ 21 is defined by J(20(-) = - £ U ( - ) l o g M ( A ) . In this case we have # ( 2 l ) = £ ( / ( 2 l ) ) = f 7(21) d/i. Jx The conditional entropy function J(2l|2J) and conditional entropy i?(2l|2J) are respectively defined by
/(2l|2))(0 = - £ U(0 \ogP(A\m-), 7f (2t|2J) = S(/(2tl9J)) = / J(2t|2J) dM.
(3.1)
For 21 € V(X) we denote by 21 the er-algebra generated by 21, i.e., 21 = a(2l). For a-subalgebras 2Ji,2J 2 we denote 2Ji V 2J2 =
H (2l|2J) = £(/(2t|2J)) = £(£(/(2l|2J)|2J))
= - E / P(AW -
l0 p
§ (^l?))<*/*-
(3-2)
Aea '* where 21 e ^(X) and 2J is a cr-subalgebra of X. Consider a special case where 2) = 93 with 93 G V(X). Then, since P(,4|53) = £ ^ ( y l ^ l s for A e J , M^l-B) being Be33
the conditional probability of yl given B, H(2l|«B) = / I(2l|<8) d/x
Chapter I: Entropy
18
= ~ E /"log(X>(A|£)l B )d/z =
E / X)lBlog|i(^|S)dA«
- E /*(*) E { - ^ l S ) l°S^\B)}seas
(3.3)
Aea
Here we consider — £] /i(yl|B)log//(yl|5) as the conditional entropy of 21 given P G 03 and (3.3) as the average conditional entropy of 21 given 03. Then the following is fundamental. Theorem 1. Let 21, OS G V(X) and 2),2)i,2)2 be cr-subalgebras of X. (1) if(2l|2) = if(21). (2) i f ( 2 t v 0 3 | 2 ) ) = if(2l|2J)+if(03|21V2J). (3) i f (21V 03) = i f (21) + H(03|2t). (4) 21 < 23 => if(2l|2J) < if(Q3|2J).
(5) 21 < 03 => if(2l) < if(03). (6) 2Jx D 2J2 ^ if(2l|2J 1 ) < P(2l|2J2). (7) if(2l|2J) < if(2l).
(8) (9) (10) (11)
P(2lV03|2))< if(2l|2J) + H(03|2J). if (21V 03) < if (21) + if (03). if(5- 1 2l|5- 1 2J) = if(2l|2J). if(5- 1 2t)=if(2t).
Proof. (1) By definition we have i(2l|2)(-) = - E
U(-)logP(A|2)(-) = - E
1A(0logP(A) = /(»)(•)
and hence if(2l|2) = / i(2l|2)d M = / 7(521)^ = ^(21). (2) Observe that f(2lv03|2J) = -
E
lclogP(C|2J)
C621V03
= - E
E UnBlogP(i4nB|3))
1.3. The KolmogoTov-Sinai entropy
=-
19
Y, IAIB icgp^ig) - E UIB i°g A,B
p(
yl,B
pf;y
^l^l-yj
= -^UlogP(A|2J)-^lBlogP(5|2lV2J), .4
B
since for 5 e 95 we have
p«= £ ,ra»_ * ' ^ ' ^ and hence logP(S|av2J) = £
UlogP(p^,ff)
M-a-e-
Taking the expectation, we see that the desired equality holds. (3) follows from (1) and (2). (4) a < 05 implies 21 C 05 and a V 53 = 23. So (2) implies F(2t V 93|2j) = #(93|2J) = # ( a | 2 J ) + # ( 9 3 | a V 2J) >
ff(a|2J).
(5) is obtained from (1) and (4). (6) Since 4>{t) = —tlogi is concave it follows from Jensen's Inequality ((12) in Section 1.2) that for A e 21 £ ( * ( P ( i i | $ h ) ) | S ) a ) < 0(£(P(^|2J 1 )|2J 2 )) = 0(P(4|2J 2 )), the last equality is by (7) in Section 1.2 since 2Ji D3)2- Thus
ff(2t|2J0 = J2 E(4>(P(Am))), by (3.2), = ES(^(P(^|2J 1 )|2J2))) < E S(0(P(J4|2J2))) =
ff(a|2J2),
by (3.2).
(7) follows from (6) by letting 2Ji = 2J and 2J2 = 2. (8) is derived as follows: # ( a v 9 5 | 2 J ) = # ( a | 2 ) ) + .ff(23|av2J), < H(a|2J) + H(95|2J),
by (2),
by (6).
Chapter I: Entropy
20
When 2J = 2, (9) is obtained. h (10) Since P(5- 1 .4|S'- 1 2J) = P(A\fQ) oSiovAeXwe E we have
HiS-^S-1®)
= Y, E&iPiS^AlS-1®))), = ^
by (3.2),
E(<j>(P(A\2)) o 5 ) ) ,
by (16) in Section 1.2,
= 52E(4>(P(A\V)))t byn°S-l = v, = H(0\fQ). (11) is obvious. R e m a r k 2. (1) Theorem 1 holds if H{%) and i7(2l|2J) are replaced by 7(21) and I(2l|2J), respectively, where (10) and (11) of Theorem 1 are then read as: (10) IiS-^S-1®) = 7(2l|2J) o S. (ll)/(5-12l) = /(2t)o5. (2) In Theorem 1 (2), if 21 < 53 and 2J = 93, then 21V 55 = 53 and 21V 93 = 93, so that i7(2t|93) = 0. This means that 93 contains all the information that 21 has, and hence the conditional entropy of 21 given 53 equals 0. In particular, H{%\X) = 0 for every 21 e ?>(£). Definition 3 . Let 21 € V{X). The entropy H(% S) of S relative to 21 is defined by 77(21, S) = Urn i i f C V s ^ ' a V n-+oo n \ i=o /
(3.4) '
v
where the existence of the limit is shown below. The entropy 77(S) of S or the Kolmogorov-Sinai entropy of S is defined by H(S) = sup {77(21, S) : 21 €
V(X)}.
We have a basic lemma. L e m m a 4. Let 21 € V(X).
Then:
n 1
(i)H( y s-w)=H(K) + nf;H(x\ v s-*v). vj=o
y
fc=1
v ij=i
;
1
(2) i f ( V s-*a) = 77(21) + "f; # (s-*a| V s~m). Vj=o
J
fc=1
V
Ij=0
(3)/(Vs-^2i) =/(2i) + "f;17(2i| v s-'a).
)
inai entropy en 1.3. The Kolmogorov-Sinai
21
(4) /( V S - ' ' B ) = /(si) + nf; /(s-fcB| Vs-^a). Proo/. (1) follows from the following computation: SYVS-'B)
=j?(s- (n - 1) av ( n y z s _J 'an = Jf (5- ( n - 1 ) 2l) + H ( "v 2 S- J '2t|5- (n - 1 >2t)
= Jf (a) + H (s-(n-2>a v ("v35-J'2i) Is-' 7 1 - 1 ^ ff(S-("-2)2l|S-(n-1>2l)
= if (21) +
+ Ff"v 3 5'- ; '2l|5- (n - 2) 2lvS- (n - 1) 2l[ N ) = F(2l) + //(2l|5- 1 3l) +/f("y 3 5- J '2l| . V <Sf-J«) n-l
= H(21) + ] T ff (2l| .V 5- J '2l). fc=i
(2), (3) and (4) can be verified in a similar manner. The following lemma shows that the limit (3.4) exists and 77(21, 5) is well-defined for every 21 £ V{X). Lemma 5. Let 21 G P(£). Then: (1) 77(21, 5) is well-defined and it holds that 77(21,5)= lim 77(21! V S-J'2l) n-yoo
\
I j'=l
/
n
= lim 77('5- 2trv 5- J '2iy ra-t-oo
V
1
I j=0
I
(2) 7/ S is inveriible, then 77(21,5)= lim 77 ( d V Sj
V I j'=l
/
n-Kx; n
V .7=0
Sj9i). /
Proof. (1) By Theorem 1 (6) and (1) we see that 0 < 77(2l| .V^S-'a) < 77(2l| V 5 - i 2 i ) < 77(2l|2) = 77(21)
22
Chapter I: Entropy
forn > 1 and hence lim H (oil V 5 _ J '2l) exists. Now by Lemma 4 (1) it holds that
ifm,s)= um ifrfVfir'a) v
n-KX> n
V .7=0
/
= lim - YH(*\ V S-'a) = lim fffal v s^'a). The second equality is obtained from Lemma 4 (2). (2) An application of Theorem 1 (11) and (1) above give
H(21, s)= lim ifffVs-'o) n-t-oo n
\ j=0
n->oo n 1
V /n-1
I
= lim ±i7(V("-1>"vV
)
= hm -iff v sm). n-too n
Similarly, the second equality is obtained.
\ 7=0
/
Basic properties of the entropy H(21, S) are given in the following, which are similar to those of H(21). T h e o r e m 6. Let 21,03 G 7>(2J). Then: (1) 2t < SB => H(21,5) < if (58,5). (2) if f . V 5-J'2t, 5 ) = if (21, S) for n > m > 0. f/ 5 is invertible, this holds for
ra,meZ = { 0 , ± l , ± 2 , . . . } . (3) if( m y o 1 5 , -J'2t,S' m ) = mff(2l,5) for m > 1. (4) if (21,5) < if (23, S) + H(21|
lim l f f ( P v V f c (
S~%S)= )
p-toop
\fc=0
V S^2l)Y Vj=m
))
= lim I f f ( « r ™ f ^ V ^ S - ^ l P-+00 p
V
\
fc=0
/J
by definition, J
'
knai entropy en 1,3. The Kolmogorov-Sinai
= lim
23
p+ n —m — 1
p—foo
11
/p+n—m—1 /P+n-m-1
-Hi p + n —m - 1 v
p
V
,
\
5- f c 2l), fc=o /
by Theorem 1 (11), =
H(%S).
The invertible case is verified by Lemma 5 (2). (3) Since m > 1 we see that by definition
ff(Vs-»'a,H
-H(ny\smrk(mylS-m)\
= Um
= lim ra- — jyf V S- fc 2l) n-voo mn fc=o / n-voo mn V V fc=o = mH(%,S).
(4) We have the following computation: ff(Vs-%)
< H(( V s " j a ) V ( " v V ' s ) ] , = H(n\/1S-jfB) \j=0
by Theorem 1(4),
+iff"v 1 5- J '2irv 1 S _ f c
\j=0
I
fc=0
by Theorem 1 (3), v
I
"
n-1
< i l ( " v S~jfB\ + ^2 H(s-j2l\
"y S-k&),
by Theorem 1 (8),
j=o n-1
< H ( " V S-j
by Theorem 1 (6),
j—"
J = Iff = H^S-ift) fl'("v "v 1S-S-J'!8') 'Q3') ++n#(2l|
by by Theorem Theorem 1 (10).
Thus we have
H(%S)= H(%,S)=
Urn -Hi
-Hf^S-w) V S-W)
^U*(2 s ~'*) + *< a l»>} = F(<8,S) + #(2l|<8). The following is a direct consequence of Theorem 6 (2) and (3). Corollary 7. H(Sm) \m\H(S) for meZ.
= mH(S)
for m > 1. If S is invertible, then H(Sm)
=
24
Chapter
I:
Entropy
In order to state the Kolmogorov-Sinai Theorem we need the following two lem mas, which involve uniform integrability and Martingale Convergence Theorem. Lemma 8. // 2Jn t 2) and 21 € V(X), where 2J„ 's and 2J are cr-subalgebras of X, then [ sup/(2l|2J n )<^l
Proof Let / = sup J(2l|2J„) and F(t) = fi([f > t]) for t > 0, where [f > t] = {x € n>l
Jf : f{x) > i). Then we see that r
yoo
fdn=-
/»oo
y«oo
tdF(t) = [ - tF(t)]~ + /
F(t) dt<
F(t) dt
and for t > 0 P(t) = J
Jsup | - J2 UlogP(A|2Jn)
1 > t\ J
= ^iin[infP(A|?)„)<e-'])
=
^^(An4,
AeSln=l
where A\ = [P{A\V){) < e -*] and A\, = 7i* [P(A|2J„) < e -*,P(A|2J fc ) > e~4] for n > 2. Note that Aj,'s are disjoint and j j ^ = [ hif P(A|2J n ) < e " ' ] . Also note that A*n <E 2J„ and n(A n A*) = / A , P(A|2J„) rf/i, so that w
OO
n=l
n=l
E^nii»)<£e-*M4)<«-'. n=l n=l oo
Obviously £ p(^ n ,4*) < M(A). Thus we get F(t) < min {p(A)t e~*} and /•oo
/
/-oo
F(t)dt<J2 B
wn.{n(A),e-*}dt L
/
>lea *• ■/0
M(A)dt+ / -/-log^(A)
e-
J
tnai entropy en 1.3. The Kolmogorov-Sinai
25
= £ { - / i ( , 4 ) I o g M A ) + /i(A)} = 77(21) + 1 . Therefore the desired inequality holds. Lemma 9. 7/ 2J„ 12J and 21 G P ( £ ) , tften.(1) 7(2l|2Jn) -> 7(2l|2J) /i-a.e. and in L 1 . (2)lf(2l|2) n )4ff(2l|2)). TYoo/. (1) It follows from the Martingale Convergence Theorem ((23) in Section 1.2) that P(A\Z)n) -S- P(A\Z)) n-a.e. for A G 21 and hence 7(2l|2J n ) ->■ 7(2l|2J) /i-a.e. Since sup/(2l|2)„) is integrable by Lemma 8, we see that 7(2l|2J„) -> 7(2l|2J) in Ll, TI>1
too. (2) is derived from (1) and the fact that 77(2l|2Jn) = Jx 7(2l|2J„) a> and 77(2l|2J„) > 77(2l|2J„+1) for n > 1 by Theorem 1 (6). Now we have: Theorem 10 (Kolmogorov-Sinai). 7 / 5 is invertible and 21 G V(X) is such that V 5"2l = X, then 77(5) = H(21,5). n=—oo
Proof. Let 2l„ =
V 5fc2l for n > 1. Then 77(2^, 5) = #(31, 5) by Theorem 6 (2).
Observe that for 25 G V(X) 77(23, 5) < 77(2l„, 5) + H^^),
by Theorem 6 (4),
= 77(2l,5)+77(23|2l n ) -> 77(21, 5)
(n ->• oo)
since H^^) I 77(23|£) = 0 by Lemma 9 (2) and Remark 2 (2). This means that 7/(03, 5) < 77(21, 5) for 23 G V{X), which implies that 77(21, 5) = sup {77(23, 5) : 23 G V{X)} = 77(5).
An immediate corollary of the above is: Corollary 11. Let 21 G V(X). n
Then:
(1) V 5 " 2 l = X => 77(5) = 77(21,5). n=0
„„
Chapter I: Entropy (2) If S is invertible and V S - " 8 = X, then H(S) = H(% S) = 0.
Proof. (1) Letting 2l„ = V S _fc 2l for n > 1, we see that this plays the same role as in the proof of Theorem 10. (2) By Theorem 10 we have # ( 2 l , S ) = H(S). We also have H(%S)=
lim H(%\ V S-j
V
= H(VL\X),
Ij = l
by Lemma 5(1),
/
by Lemma 9 (2)
since X = S~1X = V S~fc2l by assumption, which is 0 by Remark 2 (2). fc=i Consider the case where H(S) = 0. Note that this implies S~XX = X. For, suppose that 5 _1 3C C X, a proper set inclusion. Then there is an A G X such that A i S'^X. Let Oo = {A,AC} G V(X). Then 0 < H(Vl0\S-1X)
< lim fl-faol V S-j%0) n—nx
\
I j'=0
= H(ph, S) < H(S) = 0, /
a contradiction. To consider various dynamical systems and entropies, isomorphism among these systems is important, which is defined by Definition 12. Let (Xi,Xi,Hi,Si) (i = 1,2) be two dynamical systems. These systems or Si and 52 are said to be isomorphic, denoted Si = S2, if there exists some one-to-one and onto mapping (p : Xi —> X2 such that (1) for any subset Ai C X\, Ai G Xi iffH(Si) = H(S2). That is, the Kolmogorov-Sinai of measure preserving transformations is invariant under isomorphism.
entropy
Proof. LetX2 be the isomorphism between two systems. Let 2ti G V(Xi) and observe that
(2li),S 2 ) since ^i(A) = M 2 ( V ( A ) ) for A G 2l x . Hence, H(St) < H(S2). The converse H(Si) > H(S2) is also true. Thus if (Si) = H(S2). It trivially follows from the above theorem that if H(Si) / H(S2), then S x ^ S 2 In the next example we show how to compute the entropy of Bernoulli shifts.
nai entropy ent 1.3. The Kolmogorov-Sinai
27
Example 14 (Bernoulli shifts). Let (Xo,p) be a finite scheme, where XQ = { o i , . . . ,at} and p = ( p i , . . . ,pi) 6 Ae, so that p(a,j) = Pj, 1 < j
:Xk € Xo, k € Z } ,
where Z = {0, ± 1 , ± 2 , . . . } , and the shift S on X given by S : (... ,x-i,x0,xi,...)
>->■(... ,£'_!, X O J S I , . . . ) ,
xjj. = Xfc+i, 4 e Z .
A cylinder set is defined by [x?---x?] = {(•■■ ,xux0,xi,...)
:xk = x°k, i < k < j}
and let Extend /^o to the a-algebra X generated by all cylinder sets, denoted by p.. Note that 5 is measure-preserving w.r.t. \i and hence (X, X, p, 5) is a dynamical system. The shift S is called a ( p i , . . . ,pt)-Bernoulli shift. Since 21 = {[xo = «i], • • • , [^o = ai]} oo
is a finite partition of X and
Sn2l = X by definition, we have by Theorem 10
V n=—oo
and Lemma 5 (2) that
H(s) = Hm,s) = hm iff(Vs-fcsa). n->oo n
\
fc=o
/
Now V S~fc2l = {[x 0 • • • x„_i] : Xj € X0, 0 < j < n - l} and hence F(Vs-*2l)== -
^
p([x0---xn_1])logp([x0---xn_1])
5Z
p([xo---x n _i])logp([x 0 ])---p([x n _ 1 ])
XO,-" i^n — l€-Xo
= -
5 Z M ( W ) logp([x 0 ])
^2
a:oG-Xo
aJn —i€-Xo
= nH{%) since /i([fflj']) = p(aj) — Pj f° r 1 < J < n. This implies that I
H(S) = H(Ql) =
-J2PjtegPj.
n{[xn-i])
logp([x„_i])
28
Chapter
I:
Entropy
Thus (§, \)-Bernoulli shift and (§, §, §)-Bernoulli shift are not isomorphic. ( p i , . . . ,p„)- and ( g i , . . . , g m )-Bernoulli shifts have the same entropy, i.e., n
If
m.
1
- ^PJ
°SPJ
=-J2qk
log Qk
'
*!=i
j=i
then are these isomorphic? This was affirmatively solved by Ornsterin [1](1970) as: Theorem 15. Two Bernoulli shifts with the same entropy are
isomorphic.
We say that the entropy is a complete invariant among Bernoulli shifts. The proof may also be found in standard textbooks such as Brown [3], Ornstein [3] and Walters [1]. Example 16 (Markov shifts). Consider a finite scheme (Xo,p) and the inifinite product space X = XQ with the shift S as in Example 14. Let M = (rriij) be an £ x£ i
stochastic matrix, i.e., my > 0, Yl mij = 1 for 1 < i, j < £, and m = (mi,... j=l
, mi)
t
be a probability distribution such that Yl mifnij = rrij for 1 < j < £. For each i,j, i=l
rriij indicates the transition probability from the state dj to the state aj and the row vector m is fixed by M in the sense that m M = m. We always assume that mi > 0 for every i = 1,... ,£. Now we define /z0 on SDT, the set of all cylinder sets, by Mo([a»0 •••«»„]) =miomioi1
■■■min_lin.
fj,0 is uniquely extended to a measure // on X which is 5-invariant. The shift S is called an (M, m)-Markov shift. To compute the entropy of an (M, m)-Markov shift S consider a partition 21 = {[so = a i ] , . . . ,[x0 = at]} e V{X), which satisfies
V
Sn% = X. As in Example
n=—oo
14, we see that
Y,
M([zo • • ■ x„-i]) log fj,([x0 ■ • •a;„_1])
XQ,... , X n _ i ^ X o H
Ckl0S~k*)
= -
E
M(N--x„_1])logM([xo---a;n_1])
XQ,... 22 ,xn_i^Xo
m
iomioii---min^in^logmiornioil---rnin_ainl
to,-- , i „ _ i = l
e
mi
Zl
o"*»on---mi„_2in_1logmiomioil--.min_2i„_1
m = -^2to,-io , i „ _ ilo=gra»o - (n - 1) ^ l
= ~ «o=l E mio «o=l
lo
immij logmy
i , J = l mimij log ray S m i o - (n - 1) ^2 i,J=l
1,4. Algebraic models I
since X) j n i' T l y = mj
29 an
t
d !C m »j
=
1 f° r 1 — *"> J' S '• By dividing n and letting
n - > o o w e get i
H(S) = — 22 rriimij log rriij.
1.4. Algebraic m o d e l s In the previous section, we have defined the Kolmogorov-Sinai entropy for measure preserving transformations and have seen that two isomorphic dynamical systems have the same entropy. In this section, we relax isomorphism to conjugacy, which still has the property that two conjugate dynamical systems have the same entropy. Algebraic models of dynamical systems are good tools to examine conjugacy among these systems and these will be fully studied in this section. Let (X, X, n, S) be a dynamical system. Two sets A,BeX are said to be /iequivalent, denoted A~B, if fi(AAB) = 0, where AAB = (A U B) - {A !~l B), the symmetric difference. Let A = {B 6 X : A~B}, the equivalence class containing X e l , and 93^ = {A : A € X}, the set of all equivalence classes of X, called the measure algebra of n. The measure fj, on 93M is of course defined by H(A) = fi(A),
A eX
and is strictly positive in the sense that n(A) > 0 if A ^ 0. For now we consider conjugacy between a pair of probability measures. Definition 1. Let (Xj, Xj, fij) (j = 1,2) be two probability measure spaces with the associated measure algebras 931 and 2*2, respectively. Then Hi and \x2 are said to be conjugate, denoted fix ~ (j,2, if there exists a measure preserving onto isomorphism T : 5Si -»■ 93 2 , i.e., T satisfies that H2(TB) =
T
y OO
v
OO
rs
TBc={TBf,
M x (B),
T
/
OO
\
B e ®i, OO
( u^) = u i' ( n i) = n r i ^ ^ e ® i > J ^ L B
Conjugacy between a pair of probability measures can be phrased in terms of a unitary operator between L 2 -spaces as follows.
Chapter I: Entropy
30
P r o p o s i t i o n 2. Let (Xj,Xj,p,j) U = M ) be a Pair of probability measure spaces. Then, ^ i - H2 iff there exists a unitary operator U : £ 2 (A*I) -* L2(n2) such that UL°°(m) C L°°{nz) and U(fg) = Uf-Ug,
/,ff6£°°(/ii).
Proo/. To prove the "only i f part, assume /Ji ~ /x2 and let T : 55i -»■ 552 be an onto measure preserving isomorphism, where 93, is the measure algebra of fij (j = 1,2). Define an operator £/ : L2(jj.i) -> L 2 (/i2) by f/l B = I r s ,
5 e ®i.
Note that \\UlBh = V^(TB) = Vt*i(B) = l I M b s i n c e T i s measure preserving. Then, U can be linearly extended to the set of all 93i-simple functions and is an isometry. Thus U becomes a unitary operator since T is onto. n
Now, let / = J2 aj^Bj, where a / s are complex and Bj G Q5i (1 < j < n) are Now, let / = J2 aj^Bj,
where otj's are complex and Bj G Q5i (1 < j < n) are
disjoint. Then, it holds that
disjoint. Then, it holds that
Uf2 = ufeajlB,)
=
u(^aiaklBllBl]
= 5Z a J afcl;rB > nTB * = 5^ a l l T B J' (Uff = ( 13 Q J l T B i )
=
5ZaJafclrBilrB*
= 5Z«jafclT(s,nB t ) =
J2oipTBt,
and hence Uf2 = (Uf)2 This implies by the polarization identity that U(fg) = Uf ■ Ug for all «Bi-simple functions / , g and then for all / , g e £°°(/xi) by a suitable approximation. Conversely suppose U satisfies the conditions mentioned. If / is an indicator function corresponding to S i esay. Define T : Q5i ->• Q52 by TSx = BJ for Z?i e ©i._Since U is onto, T is also onto. Since U is a unitary operator, TBi = 0 implies Bi = 0. To see that T is a measure preserving isomorphism, note that pi(B x ) = II/IH = HI7/III = p 2 ( T B i ) for Bx e
1.4- Algebraic
models
31
where g corresponds to B2. Moreover, T(Bi U B2) = TB\ U TB2 follows from the fact that / + g — fg corresponds to B\ U B2, and hence
Hi(BiUB2) = \\f + g-fg\\l=\\U(f = \\Uf + Ug-Uf-
( oU oB =\ o n=l U B =/ o o oo
\\ o
oo oo
o oo // o oo
+
g-fg)\\l
Ug\\l = ii2{TBx U TB2). \\
// oo _ \\ oo _
U TBn and /J 2 ( U TBn) = m[ U Bn) for {S„} CC such that f,g e T(fj.) and f = g \i-a.e. imply f(x) = g(x) for every x G X. This is due to the existence of a lifting on L°°(fi) (cf. Tulcea and Tulcea [1]). Let us define a function ip^ on T(/i) by
That!
n
vM) = / fdn,
/er(/x).
Jx Then the following is a basic. Proposition 3. For a probability measure space (X,X,n), function on r(/u) such that f^(f) = 1 iff f = 1.
f^ is a positive definite
Proof. For any n > 1, / x , . . . , / „ £ r(/x) and a i , . . . , a„ e C we have
^2aJak
j,k j,k i,k
f I Jx\
/JJ xx fifkdt* J x
j
I2 I
which implies that ip^ is positive definite. As to the second statement, the "if part is obvious. To see the "only i f part, suppose
32
Chapter I: Entropy
Considering g = g+ - g~ with g+,g~ > 0, we have l = W ( / ) = / SdM = / 9+dn~ Jx Jx < f g+dfi<
I \g\dii<
JX
JX
/ g~dn Jx I
v X
\f\dp=l,
implying that Jx g~ dfj, = 0, g~ = 0 and \g\ = g. Hence Jx \g\ d/j. = 1 and \g\ = 1. Thus / = g = \g\ = 1. The conjugacy between two probability measures induces a certain mapping be tween r(-)-spaces as is seen below. P r o p o s i t i o n 4. Let (XJ,XJ,/J,J) (j = 1,2) be a pair of probability measure spaces. If Hi ~ H2, then there exists a group isomorphism U : T(/ii) —> r ( / i 2 ) such that (1) U T ( M I ) = r 0 i 2 ) ;
(2) Uc = cforce G; (3) If T C T(/j,i) is a set generating L2(ni), then VT generates L 2 (/x 2 ); (4) If T C T(/ii) is orthonormal in L2(ni), then UT is orthonormal in L2(fi2);
(5)vMl(/) = ^ 2 (^/)/or/er( M l ).
Proof. Let J7 : L2(ni) —> L2(/i2) be a unitary operator realizing the conjugacy between ^ i and /x2 (cf. Proposition 2 above). We show that U restricted to T ( / J I ) is the desired isomorphism. At first, note that ||t//||oo = ||/||oo for / £ r ( ^ i ) (see the proof of Proposition 2). Hence \Vf\ < 1 n2-a.e. For n > 1 let
S„ = | a ; 2 eX 2 :|(C//)( a ; 2 )|0 for some n0 > 1. Then,
[ \Uf\2dn2= [
Jx2
JBno
\Uf\2dn2+ [ \Uf\2dfi2 JBono
2(jB B --~ i 1V(_ ^^JnMJ'W ° ) ++ ^^no) «o) " "^ ""' '-'^-no 0
v
< M*a) = Ml(X,) = / |/| 2 d Ml , Jx, contradiction to ||CT/||, = ||/|| 2 . Thus n2(Bn) = 0 for every n > 1 and \Uf\ = 1 M2-a.e. This means that U : r ( / n ) -> r ( / i 2 ) . The properties (1) - (5) are easily verified.
1.4. Algebraic models
33
The properties (1) and (5) in Proposition 4 are particularly important and will be used to characterize the conjugacy of measures in terms of their algebraic models. For this goal we need several technical results. Here, a pair of probability measure spaces (Xj,Xj,fij) {j = 1,2) are fixed. Lemma 5. Let V : L2(m) -> L2(fi2) be a linear isometry. If f E L°°{(i{) is such that {Vf)n = Vfn for n > 1, then Vf € L°°(/i 2 ) and ||V/||oo = l l / I U Proof. Observe that for n > 1
f \vf\2nd^= [ {VfYWJYdM
-L
Vf" ■ Vfn d/j.2, by assumption,
= /
fn-fndni,
since V is an isometry,
JXx JXi
\!\2nd»u
which implies that ||V/|| 2 n = ll/lbn- Letting n -¥ 00, we get ||V/||oo = ||/||oo and Vf 6 L°°(n2). Lemma 6.
Let V : £ 2 ( M I ) - * £ 2 (A»2) be a linear isometry and A C L2(fii).
If
f e i°°(/ii) is such that Vf € L°°{fj.2) and V(fg) = VfVg,
ge A,
(4.1)
then (4.1) is true for every g eA, the closure of A in L 2 (/ii). Proof. Take &g eA and choose a sequence {<;„} C A such that \\gn — g\\2 —> 0. Then \\Vg„ - Vg\\2 = Ha, - g\\2 -> 0 and \\V(fgn) - V(fg)\\2 = \\fgn - fg\\2 -> 0 since / is bounded and V is an isometry on L 2 -spaces. But since V(fgn) = Vf Vgn for n > 1, we have \\V(Jg) - Vf ■ Vg\\2 < \\V(fg) - V(fgn)\\2 or
+ \\Vf ■ Vgn - Vf ■ Vg\\2 -+ 0,
V(fg) = Vf ■ Vg. Thus (4.1) holds for every g £A.
Lemma 7. Let V : L2{ii\) —> L2(fj,2) be a linear isometry, A C L°°(/i 1 ) be an additive group and g 6 L2(f/,i). If V(fg) = Vf ■ Vg,
\\Vf\U
= ||/|U
(4.2)
Chapter I: Entropy
34
for every f G A, then (4.2) is true for every f G A, the closure of A in
L°°(fii).
Proof. Let f & A and choose a sequence {/„} Q A so that ||/„ - /j|oa -> °- T h e n Wfn9-f9h - > 0 a n d hence | | V ( / „ s ) - V ( / f f ) | | z -> 0 because V is an isometry. Since \\Vfn - VfnWoo = ||/n - /m||oo -+ 0 as n, m -+ oo by (4.2) and the choice of {/„}, it follows that \\Vfn ~ Vf\\oo -» 0. Thus \\Vfn -Vg-VfVg\\2 -» 0. Now since V(fn9)
= Vfn ■ Vg,
||V/„||co = l l / n | U
n>l,
we have by letting n -> oo that V(/ff) = V / ■ Vp and ||V/||oo = ll/lloo as desired. T h e o r e m 8. Lei .4. C L°°(ni) fie suc/i i/aai (1) .4. is dense in L 2 (/ii); (2) a, j3 G C, rational, f,geA=>af + pgeA; (3) / G A => f G A; (4) f,geA^-fgeA; and let VQ : A —► L2(p2) be such that (5) / G .A =>• \\V0f\\2 = ll/lb, i-e., V0 is on isometry; (6) a , 0 G C, rotionai, f,geA^-V0(af + j3g) = a F 0 / + /JVbp; (7) / G .4 =*• Vb/ = Vb/, i.e., V0 is real; (S)f,geA^V0(fg) = Vof-Vog; Then, VQ can be extended to a linear isometry V : L 2 (/ii) —¥ L2{p,2) such that (a) f,ge L°°(m) => V(/fl) = V / • F 9 ; (b) / G L~( M i) =* IIV/Hco = ll/lloo//, moreover, VA is dense in L2(n2), then VL2(fj,i) L°°{ix2).
= L2(fi2)
and VL°°(fii)
=
Proof. The extendabiUty of V0 to a linear isometry V follows from (1), (2), (5) and (6). It follows from (3) and (7) that V is also real. Invoking Lemma 5 we have
HV/Hoo = 11/Hoo,
feA.
(1) and Lemma 6 imply that V(fg) = Vf-Vg,
/ G A,
fl
G L2(Ml).
We can deduce from Lemma 7 that V{fg) = Vf-Vg,
feA,ge
IIV/H00 = ll/lloo,
L2(m),
/ e A,
(4.3) (4.4)
1.4- Algebraic models
35
where A is the closure of A in L°°(fii). Claim 1. If / € A andis a C-valued continuous function on C, then <j> o / e A. For, note that A is a subalgebra of L°°(n\) such that g G A for <7 G .4. Let a > ll/lloo- Then there is a sequence {p„(z,z)} of polynomials in z and z converging uniformly to 0 on the disk \z\ < a. Hence {pn(f(x), f(x))} converges uniformly to <j>{f{x)) on X. Thus, cj> o f e A since pn(f, f) G .4 for n > 1. Claim 2. VL°°(/*i) C L°°(/i 2 )For, let / G L°°(/ii) and a > ||/||oo- Choose a C-valued continuous function <\> on C such that (z) = z for \z\ < a and \<j>{z)\ < a on C. Then, clearly <j>of = f ^-a.e. Moreover, choose a sequence {/„} C A such that ||/ n —/H2 —> 0 and / n —)■ / \i\-a.e. Then, /i n = <j> o / „ 6 „4 (n > 1) and /i„ -¥ (j> o f = f fi^-a.e. Since \hn\ < \ \ < a (n > 1), it follows from the Bounded Convergence Theorem that \\Vhn — V/II2 -» 0. Without loss of generality we may assume that Vhn —> Vf y,2-a.e. since otherwise we can choose an appropriate subsequence. Then we deduce that ||K/||oo < a and hence Vf G L00^) since ||V/i„||°o = U M U < a by (4.4). To show (a) we use (4.3) and Lemma 6. We note that V(fg) = Vf-Vg,
f G L°°(/n), 9 G L2(»x)
sinccA = i 2 ( ^ i ) and each / G L°°(ni) satisfies the assumptions of Lemma 6, which clearly implies (a). Then, (b) follows from Lemma 5. As to the last statement, if VA is dense in L2(fj.2), then evidently VL2(fii) = L2{n2) and V becomes a unitary operator. If we consider V~x : L2(fi2) —► £ 2 (Mi) and apply the part of the theorem proved, we get V^L00^) Q £°°(A»I) or L°°(/X2) £ VL°°(in). Thus VL°°{y,{) = £°°(M2). Definition 9. A pair (r,V2) are said to be isomorphic, if there exists an onto isomorphism U : I \ —t- r 2 such that V I ( T ) =
M), (C,
If V(T) = 1 for some 7 ^ 1 , then we can associate an algebraic measure system (f,
og
Chapter I: Entropy
Let C" = {7 e T : |<^(7)I = !}■ Then C" is a group and
= vii'Mil
T'
e C, 7 e r.
Let Ci = {7 G T : ip(-y) = 1}. Then Cx is a group and ?(7l) = ^(72),
7172 -1 € G\.
Let f = r / C i , the quotient group and define (p on T by
£(7) = P(7). 7 e r> where 7 is the equivalence class containing 7. Then it is verified that (p is positive definite on f such that £(7) = 1 iff 7 = 1, so that (f,M) is an algebraic model for fj.. In Definition 10, we may identify T and JT, so that we can consider T C r(^t) and ip = ip^. The following theorem is a main result of this section, which characterizes mea sures uniquely up to conjugacy. Theorem 1 1 . Two probability measures are conjugate iff they have isomorphic algebraic models. Proof. Let (X,-, Xj, fj,j) (j = 1,2) be a pair of probability measure spaces. If Vi - M2, then the algebraic models (T(ni), ip^) and {T(fj,2), ip^) are isomorphic by Proposition 4. Conversely, assume that nx and fi2 have isomorphic algebraic models ( r i ,
T 2 be the onto isomorphism such that
vM Let A = j £ ajfi
=
/£%).
: as £ C,/,- e r ( M l ) , l < j < n,n
> l | C L 2 ( M l ) . Then for
n
f = J2 ajfj 1=1
£ - ^ w e have that
I |/| 2 d/i!= I \Y,<*ifi dfn= f
'Eaj-Gkfjftdm
1.4. Algebraic models
37
= Y^ajakiplil(fjf^1)
Y^ajak
=
i.fc
j,k 2
= [ \j2aiuf>
dfi2.
/x 2
It follows that we can define a mapping UQ : A —> L°°(ii2) unambiguously by U
o y J2 ajfjj = Y, aJUfi'
since J2ajfj = 0 fii-a.e. implies that YLajUfj i j a linear multiplicative mapping such that ||Db/|]. = H/lla,
= 0 fj.2-a.-e. Then note that f/o is
/ € A
It then follows from Theorem 8 that UQ can be extended to a unitary operator Ui : L2(m) -> L2(fi2) such that UiLx{tn) = L°°(fi2) and Ul(fg) = U1f-U1g,
/ , j e r W .
Therefore fix — (J.2Corollary 12. Two probability measures fj,\ and fi2 are conjugate iff algebraic measure systems (r(/ii),
7€r,
JG
where (x,7) is the duality pair for x € G and 7 G Y. Define J : Y —¥ L2(fi) by
/ 7 = (-.7>»
7er.
„o
Chapter I: Entropy
Clearly, J is a homomorphism of T into T(n), JT generates L2(fi), and
Te
r.
Moreover, J is one-to-one since (-,7) = 1 fi-a.e. implies ¥>(l) = / (*- 7) »(dx) = 1, JG
so that 7 = 1The assertion about the Haar measure is obvious. Corollary 14. Every probability measure fi is conjugate to a regular Borel measure v on a compact abelian group. Proof-. (r(/i),<^M) is an algebraic model for \i and also an algebraic model for a regular Borel measure i / o n a compact abelian group G. We invoke Theorem 11 to obtain conjugacy between /x and v. Theorem 15. Let (X,X,ii) be a probability measure space. Then /i is conjugate to a Haar measure X on a compact abelian group G iff there is a group Ti C T(fi) which is a CONS ( = complete orthonormal system) of L2(n). Proof. Suppose /i ~ A and let U : L2(\) —► L2(fi) be a unitary operator such that UL°°(X) = L°°((i) and U(fg) = Uf-Ug,
f,geL°°(X).
Since the dual group G is a CONS of L2(X), T1 = UGisa. CONS of L 2 (p). Conversely, suppose that there is some Ti C T(/i) that is a CONS of L2(n). Then, (Tii w ) i s a n algebraic model of fi. Moreover, since ^ ( 7 ) = 1 iff 7 = 1, Theorem 11 implies that ( r i , ip^) is also an algebraic model of the Haar measure A of G = F i , a compact abelian group. Thus n ~ A by Theorem 11. Now let (X, X, //, 5) be a dynamical system. Denote by Us the linear isometry on L2(n) defined by Usf = f o 5 , i.e., (Usf)(x) = f(Sx) for x £ X and / G L 2 ( M ). Definition 16. Let (XJ,XJ,(J,J,SJ) (j = 1,2) be a pair of dynamical systems with measure algebras 33i and «B2, respectively. Then, 5 i and 5 2 are said to be conjugate, denoted S\ ~ 52, if there exists a measure preserving onto isomorphism T : 93! ->
39
1.4- Algebraic models
Definition 3.12). Note that Si ~ S2 iff Mi and M2 are conjugate by means of a unitary operator U : £ 2 (Mi) ->■ L2(n2) such that f t / s , = C/s2t^- This follows from Proposition 2. Definition 17. A triple (r,r 2 such that UU\ = U2U. If (X,X,fj.,S) is a dynamical system, then (r(/i),
T(n) such that JU = Us JThen we have the following. Theorem 19. Two dynamical systems are conjugate iff they have isomorphic alge braic models. Proof. Let (Xj,Xj, ftj, Sj) (3 = 1,2) be a pair of dynamical systems with associated isometries Us1 and Us2 on L2{^{) and L2(/j,2), respectively. Suppose that Si ~ £2- Then, it is easy to see that the algebraic models ( r ^ i ) , ^ ! , ^ , ) and (r(/i 2 ),¥V 2 ,£/s 2 ) are isomorphic. Conversely, assume that Si and S2 have isomorphic algebraic models (Fi, ipi, Ui) and (r2,
T2 be the onto isomorphism such that ipi = tp2 ° U and UUi = U2U. Then, by the proof of Theorem 11, 17 can be extended to a unitary operator from L 2 ( M I ) onto L2(n2), still denoted by U, such that UL°°(ni) = L°°(n2) and
U(fg) = Uf-Ug,
/,9ei°°W.
Moreover, the equality UUif = U2Uf,
f g L2(m)
is verified first for / g Fi, then for a linear combination / of functions in Fi, and finally for / g L2(fj.i) by a suitable approximation since Ti generates L2(fii).
.„
Chapter I: Entropy
Corollary 20. Two dynamical systems {XhXj,iij,Sj) (j = 1,2) are conjugate iff the algebraic measure systems (F(Hj),
ip(i)=
-yeT.
JG
By considering a mapping J : T -> T(fi) defined by
J-y = {-,-y),
T^r.
we have that (r,
= / (Tx,y) n(dx) = / JG
(x,y)v(dx).
JG
Since JX is unique in Bochner's theorem, we must have \i = v and hence r is measure preserving. Given two dynamical systems (Xj, Xjt fij, Sj) (j = 1,2) with the measure algebra B j of 3£j. If Si ~ S2, then B i ~ B2 by some measure preserving onto isomorphism T : B i -> B 2 . For 21 e V{Xi) it holds that if(2t) = H{TQL),H{% Si) = # ( T 2 l , S 2 ) and if(Si) = H(S2)- Thus, conjugate dynamical systems have the same entropy.
1.5. Entropy functionals
41
1.5. Entropy functionals In Section 1.3, we fixed a probability measure space (X,X,n) and considered a measure preserving transformation 5 on it. For a given finite partition 2t 6 V(X), the entropies i?(2t) and H(21, S) are defined and they depend on the probability measure fj,. So we denote them by if(/x,2l) and H(/j,,%S), respectively. In this section, however, given a measurable space (X, X) with an invertible measurable transformation (i.e., an automorphism) S, we consider entropies H(n, 21, S) for Sinvariant probability measures fi, so that H(-, 21, S) becomes a functional on the set of such measures. Our goal of this section is to extend this functional to the set of all C-valued 5-invariant measures, to see that the extended functional is bounded, and to obtain an integral representation of this functional in a special case of interest. To this end we need some notations. Let 21 G P(X) be a finite partition and let n 3=1
%> = .7(2,,),
a,
for n > 1. P(X) denotes the set of all probability measures on X and PS{X) the set of all 5-invariant measures in P{X). M(X) (resp. MS(X)) stands for the set of all C-valued (resp. S-invariant) measures on X, which is a Banach space with the total variation norm || ||, while M+(X) (resp. M+(X)) stands for the set of all nonnegative elements in M(X) (resp. MS(X)). For a probability measure /u € P(X) and a cr-subalgebra 2J of X, PM(^4|2)) denotes the conditional probability of A E X relative to 2J under /z. For £ e M+{X) we let fo = ^ e P(X) and P 4 (A|2J) = P?1(yL|2J). Now we fix a partition 2t € P(X) and consider a functional i/(-,2l, S) on M + ( X ) given by
ff(£,a,s) = / ^(aiaude, where J€(2t|2J) = - £ U l o g P ^ I ? ) ) -4ea
#(£,», S) = -^2 = - V
rewrite if(£,2l,S) as follows:
f PeiAfao) logP^iiiaoo) df, by (3.2), lim / P { (i4|a„) logP c (i4|a„) df,
*—* n—►oo IY A691 rf Act*
We can
£ e M+(X),
A
by Lemma 3.9 (2),
42
Chapter I: Entropy
= - lim -
V)
{(A)logf(A),
< - YJ Z(A) lo&Z(A) + f W l ° g £ W ,
by (3.4), b
y Lemma 3.8.
For a general £ € MS(X) we write
e = ^-e2+^3-^4
(i=v^i)
with & 6 M+(X), j = 1,2,3,4 and define ff(£, 21,5) = H(Z\ 21,5) - H(e, 21,5) + iff(£3,21,5) - iff (£4,21,5).
(5.1)
Then, the functional ff(•, 21,5) so defined on M S (X) is called the entropy functional for the partition 21 and the automorphism S. Trivially, ff (/i, 21, 5) coincides with the Kolmogorov-Sinai entropy for n e Pa(X) defined in Section 1.3. We shall prove that ff (•, 21,5) is Unear on M+(X) and on M3(X), and is bounded on MS(X) in a series of lemmas. Lemma 1. For any £,n £ M+(X) and a,/? > 0 ii /jo/ds i/iai ff (a£ + #,, 2t, 5) = off (£, 21,5) + /?fffo,21,5).
(5.2)
Froo/. Let (,r/6 M+(X) and o,/3 > 0, and observe that ~~
E (c*S + 0T,KA)log(a£ + 0T,)(A) Aeava„_,
= -£ E KW+#JM) log K(^) + /*?(*))
= - \ E Q^) los^) - \ E /w los iw -lE^,.»8(<,+^)-iE«-).«6(^^),(53) where in the third and fourth sums on the extreme right we consider £(A) log (a + P$$) = 0 if Z{A) = 0 and similarly 7,(71) log (/3 + a { $ ) = 0 if IJ(X) = 0. If £(A) > 0, then by log(z + 1) < x for x > 0
»,.sl«(.+^)s„„+e.^.
43
1.5. Entropy functionals
Hence we have
i«JT)aloga
a*(A) log (a +
£
< -S(X)aloga n
+
n
P^r)
-Pri{X).
Thus it follows from (5.3) that - lim - Y W + 0rj)(A) log(a£ + Pn)(A) n—¥00 n *—'
= - a lim i V £ ( A ) l o g £ ( A ) - / 3 Urn i Y 77(A) log77(A), z rf n—► oo 71 *—* A
n—^oo n — A
i.e., (5.2) holds. When a/3 = 0, (5.2) is obvious. Lemma 2. f/ £, £', 7?, 77' e M+(X)
are such that £ - 77 = £' — 77', iften
if (£, 2t, S) - ff (77,21,5) = if (£', 21, S) - if (77', 21,5).
(5.4)
Proo/. Since £ — 77 = f' — 77' implies £ + 77' = £' + 77, we see that if (f, 21, 5) + if (77', 21,5) = if (£ + 77', 21, 5),
by Lemma 1,
= if(£' +77,21,5) = H(£', 21,5) + H(77,21, 5),
by Lemma 1.
Thus (5.4) is true. Let Ml(X) be the set of all K-valued measures in MS{X). is well-defined on M J ( X ) by ff(£, 21,5) = H(t+,21,5)-
if ( r , a, S),
By Lemma 2, H (•, 21, S)
£€M;(X),
where £ = £ + — f~ with f + , £ ~ £ M + ( X ) is the Hahn decomposition. Lemma 3 . For any £,77 £ M J ( X ) and a,/? £ R ii holds that H{ai + /3T7, 21, S) = aH{£, 21,5) + /3ff (77,21,5). TTiai is, if (-,21,5) is a real linear functional on M J ( X ) .
(5.5)
44
Chapter
J: Entropy
Proof. It follows from (5.1) and Lemmas 1, 2 that # K , 21, S) = aH(t, 21, S),
a > 0, £ G MJ(X).
Hence the LHS is well-defined. If £ = £ &, where & = $ - Cfc with ^ " , C € fc=i
M+(X) for 1
F(£, 21,5) =ff( J2 £ - E C *»S) ^
fc=i
7
fc=i
ff(Dk+'^5)-ff(E^a'S)'
=
by Lemma 2,
fc=l fc=l
= E iH<&>*>5>_ ^(Ca. 5 )}- fey Lemma !n
= J2H&
-4".a-5),
by Lemma 2,
fc=i n
= £#(&, a, s). Moreover, for f = £+ - € - € Msr(X) with £+,£" G M+pC) it holds that
ff(-^,2t,S)=H(r-^+,2l,S) = if(£-,21, S ) - # ( £ + , 2 1 , S), = - # ( £ , 21, S),
by Lemma 2,
and hence for any a < 0 H(a£,2t,5) = H(a£+ - aC% =
S)
+
ff(ae ,a,S)-fT(ar,a,S)
= o£r(f'1aTsf)-oH(r,a,s) = a J E T ( { + - r , « , 5)
= aff(e,a,s). Therefore, (5.5) is proved. Lemma 4. if (-,21, S) is a bounded linear functional on MJ(Jf). More fully, \H{S,%S)\<^\\a
£eMJ(X),
45
1.5. Entropy functionate
where |2l| is the number of elements in S3, Proof. Observe that for £ e M+(X)
and A e 21
o < - /" p 4 (A|si 00 )iogP t (A|a 00 )de < ||^(A|900)iogi%(ii|sU)||4ino€(jr)
< jtlfll, where || ■ ||f]00 is the £-ess. sup norm. Thus we have
iy(e,a,s)<M|Ki|. For a general £ £ M s r (X) write f = f + - f" with £ + , £ - e M + ( X ) . Then we see that
< H(iei,2i,5) < J|l||ifin = Mjien.
\H(Z,%S)\ as desired.
Lemma 5. if(-,2l, 5) is well-defined and is a bounded linear functional on such that
\H(t,%s)\<\a\U\\,
eeMs(x).
MS(X)
(5.6)
Proof. If f e M s r (X), then it follows from (5.1) that
H(iZ,%S) = iH(Z,2L,S), H(i£,%S) iH(Z,%S), Hence it is easily seen that H(-,QL,S) is well-defined and linear by Lemmas 1, 2, 3 and 4. To see the boundedness of H(•, 2t, 5) note that for £, n e M J ( X ) max (||e||, 117,11) < U + iv\\ < VUW2 + IMP < 2||£ + : Then, we obtain \H(Z + ir,,%,S)\
= V\m,%,S)\2
+
\H(r,,%S)\*
by Lemma 4,
46
Chapter I: Entropy
Summarizing Lemmas 1 - 5 we have the following theorem. Theorem 6. The entropy functional H(-, 21, S) is a nonnegative bounded linear functional on Ma(X). Remark 7. Here are some properties of the entropy functional. Let t € M+(X) and2l,H{t, 2t, 5) < H{t, 53, S). (2) 21 < 53 => ff(£, 21,5) < #(£, 53,5). i3) Hit,S-1K,S) = Hit,VL,S). (4) # ( £ , a v 9 3 , S ) < . f f ( £ ) a , S ) + lf(£,«8,S')<2.ff(£,2lV«B,S). The entropy of 5 under t £ -Mi1" (X) is defined by
Hit, s) = sup {#(£, a, 5): a e P(x)}. If a e ViX) is such that 2 ^ = X, then
H(S,S)=H{t,X,S),
f£M+(X)
by Remark 7(1). More generally, we have the following. Theorem 8. Let {a(ra)} be a sequence in ViX) such that
a(i)
<
Then it holds that
Hit,S)=limoHitMn),S),
£ e MS+(X).
Proo/. Let t € M+(X). For any a 6 P(£) and m, n > 1 one has that
4
E ^)loe^) 2n + k < --
1 £ tiA)]ogtiA) In + fc iie%)va(m) ! , + ,. 1
Y, j iAiogPe(x|ir( .=v_n^'a(m))) de
^ea
(5.7)
47
1.5. Entropy functionals
= I(k, m, n) + J(m, n),
say.
(5.8)
For any e > 0 we can choose mo > 1 such that 0 < J{m,n)
< J(m,0) < e,
m > mo, n > 1.
Hence -r
X) ?(-A)logf(A) < J(fc,m,n) + e, Aeavat_!
If we let k ->■ co, then I(k,m,n) # ( £ ^ , 5 ) . Thus we have
-»■ # ( f , a ( m ) , S ) and the LHS of (5.8) -)•
-ff($,a,5)
=1
m > m 0 , n > 1.
m > m 0 , n > 1.
(5.9)
is monotonely nondecreasing by Remark 7 (2), lim F ( ^ a ( m ) , S ) = H £
exists. This and (5.9) imply that
B(t>S)
Hf(A) = Jfdn,
A eX.
Note that \\fif\\ = ||/||i for / € i 1 (/i)- S denotes the operator on M(X)
(S£)(A) = as-1A),
defined by
Aex,ieM{X).
A linear functional F on M M (X) is said to be S-stationary if F(S£,) = F(£) for
fe^(i).
Let 21 e V(X) be fixed and consider an entropy functional H(-, a, 5) on MS(X) n M M (X). For each £ G M g p Q , # ( £ , a , S ) is represented as ff(f,a,S)=
/ h(x)((dx), ./x
(5.10)
48
Chapter I: Entropy
where f = ^ - £ 2 + i£ 3 - if4 with & 6 M + ( X ) (j = 1,2,3,4) and h< = -
^{P^(A|aoo)]ogP4i(il|a00)-Pt.(A|SL)l0gP{.(il|Hoo) ytea
■iP { .(A|a 00 )iogP € .(A|aU)-iP C 4(i4|a 00 )iogP C 4( J 4|a 00 )}. Our goal is to find a universal function /i on X such that (5.10) holds instead of h%. The following theorem is a partial solution to this. Theorem 9. The entropy functional H(-,%L,S) on Ms{X)C\Mli{X) is extended to a functional H(-, 21, 5) on M M (X) so that it is an S-stationary nonnegative bounded linear functional. Moreover, it has an integral representation H{$, %S)=
f h(x) £(dx), Jx
£e
M^X)
by some S-invariant nonnegative measurable function h on X. Proof. Let 3 M be the cr-subalgebra of X consisting of S-invariant (mod/z) sets, i.e., 3 M = {A e X : n(S~1AAA) = 0}. £^(-13,,) denotes the conditional expectation relative to 3 M under \i. We define a functional H(-, 21, S) on M^X) by H{£,VL,S) = H ^
M
(^|j
M
)d^a,sV
£ € M M (X).
Then this functional is well-defined since - B » . ( ^ | ^ ) d / / G M S ( X ) . Moreover, it is linear. The boundedness of H(-,% S) follows from
|#(£,2l,S)| =
ff^(g|^)dM,2l,s)
^(^W^
< laillfll for £ e M M (X). To see that H{-,%S)
is 5-stationary, observe that
H &, 21,5) = H (E^ (
^
| j M ) d/x, 21,5)
=*(MiKK,»,s) ■ff«,a,5),
1.6. Relative
entropy and Kullback-Leibler
information
49
since ^p-{x) = g£(Ss) and E^Usfl^) = ^ ( / | ^ ) for every ^-measurable func tion / . To obtain an integral representation of H(•, 21,5), observe the duality M^X)* = i 1 (/i)* = L°°(JU). Then there exists uniquely a function h € L°°(fi) such that fftf,a,S)
= / *(*)€(<**).
f e Af„(X).
Clearly, ft is nonnegative and 5-invariant. If we can find a probability measure jx which dominates all of MS{X), i.e., f •< \i for £ e M S (X), then we see from the above theorem an integral representation of H(£, 21, S) (£ £ MS(X)) by a universal function h such that
#(£, % S) = [ h(x) S{dx),
(6^(1).
In this case, h(-) = /i(-,2l, 5) is called an entropy function for 21 and S. This representation can be shown without assuming an existence of a dominating measure in Section 2.7 using Ergodic Theorems and ergodic decompositions.
1.6. Relative entropy and Kullback-Leibler information In Section 1.1, we defined relative entropy H(p\q) for two finite probability dis tributions p and q. In this section, we extend this definition to an arbitrary pair of probability measures. Some properties of relative entropy are obtained. As an application, we discuss sufficiency of a cr-subalgebra in terms of relative entropies of measures. In testing hypotheses, relative entropy is used as a good criterion. We shall discuss these points in some detail. Let (X, X) be a measurable space. P(X) and V{X) are the same as before. Let 2J be a cr-subalgebra of X and take n, v G P(X). Then, the relative entropy of \i w.r.t. v relative to 2J is defined by \ !>(A)log^:ag7>®)f.
(6 0
where V(JQ) is the set of all finite 2J-measurable partitions of X. If 2) = X, then we write Hx{tAv) = H{JJ\V) and call it the relative entropy of n w.r.t. v. Note that the relative entropy is defined for any pair of probability measures. The following lemma is obvious.
gg
Chapter I: Entropy
Lemma 1. Let n,v e P(X) and 2Ji,2)2 be a-subalgebras. (1) H(IJI\V) > 0; H{p\v) = 0&fi = v.
(2) ?h c 2)2 => H 8I O*|I') < % 2 (^k) < -ff(^k)When fi<.u, relative entropy has an integral form as is shown in the following. Theorem 2. Let /J,,V G P{X). (1) If ii<.v, then
^-iM^t>-iy^^-s^ty-syi^(2) If nitis,
(6.2)
then H(n\v) = oo.
Proof. Since (2) is obvious we prove (1). Since V'(i) = ilogt (£ > 0) is convex, we have for A e X with i/(vl) > 0 and for a measurable function / > 0 that I\v(A)J -T7T Jf/ fdA ) !°g kg&\v{A)J (-TK I| --TTT A v /" log fdv (-TK log n r J f fdv)< fdA < --FT: f/ Jfflog )-v{A)J \V(A)JAA ) &\v(A)JAAJ )-v{A)JAAJ by Jensen's Inequality (cf. Section 1.2 (12)). If, in particular, / = g j , then
<*>*^/, (£*£)* ^^ist^tyIt then holds that for any 21 <E 'P(X)
EKA)log4S< E / ( ^ l o g # > = / f^log^W Taking the supremum on the LHS, we see that by (6.1)
»<*>*/, (£*£)* *<*>*/,(£*£)*• To show the reverse inequality, let . rk du k +1 n An,h = - < - £ < —— , in dv n 1
*•■"• = [£-"]■
„ 0
1.6. Relative entropy and Kullback-Leibler information
51
Then {An>k : 0 < k < n2} € P(X) for each n > 1. Moreover, by the definition of A»,fc's
/ lAn,k^dv< J ' du Jxx ' du
-v{An,k)< n n
n
v(A>,k)
n
for 0 < k < n2 — 1 and n > 1. This implies h n
< _
M(-An.fc) < fc + 1 v{An
Let # 1 = {k : *±i < 1} and Jf2 = {& : * > ; } • Since ip(t) = tlogt is decreasing on (0, | ) and increasing on ( j , l ) , we have that
log——<
;
- log - <
' r log )
' ; log
, . '
/„(x),
x i6
[, ,
feeJTi, keK2.
Define a function hn for n > 1 by
{ n2-l
, 4 , A».*
n2-l
where / „ = JZ n ^ n * an< ^ S« = X) ^ ^ 1 A „ »• It then follows that for n > 1 fc=o ' fc=o 2
^ n(A„ik) log M "' fc > V ) fc=0 "l^VkJ
= I Jx
/
/ „ log / „ di/ + ^ / s „ logp„ di/ fcgjf./A..* k€KxjA^
hn\oghndu,
which implies that if(/i|i/) > / Jx
hnloghndu.
On the other hand, let / = 4jJ and observe that for a; € X and n > 1 0 < gn{m) - fn(x) < - , n
0 < /(*) - /„(*) < &,(*) - /„(*) < -, n
j2
Chapter I: Entropy
0 < gn{x) - f(x) < gn(x) - fn{x) < -■ Then we see that 0<|/-ft»|<|/-/n|l
U An,k + \f-9n\lu
An,k,
l» > 1,
and hence / f\ogfdv= lim / hn\oghndu < H(ii\v). n Jx ^°° Jx Thus the reverse inequality is proved. Therefore (6.2) is true. Let us examine some properties of the relative entropy, which are collected in the following theorem. T h e o r e m 3 . (1) Let 2)„ (n > 1) and 2) be cr-subalgebras of X and fi, v 6 If Vn t % then H%MV) t Hv(jt\v). (2) Let 0 < a < 1 and /^, Vj € P{X) (j = 1,2). Tften, # ( a / x i + (1 - a)/i 2 |ai/i + (1 - a)i/ 2 ) < aJf(A»x|*i) + (1 - a) H
fall's).
P(X).
(6.3)
(3) 7/ ||/i n - un\\ -► 0 and \\vn - v\\ -» 0 witfi {(i„,/i, i/„, v : n > 1} C P{X),
then
H(n\u) < liminf Hfa\un).
(6.4)
n—+oo
(4) 7/ fi,veP(X),
then \\n-V\\
Proof. (1) By Lemma 1, {H) —> H ). Suppose first that H < v. For n > 1 let /n„ = ^ l ^ and vn = i/|3)„, the restrictions of n and v to 2)„, respectively. Then /i 1. If we let / „ = ^ and / = g^, then it follows from Theorem 2 that for n > 1 # S > „ ( M I " ) = / /nlog/„aV,
Hv(n\v)=
j
flogfdv.
Since {/„ : n > 1} is a martingale in Lx(u), we have / „ -» / /n log / „ - * • / log / i/-a.e. Then, Fatou's lemma implies that H»0*1*0= / / l o g / d i / < liminf / n Jx ^>°° Jx
fn\ogfndv
i/-a.e. and hence
1.6. Relative entropy and Kullback-Leibler tier information in
= liminf HVn(fi\u)
53
< H
n—too
from which we conclude that H0. Let e > 0 be such that n(A) > 2e. Since 2)„ | 2), there exist n > 1 and £ 6 2J„ such that /i(AAB) < e, v{AAB) < e. Now observe that
.i™,flSo.M") > »».W") > #B(MI")
<">
-«»»•**$+«r»«*$where <8 = {B,B C } € 7>(?J„). Note that fj.(B) > fj.(A) -e>
^/i(A),
v(B) < u{A) + e = e.
Thus one has
mv**&>\M^
^"> ^W) -"{BC){I - VM)=M(5C) - V[BC v(B<
Therefore, by (6.5) we have + M (5 C ) - i/(B e )
lim Hyn (/i|i/) > \n{A) log ^ n—too
2,
£
—> co
as e —> 0.
That is, lim HmAlAv) = co = Hm{)j\v). n—too
(2) We can assume 0 < a < 1. If /ti -^ fi or /t2 it v%, then (6.3) is true since both sides of (6.3) become oo. So we assume /ti -C v\ and /*2 -C ^2- In order to prove (6.3) it suffices to show that for any 21 € V(X)
E > Aea
r
fA\
W
, M + (1
< a £
\
"
f/i\"M
a ) M 2 ( A ) } l0S
QMi(^) +
a^ W
+
M i ^ l o g ^ + (1 " a) £
(l-a)M2(^)
( l - a ) ^
W
M A ) log J ^ .
(6.6)
54
Chapter I: Entropy
Let ci,C2,di, d2 be nonnegative constants and consider a function ip denned by r V{x)
,-
\ l i
OJCi + (1 — x)c2
= {xCl + (1 - x)c 2 } log xdi
+
{1_x)d2-
Then we see that „, . [(ci - c2){xdi + (1 - x)rf2} - (<*i - d2){xCl + (1 - x)c 2 }] 2 (£ (x) = 2 —" {xci + (1 - x)c2}{xdi + (1 - x)d 2 } for x € [0,1] since Cj,dj > 0 for j = 1,2. Thus, ^ is convex and hence for x € [0,1] since Cj,dj > 0 for j = 1,2. Thus, ^ is convex and hence ¥>(a) = p(a ■ 1 + (1 - a) • 0) < ay>(l) + (1 - a)v(0), a e [0,1], ¥>(a) = 9?(a • 1 + (1 - a) ■ 0) < ay>(l) + (1 - a)v(0), a e [0,1], which proves (6.6). Therefore (6.3) holds. which proves (6.6). Therefore (6.3) holds. (3) To prove (6.4) it suffices to show HA(ji\v) < liminf JHft(/i»K), 21 e ?>(£), n—>oo
since H(fi\v) = sup {ffa(/*|i/) : 21 € P(3f)}. ||//n - y\\ -> 0 and ||i/„ - i/|| -> 0 imply that /i„(A) -> /i(A) and i/n(A) -> v{A) for ,4 6 £. If i/(A) > 0 then
&*w£Sf-'Wfc«^If i/(A) = 0 and /x(A) 11 >> 0,0.then then ^ ^ ) l o
g
^
= oo = ^ ) l o
g
^ .
If «/(A) = 0 = n(A), then Uminf ^(A) log ^
> hminf (»n(A) - vn(A)) = 0
since logx > 1 - ± for x > 0. By the definition of H^n\vn)
we see that
liminf H%(nn\vn) > Hs{n\v). (4) Let n, v e P(X). By the Hahn decomposition there is some B e X such that (n-v)+(A)
= (n-u)(Ar\B),
{»-v)-(A)
= {»-v)(AnBc),
A e X.
55
1.6. Relative entropy and Kullback-Leibler information
It follows that Up - 4 = (M - * ) + ( X ) - (M - ^ ) - ( ^ ) = 2(/x(B) -
v{B)).
For 21 = {B, Bc} € 7>(£) it holds that ffa0il,)
= M(5) log ^ f j + (1 -
M (B))
log i ^ | | | .
Now we note that j2{ylog| + (l-j/)log|—|}>2(j/-x),
0<«
(6.7)
This can be seen as follows. Fix y € [0,1] and consider a function p defined by p(x) = y l o g - + (1 - */) log—~-^- - 2(y - x)2, X
0<x
1— X
Then we have P{X)=
(x-y)(2x-l)2 x(l-x)
*0'
and hence p(x) > p(j/) = 0. That is, (6.7) is true. Now from (6.7) we see that, by letting x = v{B) and y = n(B), ^H{n\v)
> ^2H^(p,\u)
> 2{fi(B) - u{B)) = ||p - v\\,
as was to be proved. When the measurable space (X, X) is (R,03), where R is the real line and 05 is the Borel a-algebra, we can consider Kullback-Leibler information. Probability measures on (R, 05) naturally arise from real random variables on a probability measure space (X, X, p.). Let £ and 77 be real random variables on (X, X, p), so that they have probability distributions pg and pv given by H{A)
= p,(C\A)),
pv(A)=p{r,-l(A)),
Ae%
respectively. Suppose that p% and p„ are absolutely continuous w.r.t.the Lebesgue measure dt of R, so that we have the probability density functions / and g, respec tively given by
f=% f=%
g=^eL\R)=L\R,dt). g=^eL\R)=L\R,dt).
5g
Chapter I: Entropy
In this case, the Kullback-Leibler information i(£|n) between f and 77 is defined by I(t\l) = / (/(*) lc) g/( f ) - / ( ' ) l°S9(t)) dtJR Lemma 5. Let / and pfeeprobability density functions on R. Then
Lemma 5. Let f and g be probability density functions on R. Then
(6.8)
- [arefit)finite. log/W < - [ holds f(t) logg(t) (6.8) is true if both integrals Thedtequality iff f = gdt a.e. (in the Lebesgue JU
m.pn.
JR
is true if both integrals are finite. The equality holds iff f = g a.e. (in the Lebesgue measure). Proof. Since t - 1 > logt for t > 0 and "=" holds iff t = 1, we have Proof. Since t - 1 > logt for t > 0 and "=" holds iff t = 1, we have f(t)(logg(t) - log/(t)) = / ( t ) ] 0 g ^ | < 9(t) - f(t)
a.e.,
where we define ^ = 00 (a > 0) and 0 • 00 = 0. This implies that / (/(*) log/W - f{t) \ogg{t)) dt< f (g(t) - /(*)) dt = 0, JR
JR
or (6.8) is true. Obviously, the equality holds iff f(t) = g(t) a.e. Lemma 6. Let £ and n be a pair of real random variables with probability distribu tions H£ and nv, respectively. If n$ and nv have probability density functions f and g, respectively, then
i/(^K) = /(ffo). Proof. If Hinzln,,) < 00, then ^ < fj.v by Theorem 2. Hence
m\v) = I (fit) iog/(«) - f(t) logffW) ^ = / / ( * ) log ^ d i JR 9(t)
= / %i o g ^iZ* d f /R dt
dp.n/dt
log ~ £ d/i =
H
iHi\lh,)-
57
ibler inf> 1,6. Relative entropy and Kullback-Leibler information
Similarly, it can be shown that if I(£\r]) < oo, then //j -C [in and H{fj,^\nr)) = I(£\ri). When £ is a real Gaussian random variable, the density function f(t) can be written as
11
f(
2
(t-m) \ (t-mr\
for some m G R and a > 0. In this case, the probability measure /if is also said to be Gaussian. As is well-known any two Gaussian measures fii and fi2 are equivalent (i.e., mutually absolutely continuous) or singular. That is, fii x \i2 (i.e., HI0] for /J 6 M. A subset Iff € X is called a chunk if there is some (i £ 931 such that if C if ^ and fi{K) > 0. A disjoint union of chunks is called a chain. Since A(X) < oo and X(K) > 0 for every chunk K, we see that every chain is a countable disjoint union of chunks. We also see that a union of two disjoint chains is a chain and, hence, a countable union of (not necessarily disjoint) chains is a chain. Let a = sup{A(C) : C is a chain}. Then, we can find an increasing sequence {C„} of chains such that lim A(C„) = a. n—»oo
If we let C =
U C„, then C is a
n=l
chain and A(C) = a. Moreover, there exists a sequence {Kn} of chunks such that C = U Kn with a sequence of measures {//„} C 971 such that ii"„ C K^ and M # n ) > 0 for n > 1. Let 91 = {jin : n > 1}. Obviously 9 t < 971 since 91 C 971. We shall show 9711. Take an arbitrary fi e 931. Since /i(A\ii"M) = 0 by the definition of K^, we can assume that A C K^. If /i(A\C) > 0, then X(A\C) > 0 and hence A U C is a chain with A(A U C) > A(C) = a. This contradicts the maximality of C. Thus n{A\C) = 0.
58
Chapter I: Entropy
Now observe that A(JlnC) = ^ A ( 4 n i f n ) = o n=l
since 0 = nn(A) = fi(A n Kn) = JAnKn f^ dX and Kn C #,,„ imply A(A n tf„) = 0 for n > 1. Therefore, fi{A) = n{A\C) + n(A D C) = 0. This means SOT < 01. We now introduce sufficiency. Definition 10. Let 2) be a1}. Let X A
( ) = T,^(A),
A ex.
n=l
Then, A e P(X) and SOT « {A}. Since 2) is sufficient for SOT, for each A 6 £ there exists a 2J-measurable function hA such that °° 1 f f A(An£?) = £ — / ^ „ ( U | 2 J ) d / x „ = / ^ d A , Hence EX(1A\Z)) we have
Be?).
= hA X-a.e. Take any fi e SOT and let g = jfc. Then, for any A € X
!A^dX=SAtdX=^A) =
jxE^m^
= I hAdfi= [ Ex(lA\
59
1.6. Relative entropy and Kullback-Leibler information
= / lAEx(g\
[
Ex(g\Z))d\,
JA
which implies that g = E\(g\?Q) and g is 2)-measurable X-a.e. Conversely, suppose that there exists A € P{X) such that SOT w {A} and ^ is 2J-measurable /x-a.e. for \i € SOT. Let /x e SOT and .A € SE be arbitrary and let g = jg*, which is 2)-measurable /i-a.e. Then, for B 6 2J it holds that / £ A ( U | 2 J ) d / i = / £ A (l A |2J) 5 dA JB
JB
= f \Agd\= I JB
E^lA^dn.
JB
Since B e 2) is arbitrary, we see that ExiUm
= EJIAW)
H-a.e.
Hence 2) is sufficient for SOT. R e m a r k 12. Let SOT C P ( X ) and 2) be a a-subalgebra of X. SOT is said to be homogeneous if n as v for /*, v e SOT. If SOT is homogeneous, then 2) is sufficient for SOT iff g^ is 2)-measurable for fj,, v £ SOT. This is seen as follows. Let A be any measure in SOT. Then SOT is dominated by A and Theorem 11 applies. We introduce another type of sufficiency. Definition 1 3 . So let SOT C P(X) and 2) C X, a cr-subalgebra. 2J is said to be pairwise sufficient for SOT if 2) is sufficient for every pair {jx, v) C SOT. It is clear from the definition that sufficiency implies pairwise sufficiency. To consider the converse implication, we need the following lemma. L e m m a 14. Let SOT C P(X) and 2J C X, a a-subalgebra. Then, 2) is pairwise sufficient for SOT iff, for any pair {[i, v} C SOT, diJLv\ is 2)-measurable (fi + v)-a.e. Proof. Assume that 2) is pairwise sufficient for SOT. Let ft, v e SOT be arbitrary and let A = ^ e P(X). Since 2J is sufficient for {/*, v}, we have that ^ and ^ are 2J-measurable by the argument similar to the one given in the proof of Theorem 11. Thus g7~CT is also 2J-measurable. The converse is obvious in view of Theorem 11. T h e o r e m 15. Let SOT C P(X) be dominated and 2) C X, a a-subalgebra. Then, 2) is sufficient for SOT iff 2J is pairwise sufficient for SOT.
gO
Chapter I: Entropy
Proof. The "only i f part was noted before. To prove the "if part, suppose that 2) is pairwise sufficient for 9JI. We can assume that 9JI = {fj,n ■ n > 1} in view of Lemma 9. Let
°° 1
It follows from Lemma 14 that g ^ i m is 2)-measurable for fi € fffl. Consequently, for ii G Tl dfj, dfi ( dfi ' dX d(n + A) \ d(fj, ■ is 2J-measurable. Theorem 11 concludes the proof. Now sufficiency can be phrased in terms of relative entropies. T h e o r e m 16. Let fi, v € P(X) and 2) C 31, a a-subalgebra. (1) If 2) is sufficient for {/z, v}, then H%)(fi\v) = H(ft\v). (2) If H
*<*>-««.<*>-/>*!-log ^ H x
d\ildv dpldv duo , d(i0/dv0 log -——— duo/di/Q • —— di/0 dv
/log/d£, where
and / = w^ « = 3 £ d " - Let V(t) = * log t, t> 0. Then, since fxfd£ and hence 0 < / < oo i/-a.e., we can write
=l
V (/(*)) = V(i) + {/(*) - i}V(i) + £{/(*) - i}V"(ffW) /#C^_
il2
2fl-(a:) 2p(a:) where S (x) € ( l , f(x)) or (/(x), l ) and hence 0 < g < oo v-o.e. Thus we see that
L 4>(f(x))adx)>0
bier infoi 1.6. Relative entropy and Kullback-Leibler information
61
since fxfd£ = 1- The equality holds iff / = 1 u-a.e., i.e., H(n\v) = H
,p{an))
Hi : q = (g(oi),... ,g(a„)), we have to decide which one is true on the basis of samples of size k, where p , q G A„ and Xo = {ai,... , a „ } . We need to find a set A C XQ, so that, for a sample ( i i , . . . ,3;*;), if ( x i , . . . ,Xfc) G ^4, then ifo is accepted; otherwise ifi is accepted. Here, for some e G (0,1), type 1 error probability P(A) satisfies k
J2
\{p{xj) = P{A)<e
(xi,...
,xt)€Aj=l
and type 2 error probability Q(A) is given by k
]T
J[q(xi) = l-Q{A) = Q(A%
(xi,.-,XkHAJ=1
which should be minimized. Under these notations we have the following. Theorem 17. In a hypothesis testing problem mentioned above, for any e G (0,1) and the sample size k > 1 let p(k,e) = m i n { Q ( ^ c ) : P(AC) >l-e,ACXk}.
(6.9)
Then it holds that Urn Uog(3(k,e)
= -£p(a,-)log^
=
-H(p\q).
(6.10)
Proof. Let £ i , . . . , £*. be independent, identically distributed (i.i.d.) random variables taking values in Xo with the probability distribution p = ( p ( a i ) , . . . ,p(a„)) and let
vj=iogP
M' i-s-k'
g2
Chapter I: Entropy
so that rjj's are also i.i.d. with the same mean p{o.i)
£p(^) = £ p ( ^ ) i o g g g = tf(p|q)
1 < j < k.
Hence it follows from the weak law of large numbers (cf. e.g. Rao [2]) that for any 6>0
55, P { £ & - - ff(Pl^ < <*} = L
(6-11)
For each 5 > 0 let E(k,6)=l(xu...,xk)eX*:
i^l0g^l-ir(p|q)
< «J 1.
Then by (6.11) we have that lim P(E(k, 5)) = 1,
5 > 0.
For (xi,... , Xfc) € E(k, 5) it holds that it
Y[p(xj) exp { - k(H(p\q) - 5)} > f[ J= l
q(Xj)
3=1 kk
> HP(XJ) exp { - k(H(p|q) + <$)}, > 3=1 HP(XJ)
exp { - A(tf(p|q) + <$)},
and hence for sufficiently large k and hence for sufficiently large k
Q(E(k,S)) =
J2
]l9(*i) fc
£
£
nP^)exp{-fc(H(p|q)-^)}.
(xi,...,xk)eB(k,6)j=l
By the definition of @(k, 8), (6.9), we see that -logp(k,6) < ilogQ(£(fc,5)) < -ff(p|q) + 5,
(6.12)
Bibliographical
63
notes
which implies that limsup-log£(fc,<5) < --ff(p|q) + 6.
(6.13)
On the other hand, in view of the definition of E(k, 5) we have
Q{Ac)>Q(AcnE(k,S))
E [x!,... ,xk)eAcnE(k,S)
f[ff(*i) j=l k
>
E (xi,... ,xt)€AcnE{k,6)
n^) e x P{- A ; Wp|q) + 5)} j=l
= P(AC n E(k,6)) exp { - fe(H(p|q) + 6)}. C
= P(A
n E(k,8)) exp { - fc(JT(p|q) + <5)}.
Since P(A) < e, (6.12) implies that for large enough k > 1 Since P(.A) < e, (6.12) implies that for large enough k > 1 p(Aen£(*,*))>^ and hence Q(AC) > ^
exp { - fc(# (p|q) + 6)}.
Since the RHS is independent of A it follows that P(k, S) > i ^
exp { - *(H{p|q) + 6)},
so that liminf i log/?(*,£) > - i r ( p | q ) - 5.
(6.14)
fc—foo fc
Since <S > 0 is arbitrary, combining (6.13) and (6.14) we conclude that (6.10) holds.
Bibliographical n o t e s There are some standard textbooks of information theory: Ash [1](1965), Csiszar and Korner [1](1981), Feinstein [2](1958), GaUager [1](1968), Gray [2](1990), Guiasu [1](1977), Khintchine [3](1958), Kullback [1](1959), Martin and England [1](1981), Pinsker [1](1964), and Umegaki and Ohya [1, 2](1983, 1984). As is well recognized there is a close relation between information theory and ergodic theory. For instance, Billingsley [1](1965) is a bridge between these two theories. We refer to some text books in ergodic theory as: Brown [1](1976), Cornfeld, Fomin and Sinai [1](1982),
Chapter I: Entropy
64
Gray [1](1988), Halmos [1, 2](1956, 1959), Krengel [1](1985), Ornstein [2](1974), Parry [1, 2](1969, 1981), Petersen [1](1983), Shields [1](1973) and Walters [1](1982). Practical application of information theory is treated in Kapur [1](1989) and Kapur and Kesavan [1](1992). The history of entropy goes back to Clausius who introduced a notion of entropy in thermodynamics in 1865. In 1870s, Boltzman [1, 2](1872, 1877) considered an other entropy to describe thermodynamical properties of a physical system in the micro-kinetic aspect. In 1928, Hartley [1] gave some consideration of the entropy. Then, Shannon came to the stage. In his epoch-making paper [1](1948), he really "constructed" information theory (see also Shannon and Weaver [1](1949)). The his tory of the early days and development of information theory can be seen in Pierce [1](1973), Slepian [1, 2](1973, 1974) and Viterbi [1](1973). 1.1. The Shannon entropy. Most of the work in Section 1.1 is due to Shannon [1]. The Shannon-Knintchine Axiom is a modification of Shannon's original axiom by Khintchine [1](1953). The Faddeev Axiom is due to Faddeev [1](1956). The proof of (2) => (3) in Theorem 1.4 is due to Tverberg [1](1958), who introduced a weaker condition than [1°] in (FA). 1.2. Conditional expectations. Basic facts on conditional expectation and condi tional probability are collected with or without proofs. For the detailed treatment of this matter we refer to Doob [1](1953), Ash [2](1972), Parthasarathy [3](1967) and Rao [1, 3] (1981, 1993). 1.3. The Kolmogorov-Sinai entropy. Kolmogorov [1](1958) (see also [2](1959)) introduced the entropy for automorphisms in a Lebesgue space and Sinai [l](1959) slightly modified the Kolmogorov's definition. As was mentioned, entropy is a com plete invariant among Bernoulli shifts, which was proved by Ornstein [1](1970). 
There are measure preserving transformations, called K-automorphisms, which have the same entropy but no two of them are isomorphic (see Ornstein and Shields [1](1973)). 1.4. Algebraic models. The content of this section is taken from Dinculeanu and Foias [2, 3](1968). Chi [1](1972) generalized the results in this section to projective limits of measure preserving transformations. Related topics are seen in Dinculeanu and Foias [1](1966) and Foias [1](1966).
1.5. Entropy functionals. Affinity of the entropy on the set of stationary probability measures is obtained by several authors such as Feinstein [3](1959), Winkelbauer [1](1959), Breiman [2](1960), Parthasarathy (1961) and Jacobs [4](1962). Here we followed Breiman's method. Umegaki [2, 3](1962, 1963) applied this result to consider the entropy functional defined on the set of complex stationary measures. He obtained an integral representation of the entropy functional for a special case. Most of the work of this section is due to Umegaki [3]. 1.6. Relative entropy and Kullback-Leibler information.
Theorem 6.2 is stated
in Gel'fand-Kolmogorov-Yaglom [1](1956) and proved in Kallianpur [1](1960). (4) of Theorem 6.4 is due to Csiszar [1](1967). Sufficiency in statistics was studied by several authors such as Bahadur [1](1954), Barndorff-Nielsen [1](1964) and Ghurye [1](1968). Definition 6.8 through Theorem 6.15 are obtained by Halmos and Savage [1](1949). We treated sufficiency for the dominated case here. We refer to Rao [3] for the undominated case. Theorem 6.16 is shown by Kullback and Leibler [1](1951). Theorem 6.17 is given by Stein [1](unpublished), which is stated in Chernoff [2](1956) (see also [1](1952)). Hoeffding [1](1965) also noted the same result as Stein's. Related topics can be seen in Blahut [2](1974), Ahlswede and Csiszar [1](1986), Han and Kobayashi [1, 2](1989) and Nakagawa and Kanaya [1, 2](1993).
CHAPTER II

INFORMATION SOURCES
In this chapter, information sources based on probability measures are considered. Alphabet message spaces are reintroduced and examined in detail to describe information sources, which are used later to model information transmission. Stationary and ergodic sources as well as strongly or weakly mixing sources are characterized, where relative entropies are applied. Among nonstationary sources, AMS ones are of interest and examined in detail. Necessary and sufficient conditions for an AMS source to be ergodic are given. The Shannon-McMillan-Breiman Theorem is formulated in a general measurable space and its interpretation in an alphabet message space is described. Ergodic decomposition is of interest, which states that every stationary source is a mixture of ergodic sources. It is recognized that this is a series of consequences of the Ergodic and Riesz-Markov-Kakutani Theorems. Finally, entropy functionals are treated to obtain a "true" integral representation by a universal function.
2.1. Alphabet message spaces and information sources

In Example 1.3.14 Bernoulli shifts are considered on an alphabet message space. In this section, we study this type of space in more detail. Also a brief description of measures on a compact Hausdorff space will be given. Let X₀ = {a₁, …, a_ℓ} be a finite set, called an alphabet, and let X = X₀^ℤ be the doubly infinite product of X₀ over ℤ = {0, ±1, ±2, …}, i.e.,

    X = X₀^ℤ = ∏_{k=-∞}^{∞} X_k,    X_k = X₀, k ∈ ℤ.

Each x ∈ X is expressed as the doubly infinite sequence x = (x_k) = (…, x₋₁, x₀, x₁, …).
The shift S on X is defined by

    S : x ↦ x′ = Sx = (…, x′₋₁, x′₀, x′₁, …),    x′_k = x_{k+1}, k ∈ ℤ.

Denote a cylinder set by

    [x_i⁰ ⋯ x_j⁰] = [x_i = x_i⁰, …, x_j = x_j⁰] = {x = (x_k) ∈ X : x_k = x_k⁰, i ≤ k ≤ j},

where x_k⁰ ∈ X₀ for i ≤ k ≤ j, and call it a (finite) message. One can verify the following properties:
(1) [x_i⁰ ⋯ x_j⁰] ∩ [y_i⁰ ⋯ y_j⁰] = ∅ if x_k⁰ ≠ y_k⁰ for some k = i, …, j;
(2) [x_i⁰ ⋯ x_j⁰] ⊂ [x_s⁰ ⋯ x_t⁰] for i ≤ s ≤ t ≤ j;
(3) [x_i⁰ ⋯ x_j⁰] = ∩ {[x_k⁰] : i ≤ k ≤ j};
(4) [x_s⁰ ⋯ x_t⁰] = ∪ {[x_i ⋯ x_j] : x_k = x_k⁰, s ≤ k ≤ t} for i ≤ s ≤ t ≤ j;
(5) [x_i⁰ ⋯ x_j⁰]ᶜ = ∪ {[x_i ⋯ x_j] : x_k ≠ x_k⁰ for some k = i, …, j};
(6) ∪ {[x_i] : x_i ∈ X₀} = ∪ {[x_i ⋯ x_j] : x_i, …, x_j ∈ X₀} = X.
Thus the set 𝔐 of all messages forms a semialgebra, i.e., (i) ∅ ∈ 𝔐; (ii) A, B ∈ 𝔐 ⇒ A ∩ B ∈ 𝔐; and (iii) A ∈ 𝔐 ⇒ Aᶜ = ∪_{j=1}^n B_j with disjoint B₁, …, B_n ∈ 𝔐. S is a one-to-one and onto mapping such that
(7) S⁻¹((x_k)) = (x_{k-1}) for (x_k) ∈ X;
(8) S⁻ⁿ[x_i⁰ ⋯ x_j⁰] = [y_{i+n}⁰ ⋯ y_{j+n}⁰] with y_{k+n}⁰ = x_k⁰ for i ≤ k ≤ j and n ∈ ℤ.
Let 𝔛 be the σ-algebra generated by all messages 𝔐, denoted 𝔛 = σ(𝔐). Then (X, 𝔛, S) is called an alphabet message space.
Now let us consider a topological structure of the alphabet message space (X, 𝔛, S). Letting d₀(a_i, a_j) = |i − j| for a_i, a_j ∈ X₀ and

    d(x, x′) = Σ_{k=-∞}^{∞} d₀(x_k, x′_k) / 2^{|k|},    x, x′ ∈ X,    (1.1)
we see that X is a compact metric space with the product topology and S is a homeomorphism on it. Recall that a compact Hausdorff space X is said to be totally disconnected if it has a basis consisting of closed-open (clopen, say) sets. Then we have the following:

Theorem 1. For any nonempty finite set X₀ the alphabet message space X = X₀^ℤ is a compact metric space relative to the product topology, where the shift S is a homeomorphism. Moreover, X is totally disconnected and 𝔛 is the Borel and also the Baire σ-algebra of X.

Proof. The shift S is continuous, one-to-one and onto. Hence it is a homeomorphism. X is totally disconnected. In fact, the set 𝔐 of all messages forms a basis for the product topology and each message is clopen. To see this, let U be any nonempty open set in X. It follows from the definition of the product topology that there exists a finite set J = {j₁, …, j_n} of integers such that pr_k(U) = X_k = X₀ for k ∉ J, where pr_k(·) is the projection onto the k-th coordinate space X_k. Let i = min{k : k ∈ J} and j = max{k : k ∈ J}. Then we see that, for any u = (u_k) ∈ U,

    [u_i ⋯ u_j] ⊂ U    and    U = ∪_{u∈U} [u_i ⋯ u_j].
This means that 𝔐 is a basis for the topology. Each message is clearly clopen.

In the rest of this section, we consider a compact Hausdorff space X and its Baire σ-algebra 𝔛 with a measurable transformation S on X. C(X) and B(X) denote the Banach spaces of all continuous functions and of all Baire measurable functions on X with sup-norm, respectively. As in Chapter I, M(X) denotes the Banach space of all ℂ-valued measures on X. In this case, M(X) is the space of all Baire measures on X. P(X) (resp. P_s(X)) denotes the set of all (resp. S-invariant) probability measures in M(X). Each measure μ ∈ P(X) (or P_s(X)) is called an information source (or stationary information source), or simply a source (or stationary source). A stationary source μ ∈ P_s(X) is said to be ergodic if μ(A) = 0 or 1 for every S-invariant set A ∈ 𝔛. P_se(X) denotes the set of all ergodic sources in P_s(X).

Example 2. Let X₀ = {a₁, …, a_ℓ} be an alphabet with a probability distribution p = (p₁, …, p_ℓ). Consider the alphabet message space X = X₀^ℤ with a shift S on it. For a message [x_i⁰ ⋯ x_j⁰] we define
    μ₀([x_i⁰ ⋯ x_j⁰]) = p(x_i⁰) ⋯ p(x_j⁰).    (1.2)
Then μ₀ is defined on the algebra 𝒜(𝔐) generated by 𝔐, the set of all messages, and is S-invariant with μ₀(X) = 1. By the Caratheodory extension theorem, μ₀ can be extended uniquely to an S-invariant probability measure μ on 𝔛 = σ(𝔐), i.e., μ ∈ P_s(X). This μ is called a (p₁, …, p_ℓ)-Bernoulli (information) source and S is called a (p₁, …, p_ℓ)-Bernoulli shift as in Example 1.3.14.

We claim that μ is ergodic. To see this, suppose that A ∈ 𝔛 is S-invariant and let ε > 0 be arbitrary. Choose B ∈ 𝒜(𝔐) such that μ(A Δ B) < ε and hence |μ(A) − μ(B)| < ε. Since B = ∪_{j=1}^k B_j with disjoint B₁, …, B_k ∈ 𝔐, we can choose n₀ ≥ 1 such that S⁻ⁿ⁰B has different coordinates from B. This implies that

    μ(S⁻ⁿ⁰B ∩ B) = μ(S⁻ⁿ⁰B)μ(B) = μ(B)²
by virtue of (1.2). Then we have

    μ(A Δ S⁻ⁿ⁰B) = μ(S⁻ⁿ⁰A Δ S⁻ⁿ⁰B),    since A is S-invariant,
                 = μ(S⁻ⁿ⁰(A Δ B)) = μ(A Δ B) < ε,

and hence

    μ(A Δ (B ∩ S⁻ⁿ⁰B)) ≤ μ((A Δ B) ∪ (A Δ S⁻ⁿ⁰B)) ≤ μ(A Δ B) + μ(A Δ S⁻ⁿ⁰B) < 2ε.

Consequently, it holds that |μ(A) − μ(B ∩ S⁻ⁿ⁰B)| < 2ε and

    |μ(A) − μ(A)²| ≤ |μ(A) − μ(B ∩ S⁻ⁿ⁰B)| + |μ(B ∩ S⁻ⁿ⁰B) − μ(A)²|
                  < 2ε + |μ(B)² − μ(A)²|
                  = 2ε + (μ(B) + μ(A)) |μ(B) − μ(A)|
                  < 4ε.

Therefore μ(A) = μ(A)², i.e., μ(A) = 0 or 1, and μ is ergodic. Moreover, we can see that μ is strongly mixing. This fact and ergodicity of Markov sources will be discussed in Section 2.3.

Remark 3. Let us examine functional properties of P(X) and P_s(X).
(1) Observe that M(X) = C(X)* (Riesz-Markov-Kakutani Theorem) by the identification M(X) ∋ μ = Λ_μ ∈ C(X)* given by

    Λ_μ(f) = ∫ f dμ,
    f ∈ C(X)

(cf. Dunford and Schwartz [1, IV.6]). Hence P(X) is a bounded, closed and convex subset of M(X), where the norm in M(X) is the total variation norm ‖ξ‖ = |ξ|(X) for ξ ∈ M(X). Moreover, it is weak* compact by the Banach-Alaoglu theorem (cf. Dunford and Schwartz [1, V.4]). Since B(X) contains C(X) as a closed subspace, C(X)* = M(X) can be embedded into B(X)*. For each μ ∈ P(X) we have (infinitely many) Hahn-Banach extensions η of μ onto B(X), i.e., η ∈ B(X)* and η = μ on C(X). Among these extensions η we can find a unique μ̄ ∈ B(X)* such that fₙ ↓ f implies μ̄(fₙ) ↓ μ̄(f), where μ̄(f) = ∫_X f dμ̄ for f ∈ B(X). Hereafter, we shall write

    μ(f) = ∫_X f dμ,    μ ∈ M(X), f ∈ B(X).
(2) Let us consider the measurable transformation S as an operator S on functions f on X defined by (Sf)(x) = f(Sx), x ∈ X. Then S is a linear operator on C(X), B(X) or Lᵖ(X, μ) for p ≥ 1 and μ ∈ P(X). For each n ∈ ℕ denote by Sₙ the operator on functions f on X defined by

    (Sₙf)(x) = (1/n) Σ_{k=0}^{n-1} (Sᵏf)(x) = (1/n) Σ_{k=0}^{n-1} f(Sᵏx),    x ∈ X.    (1.3)

Observe that the operator S̄ : P(X) → P(X) defined by S̄(μ) = μ∘S⁻¹ for μ ∈ P(X) is affine. Suppose that S̄ is continuous in the weak* topology on M(X), i.e., μₙ → μ (weak*) implies S̄μₙ → S̄μ (weak*). Then it follows from the Kakutani-Markov fixed point theorem (cf. Dunford and Schwartz [1, V.10]) that there is a μ ∈ P(X) such that S̄μ = μ∘S⁻¹ = μ, i.e., μ ∈ P_s(X). Hence P_s(X) is nonempty and is also a norm closed and weak* compact convex subset of P(X). By the Krein-Milman theorem (cf. Dunford and Schwartz [1, V.8]) the set ex P_s(X) of all extreme points of P_s(X) is nonempty and P_s(X) = c̄o[ex P_s(X)], the closed convex hull of ex P_s(X). Here the closure is w.r.t. the weak* topology, and μ ∈ P_s(X) is called an extreme point if α, β > 0 with α + β = 1, η, ξ ∈ P_s(X) and μ = αη + βξ imply η = ξ = μ.
(3) The operator S̄ on M(X) is continuous in the weak* topology if S is a continuous transformation on X. To see this, first we note that S is measurable. Let f ∈ C(X). Then Sf ∈ C(X) since Sf(·) = f(S·) and S is continuous. If C ∈ 𝔛 is compact, then there is a sequence {fₙ}ₙ₌₁^∞ ⊂ C(X) such that fₙ ↓ 1_C as n → ∞ since X is compact and Hausdorff. Thus 1_C(S·) = S1_C(·) is Baire measurable, i.e., S⁻¹C ∈ 𝔛. Therefore S is measurable. Now let μₙ → μ (weak*), i.e., μₙ(f) → μ(f) for f ∈ C(X). Then we have for f ∈ C(X)

    S̄μₙ(f) = ∫_X f(x) S̄μₙ(dx) = ∫_X f(x) μₙ(dS⁻¹x) = ∫_X f(Sx) μₙ(dx) = μₙ(Sf) → μ(Sf),

since Sf ∈ C(X), implying S̄μₙ → S̄μ (weak*). Therefore S̄ is continuous in the weak* topology.
2.2. Ergodic theorems

Two celebrated ergodic theorems, of Birkhoff and of von Neumann, will be stated and proved in this section. We begin with Birkhoff's ergodic theorem, where the operators Sₙ are defined by (1.3).
Theorem 1 (Birkhoff Pointwise Ergodic Theorem). Let μ ∈ P_s(X) and f ∈ L¹(X, μ). Then there exists a unique f_S ∈ L¹(X, μ) such that
(1) f_S = lim_{n→∞} Sₙf μ-a.e.;
(2) Sf_S = f_S μ-a.e.;
(3) ∫_A f dμ = ∫_A f_S dμ for every S-invariant A ∈ 𝔛;
(4) ‖Sₙf − f_S‖_{1,μ} → 0 as n → ∞, ‖·‖_{1,μ} being the norm in L¹(X, μ).
If, in particular, μ is ergodic, then f_S is constant μ-a.e.
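As a small numerical illustration (our own sketch, not part of the text): for the (p, 1−p)-Bernoulli shift of Example 2 and f = 1_{[x₀=a₁]}, the ergodic average (Sₙf)(x) is just the empirical frequency of the symbol a₁ along the orbit, and Theorem 1 predicts convergence μ-a.e. to the constant ∫ f dμ = p. Sampling the coordinates i.i.d. realizes one orbit; the function name and parameters below are ours.

```python
import random

def ergodic_average(p, n, seed=0):
    """(S_n f)(x) for f = 1_{[x0 = a1]} along one orbit of the
    (p, 1-p)-Bernoulli shift: the coordinates x_k are i.i.d. with
    P(x_k = a1) = p, so f(S^k x) is a Bernoulli(p) draw."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

p = 0.3
for n in (10, 1000, 100000):
    print(n, round(ergodic_average(p, n), 3))
# the averages approach the space average ∫ f dμ = p as n grows
```

Since the Bernoulli source is ergodic, the limit is the same constant for μ-almost every orbit, which is exactly the strong law of large numbers in this special case.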
Proof. We only have to consider nonnegative f ∈ L¹(X, μ). Let

    f̄(x) = limsup_{n→∞} (Sₙf)(x),    f̲(x) = liminf_{n→∞} (Sₙf)(x),    x ∈ X.
To prove (1) it suffices to show that

    ∫_X f̄ dμ ≤ ∫_X f dμ ≤ ∫_X f̲ dμ,

since this implies that f̄ = f̲ μ-a.e. and hence (1). Let M > 0 and ε > 0 be fixed and x ∈ X. Define

    f̄_M(x) = min{f̄(x), M},    x ∈ X,

and define n(x) to be the least integer n ≥ 1 such that

    f̄_M(x) ≤ (Sₙf)(x) + ε = (1/n) Σ_{j=0}^{n-1} f(Sʲx) + ε.

Note that n(x) is finite for each x ∈ X. Since f̄ and f̄_M are S-invariant, we have

    n(x) f̄_M(x) ≤ n(x)[(S_{n(x)}f)(x) + ε] = Σ_{j=0}^{n(x)-1} f(Sʲx) + n(x)ε,    x ∈ X.    (2.1)

Choose a large enough N ≥ 1 such that Mμ(A) ≤ ε, where A = {x ∈ X : n(x) > N}. Now we define f̃ and ñ by

    f̃(x) = f(x) for x ∉ A,  f̃(x) = max{f(x), M} for x ∈ A;
    ñ(x) = n(x) for x ∉ A,  ñ(x) = 1 for x ∈ A.

Then we see that for all x ∈ X, ñ(x) ≤ N by definition,

    Σ_{j=0}^{ñ(x)-1} f̄_M(Sʲx) ≤ Σ_{j=0}^{ñ(x)-1} f̃(Sʲx) + ñ(x)ε,    (2.2)

by (2.1) and the S-invariance of f̄_M, and that

    ∫_X f̃ dμ = ∫_{Aᶜ} f dμ + ∫_A f̃ dμ
             ≤ ∫_{Aᶜ} f dμ + ∫_A f dμ + ∫_A M dμ
             = ∫_X f dμ + ∫_A M dμ ≤ ∫_X f dμ + ε.    (2.3)
/ fdn + e. JX
Furthermore, find an integer L > 1 so that ^jf- < e and define a sequence {wfc(x)}fcL0 for each x € X by n fc (x) = n fc _i(x) + n ( 5 " f c ^ x ) ,
n0(x) = 0,
ft
> 1.
Then it holds that for x e X fc(i)
L-l
ni(a:)-l
2
E/M(^*) = E 3=0
fc=lj=nk_1(i)
L-l
7M(S'*) + £
i=n*w(s)
fM(Sjx),
where fc(x) is the largest integer ft > 1 such that rifc(s) < L — 1. Applying (2.2) to each of the k(x) terms and estimating by M the last L — rik(x)(x) terms, we have k(x) r»jt(x)-l
L-l j
T,fM(S x)
= J2
E
L-l j
fu(S x)+
fc=lj=nt_i(a;)
3=0
fc(a:)
^E
E
fM(Sjx)
j=n*(z)(z)
n*W-l
E
/ ( 5 : ' x ) + ( n f c ( a ; ) - n f c _ 1 ( x ) ) e + (L-n f c ( a : ) (a;))M
k=i L-l
< E f(Sjx) + Le + (N- 1)M 3=0
since / > 0, / M < M and L — nk^(x) and divide by L, then we get /
7M<^<
< N — 1. If we integrate both sides on X
/ /d/i + e + - — ~
< I
fdfi + Se
by the S-invariance of μ, (2.3) and (N − 1)M/L ≤ ε. Thus, letting ε → 0 and M → ∞ gives the inequality ∫_X f̄ dμ ≤ ∫_X f dμ. The other inequality ∫_X f dμ ≤ ∫_X f̲ dμ can be obtained similarly.

(2) is clear, (3) is easily verified, and (4) follows from the Dominated Convergence Theorem. Finally, if μ is ergodic, then for any r ∈ ℝ the set [f_S > r] is S-invariant (since f_S is S-invariant) and has measure 0 or 1. That is, f_S is constant μ-a.e.

Theorem 2 (von Neumann Mean Ergodic Theorem). Let μ ∈ P_s(X) and f ∈ L²(X, μ). Then there exists a unique f_S ∈ L²(X, μ) such that Sf_S = f_S μ-a.e. and ‖Sₙf − f_S‖_{2,μ} → 0 (n → ∞), where ‖·‖_{2,μ} is the norm in L²(X, μ).

Proof. Suppose g ∈ L^∞(X, μ). Then g ∈ L²(X, μ) ⊂ L¹(X, μ) and by Theorem 1, Sₙg → g_S μ-a.e. for some S-invariant g_S ∈ L¹(X, μ). Clearly g_S ∈ L^∞(X, μ) ⊂ L²(X, μ). Since |Sₙg − g_S|² → 0 μ-a.e., it follows from the Bounded Convergence Theorem that

    ‖Sₙg − g_S‖_{2,μ} → 0    (n → ∞).    (2.4)
Now let f ∈ L²(X, μ) be arbitrary. For any ε > 0 choose g ∈ L^∞(X, μ) such that ‖f − g‖_{2,μ} < ε. By (2.4) we can find an n₀ ≥ 1 such that

    ‖Sₙg − S_m g‖_{2,μ} < ε,    n, m ≥ n₀.

Since ‖Sₙf‖_{2,μ} ≤ ‖f‖_{2,μ} for n ≥ 1 we see that

    ‖Sₙf − S_m f‖_{2,μ} ≤ ‖Sₙf − Sₙg‖_{2,μ} + ‖Sₙg − S_m g‖_{2,μ} + ‖S_m g − S_m f‖_{2,μ} < 3ε,    n, m ≥ n₀.

This means that {Sₙf}_{n=1}^∞ is a Cauchy sequence in L²(X, μ). Hence there is an f_S ∈ L²(X, μ) such that ‖Sₙf − f_S‖_{2,μ} → 0. To see the S-invariance of f_S, observe that Sₙf − S(Sₙf) = (1/n)(f − Sⁿf), so that

    ‖f_S − Sf_S‖_{2,μ} = lim_{n→∞} ‖Sₙf − S(Sₙf)‖_{2,μ} ≤ lim_{n→∞} 2‖f‖_{2,μ}/n = 0,
implying that f_S = Sf_S μ-a.e. This completes the proof.
Remark 3. (1) A sharper form of the Pointwise Ergodic Theorem is obtained for an arbitrary measure space (X, 𝔛, μ) based on the Maximal Ergodic Theorem (see e.g. Rao [2]).
(2) In the Pointwise Ergodic Theorem, let ℑ be the σ-subalgebra of 𝔛 consisting of S-invariant sets. Then f_S = E_μ(f|ℑ) μ-a.e., where E_μ(·|ℑ) is the conditional expectation w.r.t. ℑ under the measure μ.
(3) In the Mean Ergodic Theorem, let 𝔖 be the closed subspace of L²(X, μ) consisting of S-invariant functions and P_𝔖 : L²(X, μ) → 𝔖 be the orthogonal projection. Then f_S = P_𝔖 f for f ∈ L²(X, μ) (see (5) below).
(4) It follows from the proof of the Mean Ergodic Theorem that the Mean Ergodic Theorem holds for every f ∈ Lᵖ(X, μ) with 1 ≤ p < ∞. That is, if 1 ≤ p < ∞, μ ∈ P_s(X) and f ∈ Lᵖ(X, μ), then there is some S-invariant f_S ∈ Lᵖ(X, μ) such that ‖Sₙf − f_S‖_{p,μ} → 0 as n → ∞, ‖·‖_{p,μ} being the norm in Lᵖ(X, μ).
(5) The outline of von Neumann's original proof of Theorem 2 is as follows. Let 𝔖 be as in (3) and ℌ = s̄pan{f − Sf : f ∈ L²(X, μ)}, where s̄pan{⋯} is the closed subspace spanned by {⋯}. Then, the first step is to show that 𝔖 and ℌ are orthogonal complementary subspaces, i.e., 𝔖 ⊕ ℌ = L²(X, μ). The next step is to prove that Sₙf → 0 in L² for f ∈ ℌ. Then, for any f ∈ L²(X, μ) write f = f₁ + f₂ with f₁ ∈ 𝔖 and f₂ ∈ ℌ. Hence we have

    ‖Sₙf − f₁‖_{2,μ} = ‖Sₙ(f₁ + f₂) − f₁‖_{2,μ} = ‖Sₙf₂‖_{2,μ} → 0,

as was desired. This tells us that Theorem 2 holds for an arbitrary measure space.
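A numerical check of the L² convergence in Theorem 2 (our own sketch, not part of the text): for the (p, 1−p)-Bernoulli shift and f = 1_{[x₀=a₁]}, the coordinates are independent, so ‖Sₙf − ∫f dμ‖²_{2,μ} = p(1−p)/n exactly, and a Monte Carlo average over many independent orbits should reproduce this O(1/n) decay. Function name and trial counts below are ours.

```python
import random

def l2_error(p, n, trials=2000, seed=1):
    """Monte Carlo estimate of ||S_n f - mu(f)||_{2,mu}^2 for the
    (p,1-p)-Bernoulli shift and f = 1_{[x0=a1]}: average the squared
    deviation of the empirical frequency over many independent orbits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        avg = sum(1 for _ in range(n) if rng.random() < p) / n
        total += (avg - p) ** 2
    return total / trials

p = 0.5
for n in (4, 16, 64):
    est = l2_error(p, n)
    exact = p * (1 - p) / n   # exact value here, by independence of coordinates
    print(n, round(est, 4), round(exact, 4))
```

The estimates shrink like 1/n, matching the exact variance computation; no such explicit rate holds for a general stationary source, where Theorem 2 only gives convergence.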
2.3. Ergodic and mixing properties

Let X be a compact Hausdorff space and 𝔛 be its Baire σ-algebra. In this section, ergodicity and mixing properties are considered in some detail. After giving the following lemma we shall characterize ergodicity of stationary sources by using ergodic theorems. Recall that two measures μ, η ∈ M(X) are said to be singular, denoted μ ⊥ η, if there is a set A ∈ 𝔛 such that |μ|(A) = ‖μ‖ and |η|(Aᶜ) = ‖η‖, i.e., μ and η have disjoint supports. Also recall that μ ∈ P_s(X) is ergodic if each S-invariant set A ∈ 𝔛 has measure 0 or 1, and P_se(X) denotes the set of all stationary ergodic sources.

Lemma 1. If μ, η ∈ P_se(X), then either μ = η or μ ⊥ η.

Proof. Suppose that μ ≠ η. Then there is an A ∈ 𝔛 such that μ(A) ≠ η(A). Let

    A_μ = {x ∈ X : lim_{n→∞} (Sₙ1_A)(x) = μ(A)},
    A_η = {x ∈ X : lim_{n→∞} (Sₙ1_A)(x) = η(A)}.

Then we see that A_μ and A_η are S-invariant, A_μ ∩ A_η = ∅, and μ(A_μ) = η(A_η) = 1 by Theorem 2.1, since μ and η are ergodic. This implies that μ ⊥ η.

Theorem 2. For a stationary source μ ∈ P_s(X) the following conditions are equivalent to each other:
(1) μ ∈ P_se(X), i.e., μ is ergodic.
(2) There is some η ∈ P_se(X) such that μ ≪ η.
(3) If ξ ∈ P_s(X) and ξ ≪ μ, then ξ = μ.
(4) μ ∈ ex P_s(X).
(5) If f ∈ B(X) is S-invariant μ-a.e., then f = const μ-a.e.
(6) f_S(x) ≡ lim_{n→∞} (Sₙf)(x) = ∫ f dμ μ-a.e. for every f ∈ L¹(X, μ).
(7) lim_{n→∞} (Sₙf, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ} for every f, g ∈ L²(X, μ).
(8) lim_{n→∞} μ((Sₙf)g) = μ(f)μ(g) for every f, g ∈ B(X).
(9) lim_{n→∞} μ((Sₙf)g) = μ(f)μ(g) for every f, g ∈ C(X).
(10) lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ(S⁻ᵏA ∩ B) = μ(A)μ(B) for every A, B ∈ 𝔛.
(11) lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ(S⁻ᵏA ∩ A) = μ(A)² for every A ∈ 𝔛.
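For the Bernoulli sources of Example 2, the averages in conditions (10) and (11) can be computed exactly from (1.2), since a message is a finite set of coordinate constraints. The encoding below (messages as dicts from coordinates to symbols, and the helper names) is our own device, not the book's notation.

```python
def mu(cyl, p):
    """Bernoulli measure (1.2) of a message given as {coordinate: symbol}."""
    out = 1.0
    for sym in cyl.values():
        out *= p[sym]
    return out

def shift_inv(cyl, k):
    """S^{-k}[x_i^0 ... x_j^0]: the same symbols at coordinates i+k ... j+k."""
    return {i + k: s for i, s in cyl.items()}

def mu_intersection(a, b, p):
    """mu(A ∩ B) for two messages; conflicting constraints give the empty set."""
    merged = dict(a)
    for i, s in b.items():
        if merged.get(i, s) != s:
            return 0.0
        merged[i] = s
    return mu(merged, p)

p = {"a": 0.3, "b": 0.7}      # a (0.3, 0.7)-Bernoulli source
A = {0: "a", 1: "b"}          # message [x0 = a, x1 = b]
B = {0: "b"}                  # message [x0 = b]
n = 50
cesaro = sum(mu_intersection(shift_inv(A, k), B, p) for k in range(n)) / n
print(round(cesaro, 4), round(mu(A, p) * mu(B, p), 4))
```

Once k is large enough that the shifted constraints of A no longer overlap those of B, the intersection measure factorizes exactly into μ(A)μ(B); hence the Cesàro averages in (10) and (11) converge to the product, and in fact the stronger mixing property of Example 7 below already holds term by term.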
Proof. (1) ⇔ (2) is obvious and (1), (2) ⇒ (3) follows from Lemma 1.
(3) ⇒ (4). Suppose (4) is false, i.e., μ ∉ ex P_s(X). Then there are α, β > 0 with α + β = 1 and ξ, η ∈ P_s(X) with ξ ≠ η such that μ = αξ + βη. Hence ξ ≠ μ and ξ ≪ μ, i.e., (3) does not hold.
(4) ⇒ (1). Assume that (1) is false, i.e., μ is not ergodic. Then there is an S-invariant set A ∈ 𝔛 for which 0 < μ(A) < 1. Hence μ can be written as the nontrivial convex combination

    μ(·) = μ(A)μ(·|A) + μ(Aᶜ)μ(·|Aᶜ),

where μ(·|A) ≠ μ(·|Aᶜ) and μ(·|A), μ(·|Aᶜ) ∈ P_s(X). This means that μ ∉ ex P_s(X), i.e., (4) is not true.
(1) ⇒ (5). Let f ∈ B(X) be real valued and S-invariant and let

    A_r = {x ∈ X : f(x) > r},    r ∈ ℝ.
Then A_r ∈ 𝔛 is S-invariant and hence μ(A_r) = 0 or 1 for every r ∈ ℝ by (1). This means f = const μ-a.e.
(5) ⇒ (6). Let f ∈ L¹(X, μ). Then f_S is measurable and S-invariant μ-a.e. by Theorem 2.1. By (5), f_S = const μ-a.e. Hence f_S = ∫_X f_S dμ = ∫_X f dμ μ-a.e.
(6) ⇒ (7). Let f, g ∈ L²(X, μ). Then, by (6), f_S = ∫_X f dμ μ-a.e., and the Mean Ergodic Theorem implies

    lim_{n→∞} (Sₙf, g)_{2,μ} = (lim_{n→∞} Sₙf, g)_{2,μ} = (f_S, g)_{2,μ} = (∫ f dμ, g)_{2,μ} = (f, 1)_{2,μ}(1, g)_{2,μ}.
(7) ⇒ (8) ⇒ (9) are obvious since C(X) ⊂ B(X) ⊂ L²(X, μ), and (9) ⇒ (7) can be verified by a simple approximation argument since C(X) is dense in L²(X, μ).
(8) ⇒ (10). Take f = 1_A and g = 1_B in (8). (10) ⇒ (11) is obvious.
(11) ⇒ (1). Let A ∈ 𝔛 be S-invariant. Then (11) implies that μ(A) = μ(A)², so that μ(A) = 0 or 1. Hence (1) holds.

Remark 3. (1) Recall that a semialgebra of subsets of X is a set 𝔛₀ such that
(i) ∅ ∈ 𝔛₀;
(ii) A, B ∈ 𝔛₀ ⇒ A ∩ B ∈ 𝔛₀;
(iii) A ∈ 𝔛₀ ⇒ Aᶜ = ∪_{j=1}^n B_j with disjoint B₁, …, B_n ∈ 𝔛₀.
As we have seen in Section 2.1, in an alphabet message space X₀^ℤ the set 𝔐 of all messages is a semialgebra. Another such example is the set 𝔛 × 𝔜 of all rectangles, where (Y, 𝔜) is another measurable space.
(2) Let μ ∈ P(X) and 𝔛₀ be a semialgebra generating 𝔛, i.e., σ(𝔛₀) = 𝔛. If μ is S-invariant on 𝔛₀, i.e., μ(S⁻¹A) = μ(A) for A ∈ 𝔛₀, then μ ∈ P_s(X). In fact, let

    𝔛₁ = {A ∈ 𝔛 : μ(S⁻¹A) = μ(A)}.

Then clearly 𝔛₀ ⊂ 𝔛₁. It is not hard to see that each set in the algebra 𝒜(𝔛₀) generated by 𝔛₀ is a finite disjoint union of sets in 𝔛₀. Hence 𝒜(𝔛₀) ⊂ 𝔛₁. Also it is not hard to see that 𝔛₁ is a monotone class, i.e., {Aₙ}ₙ₌₁^∞ ⊂ 𝔛₁ and Aₙ ↑ (or Aₙ ↓) imply ∪ Aₙ ∈ 𝔛₁ (or ∩ Aₙ ∈ 𝔛₁). Since the σ-algebra generated by 𝒜(𝔛₀) is the monotone class generated by 𝒜(𝔛₀), we have that 𝔛 = σ(𝒜(𝔛₀)) = 𝔛₁. Thus μ ∈ P_s(X).
(3) In view of (2) above, we can replace 𝔛 in conditions (10) and (11) of Theorem 2 by a semialgebra 𝔛₀ generating 𝔛. In fact, suppose that the equality in (10) of
Theorem 2 holds for A, B ∈ 𝔛₀. Then it also holds for A, B ∈ 𝒜(𝔛₀), since each A ∈ 𝒜(𝔛₀) can be written as a finite disjoint union of some A₁, …, Aₙ ∈ 𝔛₀. Now let ε > 0 and A, B ∈ 𝔛, and choose A₀, B₀ ∈ 𝒜(𝔛₀) such that μ(A Δ A₀) < ε and μ(B Δ B₀) < ε. Note that for j ≥ 0

    (S⁻ʲA ∩ B) Δ (S⁻ʲA₀ ∩ B₀) ⊂ (S⁻ʲA Δ S⁻ʲA₀) ∪ (B Δ B₀) = (S⁻ʲ(A Δ A₀)) ∪ (B Δ B₀)

and hence

    μ((S⁻ʲA ∩ B) Δ (S⁻ʲA₀ ∩ B₀)) ≤ μ(S⁻ʲ(A Δ A₀)) + μ(B Δ B₀) < 2ε,

since μ is S-invariant. This implies that

    |μ(S⁻ʲA ∩ B) − μ(S⁻ʲA₀ ∩ B₀)| < 2ε,    j ≥ 0.    (3.1)

Moreover, we have that

    |μ(S⁻ʲA ∩ B) − μ(A)μ(B)|
      ≤ |μ(S⁻ʲA ∩ B) − μ(S⁻ʲA₀ ∩ B₀)| + |μ(S⁻ʲA₀ ∩ B₀) − μ(A₀)μ(B₀)|
        + |μ(A₀)μ(B₀) − μ(A)μ(B₀)| + |μ(A)μ(B₀) − μ(A)μ(B)|
      ≤ 4ε + |μ(S⁻ʲA₀ ∩ B₀) − μ(A₀)μ(B₀)|,    (3.2)

which is irrelevant for ergodicity but is needed for the mixing properties in Theorem 6 and Remark 11 below. Consequently, by (3.1) it holds that

    |(1/n) Σ_{j=0}^{n-1} μ(S⁻ʲA ∩ B) − μ(A)μ(B)|
      ≤ (1/n) Σ_{j=0}^{n-1} |μ(S⁻ʲA ∩ B) − μ(S⁻ʲA₀ ∩ B₀)|
        + |(1/n) Σ_{j=0}^{n-1} μ(S⁻ʲA₀ ∩ B₀) − μ(A₀)μ(B₀)|
        + |μ(A₀)μ(B₀) − μ(A)μ(B)|
      ≤ 4ε + |(1/n) Σ_{j=0}^{n-1} μ(S⁻ʲA₀ ∩ B₀) − μ(A₀)μ(B₀)|,

where the second term on the RHS can be made < ε for large enough n. This means that (10) of Theorem 2 holds.
(4) Condition (11) of Theorem 2 suggests that the following conditions are equivalent to any one of (1)–(11) of Theorem 2:
(7′) lim_{n→∞} (Sₙf, f)_{2,μ} = |(f, 1)_{2,μ}|² for every f ∈ L²(X, μ);
(8′) lim_{n→∞} μ((Sₙf)f) = μ(f)² for every f ∈ B(X);
(9′) lim_{n→∞} μ((Sₙf)f) = μ(f)² for every f ∈ C(X).
(5) ℑ denotes the σ-subalgebra consisting of S-invariant sets in 𝔛, i.e., ℑ = {A ∈ 𝔛 : S⁻¹A = A}. For μ ∈ P(X) let

    ℑ_μ = {A ∈ 𝔛 : μ(S⁻¹A Δ A) = 0},

the set of all μ-a.e. S-invariant, or S-invariant (mod μ), sets in 𝔛. Clearly ℑ ⊂ ℑ_μ. Then we can show that μ ∈ P_s(X) is ergodic iff μ(A) = 0 or 1 for every A ∈ ℑ_μ. In fact, the "if" part is obvious. To prove the "only if" part, let A ∈ ℑ_μ. First we note that μ(S⁻ⁿA Δ A) = 0, n ≥ 0. For, if n ≥ 1, then

    S⁻ⁿA Δ A ⊂ ∪_{j=0}^{n-1} (S^{-(j+1)}A Δ S⁻ʲA) = ∪_{j=0}^{n-1} S⁻ʲ(S⁻¹A Δ A)

and hence

    μ(S⁻ⁿA Δ A) ≤ n μ(S⁻¹A Δ A) = 0.
n-»oo
smce fi(A00AA)
= n[ ( n U S" J A AA \ \n=Qj=n
V \n=0j'=n
/
/
J
* "((JL^H
/
oo
< ] £ / * ( 5 - ' A A J 4 ) = 0. j=n
Finally, we note that S _ 1 J4OO = -Aoo, since
i
s-^oo=s- nu ~ = n u s-« A = n u s~iA=Aoo
oo
n=Oj=n
S JA
oo
oo
n=0 j'=n
+i)
oo
oo
n=Oj=n+l
It follows from ergodicity of \i that //(^oo) = M ( ^ ) = 0 or 1.
gO
Chapter
II: Information mS, Sources
E x a m p l e 4. Consider an (M, m)-Markov source fi, where M = (my) is an £ x £ stochastic matrix and m is a row probability vector such that m = m M with mi > 0, 1 < i < £. If we write Mk = (my') for k > 1, then this gives the k-step transition probabilities, i.e., rojf = Pr{xk = aj \x0 = o j ,
1 < i, j < t
M or /i is said to be irreducible if for each i, j there is some k > 1 such that m y We first claim that 1 n_1
N=
> 0.
lim - V M * n-+oo n *—' fc=0
exists and N = (ny) is a stochastic matrix such that JVM = MN = N = N2 In fact, let Ai = [XQ = a»], 1 < « < £ and apply the Pointwise Ergodic Theorem to / = lAi ■ Then we have that
fs(x)=
lim
1 "_1
- V u t A ) fc=0
exists fi-a.e.x and — / fs(x)lAj(x)
ii{dx) = — l i m
I V ^ S - ^ n ^ )
1 "_1 = Urn - Y > j , f c ) = n y fe=0
(3.3)
for l 0 for every i, j . (4) /i is irreducible. (1) =*• (2). By (3.3) we see that for each i,j 1 "_1
=m n „^S, nZ ^»tix° ==«*.»* ""'Xk ==°'"D ^' y, ^S, ^ Z) %]) ='m,n fc=0 fc=0 fc=0
while the ergodicity of μ implies that the RHS is also equal to m_i m_j. Hence n_{ij} = m_j.
(2) ⇒ (3) is clear since we are assuming m_i > 0 for every i.
(3) ⇒ (4). For any i, j, lim_{n→∞} (1/n) Σ_{k=0}^{n-1} m_{ij}^{(k)} = n_{ij} > 0. This implies that we can find some k ≥ 1 such that m_{ij}^{(k)} > 0. That is, μ is irreducible.
It is not hard to show the implications (4) ⇒ (3) ⇒ (2) ⇒ (1) and we leave it to the reader.

We now consider mixing properties for stationary sources, which are stronger than ergodicity.

Definition 5. A stationary source μ ∈ P_s(X) is said to be strongly mixing (SM) if
n B) = u,(A)u(B),
A,BeX,
n-¥oo
and to be weakly mixing (WM) if and to be weakly mixing (WM) if n-l
lim - nV- l \n(S-kA n B) - n(A)u{B) 1=0,
lim - fc=0 V n—>oo n *-~* fc=0
k
\n(S- A
A,BeX.
n B) - n(A)u{B) 1 = 0 ,
A,BeX.
It follows from the definition and Theorem 2 that strong mixing => weak mixing =*• ergodicity. First we characterize strong mixing. T h e o r e m 6. For a stationary source fi G PS{X) the following conditions are equiv alent to each other: (1) n is strongly mixing. (2) lim (Snf,g)2,li = (/, l) 2 l / .(l, ff)a,M for every f,ge L2(X,fj.). That is, S " / ->
/,
f dfj, weakly in L2(X,fj.) for every f £
L2(X,fi).
(3) lim ( S n / , / ) 2 i M = | ( / , l) 2 , / i | 2 for every f e L2(X, n->oo
n
'
(4) lim fj,(S~ A nA) = fi(A)
2
'
for every
M ).
AeX.
n—too
(5) lim [i(S~nA n A) = (i(A)2 for every A G Xo, a generating semialgebra. n—+oo
Proof. (2) =>■ (1) is seen by considering / = 1A and g = 1B- (2) =3- (3) => (4) => (5) is clear. (5) => (4) follows from (3.1) and (3.2) with A = B and A0 = B0.
tion So Chapter II: Information Sources
g2
(1) => (2). Let A,BeX.
Then by (1) we have
lim ( S n U , lu)a, M = n->oo
lim
KS~nA
n B) = M ^ M # ) = ( U , l ) a * ( l i 1B)2, M .
n-K»
If / = E " i 1 ^ , and g = £ Pk^Bk, simple functions, then 3=1
fc=l
lim(Sn/,s)2,M= n-K»
Um
y^«i/9fc(SBlxi,lBjaJ.
n->oo-^—'
= 22«i^fc(l^i.l)2,*i( 1 ' 1BJ2,,I = (/, 1)2,^(1,3)2^ Hence the equality in (2) is true for all simple functions / , g. Now let / , g e L2(X, (i) and e > 0. Choose simple functions /o and 30 such that | | / — /olh./i < e and llfl — 3O||2,M < £• Also choose an integer no > 1 such that I(S n /o,So)2,M-(/o,l)2, M (l,9o)2,^\ <£,
n>n0.
Then we see that for n > no \(Snf,g)2lli-(f,
1)2,^(1,5)2^1 n
< |(S / lff ) 2 ,M - (S"/o, 3)2,^| + |(S n /o,3)2,^ - (S"/o,ffo)2, M | + |(S n /o,5o)2, / i - (/O,1)2,M(1,5O)2,A.|
+ |(/o, 1)2,^(1,50)2,^ - (/, 1)2,^(1,50)2,^1 + i(/, 1)2,^(1,50)2,^ - (/, 1 ) 2 , ^ ( 1 , 5 ^ 1 < | ( S n ( / - fo),g)2J
+ |(Sn/0,5 -5O)2,M| + e
+ I ( / - /o, 1)2,^| I(l,5o)2,^| + I ( / , 1 ) 2 J , | I(1,5 - 5 o ) 2 , „ | < 11/ - /ollajlfflhj. + WfohJg
- 5O||2,M + e
+ ll/-/o||2,^||5o||2^ + ||/||2,^||5-5o||2,^ < e||fflk„ + l l / o l k ^ + e + e\\go\\2,u + [|/||a*e < j\9fo*
+ ( l l / l t a + e)e + e + e(||5lk„ + e) + e\\f\\2tll.
It follows that 1 n
™ 0 (S n /,5)2, M = (/, 1)2,^(1,5)2,M.
(4) =» (3) is derived by a similar argument as in the proof of (1) =>• (2) above.
83
2.3. Ergodic and mixing properties
(3) =*■ (2). Take any / 6 L2(X,n)
and let
-H = 6{Snf,c
: c € C,n > 0},
the closed subspace of L2(X, n) generated by the constant functions and S n / , n > 0. Now consider the set Mi = {g € L2(X,»)
: Umg(Snf,g)2tll
= (/.^(l.fl),^}.
Clearly Mi is a closed subspace of L2(X, (i) which contains / and constant functions, and is S-invariant. Hence Mi contains M. To see that Mi = L2(X,(i) let g e M x , the orthogonal complement of M in L2(X,(j,). Then we have (S"/,ff)a,M = 0 ( « > 0 )
and
(1.5)2^ = 0,
so that g € Mi. Thus M"1 C Mi. Therefore Mi = L2(X,fi),
i.e., (2) holds.
In Theorem 6 (2) and (3), L2(X, ft) can be replaced by B(X) or
C(X).
Example 7. Every Bernoulli source is strongly mixing. To see this, let μ be a (p₁, …, p_ℓ)-Bernoulli source on X = X₀^ℤ. Let A = [x_i⁰ ⋯ x_j⁰], B = [y_s⁰ ⋯ y_t⁰] ∈ 𝔐. Then it is clear that

    lim_{n→∞} μ(S⁻ⁿA ∩ B) = μ(A)μ(B),

since for a large enough n ≥ 1 we have n + i > t. By Theorem 6, μ is strongly mixing.

In order to characterize weak mixing we need the following definition and lemma.

Definition 8. A subset J ⊂ ℤ₊ = {0, 1, 2, …} is said to be of density zero if

    lim_{n→∞} (1/n) |J ∩ Jₙ| = 0,

where Jₙ = {0, 1, 2, …, n − 1} (n ≥ 1) and |J ∩ Jₙ| is the cardinality of J ∩ Jₙ.

Lemma 9. For a bounded sequence {aₙ}ₙ₌₁^∞ of real numbers the following conditions are equivalent:
(1) lim_{n→∞} (1/n) Σ_{j=0}^{n-1} |a_j| = 0;
(2) lim_{n→∞} (1/n) Σ_{j=0}^{n-1} |a_j|² = 0;
(3) there is a set J ⊂ ℤ₊ of density zero such that lim_{J∌n→∞} aₙ = 0.
Chapter II: Information
Sources
Proof. If we can show (1) <=> (3), then (2) <3- (3) is easily verified by noting that lim
an = 0 <=*• lim
J$n—too
an = 0.
J$n—foo
So we prove (1) <=> (3). (1) => (3). Suppose (1) is true and let (1) => (3). Suppose (1) is true and let
Observe that £ i C £ Observe that £ i C £
2 2
Ek = In G Z + : | a n | > i | ,
k > 1.
Ek = in G Z+ : |o»j > ~\,
k> 1.
0 - ' and each £fc has density zero since 0 - ' and each Ek has density zero since
-\EknJn\<-f2fa\->0 n
n *—'
J
as n —> oo by (1). Hence for each k = 1,2,... we can find an integer j j . > 0 such that 1 = j 0 < J! < j 2 < ■•• and -\Ek+i n J „ | < ^-j-y,
n > i&.
(3.4)
Now we set J = U (£* n [j'fc-i.jfc)). We first show that J has density zero. If jk-i
JnJ„ = [Jnlo.jfc-x)] u [Jn[jfc_!,n)] c [B t n[o,i M )]u[£ w n[o,n)] and hence by (3.4)
~\Jn Jn\ < l(\Ek n [o,jfc_i)| + |^fc+i n [o,n)|) = ^(l^*nj A _,| + |Bfc+1nJn|) <±(\Eknjn\
< ii +T H 'l
+
\Ek+1njn\)
'
This implies that £ | J n J „ | —> 0 as n —> oo, i.e., J has density zero. Secondly, we show that lim a n = 0. If n > jfc and n 4 J, then n 4 Ek and J$n—foo
|a„| < £ T J . This gives the conclusion.
ng properties pi 2.3. Ergodic and mixing
85
(3) => (1). Suppose the existence of a set J C Z + and let e > 0. Then,
j=o
j'6J„nJ
jeJnnJ'
Since {an} is bounded and J has density zero, the first term can be made < e for large enough n. Since a„ —> 0 as n —>■ oo and n $ J, the second term can also be made < e for large enough n. Therefore (1) holds. Theorem 10. For a stationary source fx £ Ps{X) the following conditions are equivalent to each other: (1) fj. is weakly mixing. (2) For any A,B e X there is a set J C Z+ of density zero such that lim
fj,(S-nA C\B) =
n(A)p(B).
J$ n—KX>
(3) lim - T ; U(5- J '-4 n B) - /i(A)/z(B)| 2 = 0 for every
A,BeX.
re-t-oo n j = o
(4) lim - " E | ( S k / , j ) 2 , , - (/,1)2,,(1,»)2,,| = 0 /or e ^ n , / , j £ L 2 ( I , , i ) . n-»oo n
k=0
(5) /J x /i is weakly mixing relative to S x S, where fix fi is the product measure on (XxX,X®X). (6) n x T) is ergodic relative to S xT, where (Y, %),n,T) is an ergodic dynamical system, i.e., n € Pse(Y). (7) fj, x fi is ergodic relative to S x S. Remark 11. In Theorem 10 above, conditions (2), (3) and (4) may be replaced by (2'), (2"), (3'), (3") and (4'), (4") below, respectively, where X0 is a semialgebra generating X: (2') For any A,B £ XQ there exists a set J C Z + of density zero such that lim /x(S~ n A n B) = fi(A)n(B). J$ n—>oo
(2") For any A e XQ there exists a set J C Z+ of density zero such that Urn fi{S-nAC\A) = n{A)2. J$ n—>-oo
(3') lim - V ; \n(S-jA n-K» n
HA)-
fi(A)2\2
r\A)-
fi(A)2\ = 0 for every A e X0.
= 0 for every A e X0.
j=Q
(3") lim - nT. WS'iA n-voo n J=Q
gg
Chapter
ion Sot II: Information Sources
(4') lim - i f |(SV, / ) 2 , „ - | ( / , 1) 2 ,„| 2 | = 0 for every / € L2(X, fi). n-s-oo n j = o
(4") Urn - " E K S ^ ' / . ^ ^ - K / . ^ ^ P I ^ O for e v e r y / e L 2 ( X , M ) . n-voo n J=Q
Proof of Theorem 10. (1) $\Leftrightarrow$ (2) $\Leftrightarrow$ (3) follows from Lemma 9 with
$$a_n = \mu(S^{-n}A \cap B) - \mu(A)\mu(B), \qquad n \ge 1.$$
(1) $\Rightarrow$ (4) can be verified first for simple functions and then for $L^2$-functions by a suitable approximation, as in the proof of (1) $\Rightarrow$ (2) of Theorem 6. (4) $\Rightarrow$ (1) is trivial.
(2) $\Rightarrow$ (5). Let $A, B, C, D \in \mathfrak{X}$ and choose sets $J_1, J_2 \subset \mathbb{Z}^+$ of density zero such that
$$\lim_{J_1 \not\ni n \to \infty} \mu(S^{-n}A \cap B) = \mu(A)\mu(B), \qquad \lim_{J_2 \not\ni n \to \infty} \mu(S^{-n}C \cap D) = \mu(C)\mu(D).$$
It follows that
$$\lim_{J_1 \cup J_2 \not\ni n \to \infty} (\mu \times \mu)\big((S \times S)^{-n}(A \times C) \cap (B \times D)\big) = \lim_{J_1 \cup J_2 \not\ni n \to \infty} \mu(S^{-n}A \cap B)\,\mu(S^{-n}C \cap D) = \mu(A)\mu(B)\mu(C)\mu(D) = (\mu \times \mu)(A \times C)\,(\mu \times \mu)(B \times D).$$
Since $\mathfrak{X} \times \mathfrak{X} = \{A \times B : A, B \in \mathfrak{X}\}$ is a semialgebra and generates $\mathfrak{X} \otimes \mathfrak{X}$, and $J_1 \cup J_2 \subset \mathbb{Z}^+$ is of density zero, we invoke Lemma 9 and Remark 11 to see that $\mu \times \mu$ is weakly mixing.
(5) $\Rightarrow$ (6). Suppose $\mu \times \mu$ is weakly mixing and $(Y, \mathfrak{Y}, \eta, T)$ is an ergodic dynamical system. First we note that (5) implies (2) and hence $\mu$ itself is weakly mixing. Let $A, B \in \mathfrak{X}$ and $C, D \in \mathfrak{Y}$. Then
$$\frac{1}{n}\sum_{j=0}^{n-1}(\mu \times \eta)\big((S \times T)^{-j}(A \times C) \cap (B \times D)\big) = \frac{1}{n}\sum_{j=0}^{n-1}\mu(A)\mu(B)\,\eta(T^{-j}C \cap D) + \frac{1}{n}\sum_{j=0}^{n-1}\big\{\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big\}\,\eta(T^{-j}C \cap D). \tag{3.5}$$
The first term on the RHS of (3.5) converges to
$$\mu(A)\mu(B)\eta(C)\eta(D) = (\mu \times \eta)(A \times C)\,(\mu \times \eta)(B \times D) \qquad (n \to \infty)$$
since $\eta$ is ergodic. The second term on the RHS of (3.5) tends to 0 $(n \to \infty)$ since
$$\Big|\frac{1}{n}\sum_{j=0}^{n-1}\big\{\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big\}\,\eta(T^{-j}C \cap D)\Big| \le \frac{1}{n}\sum_{j=0}^{n-1}\big|\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big| \to 0 \qquad (n \to \infty),$$
since $\mu$ is weakly mixing. Thus $\mu \times \eta$ is ergodic since $\mathfrak{X} \times \mathfrak{Y}$ is a semialgebra generating $\mathfrak{X} \otimes \mathfrak{Y}$.
(6) $\Rightarrow$ (7) is trivial.
(7) $\Rightarrow$ (3). Let $A, B \in \mathfrak{X}$ and observe that
$$\frac{1}{n}\sum_{j=0}^{n-1}\mu(S^{-j}A \cap B) = \frac{1}{n}\sum_{j=0}^{n-1}(\mu \times \mu)\big((S \times S)^{-j}(A \times X) \cap (B \times X)\big) \to (\mu \times \mu)(A \times X)\,(\mu \times \mu)(B \times X) = \mu(A)\mu(B), \quad \text{by (7)},$$
$$\frac{1}{n}\sum_{j=0}^{n-1}\mu(S^{-j}A \cap B)^2 = \frac{1}{n}\sum_{j=0}^{n-1}(\mu \times \mu)\big((S \times S)^{-j}(A \times A) \cap (B \times B)\big) \to (\mu \times \mu)(A \times A)\,(\mu \times \mu)(B \times B) = \mu(A)^2\mu(B)^2, \quad \text{by (7)}.$$
Combining these two, we get
$$\frac{1}{n}\sum_{j=0}^{n-1}\big|\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big|^2 = \frac{1}{n}\sum_{j=0}^{n-1}\big\{\mu(S^{-j}A \cap B)^2 - 2\mu(S^{-j}A \cap B)\mu(A)\mu(B) + \mu(A)^2\mu(B)^2\big\} \to \mu(A)^2\mu(B)^2 - 2\mu(A)\mu(B)\mu(A)\mu(B) + \mu(A)^2\mu(B)^2 = 0.$$
Thus (3) holds.

Example 12. Let $\mu$ be an $(M, m)$-Markov source on $X = X_0^{\mathbb{Z}}$. $M$ is said to be aperiodic if there is some $n_0 \ge 1$ such that $M^{n_0}$ has no zero entries. Then, the following statements are equivalent:
(1) $\mu$ is strongly mixing.
(2) $\mu$ is weakly mixing.
(3) $M$ is irreducible and aperiodic.
(4) $\lim_{n \to \infty} m_{ij}^{(n)} = m_j$ for every $i, j$, where $m_{ij}^{(n)}$ denotes the $(i, j)$ entry of $M^n$.
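Condition (4) of Example 12 can be observed directly by powering the transition matrix. The Python sketch below is our own illustration (the particular $2 \times 2$ matrix $M$ and its stationary vector $m$ are toy choices, not from the text): it checks that some power of $M$ is strictly positive and that every row of $M^n$ approaches $m$.

```python
def mat_mul(A, B):
    # product of two square matrices given as lists of rows
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, p):
    # p-th power of M (p >= 0), starting from the identity
    R = [[float(i == j) for j in range(len(M))] for i in range(len(M))]
    for _ in range(p):
        R = mat_mul(R, M)
    return R

# an irreducible, aperiodic transition matrix and its stationary vector
M = [[0.9, 0.1],
     [0.3, 0.7]]
m = [0.75, 0.25]      # solves m M = m with m_0 + m_1 = 1
```

Here $M$ itself already has no zero entries ($n_0 = 1$), and the rows of $M^{50}$ agree with $m$ to high precision, which is the mixing statement (4) in matrix form.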
The proof may be found in Walters [1, p. 51].

In the rest of this section we use relative entropy to characterize ergodicity and mixing properties. Recall that for a pair of information sources $\mu$ and $\nu$ the relative entropy $H(\nu|\mu)$ of $\nu$ w.r.t. $\mu$ was obtained by
$$H(\nu|\mu) = \sup\Big\{\sum_{A \in \mathfrak{A}} \nu(A)\log\frac{\nu(A)}{\mu(A)} : \mathfrak{A} \in \mathcal{P}(X)\Big\} = \begin{cases}\displaystyle\int_X \log\frac{d\nu}{d\mu}\,d\nu, & \text{if } \nu \ll \mu,\\[2pt] \infty, & \text{otherwise}\end{cases}$$
(cf. (1.6.1) and Theorem 1.6.2). The following lemma is necessary.

Lemma 13. Let $\mu_n\ (n \ge 1), \mu \in P(X)$. Suppose $\mu_n \le a\mu$ for $n \ge 1$, where $a > 0$ is a constant. Then $\lim_{n \to \infty} \mu_n(A) = \mu(A)$ uniformly in $A \in \mathfrak{X}$ iff $\lim_{n \to \infty} H(\mu_n|\mu) = 0$.

Proof. The "if" part follows from Theorem 1.6.3 (4). To see the "only if" part, observe that $\{\frac{d\mu_n}{d\mu}\}$ is uniformly bounded and converges to 1 in probability (w.r.t. $\mu$) by assumption. Since
$$|t\log t| \le |t - 1| + \tfrac{1}{2}(t - 1)^2, \qquad t > 0,$$
we have that $\{\frac{d\mu_n}{d\mu}\log\frac{d\mu_n}{d\mu}\}$ converges to 0 in probability. Thus, since $\{\frac{d\mu_n}{d\mu}\log\frac{d\mu_n}{d\mu}\}$ is uniformly bounded, we also have
$$\lim_{n \to \infty} H(\mu_n|\mu) = \lim_{n \to \infty}\int_X \frac{d\mu_n}{d\mu}\log\frac{d\mu_n}{d\mu}\,d\mu = 0.$$
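Over a finite partition the relative entropy reduces to the finite sum $\sum_A \nu(A)\log(\nu(A)/\mu(A))$, and the "only if" direction of Lemma 13 can be watched numerically. In the Python sketch below the measures are our own toy data, chosen so that $\nu_n \le 2\mu$ (the lemma's domination hypothesis) and $\nu_n \to \mu$ uniformly; the relative entropies then decrease to 0.

```python
import math

def rel_entropy(nu, mu):
    # H(nu|mu) over a finite partition: sum_A nu(A) log(nu(A)/mu(A))
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

mu = [0.5, 0.3, 0.2]

def nu_n(n):
    # nu_n -> mu uniformly as n grows; perturbation sums to 0, nu_n <= 2*mu
    u = [0.1, -0.05, -0.05]
    return [q + ui / n for q, ui in zip(mu, u)]
```

The decay here is of order $1/n^2$, reflecting the quadratic behaviour of $t\log t$ near $t = 1$ used in the proof.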
We introduce some notations. Let $\mu \in P(X)$. For each $n \ge 1$ define a measure $\hat\mu_n$ on $\mathfrak{X} \otimes \mathfrak{X}$ by
$$\hat\mu_n(A \times B) = \frac{1}{n}\sum_{j=0}^{n-1}\mu(S^{-j}A \cap B), \qquad A, B \in \mathfrak{X}.$$
For a finite partition $\mathfrak{A} \in \mathcal{P}(X)$ of $X$, $\mu_{\mathfrak{A}}$ denotes the restriction of $\mu$ to $\tilde{\mathfrak{A}} = \sigma(\mathfrak{A})$, i.e., $\mu_{\mathfrak{A}} \in P(\tilde{\mathfrak{A}})$. For $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$ denote by $\mathcal{A}(\mathfrak{A} \times \mathfrak{B})$ the algebra generated by the set $\{A \times B : A \in \mathfrak{A}, B \in \mathfrak{B}\}$ of rectangles. We also let
$$H_n(\mathfrak{A}, \mathfrak{B}) = \sum_{A \in \mathfrak{A}, B \in \mathfrak{B}} \hat\mu_n(A \times B)\log\frac{\hat\mu_n(A \times B)}{\mu(A)\mu(B)},$$
$$H_{\hat\mu_n}(\mathfrak{A} \times \mathfrak{B}) = -\sum_{A \in \mathfrak{A}, B \in \mathfrak{B}} \hat\mu_n(A \times B)\log\hat\mu_n(A \times B),$$
$$H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B}) = -\sum_{A \in \mathfrak{A}}\mu(A)\log\mu(A) - \sum_{B \in \mathfrak{B}}\mu(B)\log\mu(B) = H_\mu(\mathfrak{A}) + H_\mu(\mathfrak{B}), \text{ say}.$$
Now we have the following.

Proposition 14. For a stationary source $\mu \in P_s(X)$ the following statements are equivalent to each other:
(1) $\mu$ is ergodic.
(2) $\lim_{n \to \infty} H_n(\mathfrak{A}, \mathfrak{B}) = 0$ for every $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.
(3) $\lim_{n \to \infty} H_{\hat\mu_n}(\mathfrak{A} \times \mathfrak{B}) = H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B})$ for every $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.

Proof. (1) $\Leftrightarrow$ (2). By Theorem 2, $\mu$ is ergodic iff
$$\lim_{n \to \infty}\hat\mu_n(A \times B) = \mu(A)\mu(B), \qquad A \in \mathfrak{A},\ B \in \mathfrak{B},\ \mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X).$$
If we fix $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$, then the convergence above is uniform on $\mathcal{A}(\mathfrak{A} \times \mathfrak{B})$ and
$$H_n(\mathfrak{A}, \mathfrak{B}) = H\big(\hat\mu_n^{\mathfrak{A},\mathfrak{B}}\,\big|\,\mu_{\mathfrak{A}} \times \mu_{\mathfrak{B}}\big),$$
where $\hat\mu_n^{\mathfrak{A},\mathfrak{B}} = \hat\mu_n|_{\mathcal{A}(\mathfrak{A} \times \mathfrak{B})}$, the restriction of $\hat\mu_n$ to $\mathcal{A}(\mathfrak{A} \times \mathfrak{B})$. Thus by Lemma 13, (1) $\Leftrightarrow$ (2) holds.
(2) $\Leftrightarrow$ (3) is clear since for $n \ge 1$ and $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$
$$H_n(\mathfrak{A}, \mathfrak{B}) = -H_{\hat\mu_n}(\mathfrak{A} \times \mathfrak{B}) + H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B}).$$

To characterize mixing properties we let, for $n \ge 1$,
$$\tilde\mu_n(A \times B) = \mu(S^{-n}A \cap B), \qquad A, B \in \mathfrak{X},$$
$$\tilde H_n(\mathfrak{A}, \mathfrak{B}) = \sum_{A \in \mathfrak{A}, B \in \mathfrak{B}} \tilde\mu_n(A \times B)\log\frac{\tilde\mu_n(A \times B)}{\mu(A)\mu(B)}, \qquad \mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X),$$
$$H_{\tilde\mu_n}(\mathfrak{A} \times \mathfrak{B}) = -\sum_{A \in \mathfrak{A}, B \in \mathfrak{B}} \tilde\mu_n(A \times B)\log\tilde\mu_n(A \times B), \qquad \mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X).$$
Then the following proposition is derived from Proposition 14 by replacing $\hat\mu_n$ and $H_{\hat\mu_n}$ by $\tilde\mu_n$ and $H_{\tilde\mu_n}$, respectively.

Proposition 15. For a stationary source $\mu \in P_s(X)$ the following statements are equivalent to each other:
(1) $\mu$ is strongly mixing.
(2) $\lim_{n \to \infty} \tilde H_n(\mathfrak{A}, \mathfrak{B}) = 0$ for every $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.
(3) $\lim_{n \to \infty} H_{\tilde\mu_n}(\mathfrak{A} \times \mathfrak{B}) = H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B})$ for every $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.

Finally, weak mixing is characterized as follows:

Proposition 16. For a stationary source $\mu \in P_s(X)$ the following statements are equivalent to each other:
(1) $\mu$ is weakly mixing.
(2) $\lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} \tilde H_j(\mathfrak{A}, \mathfrak{B}) = 0$ for every $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.
(3) $\lim_{n \to \infty} \frac{1}{n}\sum_{j=0}^{n-1} H_{\tilde\mu_j}(\mathfrak{A} \times \mathfrak{B}) = H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B})$ for every $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.

Proof. (1) $\Leftrightarrow$ (2). For any $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$ and $n \ge 1$, the elementary estimates
$$(t - 1) + \tfrac{1}{2}(t - 1)^2 \le t\log t \le |t - 1| + \tfrac{1}{2}(t - 1)^2, \qquad t \in [0, 1],$$
yield a two-sided bound of $\frac{1}{n}\sum_{j=0}^{n-1}\tilde H_j(\mathfrak{A}, \mathfrak{B})$ by constant multiples of
$$\frac{1}{n}\sum_{j=0}^{n-1}\sum_{A \in \mathfrak{A}, B \in \mathfrak{B}}\big|\mu(S^{-j}A \cap B) - \mu(A)\mu(B)\big|,$$
the constants depending only on $a = \max\big\{\frac{1}{\mu(B)} : B \in \mathfrak{B},\ \mu(B) \ne 0\big\}$. This is enough to show the equivalence (1) $\Leftrightarrow$ (2).
(2) $\Leftrightarrow$ (3) follows from
$$\frac{1}{n}\sum_{j=0}^{n-1}\tilde H_j(\mathfrak{A}, \mathfrak{B}) = H_{\mu \times \mu}(\mathfrak{A} \times \mathfrak{B}) - \frac{1}{n}\sum_{j=0}^{n-1}H_{\tilde\mu_j}(\mathfrak{A} \times \mathfrak{B})$$
for $n \ge 1$ and $\mathfrak{A}, \mathfrak{B} \in \mathcal{P}(X)$.

The results obtained in this section will be applied to consider ergodic and mixing properties of stationary and AMS channels in Chapter 3.
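The quantities entering Propositions 14-16 are directly computable for a Markov source as in Example 12. The Python sketch below uses our own two-state toy chain, the one-coordinate partition $\{[0], [1]\}$, and the fact that $\mu(S^{-j}[i] \cap [k]) = m_k (M^j)_{ki}$ for a stationary Markov source; it forms $\hat\mu_n$ and the relative-entropy quantity $H_n$ of Proposition 14, which tends to 0 since this source is ergodic.

```python
import math

# two-state Markov source (toy data): transition matrix M, stationary vector m
M = [[0.9, 0.1],
     [0.3, 0.7]]
m = [0.75, 0.25]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def joint(j):
    # mu(S^{-j}[i] ∩ [k]) = m_k * (M^j)_{k,i} for the time-0 cylinders [0],[1]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(j):
        P = mat_mul(P, M)
    return [[m[k] * P[k][i] for i in range(2)] for k in range(2)]

def H_n(n):
    # hat mu_n(A×B) = (1/n) sum_{j<n} mu(S^{-j}A ∩ B), then
    # H_n = sum hat_mu_n log( hat_mu_n / (mu(A) mu(B)) ), with 0 log 0 = 0
    hat = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(n):
        J = joint(j)
        for k in range(2):
            for i in range(2):
                hat[k][i] += J[k][i] / n
    total = 0.0
    for k in range(2):
        for i in range(2):
            h = hat[k][i]
            if h > 0:
                total += h * math.log(h / (m[k] * m[i]))
    return total
```

The decay of $H_n$ to 0 is exactly criterion (2) of Proposition 14 specialized to this partition.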
2.4. AMS sources

Let $X$ be a compact Hausdorff space and $\mathfrak{X}$ be the Baire $\sigma$-algebra of $X$ as before. $S$ denotes a measurable transformation on $X$. Of interest is a class of nonstationary sources for which the Ergodic Theorem holds. Each source in this class is said to be asymptotically mean stationary. Here is a precise definition.

Definition 1. A source $\mu \in P(X)$ is said to be asymptotically mean stationary or AMS if, for each $A \in \mathfrak{X}$, the limit
$$\lim_{n \to \infty}\frac{1}{n}\sum_{k=0}^{n-1}\mu(S^{-k}A) = \bar\mu(A) \tag{4.1}$$
exists. [In this case, $\bar\mu$ is a probability measure, i.e., an information source, by the Vitali-Hahn-Saks Theorem (cf. Dunford and Schwartz [1, III.7]). Moreover, $\bar\mu \in P_s(X)$.] $\bar\mu$ is called the stationary mean of $\mu$. $P_a(X)$ denotes the set of all AMS sources in $P(X)$.

Remark 2. (1) If $\mu \in P_a(X)$ with the stationary mean $\bar\mu \in P_s(X)$, then (i) $\mu(A) = \bar\mu(A)$ for all $S$-invariant $A \in \mathfrak{X}$, and (ii) $\mu(f) = \bar\mu(f)$ for all $S$-invariant $f \in B(X)$.
(2) If, for $\mu \in P(X)$, the limit in (4.1) exists for all $A \in \mathfrak{X}_0$, the generator of $\mathfrak{X}$, $\mu$ need not be AMS.

Lemma 3. (1) $P_a(X)$ is a norm closed convex subset of $P(X)$.
(2) If $\mu, \eta \in P_a(X)$, then $\|\bar\mu - \bar\eta\| \le \|\mu - \eta\|$.
Chapter II: Information Sources
92
p G P{X) such that ||p„ - p|| -> 0 as n -> oo. Hence, \nn{A) - fi(A)\ -* 0 uniformly in .A G £ since |p„(>l) - M (A)| = Kft, - M)(il)| < ||p„ - MilThat is, for any e > 0 there is an integer n 0 > 1 such that |^(A)-M{A)|<£,
n>n0,
A e l
Thus for A G 3E and p, g > 1 we have that ?-i
J
I£MS- A)-JX>(S-^)
¥ p
H y
j j=o =o
fc=o 9-1
< i 2 {KS-jA) - i^iS-'A)} - - £ {n(S~kA) - »no(S-kA)} i
+
__
■>
„
i
i£/zno(S-^)-iX>»o(
i
j=0 9-1
p-1
< i V \rtS-iA) - »no(S-jA)\ +q -J2 HS-kA) - »no(S-kA)\ Pj^i
k=o
i£ M n o (S-^)-^Mn„(S- f c A). +p k=0 u Since p„ 0 is AMS we can choose an integer po > 1 such that the third term of the RHS of the above expression can be made < e for p, g > po- Consequently it follows that for p,q > po, the LHS of the above expression is < 3e. Therefore, the limit in (4.1) exists for every A G X, so that p G P a ( X ) . (2) Let us set Ma{X)
= {an + fa : a,0€
C, p,»? G P « ( X ) } ,
Af s (X) = { a p + ^7?: a,/? G C, /X.JJ g
P.{X)}.
Note that M 0 ( X ) is the set of all measures p G M(X) for which the limit in (4.1) exists, and MS(X) is the set of all 5-invariant p G M(X). Define an operator
10:Pa(X)^Ps(X)by Top = p,
p G Pa{X)
and extend it linearly to an operator T on Ma(X) onto Ma(X). Then we see that X is a bounded linear operator of norm 1, since Ma(X) is a norm closed subspace of M(X) by (1). Hence (2) follows immediately.
24-
AMS sources
93
Definition 4. Let fi, rj e P{X). That r\ asymptotically dominates /u, denoted n<.n, means that TJ(A) = 0 implies lim fi(S~nA) = 0. n—foo
The usual dominance implies the asymptotic dominance in the sense that, if H e P(X), T) e P3(X) and \i < 77, then / J « 77. In fact, if 77(A) = 0, then 77(S _ M) = n(A) = 0, which implies that p(S~nA) = 0 by fj, -C 77 for every n e N. Thus a
a
lim fj.(S~nA) = 0, implying / J < 7 7 . Although < is not transitive, one has that if
n—>oo
a
a
a
/ i < ^ < 7 7 o r / i < f < 7 / , then p, < 77. After the next lemma, we characterize AMS sources. Lemma 5. Let p.,r) € P ( X ) and for n > Q let Snfi = (Sn(i)a + (Snfi)s be the Lebesgue decomposition of Snn w.r.t. 7/, where (Snp,)a is the absolutely continu ous part and {Snp.)a is the singular part. If / „ = » ^'" is the Radon-Nikodym a
derivative and p. -C 77, then it holds that n lim < sup n(S~ A)
n->oo I
JjndV 1=0
(4.2)
A e x
Proof. For each n = 0 , 1 , 2 , . . . let Bn e X be such that 77(Bn) = 0 and Snp(A)
= Snfi(A n Bn) + f /» *?,
A 6 3E.
OO
Let 5 = U i? n . Then we see that r)(B) = 0 and for any A € X 0 < M(5~ n A) - [ fndr,
= /x(5-n(Anfi„))
< At(5- n (A n 5 ) ) < fi{S-nB)
-> 0
a
as n —^ oo by /i
(2) Tftere ezisfc some n e PS(A") sitcft that p
S~nX
imp/y /J(J4) = 0.
(4) There exists some rj € Ps(X) such that r](A) = 0 and S_1A p.{A) = 0.
= A imply
Chapter II: Information
g4
(5) lim Snf = fs
n-a.e. for f G B(X),
where fs is an S-invariant
Sources
function.
n-nx
(6) lim «(S„/1 exists for f G B{X). n-¥<x
If one (and hence all) of the above conditions holds, then the stationary mean p, of fi satisfies: p(f) = Urn /i(S„/) = Kfs), f e B(X). (4.3) n—>oo
Proof. (1) =>■ (2). Suppose (1) is true and let r) = p, G PS(X). n
let B = ]imsupS- A= n-»oo
OO
OO
n
k
U S~ A.
n = 1
Assume p(A) = 0 and
,
fc=n
Then
OO oo
<£>($-* A) = 0 fc=i
since /Z(S_fcA) = /Z(A) = 0 for every k > 1. This implies /x(i?) = 0 since B is clearly 5-invariant. Now we see that limsup/z(S _ n A) < / / ( l i m s u p S - n A ) = 11(B) = 0 by Fatou's lemma. Thus lim /j,(S~nA) = 0, and therefore (2) holds. n—>oo
a
(2) => (3). Assume that 77 € -Ps(-^) satisfies fi^tirj. Take any >1 G 36^ such that a _ / A\ r\ 1 J^__ J _ f >4 I n n /— -v' J _ 1 _ *~»—n A A I* V I T . ii (2) => (3). Assume that n € -P s (^) satisfies fJ.<^f]. Take any .A G SEQO such that n r)(A) = 0 and find a sequence { A n } ^ ! C X with S~ A„ = A for n > 1. It then follows from Lemma 5 that fi(A) — I / „ dr) —► 0 as n —^ oo, ■^„
where / „ =
' ?**'". Since 77 is stationary and 77(A) = 0, we have
/
JAn
fndn=
f Snfndr, = 0,
JA
so that 11(A) = 0. Thus (3) holds. (3) => (4) is immediate since A G Xoo if A G X is S-invariant. (4) =>■ (2). Let TJ G P S (X) satisfy the condition in (4). Take an A G X with 77(A) = 0 and let B = r i m s u p S ~ M . Then we see that B is S-invariant and 77(B) = n-Kx>
0. That lim^ n(S~nA) (1) =* (2)"above.
= 0 can be shown in the same fashion as in the proof of
24- A.MS sources
95
(2) => (5). Suppose that / J < 7 ? with 77 G Pa(X), and let / G B(X) be arbitrary. Then the set A = {x G X : (S„/)(a:) converges} is 5-invariant and r](A) = 1 by the Pointwise Ergodic Theorem. Thus, that Ac is 5-invariant and 77(^4°) = 0 imply lim n(S~nAc) n—>oo exists
a
= n{Ac) = 0 by n
and is 5-invariant n-a.e.
n—foo
exists and is 5-invariant \i-a.e. (5) => (6). Let / G ■B(-X') and observe that {Snf}'^'=1 is a bounded sequence in B(X) C LX(X, n) such that S „ / -» / s ^t-a.e. by (5). Then the Bounded Convergence Theorem implies that /i(S„/) —> (J,{fs)(6) => (1). We only have to take / = 1A in (6). The equality (4.3) is almost clear. a
Remark 7. (1) Note that /z• (2) in the above theorem. (2) If n G P(X) and n < 77 for some 77 G Ps(-X"), then n is AMS. (3) The Pointwise Ergodic Theorem holds for \i G P(X) iff /j G Pa(X). More precisely, the following statements are equivalent for \i G P(X): (i) /. G Pa(X). (ii) For any / G B(X) there exists some 5-invariant function fs G B(X) such that S n / —► / s \x-a.e. In this case, for / G L 1 (X,/Z), S „ / —» / s ft-a.e., p-a.e. and in L J (X,/T), and / s = En(fP) = Eji{f\3) fi-a.e. and p-a.e., where 3 = {.4 G X : 5 _ 1 . 4 = A} is a cr-algebra. (4) In (5) and (6) of Theorem 6, B(X) can be replaced by C(X). (5) Pa(X) is weak* compact with the identification of Pa(X) C M(X)
= C(X)*
When 5 is invertible, we can have some more characterizations of AMS sources. Proposition 8. Suppose that S is invertible. Then, for /j, G P(X) conditions are equivalent: (1) fi G Pa{X). (2) There exists some 77 G PS(X) such that n -C 77. (3) There exists some 77 G Pa(X) such that ft
the following
Proof. (1) =^ (2). Let 77 = p G P S (X). Since 5 i s invertible, we have SX = X= and hence 3£oo =
S~XX
OO
PI 5~n3£ = X. Thus the implication (1) => (3) in Theorem 6
71=0
implies ^ -C 77 = p. (2) => (3) is immediate. (3) => (1). If 77 G Pa(X),
then 77 « 7 7 by the proof of (1) => (2). Hence, if fj, < 77,
Chapter II: Information
96
Sources
then fj, -C 77. This implies fi
M (yl)
= JA fdq,
AeX
is AMS. In this case the stationary mean \x of \i is given by n-l n—foo 71
fc=0
lim — >
n-t
/
/ dn
fc= n-l
1
= „ i+-/jE/(^)^) = /
lim
=
fsdij,
Snfdr] Ae X,
(4.4)
where fs = J m i S „ / , by the Pointwise Ergodic Theorem since 77 is stationary. In particular, if fj,
where ^ - = { ^ £ 1 :
ftS^AAA)
1 A a e
"--
= 0}.
(2) Take a stationary source 77 € Pa(X) Then, \x defined by
and a set £ € X such that 77(B) > 0.
^ = 5 == ^| s ),
^=Sr ^ '
AA € £
^
8.4- AMS sources
97
is AMS since /i < 77. In fact, this is a special case of (1) since n(A) = JA 4 ^ j dr\ for A e £ Similarly, take an AMS source 77 £ Pa{X) with the stationary mean rj and a set B € X with 77(B) > 0. Then the conditional probability /J(-) = r)(-\B) of 77 given B is also AMS. For, if 77(A) = 0, then T)(B)J]imsupS-nA)
< r/(limsup5_nA) = 0
> n—foo
'
* n—too
'
a
a
by 77
H is is AMS. AMS. It It follows follows that that AMS AMS property property remains remains in in H is lost lost by by conditioning, conditioning, in in general. general. is Definition 1 1 . An AMS source /i € Pa{X) is said for every S-invariant A G X. Pae{X) denotes the set
conditioning while while stationarity stationarity conditioning to be ergodic if fi(A) = 0 or 1 of all ergodic AMS sources.
We have several equivalence conditions of ergodicity for an AMS source. Theorem 12. For fi € Pa(X) conditions are equivalent: (1) H&Pae(X).
with the stationary mean p 6 PS{X) the following
(2)p€Pse(X). a,
(3) There exists some r\ G Pse(X)
such that ii<^r}.
(4) fs(x) = lira ( S n / ) ( x ) = I fdfi n—¥oo
(5) Im^ (Snf,g)2fi
fl-a.e. and p-a.e. for f 6
J v
= (f, 1 ) ^ ( 1 , s) 2 , M for f,g e L2(X,»)
(6) Jlim o M((S„/)s) = p(f)fJ.(g) for every f,ge
B(X).
(7) lim p((Snf)g)
= p(f)p(g)
for every f,ge
C(X).
(8) n—»oo lim - V
k
p(A)n{B)
fj,(S~ A DB)=
nL2(X,p).
for every
A,BeX.
(8) lim -I " n-l v fj,(S~ A DB)= p(A)n{B) for every (9) lim - V fj,(S~kA n A) = /Z(AWA) for every
A,BeX. AeX.
k
1 n-l
Ll(X,p).
(9) lim - V fj,(S~kA r\A)= /Z(AWA) for every AeX. Proof. (1) <=> (2) is clear from the definition. (1), (2) =*• (3) follows from Remark 7(1) by taking 77 = p. (3) =$■ (1). Let 77 e Pse(X) be such that / J < 7 7 . If A e X is 5-invariant, then 77(A) = 0 or 1. If 77(A) = 0, then fj,(A) = p(S~nA) -> 0 by / J < 7 7 , i.e., n(A) = 0. Similarly, if 77(A) = 1, then we have p(A) = 1. Thus \x £ Pae(X). The implications (1) =>. (4) =>• (5) =► (6) => (7) => (8) => (9) => (1) are shown in much the same way as in the proof of Theorem 3.2.
Chapter II: Information Sources
gg
Remark 13. In (8) and (9) of Theorem 12, X can be replaced by a semialgebra X0 that generates X. Also in (5), (6) and (7) of Theorem 12, we can take g = f. Theorem 14. (1) If p e exPa(X), then fi G Pae{X). That is, exPa(X) C Pae{X). (2) If Pse(X) / 0, then the above set inclusion is proper. That is, there is a H G Pae{X) such that \i £ exPa(X). Proof. (1) This can be verified in exactly the same manner as in the proof of (4) =J- (1) of Theorem 3.2. (2) Let fj, G Pae(X) be such that n ^ p. The existence of such a fi is seen as follows. Take any stationary and ergodic f G Pse(X) ( / 0) and any nonnegative / 6 ^(X, f) with norm 1 which is not S-invariant on a set of positive f measure. Define fi by
p(A) = J f<%,
A€X.
We see that \i is AMS by Example 10 (1) and ergodic because £ is so. Clearly fi is not stationary. Hence \x^p. Also note that p = f since for A 6 X
tHA) = Jfsdt = t(A) by (4.4) and / s = 1 £-a.e. because of the ergodicity of £. Then r) = \{n + p) is a proper convex combination of two distinct AMS sources and rj(A) = 0 or 1 for S-invariant A e X. Thus r\ £ exPa(X) and n 6 Pae{X). Again, if S is invertible, ergodicity of AMS sources is characterized as follows, which is similar to Proposition 8: Proposition 15. If S is invertible, then the following conditions are equivalent for »ePa(X): (1) p e Pae(X). (2) There exists some n G Pse(X) such that / i < i | . (3) There exists some 7? G Pae(X) such that | i < ? ) , (4) There exists some n G Pae(X)
such that fJ.<.n.
Proof. (1) => (2). Take r? = p G Pae{X), (2) => (3) is clear.
then n -C p = r\ by Remark 9.
(3) =► (4). Let r) e Pae(X) be such that p < n. Then rj € Pse(X) Hence /i < rj and ^ < 77 since rj is stationary.
and n < fj.
(4) => (1). Let 7? e Pae(X) be such that /i<7?. Then 77 < 77 and ii-krj. 77 G -PsePO, Theorem 12 concludes the proof.
Since
2.5. Shannon-McMillan-Breiman Theorem
99
Ergodicity and mixing properties may be defined for nonstationary sources. Let fi e P{X). /J. is said to be ergodic if n(A) = 0 or 1 for every S-invariant A e X. ^t is said to be weakly mixing if n-1
lim - VUs-'AnB)-/*(s-M)/z(B)| = 0,
A,Bex
and strongly mixing if
lim U S _ M n B) - ^(S,-"J4)^(B) 1=0,
n—foo '
A,BeX.
'
Clearly, these definitions are consistent with the ones for stationary sources and it holds that strong mixing implies both weak mixing and ergodicity. We can show that if n is AMS and weakly mixing, then \i is ergodic. In fact, the condition (8) or (9) of Theorem 12 may be easily verified.
2.5. S h a n n o n - M c M i l l a n - B r e i m a n T h e o r e m An ergodic theorem in information theory, the so-called Shannon-McMillanBreiman Theorem (SMB Theorem), is proved in this section. First we briefly de scribe practical interpretation of entropy (or information) in an alphabet message space. Then we formulate the SMB Theorem in a general setting. Let us consider an alphabet message space (X, X, S), where X = XQ, X0 = { o i , . . . , at} and S is a shift. Let / / b e a stationary information source, where we sometimes denote it by [X,/x]. For an integer n > 1, 9Jtn denotes the set of all messages of length n of the form
[4W} " '*&]. [4* ••■*&].
l
i.e., the messages of length n starting at time 0. Note that 97l„ is a finite partition of X, i.e., 9Kn e V(X), and fl«n = "v S^Otti. Hence the entropy of 2Jt„ under the 3=0
measure /J, is given by flr/J(9Jl„) = -
^Z
/i([xo---x„-i])log/x([x 0 ---a;n-i]),
XQ,... ,a:n-i€Jfo
where we write the dependence of the entropy on /i. If UJ^n denotes the set of all messages of length n starting at time i — ± 1 , ± 2 , . . . , xlf
'-^-ii
i
Chapter II: Information Sources
100
then the entropy of WFn is the same as that of Wtn = 2rt£ since fi is stationary. Hence the information (or entropy) per letter (or symbol) in messages of length n input from the stationary information source [X, fj] is -Hli(Tln) n
=
-Hjny1S-'Tl1) n ^Vj=o /
and if we let n —> oo, then the limit H(n) = H(fi,,m1,S)=
lim --ffJEDU
n—*oo n
exists by Lemma 1.3.5, which represents the average information per letter of the stationary information source [X, fi\. This is a practical interpretation of the defi nition (1.3.3). Moreover, in this case, the Kolmogorov-Sinai entropy of the shift S, oo
denoted H(u, S), is equal to H(u) by Theorem 1.3.10 since
~
V SnTl1
n=—oo
= X.
To formulate the SMB Theorem we are concerned with entropy functions. Let (X, X, S) be an abstract measurable space with a measurable transformation 5 and fi £ PS{X). Recall that for a finite partition 21 e V(X) and a cr-subalgebra 2J of X, the entropy function 7^,(21) of 21 and the conditional entropy function 7^(2112)) are defined respectively by M20(-) = - £ U ( - ) l o g M 0 4 ) ,
VaiSWO = - E iA(-)iogi^(A|2))(-), um)u iA(-)iogi^(A|2))(-), where P M (|2J) is the conditional probability relative to 2J under the measure /x (cf. Section 1.3). These functions enjoy the following properties: for 21,58, <£ e V{X) (1) / „ ( » ) = 7M(2l|2), where 2 = { 0 , X } , (2) 7^(21 V 23) = / „ ( » ) + /„(«B|a), where 21 = a(2l), (3) 7M(2l V » | € ) = 7^(2l|C) + 7^(<8|2l V £), (4)7 M (2l)o5 = 7 / i (5- 1 2i), (5) 7M(2t|23) o 5 = 7 / i (5" 1 2l|S- 1 58). In fact, (1) - (5) are in Remark 1.3.2. (6) iJvZj) \j-i
= E /„(a*| V % ) /
fc=1
v
ij=o
for 2l0 = 2 , 2 l ! , . . . ,2l„ e 7>(£) and n > 1.
/
—
For, this is verified by (1),(2),(3) and the mathematical induction. = 7 / i (5-("- 1 )2l) + £ / ^ - ( " - f c - i y
(7) iJVs-w)
•7_u
'
fc=i
^
V 5-("-J)2lV I i=i
/
S.5. Shannon-McMillan-Breiman Theorem
101
This is obtained from (6) by letting 21, = S~ln~j)% = 7„(2l) o S " - 1 + £
(8) iJ^S-m) \j=o
y
J
1 < j < n.
Mf a l V 5-( fc -> +1 )2l) o 5 n - f c - x . i
v ij=i
fc=1
This is immediate from (4) and (7). Now for n = 1,2,... let
/. = /„( Vs-H),
PO = i„(a),
„ = /„ (21I v 5- fc a), v 1«=i
/
5
= /„ (21I v \
1k=i
s-kx). /
Then the equation in (8) can be written as (9) / „ = E 1 S " - * - 1 ^ for n > 1, where Sgk
=gkoS.
h=0
(10)ff( M ,2l,S) = [ gdii. Jx For, by Lemma 1.3.5 it holds that H(u,%S)=
V 5- J '2l)
Urn HJ&\ n—*oo
V
Ij = l
= lim / lj&\ = n lim / ^°°Jx
/
V 5-%)^
gndp
= / n_lim gn dp,, Jx >°°
since g in L 1 by Lemma 1.3.9,
= / gdfi. Jx Similarly, we get H(p,%S)
= lim n-too n
-HjnVlS-t&) \ 3=0
I
= lim - f / J V S - m ) dp, n->oo n Jx
> i=°
= lim / - / „ d / i .
'
(5.1)
n—►oo
We expect that i / „ —►ftfor some 5-invariant function ft and H(p,, 21, S) = fxh The SMB Theorem guarantees that this is the case, which is given as follows:
dp.
Chapter II: Information
102
Sources
Theorem 1 (Shannon-McMillan-Breiman). Let n € Pa{X) be a stationary source and 21 e V(X) be a finite partition of X. Then there exists an S-invariant function h 6 L1(X, /x) such that lim - / „ = lira -IA
n->oo n
fe
2l) = h \j.-a.e. and in L1,
V S
n—K» n
V
H(IJ.,%.,S)=
fc=0
/
I hdfi= Jx
(5.2)
/ gdfi. Jx
If, in particular, \i is ergodic, then h = H(n,%S)=
gdfj,
n-a.e.
Jx Proof. Since V S _fc 2iT V S _fc 2t we can apply Lemma 1.3.9 (1) to obtain fc=i fc=i
gn —> g fi-a.e. and in L1.
(5.3)
It follows from the Pointwise Ergodic Theorem (Theorem 2.1) that there exists an h G LX(X, fi) such that Sh = h fi-a.e. and 1 n_1 - y ^ Sn~k~1g fc=0
= Sng -» h
fi-a.e. and in L1.
(5.4)
Now observe that
-fn-h n
fc=o
1,M
_ s „- fcfc--i1Ss)) (s „- fcfc-i < lg l£(S"-1flfc fffc-S"fc=o
i,„
fc=o fc=0 —±
++
fc=o
l g s-S2s « - *n--kl-slg-h _h n
;fc=0
1,M
hlt
rv-s oo 0ft 9asC T» n —i—►
by (5.3) and (5.4). This establishes the L 1 -convergence. To prove fi-a.e. convergence we proceed as follows. We have from the above computation that '-/n - h
^lEsn-k-1\9k-g\ fc=0
+ fc=0
2.5. Shannon-McMillan-Breiman
Theorem
103
The second term tends to 0 fi-a.e. by the Pointwise Ergodic Theorem. So we have to show 1 n_1 l i m s u p - y * S " - f c - 1 | p f c - 5 | = 0 n-a.e. (5.5) "^°° nfc=o For AT = 1 , 2 , . . . let GN= sup \gk-g\k>N
Then, GN I 0 n-a.e. since §k -> <7 fi-a.e., and 0 < G 0 < sup S „ +
ff
G Z , 1 ^ , /i)
n>l
by Lemma 1.3.8. Let N > 1 be fixed. If n > N, then 1
re-l n— i
,^ /yJ- jVv — - li
n
nn—- il \
V fc=0 W-l
fc=JV' n-l
< - V s"-*- ^, + -T sn-k-H 1
n z —'
n ■'—'
fc=0 fc=JV
=
IEsn-fc-lGo + ! ^ V . n fc=o *—'
1
n
£
S;G,
n — N *—> i=o
Letting n —r oo, we see that the first term on the RHS tends to 0 fj.-a.e. and the second term converges to some S-invariant function GM,S £ LX(X, //) \i-a.e. by the Pointwise Ergodic Theorem. Thus we have 1 \. t limsMp-J2sn~k~1\9k-9\
n
^°°
H-a.e.
fc=o
Finally, note that GN,S 4- as iV —>• OO and by the Monotone Convergence Theorem
Finally, note that GN,S 4- as N —>• oo and by the Monotone Convergence Theorem /
Jx
GN,SGN,d/j.= sdij,=
Jx Jx
GN
dfi —> 0
as N —> oo
since Gjy 4- 0, which implies that GN^ —> 0 /i-a.e. Therefore (5.5) holds and fi-a.e. convergence is obtained. (5.2) is clear from (10) above and (5.1). Corollary 2. Let X = XQ be an alphabet message space with a shift S and (i G Ps(X) be a stationary source. Then there exists an S-invariant function h G L1(X,fj.) such that nlim ^ o < ] ~ n I,
E ^ XQ}... J I H - I G X Q
1
lo iox [xo-xn-i] l [ 3 o - z . . -&l i ] l([ go---Xn-i])\ ^([X0-'Zn-l])| = h )
fi-a-e. and in L1.
JQ4
Chapter II: Information Sources
If, in particular, \i is ergodic, then h = H(fi, S)
(i-a.e.
Proof. Take 21 = Wli € V(X) and observe that for n > 1 5- fc 2l) = I^Wtn),
/„ = /„( V
since 9Jln = V
S-kSHl,
= - Yl UlogM(-A) Aesotn ]£
l[x 0 -x T ,_ 1 ]logM([*o---a;„-i3).
Thus the corollary follows from Theorem 1. The following corollary is sometimes called an entropy equipartition
property.
C o r o l l a r y 3 . Let X = XQ be an alphabet message space with a shift S and p. 6 Pse(X) be a stationary ergodic source. For any e > 0 and 5 > 0 there is an integer no > 1 such that J{xeX:
\^fn(x)-H(fi,S)\>e}j
< 6,
n>n0.
Hence, for n > no, the set 9Jt„ of messages of length n starting at time 0 can be divided into two disjoint subsets 2Jtn,g and 33tn,6 such that every message M € 9Jl nj9 . (2)J
(3) J
v
U Mean„, s
'
M)>1-8.
U M)< 5.
Proof. Since a.e. convergence implies convergence in probability the corollary follows immediately from Theorem 1. R e m a r k 4. (1) In Corollary 3, for a large enough n (n > no) each message M € 93I„iS has a probability approximately equal to e~nH^'s^ and hence the number of messages in 9Kn,g is approximately enH',i,si. Since the number of messages in 9Jl„ is tn = enl°^ and H{fj,,S) < log*, it holds that |0JI„)9| » |3K„,(,|, i.e., the number of elements in 9Jt„i9 is much larger than that of those in 9Jtni(,. (2) Another consequence of Corollary 3 is that if we receive a message long enough, then the entropy per letter in the message almost equals the entropy of the infor mation source. A generalized version of Theorem 1 is formulated as follows:
2.5. 2.5. Shannon-AfcJWWan-Breiman Shannon-McMillan-Breiman Theorem Theorem
105
Corollary 5. Let (X,X) be a measurable space with a measurable transformation S : X —¥ X. Let \i G PS(X) be a stationary source and 21 6 V(X) be a finite partition of X. Assume that 2) is a a-subalgebra of X such that 5 _ 1 2J = 2J and let
fn=/, (2 *-*■!»). \
I
rc=l
/
\
I
for n > 1. Then there exists an S-invariant function h e L1(X,/j.) lim - / „ = n-ioo n
Um -7„('"v 1 5- f c 2l|2) V ) = / i n->oo n
\
oo
fc=0
I
rc=l
/
such that
u-a.e. and in L1.
(5.6)
/
, ~
If, in particular, 2) C 2too = V S~k%
then
k—l
H(n,2L,S)=
[ hdfi. Jx
(5.7)
Proof. (5.6) can be verified in a similar manner as in the proof of Theorem 1. As to (5.7) assume that 2J C 2too. Then we have H(ji, 21, S) = lim J5r„(a|a„),
by Lemma 1.3.5,
n—>oo
= HM(a|aoo) = ^(2112100 V 2J),
by assumption,
where 2l„ = V S ^ S l and 2l„ = o-(2l„) as before. Thus we get where 2l„ = V S ^ S l and 2l„ = o-(2l„) as before. Thus we get /
ftd/i=
_/x
Urn /
ijJVs,-fc2l|2))d/z
n-too J x n \fc=0 1 /n-1 ,~| fc
I \
n->oo n
/
= lim -HJ
/
v 5- at?j)
V
fc=0
I
= Urn -( J ff M (2l|?J) + V F M ( 2 t | 2 l f c V 2 J ) l =
H{n,%S).
Since the function h in Corollary 5 depends on /i € -Ps(-^0 we should denote it by ft^, so that H(fi, %S)=
f K dii,
fi € P S (X).
(5.8)
Chapter II: Information
106
Sources
This will be applied to obtain an integral representation of the entropy functional in Section 2.7.
2.6. Ergodic decompositions In this section, ergodic decomposition of a stationary source is studied. Roughly speaking, if a measurable space (X, X, S) with a measurable transformation S : X -¥ X is given, then there is a measurable family {fix}x^x of stationary ergodic sources, which does not depend on any stationary source, such that each stationary source fj, is written as a mixture of this family: fj.(A) = / (J.X(A) n(dx),
AeX.
Jx
This is, in fact, a consequence of the Krein-Milman Theorem: PS(X) = coPse(X) (cf. Section 2.1). Our setting here is that X is a compact metric space, so that X is the Baire (= Borel) cr-algebra and the Banach space C(X) is separable. Let {/n}£Li C C(X) be a fixed countable dense subset. S is a measurable transformation on X into X as before. Recall the notation S„ (n > 1): for any function / on X 1
n_1
(S„/)(s) = - £ / ( # * ) ,
x€X.
3=0
Definition 1. For n > 1 and x e X consider a functional MnjX on B(X) given by Mn>x(f) = (S„/)(x),
/ G B(X).
A point x e X is said to be quasi-regular, denoted x e Q, if lim M„, x (/) = M x (/) exists for every / e C(X). A measurable set A e X is said to have invariant measure 1 if IJ,(A) = 1 for every /j, e P S (X). Lemma 2. For eac/i quasi-regular point x € Q there is a unique stationary source Hx £ Ps(X) such that nsx — Hx and Mx(f) = f f(y) tix(dy),
f e C(X).
(6.1)
2.6. Ergodic decompositions
107
Moreover, Q is an S-invariant Baire set of invariant measure 1. Proof. Let x e Q. It is not hard to see that Mx(-) is a positive linear functional of norm 1 on C(X) since ||M n , a (-)|| = sup |Jlf„,,(/)| = sup |(S„/)(x)| < sup H/ll = 1 ll/ll
I f{y)fix(dy) = Mx(f) = Mx(Sf) Jx = I f(Sy)fix(dy) = [ Jx Jx
f(y)tix(dS-1y).
This implies that \xx = \ix o 5 _ 1 or \ix is stationary. That fisx = Vx is also derived from MSx(f) = Mx{Sf) = Mx(f) for / G C[X). Let ifklkLl C C(X) b e d e n s e a n d Ah = \x G X : lim (S n / f c )(x) = lim MnJfk) I
n—too oo
exists},
n—t-oo
k > 1.
1
Then we note that Q = f~l Ak since {fk} is dense in C(X). fc=i
Also note that Ak is an
Fas set in X for k > 1 and fJ.{Ak) = 1 for every /i G P S (X) by the Pointwise Ergodic oo Theorem. Now we see that Q = fl 4 is a Baire set such that /x(Q) = 1 for every fc=i
H G PS(X),
or Q has invariant measure 1.
From now on we assume that {fix}xeQ IS a family of stationary sources obtained in Lemma 2. Denote by B(X, 5) the set of all 5-invariant functions in B(X). Clearly B(X, 5) is a closed subspace of the Banach space B(X). For / G B(X) let
/«(*) =
/ f(y)l*x(dy), w , w
Jx V { 0,
"
xeQ,
~ x£Q.
(6.2)
If / G C ( X ) , then /* G B(X) since /*(*) = M I ( / ) for x G Q and Q G X. For a general / G S ( X ) choose a sequence {gn}%Li C C(X) such that / f(y)(J'x(dy)=
lim / gn(y)
iix{dy)
= lim Mx{gn) = Urn <^(x),
x G Q.
Hence f♮ ∈ B(X). Now we have:

Lemma 3. If f ∈ B(X), then f♮ ∈ B(X, S). The mapping Λ : f ↦ f♮ is a projection of norm 1 from B(X) onto B(X, S), so that Λ² = Λ and ‖Λ‖ = 1. Moreover it holds that for μ ∈ P_s(X) and f ∈ B(X)

    ∫_X f(x) μ(dx) = ∫_X ( ∫_X f(y) μ_x(dy) ) μ(dx) = ∫_X f♮(x) μ(dx).    (6.3)
Proof. Since μ_{Sx} = μ_x for x ∈ Q by Lemma 2, we have that

    f♮(Sx) = ∫_X f(y) μ_{Sx}(dy) = ∫_X f(y) μ_x(dy) = f♮(x),    x ∈ Q,

and f♮(Sx) = 0 = f♮(x) for x ∉ Q. Hence f♮ ∈ B(X, S). If f ∈ B(X, S), then clearly f♮ = f. Thus Λ is a projection of norm 1. To see (6.3) we proceed as follows. Let μ ∈ P_s(X). Then, since μ(Q) = 1 by Lemma 2, we see that for f ∈ C(X)

    ∫_X f(x) μ(dx) = ∫_X lim_{n→∞} (S_n f)(x) μ(dx),    by the Pointwise Ergodic Theorem,
                   = ∫_Q M_x(f) μ(dx)
                   = ∫_Q ( ∫_X f(y) μ_x(dy) ) μ(dx) = ∫_Q f♮(x) μ(dx).

Let B = {f ∈ B(X) : (6.3) holds for f}. Then B contains C(X) and is a monotone class, as is easily verified. Thus B = B(X) and (6.3) is true for every f ∈ B(X).

The following corollary is immediate.

Corollary 4. Let μ ∈ P_s(X).
Then, for f ∈ B(X),

    f♮ = E_μ(f|ℑ)    μ-a.e.,

where f♮ is defined by (6.2), ℑ = {A ∈ 𝔛 : S⁻¹A = A} and E_μ(·|ℑ) is the conditional expectation w.r.t. ℑ under the measure μ.

Definition 5. A point x ∈ Q is said to be regular if the corresponding stationary source μ_x is ergodic, i.e., μ_x ∈ P_se(X). R denotes the set of all regular points. Let μ* ∈ P_se(X) be a fixed stationary ergodic source and redefine μ_x = μ* for x ∉ R.
The set {μ_x}_{x∈X} is called an ergodic decomposition relative to S.
We summarize our discussion in the following form.

Theorem 6. Let X be a compact metric space, 𝔛 the Baire σ-algebra and S : X → X a measurable transformation. Then the set R of all regular points is an S-invariant Baire set. Let {μ_x}_{x∈X} be an ergodic decomposition relative to S. Then, for any stationary source μ ∈ P_s(X) the following holds:
(1) μ(R) = 1.
(2) For f ∈ L¹(X, μ) let f♮ be defined by (6.2) and f*(x) = ∫_X f(y) μ_x(dy) if x ∈ R and = 0 otherwise. Then,

    lim_{n→∞} (S_n f)(x) = lim_{n→∞} M_{n,x}(f) = E_{μ_x}(f)    μ-a.e.,    (6.4)

    f* = f♮ = E_μ(f|ℑ)    μ-a.e.,    (6.5)

    ∫_X f(x) μ(dx) = ∫_R f*(x) μ(dx) = ∫_R ( ∫_X f(y) μ_x(dy) ) μ(dx).    (6.6)

In particular, if f = 1_A (A ∈ 𝔛), then

    μ(A) = ∫_R μ_x(A) μ(dx).    (6.7)
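Formula (6.7) can be checked directly in a toy model (an illustration added here, not part of the text): take S to be a permutation of a five-point space with two cycles. Every point is then regular, μ_x is the uniform measure on the cycle through x, and any stationary μ is a mixture of these two ergodic measures.

```python
from fractions import Fraction

cycles = [(0, 1), (2, 3, 4)]            # the S-orbits = ergodic components

def mu_x(x):
    # the ergodic component of x: uniform measure on its cycle
    for c in cycles:
        if x in c:
            return {p: Fraction(1, len(c)) for p in c}

# A stationary source mu: mix the two cycle measures with weights 1/4 and 3/4.
mu = {p: Fraction(1, 4) * Fraction(1, 2) for p in cycles[0]}
mu.update({p: Fraction(3, 4) * Fraction(1, 3) for p in cycles[1]})

A = {1, 2, 3}                            # an arbitrary event
lhs = sum(mu[p] for p in A)                          # mu(A)
rhs = sum(mu[x] * sum(mu_x(x).get(p, 0) for p in A)  # integral of mu_x(A) d mu
          for x in mu)
print(lhs, rhs)  # 5/8 5/8
```

The weights 1/4 and 3/4 are exactly the μ-masses of the two components, which is the content of the decomposition (6.7).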
Proof. Since μ_{Sx}(A) = μ_x(A) for x ∈ Q and A ∈ 𝔛 by Lemma 2, we see that R is S-invariant. Note that by Remark 3.3 (4)

    R = {x ∈ Q : lim_{n→∞} μ_x((S_n f) f) = μ_x(f)², f ∈ C(X)}
      = {x ∈ Q : lim_{n→∞} μ_x((S_n f_k) f_k) = μ_x(f_k)², k ≥ 1}
      = ∩_{k=1}^∞ {x ∈ Q : lim_{n→∞} μ_x((S_n f_k) f_k) = μ_x(f_k)²}.

Since for each f ∈ C(X), μ_(·)(f) = ∫_X f(y) μ_(·)(dy) is 𝔛-measurable, R is an 𝔛-measurable set. Let μ ∈ P_s(X). We only have to verify (1). Observe the following two-sided implications:

    μ(R) = 1 ⟺ μ_x ∈ P_se(X) μ-a.e. x
           ⟺ lim_{n→∞} (S_n f_k)(y) = μ_x(f_k) μ_x-a.e. y, μ-a.e. x, k ≥ 1
           ⟺ ∫_X |f_{k,S}(y) − μ_x(f_k)|² μ_x(dy) = 0 μ-a.e. x, k ≥ 1,
               where f_{k,S}(y) = lim_{n→∞} (S_n f_k)(y), y ∈ X,
           ⟺ ∫_X ( ∫_X |f_{k,S}(y) − μ_x(f_k)|² μ_x(dy) ) μ(dx) = 0, k ≥ 1.

Now for each k ≥ 1 it holds that

    ∫_X ( ∫_X |f_{k,S}(y) − μ_x(f_k)|² μ_x(dy) ) μ(dx)
      = ∫_Q ( ∫_X |f_{k,S}(y)|² μ_x(dy) − |μ_x(f_k)|² ) μ(dx)
      = ∫_Q |f_{k,S}(x)|² μ(dx) − ∫_Q |f_k♮(x)|² μ(dx),    by (6.3),
      = ∫_Q |f_{k,S}(x)|² μ(dx) − ∫_Q |f_{k,S}(x)|² μ(dx),
          since f_k♮(x) = μ_x(f_k) = f_{k,S}(x) for x ∈ Q,
      = 0.
We should remark that if (X, 𝔛) is an abstract measurable space with a countable generator, then for any measurable transformation S : X → X we can find an ergodic decomposition {μ_x}_{x∈X} ⊆ P_se(X) for which (6.4)–(6.7) hold. For a detailed treatment see e.g. Gray [1].
2.7. Entropy functionals, revisited

In Section 1.5 we considered an entropy functional H(·, 𝔄, S) on the set of all S-invariant ℂ-valued measures on a measurable space (X, 𝔛), where 𝔄 ∈ 𝒫(X) is a fixed finite partition of X and S is an automorphism on X, and we derived an integral representation of H(·, 𝔄, S) under certain conditions. In this section, we shall show an integral representation of the entropy functional by a universal function, by two approaches: a functional one and a measure-theoretic one. Also we shall extend the entropy functional to the space M_a(X) of all AMS measures.

We begin with a functional approach using ergodic decompositions. Our setting is as follows. Let X be a totally disconnected compact Hausdorff space, S a fixed
homeomorphism and 𝔛 the Baire σ-algebra. Since an alphabet message space X₀^ℤ is compact and totally disconnected, the above conditions are fairly general. Take any clopen partition 𝔄 ∈ 𝒫(X) consisting of disjoint clopen sets in X. As in Section 1.5, we use the following notations:

    𝔄_n = ∨_{j=1}^n S^{−j}𝔄 ∈ 𝒫(X),    𝔞_n = σ(𝔄_n),    𝔄_∞ = σ( ∪_{n=1}^∞ 𝔄_n )

for n ≥ 1. 𝒫(X), P_s(X), M(X) and M_s(X) are as before. Since 𝔄 and S are fixed, we write the entropy of μ ∈ P_s(X) relative to 𝔄 and S as

    H(μ) = H(μ, 𝔄, S) = −Σ_{A∈𝔄} ∫_X P_μ(A|𝔄_∞) log P_μ(A|𝔄_∞) dμ    (7.1)
         = lim_{n→∞} −(1/n) Σ_{A∈𝔄∨𝔄_{n−1}} μ(A) log μ(A),

where P_μ(A|𝔜) is the conditional probability of A relative to a σ-subalgebra 𝔜 of 𝔛 under the probability measure μ. To reach our goal we need several lemmas.
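For an i.i.d. source the two expressions in (7.1) agree for every n, which makes the block-entropy formula easy to test numerically. The sketch below (an illustration added here, not from the text) computes −(1/n) Σ_A μ(A) log μ(A) over the n-blocks of a Bernoulli(p) source and compares it with −p log p − (1−p) log(1−p).

```python
from itertools import product
from math import log, isclose

def block_entropy_rate(p, n):
    # -(1/n) * sum over n-blocks A of mu(A) log mu(A), mu = Bernoulli(p) product
    h = 0.0
    for block in product((0, 1), repeat=n):
        prob = 1.0
        for symbol in block:
            prob *= p if symbol == 1 else 1.0 - p
        h -= prob * log(prob)
    return h / n

p = 0.3
h_shannon = -p * log(p) - (1 - p) * log(1 - p)
print(all(isclose(block_entropy_rate(p, n), h_shannon) for n in (1, 2, 5)))  # True
```

For sources with memory the per-block averages decrease with n and only the limit equals H(μ); the i.i.d. case is the degenerate one where the sequence is constant.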
Lemma 1. If μ, η ∈ P_s(X) and η ≪ μ, then

    P_η(A|𝔄_∞) = P_μ(A|𝔄_∞)    η-a.e.,    A ∈ 𝔄.    (7.2)

Proof. Note that the support C_μ of μ is S-invariant, which implies that C_μ ∈ 𝔄_∞. Let μ, η ∈ P_s(X) be such that η ≪ μ. Then dη/dμ is S-invariant, hence 𝔄_∞-measurable (mod μ). Thus, we have that for A ∈ 𝔄 and B ∈ 𝔄_∞

    ∫_B P_η(A|𝔄_∞) dη = ∫_B 1_A dη = ∫_B 1_A (dη/dμ) dμ
        = ∫_B P_μ(A|𝔄_∞) (dη/dμ) dμ = ∫_B P_μ(A|𝔄_∞) dη,

so that (7.2) is true.

Lemma 2. Let μ ∈ P_s(X). Then there is a bounded, upper semicontinuous, 𝔄_∞-measurable function h_μ such that

    h_μ = −Σ_{A∈𝔄} P_μ(A|𝔄_∞) log P_μ(A|𝔄_∞)    μ-a.e.    (7.3)

and

    H(μ) = ∫_X h_μ dμ.    (7.4)
Proof. Note that P_μ(A|𝔄_n) ∈ C(X) for A ∈ 𝔄 since each B ∈ 𝔄_n is clopen. For n ≥ 1 let

    h_{μ,n} = −Σ_{A∈𝔄} P_μ(A|𝔄_n) log P_μ(A|𝔄_n)    (7.5)

and observe that h_{μ,n} ∈ C(X) and h_{μ,n} ↓ by Jensen's Inequality (cf. (12) in Section 1.2). If we let h_μ = lim_{n→∞} h_{μ,n}, then h_μ is upper semicontinuous since each h_{μ,n} is continuous, and is 𝔄_∞-measurable since each h_{μ,n} is 𝔄_n-measurable. Moreover, {h_{μ,n}}_{n=1}^∞ forms a submartingale relative to {𝔄_n} on (X, 𝔛, μ), and (7.3) is obtained by the Submartingale Convergence Theorem (cf. (25) in Section 1.2). (7.4) follows from (7.3) and (7.1).

Lemma 3. The functional H(·) on P_s(X) is weak* upper semicontinuous, where we identify P_s(X) ⊆ M(X) = C(X)*.

Proof. For μ ∈ P_s(X) and n ≥ 1 let

    H_n(μ) = ∫_X h_{μ,n} dμ,

where h_{μ,n} is defined by (7.5) in the proof of Lemma 2. Since H_n(μ) ↓ H(μ) as n → ∞ for each μ ∈ P_s(X), it suffices to show the weak* continuity of H_n(·) on P_s(X). This follows immediately from the identity

    H_n(μ) = −Σ_{A∈𝔄} ∫_X 1_A log P_μ(A|𝔄_n) dμ
           = Σ_{A∈𝔄} Σ_{B∈𝔄_n} {μ(A∩B) log μ(B) − μ(A∩B) log μ(A∩B)}

and the fact that A ∈ 𝔄 and B ∈ 𝔄_n are clopen.

In order to use the ergodic decomposition of S-invariant probability measures developed in the previous section, we need to introduce a metric structure on X.
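The double sum appearing in the proof above is a conditional entropy H_μ(𝔄|𝔄_n); for an i.i.d. source it reduces to the one-symbol entropy for every n, which gives a quick numerical check. (The following sketch is an illustration added here, not part of the text.)

```python
from itertools import product
from math import log, isclose

def conditional_entropy(p, n):
    # sum_{A,B} [mu(A&B) log mu(B) - mu(A&B) log mu(A&B)], where A runs over the
    # symbol at time 0 and B over the blocks at times 1..n; mu = Bernoulli(p) i.i.d.
    def prob(bits):
        q = 1.0
        for s in bits:
            q *= p if s else 1.0 - p
        return q
    h = 0.0
    for a in (0, 1):
        for b in product((0, 1), repeat=n):
            pab = prob((a,) + b)
            h += pab * log(prob(b)) - pab * log(pab)
    return h

p = 0.3
h_shannon = -p * log(p) - (1 - p) * log(1 - p)
print(all(isclose(conditional_entropy(p, n), h_shannon) for n in (1, 3)))  # True
```

The identity used is H(A|B) = H(A, B) − H(B), which is exactly the rearrangement of the double sum in the proof.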
Denote by 𝔅₀ the algebra of clopen sets generated by ∪_{j=−∞}^∞ S^j𝔄 and let 𝔅 = σ(𝔅₀), the σ-algebra generated by 𝔅₀. In the Banach space C(X), let C(X, 𝔄) be the closed subspace spanned by {1_A : A ∈ 𝔅₀} ⊆ C(X). Note that C(X, 𝔄) has a countable dense subset {f_n}_{n=1}^∞ since 𝔅₀ is countable. Define a quasi-metric d on X by

    d(x, y) = Σ_{n=1}^∞ |f_n(x) − f_n(y)| / (2ⁿ‖f_n‖),    x, y ∈ X,

and an equivalence relation ∼ by

    x ∼ y ⟺ d(x, y) = 0,    x, y ∈ X.

Then the quotient space X̃ = X/∼ becomes a metric space with the metric d̃ defined by d̃(x̃, ỹ) = d(x, y), x, y ∈ X, where x̃ = {z ∈ X : z ∼ x}, the equivalence class containing x. Moreover, (X̃, d̃) is a compact metric space, the canonical mapping x ↦ x̃ from X onto X̃ is continuous, and C(X, 𝔄) is isometrically isomorphic to C(X̃) under the isometric isomorphism C(X, 𝔄) ∋ f ↦ f̃ ∈ C(X̃), where f̃(x̃) = f(y) for x̃ ∈ X̃ and y ∈ x̃. Hence 𝔅̃ = {B̃ : B ∈ 𝔅} with B̃ = {x̃ : x ∈ B} is the Baire σ-algebra of X̃. The mapping S̃ on X̃ given by S̃x̃ = (Sx)̃, x̃ ∈ X̃, is well-defined and a homeomorphism. Therefore the triple (X̃, 𝔅̃, S̃) consists of a compact metric space X̃ with the Baire σ-algebra 𝔅̃ and a homeomorphism S̃.

Lemma 4. For a positive linear functional λ on C(X, 𝔄) of norm 1 there is a probability measure μ_λ ∈ P(X, 𝔅) such that

    λ(f) = ∫_X f dμ_λ,    f ∈ C(X, 𝔄),

where P(X, 𝔅) is the set of all probability measures on (X, 𝔅). Moreover, if λ is S-invariant, i.e., λ(Sf) = λ(f) for f ∈ C(X, 𝔄), then μ_λ is S-invariant.
Proof. Let λ̃(f̃) = λ(f) for f ∈ C(X, 𝔄). Then λ̃ is a positive linear functional of norm 1 on C(X̃), and hence there is a probability measure μ̃ on (X̃, 𝔅̃) such that

    λ̃(f̃) = ∫_X̃ f̃(x̃) μ̃(dx̃),    f̃ ∈ C(X̃).

By letting μ_λ(B) = μ̃(B̃) for B ∈ 𝔅, we have the desired measure μ_λ.

Recall some notations and terminology from the previous section. For x ∈ X, f ∈ C(X) and n ≥ 1 we denote

    M_{n,x}(f) = (S_n f)(x) = (1/n) Σ_{j=0}^{n−1} f(S^j x) = M̃_{n,x̃}(f̃).

Let Q̃ and R̃ be the sets of quasi-regular and regular points in X̃, respectively. Then Q̃ and R̃ have invariant measure 1, i.e., μ̃(Q̃) = μ̃(R̃) = 1 for μ̃ ∈ P_s(X̃), by Lemma 6.2 and Theorem 6.6. If Q = {x ∈ X : x̃ ∈ Q̃} and R = {x ∈ X : x̃ ∈ R̃}, then Q and R have invariant measure 1, i.e., μ(Q) = μ(R) = 1 for μ ∈ P_s(X), where Q and R are called the sets of quasi-regular and regular points in X relative to C(X, 𝔄), respectively. Thus, for each x ∈ Q, M_x(f) = lim_{n→∞} M_{n,x}(f) exists for f ∈ C(X, 𝔄), M_x(·) is a positive linear functional of norm 1 on C(X, 𝔄), and there is an S-invariant probability measure μ_x = μ_{M_x} on (X, 𝔅) such that

    M_x(f) = ∫_X f(y) μ_x(dy),    f ∈ C(X, 𝔄).
Lemma 5. For a bounded 𝔅-measurable function f on X,

    f♮(r) = ∫_X f(x) μ_r(dx),    r ∈ R,

is a bounded, 𝔅-measurable and S-invariant function on R and satisfies

    ∫_X f(x) μ(dx) = ∫_R f♮(r) μ(dr) = ∫_R ( ∫_X f(x) μ_r(dx) ) μ(dr)

for every μ ∈ P_s(X).

Proof. Let f be bounded and 𝔅-measurable on X. By Theorem 6.6 we see that the function g̃ on R̃ defined by

    g̃(r̃) = ∫_X f(x) μ_r(dx) = f♮(r),    r ∈ R,

is 𝔅̃-measurable and satisfies

    ∫_R̃ g̃(r̃) μ̃(dr̃) = ∫_X̃ f̃(x̃) μ̃(dx̃),    μ̃ ∈ P_s(X̃).
S-invariance of f♮ follows from that of μ_x for x ∈ Q.

Under these preparations, we are now able to prove the integral representation of the entropy functional.

Theorem 6. Let X be a totally disconnected compact Hausdorff space with the Baire σ-algebra 𝔛 and a homeomorphism S. If 𝔄 ∈ 𝒫(X) is a clopen partition of X, then the entropy functional H(·) = H(·, 𝔄, S) on P_s(X) has an integral representation with a universal bounded nonnegative S-invariant Baire function h on X:

    H(μ) = ∫_X h(x) μ(dx),    μ ∈ P_s(X),    (7.6)

    h(x) = h_μ(x)    μ-a.e.,    μ ∈ P_s(X),    (7.7)

where h_μ is given by (7.3) and h is unique in the P_s(X)-a.e. sense.

Proof. Let us denote by P_s(X, 𝔅) the set of all S-invariant measures in P(X, 𝔅). Define a function h on X by

    h(r) = ∫_X h_{μ_r}(x) μ_r(dx) = H(μ_r)  if r ∈ R,    h(r) = 0  if r ∉ R.

Clearly h is nonnegative. We shall show that h is bounded, S-invariant and 𝔅-measurable. Note that each h_{μ_r} (r ∈ R) is obtained as the limit of {h_{μ_r,n}}_{n=1}^∞ (cf. (7.5)), as was seen in the proof of Lemma 2: h_{μ_r,n} ↓ h_{μ_r}. Let

    g_n(r) = ∫_X h_{μ_r,n}(x) μ_r(dx) = −Σ_{A∈𝔄} Σ_{B∈𝔄_n} μ_r(A∩B){log μ_r(A∩B) − log μ_r(B)}

for r ∈ R. Since μ_r(C) = M_r(1_C) (C ∈ 𝔅₀) is a 𝔅-measurable function of r on R, g_n(·) is also 𝔅-measurable on R. Hence h is 𝔅-measurable on R since h_{μ_r,n} ↓ h_{μ_r} and

    h(r) = ∫_X h_{μ_r}(x) μ_r(dx) = lim_{n→∞} g_n(r),    r ∈ R.

This and the definition of h imply that h is bounded and 𝔅-measurable on X. S-invariance of h follows from that of μ_r.
To show (7.6), let A ∈ 𝔄, B ∈ 𝔄_∞ and μ, η ∈ P_s(X) with η ≪ μ. Then it holds that on one hand

    η(A∩B) = ∫_B P_η(A|𝔄_∞) dη = ∫_B P_μ(A|𝔄_∞) dη,    by Lemma 1,
            = ∫_R ( ∫_B P_μ(A|𝔄_∞)(x) μ_r(dx) ) η(dr),    by Lemma 5,

and on the other hand

    η(A∩B) = ∫_X 1_{A∩B} dη = ∫_R μ_r(A∩B) η(dr),    by Lemma 5,
            = ∫_R ( ∫_B P_{μ_r}(A|𝔄_∞)(x) μ_r(dx) ) η(dr).

It follows from these equalities and the S-invariance of μ_r (r ∈ R) that for each A ∈ 𝔄

    P_{μ_r}(A|𝔄_∞) = P_μ(A|𝔄_∞)    (7.8)
μ_r-a.e. for μ-a.e. r. We can now derive (7.6) as follows: for μ ∈ P_s(X)

    H(μ) = ∫_X h_μ dμ = ∫_R h_μ♮(r) μ(dr),    by Lemma 5,
         = ∫_R ( ∫_X h_μ(x) μ_r(dx) ) μ(dr)
         = ∫_R ( ∫_X h_{μ_r}(x) μ_r(dx) ) μ(dr),    by (7.8),
         = ∫_R h(r) μ(dr),    by the definition of h,
         = ∫_X h(x) μ(dx),    by the definition of h.

As to (7.7), let μ, η ∈ P_s(X) be such that η ≪ μ. Then by Lemmas 1, 2 and (7.6)

    H(η) = ∫_X h_η dη = ∫_X h_μ dη = ∫_X h dη,

and hence for every S-invariant f ∈ L¹(X, μ)

    ∫_X h(x) f(x) μ(dx) = ∫_X h_μ(x) f(x) μ(dx).
Since h and h_μ are S-invariant, (7.7) holds.

The function h obtained above is called a universal entropy function associated with the clopen partition 𝔄 and the homeomorphism S. Of course the functional H(·) is extended to M_s(X) with the same universal function h.

Next we formulate an integral representation of the entropy functional by a measure-theoretic method. So let (X, 𝔛, S) be an abstract measurable space with a measurable transformation S : X → X.

Theorem 7. Let 𝔄 ∈ 𝒫(X) be a fixed partition. Then there exists an S-invariant nonnegative measurable function h on X such that

    H(μ) = H(μ, 𝔄, S) = ∫_X h dμ,    μ ∈ P_s(X),    (7.9)

    h = h_μ    μ-a.e.,    μ ∈ P_s(X),    (7.10)

where h_μ is given by (5.8).
Proof. With the notations 𝔄_n = ∨_{j=1}^n S^{−j}𝔄, 𝔞_n = σ(𝔄_n) and 𝔄_∞ = σ( ∪_{n=1}^∞ 𝔄_n ), let

    𝔜 = {B ∈ 𝔄_∞ : S⁻¹B = B}

and observe that 𝔜 is a σ-subalgebra with S⁻¹𝔜 = 𝔜 and 𝔜 ⊆ 𝔄_∞. Hence by Corollary 5.5, for each μ ∈ P_s(X), there exists a function h_μ such that

    H(μ, 𝔄, S) = ∫_X h_μ dμ.

Let μ ∈ P_s(X) be fixed. Then for any A ∈ 𝔄 ∨ 𝔄_{n−1} we have that

    P_μ(A|𝔜)(x) = limsup_{k→∞} (1/k) Σ_{j=0}^{k−1} 1_A(S^j x)    μ-a.e.    (7.11)

since the RHS is 𝔜-measurable and for any B ∈ 𝔜

    ∫_B limsup_{k→∞} (1/k) Σ_{j=0}^{k−1} 1_A(S^j x) μ(dx)
      = ∫_X limsup_{k→∞} (1/k) Σ_{j=0}^{k−1} 1_B(x) 1_A(S^j x) μ(dx)
      = ∫_X limsup_{k→∞} (1/k) Σ_{j=0}^{k−1} 1_{A∩B}(S^j x) μ(dx) = μ(A∩B),

where we have used the S-invariance of B and the Pointwise Ergodic Theorem. For A ∈ 𝔛 let

    f_A(x) = limsup_{k→∞} (1/k) Σ_{j=0}^{k−1} 1_A(S^j x),    x ∈ X.

It then follows from (7.11) that, with the notation in Corollary 5.5,

    (1/n) H_μ(𝔄 ∨ 𝔄_{n−1}|𝔜) = −(1/n) Σ_{A∈𝔄∨𝔄_{n−1}} 1_A log P_μ(A|𝔜)
                              = −(1/n) Σ_{A∈𝔄∨𝔄_{n−1}} 1_A log f_A    μ-a.e.

Since the LHS → h_μ μ-a.e., we see that

    h_μ(x) = −lim_{n→∞} (1/n) Σ_{A∈𝔄∨𝔄_{n−1}} 1_A(x) log f_A(x) ≡ h(x)    μ-a.e.

Note that h is defined on all of X, is independent of μ, and satisfies (7.9) and (7.10). Moreover, h is S-invariant mod μ for μ ∈ P_s(X) since h_μ is so. Thus we can redefine h so that h is actually S-invariant on X.

Let 𝔄 ∈ 𝒫(X) be a fixed partition and h the S-invariant measurable function obtained in Theorem 7. Then h is called a universal entropy function.

Finally we want to extend the entropy functional H(·) = H(·, 𝔄, S) to P_a(X), the space of all AMS sources, and hence to M_a(X) = {αμ + βη : α, β ∈ ℂ, μ, η ∈ P_a(X)}.

Proposition 8. Assume that (X, 𝔛, S) is an abstract measurable space with a measurable invertible transformation S : X → X. Let 𝔄 ∈ 𝒫(X) be a fixed partition. Then the entropy functional H(·, 𝔄, S) with a universal entropy function h can be extended to a functional H̄(·, 𝔄, S) on M_a(X) with the same entropy function h such that
    H̄(ξ, 𝔄, S) = H(ξ̄, 𝔄, S) = ∫_X h dξ̄,    ξ ∈ M_a(X).

Proof. Let ξ ∈ M_a(X) with stationary mean ξ̄ ∈ M_s(X). Since ξ ≪ ξ̄, we have ξ ∈ M_ξ̄(X) ≡ {η ∈ M(X) : η ≪ ξ̄} ⊆ M_a(X). Hence, by Theorem 1.5.9, the functional H(·, 𝔄, S) can be extended to a functional H̄(·, 𝔄, S) on M_ξ̄(X) with the same entropy function h. But then, since h is invariant, we see that

    H̄(ξ, 𝔄, S) = ∫_X h dξ = ∫_X h dξ̄ = H(ξ̄, 𝔄, S),    ξ ∈ M_a(X),

by Remark 4.2. This completes the proof.

Results obtained in this section will be applied to derive an integral representation of the transmission rate of a stationary channel in Section 3.6.
Bibliographical notes

2.1. Alphabet message spaces and information sources. Alphabet message spaces were introduced to formulate information sources and channels by McMillan [1](1953). Theorem 1.1 is shown in Umegaki [4](1964), where he proved that an alphabet message space is not hyper-Stonean.
2.2. Ergodic theorems. Birkhoff [1](1931) proved the Pointwise Ergodic Theorem. The proof given here is due to Katznelson and Weiss [1](1982), which does not use the maximal ergodic theorem. von Neumann [1](1932) proved the Mean Ergodic Theorem. See also Akcoglu [1](1975).
2.3. Ergodic and mixing properties. (4) of Theorem 3.2 is obtained by Breiman [2](1960) and Blum and Hanson [1](1960) (see also Farrell [1](1962)). (4) of Theorem 3.6 is proved by Rényi [1](1958). Lemma 3.9 is due to Koopman and von Neumann [1](1932). An example of a measurable transformation that is weakly mixing but not strongly mixing is given by Kakutani [1](1973). Characterization of ergodic and mixing properties by the relative entropy is obtained by Oishi [1](1965) (Lemma 3.13 through Proposition 3.16). Related topics are seen in Rudolfer [1](1969).
2.4. AMS sources. The idea of AMS sources goes back to Dowker [2](1951) (see also [1, 3](1947, 1955)). Jacobs [1](1959) introduced almost periodic sources, which are essentially the same as AMS sources. Lemma 4.3 is proved in Kakihara [4](1991). Lemma 4.5 is due to Gray and Kieffer [1](1980). In Theorem 4.6, (2) and (5) are due to Rechard [1](1956), (3) and (4) to Gray and Kieffer [1], and (6) to Gray and Saadat [1](1984). Proposition 4.8 is shown by Fontana, Gray and Kieffer [1](1981) and Kakihara [4]. In Theorem 4.12, (2) is proved in Gray [1](1988), (8) is given by Ding and Yi [1](1965) (for almost periodic sources), and the others are noted here. (1) of Theorem 4.14 is obtained in Kakihara [4] and (2) is in Kakihara [5]. (2) and (3) of Proposition 4.15 are in Kakihara [4] and (4) is in Kakihara [5].
2.5. Shannon-McMillan-Breiman Theorem. The ergodic theorem in information theory is established in this section. Shannon's original form is Corollary 5.3, given in Shannon [1](1948). McMillan [1] obtained the L¹-convergence in the alphabet message space (Corollary 5.2). Breiman [1](1957, 1960) showed the a.e. convergence. Corollary 5.5 is due to Nakamura [1](1969). There are various types of formulations and generalizations of the SMB Theorem. We refer to Algoet and Cover [1](1988), Barron [1](1985), Chung [1](1961), Gray and Kieffer [1], Jacobs [1], [3](1962), Kieffer [1, 3](1974, 1975), Moy [1](1960), [2, 3](1961), Ornstein and Weiss [1](1983), Parthasarathy [2](1964), Perez [1, 2](1959, 1964) and Tulcea [1](1960).
2.6. Ergodic decompositions. Ergodic decomposition is obtained by Kryloff and Bogoliouboff [1](1937). Oxtoby [1](1952) gave its comprehensive treatment. The content of this section is mainly taken from these two articles. See also Gray and Davisson [1](1974).
2.7. Entropy functionals, revisited. The integral representation of an entropy functional by a universal entropy function in the alphabet message space was obtained by Parthasarathy [1](1961) (see also Jacobs [5](1963)). For a totally disconnected compact Hausdorff space Umegaki [4] proved such a representation. Lemma 7.1 through Theorem 7.6 are due to Umegaki [4]. Nakamura [1] derived an integral representation in a measure-theoretic setting without using ergodic decompositions (Theorem 7.7). Proposition 7.8 is noted here. Extension of the entropy functional H(·) to almost periodic sources is shown by Ding and Yi [1] using the result of Jacobs
CHAPTER III

INFORMATION CHANNELS

In this chapter, information channels are extensively studied in a general setting. Using alphabet message spaces as models for input and output, we formulate various types of channels such as stationary, continuous, weakly mixing, strongly mixing, AMS and ergodic ones. Continuous channels are discussed in some detail. Each channel induces an operator, called the channel operator, which is a bounded linear operator between two function spaces and completely determines the properties of the channel. One of the main parts of this chapter is a characterization of ergodic channels. Some equivalent conditions for ergodicity of stationary channels are given. It is recognized that many of these conditions are similar to those for ergodicity of stationary sources. AMS channels are considered as a generalization of stationary channels. Ergodicity of these channels is also characterized. Transmission rate is introduced as the mutual information between input and output sources. Its integral representation is obtained. Stationary and ergodic capacities of a stationary channel are defined, and their coincidence for a stationary ergodic channel is shown. Finally, Shannon's first and second coding theorems are stated and proved based on Feinstein's fundamental lemma.
3.1. Information channels

Definitions of certain types of channels are given and some basic properties are proved. As in Section 2.1 let X₀ = {a₁, …, a_p} be a finite set, construct a doubly infinite product space X = X₀^ℤ, and consider a shift transformation S on X. If 𝔛 is the Baire σ-algebra of X, then (X, 𝔛, S) is our input space. Similarly, we consider an output space (Y, 𝔜, T), where Y = Y₀^ℤ for another finite set Y₀ = {b₁, …, b_q}, 𝔜 is the Baire σ-algebra of Y, and T is the shift on Y. The compound space (X × Y, 𝔛⊗𝔜, S × T) is also constructed, where 𝔛⊗𝔜 is the σ-algebra generated by {A × C : A ∈ 𝔛, C ∈ 𝔜} and is the same as the Baire σ-algebra of X × Y. We use the notations P(Ω), P_s(Ω), P_a(Ω), C(Ω), B(Ω), … etc. for Ω = X, Y or X × Y.
Definition 1. A channel with input X = X₀^ℤ and output Y = Y₀^ℤ is a triple (X, ν, Y) for which the function ν : X × 𝔜 → [0, 1] satisfies
(c1) ν(x, ·) ∈ P(Y) for every x ∈ X;
(c2) ν(·, C) ∈ B(X) for every C ∈ 𝔜.
In this case ν is called a channel distribution or simply a channel. C(X, Y) denotes the set of all channels with input X and output Y.

The condition (c1) says that if an input x ∈ X is given, then we have a probability distribution on the output, and we can know the conditional probability ν(x, [y_i ⋯ y_j]) of a particular message [y_i ⋯ y_j] being received when x is sent. The technical condition (c2) is needed for mathematical analysis.

A channel ν ∈ C(X, Y) is said to be stationary if
(c3) ν(Sx, C) = ν(x, T⁻¹C) for every x ∈ X and C ∈ 𝔜,
which is equivalent to
(c3′) ν(Sx, E_{Sx}) = ν(x, T⁻¹E_{Sx}) for every x ∈ X and E ∈ 𝔛⊗𝔜, where E_x = {y ∈ Y : (x, y) ∈ E}, the x-section of E.
Since, in this case, S and T are invertible, we may write the condition (c3) as
(c3″) ν(Sx, TC) = ν(x, C) for x ∈ X and C ∈ 𝔜.
C_s(X, Y) denotes the set of all stationary channels ν ∈ C(X, Y). Note that C(X, Y) and C_s(X, Y) are convex, where the convex combination is defined by

    (αν₁ + (1 − α)ν₂)(x, C) = αν₁(x, C) + (1 − α)ν₂(x, C),    x ∈ X, C ∈ 𝔜.

Let p(b_k|a_j) be the conditional probability of b_k being received under the condition that a_j is sent, where 1 ≤ j ≤ p and 1 ≤ k ≤ q. If a channel ν is defined by

(c4) ν(x, [y_i ⋯ y_j]) = ∏_{t=i}^j p(y_t|x_t), where x = (x_t) ∈ X and [y_i ⋯ y_j] ⊆ Y is a message,

then it is said to be memoryless. The p × q matrix P = (p(b_k|a_j))_{j,k} is called a channel matrix of ν. Clearly every memoryless channel is stationary. In contrast, a channel ν ∈ C(X, Y) is said to have a finite memory or to be an m-memory channel if

(c5) there exists a positive integer m such that for any message V = [y_i ⋯ y_j] with i ≤ j it holds that

    ν(x, V) = ν(x′, V),    x = (x_k), x′ = (x′_k) ∈ X with x_k = x′_k (i − m ≤ k ≤ j).
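Condition (c4) is easy to compute with: the probability of a received message is a product of entries of the channel matrix, read off along the coordinates i, …, j. The sketch below (an illustration, not from the text; the alphabets and matrix entries are made up) evaluates ν(x, [y_i ⋯ y_j]) for a two-letter memoryless channel.

```python
from math import isclose
from itertools import product

# A 2x2 channel matrix P = (p(b_k | a_j)) over input/output alphabets {'a', 'b'}.
P = {('a', 'a'): 0.9, ('a', 'b'): 0.1,
     ('b', 'a'): 0.2, ('b', 'b'): 0.8}

def nu(x_window, y_message):
    # (c4): nu(x, [y_i ... y_j]) = prod_{t=i}^{j} p(y_t | x_t); only the input
    # coordinates x_i, ..., x_j matter, which is why the channel is memoryless.
    prob = 1.0
    for xt, yt in zip(x_window, y_message):
        prob *= P[(xt, yt)]
    return prob

print(isclose(nu("aab", "aab"), 0.9 * 0.9 * 0.8))  # True
# nu(x, .) is a probability: summing over all messages of length 2 gives 1.
print(isclose(sum(nu("ab", "".join(y)) for y in product("ab", repeat=2)), 1.0))  # True
```

The second check is the finite-dimensional shadow of condition (c1); an m-memory channel would instead let the product depend on the m input symbols preceding position i as well.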
In some literature, a finite memory channel is defined to satisfy (c5) above and the finite dependence condition (c8) (Definition 3.3). A weaker condition than (c5) is as follows: a channel ν ∈ C(X, Y) is said to be continuous if

(c5′) ν(·, [y_i ⋯ y_j]) ∈ C(X) for every message [y_i ⋯ y_j] ⊆ Y,

which is equivalent to

(c5″) ∫_Y f(·, y) ν(·, dy) ∈ C(X) for every f ∈ C(X × Y).

The equivalence will be proved in Proposition 5 below. A channel ν ∈ C(X, Y) is said to be dominated if

(c6) there exists some η ∈ P(Y) such that ν(x, ·) ≪ η for every x ∈ X.

For μ ∈ P(X) and ν ∈ C(X, Y), the output source μν ∈ P(Y) and the compound source μ⊗ν ∈ P(X × Y) are defined by

    μν(C) = ∫_X ν(x, C) μ(dx),    C ∈ 𝔜,    (1.1)

    μ⊗ν(E) = ∫_X ν(x, E_x) μ(dx),    E ∈ 𝔛⊗𝔜,    (1.2)

where E_x is the x-section of E. (1.2) can also be written as

    μ⊗ν(A × C) = ∫_A ν(x, C) μ(dx),    A ∈ 𝔛, C ∈ 𝔜.

Note that μ⊗ν(X × C) = μν(C) and μ⊗ν(A × Y) = μ(A) for A ∈ 𝔛 and C ∈ 𝔜.

All of the above definitions (except (c4), (c5) and (c5′)) can be made for a pair of general compact Hausdorff spaces X, Y with Baire σ-algebras 𝔛, 𝔜 and (not necessarily invertible) measurable transformations S, T, respectively. Or, more generally, (c1)–(c3) and (c6) are considered for channels with input and output measurable spaces (X, 𝔛, S), (Y, 𝔜, T). In this case, we consider "abstract" channels ν with input X and output Y. In what follows, unless otherwise stated, X and Y stand for general measurable spaces.

Note that any output source η ∈ P(Y) can be viewed as a "constant channel" by letting

    ν_η(x, C) = η(C),    x ∈ X, C ∈ 𝔜.

So we may write P(Y) ⊆ C(X, Y). In this case,

    μ⊗ν_η = μ × η,    μν_η = η,    μ ∈ P(X).

Thus, if η is stationary, the channel ν_η is stationary. Consequently, we may write P_s(Y) ⊆ C_s(X, Y).

A simple and important consequence of the above definitions is:
Proposition 2. If ν ∈ C_s(X, Y) and μ ∈ P_s(X), then μν ∈ P_s(Y) and μ⊗ν ∈ P_s(X × Y). That is, a stationary channel transforms stationary input sources into stationary output sources and into stationary compound sources.

Proof. Let ν ∈ C_s(X, Y), μ ∈ P_s(X), A ∈ 𝔛 and C ∈ 𝔜. Then we have

    μ⊗ν((S×T)⁻¹(A × C)) = μ⊗ν(S⁻¹A × T⁻¹C) = ∫_{S⁻¹A} ν(x, T⁻¹C) μ(dx)
      = ∫_{S⁻¹A} ν(Sx, C) μ(dx) = ∫_A ν(x, C) Sμ(dx)
      = ∫_A ν(x, C) μ(dx) = μ⊗ν(A × C),

where Sμ = μ ∘ S⁻¹. This is enough to deduce the conclusion.

A type of converse of Proposition 2 is obtained as follows:

Proposition 3. Assume that S is invertible and 𝔜 has a countable generator 𝔜₀, and let μ ∈ P(X) and ν ∈ C(X, Y). If μ⊗ν ∈ P_s(X × Y), then μ ∈ P_s(X) and ν ∈ C_s(X, Y) μ-a.e. in the sense that there is some stationary ν₁ ∈ C_s(X, Y) such that ν(x, ·) = ν₁(x, ·) μ-a.e. x ∈ X.

Proof. Since μ⊗ν is stationary, we have for A ∈ 𝔛 and C ∈ 𝔜

    μ⊗ν(A × C) = ∫_A ν(x, C) μ(dx) = μ⊗ν((S×T)⁻¹(A × C)) = μ⊗ν(S⁻¹A × T⁻¹C).    (1.3)
If C = Y, then (1.3) reduces to

    μ(A) = μ(S⁻¹A),    A ∈ 𝔛,

i.e., μ ∈ P_s(X). Using this, (1.3) can be rewritten as

    μ⊗ν(S⁻¹A × T⁻¹C) = ∫_{S⁻¹A} ν(x, T⁻¹C) μ(dx)
      = ∫_A ν(S⁻¹x, T⁻¹C) Sμ(dx) = ∫_A ν(S⁻¹x, T⁻¹C) μ(dx)

for every A ∈ 𝔛 and C ∈ 𝔜. Hence ν(x, C) = ν(S⁻¹x, T⁻¹C) μ-a.e. x for every C ∈ 𝔜. Let 𝔜₀ = {C₁, C₂, …} and

    X_n = {x ∈ X : ν(x, C_n) = ν(S⁻¹x, T⁻¹C_n)},    n ≥ 1.

Then X* = ∩_{n=1}^∞ X_n ∈ 𝔛 is such that μ(X*) = 1 and, for every x ∈ X*, ν(x, ·) = ν(S⁻¹x, T⁻¹·) on 𝔜₀ and hence on 𝔜 since 𝔜₀ is a generator of 𝔜. Thus ν is stationary μ-a.e.
The following proposition may be viewed as an ergodic theorem for a stationary channel.

Proposition 4. If ν ∈ C_s(X, Y) and μ ∈ P_s(X), then for every E, F ∈ 𝔛⊗𝔜 the following limit exists:

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, [(S×T)^{−k}E ∩ F]_x)    μ-a.e. x.

In particular, for every C, D ∈ 𝔜 the following limit exists:

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, T^{−k}C ∩ D)    μ-a.e. x.
Proof. Let ℑ = {E ∈ 𝔛⊗𝔜 : (S×T)⁻¹E = E} and let E, F ∈ 𝔛⊗𝔜 be arbitrary. Since μ⊗ν ∈ P_s(X × Y), we have by the Pointwise Ergodic Theorem that

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} 1_E((S×T)^k(x, y)) = lim_{n→∞} [(S⊗T)_n 1_E](x, y)
        = E_{μ⊗ν}(1_E|ℑ)(x, y)    μ⊗ν-a.e.,    (1.4)

where [(S⊗T)f](x, y) = f(Sx, Ty) for f ∈ B(X × Y), x ∈ X, y ∈ Y and (S⊗T)_n = (1/n) Σ_{k=0}^{n−1} (S⊗T)^k. Let Z = {(x, y) ∈ X × Y : the limit in (1.4) exists at (x, y)}. Then μ⊗ν(Z) = ∫_X ν(x, Z_x) μ(dx) = 1, so that ν(x, Z_x) = 1 μ-a.e. x. Hence the following limit exists μ-a.e. x by the Bounded Convergence Theorem:

    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, [(S×T)^{−k}E ∩ F]_x)
      = lim_{n→∞} ∫_Y 1_F(x, y) (1/n) Σ_{k=0}^{n−1} 1_E((S×T)^k(x, y)) ν(x, dy)
= I
Channels
lF(x,y)E^v{lEp){x,y)v(x,dy).
We need a tensor product Banach space with the least crossnorm A (cf. Schatten [1]). Let £ and T be two Banach spaces and £ 0 T be the algebraic tensor product. n
For $ = J2 4>j © *l>j tlw * east crossnorm A($) is defined as: i=i
A(«) = sup {|(0* © 0*)($)| : 0* 6 f *, r G * • , ll^'ll < 1,||0*|| < 1}, where (0* 0 0*)(00 0) = 0*(0)0*(0)- The completion of £©.F w.r.t. A is denoted by £ ®\F and called the injective tensor product Banach space off and T. Suppose that X, Y are compact Hausdorff spaces. If £ = C(X), then C(X) 0 T is identified with the function space consisting of .F-valued functions on X:
f 53 si o ^ ) (x)=53 aJ (x)^.
a; G X .
Moreover, it is true that C(X)®XF
=
C(X;F),
the Banach space of all ^"-valued (norm) continuous functions on X with the sup norm. In particular, we have that C(X) ®A C{Y) = C(X x Y), where we identify (aQb)(x,y) = a(x)b(y) for a G C(X), b G C(Y) and x G X, y G Y. Proposition 5. Consider alphabet message spaces X = Xff, Y = Yjf and a function i / : X x ? ) 4 [ 0 , l ] . Then: (1) (c5) =>■ (c5') & (c5"). (2)(c5'),(c6)=Kc2). Proof. (1) (c5)=>(c5') is clear and (c5')<»(c5") follows from the fact that each message [yt ■ ■ ■ yj] is a clopen set and the set fJJl of all messages forms a topological basis for Y, and the fact that C(X x Y) = C(X) ®x C(Y), where the algebraic tensor product C(X) © C(Y) is dense in it as is noted above. (2) Let b e C(Y). Then there exists a sequence { s n } ^ ! of simple functions of the form sn = $J aniklAn
, ^„ jfc e fflt, n > 1 such that
fc=i su
P|sn(2/)-%)| = ||s„-6|| -^0
asn-»oo.
3.1. Information channels
127
This implies that / sn(y) v(x, dy) ->• / b(y) v{x, dy)
uniformly in x.
Since fYsn(y)v(-,dy) <E C(X) by (c5'), we have JYb{y)u(-,dy) € C(X). Now (c6) implies that the RN-derivative k(x, •) = " & v y exists in LX(Y, n) for every fixed x € X and i/(x, C) = / k(x, y) r){dy),
x € X, C e 2).
Let C G 2) be given and find a sequence {6 n }£Li Q C(Y) such that
y M»)-lc(v)|i/(d»)-M). Then we have that, for x € X, i/(x,C)= /
lc(y)k(x,y)r)(dy)
JY
= lim / n-*oo Jy
bn{y)k(x,y)r](dy)
= lim /
bn{y)v{x,dy)
by the Dominated Convergence Theorem. Therefore, z/(-, C) is a limit of a sequence of continuous functions, and hence is measurable on X. That is, (c2) holds. For a channel v G C(X, Y) define an operator K„ : B(Y) -» B(X) by (K„6)(x) = /" b(y) v{x, dy),
b 6 B(Y),
(1.5)
where B ( X ) , 5 ( y ) are spaces of all bounded measurable functions on X, Y, respecn
n aj^-Cj, a simple function on V, then obviously K„6 e B(X) tively. In fact, if 6 = ^Z tively. In fact, if 6 = ^Z aj^-Cj> a simple function on V, then obviously K„6 e B(X). For a general 6 e 5 ( 1 0 we can choose a sequence {£>„} of simple functions converging to 6 pointwise with |6„| < |6| for n > 1. Then the Dominated Convergence Theorem applies to conclude that K„6 6 B(X). Now we can characterize continuous channels between alphabet message spaces using the operator K„.
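In a toy finite model the operator K_ν of (1.5) is just multiplication of a vector by a row-stochastic matrix. The sketch below (an illustration, not from the text) checks two basic properties of K_ν: it maps the constant function 1 to 1, and it is computed by (K_ν b)(x) = Σ_y b(y) ν(x, y).

```python
from math import isclose

# A channel on finite input/output spaces {0, 1}: nu[x] is a probability on outputs.
nu = {0: {0: 0.7, 1: 0.3},
      1: {0: 0.4, 1: 0.6}}

def K(b):
    # (K_nu b)(x) = integral of b against nu(x, .), here a finite sum
    return {x: sum(b[y] * row[y] for y in row) for x, row in nu.items()}

one = {0: 1.0, 1: 1.0}
b = {0: 2.0, 1: -1.0}

print(all(isclose(v, 1.0) for v in K(one).values()))  # True: K_nu maps 1 to 1
print(isclose(K(b)[0], 2.0 * 0.7 - 1.0 * 0.3))        # True
```

Positivity (b ≥ 0 implies K_ν b ≥ 0) is also immediate here, since all matrix entries are nonnegative; these are the properties that reappear abstractly in the averaging operators of Section 3.2.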
P r o p o s i t i o n 6. Consider alphabet message spaces X = XQ and Y = YQ and a stationary channel v € Cs (X, Y). Then the following conditions are equivalent:
Chapter III: Information Channels
128
(1) v is continuous, i.e., v satisfies (c5'); (2) The operator K„ defined by (1.5) is a linear operator from C(Y) into C(X); (3) The dual operator K* of K„ is from C{X)* into C(Y)», i.e., K* : M(X) -+ M(Y) and is sequentially weak* continuous, i.e., £„ -¥ f weak* implies K*£„ -+ K*£ weak*. (4) u(; n [y£] ■ ■■yfty G C(X) forn>l and messages [yik ■ - - j A ] C 7 ( l < k(4) => (2) => (3) => (1). (1) =>■ (4). We first prove the case where n = 2. If j i < i2, then
[^-^n^-^H U bf-liVr"Mf^?l (1-6) y*e*o >l
and hence
K-> laP • • • iShfibff • ■ ■ iffD = £ *(■.foP• • • i&Wt • • • *-««? • Wt€Vo Jl <*<«2
(1.7) Since each term on the RHS of (1.7) is continuous by (1), the LHS is also continuous. If ii < ii <jt <j2, then LHS of (1.6) = h,W • • • y ^ l j n fo£> ■ ■ • »£>] n J 0
[y(2)... y
«]
if yk ' j=- yk
n
^
. . . „«]
for some ii < k < ji
Thus the LHS of (1.7) is continuous by (1). The other cases are similarly proved. For a general n > 2, we can establish (4) by mathematical induction.
(4) ⇒ (2). Let C₀ be the algebra generated by the functions of the form 1_{[y_i ⋯ y_j]} (i ≤ j). Then (4) implies that K_ν C₀ ⊂ C(X). Since each b ∈ C(Y) is uniformly approximated by a sequence {bₙ} ⊂ C₀, we have K_ν b ∈ C(X) because ‖K_ν(b − bₙ)‖ ≤ ‖b − bₙ‖ → 0. Therefore, K_ν is a linear operator from C(Y) into C(X).
(2) ⇒ (3). Suppose that {ξₙ} ⊂ M(X) = C(X)* is a sequence converging weak* to ξ ∈ M(X), i.e.,
    ⟨a, ξₙ⟩ = ∫_X a dξₙ → ∫_X a dξ = ⟨a, ξ⟩,  a ∈ C(X),
where ⟨·, ·⟩ is the duality pair for C(X) and M(X). Then for b ∈ C(Y) we have that
    ⟨b, K*_ν ξₙ⟩ = ⟨K_ν b, ξₙ⟩ → ⟨K_ν b, ξ⟩ = ⟨b, K*_ν ξ⟩
since K_ν b ∈ C(X) by (2). This means that {K*_ν ξₙ} ⊂ M(Y) converges weak* to K*_ν ξ. Therefore (3) holds.
(3) ⇒ (1). For x ∈ X we denote by δ_x the Dirac measure at x, i.e., δ_x(A) = 1 if x ∈ A and = 0 if x ∉ A for A ∈ 𝔛. Note that δ_x ∈ M(X) for x ∈ X. Assume that {xₙ}ₙ₌₁^∞ ⊂ X converges to x, i.e., d(xₙ, x) → 0, where the metric d is defined by (II.1.1). For a ∈ C(X) one has
    ⟨a, δ_{xₙ}⟩ = a(xₙ) → a(x) = ⟨a, δ_x⟩,
i.e., δ_{xₙ} → δ_x weak*. Thus, by (3) we see that for b ∈ C(Y)
    (K_ν b)(xₙ) = ⟨K_ν b, δ_{xₙ}⟩ = ⟨b, K*_ν δ_{xₙ}⟩ → ⟨b, K*_ν δ_x⟩ = (K_ν b)(x).
If, in particular, we take b = 1_{[y_i ⋯ y_j]}, then the above shows that K_ν b ∈ C(X). That means (1) holds.
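The operator K_ν above can be made concrete in a small finite sketch (my own illustration, not from the text): for a single-letter channel over finite alphabets, ν(x, ·) is a row of a stochastic matrix P, and K maps b ∈ B(Y) to (Kb)(x) = Σ_y ν(x, {y}) b(y), i.e. Kb = Pb. The names P and K here are hypothetical.

```python
import numpy as np

# Hypothetical single-letter channel over finite alphabets: nu(x, {y}) = P[x, y],
# a row-stochastic matrix.  The operator K sends b in B(Y) to
# (Kb)(x) = sum_y P[x, y] * b(y), i.e. K b = P @ b.
rng = np.random.default_rng(0)
P = rng.random((3, 4))
P /= P.sum(axis=1, keepdims=True)   # make each row a probability vector on Y0

def K(b):
    """Channel operator: average b over the output distribution of each input x."""
    return P @ b

# (k1): K1 = 1 and K preserves positivity, since each row of P is a probability.
ones_Y = np.ones(4)
assert np.allclose(K(ones_Y), 1.0)

# For an indicator b = 1_{y=y1}, (Kb)(x) = nu(x, {y1}) -- a column of P.
b = np.array([0.0, 1.0, 0.0, 0.0])
assert np.allclose(K(b), P[:, 1])
```

In the alphabet-message-space setting of Proposition 6, (Kb)(x) for a message indicator b depends on only finitely many output distributions, which is the finite-dimensional shadow of the continuity condition (c5′).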
3.2. Channel operators

A channel associates with a certain averaging operator on the space of bounded Baire functions. We shall establish a one-to-one affine correspondence between channels and this kind of operators.
Let X, Y be a pair of compact Hausdorff spaces with Baire measurable transformations S, T, respectively. Consider the algebras B(X), B(Y) and B(X × Y) of all bounded Baire functions on X, Y and X × Y, respectively, where the product is pointwise multiplication. These are C*-algebras with unit 1, and B(X) and B(Y) are regarded as *-subalgebras of B(X × Y) by the identification a = a ⊗ 1 and b = 1 ⊗ b for a ∈ B(X) and b ∈ B(Y). Here the involution f ↦ f* is defined by f*(ω) = the complex conjugate of f(ω) for f ∈ B(Ω), ω ∈ Ω, and the norm is the sup-norm, where Ω = X, Y or X × Y. A linear operator A from B(X × Y) onto B(X) is said to be an averaging operator if for f, g ∈ B(X × Y)
(a1) A1 = 1, A(f·Ag) = (Af)(Ag);
(a2) f ≥ 0 ⇒ Af ≥ 0,
which imply:
(a1′) A is idempotent, i.e., A² = A;
(a2′) ‖A‖ = 1.
That is, A is a norm 1 projection from B(X × Y) onto B(X).
An averaging operator A satisfies
    Af* = (Af)*,  (Af)*(Af) ≤ A(f*f),  f ∈ B(X × Y).
In fact, the first follows from (a2) and the second is derived using the first, (a1) and (a2):
    0 ≤ A((f − Af)*(f − Af)) = A(f*f) − (Af)*(Af).
Let us denote by 𝒜(X, Y) the set of all averaging operators A from B(X × Y) onto B(X) such that
(a3) fₙ ↓ 0 ⇒ Afₙ ↓ 0.
A ∈ 𝒜(X, Y) is said to be stationary if
(a4) SA = A(S ⊗ T).
𝒜_s(X, Y) denotes the set of all stationary operators in 𝒜(X, Y). Note that 𝒜(X, Y) and 𝒜_s(X, Y) are convex sets.
We need another set of operators. Let 𝒦(X, Y) denote the set of all bounded linear operators K : B(Y) → B(X) such that
(k1) K1 = 1, Kb ≥ 0 if b ≥ 0;
(k2) bₙ ↓ 0 ⇒ Kbₙ ↓ 0.
K ∈ 𝒦(X, Y) is said to be stationary if
(k3) KT = SK.
𝒦_s(X, Y) denotes the set of all stationary K ∈ 𝒦(X, Y). Also note that 𝒦(X, Y) and 𝒦_s(X, Y) are convex sets.
Let ν ∈ C(X, Y) be an abstract channel and define operators A : B(X × Y) → B(X) and K : B(Y) → B(X) respectively by
    (Af)(x) = ∫_Y f(x, y) ν(x, dy),  f ∈ B(X × Y),   (2.1)
    (Kb)(x) = ∫_Y b(y) ν(x, dy),  b ∈ B(Y).   (2.2)
A and K are called channel operators associated with ν and are sometimes denoted by A_ν and K_ν, respectively. Note that A_ν(1 ⊗ b) = K_ν b for b ∈ B(Y), where (a ⊗ b)(x, y) = a(x)b(y) for a ∈ B(X), b ∈ B(Y) and x ∈ X, y ∈ Y.
First we establish a one-to-one, onto and affine correspondence between C(X, Y) and 𝒦(X, Y), since both are convex sets:

Theorem 1. There exists a one-to-one, onto and affine correspondence ν ↔ K between C(X, Y) and 𝒦(X, Y) (or C_s(X, Y) and 𝒦_s(X, Y)) given by (2.2).

Proof. Let ν ∈ C(X, Y) and define K : B(Y) → B(X) by (2.2). Then K satisfies (k1) by (c1) and (c2). (k2) follows from the Monotone Convergence Theorem.
Conversely, let K ∈ 𝒦(X, Y). For each fixed x ∈ X, p_x defined by
    p_x(b) = (Kb)(x),  b ∈ C(Y),
is a positive linear functional of norm 1 on C(Y) by (k1). Hence there exists a unique probability measure ν_x ∈ P(Y) such that
    p_x(b) = ∫_Y b(y) ν_x(dy),  b ∈ C(Y).   (2.3)
Let B₁ be the set of all b ∈ B(Y) for which (2.3) holds. Then C(Y) ⊂ B₁ and B₁ is a monotone class by (k2). Thus B₁ = B(Y). If we take b = 1_C for C ∈ 𝔜, then (K1_C)(x) = ∫_Y 1_C(y) ν_x(dy) = ν_x(C), x ∈ X, and ν_{(·)}(C) ∈ B(X). Letting ν(x, C) = ν_x(C) for x ∈ X and C ∈ 𝔜, we see that (c1) and (c2) are satisfied. Thus we have established a one-to-one, onto correspondence ν ↔ K between C(X, Y) and 𝒦(X, Y), which is clearly affine.
Moreover, if ν ∈ C_s(X, Y), then for x ∈ X and b ∈ B(Y) it holds that
    (KTb)(x) = [K(Tb)](x) = ∫_Y b(Ty) ν(x, dy) = ∫_Y b(y) ν(x, dT⁻¹y) = ∫_Y b(y) ν(Sx, dy) = (Kb)(Sx) = (SKb)(x),
i.e., K ∈ 𝒦_s(X, Y). If, conversely, K ∈ 𝒦_s(X, Y), then considering b = 1_C for C ∈ 𝔜 we see that ν(x, T⁻¹C) = ν(Sx, C), x ∈ X, by virtue of (k3), i.e., ν ∈ C_s(X, Y). Therefore, the correspondence ν ↔ K is one-to-one, onto and affine between C_s(X, Y) and 𝒦_s(X, Y).
Using Theorem 1, we can also obtain a one-to-one correspondence between C(X, Y) and 𝒜(X, Y) as follows:

Theorem 2. There exists a one-to-one, onto and affine correspondence ν ↔ A between C(X, Y) and 𝒜(X, Y) (or C_s(X, Y) and 𝒜_s(X, Y)) given by (2.1).

Proof. Let ν ∈ C(X, Y) be given and define an operator A by (2.1). A1 = 1 is clear. Observe that for f, g ∈ B(X × Y) and x ∈ X
    [A(f·Ag)](x) = ∫_Y (f·Ag)(x, y) ν(x, dy) = ∫_Y f(x, y)(Ag)(x) ν(x, dy)
        = ∫_Y f(x, y) ν(x, dy) · (Ag)(x) = (Af)(x)(Ag)(x).
Hence A satisfies (a1). (a2) is rather obvious and (a3) follows from the Monotone Convergence Theorem. Thus A ∈ 𝒜(X, Y). If ν ∈ C_s(X, Y), i.e., ν is stationary, then for f ∈ B(X × Y) and x ∈ X one has
    (SAf)(x) = (Af)(Sx) = ∫_Y f(Sx, y) ν(Sx, dy) = ∫_Y f(Sx, y) ν(x, dT⁻¹y) = ∫_Y f(Sx, Ty) ν(x, dy) = ∫_Y [(S ⊗ T)f](x, y) ν(x, dy) = [A(S ⊗ T)f](x).
Consequently, (a4) is satisfied and A ∈ 𝒜_s(X, Y).
Conversely, let A ∈ 𝒜(X, Y) and define
    (Kb)(x) = A(1 ⊗ b)(x),  b ∈ B(Y), x ∈ X.
Then it is easily seen that K ∈ 𝒦(X, Y). By Theorem 1 there exists a unique channel ν ∈ C(X, Y) such that (2.2) holds. Thus, for a₁, …, aₙ ∈ B(X) and b₁, …, bₙ ∈ B(Y) we have
    A(Σₖ aₖ ⊗ bₖ)(x) = Σₖ aₖ(x)(Kbₖ)(x) = Σₖ aₖ(x) ∫_Y bₖ(y) ν(x, dy)
                     = ∫_Y (Σₖ aₖ ⊗ bₖ)(x, y) ν(x, dy).   (2.4)
Since the algebraic tensor product space B(X) ⊗ B(Y) is a monotone class, (2.4) implies (2.1). If A ∈ 𝒜_s(X, Y), then for x ∈ X and C ∈ 𝔜 it holds that
    ν(Sx, C) = ∫_Y 1_C(y) ν(Sx, dy) = ∫_Y (1 ⊗ 1_C)(Sx, y) ν(Sx, dy) = [SA(1 ⊗ 1_C)](x) = [A(S ⊗ T)(1 ⊗ 1_C)](x) = ∫_Y 1(Sx)1_C(Ty) ν(x, dy) = ∫_Y 1_C(y) ν(x, dT⁻¹y) = ν(x, T⁻¹C).
Therefore, ν ∈ C_s(X, Y).
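The averaging identity (a1) that drives Theorem 2 can be checked numerically in a finite sketch (my own illustration with hypothetical names; X and Y finite, ν a row-stochastic matrix NU, functions on X × Y stored as arrays f[x, y]):

```python
import numpy as np

# Finite sketch of the channel operator (2.1): (Af)(x) integrates out y
# against nu(x, .), here NU[x, y] = nu(x, {y}) with row-stochastic NU.
rng = np.random.default_rng(1)
NU = rng.random((4, 5))
NU /= NU.sum(axis=1, keepdims=True)

def A(f):
    """(Af)(x) = sum_y f(x, y) * nu(x, {y})."""
    return (f * NU).sum(axis=1)

f = rng.random((4, 5))
g = rng.random((4, 5))

# (a1): A1 = 1 and A(f . Ag) = (Af)(Ag), where Ag in B(X) is viewed as a
# function on X x Y constant in y (broadcast with [:, None]).
assert np.allclose(A(np.ones((4, 5))), 1.0)
lhs = A(f * A(g)[:, None])
rhs = A(f) * A(g)
assert np.allclose(lhs, rhs)
```

The identity holds exactly because Ag(x) is constant in y and factors out of the y-integral, which is precisely the computation in the proof of Theorem 2.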
Remark 3. Let ν ↔ K ↔ A, where ν ∈ C(X, Y), K ∈ 𝒦(X, Y) and A ∈ 𝒜(X, Y). If ν is continuous, i.e., ν satisfies (c5″), then K : C(Y) → C(X) and A : C(X × Y) → C(X) are positive bounded linear operators of norm 1.

Definition 4. Let 𝒫 be a subset of P(X).
(1) Two channels ν₁, ν₂ ∈ C(X, Y) are said to be identical mod 𝒫, denoted ν₁ = ν₂ (mod 𝒫), if ν₁(x, ·) = ν₂(x, ·) μ-a.e. x ∈ X for every μ ∈ 𝒫. In this case, we write ν₁(x, ·) = ν₂(x, ·) 𝒫-a.e. x.
(2) Two operators K₁, K₂ ∈ 𝒦(X, Y) are said to be identical mod 𝒫, denoted K₁ = K₂ (mod 𝒫), if (K₁b)(x) = (K₂b)(x) μ-a.e. x ∈ X for every b ∈ B(Y) and μ ∈ 𝒫. In this case, we write K₁b = K₂b 𝒫-a.e.
(3) Two operators A₁, A₂ ∈ 𝒜(X, Y) are said to be identical mod 𝒫, denoted A₁ = A₂ (mod 𝒫), if
    μ(A₁f) = μ(A₂f),  f ∈ C(X × Y), μ ∈ 𝒫,
or, equivalently,
    A₁*(μ) = A₂*(μ),  μ ∈ 𝒫,
where Aᵢ* : B(X)* → B(X × Y)* is the adjoint operator of Aᵢ (i = 1, 2).
We need the following lemma and proposition.

Lemma 5. Let ν ∈ C(X, Y) and μ ∈ P(X). Then the following holds:
    ∫_X (Kb)(x) μ(dx) = ∫_X ∫_Y b(y) ν(x, dy) μ(dx) = ∫_Y b(y) μν(dy),   (2.5)
    ∫_X (Af)(x) μ(dx) = ∫_X ∫_Y f(x, y) ν(x, dy) μ(dx) = ∬_{X×Y} f(x, y) μ ⊗ ν(dx, dy)   (2.6)
for b ∈ B(Y) and f ∈ B(X × Y), where μν and μ ⊗ ν are the output source and compound source defined by (1.1) and (1.2), respectively.
Proof. (2.5) and (2.6) can be verified first for simple functions and then for general bounded Baire functions by approximation.

Corollary 6. (1) Let ν₁, ν₂ ∈ C(X, Y) and 𝒫 ⊂ P(X). If ν₁(x, ·) ≪ ν₂(x, ·) 𝒫-a.e. x, then μν₁ ≪ μν₂ and μ ⊗ ν₁ ≪ μ ⊗ ν₂ for every μ ∈ 𝒫.
(2) If ν ↔ K ↔ A for ν ∈ C(X, Y), K ∈ 𝒦(X, Y) and A ∈ 𝒜(X, Y), then K*μ = μν and A*μ = μ ⊗ ν for μ ∈ P(X).

Proposition 7. Let 𝒫 ⊂ P(X) and suppose that νᵢ ∈ C(X, Y), Kᵢ ∈ 𝒦(X, Y) and Aᵢ ∈ 𝒜(X, Y) (i = 1, 2) correspond to each other, i.e., νᵢ ↔ Kᵢ ↔ Aᵢ for i = 1, 2. Then the following statements are equivalent:
(1) ν₁ = ν₂ (mod 𝒫).
(2) K₁ = K₂ (mod 𝒫).
(3) A₁ = A₂ (mod 𝒫).
(4) A₁*μ = A₂*μ, i.e., μ ⊗ ν₁ = μ ⊗ ν₂, for every μ ∈ 𝒫.
Proof. We have the following two-sided implications:
    ν₁ = ν₂ (mod 𝒫) ⟺ ν₁(x, ·) = ν₂(x, ·) 𝒫-a.e. x
    ⟺ ∫_X ∫_Y a(x)b(y) ν₁(x, dy)μ(dx) = ∫_X ∫_Y a(x)b(y) ν₂(x, dy)μ(dx)
        for a ∈ C(X), b ∈ C(Y) and μ ∈ 𝒫   (2.7)
    ⟺ ∫_X A₁(a ⊗ b)(x) μ(dx) = ∫_X A₂(a ⊗ b)(x) μ(dx) for a ∈ C(X), b ∈ C(Y) and μ ∈ 𝒫
    ⟺ μ(A₁(a ⊗ b)) = μ(A₂(a ⊗ b)) for a ∈ C(X), b ∈ C(Y) and μ ∈ 𝒫   (2.8)
    ⟺ A₁ = A₂ (mod 𝒫),
since C(X × Y) = C(X) ⊗_λ C(Y). This implies (1) ⟺ (3). Moreover, we have
    (2.8) ⟺ μ(aK₁b) = μ(aK₂b) for a ∈ C(X), b ∈ C(Y) and μ ∈ 𝒫
         ⟺ K₁b = K₂b 𝒫-a.e. for b ∈ C(Y)
         ⟺ K₁ = K₂ (mod 𝒫),
implying (2) ⟺ (3), and
    (2.7) ⟺ ∬_{X×Y} a(x)b(y) μ ⊗ ν₁(dx, dy) = ∬_{X×Y} a(x)b(y) μ ⊗ ν₂(dx, dy)
        for a ∈ C(X), b ∈ C(Y) and μ ∈ 𝒫
    ⟺ μ ⊗ ν₁ = μ ⊗ ν₂ for μ ∈ 𝒫
by Lemma 5, implying (1) ⟺ (4).
The important cases are those where 𝒫 = P_s(X) and 𝒫 = P_se(X).
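The identities of Lemma 5 are easy to check in a finite sketch (my own illustration; the names NU, mu, etc. are hypothetical): with μ a probability vector on a finite X and NU the channel matrix, the output source is μν = μNU and the compound source is the joint matrix μ(x)ν(x, y).

```python
import numpy as np

# Finite check of Lemma 5: (2.5) says  int_X (Kb) dmu = int_Y b d(mu nu),
# and (2.6) relates the compound source mu (x) nu to iterated integration.
rng = np.random.default_rng(2)
NU = rng.random((4, 5)); NU /= NU.sum(axis=1, keepdims=True)
mu = rng.random(4); mu /= mu.sum()

b = rng.random(5)
Kb = NU @ b                    # (Kb)(x) = sum_y nu(x, {y}) b(y)
mu_nu = mu @ NU                # output source, as in (1.1)
assert np.allclose(mu @ Kb, mu_nu @ b)           # (2.5)

joint = mu[:, None] * NU       # compound source: (mu (x) nu)({(x, y)})
f = rng.random((4, 5))
lhs = (joint * f).sum()                          # double integral of f
rhs = mu @ ((NU * f).sum(axis=1))                # iterated integral, (2.6)
assert np.allclose(lhs, rhs)
```

Both assertions are exact identities of matrix algebra, mirroring the Fubini-type computation in the proof of Lemma 5.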
Definition 8. P_s(X) is said to be complete for ergodicity provided that if A ∈ 𝔛 and μ ∈ P_s(X) are such that μ(A) > 0, then there exists an ergodic source η ∈ P_se(X) such that η(A) > 0.
As was remarked before (cf. Section 2.2), if S is a continuous transformation on X, then S is measurable, P_s(X) ≠ ∅ and P_se(X) ≠ ∅.

Proposition 9. Suppose that S is a continuous transformation on the compact Hausdorff space X into X. Then P_s(X) is complete for ergodicity.

Proof. Suppose that μ(A) > 0 for some μ ∈ P_s(X) and A ∈ 𝔛. Since μ is regular, there is a compact set C ∈ 𝔛 such that C ⊂ A and μ(C) > 0. Then we can choose a sequence {fₙ}ₙ₌₁^∞ ⊂ C(X) such that fₙ ↓ 1_C as n → ∞ since X is compact and Hausdorff. If 0 < α ≤ 1, then the set 𝒱_α = {η ∈ P_s(X) : η(C) ≥ α} is a weak* compact convex set, since
    𝒱_α = ⋂ₙ₌₁^∞ {η ∈ P_s(X) : η(fₙ) ≥ α}
and P_s(X) is so. Let α₀ = sup {α : 𝒱_α ≠ ∅}. Then α₀ ≥ μ(C) > 0 and 𝒱_{α₀} is a nonempty weak* compact convex set. Hence there exists some η₀ ∈ ex 𝒱_{α₀} such that η₀(C) = α₀ > 0. We claim that η₀ ∈ ex P_s(X) = P_se(X). If this is not the case, there exist μ₁, μ₂ ∈ P_s(X) and β ∈ (0, 1) such that η₀ = βμ₁ + (1 − β)μ₂ and μ₁ ≠ μ₂. Note that by the definition of α₀ we have μ₁(C), μ₂(C) ≤ α₀. Since α₀ = η₀(C) = βμ₁(C) + (1 − β)μ₂(C), we must have μ₁(C) = μ₂(C) = α₀ and hence μ₁, μ₂ ∈ 𝒱_{α₀}. This is a contradiction to the extremality of η₀ in 𝒱_{α₀}. Thus η₀ ∈ P_se(X). For this η₀ it holds that η₀(A) ≥ η₀(C) > 0.

Proposition 10. Suppose that P_s(X) is complete for ergodicity and let ν₁, ν₂ ∈ C_s(X, Y). Then the following conditions are equivalent:
(1) ν₁ = ν₂ (mod P_s(X)).
(2) ν₁ = ν₂ (mod P_se(X)).
(3) ν₁(x, E_x) = ν₂(x, E_x) P_s(X)-a.e. x for every E ∈ 𝔛 ⊗ 𝔜.
(4) ν₁(x, E_x) = ν₂(x, E_x) P_se(X)-a.e. x for every E ∈ 𝔛 ⊗ 𝔜.
(5) μ ⊗ ν₁ = μ ⊗ ν₂ for every μ ∈ P_s(X).
(6) μ ⊗ ν₁ = μ ⊗ ν₂ for every μ ∈ P_se(X).

Proof. (1) ⇒ (2), (3) ⇒ (4), (5) ⇒ (6), (1) ⟺ (5) and (2) ⟺ (6) are clear.
(2) ⇒ (1) and (4) ⇒ (3) follow from the completeness for ergodicity of P_s(X).
(6) ⇒ (4). Assume (4) is not true. Then there is some E ∈ 𝔛 ⊗ 𝔜 and some μ ∈ P_se(X) such that μ({x ∈ X : ν₁(x, E_x) ≠ ν₂(x, E_x)}) > 0. We may suppose μ({x ∈ X : ν₁(x, E_x) > ν₂(x, E_x)}) > 0. Then
    ∫_A ν₁(x, E_x) μ(dx) > ∫_A ν₂(x, E_x) μ(dx),
> u2(x,Ex)}.
I
u2{x,Ex)n(dx), Since
x Y)]x) = lA(x)ui(x,Ex),
i = 1,2,
it holds that
/*®i/i(Bn(4xy))>/i®»j(£n(Ax r)), which implies that JU® I/J ^ ^,®V2, a contradiction to (6). To close this section we characterize continuous channels once more in terms of channel operators, which is an extension of Proposition 1.6. Proposition 11. Let v G C{X, Y) be a channel and K 6 K{X, Y) and A g A(X, Y) be the corresponding channel operators. Then the following conditions are equivalent: (1) v is continuous, i.e., v satisfies (c5"). (2) K : C(Y) —» C(X) is a linear operator. (3) A : C(X x 7 ) - t C(X) is a linear operator. (4) K* : M(X) —> M(Y) is sequentially weak* continuous. (5) A* : M(X) —> M(X x Y) is sequentially weak* continuous. Proof. This follows from Proposition 1.6, Remark 3 and Corollary 6. The results obtained in this section will be applied to characterize ergodicity of stationary channels in Section 3.4 and to discuss channel capacity in Section 3.7.
3.3. Mixing channels Finite dependence was defined for channels between alphabet message spaces. As a generalization, asymptotic independence (or strong mixing) and weak mixing are introduced, which are similar to those for stationary information sources. Also ergodicity and semiergodicity are defined for stationary channels and relations among these notions are clarified.
3.3. Mixing channels
137
Consider a pair of abstract measurable spaces (X, X) and (Y, 2)) with measurable transformations S and T, respectively. We begin with a notion of ergodicity. Definition 1. A stationary channel v 6 CS(X, Y) is said to be ergodic if (c7) p. 6 Pae(X) => p, <8> i/ G P s e ( X x F ) , i.e., if a stationary ergodic source is the input, then the compound source must also be stationary ergodic. Cae(X, Y) denotes the set of all stationary ergodic channels. To obtain conditions for ergodicity we first consider channels between alphabet message spaces X = XQ and Y = YQ, where XQ and YQ are finite sets. We have the following with the notation in Section 3.1. Proposition 2. If v € CS(XQ,Y0Z)
is a memoryless channel, then v is ergodic.
Proof. Let P = (p(6|a)) . be the channel matrix of v (cf. Definition 1.1, (c4)). Let p e PSe{X) and A = [x0'= a], B = [x0 = a'] C X and C = [y0 = 6], D = [y0 = b'] C Y. Then one has 1
n _ 1
- Yl M ® v((S x T)~k(A fc=0
1
x C) n (B x D))
n _ 1
= - Y^ M ® y((S~fc^4 n B) x (T- f c C n £>)) n fc=o j n-l
= - Y2, f* ® "(fr* = a - x o = a'] x [2/fc = 6,2/o = 6']) fc=0
l""1 = - 22 p([xk = a,x0 = a'])p(b\a)p(b'\a') 71 fc=o ->• /i([a;o = a])p([x0 = o'])p(fc|a)p(6'|o'), by ergodicity of p, = pi/(^4 x C)/i ® v(B x D). It is not hard to verify that the above holds for general messages A, B C XQ and C,D C YQ1. Hence p ® v is ergodic by Theorem II.3.2. Therefore i/ is ergodic. Definition 3. A channel v e C(XQ,YQ) dependent if 71
is said to be finitely dependent or in
(c8) There exists a positive integer m G N such that for any n,r,s,t < r < s < £ and s — r > m it holds that v{x, Cn,r n CStt) = K^, C„, r )i/(x, C M )
€ N with
138
Chapter III: Information Channels
for every x 6 X and every message C„, r = [y„ ■ ■ ■ yT], Ce,t = [y3 ■ ■ ■ yt] C Y. Then we record the following theorem, which immediately follows from Theorem 7 below. T h e o r e m 4. / / a stationary channel v € CS(X$,F0Z) is ergodic.
is finitely dependent, then u
E x a m p l e 5. Consider alphabet message spaces X = XQ and Y = Y® with messages DJly in Y. Let m > 1 be an integer and for each ( » i , . . . , x m ) 6 X™ a probability distribution p ( ( x i , . . . , x m ) | ■ ) on Y"0m is given. Define v0 : X x fflY -> [0,1] by t-s
"of*, [y» ••■!/«]) = n ? ( ( I « + * - " " " ' ,a: s -(-fc-i)|(j/ s +fc- m ,... ,2/ s +fc-i)) fc=o and extend vo to i/ : X x 2) —> [0,1], so that v becomes a channel. Then, this v is stationary and finitely dependent, and has a finite memory. This is easily verified and the proof is left to the reader. Finite dependence can be generalized as follows. Consider a general pair of ab stract measurable spaces (X, X) and (Y, 2J) with measurable transformations S and T as before. First recall that a stationary source fj. € PS(X) is said to be strongly mixing (SM) if lim /j,(S-nA
n—foo
n f l ) = fi(A)n(B),
A,
BeX,
and to be weakly mixing (WM) if 1 "_1 i™o n E I ^ S - ' v l n B ) - fi(A)fi(B)\ = 0,
A,5el
fc=0
Also recall that \i e PS(X) is ergodic iff 1 "_1 fc=0
(cf. Theorem II.3.2). It was noted that for stationary sources strong mixing => weak mixing => ergodicity. As an analogy to these formulations we have the following definition.
3.3. Mixing channels
139
Definition 6. A channel v € C(X,Y) strongly mixing (SM) if
is said to be asymptotically independent or
(c9) For every C , D e 2 ) Vimg{u(x,T-nCnD)-v(x,T-nC)u(x,D)}=0
P,(X)-a.e.x.,
to be weakly mixing ( WM) if (clO) For every C , D e 2 ) n-l
lim - ^ | i / ( a ; , r " f c C n i ? ) - i / ( a ; 1 T - f c C ) i / ( x , I > ) | = 0
n—foo n
Pa(X)-a.e.x,
fc=0
and to be semiergodic (SE) if (ell) For every C, £> e 2) 1 n-l Urn 71 - ^ { f ( x , r - f c C n £ > ) - i/(a;,r- f c C)i/(x,D)} = 0 P s (X)-a.e.x. n—»oo fc=0
Note that if (X, X, S) is complete for ergodicity (cf. the previous section), then in (c9), (clO) and (3.1) below we can replace Ps(X)-a.e. by Pse(X)-a.e. Consider the following condition that looks slightly stronger than (ell): for every C,D 6 2) 1
n—1
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, T⁻ᵏC ∩ D) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, T⁻ᵏC)ν(x, D)  P_s(X)-a.e. x.   (3.1)
In view of Proposition 1.4, the LHS of (3.1) exists for a stationary channel. Hence (c11) and (3.1) are equivalent for stationary channels. If (3.1) holds for every x ∈ X, then the existence of the RHS means that ν(x, ·) ∈ P_s(Y) and (3.1) itself means that ν(x, ·) ∈ P_se(Y) for every x ∈ X (cf. Theorem II.4.12). Also we have for (stationary) channels
    strong mixing ⇒ weak mixing ⇒ ergodicity ⇒ semiergodicity,
where the first implication is obvious, the second is seen in Theorem 7 below, and the last will be verified later. Actually, (3.1) is a necessary (and not sufficient) condition for ergodicity of a stationary channel (cf. Theorem 13 and Theorem 4.2). Now we can give a basic result regarding these concepts.

Theorem 7. Let ν ∈ C_s(X, Y)
and μ ∈ P_s(X).
(1) If μ is ergodic and ν is weakly mixing, then μ ⊗ ν is ergodic. Hence, every strongly or weakly mixing stationary channel is ergodic.
(2) If μ and ν are weakly mixing, then μ ⊗ ν is also weakly mixing.
(3) If μ and ν are strongly mixing, then μ ⊗ ν is also strongly mixing.
Proof. (1) Let A, B ∈ 𝔛 and C, D ∈ 𝔜. Then we have that
    (1/n) Σ_{k=0}^{n−1} ∫_{S⁻ᵏA ∩ B} ν(x, T⁻ᵏC)ν(x, D) μ(dx)
        = (1/n) Σ_{k=0}^{n−1} ∫_X ν(x, T⁻ᵏC) 1_A(Sᵏx) ν(x, D) 1_B(x) μ(dx)
        = (1/n) Σ_{k=0}^{n−1} ∫_X ν(Sᵏx, C) 1_A(Sᵏx) ν(x, D) 1_B(x) μ(dx),   (3.2)
since ν is stationary,
        → ∫_X ν(x, C) 1_A(x) μ(dx) ∫_X ν(x, D) 1_B(x) μ(dx),
since μ is stationary and ergodic,
        = μ ⊗ ν(A × C) μ ⊗ ν(B × D).
On the other hand, since ν is weakly mixing, it holds that
    (1/n) Σ_{k=0}^{n−1} |ν(x, T⁻ᵏC ∩ D) − ν(x, T⁻ᵏC)ν(x, D)| → 0  μ-a.e. x ∈ X
as n → ∞. By the Bounded Convergence Theorem we have
    (1/n) Σ_{k=0}^{n−1} ∫_X |ν(x, T⁻ᵏC ∩ D) − ν(x, T⁻ᵏC)ν(x, D)| μ(dx) → 0   (3.3)
as n → ∞. Combining (3.2) and (3.3), we get
    |(1/n) Σ_{k=0}^{n−1} μ ⊗ ν((S × T)⁻ᵏ(A × C) ∩ (B × D)) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
    = |(1/n) Σ_{k=0}^{n−1} μ ⊗ ν((S⁻ᵏA ∩ B) × (T⁻ᵏC ∩ D)) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
    = |(1/n) Σ_{k=0}^{n−1} ∫_{S⁻ᵏA ∩ B} ν(x, T⁻ᵏC ∩ D) μ(dx) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
    ≤ (1/n) Σ_{k=0}^{n−1} ∫_X |ν(x, T⁻ᵏC ∩ D) − ν(x, T⁻ᵏC)ν(x, D)| μ(dx)
      + |(1/n) Σ_{k=0}^{n−1} ∫_{S⁻ᵏA ∩ B} ν(x, T⁻ᵏC)ν(x, D) μ(dx) − μ ⊗ ν(A × C) μ ⊗ ν(B × D)|
→ 0 as n → ∞. Thus, by Theorem II.3.2, μ ⊗ ν is ergodic.
(2) and (3) can be verified in a similar manner as above.

In the above theorem we obtained sufficient conditions for ergodicity of a stationary channel, namely, strong mixing and weak mixing. Key expressions there are (3.2) and (3.3). In particular, if η ∈ P_s(Y) is WM, then the constant channel ν_η is stationary ergodic. For, if μ ∈ P_se(X), then μ ⊗ ν_η = μ × η is ergodic by Theorem II.3.10.
In the rest of this section, we consider semiergodic channels together with averages of channels. We shall show that there exists a semiergodic channel for which, if a weak mixing source is input, then the compound source is not ergodic. This implies that semiergodicity of a channel is weaker than ergodicity.

Definition 8. Let ℑ = {A ∈ 𝔛 : S⁻¹A = A} and ν ∈ C_s(X, Y). Then a stationary channel ν* ∈ C_s(X, Y) is called an average of ν if
(c12) ν*(Sx, C) = ν*(x, C) for x ∈ X and C ∈ 𝔜;
(c13) ∫_A ν(x, C) μ(dx) = ∫_A ν*(x, C) μ(dx) for A ∈ ℑ, C ∈ 𝔜 and μ ∈ P_s(X).
(c12) means that for each C ∈ 𝔜 the function ν*(·, C) is S-invariant, while (c13) means that
    ν*(·, C) = E_μ(ν(·, C)|ℑ)(·)  μ-a.e.   (3.4)
for C ∈ 𝔜 and μ ∈ P_s(X), where E_μ(·|ℑ) is the conditional expectation w.r.t. ℑ under the measure μ. A sufficient condition for the existence of the average of a given stationary channel will be obtained (cf. Theorem 13). We need a series of lemmas.

Lemma 9. If ν ∈ C_s(X, Y) has an average ν*, then for C ∈ 𝔜
    ν*(·, C) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(Sᵏ·, C)  P_s(X)-a.e.   (3.5)
Proof. In fact, this follows from (3.4) and the Pointwise Ergodic Theorem.

Lemma 10. Let ν ∈ C_s(X, Y) have an average ν*. If μ ∈ P_se(X) is such that μ ⊗ ν ≠ μ × (μν), the product measure, then μ ⊗ ν ≠ μ ⊗ ν*.

Proof. Observe that for A ∈ 𝔛 and C ∈ 𝔜
    μ ⊗ ν*(A × C) = ∫_A ν*(x, C) μ(dx)
        = ∫_A lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(Sᵏx, C) μ(dx),  by (3.5),
        = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ∫_X ν(Sᵏx, C) 1_A(x) μ(dx)
        = ∫_X ν(x, C) μ(dx) ∫_X 1_A(x) μ(dx),  since μ is ergodic,
        = μ(A) μν(C).
Thus μ ⊗ ν ≠ μ ⊗ ν* by assumption.

Lemma 11. Let ν₁, ν₂ ∈ C_s(X, Y) be semiergodic with averages ν₁*, ν₂*, respectively. If ν₁* = ν₂* (mod P_s(X)) (cf. Definition 2.4), then ν₀ = ½(ν₁ + ν₂) is also semiergodic.

Proof. Since νᵢ* (i = 1, 2) exists, we see that for C, D ∈ 𝔜
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} νᵢ(x, T⁻ᵏC)νᵢ(x, D) = νᵢ*(x, C)νᵢ(x, D)  P_s(X)-a.e. x
by Lemma 9 and hence, by semiergodicity of νᵢ,
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} νᵢ(x, T⁻ᵏC ∩ D) = νᵢ*(x, C)νᵢ(x, D)  P_s(X)-a.e. x.   (3.6)
Consequently, we have that
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} {ν₀(x, T⁻ᵏC ∩ D) − ν₀(x, T⁻ᵏC)ν₀(x, D)}
    = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} [ ½ν₁(x, T⁻ᵏC ∩ D) + ½ν₂(x, T⁻ᵏC ∩ D)
        − ¼{ν₁(Sᵏx, C) + ν₂(Sᵏx, C)}{ν₁(x, D) + ν₂(x, D)} ],
since ν₁, ν₂ are stationary,
    = ½{ν₁*(x, C)ν₁(x, D) + ν₂*(x, C)ν₂(x, D)}
        − ¼{ν₁*(x, C) + ν₂*(x, C)}{ν₁(x, D) + ν₂(x, D)},
by (3.6), stationarity of ν₁, ν₂, and the definition of the average,
    = ¼{ν₁*(x, C) − ν₂*(x, C)}{ν₁(x, D) − ν₂(x, D)}
    = 0  P_s(X)-a.e. x,
by ν₁* = ν₂* (mod P_s(X)). Thus ν₀ is semiergodic.

Lemma 12. If ν ∈ C_s(X, Y) is semiergodic and has an average ν*, then ν* is also semiergodic.
Proof. Let C, D ∈ 𝔜 and μ ∈ P_s(X). Then,
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν*(x, T⁻ᵏC ∩ D)
        = lim_{n→∞} E_μ((1/n) Σ_{k=0}^{n−1} ν(·, T⁻ᵏC ∩ D) | ℑ)(x)
        = E_μ(lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(·, T⁻ᵏC ∩ D) | ℑ)(x)
        = E_μ(ν*(·, C)ν(·, D) | ℑ)(x),
by (3.6) with νᵢ = ν and (10) in Section 1.2,
        = ν*(x, C) E_μ(ν(·, D) | ℑ)(x)
        = ν*(x, C)ν*(x, D)
        = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν*(x, T⁻ᵏC)ν*(x, D)  μ-a.e. x,
by (c12), which yields that ν* is semiergodic.

Now the existence of an average channel of a given stationary channel is considered under some topological conditions.
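The Cesàro-mean formula (3.5) for the average channel can be tried out in a finite toy model (my own construction, not from the text; all names hypothetical): take X = ℤ_m with Sx = x + 1 (mod m), so every orbit is all of X and the average channel is the mean of the rows of the channel matrix.

```python
import numpy as np

# Toy model for the average channel nu* of (3.5): X = Z_m cyclic under the
# shift Sx = x+1 mod m, channel nu given by a row-stochastic matrix NU.
m, ny = 5, 3
rng = np.random.default_rng(3)
NU = rng.random((m, ny)); NU /= NU.sum(axis=1, keepdims=True)

def nu_star(x):
    """nu*(x, .) = lim (1/n) sum_k nu(S^k x, .); here the orbit average of all rows."""
    return NU.mean(axis=0)

# (c12): nu* is S-invariant, nu*(Sx, .) = nu*(x, .).
for x in range(m):
    assert np.allclose(nu_star((x + 1) % m), nu_star(x))

# Finite Cesaro means converge to nu* (exact here when n is a multiple of m).
n = 1000
cesaro = np.mean([NU[k % m] for k in range(n)], axis=0)
assert np.allclose(cesaro, nu_star(0))
```

In this fully cyclic system the σ-field ℑ of invariant sets is trivial, so (3.4) reduces to plain averaging; Theorem 13 below handles the general compact-metric output case.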
Theorem 13. Let (X, 𝔛, S) be an abstract measurable space with a measurable transformation S and Y be a compact metric space with the Borel σ-algebra 𝔜 and a homeomorphism T. Then every stationary channel ν ∈ C_s(X, Y) has an average ν* ∈ C_s(X, Y).

Proof. Denote by M₁⁺(Y) ⊂ M(Y) = C(Y)* the positive part of the unit sphere and by B(Y) the space of bounded Borel functions on Y. Let ν ∈ C_s(X, Y) be given. For each x ∈ X define a functional ν_x on B(Y) by
    ν_x(f) = ∫_Y f(y) ν(x, dy),  f ∈ B(Y).
If we restrict ν_x to C(Y), then we see that ν_x ∈ M₁⁺(Y) for each x ∈ X. Let 𝔇 be a countable dense subspace of C(Y) with the scalar multiplication of rational complex numbers, and let
    Λ_x(f) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν_{Sᵏx}(f),  f ∈ 𝔇   (3.7)
for each x ∈ X. Since, for each f ∈ C(Y), the function ν_{(·)}(f) is bounded and measurable on X, (3.7) is well-defined μ-a.e. by the Pointwise Ergodic Theorem for each μ ∈ P_s(X). Let X_𝔇 = {x ∈ X : the limit (3.7) exists for all f ∈ 𝔇}. It is easily seen that X_𝔇 ∈ 𝔛, X_𝔇 is S-invariant, and μ(X_𝔇) = 1 for μ ∈ P_s(X) since 𝔇 is countable. Note that, for each x ∈ X_𝔇, Λ_x(·) is a positive bounded linear functional on 𝔇 since for f ∈ 𝔇
    |Λ_x(f)| = |lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν_{Sᵏx}(f)| ≤ lim sup_{n→∞} (1/n) Σ_{k=0}^{n−1} ∫_Y |f(y)| ν(Sᵏx, dy) ≤ ‖f‖,
so that Λ_x(·) can be uniquely extended to a positive bounded linear functional of norm 1 on C(Y). Let us examine some properties of Λ_x. Λ_x satisfies
    Λ_{Sx}(f) = Λ_x(f),  x ∈ X_𝔇, f ∈ C(Y)   (3.8)
since Λ_{Sx}(f) = Λ_x(f) for x ∈ X_𝔇 and f ∈ 𝔇. For each f ∈ 𝔇, Λ_{(·)}(f) is 𝔛-measurable on X_𝔇, which follows from (3.7). For a general f ∈ C(Y) there exists a sequence {fₙ}ₙ₌₁^∞ ⊂ 𝔇 such that ‖fₙ − f‖ → 0, so that
    Λ_x(f) = lim_{n→∞} Λ_x(fₙ),  x ∈ X_𝔇,
implying the measurability of Λ_{(·)}(f) on X_𝔇. Moreover, for each x ∈ X_𝔇, one can find a probability measure η_x on 𝔜 such that
    Λ_x(f) = ∫_Y f(y) η_x(dy),  f ∈ C(Y)
by the Riesz-Markov-Kakutani Theorem. One can also verify that η_x is T-invariant, i.e., η_x ∈ P_s(Y) for x ∈ X_𝔇, which follows from (3.7) and stationarity of ν. Consider the set
    B₀ = { f ∈ B(Y) : ∫_Y f dη_x is 𝔛-measurable on X_𝔇 }.
Then we see that C(Y) ⊂ B₀ and B₀ is a monotone class, i.e., if {fₙ}ₙ₌₁^∞ ⊂ B₀ and fₙ ↓ f, then f ∈ B₀. Hence one has B₀ = B(Y). Denote by the same symbol Λ_x the functional extended to B(Y). Take any η ∈ P_s(Y) and define ν* by
    ν*(x, C) = η_x(C) if x ∈ X_𝔇,  ν*(x, C) = η(C) if x ∈ X_𝔇ᶜ
for C ∈ 𝔜. We shall show that ν* is the desired average of ν. First note that ν* is a stationary channel, ν* ∈ C_s(X, Y), by η_x (x ∈ X_𝔇), η ∈ P_s(Y) and (3.8). Now we verify (c12) and (c13) (Definition 8). If x ∈ X_𝔇, then Sx ∈ X_𝔇 and
    ν*(Sx, C) = η_{Sx}(C) = η_x(C) = ν*(x, C),  C ∈ 𝔜,
and if x ∈ X_𝔇ᶜ, then Sx ∈ X_𝔇ᶜ and
    ν*(Sx, C) = η(C) = ν*(x, C),  C ∈ 𝔜,
so that (c12) is satisfied. To see (c13), let μ ∈ P_s(X) be fixed and observe that
    ∫_X ∫_Y f(y) ν(x, dy)μ(dx) = ∫_X ∫_Y f(y) ν*(x, dy)μ(dx)   (3.9)
for f ∈ 𝔇. For, if g(x) = ∫_Y f(y) ν(x, dy) (x ∈ X), then
    g_S(x) ≡ lim_{n→∞} (Sₙg)(x) = ∫_Y f(y) ν*(x, dy)  μ-a.e. x
and
    ∫_X g(x) μ(dx) = ∫_X g_S(x) μ(dx)
by the Pointwise Ergodic Theorem, which gives (3.9). Moreover, (3.9) holds for f ∈ C(Y) since 𝔇 is dense in C(Y). If G ∈ 𝔛 is S-invariant (G ∈ ℑ) with μ(G) > 0, then the measure μ_G defined by
    μ_G(A) = μ(A ∩ G)/μ(G) = μ(A|G),  A ∈ 𝔛,
is an S-invariant probability measure. We have that for C ∈ 𝔜
    ∫_G ν(x, C) μ(dx) = μ(G) ∫_X ν(x, C) μ_G(dx) = μ(G) ∫_X ν*(x, C) μ_G(dx) = ∫_G ν*(x, C) μ(dx).
If G ∈ ℑ is such that μ(G) = 0, then the equality in (c13) also holds. Thus ν* satisfies (c13). Therefore ν* is the desired average of ν.
The following theorem is our main result of this section.

Theorem 14. There is a semiergodic channel ν₀ and a strongly mixing input source μ₀ for which the compound source μ₀ ⊗ ν₀ is not ergodic. Hence, a semiergodic channel need not be ergodic.

Proof. Consider an alphabet message space X = {0, 1}^ℤ with the Baire σ-algebra 𝔛 and the shift S. Let ν ∈ C(X, X) be a memoryless channel with the channel matrix
    P = (1 0; 0 1)
(cf. Definition 1.1). Then one can directly verify that ν is stationary and semiergodic. ν has an average ν* by Theorem 13 and ν* is semiergodic by Lemma 12, so that
    ν₀ = ½ν + ½ν*
is also semiergodic by Lemma 11 since (ν*)* = ν*. Now let μ₀ be a (½, ½)-Bernoulli source (cf. Example II.1.2), which is strongly mixing (cf. Example II.3.7). Then μ₀ ⊗ ν is not a direct product measure, so that μ₀ ⊗ ν ≠ μ₀ ⊗ ν* by Lemma 10. Since
    μ₀ ⊗ ν₀ = ½μ₀ ⊗ ν + ½μ₀ ⊗ ν*
is a proper convex combination, μ₀ ⊗ ν₀ is not ergodic by Theorem II.3.2. This completes the proof.
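The failure of ergodicity in Theorem 14 can be seen in a quick simulation (my own illustration; it assumes the text's ν is the noiseless channel y = x, and all names are hypothetical). A sample path of the mixture μ₀ ⊗ ν₀ first picks one component, so the time average of 1[x_k = y_k] along a path is 1 or about ½ depending on the pick, and hence is not a.s. constant:

```python
import numpy as np

# One sample path from each component of mu0 (x) nu0.  Under mu0 (x) nu the
# output copies the input (y_k = x_k); under mu0 (x) nu* the output is an
# independent fair coin.  A non-constant limit of the time average of
# 1[x_k = y_k] across components is exactly the failure of ergodicity.
rng = np.random.default_rng(4)
n = 20000
x = rng.integers(0, 2, n)              # (1/2,1/2)-Bernoulli input path
y_noiseless = x                        # component mu0 (x) nu
y_independent = rng.integers(0, 2, n)  # component mu0 (x) nu*

avg_diag = np.mean(x == y_noiseless)   # time average along the "copy" path
avg_prod = np.mean(x == y_independent) # time average along the independent path
assert avg_diag == 1.0
assert abs(avg_prod - 0.5) < 0.02
```

By the ergodic theorem an ergodic pair process would force a single almost-sure value for this time average; the mixture produces two distinct values with probability ½ each.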
3.4. Ergodic channels

In the previous section sufficient conditions for ergodicity were given. In this section, ergodicity of stationary channels is characterized by finding several necessary and sufficient conditions. We shall notice that many of these conditions are similar to those for a stationary source to be ergodic. Also we use channel operators to obtain equivalence conditions for ergodicity. In particular, we realize that there is a close relation between ergodicity and extremality.
Let (X, 𝔛, S) and (Y, 𝔜, T) be a pair of abstract measurable spaces with measurable transformations S and T, respectively. Recall that a stationary channel ν ∈ C_s(X, Y) is said to be ergodic if
(c7) μ ∈ P_se(X)
⇒ μ ⊗ ν ∈ P_se(X × Y).
Definition 1. Let 𝒫 be a subset of P(X). A stationary channel ν ∈ C_s(X, Y) is said to be extremal in C_s(X, Y) mod 𝒫, denoted ν ∈ ex C_s(X, Y) (mod 𝒫), if whenever ν₁, ν₂ ∈ C_s(X, Y) and α ∈ (0, 1) are such that ν = αν₁ + (1 − α)ν₂ (mod 𝒫), then ν₁ = ν₂ (mod 𝒫).
The following theorem gives some necessary and sufficient conditions for ergodicity of a stationary channel, which are very similar to those for ergodicity of a stationary source (cf. Theorem II.3.2).

Theorem 2. For a stationary channel ν ∈ C_s(X, Y) the following conditions are equivalent:
(1) ν ∈ C_se(X, Y), i.e., ν is ergodic.
(2) If E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant, then ν(x, E_x) = 0 or 1 P_se(X)-a.e. x.
(3) There exists an ergodic channel ν₁ ∈ C_se(X, Y) such that ν(x, ·) ≪ ν₁(x, ·) P_se(X)-a.e. x.
(4) If a stationary channel ν₀ ∈ C_s(X, Y) is such that ν₀(x, ·) ≪ ν(x, ·) P_se(X)-a.e. x, then ν₀ is ergodic.
(5) ν ∈ ex C_s(X, Y) (mod P_se(X)).
(6) For E, F ∈ 𝔛 ⊗ 𝔜 and μ ∈ P_se(X) it holds that
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, [(S × T)⁻ᵏE ∩ F]_x) = lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ν(x, [(S × T)⁻ᵏE]_x)ν(x, F_x) = μ ⊗ ν(E)ν(x, F_x)  μ-a.e. x.
(7) For A, B ∈ 𝔛, C, D ∈ 𝔜 and μ ∈ P_se(X) it holds that
    lim_{n→∞} (1/n) Σ_{k=0}^{n−1} ∫ {ν(x, T⁻ᵏC ∩ D) − ν(x, T⁻ᵏC)ν(x, D)} μ(dx) = 0.
Proof. (1) ⇒ (2). Let μ ∈ P_se(X). Then μ ⊗ ν ∈ P_se(X × Y) by (1). Hence, if E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant, then μ ⊗ ν(E) = 0 or 1, i.e.,
    ∫_X ν(x, E_x) μ(dx) = 0 or 1.
This implies ν(x, E_x) = 0 μ-a.e. x or ν(x, E_x) = 1 μ-a.e. x. Thus (2) holds.
(2) ⇒ (1). Let E ∈ 𝔛 ⊗ 𝔜 be S × T-invariant, μ ∈ P_se(X) and
    A₀ = {x ∈ X : ν(x, E_x) = 0},  A₁ = {x ∈ X : ν(x, E_x) = 1}.
Note that T⁻¹E_{Sx} = E_x since
    T⁻¹E_{Sx} = {y ∈ Y : Ty ∈ {z ∈ Y : (Sx, z) ∈ E}} = {y ∈ Y : (Sx, Ty) ∈ E}
              = {y ∈ Y : (x, y) ∈ (S × T)⁻¹E = E} = E_x.   (4.1)
:SxeA0}Ao} = {xeX 1
= {x £ X : v(x, T~ Esx) = 0},
: v{Sx, ESx) = 0} because v is stationary,
= {x £ X : 1/(2;,^) = 0} = A0. Similarly, we see that Ai is 5-invariant. Thus, n(A0) = 0 or 1, and n(Ai) = 0 or 1 since n is ergodic. Consequently, we have that lt®v{E)=
I u(x,Ex)n(dx) Jx
= 0 or 1.
Therefore, \x ® v is ergodic. (1) => (3) is trivial since we can take v\ = v and (3) =$■ (1) follows from Theorem II.3.2 and Corollary 2.6 (1).
3-4- Ergodic channels
149
(1) =>• (5). Suppose that (5) is false. Then for some v\,v% € CS(X,Y) with vi ^ v2 (modPPsese(X)) (X)) ;and a G (0,1) we have v = avx + (1 - a)u2 (modP s e (X)). Hence nv\ i= A4 ® V2 for some fj. 6 Pse{X) and H ® ^(.E) = / v(x, Ex) n(dx) Jx = /
[aux(x, £ s ) + (1 - a)i/ 2 (x, #*)] /i(
= a/i®^i(£,) + ( l - a ) ^ ® z y 2 ( £ ) ,
Be2®2).
Thus fi ® J/ = a/xi>i + (1 — a)n ® i/2 is a nontrivial convex combination of two distinct stationary sources. Therfore, \i ® v is not ergodic, which contradicts the ergodicity of v. (5) => (2). Assume that (2) is false. Then, there are some /J e Pse{X) and some SxT-invariant £ e 3Dg>2) such that ^(v4) > 0, where A= {x e X : v(x, Ex) ^ 0, l } . Define v\ and v-i by X £ / 1
u1(x,C)={
v{x,Ex) ' [ v(x,C), y(x,C), (r v{x, v(x, CCnn El) £%) ^(x,0=\ ^(x,C)=i v(x,E°) v(x,E°) ' [ v(x,C), v(z,C),
' 1 6 xeA ,4°,c, I XeA € A
> xeAcc,
where C £ 2}- Clearly, i/i and 1*2 are channels. As in the proof of (2) => (1) we see that Ex = T'^Esx by (4.1) and hence 1 v(x, T1ESx E)Sx) = u(Sx,E = v(x,Tv{Sx,ESx Sx),
u{x,E v{x,Ex) implying that S^A
x €e A,
(4.2)
Ul[ bx M
= A. Thus, for x e A and C G 2)
'°>' W
_ v(Sx,CnE v(Sx,CnESxSx) ) v(Sx,E u(Sx,ESxSx))
_
_ ~ 11
v{x,T-\CnESx v{x,T-\CnE Sx)) u{x,Exx)
_ ii//^^rr--''ccnnrr ^^)) __ V^T^Cr\E V^T^Cr\EXX)) =
u(x, Exx) u1(x,T~1C).
v(x, Exx)
It follows that v\ is stationary. Similarly, one can verify that v2 is stationary. More over, we see that vi(x,Ea)
= l,
v2{x,Ex)
= 0,
x e A,
150
Chapter III: Information
Channels
and hence ν_1 ≠ ν_2 (mod P_se(X)). Note that for x ∈ X and C ∈ 𝔜

    ν(x, C) = ν(x, E_x) ν_1(x, C) + {1 − ν(x, E_x)} ν_2(x, C).    (4.3)
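The identity (4.3) is just the decomposition of a probability measure into its conditional measures on E_x and on E_x^c. A minimal numerical sketch on a four-point output space (the space, the measure and the set E below are illustrative, not from the text):

```python
# Check (4.3): nu = nu(E)*nu1 + (1 - nu(E))*nu2 on a finite output space.

def conditional(nu, E):
    """Return the measure nu(. ∩ E)/nu(E), i.e. nu conditioned on E."""
    mass = sum(nu[y] for y in E)
    return {y: (nu[y] / mass if y in E else 0.0) for y in nu}

nu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # plays the role of nu(x, .)
E = {0, 2}                               # plays the role of the section E_x
p = sum(nu[y] for y in E)                # nu(x, E_x), here 0 < p < 1

nu1 = conditional(nu, E)                 # supported on E
nu2 = conditional(nu, set(nu) - E)       # supported on E^c

recombined = {y: p * nu1[y] + (1 - p) * nu2[y] for y in nu}
assert all(abs(recombined[y] - nu[y]) < 1e-12 for y in nu)   # (4.3)
assert abs(sum(nu1[y] for y in E) - 1.0) < 1e-12             # nu1(E) = 1
assert sum(nu2[y] for y in E) == 0.0                         # nu2(E) = 0
```

The last two assertions mirror the facts ν_1(x, E_x) = 1 and ν_2(x, E_x) = 0 used in the proof.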
Let B = {x ∈ X : ν(x, E_x) > 1/2} ∈ 𝔛 and define ν_3 and ν_4 by

    ν_3(x, C) = ν_1(x, C),    x ∈ B,
    ν_3(x, C) = 2ν(x, E_x) ν_1(x, C) + {1 − 2ν(x, E_x)} ν_2(x, C),    x ∈ B^c,

    ν_4(x, C) = {2ν(x, E_x) − 1} ν_1(x, C) + {2 − 2ν(x, E_x)} ν_2(x, C),    x ∈ B,
    ν_4(x, C) = ν_2(x, C),    x ∈ B^c,
where C ∈ 𝔜. Obviously, ν_3, ν_4 ∈ C(X, Y). Observe that S^{-1}B = B and S^{-1}B^c = B^c by (4.2). Then it is easily verified that ν_3 and ν_4 are stationary, by the S-invariance of B and B^c, the stationarity of ν_1 and ν_2, and (4.2). Furthermore, we have by the definition of ν_3 and ν_4, and by (4.3), that

    ν = ½ν_3 + ½ν_4    (mod P_se(X)).

If x ∈ A ∩ B, then ν_3(x, E_x) = ν_1(x, E_x) = 1, while

    ν_4(x, E_x) = {2ν(x, E_x) − 1} ν_1(x, E_x) + {2 − 2ν(x, E_x)} ν_2(x, E_x)
                = 2ν(x, E_x) − 1 < 1.
If x ∈ A ∩ B^c, then

    ν_3(x, E_x) = 2ν(x, E_x) ν_1(x, E_x) + {1 − 2ν(x, E_x)} ν_2(x, E_x) = 2ν(x, E_x) > 0,

while ν_4(x, E_x) = ν_2(x, E_x) = 0. Thus, ν_3(x, E_x) ≠ ν_4(x, E_x) for x ∈ A. This means that ν_3 ≠ ν_4 (mod P_se(X)) and so ν ∉ ex C_s(X, Y) (mod P_se(X)). Therefore, (5) is not true.

(4) ⇒ (2). Suppose (2) is false. Then for some μ ∈ P_se(X) and E ∈ 𝔛⊗𝔜 with (S×T)^{-1}E = E it holds that μ(A) > 0, where A = {x ∈ X : 0 < ν(x, E_x) < 1}. Define ν_0 by

    ν_0(x, C) = ν(x, C ∩ E_x)/ν(x, E_x)    (x ∈ A, C ∈ 𝔜),
    ν_0(x, C) = ν(x, C)                    (x ∈ A^c, C ∈ 𝔜).

Then we see that ν_0 ∈ C_s(X, Y) and ν ≠ ν_0 (mod P_se(X)) in a similar way as before. Moreover, we have ν_0(x,·) ≪ ν(x,·) P_se(X)-a.e.x, since D ∈ 𝔜 and ν(x, D) = 0 imply ν_0(x, D) = 0 for x ∈ X. This contradicts (4).
(1), (2), (5) ⇒ (4). Thus far we have proved (1) ⇔ (2) ⇔ (3) ⇔ (5). Assume (4) is false. Then there exists some ν_0 ∈ C_s(X, Y) such that

    ν_0(x,·) ≪ ν(x,·)  P_se(X)-a.e.x,    ν_0 ≠ ν  (mod P_se(X)).

Let μ ∈ P_se(X) be arbitrary. By (2), if E ∈ 𝔛⊗𝔜 is S×T-invariant, then ν(x, E_x) = 0 or 1 μ-a.e.x. Let ν_1 = ½ν + ½ν_0. Then ν_1(x, E_x) = 0 or 1 μ-a.e.x, since ν(x, E_x) = 0 μ-a.e.x implies ν_0(x, E_x) = 0 μ-a.e.x, and ν(x, E_x) = 1 μ-a.e.x implies ν_0(x, E_x) = 1 μ-a.e.x. Thus ν_1 is ergodic by (2), which contradicts (5).

(1) ⇒ (6). Suppose ν is ergodic and let μ ∈ P_se(X). Then μ⊗ν ∈ P_se(X×Y) and hence for every E, F ∈ 𝔛⊗𝔜

    lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ⊗ν((S×T)^{-k}E ∩ F) = μ⊗ν(E) μ⊗ν(F).    (4.4)
If we take F' = F ∩ (A×Y) for F ∈ 𝔛⊗𝔜 and A ∈ 𝔛, then on the one hand

    lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ∫_A ν(x, [(S×T)^{-k}E ∩ F]_x) μ(dx)
        = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ⊗ν((S×T)^{-k}E ∩ F ∩ (A×Y))
        = μ⊗ν(E) μ⊗ν(F ∩ (A×Y)),    by (4.4),
        = μ⊗ν(E) ∫_A ν(x, F_x) μ(dx),    (4.5)

and on the other hand, by Proposition 1.4,

    LHS of (4.5) = ∫_A lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ν(x, [(S×T)^{-k}E ∩ F]_x) μ(dx).    (4.6)
Since (4.5) and (4.6) are equal for every A ∈ 𝔛, one has the equation in (6).

(6) ⇒ (7). If E = A×C and F = B×D, then the equation in (6) reduces to

    lim_{n→∞} (1/n) Σ_{k=0}^{n-1} 1_{S^{-k}A ∩ B}(x) ν(x, T^{-k}C ∩ D)
        = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} 1_{S^{-k}A ∩ B}(x) ν(x, T^{-k}C) ν(x, D)    P_se(X)-a.e.x.

Integrating both sides w.r.t. μ ∈ P_se(X) over X, we get the equation in (7).
(7) ⇒ (1). Let μ ∈ P_se(X), A, B ∈ 𝔛 and C, D ∈ 𝔜. Then

    (1/n) Σ_{k=0}^{n-1} {μ⊗ν((S×T)^{-k}(A×C) ∩ (B×D)) − μ⊗ν(A×C) μ⊗ν(B×D)}
        = (1/n) Σ_{k=0}^{n-1} ∫_{S^{-k}A ∩ B} {ν(x, T^{-k}C ∩ D) − ν(x, T^{-k}C) ν(x, D)} μ(dx)
          + (1/n) Σ_{k=0}^{n-1} { ∫ 1_A(S^k x) ν(S^k x, C) 1_B(x) ν(x, D) μ(dx)
                                  − ∫ 1_A(x) ν(x, C) μ(dx) ∫ 1_B(x) ν(x, D) μ(dx) }
        → 0    (as n → ∞)

by the assumption (7) and the ergodicity of μ. Thus μ⊗ν is ergodic.

In (7) of Theorem 2, if we take E = X×C and F = X×D, then (7) reduces to the condition (3.1).

We now consider ergodicity of a stationary channel in terms of the channel operators associated with it. So assume that X and Y are a pair of compact Hausdorff spaces with homeomorphisms S and T, respectively.

Definition 3. Let K ∈ K_s(X, Y), Λ ∈ A_s(X, Y) and 𝒫 ⊆ P(X). Then K is said to be extremal in K_s(X, Y) mod 𝒫, denoted K ∈ ex K_s(X, Y) (mod 𝒫), provided that, if K = αK_1 + (1−α)K_2 (mod 𝒫) for some K_1, K_2 ∈ K_s(X, Y) and α ∈ (0,1), then K_1 = K_2 (mod 𝒫). Similarly, Λ is said to be extremal in A_s(X, Y) mod 𝒫, denoted Λ ∈ ex A_s(X, Y) (mod 𝒫), provided that, if Λ = αΛ_1 + (1−α)Λ_2 (mod 𝒫) for some Λ_1, Λ_2 ∈ A_s(X, Y) and α ∈ (0,1), then Λ_1 = Λ_2 (mod 𝒫).

Under these preparations, extremal operators Λ ∈ A_s(X, Y) are characterized as follows:

Theorem 4. Let Λ ∈ A_s(X, Y). Then the following conditions are equivalent:
(1) For every f, g ∈ C(X×Y)

    lim_{n→∞} Λ(f_n g)(x) = lim_{n→∞} Λf_n(x) Λg(x)    P_se(X)-a.e.x,    (4.7)

where f_n = (S⊗T)_n f = (1/n) Σ_{k=0}^{n-1} (S⊗T)^k f for n ≥ 1.
(2) μ ∈ P_se(X) ⇒ Λ*μ ∈ P_se(X×Y).
(3) Λ ∈ ex A_s(X, Y) (mod P_se(X)).
Proof. (1) ⇒ (2). Let μ ∈ P_se(X) and f, g ∈ C(X×Y). Then

    lim_{n→∞} Λ*μ(f_n g) = lim_{n→∞} μ(Λ(f_n g))
                         = lim_{n→∞} μ(Λf_n · Λg),    by (1),
                         = lim_{n→∞} μ((Λf)_n Λg),    since Λ(S⊗T) = SΛ,
                         = μ(Λf) μ(Λg),    since μ is ergodic,
                         = Λ*μ(f) Λ*μ(g),

where (Λf)_n = S_n(Λf), n ≥ 1 (cf. Theorem II.3.2). Thus Λ*μ ∈ P_se(X×Y) by Theorem II.3.2.

(2) ⇒ (1). Suppose that (1) is false. Then it holds that for some μ ∈ P_se(X) and f, g ∈ C(X×Y)

    μ({x ∈ X : lim_{n→∞} Λ(f_n g)(x) ≠ lim_{n→∞} Λf_n(x) Λg(x)}) > 0

or, equivalently, for some h ∈ C(X)

    lim_{n→∞} μ(Λ(f_n g) · h) ≠ lim_{n→∞} μ(Λf_n · Λg · h).    (4.8)

Since Λ*μ is ergodic by (2), we have

    lim_{n→∞} μ(Λ(f_n g) · h) = lim_{n→∞} μ(Λ(f_n gh)),    by (a1),
                              = lim_{n→∞} Λ*μ(f_n gh)
                              = lim_{n→∞} Λ*μ(f_n) Λ*μ(gh)
                              = lim_{n→∞} μ(Λf_n) μ(Λg · h).    (4.9)

Moreover, since μ is ergodic, we also have

    lim_{n→∞} μ(Λf_n · Λg · h) = lim_{n→∞} μ(Λf_n) μ(Λg · h).    (4.10)
(4.9) and (4.10) contradict (4.8). Thus (4.7) holds.

(2) ⇒ (3). Suppose Λ = αΛ_1 + (1−α)Λ_2 (mod P_se(X)) for some Λ_1, Λ_2 ∈ A_s(X,Y) and α ∈ (0,1). For μ ∈ P_se(X),

    Λ*μ = αΛ_1*μ + (1−α)Λ_2*μ

is also ergodic by (2). Then, by Theorem II.3.2, Λ_1*μ = Λ_2*μ = Λ*μ ∈ P_se(X×Y). Since this is true for every μ ∈ P_se(X), we have Λ_1 = Λ_2 = Λ (mod P_se(X)) by Proposition 2.7. Thus (3) holds.

(3) ⇒ (2). Assume that (2) is not true. Then there exists a μ ∈ P_se(X) such that Λ*μ is not ergodic. Hence there is some S×T-invariant set E ∈ 𝔛⊗𝔜 such that 0 < Λ*μ(E) < 1. Let λ_1 = Λ*μ(E) and λ_2 = 1 − λ_1, and take γ so that 0 < γ < min{λ_1, λ_2}. Let α_i = γ/λ_i (i = 1, 2) and define operators Λ_1, Λ_2 on B(X×Y) by

    Λ_1 f = α_1 Λ(f 1_E) + (1 − α_1 Λ1_E) Λf,            f ∈ B(X×Y),
    Λ_2 f = α_2 Λ(f 1_{E^c}) + (1 − α_2 Λ1_{E^c}) Λf,    f ∈ B(X×Y).
We shall show Λ_1, Λ_2 ∈ A(X,Y). Λ_1 1 = 1 is clear. Λ_1(f Λ_1 g) = (Λ_1 f)(Λ_1 g) for f, g ∈ B(X×Y) is seen from the following computation:

    Λ_1(f Λ_1 g) = Λ_1[f{α_1 Λ(g1_E) + (1 − α_1 Λ1_E) Λg}]
        = α_1 Λ[f{α_1 Λ(g1_E) + (1 − α_1 Λ1_E) Λg} 1_E]
          + (1 − α_1 Λ1_E) Λ[f{α_1 Λ(g1_E) + (1 − α_1 Λ1_E) Λg}]
        = α_1² Λ[f 1_E Λ(g1_E)] + α_1 Λ[f 1_E (1 − α_1 Λ1_E) Λg]
          + α_1(1 − α_1 Λ1_E) Λ[f Λ(g1_E)] + (1 − α_1 Λ1_E) Λ[f(1 − α_1 Λ1_E) Λg]
        = α_1² Λ(f1_E) Λ(g1_E) + α_1 Λ(f1_E)(1 − α_1 Λ1_E) Λg
          + α_1(1 − α_1 Λ1_E) Λf · Λ(g1_E) + (1 − α_1 Λ1_E)² Λf · Λg
        = [α_1 Λ(f1_E) + (1 − α_1 Λ1_E) Λf][α_1 Λ(g1_E) + (1 − α_1 Λ1_E) Λg]
        = (Λ_1 f)(Λ_1 g),

where we used Λ(fΛg) = (Λf)(Λg). Similarly, we can verify (a1) for Λ_2. Moreover, (a3) is clearly satisfied by Λ_1 and Λ_2, as is seen from their definitions. Thus Λ_1, Λ_2 ∈ A(X,Y). Furthermore, Λ_1, Λ_2 ∈ A_s(X,Y) since for f ∈ B(X×Y) we have

    Λ_1(S⊗T)f = α_1 Λ(((S⊗T)f) 1_E) + (1 − α_1 Λ1_E) Λ(S⊗T)f
        = α_1 Λ((S⊗T)(f 1_E)) + (1 − α_1 Λ1_E) SΛf,
              since Λ(S⊗T) = SΛ and E is S×T-invariant,
        = α_1 SΛ(f 1_E) + (1 − α_1 Λ1_E) SΛf = SΛ_1 f
and similarly Λ_2(S⊗T)f = SΛ_2 f. Now we show that Λ_1 ≠ Λ_2 (mod P_se(X)). In fact, we have

    μ(Λ_2 1_{E^c} − Λ_1 1_{E^c})
        = μ(α_2 Λ1_{E^c} + (1 − α_2 Λ1_{E^c}) Λ1_{E^c} − α_1 Λ(1_{E^c} 1_E) − (1 − α_1 Λ1_E) Λ1_{E^c})
        = μ(α_2 Λ1_{E^c} + Λ1_{E^c}(α_1 Λ1_E − α_2 Λ1_{E^c}))
        = α_2 Λ*μ(E^c) + μ(Λ1_{E^c}) μ(α_1 Λ1_E − α_2 Λ1_{E^c}),
              since S_n(Λ1_{E^c}) = Λ1_{E^c} and μ is ergodic,
        = α_2 λ_2 + λ_2(α_1 λ_1 − α_2 λ_2)
        = γ + λ_2(γ − γ) > 0.    (4.11)

Finally we see that λ_1 Λ_1 + λ_2 Λ_2 = Λ, since for f ∈ B(X×Y)

    λ_1 Λ_1 f + λ_2 Λ_2 f = γΛ(f 1_E) + (λ_1 − γΛ1_E)Λf + γΛ(f 1_{E^c}) + (λ_2 − γΛ1_{E^c})Λf
        = γΛf + (1 − γ)Λf = Λf.    (4.12)
These results imply that Λ ∉ ex A_s(X,Y) (mod P_se(X)), a contradiction.

As a corollary we have:

Corollary 5. Let ν ∈ C_s(X,Y) be a stationary channel, and let K ∈ K_s(X,Y) and Λ ∈ A_s(X,Y) be the channel operators corresponding to ν. Then the following statements are equivalent:
(1) ν is ergodic.
(2) ν ∈ ex C_s(X,Y) (mod P_se(X)).
(3) K ∈ ex K_s(X,Y) (mod P_se(X)).
(4) Λ ∈ ex A_s(X,Y) (mod P_se(X)).
(5) For f, g ∈ C(X×Y) it holds that

    lim_{n→∞} Λ(f_n g)(x) = lim_{n→∞} Λf_n(x) Λg(x)    P_se(X)-a.e.x,

where f_n = (S⊗T)_n f, n ≥ 1.
(6) For f, g ∈ B(X×Y) the equality in (5) holds.
(7) For E, F ∈ 𝔛⊗𝔜 it holds that, for P_se(X)-a.e.x,

    lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ν(x, [(S×T)^{-k}E ∩ F]_x)
        = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ν(x, [(S×T)^{-k}E]_x) ν(x, F_x).
Proof. (1) ⇔ (2) ⇔ (3) ⇔ (4) ⇔ (5) follow from Theorems 2.1, 2.2, 2 and 4. (5) ⇒ (6) is proved by a suitable approximation. (6) ⇒ (7) is derived by taking f = 1_E and g = 1_F for E, F ∈ 𝔛⊗𝔜. (7) ⇒ (6) follows from the fact that {1_E : E ∈ 𝔛⊗𝔜} spans B(X×Y). (6) ⇒ (5) is trivial.

Remark 6. Observe that each of the following conditions is not sufficient for ergodicity of a stationary channel ν ∈ C_s(X,Y), where (X,𝔛,S) and (Y,𝔜,T) are a pair of abstract measurable spaces with measurable transformations:
(1) ν(x,·) ∈ P_ae(Y) P_se(X)-a.e.x.
(2) ν(x,·) ∈ P_se(Y) P_se(X)-a.e.x.
(3) ν = ν_η for η ∈ P_se(Y), ν_η being the constant channel determined by η.
In fact, if X = Y, η ∈ P_se(X) is not WM, and ν_η is ergodic, then η⊗ν_η = η×η is ergodic, which implies η is WM by Theorem II.3.10, a contradiction.
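For the constant channel ν_η of Remark 6 the compound source is the product measure, μ⊗ν_η(A×C) = μ(A)η(C). A finite toy check (all spaces and distributions below are illustrative):

```python
# For a constant channel nu_eta(x, C) = eta(C), the compound source is the
# product measure mu x eta. Finite sanity check.

mu  = [0.5, 0.3, 0.2]           # input source on X = {0, 1, 2}
eta = [0.25, 0.75]              # output source on Y = {0, 1}

def compound(mu, channel):
    """mu ⊗ nu as a matrix: (mu ⊗ nu)[x][y] = mu(x) * nu(x, {y})."""
    return [[mu[x] * channel(x)[y] for y in range(len(channel(x)))]
            for x in range(len(mu))]

joint = compound(mu, lambda x: eta)     # constant channel ignores x

for x in range(3):
    for y in range(2):
        assert abs(joint[x][y] - mu[x] * eta[y]) < 1e-12
```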
3.5. AMS channels

AMS sources were considered as an extension of stationary sources in Section 2.4. In this section, AMS channels are defined and studied as a generalization of stationary channels. A characterization of AMS channels is obtained, as well as one of ergodic AMS channels. Absolute continuity of measures plays an important role in this section.

Let X, Y be a pair of compact Hausdorff spaces with Baire σ-algebras 𝔛, 𝔜 and homeomorphisms S, T, respectively. The invertibility assumption is crucial. Assume that 𝔜 has a countable generator 𝔜_0, i.e., 𝔜_0 is countable and σ(𝔜_0) = 𝔜.

Definition 1. A channel ν ∈ C(X,Y) is said to be asymptotically mean stationary (AMS) if

(c14) μ ∈ P_a(X) ⇒ μ⊗ν ∈ P_a(X×Y).

That is, if the input source is AMS, then the compound source is also AMS. C_a(X,Y) denotes the set of all AMS channels. First we need the following two lemmas.

Lemma 2. A channel ν ∈ C(X,Y) is AMS iff

    μ ∈ P_s(X) ⇒ μ⊗ν ∈ P_a(X×Y).

Proof. The "only if" part is obvious. As to the "if" part, let μ ∈ P_a(X). Then
μ ≪ μ̄ with μ̄ ∈ P_s(X), since S is invertible. Hence μ⊗ν ≪ μ̄⊗ν, and μ̄⊗ν ∈ P_a(X×Y) by assumption, so that μ⊗ν ∈ P_a(X×Y) by Proposition II.4.8. Thus ν is AMS.

Lemma 3. Let ν ∈ C(X,Y) be a channel and μ ∈ P(X) be such that μ⊗ν ∈ P_a(X×Y). Then:
(1) μ ∈ P_a(X) with μ̄ ∈ P_s(X).
(2) μν ∈ P_a(Y) with μ̄ν(·) = (μ⊗ν)‾(X × ·) ∈ P_s(Y).
(3) ν(x,·) ≪ μ̄ν μ-a.e.x.

Proof. (1) Observe that for A ∈ 𝔛

    μ̄(A) = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ(S^{-k}A)
          = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ⊗ν((S×T)^{-k}(A×Y))
          = (μ⊗ν)‾(A×Y).

Thus μ is AMS with the desired stationary mean. (2) is verified similarly.
(3) Suppose that μ̄ν(C) = 0. Then μν(C) = 0 since μν ≪ μ̄ν, so that

    ∫_X ν(x, C) μ(dx) = 0.

This implies that ν(x, C) = 0 μ-a.e.x, completing the proof.

An important consequence of these two lemmas is:

Corollary 4. For a channel ν ∈ C(X,Y) the following conditions are equivalent:
(1) ν is AMS, i.e., ν ∈ C_a(X,Y).
(2) For each stationary μ ∈ P_s(X) there exists a stationary channel ν_1 ∈ C_s(X,Y) such that

    ν(x,·) ≪ ν_1(x,·)  μ-a.e.x.    (5.1)

(3) For each stationary μ ∈ P_s(X) there exists an AMS channel ν_1 ∈ C_a(X,Y) such that (5.1) holds.

Proof. (1) ⇒ (2). Suppose ν is AMS and let μ ∈ P_s(X). Then μν ∈ P_a(Y) by Lemma 3 (2) since μ⊗ν ∈ P_a(X×Y). If we let ν_1(x, C) = μ̄ν(C) for x ∈ X and C ∈ 𝔜, then ν_1 is a constant stationary channel and (5.1) is true by Lemma 3 (3). (2) ⇒ (3) is immediate.
(3) ⇒ (1). Let μ ∈ P_s(X) and suppose the existence of ν_1 as mentioned. Then μ⊗ν_1 ∈ P_a(X×Y) and μ⊗ν ≪ μ⊗ν_1 by (5.1). Hence μ⊗ν ∈ P_a(X×Y) by Proposition II.4.8. Thus ν is AMS by Lemma 2.

Now we want to consider an analogy of stationary means for AMS channels. Suppose that ν ∈ C_a(X,Y) and μ ∈ P_s(X). Then μ⊗ν is AMS. Observe the following computation: for A ∈ 𝔛 and C ∈ 𝔜

    (1/n) Σ_{k=0}^{n-1} μ⊗ν((S×T)^{-k}(A×C))
        = (1/n) Σ_{k=0}^{n-1} μ⊗ν(S^{-k}A × T^{-k}C)
        = (1/n) Σ_{k=0}^{n-1} ∫_{S^{-k}A} ν(x, T^{-k}C) μ(dx)
        = (1/n) Σ_{k=0}^{n-1} ∫_A ν(S^{-k}x, T^{-k}C) μ(d(S^{-k}x))
        = ∫_A (1/n) Σ_{k=0}^{n-1} ν(S^{-k}x, T^{-k}C) μ(dx)
        = ∫_A ν_n(x, C) μ(dx),    say,
/x-a.e.z. n-a.e.x.
(5.2)
fc=0
In this case it holds that /z®i/ = ji®i>.
(5.3)
Proof. Assume that v € Ca(X,Y) and fi e P S (X). Let ??(•) = /n7(-). Then, by Lemma 3, n e P,(Y) and u(x, ■) < r) fi-a.e.x. Let Xi = {x e X : v(x, ■) < »y}, so that /i(Xi) = 1. Moreover, if X* = nn ;S"Xi, then X* is S-invariant and u(JC*) = 1
3.5. AMS channels
159
since (i is stationary. Let
I 0,
x^X*,y€y.
X
Then we have that k e L (X x Y,fi®v) by Theorem IV.2.5 and Remark IV.2.6, which are independent of this proposition, since fi ® v
f k(x,y)V(dy), Jc
xeX*,Ce
Now one has for x e X* and C e 2 ) fc
fc
k k(S~ y)r,{dy) r,{dy) k(S~kkx,x, y)
k x, TC) = \± J2 \i £ J2 «,(5v{S-kx, T~fc C) J2 I fc
i £
*(S- x, T- C) = i J2 I 11 ""
__ 11
f
h
k
I k(S~ k(S~hx,T~ x,T~ky)r)(dy), y)r)(dy),
= —y] —y]
n
k(S~ x, y) r,{dy)
since 77 is is stationary,
Jc
tzi>
= [ =
k kk k k k T-c(y)lj2 ^c(y)lj2 (S-(Sx,Tx,T-y)r,(dy) y)r,(dy)
= J lo(j/) lc(y) [(S-1 ® T-%k] T-%k] (x, (x,y)y)T,(dy). r,(dy). By the Pointwise Ergodic Theorem there is an S x T-invariant function k* such that (S _ 1 ® T_1)„fe -> ->■ fc* fc* /T®17-a.e. n®v-a.e. since k is jointly measurable. Hence we see that i
n_1
r
lim -Yv(S-kx,T-kC)= / lc(y)k*(x,y)r)(dy) n-a.e.x since n(-) = a ® u(X x •). Take a stationary channel v* e CS(X, Y) and define v by since r/(-) = fj,® v(X x •). Take a stationary channel v* e CS(X, Y) and define v by
-/ ^ / [ k*(x,y)r,(dy), u(x, C) = < J c
xeX*,C£2),
Then, clearly £ is a stationary channel and (5.2) holds. By the Bounded Convergence Theorem we have from (5.2) that
1
n_1 -fc
H®v(AxC) [i® v((S C)) | i ® i i ( A x C ) == lim - }V//i/((S xx T)~ T) k{A (j4 xx C)) Jk=0 Jfe=0
Chapter III: Information
160
Channels
for A e X and C G 2J. Thus, since n ® v is stationary, /J, ® v is AMS with the stationary mean n®v, i.e., (5.3) is established. In Proposition 5, the stationary channel v depends on the given AMS channel v and the stationary input source ft. We would like to have a single stationary channel independent of input sources such that (5.2) and (5.3) are true for all stationary fj. G PS(X). This will be obtained using the countable generator 2J0 of 2J.
Theorem 6. For any AMS channel ν ∈ C_a(X,Y) there is a stationary channel ν̄ ∈ C_s(X,Y) such that for any stationary input source μ ∈ P_s(X)

    ν̄(x, C) = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ν(S^{-k}x, T^{-k}C)    μ-a.e.x, C ∈ 𝔜,    (5.4)

    (μ⊗ν)‾ = μ⊗ν̄,    (5.5)

    ν(x,·) ≪ ν̄(x,·)    μ-a.e.x.    (5.6)

Proof. Let ν ∈ C_a(X,Y) and

    X(C) = {x ∈ X : lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ν(S^{-k}x, T^{-k}C) exists},    C ∈ 𝔜_0,

    X(ν) = ∩_{C ∈ 𝔜_0} X(C).

Then for any stationary μ ∈ P_s(X) we have μ(X(ν)) = 1, since μ(X(C)) = 1 for C ∈ 𝔜_0 by Proposition 5 and 𝔜_0 is countable. Take a stationary channel ν* ∈ C_s(X,Y) and define a set function ν̄ on X × 𝔜_0 by

    ν̄(x, C) = lim_{n→∞} (1/n) Σ_{k=0}^{n-1} ν(S^{-k}x, T^{-k}C),    x ∈ X(ν), C ∈ 𝔜_0;
    ν̄(x, C) = ν*(x, C),    x ∉ X(ν), C ∈ 𝔜_0.

It is evident that ν̄ can be extended to X × 𝔜 and becomes a stationary channel. Thus (5.4) is satisfied. (5.5) is shown by the Bounded Convergence Theorem and (5.6) is almost obvious.

Definition 7. The stationary channel ν̄ constructed above is called a stationary mean of the AMS channel ν ∈ C_a(X,Y).
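The stationary mean ν̄(x, C) = lim_n (1/n) Σ_k ν(S^{-k}x, T^{-k}C) can be illustrated on a finite cyclic system, where S and T are rotations of ℤ_m and the Cesàro average is exactly the orbit average once n is a multiple of m. (The cyclic toy setting is an illustrative assumption, not the compact-space setting of the text.)

```python
from fractions import Fraction

m = 4                                   # X = Y = Z_m; S x = x + 1, T y = y + 1 (mod m)
S = lambda x: (x + 1) % m
# An arbitrary (non-stationary) channel: nu(x, {y}) as a row-stochastic matrix.
nu = [[Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)],
      [Fraction(0), Fraction(1), Fraction(0), Fraction(0)],
      [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)],
      [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]]

def cesaro(x, y, n):
    """(1/n) * sum_{k<n} nu(S^{-k}x, T^{-k}{y})."""
    return sum(nu[(x - k) % m][(y - k) % m] for k in range(n)) / n

# For n a multiple of m the average equals the orbit average: this is nu_bar.
nu_bar = [[cesaro(x, y, m) for y in range(m)] for x in range(m)]
for x in range(m):
    assert sum(nu_bar[x]) == 1                            # still a channel
    for y in range(m):
        assert nu_bar[S(x)][(y + 1) % m] == nu_bar[x][y]  # stationarity of nu_bar
        assert cesaro(x, y, 3 * m) == nu_bar[x][y]        # Cesaro limit attained
```

Exact rational arithmetic makes the stationarity identity ν̄(Sx, T·) = ν̄(x, ·) an equality rather than a floating-point approximation.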
The following is a collection of equivalence conditions for a channel to be AMS. Recall the notations A„ and (S ® T ) n . T h e o r e m 8. For a channel v € C(X, Y) the following conditions are equivalent: (1) v € e Ca(X,Y), i.e., v is AMS. (2) n pe P P.(X) ^fi®u€ Pa(X x Y). S(X) ^»®v€ (3) There is a stationary v\ € C3(X, Y) such that v(x, •)
n—foo n *•—' u—n
s(X)-a.e.x. P sP (X)-a.e.x.
(6) For f € B(X x Y) and n G PS{X) the following limit exists: lim /
\Av(S®T)nf]{x)fj,(dx).
n fo
- ° Jx
/If/ any (and hence all) of the above is true, then it holds that (7) JLWV JT®v = /n®V i ® F for for fi € PS(X). (8) v(x, ■) •) << V{x, V{x,•)•) PP3{X)-a.e.x. 3{X)-a.e.x. (9) For fi e PS(X) and f e B{X x Y) lim / [ A „ ( S ® T ) „ / ] ( x ) ^ ( d x ) = [/ ( A F /(A ) (vf)(x)/J,{dx), x)/x(dx). n-K» Jx Jx Proof. (1) ■& (2) was proved in Lemma 2 and (1) •£> (5) follows from Theorem 6. By taking V\ = V, (3) is derived from (5). (3) => (4) is immediate and (4) =>- (1) is proved in Corollary 4. Thus we proved (1) <£> (2) •» (3) <4> (4) •£> (5). (2) <» (6). Observe that for / e B{X x Y) and \i e PS{X) I
[A„(S ® T ) „ / ] (x) n{dx) ® T ) „ / ( x , y) v{x, dy)fJ.(dx) ji(ote) = = jM j( S(S® 2/) i/(x,
JX
JX J i
= // (S®T)„/(x,2/)/x®i;(dx,d?/) (S®T)nf(x,y)fj.®v(dx,dy) JJxxy JJxxY by Lemma 2.5. The equivalence follows from Theorem II.4.6. (7) and (8) are already noted in Theorem 6. As to (9) we proceed as follows: for / G B(X x Y) and n e PS(X) / (Aj7/)(x) ( A F / ) ( x ) n{dx) = / JX
J
/ XJY
f{x,y)V{x,dy)ix{dx) f{x,y)v{x,dy)ix{dx)
Chapter III: Information
lg2
= // f(x,y)fi®V(dx,dy), f(x,y)fi®v(dx,dy), JJxxY
by Lemma 2.5,
= /[[/ = f{x,y)~iT®T{dx,dy), ./Vxxy JJxxY
by (7),
= n lim [f ^>°°JJxxY
Channels
(S®T)nf(x,y)iJ.®v(dx,dy)
= lim / fA„(S®T)„/](:c)/i(d:r). \AJ$®T!)nf}{x)n{dx). "-*■<» JJxx Example 9. (1) As was mentioned before, each probability measure 77 G P(Y) can be regarded as a channel by letting vv{x,C) = 77(C) for x G X and C G 2). If 77 G P a (X), then z/,, is AMS. In fact, 77 < 77 since T is invertible, so that vv(x, ■) = 77 < 77 = i/jj(x, •) for x G X. Moreover, 1/^ 6 CS(A", Y) imphes that vn G C0(X, Y) by Theorem 8 (3). In this case we have v„ = v^. (2) If a channel v G C(X, Y) satisfies that v(x, i>(a:, •)•) << 7777 PP s(A")-a.e.a; 8(X)-a.e.x for some AMS 77 G -PQ(V0, then 1/ is AMS by Theorem 8 (4) and (1) above. Let fc( ) =
^
k^**1"-
W'
where 77 G PS{Y) is the stationary mean of 77. Then v can be written as v(x, i/(x, C)= C) = I k(x, y) rj{dy), Jc
x e X, C €G ?J?J
and its stationary mean V as I7(x,C7) = / k*(x,y)ri(dy) V{x,C) = J/c k*(x,y)ri(dy)
P,(X)-a.e,x,C P,(X)-a.e,x,C
G 2), G 2),
where k*{x,y) = lim ( S _ 1 ® T-1)nfc(a;,7/) n®v-a.e. (x,y) for /* G P S (X). (3) A channel 1/ G C(X, Y) satisfying the following conditions is AMS: (i) v is dominated, i.e., u(x, ■) <^rj (x & X) for some 77 G P(Y); (ii) v(x, ■) G Pa(Y) for every a; G X. In fact, 1/ is strongly measurable by Corollary IV.2.3 since v is dominated and 2J has a countable generator. Hence v has a separable range in M(Y), so that {i/(x„, •) : n > 1} is dense in its range for some {x„} C X. Let
»=1
^
3.5. AMS channels
163
Then we see that £ € Pa(Y) by (ii) and Lemma 11.4.3(1). Thus v is AMS by (2) above since v(x, ■) < £ (x € X). Definition 10. An AMS channel v e Ca(X, Y) is said to be ergodic if (cl5) n € P o e ( * ) =*• ft ® v € P o e ( X x F ) . C ae (X, y ) denotes the set of all ergodic AMS channels in Ca(X, Y). After giving a lemma we have the following characterization of AMS ergodic channels. Some of the equivalence conditions are similar to those of AMS ergodic sources and some are to those of stationary ergodic channels. Lemma 11. Let v e Ca(X,Y) be AMS and \i 6 PS{X) be stationary. every E,F G 3£ <8> 2) the following limit exists (i-a.e.x:
Then for
1 "_1
hm -Vj/fcpxrj-^nfU. fc=0
Proof. Since v is AMS and fi is stationary, p, ® v is AMS with p®v = \i®v. The proof parallels that of Proposition 1.4 and we have 1 n_1 lim US x r ) _ f c EkEnF] n F]x)) lim - V vU, Y'v(x,[(SxT)x
n-+oo 71 ^ ^ n-+oo 71 fc=0 *-^ fc=0
= / lF(x,y)Eli®v(lE\3)(x,y)v(x,dy) where 3 = {E eX®Z)
y-a.e.x,
: (S x T)-XE = E}.
Theorem 12. Let v 6 C0(A", Y) with the stationary mean V € CS{X, Y). Then the following conditions are equivalent: (1) v e Coe (X, Y), 5^)j i.e., «•£•, v is ergodic. ae(-^, (2) y, Pse(X) ^ii®v£ x xY). /i € eP ► M ® v PeaeP(Xoe (x y). se (x) = (3)VEC (X,Y). (3)veCsese(X,Y). (4) There is a stationary ergodic v\ € Cse(X, Y) such that v{x,-) < i / ! ( x , •)
Pse{X)-a.e.x.
(5.7)
(5) There is an AMS ergodic ui e Cae(X,Y) such that (5.7) is true. (6) If E € X<2)2) is S x T-invariant, then u(x,Ex) = 0 or 1 PseP(X)-a.e.x. se(X)-a.e.x.
j64 164
Chapter III: Information
Channels
(7) For E, F g X ® 2) and pft£G PP$e{X) E,FeX®Z) se(X) k k k lim -yu(x,[(SxT)--yu(x,{(SxT)V i / f e EnF} KEnF\ S x)= xx)= rx)= ^ n f ] ^ -yu(x,[(SxT)EnF]
kk lim - i Y*V(*>l(S £^r >^^(a*K [(S r ,. S Kx S xxr T) )-T)-T)^^E],x)x)v{x,F ))u{x,F ** E) ^ ,^^x)
= fi®V(E)u(x,F fi®V(E)u(x,F x)x)
fi-a.e.x. fi-a.e.x.
Proof Proof. (1) => (2) is clear. (2) => (3). Suppose (2) is true and let p 6 P, A* ® ® vi/ €€ PPaeae(X (X xx FF)).. Psee{X). (X). Then ft Hence fi®V = = /T®T JT®u g P ,s e ( X x F ) by Theorem II.4.12 and Theorem 8 (7). Thus VF is ergodic. (3) => (4). Take v\=V and invoke Theorem 8. (4) => (5) is immediate. (5) =^ (6). Assume (5) is true. Let 2 ? € - £ ® 2 ) b e S ' x T-invariant and take fi g P ses e(X). . IfIf /itA*®®^(E ( X ) . Then fi®v fi® xV\ g€ Paeo e{X ( ^ x Y) F ) and fi /* ® vi>
I v(x,Ex)fi{dx) fi(dx) ■■= 0. )fj,{dx)
Jx
If ^(a;, 2^) fi-a.e.x, then we have fi®u(E) = 1. Thus A*®I/ is ergodic. Therefore v(x, E x) = 1 p-a.e.x, f is ergodic. (1) => (7). This is shown in a similar manner as the proof of (1) => (6) of Theorem 4.2 using Lemma 11. (7) =*• (1). Let M fi €e P Pse(X) E,F ® 2). By integrating both sides of the , P eg £X ® s e ( * ) and £ equation in (7) w.r.t. fi over X, X, we get 1 "_1 lim -^2 fi® n® u((S x T)~kE
n F) =
fi®V{E)fi®v{F). fi®v{E)fi®v{F).
Hence ^ ® i> is AMS ergodic. Therefore z/ is AMS ergodic. We noted that e x P 0 ( X ) C Pae(X) and the set inclusion is proper (cf. Theorem II.4.14). Similarly we can prove the following. If i/v ez eexC T h e o r e m 1 3 . (1) 7/ x Ca0(X,Y) ( X , F ) ((modP m o d Pses(X)), (X,F). e ( X ) ) , then v 6 C ae o e(X,Y). e*Ca(X,Y)CCaeQC (X,Y). ae(X,Y).
Thus
S.5. AMS channels channels
165
(2) / / there exists a weakly mixing source in Pse(Y), then the above set inclusion is proper, in that there exists some AMS ergodic channel v G Cae(X,Y) such that v<£exC v£exC a(X,Y) a{X,Y)
(modP„(X)). (modP s e (X)).
Proof. (1) Let v G Ca(X,Y), (X, Y), V F G CS8{X,Y) (X, Y) be its stationary mean and v <->■ A G A{X, v ^ C Qe Y). Then .A(X, Y) Y) be the corresponding channel operator. Suppose that 1/ (X, Y). ae{X, there is some // p G Pses e{X) ( X ) such that A* A*pp ^£ Pae{X (X x Y). Hence there is some S x Tinvariant set £ € 3E ® 2) 2) such such that that 00 << Ai Ai == A*p(E) A*n(E) << 1.1. Letting Letting AA22 == 11 -- Ai, Ai, 3t ® take 7 > 0 so that 0 < 7 < min{Ai, A A2}. 2 }. Let a>i = jL (i = 1, 2) and define operators A i , A 22 oo nn PB((XI x Y) Y ) bbyy A x / = aiaA{fl i A (B)/ l B ) + (1 - ai a Al i AEl)Af, B)A/,
f/ e€ SB(X ( X xx Y), Y),
Q 2 A ( / 1 B , ) + (1 - a 2 A l BB c ) A A//,, A 22 // = aaACfljsO
P ( X x Y). Y). / G B{X
Then as in the proof of (3) =$■ => (2) (2)ofofTheorem Theorem4.4 4.4we wesee seethat that AAi ,i ,AA2 2 Ge A(X, A(X, Y). Y). ItIt follows from (4.11) that A i ^ A 2 (modPse(X)). Moreover, A is a proper convex combination of A i and A 2 : A = A1A1 + A 2 A 2 , which follows from (4.12). Now we want to show that v\ 1/1 and v2 are AMS channels, where i / j f t A i ( i = l,2), Observe that for / G B(X x Y) [A x (S®T)„/](x)M(
[ a i ( A ( S ® T ) B / 1 B ) + (1 - a 1 A l B ) A ( S ® T ) „ / ] (*) ft(dx) [ a i ( A ( S ® T ) B / 1 B ) + (1 - a 1 A l B ) A ( S ® T ) „ / ] (*) /i(dx)
= [ [a1A(S®T)n(/lB) + (l-aiAlB)A(S®T)fI/](i)/i(di) = Jx / [a1A(S®T)n(/lB) + (l-aiAlB)A(S®T)fI/](i)/i(di) because E is S x T-invariant. Since v is AMS, lim f„ a i A ( S ® T ) ( / l g ) dp, exists because E is 5 x T-invariant. Since v is AMS, lim f„ a i A ( S ® T ) nn ( / l g ) d/i exists by Theorem 8 (6). Also lim / x ( l - a i A l £ ; ) A (n-¥oo S ® T )-*n / d ^ exists by Theorem 8 (6) andTheorem the Bounded by 8 (6). Convergence Also lim / x (Theorem. l - a i A l £ ;Thus, ) A ( S we ® Tproved ) n / d ^ that exists by Theorem 8 (6) and the Bounded Convergence Theorem. Thus, we proved that lim / A i ( S ® T ) „ / d / i n ^°°Jx exists for every /f G x Y) and hence 1/1 is AMS by Theorem 8 (6). Similarly, u2 e B(X xY) is AMS. Consequently we see that v £ exCa(X, Y) (modP s e (X)).
Chapter III: Information
jgg 166
(2) Take an 77 6 Pse{Y)
Channels
that is WM and define £ f by i{C)=
[ gdr,, Jc
CeZ),
where g e Ll{Y, r,) is nonnegative with norm 1 which is not T-invariant on a set f nrrt 7 art m/lOCIl of positive 7 measure. Then, as in the proof of Theorem II.4.14 we see that £ e Pae(Y),£ ^ V,I = V a n d C £ ++f) en)eP ^oeC^) C==| (i^ ae(Y)is a proper convex combination of two distinct AMS sources. TT Hence i/ i exC exCaa(X,Y) j / cf £ (X, Y) ( m(modP, o d P s eeP(X)) 0) vcc = = |\{v$ +^ u„), v^,v since i/ (^ + ),^ . " ni , e€ Ca(X,Y) (X, Y) and v^ t/€ ^ i/„. We need to show u vc 6 e Cae(X, Y). Clearly F c = v^ = = v„ £ e Cse{X, Y) since (j.® v„ = p x r, € Pae(X x Y) for At £ Pse {X) by Theorem II.3.10. Thus v c e C o e ( X , y ) by Theorem 12. 3e{X) If the output space is an alphabet message space Y = Y^ with a shift transfor mation T, then there exists a Bernoulli source that is strongly mixing (cf. Example II.3.7), so the assumption in Theorem 13 (2) is satisfied.
3.6. Capacity and transmission rate For a stationary channel we define the transmission rate functional and the sta tionary and ergodic capacities. An integral representation of the transmission rate functional is given. For a stationary ergodic channel the coincidence of two capac ities is shown. First we deal with the alphabet message space case and then the general case. Consider alphabet message spaces X = XQ and Y = Y0Z with shifts S and T, respectively, where X0 = { a x , . . . , a p } and Y0 = { 6 1 , . . . , 6 , } . For each n > 1, 9JtJ,(X) denotes the set of all messages in X of length n starting at time i of the form ■x 1 [*i fc) -*£Lil. i+T>-lJ> (k)
x
I<*
Similarly, we denote by SDtj.OO and WPn(X x Y) the sets of all messages in Y and X x Y of length n starting at time i, respectively. Note that ajr„(x) WPn(X) = = Vs-''9Jti(x)e V S~jTi[(X) 6 i=o V(X) for n > 1 and i e Z. Let a channel v 6 C(X, Y) be given. For each input source fj. G P(X) we associate the output source \w 6 P(Y) and the compound source A*v e P(X x Y). The mutual information I„(X, Y) = J n (/x; v) between the two finite schema (Wln(X), y) and (QJtj^Y), /HI/) is given by
ln{X,Y)=In(jtv) (/*;") = lJn
3.6. Capacity
and transmission
rate
167
= H^TlKX)) H^mUx)) ++ H^WI^Y)) ff^«(Y))
H^(m Y)) - -H^ u{m\{X n(x xxY))
(cf. (1.1.2)), where H^(m H^UX)) n(X))
= = --
J2 J2 Azmux)
rtA)l0E»(A), ^)lo gfi(A),
etc. Then, ^ ^In n(n;u) ( M ^ ) is considered as an average information of one symbol (or letter) when messages of length n in the input source [X, fi] are sent through the channel v. If the limit I(n;u) = lim Urn (n;v) I(n;u) -I-Inn(n;v) n—¥oo n-¥oo n
exists, then I(fi; v) is called the transmission rate of the channel v under the input source /z, or the mutual information between the input X and the output Y through the channel v with the input source fi. I(fj,; v) represents the average information per symbol when messages of the input source [X, )i] are sent through the channel v. If v and fx are stationary, then we have by the Kolmogorov-Sinai Theorem (Theorem 1.3.10) that I(»; v) = flH^S) /(/*; u) + H^T) tf„„(T) - H^ H^„{S V(S x T) M{S) + since
V SnDJl\(X) = X, etc. In this case note that /(■ ;v) is affine: for a,/? > 0 n=—oo
with a + /3 = 1 and \i, r\ € Pa(X) (afi + Pi];v) Pn;v) = aI(n;v) aI(n;v) + /3I(n; I (afi pi(-q; v), v), which follows from Theorem 1.5.6 (or Lemma 1.5.1). The stationary capacity and the ergodic capacity of a stationary channel v are respectively defined by Cs(u) = sup {I(p {l(tM ;v): ;V):IM& C.(v) M £ PP,(X)}, ,(X)},
{v) = = sup{/(/*;j/) ::fieP and /x®1/ Cee(z/) / z e SPe(X) i / €G PPs se X x yxY)}, )}, s e P 0 and/x® e ((X where if JJ ® v ^ Pse{X x F ) for every \i e Pse(X), then let Ce(v) = 0. Let us mention the weak* upper semicontinuity of the entropy functional. For n > 1 let Hn{n) = Hp(Wln{X)). Then we have that for each [i 6 P,{X)
0 < #„(M) < Hn+1(n), #fcn(M) < ^ n ( M ) ,
n > 1, n>l,fc>l,
U m ^ l M = jff (S) . n-Kx>
n
(6.1)
igg 168
Chapter III: Information
Channels
In particular, (6.1) implies that kn so that
n
, , , , _ . . H2(u) HM H2»(n) Hl # iM* ( / i ) ~T~ > - 2 - -> ~^~ - 2 2 - >- -''' " --> ~^~ - ^ - > -- - - , '
i.e, g2 ^„^) 4 Hn(S). This shows that H(.)(S) is a weak* upper semicontinuous 2" function on P S (X) C C(X)* since each Hn(-) is so. Although this was proved in Lemma II.7.3, we shall use this method to prove the part (2)(i) of the following theorem, which summarizes the fundamental results regarding transmission rates and capacities in the alphabet message space case. Theorem 1. Let 1/ v 6 Ca(X^,Y^) be a stationary channel. (1) 0 < CM < < Cs^(v) logpq, ''"here where p = |\X ") < ^ iogM, Z00\| and q = \Y0\. (2) / / v has a finite memory (cf. Definition 1.1 (c5)) and is finitely dependent (cf. Definition 3.3 (c8)), then: (i) / ( • ; u) is upper semicontinuous in the weak* topology of PS(X) C C{X)*; C(X)*;
in) CM CM; CM = CM", (iii) There exists a stationary ergodic input source \i* 6 Pse(X) I(H*;V). {y) = Ce(v) if v is ergodic. (3) Cs(v)
such that Ce(v) (y) =
Proof. All the statements of Theorem 1 (except (2)(i)) follow from the discussion below (see Theorem 8), where we consider a more general setting. So we prove (2)(i). Since $\nu$ has finite memory, it is continuous (Definition 3.1 (c5')), so that $H_{\mu\nu}(T)$ is a weak* upper semicontinuous function of $\mu$ by Proposition 1.6 and Lemma II.7.3. Hence, it suffices to show that $H_\mu(S) - H_{\mu\otimes\nu}(S\times T)$ is a weak* upper semicontinuous function of $\mu$ on $P_s(X)$.

Since $\nu$ has finite memory and is finitely dependent, there exists a positive integer $m$ such that (c5) and (c8) hold. By (c5) we have that for any message $C = [y_{m+1}\cdots y_n]$ $(m+1 \le n)$,
$$\nu(x,C) = \nu(x',C), \qquad x, x' \in A = [x_1\cdots x_n].$$
We denote this common value by $\nu(A,C)$. For $n \ge 1$ and $\mu \in P_s(X)$ let
$$f_n(\mu) = \frac{1}{n}\Big[\sum_{A,C}\mu\otimes\nu(A\times C)\log\mu\otimes\nu(A\times C) - \sum_A\mu(A)\log\mu(A)\Big], \tag{6.2}$$
where the sums are taken over $A \in \mathfrak{M}_1^n(X)$ and $C \in \mathfrak{M}_{m+1}^n(Y)$. Observe that
$$f_n(\mu) = \frac{1}{n}\Big[\sum_{A,C}\nu(A,C)\mu(A)\log\nu(A,C)\mu(A) - \sum_A\mu(A)\log\mu(A)\Big] = \frac{1}{n}\sum_{A,C}\nu(A,C)\mu(A)\log\nu(A,C). \tag{6.3}$$

3.6. Capacity and transmission rate

For $A = [x_1\cdots x_n] \in \mathfrak{M}_1^n(X)$ let $A' = [x_{m+1}\cdots x_n] \in \mathfrak{M}_{m+1}^n(X)$, and for $C = [y_{m+1}\cdots y_n] \in \mathfrak{M}_{m+1}^n(Y)$ let $C' = [y_1'\cdots y_m' y_{m+1}\cdots y_n] \in \mathfrak{M}_1^n(Y)$. Then one has, for $\mu \in P_s(X)$,
$$\mu\otimes\nu(A\times C') \le \mu\otimes\nu(A\times C) \le \mu\otimes\nu(A'\times C).$$
Hence for $n > m+1$ and $\mu \in P_s(X)$
$$-H_n(\mu\otimes\nu) = \sum_{A,C'}\mu\otimes\nu(A\times C')\log\mu\otimes\nu(A\times C') \le \sum_{A,C}\mu\otimes\nu(A\times C)\log\mu\otimes\nu(A\times C) = G_n(\mu), \text{ say,} \tag{6.4}$$
and
$$G_n(\mu) \le \sum_{A,C}\mu\otimes\nu(A\times C)\log\mu\otimes\nu(A'\times C) = \sum_{A',C}\mu\otimes\nu(A'\times C)\log\mu\otimes\nu(A'\times C) = \sum_{A',C}\mu\otimes\nu(S^mA'\times T^mC)\log\mu\otimes\nu(S^mA'\times T^mC) = -H_{n-m}(\mu\otimes\nu). \tag{6.5}$$
Hence (6.2) and (6.4) imply
$$f_n(\mu) = \frac{1}{n}\big(G_n(\mu) + H_n(\mu)\big) \ge \frac{1}{n}\big(H_n(\mu) - H_n(\mu\otimes\nu)\big),$$
while (6.2) and (6.5) imply
$$f_n(\mu) \le \frac{1}{n}\big(H_n(\mu) - H_{n-m}(\mu\otimes\nu)\big).$$
These two yield
$$\liminf_{n\to\infty} f_n(\mu) \ge \lim_{n\to\infty}\frac{1}{n}\big(H_n(\mu) - H_n(\mu\otimes\nu)\big) = H_\mu(S) - H_{\mu\otimes\nu}(S\times T)$$
and
$$\limsup_{n\to\infty} f_n(\mu) \le \lim_{n\to\infty}\Big[\frac{H_n(\mu)}{n} - \frac{n-m}{n}\cdot\frac{H_{n-m}(\mu\otimes\nu)}{n-m}\Big] = H_\mu(S) - H_{\mu\otimes\nu}(S\times T).$$
So we conclude that
$$\lim_{n\to\infty} f_n(\mu) = H_\mu(S) - H_{\mu\otimes\nu}(S\times T), \qquad \mu \in P_s(X).$$
Note that $f_n(\cdot)$ is a weak* continuous function on $P_s(X)$ for each $n \ge 1$. To prove the weak* upper semicontinuity of $H_\mu(S) - H_{\mu\otimes\nu}(S\times T)$ it suffices to show that $\{f_n(\mu)\}$ contains a monotonely decreasing subsequence. Let $\ell \ge 1$ be arbitrary and $n = 2(m+\ell)$. Denote a message $A = [x_1\cdots x_n] \in \mathfrak{M}_1^n(X)$ by
$$A = [x_1\cdots x_{m+\ell}] \cap [x_{m+\ell+1}\cdots x_n] = A_1 \cap A_2, \text{ say.}$$
Similarly, we write
$$C = [y_{m+1}\cdots y_n] = [y_{m+1}\cdots y_{m+\ell}] \cap [y_{m+\ell+1}\cdots y_{2m+\ell}] \cap [y_{2m+\ell+1}\cdots y_{2(m+\ell)}] = C_1 \cap C' \cap C_2, \text{ say.}$$
Since $\nu$ has a finite memory and is finitely dependent, one has
$$\nu(A, C_1 \cap C_2) = \nu(A_1, C_1)\,\nu(A_2, C_2),$$
so that
$$f_{2(m+\ell)}(\mu) = \frac{1}{2(m+\ell)}\sum_{A,C}\nu(A,C)\mu(A)\log\nu(A,C) \le \frac{1}{2(m+\ell)}\sum_{A_1,C_1,A_2,C_2}\nu(A_1,C_1)\nu(A_2,C_2)\,\mu(A_1\cap A_2)\log\nu(A_1,C_1)\nu(A_2,C_2)$$
$$= \frac{1}{m+\ell}\sum_{A_1,C_1}\nu(A_1,C_1)\mu(A_1)\log\nu(A_1,C_1) = f_{m+\ell}(\mu).$$
Since this holds for every $\ell = 1, 2, \dots$, so that $f_{m+\ell}(\mu) \ge f_{2(m+\ell)}(\mu)$, we can choose a monotonely decreasing subsequence $\{f_{n_k}(\mu)\}_{k=1}^\infty$ for each $\mu \in P_s(X)$. $\square$
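The monotonicity $H_{kn}(\mu) \le kH_n(\mu)$ behind (6.1), and the resulting chain $H_{2^n}(\mu)/2^n \downarrow H_\mu(S)$, are easy to check numerically. The following sketch (a toy two-state Markov source of our own choosing, not from the text) computes the block entropies directly:

```python
from itertools import product
from math import log

# Hypothetical two-state stationary Markov source (our own toy example).
P = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
pi = {0: 2/3, 1: 1/3}                  # stationary distribution of P

def word_prob(w):
    """mu of the message (cylinder set) [w_1 ... w_n]."""
    p = pi[w[0]]
    for a, b in zip(w, w[1:]):
        p *= P[(a, b)]
    return p

def H(n):
    """Block entropy H_n(mu) over all messages of length n."""
    return -sum(word_prob(w) * log(word_prob(w))
                for w in product((0, 1), repeat=n))

# H_{2^k}(mu)/2^k decreases monotonely (a consequence of H_{kn} <= k H_n),
# which is the chain used to prove weak* upper semicontinuity of H_(.)(S).
rates = [H(2**k) / 2**k for k in range(4)]     # n = 1, 2, 4, 8
assert all(a >= b for a, b in zip(rates, rates[1:]))
```

The decreasing sequence `rates` approaches the entropy rate $H_\mu(S)$ from above: $H_\mu(S)$ is an infimum of weak* continuous functions $H_n(\cdot)/n$, hence upper semicontinuous.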
Now we assume that $X$ and $Y$ are totally disconnected compact Hausdorff spaces with bases $\mathfrak{X}_0$ and $\mathfrak{Y}_0$ of clopen sets and homeomorphisms $S$ and $T$, respectively. As before, $\mathfrak{X}$ and $\mathfrak{Y}$ stand for the Baire $\sigma$-algebras of $X$ and $Y$, respectively. Let $\mathfrak{A} \in \mathcal{P}(\mathfrak{X}_0)$ and $\mathfrak{B} \in \mathcal{P}(\mathfrak{Y}_0)$ be fixed clopen partitions of $X$ and $Y$, respectively, and let $\mathfrak{C} = \mathfrak{A}\times\mathfrak{B}$ denote the clopen partition $\{A\times B : A \in \mathfrak{A},\ B \in \mathfrak{B}\}$ of $X\times Y$. We consider three entropy functionals:
$$H_1(\mu) = H(\mu,\mathfrak{A},S), \qquad \mu \in M_s(X),$$
$$H_2(\eta) = H(\eta,\mathfrak{B},T), \qquad \eta \in M_s(Y),$$
$$H_3(\xi) = H(\xi,\mathfrak{C},S\times T), \qquad \xi \in M_s(X\times Y). \tag{6.6}$$
By Theorem II.7.6 there are $S$-, $T$- and $S\times T$-invariant nonnegative measurable functions $h_1$ on $X$, $h_2$ on $Y$ and $h_3$ on $X\times Y$, respectively, such that
$$H_1(\mu) = \int_X h_1(x)\,\mu(dx), \qquad \mu \in M_s(X),$$
$$H_2(\eta) = \int_Y h_2(y)\,\eta(dy), \qquad \eta \in M_s(Y),$$
$$H_3(\xi) = \int_{X\times Y} h_3(x,y)\,\xi(dx,dy), \qquad \xi \in M_s(X\times Y).$$

Definition 2. Let $\nu \in C_s(X,Y)$ be a stationary channel and $\mu \in P_s(X)$ a stationary source. Then the transmission rate $\mathfrak{R}(\mu;\nu) = \mathfrak{R}(\mu;\nu,\mathfrak{A},\mathfrak{B})$ of the channel $\nu$ w.r.t. $\mu$ is defined by
$$\mathfrak{R}(\mu;\nu) = H_1(\mu) + H_2(\mu\nu) - H_3(\mu\otimes\nu),$$
where $H_1$, $H_2$ and $H_3$ are defined by (6.6). $\mathfrak{R}(\cdot\,;\nu)$ is called the transmission rate functional of $\nu$ on $P_s(X)$ or on $M_s(X)$; hereafter we shall use the letter $\mathfrak{R}$ for the transmission rate functional. We can obtain an integral representation of the transmission rate functional $\mathfrak{R}(\cdot\,;\nu)$.
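In the simplest finite, one-step situation the three entropy functionals reduce to ordinary Shannon entropies, and $\mathfrak{R}(\mu;\nu)$ becomes the mutual information between input and output. A minimal numerical sketch (the source distribution and channel matrix are our own toy choices):

```python
from math import log

def entropy(p):
    return -sum(t * log(t) for t in p if t > 0)

# Hypothetical finite input source mu and channel nu (row-stochastic matrix).
mu = [0.5, 0.5]
nu = [[0.9, 0.1],      # nu(x, .) for x = 0
      [0.2, 0.8]]      # nu(x, .) for x = 1

mu_nu = [sum(mu[x] * nu[x][y] for x in range(2)) for y in range(2)]  # output law
joint = [mu[x] * nu[x][y] for x in range(2) for y in range(2)]       # mu (x) nu

# R(mu; nu) = H1(mu) + H2(mu nu) - H3(mu (x) nu); nonnegative, cf. Proposition 3.
rate = entropy(mu) + entropy(mu_nu) - entropy(joint)
assert 0 <= rate <= min(entropy(mu), entropy(mu_nu))
```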
Proposition 3. Let $\nu \in C_s(X,Y)$ be a stationary channel. Then the transmission rate functional $\mathfrak{R}(\cdot\,;\nu)$ is a bounded positive linear functional on $M_s(X)$, and there is a universal $S$-invariant bounded Baire function $\mathfrak{r}(\cdot)$ on $X$ such that
$$\mathfrak{R}(\mu;\nu) = \int_X \mathfrak{r}(x)\,\mu(dx), \qquad \mu \in M_s(X). \tag{6.7}$$
Proof. Observe that for $\mu \in M_s(X)$
$$\mathfrak{R}(\mu;\nu) = H_1(\mu) + H_2(\mu\nu) - H_3(\mu\otimes\nu)$$
$$= \int_X h_1(x)\,\mu(dx) + \int_Y h_2(y)\,\mu\nu(dy) - \int_{X\times Y} h_3(x,y)\,\mu\otimes\nu(dx,dy)$$
$$= \int_X h_1(x)\,\mu(dx) + \int_X\!\int_Y h_2(y)\,\nu(x,dy)\,\mu(dx) - \int_X\!\int_Y h_3(x,y)\,\nu(x,dy)\,\mu(dx)$$
$$= \int_X\Big[h_1(x) + \int_Y h_2(y)\,\nu(x,dy) - \int_Y h_3(x,y)\,\nu(x,dy)\Big]\,\mu(dx)$$
by Lemma 2.5. Hence, letting
$$\mathfrak{r}(x) = h_1(x) + \int_Y h_2(y)\,\nu(x,dy) - \int_Y h_3(x,y)\,\nu(x,dy), \qquad x \in X, \tag{6.8}$$
we have the desired integral representation (6.7). The $S$-invariance of $\mathfrak{r}$ follows from the $S$-invariance of $h_1$, the $T$-invariance of $h_2$, the $S\times T$-invariance of $h_3$ and the stationarity of $\nu$. Clearly $\mathfrak{R}(\cdot\,;\nu)$ is a bounded linear functional on $M_s(X)$ since $\mathfrak{r}$ is bounded.

We show that $\mathfrak{R}(\cdot\,;\nu)$ is nonnegative. Note that
$$\mathfrak{R}(\mu;\nu) = H_3(\mu\times\mu\nu) - H_3(\mu\otimes\nu), \qquad \mu \in P_s(X), \tag{6.9}$$
by the linearity of $H_3(\cdot)$ and $H_1(\mu) + H_2(\mu\nu) = H_3(\mu\times\mu\nu)$. Thus it follows that
$$\mathfrak{R}(\mu;\nu) = H_3(\mu\times\mu\nu) - H_3(\mu\otimes\nu)$$
$$= \lim_{n\to\infty}\frac{1}{n}\sum_{A,B}\big[\mu\otimes\nu(A\times B)\log\mu\otimes\nu(A\times B) - \mu\times\mu\nu(A\times B)\log\mu\times\mu\nu(A\times B)\big]$$
$$= \lim_{n\to\infty}\frac{1}{n}\sum_{A,B}\mu\otimes\nu(A\times B)\big[\log\mu\otimes\nu(A\times B) - \log\mu\times\mu\nu(A\times B)\big] \ge 0$$
by Theorem 1.1 (1), where the sum is taken over $A \in \bigvee_{k=0}^{n-1}S^{-k}\mathfrak{A}$ and $B \in \bigvee_{k=0}^{n-1}T^{-k}\mathfrak{B}$, and we have used the following computation:
$$\sum_{A,B}\mu\times\mu\nu(A\times B)\log\mu\times\mu\nu(A\times B) = \sum_A\mu(A)\log\mu(A) + \sum_B\mu\nu(B)\log\mu\nu(B)$$
$$= \sum_{A,B}\mu\otimes\nu(A\times B)\log\mu(A) + \sum_{A,B}\mu\otimes\nu(A\times B)\log\mu\nu(B) = \sum_{A,B}\mu\otimes\nu(A\times B)\log\mu\times\mu\nu(A\times B).$$
For a general $\mu \in M_s^+(X)$ we can similarly show $\mathfrak{R}(\mu;\nu) \ge 0$. $\square$

The following corollary immediately follows from (6.9).
Corollary 4. For a stationary channel $\nu \in C_s(X,Y)$, the transmission rate functional $\mathfrak{R}(\cdot\,;\nu)$ is written as
$$\mathfrak{R}(\mu;\nu) = \int_{X\times Y} h_3(x,y)\,\zeta_\mu(dx,dy), \qquad \mu \in M_s(X),$$
where $\zeta_\mu = \mu\times\mu\nu - \mu\otimes\nu$ for $\mu \in M_s(X)$.

To consider the Parthasarathy-type integral representation of transmission rates we use results in Sections 2.6 and 2.7. As in Section 2.7, denote by $\mathfrak{B}_0$ the algebra generated by $\bigcup_n S^{-n}\mathfrak{A}$ and let $\mathfrak{B}_X = \sigma(\mathfrak{B}_0)$, the $\sigma$-algebra generated by $\mathfrak{B}_0$.

Lemma 5. With the notation mentioned above:
(1) $P_s(X,\mathfrak{B}_X) = \{\mu|_{\mathfrak{B}_X} : \mu \in P_s(X)\}$; for $\mu_1 \in P_s(X,\mathfrak{B}_X)$ we write $P_1 = \{\mu \in P_s(X) : \mu|_{\mathfrak{B}_X} = \mu_1\}$.
(2) Suppose that $\mu_1 \in P_{se}(X,\mathfrak{B}_X)$ is ergodic and let $P_1$ be as in (1). If $\mu \in \mathrm{ex}\,[P_s(X)\cap P_1]$, then $\mu$ is ergodic.

For, if $\mu = \alpha\eta + \beta\xi$ with $\alpha,\beta > 0$, $\alpha+\beta = 1$ and $\eta,\xi \in P_s(X)$, then as functionals $\mu = \alpha\eta + \beta\xi = \mu_1$ on $C(X,\mathfrak{A})$. Since $\mu_1$ is ergodic, we see that $\mu = \eta = \xi$ on $C(X,\mathfrak{A})$ and hence $\eta,\xi \in P_s(X)\cap P_1$. Thus $\mu = \eta = \xi$ on $C(X)$ because $\mu$ is extremal in $P_s(X)\cap P_1$. Therefore $\mu$ is ergodic.
It follows from Lemma 5 (1) that
$$H_1(\mu) = H_1(\eta), \qquad \mu,\eta \in P_s(X) \text{ with } \mu|_{\mathfrak{B}_X} = \eta|_{\mathfrak{B}_X}.$$
Hence the entropy functional $H_1(\cdot)$ on $P_s(X,\mathfrak{B}_X)$ is unambiguously defined by
$$H_1(\mu_1) = H_1(\mu), \qquad \mu_1 \in P_s(X,\mathfrak{B}_X),\ \mu \in P_s(X) \text{ with } \mu_1 = \mu|_{\mathfrak{B}_X}.$$
Recall that $R$ denotes the set of all regular points in $X$ (cf. Section 2.7). For $\mu \in P_s(X)$ and $r \in R$, $\mu_r$ denotes the stationary ergodic source corresponding to $r$. By Lemma 5 (2) there is some $\tilde{\mu}_r \in P_{se}(X)$ such that $\mu_r = \tilde{\mu}_r|_{\mathfrak{B}_X}$, and we define
$$\mathfrak{R}(\mu_r;\nu) = \mathfrak{R}(\tilde{\mu}_r;\nu)$$
for a stationary channel $\nu$. For the partitions $\mathfrak{B}$ and $\mathfrak{C} = \mathfrak{A}\times\mathfrak{B}$ we define $\sigma$-subalgebras $\mathfrak{B}_Y$ of $\mathfrak{Y}$ and $\mathfrak{B}_{X\times Y}$ of $\mathfrak{X}\otimes\mathfrak{Y}$ in a similar manner as $\mathfrak{B}_X$.

Proposition 6. Let $\nu \in C_s(X,Y)$ be a stationary channel satisfying:
(c2') $\nu(\cdot,C)$ is $\mathfrak{B}_X$-measurable for $C \in \mathfrak{Y}$.
Then the function $\mathfrak{R}(\mu_r;\nu)$ of $r \in R$ is $\mathfrak{B}_X$-measurable on $R$ and
$$\mathfrak{R}(\mu;\nu) = \int_R \mathfrak{R}(\mu_r;\nu)\,\mu(dr), \qquad \mu \in P_s(X).$$

Proof. According to the proof of Theorem II.7.6 the entropy functions $h_1$, $h_2$ and $h_3$ are $\mathfrak{B}_X$-, $\mathfrak{B}_Y$- and $\mathfrak{B}_{X\times Y}$-measurable, respectively. Hence the condition (c2') implies that $\int_Y h_2(y)\,\nu(\cdot,dy)$ and $\int_Y h_3(\cdot,y)\,\nu(\cdot,dy)$ are $\mathfrak{B}_X$-measurable, so that $\mathfrak{R}(\mu_{(\cdot)};\nu)$ is also $\mathfrak{B}_X$-measurable on $R$. By Lemma II.7.5 we see that for $\mu \in P_s(X)$
$$\mathfrak{R}(\mu;\nu) = \int_X \mathfrak{r}(x)\,\mu(dx) = \int_R\Big[\int_X \mathfrak{r}(x)\,\mu_r(dx)\Big]\,\mu(dr) = \int_R \mathfrak{R}(\mu_r;\nu)\,\mu(dr). \quad\square$$
Definition 7. For a stationary channel v e C,(X, Y) the stationary capacity C„{v) is defined by C.(v) = sup {«R(M; v) : ft € PS{X)} and the ergodic capacity Ce{v) by Ce{v) = sup {m(fj, ; i / ) : / j e Pae(X)
and /i ® v € P s e ( X x y ) } .
3.6. Capacity
and transmission
If there is no n e Pae(X)
rate
175
with fiv e Pae(X x Y), then we let Ce(v) = 0.
T h e o r e m 8. Let v 6 CS(X,Y) be o stationary channel satisfying (c2'). (1) C.{y) = sup { 9 % ; v) : p e P 8 e ( X ) } . (2) / / v is ergodic, then Ca(y) = Ce(u). (3) If v is ergodic and Dt(-; v) is weak* ripper semicontinuous on PS(X) C C(X)*, then there exists an ergodic source fi* e Pae(X) such that Ce(v) = Ce(v) = 9t(/z*; v). Proof. (1) By Proposition 6 we see that for any /x e P*(X) fi(dr) 9 t ( / i ; i / ) = / minr;v) 5t(ft.;i/)/i(dr) Jx < < sup5K(/ir;z/) p {p9l( l^( A e W}}-^< sSuU S t^; ^I )/:iMGG^Pe» W Taking the supremum on the LHS over Ps (X) gives the conclusion. (2) is obvious. (3) Since 9t(-;i/) is weak* upper semicontinuous and the set PS(X) is weak* compact and convex, there exists at least one fio £ Ps(X) such that 9t(/io; v) attains its supremum, i.e., 9K(fj.o;v) = C3(u). Let Pi be the set of all such /io- Since 9t(-;i/) is weak* upper semicontinuous and affine on Pe(X), Pi is weak* compact and convex. By the Krein-Milman Theorem there is at least one extreme point //* in Pi. Then fi* is also extremal in Pa(X). For, if fi' = a/ii + (1 — a)/i2 for a G (0,1) and fii,H2 e Ps{X), then
CM =*(,*» a,( I /)=SR(/x*;i/) = o0t(/«i; a£H(//i; u) + (1 - a)«(/x a ) « ( / i22;;v) f) < C.(i/), C.(v), which implies 9t(/*i; t') = 9t(/z 2 ;") = Cs(") and pi,/*2 € Pi- By the extremality of //* we have n* = Hi = l*2, so that //* is extremal in P«(X). Therefore /t* is ergodic by Theorem II.3.2. In the rest of the section we discuss the transmission rate and the capacity of a (Y,2),T) be stationary channel in a measure theoretic manner. Let (X,3e,5) (X, X, S) and (y,2),T) a pair of abstract measurable spaces with measurable transformations S and T. A slightly different type of transmission rate is defined as follows. Y) be a stationary channel. Definition 9. Let v e C,(X, C8{X,Y) e PS,(X) ( ^ ) is denned by (1) The transmission rate 9i(p,;v) of vf w.r.t. n/x E m((j.; v) „(«8, T) - Hflp^(8 93, S x T) : i/) = sup { f l ^ O , 55)) + ffM„(93, p
a6?(I),i8e?@)},
Chapter III: Information
176 176
Channels
where #,,(21,5) = H(n, 21,5), H{(i, 21, S), etc. (2) The stationary capacity Cs{v) and the ergodic capacity Ce{v) are respectively defined as in Definition 7. {u) = Ce(y) R e m a r k 10. (1) In the above definition, the equality Ca(u) (u) for stationary ergodic channel v is not trivial. Wli(X) and 93 03 = (2) In the alphabet message space case we have chosen 21 = 9Jli(X) T"03 = fXJli(Y). Since they are the generators in the sense that V 5""2l = X a n d V T"!8 2J, 2), we obtained
{S - H^V(S
x T)
by the Kolmogorov-Sinai Theorem. In the discussion from Definition 2 through Theorem 8 we fixed partitions 21 G V(X) and 55 € 7>(2J). (3) Since /J. ® v -^ /j, x /iv and
H„(K) H,V(
ff JJxxY
d{n®u) d{ti x fiv)
d(fi®v) d(fi x (iv) ^
^''
Note that this quantity is defined for all channels and input sources. Proposition 11. If P3(X) is complete for ergodicity (cf. Definition 2.8) and u G Cse{X, Y) is a stationary ergodic channel, then the equality Cs(u) = Ce{v) holds. Proof. We only have to prove Cs{v) < Ce{v) assuming Cs{v) < < oo. For an arbitrary e > > 0 choose a stationary source // n G 6 PS8(X) {X) and finite partitions 21 G e V{X) 'P(3E) and 58 03 € G 7>(2J) such that F ffM (2l, S) 5) + + # M1/ ^ ( (03, 0 3 , T) - H^„(C3s(u) {u) - e. M(2t,
(6.10)
Then by Theorem II.7.7 we can find transformation invariant bounded measurable functions hi{x),h2{y) and h3(x,y) such that
Hllll(%S)= H (%S)= [
Jx
h^nidx), h^fiidx),
3.6. Capacity
and transmission
rate
177
HI1V(
f[
h2{y)fiu{dy),
y)n® v(dx, v{dx, dy). x « B , S x T ) = [[ h3(x, y)/j,® JJxxY
As in the proof of Proposition 3, if we define t(x) by (6.8), then LHSof (6.10)= /f Jx Jx
t(x)fi(dx).
Since t(-) is 5-invariant there exists a sequence {t„(-)} of S-invariant simple functions such that t„ f1 1t on X. X . For each n > 1 write t„ as kn r
" = y^ Q n,fcl/i„,n n,fcl/l„,n fc=i fc=l
where {Ani i,... ,A , A»,fc S lA1A„ = Ann,k for 1 < fc < knn. It follows n±,... ntkn} n} € V(X) with S~ n,ktk from the Monotone Convergence Theorem that lim nn
/ tn(x) n(dx) = / ^°° ^°° Jx Jx Jx Jx
v(x)fi(dx).
Hence by (6.10) we see that t
^"Q ^"0
/ *„ r„ 0 (x) fi(dx) H(dx) = Y^ otno,kn{An0tk) J x
> Cs(u) - e
fc=i
for some no > 1. Consequently one has for some fco Ono.fco > »(") -
e
>
M(^no.fco) > °-
Since PS(X) is complete for ergodicity, there exists some n* e Pse{X) l?{A^M) = 1. Thus
such that
n*(dx) = a„ 0|fco > C,(u) Cg(i/) - e. 5H(/i*;«/)> / t(x)/x*(dx)> / t„ 0 (x) ji*(dx) Jx Jx By the ergodicity of v we obtain Ce(y) > Ca(v), completing the proof.
178
Chapter III: Information
Channels
3.7. Coding theorems In this section, Shannon's first and second coding theorems are fomulated and proved. Feinstein's fundamental lemma is also proved. It is used to establish the above mentioned theorems. We use the alphabet message space setting: X = X% and Y = F 0 Z for finite sets X0 and Y0 with the shifts S and T, respectively. A channel v € C(X, Y) is said to be nonanticipatory if = (xk),x (cl6) v(x, [yn = b]) = v(x', [y„ = b]) for every n e Z, 6 € Y0, and x — (x'k) e X with xk = x'k (k
=
Recall the notations Wn{X) and v{A, D): VJVn(X) is the set of all messages of length n starting at time i in X, and v(A, D) is the common value v{x, D) for x & A, where A e X and D € 2). Then Feinstein's fundamental lemma is formulated as follows: L e m m a 1 (Feinstein's fundamental lemma). Let v e CS(X,Y) be a stationary, m-memory, m-dependent and nonanticipatory channel with the ergodic capacity C = Ce. For any e > 0 there exist positive integers n = n(e) and N = N(e), messages Ui,... , UN e M^n(X) and measurable sets Vu... ,VN e A(W? the algebra .A(9K£(y)), n{Y)), generated by 9Jl£(y), WQ(Y), such that (l)V inVj=H(i^j); (i)v,nv5 = fl(»?fci); (2) u(U i,V i)>l-e,ll-e,l e <(( -- >)).. (3) N AT>e Proof. By the definition of Ce there exists an ergodic source fj. G Pse(X)
such that
£
K{n£ f t ( Mv;)>ci / ) > C-.- | . In fact, we may assume 9t(/i; v) = C in view of Theorem 6.1 (2). Since v is stationary and ergodic by Theorem 3.4, pv and n<8>v are also stationary and ergodic. For n > 1 we denote A = [x_mx_m+1 ■ Wlm^mn™ • ••■!„_!] ■ xn-i] €€ Tl (X), n{X),
(7.1) (7.1)
D = [yoyi [y ■ 0■yi---y • s/„_i] £ £Dt° (K). n-l]€DX° n(Y).
(7.2) (7.2)
Then, by SMB Theorem (Theorem II.5.1) — log/i(A), n
—log^(D), n
converge to if M (5),if M „(T) and H^S tively. Hence
—logn®v(A n
x T) a.e. and hence in probability, respec
I i 6o K ^ ^ = I i o6 e ^ ® i ^ S ^ 2 nn
6
fiv{D) fiv{D)
xD)
n n
6
n(A)nv{D) n(A)nv{D)
3.7. Coding theorems theorems
179 179
tends to H^S) + H^T) - #„®„(S x T) = *R(fi;i>) in probability (fj, ® v). Thus we can choose a large enough n = n(e) > 1 for which
f[l,
v{A,D)
„ e\\
1,
e
'•'([s*^><'-!])> -f
<"»
where [• ■ • ] indicates the set {{ (( (( ** ** )) ,, (( »» )) )) e >C C -- |I with with (7.1), (7.1), (7.2)}. (7.2)}. 6X Xx xy Y :: II ll oo gg ^^ ^^ > For For each each A Ae e aJt~^„(X) aJt~^„(X) let let
V. =U { ^ « K , : i . o g ^ | ) > C - £ } . v*-\j{»e*V)>k*&$><>-\}. Then we get LHS of (7.3) = n® u( \J{A LHS of (7.3) = fi®v(\J{A ^ A ^ A
x VA)\ x VA )\
=^IJ.®V{AX = ^IJ.®V{AX ' A ' A £
VA) VA )
= Y,»{A)v{A,V J > ( A M i i A,)>\~ K O >-. i - l
(7.4)
A
If A and .D D are such that £ l o g ^ j ^ > >>z /J/(A, ( J 4 , yFj A e "e( nc (-Ct )- At V VkA)) l ) )>> t I /"((V
>
n c c or ( -i). -t>. or fiv(V /M/(VA) < e~ e~"< A) <
n c [iv(D)e nv{D)e< ( -i\
(7.5)
Now choose a t/i e aJt^™ n (X) such that »(Di,Vok)>l-«I which is possible in view of (7.4), and let Vi = % t . If there is a [/2 e Jnt~+„(X) such that »K(^0,aV, % b 2i--yVLi ) > l - e , then we let V2 = Vu2 — V\. If, moreover, there is a U3 G SWm+n(X) such that
yK(C^33, ,Vvii;;aa--((Vy11Uu V v 22 )) )) >>li~- ee ,, then we let V3 = Vus — (V1UV2). This procedure terminates in a finite number of steps, and we write the so obtained sets as £/!,..., Z/jveaJCJjX)
and
^....V ^ u^...,V ( INeA(M° S K Fn(Y)). )).
Chapter III: Information
1180 80
Channels
It follows from the construction of these sets that (1) and (2) are satisfied. Further VR^ Tl^nn(X) more, we have for A e %K^ (x) n(X)
(A,VAA-\JVA -\JV^
3=1 3=1 j=i
''
which implies
v{A, VAA) <
\ V
,=i j=l
t'
Taking the average over 9Jt~^ n (X) w.r.t. \i, we get V 11 e£ £Y,»{A)v(A, /x(A)i/(A, V VA) < < l*( l*( U U Vj) j) ++1-e, ~~ -' A
^j=l
'
which, together with (7.4), yields that
^(U^)>|-
(7.6) (*■«)
Since Vj C VVj (l<j
c c V»{UUVi)**)* ^EE" "H^^) < )
''
j=i j=l
Thus, invoking (7.6), we see that n(C)> £ ein(C-f) N>ie> N
niC-e) > n(C-e) 2 2 if we choose n = n(e) > 1 such that n(e) > | log | . Thus (3) holds. if we choose n = n(e) > 1 such that n(e) > - log - . Thus (3) holds. An immediate consequence of the above lemma is: An immediate consequence of the above lemma is: C o r o l l a r y 2. Let v 6 CS(X,Y) (X, F ) 6e be a stationary, nonanticipatory, m-memory and m-dependent channel with the ergodic capacity C = Ce. Then, for any sequence
Coding theorems 3.7. Codinj theorems
181 181
{e n }£L m + 1 such that en 4- 0, there exist a sequence of positive integers {N = N{n)}%Lm+1, a family of finite sets of messages {U[n\... , 1 $ ° } C VRm™n(X) (n > m + 1) and a family of finite sets of measurable sets {V^ , . . . , VJf'} C A{Tl^(Y)) (n > m + 1) such that for n > m + 1 n) n (l)U^nUJ =l-e v ; n,li-en)i
(3) iJV > e n ( cc --£e"">) . To formulate Shannon's coding theorems, we consider another alphabet X'0 = {a' x ,... , a^} and the alphabet message space (X', X', S"), where X' = X'0 and 5 ' is the shift on X ' . A code is a one-to-one measurable mappingX. If we let vv(x',A) 1
= lA(ip(x')),
x'eX',AeX,
v
then [X , v , X] is called a noiseless channel. This means that, if vv{x', A) = 1, then we know that x' e X' is encoded as( [ 4 r + l ■ • - 4 r + r ] ) = V r ( 4 r + 1 > ■ • ■ ,>44rr++ rr ) >
* ^ ZZ,,
where r > 1 is an integer, called the code length, and ipr : (XQ)T —► (Xo) r is a mapping. In this case, ip is called an r-block code. Let [X, v, Y] be a channel and y;: X' —> X be a code. Define vv by Vv(x',D) Vip{x',D)
= v(
x'eX',D€V), x'eX',DeV),
then we have a new channel [X', uv, Y] or i ^ € C(X', Y), which is called the induced channel. Suppose that K 6 C , (X, Y) is a stationary, nonanticipatory, m-memory and mdependent channel, and tp : X' —¥ X is an n-block code. Let [X', fj!\ be an input source for the induced channel uv. Arrange messages A'k e 9H^™ n (X'),l < k < lm+n in a way that /fj,'(A[) i'(^i) > > p'(A' fj,' (A'2) > >■■> > fi(A' n(A'em+n em+n))
> 0. >
(7.7)
Consider messages D%, D2, ■ ■ ■ , Dq*. G 9Jl° (F) of length n. For each k (1 < k < qn) choose a subscript ik (1 < «jb < ^ m + n ) for which = / / ® ^ ((A^ 4 * xx D£)k) fc) =
max //// <® ( 4- x max g> 1^ ^ (A xD Z?kfc),),
(7.8)
and let Q"
£„=jj«x£ En=\J(A'ihxDkf).c ). fc=i fc=i
(7.9)
182
Chapter III: Information
Channels
When, for each k, the message Dk is received at the output Y, then it is natural to u' ® ®v^{E vAEn) consider that the message Aik is sent, or we decode Dk as Aih. Then ft' indicates the probability of decoding without error. The following theorem asserts that under the above mentioned conditions we can find a block code with sufficiently large length for which the decoding error probability is arbitrarily close to zero. T h e o r e m 3 (Shannon's first coding t h e o r e m ) . Let v € CS(X,Y) be a station ary, nonanticipatory, m-memory and m-dependent channel with the ergodic capacity C = Ce and [X', ft'] be a stationary ergodic source with entropy Ho = H(ft') < C. Then, for any e (0 < e < 1) there exists a positive integer no = no(e) such that for any n > no there exists an n-block code ip : X' -¥ —¥XX with with fJ.®v ft®vv(E {Enn)) >l—e, > 1 — e,where where En is defined by (7.9). Proof. Since [X1, ft'] is stationary ergodic, SMB Theorem implies that log M '([x'_ m • ■■x' ■■x'nn__11}) }) -> -> ff ff00 = = H(ft') M '([x'_ m • -- ii log H(ft') in probability as n —s- oo, and that for any given e (0 < e < 1) there exists a positive integer n i = ni(e) such that
ft'\\ ±-log ft'([x'_ ft'([x'_ ft'\\ - ±-log • ••••<_ • < _ 1 ]1)])< <^o ^ 0 + +| ]f]j J> >1 -1-f. |. m m Let A[,...
(7.10) (7.10)
,A'Ni e 2rt~+ n i (X') be such that - — log logfi'(A' fi'(A'kk))
1 < Jb < JV Ni. X.
Then, this and (7.10) imply that
x:^' )>i-|,
f > ' ( 4fc) > l - f , fc=i H ft'(A' - »li<W +t)i / * ' Kk)) > > Be-" °+i),
z
(7.11)
1 < fc < JVj, JVi,
from which we see that JVi ni ff 1l >>5 ^XM>' '((AA' 'ffcc ) >>J Vi V i ei -e B- l ( f(l ' 0°++*i))
or JVJ # i << eennii <"°+§>. or <"°+f>.
(7.12) (7.12)
fc=i
Since [X, y, F] is stationary, nonanticipatory, m-memory and m-dependent, we can apply Feinstein's fundamental lemma to see that there exist positive integers
183 183
3.7. Coding theorems
n2 = n2(e) and N2 = N2(e), messages Ui,... sets Vtu,..., ... , V VY, -4(2TC£ (F)) such that N2 G A{WP ni2 {¥)) (i) V Vjt n Vj VJ = 0 ((ii /# j); j);
, £/JV2 G 9tt 9tt~™„ m ™„ 2 (X) and measurable
(u)v(Di,Vpt)>l-f1l
n c (iii) N 7V22 > > e " 2e(^c -~h). f >. Let no = no(e) = max{ni, n2}. We may suppose that C > Ho + e, so that C — —f§>> # o + f • Thus (7.12) and (iii) imply that n H ) n o(c_ °+h) < JVi < eena°((Ho+ i < ee no(C-f) 5 ) << jy N22.
Let n > n00 be arbitrary and define a function9Jt^+ 9Jt m + n (X) by m+„ : 9K~™„(X') Vm l{l{ Vm
**
. ,,s_[U , , k,(Uk, IA k) ~\UNl+X) Nl+X)
l
~\u
where A'ks satisfy (7.7). Then,
u(ip(xf), Dj)
n'(dx')
JA
'H
=
u{U v{Ukk,Dj)y!{A' ,Di)lj!{Ak),
where Dj G WQ(Y) (1 < j < qn) and i/(J7 v(Ukfc, , Dj) is the common value of v{x, Dj) for x eUk eUk since i> has m-memory and since x' G -Ajj. imphes implies p{x') G £/&. Hence, by (ii), for 1 < k < JVi
n' ® K x = MM' 'KK))KK% M' ®^ MAi x vVfcfc)) = % ,, ^^ )) > > ((lI -- f| )) /PA' (44 )) -Consequently, for i?„ defined by (7.9), it holds that Consequently, for i?„ defined by (7.9), it holds that
A*' ® M ^ " ) = M' ® »V ( U (A'i« A*' ® M ^ " ) = M' ® »V ( U ^ *
X X
^^ ) ^ )
>> E /*' ® ® M4»* M4»* AO AO E E E »' J=ID CV., 4
J=ID 4 CV., Wi
>E
E
/*'® M 4 x - 0 * ) .
> ^ E A"x ^JxA), =E^'®^K ^) i=i i=i
by (7.8), by (7.8),
(7-13) (7-13)
184
Chapter III: Information
Channels
> £ ( 1 - | V ' ( 4 ) ' by (7.13), >(l-|)
2
,
by (7.11),
> 1-e.
The first first coding theorem states that the the error probability of decoding can can be be made made arbitrarily small. On the the other hand, the the second coding theorem asserts that, in addition, the the transmission rate of the associated channel can can be close to the the entropy of the the input, so that efficiency of the information transmission is also guaranteed. Theorem 4 (Shannon's second coding theorem). t h e o r e m ) . Let [X',n'] [X',fj.'] be a stationary stationary ergodic ergodic source sourcewith with the the entropy entropyHQ HQ== H(n') H(n') and and [X, [X,v,Y] v,Y] be beaastationary, stationary, nonannonanstationary, nonanticipatory, ticipatory, m-memory m-memory and m-dependent m-dependent channel channel with with the ergodic ergodic capacity capacity C — Ce. If HQ < C and 0 < e < 1, then then there there exists exists an n-bolock n-bolock code code■ —>■XXsuch suchthat that H S) vv,(E„) > 1 - e
and
9t(/x'; v¥v) > H0 - e,
where En is defined by (7.9). Proof. For each n > 1 let
en=mfle>0:»'(j-±logv'([x'_m--.x' --.x'nn__11])
+
e]^>l-eV
so that ft' ( [ - ~ logm---x' {[x'_ ■ ■ 0■ < _ , ] ) < H > 1 -n.£'n. ^{[-~log([x'_ +0 + e ' n ]e')n\\>l-e' n_mt})
(7.14)
It follows from the SMB Theorem that - -
lo
g/"'([z'probability /y!, g / " ' ( [ ^ - mm • ' ' a 4 - i ] ) -> H0 in probabihty /,
and hence e'n -> 0 as n -> oo. For each n > 1 let A'nl,... such that --log )
+ e'n,
Note that
£p'«, f c )>i-c fc=l
, A'n N e WZ"L{X') m ' l
l
be (7.15)
3.7. Coding theorems
185
-"("°+<>, / i ' « f c ) > e-»<*°+*»>,
1 < fc k < < JVX,
by (7.14) and (7.15), respectively. It follows from the above pair of inequalities that (/fo+e l >> 5^3y/«Litf' «c )i f>c )N> 1J V e -i e"-("/ f o + e *» )J
or JVi < < eenn<(HHo+e »> ° +e »>
(7.16)
fc=i
By Corollary 2 we can find a sequence {eJ(}£L m+ i of positive numbers, a se quence {iV2 = N2(n)}^'=m+1 of positive integers, a family of finite sets of messages 2rt^,™nn(X) (X) (n > > m + 1) and a family of finite sets of measurable {[/{" , . . . , £/]£ } C fXH^_ sets {V}n),...
) such that , Vfc]} C A(SDt°(Y)) (n > > m + 11)
(i) f / W n C / j n ) = 0 ( i ^ j ) ; n) (ii) i / ( D < B J , vv(uf\v} / " > ) > l -)>l-eZ,l l < « < J V 22;
(iii) iV2 > e ^ - O ; (iv) Urn e" = 0. n—foo
Since lim hm e' = 0, we have lim (e" + e ' ) = 0, so that there exists an no > 1 such Since n-¥oo n-¥oo
n-too n-too
that e'„ + e„> nono- Let Let nn > > nono- Then Then by by (7.16) (7.16) and and (iii) (iii) we we see see that that £„ < that Ni < -> S SK K ^^ „„ (( X X )) by by m+n :: 2rt~™ Ni
, yw m + n ( (A i4„ ym+n(i4„
^ -_ /< ^ n .) -.
fc) fc)
-
<
. .
I CSTi+i.
isfc<JVk, ATi + 1 < fc < ^ m + n ,
and hence an (m + n)-block code ip
M'®M£n)>i-4-C so that for a large enough n it holds that / / ® vv(En) > 1 — e. To prove 9 t ( / / ; v^) > HQ — e, it suffices to show that To prove 9 t ( / / ; v^) > Ho — e, it suffices to show that # V ® ® i/^,) "*>) -- Hdx'vy) H(fj.'vv) < < ee H(ii' or, equivalently,
jjim ii ((ff„, J V0 1^v (J»CJ <(y)) Jjim (9H-™ (9K-™nn (x') (X') x x SW» (V)) -- Jj ^^
= 0o ««(*-))) ( *y -)) )) ) =
since Urn ^(m^ ^ ( sn(X')) m ^ X ' ) ) = = H(ji') = H0. Let n—»-oo "
&...<4M) =
^ fr* V f"'^ *p\ -'n k ) l/ l/
r* *p\
J J
t -'ntk
)
<* e ȣ&.<*'), A.,* e < ( n
Chapter
186 186
III: Information
Channels
Then we have that for each k pm+n am+n
E^n,*«,i)l0g^„,4«,i) i=l pm+n
= £l>„,*(„,*(<«) +
E
where ifck is such that \J <2> vv{A'nM x D £>„, = n
^».k«i)l0g^„,*«i). max
n
A*' ® *v(^n,i M ^ n . i xx A.,*),
1 > > ^^..(^.jiogc^ . . ( ^ . J l o g C ^« « iij j ++(1(i-e ~ D fo„.«,ij) ..>«J)tog log * " ^ ^ ^ ' iyJ , . since xlogx is convex on [0,1], ^.k(A'ntik)l0g^lDnik = &>„,,«<„) o g(A' f on„,*«„-,)) log (1 - &>„,*«,<*)) m+ -(l-^n,k«iJ)log(£ "-l) + lloo g ( ^ > log2 (1 & . , , « < » ) ) 1). > - log2 - (1 - £z>„,*«u)) g(^ m + "" -- !)• Now if we take the average over k (1 < k < qn) w.r.t. the measure fi'vv, then g"
£m+"
fc=i fc=i
jii=i =i
v^p(Bn)) ( S „ ) ) log(r log(rvv++"" -- 1)1) < log log22 + (1 - // ® (e'n + < log 2 + (£*„ + e„) (m + n) log £. I On the other hand, it is easy to verify that
-Jy^pn.o -i>VA.,fc) E E ^..(^^log^.K.o ^..K^iog^.^K,,) ft=l ft=l
i i=l =l
= E^ ^' ®® M M4>,i = -- E 4,,i
X
X
A>,fc) llog// ^ ^( ^n ..ii XX-Or.,*:) A>,fc) A>,fc) o g ^ ' ®® M
+ EE ^M' M / Z i ^ A , , *ifc) ' MA A^ . ,o * )glog/i%(Z>„ kft
= * W , (an^»(x*) x sw»(K)) (aw»(Y)). sw£(K)) - Hfl^^ (OW»(Y)). Therefore, 0
- n 1 ^ £ K'°"* im™+n(X') * 3«SaO) - *V«V « ( 5 0 ) )
Bibliographical
notes notes <
187
lim
log2+(g , n + Q ( m + n)tog/
n—*oo
Tln
== 0
as desired.
Bibliographical notes 3.1. Information channels. The present style of formulation of channels (Defini tion 1.1) is due to McMillan [1](1953), Khintchine [2](1956) and Feinstein [2](1958). Proposition 1.2 is proved by McMillan [1]. Proposition 1.3 is due to Fontana, Gray and Kieffer [1](1981). Proposition 1.4 is obtained in Yi [2](1964). Proposition 1.5 is in Umegaki and Ohya [1](1983). Proposition 1.6 is proved by Umegaki [7](1974). 3.2. Channel operators. Echigo (Choda) and M. Nakamura studied information channels in an operator algebra setting in Echigo and Nakamura [1](1962) and Choda and Nakamura [1, 2](1970, 1971), where a channel is identified with an operator between two C-algebras. Umegaki [6] (1969) established a one-to-one correspon dence between channels and certain type of averaging operators on some function spaces, which led to a characterization of ergodic channels. Theorems 2.1, 2.2 and Propositions 2.7, 2.9 are due to Umegaki [6]. Regarding completeness for ergodicity Y. Nakamura [1](1969) gave a sufficient condition. Proposition 2.10 is proved by Y. Nakamura [1] and Umegaki [6] independently. Proposition 2.11 is obtained by Umegaki [5] (1964). 3.3. Mixing channels. Proposition 3.2 is noted by Feinstein [2]. Theorem 3.4 together with Definition 3.3 is due to Takano [1](1958). Strong and weak mixing properties of channels were introduced by Adler [1](1961), where he proved Theorem 3.7. Definition 3.8 through Theorem 3.14 are formulated and proved by Nakamura [2](1970). 3.4- Ergodic channels. Characterization of stationary ergodic channels was first obtained by Yi [1, 2] in 1964 (Theorem 4.2 (2), (6)). In some years it has been over looked. Umegaki [6] and Nakamura [1] independently obtained some equivalence conditions of ergodicity of a stationary channel. Umegaki used a functional analy sis approach to this characterization, while Nakamura applied a measure theoretic consideration. Theorem 4.2(3), (4), (5) and (7) are due to Nakamura [1]. 
Theorem 4.4 and Corollary 4.5 (2) - (7) are obtained by Umegaki [6]. Ergodicity of Markov channels is considered in Gray, Durham and Gobbi [1](1987). 3.5. AMS channels. Jacobs [1] (1959) defined "almost periodic channels" together with ''almost periodic sources" in the alphabet message space setting, which are special cases of AMS channels and AMS sources, defined by Fontana, Gray and Kieffer [1] and Gray and Kieffer [1](1980). Almost periodic channels and sources are essentially the same as AMS ones in treating. Subsequently Jacobs showed some
188 188
Chapter III: Information
Channels
rigorous results regarding almost periodic channels in his papers [2, 6] (1960, 1967). Lemma 5.2 through Theorem 5.8 are mainly due to Fontana, Gray and Kieffer [1]. In Theorem 5.8, (6) is noted in Kakihara [4] (1991). Kieffer and Rahe [1] (1981) showed that a Markov channel between one-sided alphabet message spaces is AMS. Lemma 5.11 is given by Ding and Yi [1] (1965) for an almost periodic channel. In Theorem 5.12, (2) is observed by Kieffer and Rahe [1], (3) is due to Fontana, Gray and Kieffer [1], (4) and (5) are given by Kakihara [5], and (6) and (7) by Ding and Yi [1] for an almost periodic channel. Theorem 5.13 is obtained in Kakihara [5].

3.6. Capacity and transmission rate. The equality Ce(ν) = Cs(ν) for a stationary channel has been one of the important problems since Khintchine [2] (1956) (cf. Theorem 6.1). Carleson [1] (1958) and Tsaregradskii [1] (1958) proved Ce = Cs for finite memory channels. Feinstein [3] (1959) and Breiman [2] (1960) showed Ce = Cs for finite memory and finitely dependent channels. In particular, Breiman [2] proved that for such a channel Ce is attained by some stationary ergodic source, where he used the Krein-Milman Theorem. So parts (i) and (iii) of Theorem 6.1 (2) are due to Breiman [2]. Parthasarathy [1] (1961) also showed Ce = Cs using an integral representation of the entropy and transmission rate functionals, invoking the ergodic decomposition of stationary sources. According to his proof, Ce = Cs holds for stationary ergodic channels (Theorem 6.1 (3)). For channels between compact and totally disconnected spaces Umegaki [5] considered transmission rates and capacities. Proposition 6.3 through Theorem 6.8 are due to Umegaki [5]. Nakamura [1] formulated transmission rates and capacities in an abstract measurable space setting and proved Proposition 6.11. For a discrete memoryless channel an iterative method to compute the capacity was obtained by Arimoto [1] (1972), Blahut [1] (1972) and Jimbo and Kunisawa [1] (1979). Various types of channel capacity have been formulated; we refer to Kieffer [2] (1974), Nedoma [1, 3] (1957, 1963), Winkelbauer [1] (1960) and Zhi [1] (1965).

3.7. Coding theorems. Feinstein [1] (1954) proved Lemma 7.1. The proof here is due to Takano [1]. Shannon [1] (1948) stated his coding theorems (Theorems 7.3 and 7.4). Subsequently, their rigorous proofs were given by Khintchine [2], Feinstein [2] and Dobrushin [1] (1963). Here we followed Takano's method (see Takano [1]). The converse of the coding theorems was obtained by many authors, see e.g. Feinstein [3] and Wolfowitz [1] (1978). Gray and Ornstein [1] (1979) (see Gray, Neuhoff and Shields [1] (1975)) give a good review of coding theorems. Related topics can be seen in Kieffer [2], Nedoma [1, 2] (1957, 1963) and Winkelbauer [1].
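The iterative capacity computation of Arimoto and Blahut mentioned above can be sketched for a discrete memoryless channel. The implementation below is illustrative (function name, tolerances and the BSC example are ours, not from the text); it returns the capacity in nats.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-10, max_iter=10_000):
    """Capacity (nats) of a DMC with transition matrix W[i, j] = P(y = j | x = i)."""
    n = W.shape[0]
    p = np.full(n, 1.0 / n)                      # start from the uniform input
    for _ in range(max_iter):
        q = p @ W                                # output distribution induced by p
        ratio = np.divide(W, q, out=np.ones_like(W), where=W > 0)
        D = np.sum(W * np.log(ratio), axis=1)    # D[i] = KL(W[i, :] || q)
        p_new = p * np.exp(D)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ W
    ratio = np.divide(W, q, out=np.ones_like(W), where=W > 0)
    D = np.sum(W * np.log(ratio), axis=1)
    return float(p @ D), p                       # capacity and a maximizing input

# Binary symmetric channel with crossover 0.1: C = log 2 - H(0.1) nats
eps = 0.1
W = np.array([[1 - eps, eps], [eps, 1 - eps]])
C, p_star = blahut_arimoto(W)
```

For the symmetric example the uniform input is already optimal, so the iteration stabilizes immediately.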
CHAPTER IV

SPECIAL TOPICS
In this chapter, special topics on information channels are considered. First, ergodicity and capacity of an integration channel are studied, where an integration channel is determined by a certain mapping and a stationary noise source. A channel can be regarded as a vector valued function on the input, i.e., a measure valued function. Then strong and weak measurabilities are considered together with dominating measures. A metric topology is introduced in the set of channels, and completeness w.r.t. this metric is examined for various types of channels. When a channel is strongly measurable, it is approximated by a channel of Hilbert-Schmidt type in the metric topology. Harmonic analysis is applied when the output is a locally compact abelian group. In this case a channel induces a family of unitary representations and a family of positive linear functionals on the L¹-group algebra. The Fourier transform is shown to be a unitary operator between certain Hilbert spaces induced by a channel and an input source. Finally, noncommutative channels are considered as an extension of channel operators between function spaces, which are commutative algebras. Here the function spaces are replaced by certain (noncommutative) operator algebras.
4.1. Channels with a noise source

In this section we consider integration channels determined by a mapping and a noise source. Ergodicity and capacity of this type of channels are studied. Let (X, 𝔛, S) and (Y, 𝔜, T) be a pair of abstract measurable spaces with measurable transformations, which are the input and the output of our communication system as before. We consider another measurable space (Z, ℨ, U) with a measurable transformation U on it and a measurable mapping ψ : X × Z → Y satisfying

(n1) ψ(Sx, Uz) = Tψ(x, z) for x ∈ X and z ∈ Z.

Choose any U-invariant probability measure ζ ∈ P_s(Z), called a noise source, and define a mapping ν : X × 𝔜 → [0, 1] by

ν(x, C) = ∫_Z 1_C(ψ(x, z)) ζ(dz),   x ∈ X, C ∈ 𝔜.   (1.1)
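For finite alphabets the defining integral (1.1) is just a weighted sum over noise letters. A minimal sketch (the particular ψ and ζ below are illustrative, not from the text):

```python
from fractions import Fraction

# Finite sketch of (1.1): Z = {0, 1} with noise ζ, ψ(x, z) = (x + z) mod 3,
# and nu(x, C) is the ζ-mass of the z's that ψ(x, ·) sends into C.
zeta = {0: Fraction(3, 4), 1: Fraction(1, 4)}   # noise source on Z
psi = lambda x, z: (x + z) % 3                  # measurable map X x Z -> Y

def nu(x, C):
    """nu(x, C) = ∫_Z 1_C(psi(x, z)) ζ(dz), here a finite sum."""
    return sum(p for z, p in zeta.items() if psi(x, z) in C)

# nu(x, ·) is a probability measure on Y = {0, 1, 2} for each input x:
assert all(nu(x, {0, 1, 2}) == 1 for x in range(3))
```

Because ζ does not depend on x, every row of this channel is a translate of the same noise distribution.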
Since ψ is measurable, it is easily seen that ν is a channel, i.e., ν ∈ C(X, Y). Moreover, it is stationary, since for x ∈ X and C ∈ 𝔜

ν(x, T⁻¹C) = ∫_Z 1_{T⁻¹C}(ψ(x, z)) ζ(dz),   by (1.1),
  = ∫_Z 1_C(Tψ(x, z)) ζ(dz)
  = ∫_Z 1_C(ψ(Sx, Uz)) ζ(dz),   by (n1),
  = ∫_Z 1_C(ψ(Sx, z)) ζ(dz),   since ζ is stationary,
  = ν(Sx, C).

The channel ν defined by (1.1) is called an integration channel determined by the pair (ψ, ζ) and is sometimes denoted by ν_{ψ,ζ}. We shall consider ergodicity of this type of channels, where the mapping ψ and the noise source ζ are fixed.

Proposition 1. Suppose that μ × ζ ∈ P_se(X × Z) for every μ ∈ P_se(X). Then the integration channel ν = ν_{ψ,ζ} is ergodic, i.e., ν ∈ C_se(X, Y).
Proof. Assume that E ∈ 𝔛 ⊗ 𝔜 is S × T-invariant. Then T⁻¹E_{Sx} = E_x for x ∈ X by (III.4.1). Letting f(x, z) = 1_{E_x}(ψ(x, z)) for (x, z) ∈ X × Z, we note that f is S × U-invariant, since for (x, z) ∈ X × Z

f(Sx, Uz) = 1_{E_{Sx}}(ψ(Sx, Uz)) = 1_{E_{Sx}}(Tψ(x, z)) = 1_{T⁻¹E_{Sx}}(ψ(x, z)) = 1_{E_x}(ψ(x, z)) = f(x, z).

By assumption μ × ζ is ergodic for any μ ∈ P_se(X); since f is S × U-invariant, f = 0 or 1 μ × ζ-a.e. Consequently, for every μ ∈ P_se(X)

μ ⊗ ν(E) = ∫_X ν(x, E_x) μ(dx)
  = ∫_X ∫_Z 1_{E_x}(ψ(x, z)) ζ(dz) μ(dx)
  = ∫_X ∫_Z f(x, z) ζ(dz) μ(dx)
  = ∫∫_{X×Z} f(x, z) μ × ζ(dx, dz)
= 0 or 1.

Thus μ ⊗ ν is ergodic, and therefore ν is ergodic. □

In addition to (n1) we impose the following two conditions on ψ:

(n2) ψ(x, ·) : Z → Y is one-to-one for every x ∈ X;
(n3) λ(G) ∈ 𝔛 ⊗ 𝔜 for every G ∈ 𝔛 ⊗ ℨ, where the mapping λ : X × Z → X × Y is defined by λ(x, z) = (x, ψ(x, z)) for (x, z) ∈ X × Z.

Note that if X, Y, Z are complete metric spaces, then (n3) is always true for any Baire measurable mapping ψ : X × Z → Y. Under the additional assumptions (n2) and (n3) the converse of Proposition 1 is proved as follows:

Proposition 2. Suppose that the mapping ψ satisfies (n1) - (n3). Then the integration channel ν = ν_{ψ,ζ} is ergodic iff μ × ζ is ergodic for every ergodic μ ∈ P_se(X).

Proof. The "if" part was shown in Proposition 1. To prove the "only if" part, let λ be the mapping in (n3). Since λ is also one-to-one by (n2), we have λ⁻¹λ(G) = G for G ∈ 𝔛 ⊗ ℨ. If G ∈ 𝔛 ⊗ ℨ is S × U-invariant, then

λ(G) = λ((S × U)⁻¹G) = {λ(x, z) : (Sx, Uz) ∈ G}
  = {(x, ψ(x, z)) : (Sx, Uz) ∈ G}
  = {(x, y) : y = ψ(x, z), (Sx, Uz) ∈ G}
  ⊂ {(x, y) : Ty = ψ(Sx, Uz), (Sx, Uz) ∈ G}
  ⊂ {(x, y) : (Sx, Ty) ∈ λ(G)}
  = (S × T)⁻¹λ(G).
Now one has, for an ergodic μ ∈ P_se(X),

μ × ζ(G) = μ × ζ(λ⁻¹λ(G))
  = ∫_Z ∫_X 1_{λ⁻¹λ(G)}(x, z) μ(dx) ζ(dz)
  = ∫_Z ∫_X 1_{λ(G)}(λ(x, z)) μ(dx) ζ(dz)
  = ∫_Z ∫_X 1_{λ(G)}(x, ψ(x, z)) μ(dx) ζ(dz)
  = ∫_X ∫_Z 1_{λ(G)_x}(ψ(x, z)) ζ(dz) μ(dx)
  = ∫_X ν(x, λ(G)_x) μ(dx)
  = μ ⊗ ν(λ(G)) = 0 or 1,

since ν is ergodic and hence μ ⊗ ν is so. Therefore μ × ζ is ergodic. □

Example 3. (1) For an integration channel ν = ν_{ψ,ζ} with (n1) - (n3) it holds that, for a stationary μ ∈ P_s(X), μ × ζ is ergodic iff μ ⊗ ν is so.
(2) Suppose that (X, 𝔛, S) = (Z, ℨ, U). Then an integration channel ν = ν_{ψ,ζ} with (n1) - (n3) is ergodic iff ζ is WM.
(3) Suppose that (X, 𝔛) = (Y, 𝔜) = (Z, ℨ) is a measurable group with a group operation "·" commuting with S = T = U. Let y = ψ(x, z) = x · z for x, z ∈ X. Then the integration channel ν = ν_{ψ,ζ} determined by (ψ, ζ) is ergodic iff ζ is WM.
(4) Consider the special case where X = X₀^ℤ, Y = Y₀^ℤ and Z = Z₀^ℤ with X₀ = {0, 1, 2, ..., p−1}, Y₀ = {0, 1, 2, ..., p+q−2} and Z₀ = {0, 1, 2, ..., q−1}. Define ψ(x, z)_i = x_i + z_i (mod (p+q)) for i ∈ ℤ, where x = (x_i), z = (z_i) and ψ(x, z)_i is the ith coordinate. Then the integration channel ν = ν_{ψ,ζ} is called a channel of additive noise. In this case, ν is ergodic iff μ × ζ is ergodic for every μ ∈ P_se(X).

The following proposition characterizes integration channels.

Proposition 4. Assume that ψ : X × Z → Y is a measurable mapping satisfying (n1) - (n3). Then a stationary channel ν ∈ C_s(X, Y) is an integration channel determined by (ψ, ζ) for some noise source ζ ∈ P_s(Z) iff

(1) ν(x, ψ(x, Z)) = 1 for x ∈ X;
(2) ν(x, ψ(x, W))
= ν(x′, ψ(x′, W)) for x, x′ ∈ X and W ∈ ℨ.
Proof. Under conditions (n1) - (n3), for any x ∈ X and W ∈ ℨ the set ψ(x, W) = {ψ(x, z) : z ∈ W} is measurable, i.e., ψ(x, W) ∈ 𝔜. For, λ(X × W) ∈ 𝔛 ⊗ 𝔜 by (n3) and hence ψ(x, W) = λ(X × W)_x ∈ 𝔜. Suppose ν = ν_{ψ,ζ}. Then for x ∈ X and W ∈ ℨ

ν(x, ψ(x, W)) = ∫_Z 1_{ψ(x,W)}(ψ(x, z)) ζ(dz) = ∫_Z 1_W(z) ζ(dz) = ζ(W),   by (n2),
which is independent of x. Thus (1) and (2) immediately follow. Conversely, assume that (1) and (2) hold. Then ζ(·) = ν(x, ψ(x, ·)) is independent of x. Note that ζ is a U-invariant probability measure on ℨ, i.e., ζ ∈ P_s(Z), by (c1), (n2) and (1). Now for C ∈ 𝔜 we have

ν(x, C) = ν(x, C ∩ ψ(x, Z)),   by (1),
  = ν(x, ψ_x ψ_x⁻¹(C) ∩ ψ_x(Z)),   where ψ_x(·) = ψ(x, ·),
  = ν(x, ψ_x(ψ_x⁻¹(C)))
  = ζ(ψ_x⁻¹(C))
  = ∫_Z 1_C(ψ(x, z)) ζ(dz),

which implies that ν is an integration channel. □

Next we consider capacities of integration channels in the setting of alphabet message spaces. So assume that X = X₀^ℤ, Y = Y₀^ℤ and Z = Z₀^ℤ for some finite sets X₀, Y₀ and Z₀, and that S, T and U are shifts on the respective spaces. For a stationary channel ν ∈ C_s(X, Y) the transmission rate functional ℜ(·; ν) is given by

ℜ(μ; ν) = H_μ(S) + H_{μν}(T) − H_{μ⊗ν}(S × T),   μ ∈ P_s(X),
as in Section 3.6. Also the stationary capacity C_s(ν) and the ergodic capacity C_e(ν) are defined there. We now define an integration channel as follows. Let m ≥ 0 be a nonnegative integer and ψ₀ : X₀^{m+1} × Z₀ → Y₀ be a mapping such that

ψ₀(x₀, x₁, ..., x_m, z₀) = ψ₀(x₀, x₁, ..., x_m, z₀′)  ⟹  z₀ = z₀′.

Define a mapping ψ : X × Z → Y by

ψ(x, z)_i = ψ₀(x_{i−m}, x_{i−m+1}, ..., x_i, z_i),   i ∈ ℤ,

where x = (x_i) ∈ X, z = (z_i) ∈ Z and ψ(x, z)_i is the ith coordinate of ψ(x, z) ∈ Y. Evidently ψ is measurable, and ψ satisfies (n1) since

ψ(Sx, Uz)_i = ψ₀((Sx)_{i−m}, (Sx)_{i−m+1}, ..., (Sx)_i, (Uz)_i)
  = ψ₀(x_{i−m+1}, x_{i−m+2}, ..., x_{i+1}, z_{i+1})
  = ψ(x, z)_{i+1}
  = (Tψ(x, z))_i

for each i ∈ ℤ. Moreover, ψ enjoys the properties (n2) and (n3). Taking any noise source ζ ∈ P_s(Z), we can define an integration channel ν = ν_{ψ,ζ}. We note that ν has a finite memory or m-memory (cf. (c5) in Section 3.1): for any message [y_i ⋯ y_j] (i < j) ⊂ Y,

ν(x, [y_i ⋯ y_j]) = ν(x′, [y_i ⋯ y_j])   if x = (x_k) and x′ = (x_k′) satisfy x_k = x_k′ (i − m ≤ k ≤ j).   (1.2)
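With i.i.d. noise letters, the m-memory property (1.2) can be seen concretely: the probability of an output message depends only on the input coordinates x_{i−m}, ..., x_j. A small sketch under these assumptions (ψ₀ and the distributions below are illustrative, not from the text):

```python
m = 1                                         # memory length (illustrative)
psi0 = lambda a, b, z: (a + b + z) % 2        # psi0 : X0^{m+1} x Z0 -> Y0, injective in z

def nu(x, y, zeta_letter):
    """nu(x, [y_1 ... y_n]) for i.i.d. noise with letter distribution zeta_letter.

    x[0] plays the role of x_{1-m}; only x[0], ..., x[n+m-1] can matter.
    Injectivity of psi0 in z pins down at most one noise letter per output letter.
    """
    n = len(y)
    prob = 1.0
    for i in range(n):
        window = x[i:i + m + 1]               # (x_{i-m}, ..., x_i) in this indexing
        zs = [z for z in (0, 1) if psi0(*window, z) == y[i]]
        prob *= sum(zeta_letter[z] for z in zs)
    return prob

zeta_letter = {0: 0.8, 1: 0.2}
y  = (1, 0, 1)
x  = (0, 1, 1, 0, 0)   # n + m = 4 relevant letters plus one extra
x2 = (0, 1, 1, 0, 1)   # differs only outside the window, cf. (1.2)
assert nu(x, y, zeta_letter) == nu(x2, y, zeta_letter)
```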
Theorem 5. The transmission rate functional ℜ(·; ν) of an integration channel ν = ν_{ψ,ζ} is given by

ℜ(μ; ν) = H_{μν}(T) − H_ζ(U),   μ ∈ P_s(X).
Proof. Let μ ∈ P_s(X). Then we see that

H_{μ⊗ν}(S × T) = − lim_{n→∞} (1/n) Σ_{x₁,...,x_n} Σ_{y₁,...,y_n} μ ⊗ ν([(x₁, y₁) ⋯ (x_n, y_n)]) log μ ⊗ ν([(x₁, y₁) ⋯ (x_n, y_n)])
  = − lim_{n→∞} (1/n) Σ_{x_{1−m},...,x_n} Σ_{y₁,...,y_n} μ ⊗ ν([x_{1−m} ⋯ x_n] × [y₁ ⋯ y_n]) log μ ⊗ ν([x_{1−m} ⋯ x_n] × [y₁ ⋯ y_n])
  = − lim_{n→∞} (1/n) Σ_{x_{1−m},...,x_n} Σ_{y₁,...,y_n} μ([x_{1−m} ⋯ x_n]) ν(x, [y₁ ⋯ y_n]) log μ([x_{1−m} ⋯ x_n]) ν(x, [y₁ ⋯ y_n])

for x = (x_k) ∈ [x_{1−m} ⋯ x_n], since ν has m-memory (cf. (1.2)). Hence we have

ℜ(μ; ν) = H_μ(S) + H_{μν}(T) − H_{μ⊗ν}(S × T)
  = H_{μν}(T) − lim_{n→∞} (1/n) Σ_{x_{1−m},...,x_n} μ([x_{1−m} ⋯ x_n]) log μ([x_{1−m} ⋯ x_n])
    + lim_{n→∞} (1/n) Σ_{x_{1−m},...,x_n} Σ_{y₁,...,y_n} μ([x_{1−m} ⋯ x_n]) ν(x, [y₁ ⋯ y_n]) log μ([x_{1−m} ⋯ x_n]) ν(x, [y₁ ⋯ y_n])
  = H_{μν}(T) + lim_{n→∞} (1/n) Σ_{x_{1−m},...,x_n} Σ_{y₁,...,y_n} μ([x_{1−m} ⋯ x_n]) ν(x, [y₁ ⋯ y_n])
· log ν(x, [y₁ ⋯ y_n]),
where x = (x_k). Now it holds that

ν(x, [y₁ ⋯ y_n]) = ∫_Z 1_{[y₁⋯y_n]}(ψ(x, z)) ζ(dz) = ζ(M₁(x_{1−m}, ..., x₁, y₁) ∩ ⋯ ∩ M_n(x_{n−m}, ..., x_n, y_n)),

where for i = 1, ..., n

M_i(a₀, a₁, ..., a_m, b) = {z = (z_k) ∈ Z : ψ₀(a₀, ..., a_m, z_i) = b} ⊂ Z,
since ψ(x, z) = (ψ₀(x_{i−m}, ..., x_i, z_i))_{i∈ℤ} ∈ [y₁ ⋯ y_n] iff z ∈ M_i(x_{i−m}, ..., x_i, y_i) for 1 ≤ i ≤ n. Let Y_i = {ψ₀(x_{i−m}, ..., x_i, d) : d ∈ Z₀} ⊂ Y₀. Then for any y_i ∈ Y_i there is a unique message [z_i] ∈ 𝔐_i¹(Z) such that M_i(x_{i−m}, ..., x_i, y_i) = [z_i], where 𝔐_i¹(Z) is the set of all messages of length 1 starting at time i in Z. If y_i ∈ Y₀ − Y_i, then M_i(x_{i−m}, ..., x_i, y_i) = ∅. Therefore,
ℜ(μ; ν) = H_{μν}(T) + lim_{n→∞} (1/n) Σ_{x_{1−m},...,x_n} Σ_{y₁,...,y_n} μ([x_{1−m} ⋯ x_n]) ν(x, [y₁ ⋯ y_n]) log ν(x, [y₁ ⋯ y_n])
  = H_{μν}(T) + lim_{n→∞} (1/n) (−H_ζ(𝔐₁¹(Z) ∨ ⋯ ∨ 𝔐_n¹(Z)))
  = H_{μν}(T) − H_ζ(U).  □
Proposition 6. For tAe integration channel v — v± , there exists a stationary ergodic input source fi* 6 Pae(X) such that Cs(v) = £ft(/J*; v). Proof. H^S) is a weak* upper semicontinuous function of \i by Lemma II.7.3. Since v has a finite memory, it is continuous, i.e., it satisfies (c5') or (c5"). Continuity HpviT) is a weak* upper semicontinuous function of \i by of a channel implies that; H^iT) Proposition III.2.11. Thus 9t(-; v) is weak* upper semicontinuous on PS(X). The assertion follows from Theorem III.6.8 (3). Let us consider a special case where XQ = Yo = ZQ is a finite group. Let ipi(x, z) = x-z = (xi-Zi)igz. The channel v determined by tpi and C e Ps(Z) is called a channel of product noise. Theorem 7. For a channel v = ity, ^ of product noise it holds that Ca(v) = C.(v)
log\X0\-H((U). ]og\Xo\-H ((U).
Chapter
IV: Special
Topics
Proof. Let p = \X0\ and consider a ( A , . . . , ^-Bernoulli source /xo on X = X ? . Then we see that for n > 1 /io(dx) lMoKbi i o K t o '• •• •• 2/"D *>(<**) »»l) =■ =■ // *(*» *(*» to' ■' »n]) to'"'^
= =
Y, E
= =
Y,
= — Pp n ^*
=i P P
/[
v{x,[yi---yn])va{dx) "(^toi-'fcD^W
v(x,[y[j/i ]), "(a:, •••Sn])W)([a;i 1---y n])fJto([xi---xn'••*n])i V
w(ar,[yi---ife]),
E
/*[»-*](**<*.*))«'**)
*—* a?i,...,a;„ a?i,...,a;„
/Z H H ,, .. .. .. ,, zz :: „„ '' / Z
=4 E /
= (**)> {xk), *x =
([^■ • ■■!„]) since ^ v0([xi ■■!„])== — —, , V i7
ifc~*i(tfri(*,*))c(«fa) x
1
=4 = 4 E E c(K*r CClO^-wO-feC -wi>-fee 1 •*»)]) •*»)!) J_ p" since {a;,"1 ■ j/j : 1 < $ < p} = {a* : 1 < i < p} = -*o for 1 < j < n. Hence, ^v also a ( A , . . . , i)-Bernoulli source. Thus # M o „(T) = log \X0\ = logp. Therefore, C.M =
sup
[HM1/(T) - fl-c(C0] < logp - HC(U) =
or SR(Mo; «0 = C.(i/) = log \X0\ -
4.2.
is
H((U).
Measurability of a channel
In this section, we regard a channel as a measure valued function on the input space and consider its measurability. Let (X, X, S) and (Y, 2), T) be a pair of abstract measurable spaces with measur able transformations 5 and T, respectively. As before, M(fi) denotes the space of all C-valued measures on f2 and B(fl) the space of all C-valued bounded measurable functions on fi, where f2 = X, Y or X x Y. First we need notions of measurability of Banach space valued functions, which are slightly different from those in Bochner integration theory (cf. Hille and Phillips [1] and Diestel and Uhl [1]).
4-2. Measvrability of a channel
197
Definition 1. Let £ be a Banach space with the dual space £*, where the duality pair is denoted by {(j>, cj>*) for 4> £ £ and <j>* 6 £ * ■ Consider a function tp : X —» £. is said to be finitely £-valued or £-valued simple function if n
k,
k,
xxe€ XX
fc=i fc=i for some partition {-Afc}£=i C X of X and some {0fc}JJ=i C. £. tp is said to be strongly measurable if there exists a sequence {tpn}%Li of £ -valued simple functions on X such that \\0^ where ||-||f is the norm in £. tp is said to be weakly measurable if the scalar function <*»(■),**> **••((?? ( • ) ) = <«»(■),**>
is measurable for* G £*. Although the above definition is not identical nor equivalent to the one in Bochner integration theory, one can show that tp : X —¥ £ is strongly measurable iff ip is weakly measurable and has a separable range. The usual definition of measurabilities of a Bochner integration theory is as follows. Let m be a finite positive measure on (X, X). Then a function tp : X —> £ is said to be strongly measurable if there is a sequence {y>n} of £-valued simple functions such that
||00 m-a.e.x. ||
u{x, ■) == D{x) u{x) G€ M(Y), M(Y), v(x, ■)
xeX. xeX.
(2.1)
We want to consider strong and weak measurabilities of v in the following. Definition 2. Let v e C(X, Y) be a channel and v : X —> M(Y) (2.1). Then v is said to be strongly measurable if (cl7) v is strongly measurable on (X, X) and to be weakly measurable if (cl8) v is weakly measurable on
(X,X).
be defined by
!98
Chapter IV: Special Topics
e C{X,Y) C(X,Y) is said to be Clearly (cl7) implies (cl8). Recall that a channel v G dominated if (c6) There exists some 77 G P(Y) such that v{x, •) < 77 for every x G X . Then (c6) is between (cl7) and (cl8) as seen from the following. T h e o r e m 3 . Let v G C(X,Y). Then, (cl7) =► (c6) =>■ (cl8). That is, if v is strongly measurable, then v is dominated, which in turn implies that v is weakly measurable. Proof. (cl7) => (c6). Assume that v is strongly measurable. Then the range {v(x, ■) : x G X} of v is separable in M(Y) and hence has a countable dense subset {v(xn,-) : n > 1}. Let
„=1
^Z
where the RHS is a well-defined element in P{Y). Suppose 77(C) = 0. Then v(x„,C) = 0 for n > 1 by definition. For any x G X let {£ n t }jtLi be a subse quence of {x n } such that — v(x, Oil ~~* 0
\\v{xnkr)
as fe —>■ 00,
which is possible by denseness of {v(xn, ■) : n > 1} in {v(x, 0 : a; G X } . Hence we see that v(x,C) = lim u(xnk,C) = 0. This shows that v(x, ■) -C 77. fc—►<»
(c6) =S> (cl8). Assume that some 77 G P(Y) satisfies v(x)
[ g(y)r,(dy),
C G 2),
Le
-> ff = T ^ . w l l e r e lllbll = W O O = \\9h,n- The L°°-space L 0 0 ^ ) is the dual of Ll{Y,n) by the identification L°°{Y,77) 3 f = f* e L 1 (¥",»?)* given by /*(s) = /
/(»)»(») i/(dy),
S G LX(V,77).
Now for x G X let
*o-^Wc™ be the RN-derivative. To show the weak measurability of v it suffices to prove that of the function X B x ^ ux e L1{Y,n). For / G L°°(Y, 77) choose a sequence {/„}~ = 1 of bounded simple functions on Y such that / „ -> / 77-a.e. For each n > 1 fc(v*) fn{y)vx{y) n{dy) n{dy) = f*(x), f*(x), / n W = /I fn{y)vx{y)
say,
4-2. Measurabihty of a channel
199
is a function of x converging to f*{v%) = f*{x),
sa
Y> f° r all x e X by the Bounded
fcn
Convergence Theorem. If we let / „ = JZ ar»,fclcn t for n > 1, then we see that for n> 1
J2 Qn,fcicn.fc (y)vx{y) n{dy) ' k=i k=l
^
Jcn,k
v(dy)
= fc=l X) a».* /JC , "fa' d3/) n k
i 3: C = y,"n,* }JCtn,kv{x,C ntk), = '( i 'n,t)i fc=l k=l
which implies that ,/£(•) is measurable. Hence /*(■) is also measurable. Since / e L°°(Y, rf) is arbitrary, we can show that the function x i-> i/x is weakly measurable. This completes the proof. When the measurable space is separable, the implication (c6) => (cl7) holds, i.e., under the assumption of separability of the measurable space (c6) and (cl7) are equivalent, which will be shown in the following. Corollary 4. Let v £ C(X, Y) and assume that 2) has a countable generator. (c6) implies (cl7). That is, every dominated channel is strongly measurable.
Then
Proof. Suppose that (c6) holds with r) e P(Y). Since 2) has a countable generator, L1(Y,rf) is separable. Using the notations of the proof of Theorem 3, we only have to prove the strong measurability of the function X 3 x >-» vx e L1(Y, rj). But then, the weak measurability of that function is shown in the proof of Theorem 3. Moreover, Ll(Y, rf) is separable. Therefore x t-> vx is strongly measurable. Assume that a channel v 6 C(X, Y) is dominated, u(x, •) < tj ( i e I ) for some n £ P{Y). Let k{x,y)=V-^f,
(x,y)eXxY
be the RN-derivative. We want to consider joint measurability of this function k. To this end we shall use tensor product Banach spaces, which will be briefly mentioned. We refer to Schatten [1] and Diestel and Uhl [1]. Let n e P{X) and £ be a Banach space. L1(X ;£) denotes the Banach space of all £-valued strongly
Chapter
200
IV: Special
Topics
measurable functions $ on X which are Bochner integrable w.r.t. fi, where the norm ||$||i, M is defined by M= || || ** I| || II ,, M = //
\\*(x)\\el*{dx). [|*(*)ll*M*0-
Jx
The algebraic tensor product Ll(X) O £ consists of functions of the form 1 (X),fkeLfk1eL (X), ke£,l l, ke£,l l,
f^fk&k, f^fk& k, fc=i
which are identified with £ -valued functions (^2fkQkj(x) ^2fk(x)(t>k, (^2fkQ kj(x) = ^2fk(x)(t>k, ^fc=l
'
xeX. xeX.
k=\
The greatest crossnorm "/(•) is defined on i 1 ( X ) 0 £ by
7f E/*©**)-w(f;ii/jM^iu-'E^0^=EA©**l' H=l
'
3
l 1=1
k
)
which is equal to e
II
W^fkOfa JX II
k
dn= £
^fkQfa k
l,fi
Thus, the completion of LX(X)Q£ w.r.t. the greatest crossnorm 7, denoted L 1 ( X ) ® 7 £, is identified with L X ( X ; £). LX(X) ® 7 £ is called the projective tensor product of Ll{X) and £. If £ = M(Y) or £ = L ^ F , 77) for some 77 G P ( Y ) , then
L\X-M(Y)) L\X-M(Y)) 1 L (X;L\Y)) L1(X;L\Y))
= L\X) L\X) ®7 M(Y), M(F), 1 1 11 = L (X)® L (X)® (Y) yL (Y) yL X = {X xY,fix = L\X L\X xY) xY) = L L^X xY,fix 77). n).
Theorem 5. Assume that a channel v G C(X, Y) is strongly measurable and hence is dominated by some source n G P{Y). Let y, G P{X) be arbitrarily fixed. Then the RN-derivative k(x,y) = "fyjy is jointly measurable on the product measure space ( X x y,£®0),/xxr7). Proof. We shall consider the tensor product space L 1 (X,/i)7 £ for £ = M(Y) or Ll(Y, 77). Since v is strongly measurable, £(•) is an M(Y)-valued strongly measurable
4-2. Measurability
of a channel
201
function on the measure space (X, X, /J) and, moreover, it is Bochner integrable w.r.t. fi, i.e., v{-) G L1 (X ; M{Y)). Then there are a sequence {$„} C L1(X)QM(Y) and an T) € P(Y) such that 7 ( $ „ - v) ->• 0, fnj < »?, where $ „ = X) fn,jO£n,j,
i/(x, •) < 77,
x 6 X,
1 < j < jn, Tl > 1,
n > 1. Since £ (F) = ^ ( Y , J?) is identified with a closed
3=1
subspace of M(Y) as in the proof of Theorem 3, we can regard LX{X) © Ll(Y) and LX{X) ® 7 V-iy) as subspaces of Ll{X) © M(Y) and L 1 ^ ) ®y Af (Y), respectively. In fact, the following identifications can be made:
$n = £
/«J © & J S £
j=l
/nj © ^ e i ' W O
L X (Y),
j=l
"w(')=fc('')ejLl(x)®7jLl(n This means k € L 1 (X x Y) and fc is jointly measurable. Remark 6. In Theorem 5, if a measure £ € P ( X x Y) is such that £
x GX
for some 77 e P(.X'). Recall that if 77 is considered as an input source, then the output source r\v G P(X) is given by W(C)
= f v(x, C) r){dx),
Jx
Then we have: Proposition 7. Under the above assumption one has: 77; (1) T)v 771/ «< 77;
»S=S*^''
CeX.
Chapter IV: Special Topics
202
Proof. (1) is obvious since i/(x, •) < n for every x € X. (2) Note that k(x,y) = ^ ^ is jointly measurable on {X x X,X®X,r,xri) Theorem 5. Applying Pubini's Theorem we get for C e X that
by
i I^M v{dv)=nu{c) = Lv{x%c) n{dx) =
k(x,y)n(dy)r](dx) JxJc
I I Hx,y)v{dx))r)(dy).
= This is enough to obtain the conclusion.
4.3. Approximation of channels Let (X, X, S) and (Y, 2),T) be a pair of abstract measurable spaces with mea surable transformations. We introduce a metric topology in the space C(X, Y) of channels from X to Y. It will be shown that C(X, Y),Ca(X, Y),Cae(X, Y),Ca{X, Y) and Cae(X, Y) are complete w.r.t. this topology. When a channel is strongly mea surable, a Hilbert-Schmidt type for it is defined. We shall prove that any strongly measurable channel is approximated by a channel of Hilbert-Schmidt type in the metric topology. Definition 1. Define p(-,-) on C(X,Y)
x C{X,Y)
p{v1,V2)=SWp\\l>1(x,-)-V2(x,-)\\, x€X
V!,V2€C(X,Y),
where ||-|| is the total variation norm in M(Y). Recall that each channel u € C{X,Y) B(X) given by
by (3.1)
Clearly p is a metric on C(X, Y).
induces a channel operator K „ : B(Y) -►
(Kvg)(x) = J g{y) v{x, dy),
g 6 B(Y),
where B(X) and B(Y) are spaces of all bounded measurable functions on X and Y as before. An immediate consequence of the above definition is: Lemma 2. Let vuv2 e C{X,Y) and K n , K ^ : B{Y) -► B{X) channel operators. Then it holds that \\K^-KV2\\
=
p(Vl,u2).
be corresponding
4-3. Approximation
of channels
203
Proof. Observe that | | K ^ ~ K „ 2 | | = sup | | ( K n - K ^ ) / | | = sup sup / f{y){v1(x,dy) \\f\\
-
v2(x,dy)\
\f(y)\\v1(x,-)-v2(x,-)\(dy)
Conversely, let e > 0 be given. Choose an xo G X such that ||i/i(x 0 , •) - v2(x0, -)|| > p{v\, vi) - e, which is possible by (3.1). Define a functional A on B(Y) by A(/) = j
f(y){vi{x0,dy)
- u2(x0,dy)},
f G B(Y).
Then A is bounded and linear with norm ||i/i(£(),•) — v2{xQi ")ll- Clearly ||K,^ — K „ J > ||A||,sothat H K ^ - K ^ I I > p{vuv2)-e. Therefore p C ^ - K ^ H > p{vuv2). Lemma 3. Let v„ (n > l),i/ G C(X,Y) any \i G P(X) it holds that \\fj,un - fiu\\ -> 0,
and p(yn,v)
—>• 0 as n —> oo. Then, for
W/J,vn - n ® v\\ -s- 0.
Proof. Let p, G P(X). It suffices to show the second convergence. For n > 1 and E G X2) choose Entl,E„t2 G £ ® 2J such that £ „ , i n JE?„,2 = 0, J5„,i U £ n > 2 = X and ||/x® vn - n® i/|| = \n®(i>n-v)\(X = H®(vn-
xY)
v)(EnA)
- p ® (vn -
v){En<2).
Then the RHS of the above is estimated as Then the RHS of the above is estimated as H® {vn - v)(En,i) -p®{vn-
v)(En,2) /
= I Jx
{' n{x,(Enil)x)-i'(x,(EnA)x)}fj.(dx) - j
{vn{x,{Ent2)x)-v{x,{Eni2)x)}ij,(dx)
Chapter IV: Special Topics
204
< sup \vn{x, (Enji)x) x&X
- v(x,
(E„ti)x)\
+ sup \vn(x, (En<2)x) - v(x, xex <2sup\\vn(x,-)-v(x,-)\\ xex = 2p(vn, v) -> 0 (n-¥ oo),
(E„t2)x)\
where (JE„,i)« is the x-section of EUii for i = 1,2 and n > 1. Therefore the conclusion follows. Proposition 4. Consider the metric p on C(X, Y). (1) (C(X,Y),p) is a complete metric space. (2) (CS(X, Y),p) is a complete metric space. (3) (Cse(X,Y),p) is a complete metric space. (4) (Ca(X,Y),p) is a complete metric space. (5) (Cae(X, Y),p) is a complete metric space. Proof. (1) Let {vn} C C(X,Y) be a Cauchy sequence, i.e., p{vn,um) oo. Let i 6 l b e fixed. Then, ||"n(a;,-)- , / m(a;,0||
->0asn,m->
-*■ 0 ( n , m - > o o ) .
Since v„(x, ■) € P ( F ) for n > 1 and P(Y) is norm closed, there is some v(x, •) € P(Y) such that ||i/„(a:, ■) — v(x, -)|| —>• 0. Hence v{-, •) satisfies (cl). To see that v(-,-) satisfies (c2), let C 6 2) be arbitrary. Since &>„(-, C) is 3E-measurable for n > 1 and |i/„(a:,C0 - u(x,C)\
< \\vn(x, •) - v(x,-)\\ -► 0,
s 6 X,
we see that i/(-, C) is also X-measurable. Hence (c2) is satisfied. Thus v is a channel. For any e > 0 there is some no > 1 such that p(t'„, vm) < e for n, m > n 0 . Letting m -> oo we obtain p(i/ n , v) < e for n > n 0 . This means that p(vn, v) -¥ 0 as n -> oo. Therefore (C(X, y ) , p ) is complete. (2) Let {i/„} C C S (X, F ) be a Cauchy sequence. Then by (1) there is some v e C(X,Y) such that p{vn,v) -> 0. To see that v is stationary let x € X and C e 2J. It follows that \u(Sx,C)-u(x,T-1C)\
< \u(Sx,C) +
- un(Sx,C)\
+ \un(Sx,C)
-
vn{x,T-lC)\
\vn{x,T-lC)-v(x,T-1C)\
< \\v(Sx, •) - un(Sx, -)|| + | | M * , •) - *(*, -)|| < 2p(i/„, i/) -> 0 (n -> oo).
4-3. Approximation of channels
205
Hence v{Sx,C) = v{x,T~1C). Thus v is stationary. (3) Let {vn} C Cse(X, Y) be a Cauchy sequence. Then there is some stationary v G CS(X, Y) such that p(vn, v) -¥ 0 by (2). To see that v is ergodic let \i G Pse(X). Since ^ ® f „ is ergodic for n > 1, ||/*®i/ n — A*<S»f|| —>• 0 by Lemma 3, and Pse{X xY) is norm closed, one has that fj, ® v is also ergodic. Thus v is ergodic. (4) and (5) can be proved similarly. Let us now consider channels [X, v, X], where the input and the output are iden tical. Let n G P(X) and consider the Hilbert space L2(X,n). If an operator L : L2(X, n) ->• L2{X, r)) is denned by f G L2(X,n),
( L / ) ( i ) = / k(x,y)r,(dy), Jx
x G X,
where &(-, •) is a suitable kernel function on X x X, then L is said to be of HilbertSchmidt type if k G L2(X x X, n x n). Similarly we have: Definition 5. Assume that a channel v G C(X,X) is strongly measurable, so that there is some n G P(X) such that v(x, ■) -C n for a; G X and
k(x y)
' -
v(dy)
is jointly measurable. In this case, v is said to be of (77) Hilbert-Schmidt type if fceL2(XxX,»7X77). Theorem 6. Let v G C(X,X) be a strongly measurable channel with a dominating measure 77 G P(X). For any e (0 < e < 1) there exist a pair of channels V\,v-i G C(X, X) and a A (0 < A < e) such that v = AJ/I + (1 - A)i/2, h(x, y) = ^ ^ *,(*,y) =
Proof. Let k(x,y) = ^ constant c > 0 let
eL\XxX,nx
77),
(3.3)
^M- eL2(XxX,r,xr,). may)
(3.4)
V2
^ for (x,y) eXxX.
Then clearly k G Ll{X x X). For a
fccO^ V) = M^, 2/)l[fc>c](a:I«/), x
(3.2)
fc x
K( > V) = ( > 2/)![fc
x,yeX, x, y G AT,
Chapter IV: Special Topics
206
where [k > c] = {(x,y) G X x X : fc(a;,y) > c}. Now let e (0 < e < 1) be given. Choose a constant c > 0 for which 0 < Pclli,„x„ < e,
l-e<||**H1,,x,
Then define v\, v2 by Vl(x,
C) = ^
u2(x, C) = T T T ^
/ k'c(x, y) n(dy),
xeX,C€X,
/ K(x, y) r,(dy),
x G X, C G X,
II^C l|l,TJXT| JC
and let A = ||f c || Mxq . It follows that (3.2) - (3.4) hold. and let A = ||f c || Mxq . It follows that (3.2) - (3.4) hold. T h e o r e m 7. Let v G C(X, X) be a channel. If v is strongly measurable, then: T h e o r e m 7. Let v G C(X, X) be a channel. If v is strongly measurable, then: (cl9) There is an n G P{X) such that for any e > 0 there exists an (rj) HilbertSchmidt channel ue G C(X,X) for which p{v,ve) < e. If (X,X) is separable, then the converse is true, i.e., (cl9) implies the strong measurability of v ((cl7)). Proof. Let r] G P{X) be a dominating measure for v. Theorem 6 implies that for any e (0 < e < 1) there are channels V\, v2 G C(X, X) and a A > 0 such that v = Ai/i + (1 - X)v2,
0
where v2 is of (rf) Hilbert-Schmidt type. It follows that \\u{x, -)|| == A||i/!(i, \\\i>i(x, 0■)- -^ v( 2z{x, -)|| ||v(ar, ■) 0 -- vu2{x, , 0|| 2(x, 0||
X xGX
or p(v, v2) < e. Conversely, assume (cl9). Then there exists a sequence {i/„} C C(X,X) nels of (77) Hilbert-Schmidt type such that ^ 1 P{v,vn) < - , n
of chan
n > l .
Since v„(x, ■) < n for x 6 X and n > 1, one has 1/(1, ) < »? for a; G X and £ is weakly measurable by Theorem 2.3, where v{x) = v(x, ■). Since (X, X) is separable, v has a separable range in L1(X,n) C M(X). Thus v is strongly measurable.
4-4- Harmonic analysis for channels
207
4.4. Harmonic analysis for channels We consider channels [X, v, G], where (X, X) is an abstract measurable space and G is an LCA ( = locally compact abelian) group with the Baire cr-algebra <S. It will be shown that a channel is associated with a family of continuous positive definite functions and a family of positive linear functionals on the L 1 -group algebra. Purthermore, Fourier transform is studied in connection with the output source of a channel. There are two good reasons for studying such channels. (1) We can employ a harmonic analysis method. (2) For any measurable space (Y, 2)) we can construct a compact abelian group G such that Y C G. In fact, let T = {/ e B(X) : | / ( x ) | = 1} and consider the discrete topology on T. Then T is a discrete abelian group with the product of pointwise multiplication. Hence G = T is a compact abelian group. Observe that Y C G by the identification y = sy for y € Y, where sy(x) = x(s) f° r X € I \ the point evaluation. Let us denote by G the dual group of G and by (t, %) = x(t) the value of % e G at t G G. i 1 ( G ) = Lx(G,rh) denotes the L 1 -group algebra of G, where in is the Haar measure of G, and the convolution, involution and norm are defined respectively by
f*g(x)= f*9(x)=
JG JG
fix) = TOP),
IfUMxx'-^Mdx'), ff(x')9(xx'~l)m(dx'), ll/lli = t I/(X)I m(dX), JG
for f,g € LX{G) and x^G- There is an approximate identity {eK} C Ll(G), which is a directed set and satisfies that for each K, eK * eK = e* = e K , ||e K ||i = 1 and l|e« * / - /Hi ->• 0 for / £ i 1 ^ ) We begin with a definition. Definition 1. (1) Let V = V(X,G) satisfying the following conditions:
be the set of all functions: X x G -5- C
(pi) For every x E X, <j>{x, •) is a continuous positive definite function on G such that (f>(x, 1) = 1, where 1 is the identity function on G\ (p2) For every
X
€ G, #(-,x) € B ( X ) ,
where ip : G —> C is said to be positive definite if n n
n n
j=i j = i fc=i
for any n > 1, * i , . . - , tn G G and ori,... , a „ € C
Chapter IV: Special Topics
208
(2) Let Q = Q(X, ^{G)) the following conditions:
be the set of all functions q : X x ^(G)
-> C satisfying
(ql) For every x € X, q(x, ■) is a positive linear functional on Ll(G) of norm 1; (q2) For every / G Ll(G),
(-,/) G B(X).
Note that $\mathcal{P}$ and $\mathcal{Q}$ are convex sets.

Proposition 2. (1) For a channel $\nu \in C = C(X, G)$ define $\phi_\nu$ by

$$\phi_\nu(x, \chi) = \int_G (t, \chi)\, \nu(x, dt), \qquad x \in X,\ \chi \in \widehat G.$$

Then $\phi_\nu \in \mathcal{P}$.

(2) If $G$ is compact and totally disconnected, then for any $\phi \in \mathcal{P}$ there exists a unique channel $\nu \in C$ such that $\phi = \phi_\nu$. In this case the correspondence $\nu \leftrightarrow \phi_\nu$ between $C$ and $\mathcal{P}$ is onto, one-to-one and affine.

Proof. (1) This is a simple application of Bochner's Theorem together with the fact that $\phi_\nu(x, \chi) = \mathbf{K}_\nu\chi(x)$ ($x \in X$, $\chi \in \widehat G$), where $\mathbf{K}_\nu$ is the channel operator associated with $\nu$ (cf. Section 3.2).

(2) Suppose that $G$ is compact and totally disconnected. Let $\phi \in \mathcal{P}$ be given. For each $x \in X$ there exists, by Bochner's Theorem, a Baire measure $\nu_x$ on $G$ such that

$$\phi(x, \chi) = \int_G (t, \chi)\, \nu_x(dt), \qquad \chi \in \widehat G.$$

$\phi(x, 1) = 1$ implies that $\nu_x$ is a probability measure. Define a mapping $\tilde\phi(\cdot, \cdot) : X \times C(G) \to \mathbb{C}$ by

$$\tilde\phi(x, b) = \int_G b(t)\, \nu_x(dt), \qquad x \in X,\ b \in C(G).$$

Clearly $\tilde\phi = \phi$ on $X \times \widehat G$. Since $G$ is compact, for a fixed $b \in C(G)$ there exists a sequence $\{b_n\}$ of finite linear combinations of elements of $\widehat G$ such that $b_n \to b$ uniformly on $G$. Hence we have that

$$\tilde\phi(\cdot, b_n) = \int_G b_n(t)\, \nu_{(\cdot)}(dt) \to \int_G b(t)\, \nu_{(\cdot)}(dt) = \tilde\phi(\cdot, b)$$

uniformly on $X$. Since, for each finite linear combination $b' = \sum_{k=1}^{n} a_k \chi_k$ with $a_k \in \mathbb{C}$, $\chi_k \in \widehat G$, $1 \le k \le n$,

$$\tilde\phi(\cdot, b') = \sum_{k=1}^{n} a_k\, \phi(\cdot, \chi_k)$$
and $\tilde\phi(\cdot, b')$ is $\mathfrak{X}$-measurable, we see that $\tilde\phi(\cdot, b)$ is also $\mathfrak{X}$-measurable for $b \in C(G)$. By assumption we can choose a topological basis $\mathcal{T}$ for $G$ consisting of clopen sets. For each $C \in \mathcal{T}$, $1_C \in C(G)$ and hence $\nu_{(\cdot)}(C)$ is $\mathfrak{X}$-measurable. Let $\mathfrak{G}_1 = \{C \in \mathfrak{G} : \nu_{(\cdot)}(C) \in B(X)\}$. Then it is easily seen that $\mathfrak{G}_1$ is a monotone class and contains $\mathcal{T}$. Hence $\mathfrak{G}_1 = \sigma(\mathcal{T}) = \mathfrak{G}$, since $\mathfrak{G}$ is generated by $\mathcal{T}$. Thus $\nu_{(\cdot)}(C)$ is $\mathfrak{X}$-measurable for all $C \in \mathfrak{G}$. Letting $\nu(x, C) = \nu_x(C)$ for $x \in X$ and $C \in \mathfrak{G}$, we conclude that $\nu \in C$ and $\phi = \phi_\nu$. The uniqueness of $\nu$ follows from that of $\nu_x$ for each $x \in X$.

A one-to-one correspondence between $\mathcal{P}$ and $\mathcal{Q}$ is derived from a well-known theorem (cf. Dixmier [1, p. 288]).

Lemma 3. There exists a one-to-one, onto and affine correspondence $\phi \leftrightarrow q$ between $\mathcal{P}$ and $\mathcal{Q}$ given by

$$q(x, f) = \int_{\widehat G} f(\chi)\, \phi(x, \chi)\, \hat m(d\chi), \qquad x \in X,\ f \in L^1(\widehat G). \tag{4.1}$$

In this case, there exist a family $\{U_x(\cdot), \mathcal{H}_x, \varrho_x\}_{x \in X}$ of weakly continuous unitary representations of $\widehat G$ and a family $\{V_x(\cdot), \mathcal{H}_x, \varrho_x\}_{x \in X}$ of bounded $*$-representations of $L^1(\widehat G)$ such that for each $x \in X$

$$\phi(x, \chi) = (U_x(\chi)\varrho_x, \varrho_x)_x, \qquad \chi \in \widehat G, \tag{4.2}$$

$$q(x, f) = (V_x(f)\varrho_x, \varrho_x)_x, \qquad f \in L^1(\widehat G), \tag{4.3}$$

where $\mathcal{H}_x$ is a Hilbert space with inner product $(\cdot, \cdot)_x$ and $\varrho_x \in \mathcal{H}_x$ is a cyclic vector of norm 1 for both representations.

Remark 4. In Lemma 3 above, for each $x \in X$ the Hilbert space $\mathcal{H}_x$ can be obtained from $L^1(\widehat G)$ through the positive linear functional $q(x, \cdot)$ by means of the standard GNS (Gel'fand-Neumark-Segal) construction, as follows. Define $(\cdot, \cdot)_x$ by

$$(f, g)_x = q(x, f * g^*), \qquad f, g \in L^1(\widehat G), \tag{4.4}$$

which is a semiinner product. Let $N_x = \{f \in L^1(\widehat G) : (f, f)_x = 0\}$ and let $\mathcal{H}_x$ be the completion of the quotient space $L^1(\widehat G)/N_x$ w.r.t. $(\cdot, \cdot)_x$. Let us denote the inner product in $\mathcal{H}_x$ by the same symbol $(\cdot, \cdot)_x$. Denote by $[f]_x$ the equivalence class in $\mathcal{H}_x$ containing $f \in L^1(\widehat G)$, i.e.,

$$[f]_x = \{g \in L^1(\widehat G) : (f - g, f - g)_x = 0\}. \tag{4.5}$$

$U_x(\cdot)$ and $V_x(\cdot)$ are given respectively by

$$U_x(\chi)[g]_x = [g_\chi]_x, \qquad V_x(f)[g]_x = [f * g]_x \tag{4.6}$$
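On a finite dual group the semiinner product (4.4) is a finite sum and its positivity can be verified numerically. The following sketch is our illustration, not the book's (it realizes $\widehat G$ as $\mathbb{Z}_n$ with counting measure; all names are assumptions): it implements convolution, the involution $f^*(\chi) = \overline{f(\chi^{-1})}$, and $(f,g)_x = q(x, f*g^*)$, and checks that $(f,f)_x \ge 0$, which is what makes the GNS quotient-and-completion construction of $\mathcal{H}_x$ possible.

```python
import cmath
import random

n = 5  # realize both G and its dual group as Z_n
char = lambda t, x: cmath.exp(2j * cmath.pi * t * x / n)

random.seed(1)
w = [random.random() for _ in range(n)]
p = [v / sum(w) for v in w]  # nu(x, .), a probability vector on G
phi = [sum(char(t, x) * p[t] for t in range(n)) for x in range(n)]

# ell^1 operations on the dual: convolution and involution f*(x) = conj(f(-x)).
conv = lambda f, g: [sum(f[y] * g[(x - y) % n] for y in range(n)) for x in range(n)]
invo = lambda f: [f[(-x) % n].conjugate() for x in range(n)]

q = lambda f: sum(f[x] * phi[x] for x in range(n))   # (4.1), counting measure
inner = lambda f, g: q(conv(f, invo(g)))             # (4.4): (f, g)_x = q(x, f * g*)

for _ in range(50):
    f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
    v = inner(f, f)
    assert v.real >= -1e-9 and abs(v.imag) < 1e-9    # the semiinner product is PSD
    # it equals sum_t p(t) |F f(t)|^2, where (F f)(t) = sum_x (t, x) f(x):
    Ff = [sum(char(t, x) * f[x] for x in range(n)) for t in range(n)]
    assert abs(v - sum(p[t] * abs(Ff[t]) ** 2 for t in range(n))) < 1e-8
print("(f, f)_x >= 0 for the GNS semiinner product")
```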
for $\chi \in \widehat G$ and $f, g \in L^1(\widehat G)$, where $g_\chi(\cdot) = g(\chi\,\cdot)$. We see that $U_x(\chi)$ is a unitary operator on $\mathcal{H}_x$ satisfying:

(1) $U_x(\chi\chi') = U_x(\chi)U_x(\chi')$ for $\chi, \chi' \in \widehat G$;
(2) $U_x(1) = I$, the identity operator on $\mathcal{H}_x$;
(3) $U_x(\chi)^{-1} = U_x(\chi^{-1}) = U_x(\chi)^*$, the adjoint of $U_x(\chi)$, for $\chi \in \widehat G$;
(4) $(U_x(\cdot)[g]_x, [h]_x)_x$ is continuous on $\widehat G$ for any $g, h \in L^1(\widehat G)$;

while $V_x(f)$ is a bounded linear operator on $\mathcal{H}_x$ such that

(5) $V_x(\alpha f + \beta g) = \alpha V_x(f) + \beta V_x(g)$ for $\alpha, \beta \in \mathbb{C}$ and $f, g \in L^1(\widehat G)$;
(6) $V_x(f * g) = V_x(f)V_x(g)$ for $f, g \in L^1(\widehat G)$;
(7) $V_x(f^*) = V_x(f)^*$ for $f \in L^1(\widehat G)$;
(8) $\|V_x(f)\| \le \|f\|_1$ for $f \in L^1(\widehat G)$.

Moreover, the cyclic vector $\varrho_x$ is obtained as the weak limit of the approximate identity: $\varrho_x = \text{w-}\lim_\kappa\, [e_\kappa]_x$, i.e.,

$$([f]_x, \varrho_x)_x = \lim_\kappa\, ([f]_x, [e_\kappa]_x)_x, \qquad f \in L^1(\widehat G).$$
Now let $S$ be a measurable transformation on $(X, \mathfrak{X})$ and let $T$ be a continuous homomorphism on $G$. $T$ induces a continuous homomorphism $\widehat T$ on $\widehat G$ given by

$$(Tt, \chi) = (t, \widehat T\chi), \qquad t \in G,\ \chi \in \widehat G,$$

and $\widehat T$ is associated with an operator $\widetilde T$ on $B(\widehat G)$ such that

$$\widetilde Tf(\chi) = f(\widehat T\chi), \qquad f \in B(\widehat G),\ \chi \in \widehat G.$$

Definition 5. $\phi \in \mathcal{P}$ is said to be stationary if

(p3) $\phi(Sx, \chi) = \phi(x, \widehat T\chi)$ for $x \in X$ and $\chi \in \widehat G$.

$\mathcal{P}_s$ denotes the set of all stationary $\phi \in \mathcal{P}$. $q \in \mathcal{Q}$ is said to be stationary if

(q3) $q(Sx, f) = q(x, \widetilde Tf)$ for $x \in X$ and $f \in L^1(\widehat G)$.

$\mathcal{Q}_s$ denotes the set of all stationary $q \in \mathcal{Q}$.

Proposition 6. Let $\nu \in C$, $\phi_\nu \in \mathcal{P}$ and $q_\nu \in \mathcal{Q}$ be corresponding elements. Then

$$\nu \in C_s \iff \phi_\nu \in \mathcal{P}_s \iff q_\nu \in \mathcal{Q}_s.$$

Proof. That $\nu \in C_s \iff \phi_\nu \in \mathcal{P}_s$ can be proved by the following two-sided implications, using the fact that $\widehat G$ spans a dense subset of $L^1(G, \nu(x, \cdot))$ for each $x \in X$:

$$\nu \in C_s \iff \nu(Sx, C) = \nu(x, T^{-1}C), \qquad x \in X,\ C \in \mathfrak{G}$$
$$\iff \int_G (t, \chi)\, \nu(Sx, dt) = \int_G (t, \chi)\, \nu(x, dT^{-1}t), \qquad x \in X,\ \chi \in \widehat G$$

$$\iff \phi_\nu(Sx, \chi) = \phi_\nu(x, \widehat T\chi), \qquad x \in X,\ \chi \in \widehat G$$

$\iff \phi_\nu \in \mathcal{P}_s$. Similarly, $\phi_\nu \in \mathcal{P}_s \iff q_\nu \in \mathcal{Q}_s$ is derived as follows:

$$\phi_\nu \in \mathcal{P}_s \iff \phi_\nu(Sx, \chi) = \phi_\nu(x, \widehat T\chi), \qquad x \in X,\ \chi \in \widehat G$$

$$\iff \int_{\widehat G} f(\chi)\, \phi_\nu(Sx, \chi)\, \hat m(d\chi) = \int_{\widehat G} f(\chi)\, \phi_\nu(x, \widehat T\chi)\, \hat m(d\chi), \qquad x \in X,\ f \in L^1(\widehat G)$$

$$\iff q_\nu(Sx, f) = q_\nu(x, \widetilde Tf), \qquad x \in X,\ f \in L^1(\widehat G)$$

$\iff q_\nu \in \mathcal{Q}_s$.
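The first equivalence in the proof of Proposition 6 can be watched concretely in a finite model. The sketch below is our illustration (the transformations and all names are assumptions, not the book's): with $X = \mathbb{Z}_m$, $Sx = x+1$, $G = \mathbb{Z}_n$ and $Tt = at$ for a unit $a$ (so that $\widehat T\chi = a\chi$), a channel built to satisfy $\nu(Sx, C) = \nu(x, T^{-1}C)$ is checked to satisfy (p3).

```python
import cmath
import random

n, m, a = 8, 4, 3           # G = Z_8, X = Z_4, T t = a t with a a unit mod n
e = lambda t, x: cmath.exp(2j * cmath.pi * t * x / n)
ainv = pow(a, -1, n)        # T^{-1} t = a^{-1} t  (Python 3.8+ modular inverse)

random.seed(2)
w = [random.random() for _ in range(n)]
p0 = [v / sum(w) for v in w]

# A channel with nu(x, .) = nu(0, T^{-x} .); note ainv^m = 1 mod n, so it is
# consistent with the shift S x = x + 1 on Z_m.
nu = lambda x, t: p0[(pow(ainv, x, n) * t) % n]
S = lambda x: (x + 1) % m

# Stationarity of the channel: nu(Sx, {t}) = nu(x, T^{-1}{t}).
for x in range(m):
    for t in range(n):
        assert abs(nu(S(x), t) - nu(x, (ainv * t) % n)) < 1e-12

# Hence (p3): phi(Sx, chi) = phi(x, T^ chi), where T^ chi = a chi.
phi = lambda x, c: sum(e(t, c) * nu(x, t) for t in range(n))
for x in range(m):
    for c in range(n):
        assert abs(phi(S(x), c) - phi(x, (a * c) % n)) < 1e-9
print("nu stationary  =>  phi_nu stationary")
```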
Remark 7. Let $\phi \in \mathcal{P}_s$ and $q \in \mathcal{Q}_s$. Then $\phi$ is said to be extremal in $\mathcal{P}_s$ mod $P_{se}(X)$, denoted $\phi \in \operatorname{ex} \mathcal{P}_s$ (mod $P_{se}(X)$), if $\phi = a\phi_1 + (1-a)\phi_2$ with $a \in (0,1)$ and $\phi_1, \phi_2 \in \mathcal{P}_s$ implies $\phi(x, \cdot) = \phi_1(x, \cdot) = \phi_2(x, \cdot)$ $P_{se}(X)$-a.e. $x$. Similarly, $q$ is said to be extremal in $\mathcal{Q}_s$ mod $P_{se}(X)$, denoted $q \in \operatorname{ex} \mathcal{Q}_s$ (mod $P_{se}(X)$), if $q = aq_1 + (1-a)q_2$ with $a \in (0,1)$ and $q_1, q_2 \in \mathcal{Q}_s$ implies $q(x, \cdot) = q_1(x, \cdot) = q_2(x, \cdot)$ $P_{se}(X)$-a.e. $x$. Then, for a stationary channel $\nu \in C_s$, consider the following conditions:

(1) $\nu$ is ergodic.
(2) $\nu \in \operatorname{ex} C_s$ (mod $P_{se}(X)$).
(3) $\phi_\nu \in \operatorname{ex} \mathcal{P}_s$ (mod $P_{se}(X)$).
(4) $q_\nu \in \operatorname{ex} \mathcal{Q}_s$ (mod $P_{se}(X)$).

In general, (1) $\iff$ (2) $\Longleftarrow$ (3) $\iff$ (4) holds by Theorem III.4.2, Proposition 2 (1) and Proposition 6. If $G$ is compact and totally disconnected, then (1) $\iff$ (2) $\iff$ (3) $\iff$ (4) holds by Proposition 2 (2).

Take a channel $\nu \in C$ and let $\phi = \phi_\nu \in \mathcal{P}$ and $q = q_\nu \in \mathcal{Q}$ correspond to $\nu$. Let $\{U_x(\cdot), \mathcal{H}_x, \varrho_x\}_{x \in X}$ and $\{V_x(\cdot), \mathcal{H}_x, \varrho_x\}_{x \in X}$ be the families of weakly continuous unitary representations of $\widehat G$ and of bounded $*$-representations of $L^1(\widehat G)$ given by (4.2) and (4.3), respectively (Lemma 3). Choose any input source $\mu \in P(X)$ and define a semiinner product $(\cdot, \cdot)_\mu$ on $L^1(\widehat G)$ by

$$(f, g)_\mu = \int_X ([f]_x, [g]_x)_x\, \mu(dx) = \int_X q(x, f * g^*)\, \mu(dx), \qquad f, g \in L^1(\widehat G),$$

where we use the notations in (4.4), (4.5) and (4.6). Let $N_\mu = \{f \in L^1(\widehat G) : (f, f)_\mu = 0\}$ and let $\mathcal{H}_\mu$ be the completion of the quotient space $L^1(\widehat G)/N_\mu$ w.r.t. $(\cdot, \cdot)_\mu$. Then $\mathcal{H}_\mu$ is a Hilbert space. Denote by $[f]_\mu$ the equivalence class in $\mathcal{H}_\mu$ containing $f \in L^1(\widehat G)$.
Lemma 8. Let $\nu \in C$. For each input source $\mu \in P(X)$ there exist a weakly continuous unitary representation $\{U_\mu(\cdot), \mathcal{H}_\mu, \varrho_\mu\}$ of $\widehat G$ and a bounded $*$-representation $\{V_\mu(\cdot), \mathcal{H}_\mu, \varrho_\mu\}$ of $L^1(\widehat G)$ on the same Hilbert space $\mathcal{H}_\mu$ such that for $g, h \in L^1(\widehat G)$

$$(U_\mu(\chi)[g]_\mu, [h]_\mu)_\mu = \int_X (U_x(\chi)[g]_x, [h]_x)_x\, \mu(dx), \qquad \chi \in \widehat G,$$

$$(V_\mu(f)[g]_\mu, [h]_\mu)_\mu = \int_X (V_x(f)[g]_x, [h]_x)_x\, \mu(dx), \qquad f \in L^1(\widehat G).$$

Proof. Let $\mu \in P(X)$ and define

$$\Phi(\chi) = \int_X \phi(x, \chi)\, \mu(dx), \qquad \chi \in \widehat G.$$

By Lemma III.2.5 we have, for $\chi \in \widehat G$,

$$\Phi(\chi) = \int_X \int_G (t, \chi)\, \nu(x, dt)\, \mu(dx) = \int_G (t, \chi)\, \mu\nu(dt).$$

Since $\mu\nu$ is a Baire probability measure on $G$, $\Phi$ is a continuous positive definite function on $\widehat G$ such that $\Phi(1) = 1$. Hence there exists a positive linear functional $Q$ on $L^1(\widehat G)$ of norm 1 such that

$$Q(f) = \int_{\widehat G} f(\chi)\, \Phi(\chi)\, \hat m(d\chi), \qquad f \in L^1(\widehat G).$$

It follows from Fubini's Theorem that for $f \in L^1(\widehat G)$

$$Q(f) = \int_{\widehat G} f(\chi)\left[\int_X \phi(x, \chi)\, \mu(dx)\right] \hat m(d\chi) = \int_X \int_{\widehat G} f(\chi)\, \phi(x, \chi)\, \hat m(d\chi)\, \mu(dx) = \int_X q(x, f)\, \mu(dx),$$

by (4.1), where $q = q_\nu$. Observe that for $f, g \in L^1(\widehat G)$

$$(f, g)_\mu = \int_X q(x, f * g^*)\, \mu(dx) = Q(f * g^*).$$

Thus there exist a weakly continuous unitary representation $\{U_\mu(\cdot), \mathcal{H}_\mu, \varrho_\mu\}$ of $\widehat G$ and a bounded $*$-representation $\{V_\mu(\cdot), \mathcal{H}_\mu, \varrho_\mu\}$ of $L^1(\widehat G)$ on the Hilbert space $\mathcal{H}_\mu$ such that

$$\Phi(\chi) = (U_\mu(\chi)\varrho_\mu, \varrho_\mu)_\mu, \qquad \chi \in \widehat G,$$
$$Q(f) = (V_\mu(f)\varrho_\mu, \varrho_\mu)_\mu, \qquad f \in L^1(\widehat G).$$

Now the desired equations are easily verified.

Let us now consider the Fourier transform $\mathcal{F}$ on $L^1(\widehat G)$ defined by

$$(\mathcal{F}f)(t) = \int_{\widehat G} (t, \chi)\, f(\chi)\, \hat m(d\chi), \qquad f \in L^1(\widehat G).$$

For $\mu \in P(X)$ let $\mathcal{H}_\mu$ be the Hilbert space obtained in Lemma 8. As is well known (Plancherel's Theorem), the Fourier transform $\mathcal{F}$ is a unitary operator between $L^2(\widehat G)$ and $L^2(G)$ (cf. e.g. Dixmier [1, p. 369]). Similarly, we can prove the following.

Theorem 9. Let $\nu \in C$, and let $\phi_\nu \in \mathcal{P}$ and $q_\nu \in \mathcal{Q}$ correspond to $\nu$. For each $\mu \in P(X)$ let $\mathcal{H}_\mu$ be as in Lemma 8. Then the Fourier transform $\mathcal{F}$ on $L^1(\widehat G)$ can be regarded as a unitary operator from $\mathcal{H}_\mu$ onto $L^2(G, \mu\nu)$.

Proof. Let $\mu \in P(X)$. Using Fubini's Theorem we have for $f \in L^1(\widehat G)$ that

$$\begin{aligned}
([f]_\mu, [f]_\mu)_\mu &= \int_X q(x, f * f^*)\, \mu(dx) \\
&= \int_X \int_{\widehat G} (f * f^*)(\chi)\, \phi_\nu(x, \chi)\, \hat m(d\chi)\, \mu(dx) \\
&= \int_X \int_{\widehat G} \int_G (f * f^*)(\chi)\, (t, \chi)\, \nu(x, dt)\, \hat m(d\chi)\, \mu(dx) \\
&= \int_X \int_G \mathcal{F}(f * f^*)(t)\, \nu(x, dt)\, \mu(dx) \\
&= \int_G \mathcal{F}(f * f^*)(t)\, \mu\nu(dt) \\
&= \int_G |(\mathcal{F}f)(t)|^2\, \mu\nu(dt) = (\mathcal{F}f, \mathcal{F}f)_{2,\mu\nu},
\end{aligned}$$

where $(\cdot, \cdot)_{2,\mu\nu}$ is the inner product in $L^2(G, \mu\nu)$. Since $\mathcal{F}(L^1(\widehat G))$ is dense in $C_0(G)$, which is the Banach space of all $\mathbb{C}$-valued continuous functions vanishing at infinity and is dense in $L^2(G, \mu\nu)$, we conclude that $\mathcal{F}(L^1(\widehat G))$ is dense in $L^2(G, \mu\nu)$. Therefore the Fourier transform $\mathcal{F}$ can be uniquely extended to a unitary operator from $\mathcal{H}_\mu$ onto $L^2(G, \mu\nu)$.

Remark 10. To consider measurability of a channel $\nu \in C(X, G)$, assume that $G$ is compact abelian with normalized Haar measure $m(\cdot)$. The dual group $\widehat G$ is then discrete abelian, so that $L^1(\widehat G) = \ell^1(\widehat G)$. Let $\phi = \phi_\nu \in \mathcal{P}$ correspond to $\nu$. If $\phi(x, \cdot) \in \ell^1(\widehat G)$ for every $x \in X$, then $\nu(x, \cdot) \ll m$ ($x \in X$), i.e., $\nu$ is dominated by $m$, and the function $\nu : X \to M(G)$ is weakly measurable. For, since $\phi(x, \cdot) = \phi_x(\cdot) \in \ell^1(\widehat G)$ for each $x \in X$, it follows from the Fourier inversion formula that

$$\phi(x, \chi) = \int_G (\mathcal{F}\phi_x)(t)\, \overline{(t, \chi)}\, m(dt), \qquad \chi \in \widehat G.$$

On the other hand, we know that

$$\phi(x, \chi) = \int_G (t, \chi)\, \nu(x, dt), \qquad \chi \in \widehat G.$$

Hence $\nu(x, \cdot) \ll m$ ($x \in X$). Therefore the conclusion follows from Theorem 2.3.
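The isometry computed in the proof of Theorem 9 can be confirmed numerically in a finite model. The sketch below is our own illustration (all names are assumptions): $([f]_\mu, [f]_\mu)_\mu$, evaluated through (4.1), agrees with the $L^2(G, \mu\nu)$-norm of $\mathcal{F}f$, where $\mu\nu$ is the output source.

```python
import cmath
import random

n, m = 6, 3  # G = G^ = Z_n, X = {0, ..., m-1}
char = lambda t, x: cmath.exp(2j * cmath.pi * t * x / n)
random.seed(3)

def prob(k):  # a random probability vector of length k
    w = [random.random() for _ in range(k)]
    return [v / sum(w) for v in w]

nu = [prob(n) for _ in range(m)]   # channel: nu[x][t] = nu(x, {t})
mu = prob(m)                       # input source on X
munu = [sum(mu[x] * nu[x][t] for x in range(m)) for t in range(n)]  # output

conv = lambda f, g: [sum(f[y] * g[(x - y) % n] for y in range(n)) for x in range(n)]
invo = lambda f: [f[(-x) % n].conjugate() for x in range(n)]
phi = [[sum(char(t, c) * nu[x][t] for t in range(n)) for c in range(n)]
       for x in range(m)]
q = lambda x, f: sum(f[c] * phi[x][c] for c in range(n))  # (4.1)

f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
lhs = sum(mu[x] * q(x, conv(f, invo(f))) for x in range(m))  # ([f],[f])_mu
Ff = [sum(char(t, c) * f[c] for c in range(n)) for t in range(n)]  # Fourier transf.
rhs = sum(munu[t] * abs(Ff[t]) ** 2 for t in range(n))  # ||Ff||^2 in L2(G, mu nu)
assert abs(lhs - rhs) < 1e-8
print("([f],[f])_mu equals the L2(G, mu nu)-norm of Ff squared")
```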
4.5. Noncommutative channels

In this section, a noncommutative extension of (classical) channels is discussed, based on the one-to-one correspondence between the set of channels and the set of a certain type of operators between function spaces (Theorem III.2.1). We formulate a noncommutative (or quantum mechanical) channel as a certain mapping between the state spaces of two C*-algebras. Stationarity, ergodicity, the KMS condition and weak mixing are introduced, and their properties are examined. As standard textbooks of operator algebra theory we refer to Dixmier [1](1982), Sakai [1](1971) and Takesaki [2](1979).

Let $A$ be a C*-algebra and $\mathfrak{S}(A)$ be its state space; $\mathfrak{S}(A)$ consists of the positive linear functionals on $A$ of norm 1. Consider a strongly continuous one-parameter group $\alpha(G)$ of $*$-automorphisms of $A$ over an LCA group $G$. That is,

i) $\alpha_t$ is a $*$-automorphism of $A$ for $t \in G$;
ii) $\alpha_{t_1}\alpha_{t_2} = \alpha_{t_1 t_2}$ for $t_1, t_2 \in G$;
iii) $\lim_{t \to e} \|\alpha_t a - a\| = 0$ for $a \in A$, $e$ being the unit of $G$.

Then the triple $(A, \mathfrak{S}(A), \alpha(G))$ is called a C*-dynamical system.

Definition 1. Let $G$ be an LCA group and let $(A, \mathfrak{S}(A), \alpha(G))$ and $(B, \mathfrak{S}(B), \beta(G))$ be a pair of C*-dynamical systems. A mapping $\Lambda^* : \mathfrak{S}(A) \to \mathfrak{S}(B)$ is said to be a channel if

(qc1) $\Lambda^*$ is the dual map of a completely positive map $\Lambda : B \to A$.

Here $\Lambda$ is said to be completely positive if, for every $n \ge 1$, it maps each $n \times n$ positive matrix $(b_{ij})$ with entries in $B$ to an $n \times n$ positive matrix $(\Lambda b_{ij})$. Sometimes $\Lambda^*$ is called a quantum channel. $C(A, B)$ denotes the set of all channels from $A$ to $B$.
Example 2. (1) Let $(X, \mathfrak{X})$ and $(Y, \mathfrak{Y})$ be a pair of compact Hausdorff spaces with their Baire $\sigma$-algebras, and let $A = C(X)$ and $B = C(Y)$ be the Banach spaces of $\mathbb{C}$-valued continuous functions on $X$ and $Y$, respectively. Then $A$ and $B$ are commutative C*-algebras with $\mathfrak{S}(A) = [C(X)^*]_1^+ = M_1^+(X) = P(X)$ and $\mathfrak{S}(B) = P(Y)$. If $S : X \to X$ and $T : Y \to Y$ are invertible measurable transformations, then $\alpha_n = \mathbf{S}^n$ and $\beta_n = \mathbf{T}^n$ ($n \in \mathbb{Z}$) define one-parameter groups of $*$-automorphisms on $A$ and $B$ over $\mathbb{Z}$, respectively, where $\mathbf{S}$ and $\mathbf{T}$ are the operators induced by $S$ and $T$. Hence $(C(X), P(X), \mathbf{S}(\mathbb{Z}))$ and $(C(Y), P(Y), \mathbf{T}(\mathbb{Z}))$ are C*-dynamical systems. Let $\Lambda : C(Y) \to C(X)$ be a positive linear operator with $\Lambda 1 = 1$; then the dual map $\Lambda^* : P(X) \to P(Y)$ is a channel, where $1$ is the identity function. In fact, the following statement is true by the proof of Theorem III.2.1: if $\Lambda : C(Y) \to C(X)$ is a positive linear operator with $\Lambda 1 = 1$, then $\Lambda$ has a unique extension $\Lambda_1 : B(Y) \to B(X)$ such that $B(Y) \ni b_n \downarrow 0$ implies $B(X) \ni \Lambda_1 b_n \downarrow 0$. Thus, since $\Lambda_1$ induces a unique channel, such a $\Lambda$ defines a continuous channel in view of Proposition III.2.11.

(2) In (1), let $B(X)$ and $B(Y)$ be the Banach spaces of all bounded Baire functions on $X$ and $Y$, respectively. In this case $B(X)$ and $B(Y)$ are also C*-algebras, and $P(X) \subseteq \mathfrak{S}(B(X))$ and $P(Y) \subseteq \mathfrak{S}(B(Y))$. Let $\Lambda : B(Y) \to B(X)$ be a positive linear operator with $\Lambda 1 = 1$ such that $b_n \downarrow 0 \Rightarrow \Lambda b_n \downarrow 0$ (cf. (k1) and (k2) in Section 3.2). Then $\Lambda^* : \mathfrak{S}(B(X)) \to \mathfrak{S}(B(Y))$ is a channel. In view of Theorem III.2.1, Definition 1 is a noncommutative extension of (classical) channels. Note that $B(X)$ is a $\Sigma^*$-algebra, i.e., a C*-algebra with a $\sigma$-convergence; here the $\sigma$-convergence $a_n \xrightarrow{\sigma} a$ is given by $\sup_{n \ge 1} \|a_n\| < \infty$ and $a_n \to a$ pointwise. Also note that $B(X)$ is the smallest $\Sigma^*$-algebra containing the C*-algebra $C(X)$, denoted $B(X) = C(X)^\sigma$ and called the $\sigma$-envelope of $C(X)$. Therefore, as a direct noncommutative extension of the channel operators defined in Section 3.2, we can formulate a noncommutative channel operator as a positive linear operator $\Lambda$ between two $\Sigma^*$-algebras with $\Lambda 1 = 1$ and such that $b_n \xrightarrow{\sigma} 0$ implies $\Lambda b_n \xrightarrow{\sigma} 0$.

(3) Let $\mathcal{H}$ be a complex separable Hilbert space and $B(\mathcal{H})$ the algebra of all bounded linear operators on $\mathcal{H}$. $T(\mathcal{H})$ denotes the Banach space of trace class operators on $\mathcal{H}$, so that $T(\mathcal{H})^* = B(\mathcal{H})$ holds. Hence

$$\mathfrak{S}(B(\mathcal{H})) = T_1^+(\mathcal{H}) = \{\rho \in T(\mathcal{H}) : \rho \ge 0,\ \operatorname{tr}\rho = 1\},$$

where $\operatorname{tr}(\cdot)$ is the trace. Let $\{Q_n\}_{n=1}^\infty$ be a set of mutually orthogonal projections on $\mathcal{H}$ such that $\sum_{n=1}^\infty Q_n = 1$, i.e., a resolution of the identity, and define $\Lambda^*$ by

$$\Lambda^*\rho = \sum_{n=1}^{\infty} Q_n \rho\, Q_n, \qquad \rho \in T_1^+(\mathcal{H}),$$

which is called a quantum measurement. Then $\Lambda^*$ is a channel in $C(B(\mathcal{H}), B(\mathcal{H}))$.

(4) Let $B$ be a C*-algebra and let $A$ be a closed $*$-subalgebra of $B$. If $\Lambda : B \to A$ is a projection of norm 1, then the dual map $\Lambda^* : \mathfrak{S}(A) \to \mathfrak{S}(B)$ is a channel.

(5) Let $(X, \mathfrak{X})$ and $C(X)$ be as in (1) and let $(A, \mathfrak{S}(A), \alpha)$ be a C*-dynamical system. If $\omega : X \to \mathfrak{S}(A)$ is an $\mathfrak{X}$-measurable mapping, then

$$\Lambda^*\mu = \int_X \omega(x)\, \mu(dx), \qquad \mu \in P(X)$$

defines a channel $\Lambda^* : P(X) \to \mathfrak{S}(A)$, called a classical-quantum channel. If $\Xi : X \to A^+ = \{a \in A : a \ge 0\}$ is an $A^+$-valued c.a. measure, then

$$\Lambda^*\varphi(\cdot) = \varphi(\Xi(\cdot))$$

defines a channel, called a quantum-classical channel, where $\Lambda : C(X) \to A$ and $\Xi$ satisfy

$$\Lambda(f) = \int_X f\, d\Xi, \qquad f \in C(X),$$
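The quantum measurement of Example 2 (3) can be realized concretely in finite dimensions. The following sketch is ours, not the book's (plain nested lists as matrices; every name is an assumption): it applies $\Lambda^*\rho = \sum_n Q_n\rho Q_n$ for a two-element resolution of the identity and checks that the image is again a state, i.e., positive with trace one.

```python
import random

d = 4
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def diag_proj(idx):  # orthogonal projection onto the given coordinates
    return [[1.0 if (i == j and i in idx) else 0.0 for j in range(d)]
            for i in range(d)]

Q = [diag_proj({0, 1}), diag_proj({2, 3})]  # Q1 + Q2 = 1, Q1 Q2 = 0

random.seed(4)  # rho = |v><v| for a random unit vector v (a pure state)
v = [random.gauss(0, 1) for _ in range(d)]
nrm = sum(c * c for c in v) ** 0.5
v = [c / nrm for c in v]
rho = [[v[i] * v[j] for j in range(d)] for i in range(d)]

# The quantum measurement channel: Lambda* rho = sum_n Q_n rho Q_n.
terms = [matmul(matmul(Qn, rho), Qn) for Qn in Q]
out = [[sum(T[i][j] for T in terms) for j in range(d)] for i in range(d)]

trace = sum(out[i][i] for i in range(d))
assert abs(trace - 1.0) < 1e-12          # the trace is preserved
for _ in range(50):                      # positivity: <w, (Lambda* rho) w> >= 0
    w = [random.gauss(0, 1) for _ in range(d)]
    val = sum(w[i] * out[i][j] * w[j] for i in range(d) for j in range(d))
    assert val >= -1e-12
print("Lambda* rho is again a state")
```

Note how the off-diagonal blocks of $\rho$ are killed, which is the decohering effect of the measurement.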
which is a Riesz-type theorem for vector measures.

Let us fix two C*-dynamical systems $(A, \mathfrak{S}(A), \alpha(\mathbb{R}))$ and $(B, \mathfrak{S}(B), \beta(\mathbb{R}))$, where we assume that $A$ and $B$ have the identity $1$ and $G = \mathbb{R}$, the real line.

Definition 3. Consider a C*-dynamical system $(A, \mathfrak{S}(A), \alpha(\mathbb{R}))$. $I(\alpha) = I(\alpha, A)$ denotes the set of all $\alpha$-invariant states, i.e.,

$$I(\alpha) = \{\rho \in \mathfrak{S}(A) : \rho(\alpha_t a) = \rho(a),\ a \in A,\ t \in \mathbb{R}\}.$$

A state $\rho \in \mathfrak{S}(A)$ is said to be $\alpha$-KMS if, for any $a, b \in A$, there corresponds a $\mathbb{C}$-valued function $f_{a,b}$ on $\mathbb{C}$ such that

(i) $f_{a,b}$ is analytic on $D = \{z \in \mathbb{C} : 0 < \operatorname{Im} z < 1\}$, where $\operatorname{Im}\{\cdot\}$ indicates the imaginary part;
(ii) $f_{a,b}$ is bounded and continuous on $\overline{D}$;
(iii) $f_{a,b}(t) = \rho((\alpha_t a)b)$ and $f_{a,b}(t + i) = \rho(b(\alpha_t a))$ for $t \in \mathbb{R}$.

$K(\alpha) = K(\alpha, A)$ denotes the set of all $\alpha$-KMS states. A state $\rho$ is said to be weakly mixing (WM) if

$$\rho(\bar\alpha(a)b) = \rho(\bar\alpha(a))\rho(b), \qquad a, b \in A,$$

where

$$\bar\alpha(a) = \lim_{t \to \infty} \frac{1}{t} \int_0^t \alpha_s(a)\, ds, \qquad a \in A.$$
$WM(\alpha) = WM(\alpha, A)$ denotes the set of all WM states.

Using the above definitions we can introduce stationarity, ergodicity, the KMS condition and the WM condition for quantum channels.

Definition 4. Let $\Lambda^* : \mathfrak{S}(A) \to \mathfrak{S}(B)$ be a channel. $\Lambda^*$ is said to be stationary if

(qc2) $\Lambda \circ \beta_t = \alpha_t \circ \Lambda$ for $t \in \mathbb{R}$.

$C_s(A, B)$ denotes the set of all stationary channels in $C(A, B)$. A stationary channel $\Lambda^*$ is said to be ergodic if

(qc3) $\Lambda^*(\operatorname{ex} I(\alpha)) \subseteq \operatorname{ex} I(\beta)$,

where $\operatorname{ex}\{\cdot\}$ denotes the set of all extreme points of the set $\{\cdot\}$. $C_{se}(A, B)$ denotes the set of all stationary ergodic channels. A stationary channel $\Lambda^*$ is said to be KMS if

(qc4) $\Lambda^*(K(\alpha)) \subseteq K(\beta)$.

$C_k(A, B)$ denotes the set of all KMS channels. A channel $\Lambda^*$ is said to be weakly mixing (WM) if

(qc5) $\Lambda^*(WM(\alpha)) \subseteq WM(\beta)$.

$C_{wm}(A, B)$ denotes the set of all WM channels.

As is easily seen, the condition (qc2) corresponds to the condition (k3) ($\mathbf{K}\mathbf{T} = \mathbf{S}\mathbf{K}$). In a classical channel setting, $\operatorname{ex} P_s(X) = P_{se}(X)$ and $\operatorname{ex} P_s(Y) = P_{se}(Y)$, and a stationary ergodic channel $\nu$ transforms ergodic input sources into ergodic output sources, i.e., $\mathbf{K}_\nu^*(\operatorname{ex} P_s(X)) \subseteq \operatorname{ex} P_s(Y)$. Hence (qc3) is a natural extension of the ergodicity of classical channels.

Example 5. Let $A \otimes B$ be the C*-tensor product of $A$ and $B$, and let $j : B \to A \otimes B$ be the natural embedding, i.e., $j(b) = 1 \otimes b$ for $b \in B$.

(1) Take a state $\psi \in \mathfrak{S}(B)$ and define $\Lambda : B \to A$ by $\Lambda = (1 \otimes \psi) \circ j$, where $1 \otimes \psi$ denotes the slice map $a \otimes b \mapsto \psi(b)a$, so that $\Lambda(b) = \psi(b)1$. Then it is easily seen that $\Lambda^*\varphi = \psi$ for every $\varphi \in \mathfrak{S}(A)$.

(2) Let $\sigma$ be a stationary $*$-automorphism of $A \otimes B$. If $\psi \in WM(\beta)$, then the dual map $\Lambda^*$ of $\Lambda = (1 \otimes \psi) \circ \sigma \circ j$ is WM. For, let $\varphi \in WM(\alpha)$. Then

$$\Lambda^*\varphi(\bar\beta(b_1)b_2) = \varphi \otimes \psi\big(\sigma(1 \otimes \bar\beta(b_1))\,\sigma(1 \otimes b_2)\big) = \varphi \otimes \psi\big(\overline{\alpha \otimes \beta}(\sigma(1 \otimes b_1))\,\sigma(1 \otimes b_2)\big).$$

Now for $a_1, a_2 \in A$ and $b_1, b_2 \in B$ it holds that

$$\varphi \otimes \psi\big(\overline{\alpha \otimes \beta}(a_1 \otimes b_1)\, a_2 \otimes b_2\big) = \varphi(\bar\alpha(a_1))\,\varphi(a_2)\,\psi(\bar\beta(b_1))\,\psi(b_2),$$

since $\varphi$ and $\psi$ are WM, so that

$$\Lambda^*\varphi(\bar\beta(b_1)b_2) = \varphi \otimes \psi\big(\overline{\alpha \otimes \beta}(\sigma(1 \otimes b_1))\big)\, \varphi \otimes \psi\big(\sigma(1 \otimes b_2)\big) = \Lambda^*\varphi(\bar\beta(b_1))\,\Lambda^*\varphi(b_2),$$

since the algebraic tensor product $A \odot B$ is dense in $A \otimes B$. This means $\Lambda^*\varphi \in WM(\beta)$. Therefore $\Lambda^*$ is WM.

(3) In (2), if $\psi \in K(\beta)$, then $\Lambda^* = [(1 \otimes \psi) \circ \sigma \circ j]^*$ is KMS. For, let $\varphi \in K(\alpha)$. Then, for $Q, R \in A \otimes B$, there is a function $f_{Q,R}$ such that

$$f_{Q,R}(t) = \varphi \otimes \psi\big((\alpha_t \otimes \beta_t(Q))R\big), \qquad t \in \mathbb{R},$$

$$f_{Q,R}(t + i) = \varphi \otimes \psi\big(R(\alpha_t \otimes \beta_t(Q))\big), \qquad t \in \mathbb{R}.$$

Letting $Q = \sigma(1 \otimes b_1)$ and $R = \sigma(1 \otimes b_2)$, we obtain

$$f_{Q,R}(t) = \tilde f_{b_1,b_2}(t) = \Lambda^*\varphi(\beta_t(b_1)b_2), \qquad f_{Q,R}(t + i) = \tilde f_{b_1,b_2}(t + i) = \Lambda^*\varphi(b_2\beta_t(b_1)).$$

Hence $\Lambda^*\varphi \in K(\beta)$, and therefore $\Lambda^*$ is KMS.

For stationary channels $\Lambda^*, \Lambda_1^*, \Lambda_2^* \in C_s(A, B)$ and $\lambda \in (0,1)$, write $\Lambda^* = \lambda\Lambda_1^* + (1 - \lambda)\Lambda_2^*$ (mod $I(\alpha)$) if

$$\Lambda^*\varphi(b) = \lambda\Lambda_1^*\varphi(b) + (1 - \lambda)\Lambda_2^*\varphi(b) \qquad \text{for } b \in B \text{ and } \varphi \in \operatorname{ex} I(\alpha).$$

A stationary channel $\Lambda^*$ is said to be extremal in $C_s(A, B)$ (mod $I(\alpha)$) if such a decomposition implies $\Lambda_1^* = \Lambda_2^*$ (mod $I(\alpha)$). Then we have the following: if $A = B(\mathcal{H})$ for a Hilbert space $\mathcal{H}$, $B$ is simple, and a stationary channel $\Lambda^* \in C_s(A, B)$ is extremal (mod $I(\alpha)$), then $\Lambda^*$ is ergodic.
To see this, suppose that such a $\Lambda^*$ is not ergodic, so that $\psi = \Lambda^*\varphi \notin \operatorname{ex} I(\beta)$ for some $\varphi \in \operatorname{ex} I(\alpha)$. Hence we can find $\psi_1, \psi_2 \in I(\beta)$ with $\psi_1 \ne \psi_2$ and a $\lambda \in (0,1)$ such that $\psi = \lambda\psi_1 + (1 - \lambda)\psi_2$. Since $\psi_1$ and $\psi_2$ are dominated by $\psi$, there exist a GNS representation $\{H_\psi, \pi_\psi, x_\psi, u_\psi\}$ of $B$ and operators $Q_1^\psi, Q_2^\psi \in \pi_\psi(B)' \cap u_\psi(\mathbb{R})'$ such that

(i) $Q_k^\psi \ge 0$;
(ii) $(x_\psi, Q_k^\psi x_\psi) = 1$;
(iii) $\psi_k(b) = (x_\psi, Q_k^\psi \pi_\psi(b) x_\psi)$ ($b \in B$),

for $k = 1, 2$, where $u_\psi(\mathbb{R})$ is a one-parameter group of unitary operators on $H_\psi$, and $\pi_\psi(B)'$ and $u_\psi(\mathbb{R})'$ are the commutants in $B(H_\psi)$. Since $B$ is simple, $\pi_\psi$ is faithful, i.e., $\pi_\psi(b) = 0 \iff b = 0$. Let $\Theta_\psi = \Lambda \circ \pi_\psi^{-1} : \pi_\psi(B) \to A$, which is completely positive because $\Lambda : B \to A$ is so. Since $A = B(\mathcal{H})$, $\Theta_\psi$ can be extended to a completely positive map $\bar\Theta_\psi : B(H_\psi) \to A$. Hence we have that

$$\Lambda^*\varphi = (\bar\Theta_\psi \circ \pi_\psi)^*\varphi, \qquad \varphi \in \mathfrak{S}(A) = T_1^+(\mathcal{H}),$$

$$\psi_k(b) = \bar\Theta_\psi^*\varphi\big(Q_k^\psi \pi_\psi(b)\big), \qquad b \in B,\ k = 1, 2.$$

Let $\Lambda_k^\psi = \bar\Theta_\psi \circ Q_k^\psi \circ \pi_\psi : B \to A$, where $Q_k^\psi$ also stands for multiplication by $Q_k^\psi$; each $\Lambda_k^\psi$ is completely positive ($k = 1, 2$). For $k = 1, 2$ consider the map

$$\bigoplus_{\varphi \in \mathfrak{S}(A)} \Lambda_k^\psi : B \to \bigoplus_{\varphi \in \mathfrak{S}(A)} A_\varphi,$$

where $\psi = \Lambda^*\varphi$ if $\psi \in I(\beta)$, $\Lambda_k^\psi = \Lambda$ otherwise, and $A_\varphi = A$ for $\varphi \in \mathfrak{S}(A)$. It is easily seen that this map is completely positive. Define a projection $E : \bigoplus_{\varphi \in \mathfrak{S}(A)} A_\varphi \to A$ by

$$\varphi\Big(E\Big(\bigoplus_\varphi \Lambda_k^\psi(b)\Big)\Big) = \varphi\big(\Lambda_k^\psi(b)\big), \qquad b \in B,\ \varphi \in \mathfrak{S}(A),\ k = 1, 2.$$

Let $\Lambda_k = E \circ \bigoplus_\varphi \Lambda_k^\psi$ ($k = 1, 2$). Then $\Lambda_k : B \to A$ is completely positive, $\Lambda_k 1 = 1$, and it is easily verified that $\Lambda_k \circ \beta_t = \alpha_t \circ \Lambda_k$ for $t \in \mathbb{R}$ and $k = 1, 2$, so that $\Lambda_k^* \in C_s(A, B)$ for $k = 1, 2$. Moreover, $\Lambda_1^* \ne \Lambda_2^*$ and $\Lambda^* = \lambda\Lambda_1^* + (1 - \lambda)\Lambda_2^*$ (mod $I(\alpha)$), which contradicts the assumed extremality.

We now consider interrelations between KMS channels, WM channels and ergodic channels. $\tilde B$ denotes the set of all $\beta$-analytic elements of $B$, i.e., $b \in \tilde B$ iff there exists a $B$-valued entire function $A(t)$ such that $A(t) = \beta_t(b)$ for $t \in \mathbb{R}$. $(A, \mathfrak{S}(A), \alpha)$ is said to be $\alpha$-abelian if

$$\lim_{t \to \infty} \frac{1}{t} \int_0^t \varphi\big(c^*[\alpha_s(a), b]c\big)\, ds = 0, \qquad a, b, c \in A,\ \varphi \in I(\alpha),$$
where $[a, b] = ab - ba$ for $a, b \in A$. Then we can state the following.

Theorem 9. Let $\Lambda^* \in C_s(A, B)$ be a stationary channel. Then:

(1) $\Lambda^*$ is WM iff

$$\Lambda(\bar\beta(b_1)b_2) = \Lambda(\bar\beta(b_1))\Lambda(b_2) \pmod{WM(\alpha)}, \qquad b_1, b_2 \in B. \tag{5.1}$$

(2) $\Lambda^*$ is KMS iff

$$\Lambda(\beta_t(b_1)b_2) = \Lambda(b_2\beta_{t-i}(b_1)) \pmod{K(\alpha)}, \qquad b_1 \in \tilde B,\ b_2 \in B,\ t \in \mathbb{R}.$$

(3) If $(A, \mathfrak{S}(A), \alpha)$ is $\alpha$-abelian and $\Lambda$ is a $*$-homomorphism, then $\Lambda^*$ is KMS iff $\Lambda^*$ is ergodic.

Proof. (1) Suppose $\Lambda^*$ is WM. Then we have, for $b_1, b_2 \in B$ and $\varphi \in WM(\alpha)$,

$$\begin{aligned}
\Lambda^*\varphi(\bar\beta(b_1)b_2) &= \Lambda^*\varphi(\bar\beta(b_1))\,\Lambda^*\varphi(b_2), && \text{since } \Lambda^*\varphi \in WM(\beta), \\
&= \varphi\big(\Lambda(\bar\beta(b_1))\big)\,\varphi\big(\Lambda(b_2)\big) = \varphi\big(\bar\alpha(\Lambda b_1)\big)\,\varphi\big(\Lambda(b_2)\big), && \text{since } \Lambda^* \text{ is stationary}, \\
&= \varphi\big(\bar\alpha(\Lambda b_1)\Lambda(b_2)\big), && \text{since } \varphi \in WM(\alpha), \\
&= \varphi\big(\Lambda(\bar\beta b_1)\Lambda(b_2)\big), && \text{since } \Lambda^* \text{ is stationary}.
\end{aligned}$$

Thus (5.1) holds. The converse implication can be verified in the same fashion. (2) and (3) are easily verified.
As in the commutative case, we want to consider compound states. Let $\Lambda^* : \mathfrak{S}(A) \to \mathfrak{S}(B)$ be a channel and take any $\varphi \in \mathfrak{S}(A)$. A compound state of $\varphi$ and $\Lambda^*\varphi$ is a state $\Phi$ on $A \otimes B$ whose marginals are $\varphi$ and $\Lambda^*\varphi$, i.e., $\Phi(a \otimes 1) = \varphi(a)$ ($a \in A$) and $\Phi(1 \otimes b) = \Lambda^*\varphi(b)$ ($b \in B$). If $\Phi_0 = \varphi \otimes \Lambda^*\varphi$, then $\Phi_0$ is a trivial compound state. This is not useful, since it does not give any interrelationship between $\varphi$ and $\Lambda^*\varphi$. Instead, for a measure $\mu$ representing a decomposition $\varphi = \int_S \omega\, \mu(d\omega)$ over a subset $S \subseteq \mathfrak{S}(A)$, consider

$$\Phi_{\mu,S} = \int_S (\omega \otimes \Lambda^*\omega)\, \mu(d\omega),$$

which depends on the choice of $S$ and $\mu$; $\mu$ is not necessarily unique.

Remark 10. (1) $\Phi_{\mu,S}$ is an extension of the classical compound source. In fact, let $A = C(X)$, $B = C(Y)$ and $S = P(X)$, and let $\nu \in C(X, Y)$ be such that $\mathbf{K}_\nu^* = \Lambda^*$. Then each $\mu$ has a unique decomposition

$$\mu = \int_X \delta_x\, \mu(dx),$$

where $\delta_x$ is the Dirac measure at $x \in X$. Hence we have that for $A \in \mathfrak{X}$ and $C \in \mathfrak{Y}$

$$\Phi_{\mu,S}(A \times C) = \int_X \delta_x(A)\, \Lambda^*\delta_x(C)\, \mu(dx) = \int_X 1_A(x)\, \nu(x, C)\, \mu(dx) = \int_A \nu(x, C)\, \mu(dx) = \mu \otimes \nu(A \times C).$$

(2) If $A = B(\mathcal{H}_1)$, $B = B(\mathcal{H}_2)$, $\Lambda^* : \mathfrak{S}(A) \to \mathfrak{S}(B)$ is a channel, and $\rho \in \mathfrak{S}(A) = T_1^+(\mathcal{H}_1)$ is written in the Schatten decomposition form

$$\rho = \sum_{n=1}^{\infty} \lambda_n \rho_n, \tag{5.2}$$

then

$$\Phi = \sum_{n=1}^{\infty} \lambda_n\, \rho_n \otimes \Lambda^*\rho_n$$

is a compound state, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ and the $\rho_n$ are mutually orthogonal one-dimensional projections (cf. Schatten [2]).

A quantum entropy and a relative entropy are defined, and entropy transmission through a quantum channel can be discussed. A brief description is given in the Bibliographical notes.
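Remark 10 (2) can be illustrated for commuting (diagonal) density operators, where states reduce to probability vectors. The following sketch is ours (the kernel `K` and the other names are assumptions, not the book's): the compound state $\Phi = \sum_n \lambda_n\, \rho_n \otimes \Lambda^*\rho_n$ then has the required marginals $\rho$ and $\Lambda^*\rho$.

```python
import random

random.seed(5)
d = 3
def prob(k):
    w = [random.random() for _ in range(k)]
    return [v / sum(w) for v in w]

lam = prob(d)  # Schatten coefficients: eigenvalues of a diagonal rho
# For diagonal rho, rho_n = |n><n|; model all states by their diagonals only.
K = [prob(d) for _ in range(d)]  # K[n] = diagonal of Lambda* rho_n

# Compound state on the product: Phi[n][m] = lam_n * K[n][m].
Phi = [[lam[n] * K[n][m] for m in range(d)] for n in range(d)]

left = [sum(Phi[n][m] for m in range(d)) for n in range(d)]   # marginal on A
right = [sum(Phi[n][m] for n in range(d)) for m in range(d)]  # marginal on B
out = [sum(lam[n] * K[n][m] for n in range(d)) for m in range(d)]
assert all(abs(left[n] - lam[n]) < 1e-12 for n in range(d))   # recovers rho
assert all(abs(right[m] - out[m]) < 1e-12 for m in range(d))  # recovers Lambda* rho
print("Phi has marginals rho and Lambda* rho")
```

Unlike the product $\rho \otimes \Lambda^*\rho$, this $\Phi$ correlates the two marginals, which is exactly what the classical compound source $\mu \otimes \nu$ does in Remark 10 (1).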
Bibliographical notes

4.1. Channels with a noise source. The content of this section is taken from Nakamura [3](1975).

4.2. Measurability of a channel. The content of this section is taken from Umegaki [7](1974). Remark 2.6, which was used in Section 3.5, is added. The role of dominated channels is clarified in connection with weak and strong measurability.

4.3. Approximation of channels. The metric $\rho(\cdot, \cdot)$ was introduced in Jacobs [1](1959) and Ding and Yi [1](1965). Lemma 3.2 is in Kakihara [3](1985). Lemma 3.3 and Proposition 3.4 are noted here. Theorems 3.6 and 3.7 are obtained in Umegaki [7]. Another type of distance was considered by Neuhoff and Shields [1](1982) using the $\bar d$-distance. Kieffer [4](1983) considered various topologies on $C(X, Y)$ and showed that the class of weakly continuous channels is dense in some types of channels under these topologies.

4.4. Harmonic analysis for channels. The idea of this section is taken from Kakihara [1, 2](1976, 1981) and [3], and Kakihara and Umegaki [1](1979).

4.5. Noncommutative channels. Echigo (Choda) and Nakamura [1](1962) and Choda and Nakamura [1, 2](1970, 1971) considered a noncommutative extension of classical channels. On the other hand, quantum channels were defined and studied by several authors, such as Davies [1](1977), Holevo [1](1977), Ingarden [1](1976) and Takahashi [1](1966). The present style of formulation of a (quantum) channel is obtained by Ohya [1](1981). A $\Sigma^*$-algebra formulation of a channel was considered by Ozawa [1](1977) (see also [2](1980)), which is a direct generalization of the classical channel, as was seen in Example 5.2 (2). Definition 5.4 through Remark 5.10 are due to Ohya [1], [2](1983).

von Neumann [2](1932) introduced an entropy for a state $\rho \in T_1^+(\mathcal{H})$, defined by

$$H(\rho) = -\operatorname{tr} \rho\log\rho = -\sum_{n=1}^{\infty} (\phi_n, \rho\log\rho\, \phi_n),$$

where $\{\phi_n\}$ is any CONS in $\mathcal{H}$; $H(\rho)$ is independent of the choice of the CONS $\{\phi_n\}$ and is called the von Neumann entropy of $\rho$. Since $\rho$ can be written as in (5.2), if we choose a CONS containing all the eigenvectors of $\rho$, then we see that

$$H(\rho) = -\sum_{n=1}^{\infty} \lambda_n \log \lambda_n.$$

Thus the von Neumann entropy can be viewed as a generalization of the Shannon entropy, since $\{\lambda_n\}$ is an infinite probability distribution, i.e., $\lambda_n \ge 0$ ($n \ge 1$) and $\sum_{n=1}^{\infty} \lambda_n = 1$. Basic properties of $H(\rho)$ are collected in the following:

(1) (Positivity) $H(\rho) \ge 0$ for $\rho \in T_1^+(\mathcal{H})$.
(2) (Concavity) $H(\lambda\rho_1 + (1-\lambda)\rho_2) \ge \lambda H(\rho_1) + (1-\lambda)H(\rho_2)$ for $\lambda \in [0,1]$ and $\rho_1, \rho_2 \in T_1^+(\mathcal{H})$.
(3) (Lower semicontinuity) $\rho_n \to \rho$ (weakly) $\Rightarrow$ $H(\rho) \le \liminf_{n \to \infty} H(\rho_n)$.

These properties, together with some other properties of the von Neumann entropy, were obtained by Lieb and Ruskai [1](1973), Lieb [1](1973) and Lindblad [1, 2, 3, 4](1972-1975). The von Neumann entropy was extended to a C*-dynamical setting by Ohya [2, 3]. Another type of entropy was introduced by Segal [1](1960), now known as the Segal entropy, which was developed by Ruskai [1](1973) and Ochs and Spohn [1](1978). Umegaki [1](1962) introduced what is now called the Umegaki relative entropy in a semifinite and $\sigma$-finite von Neumann algebra setting:

$$H(\rho|\sigma) = \begin{cases} \operatorname{tr} \rho(\log\rho - \log\sigma) & \text{if } s(\rho) \le s(\sigma), \\ \infty & \text{otherwise}, \end{cases}$$

where $s(\rho)$ is the support of $\rho$. Araki [1, 2](1976, 1977) extended it to the case where $A$ is any von Neumann algebra, based on the Tomita-Takesaki theory (cf. Takesaki [1](1970)). Shortly afterwards, Uhlmann [1](1977) defined the relative entropy for a pair of positive linear functionals on a $*$-algebra, which is a generalization of Araki's definition and is in the framework of interpolation theory. Ohya [2], [3](1984) defined the relative entropy in a C*-algebra setting and applied it to entropy transmission in quantum channels (see also Ohya [4](1985)). Yuen and Ozawa [1](1993) showed the ultimate upper bound for information carried through a quantum channel. A related topic can be seen in Sugawara [1](1986).
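For a diagonal density matrix the von Neumann entropy reduces to the Shannon entropy of the eigenvalue list, exactly as stated above. The small sketch below is ours (the numbers are arbitrary): it computes $H(\rho) = -\operatorname{tr}\rho\log\rho$ from the eigenvalues and spot-checks concavity for commuting (diagonal) states.

```python
import math

# A diagonal density matrix: rho = diag(lam), lam a probability vector.
lam = [0.5, 0.25, 0.125, 0.125]
assert abs(sum(lam) - 1) < 1e-12

def vn_entropy(diag):
    # H(rho) = -tr rho log rho; for diagonal rho, rho log rho is diagonal too,
    # so this is the Shannon entropy of the eigenvalue distribution (in nats).
    return -sum(x * math.log(x) for x in diag if x > 0)

H = vn_entropy(lam)
assert abs(H - 1.75 * math.log(2)) < 1e-12  # 0.5*1 + 0.25*2 + 2*0.125*3 bits

# Concavity (property (2)), spot-checked for commuting states:
p1, p2, t = [1.0, 0.0, 0.0, 0.0], [0.25] * 4, 0.3
mix = [t * a + (1 - t) * b for a, b in zip(p1, p2)]
assert vn_entropy(mix) >= t * vn_entropy(p1) + (1 - t) * vn_entropy(p2)
print("H(rho) in bits:", H / math.log(2))  # prints 1.75
```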
REFERENCES
R. L. Adler [1] Ergodic and mixing properties of infinite memory channels. Proc. Amer. Math. Soc. 12 (1961), 924-930. R. Ahlswede and I. Csiszar [1] Hypothesis testing with communication constraints. IEEE Trans. Inform. The ory IT-32 (1986), 533-542. M. A. Akcoglu [1] A pointwise ergodic theorem in L p -spaces. Canad. J. Math. 27 (1975), 10751082. R Algoet and T. Cover [1] A sandwich proof of the Shannon-McMillan-Breiman theorem. Ann. Prohah. 16 (1988), 899-909. H. Araki [1] Relative entropy of states of von Neumann algebras. Publ. RIMS Kyoto Univ. 11 (1976), 809-833. [2] Relative entropy of states of von Neumann algebras II. Publ. RIMS Kyoto Univ. 13 (1977), 173-192. S. Arimoto [1] An algorithm for computing the capacity of arbitrary discrete memoryless channls. IEEE Trans. Inform. Theory IT-18 (1972), 14-20. R. Ash [1] Information Theory. Interscience, New York, 1965. [2] Real Analysis and Probability. Academic Press, New York, 1972. R. R. Bahadur [1] Sufficiency and statistical decision functions. Ann. Math. Statist. 25 (1954), 423-462. 0 . Baradorff-Nielsen [1] Subfields and loss of information. Z. Wahrscheinlichkeitstheorie 2 (1964), 369379. A. R. Barron [1] The strong ergodic theorem for densities: Generalized Shannon-McMillanBreiman theorem. Ann. Probab. 13 (1985), 1292-1303. 225
226
References
P. Billingsley [1] Ergodic Theory and Information. Wiley, New York, 1965. G. D. Birkhoff [1] Proof of ergodic theorem. Proc. Nat. Acad. Sci. U.S.A. 17 (1931), 656-660. R. E. Blahut [1] Computation of channel capacity and rate-distortion functions. IEEE Trans. Inform. Theory IT-18 (1972), 460-473. [2] Hypothesis testing and information theory. IEEE Trans. Inform. Theory IT20 (1974), 405-417. J. R. Blum and D. L. Hanson [1] On invariant probability measures. Pacific J. Math. 10 (1960), 1125-1129. L. Boltzman [1] Weitere Studien iiber das Warmegleichgewicht unter Gasmolekulen. Wiener Berichte 63 (1872), 275-370. [2] Uber die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Warme theorie und der Warhscheinhchkeitsrechung respektive den Satzen iiber das Warmegleichgewicht. Wiener Berichte 76 (1877), 373-435. L. Breiman [1] The individual ergodic theorem of information theory. Ann. Math. Statist. 28 (1957), 809-811: Correction, ibid. 31 (1960), 809-810. [2] On achieving channel capacity in finite-memory channels. III. J. Math. 4 (1960), 246-252. J. R. Brown [1] Ergodic Theory and Topological Dynamics. Academic Press, New York, 1976. L. Carleson [1] Two remarks on the basic theorems of information theory. Math. Scand. 6 (1958), 175-180. H. Chernoff [1] A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23 (1952), 493-507. [2] Large sample theory: parametric case. Ann. Math. Statist. 27 (1956), 1-22. G. Y. H. Chi and N. Dinculeanu [1] Projective limits of measures preserving transformations on probability spaces. J. Multivariate Anal. 2 (1972), 404-417. M. Choda and M. Nakamura [1] A remark on the concept of channels II. Proc. Japan Acad. 46 (1970), 932-935. [2] A remark on the concept of channels III. Proc. Japan Acad. 47 (1971), 464-469. K. L. Chung [1] A note on the ergodic theorem of information theory. Ann. Math. Statist (1961), 612-614.
32
References
227
I. P. Cornfeld, S. V. Fomin and Ya. G. Sinai [1] Ergodic Theory. Springer-Verlag, Berlin, 1982. I. Csiszar [1] Information-type measures of difference of probability distributions and indi rect observations. Studia Sci. Math. Hungar. 2 (1967), 299-318. I. Csiszar and J. Korner [1] Information Theory: Coding Theorems for Discrete Memoryless Systems. Aca demic Press, New York, 1981. E. B. Davies [1] Quantum communication systems. IEEE Trans. Inform. Theory IT-23 (1977), 530-534. N. Dinculeanu and C. Foia§ [1] A universal model for ergodic transformations on separable measure space. Michigan Math. J. 13 (1966), 109-117. [2] Algebraic models for measures. III. J. Math. 12 (1968), 340-351. [3] Algebraic models for measure preserving transformations. Trans. Amer. Math. Soc. 134 (1968), 215-237. Hu Guo Ding and Shen Shi Yi [1] Some coding theorems for almost periodic channels. Chinese Math. 6 (1965), 437-455. J. Diestel and J. J. Uhl, Jr. [1] Vector Measures. Amer. Math. Soc, Providence, R.I., 1977. J. Dixmier [1] C*-algebras. North-Holland, New York, 1982. R. L. Dobrushin [1] General formulation of Shannon's main theorems in information theory. Amer. Math. Transl. 33 (1963), 323-438. L. Doob [1] Stochastic Processes. John Wiley & Sons, New York, 1953. Y. N. Dowker [1] Invariant measures and the ergodic theorems. Duke Math. J. 14 (1947), 10511061. [2] Finite and a-finite invariant measures. Ann. Math. 54 (1951), 595-608. [3] On measurable transformations in finite measure space. Ann. Math. 62 (1955), 504-516. N. Dunford and J. T. Schwartz [1] Linear Operators, Part I. Interscience, New York, 1958. M. Echigo (Choda) and M. Nakamura [1] A remark on the concept of channels. Proc. Japan Acad. 38 (1962), 307-309. A. D. Faddeev
228
References
[1] On the notion of entropy of a finite probability space. Uspekhi Mat. Nauk 11 (1956), 227-231 (in Russian). R. H. Farrel [1] Representation of invariant measures. HI. J. Math. 6 (1962), 447-467. A. Feinstein [1] A new basic theorem of information theory. IRE Trans. Inform. P.G.I.T. 4 (1954), 2-22. [2] Foundations of Information Theory. McGraw-Hill, New York, 1958. [3] On the coding theorem and its converse for finite-memory channels. Control 2 (1959), 25-44.
Theory
Inform.
C. Foias, [1] Automorphisms of compact abelian groups as models for measure-preserving transformations. Michigan Math. J. 13 (1966), 349-352. R. J. Fontana, R. M. Gray and J. C. Kieffer [1] Asymptotically mean stationary channels. IEEE Trans. Inform. Theory I T - 2 7 (1981), 308-316. R. G. Gallager [1] Information Theory and Reliable Communication. Wiley, New York, 1968. I. M. Gel'fand, A. N. Kolmogorov and A. M. Yaglom [1] On the general definition of the amount of information. Dokl. Akad. Nauk SSSR 111 (1956), 745-748 (in Russian). S. G. Ghurge [1] Information and sufficient subfields. Ann. Math. Statist. 38 (1968), 2056-2066. R. M. Gray [1] Probability, Random Processes, and Ergodic Properties. Springer-Verlag, New York, 1988. [2] Entropy and Information Theory. Springer-Verlag, New York, 1990. R. M. Gray and L. D. Davisson [1] The ergodic decomposition of stationary discrete random processes. IEEE Trans. Inform. Theory IT-20 (1974), 625-636. R. M. Gray, M. O. Durham and R. L. Gobbi [1] Ergodicity of Markov channels. IEEE Trans. Inform. Theory I T - 3 3 (1987) 656-664. R. M. Gray and J. C. Kieffer [1] Asymptotically mean stationary measures. Ann. Probab. 8 (1980), 962-973. R. M. Gray, D. L. Neuhoff and P. C. Shields [1] A generalization of Ornstein's d distance with application to information the ory. Ann. Probab. 3 (1975), 315-328. R. M. Gray and D. S. Ornstein
References
229
[1] Block coding for discrete stationary d-continuous noisy channels. IEEE Trans. Inform. Theory IT-25 (1979), 292-306. R. M. Gray and F. Saadat [1] Block source coding theory for asymptotically mean stationary sources. IEEE Trans. Inform. Theory IT-30 (1984), 54-68. S. Guia§u [1] Information Theory with Applications. McGraw-Hill, New York, 1977. P. R. Halmos [1] Lectures on Ergodic Theory. Math. Soc. Japan, Tokyo, 1956. [2] Entropy in Ergodic Theory. Lecture Notes, University of Chicago Press, Chicago, 1959. P. R. Halmos and L. J. Savage [1] Application of the Radon-Nikodym theorem to the theory of sufficient statis tics. Ann. Math. Statist. 20 (1949), 225-241. Te Sun Han and K. Kobayashi [1] Exponential-type error probabilities for multiterminal hypothesis testing. IEEE Trans. Inform. Theory IT-35 (1989), 2-14. [2] The strong converse theorem for hypothesis testing. IEEE Trans. Inform. The ory IT-35 (1989), 178-180. R. V. L. Hartley [1] Transmission of information. Bell Sys. Tech. J. 7 (1928), 535-563. E. Hille and R. S. Phillips [1] Functional Analysis and Semi-groups. Amer. Math. Soc, Providence, R.I., 1957. W. Hoeffding [1] Asymptotically optimal tests for multinomial distributions. Ann. Math. Statist. 36 (1965), 369-401, 401-408. A. S. Holevo [1] Problems in the mathematical theory of quantum communication channels. Rep. Math. Phys. 12 (1977), 273-278. R. S. Ingarden [1] Quantum information theory. Rep. Math. Phys. 10 (1976), 43-73. K. Jacobs [1] Die Ubertragung diskreter Informationen durch periodische und fastperiodische Kanale. Math. Annalen 137 (1959), 125-135. [2] Uber die Durchlasskapazitat periodischer und fastperiodischer Kanale. In: Trans. Second Prague Conf. on Information Theory, Statistical Decision Func tions, Random Processes, Held at Prague in 1959, Ed. by J. Kozesnik, Aca demic Press, New York, pp. 231-251, 1960. [3] Uber die Struktur der Mittleren Entropie. Math. Zeit. 78 (1962), 33-43.
[4] Über Kanäle von Dichtetypus. Math. Zeit. 78 (1962), 151-170.
[5] Ergodic decomposition of the Kolmogorov-Sinai invariant. In: Ergodic Theory, Ed. by F. B. Wright, Academic Press, New York, pp. 173-190, 1963.
[6] Almost periodic sources and channels. Z. W. Verw. Geb. 9 (1967), 65-84.
M. Jimbo and K. Kunisawa
[1] An iteration method for calculating the relative capacity. Inform. Control 43 (1979), 216-223.
Y. Kakihara
[1] Information channels between compact groups. Res. Rep. Inform. Sci. Tokyo Institute of Technology, No. A-28, May 1976.
[2] Some remarks on information channels. Res. Rep. Inst. Inform. Sci. Tech. Tokyo Denki Univ. 7 (1981), 33-45.
[3] Stochastic processes with values in a Hilbert space and information channels. Doctoral Thesis, Tokyo Institute of Technology, March 1985.
[4] Ergodicity of asymptotically mean stationary channels. J. Multivariate Anal. 39 (1991), 315-323.
[5] Ergodicity and extremality of AMS sources and channels. Submitted.
Y. Kakihara and H. Umegaki
[1] Harmonic analysis and information channels. 1978-Seminar on Applied Functional Analysis, Ed. by H. Umegaki, Yurinsha, Tokyo, pp. 19-23, 1979.
S. Kakutani
[1] Examples of ergodic measure-preserving transformations which are weakly mixing but not strongly mixing. In: Lecture Notes in Mathematics #318, Springer, New York, pp. 143-149, 1973.
G. Kallianpur
[1] On the amount of information contained in a σ-field. Contributions to Probability and Statistics - Essays in Honor of Harold Hotelling. Stanford University Press, Stanford, 1960.
J. N. Kapur
[1] Maximum Entropy Models in Science and Engineering. Wiley Eastern Limited, New Delhi, 1989.
J. N. Kapur and H. K. Kesavan
[1] Entropy Optimization Principles with Application. Academic Press, New York, 1992.
I. Katznelson and B. Weiss
[1] A simple proof of some ergodic theorems. Israel J. Math. 42 (1982), 291-296.
A. Y. Khintchine
[1] The concept of entropy in probability theory. Uspekhi Mat. Nauk 8 (1953), 3-20 (in Russian).
[2] On the fundamental theorems of information theory. Uspekhi Mat. Nauk 11 (1956), 17-75 (in Russian).
[3] Mathematical Foundations of Information Theory. Dover, New York, 1958.
J. C. Kieffer
[1] A simple proof of the Moy-Perez generalization of the Shannon-McMillan theorem. Pacific J. Math. 51 (1974), 203-206.
[2] A general formula for the capacity of stationary nonanticipatory channels. Inform. Control 26 (1974), 381-391.
[3] A generalized Shannon-McMillan theorem for the action of an amenable group on a probability space. Ann. Probab. 3 (1975), 1031-1037.
[4] Some topologies on the set of discrete stationary channels. Pacific J. Math. 105 (1983), 359-385.
J. C. Kieffer and M. Rahe
[1] Markov channels are asymptotically mean stationary. SIAM J. Math. Anal. 12 (1981), 293-305.
A. N. Kolmogorov
[1] A new metric invariant of transitive dynamical systems and automorphisms in Lebesgue spaces. Dokl. Akad. Nauk SSSR 119 (1958), 861-864 (in Russian).
[2] On the entropy per unit time as a metric invariant of automorphisms. Dokl. Akad. Nauk SSSR 124 (1959), 754-755 (in Russian).
B. O. Koopman and J. von Neumann
[1] Dynamical systems of continuous spectra. Proc. Nat. Acad. Sci. U.S.A. 18 (1932), 255-263.
U. Krengel
[1] Ergodic Theorems. De Gruyter Series in Mathematics, De Gruyter, New York, 1985.
N. Kryloff and N. Bogoliouboff
[1] La théorie de la mesure dans son application à l'étude des systèmes dynamiques de la mécanique non linéaire. Ann. Math. 38 (1937), 65-113.
S. Kullback
[1] Information Theory and Statistics. Wiley, New York, 1959.
S. Kullback and R. A. Leibler
[1] On information and sufficiency. Ann. Math. Statist. 22 (1951), 79-86.
E. H. Lieb
[1] Convex trace functions and the Wigner-Yanase-Dyson Conjecture. Adv. in Math. 11 (1973), 267-288.
E. H. Lieb and M. B. Ruskai
[1] Proof of the strong subadditivity of quantum mechanical entropy. J. Math. Phys. 14 (1973), 1938-1941.
G. Lindblad
[1] An entropy inequality for quantum mechanics. Comm. Math. Phys. 28 (1972), 245-249.
[2] Entropy, information and quantum measurements. Comm. Math. Phys. 33 (1973), 305-322.
[3] Expectations and entropy inequalities for finite quantum systems. Comm. Math. Phys. 39 (1974), 111-119.
[4] Completely positive maps and entropy inequalities. Comm. Math. Phys. 40 (1975), 147-151.
N. F. G. Martin and J. W. England
[1] Mathematical Theory of Entropy. Addison-Wesley, New York, 1981.
B. McMillan
[1] The basic theorems of information theory. Ann. Math. Statist. 24 (1953), 196-219.
Shu-Teh C. Moy
[1] Asymptotic properties of derivatives of stationary measures. Pacific J. Math. 10 (1960), 1371-1383.
[2] Generalizations of Shannon-McMillan theorem. Pacific J. Math. 11 (1961), 705-714.
[3] A note on generalizations of Shannon-McMillan theorem. Pacific J. Math. 11 (1961), 1459-1465.
K. Nakagawa and F. Kanaya
[1] On the converse theorem in statistical hypothesis testing. IEEE Trans. Inform. Theory IT-39 (1993), 623-628.
[2] On the converse theorem in statistical hypothesis testing for Markov chains. IEEE Trans. Inform. Theory IT-39 (1993), 629-633.
Y. Nakamura
[1] Measure-theoretic construction for information theory. Kodai Math. Sem. Rep. 21 (1969), 133-150.
[2] A non-ergodic compound source with a mixing input source and an Adler ergodic channel. Kodai Math. Sem. Rep. 22 (1970), 159-165.
[3] Ergodicity and capacity of information channels with noise sources. J. Math. Soc. Japan 27 (1975), 213-221.
J. Nedoma
[1] The capacity of a discrete channel. In: Trans. First Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, pp. 143-181, 1957.
[2] On non-ergodic channels. In: Trans. Second Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, Held at Prague in 1959, Academic Press, New York, pp. 363-395, 1960.
[3] Die Kapazität der periodischen Kanäle. Z. Wahrscheinlichkeitstheorie 2 (1963), 98-110.
D. L. Neuhoff and P. C. Shields
[1] Channel distances and representation. Inform. Control 55 (1982), 238-264.
W. Ochs and H. Spohn
[1] A characterization of the Segal entropy. Rep. Math. Phys. 14 (1978), 75-87.
M. Ohya
[1] Quantum ergodic channels in operator algebras. J. Math. Anal. Appl. 84 (1981), 318-328.
[2] On compound state and mutual information in quantum information theory. IEEE Trans. Inform. Theory IT-29 (1983), 770-774.
[3] Entropy transmission in C*-dynamical systems. J. Math. Anal. Appl. 100 (1984), 222-235.
[4] State change and entropies in quantum dynamical systems. In: Lecture Notes in Mathematics #1136, Quantum Probability and Applications II, Springer, Berlin, pp. 397-408, 1985.
N. Oishi
[1] Notes on ergodicity and mixing property. Proc. Japan Acad. 41 (1965), 767-770.
D. S. Ornstein
[1] Bernoulli shifts with the same entropy are isomorphic. Adv. in Math. 4 (1970), 337-352.
[2] Ergodic Theory, Randomness, and Dynamical Systems. Yale University Press, New Haven, 1974.
D. S. Ornstein and P. C. Shields
[1] An uncountable family of K-automorphisms. Adv. in Math. 10 (1973), 63-88.
D. S. Ornstein and B. Weiss
[1] The Shannon-McMillan-Breiman theorem for a class of amenable groups. Israel J. Math. 44 (1983), 53-60.
J. C. Oxtoby
[1] Ergodic sets. Bull. Amer. Math. Soc. 58 (1952), 116-136.
M. Ozawa
[1] Channel operators and quantum measurements. Res. Rep. Inform. Sci. Tokyo Institute of Technology, No. A-29, May 1977.
[2] Optimal measurements for general quantum systems. Rep. Math. Phys. 18 (1980), 11-28.
W. Parry
[1] Entropy and Generators in Ergodic Theory. Benjamin, New York, 1969.
[2] Topics in Ergodic Theory. Cambridge University Press, Cambridge, 1981.
K. R. Parthasarathy
[1] On the integral representation of the rate of transmission of a stationary channel. Ill. J. Math. 5 (1961), 299-305.
[2] A note on McMillan's theorem for countable alphabets. In: Trans. Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1962, pp. 541-543, 1964.
[3] Probability Measures on Metric Spaces. Academic Press, New York, 1967.
A. Perez
[1] Information theory with an abstract alphabet. Generalized forms of McMillan's limit theorem for the case of discrete and continuous times. Theory Probab. Appl. 4 (1959), 99-102.
[2] Extensions of Shannon-McMillan's limit theorems to more general stochastic processes. In: Trans. Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1962, pp. 545-574, 1964.
K. Petersen
[1] Ergodic Theory. Cambridge University Press, Cambridge, 1983.
J. R. Pierce
[1] The early days of information theory. IEEE Trans. Inform. Theory IT-19 (1973), 3-8.
M. S. Pinsker
[1] Information and Information Stability of Random Variables and Processes. Holden Day, San Francisco, 1964.
M. M. Rao
[1] Foundations of Stochastic Analysis. Academic Press, New York, 1981.
[2] Probability Theory with Applications. Academic Press, New York, 1984.
[3] Conditional Measures and Applications. Marcel Dekker, New York, 1993.
O. W. Rechard
[1] Invariant measures for many-one transformations. Duke Math. J. 23 (1956), 477-488.
A. Rényi
[1] On mixing sequences of sets. Acta Math. Acad. Sci. Hungar. 9 (1958), 215-228.
S. M. Rudolfer
[1] On characterizations of mixing properties of measure-preserving transformations. Math. Systems Theory 3 (1969), 86-94.
M. B. Ruskai
[1] A generalization of entropy using trace on von Neumann algebras. Ann. Inst. Henri Poincaré 19 (1973), 357-373.
S. Sakai
[1] C*-algebras and W*-algebras. Springer-Verlag, New York, 1971.
R. Schatten
[1] A Theory of Cross-Spaces. Ann. Math. Studies No. 26, Princeton University Press, Princeton, 1950.
[2] Norm Ideals of Completely Continuous Operators. Springer-Verlag, New York, 1960.
I. E. Segal
[1] A note on the concept of entropy. J. Math. Mech. 9 (1960), 623-629.
C. E. Shannon
[1] A mathematical theory of communication. Bell System Tech. J. 27 (1948), 379-423, 623-656.
C. E. Shannon and W. Weaver
[1] The Mathematical Theory of Communication. University of Illinois Press, Urbana, 1949.
P. C. Shields
[1] The Theory of Bernoulli Shifts. University of Chicago Press, Chicago, 1973.
Ya. G. Sinai
[1] On the concept of entropy for dynamical systems. Dokl. Akad. Nauk SSSR 124 (1959), 768-771 (in Russian).
D. Slepian
[1] Information theory in the fifties. IEEE Trans. Inform. Theory IT-19 (1973), 145-148.
[2] Key Papers in the Development of Information Theory. IEEE Press, 1974.
C. Stein
[1] Information and comparison of experiments. Unpublished.
A. Sugawara
[1] On mathematical information channels with a non-commutative intermediate system. J. Math. Anal. Appl. 114 (1986), 1-6.
H. Takahashi
[1] Information theory of quantum mechanical channels. In: Advances in Communication Systems, Vol. 1, Academic Press, New York, pp. 227-310, 1966.
K. Takano
[1] On the basic theorems of information theory. Ann. of the Inst. Statist. Math. (Tokyo) 9 (1958), 53-77.
M. Takesaki
[1] Tomita's theory of modular Hilbert algebras and its applications. Lecture Notes in Mathematics #128, Springer, Berlin, 1970.
[2] Theory of Operator Algebras I. Springer-Verlag, New York, 1979.
I. P. Tsaregradskii
[1] A note on the capacity of a stationary channel with finite memory. Theory Probab. Appl. 3 (1958), 79-91.
A. I. Tulcea
[1] Contributions to information theory for abstract alphabets. Arkiv för Mat. 4 (1960), 235-247.
A. I. Tulcea and C. I. Tulcea
[1] Topics in the Theory of Lifting. Springer-Verlag, Berlin, 1969.
H. Tverberg
[1] A new derivation of the information function. Math. Scand. 6 (1958), 297-298.
H. Uhlmann
[1] Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in interpolation theory. Comm. Math. Phys. 54 (1977), 21-32.
H. Umegaki
[1] Conditional expectation in an operator algebra IV (entropy and information). Kodai Math. Sem. Rep. 14 (1962), 59-85.
[2] Entropy functionals in stationary channels. Proc. Japan Acad. 38 (1962), 668-672.
[3] A functional method on amount of entropy. Kodai Math. Sem. Rep. 15 (1963), 162-175.
[4] General treatment of alphabet-message space and integral representation of entropy. Kodai Math. Sem. Rep. 16 (1964), 18-26.
[5] A functional method for stationary channels. Kodai Math. Sem. Rep. 16 (1964), 27-39; Supplement and correction, ibid. 189-190.
[6] Representations and extremal properties of averaging operators and their application to information channels. J. Math. Anal. Appl. 25 (1969), 41-73.
[7] Absolute continuity of information channels. J. Multivariate Anal. 4 (1974), 382-400.
[8] Operator Algebras and Mathematical Information Theory, Selected Papers. Kaigai, Tokyo, 1985.
H. Umegaki and M. Ohya
[1] Entropies in Probabilistic Systems - Information Theory in Functional Analysis I. Kyoritsu, Tokyo, 1983 (in Japanese).
[2] Quantum Mechanical Entropies - Information Theory in Functional Analysis II. Kyoritsu, Tokyo, 1984 (in Japanese).
A. J. Viterbi
[1] Information theory in the sixties. IEEE Trans. Inform. Theory IT-19 (1973), 257-262.
J. von Neumann
[1] Proof of the quasi-ergodic hypothesis. Proc. Nat. Acad. Sci. U.S.A. 18 (1932), 70-82.
[2] Die Mathematischen Grundlagen der Quantenmechanik. Springer-Verlag, Berlin, 1932.
P. Walters
[1] An Introduction to Ergodic Theory. Springer-Verlag, New York, 1982.
K. Winkelbauer
[1] Communication channels with finite past history. In: Trans. Second Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1959, Ed. by J. Kozesnik, Academic Press, New York, pp. 685-831, 1960.
J. Wolfowitz
[1] Coding Theorems of Information Theory (Third Ed.), Springer-Verlag, New York, 1978.
Shen Shi Yi
[1] The fundamental problem of stationary channel. In: Trans. Third Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, held at Prague in 1962, pp. 637-639, 1964.
[2] Basic problems concerning stationary channels. Advancement in Math. 7 (1964), 1-38 (in Chinese).
H. P. Yuen and M. Ozawa
[1] Ultimate information carrying limit of quantum systems. Phys. Rev. Lett. 70 (1993), 363-366.
Zhang Zhao Zhi
[1] Some results obtained with almost-periodic channels. Chinese Math. 6 (1965), 428-436.
INDICES
Notation index
Chapter I
P(A|𝔜), 11
N, 1
E(f), 12 2, 12 L°°(X), 12
II/IU.12 P = ( P i , - - - ,Pn), 1
P(0. 1 (X,P),1 F(X), 1 RHS, 1 ff(p), 1 H(j>l,...
‖f‖∞, 13
11/11,. 14 Jfrffl). I4 2 ,Pn),
A„, 2 p{xj,Vk), 2 P(zjls/fc)> 2 H(X|y), 2
ff(*lv), 2 J(X,F), 2 H(X,Y),2 ff(p|q). 2 R, 4 R+,4 R+,4 LHS, 9 (X,£)M),11 / i o S " 1 , 11 (X,X,n,S), 11 J £ ( £ ) . 11 M/> 11 /i/ < n, 11 J5(/I2». 11
1
£ ( 2 ) , 14 (/,S) 2 , 14 /o5,14
2)nt?), 15t], 24 Si S S 2 , 25
Xl 27 [*?••■*?], 27 a?t, 28 A&B, 29 i , 29 » M , 29 Mi £^ Ma, 29 r(/i), 31 C, 31 C, 31 ^l, 33 A, 34 (I»,35 f, 37 (x )7 >,37 CONS, 38 Us, 38 Si — S2, 38
5,47 3^,48 ^ . ( • P M ) . 48
jsr(ffa,s),48 ^2,(^,49
flW"), eft, 55
49
TO, 55 Mi »M2, 57 Mi 1 M2, 57 01 < Tt, 57
<m « m, 57 /J(*,e),61 Chapter II Xl 67 Z, 67
(r >v ,to,39
[*?-«J],68
fl"(/i,B),41 ff(/i,a,5),41 «n,41 «n,41 too, 41 p ( * ) , 41
071, 68 d0(ai,a.j), 68 d(x,x'), 68 pr f c (), 69 C(X), 69 B ( X ) , 69 M(X), 69 P ( X ) , 69 PS(X), 69 P i a W , 69 -4(371), 69
ftW, 41 M(X), 41 MS(X), 41 M+(X), 41 M+(X), 41
*W4S», 41 P((A\1D), 41
W B ) , 41
#(-,2l,S), 42 MJ(X), 43 |a|, 44 ll-IU.oo, 4 5 M„{X), 47
c(xy, 70
A M (/), 70 MX), 70 A»(/), 70 S 71 £*(*,*!), 71 S„, 71 5, 71 exP s (X), 71
co[exP„(X)],
fs, 72 / , / » 72
li-11^,74 B M (.|3), 75
IMU75
©{•••}, 75 5 ffi ft, 75 const, 75 X x 2), 77 3 M , 79 Pr(-), 80 U^, 83 Z+, 83 J„, 83 | J n J„|, 83 p x p, 85 p(A), 91 Pa(X), 91 M a ( X ) , 92 MS(X), 92 M<77, 93 (Sp) a , 93 (5/x),, 93
71
M , ( / ) , 106 Hx, 106 B ( X , S ) , 107 /•", 107 R, 108 / # , 109 ff(//), 111
cM, i n ®o, H 3 03, 113 C(X,2l), 113 P(X,Q5), 113 Q, 114 R, 114 P,(X,
c s (x,y), 122
Xoo, y«i
Pae(X), 97 [X, M ],99 2»n,99
H^),
99
9JFn,99 3K£, 100 H{n), 100 H{n,S), 100 I M (2t), 100
/„(2t|2)), ioo 9Jl„,9, 104 Mnfi, 104 hM, 105 M n , , ( / ) , 106 Q, 106
p^fclo,-), 122 P=(p(bk\aj)).k, pi/, 123 p ® f, 123 ^ ( x , C ) , 123 S ® T, 125 (S ® T ) n , 125 5 0 f , 126 A($), 126 <£ © V, 126 <j>*®4i*, 126 £ ®x -F, 126 C{X) ®A .F, 126 C ( X ; . F ) , 126 K„, 127 (-,•>, 128
122
Sa> 129
A(X,Y), 130 A(X,F),130 K(X,Y), 130
x; f (jr,y), 130 A = A„, 130 K = K „ , 130 !/i = v2 ( m o d P ) , 133 K l S K 2 (mod-p), 133 A i = A 2 ( m o d P ) , 133 Cse(X,Y), 137 9Ky, 138 v* (*,(?), 141 M 1 + ( r ) , 144 z/ e e x C 5 ( X , y ) (modP), 147 K 6 exlCs(X,Y) (modP), 152 A e e x X , ( X , F ) (modP), 152 C Q (X,y), 156 V(x,C), 160 C a e ( X , y ) , 163 9JTn(X), 166 97T„(F), 166 fnr„(x x y ) , 166 / „ ( X , y ) , 166 In(n;v), 166 /(M;«0,167 C S ( I / ) , 167, 174, 176
Ceiy), 167, 174, 176 ff»M, 167 v(A,C), 168 9 % ; z / ) , 171, 175 PS(X,
3 Mr, 174 x(fm«(y)), 178 v*, 181 ¥V, 181 vv, 181 £ „ , 181 HBX.
Chapter IV ity,C, 190 ^ , 193 (&■£*}, 197
ll"IU,197 £, 197 »fo, 198 v „ 198 *(*,»), 199 LX{X;£), 199 II*III,M.
2
00
L1(X)Q£, 200 7 ( 0 . 200 L 1 ( X ) ® 7 £ , 200 p(-, -),202 s y , 207 G, 207 (i,X>,207 m, 207 / * 9, 207 e«, 207 V = V{X,G), 207 Q = Q(X,L 1 (<5)),208 >„, 208 {C«(-).««.fo}x6JC.209
{>*(■),«-, a.}« e J r , 209 (■,■)«. 209
[/.], 209 gx, 209, 210 f, 210 f, 210 V„ 210 Q S) 210 e x p s ( m o d P s e ( X ) ) , 211 e x Q s ( m o d P s e ( X ) ) , 211 (•,0M, 211
WM, 211 [/]/.. 211 {^(0,^M,^},212
{U).«c.ec}.2i2 Tf, 213 C0(G), 213 ^ ( G ) , 214 A, 214 6(A), 214 a(G), 214 (A,6(A),a(G)), 214 A*, 214 C(A,B), 215 o„-^-o, 215
C W 7 , 215 B(«), 215 T(H), 215 r+(W), 215 tr(-), 215 1(a) = I (a, A), 216 Im{}, 216 K(a) = K(a,A), 216 a(a), 216 WM(a) = WM(a,A), 216 CS(A,B), 217 Cse(A,I), 217 C*(A,B), 217 C wm (A,B), 217 A ® B , 217 ai = a2 (mod 5), 218 AJ = A£ (mod S), 218 {W^jTV'**'1**}" 2 1 9 TT(B)', 219
t, 220 [a, 6], 220 $„,«, 221 H(p), 222 ff(p|
Author index
A Adler, R. L., 187 Ahlswede, R., 64 Akcoglu, M. A., 119 Algoet, P., 119 Araki, H., 223 Arimoto, S., 188 Ash, R., 63, 64
Dinculeanu, N., 64 Ding, Huo Guo, 119, 120, 188, 222 Dixmier, J., 209, 213, 214 Dobrushin, R. L., 188 Doob, L., 64 Dowker, Y. N., 119 Dunford, N., 70, 71, 91 Durham, M. O., 189
B Bahadur, R. R., 64 Barndorff-Nielsen, O., 64 Bernoulli, J., 27, 69 Billingsley, P., 63 Birkhoff, G. D., 72, 119 Blahut, R. E., 64, 188 Blum, J. R., 119 Bogoliouboff, N., 120 Boltzmann, L., 63 Breiman, L., 64, 102, 119, 120, 188 Brown, J. R., 28, 63
E Echigo (Choda), M., 187, 222 England, J. W., 63
C Carleson, L., 188 Chernoff, H., 64 Chi, G. Y. H., 64 Choda, M., 185, 222 Chung, K. L., 119 Clausius, R., 63 Cornfeld, I. P., 63 Cover, T., 119 Csiszar, I., 63, 64 D Davies, E. B., 222 Davisson, L. D., 120 Diestel, J., 196, 199
F Faddeev, A. D., 7, 64 Farrel, R. H., 119 Feinstein, A., 63, 64, 177, 187, 188 Foiaş, C., 64 Fomin, S. V., 63 Fontana, R. J., 119, 187, 188 G Gallager, R. G., 63 Gauss, C. F., 56, 57 Gel'fand, I. M., 64 Ghurge, S. G., 64 Gobbi, R. L., 187 Gray, R. M., 63, 110, 119, 120, 187, 188 Guiaşu, S., 63 H Hahn, H., 43 Halmos, P. R., 63, 64 Han, Te Sun, 64 Hanson, D. L., 119 Hartley, R. V. L., 63 Hille, E., 196 Hoeffding, W., 64
Hölder, O., 14 Holevo, A. S., 222 I Ingarden, R. S., 222 J Jacobs, K., 64, 119, 120, 187, 222 Jensen, J. L. W. V., 13 Jimbo, M., 188 K Kakihara, Y., 119, 188, 222 Kakutani, S., 119 Kallianpur, G., 64 Kanaya, F., 64 Kapur, J. N., 63 Katznelson, I., 119 Kesavan, H. K., 63 Khintchine, A. Y., 7, 63, 64, 187, 188 Kieffer, J. C., 119, 120, 187, 188, 222 Kobayashi, K., 64 Kolmogorov, A. N., 20, 25, 64 Koopman, B. O., 119 Körner, J., 63 Krengel, U., 63 Kryloff, N., 120 Kullback, S., 55, 63, 64 Kunisawa, K., 188 L Lebesgue, H., 93 Leibler, R. A., 55, 64 Lieb, E. H., 223 Lindblad, G., 223 M Markov, A. A., 28 Martin, N. F. G., 63 McMillan, B., 102, 119, 187
Moy, Shu-Teh C., 120 N Nakagawa, K., 64 Nakamura, M., 187, 222 Nakamura, Y., 119, 120, 187, 188, 222 Nedoma, J., 188 Neuhoff, D. L., 188, 222 O Ochs, W., 223 Ohya, M., 63, 187, 222, 223 Oishi, N., 117 Ornstein, D., 28, 63, 64, 120, 188 Oxtoby, J. C., 120 Ozawa, M., 222, 223 P Parry, W., 63 Parthasarathy, K. R., 64, 120, 188 Perez, A., 120 Petersen, K., 63 Phillips, R. S., 196 Pierce, J. R., 64 Pinsker, M. S., 63 R Rahe, M., 188 Rao, M. M., 16, 61, 64, 75 Rechard, O. W., 119 Rényi, A., 119 Rudolfer, S. M., 119 Ruskai, M. B., 223 S Saadat, F., 119 Sakai, S., 214 Savage, L. J., 64 Schatten, R., 126, 199, 222 Schwartz, J. T., 70, 71, 91
Segal, I. E., 223 Shannon, C. E., 1, 7, 63, 64, 102, 119, 182, 184, 188 Shields, P. C., 63, 64, 188, 222 Sinai, Ya. G., 20, 25, 63, 64 Slepian, D., 64 Spohn, H., 223 Stein, C., 64 Sugawara, A., 223 T Takahashi, H., 222 Takano, K., 187, 188 Takesaki, M., 214, 223 Tsaregradskii, I. P., 188 Tulcea, A. I., 31, 118 Tulcea, C. I., 31 Tverberg, H., 64 U Uhl Jr., J. J., 196, 199 Uhlmann, H., 223 Umegaki, H., 63, 64, 119, 120, 187, 188, 222, 223 V Viterbi, A. J., 64 von Neumann, J., 74, 119, 222 W Walters, P., 28, 63, 88 Weaver, W., 63 Weiss, B., 119, 120 Winkelbauer, K., 64, 188 Wolfowitz, J., 188 Y Yaglom, A. M., 64 Yi, Shen Shi, 119, 120, 187, 188, 222 Yuen, H. P., 223
Z Zhi, Zhang Zhao, 188
Subject index
A absolutely continuous, 11, 57 abstract channel, 123 additivity, 4 algebraic dynamical system, 39 algebraic measure system, 35 algebraic model, 36, 39 algebraic tensor product, 126 α-invariant, 216 α-KMS, 216 α-abelian, 220 alphabet, 65 message space, 68 AMS, 91, 156 analytic element, 220 aperiodic, 87 approximate identity, 207 asymptotically dominate, 93 asymptotically independent, 139 asymptotically mean stationary (source), 91 asymptotically mean stationary (channel), 156 average (of a channel), 141 averaging operator, 129 stationary , 130
C C*-algebra, 214 C*-dynamical system, 214 C*-tensor product, 217 capacity stationary , 167, 174, 176 ergodic , 167, 174, 176 chain, 57 channel, 122, 214 distribution, 122 matrix, 122 of additive noise, 192 of product noise, 195 operator, 130 abstract , 123 asymptotically mean stationary 156 classical-quantum ,216 constant , 123 continuous 122 dominated , 123 ergodic , 136, 147, 163, 217 induced ,181 integration , 190 KMS , 217 m-dependent , 137 m-memory , 122 B memoryless , 122 barycenter, 221 noiseless , 181 Bernoulli shift, 27, 69 quantum , 215 Bernoulli source, 69 quantum-classical , 216 stationary , 122, 217 /3-analytic element, 220 , 139 Birkhoff Pointwise Ergodic Theorem, 72 strongly mixing (SM) weakly mixing (WM) , 139, 217 block code, 181 chunk, 57 bounded ^representation, 209 classical-quantum channel, 216 boundedness, 12 clopen, 68 code, 181
length, 181 block , 181 commutant, 219 complete for ergodicity, 135 complete invariant, 28 complete system of events, 1 completely positive, 214 compound scheme, 2 compound source, 123 compound space, 121 compound state, 221 trivial , 221 concave, 3 concavity, 5 conditional entropy, 2, 17 function, 17 conditional expectation, 11 conditional probability, 11 conjugate, 29, 38 CONS, 38 constant channel, 123 continuity, 4 continuous (channel), 122 convex, 13 cyclic vector, 209 cylinder set, 27, 68
functional, 42 of a finite scheme, 1 of a measure preserving transformation, 20, 46 of a partition, 17 conditional , 2, 17 conditional function, 17 Kolmogorov-Sinai , 20 relative , 2, 49 Segal , 223 Shannon ,1 universal function, 117, 118 von Neumann , 223 equivalent, 57 ergodic capacity, 167, 174, 176 channel, 136, 147, 163, 217 decomposition, 109 source, 69, 97, 99 theorem, 72, 74 Pointwise , 72 Mean , 74 expectation, 12 extendability, 4 extremal, 147, 152, 211 extreme point, 71
D density zero, 83 Dirac measure, 128 dominated, 57 dominated (channel), 123, 197 Dominated Convergence Theorem, 12 dynamical system, 11
F Faddeev Axiom, 7 faithful, 219 Feinstein's fundamental lemma, 178 finite memory, 122 finite message, 68 finite scheme, 1 finitely dependent, 137 finitely £-valued, 197 Fourier inversion formula, 214 Fourier transform, 213
E f-valued simple function, 197 entropy, 1 equipartition property, 104 function, 17, 49
249
Subject index
G Gaussian (probability measure), 57 Gaussian (random variable), 56 generator, 176 greatest crossnorm, 200 GNS construction, 209 GNS representation, 219 H Hahn decomposition, 43 Hilbert-Schmidt type (channel), 205 Hilbert-Schmidt type (operator), 205 Holder's Inequality, 14 homogeneous, 59 hypothesis testing, 60 I idempotency, 12 identical (modP), 133 induced channel, 181 information, 1 source, 69 Kullback-Leibler , 55 mutual , 2, 167 stationary source, 69 injective tensor product, 126 input source, 123 input space, 121 integration channel, 190 intertwining property, 217 invariant measure 1, 106 invertible, 11 irreducible, 80 isomorphic, 26, 35, 39 isomorphism, 26 J Jensen's Inequality, 13
K KMS channel, 217 KMS state, 216 Kolmogorov-Sinai entropy, 20 Kolmogorov-Sinai Theorem, 25 Kullback-Leibler information, 55 L least crossnorm, 126 Lebesgue decomposition, 93 lifting, 31 linearity, 11 M m-dependent, 137 m-memory, 122 Markov shift, 28 martingale, 15 Convergence Theorem, 15 Mean Ergodic Theorem, 74 measurable (transformation), 11 strongly , 197 weakly , 197 measure algebra, 29 measure preserving, 11 memoryless (channel), 122 message, 68 mixing strongly channel, 139 strongly source, 81, 99, 138 weakly channel, 139, 217 weakly source, 81, 99, 138 weakly state, 216 Monotone Convergence Theorem, 13 monotonicity, 4 μ-equivalent, 29 μ-a.e. S-invariant, 79 mutual information, 2, 167
N noise source, 189 noiseless channel, 181 nonanticipatory, 178 O of density zero, 83 output space, 121 P pairwise sufficient, 59 partition, 16 Plancherel Theorem, 213 point evaluation, 207 Pointwise Ergodic Theorem, 72 positive definite, 31, 207 positivity, 4, 12 projective tensor product, 200 pseudosupported, 221 Q quantum channel, 215 quantum measurement, 216 quantum-classical channel, 216 quasi-regular, 106, 114 R regular, 108, 114 relative entropy, 2, 49 Umegaki , 223 resolution of the unity, 215 S S-invariant (mod p), 79 S-stationary, 47 Schatten decomposition, 221 semialgebra, 68, 77 semiergodic, 139 Shannon entropy, 1
Shannon's first coding theorem, 182 Shannon's second coding theorem, 184 Shannon-Khintchine Axiom, 7 Shannon-McMillan-Breiman Theorem, 102 shift, 27, 68 Bernoulli , 27 Markov , 28 σ-convergence, 215 σ-envelope, 215 Σ*-algebra, 215 simple, 219 simple function, 197 singular, 75 source, 69 Bernoulli , 69 compound , 123 ergodic , 69 input , 123 noise , 189 output , 123 stationary , 69 state, 214 space, 214 compound , 221 trivial compound , 221 stationary, 210 capacity, 167, 174, 176 channel, 122, 217 channel operator, 130 information source, 69 mean (of a channel), 160 mean (of a source), 91 source, 69 strongly measurable, 197 strongly mixing, 81, 99, 139 subadditivity, 5 submartingale, 15 Convergence Theorem, 16 sufficient, 58
pairwise , 59 supermartingale, 15 support, 223 symmetry, 4 T tensor product algebraic , 126 injective , 126 projective , 200 totally disconnected, 68 transmission rate (functional), 167, 171, 175 trivial (compound state), 221 type 1 error probability, 61 type 2 error probability, 61 U Umegaki relative entropy, 223 uncertainty, 1 uniform integrability, 24 universal entropy function, 117, 118 V von Neumann entropy, 223 von Neumann Mean Ergodic Theorem, 74 W weak law of large numbers, 61 weakly continuous unitary representation, 209 weakly measurable, 197 weakly mixing, 81, 99, 138, 139, 216, 217 X x-section, 122 Y 2)-partition, 16
251