Fundamentals of Measurable Dynamics Ergodic Theory on Lebesgue Spaces
DANIEL J. RUDOLPH Department of Mathematics, University of Maryland
CLARENDON PRESS, OXFORD 1990
Oxford University Press, Walton Street, Oxford OX2 6DP Oxford New York Toronto Delhi Bombay Calcutta Madras Karachi Petaling Jaya Singapore Hong Kong Tokyo Nairobi Dar es Salaam Cape Town Melbourne Auckland and associated companies in Berlin Ibadan Oxford is a trade mark of Oxford University Press Published in the United States by Oxford University Press, New York
© Daniel J. Rudolph, 1990 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press British Library Cataloguing in Publication Data Rudolph, Daniel J. Fundamentals of measurable dynamics. 1. Ergodic theory I. Title 515.42 ISBN 0-19-853572-4 Library of Congress Cataloging in Publication Data Rudolph, Daniel J. Fundamentals of measurable dynamics: ergodic theory on Lebesgue spaces / Daniel J. Rudolph. Includes bibliographical references (p. ). Includes index. 1. Ergodic theory. 2. Measure-preserving transformations. I. Title. QA614.R83 1990 515'.43-dc20 90-7486 ISBN 0-19-853572-4 Set by Asco Trade Typesetting Ltd, Hong Kong. Printed and bound in Great Britain by Biddles Ltd, Guildford and King's Lynn
Preface
Our intention here is to give an elementary technical treatment of the fundamental concepts of the measure-preserving dynamics of a Lebesgue probability space. This text has grown out of a course given at the University of Maryland beginning in the Spring of 1984. The last twenty-five years have seen an enormous growth in the theory of dynamical systems in general, and in particular, the probabilistic side of this field, classically known as ergodic theory. This development in recent years, most especially through the work of D. S. Ornstein and his school, has changed the perspective on much of the classical work in the field to a more set-coding combinatorial point of view as opposed to a functional analytic point of view. This perspective is of course much older, easily visible in the work of Kakutani, Chacon, and many others. We choose to attach Ornstein's name to it as it is in his work, and the work of those around him, that this point of view has reached its current power. Good expository treatments of ergodic theory already exist and certainly proofs of many of our main results can easily be found in standard texts in the field. What we are attempting to do through the methodology and order of proof we have chosen is to present the reader with this body of material from what has been to date a most fruitful point of view with the fabric of its development, at least at an elementary level, intact. We assume the reader has a thorough working knowledge of the topology of the real line, and Lebesgue measure theory on the real line. Royden (1968) is a good source for this material. The one deep result we will use without proof is the Riesz representation theorem for the dual of the continuous functions but only on such simple topological spaces as Cantor sets or the unit circle. 
The text is not intended to be encyclopaedic, but rather to present detailed arguments and chains of arguments showing technically how the fundamentals of dynamics on Lebesgue spaces are developed. The intent is to show those who want to prove theorems in ergodic theory what some of the more fruitful threads of argument have been. For this reason, we gladly present some very technical material, and at several points give multiple proofs. Although in places the technical detail may seem formidable, we will in fact often make simplifying assumptions. The one most obvious is that only the ergodic theory of single transformations is considered. To extend the theory to actions of $\mathbb{Z}^n$ or more general discrete abelian groups is not too difficult. For non-abelian and continuous groups, even as basic as $\mathbb{R}$, extension requires new ideas,
fundamentally the existence of measurable sections. We include a bibliography where such extensions can be found. From the basis given here, this literature should be readily accessible. Chapter 1 presents the fundamental concepts of measure-preserving dynamics and introduces a number of examples. Chapter 2 is a basic and technical treatment of the structure of Lebesgue probability spaces. As a preparation for later work, we prove an $L^1$-martingale theorem via the Vitali covering lemma. This argument is a warm-up for our proof of both the Birkhoff ergodic theorem and the Shannon-McMillan-Breiman theorem, as both will be proven by a 'Vitali' type argument. Chapter 3 presents the ergodic theorems, and ergodic decomposition. We present the now classical von Neumann $L^2$-ergodic theorem and the Garsia-Halmos proof of the Birkhoff ergodic theorem, to juxtapose them with the 'backward Vitali lemma' proof we then present. The intention is to give the reader as much technical insight into these theorems as is reasonable. Our last task is to show that any measure-preserving transformation decomposes as an integral of ergodic transformations. Hopefully the presentation given here makes this very technical argument approachable. Proofs are difficult to find in the literature. Chapter 4 covers the hierarchy of mixing properties, presenting the circle of definitions of weakly mixing and ending with the definition of a Kolmogorov automorphism. Included is a short development of the spectral theory of transformations. This leads to the theory of entropy in Chapter 5, which we develop from a name-counting point of view, again using the backward Vitali lemma to prove the Shannon-McMillan-Breiman theorem. In Chapter 6 we introduce the concept of a joining and disjointness and we use these to again characterize ergodic, weakly mixing, and K-mixing transformations. We also show that Chacon's map has minimal self-joinings and use this to construct some counter-examples.
In Chapter 7 we present the Burton-Rothstein proofs of Krieger's generator theorem and Ornstein's isomorphism theorem from the viewpoint of joinings. Many exercises are presented throughout the course of the text. They are intended to help the reader develop technical facility with the methods developed, and to explicate areas not fully developed in the text. Chapters 1 through 5 form the core of an introductory graduate course in ergodic theory. As the point of view here is technical, we have been most successful using this material in conjunction with a more broadly oriented text such as Walters (1982), Friedman (1970) or Cornfeld, Fomin and Sinai (1982). Chapters 6 and 7 can be used either as the core of a more advanced seminar or reading course, expanded perhaps with appropriate research literature of the field, allowing the instructor to orient the course either toward deeper abstract study, or application of the material. The book ends with a bibliography.
I would like to thank Charles Toll, Ken Berg, Mike Boyle, Aimee Johnson and Janet Kammeyer for collecting and refining the original notes, and the text during its years of development. I also must thank Virginia Vargas for shepherding along the manuscript through its many revisions.
Maryland 1989
D.J.R.
Contents
1. Measurable dynamics
1.1 Examples
1.2 Exercises
2. Lebesgue probability spaces
2.1 Countable algebras and trees of partitions
2.2 Generating trees and additive set functions
2.3 Lebesgue spaces
2.4 A martingale theorem and conditional expectation
2.5 More about generating trees and dynamical systems
3. Ergodic theorems and ergodic decomposition
3.1 Von Neumann's $L^2$-ergodic theorem
3.2 Two proofs of Birkhoff's ergodic theorem
3.3 Proof of the backward Vitali lemma
3.4 Consequences of the Birkhoff theorem
3.5 Disintegrating a measure space over a factor algebra
4. Mixing properties
4.1 Poincaré recurrence
4.2 Ergodicity as a mixing property
4.3 Weakly mixing
4.4 A little spectral theory
4.5 Weakly mixing and eigenfunctions
4.6 Mixing
4.7 The Kolmogorov property
5. Entropy
5.1 Counting names
5.2 The Shannon-McMillan-Breiman theorem
5.3 Entropy zero and past algebras
5.4 More about the K-property
5.5 The entropy of an ergodic transformation
5.6 Examples of entropy computations
5.7 Entropy and information from the entropy formula
5.8 More about zero entropy and tail fields
5.9 Even more about the K-property
5.10 Entropy for non-ergodic maps
6. Joinings and disjointness
6.1 Joinings
6.2 The relatively independent joining
6.3 Disjointness
6.4 Minimal self-joinings
6.5 Chacon's map once more
6.6 Constructions
7. The Krieger and Ornstein theorems
7.1 Symbolic spaces and processes
7.2 Painting names on towers and generic names
7.3 The $\bar{d}$-metric and entropy
7.4 Pure columns and Ornstein's fundamental lemma
7.5 Krieger's finite generator theorem
7.6 Ornstein's isomorphism theorem
7.7 Weakly Bernoulli processes
Bibliography
Index
1 Measurable dynamics
Dynamics, as a mathematical discipline, is the study of those properties of some collection of self-maps of a space which become apparent asymptotically through many iterations of the maps. The collection is almost always a semigroup, usually a group, and often, as a group, simply $\mathbb{Z}$ or $\mathbb{R}$. The origins of the field lie in the study of movement through time of some physical system. The space is the space of possible states of the system and the self-maps indicate how the state changes as time progresses. The field of dynamics breaks into various disciplines according to the category of the space and self-maps considered. If the space is a smooth manifold and the self-maps are differentiable then the discipline is smooth dynamics. If the space is just a topological space and the self-maps are continuous, the field is topological dynamics. As a very significant subfield of this, if the space is a closed shift-invariant subset of all infinite sequences of symbols from some finite set, and the self-maps are the shifts, then the field is symbolic dynamics. Lastly, and of interest to us here, if the space is a measure space and the self-maps are measure-preserving then the field is measurable dynamics, or more classically, ergodic theory. Each of these disciplines has its own special flavour and language, but the overlaps among them are enormous and the parallels often subtle and insightful.
1.1 Examples
Here is a collection of examples of dynamical systems from various mathematical disciplines, each of which has one or more natural invariant measures.
Example 1 Rotations of the circle. The space $S^1$ will be the circle of circumference 1 and the self-maps, $R_\alpha$, rotations by an angle $\alpha$. In this case, the space and the collection of maps can be identified as the same object, the group of rotations of the circle. Lebesgue measure $m$ on the circle is invariant under rotations. This is a particular case of a compact group acting on itself by left multiplication and its unique invariant Haar probability measure. We will return to this idea in our last example.

Example 2 Hyperbolic toral automorphisms. Here the space $T^2$ will be the two-dimensional torus. We think of this as $\mathbb{R}^2/\mathbb{Z}^2$. This is again a compact abelian group but we will not consider the action of the group on itself. Let $M \in SL(2, \mathbb{Z})$ be a $2 \times 2$ integer matrix of determinant 1 (for example [...]). As $M(\mathbb{Z}^2) = \mathbb{Z}^2$, $M$ projects to an automorphism of $T^2$ preserving Lebesgue measure. Suppose the eigenvalues of $M$ are $\lambda, \lambda^{-1}$ and do not lie on the unit circle. They both must be real. Let unit eigenvectors be $v_1$ and $v_2$. Supposing $|\lambda| < 1$, for any point $x \in T^2$ the line of points $x(s) = x + s v_1$ contracts exponentially fast as $M$ is iteratively applied, i.e.,
\[ \|M^n(x(s)) - M^n(x)\| = |\lambda|^n |s|. \]
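The contraction identity can be checked numerically. The following is a minimal sketch, assuming the matrix with rows (2, 1) and (1, 1) as a stand-in element of $SL(2, \mathbb{Z})$ (our own choice for illustration, not necessarily the example intended in the text). The helper names `apply`, `contracting_eigenpair` and `stable_separation` are hypothetical, and the computation is done on the lift to the plane, which agrees with distance on the torus for short segments.

```python
import math

# Assumed example: an integer matrix of determinant 1 with real eigenvalues
# off the unit circle (chosen here for illustration only).
M = ((2, 1), (1, 1))

def apply(M, v):
    """Multiply the 2x2 matrix M by the column vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def contracting_eigenpair(M):
    """Return (lam, v1): the eigenvalue with |lam| < 1 and a unit eigenvector."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(tr * tr - 4 * det)
    # The two roots multiply to det = 1; pick the one of smaller modulus.
    lam = min((tr - disc) / 2, (tr + disc) / 2, key=abs)
    # (M - lam I) v = 0 is solved by v = (m01, lam - m00) when m01 != 0.
    v = (M[0][1], lam - M[0][0])
    norm = math.hypot(v[0], v[1])
    return lam, (v[0] / norm, v[1] / norm)

def stable_separation(M, s, n):
    """Norm of M^n(x + s*v1) - M^n(x), computed on the lift to the plane."""
    lam, v1 = contracting_eigenpair(M)
    w = (s * v1[0], s * v1[1])
    for _ in range(n):
        w = apply(M, w)
    return math.hypot(w[0], w[1])

lam, v1 = contracting_eigenpair(M)
for n in range(6):
    # The two printed columns should agree up to rounding error.
    print(n, stable_separation(M, 0.1, n), abs(lam) ** n * 0.1)
```

By linearity the difference $M^n(x(s)) - M^n(x)$ does not depend on $x$, which is why the sketch only iterates the displacement vector $s v_1$.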
Similarly, the line $x(u) = x + u v_2$ expands exponentially fast. The existence of such expanding and contracting foliations (families of lines or curves) is a common, and much studied, phenomenon. It has powerful implications for the dynamics of the system, as we see in Exercise 1.2.

Example 3 Mixing Markov chains. Let $M$ be an $n \times n$ matrix whose entries $m_{i,j}$ are either 0 or 1. Assume that for some $k \ge 0$ all entries of $M^k$ are greater than 0. The space $\Sigma_M$ will consist of all sequences $x = \{x_j\}_{j=-\infty}^{\infty}$, $x_j \in \{1, \ldots, n\}$, where $m_{x_j, x_{j+1}} = 1$ for all $j$. We view the movement from $x_j$ to $x_{j+1}$ as a transition from one state to another and the 1's in $M$ indicate the allowed transitions. $M^k > 0$ says that all transitions across $k$ time units are allowed. The map $T: \Sigma_M \to \Sigma_M$ is the left shift, i.e., $T(x) = y$ where $y_i = x_{i+1}$. The space $\Sigma_M$ is a closed shift-invariant subset of $\{1, \ldots, n\}^{\mathbb{Z}}$, and is called a topological Markov shift, or chain. In this situation there are many invariant measures. Here is one sort. Let $P = [p_{i,j}]$ be a matrix with all $p_{i,j} \ge 0$, $p_{i,j} > 0$ iff $m_{i,j} = 1$, and
\[ \sum_{j=1}^{n} p_{i,j} = 1, \]
i.e., a Markov matrix compatible with $M$. As $P^k$ has all positive entries, $P$ has a unique left eigenvector $(p_1, \ldots, p_n)$ with eigenvalue 1 with all $p_i > 0$ and $\sum_{i=1}^{n} p_i = 1$. This allows us to define a $T$-invariant finitely additive measure on open sets by defining it on a cylinder set $c = \{x : x_j = i(j),\ a \le j \le b\}$ to be
\[ \mu_P(c) = p_{i(a)} \prod_{k=a}^{b-1} p_{i(k), i(k+1)}. \]
That $\mu_P$ extends to a $T$-invariant Borel measure on $\Sigma_M$ follows from the Kolmogorov extension theorem (see Chapter 2).
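The cylinder-set formula for $\mu_P$ can be tried out on a small case. The sketch below is illustrative only: the two-state matrix `P` (compatible with the 0-1 matrix with rows (1, 1) and (1, 0)) is our own assumed example, states are indexed from 0 rather than 1, and the function names `stationary` and `cylinder_measure` are hypothetical. The stationary vector is approximated by iterating $v \mapsto vP$ rather than solved for exactly.

```python
def stationary(P, iterations=500):
    """Approximate the left eigenvector p = pP by iterating v -> vP."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
    return v

def cylinder_measure(p, P, word):
    """mu_P of the cylinder fixing consecutive coordinates to `word`:
    p_{i(a)} times the product of the transition probabilities along the word."""
    m = p[word[0]]
    for s, t in zip(word, word[1:]):
        m *= P[s][t]
    return m

# Assumed example: state 1 may not follow state 1 (states are 0 and 1 here).
P = [[0.5, 0.5],
     [1.0, 0.0]]
p = stationary(P)
word = [0, 1, 0]
mu = cylinder_measure(p, P, word)
# Shift-invariance on cylinders: summing over the symbol one step earlier
# recovers the same value, because p is a left eigenvector of P.
mu_shifted = sum(cylinder_measure(p, P, [j] + word) for j in range(len(P)))
# Finite additivity: summing over the symbol one step later also recovers it,
# because the rows of P sum to 1.
mu_extended = sum(cylinder_measure(p, P, word + [j]) for j in range(len(P)))
print(p, mu, mu_shifted, mu_extended)
```

The two sums are the two consistency conditions that make the set function well defined on cylinders and invariant under the shift.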
Example 4 Isometries of compact metric spaces. Here the space $X$ will be a compact metric space with metric $d(\,\cdot\,,\,\cdot\,)$, and the self-map $f: X \to X$ an isometry, $d(f(x), f(y)) = d(x, y)$. We will assume $f$ has a dense orbit, i.e., for some $x_0$,
\[ \overline{\bigcup_{n=-\infty}^{\infty} f^n(x_0)} = X. \tag{1.1} \]
If $f$ has such a dense orbit, then all positive orbits are dense, i.e., $\overline{\bigcup_{n=1}^{\infty} f^n(x)} = X$ for all $x \in X$. We want to see two things about this situation: first, that $X$ can be made a compact abelian group with the action of $f$ given as multiplication by $f(x_0)$, and second, that $X$ has a unique $f$-invariant Borel probability measure. If $X$ is a finite set, then $f$ must be a cyclic permutation and the results are easy to see. Thus we may assume that all the points $f^n(x_0)$ are distinct. We first define a product rule on this dense subset of $X \times X$ by $f^n(x_0) \times f^m(x_0) = f^{n+m}(x_0)$. If $f^{n_i}(x_0)$ and $f^{m_i}(x_0)$ are both Cauchy sequences, then so is $f^{n_i - m_i}(x_0)$. Thus for $a, b \in X$, select $f^{n_i}(x_0) \to a$, $f^{m_i}(x_0) \to b$ and define $a \times b^{-1} = \lim f^{n_i - m_i}(x_0)$. This element of $X$ is independent of the Cauchy sequences chosen, and makes $X$ an abelian topological group. The group relations follow from those on $f^n(x_0)$. Notice that $x_0$ is the identity element, $f(x_0) \times a = f(a)$ and multiplying by any $a \in X$ is an isometry of $X$.

Remarks 1. A compact group with an element $g$ with $\{g^n : n \ge 0\}$ dense is called monothetic.
2. This argument can be generalized to much weaker situations; for example $X$ need not be metric, only compact $T_2$ with $\{f^n\}$ equicontinuous. The index $n$ could be replaced by $t \in \mathbb{R}$ or more generally by $g \in G$, a locally compact abelian group of self-maps of $X$. We could also have started with $X$ merely precompact and extended $f$ to its compactification. We will apply this last idea in our discussion of the weakly mixing property in Chapter 4.

3. If we do not assume $f$ has a dense orbit, then $X$ always decomposes into an indexed family $X(\alpha)$ of disjoint compact $f$-invariant sets on each of which $f$ has a dense orbit. This is a simple case of the decomposition of a space into 'ergodic components.'

We now want to see that $f$ has a unique invariant Borel probability measure. We will do this by showing that a compact abelian metrizable monothetic group has a unique invariant Haar measure. For any continuous function $h$, set
\[ A_n(h, x) = \frac{1}{n} \sum_{i=0}^{n-1} h(g^i \times x), \tag{1.2} \]
where $g$ has a dense orbit.
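To get a feel for the averages in (1.2), here is a minimal sketch on the simplest monothetic group, the circle of circumference 1 with the group operation written as addition mod 1. The choices of the rotation amount `alpha`, the test function `h`, and the name `ergodic_average` are assumptions made for illustration; nothing here is specific to the general argument in the text.

```python
import math

def ergodic_average(h, x, alpha, n):
    """A_n(h, x) = (1/n) * sum over i < n of h(x + i*alpha mod 1)."""
    return sum(h((x + i * alpha) % 1.0) for i in range(n)) / n

def h(x):
    # A continuous function on the circle; its integral over [0, 1) is 1/2.
    return math.cos(2 * math.pi * x) ** 2

alpha = math.sqrt(2) - 1  # an irrational rotation, so the orbit of g is dense
for n in (10, 100, 1000, 10000):
    a0 = ergodic_average(h, 0.0, alpha, n)
    a1 = ergodic_average(h, 0.3, alpha, n)
    # Both averages and their difference are printed for comparison.
    print(n, a0, a1, abs(a0 - a1))
```

For this choice of `h` the averages started at two different points approach each other and the common value 1/2, the integral of `h` against Lebesgue measure, which is the Haar measure of the circle.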
Using the density of the orbit of $g$ one shows that
\[ \lim |A_n(h, x) - A_n(h, y)| = 0 \tag{1.3} \]
uniformly in $X \times X$, and hence as [...] (1.4) we can conclude $\lim A_n(h, x) = L(h)$ converges uniformly to a constant for $x \in X$. As $L(h)$ is a bounded linear functional on $C(X)$, by the Riesz representation theorem
\[ L(h) = \int h \, d\mu \tag{1.5} \]
for some Borel measure $\mu$. That $\mu$ is $g$-invariant follows easily from $L(h) = L(g(h))$. If $\nu$ were some other $g$-invariant measure, then
\[ \int A_n(h, x) \, \nu(dx) = \int h \, \nu(dx). \]
But as $A_n(h, x)$ converges uniformly to $L(h)$, $\nu = \mu$, and $\mu$ is unique. This is only one of myriad constructions of Haar measure for such groups.

Example 5 Rank-1 cutting and stacking constructions. We will describe a general method for constructing a certain class of maps. Each will act on some interval $[0, a) \subset \mathbb{R}^+$. The method is inductive. At each stage of the induction we will have constructed a partial map $f_n$, defined on the interval $[0, a_n)$, where $a_n \le a$. The map $f_n$ will have the following form. The interval $[0, a_n)$ will be cut into disjoint subintervals $I(1, n), I(2, n), \ldots, I(N(n), n)$,
left-closed, right-open, all of the same length $a_n/N(n)$. We will have assigned some permutation $\pi_n$ of $\{1, \ldots, N(n)\}$ to order the intervals, and $f_n$ maps $I(\pi_n(i), n)$ to $I(\pi_n(i+1), n)$ linearly. Thus $f_n$ is defined everywhere except $I(\pi_n(N(n)), n)$, and $f_n^{-1}$ is defined everywhere except $I(\pi_n(1), n)$. We view this situation as a stack, tower or block of intervals (all three words are often used) and $f_n$ as movement vertically through the stack (Fig. 1.1).

Fig. 1.1 A stack of intervals (levels, bottom to top: $I(\pi_n(1), n)$, $I(\pi_n(2), n)$, $I(\pi_n(3), n)$, ..., $I(\pi_n(N(n)), n)$).

To get started
\[ N(0) = 1, \quad I(1, 0) = [0, 1), \quad \text{and} \quad \pi_0 = \mathrm{id}. \tag{1.6} \]
To construct $f_{n+1}$, we choose a parameter $k(n+1) > 1$ and cut each $I(j, n)$ into $k(n+1)$ subintervals of the same width. We can view this as slicing through the stack vertically (Fig. 1.2). This gives us the first part of our list of intervals $I(j, n+1)$.

Fig. 1.2 Cutting a stack.

We also select parameters
\[ S(1, n+1), S(2, n+1), \ldots, S(k(n+1), n+1) \in \mathbb{Z}^+ \cup \{0\} \]
and cut off $\sum_{j=1}^{k(n+1)} S(j, n+1)$ intervals, of the same length
\[ \frac{a_n}{N(n)\,k(n+1)} \tag{1.7} \]
as those already cut, but from an interval $[a_n, a_{n+1})$. We can define $\pi_{n+1}$ by describing how to stack these intervals. We work from left to right through the sliced off subcolumns of the previous stack, placing one above the other, putting $S(j, n+1)$ of our new intervals atop the $j$th slice before adding the next (Fig. 1.3). $\pi_{n+1}(i)$ is the index of the interval $i$ steps up the stack of intervals. It is easy to see that $f_{n+1} = f_n$ where both are defined, and that all $f_n$ preserve Lebesgue measure $\ell$. We assume
\[ \sum_{n=1}^{\infty} \sum_{j=1}^{k(n)} \frac{S(j, n)\, a_{n-1}}{N(n-1)\,k(n)} = a - 1 < \infty \tag{1.8} \]
so $\lim a_n = a < \infty$ and if we normalize $\ell$ to $\mu = \ell/a$, the limit map $f$ preserves the probability measure $\mu$, and is 1-1 and onto from $[0, a) \to (0, a)$. Omitting the forward orbit $f^n(0)$, $n \in \mathbb{Z}^+$, we are left with a measure-preserving invertible map. The map $f$ is characterized by the sequence of parameters $k(n)$, $S(1, n), \ldots, S(k(n), n)$ and is called a rank-1 cutting and stacking construction.

Fig. 1.3 Forming the next stack.

Some interesting examples are:
(1) $k(n) = 2$, $S(j, n) = 0$ for all $j, n$, called the dyadic adding machine; and
(2) $k(n) = 3$, $S(1, n) = S(3, n) = 0$, $S(2, n) = 1$, an example due to Chacon that has very interesting properties which we will return to in Chapters 3 and 6.

We have now described a little collection of examples of measure-preserving dynamical systems. We have limited our list to what are the most easily described, or most useful for our later purposes. As we continue we will add other examples and elaborate on those already given. The fundamental problem of any branch of dynamics is to find methods
for classifying dynamical systems up to the category of interest. In the case of ergodic theory the problem is to find methods, or structures, invariant under measurable isomorphism, which classify measure-preserving systems. What does it mean for two dynamical systems to be measurably isomorphic? Suppose $G$ is a semigroup and we have two measure spaces $(X, \mathcal{F}, \mu)$ and $(X', \mathcal{F}', \mu')$, and for $g \in G$, measure-preserving maps, composing according to the multiplication rules of $G$, $T_g: X \to X$, $T'_g: X' \to X'$.

Definition 1.1 We call these systems isomorphic if there are invariant subsets $X_0 \subset X$, $X'_0 \subset X'$, each of full measure, and a measurable measure-preserving invertible map
\[ \phi: X_0 \to X'_0 \tag{1.9} \]
so that $\phi T_g = T'_g \phi$ for all $g \in G$.

Any properties of interest to us in ergodic theory must be preserved by isomorphism. For example, the property of ergodicity, i.e., that any set $A$ with $T_g^{-1}(A) = A$ for all $g \in G$ must have measure 0 or 1, is an isomorphism invariant. In our list of examples, we can show virtually all the maps discussed were ergodic. Another property we will verify for some of our examples is mixing, that for any measurable sets $A$ and $B$
\[ \lim_{n \to \infty} \mu(T^n(A) \cap B) = \mu(A)\mu(B). \tag{1.10} \]
This is also an isomorphism invariant. The exercises that follow involve verifying these properties for some of our examples.
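Before turning to the exercises, it may help to see the cutting and stacking step of Example 5 carried out mechanically. The sketch below is one possible encoding, not the author's: a stack is stored as the list of left endpoints of its levels from bottom to top, the parameters $k(n)$ and $S(j, n)$ are held constant in $n$, and the names `cut_and_stack` and `build` are hypothetical. Exact rational arithmetic is used so that $N(n)$ and $a_n$ can be read off without rounding.

```python
from fractions import Fraction

def cut_and_stack(stack, width, top, k, spacers):
    """One step of the construction.

    stack   -- left endpoints of the current levels, listed bottom to top
    width   -- common length of the levels, a_n / N(n)
    top     -- a_n, the right end of the interval used so far
    k       -- number of vertical slices, k(n+1)
    spacers -- [S(1,n+1), ..., S(k,n+1)], new levels placed atop each slice
    """
    new_width = width / k
    new_stack = []
    for j in range(k):
        # The j-th subcolumn: the j-th slice of every old level, bottom to top.
        new_stack.extend(left + j * new_width for left in stack)
        # Spacer levels are cut from [a_n, a_{n+1}), left to right.
        for _ in range(spacers[j]):
            new_stack.append(top)
            top += new_width
    return new_stack, new_width, top

def build(k, spacers, steps):
    """Iterate from N(0) = 1, I(1,0) = [0,1) with constant parameters."""
    stack, width, top = [Fraction(0)], Fraction(1), Fraction(1)
    for _ in range(steps):
        stack, width, top = cut_and_stack(stack, width, top, k, spacers)
    return stack, width, top

adding_machine = build(2, [0, 0], 3)
chacon = build(3, [0, 1, 0], 3)
print(len(adding_machine[0]), adding_machine[2])  # N(3) and a_3
print(len(chacon[0]), chacon[2])                  # N(3) and a_3
```

In this encoding the partial map $f_n$ translates the level starting at `stack[i]` onto the level starting at `stack[i + 1]`, so it is undefined only on the top level, as in the text.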
1.2 Exercises
Many of these exercises are quite difficult. We recommend the reader keep returning to them while progressing through later chapters. They are not intended so much to test the reader's understanding of the chapter, as to lead the reader more deeply into the material.

1.1 Let $R_\alpha$ be rotation by $\alpha$ on $S^1$. Suppose $R_\alpha$ has an invariant set, i.e., there is a set $A \subset S^1$, $m(A) \ne 0, 1$ and $R_\alpha(A) = A$. Show that $\alpha = 2\pi r$, $r \in \mathbb{Q}$ (the rationals).

1.2 Let $M \in SL(2, \mathbb{Z})$ with eigenvalues $\lambda, \lambda^{-1}$ not on the unit circle. Using the fact that the two-dimensional trigonometric polynomials are dense in $L^2(T^2)$, for $f, g \in L^2(T^2)$ show that
\[ \lim_{n \to \infty} \int f \cdot g(M^n) \, d\mu = \int f \, d\mu \int g \, d\mu. \]
Hence if $f \circ M = f$, $f$ is a constant $\mu$-a.e.

1.3 Let $P = [p_{i,j}]$ be a Markov matrix as in Example 3, and $p$ its unique left eigenvector with eigenvalue 1, as described.
1. Using the $\ell^1$ metric $\|v\| = \sum_{i=1}^{n} |v_i|$ on $\mathbb{R}^n$, show that if $\sum_{i=1}^{n} v_i = 0$ then $\lim \|vP^n\| = 0$. Hint: Show $\|vP\| \le \|v\|$ and then splitting $v$ into positive and negative terms show $\|vP^k\| \le \lambda \|v\|$ where $\lambda = 1 - (\text{smallest entry in } P^k)/2n$.
2. Use part 1 to show that for any probability vector $v$, $\lim \|(p - v)P^n\| = 0$.
3. Use part 2 to show that for any cylinder sets $C_1$, $C_2$, $\lim \mu_P(C_1 \cap T^n(C_2)) = \mu_P(C_1)\mu_P(C_2)$.
4. Use part 3 to show that for any $f \in L^1(\Sigma_M, \mu_P)$ if $f \circ T = f$ then $f =$ constant $\mu_P$-a.e.

1.4 Suppose $(X, d, f)$ is as in Example 4. Show that if $f$ has a dense orbit, then all positive orbits of $f$ are dense.

1.5 Work through the details of the argument sketched in the remarks following Example 4.

1.6 Show that if $f$ is a rank-1 cutting and stacking construction and $h \in L^1(\mu)$ then
\[ A_n(h, x) = \frac{1}{n} \sum_{i=0}^{n-1} h(f^i(x)) \]
converges in $L^1(\mu)$ to $\int h \, d\mu$. (First show this for functions constant on intervals $I(j, n)$ and then use their density in $L^1(\mu)$.)

1.7 Show that no two of the following are isomorphic:
(1) rotation on the circle by an angle $2\pi\alpha$, $\alpha \notin \mathbb{Q}$;
(2) a hyperbolic toral automorphism;
(3) the dyadic adding machine (hint: consider $T^{2^n}$).
2 Lebesgue probability spaces
The examples of measure-preserving dynamical systems of the previous chapter sat on a wide variety of spaces. Some were merely topological, others were smooth manifolds, often groups. All we are fundamentally interested in is the fact that they are measure spaces endowed with a probability measure. The purpose of this chapter is to show that under some easily verified assumptions, all such spaces are measurably conjugate to Lebesgue measure on the unit interval in $\mathbb{R}$. Thus the underlying probability space, unless very unusual, will never provide an obstacle to the existence of a measurable isomorphism. This fact allows us to use, on such 'Lebesgue spaces', all the machinery of real analysis, most especially the Vitali covering lemma. We do so to prove an $L^1$-martingale theorem at the end of the chapter. We now describe the basic structure whose existence on a given set and finitely additive measure will imply that, as far as measurable behavior is concerned, we are dealing with the unit interval and Lebesgue measure.
2.1 Countable algebras and trees of partitions
To begin, let $X$ be some arbitrary set. By a countable algebra of subsets of $X$ we mean a countable collection $A$ of subsets of $X$ closed under the operation of taking complements, finite intersections and hence finite unions. A finite partition $P$ of $X$ is a collection $\{S_1, \ldots, S_k\}$ of disjoint sets whose union is $X$. Notice that the collection of all finite unions of elements of a finite partition of $X$ forms a countable algebra. Given two finite partitions $P_1$ and $P_2$ of $X$ we can form their span, $P_1 \vee P_2$, consisting of all intersections of one element of $P_1$ and one of $P_2$. We say $P'$ refines $P$ if every element of $P$ is a union of elements of $P'$. Thus $P_1 \vee P_2$ is the smallest partition refining both $P_1$ and $P_2$. Starting with a countable algebra $A = \{A_1, A_2, \ldots\}$, we can define a refining sequence of partitions $P_0, P_1, \ldots$ in $A$ as follows. Let $Q_i = \{A_i, A_i^c\}$ and now set $P_0 = \{X\}$ and $P_{i+1} = P_i \vee Q_{i+1}$. Notice that any set in $A$ is a finite union of elements of some $P_i$ in the sequence. Let $P_i = \{S_{i,1}, S_{i,2}, \ldots, S_{i,k(i)}\}$. As each $S_{i,j}$ is a union of sets in $P_{i+1}$, we can associate with the sequence a directed graph. Its nodes are the sets $S_{i,j}$ and an arrow goes from $S_{i,j}$ to $S_{i+1,j'}$ if $S_{i+1,j'} \subseteq S_{i,j}$. The sequence $P_i$ and associated directed graph we call a refining tree of partitions. Thus from a countable algebra we can construct a refining tree of partitions. Notice that conversely, given a refining tree of partitions the collection of all finite unions of elements of the various $P_i$ is a countable algebra. Notice also that the tree associated with a countable algebra is not unique. If we re-order the elements of $A$ in our description, the tree may change.

Example 1 Let $P_i$ consist of intervals $[k/2^i, (k+1)/2^i)$ in $[0, 1)$.

Example 2 Let $P_i$ consist of squares of the form $[k/2^i, (k+1)/2^i) \times [k'/2^i, (k'+1)/2^i)$ in the unit square $[0, 1) \times [0, 1)$.
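The recipe $P_0 = \{X\}$, $P_{i+1} = P_i \vee Q_{i+1}$ is easy to run on a finite set. The following minimal sketch assumes a finite $X$ and a finite list of generating sets, purely for illustration; the names `join` and `refining_tree` are hypothetical, and empty intersections are discarded so that each $P_i$ is a genuine partition.

```python
def join(P, Q):
    """The span P v Q: all nonempty intersections of one element of P and one of Q."""
    return {p & q for p in P for q in Q if p & q}

def refining_tree(X, sets):
    """Return [P_0, P_1, ...] with P_0 = {X} and P_{i+1} = P_i v {A, A^c}."""
    X = frozenset(X)
    tree = [{X}]
    for A in sets:
        A = frozenset(A) & X
        # Q = {A, complement of A}, dropping an empty piece if there is one.
        Q = {A, X - A} - {frozenset()}
        tree.append(join(tree[-1], Q))
    return tree

tree = refining_tree(range(6), [{0, 1, 2}, {1, 3, 5}])
for i, P in enumerate(tree):
    print(i, sorted(sorted(S) for S in P))
```

Feeding the same sets in a different order generally produces a different tree with the same final partition, which is the non-uniqueness remarked on above.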
2.2 Generating trees and additive set functions
We say a tree of partitions generates if for any $x_1, x_2 \in X$, there is a partition $P_i$ with $x_1$ and $x_2$ in distinct elements of $P_i$. Both Examples 1 and 2 generate. An additive set function on a tree of partitions $\{P_i\}$ is a map $\mu_0$ from the sets of each $P_i$ to $\mathbb{R}^+$ with
(1) $\mu_0(\emptyset) = 0$;
(2) $\mu_0(X) = 1$; and
(3) if $A = \bigcup_{j=1}^{k} A_j$ a disjoint finite union, where $A \in P_i$, $A_j \in P_{i+1}$, then
\[ \mu_0(A) = \sum_{j=1}^{k} \mu_0(A_j). \tag{2.1} \]
Examples
(1) $\mu_0(S) =$ length of $S$.
(2) $\mu_0(S) =$ area of $S$.
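For the dyadic intervals of Example 1 with $\mu_0 =$ length, both the generating property and the additivity condition (2.1) can be checked directly. This is a small sketch under that assumption; the names `cell`, `mu0`, `children` and `separating_level` are hypothetical, a cell of $P_i$ is represented by its index $k$, and the search for a separating level is cut off at an arbitrary `max_level`.

```python
from fractions import Fraction

def cell(x, i):
    """Index k of the level-i cell [k/2^i, (k+1)/2^i) containing x in [0, 1)."""
    return int(x * 2 ** i)

def mu0(i):
    """Length of any cell of P_i."""
    return Fraction(1, 2 ** i)

def children(k, i):
    """The cells of P_{i+1} whose disjoint union is cell k of P_i."""
    return [(2 * k, i + 1), (2 * k + 1, i + 1)]

def separating_level(x1, x2, max_level=64):
    """First i with x1 and x2 in distinct cells of P_i, or None if not found."""
    for i in range(max_level + 1):
        if cell(x1, i) != cell(x2, i):
            return i
    return None

# Additivity (2.1) for one cell, and separation of two nearby points.
k, i = 5, 3
print(mu0(i) == sum(mu0(level) for _, level in children(k, i)))
print(separating_level(Fraction(1, 3), Fraction(3, 8)))
```

Any two distinct points of $[0, 1)$ are eventually separated in this way, which is what it means for the dyadic tree to generate.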
We are, at this point, interested in generating trees of partitions, with a given additive set function defined on the sets of the corresponding algebra. A chain of sets $\mathcal{C} = \{c_1, c_2, \ldots\}$ in a tree of partitions is a sequence $c_i \in P_i$ where $c_{i+1} \subset c_i$. For any such chain, $\mu_0(c_i)$ is a non-increasing sequence, hence converges to some value we call $\mu(\mathcal{C})$.

Exercise 2.1 Show that $\mu(\mathcal{C}) > 0$ on at most a countable collection of chains $A = \{\mathcal{C}_1, \mathcal{C}_2, \ldots\}$ and
\[ \sum_{i=1}^{\infty} \mu(\mathcal{C}_i) \le 1. \tag{2.2} \]
As we assume $\{P_i\}$ generates, for any chain $\mathcal{C}$
\[ x(\mathcal{C}) = \bigcap_{i=1}^{\infty} c_i \tag{2.3} \]
consists of either one point, or is $\emptyset$. Those points which are such intersections with $\mu(\mathcal{C}) > 0$ are called atoms of $\mu_0$. We say $\mu_0$ is non-atomic if no such atoms exist, i.e., for all chains $\mathcal{C}$, $\mu(\mathcal{C}) = 0$.

Exercise 2.2 If $\mu_0$ is non-atomic then $\lim_{i \to \infty} \mu_0(c_i) = 0$, uniformly on chains $\mathcal{C}$.
Let $E$ be the union of all sets $A$ in any $P_i$ with $\mu_0(A) = 0$. Let $X_0 = X \setminus E$.

Theorem 2.1 If $\{P_i\}$ is a generating tree of partitions of $X$, and $\mu_0$ is a non-atomic additive set function on it, then there is a 1-1 map
\[ \phi: X_0 \to Z \subset \mathbb{R}. \]
$Z$ is a compact totally disconnected subset of $\mathbb{R}$ and for any set $A \in P_i$, $\overline{\phi(A)}$ is open and closed in the relative topology on $Z$ and $\mu_0(A) = \ell(\overline{\phi(A)})$ (remember $\ell$ is Lebesgue measure).

Proof To each $A \in P_i$, $\mu_0(A) > 0$, we will assign a closed interval $I(A) \subset \mathbb{R}^+$ as follows. For $P_0 = \{X\}$, assign an interval $I(X)$, with $\ell(I(X)) = 5/4$. Assume we have made assignments through $P_k$ and
(1) if $A$, $A'$ are disjoint in $P_k$ so are $I(A)$, $I(A')$;
(2) if $A \subset A'$ are in different $P_i$'s, then $I(A) \subset I(A')$; and
(3) if $A \in P_k$ then [...]
To construct the intervals for $P_{k+1}$, simply select inside each $I(A)$, $A \in P_k$, disjoint closed intervals, one for each $A' \subset A$, $A' \in P_{k+1}$, each some small fraction $\varepsilon_{k+1} \le 2^{-k-1}$ larger than $\mu_0(A') > 0$. For any chain $\mathcal{C} = \{c_i\}$ there corresponds a constructed chain of closed intervals $I(c_i)$ and as [...] (2.4) $\bigcap_k I(c_k)$ is a single point.

For any $x \in X_0$, let $c_k(x) \in P_k$ be that set containing $x$ and $\phi(x)$ be the point in $\mathbb{R}^+$ corresponding to this chain. This defines $\phi$. As $\mu_0(A) = 0$ implies $A \cap X_0 = \emptyset$, $\phi(A) \cap I(A) \ne \emptyset$ for any set $A$. It follows that if $A \in P_{j_0}$, [...] (2.5) (as $\mu_0$ is non-atomic, for $A \in P_j$, $\ell(I(A))$ goes to 0 uniformly as $j \to \infty$, and $\phi(A) \cap I(A_i) \ne \emptyset$ if $A_i \subset A$). As
\[ \mu(A)(1 + \varepsilon_i) < \ell\Bigl(\bigcup_{A_i \in P_i,\ A_i \subset A} I(A_i)\Bigr) < \mu(A)(1 + 2^{-i}), \tag{2.6} \]
[...] Since $\overline{\phi(A)}$ and $\overline{\phi(A^c)}$ are disjoint for $A \in P_i$, these sets are both open and closed in the relative topology on $Z = \overline{\phi(X)}$. The construction of the map $\phi$ is simply modelling $\mu_0$ and $X$ by a Cantor-like set in $\mathbb{R}$ of Lebesgue measure 1.

Exercise 2.3 State and prove a modified version of this theorem that includes the possibility of atoms.
2.3 Lebesgue spaces
We call $X$, with generating tree of partitions $\{P_i\}$ and non-atomic set function $\mu_0$, Lebesgue if
\[ \ell\bigl(\overline{\phi(X_0)} \setminus \phi(X_0)\bigr) = 0. \tag{2.7} \]
In more technical terms Lebesgue means that
(1) for any $\varepsilon > 0$ there is a set $E(\varepsilon) = \bigcup_{i=1}^{\infty} A_i$, where each $A_i \in P_j$ for some $j$ and the $A_i$ are pairwise disjoint with $\sum_i \mu_0(A_i) < \varepsilon$; and
(2) for any chain $\mathcal{C} = \{c_i\}$ with $\bigcap c_i = \emptyset$, once $i$ is sufficiently large, $c_i \subset E(\varepsilon)$, (2.8)
or the empty chains have measure 0.

Exercise 2.4 The definition above of a non-atomic Lebesgue space is given in two forms (2.7) and (2.8). Show that they are equivalent. Hint: first show that any union of sets in a tree can be written as a disjoint union.

Exercise 2.5 Both Examples (1) and (2) are Lebesgue.
At this point being Lebesgue seems to depend critically on the choice of $\{P_i\}$. We proceed to eliminate this artifact. From now on we assume $X$, $\{P_i\}$, $\mu_0$ is Lebesgue, and non-atomic unless otherwise stated. In $Z = \overline{\phi(X)}$ we have the $\sigma$-algebra of Lebesgue measurable sets $\mathcal{F}$. Let $\mathcal{A}$ be the inverse images in $X$ of such sets, a $\sigma$-algebra in $X$ containing all the $P_i$. For any $A \in P_i$, $\phi(A)$ is Lebesgue measurable as $\mathcal{F}$ is complete. Remember (Royden 1968) $\mathcal{F}$ consists of those sets whose outer and inner measures are equal. For $A \in P_i$, $\phi(A)$ is within measure 0 of $\overline{\phi(A)}$. These closed and open sets generate the topology on $Z$. If we define $\mathcal{C}(S)$ to be the set of all coverings of $S$ by disjoint unions of sets $A$, each an element of some $P_j$, and then define
\[ \mu^*(S) = \inf_{C \in \mathcal{C}(S)} \sum_{A \in C} \mu_0(A) = \ell^*(\phi(S)), \tag{2.9} \]
then for $S \in \mathcal{A}$,
\[ \mu^*(S) = \ell(\phi(S)) = 1 - \ell(\phi(S^c)) = 1 - \mu^*(S^c), \]
and if $\mu^*(S) = 1 - \mu^*(S^c)$ then $\ell^*(\phi(S)) = 1 - \ell^*(\phi(S^c))$ and $S \in \mathcal{A}$. Thus
\[ \mathcal{A} = \{S : \mu^*(S) = 1 - \mu^*(S^c)\} \tag{2.10} \]
and we write $\mu(S)$ for $\ell(\phi(S)) = \mu^*(S)$. The extension of $\mu_0$ to all of $\mathcal{A}$ under the Lebesgue hypothesis is a version of the Kolmogorov extension theorem (Chung 1968). It can, of course, be done directly in terms of outer and inner measure. The value of our approach via the injection $\phi$ to $\mathbb{R}$ is that we can now use the very tight connection in $\mathbb{R}$ between geometry and measure.

We want to see that if one generating tree of partitions and additive set function $\mu_0$ yields a Lebesgue space, then any other choice for a tree from $\mathcal{A}$ is also Lebesgue. If $(X, \mathcal{A}, \mu)$ has a choice for $X$, $\{P_i\}$, $\mu_0$ making it Lebesgue, we call $(X, \mathcal{A}, \mu)$ a Lebesgue space. Let $\{Q_i\}$ be a tree of finite partitions, not necessarily generating, made of sets in $\mathcal{A}$. We first create a new space $\bar{X}$ on which it does generate. We say $x_1 \sim x_2$ if for any $i$ and $S \in Q_i$, if $x_1 \in S$ then $x_2 \in S$. This is an equivalence relation. Let $\bar{X}$ be the space of equivalence classes of $\sim$. The $Q_i$ can be thought of as partitions of $\bar{X}$ and on this space, $\{Q_i\}$ generates.
Theorem 2.2 If $(X, \mathcal{A}, \mu)$ is Lebesgue and non-atomic and $\{Q_i\}$ is any tree of partitions from $\mathcal{A}$ with $\mu(S) > 0$ for all $S \in Q_i$, then $(\hat X, \{Q_i\}, \mu)$ is Lebesgue.

Proof What we must show is (2.8): that given any $\varepsilon > 0$, we can find a countable disjoint collection of sets $B_i$, each an element of some $Q_j$, so that $\sum_i \mu(B_i) < \varepsilon$ and for any chain $\mathcal{C} = \{c_i\}$ from the tree $\{Q_i\}$ with $\bigcap_i c_i = \emptyset$, $c_k \subset B_j$ for some $k$. We fix $\varepsilon$. We will work in $Z$, the image space constructed in Theorem 2.1 using the Lebesgue space $(X, \{P_i\}, \mu_0)$ we know exists. For each set S E Qj we construct a closed subset D(S) c 1 - 10/2.
This exists as
Assuming we have assigned closed sets D(S) through S C S' E Qk' select a closed set (perhaps 0) D(S)
c
Qb
if S E Qk+I'
with t(D(S))
~ t(
Us
Let if;k = e Qk D(S), a finite disjoint union of closed sets. Clearly if; is closed and contained in Z. Furthermore t(if;)
~
(2.11 )
1) .
f (1 - 210k) > 1- e.
=
nif;k
(2.12)
k=1
Let Y' = {S E Ql' some i:
U
SeY'
s)
=
~ fJ.(S;) < 1 -
t(if;) <
8.
(2.13)
,
We want to see that these cover the empty chains. If r/ = {cd is a chain c
{Qd with n cj = 0, as D(c i )
o
in =
=
_
Exercise 2.6 State and prove a modified version of this theorem that includes the possibility that $\{Q_i\}$ has atoms.

Knowing that $(\hat X, \{Q_i\}, \mu)$ is Lebesgue, we can apply Theorem 2.1 to map the equivalence classes $\hat X$ into a compact subset of $\mathbb{R}$ by a map $\phi$. The completion with respect to $\mu$ of the $\sigma$-algebra of sets $\phi^{-1}(\text{Lebesgue set})$ we write as $\bigvee_{i=1}^\infty Q_i$, and call such sets $\{Q_i\}$ measurable. We call the measure space $(\hat X, \bigvee_{i=1}^\infty Q_i, \mu)$ a factor of $(X, \mathcal{F}, \mu)$. We next want to see that generating is all we need in order to specify the Lebesgue sets, i.e., two different generating trees of partitions yield the same $\sigma$-algebra of measurable sets.

Theorem 2.3 If $(X, \mathcal{A}, \mu)$ is Lebesgue and non-atomic and $\{Q_i\}$ is a generating tree of partitions then $\bigvee_{i=1}^\infty Q_i = \mathcal{A}$.

Proof From the previous theorem we know that there is a decreasing sequence of sets $E_i$, each of which is a countable disjoint union of sets from various $Q_j$'s, $\mu(E_i) \downarrow 0$ and each $E_i$ covers the empty $\{Q_i\}$ chains. Let $\{P_i\}$ be the partitions generating $\mathcal{A}$. Let $H_i = P_i \vee Q_i \vee \big(\bigvee_{j=1}^{i} (E_j, E_j^c)\big)$, the minimal partition for which $P_i$, $Q_i$ and $E_j$, $j = 1, \ldots, i$ are all finite unions of elements of $H_i$. Delete from $X$ all sets $S \in H_i$ with $\mu(S) = 0$, a set of total measure 0, leaving $X' \subseteq X$, a measurable subset of full measure. Now $X'$, $\{P_i\}$, $\mu_0$ is still Lebesgue and showing $\{Q_i\}$ generates $\mathcal{A}$ on $X'$ implies it does on $X$. In this case, however, for all $S \in H_i$, $\mu(S) > 0$ and without loss of generality we write $X$ for $X'$. We know $X$, $\{H_i\}$, $\mu$ is Lebesgue and non-atomic. Using Theorem 2.1, if $\phi$ is the injection to $\mathbb{R}$ constructed using $\{H_i\}$, $\mu$ then $\overline{\phi(X)} = Z$ is a compact totally disconnected subset $Z \subseteq \mathbb{R}^+$. For all sets $S \in H_i$, $\overline{\phi(S)}$ is a compact open subset of $Z$ in the relative topology. Hence $E = \bigcap_i \overline{\phi(E_i)}$ is a closed subset of measure 0 in $Z$, as $E_i$ is a finite union of elements of $H_i$. As $\{Q_i\}$ generates, if $z_1, z_2 \in \phi(X)$, there are disjoint sets $S_1, S_2 \in Q_i$ for some $i$ with
$$z_1 \in \phi(S_1), \quad z_2 \in \phi(S_2). \tag{2.14}$$
Note: SI and S2 are each finite disjoint unions of elements of Hi' hence ,p(SI)' ,p{S2) are disjoint. Notice Ee n Z c: Range(t,b). If ZI E E and Z2 E E' n Z we claim there must be disjoint SI, S2 E Qi for some i with
(2.15) Otherwise ZI ahd Z2 are always in the same ,p{SJ, Si E Qi. As ZI E E, this forces E, as its Q/ chain is always contained in ,p(E;). Let C be a closed subset of Z. We want to show
SdZI,Z2), S2(ZI,Z2) E Qi for some i and ZI
E
t;b(SI(ZI'ZZ»'
Z2
E
t;b(S2(ZI,Z2»·
Fix $z_2$ and consider $\{\overline{\phi(S_1(z_1, z_2))} \mid z_1 \in C\}$, an open cover of $C$. As $C$ is compact, there is a finite subcover. Let $\overline{\phi(S(z_2))}$ be the union of elements in this finite collection of sets, again an open compact subset of $Z$, and $z_2 \notin \overline{\phi(S(z_2))}$. Now $S(z_2)$ is a finite union of elements of some $Q_i$. Let
$$\hat C = \bigcap_{z_2 \in E^c \cap C^c} \overline{\phi(S(z_2))}, \tag{2.16}$$
at most a countable intersection as $S(z_2)$ is an element of a countable algebra of sets. Hence $\phi^{-1}(\hat C) \in \bigvee_{i=1}^\infty Q_i$. But $C \subset \hat C \subset C \cup E$ and as $\bigvee_{i=1}^\infty Q_i$ is complete $\phi^{-1}(C) \in \bigvee_{i=1}^\infty Q_i$ and we are done. ∎

Note: For our purposes, then, a Lebesgue probability space is just a measurable subset of measure 1 of a compact totally disconnected subset of $\mathbb{R}$. This theorem gives us the following corollary.

Corollary 2.4 If $(X, \mathcal{A}, \mu)$ and $(X', \mathcal{A}', \mu')$ are both Lebesgue probability spaces, and $\phi : X \to X'$ is 1-1, onto and measurable, and furthermore $\mu(\phi^{-1}(A)) = 0$ only if $\mu'(A) = 0$ (called non-singularity), then $\phi^{-1}$ is also measurable.
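Proof Let $\{P_i\}$ be a generating tree of measurable partitions for $(X', \mathcal{A}', \mu')$. As $\phi$ is 1-1, onto and non-singular, $\{Q_i\} = \{\phi^{-1}(P_i)\}$ is a generating tree of partitions in $\mathcal{A}$. Hence
$$\phi^{-1}(\mathcal{A}') = \phi^{-1}\Big(\bigvee_{i=1}^\infty P_i\Big) = \bigvee_{i=1}^\infty \phi^{-1}(P_i) = \bigvee_{i=1}^\infty Q_i = \mathcal{A}.$$
Thus for $S \in \mathcal{A}$, $\phi(S) \in \mathcal{A}'$ and $\phi^{-1}$ is measurable. ∎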
We will indicate a Lebesgue space by a triple $(X, \mathcal{F}, \mu)$ where $\mathcal{F}$ is the $\sigma$-algebra of measurable sets and $\mu$ is the measure. It will be useful, at times, to select a particular generating tree of partitions and assume they are open compact subsets. In this sense the significance of our work in the previous arguments is not so much to show the existence of one identification of our space with a subset of $\mathbb{R}^+$, but rather to understand their multitude.

Exercise 2.7 If $(X, \mathcal{F}, \mu)$ is a non-atomic Lebesgue space, then there is a subset $X_0 \subseteq X$, with $X_0^c$ a countable set and a 1-1 measure-preserving map $\phi : X_0 \to (0, 1)$.
Exercise 2.8 Modify the previous exercise to include the possibility of atoms.
Exercise 2.9 Returning to Example 3 of Chapter 1, show that $\mu_p$, given as a finitely additive set function on cylinder sets, makes $\Sigma_M$ a Lebesgue space.

Exercise 2.10 Returning to Example 4 of Chapter 1, show that Haar measure on a compact abelian monothetic metric group makes it a Lebesgue space.

Having seen what a Lebesgue probability space is, it behooves us to consider what can fail in the arguments we have given if less is assumed. On a space $X$ let $\{P_i\}$ be a generating tree of partitions, and $\mu_0$ an additive set function on the elements of the partition. In Theorem 2.1 we saw that this is enough to construct a 1-1 map $\phi$ from $X$ to some subset $\mathrm{Range}(\phi) \subset \mathbb{R}$. We know $\ell(\overline{\mathrm{Range}(\phi)}) = 1$. As $\phi$ is 1-1, we may assume $X = \mathrm{Range}(\phi) \subset \mathbb{R}$. We know $\mu_0$ can be extended to a measure $\mu$ making $(X, \mathcal{F}, \mu)$ Lebesgue if
$$\ell(\overline{X} \setminus X) = 0. \tag{2.17}$$
This can fail to occur at two levels: first, $\mu_0$ might not be extendable to a measure $\mu$, and second, $(X, \mathcal{F}, \mu)$ might not be Lebesgue. Here are examples of both possibilities. Let $X \subset [0, 1]$ be a measurable subset so that on any interval $I$,
$$\ell(I) > \ell(I \cap X) > 0.$$
Let $P_i$ be the partition of $X$ into intervals of width $2^{-i}$, and for $S_{i,j} \in P_i$, $\mu_0(S_{i,j}) = 2^{-i}$. In this case the additive set function $\mu_0$ cannot be extended to a measure, as $X$ can be covered by a countable union of dyadic intervals, the sum of whose measures is strictly less than 1.

For a second example, let $X \subset [0, 1]$ be a non-measurable subset of outer measure 1. Again let $P_i$ be the partition of $X$ into intervals of width $2^{-i}$ and set $\mu(S) = \ell(A)$, where $A$ is Lebesgue measurable and $S = A \cap X$. That $X$ is of full outer measure implies this definition is independent of the choice of $A$. The measure space is, though, not Lebesgue as
$$\ell^*(\overline{\mathrm{Range}(\phi)} \setminus \mathrm{Range}(\phi)) = \ell^*(X^c \cap [0, 1]) > 0, \tag{2.18}$$
(the map $\phi$ will, in fact, be 1-1 and measure-preserving on $[0, 1]$). This example is the general form of a separable probability space. A probability space could also fail to be Lebesgue if no tree of partitions could generate, i.e., if the cardinality of $X$ were larger than $c$.
We also have seen in Theorem 2.3 that in a Lebesgue space for a tree of partitions to separate points is equivalent to generating the measure algebra. The following example shows that this also can fail in the non-Lebesgue case.

Exercise 2.11 Let $A \subseteq [0, \tfrac12)$ be a subset with $\ell^*(A) = \tfrac12$ and $\ell^*(A^c \cap [0, \tfrac12)) = \tfrac12$. Let $X = A \cup (A^c + \tfrac12) \subseteq [0, 1]$. Define $\mathcal{F} = \{X \cap E : E \text{ is Lebesgue measurable}\}$ and $\mu(X \cap E) = \ell(E)$ as in the above example.
(1) Show $\mu$ is well defined;
(2) exhibit a tree of partitions $\{P_i\} \subseteq \mathcal{F}$ that separates points of $X$ but does not generate the full measure algebra $\mathcal{F}$.

At the other extreme from these examples is a case which cannot fail to be Lebesgue. This is the case when our generating tree of partitions has no empty chains. For example the Cantor-like set $Z$ constructed in Theorem 2.1 has this property, as do mixing Markov chains, for the given tree of partitions. In such cases all additive set functions on the tree extend to measures. One can topologize such a space, taking as a base for the open sets the sets in the tree. The topology is totally disconnected, metrizable and, as there are no empty chains, compact. The Borel measures on this space are, by the Riesz representation theorem, the dual of the continuous functions. In this particular case the Riesz theorem is easily proven, as the characteristic functions of sets in the tree are continuous. Borel probability measures on this space are exactly the ones which come from additive set functions on the tree and are the same as positive normed linear functionals on the continuous functions. This space of Borel probability measures is itself a compact metric space, as the dual of the continuous functions. We will see much more of this in Chapter 6.
2.4 A martingale theorem and conditional expectation
As an application of our Lebesgue space development we will prove a martingale theorem. The result is quite standard and our proof is not particularly special. It is, though, via a covering argument and the Vitali lemma. Such covering arguments are the core of our approach to many later results and we wish to stress the analogy and so include the complete argument.

Lemma 2.5 (Vitali covering lemma) Let $S \subset [0, 1]$ be Lebesgue measurable and suppose that for each $x \in S$ there is given a nested sequence of intervals $I_i(x)$ which intersect to $x$. Then there is a countable disjoint collection $I_1, I_2, \ldots$ where $I_k = I_{i(k)}(x(k))$ for some $i(k)$, $x(k) \in S$ which covers almost all of $S$, i.e.,
$$\ell\Big(S \setminus \bigcup_k I_k\Big) = 0. \tag{2.19}$$
Proof Select the $I_k$ inductively with
$$\text{length}\, I_{k+1} > \tfrac12 \sup\{\text{length}\, I_j(x) \mid I_j(x) \text{ disjoint from } I_1, I_2, \ldots, I_k\},$$
with $I_{k+1}$ disjoint from $I_1, \ldots, I_k$. Either this process terminates after finitely many steps, in which case we are finished, or we get a sequence of disjoint intervals $I_1, I_2, \ldots$. The length of $I_i \to 0$ as $i \to \infty$. In fact, the sum of the lengths of all the $I_i$ which are selected is bounded by 1. Let $I_i'$ be $I_i$ expanded symmetrically by a fraction $\varepsilon$ of its length. So the length of $I_i' = (1 + \varepsilon) \times \text{length}\, I_i$. Let $\hat I_i$ be $I_i$ expanded (symmetrically) 5 times its length.

Claim. For all $t \ge 2$, all $\varepsilon > 0$,
$$S \subset \bigcup_{i=1}^{t} I_i' \cup \bigcup_{i=t+1}^{\infty} \hat I_i. \tag{2.20}$$
Otherwise there is some $x \in S$ not in the right hand side. So $x \notin \bigcup_{i=1}^{t} I_i'$, and there is some $I_j(x)$ disjoint from $I_1, \ldots, I_t$. If $I_j(x)$ is also disjoint from $I_{t+1}, I_{t+2}, \ldots$ then we get a contradiction to the selection of the $I_i$ (length $I_j(x)$ > twice the length $I_t$ for large enough $t$). If $I_s$ is the first of $I_{t+1}, I_{t+2}, \ldots$ which intersects $I_j(x)$, then $x \notin \hat I_s$ and so $\text{length}(I_j(x)) > 2\, \text{length}(I_s)$ and this is another contradiction to the selection of $I_s$. Therefore (2.20) is true so
$$\ell\Big(S \setminus \bigcup_{i=1}^{t} I_i\Big) \le \ell\Big(\bigcup_{i=1}^{t} [I_i' \setminus I_i]\Big) + \ell\Big(\bigcup_{i=t+1}^{\infty} \hat I_i\Big) < \varepsilon + \varepsilon$$
for large $t$. ∎
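The greedy selection in this proof can be tried out on a finite family of intervals. The following is only a sketch under simplifying assumptions that are not in the text: the family is finite, the intervals are closed with rational endpoints, and the longest available interval is always chosen (rather than one of at least half the supremum). Under that stricter rule a 3-fold expansion already covers every interval of the family; the factor 5 in the proof pays for the weaker "half the supremum" rule. The names `greedy_disjoint` and `expand` are hypothetical helpers.

```python
from fractions import Fraction

def greedy_disjoint(intervals):
    """Greedy Vitali-style selection from a finite family of closed intervals.

    Intervals are (a, b) pairs with a <= b.  The longest interval that is
    disjoint from everything chosen so far is picked at each step.
    """
    chosen = []
    by_length = sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True)
    for a, b in by_length:
        # Closed intervals [a, b] and [c, d] are disjoint iff b < c or d < a.
        if all(b < c or d < a for c, d in chosen):
            chosen.append((a, b))
    return chosen

def expand(interval, factor):
    """Expand an interval symmetrically about its centre to `factor` times its length."""
    a, b = interval
    centre = (a + b) / 2
    half = (b - a) / 2
    return (centre - factor * half, centre + factor * half)

# A small illustrative family (an assumption, chosen only for the demo).
family = [(Fraction(k, 40), Fraction(k + w, 40)) for k in range(36) for w in (1, 2, 4)]
selected = greedy_disjoint(family)
```

Every interval that is rejected meets an earlier, at least as long, selected interval, which is why it lies inside that interval's 3-fold expansion.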
Let $\{Q_i\}$ be a tree of partitions for $X$, not necessarily generating. Recall we can construct a space $X'$ of equivalence classes on which $\{Q_i\}$ generates and is Lebesgue. The purpose of the martingale theorem we now want to prove is to project $L^1(\mu)$ on $X$ to $L^1(\mu')$ on $X'$. Such a projection is called a conditional expectation. Here are some simple examples of such projections.

Example 1 Suppose $(X, \mathcal{F}, \mu)$ is the unit interval with Lebesgue measure, and $\mathcal{F}' = \{[0, \tfrac12), [\tfrac12, 1), [0, 1), \emptyset\}$. The factor space $X'$ consists of two points, call them $\{R, B\}$. If $f \in L^1(\mu)$ we define
$$(pf)(x') = \begin{cases} 2\int_{[0, \frac12)} f \, d\mu & \text{if } x' = R, \\ 2\int_{[\frac12, 1)} f \, d\mu & \text{if } x' = B. \end{cases}$$
Then for any $A \subseteq X'$ (i.e., $A \subseteq \{R, B\}$) we have
$$\int_A p(f) \, d\mu' = \int_{p^{-1}(A)} f \, d\mu.$$
Example 2 Let $\mathcal{F}'$ be 'doubled sets' in $[0, 1]$, i.e., if $x \in B$, $B \in \mathcal{F}'$, then $x \pm \tfrac12 \in B$ for the appropriate $\pm$ sign. The equivalence classes of $[0, 1]$ mod $\mathcal{F}'$ consist of pairs of points $\{x, y\}$ where $|x - y| = \tfrac12$. If $f \in L^1([0, 1])$ let $(pf)(\{x, y\}) = \tfrac12 (f(x) + f(y))$ and for $A \in \mathcal{F}'$,
$$\int_A p(f) \, d\mu' = \int_{p^{-1}(A)} f \, d\mu.$$
So $p : L^1(\mu) \to L^1(\mu')$ isometrically.
Let $\{Q_i\}$ be a tree of partitions in a Lebesgue probability space $(X, \mathcal{F}, \mu)$. Let $f \in L^1_+(\mu)$, the positive integrable functions. Consider the space $(X', \mathcal{F}', \mu')$ which $\{Q_i\}$ generates. We want to project $f$ to an $f' \in L^1(\mu')$ so that for any $S \in \mathcal{F}'$,
$$\int_S f' \, d\mu' = \int_S f \, d\mu. \tag{2.21}$$
We call $f'$ the conditional expectation of $f$ given $\mathcal{F}'$, and will write it $E(f \mid \mathcal{F}')$. For a finite partition $Q$ we can easily define the conditional expectation of $f$ given $Q$ as
$$f_Q(x) = \frac{1}{\mu(S)} \int_S f \, d\mu, \quad \text{where } x \in S \in Q. \tag{2.22}$$
If $\{Q_i\}$ is a tree of partitions, we see each $f_{Q_i}$ is a simple function constant on each $S \in Q_i$. Further, for $S \in Q_i$ and $j \ge i$,
$$\int_S f_{Q_i} \, d\mu = \int_S f_{Q_j} \, d\mu. \tag{2.23}$$
In fact, we need not assume $f_{Q_i}$ actually arises from some original $f \in L^1_+$; only these last conditions (2.23) need be assumed to get the result we want. We call $\{f_i\}$, $\{Q_i\}$ a martingale if $\{Q_i\}$ is a tree of partitions, $f_i$ is constant on sets $S \in Q_i$ and for $S \in Q_i$ and $j \ge i$,
$$\int_S f_i \, d\mu = \int_S f_j \, d\mu. \tag{2.24}$$
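The martingale identity can be checked exactly on a discretized model. The sketch below makes assumptions that are not in the text: $X$ is replaced by a grid of $n$ equal cells, each of measure $1/n$, the tree is the dyadic one (level $i$ has $2^i$ blocks of consecutive cells), and the helper names `conditional_expectation` and `integral` are hypothetical. Exact rational arithmetic is used so that equalities hold without rounding.

```python
from fractions import Fraction

def conditional_expectation(values, level):
    """Average `values` over each of the 2**level dyadic blocks of the grid.

    Returns a list of the same length that is constant on every block,
    i.e. the function f_Q of (2.22) for the dyadic partition at this level.
    """
    n = len(values)
    blocks = 2 ** level
    if n % blocks != 0:
        raise ValueError("grid size must be divisible by the number of blocks")
    size = n // blocks
    out = []
    for b in range(blocks):
        cell = values[b * size:(b + 1) * size]
        mean = sum(cell, Fraction(0)) / size
        out.extend([mean] * size)
    return out

def integral(values, start, stop):
    """Integral over grid cells start..stop-1, each cell having measure 1/len(values)."""
    return sum(values[start:stop], Fraction(0)) / len(values)

# Demo data (an assumption): f(x) = x^2 sampled on 64 cells.
n = 64
f = [Fraction(k * k, n * n) for k in range(n)]
f2 = conditional_expectation(f, 2)   # constant on the 4 blocks of level 2
f5 = conditional_expectation(f, 5)   # constant on the 32 blocks of level 5
```

For each block $S$ of level 2, `integral(f2, ...)` and `integral(f5, ...)` over $S$ agree, which is the discrete form of the condition for $j \ge i$.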
This notion of a martingale is much stronger than is standard, and so our martingale convergence theorem is a much weakened version of Doob's martingale theorem (Chung 1968).

Theorem 2.6 If $\{f_i\}$, $\{Q_i\}$ is a martingale, $f_i \ge 0$, then $f_i$ converges a.e. to a function $f \in L^1(\mu)$, and for any $\{Q_i\}$ measurable set $S$
$$\lim_{i \to \infty} \int_S f_i \ge \int_S f. \tag{2.25}$$
Notice, as each $f_i$ is $Q_i$ measurable, we can assume without loss of generality $\{Q_i\}$ generates.

Proof First, we show that $S = \{x : f_i(x) \to \infty\}$ has measure 0. $S$ is measurable so for any $\varepsilon > 0$, we can find a finite disjoint union of sets $S_j \in Q_i$ for some $i$ with $\mu(\bigcup_j S_j \,\triangle\, S) < \varepsilon$. As $\bigcup_j S_j$ is a finite union, once $i$ is large enough
$$\int_{\bigcup_j S_j} f_i \, d\mu$$
is a constant in $i$. But now, by Fatou's lemma,
$$\int_{S \cap \bigcup_j S_j} \liminf_i f_i \, d\mu \le \liminf_i \int_{\bigcup_j S_j} f_i \, d\mu < \infty$$
and we conclude $\mu(S \cap \bigcup_j S_j) = 0$ and $\mu(S) < \varepsilon$. Thus $\mu(S) = 0$.

Now to show $f_i$ converges pointwise a.e. to a limit. It will be convenient to transport our construction to $\mathbb{R}$, using the $\{Q_i\}$ trees, so that each set $\phi(S)$, $S \in Q_i$, is an open compact set, with $\phi(S) \subset I(S)$ an interval of length less than $\mu(S)(1 + 2^{-i})$. Since $\{Q_i\}$ may contain atoms the tree of intervals may have branches which descend to a set of positive measure. Such a set will be a closed interval. Thus $\phi$ will not be defined pointwise on atoms of $\mu$, but will map the atom to the whole interval as a set map. (If you did not solve Exercise 2.3, take this as a hint and do so now.) We may define $\bar f_i$ on $\phi(X)$, by transporting to each $I(S)$ the value of $f_i$ on $S \in Q_i$. Clearly if $\bar f_i$ converges a.e. on $\phi(X)$ then $f_i$ does on $X$.

Note: If $C_1 \supset C_2 \supset \cdots$ is a branch of $\{Q_i\}$ with $\lim_{i \to \infty} \mu(C_i) > 0$, i.e., an atom, then clearly $f_i(x)$ converges as $i \to \infty$ for all $x \in \bigcap_i C_i$ as
$$f_{i+1}(x) \le \frac{\mu(C_i)}{\mu(C_{i+1})} f_i(x), \tag{2.26}$$
i.e., $f_i(x)$ is nearly monotone decreasing. Thus we need only work on the non-atomic part. Pick $a > \varepsilon > 0$ and let
$$S_{a,\varepsilon} = \{z \in \phi(X) : \bar f_i(z) > a + \varepsilon \text{ infinitely often and } \bar f_i(z) < a - \varepsilon \text{ infinitely often}\}.$$
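$\bar f_i(z)$ converges a.e. iff $\ell(S_{a,\varepsilon}) = 0$ for all $a$, $\varepsilon$. Notice also that if $\bar f_i(z)$ converges then $z \notin S_{a,\varepsilon}$ for any $a$, $\varepsilon$. By our note if $z \in S_{a,\varepsilon}$ then $\phi^{-1}(z)$ is not in an atom of $\mu$. Thus, for every $z \in S_{a,\varepsilon}$ we can select intervals $I_i(z)$ from the construction of $\phi$ such that $I_{i+1}(z) \subseteq I_i(z)$, $\ell(I_i(z)) \to 0$ and for each $i$, $z \in I_i(z)$, where $I_i(z)$ corresponds to some $S \in Q_{j(i,z)}$, so that if $j \ge j(i, z)$ then

(1) if $i$ is even
$$\int_{I_i(z) \cap \phi(X)} \bar f_{j(i,z)} \, d\ell = \int_{I_i(z) \cap \phi(X)} \bar f_j \, d\ell > \Big(a + \frac{\varepsilon}{2}\Big) \ell(I_i(z)) \tag{2.27}$$
and (2) if $i$ is odd then
$$\int_{I_i(z) \cap \phi(X)} \bar f_{j(i,z)} \, d\ell = \int_{I_i(z) \cap \phi(X)} \bar f_j \, d\ell < \Big(a - \frac{\varepsilon}{2}\Big) \ell(I_i(z)). \tag{2.28}$$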
We accomplish this by alternately selecting intervals from the two available infinite sequences for $z \in S_{a,\varepsilon}$. We successively apply the Vitali lemma, first to select a cover of $S_{a,\varepsilon}$ a.s. by a disjoint collection of intervals of even index. Call the union of these intervals $U_1$. We restrict the odd intervals to only those contained in $U_1$, and select a cover a.s. of $U_1 \cap S_{a,\varepsilon}$ by disjoint intervals of odd index. Call the union of these intervals $D_2$. Restrict the even intervals to only those contained in $D_2$ to select a cover for $D_2 \cap S_{a,\varepsilon}$ a.s. by disjoint even intervals. Call the union of these intervals $U_2$. Continue ad infinitum to build countable disjoint unions of alternately even and odd index
$$U_1 \supseteq D_2 \supseteq U_2 \supseteq D_3 \cdots$$
and $\bigcap_i D_i = \bigcap_i U_i$ contains $S_{a,\varepsilon}$ a.s. Thus
$$\ell\Big(\bigcap_i D_i\Big) = \ell\Big(\bigcap_i U_i\Big) \ge \ell(S_{a,\varepsilon}). \tag{2.29}$$
Each $U_i$ and $D_j$ is a countable union of intervals $I_j(z)$. We can inductively define a sequence of sets $\bar U_i \subseteq U_i$, $\bar D_i \subseteq D_i$, $\bar U_1 \supseteq \bar D_2 \supseteq \bar U_2 \cdots$ where each is a finite disjoint union of such $I_j(z)$, $j$ even for $U$, odd for $D$, and we still have
$$\ell\Big(\bigcap_i \bar D_i\Big) = \ell\Big(\bigcap_i \bar U_i\Big) \ge \ell(S_{a,\varepsilon})/2. \tag{2.30}$$
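As $\bar U_i$ is a finite disjoint union of intervals $I_j(z)$, $j$ even,
$$\lim_{j \to \infty} \int_{\bar U_i} \bar f_j \, d\ell \ge \Big(a + \frac{\varepsilon}{2}\Big) \ell(\bar U_i) \tag{2.31}$$
and as $\bar D_i$ is a finite disjoint union of intervals $I_j(z)$, $j$ odd,
$$\lim_{j \to \infty} \int_{\bar D_i} \bar f_j \, d\ell \le \Big(a - \frac{\varepsilon}{2}\Big) \ell(\bar D_i), \tag{2.32}$$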
hence $(a + \varepsilon/2)\ell(\bar U_i) \le (a - \varepsilon/2)\ell(\bar D_i)$. Taking limits over $i$, $\ell(\bar U_i)$ and $\ell(\bar D_i)$ converge to the same value, which must be 0. Thus $\ell(S_{a,\varepsilon}) = 0$. This gives us pointwise convergence of $\bar f_i$ a.e. to a function $\bar f$. Hence on $X$, $f_i$ converges pointwise a.e. to a function $f$. For any $\bigvee_{i=1}^\infty Q_i$ measurable set $S$,
$$\lim_{i} \int_{\phi(S)} \bar f_i \ge \int_{\phi(S)} \bar f$$
by Fatou's lemma. Hence
$$\lim_{i} \int_S f_i \ge \int_S f.$$
∎
We will usually want more than pointwise convergence. The easiest condition to use that implies $L^1$ convergence also, and in fact the most general, is uniform integrability. Let $S(B, i) = \{x : |f_i(x)| \ge B\}$. If $f_i$ is integrable, then
$$\lim_{B \to \infty} \int_{S(B, i)} |f_i| \, d\mu = 0.$$
We say the sequence $f_i$ is uniformly integrable if this limit is uniform in $i$.

Theorem 2.7 If $f_i \ge 0$ and converge pointwise a.e. to $f$ on the Lebesgue probability space $(X, \mathcal{F}, \mu)$, then $f_i \to f$ in $L^1$ iff the $f_i$ are uniformly integrable.

Proof The only if direction we leave as an exercise for the reader, as the if direction is all we ever use. By Fatou's lemma, $f \in L^1$. Thus given $\varepsilon$, there is a $\delta$ so that if $\mu(E) < \delta$ then
$$\int_E f \, d\mu < \frac{\varepsilon}{5}.$$
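As the $f_i$ are uniformly integrable,
$$\mu(S(B, i)) \le \frac{1}{B} \int_{S(B, i)} |f_i| \, d\mu \to 0$$
in $B$, uniformly in $i$. Select $B$ so large that for all $i$, $\mu(S(B, i)) < \delta$, and
$$\int_{S(B, i)} |f_i| \, d\mu \le \frac{\varepsilon}{5}.$$
Select $i$ so large that $\mu(\{x : |f_i(x) - f(x)| > \varepsilon/5\}) < \min(\varepsilon/5B, \delta)$. For $i$ this large,
$$\int |f_i - f| \, d\mu \le \int_{\{x : |f_i(x) - f(x)| \le \varepsilon/5\}} |f_i - f| \, d\mu + \int_{\{x : |f_i(x) - f(x)| > \varepsilon/5\} \cap S(B, i)^c} |f_i - f| \, d\mu + \int_{S(B, i)} |f_i - f| \, d\mu$$
$$\le \frac{\varepsilon}{5} + \int_{\{x : |f_i(x) - f(x)| > \varepsilon/5\} \cap S(B, i)^c} |f_i - f| \, d\mu + \frac{2\varepsilon}{5}$$
$$\le \frac{4}{5}\varepsilon + B \mu(\{x : |f_i(x) - f(x)| > \varepsilon/5\}) \le \varepsilon.$$
∎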
Letting $\varepsilon \to 0$ we get the result.

Corollary 2.8 Let $f \in L^1_+$ and $\{Q_i\}$ be a tree of partitions. If
$$f_i(x) = \frac{1}{\mu(S)} \int_S f \, d\mu$$
for $x \in S \in Q_i$, then $\{f_i\}$, $\{Q_i\}$ is a positive martingale and $f_i$ converge pointwise and in $L^1$.
Proof That $\{f_i\}$, $\{Q_i\}$ is a positive martingale is easy. All we need check is that the $f_i$ are uniformly integrable. Let $S(B) = \{x : \sup_i f_i(x) \ge B\}$. Certainly $\mu(S(B)) \to 0$ in $B$ as the $f_i$ converge a.e., and for all $i$, $S(B, i) \subset S(B)$. Now $S(B)$ is a countable union of disjoint sets $I_i \in Q_{j(i)}$, where $x \in I_i$ iff $f_{j(i)}(x) > B$, and $j(i)$ is the first such, with
$$f_{j(i)}(x) = \frac{1}{\mu(I_i)} \int_{I_i} f \, d\mu.$$
Thus for k ::;; j(i),
(2.33) and for k > j(i) (2.34)
Thus for all i
Thus, as B -+
00,
uniformly in i.
•
In this example, the limit function is called the conditional expectation of $f$ given $\bigvee_{i=1}^\infty Q_i$. We extend the conditional expectation to all of $L^1$ by setting $E(f \mid \mathcal{A}) = E(f^+ \mid \mathcal{A}) - E(f^- \mid \mathcal{A})$. The first exercise below shows that the conditional expectation is uniquely defined.
Exercise 2.12 Let $\mathcal{A} \subset \mathcal{F}$ be a sub-$\sigma$-algebra of a Lebesgue probability space.
(a) Show that there is a tree of partitions $\{Q_i\}$ with $\mathcal{A} = \bigvee_{i=1}^\infty Q_i$, and set $E(f \mid \mathcal{A}) = E(f \mid \bigvee_{i=1}^\infty Q_i)$, an $\mathcal{A}$-measurable function.
(b) Show that for any set $A \in \mathcal{A}$,
$$\int_A E(f \mid \mathcal{A}) = \int_A f.$$
(c) Show that if $f_1$ and $f_2$ are both $\mathcal{A}$-measurable, and for any $A \in \mathcal{A}$, $\int_A f_1 = \int_A f_2$, then $f_1 = f_2$ a.e., and hence $E(f \mid \mathcal{A})$ is uniquely defined, independent of the $\{Q_i\}$ tree.

Exercise 2.13 Show that the construction of a conditional expectation does not require the Lebesgue property, but only separability (i.e., the existence of a generating tree of partitions). Hint: in this case, we know $X$ can be regarded as a subset of $[0, 1]$ of full outer measure and $\mathcal{F}$ the Lebesgue sets restricted to $X$.

Exercise 2.14 Show that if $g \in L^1$ is $\mathcal{A}$-measurable and $f \in L^1$ and $f \cdot g \in L^1$ then $E(f \cdot g \mid \mathcal{A}) = E(f \mid \mathcal{A}) \cdot g$.
Exercise 2.15 Prove the Radon-Nikodym theorem, that if $\omega$ is another positive measure on $(\Omega, \mathcal{F}, \mu)$, a Lebesgue probability space, and $\mu(E) = 0$ implies $\omega(E) = 0$, then there is an $L^1$ function $f : \Omega \to \mathbb{R}^+$ and
$$\omega(S) = \int_S f \, d\mu.$$

2.5 More about generating trees and dynamical systems
We saw in Theorem 2.3 that if $\{Q_i\}$ is a generating tree of partitions in a Lebesgue space, then $\bigvee_{i=1}^\infty Q_i$ was the full algebra, i.e., the tree also generated the full $\sigma$-algebra. The converse of this is also almost true.

Theorem 2.9 If $(X, \mathcal{F}, \mu)$ is a Lebesgue space, and $\{Q_i\}$ is a tree of partitions with $\bigvee_{i=1}^\infty Q_i = \mathcal{F}$, then there is a subset $X_0 \subseteq X$ of full measure, and $\{Q_i\}$ generates on $X_0$, i.e., separates points.

Proof Let $\{A_i\}$ be a countable collection of measurable sets which separate points in $X$. By Corollary 2.8 and Exercise 2.12,
$$f_{i,j}(x) = \frac{\mu(A_i \cap S)}{\mu(S)},$$
where $x \in S \in Q_j$, converges a.e. to $\chi_{A_i}$. Let
$$X_0 = \{x \in X : f_{i,j}(x) \to \chi_{A_i}(x) \text{ for all } i\}.$$
If $x_1, x_2 \in X_0$, then as $\chi_{A_i}(x_1) \ne \chi_{A_i}(x_2)$ for some $i$, $x_1$ and $x_2$ are separated by the $\{Q_i\}$ tree. ∎
Thus in a Lebesgue space we have two equivalent notions of a generating tree, that of generating the measure algebra up to null sets, and that of separating points off a null set. In light of this, we modify slightly our definition of a generating partition. If no measure is specified, or many are to be considered, then a generating tree must separate points. But if some particular measure is specified, then it need only separate points off a null set to be called a generator. As our spaces will never be of cardinality beyond the continuum, any tree which separates points off a null set can be modified by null sets to actually separate points. This is consistent with our definition of isomorphism, as we now can say if a tree generates on one space, then it generates on any isomorphic space. In situations where only one measure is being considered, no subtlety will arise in ignoring null sets in this manner. Later though, especially in Chapters 6 and 7 when we must consider spaces of measures, it will be necessary to specify a tree which generates by separating points, and not allow null sets to enter the arguments unless they are universally null, i.e., of measure 0 for all measures considered.

A dynamical system for us is a Lebesgue space $(X, \mathcal{F}, \mu)$ and a measurable measure-preserving bijection $T$ from $X$ to itself. Notice that given any finite partition $P$ we can construct a natural tree of partitions $\{\bigvee_{i=-n}^{n} T^{-i}(P)\}$ which generates the minimal $T$-invariant sub-$\sigma$-algebra $\bigvee_{i=-\infty}^{\infty} T^{-i}(P)$, containing $P$. If this $\sigma$-algebra is all of $\mathcal{F}$, we call $P$ a generator or generating partition for the dynamical system. Later, especially in Chapters 5 and 7, we will investigate this concept more deeply. If we are given a tree of partitions there is also a natural way to build a new tree that generates the minimal $T$-invariant sub-$\sigma$-algebra containing the given one.
We inductively define the elements of the tree so that each new level contains both forward and backward images of the previous level of the tree as well as the next level of the original tree. In such a tree the forward and backward images of any level are finite unions of elements of the next level. Hence the $T$-invariance of the measure can be directly read off from the measures of sets in the tree.
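The tree $\{\bigvee_{i=-n}^{n} T^{-i}(P)\}$ built from a single finite partition $P$ can be computed explicitly in a toy model. The sketch below assumes a finite set of points and a bijection $T$ given as a Python function together with its inverse; these choices, and the helper name `join_atoms`, are illustrative assumptions and not part of the text. Two points lie in the same atom of the join exactly when their $P$-labels agree along the orbit segment from time $-n$ to time $n$.

```python
from collections import defaultdict

def join_atoms(points, T, T_inv, label, n):
    """Atoms of the join of T^{-i}(P), i = -n..n, where P is encoded by `label`.

    x lies in T^{-i}(S) iff T^i(x) lies in S, so the atom of x is determined by
    the tuple (label(T^i x)) for i = -n, ..., n.
    """
    atoms = defaultdict(list)
    for x in points:
        y = x
        for _ in range(n):           # walk back to T^{-n}(x)
            y = T_inv(y)
        name = []
        for _ in range(2 * n + 1):   # read labels along T^{-n}(x), ..., T^{n}(x)
            name.append(label(y))
            y = T(y)
        atoms[tuple(name)].append(x)
    return list(atoms.values())

# Toy system (an assumption): rotation by one step on Z/16, P = two half-circles.
N = 16
points = range(N)
T = lambda x: (x + 1) % N
T_inv = lambda x: (x - 1) % N
label = lambda x: 0 if x < N // 2 else 1

coarse = join_atoms(points, T, T_inv, label, 0)   # just P itself
fine = join_atoms(points, T, T_inv, label, 4)     # separates all 16 points
```

In this toy system the level $n = 4$ already separates every pair of points, so the two-set partition acts as a generator in the point-separating sense.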
3 Ergodic theorems and ergodic decomposition
A fundamental tenet of naive probability is that the probability of an event is the average rate of occurrence of that event over time, e.g., the probability of a coin landing heads is the average number of heads achieved through many tosses. Ergodic theorems are perhaps the most natural formalization of this idea. If $(X, \mathcal{F}, \mu)$ is a Lebesgue probability space and $T$ a bimeasurable measure-preserving map of $X$ to itself, then $T$ can be viewed as the movement from one successive outcome of a random series of events to the next. A measurable function $f$ can be viewed as some particular measurement on the outcome of the event. The average value of this measurement over a span of $n$ time intervals is, then, the Cesaro average
$$A_n(f, x) = \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)). \tag{3.1}$$
(When regarding this as a function, we write it $A_n(f)$.) That these Cesaro averages converge in various senses under various hypotheses on $f$ forms the formal content of the first half of this chapter. This will put the naive probabilistic notion of average rate of occurrence on firm ground, and will also provide one of the fundamental tools of ergodic theory. We first prove perhaps the simplest of the ergodic theorems.
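As a numerical illustration of the Cesaro average, the sketch below evaluates $A_n(f, x)$ for a rotation of the unit interval. Everything specific here is an assumption made for the example: the map is $T(x) = x + \alpha \bmod 1$ with $\alpha$ the golden-ratio conjugate, $f$ is the indicator of $[0, \tfrac14)$, and `cesaro_average` is a hypothetical helper name. Floating-point arithmetic is used, so the result is approximate.

```python
import math

def cesaro_average(f, T, x, n):
    """A_n(f, x) = (1/n) * sum of f(T^i(x)) for i = 0, ..., n-1."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

ALPHA = (math.sqrt(5) - 1) / 2           # irrational rotation angle (assumed)

def rotation(x):
    """Rotation of [0, 1) by ALPHA."""
    return (x + ALPHA) % 1.0

def indicator_quarter(x):
    """Indicator function of the interval [0, 1/4)."""
    return 1.0 if x < 0.25 else 0.0

# Fraction of time the orbit of 0.1 spends in [0, 1/4) over 10000 steps.
average = cesaro_average(indicator_quarter, rotation, 0.1, 10_000)
```

For this rotation the time average is close to the measure $\tfrac14$ of the interval, the behaviour the ergodic theorems of this chapter make precise.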
3.1 Von Neumann's $L^2$-ergodic theorem

Theorem 3.1 Let $T$ be an invertible, bimeasurable, measure-preserving map of the Lebesgue probability space $(X, \mathcal{F}, \mu)$. If $f \in L^2(\mu)$, then $A_n(f)$ converges in $L^2(\mu)$ to a function $\bar f$ with
$$\bar f(T(x)) = \bar f(x) \quad \text{for a.e. } x \in X.$$
Proof Let
$$U_T(f)(x) = f(T(x)) \tag{3.2}$$
be the unitary operator on $L^2(\mu)$ induced by $T$. First suppose $f \in \mathrm{Range}(U_T - \mathrm{id})$, i.e. $f = U_T(g) - g$ for some $g \in L^2(\mu)$. Now
$$A_n(f) = \frac{1}{n}(U_T^n(g) - g)$$
so
$$\|A_n(f)\|_2 \le \frac{2}{n}\|g\|_2,$$
which goes to 0 in $n$. Notice that if $\|f - g\|_2 < \varepsilon$, then $\|A_n(f) - A_n(g)\|_2 < \varepsilon$ for all $n$. We conclude that if
$$f \in \overline{\mathrm{Range}(U_T - \mathrm{id})},$$
then $A_n(f) \to 0$ in $L^2(\mu)$. Now suppose
$$f \in \mathrm{Range}(U_T - \mathrm{id})^{\perp}$$
(i.e., $\langle f, U_T(g) - g \rangle = 0$ for all $g \in L^2(\mu)$). Then
$$\langle f, U_T(g) \rangle = \langle f, g \rangle$$
or
$$\langle U_T^*(f), g \rangle = \langle f, g \rangle \quad \text{for all } g \in L^2(\mu)$$
(i.e., $U_T^*(f) = f$ a.e.). What is the adjoint $U_T^*(f)$?
$$\langle f, U_T(g) \rangle = \int f(x) g(T(x)) \, d\mu = \int f(T^{-1}(x)) g(x) \, d\mu$$
as $T$ is measure-preserving and invertible. Hence
$$U_T^*(f) = f(T^{-1}(x)) \quad \text{a.e.}$$
Thus $f(T^{-1}(x)) = f(x)$ a.e., i.e., $f(x) = f(T(x))$ a.e., so $A_n(f) = f$ a.e. This completes the proof. Notice we obtain more, having identified $\bar f$ as the projection of $f$ on the subspace of $T$-invariant $L^2$-functions. ∎

This is a purely operator theoretic argument. The study of $T$ via the unitary operator $U_T$ leads to the ergodic theory of linear operators, a bit outside our focus here. We will see later that when $U_T$ has pure point spectrum, it determines $T$ completely, but for more general spectral types $U_T$ gives only weak information about $T$.
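The first step of the proof, that averages of a function of the form $U_T(g) - g$ telescope, can be verified exactly in a finite model. The sketch below assumes $X$ is a finite set, $T$ is a permutation stored as a list, and functions are lists of rationals; the helper names `compose_power` and `cesaro` are hypothetical.

```python
from fractions import Fraction

def compose_power(T, n):
    """Return the list representing x -> T^n(x), for a map T given as a list."""
    result = list(range(len(T)))
    for _ in range(n):
        result = [T[y] for y in result]
    return result

def cesaro(f, T, n):
    """A_n(f)(x) = (1/n) * sum of f(T^i x), i = 0..n-1, computed exactly."""
    size = len(f)
    total = [Fraction(0)] * size
    position = list(range(size))           # position[x] = T^i(x) at step i
    for _ in range(n):
        total = [t + f[p] for t, p in zip(total, position)]
        position = [T[p] for p in position]
    return [t / n for t in total]

# Toy data (assumptions): a permutation of 7 points and an arbitrary g.
T = [1, 2, 3, 4, 0, 6, 5]
g = [Fraction(v) for v in (3, -1, 4, 1, -5, 9, 2)]
coboundary = [g[T[x]] - g[x] for x in range(len(T))]   # f = U_T(g) - g
averages = cesaro(coboundary, T, 6)
```

Here `averages[x]` equals $(g(T^6 x) - g(x))/6$ for every $x$, so its size is at most $2\max|g|/6$, the finite analogue of the bound that forces $A_n(f) \to 0$.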
3.2 Two proofs of Birkhoff's ergodic theorem

We now prove the much deeper Birkhoff pointwise ergodic theorem, that the Cesaro averages actually converge pointwise almost everywhere. We will, in fact, give two proofs of this result. The first is the semi-classical Garsia-Halmos argument, the second a far more technical backward Vitali argument following ideas of Ornstein and Weiss (1983). Both arguments have their special strengths. The Halmos argument requires less of the measure space, but the Ornstein-Shields argument generalizes to larger group actions and other pointwise convergence theorems. For the Halmos result, we first prove the following.

Lemma 3.2 (Maximal ergodic theorem). Let $T$ be a measure-preserving transformation on the $\sigma$-finite measure space $(X, \mathcal{F}, \mu)$. Suppose $f \in L^1(\mu)$. Set
$$E_n = \{x : A_j(f, x) > 0 \text{ for some } j \le n\}.$$
Then
$$\int_{E_n} f \, d\mu \ge 0$$
for all $n$.

Proof. Set
$$F_n(x) = \max\Big(0, \sum_{i=0}^{j-1} f(T^i(x)) : j \le n\Big),$$
a non-decreasing sequence of functions. Observe
$$F_{n+1} = \max(0, f + F_n \circ T).$$
As $F_{n+1} > 0$ on $E_{n+1}$, we have $F_{n+1} = f + F_n \circ T$ on $E_{n+1}$ and $f = F_{n+1} - F_n \circ T$ there. Thus
$$\int_{E_{n+1}} f \, d\mu = \int_{E_{n+1}} (F_{n+1} - F_n \circ T) \, d\mu.$$
Now $F_{n+1} = 0$ off $E_{n+1}$ and $-F_n \circ T \le 0$ everywhere so $F_{n+1} - F_n \circ T \le 0$ off $E_{n+1}$. So
$$\int_{E_{n+1}} f \, d\mu \ge \int_X (F_{n+1} - F_n \circ T) \, d\mu = \int_X (F_{n+1} - F_n) \, d\mu \ge 0.$$
∎

Corollary 3.3 Setting $E_\infty = \bigcup_{n=1}^{\infty} E_n$,
$$\int_{E_\infty} f \, d\mu \ge 0.$$
∎
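The maximal inequality can be checked by brute force in a finite model. The sketch below assumes $X = \{0, \ldots, N-1\}$ with counting measure and $T$ a permutation (which preserves counting measure), with integer-valued $f$; these modelling choices and the helper name `maximal_set` are assumptions for illustration only.

```python
import random

def maximal_set(f, T, n):
    """Points x with f(x) + f(Tx) + ... + f(T^{j-1} x) > 0 for some 1 <= j <= n.

    This is the set E_n of the lemma, since A_j(f, x) > 0 iff the partial sum is > 0.
    """
    members = []
    for x in range(len(f)):
        partial, y = 0, x
        for _ in range(n):
            partial += f[y]
            y = T[y]
            if partial > 0:
                members.append(x)
                break
    return members

def integral_over(f, subset):
    """Integral of f over `subset` with respect to counting measure."""
    return sum(f[x] for x in subset)

# One random instance (the seed and sizes are arbitrary assumptions).
rng = random.Random(0)
T = list(range(30))
rng.shuffle(T)                                  # a random permutation of 30 points
f = [rng.randint(-5, 5) for _ in range(30)]
E5 = maximal_set(f, T, 5)
```

Whatever permutation and values are drawn, `integral_over(f, E5)` is non-negative, as the lemma asserts for any measure-preserving $T$.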
Theorem 3.4 (Birkhoff ergodic theorem). Let $T$ be a measure-preserving transformation on a $\sigma$-finite measure space $(X, \mathcal{F}, \mu)$, and $f \in L^1(\mu)$. There exists an $\bar f \in L^1(\mu)$ with
$$\lim_{n \to \infty} A_n(f, x) = \bar f(x) \quad \text{for a.e. } x.$$

Proof. For rational $u$ and $v$ let
$$E_{u,v} = \{x : \limsup A_n(f, x) > v > u > \liminf A_n(f, x)\}, \quad u, v \in \mathbb{Q}. \tag{3.3}$$
If $x$ belongs to no such $E_{u,v}$ then $\lim_{n \to \infty} A_n(f, x)$ exists (possibly $\pm\infty$). We will show $\mu(E_{u,v}) = 0$. Assume $v > 0$, otherwise replace $f$ by $-f$ and $-u > 0$. Assume $\mu(E_{u,v}) > 0$. From its definition, if $T(x) \in E_{u,v}$, then $x$ is also. In other words, $T^{-1}(E_{u,v}) \subseteq E_{u,v}$, so we may, without loss of generality, assume $E_{u,v}$ is the entire measure space. Hence we can assume for all $x \in X = E_{u,v}$, for some $n$,
$$\frac{1}{n} \sum_{i=0}^{n-1} (f(T^i(x)) - v) > 0.$$
The function $f - v \notin L^1(\mu)$ if $\mu(X) = \infty$, but choosing a set $A \subset X$, $\mu(A) < \infty$, for each $x$, for some $n$,
$$\frac{1}{n} \sum_{i=0}^{n-1} (f - v\chi_A)(T^i(x)) > 0 \tag{3.4}$$
and $f - v\chi_A \in L^1(\mu)$. Hence by the maximal ergodic lemma
$$\int_X (f - v\chi_A) \, d\mu \ge 0$$
or
$$\int_X f \, d\mu \ge v\mu(A) \tag{3.5}$$
since $X$ is the set $E_\infty$ of Corollary 3.3. Letting $A$ increase to all of $X$, $\|f\|_1 \ge v\mu(X)$ and as $v > 0$, $\mu(X) < \infty$. Now, using $\mu(X) < \infty$, $u - f \in L^1(\mu)$ and for some $n$,
$$u - \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) > 0.$$
By the maximal lemma
$$\int_X (u - f) \, d\mu \ge 0$$
or
$$\int_X f \, d\mu \le u\mu(E_{u,v}). \tag{3.6}$$
This implies $\mu(E_{u,v}) = 0$ and we are done. ∎
Corollary 3.5 Defining the map $L(f) = \bar f$, for $f \in L^1(\mu)$, $\|L(f)\|_1 \le \|f\|_1$ and so $L$ is a continuous projection from $L^1(\mu)$ onto the subspace of $T$-invariant $L^1$-functions.

Proof As $\|A_n(g)\|_1 = \|g\|_1$ for $g \ge 0$, and since $A_n(f)$ converges pointwise to $\bar f$, and $|A_n(f)| \le A_n(|f|)$, Fatou's lemma gives
$$\|L(f)\|_1 \le \liminf_{n} \|A_n(|f|)\|_1 = \|f\|_1. \tag{3.7}$$
Notice that
$$L(f)(T(x)) = \lim A_n(f, T(x)) = \lim \frac{1}{n} \sum_{i=1}^{n} f(T^i(x)) = \lim \Big(\frac{n+1}{n} A_{n+1}(f, x) - \frac{f(x)}{n}\Big) = L(f)(x). \tag{3.8}$$
Thus $L(f)$ is $T$-invariant. As $\|L(f - g)\|_1 = \|L(f) - L(g)\|_1 \le \|f - g\|_1$, $L$ is a continuous projection onto the $T$-invariant $L^1$ functions. ∎
We now show that the only non-trivial convergence in the Birkhoff theorem is in the case of a finite invariant measure.

Corollary 3.6 If $(X, \mathcal{F}, \mu)$ has no $T$-invariant subsets of finite measure, then $L(f) \equiv 0$ for all $f \in L^1(\mu)$.

Proof If $X$ has no $T$-invariant sets of finite measure, the only $T$-invariant $L^1$ function is identically equal to 0. ∎

Notice that $X$ can be broken up into a subset $X_\infty$ which has no $T$-invariant subsets of finite measure, and its complement which is at most a countable union of $T$-invariant sets of finite measure. On $X_\infty$, Cesaro averages of $L^1$ functions converge to zero. The following corollary says that on a remaining piece the convergence is in $L^1$.

Corollary 3.7 If $\mu(X) < \infty$, then $\|A_n(f) - L(f)\|_1 \to 0$.

Proof Define
$$\mathcal{A} = \{f \in L^1(\mu) : \|A_n(f) - L(f)\|_1 \to 0\}.$$
As the operator $L$ is a contraction in $L^1$, $\mathcal{A}$ is $L^1$-closed (if $f_i$ is Cauchy so is $L(f_i)$).
If $f$ is bounded then all $A_n(f)$ have the same bound and by the dominated convergence theorem, $A_n(f) \to L(f)$ in $L^1$. The only closed subspace of $L^1(\mu)$ that contains all bounded functions is $L^1(\mu)$ itself. ∎

Corollary 3.8 If $(X, \mathcal{F}, \mu)$ is a probability space, $T$ a measure-preserving transformation, and $f \in L^1(\mu)$, then $L(f) = E(f \mid \mathcal{J})$ where $\mathcal{J}$ is the algebra of $T$-invariant sets.

Proof Using Exercise 2.12 all we need show is
(1) $L(f)$ is $\mathcal{J}$-measurable (which is true as $L(f)$ is $T$-invariant); and
(2) for any $A \in \mathcal{J}$,
$$\int_A L(f) \, d\mu = \int_A f \, d\mu.$$
As $A$ is $T$-invariant, $\int_A A_n(f) \, d\mu = \int_A f \, d\mu$ and as $A_n(f) \to L(f)$ in $L^1$, $\int_A L(f) \, d\mu = \int_A f \, d\mu$, and we are done. ∎

The power of the Birkhoff theorem thus lies where $\mu(X) < \infty$. We will now construct another development of the proof in the case where $(X, \mathcal{F}, \mu)$ is a Lebesgue probability space. The proof has a similar flavour to all pointwise convergence arguments, showing that sets $E_{u,v}$, where the value of the sequence oscillates infinitely often above $v$ and below $u$, have measure 0. This time, though, instead of using a maximal lemma we use a Vitali type lemma to 'disjointify' segments of orbit on which the averages are above or below the appropriate bounds. Compare this argument with the martingale theorem of the previous chapter. We state here the Vitali lemma we use and show how it proves the Birkhoff theorem. We postpone the proof of the lemma to the end of the chapter and on first reading, we recommend the reader omit its study. It is an extremely useful fact but the argument is quite technical.

We say $x \in X$ is a periodic point of least period $n$ for $T$ if $T^n(x) = x$, and $n$ is the least such value. We say $T$ is non-periodic if the collection of all periodic points has measure 0.

Theorem 3.9 (Backward Vitali lemma). Suppose $(X, \mathcal{F}, \mu)$ is a Lebesgue probability space and $T$ a non-periodic measure-preserving invertible map of $X$ to itself. Suppose $A \subseteq X$ has $\mu(A) > 0$ and for every $x \in A$ we have sequences of measurable integer-valued functions $i_k(x) \le 0 \le j_k(x)$ with $\lim_{k \to \infty} (j_k(x) - i_k(x) + 1) = \infty$ for all $x \in A$. We can then for any $\varepsilon > 0$ find a subset $A' \subseteq A$ and measurable functions $i(x) \le 0 \le j(x)$, defined for $x \in A'$ where $(i(x), j(x)) = (i_k(x), j_k(x))$ for some $k$ (depending on $x$) so that the sets $J(x) = \bigcup_{i=i(x)}^{j(x)} T^i(x)$ are pairwise disjoint and
Notice the analogy between this result and the standard Vitali covering lemma. There are differences however. The most obvious one is that the intervals $J_k(x) = \bigcup_{i=i_k(x)}^{j_k(x)} T^i(x)$ are increasing on the orbits of $T$ instead of decreasing on $\mathbb{R}$. The basic geometry of the proof, though, is the same. Fig. 3.1 draws $\bigcup_{x \in A'} J(x)$ schematically. The set $J(x)$ cuts vertically through the figure. The intervals drawn correspond to disjoint measurable subsets of $X$, and cover all but $\varepsilon$ in measure of $X$.

Theorem 3.10 (Rohlin Lemma). Let $T$ be non-periodic and as above. For any $n \in \mathbb{N}$ and $\varepsilon > 0$ there is a set $F \subset X$ so that $F, T(F), \ldots, T^{n-1}(F)$ are disjoint and their union covers all but $\varepsilon$ in measure of $X$.
Proof For all $x \in X$, let $i_k(x) = 0$, $j_k(x) = nk - 1$. There is, by the backward Vitali lemma, a set $A' \subset X$ and measurable values $i(x)$, $j(x)$ defined on $A'$, chosen from among the $i_k(x)$, $j_k(x)$, so that all $\bigcup_{i=i(x)}^{j(x)} T^i(x) = J(x)$ are disjoint and
$$\mu\Big(\bigcup_{x \in A'} J(x)\Big) > 1 - \varepsilon.$$

Fig. 3.1 Schematic of the backward Vitali lemma. The intervals represent disjoint measurable subsets, and the orbit sections $J(x)$ cut through the diagram vertically.
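Now $i(x)$ and $j(x)$ must be of the form $i(x) = 0$ and $j(x) = nk(x) - 1$. Set
$$F = \bigcup_{x \in A'} \Big( \bigcup_{k=0}^{k(x)-1} T^{nk}(x) \Big). \tag{3.9}$$
Letting
$$J'(x) = \bigcup_{i=0}^{n-1} T^i(x) \quad \text{for } x \in F,$$
then
$$J(x) = \bigcup_{k=0}^{k(x)-1} J'(T^{nk}(x))$$
is a disjoint union. Thus for $x \in F$, the $J'(x)$ are pairwise disjoint and
$$\mu\Big(\bigcup_{x \in F} J'(x)\Big) = \mu\Big(\bigcup_{x \in A'} J(x)\Big) > 1 - \varepsilon. \tag{3.10}$$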
Saying the $J'(x)$ are pairwise disjoint is equivalent to saying $F, T(F), \ldots, T^{n-1}(F)$ are pairwise disjoint. ∎

The Rohlin lemma is in fact much older than the backward Vitali lemma, and this is a very unusual and in fact difficult proof, in that it uses the much deeper backward Vitali lemma.

Exercise 3.1 Give a direct proof of the Rohlin lemma, not relying on the backward Vitali lemma.
Standard proofs usually rely heavily on the ordering of $\mathbb{Z}$, as did our first proof of the Birkhoff theorem, or look surprisingly like the proof of the backward Vitali lemma. In this sense the backward Vitali lemma can be viewed as a generalized Rohlin lemma. In fact for non-periodic actions of general amenable groups the backward Vitali lemma remains true with only slight changes (Følner sets replace intervals and complete disjointness is not obtained) whereas the Rohlin lemma as stated requires the existence of tiling sets (Ornstein and Weiss 1980).

We now use the backward Vitali lemma to re-prove the Birkhoff theorem. Suppose we have a set $F$ and measurable functions $i(x) \le 0 \le j(x)$ so that the sets $J(x) = \bigcup_{i=i(x)}^{j(x)} T^i(x)$ are disjoint as $x$ varies over $F$. A basic fact to keep in mind is that for any $f \in L^1(\mu)$ we get
$$\int_{\bigcup_{x \in F} J(x)} f\,d\mu = \int_F \sum_{i=i(x)}^{j(x)} f(T^i(x))\,d\mu. \tag{3.11}$$
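Identity (3.11) can be checked by hand in a finite toy model. The sketch below is only an illustration under stated assumptions: the space is the cyclic group of `N` points with uniform measure, the map is the shift $x \mapsto x + 1 \pmod N$, and the base points, the pairs $(i(x), j(x))$ and the test function `f` are all hypothetical choices, not taken from the text.

```python
from fractions import Fraction

N = 23  # hypothetical size of the toy space; T(x) = x + 1 mod N preserves uniform measure

def orbit_segment(x, i, j):
    """Return J(x) = {T^i(x), ..., T^j(x)} for integers i <= 0 <= j."""
    return [(x + k) % N for k in range(i, j + 1)]

# Hypothetical base set F with its (i(x), j(x)), chosen so the segments are disjoint.
base = {0: (0, 4), 7: (-2, 3), 15: (-1, 5)}

def f(y):
    """Arbitrary test function on the toy space."""
    return Fraction(y * y - 3 * y, 7)

segments = {x: orbit_segment(x, i, j) for x, (i, j) in base.items()}
union = [y for seg in segments.values() for y in seg]
assert len(union) == len(set(union)), "segments must be pairwise disjoint"

mu = Fraction(1, N)  # mass of a single point
lhs = sum(f(y) * mu for y in union)                            # integral of f over the union
rhs = sum(sum(f(y) for y in segments[x]) * mu for x in base)   # integral over F of the orbit sums
```

Both sides are the same finite sum grouped in two ways, which is all that (3.11) asserts once the segments are disjoint and $T$ preserves $\mu$.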
Theorem 3.11 Let $T$ be a measure-preserving, invertible transformation on $(X, \mathcal{F}, \mu)$, a Lebesgue probability space. Let $f \in L^1(\mu)$ and as before set
$$A_n(f, x) = \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)).$$
For almost all $x \in X$, $A_n(f, x)$ converges. We can, as usual, identify the limit function. Let
$$\mathcal{J} = \{A \in \mathcal{F} : \mu(A \,\Delta\, T^{-1}(A)) = 0\}.$$
This is a $\sigma$-algebra, the $\sigma$-algebra of $T$-invariant (mod 0) sets. For almost all $x \in X$,
$$\lim_{n \to \infty} A_n(f, x) = E(f|\mathcal{J})(x). \tag{3.12}$$

Proof We first handle periodic points. Let $X = \bigcup_{n=1}^{\infty} X_n \cup X_\infty$ where $x \in X_n$ iff $x$ is of least period $n$. All the $X_n$ are in $\mathcal{J}$, and $\mathcal{J}$ restricted to $X_n$ separates the $n$-point orbits. Thus both $E(f|\mathcal{J})$ and $\lim_{k\to\infty} A_k(f, x)$ are equal to $(1/n) \sum_{i=0}^{n-1} f(T^i(x))$ on $X_n$. We are left with $X_\infty$, on which $T$ is non-periodic. Renormalizing the measure, we have reduced the problem to $T$ a non-periodic transformation of $(X, \mathcal{F}, \mu)$.
Define
$$\bar f(x) = \limsup_{n \to \infty} A_n(f, x),$$
taking values in $\mathbb{R} \cup \{\infty, -\infty\}$. As $\bar f(T(x)) = \bar f(x)$, $\bar f$ is $\mathcal{J}$-measurable. Let $E \in \mathcal{J}$. We wish to show that $\int_E \bar f\,d\mu = \int_E f\,d\mu$, and hence $\bar f = E(f|\mathcal{J})$ a.e. As the sets where $\bar f > 0$ and $\bar f \le 0$ are in $\mathcal{J}$ we can, without loss of generality, assume $\bar f \ge 0$.
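Let
$$E_\infty = \{x \in E : \bar f(x) = \infty\}, \qquad E_M = \{x \in E : \bar f(x) \le M\}. \tag{3.13}$$
Fix $\varepsilon > 0$ and $M > 0$, and we can measurably select functions $i_k(x) = 0$, $j_k(x) \nearrow \infty$ so that

(1) $A_{j_k(x)+1}(f, x) > 1/\varepsilon$ if $x \in E_\infty$; and

(2) $|\bar f(x) - A_{j_k(x)+1}(f, x)| < \varepsilon$ if $x \in E_M$. (3.14)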
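Use the backward Vitali lemma to find a set $F \subseteq X$ and functions $i(x) = 0$, $j(x) = j_k(x)$ for some $k$ so that all the $J(x) = \bigcup_{i=0}^{j(x)} T^i(x)$ are disjoint and $B = \bigcup_{x \in F} J(x)$ has $\mu(B) > 1 - \varepsilon$. As $E_\infty, E_M \in \mathcal{J}$, if $x \in E_\alpha$, then $J(x) \subseteq E_\alpha$. Thus for $E_\infty$,
$$\int_{E_\infty \cap B} f\,d\mu = \int_{E_\infty \cap F} (j(x) + 1) A_{j(x)+1}(f, x)\,d\mu \ge \frac{1}{\varepsilon} \int_{E_\infty \cap F} (j(x) + 1)\,d\mu = \frac{1}{\varepsilon}\,\mu(E_\infty \cap B).$$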
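So
$$\mu(E_\infty \cap B) \le \varepsilon \int_{E_\infty \cap B} f\,d\mu \le \varepsilon \|f\|_1.$$
Letting $\varepsilon \to 0$, $\mu(E_\infty) = 0$. Thus $E = \lim_M E_M$ a.s., so $0 \le \bar f < \infty$ a.e. Now by (3.11)
$$\begin{aligned}
\varepsilon \ge \varepsilon\,\mu(E_M \cap B) &= \varepsilon \int_{E_M \cap F} (j(x) + 1)\,d\mu \\
&\ge \int_{E_M \cap F} |\bar f(x) - A_{j(x)+1}(f, x)|\,(j(x) + 1)\,d\mu \\
&\ge \Big| \int_{E_M \cap F} \bar f(x)(j(x) + 1)\,d\mu - \int_{E_M \cap F} A_{j(x)+1}(f, x)(j(x) + 1)\,d\mu \Big| \\
&= \Big| \int_{E_M \cap B} \bar f\,d\mu - \int_{E_M \cap B} f\,d\mu \Big|.
\end{aligned} \tag{3.15}$$
Letting $\varepsilon \to 0$, $\int_{E_M} \bar f\,d\mu = \int_{E_M} f\,d\mu$.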
3.3 Proof of the backward Vitali lemma
We now digress to prove the backward Vitali lemma. On first reading we recommend skipping this proof, jumping forward to Corollary 3.16.

Lemma 3.12 If the set of all periodic points for $T$ of least period less than $N$ has measure 0, and $\mu(A) > 0$, then there is a subset $A' \subseteq A$, $\mu(A') > 0$, and all the sets $A', T(A'), \ldots, T^{N-1}(A')$ are disjoint.

Proof We prove the contrapositive. Suppose $\mu(A) > 0$ and for all $A' \subseteq A$, for some $0 < n < N$,
$$T^n(A') \cap A' \ne \emptyset.$$
As we could always delete sets of measure 0 from $A'$, we must have an $n$ with
$$\mu(T^n(A') \cap A') > 0.$$
Let $n(A')$ be the least $n > 0$ with $\mu(T^n(A') \cap A') > 0$. If $A'' \subseteq A'$ then $n(A'') \ge n(A')$. Let
$$N_0 = \max_{A' \subseteq A,\ \mu(A') > 0} n(A') < N.$$
There is a subset $A' \subseteq A$, $\mu(A') > 0$ and $n(A') = N_0$, so $A', T(A'), \ldots, T^{N_0 - 1}(A')$ are all disjoint. Let $A'' = A' \setminus T^{N_0}(A')$. As $A'', T(A''), \ldots, T^{N_0 - 1}(A''), T^{N_0}(A'')$ are all disjoint, $n(A'') > N_0$ so $\mu(A'') = 0$, i.e.,
$$A' = T^{N_0}(A') \quad \text{a.s.}$$
This must also be true of any subset of $A'$. Let $P_i$ be a refining tree of partitions of $A'$. For any subset $S \in P_i$,
$$T^{N_0}(S) = S \quad \text{a.s.}$$
Delete from $A'$ a $T^{N_0}$-invariant subset of measure 0 so that on what remains ($A''$),
$$T^{N_0}(S \cap A'') = S \cap A''$$
for all $S \in P_i$. For any chain of sets $\{c_i\}$ in the tree $P_i$ that descends to a point $x \in A''$, it follows that $T^{N_0}(x) = x$. Thus every point of $A''$ is a periodic point of period $N_0 < N$. ∎

Corollary 3.13 If $T$ is non-periodic, then for any subset $A$, $\mu(A) > 0$ and $i \le 0 \le j$, there is a subset $A' \subseteq A$, $\mu(A') > 0$ with $T^i(A'), T^{i+1}(A'), \ldots, T^j(A')$ all disjoint. ∎
Exercise 3.2 Let $P_n = \{x : T^n(x) = x \text{ and } n \text{ is the least such}\}$. If $\mu(P_n) > 0$ show that $P_n = \bigcup_{i=0}^{n-1} A_i$ where the $A_i$ are disjoint, $T^{-1}(A_i) = A_{(i-1) \bmod n}$. This shows what the periodic part of $T$ looks like.

Exercise 3.3 Suppose $T$ is ergodic but $T^n$ is not, for some $n > 1$. Show that there is a value $k$ dividing $n$, $k \ne 1$, and disjoint sets $A_0, A_1, \ldots, A_{k-1}$ with $X = \bigcup_{i=0}^{k-1} A_i$ a.s., $T^{-1}(A_i) = A_{(i-1) \bmod k}$ and $T^k$ acting on $A_0$ is ergodic.

We are now ready for our two most technical steps. Notice the parallel between them and the proof of the standard Vitali lemma of Chapter 2. The use of one-dimensional geometry is the same. Because our sequences of intervals nest outward instead of inward, we must use much more explicit constructions than in the classical Vitali lemma.

Lemma 3.14 Let $(X, \mathcal{F}, \mu)$ be a Lebesgue probability space, $T$ a measure-preserving invertible non-periodic map of $X$ to itself. On a set $A \subset X$, $\mu(A) > 0$, we are given bounded integer-valued measurable functions $i(x) \le 0 \le j(x)$. There is then a measurable subset $A' \subset A$, so that for all $x \in A'$, the sets $I(x) = \bigcup_{i=i(x)}^{j(x)} T^i(x)$ are disjoint (i.e., $I(x) \cap I(x') = \emptyset$ unless $x = x'$) and $\mu\big(\bigcup_{x \in A'} I(x)\big) \ge \mu(A)/3$.

Proof Let $A_{i,j} = \{x \in A : i(x) = i,\ j(x) = j\}$, a measurable set. As $i(x)$, $j(x)$ are bounded there are only finitely many such sets. Order the sets $A_{i,j}$ as $A_{i(1),j(1)}, \ldots, A_{i(M),j(M)}$ so that
$$j(k+1) - i(k+1) \le j(k) - i(k), \tag{3.16}$$
i.e., the lengths of the blocks $I(x)$, $x \in A_{i(k),j(k)}$, are non-increasing in $k$. Let $n(k) = j(k) - i(k) + 1$ be this length. We construct a sequence of sets $A'_k \subseteq A_{i(k),j(k)}$ inductively. $A'$ will be the union of these sets. Let
$$\mathcal{A}_1 = \{A' \subseteq A_{i(1),j(1)} : \mu(A') > 0, \text{ and } T^{i(1)}(A'), T^{i(1)+1}(A'), \ldots, T^{j(1)}(A') \text{ are disjoint}\}.$$
By the previous lemma $\mathcal{A}_1 \ne \emptyset$. If $A'_1 \subseteq A'_2 \subseteq \cdots \subseteq A'_k \subseteq \cdots$ are all in $\mathcal{A}_1$ then so is $\bigcup_{i=1}^{\infty} A'_i$. Thus there must be an a.s.-maximal set under containment $A'_1 \in \mathcal{A}_1$, i.e., if $A'_1 \subseteq A'' \in \mathcal{A}_1$ then $\mu(A'') = \mu(A'_1)$. (This is not done with the axiom of choice. Rather the sequence is constructed to approach the maximal available measure.) We claim
$$A_{i(1),j(1)} \subseteq \bigcup_{i=i(1)-n(1)}^{j(1)+n(1)} T^i(A'_1) \quad \text{a.s.} \tag{3.17}$$
If not, then
$$B = A_{i(1),j(1)} \setminus \bigcup_{i=i(1)-n(1)}^{j(1)+n(1)} T^i(A'_1)$$
would be of positive measure and hence contain a subset $B'$, $\mu(B') > 0$ and $T^{i(1)}(B'), T^{i(1)+1}(B'), \ldots, T^{j(1)}(B')$ all disjoint and disjoint from $\bigcup_{i=i(1)}^{j(1)} T^i(A'_1)$. But then $B' \cup A'_1 \in \mathcal{A}_1$, conflicting with maximality of $A'_1$.

Arguing now by induction, to complete the proof suppose we have subsets $A'_u$, $1 \le u < k$, with all sets of the form
$$T^i(A'_u), \qquad i(u) \le i \le j(u),$$
pairwise disjoint and
$$\bigcup_{u=1}^{k-1} A_{i(u),j(u)} \subseteq \bigcup_{u=1}^{k-1} \Big( \bigcup_{i=i(u)-n(u)}^{j(u)+n(u)} T^i(A'_u) \Big) \quad \text{a.s.} \tag{3.18}$$
Set
$$A_k = A_{i(k),j(k)} \setminus \bigcup_{u=1}^{k-1} \Big( \bigcup_{i=i(u)-n(u)}^{j(u)+n(u)} T^i(A'_u) \Big).$$
If $\mu(A_k) = 0$, setting $A'_k = \emptyset$, the induction extends to $k$. Otherwise $\mu(A_k) \ne 0$. Let
$$\mathcal{A}_k = \{A' \subseteq A_k : \mu(A') > 0, \text{ and } T^{i(k)}(A'), T^{i(k)+1}(A'), \ldots, T^{j(k)}(A') \text{ are disjoint}\}.$$
As before $\mathcal{A}_k$ is non-empty and must contain an a.s.-maximal element under containment. Call it $A'_k$, and
$$A_k \subseteq \bigcup_{i=i(k)-n(k)}^{j(k)+n(k)} T^i(A'_k). \tag{3.19}$$
As $A'_k \subset A_k$, and $n(k) \le n(u)$ for $u < k$, $\bigcup_{i=i(k)}^{j(k)} T^i(A'_k)$ is disjoint from $\bigcup_{u=1}^{k-1} \big( \bigcup_{i=i(u)}^{j(u)} T^i(A'_u) \big)$, so all sets $T^i(A'_u)$, $1 \le u \le k$, $i(u) \le i \le j(u)$, are pairwise disjoint. Combining (3.18) and (3.19),
$$\bigcup_{u=1}^{k} A_{i(u),j(u)} \subseteq \bigcup_{u=1}^{k} \Big( \bigcup_{i=i(u)-n(u)}^{j(u)+n(u)} T^i(A'_u) \Big). \tag{3.20}$$
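Continue the induction through $M$ and
$$A \subseteq \bigcup_{u=1}^{M} \Big( \bigcup_{i=i(u)-n(u)}^{j(u)+n(u)} T^i(A'_u) \Big).$$
Set $A' = \bigcup_{u=1}^{M} A'_u$ and
$$\bigcup_{x \in A'} I(x) = \bigcup_{u=1}^{M} \bigcup_{i=i(u)}^{j(u)} T^i(A'_u), \tag{3.21}$$
a disjoint union. To see that disjointness of this union is equivalent to disjointness of the sets $I(x)$, suppose $x, x' \in A'$ and $I(x) \cap I(x') \ne \emptyset$. Then $x \in A'_k$, $x' \in A'_{k'}$, and for some $i(k) \le i \le j(k)$, $i(k') \le i' \le j(k')$, $T^i(x) = T^{i'}(x')$, so $T^i(A'_k) \cap T^{i'}(A'_{k'}) \ne \emptyset$ and $k = k'$, $i = i'$ and $x = x'$. As
$$A \subseteq \bigcup_{u=1}^{M} \Big( \bigcup_{i=i(u)-n(u)}^{j(u)+n(u)} T^i(A'_u) \Big),$$
each block on the right is three times as long as the corresponding block $i(u) \le i \le j(u)$, so
$$\mu(A) \le \sum_{u=1}^{M} 3 \sum_{i=i(u)}^{j(u)} \mu(T^i(A'_u)) = 3\,\mu\Big( \bigcup_{x \in A'} I(x) \Big).$$
∎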
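The selection scheme in this proof (longest blocks first, keep a block only if it misses everything already kept) is the classical greedy Vitali argument. The sketch below is a simplified analogue on a finite stretch of the integers, not the measure-theoretic construction itself: the point set `A`, the random interval lengths and the helper `greedy_disjoint` are hypothetical, and counting points stands in for $\mu$.

```python
import random

def greedy_disjoint(intervals):
    """intervals: dict x -> (lo, hi) with integers lo <= x <= hi.

    Process base points by non-increasing block length and keep a block
    whenever it is disjoint from every block kept so far.
    """
    order = sorted(intervals, key=lambda x: intervals[x][0] - intervals[x][1])
    chosen, used = [], set()
    for x in order:
        lo, hi = intervals[x]
        block = range(lo, hi + 1)
        if used.isdisjoint(block):
            chosen.append(x)
            used.update(block)
    return chosen, used

random.seed(0)
A = random.sample(range(200), 60)          # hypothetical base points
intervals = {x: (x - random.randint(0, 5), x + random.randint(0, 5)) for x in A}
chosen, covered = greedy_disjoint(intervals)

# Total length of the kept blocks; equals len(covered) exactly when they are disjoint.
total_length = sum(intervals[x][1] - intervals[x][0] + 1 for x in chosen)
```

A rejected point $x$ has its block meeting a kept block that is at least as long, so $x$ lies in that kept block enlarged by its own length on each side. This is the same tripling that gives the factor $1/3$ in Lemma 3.14, and here it reads `len(A) <= 3 * len(covered)`.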
The conclusion of the next corollary is precisely that of the backward Vitali lemma. The hypotheses are stronger though. Conditions (1) and (2) are growth conditions on the orbit intervals. We shall see later that they are obtained by dropping to a subsequence.

Corollary 3.15 For $(X, \mathcal{F}, \mu)$ and $T$ as in the previous lemma, and $1 \ge \varepsilon > 0$, suppose we have a set $A \subset X$, $\mu(A) > 0$ and for $x \in A$, $k = 1, \ldots, M$ we have bounded integer-valued functions $i_k(x) \le 0 \le j_k(x)$ so that

(1) $\dfrac{\varepsilon M}{12} > 1$, and

(2) $$\sup_{x \in A} (j_{k+1}(x) - i_{k+1}(x) + 1) \le \frac{\varepsilon}{4} \inf_{x \in A} (j_k(x) - i_k(x) + 1). \tag{3.22}$$

It follows that there is a subset $A' \subset A$ and measurable functions $i(x) \le 0 \le j(x)$ for $x \in A'$ with $(i(x), j(x)) = (i_k(x), j_k(x))$ for some $k$ depending on $x$. The sets
$$I(x) = \bigcup_{i=i(x)}^{j(x)} T^i(x)$$
are pairwise disjoint for $x \in A'$ and
$$\mu\Big( A \setminus \bigcup_{x \in A'} I(x) \Big) \le \varepsilon.$$

Proof Let
$$\begin{aligned}
i'_k(x) &= i_k(x) - \sup_{x \in A} (j_{k+1}(x) - i_{k+1}(x) + 1), \\
j'_k(x) &= j_k(x) + \sup_{x \in A} (j_{k+1}(x) - i_{k+1}(x) + 1).
\end{aligned} \tag{3.23}$$
Use the previous lemma inductively to get a subset $A'_1 \subset A$ with $I'(x) = \bigcup_{i=i'_1(x)}^{j'_1(x)} T^i(x)$ pairwise disjoint for $x \in A'_1$ and
$$\mu\Big( \bigcup_{x \in A'_1} I'(x) \Big) \ge \frac{1}{3}\,\mu(A).$$
Set $A_2 = A \setminus \bigcup_{x \in A'_1} I'(x)$. If $\mu(A_2) > 0$ repeat the procedure using $i_2(x)$, $j_2(x)$ to find $A'_2 \subseteq A_2$, set $I'(x) = \bigcup_{i=i'_2(x)}^{j'_2(x)} T^i(x)$, pairwise disjoint for $x \in A'_2$, and
$$\mu\Big( \bigcup_{x \in A'_2} I'(x) \Big) \ge \frac{1}{3}\,\mu(A_2).$$
Continuing inductively, set
$$A_k = A \setminus \bigcup_{l=1}^{k-1} \bigcup_{x \in A'_l} I'(x) = A_{k-1} \setminus \bigcup_{x \in A'_{k-1}} I'(x). \tag{3.24}$$
If $\mu(A_k) > 0$ repeat the procedure using $i_k(x)$, $j_k(x)$ to find $A'_k \subseteq A_k$. Set $I'(x) = \bigcup_{i=i'_k(x)}^{j'_k(x)} T^i(x)$, pairwise disjoint for $x \in A'_k$, and
$$\mu\Big( \bigcup_{x \in A'_k} I'(x) \Big) \ge \frac{1}{3}\,\mu(A_k). \tag{3.25}$$
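Continue through the $M$ steps. Set $A' = \bigcup_{k=1}^{M} A'_k$, a disjoint union. For $x \in A'$, $x$ is in a unique $A'_k$, so set $i(x) = i_k(x)$, $j(x) = j_k(x)$, and now
$$I(x) = \bigcup_{i=i(x)}^{j(x)} T^i(x).$$
If $x, x' \in A'$, $x \ne x'$ and $I(x) \cap I(x') \ne \emptyset$, then as for all $y$, $I(y) \subset I'(y)$, we must have $x \in A'_{k_1}$, $x' \in A'_{k_2}$, $k_1 \ne k_2$. Then $x = T^n(x')$ where
$$|n| \le (j_{k_1}(x) - i_{k_1}(x) + 1) + (j_{k_2}(x') - i_{k_2}(x') + 1)$$
and either $x \in I'(x')$ or $x' \in I'(x)$, which is a conflict. Hence
$$I(x) \cap I(x') = \emptyset.$$
From this disjointness,
$$\mu\Big( \bigcup_{x \in A'} I(x) \Big) = \sum_{k=1}^{M} \mu\Big( \bigcup_{x \in A'_k} I(x) \Big)$$
and, since by (3.22) each $I'(x)$ is at most $(1 + \varepsilon/2)$ times as long as $I(x)$,
$$\mu\Big( \bigcup_{x \in A'_k} I(x) \Big) \ge \Big(1 - \frac{\varepsilon}{2}\Big) \mu\Big( \bigcup_{x \in A'_k} I'(x) \Big) \ge \Big(1 - \frac{\varepsilon}{2}\Big) \frac{1}{3}\,\mu(A_k). \tag{3.26}$$
Letting $A_{M+1} = A \setminus \bigcup_{k=1}^{M} \bigcup_{x \in A'_k} I'(x)$, we have $A \supseteq A_2 \supseteq A_3 \supseteq \cdots \supseteq A_{M+1}$. Suppose $\mu(A_{M+1}) > \varepsilon/2$, i.e., $\mu(A_k) > \varepsilon/2$ for all $k$. Then
$$\mu\Big( \bigcup_{x \in A'} I(x) \Big) > M \Big(1 - \frac{\varepsilon}{2}\Big) \frac{\varepsilon}{6} \ge \frac{\varepsilon M}{12} > 1,$$
a conflict. Thus
$$\mu(A_{M+1}) \le \frac{\varepsilon}{2}$$
and
$$\mu\Big( A \setminus \bigcup_{x \in A'} I(x) \Big) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} \int_{A'} (j(x) - i(x) + 1)\,d\mu \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \tag{3.27}$$
∎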
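To finish the proof of the backward Vitali lemma, what we need to do is to see how to obtain boundedness of the functions $i_k$ and $j_k$ and conditions (3.22) of Corollary 3.15.

Proof of Theorem 3.9 (Backward Vitali Lemma) Having fixed $\varepsilon > 0$, select $M$ so that $M\varepsilon/24 > 1$. We define new sequences of functions $\hat\imath_k$ and $\hat\jmath_k$ inductively on successively reduced domains in $A$. Here are steps 1 and 2. Let $\hat\imath_1(x) = i_1(x)$, $\hat\jmath_1(x) = j_1(x)$. Delete at most $\varepsilon/4$ in measure of $A$ to obtain a subset $A_1$ with $\hat\imath_1$ and $\hat\jmath_1$ bounded on $A_1$. Select $k(x)$ measurably so that for all $x \in A_1$,
$$\sup_{x' \in A_1} (\hat\jmath_1(x') - \hat\imath_1(x') + 1) \le \frac{\varepsilon}{4} (j_{k(x)}(x) - i_{k(x)}(x) + 1).$$
Let $\hat\jmath_2(x) = j_{k(x)}(x)$, $\hat\imath_2(x) = i_{k(x)}(x)$. Delete from $A_1$ at most $\varepsilon/8$ in measure of $A_1$ to obtain a subset $A_2$ with $\hat\imath_2$, $\hat\jmath_2$ bounded on $A_2$. Suppose inductively we have found subsets $A \supseteq A_1 \supseteq A_2 \supseteq \cdots \supseteq A_{k-1}$, $\mu(A_{u+1}) \ge \mu(A_u) - \varepsilon/2^{u+2}$, and $(\hat\imath_u(x), \hat\jmath_u(x))$, $u = 1, \ldots, k-1$, are from among the $(i_v(x), j_v(x))$ ($v$ depending measurably on $x$), are bounded and satisfy
$$\sup_{x \in A_{k-1}} (\hat\jmath_u(x) - \hat\imath_u(x) + 1) \le \frac{\varepsilon}{4} \inf_{x \in A_{k-1}} (\hat\jmath_{u+1}(x) - \hat\imath_{u+1}(x) + 1), \qquad 1 \le u < k - 1. \tag{3.28}$$
Select $k(x)$ measurably so that for all $x \in A_{k-1}$,
$$\sup_{x' \in A_{k-1}} (\hat\jmath_{k-1}(x') - \hat\imath_{k-1}(x') + 1) \le \frac{\varepsilon}{4} (j_{k(x)}(x) - i_{k(x)}(x) + 1).$$
Set $\hat\jmath_k(x) = j_{k(x)}(x)$, $\hat\imath_k(x) = i_{k(x)}(x)$ and delete at most $\varepsilon/2^{k+1}$ in measure of $A_{k-1}$ to obtain $A_k$ on which both $\hat\jmath_k(x)$ and $\hat\imath_k(x)$ are bounded. Continue through $M$ steps. Notice
$$\mu(A \setminus A_M) \le \sum_{k=1}^{M} \frac{\varepsilon}{2^{k+1}} < \frac{\varepsilon}{2}. \tag{3.29}$$
The previous Corollary 3.15 applies to the sequence of functions $\hat\imath_k(x)$, $\hat\jmath_k(x)$ in reverse order and the set $A_M$, with error $\varepsilon/2$. The conclusion of the corollary gives us the set $A' \subseteq A_M \subseteq A$ and $(i(x), j(x))$ from among $(\hat\imath_u(x), \hat\jmath_u(x))$, which are from among the $(i_k(x), j_k(x))$, with $I(x) = \bigcup_{i=i(x)}^{j(x)} T^i(x)$ pairwise disjoint and
$$\mu\Big( A \setminus \bigcup_{x \in A'} I(x) \Big) \le \mu\Big( A_M \setminus \bigcup_{x \in A'} I(x) \Big) + \frac{\varepsilon}{2} \le \varepsilon.$$
∎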
Exercise 3.4 Formulate and prove a version of Theorem 3.9 in $\mathbb{Z}^n$. Can you do this for more general intervals than cubes?

3.4 Consequences of the Birkhoff theorem
As we know from Chapter 1, a map $T$ is called ergodic if $\mathcal{J}$, the $\sigma$-algebra of $T$-invariant sets (mod 0), consists only of sets of measure 0 and 1. In this case the Cesaro averages $f_n$ must converge a.e. to $\int f\,d\mu$. We want a test for ergodicity that requires we check it on only countably many sets. The Birkhoff theorem provides the tool.

Corollary 3.16 If $\{A_j\}$ is a countable collection of sets $L^1$ dense in the collection of all sets and
$$\frac{1}{n} \sum_{i=0}^{n-1} \chi_{A_j}(T^i(x)) \to \mu(A_j) \tag{3.30}$$
for all $j$ for a.e. $x$ then $T$ is ergodic.

Proof Let $A$ be a measurable invariant set and $A_j$ any from our collection. Now as $A$ is invariant
$$\mu(A)\mu(A_j) = \int_A \mu(A_j)\,d\mu = \lim_{n \to \infty} \int_A \Big( \frac{1}{n} \sum_{i=0}^{n-1} \chi_{A_j}(T^i(x)) \Big)\,d\mu = \mu(A \cap A_j). \tag{3.31}$$
Selecting $A_j$ so $\mu(A \,\Delta\, A_j) \to 0$,
$$\mu(A)^2 = \mu(A)$$
and $\mu(A) = 0$ or 1. ∎
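As a numerical illustration of the test in Corollary 3.16, one can look at the circle rotation $x \mapsto x + \alpha \pmod 1$ and the dyadic intervals, whose finite unions are $L^1$ dense among Lebesgue sets. This is a sketch under assumptions: the angle `alpha`, the starting point, the orbit length and the helper names are hypothetical choices, only the first few dyadic levels are examined, and a finite computation can suggest but not prove the a.e. convergence in (3.30).

```python
import math

alpha = (math.sqrt(5) - 1) / 2   # assumed irrational rotation angle

def birkhoff_average(a, b, x, n):
    """Average of the indicator of [a, b) along x, x+alpha, ..., x+(n-1)alpha mod 1."""
    hits = sum(1 for i in range(n) if a <= (x + i * alpha) % 1.0 < b)
    return hits / n

def worst_error(level, x, n):
    """Largest |average - length| over the dyadic intervals of the given level."""
    m = 2 ** level
    return max(abs(birkhoff_average(k / m, (k + 1) / m, x, n) - 1 / m)
               for k in range(m))

# Worst deviation from mu(A_j) for dyadic levels 1 to 4 along one orbit.
errors = [worst_error(level, x=0.1, n=20000) for level in range(1, 5)]
```

The deviations are small at every level examined, consistent with the rotation being ergodic for Lebesgue measure.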
Often it is the case that $T$ is given as a homeomorphism of a compact metric space without an invariant measure. The Birkhoff theorem can be used to obtain invariant measures, or at least information about them. The following exercises examine some parts of this issue.

Exercise 3.5 Throughout, suppose $T$ is a homeomorphism of a compact metric space $X$. As usual, $A_n(f, x) = [f(x) + \cdots + f(T^{n-1}x)]/n$ for $f \in C_{\mathbb{R}}(X)$.

1. Unique Ergodicity. Suppose, for each $f$, there is a constant $L(f)$ so that $A_n(f, x) \to L(f)$ for all $x$ in $X$. Then there is a unique invariant probability measure $\mu$ on $X$ (with respect to the transformation $T$, and the Borel sets in $X$), namely the $\mu$ for which $L(f) = \int f\,d\mu$. Moreover, $A_n(f, x)$ converges to $L(f)$ uniformly in $x$. The proof follows from the exercises below.

(a) $L(f)$ is a positive linear functional on $C_{\mathbb{R}}(X)$, and $L(1) = 1$. Moreover, $L(f \circ T) = L(f)$. By the Riesz representation theorem, there is a $\mu$ with $\mu(X) = 1$, $\mu \ge 0$, $L(f) = \int f\,d\mu$, and $\mu(T^{-1}E) = \mu(E)$ for all Borel $E$.

(b) If $\nu$ is an invariant probability measure then $\int A_n(f)\,d\nu = \int f\,d\nu$ and $\int A_n(f)\,d\nu \to \int L(f)\,d\nu = L(f) = \int f\,d\mu$. (You don't need any ergodic theorem here, Lebesgue dominated convergence will do.) Thus $\mu = \nu$ (cite a reason).

(c) If $\nu$ is a signed invariant measure (or, equivalently, a continuous linear functional on $C_{\mathbb{R}}(X)$) then $\nu = r\mu$ for some $r \in \mathbb{R}$. (Look up the appropriate decomposition theorem, including uniqueness.)

(d) Let $V = \{f : f = g - g \circ T + k,\ g \in C_{\mathbb{R}},\ k \in \mathbb{R}\}$. Then $V$ is dense in $C_{\mathbb{R}}(X)$ (in the norm $\|\cdot\|_\infty$). (Use (c), and an appropriate version of the Hahn-Banach theorem.)

(e) Use (d) to deduce $A_n(f) \to L(f)$ uniformly for all $f \in C_{\mathbb{R}}(X)$.
2. Example: Let $X = [0,1]$ with the identification $0 = 1$. Let $Tx = x^2$ (so $T^n x = x^{(2^n)}$). Observe $T^n x \to 0$, for all $x \in X$, but not uniformly. Observe $A_n(f) \to f(0)$ uniformly for all continuous $f$.
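The second observation in this example can be explored numerically. The following is a rough sketch under assumptions: the test function $f(x) = \cos 2\pi x$ (continuous on the circle, since $f(0) = f(1)$), the grid of starting points and the two values of $n$ are hypothetical choices, and a maximum over a finite grid only approximates the supremum over $X$.

```python
import math

def cesaro_average(f, x, n):
    """A_n(f, x) = [f(x) + ... + f(T^{n-1} x)] / n for the map T x = x**2 on [0, 1]."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = x * x
    return total / n

def f(x):
    """Assumed test function; continuous on [0, 1] with 0 and 1 identified."""
    return math.cos(2 * math.pi * x)

grid = [k / 2000 for k in range(2001)]   # sample starting points, endpoints included

def sup_error(n):
    """Grid approximation to sup over x of |A_n(f, x) - f(0)|."""
    return max(abs(cesaro_average(f, x, n) - f(0.0)) for x in grid)

err_100, err_1000 = sup_error(100), sup_error(1000)
```

Every orbit spends only a bounded number of steps away from the identified point $0 = 1$, so the error of $A_n$ is of order $1/n$ whatever the starting point. That is why the averages converge uniformly even though $T^n x \to 0$ does not.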
3. Example: As an example of a map not uniquely ergodic, let $T$ be an ergodic automorphism (or non-ergodic for that matter) of the $d$-dimensional torus $\mathbb{R}^d/\mathbb{Z}^d$, i.e., multiplication by an integer matrix of determinant $\pm 1$. Show that the periodic points are dense (identify them!) and conclude there are many invariant measures.

Exercise 3.6 For each invariant probability measure $\mu$ on $X$, we have a linear functional $L \in C_{\mathbb{R}}^*(X)$ satisfying $L \ge 0$ (i.e., $f \ge 0 \Rightarrow L(f) \ge 0$), $L(1) = 1$, and $L(f \circ T) = L(f)$, and, conversely, such an $L$ creates a $\mu$. Note $\|L\| = 1 < \infty$ follows from the above.

1. Show the collection $K$ of such $L$ is a weak* closed subset of the unit ball in $C_{\mathbb{R}}^*(X)$, and thus weak* compact. Show the set is also convex. Conclude there are extreme points.

2. Show that $T$ is ergodic with respect to the invariant measure $\mu$ if and only if $L$ (corresponding to $\mu$) is extreme in $K$.

Notes: If you can't make anything of the terminology in these last exercises, forget it for now. However, Sections 6 and 7 in Chapter 10 of Royden (1968) are relevant in 1. For 2, $\Leftarrow$ follows easily once you form the contrapositive. The other direction needs the Radon-Nikodym theorem.

3.5 Disintegrating a measure space over a factor algebra
We now take leave of ergodic theorems and return to the constructive methods of Chapter 1. What we will show is that any measure-preserving map can be disintegrated into ergodic components. The idea here is to construct a map $\varphi : X \to [0,1] \times [0,1]$ defined a.e. and measurable onto a.a. of the unit square, taking $\mu$ to Lebesgue measure, $\mathcal{F}$ to the Lebesgue sets and $\mathcal{J}$ to the algebra of vertical sets $A \times [0,1]$. The measure spaces $\{x\} \times [0,1]$ with one-dimensional Lebesgue measure provide the desired disintegration and Fubini's theorem, the basic integration tool for re-integrating the decomposition. These ideas originate in the work of Rohlin (1966) and Stone (1950).

We will in fact consider a case slightly more general than the invariant algebra $\mathcal{J}$. $\mathcal{A}$ can be any $T$-invariant complete sub-$\sigma$-algebra of $\mathcal{F}$ containing $\mathcal{J}$. Here are two examples of such a situation.

Example 1 $X$ is $[0,1] \times \{1, 2, \ldots, n\}$, $\mu$ is the direct product of Lebesgue measure and normalized counting measure, and $\mathcal{F}$ is the Lebesgue algebra. $T$ is a bimeasurable measure-preserving map of $X$ that maps a fibre $\{x\} \times \{1, 2, \ldots, n\}$ to another such fibre, measurably and preserving $\mu$. $\mathcal{A}$ is the completion of the algebra of vertical sets $A \times \{1, 2, \ldots, n\}$ where $A$ is Lebesgue measurable in $[0,1]$.

Example 2 $X$ is $[0,1] \times [0,1]$, $\mu$ is Lebesgue measure, $\mathcal{F}$ is the Lebesgue algebra, $T$ is a bimeasurable measure-preserving map of $X$, taking fibres $\{x\} \times [0,1]$ to fibres bimeasurably, preserving Lebesgue measure on the fibres. $\mathcal{A}$ is the completion of the algebra of vertical sets $A \times [0,1]$.

Exercise 3.7 Construct explicit examples of both of these types.

In both examples $\mu$ can be written as an integral over $[0,1]$ of fibre measures (either counting measure$/n$ or Lebesgue measure) by Fubini's theorem. Letting $f(x)$ be the fibre over $x$, $\mu_x$ the fibre measure and $\mathcal{F}_x$ the measurable subsets of the fibre, our examples have:

(1) $T : (f(x), \mathcal{F}_x, \mu_x) \to (f(T(x)), \mathcal{F}_{T(x)}, \mu_{T(x)})$ bimeasurably and preserving measure;

(2) for $A \in \mathcal{F}$, for a.e. $x$, $\mu_x(A) = E(A|\mathcal{A})$; and

(3) $$\mu(A) = \int_0^1 \mu_x(A)\,d\ell(x). \tag{3.32}$$
A more general example would be a weighted average of examples like 1 and 2. Of course (3.32) would still hold. If we have a $T$-invariant algebra $\mathcal{A}$, and can find a bimeasurable measure-preserving map from $(X, \mathcal{F}, \mu)$ to almost all of a weighted average of examples of type 1 and 2, then we can pull back the fibres, fibre algebras and fibre measures to $X$. We call such a collection $(f(x), \mathcal{F}_x, \mu_x)$ a disintegration of $(X, \mathcal{F}, \mu)$ over the algebra $\mathcal{A}$.
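The shape of a disintegration is easiest to see on a finite space, where conditional expectation onto the vertical sets is just normalization on each fibre. The sketch below is only a toy analogue with hypothetical masses: a finite space is atomic, so it falls outside the non-atomic setting of the theorem that follows, but it shows properties (2) and (3) in the simplest possible form, with the integral in (3.32) replaced by a weighted sum over fibres.

```python
from fractions import Fraction

# Hypothetical finite space X = {0,1,2} x {0,1}; keys are (z, y), values are point masses.
mu = {(0, 0): Fraction(1, 12), (0, 1): Fraction(1, 4),
      (1, 0): Fraction(1, 6),  (1, 1): Fraction(1, 6),
      (2, 0): Fraction(1, 4),  (2, 1): Fraction(1, 12)}
assert sum(mu.values()) == 1

def fibre_mass(z):
    """Total mass of the vertical fibre {z} x {0, 1}."""
    return sum(m for (zz, _), m in mu.items() if zz == z)

def fibre_measure(z, A):
    """mu_z(A): the conditional expectation of A given the vertical sets, evaluated on fibre z."""
    return sum(mu[p] for p in A if p[0] == z) / fibre_mass(z)

A = {(0, 1), (1, 0), (2, 0), (2, 1)}   # an arbitrary subset of X

# Discrete counterpart of (3.32): re-integrate the fibre measures over the base.
total = sum(fibre_mass(z) * fibre_measure(z, A) for z in (0, 1, 2))
direct = sum(mu[p] for p in A)
```

Here `total` and `direct` agree, which is the finite form of Fubini's theorem used in the examples above.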
Theorem 3.17 If $\mathcal{A}$ is a non-atomic $T$-invariant complete sub-$\sigma$-algebra of $\mathcal{F}$, $\mathcal{J} \subset \mathcal{A}$ ($T$, of course, bimeasurable and measure-preserving) then $(X, \mathcal{F}, \mu)$ can be disintegrated over $\mathcal{A}$.

Note: We can leave out the non-atomic condition by putting atoms in the first coordinate. The $\mathcal{J} \subset \mathcal{A}$ condition can also be removed but then requires more elaborate possible cases. The former issue we leave to the reader, the latter never arises in our work. What we are proving is a special case of the Rohlin-Stone decomposition of a measure space over a factor algebra (Rohlin 1966, Stone 1950).
Proof Our problem is to construct the requisite map to a weighted average of examples of types 1 and 2. Let $\{P_i\}$ be a generating tree of partitions for $\mathcal{F}$, with $T(P_i)$ and $T^{-1}(P_i)$ both $P_{i+1}$ measurable. Let $\{Q_i\}$ be a generating tree of partitions for $\mathcal{A}$ with the same property. Remember, $\mathcal{A}$ is a Lebesgue algebra (Theorem 2.2). For a partition $P = \{S_1, \ldots, S_k\}$ and algebra $B$ we write
$$D(P|B) = (E(\chi_{S_1}|B), E(\chi_{S_2}|B), \ldots, E(\chi_{S_k}|B)), \tag{3.33}$$
a probability vector valued, $B$ measurable function defined a.e. We know, for a.e. $x$, for all $i$,
$$D(P_i|Q_j)(x) \xrightarrow[j]{} D(P_i|\mathcal{A})(x).$$
As $T(P_i)$ and $T^{-1}(P_i)$ are $P_{i+1}$ measurable and $T(Q_j)$ and $T^{-1}(Q_j)$ are $Q_{j+1}$ measurable, on this subset $X' \subset X$ of full measure
$$D(T(P_i)|Q_j)(T(x)) \xrightarrow[j]{} D(P_i|\mathcal{A})(x),$$
$$D(T^{-1}(P_i)|Q_j)(T^{-1}(x)) \xrightarrow[j]{} D(P_i|\mathcal{A})(x).$$
For $x \in X'$, let $c_1(x) \supset c_2(x) \supset \cdots$ be the chain of sets in the $\{P_i\}$ tree descending to $x$. Let
$$\alpha(x) = \lim_{i \to \infty} E(c_i(x)|\mathcal{A}).$$
As $T(c_i(x))$ is a finite union of sets in $P_{i+1}$, one of which is $c_{i+1}(T(x))$, and similarly for $T^{-1}$, $\alpha(T(x)) = \alpha(x)$ for $x \in X'$ and $\alpha$ is $\mathcal{A}$ measurable. Thus there is a $T$-invariant subset $X'' \subset X'$ of full measure, and restricting to $X''$, $\alpha$ is constant on the equivalence classes of points not separated by the $\{Q_i\}$ tree.

The function $D(P_i|\mathcal{A})(x)$ is, for each $x \in X''$, a probability vector. The component terms of $D(P_{i+1}|\mathcal{A})(x)$ can be summed in subsets according to which element of $P_i$ they belong, to obtain $D(P_i|\mathcal{A})(x)$. It follows that if $\alpha(x) > 0$, then for some $n \in \mathbb{Z}$, $\alpha(x) = 1/n$ and for some $J(x)$ large enough, if $i \ge J(x)$
$$D(P_i|\mathcal{A})(x) = \Big\{ \frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n} \Big\}, \tag{3.34}$$
where we have omitted elements of measure 0. To see (3.34) just notice that as $i$ grows, if arbitrarily small positive terms occur in $D(P_i|\mathcal{A})(x)$ then for some $x'$ equivalent to $x$, $\alpha(x') = 0$. But $\alpha(x') = \alpha(x) > 0$. Hence, as of some stage $J(x)$, sets cease being split. As $\alpha(x)$ is a constant on the equivalence class, all non-zero terms in the vector must be equal.

Break $X''$ into $T$-invariant subsets $X_1, X_2, X_3, \ldots, X_\infty$ where $x \in X_n$ if $\alpha(x) = 1/n$ ($1/\infty = 0$). We will discuss $X_\infty$, as it is the more interesting case. The $X_n$, $n < \infty$, follow a similar line of reasoning leading to the parts of the decomposition of type 1. $X_\infty$ is the part of type 2.

Assume $\mu(X_\infty) > 0$, and renormalizing $\mu$, we can assume $X = X_\infty$ and for all $x$, $\alpha(x) = 0$. We first construct a map $\bar\varphi$ from a.a. the equivalence classes of points not separated by $\{Q_i\}$ onto a.a. of $[0,1]$ by assigning to successive levels of the tree left-closed, right-open intervals. We have discussed this construction in detail in Chapter 2. The map $\bar\varphi$ is defined a.e., maps to a.a. of $[0,1]$ and is bimeasurable and measure-preserving. We can assume the subset $X'$ where $\bar\varphi$ is defined is $T$-invariant. Let $Z = \bar\varphi(X')$. We wish to refine $\bar\varphi$ to a point map $\varphi$ from $X'$ to $Z \times [0,1]$, still bimeasurable and measure-preserving, by successively cutting the fibres $f_z = z \times [0,1]$ according to the probability vectors $D(P_i|\mathcal{A})(\bar\varphi^{-1}(z))$.

Let $P_1 = \{p_1, p_2, \ldots, p_s\}$. The functions $D(P_i|\mathcal{A})$ are constant on $\{Q_i\}$ separated fibres, hence $\{D(P_i|\mathcal{A})(\bar\varphi^{-1}(z))\}_{i=1}^{\infty}$ is well-defined everywhere on $Z$ and is a measurable probability vector-valued function. For a set $q_j \in Q_1$ and $p_i \in P_1$, define a map $\varphi(q_j \cap p_i)$ to the measurable
subset of $\bar\varphi(q_j) \times [0,1]$ between the graphs of
$$E\Big( \bigcup_{m=1}^{i-1} p_m \,\Big|\, \mathcal{A} \Big) \circ \bar\varphi^{-1} \quad \text{and} \quad E\Big( \bigcup_{m=1}^{i} p_m \,\Big|\, \mathcal{A} \Big) \circ \bar\varphi^{-1}, \tag{3.35}$$
closed below, open above. It follows easily that
$$\ell^2(\varphi(q_j \cap p_i)) = \mu(q_j \cap p_i) \tag{3.36}$$
where $\ell^2$ is two-dimensional Lebesgue measure. Figure 3.2 illustrates this construction. We extend $\varphi$ to sets of the form $q_j \cap p_i$, $q_j \in Q_k$, $p_i \in P_k$, by induction on $k$. We assume the elements of $P_k$ are ordered so that those whose union is the first element of $P_{k-1}$ come first, next those in the second element, etc. Thus defining $\varphi(q_j \cap p_i)$ to be the subset of $\bar\varphi(q_j) \times [0,1]$ between the graphs of
$$E\Big( \bigcup_{m=1}^{i-1} p_m \,\Big|\, \mathcal{A} \Big) \circ \bar\varphi^{-1} \quad \text{and} \quad E\Big( \bigcup_{m=1}^{i} p_m \,\Big|\, \mathcal{A} \Big) \circ \bar\varphi^{-1},$$
closed below, open above, we automatically get
Fig. 3.2 Disintegration over a factor algebra. The dashed lines indicate how $P_2$ refines $P_1$.
1. $\ell^2(\varphi(q_j \cap p_i)) = \mu(q_j \cap p_i)$.

2. $\{\varphi(q_j \cap p_i) \mid q_j \cap p_i \text{ is from level } k \text{ of } \{Q_i \vee P_i\}\}$ partitions $Z \times [0,1]$. Call this partition $\hat Q_k \vee \hat P_k$.

3. The partitions $\{\hat Q_i \vee \hat P_i\}$ form a tree of partitions exactly mirroring the intersection properties of $\{Q_i \vee P_i\}$.

The map $\varphi$ gives a 1-1 correspondence between the chains of the $\{Q_i \vee P_i\}$ tree and the $\{\hat Q_i \vee \hat P_i\}$ tree of partitions. If two points $z_1$ and $z_2$ are not separated by the $\{\hat Q_i \vee \hat P_i\}$ tree then first, they must lie on the same fibre as the $\hat Q_i$ tree separates points of $Z$. But now, from our construction, the chain of sets $\hat c_1 \supset \hat c_2 \supset \cdots$ that contains $z_1$ and $z_2$ must intersect to an interval in the fibre $f_z$, of some length $\alpha > 0$. Now the $c_i = \varphi^{-1}(\hat c_i)$ either intersect to $\emptyset$ or to a single point $x \in X$. If not $\emptyset$, then $\alpha = \alpha(x) > 0$ which we know is not true. Hence the collection of all chains in $\{\hat Q_i \vee \hat P_i\}$ that descend to more than one point correspond to empty chains in $\{Q_i \vee P_i\}$. These form a set of chains of measure 0 in $X$, hence a set of measure 0 in $Z \times [0,1]$. On what remains, $\{\hat Q_i \vee \hat P_i\}$ is a generating tree of partitions.

The $\{\hat Q_i \vee \hat P_i\}$ chains that descend to $\emptyset$ form a set of chains of measure 0. Delete from $X$ a $T$-invariant subset of measure 0 containing the intersection points of the chains corresponding to these. On the remaining $T$-invariant set of full measure $\varphi$ now reduces to a point map which is bimeasurable and measure-preserving, as the two trees are images one of the other and generate.

Let $\hat T = \varphi T \varphi^{-1}$ be a measure-preserving map of $\varphi(X) \subset [0,1] \times [0,1]$ to itself. As $T(Q_i)$ and $T^{-1}(Q_i)$ are subsets of $Q_{i+1}$, $\hat T$ maps fibres $f_z$ to fibres, but furthermore, as $T(P_i)$ and $T^{-1}(P_i)$ are subsets of $P_{i+1}$, $\hat T$ maps the intervals on a fibre corresponding to some $p_k$ to a finite union of intervals on the image fibre in a measure-preserving fashion. The $\{\hat P_i\}$ tree restricted to a fibre generates the Lebesgue algebra $\mathcal{F}_z$ on it and so, letting $\ell_z$ represent this fibre measure, $\hat T$ is a bimeasurable measure-preserving map from $(f(z), \mathcal{F}_z, \ell_z)$ to $(f(\hat T(z)), \mathcal{F}_{\hat T(z)}, \ell_{\hat T(z)})$.

As the $\hat Q_i$ consist of vertical sets and are $\varphi(Q_i)$, $\hat{\mathcal{A}} = \varphi(\mathcal{A})$ is the completion of the algebra of vertical sets. From Fubini's theorem, for any measurable set $A \subset [0,1] \times [0,1]$, for a.e. $z$, $A \cap f(z)$ is measurable,
$$E(A|\hat{\mathcal{A}})(z) = \ell_z(A)$$
and
$$\ell^2(A) = \int_0^1 \ell_z(A)\,dz.$$
For $X_n$, $n < \infty$, the construction of the map to almost all of $[0,1] \times \{1, \ldots, n\}$ is completely analogous and, as measurability on the fibres is trivial, is much easier. ∎
Exercise 3.8 Suppose Ii(Xn ) > O. Show Xn = Ui:J Ai' a disjoint union, and T- 1 (Ai) = Ai -l(modnj. Note the similarity of this to Exercise 3.3 except here ergodicity of T is not assumed, just that J c do
Corollary 3.18 If in the above construction $\mathcal{A} = \mathcal{J}$ = the algebra of $T$-invariant (mod 0) sets, then for a.e. $z$, $\hat T$ is a bimeasurable measure-preserving ergodic map from $(f(z), \mathcal{F}_z, \ell_z)$ to itself. The disintegration of $(X, \mathcal{F}, \mu)$ over $\mathcal{J}$ is called the ergodic decomposition of the system.

Proof After deleting a set of measure 0, $T(Q_i) = Q_i$ identically. Hence, a.s.,
$$\hat T(f(z)) = f(z).$$
All that remains is to verify ergodicity. We use Corollary 3.16. Let $f$ be the characteristic function of some set $p_k \in \hat P_i$, and $f_n$ its Cesaro averages. By the Birkhoff theorem, $f_n$ converges a.e. to $E(f|\mathcal{J})$. As there are only countably many such $f$, for a.e. $z$, for $\ell_z$ a.e. $y$,
$$f_n((z, y)) \to E(f|\mathcal{J})(z),$$
a constant $\ell_z$ a.e. on $f(z)$. That this holds for finite unions of sets in some $\hat P_i$ follows easily. These are $L^1$ dense as $\{\hat P_i\}$ generates, hence by Corollary 3.16 to the Birkhoff theorem $\hat T$ on $(f(z), \mathcal{F}_z, \ell_z)$ is ergodic for a.e. $z$. ∎

This completes our discussion of disintegration over factor algebras. We will not argue the essential uniqueness of the disintegration. This is not terribly difficult to demonstrate.
4 Mixing properties

The fundamental problem of ergodic theory is to explore the structure of measure-preserving transformations in search of properties natural to them, which can be easily applied to describe and distinguish them. In this chapter we will discuss a hierarchy of such properties, each successively stronger, called mixing properties. The reason for this name is that they concern the way in which the powers of a transformation $T$ 'mix' one set, $A$, into another set $B$, i.e., they concern the sequence of functions
$$\chi_{T^{-i}(A)}\,\chi_B, \qquad i = 0, 1, 2, \ldots \tag{4.1}$$

4.1 Poincaré recurrence

The simplest of these properties is Poincaré recurrence, which says for some $i > 0$
$$\mu(T^{-i}(A) \cap A) > 0.$$
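Theorem 4.1 If $T$ is a measure-preserving transformation of the probability space $(X, \mathcal{F}, \mu)$ and $\mu(A) > 0$, then for some $0 < i \le [1/\mu(A)]$
$$\mu(T^{-i}(A) \cap A) > 0. \tag{4.2}$$

Proof If $\mu(T^{-i}(A) \cap A) = 0$ for all such $i$, then the sets $T^{-i}(A)$, $0 \le i \le [1/\mu(A)]$, are pairwise disjoint a.s., so
$$\mu(X) \ge \sum_{i=0}^{[1/\mu(A)]} \mu(T^{-i}(A)) = \mu(A) \Big( 1 + \Big[ \frac{1}{\mu(A)} \Big] \Big) > 1,$$
a conflict. ∎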
Ergodicity as a mixing property
If T is ergodic we know more. The L 2 -ergodic theorem says
which we could write as
•
52 I Fundamentals of measurable dynamics 1 "-1
11 i~
f
(XT-;(A)' XB -
Jl(A)Jl(B»dJL ~ 0;
this is half of the following theorem. Note: from here on 'transformation' will always mean an invertible, measure-preserving map from a Lebesgue probability space to itself. Theorem 4.2 and B,
A transformation T is ergodic iff for any two measurable sets A
(4.3)
Proof We know one direction, that if T is ergodic the limit holds. Assume the limit holds and suppose T(A) = A. Letting A = B 1 "-1
11 i~
f
(XT-'(A)' XA -
Jl(A)2)dJl ~ O.
But as T- 1 (A) = 1:1. this says
frl
dJl
= Jl(A) = Jl(A)2.
•
Thus Jl(A) = 0 or 1 and T is ergodic. Corollary 4.3
The transformation T is ergodic iff for any measurable set A
ifo f
1 "-1
11
(XT-;(A)' XA -
JL(A)2)dJl ~ O.
•
For mixing properties this is not unusual, that knowing all sets mix with themselves in some fashion implies the same for any pair of sets.
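The Cesaro form of ergodicity is easy to observe for an irrational circle rotation, again because arc overlaps can be computed in closed form. A sketch under assumptions: the angle `alpha`, the arcs $A = [0, 0.3)$ and $B = [0.2, 0.65)$, the number of terms and the helper names are hypothetical choices.

```python
import math

def arc_overlap(s, a, t, b):
    """Length of the intersection of the arcs [s, s+a) and [t, t+b) on R/Z, 0 < a, b <= 1."""
    d = (t - s) % 1.0
    piece1 = max(0.0, min(a, d + b) - d)      # unwrapped part of the second arc inside the first
    piece2 = max(0.0, min(a, d + b - 1.0))    # wrapped part of the second arc inside the first
    return piece1 + piece2

alpha = (math.sqrt(5) - 1) / 2   # assumed irrational rotation angle
a, b, start_b = 0.3, 0.45, 0.2   # A = [0, a), B = [start_b, start_b + b)

def cesaro_mixing(n):
    """(1/n) * sum over i < n of mu(T^{-i}(A) & B) for T x = x + alpha mod 1."""
    total = 0.0
    for i in range(n):
        # T^{-i}(A) is the arc of length a starting at -i*alpha mod 1.
        total += arc_overlap((-i * alpha) % 1.0, a, start_b, b)
    return total / n

avg = cesaro_mixing(20000)
```

The individual terms $\mu(T^{-i}(A) \cap B)$ keep oscillating between $0$ and $0.3$; only their averages settle near $\mu(A)\mu(B) = 0.135$.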
4.3
Weakly mixing
Our first non-trivial mixing property will require that the Cesaro convergence of the sums above be absolute. Definition 4.1 We say a transformation Tis weakly mixing iffor any measurable sets A and B, (4.4)
This condition lies at the heart of a large web of argument. Perhaps the core result here is that any ergodic transformation has a maximal invariant factor
algebra on which it is isomorphic to an isometry of a compact metric space. The transformation is weakly mixing exactly when this factor algebra is trivial. We will offer two proofs of this fact, one by a bare-hands construction of the metric space due to Katznelson, the other via a short discussion of spectral theory. Our first result will already show that weakly mixing is a non-trivial condition. We first need a preliminary definition. We use the symbol # to indicate the cardinality of a finite set.
Definition 4.2 A subset $S \subseteq \mathbb{N}$ is of density $\alpha$ if
$$\frac{\#(S\cap\{0,1,\ldots,n-1\})}{n} \to \alpha \quad \text{as } n \to \infty, \tag{4.5}$$
and of full density if $\alpha = 1$, i.e., S contains 'almost all' of $\mathbb{N}$.
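Lemma 4.4 A transformation T is weakly mixing iff for any measurable A and B there is a subset $S = \{n_1 < n_2 < n_3 < \cdots\}$ of $\mathbb{N}$ of full density for which
$$\lim_{k\to\infty} \mu(T^{-n_k}(A)\cap B) = \mu(A)\mu(B). \tag{4.6}$$
Proof Suppose such a subset S existed. Then, writing $S^c = \mathbb{N}\setminus S$ and using that each term is at most 1,
$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1}\left|\int \chi_{T^{-i}(A)}\chi_B\,d\mu - \mu(A)\mu(B)\right| \le \limsup_{n\to\infty}\frac{1}{n}\sum_{i\in S\cap\{0,\ldots,n-1\}}\left|\mu(T^{-i}(A)\cap B) - \mu(A)\mu(B)\right| + \limsup_{n\to\infty}\frac{1}{n}\,\#\left(S^c\cap\{0,1,\ldots,n-1\}\right),$$
which equals zero. On the other hand, if T is weakly mixing,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\left|\mu(T^{-i}(A)\cap B) - \mu(A)\mu(B)\right| = 0.$$
Letting $S_\varepsilon = \{i : |\mu(T^{-i}(A)\cap B) - \mu(A)\mu(B)| < \varepsilon\}$,
$$\limsup_{n\to\infty}\frac{\#\left(S_\varepsilon^c\cap\{0,\ldots,n-1\}\right)}{n} \le \frac{1}{\varepsilon}\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\left|\mu(T^{-i}(A)\cap B) - \mu(A)\mu(B)\right| = 0.$$
Thus $\lim_{n\to\infty} \#(S_\varepsilon\cap\{0,\ldots,n-1\})/n = 1$. If density in $\mathbb{N}$ were $\sigma$-additive, we could now just take $\bigcap_i S_{1/i}$. As it is not, we must be more clever. Choose $\{N_i\}$ so that $N_{i+1}/N_i > i$ and for $n \ge N_i$,
$$\frac{\#\left(S_{1/i}\cap\{0,\ldots,n-1\}\right)}{n} > 1 - 1/i.$$
Let $S = \bigcup_{i=1}^{\infty} S_{1/i}\cap\{0,1,\ldots,N_i-1\}$. The result is now a computation. •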
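The distinction between convergence along all of $\mathbb{N}$ and convergence along a set of full density can be made concrete with a toy sequence. The sketch below is an illustration under assumptions of my own choosing, not an example from the text: the sequence $a_i$ equals 1 on perfect squares and 0 elsewhere, so it does not converge, yet it converges to 0 along the full-density set of non-squares and its absolute Cesàro averages tend to 0. The helper names are hypothetical.

```python
import math

def is_square(n):
    """True when n is a perfect square (0, 1, 4, 9, ...)."""
    r = math.isqrt(n)
    return r * r == n

def density_up_to(predicate, n):
    """#(S ∩ {0, ..., n-1}) / n for S = {i : predicate(i)}, as in (4.5)."""
    return sum(1 for i in range(n) if predicate(i)) / n

def cesaro_abs_mean(seq, n):
    """(1/n) * sum_{i<n} |seq(i)|, the kind of average appearing in (4.4)."""
    return sum(abs(seq(i)) for i in range(n)) / n

if __name__ == "__main__":
    a = lambda i: 1.0 if is_square(i) else 0.0
    for n in (100, 10_000, 1_000_000):
        print(n, density_up_to(lambda i: not is_square(i), n), cesaro_abs_mean(a, n))
```

The non-squares have density tending to 1 while the averages of $|a_i|$ tend to 0, which is the pattern Lemma 4.4 describes for $a_i = \mu(T^{-i}(A)\cap B) - \mu(A)\mu(B)$.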
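Corollary 4.5 T is weakly mixing iff for any measurable sets A and B,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\left(\int \chi_{T^{-i}(A)}\chi_B\,d\mu - \mu(A)\mu(B)\right)^2 = 0. \tag{4.7}$$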
Theorem 4.6 An ergodic isometry T of a compact metric space X is not weakly mixing. (We assume X contains more than one point.)
Proof We may as well assume T is minimal, as otherwise X decomposes into T-invariant closed sets on which it is, and as T is ergodic, $\mu$ is supported on just one such set. Let $B_r(x)$ denote the open ball of radius r centred at x. Now $\mu(B_r(x))$ is independent of x and is a non-decreasing function of r, and so is continuous at some point $r_0 > 0$. Thus for any $\varepsilon > 0$ there is a $\delta > 0$ so that if $d(x, y) < \delta$ then
$$\mu\left(B_{r_0}(x)\,\triangle\,B_{r_0}(y)\right) < \varepsilon\,\mu(B_{r_0}(x)).$$
Now $\mu(B_\delta(x)) > 0$, and the convergence of the Birkhoff theorem applied to $B_\delta(x)$ holds uniformly on X (see Example 4 of Chapter 1 and Exercise 3.5 on unique ergodicity). Thus $\{n : T^n(x)\in B_\delta(x)\}$ has density $\mu(B_\delta(x)) > 0$. But for such an n we have $d(T^{-n}(x), x) = d(x, T^n(x)) < \delta$, so
$$\mu\left(B_{r_0}(x)\cap T^{-n}(B_{r_0}(x))\right) = \mu\left(B_{r_0}(x)\cap B_{r_0}(T^{-n}(x))\right) \ge \mu(B_{r_0}(x)) - \mu\left(B_{r_0}(x)\,\triangle\,B_{r_0}(T^{-n}(x))\right) \ge (1-\varepsilon)\,\mu(B_{r_0}(x))$$
and T is not weakly mixing. •
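Theorem 4.6 can be seen numerically on the simplest isometry, a circle rotation. The sketch below is an illustration under stated assumptions rather than anything taken from the text: $T(x) = x + \alpha \pmod 1$ with $\alpha = \sqrt{2} - 1$, and $A = B = [0, 1/2)$, for which $\mu(T^{-i}(A)\cap A) = |1/2 - t|$ where $t = -i\alpha \bmod 1$. The plain Cesàro averages of (4.3) approach $\mu(A)^2 = 1/4$, while the absolute averages of (4.4) approach $1/8$ instead of 0.

```python
import math

def overlap_half_arcs(t):
    """mu(A ∩ (A + t)) for the half circle A = [0, 1/2) in R/Z."""
    t %= 1.0
    return abs(0.5 - t)

def cesaro_averages(alpha, n):
    """Return the averages over i < n of c_i and of |c_i - mu(A)^2|,
    where c_i = mu(T^{-i}(A) ∩ A) and T is rotation by alpha."""
    plain = 0.0
    absolute = 0.0
    for i in range(n):
        c = overlap_half_arcs(-i * alpha)
        plain += c
        absolute += abs(c - 0.25)
    return plain / n, absolute / n

if __name__ == "__main__":
    for n in (100, 1_000, 20_000):
        print(n, cesaro_averages(math.sqrt(2) - 1, n))
```

The limiting value $1/8$ is $\int_0^1 \bigl|\,|1/2 - t| - 1/4\,\bigr|\,dt$, which follows from equidistribution of $i\alpha \bmod 1$.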
Corollary 4.7 If T, acting on $(X, \mathcal{F}, \mu)$, has a factor action measurably isomorphic to an isometry of a compact metric space then T is not weakly mixing.
Proof If T is weakly mixing, then restricted to any factor it also is. •
Exercise 4.1 Refine Theorem 4.6 to show that if T is a minimal isometry of a compact metric space X, $\mu$ its unique invariant Borel probability measure, then for any $F \in L^2(\mu)$ and $\varepsilon > 0$ there is a sequence $n_k$ of positive density with
$$\left|\int F(T^{n_k}(x))\,\overline{F(x)}\,d\mu - \|F\|_2^2\right| < \varepsilon.$$
We will now develop a circle of equivalent definitions of weakly mixing, one piece of which will be the converse of Corollary 4.7.
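Theorem 4.8 Let T acting on $(X, \mathcal{F}, \mu)$ be ergodic. If the Cartesian square $T\times T$, acting on $(X\times X, \mathcal{F}\times\mathcal{F}, \mu\times\mu)$, is ergodic then T is weakly mixing.
The proof rests on the following piece of arithmetic.
Lemma 4.9 If $a_i$ is a sequence of real numbers with
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} a_i = a \tag{4.8}$$
and
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} a_i^2 = a^2$$
then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}(a_i - a)^2 = 0.$$
Proof
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}(a_i - a)^2 = \lim_{n\to\infty}\left(\frac{1}{n}\sum_{i=0}^{n-1} a_i^2 - \frac{2a}{n}\sum_{i=0}^{n-1} a_i + a^2\right) = a^2 - 2a^2 + a^2 = 0.$$
•
Proof of Theorem 4.8 For any sets A and B,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu(A\cap T^{-i}(B)) = \mu(A)\mu(B)$$
as T is ergodic. Letting $\bar{A} = A\times A$ and $\bar{B} = B\times B$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu\times\mu\left(\bar{A}\cap (T\times T)^{-i}(\bar{B})\right) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu(A\cap T^{-i}(B))^2 = \mu\times\mu(\bar{A})\,\mu\times\mu(\bar{B}) = (\mu(A)\mu(B))^2,$$
as $T\times T$ is ergodic. By Lemma 4.9 then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\left(\mu(A\cap T^{-i}(B)) - \mu(A)\mu(B)\right)^2 = 0$$
and Corollary 4.5 tells us T is weakly mixing. •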
The next theorem is Katznelson's demonstration that if a transformation is not weakly mixing, then it must have a factor isomorphic to a minimal isometry. This is done by constructing an invariant pseudometric which
makes the space precompact. Later we will give an alternate proof via spectral theory.
Theorem 4.10 If T acting on $(X, \mathcal{F}, \mu)$ is ergodic and has no factor actions isomorphic to an isometry of a compact metric space, and S acting on $(Y, \mathcal{G}, \nu)$ is any other ergodic transformation, then $T\times S$ acting on $(X\times Y, \mathcal{F}\times\mathcal{G}, \mu\times\nu)$ is ergodic.
Proof We verify the contrapositive, i.e., assuming the Cartesian product is non-ergodic we construct a factor algebra on which T is isomorphic to an isometry of a compact metric space. We do this by constructing a non-trivial T-invariant pseudometric on X making it a precompact metric space. Let $A \in \mathcal{F}\times\mathcal{G}$ be a $T\times S$-invariant set, $0 < \mu\times\nu(A) = \alpha < 1$. Let $\mathcal{F}_1 = \mathcal{F}\times\{\text{trivial algebra}\}$ and $\mathcal{F}_2 = \{\text{trivial algebra}\}\times\mathcal{G}$. Both are $T\times S$-invariant algebras, and on each, $T\times S$ is ergodic. Thus $E(\chi_A \mid \mathcal{F}_1)$ is $\mathcal{F}_1$-measurable and $T\times S$-invariant, hence is $\alpha$ a.s. Letting $A_x = \{y \mid (x, y)\in A\}$, for a.e. x, $A_x$ is measurable and $\nu(A_x) = E(\chi_A\mid\mathcal{F}_1) = \alpha$, and $A_{T(x)} = S(A_x)$ as A is $T\times S$-invariant. Let $X_0 \subset X$ be a T-invariant set of full measure with $\nu(A_x) = \mu\times\nu(A) = \alpha$ for $x \in X_0$. For $x, x' \in X_0$, let $d(x, x') = \nu(A_x\,\triangle\,A_{x'})$. Now
$$d(T(x), T(x')) = \nu(A_{T(x)}\,\triangle\,A_{T(x')}) = \nu(S(A_x\,\triangle\,A_{x'})) = d(x, x'),$$
and as
$$0 \le d(x, x'') \le d(x, x') + d(x', x''),$$
d is a T-invariant pseudometric, and hence a metric on the equivalence classes $(x) = \{x' : d(x, x') = 0\}$. Let $E = \{(x)\}$, the space of such equivalence classes. As T is ergodic and $\mu\times\nu(A) \ne 0, 1$, E contains more than one point. For any set $B \in \mathcal{G}$, the function $f_B : x \to \nu(A_x\cap B)$ is constant on all x in a class $(x)\in E$, and such functions separate the classes in E. Hence $(E, \mathcal{B}, \eta)$ is a measurable factor of $(X, \mathcal{F}, \mu)$.
Set $\varphi : x \to (x)$ to be the factor map. The measure $\mu$, of course, projects to the measure $\eta$. We want to show that $U = T_\varphi$ acting on $(E, \mathcal{B}, \eta)$ is measurably isomorphic to an isometry of a compact metric space. It is sufficient for this to show that E contains a U-invariant subset $E_0$, of full measure, on which d is precompact. The map U will extend to an isometry on the compactification of $E_0$. Ergodicity of T implies U has a dense orbit, and hence is minimal and uniquely ergodic. $\mu$ must then project to this unique invariant probability measure, and $E_0$ is almost all of the compactification. Showing the existence of a precompact $E_0$ is the same as showing, for any $\varepsilon > 0$, X can be covered a.s. by a finite number of $\varepsilon$-balls in the pseudometric d.
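Let $\{A_1, A_2, \ldots\}$ be a sequence of sets dense in $\mathcal{G}$, i.e., for any $A \in \mathcal{G}$ and $\varepsilon > 0$ there is an $A_i$ with $\nu(A\,\triangle\,A_i) < \varepsilon$. For $\varepsilon$ fixed, let
$$B_i = \{x \in X_0 : \nu(A_x\,\triangle\,A_i) < \varepsilon/6\}$$
and
$$\bigcup_{i=1}^{\infty} B_i = X_0.$$
If $z \in T^j(B_i)$ then $\nu(A_z\,\triangle\,S^j(A_i)) = \nu(A_{T^{-j}(z)}\,\triangle\,A_i) < \varepsilon/6$. Without loss of generality assume $\mu(B_1) > 0$ and now select N so large that
$$\mu\left(\bigcup_{i=1}^{N} B_i\right) > 1 - \frac{\mu(B_1)}{2}.$$
Let $X_1 = \bigcup_{j=-\infty}^{\infty} T^j(B_1)$, a set of full measure by ergodicity of T. For $x \in X_1$, $x \in T^j(B_1)$, as $\mu(T^j(B_1)) = \mu(B_1) > 1 - \mu\left(\bigcup_{i=1}^{N} B_i\right)$, there must be a $y \in T^j(B_1)\cap B_k$ for some $k \in \{1, \ldots, N\}$. But then
$$\nu(A_x\,\triangle\,A_k) \le \nu(A_x\,\triangle\,S^j(A_1)) + \nu(S^j(A_1)\,\triangle\,A_y) + \nu(A_y\,\triangle\,A_k) < \frac{\varepsilon}{6} + \frac{\varepsilon}{6} + \frac{\varepsilon}{6} = \frac{\varepsilon}{2}$$
and so the sets
$$\bar{A}_i = \{x : \nu(A_x\,\triangle\,A_i) < \varepsilon/2\}, \quad i = 1, \ldots, N,$$
cover $X_1$ and are of diameter less than $\varepsilon$, completing the result. •
4.4 A little spectral theory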
Having just completed one proof that a transformation is weakly mixing if and only if it has no non-trivial isometric factors, we now present another. In fact, our true intention is to show that a transformation is weakly mixing if and only if it has no non-trivial eigenfunctions. We do this by developing a small piece of the spectral theory of transformations. This material is self-contained and can be omitted or read lightly at first passage. It provides, however, an irreplaceable tool in ergodic theory. We regard a transformation T as a unitary operator on complex-valued $L^2(\mu)$. What we want to do is to model this operator by multiplication by $e^{i\theta}$ on $L^2$ of the unit circle $S^1$ in the complex plane. We must make two sacrifices in order to do this. First we will not model all of $L^2(\mu)$ but only the T-invariant subspace generated by some single function. Second, on the unit circle, we will
not have Lebesgue measure. In fact, the core work here is the construction of the appropriate Borel measure on $S^1$. This second issue is not in fact a sacrifice. The measure we build becomes a kind of bookkeeper for much of the structure of T. To begin, let T be an ergodic transformation on $(X, \mathcal{F}, \mu)$ and $F : X \to \mathbb{C}$ a complex-valued function in $L^2(\mu)$. We will define a spectral measure associated to F, sp(F). On $L^2(\mathrm{sp}(F))$ we have a unitary operator, multiplication by the function $e^{i\theta}$. We want this operator to be isomorphic to the action of T on the subspace of $L^2(\mu)$ generated by F. In this correspondence F is to be associated to the function 1. We will construct sp(F) by describing inner products of continuous functions with respect to sp(F). We need a standard way of uniformly approximating a continuous function by trigonometric polynomials. In our work we represent $S^1$ as $\{e^{i\theta} : 0 \le \theta \le 2\pi\}$, and write functions of $S^1$ as $2\pi$-periodic functions of $\theta \in \mathbb{R}$. Let
$$K_n(\theta) = \sum_{j=-n}^{n}\left(1 - \frac{|j|}{n+1}\right)e^{ij\theta} = \frac{1}{n+1}\left(\frac{\sin\left(\frac{n+1}{2}\theta\right)}{\sin\frac{\theta}{2}}\right)^2 \tag{4.9}$$
be Fejér's kernel (Katznelson 1968).
Lemma 4.11 Fejér's kernel is a positive summability kernel in that
(1) $K_n(\theta)$ is $2\pi$-periodic, continuous and non-negative for all n, $\theta$;
(2) $\frac{1}{2\pi}\int_0^{2\pi} K_n(\theta)\,d\theta = 1$; and
(3) $\lim_{n\to\infty} K_n(\theta) = 0$ uniformly on any interval $\delta \le \theta \le 2\pi - \delta$, $0 < \delta < \pi$.
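Proof Exercise 4.2 part (1). •
Corollary 4.12 For any continuous $2\pi$-periodic $f : \mathbb{R} \to \mathbb{C}$,
$$\sigma_n(f)(\theta) = \frac{1}{2\pi}\int_0^{2\pi} f(\theta - t)K_n(t)\,dt = \sum_{j=-n}^{n} c_j^n(f)\,e^{ij\theta} \tag{4.10}$$
is a trigonometric polynomial, where
$$c_j^n(f) = \left(1 - \frac{|j|}{n+1}\right)\frac{1}{2\pi}\int_0^{2\pi} f(t)e^{-ijt}\,dt = \left(1 - \frac{|j|}{n+1}\right)c_j(f). \tag{4.11}$$
Further, $\sigma_n$ is a positive linear operator and $\sigma_n(f) \to f$ uniformly.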
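The identity (4.9) and the properties listed for Fejér's kernel are easy to sanity-check numerically. The following is a minimal sketch written for this purpose; the function names are hypothetical and the sample points are arbitrary. It compares the defining sum with the closed form, confirms that the mean value over a period is 1, and shows the kernel becoming small away from $\theta = 0$.

```python
import math

def fejer_sum(n, theta):
    """K_n(theta) from the defining sum over |j| <= n of (1 - |j|/(n+1)) e^{ij theta}."""
    # The terms for j and -j combine into a cosine, so the sum is real.
    return 1.0 + 2.0 * sum((1 - j / (n + 1)) * math.cos(j * theta)
                           for j in range(1, n + 1))

def fejer_closed(n, theta):
    """Closed form (1/(n+1)) (sin((n+1) theta/2) / sin(theta/2))^2, theta not in 2 pi Z."""
    return (math.sin((n + 1) * theta / 2) / math.sin(theta / 2)) ** 2 / (n + 1)

def mean_value(func, samples=4096):
    """(1/2pi) times the integral over [0, 2pi), via an equally spaced midpoint sum."""
    return sum(func(2 * math.pi * (k + 0.5) / samples) for k in range(samples)) / samples

if __name__ == "__main__":
    print(fejer_sum(7, 1.3), fejer_closed(7, 1.3))   # the two forms agree
    print(mean_value(lambda t: fejer_sum(7, t)))     # close to 1
    print(fejer_closed(50, 1.0))                     # small away from 0
```

The smallness away from 0 reflects the bound $K_n(\theta) \le 1/((n+1)\sin^2(\theta/2))$, which is what drives property (3).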
Proof Exercise 4.2 part (2). •
Exercise 4.2 (1) Prove Lemma 4.11. (2) Prove Corollary 4.12.
Let f and g be two $2\pi$-periodic continuous functions from $\mathbb{R}$ to $\mathbb{C}$. Define a series of bilinear forms
$$\langle f, g\rangle_n = \sum_{j,k=-n}^{n} c_j^n(f)\,\overline{c_k^n(g)}\,a_{j,k},$$
where $a_{j,k} = \int F(T^j(x))\,\overline{F(T^k(x))}\,d\mu$.
We want to see that these bilinear forms converge to the desired inner product. The next result is a special case of Bochner's theorem.
(\,()'" Theorem 4.13 Suppose {is continuous, 2n-periodic, real-valued and positive. The operators f .....
< -( ) 1>\0.-
(../j-;--'JJ>n
Proof For f in the cone of positive continuous functions L. : f ..... is continuous in the uniform norm on f. We can extend Ln to all of CIR([O, 2n» by L(f+) - L(f-) and get a complex-valued continuous linear functional. Hence by the Riesz representation theorem, Ln(f) = f dVn for some perhaps complex-valued measure vn • But for f ~ we compute
J
°
so Vn is a positive real-valued measure with vn([0,2n» = <1, 1)n = ao,o'
The $\nu_n$ lie in a bounded, hence weak* compact, region of $C^*([0, 2\pi))$. If $P = \sum_{j=-N}^{N} c_j e^{ij\theta}$ is a trigonometric polynomial,
$$\int |P|^2\,d\nu_n \xrightarrow[n]{} \sum_{j,k=-N}^{N} c_j\,\overline{c_k}\,a_{j,k}.$$
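Any positive continuous f is the uniform limit of such polynomials $|P|^2$. Thus the sequence of measures $\{\nu_n\}$ cannot have more than one weak* limit and so must converge weak* to a Borel measure we call sp(F). We want to identify
$$\int f\,\overline{g}\,d(\mathrm{sp}(F)) = (f, g)_{\mathrm{sp}(F)}.$$
Let P and Q be positive real-valued trigonometric polynomials
$$P = \sum_{j=-N}^{N} b_j e^{ij\theta}, \qquad Q = \sum_{k=-N}^{N} c_k e^{ik\theta}.$$
Using
$$2(P, Q)_{\mathrm{sp}(F)} = (P, Q)_{\mathrm{sp}(F)} + (Q, P)_{\mathrm{sp}(F)} = (P+Q, P+Q)_{\mathrm{sp}(F)} - (P, P)_{\mathrm{sp}(F)} - (Q, Q)_{\mathrm{sp}(F)}$$
one easily computes
$$(P, Q)_{\mathrm{sp}(F)} = \sum_{j,k=-N}^{N} b_j\,\overline{c_k}\,a_{j,k}.$$
This easily extends by linearity to arbitrary complex-valued polynomials. Thus for f and g continuous
$$(\sigma_n(f), \sigma_n(g))_{\mathrm{sp}(F)} = \sum_{j,k=-n}^{n} c_j^n(f)\,\overline{c_k^n(g)}\,a_{j,k} = \langle f, g\rangle_n. \tag{4.13}$$
As $\sigma_n(f)$ and $\sigma_n(g)$ converge uniformly to f and g,
$$\langle f, g\rangle_n \to (f, g)_{\mathrm{sp}(F)}.$$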
•
Define now a map $\Phi_F$ : {trigonometric polynomials} $\to L^2(\mu)$ by setting
$$\Phi_F\left(\sum_{j=-N}^{N} b_j e^{ij\theta}\right) = \sum_{j=-N}^{N} b_j F(T^j(x)). \tag{4.14}$$
Thus, for example, we can compute
$$\Phi_F(1) = F(x)$$
and
$$\Phi_F(e^{i\theta}P(\theta)) = \Phi_F(P(\theta))\circ T$$
for any trigonometric polynomial P. The next result is the core of spectral theory as we will use it. Let $L^2(F, \mu)$ be the closure in $L^2(\mu)$ of the linear span of the set of functions $\{F\circ T^j\}_{j\in\mathbb{Z}}$.
Theorem 4.14 The map $\Phi_F$ extends to an $L^2$-isometry from $L^2(\mathrm{sp}(F))$ to $L^2(F, \mu)$. As we have seen, $\Phi_F(1) = F$ and for any $f \in L^2(\mathrm{sp}(F))$, $\Phi_F(e^{i\theta}f(\theta)) = \Phi_F(f)\circ T$.
Proof For P and Q polynomials,
$$(P, Q)_{\mathrm{sp}(F)} = \sum_{j,k} b_j\,\overline{c_k}\,a_{j,k} = \int\left(\sum_j b_j F(T^j(x))\right)\overline{\left(\sum_k c_k F(T^k(x))\right)}\,d\mu = (\Phi_F(P), \Phi_F(Q)).$$
Thus $\Phi_F$ is an $L^2$-isometry where it is defined. Certainly then it extends isometrically to the closure of the polynomials in $L^2(\mathrm{sp}(F))$. This is all of $L^2(\mathrm{sp}(F))$. The image under $\Phi_F$ of the trigonometric polynomials is exactly the linear span of $\{F\circ T^j\}_{j\in\mathbb{Z}}$. Hence the range of the extended $\Phi_F$ is its closure. Since T acts as an isometry of $L^2(F, \mu)$ to itself, as does multiplication by $e^{i\theta}$ on $L^2(\mathrm{sp}(F))$, the identity
$$\Phi_F(e^{i\theta}f(\theta)) = \Phi_F(f)\circ T$$
extends from polynomials to all of $L^2(\mathrm{sp}(F))$. •
Corollary 4.15 If $G \in L^2(F, \mu)$, then $\mathrm{sp}(G) \ll \mathrm{sp}(F)$ and we can compute the Radon-Nikodym derivative
$$\frac{d\,\mathrm{sp}(G)}{d\,\mathrm{sp}(F)} = \left|\Phi_F^{-1}(G)\right|^2.$$
Proof Supposing $G = \Phi_F(P)$, P a polynomial, and f and g are also polynomials, it is a finite computation that
$$(Pf, Pg)_{\mathrm{sp}(F)} = (f, g)_{\mathrm{sp}(\Phi_F(P))}.$$
Suppose now $G \in L^2(F, \mu)$ is arbitrary, $G = \Phi_F(h)$. Suppose $P_i \to h$ in $L^2(\mathrm{sp}(F))$, where the $P_i$ are polynomials. Then $G_i = \Phi_F(P_i) \to G$ in $L^2(\mu)$. Thus all the coefficients
$$\int G_i(T^j(x))\,\overline{G_i(T^k(x))}\,d\mu \to \int G(T^j(x))\,\overline{G(T^k(x))}\,d\mu.$$
This tells us, for f and g still polynomials,
$$(hf, hg)_{\mathrm{sp}(F)} = (f, g)_{\mathrm{sp}(G)}.$$
As $\Phi_F$ and $\Phi_G$ are both $L^2$-isometries, this tells us that the map $f \to hf$ is an $L^2$-isometry from $L^2(\mathrm{sp}(G))$ into the ideal $hL^2(\mathrm{sp}(F))$, and for any $f \in L^2(\mathrm{sp}(G))$,
$$\int f\,d\,\mathrm{sp}(G) = \int |h|^2 f\,d\,\mathrm{sp}(F).$$
•
A complete discussion of the spectral theory of ergodic transformations would now include how $L^2(\mu)$ is built up from the pieces $L^2(F, \mu)$. We have what we want of spectral theory now, though, and refer the reader to Parry (1969) to pursue this picture further.
4.5 Weakly mixing and eigenfunctions
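We will now see how a spectral measure sp(F) can ferret out the failure of weakly mixing.
Lemma 4.16 For $F \in L^2(\mu)$,
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\left|\int F(x)\,\overline{F(T^j(x))}\,d\mu\right|^2$$
tends to zero in n iff sp(F) has no atoms.
Proof To identify the atoms of sp(F) we look at $[0, 2\pi)\times[0, 2\pi)$ with measure $\nu = \mathrm{sp}(F)\times\mathrm{sp}(F)$. Notice that
$$\nu(\{(\theta, \theta) : \theta \in [0, 2\pi)\}) = \sum_{\theta}\left(\mathrm{sp}(F)(\{\theta\})\right)^2,$$
the sum running over the atoms of sp(F), so the diagonal is null exactly when sp(F) has no atoms. To compute $\nu(\{(\theta, \theta)\})$ consider the sequence of functions
$$l_n(x, y) = \frac{1}{2n+1}\sum_{j=-n}^{n} e^{ij(x-y)} \tag{4.15}$$
which converge pointwise to the characteristic function of the diagonal and are uniformly bounded by 1. Thus
$$\nu(\{(\theta, \theta)\}) = \lim_{n\to\infty}\int l_n(x, y)\,d\nu.$$
Computing,
$$\int l_n(x, y)\,d\nu = \frac{1}{2n+1}\sum_{j=-n}^{n}\int e^{ijx}e^{-ijy}\,d\nu = \frac{1}{2n+1}\sum_{j=-n}^{n} a_{0,j}\,a_{0,-j} = \frac{1}{2n+1}\sum_{j=-n}^{n}\left|\int F(x)\,\overline{F(T^j(x))}\,d\mu\right|^2.$$
•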
Corollary 4.17 Suppose T is ergodic. For any set $A \in \mathcal{F}$,
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\left(\mu(A\cap T^{-j}(A)) - \mu(A)^2\right)^2 \tag{4.16}$$
tends to zero in n iff sp(F) has no atoms away from 0, where $F = \chi_A(x) - \mu(A)$.
Proof $F \in L^2(\mu)$ and notice
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\left|\int F(x)F(T^j(x))\,d\mu\right|^2 = \frac{1}{2n+1}\sum_{j=-n}^{n}\left(\int(\chi_A(x) - \mu(A))(\chi_A(T^j(x)) - \mu(A))\,d\mu\right)^2 = \frac{1}{2n+1}\sum_{j=-n}^{n}\left(\mu(A\cap T^{-j}(A)) - \mu(A)^2\right)^2.$$
Furthermore,
$$\mathrm{sp}(F)(\{0\}) = \lim_{n\to\infty}\frac{1}{2n+1}\sum_{j=-n}^{n}\int F(x)F(T^j(x))\,d\mu = \int F\,d\mu\int F\,d\mu$$
by the $L^2$-ergodic theorem (Theorem 3.1). This is zero as $\int F\,d\mu = 0$. Thus sp(F) has no atom at 0. By Lemma 4.16 it has no atoms elsewhere if and only if
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\left(\mu(A\cap T^{j}(A)) - \mu(A)^2\right)^2$$
tends to zero. •
Notice that
$$\frac{1}{2n+1}\sum_{j=-n}^{n}\left(\mu(A\cap T^{j}(A)) - \mu(A)^2\right)^2 = \left(\frac{2n+2}{2n+1}\right)\left(\frac{1}{n+1}\sum_{j=0}^{n}\left(\mu(A\cap T^{j}(A)) - \mu(A)^2\right)^2\right) - \frac{\left(\mu(A) - \mu(A)^2\right)^2}{2n+1}.$$
Thus the limit of the symmetric average is the same as that of the one-sided average.
Corollary 4.18 Suppose T is ergodic but not weakly mixing. There is then a $\lambda \in \mathbb{C}$, $|\lambda| = 1$, $\lambda \ne 1$ and an $f \in L^2(\mu)$, $|f| = 1$ with $f(T(x)) = \lambda f(x)$, i.e., T has an eigenfunction with eigenvalue $\lambda$.
Proof As T is not weakly mixing, for some non-trivial set A and $F(x) = \chi_A(x) - \mu(A)$, we know sp(F) has an atom not at 0. Suppose it is at a point $\theta_0$. Set $\lambda = e^{i\theta_0}$. The function
$$\delta_{\theta_0}(\theta) = \begin{cases} 1 & \text{at } \theta_0 \\ 0 & \text{elsewhere} \end{cases}$$
is in $L^2(\mathrm{sp}(F))$ and
$$\int \delta_{\theta_0}\,d\,\mathrm{sp}(F) > 0.$$
Thus $f = \Phi_F(\delta_{\theta_0})$ is not identically zero. But notice
$$f(T(x)) = \Phi_F(e^{i\theta}\delta_{\theta_0}(\theta)) = \Phi_F(\lambda\delta_{\theta_0}) = \lambda f(x), \quad \mu\text{-a.s.}$$
To see that $f \in L^\infty(\mu)$ just note
$$|f(T(x))| = |\lambda||f(x)| = |f(x)|$$
and so $|f|$ is a constant $\mu$-a.s. As $f \ne 0$, we can normalize f to have modulus 1. •
The eigenfunction f above can be regarded as a factor map from X to $S^1$ carrying the action of T to rotation by $\theta_0$. If $\theta_0/2\pi$ is irrational we know $R_{\theta_0}$ is uniquely ergodic, hence $f(\mu)$ must be Lebesgue measure on $S^1$. If $\theta_0/2\pi$ is rational then $R_{\theta_0}$ is not ergodic, but is still uniquely ergodic on each of its ergodic components. They are finite sets of course. Thus $f(\mu)$ must map to exactly one of them. In either case, we have obtained an isometric factor, giving an alternate proof of Theorem 4.10. We have in fact gained more, having identified the isometry as either a rotation on the circle or on a finite point set. Thus, for example, we have shown that any isometry of a compact metric space must have such a factor algebra. As at the end of Theorem 4.10, we are once more poised ready to show that an ergodic T has a maximal isometric factor. To do so along the lines of the proof of Theorem 4.10 involves constructing an appropriate metric from the full algebra of invariant sets in $X\times X$. Here, using spectral theory, we can more explicitly describe this algebra as the algebra generated by the eigenfunctions. The full discussion of this we construct as a series of exercises.
Exercise 4.3
1. Using Exercise 4.1, show that if T is a minimal isometry of a compact metric space and $F \in L^2(\mu)$ then sp(F) must have atoms.
2. Using Corollary 4.15 show that in fact sp(F) must be purely atomic. Hint: pick G to avoid atoms.
3. Conclude that for a minimal isometry, $L^2(\mu)$ is generated by eigenfunctions.
4. For T arbitrary but ergodic, let $\mathcal{A}$ be the algebra generated by the eigenfunctions. Show $\mathcal{A}$ must contain all isometric factors.
5. Let $\Lambda(T)$ be the set of all eigenvalues of T. Show that, as T is ergodic and $L^2(\mu)$ is separable, $\Lambda(T)$ is countable.
6. For each $\lambda \in \Lambda(T)$, let $S_\lambda \subseteq L^2(\mu)$ be the subspace of eigenfunctions with eigenvalue $\lambda$. Show that as T is ergodic, $\dim(S_\lambda) = 1$.
7. Let $\Lambda(T) = \{\lambda_i\}$ and $f_i \in S_{\lambda_i}$ be a generator for $S_{\lambda_i}$. Show that the pseudometric
$$d(x, y) = \sum_i \frac{|f_i(x) - f_i(y)|}{2^i}$$
makes the action of T on the factor algebra $\mathcal{A}$ isomorphic to a minimal isometry of a compact metric space.
This completes the argument that $\mathcal{A}$ is the maximal isometric factor. It gives more, showing that T is an isometry exactly when all sp(F) are purely atomic. A minimal isometry is in fact determined, up to isomorphism, by its eigenvalues $\Lambda(T)$. A proof of this result of von Neumann is accessible from our current vantage. It follows from the observation that if the eigenvalues agree, then one can construct an isometry between the two metrics as given in Exercise 4.3, part 7, above. This requires a careful analysis of how the various eigenfunctions must fit together. Thus spectral theory, when the spectral measures are purely atomic, is carried in the collection of numbers $\Lambda(T)$, and is relatively simple. When the spectral measures sp(F) are non-atomic the situation is much more difficult. We will stop here though, and return to tie together all our discussions of the weakly mixing property.
Proposition 4.19 For an ergodic transformation T acting on $(X, \mathcal{F}, \mu)$, the following are equivalent:
(1) T is weakly mixing;
(2) the Cartesian product of T with any other ergodic transformation, with product measure, is ergodic;
(3) the Cartesian square $T\times T$, with product measure, is ergodic;
(4) T has no factor action isomorphic to an isometry of a compact metric space;
(5) T has no non-trivial eigenfunctions.
•
This circle of equivalent conditions makes the weakly mixing property extremely easy to work with. In Chapter 6 we will add another condition equivalent to these, that of disjointness from all isometries. For a continuation of these investigations into the weakly mixing property, see Furstenberg (1981).
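How conditions (3) and (5) fail together can be seen on a circle rotation. The sketch below is an illustration under assumptions not taken from the text: $T(x) = x + \alpha \pmod 1$ on $[0, 1)$ with $\alpha = \sqrt{2} - 1$, eigenfunction $f(x) = e^{2\pi i x}$ and eigenvalue $\lambda = e^{2\pi i\alpha}$. The product $g(x, y) = f(x)\overline{f(y)}$ is then a non-constant $T\times T$-invariant function, so $T\times T$ is not ergodic. All names in the code are hypothetical.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1                      # assumed rotation number
EIGENVALUE = cmath.exp(2j * math.pi * ALPHA)  # lambda = e^{2 pi i alpha}

def rotate(x):
    """The rotation T(x) = x + alpha mod 1."""
    return (x + ALPHA) % 1.0

def eigenfunction(x):
    """f(x) = e^{2 pi i x}, which satisfies f(T(x)) = lambda f(x)."""
    return cmath.exp(2j * math.pi * x)

def product_invariant(x, y):
    """g(x, y) = f(x) * conj(f(y)); invariant under T x T but not constant."""
    return eigenfunction(x) * eigenfunction(y).conjugate()

if __name__ == "__main__":
    x, y = 0.3, 0.8
    print(abs(eigenfunction(rotate(x)) - EIGENVALUE * eigenfunction(x)))
    print(abs(product_invariant(rotate(x), rotate(y)) - product_invariant(x, y)))
    print(product_invariant(x, y), product_invariant(x, x))
```

The first two printed values are zero up to rounding; the last line shows two different values of g, so g is not constant.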
4.6 Mixing
Definition 4.3 Our next mixing property of a transformation T, simply called mixing, is that for any two sets $A, B \in \mathcal{F}$
$$\lim_{n\to\infty}\mu(T^n(A)\cap B) = \mu(A)\mu(B). \tag{4.17}$$
Thus the limit on a set of full density of Lemma 4.4 is made a limit.
This would seem to be the most natural of mixing conditions, having a far simpler definition than weak mixing (or the later K-mixing). The fact that we will prove no theorems about mixing is one indication that this is not the case. The mixing property is in fact rather difficult to work with.
Definition 4.4 We say T is k-fold mixing if for any sets $A_1, A_2, \ldots, A_k$,
$$\lim_{n_2, \ldots, n_k\to\infty}\mu\left(A_1\cap T^{n_2}\left(A_2\cap T^{n_3}\left(A_3\cap\cdots\left(A_{k-1}\cap T^{n_k}(A_k)\right)\cdots\right)\right)\right) = \mu(A_1)\mu(A_2)\cdots\mu(A_k). \tag{4.18}$$
There is a parallel notion of k-fold weak mixing, that the full density version of this limit holds. Furstenberg (1981) has shown that weakly mixing implies k-fold weakly mixing for all k. The relationship between two-fold and three-fold mixing remains one of the outstanding unsolved problems in measurable dynamics.
Chacon's transformation
We will now describe the construction of a transformation T, due to Chacon, which is weakly mixing but not mixing. We give a rank-1 cutting and stacking version of the construction. As described in Example 5 of Chapter 1, for Chacon's example
$$k(n) = 3, \qquad S(1, n) = S(3, n) = 0, \qquad S(2, n) = 1,$$
i.e., at each stage of the construction the stack is cut into three equal slices, which are stacked in order with a single new level placed between the second and third slices (Fig. 4.1). It is a computation that T acts on the interval $[0, 3/2)$.
Theorem 4.20 Chacon's map is weakly mixing but not mixing.
Proof Let N(n) be the number of intervals in the nth stack, and label these intervals
$$I(1, n), I(2, n), \ldots, I(N(n), n)$$
in order from the bottom of the stack upward. Thus for $1 \le j < N(n)$, $T(I(j, n)) = I(j+1, n)$.
Fig. 4.1 Cutting and stacking in Chacon's map (spacer S(2, n) = 1).
From the diagram above, for $1 < j < N(n)$,
$$\mu\left(T^{N(n)}(I(j, n))\cap I(j, n)\right) \ge \tfrac{1}{3}\,\mu(I(j, n)), \qquad \mu\left(T^{N(n)+1}(I(j, n))\cap I(j, n)\right) \ge \tfrac{1}{3}\,\mu(I(j, n)). \tag{4.19}$$
Let $S = I(2, 2)$, the second level of the second stack; $\mu(S) = 2/9$. For any $n > 2$, S is a disjoint union of sets of the form $I(j, n)$ where $1 < j < N(n)$. It follows that for $n > 2$,
$$\mu\left(T^{N(n)}(S)\cap S\right) \ge \tfrac{1}{3}\,\mu(S) > \mu(S)^2 \tag{4.20}$$
and T is not mixing. We show T is weakly mixing by contradiction. Suppose $d(\cdot, \cdot)$ is a non-trivial T-invariant pseudometric on $[0, 3/2]$ making it precompact. For $1/10 > \varepsilon > 0$, let D be an $\varepsilon$-ball of positive measure $\ne 1$. As the intervals $I(j, n)$ refine in n, there must be an n and j with $I(j, n)$ satisfying
$$\mu(I(j, n)\cap D) \ge (1 - \varepsilon)\,\mu(I(j, n)), \tag{4.21}$$
i.e., all but a fraction $\varepsilon$ of $I(j, n)$ is within a single $\varepsilon$-ball. Let
$$\tilde{I}(j, n) = I(j, n)\cap D.$$
As d is T-invariant, setting $\tilde{I}(k, n) = T^{k-j}(\tilde{I}(j, n))$, each $\tilde{I}(k, n)$ has radius at most $\varepsilon$ and occupies a fraction $(1 - \varepsilon)$ of $I(k, n)$. As $\varepsilon < 1/10$, and
$$\mu\left(T^{N(n)}(I(j, n))\cap I(j, n)\right) \ge \tfrac{1}{3}\,\mu(I(j, n)),$$
we know
$$\mu\left(T^{N(n)}(\tilde{I}(j, n))\cap\tilde{I}(j, n)\right) > 0.$$
Hence for a set of points x of positive measure, i.e., those in $\tilde{I}(j, n)\cap T^{-N(n)}(\tilde{I}(j, n))$,
$$d(T^{N(n)}(x), x) < 2\varepsilon.$$
But as $f(x) = d(T^{N(n)}(x), x)$ is T-invariant it is constant a.e., hence
$$d(T^{N(n)}(x), x) < 2\varepsilon \tag{4.22}$$
almost surely. Using
$$\mu\left(T^{N(n)+1}(I(j, n))\cap I(j, n)\right) \ge \tfrac{1}{3}\,\mu(I(j, n))$$
we similarly conclude
$$d(T^{N(n)+1}(x), x) < 2\varepsilon \tag{4.23}$$
almost surely. But then
$$d(T(x), x) < 4\varepsilon \tag{4.24}$$
almost surely, hence $d(T(x), x) = 0$ almost surely and as a.e. orbit is dense for d, d is the trivial pseudometric. •
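The bookkeeping behind Chacon's construction, namely the heights N(n), the width of the levels and the claim that T acts on $[0, 3/2)$, can be tabulated exactly. The sketch below is a minimal illustration and assumes, as a normalization not spelled out in this section, that the first stack is the single interval $[0, 1)$; the function name `chacon_stacks` is hypothetical.

```python
from fractions import Fraction

def chacon_stacks(stages):
    """Return [(N(n), level width, total Lebesgue measure)] for n = 1..stages.

    Assumption: stack 1 is the single interval [0, 1). Each step cuts the
    stack into three slices and adds one spacer level between slices 2 and 3,
    so the height goes from N to 3N + 1 and the width is divided by 3.
    """
    height, width = 1, Fraction(1)
    rows = []
    for _ in range(stages):
        rows.append((height, width, height * width))
        height, width = 3 * height + 1, width / 3
    return rows

if __name__ == "__main__":
    for height, width, total in chacon_stacks(6):
        print(height, width, total, float(total))
    # Normalized measure of one level of the second stack, e.g. S = I(2, 2):
    print(chacon_stacks(2)[1][1] / Fraction(3, 2))
```

Under this normalization the total measure increases to $3/2$, and dividing the width $1/3$ of a level of the second stack by $3/2$ gives the value $\mu(S) = 2/9$ used in the proof.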
Exercise 4.4 Give an alternate proof that Chacon's map is weakly mixing by showing it has no non-trivial measurable eigenfunctions. Hint: Show that any measurable function must finally be essentially constant on most levels of the towers. Use this, and the 'spacer', to show that the only possible eigenvalue is 1.
Exercise 4.5 Consider the rank-1 cutting and stacking construction with $k(n) = 2^n$, and spacers $S(i, n) = 0$ except for $i = 2^{n-1}$ and $S(2^{n-1}, n) = 1$. Call the ergodic map obtained T.
1. Show T is weakly mixing.
2. Show that there is a sequence $n_k \to \infty$ so that for any set S, $\mu(T^{n_k}(S)\cap S) \to \mu(S)$ (this property is called rigidity).
3. Show that a mixing map is never rigid, and hence T is not mixing.
4. Show that an isometry of a compact metric space is always rigid.
5. Show that Chacon's map is not rigid, and in fact, for no set $S \ne \emptyset$ in X is there a sequence $n_k \nearrow \infty$ with $\mu(T^{n_k}(S)\,\triangle\,S) \to 0$. This gives a third proof that Chacon's map is weakly mixing.
4.7 The Kolmogorov property
We end this chapter by giving a definition of K-automorphism (K is for Kolmogorov) that shows how it fits into the hierarchy of mixing conditions. In the next chapter on entropy we will develop a chain of equivalences for the K-automorphisms very similar to that we developed for weak mixing.
Definition 4.5 A map T acting on the Lebesgue probability space $(X, \mathcal{F}, \mu)$ is called a K-automorphism provided the following is true. For any finite partition P and for any $\varepsilon > 0$ there is an N (depending on both P and $\varepsilon$) so that for any $t \in \mathbb{Z}^+$ and any integers $n_2, n_3, \ldots, n_t$ with $n_2 > N$, $n_3 - n_2 > N$, ..., $n_t - n_{t-1} > N$, we have (4.25)
for all $B \in P$.
Comments on the definition.
1. Notice that if t were kept bounded by k the definition reduces to
$$\left|\frac{\mu\left(B\cap T^{n_2}(B_2)\cap T^{n_3}(B_3)\cap\cdots\cap T^{n_k}(B_k)\right)}{\mu\left(T^{n_2}(B_2)\cap T^{n_3}(B_3)\cap\cdots\cap T^{n_k}(B_k)\right)} - \mu(B)\right| < \varepsilon \tag{4.26}$$
once $n_2, n_3 - n_2, \ldots, n_k - n_{k-1}$ are large enough. A little induction shows that this is equivalent to k-fold mixing. Thus the K-property is a kind of uniform k-fold mixing, and K-automorphisms are mixing of all orders.
2. Although the definition gives uniformity in t, we must choose the sets $B, B_2, \ldots, B_t$ from some fixed finite collection. If we allowed ourselves to choose from an infinite collection, for example, $B, T^{-1}(B), T^{-2}(B), \ldots$, no map would satisfy the condition.
3. We use conditional expectation in the definition as opposed to
as these values automatically tend to zero in t if T is mixing. For any e and for large t, the condition would be vacuous. Thus this property is just k-fold mixing for all k.
The K-property asks for more, requiring that each new $T^{n_t}(B_t)$ mix well with the previous ones relative to their size. Just as weakly mixing opened the door to spectral theory and point spectrum, the K-property will open the door to entropy.
5 Entropy
Our approach so far to the classification of ergodic processes has been to search for natural invariants of the process. We have discussed two sorts, the point spectrum and mixing properties. One more invariant remains to be looked at, perhaps the most fundamental numerical invariant of stationary stochastic processes, the Kolmogorov-Sinai entropy. What the entropy attempts to measure is the rate at which a process becomes random.
5.1 Counting names
The approach we take to this question is unusual, being based from the outset on name counting. We will not intersect the classical theory for some time (at Definition 5.3). With our approach the ideas will evolve quite naturally, and the skills we develop in manipulating names will stand us in good stead in Chapters 6 and 7. Let's begin to focus precisely on how entropy is computed. Let T, acting on the Lebesgue probability space $(X, \mathcal{F}, \mu)$, be an invertible measure-preserving ergodic map, and $P = \{p_1, \ldots, p_s\}$ be a finite partition of X into measurable sets. Here is a slightly novel notion of a 'partition' which will be very useful for us. Regard P as a function from X to a 'finite state space' of 'symbols' or 'names.' Thus $P : X \to \{p_1, \ldots, p_s\}$ is a finite partition and $\{p_1, \ldots, p_s\}$ is the space of symbols. The set $p_k$ is then more precisely $P^{-1}(p_k)$, the set of points 'named,' or 'labeled' $p_k$. It will happen often that we will label several partitions with the same labels. In this case the state space will be lower case letters, subscripted to order them. The function will be indicated by the corresponding capital letter, embellished with a subscript or other notational device to indicate it uniquely ($P_1$, $P_2$, $P'$, $\bar{P}$, etc.). When there is no ambiguity we will refer to a set by its name. The T,P,n-name of $x \in X$ is a sequence of n symbols, each chosen from among $p_1, \ldots, p_s$, written
$$P_n(x) = (P(x), P(T(x)), \ldots, P(T^{n-1}(x))) \tag{5.1}$$
where P(x) is that element of P containing x. Thus $P_n$ maps X to $P^n$. These sequences of n symbols from P are then the names for the sets in $\bigvee_{i=0}^{n-1} T^{-i}(P)$. The measure $\mu$ on X is transported by $P_n$ to a measure on the elements of $P^n$: the measure of a name is the measure of the set of points which have that name.
Entropy, roughly speaking, is the exponential rate of growth in n of the number of T,P,n-names. To be precise, fix $\varepsilon > 0$. Starting from the names of least measure in $P^n$, remove as many as possible so that the measure of the remaining names is still greater than $(1 - \varepsilon)$. This collection of remaining names we denote by
$$S(T, P, n, \varepsilon).$$
The presence of the $\varepsilon$ is what ties the entropy to the invariant measure $\mu$. We only consider the 'large' names. If we did not omit any names, measure would play no role in the definition. This is a fruitful approach also and leads to the development of topological entropy for symbolic systems. Let
$$N(T, P, n, \varepsilon) = \#(S(T, P, n, \varepsilon)). \tag{5.2}$$
It seems reasonable to guess that this number would grow exponentially in n. To try to extract the coefficient of this growth we introduce the following definition.
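Definition 5.1 For T ergodic and P a finite partition let
$$h(T, P, n, \varepsilon) = \frac{1}{n}\log_2(N(T, P, n, \varepsilon)) \quad [\text{note: } \le \log_2 s]. \tag{5.3}$$
To remove dependence on n set
$$h(T, P, \varepsilon) = \liminf_{n\to\infty} h(T, P, n, \varepsilon), \tag{5.4}$$
and to remove dependence on $\varepsilon$ set
$$h(T, P) = \lim_{\varepsilon\to 0} h(T, P, \varepsilon). \tag{5.5}$$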
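Definition 5.1 can be carried out by brute force in a toy case. The sketch below is an illustration under assumptions not made in the text: T is taken to be the shift on an i.i.d. (Bernoulli) process and P the time-zero partition, so the measure of a name is the product of its symbol probabilities. The function names are hypothetical, and the enumeration of all $s^n$ names is only feasible for small n.

```python
import math
from itertools import product

def name_count(probs, n, eps):
    """N(T, P, n, eps) for an i.i.d. process with symbol probabilities `probs`.

    Names are removed starting from those of least measure, for as long as
    the measure of the remaining names stays greater than 1 - eps.
    """
    measures = sorted(
        math.prod(probs[s] for s in name)
        for name in product(range(len(probs)), repeat=n)
    )
    removed_mass, dropped = 0.0, 0
    for m in measures:
        if removed_mass + m < eps:      # remaining mass would still exceed 1 - eps
            removed_mass += m
            dropped += 1
        else:
            break
    return len(measures) - dropped

def h(probs, n, eps):
    """h(T, P, n, eps) = (1/n) log2 N(T, P, n, eps), as in (5.3)."""
    return math.log2(name_count(probs, n, eps)) / n

if __name__ == "__main__":
    for probs in ([0.5, 0.5], [0.9, 0.1]):
        print(probs, [round(h(probs, n, 0.1), 3) for n in (4, 8, 12)])
```

For the fair coin the values stay near $\log_2 2 = 1$, while for the biased coin far fewer names are needed to carry most of the measure, so the values are visibly smaller.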
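The limit certainly exists as the sequence $h(T, P, \varepsilon)$ is monotone increasing (as $\varepsilon \to 0$) and bounded (by $\log_2 s$). We first use the backward Vitali lemma to show that $\varepsilon$ plays no significant role, except to be less than 1, in this computation.
Theorem 5.1 For T ergodic and P a finite partition, for any $\varepsilon'$ with $\varepsilon' < \varepsilon < 1$ we have
$$\limsup_{n\to\infty} h(T, P, n, \varepsilon') \le \liminf_{n\to\infty} h(T, P, n, \varepsilon).$$
Proof Fix $\varepsilon$, $\varepsilon'$ and select $\{n_i\}$ so that
$$\lim_{i\to\infty} h(T, P, n_i, \varepsilon) = \liminf_{n\to\infty} h(T, P, n, \varepsilon) = h(T, P, \varepsilon).$$
Also fix $\bar\varepsilon < \varepsilon'$, and assume
$$|h(T, P, n_i, \varepsilon) - h(T, P, \varepsilon)| < \bar\varepsilon/10. \tag{5.6}$$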
Recall that $S(T, P, n_i, \varepsilon)$ is the set of 'large' names, i.e., a minimal (in cardinality) set of $T,P,n_i$-names sufficient to cover all but $\varepsilon$ (in measure) of X. Set
$$S_i = \{x : P_{n_i}(x) \in S(T, P, n_i, \varepsilon)\}. \tag{5.7}$$
Clearly $\mu(S_i) \ge 1 - \varepsilon$. Let
$$S = \bigcap_{N=1}^{\infty}\bigcup_{i\ge N} S_i$$
[= the subset of those x which lie in infinitely many $S_i$], and
$$\mu(S) \ge 1 - \varepsilon > 0.$$
Consider
$$\hat{S} = \bigcup_{i=0}^{\infty} T^i(S).$$
Since T is ergodic $\mu(\hat{S}) = 1$ and without loss of generality we may assume $\hat{S} = X$. For $x \in X = \hat{S}$ we have $x \in T^i(S)$ for some i so $T^{-i}(x) \in S$ and for infinitely many k, $T^{-i}(x) \in S_k$. Let $k(1, x) < k(2, x) < \cdots$ be these k's. Once $n_{k(n,x)} > i$ we may set $i_n = -i$, $j_n = n_{k(n,x)} - i$. Thus for each $x \in X$ there are values $i_n \le 0$, $j_n > 0$ with $j_n - i_n \to \infty$ in n and the $T, P, j_n - i_n$-name of $T^{i_n}(x)$ is in the set $S(T, P, j_n - i_n, \varepsilon)$. We also know $|h(T, P, j_n - i_n, \varepsilon) - h(T, P, \varepsilon)| < \bar\varepsilon/10$. We are now ready to use the backward Vitali lemma (Theorem 3.9). Applying it, there is a set F, and for $x \in F$, values i(x), j(x) from among the $i_n(x)$, $j_n(x)$ so that the orbit intervals
$$\{T^{i(x)}(x), T^{i(x)+1}(x), \ldots, T^{j(x)}(x)\}$$
are disjoint and cover all but $\bar\varepsilon/10$ of X. There is also an $N_0$ with $j(x) - i(x) < N_0$ uniformly. Let
$$G = \bigcup_{x\in F}\{T^{i(x)}(x), \ldots, T^{j(x)}(x)\}$$
and $\mu(G) > 1 - \bar\varepsilon/10$. Select $N \ge 10N_0/\bar\varepsilon$ and so large that for $n \ge N$, for all but $\bar\varepsilon$ of X in measure and for all but a set of density at most $\bar\varepsilon/5$ of $i \in \{0, 1, \ldots, n-1\}$, we have $T^i(x) \in G$. That we can do this is a consequence of the Birkhoff ergodic theorem applied to $\chi_G$. Let H be this good set for the ergodic theorem. We now compute an upper bound for the number of T,P,n-names for points in H. Such a T,P,n-name can be represented as in Fig. 5.1. There is a subset of the name of density at least $1 - (2/5)\bar\varepsilon$ consisting of disjoint blocks $I_1, I_2, \ldots, I_t$ and across each such block we see a name from $S(T, P, \#I_k, \varepsilon)$. These blocks are the intervals
$$(i(T^u(x)) + u,\; j(T^u(x)) + u)$$
where $T^u(x) \in F$, and the entire block is contained in $(0, 1, \ldots, n-1)$. The remaining indices correspond to points $T^u(x) \notin G$ or whose block of indices is not completely contained in $(0, \ldots, n-1)$.

Fig. 5.1 T,P,n-names for x.

The number of such $T,P,n$-names in $H$ is bounded by
$$\Big(\#\text{ of ways the } I_1,\ldots,I_t \text{ can arise}\Big) \times \prod_{k=1}^{t}\Big(\#\text{ of names possible across } I_k\Big) \times \Big(\#\text{ of names possible outside the } I_k\Big). \tag{5.8}$$
Now
(1) $\Big(\#\text{ of ways the } I_1,\ldots,I_t \text{ can arise}\Big) \le \Big(\#\text{ of subsets of size at most } \tfrac{2}{5}\bar\varepsilon n \text{ in a set of size } n\Big)$,
(2) $\Big(\#\text{ of names possible across } I_k\Big) = 2^{h(T,P,\#I_k,\varepsilon)(\#I_k)}$, and
(3) $\Big(\#\text{ of names possible outside the } I_k\Big) \le s^{(2/5)\bar\varepsilon n}$. (5.9)
To estimate the number of subsets of size at most $(2/5)\bar\varepsilon n$ in a set of size $n$ we use Stirling's formula. (5.10)
where
Note that this fundamental combinatorial formula can be regarded as the core of entropy theory. See Ahlfors (1966) for a proof. Thus the binomial coefficient
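So the number of sets of size at most $\alpha n$ in a set of size $n$ is
$$\binom{n}{[\alpha n]} + \binom{n}{[\alpha n]-1} + \cdots + \binom{n}{1} + \binom{n}{0} \le \left(\alpha^{-\alpha}(1-\alpha)^{-(1-\alpha)}\right)^{n} n^{c}. \tag{5.12}$$
Set
$$H(\alpha) = -\alpha\log_2\alpha - (1-\alpha)\log_2(1-\alpha). \tag{5.13}$$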
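The role of $H(\alpha)$ as the exponential growth rate of the count in (5.12) can be checked numerically. The following is only an illustrative sketch, not part of the original argument: the function names are hypothetical, and it assumes $\alpha \le 1/2$, where the sum of binomial coefficients is known to be at most $2^{nH(\alpha)}$.

```python
import math

def binary_entropy(alpha: float) -> float:
    """H(alpha) = -alpha*log2(alpha) - (1-alpha)*log2(1-alpha), with 0*log2(0) = 0."""
    if alpha in (0.0, 1.0):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)

def log2_small_subsets(n: int, alpha: float) -> float:
    """log2 of the number of subsets of size at most alpha*n in a set of size n."""
    m = math.floor(alpha * n)
    return math.log2(sum(math.comb(n, k) for k in range(m + 1)))

if __name__ == "__main__":
    # The per-symbol rate approaches H(alpha) from below as n grows.
    for n in (50, 200, 800):
        rate = log2_small_subsets(n, 0.25) / n
        print(f"n={n:4d}  rate={rate:.4f}  H(0.25)={binary_entropy(0.25):.4f}")
```

The gap between the two printed columns is of order $(\log_2 n)/n$, which is the correction term carried through the estimates that follow.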
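Combining estimates (5.11) and (5.12), the number of $T,P,n$-names covering all but $\bar\varepsilon$ of $X$ is bounded by $2^r$, where
$$r = H\!\left(\tfrac{2\bar\varepsilon}{5}\right)n + c\log_2 n + \tfrac{2\bar\varepsilon}{5}(\log_2 s)\,n + \left(h(T,P,\varepsilon) + \tfrac{\bar\varepsilon}{10}\right)\sum_{k=1}^{t}\#I_k \le \left(h(T,P,\varepsilon) + \tfrac{\bar\varepsilon}{10} + H\!\left(\tfrac{2\bar\varepsilon}{5}\right) + \tfrac{c\log_2 n}{n} + \tfrac{2\bar\varepsilon}{5}\log_2 s\right)n.$$
Thus
$$h(T,P,n,\bar\varepsilon) \le h(T,P,\varepsilon) + \frac{c\log_2 n}{n} + \frac{\bar\varepsilon}{10} + H\!\left(\tfrac{2\bar\varepsilon}{5}\right) + \tfrac{2\bar\varepsilon}{5}\log_2 s.$$
This holds for all $n$ sufficiently large so
$$\limsup_{n\to\infty} h(T,P,n,\bar\varepsilon) \le h(T,P,\varepsilon) + \frac{\bar\varepsilon}{10} + H\!\left(\tfrac{2\bar\varepsilon}{5}\right) + \tfrac{2\bar\varepsilon}{5}\log_2 s. \tag{5.14}$$
Clearly for $\bar\varepsilon \le \varepsilon'$,
$$\limsup_{n\to\infty} h(T,P,n,\bar\varepsilon) \ge \limsup_{n\to\infty} h(T,P,n,\varepsilon')$$
so if we let $\bar\varepsilon \to 0$ in (5.14),
$$\limsup_{n\to\infty} h(T,P,n,\varepsilon') \le h(T,P,\varepsilon).$$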
It follows now that for $\varepsilon < 1$, $\lim_{n\to\infty} h(T,P,n,\varepsilon)$ exists and its value is $h(T,P)$ independent of $\varepsilon$.
Corollary 5.2 (Of the proof) Given $T$ as usual and a finite partition $P$, there exists an increasing sequence of sets $A_n$ whose limit is a.a. of $X$ so that
$$\frac{\log_2(\#\text{ of } T,P,n\text{-names in } A_n)}{n} \tag{5.15}$$
converges to $h(T,P)$ as $n \to \infty$.
Proof In the course of proving the previous theorem we selected $\bar\varepsilon$ and constructed a set $G$ of measure at least $1 - \bar\varepsilon/10$ using the backward Vitali lemma. We discarded those points of $X$ whose orbit of length $n$ was outside $G$ more than $\bar\varepsilon/5$ of the time; by the pointwise ergodic theorem this is a decreasing sequence of sets. More precisely, let
$$B_{n,\bar\varepsilon} = \{x : \text{the pointwise ergodic theorem holds for } \chi_G \text{ for all } n' \ge n \text{ to within an error of } \bar\varepsilon/10\}.$$
Here we choose $\bar\varepsilon$ so
and $n$ so large that
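By the previous proof,
$$\frac{\log_2(\#\, T,P,n'\text{-names in } B_{n,\bar\varepsilon})}{n'} = h(T,P) \pm \varepsilon$$
for all $n' \ge n$. Now let $\varepsilon_i = 2^{-i}$ and select $\{n_i\}$ increasing so that $\mu(B_{n_i,\bar\varepsilon_i}) > 1 - 2^{-i}$ and for $n' > n_i$
$$\frac{\log_2(\#\, T,P,n'\text{-names in } B_{n_i,\bar\varepsilon_i})}{n'} = h(T,P) \pm \varepsilon_i.$$
Set
$$A_n = \bigcap_{n_{i+1} \ge n} B_{n_i,\bar\varepsilon_i}.$$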
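Now
$$\mu(A_n) \ge 1 - \sum_{n_{i+1} \ge n} 2^{-i}$$
so $A_n$ increases to a.a. of $X$. Also
$$\frac{\log_2(\#\, T,P,n\text{-names in } A_n)}{n} \le \frac{\log_2(\#\, T,P,n\text{-names in } B_{n_i,\bar\varepsilon_i})}{n} = h(T,P) \pm \varepsilon_i$$
where $n_i < n \le n_{i+1}$.
Hence
$$\limsup_{n\to\infty} \frac{\log_2(\#\, T,P,n\text{-names in } A_n)}{n} \le h(T,P).$$
However,
$$\liminf_{n\to\infty} \frac{\log_2(\#\, T,P,n\text{-names in } A_n)}{n} \ge h(T,P)$$
anyway as $\mu(A_n) \to 1$. •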
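To make the quantity $h(T,P,n,\varepsilon)$ of Theorem 5.1 concrete, the sketch below computes it exactly for a two-symbol i.i.d. process, where every name with $k$ occurrences of the first symbol has the same measure. This example is an illustration added here under that assumption; the function name `covering_rate` is hypothetical, and floating-point arithmetic limits it to $n$ up to roughly 1000.

```python
import math

def covering_rate(p: float, n: int, eps: float) -> float:
    """log2(minimal number of n-names of total measure >= 1 - eps) / n
    for the i.i.d. process on two symbols with probabilities (p, 1 - p)."""
    q = 1.0 - p
    # All names with k occurrences of the first symbol have measure p^k q^(n-k);
    # there are C(n, k) of them. Sort classes from most to least likely name.
    classes = sorted(
        ((p**k * q**(n - k), math.comb(n, k)) for k in range(n + 1)),
        reverse=True,
    )
    need, count = 1.0 - eps, 0
    for prob, size in classes:
        if prob * size >= need:
            count += math.ceil(need / prob)  # only part of this class is needed
            break
        count += size
        need -= prob * size
    return math.log2(count) / n

if __name__ == "__main__":
    for eps in (0.1, 0.5):
        print(eps, [round(covering_rate(0.3, n, eps), 4) for n in (100, 400, 800)])
```

For both values of $\varepsilon$ the printed rates drift toward the same number, which is the content of the theorem: the limit does not depend on $\varepsilon$.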
5.2 The Shannon-McMillan-Breiman theorem
We have often spoken of the measure of a $T,P,n$-name, meaning the measure of the set of points with this $T,P,n$-name. We write this as $\mu(P_n(x))$, allowing $P_n(x)$ to represent both the name and the set of points possessing it. Our next result concerns the asymptotic size of such names.
Theorem 5.3 (Shannon-McMillan-Breiman) For $T$ an ergodic map and $P$ a finite partition, for a.e. $x \in X$
$$\lim_{n\to\infty} \frac{-\log_2(\mu(P_n(x)))}{n} = h(T,P). \tag{5.16}$$
Note: this convergence is also in $L^1(\mu)$.
Proof We first show
$$\limsup_{n\to\infty} \frac{-\log_2(\mu(P_n(x)))}{n} \le h(T,P) \tag{5.17}$$
for a.e. $x \in X$. Let
$$B_{n,\varepsilon} = \left\{x : \frac{-\log_2(\mu(P_n(x)))}{n} > h(T,P) + \varepsilon,\ x \in A_n\right\}$$
where $A_n$ is the set constructed in the previous corollary. Once $n$ is sufficiently large
$$\mu(B_{n,\varepsilon}) \le 2^{-(h(T,P)+\varepsilon)n} \cdot 2^{(h(T,P)+(\varepsilon/2))n} = 2^{-\varepsilon n/2}.$$
Thus
$$\sum_{n=1}^{\infty} \mu(B_{n,\varepsilon}) < \infty.$$
By the Borel-Cantelli lemma
$$\mu\{x : x \text{ lies in infinitely many } B_{n,\varepsilon}\} = 0.$$
If $x$ lies in only finitely many $B_{n,\varepsilon}$ then for large $n$ either $x \notin A_n$ or
$$\frac{-\log_2(\mu(P_n(x)))}{n} \le h(T,P) + \varepsilon.$$
As the $A_n$ increase to a.a. of $X$, for a.e. $x$, once $n$ is large enough, $x \in A_n$. Hence for a.e. $x$, once $n$ is large enough
$$-\frac{1}{n}\log_2\mu(P_n(x)) \le h(T,P) + \varepsilon.$$
Thus for a.e. $x$,
$$\limsup_{n\to\infty} -\frac{1}{n}\log_2\mu(P_n(x)) \le h(T,P),$$
which is (5.17). Now we prove the lower estimate
$$\liminf_{n\to\infty} -\frac{1}{n}\log_2\mu(P_n(x)) \ge h(T,P). \tag{5.18}$$
Redefine
$$B_{n,\varepsilon} = \left\{x : -\frac{1}{n}\log_2\mu(P_n(x)) < h(T,P) - \varepsilon\right\}.$$
Thus the number of $T,P,n$-names in $B_{n,\varepsilon}$ is less than or equal to $2^{(h(T,P)-\varepsilon)n}$. Let
$$B_\varepsilon = \{x : x \text{ is in infinitely many } B_{n,\varepsilon}\}.$$
If $\mu(B_\varepsilon) = 0$ for all $\varepsilon$ we are done. Assume $\mu(B_\varepsilon) > 0$ for some $\varepsilon > 0$. Now by ergodicity
$$\mu\left(\bigcup_{i=0}^{\infty} T^i(B_\varepsilon)\right) = 1$$
and for a.e. $x$ there exist $i_n(x) < 0$, $j_n(x) > 0$ with $j_n(x) - i_n(x) \to \infty$ and such that the $T,P,j_n(x) - i_n(x)$-name of $T^{i_n(x)}(x)$ is among the $2^{(h(T,P)-\varepsilon)(j_n(x)-i_n(x))}$ names in $B_{j_n(x)-i_n(x),\varepsilon}$. Following the sequence of estimates of Theorem 5.1, using the backward Vitali lemma we can conclude from this that $h(T,P) < h(T,P) - \varepsilon/2$, a contradiction. Hence $\mu(B_\varepsilon) = 0$ for all $\varepsilon > 0$ and (5.18) holds, finishing the proof of pointwise convergence. To see uniform integrability and hence $L^1(\mu)$ convergence, just notice
$$\mu\left\{x : -\frac{1}{n}\log_2\mu(P_n(x)) > \log_2 s + a\right\} \le 2^{-na}.$$
•
5.3 Entropy zero and past algebras
To this point entropy has been tied to a fixed finite partition $P$. We need to loosen this tie. Our first step is to understand what $h(T,P) = 0$ means. This in itself is an important step. Later we shall see that the relationship between entropy zero and the K-property is very analogous to the relationship between isometries and the weakly mixing property. Let $T$ be, as usual, an invertible measurable measure-preserving map on the Lebesgue probability space $(X,\mathcal F,\mu)$, and $P = \{p_1,\ldots,p_s\}$ a finite partition of $X$. Define the past $\sigma$-algebra of $T,P$ to be
$$\mathcal P = \bigvee_{j=-1}^{-\infty} T^{-j}(P), \tag{5.19}$$
and more generally
$$\mathcal P_{u,v} = \bigvee_{j=u}^{v} T^{-j}(P), \tag{5.20}$$
where $u \le v$. If $(u',v') \subseteq (u,v)$ then $\mathcal P_{u',v'} \subseteq \mathcal P_{u,v}$. The sets in $\mathcal P_{u,v}$ consist of points whose $T,P$-names agree from index $u$ to index $v$. This $T,P$-name we write $P_{u,v}(x)$. Let $A$ be any measurable set. We want to define the conditional expectation of $A$ given $\mathcal P$ which we will write $E(A|\mathcal P)$. Define
$$f_N(x) = \frac{\mu(A \cap P_{-1,-N}(x))}{\mu(P_{-1,-N}(x))} = E(A|\mathcal P_{-1,-N}).$$
It is an easy check that the functions $f_N$ and algebras $\mathcal P_{-1,-N}$ form a bounded positive martingale. The $L^1$-martingale theorem (Corollary 2.8) of Chapter 2 tells us, for a.e. $x$, $f_N(x)$ converges. We call the limit function, defined a.e.,
$$E(A|\mathcal P). \tag{5.21}$$
We know, for any $S \in \mathcal P$,
$$\int_S \chi_A\,d\mu = \int_S E(A|\mathcal P)\,d\mu. \tag{5.22}$$
Suppose $A = p_i$, an element of the partition $P$. The function $E(p_i|\mathcal P)$ measures the probability, given only the past history of a point $x$, that it now lies in $p_i$. Note that since $\sum_{p_i \in P}\chi_{p_i} = 1$, we have
$$\sum_{p_i \in P} E(p_i|\mathcal P) = 1 \text{ a.s.},$$
so for a.e. $x$, the vector
$$(E(p_1|\mathcal P)(x), E(p_2|\mathcal P)(x), \ldots, E(p_s|\mathcal P)(x))$$
forms a probability vector which we write
$$D(P|\mathcal P)(x), \tag{5.23}$$
a probability distribution-valued function.
Theorem 5.4 $h(T,P) = 0$ iff for all $p_i \in P$, $E(p_i|\mathcal P) = \chi_{p_i}$ a.s., i.e., the past of the $(T,P)$ process determines its present.
Proof Suppose that for all $p_i \in P$, $E(p_i|\mathcal P) = \chi_{p_i}$ a.s.
For $x \in X$, let $p(x) \in P$ be that element which contains $x$. Thus for a.e. $x$, $E(p(x)|\mathcal P) = 1$.
Fix $\varepsilon > 0$ and select $N_0$ so large that on a set $G$, $\mu(G) > 1 - \varepsilon$, we have for $x \in G$
Exercise 5.1
How do we do this?
Thus if we know that $x \in G$ and we know the name $P_{-1,-N_0}(x)$ then we know $p(x)$. By the ergodic theorem we can select $N$ so large that for all but $\varepsilon$ of the points $x \in X$, at least $(1 - 2\varepsilon)N$ of $x, T(x), \ldots, T^{N-1}(x)$ are in $G$. Call this subset $B_N$, $\mu(B_N) > 1 - \varepsilon$.
We now count the number of $T,P,N$-names in $B_N$. We do this by first selecting a subset of $2\varepsilon N$ places in $0, \ldots, N-1$ which are to be those indices not in $G$. At these indices in the name, and in the first $N_0$ positions, we assign some arbitrary symbols from $P$. The rest of the symbols in the name are now determined, as working inductively from the left, at an undetermined index we must be in $G$ and we know the symbols in the previous $N_0$ positions. Thus
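$$\#(T,P,N\text{-names in } B_N) \le \Big(\#\text{ of subsets of} \le 2\varepsilon N \text{ in a set of } N\Big) \times \Big(\#\text{ names across } (0,\ldots,N_0-1)\Big) \times \Big(\#\text{ names across such a set}\Big) \le 2^{(H(2\varepsilon) + c\log N/N + 2\varepsilon\log s + (N_0/N)\log s)N}$$
(cf. estimates (5.11), (5.12) and (5.14)). Thus
$$h(T,P,\varepsilon,N) \le H(2\varepsilon) + \frac{c\log N}{N} + 2\varepsilon\log s + \frac{N_0}{N}\log s. \tag{5.24}$$
Letting $N \to \infty$ and then $\varepsilon \to 0$, $h(T,P) = 0$.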
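To prove the converse we will show that if $E(p_i|\mathcal P)$ is not 0, 1 a.s., then $h(T,P) > 0$.
Note: To say $E(p_i|\mathcal P) \in \{0,1\}$ a.s. is the same as to say
$$E(p_i|\mathcal P) = \chi_{p_i} \text{ a.s.} \tag{5.25}$$
Since there are only finitely many elements $p_i$ in $P$, for some fixed $p_i$ and for some $\alpha > 0$ we must have $1 - \alpha > E(p_i|\mathcal P) > \alpha$ on a set $A \in \mathcal P$, $\mu(A) > \alpha$. Select $\varepsilon < \alpha$ and $N_0$ so large that for a set $G$ of measure $\mu(G) > 1 - \varepsilon$ we have for all $x \in G$ and $n \ge N_0$,
$$|E(p_i|\mathcal P)(x) - E(p_i|P_{-1,-n})(x)| < \varepsilon. \tag{5.26}$$
This follows from the pointwise convergence of the martingale $f_N = E(p_i|P_{-1,-N})$.
This now gives us information about the asymptotic behavior of $\mu(P_n(x))$. The key observation is that if $T^n(x) \in A \cap G$ and $n \ge N_0$, then
$$\frac{\mu(P_{n+1}(x))}{\mu(P_n(x))} \le 1 - \alpha + \varepsilon. \tag{5.27}$$
To see this just note
$$\frac{\mu(P_{n+1}(x))}{\mu(P_n(x))} = E(p(T^n(x))|P_{-1,-n})(T^n(x))$$
and as $T^n(x) \in G \cap A$,
$$E(p(T^n(x))|P_{-1,-n})(T^n(x)) = E(p(T^n(x))|\mathcal P) \pm \varepsilon < \begin{cases} E(p_i|\mathcal P) + \varepsilon & \text{if } p(T^n(x)) = p_i \\ 1 - E(p_i|\mathcal P) + \varepsilon & \text{if } p(T^n(x)) \ne p_i. \end{cases}$$
Regardless of whether $T^n(x)$ is in $A \cap G$ or not,
$$\frac{\mu(P_{n+1}(x))}{\mu(P_n(x))} \le 1.$$
Write
$$\mu(P_N(x)) = \mu(P_1(x)) \prod_{i=1}^{N-1} \frac{\mu(P_{i+1}(x))}{\mu(P_i(x))}. \tag{5.28}$$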
Each factor is less than or equal to 1 and whenever $i \ge N_0$ and $T^i(x) \in A \cap G$ it is less than or equal to $1 - \alpha + \varepsilon$. Setting
$$k(N,x) = \sum_{i=N_0}^{N-1} \chi_{A \cap G}(T^i(x)),$$
we have
$$\mu(P_N(x)) \le (1 - \alpha + \varepsilon)^{k(N,x)}.$$
By the Birkhoff theorem, for a.e. $x$
$$\lim_{N\to\infty} \frac{1}{N}k(N,x) = \mu(G \cap A) > \alpha - \varepsilon$$
and so for a.e. $x$
$$\liminf_{N\to\infty} -\frac{1}{N}\log_2\mu(P_N(x)) \ge -(\alpha - \varepsilon)\log_2(1 - \alpha + \varepsilon) > 0.$$
But we already know this limit exists (Theorem 5.3) and is $h(T,P)$. •

5.4 More about the K-property
We now turn to the opposite extreme, that of transformations $T$ for which $h(T,P) > 0$ for all non-trivial partitions $P$. We will see that these are the K-systems of the last chapter.
Theorem 5.5 If $T$ is a K-system then $h(T,P) > 0$ for all finite non-trivial partitions $P$.
This theorem depends on two lemmas.
Lemma 5.6 If $h(T,P) = 0$, then for all integers $n$, $h(T^n,P) = 0$.
Proof Certainly $N(T^n,P,k,\varepsilon) \le N(T,P,nk,\varepsilon)$ and $h(T^n,P) \le nh(T,P)$. •
Lemma 5.7 If $T$ is a K-system, $P$ a non-trivial partition and $\varepsilon > 0$, then there is an $n$ so that for a set $G$, $\mu(G) > 1 - \varepsilon$, if $x \in G$ then
$$E\Big(p_i \,\Big|\, \bigvee_{k=-1}^{-\infty} T^{-nk}(P)\Big)(x) = \mu(p_i) \pm \varepsilon$$
for all $p_i \in P$.
Proof We know
$$E\Big(p_i \,\Big|\, \bigvee_{k=-1}^{-\infty} T^{-nk}(P)\Big) = \lim_{N\to\infty} E\Big(p_i \,\Big|\, \bigvee_{k=-1}^{-N} T^{-nk}(P)\Big)$$
pointwise a.e. For any $x \in X$,
$$E\Big(p_i \,\Big|\, \bigvee_{k=-1}^{-N} T^{-nk}(P)\Big)(x) = \frac{\mu(p_i \cap p(T^{-n}(x)) \cap p(T^{-2n}(x)) \cap \cdots \cap p(T^{-Nn}(x)))}{\mu(p(T^{-n}(x)) \cap p(T^{-2n}(x)) \cap \cdots \cap p(T^{-Nn}(x)))}.$$
As $T$ is a K-system (see Definition 4.4), once $n$ is sufficiently large, for all $N$, for all but $\varepsilon$ of $X$, this is $\mu(p_i) \pm \varepsilon$ for all $p_i$. •
Proof of Theorem 5.5 If $P$ is a non-trivial partition, as $T$ is a K-system by Lemma 5.7 there is an $n$ so that $E(p_i|\bigvee_{k=-1}^{-\infty} T^{-nk}(P)) \ne 0$ or 1 with positive probability (in fact this conditional expectation is approximately $\mu(p_i)$ for most $x$). This says $h(T^n,P) \ne 0$. Now by our first lemma $h(T,P) \ne 0$.
5.5 The entropy of an ergodic transformation
We want to prove the converse of Theorem 5.5 but it will be some time before we have sufficient machinery to do so. We now begin to develop this machinery, for a time leaving behind the notion of K-system and re-entering the general development of the theory of entropy.
Definition 5.1
$$h(T) = \sup h(T,P) \tag{5.29}$$
where the sup is taken over all finite partitions $P$ of $X$. We need some basic facts to help make $h(T)$ a reasonably computable quantity.
Lemma 5.8 For all $k > 0$,
$$h\Big(T, \bigvee_{i=-k}^{k} T^{-i}(P)\Big) = h(T,P). \tag{5.30}$$
Proof We estimate the number of $T, \bigvee_{i=-k}^{k} T^{-i}(P), n$-names on a subset $A$. On the one hand it is at least the number of $T,P,n$-names on $A$. On the other hand to know the $T, \bigvee_{i=-k}^{k} T^{-i}(P), n$-name of $x$ is precisely to know $P_{-k,n+k}(x)$, hence there are at most $s^{2k}$ times as many $T, \bigvee_{i=-k}^{k} T^{-i}(P), n$-names on $A$ as there are $T,P,n$-names. Hence
$$\#(T,P,n\text{-names in } A) \le \#\Big(T, \bigvee_{i=-k}^{k} T^{-i}(P), n\text{-names in } A\Big) \le s^{2k}\,\#(T,P,n\text{-names in } A).$$
The result follows. •
Lemma 5.9 Suppose a partition $H$ is $P$ measurable, i.e., an element of $H$ is a union of elements of $P$. Then
$$h(T,H) \le h(T,P). \tag{5.31}$$
Proof Exercise 5.2. •
Exercise 5.2 Prove Lemma 5.9.
If we have two finite partitions of $X$, $H$ and $H'$, both with state space $\{h_1,\ldots,h_s\}$, we can define their symmetric difference $H \,\Delta\, H' = \{x : H(x) \ne H'(x)\}$. This set is a measure of how different the partitions are. Notice it does take into account the labels on the sets.
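Lemma 5.10 Suppose $H$ and $H'$ are two finite partitions with the same state space $\{h_1,\ldots,h_s\}$. If $\mu(H \,\Delta\, H') < \varepsilon$, i.e., $H$ and $H'$ are very close, then
$$h(T,H) \le h(T,H') + \varepsilon\log_2 s + H(\varepsilon). \tag{5.32}$$
Proof Fix $\bar\varepsilon > 0$ and select $N_1$ so large that for a set $A_1$, $\mu(A_1) > 1 - \bar\varepsilon$ and for $n \ge N_1$,
$$\#(T,H',n\text{-names in } A_1) \le 2^{(h(T,H')+\bar\varepsilon)n}$$
(cf. Theorem 5.3).
Let $E = \{x : H(x) \ne H'(x)\}$, and hence $\mu(E) < \varepsilon$. Applying the Birkhoff theorem to $\chi_E$, select $N_2$ so large that for a set $A_2$ with $\mu(A_2) > 1 - \bar\varepsilon$ we have for $x \in A_2$ and $n \ge N_2$, for all but at most $(\varepsilon + \bar\varepsilon)n$ of the points $x, T(x), \ldots, T^{n-1}(x)$, $H(T^i(x)) = H'(T^i(x))$. Let $A = A_1 \cap A_2$, so $\mu(A) \ge 1 - 2\bar\varepsilon$; let $N = \max\{N_1,N_2\}$, and $n \ge N$. Then
$$2^{h(T,H,2\bar\varepsilon,n)n} \le \#(T,H,n\text{-names in } A) \le \#(T,H',n\text{-names in } A) \times \Big(\#\text{ of subsets of size} \le (\varepsilon + \bar\varepsilon)n \text{ in a set of size } n\Big) \times s^{(\varepsilon+\bar\varepsilon)n}.$$
Applying the usual Stirling's formula estimates
$$h(T,H,2\bar\varepsilon,n) \le h(T,H') + \bar\varepsilon + H(\varepsilon + \bar\varepsilon) + \frac{c\log n}{n} + (\varepsilon + \bar\varepsilon)\log s.$$
Letting $n \to \infty$ and $\bar\varepsilon \to 0$ we get the conclusion. •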
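Corollary 5.11 If
$$H \subset \bigvee_{i=-\infty}^{\infty} T^{-i}(P)$$
then $h(T,H) \le h(T,P)$.
Proof For any $\varepsilon > 0$ select $k$ and
$$H' \subset \bigvee_{i=-k}^{k} T^{-i}(P)$$
with $\mu(H \,\Delta\, H') < \varepsilon$.
Now applying Lemmas 5.9 and 5.10
$$h(T,H) \le h(T,H') + H(\varepsilon) + \varepsilon\log s \le h\Big(T, \bigvee_{i=-k}^{k} T^{-i}(P)\Big) + H(\varepsilon) + \varepsilon\log s \le h(T,P) + H(\varepsilon) + \varepsilon\log s.$$
Letting $\varepsilon \to 0$ completes the result. •
Corollary 5.12 If $P$ is a generating partition, i.e.,
$$\bigvee_{i=-\infty}^{\infty} T^{-i}(P) = \mathcal F$$
then $h(T) = h(T,P)$. •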
5.6 Examples of entropy computations
Example 1 Irrational Rotations. Let $T = R_\alpha$, an irrational rotation of the circle
$$\theta \to (\theta + \alpha) \bmod 2\pi.$$
The partition
$$P = \{(0,\pi), [\pi, 2\pi]\}$$
is a generator. In fact, if $\theta_1 \ne \theta_2$, as the values $n\alpha \bmod 2\pi$ are dense in $[0,2\pi)$, for some $n$, $(n\alpha) \bmod 2\pi$ lies between $\theta_1$ and $\theta_2$, hence $\theta_1$ and $\theta_2$ lie in distinct elements of $T^n(P)$. Thus $\bigvee_{i=-\infty}^{\infty} T^{-i}(P)$ separates points and so generates. Thus $h(R_\alpha) = h(R_\alpha,P)$.
Theorem 5.13 $h(R_\alpha,P) = 0$.
Proof A set in
$$\bigvee_{i=0}^{n-1} T^{-i}(P)$$
is an interval. Spanning this with $T^{-n}(P)$ cuts exactly two of these when forming
$$\bigvee_{i=0}^{n} T^{-i}(P).$$
Thus
$$\#(T,P,n+1\text{-names}) = \#(T,P,n\text{-names}) + 2$$
and the result follows. •
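The linear growth of the number of names in Theorem 5.13 is easy to observe numerically. The sketch below is an illustration only, with several assumptions that are not in the text: the circle is rescaled to $[0,1)$, the rotation angle is the golden mean, names are collected from an evenly spaced sample rather than from all points, and the function names are hypothetical.

```python
import math

def rotation_name(theta: float, alpha: float, n: int) -> tuple:
    """T,P,n-name of theta under rotation by alpha on the circle [0,1),
    for the partition P = {[0,1/2), [1/2,1)}."""
    return tuple(int(((theta + i * alpha) % 1.0) >= 0.5) for i in range(n))

def count_names(alpha: float, n: int, samples: int = 20000) -> int:
    """Number of distinct n-names seen on an evenly spaced sample of the circle."""
    return len({rotation_name((k + 0.5) / samples, alpha, n) for k in range(samples)})

if __name__ == "__main__":
    alpha = (math.sqrt(5) - 1) / 2  # assumed irrational angle (golden mean)
    for n in (5, 10, 20, 40):
        c = count_names(alpha, n)
        # The count never exceeds 2n, so log2(count)/n tends to 0.
        print(n, c, round(math.log2(c) / n, 4))
```

Since the count grows at most linearly in $n$, its logarithm divided by $n$ tends to zero, in contrast with the exponential growth seen for the Bernoulli shifts of the next example.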
Example 2 Bernoulli Processes. We introduced Markov processes in Chapter 1, Example 3. Bernoulli processes are a special case of Markov processes. As we continue through this and the next two chapters, they will come to play an ever more critical role. Hence we will describe them in detail, and begin with almost obvious facts. Let $X = \{1,2,\ldots,s\}^{\mathbb Z}$, the set of all doubly infinite sequences of elements from the finite set of 'symbols' $\{1,2,\ldots,s\}$, and let
$$\pi = (\pi_1,\ldots,\pi_s)$$
be a probability vector, $\pi_i > 0$. Define $\mu$ on a 'cylinder set'
$$(i_u, i_{u+1}, \ldots, i_v) = \{x \in X \mid \text{the symbol at index } j \in (u,v) \text{ of } x \text{ is } i_j\},$$
to be
$$\pi_{i_u} \cdot \pi_{i_{u+1}} \cdots \pi_{i_v}. \tag{5.33}$$
As long as $s \ne 1$, one can easily show that using cylinder sets to form a generating tree of partitions, $(X,\mathcal F,\mu)$ is a non-atomic Lebesgue probability space. Define $T : X \to X$ by $T(x) = x'$, where $i_j(x') = i_{j+1}(x)$ (the left shift). This is known as a Bernoulli shift, Bernoulli process, or i.i.d. (independent, identically distributed) process. A Bernoulli shift is completely specified by the probability vector $\pi$, so often this is all that is given, i.e., the Bernoulli shift (1/2, 1/2), also called just 'the 2-shift', or Bernoulli shift (1/3, 1/3, 1/3), etc. We want to compute $h(T)$. To do so we first need to know it is ergodic.
Lemma 5.14 A Bernoulli shift is mixing and so ergodic (in fact it is a K-process but we're not ready to prove that yet).
Proof Let $A$ and $B$ be finite unions of cylinder sets. It is clear that, once $n$ is sufficiently large
$$\mu(T^n(A) \cap B) = \mu(A)\mu(B).$$
Such finite unions are dense in $\mathcal F$ (w.r.t. $\mu$). It follows that if $T(A) = A$ then $\mu(A) = \mu(A)^2$ and so $\mu(A) = 0, 1$. •
Let $P$ be the finite partition of $X$ according to $i_0(x)$, the 0-position symbol, and
$$P = \{p_1,p_2,\ldots,p_s\}.$$
Sets in $\bigvee_{i=u}^{v} T^{-i}(P)$ consist of cylinders on indices $(u,v)$, hence $P$ clearly separates points and so generates. We will compute
$$h(T) = h(T,P)$$
by using the Shannon-McMillan-Breiman theorem (5.3), which identifies the entropy as the exponential shrinkage rate of a typical $T,P,n$-name. Now
$$\mu(P_n(x)) = \pi_{p(x)} \cdot \pi_{p(T(x))} \cdots \pi_{p(T^{n-1}(x))}$$
as $P_n(x)$ is the cylinder
$$(p(x), p(T(x)), \ldots, p(T^{n-1}(x))).$$
This is then
$$\pi_1^{\sum_{i=0}^{n-1}\chi_1(T^i(x))} \times \pi_2^{\sum_{i=0}^{n-1}\chi_2(T^i(x))} \times \cdots \times \pi_s^{\sum_{i=0}^{n-1}\chi_s(T^i(x))}$$
where $\chi_k(x)$ is the characteristic function of $p_k$. According to the Birkhoff theorem, given $\bar\varepsilon > 0$, for a.e. $x \in X$, there is an $N$ so that for $n \ge N$,
$$\frac{1}{n}\sum_{i=0}^{n-1}\chi_k(T^i(x)) = \pi_k \pm \bar\varepsilon.$$
Therefore for large $n$
$$\mu(P_n(x)) = (\pi_1^{\pi_1}\pi_2^{\pi_2}\cdots\pi_s^{\pi_s})^n(\pi_1\cdots\pi_s)^{\pm\bar\varepsilon n}.$$
For $\varepsilon > 0$, choose $\bar\varepsilon$ so small that
$$-\bar\varepsilon\log_2(\pi_1 \cdots \pi_s) < \varepsilon$$
and now
$$\frac{-\log_2\mu(P_n(x))}{n} = -\sum_{i=1}^{s}\pi_i\log_2\pi_i \pm \varepsilon.$$
We conclude for a Bernoulli process
$$h(T) = h(T,P) = -\sum_{i=1}^{s}\pi_i\log_2\pi_i. \tag{5.34}$$
The entropy of a Bernoulli shift is thus easily computed. With this result we can now, for example, conclude that the 2-shift and 3-shift are nonisomorphic, as they have different entropies. A much deeper fact, Ornstein's isomorphism theorem, states the converse: any two Bernoulli shifts of the same entropy are isomorphic. We will prove this in Chapter 7. Also notice that we have finally seen the formula $\sum_i \pi_i\log_2\pi_i$, which classically is the starting point for entropy theory. See Billingsley (1965) or Smorodinsky (1971) for good discussions of its abstract character. This is a generalization of our function $H(\alpha)$ which arose from Stirling's formula.
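The Shannon-McMillan-Breiman statement for a Bernoulli shift can be watched directly: along a randomly drawn point, the normalized logarithm of the measure of its $n$-name settles near the entropy. The following is a small simulation sketch, not part of the original text; the function names and the choice of probability vector are assumptions made for illustration.

```python
import math
import random

def entropy(pi):
    """-sum pi_i log2 pi_i, the entropy of the Bernoulli shift with vector pi."""
    return -sum(p * math.log2(p) for p in pi if p > 0)

def name_rate(pi, n, rng):
    """-log2(mu(P_n(x)))/n for a randomly drawn point x of the Bernoulli shift.

    The n-name of x is a sequence of n independent symbols, and the measure
    of the corresponding cylinder is the product of their probabilities."""
    symbols = rng.choices(range(len(pi)), weights=pi, k=n)
    return -sum(math.log2(pi[s]) for s in symbols) / n

if __name__ == "__main__":
    rng = random.Random(0)
    pi = (0.5, 0.25, 0.25)
    for n in (100, 10_000, 200_000):
        print(n, round(name_rate(pi, n, rng), 4), "entropy:", entropy(pi))
    # The 2-shift and 3-shift have different entropies, hence are nonisomorphic.
    print(entropy((0.5, 0.5)), entropy((1 / 3, 1 / 3, 1 / 3)))
```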
Example 3 Ergodic Markov processes. Let $[\pi_{i,j}]$ be the matrix of transition probabilities, and $(\pi_1,\ldots,\pi_s)$ be the stationary distribution of an ergodic Markov process. As in Chapter 1, $X$ is the set of all doubly infinite sequences $(\ldots, i_{n-1}, i_n, \ldots)$ where $\pi_{i_{n-1},i_n} > 0$. The measure $\mu$ is defined on cylinders $(i_0, i_1, \ldots, i_t)$ by
$$\mu((i_0,i_1,\ldots,i_t)) = \pi_{i_0} \cdot \pi_{i_0,i_1} \cdots \pi_{i_{t-1},i_t}.$$
A generating partition can be formed by setting $P(x) = i_0(x)$ as in the previous example. We compute $h(T) = h(T,P)$ exactly as in the i.i.d. case,
$$\mu(P_n(x)) = \pi_{p(x)} \prod_{i,j:\ \pi_{i,j}>0} \pi_{i,j}^{\,r_{i,j}(x)}$$
where $r_{i,j}(x) = \sum_{k=0}^{n-2}\chi_{(i,j)}(T^k(x))$ and $\chi_{(i,j)}$ is the characteristic function of the cylinder $(i,j)$. Once again, by the Birkhoff theorem, for $\bar\varepsilon > 0$, for a.e. $x$, once $n$ is sufficiently large
$$\mu(P_n(x)) = \pi_{p(x)} \prod_{i,j:\ \pi_{i,j}>0} \pi_{i,j}^{(\pi_i\pi_{i,j} \pm \bar\varepsilon)(n-1)}.$$
Hence for $\varepsilon > 0$ choose $\bar\varepsilon$ with
$$-\bar\varepsilon \sum_{i,j:\ \pi_{i,j}>0} \log_2\pi_{i,j} < \varepsilon$$
and we obtain
$$\frac{-\log_2(\mu(P_n(x)))}{n} = \frac{-\log_2\pi_{p(x)}}{n} - \frac{n-1}{n}\sum_{i,j}\pi_i\pi_{i,j}\log_2\pi_{i,j} \pm \varepsilon.$$
The Shannon-McMillan-Breiman theorem (5.3) now implies for ergodic Markov processes
$$h(T,P) = -\sum_{i,j}\pi_i\pi_{i,j}\log_2\pi_{i,j}. \tag{5.35}$$
Note that a Bernoulli shift is a special case of a Markov process with $\pi_{i,j} = \pi_j$ and the entropy formula of Example 3 reduces to that of Example 2.
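Formula (5.35) is straightforward to evaluate once the stationary distribution is known. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the stationary vector is found by iterating the averaged ('lazy') chain $(I + [\pi_{i,j}])/2$, which has the same stationary vector and converges for any irreducible chain, and the example matrices are made up.

```python
import math

def stationary(P, iters=5000):
    """Stationary row vector of the transition matrix P (assumed irreducible).

    Iterates pi -> (pi + pi P)/2, which has the same fixed vector as pi -> pi P
    but also converges when the chain is periodic."""
    s = len(P)
    pi = [1.0 / s] * s
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(s)) for j in range(s)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi

def markov_entropy(P):
    """-sum_{i,j} pi_i pi_{i,j} log2 pi_{i,j}, as in (5.35)."""
    pi = stationary(P)
    s = len(P)
    return -sum(
        pi[i] * P[i][j] * math.log2(P[i][j])
        for i in range(s) for j in range(s) if P[i][j] > 0
    )

if __name__ == "__main__":
    print(markov_entropy([[0.9, 0.1], [0.5, 0.5]]))
    # Identical rows give an i.i.d. process; the value is then H(0.3, 0.7).
    print(markov_entropy([[0.3, 0.7], [0.3, 0.7]]))
```

The second call illustrates the remark above: when every row of the transition matrix equals the same probability vector, (5.35) collapses to the Bernoulli formula (5.34).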
5.7 Entropy and information from the entropy formula
As we indicated earlier, the formula for the entropy of a Bernoulli shift is the starting point for the classical development of entropy. We will now give a part of that development.
Definition 5.3 For a probability vector $\pi = (\pi_1,\ldots,\pi_s)$ we define
$$H(\pi) = -\sum_{i=1}^{s}\pi_i\log_2\pi_i \tag{5.36}$$
(with the convention that, extending $x\log x$ continuously to 0, $0\log_2 0 = 0$).
This is a convex function, i.e.,
Lemma 5.15 If $\pi$ and $\pi'$ are probability vectors and $0 < \lambda < 1$ then
$$H(\lambda\pi + (1-\lambda)\pi') \ge \lambda H(\pi) + (1-\lambda)H(\pi')$$
with equality precisely when $\pi = \pi'$.
Proof Let
$$F(\lambda) = H(\lambda\pi + (1-\lambda)\pi').$$
One easily computes
$$F''(\lambda) = -\frac{1}{\ln 2}\sum_{i}\frac{(\pi_i - \pi'_i)^2}{\lambda\pi_i + (1-\lambda)\pi'_i} \le 0$$
and equality holds only if $\pi_i = \pi'_i$ for all $i$. •
Exercise 5.3 Show that as a corollary of Lemma 5.15, the maximum value of H on n-dimensional probability vectors is log2(n).
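Before attempting Exercise 5.3, it may help to test the inequality of Lemma 5.15 and the claimed maximum on random inputs. The following sketch is a numerical sanity check only and proves nothing; the helper names are hypothetical.

```python
import math
import random

def H(pi):
    """H(pi) = -sum pi_i log2 pi_i with the convention 0 log2 0 = 0."""
    return -sum(p * math.log2(p) for p in pi if p > 0)

def random_prob_vector(n, rng):
    """A random probability vector of length n (normalized uniform weights)."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def mix(lam, a, b):
    """The vector lam*a + (1 - lam)*b."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

if __name__ == "__main__":
    rng = random.Random(1)
    n = 5
    for _ in range(1000):
        a, b, lam = random_prob_vector(n, rng), random_prob_vector(n, rng), rng.random()
        # Inequality of Lemma 5.15 and the bound H <= log2(n) of Exercise 5.3.
        assert H(mix(lam, a, b)) >= lam * H(a) + (1 - lam) * H(b) - 1e-12
        assert H(a) <= math.log2(n) + 1e-12
    print(H([1 / n] * n), math.log2(n))  # the uniform vector attains the maximum
```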
Definition 5.4 For a finite partition $P$ of a probability space $(X,\mathcal F,\mu)$, we set
$$h(P) = H(\mu(p_1),\ldots,\mu(p_s)). \tag{5.37}$$
Now for a measure-preserving transformation $T$ of $(X,\mathcal F,\mu)$ we have the past algebra
$$\mathcal P = \bigvee_{i=-1}^{-\infty} T^{-i}(P),$$
and defined a.e., the conditional distribution of $P$ given $\mathcal P$, written
$$D(P|\mathcal P) = (E(p_1|\mathcal P), E(p_2|\mathcal P), \ldots, E(p_s|\mathcal P)).$$
This is a probability vector-valued function on $X$. We have seen earlier that $h(T,P) = 0$ if and only if $D(P|\mathcal P)$ is an elementary vector (all 0's but for a single 1) almost everywhere.
Definition 5.5 The conditional information of $P$ given its past $\mathcal P$ is
$$I(P|\mathcal P) = H(D(P|\mathcal P)).$$
This is a function, not a number. More generally, if $\mathcal H$ is any subalgebra of $\mathcal F$ then
$$D(P|\mathcal H) = (E(p_1|\mathcal H), E(p_2|\mathcal H), \ldots, E(p_s|\mathcal H))$$
and
$$I(P|\mathcal H) = H(D(P|\mathcal H)).$$
This function is called the conditional information of $P$ given $\mathcal H$. It is meant to measure the amount of 'information' gained when learning the set in $P$ to which $x$ belongs, having already known the sets in $\mathcal H$ to which $x$ belongs. Equivalently, it measures how much 'randomness' remains in $P$ after having learned all of $\mathcal H$. These are obviously only heuristic ideas. In fact, much precision can be given them. Our intentions are more technical and less philosophical. Set
$$\bar h(T,P) = \int I(P|\mathcal P)\,d\mu. \tag{5.38}$$
Our goal is to show that
$$\bar h(T,P) = h(T,P). \tag{5.39}$$
First we check this equality for the examples we have considered.
Example 1 $\bar h(T,P) = 0$ iff $h(T,P) = 0$.
We have seen that $h(T,P) = 0$ iff $D(P|\mathcal P)$ is an elementary vector a.e. But $H(\pi) = 0$ iff $\pi$ is an elementary vector and the result follows.
Example 2 For an i.i.d. process, by the definition of independence,
$$D(P|\mathcal P) = (\pi_1,\pi_2,\ldots,\pi_s).$$
Hence $I(P|\mathcal P) = H(\pi)$ so
$$\bar h(T,P) = H(\pi) = h(T,P).$$
Example 3 For a Markov process,
$$D(P|\mathcal P) = (\pi_{k,1},\pi_{k,2},\ldots,\pi_{k,s}) \quad \text{if } T^{-1}(x) \in p_k$$
and hence
$$\int I(P|\mathcal P)\,d\mu = \sum_{k=1}^{s}\pi_k \cdot H(\pi_{k,1},\pi_{k,2},\ldots,\pi_{k,s}) = h(T,P).$$
Showing that $\bar h = h$ can be thought of as a generalization of what happens for Markov chains. For a Markov chain all the 'information' the past algebra gives us concerning the present is contained in the single symbol at time $-1$.
Lemma 5.16 Let $P$ and $H$ be finite partitions. Regarding $H$ as a finite algebra,
$$h(P \vee H) = h(H) + \int I(P|H)\,d\mu.$$
Proof Let $\pi_i = \mu(h_i)$ and
$$\pi_{i,j} = \frac{\mu(h_i \cap p_j)}{\mu(h_i)} = \mu(p_j|h_i).$$
Then
$$h(P \vee H) = -\sum_{i,j}\mu(h_i \cap p_j)\log_2(\mu(h_i \cap p_j)) = -\sum_{i,j}\pi_i\pi_{i,j}\log_2(\pi_i\pi_{i,j})$$
$$= -\sum_{i,j}\pi_i\pi_{i,j}\log_2(\pi_i) - \sum_{i,j}\pi_i\pi_{i,j}\log_2(\pi_{i,j}) = -\sum_i \pi_i\log_2(\pi_i) - \sum_{i,j}\pi_i\pi_{i,j}\log_2(\pi_{i,j}).$$
Now
$$\int I(P|H)\,d\mu = -\sum_{i,j}\mu(p_j \cap h_i)\log_2\frac{\mu(h_i \cap p_j)}{\mu(h_i)} = -\sum_{i,j}\pi_{i,j}\pi_i\log_2(\pi_{i,j})$$
and
$$h(P \vee H) = h(H) + \int I(P|H)\,d\mu.$$
•
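The identity of Lemma 5.16 involves only the finitely many numbers $\mu(h_i \cap p_j)$, so it can be checked on any table of joint measures. The sketch below does this for a randomly generated table; it is an illustration only, and the function names are hypothetical.

```python
import math
import random

def H(v):
    """Entropy of a probability vector, with 0 log2 0 = 0."""
    return -sum(p * math.log2(p) for p in v if p > 0)

def chain_rule_sides(joint):
    """joint[i][j] = mu(h_i ∩ p_j). Returns (h(P v H), h(H) + ∫ I(P|H) dμ)."""
    lhs = H([m for row in joint for m in row])
    marg = [sum(row) for row in joint]  # pi_i = mu(h_i)
    # On h_i the function I(P|H) is constant, equal to H of the row pi_{i,.}.
    cond = sum(pi * H([m / pi for m in row]) for pi, row in zip(marg, joint) if pi > 0)
    return lhs, H(marg) + cond

if __name__ == "__main__":
    rng = random.Random(3)
    w = [[rng.random() for _ in range(4)] for _ in range(3)]
    total = sum(map(sum, w))
    joint = [[x / total for x in row] for row in w]
    print(chain_rule_sides(joint))  # the two numbers agree up to rounding
```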
Corollary 5.17 Suppose $\pi = (\pi_1,\ldots,\pi_s)$ and $\pi' = (\pi'_1,\ldots,\pi'_t)$ are probability vectors and, for $0 < \lambda < 1$,
$$\pi'' = (\lambda\pi_1, \lambda\pi_2, \ldots, \lambda\pi_s, (1-\lambda)\pi'_1, \ldots, (1-\lambda)\pi'_t).$$
Then
$$H(\pi'') = \lambda H(\pi) + (1-\lambda)H(\pi') + H(\lambda).$$
Proof Exercise 5.4. •
Exercise 5.4 Prove Corollary 5.17.
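Lemma 5.18
$$h(T,P) = \lim_{n\to\infty}\frac{1}{n}h\Big(\bigvee_{i=0}^{n-1} T^{-i}(P)\Big).$$
Proof Let $\varepsilon > 0$ and select $N$ so that for all $n \ge N$, for a set $G$ of all but $\varepsilon$ of the $x \in X$ we have
$$\mu(P_n(x)) = 2^{-(h(T,P) \pm \varepsilon)n}.$$
Let $B = G^c$, $\mu(B) < \varepsilon$. Consider the probability vectors
$$\pi = \left(\frac{\mu(P_n(x))}{\mu(B)}\right)_{P_n(x) \subset B}$$
and
$$\pi' = \left(\frac{\mu(P_n(x))}{\mu(G)}\right)_{P_n(x) \subset G}.$$
Now
$$h\Big(\bigvee_{i=0}^{n-1} T^{-i}(P)\Big) = \mu(B)H(\pi) + \mu(G)H(\pi') + H(\mu(B))$$
by Corollary 5.17. If $\varepsilon$ is sufficiently small, $H(\mu(B)) < \bar\varepsilon$ and
$$h\Big(\bigvee_{i=0}^{n-1} T^{-i}(P)\Big) = (1 \pm \varepsilon)H(\pi') \pm \varepsilon H(\pi) \pm \bar\varepsilon. \tag{5.40}$$
Now
$$H(\pi') = -\sum \pi'_i\log_2\pi'_i = -\sum \pi'_i\log_2\left(\frac{2^{-(h(T,P) \pm \varepsilon)n}}{\mu(G)}\right) = -\sum \pi'_i\{-(h(T,P) \pm \varepsilon)n - \log_2(\mu(G))\} = n(h(T,P) \pm \varepsilon) + \log_2(\mu(G)),$$
and
$$H(\pi) \le n\log_2 s$$
as $\pi$ has at most $s^n$ elements and $H$ is maximized when they are all of equal size (Exercise 5.3). Thus (5.40) gives
$$\frac{1}{n}h\Big(\bigvee_{i=0}^{n-1} T^{-i}(P)\Big) = (1 \pm \varepsilon)(h(T,P) \pm \varepsilon) + \frac{\log_2(\mu(G))}{n} \pm \varepsilon\log_2(s) \pm \bar\varepsilon.$$
Letting $n \to \infty$ forces both $\varepsilon$ and $\bar\varepsilon$ to zero and we get the result. •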
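Lemma 5.18 can be seen at work on a small Markov chain, where the entropy of the partition into $n$-names can be computed by brute-force enumeration. This sketch is illustrative only: the chain, its stationary vector $(5/6, 1/6)$ and the function name are assumptions chosen for the example.

```python
import math
from itertools import product

def block_entropy(pi, P, n):
    """Entropy of the partition into n-names for a stationary Markov chain
    with stationary vector pi and transition matrix P (brute-force enumeration)."""
    total = 0.0
    for word in product(range(len(pi)), repeat=n):
        m = pi[word[0]]
        for a, b in zip(word, word[1:]):
            m *= P[a][b]  # measure of the cylinder determined by the word
        if m > 0:
            total -= m * math.log2(m)
    return total

if __name__ == "__main__":
    P = [[0.9, 0.1], [0.5, 0.5]]
    pi = [5 / 6, 1 / 6]  # stationary: pi_0 * 0.1 == pi_1 * 0.5
    for n in (1, 2, 4, 8, 12):
        print(n, round(block_entropy(pi, P, n) / n, 4))
```

The printed averages decrease toward the value given by formula (5.35) for this chain; for a Markov chain the successive differences of the block entropies are already equal to that value from $n = 2$ on.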
Theorem 5.19 Let $P$ be a finite partition of $X$ and $\mathcal P$ its past algebra. Then
$$\bar h(T,P) = \int I(P|\mathcal P)\,d\mu = h(T,P).$$
Proof By iterative applications of
$$h(P \vee H) = h(H) + \int I(P|H)\,d\mu,$$
we have
(':2
T-i(P») =
f
I(T-n+l(p»d/J
+
f
f
f
I(T-·+ 2(P)1 T-n+l(p» d/J
I(T-n+3(p)1 T-·+1(P)
+ ... + =
+
f
I(P)d/J
I(pi
+
f
V
T-n+2(p» d/J
~2: T-i(P»)d/J
I(PIT(P»d/J
+ ...
f
I(pi
~21 T-i(P»)d/J.
I
94
Fundamentals of measurable dynamics
Setting
we know jj -+ I(PI&')
pointwise and in L I, by the L I-martingale theorem, (D(P IVi=1 Ti(P» form the martingale. Hence h(T,P)
1 = !~~ ~h
("-1Yo
1,,-1 L n i=O
= lim n-+oo
=
f
T-i(P)
)
f J;dp. •
I(PI&,)dJl.
Exercise 5.5 Within this discussion, there arises a natural notion of the conditional entropy of a partition conditioned on a factor algebra $\mathcal A$.
1. $h(P|\mathcal A) = \int I(P|\mathcal A)\,d\mu$. For example, $h(T,P) = h(P|\mathcal P_P)$.
This extends to a natural notion of the conditional entropy of a process, conditioned on a factor algebra.
(a) The first equal sign in 2 is a definition, of course. Prove the second one.
(b) If the partition $H$ is a generator for $\mathcal A$, then following the argument for Lemma 5.16, show
$$h(T,P \vee H) = h(T,H) + h\Big(T,P \,\Big|\, \bigvee_{i=-\infty}^{\infty} T^{-i}(H)\Big).$$

5.8 More about zero entropy and tail fields
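Returning to the case of zero entropy transformations, remember we know $h(T,P) = 0$ iff
$$P \subset \mathcal P = \bigvee_{i=-1}^{-\infty} T^{-i}(P).$$
This, of course, says
$$T^{j}(P) \subset \bigvee_{i=-j-1}^{-\infty} T^{-i}(P)$$
for all $j \ge 0$, and so
$$\bigvee_{i=-1}^{-\infty} T^{-i}(P) = \bigvee_{i=-j}^{-\infty} T^{-i}(P) \quad \text{for all } j \ge 1.$$
If we define
$$\mathcal T_P = \bigcap_{j=1}^{\infty}\ \bigvee_{i=-j}^{-\infty} T^{-i}(P),$$
the tail field of $P$, then we obtain the following result.
Corollary 5.20 $h(T,P) = 0$ iff $P \subset \mathcal T_P$. •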
We want to generalize this last corollary to the following theorem (we have taken this argument directly from Smorodinsky (1971)).
Theorem 5.21 If $P$ and $Q$ are finite partitions and $P \subset \mathcal T_Q$, then $h(T,P) = 0$.
Lemma 5.22
$$h\Big(T^k, \bigvee_{i=0}^{k-1} T^{-i}(P)\Big) = kh(T,P).$$
Proof Exercise 5.6.
Exercise 5.6 Prove Lemma 5.22.
Corollary 5.23 $h(T^k) = kh(T)$.
Proof of Theorem 5.21
By lemma 5.22 h(T, P)
f (k-lY f (1-1V = = -k1
I
T-i(p)
J-O
lim -k1
k.... oo
I
}=o
1-.Y
00
T-i(p) )
}--I
T-J(P)
1-V 00
J=-I
T-i(p) ) .
o.
By lemma 5.8
h(T, P) = lim! fI k-+co k
(V
T-i(P»).
i=O
Consider: (1)
and
k1
(2)
f (k-l f (k-lY k f (k-l f (k-l
i'Y.
I iVa T-i(p v Q) I-CO) T-i(p) 1 = -1
I
k
T-J(P)
}-o
+1
1-.Y
00
T-i(p) )
}--l
I iVa T-i(Q)
IjYk-l
oo
)
T-i(p) .
The left hand side of (2) is less than or equal to
k1
)
I iVa T-J(P v Q)
and greater than or equal to the left side of (1), both of which tend to h(T, P v Q) as k -+ 00. Each term of the right hand side of (2) is less than or equal to the corresponding term on the right hand side of(I). Thus
must converge to 0 as k 1 !~ k
If P S
ffQ'
-+ 00,
i.e.
f I (k-l iVa T-i(p) 1i'Y.
then for all j
00
1
~
T-i(p v Q)) = h(T, P).
0, -00
T-i(p) s
V i=-l
and
-00
T-i(Q) s
V T-i(p v Q) j=-l
(5.41)
Entropy k-l
I 97
-00.
V T-i(p) ~ V
i=O
T-J(P v Q)
i=-l
and so
!k fI
(k\)
"
T-i(P)
j=O
and hence by (5.6), h(T, P) =
I V T-i(P v Q») = 0 i=-l
•
o.
Exercise 5.7
1. Show that if $h(T,P) = 0$ and $h(T,Q) = 0$ then $h(T,P \vee Q) = 0$.
2. Use part 1 to show that for any transformation $T$ there is a maximal $\sigma$-algebra $\Pi$, called the Pinsker algebra, so that for any $P \subset \Pi$, $h(T,P) = 0$.
3. Show that as irrational rotations and permutations of finite sets have zero entropy, any minimal isometry has zero entropy. Hint: spectrum.

5.9 Even more about the K-property
Corollary 5.24 $h(T,P) \ne 0$ for all non-trivial $P$ iff $\mathcal T_P$ is trivial for all $P$. • (Trivial means consists only of sets of measure 0 or 1.)
We want to see that this is also equivalent to $\mathcal T_P$ trivial for a single generating partition $P$.
Theorem 5.25 If $P$ and $Q$ are finite partitions, $Q \subset \bigvee_{i=-\infty}^{\infty} T^{-i}(P)$, then $\mathcal T_Q \subseteq \mathcal T_P$.
We first deal with some preliminaries. Note that if
$$Q \subset \bigvee_{i=-k}^{k} T^{-i}(P)$$
then $\mathcal T_Q \subset \mathcal T_P$ is easy to show.
Next notice that if $f, g \in L^1$ and $\mathcal A$ is any algebra,
$$\|E(f|\mathcal A) - E(g|\mathcal A)\|_1 \le \|f - g\|_1.$$
It follows that given any $s \in \mathbb Z^+$ and $\varepsilon > 0$, there is a $\delta$ so that if $P$ and $P'$ are partitions with the same state space and $\mu(P \,\Delta\, P') < \delta$ then
$$\int |I(P|\mathcal A) - I(P'|\mathcal A)|\,d\mu < \varepsilon$$
for any algebra $\mathcal A$.
Remember from Lemma 5.15 that $H(\pi)$ is a strictly convex function, i.e.,
$$H\Big(\sum_i a_i\pi_i\Big) \ge \sum_i a_iH(\pi_i)$$
where $a$ is a strictly positive probability vector, equality holding exactly when all $\pi_i$ are equal. The following is just a restatement of this.
Corollary 5.26 If $P$ and $Q$ are finite partitions then
$$\int I(P|Q)\,d\mu \le h(P)$$
and equality holds iff $P$ is independent of $Q$ (written $P \perp Q$), i.e., $D(P|Q) = D(P)$ a.s.
This leads to the following strengthening.
Theorem 5.27 For any algebras $\mathcal A_1$ and $\mathcal A_2$, and finite partition $P$,
$$\int I(P|\mathcal A_1 \vee \mathcal A_2)\,d\mu \le \int I(P|\mathcal A_2)\,d\mu$$
and equality holds iff $D(P|\mathcal A_1 \vee \mathcal A_2) = D(P|\mathcal A_2)$ a.e.
Proof We begin by supposing .911 is generated by a finite partition Q = {ql, ... ,q.}. Now
D(PI.9I2) =
L E(qil.9l2) (D(P n Qil.9l2») qi
E(qil.9l2)
and
a.s. where D(P n Qil.9l2) E(qil.9l2)
(E(pi n Qil.9l2), (E(P2 n qil.9l2), ... ) E(qil.9l2)
is a probability vector-valued function. For a.e. x
equality holding iff
E
X,
(5.42)
As H(D(P n qlldz )/E(q;!.sa1z )) is d 2 measurable, the left hand side of (5.42) is
L q,
f
x
E f(Xq,(D(P n qlldz»))dll E(qil d 2)
= =
fL x q,
Ix
XqJ{JJ(P n q;ld2»)dll \ - E(qildz)
J(D(PIQ v d 2))dll·
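Equality in (5.42) holds iff $D(P \cap q_i|\mathcal A_2)/E(q_i|\mathcal A_2) = D(P|\mathcal A_2)$ for all $q_i$ for a.e. $x$, i.e., $D(P|Q \vee \mathcal A_2) = D(P|\mathcal A_2)$ a.e. Letting $Q_i$ refine down to $\mathcal A_1$,
$$\int I(P|Q_{i+1} \vee \mathcal A_2)\,d\mu \le \int I(P|Q_i \vee \mathcal A_2)\,d\mu \tag{5.43}$$
by what we just proved and taking limits
$$\int I(P|\mathcal A_1 \vee \mathcal A_2)\,d\mu \le \int I(P|\mathcal A_2)\,d\mu.$$
If equality holds here it holds in (5.43) for all $i$ and
$$D(P|Q_i \vee \mathcal A_2) = D(P|\mathcal A_2) \text{ a.e.}$$
for all $i$ and $D(P|\mathcal A_1 \vee \mathcal A_2) = D(P|\mathcal A_2)$. •
Corollary 5.28 For any finite partitions $P$ and $Q$,
$$D(P|\mathcal P_P \vee \mathcal T_Q) = D(P|\mathcal P_P).$$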
Proof We saw earlier that I/k JI(Vit;J T-j(P)I\/;:~l T-i(p v Q)) dll converges in k to h(T, Pl. This quantity can be written
~:~ f I ( T-i(p) Ii5l-1 T-i(P) Vi'll T-i(Q») dll 1 1- T-i(P) i=~-I T-i(Q) = k i~O J P k-l
f(
00
i'Y-l
This is the Cesaro average of the sequence
-00
V
)
dll·
!Xj
fI(p15Z1 T-i(P) f ~f
=
V
i=~-1 T-i(Q»)dll ,
an increasing bounded sequence, hence !Xj converges to h(T, P). But now I(PI.?JIp v ,rQ)dll
I(PI.?JIp)dll
~!Xj
for all j hence
and so
We need one last ingredient, a reverse martingale theorem.
Lemma 5.29 Let $G_1 \subset G_2 \subset \cdots \subset G_n$ be a finite sequence of $\sigma$-algebras and $f \in L^1(\mu)$, $\lambda \in \mathbb R$ be fixed. Set
$$M = \{x : \max_{1 \le k \le n} E(f|G_k) \ge \lambda\}.$$
For any set $A \in G_1$,
$$\int_{A \cap M} f\,d\mu \ge \lambda\mu(M \cap A).$$
Proof Let $M_k = \{x : E(f|G_i) < \lambda \text{ for } i < k \text{ but } E(f|G_k) \ge \lambda\}$. The $M_k$, $k = 1, 2, \ldots, n$, are disjoint and cover $M$. Each $M_k$ is $G_k$ measurable. Hence
$$\int_{A \cap M} f\,d\mu = \sum_{k=1}^{n}\int_{A \cap M_k} E(f|G_k)\,d\mu \ge \sum_{k=1}^{n}\lambda\mu(M_k \cap A) = \lambda\mu(M \cap A).$$
Theorem 5.30 (Reverse martingale theorem, Doob 1953) If $\{G_1 \supseteq G_2 \supseteq \cdots\}$ is a sequence of $\sigma$-algebras which decrease to $G$ and if $f \in L^1(\mu)$, then $E(f|G_k)$ converges pointwise and in $L^1$ to $E(f|G)$.
Proof $L^1$ convergence will follow from pointwise convergence and uniform integrability. Let
$$A = A(\lambda_1,\lambda_2) = \{x : \liminf E(f|G_n) < \lambda_1 < \lambda_2 < \limsup E(f|G_n)\}.$$
If we show $\mu(A) = 0$ for all $\lambda_1$, $\lambda_2$ then we will have pointwise convergence. Let $M_n = \{x : \max_{1 \le k \le n} E(f|G_k) \ge \lambda_2\}$. By Lemma 5.29
$$\int_{M_n \cap A} f\,d\mu \ge \lambda_2\mu(M_n \cap A).$$
As $n \to \infty$, $M_n \cap A \to A$ and so
$$\int_A f\,d\mu \ge \lambda_2\mu(A).$$
Replacing $f$ by $-f$ and $\lambda_2$ by $-\lambda_1$, we get similarly
$$\int_A f\,d\mu \le \lambda_1\mu(A).$$
Hence $\mu(A) = 0$. To verify uniform integrability, assume $f \ge 0$. Notice that $h_k(\delta) = \sup\{\int_A E(f|G_k)\,d\mu \mid \mu(A) < \delta\}$ is actually a maximum and is achieved on a set $A$ which is $G_k$ measurable. Hence $h_k(\delta) \le h_{k-1}(\delta)$ is decreasing in $k$. Thus if $\delta$ is such that for any $A$, $\mu(A) < \delta$, $\int_A f\,d\mu < \varepsilon$, then for all $G_k$ and $\mu(A) < \delta$,
$$\int_A E(f|G_k)\,d\mu < \varepsilon.$$
This shows $E(f|G_n)$ converges in $L^1$ and pointwise to some function $f^*$. Now $f^*$ is $G$ measurable so all we need show is that for $A \in G$, $\int_A f^*\,d\mu = \int_A f\,d\mu$ to conclude $f^* = E(f|G)$. But $\int_A f^*\,d\mu = \lim_{k\to\infty}\int_A E(f|G_k)\,d\mu$, and as $A \in G \subseteq G_k$, this is equal to $\int_A f\,d\mu$ and we are done. •
Theorem 5.30, for example, tells us that for finite partitions $P$ and $Q$, $D(Q|\bigvee_{i=-j}^{-\infty} T^{-i}(P))$ converges pointwise and in $L^1$ to $D(Q|\mathcal T_P)$ and so $\int I(Q|\bigvee_{i=-j}^{-\infty} T^{-i}(P))\,d\mu$ converges to $\int I(Q|\mathcal T_P)\,d\mu$.
Proof of Theorem 5.25 We prove that for Q c: Vioc;,,-oo T-i(P) we must have ,rQ £ ,rp by showing that any R c ,rQ has conditional entropy 0 with respect to ,rp and hence is,rp measurable. Now using Corollary 5.28,
so
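$$D\Bigl(\bigvee_{i=-k}^{k} T^{-i}(P) \Bigm| \bigvee_{i=k+1}^{\infty} T^{-i}(P) \vee R\Bigr) = D\Bigl(\bigvee_{i=-k}^{k} T^{-i}(P) \Bigm| \bigvee_{i=k+1}^{\infty} T^{-i}(P)\Bigr).$$
Let $k > m$ and $S \subset \bigvee_{i=-m}^{m} T^{-i}(P)$. Then
$$D\Bigl(S \Bigm| \bigvee_{i=k+1}^{\infty} T^{-i}(P)\Bigr) = D\Bigl(S \Bigm| \bigvee_{i=k+1}^{\infty} T^{-i}(P) \vee R\Bigr)$$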
and so
Letting $k \to \infty$, using the reverse martingale theorem,
$$\int I(S|\mathcal{T}_P)\,d\mu = \int I(S|\mathcal{T}_P \vee R)\,d\mu.$$
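This holds for any $S$ lying in a finite span of the $T^{-i}(P)$. As $Q \subset \bigvee_{i=-\infty}^{\infty} T^{-i}(P)$ and $R \subset \mathcal{T}_Q$, we have $R \subset \bigvee_{i=-\infty}^{\infty} T^{-i}(P)$, so there is a sequence of partitions $S_i \subset \bigvee_{j=-m(i)}^{m(i)} T^{-j}(P)$ with $S_i \to R$ in symmetric difference. Thus
$$\int I(S_i|\mathcal{T}_P)\,d\mu \to \int I(R|\mathcal{T}_P)\,d\mu$$
and
$$\int I(S_i|\mathcal{T}_P)\,d\mu = \int I(S_i|\mathcal{T}_P \vee R)\,d\mu \to \int I(R|\mathcal{T}_P \vee R)\,d\mu = 0.$$
We conclude $I(R|\mathcal{T}_P) = 0$ a.e. and so $R \subset \mathcal{T}_P$ and $\mathcal{T}_Q \subseteq \mathcal{T}_P$. •

Corollary 5.31 If $\mathcal{T}_P$ is trivial for a generating partition $P$ then $\mathcal{T}_Q$ is trivial for every finite partition $Q$.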
This now gives us a circle of facts about the K-property analogous to Proposition 4.19 on weakly mixing.

Definition 5.32 The following are all equivalent:
(1) $\mathcal{T}_P$ is trivial for all finite $P$;
(2) $h(T, P) \ne 0$ for all non-trivial finite $P$;
(3) $D(P|\bigvee_{i=j}^{\infty} T^{-i}(P)) \to D(P)$ as $j \to \infty$ for all finite $P$;
(4) $T$ is a K-automorphism.
If $T$ has a finite generating partition $P$, these are also equivalent to
(5) $\mathcal{T}_P$ is trivial;
(6) $D(\bigvee_{i=-k}^{k} T^{-i}(P)|\bigvee_{i=j}^{\infty} T^{-i}(P)) \to D(\bigvee_{i=-k}^{k} T^{-i}(P))$ as $j \to \infty$.
Proof We already know the equivalence of (1), (2) and (5) and that (4) $\to$ (2). That (3) $\to$ (6) and (3) $\to$ (4) are direct. That (1) $\to$ (3) is the reverse martingale theorem. To complete the proof we show (6) $\to$ (5) and by the same reasoning (3) $\to$ (1). Suppose $A \in \mathcal{T}_P$, $\mu(A) \ne 0, 1$. Let $k$ be chosen and $A' \in \bigvee_{i=-k}^{k} T^{-i}(P)$ so that $|\mu(A|A') - \mu(A)| = |\mu(A \cap A')/\mu(A') - \mu(A)| > 0$. But by (6), $A \perp A'$, i.e., $\mu(A|A') = \mu(A)$, contradicting $\mu(A) \ne 0, 1$. •

The 'usual' definitions of a K-automorphism are conditions (1), (2) or (6).

Exercise 5.8 Show that a mixing Markov chain is a K-automorphism.

Exercise 5.9 Show that a rank-1 cutting and stacking construction always has zero entropy.
5.10 Entropy for non-ergodic maps

We end this chapter with a few remarks concerning the entropy theory for not necessarily ergodic maps. Working with counts of names one can show without ergodicity that
$$h(T, P, \varepsilon) = \lim_{n \to \infty} h(T, P, \varepsilon, n),$$
as given in Definition 5.1, is in fact a limit and as a function of $\varepsilon$ is monotone nonincreasing. One can show
$$\lim_{\varepsilon \to 0} h(T, P, \varepsilon) = \operatorname{ess\,sup}_{y \in Y} h(T_y, P) \tag{5.44}$$
where $T_y$, $y \in Y$, are the ergodic components of $T$. This is one possible choice for $h(T, P)$, but certainly not the best. A Shannon-McMillan-Breiman theorem also holds. For $\mu$-a.e. $x$
$$\lim_{n \to \infty} -\frac{1}{n}\log_2(\mu(p_n(x))) = h(T_y, P) \tag{5.45}$$
where $T_y$ is the ergodic component containing $x$. Using the classical entropy function, one can also compute
$$\lim_{n \to \infty} \frac{1}{n} h\Bigl(\bigvee_{i=0}^{n-1} T^{-i}(P)\Bigr) = h(P|\mathcal{P}) = \int h(T_y, P)\,dm. \tag{5.46}$$
Classical proofs of these results can be found in Jacobs (1962) and Krengel (1985). Working from the ergodic decomposition of Chapter 2, the techniques of this chapter can also be extended to their proof. We leave this as a challenge to the truly motivated reader. We will take as our definition
$$h(T, P) = \lim_{n \to \infty} \frac{1}{n} h\Bigl(\bigvee_{i=0}^{n-1} T^{-i}(P)\Bigr) = h(P|\mathcal{P}) = \int I(P|\mathcal{P})\,d\mu.$$
All the equalities here are either definitions or provable without ergodicity (see Theorem 5.19). We will use these rather extensively in Chapter 7, but will only use the Shannon-McMillan-Breiman theorem for ergodic systems.
6 Joinings and disjointness
The fundamental question, and certainly the question behind all our work in Chapters 4 and 5, has been how to tell whether two dynamical systems are isomorphic. A more refined question has been, what structures exist within a dynamical system that can be used to distinguish it from others? Entropy and the point spectrum are two such structures which we have investigated rather carefully. Also mixing properties are of this sort. In this chapter we will approach our original question from a slightly different perspective. Given two dynamical systems, how closely can they be matched to one another? What we will consider are the ways of joining two systems together as factors of some common system. As we shall see, the space of all such joinings will carry a compact metric topology, and within this space we can search for isomorphisms, or obstacles to them. We will see that a number of mixing properties can be characterized in terms of these spaces of joinings. Further, the factor algebras and centralizer of a system are governed by the self-joinings. In this context we will construct some interesting examples. All this leads up to Chapter 7 where we prove Krieger's representation theorem and Ornstein's isomorphism theorem through careful construction and analyses within certain spaces of joinings.
6.1 Joinings

Let $X = (X, \mathcal{F}, \mu, T)$ and $Y = (Y, \mathcal{G}, \nu, S)$ be two ergodic dynamical systems.

Definition 6.1 A joining of $X$ and $Y$ is a $T \times S$ invariant measure $\hat\mu$ on $X \times Y$ for which all sets of the form $A \times B$, $A \in \mathcal{F}$, $B \in \mathcal{G}$ are measurable and whose marginal measures satisfy
$$\hat\mu(A \times Y) = \mu(A) \quad\text{and}\quad \hat\mu(X \times B) = \nu(B),$$
i.e., agree with $\mu$ and $\nu$. We assume the $\hat\mu$ measurable sets to be the completion of $\mathcal{F} \times \mathcal{G}$.
Theorem 6.1 Suppose $X$ and $Y$ are dynamical systems on Lebesgue spaces, and $\{A_i\} \subset \mathcal{F}$ and $\{B_i\} \subset \mathcal{G}$ are countable, $T$- and $S$-invariant generating subalgebras respectively. Suppose $\hat\mu_0$ is a $T \times S$-invariant additive set function on sets of the form $A_i \times B_j$ satisfying
$$\hat\mu_0(A_i \times Y) = \mu(A_i) \quad\text{and}\quad \hat\mu_0(X \times B_j) = \nu(B_j).$$
Then $\hat\mu_0$ extends to a unique joining $\hat\mu$ of the two systems and the measure space $(X \times Y, \mathcal{F} \times \mathcal{G}, \hat\mu)$ is automatically Lebesgue.
Proof Using sets $A_i \times B_j$ we build a refining and generating tree of partitions $\{P_i\}$,
$$P_i = \bigvee_{j=1}^{i} (A_j \times B_j,\ A_j^c \times B_j,\ A_j \times B_j^c,\ A_j^c \times B_j^c).$$
Of course, setting
$$Q_i = \bigvee_{j=1}^{i} (A_j, A_j^c), \qquad H_i = \bigvee_{j=1}^{i} (B_j, B_j^c),$$
these are refining and generating in $X$ and $Y$ separately and
$$P_i = (Q_i \times Y) \vee (X \times H_i).$$
The additive set function $\hat\mu_0$ is defined on our tree $\{P_i\}$, and is $T \times S$-invariant. All we need to show is that the empty chains have measure 0. Suppose $\mathcal{C} = \{C_i\}$ is a chain in $\{P_i\}$. There are then two chains $\mathcal{C}' = \{C_i'\}$ in $\{Q_i\}$ and $\mathcal{C}'' = \{C_i''\}$ in $\{H_i\}$ with $C_i = C_i' \times C_i''$. Now $\mathcal{C}$ is an empty chain if and only if one of $\mathcal{C}'$ or $\mathcal{C}''$ is empty. As $X$ and $Y$ are Lebesgue, the empty chains in $\{Q_i\}$ and in $\{H_i\}$ have respectively $\mu$ and $\nu$ measure 0. As $\hat\mu_0$ agrees with $\mu$ and $\nu$ on its marginals, the $\{P_i\}$-empty chains have $\hat\mu_0$ measure 0. •
This result will make joinings relatively easy to construct. All we need do is build $\hat\mu_0$ on a tree. More importantly, that there is a 1-1 correspondence between additive set functions with the proper marginals and joinings will allow us to topologize the space of joinings.

We have defined a joining as a measure on $X \times Y$. Such arise any time both $X$ and $Y$ are embedded as factors of some common supersystem $Z = (Z, \mathcal{H}, \eta, U)$. If $\varphi_1 : Z \to X$ and $\varphi_2 : Z \to Y$ are factor maps, then restricting $\eta$ to the smallest $\sigma$-algebra containing $\varphi_1^{-1}(\mathcal{F})$ and $\varphi_2^{-1}(\mathcal{G})$ gives a joining: set
$$\hat\mu_Z(A \times B) = \eta(\varphi_1^{-1}(A) \cap \varphi_2^{-1}(B)).$$
That this is an additive set function is easy to check. By Theorem 6.1, it extends to a joining. The factor of $Z$ generated by $\varphi_1^{-1}(\mathcal{F}) \vee \varphi_2^{-1}(\mathcal{G})$ is isomorphic to $(X \times Y, \mathcal{F} \times \mathcal{G}, \hat\mu_Z, T \times S)$.
Quite often we will construct supersystems like Z and refer to them as joinings of some pair of factors. It is in the sense we have just described that the word joining is intended.
Definition 6.2 By $J(X, Y)$ we mean the space of all joinings of $X$ and $Y$.
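For finite systems the conditions of Definition 6.1 can be checked by direct computation. The following Python sketch is an illustration and not part of the text's development. It assumes $X = \mathbb{Z}/m$ and $Y = \mathbb{Z}/n$ with rotation by 1 and uniform measures, stores a measure on $X \times Y$ as an $m \times n$ array of exact fractions, and uses hypothetical function names.

```python
from fractions import Fraction

def is_joining(w, m, n):
    """w[x][y] is the mass of (x, y) in Z/m x Z/n.

    T and S are rotation by 1, and mu, nu are uniform (assumptions of
    this example).  Checks both marginals and T x S-invariance.
    """
    mu, nu = Fraction(1, m), Fraction(1, n)
    rows_ok = all(sum(w[x]) == mu for x in range(m))
    cols_ok = all(sum(w[x][y] for x in range(m)) == nu for y in range(n))
    invariant = all(w[x][y] == w[(x + 1) % m][(y + 1) % n]
                    for x in range(m) for y in range(n))
    return rows_ok and cols_ok and invariant

def product_measure(m, n):
    """The product of the two uniform measures."""
    return [[Fraction(1, m * n)] * n for _ in range(m)]

def off_diagonal(m, j):
    """Uniform mass on the graph {(x, x + j)} of T^j, for X = Y = Z/m."""
    return [[Fraction(1, m) if (y - x) % m == j % m else Fraction(0)
             for y in range(m)] for x in range(m)]

def mix(a, w1, w2):
    """The convex combination a*w1 + (1 - a)*w2."""
    return [[a * p + (1 - a) * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(w1, w2)]

# A coupling with the right marginals that is not invariant:
# uniform mass on the graph of x -> -x in Z/3.
negation_graph = [[Fraction(1, 3) if (x + y) % 3 == 0 else Fraction(0)
                   for y in range(3)] for x in range(3)]
```

With these helpers, product measure and the graph measures of powers of the rotation pass the test, as does any convex combination of joinings, while `negation_graph` has the correct marginals but fails invariance.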
$J(X, Y)$ is never empty as it always contains $\mu \times \nu$. It is convex, since if $\hat\mu_1$ and $\hat\mu_2$ are both joinings, then so is $a\hat\mu_1 + (1 - a)\hat\mu_2$, $0 \le a \le 1$. We want to topologize $J(X, Y)$; we give an explicit metric. For a generating tree $\{P_i\}$ as constructed above let
and
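Lemma 6.2 $\lim_{i \to \infty} \|\hat\mu_i, \hat\mu\| = 0$ if and only if for all $A \in \mathcal{F}$ and $B \in \mathcal{G}$,
$$\lim_{i \to \infty} \hat\mu_i(A \times B) = \hat\mu(A \times B).$$
Thus although the metric depends on the choice of $\{P_i\}$, the topology itself does not.

Proof Certainly if $\hat\mu_i(A \times B) \to \hat\mu(A \times B)$ for all $A \times B$, then $\|\hat\mu_i, \hat\mu\| \to 0$. On the other hand, for any $A \times B$ and $\varepsilon > 0$, select $A' \times B'$ with
$$\mu(A\,\Delta\,A') < \varepsilon \quad\text{and}\quad \nu(B\,\Delta\,B') < \varepsilon,$$
where $A' \times B'$ is a union of sets of $P_m$ for some $m$. Thus $\hat\mu(A \times B\,\Delta\,A' \times B') < 2\varepsilon$ for any $\hat\mu \in J(X, Y)$. If $\|\hat\mu_i, \hat\mu\| \to 0$, then
$$\lim_{i \to \infty} |\hat\mu_i(A' \times B') - \hat\mu(A' \times B')| = 0$$
so
$$\limsup_{i \to \infty} |\hat\mu_i(A \times B) - \hat\mu(A \times B)| \le 4\varepsilon.$$
•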
Theorem 6.3 $(J(X, Y), \|\cdot,\cdot\|)$ is compact, convex and if $X$ and $Y$ are ergodic, its extreme points are the ergodic joinings.

Proof Given any sequence $\hat\mu_i \in J(X, Y)$, we can select a subsequence $\hat\mu_{i(t)}$ so that $\hat\mu_{i(t)}(C)$ is convergent for all sets $C$ in the tree $\{P_i\}$. Let $\hat\mu_0(C)$ be the limit. This will be a $T \times S$-invariant additive set function on the tree with marginals $\mu$ and $\nu$. Hence $\hat\mu_0$ extends to a joining $\hat\mu$. Clearly $\|\hat\mu_{i(t)}, \hat\mu\| \to 0$.

To identify the extreme points, suppose $\hat\mu \in J(X, Y)$ and we write $\hat\mu = \int_0^1 \hat\mu_t\,dt$, its ergodic decomposition. We want to show that for a.e. $t$, $\hat\mu_t \in J(X, Y)$. What we must show is that the marginals of $\hat\mu_t$ are $\mu$ and $\nu$ respectively. For a set $A \in \mathcal{F}$,
$$\frac{1}{n} \sum_{i=0}^{n-1} \chi_A(T^i(x)) \to \mu(A), \quad \mu\text{-a.e.}$$
Thus $\hat\mu$-a.e.,
$$\frac{1}{n} \sum_{i=0}^{n-1} \chi_{A \times Y}((T \times S)^i(x, y)) \to \mu(A).$$
Thus for a.e. $t$, for all sets $A_i$, for $\hat\mu_t$-a.e. $(x, y)$, this Cesàro limit is $\mu(A_i)$. Hence, for a.e. $t$, $\hat\mu_t(A_i \times Y) = \mu(A_i)$. The first marginal of $\hat\mu_t$ (and by symmetry the second) is $\mu$ (is $\nu$). Thus a non-ergodic measure is not extreme. As an ergodic measure cannot be written as an average of any other invariant measures, an ergodic measure is extreme. •
6.2 The relatively independent joining

We want to describe now a basic construction. When two dynamical systems have a common factor algebra, this leads to a common supersystem or joining. This is a version in measure-algebraic terms of what is called a fibre product in topology. The construction occurs entirely on the measure space level. That the measure we build on $X \times Y$ is $T \times S$-invariant makes it a joining, and what this joining does is to identify the two copies of the common factor algebra, and on the fibres over the factor to take the direct product of the corresponding fibre measures.

To be explicit, let $X$ and $Y$ be two systems. Suppose $Z_1 = (Z_1, \mathcal{H}_1, \eta_1, U_1)$ and $Z_2 = (Z_2, \mathcal{H}_2, \eta_2, U_2)$ are factors of these. That is, we have homomorphisms
$$\varphi_1 : X \to Z_1, \qquad \varphi_2 : Y \to Z_2.$$
Further, suppose $\psi : Z_1 \to Z_2$ is an isomorphism. We want to build a joining of $X$ and $Y$ that identifies sets $\varphi_1^{-1}(C) \in \mathcal{F}$ and $\varphi_2^{-1} \circ \psi(C) \in \mathcal{G}$ as a.s. equal.
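We define the joining on rectangles $A \times B$. For $A \in \mathcal{F}$, $E(\chi_A|\varphi_1^{-1}(\mathcal{H}_1))$ is constant on sets $\varphi_1^{-1}(z_1)$, and so we may write
$$f_A(z_1) = E(\chi_A|\varphi_1^{-1}(\mathcal{H}_1))(x), \quad x \in \varphi_1^{-1}(z_1),$$
and similarly, for $B \in \mathcal{G}$,
$$g_B(z_2) = E(\chi_B|\varphi_2^{-1}(\mathcal{H}_2))(y), \quad y \in \varphi_2^{-1}(z_2).$$
Set
$$\hat\mu_0(A \times B) = \int_{Z_1} f_A(z_1)\,g_B(\psi(z_1))\,d\eta_1(z_1) = \int_{Z_2} f_A(\psi^{-1}(z_2))\,g_B(z_2)\,d\eta_2(z_2) \tag{6.2}$$
as $\psi$ is measure-preserving.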
Lemma 6.4 The set function $\hat\mu_0$ above is additive, $T \times S$-invariant and has $\mu$ and $\nu$ as marginals.

Proof Additivity follows from linearity of the conditional expectation. Invariance under $T \times S$ follows from
$$E(\chi_{T^{-1}(A)}|\varphi_1^{-1}(\mathcal{H}_1))(x) = E(\chi_A|\varphi_1^{-1}(\mathcal{H}_1))(T(x)).$$
For the marginals,
$$\hat\mu_0(A \times Y) = \int_{Z_1} f_A(z_1)\,d\eta_1 = \int_X E(\chi_A|\varphi_1^{-1}(\mathcal{H}_1))\,d\mu = \mu(A).$$
•

Thus $\hat\mu_0$ extends to a joining $\hat\mu$ we call the relatively independent joining over $(Z_1, Z_2, \psi)$.
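On finite spaces formula (6.2) reduces to a one-line rule: a pair $(x, y)$ lying over the same factor point $z$ receives mass $\mu(x)\nu(y)/\eta(z)$, and all other pairs receive mass 0. The Python sketch below illustrates this at the measure-space level only (the dynamics is ignored). It assumes both factor maps land in a common finite set, so that $\psi$ is the identity, and that $\mu$ and $\nu$ push forward to the same measure $\eta$; the function name is hypothetical.

```python
from fractions import Fraction

def relatively_independent_joining(mu, nu, phi1, phi2):
    """Relatively independent joining of two finite measure spaces.

    mu, nu     : dicts point -> mass
    phi1, phi2 : dicts point -> point of the common factor
                 (psi is taken to be the identity; mu and nu are
                 assumed to push forward to the same factor measure)
    Returns a dict (x, y) -> mass, listing only pairs over a common
    factor point.
    """
    # eta = image of mu under phi1
    eta = {}
    for x, mass in mu.items():
        eta[phi1[x]] = eta.get(phi1[x], Fraction(0)) + mass
    joining = {}
    for x, p in mu.items():
        for y, q in nu.items():
            if phi1[x] == phi2[y]:
                # product of the two fibre measures, weighted by eta
                joining[(x, y)] = p * q / eta[phi1[x]]
    return joining

# X = {0,...,3} and Y = {0,...,5} with uniform measures; the common
# factor records parity.
mu = {x: Fraction(1, 4) for x in range(4)}
nu = {y: Fraction(1, 6) for y in range(6)}
parity_joining = relatively_independent_joining(
    mu, nu, {x: x % 2 for x in mu}, {y: y % 2 for y in nu})

# Over the one-point factor the construction gives product measure.
trivial_joining = relatively_independent_joining(
    mu, nu, {x: 0 for x in mu}, {y: 0 for y in nu})
```

In the parity example all of the mass sits on pairs of equal parity, which is the finite counterpart of the statement that the joining identifies the two copies of the factor.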
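Lemma 6.5 For $\hat\mu$, the relatively independent joining over $(Z_1, Z_2, \psi)$,
(1) $\hat\mu(\{(x, y) : \psi(\varphi_1(x)) = \varphi_2(y)\}) = 1$;
(2) $A \in \varphi_1^{-1}(\mathcal{H}_1)$ if and only if there is a $B \in \mathcal{G}$ with $\hat\mu(A \times Y\,\Delta\,X \times B) = 0$;
(3) $B \in \varphi_2^{-1}(\mathcal{H}_2)$ if and only if there is an $A \in \mathcal{F}$ with $\hat\mu(A \times Y\,\Delta\,X \times B) = 0$; and
(4) for such a pair $A$ and $B$, $\psi(A) = B$.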
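Proof (1) Let $\{H_i\}$ be a refining and generating tree of partitions in $Z_1$. Hence $\{\psi(H_i)\}$ is such in $Z_2$. Write $H_i = \{A_{i,1}, A_{i,2}, \ldots, A_{i,s(i)}\}$. In the tree $\{\varphi_1^{-1}(H_i) \times \varphi_2^{-1}(\psi(H_i))\}$ consider the diagonal sets $\varphi_1^{-1}(A_{i,j}) \times \varphi_2^{-1}(\psi(A_{i,j}))$. A chain of such diagonal sets descends to a set of points $(x, y)$ where $\psi(\varphi_1(x)) = \varphi_2(y)$. Consider
$$\Delta_i = \bigcup_{j=1}^{s(i)} \bigl(\varphi_1^{-1}(A_{i,j}) \times \varphi_2^{-1}(\psi(A_{i,j}))\bigr),$$
the points in diagonal sets at level $i$. These are nested and the set of points $(x, y)$ with $\psi(\varphi_1(x)) = \varphi_2(y)$ is exactly $\Delta = \bigcap_i \Delta_i$. Now
$$\hat\mu(\Delta_i) = \sum_{j=1}^{s(i)} \int_{Z_1} f_{\varphi_1^{-1}(A_{i,j})}(z_1)\,g_{\varphi_2^{-1}(\psi(A_{i,j}))}(\psi(z_1))\,d\eta_1(z_1) = \sum_{j=1}^{s(i)} \int_{Z_1} \chi_{A_{i,j}}(z_1)\,\chi_{\psi(A_{i,j})}(\psi(z_1))\,d\eta_1(z_1) = \sum_{j=1}^{s(i)} \eta_1(A_{i,j}) = 1.$$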
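Thus $\hat\mu(\Delta) = 1$.

(2), (3) and (4): If $A \in \varphi_1^{-1}(\mathcal{H}_1)$ then just as above, we compute
$$\hat\mu(A \times Y\,\Delta\,X \times \psi(A)) = 0.$$
Conversely, suppose
$$0 = \hat\mu(A \times Y\,\Delta\,X \times B) = \int_{Z_1} f_A(1 - g_B \circ \psi)\,d\eta_1 + \int_{Z_1} (1 - f_A)\,g_B \circ \psi\,d\eta_1.$$
Now both $f_A$ and $g_B \circ \psi$ lie between 0 and 1. For this to be zero, both integrals must be 0. Thus when $g_B \circ \psi(z_1) \ne 1$, we must have $f_A(z_1) = 0$ and when $f_A(z_1) \ne 1$, we must have $g_B \circ \psi(z_1) = 0$ ($\eta_1$-a.s. of course). Thus $\eta_1$-a.s., if $f_A(z_1) = 0$, then $g_B \circ \psi(z_1) = 0$ and whenever $f_A(z_1) \ne 0$,
$$f_A(z_1) = 1 = g_B \circ \psi(z_1).$$
Thus $f_A = g_B \circ \psi$ is a characteristic function. It follows that $A \in \varphi_1^{-1}(\mathcal{H}_1)$, $B \in \varphi_2^{-1}(\mathcal{H}_2)$ and $B = \psi(A)$. •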
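We can rephrase this as the commuting of a certain diagram: $\hat\mu$-a.s.,
$$\psi \circ \varphi_1 \circ \pi_1 = \varphi_2 \circ \pi_2,$$
where $\pi_1$ and $\pi_2$ are the coordinate projections. Embedded in the relatively independent joining is a single copy of the factor, identified as both $\pi_1^{-1} \circ \varphi_1^{-1}(\mathcal{H}_1)$ and $\pi_2^{-1} \circ \varphi_2^{-1}(\mathcal{H}_2)$.

Lemma 6.6 For functions $f \in L^2(\mu)$, $g \in L^2(\nu)$,
$$E_{\hat\mu}(f \circ \pi_1 \times g \circ \pi_2\,|\,\pi_1^{-1} \circ \varphi_1^{-1}(\mathcal{H}_1))(x, y) = E_{\hat\mu}(f \circ \pi_1 \times g \circ \pi_2\,|\,\pi_2^{-1} \circ \varphi_2^{-1}(\mathcal{H}_2))(x, y) = E_\mu(f|\varphi_1^{-1}(\mathcal{H}_1))(x)\,E_\nu(g|\varphi_2^{-1}(\mathcal{H}_2))(y).$$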
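Proof (Note: in any expression where the measure under consideration could be ambiguous, we will put the measure as a subscript.) We show the identity for characteristic functions. As $\pi_1^{-1}(\varphi_1^{-1}(\mathcal{H}_1)) = \pi_2^{-1}(\varphi_2^{-1}(\mathcal{H}_2))$ $\hat\mu$-a.s., $E_\mu(\chi_A|\varphi_1^{-1}(\mathcal{H}_1))(x)\,E_\nu(\chi_B|\varphi_2^{-1}(\mathcal{H}_2))(y)$ is $\pi_1^{-1}\varphi_1^{-1}(\mathcal{H}_1)$ measurable. For any set $D \in \mathcal{H}_1$,
$$\int_{\pi_1^{-1}\varphi_1^{-1}(D)} E_\mu(\chi_A|\varphi_1^{-1}(\mathcal{H}_1))(x)\,E_\nu(\chi_B|\varphi_2^{-1}(\mathcal{H}_2))(y)\,d\hat\mu = \int_{\pi_1^{-1}\varphi_1^{-1}(D)} f_A(\varphi_1(x))\,g_B(\varphi_2(y))\,d\hat\mu.$$
Now $\hat\mu$-a.s. $\varphi_1(x) = \psi^{-1}\varphi_2(y) = z_1$. Thus our integral is
$$\int_D f_A(z_1)\,g_B(\psi(z_1))\,d\eta_1 = \int_{Z_1} \chi_D(z_1)\,f_A(z_1)\,g_B(\psi(z_1))\,d\eta_1 = \int_{Z_1} f_{A \cap \varphi_1^{-1}(D)}(z_1)\,g_B(\psi(z_1))\,d\eta_1 = \int \chi_{A \cap \varphi_1^{-1}(D)}(x)\,\chi_B(y)\,d\hat\mu = \int_{\pi_1^{-1}\varphi_1^{-1}(D)} \chi_A(x)\,\chi_B(y)\,d\hat\mu. \tag{6.3}$$
Conditions (6.3) and (6.4) uniquely define the conditional expectation (Exercise 2.12). •

Corollary 6.7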
If we decompose $X$ over the factor $Z_1$ we get fibre measures $\mu_{1,z_1}$, and similarly decomposing $Y$ over $Z_2$ we get fibre measures $\mu_{2,z_2}$. Decomposing $(X \times Y, \mathcal{F} \times \mathcal{G}, \hat\mu)$ over the factor $\pi_1^{-1} \circ \varphi_1^{-1}(\mathcal{H}_1)$, the fibre measure at $(x, y)$ is
$$\mu_{1,\varphi_1(x)} \times \mu_{2,\varphi_2(y)}.$$

Proof Examining the proof of Theorem 3.17 where the fibre measures are constructed, they are simply the conditional expectations of Lemma 6.6. •
Fig. 6.1 indicates how this fibring of the measure space looks.

Fig. 6.1 Relatively independent joining over Z. The fibre measure on the dashed square is the direct product of the fibre measures on the two dashed lines.

There are two special cases of this construction we should consider.

Example 1 If $Z_1$ and $Z_2$ are both the trivial process (the identity transformation on one point), then
$$\hat\mu(A \times B) = \int_{Z_1} \mu(A)\nu(B)\,d\eta_1 = \mu(A)\nu(B)$$
and $\hat\mu = \mu \times \nu$.

Example 2 If $X$ and $Y$ are isomorphic, $\psi$ the isomorphism, then using $Z_1 = X$, $Z_2 = Y$, by Lemma 6.5,
$$\hat\mu(\{(x, \psi(x)) \mid x \in X\}) = 1.$$
The set of all $(x, \psi(x))$ is just the graph of $\psi$. This joining is supported on the graph. Thus any automorphism of $X$ gives rise to a joining supported on its graph. This second example has an important converse.

Theorem 6.8 If $\hat\mu \in J(X, Y)$ and $\mathcal{F} \times Y = X \times \mathcal{G}$, $\hat\mu$-a.e., then there is an isomorphism $\psi : X \to Y$ and
$$\hat\mu(\{(x, \psi(x)) \mid x \in X\}) = 1.$$
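Proof As $\mathcal{F} \times Y = X \times \mathcal{G}$, $\hat\mu$-a.s. we can construct refining and generating trees $\{P_i\}$ of $X$ and $\{P_i'\}$ of $Y$ with
$$\hat\mu(P_i \times Y\,\Delta\,X \times P_i') = 0.$$
To any chain $\mathcal{C} = \{C_i\}$ in $\{P_i\}$ there corresponds a chain $\mathcal{C}' = \{C_i'\}$ in $\{P_i'\}$. If $\mathcal{C}$ and $\mathcal{C}'$ are not empty, set
$$\psi\Bigl(\bigcap_i C_i\Bigr) = \bigcap_i C_i'.$$
This map takes almost all of $X$ to almost all of $Y$. As $\hat\mu(A \times Y\,\Delta\,X \times \psi(A)) = 0$ for any $A \in P_i$, it is true for all $A$ and $\psi$ is measure-preserving. As
$$\nu(\psi(T(A))\,\Delta\,S(\psi(A))) = \hat\mu(T(A) \times Y\,\Delta\,X \times S(\psi(A))) = \hat\mu(A \times Y\,\Delta\,X \times \psi(A)) = 0,$$
$\psi \circ T = S \circ \psi$, $\mu$-a.s., and $\psi$ is an isomorphism. Following the computation in Lemma 6.5 completes the result. •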
Theorem 6.8 begins the work of the next chapter, showing us how to identify joinings as isomorphisms. We now want to apply these ideas. In the remainder of this chapter our applications will involve cases where J(X, Y) is particularly small. In Chapter 7 we will consider applications involving J(X, Y) particularly large.
6.3 Disjointness

Definition 6.3 We say two systems $X$ and $Y$ are disjoint if $J(X, Y) = \{\mu \times \nu\}$, i.e., the only joining is product measure.

Exercise 6.1 Show that if $\alpha$ and $\beta$ are two irrational numbers, and further $\alpha/\beta$ is irrational, then rotation by $\alpha$ and by $\beta$ on $\mathbb{R}/\mathbb{Z}$ are disjoint (hint: show $R_\alpha \times R_\beta$ is minimal).
We have already seen that certain mixing properties are characterized by the non-existence of certain factors. Thus a map is ergodic if it has no invariant factor, weakly mixing if it has no isometric factor, and K if it has no zero entropy factor. We will now see that something stronger is true: in each case the mixing property implies a disjointness property. All three of these arguments rest on a seemingly trivial observation. If a sequence $f_i \in L^2(\mu)$ is convergent, then if you embed $L^2(\mu)$ in some larger $L^2$-space, the $f_i$ will still converge.

Theorem 6.9 A system $X$ is ergodic iff it is disjoint from any identity map on a Lebesgue space.
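Proof The if direction is clear. Suppose $T$ is ergodic, and $Y = (Y, \mathcal{G}, \nu, S)$ is an identity map. For any $f \in L^2(\mu)$, the ergodic averages $A_n(f)$ converge in $L^2(\mu)$ to the constant $\int f\,d\mu$. For $\hat\mu \in J(X, Y)$, the lift $(f \circ \pi_1) \in L^2(\hat\mu)$, and $A_n(f \circ \pi_1)$ still converges in $L^2(\hat\mu)$ to $\int f\,d\mu$. Thus for any $g \in L^2(\nu)$,
$$\int (f \circ \pi_1) \times (g \circ \pi_2)\,d\hat\mu = \int A_n(f \circ \pi_1 \times g \circ \pi_2)\,d\hat\mu = \int A_n(f \circ \pi_1) \times g \circ \pi_2\,d\hat\mu \quad (\text{as } S = \mathrm{id}),$$
and letting $n \to \infty$ this equals
$$\int f\,d\mu \int g\,d\nu.$$
But this says $\hat\mu = \mu \times \nu$. •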
Theorem 6.10 A system $X$ is weakly mixing iff it is disjoint from all isometries of compact metric spaces.

Proof The if direction is clear. Suppose $X$ is weakly mixing and $Y$ an isometry of a compact metric space. Following similar lines as the last argument, let $f \in L^2(\mu)$, $g \in L^2(\nu)$ be characteristic functions. As $S$ is isometric we know (Exercise 4.1) for any $\varepsilon > 0$, there is a sequence $\{n_k\} \subseteq \mathbb{Z}^+$ of positive density with
$$\|g(S^{n_k}(y)) - g(y)\|_2 < \varepsilon.$$
As $T$ is weakly mixing, by Lemma 4.4, there is a sequence $\{m_k\} \subseteq \mathbb{Z}^+$ of full density with $f \circ T^{m_k}$ converging in $L^2(\mu)$ to the constant function $\int f\,d\mu$. Intersecting these sequences gives an infinite sequence $\{t_k\}$. For $\hat\mu \in J(X, Y)$ we compute
$$\Bigl|\int f \circ \pi_1 \times g \circ \pi_2\,d\hat\mu - \int f\,d\mu \int g\,d\nu\Bigr| = \lim_{k \to \infty} \Bigl|\int f \circ T^{t_k} \circ \pi_1 \times g \circ S^{t_k} \circ \pi_2\,d\hat\mu - \int f\,d\mu \int g\,d\nu\Bigr|$$
$$\le \lim_{k \to \infty} \Bigl|\int f \circ T^{t_k} \circ \pi_1 \times g \circ \pi_2\,d\hat\mu - \int f\,d\mu \int g\,d\nu\Bigr| + \varepsilon\|f\|_2 = \varepsilon\|f\|_2,$$
as $(f \circ T^{t_k} \circ \pi_1) \to \int f\,d\mu$ in $L^2(\hat\mu)$. Thus $\hat\mu = \mu \times \nu$. •

Theorem 6.11 An ergodic $X$ has the K-property iff it is disjoint from all ergodic $Y$ of zero entropy.
(Note: the ergodicity of $Y$ can be removed.)

Proof The if direction, as always, is already clear. For the other, assume $X$ is K, $Y$ is zero entropy and ergodic, and $\hat\mu \in J(X, Y)$. By Theorem 6.3 we can assume $\hat\mu$ is ergodic. Let $P$ be a finite partition of $X$ and $Q$ of $Y$. We lift $\bar P = P \times Y$ and $\bar Q = X \times Q$. From Corollary 5.28, with $\hat\mu$ as measure,
$$D_{\hat\mu}(\bar P\,|\,\mathcal{P}_{\bar P} \vee \mathcal{T}_{\bar Q}) = D_\mu(P|\mathcal{P}_P) \circ \pi_1.$$
Consider this same identity with $(T \times S)$ replaced by $(T \times S)^{2^k}$ and write it
$$D_{\hat\mu}(\bar P\,|\,\mathcal{P}^k_{\bar P} \vee \mathcal{T}^k_{\bar Q}) = D_\mu(P|\mathcal{P}^k_P) \circ \pi_1. \tag{6.5}$$
By Lemma 5.6, $h(S^{2^k}) = 0$, so
$$\mathcal{T}^k_{\bar Q} = \mathcal{T}_{\bar Q} = X \times \bigvee_{i=-\infty}^{\infty} S^{-i}(Q).$$
Definition 4.4 of the K-property says $D_\mu(P|\mathcal{P}^k_P) \to D(P) = (\mu(p_1), \ldots, \mu(p_s))$ as $k \to \infty$. Both sides of (6.5) are vector-valued reverse martingales as $\mathcal{P}^{k+1}_P \subset \mathcal{P}^k_P$. Thus by Theorem 5.30,
$$D_{\hat\mu}\Bigl(\bar P \Bigm| \bigcap_k (\mathcal{P}^k_{\bar P} \vee \mathcal{T}_{\bar Q})\Bigr) = D(P).$$
Thus
$$\int I\Bigl(\bar P \Bigm| \bigcap_k (\mathcal{P}^k_{\bar P} \vee \mathcal{T}_{\bar Q})\Bigr)\,d\hat\mu = \int I(P)\,d\mu.$$
But $\int I(\bar P|\bar Q)\,d\hat\mu$ is squeezed between these two by Corollary 5.26. Thus it also is $\int I(P)\,d\mu$. But then $\bar P \perp \bar Q$ relative to $\hat\mu$. •

We have already pointed out the common thread in the 'only if' direction of these three theorems. In the case of the K-property we were not being quite honest, as the argument was entropy and not $L^2$-based. Notice that the 'if' directions also are completely parallel. When disjointness from identities, isometries or zero entropy fails it is because of the existence of a factor of this type. We will see in the next section that this is not a totally general phenomenon. Systems can fail to be disjoint without possessing common factors.

Each of these results concerns two collections of transformations, characterizing one as the systems disjoint from the other. One can ask if the other member of the pair (identities, isometries, zero entropy) is characterized similarly. For identity maps this is false. There are maps which are disjoint from all ergodic maps but are not the identity. A nice example is $T(x, y) = (x, x + y)$ on the two-dimensional torus. For isometries it is also false. Glasner and Weiss (1990) have constructed systems which are not isometries of compact metric spaces but are disjoint from all weakly mixing systems. For zero entropy though it is true. As part of our work in Chapter 7 (Theorem 7.24) we will see that any ergodic positive entropy system has a Bernoulli shift factor, hence is not disjoint from the K-systems.
6.4 Minimal self-joinings

When one considers two distinct systems it is possible, as we have seen, for $J(X, Y)$ to contain only product measure. When the two systems are the same, $J(X, X)$ must contain more. Elements of $J(X, X)$ we refer to as self-joinings. There are automatically certain automorphisms of this system, the powers of $T$. The measures supported on their graphs are self-joinings. Thus $\delta_j$ supported on the graph $\{(x, T^j(x))\}$ is in $J(X, X)$. We call $\delta_j$ an off-diagonal as it generalizes diagonal measure $\delta_0$.

Definition 6.4 We say $X$ has two-fold minimal self-joinings if $J(X, X)$ is the convex hull of $\{\mu \times \mu, \delta_j\}_{j=-\infty}^{\infty}$.

We call this two-fold minimal self-joinings because we could always consider joining any finite number of copies of $X$ together. In this case there are again certain joinings that must exist. Define an off-diagonal joining to be one supported on a graph $\{(x, T^{j_1}(x), \ldots, T^{j_k}(x))\}$ in a $(k + 1)$-fold joining. These must be in $J(X, \ldots, X)$. More generally, one could partition the $k$ copies into subsets. On each subset place an off-diagonal measure, then take the direct product of these off-diagonals.

Definition 6.5 We say $X$ has $k$-fold minimal self-joinings if all $k$-fold self-joinings of $X$ are in the convex hull of such products of off-diagonals.

In these definitions we are not assuming product measure is necessarily ergodic. We will see though that if a system has minimal self-joinings and is not weakly mixing, then it must be a finite rotation. We will use this notion as a constructive tool to exhibit control of certain aspects of a dynamical system not directly reachable with our current methods. Our next theorem sets the tone for these ideas.

Theorem 6.12 If $X$ is totally ergodic and has two-fold minimal self-joinings, then it commutes only with its powers and has no non-trivial factor algebras.

Proof Suppose $T \circ S = S \circ T$, and of course $S$ is measure-preserving. First assume $S$ is invertible. Construct the joining $\hat\mu$ supported on the graph of $S$ (Example 2). Now $\hat\mu$-a.s. the two coordinate algebras are equal, and $\hat\mu$ is ergodic for $T \times T$. Thus $\hat\mu$ is either $\mu \times \mu$ or some $\delta_j$. It cannot be $\mu \times \mu$, as this does not identify the coordinate algebras. Thus $\hat\mu = \delta_j$ for some $j$. But then
$$\hat\mu(\{(x, S(x))\} \cap \{(x, T^j(x))\}) = 1$$
and so $S(x) = T^j(x)$ $\mu$-a.s.

If $S$ is not invertible, then $S^{-1}(\mathcal{F})$ does not separate points and is a nontrivial factor algebra. All that remains, then, is to show there are no such. Let $\varphi : X \to Z$ be a factor map. Let $\hat\mu$ be the relatively independent self-joining of $X$ over $Z$. Now $\hat\mu$ may not be ergodic, but it must be of the form
$$\hat\mu = a(\mu \times \mu) + (1 - a) \sum_{j=-\infty}^{\infty} a_j\delta_j,$$
where $0 \le a \le 1$, and $a_j \ge 0$, $\sum a_j = 1$. Suppose $Z$ is non-trivial, i.e., there is a set $A \in \varphi^{-1}(\mathcal{H})$ with $\mu(A) \ne 0, 1$. By Lemma 6.5,
$$\mu(A) = \hat\mu(A \times A) = a\mu(A)^2 + (1 - a) \sum_{j=-\infty}^{\infty} a_j\,\mu(A \cap T^{-j}(A)).$$
Of the numbers $\mu(A)^2$ and $\mu(A \cap T^{-j}(A))$ only one is as large as $\mu(A)$, and that is $\mu(A \cap T^{-0}(A))$. All the others are strictly smaller, as $T$ is totally ergodic. But then we must have $a = 0$ and $a_j = 0$ unless $j = 0$. We conclude $\hat\mu = \delta_0$, and
$$\hat\mu(A \times X) = \hat\mu(A \times A) = \hat\mu(X \times A)$$
for all $A \in \mathcal{F}$ and so again by Lemma 6.5, $\varphi^{-1}(\mathcal{H}) = \mathcal{F}$ and $\varphi$ is an isomorphism. •
Corollary 6.13 (Of the proof). If $X$ has two-fold minimal self-joinings but is not totally ergodic, then $X$ is finite.

Proof As $X$ is not totally ergodic, it has a factor algebra that is a finite point space (Exercise 3.3). Constructing $\hat\mu$, the relatively independent self-joining over this factor, the above proof still says $a = 0$ as $\mu(A)^2 < \mu(A)$. But now, for any point $z \in Z$, by Corollary 6.7, the fibre measure of $\hat\mu$ over $z$ is $\mu_{1,z} \times \mu_{2,z}$. Hence the fibre measure over the entire first coordinate algebra is $\mu_{2,z}$. But this measure is atomic, equal to $\sum_{j=-\infty}^{\infty} a_j\delta(T^j(x))$ over $x$. Thus $Z$ is atomic, and the fibre measures $\mu_z$ over $Z$ are atomic. Thus $X$ itself is atomic. •

In fact, any cyclic permutation on a finite set of points has minimal self-joinings. If such were all of them this idea would not be very useful.
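The remark about cyclic permutations can be checked by hand or by machine. On a finite set, a measure invariant under a permutation gives equal mass to the points of each cycle, so every $T \times T$-invariant probability measure is a convex combination of the uniform measures on the orbits of $T \times T$. The Python sketch below takes $T$ to be rotation by 1 on $\mathbb{Z}/n$ (an assumption of the example, as is the function name) and lists these orbits, which turn out to be exactly the $n$ graphs $\{(x, x + j)\}$ supporting the off-diagonals $\delta_j$.

```python
def orbits_of_product_rotation(n):
    """Orbits of T x T on Z/n x Z/n, where T(x) = x + 1 mod n."""
    seen, orbits = set(), []
    for x in range(n):
        for y in range(n):
            if (x, y) in seen:
                continue
            orbit, point = [], (x, y)
            # follow the orbit until it closes up
            while point not in seen:
                seen.add(point)
                orbit.append(point)
                point = ((point[0] + 1) % n, (point[1] + 1) % n)
            orbits.append(orbit)
    return orbits

def offset(orbit, n):
    """The set of values (y - x) mod n seen along an orbit."""
    return {(y - x) % n for (x, y) in orbit}

orbits = orbits_of_product_rotation(6)
offsets = [offset(o, 6) for o in orbits]
```

Each orbit carries a single offset $j$ and projects onto all of $\mathbb{Z}/n$ in both coordinates, so its uniform measure is $\delta_j$; product measure is the average of the $n$ off-diagonals.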
6.5 Chacón's map once more

We will show here that Chacón's map has minimal self-joinings of all orders. This argument is due to del Junco, Rahe and Swanson (1980). As always with Chacón's map, it is the spacer placed between the second and third blocks in the cutting and stacking that brings off the argument. By the time we are finished we will have come to a very precise understanding of the name structure of the transformation. The core of the proof is to show that if a pair of points $(x, y)$ gives the expected limit values for the ergodic theorem applied to some self-joining $\hat\mu$, but lie on distinct orbits of $T$, then in fact $\hat\mu = \mu \times \mu$. What we have to show is that for such a pair $(x, y)$ the Cesàro averages are those of product measure. Our first technical lemmas tell us how to recognize product measure.

Lemma 6.14 If $X$ and $Y$ are ergodic and $\hat\mu \in J(X, Y)$ is $(\mathrm{id} \times S)$-invariant, then $\hat\mu = \mu \times \nu$.

Proof As $S$ is ergodic, by Theorem 6.9 (viewing $\hat\mu$ as a joining of $S$ with the identity map on $X$), $\hat\mu = \mu \times \nu$. •

Our next lemma tells us how we see $(\mathrm{id} \times S)$-invariance in a $\hat\mu \in J(X, Y)$. It is far easier to prove than express.

Lemma 6.15 Suppose $X$ and $Y$ are ergodic, $P$ is a finite generating partition for $X$, and $Q$ for $Y$. Let
$$\mathcal{A} = \bigcup_{n} \bigvee_{i=-n}^{n} (T \times S)^{-i}(P \times Q),$$
a countable generating algebra of cylinder sets. Suppose $\hat\mu \in J(X, Y)$ is ergodic and $(x, y) \in X \times Y$ is a point satisfying
$$\frac{1}{n} \sum_{i=0}^{n-1} \chi_A(T^i(x), S^i(y)) \to \hat\mu(A) \tag{6.6}$$
and
$$\frac{1}{n} \sum_{i=0}^{n-1} \chi_A(T^{-i}(x), S^{-i}(y)) \to \hat\mu(A) \tag{6.7}$$
for all $A \in \mathcal{A}$. Suppose there are intervals $(i_k, j_k) \subseteq \mathbb{Z}$, $i_k \le 0 \le j_k$, $(j_k - i_k) \to \infty$, an $\alpha > 0$ and subintervals $I_k$, $I_k + t_k \subseteq (i_k, j_k)$ with $\#(I_k) \ge \alpha(j_k - i_k + 1)$. Finally, for $i \in I_k$,
(1) $P(T^i(x)) = P(T^{i+t_k}(x))$; and
(2) $Q(S^i(y)) = Q(S^{i+t_k+1}(y))$. (6.8)
We conclude $\hat\mu$ is $\mathrm{id} \times S$-invariant and hence $\hat\mu = \mu \times \nu$.

Proof Let $(C \times C') \in \bigvee_{i=-n_0}^{n_0} (T \times S)^{-i}(P \times Q)$ be some cylinder set. We want to show $\hat\mu(C \times S(C')) = \hat\mu(C \times C')$. This of course completes the result. We have convergence of the Birkhoff theorem in both directions for $T \times S$ on cylinder sets. The intervals $I_k$ and $I_k + t_k$ occupy a fraction $\alpha > 0$ of $(i_k, j_k)$. It is a simple argument that
$$\lim_{k \to \infty} \frac{1}{\#(I_k)} \sum_{i \in I_k} \chi_C\,\chi_{C'}(T^i(x), S^i(y)) = \hat\mu(C \times C') \tag{6.9}$$
and
$$\lim_{k \to \infty} \frac{1}{\#(I_k)} \sum_{i \in I_k + t_k} \chi_C\,\chi_{S(C')}(T^i(x), S^i(y)) = \hat\mu(C \times S(C')). \tag{6.10}$$
(Break $(i_k, j_k)$ into pieces and consider those on which this convergence must occur. This forces convergence on any differences of such pieces that occupy at least a fraction $\alpha > 0$ of $(i_k, j_k)$.) But now for $i$ more than $2n_0 + 1$ positions from the ends of $I_k$, (6.8) tells us $(T \times S)^i(x, y) \in C \times C'$ exactly when $(T \times S)^{i+t_k}(x, y) \in C \times S(C')$. But then by (6.9) and (6.10)
$$\Bigl|\frac{1}{\#I_k}\Bigl(\sum_{i \in I_k} \chi_{C \times C'}(T^i(x), S^i(y)) - \sum_{i \in I_k + t_k} \chi_{C \times S(C')}(T^i(x), S^i(y))\Bigr)\Bigr| \le \frac{4n_0 + 2}{\#I_k}.$$
Letting $k \to \infty$ we are done. •
What we need to do now is to show that we can find the structure of this lemma in Chacón's map. We first exhibit a generating partition.

Lemma 6.16 Let $X$ be Chacón's map. The two-set partition $P$ of $X$ into the set $P_1$ of points in the zero block, and its complementary set $P_2$ of points added in spacers, generates.
Proof We argue inductively that the $T,P$-name of a point determines its level in a tower. This is equivalent to saying the $T,P$-name can be broken in only one way into $n$-blocks, each of which is the name of a passage up through the $n$-tower. This is true for 0-blocks, as those are the individual occurrences in the name of $P_1$. Suppose we can recognize passages through the $n$-tower uniquely in the $T,P$-name of $x$. These blocks are disjoint and separated in the name by at most one occurrence of $P_2$. Such single $P_2$'s indicate a point on the orbit of $x$ added as a spacer after stage $n$. Whenever we see an $n$-block with such a spacer both above and below it, this $n$-block must be the third (top) $n$-block in an $(n + 1)$-block. We must see such an $n$-block at least one out of every nine. Once we see one such the entire name breaks uniquely into triples of $n$-blocks forming $(n + 1)$-blocks. •

This argument has begun an investigation of $T,P$-names in $X$, telling us such a name breaks up into a hierarchy of $n$-blocks, the $n$-blocks grouping into triples to form the $(n + 1)$-blocks. Let $N(n)$ be the number of levels in the $n$th stack, hence the length of a passage through an $n$-block. List these intervals as $I(1, n), \ldots, I(N(n), n)$ from bottom to top. Thus $T(I(i, n)) = I(i + 1, n)$ for $i < N(n)$. Let $h_n(x)$ be the index with $x \in I(h_n(x) + 1, n)$. It is undefined if $x$ is not in the $n$th stack. Set $k_n(x)$ to be 1, 2 or 3 depending on whether $x$ is in the first, second, or third occurrence of an $(n - 1)$-stack in the $n$-stack. Again $k_n$ is undefined if $x$ is not in the $(n - 1)$-stack. In terms of the $T,P$-name of $x$, the origin is at a position $h_n(x)$ to the right of the beginning of its $n$-block. When $(n - 1)$-blocks get grouped to form $n$-blocks, $k_n(x)$ tells us which of the three contain the origin.

Lemma 6.17 For $\mu$-a.e. $x$, the set of $n \in \mathbb{N}$ with $k_n(x) = 1$, 2 or 3 each has density 1/3.

Proof For a.e. $x$, $k_{n_0}(x)$ is defined once $n_0$ is large enough. Let $\tau_{n_0} = \bigcup_{i=1}^{N(n_0)} I(i, n_0)$ be the $n_0$-tower. It is a simple induction that the functions $k_{n_0+1}, k_{n_0+2}, \ldots$ restricted to $\tau_{n_0}$ are independent and take on values 1, 2, 3 equally likely. By the law of large numbers (just the ergodic theorem applied to the Bernoulli shift (1/3, 1/3, 1/3)), for $\mu$-a.e. $x \in \tau_{n_0}$, $k_n(x) = 1, 2, 3$, each with density 1/3 in $\mathbb{N}$. •
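The block hierarchy behind Lemmas 6.16 and 6.17 is easy to generate explicitly. The Python sketch below is an illustration under assumed conventions that are not the text's: names are written as strings with '0' for $P_1$ (the zero block) and '1' for $P_2$ (a spacer), positions inside a block are counted from 0, and the function names are hypothetical. It builds the $n$-block as two copies of the $(n - 1)$-block, a spacer, and a third copy, and reads off the values $k_1, \ldots, k_n$ at a given position.

```python
def block_length(n):
    """N(n): length of the n-block, N(0) = 1 and N(n) = 3 N(n-1) + 1."""
    length = 1
    for _ in range(n):
        length = 3 * length + 1
    return length

def chacon_block(n):
    """The n-block as a string: '0' = zero block, '1' = spacer."""
    block = "0"
    for _ in range(n):
        # two copies, then the spacer, then the third copy
        block = block + block + "1" + block
    return block

def k_digits(pos, n):
    """[k_1, ..., k_n] for the 0-indexed position pos in the n-block.

    k_m is None when it is undefined, i.e. when the point is a spacer
    added at stage m or later.
    """
    digits = []
    for m in range(n, 0, -1):
        size = block_length(m - 1)
        if pos is None:
            digits.append(None)
        elif pos < size:
            digits.append(1)
        elif pos < 2 * size:
            digits.append(2)
            pos -= size
        elif pos == 2 * size:
            digits.append(None)   # the spacer of the m-block
            pos = None
        else:
            digits.append(3)
            pos -= 2 * size + 1
    return digits[::-1]

four_block = chacon_block(4)
names_of_zero_block_points = [tuple(k_digits(p, 4))
                              for p, c in enumerate(four_block) if c == "0"]
```

Among the positions of the 4-block that lie in the zero block, each of the $3^4$ possible values of $(k_1, \ldots, k_4)$ occurs exactly once, which is the finite form of the independence and equidistribution used in the proof of Lemma 6.17.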
The sequence of functions $k_n(x)$ 'drive' the construction of $T$. If we know their values we can construct the $T,P$-name of $x$: the first index $n_0$ for which $k_{n_0}(x)$ is defined tells us $x$ is in the spacer of the $n_0$-block, i.e., $h_{n_0}(x) = 2N(n_0 - 1) + 1$. Each successive value $k_n(x)$ tells us how to extend the name across the $n$-block. If these extensions covered all of $\mathbb{Z}$ we would get the full name this way. As $k_n(x) = 2$ infinitely often, with probability one we do cover all of $\mathbb{Z}$. In fact the only points where this procedure does not generate the full name is when either $k_n(x)$ is asymptotically always 1 or always 3. In this case we only get a half name. These two half names can be combined to form a full name, two in fact, one with a spacer between the two half names.

Lemma 6.18 For $\mu$-a.e. $x$, a point $y = T^j(x)$ iff $k_n(x) = k_n(y)$ for all large enough $n$.

Proof Suppose $y = T^j(x)$. Then once $h_n(x)$ and $N(n) - h_n(x)$ are at least $|j|$, $k_{n+1}(x) = k_{n+1}(y)$. On the other hand, if $k_n(x) = k_n(y)$ for all $n \ge n_0$, then for some $|j| < N(n_0)$, $y$ and $T^j(x)$ have identical $T,P$-names. As $P$ generates, $y = T^j(x)$. •

Theorem 6.19 (del Junco-Rahe-Swanson) Chacón's transformation has two-fold minimal self-joinings.
Proof Suppose $\hat\mu$ is an ergodic self-joining of $X$. Assuming $\hat\mu \ne \delta_j$ for any $j$, $\hat\mu$ gives measure 0 to the union of the graphs of all powers of $T$. Thus for $\hat\mu$-a.e. $(x, y)$, $k_n(x) \ne k_n(y)$ for infinitely many values $n$. Select $(x, y)$ so that the Birkhoff theorem is satisfied for both $(T \times T)$ and $(T^{-1} \times T^{-1})$ with respect to $\hat\mu$ for all sets in
$$\mathcal{A} = \bigcup_{n} \bigvee_{i=-n}^{n} (T \times T)^{-i}(P \times P).$$
We are going to establish the structure of Lemma 6.15 on the $(T \times T)$, $P \times P$-name of $(x, y)$. Let $n_1, n_2, \ldots$ be those values with $k_{n_s}(x) \ne k_{n_s}(y)$ and both are defined. We know $\hat\mu$-a.s. both $k_n(x)$ and $k_n(y)$ are infinitely often 2, although these may never be at a value $n_s$. But one of the two following cases must occur:
(1) for infinitely many values $s$, either $k_{n_s-1}(x) = 2$ or $k_{n_s-1}(y) = 2$; or
(2) for infinitely many $s$, $k_{n_s-1}(x) = k_{n_s-1}(y)$. (6.11)
The only way (2) can fail is if $k_n(x) \ne k_n(y)$ for all sufficiently large $n$. In this case $k_{n_s-1}(x) = 2$ infinitely often. In either case, in the $T,P$-name of $x$ and $y$, the indices of the $(n_s - 1)$-blocks containing $x$ and $y$ respectively intersect (overlap) in at least $N(n_s - 2) = (N(n_s - 1) - 1)/3$ places.

Consider the partition of the $T,P$-names of $x$ and $y$ into $(n_s - 1)$-blocks with possible spacers in between. We want to list the various possibilities we might see near the origin depending on the values $k_{n_s}(x)$ and $k_{n_s}(y)$. Table 6.1 lists them. A $B_i$ indicates an $(n_s - 1)$-block in the $T,P$-name of $x$, $B_i'$ in that of $y$. A $*$ indicates a spacer, $(*)$ a possible spacer. The origin lies in $B_0$ and $B_0'$.

[Table 6.1 The six possible arrangements, near the origin, of the blocks $B_i$ in the $T,P$-name of $x$ against the blocks $B_i'$ in the $T,P$-name of $y$.]

In all six cases we have identified a value $|t| \le 1$ where one sees either
tl!!*!1!.!J1 B'B' ,or ,
(2)
'+1
I-Bf.~~~I.
lB,
B'+1
r
(6.12)
The overlaps $(B_t \cap B'_t)$ and $(B_{t+1} \cap B'_{t+1})$ are each at least $(N(n_s - 1) - 1)/3 - 2$ and, as $|t + 1| \le 2$, are contained in an interval
$$(i_s, j_s) = (-2N(n_s - 1) - 2,\ 2N(n_s - 1) + 2).$$
In (1), when the spacer is between $B_t$ and $B_{t+1}$, set $I_s = B_t \cap B'_t$ and $I'_s = B_{t+1} \cap (B'_{t+1} + 1)$. In (2), set $I_s = B_{t+1} \cap B'_{t+1}$ and $I'_s = B_t \cap (B'_t + 1)$. In either case
$$\#(I_s) = \#(I'_s) \ge \frac{1}{18}\,\frac{N(n_s - 1) - 7}{N(n_s - 1) + 4/3}\,(j_s - i_s - 1) \ge \frac{1}{20}(j_s - i_s + 1),$$
once $s$ is large enough. The $T,P$-name across all $(n_s - 1)$-blocks is the same, so condition (6.8) of Lemma 6.15 is established and $\hat\mu = \mu \times \mu$. •

Theorem 6.20 (del Junco-Rahe-Swanson) Chacon's transformation has minimal self-joinings of all orders.

Proof We know the result for $k = 2$. Assume it for some value $k \ge 2$. An off-diagonal joining of any number of copies of $X$ is isomorphic to a single copy. Thus any $(k + 1)$-fold joining which, when restricted to some pair of copies, is an off-diagonal is in fact a joining of just $k$ copies. By induction, then, we can assume that on any subset of $k$ of the copies of $X$ we have product measure. What we will do is show that $\hat\mu$ is either $(\mathrm{id})^{(k-1)} \times T \times \mathrm{id}$- or $(\mathrm{id})^{(k-1)} \times (T \times T)$-invariant. In either case, Lemma 6.14 completes the result. In the first $k$ coordinates consider the set $S_n$ of points $(x_1, x_2, \dots, x_k)$ for which
$$h_n(x_i) \le \frac{N(n)}{10} \quad \text{for } i = 1, \dots, k,$$
$k_{n+1}(x_i) = 1$ for $i = 1, \dots, k - 1$, but $k_{n+1}(x_k) = 2$. As $\hat\mu$ is product measure on these $k$ coordinates,
$$\hat\mu(S_n) \ge \left[\left(\frac{1}{10} - \frac{1}{N(n)}\right)\frac{1}{3}\right]^k \ge \left(\frac{1}{40}\right)^k \qquad (6.13)$$
once $n \ge 3$.

Table 6.2 The $n$-blocks near the origin for a point of $S_n$.
$x_1$: $B^1_0\, B^1_1 * B^1_2$
...
$x_{k-1}$: $B^{k-1}_0\, B^{k-1}_1 * B^{k-1}_2$
$x_k$: $B^k_0 * B^k_1$
Thus $\hat\mu(\limsup S_n) > 0$. As $\hat\mu(S_n \,\triangle\, (T \times \cdots \times T)S_n) \le k/N(n) < k/3^n$, $\limsup(S_n)$ is $(T \times \cdots \times T)$-invariant $\hat\mu$-a.s. As we assume $\hat\mu$ an ergodic joining, $\hat\mu$-a.s., $(x_1, \dots, x_k, x_{k+1}) \in S_n$ for infinitely many $n$. Suppose $(x_1, \dots, x_k, x_{k+1}) \in S_n$ and consider the $n$-blocks near the origin (Table 6.2). We know $\bigcap_{i=1}^{k} B^i_0$ and $\bigcap_{i=1}^{k} B^i_1$ both have length not less than $9N(n)/10 - 1$. What do we see in the $T,P$-name of $x_{k+1}$ across this section? We see two blocks $B^{k+1}_0 B^{k+1}_1$ or $B^{k+1}_0 * B^{k+1}_1$ where $\bigcap_{i=1}^{k+1} B^i_0$ and $\bigcap_{i=1}^{k+1} B^i_1$ both have length at least $\tfrac{1}{2}(9N(n)/10 - 1)$. Here $B^{k+1}_0$ is chosen to be the block whose intersection with $\bigcap_{i=1}^{k} B^i_0$ is largest, not necessarily the one containing the origin. If for infinitely many $n$ we see $B^{k+1}_0 B^{k+1}_1$ then we will conclude, as in Theorem 6.19, that $\hat\mu$ is $(\mathrm{id})^{(k-1)} \times T \times \mathrm{id}$-invariant. If $B^{k+1}_0 * B^{k+1}_1$ occurs infinitely often, $\hat\mu$ is $(\mathrm{id})^{(k-1)} \times (T \times T)$-invariant. In either case, induction forces $\hat\mu$ to be product measure $\mu^{k+1}$. •

The proofs of Theorems 6.19 and 6.20 followed from a rather simple observation about the name structure of $(T, P)$. The map $T^{-1}$ has this same structure, except the spacer is always placed between the first and second, rather than second and third blocks. In fact, at each stage of the construction one could make an independent decision as to which one of these two choices we make. Explicitly, let $e = \{e_1, e_2, \dots\}$ be an infinite sequence of 0's and 1's. For each such $e$ we build a transformation $T_e$ analogous to Chacon's. The spacer at stage $n$ of the construction is placed between the first and second $(n - 1)$-blocks if $e_n = 0$, and between the second and third if $e_n = 1$. Chacon's map corresponds to $e = \{1\}$ (all 1's) and its inverse to $e = \{0\}$ (all 0's). In fact, interchanging 1's and 0's in $e$ always takes $T_e$ to $T_e^{-1}$.

Exercise 6.2
1. Show that all $T_e$ have minimal self-joinings of all orders.
2. Show that if $e_n = e'_n$ for all $n \ge N_0$, then $T_e \cong T_{e'}$.
3. Show that if $e$ and $e'$ differ in infinitely many places, then $T_e$ and $T_{e'}$ are disjoint. Hint: show any ergodic joining is $\mathrm{id} \times T_{e'}$-invariant. Thus Chacon's map is not isomorphic to its inverse.

Exercise 6.3 Show that a map with two-fold minimal self-joinings must have zero entropy.

Exercise 6.4 Show that any non-atomic system with two-fold minimal self-joinings must be weakly mixing.

Thus all the maps $T_e$ are entropy zero, weakly mixing, and if not isomorphic, then disjoint. Notice how much more precise the information gained from the explicit name structure of a system is than the more global mixing or entropy data.
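The block structure of the maps $T_e$ can be made concrete with a short sketch. The code below builds the $n$-block name as a string, under assumptions that are ours and not the text's: the 0-block is a single symbol `b`, spacers are written `s`, `e[m - 1]` stores the choice $e_m$, and $N(0) = 1$. The function names `block_name` and `block_length` are hypothetical.

```python
def block_name(e, n):
    """Return the n-block of T_e as a string.

    Illustrative assumptions: the 0-block is the single symbol 'b',
    a spacer is written 's', and e[m - 1] holds the choice e_m.
    """
    block = "b"
    for stage in range(1, n + 1):
        if e[stage - 1] == 1:
            # e_m = 1: spacer between the second and third (m-1)-blocks
            block = block + block + "s" + block
        else:
            # e_m = 0: spacer between the first and second (m-1)-blocks
            block = block + "s" + block + block
    return block


def block_length(n):
    """N(n) from the recursion N(n) = 3 N(n-1) + 1, assuming N(0) = 1."""
    length = 1
    for _ in range(n):
        length = 3 * length + 1
    return length


chacon_block = block_name([1, 1, 1], 3)   # all e_m = 1 (Chacon's map)
flipped_block = block_name([0, 0, 0], 3)  # all e_m = 0
```

In this model, interchanging 0's and 1's in $e$ reverses every block, which is the finite shadow of the statement that the interchange takes $T_e$ to $T_e^{-1}$.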
The disjointness observed in Exercise 6.2 parts 2 and 3 is a general fact. Among systems with two-fold minimal self-joinings, two are either disjoint or isomorphic. This result is just the beginning of a much deeper analysis of minimal self-joinings on which we will not embark. We refer the interested reader to bibliographic items del Junco and Keane (1985); del Junco and Rudolph (1987); and Rudolph (1979).
6.6 Constructions
Systems with minimal self-joinings have no factors and commute only with their powers. This is perhaps too simple a situation. From it, though, we can build up examples whose behavior is controlled, but is not quite this trivial. Start with $X$, a non-atomic system with minimal self-joinings of all orders. Let $X^k$ be its $k$-fold direct product. On this we can define a group of transformations as follows. For $i \in \mathbb{Z}$ and $\pi$ a permutation of $\{1, \dots, k\}$, set
$$U_{(i,\pi)}(x_1, \dots, x_k) = (T^i(x_{\pi^{-1}(1)}), \dots, T^i(x_{\pi^{-1}(k)})). \qquad (6.14)$$
Thus $U_{(i,\pi)}$ acts by permuting the $k$ coordinates by $\pi$ and acting by $T^i$ on each. Notice
$$U_{(i,\pi)} \circ U_{(j,\pi')} = U_{(i+j,\ \pi \circ \pi')}.$$
We write $X_{(i,\pi)}$ to indicate $U_{(i,\pi)}$ acting on the $k$-fold direct product.

Lemma 6.21 For any $i \ne 0$ and $\pi, \pi' \in S(k)$ a joining of $X_{(i,\pi)}$ and $X_{(i,\pi')}$ is in fact a self-joining of $X_{(1,\mathrm{id})}$, hence a $2k$-fold self-joining of $X$.

Proof Suppose $\hat\mu \in J(X_{(i,\pi)}, X_{(i,\pi')})$. As $U_{(i,\pi)}^{k!} = U_{(i,\pi')}^{k!} = U_{(j,\mathrm{id})}$, $j = k!\, i \ne 0$, $\hat\mu$ is a self-joining of $U_{(j,\mathrm{id})}$. As we can decompose $\hat\mu$ into ergodic components for $U_{(j,\mathrm{id})} \times U_{(j,\mathrm{id})}$, we assume the action is ergodic. Setting
$$\tilde\mu = \frac{1}{j} \sum_{l=0}^{j-1} \hat\mu \circ (U_{(l,\mathrm{id})} \times U_{(l,\mathrm{id})}), \qquad (6.15)$$
$\tilde\mu$ is an ergodic self-joining of $X_{(1,\mathrm{id})}$, hence a $2k$-fold self-joining of $X$. Thus $\tilde\mu$ must be a product of off-diagonal measures. As $X$ is weakly mixing, $U_{(j,\mathrm{id})} \times U_{(j,\mathrm{id})}$ acts ergodically on such a $\tilde\mu$. But (6.15) writes $\tilde\mu$ as an average of $U_{(j,\mathrm{id})} \times U_{(j,\mathrm{id})}$-invariant measures. Thus they must all be equal to $\tilde\mu$, i.e., $\hat\mu = \tilde\mu$. The ergodic self-joinings of $X_{(j,\mathrm{id})}$ agree with those of $X_{(1,\mathrm{id})}$. These are the extreme points of all self-joinings, so
$$J(X_{(1,\mathrm{id})}, X_{(1,\mathrm{id})}) = J(X_{(j,\mathrm{id})}, X_{(j,\mathrm{id})}).$$
But $J(X_{(i,\pi)}, X_{(i,\pi')})$ is squeezed between these. •
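The composition rule for the maps $U_{(i,\pi)}$ can be checked mechanically in a toy model. The sketch below is only an illustration under our own assumptions: $T$ is replaced by the shift $x \mapsto x + 1$ on the integers (which has none of the measure-theoretic properties of $X$), permutations are 0-indexed tuples, and coordinate $m$ of the image is taken from coordinate $\pi^{-1}(m)$, the convention that makes the rule $U_{(i,\pi)} \circ U_{(j,\pi')} = U_{(i+j,\pi\circ\pi')}$ hold. The names `U`, `compose` and `inverse` are hypothetical.

```python
def inverse(perm):
    """Inverse of a permutation given as a tuple: perm[m] is the image of m."""
    inv = [0] * len(perm)
    for m, image in enumerate(perm):
        inv[image] = m
    return tuple(inv)


def compose(p, q):
    """Composition (p o q)(m) = p(q(m))."""
    return tuple(p[q[m]] for m in range(len(p)))


def U(i, perm):
    """Toy U_(i, perm): permute coordinates by perm, then apply T^i.

    Assumption: T is the integer shift x -> x + 1, so T^i adds i.
    """
    inv = inverse(perm)

    def act(x):
        return tuple(x[inv[m]] + i for m in range(len(x)))

    return act


p = (1, 2, 0)          # a 3-cycle
q = (1, 0, 2)          # a transposition
x = (10, 20, 30)
lhs = U(2, p)(U(3, q)(x))
rhs = U(5, compose(p, q))(x)
```

Iterating `U(1, p)` six times (that is, $k! = 6$ times for $k = 3$) returns every coordinate to its place shifted by 6, matching $U_{(i,\pi)}^{k!} = U_{(k!\,i,\mathrm{id})}$ in this model.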
Example 1 We've already seen that a non-atomic system with two-fold minimal self-joinings could have no roots (a power cannot be a root). Here is an example of a system with more than one non-isomorphic square root. Let $k = 2$, $T_1 = U_{(1,\mathrm{id})} = T \times T$ and $T_2 = U_{(1,(1,2))}$. We use cycle form to represent $\pi$. Thus $T_1^2 = T_2^2 = T^2 \times T^2$. We want to show $T_1$ and $T_2$ cannot be isomorphic. Suppose $\hat\mu$ is a joining supported on the graph of an isomorphism. This, by Lemma 6.21, must be a four-fold self-joining of $X$. It is product measure on both the first two and second two coordinates, but identifies these two product algebras. Call them $\mathcal{F}_1 \times \mathcal{F}_2$ and $\mathcal{F}_3 \times \mathcal{F}_4$. As $\hat\mu$ is a product of off-diagonals, it must identify $\mathcal{F}_1$ with $\mathcal{F}_3$ or $\mathcal{F}_4$. Suppose $\mathcal{F}_3$. Then
$$\mathcal{F}_3 = \mathcal{F}_1 = (T_1 \times T_2)(\mathcal{F}_1) = (T_1 \times T_2)(\mathcal{F}_3) = \mathcal{F}_4,$$
$\hat\mu$-a.s. This is of course false. The same holds if $\mathcal{F}_1$ is identified with $\mathcal{F}_4$. We conclude no isomorphism exists.

Notice that even though $T_1$ and $T_2$ are not isomorphic, they do have a common factor, the factor of symmetric sets invariant under interchanging the two coordinates. This factor algebra has two point fibres in both systems. The relatively independent joining over this common factor is ergodic for $T_1 \times T_2$ but not for $T \times T \times T \times T$. It is obtained by averaging two four-fold joinings, one supported on points $\{(x_1, x_2, x_1, x_2)\}$, the other on $\{(x_1, x_2, x_2, x_1)\}$.

Also notice that $T^2 \times T^2$ has other square roots. For example
$$T_3 : (x_1, x_2) \to (x_2, T^2(x_1)).$$
This, though, is isomorphic to $T_2$ by the isomorphism $\varphi : (x_1, x_2) \to (T^{-1}(x_1), x_2)$. In fact, any square root of $T^2 \times T^2$ is either $T_1$ or isomorphic to $T_2$.
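The algebraic identities behind Example 1 (three square roots of $T^2 \times T^2$, and the conjugacy between $T_3$ and $T_2$) can be sanity-checked in a toy model. As an assumption made purely for illustration, $T$ is taken to be the integer shift $x \mapsto x + 1$; this verifies only the composition identities, not any statement about isomorphism of measure-preserving systems.

```python
def T1(pt):
    """T x T in the toy model where T(x) = x + 1."""
    x1, x2 = pt
    return (x1 + 1, x2 + 1)


def T2(pt):
    """Interchange the coordinates, then apply T to each."""
    x1, x2 = pt
    return (x2 + 1, x1 + 1)


def T3(pt):
    """(x1, x2) -> (x2, T^2(x1))."""
    x1, x2 = pt
    return (x2, x1 + 2)


def phi(pt):
    """(x1, x2) -> (T^{-1}(x1), x2)."""
    x1, x2 = pt
    return (x1 - 1, x2)


def phi_inv(pt):
    """Inverse of phi."""
    x1, x2 = pt
    return (x1 + 1, x2)


point = (7, 40)
squares = [T1(T1(point)), T2(T2(point)), T3(T3(point))]
conjugated = phi_inv(T3(phi(point)))
```

All three squares agree with $T^2 \times T^2$ applied to the point, and conjugating $T_3$ by $\varphi$ reproduces $T_2$.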
Exercise 6.5 Construct a system with at least countably infinitely many pairwise non-isomorphic square roots.

Example 2 It is quite simple to find a system with no non-trivial factors but which commutes with more than its powers. Just take $T^2$ where $T$ has two-fold minimal self-joinings. To get a system which commutes only with its powers, but has non-trivial factors, we do the following. Let $U = T \times T \times T$. We will take a certain factor of this, the $\sigma$-algebra of sets invariant under
$$\rho : (x_1, x_2, x_3) \to (x_2, x_3, x_1).$$
This consists of sets of the form $A \cup \rho(A) \cup \rho^2(A)$, $A \in \mathcal{F}_1 \times \mathcal{F}_2 \times \mathcal{F}_3$. Let $Y$ be $U$ restricted to this factor. Now $Y$ has factors. For example, those sets also invariant under [...] form a non-trivial subalgebra. Now $Y$ is a factor of $X \times X \times X$ with three point fibres. The fibre consists of the three points $(x_1, x_2, x_3)$, $\rho((x_1, x_2, x_3))$ and $\rho^2((x_1, x_2, x_3))$ which $Y$ cannot separate. If $\hat\mu$ is a self-joining of $Y$, then $\hat\mu$ can be extended, via the relatively independent joining, to a self-joining of $(T \times T \times T)$.

Suppose $\varphi$ commutes with $S$. Support $\hat\mu$ on the graph of $\varphi$ and extend it to $\tilde\mu$, a self-joining of $(T \times T \times T)$. Now $\tilde\mu$ still has $Y$ as a factor, but has nine point fibres over it, consisting of a choice for $x_1$ in the first copy, and for $x'_1$ in the second, from among the three. Thus $\tilde\mu$ is invariant under the nine-element group $H$ generated by $\rho \times \mathrm{id}$ and $\mathrm{id} \times \rho$. The copy of $Y$ consists of those sets invariant under this group $H$. Now $\tilde\mu$ may not (in fact, cannot) be an ergodic joining. $H$ permutes the ergodic components of $\tilde\mu$. As the fibres over $Y$ are atomic, in any ergodic component $\tilde\mu_z$ of $\tilde\mu$, the coordinate algebras $\mathcal{F}_1$, $\mathcal{F}_2$ and $\mathcal{F}_3$ must be identified bijectively with $\mathcal{F}_4$, $\mathcal{F}_5$, $\mathcal{F}_6$. If any one were independent, the fibres would not be atomic. If $\mathcal{F}_1$ is identified with $\mathcal{F}_4$, then automatically $\mathcal{F}_2$ is identified with $\mathcal{F}_5$ and $\mathcal{F}_3$ with $\mathcal{F}_6$. This is an ergodic joining. Acting on it by $\rho \times \mathrm{id}$ and $\rho^2 \times \mathrm{id}$ we get two others, and $\tilde\mu$ must be an average of such triples. In $\tilde\mu_z$ the identification of $\mathcal{F}_1$ with $\mathcal{F}_4$ is via some power of $T$, i.e., the measure restricted to this pair of algebras is supported on a graph $\{(x_1, T^i(x_1))\}$. Acting by $\rho \times \rho$, there must also be a joining supported on $\{(x_2, T^i(x_2))\}$ and similarly $\{(x_3, T^i(x_3))\}$. This says that the set of sextuples $\{(x_1, x_2, x_3, T^i(x_1), T^i(x_2), T^i(x_3))\}$ has $\tilde\mu$-positive measure and, as it is $Y$ measurable and $T \times T \times T$-invariant, measure 1. We conclude that $\hat\mu$ is supported on the graph of $S^i$ and hence this is $\varphi$.

In both these examples we considered algebras of sets invariant under some group of coordinate permutations. In our last two examples they will again carry the day.

Exercise 6.6 Let $H \subseteq S(n)$ be a subgroup of the symmetric group. In $(X)^{(n)}$ consider the algebra $\mathcal{A}_H$ of subsets invariant under the coordinate permutations in $H$, i.e., under all [...] Let $Y_H$ be $(X)^{(n)}$ [...]
Example 3 Let $Y$ be the factor of symmetric sets in $X \times X$. Certainly $Y$ commutes only with its powers by Exercise 6.6. We want to show it has no non-trivial factors. This follows much along the lines of the proof that $X$ has no factors. Any such factor is also a factor of $X \times X$. Let $\hat\mu$ be the relatively independent joining over this factor algebra. Write $\hat\mu = \sum_z a_z \hat\mu_z$, its ergodic decomposition, as products of off-diagonals. Suppose the factor contains a non-trivial set $A$. As $\mu \times \mu(A) = \hat\mu(A \times A)$, and $\hat\mu_z(A \times A) \le \mu \times \mu(A)$ for all $z$, the only possible $\hat\mu_z$ are those giving equality. These are supported on one of two graphs:
(1) $\{(x_1, x_2, x_1, x_2)\}$; or
(2) $\{(x_1, x_2, x_2, x_1)\}$.
Both of these, when restricted to $Y$, give diagonal measure, and so the factor must be all of $\mathcal{G}$. Notice that $Y$ does not have minimal self-joinings, as it can be joined non-trivially in $X \times X \times X$.

Example 4 Consider the two systems $Y$ from Example 3, and $X$ itself. These two have no common factors, as they each separately have no non-trivial factors, and as only one has minimal self-joinings, they are not isomorphic. On the other hand, they are joined non-trivially in $X \times X$, hence are not disjoint.

The reader interested in pursuing the construction of examples like the above is referred to the bibliography. Some of this development can be subsumed into general classes of maps like our $U_{(i,\pi)}$. Much, though, remains a matter of individual construction and analyses of the structure of names, especially for positive entropy examples.
7 The Krieger and Ornstein theorems
Both Krieger's finite generator theorem (Krieger 1970) and Ornstein's isomorphism theorem (Ornstein 1974) for Bernoulli shifts are representation theorems. Krieger's theorem tells us that any ergodic process of entropy less than $\log_2(n_0)$ can be represented as the shift map on the space $\{p_1, p_2, \dots, p_{n_0}\}^{\mathbb{Z}}$ with an appropriate shift-invariant Borel measure. Ornstein's theorem characterizes those systems which can be represented again as the shift map on $\{p_1, \dots, p_{n_0}\}^{\mathbb{Z}}$, only now with a Bernoulli measure. In both these cases the representing systems are shift maps on a symbol space $\{p_1, \dots, p_{n_0}\}^{\mathbb{Z}}$ with which is associated a shift-invariant Borel measure. Both theorems are proved by identifying a certain weak* closed collection of joinings of our target system and such symbolic systems. The isomorphisms of the theorems will in fact be dense $G_\delta$-sets in these joining spaces. We will obtain this by identifying a notion of approximate representation and showing that such approximate representations are open and dense. This approach originates in unpublished work of R. Burton and A. Rothstein.
7.1 Symbolic spaces and processes

Let $P = \{p_1, \dots, p_{n_0}\}$ be a finite state space. Define
$$Y_P = \{p_1, \dots, p_{n_0}\}^{\mathbb{Z}}, \qquad (7.1)$$
the full $P$-shift. This is a compact metric space in the product topology. A point $y \in Y_P$ is of the form $(\dots, y(-n), \dots, y(0), \dots, y(n), \dots)$, an infinite sequence of elements of $P$. We define the left shift
$$S(y)(n) = y(n + 1),$$
a homeomorphism of $Y_P$. We will always use $S$ for the left shift no matter what the state space $P$. There is a natural partition of $Y_P$ labeled by $P$,
$$P(y) = y(0),$$
and in fact,
$$y = (\dots, P(S^{-n}(y)), \dots, P(y), \dots, P(S^{n}(y)), \dots).$$
On $Y_P$ there are many $S$-invariant measures. We can see this in two ways. First, the generating tree of partitions
$$P_n = \bigvee_{i=-n}^{n} S^{-i}(P) \qquad (7.2)$$
has no empty chains. Any finitely additive, $S$-invariant set function on the tree will extend to such a measure. Notice $S$-invariance is visible in the tree itself as
(7.3)
Thus finite additivity and $S$-invariance are just a list of identities that a set function on the tree should satisfy. Let $\eta_P$ be the space of all $S$-invariant Borel measures on $(Y_P, \mathcal{B})$. We can explicitly metrize it by
(7.4)
where $\|v_1(P_n), v_2(P_n)\| = \tfrac{1}{2} \sum_{p \in P_n} |v_1(p) - v_2(p)| \le 1$, analogous to Definition 6.2. This metric depends only on $v_1$, $v_2$ as additive set functions on the tree. The identities which imply $v$ is additive and $S$-invariant are closed conditions in this metric. Hence $(\eta_P, \|\cdot,\cdot\|)$ is a compact space. It is important to note here that $v_i \to v$ iff $v_i(A) \to v(A)$ for all sets $A$ in the tree, and not for a larger collection of sets, as in Lemma 6.2. This topology on $\eta_P$ is precisely the classical weak* topology on $\eta_P$ as a subset of the dual of the continuous functions. To see this, just notice that for $A$ in the tree, $\chi_A$ is continuous. Finite linear combinations of such characteristic functions are uniformly dense in $C(Y_P)$. Since an average of $S$-invariant Borel probability measures in $\eta_P$ is again such, $\eta_P$ is also convex. The taking of convex combinations $\alpha v_1 + (1 - \alpha)v_2$ is jointly continuous in $\alpha$, $v_1$ and $v_2$. The extreme points of $\eta_P$ we know, by Corollary 3.18, are the ergodic measures. Surprisingly, we will see later the ergodic measures are also dense.

Here is another way to see how rich $\eta_P$ is. Let $X = (X, \mathcal{F}, \mu, T)$ be any system, and $P$ any partition of $X$ labeled by $P$. To the pair $(X, P)$ we can associate an element $v = v_{(X,P)} \in \eta_P$. The map
$$p(x) = (\dots, P(T^{-n}(x)), \dots, P(x), \dots, P(T^{n}(x)), \dots)$$
of $x$ to its infinite $T,P$-name takes $X$ to $Y_P$ and is Borel in the sense that
$$p^{-1}(\mathcal{B}) \subseteq \bigvee_{i=-\infty}^{\infty} T^{-i}(P) \subseteq \mathcal{F}.$$
As $p(T(x)) = S(p(x))$, $p(\mu) = v_{(X,P)}$ is an $S$-invariant Borel measure on $Y_P$. Notice that up to completing $\mathcal{B}$ with respect to $v_{(X,P)}$, $(Y_P, \mathcal{B}, v_{(X,P)}, S)$ is isomorphic to $T$ acting on the factor algebra $\bigvee_{i=-\infty}^{\infty} T^{-i}(P)$. A pair $(X, P)$ is referred to as a process or, more precisely, a process with state space $P$. Two processes $(X_1, P_1)$ and $(X_2, P_2)$ are said to be identical if they project to the same measures, $v_{(X_1,P_1)} = v_{(X_2,P_2)}$. Thus $((Y_P, v_{(X,P)}), P)$ and $(X, P)$ are identical processes. Said still another way, $(X_1, P_1)$ and $(X_2, P_2)$ are identical processes if $T_1$ restricted to $\bigvee_{i=-\infty}^{\infty} T_1^{-i}(P_1)$ and $T_2$ restricted to $\bigvee_{i=-\infty}^{\infty} T_2^{-i}(P_2)$ are isomorphic by a map which simply takes a $T_1,P_1$-name to the identical $T_2,P_2$-name. The topology on $\eta_P$ thus can be regarded as a topology on processes. Two processes are not separated by this topology exactly when they are identical. We write the metric
$$\|X_1, P_1; X_2, P_2\| = \|v_{(X_1,P_1)}, v_{(X_2,P_2)}\|, \qquad (7.5)$$
and still refer to this as the weak* topology (in probabilistic terms it would be called the weak or vague topology). We end this section with a small lemma concerning entropy. For notational convenience, we write $(Y_P, v)$ for the system $(Y_P, \mathcal{B}, v, S)$, and $h_v(S)$ for its measure-theoretic entropy.

Lemma 7.1
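Suppose $v_i \in \eta_P$ and $v_i \to v$ weak*. Then
$$\limsup_{i \to \infty} h_{v_i}(S) \le h_v(S).$$
Thus if $(X_i, P_i) \to (X, P)$ weak*, then
$$\limsup_{i \to \infty} h(T_i, P_i) \le h(T, P).$$

Proof We know, for any $v \in \eta_P$, $P$ is a generator for $(Y_P, v)$. We know
$$h_v(S) = \lim_{n \to \infty} h_v\Big(P \,\Big|\, \bigvee_{j=1}^{n} S^{-j}(P)\Big)$$
(see Exercise 5.5) and in fact the terms in this limit are non-increasing (Theorem 5.27). For any fixed $n$,
$$\lim_{i \to \infty} h_{v_i}\Big(P \,\Big|\, \bigvee_{j=1}^{n} S^{-j}(P)\Big) = h_v\Big(P \,\Big|\, \bigvee_{j=1}^{n} S^{-j}(P)\Big)$$
as these values $h_{v_i}(P \mid \bigvee_{j=1}^{n} S^{-j}(P))$ depend only on the $v_i$ measure of sets in $\bigvee_{j=0}^{n} S^{-j}(P) \subseteq P_n$. As $h_{v_i}(S) \le h_{v_i}(P \mid \bigvee_{j=1}^{n} S^{-j}(P))$,
$$\limsup_{i \to \infty} h_{v_i}(S) \le h_v\Big(P \,\Big|\, \bigvee_{j=1}^{n} S^{-j}(P)\Big)$$
for all $n$. Let $n \to \infty$. •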
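The distance $\|v_1(P_n), v_2(P_n)\| = \tfrac{1}{2}\sum_{p \in P_n}|v_1(p) - v_2(p)|$ used in this section compares two measures only through the weights they give to the finitely many atoms of $P_n$, that is, to names of length $2n + 1$. The sketch below computes it for empirical distributions read off from finite names. It is an illustration under our own assumptions: the empirical block frequencies of a finite string stand in for an invariant measure, and the function names are hypothetical.

```python
from collections import Counter


def block_distribution(name, n):
    """Empirical distribution of the length-(2n+1) blocks seen in `name`.

    Assumption: sliding-window frequencies in a finite name stand in for
    the measure of the atoms of P_n.
    """
    width = 2 * n + 1
    windows = [tuple(name[j:j + width]) for j in range(len(name) - width + 1)]
    counts = Counter(windows)
    total = len(windows)
    return {block: count / total for block, count in counts.items()}


def partition_distance(v1, v2):
    """Half the l1 distance between two distributions on the same atoms."""
    atoms = set(v1) | set(v2)
    return 0.5 * sum(abs(v1.get(a, 0.0) - v2.get(a, 0.0)) for a in atoms)


periodic = "ab" * 50
constant = "a" * 100
d0 = partition_distance(block_distribution(periodic, 0),
                        block_distribution(constant, 0))
d1 = partition_distance(block_distribution(periodic, 1),
                        block_distribution(constant, 1))
```

The two names agree on half of the single symbols but on none of the length-3 blocks, so the distance grows from $n = 0$ to $n = 1$.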
7.2 Painting names on towers and generic names
We now describe a method for constructing partitions labelled by $P$. It is the essential tool for creating our representations. We call it painting a name on a tower. Suppose we have a finite $P$-name, i.e., a sequence
$$p = (p_{i_0}, p_{i_1}, \dots, p_{i_{N-1}}) \in P^N.$$
Also, in some dynamical system $X$ we have a tower of height $N$ consisting of disjoint sets
$$F, T(F), \dots, T^{N-1}(F).$$
To paint the name $p$ on this tower is to define a map $P : \bigcup_{j=0}^{N-1} T^j(F) \to P$ where $P(x) = p_{i_j}$ for $x \in T^j(F)$. Notice $P$ does not partition all of $X$, only the tower. Suppose we are given a finite collection of names $p_1, \dots, p_s \in P^N$, and for each a tower with bases $F_1, \dots, F_s$, and height $N$, all of which are disjoint. We can paint $p_j$ onto the tower over $F_j$ giving a partition $P$ of the union of all the towers. To actually partition all of $X$, imagine $P$ to have been extended outside the towers in some perhaps arbitrary way. We will describe explicitly how when need be.

As we said earlier, this painting of names on towers is the critical tool we will use in the proofs of our representations. There are three parts to the process of painting. We must have names to paint, towers to paint them on and an assignment of names to towers. We begin with a discussion of names. Suppose $(X, P)$ is a process with state space $P$. Let $\{B_{i,n}\}_i$ be a listing of the elements of $\bigvee_{j=-n}^{n} T^{-j}(P)$.

Definition 7.1 We say $x \in X$ is $\varepsilon,N$-generic for $(X, P)$ if for all $n$, $0 \le n \le \log_2(2/\varepsilon) + 1$, and all $B_{i,n} \in \bigvee_{j=-n}^{n} T^{-j}(P)$,
$$\left| \frac{1}{N} \sum_{j=0}^{N-1} \chi_{B_{i,n}}(T^j(x)) - \mu(B_{i,n}) \right| < \left(\frac{\varepsilon}{2} - \frac{2n}{N}\right) \mu(B_{i,n}).$$
Notice that if $x$ is $\varepsilon,N$-generic for $(X, P)$, then any other $x'$ with $P(T^j(x)) = P(T^j(x'))$, $-\log_2(2/\varepsilon) - 1 \le j \le N + \log_2(2/\varepsilon) + 1$, is also $\varepsilon,N$-generic.
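Painting names on towers is easy to mimic in a finite toy model. In the sketch below, which rests on our own simplifying assumptions, the space is the cyclic group $\{0, \dots, M-1\}$ with $T(x) = x + 1 \bmod M$, each tower base is a list of points, and the result is a dictionary from painted points to symbols. Points outside the towers are simply left unpainted, mirroring the remark that the painted map need not partition the whole space. The function name `paint` is hypothetical.

```python
def paint(bases, names, height, modulus):
    """Paint names[k] on the tower of the given height over bases[k].

    Toy model (an assumption): T(x) = (x + 1) % modulus on {0, ..., modulus-1}.
    Returns a dict mapping each painted point to its symbol.
    """
    painted = {}
    for base, name in zip(bases, names):
        if len(name) != height:
            raise ValueError("each name must have length equal to the height")
        for point in base:
            for level in range(height):
                x = (point + level) % modulus   # the point T^level(point)
                if x in painted:
                    raise ValueError("tower levels must be disjoint")
                painted[x] = name[level]        # symbol at position `level`
    return painted


labels = paint(bases=[[0], [5]], names=["abc", "bba"], height=3, modulus=10)
unpainted = sorted(set(range(10)) - set(labels))
```

Here two towers of height 3 receive the names `abc` and `bba`; the four remaining points are the part of the space on which the partition would still have to be specified.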
Lemma 7.2 If $X$ is ergodic and $P$ is a finite partition, then for any $\varepsilon > 0$, once $N_0$ is large enough, the set of points which are $\varepsilon,N$-generic for $(X, P)$ for all $N \ge N_0$ has measure at least $(1 - \varepsilon)$.

Proof Fixing $\varepsilon > 0$, there are only finitely many sets $B_{i,n}$, $0 \le n \le \log_2(2/\varepsilon) + 1$. We need only consider those with $\mu(B_{i,n}) > 0$. Set a lower bound for $N_0$, $N_0 > 16(\log_2(2/\varepsilon) + 1)/\varepsilon$. Further, require $N_0$ so large that by the Birkhoff theorem (3.4), for all but a subset of $X$ of measure at most $\varepsilon$,
$$\left| \frac{1}{N} \sum_{j=0}^{N-1} \chi_{B_{i,n}}(T^j(x)) - \mu(B_{i,n}) \right| < \frac{\varepsilon}{4}\,\mu(B_{i,n}) < \left(\frac{\varepsilon}{2} - \frac{2n}{N}\right)\mu(B_{i,n})$$
for all such $B_{i,n}$, for all $N \ge N_0$. •
Notice that whether or not a point is in a set Bi,n is determined by the name P(T-n(x», , .. , P(x), ... , p(Tn(x».
Thus if we are given some P-name
for each index 0 ::; j ::; N, by reading the name
we will either see a name corresponding to Bi,n or we will not. Let b(Bi,n, {p;}7=-':n+n) be the density in (0, ... , N) of occurrences of indices j where (Pi j _ n , " " Pi j +) corresponds to Bi,n' Definition 7.2 We say a P,N-name (Pi,"" p,' N-l ) is e-generic flor (X, P) if for o all 0 ::; n ::; log2(2/e) + 1, and any extension of the name , Pi_ n ,"" Pi o"'"
PiN-I""
,Pi N -
1 +n'
we have
If x E X is e,N-generic for (X,P), then the T,P,N-name of x, P(x), ... , P(T N- 1(X» is.,f.-generic for (X, Pl. •
Corollary 7.3
Theorem 7.4
Suppose we are given a collection of P-names (Pi(O,1)"'" Pi(N,1) (Pi(O,t),· .. ,
~(N,t)
all of which are e-generic for some (X l' P 1 ). Suppose we also have a collection of disjoint towers of height N in X 2 with bases F 1 , ••• , Ft with
If we paint these names on the corresponding towers we will get a partition P2 of X 2 • We can conclude
Proof For any 0 ~ n ~ log2(2/e) sponds to a set
+ 1, each set
~~n E
Vt'=-n T1-i(Pd corre-
n
Bln
E
V
T2- i (P2 )
i=-n
with the same P-name. We can estimate Ilz(Bln) by splitting it into that part within the tower and that outside. For a given tower, say based on FIc , there is a ftxed Pz-name up the tower, the name (PI(O,Ic)""'Pi(n,K»' This name may extend in various ways as we look n steps before and after the tower. But independent of this, as the name is e-generic, 112 (Bin
n
N-I
112
(
~Q Tl(Fk») .
)
U Tl(F
e
I
-
III
(l1;.n)
I
<"4 lli(B;.n)·
k)
;=0
Thus unioning over the towers and summing over the various Bin'
(7.6)
The points outside the tower can lie in any Bin' but only one such. Thus
but as
e T"
L
•
we are done.
This theorem explains our rather complex definition of an $\varepsilon,N$-generic point. The important results are Lemma 7.2 and Theorem 7.4. We can jazz up the argument of Theorem 7.4 a little to obtain the fact that ergodic measures are weak* dense in $\eta_P$.

Theorem 7.5 Given any measure $v \in \eta_P$, an ergodic, non-periodic system $X$ and $\varepsilon > 0$, there is a partition $P$ of $X$ with
$$\|(Y_P, v), P; X, P\| < \varepsilon.$$

Proof First, as $\eta_P$ is compact, convex and its extreme points are ergodic, we can find a finite convex combination $v' = \sum_{t=1}^{k} \alpha_t v_t$ of ergodic measures $v_t$ with
$$\|v', v\| < \frac{\varepsilon}{2}.$$
Using Lemma 7.2, select $N_0$ large enough and $P,N_0$-names
$$p_t = (p_{i(0,t)}, p_{i(1,t)}, \dots, p_{i(N_0-1,t)}), \quad t = 1, \dots, k,$$
with $p_t$ an $\varepsilon/4$-generic name for $((Y_P, v_t), P)$. In $X$, use the Rohlin lemma (Theorem 3.10) to find a set $F \subseteq X$ with $F, T(F), \dots, T^{N_0-1}(F)$ disjoint and covering all but $\varepsilon/4$ in measure of $X$. Partition $F$ into $k$ pieces $F_1, \dots, F_k$ so that
$$\mu(F_t)/\mu(F) = \alpha_t.$$
Paint $p_t$ on the tower over $F_t$. This constructs $P$. Following precisely the computation of Theorem 7.4,
$$\|(Y_P, v'), P; X, P\| \le \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = \frac{\varepsilon}{2}.$$
We conclude that $\|(Y_P, v), P; X, P\| < \varepsilon$.
•

Exercise 7.1 We can improve on Theorem 7.5 slightly. Not only are the ergodic measures dense in $\eta_P$, they are a $G_\delta$. Consider the subset
$$S(B_{i,n}, \varepsilon, N) = \left\{ v \in \eta_P : \left\| \frac{1}{N} \sum_{j=0}^{N-1} \chi_{B_{i,n}}(S^j(x)) - v(B_{i,n}) \right\|_2 < \varepsilon \right\}.$$
1. Show $S(B_{i,n}, \varepsilon, N)$ is weak* open. Hence so is $S(B_{i,n}, \varepsilon) = \bigcup_{N=1}^{\infty} S(B_{i,n}, \varepsilon, N)$.
2. Show that $v$ is ergodic iff $v \in \bigcap_{i,n,m} S(B_{i,n}, 1/m)$.
Hint: Theorem 3.1 and Corollary 3.16.

Exercise 7.2 Follow the above reasoning to show that the $v \in \eta_P$ for which $(Y_P, v)$ is weakly mixing are a dense $G_\delta$. Hint: Remember $v$ will be weakly mixing exactly when $v \times v$ is ergodic for $S \times S$. So show that the measures in $\eta_{P \times P}$ of the form $v \times v$ are a closed convex set and proceed.

Exercise 7.3 Show that the measures $v$ for which $(Y_P, v)$ is mixing are dense, but meager, i.e., the complement contains a dense $G_\delta$. Hint: Show that the rigid processes (Exercise 4.5) are a dense $G_\delta$.
7.3 $\bar d$-Metric and entropy

We now introduce a much stronger metric on processes than the weak*, one intimately connected with joinings. Where the weak* metric says two processes are close if for some long, but finite, period of time they moved sets in approximately the same way, the $\bar d$-metric will ask that the two processes look approximately the same forever. Just as with the weak* topology, we can interchangeably view $\bar d$ as a metric on processes $(X, P)$ or as a metric on measures in $\eta_P$. For two processes $(X_1, P_1)$ and $(X_2, P_2)$, consider the space of joinings $J(X_1, X_2)$. For any $\hat\mu \in J(X_1, X_2)$ we can compute how closely $P_1$ and $P_2$ have been matched by $\hat\mu(P_1 \times X_2 \,\triangle\, X_1 \times P_2)$, where $P_i \times X_j$ indicates the lifting of $P_i$ to the joined space.

Definition 7.3
$$\bar d(X_1, P_1; X_2, P_2) = \inf_{\hat\mu \in J(X_1, X_2)} \hat\mu(P_1 \times X_2 \,\triangle\, X_1 \times P_2).$$
We will write
Lemma 7.6
1. $\bar d$ is a metric on equivalence classes of identical processes.
2. The infimum of the definition of $\bar d$ is actually a minimum.
3. If $X_1$ and $X_2$ are ergodic, then the $\bar d$-distance between $(X_1, P_1)$ and $(X_2, P_2)$ is achieved by an ergodic joining.

Proof 1. Certainly $\bar d(X_1, P_1; X_2, P_2) = 0$ iff the two processes are identical. Symmetry is obvious. The triangle inequality follows via the construction of the relatively independent joining. If $\hat\mu_1 \in J(X_1, X_2)$ and $\hat\mu_2 \in J(X_2, X_3)$, then we can construct $\hat\mu$, a joining of $X_1$, $X_2$ and $X_3$ simultaneously, as the relatively independent joining of $\hat\mu_1$ and $\hat\mu_2$ over the common factor $X_2$. Certainly
$$\hat\mu(P_1 \times X_2 \times X_3 \,\triangle\, X_1 \times X_2 \times P_3) \le \hat\mu(P_1 \times X_2 \times X_3 \,\triangle\, X_1 \times P_2 \times X_3) + \hat\mu(X_1 \times P_2 \times X_3 \,\triangle\, X_1 \times X_2 \times P_3)$$
$$= \hat\mu_1(P_1 \times X_2 \,\triangle\, X_1 \times P_2) + \hat\mu_2(P_2 \times X_3 \,\triangle\, X_2 \times P_3),$$
and as $\hat\mu$, restricted to the first and third coordinates, is in $J(X_1, X_3)$ we are done.
2 and 3. Remember $J(X_1, X_2)$ is weak*-compact and convex. The function $\hat\mu \mapsto \hat\mu(P_1 \times X_2 \,\triangle\, X_1 \times P_2)$ is weak*-continuous and linear. Hence it has a minimum achieved on the boundary of $J(X_1, X_2)$. By Theorem 6.3, if $X_1$ and $X_2$ are ergodic this boundary consists of the ergodic joinings. •
Lemma 7.7 If
$$\bar d(X_1, P_1; X_2, P_2) < \frac{\varepsilon}{4\log_2(2/\varepsilon) + 4}$$
then
$$\|X_1, P_1; X_2, P_2\| < \varepsilon.$$
Proof Let jJ. achieve the d-distance. For n
jJ.C~n T1-i
X 2 AX1
AC~n (T1-i(Pd X
~
X
log2(2/e)
i~n T
2-
+ 1,
i(P2
»)
X 2 AX 1 x T2- i(P2
»)
Thus for such n
As
L i>log2(2/e)+1
2- i <~, 2
•
we are done.
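The tail estimate quoted at the end of this proof can be checked directly. Writing $m = \log_2(2/\varepsilon) + 1$ (a shorthand of ours, not the text's) and summing over the integers $i > m$, the first index that occurs is $\lfloor m \rfloor + 1$, and $\lfloor m \rfloor > m - 1$:

```latex
\sum_{i > m} 2^{-i}
  \;=\; \sum_{i = \lfloor m \rfloor + 1}^{\infty} 2^{-i}
  \;=\; 2^{-\lfloor m \rfloor}
  \;<\; 2^{-(m-1)}
  \;=\; 2^{-\log_2(2/\varepsilon)}
  \;=\; \frac{\varepsilon}{2}.
```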
Thus the $\bar d$-metric is a strengthening of the weak* metric. It is in fact an extreme strengthening. We can see this in a variety of ways. First, $\eta_P$ is not $\bar d$-compact. Consider the uncountable collection of pairwise disjoint maps of Exercise 6.2. Each is equipped with a generating partition $P_e$ into two sets (the first tower and the spacers). These sets have measures 2/3 and 1/3, respectively. As the only joining of two of these is product measure, under which the two partitions disagree on a set of measure $2 \cdot \tfrac{2}{3} \cdot \tfrac{1}{3}$, we get $\bar d(X_{e_1}, P_{e_1}; X_{e_2}, P_{e_2}) = 4/9$. Thus in $\eta_P$ there is a collection of cardinality the continuum whose elements are pairwise 4/9 apart in $\bar d$. Similar reasoning leads to the conclusion that no K-system can be the $\bar d$-limit of zero entropy systems. A third indication, intimately related to the above remark, is the following two theorems.
Theorem 7.8 The ergodic measures in $\eta_P$ are $\bar d$-closed.

Proof Suppose $v_i \to v$ in $\bar d$. Choose $i$ so large that
$$\bar d(v_i, v) < \varepsilon^2,$$
and is achieved by a joining $\hat\mu$. Let $\hat\mu = \int_0^1 \hat\mu_t \, dt$ be its ergodic decomposition. By the same reasoning as Theorem 6.3, a.e. $\hat\mu_t$ has first marginal $v_i$. Letting $v_t$ be its second marginal,
$$v = \int_0^1 v_t \, dt$$
is an ergodic decomposition of $v$. As
$$\bar d(v_i, v) = \int_0^1 \bar d(v_i, v_t) \, dt < \varepsilon^2,$$
for all but $\varepsilon$ in measure of the $v_t$, $\bar d(v_i, v_t) < \varepsilon$. This says for a.e. $t$, $\bar d(v, v_t) = 0$ or $v = v_t$, hence $v$ is ergodic. •

Theorem 7.9 If $(X_1, P_1)$ and $(X_2, P_2)$ are ergodic and
$$\bar d(X_1, P_1; X_2, P_2) < \varepsilon,$$
then
(7.7)

Proof Let $\hat\mu$ be an ergodic joining that achieves the $\bar d$-distance. As $\hat\mu(P_1 \times X_2 \,\triangle\, X_1 \times P_2) < \varepsilon$, the result follows from Lemma 5.10. •

Thus entropy is in fact $\bar d$-continuous. The $\bar d$-distance is intimately related to what is referred to in information theory as the Hamming distance between names.

Definition 7.4 If $p_1$ and $p_2$ are two names in $P^N$, then the Hamming distance between them is
$$\|p_1, p_2\|_N = \#\{i : p_1(i) \ne p_2(i)\}.$$
The $\bar d_N$-distance is the normalized Hamming distance, i.e.,
$$\bar d_N(p_1, p_2) = \frac{\#\{i : p_1(i) \ne p_2(i)\}}{N}.$$
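The Hamming distance and its normalized version are simple enough to state as code. The following is a minimal sketch for names given as strings or lists of symbols; the function names `hamming` and `normalized_hamming` are our own.

```python
def hamming(p1, p2):
    """Number of indices at which two names of equal length differ."""
    if len(p1) != len(p2):
        raise ValueError("names must have the same length N")
    return sum(1 for a, b in zip(p1, p2) if a != b)


def normalized_hamming(p1, p2):
    """Hamming distance divided by the common length N."""
    return hamming(p1, p2) / len(p1)


mismatches = hamming("abcab", "abbaa")
fraction = normalized_hamming("abcab", "abbaa")
```

The two example names differ in the third and fifth positions, so two of the five indices mismatch.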
Theorem 7.10 Suppose $(X_1, P_1)$ and $(X_2, P_2)$ are ergodic processes, $\varepsilon_k \searrow 0$, $N_k \nearrow \infty$, and we have sequences of names
$$p^1_k \in P^{N_k} \quad \text{and} \quad p^2_k \in P^{N_k},$$
and $p^j_k$ is $\varepsilon_k$-generic for $(X_j, P_j)$, $j = 1, 2$. Then
$$\bar d(X_1, P_1; X_2, P_2) \le \liminf_{k \to \infty} \bar d_{N_k}(p^1_k, p^2_k).$$

Proof For each $k$, consider the double name
$$p_k = ((p^1_{(0,k)}, p^2_{(0,k)}), \dots, (p^1_{(N_k-1,k)}, p^2_{(N_k-1,k)})) \in (P \times P)^{N_k}.$$
Let (X) be some arbitrary non-periodic ergodic system. Build a tower in X3 with base F", and height Nk covering all but e" of X). Paint p" on this tower, constructing a partition ~ v P~. By Theorem 7.4
II(XI ,i>d, (X)' ~)II
:5:
2e"
and Furthermore,
~k IIpLp;lIrt + e".
1l3(P: APn 5
Consider the sequence of measures
Let
v
be the limit of a weak* convergent subsequence on which 1
lim Nk
Ilpf ,p:ll~k
is a limit. Thus eX I , Pd and (Yp , x P2' v, PI) are Jdentic~, as are XP" v,P2 )· Hence v eotends to a joining of XI and X 2 , and
(X 2 , P2 ) and
(Yp ,
d(X I ,i'I;X2,P2):5:
V(PI ~P2)
= lim dNJpLp~).
•
Corollary 7.11 Suppose (XI,Pd and (X2 ,P2 ) are ergodic processes, and Nk )" 00. Suppose we have a sequence of subsets A" E X I' Il(A k ) > ~ > 0, and measure-preserving maps ((Jk : Ak -+ X 2. Then
d(X I , PI; X2,P2) :5: lim-(lA) III
k
f dNJp~Jx),p~k«((Jk(x)))dlll. Ak
Proof Since Nk )" 00, we can find ek \, 0 by Lemma 7.2 so that all but e" of the x E Xj are e". N,,-generic for X j • j = 1 or 2. On~ 2~ < ~, there must be points x" E S" w.!.th Eoth P~k(X), ek , N,,-generic for (Xl> PI) and P~k«((Jk(X», e", Nk-generic for (X2' P2 ) and I 2 ( 1 dNk(PNk(x,,), PNk«((J,,(Xt))):S:; III (Ak)
fAk
I 2 dNk(PNk(X)' PNk«((J,,(X») dill )
(
Apply Theorem 7.10 to finish the result.
2e,,)
1 - -;- .
•
When we apply Corollary 7.11 later, the sets A" will be all of X I. What is important here is that the maps ((Jk pairing points in X I to points in X 2 need not be joinings, i.e., need not commute with the transformations. Exercise 7.4
Show that the weakly mixing processes are d-closed.
Exercise 7.5
Show that the mixing and K-processes are d-closed.
Exercise 7.6 Show that if Xl and X2 have ii('xl,Pl ;X2,P2) = if, then for a.e. x E Xl> there is a sequence of names p;(x) = {P~o.n)(X)'P~I.n)(X), ... , P~n.nlx)} E pn which becomes ever more generic for (X2 , P2 ) and for which 1
lim -IIPn(x),p;lIl, n-+oo n
-
= d.
We end this section with a very important perturbation argument. We saw in Theorem 7.9 that entropy is $\bar d$-continuous. Hence a small $\bar d$-perturbation cannot change entropy by much. What we want to see now is that for $\nu \in \eta_P$, as long as $h_\nu(S) < \log_2(n)$, i.e., $\nu$ is not the Bernoulli $n$-shift, we can perturb some entropy into $\nu$. We need a particularly strong version of this, so we first define a strengthening of the $\bar d$ metric.

Definition 7.5 Set
$$\bar d^N(X_1, P_1; X_2, P_2) = \bar d\Big(X_1, \bigvee_{i=-N}^{N} T_1^{-i}(P_1);\; X_2, \bigvee_{i=-N}^{N} T_2^{-i}(P_2)\Big),$$
i.e., we measure not just how closely $P_1$ can be joined to $P_2$, but how closely $\bigvee_{i=-N}^{N} T_1^{-i}(P_1)$ can be joined to $\bigvee_{i=-N}^{N} T_2^{-i}(P_2)$. It is a computation that
$$\bar d^N(X_1, P_1; X_2, P_2) \le (2N+1)\,\bar d(X_1, P_1; X_2, P_2),$$
since two names on the window $[-N, N]$ differ only if they differ in at least one of its $2N+1$ coordinates.

Exercise 7.7 Show that for any $\varepsilon > 0$, for $N > \log_2(2/\varepsilon) + 1$,
$$\|\nu_1, \nu_2\| \le \bar d^N(\nu_1, \nu_2) + \varepsilon.$$

Theorem 7.12 For any $\nu_1 \in \eta_P$, $\varepsilon > 0$ and natural number $N$, there is a $\nu_2 \in \eta_P$ so that
(1) $h_{\nu_2}(S) \ge h_{\nu_1}(S) + \varepsilon(\log_2(n) - h_{\nu_1}(S))$; and
(2) $\bar d^N(\nu_1, \nu_2) \le 2\varepsilon$. (7.8)
Further, if $\nu_1$ is ergodic, so is $\nu_2$.
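To get a feel for conclusion (1) of Theorem 7.12, the lower bound on the new entropy can be evaluated numerically. This is only an arithmetic sketch: the function name `perturbed_entropy_lower_bound` is hypothetical, entropies are taken in bits, and the bound is the convex combination $\varepsilon\log_2(n) + (1-\varepsilon)h_{\nu_1}(S)$ that the proof produces.

```python
import math


def perturbed_entropy_lower_bound(h1, n, eps):
    """Lower bound h1 + eps*(log2(n) - h1) on the entropy of the perturbed measure.

    h1  : entropy of the starting measure, in bits (0 <= h1 <= log2(n))
    n   : number of symbols in the alphabet
    eps : perturbation parameter, 0 < eps < 1
    """
    h_max = math.log2(n)
    if not 0 <= h1 <= h_max:
        raise ValueError("h1 must lie between 0 and log2(n)")
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    return h1 + eps * (h_max - h1)


# A 4-symbol process of entropy 1 bit, perturbed with eps = 0.1.
print(perturbed_entropy_lower_bound(1.0, 4, 0.1))
```

The gain is proportional to the gap $\log_2(n) - h_{\nu_1}(S)$, so it vanishes exactly when $\nu_1$ already has maximal entropy.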
Proof Let $\nu_0$ be the Bernoulli $n$-shift $(n^{-1}, n^{-1}, \dots, n^{-1})$ with $h_{\nu_0}(S) = \log_2(n)$, the unique maximal entropy measure in $\eta_P$. Let $(X, R)$ be any weakly mixing process, $R$ a partition into two sets $r_1$, $r_2$ where
$$\mu\Big(\bigcap_{i=-N}^{N} T^{-i}(r_1)\Big) \ge (1 - \varepsilon)\mu(r_1)$$
and $\mu(r_2) = \varepsilon$. Such a partition can be found in any non-periodic ergodic $X$ using the Rohlin lemma. Consider the space $Z = Y_P \times X \times Y_P$ with measure $\nu_1 \times \mu \times \nu_0$ and transformation $S \times T \times S$. Construct a partition $\bar P$ of $Z$ as follows:
$$\bar P(y_1, x, y_2) = \begin{cases} P(y_1) & \text{if } x \in r_1, \\ P(y_2) & \text{if } x \in r_2. \end{cases}$$
Let $\nu_2 = \nu(Z, \bar P) \in \eta_P$. To see (2) just notice that $\nu_1$ and $\nu_2$ are joined in $Z$ as $P(y_1)$ and $\bar P(y_1, x, y_2)$. In $Z$, if $x \in \bigcap_{i=-N}^{N} T^{-i}(r_1)$, then $(y_1, x, y_2)$ and $y_1$ belong to the same elements of $\bigvee_{i=-N}^{N} (S \times T \times S)^{-i}(\bar P)$ and $\bigvee_{i=-N}^{N} S^{-i}(P)$, respectively. Thus
$$\bar d^N(\nu_1, \nu_2) \le 1 - \mu\Big(\bigcap_{i=-N}^{N} T^{-i}(r_1)\Big) \le 2\varepsilon.$$
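For (1) we remember
$$h_{\nu_2}(S, P) = h_{\nu_1 \times \mu \times \nu_0}(S \times T \times S, \bar P) = \int I\Big(\bar P \,\Big|\, \bigvee_{i=1}^{\infty} (S \times T \times S)^{-i}(\bar P)\Big)\, d(\nu_1 \times \mu \times \nu_0)$$
$$\ge \int I\Big(\bar P \,\Big|\, \bigvee_{i=1}^{\infty} S^{-i}(P) \times \bigvee_{i=0}^{\infty} T^{-i}(R) \times \bigvee_{i=1}^{\infty} S^{-i}(P)\Big)\, d(\nu_1 \times \mu \times \nu_0)$$
$$= \int_{Y_P \times r_2 \times Y_P} I\Big(Y_P \times X \times P \,\Big|\, Y_P \times \bigvee_{i=0}^{\infty} T^{-i}(R) \times \bigvee_{i=1}^{\infty} S^{-i}(P)\Big)\, d(\nu_1 \times \mu \times \nu_0)$$
$$\quad + \int_{Y_P \times r_1 \times Y_P} I\Big(P \times X \times Y_P \,\Big|\, \bigvee_{i=1}^{\infty} S^{-i}(P) \times \bigvee_{i=0}^{\infty} T^{-i}(R) \times Y_P\Big)\, d(\nu_1 \times \mu \times \nu_0)$$
$$= \mu(r_2) \int I\Big(P \,\Big|\, \bigvee_{i=1}^{\infty} S^{-i}(P)\Big)\, d\nu_0 + \mu(r_1) \int I\Big(P \,\Big|\, \bigvee_{i=1}^{\infty} S^{-i}(P)\Big)\, d\nu_1$$
$$= \mu(r_2)\log_2(n) + \mu(r_1) h_{\nu_1}(S) = \varepsilon \log_2(n) + (1 - \varepsilon) h_{\nu_1}(S).$$
•

7.4 Pure columns and Ornstein's fundamental lemma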
Ornstein's fundamental lemma is the critical piece of information we need for our proofs of the Ornstein and Krieger theorems. It is a painting argument. To this point our painting arguments have been rather robust, just painting a name or collection of names on essentially any tower. Here though we will be given some pre-existing partitions and will wish to repaint them to improve their character. Our first step is to understand how a pre-existing partition paints a tower.

Let $(X, P)$ be an ergodic process, and $F, T(F), \dots, T^{N-1}(F)$ be a tower in $X$. The $T,P,N$-names of points in $F$ cut this tower into subtowers. Suppose $p = (p_{i_0}, p_{i_1}, \dots, p_{i_{N-1}}) \in P^N$ is some particular $P,N$-name. Let $F_p \subseteq F$ consist of those points with $p_N(x) = p$. Such sets partition $F$, and are bases for disjoint towers. Such a tower over $F_p$ is called a pure $P$-column because each little piece $T^j(F_p) \subseteq T^j(F)$ lies in a single, pure set of $P$, $0 \le j < N$. Notice that $P$ is simply a painting of the names $p$ on the columns $F_p$.

We want to prove tower-related versions of the ergodic theorem, and the Shannon-McMillan-Breiman theorem. Both these results will rest on the same trick. Even though the base set $F$ occupies only a fraction $1/N$ of the tower, thickening it to $\bigcup_{i=0}^{[\alpha N]} T^i(F)$ occupies a fraction $\alpha$. Further, the name of a point in the thickening differs in a controlled way from the name of the point below it in $F$.

Theorem 7.13 Suppose $(X, P)$ is an ergodic process. For any $\varepsilon > 0$, once $N$ is large enough, in any tower $F, T(F), \dots, T^{N-1}(F)$ with
(1) $\mu\Big(\bigcup_{i=0}^{N-1} T^i(F)\Big) > \varepsilon$;
we can conclude
(2) $\mu(\{x \in F_p : p \text{ is } \varepsilon\text{-generic for } (X, P)\}) > (1 - \varepsilon)\mu(F)$.
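The cutting of a tower into pure $P$-columns described before Theorem 7.13 can be sketched on a finite orbit segment. This is an illustrative toy under stated assumptions: the orbit is given as a string of $P$-labels, the visit times to the base $F$ are supplied by hand, and the function name `pure_columns` is hypothetical.

```python
from collections import defaultdict


def pure_columns(labels, base_positions, height):
    """Group base points of a tower by their P,N-name.

    labels         : sequence with labels[t] = P-label of T^t(x) along one orbit
    base_positions : times t at which the orbit visits the base F
    height         : the tower height N
    Returns a dict mapping each N-name (a tuple) to the list of base times
    carrying that name; each key corresponds to one pure column F_p.
    """
    columns = defaultdict(list)
    for t in base_positions:
        name = tuple(labels[t:t + height])
        if len(name) == height:  # ignore a block cut off by the end of the data
            columns[name].append(t)
    return dict(columns)


# Hypothetical orbit with base visits every 3 steps.
orbit = "aabaabaababb"
cols = pure_columns(orbit, base_positions=[0, 3, 6, 9], height=3)
for name, times in cols.items():
    print("".join(name), times)
```

Each key plays the role of a name $p$, and the relative size of its list approximates $\mu(F_p)/\mu(F)$.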
Proof Notice that if $x$ is $\delta, M$-generic for $(X, P)$, then $T^{-i}(x)$ is $(\delta + i/(M + i)), (M + i)$-generic for $(X, P)$. Let $\delta = \varepsilon^3/8$. Choose $M_0$ by Lemma 7.2 so that for all but $\delta$ of the $x \in X$, $x$ is $\delta, M$-generic for all $M \ge M_0$. Let $N(1 - \varepsilon/2) > M_0$ and consider the set of points $B = \bigcup_{i=0}^{[\varepsilon N/2]} T^i(F)$. As $\mu(B) > \varepsilon^2/2$ there is a subset $B_0 \subseteq B$, $\mu(B_0) > (1 - \varepsilon/4)\mu(B)$, and all $x \in B_0$ are $\delta, M$-generic for all $M \ge M_0$. Thus if $x \in B_0 \cap T^i(F)$, $0 \le i \le [\varepsilon N/2]$, then $x$ is $\delta, (N - i)$-generic. Hence $T^{-i}(x) \in F$ is $(\delta + i/N), N$-generic, hence $\varepsilon, N$-generic for $(X, P)$. As
$$\mu\Big(\bigcup_{i=0}^{[\varepsilon N/2]} T^{-i}(B_0) \cap F\Big) > (1 - \varepsilon/4)\mu(F)$$
we are done. •

Lemma 7.14 Suppose $(X, P)$ is an ergodic process. For any $\varepsilon > 0$, if $N$ is large enough, and $A$ is any set with $\mu(A) \ge \varepsilon$, there is, then, a subset $A_0 \subseteq A$, $\mu(A_0) > (1 - \varepsilon)\mu(A)$ and for any $x \in A_0$,
$$\left| -\frac{1}{N} \log_2\big(\mu(p_N(x) \cap A)/\mu(A)\big) - h(T, P) \right| < \varepsilon.$$
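Proof We establish upper and lower estimates separately. First, it is enough to show
$$\left| -\frac{1}{N} \log_2\big(\mu(p_N(x) \cap A)\big) - h(T, P) \right| < \frac{\varepsilon}{2} \tag{7.9}$$
as
$$-\frac{1}{N}\log_2(\mu(A)) \le -\frac{1}{N}\log_2(\varepsilon) < \frac{\varepsilon}{2}$$
once $N > -2\log_2(\varepsilon)/\varepsilon$. By the Shannon-McMillan-Breiman Theorem (5.3), once $N$ is large enough, for all but at most $\varepsilon^2/4$ of $X$,
$$\left| -\frac{\log_2(\mu(p_N(x)))}{N} - h(T, P) \right| < \frac{\varepsilon}{2}. \tag{7.10}$$
Since $\mu(p_N(x) \cap A) \le \mu(p_N(x))$, certainly $-\log_2(\mu(p_N(x) \cap A))/N > h(T, P) - \varepsilon/2$ for all but a fraction $\varepsilon/4$ of $A$. For the other inequality, we remind ourselves that by Corollary 5.2 once $N$ is large enough a collection of at most $2^{(h(T,P) + \varepsilon/4)N}$ $T,P,N$-names covers all but $\varepsilon^2/8$ of $X$. Such a collection will cover all but a fraction $\varepsilon/8$ of $A$. Let $B \subseteq A$ be those $x \in A$ with $\mu(p_N(x) \cap A) \le 2^{-(h(T,P) + \varepsilon/2)N}$, i.e., where the lower estimate fails. We conclude
$$\mu(B) \le 2^{(h(T,P) + \varepsilon/4)N}\, 2^{-(h(T,P) + \varepsilon/2)N} + \frac{\varepsilon}{8}\mu(A) = 2^{-\varepsilon N/4} + \frac{\varepsilon}{8}\mu(A).$$
Once $N > -4\log_2(\varepsilon^2/8)/\varepsilon$, $\mu(B) < \varepsilon\mu(A)/4$. This gives us both estimates outside a subset of $A$ of measure $\le \varepsilon\mu(A)/2$. •

Theorem 7.15 Suppose $(X, P)$ is an ergodic process and $\varepsilon > 0$. Once $N$ is large enough, for any tower $F, T(F), \dots, T^{N-1}(F)$ with
(1) $\mu\Big(\bigcup_{i=0}^{N-1} T^i(F)\Big) > \varepsilon$,
we can find a set $I$ of $P,N$-names so that
(2) $\mu\Big(\bigcup_{p \in I} F_p\Big) > (1 - \varepsilon)\mu(F)$; and
(3) for $p \in I$, $\left| -\frac{1}{N}\log_2\big(\mu(F_p)/\mu(F)\big) - h(T, P) \right| < \varepsilon$.

Proof As in Lemma 7.14, we establish upper and lower estimates separately. Since $\mu(F) > \varepsilon/N$,
$$-\frac{\log_2 \mu(F)}{N} < -\frac{\log_2(\varepsilon)}{N} + \frac{\log_2(N)}{N} < \frac{\varepsilon}{2}$$
once $N$ is large enough, so it is enough to show
$$\left| -\frac{1}{N}\log_2(\mu(F_p)) - h(T, P) \right| < \frac{\varepsilon}{2}. \tag{7.11}$$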
Consider the two sets $A^{(+)} = \bigcup_{i=0}^{[\varepsilon N/8]} T^i(F)$ and $A^{(-)} = \bigcup_{i=-[\varepsilon N/8]}^{0} T^i(F)$. Both have measure at least $\varepsilon^2/8$, independent of $N$. For the upper estimate, by Lemma 7.14, if $N$ is large enough, a subset $A_0^{(+)}$ of all but a fraction $\varepsilon/4$ of $A^{(+)}$ is covered by $T,P,[N(1 - \varepsilon/8)] + 1$-names whose intersections with $A^{(+)}$ have measure at most
$$2^{-(h(T,P) - \varepsilon/4)([N(1 - \varepsilon/8)] + 1)}.$$
For any $x \in F$ and $0 \le i \le [\varepsilon N/8]$,
$$T^i(p_N(x) \cap F) \subseteq \big(p_{[N(1 - \varepsilon/8)] + 1}(T^i(x)) \cap A^{(+)}\big).$$
Thus if $T^i(x) \in A_0^{(+)}$ then $\mu(F_{p_N(x)}) = \mu(p_N(x) \cap F) \le 2^{-(h(T,P) - \varepsilon/2)N}$. Such $x$ must cover all but a fraction $\varepsilon/4$ of $F$.

For the lower estimate, again using Lemma 7.14, if $N$ is large enough there is a subset $A_0^{(-)}$ of all but a fraction $\varepsilon/16$ of $A^{(-)}$, covered by at most $2^{(h(T,P) + \varepsilon/16)([N(1 + \varepsilon/8)] + 1)}$ $T,P,[N(1 + \varepsilon/8)] + 1$-names. Let $F_0 \subseteq F$ be those points with $T^{-i}(x) \in A_0^{(-)}$ for some $0 \le i \le [\varepsilon N/8]$. Thus $\mu(F_0) \ge (1 - \varepsilon/16)\mu(F)$. Each $T,P,[N(1 + \varepsilon/8)] + 1$-name in $A_0^{(-)}$ gives rise to a possible $[\varepsilon N/8] + 1$ different $T,P,N$-names in $F_0$, one for each $0 \le i \le [\varepsilon N/8]$. Thus $F_0$ is covered by at most $([\varepsilon N/8] + 1)\,2^{(h(T,P) + 3\varepsilon/16)(N + 1)}$ $T,P,N$-names. Be sure $N$ is large enough that this is less than $2^{(h(T,P) + \varepsilon/4)N}$. What is the measure of the set of $x \in F_0$ with $\mu(p_N(x) \cap F) \le 2^{-(h(T,P) + \varepsilon/2)N}$? At most
$$2^{(h(T,P) + \varepsilon/4)N - (h(T,P) + \varepsilon/2)N} = 2^{-\varepsilon N/4}.$$
Since $\mu(F) > \varepsilon/N$ decays only polynomially in $N$, we can be sure $N$ is large enough that
$$2^{-\varepsilon N/4} \le \frac{\varepsilon}{4}\mu(F).$$
Thus for all but a fraction $\varepsilon/2$ of $F$ we get the lower estimate and we are done.
•

The next exercise will in fact be used later in the proof of the isomorphism theorem.

Exercise 7.8 In Theorem 7.12 we saw entropy could be perturbed into a process. Here we show it can be perturbed out. Explicitly, suppose $(X, Q)$ is an ergodic non-periodic process and $1 > \alpha > \varepsilon > 0$. Show there is a partition $Q_0$ with
(1) $\mu(Q \,\triangle\, Q_0) < \alpha$; and
(2) $h(T, Q_0) < (1 - \alpha)h(T, Q) + \varepsilon$.

Hint: Construct a sequence of partitions $Q_N$ by building a Rohlin tower of height $N$, covering all but $\varepsilon/N$ of $X$. Split the base $F = F_1 \cup F_2$ where $\mu(F_1) = \alpha\mu(F)$. Repaint the tower over $F_1$ to be all in $q_1$. Use Theorems 7.4 and 7.10 to show $\nu(X, Q_N) = \nu_N \in \eta_Q$ converges weak* to $\alpha\nu_0 + (1 - \alpha)\nu(X, Q)$ where $\nu_0$ is a point mass on the sequence of all $q_1$'s. Use Lemma 7.1 to complete things.

We are now prepared to prove Ornstein's fundamental lemma. We will in fact prove two forms, dividing the argument first into a painting project and second a perturbing to gain entropy. It is historically incorrect to call either of these Ornstein's fundamental lemma (see Shields 1973). Embodied in them though is Ornstein's original 'copying' or 'painting' argument, which remains the core insight to the proofs of the Krieger and Ornstein results. The structure of these theorems though has been greatly refined and clarified, most particularly by Kieffer, and Burton and Rothstein. We are presenting this refined version.

Suppose $(X_1, P)$ and $(X_2, Q)$ are two ergodic processes, and $\hat\mu$ is an ergodic joining. The question the fundamental lemma addresses is when it is possible to perturb $\hat\mu$ only slightly weak* so that $P$ becomes approximately $X_2$ measurable and $Q$ becomes approximately $X_1$ measurable. There will obviously be entropy constraints. Furthermore, both processes cannot remain fixed. One of them, $(X_1, P)$, must also be perturbed.

Definition 7.6 We say a partition $P$ is $\mu,\varepsilon$-contained in a $\sigma$-algebra $\mathcal{A}$, written $P \overset{\varepsilon}{\subset}_{\mu} \mathcal{A}$, if there is a partition $P_1 \subseteq \mathcal{A}$ with $\mu(P \,\triangle\, P_1) < \varepsilon$.
Lemma 7.16 Suppose $(X, P)$ is an ergodic, non-atomic process. For any $\varepsilon > 0$ there is an $N_0$ and partition $P'$ of $X$ with
(1) $\mu(P \,\triangle\, P') < \varepsilon$; and
(2) $\mu(\{x : P'(T^i(x)) = p_1 \text{ for } i = 0, 1, \dots, N_0 - 1\}) = 0$.

Proof As $(X, P)$ is non-atomic, $\lim_{N \to \infty} \mu(\{x : P(T^i(x)) = p_1 \text{ for } i = 0, 1, \dots, N - 1\}) = 0$. Choose $N_0$ so that this measure is less than $\varepsilon$. Let
$$P'(x) = \begin{cases} P(x) & \text{if } P(T^i(x)),\ i = 0, 1, \dots, N_0 - 1, \text{ are not all } p_1, \\ p_2 & \text{if they are.} \end{cases}$$
•
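The repainting in Lemma 7.16 is easy to try on a finite string of labels. The sketch below is an illustration under assumptions: a finite sequence stands in for an orbit, windows that run off the end of the data are left untouched, and the helper names `repaint` and `has_run` are hypothetical.

```python
def repaint(seq, p1, p2, n0):
    """Relabel position t as p2 whenever seq[t], ..., seq[t+n0-1] are all p1.

    Decisions are made from the ORIGINAL sequence, as in the definition of P'.
    """
    out = list(seq)
    for t in range(len(seq) - n0 + 1):
        if all(s == p1 for s in seq[t:t + n0]):
            out[t] = p2
    return out


def has_run(seq, symbol, length):
    """True if seq contains `length` consecutive copies of `symbol`."""
    run = 0
    for s in seq:
        run = run + 1 if s == symbol else 0
        if run >= length:
            return True
    return False


original = list("1111121111")
painted = repaint(original, p1="1", p2="2", n0=3)
print("".join(painted))
print(has_run(painted, "1", 3))
```

In this toy string a large share of the positions is changed; in the lemma the changed set has measure less than $\varepsilon$ because $N_0$ is chosen so that long runs of $p_1$ are rare.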
Theorem 7.17 (Ornstein's fundamental lemma, first form) Suppose $(X_1, P)$ and $(X_2, Q)$ are non-periodic ergodic processes, and $h(T_1, P) > h(T_2, Q)$. Further, suppose $\hat\mu \in J(X_1, X_2)$ is ergodic and $\varepsilon > 0$. There is, then, a partition $\bar P$ of $X_2$ so that
(1) $\|(X_1 \times X_2, \hat\mu), P \times Q;\; X_2, \bar P \vee Q\| < \varepsilon$; and
(2) $Q \overset{\varepsilon}{\subset}_{\mu_2} \bigvee_{i=-\infty}^{\infty} T_2^{-i}(\bar P)$. (7.12)
Proof Using Lemma 7.16, Lemma 5.10 and Theorem 7.7, we can assume without loss of generality that for some $N_0$, $\mu_1(\{x : P(T_1^i(x)) = p_1 \text{ for } i = 0, 1, \dots, N_0 - 1\}) = 0$. We begin by applying Theorems 7.13 and 7.15 to our joined process $((X_1 \times X_2, \hat\mu), P \times Q)$ and its two coordinate processes $(X_1, P)$ and $(X_2, Q)$. Let $h(T_1, P) - h(T_2, Q) > \alpha > 0$ where $\alpha \le 1$. Be sure $N$ is so large that for any tower in the joined process of height $N$, covering at least $\varepsilon\alpha/16$ of $X_1 \times X_2$, we have
(1) All but a fraction $\varepsilon\alpha/16$ of the $P \times Q, N$-names in the base of the tower are $\varepsilon\alpha/16$-generic for $\hat\mu$. (7.13)
Note: if a $P \times Q, N$-name is $\varepsilon\alpha/16$-generic for $\hat\mu$, then the coordinate $P,N$-names and $Q,N$-names are $\varepsilon\alpha/16$-generic for $\mu_1$ and $\mu_2$ respectively.
(2) All but a fraction $\varepsilon\alpha/16$ of the points $(x_1, x_2) \in \hat F$ satisfy
(a) $\left| -\frac{1}{N}\log_2\big(\hat\mu(\hat F_{p_N(x_1)})/\hat\mu(\hat F)\big) - h(T_1, P) \right| < \frac{\varepsilon\alpha}{16}$,
(b) $\left| -\frac{1}{N}\log_2\big(\hat\mu(\hat F_{q_N(x_2)})/\hat\mu(\hat F)\big) - h(T_2, Q) \right| < \frac{\varepsilon\alpha}{16}$. (7.14)

We also require that $N > (1/\alpha)\log_2(16/\varepsilon)$, and $N > 32(N_0 + 2)/\varepsilon$. Apply the Rohlin lemma (Theorem 3.10) to construct a tower in $X_2$ of height $N + N_0 + 2$ with base set $F$, covering all but $\varepsilon/32$ of $X_2$. For now we work only on the first $N$ levels of the tower, $F, T_2(F), \dots, T_2^{N-1}(F)$. This covers all but $\varepsilon/16$ of $X_2$. Let $\hat F = X_1 \times F$, a base in $X_1 \times X_2$. Let $\hat F_0 \subseteq \hat F$ consist of those $(x_1, x_2)$ with
(1) the $P \times Q, N$-name of $(x_1, x_2)$ is $\varepsilon\alpha/16$-generic for $\hat\mu$; and (7.15)
(2) (a) $\left| -\frac{1}{N}\log_2\big(\hat\mu(\hat F_{p_N(x_1)})/\hat\mu(\hat F)\big) - h(T_1, P) \right| < \frac{\varepsilon\alpha}{16}$; and
(b) $\left| -\frac{1}{N}\log_2\big(\hat\mu(\hat F_{q_N(x_2)})/\hat\mu(\hat F)\big) - h(T_2, Q) \right| < \frac{\varepsilon\alpha}{16}$. (7.16)
We know $\hat\mu(\hat F_0) > (1 - 3\varepsilon\alpha/16)\hat\mu(\hat F)$. Let
$$J = \{q = q_N(x_2) : \text{there is an } x_1 \text{ with } (x_1, x_2) \in \hat F_0\},$$
and
$$I = \{p = p_N(x_1) : \text{there is an } x_2 \text{ with } (x_1, x_2) \in \hat F_0\}.$$
To any element $q \in J$ we can associate $I(q) \subseteq I$ where
$$I(q) = \{p \in I : \text{for some } (x_1, x_2) \in \hat F_0,\ p_N(x_1) = p \text{ and } q_N(x_2) = q\}. \tag{7.17}$$
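We want to paint the tower with base $F_q$ with a name $p$ from $I(q)$. This will make the pair $(p, q)$ $\varepsilon\alpha/16$-generic for $\hat\mu$. To obtain (2) of (7.12), though, we want, to as great a degree as possible, $p$ to be unique to $q$. To be more specific, consider $\Phi = \{\varphi : J_0 \to I;\ J_0 \subseteq J,\ \varphi \text{ is 1-1 and } \varphi(q) \in I(q)\}$. We partially order $\Phi$ by $\varphi_1 \prec \varphi_0$ if $\mathrm{Dom}(\varphi_1) \subseteq \mathrm{Dom}(\varphi_0)$. Let $\varphi$ be a maximal element of $\Phi$. Let us compute
$$\hat\mu(\{(x_1, x_2) \in \hat F : q_N(x_2) \in \mathrm{Dom}(\varphi)\}).$$
Notice that for any $q \in J$ and $p \in I$
$$\frac{\hat\mu(\hat F_p)}{\hat\mu(\hat F_q)} < 2^{-(\alpha - \varepsilon\alpha/8)N} < 2^{-(7/8)\alpha N} < \frac{\varepsilon}{16}. \tag{7.18}$$
Thus
$$\hat\mu(\{(x_1, x_2) \in \hat F_0 : p_N(x_1) \in \mathrm{Range}(\varphi)\}) < \frac{\varepsilon}{16}\hat\mu(\hat F).$$
Since $\varphi$ is maximal, $\mathrm{Range}(\varphi) \supseteq \bigcup_{q \notin \mathrm{Dom}(\varphi)} I(q)$ as otherwise $\varphi$ could be extended to some $q \notin \mathrm{Dom}(\varphi)$ by assigning $\varphi(q) = p \in I(q) \setminus \mathrm{Range}(\varphi)$. Thus
$$\hat\mu(\{(x_1, x_2) \in \hat F : q_N(x_2) \in \mathrm{Dom}(\varphi)\}) \ge \hat\mu(\hat F_0) - \hat\mu(\{(x_1, x_2) \in \hat F_0 : q_N(x_2) \notin \mathrm{Dom}(\varphi)\})$$
$$\ge \hat\mu(\hat F_0) - \hat\mu(\{(x_1, x_2) \in \hat F_0 : p_N(x_1) \in \mathrm{Range}(\varphi)\}) \ge \hat\mu(\hat F_0) - \frac{\varepsilon}{16}\hat\mu(\hat F) > \hat\mu(\hat F)\Big(1 - \frac{\varepsilon}{4}\Big). \tag{7.19}$$
Now for $q \in \mathrm{Dom}(\varphi)$ paint the name $\varphi(q)$ on the tower $F_q, T_2(F_q), \dots, T_2^{N-1}(F_q)$. For $x \in F_q$, $q \notin \mathrm{Dom}(\varphi)$, paint $\bar P(x)$ arbitrarily. Remember we have $N_0 + 2$ further levels $T_2^N(F), \dots, T_2^{N + N_0 + 1}(F)$ to paint. Paint these with the name $p_2$ followed by $N_0$ $p_1$'s followed by a $p_2$. This name never occurs as a name $\varphi(q)$ as $N_0$ consecutive $p_1$'s never occur in a $T_1,P$-name. On the remainder of $X_2$, let $\bar P$ be arbitrary. This constructs the partition $\bar P$ of $X_2$.

To verify (1) of (7.12), notice that by (7.19) the towers over $F_q$, $q \in \mathrm{Dom}(\varphi)$, cover all but $5\varepsilon/16$ of $X_2$. We can regard each such tower as painted by a double name $(\varphi(q), q)$ which we know to be $\varepsilon\alpha/16$-generic for $\hat\mu$. By Theorem 7.4,
$$\|(X_1 \times X_2, \hat\mu), P \times Q;\; X_2, \bar P \vee Q\| < \varepsilon.$$
For (2) of (7.12), we show how to approximate $Q$ in $\bigvee_{i=-\infty}^{\infty} T_2^{-i}(\bar P)$. For a point $x \in X_2$ examine the name
$$\bar P(x), \bar P(T_2(x)), \dots, \bar P(T_2^{N + N_0 + 1}(x))$$
for an occurrence of the name $p_2, p_1, \dots, p_1, p_2$. If $x \in T_2^j(F)$, $0 \le j < N$, this will occur at precisely
$$\bar P(T_2^{N - j}(x)), \dots, \bar P(T_2^{N + N_0 + 1 - j}(x)). \tag{7.20}$$
The name
$$\bar p(j, x) = \bar P(T_2^{-j}(x)), \dots, \bar P(T_2^{N - 1 - j}(x))$$
will be in $\mathrm{Range}(\varphi)$ and $Q(x) = \varphi^{-1}(\bar p(j, x))_j$. Thus in $\bigvee_{i=-(N + N_0 + 1)}^{N + N_0 + 1} T_2^{-i}(\bar P)$ is a partition $\bar Q$ which agrees with $Q$ on $\bigcup_{q \in \mathrm{Dom}(\varphi)} \bigcup_{j=0}^{N-1} T_2^j(F_q)$, i.e., $\mu_2(Q \,\triangle\, \bar Q) < 5\varepsilon/16 < \varepsilon$.
•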
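For finite sets of names, a maximal element of $\Phi$ can be produced by a simple greedy pass. The sketch below is one possible construction and not necessarily the one intended in the proof, which only uses maximality; the function name `maximal_injection` and the toy candidate sets are hypothetical.

```python
def maximal_injection(candidates):
    """Build an injective map phi with phi[q] in candidates[q], maximal by domain.

    candidates : dict mapping each q-name to a list of admissible p-names I(q)
    A q is left out of the domain only if every name in I(q) is already used,
    so at the end Range(phi) contains I(q) for each q outside Dom(phi).
    """
    phi = {}
    used = set()
    for q, options in candidates.items():
        for p in options:
            if p not in used:
                phi[q] = p
                used.add(p)
                break
    return phi


# Hypothetical admissible sets I(q) for three q-names.
candidates = {"q1": ["a", "b"], "q2": ["a"], "q3": ["a", "b"]}
phi = maximal_injection(candidates)
print(phi)
```

The entropy gap is what makes the leftover names negligible in the proof: each $p$-column is far thinner than any $q$-column, so the columns whose names lie in the range of $\varphi$ carry little mass.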
Lemma 7.18 (Ornstein's fundamental lemma, second form) Suppose $(X_1, P_1)$ and $(X_2, Q)$ are non-periodic ergodic processes and suppose $h(T_1, P_1) > h(T_2)$. Further, suppose $\hat\mu \in J(X_1, X_2)$ is an ergodic joining and $\varepsilon > 0$. There is, then, a non-periodic ergodic process $(X_3, P_3)$ with $h(T_3, P_3) > h(T_2)$. Further, there is an ergodic joining $\hat\mu_1 \in J(X_3, X_2)$ so that
(1) $\|(X_1 \times X_2, \hat\mu), P_1 \times Q;\; (X_3 \times X_2, \hat\mu_1), P_3 \times Q\| < \varepsilon$,
(2) $P_3 \times X_2 \overset{\varepsilon}{\subset}_{\hat\mu_1} X_3 \times \mathcal{F}_2$, and
(3) $X_3 \times Q \overset{\varepsilon}{\subset}_{\hat\mu_1} \mathcal{F}_3 \times X_2$. (7.21)
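Proof First notice how this result differs from the first form. Here we suppose $h(T_1, P_1) > h(T_2)$, not just greater than $h(T_2, Q)$. In Theorem 7.17, we constructed $\bar P$ inside the $X_2$ process. Here we only get $(X_3, P_3)$ joined to $(X_2, Q)$ with approximate containment. The critical new fact is
$$h(T_3, P_3) > h(T_2).$$
We will gain this entropy using Theorem 7.12. Notice that
$$h(T_2) < h(T_1, P_1) \le \log_2(n).$$
Let $\log_2(n) - h(T_2) > \alpha > 0$, $\alpha \le 1$. Also notice that if we obtain (7.21) for a refinement of $Q$, we automatically get it for $Q$ itself. Select $\bar\varepsilon$ so that (7.22) holds. By our remark above we can assume, without loss of generality, that
$$h(T_2) - h(T_2, Q) \le \frac{\bar\varepsilon\alpha}{8}. \tag{7.23}$$
Use Theorem 7.17 with error $\bar\varepsilon/2$. This gives us a partition $\bar P$ of $X_2$ so that
(1) $\|(X_1 \times X_2, \hat\mu), P_1 \times Q;\; X_2, \bar P \vee Q\| < \frac{\bar\varepsilon}{2}$, and
(2) $Q \overset{\bar\varepsilon/2}{\subset}_{\mu_2} \bigvee_{i=-\infty}^{\infty} T_2^{-i}(\bar P)$. (7.24)
Let $\nu_1 = \nu(X_2, \bar P) \in \eta_P$. If we apply Theorem 7.12, with error $\bar\varepsilon/4$, for any $N$, there is a $\nu_2 \in \eta_P$ with
(1) $h_{\nu_2}(S) \ge h_{\nu_1}(S) + \frac{\bar\varepsilon}{4}(\log_2(n) - h_{\nu_1}(S))$; and
(2) $\bar d^N(\nu_1, \nu_2) \le \frac{\bar\varepsilon}{2}$. (7.25)
As $\nu_1$ is ergodic, we may assume $\nu_2$ is also. How do we choose the parameter $N$? We ask two things of $N$: first, $N > \log_2(4/\bar\varepsilon)$, and second, as $Q \overset{\bar\varepsilon/2}{\subset}_{\mu_2} \bigvee_{i=-\infty}^{\infty} T_2^{-i}(\bar P)$, we ask that $Q \overset{\bar\varepsilon/2}{\subset}_{\mu_2} \bigvee_{i=-N}^{N} T_2^{-i}(\bar P)$. With this choice for $N$, let $(X_3, P_3) = ((Y_P, \nu_2), P)$ and $\hat\mu_1$ be an ergodic joining of $(X_3, P_3)$ and $(X_2, \bar P)$ that achieves the $\bar d^N$-distance. Notice that as (7.25) tells us
$$\bar d^N\big(X_2, \bar P \vee Q;\; (X_3 \times X_2, \hat\mu_1), P_3 \times Q\big) < \frac{\bar\varepsilon}{2} < \frac{\varepsilon}{2},$$
and $N > \log_2(4/\bar\varepsilon)$, Exercise 7.7 tells us
$$\|X_2, \bar P \vee Q;\; (X_3 \times X_2, \hat\mu_1), P_3 \times Q\| < \varepsilon$$
and we obtain (1) of (7.21). To see (2) of (7.21), since $\bar P \subseteq \mathcal{F}_2$ and $\hat\mu_1(P_3 \times X_2 \,\triangle\, X_3 \times \bar P) < \bar\varepsilon/2 < \varepsilon$, we have $P_3 \times X_2 \overset{\varepsilon}{\subset}_{\hat\mu_1} X_3 \times \mathcal{F}_2$. Conclusion (3) of (7.21) is slightly more delicate. We know
$$Q \overset{\bar\varepsilon/2}{\subset}_{\mu_2} \bigvee_{i=-N}^{N} T_2^{-i}(\bar P), \tag{7.26}$$
i.e., there is a partition $Q_0 \subseteq \bigvee_{i=-N}^{N} T_2^{-i}(\bar P)$ with $\mu_2(Q \,\triangle\, Q_0) < \bar\varepsilon/2$. As
$$\hat\mu_1\Big(\bigvee_{i=-N}^{N} T_3^{-i}(P_3) \times X_2 \,\triangle\, X_3 \times \bigvee_{i=-N}^{N} T_2^{-i}(\bar P)\Big) < \frac{\bar\varepsilon}{2},$$
in $\bigvee_{i=-N}^{N} T_3^{-i}(P_3)$ we can define $Q_1$ to be the union of precisely the same $P$-names as form $Q_0$ in $\bigvee_{i=-N}^{N} T_2^{-i}(\bar P)$. This gives $\hat\mu_1(Q_1 \times X_2 \,\triangle\, X_3 \times Q_0) < \bar\varepsilon/2$ and so $X_3 \times Q \overset{\varepsilon}{\subset}_{\hat\mu_1} \mathcal{F}_3 \times X_2$. •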
7.5 Krieger's finite generator theorem

Theorem 7.19 (Krieger's finite generator theorem (Krieger 1970)) Suppose $X$ is ergodic and $h(T) < \log_2(n)$. There is then a partition $P$ of $X$ into $n$ sets with $\mathcal{F} = \bigvee_{i=-\infty}^{\infty} T^{-i}(P)$.
Before we prove this, some remarks are in order. First, notice the entropy bound $h(T) < \log_2(n)$ is necessary as the only measure $\nu \in \eta_P$ of entropy $\log_2(n)$ is the full $n$-shift, which at the very least is a K-system. If $X$ has entropy $\log_2(n)$ but is not a K-system, then certainly it cannot have an $n$-set generator. Second, our proof is not Krieger's original, but is due to Burton and Rothstein who first noticed the deep connection between the Ornstein and Krieger theorems. This connection is so natural that the fundamental lemma designed to complete Ornstein's theorem is in fact more easily applied to prove the Krieger result.

Notice that by Theorem 6.8 the conclusion of Krieger's theorem is equivalent to saying there is a joining $\hat\mu \in J((Y_P, \nu), X)$ so that
(1) $P \times X \subset_{\hat\mu} Y_P \times \mathcal{F}$; and
(2) $Y_P \times \mathcal{F} \subset_{\hat\mu} \mathcal{B} \times X$. (7.27)
Letting $\{Q_i\}$ be a refining and generating tree of partitions in $X$, with $T(Q_i) \vee T^{-1}(Q_i) \subseteq Q_{i+1}$, (2) of (7.27) is equivalent to
$$Y_P \times Q_i \subset_{\hat\mu} \mathcal{B} \times X \quad \text{for all } i. \tag{7.28}$$
Conditions (1) and (7.28) look suspiciously similar to the conclusion of Lemma 7.18. Our first step is to establish the space in which these joinings sit. Let $\hat J$ consist of all joinings of $(Y_P, \nu)$ and $X$, where $\nu \in \eta_P$. This space is half-way between a purely symbolic space like $\eta_P$ and a joining space. The first coordinate is symbolic, and its marginal measure arbitrary. The second coordinate is fixed. There is a natural metric topology on $\hat J$, again a weak* topology, given by the metric $\|\cdot, \cdot\|$ built from the generating tree of partitions $H_n = P_n \times Q_n$.

Lemma 7.20 The metric space $(\hat J, \|\cdot, \cdot\|)$ is compact and convex.

Proof Convexity is obvious. For compactness, notice that $H_n = P_n \times Q_n$ form a generating tree of partitions. Any additive set function $\hat\mu_0$ on this tree which agrees with $\mu$ on its second marginal will extend to a measure in $\hat J$. This is because the only possible empty chains in the tree must be empty in the $Q_n$ tree. The additive set functions are clearly $\|\cdot, \cdot\|$ compact. •

Definition 7.7 Let $\bar J$ be the weak* closure of those $\hat\mu \in \hat J$ with
(1) $\hat\mu$ ergodic; and
(2) $h_\nu(S) > h(T)$ for $\nu$ the first marginal of $\hat\mu$.

Corollary 7.21 $\bar J$ is a non-empty, compact space.

Proof Let $(Y_P, \nu_0)$ be the full $n$-shift, $(n^{-1}, n^{-1}, \dots, n^{-1})$, and $\hat\mu = \nu_0 \times \mu$. As $(Y_P, \nu_0)$ is weakly mixing, $\hat\mu$ is ergodic. As $h_{\nu_0}(S) = \log_2(n) > h(T)$, $\hat\mu \in \bar J$. •

It is a very delicate task to identify $\bar J$ more precisely. It is not necessarily all elements of $\hat J$ whose first coordinate $\nu \in \eta_P$ has $h_\nu(S) \ge h(T)$, although it is contained in this set. The ergodic elements of $\hat J$ may not be dense in it. The precise nature of $\bar J$ depends delicately on $X$. What we will show is that the $\hat\mu \in \bar J$ satisfying our earlier conditions (7.27) and (7.28) are a dense $G_\delta$. Let
$$\mathcal{O}(n) = \Big\{\hat\mu \in \bar J : P \times X \overset{1/n}{\subset}_{\hat\mu} Y_P \times \mathcal{F} \text{ and } Y_P \times Q_n \overset{1/n}{\subset}_{\hat\mu} \mathcal{B} \times X\Big\}.$$
It is easy to see that any $\hat\mu \in \bigcap_n \mathcal{O}(n)$ will satisfy (7.27) and (7.28).
Proof of Theorem 7.19 What we will show is that the sets $\mathcal{O}(n)$ are open and dense in $\bar J$. The Baire category theorem finishes the result, telling us $\bigcap_n \mathcal{O}(n)$ is a dense $G_\delta$ in $\bar J$.

To see that $\mathcal{O}(n)$ is open, notice that if $\hat\mu \in \mathcal{O}(n)$, then for some $N$ large enough there are partitions $P' \subseteq Q_N$ and $Q_n' \subseteq P_N$ with
$$\hat\mu(Y_P \times P' \,\triangle\, P \times X) < \frac{1}{n}$$
and
$$\hat\mu(Q_n' \times X \,\triangle\, Y_P \times Q_n) < \frac{1}{n}.$$
These are strict inequalities and all the sets in these expressions are finite unions of elements of our tree. Thus for some $\delta > 0$, if $\|\hat\mu', \hat\mu\| < \delta$, these inequalities still hold with $\hat\mu$ replaced by $\hat\mu'$. Hence $\hat\mu' \in \mathcal{O}(n)$ and it is open.

To show $\mathcal{O}(n)$ is dense, choose $\hat\mu \in \bar J$. We can assume $\hat\mu$ is ergodic and $h_\nu(S) > h(T)$ as such $\hat\mu$ are dense in $\bar J$. Apply Lemma 7.18 with $P_1 = P$, $Q = Q_n$, $\hat\mu$ as given, and any $\varepsilon \le 1/n$. We get a $\hat\mu_1$ with
(1) $\|\hat\mu, \hat\mu_1\| < \varepsilon$;
(2) $P \times X \overset{1/n}{\subset}_{\hat\mu_1} Y_P \times \mathcal{F}$, and
(3) $Y_P \times Q_n \overset{1/n}{\subset}_{\hat\mu_1} \mathcal{B} \times X$. (7.31)
As $\hat\mu_1$ is ergodic, and $\hat\mu_1$'s first marginal $\nu_1$ satisfies $h_{\nu_1}(S) > h(T)$, $\hat\mu_1 \in \mathcal{O}(n)$. Hence $\mathcal{O}(n)$ is dense in $\bar J$. •

Corollary 7.22 For any $X$ with $h(T) < \log(n)$ there is a measure $\nu \in \eta_P$ with $(Y_P, \nu)$ isomorphic to $X$.
•
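Krieger's theorem ties the size of a generator to entropy: an $n$-set generator exists as soon as $h(T) < \log_2(n)$. As a small numerical companion, the sketch below finds the least such $n$ for a given entropy in bits. The function name `smallest_generator_size` is hypothetical, and the two correction loops only guard against floating-point rounding.

```python
import math


def smallest_generator_size(h):
    """Smallest integer n with h < log2(n), for an entropy h >= 0 given in bits."""
    if h < 0:
        raise ValueError("entropy must be non-negative")
    n = math.floor(2 ** h) + 1
    # Correct for rounding in 2**h: enforce h < log2(n) and minimality of n.
    while math.log2(n) <= h:
        n += 1
    while n > 1 and math.log2(n - 1) > h:
        n -= 1
    return n


for h in (0.0, 1.0, 1.5, 2.0):
    print(h, smallest_generator_size(h))
```

Note the strict inequality: a system of entropy exactly 1 bit is only guaranteed a 3-set generator by the theorem, which matches the remark that entropy $\log_2(n)$ with an $n$-set generator forces the full $n$-shift.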
Exercise 7.9 Suppose $X$ has entropy less than $\log(n)$ and for $\nu \in \eta_P$, $h_\nu(S) \ge h(T)$. Given any $\varepsilon > 0$, show that there is a generating partition $P$ of $X$ with
$$\|X, P;\; (Y_P, \nu), P\| < \varepsilon.$$
Note: $\nu$ is not assumed ergodic. This shows that in $\{\nu \in \eta_P;\ h_\nu(S) \ge h(T)\}$, the set of measures isomorphic to $X$ are dense. They of course cannot be a $G_\delta$, as any two such isomorphism classes are either disjoint or equal.
7.6
Ornstein's isomorphism theorem
Ornstein's theorem is much more than just that two Bernoulli shifts of equal entropy are isomorphic. It, in fact, identifies a certain property which allows for the proof of the isomorphism theorem. Bernoulli shifts happen to satisfy it. This property, and a rather long list of derivative properties, characterize those processes isomorphic to Bernoulli shifts. Our intention here is to prove the isomorphism theorem. Hence we will not delve too deeply into the world of such processes. The bibliography will direct the interested reader to sources of this material. We begin with its definition.
Definition 7.8 We say an ergodic process $(X, P)$ is finitely determined if for any $\varepsilon > 0$, there is a $\delta > 0$ so that if $(X_1, P_1)$ is ergodic and
(1) $\|X, P;\; X_1, P_1\| < \delta$; and
(2) $h(T_1, P_1) > h(T, P) - \delta$; (7.32)
then
(3) $\bar d(X, P;\; X_1, P_1) < \varepsilon$. (7.33)
To complete the isomorphism theorem we will show first that any two finitely determined processes of equal entropy are isomorphic, and second that Bernoulli shifts are finitely determined. We will show a little more. In fact, we will see all mixing Markov chains are finitely determined, hence any two such of equal entropy are isomorphic. First a small technical fact.

Lemma 7.23 If $(X, P)$ is finitely determined, and non-trivial, then $h(T, P) > 0$.

Proof By non-trivial we mean $P$ consists of more than one set of positive measure. Suppose $h(T, P) = 0$. For any $0 < \delta < \frac{1}{2}\sum_i \mu(p_i)\mu(p_i^c) = a$, use Theorem 7.5 to find $(X_1, P_1)$, a K-system with
(1) $\|X, P;\; X_1, P_1\| < \delta$; and
(2) $h(T_1, P_1) < \delta$.
$X$ and $X_1$ are disjoint, so the only joining is the product measure and
$$\bar d(X, P;\; X_1, P_1) = \sum_{i=1}^{n} \mu(p_i)\mu_1(p_i^c) \ge \sum_{i=1}^{n} \mu(p_i)\mu(p_i^c) - \delta > \frac{1}{2}\sum_{i=1}^{n} \mu(p_i)\mu(p_i^c) = a > 0.$$
Hence $(X, P)$ cannot be finitely determined.
•
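The mismatch probability under a product joining, which drives the proof of Lemma 7.23, is a one-line computation. The sketch below assumes that when two systems are disjoint the only joining is the product measure, so that the time-zero partitions disagree with probability $\sum_i \mu(p_i)(1 - \mu_1(p_i))$; the function name `independent_mismatch` and the sample distributions are hypothetical.

```python
def independent_mismatch(dist_a, dist_b):
    """Probability that two independent labels, drawn from dist_a and dist_b, differ.

    Equals sum_i a_i * (1 - b_i) = 1 - sum_i a_i * b_i for probability vectors.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("distributions must have the same number of entries")
    return sum(a * (1.0 - b) for a, b in zip(dist_a, dist_b))


print(independent_mismatch([0.5, 0.5], [0.5, 0.5]))
print(independent_mismatch([0.9, 0.1], [0.9, 0.1]))
print(independent_mismatch([1.0, 0.0], [1.0, 0.0]))
```

The last call shows why non-triviality is needed: for a one-set partition the product joining produces no mismatches at all.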
Let $(X_1, P)$ be finitely determined, and $(X_2, Q)$ another ergodic process with
$$h(T_1, P) = h(T_2, Q).$$
We assume $P$ and $Q$ are generators of their respective processes. Let $\bar J$ be the weak* closure of the ergodic joinings in $J(X_1, X_2)$. $\bar J$ is not empty, as it is the closure of the extreme points of $J(X_1, X_2)$. In $\bar J$ we use the choice $(P_n \times Q_n)$ for the generating tree defining the weak* metric.

Theorem 7.24 Those elements $\hat\mu \in \bar J$ with $P \times X_2 \subset_{\hat\mu} X_1 \times \mathcal{F}_2$ are a dense $G_\delta$.
(7.34) The proofthat (!'I(n) is open follows the same lines as that part of Theorem 7.19. To show denseness, we work as follows. Let {l E l, and 1/2n > e > 0 be given. We want to find Itl E ((,l(n) with IIIt,ltlll < e. We may, without loss of generality, assume {l is ergodic. We know by Theorem 7.7 that if ~~
~
~
~~
~
~
~
e
d«Xl x X 2 ,jil)'P x Q;(X I x X 2,ji2)'P x Q) < E = 410g(2/e)
+4
then n(X I x X2 ,jil)'P x Q;(X I x X2,ji2)'P x QII < e. Using E/4 in the definition of finitely determined for (Xl,P), we obtain a c5 > O. We assume
c5 < E/4 < e/4. As a first step, combining Lemma 5.10 and Exercise 7.8, we can find a partition Ql of X 2 with
and
Now h(TI,P) > h(T2 ,Qd
and {l can be regarded as an ergodic joining of(XI,P) and (X 2 , Qd. By Lemma 7.18 there is an ergodic process (X 3 , P3 ) and an ergodic joining {lo E J(X 3 , X2 ) with (1) II(X I x X2,ji),P x QI;(X3 x (2)
P3 x
1/211
X 2 c: Xl x Po
X2,fld,P3 x Qlll < c5; and
Vi=-oo T2- i(QI); most importantly, (7.35)
By (I) of (7.35), (1) IIX I x P;X3 x P3 11 < c5 and by (3), (2) h(T3 ,P3 ) > h(T,P) - c5. By our choice of c5, there is an ergodic ji E J(X I ,X3 ) with J.I.(P
~
X
-
E
X 3 AX I x P3 )<4.
The Krieger and Ornstein theorems
I
155
Let ji be almost any ergodic component of the relatively indepen~ent joining of (io E J(X 3,X2 ) and j1 E J(Xl>X3 ) over their common factor X 3, hence an ergodic joining of X3, X2 and Xl· Let (i1 E J be the restriction of ji to these two components. To compute II(X 1 x X 2 ,{i),P X Q;(X1 x X2 ,{il)'P x QII,notice it is at most
X2 ,m,p x Q;(X I x X 2 ,m,p x Q111+ (2) II(X 1 x X 2 ,m,p x Q1;(X3 x X2 ,{iO),P3 x Qlll + (3) II(X 3 x X2 ,{iO),P3 x Ql;(X l x X2 ,{il)'P x Qlll +
(1) II(X I x
(4) II(X 1 x X 2 ,{il)'P x Q1;(X l x X2 ,{il)'P x QII.
(7.36)
The pairs of processes in terms (1), (2), and (4) of(7.36) are within ad-distance "£/4. Hence each of these three is less than or equal to e/4. Term (3) we already know is not greater than e/4. Hence A and Al are weak* less than e apart. We know 1/Zn
P3
X
Xl
c::
X3
X
~,
ito
and
Thus _
P
lin X
X z c:: Xl
X :#'2'
ito
and (i1
E
(Q(n).
•
Before going on to the almost obvious Corollary 7.25, we stop to make some remarks. What Theorem 7.24 tells us is that a finitely determined process can be embedded as a factor in any process of equal entropy. One easily concludes that it can also be embedded in any process of greater entropy, as such always have factors of any smaller entropy. Restricted to the case of Bernoulli shifts, this says a Bernoulli shift can be embedded as a factor of any system with equal or greater entropy. This deep fact is originally due to Sinai. As with Krieger's theorem, we have not precisely identified $\bar J$. Here, in fact, it is all of $J(X_1, X_2)$. Once we know $X_1$ is isomorphic to a Bernoulli shift, it is relatively easy to show the ergodic joinings are dense in $J(X_1, X_2)$.

Corollary 7.25 (Ornstein's isomorphism theorem, one form) Suppose both $(X_1, P)$ and $(X_2, Q)$ are finitely determined, with $P$ and $Q$ generators, and $h(T_1) = h(T_2)$. Those elements of $\bar J$ supported on graphs of isomorphisms are a dense $G_\delta$ in $\bar J$. Hence the two systems are isomorphic.

Proof By Theorem 7.24, those $\hat\mu$ with both
$$P \times X_2 \subset_{\hat\mu} X_1 \times \mathcal{F}_2$$
and
$$X_1 \times Q \subset_{\hat\mu} \mathcal{F}_1 \times X_2$$
are a dense $G_\delta$. As $P$ and $Q$ generate,
$$\mathcal{F}_1 \times X_2 \underset{\hat\mu}{=} X_1 \times \mathcal{F}_2$$
and Theorem 6.8 completes the result. •
Exercise 7.10 Suppose $(X, P)$ is finitely determined and $Q$ is another generating partition. Show that $(X, Q)$ is also finitely determined. Thus finitely determined is a property of $X$. Hint: Using finite code approximations, show that any $(X_1, Q_1)$ close to $(X, Q)$ in entropy and distribution contains a copy of $(X_1, P_1)$ close to $(X, P)$ in entropy and distribution. Use the finite codes to bring the $\bar d$-closeness of $(X, P)$ and $(X_1, P_1)$ back to $(X, Q)$ and $(X_1, Q_1)$.

In fact, any partition of a finitely determined process is finitely determined. This deep result of Ornstein and Weiss can be found in Ornstein (1974). Thus all factor algebras of finitely determined systems are themselves finitely determined.
7.7 Weakly Bernoulli processes

To complete our picture of the isomorphism theorem we want to verify that mixing Markov chains are finitely determined. We begin with a property of mixing Markov chains.

Definition 7.9 We say a process $(X, P)$ is weakly Bernoulli, if for any $\varepsilon > 0$ there is a $k > 0$ so that for all $N$
$$\left\| D\Big(\bigvee_{i=k}^{k+N-1} T^{-i}(P) \,\Big|\, \mathcal{P}_P\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T^{-i}(P)\Big) \right\|_1 < \varepsilon,$$
i.e.,
$$\int \sum_{a \in \bigvee_{i=k}^{k+N-1} T^{-i}(P)} \big|\mu(a \mid \mathcal{P}_P) - \mu(a)\big| \, d\mu < \varepsilon.$$

Lemma 7.26 Mixing Markov chains are weakly Bernoulli.
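Proof If $(X, P)$ is Markov, then for all $k > 0$ and $N$,
$$\left\| D\Big(\bigvee_{i=k}^{k+N-1} T^{-i}(P) \,\Big|\, \mathcal{P}_P\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T^{-i}(P)\Big) \right\|_1 = \big\| D(T^{-k}(P) \mid \mathcal{P}_P) - D(T^{-k}(P)) \big\|_1.$$
As a mixing Markov chain is a K-system, Proposition 5.32 finishes the result.
•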
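For a finite-state Markov chain the right-hand side in the proof of Lemma 7.26 can be computed directly. The sketch below assumes the norm is the $\ell^1$ distance between distributions, averaged over the conditioning state with the stationary weights, so that it reduces to $\sum_i \pi_i \sum_j |M^k(i,j) - \pi_j|$; the helper names and the two-state chain are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, k):
    """k-th power of a square matrix, k >= 0, by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, m)
    return result


def dependence_after_gap(transition, stationary, k):
    """Average l1 distance between the k-step law from state i and the stationary law."""
    mk = mat_pow(transition, k)
    n = len(stationary)
    return sum(stationary[i] * sum(abs(mk[i][j] - stationary[j]) for j in range(n))
               for i in range(n))


# Hypothetical mixing chain on two states with stationary vector (2/3, 1/3).
M = [[0.9, 0.1], [0.2, 0.8]]
pi = [2.0 / 3.0, 1.0 / 3.0]
for k in (0, 1, 5, 20):
    print(k, dependence_after_gap(M, pi, k))
```

For this chain the quantity decays like $0.7^k$, the $k$-th power of the second eigenvalue, so a gap $k$ making it smaller than any given $\varepsilon$ always exists, uniformly in the block length $N$.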
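Although the weakly Bernoulli property is known to be strictly stronger than finitely determined, it is the property most often applied. Hyperbolic toral automorphisms, for example, are proven Bernoulli by showing they are weakly Bernoulli for an appropriately chosen partition (Ornstein 1974). We have seen earlier (Corollary 5.26) that entropy is convex. We need a uniformity in this to proceed.

Lemma 7.27 For $n$ fixed let $\Pi_n = \{(\pi_1, \dots, \pi_n);\ \pi_i \ge 0,\ \sum \pi_i = 1\}$, the space of probability $n$-vectors. Suppose $\nu$ is a Borel probability measure on $\Pi_n$. Given $\varepsilon > 0$ there is a $\delta = \delta(\varepsilon, n) > 0$ so that if $H(\int \pi\, d\nu) - \int H(\pi)\, d\nu < \delta$, then
$$\int \Big| \pi - \int \pi\, d\nu \Big|\, d\nu < \varepsilon.$$

Proof We know from Corollary 5.26 that
$$H\Big(\int \pi\, d\nu\Big) \ge \int H(\pi)\, d\nu,$$
equality holding iff $\nu$ is a point mass at $\int \pi\, d\nu$, i.e.,
$$\int \Big| \pi - \int \pi\, d\nu \Big|\, d\nu = 0.$$
Now $\int H(\pi)\, d\nu$, $H(\int \pi\, d\nu)$ and $\int |\pi - \int \pi\, d\nu|\, d\nu$ are all weak* continuous functions of $\nu$. Compactness in weak* of the Borel probability measures completes the result. •

Smorodinsky (1971) has shown that in fact $\delta$ does not depend on $n$. Our non-constructive argument of this uniform convexity of entropy completely misses this.

Lemma 7.28 Suppose $(X_1, P_1)$ is ergodic and for some $k$, $N$,
$$\left\| D\Big(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1) \,\Big|\, \mathcal{P}_{P_1}\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1)\Big) \right\|_1 < c.$$
Given any $\varepsilon > 0$ there is a $\delta$ so that if $(X_2, P_2)$ satisfies
(1) $\|X_1, P_1;\; X_2, P_2\| < \delta$; and
(2) $h(T_2, P_2) > h(T_1, P_1) - \delta$; (7.37)
then
(3) $\left\| D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2)\Big) \right\|_1 < c + \varepsilon$. (7.38)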
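Proof We know that for $j = 1, 2$,
$$(N + k)h(T_j, P_j) = \int I\Big(\bigvee_{i=0}^{k+N-1} T_j^{-i}(P_j) \,\Big|\, \mathcal{P}_{P_j}\Big)\, d\mu_j.$$
There are $n^{k+N}$ elements in $P^{k+N}$. Choose $\delta_1 = \delta(\varepsilon/8, n^{k+N})$ of Lemma 7.27. Choose $M$ so that
$$0 \le \int I\Big(\bigvee_{i=0}^{k+N-1} T_1^{-i}(P_1) \,\Big|\, \bigvee_{i=-M}^{-1} T_1^{-i}(P_1)\Big) - I\Big(\bigvee_{i=0}^{k+N-1} T_1^{-i}(P_1) \,\Big|\, \mathcal{P}_{P_1}\Big)\, d\mu_1 < \frac{\delta_1\varepsilon}{8}.$$
If $\delta$ in (1) of (7.37) is small enough,
$$\left| \int I\Big(\bigvee_{i=0}^{k+N-1} T_1^{-i}(P_1) \,\Big|\, \bigvee_{i=-M}^{-1} T_1^{-i}(P_1)\Big)\, d\mu_1 - \int I\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \bigvee_{i=-M}^{-1} T_2^{-i}(P_2)\Big)\, d\mu_2 \right| < \frac{\delta_1\varepsilon}{8}$$
as these integrals depend only on $\mu_j$ restricted to $\bigvee_{i=-M}^{k+N-1} T_j^{-i}(P_j)$, $j = 1, 2$, respectively. If $\delta < \delta_1\varepsilon/(4(N + k))$ in (2) of (7.37) then
$$0 \le \int \bigg( I\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \bigvee_{i=-M}^{-1} T_2^{-i}(P_2)\Big) - I\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big) \bigg)\, d\mu_2 < \frac{3\delta_1\varepsilon}{8}.$$
Thus for a subset $G \subseteq \bigvee_{i=-M}^{-1} T_2^{-i}(P_2)$, $\mu_2(G) > 1 - 3\varepsilon/8$, for $B \in G$, an atom of $\bigvee_{i=-M}^{-1} T_2^{-i}(P_2)$,
$$H\bigg(D\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, B\Big)\bigg) - \frac{1}{\mu_2(B)} \int_B H\bigg(D\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big)\bigg)\, d\mu_2 < \delta_1.$$
By Lemma 7.27, for such a $B \in G$,
$$\frac{1}{\mu_2(B)} \int_B \left\| D\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, B\Big) - D\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big) \right\|_1 d\mu_2 < \frac{\varepsilon}{8}. \tag{7.39}$$
Outside $G$ this integrand (7.39) is bounded by 2. Thus
$$\left\| D\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \bigvee_{i=-M}^{-1} T_2^{-i}(P_2)\Big) - D\Big(\bigvee_{i=0}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big) \right\|_1 < \frac{\varepsilon}{8} + \frac{3\varepsilon}{4}.$$
Now
$$\left\| D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2)\Big) \right\|_1 \le$$
(a) $\left\| D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \mathcal{P}_{P_2}\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \bigvee_{i=-M}^{-1} T_2^{-i}(P_2)\Big) \right\|_1$ +
(b) $\left\| D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \,\Big|\, \bigvee_{i=-M}^{-1} T_2^{-i}(P_2)\Big) - D\Big(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2)\Big) \right\|_1$. (7.40)
If $\delta$ in (1) of (7.37) is small enough, (b) of (7.40) is $\le c + \varepsilon/8$ and we are done.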
•

We will use Corollary 7.11 to show that weakly Bernoulli processes are finitely determined. This involves the construction of maps $\varphi$ which match names well. We will need the following technical argument.

Lemma 7.29 Suppose $(X_1, \mathcal{F}_1, \mu_1)$ and $(X_2, \mathcal{F}_2, \mu_2)$ are non-atomic Lebesgue spaces, $A_1$ and $A_2$ partition $X_1$ and $X_2$, and $\varphi : X_1 \to X_2$ is a measure-preserving 1-1 map. Suppose $Q_1$ and $Q_2$ are further partitions of $X_1$ and $X_2$, respectively. We can then find a 1-1 measure-preserving $\varphi' : X_1 \to X_2$ so that for any $a_1 \in A_1$ and $a_2 \in A_2$,
$$\mu_1(a_1 \cap \varphi'^{-1}(a_2)) = \mu_1(a_1 \cap \varphi^{-1}(a_2)).$$
Further,
(1) $D(Q_1 \mid a_1 \cap \varphi'^{-1}(a_2)) = D(Q_1 \mid a_1)$;
(2) $D(Q_2 \mid \varphi'(a_1) \cap a_2) = D(Q_2 \mid a_2)$; and
(3) $\mu_1(\{x \in a_1 \cap \varphi'^{-1}(a_2) : Q_1(x) \ne Q_2(\varphi'(x))\}) = \mu_1(a_1 \cap \varphi'^{-1}(a_2))\, \|D(Q_1 \mid a_1) - D(Q_2 \mid a_2)\|_1$. (7.41)
Proof As $X_1$ and $X_2$ are non-atomic, it is clear we can move the sets $\varphi^{-1}(A_2) \cap a_1$ without changing their mass so that (1) is satisfied and $D(Q_2 \mid \varphi'(a_1) \cap a_2)$ remains $D(Q_2 \mid \varphi(a_1) \cap a_2)$. Repeat this step in $X_2$ to obtain (2) without losing (1). Within each set $a_1 \cap \varphi'^{-1}(a_2)$ we can further perturb $\varphi'$ so as to map as much of $q_1^i \cap a_1$ to $q_2^i \cap a_2$ as possible, i.e., a set of measure $\min(\mu_1(q_1^i \cap a_1), \mu_2(q_2^i \cap a_2))$. Map the rest of $a_1 \cap \varphi'^{-1}(a_2)$ to the remainder of $\varphi'(a_1) \cap a_2$ to get the real $\varphi'$. We compute

$$\mu_1\bigl(\{x \in a_1 \cap \varphi'^{-1}(a_2) : Q_1(x) \ne Q_2(\varphi'(x))\}\bigr) = \mu_1\bigl(a_1 \cap \varphi'^{-1}(a_2)\bigr)\, \|D(Q_1 \mid a_1) - D(Q_2 \mid a_2)\|_1 .$$
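The final step of this proof, sending as much of each label of $Q_1$ as possible onto the same label of $Q_2$, can be checked on finite distributions. The Python sketch below is only an illustration under stated assumptions: the function names `best_match_mismatch` and `l1_sum` and the sample distributions are hypothetical, and the two partitions are replaced by their conditional distributions on a single pair $a_1$, $a_2$. It computes the unmatched mass $1 - \sum_i \min(p_i, q_i)$ and compares it with the unnormalized sum $\sum_i |p_i - q_i|$, of which it is exactly half; whether that factor of one half is absorbed into $\|\cdot\|_1$ depends on the normalization chosen where the norm is defined.

```python
from fractions import Fraction


def best_match_mismatch(p, q):
    """Mass left unmatched when each label is paired with itself as far as possible.

    p and q are dicts mapping labels to probabilities (each summing to 1).
    """
    labels = set(p) | set(q)
    matched = sum(min(p.get(s, Fraction(0)), q.get(s, Fraction(0))) for s in labels)
    return 1 - matched


def l1_sum(p, q):
    """Unnormalized sum of absolute differences between two distributions."""
    labels = set(p) | set(q)
    return sum(abs(p.get(s, Fraction(0)) - q.get(s, Fraction(0))) for s in labels)


# Hypothetical conditional distributions D(Q_1 | a_1) and D(Q_2 | a_2).
p = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
q = {"a": Fraction(3, 4), "b": Fraction(1, 4)}

mismatch = best_match_mismatch(p, q)
distance = l1_sum(p, q)
```

Here `mismatch` is the relative size of the set where the labels disagree inside $a_1 \cap \varphi'^{-1}(a_2)$; multiplying by $\mu_1(a_1 \cap \varphi'^{-1}(a_2))$ gives the left-hand side of (7.41).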
Our next result will complete the proof that weakly Bernoulli implies finitely determined. It is an inductive construction of maps $\varphi_r$ enabling us to apply Corollary 7.11. The notation becomes very complex, but essentially all we are doing is successively matching more and more of the $T_1,P_1$-names to $T_2,P_2$-names, without disturbing the statistics of our earlier matching. Our starting point is the hypothesis and conclusion of Lemma 7.28.
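Theorem 7.30 Suppose $(X_1, P_1)$ and $(X_2, P_2)$ satisfy

(1) $\Bigl\| D\Bigl(\bigvee_{i=k}^{k+N-1} T_j^{-i}(P_j) \Bigm| \mathcal{P}_{P_j}\Bigr) - D\Bigl(\bigvee_{i=k}^{k+N-1} T_j^{-i}(P_j)\Bigr) \Bigr\|_1 < \varepsilon$, for $j = 1$ and 2; and

(2) $\Bigl\| D\Bigl(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1)\Bigr) - D\Bigl(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2)\Bigr) \Bigr\|_1 < \varepsilon$,

where $N$, $k$, $\varepsilon$ are $> 0$. (7.42)

Then

(3) $\bar d(T_1, P_1; T_2, P_2) < 3\varepsilon + \dfrac{k}{N+k}$. (7.43)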
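Proof We will use Corollary 7.11. The sets $A_r = X_1$, and the lengths $N_r = r(N + k)$. To begin, $\varphi_0$ is any 1-1 measure-preserving map $X_1 \to X_2$. By Lemma 7.29 and (2) of (7.42) we can construct $\varphi_1 : X_1 \to X_2$ with

$$\int \bar d_{N+k}\bigl(p_1^{N+k}(x), p_2^{N+k}(\varphi_1(x))\bigr)\, d\mu_1 \le \frac{k}{N+k} + \mu_1\bigl(\{x : p_1^{N}(T_1^{k}(x)) \ne p_2^{N}(T_2^{k}(\varphi_1(x)))\}\bigr) \le \frac{k}{N+k} + \varepsilon. \tag{7.44}$$

Assume we have constructed $\varphi_r$ with

$$\int \bar d_{r(N+k)}\bigl(p_1^{r(N+k)}(x), p_2^{r(N+k)}(\varphi_r(x))\bigr)\, d\mu_1 \le \frac{k}{N+k} + 3\varepsilon. \tag{7.45}$$

To apply Lemma 7.29 again we let

$$A_j = \bigvee_{i=0}^{r(N+k)-1} T_j^{-i}(P_j), \qquad j = 1, 2$$

and

$$Q_j = \bigvee_{i=r(N+k)+k}^{(r+1)(N+k)-1} T_j^{-i}(P_j), \qquad j = 1, 2. \tag{7.46}$$

We conclude we can construct $\varphi_{r+1}$ without modifying the measures of sets in $A_1 \vee \varphi_r^{-1}(A_2)$ so that

$$\begin{aligned}
\int \bar d_{N+k}\bigl(p_1^{N+k}(T_1^{r(N+k)}(x)),\, p_2^{N+k}(T_2^{r(N+k)}(\varphi_{r+1}(x)))\bigr)\, d\mu_1
&\le \frac{k}{N+k} + \mu_1\bigl(\{x : p_1^{N}(T_1^{r(N+k)+k}(x)) \ne p_2^{N}(T_2^{r(N+k)+k}(\varphi_{r+1}(x)))\}\bigr) \\
&\le \frac{k}{N+k} + \sum_{a_1 \in A_1,\, a_2 \in A_2} \mu_1\bigl(a_1 \cap \varphi_r^{-1}(a_2)\bigr)\, \|D(Q_1 \mid a_1) - D(Q_2 \mid a_2)\|_1 .
\end{aligned} \tag{7.47}$$

As $T_j^{r(N+k)}(a) \in \mathcal{P}_{P_j}$ for $a \in A_j$, $j = 1, 2$, expression (7.47) is at most

$$\begin{aligned}
\frac{k}{N+k} &+ \sum_{a_1 \in A_1} \int_{a_1} \Bigl\| D\Bigl(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1) \Bigm| \mathcal{P}_{P_1}\Bigr) - D\Bigl(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1)\Bigr) \Bigr\|_1 d\mu_1 \\
&+ \sum_{a_2 \in A_2} \int_{a_2} \Bigl\| D\Bigl(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \Bigm| \mathcal{P}_{P_2}\Bigr) - D\Bigl(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2)\Bigr) \Bigr\|_1 d\mu_2 + \varepsilon \\
&\le \frac{k}{N+k} + 3\varepsilon .
\end{aligned}$$

As $\mu_1(a_1 \cap \varphi_r^{-1}(a_2)) = \mu_1(a_1 \cap \varphi_{r+1}^{-1}(a_2))$ we have not disturbed the relative sizes of matched names on indices 0 to $r(N + k)$. Noting

$$\begin{aligned}
\bar d_{(r+1)(N+k)}\bigl(p_1^{(r+1)(N+k)}(x),\, p_2^{(r+1)(N+k)}(\varphi_{r+1}(x))\bigr)
&= \frac{r}{r+1}\, \bar d_{r(N+k)}\bigl(p_1^{r(N+k)}(x),\, p_2^{r(N+k)}(\varphi_{r+1}(x))\bigr) \\
&\quad + \frac{1}{r+1}\, \bar d_{N+k}\bigl(p_1^{N+k}(T_1^{r(N+k)}(x)),\, p_2^{N+k}(T_2^{r(N+k)}(\varphi_{r+1}(x)))\bigr)
\end{aligned} \tag{7.48}$$

completes the induction. ∎

Proposition 7.31 Weakly Bernoulli processes are finitely determined.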
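Proof If $(X_1, P_1)$ is weakly Bernoulli, then for $\varepsilon > 0$, there is a $k$ and $N > 4k/\varepsilon$ so that

$$\Bigl\| D\Bigl(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1) \Bigm| \mathcal{P}_{P_1}\Bigr) - D\Bigl(\bigvee_{i=k}^{k+N-1} T_1^{-i}(P_1)\Bigr) \Bigr\|_1 < \frac{\varepsilon}{8}. \tag{7.49}$$

By Lemma 7.28, if $(X_2, P_2)$ satisfies

(1) $\|X_1, P_1; X_2, P_2\| < \delta$; and

(2) $h(T_2, P_2) > h(T_1, P_1) - \delta$, (7.50)

then

$$\Bigl\| D\Bigl(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2) \Bigm| \mathcal{P}_{P_2}\Bigr) - D\Bigl(\bigvee_{i=k}^{k+N-1} T_2^{-i}(P_2)\Bigr) \Bigr\|_1 < \frac{\varepsilon}{4}. \tag{7.51}$$

Certainly if $\delta$ is small enough, (2) of (7.42) holds with $\varepsilon/4$ in place of $\varepsilon$. Theorem 7.30 now tells us

$$\bar d(T_1, P_1; T_2, P_2) < \frac{k}{N+k} + \frac{3\varepsilon}{4} < \varepsilon. \tag{7.53}$$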
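The estimate (7.53) rests on two elementary facts: two names that agree except on a gap of length $k$ in each block of length $N + k$ are at normalized Hamming distance at most $k/(N+k)$, and the choice $N > 4k/\varepsilon$ forces $k/(N+k) < \varepsilon/4$. The following Python sketch checks both on a small example. It is an illustration only: `hamming_fraction`, `gapped_copy` and `final_bound` are hypothetical helper names, and the sample name and the values of $k$, $N$ and $\varepsilon$ are arbitrary.

```python
from fractions import Fraction


def hamming_fraction(u, v):
    """Normalized Hamming distance between two names of equal length."""
    if len(u) != len(v):
        raise ValueError("names must have equal length")
    return Fraction(sum(a != b for a, b in zip(u, v)), len(u))


def gapped_copy(name, k, n, filler="?"):
    """Copy a name, spoiling the first k symbols of every block of length n + k.

    The filler is assumed not to occur in the name, so every spoiled
    position counts as a mismatch.
    """
    out = list(name)
    for start in range(0, len(out), n + k):
        for i in range(start, min(start + k, len(out))):
            out[i] = filler
    return "".join(out)


def final_bound(k, N, eps):
    """Right-hand side of the estimate: k/(N + k) + 3*eps/4."""
    return Fraction(k, N + k) + 3 * eps / 4


k, n = 2, 8
name = "0110100110010110" * 5          # length 80 = 8 blocks of length n + k
other = gapped_copy(name, k, n)
d = hamming_fraction(name, other)      # only the gaps disagree

eps = Fraction(1, 10)
N = 81                                 # any N > 4k/eps = 80 will do
bound = final_bound(k, N, eps)
```

In this example the gaps are the only mismatches, so `d` equals $k/(n+k)$ exactly, and `bound` stays below `eps` as (7.53) requires.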
Corollary 7.32 (Ornstein's isomorphism theorem, another form) Mixing Markov chains are finitely determined, hence any two of equal entropy are isomorphic. ∎

Corollary 7.33 Using Exercise 7.10, any generating partition in a mixing Markov chain is finitely determined. ∎

The chain of arguments, from the definition of weakly Bernoulli, through Lemma 7.28 to Theorem 7.30 and ending at Proposition 7.31, can be followed under weakened hypotheses. Our estimate of $\bar d$-distances always came by ignoring a gap in the names of length $k$ and then matching the names without error for a length $N$. This was possible as the weakly Bernoulli condition allowed us to build such a pairing of names. If we were given instead as hypothesis that the conditional future distribution $D\bigl(\bigvee_{i=0}^{N} T_1^{-i}(P_1) \mid \mathcal{P}_{P_1}\bigr)(x)$ could be paired, for most $x$, with $D\bigl(\bigvee_{i=0}^{N} T_1^{-i}(P_1)\bigr)$ with a small $\bar d_N$-error, this chain of argument could again be followed, leading to a stronger version of Proposition 7.31. This condition, called 'very-weakly Bernoulli', turns out to be equivalent to finitely determined. The interested reader can pursue this idea further in Ornstein (1974) and Shields (1973).

There are two directions of work we should mention at this point. A tacit assumption in our proof of the isomorphism theorem was that our systems had finite generating partitions, i.e., had finite entropy. This condition can be removed and the result extended to infinite entropy systems. Here one must follow a refining tree of partitions (Ornstein 1974). A second and extremely useful extension, by J.-P. Thouvenot (1975), involves the investigation of processes all of which contain an isomorphic copy of a fixed process $(X, H)$, examining structures 'relative' to this common factor process. Thouvenot investigates this 'relatively Bernoulli' property of being isomorphic to a direct product of $(X, H)$ and a Bernoulli system. He demonstrates that a theory completely parallel to the Bernoulli theory exists in this context.
Bibliography
Ahlfors, L. (1966). Complex analysis. McGraw Hill, New York.
Billingsley, P. (1965). Ergodic theory and information. Wiley, New York.
Cornfeld, I. P., Fomin, S. V., and Sinai, Ya. G. (1982). Ergodic theory. Springer-Verlag, New York.
Chung, K. L. (1968). A course in probability theory (2nd edn). Academic Press, New York.
Doob, J. L. (1953). Stochastic processes. Wiley, New York.
Feller, W. (1968). An introduction to probability theory and its applications, Vol. 1 (3rd edn). Wiley, New York.
Feller, W. (1971). An introduction to probability theory and its applications, Vol. 2 (2nd edn). Wiley, New York.
Friedman, N. (1970). Introduction to ergodic theory. Van Nostrand, Princeton.
Furstenberg, H. (1967). Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation. Mathematical Systems Theory, 1, 1-49.
Furstenberg, H. (1981). Recurrence in ergodic theory and combinatorial number theory. Princeton University Press.
Glasner, S. and Weiss, B. (1990). Processes disjoint from weak mixing. Transactions of the American Mathematical Society. (In press.)
Jacobs, K. (1962). Lecture notes on ergodic theory, Vols 1 and 2. University of Aarhus.
del Junco, A. and Keane, M. (1985). On generic points in the Cartesian square of Chacón's transformation. Ergodic Theory and Dynamical Systems, 5, 59-69.
del Junco, A., Rahe, M., and Swanson, L. (1980). Chacón's automorphism has minimal self-joinings. Journal d'Analyse Mathématique, 37, 276-84.
del Junco, A. and Rudolph, D. (1987). On ergodic actions whose self-joinings are graphs. Ergodic Theory and Dynamical Systems, 7, 531-58.
Katznelson, Y. (1968). Introduction to harmonic analysis. Wiley, New York.
Krengel, U. (1985). Ergodic theorems. W. de Gruyter, New York.
Krieger, W. (1970). On entropy and generators of measure-preserving transformations. Transactions of the American Mathematical Society, 149, 453-64.
Ornstein, D. S. (1974). Ergodic theory, randomness and dynamical systems. Yale University Press.
Ornstein, D. S. and Weiss, B. (1980). Ergodic theory of amenable group actions I, the Rohlin lemma. Bulletin of the American Mathematical Society, 2, 161-4.
Ornstein, D. S. and Weiss, B. (1983). The Shannon-McMillan-Breiman theorem for a class of amenable groups. Israel Journal of Mathematics, 44, 53-60.
Parry, W. (1969). Entropy and generators in ergodic theory. Benjamin, New York.
Rohlin, V. A. (1966). Selected topics in the metric theory of dynamical systems. American Mathematical Society Translations, Series 2, 49, 171-240.
Royden, H. L. (1968). Real analysis (2nd edn). MacMillan, New York.
Rudolph, D. (1979). An example of a measure-preserving map with self-joinings and applications. Journal d'Analyse Mathématique, 35, 97-122.
Shields, P. (1973). The theory of Bernoulli shifts. University of Chicago Press.
Smorodinsky, M. (1971). Ergodic theory; entropy. Springer Lecture Notes 214. Springer-Verlag, New York.
Stone, D. M. (1950). Decomposition of measure algebras and spaces. Transactions of the American Mathematical Society, 69, 142-60.
Thouvenot, J.-P. (1975). Quelques propriétés des systèmes dynamiques qui se décomposent en un produit de deux systèmes dont l'un est un schéma de Bernoulli. Israel Journal of Mathematics, 21, 177-207.
Walters, P. (1982). An introduction to ergodic theory. Springer-Verlag, New York.