Cambridge University Press 978-0-521-38997-6 - Ergodic Theory Karl Petersen Frontmatter More information
© Cambridge University Press
www.cambridge.org
Ergodic theory
KARL PETERSEN Professor of Mathematics, University of North Carolina
The right of the University of Cambridge to print and sell all manner of books was granted by Henry VIII in 1534. The University has printed and published continuously since 1584.
CAMBRIDGE UNIVERSITY PRESS
Cambridge New York Port Chester Melbourne Sydney
Published by the Press Syndicate of the University of Cambridge The Pitt Building, Trumpington Street, Cambridge CB2 1RP 40 West 20th Street, New York, NY 10011, USA 10 Stamford Road, Oakleigh, Melbourne 3166. Australia © Cambridge University Press 1983
First published 1983 First paperback edition (with corrections) 1989 Printed in Great Britain at the University Press, Cambridge Library of Congress catalogue card number: 82-4473 British Library cataloguing in publication data
Petersen, Karl Ergodic theory. (Cambridge studies in advanced mathematics; 2) 1. Ergodic theory I. Title 515.42 QA313 ISBN 0 521 23632 0 hard covers ISBN 0 521 38997 6 paperback
Contents
Preface
1 Introduction and preliminaries
1.1 The basic questions of ergodic theory
1.2 The basic examples
A. Hamiltonian dynamics. B. Stationary stochastic processes. C. Bernoulli shifts. D. Markov shifts. E. Rotations of the circle. F. Rotations of compact abelian groups. G. Automorphisms of compact groups. H. Gaussian systems. I. Geodesic flows. J. Horocycle flows. K. Flows and automorphisms on homogeneous spaces.
1.3 The basic constructions
A. Factors. B. Products. C. Skew products. D. Flow under a function. E. Induced transformations. F. Inverse limits. G. Natural extensions.
1.4 Some useful facts from measure theory and functional analysis
A. Change of variables. B. Proofs by approximation. C. Measure algebras and Lebesgue spaces. D. Conditional expectation. E. The Spectral Theorem. F. Topological groups, Haar measure, and character groups.
2 The fundamentals of ergodic theory
2.1 The Mean Ergodic Theorem
2.2 The Pointwise Ergodic Theorem
2.3 Recurrence
2.4 Ergodicity
2.5 Strong mixing
2.6 Weak mixing
3 More about almost everywhere convergence
3.1 More about the Maximal Ergodic Theorem
A. Positive contractions. B. The maximal equality. C. Sign changes of the partial sums. D. The Dominated Ergodic Theorem and its converse.
3.2 More about the Pointwise Ergodic Theorem
A. Maximal inequalities and convergence theorems. B. The speed of convergence in the Ergodic Theorem.
3.3 Differentiation of integrals and the Local Ergodic Theorem
3.4 The Martingale convergence theorems
3.5 The maximal inequality for the Hilbert transform
3.6 The ergodic Hilbert transform
3.7 The filling scheme
3.8 The Chacon–Ornstein Theorem
4 More about recurrence
4.1 Construction of eigenfunctions
A. Existence of rigid factors. B. Almost periodicity. C. Construction of the eigenfunction.
4.2 Some topological dynamics
A. Recurrence. B. Topological ergodicity and mixing. C. Equicontinuous and distal cascades. D. Uniform distribution mod 1. E. Structure of distal cascades.
4.3 The Szemerédi Theorem
A. Furstenberg's approach to the Szemerédi and van der Waerden Theorems. B. Topological multiple recurrence, van der Waerden's Theorem, and Hindman's Theorem. C. Weak mixing implies weak mixing of all orders along multiples. D. Outline of the proof of the Furstenberg–Katznelson Theorem.
4.4 The topological representation of ergodic transformations
A. Preliminaries. B. Recurrence along IP-sets. C. Perturbation to uniformity. D. Uniform polynomials. E. Conclusion of the argument.
4.5 Two examples
A. Metric weak mixing without topological strong mixing. B. A prime transformation.
5 Entropy
5.1 Entropy in physics, information theory, and ergodic theory
A. Physics. B. Information theory. C. Ergodic theory.
5.2 Information and conditioning
5.3 Generators and the Kolmogorov–Sinai Theorem
6 More about entropy
6.1 More examples of the computation of entropy
A. Entropy of an automorphism of the torus. B. Entropy of a skew product. C. Entropy of an induced transformation.
6.2 The Shannon–McMillan–Breiman Theorem
6.3 Topological entropy
6.4 Introduction to Ornstein Theory
6.5 Finitary coding between Bernoulli shifts
A. Sketch of the proof. B. Reduction to the case of a common weight. C. Framing the code. D. What to put in the blanks. E. Sociology. F. Construction of the isomorphism.
References
Index
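Jumalate juhatusel
Jooksvad elujoonekesed,
Voolavad õnnelainekesed.
(Under the guidance of the gods / the little lines of life run on, / the little waves of happiness flow.)
For Elisabeth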
Preface
Ergodic theory today is a large and rapidly developing subject. The aim of this book is to introduce the reader first to the fundamentals of the ergodic theory of point transformations and then to several advanced topics which are currently undergoing intense research. By selecting one or more of these topics to focus on, a student can quickly approach the specialized literature and indeed the frontier of the area of interest. Of course the number of interesting topics that we have neglected is necessarily far greater than that of those we have been able to include. Thus we have to refer the reader elsewhere for discussions of, for example, operator ergodic theory, the existence of invariant measures, nonsingular transformations, orbit equivalence, differentiable dynamics, subadditive ergodic theorems, etc. Unfortunately, there do not exist coherent expositions of all of these topics; I invite those of my colleagues who are more expert than I in the areas I have omitted to do some more expository
writing. It should also be understood that, even for the advanced topics that we do discuss, their treatment here cannot be more than an entryway to the rapidly expanding specialized literature. Thus our presentations of multiple recurrence and the Ornstein theory, to mention two examples, are intended as introductions to the books of Furstenberg (1981) and Ornstein (1974), respectively. From my point of view, ergodic theory consists of examples, convergence theorems, the study of various recurrence properties, and the theory of entropy. Each of these facets is given first a basic and later a more advanced treatment. At the introductory level, one can find the usual important basic topics: the standard examples, the mean and pointwise ergodic theorems, recurrence, ergodicity, strong mixing, weak mixing (all in Chapter 2) and the fundamentals of entropy (in Chapter 5). I have tried to make the writing as clear and complete as possible. Throughout we concentrate on single invertible measure-preserving maps on Lebesgue spaces. Some more advanced topics related to convergence theorems appear in Chapter 3, to recurrence in Chapter 4, and to entropy in Chapter 6. Chapter 3 presents a rather thorough analysis of maximal functions and their usefulness in the proof of the important convergence theorems of analysis (Lebesgue's Differentiation Theorem and the existence of the Hilbert transform), probability (the martingale convergence theorems), and ergodic theory (the Maximal Ergodic Theorem and its refinement to an equality, the Pointwise Ergodic Theorem, the Local Ergodic Theorem, the Dominated Ergodic Theorem and its converse, and the existence of the ergodic Hilbert transform). We also give Neveu's proof of the Chacon–Ornstein Theorem, as an introduction to operator ergodic theory. Chapter 4 begins with a direct construction of eigenfunctions for non-weakly-mixing transformations, which leads to a brief treatment of almost periodic functions and topological dynamics. In Section 4.3 we give an introduction to Furstenberg's approach to multiple recurrence and the Szemerédi Theorem, and Section 4.4 proves the Jewett–Krieger Theorem on the topological representation of measure-preserving transformations. The final section examines the important examples of Kakutani and Chacon, both of which are weakly but not strongly mixing and at least one of which is prime. Chapter 6 contains several nontrivial entropy computations (toral automorphisms, skew products, and induced transformations), the Shannon–McMillan–Breiman Theorem, and the fundamental facts about topological entropy.
We also provide a brief introduction to Ornstein's isomorphism theory of Bernoulli shifts, proving the Ornstein Isomorphism Theorem by the Keane–Smorodinsky method, which produces a finitary (realizable by machine) map. The background necessary to read this book is some knowledge of measure theory and functional analysis; for example, the contents of Royden's Real Analysis suffice. Each chapter begins with a short summary, and many sections end with exercises, which range from trivial to difficult. The list of references contains all works referred to in the text, and then some, but it is not a complete bibliography for the subject; starred entries are books and survey articles, several of which contain more extensive bibliographies. I apologize for any errors of fact, misattributions of results, misprints, passages of incompetent exposition, and all other mistakes and misjudgments; I will be happy to receive lists of errors (preferably accompanied by suggested corrections) from all who care to compile them.
This book grew out of courses given at the University of North Carolina and Yale University. During part of the writing I received financial support from the National Science Foundation, and the work was completed while I was on a leave supported by the Kenan Foundation at the Laboratoire de Calcul des Probabilités of the Université Pierre et Marie Curie (Université de Paris, VI). I thank all of those institutions for providing me with the opportunities to teach and learn this subject. I have had mathematical and editorial help especially from Brian Marcus, and also Shizuo Kakutani, Ulrich Krengel, Benjamin Weiss, Mike Keane, Shaul Foguel, Yves Derriennic, Bruce Kitchens, and Judy Halchin. My students listened patiently to my lectures, made many useful suggestions, and caught lots of errors. Janet Farrell typed the original manuscript with the assistance of Debbie Hanner Reives, Hazeline Lewis, and Doris Mahaffey. The editing and publishing were handled by David Tranah and John Samuel. My sincere thanks to all these people for their contributions.
Karl Petersen Chapel Hill, N.C. September 1981
1 Introduction and preliminaries
Without going into the details (to which the rest of the book is devoted), we mention some of the basic questions, examples, and constructions of ergodic theory, in order to provide an indication of the content and flavor of the subject as well as to establish reference points for terminology and notation. The final section presents a few facts from measure theory and functional analysis that will be used repeatedly.
1.1. The basic questions of ergodic theory
Ergodic theory is the mathematical study of the long-term average behavior of systems. The collection of all states of a system forms a space $X$. The evolution of the system is represented by a transformation $T: X \to X$, where $Tx$ is taken as the state at time 1 of a system which at time 0 is in state $x$. If one prefers a continuous variable for the time, he can consider a one-parameter family $\{T_t : t \in \mathbb{R}\}$ of maps of $X$ into itself. When the laws governing the behavior of the system do not change with time, it is natural to suppose that $T_{s+t} = T_s T_t$, so that $\{T_t : t \in \mathbb{R}\}$ is a flow, or group action of $\mathbb{R}$ on $X$. A single (invertible) transformation $T: X \to X$ also determines the action of a group, namely the integers $\mathbb{Z}$, on $X$. The actions of arbitrary groups, and even of semigroups in case the transformations may not all be invertible, are worthy of study, but we will be interested mainly in the action of the powers of a single transformation and, occasionally, of a flow. In order to analyze a system mathematically, one needs to have some structure on $X$ and restrictions on $T$. There are three major cases:
(1) $X$ is a differentiable manifold and $T$ is a diffeomorphism, the case of differentiable dynamics;
(2) $X$ is a topological space and $T$ is a homeomorphism, the case of topological dynamics;
(3) $X$ is a measure space and $T$ is a measure-preserving transformation, the case of ergodic theory.
Of course the three cases overlap extensively, and a single example can be viewed in different lights; in fact, some of the most interesting problems concern the relationships among the three areas. Let us define more carefully the case (3) of most interest for us. Let $(X, \mathcal{B}, \mu)$ be a complete probability space, that is, a set $X$ together with a $\sigma$-algebra $\mathcal{B}$ of measurable subsets of $X$ and a countably additive nonnegative set function $\mu$ on $\mathcal{B}$ such that $\mu(X) = 1$ and such that $\mathcal{B}$ contains all subsets of sets of measure 0. Let $T: X \to X$ be a one-to-one onto map such that $T$ and $T^{-1}$ are both measurable: $T^{-1}\mathcal{B} = T\mathcal{B} = \mathcal{B}$. Since sets of measure 0 don't matter, we don't care if $T$ only becomes well-defined and one-to-one onto after a set of measure 0 is discarded from $X$. Assume further that $\mu(T^{-1}E) = \mu(E)$ for all $E \in \mathcal{B}$. A map $T$ satisfying these conditions is called a measure-preserving transformation (abbreviated m.p.t.), and the systems $(X, \mathcal{B}, \mu, T)$ will be our fundamental objects of
study. (Sometimes one wishes to consider possibly noninvertible maps $T: X \to X$ such that $T^{-1}\mathcal{B} \subset \mathcal{B}$ and $\mu T^{-1} = \mu$, and many of the results we present extend to such maps, but we will restrict ourselves for the most part to the invertible case. There is also an interesting theory of nonsingular maps ($T^{-1}\mathcal{B} \subset \mathcal{B}$, $\mu T^{-1} \ll \mu$) which we do not have space to discuss here.) In the one-parameter case, we assume that $T_t$ is a m.p.t. for all $t \in \mathbb{R}$, the map $(x, t) \mapsto T_t x$ is jointly measurable from $X \times \mathbb{R}$ to $X$, $T_0$ is the identity, and $T_{s+t} = T_s T_t$ for all $s, t \in \mathbb{R}$. If $T: X \to X$ is a m.p.t., the orbit $\{T^n x : n \in \mathbb{Z}\}$ of a point $x \in X$ represents a single complete history of the system, from the infinite past to the infinite future. The $\sigma$-algebra $\mathcal{B}$ is thought of as the family of observable events, with the $T$-invariant measure $\mu$ specifying the (time-independent) probabilities of their occurrences. A measurable function $f: X \to \mathbb{R}$ represents a measurement made on the system; $f(x), f(Tx), f(T^2x), \dots$ may be thought of as the values of some physically interesting variable at successive instants of time, beginning with the world in initial state $x$. In statistical mechanics, information theory, and other areas of application it is interesting and sometimes necessary to consider the long-term time average
$$\frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)$$
of a large number N of successive observations. (The average may not really be 'long-term', because the time unit involved may be very short:
in statistical mechanics, for example, molecular collisions may occur so often that we can actually observe only 'long-term' averages of any variable $f$.) A basic question of ergodic theory is that of the convergence of these averages: when does
$$\bar{f}(x) = \lim_{N\to\infty} \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)$$
exist in some sense? If it exists, $\bar{f}(x)$ may be thought of as an equilibrium or central value of the variable $f$. The convergence had been proved in special cases earlier (e.g. Borel's Strong Law of Large Numbers (1909) in case the $f \circ T^k$ are independent and identically distributed), but the general convergence in the mean square ($L^2$) sense was proved by von Neumann and the almost everywhere convergence by Birkhoff, both in 1931. Their results are known as the Mean Ergodic Theorem and the Ergodic Theorem (or Pointwise Ergodic Theorem), respectively. Generalizations and improvements of these results have been appearing continually since 1932; we will have space for only a few in this book. The question of the existence of averages occurred to mathematicians because of the physicists' concern with the 'ergodic hypothesis', which was formulated in an (erroneous and unsuccessful) attempt to bring about the conclusion that the time mean
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)$$
and the space mean
$$\int_X f\,d\mu$$
coincide almost everywhere. It is desirable that the equilibrium or central value of a physical variable coincide with its weighted average over all possible states of the system. Boltzmann felt that if orbits penetrated all corners of the space, then this useful and symmetrical conclusion would follow. The investigation of the conditions under which this equality of time and space means, as well as even stronger conclusions, holds forms a second major topic in ergodic theory, the study of general recurrence properties, by which we mean the qualitative behavior of orbits. If the time mean of every integrable function coincides almost everywhere (a.e.) with its space mean, the system (or just $T$ itself) is called ergodic. It turns out that a system is ergodic if and only if the orbit of almost
every point visits each set of positive measure, or, equivalently, if $\mu(E) > 0$ and $\mu(F) > 0$ imply that $\mu(T^nE \cap F) > 0$ for some $n$. A recurrence property which implies this one is strong mixing: $\lim_{n} \mu(T^nE \cap F) = \mu(E)\mu(F)$ for all $E, F \in \mathcal{B}$. Weak mixing lies between the two, and just plain recurrence (if $\mu(E) > 0$ then $\mu(T^nE \cap E) > 0$ for some $n \ge 1$) always holds in a space of finite measure. A third major question of ergodic theory is the classification problem. Let us say that two systems $(X, \mathcal{B}, \mu, T)$ and $(Y, \mathcal{C}, \nu, S)$ are metrically isomorphic if there are sets of measure zero $X_0 \subset X$ and $Y_0 \subset Y$ and a one-to-one onto map $\phi: X \setminus X_0 \to Y \setminus Y_0$ such that $\phi T = S\phi$ on $X \setminus X_0$ and $\mu(\phi^{-1}E) = \nu(E)$ for all measurable $E \subset Y \setminus Y_0$. (Such a map $\phi$ is sometimes called an isomorphism mod 0.) How can we tell whether two given systems are (metrically) isomorphic to one another? A classical way is by attaching isomorphism invariants to systems, of which ergodicity, weak mixing, and strong mixing are examples. These are, however, only spectral invariants. The map $T: X \to X$ determines a unitary operator $U = U_T$ on $L^2(X)$ by $U_T f(x) = f(Tx)$ (Koopman 1931). (We sometimes denote this map by $U_T$, sometimes by $U$, and sometimes even by $T$.) We can say that $T$ and $S$ are spectrally isomorphic if $U_T$ and $U_S$ are unitarily equivalent, in that $VU_T = U_S V$ for some unitary $V: L^2(X) \to L^2(Y)$. Then if $S$ and $T$ are spectrally isomorphic, either both or neither have any one of the recurrence properties mentioned above. An invariant which is sensitive to the nature of the action of $T$ on individual points of $X$ is the entropy $h(T)$ of $T$. The entropy can be used to distinguish some nonisomorphic systems (Kolmogorov and Sinai) and is in fact a complete isomorphism invariant within certain classes of systems (Ornstein). In many parts of mathematics there is a construction problem as well as a classification problem.
Such a question could be formulated in several ways in ergodic theory, one of which is the realization problem: which systems of type (3) above can be realized within type (2) (say with $T$ preserving a unique Borel probability measure) or within type (1) (say with $T$ preserving a measure determined by a smooth density)? The first of these questions is discussed below, and the second is the subject of current research. Still another question is that of genericity: which types of systems are 'typical', in various senses and in the several different settings? Many of the important questions of ergodic theory receive scant or no attention here: the existence of invariant measures, the case of infinite measure spaces, operator ergodic theory, the actions of other groups, $C^*$ dynamical systems, etc. And our discussion of the questions that we do treat is not alleged to be complete. The starred references in the bibliography are a good starting place for filling in any gaps that we leave.
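To make the time mean and the space mean concrete, here is a minimal Python sketch (not from the text) that computes $\frac{1}{N}\sum_{k=0}^{N-1} f(T^k x)$ for the rotation of $[0, 1)$ by an irrational $\alpha$ (example E of Section 1.2). It relies on the standard fact that such rotations are ergodic; the choices $\alpha = \sqrt{2} - 1$, $f(x) = \cos^2(2\pi x)$ (whose space mean is exactly $1/2$) and the starting point $x = 0.1$ are illustrative assumptions, and floating-point arithmetic only approximates the exact rotation.

```python
import math


def birkhoff_average(f, T, x, N):
    """Return (1/N) * sum_{k=0}^{N-1} f(T^k x), the time mean over N steps."""
    total = 0.0
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N


ALPHA = math.sqrt(2) - 1  # irrational rotation number (illustrative choice)


def rotation(x):
    """T_alpha x = x + alpha (mod 1) on [0, 1), in floating point."""
    return (x + ALPHA) % 1.0


def observable(x):
    """f(x) = cos^2(2 pi x); its space mean over [0, 1) is exactly 1/2."""
    return math.cos(2 * math.pi * x) ** 2


for N in (10, 1_000, 100_000):
    print(N, birkhoff_average(observable, rotation, 0.1, N))
```

As $N$ grows, the printed time means should approach the space mean $1/2$.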
1.2. The basic examples
Because the two major sources of ergodic theory are mathematical physics (especially statistical mechanics and Hamiltonian dynamics) and the theory of stationary stochastic processes, naturally these subjects provide a rich store of examples of measure-preserving transformations and flows. There are also several interesting and illustrative classes of abstract examples in an algebraic or geometric context. The following list will give some idea of the kinds of measure-preserving systems we will have in mind during the succeeding discussion. Obviously the various classes are not disjoint, and there are in fact inclusion relations among some of them.
A. Hamiltonian dynamics
The state at any time $t$ of a physical system consisting of $N$ particles can be specified by the three coordinates of position and the three of momentum of each particle, that is by a point in $\mathbb{R}^{6N}$, which is the phase space of the system. More generally (allowing for changes of variables and constraints on the system), let the state of the system be described by a pair of vectors $(q, p)$, where $p = (p_1, \dots, p_n)$ (the 'generalized momentum') and $q = (q_1, \dots, q_n)$ (the 'generalized position') are in $\mathbb{R}^n$, in which case the phase space is $\mathbb{R}^{2n}$. There is given a Hamiltonian function $H(q, p)$, which we assume to be independent of time, and which is typically the sum of the kinetic energy $K(p)$ and potential energy $U(q)$ of the system. Hamilton's equations are
$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} \qquad (i = 1, 2, \dots, n).$$
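These equations determine the state $T_t(q, p)$ at any time $t$ if the system has initial state $(q, p)$, by the theorem on the existence and uniqueness of solutions of first-order ordinary differential equations. We obtain in this way a one-parameter flow $\{T_t : -\infty < t < \infty\}$ with velocity field
$$V(q, p) = \left(\frac{\partial H}{\partial p_1}, \dots, \frac{\partial H}{\partial p_n}, -\frac{\partial H}{\partial q_1}, \dots, -\frac{\partial H}{\partial q_n}\right),$$
for which clearly $\operatorname{div} V = 0$. Denoting the Jacobian at $(q, p)$ of the map $T_t$ by $JT_t(q, p)$, this implies through direct calculation that
$$\frac{\partial}{\partial t} JT_t(q, p) = 0.$$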
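If $E \subset \mathbb{R}^{2n}$ is measurable and $\mu$ denotes Lebesgue measure on $\mathbb{R}^{2n}$, then
$$\mu(T_t E) = \int_E JT_t(q, p)\,d\mu(q, p).$$
Thus
$$\frac{d}{dt}\,\mu(T_t E) = \int_E \frac{\partial}{\partial t}\big[JT_t(q, p)\big]\,d\mu(q, p) = 0.$$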
(For the details, see Khintchine 1949 or Plante 1976.) Hamilton's equations yield immediately that
$$\frac{dH}{dt} = 0.$$
Thus the system is not free to wander all over the phase space but is restricted to surfaces of constant total energy $E$. Usually most of these surfaces are compact manifolds. The flow restricted to any such surface also has an invariant measure.
Proposition 2.2. The Hamiltonian flow restricted to a surface $S = \{(q, p) : H(q, p) = E\}$ of constant energy preserves the measure $d\mu_S = dS / \|\operatorname{grad} H\|$, where $dS$ is the element of surface volume.
For the proof of this Proposition see Khintchine (1949).
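A rough numerical check of the argument above, offered as a sketch under stated assumptions: the hypothetical choice $H(q, p) = p^2/2 - \cos q$ (a pendulum, $n = 1$) gives $V = (p, -\sin q)$. The flow $T_t$ is approximated by the classical fourth-order Runge–Kutta method and $JT_t$ by central differences, so the results hold only up to discretization error. The optional friction parameter `gamma` is not part of the text: it gives a non-Hamiltonian field with $\operatorname{div} V = -\gamma$, for which $JT_t = e^{-\gamma t}$ instead of 1, showing that $\operatorname{div} V = 0$ is what matters.

```python
import math


def vector_field(q, p, gamma=0.0):
    """V(q, p) for H(q, p) = p**2/2 - cos(q): (dH/dp, -dH/dq) = (p, -sin q).

    gamma > 0 adds a friction term -gamma*p; that field has div V = -gamma
    and is not Hamiltonian (used only for contrast)."""
    return p, -math.sin(q) - gamma * p


def flow(q, p, t, steps=1000, gamma=0.0):
    """Approximate T_t(q, p) with the classical fourth-order Runge-Kutta method."""
    h = t / steps
    for _ in range(steps):
        k1q, k1p = vector_field(q, p, gamma)
        k2q, k2p = vector_field(q + h / 2 * k1q, p + h / 2 * k1p, gamma)
        k3q, k3p = vector_field(q + h / 2 * k2q, p + h / 2 * k2p, gamma)
        k4q, k4p = vector_field(q + h * k3q, p + h * k3p, gamma)
        q += h / 6 * (k1q + 2 * k2q + 2 * k3q + k4q)
        p += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
    return q, p


def jacobian_det(q, p, t, eps=1e-5, gamma=0.0):
    """Central-difference estimate of JT_t(q, p), the Jacobian determinant of T_t."""
    q_plus, p_plus = flow(q + eps, p, t, gamma=gamma)
    q_minus, p_minus = flow(q - eps, p, t, gamma=gamma)
    dq_dq = (q_plus - q_minus) / (2 * eps)
    dp_dq = (p_plus - p_minus) / (2 * eps)
    q_plus, p_plus = flow(q, p + eps, t, gamma=gamma)
    q_minus, p_minus = flow(q, p - eps, t, gamma=gamma)
    dq_dp = (q_plus - q_minus) / (2 * eps)
    dp_dp = (p_plus - p_minus) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq


def energy(q, p):
    """The Hamiltonian H(q, p), constant along orbits when gamma = 0."""
    return p ** 2 / 2 - math.cos(q)


print(jacobian_det(0.7, 0.3, 1.0))             # close to 1 (div V = 0)
print(jacobian_det(0.7, 0.3, 1.0, gamma=0.5))  # close to exp(-0.5) (div V = -0.5)
```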
Through this formulation physical dynamical systems ranging from gas in a container to a cluster of galaxies enter the purview of ergodic theory.
B. Stationary stochastic processes
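Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\dots, f_{-1}, f_0, f_1, \dots$ a sequence of measurable functions on $\Omega$. Suppose that the sequence is stationary, in that for any $n_1, n_2, \dots, n_r$, any Borel subsets $B_1, \dots, B_r$ of $\mathbb{R}$, and any $k \in \mathbb{Z}$,
$$P\{\omega : f_{n_1}(\omega) \in B_1, \dots, f_{n_r}(\omega) \in B_r\} = P\{\omega : f_{n_1+k}(\omega) \in B_1, \dots, f_{n_r+k}(\omega) \in B_r\}.$$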
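Such a stationary process corresponds to a measure-preserving system in a standard way. Let $\mathbb{R}^{\mathbb{Z}} = \{(\dots, x_{-1}, x_0, x_1, \dots) : \text{each } x_i \in \mathbb{R}\}$, define $\phi: \Omega \to \mathbb{R}^{\mathbb{Z}}$ by $(\phi\omega)_n = f_n(\omega)$ for all $n \in \mathbb{Z}$, and define $\mu$ on the Borel subsets of $\mathbb{R}^{\mathbb{Z}}$ by $\mu(E) = P(\phi^{-1}E)$. Extend $\mu$ to the completion $\mathcal{B}$ of the Borel field. Let $\sigma: \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}^{\mathbb{Z}}$ be the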
shift transformation defined by $(\sigma x)_n = x_{n+1}$. Because of the stationarity of $\{f_n\}$, $\mu$ is shift-invariant on cylinder sets and hence on all of $\mathcal{B}$, so that we have constructed a measure-preserving system $(\mathbb{R}^{\mathbb{Z}}, \mathcal{B}, \mu, \sigma)$. Moreover, if $\pi_n: \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ is the projection onto the $n$th coordinate ($\pi_n x = x_n$), then $\{\pi_n\}$ has the same joint distributions on $\mathbb{R}^{\mathbb{Z}}$ as $\{f_n\}$ on $\Omega$. Thus every stationary stochastic process 'comes from' some shift-invariant measure on $\mathbb{R}^{\mathbb{Z}}$. A similar construction applies in the case of a continuous-parameter stochastic process $\{f_t : -\infty < t < \infty\}$.
C. Bernoulli shifts
Let $n = \{0, 1, \dots, n-1\}$ be an alphabet of finitely many symbols with weights $p_0, p_1, \dots, p_{n-1}$ such that all $p_i > 0$ and $\sum_{i=0}^{n-1} p_i = 1$. Form the product space $n^{\mathbb{Z}}$ of all two-sided sequences of the symbols in $n$, and give $n^{\mathbb{Z}}$ the product measure $\mu$ determined by the given probability measure on $n$. Thus for a typical cylinder set determined by a set of places $i_1, \dots, i_k$ and elements $j_1, \dots, j_k \in n$,
$$\mu\{x \in n^{\mathbb{Z}} : x_{i_1} = j_1, \dots, x_{i_k} = j_k\} = p_{j_1} p_{j_2} \cdots p_{j_k}.$$
Clearly the shift transformation $\sigma: n^{\mathbb{Z}} \to n^{\mathbb{Z}}$ preserves the measure $\mu$. The resulting measure-preserving system is denoted by $\mathcal{B}(p_0, p_1, \dots, p_{n-1})$ and represents a finite-valued stationary stochastic process with independent identically distributed (i.i.d.) terms. For example, $\mathcal{B}(\tfrac12, \tfrac12)$ models an experimenter tossing a fair coin from the infinite past on into eternity.
D. Markov shifts
Form the product space $n^{\mathbb{Z}}$ and shift transformation as in (C). We will define a different invariant measure on $n^{\mathbb{Z}}$, one for which the associated stochastic process is Markov rather than i.i.d. Let $A = (a_{ij})$ be an $n \times n$ stochastic matrix, i.e. a matrix with nonnegative entries and each row sum equal to 1. Suppose also that $p = (p_0, p_1, \dots, p_{n-1})$ is a row probability vector (all $p_i \ge 0$, $\sum p_i = 1$) which is fixed by $A$: $pA = p$. By the Perron–Frobenius Theorem (see Varga 1962) such a vector can always be found, and in some cases it is unique. Define the measure of a cylinder set determined by consecutive indices by
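$$\mu_A\{x : x_i = j_0, x_{i+1} = j_1, \dots, x_{i+k} = j_k\} = p_{j_0}\, a_{j_0 j_1} a_{j_1 j_2} \cdots a_{j_{k-1} j_k}.$$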
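Two consistency identities underlie the extension of $\mu_A$ to a shift-invariant measure, and they can be checked numerically. In this minimal Python sketch (the matrix `A` is a hypothetical example), summing the cylinder measure over a symbol appended on the right uses the unit row sums of $A$, while summing over a symbol prepended on the left uses $pA = p$; the second identity is what lets the formula ignore the starting place $i$. The fixed vector is found by power iteration, which assumes $A$ is primitive.

```python
from itertools import product


def stationary_vector(A, iterations=500):
    """Approximate a row vector p with pA = p by power iteration.

    Assumes A is primitive (some power has all entries > 0), so that the
    Perron-Frobenius fixed vector is unique and the iteration converges."""
    n = len(A)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * A[i][j] for i in range(n)) for j in range(n)]
    return p


def cylinder_measure(word, p, A):
    """mu_A{x : x_i = j_0, ..., x_{i+k} = j_k} = p_{j_0} a_{j_0 j_1} ... a_{j_{k-1} j_k}.

    The value does not depend on the starting place i."""
    m = p[word[0]]
    for a, b in zip(word, word[1:]):
        m *= A[a][b]
    return m


def max_consistency_error(p, A, k):
    """Largest failure of the two consistency identities over all words of length k."""
    n = len(A)
    worst = 0.0
    for word in product(range(n), repeat=k):
        m = cylinder_measure(word, p, A)
        right = sum(cylinder_measure(word + (b,), p, A) for b in range(n))  # row sums = 1
        left = sum(cylinder_measure((a,) + word, p, A) for a in range(n))   # pA = p
        worst = max(worst, abs(right - m), abs(left - m))
    return worst


A = [[0.9, 0.1],
     [0.5, 0.5]]
p = stationary_vector(A)
print(p)                               # approximately [5/6, 1/6]
print(max_consistency_error(p, A, 4))  # of the order of rounding error
```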
(Thus $p$ gives the a priori probabilities of the symbols and $A$ the transition probabilities from one symbol to another.) It can be verified that $\mu_A$ extends in a well-defined way to a countably additive measure on the algebra generated by the cylinder sets, and hence, by the Carathéodory–Hopf Theorem, $\mu_A$ extends to the Borel field of $n^{\mathbb{Z}}$ and its completion $\mathcal{B}$. The resulting measure-preserving system $(n^{\mathbb{Z}}, \mathcal{B}, \mu_A, \sigma)$ models a finite-state Markov chain.
E. Rotations of the circle
From the point of view of ergodic theory, the unit circle $\mathbb{K} = \{z \in \mathbb{C} : |z| = 1\}$ is the same as the unit interval $[0, 1)$, and both are versions of $\mathbb{R}/\mathbb{Z}$, the reals 'mod 1'. Given an $\alpha \in \mathbb{R}$, we consider the map $T_\alpha: [0, 1) \to [0, 1)$ defined by
$$T_\alpha x = x + \alpha \pmod 1 = \langle x + \alpha \rangle = \text{fractional part of } x + \alpha = x + \alpha - [x + \alpha].$$
Regarded as a map of $\mathbb{K}$, $T_\alpha e^{2\pi i\theta} = e^{2\pi i(\theta + \alpha)}$. It is clear that $T_\alpha$ preserves Lebesgue measure. If $\alpha$ is rational, then $T_\alpha$ is periodic, all orbits being finite and of the same cardinality. Thus $T_\alpha$ is most interesting when $\alpha$ is irrational.
F. Rotations of compact abelian groups
Let $G$ be a compact abelian group and $g_0 \in G$. Define $T_{g_0}: G \to G$ by $T_{g_0} g = g + g_0$. Because Haar measure is translation-invariant, $T_{g_0}$ is a m.p.t. Again the most interesting case is when $G$ is monothetic and $g_0$ is a generator: $\{n g_0 : n \in \mathbb{Z}\}$ is dense in $G$.
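The statement that a rational rotation is periodic, with all orbits of the same cardinality, can be checked exactly. This Python sketch uses `fractions.Fraction` so that no rounding occurs; for $\alpha = p/q$ in lowest terms every orbit of $T_\alpha$ should have exactly $q$ points, since $T_\alpha^n x = x$ exactly when $n\alpha$ is an integer. The particular values of $\alpha$ and $x$ are illustrative.

```python
import math
from fractions import Fraction


def rotate(x, alpha):
    """T_alpha x = x + alpha - [x + alpha], computed exactly with Fractions."""
    y = x + alpha
    return y - math.floor(y)


def orbit(x, alpha):
    """Return the orbit of x under T_alpha.

    alpha must be a rational Fraction: for irrational alpha the orbit is
    infinite and this loop would never stop."""
    points = [x]
    y = rotate(x, alpha)
    while y != x:
        points.append(y)
        y = rotate(y, alpha)
    return points


alpha = Fraction(6, 14)  # equals 3/7 in lowest terms
for x in (Fraction(0), Fraction(1, 5), Fraction(2, 3)):
    print(x, len(orbit(x, alpha)))  # every orbit has 7 points
```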
G. Automorphisms of compact groups
Let $G$ be a compact group and $T: G \to G$ a continuous automorphism. The uniqueness of normalized Haar measure implies that it is $T$-invariant. A particular case of interest is when $G = \mathbb{K}^n = \mathbb{R}^n/\mathbb{Z}^n$ is the $n$-torus, when it can be shown that $T$ is given by an $n \times n$ integer matrix with determinant $\pm 1$.
H. Gaussian systems
Consider a stochastic process $\dots, f_{-1}, f_0, f_1, \dots$ on a probability space $(\Omega, \mathcal{F}, P)$ which is Gaussian in that the joint distribution of any finite number of the $f_i$ is Gaussian: given $j_1 < j_2 < \dots < j_n$, there are $m \in \mathbb{R}^n$ and a symmetric positive-definite $n \times n$ matrix $A = (A_{ij})$
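such that for each Borel $E \subset \mathbb{R}^n$,
$$P\{\omega : (f_{j_1}(\omega), \dots, f_{j_n}(\omega)) \in E\} = \frac{1}{(2\pi)^{n/2}\sqrt{\det A}} \int_E \exp\!\left[-\tfrac{1}{2}(x - m)^{\mathsf{T}} A^{-1} (x - m)\right] dx_1 \cdots dx_n.$$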
If $\int_\Omega f_i\,dP$ is a constant, $m_0$, and $A_{ij} = \int_\Omega (f_i - m_0)(f_j - m_0)\,dP$ depends only on $i - j$, then the process is stationary, and as in (B) determines a measure-preserving system (see Totoki 1970).
I. Geodesic flows
Let $M$ be a compact Riemannian manifold and $UT(M)$ the unit tangent bundle of $M$, i.e. the collection of all $(x, v)$, where $x \in M$ and $v$ is a unit tangent vector to $M$ at $x$. The geodesic flow $\{T_t\}$ on $UT(M)$ is defined as follows. Given $(x, v)$, find the geodesic $\gamma(t)$ which at time 0 passes through $x$ and is tangent to $v$. Flow along the geodesic at unit speed for a time $t$, and take for $T_t(x, v)$ the point and unit tangent vector that you finish with. The geodesic flow preserves the measure on $UT(M)$ that is determined by the Riemannian metric (see Gottschalk–Hedlund 1955).
J. Horocycle flows
Let $M$ be a compact oriented surface of constant negative curvature. Then $M$ is a quotient of the Poincaré disk by a discrete subgroup of isometries. Geodesics in the Poincaré disk are circular arcs that are perpendicular to the boundary circle:
[Figure: geodesics and horocycles in the Poincaré disk]
By a horocycle we mean a circle which is interior to the disk (except for the point of tangency) and tangent to the boundary. The horocycle flow on $UT(M)$ is defined as follows. Given $(x, v) \in UT(M)$, find the geodesic through $x$ in the direction of $v$. Find the point $(x, v)_\infty$
at which this geodesic intersects the boundary. Construct the horocycle that is tangent to the boundary at this point and passes through $x$ (it is orthogonal to $v$ at $x$). Flow along this horocycle (in a pre-selected sense) at unit speed for a time $t$, and take for $T_t(x, v)$ the equivalence class of the point and unit normal vector that you finish with. Again the natural volume is preserved. It is important to note that the geodesic and horocycle flows on the full Poincaré disk have no recurrence whatsoever, but even strong mixing does arise when we pass to a compact quotient by a discrete group of isometries (see Gottschalk and Hedlund 1955).
K. Flows and automorphisms on homogeneous spaces
Let $G$ be a unimodular Lie group, $\Gamma$ a discrete subgroup such that $G/\Gamma$ has finite volume (as determined by the Haar measure on $G$), and $\{g_t : t \in \mathbb{R}\}$ a one-parameter subgroup of $G$. Then $\{g_t\}$ determines a volume-preserving flow by left multiplication on the set of cosets $g\Gamma$. In fact, the classical geodesic and horocycle flows arise in this way (see Auslander, Green and Hahn 1963 and Brezin and Moore 1981). Alternatively, let $T: G \to G$ be a continuous automorphism such that $T\Gamma = \Gamma$. Then the map induced on $G/\Gamma$ by $T$ is a m.p.t. By combining translations and automorphisms, one can produce affine maps on homogeneous spaces (see Parry 1969c, 1971).
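The second construction in K includes the toral automorphisms of example G: take $G = \mathbb{R}^2$, $\Gamma = \mathbb{Z}^2$ and $T$ given by an integer matrix with determinant $\pm 1$. As a finite illustration only (not a proof that Lebesgue measure is preserved), this Python sketch uses the hypothetical choice $M = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ and restricts it to the grid points $(i/N, j/N)$ of the torus. Because $\det M = 1$, $M$ is invertible modulo $N$, so it permutes these $N^2$ points and preserves the uniform measure on each such grid; since a permutation of a finite set has finite order, some power of $M$ is the identity on the grid.

```python
def cat_map(point, N):
    """Apply M = [[2, 1], [1, 1]] (det M = 1) to the grid point (i/N, j/N) of R^2/Z^2.

    The point is stored as the integer pair (i, j) with 0 <= i, j < N."""
    i, j = point
    return (2 * i + j) % N, (i + j) % N


def is_permutation_of_grid(N):
    """True if the map sends the N*N grid points bijectively onto themselves,
    i.e. preserves the uniform (counting) measure on the grid."""
    grid = [(i, j) for i in range(N) for j in range(N)]
    return len({cat_map(pt, N) for pt in grid}) == N * N


def grid_period(N):
    """Smallest n >= 1 with M^n equal to the identity on the N-grid."""
    grid = [(i, j) for i in range(N) for j in range(N)]
    current = [cat_map(pt, N) for pt in grid]
    n = 1
    while current != grid:
        current = [cat_map(pt, N) for pt in current]
        n += 1
    return n


print(all(is_permutation_of_grid(N) for N in range(1, 40)))
print([grid_period(N) for N in range(1, 8)])
```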
1.3. The basic constructions
In any subject it is valuable to have ways to modify or combine old objects to make new ones. In this section we list some of the techniques that are available in ergodic theory.
A. Factors
Let $(X, \mathcal{B}, \mu, T)$ and $(Y, \mathcal{C}, \nu, S)$ be measure-preserving systems
and $\phi: X \to Y$ a map such that $\phi T = S\phi$ a.e. and $\mu(\phi^{-1}E) = \nu(E)$ for all $E \in \mathcal{C}$. Then $\phi$ is called a homomorphism (sometimes homomorphism mod 0) or factor map, $(Y, \mathcal{C}, \nu, S)$ (or $S$) is called a factor of $(X, \mathcal{B}, \mu, T)$ (or $T$), and $(X, \mathcal{B}, \mu, T)$ is called an extension of $(Y, \mathcal{C}, \nu, S)$.
B. Products
If $(X_1, \mathcal{B}_1, \mu_1, T_1)$ and $(X_2, \mathcal{B}_2, \mu_2, T_2)$ are measure-preserving systems, we define the product map $T = T_1 \times T_2$ on $X_1 \times X_2$ by $T(x_1, x_2) = (T_1 x_1, T_2 x_2)$. Then $T$ is a m.p.t. on the (completed) product measure space $(X_1 \times X_2, \mathcal{B}_1 \otimes \mathcal{B}_2, \mu_1 \times \mu_2)$.
C. Skew products
Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system, and suppose that $\{S_x : x \in X\}$ is a family of m.p.ts of another probability space $(Y, \mathcal{C}, \nu)$. Assume that $(x, y) \mapsto S_x y$ is a jointly measurable map $X \times Y \to Y$. Then $\hat T: X \times Y \to X \times Y$ defined by
$$\hat T(x, y) = (Tx, S_x y)$$
is a m.p.t. on the product space (Exercise 6).
D. Flow under a function
Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system and $f: X \to (0, \infty)$ a measurable function. We construct a one-parameter flow on the region
$$\Gamma = \{(x, t) : 0 \le t \le f(x)\}$$
under the graph of $f$. Each point flows vertically at unit speed, and we identify the points $(x, f(x))$ and $(Tx, 0)$.
It is easy to see (Exercise 7) that this flow preserves the product of $\mu$ with Lebesgue measure. The Ambrose–Kakutani Theorem (see Jacobs 1960) says that under suitable conditions every flow can be represented as a flow under a function.
E. Induced transformations
There are two kinds of induced transformations, the derivative transformation on a subset and the primitive transformation on a superset.
1. Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system and $A \subset X$ a measurable subset such that for a.e. $x \in A$ there is a smallest positive $n(x) \in \mathbb{N}$ such that $T^{n(x)}x \in A$. Then $T_A: A \to A$ defined by $T_A x = T^{n(x)}x$ is an invertible measurable map on $A$ which preserves the normalized restriction of $\mu$ to $A$.
2. Let $X = Y_0 \supset Y_1 \supset Y_2 \supset Y_3 \supset \cdots$ be a decreasing sequence of measurable sets, for each $i$ take a copy $X_i$ of $Y_i$ such that all the $X_i$ are disjoint, and form the disjoint union $\hat X = \bigcup_{i=0}^{\infty} X_i$. Then $\hat X$ inherits a $\sigma$-algebra $\hat{\mathcal{B}}$ and (possibly infinite) measure $\hat\mu$ from $X$ in the obvious way. We think of $\hat X$ as a tower or skyscraper built over $X_0 = X$.
[Figure: the tower $\hat X$, with levels $X_1, X_2, X_3, \ldots$ stacked over the base $X_0 = X$.]
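$\hat T: \hat X \to \hat X$ is defined by mapping each point in the tower to the one directly above it if there is one, and otherwise to the image under $T$ in $X_0$ of its projection to the base. This can be made precise in the following way. We have measure-preserving injections $\phi_{ij}: X_j \to X_i$ for $i < j$ induced by the inclusions $Y_j \subset Y_i$ (with $\phi_{ii}$ = identity). We define
$$\hat T\hat x = \begin{cases} \phi_{i,i+1}^{-1}\hat x & \text{if } \hat x \in X_i \text{ and } \phi_{i,i+1}^{-1}\{\hat x\} \neq \emptyset,\\ T(\phi_{0i}\hat x) & \text{otherwise } (\hat x \in X_i).\end{cases}$$
One can show that $\hat T$ preserves $\hat\mu$.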
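For the derivative transformation of item 1, the return time $n(x)$ and $T_Ax = T^{n(x)}x$ can be computed by brute-force iteration. The Python sketch below is only an illustration under stated assumptions: the name `first_return`, the cap `max_steps`, and the sample data (a rotation by $2/5$ on $[0, 1)$ and $A = [0, \frac12)$, in exact rational arithmetic) are our own hypothetical choices.

```python
from fractions import Fraction

def first_return(T, in_A, x, max_steps=10_000):
    """Hypothetical helper: return (n(x), T_A x) for a point x of A.

    n(x) is the smallest n >= 1 with T^n x in A; we give up after max_steps.
    """
    y = x
    for n in range(1, max_steps + 1):
        y = T(y)
        if in_A(y):
            return n, y
    raise RuntimeError("no return to A within max_steps")

# Sample data (assumed): rotation by 2/5 and A = [0, 1/2).
rotate = lambda x: (x + Fraction(2, 5)) % 1
in_A = lambda x: x < Fraction(1, 2)

# The points of the five-point orbit {0, 1/5, ..., 4/5} that lie in A: 0, 1/5, 2/5.
orbit_in_A = [Fraction(k, 5) for k in range(5) if in_A(Fraction(k, 5))]
returns = {x: first_return(rotate, in_A, x) for x in orbit_in_A}
```

On this orbit $T_A$ permutes the three points of $A$ (so it preserves normalized counting measure on $A$), and the return times $1, 2, 2$ add up to the length $5$ of the orbit.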
F. Inverse limits
Let $(X_i, \mathscr{B}_i, \mu_i, T_i)$ be measure-preserving systems for $i = 0, 1, 2, \ldots$ and suppose that for each $i \le j$ there is a homomorphism $\phi_{ij}: X_j \to X_i$ such that $\phi_{ii}$ = identity and $\phi_{ij}\phi_{jk} = \phi_{ik}$ if $i \le j \le k$. Form the subset $X$
of the infinite product $\prod_{i=0}^{\infty} X_i$ defined by $X = \{x = (x_i): \phi_{ij}x_j = x_i \text{ for all } i \le j\}$. Let $\pi_i: X \to X_i$ be the projection: $\pi_ix = x_i$. Then clearly $\pi_i = \phi_{ij}\pi_j$ on $X$. Let $\mathscr{B}$ be the smallest σ-algebra on $X$ which contains all the $\pi_i^{-1}\mathscr{B}_i$. The countably additive measure $\mu$ defined on the algebra $\bigcup_i \pi_i^{-1}\mathscr{B}_i$ by $\mu(\pi_i^{-1}E) = \mu_i(E)$ if $E \in \mathscr{B}_i$ extends (for Lebesgue spaces; see 4.5 and 4.6) to all of $\mathscr{B}$. Complete $\mathscr{B}$ with respect to $\mu$. Define $T: X \to X$ by $T(x_i) = (T_ix_i)$. Then clearly each $\pi_i: (X, \mathscr{B}, \mu, T) \to (X_i, \mathscr{B}_i, \mu_i, T_i)$ is a homomorphism. The system $(X, \mathscr{B}, \mu, T)$ is called the inverse limit of the directed family $\{(X_i, \mathscr{B}_i, \mu_i, T_i): i = 0, 1, 2, \ldots\}$. Such a construction also applies for more general partially ordered index sets.

G. Natural extensions
Let $(X, \mathscr{B}, \mu)$ be a measure space and $T: X \to X$ a possibly noninvertible measurable ($T^{-1}\mathscr{B} \subset \mathscr{B}$) measure-preserving ($\mu T^{-1} = \mu$) map. A measure-preserving system $(\hat X, \hat{\mathscr{B}}, \hat\mu, \hat T)$ (with $\hat T$ invertible) is called a natural extension of $(X, \mathscr{B}, \mu, T)$ if there is a homomorphism $\phi: (\hat X, \hat{\mathscr{B}}, \hat\mu, \hat T) \to (X, \mathscr{B}, \mu, T)$ such that $\bigvee_{n=0}^{\infty}\hat T^n\phi^{-1}\mathscr{B} = \hat{\mathscr{B}}$ up to sets of measure 0. A natural extension can be constructed by taking for the space the inverse limit of $\{(X_i, \mathscr{B}_i, \mu_i, T_i)\}$, where all $(X_i, \mathscr{B}_i, \mu_i) = (X, T^{-i}\mathscr{B}, \mu)$, each $T_i$ = identity, and $\phi_{ij} = T^{j-i}$, and for the transformation the (backwards unilateral) shift. The natural extension is unique up to isomorphism.
1.4. Some useful facts from measure theory and functional analysis

A. Change of variables
Let $(X, \mathscr{B}, \mu)$ be a measure space, $(Y, \mathscr{C})$ a measurable space (i.e., a set $Y$ together with a σ-algebra $\mathscr{C}$ of subsets of $Y$), and $\phi: X \to Y$ a measurable map, in the sense that $\phi^{-1}\mathscr{C} \subset \mathscr{B}$. Then $\phi$ carries $\mu$ to a measure denoted by $\phi\mu$ or $\mu\phi^{-1}$ on $\mathscr{C}$, which is defined by $\mu\phi^{-1}(E) = \mu(\phi^{-1}E)$ for
all $E \in \mathscr{C}$.

Proposition 4.1 If $\phi: (X, \mathscr{B}, \mu) \to (Y, \mathscr{C})$ is a measurable map and $f$ is a real-valued measurable function on $Y$, then
$$\int_Y f\,d(\mu\phi^{-1}) = \int_X f\circ\phi\,d\mu,$$
in the sense that if one of these integrals exists then so does the other and the two are equal.

Proof: When $f = \chi_E$ is the characteristic function of $E \in \mathscr{C}$,
$$\int_Y \chi_E\,d(\mu\phi^{-1}) = \mu\phi^{-1}(E) \quad\text{and}\quad \int_X \chi_E\circ\phi\,d\mu = \int_X \chi_{\phi^{-1}E}\,d\mu = \mu(\phi^{-1}E),$$
so that the result holds by definition of $\mu\phi^{-1}$. Hence the formula is also true for simple functions (i.e., linear combinations of characteristic functions). If $f$ is a nonnegative measurable function, then $f$ is the pointwise limit of an increasing sequence of simple functions, and the result follows from the Monotone Convergence Theorem. Finally, any measurable function $f$ on $Y$ can be written as the difference $f = f^+ - f^-$ of two nonnegative measurable functions on $Y$, so the formula is true in general.

B. Proofs by approximation
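Proposition 4.1 can be checked numerically in a simple case. In the sketch below (our own illustration, not from the text) $\mu$ is Lebesgue measure on $[0, 1)$ and $\phi(x) = x^2$; since $\mu\{x: x^2 < y\} = \sqrt y$, the image measure $\mu\phi^{-1}$ has density $1/(2\sqrt y)$ on $(0, 1)$. The helper names and the midpoint-rule quadrature are assumptions made for the example.

```python
import math

def midpoint(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

phi = lambda x: x * x                    # phi: [0, 1) -> [0, 1)
density = lambda y: 0.5 / math.sqrt(y)   # density of mu phi^{-1} with respect to dy on (0, 1)

def both_sides(f):
    """Return (integral of f d(mu phi^{-1}), integral of f o phi d mu)."""
    lhs = midpoint(lambda y: f(y) * density(y), 0.0, 1.0)
    rhs = midpoint(lambda x: f(phi(x)), 0.0, 1.0)
    return lhs, rhs
```

For $f(y) = y$ both sides are approximately $1/3$, and for $f(y) = y^2$ both are approximately $1/5$.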
The argument of the preceding paragraph is of a kind frequently encountered in ergodic theory, except that, even more, it is sometimes enough to check a formula only for certain characteristic functions in order to be able to conclude that it holds for all $f \in L^1$ or $L^2$, all measurable functions, or whatever. A semialgebra on $X$ is a collection $\mathscr{S}$ of subsets of $X$ which is closed under finite intersections and such that the complement of any $S \in \mathscr{S}$ is a finite disjoint union of members of $\mathscr{S}$. We say that $\mathscr{S}$ generates a σ-algebra $\mathscr{B}$ if $\mathscr{B}$ is the smallest (complete) σ-algebra which contains $\mathscr{S}$. Examples of generating semialgebras (for the usual σ-algebras) are the collection of half-open intervals $[a, b)$ in $[0, 1)$ and the family of cylinder sets
$$\{x \in \{0, 1, \ldots, n-1\}^{\mathbb{Z}}: x_{i_1} = j_1,\ x_{i_2} = j_2,\ \ldots,\ x_{i_k} = j_k\}.$$

Proposition 4.2 Suppose that $p_n: L^2 \times L^2 \to [0, \infty)$, for $n = 0, 1, 2, \ldots$, is a function such that
(i) $p_n(f, g) \le K\|f\|_2\|g\|_2$ for all $f, g \in L^2$ and some constant $K$;
(ii) $p_n(f_1 + f_2, g) \le p_n(f_1, g) + p_n(f_2, g)$, $p_n(f, g_1 + g_2) \le p_n(f, g_1) + p_n(f, g_2)$, $p_n(af, g) \le |a|\,p_n(f, g)$, and $p_n(f, ag) \le |a|\,p_n(f, g)$ for all $f, f_1, f_2, g, g_1, g_2 \in L^2$ and $a \in \mathbb{R}$.
If $\lim_{n\to\infty} p_n(\chi_E, \chi_F) = 0$ for all sets $E, F$ in a generating semialgebra in $\mathscr{B}$, then $\lim_{n\to\infty} p_n(f, g) = 0$ for all $f, g \in L^2$.
Proof: By (ii) it follows immediately that $\lim_{n\to\infty} p_n(\chi_A, \chi_B) = 0$ whenever $A$ and $B$ are disjoint unions of sets in $\mathscr{S}$, and also that $\lim_{n\to\infty} p_n(\phi, \psi) = 0$ whenever $\phi$ and $\psi$ are linear combinations of characteristic functions of sets in the algebra generated by $\mathscr{S}$ (which consists of all finite disjoint unions of members of $\mathscr{S}$). The set of such simple functions $\phi$ is dense in $L^2$. Given any $f, g \in L^2$, choose such simple $\phi_k \to f$ in $L^2$ and $\psi_k \to g$ in $L^2$. Then
$$\begin{aligned} p_n(f, g) &= p_n(\phi_k + f - \phi_k, g) \le p_n(\phi_k, g) + p_n(f - \phi_k, g)\\ &\le p_n(\phi_k, \psi_k) + p_n(\phi_k, g - \psi_k) + p_n(f - \phi_k, g)\\ &\le p_n(\phi_k, \psi_k) + K\|\phi_k\|_2\|g - \psi_k\|_2 + K\|f - \phi_k\|_2\|g\|_2. \end{aligned}$$
Given $\varepsilon > 0$, choose $k$ so that each of the last two terms is less than $\varepsilon/3$, and then choose $N$ so that the first term is less than $\varepsilon/3$ for all $n \ge N$.

C. Measure algebras and Lebesgue spaces
There is a coherent way to ignore the sets of measure 0 in a measure space. A measure algebra is a pair $(\mathscr{B}, \mu)$, where $\mathscr{B}$ is a Boolean σ-algebra (i.e., a set with operations $\vee$, $\wedge$, $'$ which behave the same way as union, intersection, and complementation do in a σ-algebra), and $\mu$ is a countably additive positive ($\mu(B) \ge 0$, with equality if and only if $B = 0$) extended real-valued function on $\mathscr{B}$. If $\mathscr{B}$ is a Boolean σ-algebra and $\mathscr{I} \subset \mathscr{B}$, then we say that $\mathscr{I}$ is a σ-ideal in case
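(i) $I \in \mathscr{I}$, $B \in \mathscr{B}$, $B \le I$ (i.e., $B \wedge I = B$) implies $B \in \mathscr{I}$; and
(ii) $I_n \in \mathscr{I}$ ($n = 1, 2, \ldots$) implies $\bigvee_{n=1}^{\infty} I_n \in \mathscr{I}$.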
If $\mathscr{B}$ is a Boolean σ-algebra and $\mathscr{I} \subset \mathscr{B}$ is a σ-ideal, then $\mathscr{I}$ determines an equivalence relation on $\mathscr{B}$ according to $B_1 \sim B_2$ if and only if $B_1 \triangle B_2 = (B_1 \wedge B_2') \vee (B_1' \wedge B_2) \in \mathscr{I}$. It is not difficult to check that the collection $\mathscr{B}/\mathscr{I}$ of equivalence classes forms a Boolean σ-algebra. For example, let $\mathscr{N}$ denote the collection of sets of measure zero. Then $\mathscr{N}$ is a σ-ideal in $\mathscr{B}$, $\mu$ is constant on equivalence classes of $\mathscr{B}$ modulo $\mathscr{N}$ and hence determines a countably additive positive extended real-valued function $\tilde\mu$ on $\tilde{\mathscr{B}} = \mathscr{B}/\mathscr{N}$, and $(\tilde{\mathscr{B}}, \tilde\mu)$ is a measure algebra. The most important case is when $\mathscr{B}$ is the Boolean σ-algebra of measurable sets in a measure space $(X, \mathscr{B}, \mu)$; then $(\tilde{\mathscr{B}}, \tilde\mu)$ is called the measure algebra of the measure space $(X, \mathscr{B}, \mu)$. Let $(\mathscr{B}, \mu)$ and $(\mathscr{C}, \nu)$ be measure algebras. A map $T: \mathscr{B} \to \mathscr{C}$ is called a homomorphism (of measure algebras) in case $T$ commutes with the Boolean
σ-algebra operations (so that $T(B_1 \wedge B_2') = (TB_1) \wedge (TB_2)'$ and $T\bigvee_n B_n = \bigvee_n TB_n$ for all $B_1, B_2, \ldots \in \mathscr{B}$) and satisfies $\mu(B) = \nu(TB)$ for all $B \in \mathscr{B}$. If $T$ is onto and one-to-one, then it is called an isomorphism. Let $(\mathscr{B}, \mu)$ be a measure algebra. We define a metric $d$ on $\mathscr{B}$ by $d(A, B) = \mu(A \triangle B)$ for $A, B \in \mathscr{B}$. $(\mathscr{B}, d)$ is called the metric space of the measure algebra $(\mathscr{B}, \mu)$.

Proposition 4.3 The metric space of a measure algebra is complete.

A measure algebra $(\mathscr{B}, \mu)$ is called separable in case its associated metric space is separable, that is, there is a countable dense set. If, for example, $(\mathscr{B}, \mu)$ is a finite measure algebra and $\mathscr{B}$ is countably generated, then $(\mathscr{B}, \mu)$ is separable. For a measure algebra $(\mathscr{B}, \mu)$, a nonzero $B \in \mathscr{B}$ is called an atom if whenever $A \le B$ either $A = 0$ or $A = B$. A measure algebra $(\mathscr{B}, \mu)$ is called nonatomic if it has no atoms. The measure algebra $(\tilde{\mathscr{B}}, \tilde m)$ of the unit interval is normalized ($\tilde m(X) = 1$), separable (since $\mathscr{B}$ is generated by a countable base for the open sets), and nonatomic. The following important theorem says that these three properties are sufficient to characterize the measure algebra of the unit interval.

Theorem 4.4 (Carathéodory 1939) If $(\mathscr{B}, \mu)$ is a normalized, separable, nonatomic measure algebra, then there is an isomorphism from $(\mathscr{B}, \mu)$ onto the measure algebra $(\tilde{\mathscr{B}}, \tilde m)$ of the unit interval.
Thus the measure space consisting of the unit interval with Lebesgue measure is a fairly representative object, at least at the level of measure algebras. We single out the class of measure spaces that are actually isomorphic to this standard one, with the possible addition of countably many point masses, in the sense of 'point isomorphism mod 0'.

Definition 4.5 A (complete) finite measure space $(Y, \mathscr{C}, \nu)$ is called a Lebesgue space if it is isomorphic (mod 0) with the ordinary Lebesgue measure space $(X, \mathscr{B}, \mu)$ of a (possibly empty) interval $[a, b) \subset \mathbb{R}$ together with countably many point masses. That is, there are $y_1, y_2, \ldots \in Y$, sets of measure zero $Y_0 \subset Y$ and $X_0 \subset X$, and a one-to-one onto map $\phi:$
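$Y \setminus (Y_0 \cup \{y_1, y_2, \ldots\}) \to X \setminus X_0$ such that $\phi$ and $\phi^{-1}$ are measurable and $\mu(\phi E) = \nu(E)$ for all measurable $E \subset Y \setminus (Y_0 \cup \{y_1, y_2, \ldots\})$.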
For a systematic treatment of the theory of Lebesgue spaces, see Rokhlin (1949) or Halmos and von Neumann (1942). For our purposes, it is enough to register the following two theorems of von Neumann (1932a) (see also Billingsley 1965, p. 69).

Theorem 4.6 If $X$ is a complete separable metric space and $\mathscr{B}$ is the completion of the family of Borel sets with respect to a Borel probability measure $\mu$ on $X$, then $(X, \mathscr{B}, \mu)$ is a Lebesgue space.

Theorem 4.7 If $(X, \mathscr{B}, \mu)$ and $(Y, \mathscr{C}, \nu)$ are Lebesgue spaces and $\Phi: (\tilde{\mathscr{C}}, \tilde\nu) \to (\tilde{\mathscr{B}}, \tilde\mu)$ is a homomorphism of their associated measure algebras, then $\Phi$ arises from a point homomorphism mod 0: there is a set of measure zero $X_0 \subset X$ and a measurable transformation $\phi: X \setminus X_0 \to Y$ such that $\Phi$ coincides with $\phi^{-1}$ as a map $(\tilde{\mathscr{C}}, \tilde\nu) \to (\tilde{\mathscr{B}}, \tilde\mu)$.
Since for Lebesgue spaces the concepts of point homomorphism mod 0 and homomorphism of measure algebras (the measurable sets reduced modulo the σ-ideal of sets of measure 0) essentially coincide, a cavalier attitude toward sets of measure 0 can be forgiven. In this book we will deal exclusively with Lebesgue spaces, and in fact will usually work with just the unit interval, since the action of a m.p.t. on any atomic part is relatively uninteresting.

D. Conditional expectation
Let $(X, \mathscr{B}, \mu)$ be a probability space, $f \in L^1(X, \mathscr{B}, \mu)$, and $\mathscr{F}$ a sub-σ-algebra of $\mathscr{B}$. Then
$$\nu(A) = \int_A f\,d\mu, \quad A \in \mathscr{F},$$
defines a finite (signed) measure which is absolutely continuous with respect to the restriction of $\mu$ to $\mathscr{F}$. By the Radon-Nikodym Theorem there is a function $g \in L^1(X, \mathscr{F}, \mu)$ such that
$$\nu(A) = \int_A g\,d\mu \quad\text{for all } A \in \mathscr{F}.$$
Any other such function $g_1$ must coincide a.e. $d\mu$ with $g$. We use the notation $E(f|\mathscr{F})$ for $g$, and call $E(f|\mathscr{F})$ the conditional expectation of $f$ with respect to $\mathscr{F}$. As an element of $L^1$, $E(f|\mathscr{F})$ is completely characterized by the following
two properties:
(i) $E(f|\mathscr{F})$ is $\mathscr{F}$-measurable;
(ii) $\int_A E(f|\mathscr{F})\,d\mu = \int_A f\,d\mu$ for all $A \in \mathscr{F}$.
This seemingly innocent application of the Radon-Nikodym Theorem in fact provides one of the most important kinds of averaging processes. The meaning of $E(f|\mathscr{F})$ can best be apprehended in case $f = \chi_A$ is a characteristic function, in which case we use the notation
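$\mu(A|\mathscr{F}) = E(\chi_A|\mathscr{F})$ and call $\mu(A|\mathscr{F})$ the conditional probability of $A$ given $\mathscr{F}$, and $\mathscr{F}$ is a finite σ-algebra $\{\emptyset, E, E^c, X\}$, for some $E \in \mathscr{B}$ with $\mu(E) > 0$ and $\mu(E^c) > 0$. If $E(f|\mathscr{F})$ is to be $\mathscr{F}$-measurable in this case, it must be constant on $E$ and on $E^c$; then (ii) gives
$$E(\chi_A|\mathscr{F}) = \begin{cases} \dfrac{\mu(A \cap E)}{\mu(E)} = \mu(A|E) & \text{on } E,\\[2mm] \dfrac{\mu(A \cap E^c)}{\mu(E^c)} = \mu(A|E^c) & \text{on } E^c. \end{cases}$$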
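We see that $\mu(A|\mathscr{F})(x)$ gives the probability of occurrence of $A$ once we know to which elements of $\mathscr{F}$ the point $x$ belongs. In the general case, too, $E(f|\mathscr{F})(x)$ represents the expected value of $f$ if we know for each $E \in \mathscr{F}$ whether or not $x \in E$. The following properties of conditional expectation can be proved easily by using the characterization by (i) and (ii).

Proposition 4.8
(1) If $f \ge 0$, then $E(f|\mathscr{F}) \ge 0$ a.e.
(2) If $f$ is $\mathscr{F}$-measurable, then $E(f|\mathscr{F}) = f$ a.e.
(3) If $f$ is $\mathscr{F}$-measurable, then $E(fg|\mathscr{F}) = f\,E(g|\mathscr{F})$ a.e.
(4) $E(E(f|\mathscr{F})) = E(f)$, where $E(f) = \int_X f\,d\mu$.
(5) If $\mathscr{F}_1 \subset \mathscr{F}$, then $E(E(f|\mathscr{F})|\mathscr{F}_1) = E(f|\mathscr{F}_1)$ a.e.
(6) If $f$ is independent of $\mathscr{F}$, then $E(f|\mathscr{F}) = E(f)$ a.e.
(7) For each $p \ge 1$, $E(\cdot|\mathscr{F})$ is a linear operator of norm 1 on $L^p(X, \mathscr{B}, \mu)$.
(8) Jensen's Inequality: If $\phi: \mathbb{R} \to \mathbb{R}$ is convex ($\lambda_i \ge 0$, $\sum_{i=1}^n \lambda_i = 1$, $x_1, \ldots, x_n \in \mathbb{R}$ implies $\phi(\sum_{i=1}^n \lambda_ix_i) \le \sum_{i=1}^n \lambda_i\phi(x_i)$) and $\phi\circ f$ is integrable, then $\phi(E(f|\mathscr{F})) \le E(\phi\circ f|\mathscr{F})$ a.e.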
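When $\mathscr{F}$ is generated by a finite partition of a finite probability space, the two-set computation above extends directly: on each block, $E(f|\mathscr{F})$ is the $\mu$-weighted average of $f$ over the block. The sketch below is a hypothetical helper written for illustration (the names `cond_exp`, `weights`, and the sample partition are ours); it reproduces the formula for $E(\chi_A|\mathscr{F})$ and lets one check property (ii) and item (4) of Proposition 4.8 on a small example.

```python
from fractions import Fraction

def cond_exp(weights, f, partition):
    """Hypothetical helper: E(f | F) where F is generated by a finite partition.

    weights[x] = mu({x}) on a finite space; every block must have positive mass.
    """
    result = {}
    for block in partition:
        mass = sum(weights[x] for x in block)
        average = sum(weights[x] * f[x] for x in block) / mass
        for x in block:
            result[x] = average          # constant on each block, hence F-measurable
    return result

# Sample data (assumed): uniform measure on {0,...,5}, E = {0,1}, f = indicator of A = {1,2}.
weights = {x: Fraction(1, 6) for x in range(6)}
E, Ec = {0, 1}, {2, 3, 4, 5}
f = {x: 1 if x in {1, 2} else 0 for x in range(6)}
cond = cond_exp(weights, f, [E, Ec])
```

Here $E(\chi_A|\mathscr{F})$ equals $\mu(A \cap E)/\mu(E) = \frac12$ on $E$ and $\mu(A \cap E^c)/\mu(E^c) = \frac14$ on $E^c$, in agreement with the formula above.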
E. The Spectral Theorem
There are various ways of stating this result, of which we give three. Let $T: H \to H$ be a continuous normal ($TT^* = T^*T$) operator on a separable Hilbert space $H$.
(1) There is a finite measure space $(X, \mathscr{B}, \mu)$ and a function $\phi \in L^\infty(X, \mathscr{B}, \mu)$ such that $T$ is unitarily equivalent to multiplication by $\phi$ on $L^2(X, \mathscr{B}, \mu)$.
(2) There are Borel measures $\sigma_1 \gg \sigma_2 \gg \cdots$ on $\mathbb{C}$ such that $T$ is unitarily equivalent to the operator $S$ on the direct sum $\bigoplus_{i=1}^{\infty} L^2(\mathbb{C}, \sigma_i)$ defined by
$$S(f_1(z_1), f_2(z_2), \ldots) = (z_1f_1(z_1), z_2f_2(z_2), \ldots).$$
The $\sigma_i$ are determined up to absolute-continuity equivalence. The equivalence class of $\sigma_1$ is called the maximum spectral type of $T$.
(3) There is a unique spectral measure $E$ such that
$$T = \int_{\mathbb{C}} \lambda\,dE_\lambda.$$
That is, $E$ is a projection-valued Borel measure on $\mathbb{C}$ with compact support such that $E(\mathbb{C})$ = identity, and $E(\bigcup_{i=1}^{\infty} A_i)f = \sum_{i=1}^{\infty} E(A_i)f$ if the $A_i$ are measurable and disjoint, where the series converges in the norm of $H$. $E$ is supported on the spectrum $\sigma(T) = \{\lambda \in \mathbb{C}: T - \lambda I \text{ has no continuous inverse}\}$ of $T$. If $h$ is any bounded Borel-measurable function on $\sigma(T)$, the integral
$$h(T) = \int_{\sigma(T)} h(\lambda)\,dE(\lambda)$$
can be defined by noting that if $f, g \in H$, then $(E(\cdot)f, g)$ is a complex measure on $\sigma(T)$, and the equation
$$(Sf, g) = \int_{\sigma(T)} h(\lambda)\,d(E(\lambda)f, g) \quad\text{for all } f, g \in H$$
will define $S = h(T)$ on $H$. In particular
$$T^k = \int_{\sigma(T)} \lambda^k\,dE(\lambda), \qquad (T^kf, g) = \int_{\sigma(T)} \lambda^k\,d(E(\lambda)f, g).$$
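In finite dimensions the spectral measure of a normal operator is a finite sum of point masses at the eigenvalues, each carrying the orthogonal projection onto the corresponding eigenspace, and $h(T) = \int h(\lambda)\,dE(\lambda)$ becomes $\sum_\lambda h(\lambda)E(\{\lambda\})$. The Python sketch below works this out for the self-adjoint matrix $T = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$, whose eigenvalues are 3 and 1; the matrix, the hard-coded projections, and the helper names are our own illustrative assumptions.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

half = Fraction(1, 2)
T = [[2, 1], [1, 2]]
# Spectral measure of T: point masses at the eigenvalues 3 and 1, whose values are
# the orthogonal projections onto span{(1, 1)} and span{(1, -1)}.
E = {3: [[half, half], [half, half]],
     1: [[half, -half], [-half, half]]}

def functional_calculus(h, E):
    """h(T) = sum over the spectrum of h(lambda) E({lambda})."""
    size = len(next(iter(E.values())))
    result = [[Fraction(0)] * size for _ in range(size)]
    for lam, P in E.items():
        result = mat_add(result, scale(h(lam), P))
    return result
```

For example, `functional_calculus(lambda lam: lam**3, E)` reproduces $T^3 = \begin{pmatrix}14 & 13\\ 13 & 14\end{pmatrix}$, the constant function 1 gives the identity (since $E(\mathbb{C})$ = identity), and the product $h_1(T)h_2(T)$ agrees with $(h_1h_2)(T)$, which is the finite-dimensional content of Proposition 4.9.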
There are two facts about these spectral integrals that are very useful in applications.
Proposition 4.9 If $R = \int_{\mathbb{C}} f(\lambda)\,dE(\lambda)$ and $S = \int_{\mathbb{C}} g(\lambda)\,dE(\lambda)$, then $RS = \int_{\mathbb{C}} f(\lambda)g(\lambda)\,dE(\lambda)$.

Proposition 4.10 Let $T$ have spectral measure $E$, let $x \in H$, and let $\nu$ be a finite positive measure which is absolutely continuous with respect to the measure $(E(\cdot)x, x)$. Then there is $y \in H$ such that $\nu(A) = (E(A)y, y)$ for all Borel sets $A$. (Actually such a $y$ can be found in the smallest $T$- and $T^*$-invariant closed subspace containing $x$.)

F. Topological groups, Haar measure, and character groups
Let $G$ be a topological group, so that $G$ is simultaneously a group
and a topological space and the map $G \times G \to G$ defined by $(x, y) \mapsto xy^{-1}$ is continuous. If $G$ is locally compact, then there exists on $G$ a left Haar measure $m$, i.e., a positive, regular, Borel measure which is finite on compact sets, positive on nonempty open sets, and translation invariant: $m(gA) = m(A)$ for all $g \in G$ and all measurable $A$. The measure $m$ is unique up to a constant multiple. Similarly, there is a right Haar measure. Of course, on an abelian group they can be taken to be equal, and on a compact group they can be normalized to be probability measures (see Halmos 1950).

Henceforth let $G$ be locally compact and abelian. The character group of $G$ is the set $\hat G$ of all continuous homomorphisms of $G$ into the multiplicative group $\mathbb{K}$ of all complex numbers of modulus 1. $\hat G$ is a group under pointwise multiplication. When $\hat G$ is given the topology of uniform convergence on compact subsets of $G$, it becomes a locally compact abelian group. If $G$ is discrete, then $\hat G$ is compact; if $G$ is compact, then $\hat G$ is discrete. Examples: $\hat{\mathbb{K}} = \mathbb{Z}$, $(\mathbb{K}^n)^\wedge = \mathbb{Z}^n$, $\hat{\mathbb{R}} = \mathbb{R}$, $(\mathbb{R}^n)^\wedge = \mathbb{R}^n$. The Fourier transform $\hat f$ of a function $f \in L^1(G, m)$ is defined on $\hat G$ by
$$\hat f(\chi) = \int_G f(g)\overline{\chi(g)}\,dm(g).$$
According to the Plancherel Theorem, the Fourier transform $f \mapsto \hat f$ is an $L^2$-norm-preserving map from $L^1(G) \cap L^2(G)$ onto a dense subspace of $L^2(\hat G)$, which, therefore, extends to an isometry from $L^2(G)$ onto $L^2(\hat G)$. Frequently, $f$ can be recovered from $\hat f$ via an inversion formula. If $G$ is compact and $m$ is normalized so that $m(G) = 1$, then $\hat G$ forms a complete orthonormal set in $L^2(G)$ and one has the Fourier series decomposition, in $L^2(G)$,
$$f(g) = \sum_{\chi \in \hat G} \hat f(\chi)\chi(g) \quad\text{for any } f \in L^2(G).$$
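For the finite cyclic group $G = \mathbb{Z}_n$, with Haar measure normalized to give each point mass $1/n$, the characters are $\chi_k(g) = e^{2\pi ikg/n}$ and the Fourier series above becomes the discrete Fourier transform. The sketch below (helper names ours, pure Python with `cmath`) checks the inversion formula and Plancherel's identity $\sum_k |\hat f(\chi_k)|^2 = \frac1n\sum_g |f(g)|^2$ numerically; it is an illustration, not an efficient FFT.

```python
import cmath

def characters(n):
    """The n characters chi_k(g) = exp(2*pi*i*k*g/n) of Z_n."""
    return [lambda g, k=k: cmath.exp(2j * cmath.pi * k * g / n) for k in range(n)]

def fourier(f):
    """hat f(chi_k) = integral of f * conj(chi_k) against normalized Haar measure."""
    n = len(f)
    return [sum(f[g] * chi(g).conjugate() for g in range(n)) / n
            for chi in characters(n)]

def fourier_series(fhat):
    """Reconstruct f(g) = sum_k hat f(chi_k) chi_k(g)."""
    n = len(fhat)
    chis = characters(n)
    return [sum(fhat[k] * chis[k](g) for k in range(n)) for g in range(n)]

f = [1.0, 2.0, 0.0, -1.0]
fhat = fourier(f)
```

The coefficient of the trivial character is the mean of $f$, here $0.5$, and both sides of Plancherel's identity equal $1.5$ up to rounding.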
The Pontryagin Duality Theorem says that $(\hat G)^\wedge$ can be identified in a
natural way with $G$. Thus $\hat{\mathbb{Z}} = \mathbb{K}$, etc. See Rudin (1967) for more details and references.

Exercises
1. Show that if $T$ is a m.p.t., then $U = U_T$, defined on $L^2(X, \mathscr{B}, \mu)$ by $U_Tf(x) = f(Tx)$, is unitary. What if $T$ is noninvertible?
2. Show that:
(a) The one-sided Bernoulli shift $(\sigma x)_n = x_{n+1}$ on $\prod_{n=0}^{\infty}\{0, 1\}$, where 0 and 1 both have weights $\frac12$, is isomorphic to the map $t \mapsto 2t \bmod 1$ on $[0, 1)$.
(b) $\mathscr{B}(\frac12, \frac12)$ is isomorphic to the 'baker's transformation'
$$T(x, y) = \begin{cases} (2x \bmod 1, \tfrac12 y) & \text{if } 0 \le x < \tfrac12,\\ (2x \bmod 1, \tfrac12(y + 1)) & \text{if } \tfrac12 \le x < 1, \end{cases}$$
on $[0, 1) \times [0, 1)$.
3. Show that an automorphism of a compact group preserves Haar measure.
4. Show that a homomorphism of measure-preserving systems is onto, up to a set of measure 0.
5. Show that there is a correspondence between (isomorphism classes of) factors of a Lebesgue space and (appropriate equivalence classes of) invariant sub-σ-algebras. (You may use the fact that a factor of a Lebesgue space is a Lebesgue space.)
6. Verify that the skew product defined in 1.3.C is a m.p.t. (Note: To get both $T$ and $\hat T$ extra hypotheses may be needed.)
7. Verify that the flow built under a function in 1.3.D is measure-preserving.
8. Give a categorical definition of the inverse limit (1.3.F) of a countable directed family of measure-preserving systems and show that an object satisfying this definition is unique up to isomorphism.
9. (a) Construct the natural extension (1.3.G) as the shift on an inverse limit of measure spaces. (b) Prove that the natural extension is unique up to isomorphism. (c) Show that the natural extension of a unilateral Bernoulli shift is the corresponding bilateral Bernoulli shift.
10. Let $(X, \mathscr{B}, \mu)$ and $(Y, \mathscr{C}, \nu)$ be Lebesgue spaces and $T: L^2(Y, \mathscr{C}, \nu) \to L^2(X, \mathscr{B}, \mu)$ an isometry which is multiplicative: $T(fg) = Tf \cdot Tg$
whenever $f$, $g$, and $fg \in L^2(Y, \mathscr{C}, \nu)$. Show that there is a homomorphism $\phi: X \to Y$ such that $Tf(x) = f(\phi x)$ a.e.
11. Fill in the details in the proof of Liouville's Theorem (2.1).
12. Use the Carathéodory-Hopf Extension Theorem to prove Kolmogorov's Consistency Theorem: Let $A$ be an index set, and for each $n = 1, 2, \ldots$ and each $n$-tuple $(\alpha_1, \ldots, \alpha_n)$ of elements of $A$, let $\mu_{(\alpha_1, \ldots, \alpha_n)}$ be a Borel measure on $\mathbb{R}^n$. Assume that:
(i) If $\pi$ is any permutation of the set with $n$ elements and $T_\pi(x_1, \ldots, x_n) = (x_{\pi(1)}, \ldots, x_{\pi(n)})$ is the corresponding transformation of $\mathbb{R}^n$, then
$$\mu_{(\alpha_{\pi(1)}, \ldots, \alpha_{\pi(n)})}(T_\pi E) = \mu_{(\alpha_1, \ldots, \alpha_n)}(E) \quad\text{for all Borel } E \subset \mathbb{R}^n.$$
(ii) If $\Pi_{n+k,n}: \mathbb{R}^{n+k} \to \mathbb{R}^n$ is the projection map defined by $\Pi_{n+k,n}(x_1, \ldots, x_{n+k}) = (x_1, \ldots, x_n)$, then
$$\mu_{(\alpha_1, \ldots, \alpha_n)}(E) = \mu_{(\alpha_1, \ldots, \alpha_{n+k})}(\Pi_{n+k,n}^{-1}E)$$
for all $n, k = 1, 2, 3, \ldots$ and all Borel $E \subset \mathbb{R}^n$.
Then there is a probability space $(\Omega, \mathscr{F}, P)$ and a family $\{f_\alpha: \alpha \in A\}$ of measurable functions on $\Omega$ such that always
$$\mu_{(\alpha_1, \ldots, \alpha_n)}(E) = P\{\omega: (f_{\alpha_1}(\omega), \ldots, f_{\alpha_n}(\omega)) \in E\}.$$
13. Prove Proposition 4.8. (Hint: For Jensen's Inequality, first consider the case when $f$ is a simple function.)
14. Prove Proposition 4.3.
15. (Lind) Consider the two-to-one map $Tx = \frac12(x - 1/x)$ on $\mathbb{R}$.
(a) Show that $T$ preserves the measure $dx/(1 + x^2)$. (Hint: It is enough to show by calculus that $\int_{\mathbb{R}} f(\frac12(x - 1/x))\,\frac{dx}{1 + x^2} = \int_{\mathbb{R}} f(y)\,\frac{dy}{1 + y^2}$ for enough functions $f$.)
(b) Show that the change of variables $x = \tan t$ carries $T$ to a Lebesgue-measure-preserving map $S$ of $(-\pi/2, \pi/2)$.
(c) Show that $S$ is isomorphic to the one-sided Bernoulli shift $\sigma(\frac12, \frac12)$.
(d) Find a map that carries $T$ to $Vx = 4x(1 - x)$ on $[0, 1]$ (which preserves $dx/(\pi\sqrt{x(1 - x)})$).
(e) For each $a \in \mathbb{R}$, discuss the action of $T_az = \frac12(z + (a/z))$ on $\mathbb{C}$, represented as a sphere with poles at the two square roots of $a$. (Iteration of the map $T_a$ is an ancient algorithm for computing square roots.)
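Part (e) refers to the classical square-root iteration. A minimal Python sketch follows; the helper name `iterate_T_a` and the step counts are our own choices for illustration. For $a > 0$ and starting points with positive real part the iterates approach $\sqrt a$, and from the left half-plane they approach $-\sqrt a$. With $a = -1$ and a real starting point, $T_{-1}x = \frac12(x - 1/x)$ is exactly the map $T$ of this exercise, whose real orbits cannot converge, since $-1$ has no real square root.

```python
import math

def iterate_T_a(a, z0, steps):
    """Apply T_a(z) = (z + a/z)/2 repeatedly, starting from z0 (real or complex).

    The orbit must avoid z = 0, where T_a is undefined.
    """
    z = z0
    for _ in range(steps):
        z = 0.5 * (z + a / z)
    return z
```

For instance, starting from $z_0 = 1 + i$ with $a = 2$, the first step already lands on the real number 1, after which the iterates converge rapidly to $\sqrt 2$.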
2 The fundamentals of ergodic theory
In this chapter we prove the basic convergence theorems of ergodic theory, namely von Neumann's Mean Ergodic Theorem and Birkhoff's Pointwise Ergodic Theorem. We also discuss recurrence, ergodicity, strong mixing, and weak mixing. These properties help to understand the manner in which a m.p.t. moves points and sets through the space on which it acts, and their presence or absence provides the first test for deciding whether two transformations are isomorphic.
2.1. The Mean Ergodic Theorem
Although the Mean Ergodic Theorem and the Pointwise Ergodic Theorem bear the publication dates 1932 and 1931, respectively, von Neumann's theorem came first.
Theorem 1.1 Mean Ergodic Theorem (von Neumann 1932b) Let $(X, \mathscr{B}, \mu)$ be a σ-finite measure space, $T: X \to X$ a measure-preserving transformation, and $f \in L^2(X, \mathscr{B}, \mu)$. Then there is a function $\bar f \in L^2(X, \mathscr{B}, \mu)$ for which
$$\lim_{n\to\infty}\left\|\frac1n\sum_{k=0}^{n-1} f\circ T^k - \bar f\right\|_2 = 0.$$

(The requirement that $(X, \mathscr{B}, \mu)$ be σ-finite is not really necessary. For if $f \in L^2(X, \mathscr{B}, \mu)$, then $f$ vanishes outside a set $E$ of σ-finite measure. If we let
$$X' = \bigcup_{k=-\infty}^{\infty} T^kE,$$
then each $f\circ T^k$ vanishes outside $X'$, and $X'$ is invariant and has σ-finite measure. If the Mean Ergodic Theorem holds in $L^2(X')$, then we can find $Pf \in L^2(X')$ such that
$$\frac1n\sum_{k=0}^{n-1} f\circ T^k \to Pf$$
in $L^2(X')$. Then $Pf$ extends to $Pf \in L^2(X)$ (by putting $Pf(x) = 0$ in case $x \in X \setminus X'$) and $\frac1n\sum_{k=0}^{n-1} f\circ T^k \to Pf$ in $L^2(X)$.)
We will prove a more general theorem than this. Note that the m.p.t. $T: X \to X$ determines a linear operator $U: L^2 \to L^2$ according to $(Uf)(x) = f(Tx)$. $U$ is an isometry, since (using 1.4.1)
$$\int|Uf(x)|^2\,d\mu = \int|f(Tx)|^2\,d\mu = \|f\|_2^2.$$
$U$ is also invertible ($U^{-1}f(x) = f(T^{-1}x)$), and hence $U$ is a unitary operator (its adjoint equals its inverse). $U = U_T$ is called the unitary operator induced by $T$. The Mean Ergodic Theorem, then, asserts that
$$\frac1n\sum_{k=0}^{n-1} U^kf$$
converges in $L^2$; this is a consequence of the following theorem.
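Theorem 1.2 Mean Ergodic Theorem in Hilbert Space Let $U$ be a (linear) contraction on a Hilbert space $\mathscr{H}$ (so that $\|Uf\| \le \|f\|$ for all $f \in \mathscr{H}$), let $\mathscr{M} = \{f \in \mathscr{H}: Uf = f\}$ (so $\mathscr{M}$ is a closed linear subspace of $\mathscr{H}$), and let $P: \mathscr{H} \to \mathscr{M}$ be the projection of $\mathscr{H}$ onto $\mathscr{M}$. Then for each $f \in \mathscr{H}$,
$$\frac1n\sum_{k=0}^{n-1} U^kf$$
converges in $\mathscr{H}$ to $Pf$.

Proof: Let $\mathscr{N}$ denote the closed linear span of $\{g - Ug: g \in \mathscr{H}\}$; we claim that $\mathscr{N}$ and $\mathscr{M}$ are orthogonal complementary subspaces of $\mathscr{H}$. It suffices to show that $\mathscr{N}^\perp = \mathscr{M}$, where $\mathscr{N}^\perp = \{h \in \mathscr{H}: (h, n) = 0 \text{ for all } n \in \mathscr{N}\}$ and $(\cdot, \cdot)$ denotes the inner product on $\mathscr{H}$. Let $h \in \mathscr{N}^\perp$; then $(h, g - Ug) = 0$ for all $g \in \mathscr{H}$, so
$$0 = (h, g) - (h, Ug) = (h, g) - (U^*h, g) = (h - U^*h, g)$$
for all $g \in \mathscr{H}$, and hence $h = U^*h$. We wish to show that $Uh = h$. Now
$$\begin{aligned} \|Uh - h\|^2 &= (Uh - h, Uh - h) = \|Uh\|^2 - (h, Uh) - (Uh, h) + \|h\|^2\\ &= \|Uh\|^2 - (U^*h, h) - (h, U^*h) + \|h\|^2\\ &\le \|h\|^2 - (h, h) - (h, h) + \|h\|^2 = 0. \end{aligned}$$
Therefore $Uh = h$, so $h \in \mathscr{M}$. We conclude that $\mathscr{N}^\perp \subset \mathscr{M}$.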
Conversely, if $h \in \mathscr{M}$, then $Uh = h$ and hence $U^*h = h$ by the same argument applied to $U^*$ (which is also a contraction since
$$\|U^*f\|^2 = (U^*f, U^*f) = (UU^*f, f) \le \|UU^*f\|\,\|f\| \le \|U^*f\|\,\|f\|).$$
Therefore, for any $g \in \mathscr{H}$,
$$(h, g - Ug) = (h, g) - (h, Ug) = (h - U^*h, g) = 0,$$
so $h \in \mathscr{N}^\perp$. Consequently $\mathscr{N}^\perp = \mathscr{M}$.

We will show next that if $f \in \mathscr{N}$, then
$$\frac1n\sum_{k=0}^{n-1} U^kf$$
converges to zero. First, if $f = g - Ug$ for some $g \in \mathscr{H}$, then
$$\frac1n\sum_{k=0}^{n-1} U^kf = \frac1n(g - U^ng),$$
so
$$\left\|\frac1n\sum_{k=0}^{n-1} U^kf\right\| \le \frac2n\|g\| \to 0.$$
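If $f \in \mathscr{N}$, then there is a sequence $\{g_i\}$ in $\mathscr{H}$ such that $f_i = g_i - Ug_i$ converges to $f$. Then
$$\frac1n\sum_{k=0}^{n-1} U^kf = \frac1n\sum_{k=0}^{n-1} U^kf_i + \frac1n\sum_{k=0}^{n-1} U^k(f - f_i)$$
for each $i$. Given $\varepsilon > 0$, choose $i$ so that $\|f - f_i\| < \varepsilon/2$, and choose $N$ so that
$$\left\|\frac1n\sum_{k=0}^{n-1} U^kf_i\right\| < \frac\varepsilon2 \quad\text{for } n \ge N;$$
then
$$\left\|\frac1n\sum_{k=0}^{n-1} U^kf\right\| < \varepsilon \quad\text{for } n \ge N.$$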
Now if $f \in \mathscr{H}$ is arbitrary, there is a unique $f_0 \in \mathscr{N}$ such that $f = f_0 + Pf$. Then
$$\frac1n\sum_{k=0}^{n-1} U^kf - Pf = \frac1n\sum_{k=0}^{n-1} U^k(f_0 + Pf) - Pf = \frac1n\sum_{k=0}^{n-1} U^kf_0 \to 0.$$
The proof is complete.
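The theorem can be watched in action on the simplest measure-preserving systems: a permutation $T$ of a finite set with uniform probability. Then $\mathscr{M}$ consists of the functions constant on the cycles of $T$, $Pf$ replaces $f$ by its average over each cycle, and $\frac1n\sum_{k<n} U^kf$ equals $Pf$ exactly whenever $n$ is a common multiple of the cycle lengths. The Python sketch below (helper names and sample data ours) illustrates this; it is a toy check, not part of the proof.

```python
import math
from fractions import Fraction

def ergodic_average(perm, f, n):
    """(1/n) * sum_{k<n} U^k f, where (U f)(x) = f(perm[x]) and f has integer values."""
    averages = []
    for x in range(len(perm)):
        total, y = 0, x
        for _ in range(n):
            total += f[y]
            y = perm[y]
        averages.append(Fraction(total, n))
    return averages

def cycle_projection(perm, f):
    """P f: average f over each cycle of perm (projection onto U-invariant functions)."""
    Pf = [None] * len(perm)
    for start in range(len(perm)):
        if Pf[start] is not None:
            continue
        cycle, y = [start], perm[start]
        while y != start:
            cycle.append(y)
            y = perm[y]
        mean = Fraction(sum(f[z] for z in cycle), len(cycle))
        for z in cycle:
            Pf[z] = mean
    return Pf

def l2_distance(u, v):
    """Distance in L^2 of the uniform probability measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

perm = [1, 2, 0, 4, 3, 5]          # cycles (0 1 2), (3 4), (5)
f = [3, 0, 0, 1, 0, 7]
```

Here $Pf = (1, 1, 1, \frac12, \frac12, 7)$; the average with $n = 6$ equals $Pf$ exactly, while for other $n$ the $L^2$ error is $O(1/n)$, as in the case $f = g - Ug$ of the proof.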
There are many generalizations and modifications of this theorem, of which we will consider one example. The following theorem, essentially due to K. Yosida (1938) (see also Kakutani 1938b and Yosida and Kakutani 1941) contains several basic ideas common to many of these extensions.
Theorem 1.3 Mean Ergodic Theorem in Banach Space Let $\mathscr{B}$ be a Banach space and $T: \mathscr{B} \to \mathscr{B}$ a continuous linear map satisfying $\|T^k\| \le M$ for all $k = 1, 2, \ldots$ for some constant $M$. Let $y \in \mathscr{B}$. Then the following statements are equivalent:
(i) $\left\{\frac1n\sum_{k=0}^{n-1} T^ky: n = 1, 2, \ldots\right\}$ converges with respect to the norm on $\mathscr{B}$.
(ii) $\left\{\frac1n\sum_{k=0}^{n-1} T^ky: n = 1, 2, \ldots\right\}$ has a weak cluster point.
(iii) $T$ has a fixed point in the weakly closed convex hull of $\{T^ky: k = 0, 1, 2, \ldots\}$ (i.e., the intersection of all the weakly closed convex sets containing $\{T^ky: k = 0, 1, 2, \ldots\}$).
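Proof: (i) $\Rightarrow$ (ii): Clear.

(ii) $\Rightarrow$ (iii): Abbreviating $A_ny = (1/n)\sum_{k=0}^{n-1} T^ky$, suppose that a subnet $A_{n_\alpha}y$ converges weakly to $\bar y$. Then for any continuous linear functional $\phi$,
$$\langle T\bar y, \phi\rangle = \langle \bar y, T^*\phi\rangle = \lim_\alpha \langle A_{n_\alpha}y, T^*\phi\rangle = \lim_\alpha \langle TA_{n_\alpha}y, \phi\rangle = \langle \bar y, \phi\rangle,$$
because
$$TA_ny - A_ny = \frac1n(T^ny - y) \to 0 \quad(\text{in norm, since } \|T^n\| \le M).$$
Thus $T\bar y = \bar y$. Of course $\bar y$, being the weak limit of convex combinations of $\{T^ky\}$, is in the weakly closed convex hull of $\{T^ky\}$.

(iii) $\Rightarrow$ (i): Let $\bar y$ be the fixed point. Since $T^ky = T^k\bar y + T^k(y - \bar y) = \bar y + T^k(y - \bar y)$, we have $A_ny = \bar y + A_n(y - \bar y)$, and it is enough to prove that $A_n(y - \bar y) \to 0$ in $\mathscr{B}$. Now $\bar y$ is a weak limit of convex combinations
$$S_\alpha y = \sum_i \lambda_{i\alpha}T^iy, \quad \lambda_{i\alpha} \ge 0, \quad \sum_i \lambda_{i\alpha} = 1.$$
This implies that
$$y - S_\alpha y = \sum_i \lambda_{i\alpha}(y - T^iy) = \sum_i \lambda_{i\alpha}(I - T)(I + T + T^2 + \cdots + T^{i-1})y \in \operatorname{range}(I - T),$$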
so that $y - \bar y$ is in the weak closure of $\operatorname{range}(I - T)$, hence also in $\overline{\operatorname{range}(I - T)}$ (weak and strong closure coincide for subspaces; see Royden 1968, p. 201). Now it is easy to show that $A_n(y - \bar y) \to 0$. For, given $\varepsilon > 0$, choose $x \in \operatorname{range}(I - T)$ with $\|x - (y - \bar y)\| < \varepsilon$. Clearly $A_nx \to 0$, since $A_n(I - T) = (I - T^n)/n$. Thus for large enough $n$,
$$\|A_n(y - \bar y)\| \le \|A_nx\| + \|A_n(x - (y - \bar y))\| < \varepsilon + M\varepsilon.$$
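A finite-dimensional example may help fix ideas for Theorem 1.3. The sketch below (our own choice of operator and helper names) takes $T$ on $\mathbb{R}^3$ to be the identity on the first coordinate and a rotation by $\theta = 1$ radian in the last two; $T$ is an isometry, so $M = 1$. Its only fixed points are multiples of $(1, 0, 0)$, so for $y = (y_0, y_1, y_2)$ the averages $A_ny$ should converge in norm to $\bar y = (y_0, 0, 0)$.

```python
import math

def apply_T(v, theta=1.0):
    """T = identity on the first coordinate, rotation by theta in the last two."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)

def cesaro_average(v, n, theta=1.0):
    """A_n v = (1/n) * sum_{k<n} T^k v."""
    acc = [0.0, 0.0, 0.0]
    w = v
    for _ in range(n):
        acc = [a + b for a, b in zip(acc, w)]
        w = apply_T(w, theta)
    return tuple(a / n for a in acc)

y = (2.0, 1.0, -1.0)
```

The component $u = (0, y_1, y_2)$ of $y - \bar y$ satisfies $\|A_nu\| \le 2\|u\|/(n|1 - e^{i\theta}|)$, because $\sum_{k<n} e^{ik\theta} = (1 - e^{in\theta})/(1 - e^{i\theta})$; this is the $O(1/n)$ decay that the proof obtains from $A_n(I - T) = (I - T^n)/n$.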
Remark 1.4 'Weakly closed' can be replaced by 'norm closed' in (iii) above, since in a locally convex topological vector space a convex set is strongly closed if and only if it is weakly closed (Royden 1968, p. 206).
2.2. The Pointwise Ergodic Theorem
We turn our attention now to Birkhoff's Ergodic Theorem, also known as the Individual (or Pointwise) Ergodic Theorem. The basic estimate needed to prove the a.e. convergence of the ergodic averages
$$\frac1n\sum_{k=0}^{n-1} f(T^kx)$$
is the content of the preliminary result known as the Maximal Ergodic Theorem, which concerns the maximum
$$f^*(x) = \sup_{n \ge 1}\frac1n\sum_{k=0}^{n-1} f(T^kx)$$
of these averages. Actually, an estimate on the lim sup rather than the sup is sufficient to prove the Pointwise Ergodic Theorem, and this was Birkhoff's method in his original proof. Later we will study the role of maximal inequalities in some detail. For now we prove the estimate by means of an idea that goes back to Kolmogorov (1937) and Yosida and Kakutani (1939) and most recently has been presented in an especially clear form by Katznelson and Ornstein.

Theorem 2.1 Maximal Ergodic Theorem (Wiener 1939, Yosida and Kakutani 1939) If $f \in L^1(X, \mathscr{B}, \mu)$, then
$$\int_{\{f^* > 0\}} f\,d\mu \ge 0.$$

Remark: It follows then that also
$$\int_{\{f^* \ge 0\}} f\,d\mu \ge 0,$$
since for each $\varepsilon > 0$ we have
$$0 \le \int_{\{(f + \varepsilon)^* > 0\}} (f + \varepsilon)\,d\mu \le \int_{\{f^* > -\varepsilon\}} f\,d\mu + \varepsilon,$$
and we can let $\varepsilon$ decrease to 0 and apply a convergence theorem.

Proof: The set where $f^* > 0$ is the disjoint union of the sets
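$$\begin{aligned} B_1 &= \{x: f(x) > 0\},\\ B_2 &= \{x: f(x) \le 0,\ f(x) + f(Tx) > 0\},\\ &\ \ \vdots\\ B_n &= \{x: f(x) \le 0, \ldots, f(x) + \cdots + f(T^{n-2}x) \le 0,\ f(x) + \cdots + f(T^{n-1}x) > 0\},\\ &\ \ \vdots \end{aligned}$$
If we can show that
$$\int_{B_1 \cup \cdots \cup B_n} f\,d\mu \ge 0 \quad\text{for each } n,$$
then it will follow (by the Dominated Convergence Theorem applied to $\chi_{B_1 \cup \cdots \cup B_n}f$) that
$$\int_{\{f^* > 0\}} f\,d\mu \ge 0.$$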
Let us fix an $n = 1, 2, \ldots$. The idea is to break $B_1 \cup \cdots \cup B_n$ into a union of different disjoint pieces, over each of which the integral of $f$ will clearly be nonnegative. These pieces will be pictured as towers $B_k' \cup TB_k' \cup \cdots \cup T^{k-1}B_k'$. In order to accomplish this, we make three observations:
(1) $T^kB_n \subset B_1 \cup \cdots \cup B_{n-k}$ for $k = 1, 2, \ldots, n - 1$. This is so because if $x \in B_n$, then $f(x) + f(Tx) + \cdots + f(T^{k-1}x) \le 0$, while $f(x) + f(Tx) + \cdots + f(T^{k-1}x) + f(T^kx) + \cdots + f(T^{n-1}x) > 0$, so that we must have $f(T^kx) + \cdots + f(T^{n-1}x) > 0$, i.e., $f(T^kx) + f(T(T^kx)) + \cdots + f(T^{n-k-1}(T^kx)) > 0$.
(2) The sets $B_n, TB_n, \ldots, T^{n-1}B_n$ are pairwise disjoint. This is so because if $T^iB_n \cap T^jB_n \neq \emptyset$ for some $i < j$, then $B_n \cap T^{j-i}B_n \neq \emptyset$, contradicting (1).
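(3) If we let $B_n' = B_n$ and then, for $k = n, n-1, \ldots, 1$ in turn,
$$B_k' = B_k \setminus (C_{k+1} \cup \cdots \cup C_n), \qquad C_k = B_k' \cup TB_k' \cup \cdots \cup T^{k-1}B_k'$$
(so that $B_{n-1}' = B_{n-1} \setminus C_n$, $B_{n-2}' = B_{n-2} \setminus (C_n \cup C_{n-1})$, ..., $B_1' = B_1 \setminus (C_2 \cup \cdots \cup C_n) = C_1$), then the 'columns' $C_1, C_2, \ldots, C_n$ are pairwise disjoint, and the levels $B_k', TB_k', \ldots, T^{k-1}B_k'$ within each column $C_k$ are also pairwise disjoint. Thus the following picture is a correct representation of $B_1 \cup \cdots \cup B_n$:

[Figure: the columns $C_n, C_{n-1}, \ldots, C_1$ drawn side by side from left to right; column $C_k$ has base $B_k'$ with the levels $TB_k', \ldots, T^{k-1}B_k'$ stacked above it.]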
This is so because (i) the pieces are in $B_1 \cup \cdots \cup B_n$, by (1); (ii) each base is disjoint from all the columns to the left of it, by definition; (iii) each base is disjoint from all the columns to the right of it, by definition (for the base) and by (1) (for the higher levels, since the images of the base to the right are contained in $B_i$'s even further to the right). Thus, if two columns were to intersect, we could apply $T^{-1}$ until one column intersected the base of the other; and we have seen that this does not happen. Now it is a simple matter to make the estimate
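$$\int_{B_1 \cup \cdots \cup B_n} f\,d\mu = \sum_{k=1}^{n}\int_{C_k} f\,d\mu = \sum_{k=1}^{n}\int_{B_k' \cup TB_k' \cup \cdots \cup T^{k-1}B_k'} f\,d\mu = \sum_{k=1}^{n}\int_{B_k'} (f + f\circ T + \cdots + f\circ T^{k-1})\,d\mu \ge 0,$$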
since $f + f\circ T + \cdots + f\circ T^{k-1} > 0$, by definition, on $B_k' \subset B_k$.
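The proof shows more precisely that $\int_{B_1 \cup \cdots \cup B_n} f\,d\mu \ge 0$ for every finite $n$. The Python sketch below tests that finite inequality on many random examples; the helper names, the random data, and the use of permutations of a finite set with uniform measure as the invertible m.p.t. are our own assumptions. It is a sanity check, not a substitute for the argument.

```python
import random

def union_of_B(perm, f, n):
    """Points x with f(x) + ... + f(T^{m-1} x) > 0 for some 1 <= m <= n,
    i.e. B_1 ∪ ... ∪ B_n, for T given by the permutation perm."""
    points = []
    for x in range(len(perm)):
        partial, y = 0, x
        for _ in range(n):
            partial += f[y]
            if partial > 0:
                points.append(x)
                break
            y = perm[y]
    return points

def smallest_integral(seed=0, trials=300):
    """Return min(0, smallest value of sum_{x in B_1 ∪ ... ∪ B_n} f(x)) over random
    examples; this sum is proportional to the integral for uniform measure, so a
    negative result would contradict the inequality."""
    rng = random.Random(seed)
    smallest = 0
    for _ in range(trials):
        size = rng.randint(1, 12)
        perm = list(range(size))
        rng.shuffle(perm)
        f = [rng.randint(-5, 5) for _ in range(size)]
        for n in range(1, 15):
            smallest = min(smallest, sum(f[x] for x in union_of_B(perm, f, n)))
    return smallest
```

For instance, with the transposition `perm = [1, 0]` and `f = [-1, 3]`, both points lie in $B_1 \cup B_2$ and the integral is proportional to $2 \ge 0$.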
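Corollary 2.2 For each $\alpha \in \mathbb{R}$,
$$\int_{\{f^* > \alpha\}} f\,d\mu \ge \alpha\,\mu\{f^* > \alpha\}.$$

Proof: Letting $g = f - \alpha$, we see that $\{f^* > \alpha\} = \{g^* > 0\}$, so that
$$0 \le \int_{\{g^* > 0\}} g\,d\mu = \int_{\{f^* > \alpha\}} (f - \alpha)\,d\mu,$$
and hence
$$\int_{\{f^* > \alpha\}} f\,d\mu \ge \alpha\,\mu\{f^* > \alpha\}.$$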
Now we are in a position to state and prove the fundamental theorem of ergodic theory, the Pointwise Ergodic Theorem (also known as just the Ergodic Theorem) of G. D. Birkhoff.
Theorem 2.3 The Ergodic Theorem (Birkhoff 1931) Let $(X, \mathscr{B}, \mu)$ be a probability space, $T: X \to X$ a m.p.t., and $f \in L^1(X, \mathscr{B}, \mu)$. Then
(1) $\lim_{n\to\infty}(1/n)\sum_{k=0}^{n-1} f(T^kx) = \bar f(x)$ exists a.e.;
(2) $\bar f(Tx) = \bar f(x)$ a.e.;
(3) $\bar f \in L^1$, and in fact $\|\bar f\|_1 \le \|f\|_1$;
(4) if $A \in \mathscr{B}$ with $T^{-1}A = A$, then $\int_A f\,d\mu = \int_A \bar f\,d\mu$ (this says that if $\mathscr{I}$ is the sub-σ-algebra of $\mathscr{B}$ consisting of all the $T$-invariant sets, then $\bar f = E(f|\mathscr{I})$ a.e.);
(5) $(1/n)\sum_{k=0}^{n-1} f\circ T^k \to \bar f$ in $L^1$.
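Part (1) can be observed numerically. For an irrational rotation $Tx = x + \alpha \bmod 1$ and $f = \chi_I$ with $I$ an interval, it is a standard fact (compare Exercise 5(b) below) that the averages converge to the length of $I$ for every $x$. The Python sketch below, with $\alpha = \sqrt2 - 1$, $I = [0.2, 0.5)$, and helper names of our own choosing, merely illustrates this and proves nothing.

```python
import math

def birkhoff_average(T, f, x, n):
    """(1/n) * sum_{k<n} f(T^k x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

alpha = math.sqrt(2) - 1
rotate = lambda x: (x + alpha) % 1.0
indicator = lambda x: 1.0 if 0.2 <= x < 0.5 else 0.0
```

Starting from $x = 0$ or $x = 0.7$, `birkhoff_average(rotate, indicator, x, 100_000)` is within $10^{-3}$ of $0.3$, the length of $I$.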
Remark 2.4 If $X$ has infinite measure, the same statements hold, except that (4) and (5) apply only to invariant sets of finite measure.

Proof of the Ergodic Theorem: (1) For each $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$, let
$$E_{\alpha,\beta} = \left\{x \in X: \liminf_{n\to\infty}\frac1n\sum_{k=0}^{n-1} f(T^kx) < \alpha < \beta < \limsup_{n\to\infty}\frac1n\sum_{k=0}^{n-1} f(T^kx)\right\}.$$
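We will show that $\mu(E_{\alpha,\beta}) = 0$ for each $\alpha, \beta$. Then the union over all rational $\alpha, \beta$ will also have measure 0, and hence the limit exists a.e. Now $E_{\alpha,\beta}$ is an invariant subset of $\{f^* > \beta\}$, so by considering $T$ restricted to $E_{\alpha,\beta}$, we see that the Maximal Ergodic Theorem implies that
$$\int_{E_{\alpha,\beta}} f\,d\mu \ge \beta\,\mu(E_{\alpha,\beta}).$$
Next we consider $-f$. Since if $x \in E_{\alpha,\beta}$, there is an $n \ge 1$ with
$$\frac1n\sum_{k=0}^{n-1} f(T^kx) < \alpha,$$
we see that
$$E_{\alpha,\beta} \subset \{(-f)^* > -\alpha\}.$$
Hence, by the Maximal Ergodic Theorem,
$$\int_{E_{\alpha,\beta}} (-f)\,d\mu \ge -\alpha\,\mu(E_{\alpha,\beta}), \quad\text{or}\quad \int_{E_{\alpha,\beta}} f\,d\mu \le \alpha\,\mu(E_{\alpha,\beta}).$$
Thus
$$\beta\,\mu(E_{\alpha,\beta}) \le \int_{E_{\alpha,\beta}} f\,d\mu \le \alpha\,\mu(E_{\alpha,\beta}),$$
and, since $\alpha < \beta$, this is possible only if $\mu(E_{\alpha,\beta}) = 0$.

(2) It is clear that $\bar f\circ T = \bar f$ a.e.

(3) We use Fatou's Lemma. Since
$$\left|\frac1n\sum_{k=0}^{n-1} f(T^kx)\right| \le \frac1n\sum_{k=0}^{n-1} |f(T^kx)|,$$
we have
$$\int |\bar f|\,d\mu \le \liminf_{n\to\infty}\int \frac1n\sum_{k=0}^{n-1} |f(T^kx)|\,d\mu = \int |f|\,d\mu < \infty.$$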
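(4) This is usually proved by means of the Maximal Ergodic Theorem as follows; we will also obtain this result as an immediate corollary of (5). For each $n = 0, 1, 2, \ldots$ and $k = 0, \pm1, \pm2, \ldots$, let
$$A_{n,k} = \left\{x \in A: \frac{k}{2^n} \le \bar f(x) < \frac{k+1}{2^n}\right\}.$$
The $A_{n,k}$ are invariant sets, and for each $n$ we have $A = \bigcup_k A_{n,k}$. Fix $\varepsilon > 0$. Then $f^* > k/2^n - \varepsilon$ on $A_{n,k}$, so
$$\int_{A_{n,k}} f\,d\mu \ge \left(\frac{k}{2^n} - \varepsilon\right)\mu(A_{n,k}).$$
Similarly, $(-f)^* > -(k+1)/2^n - \varepsilon$ on $A_{n,k}$, so
$$\int_{A_{n,k}} f\,d\mu \le \left(\frac{k+1}{2^n} + \varepsilon\right)\mu(A_{n,k}).$$
Thus
$$\left(\frac{k}{2^n} - \varepsilon\right)\mu(A_{n,k}) \le \int_{A_{n,k}} f\,d\mu \le \left(\frac{k+1}{2^n} + \varepsilon\right)\mu(A_{n,k}),$$
and letting $\varepsilon \to 0$ gives
$$\frac{k}{2^n}\mu(A_{n,k}) \le \int_{A_{n,k}} f\,d\mu \le \frac{k+1}{2^n}\mu(A_{n,k}).$$
Since obviously
$$\frac{k}{2^n}\mu(A_{n,k}) \le \int_{A_{n,k}} \bar f\,d\mu \le \frac{k+1}{2^n}\mu(A_{n,k}),$$
we have that
$$\left|\int_{A_{n,k}} f\,d\mu - \int_{A_{n,k}} \bar f\,d\mu\right| \le \frac{1}{2^n}\mu(A_{n,k}).$$
Summing over $k$,
$$\left|\int_A f\,d\mu - \int_A \bar f\,d\mu\right| \le \frac{1}{2^n}\mu(A),$$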
and the conclusion follows upon letting $n \to \infty$.
(5) For bounded functions, the $L^1$ convergence would follow from the Bounded Convergence Theorem. The general case can then be proved by approximating by bounded functions (which are dense in $L^1$) and using (3). There is no problem in assuming that $f \ge 0$, since we may write $f = f^+ - f^-$ and deal with the two parts separately. If $g$ is bounded and $0 \le g \le f$, then
$$\left\|\frac{1}{n}\sum_{k=0}^{n-1} f\circ T^k - \bar f\right\|_1 \le \left\|\frac{1}{n}\sum_{k=0}^{n-1} (f - g)\circ T^k\right\|_1 + \left\|\frac{1}{n}\sum_{k=0}^{n-1} g\circ T^k - \bar g\right\|_1 + \|\bar g - \bar f\|_1.$$
By (3), the third term is less than or equal to $\|g - f\|_1$, which can be made arbitrarily small by appropriate choice of $g$. Similarly, the first term is also less than or equal to $\|f - g\|_1$. But once $g$ is fixed, the second term approaches 0 as $n \to \infty$, by the Bounded Convergence Theorem. This proves (5). (Notice that it is essential here that $X$ have finite measure.) We can also give in this vein an alternative proof of (4):
$$\left|\int_A f\,d\mu - \int_A \bar f\,d\mu\right| = \left|\int_A \left(\frac{1}{n}\sum_{k=0}^{n-1} f\circ T^k - \bar f\right)d\mu\right| \le \left\|\frac{1}{n}\sum_{k=0}^{n-1} f\circ T^k - \bar f\right\|_{L^1(A)} \to 0$$
by (5); the first equality holds because $T^{-1}A = A$ and $T$ preserves $\mu$, so that $\int_A f\circ T^k\,d\mu = \int_A f\,d\mu$ for every $k$.

Remark 2.5 It is not hard to extend the Ergodic Theorem from the case of point transformations to that of more general Markov operators (3.7) having subinvariant measures (see Foguel 1980 for the details).

Remark 2.6 The Maximal Ergodic Theorem (2.2.1) and parts (1), (2) and (4) of the Ergodic Theorem (2.2.3) hold under the weaker hypothesis that one of $f^+$ and $f^-$ is in $L^1$ rather than $f \in L^1$.
Exercises
1. State and prove versions of the Maximal Ergodic Theorem and Pointwise Ergodic Theorem for one-parameter measure-preserving flows.
2. Prove Remark 2.6.
3. Show by example that (4) and (5) of the Ergodic Theorem need not hold if $\mu(X) = \infty$.
4. If $g \ge 0$ and $g \notin L^1$, then $\{\frac{1}{n}\sum_{k=0}^{n-1} g\circ T^k\}$ need not converge a.e., but it does converge in some sense.
5. Identify $\bar f(x)$ in case
(a) $(X, \mathcal{B}, \mu, T)$ is $\mathcal{B}(p_0, \dots, p_{n-1})$ and $f(\dots, x_{-1}, x_0, x_1, \dots) = \chi_{\{i\}}(x_0)$;
(b) $Tx = x + \alpha \bmod 1$, $\alpha$ irrational, and $f = \chi_I$ for some interval $I$;
(c) $Tx = x + 1$ on $\mathbb{R}$, $\mu$ is Lebesgue measure, and $f \in L^1$.
6. Show that if $\sum_{k=0}^{n-1} f(T^k x) \to \infty$ a.e., then $\int f\,d\mu > 0$.
2.3. Recurrence
Let $(X, \mathcal{B}, \mu)$ be a probability space and $T: X \to X$ a measure-preserving transformation. In this section we discuss the problem of recurrence, the most basic question to be asked about the nature of orbits of points and measurable sets. We prove the recurrence theorems of Poincaré, Khintchine, and Halmos and define the induced transformation on a set of positive measure. The mathematically simple Poincaré Recurrence Theorem has become famous because of its physical and philosophical implications, some of which we will indicate below. It may be considered to be the most basic result in ergodic theory.
Definition 3.1 Let $B \in \mathcal{B}$. A point $x \in B$ is said to be recurrent with respect to $B$ if there is a $k \ge 1$ for which $T^k x \in B$.
Theorem 3.2 Poincaré Recurrence Theorem (1899) For each $B \in \mathcal{B}$, almost every point of $B$ is recurrent with respect to $B$.
Proof: Let $F$ be the set of all those points of $B$ which are not recurrent with respect to $B$; then
$$F = B \setminus \bigcup_{k=1}^{\infty} T^{-k}B = B \cap T^{-1}(X \setminus B) \cap T^{-2}(X \setminus B) \cap \cdots.$$
Note that if $x \in F$, then $T^n x \notin F$ for each $n \ge 1$. Thus $F \cap T^{-n}F = \emptyset$ for $n \ge 1$, and hence $T^{-k}F \cap T^{-(n+k)}F = \emptyset$ for each $n \ge 1$ and each $k \ge 0$. Then the sets $F, T^{-1}F, T^{-2}F, \dots$ are pairwise disjoint and each has measure $\mu(F)$. Since $\mu(X) < \infty$, $\mu(F)$ must be zero.
One may make the following physical interpretation of this result. As usual the space $X$ is supposed to include all the possible states of some physical system, the $\sigma$-algebra $\mathcal{B}$ to consist of all observable states of the system, and the measure $\mu$ to specify the probability of each observable state. The physical system is evolving with respect to a discrete time (i.e., we make measurements on it, say, once a second), and $T: X \to X$ is the transformation which carries each state of the system into its succeeding state. In a situation of equilibrium, the transformation $T$ preserves the measure $\mu$, so that the probability of an observable state does not change with the time. Under this interpretation Poincaré's Recurrence Theorem says that if at time zero the physical system is in some observable state $B$, then almost surely the system will eventually return to this observable state. For example, if we insert a partition in a box and pump out all the air on one side of the partition, our system of air molecules is in a state for which all the molecules are in half of the box. Poincaré's Recurrence Theorem asserts that if we remove the partition and wait long enough, the molecules will almost surely once again congregate in their original half of the box. The fact that this theorem implies that such an unlikely event will almost surely happen has worried many people for a long time. Zermelo pointed out the apparent incompatibility of this prediction with such important conclusions of thermodynamics as the Second Law and Boltzmann's H-Theorem. Apparently much of the controversy is based on various misunderstandings, such as overlooking the fact that the H-Theorem involves not a single orbit (or history) but a quantity calculated by performing an average over all possible initial conditions. We also face the difficulty that the expected return times are even larger than enormous. Consider the following simple example, due to the Ehrenfests (1957), which involves only a
relatively small number of particles. Let us suppose that there are two urns, one of them containing 100 balls numbered from 1 to 100, and the other being empty. There is also a hat containing 100 slips of paper numbered from 1 to 100. Once a second we draw a slip of paper from the hat, read the number on it, replace it in the hat, and move the ball bearing that number from whichever urn it is in to the other urn. According to the Second Law of Thermodynamics, as well as our naive intuition, the system will settle down towards the statistical equilibrium state in which there are 50 balls in each urn. Of course there will continue to be random fluctuations about the 50-50 division, but it appears highly unlikely that a fluctuation could be so large that all 100 balls would return to the urn from which they started. The Recurrence Theorem says that although such a fluctuation may be highly unlikely, still it will occur almost surely.
The Recurrence Theorem really applies to this situation because this sequence of experiments can be represented by a Markov shift. We can describe the state of our physical system at a given time $k = 0, 1, 2, \dots$ by specifying the number $\omega_k \in \{0, 1, 2, \dots, 100\}$ of balls in the first urn. If at time $k = 0$ there are $\omega_0$ balls in the first urn and we proceed to draw numbers from the hat and so forth, the system passes through the states $\omega_0, \omega_1, \omega_2, \dots$ in succession, subject to the conditions
$$\omega_k \in \{0, 1, \dots, 100\}, \qquad |\omega_k - \omega_{k+1}| = 1 \quad \text{for all } k = 0, 1, 2, \dots.$$
Matters can be simplified considerably if we assume that the experiment began infinitely long ago and continues infinitely far into the future. Let $\Omega = \{0, 1, 2, \dots, 100\}^{\mathbb{Z}}$, let $\sigma: \Omega \to \Omega$ be the shift transformation defined by $(\sigma\omega)(n) = \omega(n+1)$ for $n \in \mathbb{Z}$, and let $\mathcal{B}$ be the $\sigma$-algebra generated by the finite segments of histories
$$B_k(i_1, i_2, \dots, i_n) = \{\omega \in \Omega : \omega(j + k) = i_j \text{ for } j = 1, \dots, n\}. \qquad (1)$$
We wish now to construct a measure $\mu$ on $(\Omega, \mathcal{B})$ such that $\mu(B_k(i_1, \dots, i_n))$ gives the probability of observing the sequence $(i_1, \dots, i_n)$ of successive states of the system beginning at time $k$. The a priori probability $p_i$ of observing the state $i$ at time $k$ is independent of $k$ and is
$$p_i = \frac{1}{2^{100}}\binom{100}{i}.$$
Let $p$ denote the vector $(p_0, p_1, \dots, p_{100})$. For each $i, j = 0, 1, \dots, 100$, let $a_{ij}$ denote the conditional probability of the state $j$ at the next time, given the state $i$. If we are in state $i$, then there are $i$ balls in the first urn and $100 - i$ in the second. Drawing a number $m$ from the hat, the probability that ball number $m$ is in the first (second) urn is $i/100$ ($(100 - i)/100$). Now from state $i$ we can only move to state $i + 1$ or $i - 1$. Thus
$$a_{i,i-1} = \frac{i}{100}, \qquad a_{i,i+1} = \frac{100 - i}{100},$$
and $a_{ij} = 0$ if $j$ is neither $i + 1$ nor $i - 1$. We have the transition matrix
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & \cdots \\ \frac{1}{100} & 0 & \frac{99}{100} & 0 & 0 & \cdots \\ 0 & \frac{2}{100} & 0 & \frac{98}{100} & 0 & \cdots \\ 0 & 0 & \frac{3}{100} & 0 & \frac{97}{100} & \cdots \\ \vdots & & & \ddots & \ddots & \ddots \end{pmatrix}.$$
Then the probability of observing a sequence $(i_1, \dots, i_n)$ beginning at time $k$ is independent of $k$ and is given by
$$\mu(B_k(i_1, \dots, i_n)) = p_{i_1} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{n-1} i_n}.$$
Since $pA = p$, the conditions of Kolmogorov's Theorem are satisfied and $\mu$ extends to a measure on all of $\mathcal{B}$ such that $\sigma: \Omega \to \Omega$ is a m.p.t. Let $E = \{\omega \in \Omega : \omega(0) = 0\}$, so $E$ consists of those performances of the experiment in which we are interested. The measure of $E$ is $p_0 > 0$. According to the Recurrence Theorem, $\{\omega \in E : \text{there is } k > 0 \text{ such that } \sigma^k\omega \in E\}$ has measure equal to $\mu(E)$. That is, with probability 1 the system will recur again and again to the state in which there are no balls in the first urn. Of course the expected time is astronomical: by Kac's Theorem (2.4.6), it is $1/\mu(E) = 2^{100}$ drawings. For real systems with millions of particles, the expected return times will be even more absurd.
For recurrence of sets a sharper, quantitative statement can be made. A set $E \subset \mathbb{Z}$ is called relatively dense if it has bounded gaps, in that there is a positive integer $K$ such that $E \cap \{j, j+1, \dots, j+K\} \ne \emptyset$ for each $j \in \mathbb{Z}$.
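The two numerical claims about the Ehrenfest model, that $pA = p$ and that the expected return time to the empty first urn is $2^{100}$, can be checked with exact rational arithmetic. The sketch below is ours, not the text's: the function names and the parameter `N` (the number of balls, 100 in the text) are hypothetical, and the return time is computed by a first-step (hitting-time) analysis of the chain rather than by Kac's Theorem, so the agreement serves as an independent check.

```python
from fractions import Fraction
from math import comb


def stationary_vector(N):
    """p_i = C(N, i) / 2^N, the a priori probabilities (N = 100 in the text)."""
    return [Fraction(comb(N, i), 2**N) for i in range(N + 1)]


def is_stationary(N):
    """Check pA = p for a_{i,i-1} = i/N and a_{i,i+1} = (N - i)/N."""
    p = stationary_vector(N)
    pA = [Fraction(0)] * (N + 1)
    for i, p_i in enumerate(p):
        if i > 0:
            pA[i - 1] += p_i * Fraction(i, N)
        if i < N:
            pA[i + 1] += p_i * Fraction(N - i, N)
    return pA == p


def mean_return_time_to_empty(N):
    """Expected number of drawings before the first urn is empty again.

    Let h_i be the expected time to reach state 0 from state i and
    D_i = h_i - h_{i-1}.  First-step analysis,
        h_i = 1 + (i/N) h_{i-1} + ((N - i)/N) h_{i+1},
    gives D_N = 1 and D_i = (N + (N - i) D_{i+1}) / i.  From state 0 the
    chain moves to state 1 with probability 1, so the return time is 1 + h_1.
    """
    D = Fraction(1)                      # D_N
    for i in range(N - 1, 0, -1):
        D = (N + (N - i) * D) / i        # D_i from D_{i+1}
    return 1 + D                         # D is now D_1 = h_1


print(is_stationary(100))                          # True
print(mean_return_time_to_empty(100) == 2**100)   # True, i.e. 1/p_0
```

Exact fractions avoid any doubt about round-off, which matters here because the quantities involved are of size $2^{100}$.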
The following theorem says that the images of a measurable set under the iterates of $T$ come back and overlap the set fairly regularly, namely with bounded gaps.
Theorem 3.3 (Khintchine 1934) For any $B \in \mathcal{B}$ and any $\varepsilon > 0$, the set
$$E_\varepsilon = \{k \in \mathbb{Z} : \mu(T^k B \cap B) > \mu(B)^2 - \varepsilon\}$$
is relatively dense.
Proof: We direct our attention to the Hilbert space $L^2(X, \mathcal{B}, \mu)$ and apply the Mean Ergodic Theorem. Let $U: L^2 \to L^2$ be the unitary operator induced by $T$. If $B \in \mathcal{B}$ and $\chi_B$ is the characteristic function of $B$, then $\chi_{T^k B} = U^{-k}\chi_B$. Thus
$$\mu(T^k B \cap B) = \int_X \chi_B\,\chi_{T^k B}\,d\mu = \int_X \chi_B\,U^{-k}\chi_B\,d\mu = (\chi_B, U^{-k}\chi_B).$$
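Applying the Mean Ergodic Theorem to $f = \chi_B$, given $\varepsilon > 0$ there is an $n$ such that
$$\left\|\frac{1}{n}\sum_{k=0}^{n-1} U^k f - Pf\right\| < \frac{\varepsilon}{\|f\| + 1},$$
and hence, since $U^j$ is unitary and $U^j Pf = Pf$, for each $j \in \mathbb{Z}$
$$\left\|\frac{1}{n}\sum_{k=0}^{n-1} U^{k+j} f - Pf\right\| < \frac{\varepsilon}{\|f\| + 1}, \quad\text{or}\quad \left\|\frac{1}{n}\sum_{k=j}^{n+j-1} U^k f - Pf\right\| < \frac{\varepsilon}{\|f\| + 1} \quad \text{for all } j \in \mathbb{Z}.$$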
Because of the orthogonal decomposition in the Mean Ergodic Theorem ($P$ is the orthogonal projection onto the subspace of $U$-invariant functions), consequently
$$(Pf, Pf) = (Pf, f);$$
also, since $1$ is invariant, $(f, 1) = (Pf, 1)$; and so, by the Cauchy-Schwarz inequality,
$$(f, 1)^2 = (Pf, 1)^2 \le \|Pf\|^2 (1, 1) = (Pf, Pf) = (Pf, f).$$
Then
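$$\left|\left(\frac{1}{n}\sum_{k=j}^{n+j-1} U^k f - Pf, f\right)\right| \le \left\|\frac{1}{n}\sum_{k=j}^{n+j-1} U^k f - Pf\right\|\,\|f\| < \varepsilon \quad \text{for all } j \in \mathbb{Z},$$
so
$$\frac{1}{n}\sum_{k=j}^{n+j-1} \mu(T^k B \cap B) = \frac{1}{n}\sum_{k=j}^{n+j-1} (U^k f, f) > (Pf, f) - \varepsilon \ge (f, 1)^2 - \varepsilon = \mu(B)^2 - \varepsilon \quad \text{for all } j \in \mathbb{Z}.$$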
Thus the interval $[j, n + j - 1]$ must contain at least one $k$ for which $\mu(T^k B \cap B) > \mu(B)^2 - \varepsilon$, and this shows that $E_\varepsilon$ is relatively dense.
One may study recurrence also in more general situations, even where there is no finite invariant measure and where $T$ is merely a set transformation rather than a point transformation. Let $(X, \mathcal{B})$ be a measurable space, $\mathcal{J} \subset \mathcal{B}$ a $\sigma$-ideal with $T^{-1}\mathcal{J} \subset \mathcal{J}$, and $T: X \to X$ a (possibly noninvertible) measurable transformation. For each $B \in \mathcal{B}$, if
$$B^* = \bigcup_{k=1}^{\infty} T^{-k}B,$$
then $B \setminus B^*$ is the set of all those points of $B$ which never return to $B$. We say that the transformation $T$ is recurrent in case $B \setminus B^* \in \mathcal{J}$ for all $B \in \mathcal{B}$. For each $B \in \mathcal{B}$,
$$B \setminus \bigcap_{n=0}^{\infty}\bigcup_{j \ge n} T^{-j}B$$
is the set of all those points of $B$ which do not return to $B$ infinitely many times. If for each $B \in \mathcal{B}$ this set is in $\mathcal{J}$, then we say that $T$ is infinitely recurrent. Poincaré's Recurrence Theorem implies that a m.p.t. on a finite measure space is infinitely recurrent, where $\mathcal{J}$ is the $\sigma$-ideal of sets of measure zero.
A set $B \in \mathcal{B}$ is called wandering if the sets $B, T^{-1}B, T^{-2}B, \dots$ are pairwise disjoint. $T$ is called conservative if every wandering set is in $\mathcal{J}$. Since $B \setminus B^*$ is wandering for each $B \in \mathcal{B}$, it is true that $T$ conservative implies $T$ recurrent. Finally, $T$ is called incompressible if whenever $B \in \mathcal{B}$ and $T^{-1}B \subset B$, then $B \setminus T^{-1}B \in \mathcal{J}$.
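Theorem 3.4 (Halmos 1947) Let $X$, $\mathcal{B}$, $\mathcal{J}$, and $T$ be as above. Then the following statements are equivalent: (1) $T$ is incompressible. (2) $T$ is recurrent. (3) $T$ is conservative. (4) $T$ is infinitely recurrent.
Proof (Wright 1961b): For each $B \in \mathcal{B}$, continue to let $B^* = \bigcup_{k=1}^{\infty} T^{-k}B$. Note that each set $B \setminus B^*$ is wandering, and conversely if $B$ is wandering then $B \cap B^* = \emptyset$, so $B = B \setminus B^*$.
(1) $\Rightarrow$ (2): Let $B \in \mathcal{B}$ and $E = B \cup B^*$. Then $T^{-1}E = B^* \subset E$, so $E \setminus T^{-1}E \in \mathcal{J}$. But $E \setminus T^{-1}E = (B \cup B^*) \setminus B^* = B \setminus B^*$, so $B \setminus B^* \in \mathcal{J}$.
(2) $\Rightarrow$ (3): If $B$ is a wandering set, then $B = B \setminus B^* \in \mathcal{J}$, since $T$ is recurrent.
(3) $\Rightarrow$ (1): Suppose $B \in \mathcal{B}$ and $T^{-1}B \subset B$. Then $B^* = T^{-1}B$, so $B \setminus T^{-1}B = B \setminus B^* \in \mathcal{J}$, since $B \setminus B^*$ is wandering and $T$ is conservative.
(4) $\Rightarrow$ (2): Obvious.
(2) $\Rightarrow$ (4): If $T$ is recurrent and $B \in \mathcal{B}$, then $B \setminus B^* \in \mathcal{J}$. We need to show that $B \setminus \bigcap_{n=0}^{\infty} T^{-n}B^* \in \mathcal{J}$. Now
$$\begin{aligned}
B \setminus \bigcap_{n=0}^{\infty} T^{-n}B^* &= B \cap \bigcup_{n=0}^{\infty} (X \setminus T^{-n}B^*) \\
&= B \cap [(X \setminus B^*) \cup (X \setminus T^{-1}B^*) \cup (X \setminus T^{-2}B^*) \cup \cdots] \\
&= B \cap [(X \setminus B^*) \cup (B^* \setminus T^{-1}B^*) \cup (T^{-1}B^* \setminus T^{-2}B^*) \cup \cdots] \\
&= (B \setminus B^*) \cup \left[B \cap \bigcup_{n=0}^{\infty} (T^{-n}B^* \setminus T^{-(n+1)}B^*)\right].
\end{aligned}$$
Now $T^{-1}(T^{-n}B^*) = T^{-(n+1)}B^* \subset T^{-n}B^*$ and $T$ is recurrent, hence incompressible, so $T^{-n}B^* \setminus T^{-(n+1)}B^* \in \mathcal{J}$ for each $n = 0, 1, 2, \dots$. Consequently $B \setminus \bigcap_{n=0}^{\infty} T^{-n}B^* \in \mathcal{J}$ and $T$ is infinitely recurrent.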
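Returning to Khintchine's Theorem 3.3, a concrete case can be computed directly. The sketch below assumes (our choice, not the text's) the rotation $Tx = x + \alpha \pmod 1$ with $\alpha = (\sqrt5 - 1)/2$ and the arc $B = [0, b)$ with $b \le 1/2$; then $\mu(T^kB \cap B) = \max(0, b - \|k\alpha\|)$, where $\|t\|$ is the distance from $t$ to the nearest integer. The hypothetical helper `khintchine_set` lists the elements of $E_\varepsilon$ in a finite window of $k$, and `cesaro_mean` evaluates the block averages that appear in the proof. A finite window can only illustrate relative density, not prove it.

```python
import math


def dist_to_int(t):
    """||t||: distance from t to the nearest integer."""
    return abs(t - round(t))


def overlap(k, alpha, b):
    """mu(T^k B ∩ B) for T x = x + alpha (mod 1) and B = [0, b), 0 < b <= 1/2."""
    return max(0.0, b - dist_to_int(k * alpha))


def khintchine_set(alpha, b, eps, K):
    """Elements k of E_eps = {k : mu(T^k B ∩ B) > mu(B)^2 - eps} with |k| <= K."""
    return [k for k in range(-K, K + 1) if overlap(k, alpha, b) > b * b - eps]


def largest_gap(ks):
    """Largest difference between consecutive elements of a sorted list."""
    return max(hi - lo for lo, hi in zip(ks, ks[1:]))


def cesaro_mean(alpha, b, n, j=0):
    """(1/n) * sum_{k=j}^{n+j-1} mu(T^k B ∩ B), the block average in the proof."""
    return sum(overlap(k, alpha, b) for k in range(j, n + j)) / n


alpha, b, eps = (math.sqrt(5) - 1) / 2, 0.2, 0.01
ks = khintchine_set(alpha, b, eps, 2000)
print(largest_gap(ks))                        # a small bounded gap in this window
print(cesaro_mean(alpha, b, 10000), b * b)    # first value close to mu(B)^2
```

For this ergodic rotation the block averages actually converge to $\mu(B)^2$, so the inequality used in the proof is nearly sharp here.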
The phenomenon of recurrence makes possible an interesting construction of m.p.t.s which is very useful for producing examples with a wide variety of properties. Let $T: X \to X$ be a m.p.t. and $A \subset X$ a measurable subset of $X$ of positive measure. The integer
$$n_A(x) = \inf\{n \ge 1 : T^n x \in A\}$$
is defined (and finite) for a.e. $x \in A$. $A$ is a probability space with measure $\mu_A = \mu/\mu(A)$. We define the induced or derivative transformation $T_A: A \to A$ by
$$T_A(x) = T^{n_A(x)}x \quad \text{for a.e. } x \in A$$
(Kakutani 1943). Of course $n_A$ and $T_A$ are measurable, and in fact $T_A$ is a measure-preserving transformation. This is all checked most easily with the aid of the diagram below, in which
$$A_n = \{x \in A : n_A(x) = n\}.$$
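[Figure: the Kakutani skyscraper over $A$. The base pieces $A_1, A_2, A_3, \dots$ lie in $A$; above $A_n$ stand the floors $TA_n, \dots, T^{n-1}A_n$, which make up the levels $TA \setminus A$, $T^2A \setminus (A \cup TA)$, and so on.]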
The action of $T$ on this part of $X$ is described as follows: a point not on the top floor of its column is moved directly up, and the top floor of each column is moved back down to the first floor, $A$. Under $T_A$, each point of $A$ is mapped to the point where it first returns to $A$.
This idea of a derivative transformation can also be reversed, so as to define an induced or primitive transformation $\hat T: \hat X \to \hat X$ on a space $\hat X$ larger than $X$. For example, if $A \subset X$ is a measurable set with positive measure, let $A'$ be a distinct copy of $A$. The disjoint union $\hat X = X \cup A'$ is made a probability space in the obvious way. We think of $A'$ as sitting over $A$.
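[Figure: $A'$ drawn as a second floor above $A$; the rest of $X$, namely $X \setminus A$, has a single floor.]
Since $A'$ is a copy of $A$, there is a one-to-one onto map $x \mapsto x'$ from $A$ to $A'$. We let
$$\hat T x = \begin{cases} Tx & \text{if } x \in X \setminus A, \\ x' & \text{if } x \in A, \\ Ty & \text{if } x \in A' \text{ and } x = y' \text{ for } y \in A. \end{cases}$$
The action of $\hat T$ is described in the same way as that of $T$ above. Again $\hat T$ is a measure-preserving transformation. Of course this process can be repeated so as to construct skyscrapers with $3, 4, \dots, \infty$ floors as well.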
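On a finite space both constructions can be written out directly. In the sketch below (a hypothetical example of ours) $X = \{0, \dots, 9\}$ carries uniform measure, $Tx = x + 3 \pmod{10}$, and $A = \{0, 1, 4\}$; the copy $A'$ of $A$ is encoded as tuples `("copy", x)`, and the helper names are our own. On a finite set with uniform (counting) measure, a map preserves the measure exactly when it is a bijection, so the printed bijectivity checks are the finite analogue of "$T_A$ and $\hat T$ are m.p.t.s". Because this $T$ is a single cycle, the return times also add up to $|X|$, a finite instance of the skyscraper picture.

```python
def first_return(T, A, x):
    """(n_A(x), T_A(x)): the first n >= 1 with T^n x in A, and T^n x itself."""
    n, y = 1, T[x]
    while y not in A:
        n, y = n + 1, T[y]
    return n, y


def induced_map(T, A):
    """The derivative transformation T_A on A, as a dict."""
    return {x: first_return(T, A, x)[1] for x in A}


def primitive_map(T, A):
    """The two-floor skyscraper T-hat on X ∪ A', with x' encoded as ("copy", x)."""
    T_hat = {}
    for x in range(len(T)):
        T_hat[x] = ("copy", x) if x in A else T[x]   # points of A move up to A'
    for x in A:
        T_hat[("copy", x)] = T[x]                     # A' is mapped down by T
    return T_hat


T = [(x + 3) % 10 for x in range(10)]
A = {0, 1, 4}
T_A = induced_map(T, A)
print(T_A)                                          # {0: 1, 1: 4, 4: 0}
print(sorted(T_A.values()) == sorted(A))            # True: T_A is a bijection of A
print(sum(first_return(T, A, x)[0] for x in A))     # 10 = |X|
T_hat = primitive_map(T, A)
print(set(T_hat.values()) == set(T_hat.keys()))     # True: T-hat is a bijection
```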
Exercises
1. Let $T: X \to X$ be an invertible, measurable, nonsingular transformation on a $\sigma$-finite measure space $(X, \mathcal{B}, m)$, in that $T$ preserves the $\sigma$-ideal of null sets of $m$. A set $W \in \mathcal{B}$ of positive measure is called weakly wandering if there is a sequence $n_k \to \infty$ such that the sets $T^{n_k}W$ are all pairwise disjoint. Show that if $T$ has a weakly wandering set, then there does not exist a finite invariant measure equivalent to $m$. (The converse, due to Hajian and Kakutani (1964), is also true.)
2. Let $T$ be as in (1). Show that $X$ has a decomposition into disjoint, measurable, invariant conservative and dissipative parts, $X = C \cup D$, in the following sense: $T|_C$ is conservative and $D = \bigcup\{T^n W : n \in \mathbb{Z}\}$ for some wandering set $W$. (Hint: We may assume that $m(X) < \infty$. Take $W_0$ with $m(W_0) = \sup\{m(W) : W \text{ is wandering}\}$ and let $C$ be the complement of the invariant set generated by $W_0$.)
3. Let $T$ be as in (1). Show that $T$ is conservative if and only if $\sum_{n=0}^{\infty} u \circ T^n$ takes only the two values $0$ and $\infty$ a.e. for each nonnegative $u \in L^1(X, m)$.
4. Verify that the induced transformations $T_A$ and $\hat T$ really are m.p.t.s when $T$ is.
5. Describe the action of $T_A$ in case (a) $X = [0, 1)$, $Tx = x + \alpha \pmod 1$, and $A = [0, 1)$; (b) $X = \{0, 1\}^{\mathbb{Z}}$, $\sigma$ is the shift, and $A = \{x : x(0) = 0\}$.
6. Prove directly (i.e., without using 3.3 or the Mean or Pointwise Ergodic Theorems) that if $T$ is a m.p.t. on a finite measure space and $\mu(A) > 0$, then $\{n \ge 1 : \mu(T^{-n}A \cap A) > 0\}$ has bounded gaps. (Hint: If not, there is a sequence $(n_i)$ such that the sets $T^{-n_i}A$ are essentially disjoint.)
7. If $\mu(A) > 0$, then almost every point of $A$ returns to $A$ with a positive limiting frequency. Need almost every point of $A$ return to $A$ with bounded gaps?
2.4. Ergodicity
From an abstract point of view, the qualitative ideas of recurrence discussed in the preceding section lead naturally to quantitative considerations. Granted that points almost surely return to subsets of positive measure, how often do these recurrences take place, and how much time does (the orbit of) each point spend in the vicinity of a given subset? Considering again the example of the Ehrenfests, according to intuition the two-urn system should spend most of its time near the highest-probability situation (an even division), with only very infrequent and brief diversions to low-probability states. Thus the high-probability states at least seem attractive and stable, with high probability and on a reasonable time scale.
Historically and physically the idea of ergodicity came from Boltzmann's ergodic hypothesis. It was desirable that the time mean of a physical variable should coincide with its space mean, i.e. that the long-term time average along a single history (or orbit) should equal the average over all possible initial conditions, or, equivalently, the average at any other single moment over all possible histories. (The word 'ergodic' comes from the Greek for 'energy path'.) In order to arrive at this conclusion, Boltzmann hypothesized that each orbit (under the action of $\mathbb{R}$) filled out all of the phase space (in his case a surface of constant energy). It did not take long to realize that such a condition was topologically impossible, and so it was replaced by the quasi-ergodic hypothesis: each orbit is dense in each surface of constant energy. (This is minimality; see Section 4.2.) However, in spite of many attempts to prove that it did, this weaker hypothesis does not imply the equality of space means and time means. (This follows from the fact that a minimal system need not be uniquely ergodic (see Section 4.2), as was first shown by Markov; see Nemytskii and Stepanov 1960.) In this section we will establish several correct and mathematically easy necessary and sufficient conditions for the equality of space means and time means. The most basic of these is metric indecomposability; we will in fact take this as the definition of ergodicity. Physicists have complained that none of these conditions is easy to verify in practice.
Indeed, it was only recently (1963) that Sinai was able to prove the ergodicity of (a simplified version of) the hard sphere gas, the most fundamental system of interest in statistical mechanics. Practical difficulties aside, the property is interesting in itself and worthy of close examination; and since every system can be decomposed into ergodic systems (see Jacobs 1960, pp. 85 ff.), the assumption of ergodicity, if necessary, is frequently harmless.
Recall that a set $B \in \mathcal{B}$ is called invariant if $\mu(T^{-1}B \,\triangle\, B) = 0$. The transformation $T$ (or, more properly, the system $(X, \mathcal{B}, \mu, T)$) is called ergodic or metrically transitive if every invariant set has measure 0 or 1.
Proposition 4.1 $(X, \mathcal{B}, \mu, T)$ is ergodic if and only if every invariant ($f \circ T = f$ a.e.) measurable function on $X$ is constant a.e.
Proof: Suppose every invariant measurable function is constant a.e., and let $E \in \mathcal{B}$ be an invariant set. Then $\chi_E$ is constant a.e., so $\chi_E$ is either 0 a.e. or 1 a.e. Thus $\mu(E)$ is 0 or 1. Conversely, suppose $(X, \mathcal{B}, \mu, T)$ is ergodic and $f$ is an invariant measurable function. Then for each $r \in \mathbb{R}$, $E_r = \{x \in X : f(x) > r\}$ is measurable and invariant, hence has measure 0 or 1. But if $f$ is not constant a.e., there exists an $r \in \mathbb{R}$ such that $0 < \mu(E_r) < 1$. Therefore $f$ must be constant a.e.
Theorem 4.2 $(X, \mathcal{B}, \mu, T)$ is ergodic if and only if 1 is a simple eigenvalue of the transformation $U$ induced on $L^2(X, \mathcal{B}, \mu)$ (complex) by $T$. Moreover, if $(X, \mathcal{B}, \mu, T)$ is ergodic, then every eigenvalue of $U$ is simple and the set of all eigenvalues of $U$ is a subgroup of the circle group $K = \{z : |z| = 1\}$. Conversely, given a subgroup $G$ of the circle group, there is an ergodic system $(X, \mathcal{B}, \mu, T)$ such that $G$ is the group of eigenvalues of the unitary operator on $L^2(X, \mathcal{B}, \mu)$ determined by $T$.
Proof: 1 is always an eigenvalue of $U$, since constant functions are invariant. For 1 to be a simple eigenvalue means that $Uf = f$ implies $f$ is constant a.e. Thus the first statement is obvious, by Proposition 4.1. Since $U$ is unitary, every eigenvalue of $U$ has absolute value 1. (For if $Uf = \lambda f$ with $f \ne 0$, then $f = \lambda U^* f$, so $(Uf, f) = \lambda (f, f)$ and $(Uf, f) = (f, U^* f) = (1/\bar\lambda)(f, f)$. Thus $\lambda\bar\lambda = 1$.) If $f$ is an eigenfunction with eigenvalue $\lambda$, then $U|f| = |Uf| = |\lambda f| = |f|$, so $|f|$ is a nonzero constant a.e., since $T$ is ergodic. Therefore $f \ne 0$ a.e. Now if $g$ is another eigenfunction with eigenvalue $\eta$, then $U(g/f) = (\eta/\lambda)(g/f)$, so $g/f$ is an eigenfunction with eigenvalue $\eta/\lambda$; therefore the eigenvalues of $U$ form a subgroup of $K$. If we take $\eta = \lambda$, we see that $g/f$ is constant a.e.; therefore each eigenvalue is simple.
Finally, suppose we are given a subgroup $G$ of $K$. If we give $G$ the discrete topology, then its character group $\hat G$ is a compact abelian group and hence has a unique normalized Haar measure $\mu$. Define $\phi \in \hat G$ by $\phi(g) = g$ for all $g \in G$, and $T: \hat G \to \hat G$ by $T\gamma = \phi\gamma$. Equipping $\hat G$ with the $\sigma$-algebra $\mathcal{B}$ of Borel sets, we obtain a probability space $(\hat G, \mathcal{B}, \mu)$ and a m.p.t. $T: \hat G \to \hat G$. Note that if $f: \hat G \to K$ is a character of $\hat G$, then for each $\gamma \in \hat G$, $(Uf)(\gamma) = f(\phi\gamma) = f(\phi)f(\gamma)$, so $f$ is an eigenfunction of $U$ with eigenvalue $f(\phi)$. We will see that all eigenfunctions (up to constant multiples) arise in this way, and that each eigenvalue is simple. The map $G \to \hat{\hat G}$ according to $g \mapsto f_g$, where $f_g(\gamma) = \gamma(g)$, is well known to be an isomorphism onto. The eigenvalue corresponding to $f_g$ is $f_g(\phi) = \phi(g) = g$. Thus $G$ is contained in the group of eigenvalues of $U$.
Now the elements of $\hat{\hat G}$ form a complete orthonormal set in $L^2(\hat G, \mathcal{B}, \mu)$. Thus, given an eigenfunction $f$ with eigenvalue $\lambda$, we have
$$f = \sum_{g \in G} a_g f_g \qquad (a_g \in \mathbb{C}),$$
and thus
$$f(\phi\gamma) = \sum_{g \in G} a_g f_g(\phi\gamma) = \sum_{g \in G} a_g f_g(\phi) f_g(\gamma) = \sum_{g \in G} a_g\, g\, \gamma(g)$$
$$= \lambda f(\gamma) = \sum_{g \in G} \lambda a_g f_g(\gamma) = \sum_{g \in G} \lambda a_g \gamma(g),$$
so $a_g g = \lambda a_g$ for all $g \in G$, and hence $a_g = 0$ whenever $g \ne \lambda$. Therefore $f = a_\lambda f_\lambda$, and we see that each eigenfunction is a constant multiple of a character of $\hat G$. The foregoing also shows that the group of eigenvalues coincides with $G$ and each eigenvalue is simple.
Remark 4.3 A system $(X, \mathcal{B}, \mu, T)$, such as the system $(\hat G, \mathcal{B}, \mu, T)$ constructed in the preceding proof, which has the property that the eigenfunctions of the induced unitary operator $U$ on $L^2(X, \mathcal{B}, \mu)$ span $L^2(X, \mathcal{B}, \mu)$, is said to have discrete spectrum.
Theorem 4.4 $(X, \mathcal{B}, \mu, T)$ is ergodic if and only if for each $f \in L^1(X, \mathcal{B}, \mu)$ the time mean of $f$ equals the space mean of $f$ a.e.:
$$\bar f(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f(T^k x) = \int_X f\,d\mu \quad \text{a.e.}$$
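Proof: Suppose $(X, \mathcal{B}, \mu, T)$ is ergodic. Then $\bar f$, being invariant, must be constant a.e. Moreover, from the Ergodic Theorem we also have
$$\int_X f\,d\mu = \int_X \bar f\,d\mu = \bar f(x) \quad \text{a.e.}$$
Conversely, suppose that for each $f \in L^1(X, \mathcal{B}, \mu)$, $\bar f$ is constant a.e. If $f$ is an invariant function in $L^1(X, \mathcal{B}, \mu)$, then
$$\frac{1}{n}\sum_{k=0}^{n-1} f(T^k x) = f(x) \quad \text{a.e.},$$
so $\bar f = f$ a.e. Thus $f$ is constant a.e., and Proposition 2.4.1 implies that $(X, \mathcal{B}, \mu, T)$ is ergodic (its proof only uses characteristic functions of invariant sets, and these lie in $L^1$).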
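Theorems 4.2 and 4.4 can be seen on the simplest finite systems. In this sketch (our own illustration; the function names are hypothetical) $X = \mathbb{Z}_n$ carries uniform measure and $Tx = x + 1 \pmod n$. The characters $f_m(x) = e^{2\pi i m x/n}$ are eigenfunctions of $Uf = f \circ T$ with eigenvalues $e^{2\pi i m/n}$, and these eigenvalues form a subgroup of the circle. For a permutation, the invariant functions are exactly those constant on cycles, so the multiplicity of the eigenvalue 1 is the number of cycles, and it equals 1 exactly when the system is ergodic. The last line checks that a time mean along an orbit of the single cycle equals the space mean.

```python
import cmath


def character(n, m):
    """Eigenvalue and eigenfunction f_m(x) = exp(2 pi i m x / n) of U on Z_n."""
    lam = cmath.exp(2j * cmath.pi * m / n)
    f = [cmath.exp(2j * cmath.pi * m * x / n) for x in range(n)]
    return lam, f


def is_eigenfunction(T, f, lam, tol=1e-9):
    """Check (U f)(x) = f(T x) = lam * f(x) for every x."""
    return all(abs(f[T[x]] - lam * f[x]) < tol for x in range(len(T)))


def multiplicity_of_one(T):
    """dim {f : f o T = f} for a permutation T, i.e. the number of its cycles."""
    seen, count = set(), 0
    for start in range(len(T)):
        if start not in seen:
            count += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = T[x]
    return count


def time_mean(f, T, x, steps):
    """(1/steps) * sum_{k < steps} f(T^k x)."""
    total = 0.0
    for _ in range(steps):
        total += f[x]
        x = T[x]
    return total / steps


n = 6
T = [(x + 1) % n for x in range(n)]
eigen = [character(n, m) for m in range(n)]
print(all(is_eigenfunction(T, f, lam) for lam, f in eigen))   # True
print(multiplicity_of_one(T))                                  # 1: ergodic
print(multiplicity_of_one([1, 0, 3, 2, 4]))                    # 3: not ergodic
print(time_mean([1, 2, 3, 4, 5, 9], T, 2, 600))                # 4.0, the space mean
```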
For a measurable set $E$ and a point $x \in X$, we define the mean sojourn time of $x$ in $E$ to be
$$\bar\chi_E(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \chi_E(T^k x).$$
If $T$ is ergodic, then $\bar\chi_E = \mu(E)$ a.e. Conversely, if the conclusion of the preceding theorem holds for characteristic functions of measurable sets, then, for an invariant set $E \in \mathcal{B}$, $\bar\chi_E = \chi_E$ a.e. is the constant $\mu(E)$ a.e., so $\mu(E)$ must be 0 or 1. Thus $T$ is ergodic if and only if the mean sojourn time in a measurable set equals the measure of the set for almost all points of $X$. The theory of Krylov and Bogoliouboff (1937) (see also Oxtoby 1952) reverses this situation by starting with a measurable transformation and trying to define an ergodic invariant measure by
$$\mu(E) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \chi_E(T^k x)$$
when the limit exists.
Proposition 4.5 $(X, \mathcal{B}, \mu, T)$ is ergodic if and only if for each $f, g \in L^2(X, \mathcal{B}, \mu)$ we have
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} (U^k f, g) = (f, 1)(1, g).$$
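Proof: If $(X, \mathcal{B}, \mu, T)$ is ergodic, then
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} (U^k f, g) = (\bar f, g) = \left(\int_X f\,d\mu, g\right) = ((f, 1), g) = (f, 1)(1, g).$$
Conversely, given $f \in L^2(X, \mathcal{B}, \mu) \subset L^1(X, \mathcal{B}, \mu)$, the averages $\frac{1}{n}\sum_{k=0}^{n-1} U^k f$ converge to $\bar f$ in $L^2$ by the Mean Ergodic Theorem, so the hypothesis says that $(\bar f, g) = (f, 1)(1, g) = ((f, 1), g)$ for all $g \in L^2(X, \mathcal{B}, \mu)$. Then we must have $\bar f = (f, 1) = \int_X f\,d\mu$ a.e., so $T$ is ergodic by the preceding theorem (whose converse only needs characteristic functions of invariant sets, and these lie in $L^2$).
In case $T: X \to X$ is ergodic, the Kakutani skyscraper picture we associated with an induced transformation $T_A: A \to A$ on a subset $A$ of positive measure is actually a picture of the action of $T$ on (almost) all of $X$, since almost every point of $X$ must eventually enter $A$.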
Here $n_A(x) = \inf\{n > 0 : T^n x \in A\}$ and $A_n = \{x \in A : n_A(x) = n\}$. This diagram makes possible a calculation of the expected recurrence time
$$\frac{1}{\mu(A)}\int_A n_A\,d\mu$$
of a point of $A$ to $A$: it is $1/\mu(A)$. We give three proofs of this interesting fact (partly because the question of expected multiple recurrence times (see 4.3) is open).
Theorem 4.6 (Kac 1947) Let $T$ be an ergodic m.p.t. on a Lebesgue space $(X, \mathcal{B}, \mu)$. If $A \in \mathcal{B}$ with $\mu(A) > 0$, then
$$\int_A n_A\,d\mu = 1.$$
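Proofs: (1) (Wright 1961a) Referring to the diagram above,
$$\int_A n_A\,d\mu = \sum_{n=1}^{\infty} n\,\mu(A_n) = \mu(X) = 1,$$
since the column $A_n \cup TA_n \cup \cdots \cup T^{n-1}A_n$ consists of $n$ pieces, each of measure $\mu(A_n)$, and the union of the columns is $X$.
(2) (Kac 1947) For each $k = 0, 1, 2, \dots$ let $\tilde A_k = X \setminus T^{-k}A$. Define $m_0 = 1$ and
$$m_k = \mu(\tilde A_0 \cap \cdots \cap \tilde A_{k-1}) \quad \text{for } k > 0.$$
Then for each $k \ge 0$ (an empty intersection being $X$)
$$\begin{aligned}
\mu(A \cap \tilde A_1 \cap \cdots \cap \tilde A_k \cap T^{-(k+1)}A) &= \mu(A \cap \tilde A_1 \cap \cdots \cap \tilde A_k) - \mu(A \cap \tilde A_1 \cap \cdots \cap \tilde A_{k+1}) \\
&= [\mu(\tilde A_1 \cap \cdots \cap \tilde A_k) - \mu(\tilde A_0 \cap \cdots \cap \tilde A_k)] - [\mu(\tilde A_1 \cap \cdots \cap \tilde A_{k+1}) - \mu(\tilde A_0 \cap \cdots \cap \tilde A_{k+1})]
\end{aligned}$$
and, because $T$ is measure-preserving,
$$\begin{aligned}
&= [\mu(\tilde A_0 \cap \cdots \cap \tilde A_{k-1}) - \mu(\tilde A_0 \cap \cdots \cap \tilde A_k)] - [\mu(\tilde A_0 \cap \cdots \cap \tilde A_k) - \mu(\tilde A_0 \cap \cdots \cap \tilde A_{k+1})] \\
&= m_k - m_{k+1} - m_{k+1} + m_{k+2} = m_k - 2m_{k+1} + m_{k+2}.
\end{aligned}$$
Now
$$\int_A n_A\,d\mu = \sum_{k=0}^{\infty} (k+1)\,\mu(A \cap \tilde A_1 \cap \cdots \cap \tilde A_k \cap T^{-(k+1)}A) = \sum_{k=0}^{\infty} (k+1)(m_k - 2m_{k+1} + m_{k+2})$$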
$$\begin{aligned}
&= \lim_{n\to\infty} \sum_{k=0}^{n} (k+1)(m_k - 2m_{k+1} + m_{k+2}) \\
&= \lim_{n\to\infty} \left[\sum_{k=0}^{n} (k+1)m_k - \sum_{k=1}^{n+1} 2k\,m_k + \sum_{k=2}^{n+2} (k-1)m_k\right] \\
&= \lim_{n\to\infty} \left[m_0 + 2m_1 - 2m_1 + \sum_{k=2}^{n} [(k+1) - 2k + (k-1)]m_k - 2(n+1)m_{n+1} + n\,m_{n+1} + (n+1)m_{n+2}\right] \\
&= 1 + \lim_{n\to\infty} [-2m_{n+1} - n\,m_{n+1} + (n+1)m_{n+2}] \\
&= 1 + \lim_{n\to\infty} [(n+1)(m_{n+2} - m_{n+1}) - m_{n+1}] \\
&= 1 - \lim_{n\to\infty} [n(m_n - m_{n+1}) + m_n].
\end{aligned}$$
Since the partial sums increase, $n(m_n - m_{n+1}) + m_n$ is decreasing, and hence the limit exists. But
$$\bigcap_{k=0}^{\infty} \tilde A_k$$
is an invariant subset of $\tilde A_0 = X \setminus A$, and $\mu(A) > 0$, so by ergodicity of $T$ we must have
$$\lim_{n\to\infty} m_n = \mu\left(\bigcap_{k=0}^{\infty} \tilde A_k\right) = 0.$$
The series
$$\sum_{n=0}^{\infty} (m_n - m_{n+1})$$
has $k$th partial sum $m_0 - m_{k+1}$, and hence the series is convergent and has sum $m_0 = 1$. Moreover, since $\{m_n\}$ is a decreasing sequence, this series has nonnegative terms. Therefore $\lim_{n\to\infty} n(m_n - m_{n+1}) = 0$, because otherwise (since the limit exists) we could conclude that the series $\sum (1/n)$ is convergent. It follows that
$$\int_A n_A\,d\mu = 1.$$
(3) Since the derivative transformation $T_A: A \to A$ is ergodic (Exercise 1),
$$\frac{1}{k}\sum_{j=0}^{k-1} n_A(T_A^j x) \to \frac{1}{\mu(A)}\int_A n_A\,d\mu \quad \text{a.e. on } A.$$
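For each $k$ let $N_k(x) = n_A(x) + n_A(T_A x) + \cdots + n_A(T_A^{k-1}x)$, so that
$$\frac{N_k(x)}{k} \to \frac{1}{\mu(A)}\int_A n_A\,d\mu \quad \text{a.e. as } k \to \infty.$$
However, $N_k(x)$ is the number of $T$-steps required to produce $k$ entries to $A$; equivalently, $k$ counts the number of visits of $Tx, T^2x, \dots, T^{N_k(x)}x$ to $A$:
$$x,\ Tx,\ \dots,\ T^{n_A(x)}x = T_A x,\ \dots,\ T^{N_2(x)}x = T_A^2 x,\ \dots,\ T^{N_k(x)}x = T_A^k x,$$
the successive gaps being $n_A(x), n_A(T_A x), \dots, n_A(T_A^{k-1}x)$. Thus
$$\frac{k}{N_k(x)} = \frac{1}{N_k(x)}\sum_{j=0}^{N_k(x)-1} \chi_A(T^j x) \quad \text{for } x \in A,$$
and so by ergodicity of $T$
$$\frac{k}{N_k(x)} \to \int_X \chi_A\,d\mu = \mu(A) \quad \text{a.e. on } A.$$
Comparing the two limits gives $\frac{1}{\mu(A)}\int_A n_A\,d\mu = \frac{1}{\mu(A)}$, that is, $\int_A n_A\,d\mu = 1$.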
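Kac's formula can be checked numerically for an irrational rotation. The sketch below assumes (our choice, not the text's) $Tx = x + \alpha \pmod 1$ with $\alpha = (\sqrt5 - 1)/2$ and $A = [0, a)$; the hypothetical helper `return_time` computes $n_A$ by iterating $T$, and `kac_integral` approximates $\int_A n_A\,d\mu$ by the midpoint rule. The result should be close to 1, i.e. the mean return time is close to $1/\mu(A) = 1/a$. Floating-point iteration is adequate here because the return times involved are short.

```python
import math


def return_time(x, alpha, a, max_steps=10**6):
    """n_A(x) = min{n >= 1 : x + n*alpha (mod 1) lies in A = [0, a)}."""
    y = x
    for n in range(1, max_steps + 1):
        y = (y + alpha) % 1.0
        if y < a:
            return n
    raise RuntimeError("no return to A within max_steps")


def kac_integral(alpha, a, grid=20000):
    """Midpoint-rule approximation of the integral of n_A over A = [0, a)."""
    h = a / grid
    return h * sum(return_time((i + 0.5) * h, alpha, a) for i in range(grid))


alpha = (math.sqrt(5) - 1) / 2
for a in (0.3, 0.05):
    print(a, kac_integral(alpha, a))     # second number close to 1
```

Since $n_A$ is piecewise constant with few pieces for a rotation, the midpoint rule is accurate even on a coarse grid.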
The Kakutani skyscraper decomposition of an ergodic transformation leads to a lemma on which many of the important constructions of ergodic theory are based. With a bit more work, the lemma can be shown to hold for all aperiodic transformations, those whose periodic points form a set of measure 0 (see Halmos 1956).
Lemma 4.7 Kakutani-Rokhlin Lemma (Kakutani 1943, Rokhlin 1948) Let $T: X \to X$ be an ergodic m.p.t. on a nonatomic measure space $(X, \mathcal{B}, \mu)$, $n$ a positive integer, and $\varepsilon > 0$. Then there is a measurable set $A \subset X$ such that $A, TA, \dots, T^{n-1}A$ are pairwise disjoint and cover $X$ up to a set of measure less than $\varepsilon$.
Proof: Select a measurable set $B \subset X$ of small positive measure (exactly how small to be determined later). We use the set $B$ to form a Kakutani skyscraper decomposition of $X$. That is, for each $k = 1, 2, \dots$ let $B_k = \{x \in B : n_B(x) = k\}$, so that $X$ decomposes (up to a set of measure 0) into the union of disjoint columns $B_k \cup TB_k \cup \cdots \cup T^{k-1}B_k$, $k = 1, 2, \dots$. In order to form the set $A$, we ignore the first $n - 1$ columns from the left; and beginning with the $n$th column from the left we select the base of each column and every $n$th level above it, moving up the column, selecting a level only if at least $n - 1$ levels of the column lie above it. Analytically,
[Figure: the skyscraper columns $B_1$; $B_2, TB_2$; $B_3, TB_3, T^2B_3$; $\dots$; $B_k, TB_k, \dots, T^{k-1}B_k$, drawn side by side.]
we may write
$$A = \bigcup_{k \ge n} \bigcup_{j=0}^{\lfloor (k-n)/n \rfloor} T^{jn}B_k.$$
Then clearly $A, TA, \dots, T^{n-1}A$ are pairwise disjoint and cover $X$ except maybe for a set of measure at most
$$\sum_{k=1}^{\infty} n\,\mu(B_k) = n\,\mu(B).$$
Thus we only need initially to choose $\mu(B) < \varepsilon/n$.
Here are some examples of ergodic transformations.
(1) Bernoulli shifts are ergodic. This follows from Exercise 4(b) (p. 56), since the condition
$$\frac{1}{n}\sum_{k=0}^{n-1} \mu(T^{-k}A \cap B) \to \mu(A)\mu(B)$$
is easily verified when $A$ and $B$ are cylinder sets. In the next section we will see that essentially the same argument proves that in fact Bernoulli shifts are strongly mixing.
(2) Irrational rotations of the circle are ergodic. Let $Tx = x + \alpha \pmod 1$ for $x \in [0, 1)$, where $\alpha$ is irrational. If $f \in L^2(X, \mathcal{B}, m)$, then $f$ has a Fourier
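expansion
$$f(x) = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n x}$$
in $L^2$. If $Uf = f$, we find that
$$Uf(x) = f(x + \alpha) = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n\alpha} e^{2\pi i n x} = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n x},$$
so that $a_n = a_n e^{2\pi i n\alpha}$ for all $n$. Since $e^{2\pi i n\alpha} \ne 1$ unless $n = 0$, we must have $a_n = 0$ for $n \ne 0$. Hence $f(x) = a_0$ in $L^2$. By Proposition 2.4.1, $T$ is ergodic. Thus if $f \in L^1(X, \mathcal{B}, m)$, we have
$$\frac{1}{n}\sum_{k=0}^{n-1} f(x + k\alpha) \to \int_0^1 f\,dm \quad \text{a.e.}$$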
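The ergodic averages for an irrational rotation can be compared with the first-digit count in Gelfand's problem, which the text treats next. In this sketch (our own check; the function names are hypothetical) `rotation_frequency` is the Birkhoff average of the indicator of $[\log_{10}7, \log_{10}8)$ along the orbit of $0$ under $x \mapsto x + \log_{10}2 \pmod 1$, while `leading_digit_frequency` counts the first digits of $2^j$ directly with exact integers. Both should be near $\log_{10}(8/7) \approx 0.058$, and they should nearly coincide with each other, since the first digit of $2^j$ is 7 exactly when $j\log_{10}2 \bmod 1$ lies in that interval.

```python
import math


def rotation_frequency(alpha, lo, hi, n, x=0.0):
    """(1/n) * #{0 <= j < n : x + j*alpha (mod 1) lies in [lo, hi)}."""
    count = sum(1 for j in range(n) if lo <= (x + j * alpha) % 1.0 < hi)
    return count / n


def leading_digit_frequency(d, n):
    """Fraction of j in 0..n-1 for which the first decimal digit of 2**j is d."""
    count, power = 0, 1
    for _ in range(n):
        if str(power)[0] == str(d):
            count += 1
        power *= 2
    return count / n


n = 5000
alpha = math.log10(2)
print(rotation_frequency(alpha, math.log10(7), math.log10(8), n))
print(leading_digit_frequency(7, n))
print(math.log10(8 / 7))       # about 0.058
```

The two counts can differ by at most a point or two of floating-point boundary effects (for instance $2^3 = 8$ sits exactly at $\log_{10}8$); the exact-integer count avoids that issue.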
H. Weyl (1916) showed that in fact if $f$ is Riemann integrable, then this formula holds for all $x$. His way to see this was as follows (see also Section 4.2). Translating $f$ if necessary, it is enough to consider the case $x = 0$. If $f$ is Riemann integrable, there are trigonometric polynomials
$$p_1(x) = \sum_{n=-M}^{M} c_n e^{2\pi i n x}, \qquad p_2(x) = \sum_{n=-M}^{M} d_n e^{2\pi i n x}$$
such that $p_1 \le f \le p_2$ on $[0, 1)$ and
$$\int_0^1 (p_2 - f)\,dm \quad\text{and}\quad \int_0^1 (f - p_1)\,dm$$
are both arbitrarily small. Now
$$\frac{1}{N}\sum_{k=0}^{N-1} p_1(k\alpha) \to \int_0^1 p_1\,dm \quad \text{as } N \to \infty,$$
and similarly for $p_2$, because for $n \ne 0$
$$\left|\frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi i n k\alpha}\right| = \frac{1}{N}\left|\frac{1 - e^{2\pi i N n\alpha}}{1 - e^{2\pi i n\alpha}}\right| \le \frac{2}{N\,|1 - e^{2\pi i n\alpha}|} \to 0.$$
Then
$$\frac{1}{N}\sum_{k=0}^{N-1} p_1(k\alpha) \le \frac{1}{N}\sum_{k=0}^{N-1} f(k\alpha) \le \frac{1}{N}\sum_{k=0}^{N-1} p_2(k\alpha) \quad \text{for all } N$$
implies that $\frac{1}{N}\sum_{k=0}^{N-1} f(k\alpha)$ is arbitrarily close to $\int_0^1 f\,dm$ for large $N$.
This fact has an interesting application, the solution of Gelfand's problem: find the frequency of the digit 7 in the sequence $1, 2, 4, 8, 1, 3, 6, \dots$ of first digits (in decimal notation) of the numbers $2^n$, $n = 0, 1, 2, \dots$. That is, if for each $k = 0, 1, \dots, 9$, $P_k(n)$ denotes the number of $k$'s in the first $n$ terms of the above sequence, Gelfand's problem is to evaluate, if it exists,
$$p_7 = \lim_{n\to\infty} \frac{P_7(n)}{n}.$$
Let $\alpha = \log_{10} 2$ and $I_k = [\log_{10} k, \log_{10}(k+1))$ for $k = 1, 2, \dots, 9$. Then the first digit of $2^j$ is $k$ if and only if $k \cdot 10^r \le 2^j < (k+1) \cdot 10^r$ for some $r$, i.e., if and only if $j\alpha \in I_k \pmod 1$. Thus
n-1
P 7 (n)=
E x1 7 00 i=0
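The counting can be checked numerically. The sketch below is our own illustration: it counts leading digits of $2^j$ exactly with Python's big integers, and separately counts how often $j\alpha \bmod 1$ falls in each $I_k$ using floating point. Equidistribution predicts that both frequencies for $k = 7$ approach $m(I_7) = \log_{10} 8 - \log_{10} 7 \approx 0.058$. The sample size `n = 5000` is an arbitrary choice.

```python
import math


def leading_digit_counts(n):
    """Exact counts of the first decimal digit of 2**j for j = 0, ..., n-1."""
    counts = [0] * 10
    power = 1
    for _ in range(n):
        counts[int(str(power)[0])] += 1
        power *= 2
    return counts


def rotation_counts(n, alpha=math.log10(2)):
    """Counts of j < n with (j * alpha mod 1) in I_k = [log10 k, log10(k+1))."""
    counts = [0] * 10
    for j in range(n):
        x = (j * alpha) % 1.0
        k = min(int(10 ** x), 9)  # 10**x lies in [1, 10); clamp guards against rounding up to 10
        counts[k] += 1
    return counts


n = 5000
exact = leading_digit_counts(n)
rotation = rotation_counts(n)
print("exact frequency of 7:   ", exact[7] / n)
print("rotation frequency of 7:", rotation[7] / n)
print("m(I_7) = log10(8/7):    ", math.log10(8 / 7))
```

The exact count avoids any rounding. The rotation count is the quantity the ergodic argument actually controls.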
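Hence, since $\chi_{I_7}$ is Riemann integrable, Weyl's result applied with $x = 0$ gives
$$p_7 = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\chi_{I_7}(j\alpha) = m(I_7) = \log_{10}\frac{8}{7}.$$

(3) Translations of the torus. In the same way as in (2), we can determine which translations of the $n$-torus
$$\mathbb{K}^n = \{(x_1, \ldots, x_n) : 0 \le x_i < 1 \text{ for } i = 1, 2, \ldots, n\}$$
are ergodic. $\mathbb{K}^n$ is a compact abelian group under addition modulo 1. Fix $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \mathbb{K}^n$ and define $T: \mathbb{K}^n \to \mathbb{K}^n$ by $Tx = x + \alpha \pmod 1$ for all $x \in \mathbb{K}^n$. Let $\mathscr{B}$ be the $\sigma$-algebra of Borel subsets of $\mathbb{K}^n$, and let $\mu$ denote the product Lebesgue measure on $\mathbb{K}^n$. Then $\mu$ is just normalized Haar measure, and $T$ is a m.p.t. The system $(\mathbb{K}^n, \mathscr{B}, \mu, T)$ is ergodic if and only if invariant $L^2$ functions are constant a.e. Suppose then that $f$ is in the complex $L^2$ space $L^2(\mathbb{K}^n, \mathscr{B}, \mu)$. Now $f - f\circ T$ is zero a.e. if and only if its Fourier transform $(f - f\circ T)^\wedge$ is zero as a function on $\widehat{\mathbb{K}^n} = \mathbb{Z}^n$. For each $k \in \mathbb{Z}^n$,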
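$$\begin{aligned}
(f - f\circ T)^\wedge(k) &= \int_{\mathbb{K}^n} e^{-2\pi i k\cdot x}\,(f(x) - f(Tx))\,d\mu(x) \\
&= \int_{\mathbb{K}^n} e^{-2\pi i k\cdot x} f(x)\,d\mu(x) - \int_{\mathbb{K}^n} e^{-2\pi i k\cdot x} e^{2\pi i k\cdot\alpha} f(x)\,d\mu(x) \\
&= (1 - e^{2\pi i k\cdot\alpha})\int_{\mathbb{K}^n} e^{-2\pi i k\cdot x} f(x)\,d\mu(x) = (1 - e^{2\pi i k\cdot\alpha})\,\hat f(k).
\end{aligned}$$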
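The identity just derived says that a character $e_k(x) = e^{2\pi i k\cdot x}$ is $T$-invariant exactly when $e^{2\pi i k\cdot\alpha} = 1$, i.e., $k\cdot\alpha \in \mathbb{Z}$. The sketch below is our own illustration on $\mathbb{K}^2$, with $k = (2, -1)$ and two hypothetical choices of $\alpha$. It computes Birkhoff averages of $e_k$ along the orbit of $0$. For the rationally dependent $\alpha = (\sqrt2 - 1, 2\sqrt2 - 2)$ we have $k\cdot\alpha = 0$ and the average stays at 1. For $\alpha = (\sqrt2 - 1, \sqrt3 - 1)$ it tends to $\int e_k\,d\mu = 0$.

```python
import cmath
import math


def birkhoff_average_of_character(k, alpha, n):
    """(1/n) * sum_{j<n} e_k(j*alpha mod 1), where e_k(x) = exp(2 pi i k.x) on the 2-torus."""
    total = 0j
    for j in range(n):
        x = [(j * a) % 1.0 for a in alpha]  # the point T^j(0) = j*alpha (mod 1)
        total += cmath.exp(2j * math.pi * sum(ki * xi for ki, xi in zip(k, x)))
    return total / n


k = (2, -1)
dependent = (math.sqrt(2) - 1, 2 * math.sqrt(2) - 2)  # 2*a1 - a2 = 0, so k.alpha is an integer
independent = (math.sqrt(2) - 1, math.sqrt(3) - 1)    # 1, sqrt 2, sqrt 3 are rationally independent
print(abs(birkhoff_average_of_character(k, dependent, 10000)))    # stays near 1
print(abs(birkhoff_average_of_character(k, independent, 10000)))  # near 0
```

Only one character is tested here. Ergodicity requires $k\cdot\alpha \notin \mathbb{Z}$ for every nonzero $k$, which no finite computation can confirm.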
Suppose that $\alpha$ is rationally independent in the sense that, for $k \in \mathbb{Z}^n$, $k\cdot\alpha \in \mathbb{Z}$ if and only if $k = 0$. If $f \in L^2$ is invariant and $\alpha$ is rationally independent, then the above calculation shows that $\hat f(k) = 0$ for all $k \neq 0$. This implies that $f$ equals the constant $\hat f(0)$ a.e. Conversely, if $(\mathbb{K}^n, \mathscr{B}, \mu, T)$ is ergodic, then $\alpha$ must be rationally independent. For if there is a nonzero $k \in \mathbb{Z}^n$ such that $k\cdot\alpha \in \mathbb{Z}$, then $f(x) = e^{2\pi i k\cdot x}$ is a nonconstant invariant function and $(\mathbb{K}^n, \mathscr{B}, \mu, T)$ is not ergodic. Thus $(\mathbb{K}^n, \mathscr{B}, \mu, T)$ is ergodic if and only if $\alpha$ is rationally independent. The system $(\mathbb{K}^n, \mathscr{B}, \mu, T)$ has discrete spectrum, as does any translation on a compact abelian group.

(4) An irreducible Markov shift is ergodic. Let $(\Omega, \mathscr{B}, \mu, \sigma)$ be a Markov shift on the alphabet $\{1, 2, \ldots, N\}$ determined by an $N \times N$ stochastic matrix $A = (a_{ij})$ with left-fixed probability row vector $p$ (see 1.2 and 2.3). Recall that $A^k = (a_{ij}^{(k)})$ gives the $k$-step transition probabilities: $a_{ij}^{(k)} = \Pr\{\omega_k = j \mid \omega_0 = i\}$.
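Assume that $A$ is irreducible, in that for each $i$ and $j$ there is a $k$ for which $a_{ij}^{(k)} > 0$. Let
$$E_i = \{\omega \in \Omega : \omega_0 = i\}, \qquad i = 1, 2, \ldots, N.$$
The Ergodic Theorem yields the a.e. existence of the limits
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\chi_{E_j}(\sigma^k\omega)$$
and hence of
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\frac{1}{p_i}\int\chi_{E_j}(\sigma^k\omega)\,\chi_{E_i}(\omega)\,d\mu(\omega) = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\frac{\mu(\sigma^{-k}E_j \cap E_i)}{p_i} = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}a_{ij}^{(k)} = q_{ij}.$$
The matrix $Q = (q_{ij})$ is stochastic and satisfies
$$Q = QA = AQ = Q^2,$$
since $Q = \lim_{n\to\infty}(1/n)\sum_{k=0}^{n-1}A^k$.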
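To see why a Cesàro average is used rather than a plain limit, here is a small pure-Python sketch, added for illustration, that forms $Q_n = (1/n)\sum_{k=0}^{n-1}A^k$. For the periodic, irreducible matrix $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ the powers $A^k$ oscillate, but $Q_n$ has identical rows $(1/2, 1/2)$. For a 3-state example of our own choosing, the rows approach its fixed vector $p = (1/4, 1/2, 1/4)$. Both results are in line with the claim that follows.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def cesaro_average(a, n):
    """Return Q_n = (1/n) * sum_{k=0}^{n-1} A^k."""
    size = len(a)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # A^0 = I
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        total = [[total[i][j] + power[i][j] for j in range(size)] for i in range(size)]
        power = mat_mul(power, a)
    return [[entry / n for entry in row] for row in total]


swap = [[0.0, 1.0], [1.0, 0.0]]                             # irreducible, period 2
walk = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]  # irreducible, period 2
print(cesaro_average(swap, 1000))  # [[0.5, 0.5], [0.5, 0.5]]
print(cesaro_average(walk, 1000))  # rows close to p = (0.25, 0.5, 0.25)
```

For both matrices $A^k$ itself has no limit. This is why ergodicity of an irreducible Markov shift is proved with averages, and why strong mixing (Theorem 5.6) needs aperiodicity as well.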
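I claim that if $A$ is irreducible, then all the entries of $Q$ are positive and all the rows of $Q$ are identical, each row being equal to $p$. From $Q = QA$ (hence $Q = QA^n$ for every $n$), we see that, for fixed $i$ and $j$,
$$q_{ij} = \sum_k q_{ik}a_{kj}^{(n)} \ge q_{ik}a_{kj}^{(n)} \quad\text{for each } k \text{ and } n.$$
Thus if $\mathscr{G}_i = \{j : q_{ij} > 0\}$, then if $k \in \mathscr{G}_i$ and $a_{kj}^{(n)} > 0$, we have $j \in \mathscr{G}_i$. Moreover, $\mathscr{G}_i \neq \emptyset$, since each row of $Q$ sums to 1. So by irreducibility $\mathscr{G}_i = \{1, 2, \ldots, N\}$ for each $i$, that is to say, $q_{ij} > 0$ for all $i$ and $j$. To see that all the rows of $Q$ are identical, suppose that for some $k_0$ and $j_0$ we have $q_{j_0k_0} < q^* = \max_i q_{ik_0}$. Then from $Q^2 = Q$ we have
$$q_{ik_0} = \sum_j q_{ij}q_{jk_0} < \Big(\sum_j q_{ij}\Big)q^* = q^*$$
for all $i$, which is impossible. Now
$$(pQ)_j = \sum_i p_i q_{ij} = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}(pA^k)_j = p_j;$$
since $q_{ij}$ is independent of $i$, we have $q_{ij} = p_j$ for all $i, j$. To prove that $\sigma$ is ergodic, it is enough to show that
$$\frac{1}{n}\sum_{k=0}^{n-1}\mu(\sigma^{-k}E \cap F) \to \mu(E)\mu(F)$$
for each pair of cylinder sets
$$E = \{\omega \in \Omega : \omega_r\omega_{r+1}\cdots\omega_{r+l} = i_0i_1\cdots i_l\}, \qquad F = \{\omega \in \Omega : \omega_s\omega_{s+1}\cdots\omega_{s+m} = j_0j_1\cdots j_m\},$$
where $i_0, \ldots, i_l, j_0, \ldots, j_m \in \{1, 2, \ldots, N\}$. If $k$ is large enough, then
$$(\{r, r+1, \ldots, r+l\} + k) \cap \{s, s+1, \ldots, s+m\} = \emptyset,$$
and we have
$$\mu(\sigma^{-k}E \cap F) = (p_{j_0}a_{j_0j_1}\cdots a_{j_{m-1}j_m})\,a_{j_mi_0}^{(r+k-s-m)}\,(a_{i_0i_1}\cdots a_{i_{l-1}i_l}).$$
Since
$$\frac{1}{n}\sum_{k=0}^{n-1}a_{j_mi_0}^{(r+k-s-m)} \to q_{j_mi_0} = p_{i_0} \quad\text{as } n \to \infty,$$
the averages converge to $\mu(F)\,p_{i_0}a_{i_0i_1}\cdots a_{i_{l-1}i_l} = \mu(F)\mu(E)$, and the ergodicity of $\sigma$ follows. According to Exercise 8, the converses of all these implications also hold. Thus for a Markov shift with all $p_i > 0$ the following statements are equivalent:
(1) $\sigma$ is ergodic.
(2) $A$ is irreducible.
(3) $q_{ij}$ is independent of $i$.
(4) $q_{ij} = p_j$ for all $i$ and $j$.
(5) $q_{ij} > 0$ for all $i$ and $j$.

(5) Ergodicity of skew products (Anzai 1951). Let $T$ be a m.p.t. on a probability space $(X, \mathscr{B}, \mu)$, and $G$ a compact abelian group with Haar measure $\nu$. Let $f: X \to G$ be a measurable function, and on the product probability space $(X \times G, \mathscr{B} \times \mathscr{B}_G, \mu \times \nu)$ define the skew product transformation $T_f$ by $T_f(x, g) = (Tx, g f(x))$.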
Recall that $T_f$ is measurable, invertible, and measure-preserving.

Theorem 4.8. Suppose $(X, \mathscr{B}, \mu, T)$ is ergodic. Then the skew product transformation $T_f: X \times G \to X \times G$ is not ergodic if and only if there is a nontrivial character $\tau \in \hat G$ and a measurable map $\eta$ of $X$ into the unit circle $\mathbb{K}$ such that
$$\tau(f(x)) = \frac{\eta(x)}{\eta(Tx)} \quad\text{for almost all } x \in X.$$
Proof: Suppose there is such a $\tau \in \hat G$ and such a map $\eta: X \to \mathbb{K}$. Define $h: X \times G \to \mathbb{K}$ by $h(x, g) = \eta(x)\tau(g)$, so $h$ is clearly measurable. The function $h$ is $T_f$-invariant because
$$h(Tx, g f(x)) = \eta(Tx)\tau(g f(x)) = \eta(Tx)\tau(g)\tau(f(x)) = \eta(x)\tau(g) = h(x, g) \quad\text{a.e.}$$
Moreover, $h$ cannot be constant a.e., since $(\tau, 1) = 0$ in $L^2(G, \mathscr{B}_G, \nu)$ and hence
$$\int_{X\times G} h(x, g)\,d\mu(x)\,d\nu(g) = \int_X\Big(\int_G \eta(x)\tau(g)\,d\nu(g)\Big)d\mu(x) = 0.$$
Thus $h$ could be constant only if $h = 0$ a.e., but $|h| = 1$.

Conversely, suppose the skew product is not ergodic and let $h: X \times G \to \mathbb{C}$ be a nonconstant invariant function in $L^2$. For each $\tau \in \hat G$ define $\phi_\tau: X \to \mathbb{C}$ by
$$\phi_\tau(x) = \int_G h(x, g)\,\tau(g)^{-1}\,d\nu(g).$$
Then
$$\begin{aligned}
\phi_\tau(Tx)\,\tau(f(x)) &= \int_G h(Tx, g)\,\tau(g)^{-1}\tau(f(x))\,d\nu(g) \\
&= \int_G h(x, g f(x)^{-1})\,\tau(g)^{-1}\tau(f(x))\,d\nu(g) \\
&= \int_G h(x, g)\,\tau(g f(x))^{-1}\tau(f(x))\,d\nu(g) \\
&= \int_G h(x, g)\,\tau(g)^{-1}\,d\nu(g) = \phi_\tau(x) \quad\text{a.e.},
\end{aligned}$$
so $|\phi_\tau(Tx)| = |\phi_\tau(x)|$ a.e. Since $T$ is ergodic, $|\phi_\tau|$ must be a constant $C_\tau$ a.e.

There exists a nontrivial $\tau \in \hat G$ for which $C_\tau \neq 0$. For if not, then for almost all $x$ the function $h(x, \cdot) \in L^2(G)$ has each of its Fourier coefficients $\int_G h(x, \cdot)\,\tau^{-1}\,d\nu$ equal to zero for $\tau \neq 1$, and consequently $h(x, g)$ is independent of $g$. Since $h$ is invariant and $T$ is ergodic, this would imply that $h$ is constant. Choose a nontrivial $\tau \in \hat G$ such that $C_\tau \neq 0$, and let
$$\eta(x) = \frac{\phi_\tau(x)}{C_\tau},$$
so $\eta: X \to \mathbb{K}$ is measurable. Then
$$\tau(f(x)) = \frac{\phi_\tau(x)}{\phi_\tau(Tx)} = \frac{\eta(x)}{\eta(Tx)} \quad\text{a.e.},$$
as required.
Anzai originally considered skew products in the case that $X = G = \mathbb{K}$ is the 1-torus. Actually, this example goes back to von Neumann, who showed that the skew product $(x, y) \mapsto (x + \alpha, x + y)$ ($\alpha$ irrational) on $\mathbb{K}^2$ has the same spectral type (i.e., the induced operators on $L^2$ are unitarily equivalent) as the cross product of the shift transformation on an infinite torus with the translation $x \mapsto x + \alpha$ of $\mathbb{K}$, but these two systems are not isomorphic (see Anzai 1951). In Anzai's situation, the transformation $T: \mathbb{K}^2 \to \mathbb{K}^2$ is given by $T(x, y) = (x + \alpha, y + f(x))$, where $\alpha$ is irrational and $f: \mathbb{K} \to \mathbb{K}$ is measurable, and the condition for nonergodicity becomes the following: there is an integer $p \neq 0$ and a measurable map $\eta: \mathbb{K} \to \mathbb{K}$ such that
$$p f(x) = \eta(x) - \eta(x + \alpha) \quad\text{a.e.}$$
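Anzai's condition can be illustrated with a coboundary. The sketch below is our own example, with the hypothetical choices $\eta(x) = x^2 \bmod 1$, $p = 1$ and $\alpha = \sqrt2 - 1$. It sets $f(x) = \eta(x) - \eta(x + \alpha) \bmod 1$, so the condition holds by construction. It then checks at random points that $h(x, y) = e^{2\pi i(\eta(x) + py)}$, the additive-notation form of $\eta(x)\tau(g)$ from the proof of Theorem 4.8, is a nonconstant invariant function for $T(x, y) = (x + \alpha, y + f(x))$.

```python
import cmath
import math
import random

ALPHA = math.sqrt(2) - 1  # irrational rotation number (illustrative choice)
P = 1                     # the nonzero integer p in Anzai's condition


def eta(x):
    """Hypothetical measurable map K -> K, written additively (values mod 1)."""
    return (x * x) % 1.0


def f(x):
    """Cocycle built to satisfy p*f(x) = eta(x) - eta(x + alpha) (mod 1) with p = 1."""
    return (eta(x) - eta((x + ALPHA) % 1.0)) % 1.0


def skew(x, y):
    """Anzai skew product T(x, y) = (x + alpha, y + f(x)) on the 2-torus."""
    return (x + ALPHA) % 1.0, (y + f(x)) % 1.0


def h(x, y):
    """Candidate invariant function exp(2 pi i (eta(x) + p*y))."""
    return cmath.exp(2j * math.pi * (eta(x) + P * y))


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
max_defect = max(abs(h(*skew(x, y)) - h(x, y)) for x, y in points)
spread = max(abs(h(x, y) - h(0.0, 0.0)) for x, y in points)
print(max_defect)  # tiny: h is invariant up to rounding
print(spread)      # order 1: h is not constant
```

This checks only the easy direction of the theorem: a coboundary produces a nonconstant invariant function, so this $T$ is not ergodic.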
Obviously the idea of a skew product generalizes further, for example to infinite products. We might consider the map $T: \mathbb{K}^\infty \to \mathbb{K}^\infty$ given by, say,
$$T(x_1, x_2, \ldots) = (x_1 + \alpha, x_1 + x_2, x_2 + x_3, \ldots).$$
Skew products are related to cohomology and representations of groups and to the virtual groups of Mackey. They also provide a useful tool for the analysis of the structure of a given m.p.t., as in the structure theorems of Furstenberg (1963) and Zimmer (1975, 1976a, b).

Recall that a given system $(X, \mathscr{B}, \mu, T)$ is said to have discrete spectrum in case the eigenfunctions of the unitary operator $U$ induced on $L^2$ by $T$ span $L^2$. There are two generalizations of this concept. Let $D_0 \subset L^2(X, \mathscr{B}, \mu)$ consist of the constant functions of absolute value 1, and for each $i = 1, 2, \ldots$ let
$$D_i = \{f \in L^2(X, \mathscr{B}, \mu) : |f| = 1 \text{ a.e. and } Uf/f \in D_{i-1}\}.$$
The elements of $D = \bigcup_{i=0}^{\infty} D_i$ are called quasi-eigenfunctions of $U$, and $T$ is said to have quasi-discrete spectrum in case $D$ spans $L^2(X, \mathscr{B}, \mu)$.

Again, let $H_0 \subset L^2(X, \mathscr{B}, \mu)$ denote the constant functions of absolute value 1. For each ordinal $i$, $E_i$ will denote the closed linear span in $L^2(X, \mathscr{B}, \mu)$ of $H_i$. Assuming that $H_j$ and $E_j$ have been defined for all $j < i$, we define $H_i = \bigcup_{j<i} H_j$ if $i$ is a limit ordinal, and
$$H_i = \{f \in L^2(X, \mathscr{B}, \mu) : |f| = 1 \text{ a.e. and } Uf/f \in E_{i-1}\}$$
if $i$ is a successor ordinal. The elements of $H = \bigcup_i H_i$ are called generalized eigenfunctions of $U$, and $T$ is said to have generalized discrete spectrum in case $H$ spans $L^2(X, \mathscr{B}, \mu)$ (Abramov 1962, Parry 1969a, Petersen 1971 et al.). These classes of transformations may be considered the next simplest, after the discrete spectrum transformations.
Exercises
1. Prove that if $T$ is ergodic, then so are the induced transformations $T_A$ and $\tilde T$ (see Section 2.3). ($\tilde T$ is metrically indecomposable even if $\tilde\mu(\tilde X) = \infty$.)
2. Prove that if $T$ is ergodic, $f \ge 0$, and
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} f(T^kx) < \infty \quad\text{a.e.},$$
then $f \in L^1$.
3. Which of the equivalent characterizations of ergodicity fail when $X$ has infinite measure?
4. (a) Prove that $T$ is ergodic if and only if
$$\frac{1}{n}\sum_{k=0}^{n-1}\mu(T^{-k}A \cap B) \to \mu(A)\mu(B) \quad\text{for all } A, B \in \mathscr{B}.$$
(b) Prove that $T$ is ergodic if and only if (a) holds for all $A, B$ in a semialgebra which generates the $\sigma$-algebra $\mathscr{B}$.
5. Let $T$ be ergodic and $\nu \ll \mu$ a measure on $(X, \mathscr{B})$ such that $\nu T^{-1} \le \nu$. Show that $\nu T^{-1} = \nu$ and $\nu$ is a constant multiple of $\mu$.
6. Let $X = [0, 1]$ with Lebesgue measure $m$. Then $T$ (preserving $m$) is ergodic on $X$ if and only if
$$\frac{1}{n}\sum_{k=0}^{n-1} f(T^kx) \to \int f\,dm \quad\text{a.e.}$$
for each continuous function $f$. (Note: It is not true that ergodicity is characterized by the absence of continuous invariant functions.)
7. Let $U: L^2(X, \mathscr{B}, \mu) \to L^2(X, \mathscr{B}, \mu)$ be the unitary operator associated with an ergodic m.p.t. on a nonatomic probability space $(X, \mathscr{B}, \mu)$. Then every point of the unit circle is an approximate eigenvalue of $U$; i.e., given $\lambda$ with $|\lambda| = 1$ there are $f_n \in L^2$ with $\|f_n\|_2 = 1$ for all $n$ and $\|Uf_n - \lambda f_n\|_2 \to 0$. Consequently the spectrum of $U$ is the entire unit circle.
8. Suppose that $(\Omega, \mathscr{B}, \mu, \sigma)$ is a Markov shift determined by a given stochastic matrix $A$ and fixed probability vector $p$ with all $p_i > 0$. Prove that if $(\Omega, \mathscr{B}, \mu, \sigma)$ is ergodic, then $A$ is irreducible.
9. Prove that for any ergodic m.p.t. on a nonatomic space there is a set $A$ of positive measure for which the return time $n_A$ is unbounded.
10. Prove that if $T$ has discrete spectrum, then there is a sequence of integers $n_k \to \infty$ with $U^{n_k} \to I$ in the strong operator topology on $L^2$ (i.e., $\|U^{n_k}f - f\|_2 \to 0$ for all $f \in L^2$).
2.5. Strong mixing

While every m.p.t. on a space of finite measure is recurrent, only certain ones, namely those that move almost all points throughout the space, are ergodic. The concept of ergodicity can be strengthened in several ways to define classes of transformations which satisfy even more stringent quantitative recurrence conditions. Recall that a m.p.t. $T: X \to X$ is ergodic if and only if
$$\frac{1}{n}\sum_{k=0}^{n-1}\mu(T^{-k}A \cap B) \to \mu(A)\mu(B) \quad\text{for all } A, B \in \mathscr{B}. \tag{1}$$
We say that $T$ is weakly mixing if
$$\frac{1}{n}\sum_{k=0}^{n-1}|\mu(T^{-k}A \cap B) - \mu(A)\mu(B)| \to 0 \quad\text{for all } A, B \in \mathscr{B}, \tag{2}$$
and strongly mixing if
$$\mu(T^{-k}A \cap B) \to \mu(A)\mu(B) \quad\text{for all } A, B \in \mathscr{B}. \tag{3}$$
Thus $T$ is strongly mixing if and only if
$$\frac{\mu(T^{-k}A \cap B)}{\mu(B)} \to \mu(A) \quad\text{for all } A, B \in \mathscr{B} \text{ with } \mu(B) > 0.$$
This last condition says that eventually $T$ distributes $A$ fairly evenly throughout the space $X$: for large $k$, the proportion of $B$ occupied by $T^{-k}A$, namely $\mu(T^{-k}A \cap B)/\mu(B)$, is approximately the same as the relative size of $A$ in $X$, namely $\mu(A)$.

The following observations are important even though they are evident.

Proposition 5.1. Strong mixing implies weak mixing and weak mixing implies ergodicity.

Proposition 5.2. $T$ is strongly mixing if and only if
$$(U^nf, g) \to (f, 1)(1, g) \quad\text{for all } f, g \in L^2.$$
Thus $T$ is strongly mixing if and only if $U^nf \to \int f\,d\mu$ weakly in $L^2(X, \mathscr{B}, \mu)$ for each $f \in L^2(X, \mathscr{B}, \mu)$.
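Proposition 5.3. $T$ is strongly (respectively weakly) mixing if and only if (3) (respectively (2)) holds for all $A, B$ in a semialgebra which generates $\mathscr{B}$, or for all $A, B$ in a set dense in $\mathscr{B}$.

Proposition 5.4. Ergodicity, weak mixing, and strong mixing are isomorphism invariants of $T$. (Actually they are spectral invariants.)

Theorem 5.5 (Rényi 1958). $T$ is strongly mixing if and only if for each $A \in \mathscr{B}$,
$$\lim_{n\to\infty}\mu(T^{-n}A \cap A) = \mu(A)^2.$$
Proof: We prove that if $\lim_{n\to\infty}\mu(T^{-n}A \cap A) = \mu(A)^2$ for all $A \in \mathscr{B}$, then $(X, \mathscr{B}, \mu, T)$ is strongly mixing; the converse is obvious. Fix $A \in \mathscr{B}$ and let $M$ be the closed linear subspace of $L^2(X, \mathscr{B}, \mu)$ generated by the constant functions together with $\{U^k\chi_A : k \in \mathbb{Z}\}$. Since
$$\lim_{n\to\infty}(U^n\chi_A, 1) = \lim_{n\to\infty}\mu(A) = (\chi_A, 1)(1, 1)$$
and
$$\lim_{n\to\infty}(U^n\chi_A, U^k\chi_A) = \lim_{n\to\infty}(U^{n-k}\chi_A, \chi_A) = \lim_{n\to\infty}\mu(T^{-(n-k)}A \cap A) = \mu(A)^2 = (\chi_A, 1)(1, U^k\chi_A),$$
we have
$$\lim_{n\to\infty}(U^n\chi_A, f) = (\chi_A, 1)(1, f) \quad\text{for all } f \in M.$$
(The set of $f$ for which this holds is a closed linear subspace, since $\|U^n\chi_A\|_2 \le 1$, and it contains the generators of $M$.) Given $f \in L^2(X, \mathscr{B}, \mu)$, write $f = f_1 + f_2$, where $f_1 \in M$ and $f_2 \in M^\perp$. Since $U^n\chi_A \in M$, we have $(U^n\chi_A, f_2) = 0$; also $(1, f_2) = 0$. Then
$$\lim_{n\to\infty}(U^n\chi_A, f) = \lim_{n\to\infty}(U^n\chi_A, f_1) + \lim_{n\to\infty}(U^n\chi_A, f_2) = (\chi_A, 1)(1, f_1) = (\chi_A, 1)(1, f).$$
Thus in particular, for $B \in \mathscr{B}$ and $f = \chi_B$,
$$\lim_{n\to\infty}\mu(T^{-n}A \cap B) = \lim_{n\to\infty}(U^n\chi_A, \chi_B) = (\chi_A, 1)(1, \chi_B) = \mu(A)\mu(B);$$
therefore $(X, \mathscr{B}, \mu, T)$ is strongly mixing.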
Here are some examples of strongly mixing transformations.

(1) Bernoulli shifts are strongly mixing. This is true even for shifts based on infinite alphabets. Let
$$(\Omega, \mathscr{F}, \mu) = \prod_{-\infty}^{\infty}(X, \mathscr{A}, P),$$
where $(X, \mathscr{A}, P)$ is a probability space, and let $\sigma: \Omega \to \Omega$ be the shift. It is clear that if $A, B \in \mathscr{F}$ are cylindrical sets (i.e., of the form $\{\omega \in \Omega : \omega_{i_1} \in E_1, \ldots, \omega_{i_r} \in E_r\}$, where $E_1, \ldots, E_r \in \mathscr{A}$), then
$$\mu(\sigma^{-n}A \cap B) \to \mu(A)\mu(B),$$
since if $n$ is large enough the sets of indices determining $\sigma^{-n}A$ and $B$ do not overlap. By Proposition 5.3, $\sigma$ is strongly mixing.

(2) Mixing for Markov shifts. An $N \times N$ stochastic matrix $A$ is called aperiodic in case there do not exist an $i = 1, 2, \ldots, N$ and an integer $m > 1$ such that $a_{ii}^{(k)} > 0$ implies $m$ divides $k$.

Theorem 5.6. Let $(\Omega, \mathscr{F}, \mu, \sigma)$ be the Markov shift determined by an $N \times N$ stochastic matrix $A$ which fixes a positive probability vector $p$ (see Sections 1.2.B and 2.4). The following statements are equivalent:
(1) $(\Omega, \mathscr{F}, \mu, \sigma)$ is strongly mixing;
(2) $\lim_{k\to\infty}a_{ij}^{(k)} = p_j$ for all $i$ and $j$;
(3) $A$ is irreducible and aperiodic.

Proof: (1) $\Rightarrow$ (2):
$$a_{ij}^{(k)} = \Pr\{\omega_k = j \mid \omega_0 = i\} = \frac{\mu(\sigma^{-k}\{\omega_0 = j\} \cap \{\omega_0 = i\})}{\mu\{\omega_0 = i\}} \to \mu\{\omega_0 = j\} = p_j.$$
(2) $\Rightarrow$ (3): This is clear, since all $p_j$ are positive.
(3) $\Rightarrow$ (1): If $A$ is aperiodic and irreducible, then $\sigma^n$ is ergodic for each $n$, and since $(1/n)\sum_{k=0}^{n-1}a_{ij}^{(k)} \to p_j > 0$, we must have $\limsup_{k\to\infty}a_{ij}^{(k)} > 0$ for all $i, j$. Now if, for example,
$$E = \{\omega \in \Omega : \omega_0\omega_1\cdots\omega_r = i_0i_1\cdots i_r\} \quad\text{and}\quad F = \{\omega \in \Omega : \omega_0\omega_1\cdots\omega_s = j_0j_1\cdots j_s\},$$
then for large enough $k$,
$$\mu(E \cap \sigma^{-k}F) = (p_{i_0}a_{i_0i_1}\cdots a_{i_{r-1}i_r})\,a_{i_rj_0}^{(k-r)}\,(a_{j_0j_1}\cdots a_{j_{s-1}j_s}) = \mu(E)\mu(F)\,\frac{a_{i_rj_0}^{(k-r)}}{p_{j_0}},$$
and
$$\limsup_{k\to\infty}\mu(E \cap \sigma^{-k}F) \le c\,\mu(E)\mu(F), \quad\text{where } c = \sup_{i,j}\limsup_{k\to\infty}\frac{a_{ij}^{(k)}}{p_j}.$$
By Ornstein's characterization (see Exercise 2.6.6), $\sigma$ is strongly mixing.
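Condition (2) of Theorem 5.6 is easy to examine numerically. The sketch below uses matrices of our own choosing. It computes $A^k$ by repeated multiplication and measures $\max_{i,j}|a_{ij}^{(k)} - p_j|$. For the irreducible but periodic matrix $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ the distance stays at $1/2$. For the irreducible aperiodic matrix $\begin{pmatrix}1/2&1/2\\1/4&3/4\end{pmatrix}$, whose fixed vector is $p = (1/3, 2/3)$, it decays geometrically.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def distance_to_p(a, p, k):
    """max_{i,j} |a_ij^(k) - p_j| for the k-step transition matrix A^k (k >= 1)."""
    power = a
    for _ in range(k - 1):
        power = mat_mul(power, a)
    return max(abs(power[i][j] - p[j]) for i in range(len(a)) for j in range(len(a)))


periodic = [[0.0, 1.0], [1.0, 0.0]]
aperiodic = [[0.5, 0.5], [0.25, 0.75]]
for k in (1, 5, 10, 11):
    print(k, distance_to_p(periodic, [0.5, 0.5], k), distance_to_p(aperiodic, [1 / 3, 2 / 3], k))
```

For the aperiodic matrix the second eigenvalue is $1/4$. A short computation gives $A^k = \Pi + 4^{-(k-1)}(A - \Pi)$, where $\Pi$ has both rows equal to $p$, so the distance is exactly $(2/3)\cdot 4^{-k}$.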
Friedman and Ornstein (1970) proved that in fact every strongly mixing Markov shift is Bernoulli. This means that there is another measurement to make on the experiment, the records of whose outcomes will be indistinguishable from the histories produced by a certain Bernoulli shift: there is a finite measurable independent partition $\{A_1, A_2, \ldots, A_n\}$ of $\Omega$ which generates $\mathscr{F}$, so that the coding $\phi: \Omega \to \{1, 2, \ldots, n\}^{\mathbb{Z}}$ given by $(\phi\omega)_j = i$ if and only if $\sigma^j\omega \in A_i$ gives an isomorphism of the Markov shift with the Bernoulli scheme on $n$ symbols with weights $\mu(A_1), \ldots, \mu(A_n)$.

(3) Automorphisms of compact abelian groups (Halmos 1943). Let $X$ be a compact abelian group, $\mathscr{B}$ the Borel subsets of $X$, $\mu$ normalized Haar measure on $X$, and $T: X \to X$ a continuous automorphism. Then $T$ is a m.p.t. For, define a measure $\nu$ on $X$ by $\nu(A) = \mu(T^{-1}A)$ for each $A \in \mathscr{B}$; then for all $A \in \mathscr{B}$ and all $x \in X$,
$$\nu(xA) = \mu[T^{-1}(xA)] = \mu[(T^{-1}x)(T^{-1}A)] = \mu(T^{-1}A) = \nu(A),$$
so because of the uniqueness of Haar measure $\nu$ is a constant multiple of $\mu$. But $\nu(X) = \mu(T^{-1}X) = 1$, so $\nu = \mu$. Again we wish to find conditions for $(X, \mathscr{B}, \mu, T)$ to be ergodic. The character group $\hat X$ of $X$ forms a complete orthonormal set in $L^2(X, \mathscr{B}, \mu)$, and the unitary operator $U$ induced on $L^2$ by $T$ maps $\hat X$ into itself.
Theorem 5.7. If $(X, \mathscr{B}, \mu, T)$ is ergodic, then the only finite orbit of $U$ on $\hat X$ is that of the trivial character. Conversely, if $U$ on $\hat X$ has no finite orbits other than that of the trivial character, then $(X, \mathscr{B}, \mu, T)$ is strongly mixing and hence ergodic.

Proof: Suppose $\psi \in \hat X$ is a nontrivial character with a finite orbit under $U$. Let $n$ be the smallest positive integer such that $U^n\psi = \psi$. Then $\psi + U\psi + \cdots + U^{n-1}\psi$ is an invariant function. Since $1, \psi, U\psi, \ldots, U^{n-1}\psi$ are distinct elements of $\hat X$, they are linearly independent, and therefore $\psi + U\psi + \cdots + U^{n-1}\psi$ is not constant. Consequently $(X, \mathscr{B}, \mu, T)$ is not ergodic. Conversely, suppose now that no element of $\hat X$ other than 1 has a finite orbit under $U$. Let $f, g \in \hat X$ and suppose at least one of $f, g$ is not 1. Then $U^nf \neq g$ for all large $n$, and consequently
$$\lim_{n\to\infty}(U^nf, g) = 0 = (f, 1)(1, g).$$
In case $f = g = 1$, the same formula still holds. If $f, g \in L^2$ are arbitrary, the fact that $(U^nf, g) \to (f, 1)(1, g)$ follows from approximation of $f$ and $g$ by linear combinations of elements of $\hat X$ (see Proposition 1.4.2).

We consider now the special case where $X$ is the $n$-torus $\mathbb{K}^n = \mathbb{R}^n/\mathbb{Z}^n$. It is known that a continuous automorphism $T: \mathbb{K}^n \to \mathbb{K}^n$ is given by an $n \times n$ matrix with integer entries and determinant $\pm 1$. We claim that $T$ on $\mathbb{K}^n$ is ergodic if and only if the matrix $A$ representing $T$ has no roots of unity among its eigenvalues.
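The claim can be explored on $\mathbb{K}^2$ with exact integer arithmetic. As in the argument that follows, $U$ sends the character with frequency vector $k$ to the one with frequency vector $A^t k$. The sketch below is our own illustration: it follows the orbit of $k$ under $k \mapsto A^t k$ and reports the orbit length if the orbit closes up within a step budget. For $\begin{pmatrix}2&1\\1&1\end{pmatrix}$, whose eigenvalues $(3 \pm \sqrt5)/2$ are not roots of unity, the orbit of $(1, 0)$ never closes. For $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$, with eigenvalues $\pm i$, orbits have length at most 4.

```python
def transpose_apply(a, k):
    """Return A^t k for an integer matrix A (list of rows) and integer vector k."""
    n = len(a)
    return tuple(sum(a[i][j] * k[i] for i in range(n)) for j in range(n))


def orbit_length(a, k, max_steps=1000):
    """Length of the orbit of frequency vector k under k -> A^t k, or None if it does
    not return to k within max_steps (for an automorphism, a finite orbit is a cycle)."""
    start = tuple(k)
    current = start
    for step in range(1, max_steps + 1):
        current = transpose_apply(a, current)
        if current == start:
            return step
    return None


cat = [[2, 1], [1, 1]]       # eigenvalues (3 +- sqrt 5)/2: no roots of unity
quarter = [[0, -1], [1, 0]]  # eigenvalues +-i: fourth roots of unity
print(orbit_length(cat, (1, 0)))      # None (the entries grow like Fibonacci numbers)
print(orbit_length(quarter, (1, 0)))  # 4
print(orbit_length(quarter, (0, 0)))  # 1, the trivial character
```

A `None` result only shows that the orbit did not close within the budget. It is the eigenvalue criterion, not the search, that proves the orbit is infinite.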
According to this Theorem, $T$ on $\mathbb{K}^n$ is ergodic if and only if $U$ on $\widehat{\mathbb{K}^n} = \mathbb{Z}^n$ has no finite orbits other than that of the trivial character. If $f \in \widehat{\mathbb{K}^n}$ is nontrivial, there is a nonzero $k \in \mathbb{Z}^n$ such that $f(x) = e^{2\pi i k\cdot x}$ for all $x \in \mathbb{K}^n$. Then
$$U^pf(x) = e^{2\pi i k\cdot A^px} = e^{2\pi i((A^p)^tk)\cdot x},$$
so if $U^pf = f$, then $(A^p)^tk = k$ and $A^p$ has 1 among its eigenvalues (a matrix and its transpose have the same eigenvalues). Then we have $A^pv = v$ for some nonzero $v \in \mathbb{C}^n$, and hence
$$0 = (A^p - I)v = (A - \xi_1I)\big[(A - \xi_2I)\cdots(A - \xi_pI)v\big],$$
where $\xi_1, \ldots, \xi_p$ are the $p$th roots of unity. Thus either $(A - \xi_2I)\cdots(A - \xi_pI)v$ is an eigenvector of $A$ with eigenvalue $\xi_1$, or else it is zero. It is clear that we may continue this process to find that at least one $\xi_j$ is an eigenvalue of $A$. Thus if $U$ on $\widehat{\mathbb{K}^n}$ has a nontrivial finite orbit, then $A$ has a root of unity among its eigenvalues. Conversely, suppose $A$ has a root of unity $\xi$ among its eigenvalues, with $\xi^p = 1$. Then 1 is an eigenvalue of the integer matrix $(A^p)^t$, so $(A^p)^t - I$ has a nonzero rational null vector, and clearing denominators gives a nonzero integer one: there will be a nonzero $k \in \mathbb{Z}^n$ and $p \in \mathbb{Z}^+$ such that $(A^p)^tk = k$. It is clear that then $f \in \widehat{\mathbb{K}^n}$ defined by $f(x) = e^{2\pi i k\cdot x}$ will have a finite orbit under $U$.

Katznelson (1971) proved that any ergodic automorphism of $\mathbb{K}^n$ is isomorphic to a Bernoulli scheme. This was extended to compact abelian groups by Lind (1977) and Miles and Thomas (1978), and to certain ergodic automorphisms of nilmanifolds by Dani (1976). There are some non-Bernoulli strongly mixing transformations, for example those with entropy 0 (see Chapter 5). We will mention now two other properties that lie between strong mixing and being Bernoulli. Let $N$ be a finite or infinite cardinal number. We say that a m.p.t. $T$ has Lebesgue spectrum of multiplicity $N$ in case there is a set $\Lambda$ of cardinality $N$ and a set of functions
$$\{f_{\lambda,j} : \lambda \in \Lambda,\ j \in \mathbb{Z}\}$$
which, together with 1, form an orthonormal basis for $L^2(X, \mathscr{B}, \mu)$, and such that
$$U_Tf_{\lambda,j} = f_{\lambda,j+1} \quad\text{for all } \lambda \in \Lambda,\ j \in \mathbb{Z}.$$
It can be proved that T has Lebesgue spectrum if and only if its maximal spectral type is absolutely continuous with respect to Lebesgue measure on the unit circle (hence the name).
Example 5.8. Bernoulli schemes have countable Lebesgue spectrum. The idea will be clear from considering $\mathscr{B}(\tfrac12, \tfrac12)$, the shift on $\Omega = \{0, 1\}^{\mathbb{Z}}$, where 0 and 1 each have weight $\tfrac12$. Let $r_n(\omega) = (-1)^{\omega_n}$. The finite products of distinct $r_n$, together with the constant function 1, form an orthonormal basis for $L^2$. Partition this basis into equivalence classes by saying that $f \sim g$ if $f\circ\sigma^n = g$ for some $n \in \mathbb{Z}$. There will be countably many equivalence classes, so $\sigma$ has countable Lebesgue spectrum.
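The orthonormality used in Example 5.8 can be checked by brute force on finitely many coordinates. In the sketch below, our own illustration, a finite set $F$ of coordinates determines $r_F = \prod_{n\in F} r_n$. Averaging over all $2^m$ equally likely blocks $\omega_0\cdots\omega_{m-1}$ computes $\int r_F\,r_G\,d\mu$ exactly when $F, G \subset \{0, \ldots, m-1\}$. The sketch also checks that composing with the shift turns $r_F$ into $r_{F+1}$. The window size $m = 6$ is arbitrary.

```python
from itertools import combinations, product


def walsh(F, omega):
    """r_F(omega) = prod_{n in F} (-1)**omega[n]; the empty product is the constant 1."""
    value = 1
    for n in F:
        value *= (-1) ** omega[n]
    return value


def inner_product(F, G, m):
    """Exact integral of r_F * r_G under the (1/2, 1/2) product measure, F, G within range(m)."""
    blocks = list(product((0, 1), repeat=m))
    return sum(walsh(F, w) * walsh(G, w) for w in blocks) / len(blocks)


m = 6
subsets = [frozenset(c) for size in range(3) for c in combinations(range(m), size)]
# Orthonormality: <r_F, r_G> is 1 if F == G and 0 otherwise.
assert all(inner_product(F, G, m) == (1.0 if F == G else 0.0) for F in subsets for G in subsets)

# Shift action: r_F(sigma(omega)) = r_{F+1}(omega), where (sigma omega)_n = omega_{n+1}.
F = frozenset({0, 2})
for w in product((0, 1), repeat=m):
    shifted = w[1:]  # coordinates of sigma(omega) that stay inside the window
    assert walsh(F, shifted) == walsh(frozenset(n + 1 for n in F), w)
print("checked", len(subsets), "functions")
```

The identity $r_Fr_G = r_{F\triangle G}$ explains the result: the integral is 1 exactly when the symmetric difference is empty.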
Proposition 5.9. A m.p.t. which has Lebesgue spectrum is strongly mixing.

Proof: The relation
$$(U^nf, g) \to (f, 1)(1, g) \tag{1}$$
clearly holds if either $f$ or $g$ is 1. For any other elements of the distinguished orthonormal basis, clearly
$$(f_{\lambda,j+n}, f_{\lambda',j'}) = 0$$
for all large enough $n$. Thus (1) holds for all basis vectors, and hence for all $f, g \in L^2$.

We say that a m.p.t. $T$ on $(X, \mathscr{B}, \mu)$ is a K-automorphism (for Kolmogorov) if there is a sub-$\sigma$-algebra $\mathscr{A} \subset \mathscr{B}$ such that
(i) $T^{-1}\mathscr{A} \subset \mathscr{A}$,
(ii) $\bigcup_{n=-\infty}^{\infty}T^n\mathscr{A}$ generates $\mathscr{B}$,
(iii) $\bigcap_{n=-\infty}^{0}T^n\mathscr{A}$ is trivial.

Proposition 5.10. Bernoulli shifts are K-automorphisms.

Proof: If $\mathscr{B}(p_1, \ldots, p_N)$ is the shift on the measure space $(\Omega, \mathscr{F}, \mu)$, where $\Omega = \{1, 2, \ldots, N\}^{\mathbb{Z}}$ and $\mu$ is the product measure determined by the weights $p_1, \ldots, p_N$, let $\mathscr{A}$ be the $\sigma$-algebra generated by all cylinder sets of the form $\{\omega \in \Omega : \omega_i = j\}$, where $j = 1, 2, \ldots, N$ and $i > 0$. Then clearly $\sigma^{-1}\mathscr{A} \subset \mathscr{A}$ and $\bigcup_{n=-\infty}^{\infty}\sigma^n\mathscr{A}$ generates $\mathscr{F}$. Now if $A \in \sigma^{-n}\mathscr{A}$, then $A$ is in the $\sigma$-algebra generated by cylinder sets determined by coordinates greater than $n$, so if $B$ is in the $\sigma$-algebra generated by cylinder sets determined by coordinates at most $n$, we will have $\mu(A \cap B) = \mu(A)\mu(B)$. Hence if $A \in \bigcap_{n=0}^{\infty}\sigma^{-n}\mathscr{A}$, $A$ will be independent of every cylinder set $B$, hence of every set in $\mathscr{F}$, in particular of itself. This implies that $\mu(A) = \mu(A)^2$, and (iii) follows.
Proposition 5.11. Every K-automorphism has countable Lebesgue spectrum.

Proof: Let $\mathscr{A}$ be the distinguished sub-$\sigma$-algebra of $\mathscr{B}$ and let $M$ denote the closed linear span in $L^2$ of the characteristic functions of all the sets in $\mathscr{A}$. Then we will have
$$\text{constants} = \bigcap_{n=0}^{\infty}U^nM \subset \cdots \subset U^2M \subset UM \subset M \subset U^{-1}M \subset \cdots \subset \overline{\bigcup_{n=-\infty}^{\infty}U^nM} = L^2.$$
Choose an orthonormal basis $\{e_i\}$ for the orthogonal complement $M \ominus UM$ of $UM$ in $M$. For each $i$, let
$$M_i = \text{c.l.s.}\{U^ne_i : n \in \mathbb{Z}\}.$$
Then $UM_i = M_i$ for all $i$, and
$$\bigoplus_i M_i = (\text{constants})^\perp.$$
If we let $f_{ij} = U^je_i$, we will obtain the functions required by the definition of Lebesgue spectrum. Exercise 1 shows that $\dim(M \ominus UM) = \infty$, so the transformation has countable Lebesgue spectrum.

It can be proved that $T$ is a K-automorphism if and only if every nontrivial factor of $T$ has positive entropy (see Chapter 5 for the definition of entropy). Ornstein constructed K-automorphisms which are not Bernoulli (i.e., not isomorphic to any Bernoulli shift). On the other hand, there are m.p.t.s with Lebesgue spectrum and zero entropy, for example certain Gaussian systems (Girsanov 1958) and horocycle flows (Parasjuk 1953 and Gurevich 1961). (See also Rokhlin 1960 and Newton and Parry 1966.)

Finally, we cannot end this section without mentioning Rokhlin's old, still-unsolved problem involving higher degrees of mixing. A m.p.t. $T$ is said to be $n$-mixing if for any choice of measurable sets $A_1, A_2, \ldots, A_n$,
$$\lim_{\min_{i\neq j}|m_i - m_j|\to\infty}\mu(T^{-m_1}A_1 \cap \cdots \cap T^{-m_n}A_n) = \mu(A_1)\cdots\mu(A_n).$$
Thus 2-mixing coincides with strong mixing. It is still not known whether there are any 2-mixing transformations that are not 3-mixing.

Exercises
1. Complete the proof of Proposition 2.5.11 by showing that $\dim(M \ominus UM) = \infty$. (Hint: Choose a nonzero $f \in M \ominus UM$. Note that $\mathscr{A}$ must be nonatomic, choose disjoint $\mathscr{A}$-measurable subsets $E_1, E_2, \ldots$ of the support of $f$, and consider the functions $f\chi_{E_i}$.)
2. Prove Proposition 2.5.2.
3. Prove Proposition 2.5.3.
4. Prove that a weakly mixing Markov shift is strongly mixing.
5. Prove that $T$ is weakly mixing if and only if
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|\mu(T^{-k}A \cap A) - \mu(A)^2| = 0$$
for each $A \in \mathscr{B}$.
6. Prove that Bernoulli shifts, mixing Markov shifts, and ergodic automorphisms of compact abelian groups are $n$-mixing for all $n$.
7. (Ambrose, Halmos, Kakutani; see Halmos 1949a.) Show that there is no concept of 'uniform mixing' for m.p.t.s: If
$$\mu(T^{-n}A \cap B) \to \mu(A)\mu(B)$$
uniformly for all $A, B \in \mathscr{B}$ with $A \subset B$, then every set in $\mathscr{B}$ has measure 0 or 1 and so $(X, \mathscr{B}, \mu)$ is isomorphic with the space consisting of a single point. (Hint: It is easy if we remove the phrase 'with $A \subset B$' above. But the weaker hypothesis implies the stronger: make
$$|\mu(T^{-n}(A \cap B) \cap B) - \mu(A \cap B)\mu(B)| \quad\text{and}\quad |\mu(T^{-n}(A \cap B^c) \cap B^c) - \mu(A \cap B^c)\mu(B^c)|$$
small simultaneously.)
2.6. Weak mixing

Recall that a m.p.t. $T: X \to X$ on a probability space $(X, \mathscr{B}, \mu)$ is said to be weakly mixing in case for every pair of measurable sets $A$ and $B$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|\mu(T^{-k}A \cap B) - \mu(A)\mu(B)| = 0.$$
That the concept of weak mixing is natural and important can be seen from the following theorem, according to which a transformation is weakly mixing if and only if its only measurable eigenfunctions are the constants. Thus weakly mixing transformations lie at the other extreme from those with discrete spectrum, i.e., the group rotations. Most of this theorem seems to go back to Koopman and von Neumann (1932).
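As a concrete contrast, an irrational rotation is ergodic (example (2) of Section 2.4) but has the nonconstant eigenfunction $e^{2\pi i x}$, so it should fail to be weakly mixing. Take $A = [0, 1/2)$ and $Tx = x + \alpha \pmod 1$, and write $\|t\|$ for the distance from $t$ to the nearest integer. Two arcs of length $1/2$ offset by $k\alpha$ overlap in length $\mu(T^{-k}A \cap A) = 1/2 - \|k\alpha\|$. The sketch below is our own illustration, with $\alpha = \sqrt2 - 1$. It evaluates the weak-mixing averages $\frac1n\sum_{k<n}|\mu(T^{-k}A \cap A) - \mu(A)^2|$. Equidistribution of $\|k\alpha\|$ on $[0, 1/2]$ suggests the limit $\int_0^{1/2}2\,|1/4 - t|\,dt = 1/8$ rather than 0.

```python
import math


def dist_to_int(t):
    """||t||: distance from t to the nearest integer."""
    return abs(t - round(t))


def weak_mixing_average(alpha, n):
    """(1/n) sum_{k<n} |mu(T^-k A cap A) - mu(A)^2| for A = [0, 1/2) and Tx = x + alpha mod 1."""
    total = 0.0
    for k in range(n):
        overlap = 0.5 - dist_to_int(k * alpha)  # length of (A - k*alpha) cap A on the circle
        total += abs(overlap - 0.25)
    return total / n


alpha = math.sqrt(2) - 1
for n in (100, 1000, 10000):
    print(n, weak_mixing_average(alpha, n))  # stays near 1/8 instead of tending to 0
```

A single pair of sets with a nonzero limit is enough to rule out weak mixing. This is consistent with the eigenfunction criterion (7) of the theorem below.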
Theorem 6.1. Let $T: X \to X$ be a m.p.t. on a probability space $(X, \mathscr{B}, \mu)$. Then the following are equivalent:
(1) $T$ is weakly mixing.
(2) $\displaystyle\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^kf, g) - (f, 1)(1, g)| = 0$ for all $f, g \in L^2(X, \mathscr{B}, \mu)$.
(3) Given $A, B \in \mathscr{B}$, there is a set $J \subset \mathbb{Z}^+$ of density 0 such that
$$\lim_{n\to\infty,\, n\notin J}\mu(T^{-n}A \cap B) = \mu(A)\mu(B).$$
(4) $T \times T$ is weakly mixing.
(5) $T \times S$ is ergodic on $X \times Y$ for each ergodic $(Y, \mathscr{C}, \nu, S)$.
(6) $T \times T$ is ergodic.
(7) $T$ has no measurable eigenfunctions other than the constants.
We will prove this Theorem presently. The implications (1) $\Leftrightarrow$ (2) and (5) $\Rightarrow$ (6) are clear. The argument for (1) $\Leftrightarrow$ (3) consists of a real-variables analysis of the type of convergence of sequences involved in the definition of weak mixing and is based on a lemma of Koopman and von Neumann. The proof that (3) $\Rightarrow$ (4) turns just on the observation that the union of two sets of density 0 has density 0. The arguments for (4) $\Rightarrow$ (5), (6) $\Rightarrow$ (7), and (6) $\Rightarrow$ (3) (and hence (6) $\Rightarrow$ (1)) will be found to be straightforward and easy. The major difficulty of the proof lies in the implication (7) $\Rightarrow$ (6). We will first give the usual proof, which makes use of the Spectral Theorem. This will be followed in Section 4.1 by a second, more constructive proof, which does not need the Spectral Theorem. Notice that (7) implies that $T$ is ergodic; a characterization of weak mixing in terms of eigenvalues, however, would have to mention ergodicity explicitly: $T$ is weakly mixing if and only if $T$ is ergodic and has no eigenvalues other than 1.

Let us proceed towards the proof of the Theorem. Recall that a subset $E \subset \mathbb{N}$ of the positive integers is said to have density zero in case
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\chi_E(k) = 0.$$
Lemma 6.2 (Koopman-von Neumann 1932). Let $f: \mathbb{N} \to \mathbb{R}$ be nonnegative and bounded. A necessary and sufficient condition that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}f(k) = 0$$
is that there exist a subset $E \subset \mathbb{N}$ of density zero such that
$$\lim_{n\to\infty,\, n\notin E}f(n) = 0.$$
Proof: Suppose first that there is such a set $E$. We may write
$$\frac{1}{n}\sum_{k=0}^{n-1}f(k) = \frac{1}{n}\sum_{k\le n-1,\, k\in E}f(k) + \frac{1}{n}\sum_{k\le n-1,\, k\notin E}f(k).$$
The first term is arbitrarily small for large $n$, since $E$ has density 0 and $f$ is bounded. The second term is arbitrarily small for large $n$, since $f(k)\chi_{E^c}(k) \to 0$ and ordinary convergence implies Cesàro convergence. Conversely, suppose that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}f(k) = 0.$$
For each $m = 1, 2, \ldots$ let
$$E_m = \Big\{k \in \mathbb{N} : f(k) > \frac{1}{m}\Big\}.$$
Then $E_1 \subset E_2 \subset \cdots$, and each $E_m$ has density zero, since $(1/m)\chi_{E_m} \le f$. Therefore, for each $m = 1, 2, \ldots$ we may choose $i_m > 0$ such that $i_1 < i_2 < \cdots$ and
$$\frac{1}{n}\sum_{k=0}^{n-1}\chi_{E_m}(k) < \frac{1}{m} \quad\text{for } n \ge i_{m-1} \quad (m = 2, 3, \ldots).$$
Now let $i_0 = 0$ and define
$$E = \bigcup_{m=1}^{\infty}E_m \cap (i_{m-1}, i_m];$$
we will show that $E$ has density zero and
$$\lim_{n\to\infty,\, n\notin E}f(n) = 0.$$
Let us consider $f(k)$ for $k \notin E$. In the interval $(0, i_1]$ we have removed those $k$s for which $f(k) > 1$; in $(i_1, i_2]$ we have removed those $k$s for which $f(k) > 1/2$; and in $(i_{m-1}, i_m]$ we have removed those $k$s for which $f(k) > 1/m$. Thus clearly
$$\lim_{n\to\infty,\, n\notin E}f(n) = 0.$$
However, we have only removed a set of $k$s of density zero. For if $i_{m-1} < n \le i_m$, then
$$\frac{1}{n}\sum_{k=0}^{n-1}\chi_E(k) = \frac{1}{n}\sum_{k=0}^{i_{m-1}}\chi_E(k) + \frac{1}{n}\sum_{k=i_{m-1}+1}^{n-1}\chi_E(k) \le \frac{1}{i_{m-1}+1}\sum_{k=0}^{i_{m-1}}\chi_{E_{m-1}}(k) + \frac{1}{n}\sum_{k=0}^{n-1}\chi_{E_m}(k) < \frac{1}{m-1} + \frac{1}{m}.$$
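Here is a toy instance of Lemma 6.2, with a sequence of our own choosing: let $f(k) = 1$ when $k$ is a perfect square and $f(k) = 1/(k+1)$ otherwise. Then $f$ is bounded and nonnegative but does not tend to 0. Its Cesàro means do tend to 0, and removing the density-zero set $E$ of squares leaves a sequence tending to 0. The sketch below checks these facts numerically up to $n = 10^5$.

```python
import math


def is_square(k):
    """True when k is a perfect square (0 counts as 0**2)."""
    r = math.isqrt(k)
    return r * r == k


def f(k):
    """Bounded, nonnegative, equal to 1 on the squares and 1/(k+1) elsewhere."""
    return 1.0 if is_square(k) else 1.0 / (k + 1)


n = 10 ** 5
cesaro_mean = sum(f(k) for k in range(n)) / n
square_density = sum(1 for k in range(n) if is_square(k)) / n
tail_max_off_E = max(f(k) for k in range(1000, n) if not is_square(k))
print(cesaro_mean)     # about 0.0033: the Cesaro means are small
print(square_density)  # 0.00317: E = {squares} is sparse
print(tail_max_off_E)  # 1/1001: f is small off E beyond k = 1000
```

In the lemma's construction, $E$ is assembled from the sets $E_m$ rather than guessed. Here the squares play that role directly.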
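Proof of the Theorem. (1) $\Rightarrow$ (2): This is clearly true when $f$ and $g$ are characteristic functions of measurable sets. The general statement follows by forming linear combinations and approximating. (2) $\Rightarrow$ (1): Trivial. (1) $\Leftrightarrow$ (3): This follows directly from the Lemma, taking $f(n) = |\mu(T^{-n}A \cap B) - \mu(A)\mu(B)|$. (3) $\Rightarrow$ (4): Let $A, B, C, D \in \mathscr{B}$. By (3), we may choose sets $J_1, J_2 \subset \mathbb{N}$, each of density zero, such that
$$\lim_{n\to\infty,\, n\notin J_1}|\mu(T^{-n}A \cap C) - \mu(A)\mu(C)| = 0, \qquad \lim_{n\to\infty,\, n\notin J_2}|\mu(T^{-n}B \cap D) - \mu(B)\mu(D)| = 0.$$
Let $J = J_1 \cup J_2$. Then $J$ has density zero, and
$$\begin{aligned}
&\limsup_{n\to\infty,\, n\notin J}|\mu\times\mu((T\times T)^{-n}(A\times B) \cap (C\times D)) - \mu\times\mu(A\times B)\,\mu\times\mu(C\times D)| \\
&\quad= \limsup_{n\to\infty,\, n\notin J}|\mu(T^{-n}A \cap C)\,\mu(T^{-n}B \cap D) - \mu(A)\mu(B)\mu(C)\mu(D)| \\
&\quad\le \limsup_{n\to\infty,\, n\notin J}\big[\mu(T^{-n}A \cap C)\,|\mu(T^{-n}B \cap D) - \mu(B)\mu(D)| + \mu(B)\mu(D)\,|\mu(T^{-n}A \cap C) - \mu(A)\mu(C)|\big] = 0.
\end{aligned}$$
The result then follows from the known equivalence (3) $\Leftrightarrow$ (1) applied to $T \times T$. (4) $\Rightarrow$ (5): If $T \times T$ is weakly mixing, then so is $T$ itself. Let an ergodic $(Y, \mathscr{C}, \nu, S)$ be given. In order to show that $T \times S$ is ergodic on $X \times Y$, it is enough to show that if $A, B \in \mathscr{B}$ and $C, D \in \mathscr{C}$, then
$$\frac{1}{n}\sum_{k=0}^{n-1}\mu\times\nu[(T\times S)^{-k}(A\times C) \cap (B\times D)] \to \mu(A)\mu(B)\nu(C)\nu(D).$$
However, the left-hand side is
$$\frac{1}{n}\sum_{k=0}^{n-1}\mu(T^{-k}A \cap B)\,\nu(S^{-k}C \cap D) = \frac{1}{n}\sum_{k=0}^{n-1}\big\{\mu(A)\mu(B)\,\nu(S^{-k}C \cap D) + [\mu(T^{-k}A \cap B) - \mu(A)\mu(B)]\,\nu(S^{-k}C \cap D)\big\}.$$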
By ergodicity of $S$, the first term tends to $\mu(A)\mu(B)\nu(C)\nu(D)$. The second term is dominated by
$$\frac{1}{n}\sum_{k=0}^{n-1}|\mu(T^{-k}A \cap B) - \mu(A)\mu(B)|,$$
which tends to 0 as $n \to \infty$ because $T$ is weakly mixing. (5) $\Rightarrow$ (6): $T$ must itself be ergodic if it satisfies (5), since $T \times \{1\}$ ($\{1\}$ = the identity transformation on a single point) is ergodic. Therefore, under (5), $T \times T$ is ergodic. (6) $\Rightarrow$ (3): If $A, B \in \mathscr{B}$, then
$$\begin{aligned}
\frac{1}{n}\sum_{k=0}^{n-1}[\mu(T^{-k}A \cap B) - \mu(A)\mu(B)]^2
&= \frac{1}{n}\sum_{k=0}^{n-1}\mu(T^{-k}A \cap B)^2 - 2\mu(A)\mu(B)\,\frac{1}{n}\sum_{k=0}^{n-1}\mu(T^{-k}A \cap B) + [\mu(A)\mu(B)]^2 \\
&= \frac{1}{n}\sum_{k=0}^{n-1}\mu\times\mu[(T\times T)^{-k}(A\times A) \cap (B\times B)] \\
&\quad - 2\mu(A)\mu(B)\,\frac{1}{n}\sum_{k=0}^{n-1}\mu\times\mu[(T\times T)^{-k}(A\times X) \cap (B\times X)] + [\mu(A)\mu(B)]^2.
\end{aligned}$$
Since $T \times T$ is ergodic, as $n \to \infty$ this tends to
$$\begin{aligned}
&\mu\times\mu(A\times A)\,\mu\times\mu(B\times B) - 2\mu(A)\mu(B)\,\mu\times\mu(A\times X)\,\mu\times\mu(B\times X) + [\mu(A)\mu(B)]^2 \\
&\quad= \mu(A)^2\mu(B)^2 - 2\mu(A)^2\mu(B)^2 + \mu(A)^2\mu(B)^2 = 0.
\end{aligned}$$
By the Lemma, there is a set $J$ of density zero such that
$$\lim_{n\to\infty,\, n\notin J}|\mu(T^{-n}A \cap B) - \mu(A)\mu(B)|^2 = 0.$$
Then (3) is immediate. (6) $\Rightarrow$ (7): Suppose $f \in L^2(X)$ is an eigenfunction, so $U_Tf = \lambda f$ for some $\lambda \in \mathbb{C}$ with $|\lambda| = 1$. For $(x, y) \in X \times X$, let $g(x, y) = f(x)\overline{f(y)}$. Then
$$g(Tx, Ty) = f(Tx)\overline{f(Ty)} = \lambda\bar\lambda\,g(x, y) = g(x, y) \quad\text{a.e.},$$
and thus $g$ is an invariant function. Assuming that $T \times T$ is ergodic, $g$, and hence $f$, must be constant a.e. (7) $\Rightarrow$ (1): We use the Spectral Theorem. Let $V$ denote the closed linear span in $L^2(X)$ of the eigenfunctions of $U_T$. We will show first that if $f \in V^\perp$ and $g \in L^2(X)$, then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^kf, g)| = 0.$$
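Fix $f \in V^\perp$ and $g \in L^2(X)$, and let $\rho$ be the complex measure on the spectrum $\sigma(U_T)$ of $U_T$ defined by $\rho(\Delta) = (E(\Delta)f, g)$ for all Borel sets $\Delta \subset \sigma(U_T)$ (see p. 19). Note that if $\lambda_0 \in \sigma(U_T)$, then
$$U_TE\{\lambda_0\}f = \int_{\sigma(U_T)}\lambda\,dE(\lambda)\,E\{\lambda_0\}f = \int_{\{\lambda_0\}}\lambda\,dE(\lambda)f = \lambda_0E\{\lambda_0\}f,$$
so $E\{\lambda_0\}f$ is an eigenfunction of $U_T$ (or zero), and in particular lies in $V$. But since $f \in V^\perp$,
$$0 = (E\{\lambda_0\}f, f) = (E\{\lambda_0\}^2f, f) = (E\{\lambda_0\}f, E\{\lambda_0\}f),$$
so $E\{\lambda_0\}f = 0$. Consequently $\rho\{\lambda_0\} = 0$ for each $\lambda_0 \in \sigma(U_T)$. Now
$$\begin{aligned}
\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^kf, g)|^2 &= \frac{1}{n}\sum_{k=0}^{n-1}\Big|\int_{\sigma(U_T)}\lambda^k\,d\rho(\lambda)\Big|^2 = \frac{1}{n}\sum_{k=0}^{n-1}\int_{\sigma(U_T)}\lambda^k\,d\rho(\lambda)\int_{\sigma(U_T)}\bar\zeta^k\,d\bar\rho(\zeta) \\
&= \int_{\sigma(U_T)\times\sigma(U_T)}\frac{1}{n}\sum_{k=0}^{n-1}(\lambda\bar\zeta)^k\,d\rho(\lambda)\,d\bar\rho(\zeta).
\end{aligned}$$
(This is permissible since the integrand is bounded by 1 in absolute value.) Now for $\lambda \neq \zeta$,
$$\frac{1}{n}\sum_{k=0}^{n-1}(\lambda\bar\zeta)^k = \frac{1}{n}\cdot\frac{1 - (\lambda\bar\zeta)^n}{1 - \lambda\bar\zeta} \to 0,$$
since $1 - \lambda\bar\zeta = 0$ if and only if $\lambda = \zeta$. But the diagonal $D$ of $\sigma(U_T) \times \sigma(U_T)$ has measure zero by Tonelli's Theorem, since $\rho$ assigns measure zero to individual points:
$$\int_{\sigma(U_T)\times\sigma(U_T)}\chi_D(x, y)\,d(|\rho|\times|\rho|)(x, y) = \int_{\sigma(U_T)}|\rho|\{y\}\,d|\rho|(y) = 0.$$
Thus $(1/n)\sum_{k=0}^{n-1}(\lambda\bar\zeta)^k \to 0$ a.e., so by the Bounded Convergence Theorem
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^kf, g)|^2 = 0.$$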
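From Lemma 6.2 (applied first to $|(U_T^kf, g)|^2$ and then to $|(U_T^kf, g)|$),
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^kf, g)| = 0 \quad\text{for } f \in V^\perp \text{ and } g \in L^2(X).$$
Now let $f, g \in L^2(X)$ be arbitrary. Then, assuming that the only eigenfunctions of $T$ are the constants, $f - (f, 1) \in V^\perp$. Therefore,
$$0 = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^k(f - (f, 1)), g)| = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|(U_T^kf, g) - (f, 1)(1, g)|,$$
so $T$ is weakly mixing.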
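Remark 6.3. In statement (3) of Theorem 2.6.1, it is in fact possible to select a single set $J\subset\mathbb{Z}^+$ of density 0 such that
$$\lim_{\substack{n\to\infty\\ n\notin J}}|\mu(T^nA\cap B) - \mu(A)\mu(B)| = 0\quad\text{for all } A, B\in\mathscr{B}.$$
This has been noted by S. Kakutani and by L. K. Jones (1971). We select a set $J$ (as in (3)) corresponding to each pair $A_i, A_j$ in a countable dense set in $\mathscr{B}$. Renumber these as $J_1, J_2, \dots$. Let $\{\varepsilon_r\}$ be a sequence decreasing to 0. Choose $n_1$ so that
$$\frac1n\sum_{k=0}^{n-1}\chi_{J_1}(k) < \varepsilon_1\quad\text{if } n\ge n_1.$$
Choose $n_2 > n_1$ so that
$$\frac1n\sum_{k=0}^{n-1}\chi_{J_1}(k) < \frac{\varepsilon_2}{2}\quad\text{and}\quad\frac1n\sum_{k=0}^{n-1}\chi_{J_2}(k) < \frac{\varepsilon_2}{2}\quad\text{if } n\ge n_2.$$
Continue in this manner, choosing $n_r > n_{r-1}$ so that
$$\frac1n\sum_{k=0}^{n-1}\chi_{J_i}(k) < \frac{\varepsilon_r}{r}\quad\text{for } i = 1,\dots,r \text{ if } n\ge n_r.$$
Define
$$J = \bigcup_{r=1}^\infty[J_r\cap(n_r,\infty)].$$
Then $J$ has density 0, since if $n_r\le n<n_{r+1}$, then $J\cap[0,n)\subset J_1\cup\dots\cup J_r$, and we have
$$\frac1n\sum_{k=0}^{n-1}\chi_J(k)\le\frac1n\sum_{k=0}^{n-1}\sum_{i=1}^r\chi_{J_i}(k) < \varepsilon_r,$$
which tends to 0 as $r\to\infty$. However, since $J$ contains a tail of each $J_r$, we have
$$\lim_{\substack{n\to\infty\\ n\notin J}}a_n = \lim_{\substack{n\to\infty\\ n\notin J_r}}a_n\quad\text{for each } r$$
whenever the limit on the right exists, and hence
$$\lim_{\substack{n\to\infty\\ n\notin J}}|\mu(T^nA_i\cap A_j) - \mu(A_i)\mu(A_j)| = 0$$
for each pair $A_i, A_j$ in the countable dense set in $\mathscr{B}$. Then given $A, B\in\mathscr{B}$ and $\varepsilon>0$, if we choose $A_i, A_j$ with $\mu(A_i\,\triangle\,A)<\varepsilon$ and $\mu(A_j\,\triangle\,B)<\varepsilon$, we will have
$$\limsup_{\substack{n\to\infty\\ n\notin J}}|\mu(T^nA\cap B) - \mu(A)\mu(B)|\le\limsup_{\substack{n\to\infty\\ n\notin J}}\{|\mu(T^nA\cap B) - \mu(T^nA_i\cap A_j)| + |\mu(T^nA_i\cap A_j) - \mu(A_i)\mu(A_j)| + |\mu(A_i)\mu(A_j) - \mu(A)\mu(B)|\} < 4\varepsilon$$
(e.g., the third term is no greater than $|\mu(A_i)\mu(A_j) - \mu(A_i)\mu(B)| + |\mu(A_i)\mu(B) - \mu(A)\mu(B)| < 2\varepsilon$). Since $\varepsilon>0$ was arbitrary,
$$\lim_{\substack{n\to\infty\\ n\notin J}}|\mu(T^nA\cap B) - \mu(A)\mu(B)| = 0\quad\text{for all } A, B\in\mathscr{B}.$$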
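The diagonal choice of the thresholds $n_r$ can be imitated numerically. The Python sketch below is only an illustration and rests on assumptions that are not in the text: it uses the hypothetical density-zero sets $J_i = \{i + m^2 : m\ge 0\}$ in place of the sets coming from (3), whose counting function on $[0,n)$ is at most $\sqrt n + 1$ (a bound that decreases after dividing by $n$, so testing it at $n_r$ covers all $n\ge n_r$), and the hypothetical choice $\varepsilon_r = 1/r$. It then checks on a finite range that the union $J$ has density below $\varepsilon_r$ whenever $n_r\le n<n_{r+1}$.

```python
import math

def density_bound(n):
    """Upper bound for (1/n) * #(J_i ∩ [0, n)) when J_i = {i + m*m : m >= 0}.

    At most sqrt(n) + 1 perfect squares lie in [0, n); the bound decreases in n,
    so checking it at n_r covers every n >= n_r.
    """
    return (math.sqrt(n) + 1) / n

def eps(r):
    return 1.0 / r  # hypothetical choice of eps_r decreasing to 0

def choose_thresholds(r_max):
    """Choose n_1 < n_2 < ... < n_{r_max} with density_bound(n_r) < eps(r) / r."""
    thresholds, n = [], 1
    for r in range(1, r_max + 1):
        if thresholds:
            n = thresholds[-1] + 1  # enforce n_r > n_{r-1}
        while density_bound(n) >= eps(r) / r:
            n += 1
        thresholds.append(n)
    return thresholds

def union_below(thresholds, limit):
    """J ∩ [0, limit) for J = union over r of [J_r ∩ (n_r, ∞)], J_r = {r + m*m}.

    Sets J_r with r > len(thresholds) only contain integers > n_r > thresholds[-1],
    so they can be ignored when limit = thresholds[-1].
    """
    J = set()
    for r, n_r in enumerate(thresholds, start=1):
        m = 0
        while r + m * m < limit:
            if r + m * m > n_r:
                J.add(r + m * m)
            m += 1
    return J

thresholds = choose_thresholds(6)
J = union_below(thresholds, thresholds[-1])

def worst_density(r):
    """Largest proportion (1/n) * #(J ∩ [0, n)) over n_r <= n < n_{r+1}."""
    lo, hi = thresholds[r - 1], thresholds[r]
    return max(sum(1 for k in J if k < n) / n for n in range(lo, hi))

print(thresholds)
print([(r, worst_density(r), eps(r)) for r in range(1, len(thresholds))])
```

The monotone bound is what makes the construction work: the condition "for all $n\ge n_r$" in the text cannot be checked term by term, but it follows from a single comparison once the density bound is known to decrease.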
It is not easy to give examples of m.p.t.s which are weakly mixing but not strongly mixing; see Section 4.5. But there is a sense in which almost every m.p.t. is weakly mixing but not strongly mixing. Because of the isomorphism theorem and von Neumann's theorem, every m.p.t. on a Lebesgue space is isomorphic to a m.p.t. on the ordinary measure space of the unit interval, [0, 1). We will regard two m.p.t.s on [0, 1) as equivalent in case they differ only on a set of measure 0. The resulting set $G$ of equivalence classes is the group of all automorphisms of [0, 1). $G$ is a complete metric topological group with respect to each of the following two topologies: the weak topology, in which $T_n\to T$ if and only if $T_nA\to TA$ (in the sense that $\mu(T_nA\,\triangle\,TA)\to 0$) for all $A\in\mathscr{B}$; and the strong topology, determined by either of the equivalent metrics
$$d_1(S,T) = \mu\{x\in X: Sx\ne Tx\}\quad\text{or}\quad d_2(S,T) = \sup\{\mu(SA\,\triangle\,TA): A\in\mathscr{B}\}.$$
(The weak topology is the one induced in $G$ by both the strong and weak topologies for operators on $L^2$, which coincide for unitary operators.) Halmos (1944) proved that with respect to the weak topology, the set of weakly mixing m.p.t.s is residual (i.e. the complement of a first category set), while Rokhlin (1948) showed that with respect to the weak topology the set of all strongly mixing transformations is of the first category. Thus in this particular sense, the 'generic' m.p.t. is weakly mixing but not strongly mixing. The details of the proof, which involves approximation of arbitrary m.p.t.s by periodic m.p.t.s, can be found in Halmos (1956). Katok and Stepin (1967) investigated the possible speeds of such approximations, analogous to the speeds of approximation of irrational numbers by rationals. This refined theory has many applications to the study of spectra, isomorphism, entropy, and genericity of m.p.t.s and is useful for the construction of examples.
There is another sense in which weak mixing is generic. Historically, the first genericity result was the theorem of Oxtoby and Ulam (1941), which stated that for a finite-dimensional compact manifold with a nonatomic measure which is positive on open sets, the set of ergodic measure-preserving homeomorphisms is generic in the strong topology. (Note that in $G$, with respect to the strong topology, the set of all ergodic m.p.t.s is nowhere dense!) Their purpose was to prove the existence of ergodic m.p.t.s on manifolds; cf. the realization problem discussed in 4.4. Katok and Stepin (1970) showed that in fact the weakly mixing measure-preserving homeomorphisms of a compact manifold are generic. Recently Alpern (1978) has shown how to obtain these results and the Halmos-Rokhlin theorems simultaneously. See also Oxtoby (1973), White (1974), Prasad (1979), Anosov and Katok (1970) and Fathi and Herman (1977) for recent work on this circle of questions.
Exercises
1. Can a weakly mixing transformation have nonmeasurable (hence nonconstant) eigenfunctions?
2. Prove that if $T$ is weakly mixing, then so are $T^n$ and $T^{1/n}$ for any $n = 1, 2, \dots$ (Here $T^{1/n}$ means any m.p.t. $S$ for which $S^n = T$.)
3. Prove that there is a universal system $(Y,\mathscr{C},\nu,S)$ ($(Y,\mathscr{C},\nu)$ is a probability space, $S: Y\to Y$ a m.p.t.) such that a m.p.t. $T: X\to X$ on a probability space $(X,\mathscr{B},\mu)$ is weakly mixing if and only if $T\times S$ is ergodic.
4. There are examples of weakly mixing m.p.t.s that are not strongly mixing (see 4.5). For now, consider some easier counterexamples. (a) Find an example of a sequence $\{a_n\}$ for which
$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}|a_k| = 0\quad\text{but}\quad\lim_{n\to\infty}a_n\ne 0.$$
(b) A sequence $\{A_1, A_2, \dots\}$ of measurable sets, each having measure $\alpha$, is called strongly mixing if
$$\lim_{n\to\infty}\mu(A_n\cap B) = \alpha\mu(B)\quad\text{for all } B\in\mathscr{B},$$
and it is called weakly mixing if
$$\lim_{n\to\infty}\frac1n\sum_{k=1}^{n}|\mu(A_k\cap B) - \alpha\mu(B)| = 0\quad\text{for all } B\in\mathscr{B}.$$
Give an example of a sequence which is weakly mixing but not strongly mixing.
(c) The sequence $\{A_1, A_2, \dots\}$ as above is called mixing of order $k$ ($k = 1, 2, \dots$) if
$$\lim_{\substack{n_1\to\infty\\ n_{i+1}-n_i\to\infty}}\mu(A_{n_1}\cap A_{n_2}\cap\dots\cap A_{n_k}\cap B) = \alpha^k\mu(B)\quad\text{for all } B\in\mathscr{B}.$$
Give an example of a sequence that is mixing of order 1 but not of order 2.
5. (England and Martin 1968). Show that $T$ is weakly mixing if and only if for each pair of sets $A, B\in\mathscr{B}$ with positive measure, there is a set $J\subset\mathbb{N}$ of density 0 such that
$$\mu(T^nA\cap B)>0\quad\text{for all } n\notin J.$$
Can such a set $J$ be found that works simultaneously for all $A, B\in\mathscr{B}$?
6. (Ornstein 1972). If $T^n$ is ergodic for all $n$, and if there is $c$ such that $\limsup_{n\to\infty}\mu(T^nA\cap B)\le c\,\mu(A)\mu(B)$ for all $A, B\in\mathscr{B}$, then $T$ is strongly mixing. (Hint: First prove that $T$ is weakly mixing. Then let $m$ be a cluster point of the measures $\nu_n$ on $X\times X$ defined by
$$\int f(x,y)\,d\nu_n = \int f(x,T^nx)\,d\mu.$$
Show that $m = \mu\times\mu$.)
7. Find a metric for the weak topology on $G$.
8. Which of the following classes of m.p.t.s are closed under the formation of Cartesian products (and not just Cartesian squares): (a) ergodic; (b) weakly mixing; (c) strongly mixing?
3 More about almost everywhere convergence
Many of the fundamental convergence theorems of ergodic theory, analysis, and probability can be proved by the technique of maximal inequalities and Banach's Principle. We give proofs of this kind for the Ergodic Theorem, the Local Ergodic Theorem (via the Lebesgue Differentiation Theorem), and the existence of the ergodic Hilbert transform. For novelty, the Martingale Convergence Theorem and the Chacon-Ornstein Theorem are proved without using maximal inequalities. We begin with a careful study of the ergodic maximal function, and find that for one-parameter ergodic flows the maximal inequality is actually an equality. We show that the Ergodic Theorem cannot be improved, in that no statement is possible about the speed with which the convergence of the ergodic averages takes place.
3.1. More about the Maximal Ergodic Theorem
In this section we undertake a deeper study of the ergodic maximal functions
$$f^*(x) = \sup_{n\ge1}\frac1n\sum_{k=0}^{n-1}f(T^kx)\quad\text{(for a single m.p.t. } T)$$
and
$$f^*(x) = \sup_{t>0}\frac1t\int_0^tf(T_sx)\,ds\quad\text{(for a flow } \{T_t\}),$$
including the following topics:
(A) Hopf's extension of the maximal inequality to the case of a positive contraction $T$ on $L^1(X)$.
(B) Proof that the maximal inequality is actually an equality for ergodic flows.
(C) The theory of sign changes of
$$F_t(x) = \int_0^tf(T_sx)\,ds\quad\text{and}\quad S_nf(x) = \sum_{k=0}^{n-1}f(T^kx)$$
that follows from (B).
(D) A characterization of when the maximal function $f^*$ of an $L^1$ function is itself integrable: Wiener's Dominated Ergodic Theorem together with the converse that follows from (B).
A. Positive contractions. Suppose that $T: L^1(X)\to L^1(X)$ is a positive contraction. This means that if $f\in L^1$, then $f\ge 0$ a.e. implies $Tf\ge 0$ a.e., and $\|T\|\le 1$, i.e. $\|Tg\|_1\le\|g\|_1$ for all $g\in L^1$. Ergodic theorems can be proved for certain such operators $T$ also, whether or not they arise from m.p.t.s on $X$ (see Section 3.8). Usually the essential first step is the proof of the corresponding maximal inequality.
Theorem 1.1 Maximal Ergodic Theorem for Operators (Hopf 1954) If $T$ is a positive contraction on $L^1(X)$ and $f\in L^1$, then
$$\int_{\{f^*>0\}}f\,d\mu\ge 0.$$
(Here $f^*(x) = \sup_{n\ge1}\frac1n\sum_{k=0}^{n-1}T^kf(x)$, as might be guessed.)
Proof (Garsia 1965): For each $n = 1, 2, \dots$ let
$$f_n = \max_{1\le j\le n}\sum_{k=0}^{j-1}T^kf.$$
Then $f = f_1\le f_2\le\cdots$ and
$$\bigcup_n\{f_n>0\} = \{f^*>0\}.$$
Also,
$$f_n\le f + Tf_n^+\quad\text{for all } n;$$
for clearly $f\le f + Tf_n^+$, and for $1<j\le n$
$$\sum_{k=0}^{j-1}T^kf = f + T\left(\sum_{k=0}^{j-2}T^kf\right)\le f + Tf_{j-1}\le f + Tf_{j-1}^+\le f + Tf_n^+.$$
Therefore
$$\int_{\{f_n>0\}}f\,d\mu\ge\int_{\{f_n>0\}}f_n\,d\mu - \int_{\{f_n>0\}}Tf_n^+\,d\mu = \int f_n^+\,d\mu - \int_{\{f_n>0\}}Tf_n^+\,d\mu\ge\int f_n^+\,d\mu - \int Tf_n^+\,d\mu\ge 0,$$
since $\|T\|\le 1$ implies $\int Tg\,d\mu\le\int g\,d\mu$ if $0\le g\in L^1$. The conclusion follows from the Monotone Convergence Theorem, letting $n\to\infty$. In 3.8 we will examine the relationship of this maximal inequality with the filling scheme and will obtain a proof of the Chacon-Ornstein Theorem that actually avoids the maximal inequality.
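To see the mechanism of Garsia's proof in the simplest setting, the Python sketch below uses a test system of our own choosing that is not in the text: $X = \mathbb{Z}_N$ with counting measure and $(Tg)(x) = g(x+1 \bmod N)$, which is a positive contraction on $L^1$. It computes $f_n = \max_{1\le j\le n}\sum_{k<j}T^kf$ and checks both the pointwise inequality $f_n\le f + Tf_n^+$ and $\int_{\{f_n>0\}}f\,d\mu\ge 0$. Integer values are used so that every comparison is exact.

```python
import random

def maximal_partial_sums(f, n_max):
    """f_n(x) = max_{1 <= j <= n_max} sum_{k < j} f((x + k) mod N)."""
    N = len(f)
    out = []
    for x in range(N):
        s, best = 0, None
        for j in range(n_max):
            s += f[(x + j) % N]  # s = S_{j+1} f(x)
            best = s if best is None else max(best, s)
        out.append(best)
    return out

def check_garsia(f, n_max):
    """Return (pointwise inequality holds, integral of f over {f_n > 0})."""
    N = len(f)
    fn = maximal_partial_sums(f, n_max)
    # Key step of the proof: f_n <= f + T(f_n^+), where (T g)(x) = g(x + 1 mod N).
    pointwise_ok = all(fn[x] <= f[x] + max(fn[(x + 1) % N], 0) for x in range(N))
    # Conclusion before letting n -> infinity: integral over {f_n > 0} is >= 0.
    integral = sum(f[x] for x in range(N) if fn[x] > 0)
    return pointwise_ok, integral

rng = random.Random(0)
f = [rng.randint(-5, 5) for _ in range(12)]
print(check_garsia(f, 30))
```

The finite horizon `n_max` plays the role of $n$ in the proof; the inequality already holds for each fixed $n$, which is why the passage to $f^*$ only needs a limit theorem.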
B. The maximal equality. Let us consider now a one-parameter flow $\{T_t: -\infty<t<\infty\}$ and $f\in L^1(X)$, with
$$F_t(x) = \int_0^tf(T_sx)\,ds\quad\text{and}\quad f^*(x) = \sup_{t>0}\frac1tF_t(x) = \sup_{t>0}\frac1t\int_0^tf(T_sx)\,ds.$$
Recall the Maximal Ergodic Theorem.
Theorem 1.2 Maximal Ergodic Theorem (Wiener 1939, Yosida and Kakutani 1939) If $f\in L^1$ and $\alpha\in\mathbb{R}$, then
$$\int_{\{f^*>\alpha\}}f\,d\mu\ge\alpha\,\mu\{f^*>\alpha\}.$$
Actually this inequality is often an equality.
Theorem 1.3 Maximal Ergodic Equality (Marcus and Petersen 1979; see also Engel and Kakutani 1981). If $\{T_t\}$ is ergodic, in that every measurable subset $A$ of $X$ which is invariant under the flow ($T_tA = A$ for $t\in\mathbb{R}$) has measure 0 or 1, if $f\in L^1(X)$, and if $\alpha\ge\int f\,d\mu$, then
$$\int_{\{f^*>\alpha\}}f\,d\mu = \alpha\,\mu\{f^*>\alpha\}.$$
(If $\alpha<\int f\,d\mu$, then $\mu\{f^*>\alpha\} = 1$.)
We will prove these results simultaneously, using Hartman's approach (1947), which is based on the following simple lemma.
Lemma 1.4 Rising Sun Lemma (F. Riesz 1931, 1932) Let $h$ be a real-valued continuous function on an interval $[a,b]\subset\mathbb{R}$ and let
$$S = \{t\in(a,b): \text{there is } t'>t \text{ with } h(t')>h(t)\}.$$
(If the graph of $h$ represented a mountain range, $S$ would be the set of points that were in the shade as the sun rose at the positive end of the $x$ axis.) Then (i) $S$ is open; (ii) if $S = \bigcup(a_k,b_k)$ is the decomposition of $S$ as the union of pairwise disjoint open intervals, then $h(a_k)\le h(b_k)$ for all $k$; and (iii) equality holds in (ii) except possibly when $a_k = a$.
[Figure: the graph of $h$ over $[a,b]$, with the shaded set $S$ marked.]
The proof of the Lemma is easy and indeed is already clear from the picture.
Proof of the Theorems: We let
$$A = \{x: F_t(x)\ge 0 \text{ for all } t>0\}$$
and notice that we need to prove that
$$\int_{A^c}f\,d\mu\le 0,$$
and that equality holds in case $\int f\,d\mu\ge 0$ and $\{T_t\}$ is ergodic; for, as before, the results will then follow after replacing $f$ by $\alpha - f$. (If $f = \alpha - g$, then $A^c(f) = \{x: \text{there is } t>0 \text{ with } F_t(x)<0\} = \{x: \text{there is } t>0 \text{ with } \alpha t - \int_0^tg(T_sx)\,ds<0, \text{ or } \frac1t\int_0^tg(T_sx)\,ds>\alpha\} = \{g^*>\alpha\}$, and $\int_{A^c}f\,d\mu\le 0$ says $\int_{\{g^*>\alpha\}}g\,d\mu\ge\alpha\,\mu\{g^*>\alpha\}$.)
For each $x\in X$, let
$$W_x = \{t\in\mathbb{R}: T_tx\in A^c\} = \{t\in\mathbb{R}: \text{there is } t'>t \text{ with } F_{t'}(x)<F_t(x)\},$$
and let $W_x^r = W_x\cap(0,r)$, where $r>0$ is fixed. By the Rising Sun Lemma (with $h$ replaced by $-h$),
$$W_x^r = \bigcup_k(a_k,b_k)\quad\text{with}\quad F_{a_k}(x)\ge F_{b_k}(x)\text{ for all } k.$$
Thus
$$\int_{W_x^r}f(T_sx)\,ds = \sum_k\int_{a_k}^{b_k}f(T_sx)\,ds = \sum_k[F_{b_k}(x) - F_{a_k}(x)]\le 0,$$
so that
$$0\ge\int_X\int_{W_x^r}f(T_sx)\,ds\,d\mu(x) = \iint_{\{(x,s):\,0<s<r,\ T_sx\in A^c\}}f(T_sx)\,ds\,d\mu(x) = \int_0^r\int_{\{x:\,T_sx\in A^c\}}f(T_sx)\,d\mu(x)\,ds = \int_0^r\int_{A^c}f(x)\,d\mu(x)\,ds = r\int_{A^c}f(x)\,d\mu(x).$$
In order to prove the equality, we notice that if $(a_k,b_k)$ is a bounded component of $W_x$, then $a_k\notin W_x$, so
$$\int_{a_k}^tf(T_sx)\,ds\ge 0\quad\text{for every } t\ge a_k,$$
and in particular
$$\int_{a_k}^{b_k}f(T_sx)\,ds\ge 0;$$
since also $F_{a_k}(x)\ge F_{b_k}(x)$ by the Rising Sun Lemma, we have
$$\int_{a_k}^{b_k}f(T_sx)\,ds = 0.$$
If $\mu(A)>0$, then (for almost all $x$) all the components of $W_x$ must be bounded, since $\{T_t\}$ is ergodic. But if $\mu(A) = 0$, then by the inequality just proved,
$$0\le\int f\,d\mu = \int_Af\,d\mu + \int_{A^c}f\,d\mu = \int_{A^c}f\,d\mu\le 0,$$
so that the equality
$$\int_{A^c}f\,d\mu = 0$$
holds in this case. We suppose then that the components of $W_x$ are bounded.
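For each $x\in X$, let
$$J_x = \begin{cases}\text{the component of } W_x \text{ containing } 0, & \text{if } x\in A^c,\\ \emptyset, & \text{if } x\in A,\end{cases}\qquad\lambda(x) = \begin{cases}1/|J_x|, & \text{if } x\in A^c,\\ 0, & \text{if } x\in A.\end{cases}$$
Since $J_{T_sx} = J_x - s$ for $s\in J_x$, we have $\chi_{A^c}(x) = \int_{J_x}\lambda(T_sx)\,ds$. Moreover, for every $x$ we see that $-s\in J_x$ if and only if $s\in J_{T_{-s}x}$. Applying the measure-preserving change of variables $(x,s)\to(T_sx,s)$ on $X\times\mathbb{R}$ allows us to compute that
$$\int_{A^c}f(x)\,d\mu(x) = \int_Xf(x)\chi_{A^c}(x)\,d\mu(x) = \int_Xf(x)\int_{\{s:\,s\in J_x\}}\lambda(T_sx)\,ds\,d\mu(x) = \int_Xf(x)\int_{\{s:\,-s\in J_x\}}\lambda(T_{-s}x)\,ds\,d\mu(x)$$
$$= \int_Xf(x)\int_{\{s:\,s\in J_{T_{-s}x}\}}\lambda(T_{-s}x)\,ds\,d\mu(x) = \int_X\int_{J_x}\lambda(x)f(T_sx)\,ds\,d\mu(x) = \int_X\lambda(x)\left(\int_{J_x}f(T_sx)\,ds\right)d\mu(x) = 0$$
(since we have seen that the integrals of $f$ over the components of $W_x$ are all 0).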
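The Rising Sun Lemma has a transparent discrete analogue, which is a convenient way to test one's picture of the shaded set. The Python sketch below is our own illustration, not part of the text's argument: it marks a sample index $i$ as shaded when some later sample is strictly higher, splits the shaded indices into maximal runs, and checks the discrete counterparts of (ii) and (iii). Every value on a run lies strictly below the value $h(b)$ at the first unshaded index $b$ after the run, while the unshaded value just before the run is at least $h(b)$; in the continuous lemma these two bounds squeeze together to give $h(a_k) = h(b_k)$ whenever $a_k\ne a$.

```python
import math

def rising_sun(h):
    """Maximal runs (i0, i1) of indices i with h[j] > h[i] for some j > i."""
    n = len(h)
    later_max = [-math.inf] * n  # later_max[i] = max(h[i+1:]), -inf at the end
    for i in range(n - 2, -1, -1):
        later_max[i] = max(h[i + 1], later_max[i + 1])
    shaded = [h[i] < later_max[i] for i in range(n)]
    runs, i = [], 0
    while i < n:
        if shaded[i]:
            j = i
            while j + 1 < n and shaded[j + 1]:
                j += 1
            runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs

def check_runs(h, runs):
    """Discrete forms of (ii) and (iii) for each run, with b = i1 + 1."""
    for i0, i1 in runs:
        b = i1 + 1  # exists, because the last index is never shaded
        if not all(h[i] < h[b] for i in range(i0, b)):
            return False
        if i0 > 0 and not h[i0 - 1] >= h[b]:
            return False
    return True

h = [math.sin(0.3 * t) + 0.02 * t for t in range(80)]
runs = rising_sun(h)
print(runs[:3], check_runs(h, runs))
```

For instance, `rising_sun([4, 1, 2, 3, 0])` returns `[(1, 2)]`: the run ends just before the peak value 3, and the unshaded value 4 on its left is at least that peak.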
Remark 1.5 If $\underline{A} = \{x\in X: F_t(x)>0 \text{ for all } t>0\}$, then
$$\int_{A\setminus\underline{A}}f\,d\mu = 0.$$
Proof: Because of the Local Ergodic Theorem (see 3.3),
$$\lim_{t\to0^+}\frac1t\int_0^tf(T_sx)\,ds = f(x)\quad\text{a.e.};$$
thus if $x\in A$, $f(x)\ge 0$ almost surely. It is enough to prove, then, that
$$\mu\{x\in A\setminus\underline{A}: f(x)>0\} = 0.$$
If $f(x)>0$, then almost surely $F_t(x)>0$ for all small $t>0$. Let
$$E_n = \{x\in A\setminus\underline{A}: f(x)>0 \text{ and } F_t(x)>0 \text{ for } 0<t<1/n\};$$
we will show that $\mu(E_n) = 0$ for each $n = 1, 2, \dots$.
If $\mu(E_n)>0$, by a theorem of von Neumann (1932c) it is possible to choose $t$ with $0<t<1/n$ and $\mu(E_n\cap T_{-t}E_n)>0$. If $x\in E_n\cap T_{-t}E_n$, since $x\in A\setminus\underline{A}$ we can find $t_0>0$ with $F_{t_0}(x) = 0$. Then $t_0\ge 1/n>t$, and
$$0 = F_{t_0}(x) = F_t(x) + F_{t_0-t}(T_tx),$$
which is impossible since $F_t(x)>0$ and $F_{t_0-t}(T_tx)\ge 0$.
C. Sign changes of the partial sums. The one-parameter case. Suppose that $\{T_t\}$ is ergodic and $\int f\,d\mu>0$. Then $A^c$ coincides with the oscillation set
$$\mathcal{O} = \{x\in X: \text{there are } t,t'>0 \text{ with } F_t(x)>0 \text{ and } F_{t'}(x)<0\},$$
while $\underline{A}^c$, where $\underline{A} = \{x: F_t(x)>0 \text{ for all } t>0\}$, coincides with the crossing set
$$\mathcal{C} = \{x\in X: \text{there is } t>0 \text{ with } F_t(x) = 0\}.$$
(This observation uses the Ergodic Theorem.) The Maximal Equality asserts, then, that
$$\int_{\mathcal{O}}f\,d\mu = \int_{\mathcal{C}}f\,d\mu = 0.$$
We claim that this formula is true whether or not $\{T_t\}$ is ergodic and irrespective of the sign of $\int f\,d\mu$. First, if $\int f\,d\mu<0$, then by considering $-f$, for which $\int -f\,d\mu>0$, we see that $\mathcal{O}(-f) = \mathcal{O}(f)$ and $\mathcal{C}(-f) = \mathcal{C}(f)$, so that still
$$\int_{\mathcal{O}}f\,d\mu = \int_{\mathcal{C}}f\,d\mu = 0.$$
If $\int f\,d\mu = 0$, then we have
$$\int_Xf\,d\mu = \int_{A(f)}f\,d\mu = \int_{A(-f)}f\,d\mu = 0,$$
so that (since $F_t\equiv 0$, and hence $f = 0$ a.e., on $A(f)\cap A(-f)$)
$$0 = \int_{A(f)\cup A(-f)}f\,d\mu$$
and
$$\int_{\mathcal{O}}f\,d\mu = \int_{[A(f)\cup A(-f)]^c}f\,d\mu = 0.$$
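Also, by the preceding Remark (applied to $f$ and to $-f$),
$$\int_{\underline{A}(f)\cup\underline{A}(-f)}f\,d\mu = \int_{\underline{A}(f)}f\,d\mu + \int_{\underline{A}(-f)}f\,d\mu = 0,$$
so that also
$$\int_{\mathcal{C}}f\,d\mu = 0$$
in this case. We have proved the formula, then, in the ergodic case, no matter what the sign of $\int f\,d\mu$ may be. In order to deal with the general, nonergodic case, we appeal to the theorem on ergodic decompositions. We may assume that $X$ is the disjoint union of sets $X_\omega$, each equipped with a $\sigma$-algebra $\mathscr{B}_\omega$ and a probability measure $\mu_\omega$, and that $\{T_t\}$ acts ergodically on each $(X_\omega,\mathscr{B}_\omega,\mu_\omega)$. The indexing set is another probability space $(\Omega,\mathscr{S},P)$, and we have the formula
$$\int_Xf\,d\mu = \int_\Omega\int_{X_\omega}f\,d\mu_\omega\,dP(\omega)\quad\text{for } f\in L^1(X).$$
Now for each $\omega$,
$$\mathcal{O}_\omega = \mathcal{O}\cap X_\omega\quad\text{and}\quad\mathcal{C}_\omega = \mathcal{C}\cap X_\omega$$
(where $\mathcal{O}_\omega$, for example, is the oscillation set in $X_\omega$ of $f$ restricted to $X_\omega$, with respect to $\{T_t\}$ restricted to $X_\omega$); therefore
$$\int_{\mathcal{O}}f\,d\mu = \int_\Omega\int_{\mathcal{O}_\omega}f\,d\mu_\omega\,dP(\omega) = 0\quad\text{and}\quad\int_{\mathcal{C}}f\,d\mu = \int_\Omega\int_{\mathcal{C}_\omega}f\,d\mu_\omega\,dP(\omega) = 0.$$
We have proved the following comprehensive result.
Theorem 1.6 For any measure-preserving flow $\{T_t\}$ and any $f\in L^1(X)$,
$$\int_{\mathcal{O}}f\,d\mu = \int_{\mathcal{C}}f\,d\mu = 0.$$
David Engel has pointed out that one can prove this result without making use of the full force of the theorem on ergodic decompositions. First, we extend the Maximal Equality to the non-ergodic case as follows.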
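Theorem 1.7 Let $\{T_t\}$ be a measure-preserving flow on $(X,\mathscr{B},\mu)$, let $f\in L^1(X)$, and denote by $\mathscr{I}$ the $\sigma$-algebra of all $\{T_t\}$-invariant measurable sets. If $E(f\mid\mathscr{I})\ge 0$, then
$$\int_{I\cap A^c}f\,d\mu = 0\quad\text{for each } I\in\mathscr{I}.$$
Proof: In case $E(f\mid\mathscr{I})>0$, we use the Ergodic Theorem to show that the components of $W_x$ are bounded. On an invariant set where $E(f\mid\mathscr{I}) = 0$, replace $f$ by $f_\varepsilon = f + \varepsilon$. Then
$$\int_{I\cap A(f_\varepsilon)^c}f_\varepsilon\,d\mu = 0\quad\text{and}\quad\bigcap_{\varepsilon>0}A(f_\varepsilon) = A(f),$$
so the conclusion follows by approximation. The rest of the proof carries through as before.
Corollary 1.8 (Engel and Kakutani 1981) If $\alpha(x)$ is measurable with respect to $\mathscr{I}$ and $\alpha(x)\ge E(f\mid\mathscr{I})(x)$ a.e., then
$$\int_{I\cap\{x:\,f^*(x)>\alpha(x)\}}f\,d\mu = \int_{I\cap\{x:\,f^*(x)>\alpha(x)\}}\alpha\,d\mu\quad\text{for each } I\in\mathscr{I}.$$
Now the proof that $\int_{\mathcal{O}}f\,d\mu = \int_{\mathcal{C}}f\,d\mu = 0$ proceeds as before. We divide $X$ into invariant sets according to the value of $\operatorname{sgn}E(f\mid\mathscr{I})$ and observe that the integral of $f$ over the intersection of each of these sets with $\mathcal{O}$ (or with $\mathcal{C}$) is 0.
We may also consider crossings of, and oscillations about, levels other than 0. For any $\alpha\in\mathbb{R}$, let
$$\mathcal{O}_\alpha = \left\{x\in X: \text{there are } t,t'>0 \text{ with } \tfrac1tF_t(x)>\alpha \text{ and } \tfrac1{t'}F_{t'}(x)<\alpha\right\},$$
$$\mathcal{C}_\alpha = \left\{x\in X: \text{there is } t>0 \text{ with } \tfrac1tF_t(x) = \alpha\right\}.$$
Theorem 1.9 For any measure-preserving flow $\{T_t\}$, $f\in L^1(X)$, and $\alpha\in\mathbb{R}$,
$$\int_{\mathcal{O}_\alpha}f\,d\mu = \alpha\,\mu(\mathcal{O}_\alpha)\quad\text{and}\quad\int_{\mathcal{C}_\alpha}f\,d\mu = \alpha\,\mu(\mathcal{C}_\alpha).$$
With a slight auxiliary argument for the converse, the following result is also immediate. (Here $\doteq$ means equals up to a set of measure 0.)
Corollary 1.10 If $\mathcal{C}_\alpha\doteq X$, then $\alpha = \int f\,d\mu$. The converse holds if $\{T_t\}$ is ergodic.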
Thus $\alpha = \int f\,d\mu$ is the only level which can be crossed by almost all the graphs $\{(t,\frac1tF_t(x)): t>0\}$.
[Figure: graphs of $\frac1tF_t(x)$ against $t$, crossing the level $\int f\,d\mu$.]
If $\alpha\ne\int f\,d\mu$, there is a positive-measure set of trajectories which stay entirely below the level $\alpha$, or such a set of trajectories which stay entirely above the level $\alpha$.
Remark 1.11 The following are equivalent:
(1) $\mathcal{C}_\alpha\doteq X$.
(2) $\int_{X_\omega}f\,d\mu_\omega = \alpha$ for almost all $\omega$.
(3) $\bar f(x) = \lim_{t\to\infty}\frac1tF_t(x)$ equals $\alpha$ a.e.
The discrete case. If $T: X\to X$ is a single measure-preserving transformation, it is not hard to see that equality cannot hold in the Maximal Ergodic Theorem the same way as it does for a flow. (If we deal, however, with a function that takes values only in $\{-1,0,1\}$, then again there is a sort of continuity and the equality can be recovered.) For $f\in L^1(X)$, we consider the partial sums
$$S_nf(x) = \sum_{k=0}^{n-1}f(T^kx),$$
the extrema
$$S_*f(x) = \inf_{n\ge1}S_nf(x)\quad\text{and}\quad S^*f(x) = \sup_{n\ge1}S_nf(x),$$
and the sets of constant sign
$$\underline{A} = \{x\in X: S_nf(x)>0 \text{ for all } n\ge1\},\qquad A = \{x\in X: S_nf(x)\ge 0 \text{ for all } n\ge1\},$$
$$\underline{E} = \{x\in X: S_nf(x)<0 \text{ for all } n\ge1\},\qquad E = \{x\in X: S_nf(x)\le 0 \text{ for all } n\ge1\}.$$
Theorem 1.12 If $T$ is ergodic and $\int f\,d\mu\ge 0$, then
$$\int_{\underline{A}}S_*f\,d\mu = \int_AS_*f\,d\mu = \int f\,d\mu.$$
Proof: Since $S_*f = 0$ on $A\setminus\underline{A}$, the first equality is clearly true. If $\mu(A) = 0$, then the Maximal Ergodic Theorem applied to $-f$ shows that
$$0\le\int_{A^c}-f\,d\mu = -\int_{A^c}f\,d\mu,$$
so that
$$\int_{A^c}f\,d\mu\le 0,$$
and we have
$$0\le\int f\,d\mu = \int_{A^c}f\,d\mu\le 0,$$
whence $\int f\,d\mu = 0 = \int_AS_*f\,d\mu$, which proves the Theorem in this case. Suppose now that $\mu(A)>0$. Then for almost every $x\in X$ there is a smallest $n(x)\ge1$ such that $T^{n(x)}x\in A$. I claim that
$$S_*f(x) = \sum_{k=0}^{n(x)-1}f(T^kx).$$
For $T^{n(x)}x\in A$, so that
$$\sum_{k=n(x)}^{m-1}f(T^kx)\ge 0\quad\text{for all } m>n(x).$$
This implies that the infimum $S_*f(x)$ is achieved by some $i$ with $1\le i\le n(x)$:
$$S_*f(x) = \sum_{k=0}^{i-1}f(T^kx).$$
Then for $m>i$,
$$\sum_{k=i}^{m-1}f(T^kx) = \sum_{k=0}^{m-1}f(T^kx) - \sum_{k=0}^{i-1}f(T^kx)\ge 0,$$
so that $T^ix\in A$ and hence $i = n(x)$.
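Now we may compute (using the decomposition of $X$ into the towers over $A$ determined by the return time $n(x)$) that
$$\int f\,d\mu = \int_A\sum_{k=0}^{n(x)-1}f(T^kx)\,d\mu(x) = \int_AS_*f\,d\mu.$$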
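Theorem 1.12 can be tested exactly on the simplest ergodic system, a cyclic permutation $x\mapsto x+1 \pmod N$ of $N$ points with uniform measure; this finite example is our own choice, not the text's. When $\sum_xf(x)\ge 0$ we have $S_{n+N}f = S_nf + \sum_xf(x)\ge S_nf$, so $S_*f(x)$ is the minimum of $S_1f(x),\dots,S_Nf(x)$, and both integrals in the Theorem become finite sums (the common factor $1/N$ cancels). The sketch uses integer values so that the comparison is exact.

```python
import random

def lower_envelope(f):
    """S_* f(x) = inf_{n >= 1} S_n f(x) for the shift x -> x + 1 mod N.

    Assumes sum(f) >= 0, so the infimum is attained with n <= N.
    """
    N = len(f)
    result = []
    for x in range(N):
        s, low = 0, None
        for n in range(N):
            s += f[(x + n) % N]  # s = S_{n+1} f(x)
            low = s if low is None else min(low, s)
        result.append(low)
    return result

def theorem_1_12_sides(f):
    """(sum of S_*f over underline-A, sum of S_*f over A, sum of f)."""
    s_low = lower_envelope(f)
    over_strict = sum(v for v in s_low if v > 0)  # S_n f > 0 for all n
    over_weak = sum(v for v in s_low if v >= 0)   # S_n f >= 0 for all n
    return over_strict, over_weak, sum(f)

rng = random.Random(1)
f = [rng.randint(-3, 3) for _ in range(15)]
if sum(f) < 0:
    f = [-v for v in f]  # make the integral of f nonnegative
print(theorem_1_12_sides(f))
```

For example, with $f = (2,-1,0)$ the values of $S_*f$ are $1, -1, 0$; the points in $A$ contribute $1 + 0 = 1$, which is $\sum_xf(x)$.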
Remark 1.13 The preceding argument involves a sort of discrete Rising Sun Lemma. The Theorem is also, via suspensions (i.e. flows built under constant functions), a corollary of the Maximal Equality for flows.
Corollary 1.14 If $T$ is ergodic, $f\in L^1(X)$ with $\int f\,d\mu\ge 0$, and $f$ takes values only in $\{-1,0,1\}$, then
$$\mu(\underline{A}) = \int_{\underline{A}}f\,d\mu = \int f\,d\mu.$$
This last result is familiar in the theory of random walks. Suppose we flip a coin with probability of heads equal to $p>1/2$ and that of tails equal to $q = 1-p$, and we move up one unit for each head obtained and down one for each tail. (That is, $T$ is the Bernoulli shift $\mathscr{B}(p,q)$ on the symbols $\pm1$.) Then $S_nf(x)$ gives the walker's height after $n$ (independent) flips. The Corollary states that the probability of always remaining above 0 is $p-q$. Our result applies to all stationary processes, not just the independent, identically-distributed ones; on the other hand, the conclusion of the Corollary remains true for some non-stationary processes too, as for example in the Ballot Problem (see Feller 1950, p. 69). Let us turn now to the unrestricted discrete case: we no longer assume that $T$ is ergodic or $\int f\,d\mu\ge 0$.
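As a numerical aside on Corollary 1.14: for the Bernoulli walk with $P(+1) = p>1/2$, the probability $P(S_1f>0,\dots,S_nf>0)$ of staying strictly above 0 for the first $n$ steps can be computed exactly by dynamic programming, and it decreases to $\mu(\underline{A}) = p-q$ as $n\to\infty$. The Python sketch below is our own illustration; the horizons and tolerances in the checks are arbitrary choices that exploit the geometric decay of the remaining return probability.

```python
def prob_stay_positive(p, n_steps):
    """P(S_1 > 0, ..., S_n > 0) for the +-1 walk with P(+1) = p, by exact DP."""
    q = 1.0 - p
    # dist[h] = probability of being at height h >= 1 without having hit <= 0
    dist = [0.0] * (n_steps + 2)
    dist[1] = p  # the first step must go up
    for _ in range(n_steps - 1):
        new = [0.0] * (n_steps + 2)
        for h in range(1, n_steps + 1):
            w = dist[h]
            if w == 0.0:
                continue
            new[h + 1] += w * p      # step up
            if h > 1:
                new[h - 1] += w * q  # step down, staying strictly positive
        dist = new
    return sum(dist)

for p in (0.6, 0.75):
    print(p, prob_stay_positive(p, 1000), p - (1 - p))
```

Dynamic programming is used instead of simulation so that the comparison with $p-q$ is free of sampling error.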
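Theorem 1.15 If $T: X\to X$ is a measure-preserving transformation and $f\in L^1(X)$, then
$$\int_{\underline{A}}S_*f\,d\mu + \int_{\underline{E}}S^*f\,d\mu = \int f\,d\mu.$$
Proof: We again use an ergodic decomposition of $X$ into $\{(X_\omega,\mathscr{B}_\omega,\mu_\omega): \omega\in\Omega\}$. (As before, arguments avoiding ergodic decompositions are also available.) If $\int_{X_\omega}f\,d\mu_\omega\ge 0$, then
$$\int_{X_\omega}f\,d\mu_\omega = \int_{\underline{A}\cap X_\omega}S_*f\,d\mu_\omega;$$
while if $\int_{X_\omega}f\,d\mu_\omega<0$, then
$$\int_{X_\omega}f\,d\mu_\omega = \int_{\underline{E}\cap X_\omega}S^*f\,d\mu_\omega$$
(as may be seen by considering $-f$ in place of $f$). Thus
$$\int f\,d\mu = \int_\Omega\int_{X_\omega}f\,d\mu_\omega\,dP(\omega) = \int_{\{\omega:\int_{X_\omega}f\,d\mu_\omega\ge0\}}\int_{\underline{A}\cap X_\omega}S_*f\,d\mu_\omega\,dP(\omega) + \int_{\{\omega:\int_{X_\omega}f\,d\mu_\omega<0\}}\int_{\underline{E}\cap X_\omega}S^*f\,d\mu_\omega\,dP(\omega) = \int_{\underline{A}}S_*f\,d\mu + \int_{\underline{E}}S^*f\,d\mu,$$
since, for example, $\mu_\omega(\underline{A}\cap X_\omega)>0$ implies that $\int_{X_\omega}f\,d\mu_\omega\ge 0$.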
If $f$ takes only the values $-1, 0, 1$, as in a simple random walk, then the Theorem states that the probability that the walker first returns to 0 from above equals the probability that he first returns to 0 from below, no matter what the probabilities of moving up and down are (and even in the case of dependent increments)! For $S_*f = 1$ on $\underline{A}$ and $S^*f = -1$ on $\underline{E}$, so that according to the Theorem,
$$\mu(\underline{A}) - \mu(\underline{E}) = \mu\{f>0\} - \mu\{f<0\},$$
or
$$\mu\{f>0\} - \mu(\underline{A}) = \mu\{f<0\} - \mu(\underline{E}),$$
that is,
$$\mu\{x: f(x)>0,\ S_nf(x) = 0 \text{ for some } n\ge1\} = \mu\{x: f(x)<0,\ S_nf(x) = 0 \text{ for some } n\ge1\}$$
(the probability of first returning to 0 from above equals that of first returning to 0 from below). By considering $A$ and $E$ in place of $\underline{A}$ and $\underline{E}$, it follows similarly that
$$\mu\{x: f(x)>0,\ S_nf(x)<0 \text{ for some } n\ge1\} = \mu\{x: f(x)<0,\ S_nf(x)>0 \text{ for some } n\ge1\}.$$
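The identity $\mu(\underline{A}) - \mu(\underline{E}) = \mu\{f>0\} - \mu\{f<0\}$ and the resulting equality of first-return probabilities can be checked exactly on a cyclic permutation of $N$ points with uniform measure, for an arbitrary $f$ with values in $\{-1,0,1\}$; this test system is our own choice. Since $S_Nf(x) = \sum_yf(y)$ for every $x$ and the walk moves by steps of size at most 1, whether $S_nf(x)$ ever reaches 0 is already decided by $n\le N$, so the sketch only looks at one period.

```python
import random

def first_return_counts(f):
    """Counts for the shift x -> x + 1 mod N and f with values in {-1, 0, 1}.

    Returns (#underline-A, #underline-E,
             #{f > 0, S_n f = 0 for some n}, #{f < 0, S_n f = 0 for some n}).
    One period n = 1..N suffices: S_N f(x) = sum(f) for every x, and steps
    have size at most 1, so any later visit to 0 is forced within N steps.
    """
    N = len(f)
    a_strict = e_strict = up_return = down_return = 0
    for x in range(N):
        partial, hits_zero = 0, False
        for n in range(N):
            partial += f[(x + n) % N]  # partial = S_{n+1} f(x)
            if partial == 0:
                hits_zero = True
        if f[x] > 0:
            up_return += hits_zero
            a_strict += not hits_zero
        elif f[x] < 0:
            down_return += hits_zero
            e_strict += not hits_zero
    return a_strict, e_strict, up_return, down_return

rng = random.Random(2)
f = [rng.choice([-1, 0, 1]) for _ in range(30)]
a, e, up, down = first_return_counts(f)
print(a - e == f.count(1) - f.count(-1), up == down)
```

For $f = (1,1,-1)$ the walk drifts upward, yet exactly one point first returns to 0 from above (starting at the second point) and exactly one from below (starting at the third).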
In the i.i.d. case of a true random walk (i.e., $T$ a Bernoulli shift), such results can be easily proved by symmetry considerations: each path which returns to 0 corresponds (under a measure-preserving transformation obtained by considering the increments in reverse order, starting at the first impact on 0) to a path on the other side of 0.
[Figure: a path returning to 0 from above and the corresponding path on the other side of 0.]
Our theorem shows that such symmetry is present in every stationary stochastic process. For further discussion of random walks with stationary (dependent) increments, see Derriennic (1980).
D. The Dominated Ergodic Theorem and its converse. Wiener's Dominated Ergodic Theorem gives a sufficient condition that $f^*\in L^1$: that $f\in L\log L$, i.e.
$$\int_X|f(x)|\log^+|f(x)|\,d\mu(x)<\infty.$$
(For $f^*$ to be in $L^1$ means that the a.e. convergence of $(1/n)S_nf$ is of the kind in the Dominated Convergence Theorem, hence the name.) Because of the Maximal Equality, we know the distribution of $f^*$ well enough to see that for a nonnegative function this condition is also necessary.
Theorem 1.16 (Wiener 1939, Petersen 1979) Let $\{T_t\}$ be an ergodic measure-preserving flow on $(X,\mathscr{B},\mu)$ and $0\le f\in L^1(X)$. Then $f^*\in L^1$ if and only if $f\in L\log L$.
Proof: For each $\alpha\ge 0$, let
$$G(\alpha) = \mu\{x\in X: f^*(x)>\alpha\}.$$
From (B), for $\alpha\ge\int f\,d\mu$ we have
$$G(\alpha) = \frac1\alpha\int_{\{f^*>\alpha\}}f\,d\mu.$$
Now for any continuous increasing function $h$ on $[0,\infty)$ with $h(0) = 0$, we have
$$\int_Xh(f^*(x))\,d\mu(x) = -\int_0^\infty h(y)\,dG(y) = \int_0^\infty G(y)\,dh(y).$$
(For the last equality, check it first for bounded functions and then apply the Monotone Convergence Theorem.) Thus, with $c = \int_0^{\|f\|_1}G(y)\,dy\le\|f\|_1$,
$$\int_Xf^*\,d\mu = \int_0^\infty G(y)\,dy = \int_0^{\|f\|_1}G(y)\,dy + \int_{\|f\|_1}^\infty G(y)\,dy = c + \int_{\|f\|_1}^\infty\frac1y\int_{\{f^*>y\}}f\,d\mu\,dy$$
$$\ge c + \int_{\|f\|_1}^\infty\frac1y\int_{\{f>y\}}f\,d\mu\,dy = c + \int_{\{f>\|f\|_1\}}f(x)\int_{\|f\|_1}^{f(x)}\frac{dy}{y}\,d\mu(x) = c + \int_{\{f>\|f\|_1\}}f(x)[\log f(x) - \log\|f\|_1]\,d\mu(x),$$
so that $f^*\in L^1$ implies $f\in L\log L$. For the converse, we use a neat trick of Wiener's: if $g = f\chi_{\{f>\alpha/2\}}$, then $f^*\le g^* + \alpha/2$, and the Maximal Ergodic Theorem applied to $g$ says that
$$\mu\{f^*>\alpha\}\le\mu\{g^*>\alpha/2\}\le\frac2\alpha\int_{\{g^*>\alpha/2\}}g\,d\mu\le\frac2\alpha\int_Xg\,d\mu = \frac2\alpha\int_{\{f>\alpha/2\}}f\,d\mu.$$
Thus, with $c' = \int_0^{2\|f\|_1}G(y)\,dy\le 2\|f\|_1$,
$$\int_Xf^*\,d\mu = c' + \int_{2\|f\|_1}^\infty\mu\{f^*>y\}\,dy\le c' + \int_{2\|f\|_1}^\infty\frac2y\int_{\{f>y/2\}}f\,d\mu\,dy = c' + \int_{\{f>\|f\|_1\}}f(x)\int_{2\|f\|_1}^{2f(x)}\frac{2\,dy}{y}\,d\mu(x)$$
$$= c' + 2\int_{\{f>\|f\|_1\}}f(x)[\log f(x) - \log\|f\|_1]\,d\mu(x),$$
so that $f\in L\log L$ implies $f^*\in L^1$. Reverse maximal inequalities and converse dominated theorems are found in other contexts as well: see Burkholder (1962), Stein (1969), Gundy (1969), Ornstein (1971b), Derriennic (1973), Jones (1977). Part of the interest in the question of exactly when $f^*$ is integrable arises from the general theory of Hardy spaces: frequently one says that $f\in H^p$ if an appropriate maximal function of $f$ is in $L^p$. See Hardy and Littlewood (1930), Burkholder, Gundy and Silverstein (1971), and Fefferman and Stein (1972) for the original results in this direction and Petersen (1977) and Garsia (1973) for an introduction to this deep and fascinating subject.
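Wiener's trick rests on the pointwise bound $f\le g + \alpha/2$ for $g = f\chi_{\{f>\alpha/2\}}$, which passes to every ergodic average and hence to the maximal functions. The Python sketch below checks the two inequalities of that step on a finite cyclic system of our own choosing, with the supremum truncated at a hypothetical horizon `n_max` (for a truncated supremum the maximal inequality still holds, since Garsia's proof works for each finite $n$): $f^*\le g^* + \alpha/2$ pointwise, and $\mu\{g^*>\alpha/2\}\le\frac2\alpha\int g\,d\mu$. Exact rational arithmetic avoids rounding issues in these comparisons.

```python
from fractions import Fraction
import random

def truncated_maximal(f, n_max):
    """max_{1 <= n <= n_max} (1/n) * sum_{k < n} f((x + k) mod N), exactly."""
    N = len(f)
    out = []
    for x in range(N):
        s, best = 0, None
        for n in range(1, n_max + 1):
            s += f[(x + n - 1) % N]
            avg = Fraction(s, n)
            best = avg if best is None else max(best, avg)
        out.append(best)
    return out

def wiener_trick_holds(f, alpha, n_max):
    """Check f* <= g* + alpha/2 and #{g* > alpha/2} <= (2/alpha) * sum(g)."""
    half = Fraction(alpha, 2)
    g = [v if v > half else 0 for v in f]  # g = f on {f > alpha/2}, else 0
    f_star = truncated_maximal(f, n_max)
    g_star = truncated_maximal(g, n_max)
    pointwise = all(fs <= gs + half for fs, gs in zip(f_star, g_star))
    # Counting-measure form of mu{g* > alpha/2} <= (2/alpha) * integral of g.
    level_count = sum(1 for gs in g_star if gs > half)
    maximal = level_count <= Fraction(2, alpha) * sum(g)
    return pointwise and maximal

rng = random.Random(3)
f = [rng.randint(0, 9) for _ in range(20)]  # nonnegative, as in Theorem 1.16
print(wiener_trick_holds(f, 6, 40))
```

Truncating the supremum keeps the example finite; both inequalities hold for every truncation, so letting `n_max` grow recovers the statement used in the proof.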
Exercises
1. Show that if $T$ is a positive contraction on $L^1$, $f\in L^1$, $0\le g\in L^1$, and
$$E = \left\{\sum_{k=0}^{n-1}T^kf>\sum_{k=0}^{n-1}T^kg \text{ for some } n\ge1\right\},$$
then $\int_Eg\,d\mu\le\int_Ef\,d\mu$.
2. Prove that the sets $A$, $\underline{A}$ are measurable.
3. Give examples for which $A\ne\underline{A}$, in both the flow and discrete cases.
4. Prove that if $\{T_t\}$ is ergodic, then $\mathcal{C}_\alpha\doteq X$ if and only if $\alpha = \int f\,d\mu$. (Hint: First consider the discrete case. If $\int f\,d\mu = 0$, use the tower decomposition of $X$ with respect to $A$ to prove that $\mu(\underline{A}) = 0$. Then consider the time 1 map of $\{T_t\}$ and $f_1(x) = \int_0^1f(T_sx)\,ds$.)
5. Prove Remark 1.11.
6. Fill in the details of the proof of Theorem 1.7 and prove Corollary 1.8.
7. Show that the Maximal Ergodic Theorem (see Section 2.2) still holds if we only require that $f^+\in L^1$, rather than $f\in L^1$.
8. If $f\in L^1$ and $\mathscr{I}$ is the family of invariant sets, then, without assuming ergodicity,
(a) in the discrete case $E(f\mid\mathscr{I}) = E(S_*f\cdot\chi_A + S^*f\cdot\chi_E\mid\mathscr{I})$ a.e.;
(b) in the flow case, $E(f\mid\mathscr{I}) = E(f\cdot(\chi_A + \chi_E)\mid\mathscr{I})$ a.e. (where $E = \{x: F_t(x)\le 0 \text{ for all } t>0\}$).
9. Work Exercise 8 on the basis of 1.12 and 1.3, avoiding ergodic decompositions. (Hint: (a) Consider first an integer-valued $f$, and work separately on the sets where $E(f\mid\mathscr{I})>0$, $E(f\mid\mathscr{I}) = 0$, and $E(f\mid\mathscr{I})<0$.)
3.2. More about the Pointwise Ergodic Theorem
This section includes two main topics. First, we will give a proof of Birkhoff's theorem which shows more clearly exactly how the maximal inequality produces the convergence theorem. The same type of argument will also appear in the next few sections, when we prove the Fundamental Theorem of Calculus and the existence of the real-variable and ergodic Hilbert transforms. Second, we will show that in general no statement can be made about the speed with which the convergence takes place in the Ergodic Theorem.
A. Maximal inequalities and convergence theorems. Already in 1925, Kolmogorov (1925) proved a maximal (or weak-type (1,1)) inequality for the conjugate of a harmonic function, and soon thereafter (Kolmogorov 1928, 1929-30) he established a weak-type (2,2) inequality for sums of independent random variables. Hardy and Littlewood (1930) further demonstrated the usefulness of maximal functions in analysis. They considered both the Hardy-Littlewood maximal function
$$Mf(\theta) = \sup_{0<|t|\le\pi}\frac1t\int_0^t|f(\theta+x)|\,dx\qquad(f\in L^1[-\pi,\pi])$$
and the nontangential maximal function
$$N_cF(\theta) = \sup_{z\in\Omega_c(\theta)}|F(z)|\qquad(F \text{ analytic on } D = \text{the open unit disk}),$$
where $\Omega_c(\theta)$ is as shown [Figure: the nontangential approach region $\Omega_c(\theta)$ in the unit disk $D$], and used these functions to study the nontangential convergence of analytic functions to their boundary values and what we now call Hardy spaces. F. Riesz (1931) based a direct proof of the Hardy-Littlewood Maximal Theorem (originally proved from its discrete version) on his Rising Sun Lemma and the (consequent) maximal inequality
$$m\{Mf>\alpha\}\le\frac2\alpha\int_{\{Mf>\alpha\}}|f|\,dm$$
for the Hardy-Littlewood maximal function. Riesz was primarily interested in applying these ideas to differentiation theory, and indeed he showed that the Fundamental Theorem of Calculus for the Lebesgue integral,
$$\frac1\varepsilon\int_0^\varepsilon[f(x-t) - f(x)]\,dt\to 0\quad\text{a.e. as } \varepsilon\to 0,$$
was easily proved via this approach. Later, F. Riesz contributed important simplifications to the proof of the Ergodic Theorem; as we have seen, however, it was because of Wiener (1939) and Yosida and Kakutani (1939) that the Maximal Ergodic Theorem achieved its central position in the argument. For certain stochastic processes $\{X_k(\omega): k = 1, 2, \dots\}$ (submartingales), maximal inequalities like
$$P\left\{\sup_{k\le n}X_k(\omega)\ge\alpha\right\}\le\frac1\alpha\int_{\{\sup_{k\le n}X_k\ge\alpha\}}X_n\,dP$$
were used by Paul Lévy (1937) and Jean Ville (1939); these led to the martingale convergence theorems of Doob (1940, 1953). Finally, we should mention that the discoveries of recent workers such as Fefferman and Stein (1972) and Calderón (1950, 1977) have led to a very deep understanding of maximal functions and, of course, have also produced a set of new problems of correspondingly greater depth. The basic reason why the maximal function and the maximal inequalities are so useful in analysis was already formulated by Banach (1926).
Theorem 3.1 Banach's Principle Let $1\le p<\infty$ and let $\{T_n\}$ be a sequence of bounded linear operators on $L^p$. If
$$T^*f = \sup_n|T_nf|<\infty\quad\text{a.e. for each } f\in L^p,$$
then the set of $f$ for which $\{T_nf\}$ converges a.e. is closed in $L^p$.
For the proof, see the Exercises. Frequently it is easy to exhibit a dense set $\mathscr{D}$ in $L^p$ such that $\{T_nf\}$ converges a.e. for each $f\in\mathscr{D}$. If one can prove that $T^*f<\infty$ a.e. for each $f\in L^p$, then the a.e. convergence of $\{T_nf\}$ for all $f\in L^p$ will follow from Banach's Principle. In most cases, that $T^*f<\infty$ a.e. for each $f\in L^p$ is established by proving that $T^*$ satisfies a weak-type $(p,q)$ inequality:
$$\mu\{T^*f>\alpha\}\le\left(\frac{C\|f\|_p}{\alpha}\right)^q.$$
(Indeed, Stein 1961, Sawyer 1966, and Garsia 1970 have shown that in many situations $T^*$ satisfies such an inequality if and only if $T^*f<\infty$ a.e.) Of course the Maximal Ergodic Theorem is an example of such a result. We will follow this scheme of proof several times in the following pages and will thus have a chance to see it operating in various contexts. First, let us apply it to the Ergodic Theorem.
Approximation proof of the Ergodic Theorem 2.2. We will prove the essential part, namely that the ergodic averages
$$\frac1n\sum_{k=0}^{n-1}fT^k$$
converge a.e.
First, if $g\in L^\infty$ and $f = g - gT$ (so that $f$ is a 'coboundary'), then
$$\frac1n\sum_{k=0}^{n-1}fT^k = \frac1n\sum_{k=0}^{n-1}(gT^k - gT^{k+1}) = \frac1n(g - gT^n)\to 0\quad\text{a.e.}$$
Also, if $f\in L^1$ is invariant ($fT = f$ a.e.), then
$$\frac1n\sum_{k=0}^{n-1}fT^k = f\to f\quad\text{a.e.}$$
Thus if
$$\mathscr{D} = \{f + (g - gT): f\in L^1 \text{ is invariant}, g\in L^\infty\}$$
and
$$\mathscr{C} = \left\{f\in L^1: \frac1n\sum_{k=0}^{n-1}fT^k \text{ converges a.e.}\right\},$$
then $\mathscr{D}\subset\mathscr{C}$ and $\mathscr{C}$ is a linear subspace of $L^1$. We only need to show, then, that (1) $\mathscr{D}$ is dense in $L^1$ and (2) $\mathscr{C}$ is closed in $L^1$. In order to prove (1), let
$$\mathscr{D}_2 = \{f + (g - gT): f\in L^2 \text{ is invariant}, g\in L^\infty\}.$$
We will show that $\mathscr{D}_2$ is dense in $L^2$, and hence in $L^1$ (recall that on a finite measure space $\|\cdot\|_1\le\|\cdot\|_2$ by Hölder's Inequality). If $h\in L^2$ and $h\perp(g - gT)$ for all $g\in L^\infty$, then
$$\int_Ah\,d\mu = \int_AhT^{-1}\,d\mu\quad\text{for all } A\in\mathscr{B},$$
and hence $h = hT^{-1}$ a.e.; if moreover $h\perp\mathscr{D}_2$, then $h$ is orthogonal to itself, so $h = 0$. This implies that $L^2$ is even the orthogonal direct sum of the invariant functions and the closure of $\{g - gT: g\in L^\infty\}$. At any rate, $\mathscr{D}_2$ is dense in $L^2$ and hence in $L^1$. To prove (2), let
$$A_nf = \frac1n\sum_{k=0}^{n-1}fT^k$$
and note that by the Maximal Ergodic Theorem
$$\mu\left\{\sup_n|A_nf|>\alpha\right\}\le\mu\left\{\sup_nA_n|f|>\alpha\right\}\le\frac1\alpha\|f\|_1.$$
Suppose then that $f_k\in\mathscr{C}$ and $f_k\to f$ in $L^1$; we will prove that $\{A_nf\}$ is fundamental (i.e., Cauchy) a.e., so that also $f\in\mathscr{C}$. Now
$$|A_mf - A_nf|\le|A_mf_k - A_nf_k| + |A_m(f - f_k)| + |A_n(f - f_k)|,$$
and the first term tends to 0 a.e. as $m,n\to\infty$, since $\{A_nf_k\}$ is fundamental a.e.; therefore, for each $\alpha>0$,
$$\mu\left\{\limsup_{m,n\to\infty}|A_mf - A_nf|>\alpha\right\}\le\mu\left\{2\sup_n|A_n(f - f_k)|>\alpha\right\}\le\frac2\alpha\|f - f_k\|_1,$$
which can be made arbitrarily small by a suitable choice of $k$. Thus
$$\mu\left\{\limsup_{m,n\to\infty}|A_mf - A_nf|>\alpha\right\} = 0\quad\text{for each } \alpha>0,$$
and hence, letting $\alpha\to 0$, we see that $\{A_nf\}$ is fundamental a.e.
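The first step of the approximation proof, the telescoping of ergodic averages of a coboundary, is easy to see numerically. In the Python sketch below (our own illustration) $T$ is taken to be the rotation $x\mapsto x+\theta \pmod 1$ with a hypothetical $\theta = \sqrt2 - 1$, and $g(x) = \cos 2\pi x$; the averages of $f = g - gT$ along an orbit equal $(g - gT^n)/n$, up to rounding, and so are bounded by $2\sup|g|/n = 2/n$.

```python
import math

THETA = math.sqrt(2) - 1  # hypothetical irrational rotation number

def T(x):
    return (x + THETA) % 1.0

def g(x):
    return math.cos(2 * math.pi * x)

def coboundary_average(x, n):
    """Return ((1/n) * sum_{k<n} f(T^k x), (g(x) - g(T^n x)) / n) for f = g - g∘T."""
    total, y = 0.0, x
    for _ in range(n):
        total += g(y) - g(T(y))  # f(T^k x) with y = T^k x
        y = T(y)                 # after the loop, y = T^n x
    return total / n, (g(x) - g(y)) / n

for n in (10, 100, 1000):
    avg, telescoped = coboundary_average(0.123, n)
    print(n, avg, telescoped, 2 / n)
```

The rate $O(1/n)$ is special to coboundaries with bounded transfer function $g$; the discussion of the speed of convergence that follows shows that no such rate holds for general $f$.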
B. The speed of convergence in the Ergodic Theorem. Suppose that $T: X\to X$ is an ergodic m.p.t., so that
$$\frac1n\sum_{k=0}^{n-1}f(T^kx)\to\int f\,d\mu\quad\text{a.e. for each } f\in L^1.$$
For each $n\ge1$, let $S_nf(x) = f(x) + f(Tx) + \dots + f(T^{n-1}x)$. Let us consider only functions $f\in L^1$ for which $\int f\,d\mu = 0$. For such an $f$,
$$\frac{S_nf}{n}\to 0\quad\text{a.e.}$$
It is not unreasonable to suppose that for some $T$ a series such as
$$\sum_{n=1}^\infty\frac{S_nf}{n^2}$$
might converge a.e. After all, not only does
$$\frac{S_nf(x)}{n}\to 0\quad\text{a.e.},$$
but it usually changes sign frequently enough so that the convergence of this series, even if not absolute, might be enhanced by a great deal of cancellation, as in the case of an alternating series. Indeed, the series does
94
3. More about almost everywhere convergence
converge a.e. in many cases: if the f r are independent, as when T is a Bernoulli shift and f is a coordinate function (by the Law of the Iterated Logarithm); if Tx = x + a (mod 1), f is the characteristic function of an interval, and a is not a Liouville number, i.e., for almost all a (by numbertheoretic estimates on the `discrepancy'); iff=g—gT for some gel...'; or if f is a nonconstant eigenfunction of T. (For sums of independent random variables X k , the Law of Random Signs asserts that Eak X k converges a.e. if Eak2 < cc. The definitive result along these lines is Kolmogorov's Three Series Theorem: If {Xn } are independent, then EX,, converges a.e. if and only if each of the following three series does, for some AER:
Eptix„I-4. ygx(A) n ') yvar(V)) n Here X.(À) is X , truncated at + ),: v)(co) = .1 X n(co) if I Xn (c0)1 , 0 otherwise. Statements about series E akf(rx) and yb n (Sj(x))/n are related because of partial summation.) We will see, however, that the series Snf(x) L /1 2 ).
n=1
does not in general converge a.e., and that in fact (S,f(x))/n does not tend to 0 fast enough to make any divergent series convergent: given any sequence b„ -- 0 with y,°,0 b.= oc, and any ergodic T, there is an feLœ such that ,
Eœb.S "S'(fn x)
a.e.
n= Y
Thus, no general statement can be made about the speed with which convergence takes place in the Ergodic Theorem. We will actually construct such a function $f$ for each given sequence $\{b_k\}$ and each ergodic $T$, using one of the basic tools of ergodic theory, the Kakutani-Rokhlin tower (see Lemma 2.4.7). (A category argument for the existence of such functions has been given by del Junco and Rosenblatt (1979).)
Theorem 2.3 (Kakutani and Petersen 1981) Let $T: X \to X$ be an ergodic m.p.t. on a nonatomic probability space $(X, \mathcal{B}, \mu)$, and let $b_k \ge 0$ be any sequence for which
$$\sum_{k=1}^\infty b_k = \infty.$$
Then there is an $f \in L^\infty(X)$ with $\int f\,d\mu = 0$ for which
$$\sup_n \Big|\sum_{k=1}^n b_k \frac{S_k f(x)}{k}\Big| = \infty \quad \text{a.e.};$$
in particular, the series
$$\sum_{n=1}^\infty b_n \frac{S_n f(x)}{n}$$
diverges a.e.
Proof: The function $f$ will be defined as an infinite series,
$$f = \sum_{n=1}^\infty \frac{f_n}{K_n}.$$
The constants $K_n \nearrow \infty$ and a sequence $I_n \nearrow \infty$ will be defined inductively. Each $f_n$ will be a simple function with respect to a Kakutani-Rokhlin tower in $X$ (i.e., constant on each set of the tower). Let
$$G_I f(x) = \sum_{k=1}^{I} b_k \frac{S_k f(x)}{k}.$$
We will show that for large $N$, $|G_{I_N} f|$ is very large on a set of measure very close to 1. That is, there are sequences $M_N \nearrow \infty$ and $\varepsilon_N \searrow 0$ such that
$$\mu\{x : |G_{I_N} f(x)| \ge M_N\} \ge 1 - \varepsilon_N.$$
This estimate will be carried out by splitting the series
$$G_{I_N} f(x) = \sum_{k=1}^{I_N} b_k \frac{S_k f(x)}{k} = \sum_{n=1}^\infty \frac{1}{K_n}\sum_{k=1}^{I_N} b_k \frac{S_k f_n(x)}{k}$$
into three terms: those for which $n < N$, those for which $n > N$, and the one for which $n = N$. We then show that the term with $n = N$ dominates the other two.
To begin, let $K_1 = 1$ and $I_1 = 1$, and find a tower of height 2 in $X$. That is, find a measurable set $A_1 \subset X$ such that $A_1 \cap TA_1 = \emptyset$. Define $f_1$ by $f_1 = 1$ on $A_1$, $f_1 = -1$ on $TA_1$, and $f_1 = 0$ on $X \setminus (A_1 \cup TA_1)$. Notice that $S_k f_1(x)$ takes only the values $1$, $0$, and $-1$, for all $k$ and $x$.
Suppose that $K_j$, $I_j$, and $f_j$ have already been defined for all $j < n$, and that
$$c_j = \sup_k \sup_x |S_k f_j(x)| < \infty \quad (j < n).$$
(1) Choose $K_n$ large enough that
(a) $K_n \ge 2K_{n-1}$, and
(b) $\frac{1}{K_n}\sum_{k=1}^{I_{n-1}} b_k \le \frac{1}{2^n}$.
(2) Choose $I_n$ large enough that
(a) $I_n \ge I_{n-1}$,
(b) $\sum_{k=1}^{I_n} b_k \ge nK_n$, and
(c) $2K_nC_n\sum_{k=1}^{I_n} \frac{b_k}{k} \le \sum_{k=1}^{I_n} b_k$, where $C_n = \sum_{j<n} c_j/K_j$.
This is possible since $\sum_{k=1}^\infty b_k = \infty$ implies that
$$\frac{\sum_{k=1}^{I} b_k/k}{\sum_{k=1}^{I} b_k} \to 0 \quad \text{as } I \to \infty.$$
(3) $f_n$ is defined by using a tower of height $2nI_n$ in $X$, with 'remainder' having measure no greater than $1/n$. That is, find a measurable set $A_n$ such that $A_n, TA_n, \ldots, T^{2nI_n - 1}A_n$ are pairwise disjoint and
$$\mu\Big(\bigcup_{i=0}^{2nI_n - 1} T^i A_n\Big) \ge 1 - \frac{1}{n}.$$
We think of the tower as consisting of a lower half, $n$ blocks each of height $I_n$, and an upper half, again $n$ blocks each of height $I_n$.
The function $f_n$ is defined to be $1$ on the entire lower half of the tower, $-1$ on the entire upper half, and $0$ on the remainder:
$$f_n = \begin{cases} 1 & \text{on } T^jA_n \text{ for } 0 \le j \le nI_n - 1,\\ -1 & \text{on } T^jA_n \text{ for } nI_n \le j \le 2nI_n - 1,\\ 0 & \text{on } X \setminus \bigcup_{i=0}^{2nI_n-1} T^iA_n.\end{cases}$$
Notice that $f_n \in L^\infty$ and $\int f_n\,d\mu = 0$. Moreover, $S_k f_n(x)$ takes values only in $[-nI_n, nI_n]$, for all $k$ and $x$. Thus $c_n < \infty$. The $K_n$, $I_n$, and $f_n$ are therefore defined by induction for all $n = 1, 2, \ldots$. We let
$$f(x) = \sum_{n=1}^\infty \frac{f_n(x)}{K_n}.$$
This series converges in $L^\infty$, since $|f_n| \le 1$ a.e. and $\sum_n 1/K_n < \infty$ (by 1(a)). We have $f \in L^\infty$ and $\int f\,d\mu = 0$. We will now show that, for large $N$, $|G_{I_N} f(x)|$ is very large on a set of measure very close to 1. We may write
$$G_{I_N} f(x) = \sum_{k=1}^{I_N} \frac{b_k}{k} S_k f(x) = \sum_{n<N} \frac{1}{K_n}\sum_{k=1}^{I_N} \frac{b_k}{k} S_k f_n(x) + \sum_{n>N} \frac{1}{K_n}\sum_{k=1}^{I_N} \frac{b_k}{k} S_k f_n(x) + \frac{1}{K_N}\sum_{k=1}^{I_N} \frac{b_k}{k} S_k f_N(x) = \mathrm{I} + \mathrm{II} + \mathrm{III}.$$
Now
$$|\mathrm{I}| \le \sum_{n<N}\frac{1}{K_n}\sum_{k=1}^{I_N}\frac{b_k}{k}|S_k f_n(x)| \le \Big(\sum_{n<N}\frac{c_n}{K_n}\Big)\sum_{k=1}^{I_N}\frac{b_k}{k} = C_N\sum_{k=1}^{I_N}\frac{b_k}{k} \le \frac{1}{2K_N}\sum_{k=1}^{I_N} b_k \quad \text{(by 2(c))}.$$
Further, $|S_k f_n(x)| \le k$ for all $k$ and $x$, and $I_N \le I_{n-1}$ when $n > N$. Hence
$$|\mathrm{II}| \le \sum_{n>N}\frac{1}{K_n}\sum_{k=1}^{I_N} b_k \le \sum_{n>N}\frac{1}{K_n}\sum_{k=1}^{I_{n-1}} b_k \le \sum_{n>N}\frac{1}{2^n} \le 1 \quad \text{(by 1(b))}.$$
However, $S_k f_N(x) = k$ for $k = 1, \ldots, I_N$ if $x$ is in the bottom $N - 1$ blocks of the lower half of the $N$th tower. Likewise, $S_k f_N(x) = -k$ for $k = 1, \ldots, I_N$ if $x$ is in the bottom $N - 1$ blocks of the upper half of the $N$th tower. Call the union of these two sets $E_N$. More precisely:
- $S_k f_N(x) = k$ for $k = 1, \ldots, I_N$ if $x \in T^jA_N$ and $0 \le j \le (N-1)I_N - 1$;
- $S_k f_N(x) = -k$ for $k = 1, \ldots, I_N$ if $x \in T^jA_N$ and $NI_N \le j \le (2N-1)I_N - 1$.

The $2NI_N$ levels of the tower have equal measure and total measure at least $1 - 1/N$, and $E_N$ consists of $2(N-1)I_N$ of them. Therefore
$$\mu(E_N) = \mu\Big(\bigcup_{0\le j\le (N-1)I_N - 1} T^jA_N \ \cup \bigcup_{NI_N\le j\le(2N-1)I_N - 1} T^jA_N\Big) \ge \frac{N-1}{N}\Big(1 - \frac{1}{N}\Big) = \frac{(N-1)^2}{N^2}.$$
Therefore, on $E_N$,
$$|\mathrm{III}| = \frac{1}{K_N}\sum_{k=1}^{I_N} b_k \ge N \quad \text{(by 2(b))},$$
and hence
$$|G_{I_N} f(x)| \ge |\mathrm{III}| - |\mathrm{I}| - |\mathrm{II}| \ge \frac{1}{2K_N}\sum_{k=1}^{I_N} b_k - 1 \ge \frac{N}{2} - 1 \quad \text{for } x \in E_N.$$
Thus we may take $M_N = N/2 - 1$ and $\varepsilon_N = 1 - (N-1)^2/N^2$. Every $x$ that lies in infinitely many of the $E_N$ has $\sup_n |G_n f(x)| = \infty$. Since $\mu(\limsup_N E_N) \ge \limsup_N \mu(E_N) = 1$, we have proved that
$$\mu\Big\{x : \sup_n\Big|\sum_{k=1}^n b_k\frac{S_k f(x)}{k}\Big| = \infty\Big\} = 1.$$
For $\{fT^k\}$ independent (and $f \in L^2$) we have the Law of the Iterated Logarithm (Khintchine 1923, Kolmogorov 1929), according to which $|S_n f(x)| = O(\sqrt{n\log\log n})$ a.e. Because of this Theorem, however, no analogous statement can be made for stationary stochastic processes in general, for any choice of the comparison speed. Some applications of this result to questions about the convergence of series of the form
$$\sum_{k=1}^\infty a_k f(T^k x)$$
are mentioned in the Exercises and in Section 3.6.
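The block structure of $f_n$ used in the proof of Theorem 2.3 can be checked on a toy model of one orbit segment. An orbit can enter the tower only through its base $A_n$, since $T$ maps each level $T^{i-1}A_n$ onto the next one. So, in this model, an orbit finishes its current pass, spends some time in the remainder, and then makes full passes from the base. The orbit model and all names are assumptions made for illustration. The sketch checks that $S_kf_n = \pm k$ for $k \le I_n$ on the bottom $n - 1$ blocks of each half, and that $|S_kf_n| \le nI_n$ everywhere.

```python
from itertools import accumulate
from typing import List


def tower_f_values(n: int, block: int, start_level: int, gaps: List[int]) -> List[int]:
    """Values f_n(x), f_n(Tx), f_n(T^2 x), ... along a model orbit segment.

    Toy model (an assumption, not from the text): the tower has 2*n*block
    levels T^j A_n, f_n = +1 on levels j < n*block, -1 on the other levels,
    and 0 off the tower.  The orbit starts on level `start_level`, climbs to
    the top, and then, for each entry of `gaps`, spends that many steps in the
    remainder before re-entering at the base A_n for one full pass.
    """
    height = 2 * n * block

    def level_value(j: int) -> int:
        return 1 if j < n * block else -1

    values = [level_value(j) for j in range(start_level, height)]
    for gap in gaps:
        values.extend([0] * gap)  # time spent off the tower
        values.extend(level_value(j) for j in range(height))  # one full pass
    return values


def partial_sums(values: List[int]) -> List[int]:
    """S_1 f_n(x), S_2 f_n(x), ... for the orbit segment."""
    return list(accumulate(values))


if __name__ == "__main__":
    n, block = 3, 4
    sums = partial_sums(tower_f_values(n, block, start_level=2, gaps=[1, 0]))
    print(sums[:block], max(abs(v) for v in sums), n * block)
```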
Exercises
1. Prove Banach's Principle. (Hint: First use the Baire Category Theorem to show the following. If $T^*f < \infty$ a.e. for every $f \in L^p$, then there is a function $\phi(\alpha)$ such that $\phi(\alpha) \to 0$ as $\alpha \to \infty$ and $\mu\{T^*f > \alpha\|f\|_p\} \le \phi(\alpha)$ for all $f \in L^p$ and all $\alpha > 0$. For this purpose, consider the set
$$F_m = \{f \in L^p : \mu\{T^*f > m\} \le \varepsilon\} = \bigcap_n\Big\{f \in L^p : \mu\{\sup_{k\le n}|T_kf| > m\} \le \varepsilon\Big\}.$$
Then proceed as in 2.2.)
2. Extend the approximation proof of the Ergodic Theorem to the infinite measure case. (Hint: $\{f + g - gT : f, g \in L^2 \text{ and } f = fT \text{ a.e.}\}$ is dense in $L^2$, and weak (1, 1) implies (using the integration technique on p. 88) that $\|T^*f\|_2 \le c\|f\|_2$ for $f \in L^2$.)
3. Prove that no statement concerning the speed of convergence in the Ergodic Theorem is possible, in the following sense. Given any ergodic transformation $T$ on a nonatomic probability space and any sequence $a_n \searrow 0$, there is a bounded measurable function $f$ on $X$ such that
$$\limsup_{n\to\infty}\frac{\big|[S_nf(x)/n] - \int f\,d\mu\big|}{a_n} = \infty \quad \text{a.e.}$$
(See Krengel (1978), where it is shown that if $X = [0, 1)$, then $f$ can in fact be chosen to be continuous or the characteristic function of a measurable set.)
4. Suppose $c_k \ge c_{k+1} \ge 0$ for all $k = 1, 2, \ldots$, $\{kc_k\}$ is bounded, and $\sum c_k = \infty$. Prove that for any ergodic transformation $T$ on a nonatomic probability space $X$, there is a function $f \in L^\infty(X)$ with $\int f\,d\mu = 0$ such that
$$\sum_{k=1}^\infty c_kf(T^kx)$$
diverges a.e. Thus series like
$$\sum_{k=2}^\infty\frac{f(T^kx)}{k\log k}, \qquad \sum_{k=3}^\infty\frac{f(T^kx)}{k\log k\log\log k}, \quad \text{etc.}$$
can be made to diverge a.e. (But see 3.6.) (Hint: Let $b_k = k(c_k - c_{k+1})$ and use partial summation.)
5. Prove the Ulam-von Neumann Random Ergodic Theorem (Ulam and von Neumann 1945). Suppose that $(X, \mathcal{B}, \mu)$ and $(\Gamma, \mathcal{G}, \nu)$ are probability spaces, and that for each $\gamma \in \Gamma$ there is a m.p.t. $T_\gamma : X \to X$. Assume that for each measurable $f$ on $X$, the function $f(T_\gamma x)$ is jointly measurable in $x$ and $\gamma$. Let $(\Omega, \mathcal{A}, P) = \prod_{n=0}^\infty (\Gamma, \mathcal{G}, \nu)$ be the product measure space; each $\omega \in \Omega$ is a sequence $\omega = \{\gamma_n(\omega)\}$ of elements of $\Gamma$. If $f \in L^1(X)$, then for almost all $\omega \in \Omega$ there is $\bar f_\omega \in L^1(X)$ such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n} f\big(T_{\gamma_{k-1}(\omega)}\cdots T_{\gamma_1(\omega)}T_{\gamma_0(\omega)}x\big) = \bar f_\omega(x) \quad \text{a.e. } d\mu.$$
(Hint: Apply the Ergodic Theorem to a suitable skew product.)
6. (R. L. Jones) Let $T: X \to X$ be a m.p.t. and $f \in L^1(X)$.
(a) Show that $f(T^kx)/k \to 0$ a.e.
(b) Show that if $p > 1$, then
$$\sum_{k=1}^\infty\Big(\frac{|f(T^kx)|}{k}\Big)^p$$
converges a.e. (Hint: Assume $f \ge 0$ and show that
$$\mu\Big\{x : \Big[\sum_{k=1}^\infty\Big(\frac{f(T^kx)}{k}\Big)^p\Big]^{1/p} > \alpha\Big\} \le \frac{c\|f\|_1}{\alpha}.$$
Do this by writing $f(T^kx)/k = g_k(x) + h_k(x)$, where $g_k$ is a suitable truncation of $fT^k/k$ and $h_k = fT^k/k - g_k$, and establishing such inequalities for $g$ and $h$ separately. Recall that $\int |g_k(x)|^p\,d\mu = \int_0^\infty p\beta^{p-1}\mu\{|g_k| > \beta\}\,d\beta$.)
(c) If $p > 1$, then
$$\sum_{k=1}^\infty\frac{f(T^kx)}{k^p}$$
converges a.e. (Hint: Use partial summation.)
3.3. Differentiation of integrals and the Local Ergodic Theorem
For a real-valued function $g \in L^1(\mathbb{R})$, we define the Hardy-Littlewood maximal function $Mg$ of $g$ by
$$Mg(t) = \sup_{\varepsilon > 0} \frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} |g(t + s)|\,ds.$$
We first prove the original maximal inequality of F. Riesz.
Theorem 3.1 (F. Riesz 1931) If $g \in L^1(\mathbb{R})$ and $\lambda > 0$, then
$$m\{x \in \mathbb{R} : Mg(x) > \lambda\} \le \frac{2}{\lambda}\|g\|_1.$$
Proof: Let $A \subset \{x : Mg(x) > \lambda\}$ be closed and bounded. For each $x \in A$ there is an open interval $I_x$ centered at $x$ with
$$\frac{1}{m(I_x)}\int_{I_x} |g|\,dm > \lambda.$$
A finite subcollection $I_1, I_2, \ldots, I_n$ of these still covers $A$. We may assume that no point of $\mathbb{R}$ belongs to more than two of the $I_k$. (Suppose $t$ is in three $I_k$'s. The interval with left endpoint farthest from $t$ and the one with right endpoint farthest from $t$ cover the union of the three intervals, so the third may be discarded.) Thus
$$\sum_k \chi_{I_k} \le 2,$$
so that
$$m(A) \le \sum_k m(I_k) \le \frac{1}{\lambda}\sum_k \int_{I_k} |g|\,dm = \frac{1}{\lambda}\int |g|\sum_k \chi_{I_k}\,dm \le \frac{2}{\lambda}\|g\|_1.$$
Since $A$ was an arbitrary closed and bounded subset of $\{Mg > \lambda\}$, the Theorem follows.
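Theorem 3.2 Lebesgue Differentiation Theorem (Lebesgue 1904) If $g$ is locally integrable on $\mathbb{R}$ (i.e., $g \in L^1(K)$ for each compact $K \subset \mathbb{R}$), then
$$\lim_{\varepsilon \to 0}\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} g(t + s)\,ds = g(t) \quad \text{a.e.}$$
Proof: Fix an open interval $I$ and suppose that $g \in L^1(I)$; we will show that the formula holds a.e. on $I$. If $g$ is continuous, then, as in elementary calculus,
$$\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} |g(t + s) - g(t)|\,ds$$
can easily be made arbitrarily small by taking $\varepsilon$ small enough. Let
$$A_\varepsilon g(t) = \frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} g(t + s)\,ds$$
and define
$$\mathcal{C} = \{g \in L^1(I) : A_\varepsilon g \to g \text{ a.e. on } I\}.$$
We have just seen that $\mathcal{C}$ contains the continuous functions on $I$, a dense set in $L^1(I)$, so it is enough to prove that $\mathcal{C}$ is closed in $L^1(I)$.
Suppose then that $g_k \in \mathcal{C}$ and $g_k \to g$ in $L^1(I)$, and extend $g - g_k$ by 0 outside $I$. We have
$$|A_\varepsilon g(t) - g(t)| \le |A_\varepsilon(g - g_k)(t)| + |A_\varepsilon g_k(t) - g_k(t)| + |g_k(t) - g(t)|$$
and $\limsup_{\varepsilon\to 0} |A_\varepsilon g_k(t) - g_k(t)| = 0$ a.e. on $I$. Hence, by Theorem 3.1, for each $\lambda > 0$,
$$m\{t \in I : \limsup_{\varepsilon\to0} |A_\varepsilon g(t) - g(t)| > \lambda\} \le m\{t \in I : M(g - g_k)(t) > \tfrac{\lambda}{2}\} + m\{t \in I : |g_k(t) - g(t)| > \tfrac{\lambda}{2}\}$$
$$\le \frac{4}{\lambda}\|g - g_k\|_1 + m\{t \in I : |g_k(t) - g(t)| > \tfrac{\lambda}{2}\},$$
which tends to 0 as $k \to \infty$. (Recall that $L^1$ convergence implies convergence in measure.) Thus
$$m\{t \in I : \limsup_{\varepsilon\to0} |A_\varepsilon g(t) - g(t)| > \lambda\} = 0 \quad \text{for each } \lambda > 0,$$
and letting $\lambda \to 0$ shows that
$$\limsup_{\varepsilon\to0} |A_\varepsilon g(t) - g(t)| = 0 \quad \text{a.e.}$$
Remark 3.3 This result can be extended to show that
$$\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} |g(t + s) - g(t)|\,ds \to 0 \quad \text{for a.e. } t.$$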
The set of $t$ for which this happens is called the Lebesgue set of $g$.
Theorem 3.4 Local Ergodic Theorem (Wiener 1939) Let $\{T_t\}$ be a measure-preserving flow on a probability space $(X, \mathcal{B}, \mu)$, and suppose that $f \in L^1(X)$. Then
$$\lim_{\varepsilon\to0^+}\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} f(T_s x)\,ds = f(x) \quad \text{a.e. } d\mu.$$
Proof: Direct proofs are also possible. The result would also follow immediately from the representation of $\{T_t\}$ as a flow under a function, if this could be established independently. We will instead give Wiener's original argument, which uses the Lebesgue Differentiation Theorem and Fubini's Theorem. There is a set of $x$ of full measure for which $f(T_s x)$, as a function of $s \in \mathbb{R}$, is locally integrable on $\mathbb{R}$. For any fixed $x$ in this set, the preceding Theorem shows that the set of $t \in \mathbb{R}$ for which
$$\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} f(T_sT_tx)\,ds \quad\text{does not converge to}\quad f(T_tx)$$
has measure 0. Hence
$$\{(t, x) : \text{convergence fails at } T_tx\}$$
has product measure 0, and therefore for almost all $t$
$$\mu\{x : \text{convergence fails at } T_tx\} = 0.$$
Choose and fix a single such $t$, and let $F = \{x : \text{convergence fails at } x\}$. Then $0 = \mu\{x : T_tx \in F\} = \mu(T_{-t}F) = \mu(F)$, which proves the Theorem.
3.4. The martingale convergence theorems
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ an increasing sequence of sub-σ-algebras of $\mathcal{F}$. Consider a sequence $X_1, X_2, \ldots$ of functions in $L^1(\Omega)$ such that $X_n$ is measurable with respect to $\mathcal{F}_n$ for $n = 1, 2, \ldots$. It is called
a submartingale if $E(X_{n+1} \mid \mathcal{F}_n) \ge X_n$ a.e.,
a martingale if $E(X_{n+1} \mid \mathcal{F}_n) = X_n$ a.e., and
a supermartingale if $E(X_{n+1} \mid \mathcal{F}_n) \le X_n$ a.e.
If $X_n$ is the fortune of a gambler after $n$ plays, then in the case of a submartingale the game is favorable to the player. The word 'martingale' was formerly used for a certain betting system, possibly appropriated for that use from its usual meaning as part of a bridle. The word 'supermartingale' is unfortunate: it is in no way connected with grocery stores. Here $\{X_n\}$ is properly called a martingale (or submartingale or supermartingale) with respect to $\{\mathcal{F}_n\}$. The simple word 'martingale' may be reserved for the case when $\mathcal{F}_n$ is the σ-algebra generated by $X_1, X_2, \ldots, X_n$. The fundamental result about martingales is the Martingale Convergence Theorem of Doob, which is comparable in importance as well as content to the Ergodic Theorem. The following extremely elementary proof was shown to me by J. Horowitz. In contrast to the usual proofs and the other arguments in this chapter, it relies on no maximal inequalities or upcrossing lemmas. It is no harder to prove the a.e. convergence of $L^1$-bounded submartingales, and we will do so. First a simple decomposition will reduce the statement to showing that every non-negative supermartingale converges. The proof will then be completed with the help of an elementary optional sampling (i.e., stopping time) theorem.
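Theorem 4.1 Submartingale Convergence Theorem Every $L^1$-bounded submartingale (i.e., $\sup_n E|X_n| < \infty$) converges a.e.
Lemma 4.2 (Krickeberg Decomposition; see Neveu 1975) Every $L^1$-bounded submartingale $\{X_n\}$ is the difference of a non-negative martingale $\{M_n\}$ and a non-negative supermartingale $\{S_n\}$: $X_n = M_n - S_n$ for all $n$.
Proof: Notice that $\{X_n^+\}$ is a non-negative submartingale, since
$$E(X_{n+1}^+ \mid \mathcal{F}_n) \ge E(X_{n+1} \mid \mathcal{F}_n) \ge X_n \quad\text{and}\quad E(X_{n+1}^+ \mid \mathcal{F}_n) \ge 0.$$
Since $X_n^+$ is either $X_n$ or $0$, it follows that
$$E(X_{n+1}^+ \mid \mathcal{F}_n) \ge X_n^+ \quad \text{a.e. for all } n.$$
For fixed $m \le n$,
$$E(X_{n+1}^+ \mid \mathcal{F}_m) = E\big(E(X_{n+1}^+ \mid \mathcal{F}_n) \mid \mathcal{F}_m\big) \ge E(X_n^+ \mid \mathcal{F}_m),$$
so that $E(X_n^+ \mid \mathcal{F}_m)$ increases in $n$ for fixed $m$ and hence has an increasing limit
$$M_m = \lim_{n\to\infty}\uparrow E(X_n^+ \mid \mathcal{F}_m).$$
I claim that $\{M_m\}$ is a martingale. Indeed, by the Monotone Convergence Theorem for conditional expectations,
$$E(M_{m+1} \mid \mathcal{F}_m) = E\big(\lim_n\uparrow E(X_n^+ \mid \mathcal{F}_{m+1}) \mid \mathcal{F}_m\big) = \lim_n\uparrow E\big(E(X_n^+ \mid \mathcal{F}_{m+1}) \mid \mathcal{F}_m\big) = \lim_n\uparrow E(X_n^+ \mid \mathcal{F}_m) = M_m.$$
Also, each $M_m \in L^1$, since
$$E(M_m) = \lim_n\uparrow E\big(E(X_n^+ \mid \mathcal{F}_m)\big) = \lim_n\uparrow E(X_n^+) \le \sup_n E|X_n| < \infty.$$
Then let $S_n = M_n - X_n$ for $n = 1, 2, \ldots$. Clearly $S_n \ge 0$ a.e., since $M_n \ge E(X_n^+ \mid \mathcal{F}_n) = X_n^+ \ge X_n$, and each $S_n$ is in $L^1(\Omega)$. Also,
$$E(S_{n+1} \mid \mathcal{F}_n) = E(M_{n+1} \mid \mathcal{F}_n) - E(X_{n+1} \mid \mathcal{F}_n) \le M_n - X_n = S_n \quad \text{a.e.},$$
so that $\{S_n\}$ is a non-negative supermartingale.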
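Lemma 4.3 (Optional Sampling) Let $\{X_n\}$ be a non-negative supermartingale with respect to $\{\mathcal{F}_n\}$. Let $\sigma, \tau : \Omega \to \{1, 2, \ldots, \infty\}$ be stopping times (i.e., $\{\omega : \sigma(\omega) \le n\} \in \mathcal{F}_n$ for all $n = 1, 2, \ldots$, and similarly for $\tau$). Define
$$X_\tau(\omega) = \begin{cases} X_{\tau(\omega)}(\omega) & \text{if } \tau(\omega) < \infty,\\ 0 & \text{if } \tau(\omega) = \infty,\end{cases}$$
and similarly for $X_\sigma(\omega)$. If $\sigma(\omega) \le \tau(\omega)$ a.e., then
$$E(X_\sigma) \ge E(X_\tau).$$
(In the case of a supermartingale, a gambler's fortune tends to decrease with time. According to this Lemma, the sooner he stops, the greater his expected fortune.)
Proof: Fix $m \ge 1$. If $n \ge m$, then
$$\int_{\{\sigma=m\}} X_{\tau\wedge n}\,dP = \int_{\{\sigma=m,\,\tau\le n\}} X_\tau\,dP + \int_{\{\sigma=m,\,\tau>n\}} X_n\,dP$$
$$\ge \int_{\{\sigma=m,\,\tau\le n\}} X_\tau\,dP + \int_{\{\sigma=m,\,\tau>n\}} X_{n+1}\,dP = \int_{\{\sigma=m\}} X_{\tau\wedge(n+1)}\,dP.$$
The inequality holds because $\{\sigma = m, \tau > n\} \in \mathcal{F}_n$ and the supermartingale condition says that $\int_A X_{n+1}\,dP \le \int_A X_n\,dP$ if $A \in \mathcal{F}_n$.
Also, on $\{\sigma = m\}$ we have $m = \sigma \le \tau$, so that
$$\int_{\{\sigma=m\}} X_\sigma\,dP = \int_{\{\sigma=m\}} X_m\,dP = \int_{\{\sigma=m\}} X_{\tau\wedge m}\,dP.$$
By the first part of the proof, for $n \ge m$ this is greater than or equal to
$$\int_{\{\sigma=m\}} X_{\tau\wedge n}\,dP \ge \int_{\{\sigma=m,\,\tau<\infty\}} X_{\tau\wedge n}\,dP$$
(since each $X_n \ge 0$). Thus
$$\int_{\{\sigma=m\}} X_\sigma\,dP \ge \int_{\{\sigma=m,\,\tau<\infty\}} X_{\tau\wedge n}\,dP \quad \text{for all } n \ge m.$$
We want to apply Fatou's Lemma. Note that $X_{\tau\wedge n} \to X_\tau$ as $n \to \infty$ on $\{\sigma = m, \tau < \infty\}$, and $X_\tau = 0$ on $\{\tau = \infty\}$. Thus
$$\int_{\{\sigma=m\}} X_\sigma\,dP \ge \liminf_n \int_{\{\sigma=m,\,\tau<\infty\}} X_{\tau\wedge n}\,dP \ge \int_{\{\sigma=m,\,\tau<\infty\}} X_\tau\,dP = \int_{\{\sigma=m\}} X_\tau\,dP.$$
Now sum on $m$. Since $X_\sigma = X_\tau = 0$ a.e. on $\{\sigma = \infty\}$ (where also $\tau = \infty$), this gives
$$E(X_\sigma) = \int X_\sigma\,dP \ge \int X_\tau\,dP = E(X_\tau).$$
Corollary 4.4 Under the hypotheses of Lemma 4.3, $X_\tau \in L^1$.
Proof: Take $\sigma \equiv 1$.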
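Proof 4.5 (of the Submartingale Convergence Theorem) Because of Lemma 4.2 it is enough to show that every non-negative supermartingale converges a.e. Let $\{X_n\}$ be a non-negative supermartingale which does not converge a.e. By Fatou's Lemma, $E(\liminf_n X_n) \le E(X_1) < \infty$, so $\liminf_n X_n < \infty$ a.e. Then there are $\alpha, \beta$ with $0 < \alpha < \beta$ such that
$$E = \{\omega : \liminf_n X_n(\omega) < \alpha < \beta < \limsup_n X_n(\omega)\}$$
has $P(E) > 0$. Let $\tau_0 \equiv 0$ and define
$$\tau_{2i+1}(\omega) = \inf\{n > \tau_{2i}(\omega) : X_n(\omega) > \beta\} \quad \text{for } i \ge 0,$$
$$\tau_{2i}(\omega) = \inf\{n > \tau_{2i-1}(\omega) : X_n(\omega) < \alpha\} \quad \text{for } i \ge 1,$$
where as usual $\inf\emptyset = +\infty$. Then $\tau_0 \le \tau_1 \le \tau_2 \le \cdots$. Moreover, $X_{\tau_{2i}} < \alpha$ if $\tau_{2i} < \infty$, and $X_{\tau_{2i+1}} > \beta$ if $\tau_{2i+1} < \infty$. Let $p_j = P\{\omega : \tau_j(\omega) < \infty\}$. We have $p_{2i} \le p_{2i-1}$ and, by Lemma 4.3,
$$\beta p_{2i+1} \le \int X_{\tau_{2i+1}}\,dP \le \int X_{\tau_{2i}}\,dP \le \alpha p_{2i} \quad (i \ge 1).$$
Therefore,
$$p_{2i+1} \le \frac{\alpha}{\beta}p_{2i} \le \frac{\alpha}{\beta}p_{2i-1} \le \cdots \le \Big(\frac{\alpha}{\beta}\Big)^i p_1 \to 0.$$
But this is impossible, since $p_{2i+1} \ge P(E) > 0$ for all $i$.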
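The alternating stopping times $\tau_1, \tau_2, \ldots$ of Proof 4.5 are easy to compute along a single finite path. The sketch below returns the finite ones; a time equal to $+\infty$ is simply absent from the list. The simulation part uses a hypothetical non-negative martingale, namely products of i.i.d. factors $0.5$ or $1.5$, each with mean 1. It only prints Monte Carlo estimates of $p_j$ over a finite horizon, so it makes no claim about their exact values. The proof predicts $p_3 \le (\alpha/\beta)\,p_2$, up to sampling error.

```python
import random
from typing import List, Sequence


def crossing_times(path: Sequence[float], alpha: float, beta: float) -> List[int]:
    """The finite stopping times tau_1 < tau_2 < ... of Proof 4.5 along one path.

    path[n - 1] holds X_n(omega).  tau_{2i+1} is the first time after tau_{2i}
    with X_n > beta, tau_{2i} the first time after tau_{2i-1} with X_n < alpha
    (tau_0 = 0).  Times equal to +infinity are simply absent from the list.
    """
    times: List[int] = []
    want_high = True
    for n, x in enumerate(path, start=1):
        if want_high and x > beta:
            times.append(n)
            want_high = False
        elif not want_high and x < alpha:
            times.append(n)
            want_high = True
    return times


def product_martingale(seed: int, steps: int) -> List[float]:
    """Hypothetical non-negative martingale: running products of i.i.d. 0.5 / 1.5 factors."""
    rng = random.Random(seed)
    x, path = 1.0, []
    for _ in range(steps):
        x *= rng.choice((0.5, 1.5))
        path.append(x)
    return path


if __name__ == "__main__":
    alpha, beta, trials = 0.5, 2.0, 2000
    counts = [0, 0, 0, 0]  # paths on which tau_1, ..., tau_4 are finite (200 steps)
    for seed in range(trials):
        k = len(crossing_times(product_martingale(seed, 200), alpha, beta))
        for j in range(min(k, 4)):
            counts[j] += 1
    p = [c / trials for c in counts]
    print(p, (alpha / beta) * p[1])
```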
3.5. The maximal inequality for the Hilbert transform
For a function $f \in L^1(\mathbb{R})$, the Hilbert transform $Hf$ of $f$ is defined by
$$Hf(t) = \frac{1}{\pi}\lim_{\varepsilon\to0^+}\int_{|s|\ge\varepsilon}\frac{f(t-s)}{s}\,ds.$$
Considerable argument is required to show that this limit exists. We will show here that the maximal Hilbert transform
$$H^*f(t) = \sup_{\varepsilon>0}\Big|\frac{1}{\pi}\int_{|s|\ge\varepsilon}\frac{f(t-s)}{s}\,ds\Big|$$
satisfies a weak-type (1, 1) inequality:
$$m\{t \in \mathbb{R} : H^*f(t) > \lambda\} \le \frac{c}{\lambda}\|f\|_1 \quad \text{for each } \lambda > 0.$$
The convergence would, as usual, follow readily from this inequality, but we will not stop to prove it and will move on to the ergodic case. We give the ingenious proof due to Loomis (1946), who rediscovered an idea that actually goes back to Boole (1857). This argument, however, works only in one dimension. For the higher-dimensional versions one must resort to covering lemmas and Calderón-Zygmund decompositions (see Stein 1970, Stein and Weiss 1971).
The Hilbert transform $Hf$ is the analogue on $\mathbb{R}$ of the Fourier conjugate of an $L^p$ function on the unit circle. Recall that a harmonic function $u$ on the unit disk $D$ has a unique conjugate function $\tilde u$ which is harmonic on $D$, satisfies $\tilde u(0) = 0$, and makes $u + i\tilde u$ analytic on $D$. If $u$ has boundary values $f \in L^p$ for some $p > 1$, then $\tilde u$ has boundary values $\tilde f$. The conjugate function is also given by a certain singular integral:
$$\tilde f(x) = -\frac{1}{2\pi}\lim_{\varepsilon\to0^+}\int_{\varepsilon\le|t|\le\pi} f(x+t)\cot\frac{t}{2}\,dt$$
(see Zygmund 1959). Moreover, there is a simple relationship between the Fourier coefficients $\{a_n\}$ of $f$ and $\{\tilde a_n\}$ of $\tilde f$: $\tilde a_n = -i\,\operatorname{sgn}(n)\,a_n$. Similarly, if $f \in L^p(\mathbb{R})$, then $f$ determines by a Poisson integral a harmonic function $u$ on the upper half-plane. If $\tilde u$ is the imaginary part of a suitable analytic completion of $u$, then $\tilde u$ has boundary values $Hf$ on $\mathbb{R}$. The situation in case $p = 1$ is too complicated and too interesting to be discussed here; it is already hard to prove that $\tilde f \in L^p$ when $f \in L^p$ for $p > 1$. We are studying the Hilbert transform here because in the next section we wish to establish the existence of the ergodic Hilbert transforms
$$\tilde f(x) = \lim_{\varepsilon\to0^+}\int_{\varepsilon\le|s|\le1/\varepsilon}\frac{f(T_sx)}{s}\,ds$$
for a flow $\{T_s\}$, and
$$\tilde f(x) = \sum_{k=1}^\infty \frac{f(T^kx) - f(T^{-k}x)}{k}$$
for a single transformation $T$. The existence of these limits was first proved by Cotlar (1955). We will use Calderón's approach (1968), which transfers the real-variable theorems to the ergodic-theoretic context. The first step toward the maximal inequality for the Hilbert transform is a simple covering lemma. It seems to be due to Sierpiński and appears also in the paper of Wiener (1939). A version of this Lemma also holds in $\mathbb{R}^n$.
Lemma 5.1 Let $A$ be a bounded subset of $\mathbb{R}$ and $\{I_1, I_2, \ldots, I_n\}$ a family of intervals which covers $A$. Then there is a subfamily $\{J_1, \ldots, J_r\}$ of intervals with disjoint interiors such that
$$m(A) \le 3\sum_{k=1}^r m(J_k).$$
Proof: Again, by discarding some of the $I_k$'s we can produce a covering of $A$ with the property that no more than two $I_k$'s intersect at any one point. Choose $J_1$ to be an interval of maximal length among the remaining $\{I_k\}$. We include $J_1$ in our subfamily, and we delete from $A$ the interval $J_1$ and any of the $I_k$ that meet $J_1$. Each deleted interval has length at most $m(J_1)$, so all of them lie in the interval with the same center as $J_1$ and three times its length. There remains a set $A_1$, still covered by the intervals that do not meet $J_1$, of measure
$$m(A_1) \ge m(A) - 3m(J_1).$$
Now repeat the process with $A_1$, choosing an interval $J_2$ of maximal length from among the remaining $I_k$'s and deleting a set of measure at most $3m(J_2)$. There remains a set $A_2$ of measure
$$m(A_2) \ge m(A) - 3[m(J_1) + m(J_2)].$$
At the $r$th stage, we remove the last of the $I_k$'s, so that $A_r = \emptyset$. Then we have
$$0 \ge m(A) - 3\sum_{k=1}^r m(J_k),$$
so that
$$m(A) \le 3\sum_{k=1}^r m(J_k).$$
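The greedy selection in the proof of Lemma 5.1 translates directly into code. The sketch below skips the preliminary thinning step, which the greedy argument does not actually need. It repeatedly picks a longest remaining open interval and discards everything that overlaps it, then checks $m(\bigcup I_k) \le 3\sum m(J_k)$ on random covers. The interval representation, the random test data, and the helper names are assumptions for illustration.

```python
import random
from typing import List, Tuple

Interval = Tuple[float, float]  # an open interval (left, right)


def union_length(intervals: List[Interval]) -> float:
    """Lebesgue measure of a finite union of intervals (merge after sorting)."""
    total = 0.0
    cur_left = cur_right = None
    for left, right in sorted(intervals):
        if cur_right is None or left > cur_right:
            if cur_right is not None:
                total += cur_right - cur_left
            cur_left, cur_right = left, right
        else:
            cur_right = max(cur_right, right)
    if cur_right is not None:
        total += cur_right - cur_left
    return total


def greedy_subfamily(intervals: List[Interval]) -> List[Interval]:
    """Greedy selection from the proof of Lemma 5.1.

    Take an interval of maximal length among those remaining, then discard it
    together with every remaining interval that overlaps it.  Each discarded
    interval is no longer than the chosen one, so it lies in the concentric
    interval of three times the length.
    """
    remaining = list(intervals)
    chosen: List[Interval] = []
    while remaining:
        best = max(remaining, key=lambda iv: iv[1] - iv[0])
        chosen.append(best)
        remaining = [iv for iv in remaining if iv[1] <= best[0] or iv[0] >= best[1]]
    return chosen


def random_intervals(seed: int, count: int) -> List[Interval]:
    """Hypothetical test data: intervals with random centers in [0, 10]."""
    rng = random.Random(seed)
    out = []
    for _ in range(count):
        center, radius = rng.uniform(0.0, 10.0), rng.uniform(0.05, 1.5)
        out.append((center - radius, center + radius))
    return out


if __name__ == "__main__":
    ivs = random_intervals(0, 15)
    chosen = greedy_subfamily(ivs)
    print(union_length(ivs), 3 * sum(b - a for a, b in chosen))
```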
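A deep study of the function appearing in the following Lemma can be found in a paper of Boole (1857).
Lemma 5.2 Suppose $a_1, a_2, \ldots, a_n > 0$, $s_1 < s_2 < \cdots < s_n$, and
$$g(t) = \sum_{i=1}^n \frac{a_i}{s_i - t}.$$
Then for each $\lambda > 0$,
$$m\{t : g(t) > \lambda\} = \frac{1}{\lambda}\sum_{i=1}^n a_i = m\{t : g(t) < -\lambda\}.$$
Proof: Notice that $g'(t) > 0$ if $t \notin \{s_1, \ldots, s_n\}$, and
$$\lim_{t\to s_i^-} g(t) = +\infty, \qquad \lim_{t\to s_i^+} g(t) = -\infty, \qquad \lim_{t\to\pm\infty} g(t) = 0.$$
So $g$ has the following type of graph. It increases from $0$ to $+\infty$ on $(-\infty, s_1)$, from $-\infty$ to $+\infty$ on each interval $(s_{i-1}, s_i)$, and from $-\infty$ to $0$ on $(s_n, \infty)$.
The set where $g > \lambda$ therefore consists of exactly $n$ intervals. There are exactly $n$ points $m_1, m_2, \ldots, m_n$ with $g(m_i) = \lambda$, one just to the left of each $s_i$, and
$$m\{g > \lambda\} = \sum_{i=1}^n (s_i - m_i).$$
The $m_i$ are the roots of the equation
$$\sum_{i=1}^n \frac{a_i}{s_i - t} = \lambda,$$
which may be rewritten as
$$\sum_{i=1}^n a_i\prod_{j\ne i}(s_j - t) = \lambda\prod_{j=1}^n (s_j - t),$$
or
$$(-1)^{n-1}\Big(\sum_i a_i\Big)t^{n-1} + \cdots = \lambda\Big[(-1)^n t^n + (-1)^{n-1}\Big(\sum_i s_i\Big)t^{n-1} + \cdots\Big],$$
or
$$t^n - \Big(\sum_i s_i - \frac{1}{\lambda}\sum_i a_i\Big)t^{n-1} + \cdots = 0.$$
The coefficient of $t^{n-1}$ is the negative of the sum of the roots; therefore
$$\sum_i m_i = \sum_i s_i - \frac{1}{\lambda}\sum_i a_i \quad\text{and}\quad \sum_i (s_i - m_i) = \frac{1}{\lambda}\sum_i a_i.$$
Let $p_1, \ldots, p_n$ be the roots of $g(t) = -\lambda$, one just to the right of each $s_i$. Then
$$m\{g < -\lambda\} = \sum_i (p_i - s_i).$$
For $t = p_i$ we have $\sum_i a_i/(s_i - t) = -\lambda$, or
$$\sum_{i=1}^n a_i\prod_{j\ne i}(s_j - t) = -\lambda\prod_{j=1}^n(s_j - t),$$
or
$$t^n - \Big(\sum_i s_i + \frac{1}{\lambda}\sum_i a_i\Big)t^{n-1} + \cdots = 0,$$
so that
$$\sum_i p_i = \sum_i s_i + \frac{1}{\lambda}\sum_i a_i \quad\text{and}\quad \sum_i (p_i - s_i) = \frac{1}{\lambda}\sum_i a_i.$$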
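Lemma 5.2 can be checked numerically. On each interval between consecutive poles, $g$ is increasing, so each root $m_i$ (respectively $p_i$) can be located by bisection. We then compare $\sum(s_i - m_i)$ and $\sum(p_i - s_i)$ with $\sum a_i/\lambda$. Bisection is our own choice here; the proof itself uses the polynomial identity for the sum of the roots. The function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def boole_g(t: float, a: Sequence[float], s: Sequence[float]) -> float:
    """g(t) = sum_i a_i / (s_i - t)."""
    return sum(ai / (si - t) for ai, si in zip(a, s))


def bisect_increasing(func: Callable[[float], float], lo: float, hi: float, iters: int = 200) -> float:
    """Root of func on (lo, hi), assuming func increases there, is < 0 near lo and > 0 near hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def level_set_measures(a: Sequence[float], s: Sequence[float], lam: float) -> Tuple[float, float]:
    """Return (m{g > lam}, m{g < -lam}) from the roots m_i and p_i; s must be increasing."""
    far = 2.0 * sum(a) / lam  # |g| <= lam/2 at distance `far` outside [s_1, s_n]

    def above(t: float) -> float:
        return boole_g(t, a, s) - lam

    def below(t: float) -> float:
        return boole_g(t, a, s) + lam

    lefts = [s[0] - far] + list(s[:-1])   # m_i lies in (lefts[i], s_i)
    rights = list(s[1:]) + [s[-1] + far]  # p_i lies in (s_i, rights[i])
    m_roots = [bisect_increasing(above, lo, hi) for lo, hi in zip(lefts, s)]
    p_roots = [bisect_increasing(below, lo, hi) for lo, hi in zip(s, rights)]
    measure_above = sum(si - mi for si, mi in zip(s, m_roots))
    measure_below = sum(pi - si for pi, si in zip(p_roots, s))
    return measure_above, measure_below


if __name__ == "__main__":
    a, s, lam = [1.0, 2.0, 0.5], [0.0, 1.0, 3.0], 0.7
    print(level_set_measures(a, s, lam), sum(a) / lam)
```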
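Theorem 5.3 There is a constant $c$ such that if $f \in L^1(\mathbb{R})$ and $\lambda > 0$, then
$$m\Big\{s \in \mathbb{R} : \sup_{\varepsilon>0}\Big|\int_{|t|\ge\varepsilon}\frac{f(s-t)}{t}\,dt\Big| > \lambda\Big\} \le \frac{c}{\lambda}\|f\|_1.$$
Proof: We suppose first that $f \ge 0$. Let $A$ be a closed and bounded subset of the set where
$$\sup_{\varepsilon>0}\Big|\int_{|t|\ge\varepsilon}\frac{f(s-t)}{t}\,dt\Big| = \sup_{\varepsilon>0}\Big|\int_{|t-s|\ge\varepsilon}\frac{f(t)}{s-t}\,dt\Big| > \lambda.$$
For each $s \in A$ there is an open interval $I_s$ centered at $s$, say of radius $\varepsilon_s$, such that
$$\Big|\int_{t\notin I_s}\frac{f(t)}{s-t}\,dt\Big| > \lambda.$$
A finite number $I_1, I_2, \ldots, I_n$ of these intervals cover $A$. Let $\{J_k\}$, with centers $s_k$ and radii $\varepsilon_k$, be as in Lemma 5.1. Then the $J_k$ are pairwise disjoint,
$$m(A) \le 3\sum_k m(J_k),$$
and
$$\Big|\int_{t\notin J_k}\frac{f(t)}{s_k - t}\,dt\Big| > \lambda \quad \text{for each } k.$$
The function
$$F(s) = \int_{-\infty}^s f(t)\,dt$$
is continuous and of bounded variation, and
$$\int_{t\notin J_k}\frac{f(t)}{s_k - t}\,dt = \int_{t\notin J_k}\frac{dF(t)}{s_k - t}$$
exists as a Riemann-Stieltjes integral. This integral can therefore be approximated by its Riemann sums (Rudin 1953, p. 112):
$$\sum_{t_i\notin J_k}\frac{\Delta F_i}{s_k - t_i}, \quad\text{where } \Delta F_i = \int_{t_i}^{t_{i+1}} f(t)\,dt.$$
Choose a partition $t_1 < t_2 < \cdots$, including the endpoints and centers $s_k$ of the $J_k$, such that
$$\Big|\sum_{t_i\notin J_k}\frac{\Delta F_i}{s_k - t_i}\Big| > \lambda \quad \text{for each } k.$$
Let us deal with the inequality
$$\sum_{t_i\notin J_k}\frac{\Delta F_i}{s - t_i} > \lambda \quad \text{for } s = s_k;$$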
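a similar argument will apply in case the sum is less than $-\lambda$. None of the points $t_i \notin J_k$ lies in $J_k$, so the left side is a decreasing function of $s$ on $J_k$. Hence the inequality holds for $s_k - \varepsilon_k < s \le s_k$. For each such $s$, then, either
$$\sum_i \frac{\Delta F_i}{s - t_i} > \frac{\lambda}{2}$$
or
$$\sum_{t_i\in J_k}\frac{\Delta F_i}{s - t_i} < -\frac{\lambda}{2}.$$
In the first case, by Lemma 5.2, $s$ falls into a set of measure
$$\frac{2}{\lambda}\sum_i \Delta F_i \le \frac{2}{\lambda}\|f\|_1,$$
and in the second case into a set of measure
$$\frac{2}{\lambda}\sum_{t_i\in J_k}\Delta F_i.$$
Now take the union on $k$. A set of measure $\sum_k \varepsilon_k$ is covered by a set of measure $2\|f\|_1/\lambda$ (the set of the first case does not change with $k$). It is also covered by a set of measure
$$\frac{2}{\lambda}\sum_k\sum_{t_i\in J_k}\Delta F_i \le \frac{2}{\lambda}\|f\|_1$$
(the $J_k$ are disjoint). Taking also into account the case when the Riemann sum is less than $-\lambda$, we find that
$$\sum_k \varepsilon_k \le \frac{8}{\lambda}\|f\|_1 \quad\text{and}\quad m(A) \le 3\sum_k m(J_k) = 6\sum_k\varepsilon_k \le \frac{48}{\lambda}\|f\|_1.$$
Of course in case $f$ were negative a similar argument would apply. For an arbitrary $f \in L^1(\mathbb{R})$, write $f = f^+ - f^-$, and let $H^*f(s)$ denote the supremum in the statement of the Theorem. Then
$$H^*f(s) \le H^*f^+(s) + H^*f^-(s),$$
so that
$$\{H^*f > \lambda\} \subset \{H^*f^+ > \tfrac{\lambda}{2}\} \cup \{H^*f^- > \tfrac{\lambda}{2}\},$$
and hence
$$m\{H^*f > \lambda\} \le \frac{96}{\lambda}\|f^+\|_1 + \frac{96}{\lambda}\|f^-\|_1 = \frac{96}{\lambda}\|f\|_1.$$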
3.6. The ergodic Hilbert transform
We will prove that if $\{T_t\}$ is a measure-preserving flow on a probability space $(X, \mathcal{B}, \mu)$ and $f \in L^1(X)$, then
$$\lim_{\varepsilon\to0^+}\int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_tx)}{t}\,dt$$
exists a.e. A similar argument shows that for a single invertible measure-preserving transformation $T: X \to X$, the series
$$\tilde f(x) = \sum_{k=1}^\infty\frac{f(T^kx) - f(T^{-k}x)}{k}$$
converges a.e. (It might be conjectured that if $f \in L^1$ with $\int f\,d\mu = 0$, then in fact
$$\sum_{k=1}^\infty\frac{f(T^kx)}{k}$$
converges a.e. However, this is not true. Let $S_nf = \sum_{k=1}^n fT^k$; then partial summation shows that
$$\sum_{k=1}^N\frac{fT^k}{k} = \sum_{n=1}^{N-1}\frac{S_nf}{n(n+1)} + \frac{S_Nf}{N}.$$
Since $S_Nf/N \to 0$ a.e., the left side converges a.e. only where the series on the right does. We have seen (in 3.2.B) that the series on the right does not converge a.e. in general, even for $f \in L^\infty$.) The method of proof involves first establishing a weak-type (1, 1) inequality for the maximal function
$$f_h^*(x) = \sup_{\varepsilon>0}\Big|\int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_tx)}{t}\,dt\Big|.$$
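The partial-summation identity in the parenthetical remark of this section can be checked exactly in rational arithmetic. In the sketch, the entries of `values` stand for $f(Tx), f(T^2x), \ldots$ along one orbit; they are arbitrary hypothetical integers. `S_n` is the running sum $S_nf(x)$.

```python
from fractions import Fraction
from itertools import accumulate
from typing import Sequence


def direct_sum(values: Sequence[int]) -> Fraction:
    """sum_{k=1}^N f(T^k x) / k, where values[k-1] stands for f(T^k x)."""
    return sum((Fraction(v, k) for k, v in enumerate(values, start=1)), Fraction(0))


def summed_by_parts(values: Sequence[int]) -> Fraction:
    """sum_{n=1}^{N-1} S_n f / (n(n+1)) + S_N f / N, with S_n f = sum_{k=1}^n f(T^k x)."""
    partial = list(accumulate(values))
    n_terms = len(partial)
    body = sum((Fraction(partial[n - 1], n * (n + 1)) for n in range(1, n_terms)), Fraction(0))
    return body + Fraction(partial[-1], n_terms)


if __name__ == "__main__":
    vals = [3, -1, 4, -1, -5, 9, -2, 6]
    print(direct_sum(vals), summed_by_parts(vals))
```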
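Theorem 6.1 There is a constant $c$ such that if $f \in L^1(X)$ and $\lambda > 0$, then
$$\mu\{x \in X : f_h^*(x) > \lambda\} \le \frac{c}{\lambda}\|f\|_1.$$
Proof: Fix $N$ and let
$$f_{h,N}^*(x) = \sup_{\varepsilon\ge1/N}\Big|\int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_tx)}{t}\,dt\Big| = \sup_{\varepsilon\ge1/N}\Big|\int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_{-t}x)}{t}\,dt\Big|.$$
We will find a constant $c$ independent of $N$ such that
$$\mu\{x : f_{h,N}^*(x) > \lambda\} \le \frac{c}{\lambda}\|f\|_1 \quad \text{for all } f \in L^1;$$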
the conclusion will follow upon letting $N \to \infty$. The technique, due to Calderón, transfers the real-variable result of the previous section to the ergodic context. It does so by considering one orbit at a time and then applying Fubini's Theorem. This is similar to the idea behind Wiener's proof of the Local Ergodic Theorem, but this time the argument is a bit more complicated. In particular, certain truncations must be made, since in the ergodic case we cannot integrate over an entire orbit. Now
$$H_N^*g(t) = \sup_{\varepsilon\ge1/N}\Big|\int_{\varepsilon\le|s|\le1/\varepsilon}\frac{g(t-s)}{s}\,ds\Big|, \quad g \in L^1(\mathbb{R}),$$
is a semilocal operator, in that the support of $H_N^*g$ is contained in a neighborhood of the support of $g$. Indeed,
$$H_N^*g(t) = \sup_{\varepsilon\ge1/N}\Big|\int_{\varepsilon\le|t-s|\le1/\varepsilon}\frac{g(s)}{t-s}\,ds\Big|$$
is 0 at points $t$ which are farther than $N$ from $\operatorname{supp} g$. Let
$$F(t, x) = f(T_tx),$$
and
$$G(t, x) = H_N^*F(\cdot, x)(t) = \sup_{\varepsilon\ge1/N}\Big|\int_{\varepsilon\le|s|\le1/\varepsilon}\frac{F(t-s, x)}{s}\,ds\Big| = \sup_{\varepsilon\ge1/N}\Big|\int_{\varepsilon\le|s|\le1/\varepsilon}\frac{f(T_{t-s}x)}{s}\,ds\Big| = f_{h,N}^*(T_tx).$$
Then
$$F(t, T_sx) = F(s+t, x), \qquad G(t, T_sx) = G(s+t, x).$$
So for fixed $t_1$ and $t_2$, $F(t_1, x)$ and $F(t_2, x)$ are equimeasurable functions of $x$, as are $G(t_1, x)$ and $G(t_2, x)$. Fix $a > 0$ and define
$$F_a(t, x) = \begin{cases} F(t, x) & \text{if } |t| \le a,\\ 0 & \text{otherwise},\end{cases}$$
and
$$G_a(t, x) = H_N^*F_a(\cdot, x)(t).$$
Now
$$G(t, x) = H_N^*F = H_N^*\big(F_{a+N} + (F - F_{a+N})\big) \le H_N^*F_{a+N} + H_N^*(F - F_{a+N}).$$
Since $F - F_{a+N}$ has support in $|t| > a + N$, $H_N^*(F - F_{a+N})$ is 0 for $|t| \le a$. Hence
$$G(t, x) \le G_{a+N}(t, x) \quad \text{for } |t| \le a.$$
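Fix $\lambda > 0$, and let
$$E = \{x \in X : G(0, x) > \lambda\} = \{x \in X : f_{h,N}^*(x) > \lambda\},$$
$$\tilde E = \{(t, x) : G_{a+N}(t, x) > \lambda\}, \quad\text{and}\quad \tilde E_y = \{t : (t, y) \in \tilde E\} \quad\text{for } y \in X.$$
For any fixed $t \in \mathbb{R}$, let
$$E^t = \{x : G(t, x) > \lambda\}.$$
By the equimeasurability of the $G(t, \cdot)$, $\mu(E^t) = \mu(E)$ for all $t$. Thus
$$2a\mu(E) = \int_{-a}^a \mu(E^t)\,dt.$$
Since $G \le G_{a+N}$ for $|t| \le a$, we also have
$$\int_{-a}^a \mu(E^t)\,dt \le (m\times\mu)(\tilde E) = \int_X m(\tilde E_x)\,d\mu(x) = \int_X m\{t : G_{a+N}(t, x) > \lambda\}\,d\mu(x) = \int_X m\{t : H_N^*(F_{a+N}(\cdot, x))(t) > \lambda\}\,d\mu(x).$$
Now apply the weak-type (1, 1) inequality for the real-variable Hilbert transform (Theorem 5.3). Each integral over $\{\varepsilon \le |s| \le 1/\varepsilon\}$ is the difference of the integrals over $\{|s| \ge \varepsilon\}$ and $\{|s| > 1/\varepsilon\}$. So $H_N^*g$ is at most twice the supremum in Theorem 5.3, and $m\{H_N^*g > \lambda\} \le (c/\lambda)\|g\|_1$ with $c = 192$. This gives
$$2a\mu(E) \le \int_X \frac{c}{\lambda}\|F_{a+N}(\cdot, x)\|_{L^1(\mathbb{R})}\,d\mu(x) = \frac{c}{\lambda}\int_X\int_{-(a+N)}^{a+N}|F(t, x)|\,dt\,d\mu(x)$$
$$= \frac{c}{\lambda}\int_{-(a+N)}^{a+N}\int_X|f(T_tx)|\,d\mu(x)\,dt = \frac{c}{\lambda}\int_{-(a+N)}^{a+N}\|f\|_1\,dt = \frac{c}{\lambda}\,2(a+N)\|f\|_1.$$
Therefore
$$\mu(E) \le \frac{a+N}{a}\cdot\frac{c}{\lambda}\|f\|_1 \quad \text{for each } a > 0,$$
and letting $a \to \infty$ gives
$$\mu(E) \le \frac{c}{\lambda}\|f\|_1.$$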
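By an analogous argument, but convolving with a step-function kernel rather than $1/t$, one can also obtain:
Corollary 6.2 Let $T$ be an invertible m.p.t. There is a constant $c$ such that if $f \in L^1(X)$ and $\lambda > 0$, then
$$\mu\Big\{x : \sup_n\Big|\sum_{k=1}^n \frac{f(T^kx) - f(T^{-k}x)}{k}\Big| > \lambda\Big\} \le \frac{c}{\lambda}\|f\|_1.$$
We will now prove the existence a.e. of the ergodic Hilbert transform
$$\tilde f(x) = \lim_{\varepsilon\to0^+}\int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_tx)}{t}\,dt$$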
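of a function $f \in L^1(X)$. The pattern of proof is familiar. Convergence is established relatively easily for a dense subset of $L^1$. The maximal inequality is then used to show that the set of functions for which convergence takes place is closed in $L^1$.
Theorem 6.3 If $\{T_t\}$ is a measure-preserving flow on a probability space $(X, \mathcal{B}, \mu)$ and $f \in L^1(X)$, then the limit
$$\tilde f(x) = \lim_{\varepsilon\to0^+}\int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_tx)}{t}\,dt$$
exists a.e. on $X$.
Proof: Let
$$V_\varepsilon f(x) = \int_{\varepsilon\le|t|\le1/\varepsilon}\frac{f(T_tx)}{t}\,dt \quad\text{and}\quad f_h^*(x) = \sup_{\varepsilon>0}|V_\varepsilon f(x)|.$$
By Theorem 6.1 we know that
$$\mu\{f_h^* > \lambda\} \le \frac{c}{\lambda}\|f\|_1.$$
Let $\mathcal{D}$ be the subspace of $L^1(X)$ generated by all invariant functions and all functions of the form
$$f(x) = \int_{-\infty}^\infty f_0(T_sx)\phi(s)\,ds,$$
where $f_0 \in L^\infty(X)$ and $\phi \in \mathcal{C}^1(\mathbb{R})$ has compact support and integral 0. We will show that $\tilde f(x)$ exists a.e. if $f \in \mathcal{D}$, and that $\mathcal{D}$ is dense in $L^1$. If $f$ is invariant, then $V_\varepsilon f = 0$ for all $\varepsilon$, since the kernel $1/t$ is odd. Now suppose $f$ is a function of the second form, and let
$$k_\varepsilon(t) = \begin{cases}1/t & \text{if } \varepsilon \le |t| \le 1/\varepsilon,\\ 0 & \text{otherwise}.\end{cases}$$
Then
$$V_\varepsilon f(x) = \int_{-\infty}^\infty k_\varepsilon(t)\int_{-\infty}^\infty f_0(T_{t+s}x)\phi(s)\,ds\,dt = \int_{-\infty}^\infty\int_{-\infty}^\infty f_0(T_tx)k_\varepsilon(t-s)\phi(s)\,ds\,dt = \int_{-\infty}^\infty f_0(T_tx)[k_\varepsilon*\phi(t)]\,dt.$$
Let $K_\varepsilon(s) = 1/s$ for $|s| \ge \varepsilon$ and 0 otherwise. We will show that
(i) $K_\varepsilon*\phi(t)$ converges for a.e. $t$ as $\varepsilon \to 0$,
(ii) $|K_\varepsilon*\phi(t)| \le g(t)$ for all small $\varepsilon$, for some $g \in L^1(\mathbb{R})$ not depending on $\varepsilon$, and
(iii) $\|k_\varepsilon*\phi - K_\varepsilon*\phi\|_{L^1(\mathbb{R})} \to 0$ as $\varepsilon \to 0$.
This will imply that $k_\varepsilon*\phi$ converges in $L^1(\mathbb{R})$ as $\varepsilon \to 0$. Since $f_0$ is bounded, it follows that $\lim_{\varepsilon} V_\varepsilon f(x)$ exists a.e. for our particular $f$.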
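To prove (i), write, for $\varepsilon < 1$,
$$K_\varepsilon*\phi(t) = \int_{|t-s|\ge\varepsilon}\frac{\phi(s)}{t-s}\,ds = \int_{|t-s|\ge1}\frac{\phi(s)}{t-s}\,ds + \int_{1>|t-s|\ge\varepsilon}\frac{\phi(s)-\phi(t)}{t-s}\,ds + \phi(t)\int_{1>|t-s|\ge\varepsilon}\frac{ds}{t-s}.$$
The first term is of course constant with respect to $\varepsilon$, while the third is 0 by symmetry. The integrand of the second term is bounded by the constant $\sup|\phi'|$. It converges a.e. $ds$ to $[(\phi(s) - \phi(t))/(t - s)]\chi_{[t-1,t+1]}(s)$, so the integrals converge as $\varepsilon \to 0$.
For (ii), suppose that $\operatorname{supp}\phi \subset [-K, K]$, and let $\varepsilon < K$. If $|t| \ge 2K$, then, because $\int_{-\infty}^\infty\phi(s)\,ds = 0$,
$$|K_\varepsilon*\phi(t)| = \Big|\int_{-\infty}^\infty K_\varepsilon(t-s)\phi(s)\,ds\Big| = \Big|\int_{-\infty}^\infty [K_\varepsilon(t-s) - K_\varepsilon(t)]\phi(s)\,ds\Big|$$
$$= \Big|\int_{-K}^K\Big[\frac{1}{t-s} - \frac{1}{t}\Big]\phi(s)\,ds\Big| \le \int_{-K}^K\frac{|s|}{|t-s|\,|t|}|\phi(s)|\,ds \le \frac{c}{t^2}$$
for some constant $c$ depending on $\phi$ (here $|t - s| \ge |t|/2$).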
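If $|t| \le 2K$, then $\phi(t-s) = 0$ for $|s| > 3K$, so
$$K_\varepsilon*\phi(t) = \int_{-\infty}^\infty K_\varepsilon(s)\phi(t-s)\,ds = \int_{\varepsilon\le|s|\le3K}\frac{\phi(t-s)}{s}\,ds = \int_{\varepsilon\le|s|\le3K}\frac{\phi(t-s)-\phi(t)}{s}\,ds.$$
This is again bounded by a constant (namely $6K\sup|\phi'|$). Thus
$$|K_\varepsilon*\phi(t)| \le \frac{c_1}{t^2}\chi_{\{|t|\ge2K\}}(t) + c_2\chi_{[-2K,2K]}(t) \in L^1(\mathbb{R}).$$
For (iii), notice that $k_\varepsilon*\phi(t)$ and $K_\varepsilon*\phi(t)$ differ only because $k_\varepsilon(t-s) = 0$ for $|t-s| > 1/\varepsilon$, while $K_\varepsilon(t-s) = 1/(t-s)$ for all large $|t-s|$. Thus
$$K_\varepsilon*\phi(t) - k_\varepsilon*\phi(t) = \int_{|t-s|>1/\varepsilon}\frac{\phi(s)}{t-s}\,ds.$$
This difference
- vanishes if $|t| < 1/\varepsilon - K$;
- is bounded by $\varepsilon\|\phi\|_1$ if $1/\varepsilon - K \le |t| \le 1/\varepsilon + K$;
- and, as in (ii), is bounded by $c/t^2$ if $|t| > 1/\varepsilon + K$.

Hence its $L^1$ norm is at most $4K\varepsilon\|\phi\|_1 + 2c/(1/\varepsilon + K)$, which tends to 0 as $\varepsilon \to 0$.
It remains to show that the invariant functions, together with all the functions $f(x) = \int_{-\infty}^\infty f_0(T_tx)\phi(t)\,dt$, are dense in $L^1$. By the Hahn-Banach Theorem, the closed subspace generated by $\mathcal{D}$ can be separated by a linear functional from any function not in it. So it is enough to show that if $h \in L^\infty(X)$ and $\int_X hf\,d\mu = 0$ for all $f \in \mathcal{D}$, then $h = 0$. For this it suffices to show that such an $h$ is invariant: then $h$ itself belongs to $\mathcal{D}$, so $\int h^2\,d\mu = 0$. But
$$0 = \int_X hf\,d\mu = \int_X f_0(x)\int_{-\infty}^\infty h(T_{-t}x)\phi(t)\,dt\,d\mu(x) \quad\text{for all } f_0 \in L^\infty(X)$$
implies that
$$\int_{-\infty}^\infty h(T_{-t}x)\phi(t)\,dt = 0 \quad \text{a.e., for each such } \phi.$$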
Considering a çf' (smooth) version (/) of (1/m(I))x1 — (I/m(J))xj where I and J are intervals and letting I and J shrink to arbitrary points shows that his a.e. constant along almost every orbit. Finally, let us show that the set g of all feL l(X) for which V,f(x) converges a.e. is closed in LI . Let fk eg, k = 1, 2, ... and suppose that fk -*f in V. Since I 1/,f(x) - Vef (x)I =IVE[fk + (f - fk)](x)4/,,[fk + (f- fk)](x)i
I
k( — ‘.- VE fx)
Ve fk(x)1 + 2 sup I
V.( f — fk)(x)f
n> 0 -
3.7. The filling scheme
119
and for fixed k the first term approaches 0 as e, el -> 0, we see that for any A > 0, ptx: lim sup I V f (x) - K. f(x)I > A} .E•-•ci À c ‹..pfx:supl V„(f - fk)(x)i > - } -di ilf - frill 2
which tends to 0 as $k \to \infty$. Therefore $\{V_\varepsilon f(x)\}$ is fundamental a.e., and hence converges a.e.
Remark 6.4 In most of the arguments of this type, we do not need the weak-type (1, 1) inequality for the full maximal function. Such an inequality for an 'eventual' maximal function, in which the sup is replaced by a lim sup, would be sufficient. (This was Birkhoff's method in his proof of the Ergodic Theorem.) Also, for the purposes of the theorem on differentiation of integrals, the absolute value signs could have been moved outside the integral. However, there is a theorem of Stein (1961) according to which the existence of a convergence theorem frequently already forces the corresponding full maximal function to satisfy a weak-type (1, 1) inequality, so such apparent gains in power may be only illusory.
Corollary 6.5 If $T: X \to X$ is a m.p.t. and $f \in L^1(X)$, then the series
$$\tilde f(x) = \sum_{k=1}^{\infty} \frac{f(T^k x) - f(T^{-k} x)}{k}$$
converges a.e.
Proof: Use Corollary 6.2 and mimic the proof of Theorem 6.3.
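As a quick numerical illustration of Corollary 6.5, consider the rotation $Tx = x + \alpha \pmod 1$ of the circle with Lebesgue measure and the character $f(x) = e^{2\pi i x}$. This is a sketch, not part of the text's argument. Here $f(T^kx) - f(T^{-k}x) = 2i\sin(2\pi k\alpha)\,e^{2\pi i x}$. The classical Fourier series $\sum_{k \ge 1} \sin(k\theta)/k = (\pi - \theta)/2$ for $0 < \theta < 2\pi$ then gives $\tilde f(x) = 2\pi i(\tfrac12 - \alpha)e^{2\pi i x}$ for $0 < \alpha < 1$. The value of $\alpha$, the point $x$, and the truncation length are assumptions of the example.

```python
import cmath
import math


def hilbert_partial_sum(f, x, alpha, n_terms):
    """Return sum_{k=1}^{n_terms} [f(T^k x) - f(T^{-k} x)] / k for the
    circle rotation T x = x + alpha (mod 1)."""
    total = 0j
    for k in range(1, n_terms + 1):
        forward = f((x + k * alpha) % 1.0)
        backward = f((x - k * alpha) % 1.0)
        total += (forward - backward) / k
    return total


def character(x):
    """The character f(x) = exp(2 pi i x) on the circle [0, 1)."""
    return cmath.exp(2j * math.pi * x)


def hilbert_exact(x, alpha):
    """Closed form 2 pi i (1/2 - alpha) exp(2 pi i x), valid for 0 < alpha < 1."""
    return 2j * math.pi * (0.5 - alpha) * character(x)


alpha = math.sqrt(2) - 1      # an irrational rotation number
x = 0.3
approx = hilbert_partial_sum(character, x, alpha, 100_000)
error = abs(approx - hilbert_exact(x, alpha))
print(f"|partial sum - closed form| = {error:.2e}")
```

The series converges only conditionally, so the truncation error decays like $1/N$. For this $f$ the rate is governed by $|\sin \pi\alpha|$.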
3.7. The filling scheme
In the next few sections we undertake the proof of the Chacon-Ornstein Ergodic Theorem (1960), an extremely general pointwise convergence result which includes many other ergodic theorems as special cases. This theorem deals with an operator $T: L^1(X) \to L^1(X)$. It thus extends the case when $T$ arises from a m.p.t. $X \to X$ to situations found in interesting applications, for example in the theory of Markov processes. We will give the proof of the conservative case with the simplifications due to Neveu (1979), who saw how to base the argument on certain key features of the filling scheme. (The filling scheme appeared already in Chacon and Ornstein 1960.)
3. More about almost everywhere convergence
We deal with a $\sigma$-finite measure space $(X, \mathcal{B}, \mu)$ and assume that $T: L^1(X) \to L^1(X)$ is a positive contraction: if $0 \le f \in L^1$, then $0 \le Tf \in L^1$, and
$$\|Tf\|_1 \le \|f\|_1 \quad\text{for all } f \in L^1.$$
Such a $T$ is sometimes called a sub-Markov operator. If $T^*1 = 1$, where $T^*$ is the adjoint of $T$ on $L^\infty$ (defined by $\int \phi\,Tf\,d\mu = \int T^*\phi \cdot f\,d\mu$ for all $\phi \in L^\infty$ and all $f \in L^1$), then $T$ is called a Markov operator. The filling scheme is defined as follows. We begin with $0 \le f, g \in L^1$. Think of $f$ as giving the amount of sand piled above each point of $X$ and $g$ as giving the depth of a hole below some points of $X$.
When we allow whatever sand will do so to fall into the hole, and consequently decrease the depth of the hole, we obtain a new sand function $h^+ = (f - g)^+$ and a new hole-depth function $h^- = (f - g)^-$. (Thus $h = h^+ - h^- = f - g$.)
Now we use the transformation $T$ to transport the sand: at the next stage the sand has distribution $Th^+$, while the hole still has depth $h^-$. When the sand and hole interact, the distribution of the outcome is given by $h_1 = Th^+ - h^-$. Continuing in this manner, we build a sequence $h_0, h_1, h_2, \ldots$ of $L^1$ functions. The inductive definition is
$$h_0 = h = f - g, \qquad h_{n+1} = Th_n^+ - h_n^- \quad\text{for } n = 0, 1, 2, \ldots.$$
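The inductive definition is easy to run on a finite space. The sketch below assumes $X = \{0, \ldots, d-1\}$ with counting measure. In that setting a positive contraction of $L^1$ is a nonnegative matrix whose columns sum to at most 1, with mass at $j$ sent to $i$ with weight $T_{ij}$. The particular matrix, $f$, and $g$ are made up for illustration. At every step the code checks two things: the hole depth never increases, and $h_{n+1}^+ \le Th_n^+$. Because the columns here sum to exactly 1, it also checks that $\int h_n\,d\mu$ stays constant.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply(T: Matrix, v: Vector) -> Vector:
    """(Tv)_i = sum_j T[i][j] * v[j]: mass at j is moved to i with weight T[i][j]."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def positive_part(v: Vector) -> Vector:
    return [max(a, 0.0) for a in v]


def negative_part(v: Vector) -> Vector:
    return [max(-a, 0.0) for a in v]


def filling_scheme(T: Matrix, f: Vector, g: Vector, n_steps: int) -> List[Vector]:
    """Return [h_0, ..., h_{n_steps}] with h_0 = f - g and h_{n+1} = T h_n^+ - h_n^-."""
    h = [a - b for a, b in zip(f, g)]
    history = [h]
    for _ in range(n_steps):
        sand = apply(T, positive_part(h))   # transport the sand
        hole = negative_part(h)             # the hole stays where it is
        h = [s - d for s, d in zip(sand, hole)]
        history.append(h)
    return history


# A column-stochastic matrix (each column sums to 1), so T*1 = 1 here.
T = [[0.2, 0.5, 0.3],
     [0.5, 0.2, 0.4],
     [0.3, 0.3, 0.3]]
f = [3.0, 0.0, 0.0]   # sand
g = [0.0, 1.0, 1.0]   # hole
hs = filling_scheme(T, f, g, 50)

tol = 1e-12
for h_now, h_next in zip(hs, hs[1:]):
    # the hole depth never increases
    assert all(b <= a + tol for a, b in zip(negative_part(h_now), negative_part(h_next)))
    # h_{n+1}^+ <= T h_n^+
    bound = apply(T, positive_part(h_now))
    assert all(b <= c + tol for b, c in zip(positive_part(h_next), bound))
# with T*1 = 1 the integral of h_n is conserved
assert all(abs(sum(h) - sum(hs[0])) < 1e-9 for h in hs)
print("sand left:", sum(positive_part(hs[-1])))
```

Here $\int f\,d\mu = 3$ exceeds $\int g\,d\mu = 2$, so at least one unit of sand must survive forever.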
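A short proof of the Chacon-Ornstein Theorem, for the conservative ergodic case, will emerge (in the next section) from the observation that if $\int f\,d\mu < \int g\,d\mu$, then all the sand eventually falls into the hole. We begin with some simple properties of the filling scheme.
(1) $h_{n+1}^- \le h_n^-$ for all $n \ge 0$.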
This just says that the depth of the hole continues to decrease at each point. Let $H^- = \lim_n \downarrow h_n^-$ (the decreasing limit of the $h_n^-$) and $A = \{H^- > 0\}$.
(2) $h_{n+1}^+ \le Th_n^+$ for all $n \ge 0$.
This says that after each time the sand is moved, some of it may fall into the hole. In fact this statement can be slightly strengthened. Notice that $h_n^+ = 0$ on $A$, since $h_n^- \ge H^- > 0$ on $A$. Thus $h_{n+1}^+ \le Th_n^+$ implies that in fact
$$h_{n+1}^+ \le T(\chi_{A^c} h_n^+) = (TM_{A^c})h_n^+$$
($M_{A^c}$ denotes multiplication by the characteristic function of $A^c$), and hence, by induction,
$$h_{n+1}^+ \le (TM_{A^c})^{n+1} h^+ \quad\text{for } n = 0, 1, 2, \ldots.$$
(3) If $T$ is a Markov operator, so that $\int Tf\,d\mu = \int f\,d\mu$ for all $f \in L^1$, then
$$\int h_{n+1}\,d\mu = \int h_n\,d\mu \quad\text{for } n = 0, 1, 2, \ldots.$$
This says that the difference between the amount of sand and the size of the hole remains constant under the hypothesis that $T^*1 = 1$. As we shall see, this hypothesis holds automatically in case $T$ is conservative. Then for all $n$,
$$\int h\,d\mu = \int h_n\,d\mu = \int h_n^+\,d\mu - \int h_n^-\,d\mu \to \lim_n \downarrow \int h_n^+\,d\mu - \int H^-\,d\mu. \tag{$*$}$$
The filling scheme is useful in ergodic theory because the sequence $\{h_n\}$ is related to the sequence of partial sums of $\{T^k f\}$, as the following Proposition shows.
Proposition 7.1 Let $0 \le f, g \in L^1(X)$ and $h = f - g$. Then there are $u_n \in L^1$ with $u_n \ge 0$ a.e. and
$$\sum_{k=0}^{n-1} T^k f = \sum_{k=0}^{n-1} h_k^+ + u_n \quad\text{and}\quad \sum_{k=0}^{n-1} T^k g = \sum_{k=0}^{n-1} T^{n-k-1} h_k^- + u_n \quad\text{for } n = 1, 2, \ldots.$$
Proof: We define $u_0 = 0$,
$$u_n = (g - h_{n-1}^-) + Tu_{n-1} \quad\text{for } n \ge 1.$$
We will prove by induction that these $u_n$ satisfy the required equations.
For $n = 1$, the formulas say that
$$f = h_0^+ + u_1 = h^+ + u_1 = h^+ + (g - h^-) + T0 = f \quad\text{and}\quad g = h_0^- + u_1 = h^- + u_1 = h^- + (g - h^-) + T0 = g.$$
Note also that $u_1 = g - h^- \ge 0$ since $g \ge h^-$. Assume now that the formulas hold for $n$, and let us prove them for $n + 1$. Notice that $u_{n+1} \ge 0$, since $Tu_n \ge 0$ and
$$g \ge h^- \ge h_n^-$$
by (1) above. Further,
$$Th_k^+ = h_{k+1} + h_k^- = h_{k+1}^+ + (h_k^- - h_{k+1}^-),$$
so by induction
$$T\sum_{k=0}^{n-1} T^k f = T\sum_{k=0}^{n-1} h_k^+ + Tu_n = \sum_{k=0}^{n-1} h_{k+1}^+ + (h^- - h_n^-) + Tu_n$$
and hence
$$\sum_{k=0}^{n} T^k f = f + T\sum_{k=0}^{n-1} T^k f = f + \sum_{k=0}^{n-1} h_{k+1}^+ + (h^- - g) + (g - h_n^-) + Tu_n = \sum_{k=0}^{n} h_k^+ + u_{n+1}$$
(of course $f - g + h^- = h^+$). For the other formula, by induction
$$T\sum_{k=0}^{n-1} T^k g = \sum_{k=0}^{n-1} T^{n-k} h_k^- + Tu_n;$$
therefore
$$\sum_{k=0}^{n} T^k g = g + \sum_{k=0}^{n-1} T^{n-k} h_k^- + Tu_n = \sum_{k=0}^{n} T^{n-k} h_k^- - h_n^- + g + Tu_n = \sum_{k=0}^{n} T^{n-k} h_k^- + u_{n+1}.$$
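Both identities of Proposition 7.1 can be confirmed numerically on a finite state space with counting measure. This is an assumption of the sketch, as are the particular sub-Markov matrix (every column sums to 0.9) and the data $f$, $g$. The code builds $h_n$ and $u_n$ by the recursions above and reports the largest discrepancy in either identity together with the smallest entry of any $u_n$.

```python
def apply(T, v):
    """(Tv)_i = sum_j T[i][j] v[j] for a nonnegative matrix T."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def pos(v):
    return [max(a, 0.0) for a in v]


def neg(v):
    return [max(-a, 0.0) for a in v]


def power(T, v, m):
    """Return T^m v."""
    for _ in range(m):
        v = apply(T, v)
    return v


def check_prop_7_1(T, f, g, n_max):
    """Largest deviation in the two identities of Proposition 7.1 for n = 1..n_max,
    and the smallest entry of any u_n (which should be >= 0)."""
    d = len(f)
    hs = [sub(f, g)]                                    # h_0 = f - g
    for _ in range(n_max):
        hs.append(sub(apply(T, pos(hs[-1])), neg(hs[-1])))
    u = [0.0] * d                                       # u_0 = 0
    worst, min_u = 0.0, float("inf")
    for n in range(1, n_max + 1):
        u = add(sub(g, neg(hs[n - 1])), apply(T, u))    # u_n = (g - h_{n-1}^-) + T u_{n-1}
        lhs_f, rhs_f = [0.0] * d, list(u)
        lhs_g, rhs_g = [0.0] * d, list(u)
        for k in range(n):
            lhs_f = add(lhs_f, power(T, f, k))
            rhs_f = add(rhs_f, pos(hs[k]))
            lhs_g = add(lhs_g, power(T, g, k))
            rhs_g = add(rhs_g, power(T, neg(hs[k]), n - k - 1))
        worst = max(worst, *(abs(a - b) for a, b in zip(lhs_f + lhs_g, rhs_f + rhs_g)))
        min_u = min(min_u, *u)
    return worst, min_u


# A sub-Markov example: every column sums to 0.9 < 1.
T = [[0.5, 0.2, 0.0],
     [0.3, 0.3, 0.6],
     [0.1, 0.4, 0.3]]
worst, min_u = check_prop_7_1(T, f=[1.0, 0.0, 2.0], g=[0.5, 1.5, 0.0], n_max=12)
print(worst, min_u)
```

The check does not use $T^*1 = 1$, in line with the Proposition, which holds for any positive contraction.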
Next we will use the filling scheme to establish the Hopf decomposition of $X$ into a conservative part and a dissipative part with respect to $T$. Usually this decomposition is a corollary of the Hopf maximal ergodic theorem, but here we deduce both results from the filling scheme. (Actually the Hopf decomposition can be proved from scratch (Foguel 1980), as Exercise 2.3.2 does for a special case.) The heart of the Chacon-Ornstein Theorem lies in the case when $T$ is conservative and ergodic. If $T: L^1(X) \to L^1(X)$ is a positive contraction, then we say that $T$ is conservative if $0 \le u \in L^1(X)$ implies that $\sum_{k=0}^{\infty} T^k u$ takes only the two values $0$ and $\infty$ a.e. $T$ is called conservative ergodic if
$$0 \le u \in L^1(X),\ u \ne 0 \quad\text{implies}\quad \sum_{k=0}^{\infty} T^k u = \infty \text{ a.e.}$$
Lemma 7.2 Let $0 \le f_i \in L^1$ with $f_i \downarrow 0$ a.e. and $0 \le g \in L^1$. Then the functions ${}^iH^-$ associated with ${}^ih = f_i - g$ increase to $g$ a.e. as $i \to \infty$.
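Proof: Since $f_i$ decreases with $i$ and the mapping $h \mapsto Th^+ - h^-$ is increasing (since $T$ is positive), the $({}^ih)_k$ decrease in $i$ for fixed $k$. Therefore $({}^ih)_k^-$ increases in $i$ for fixed $k$, and hence
$${}^iH^- = \lim_k \downarrow ({}^ih)_k^-$$
increases with $i$. Also, ${}^iH^- \le g$, and (from (1) and (2))
$$\int h\,d\mu = \int h_0^+\,d\mu - \int h_0^-\,d\mu \ge \int Th_0^+\,d\mu - \int h_0^-\,d\mu = \int h_1\,d\mu$$
(since $\|T\| \le 1$ implies $\int Th_0^+\,d\mu \le \int h_0^+\,d\mu$). Repeating this argument at each stage gives
$$\int h\,d\mu \ge \int h_1\,d\mu \ge \int h_2\,d\mu \ge \cdots \ge \int h_n^+\,d\mu - \int h_n^-\,d\mu.$$
Therefore, replacing $h$ by ${}^ih$,
$$\int {}^iH^-\,d\mu = \lim_n \int ({}^ih)_n^-\,d\mu \ge \lim_n \int ({}^ih)_n^+\,d\mu - \int {}^ih\,d\mu \ge \int (g - f_i)\,d\mu.$$
Now taking the limit as $i \to \infty$ gives $\lim_i {}^iH^- = g$ a.e., since ${}^iH^- \le g$ and $\int f_i\,d\mu \to 0$.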
Proposition 7.3 If $0 \le u, v \in L^1(X)$, then $\sum_{n=0}^{\infty} T^n v = 0$ a.e. on
$$\Big\{x : \sum_{n=0}^{\infty} T^n u(x) = \infty,\ \sum_{n=0}^{\infty} T^n v(x) < \infty\Big\}.$$
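Proof: Let us show that $v = 0$ a.e. on the set in question; the stronger statement will follow upon replacing $v$ by $T^n v$ for each $n = 1, 2, \ldots$. We apply the Lemma to $f_i = u/i$ and $g = v$ to find
$$\lim_{i} {}^iH^- = v \quad\text{a.e.}$$
But we claim that ${}^iH^- = 0$, for each $i$, on $\{\sum_{n=0}^\infty T^n u = \infty,\ \sum_{n=0}^\infty T^n v < \infty\}$. For at each point $x$ of this set there is an $n$ with
$$\sum_{k=0}^{n-1} T^k f_i(x) > \sum_{k=0}^{n-1} T^k v(x),$$
so, by Proposition 7.1, there is an $n$ with $({}^ih)_n^+(x) > 0$. For this latter $n$, $({}^ih)_n^-(x) = 0$, and hence also $({}^ih)_m^-(x) = 0$ for $m \ge n$. Thus on this set
$$v = \lim_i {}^iH^- = 0 \quad\text{a.e.}$$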
Theorem 7.4 (Hopf Decomposition 1954) Choose any $u \in L^1(X)$ with $u(x) > 0$ a.e. The set
$$C = \Big\{x : \sum_{k=0}^{\infty} T^k u(x) = \infty\Big\},$$
called the conservative part of $X$, is independent of $u$. $D = X \setminus C$ is called the dissipative part of $X$. If $0 \le u \in L^1$, then $\sum_{k=0}^{\infty} T^k u < \infty$ a.e. on $D$.
Proof: Let $0 < v \in L^1(X)$. By Proposition 7.3, $v$ must be $0$ a.e. on
$$C(u) \setminus C(v) \subset \Big\{\sum_{k=0}^\infty T^k u = \infty,\ \sum_{k=0}^\infty T^k v < \infty\Big\},$$
so this set must have measure 0; by symmetry, so must $C(v) \setminus C(u)$. If $0 \le u \in L^1$, there is $u' > 0$ with $u' \in L^1$ and $u' \ge u$ a.e. Then $\sum_{k=0}^\infty T^k u' < \infty$ a.e. on $D$, so $\sum_{k=0}^\infty T^k u < \infty$ a.e. on $D$.
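On a finite state space the Hopf decomposition can be seen directly. The sketch uses counting measure and a made-up column-stochastic matrix. States 0 and 1 exchange mass only with each other, while state 2 keeps half of its mass and passes the rest on. For a strictly positive $u$, the partial sums $\sum_{k<N} T^k u$ grow without bound on $\{0, 1\}$ but converge to $2u(2)$ at state 2. So $C = \{0, 1\}$ and $D = \{2\}$. This $T$ satisfies $T^*1 = 1$ without being conservative. Reading off $C$ from two finite values of $N$ is only a heuristic, which is an assumption of the example.

```python
def apply(T, v):
    """(Tv)_i = sum_j T[i][j] v[j]."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def partial_sums(T, u, n_terms):
    """Return sum_{k=0}^{n_terms-1} T^k u."""
    total = [0.0] * len(u)
    current = list(u)
    for _ in range(n_terms):
        total = [a + b for a, b in zip(total, current)]
        current = apply(T, current)
    return total


# Columns sum to 1. Column j lists where the mass sitting at state j is sent.
T = [[0.5, 0.7, 0.25],
     [0.5, 0.3, 0.25],
     [0.0, 0.0, 0.50]]
u = [1.0, 1.0, 1.0]          # u > 0 everywhere

s_small = partial_sums(T, u, 100)
s_large = partial_sums(T, u, 1000)
growth = [b - a for a, b in zip(s_small, s_large)]
# Heuristic classification: a state looks "conservative" if the sums keep growing.
conservative = [i for i, gr in enumerate(growth) if gr > 1.0]
print("C ~", conservative, " sum at state 2:", s_large[2])
```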
The adjoint operator $T^*$ on the dual $L^\infty(X)$ of $L^1(X)$ is also positive: $\phi \ge 0$ implies $T^*\phi \ge 0$. Sets $E$ for which $T^*\chi_E = \chi_E$ a.e. are called invariant.
Proposition 7.5 If $T$ is conservative (i.e., $X = C$ up to a set of measure 0) and $\phi \in L^\infty(X)$ with $T^*\phi \le \phi$ a.e., then $T^*\phi = \phi$ a.e. In particular, $T^*1 = 1$ when $T$ is conservative (so that (3) above holds).
Proof: We may suppose that $\phi \ge 0$, by adding a constant if necessary. Choose $0 < u \in L^1$ with $\langle \phi, u\rangle = \int \phi u\,d\mu < \infty$. (This is possible because $X$ is $\sigma$-finite.) Then for each $n \ge 1$,
$$0 \le \Big\langle \phi - T^*\phi, \sum_{k=0}^{n} T^k u\Big\rangle = \sum_{k=0}^{n} \langle T^{*k}\phi, u\rangle - \sum_{k=1}^{n+1} \langle T^{*k}\phi, u\rangle = \langle \phi, u\rangle - \langle T^{*(n+1)}\phi, u\rangle \le \langle \phi, u\rangle < \infty$$
independently of $n$, so by Fatou's Lemma
$$\Big\langle \phi - T^*\phi, \sum_{k=0}^{\infty} T^k u\Big\rangle < \infty.$$
Since $\sum_{k=0}^\infty T^k u = \infty$ a.e., this is impossible unless $T^*\phi = \phi$ a.e. Of course $T^*1 \le 1$, since $\int 1 \cdot Tf\,d\mu \le \int 1 \cdot f\,d\mu$ for each $0 \le f \in L^1$. Thus $T^*1 = 1$:
$$\int Tf\,d\mu = \int f\,d\mu \quad\text{for } f \in L^1.$$
Exercises
1. Use measurable point transformations to construct examples of contractions of each of the following types: (a) conservative, (b) dissipative (i.e., $D = X$), (c) with both $C$ and $D$ nontrivial, (d) conservative ergodic, (e) nonconservative ergodic, (f) conservative nonergodic, (g) nonconservative nonergodic.
2. Show that when $T$ is induced by a point transformation, $D$ is generated by a wandering set ($W, T^{-1}W, T^{-2}W, \ldots$ pairwise disjoint) and the restriction of $T$ to $C$ is incompressible ($T^{-1}B \subset B$ implies $\mu(B \setminus T^{-1}B) = 0$).
3. Let $\mathcal{I}$ denote the family of all invariant subsets of $C$. (a) $\mathcal{I}$ is a $\sigma$-algebra. (b) $\mathcal{I}$ coincides with the family of all sets of the form $\{\sum_{k=0}^\infty T^k f = \infty\}$ for $0 \le f \in L^1$. (c) $T\chi_I = \chi_I$ on $I$ implies $I \in \mathcal{I}$; $I \in \mathcal{I}$ implies $T\chi_I = \chi_I$ on $C$. (d) These are equivalent for each non-negative measurable $h$ on $C$: (i) $Th \le h$ on $C$, (ii) $Th = h$ on $C$, (iii) $h$ is $\mathcal{I}$-measurable on $C$.
4. Use the Filling Scheme to prove the formula
{>f
SA S
dp = ff dp
of 3.1.
3.8. The Chacon-Ornstein Theorem
Theorem 8.1 (Chacon and Ornstein 1960) If $(X, \mathcal{B}, \mu)$ is $\sigma$-finite, $0 \le f, g \in L^1(X)$, and $T: L^1(X) \to L^1(X)$ is a positive contraction, then
$$\frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g}$$
converges to a finite limit a.e. on $\{x : \sum_{k=0}^{\infty} T^k g(x) > 0\}$.
Remark: On $C$ the limit can be identified as a quotient
$$\frac{E(\tilde f \mid \mathcal{I})}{E(\tilde g \mid \mathcal{I})}$$
of conditional expectations of modifications $\tilde f$, $\tilde g$ of $f$ and $g$ with respect to the $\sigma$-algebra $\mathcal{I}$ of invariant subsets of $C$ (see Foguel 1980, 1.3), while on $D$ it is
the quotient
$$\frac{\sum_{k=0}^{\infty} T^k f}{\sum_{k=0}^{\infty} T^k g}$$
of two almost surely finite numbers. The main interest of the Chacon-Ornstein Theorem lies in the case when $T$ is conservative. We will henceforth assume that $T$ is conservative and ergodic. The general statement follows from this special case with some extra effort, which we will not undertake here. (See Foguel 1980 for a complete discussion.) Also, notice that the hypothesis $f \ge 0$ is not necessary, since the general statement could be proved by applying the theorem to $f^+$ and $f^-$ separately. This Theorem does not yet imply that if $T$ is a positive contraction on $L^1$, then
$$\frac{1}{n}\sum_{k=0}^{n-1} T^k f$$
converges a.e. for each $f \in L^1$. In fact, such a statement is false (Chacon 1964). However, M. A. Akcoglu (1975) has shown that if $1 < p < \infty$ and $T$ is a positive contraction on $L^p$, then $\frac{1}{n}\sum_{k=0}^{n-1} T^k f$ does converge a.e. for each $f \in L^p$.
Suppose then that $T: L^1(X) \to L^1(X)$ is a conservative, ergodic, positive contraction, $0 \le f, g \in L^1$, $h = f - g$, and $h_1, h_2, \ldots$ are associated to $f$ and $g$ by the filling scheme. Let $A = \{x : H^-(x) > 0\}$ and, for $k = 0, 1, 2, \ldots$,
$$v_k = (M_{A^c}T^*)^k \chi_A.$$
(If $T$ were induced by a point transformation $T: X \to X$ on a probability space, then we would have $v_k(x) = \chi_{\{n_A = k\}}(x)$, where $n_A(y)$ is the first entry time of $y$ to $A$.)
Lemma 8.2 If $\mu(A) > 0$, then $\sum_{k=0}^\infty v_k = 1$ a.e.
Proof: First we will prove by induction that
$$\sum_{k=0}^{n} v_k + (M_{A^c}T^*)^n \chi_{A^c} = 1 \quad\text{a.e.}$$
For $n = 0$, the formula says that $v_0 + \chi_{A^c} = 1$, i.e., $v_0 = \chi_A$, which is true. If the formula holds for $n$, then
$$\sum_{k=0}^{n+1} v_k + (M_{A^c}T^*)^{n+1}\chi_{A^c} = \chi_A + (M_{A^c}T^*)\Big\{\sum_{k=0}^{n} v_k + (M_{A^c}T^*)^n\chi_{A^c}\Big\} = \chi_A + (M_{A^c}T^*)(1) = \chi_A + \chi_{A^c} = 1 \quad\text{a.e.},$$
since $T^*1 = 1$ when $T$ is conservative. Since the $v_k$ are non-negative, the $(M_{A^c}T^*)^n\chi_{A^c}$ decrease a.e. with $n$ to some $w \ge 0$. We have
$$\sum_{k=0}^{\infty} v_k + w = 1, \qquad w = 0 \text{ on } A, \qquad M_{A^c}T^*w = w.$$
The last equation implies that $T^*w \ge w$, so, by a slight extension of Proposition 7.5, $T^*w = w$ a.e. (If $T^*z \ge z$, then $T^*(-z) \le -z$, so $T^*z = z$.) We need to show, then, that $w = 0$ a.e. Choose $0 \le f \in L^1$, $f \ne 0$, with $\operatorname{supp} f \subset A$. Then $\int fw\,d\mu = 0$, so that
$$\int \sum_{k=0}^{n-1} T^k f \cdot w\,d\mu = \sum_{k=0}^{n-1} \int f \cdot T^{*k}w\,d\mu = n\int fw\,d\mu = 0.$$
Since $\sum_{k=0}^\infty T^k f = \infty$ a.e., this is impossible unless $w = 0$ a.e.
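The parenthetical remark identifying $v_k$ with first-entry indicators can be checked for a point transformation. In this hypothetical example, $X$ is a finite set with counting measure, $T$ is a single cyclic permutation (hence ergodic), and $T^*\phi = \phi \circ T$. The sketch computes $v_k = (M_{A^c}T^*)^k\chi_A$ literally and compares it with $\chi_{\{n_A = k\}}$. It then checks that $\sum_k v_k = 1$, as in Lemma 8.2.

```python
def entry_indicators(perm, A, k_max):
    """v_0, ..., v_{k_max} with v_0 = chi_A and v_{k+1} = chi_{A^c} * (v_k o T),
    where T x = perm[x], so that T* phi = phi o T."""
    n = len(perm)
    v = [1 if x in A else 0 for x in range(n)]
    vs = [v]
    for _ in range(k_max):
        v = [(0 if x in A else 1) * v[perm[x]] for x in range(n)]
        vs.append(v)
    return vs


def first_entry_time(perm, A, x):
    """n_A(x): the least k >= 0 with T^k x in A (assumes the orbit of x meets A)."""
    k = 0
    while x not in A:
        x = perm[x]
        k += 1
    return k


n_points = 7
perm = [(x + 1) % n_points for x in range(n_points)]   # one cycle: an ergodic T
A = {2, 5}
vs = entry_indicators(perm, A, k_max=n_points)

for x in range(n_points):
    for k, v in enumerate(vs):
        assert v[x] == (1 if first_entry_time(perm, A, x) == k else 0)
totals = [sum(v[x] for v in vs) for x in range(n_points)]
print(totals)   # prints [1, 1, 1, 1, 1, 1, 1]
```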
Proposition 8.3 If $T$ is conservative ergodic and $h \in L^1$ with $H^- \ne 0$, then
$$\lim_{n\to\infty} \downarrow \int h_n^+\,d\mu = 0 \quad\text{and}\quad \sum_{n=0}^{\infty} h_n^+ < \infty \text{ a.e.}$$
($H^- \ne 0$ means $\int f\,d\mu < \int g\,d\mu$, so that at the outset the size of the hole exceeds the amount of sand. The Proposition says that under this condition, in the conservative ergodic case, all the sand disappears eventually, and during the entire process only a finite amount of sand passes over each point of $X$.)
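Proof: We know from (2) in the preceding section that
$$h_n^+ \le (TM_{A^c})^n h^+ \quad\text{for } n = 0, 1, 2, \ldots.$$
Therefore, for each $n, k = 0, 1, 2, \ldots$,
$$\int h_n^+ v_k\,d\mu \le \int (TM_{A^c})^n h^+ \cdot v_k\,d\mu = \int h^+ (M_{A^c}T^*)^n v_k\,d\mu = \int h^+ v_{n+k}\,d\mu.$$
Now sum over $k$ and use the Dominated Convergence Theorem (together with Lemma 8.2, which applies since $H^- \ne 0$ means $\mu(A) > 0$) to find that
$$\int h_n^+\,d\mu = \int h_n^+ \sum_{k=0}^{\infty} v_k\,d\mu \le \int h^+ \sum_{k=0}^{\infty} v_{n+k}\,d\mu \searrow 0 \quad\text{as } n \to \infty,$$
since $\sum_{k=0}^{\infty} v_{n+k} \searrow 0$ a.e. as $n \to \infty$. On the other hand, if we summed over $n$ we would have
$$\int \sum_{n=0}^{\infty} h_n^+ v_k\,d\mu \le \int h^+ \sum_{n=0}^{\infty} v_{n+k}\,d\mu \le \int h^+\,d\mu < \infty,$$
so that $\sum_{n=0}^\infty h_n^+ < \infty$ a.e. on each $E_k = \{v_k > 0\}$. Hence the sum is finite a.e. on $X$, since $\sum_k v_k > 0$ a.e. on $X$.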
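Proposition 8.3 can be watched on a finite example. The sketch uses counting measure and a made-up column-stochastic matrix with all entries positive, which is therefore conservative ergodic. The data give $\int f\,d\mu = 1 < 1.6 = \int g\,d\mu$. The code shows three things. The total sand $\int h_n^+\,d\mu$ decays to 0. The running total $\sum_n h_n^+$ stays finite at every point. The remaining hole $\int H^-\,d\mu$ approaches $\int g\,d\mu - \int f\,d\mu$, as (*) predicts.

```python
def apply(T, v):
    """(Tv)_i = sum_j T[i][j] v[j]."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def filling_scheme_totals(T, f, g, n_steps):
    """Run h_{n+1} = T h_n^+ - h_n^- and record total sand, total hole,
    and the pointwise running sums of h_n^+."""
    h = [a - b for a, b in zip(f, g)]
    sand_totals, hole_totals = [], []
    sand_passed = [0.0] * len(h)          # sum over n of h_n^+, pointwise
    for _ in range(n_steps):
        sand = [max(a, 0.0) for a in h]
        hole = [max(-a, 0.0) for a in h]
        sand_totals.append(sum(sand))
        hole_totals.append(sum(hole))
        sand_passed = [s + p for s, p in zip(sand_passed, sand)]
        h = [s - d for s, d in zip(apply(T, sand), hole)]
    return sand_totals, hole_totals, sand_passed


T = [[0.2, 0.5, 0.3],
     [0.5, 0.2, 0.4],
     [0.3, 0.3, 0.3]]
f = [1.0, 0.0, 0.0]
g = [0.0, 0.8, 0.8]
sand_totals, hole_totals, sand_passed = filling_scheme_totals(T, f, g, 300)
print(sand_totals[:5], sand_totals[-1])
print("remaining hole:", hole_totals[-1], " expected:", sum(g) - sum(f))
```

In this example the holes at states 1 and 2 are never filled. The sand therefore stays at state 0 and shrinks by the factor $0.2$ at each step.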
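Theorem 8.4 Let $T: L^1(X) \to L^1(X)$ be a conservative, ergodic, positive contraction and $0 \le f, g \in L^1(X)$. Then
$$\lim_{n\to\infty} \frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g} = \frac{\int f\,d\mu}{\int g\,d\mu}$$
a.e. on $\{x : \sum_{k=0}^\infty T^k g(x) > 0\}$.
Proof: We suppose that $g \ne 0$ and consider first the case when $\int f\,d\mu < \int g\,d\mu$. Let $h = f - g$ and apply the filling scheme. Then $\int H^-\,d\mu \ge \int (g - f)\,d\mu > 0$ by (*), so $H^- \ne 0$. By Proposition 7.1,
$$\limsup_{n\to\infty} \frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g} \le \limsup_{n\to\infty} \frac{\sum_{k=0}^{n-1} h_k^+}{\sum_{k=0}^{n-1} T^k g} + \limsup_{n\to\infty} \frac{u_n}{\sum_{k=0}^{n-1} T^k g} \le 0 + 1 = 1.$$
The first term vanishes because $\sum_k h_k^+ < \infty$ a.e. by Proposition 8.3, while $\sum_k T^k g = \infty$ a.e. The second term is at most 1 because $u_n \le \sum_{k=0}^{n-1} T^k g$ by Proposition 7.1.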
Let
$$Q_n(f, g) = \frac{\sum_{k=0}^{n-1} T^k f}{\sum_{k=0}^{n-1} T^k g}.$$
If $p > 0$, then $Q_n(pf, g) = pQ_n(f, g)$. Thus given arbitrary $0 \le f, g \in L^1$ with $g \ne 0$, we can choose $p > 0$ so that $\int pf\,d\mu < \int g\,d\mu$ and apply the above estimate to see that
$$\limsup_n Q_n(f, g) \le \frac{1}{p}.$$
Letting $p \nearrow \int g\,d\mu / \int f\,d\mu$, we find that
$$\limsup_n Q_n(f, g) \le \frac{\int f\,d\mu}{\int g\,d\mu}.$$
If $f = 0$, the result is obvious. Otherwise, we interchange $f$ and $g$ and use the information already gained to conclude that
$$\liminf_n Q_n(f, g) = \frac{1}{\limsup_n Q_n(g, f)} \ge \frac{\int f\,d\mu}{\int g\,d\mu}.$$
Finally, we show how the Hopf Maximal Ergodic Theorem, the usual starting point for the proof of operator ergodic theorems, follows from the filling scheme. (By Exercise 2, the implication can also be reversed.)
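Before turning to the maximal theorem, here is a numerical look at Theorem 8.4. The sketch uses the same finite-state assumptions as above: counting measure and a made-up column-stochastic matrix with positive entries, which is a conservative, ergodic, positive contraction. At every point, the ratio of partial sums approaches $\int f\,d\mu / \int g\,d\mu$, even though $g$ vanishes at two of the three points.

```python
def apply(T, v):
    """(Tv)_i = sum_j T[i][j] v[j]."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def ratio_of_partial_sums(T, f, g, n):
    """Pointwise Q_n(f, g) = sum_{k<n} T^k f / sum_{k<n} T^k g."""
    sf, sg = [0.0] * len(f), [0.0] * len(g)
    tf, tg = list(f), list(g)
    for _ in range(n):
        sf = [a + b for a, b in zip(sf, tf)]
        sg = [a + b for a, b in zip(sg, tg)]
        tf, tg = apply(T, tf), apply(T, tg)
    return [a / b for a, b in zip(sf, sg)]


T = [[0.2, 0.5, 0.3],
     [0.5, 0.2, 0.4],
     [0.3, 0.3, 0.3]]
f = [2.0, 1.0, 0.0]
g = [0.0, 0.0, 1.0]
for n in (10, 100, 20_000):
    print(n, ratio_of_partial_sums(T, f, g, n))
print("predicted limit:", sum(f) / sum(g))
```

The convergence is only of order $1/n$, since both partial sums grow linearly while their bounded corrections do not cancel.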
Theorem 8.5 Hopf Maximal Ergodic Theorem (1954) Let $h \in L^1(X)$ and
$$E = \Big\{x : \sup_{n \ge 1} \sum_{k=0}^{n-1} T^k h(x) > 0\Big\}.$$
Then $\int_E h\,d\mu \ge 0$.
Proof: Note that
$$\{h > 0\} \subset E \subset \{H^- = 0\}.$$
To see this, write $f - g = h = h^+ - h^-$, i.e., $f = h^+$, $g = h^-$. As in the proof of Proposition 7.3, at each point of $E$ we have
$$\sum_{k=0}^{n-1} T^k f > \sum_{k=0}^{n-1} T^k g \quad\text{for some } n,$$
so by Proposition 7.1 $h_n^+ > 0$ for some $n$ at each point of $E$. Again we find that $H^- = \lim_n \downarrow h_n^- = 0$ a.e. on $E$. Thus
$$\int h\,d\mu \ge -\int H^-\,d\mu$$
(see (*) in the preceding section), and hence
$$\int_E h\,d\mu + \int_{E^c} h\,d\mu \ge -\int H^-\,d\mu = -\int_{E^c} H^-\,d\mu.$$
Since $h \le 0$ on $E^c$, so that $h = -h^-$ there, the left side also equals
$$\int_E h\,d\mu - \int_{E^c} h^-\,d\mu.$$
We find that
$$\int_E h\,d\mu \ge \int_{E^c} h^-\,d\mu - \int_{E^c} H^-\,d\mu,$$
and the latter expression is nonnegative, since $h^- \ge H^-$ a.e.
Exercises
1. (Rost) If $T$ is conservative ergodic, then (a) $\lim_{n\to\infty} \downarrow \int h_n^+\,d\mu = 0$ if and only if $\int h\,d\mu \le 0$; (b) $\lim_{n\to\infty} \downarrow \int h_n^-\,d\mu = 0$ if and only if $\int h\,d\mu \ge 0$. Interpret in terms of sand and holes.
2. Prove Proposition 8.3 from the Hopf Maximal Ergodic Theorem.
3. Prove Hurewicz's Ergodic Theorem (1944): If $T$ is nonsingular on $(X, \mathcal{B}, \mu)$ ($\mu T \sim \mu T^{-1} \sim \mu$) and $w_n = d\mu T^n/d\mu$ for $n = 0, 1, 2, \ldots$, then for any $f \in L^1$ and any positive measurable $g$ with
$$\sum_{k=0}^{n-1} g(T^k x)\,w_k(x) \to \infty \quad\text{a.e.},$$
the limit
$$\lim_{n\to\infty} \frac{\sum_{k=0}^{n-1} f(T^k x)\,w_k(x)}{\sum_{k=0}^{n-1} g(T^k x)\,w_k(x)}$$
exists a.e.
4. Extend the Chacon-Ornstein Theorem to the case of a one-parameter group $\{T_t : -\infty < t < \infty\}$ of positive contractions on $L^1(X)$.
5. Show that the Chacon-Ornstein Theorem for general $T$ follows from the case when $T$ is conservative. (Hint: Consider the restriction of $T$ to $L^1(C)$.)
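The inequality of Theorem 8.5 can be tested numerically in its finite-horizon form, $\int_{E_N} h\,d\mu \ge 0$ with $E_N = \{\max_{1 \le n \le N} \sum_{k<n} T^k h > 0\}$. This is the form in which the maximal theorem is usually proved, and the theorem follows from it by letting $N \to \infty$. The sketch draws random sub-Markov matrices on a finite set with counting measure (an assumption of the example) and random signed $h$. It is a sanity check, not a proof.

```python
import random


def apply(T, v):
    """(Tv)_i = sum_j T[i][j] v[j]."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def random_sub_markov(d, rng):
    """Random nonnegative d x d matrix whose columns sum to at most 1."""
    cols = []
    for _ in range(d):
        w = [rng.random() for _ in range(d)]
        scale = rng.uniform(0.5, 1.0) / sum(w)     # column sum in [0.5, 1]
        cols.append([x * scale for x in w])
    return [[cols[j][i] for j in range(d)] for i in range(d)]   # entry T[i][j] = cols[j][i]


def maximal_set_integral(T, h, horizon):
    """Integral of h over E_N = {x : max_{1<=n<=N} sum_{k<n} T^k h(x) > 0}."""
    d = len(h)
    partial, term = [0.0] * d, list(h)
    best = [float("-inf")] * d
    for _ in range(horizon):
        partial = [p + t for p, t in zip(partial, term)]
        best = [max(b, p) for b, p in zip(best, partial)]
        term = apply(T, term)
    return sum(h[x] for x in range(d) if best[x] > 0)


rng = random.Random(12345)
worst = min(
    maximal_set_integral(random_sub_markov(5, rng), [rng.uniform(-1, 1) for _ in range(5)], 30)
    for _ in range(500)
)
print("smallest integral over E_N found:", worst)
```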
4 More about recurrence
This chapter presents several further topics concerning recurrence and mixing. First is a direct construction (based on the theory of almost periodic functions) of eigenfunctions for m.p.t.s which are not weakly mixing. The second section presents the purely topological analogues of recurrence, ergodicity, and weak and strong mixing, and the third introduces Furstenberg's theory of multiple recurrence and his proof of the Szemerédi Theorem. In 4.4 we give the Jewett-Bellow-Furstenberg proof of the existence of uniquely ergodic topological representations of ergodic m.p.t.s. The final section examines two examples of weakly mixing m.p.t.s which are not strongly mixing. The first is an example of Kakutani, which we show is not even topologically strongly mixing. The second is one of Chacon, which is also prime (i.e., without proper invariant sub-$\sigma$-algebras), as was discovered by del Junco.
4.1. Construction of eigenfunctions
We now give a more direct proof of the existence of eigenfunctions for m.p.t.s which are not weakly mixing. The idea, apparently due to Varadhan, Furstenberg, and Katznelson, is to reduce the statement to the existence of nontrivial characters on compact abelian groups. However, this existence is itself usually proved with the help of the Spectral Theorem for compact operators. We will therefore complete the argument by means of an elementary proof, which also introduces some important ideas from topological dynamics and the theory of almost periodic functions.
A. Existence of rigid factors
Theorem 1.1 Let $T$ be a m.p.t. on $(X, \mathcal{B}, \mu)$. If $T$ is ergodic but $T \times T$ is not ergodic, then $T$ has a factor which is an isometry on a compact metric space, hence a rotation on a compact abelian (even monothetic) group.
Proof: Suppose that $T \times T$ is not ergodic, and let $G: X \times X \to \mathbb{C}$ be a nonconstant invariant function in $L^2(X \times X)$. For $x, y \in X$, define
$$d(x, y) = \|G(x, \cdot) - G(y, \cdot)\|_2 = \Big(\int_X |G(x, z) - G(y, z)|^2\,d\mu(z)\Big)^{1/2}.$$
That is, we have a map $i: X \to L^2(X)$ defined by $i(x) = G(x, \cdot)$, and $d$ is the pull-back to $X$ of the metric on $L^2(X)$: $d(x, y) = \|i(x) - i(y)\|_2$. Clearly $d$ is a pseudo-metric on $X$ (i.e., $d$ is symmetric and satisfies the triangle inequality). Since $G$ is $(T \times T)$-invariant, $T$ is an isometry of $X$ with respect to $d$. We form the quotient metric space $\tilde X$ of $X$ obtained by identifying points of $X$ which are at zero distance from one another. The quotient map $\pi: X \to \tilde X$ is continuous, and if $\tilde d(\pi x, \pi y) = d(x, y)$, then $\tilde d$ is a metric on $\tilde X$. We also obtain a map $\tilde T: \tilde X \to \tilde X$ defined by $\tilde T(\pi x) = \pi(Tx)$. The definition is allowable since $x \sim y$ if and only if $Tx \sim Ty$, where $x \sim y$ means $d(x, y) = 0$. The measure $\mu$ is carried to a Borel measure $\tilde\mu$ on $\tilde X$. Notice that $\pi$ is measurable, since $\pi^{-1}$ of an $\varepsilon$-ball in $(\tilde X, \tilde d)$ is an $\varepsilon$-ball in $(X, d)$. Such a ball is measurable because, by the definition of $d$, $d$ is a measurable function of $x$ and $y$. Notice that $\tilde T$ is ergodic, since it is a factor of $T$. I claim now that $\tilde X$ has more than one point. For if $\tilde X$ is a singleton, then $G(x, \cdot) = G(y, \cdot)$ in $L^2(X)$ for each $x, y \in X$. Then $g(u) = G(x, u)$ is invariant, and hence constant, for a.e. $x$, so $G$ would be constant, contrary to assumption.
Next we will show that $\tilde X$ is totally bounded (i.e., for each $\varepsilon > 0$, $\tilde X$ can be covered by finitely many $\varepsilon$-balls). Notice first that if $B_\varepsilon(x)$ is an $\varepsilon$-ball in $X$, with center $x \in X$, then $\mu(B_\varepsilon(x))$ is a.e. a constant independent of $x$. For
$$\mu(B_\varepsilon(x)) = \mu(TB_\varepsilon(x)) = \mu(B_\varepsilon(Tx)),$$
since $\mu$ is $T$-invariant and $T$ is an isometry with respect to $d$. So $\mu(B_\varepsilon(x))$ is an invariant function of $x$ and hence, by ergodicity of $T$, must be a constant a.e. Call this constant $c_\varepsilon$. We next show that $c_\varepsilon > 0$. First observe that $(\tilde X, \tilde d)$ is separable. This is so because $(\tilde X, \tilde d)$ is (isometric to) a topological subspace of $L^2(X)$, and $L^2(X)$ is separable and hence second countable. So $(\tilde X, \tilde d)$ is second countable and hence separable. Thus $\tilde X$ can be covered by countably many $\varepsilon$-balls, and hence one of these must have positive measure. Since they all have measure $c_\varepsilon$, however, it follows that $c_\varepsilon > 0$. Thus $\tilde\mu(B_\varepsilon(\tilde x)) = c_\varepsilon > 0$ a.e. for each $\varepsilon > 0$, with $c_\varepsilon$ independent of $\tilde x \in \tilde X$. By taking a sequence $\{\varepsilon_n\}$ decreasing to 0 and discarding a set of measure 0
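from $\tilde X$ for each $n$, we may assume that
$$\tilde\mu(B_\varepsilon(\tilde x)) = c_\varepsilon > 0 \quad\text{for all } \tilde x \in \tilde X, \text{ for each } \varepsilon = \varepsilon_n.$$
Now given any $\delta > 0$, choose $n$ so that $2\varepsilon_n < \delta$. We will show that finitely many $2\varepsilon_n$-balls, and hence finitely many $\delta$-balls, cover $\tilde X$. Since $\tilde T$ is ergodic,
$$\tilde\mu\Big(\bigcup_{k=0}^{\infty} \tilde T^k B_{\varepsilon_n}(\tilde x)\Big) = 1 \quad\text{for each } \tilde x \in \tilde X.$$
Thus, fixing $\tilde x \in \tilde X$, we may choose $N$ so that
$$\tilde\mu\Big(\bigcup_{k=0}^{N} \tilde T^k B_{\varepsilon_n}(\tilde x)\Big) > 1 - c_{\varepsilon_n}.$$
Then $\bigcup_{k=0}^{N} \tilde T^k B_{\varepsilon_n}(\tilde x)$ must hit every $\varepsilon_n$-ball in $\tilde X$, and therefore the balls $B_{2\varepsilon_n}(\tilde T^k \tilde x)$, $0 \le k \le N$, cover $\tilde X$. We have shown, then, that $\tilde X$ is totally bounded. It follows that the completion $\bar X$ of $\tilde X$ is compact. $\tilde T$ extends to an isometry $\bar T$ on $\bar X$, and $\tilde\mu$ on $\tilde X$ extends to $\bar\mu$ on $\bar X$ with $\bar\mu(\bar X \setminus \tilde X) = 0$. $(\bar X, \bar T, \bar\mu)$ is a factor of $(X, T, \mu)$; thus $\bar T$ is an ergodic isometry on the compact metric space $\bar X$. It is an easy exercise to show that $\bar X$ is a compact abelian group and $\bar T$ is multiplication by a certain member of $\bar X$ (Exercise 1). Then any nontrivial character of $\bar X$ will produce an eigenfunction of $\bar T$ and hence of $T$.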
B. Almost periodicity
Suppose that $T: X \to X$ is an ergodic isometry on a compact metric space $X$ with respect to a Borel measure $\mu$ which assigns positive measure to each nonempty open subset of $X$. (The system $(\bar X, \bar T, \bar\mu)$ constructed in the preceding section has these properties.) We will show that if $f$ is a continuous function on $X$ and $x_0 \in X$, then $\{f(T^n x_0): n \in \mathbb{Z}\}$ is an almost periodic sequence. A sequence $\{a_n : n \in \mathbb{Z}\}$ is called almost periodic (we write $a \in AP$) in case, for each $\varepsilon > 0$, the set of all $p \in \mathbb{Z}$ for which $\sup_n |a_{n+p} - a_n| < \varepsilon$ (the $\varepsilon$-periods of $a$) is relatively dense in $\mathbb{Z}$. This means that there is a $K$ such that every interval of $K$ consecutive integers contains such a $p$. … there is a relatively dense set of $n$ for which $d(x, T^n x) < \varepsilon$.
For a moment, let us consider an arbitrary homeomorphism $T$ on a compact metric space $X$. For each $x \in X$, let $\mathcal{O}(x) = \{T^n x : n \in \mathbb{Z}\}$. A nonempty closed $T$-invariant set $M$ is called minimal if it contains no proper closed $T$-invariant subsets. $M$ is minimal if and only if the orbit of each point of $M$ is dense in $M$. The phrase 'almost periodic' is used in a slightly different way in topological dynamics: a point $x \in X$ is called almost periodic if for each $\varepsilon > 0$ there is a relatively dense set of $n$ such that $d(x, T^n x) < \varepsilon$.
Theorem 1.2 (Gottschalk 1944) $\overline{\mathcal{O}(x)}$ is minimal if and only if $x$ is almost periodic.
Proof: Suppose that $x$ is almost periodic. Let $y \in \overline{\mathcal{O}(x)}$ and let $U$ be a compact neighborhood of $x$. If we can show that $\mathcal{O}(y) \cap U \ne \emptyset$, it will follow that $x \in \overline{\mathcal{O}(y)}$, and hence $\overline{\mathcal{O}(y)} = \overline{\mathcal{O}(x)}$. If $R = \{n : T^n x \in U\}$, then $R$ is relatively dense, so there is a finite interval of integers $I \subset \mathbb{Z}$ such that $\mathbb{Z} = R + I$. Then, writing $nx$ for $T^n x$,
$$\mathcal{O}(x) \subset (R + I)x \subset I(Rx) \subset IU.$$
This set is compact, so that $\overline{\mathcal{O}(x)} \subset IU$. Thus, since $y \in \overline{\mathcal{O}(x)}$, we have $y \in IU$ and hence $\mathcal{O}(y) \cap U \ne \emptyset$.
Conversely, suppose that $\overline{\mathcal{O}(x)}$ is minimal and let $U$ be an open neighborhood of $x$; we need to show that $R = \{n : T^n x \in U\}$ is relatively dense. Each point of $\overline{\mathcal{O}(x)}$ has an orbit which meets $U$; therefore,
$$\overline{\mathcal{O}(x)} \subset \bigcup_{n \in \mathbb{Z}} T^{-n} U,$$
and by compactness there are $n_1, n_2, \ldots, n_r$ such that $\overline{\mathcal{O}(x)} \subset T^{-n_1}U \cup \cdots \cup T^{-n_r}U$. If $n \in \mathbb{Z}$, we can find $i$, $1 \le i \le r$, such that $T^n x \in T^{-n_i}U$; then $T^{n+n_i}x \in U$, so that $n + n_i \in R$, i.e., $n \in R - n_i$. This shows that
$$\mathbb{Z} = R - \{n_1, \ldots, n_r\} = \bigcup_{i=1}^{r} (R - n_i),$$
so that $R$ is relatively dense (finitely many translates of $R$ cover $\mathbb{Z}$). (This proof is written in language which allows it to extend easily to the case of actions of groups other than $\mathbb{Z}$.)
Remark 1.3 Uniform almost periodicity Since $T$ is an ergodic isometry, this Theorem readily implies that $T$ is in fact uniformly almost periodic. For each $n \in \mathbb{Z}$ the continuous function
$$f_n(x) = d(x, T^n x)$$
must be constant on $X$, since $f_n(Tx) = f_n(x)$. (An invariant continuous function is constant a.e. by ergodicity, hence everywhere, because $\mu$ charges every nonempty open set.) Thus $d(x, T^n x) < \varepsilon$ exactly when $d(y, T^n y) < \varepsilon$, for any $x, y \in X$. By Zorn's Lemma, $X$ contains minimal sets and hence almost periodic points. It follows that every point of $X$ is almost periodic and that $\{n : d(x, T^n x) < \varepsilon\}$ is a relatively dense subset of $\mathbb{Z}$ independent of $x$ for each $\varepsilon > 0$.
Remark 1.4 Dense orbits Let us note also that $T$ has dense orbits. If $\{U_1, U_2, \ldots\}$ is a countable base for the topology of $X$, let
$$X_i = \{x \in X : \text{there is } n \ge 0 \text{ with } T^n x \in U_i\}.$$
Since $T^{-1}X_i \subset X_i$ and $U_i \subset X_i$, so that $\mu(X_i) > 0$, we have $\mu(X_i) = 1$ for each $i$. Thus $\bigcap_i X_i$ has measure 1, and so in fact almost every orbit is dense. That $T$ is an isometry, however, implies that every orbit is dense. For if $x_0 \in X$ has a dense orbit and $x, y \in X$ are given, then
$$d(T^n x, y) \le d(T^n x, T^{n+m}x_0) + d(T^{n+m}x_0, y) = d(x, T^m x_0) + d(T^{n+m}x_0, y),$$
which can be made arbitrarily small by appropriately choosing first $m$ and then $n$. Thus our $(X, T)$ is minimal.
Remark 1.5 Uniform existence of mean values We will show that
for the ergodic isometry $T$,
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} f(T^k x)$$
exists uniformly for $x \in X$ for each $f \in \mathcal{C}(X)$. (This, together with the limit's being constant, is the condition that $(X, T)$ be uniquely ergodic: there is exactly one $T$-invariant Borel probability measure on $X$.) For each $f \in \mathcal{C}(X)$ the functions $f_n = \frac{1}{n}\sum_{k=0}^{n-1} fT^k$ form an equicontinuous family which is uniformly bounded. By the Arzelà-Ascoli Theorem, some subsequence $\{f_{n_i}\}$ therefore converges uniformly on $X$. Of course the limit must be $\int f\,d\mu$, since $f_n \to \int f\,d\mu$ a.e. This observation shows that $(X, T)$ is uniquely ergodic: for if $\nu$ is any $T$-invariant Borel probability measure on $X$, we have
$$\int f\,d\nu = \frac{1}{n_i}\sum_{k=0}^{n_i-1} \int fT^k\,d\nu = \int f_{n_i}\,d\nu \to \int \Big[\int f\,d\mu\Big]\,d\nu = \int f\,d\mu,$$
since $f_{n_i} \to \int f\,d\mu$ uniformly; therefore $\nu = \mu$. Now if $\{f_n\}$ failed to converge to $\int f\,d\mu$ uniformly, we could find $\varepsilon > 0$, $m_i \nearrow \infty$, and $x_i \in X$ with
$$\Big|\frac{1}{m_i}\sum_{k=0}^{m_i-1} f(T^k x_i) - \int f\,d\mu\Big| \ge \varepsilon.$$
Define continuous linear functionals $\mu_i$ on $\mathcal{C}(X)$ by
$$\mu_i(g) = \frac{1}{m_i}\sum_{k=0}^{m_i-1} g(T^k x_i) \quad\text{for } g \in \mathcal{C}(X),$$
and let $\nu$ be a weak* limit point of $\{\mu_i\}$. Then $|\int f\,d\nu - \int f\,d\mu| \ge \varepsilon$. But $\nu$ is also $T$-invariant: for $g \in \mathcal{C}(X)$,
$$\mu_i(gT) = (gT)_{m_i}(x_i) \quad\text{and}\quad \mu_i(g) = g_{m_i}(x_i),$$
and since $(gT)_{m_i}(x_i) = g_{m_i}(x_i) + \frac{1}{m_i}[g(T^{m_i}x_i) - g(x_i)]$, these differ by a term which tends to 0 as $i \to \infty$. Since $T$ is uniquely ergodic, this situation is impossible.
Remark 1.6 This proof shows that if $f \in \mathcal{C}(X)$, then
$$\frac{1}{n}\sum_{k=0}^{n-1} f(T^k x_n) \to \int f\,d\mu \quad\text{uniformly}$$
for any choice of $x_1, x_2, \ldots \in X$.
Remark 1.7 From 1.8 below, it will follow that each almost periodic sequence $\{a_n\}$ has a mean value
$$M\{a_n\} = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} a_{k+m},$$
the limit existing uniformly in $m$ (even if $m$ varies with $n$).
Remark 1.8 A sequence $\{a_n : n \in \mathbb{Z}\}$ is almost periodic if and only if its orbit closure under translation, the set of all sequences $\{a_{n+j} : n \in \mathbb{Z}\}$, $j \in \mathbb{Z}$, is totally bounded in the supremum norm. This statement is nothing more than a paraphrase of the definition of almost periodicity. Suppose that every interval of length $K$ contains an $\varepsilon$-period of $a$. Then the $\varepsilon$-balls centered at $K$ consecutive translates of $a$ cover the set of all translates of $a$. Indeed, given $j$, we find an $\varepsilon$-period $p$ and a $j_0$ in the consecutive set such that $j = p + j_0$ and note that
$$|a_{n+j} - a_{n+j_0}| = |a_{n+j_0+p} - a_{n+j_0}| < \varepsilon \quad\text{for all } n.$$
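For a concrete almost periodic sequence, take $a_n = \cos(2\pi(x_0 + n\alpha))$. It comes from the rotation by $\alpha$ and the continuous function $\cos 2\pi x$. Since the orbit is dense, $\sup_n |a_{n+p} - a_n| = 2|\sin(\pi p\alpha)|$, so the $\varepsilon$-periods are the $p$ with $2|\sin(\pi p\alpha)| < \varepsilon$. The sketch uses $\alpha = (\sqrt5 - 1)/2$, $\varepsilon = 0.2$, and search ranges chosen arbitrarily. It lists the $\varepsilon$-periods and compares the closed form with a brute-force supremum over a finite window of $n$. It also reports the largest gap between consecutive $\varepsilon$-periods, which serves as the $K$ of the definition.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2     # rotation number (an arbitrary irrational choice)
X0 = 0.1                           # starting point of the orbit


def a(n):
    """a_n = f(T^n x0) with f(x) = cos(2 pi x) and T x = x + ALPHA (mod 1)."""
    return math.cos(2 * math.pi * (X0 + n * ALPHA))


def sup_shift_exact(p):
    """sup_n |a_{n+p} - a_n| = 2|sin(pi p ALPHA)| (the orbit is dense)."""
    return 2 * abs(math.sin(math.pi * p * ALPHA))


def sup_shift_window(p, window):
    """Brute-force lower estimate of the same supremum over 0 <= n < window."""
    return max(abs(a(n + p) - a(n)) for n in range(window))


def epsilon_periods(eps, p_max):
    return [p for p in range(p_max) if sup_shift_exact(p) < eps]


eps = 0.2
periods = epsilon_periods(eps, 2000)
gaps = [q - p for p, q in zip(periods, periods[1:])]
print("first eps-periods:", periods[:8])
print("distinct gaps:", sorted(set(gaps)), " K =", max(gaps))
```

The $\varepsilon$-periods here cluster around Fibonacci numbers, the denominators of the best rational approximations to $\alpha$.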
(This characterization of almost periodicity was employed by Bochner and von Neumann (1935) to define almost periodic functions on a general group $G$. Bochner (1927) had shown it to be equivalent to Bohr's definition for continuous functions on $\mathbb{R}$. A bounded function $f: G \to \mathbb{C}$ is called left almost periodic if the set of all left translates of $f$ is totally bounded in the supremum norm. One can prove that $f$ is left almost periodic if and only if it is right almost periodic. Most of the properties of almost periodic sequences that we are considering here are shared by all almost periodic functions on groups.) Now if $T: X \to X$ is an ergodic isometry, $x_0 \in X$, and $f \in \mathcal{C}(X)$, then $\{f(T^n x_0): n \in \mathbb{Z}\}$ is an almost periodic sequence, since $T$ is uniformly almost periodic. In fact, all almost periodic sequences arise in this way. For if $\{a_n\}$ is an almost periodic sequence, its orbit closure $\bar{\mathcal{O}}$ under the shift $\sigma$ is compact. Of course $\sigma$ is an isometry with respect to the supremum norm, and $(\bar{\mathcal{O}}, \sigma)$ is uniquely ergodic. (In fact, $\bar{\mathcal{O}}$ is a compact monothetic group, since it contains a dense copy of $\mathbb{Z}$, and $\sigma$ is multiplication by a fixed element of $\bar{\mathcal{O}}$.) The function $f$ which evaluates 0th coordinates will recover the sequence $a$: $a(n) = f(\sigma^n a)$. (We could also form the function algebra generated by a given almost periodic sequence and its translates and consider the transformation on its maximal ideal space induced by translation.) We have established the following result.
Remark 1.9 Almost periodic sequences are exactly those which come from ergodic isometries.
Remark 1.10 The almost periodic sequences form an algebra. We will in fact prove a stronger result: If $\{a_n\}$ and $\{b_n\}$ are almost periodic sequences, then $\{(a_n, b_n)\}$ forms an $\mathbb{R}^2$-valued almost periodic sequence. For if $\{a_n\}$ comes from a pair $(A, S)$ via a function $f$ and a point $a_0$, and $\{b_n\}$ from $(B, T)$ via $g$ and $b_0$, then $S \times T$ is an isometry on $A \times B$. Let $X$ be the orbit closure of $(a_0, b_0)$ under $S \times T$. Then $(f \times g)((S \times T)^n(a_0, b_0)) = (a_n, b_n)$, for $n \in \mathbb{Z}$, forms an $\mathbb{R}^2$-valued almost periodic sequence, by 1.9.
We also give the classical nondynamical proof of this observation. Given $\varepsilon > 0$, let $P_a$ be the set of all $\frac12\varepsilon$-periods of $a$, and $P_b$ that of $b$. Choose $K$ so that every interval of length $K$ contains members of both $P_a$ and $P_b$. In each interval $[(n-1)K, nK)$, choose $\alpha_n \in P_a$ and $\beta_n \in P_b$. We have
$$-K \le \alpha_n - \beta_n \le K \quad\text{for all } n.$$
Thus there are only finitely many possible values for the $\alpha_n - \beta_n$. Hence there is an $n_0$ such that for each $n \in \mathbb{Z}$ an $n'$ can be found with $-n_0 \le n' \le n_0$ and
$$\alpha_n - \beta_n = \alpha_{n'} - \beta_{n'}.$$
Then $\alpha_n - \alpha_{n'} = \beta_n - \beta_{n'}$, and each is an $\varepsilon$-period of its respective sequence. The spacing of these $\varepsilon$-periods is bounded independently of $n$:
$$|(\alpha_{n+1} - \alpha_{(n+1)'}) - (\alpha_n - \alpha_{n'})| \le |\alpha_{n+1} - \alpha_n| + |\alpha_{(n+1)'} - \alpha_{n'}| \le 2K + (2n_0 + 1)K.$$
Thus $\{\alpha_n - \alpha_{n'} : n \in \mathbb{Z}\}$ is a relatively dense set of $\varepsilon$-periods for each of $a$ and $b$. Now clearly if $\{(a_n, b_n)\}$ is an $\mathbb{R}^2$-valued almost periodic sequence and $F: \mathbb{R}^2 \to \mathbb{R}$ is continuous, then $\{F(a_n, b_n)\}$ is an almost periodic sequence. Since multiplication is continuous, 1.10 follows.
Remark 1.11 The set AP of all almost periodic sequences is uniformly closed.
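The classical argument for Remark 1.10 can be mimicked numerically. Take $a_n = \cos(2\pi n\alpha)$ and $b_n = \cos(2\pi n\beta)$, where the irrational frequencies are hypothetical choices. When $|a_n|, |b_n| \le 1$ we have $|c_{n+p} - c_n| \le |a_{n+p} - a_n| + |b_{n+p} - b_n|$ for the product $c_n = a_nb_n$. Hence every $p$ with $2|\sin(\pi p\alpha)| < \varepsilon/2$ and $2|\sin(\pi p\beta)| < \varepsilon/2$ is an $\varepsilon$-period of $c$. The sketch finds such common periods in a range and confirms the product bound by brute force over a finite window.

```python
import math

ALPHA = math.sqrt(2) - 1          # arbitrary irrational frequencies
BETA = (math.sqrt(5) - 1) / 2


def shift_sup(freq, p):
    """sup_n |cos(2 pi (n + p) freq) - cos(2 pi n freq)| = 2|sin(pi p freq)|."""
    return 2 * abs(math.sin(math.pi * p * freq))


def common_periods(eps, p_max):
    """p >= 1 that are eps/2-periods of both a_n = cos(2 pi n ALPHA) and b_n = cos(2 pi n BETA)."""
    return [p for p in range(1, p_max)
            if shift_sup(ALPHA, p) < eps / 2 and shift_sup(BETA, p) < eps / 2]


def c(n):
    """The product sequence c_n = a_n b_n."""
    return math.cos(2 * math.pi * n * ALPHA) * math.cos(2 * math.pi * n * BETA)


def window_sup(p, window):
    """Brute-force sup over 0 <= n < window of |c_{n+p} - c_n|."""
    return max(abs(c(n + p) - c(n)) for n in range(window))


eps = 0.5
periods = common_periods(eps, 5000)
print("common eps/2-periods found:", len(periods), periods[:6])
print("largest window sup:", max(window_sup(p, 300) for p in periods[:10]))
```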
C. Construction of the eigenfunction
Recall that $T$ is an isometry on a compact metric space $X$ consisting of more than one point and that $T$ is ergodic with respect to an invariant Borel probability measure $\mu$. We are trying to find a nonconstant $\phi \in L^2(X, \mu)$ for which there is $\gamma \in \mathbb{C}$ with $\phi(Tx) = \gamma\phi(x)$ a.e. Choose any $f \in \mathcal{C}(X)$ which is not constant and for which $\int f\,d\mu = 0$, and fix $x_0 \in X$. Write
$$f(j) = f(T^j x_0) \quad (j \in \mathbb{Z}).$$
This is an almost periodic sequence. Since rotation of the circle through any fixed angle $2\pi\lambda$ is an isometry, $\{e^{2\pi i n\lambda} : n \in \mathbb{Z}\}$ is an almost periodic sequence for each $\lambda \in \mathbb{R}$. By Remarks 1.10 and 1.7 there is a uniform mean value
$$\alpha_{x_0}(\lambda) = M\{f(j)e^{-2\pi i j\lambda}\} = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(j)e^{-2\pi i j\lambda}.$$
We define
$$h(x) = \alpha_x(\lambda) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x)e^{-2\pi i j\lambda}.$$
Then $h$, being a uniform limit of continuous functions of $x$, is continuous, and
$$h(Tx) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^{j+1}x)e^{-2\pi i j\lambda} = e^{2\pi i\lambda}\lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^{j+1}x)e^{-2\pi i(j+1)\lambda} = e^{2\pi i\lambda}h(x).$$
We need then only select $\lambda$ so that $h \ne 0$. For example, if we can find a $\lambda$ such that
$$h(x_0) = \alpha_{x_0}(\lambda) \ne 0,$$
we will be through.
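The construction can be tried out on the simplest example, the rotation $Tx = x + \alpha \pmod 1$ with $f(x) = \cos 2\pi x$, a nonconstant continuous function with $\int f\,d\mu = 0$. These choices are assumptions of the sketch. Here $f(T^jx) = \frac12(e^{2\pi i(x + j\alpha)} + e^{-2\pi i(x + j\alpha)})$. So $\alpha_x(\lambda)$ equals $\frac12 e^{2\pi i x}$ at $\lambda = \alpha$ and vanishes unless $\lambda \equiv \pm\alpha \pmod 1$. The finite averages below approximate $\alpha_x(\lambda)$ and check the eigenvalue relation $h(Tx) = e^{2\pi i\lambda}h(x)$.

```python
import cmath
import math

ALPHA = (math.sqrt(5) - 1) / 2     # rotation number (arbitrary irrational choice)


def f(x):
    return math.cos(2 * math.pi * x)


def T(x):
    return (x + ALPHA) % 1.0


def mean_value(x, lam, n):
    """Finite-n approximation to alpha_x(lam) = lim (1/n) sum_j f(T^j x) e^{-2 pi i j lam}."""
    total = 0j
    y = x
    for j in range(n):
        total += f(y) * cmath.exp(-2j * math.pi * j * lam)
        y = T(y)
    return total / n


n = 20_000
x = 0.37
h_at_alpha = mean_value(x, ALPHA, n)
print("alpha_x(ALPHA)        ~", h_at_alpha, " expected", 0.5 * cmath.exp(2j * math.pi * x))
print("alpha_x(ALPHA + 0.25) ~", abs(mean_value(x, ALPHA + 0.25, n)))
# eigenvalue relation h(Tx) = e^{2 pi i lam} h(x), with lam = ALPHA
print(abs(mean_value(T(x), ALPHA, n) - cmath.exp(2j * math.pi * ALPHA) * h_at_alpha))
```

For the finite averages the eigenvalue relation fails only by a boundary term of size at most $2/n$.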
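We will be able to find such a $\lambda$ by careful scrutiny of the almost periodic sequence $\{f(j)\}$. We will consider $\varepsilon$-periods (in fact $1/n$-periods) $p_n$ of this sequence, and for each $n$ pick a rational number $k_n/p_n$ for which $\{f(j) : 0 \le j \le p_n - 1\}$ 'best matches' $\{e^{2\pi i j k_n/p_n} : 0 \le j \le p_n - 1\}$, in that
$$\Big|\frac{1}{p_n}\sum_{j=0}^{p_n-1} f(j)e^{-2\pi i j k_n/p_n}\Big| \ge c > 0 \quad\text{for all } n.$$
The rational numbers $k_n/p_n$ will converge to our eigenfrequency $\lambda$, for which
$$\alpha_{x_0}(\lambda) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(j)e^{-2\pi i j\lambda} \ne 0.$$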
Let $x_0 \in X$ and $f \in \mathcal{C}(X)$ be fixed. We will prove the following assertion:
Theorem 1.12 If $\{f(j)\}$ is an almost periodic sequence for which $\alpha(\lambda) = M\{f(j)e^{-2\pi i j\lambda}\} = 0$ for all $\lambda \in \mathbb{R}$, then
$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |f(k)|^2 = 0,$$
and hence $f(j) = 0$ for all $j$. In our case $f(j) = f(T^j x_0)$ for a continuous function $f$ on $X$. It will follow that if all $\alpha_{x_0}(\lambda)$ are 0, then $f(x) = 0$ for all $x \in X$, and so we will be finished.
Theorem 1.12 is fundamental in the theory of almost periodic functions: it is the heart of Bohr's theorem characterizing AP as the uniform closure of the set of all trigonometric polynomials
$$a_n = \sum_{j=1}^{r} c_j e^{2\pi i n\lambda_j}$$
(see Bohr 1925a, 1925b, 1926, 1932; Besicovitch 1954). Our proof reproduces the essential part of Bohr's original argument. We will prove Theorem 1.12 by means of a sequence of lemmas. The idea is to show first that if $\alpha(\lambda) = 0$ for all $\lambda$, then, because of a certain continuity property of mean values (Lemma 1.13), in fact
$$\lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(j)e^{-2\pi i j\lambda} = 0 \quad\text{uniformly in } \lambda \in \mathbb{R}$$
(Remark 1.14).
4.1. Construction of eigenfunctions
143
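Lemma 1.13 If $\{f(j)\}$ is an almost periodic sequence with $M\{f\} = 0$, then for each $\varepsilon > 0$ there are $N$ and $\delta > 0$ such that
$$\Bigl|\frac{1}{n}\sum_{j=0}^{n-1} f(j)\,e^{-2\pi i j\lambda}\Bigr| < 2\varepsilon \quad\text{whenever } n \ge N \text{ and } |\lambda| < \delta.$$
Proof: Since $M\{f\} = 0$, given $\varepsilon > 0$ we can find $n_0$ such that
$$\Bigl|\frac{1}{n}\sum_{j=0}^{n-1} f(k+j)\Bigr| < \frac{\varepsilon}{2} \quad\text{for } n \ge n_0 \text{ and for all } k \in \mathbb{Z}.$$
For $n > n_0$, write $n = l n_0 + p$ with $0 \le p < n_0$. Choose $\delta$ so small that
$$|e^{-2\pi i j\lambda} - 1| < \frac{\varepsilon}{2\|f\|_\infty} \quad\text{for } |\lambda| < \delta \text{ and } j = 0, 1, \dots, n_0 - 1.$$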
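Then
$$\frac{1}{n}\sum_{j=0}^{n-1} f(j)\,e^{-2\pi i j\lambda} = \frac{1}{l n_0 + p}\sum_{k=0}^{l-1}\,\sum_{j=kn_0}^{(k+1)n_0-1} f(j)\,e^{-2\pi i j\lambda} + \frac{1}{n}\sum_{j=ln_0}^{n-1} f(j)\,e^{-2\pi i j\lambda}$$
$$= \frac{l n_0}{n}\cdot\frac{1}{l}\sum_{k=0}^{l-1} e^{-2\pi i k n_0\lambda}\,\frac{1}{n_0}\sum_{j=0}^{n_0-1} f(j + kn_0)\,e^{-2\pi i j\lambda} + \frac{1}{n}\sum_{j=ln_0}^{n-1} f(j)\,e^{-2\pi i j\lambda}.$$
Now the second term is bounded by $(n_0/n)\|f\|_\infty \le \|f\|_\infty/l$, which tends to 0 as $l \to \infty$, while in the first term we have
$$\Bigl|\frac{1}{n_0}\sum_{j=0}^{n_0-1} f(j + kn_0)\Bigr| < \frac{\varepsilon}{2}$$
and
$$\Bigl|\frac{1}{n_0}\sum_{j=0}^{n_0-1} f(j + kn_0)\,e^{-2\pi i j\lambda} - \frac{1}{n_0}\sum_{j=0}^{n_0-1} f(j + kn_0)\Bigr| \le \frac{1}{n_0}\sum_{j=0}^{n_0-1}|f(j + kn_0)|\,|e^{-2\pi i j\lambda} - 1| < \frac{\varepsilon}{2},$$
so that
$$\Bigl|\frac{1}{n_0}\sum_{j=0}^{n_0-1} f(j + kn_0)\,e^{-2\pi i j\lambda}\Bigr| < \varepsilon.$$
Thus the first term is less than $\varepsilon$ in absolute value, and hence the sum of the two is less than $2\varepsilon$ for all sufficiently large $n$ (that is, once $l$ is large enough).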
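A small numerical check of Lemma 1.13, offered as an illustrative sketch: the sequence $f(j) = \cos 2\pi j\beta$ with $\beta = \sqrt2 - 1$, the window $n = 10^4$, and $\delta = 0.05$ are our own choices. Since $M\{f\} = 0$, the weighted averages should be uniformly small for $|\lambda| \le \delta$. At $\lambda = \beta$, where $M\{f(j)e^{-2\pi ij\beta}\} = \tfrac12$, they are not small, which shows that $\delta$ cannot be taken arbitrarily large.

```python
import numpy as np

# Illustrative choices (ours): f(j) = cos(2 pi j beta) with beta = sqrt(2) - 1,
# an almost periodic sequence with M{f} = 0.
beta = np.sqrt(2) - 1


def weighted_average(n, lam):
    """(1/n) sum_{j<n} f(j) exp(-2 pi i j lam)."""
    idx = np.arange(n)
    return np.mean(np.cos(2 * np.pi * idx * beta) * np.exp(-2j * np.pi * idx * lam))


n, delta = 10_000, 0.05
lams = np.linspace(-delta, delta, 201)
worst_near_zero = max(abs(weighted_average(n, lam)) for lam in lams)
at_beta = abs(weighted_average(n, beta))

print(worst_near_zero)  # small for every |lam| <= delta, as Lemma 1.13 predicts
print(at_beta)          # about 0.5: away from lam = 0 the averages need not be small
```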
144
4. More about recurrence
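Remark 1.14 It follows readily that if
$$M\{f(j)e^{-2\pi i j\lambda}\} = 0 \quad\text{for all } \lambda \in \mathbb{R},$$
then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(j)\,e^{-2\pi i j\lambda} = 0 \quad\text{uniformly for } \lambda \in \mathbb{R}.$$
(Apply Lemma 1.13 to each of the almost periodic sequences $\{f(j)e^{-2\pi i j\lambda_0}\}$, and use the compactness of $[0, 1]$ together with the fact that these averages are periodic in $\lambda$ with period 1.)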
Now we want to choose $k_n \in \mathbb{Z}$ and $1/n$-periods $p_n$ such that
$$\Bigl|\frac{1}{p_n}\sum_{j=0}^{p_n-1} f(j)\,e^{-2\pi i j k_n/p_n}\Bigr| \ge c > 0 \quad\text{for all } n.$$
Then if $k_n/p_n \to \tau$, we will be able to argue that $\alpha_{x_0}(\tau) \neq 0$. If it is not possible to choose such $k_n$ and $p_n$, then we will see that the $A_k^{(p)}$ of the following lemma tend to 0 uniformly in $k$ as $p \to \infty$.
Lemma 1.15 Let
$$A_k^{(p)} = \frac{1}{p}\sum_{j=0}^{p-1} f(j)\,e^{-2\pi i j k/p} \quad\text{for } p = 1, 2, \dots \text{ and } k = 0, 1, \dots, p-1.$$
If $\lim_{p\to\infty} A_k^{(p)} = 0$ uniformly in $k$, then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}|f(k)|^2 = 0$$
(and hence $f \equiv 0$). Assume henceforth that the hypotheses of Lemma 1.15 are satisfied. The following sequence of lemmas will constitute a proof of Lemma 1.15. Since AP is an algebra,
$$g(k) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(k+j)\,\overline{f(j)}$$
exists uniformly in $k$. We need to prove that $g(0) = 0$.
This will be accomplished by considering certain periodic functions related to $f$. For a fixed integer $p$, let $F = F^{(p)}$ be the periodic sequence of period $p$ for which
$$F(k) = f(k) \quad\text{for } k = 0, 1, \dots, p-1.$$
Fourier analysis on the finite abelian group $\mathbb{Z}_p$ shows that if
$$A_k = A_k^{(p)} = \frac{1}{p}\sum_{j=0}^{p-1} f(j)\,e^{-2\pi i j k/p}, \quad k = 0, 1, \dots, p-1,$$
then
$$F(k) = F^{(p)}(k) = \sum_{r=0}^{p-1} A_r\,e^{2\pi i r k/p}.$$
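On $\mathbb{Z}_p$ these coefficients are an ordinary discrete Fourier transform. The sketch below (Python/NumPy; the sample sequence $\cos 2\pi j\sqrt2$ and $p = 12$ are arbitrary choices of ours) computes $A_k^{(p)}$ both from the definition and with `numpy.fft.fft`, whose kernel is $e^{-2\pi ijk/p}$ without normalization. It then checks the inversion formula for $F$.

```python
import numpy as np


def fourier_coefficients(F):
    """A_k = (1/p) sum_{j<p} F(j) exp(-2 pi i j k / p), k = 0, ..., p-1.

    numpy.fft.fft uses the kernel exp(-2 pi i j k / p) without normalization,
    so dividing by p gives the normalization used in the text.
    """
    return np.fft.fft(F) / len(F)


def fourier_coefficients_direct(F):
    """The same coefficients computed straight from the definition (O(p^2))."""
    p = len(F)
    idx = np.arange(p)
    return np.array([np.mean(F * np.exp(-2j * np.pi * idx * k / p)) for k in range(p)])


def synthesize(A):
    """F(k) = sum_r A_r exp(2 pi i r k / p); numpy.fft.ifft carries a factor 1/p."""
    return np.fft.ifft(A) * len(A)


# One period (p = 12) of the sequence f(j) = cos(2 pi j sqrt(2)); our choice of example.
p = 12
F = np.cos(2 * np.pi * np.arange(p) * np.sqrt(2))
A = fourier_coefficients(F)
print(np.allclose(A, fourier_coefficients_direct(F)))  # True
print(np.allclose(synthesize(A), F))                   # True
```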
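Further, either by direct calculation or knowledge of Parseval's formula, we see that
$$\frac{1}{p}\sum_{j=0}^{p-1}|F(j)|^2 = \frac{1}{p}\sum_{j=0}^{p-1}|f(j)|^2 = \sum_{k=0}^{p-1}|A_k|^2 \le \|f\|_\infty^2.$$
Lemma 1.16 If $\sup_k |A_k^{(p)}| \to 0$ as $p \to \infty$, then
$$\sum_{k=0}^{p-1}|A_k^{(p)}|^4 \to 0 \quad\text{as } p \to \infty.$$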
Proof:
$$\sum_{k=0}^{p-1}|A_k^{(p)}|^4 \le \sup_k|A_k^{(p)}|^2\sum_{k=0}^{p-1}|A_k^{(p)}|^2 \le \|f\|_\infty^2\,\sup_k|A_k^{(p)}|^2 \to 0 \quad\text{as } p \to \infty.$$
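The following sketch uses the same conventions as before. The two-frequency sequence and $p = 50$ are our choices, and the maximum of $|F|$ over one period stands in for $\|f\|_\infty$. It checks Parseval's formula on $\mathbb{Z}_p$ and the two inequalities used in the proof of Lemma 1.16.

```python
import numpy as np

# One period (p = 50) of a bounded sequence; the frequencies are our own choice.
p = 50
idx = np.arange(p)
F = np.exp(2j * np.pi * idx * np.sqrt(2)) + 0.5 * np.cos(2 * np.pi * idx * np.sqrt(3))
A = np.fft.fft(F) / p                        # A_k with the text's normalization

mean_square = np.mean(np.abs(F) ** 2)        # (1/p) sum |F(j)|^2
sum_sq = np.sum(np.abs(A) ** 2)              # sum_k |A_k|^2
sup_A_sq = np.max(np.abs(A)) ** 2            # sup_k |A_k|^2
sup_F_sq = np.max(np.abs(F)) ** 2            # stands in for ||f||_inf^2 on this window
sum_fourth = np.sum(np.abs(A) ** 4)          # sum_k |A_k|^4

print(np.isclose(mean_square, sum_sq))                     # Parseval on Z_p
print(sum_fourth <= sup_A_sq * sum_sq + 1e-12)             # first inequality of Lemma 1.16
print(sup_A_sq * sum_sq <= sup_F_sq * sup_A_sq + 1e-12)    # second inequality
```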
Lemma 1.17 If $G(k) = G^{(p)}(k) = \frac{1}{p}\sum_{j=0}^{p-1} F(k+j)\,\overline{F(j)}$, then
$$\frac{1}{p}\sum_{j=0}^{p-1}|G^{(p)}(j)|^2 \to 0 \quad\text{as } p \to \infty.$$
Proof: Observe first that
$$\frac{1}{p}\sum_{j=0}^{p-1} G(j)\,e^{-2\pi i jk/p} = \frac{1}{p}\sum_{j=0}^{p-1}\frac{1}{p}\sum_{r=0}^{p-1} F(r+j)\,\overline{F(r)}\,e^{-2\pi i jk/p}$$
$$= \frac{1}{p}\sum_{r=0}^{p-1}\overline{F(r)}\,e^{2\pi i rk/p}\;\frac{1}{p}\sum_{j=0}^{p-1} F(r+j)\,e^{-2\pi i (r+j)k/p}$$
$$= \frac{1}{p}\sum_{r=0}^{p-1}\overline{F(r)}\,e^{2\pi i rk/p}\;\frac{1}{p}\sum_{j=0}^{p-1} F(j)\,e^{-2\pi i jk/p} = \overline{A_k}\,A_k = |A_k|^2,$$
so that, by Fourier analysis (or directly),
$$G(k) = \sum_{j=0}^{p-1}|A_j|^2\,e^{2\pi i jk/p}.$$
Define
$$H(k) = H^{(p)}(k) = \frac{1}{p}\sum_{j=0}^{p-1} G(k+j)\,\overline{G(j)}.$$
A similar argument to the foregoing will show that
$$H(k) = \sum_{j=0}^{p-1}|A_j|^4\,e^{2\pi i jk/p}.$$
By Lemma 1.16,
$$H^{(p)}(0) = \sum_{j=0}^{p-1}|A_j^{(p)}|^4 \to 0 \quad\text{as } p \to \infty.$$
In view of the definition of $H$, this says that
$$\frac{1}{p}\sum_{j=0}^{p-1}|G^{(p)}(j)|^2 \to 0 \quad\text{as } p \to \infty.$$
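Numerically, the identities in the proof of Lemma 1.17 say that the cyclic autocorrelation $G$ has Fourier coefficients $|A_k|^2$, and hence that $H(0) = \frac1p\sum_j|G(j)|^2 = \sum_j|A_j|^4$. Here is a minimal check, using an arbitrary sample sequence of our choosing.

```python
import numpy as np


def cyclic_autocorrelation(F):
    """G(k) = (1/p) sum_j F(k+j) conj(F(j)), indices mod p (F has period p)."""
    p = len(F)
    # np.roll(F, -k)[j] == F[(j + k) % p]
    return np.array([np.mean(np.roll(F, -k) * np.conj(F)) for k in range(p)])


p = 30
idx = np.arange(p)
F = np.exp(2j * np.pi * idx * np.sqrt(2)) + 0.3 * np.sin(2 * np.pi * idx * np.sqrt(5))
A = np.fft.fft(F) / p            # A_k
G = cyclic_autocorrelation(F)    # G^{(p)}

# The k-th Fourier coefficient of G is |A_k|^2 ...
print(np.allclose(np.fft.fft(G) / p, np.abs(A) ** 2))
# ... so H(0) = (1/p) sum |G(j)|^2 equals sum |A_j|^4.
H0 = np.mean(np.abs(G) ** 2)
print(np.isclose(H0, np.sum(np.abs(A) ** 4)))
```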
Recall that we are trying to obtain the conclusion of Lemma 1.15 for the almost periodic function $g$ rather than the periodic function $G$, since this will immediately imply that $g \equiv 0$ and, in particular, $g(0) = 0$ as desired. However, if $p$ is a large $\varepsilon$-period of $f$ for a small $\varepsilon > 0$, there is not much difference between $g$ and $G$.
Lemma 1.18 If $p = p_n$ is a (large) $1/n$-period of $f$, then, for $0 \le k \le p$,
$$G(k) = g(k) + \theta_p, \quad\text{where } |\theta_p| \le \varepsilon_p + \frac{\|f\|_\infty}{n}, \text{ with } \varepsilon_p \to 0 \text{ as } p \to \infty.$$
Proof: If $0 \le k \le p$, then
$$G(k) = \frac{1}{p}\Bigl\{\sum_{j=0}^{p-k-1} F(k+j)\,\overline{F(j)} + \sum_{j=p-k}^{p-1} F(k+j-p)\,\overline{F(j)}\Bigr\}$$
$$= \frac{1}{p}\Bigl\{\sum_{j=0}^{p-k-1} f(k+j)\,\overline{f(j)} + \sum_{j=p-k}^{p-1} f(k+j-p)\,\overline{f(j)}\Bigr\}$$
$$= \frac{1}{p}\sum_{j=0}^{p-1} f(k+j)\,\overline{f(j)} + \frac{1}{p}\sum_{j=p-k}^{p-1}\overline{f(j)}\,\bigl[f(k+j-p) - f(k+j)\bigr].$$
Now
$$|f(k+j-p) - f(k+j)| < \frac{1}{n} \quad\text{for all } k, j,$$
since $p$ is a $1/n$-period of $f$, so that the second term is bounded by $\|f\|_\infty/n$. Since the first term converges to $g(k)$ uniformly in $k$ as $p \to \infty$, the proof is complete.
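For a single exponential the estimate of Lemma 1.18 can be watched exactly. In this illustrative sketch we take $f(j) = e^{2\pi ij\theta}$ with $\theta = (\sqrt5 - 1)/2$ (our choice), so that $g(k) = e^{2\pi ik\theta}$ and $\|f\|_\infty = 1$. The first term in the proof is then exactly $g(k)$, so the proof gives $|G^{(p)}(k) - g(k)| \le |e^{2\pi ip\theta} - 1|$ for $0 \le k \le p$. Fibonacci numbers $p$ make the right side small because $p\theta$ is then close to an integer.

```python
import numpy as np

# Toy case (our choice): f(j) = exp(2 pi i j theta), whose autocorrelation mean is
# g(k) = exp(2 pi i k theta) exactly, and ||f||_inf = 1.
theta = (np.sqrt(5) - 1) / 2


def f(m):
    return np.exp(2j * np.pi * np.asarray(m) * theta)


def g(k):
    return np.exp(2j * np.pi * k * theta)


def G(p, k):
    """G^{(p)}(k) for the p-periodic sequence F with F = f on {0, ..., p-1}."""
    idx = np.arange(p)
    F = f(idx)
    return np.mean(F[(idx + k) % p] * np.conj(F))


results = []
for p in (13, 89, 610):   # Fibonacci numbers, so p * theta is close to an integer
    defect = abs(np.exp(2j * np.pi * p * theta) - 1)   # = sup_j |f(j + p) - f(j)|
    worst = max(abs(G(p, k) - g(k)) for k in range(p + 1))
    results.append((p, defect, worst))
    print(p, defect, worst)
```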
Lemma 1.19 For each $n = 1, 2, \dots$, let $p_n$ be a $1/n$-period of $f$ and suppose that $p_n \to \infty$. Then
$$\Bigl|\frac{1}{p_n}\sum_{k=0}^{p_n-1}|g(k)|^2 - \frac{1}{p_n}\sum_{k=0}^{p_n-1}|G^{(p_n)}(k)|^2\Bigr| \to 0 \quad\text{as } n \to \infty.$$
Proof: The discrepancy is at most
$$\frac{1}{p_n}\sum_{k=0}^{p_n-1}\bigl||g(k)| - |G^{(p_n)}(k)|\bigr|\,\bigl\{|g(k)| + |G^{(p_n)}(k)|\bigr\} \le 2\|f\|_\infty^2\,\frac{1}{p_n}\sum_{k=0}^{p_n-1}|g(k) - G^{(p_n)}(k)|,$$
which tends to 0 as $n \to \infty$, by Lemma 1.18. Combining 1.17 and 1.19, we obtain
Lemma 1.20
$$\frac{1}{p_n}\sum_{k=0}^{p_n-1}|g(k)|^2 \to 0 \quad\text{as } n \to \infty,$$
and also
Lemma 1.21 $g \equiv 0$, and hence $g(0) = 0$.
Proof: $|g|^2$ is a nonnegative almost periodic function with mean 0 (by Lemma 1.20). Looking back a few pages, we realize that we have proved that $f \equiv 0$. Of course this was accomplished under the hypotheses of Lemma 1.15, namely assuming that $\lim_{p\to\infty} A_k^{(p)} = 0$ uniformly in $k$, where
$$A_k^{(p)} = \frac{1}{p}\sum_{j=0}^{p-1} f(j)\,e^{-2\pi i jk/p}.$$
Since we selected $f \not\equiv 0$, this hypothesis cannot be satisfied. Then there is $c > 0$ and (by our proof) in fact $1/n$-periods $p_n \to \infty$ of $f$ and integers $k_n$ with
$$|A_{k_n}^{(p_n)}| \ge c \quad\text{for all } n.$$
We may assume (passing to a subsequence, since $0 \le k_n/p_n < 1$) that in fact $k_n/p_n \to \tau \in \mathbb{R}$.
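The choice of $k_n$ can be imitated on a computer: for each period length $p$, take the $k$ maximizing $|A_k^{(p)}|$. In this sketch the sequence is a toy example of our own, $f(j) = e^{2\pi ij\theta} + 0.3\cos 2\pi j\sqrt2$ with $\theta = (\sqrt5-1)/2$. For simplicity we use Fibonacci numbers for $p$ (good approximate periods of the dominant term) rather than genuine $1/n$-periods of $f$. The ratios $k/p$ then home in on $\theta$ while $|A_k^{(p)}|$ stays bounded below.

```python
import numpy as np

# Toy sequence (our choice): frequencies theta and +-sqrt(2), amplitudes 1 and 0.15 each.
theta = (np.sqrt(5) - 1) / 2


def f(m):
    m = np.asarray(m)
    return np.exp(2j * np.pi * m * theta) + 0.3 * np.cos(2 * np.pi * m * np.sqrt(2))


best = []
for p in (8, 13, 21, 34, 55, 89, 144):        # Fibonacci numbers
    A = np.fft.fft(f(np.arange(p))) / p        # A_k^{(p)}, k = 0, ..., p-1
    k = int(np.argmax(np.abs(A)))              # the best-matching k for this p
    best.append((p, k, abs(A[k])))
    print(p, k, k / p, abs(A[k]))
```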
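Lemma 1.22 $\alpha_{x_0}(\tau) \neq 0$.
Proof: If
$$\alpha_{x_0}(\tau) = M\{f(j)e^{-2\pi i j\tau}\} = 0,$$
then by Lemma 1.13 (applied to the almost periodic sequence $\{f(j)e^{-2\pi i j\tau}\}$, with $c/4$ in place of $\varepsilon$) there are $N_\tau \in \mathbb{N}$ and $\delta_\tau > 0$ such that
$$\Bigl|\frac{1}{m}\sum_{j=0}^{m-1} f(j)\,e^{-2\pi i j\tau}\,e^{-2\pi i j\lambda}\Bigr| < \frac{c}{2} \quad\text{for } m \ge N_\tau \text{ and } |\lambda| < \delta_\tau.$$
However, we can choose $p_n \ge N_\tau$ and $k_n$ with $|k_n/p_n - \tau| < \delta_\tau$ and $|A_{k_n}^{(p_n)}| \ge c$; taking $m = p_n$ and $\lambda = k_n/p_n - \tau$ shows that this is impossible. We have obtained, then, the nonzero eigenfunction $x \mapsto \alpha_x(\tau)$ with the nontrivial eigenfrequency $\tau$. This completes our proof of the spectral characterization of weak mixing, avoiding invocation of the Spectral Theorem. Perhaps we should repeat that this was not just an idle exercise: besides providing a relatively concrete way to find eigenvalues, we have also learned something about almost periodic functions and in fact have established the main part of Bohr's fundamental theorem.
Corollary 1.23 The set of almost periodic sequences coincides with the uniform closure of the set of all trigonometric polynomials
$$P(k) = \sum_{j=1}^{r} c_j\,e^{2\pi i\lambda_j k} \quad (c_j \in \mathbb{C},\ \lambda_j \in \mathbb{R})$$
on $\mathbb{Z}$. Of course any uniform limit of such trigonometric polynomials is almost periodic, by 1.11. If $\{f(j)\}$ is an almost periodic sequence, then one can show that, in $L^2$,
$$f(k) = \sum_{n=1}^{\infty} a_n\,e^{2\pi i\lambda_n k}.$$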
This is proved, using the above notation and machinery, by a sequence of steps familiar in Fourier analysis:
(a) There are at most countably many $\lambda$ for which $\alpha(\lambda) = M\{f(j)e^{-2\pi i j\lambda}\} \neq 0$.
(b) Bessel's Inequality: If $\alpha(\lambda_n) = a_n$ for each of the countably many $\lambda_n$, then
$$\sum_n |a_n|^2 \le M\{|f(k)|^2\}.$$
(For if $c_1, \dots, c_N$ are arbitrary real numbers, distinct mod 1, and $b_1, \dots, b_N \in \mathbb{C}$, then
$$M\Bigl\{\Bigl|f(k) - \sum_{j=1}^{N} b_j\,e^{2\pi i k c_j}\Bigr|^2\Bigr\} = M\{|f(k)|^2\} - \sum_{j=1}^{N}|\alpha(c_j)|^2 + \sum_{j=1}^{N}|b_j - \alpha(c_j)|^2,$$
which clearly is a minimum when $b_j = \alpha(c_j)$, $j = 1, \dots, N$. This hint also helps to prove (a).)
(c) Parseval's Formula: For the $a_n = \alpha(\lambda_n)$,
$$M\{|f(k)|^2\} = \sum_{n=1}^{\infty}|a_n|^2.$$
Then the $L^2$ convergence of the above series to $f$ follows readily. That $f$ is in fact a uniform limit of trigonometric polynomials is somewhat trickier to prove, although several arguments (especially a direct, elementary one due to Weyl 1926–27) are available. However, some caution is needed here, since it is not true that the above series converges uniformly to $f$: indeed $f$ can be uniformly approximated to any desired degree of accuracy by a trigonometric polynomial, but these polynomials need not be partial sums of the above Bohr–Fourier series of $f$. In general, we can be sure only that eventually, as the approximations improve towards perfection, all the eigenfrequencies $\lambda_n$ of $f$ will appear in these approximating polynomials. Many of the properties of almost periodic sequences that we have established here are shared by almost periodic functions on $\mathbb{R}$ and indeed by almost periodic functions on arbitrary groups. We have only introduced a large and beautiful subject which has important connections with and applications in many other parts of mathematics.
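The coefficients $\alpha(\lambda)$ can be approximated by long finite averages. The sketch below uses a toy trigonometric polynomial of our choosing, with frequencies $\sqrt2$ and $\sqrt3$ and amplitudes $2$ and $0.5$. It recovers the amplitudes and gives essentially $0$ at a frequency that does not occur. It also checks Parseval's formula $M\{|f|^2\} = 4 + 0.25$ approximately.

```python
import numpy as np


def bohr_coefficient(seq, lam, n):
    """Finite-n approximation of alpha(lam) = M{ f(k) exp(-2 pi i k lam) }."""
    k = np.arange(n)
    return np.mean(seq(k) * np.exp(-2j * np.pi * k * lam))


# Toy trigonometric polynomial (our choice): frequencies sqrt(2) and sqrt(3).
def f(k):
    return 2.0 * np.exp(2j * np.pi * k * np.sqrt(2)) + 0.5 * np.exp(2j * np.pi * k * np.sqrt(3))


n = 200_000
a_sqrt2 = bohr_coefficient(f, np.sqrt(2), n)   # close to 2
a_sqrt3 = bohr_coefficient(f, np.sqrt(3), n)   # close to 0.5
a_other = bohr_coefficient(f, 0.1, n)          # close to 0
mean_square = np.mean(np.abs(f(np.arange(n))) ** 2)   # close to 2**2 + 0.5**2
print(a_sqrt2, a_sqrt3, a_other, mean_square)
```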
Exercises
1. Prove that an ergodic isometry on a compact metric space is isomorphic to a rotation (i.e. multiplication by a fixed element) on a compact abelian group.
2. The group is monothetic, in that it contains a dense cyclic subgroup. Prove this.
3. Prove that if $\{a_n\}$ is an almost periodic sequence, then $M\{|a_n|^2\} = 0$ if and only if $a_n = 0$ for all $n$.
4. Fill in the details of the proof sketched above that an almost periodic sequence is, in $L^2$, the sum of its Bohr–Fourier series.
5. Can a sequence which is not almost periodic still have a uniform mean value?
6. Let $\{a_j\}$ be an almost periodic sequence and $A_n = \sum_{j=0}^{n-1} a_j$. Show that $\{A_n\}$ is almost periodic if and only if it is bounded.
4.2. Some topological dynamics
In the preceding section, in order to understand weakly mixing m.p.t.s completely, it became necessary to consider topological aspects of dynamics such as minimality, the existence of dense orbits, and uniform almost periodicity. Here we will briefly indicate some aspects of the topological theory that parallels the metric theory of transformations which is our main concern. Some of these ideas will be used in the discussion of the Szemerédi–Furstenberg Theorem and related results in Section 4.3, and in Section 4.4 we will see that in fact every m.p.t. has a topological realization. The proofs of many of the facts we mention make reasonable exercises, and so they are numbered as such and assigned to the reader.
Topological dynamics is the study of groups acting on topological spaces by means of continuous maps. Rather than the most general situation, we concentrate on the case of a homeomorphism $T$ of a compact metric space $X$, in which case the acting group $\{T^n : n \in \mathbb{Z}\}$ may be identified with $\mathbb{Z}$; when we need to consider a general transformation group $(X, G)$, we will rely on the ability of the reader to extend the concepts and definitions as required. The pair $(X, T)$ is called a cascade. (We reserve the word flow for the case when the acting group is $\mathbb{R}$.) As usual the orbit of a point $x \in X$ is $\mathcal{O}(x) = \{T^n x : n \in \mathbb{Z}\}$.
A. Recurrence
Recall that a cascade $(X, T)$ is called minimal in case $X$ has no nonempty proper closed $T$-invariant subsets, or, equivalently, every point of $X$ has a dense orbit. According to Gottschalk's Theorem (1.2), if $(X, T)$ is minimal then every point of $X$ has a very strong recurrence property (unfortunately called 'almost periodicity'): $T^k x$ returns to each neighborhood of $x$ on a relatively dense set of $k$ (i.e. with 'bounded gaps'). Let us consider some weaker versions of topological recurrence, the first of which holds in any cascade.
Proposition 2.1 Every cascade contains a (nonempty) minimal set. (Exercise 1.)
4.2. Some topological dynamics
151
Theorem 2.2 Birkhoff Recurrence Theorem (1927) Every cascade $(X, T)$ contains a point which is recurrent under $T$, i.e. a point $x$ for which there are $n_k \to \infty$ with $T^{n_k}x \to x$.
Proof: Every point of a minimal set is recurrent under $T$.
This is a topological analogue of the Poincaré Recurrence Theorem, with compactness replacing the important condition of finite measure. In Section 4.3 we will consider an interesting extension of this result by Furstenberg and Weiss. There is a weaker concept of recurrence than that of pointwise recurrence under $T$. We say that $(X, T)$ is regionally recurrent if for any nonempty open $U \subset X$ there is $n \ge 1$ with $T^n U \cap U \neq \emptyset$. Within every (compact) cascade one can find a largest regionally recurrent cascade as follows. Call a point $x \in X$ wandering if there is a neighborhood $U$ of $x$ such that $T^n U \cap U = \emptyset$ for all $n \ge 1$. Then the set $\Omega$ of nonwandering points is closed and invariant, and (so long as $X$ is compact) it is nonempty. It need not be true yet that $(\Omega, T)$ is regionally recurrent. Let $\Omega_1$ be the set of nonwandering points of $(\Omega, T)$, and continue in this way. We obtain a descending sequence of closed invariant sets
$$X \supset \Omega \supset \Omega_1 \supset \Omega_2 \supset \cdots$$
such that if $\xi$ is a successor ordinal, then $\Omega_\xi$ is the nonwandering set of $(\Omega_{\xi-1}, T)$, while if $\xi$ is a limit ordinal, then $\Omega_\xi = \bigcap_{\eta<\xi}\Omega_\eta$.
The intersection of all the $\Omega_\xi$ is denoted by $Z$ and is called the center of $(X, T)$.
Proposition 2.3 The center of $(X, T)$ coincides with the closure of the set of points of $X$ that are recurrent under $T$.
Proof: Exercise 2.
B. Topological ergodicity and mixing
$(X, T)$ is topologically ergodic if every proper closed $T$-invariant subset of $X$ is nowhere dense (i.e. its interior is empty). $(X, T)$ is topologically weakly mixing if $(X \times X, T \times T)$ is topologically ergodic. $(X, T)$ is topologically strongly mixing if given nonempty open $U, V \subset X$ there is $n_0 \in \mathbb{N}$ such that $T^n U \cap V \neq \emptyset$ whenever $n \ge n_0$.
There are several ways to prove that there exist $T$-invariant positive Borel probability measures on $X$ (Exercise 3); thus any cascade becomes an m.p.t. upon the choice of such a measure. A cascade $(X, T)$ is called uniquely ergodic if there is only one such measure, and it is called strictly ergodic if it is minimal and uniquely ergodic. Here are alternative characterizations of some of these properties.
Proposition 2.4 The following statements are equivalent: (i) $(X, T)$ is topologically ergodic. (ii) Every point of $X$, with the possible exception of a set of first category, has an orbit which is dense in $X$. (iii) There is a point $x \in X$ which has a dense orbit. (iv) Given nonempty open $U, V \subset X$, there is $n \in \mathbb{Z}$ such that $T^n U \cap V \neq \emptyset$. (This property is sometimes called regional transitivity.)
Proof: Exercise 4.
Proposition 2.5 If $(X, T)$ is a cascade and $\mu$ is a $T$-invariant Borel probability measure on $X$ whose support is all of $X$, then each of metric ergodicity, weak mixing, and strong mixing implies its corresponding topological property.
Proof: Exercise 5.
Remark 2.6 The converses of these implications are not true (Exercise 6), even when $(X, T)$ is uniquely ergodic (Kolmogorov 1953, Petersen 1970a).
Remark 2.7 I apologize for the two distinct meanings of the word 'metric'.
Proposition 2.8 The following statements are equivalent: (i) $(X, T)$ is uniquely ergodic. (ii) For each $f \in \mathcal{C}(X)$, $\frac{1}{n}\sum_{k=0}^{n-1} f \circ T^k$ converges uniformly on $X$ to a constant. (iii) For each $f \in \mathcal{C}(X)$, some subsequence of $\frac{1}{n}\sum_{k=0}^{n-1} f \circ T^k$ converges pointwise on $X$ to a constant.
Proof: Exercise 7.
Remark 2.9 Keynes and Robertson (1968) proved that $(X, T)$ is topologically ergodic if and only if the only functions $f$ on $X$ which satisfy
(i) the set of continuity points of $f$ is residual,
(ii) $f \circ T = f$ except possibly on a set of first category
are those which are constant except possibly on a set of first category. Similarly, a minimal cascade $(X, T)$ is weakly mixing if and only if it has no continuous eigenfunctions (Keynes and Robertson 1968, 1969, Petersen 1970b).
Here are some typical examples of cascades to keep in mind.
1. A translation on a compact group. (E.g., an irrational rotation of the circle is topologically ergodic and uniquely ergodic, but not topologically weakly mixing: Exercise 8.)
2. An automorphism of a compact group. (E.g., an automorphism of an $n$-torus is not uniquely ergodic but is topologically strongly mixing whenever it is metrically strongly mixing with respect to Haar measure: Exercise 9.)
3. Symbolic cascades. The space $\Omega = \{0, 1, \dots, N-1\}^{\mathbb{Z}}$ is metrizable, with metric
$$d(x, y) = \frac{1}{n+1}, \quad\text{where } n = \inf\{|k| : x_k \neq y_k\},$$
and the shift $\sigma : \Omega \to \Omega$ is a homeomorphism. (a) The orbit closure of any point $\omega \in \Omega$ produces a topologically ergodic cascade $(\overline{\mathcal{O}(\omega)}, \sigma)$. (b) The full shift $(\Omega, \sigma)$ is topologically strongly mixing (Exercise 10). (c) A subshift of finite type $(\Sigma_A, \sigma)$ is defined by an $N \times N$ matrix $A$ with entries from $\{0, 1\}$ by $x \in \Sigma_A$ if and only if $A_{x_i x_{i+1}} = 1$ for all $i \in \mathbb{Z}$. Thus $\Sigma_A$ consists exactly of those sequences which contain only 'admissible transitions'.
4. Diffeomorphisms of manifolds (e.g., geodesic cascades and horocycle cascades).
C. Equicontinuous and distal cascades
Next we consider properties of cascades that are at the other extreme from mixing. With respect to a mixing transformation, subsets of $X$ are extremely pliable; but transformations with either of these two properties will be akin to rigid motions. We will also mention the important fixed-point properties of groups of mappings of these kinds. We say that $(X, T)$ is equicontinuous in case $\{T^n : n \in \mathbb{Z}\}$ forms an equicontinuous family of maps $X \to X$. That is, given $\varepsilon > 0$ there is $\delta > 0$ such that if $d(x, y) < \delta$ then $d(T^n x, T^n y) < \varepsilon$ for all $n \in \mathbb{Z}$.
We say that $(X, T)$ is distal in case $x \neq y$ implies $\inf_n d(T^n x, T^n y) > 0$. Two points $x, y \in X$ are said to be proximal if there are $n_k \in \mathbb{Z}$ and $z \in X$ with $T^{n_k}x \to z$, $T^{n_k}y \to z$. Thus a distal cascade is one that has no pairs of distinct proximal points. Of course each equicontinuous cascade is distal.
Proposition 2.10 If $(X, T)$ is equicontinuous, then $(\overline{\mathcal{O}(x)}, T)$ is minimal for each $x \in X$ (i.e. $(X, T)$ is pointwise almost periodic). Therefore $X$ is the union of its minimal subsets.
Proof: Exercise 11.
Theorem 2.11 The following statements about a minimal topological cascade $(X, T)$ are equivalent: (i) $(X, T)$ is equicontinuous. (ii) $(X, T)$ is uniformly almost periodic, in that given $\varepsilon > 0$ there is a relatively dense set $S \subset \mathbb{Z}$ such that $d(x, T^n x) < \varepsilon$ for all $n \in S$ and all $x \in X$. (iii) $X$ can be given a group structure which makes it a compact topological group, and there is an element $x_0 \in X$ such that $\{x_0^n : n \in \mathbb{Z}\}$ is dense in $X$ (so that $X$ is monothetic) and $Tx = x_0 x$ for all $x \in X$. (Thus with respect to Haar measure on $X$, $T$ has discrete spectrum.)
Proof: Exercise 12.
The following fixed point theorem, which is frequently used in economics, easily implies, and is implied by, the existence of Haar measure on compact groups (Exercise 13).
Theorem 2.12 Kakutani Fixed Point Theorem (1938a) Let $X$ be a nonempty compact convex subset of a locally convex topological vector space. Suppose that $G$ is an equicontinuous group of affine maps $X \to X$:
$$g(ax + (1-a)y) = a\,g(x) + (1-a)\,g(y) \quad\text{for } 0 \le a \le 1 \text{ and } x, y \in X.$$
Then $G$ has a common fixed point in $X$.
Remark 2.13 F. Hahn (1967) showed that the Theorem is still true if the action of $G$ is required only to be distal rather than equicontinuous: $g_i x \to z$ and $g_i y \to z$ implies $x = y$. We will give the clever proof of this stronger statement that is due to S. Glasner (1975, 1976).
Proof: By Zorn's Lemma, $X$ contains a minimal nonempty compact convex $G$-invariant subset $K$. Let $M$ denote the closure of the set of extreme points of $K$. ($M \neq \emptyset$ by the Krein–Milman Theorem.)
I claim that $M$ is the unique minimal subset of the system $(K, G)$. (We are extending the terminology established for cascades to transformation groups in the obvious way.)
For let $Y \subset K$ be a minimal subset of $(K, G)$. Then $\overline{\mathrm{co}}(Y)$ (the closed convex hull of $Y$) is closed, convex, and $G$-invariant and hence coincides with $K$. Therefore the extreme points of $K$ are in $Y$, and it follows that $M \subset Y$ and $M = Y$.
I claim now that all pairs of points of $M$ are proximal to one another. For if $x, y \in M$, then $\overline{G\bigl(\tfrac{1}{2}(x+y)\bigr)} \supset M$, since $M$ is the unique minimal set of $(K, G)$. Thus there is an extreme point $k$ of $K$ and a net $\{g_\nu\}$ of elements of $G$ such that
$$g_\nu\bigl(\tfrac{1}{2}(x+y)\bigr) \to k.$$
Using compactness, we may assume that $g_\nu x \to x_\infty$ and $g_\nu y \to y_\infty$. Then we find
$$\tfrac{1}{2}(x_\infty + y_\infty) = k,$$
and this is impossible, since $k$ is an extreme point, unless $x_\infty = y_\infty = k$. Then $x$ and $y$ are proximal. Since $(M, G)$ is distal, this cannot happen unless $M$ consists of just one point. The point is the one fixed by all elements of $G$.
This theorem was further generalized by Ryll-Nardzewski, who showed that it still held if $X$ was assumed only to be weakly compact. We again give the remarkably simple proof found by Glasner (1975, 1976). There is no loss of generality in assuming that the ambient space $B$ is a separable Banach space (see Ryll-Nardzewski 1967 for the reduction).
Theorem 2.14 Ryll-Nardzewski Fixed Point Theorem (1967) Let
$B$ be a separable Banach space, $X \subset B$ a weakly compact convex set, and $G$ a group acting affinely and norm distally (and of course continuously with respect to the weak topology) on $X$. Then $G$ has a common fixed point in $X$.
Proof: Repeat the proof of the Kakutani–Hahn Theorem for $B$ with its weak topology. We find a minimal set $(M, G)$, all pairs of points of which are proximal in the weak topology. Now we use norm distality to show that again $M$ consists of just one point: given $x, y \in M$ and $\varepsilon > 0$, it is enough to find $g \in G$ such that $\|gx - gy\| \le 2\varepsilon$, for then $M$ will be simultaneously norm-distal and norm-proximal, hence a single point. Let $U$ be the closed ball of radius $\varepsilon$ centered at the origin in $B$. Choose a sequence $b_1, b_2, \dots$ of elements of $B$ such that $\{b_i + U : i = 1, 2, \dots\}$ covers $B$. Each $b_i + U$ is norm closed and convex, hence weakly closed. Then by the Baire Category Theorem, not every $(b_i + U) \cap M$ can be nowhere dense in the restriction of the weak topology to $M$. Thus there are an $i$ and a weakly open subset $W$ of $B$ such that
$$\emptyset \neq W \cap M \subset (b_i + U) \cap M.$$
(Recall that the weak topology on a weakly compact subset of a separable Banach space is complete metric.) Now $x$ and $y$ are proximal, so there are $g_\nu \in G$ and $z \in M$ such that $g_\nu x \to z$ and $g_\nu y \to z$, weakly. Choose $h \in G$ (using minimality of $(M, G)$) such that $hz \in W \cap M$. Then $W_0 = h^{-1}(W \cap M)$ is a weak neighborhood of $z$ in $M$, so there is a $\nu$ such that $g_\nu x \in W_0$ and $g_\nu y \in W_0$. Then $hg_\nu x$ and $hg_\nu y$ are both in $W \cap M \subset b_i + U$, and therefore $\|hg_\nu x - hg_\nu y\| \le 2\varepsilon$.
D. Uniform distribution mod 1
Proposition 2.15 A minimal equicontinuous cascade is strictly ergodic.
Proof: Exercise 14.
Let $X$ be the unit circle, here regarded as $[0, 1)$, and let $Tx = x + \alpha \pmod 1$, where $\alpha$ is irrational. There are many easy ways to see that $T$ is a minimal isometry, and hence $(X, T)$ is minimal and equicontinuous (Exercise 15). By the Proposition, $(X, T)$ is strictly ergodic. Of course the unique invariant measure is Lebesgue measure $m$. Therefore
$$\frac{1}{n}\sum_{k=0}^{n-1} f(x + k\alpha) \to \int_0^1 f\,dm$$
uniformly on $[0, 1)$ for each continuous $f$ on the circle. In fact this statement holds for all Riemann integrable functions $f$ on $[0, 1)$; this is one of the equidistribution theorems of H. Weyl (1916). (Exercise 16. Hint: Use monotone convergence. If necessary, see Remark 4.4(2).) In particular,
$$\frac{1}{n}\sum_{k=0}^{n-1}\chi_{[a,b)}(x + k\alpha) \to b - a$$
uniformly on $[0, 1)$ whenever $0 \le a \le b \le 1$.
The fact that $\{k\alpha\} \pmod 1$ is dense in $[0, 1)$ when $\alpha$ is irrational has long been known. There are similar observations in the writings of Nicole Oresme (c. 1320–1382): in Proposition II, 4 of his Tractatus de commensurabilitate vel incommensurabilitate motuum celi (Grant 1971), for example, Oresme considers two bodies moving on a circle with uniform but incommensurable velocities and says, 'No sector of a circle is so small that two such mobiles could not conjunct in it at some future time, and could not
have conjuncted in it sometime [in the past].' Oresme had an astonishing understanding of circular motion and rational and irrational numbers. In De proportionibus proportionum, Ch. III, Proposition V, he says, 'It is probable that two proposed unknown ratios are incommensurable because if many unknown ratios are proposed it is most probable that any [one] would be incommensurable to any [other]' (Oresme 1351). Could it be that a fourteenth-century scientist already knew that the set of rational numbers has measure zero (or at least measure less than that of the irrational numbers)? Apparently Oresme was also one of the first discoverers of the divergence of the harmonic series. Oresme's Ad pauca respicientes was written to demonstrate that astrology is futile because the future is essentially unpredictable.
His ideas are surprisingly close to P. Bohl's (1909) treatment of the 'problem of mean motion', a question that had already been discussed by Lagrange. For a given Keplerian element $p(t)$, such as the longitude of the perihelion of a planet orbiting the sun, are there a constant $c$ and a bounded function $h$ such that $p(t) = ct + h(t)$? (We suppose that at each time $t$ the planet is on a Kepler ellipse, but that the ellipse is changing with time because of the perturbing influences of the other bodies in the solar system.) If there were, the perihelion would essentially revolve regularly about the sun, up to bounded anomalies. Lagrange (1870) gave an affirmative answer in case the equations of motion happened to take certain special forms, and it was generally believed (e.g. Cavallin and Gyldén 1895) that there always did exist a mean motion. However, Bohl showed that in one situation the mean motion existed if and only if the expression
$$\sum_{k=0}^{n-1}\chi_{[a,b)}(x + k\alpha) - n(b - a)$$
is bounded in $n$. This happens if and only if $b - a$ is an integral multiple of $\alpha$ mod 1. (Cf. Petersen 1973a, Furstenberg, Keynes and Shapiro 1973.)
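Both phenomena are easy to observe numerically. In this sketch the rotation number $\alpha = \sqrt2 - 1$, the intervals, and the sample sizes are our own choices. The visit frequency of $[0.2, 0.7)$ is compared with its length, uniformly over a grid of starting points. The discrepancy for the interval $[0.3, 0.3 + \alpha)$, whose length is $\alpha$, is seen to stay below 1 in absolute value. That bound comes from writing $\chi_{[c, c+\alpha)} - \alpha = u - u \circ T$ with $u(y)$ the fractional part of $y - c - \alpha$, so that the sum telescopes.

```python
import numpy as np

# Rotation number of our choice (badly approximable): alpha = sqrt(2) - 1.
alpha = np.sqrt(2) - 1


def visits(x, a, b, n):
    """Number of k in {0, ..., n-1} with x + k*alpha (mod 1) in [a, b)."""
    pts = (x + alpha * np.arange(n)) % 1.0
    return int(np.count_nonzero((pts >= a) & (pts < b)))


n = 100_000
starts = np.linspace(0.0, 1.0, 50, endpoint=False)

# Weyl: the visit frequency of [0.2, 0.7) tends to 0.5, uniformly in the start x.
freq_error = max(abs(visits(x, 0.2, 0.7, n) / n - 0.5) for x in starts)

# Bohl: for an interval whose length is alpha, the discrepancy
# sum_{k<m} chi_[a, a+alpha)(k alpha) - m * alpha stays bounded (here by 1) in m.
a = 0.3
discrepancies = [visits(0.0, a, a + alpha, m) - m * alpha for m in (10, 100, 1000, 10_000, n)]
print(freq_error, discrepancies)
```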
Since practically speaking such a determination could never be made by actual experimental measurement, the problem of existence of mean motion was one that by its very nature could never be settled. Similarly Oresme (Tractatus de Commensurabilitate, Part III) concluded that although the celestial motions are probably incommensurable, the final answer is necessarily unknowable, and indeed it is good that this is so. An elementary argument using the 'pigeonhole principle', due to Dirichlet, shows that if $\alpha$ is irrational then there are even infinitely many $k$ with
$$\|k\alpha\| < \frac{1}{k},$$
where $\|t\|$ denotes the distance from $t$ to the nearest integer. (Exercise 17.)
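Dirichlet's argument is short enough to run. Here is a minimal sketch in plain Python ($\alpha = \sqrt2$ and the values of $N$ are our choices). Among the $N + 1$ fractional parts of $j\alpha$, $0 \le j \le N$, two fall into the same one of the $N$ boxes $[i/N, (i+1)/N)$. The difference $k$ of their indices then satisfies $1 \le k \le N$ and $\|k\alpha\| < 1/N \le 1/k$. Since $\|k\alpha\| > 0$ for every $k$ when $\alpha$ is irrational, letting $N$ grow forces infinitely many such $k$.

```python
import math


def dirichlet_k(alpha, N):
    """Pigeonhole: the N + 1 numbers frac(j * alpha), j = 0..N, lie in N boxes
    [i/N, (i+1)/N), so two share a box; their index difference k satisfies
    1 <= k <= N and ||k * alpha|| < 1/N <= 1/k."""
    first_index_in_box = {}
    for j in range(N + 1):
        box = min(int((j * alpha % 1.0) * N), N - 1)   # guard against rounding up to N
        if box in first_index_in_box:
            return j - first_index_in_box[box]
        first_index_in_box[box] = j
    raise AssertionError("unreachable: N + 1 points cannot occupy N boxes injectively")


def dist_to_nearest_int(t):
    return abs(t - round(t))


alpha = math.sqrt(2)
found = [(N, dirichlet_k(alpha, N)) for N in (10, 100, 1000, 10_000)]
for N, k in found:
    print(N, k, dist_to_nearest_int(k * alpha), 1 / k)
```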
There is a similar theory of uniform distribution on the $n$-torus $\mathbb{K}^n = [0, 1)^n$, a compact abelian group under coordinatewise addition mod 1. Let $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_n)$ be such that $1, \alpha_1, \dots, \alpha_n$ are rationally independent. Then it is a theorem of Kronecker (1884) (see also Tchebychef 1866) that $\{k\alpha\}$ is dense in $\mathbb{K}^n$. (Exercise 18.) That $\{k\alpha\}$ is equidistributed in $\mathbb{K}^n$ is another theorem of Weyl (1916). (Exercise 19.) Further, if $P$ is any polynomial with at least one irrational coefficient other than the constant term, then $\{P(k)\}$ is equidistributed mod 1 (Weyl 1916; for dynamical proofs, see Furstenberg 1960, Hahn 1965, and Postnikov 1966).
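Weyl's criterion ($\{x_k\}$ is equidistributed mod 1 if and only if $\frac1n\sum_{k<n} e^{2\pi i h x_k} \to 0$ for every integer $h \neq 0$) gives a quick numerical test. The sketch below uses example polynomials of our own. Floating point limits the accuracy of $\sqrt2 k^2$ mod 1 to roughly $10^{-6}$ here, which is harmless for this purpose. It contrasts $P(k) = \sqrt2 k^2 + k/3$ with $Q(k) = k/3 + \sqrt2$, whose only irrational coefficient is the constant term. $Q(k)$ takes just three values mod 1, and the exponential sum with $h = 3$ does not decay at all.

```python
import numpy as np


def weyl_mean(values, h=1):
    """|(1/n) sum_k exp(2 pi i h x_k)|; by Weyl's criterion, x_k is equidistributed
    mod 1 iff this tends to 0 for every integer h != 0."""
    return abs(np.mean(np.exp(2j * np.pi * h * values)))


n = 100_000
k = np.arange(n, dtype=np.float64)

# P(k) = sqrt(2) k^2 + k/3: an irrational coefficient that is not the constant term.
good = (np.sqrt(2) * k**2 + k / 3) % 1.0
# Q(k) = k/3 + sqrt(2): irrational only in the constant term; three values mod 1.
bad = (k[:1000] / 3 + np.sqrt(2)) % 1.0

print(weyl_mean(good))                 # small
print(weyl_mean(bad, h=3))             # equal to 1 up to rounding: not equidistributed
print(len(np.unique(np.round(bad, 6))))  # 3
```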
E. Structure of distal cascades
The two most basic facts about distal cascades are that they can be characterized in terms of pointwise almost periodicity and that they can be resolved into towers composed of equicontinuous pieces. The sequences and nets that arise in discussions of proximality are most easily handled by means of the Ellis semigroup $E(X, T)$ (also called the enveloping semigroup), which is the closure in the product topology (i.e. the topology of pointwise convergence) of $\{T^n : n \in \mathbb{Z}\}$ in the compact space $X^X$. It is not hard to show that for any cascade $(X, T)$:
(i) $E$ is a semigroup;
(ii) right multiplication is a continuous operation in $E$;
(iii) left multiplication by a continuous element of $E$ is a continuous operation in $E$;
(iv) $x_1, x_2 \in X$ are proximal if and only if there is $p \in E$ with $px_1 = px_2$ (note that we write the action of the functions in $E$ on the left);
(v) $\overline{\mathcal{O}(x)} = E(X, T)x$ for each $x \in X$.
Proposition 2.16 (Ellis 1958) Every compact semigroup in which multiplication on at least one side is continuous contains an idempotent.
Proof: Let us suppose that right multiplication is continuous in the compact semigroup $E$. By Zorn's Lemma, there is a minimal nonempty compact subset $K \subset E$ with $K^2 \subset K$. If $u \in K$ then $Ku \subset K$ is compact and $(Ku)^2 = KuKu \subset K^2u \subset Ku$, so by minimality $Ku = K$. Let $L = \{v \in K : vu = u\}$. Since $Ku = K$, $L \neq \emptyset$. However, $L$ is closed and $L^2 \subset L$, so we must have $L = K$. Thus $Ku = \{u\}$, and in particular $u^2 = u$.
Proposition 2.17 (Ellis 1958) $(X, T)$ is distal if and only if $E(X, T)$ is a group.
Proof: Suppose that $E(X, T)$ is a group. If $x_1$ and $x_2$ are proximal, then $px_1 = px_2$ for some $p \in E(X, T)$. Multiplying by $p^{-1} \in E(X, T)$ gives $x_1 = x_2$. Hence $(X, T)$ is distal. Conversely, suppose that $(X, T)$ is distal. I claim that then $E(X, T)$ has left cancellation. If $p, p_1, p_2 \in E(X, T)$ and $pp_1 = pp_2$, then for any fixed $x \in X$ we have $p(p_1x) = p(p_2x)$, which by distality implies $p_1x = p_2x$; therefore $p_1 = p_2$.
Of course $E(X, T)$ contains an identity element $e = T^0$. By left cancellation, the identity element is unique. Inverses can be found as follows. Given $p \in E(X, T)$, $K = E(X, T)p$ is compact and satisfies $K^2 = (E(X, T)pE(X, T))p \subset E(X, T)p = K$. Thus $K$ is a compact semigroup and, by the preceding Proposition, $K$ contains an idempotent $u$. Using left cancellation, $u \cdot u = u = u \cdot e$ implies that $u = e$. Thus $e = qp$ for some $q \in E(X, T)$. Repeating this argument for $q$, we find $r \in E(X, T)$ with $e = rq$. Then $p = ep = rqp = re = r$, so we have $qp = pq = e$. Of course inverses will be unique by left cancellation.
Proposition 2.18 (Ellis 1958) A distal cascade is pointwise almost periodic.
Proof: Let $x \in X$. We need to show that $\overline{\mathcal{O}(x)} = E(X, T)x$ is a minimal subset of $X$. For this purpose it is enough to show that if $y \in \overline{\mathcal{O}(x)}$, then $x \in \overline{\mathcal{O}(y)}$. But if $y \in \overline{\mathcal{O}(x)}$, then $y = px$ for some $p \in E(X, T)$, and hence $x = p^{-1}y \in E(X, T)y = \overline{\mathcal{O}(y)}$.
Theorem 2.19 (Ellis 1958) $(X, T)$ is distal if and only if $(X \times X, T \times T)$ is pointwise almost periodic.
Proof: If $(X, T)$ is distal, then so is $(X \times X, T \times T)$, and hence it is pointwise almost periodic by the preceding Proposition. For the converse, suppose that $x$ and $y$ are proximal. Let $\Delta \subset X \times X$ denote the diagonal. Then $\overline{\mathcal{O}(x, y)} \cap \Delta \neq \emptyset$, and hence, since $\Delta$ is closed and invariant and $\overline{\mathcal{O}(x, y)}$ is minimal, we must have $(x, y) \in \Delta$, that is, $x = y$.
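Finite analogues of Propositions 2.16 and 2.17 can be checked by brute force. This sketch is our own illustration: a finite set with a non-injective map is not a cascade, and compactness plays no role. We list the powers of a self-map of $\{0, \dots, 4\}$, find an idempotent among them, and compare with a permutation, whose powers form a group. The non-injective map $t$ sends both 1 and 3 to 2, so an element of its semigroup identifies two distinct points, in the spirit of (iv).

```python
# Finite analogues (our illustration, not Ellis's setting): self-maps of
# {0, ..., n-1} stored as tuples, composed as (p q)(x) = p(q(x)).

def compose(p, q):
    return tuple(p[q[x]] for x in range(len(q)))


def powers(t):
    """The semigroup {t, t^2, t^3, ...} generated by t (finite, so it closes up)."""
    found, current = [], t
    while current not in found:
        found.append(current)
        current = compose(t, current)
    return found


def idempotents(semigroup):
    return [u for u in semigroup if compose(u, u) == u]


identity = tuple(range(5))

# A non-injective map: 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 2, 4 -> 0.
t = (1, 2, 3, 2, 0)
S = powers(t)
print(idempotents(S))   # [(2, 3, 2, 3, 3)]: an idempotent, as in Proposition 2.16
print(identity in S)    # False: this semigroup is not a group

# A permutation (trivially distal): its powers form a group, as in Proposition 2.17.
perm = (1, 2, 0, 4, 3)
P = powers(perm)
has_inverses = all(any(compose(p, q) == identity for q in P) for p in P)
print(len(P), identity in P, has_inverses)   # 6 True True
```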
The fundamental theorem on the structure of distal flows was proved by Furstenberg. Similar ideas, adapted to m.p.t.s, played a role in his proof of the Szemerédi Theorem, which we will discuss in Section 4.3. Let $(X, T)$ and $(Y, S)$ be cascades, and let $\phi : X \to Y$ be a continuous onto map such that $\phi \circ T = S \circ \phi$. We say that $\phi$ is a homomorphism, that $(Y, S)$ is a factor of $(X, T)$, and that $(X, T)$ is an extension of $(Y, S)$. Also, the
entire diagram $\phi : (X, T) \to (Y, S)$ is called a factor, homomorphism, or extension. An extension $\phi : (X, T) \to (Y, S)$ is called isometric in case there is a continuous metric $\rho(x_1, x_2)$ defined on each fiber $\phi^{-1}\{y\}$ such that $\rho(Tx_1, Tx_2) = \rho(x_1, x_2)$ whenever $\phi(x_1) = \phi(x_2)$. According to Furstenberg's Theorem, every distal cascade is composed of a (possibly transfinite) string of isometric extensions. In order to state this theorem, one more concept, that of inverse limit, is necessary. Let $\{(X_\alpha, T_\alpha) : \alpha \in A\}$ be a family of cascades indexed by a partially ordered set $A$ such that whenever $\alpha, \beta \in A$ and $\alpha < \beta$, there is a homomorphism $\phi_{\alpha\beta} : (X_\beta, T_\beta) \to (X_\alpha, T_\alpha)$. The inverse limit $(X, T)$ of the system $\{(X_\alpha, T_\alpha)\}$ is defined by
$$X = \Bigl\{x = (x_\alpha) \in \prod_{\alpha\in A} X_\alpha : \phi_{\alpha\beta}x_\beta = x_\alpha \text{ for all } \alpha, \beta \in A \text{ with } \alpha < \beta\Bigr\}$$
and
$$Tx = T\bigl((x_\alpha)\bigr) = (T_\alpha x_\alpha).$$
There are natural homomorphisms $\psi_\alpha : (X, T) \to (X_\alpha, T_\alpha)$ for each $\alpha \in A$.
Theorem 2.20 Furstenberg Structure Theorem (1963) Let $(X, T)$ be a minimal distal cascade. Then there is an ordinal number $\eta$ and a family of factors $\{(X_\xi, T_\xi) : \xi \le \eta\}$ of $X$ such that:
(i) $X_0$ consists of a single point;
(ii) $(X_\eta, T_\eta) = (X, T)$;
(iii) if $\xi < \zeta \le \eta$, then there is a homomorphism $\phi_{\xi\zeta} : (X_\zeta, T_\zeta) \to (X_\xi, T_\xi)$;
(iv) if $\xi \le \eta$ is a successor ordinal, then $(X_\xi, T_\xi)$ is an isometric extension of $(X_{\xi-1}, T_{\xi-1})$;
(v) if $\xi \le \eta$ is a limit ordinal, then $(X_\xi, T_\xi)$ is the inverse limit of $\{(X_\zeta, T_\zeta) : \zeta < \xi\}$.
Corollary 2.21 A nontrivial minimal distal cascade has nontrivial equicontinuous factors, and hence nontrivial continuous eigenfunctions.
Proof: Exercise 20.
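A concrete instance of Theorem 2.20, chosen by us for illustration, is the dyadic odometer. It is a minimal equicontinuous (hence distal) cascade, namely the inverse limit of the finite rotations $(\mathbb{Z}/2^n, x \mapsto x+1)$. Here $X_0 = \mathbb{Z}/1$ is a point. Each $\mathbb{Z}/2^n$ is a two-to-one extension of $\mathbb{Z}/2^{n-1}$, isometric for the discrete metric on fibers. The odometer itself is the inverse limit at $\eta = \omega$. The sketch truncates to levels $0, \dots, N$ and checks that compatibility is preserved and that the top level moves last.

```python
# Hypothetical example of our choosing: the dyadic odometer, i.e. the inverse limit
# of the finite cascades (Z/2^n, x -> x + 1), truncated to levels n = 0, ..., N.

N = 10


def bond(m, n, x_n):
    """Bonding homomorphism Z/2^n -> Z/2^m for m < n: reduction mod 2^m."""
    return x_n % 2 ** m


def is_compatible(x):
    """x = (x_0, ..., x_N) lies in the (truncated) inverse limit."""
    return all(bond(m, n, x[n]) == x[m] for n in range(len(x)) for m in range(n))


def T(x):
    """The inverse-limit map acts coordinatewise: x_n -> x_n + 1 (mod 2^n)."""
    return tuple((x_n + 1) % 2 ** n for n, x_n in enumerate(x))


x = tuple(5 % 2 ** n for n in range(N + 1))   # coordinates of the point "5"
print(x[0], is_compatible(x), is_compatible(T(x)))   # 0 True True

y = x
for _ in range(2 ** (N - 1)):
    y = T(y)
moved = [n for n in range(N + 1) if y[n] != x[n]]
print(moved)   # [10]: after 2^(N-1) steps only the top coordinate has changed

z = y
for _ in range(2 ** (N - 1)):
    z = T(z)
print(z == x)  # True: after 2^N steps the truncated point returns
```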
This theorem has been extended by Veech (1970) and Ellis (1973) so that no countability assumptions on X or the acting group G are needed
and one must only assume that $X$ is point distal: at least one point of $X$ is not proximal to any other point. For further reading about this and the many other topics in topological dynamics that we have not even been able to mention, we recommend the following references.
The origins of the subject are in Henri Poincaré, Les méthodes nouvelles de la mécanique céleste, I (1892), II (1893), and III (1899), Gauthier-Villars, Paris; also Dover, New York, 1957, and NASA TTF 450-452, Washington, D.C., 1967.
G. D. Birkhoff (1927) and (1966). Dynamical Systems, A.M.S. Colloquium Publications 9, Providence, R.I.
Systematic and progressively more recent accounts are:
V. V. Nemytskii and V. V. Stepanov (1960). Qualitative Theory of Differential Equations, Princeton University Press.
W. H. Gottschalk and G. A. Hedlund (1955). Topological Dynamics, A.M.S. Colloquium Publications 36, Providence, R.I.
Robert Ellis (1969). Lectures on Topological Dynamics, W. A. Benjamin, Inc., New York.
I. U. Bronstein (1975). Extensions of Minimal Transformation Groups, Sijthoff & Noordhoff, Alphen aan den Rijn, The Netherlands.
For the connections with ergodic theory:
Manfred Denker, Christian Grillenberger, and Karl Sigmund (1976). Ergodic Theory on Compact Spaces, Springer Lecture Notes in Mathematics 527.
For the complete story of uniform distribution:
L. Kuipers and H. Niederreiter (1974). Uniform Distribution of Sequences, John Wiley & Sons, New York.
For the most recent (as well as an unbelievably thorough and novel) treatment of recurrence in both the topological and metric contexts:
H. Furstenberg (1981). Recurrence in Ergodic Theory and Combinatorial Number Theory, Princeton University Press.
Finally, a survey of much of the newer work in the field, including many open problems:
William A. Veech (1977). Topological dynamics, Bull. Amer. Math. Soc. 83, 775-830.
Exercises
21. Show that the space $\mathcal{M}$ of $T$-invariant Borel probability measures on a cascade $(X, T)$ is convex and compact in the weak* topology, and that the set $\mathcal{E}$ of ergodic measures in $\mathcal{M}$ is exactly the set of extreme points of $\mathcal{M}$. (In fact $\mathcal{M}$ is metrizable: Prohorov 1956; Varadarajan 1958, 1962; cf. Parthasarathy 1967.)
22. Give necessary and sufficient combinatorial conditions on a sequence $x \in \{0, 1\}^{\mathbb{Z}}$ in order that its orbit closure under the shift be (a) minimal, (b) uniquely ergodic, (c) strongly mixing.
23. Construct explicitly a point which has a dense orbit in $\{0, 1\}^{\mathbb{Z}}$. Do the same for a given irreducible subshift of finite type.
24. Give an example of a distal cascade that is not equicontinuous. Now give a minimal example. (Hint: Try a skew product.)
25. If $\beta = j\alpha \bmod 1$ for some integer $j$, then $\sum_{k=1}^{n} \chi_{[0,\beta)}(k\alpha \bmod 1) - n\beta$ is bounded in $n$ (Hecke 1922).
26. Show that a cascade which has the structure specified in the conclusion of the Furstenberg Structure Theorem is distal.
27. Show that the classes of distal cascades and minimal cascades are each closed under the formation of factors and inverse limits.
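Exercise 25 can be explored numerically before it is proved. The sketch below is an illustration only, not a proof: it assumes floating-point arithmetic is accurate enough over the range of $n$ tested, and the helper name `hecke_discrepancy` and the choice $\alpha = \sqrt{2} - 1$ are ours, not the text's. It computes $\max_{n \le n_{\max}} \big|\sum_{k=1}^{n} \chi_{[0,\beta)}(k\alpha \bmod 1) - n\beta\big|$ for $\beta = j\alpha \bmod 1$.

```python
import math


def hecke_discrepancy(alpha: float, j: int, n_max: int) -> float:
    """Largest |D_n| for n <= n_max, where
    D_n = #{1 <= k <= n : (k*alpha mod 1) < beta} - n*beta and beta = j*alpha mod 1.
    Hypothetical helper illustrating Exercise 25 (Hecke 1922); floating point only."""
    beta = (j * alpha) % 1.0
    count = 0      # running value of sum_{k<=n} chi_[0,beta)(k*alpha mod 1)
    worst = 0.0    # running maximum of |D_n|
    for k in range(1, n_max + 1):
        if (k * alpha) % 1.0 < beta:
            count += 1
        worst = max(worst, abs(count - k * beta))
    return worst


if __name__ == "__main__":
    alpha = math.sqrt(2) - 1  # an irrational rotation number chosen for illustration
    for j in (1, 2, 3):
        print(j, hecke_discrepancy(alpha, j, 100_000))
```

For $j = 1$ and $0 < \alpha < 1$ the count equals $\lfloor n\alpha \rfloor$, so $|D_n| < 1$ for every $n$; for other $j$ a numerical bound over a finite range is of course only suggestive.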
4.3. The Szemerédi Theorem
A. Furstenberg's approach to the Szemerédi and van der Waerden Theorems
Baudet (a Dutch mathematician working in Göttingen) conjectured that if the positive integers are divided into two disjoint classes, then at least one of them must contain arbitrarily long arithmetic progressions. In 1927 van der Waerden published the proof of a stronger statement:
Theorem 3.1 Van der Waerden's Theorem (1927) If $\mathbb{N} = C_1 \cup C_2 \cup \cdots \cup C_r$ (disjoint), then there is a $j$, $1 \le j \le r$, such that $C_j$ contains arbitrarily long arithmetic progressions: given $k$, an $a$ and $n$ can be found such that
$$a,\ a + n,\ \ldots,\ a + (k-1)n \in C_j.$$
Van der Waerden (1971) describes at great length how he, Artin, and Schreier found the proof; Khintchine (1948) includes this result as one of his 'three pearls of number theory', although he has forgotten Baudet's name; and Rado's doctoral dissertation consisted of an extension of this result. It is possible that the conjecture is actually due to Schur, and that Baudet only propagated it; see Schur (1973), I, p. xiii. Later, Erdős and Turán (1936) conjectured that in fact any subset of $\mathbb{N}$ which has positive upper density must contain arbitrarily long arithmetic progressions. This is of course a stronger statement than the van der Waerden Theorem. Roth (1952) obtained the result for progressions of length 3, and Szemerédi established first (1969) the case $k = 4$ and then (1975) the general statement.
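For $r = 2$ classes and progressions of length $k = 3$, the smallest $n$ such that every 2-coloring of $\{1, \ldots, n\}$ has a monochromatic progression is known to be 9. The brute-force sketch below (helper names are ours, for illustration only) checks this small finite case of Theorem 3.1 by enumerating all $2^n$ colorings; it says nothing about the infinite statement and becomes infeasible quickly as $n$ grows.

```python
from itertools import product


def has_monochromatic_ap(coloring, k):
    """True if the coloring (coloring[i] is the color of the integer i + 1) contains a
    monochromatic arithmetic progression a, a+n, ..., a+(k-1)n with n >= 1."""
    size = len(coloring)
    for a in range(size):
        for n in range(1, size):
            if a + (k - 1) * n >= size:
                break  # larger n only overshoots further
            if all(coloring[a + i * n] == coloring[a] for i in range(k)):
                return True
    return False


def every_coloring_has_ap(size, r, k):
    """Check all r-colorings of {1, ..., size} by brute force."""
    return all(has_monochromatic_ap(c, k) for c in product(range(r), repeat=size))


if __name__ == "__main__":
    print(every_coloring_has_ap(8, 2, 3))
    print(every_coloring_has_ap(9, 2, 3))
```

The coloring 1 1 2 2 1 1 2 2 of $\{1, \ldots, 8\}$ shows why 8 does not suffice.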
Theorem 3.2 Szemerédi's Theorem (1975) Any set of positive integers with positive upper density contains arbitrarily long arithmetic progressions.
By a set $\mathcal{C}$ with positive upper density we mean one for which $N_i, M_i$ with $N_i - M_i \to \infty$ can be found such that
$$\lim_{i \to \infty} \frac{\operatorname{card}[\mathcal{C} \cap (M_i, N_i]]}{N_i - M_i} = \delta > 0.$$
The Theorem asserts that for such a set $\mathcal{C}$, given any $k$ it is possible to find $a$ and $n \ge 1$ such that $a, a + n, \ldots, a + (k-1)n \in \mathcal{C}$.
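The definition allows the intervals $(M_i, N_i]$ to lie anywhere, so a set can have positive upper density in this sense while having density zero when measured along initial segments $(0, N]$ (the windowed notion is often called upper Banach density). The sketch below uses a hypothetical example set of our own choosing, $S = \bigcup_{m \ge 1} [m^3, m^3 + m)$, and computes $\operatorname{card}[S \cap (M, N]]/(N - M)$ for both kinds of window.

```python
def window_density(S, M, N):
    """card(S ∩ (M, N]) / (N - M) for a set S of positive integers."""
    return sum(1 for n in range(M + 1, N + 1) if n in S) / (N - M)


# Hypothetical example set: blocks of length m starting at m**3 (the blocks are disjoint).
S = set()
for m in range(1, 201):
    S.update(range(m ** 3, m ** 3 + m))

if __name__ == "__main__":
    # Windows (m^3, m^3 + m] with N - M = m -> infinity: the ratio is (m - 1)/m -> 1.
    for m in (10, 100, 200):
        print(m, window_density(S, m ** 3, m ** 3 + m))
    # An initial segment (0, N]: the ratio is small and tends to 0 as N grows.
    print(window_density(S, 0, 10 ** 6))
```

Since this $S$ contains arbitrarily long intervals, it trivially contains arbitrarily long arithmetic progressions; the example only shows how the windows in the definition are allowed to move.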
In 1977, Furstenberg proved an ergodic-theoretic version of Szemerédi's Theorem. Although this Theorem can be proved on the basis of Szemerédi's Theorem, Furstenberg's proof uses ergodic theory and does not depend on the van der Waerden, Roth, or Szemerédi results; this approach provides a new and most interesting proof of the Szemerédi Theorem.
Theorem 3.3 Ergodic Szemerédi Theorem (Furstenberg 1977) If $T: X \to X$ is a measure-preserving transformation on a finite measure space $(X, \mathcal{B}, \mu)$, $A \in \mathcal{B}$ with $\mu(A) > 0$, and $k > 0$, then there is an $n > 0$ such that
$$\mu(A \cap T^n A \cap T^{2n} A \cap \cdots \cap T^{(k-1)n} A) > 0.$$
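For a Bernoulli shift the intersection in Theorem 3.3 can be computed exactly. The sketch below assumes (our choice) the two-sided Bernoulli shift $\sigma$ on $\{0, 1\}^{\mathbb{Z}}$ with product measure giving the symbol 0 weight $p$, and takes $A = \{\omega : \omega(0) = 0\}$; the function name is hypothetical. Then $\sigma^{in} A = \{\omega : \omega(-in) = 0\}$, and the code sums the weights of all words on the coordinates $-(k-1)n, \ldots, 0$ that carry a 0 at every coordinate $-in$.

```python
from fractions import Fraction
from itertools import product


def bernoulli_multiple_intersection(n, k, p=Fraction(1, 2)):
    """Exact measure of A ∩ σ^n A ∩ ... ∩ σ^{(k-1)n} A for the Bernoulli(p, 1-p) shift,
    with A = {ω : ω(0) = 0}; assumes n >= 1 and k >= 1."""
    length = (k - 1) * n + 1  # coordinates -(k-1)n, ..., 0
    total = Fraction(0)
    for word in product((0, 1), repeat=length):
        # word[t] is the symbol at coordinate t - (length - 1),
        # so coordinate -i*n sits at index length - 1 - i*n
        if all(word[length - 1 - i * n] == 0 for i in range(k)):
            weight = Fraction(1)
            for symbol in word:
                weight *= p if symbol == 0 else 1 - p
            total += weight
    return total


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, bernoulli_multiple_intersection(n, 3))
```

The value is $p^k = \mu(A)^k$ for every $n \ge 1$: here the intersection is not merely positive, as Theorem 3.3 asserts, but exactly what independence predicts, which is the simplest instance of the 'mixing of all orders along multiples' discussed below.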
Thus every m.p.t. on a finite measure space is multiply recurrent. Of course for $k = 2$ the statement reduces to the Poincaré Recurrence Theorem. When we take up the proof of this theorem, we will first consider the case when $T$ is weakly mixing. There the result will follow from an even stronger one: every weakly mixing transformation is 'weakly mixing of all orders along multiples'.
Theorem 3.4 If $T: X \to X$ is a weakly mixing m.p.t. on a probability space $(X, \mathcal{B}, \mu)$, then for any $A_1, A_2, \ldots, A_k \in \mathcal{B}$
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \big|\mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k) - \mu(A_1)\mu(A_2)\cdots\mu(A_k)\big| = 0.$$
Of course taking $A_1 = A_2 = \cdots = A_k = A$ will yield the preceding Theorem as an easy corollary, in case $T$ is weakly mixing. Combining this Theorem with the observation of Kakutani and Jones that we discussed in Section 2.6, we can prove that each weakly mixing transformation is in fact 'strongly mixing of all orders along multiples', provided that we exclude all those powers of $T$ which lie in a single set of density 0.
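Theorem 3.5 If $T: X \to X$ is weakly mixing, then there is a set $J \subset \mathbb{Z}^+$ with density 0 such that, given $A_1, \ldots, A_k \in \mathcal{B}$,
$$\lim_{\substack{n \to \infty \\ n \notin J}} \mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k) = \mu(A_1)\mu(A_2)\cdots\mu(A_k).$$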
For the proof, for each $k$ and each choice of $A_1, \ldots, A_k$ from a countable algebra generating $\mathcal{B}$ we choose a set $J \subset \mathbb{Z}^+$ of density 0 (whose existence is guaranteed by Theorem 3.4 and Lemma 2.6.2) such that the statement holds for $A_1, \ldots, A_k$ and this $J$. This way we obtain countably many such sets $J$. The rest of the proof proceeds as before.
Furstenberg and Weiss have observed that the topological analogue of the Ergodic Szemerédi Theorem is much easier to prove and also yields van der Waerden's Theorem as an easy corollary.
Theorem 3.6 (Furstenberg and Weiss 1978) Let $X$ be a compact Hausdorff space and $T: X \to X$ a minimal homeomorphism. If $U \subset X$ is open and nonempty and $k \ge 1$, then there is an $n \ge 1$ such that
$$U \cap T^n U \cap \cdots \cap T^{(k-1)n} U \ne \emptyset.$$
In fact, Furstenberg and Weiss found it not much harder to prove a similar theorem for any $k$ commuting homeomorphisms, not just $I, T, T^2, \ldots, T^{k-1}$.
Theorem 3.7 (Furstenberg and Weiss 1978) Let $X$ be a compact metric space and $T_1, \ldots, T_k$ commuting homeomorphisms of $X$. Then there is a point $x \in X$ which is multiply recurrent: there are $n_j \to \infty$ with $T_i^{n_j} x \to x$ for each $i = 1, \ldots, k$.
Of course this Theorem implies the one preceding it. By minimality, $T^m x \in U$ for some $m$. Letting $T_i = T^{-i}$ for $i = 0, 1, \ldots, k-1$, we have
$$T_i^{n_j} x \to x \quad \text{as } j \to \infty, \text{ for all } i,$$
that is, $T^{-i n_j} x \to x$, and hence $T^{-i n_j} T^m x \to T^m x$ for all $i$. Since $U$ is an open neighborhood of $T^m x$, a single $n = n_j \ge 1$ can then be found such that $T^{-in} T^m x \in U$ for all $i = 0, 1, \ldots, k-1$, and this implies that $T^m x \in U \cap T^n U \cap \cdots \cap T^{(k-1)n} U$.
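For an irrational rotation $Tx = x + \alpha \bmod 1$ of the circle, which is a minimal homeomorphism, Theorem 3.6 can be watched in action. Taking $U = (-\varepsilon, \varepsilon) \bmod 1$, the point 0 lies in $U \cap T^n U \cap \cdots \cap T^{(k-1)n} U$ exactly when $\|in\alpha\| < \varepsilon$ for $i = 0, \ldots, k-1$, where $\|t\|$ denotes the distance from $t$ to the nearest integer. The sketch below (helper names are ours; floating point) searches for the least such $n$.

```python
import math


def dist_to_int(t: float) -> float:
    """||t||: distance from t to the nearest integer."""
    return abs(t - round(t))


def multiple_return_time(alpha, k, eps, n_max=10 ** 6):
    """Least n >= 1 with ||i*n*alpha|| < eps for i = 0, ..., k-1, i.e. with
    0 in U ∩ T^n U ∩ ... ∩ T^{(k-1)n} U for T x = x + alpha and U = (-eps, eps) mod 1.
    Returns None if no such n <= n_max exists."""
    for n in range(1, n_max + 1):
        if all(dist_to_int(i * n * alpha) < eps for i in range(k)):
            return n
    return None


if __name__ == "__main__":
    golden = (math.sqrt(5) - 1) / 2
    print(multiple_return_time(golden, 4, 0.01))
```

The search must succeed: by Dirichlet's theorem some $n$ has $\|n\alpha\| < \varepsilon/(k-1)$, and then $\|in\alpha\| \le i\|n\alpha\| < \varepsilon$ for $i \le k-1$.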
The existence of this 'commuting' version of the topological Szemerédi theorem suggested to Furstenberg and Katznelson that a similar extension might be possible in the ergodic setting as well.
Theorem 3.8 (Furstenberg and Katznelson 1978) Let $T_1, T_2, \ldots, T_k$ be commuting measure-preserving transformations on a finite measure space $(X, \mathcal{B}, \mu)$. If $A \in \mathcal{B}$ with $\mu(A) > 0$, then
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(T_1^n A \cap T_2^n A \cap \cdots \cap T_k^n A) > 0.$$
In particular, there is an $n > 0$ with $\mu(T_1^n A \cap T_2^n A \cap \cdots \cap T_k^n A) > 0$. This strengthening of the ergodic version leads to a strengthening of the original Szemerédi Theorem. In order to indicate the connection between the combinatorial and ergodic settings, we will prove the following Theorem immediately, assuming the Furstenberg-Katznelson Theorem.
Theorem 3.9 Multidimensional Szemerédi Theorem (Furstenberg and Katznelson 1978) Let $S \subset \mathbb{Z}^r$ be a subset with positive upper density (calculated with respect to some sequence of boxes $[a_n^1, b_n^1] \times \cdots \times [a_n^r, b_n^r]$ with $b_n^i - a_n^i \to \infty$ as $n \to \infty$), and let $F \subset \mathbb{Z}^r$ be any finite configuration. Then there are an integer $d > 0$ and a vector $u \in \mathbb{Z}^r$ such that $u + dF \subset S$ (i.e., $S$ contains a figure similar to $F$).
Proof: Let $S \subset \mathbb{Z}^r$ have positive upper density. Let
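$$\Omega_r = \{0, 1\}^{\mathbb{Z}^r}$$
with the product topology; an element $\omega$ of $\Omega_r$ can be thought of as a 0,1-valued function $\omega(n_1, n_2, \ldots, n_r)$ on $\mathbb{Z}^r$. On $\Omega_r$ we have $r$ commuting transformations $\sigma_1, \sigma_2, \ldots, \sigma_r$, $\sigma_i$ being the shift in the $i$th coordinate: for example
$$(\sigma_1 \omega)(n_1, n_2, \ldots, n_r) = \omega(n_1 + 1, n_2, \ldots, n_r).$$
Let $x_S \in \Omega_r$ be the indicator function of $S$, and let $X$ be the orbit closure of $x_S$ under the transformations $\sigma_1, \ldots, \sigma_r$:
$$X = \operatorname{cl}\{\sigma_1^{n_1}\sigma_2^{n_2}\cdots\sigma_r^{n_r} x_S : (n_1, \ldots, n_r) \in \mathbb{Z}^r\}.$$
Let $A = \{\omega \in X : \omega(0, 0, \ldots, 0) = 1\}$, so that $\sigma_1^{m_1}\sigma_2^{m_2}\cdots\sigma_r^{m_r} x_S \in A$ if and only if $(m_1, m_2, \ldots, m_r) \in S$. We want first to find a Borel probability measure $\mu$ on $X$, invariant under all the $\sigma_i$, such that $\mu(A) > 0$. Since $S \subset \mathbb{Z}^r$ has positive upper density, there are boxes $[a_n^1, b_n^1] \times \cdots \times [a_n^r, b_n^r]$ with each $b_n^i - a_n^i \to \infty$ as $n \to \infty$ such that the density of $S$ calculated with respect to these boxes is positive. This is to say that if for each $n$ we define a measure $\mu_n$ on $\Omega_r$ by
$$\mu_n(f) = \frac{1}{(b_n^1 - a_n^1)(b_n^2 - a_n^2)\cdots(b_n^r - a_n^r)} \sum_{\substack{a_n^i < m_i \le b_n^i \\ i = 1, \ldots, r}} f(\sigma_1^{m_1}\sigma_2^{m_2}\cdots\sigma_r^{m_r} x_S) \qquad (f \in C(\Omega_r)),$$
then
$$\lim_{n \to \infty} \mu_n(A) > 0.$$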
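Let $\mu$ be any weak* limit point of $\{\mu_n : n = 1, 2, \ldots\}$. Then $\mu$ is invariant under each $\sigma_i$ (because the sides of the boxes tend to infinity), and also $\mu(A) > 0$ (because $A$ is both open and closed). Let $F \subset \mathbb{Z}^r$ be any finite configuration; we need to prove that $S$ contains a translate of a dilate of $F$. Since a translate of $F$ lies in $\{0, 1, \ldots, K\}^r$ for some $K$, it is enough to show that given any $K \in \mathbb{Z}^+$, there are $u \in \mathbb{Z}^r$ and $d \in \mathbb{Z}^+$ such that
$$u + d(k_1, k_2, \ldots, k_r) \in S \quad \text{for all } 0 \le k_i \le K.$$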
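Since the transformations
$$\{\sigma_1^{-k_1}\sigma_2^{-k_2}\cdots\sigma_r^{-k_r} : 0 \le k_i \le K,\ i = 1, \ldots, r\}$$
form a commuting family of transformations of $(X, \mu)$ and $\mu(A) > 0$, by the Furstenberg-Katznelson Theorem we can find $d > 0$ with
$$\mu\Big(\bigcap_{0 \le k_1, \ldots, k_r \le K} \sigma_1^{-dk_1}\sigma_2^{-dk_2}\cdots\sigma_r^{-dk_r} A\Big) > 0.$$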
The definition of $\mu = \lim \mu_n$ shows that if $E$ is open and closed with $\mu(E) > 0$, then there is $(u_1, u_2, \ldots, u_r) \in \mathbb{Z}^r$ such that $\sigma_1^{u_1}\sigma_2^{u_2}\cdots\sigma_r^{u_r} x_S \in E$. (In fact there are enough such $u$ that their limiting 'frequency' is $\mu(E) > 0$.) Letting $E$ be the above set of positive measure (a finite intersection of open and closed sets), we see that there is $u \in \mathbb{Z}^r$ such that for each choice of $k_1, \ldots, k_r$ with $0 \le k_i \le K$ for all $i$,
$$\sigma_1^{u_1}\sigma_2^{u_2}\cdots\sigma_r^{u_r} x_S \in \sigma_1^{-dk_1}\sigma_2^{-dk_2}\cdots\sigma_r^{-dk_r} A.$$
This says that
$$\sigma_1^{u_1 + dk_1}\sigma_2^{u_2 + dk_2}\cdots\sigma_r^{u_r + dk_r} x_S \in A,$$
or
$$(u_1 + dk_1, u_2 + dk_2, \ldots, u_r + dk_r) \in S.$$
There are further conjectures along these lines (due to Erdős (1964), who has offered monetary rewards for their solution), some of which may also be amenable to ergodic-theoretic analysis. Perhaps the most important one is the following, a positive solution of which would imply that the sequence of primes contains arbitrarily long arithmetic progressions.
Problem 1: If $\{n_k\}$ is a sequence of positive integers for which
$$\sum_k \frac{1}{n_k} = \infty,$$
then $\{n_k\}$ contains arbitrarily long arithmetic progressions.
Problem 2: If $\omega$ is any infinite sequence on the symbols $1$ and $-1$, then for every $k$ an $n$ and $N$ can be found for which
$$|\omega(n) + \omega(2n) + \cdots + \omega(Nn)| > k.$$
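Problem 2 concerns the sums of $\omega$ along the homogeneous progressions $n, 2n, \ldots, Nn$. The brute-force sketch below (a helper of our own, for finite experiments only; it cannot settle the problem) computes the largest $|\omega(n) + \omega(2n) + \cdots + \omega(Nn)|$ over all $n, N$ with $Nn \le L$, for a sequence $\omega$ supplied as a Python function.

```python
from typing import Callable


def hap_discrepancy(omega: Callable[[int], int], L: int) -> int:
    """max |omega(n) + omega(2n) + ... + omega(N n)| over all n, N with N*n <= L."""
    best = 0
    for n in range(1, L + 1):
        partial = 0
        for m in range(n, L + 1, n):  # m runs through n, 2n, ..., up to L
            partial += omega(m)
            best = max(best, abs(partial))
    return best


def alternating(m: int) -> int:
    """Example sequence (our choice): omega(m) = 1 for odd m, -1 for even m."""
    return 1 if m % 2 else -1


if __name__ == "__main__":
    print(hap_discrepancy(alternating, 100))
```

For the alternating sequence the progression $2, 4, 6, \ldots$ already produces sums of size $L/2$; Problem 2 asserts that no $\pm 1$ sequence can keep all these sums bounded.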
While in Problem 2 less is required than in van der Waerden's Theorem (we do not ask that either $\{n : \omega(n) = 1\}$ or $\{n : \omega(n) = -1\}$ contain arbitrarily long arithmetic progressions, but seek only progressions along which either the 1s or the $-1$s predominate), at the same time the progressions considered are restricted, in that translation is not allowed.
In the following subsection we will prove the topological Szemerédi Theorem and its consequence, van der Waerden's Theorem. The same approach yields a dynamical proof of Hindman's Theorem. Next, we will prove Furstenberg's ergodic Szemerédi Theorem for weakly mixing transformations (i.e., weak mixing implies weak mixing of all orders). Only then, once the basics and the direction are clear, will we sketch the proof of the Furstenberg-Katznelson Theorem. (For the details of the proof, and much more information about recurrence, see Furstenberg 1981.) We should also remark that equally important as establishing a proof of the Szemerédi Theorem is the development of the new ergodic-theoretic ideas which the process entails. Furstenberg has contributed new insights which can be applied to many other problems in ergodic theory besides this one.
B. Topological multiple recurrence, van der Waerden's Theorem, and Hindman's Theorem
The fundamental recurrence result that will be generalized and applied in this section is the following one.
Theorem 3.10 (Birkhoff 1927) Let $T: X \to X$ be a homeomorphism of a compact metric space $X$. Then there is a point $x \in X$ which is recurrent under $T$: that is, there are $n_k \to \infty$ with $T^{n_k} x \to x$.
Proof: By Zorn's Lemma, there are minimal sets in $X$ (nonempty closed invariant sets $M$, each point of which has an orbit dense in $M$). Every point of a minimal set is recurrent, by Gottschalk's Theorem (1.2).
We will say that a pair $(X, T)$, where $X$ is compact metric and $T: X \to X$ is a homeomorphism, is homogeneous if there is a group $G$ of homeomorphisms of $X$ commuting with $T$ such that $(X, G)$ is a minimal transformation group. A closed subset $A \subset X$ is called homogeneous in $(X, T)$ if there is a group $G$ of homeomorphisms of $X$ commuting with $T$ such that $GA = A$ and $(A, G)$ is a minimal transformation group.
Proposition 3.11 Let $T: X \to X$ be a homeomorphism of a compact metric space $X$ and $A \subset X$ a closed homogeneous subset. Suppose that for each $\varepsilon > 0$ there are $x, y \in A$ and $n \ge 1$ with $d(T^n x, y) < \varepsilon$. Then for each $\varepsilon > 0$ there are $z \in A$ and $n \ge 1$ with $d(T^n z, z) < \varepsilon$. (Notice that $A$ need not be $T$-invariant.)
Proof (Bowen): We will prove first that the point $y \in A$ mentioned in the hypotheses can be taken to be arbitrary. Suppose then that $(A, G)$ is minimal and $\varepsilon > 0$. We can find $g_1, \ldots, g_p \in G$ with
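$$(1) \qquad \min_{1 \le i \le p} d(g_i x, y) < \frac{\varepsilon}{2} \quad \text{for all } x, y \in A:$$
for, cover $A$ by finitely many nonempty open sets $V_j$ of diameter less than $\varepsilon/2$. Then for each $j$, $\{g^{-1} V_j : g \in G\}$ is an open cover of $A$ (by minimality of $(A, G)$), and hence has a finite subcover $\{g_{i,j}^{-1} V_j\}_i$. Then given any $x, y \in A$, $y \in V_j$ for some $j$ and $x \in g_{i,j}^{-1} V_j$ for some $i$ and the same $j$. Consequently
$$d(g_{i,j} x, y) < \frac{\varepsilon}{2},$$
proving (1) (take the $g_i$ to be all the $g_{i,j}$). Now choose $\delta > 0$ small enough that $d(x, x') < \delta$ implies $d(g_i x, g_i x') < \varepsilon/2$ for all $i$ (note that the $g_i$ are fixed to satisfy (1)). By hypothesis, there are $x_0, y_0 \in A$ and $n_0 \ge 1$ such that
$$d(T^{n_0} x_0, y_0) < \delta.$$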
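Then for each $i$
$$d(g_i T^{n_0} x_0, g_i y_0) = d(T^{n_0} g_i x_0, g_i y_0) < \frac{\varepsilon}{2},$$
and hence, since (1) allows us to choose $i$ so that $d(g_i y_0, y) < \varepsilon/2$,
$$\min_i d(T^{n_0} g_i x_0, y) < \varepsilon \quad \text{for each } y \in A.$$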
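This proves that for each $y \in A$ there are $x \in A$ (namely one of the points $g_i x_0$, which lie in $A$ because $GA = A$) and $n \ge 1$ such that $d(T^n x, y) < \varepsilon$. Now fix an arbitrary $z_0 \in A$ and choose $z_1 \in A$ and $n_1 \ge 1$ with
$$(2) \qquad d(T^{n_1} z_1, z_0) < \frac{\varepsilon}{2}.$$
Choose $z_2 \in A$ and $n_2 \ge 1$ with $T^{n_2} z_2$ very close to $z_1$ and $T^{n_1 + n_2} z_2$ very close to $z_0$: that is, choose $z_2$ and $n_2$ so that
$$d(T^{n_2} z_2, z_1) < \varepsilon_2,$$
where $\varepsilon_2 < \varepsilon/2$ is chosen small enough to guarantee that (2) still holds when $z_1$ is replaced by $T^{n_2} z_2$. Continue in this manner. If $z_0, z_1, \ldots, z_r \in A$, $n_1, n_2, \ldots, n_r \in \mathbb{N}$ and $\varepsilon_2, \ldots, \varepsilon_r \in (0, \varepsilon/2)$ have already been chosen so that
$$(3) \qquad d(T^{n_j + n_{j-1} + \cdots + n_{i+1}} z_j, z_i) < \frac{\varepsilon}{2}, \qquad 0 \le i < j \le r,$$
find $\varepsilon_{r+1} < \varepsilon/2$ so that (3) still holds when $z_r$ is replaced by any point at distance less than $\varepsilon_{r+1}$ from it. Then choose $z_{r+1} \in A$ and $n_{r+1} \in \mathbb{N}$ with
$$d(T^{n_{r+1}} z_{r+1}, z_r) < \varepsilon_{r+1}.$$
We have that $i < j$ implies
$$d(T^{n_j + n_{j-1} + \cdots + n_{i+1}} z_j, z_i) < \frac{\varepsilon}{2}.$$
Simply by compactness of $A$, there are $i, j$ with $i < j$ and
$$d(z_i, z_j) < \frac{\varepsilon}{2}.$$
Choosing $n = n_j + n_{j-1} + \cdots + n_{i+1}$, we have $d(T^n z_j, z_j) \le d(T^n z_j, z_i) + d(z_i, z_j) < \varepsilon$.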
Proposition 3.12 Under the hypotheses of the preceding Proposition, there is $x \in A$ which is recurrent under $T$.
Proof: We will use a category argument. For each $n = 1, 2, \ldots$, let
$$E_n = \Big\{x \in A : \inf_{k \ge 1} d(T^k x, x) \ge \frac{1}{n}\Big\}.$$
If $A$ contains no points recurrent under $T$, then
$$A = \bigcup_{n=1}^{\infty} E_n.$$
We will show that the interior $E_n^\circ$ (relative to $A$) of each of the (closed) sets $E_n$ is empty, and this will contradict the Baire Category Theorem. If $E_n^\circ \ne \emptyset$ for some $n$, then since $(A, G)$ is minimal we have $A = \bigcup_{g \in G} g^{-1} E_n^\circ$ and, by compactness, $A = g_1^{-1} E_n^\circ \cup \cdots \cup g_p^{-1} E_n^\circ$ for some $g_1, \ldots, g_p \in G$. Choose $\delta > 0$ such that $d(x, x') < \delta$ implies $d(g_i x, g_i x') < 1/n$ for all $i$. We claim that if $x \in g_j^{-1} E_n^\circ$, then $\inf_k d(T^k x, x) \ge \delta$. This is so because if there were a $k$ such that $d(T^k x, x) < \delta$, then
$$d(T^k g_j x, g_j x) = d(g_j T^k x, g_j x) < \frac{1}{n},$$
or
$$d(T^k y, y) < \frac{1}{n} \quad \text{for some } y \in E_n^\circ,$$
namely $y = g_j x$, which is impossible. Since each $x \in A$ is in $g_j^{-1} E_n^\circ$ for some $j$, we have proved that
$$\inf_k d(T^k x, x) \ge \delta \quad \text{for all } x \in A.$$
This conclusion, however, contradicts the preceding Proposition.
Proof 3.13 Proof of the Furstenberg-Weiss Theorem Recall that $T_1, \ldots, T_k$ are commuting homeomorphisms of the compact metric space $X$, and we are to find $x \in X$ such that for each $\varepsilon > 0$ there is $n \ge 1$ with $d(T_i^n x, x) < \varepsilon$ for $i = 1, \ldots, k$.
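We verify the common hypothesis of the two preceding Propositions. We need to show that for each $\varepsilon > 0$ there are $x^*, y^* \in A$ and $n \in \mathbb{N}$ such that $d(T^n x^*, y^*) < \varepsilon$. Here we use the induction hypothesis. Let
$$R_i = T_i T_k^{-1} \quad \text{for } i = 1, \ldots, k-1,$$
and find $x \in X$ and $n_m \to \infty$ such that
$$R_i^{n_m} x \to x \quad \text{for } i = 1, \ldots, k-1.$$
Suppose $\varepsilon > 0$, and let
$$y^* = (x, x, \ldots, x) \quad \text{and} \quad x^* = (T_k^{-n_m} x, T_k^{-n_m} x, \ldots, T_k^{-n_m} x).$$
Then
$$d(T^{n_m} x^*, y^*) = d\big((T_1^{n_m} T_k^{-n_m} x, \ldots, T_{k-1}^{n_m} T_k^{-n_m} x, x), (x, x, \ldots, x)\big) = d\big((R_1^{n_m} x, \ldots, R_{k-1}^{n_m} x, x), (x, x, \ldots, x)\big),$$
which can be made less than $\varepsilon$ by an appropriate choice of $m$. By Proposition 3.12, then, there is a point $(x, x, \ldots, x) \in \Delta$ which is recurrent under $T_1 \times T_2 \times \cdots \times T_k$. This is exactly the conclusion of the Theorem.
Proof 3.14 Proof of van der Waerden's Theorem Suppose that we are given a partition $\mathbb{N} = C_1 \cup C_2 \cup \cdots \cup C_r$ (disjoint).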
We form the sequence space $\Omega = A^{\mathbb{Z}}$ on the alphabet $A = \{1, 2, \ldots, r\}$. $A$ has the discrete topology, and $\Omega$ has the product topology. $\Omega$ may be given the metric $d(\omega_1, \omega_2) = 1/2^k$, where $k = \inf\{|n| : \omega_1(n) \ne \omega_2(n)\}$. $\sigma: \Omega \to \Omega$ is the shift, defined by $\sigma\omega(n) = \omega(n + 1)$. Define $\omega_0 \in \Omega$ by
$$\omega_0(n) = \begin{cases} j & \text{if } n \ge 1 \text{ and } n \in C_j, \\ 1 & \text{if } n \le 0. \end{cases}$$
Let $X$ be the set of limit points of $\{\sigma^n \omega_0 : n \ge 1\}$. Then $X$ is closed and invariant under $\sigma$. Given $k \ge 1$, let $T_i = \sigma^i$ for $i = 1, 2, \ldots, k$. Applying the Furstenberg-Weiss Theorem with $\varepsilon = 1$, we find $x \in X$ and $n \ge 1$ with
$$d(T_i^n x, x) < 1 \quad \text{for all } i = 1, 2, \ldots, k.$$
This implies that $x, \sigma^n x, \sigma^{2n} x, \ldots, \sigma^{kn} x$ all agree in the 0th coordinate, so that
$$x(0) = x(n) = x(2n) = \cdots = x(kn).$$
Now, since $x$ is a limit point of $\{\sigma^p \omega_0\}$, the central $(2kn + 1)$-block of $x$ appears in $\omega_0$ centered at the $m$th place for arbitrarily large $m$; take such an $m \ge 1$. Then
$$\omega_0(m) = \omega_0(m + n) = \omega_0(m + 2n) = \cdots = \omega_0(m + kn).$$
Therefore $C_{\omega_0(m)}$ contains an arithmetic progression of length $k + 1$. We have shown that for every $k$ there is a $j$ such that $C_j$ contains an arithmetic progression of length $k$. Of course then there must be a single $j$ such that $C_j$ contains arbitrarily long arithmetic progressions.
Remark 3.15 For this application, the '$\varepsilon$ first' version of the Furstenberg-Weiss Theorem would have sufficed: for each $\varepsilon > 0$, there are $x \in X$ and $n \ge 1$ with $d(T_i^n x, x) < \varepsilon$ for all $i = 1, 2, \ldots, k$. The complete stronger version, based on Proposition 3.12, was not needed.
The topological multiple recurrence theorem leads to a higher-dimensional version of van der Waerden's Theorem. We leave the proof as an exercise.
Theorem 3.16 Grünwald's Theorem (see Rado 1943) For any finite partition $\mathbb{N}^m = C_1 \cup \cdots \cup C_r$ and any $K = 1, 2, \ldots$, there are $j$, $d \in \mathbb{N}$, and $b \in \mathbb{N}^m$ such that
$$b + d(k_1, k_2, \ldots, k_m) \in C_j \quad \text{for } 1 \le k_i \le K,\ 1 \le i \le m.$$
Thus for any finite configuration $F \subset \mathbb{N}^m$ some $C_j$ contains a figure similar to $F$ (i.e., a translate of a dilate of $F$).
Graham and Rothschild (1971) conjectured that van der Waerden's Theorem could be extended to apply also to certain infinite configurations, and this was proved by N. Hindman.
Theorem 3.17 Hindman's Theorem (1974) If
$$\mathbb{N} = C_1 \cup C_2 \cup \cdots \cup C_r \quad \text{(disjoint)},$$
then there is a $j = 1, \ldots, r$ such that $C_j$ contains a sequence $p_1, p_2, \ldots$ for which all finite sums $p_{i_1} + p_{i_2} + \cdots + p_{i_n}$ ($i_1 < i_2 < \cdots < i_n$, $n = 1, 2, \ldots$) also belong to $C_j$.
A subset $C \subset \mathbb{N}$ having this property, i.e., containing a sequence $p_1, p_2, \ldots$ for which
$$p_{i_1} + p_{i_2} + \cdots + p_{i_n} \in C \quad \text{whenever } i_1 < i_2 < \cdots < i_n,$$
is said to contain an IP sequence. The terminology arises from the fact that the set of all such sums forms an 'infinite-dimensional parallelepiped'
$$\{p_1\} \cup \{p_2, p_1 + p_2\} \cup \{p_3, p_1 + p_3, p_2 + p_3, p_1 + p_2 + p_3\} \cup \cdots$$
(each set consists of $p_n$ together with the translate by $p_n$ of the union of the preceding ones). Thus Hindman's Theorem says that for any finite partition of the positive integers, at least one of the partitioning sets contains an infinite-dimensional parallelepiped. Earlier, Hilbert (1892) had proved that for any finite partition of the positive integers and any $n \ge 1$, at least one of the partitioning sets has to contain infinitely many translates of a parallelepiped of dimension $n$ (i.e., one for which the above union terminates after $n$ sets). Furstenberg and Weiss showed how to base the proof of Hindman's Theorem on the notion of proximality in topological dynamics. Recall that if $X$ is a compact metric space and $T: X \to X$ is a continuous map, then two points $x_1, x_2 \in X$ are said to be proximal in the system $(X, T)$ if there is a sequence $\{n_k\}$ of positive integers with $d(T^{n_k} x_1, T^{n_k} x_2) \to 0$. (If $T$ were a homeomorphism, this would be the definition of positive proximality.) Recall also that the Ellis semigroup $E(X, T)$ is the closure in the product topology of $\{T^n : n \ge 0\}$ in $X^X$.
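The parallelepiped description of the finite sums translates directly into a short computation. In the sketch below (a hypothetical helper, for finite sequences only), each new term $p$ contributes $p$ itself together with $p$ plus every sum already found.

```python
def finite_sums(ps):
    """All sums p_{i1} + ... + p_{in} (i1 < ... < in, n >= 1) of a finite sequence ps,
    built one 'layer' of the parallelepiped at a time."""
    sums = set()
    for p in ps:
        # the right-hand side is evaluated with the sums found before p
        sums = sums | {p} | {p + s for s in sums}
    return sums


if __name__ == "__main__":
    print(sorted(finite_sums([1, 10, 100])))
```

With $p_i = 2^{i-1}$, $i = 1, \ldots, n$, the finite sums are exactly the integers $1, \ldots, 2^n - 1$, which is binary expansion in disguise.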
Proposition 3.18 Let $X$ be a compact metric space, $T: X \to X$ a continuous map, $x_0 \in X$, $L(x_0)$ the set of all limit points of $\{T^n x_0 : n \ge 0\}$, and $M \subset L(x_0)$ a minimal set. Then there is a point $m_0 \in M$ such that $x_0$ and $m_0$ are proximal.
Proof: We have $E(X, T) x_0 \supset L(x_0)$. Thus if $F = \{p \in E : p x_0 \in M\}$, then $F x_0 = M$, $F$ is closed, and $F^2 \subset EF \subset F$. By Proposition 2.16, $F$ contains an idempotent $u$. Then $u x_0 \in M$, and it is proximal to $x_0$, since $u(u x_0) = u x_0$.
Proof 3.19 Proof of Hindman's Theorem Given a partition $\mathbb{N} = C_1 \cup C_2 \cup \cdots \cup C_r$, again form the (compact metric) sequence space $\Omega = A^{\mathbb{Z}}$ on the alphabet $A = \{1, 2, \ldots, r\}$, and define
$$\omega_0(n) = \begin{cases} j & \text{if } n \ge 1 \text{ and } n \in C_j, \\ 1 & \text{if } n \le 0. \end{cases}$$
Let $L$ be the set of limit points of $\{\sigma^n \omega_0 : n \ge 0\}$. Choose a minimal set $M \subset L$ and a point $m_0 \in M$ which is proximal to $\omega_0$. Recall that two points $\omega_1$ and $\omega_2$ of $\Omega$ are close together if they agree on a long central block: $\omega_1(j) = \omega_2(j)$ for $|j| \le k$ (some large $k$). Also, by Gottschalk's Theorem (Theorem 1.2), $m_0$ is almost periodic; in this context, this means that each block in $m_0$ reappears with bounded gap as one moves out along the sequence $m_0$. Let $m_0(0) = j_0$; we will show that
$$C_{j_0} = \{n \ge 1 : \omega_0(n) = j_0\}$$
contains an IP sequence $\{p_1, p_2, \ldots\}$. This is possible because the symbol $j_0$ recurs in the sequence $m_0$ with bounded gap, while the sequence $\omega_0$ agrees with $m_0$ on arbitrarily long blocks; thus the occurrences of $j_0$ in $\omega_0$ can be controlled. More precisely, we may define the sequence $\{p_1, p_2, \ldots\}$ inductively as follows. Let $B_0$ be the block $j_0$, and find $p_1 \ge 1$ such that $B_0$ appears at the $p_1$th place in both $\omega_0$ and $m_0$:
$$\omega_0(p_1) = m_0(p_1) = B_0 = j_0.$$
Then define $B_1 = m_0(0) m_0(1) \cdots m_0(p_1)$. Since any sufficiently long block in $m_0$ contains $B_1$, there is $p_2 \ge 1$ such that $B_1$ appears at the $p_2$th place in both $\omega_0$ and $m_0$:
$$\omega_0(p_2)\omega_0(p_2 + 1)\cdots\omega_0(p_2 + p_1) = m_0(p_2) m_0(p_2 + 1) \cdots m_0(p_2 + p_1) = B_1 = m_0(0) m_0(1) \cdots m_0(p_1).$$
In particular,
$$\omega_0(p_2 + p_1) = m_0(p_2 + p_1) = m_0(p_1) = j_0, \qquad \omega_0(p_2) = m_0(p_2) = m_0(0) = j_0,$$
so that $p_1, p_2, p_1 + p_2 \in C_{j_0}$. If $p_1, p_2, \ldots, p_n$ have been selected with
$$(4) \qquad \omega_0(p_{i_1} + p_{i_2} + \cdots + p_{i_s}) = m_0(p_{i_1} + p_{i_2} + \cdots + p_{i_s}) = j_0$$
for any choice of $1 \le i_1 < i_2 < \cdots < i_s \le n$, let
$$B_n = m_0(0) m_0(1) \cdots m_0(p_1 + p_2 + \cdots + p_n),$$
and find a $p_{n+1} > 0$ such that $B_n$ appears at the $p_{n+1}$th place in both $\omega_0$ and $m_0$. Then
$$\omega_0(p_{n+1})\omega_0(p_{n+1} + 1)\cdots\omega_0(p_{n+1} + (p_1 + \cdots + p_n)) = m_0(p_{n+1}) m_0(p_{n+1} + 1) \cdots m_0(p_{n+1} + (p_1 + \cdots + p_n)) = m_0(0) m_0(1) \cdots m_0(p_1 + \cdots + p_n).$$
This implies that $p_{i_1} + p_{i_2} + \cdots + p_{i_s} \in C_{j_0}$ for any choice of $1 \le i_1 < i_2 < \cdots < i_s \le n + 1$, and that (4) holds with $n + 1$ in place of $n$. In Section 4.4 this result will be used in the proof of the Jewett-Krieger Theorem on the representation of arbitrary ergodic transformations by uniquely ergodic subsystems of $(\Omega, \sigma)$.
C. Weak mixing implies weak mixing of all orders along multiples
We will prove in this section that if $T: X \to X$ is a weakly mixing m.p.t. on a probability space $(X, \mathcal{B}, \mu)$, and $A_1, A_2, \ldots, A_k \in \mathcal{B}$, then
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \big|\mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k) - \mu(A_1)\mu(A_2)\cdots\mu(A_k)\big| = 0.$$
Recall that an ergodic m.p.t. $T: X \to X$ is weakly mixing if and only if $T \times S$ is ergodic on $X \times Y$ for each ergodic $(Y, \mathcal{C}, \nu, S)$. This has the following immediate consequence.
Proposition 3.20 If $T: X \to X$ is weakly mixing, then $T_k = T \times T^2 \times \cdots \times T^k$ is ergodic on $X^k$ for each $k \ge 1$.
Proof: Each $(X, T^i)$ is weakly mixing. (Consider the characterization of weak mixing involving sets $J \subset \mathbb{Z}^+$ of density 0.) Since $(X, T)$ is ergodic and $(X, T^2)$ is weakly mixing, $(X \times X, T \times T^2)$ is ergodic. Since $(X, T^3)$ is weakly mixing, $(X \times X \times X, T \times T^2 \times T^3)$ is ergodic. We may continue in this manner for as many steps as needed, thereby establishing the result for all $k$.
At the heart of the ergodic Szemerédi Theorem for weakly mixing transformations is a relationship between measures that generalizes the one between point measures and the given invariant ergodic measure $\mu$: averages of translates of one approximate the other.
Definition 3.21 Let $T: X \to X$ be an ergodic m.p.t. on a finite measure space $(X, \mathcal{B}, \mu)$ and let $\nu$ be another measure on $(X, \mathcal{B})$. Let $\mathcal{A} \subset L^\infty(X, \mu)$ be a $T$-invariant (i.e., $T\mathcal{A} = \mathcal{A}$) self-conjugate algebra of bounded measurable functions on $X$, and let $\{M_k, N_k\}$ be a sequence of pairs of integers with $N_k - M_k \to \infty$. We say that $\nu$ is generic for $\mu$ with respect to $\mathcal{A}$ and $\{M_k, N_k\}$ if
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int f(T^n x)\, d\nu(x) \to \int f\, d\mu \quad \text{for all } f \in \mathcal{A}.$$
Remarks 3.22 (1) $\mu$ is always generic for $\mu$.
(2) If $\mathcal{A}$ is countably generated (with respect to $\mu$), then for almost all $x$, $\delta_x$ (the point mass at $x$) is generic for $\mu$ with respect to $\mathcal{A}$ and $\{0, k\}$. This follows from the Ergodic Theorem.
(3) Let $T$ be ergodic, $X^2 = X \times X$, $\mathcal{B}^2$ = the product $\sigma$-algebra, and $T_2 = T \times T^2$. Then the diagonal measure $\nu_2$ on $(X^2, \mathcal{B}^2)$ defined by
$$\int f(x_1, x_2)\, d\nu_2(x_1, x_2) = \int f(x, x)\, d\mu(x) \qquad (f \in L^\infty(X^2, \mathcal{B}^2))$$
is generic for $\mu^2 = \mu \times \mu$ with respect to any sequence $\{M_k, N_k\}$ for which $N_k - M_k \to \infty$ and the algebra
$$\mathcal{A}_2 = \Big\{\sum_{i=1}^{n} f_i(x_1) g_i(x_2) : f_i, g_i \in L^\infty(X, \mathcal{B}, \mu)\Big\}.$$
(Functions of the above form will be abbreviated $\sum f_i \otimes g_i$.)
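Proof: By the Mean Ergodic Theorem, if $f \in L^2(X, \mathcal{B}, \mu)$, then
$$\frac{1}{N_k - M_k}\big(Tf + T^2 f + \cdots + T^{N_k - M_k} f\big) \to \int f\, d\mu \quad \text{in } L^2.$$
Using the invariance of $\mu$, we see that
$$\frac{1}{N_k - M_k} \sum_{j=M_k+1}^{N_k} T^j f \to \int f\, d\mu \quad \text{in } L^2.$$
(For the second average is $T^{M_k}$ applied to the first, and replacing $x$ by $T^{M_k} x$ does not change $L^2$ norms.) This implies that if $f, g \in L^2$, then
$$\frac{1}{N_k - M_k} \sum_{j=M_k+1}^{N_k} \int g \cdot T^j f\, d\mu \to \int f\, d\mu \cdot \int g\, d\mu.$$
Changing variables in each integral gives
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int g(T^n x) f(T^{2n} x)\, d\mu(x) \to \int f\, d\mu \cdot \int g\, d\mu,$$
or
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int (g \otimes f)(T_2^n(x_1, x_2))\, d\nu_2(x_1, x_2) \to \int g \otimes f\, d\mu^2.$$
By linearity the same holds for every function in $\mathcal{A}_2$.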
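Remark 3.22 (3) can be checked in closed form for a rotation. Assume (our choice) $Tx = x + \alpha \bmod 1$ with $\alpha$ irrational and Lebesgue measure, and take $g(x) = e^{2\pi i x}$, $f(x) = e^{-2\pi i x}$. Then $\int g(T^n x) f(T^{2n} x)\, dx = e^{-2\pi i n\alpha}$, while $\int f\, d\mu \int g\, d\mu = 0$; the averages are geometric sums and are bounded by $1/((N - M)|\sin \pi\alpha|)$ in absolute value. The sketch below (hypothetical helper name) computes them directly.

```python
import cmath
import math


def diagonal_average(alpha: float, M: int, N: int) -> complex:
    """(1/(N-M)) * sum_{n=M+1}^{N} of the integral of g(T^n x) f(T^{2n} x) dx
    for T x = x + alpha, g(x) = exp(2 pi i x), f(x) = exp(-2 pi i x);
    each integral equals exp(-2 pi i n alpha) exactly, so no quadrature is needed."""
    total = sum(cmath.exp(-2j * math.pi * n * alpha) for n in range(M + 1, N + 1))
    return total / (N - M)


if __name__ == "__main__":
    alpha = math.sqrt(2) % 1.0
    for M, N in [(0, 100), (0, 10_000), (10 ** 6, 10 ** 6 + 10_000)]:
        avg = diagonal_average(alpha, M, N)
        bound = 1 / ((N - M) * abs(math.sin(math.pi * alpha)))
        print(M, N, abs(avg), bound)
```

The bound does not depend on $M$, which reflects the uniformity in $\{M_k, N_k\}$ built into the notion of a generic measure.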
We want to extend Remark 3.22 (3) from the case of $(X^2, \mathcal{B}^2, T_2)$ to $(X^k, \mathcal{B}^k, T_k)$ ($T_k = T \times T^2 \times \cdots \times T^k$) for any $k \ge 2$, in case $T$ is weakly mixing; weak mixing of all orders along multiples will then follow easily. The argument is similar to the one just given for the case $k = 2$, and it depends on an extension of the Mean Ergodic Theorem.
Theorem 3.23 If $(X, \mathcal{B}, \mu, T)$ is ergodic and $\nu$ is generic for $\mu$ with respect to $\{M_k, N_k\}$ and an algebra $\mathcal{A}$, then for each $f \in \mathcal{A}$,
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} T^n f \to \int f\, d\mu \quad \text{in } L^2(\nu).$$
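(In case $\nu = \mu$, this is the ordinary Mean Ergodic Theorem.)
Proof: Let $f \in \mathcal{A}$; we assume without loss of generality that $\int f\, d\mu = 0$. Given $\varepsilon > 0$, choose $Q$ so large that
$$\int \Big|\frac{Tf + T^2 f + \cdots + T^Q f}{Q}\Big|^2 d\mu < \varepsilon.$$
Since
$$g = \Big|\frac{Tf + T^2 f + \cdots + T^Q f}{Q}\Big|^2 \in \mathcal{A},$$
by applying the definition of 'generic' to $g$ (note that $T^n g = |(T^{n+1} f + \cdots + T^{n+Q} f)/Q|^2$), we see that for large $k$
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int \Big|\frac{T^{n+1} f + T^{n+2} f + \cdots + T^{n+Q} f}{Q}\Big|^2 d\nu < \varepsilon.$$
By the Schwarz Inequality ($|\frac{1}{m}\sum_{i=1}^{m} a_i|^2 \le \frac{1}{m}\sum_{i=1}^{m} |a_i|^2$), it follows that also
$$\int |A_{Q,k}|^2\, d\nu < \varepsilon, \qquad \text{where } A_{Q,k} = \frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \frac{T^{n+1} f + \cdots + T^{n+Q} f}{Q}.$$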
But $A_{Q,k}$ is not far from
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} T^n f;$$
in fact, the difference is (suppressing the $k$ subscripts for the moment)
$$\frac{1}{N - M} \sum_{n=M+1}^{N} \frac{1}{Q} \sum_{j=1}^{Q} T^{n+j} f - \frac{1}{N - M} \sum_{n=M+1}^{N} T^n f = \frac{1}{Q} \sum_{j=1}^{Q} \frac{1}{N - M} \sum_{n=M+1}^{N} (T^{n+j} f - T^n f) = \frac{1}{Q} \sum_{j=1}^{Q} \frac{1}{N - M} \Big(\sum_{n=N+1}^{N+j} T^n f - \sum_{n=M+1}^{M+j} T^n f\Big),$$
which has $L^2(\nu)$ norm less than or equal to
$$\frac{2Q \|f\|_\infty}{N_k - M_k},$$
and this tends to 0 as $k \to \infty$. We are able to conclude, then, that
$$\limsup_{k \to \infty} \int \Big|\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} T^n f\Big|^2 d\nu \le \varepsilon,$$
as required.
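The estimate on the difference between $A_{Q,k}$ and the plain average uses nothing about $T$: for any bounded sequence $u_n$ (think of $u_n = T^n f$ evaluated at a fixed point), the telescoping computation gives a difference of absolute value at most $(Q + 1)\sup|u|/(N - M) \le 2Q\sup|u|/(N - M)$. The sketch below (hypothetical helper; the test sequence $\sin(n^2)$ is our own choice) checks this numerically.

```python
import math


def smoothing_gap(u, M, N, Q):
    """|A - B| with A = (1/(N-M)) sum_{n=M+1}^{N} (u[n+1] + ... + u[n+Q]) / Q
    and B = (1/(N-M)) sum_{n=M+1}^{N} u[n]; requires len(u) > N + Q."""
    length = N - M
    A = sum(sum(u[n + j] for j in range(1, Q + 1)) / Q for n in range(M + 1, N + 1)) / length
    B = sum(u[n] for n in range(M + 1, N + 1)) / length
    return abs(A - B)


if __name__ == "__main__":
    M, N, Q = 100, 1100, 20
    u = [math.sin(n * n) for n in range(N + Q + 1)]  # a bounded sequence, |u_n| <= 1
    print(smoothing_gap(u, M, N, Q), (Q + 1) / (N - M))
```

Only the $2Q$ boundary terms survive the telescoping, which is why the bound shrinks like $1/(N - M)$ for fixed $Q$.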
as required. Theorem 3.24 If (X, 61, p, T) is weakly mixing, then for each k 2 the diagonal measure vk on (Xk , ak) (Xk = xk, = k-fold product
a-algebra of ge) defined by
f
f(x i ,
x2 , ... , xk )dvk --,--- f(x, x, ... , x)dp(x)
(f EVD (X k , ,41,))
x Tk, is generic for pk =pxpx x p with respect to Tk = T x T2 x any sequence {Mk , NO with Nk — Mk —) 00, and the algebra dk of all functions of the form
E i=
f
• • • f(x 1 , • • • xk )
E fii(xdfAx2)...fAxd (each f E /MX»
That is,
isr — E ff,(x)f2 (Tnx)...mrk - nnx) dp(x) N n=
dp ffk d p
for f 1 , , fk e 12(X). Proof • The proof is by induction on k. By Remark 3.22 (3), the statement holds for k = 2. Assume now that the Theorem is true for k, and we will prove it for k + 1. Since ak , pk , Tk ) is ergodic (Proposition 3.20), Theorem 3.23 implies that, given f L/X), i = 1, , k, 1 Nk
—
Nk
M k n=Mk..4. 1
ff.
--.0fkctuk
in L2(v
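This says that
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} T^n f_1 \otimes T^{2n} f_2 \otimes \cdots \otimes T^{kn} f_k \to \prod_{i=1}^{k} \int f_i\, d\mu \quad \text{in } L^2(\nu_k),$$
or (in light of the definition of $\nu_k$)
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} f_1(T^n x) f_2(T^{2n} x) \cdots f_k(T^{kn} x) \to \prod_{i=1}^{k} \int f_i\, d\mu \quad \text{in } L^2(\mu).$$
Multiply by $f_0(x)$ and integrate to find that
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int f_0(x) f_1(T^n x) \cdots f_k(T^{kn} x)\, d\mu \to \prod_{i=0}^{k} \int f_i\, d\mu.$$
Changing variables in each integral ($x \to T^n x$) gives
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int f_0(T^n x) f_1(T^{2n} x) \cdots f_k(T^{(k+1)n} x)\, d\mu \to \prod_{i=0}^{k} \int f_i\, d\mu,$$
i.e.,
$$\frac{1}{N_k - M_k} \sum_{n=M_k+1}^{N_k} \int T_{k+1}^n (f_0 \otimes f_1 \otimes \cdots \otimes f_k)\, d\nu_{k+1} \to \int f_0 \otimes f_1 \otimes \cdots \otimes f_k\, d\mu^{k+1}.$$
By linearity this holds for every function in $\mathcal{A}_{k+1}$.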
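Corollary 3.25 If $(X, \mathcal{B}, \mu, T)$ is weakly mixing and $A_1, A_2, \ldots, A_k \in \mathcal{B}$, then
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k) = \mu(A_1)\mu(A_2)\cdots\mu(A_k).$$
Proof: Apply the Theorem to $f_i = \chi_{A_i}$, $i = 1, \ldots, k$.
Corollary 3.26 If $(X, \mathcal{B}, \mu, T)$ is weakly mixing and $A_1, A_2, \ldots, A_k \in \mathcal{B}$, then
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \big|\mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k) - \mu(A_1)\mu(A_2)\cdots\mu(A_k)\big| = 0.$$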
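Proof: Apply Corollary 3.25 to the weakly mixing transformation $T \times T$ on $X \times X$ and the sets $A_1 \times A_1, \ldots, A_k \times A_k$; we obtain
$$\frac{1}{N-M} \sum_{n=M+1}^{N} (\mu \times \mu)\big[(A_1 \times A_1) \cap (T \times T)^n (A_2 \times A_2) \cap \cdots \cap (T \times T)^{(k-1)n} (A_k \times A_k)\big] \to [\mu(A_1)\mu(A_2)\cdots\mu(A_k)]^2,$$
i.e.,
$$\frac{1}{N-M} \sum_{n=M+1}^{N} [\mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k)]^2 \to [\mu(A_1)\mu(A_2)\cdots\mu(A_k)]^2.$$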
If we let
$$a_n = \mu(A_1 \cap T^n A_2 \cap \cdots \cap T^{(k-1)n} A_k) \quad \text{and} \quad \alpha = \mu(A_1)\mu(A_2)\cdots\mu(A_k),$$
then
$$\frac{1}{N-M} \sum_{n=M+1}^{N} (a_n - \alpha)^2 = \frac{1}{N-M} \sum_{n=M+1}^{N} a_n^2 - 2\alpha \frac{1}{N-M} \sum_{n=M+1}^{N} a_n + \alpha^2 \to 0,$$
since the first term tends to $\alpha^2$ and the second to $-2\alpha^2$. By the Schwarz Inequality, then also
$$\frac{1}{N-M} \sum_{n=M+1}^{N} |a_n - \alpha| \le \Big(\frac{1}{N-M} \sum_{n=M+1}^{N} |a_n - \alpha|^2\Big)^{1/2} \to 0.$$
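Remark 3.27 In connection with Corollary 3.25, notice that $(X, \mathcal{B}, \mu, T)$ is weakly mixing if and only if
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n} B \cap T^{-2n} C) = \mu(A)\mu(B)\mu(C) \quad \text{for all } A, B, C \in \mathcal{B}.$$
For this statement implies, as usual, that
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int f(x) g(T^n x) h(T^{2n} x) \, d\mu = \int f \, d\mu \cdot \int g \, d\mu \cdot \int h \, d\mu \quad \text{for all } f, g, h \in L^\infty(X).$$
But if $T$ is not weakly mixing, then it has a nontrivial eigenfunction $\phi$ with eigenvalue $\lambda \ne 1$, $|\lambda| = 1$. Taking $f = \phi$, $g = \bar{\phi}^2$, $h = \phi$ would give
$$\int f(x) g(T^n x) h(T^{2n} x) \, d\mu = \int \phi(x) \, \bar{\lambda}^{2n} \bar{\phi}(x)^2 \, \lambda^{2n} \phi(x) \, d\mu(x) = \int |\phi|^4 \, d\mu > 0$$
for every $n$. On the other hand, $\int f \, d\mu = \int \phi \, d\mu = 0$ (since $\phi$ is orthogonal to the constants), so that the condition cannot hold. Of course the statement of Corollary 3.25 for $k = 2$ is equivalent to ergodicity.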
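The eigenfunction computation in Remark 3.27 can be checked numerically for a rotation of the circle, which is ergodic but not weakly mixing. The sketch below makes several assumptions for illustration: $Tx = x + \theta \bmod 1$, $\phi(x) = e^{2\pi i x}$ (an eigenfunction with eigenvalue $\lambda = e^{2\pi i\theta}$), and an evenly spaced grid in place of Haar measure. The function name is hypothetical.

```python
import numpy as np

def remark_3_27_check(theta, n, num_points=1000):
    """Return (triple correlation, integral of f) for f = phi, g = conj(phi)**2,
    h = phi, where phi(x) = exp(2*pi*i*x) and T x = x + theta (mod 1).
    Haar measure on [0, 1) is approximated by an evenly spaced grid."""
    x = np.arange(num_points) / num_points

    def phi(t):
        return np.exp(2j * np.pi * t)

    f = phi(x)
    g_at_Tn = np.conj(phi(x + n * theta)) ** 2   # g(T^n x)
    h_at_T2n = phi(x + 2 * n * theta)            # h(T^{2n} x)
    correlation = np.mean(f * g_at_Tn * h_at_T2n)
    return correlation, np.mean(f)

corr, integral_f = remark_3_27_check(np.sqrt(2) - 1, n=7)
print(abs(corr - 1) < 1e-9, abs(integral_f) < 1e-9)
```

Here $|\phi| = 1$, so the integrand is identically 1 for every $n$, while $\int f \, d\mu = 0$. The averages therefore cannot converge to the product of the integrals.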
D. Outline of the proof of the Furstenberg–Katznelson Theorem
At the other extreme from the weakly mixing transformations, with respect to dynamical behavior, are the group rotations. In this section we will first show that the multiple recurrence result also holds for such transformations. The proof of the full multiple recurrence theorem, which we will then outline, depends on strengthening this result and the one of the previous section so that they apply to the case of extensions. Suppose then that $X$ is a compact metric abelian group, $\mathcal{B}$ is the Borel $\sigma$-algebra of $X$, $\mu$ is Haar measure, and $Tx = x_0 x$, where $x_0 \in X$ is a point such that $\{x_0^n\}$ is dense in $X$. Recall that every ergodic isometry has a model of this kind.

Lemma 3.28 If $f_1, f_2, \ldots, f_k \in L^\infty(X)$, then
$$\psi(x) = \int f_1(y) f_2(xy) \cdots f_k(x^{k-1} y) \, d\mu(y)$$
is a continuous function of $x$.

Proof: Each of the functions $x \to f_i(x^{i-1} y)$ is a continuous function from $X$ into $L^1(X)$. For example, if $f \in L^1$ and $\varepsilon > 0$, we may choose $\phi \in \mathcal{C}(X)$ with $\|\phi - f\|_1 < \varepsilon$. Since multiplication is continuous, we may also choose $\delta > 0$ such that $d(x, x') < \delta$ implies $|\phi(xy) - \phi(x'y)| < \varepsilon$ for all $y \in X$. Writing $f_x(y) = f(xy)$, we then have
$$\|f_x - f_{x'}\|_1 \le \|f_x - \phi_x\|_1 + \|\phi_x - \phi_{x'}\|_1 + \|\phi_{x'} - f_{x'}\|_1 < 3\varepsilon \quad \text{if } d(x, x') < \delta.$$
Then if $x_j \to x$ we have
$$\begin{aligned}
&\int f_1(y) f_2(x_j y) \cdots f_k(x_j^{k-1} y) \, d\mu(y) - \int f_1(y) f_2(x y) \cdots f_k(x^{k-1} y) \, d\mu(y) \\
&\quad = \int f_1(y) f_2(x_j y) \cdots f_{k-1}(x_j^{k-2} y) \left[f_k(x_j^{k-1} y) - f_k(x^{k-1} y)\right] d\mu(y) \\
&\qquad + \int f_1(y) \cdots f_{k-2}(x_j^{k-3} y) \left[f_{k-1}(x_j^{k-2} y) - f_{k-1}(x^{k-2} y)\right] f_k(x^{k-1} y) \, d\mu(y) + \cdots \\
&\qquad + \int f_1(y) \left[f_2(x_j y) - f_2(x y)\right] f_3(x^2 y) \cdots f_k(x^{k-1} y) \, d\mu(y),
\end{aligned}$$
and, using the boundedness of the $f_i$, each of these integrals tends to 0 as $j \to \infty$.
Theorem 3.29 If $(X, \mathcal{B}, \mu, T)$ is an ergodic group rotation and $f \in L^\infty(X, \mathcal{B}, \mu)$ is nonnegative but not identically 0, then
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \int f(y) f(T^n y) \cdots f(T^{(k-1)n} y) \, d\mu(y)$$
exists and is positive.

Proof: Given such an $f \in L^\infty$, let
$$\psi(x) = \int_X f(y) f(xy) \cdots f(x^{k-1} y) \, d\mu(y)$$
for $x \in X$. Since $\psi$ is continuous, by the Lemma, and $T$ is minimal and uniquely ergodic,
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \psi(x_0^n) = \int_X \psi \, d\mu.$$
Since $T^n y = x_0^n y$, this is just the limit mentioned in the Theorem. Since $\psi \ge 0$, $\psi(e) = \int f^k \, d\mu > 0$, and $\psi$ is continuous, $\int \psi \, d\mu > 0$.

Suppose that $(X, \mathcal{B}, \mu)$ is a probability space and $\Gamma$ is an abelian group which acts on $X$ by m.p.t.s. A function $f \in L^2(X)$ is called an eigenfunction in case there is a character $\xi$ of $\Gamma$ (i.e. a homomorphism $\xi: \Gamma \to \{z \in \mathbb{C} : |z| = 1\}$) such that
$$f(\gamma x) = \xi(\gamma) f(x)$$
for all $\gamma \in \Gamma$ and a.a. $x$.
The system $(X, \mathcal{B}, \mu, \Gamma)$ is called almost periodic (or a Kronecker system) in case the eigenfunctions span $L^2$. Theorem 3.29 shows, then, that the ergodic Szemerédi Theorem holds for Kronecker systems $(X, \mathcal{B}, \mu, T)$.

Remarks 3.30 (1) Every measure-preserving system $(X, \mathcal{B}, \mu, T)$ has a maximal almost periodic factor. (2) If $\Gamma$ is the group generated by $k$ commuting m.p.t.s $T_1, T_2, \ldots, T_k$ on $X$ and $(X, \mathcal{B}, \mu, \Gamma)$ is almost periodic, then for each $0 \le f \in L^\infty$, $f \not\equiv 0$,
$$\lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M+1}^{N} \int f(T_1^n y) f(T_2^n y) \cdots f(T_k^n y) \, d\mu(y) > 0.$$
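Theorem 3.29 can be made concrete for the circle rotation $Tx = x + \theta$. Written additively, $x^i y$ becomes $y + ix$. The sketch below is an illustration under assumed data, not part of the proof: it takes $k = 3$, $f = \chi_{[0, 1/4)}$, $\theta = (\sqrt{5} - 1)/2$, and Riemann sums in place of Haar integrals. For this $f$ one can check by hand that $\psi(x) = \max(0, 1/4 - 2\|x\|)$, where $\|x\|$ is the distance from $x$ to the nearest integer. Hence $\int \psi \, d\mu = 1/32 > 0$. The function names are hypothetical.

```python
import numpy as np

def psi(x, a=0.25, k=3, grid=2000):
    """Riemann-sum approximation of psi(x) = int f(y) f(y + x) ... f(y + (k-1)x) dy
    on the circle R/Z, with f the indicator of [0, a)."""
    y = np.arange(grid) / grid
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # column of x values
    product = np.ones((x.shape[0], grid))
    for i in range(k):
        product *= ((y + i * x) % 1.0) < a                  # factor f(y + i x)
    return product.mean(axis=1)

def multiple_recurrence_average(theta, num_terms=1000, **kwargs):
    """(1/N) sum_{n=1}^{N} psi(n theta): the averages of Theorem 3.29 with M = 0."""
    n = np.arange(1, num_terms + 1)
    return psi((n * theta) % 1.0, **kwargs).mean()

golden = (np.sqrt(5) - 1) / 2
print(psi(0.05)[0], multiple_recurrence_average(golden))
```

The design mirrors the proof: the averages of the multiple correlations are ergodic averages of the single continuous function $\psi$ along the orbit $n\theta$. Unique ergodicity then forces convergence to $\int \psi \, d\mu$.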
We will now consider the relativized versions of ergodicity, weak mixing and almost periodicity. Let $\Gamma$ be a fixed countable abelian group which acts by m.p.t.s on the measure spaces mentioned in the following discussion. Suppose $(X, \mathcal{B}, \mu)$ and $(Y, \mathcal{C}, \nu)$ are probability spaces, and $\pi: X \to Y$ is a measurable onto map which commutes with the action of each $T \in \Gamma$ and for which $\mu\pi^{-1} = \nu$. Then we write $X \xrightarrow{\pi} Y$ and say that $Y$ is a factor of $X$, or $X$ is an extension of $Y$.
The measure $\mu$ on $X$ has a disintegration in terms of fiber measures indexed by the points of $Y$. We always deal with Lebesgue spaces, so let us assume that $X = Y = [0, 1)$ and that $\pi: (X, \mathcal{B}, \mu) \to (Y, \mathcal{C}, \nu)$ is an extension. Let $\mathcal{B}_Y = \pi^{-1}\mathcal{C}$. Let $\{\phi_n\}$ be a countable dense set in $\mathcal{C}(X)$. For each $n$ choose a version of the conditional expectation $E(\phi_n \mid \mathcal{B}_Y)$. For almost all $x \in X$, the map
$$\Lambda_x(\phi_n) = E(\phi_n \mid \mathcal{B}_Y)(x)$$
is uniformly continuous on $\{\phi_n\}$, and hence $\Lambda_x$ extends to all of $\mathcal{C}(X)$. The extension is linear and positive with $\Lambda_x(1) = 1$. So there is a positive Borel measure $\mu_y$ (depending only on $y = \pi x$) such that $\Lambda_x(\phi) = \int \phi \, d\mu_y$ for all $\phi \in \mathcal{C}(X)$. Then
$$\int_X f \, d\mu = \int_Y \int_X f(x) \, d\mu_y(x) \, d\nu(y)$$
for all bounded Borel functions $f$ on $X$, so we write $\mu = \int_Y \mu_y \, d\nu(y)$.

An extension $X \xrightarrow{\pi} Y$ is called relatively ergodic with respect to $T \in \Gamma$ if every $T$-invariant $L^2$ function on $X$ is a.e. a function on $Y$, in that there is $f_0 \in L^2(Y)$ with $f = f_0 \circ \pi$ a.e. The relativized version of the cross product is the fiber product $X \times_Y X'$, which is defined as follows. If $X \xrightarrow{\pi} Y$ and $X' \xrightarrow{\pi'} Y$ are extensions of $Y$, let $X \times_Y X' = \{(x, x') : \pi x = \pi' x'\}$. The $\sigma$-algebra $\tilde{\mathcal{B}}$ of $X \times_Y X'$ is the one it inherits by restriction from $X \times X'$. Its measure $\tilde{\mu}$ is the one determined by the fiber measures $\tilde{\mu}_y = \mu_y \times \mu'_y$:
$$\int f(x, x') \, d\tilde{\mu}(x, x') = \int_Y \int \int f(x, x') \, d\mu_y(x) \, d\mu'_y(x') \, d\nu(y).$$
Now we can say that an extension $X \xrightarrow{\pi} Y$ is relatively weakly mixing with respect to $T \in \Gamma$ if $X \times_Y X \to Y$ is a relatively ergodic extension for $T$. The extension of the definition of almost periodicity is a little more complicated. It can be proved that the following two properties of an extension $X \xrightarrow{\pi} Y$, for a subgroup $\Lambda \subset \Gamma$ generated by $T_1, \ldots, T_k$, are equivalent:

(1) There is a dense set $\mathcal{D} \subset L^2(X, \mathcal{B}, \mu)$ consisting of functions whose $\Lambda$-orbits have finite rank over $Y$. That is, for each $f \in \mathcal{D}$ and $\varepsilon > 0$, there is a finite set $g_1, \ldots, g_n \in L^2(X, \mathcal{B}, \mu)$ such that for each $T \in \Lambda$ there is a $j$ with $\|Tf - g_j\|_{L^2(X, \mathcal{B}, \mu_y)} < \varepsilon$ for a.e. $y \in Y$.
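(2) The autocorrelation
$$\lim_{n \to \infty} \frac{1}{(2n+1)^k} \sum_{|i_1|, |i_2|, \ldots, |i_k| \le n} f(T_1^{i_1} T_2^{i_2} \cdots T_k^{i_k} x_1) \, f(T_1^{i_1} T_2^{i_2} \cdots T_k^{i_k} x_2)$$
of a function $f \in L^2(X, \mathcal{B}, \mu)$ is 0 a.e. in $L^2(X \times_Y X, \tilde{\mathcal{B}}, \tilde{\mu})$ if and only if $f = 0$ a.e. in $X$.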
An extension $X \xrightarrow{\pi} Y$ having one of these properties will be called relatively compact. Properties (1) and (2) are relativized versions of almost periodicity. Property (1) corresponds to the fact that the eigenfunctions of an almost periodic system span $L^2$ (see also the remarks at the end of Section 4.1.C). Property (2) corresponds to a consequence of the recurrence property of almost periodic functions. Other characterizations of relative compactness analogous to properties of almost periodic systems are also possible, for example in terms of vector-valued eigenfunctions with unitary-matrix eigenvalues.
Remark 3.31 If $X \xrightarrow{\pi} Y$ is a relatively compact extension for each of $\Lambda_1, \Lambda_2 \subset \Gamma$, then it is relatively compact for $\Lambda_1 \Lambda_2$.

Now we extend the product characterization of weak mixing and Theorem 3.24 to the case of relatively weakly mixing extensions. Note that in 3.24 and 3.33 the presence or absence of the exponent 2 is immaterial: by increasing $k$, as in the proof of Corollary 3.26, it can be introduced when desired.

Proposition 3.32 If $X \to Y$ is relatively weakly mixing and $X' \to Y$ is relatively ergodic, then $X \times_Y X' \to Y$ is relatively ergodic.
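Theorem 3.33 Let $(X, \mathcal{B}, \mu) \xrightarrow{\pi} (Y, \mathcal{C}, \nu)$ be a relatively weakly mixing extension for each $T \in \Gamma$, $T \ne 1$. If $f_1, f_2, \ldots, f_k \in L^\infty(X, \mathcal{B}, \mu)$ and $T_1, \ldots, T_k$ are distinct elements of $\Gamma$, then
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int \left[E\Big(\prod_{j=1}^k T_j^n f_j \,\Big|\, \mathcal{B}_Y\Big) - \prod_{j=1}^k T_j^n E(f_j \mid \mathcal{B}_Y)\right]^2 d\mu = 0.$$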
The next theorem will imply that any system $(X, \mathcal{B}, \mu, T)$ can be resolved into relatively weakly mixing and relatively compact extensions, beginning with a one-point space.
Theorem 3.34 If a proper extension $(X, \mathcal{B}, \mu) \xrightarrow{\pi} (Y, \mathcal{C}, \nu)$ is not relatively weakly mixing for $T \in \Gamma$, then there is a factor $(Z, \mathcal{Z}, \tau)$ of $(X, \mathcal{B}, \mu)$ which is a nontrivial relatively compact extension of $(Y, \mathcal{C}, \nu)$ for the group generated by $T$.

Using Remark 3.31, the following resolution can now be established.
Theorem 3.35 Let $\Gamma$ be finitely generated and $(Y, \mathcal{C}, \nu)$ a proper factor of $(X, \mathcal{B}, \mu)$. Then there are (1) a factor $(Z, \mathcal{Z}, \tau)$ of $(X, \mathcal{B}, \mu)$ which is a proper extension of $(Y, \mathcal{C}, \nu)$, and (2) a decomposition $\Gamma = \Gamma_w \times \Gamma_c$ of $\Gamma$ into the direct product of two subgroups, such that
(i) $(Z, \mathcal{Z}, \tau) \to (Y, \mathcal{C}, \nu)$ is relatively weakly mixing for each $T \ne 1$ in $\Gamma_w$, and
(ii) $(Z, \mathcal{Z}, \tau) \to (Y, \mathcal{C}, \nu)$ is relatively compact for $\Gamma_c$.

We say that the action of $\Gamma$ on $(X, \mathcal{B}, \mu)$ is SZ if, given $T_1, \ldots, T_k \in \Gamma$, the conclusion of the Furstenberg–Katznelson Theorem holds for $T_1, \ldots, T_k$. That is, given $A \in \mathcal{B}$ with $\mu(A) > 0$, there is $\delta > 0$ such that
$$\mu(T_1^{-n} A \cap T_2^{-n} A \cap \cdots \cap T_k^{-n} A) > \delta$$
for all $n$ in a set of lower density greater than $\delta$ (calculated with respect to the intervals $[1, N]$ as $N \to \infty$). We want to show that the action of $\mathbb{Z}^k$ on a finite measure space is always SZ. To prove the Furstenberg–Katznelson Theorem, one assumes that the commuting m.p.t.s $T_1, \ldots, T_k$ on $(X, \mathcal{B}, \mu)$ are given, and that each is an element of $\Gamma$. The action of $\Gamma$ on the maximal almost periodic factor of $(X, \mathcal{B}, \mu)$ is SZ, by Remark 3.30(2) above. The aim is to show that the maximal SZ factor of $(X, \mathcal{B}, \mu)$ is $(X, \mathcal{B}, \mu)$ itself, by using the resolution of this extension into relatively weakly mixing and relatively compact extensions. If $\{(Y_\alpha, \mathcal{C}_\alpha, \nu_\alpha) : \alpha \in A\}$ is a totally ordered family of factors of $(X, \mathcal{B}, \mu)$, we define their supremum to be the factor determined by the closure of the union of the corresponding sub-$\sigma$-algebras of $\mathcal{B}$. It can be proved that if the action of $\Gamma$ on each $(Y_\alpha, \mathcal{C}_\alpha, \nu_\alpha)$ is SZ, then the action of $\Gamma$ on $\sup_\alpha (Y_\alpha, \mathcal{C}_\alpha, \nu_\alpha)$ is SZ. Therefore, by Zorn's Lemma, $(X, \mathcal{B}, \mu)$ has maximal factors on which the action of $\Gamma$ is SZ. Let us suppose that $(Y, \mathcal{C}, \nu)$ is such a maximal factor. If the extension $(X, \mathcal{B}, \mu) \to (Y, \mathcal{C}, \nu)$ is proper, it can be resolved as in Theorem 3.35 into relatively weakly mixing and relatively compact parts. The core of the argument consists, then, in showing that the SZ property lifts through every extension of this particular form. Significantly, the key combinatorial fact required to accomplish this
turns out to be Grünwald's Theorem, the multidimensional van der Waerden Theorem (3.16). In this way the delicate multidimensional Szemerédi Theorem is reduced to a much simpler special case, and the quantitative Furstenberg-Katznelson Theorem is reduced to a recurrence result in topological dynamics.
Exercises 1. If $T: X \to X$ is a m.p.t. on $(X, \mathcal{B}, \mu)$ and $\mu(A) > 0$, then for each $r = 1, 2, \ldots$ a.e. point of $A$ is $r$-recurrent with respect to $A$. That is, if $\rho_r(x, A) = \inf\{n \ge 1 : x, T^n x, T^{2n} x, \ldots, T^{rn} x \in A\}$, then $\rho_r(x, A) < \infty$ for a.e. $x \in A$.
4.4. The topological representation of ergodic transformations
Much of the motivation for the study of ergodic measure-preserving transformations comes from Liouville's Theorem and Boltzmann's Ergodic Hypothesis. Consequently, many of the important systems for whose study ergodic theory was created are physical. They are governed by differential equations, and their evolution is described by diffeomorphisms of manifolds which preserve measures with smooth densities. Poincaré and Birkhoff initiated the study of the purely topological and metric (i.e. measure-theoretic) properties of such maps in order to analyze their most basic properties. This led to the abstract study of the dynamics of homeomorphisms of compact topological spaces and of measure-preserving transformations on measure spaces. How far are these now vastly developed subjects from their origins? More precisely, one may ask, for example:
(a) Can every ergodic m.p.t. on a Lebesgue space be realized as (i.e., is it metrically isomorphic to) a minimal, uniquely ergodic homeomorphism of a compact metric space?
(b) Can every ergodic m.p.t. on a Lebesgue space be realized as a diffeomorphism which preserves a smooth measure on a manifold?
The answer to (a) is Yes, as we shall now see. Work on (b) is in progress, and the answer may also be Yes, subject to certain known restrictions (e.g. finite entropy). Robert I. Jewett (1970) gave the then astonishing positive answer to (a) for weakly mixing transformations. Before then, substantial effort had gone into the construction of uniquely ergodic systems which had positive entropy (Furstenberg 1967, Hahn and Katznelson 1967), were weakly or
strongly mixing (Jacobs 1970b, Kakutani 1973, Petersen 1970a etc.). The existence of such systems is now, of course, a direct corollary of Jewett's Theorem. Subsequently, it remained for some time an interesting problem to give explicit constructions of uniquely ergodic K-systems and Bernoulli systems. This was finally accomplished by Grillenberger (1973a, b) and Grillenberger and Shields (1975). Krieger (1972) established the positive answer to (a) for ergodic transformations, using his theorem on the existence of generators and the theory of entropy. Other proofs, improvements, and extensions were due to Hansel (1974), Hansel and Raoult (1973), Jacobs (1970a), Denker (1973), and Denker and Eberlein (1974). In realizing an ergodic m.p.t. $T$ topologically, the topological space may always be taken to be the Cantor set. Suppose the entropy of the transformation is finite. Then the space may be taken to be a closed shift-invariant uniquely ergodic subset of the space of all bilateral sequences on $N$ symbols, where $N$ is the smallest integer larger than $2^{h(T)}$ ($h(T)$ is the entropy of $T$). This subset can even be taken strictly ergodic, that is, uniquely ergodic and minimal, and the transformation may be taken to be the shift. (We must have the entropy of $T$ strictly less than the entropy of the full shift on $N$ symbols.) When one is working with an ergodic m.p.t. $T$, it may be convenient to assume, as this Theorem allows us to do, that $T$ is a strictly ergodic subshift. It is hard to see, though, that such a supposition, comforting though it may be, can have any real usefulness. The Jewett–Krieger Theorem is much like the Isomorphism Theorem for Lebesgue spaces: its theoretical importance is tremendous, but its practical effects are essentially nil. Indeed, the theorem itself says that this must be so, since it guarantees that
the property of being realizable as a uniquely ergodic homeomorphism tells us nothing at all about a given m.p.t. Still, just as some measure-theoretic constructions are more easily described on the unit interval, so some dynamics may more conveniently be done in a topological setting, even in a subshift. A. Bellow and H. Furstenberg (1979) extended Jewett's argument from the weakly mixing case to the ergodic case. They noticed that weak mixing was used at exactly one point. They also noticed that the proof could be carried past this point in the ergodic case by using Hindman's Theorem, which, symmetrically, Furstenberg and Weiss had proved by dynamical methods in 1978 (see Section 4.3.B). There is no need to bring in the machinery of entropy and generators; the argument, close and clever as it is, uses only combinatorics, basic ergodic theory, and soft analysis. It is this argument that we will present here.
Theorem (Jewett–Krieger) Let $T: X \to X$ be an ergodic m.p.t. on a Lebesgue space $(X, \mathcal{B}, \mu)$. Then $T$ is metrically isomorphic to a strictly ergodic homeomorphism $S$ of the Cantor set $C$.

A. Preliminaries
First we reduce the problem as far as possible by applying soft analysis. We also record a few observations which will be useful later.

Remark 4.1 A functional-analytic condition for isomorphism. We may assume that $(X, \mathcal{B}, \mu)$ is the ordinary measure space of the unit interval $[0, 1)$. According to the theorems of von Neumann (1.4.6 and 1.4.7), in the case of m.p.t.s on complete, separable metric spaces, every (set) isomorphism between the maps of the measure algebras arises from a (point) isomorphism between the m.p.t.s. Such a set isomorphism between $T$ and $S$ can be obtained from a Banach algebra isomorphism $\Phi$ of $L^\infty(X, \mathcal{B}, \mu)$ onto $L^\infty(C, \mathcal{F}, \nu)$ such that
$$\Phi(fT) = \Phi(f)S \quad \text{and} \quad \int \Phi(f) \, d\nu = \int f \, d\mu \quad \text{for all } f \in L^\infty(X, \mathcal{B}, \mu).$$
(Here $\mathcal{F}$ is the Borel field of $C$ and $\nu$ is the unique $S$-invariant Borel probability measure on $C$.) Of course we put $\tilde{\Phi}(E) = \Phi(\chi_E)$ for $E \in \mathcal{B}$ to define the set isomorphism. Then clearly
$$\tilde{\Phi}(TE) = S\tilde{\Phi}(E), \qquad \nu(\tilde{\Phi}(TE)) = \mu(E),$$
and
$$\tilde{\Phi}(A \cup B) = \Phi(\chi_A + \chi_B - \chi_A \chi_B) = \tilde{\Phi}(A) \cup \tilde{\Phi}(B).$$
For countable disjoint unions, we have
$$\Phi\Big(\chi_{\bigcup_{n=1}^\infty E_n}\Big) = \sum_{n=1}^{N} \Phi(\chi_{E_n}) + \Phi\Big(\chi_{\bigcup_{n=N+1}^\infty E_n}\Big),$$
so the partial sums $\sum_{n=1}^{N} \Phi(\chi_{E_n})$ converge in $L^1(C, \mathcal{F}, \nu)$ to $\Phi\big(\chi_{\bigcup_{n=1}^\infty E_n}\big)$, since
$$\int_C \Phi\Big(\chi_{\bigcup_{n=N+1}^\infty E_n}\Big) \, d\nu = \sum_{n=N+1}^\infty \mu(E_n) \to 0.$$
Therefore
$$\tilde{\Phi}\Big(\bigcup_{n=1}^\infty E_n\Big) = \bigcup_{n=1}^\infty \tilde{\Phi}(E_n).$$
Remark 4.2 Construction of the Cantor set. The Cantor set $C$ is obtained by specifying its algebra of continuous functions $\mathcal{C}(C)$. This will be the closed subalgebra $\mathrm{Alg}_T(h)$ of $L^\infty(X, \mathcal{B}, \mu)$ generated by $\{hT^n : n \in \mathbb{Z}\}$ and the constants, where $h \in L^\infty(X, \mathcal{B}, \mu)$ is carefully chosen. To guarantee that $L^\infty(C, \mathcal{F}, \nu)$ is isomorphic to $L^\infty(X, \mathcal{B}, \mu)$, we must have $h$ total. That is, $\{h^{-1}A : A \subset \mathbb{R} \text{ is Borel}\}$ must be dense in the measure algebra $\mathcal{B}$. To guarantee that $S$ is uniquely ergodic, we must be sure that each function $g \in \mathrm{Alg}_T(h)$ is uniform, in that
$$\frac{1}{n} \sum_{k=0}^{n-1} gT^k \to \int g \, d\mu \quad \text{in } L^\infty(X, \mathcal{B}, \mu).$$
(Recall that $S: C \to C$ will be uniquely ergodic if and only if $\frac{1}{n}\sum_{k=0}^{n-1} gS^k$ converges uniformly to a constant for each $g \in \mathcal{C}(C)$.) There will be much more discussion of these two properties in what follows. It will also be important that $h(X) \subset D$ for some compact totally disconnected set $D \subset \mathbb{R}$. This will imply that the simple continuous functions are dense in $\mathcal{C}(C)$, and hence that $C$ is totally disconnected. These assertions follow from the three facts about totally disconnected sets to be mentioned forthwith.

Remarks 4.3 Facts about totally disconnected sets. (1) Suppose every function in a set $\mathcal{S} \subset L^\infty(X, \mathcal{B}, \mu)$ has range contained in a fixed compact totally disconnected set $D \subset \mathbb{R}$. Then $\mathrm{cl}_{L^\infty} \mathrm{Alg}(\mathcal{S})$ contains a dense set of simple functions. (We write $\mathrm{Alg}(\mathcal{S})$ for the algebra generated by $\mathcal{S}$ and the constants.)
Proof: By the Gelfand theory,
$$\mathrm{cl}_{L^\infty} \mathrm{Alg}(\mathcal{S}) = \mathcal{C}(M)$$
for some compact Hausdorff space $M$. Thus if $g \in \mathrm{cl}_{L^\infty} \mathrm{Alg}(\mathcal{S})$ and $f: \mathbb{R} \to \mathbb{R}$ is continuous, then $f \circ g \in \mathrm{cl}_{L^\infty} \mathrm{Alg}(\mathcal{S})$. Now suppose we are given $g_0 \in \mathrm{cl}_{L^\infty} \mathrm{Alg}(\mathcal{S})$ and $\varepsilon > 0$. We may choose $g \in \mathrm{Alg}(\mathcal{S})$ with $\|g - g_0\|_\infty < \varepsilon/2$. Let $\delta > 0$, and select a continuous simple function $\phi: D \to D$ such that, if $\phi(D) = \{d_1, d_2, \ldots, d_n\}$, then $\phi(d_i) = d_i$ and $\operatorname{diam} \phi^{-1}\{d_i\} < \delta$ for $i = 1, 2, \ldots, n$. The function $g$ is a polynomial in members of $\mathcal{S}$. Form a new function $g'$ in $\mathrm{Alg}(\phi \circ \mathcal{S})$ by replacing each monomial $s_1 s_2 \cdots s_k$ by $(\phi \circ s_1)(\phi \circ s_2) \cdots (\phi \circ s_k)$. Then $g'$ is in $\mathrm{cl}_{L^\infty} \mathrm{Alg}(\mathcal{S})$ (since each $\phi \circ s_i$ is), $g'$ is simple, and $\|g' - g\|_\infty < \varepsilon/2$ if $\delta$ is small enough.
(2) Suppose that $M$ is a compact metric space such that the simple continuous functions are dense in $\mathcal{C}(M)$. Then $M$ is totally disconnected.
Proof: Let $F$ be a component of $M$. Then $F$ is closed. Suppose $x, y \in F$, $x \ne y$. There is a continuous function $f$ with $f(x) \ne f(y)$, and hence a continuous simple function $f_0$ with $f_0(x) \ne f_0(y)$. We may assume that $f_0$ is two-valued. Then $F \cap f_0^{-1}\{f_0(x)\}$ and $F \cap f_0^{-1}\{f_0(y)\}$ split $F$ into two disjoint nonempty closed sets, contradicting the connectedness of $F$.
(3) If $M$ is compact, metric, totally disconnected, and perfect (i.e. without isolated points), then $M$ is homeomorphic to the ordinary Cantor set $C$ (Hocking and Young 1961, p. 100).
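Remarks 4.4 Facts about uniform functions. (1) The set of uniform functions is a closed subspace of $L^\infty(X, \mathcal{B}, \mu)$.
Proof: For each function $f$ on $X$ and integer $n \ge 1$, write
$$A_n f = \frac{1}{n} \sum_{k=0}^{n-1} fT^k.$$
Then if $f_k \to f$ in $L^\infty$ and the $f_k$ are all uniform,
$$\Big\|\int f \, d\mu - A_n f\Big\|_\infty \le \Big|\int f \, d\mu - \int f_k \, d\mu\Big| + \Big\|\int f_k \, d\mu - A_n f_k\Big\|_\infty + \|A_n f_k - A_n f\|_\infty.$$
First choose $k$, then $n$, to make this small.
(2) If $g_1 \le g_2 \le \cdots$ and $h_1 \ge h_2 \ge \cdots$ are uniform functions in $L^\infty(X, \mathcal{B}, \mu)$ with $g_n \to f$ and $h_n \to f$ a.e., then $f$ is uniform.
Proof: For each $n, k \ge 1$,
$$A_n g_k - \int f \, d\mu \le A_n f - \int f \, d\mu \le A_n h_k - \int f \, d\mu.$$
Given $\varepsilon > 0$, choose $k$ with $\int (f - g_k) \, d\mu < \varepsilon/2$ and $\int (h_k - f) \, d\mu < \varepsilon/2$, and $n_0$ with
$$A_n g_k > \int g_k \, d\mu - \varepsilon/2 \ \text{a.e.} \quad \text{and} \quad A_n h_k < \int h_k \, d\mu + \varepsilon/2 \ \text{a.e., for } n \ge n_0.$$
Then
$$-\varepsilon \le A_n f - \int f \, d\mu \le \varepsilon \quad \text{a.e. for } n \ge n_0.$$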
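Uniformity can be seen concretely for an irrational rotation. The sketch below uses assumed data chosen for illustration, not taken from the text: $Tx = x + \theta \bmod 1$ and $f(x) = \cos 2\pi x$. Then $\int f \, d\mu = 0$ and $A_n f(x) = \operatorname{Re}\big(e^{2\pi i x} \cdot \frac{1}{n}\sum_{k<n} e^{2\pi i k\theta}\big)$. Hence $\|A_n f - \int f \, d\mu\|_\infty \le 1/(n |\sin \pi\theta|) \to 0$, so $f$ is uniform. The code estimates the sup norm on a grid; the function name is hypothetical.

```python
import numpy as np

def sup_deviation(theta, n, grid=2000):
    """Grid estimate of || A_n f - int f dmu ||_inf for f(x) = cos(2 pi x)
    and T x = x + theta (mod 1); here int f dmu = 0."""
    x = np.arange(grid) / grid
    k = np.arange(n)[:, None]                                     # shape (n, 1)
    averages = np.cos(2 * np.pi * (x + k * theta)).mean(axis=0)   # A_n f on the grid
    return np.abs(averages).max()

theta = np.sqrt(2) - 1
for n in (10, 100, 1000):
    print(n, sup_deviation(theta, n), 1 / (n * abs(np.sin(np.pi * theta))))
```

A grid maximum can only underestimate the true supremum. The comparison with the closed-form bound therefore goes in the safe direction.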
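Remarks 4.5 Facts about total functions. (1) If $h \in L^\infty(X, \mathcal{B}, \mu)$ is total, then $\mathrm{Alg}(h)$ is dense in $L^1(X, \mathcal{B}, \mu)$.
Proof: For any polynomial $P$ and Borel $A \subset \mathbb{R}$,
$$\int_X |P \circ h - \chi_{h^{-1}A}| \, d\mu = \int |P - \chi_A| \, d(\mu \circ h^{-1}).$$
Given $A$, this can be made arbitrarily small by an appropriate choice of $P$, since the measure $\mu \circ h^{-1}$ has compact support. Because the sets $h^{-1}A$ are dense in $\mathcal{B}$, the result follows.
(2) There are total functions on $X$, e.g. …, where $\{B_n\}$ is a countable dense set in $\mathcal{B}$.
(3) If $f_1, f_2, \ldots \in L^\infty(X, \mathcal{B}, \mu)$ are total and
$$\sum_{n=1}^\infty \mu\{f_n \ne f_{n+1}\} < \infty,$$
then $\{f_n\}$ converges a.e. and the limit function $f$ is total.
Proof: $\{f_n\}$ converges on the complement of
$$\bigcap_{n=1}^\infty \bigcup_{k \ge n} \{f_k \ne f_{k+1}\},$$
which has measure 0 by the easy half of the Borel–Cantelli Lemma. Given $B \in \mathcal{B}$ and $\varepsilon > 0$, we want to find a Borel set $A \subset \mathbb{R}$ with $d(B, f^{-1}A) = \mu(B \,\triangle\, f^{-1}A) < \varepsilon$. Find an $n$ with
$$\mu\Big(\bigcup_{k \ge n} \{f_k \ne f_{k+1}\}\Big) < \varepsilon/2$$
and a Borel set $A \subset \mathbb{R}$ with $d(B, f_n^{-1}A) < \varepsilon/2$. Then
$$d(B, f^{-1}A) \le d(B, f_n^{-1}A) + d(f_n^{-1}A, f^{-1}A).$$
The first term is less than $\varepsilon/2$. Since $f = f_n$ off $\bigcup_{k \ge n}\{f_k \ne f_{k+1}\}$, the second term is at most the measure of that union, which is less than $\varepsilon/2$.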
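(4) If $f, g: X \to (0, 1]$ are in $L^1(X, \mathcal{B}, \mu)$ with $\|f - g\|_1 < \varepsilon$, and $f$ is total, then there is a total $h: X \to (0, 1]$ such that
$$\|h - g\|_\infty < \sqrt{\varepsilon} \quad \text{and} \quad \mu\{h \ne f\} < \sqrt{\varepsilon}.$$
Proof: Choose $\varepsilon_1 > 0$ and $n \in \mathbb{N}$ such that $\|f - g\|_1 < \varepsilon_1 < \varepsilon$ and $1/n < \sqrt{\varepsilon}$. Let $E = \{|f - g| \ge \sqrt{\varepsilon_1}\}$. Fix a $\delta$ with $0 < \delta < 1/n$, to be specified later. Using the usual Lebesgue Ladder technique, for each $k = 0, 1, \ldots, n-1$ let
$$S_k = \Big\{x \notin E : \frac{k}{n} + \delta < f(x) \le \frac{k+1}{n}\Big\}, \qquad Q_k = \Big\{x \notin S : \frac{k}{n} < g(x) \le \frac{k+1}{n}\Big\},$$
where $S = S_0 \cup \cdots \cup S_{n-1}$. Then $\mathcal{P} = \{S_0, S_1, \ldots, S_{n-1}, Q_0, Q_1, \ldots, Q_{n-1}\}$ forms a partition of $X$. We define $h$ on each cell of the partition by
$$h = f \ \text{on } S_k, \qquad h = \frac{k}{n} + \delta f \ \text{on } Q_k, \qquad k = 0, \ldots, n-1.$$
The claim is that if $\delta$ is small enough, then this $h$ satisfies the statement. First, why is $h$ total? Since $f: X \to (0, 1]$, we have
$$h(S_k) \subset \Big(\frac{k}{n} + \delta, \frac{k+1}{n}\Big] \quad \text{and} \quad h(Q_k) \subset \Big(\frac{k}{n}, \frac{k}{n} + \delta\Big]$$
for each $k$. These $2n$ intervals form a partition of $(0, 1]$. Now if $A \subset (0, 1]$ is a Borel set, then
$$S_k \cap f^{-1}A = h^{-1}\Big(A \cap \Big(\frac{k}{n} + \delta, \frac{k+1}{n}\Big]\Big) \quad \text{and} \quad Q_k \cap f^{-1}A = h^{-1}\Big(\frac{k}{n} + \delta A\Big).$$
Thus, given an arbitrary $B \in \mathcal{B}$, we can choose a Borel set $A_0$ with $d(B, f^{-1}A_0)$ small, and then construct
$$A = \Big[A_0 \cap \bigcup_{k=0}^{n-1} \Big(\frac{k}{n} + \delta, \frac{k+1}{n}\Big]\Big] \cup \bigcup_{k=0}^{n-1} \Big(\frac{k}{n} + \delta A_0\Big).$$
Then $h^{-1}(A)$ will be very close to $B$, since it agrees with $f^{-1}A_0$ on each cell of $\mathcal{P}$. Now
$$|h - g| = |f - g| < \sqrt{\varepsilon_1} < \sqrt{\varepsilon} \ \text{on each } S_k \quad \text{and} \quad |h - g| < \frac{1}{n} < \sqrt{\varepsilon} \ \text{on each } Q_k,$$
so we have $\|h - g\|_\infty < \sqrt{\varepsilon}$. Finally,
$$\mu\{h \ne f\} \le \mu(X \setminus S) \to \mu(E) \quad \text{as } \delta \to 0.$$
However, $\sqrt{\varepsilon_1}\,\mu(E) \le \|f - g\|_1 < \varepsilon_1$, so $\mu(E) < \sqrt{\varepsilon_1} < \sqrt{\varepsilon}$. Therefore
$$\mu\{h \ne f\} \le \mu(X \setminus S) < \sqrt{\varepsilon}$$
for sufficiently small $\delta$.
(5) If $f: X \to [0, 1]$ is total and $\theta: [0, 1] \to [0, 1]$ is strictly increasing, then $\theta \circ f$ is total.
Proof: $(\theta \circ f)^{-1}(A) = f^{-1}(\theta^{-1}A)$, and $\theta^{-1}A$ runs through all Borel subsets of $[0, 1]$ as $A$ does.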
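The Lebesgue Ladder construction in the proof of Remark 4.5(4) is concrete enough to run on a finite sample of points standing in for $X$. The sketch below follows the reading of $S_k$ and $Q_k$ given above. The function names, the sample data, and the parameters $\varepsilon_1 = 0.01$, $n = 20$, $\delta = 0.01$ are illustrative assumptions. The second function recovers $f$ from $h$ alone, which is the mechanism behind the totality of $h$: the value of $h$ tells which of the $2n$ subintervals, and hence which kind of cell, a point lies in.

```python
import numpy as np

def ladder_perturbation(f, g, eps1, n, delta):
    """Build h from f, g (arrays with values in (0, 1]) as in Remark 4.5(4).
    Assumes 0 < delta < 1/n. Returns h and the boolean mask of S = S_0 u ... u S_{n-1}."""
    E = np.abs(f - g) >= np.sqrt(eps1)
    k_f = np.ceil(f * n).astype(int) - 1          # f lies in (k_f/n, (k_f+1)/n]
    S = ~E & (f > k_f / n + delta)                # S_k: off E, f in (k/n + delta, (k+1)/n]
    k_g = np.ceil(g * n).astype(int) - 1          # on Q_k, k is read off from g
    h = np.where(S, f, k_g / n + delta * f)       # h = f on S_k, k/n + delta f on Q_k
    return h, S

def recover_f(h, n, delta):
    """Invert h -> f using only the values of h."""
    k = np.ceil(h * n).astype(int) - 1            # h lies in (k/n, (k+1)/n]
    from_Q = h <= k / n + delta                   # lower rung: the point came from some Q_k
    return np.where(from_Q, (h - k / n) / delta, h)

rng = np.random.default_rng(0)
f = rng.uniform(0.01, 1.0, size=5000)
g = np.clip(f + rng.normal(0.0, 0.05, size=f.size), 0.01, 1.0)
h, S = ladder_perturbation(f, g, eps1=0.01, n=20, delta=0.01)
print(np.abs(h - g).max(), np.mean(h != f))
```

With these parameters, $|h - g| < \max(\sqrt{\varepsilon_1}, 1/n) = 0.1$ everywhere. $h$ differs from $f$ only off $S$.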
Remark 4.6 Reduction of the problem. To prove the Jewett–Krieger Theorem, it is enough, given $(X, \mathcal{B}, \mu, T)$, to find a compact totally disconnected $D \subset [0, 1]$ and a total function $h: X \to D$ such that the algebra generated by $\{hT^n : n \in \mathbb{Z}\}$ consists entirely of uniform functions.
Proof: Recall that $\mathrm{Alg}_T(h)$ denotes the $L^\infty$ closure of the algebra generated by the $hT^n$, $n \in \mathbb{Z}$, and the constant functions. By 4.4(1), $\mathrm{Alg}_T(h)$ consists entirely of uniform functions. As promised in 4.2, we let $M$ be the maximal ideal space of $\mathrm{Alg}_T(h)$, so that $\mathrm{Alg}_T(h) \cong \mathcal{C}(M)$. By the Gelfand theory, the action of $T$ on this algebra arises from a homeomorphism $S: M \to M$. $S$ is uniquely ergodic because all functions in $\mathcal{C}(M)$ are uniform: $\{A_n g\}$ converges uniformly on $M$ to a constant for each $g \in \mathcal{C}(M)$. Let $\nu$ be the unique invariant Borel probability measure on $M$. We have a Banach algebra isomorphism
$$\Phi: \mathrm{Alg}_T(h) \to \mathcal{C}(M)$$
such that
$$\int_M \Phi(f) \, d\nu = \int_X f \, d\mu \quad \text{for all } f \in \mathrm{Alg}_T(h).$$
Now $\mathrm{Alg}_T(h) \subset L^1(X)$ and $\mathcal{C}(M) \subset L^1(M)$ are each dense in the respective norms. Thus $\Phi$ extends to a Banach algebra isomorphism
$$\Phi: L^\infty(X, \mathcal{B}, \mu) \to L^\infty(M, \mathcal{F}, \nu)$$
such that $\Phi(fT) = \Phi(f)S$ and
$$\int_X f \, d\mu = \int_M \Phi(f) \, d\nu \quad \text{for all } f \in L^\infty(X, \mathcal{B}, \mu).$$
By 4.1, $(X, \mathcal{B}, \mu, T)$ is metrically isomorphic to $(M, \mathcal{F}, \nu, S)$. By discarding an open set of measure 0 from $M$ if necessary, we may assume that $\nu$ has full support. Then $(M, S)$ must be minimal, and so it is in fact strictly ergodic. Notice that this step corresponds to discarding a set of $\mu$-measure 0 from $X$, so that $\mathrm{Alg}_T(h)$ remains unchanged. By 4.3(1), $\mathcal{C}(M)$ contains a dense set of simple functions. $M$ is metrizable because $\mathcal{C}(M)$ is separable. Thus 4.3(2) applies, and $M$ is totally disconnected. $M$ must be perfect, since the set of isolated points is invariant and open, hence is either empty or all of $M$. If it were all of $M$, then $M$ would have to consist of finitely many points. This is impossible because $(M, \mathcal{F}, \nu) \cong (X, \mathcal{B}, \mu)$, a Lebesgue space. By 4.3(3), $M$ is homeomorphic to the Cantor set $C$.
B. Recurrence along IP-sets
This section presents the extension, due to Bellow and Furstenberg, to ergodic m.p.t.s of a property that is obvious for weakly mixing m.p.t.s. This property is essential in the proof of Jewett's Uniform Perturbation Lemma, which is used to produce the function $h$. Recall Hindman's Theorem: if $\mathbb{N} = C_1 \cup C_2$ (disjoint), then at least one $C_i$ contains an IP-set. An IP-set is a set $\{p_1, p_2, \ldots\}$ which is closed under the formation of finite sums $p_{i_1} + p_{i_2} + \cdots + p_{i_n}$ ($i_1 < i_2 < \cdots < i_n$). In 4.3.B we gave a proof of this result by means of dynamics. Before applying it, we need some simple observations about ergodic theory and IP-sets.

Lemma 4.7 Let $T: X \to X$ be a not necessarily ergodic m.p.t. on a probability space $(X, \mathcal{B}, \mu)$. Let $S \subset \mathbb{N}$ be an IP-set and $B \in \mathcal{B}$ with $\mu(B) > 0$. Then for each $\varepsilon > 0$ there are infinitely many $s \in S$ for which
$$\mu(B \cap T^{-s}B) > \mu(B)^2 - \varepsilon.$$
Proof: Suppose that $S$ is generated by $\{p_1, p_2, \ldots\}$, i.e. $S$ consists of all finite sums $p_{i_1} + p_{i_2} + \cdots + p_{i_n}$, where $i_1 < i_2 < \cdots < i_n$. Replacing $S$ by a subset if necessary, we may assume that
$$p_{k+1} > p_1 + p_2 + \cdots + p_k = n_k \quad \text{for all } k.$$
Note that $i < j$ implies $n_j - n_i \in S$ and $n_i < n_j$. From Khintchine's Theorem (2.3.3) we already know that $\{k \in \mathbb{N} : \mu(B \cap T^{-k}B) > \mu(B)^2 - \varepsilon\}$ is relatively dense, hence infinite. However, we claim that there are infinitely many $k$ of the form $n_j - n_i$ in this set. For if not, we would have
$$\mu(T^{-n_i}B \cap T^{-n_j}B) = \mu(B \cap T^{-(n_j - n_i)}B) \le \mu(B)^2 - \varepsilon$$
for all $j > i \ge$ some $I$. Let $f = \chi_B - \mu(B)$. Then $\int fT^{n_i} \cdot fT^{n_j} \, d\mu = \mu(T^{-n_i}B \cap T^{-n_j}B) - \mu(B)^2 \le -\varepsilon$ for $i \ne j$. This would imply that, for all $p \in \mathbb{N}$,
$$0 \le \int \Big(\sum_{i=I}^{I+p-1} fT^{n_i}\Big)^2 d\mu = \sum_{i,j=I}^{I+p-1} \int fT^{n_i} \cdot fT^{n_j} \, d\mu \le p\,\mu(B)(1 - \mu(B)) - \varepsilon p(p-1).$$
However, this tends to $-\infty$ as $p \to \infty$, so this situation is impossible.
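Lemma 4.7 can be watched at work for an irrational rotation, where everything is explicit. The sketch below rests on illustrative assumptions. $Tx = x + \theta \bmod 1$ with $\theta = (\sqrt{5}-1)/2$, and $B = [0, b)$ with $b \le 1/2$. The generators are $p_k = 3^{k-1}$, which satisfy $p_{k+1} > p_1 + \cdots + p_k$. Then $\mu(B \cap T^{-s}B) = \max(0, b - \|s\theta\|)$, where $\|t\|$ is the distance from $t$ to the nearest integer. The code lists the finite sums of the generators and keeps those $s$ with $\mu(B \cap T^{-s}B) > \mu(B)^2 - \varepsilon$. The helper names are hypothetical.

```python
import itertools
import math

def finite_sums(generators):
    """FS(p_1, ..., p_m): all sums over nonempty subsets of the generators."""
    sums = set()
    for r in range(1, len(generators) + 1):
        for combo in itertools.combinations(generators, r):
            sums.add(sum(combo))
    return sorted(sums)

def overlap_measure(s, theta, b):
    """mu(B intersect T^{-s} B) for B = [0, b), b <= 1/2, T x = x + theta (mod 1)."""
    t = s * theta
    dist = abs(t - round(t))          # distance from s*theta to the nearest integer
    return max(0.0, b - dist)

theta = (math.sqrt(5) - 1) / 2
b, eps = 0.3, 0.01
S = finite_sums([3 ** k for k in range(10)])
good = [s for s in S if overlap_measure(s, theta, b) > b * b - eps]
print(len(S), good[:5])
```

Because $p_{k+1}$ exceeds the sum of all earlier generators, distinct subsets give distinct sums. Ten generators therefore produce $2^{10} - 1$ elements of the IP-set.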
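Proposition 4.8 Let $T: X \to X$ be a not necessarily ergodic m.p.t. on a probability space $(X, \mathcal{B}, \mu)$. Let $A, B \in \mathcal{B}$, let $S \subset \mathbb{N}$ be an IP-set, and suppose that
$$\mu(A \cap T^{-s}B) = e(s) > 0 \quad \text{for all } s \in S.$$
Then we cannot have $e(s) \to 0$ as $s \to \infty$.
Proof: Suppose that $S$ is generated by $p_1, p_2, \ldots$. Apply the Lemma to $A' = A \cap T^{-p_1}B$, $\varepsilon = \mu(A')^2/2$, and the IP-set $S'$ generated by $p_2, p_3, \ldots$. We find that there are infinitely many $s' \in S'$ with
$$\mu(A' \cap T^{-s'}A') > \varepsilon,$$
i.e.
$$\mu(A \cap T^{-p_1}B \cap T^{-s'}A \cap T^{-(p_1 + s')}B) > \varepsilon.$$
Hence
$$e(p_1 + s') = \mu(A \cap T^{-(p_1 + s')}B) > \varepsilon \quad \text{for infinitely many } s' \in S'.$$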
This Proposition together with Hindman's Theorem will provide the desired stepping stone. For a finite partition $\mathcal{F}$ of $X$, define the gauge $\delta(\mathcal{F})$ of $\mathcal{F}$ by $\delta(\mathcal{F}) = \inf\{\mu(P) : P \in \mathcal{F}, \ \mu(P) > 0\}$.

Theorem 4.9 (Bellow–Furstenberg 1979) Let $T: X \to X$ be a m.p.t. on a probability space $(X, \mathcal{B}, \mu)$, and let $\mathcal{F}$ be a finite measurable partition of $X$. Then
$$\limsup_{n \to \infty} \delta(\mathcal{F} \vee T^{-n}\mathcal{F}) > 0.$$
Proof: If $\mathcal{F} = \{F_1, F_2, \ldots, F_r\}$, let
$$\mathcal{J} = \{(i, j) : 1 \le i, j \le r \text{ and there is } n \in \mathbb{N} \text{ with } \mu(F_i \cap T^{-n}F_j) = \delta(\mathcal{F} \vee T^{-n}\mathcal{F})\}$$
be the set of gauge-minimizing indices. For each $(i, j) \in \mathcal{J}$, let
$$H(i, j) = \{n \in \mathbb{N} : \delta(\mathcal{F} \vee T^{-n}\mathcal{F}) = \mu(F_i \cap T^{-n}F_j)\}$$
be the set of times at which this pair of indices scores a minimum measure. We then have a finite covering
$$\mathbb{N} = \bigcup_{(i,j) \in \mathcal{J}} H(i, j).$$
By Hindman's Theorem, some $H(i, j)$ contains an IP-set. Apply Proposition 4.8 to $F_i$, $F_j$ and this set. For $n$ in it,
$$0 < \delta(\mathcal{F} \vee T^{-n}\mathcal{F}) = \mu(F_i \cap T^{-n}F_j) = e(n) \not\to 0 \quad \text{as } n \to \infty.$$
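The gauge in Theorem 4.9 is easy to compute for a rotation and a partition into two half-circles. The sketch below makes several assumptions for illustration only: $Tx = x + \theta \bmod 1$ with $\theta$ irrational, $\mathcal{F} = \{[0, 1/2), [1/2, 1)\}$, measures approximated by fractions of grid points, and hypothetical helper names. In this case $\delta(\mathcal{F} \vee T^{-n}\mathcal{F}) = \min(\|n\theta\|, 1/2 - \|n\theta\|)$. The $\limsup$ is positive because $n\theta$ returns near $1/4$ modulo 1 infinitely often.

```python
import numpy as np

def gauge_of_join(labels_x, labels_Tn_x, num_cells):
    """delta(F v T^{-n} F): smallest positive measure among the cells
    F_i intersect T^{-n} F_j, each measure approximated by a fraction of grid points."""
    pair_codes = labels_x * num_cells + labels_Tn_x          # encode the pair (i, j)
    _, counts = np.unique(pair_codes, return_counts=True)     # only nonempty cells appear
    return counts.min() / labels_x.size

def rotation_gauge(theta, n, grid=100_000):
    x = np.arange(grid) / grid

    def label(t):                     # index of the cell of F containing t (mod 1)
        return ((t % 1.0) >= 0.5).astype(int)

    return gauge_of_join(label(x), label(x + n * theta), num_cells=2)

theta = (np.sqrt(5) - 1) / 2
print([round(rotation_gauge(theta, n), 4) for n in range(1, 9)])
```

Empty cells never show up in `np.unique`, which matches the restriction to $\mu(P) > 0$ in the definition of the gauge.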
C. Perturbation to uniformity
The key step in Jewett's argument consists in perturbing a given simple function to a nearly uniform simple function. This is accomplished within a suitable tower in $X$. Henceforth $T: X \to X$ is ergodic. Recall that for $E \subset X$, $n_E(x) = \inf\{n \ge 1 : T^n x \in E\}$. We use the notation
$$A_n f(x) = \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x), \qquad A_{-n} f(x) = \frac{1}{n} \sum_{k=0}^{n-1} f(T^{-k} x) \qquad \text{for } n \ge 1.$$
Lemma 4.10 Tower Lemma. Let $f \in L^1(X, \mathcal{B}, \mu)$, $\varepsilon > 0$, and $p \in \mathbb{N}$. Then there are a measurable set $E \subset X$ and $q > p$ such that
(i) $p \le n_E(x) \le q$ for all $x \in E$;
(ii) if $x \in E$ and $n_E(x) > p$, then $\big|\int_X f \, d\mu - A_{n_E(x)} f(x)\big| < \varepsilon$;
(iii) $\mu\{x \in E : n_E(x) = p\} < \varepsilon/p$.
[Figure: $X = E \cup TE \cup \cdots \cup T^{q-1}E$ is decomposed into a tower whose height varies between $p$ and $q$ floors (columns from $n_E = p$ to $n_E = q$). The darkened part, the column $n_E = p$, has measure less than $\varepsilon$. Beginning at the other points of $E$, the ergodic averages of $f$ are close to $\int f \, d\mu$.]
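The columns of a tower over a set $E$ are the level sets of the first return time $n_E$. For a rotation and an interval they can be computed directly. The sketch below rests on illustrative assumptions: $Tx = x + \theta \bmod 1$, $E = [0, c)$, a grid of points in $E$, and a hypothetical function name. It computes $n_E$ and checks two classical facts that are not stated in the text. The first is Kac's formula $\int_E n_E \, d\mu = 1$ for ergodic $T$, which says the columns fill $X$. The second is the three-gap phenomenon: for an interval under a rotation, $n_E$ takes at most three values.

```python
import numpy as np

def first_return_times(theta, c, grid=20_000, max_steps=100_000):
    """n_E(x) = inf{n >= 1 : T^n x in E} at grid points x of E = [0, c),
    for T x = x + theta (mod 1)."""
    pos = (np.arange(grid) / grid) * c          # grid points in E
    times = np.zeros(grid, dtype=int)           # 0 means "not yet returned"
    for n in range(1, max_steps + 1):
        pos = (pos + theta) % 1.0
        times[(times == 0) & (pos < c)] = n
        if np.all(times > 0):
            break
    return times

theta, c = (np.sqrt(5) - 1) / 2, 0.1
times = first_return_times(theta, c)
columns = {int(i): float(np.mean(times == i)) * c for i in np.unique(times)}   # mu(E^i)
print(columns, np.mean(times) * c)
```

Here `columns` maps each column height $i$ to the measure of the base $E^i = \{x \in E : n_E(x) = i\}$. Summing $i \cdot \mu(E^i)$ reproduces Kac's formula.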
Proof: Assume that $\varepsilon < 1$. For each $n = 1, 2, \ldots$ let
$$G_n = \Big\{x \in X : \Big|\int_X f \, d\mu - A_k f(x)\Big| < \varepsilon \ \text{for } |k| \ge n\Big\}.$$
Then $G_1 \subset G_2 \subset \cdots$. By the Ergodic Theorem, and because on a set of finite measure a.e. convergence implies convergence in measure, $\mu(G_n) \to 1$. Choose $N > p$ with $\mu(G_N) > 1 - \varepsilon/p$, and choose $D \subset TG_N$ with $0 < \mu(D) < 1/N$. Let
$$E_0 = D \setminus \bigcup_{k=1}^{N} T^k D.$$
Then $\mu(E_0) > 0$. Otherwise, up to sets of measure 0,
$$D \subset \bigcup_{k=1}^{N} T^k D,$$
so that
$$T^{-1}D \subset D \cup TD \cup \cdots \cup T^{N-1}D,$$
and hence
$$T^{-1}(D \cup TD \cup \cdots \cup T^{N-1}D) \subset D \cup TD \cup \cdots \cup T^{N-1}D.$$
This is impossible because $T$ is ergodic and $D \cup TD \cup \cdots \cup T^{N-1}D$ has measure strictly between 0 and 1. Now form the tower decomposition of $X$ with respect to $E_0$; a certain choice procedure will be used to construct the set $E$. Let $E_0^i = \{x \in E_0 : n_{E_0}(x) = i\}$, for $i = 1, 2, \ldots$. Then
$$E_0^i \cup TE_0^i \cup \cdots \cup T^{i-1}E_0^i$$
forms the $i$th column of the tower. Start at the roof of the skyscraper, in a column of height at least $2N$, and count down $2N$ floors. When our choice procedure lands us above this boundary, nothing above the landing cell will be included in $E$. Start with a point $x \in E_0$, and look at its successive images under $T$, moving up the tower (but only up to the first point beyond the boundary level $n_{E_0}(x) - 2N$). Whenever we encounter a point of $G_N^c$, we include it in $E$ and skip $p$ steps; whenever we encounter a point of $G_N$, we include it in $E$ and skip $N$ steps. Thus $E_0 \subset E$. Moreover, if we are looking at $T^k x$ for some $x \in E_0$ and $k \le n_{E_0}(x) - 1$, then:
(a) if $T^k x \in G_N^c$, we put $T^k x \in E$ and $T^{k+1}x, \ldots, T^{k+p-1}x \notin E$,
and turn our attention to $T^{k+p}x$ (so long as $k + p \le n_{E_0}(x) - 1$);
(b) if $T^k x \in G_N$, we put $T^k x \in E$ and $T^{k+1}x, \ldots, T^{k+N-1}x \notin E$, and turn our attention to $T^{k+N}x$ (so long as $k + N \le n_{E_0}(x) - 1$);
(c) if $k > n_{E_0}(x) - 2N$, we put $T^k x \in E$ but $T^{k+1}x, \ldots, T^{n_{E_0}(x)-1}x \notin E$.
Now if $x \in E$, then one of the following holds:
(1) $x \in G_N$ and $n_E(x) = N$;
(2) $x \notin G_N$ and $n_E(x) = p$;
(3) $T^{m-1}x \in G_N$ for some $m$ with $N < m \le 2N$.
(The entire top floor of the tower must be a subset of $G_N$, since $D \subset TG_N$.) It is clear, then, that
$$p \le n_E(x) \le 2N \quad \text{for all } x \in E,$$
so (i) holds if we take $q = 2N$. To verify (ii), notice that if $x \in E$ and $n_E(x) > p$, then $n_E(x) \ge N$. If $n_E(x) = N$, then $x \in G_N$ and (ii) holds by definition of $G_N$. If $n_E(x) > N$, then $y = T^{n_E(x)-1}x \in G_N$ and
$$f(x) + f(Tx) + \cdots + f(T^{n_E(x)-1}x) = f(T^{-(n_E(x)-1)}y) + \cdots + f(T^{-1}y) + f(y),$$
so that $A_{n_E(x)} f(x) = A_{-n_E(x)} f(y)$, and again (ii) holds by the definition of $G_N$. Finally, if $x \in E$ and $n_E(x) = p$, then $x \in G_N^c$. This implies that
$$p\,\mu\{x \in E : n_E(x) = p\} \le p\,\mu(G_N^c) < p \cdot \frac{\varepsilon}{p} = \varepsilon,$$
so that (iii) is also true.

Lemma 4.11 Uniform Perturbation Lemma. Let $f: X \to \mathbb{R}$ be a simple (measurable) function (that is, $f$ assumes only finitely many values), $\varepsilon > 0$, and $m \in \mathbb{N}$. Then there is a simple function $g: X \to \mathbb{R}$ such that
(i) $\mu\{x : f(x) \ne g(x)\} < \varepsilon$;
(ii) $\big\|\int_X f \, d\mu - A_n g\big\|_\infty < \varepsilon$ for all large enough $n$;
(iii) $\{(g(x), g(Tx), \ldots, g(T^{m-1}x)) : x \in X\} \subset \{(f(x), f(Tx), \ldots, f(T^{m-1}x)) : x \in X\}$.
Proof: The function $F: X \to \mathbb{R}^m$ defined by
$$F(x) = (f(x), f(Tx), \ldots, f(T^{m-1}x))$$
assumes only finitely many values; call them $a_1, a_2, \ldots, a_r$. Then the sets
$F^{-1}\{a_1\}, \dots, F^{-1}\{a_r\}$ form a finite partition $\mathscr{P}$ of $X$, so by the Bellow–Furstenberg Theorem, $\limsup_{p \to \infty} \delta(\mathscr{P}, T^{-p}\mathscr{P}) > 0$. This means that there is an $\alpha$ with $0 < \alpha < 1$ for which there are infinitely many $p \in \mathbb{N}$ such that
(1) $\mu(F^{-1}\{a_i\} \cap T^{-p}F^{-1}\{a_j\}) > 0$ implies $\mu(F^{-1}\{a_i\} \cap T^{-p}F^{-1}\{a_j\}) \ge \alpha$.
Again, if
$$G_n = \Big\{x \in X : \Big|\int_X f\,d\mu - A_k f(x)\Big| < \varepsilon/2 \text{ for } k \ge n\Big\},$$
then $\mu(G_n) \to 1$, so we can find $N > m$ with $\mu(G_N) > 1 - \alpha$. Choose and fix $p > N$ for which (1) holds. Let
$$\mathscr{I} = \{(i, j) : 1 \le i, j \le r \text{ and } \mu(F^{-1}\{a_i\} \cap T^{-p}F^{-1}\{a_j\}) > 0\}$$
and
$$E_{ij} = F^{-1}\{a_i\} \cap T^{-p}F^{-1}\{a_j\} \quad \text{for } (i, j) \in \mathscr{I}.$$
Then $\{E_{ij} : (i, j) \in \mathscr{I}\}$ forms a finite partition of $X$ (up to a null set) into sets of measure at least $\alpha$. We must have, therefore,
$$\mu(G_N \cap E_{ij}) > 0 \quad \text{for all } (i, j) \in \mathscr{I}.$$
Thus for each $(i, j) \in \mathscr{I}$, it is possible to choose $y_{ij} \in G_N \cap E_{ij}$. This completes the hard part, for which the Bellow–Furstenberg Theorem is required.

Now the function $g$ is easily constructed by means of skyscraper architecture. Apply the Tower Lemma to $f$, $\varepsilon/2$, and $p$; we obtain an $E$ and $q$ satisfying (i), (ii) and (iii) of that Lemma. Refer to the Figure (p. 197). We will redefine $g$ only on the leftmost column of the tower, leaving $g = f$ elsewhere: if $E_p = \{x \in E : n_E(x) = p\}$, then $g$ will be equal to $f$ everywhere except maybe at some points of $E_p \cup TE_p \cup \dots \cup T^{p-1}E_p$. Thus (iii) of the Tower Lemma gives (i) of the present lemma. Now if $x \in E_p$, then $x \in E_{ij}$ for a unique $(i, j)$. We will redefine $g$ at $x$ and its images $Tx, T^2x, \dots, T^{p-1}x$ up the column by
$$(g(x), g(Tx), \dots, g(T^{p-1}x)) = (f(y_{ij}), f(Ty_{ij}), \dots, f(T^{p-1}y_{ij})).$$
Let $G(x) = (g(x), g(Tx), \dots, g(T^{m-1}x))$ for $x \in X$. In order to verify (iii), we have to show that $\operatorname{range}(G) \subset \operatorname{range}(F)$. It is clear, since $m < n$
The difficulty comes from the possibility that $x, Tx, \dots, T^{m-1}x$ slop over the top of the tower, i.e. that $m > n_E(x)$. We overcome this by showing that for each $x \in E$ there is $y \in X$ such that
$$(g(x), g(Tx), \dots, g(T^{m+n_E(x)-1}x)) = (f(y), f(Ty), \dots, f(T^{m+n_E(x)-1}y)).$$
Then $m$-step agreement, starting at any point in the tower, will follow from this agreement on each column plus $m$ steps, starting from a point in the base.

Suppose then that $x \in E$. There are several cases to consider, depending on where $x$ starts from in $E$ and where it first returns to $E$. Consider first the case when $x \in E_p$ (i.e., $n_E(x) = p$). Assume that $x \in E_{ij} = F^{-1}\{a_i\} \cap T^{-p}F^{-1}\{a_j\}$. Then $T^p x, T^p y_{ij} \in F^{-1}\{a_j\}$. If $x \in E_p$ and $T^p x \in E_p$, then
$$(g(x), g(Tx), \dots, g(T^{p-1}x)) = (f(y_{ij}), f(Ty_{ij}), \dots, f(T^{p-1}y_{ij})).$$
Now $T^p x \in F^{-1}\{a_j\}$; suppose $T^p x \in E_{jk} = F^{-1}\{a_j\} \cap T^{-p}F^{-1}\{a_k\}$. Then
$$(g(T^p x), \dots, g(T^{p+m-1}x)) = (f(y_{jk}), \dots, f(T^{m-1}y_{jk})) = a_j = (f(T^p y_{ij}), \dots, f(T^{p+m-1}y_{ij})),$$
since $T^p y_{ij} \in F^{-1}\{a_j\}$. Recall that $m < p$, so this case is settled. If $x \in E_p$ and $T^p x \notin E_p$, then
$$(g(T^p x), \dots, g(T^{p+m-1}x)) = (f(T^p x), \dots, f(T^{p+m-1}x)) = a_j = (f(T^p y_{ij}), \dots, f(T^{p+m-1}y_{ij})),$$
so that again we can take $y = y_{ij}$. If $x \notin E_p$ and $T^{n_E(x)}x \notin E_p$, then clearly $(g(x), \dots, g(T^{n_E(x)+m-1}x)) = (f(x), \dots, f(T^{n_E(x)+m-1}x))$. The only case left to consider, then, is when $x \notin E_p$ but $T^{n_E(x)}x \in E_p$. We have
$$(g(x), g(Tx), \dots, g(T^{n_E(x)-1}x)) = (f(x), f(Tx), \dots, f(T^{n_E(x)-1}x)).$$
Suppose that $T^{n_E(x)}x \in E_{ij}$. Then
$$(g(T^{n_E(x)}x), \dots, g(T^{n_E(x)+p-1}x)) = (f(y_{ij}), f(Ty_{ij}), \dots, f(T^{p-1}y_{ij})),$$
so that
$$(g(T^{n_E(x)}x), \dots, g(T^{n_E(x)+m-1}x)) = F(y_{ij}) = a_i = F(T^{n_E(x)}x) = (f(T^{n_E(x)}x), \dots, f(T^{n_E(x)+m-1}x)).$$
To verify (ii), note that from the Tower Lemma, if $x \in E \setminus E_p$ then
$$\Big|\int_X f\,d\mu - A_{n_E(x)} f(x)\Big| < \varepsilon/2.$$
The same conclusion, with $A_{n_E(x)}g(x)$ in place of $A_{n_E(x)}f(x)$, holds also for $x \in E_p$, because each $y_{ij} \in G_N$ and $p > N$. Because $g$ and the return time to $E$ are bounded, this observation is sufficient to yield (ii). Any average
$$A_n g(x) = \frac{1}{n}\sum_{k=0}^{n-1} g(T^k x)$$
can be broken up into blocks according to the entrances of $T^k x$ into $E$: choose $0 \le k_1 < k_2 < \dots < k_r \le n - 1$ to be the times $k$ in this range at which $T^k x \in E$; then
$$A_n g(x) = \frac{1}{n}\sum_{k=0}^{k_1-1} g(T^k x) + \frac{1}{n}\sum_{i=1}^{r-1}\sum_{k=k_i}^{k_{i+1}-1} g(T^k x) + \frac{1}{n}\sum_{k=k_r}^{n-1} g(T^k x).$$
The first and last sums each contain at most $q$ terms and therefore tend uniformly to 0 as $n \to \infty$. The central term may be written as
$$\frac{n_E(T^{k_1}x)}{n} A_{n_E(T^{k_1}x)} g(T^{k_1}x) + \dots + \frac{n_E(T^{k_{r-1}}x)}{n} A_{n_E(T^{k_{r-1}}x)} g(T^{k_{r-1}}x).$$
For large $n$, this expression approximates a convex combination of $r - 1$ terms, each of which is within $\varepsilon/2$ of $\int_X f\,d\mu$; the result must be within $\varepsilon$ of $\int_X f\,d\mu$.

D. Uniform polynomials

According to the preceding Lemma, any simple function $f$ can be changed on a set of arbitrarily small measure to obtain a nearly uniform simple function $g$ which runs through the same $m$-tuples of values as $f$. Next, given a polynomial $P$ we construct the simple $g$ so that $P \circ g$ is nearly uniform, and then we take a limit of such functions $g$.

Lemma 4.12 Let $f: X \to \mathbb{R}$ be simple, $\varepsilon > 0$, $m \ge 1$, and $P$ a polynomial in $m$ variables. Then there is a simple function $g: X \to \mathbb{R}$ such that
(i) $\mu\{x : f(x) \ne g(x)\} < \varepsilon$,
(ii) $\|\int_X P(g, gT, \dots, gT^{m-1})\,d\mu - A_n P(g, gT, \dots, gT^{m-1})\|_\infty < \varepsilon$ for $n \ge$ some $n_0$,
(iii) $\{(g(x), g(Tx), \dots, g(T^{m-1}x)) : x \in X\} \subset \{(f(x), f(Tx), \dots, f(T^{m-1}x)) : x \in X\}$.
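Proof: We will apply the Uniform Perturbation Lemma to a simple function $\tilde f$ obtained from $f$ as follows. Again let $F(x) = (f(x), f(Tx), \dots, f(T^{m-1}x))$, and suppose that the values assumed by $F$ are $a_1, a_2, \dots, a_r$. Choose distinct $d_1, \dots, d_r$ with
$$|P(a_i) - d_i| < \varepsilon/4 \quad \text{for all } i,$$
and define $\tilde f = d_i$ on $F^{-1}\{a_i\}$. Use the Uniform Perturbation Lemma to find a simple $\tilde g$ with
(a) $\mu\{x : \tilde f(x) \ne \tilde g(x)\} < \varepsilon$,
(b) $\|\int_X \tilde f\,d\mu - A_n \tilde g\|_\infty < \varepsilon/4$ for $n \ge n_0$,
(c) $\{(\tilde g(x), \dots, \tilde g(T^{m-1}x)) : x \in X\} \subset \{(\tilde f(x), \dots, \tilde f(T^{m-1}x)) : x \in X\}$.
We define
$$g = \text{first entry of } a_i \quad \text{on } \tilde g^{-1}\{d_i\}.$$
For (i), notice that if $\tilde f(x) = \tilde g(x) = d_i$, then $x \in F^{-1}\{a_i\}$ and $g(x) = f(x)$. Thus $\mu\{x : f(x) \ne g(x)\} < \varepsilon$. For (iii), given $x \in X$ there is $y_x \in X$ such that
$$(\tilde g(x), \dots, \tilde g(T^{m-1}x)) = (\tilde f(y_x), \dots, \tilde f(T^{m-1}y_x)).$$
Then for each $k = 0, 1, \dots, m-1$, $\tilde g(T^k x) = \tilde f(T^k y_x)$; thus if both are $d_i$, then $g(T^k x)$ is the first coordinate of $a_i$ while $T^k y_x \in F^{-1}\{a_i\}$, so that $a_i = (f(T^k y_x), \dots)$ and $g(T^k x) = f(T^k y_x)$. For (ii), given $x \in X$ find $y_x$ as above, so that $\tilde g(x) = \tilde f(y_x)$ and
$$(g(x), g(Tx), \dots, g(T^{m-1}x)) = (f(y_x), f(Ty_x), \dots, f(T^{m-1}y_x)).$$
Suppose that $y_x \in F^{-1}\{a_i\}$. Then $\tilde g(x) = \tilde f(y_x) = d_i$ and
$$|P(g(x), g(Tx), \dots, g(T^{m-1}x)) - \tilde g(x)| = |P(f(y_x), f(Ty_x), \dots, f(T^{m-1}y_x)) - d_i| = |P(a_i) - d_i| < \varepsilon/4.$$
Therefore
$$\tilde g - \varepsilon/4 < P(g, gT, \dots, gT^{m-1}) < \tilde g + \varepsilon/4$$
everywhere, and hence, using (b),
$$\Big\|A_n P(g, gT, \dots, gT^{m-1}) - \int_X \tilde f\,d\mu\Big\|_\infty < \varepsilon/2$$
for all large enough $n$. Now integrate the inequalities
$$\int_X \tilde f\,d\mu - \varepsilon/2 < A_n P(g, gT, \dots, gT^{m-1}) < \int_X \tilde f\,d\mu + \varepsilon/2$$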
to find that $|\int_X P(g, gT, \dots, gT^{m-1})\,d\mu - \int_X \tilde f\,d\mu| \le \varepsilon/2$. The triangle inequality gives
$$\Big\|\int_X P(g, gT, \dots, gT^{m-1})\,d\mu - A_n P(g, gT, \dots, gT^{m-1})\Big\|_\infty < \varepsilon$$
for all large $n$.

Lemma 4.13 There is a total function $f \in L^\infty(X)$ such that the (not necessarily closed) algebra generated by the functions $fT^n$, $n \in \mathbb{Z}$, consists entirely of uniform functions.

Proof: It is enough to find a total $f$ such that each monomial in $f, fT, \dots, fT^{m-1}$ is uniform. (Compose with a negative power of $T$ to obtain all monomials in the $fT^k$, $k \in \mathbb{Z}$.) Begin by writing a sequence of monomials $M_n: \mathbb{R}^n \to \mathbb{R}$,
$$M_n(t_1, \dots, t_n) = t_1^{e_1(n)} t_2^{e_2(n)} \cdots t_n^{e_n(n)},$$
in such a way that every possible monomial appears infinitely many times in this list: for any choice of nonnegative integers $e_1, \dots, e_k$, there are infinitely many $n$ with $M_n = t_1^{e_1} \cdots t_k^{e_k}$. (E.g., the list $1;\ 1, t_1, t_2;\ 1, t_1, t_2, t_1t_2, t_1^2t_2, t_1t_2^2, t_1^2t_2^2;\ \dots$ will do.) Use uniform continuity of each $M_m$ on $[0, 1]^m$ to choose $\varepsilon_{mn} < 1/2^n$ such that $|s_k - t_k| < \varepsilon_{mn}$ for $k = 1, 2, \dots, m$ implies
$$|M_m(s_1, \dots, s_m) - M_m(t_1, \dots, t_m)| < \frac{1}{2^{n+1}}.$$
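The only requirement on the list $M_1, M_2, \dots$ is that every monomial occur infinitely often, and that is easy to arrange mechanically. The sketch below uses an enumeration of our own, not the list in the text: at stage $s$ it lists every exponent vector $(e_1, \dots, e_s)$ with $0 \le e_i \le s$, so a fixed monomial $t_1^{e_1} \cdots t_k^{e_k}$ reappears (padded with zero exponents) at every stage $s \ge \max(k, e_1, \dots, e_k)$. The function names are hypothetical.

```python
from itertools import count, islice, product


def monomial_list():
    """Yield exponent tuples (e_1, ..., e_s), read as t_1**e_1 * ... * t_s**e_s.

    Stage s lists every exponent vector of length s with entries in 0..s, so a
    fixed monomial reappears at every stage s >= max(k, e_1, ..., e_k).
    """
    for s in count(1):
        yield from product(range(s + 1), repeat=s)


def strip_zeros(exps):
    """Drop trailing zero exponents, so (2, 1, 0) and (2, 1) name the same monomial."""
    exps = list(exps)
    while exps and exps[-1] == 0:
        exps.pop()
    return tuple(exps)


# Stages 1-4 contain 2 + 9 + 64 + 625 = 700 entries.
first_stages = list(islice(monomial_list(), 700))
```

For example, $t_1^2 t_2$ (exponents `(2, 1)`) occurs once in each of stages 2, 3 and 4, so three times among the first 700 entries, and once more in every later stage.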
Let us simplify the notation $M(\phi, \phi T, \dots, \phi T^{n-1})$ to $M(\phi)$. The aim is to set up Remark 4.5(3) about total functions by defining inductively total $f_n: X \to (0, 1]$ and $r_n \ge n$ such that
$$\mu\{x : f_{n+1}(x) \ne f_n(x)\} < \frac{1}{2^n}$$
and
(1) $$\Big\|\int_X M_m(f_n)\,d\mu - A_{r_m} M_m(f_n)\Big\|_\infty < 2\Big(\frac{1}{2^m} + \frac{1}{2^{m+1}} + \dots + \frac{1}{2^n}\Big) \quad \text{for } 1 \le m \le n \text{ and all } n.$$
Then 4.5(3) will give that $f_n \to f$ a.e., $f$ is total, and
(2) $$\Big\|\int_X M_m(f)\,d\mu - A_{r_m} M_m(f)\Big\|_\infty \le \frac{1}{2^{m-2}} \quad \text{for all } m.$$
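To begin, let $f_1: X \to (0, 1]$ be any total function and let $r_1 = 1$. If $n > 1$ and $f_1, \dots, f_{n-1}$ and $r_1, \dots, r_{n-1}$ have been chosen, let
$$p = n + \max_{1 \le m \le n-1} r_m \quad \text{and} \quad \varepsilon = \frac{1}{2}\min_{1 \le m \le n} \varepsilon_{mn}^2.$$
Since the simple functions are dense in $L^\infty(X)$, there is a simple measurable $g: X \to (0, 1]$ such that $\|g - f_{n-1}\|_\infty < \varepsilon$. By restricting attention to an invariant subset of full measure if necessary, we may assume that for each $b \in \mathbb{R}^p$ the set $\{x : (g(x), g(Tx), \dots, g(T^{p-1}x)) = b\}$ is either empty or has positive measure. Now apply Lemma 4.12 to $g$, $\varepsilon$, $p$, and $M_n$ (which may be considered a polynomial in $p$ variables). We obtain a simple function $h: X \to (0, 1]$ and an $r_n \ge n$ such that
(i) $\mu\{x : h(x) \ne g(x)\} < \varepsilon$,
(ii) $\|\int_X M_n(h)\,d\mu - A_r M_n(h)\|_\infty < \varepsilon$ for $r \ge r_n$,
(iii) $\{(h(x), h(Tx), \dots, h(T^{p-1}x)) : x \in X\} \subset \{(g(x), g(Tx), \dots, g(T^{p-1}x)) : x \in X\}$.
Next get ready to use Remark 4.5(4) about total functions: $f_{n-1}$ is total and
$$\|h - f_{n-1}\|_1 \le \mu\{x : h(x) \ne g(x)\} + \|g - f_{n-1}\|_\infty < \varepsilon + \varepsilon,$$
so there is a total $f_n: X \to (0, 1]$ such that
$$\|f_n - h\|_\infty < \sqrt{2\varepsilon} \quad \text{and} \quad \mu\{x : f_n(x) \ne f_{n-1}(x)\} < \sqrt{2\varepsilon} < \frac{1}{2^n}.$$
Note next that since $\sqrt{2\varepsilon} \le \varepsilon_{nn}$, we have
$$\|M_n(f_n) - M_n(h)\|_\infty < \frac{1}{2^{n+1}},$$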
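so that also $|\int_X M_n(f_n)\,d\mu - \int_X M_n(h)\,d\mu| < 1/2^{n+1}$; since $\varepsilon < 1/2^{n+1}$, (ii) gives
$$\Big\|\int_X M_n(f_n)\,d\mu - A_r M_n(f_n)\Big\|_\infty < \frac{1}{2^{n-1}} \quad \text{for } r \ge r_n,$$
proving (1) for the case $m = n$.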
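Suppose now that $1 \le m \le n - 1$. Since $p \ge n + r_m > m + r_m - 1$, the value $A_{r_m}M_m(h)(x)$ depends only on $(h(x), h(Tx), \dots, h(T^{p-1}x))$; hence, by (iii) and our assumption about $g$, for any constant $c$
$$\|A_{r_m} M_m(h) - c\|_\infty \le \|A_{r_m} M_m(g) - c\|_\infty.$$
Remembering that $\sqrt{2\varepsilon} \le \varepsilon_{mn}$ and $\varepsilon < \varepsilon_{mn}$, we see that
$$\|M_m(g) - M_m(f_{n-1})\|_\infty < \frac{1}{2^{n+1}} \quad \text{and} \quad \|M_m(f_n) - M_m(h)\|_\infty < \frac{1}{2^{n+1}}.$$
Thus
$$\begin{aligned}
\Big\|\int_X M_m(f_n)\,d\mu - A_{r_m}M_m(f_n)\Big\|_\infty
&\le \|A_{r_m}M_m(f_n) - A_{r_m}M_m(h)\|_\infty + \Big\|A_{r_m}M_m(h) - \int_X M_m(f_n)\,d\mu\Big\|_\infty \\
&\le \frac{1}{2^{n+1}} + \Big\|A_{r_m}M_m(g) - \int_X M_m(f_n)\,d\mu\Big\|_\infty \\
&\le \frac{1}{2^{n+1}} + \|A_{r_m}M_m(g) - A_{r_m}M_m(f_{n-1})\|_\infty + \Big\|A_{r_m}M_m(f_{n-1}) - \int_X M_m(f_{n-1})\,d\mu\Big\|_\infty \\
&\qquad + \Big|\int_X \big(M_m(f_{n-1}) - M_m(f_n)\big)\,d\mu\Big| \\
&< \frac{1}{2^{n+1}} + \frac{1}{2^{n+1}} + 2\Big(\frac{1}{2^m} + \dots + \frac{1}{2^{n-1}}\Big) + \mu\{x : f_n(x) \ne f_{n-1}(x)\} \\
&< 2\Big(\frac{1}{2^m} + \frac{1}{2^{m+1}} + \dots + \frac{1}{2^{n-1}} + \frac{1}{2^n}\Big)
\end{aligned}$$
by induction.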
Thus the $f_n$ and $r_n$ satisfying (1) are defined for all $n$, and $f_n$ converges a.e. to a total function $f$ satisfying (2). Given any monomial $M(f)$ in $f, fT, \dots, fT^{k-1}$, $M$ appears in our list as $M_m$ for infinitely many values of $m$. Therefore
$$\Big\|\int_X M(f)\,d\mu - A_{r_m}M(f)\Big\|_\infty \le \frac{1}{2^{m-2}}$$
for infinitely many $m$. This implies that $M(f)$ is uniform. For given $\varepsilon > 0$, choose $m$ so that $2^{-m+2} < \varepsilon/2$ and $R_0$ so that $2r_m\|M(f)\|_\infty/R_0 < \varepsilon/2$. Then for $R \ge R_0$ write $R = nr_m + q$ with $0 \le q < r_m$, and
$$A_R M(f) = \frac{1}{R}\sum_{j=0}^{n-1}\sum_{k=0}^{r_m-1} M(f)T^{k + jr_m} + \frac{1}{R}\sum_{k=0}^{q-1} M(f)T^{k + nr_m},$$
to see that
$$\Big\|\int_X M(f)\,d\mu - A_R M(f)\Big\|_\infty < \varepsilon.$$
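The splitting $R = nr_m + q$ is pure bookkeeping, and a short numerical sketch makes the two error terms visible. In the code below (names are ours), `values[k]` stands in for $M(f)(T^k x)$ and `c` for $\int_X M(f)\,d\mu$; `delta` is the largest deviation from `c` among the averages of the $n$ complete blocks of length $r$. The check is that $|A_R - c| \le \delta + 2r\sup|\text{values}|/R$, which assumes $|c| \le \sup|\text{values}|$, as holds for the integral of a bounded function.

```python
def average_split_check(values, r, c):
    """Return (|A_R - c|, delta + 2*r*sup/R) for R = len(values) = n*r + q.

    values[k] plays the role of M(f)(T^k x) and c the role of its integral;
    delta is the largest deviation from c among the n complete r-block averages.
    The bound assumes |c| <= sup|values|.
    """
    R = len(values)
    n, q = divmod(R, r)
    block_averages = [sum(values[j * r:(j + 1) * r]) / r for j in range(n)]
    delta = max((abs(b - c) for b in block_averages), default=0.0)
    sup = max(abs(v) for v in values)
    error = abs(sum(values) / R - c)
    return error, delta + 2 * r * sup / R


# A toy sequence whose 5-blocks average to 2 or 2.1, so delta <= 0.1 for c = 2.
sample = [k % 5 + 0.1 * ((k // 5) % 2) for k in range(103)]
err, bound = average_split_check(sample, 5, 2.0)
```

The first term of the bound comes from the complete blocks (each within `delta` of `c`), the second from the at most $r$ leftover terms and the slight mismatch between $nr/R$ and 1.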
E. Conclusion of the argument

It only remains now to prove the statement to which the Jewett–Krieger Theorem was reduced in Remark 4.6: given an ergodic system $(X, \mathscr{B}, \mu, T)$, there is a compact totally disconnected $D \subset [0, 1]$ and a total function $h: X \to D$ such that the algebra generated by $\{hT^n : n \in \mathbb{Z}\}$ consists entirely of uniform functions. Choose a total function $f: X \to (0, 1]$ such that the algebra generated by $\{fT^n : n \in \mathbb{Z}\}$ consists entirely of uniform functions (Lemma 4.13). Avoiding the at most countably many atoms of the measure $\mu f^{-1}$, choose a countable dense set $B \subset \mathbb{R}$ such that $\mu(f^{-1}B) = 0$. Use an old trick to construct a Cantor-type function: if $B = \{b_1, b_2, \dots\}$, let
$$\phi(t) = \sum_{b_k < t} \frac{1}{2^k}.$$
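A numerical sketch of the Cantor-type function makes the jumps concrete. As an assumption for illustration only, $B$ is taken to be the dyadic rationals in $(0, 1)$ listed by increasing denominator ($b_1 = 1/2$, $b_2 = 1/4$, $b_3 = 3/4$, ...), and the series is truncated to the first few $b_k$; the requirement in the text that $B$ avoid the atoms of $\mu f^{-1}$ plays no role in this toy. The helper names are hypothetical.

```python
from fractions import Fraction


def dyadic_enumeration(count):
    """First `count` dyadic rationals in (0, 1): 1/2, 1/4, 3/4, 1/8, 3/8, ..."""
    out, level = [], 1
    while len(out) < count:
        denom = 2 ** level
        for num in range(1, denom, 2):
            out.append(Fraction(num, denom))
            if len(out) == count:
                break
        level += 1
    return out


def phi(t, B):
    """Truncated Cantor-type function: the sum of 2^-k over the listed b_k < t."""
    return sum((Fraction(1, 2 ** k) for k, b in enumerate(B, start=1) if b < t),
               Fraction(0))


B = dyadic_enumeration(15)                      # b_1, ..., b_15 (denominators up to 16)
# Crossing b_1 = 1/2 raises phi by exactly 2^-1; no other listed b_k lies in between.
jump_at_b1 = phi(Fraction(1, 2) + Fraction(1, 10 ** 6), B) - phi(Fraction(1, 2), B)
```

Because every interval contains some $b_k$, $\phi$ is strictly increasing, and each jump of size $2^{-k}$ leaves a gap in the range; this is what makes the closure of the range totally disconnected.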
Then $D = \overline{\phi(\mathbb{R})} \subset [0, 1]$ is totally disconnected: $B$ is dense and at $b_k$ the strictly increasing function $\phi$ has a jump of $1/2^k$. Now $h = \phi \circ f$ is total, since $f$ is total and $\phi$ is strictly increasing (Remark 4.5(5)). Choose continuous $\phi_n$, $\psi_n$ with
$$0 \le \phi_n \uparrow \phi \quad \text{and} \quad 1 \ge \psi_n \downarrow \phi \quad \text{on } B^c.$$
Let $h_n = \phi_n \circ f$ and $i_n = \psi_n \circ f$. Then $h_n$ and $i_n$ are in the supremum norm closure of the algebra generated by $f$ and the constants (by the Stone–Weierstrass Theorem), and hence $h_n$ and $i_n$ are uniform (by Remark 4.4(1) concerning uniform functions). Then Remark 4.4(2) about uniform functions implies that $h$ is uniform. In the same way, any monomial
$$hT^{m_1} \cdot hT^{m_2} \cdots hT^{m_r}$$
is the increasing limit as $n \to \infty$ of monomials $(\phi_n \circ f)T^{m_1} \cdot (\phi_n \circ f)T^{m_2} \cdots (\phi_n \circ f)T^{m_r}$ and the decreasing limit as $n \to \infty$ of monomials $(\psi_n \circ f)T^{m_1} \cdot (\psi_n \circ f)T^{m_2} \cdots (\psi_n \circ f)T^{m_r}$, each of which is uniformly approximable by a polynomial in $fT^{m_1}, \dots, fT^{m_r}$. Thus
$$hT^{m_1} \cdot hT^{m_2} \cdots hT^{m_r} \in \mathrm{Alg}_r(f)$$
and hence is uniform.

Exercises
1. Construct a strictly ergodic subsystem of the space of bilateral sequences of 0s and 1s with the shift transformation which is metrically isomorphic to an irrational rotation of the unit circle.
2. Consider the following three m.p.t.s.
(a) The von Neumann–Kakutani adding machine transformation $T$ is defined on the unit interval as follows. Starting with the unit interval, at each stage cut in half vertically and stack the right half on top of the left half.
[Figure: the first stages of cutting the unit interval in half and stacking the right half on the left half.]
$T$ maps each interval isometrically to the one above it and at each stage is left momentarily undefined on the topmost interval. Here is another picture of the action of $T$.
[Figure: another picture of the action of $T$.]
(b) $\Pi = \{0, 1\}^{\mathbb{N}}$ is a compact topological group under 'addition with carry to the right':

      1 0 1 1 0 1 ...
    + 0 1 1 0 1 0 ...
    = 1 1 0 0 0 0 ...

Let $S\omega = \omega + (1, 0, 0, \dots)$ for $\omega \in \Pi$.
(c) Define a 'Toeplitz sequence' $\tau \in \{0, 1\}^{\mathbb{Z}}$ by entering a 0 at all even places, then entering a 1 at every other one of the remaining places (i.e., write 1, blank, 1, blank, ...), then a 0 at every other one, etc. Let $X$ be the orbit closure in $\{0, 1\}^{\mathbb{Z}}$ of $\tau$ under the shift $\sigma$. Show that $(\Pi, S)$ and $(X, \sigma)$ are strictly ergodic and metrically isomorphic to $([0, 1], T)$.
3. Show that the set of limit points of the forward orbit of the Morse sequence $0110\,1001\,1001\,0110 \dots$ (at each stage write down the 'dual' of what is at hand, where 0 and 1 are the duals of each other) under the shift is strictly ergodic.
4. Note that the Bellow–Furstenberg Theorem is clear for weakly mixing transformations.
5. (a) Show by example that if $(X, T)$ and $(Y, S)$ are uniquely ergodic topological systems (i.e. compact metric spaces with homeomorphisms), then $(X \times Y, T \times S)$ need not be uniquely ergodic. (b) If $\phi: (X, T) \to (Y, S)$ is a homomorphism, does unique ergodicity of $(X, T)$ imply that of $(Y, S)$? Conversely? (c) Let $T: X \to X$ be an ergodic m.p.t. on $(X, \mathscr{B}, \mu)$. Do the uniform functions in $L^\infty(X, \mathscr{B}, \mu)$ form either an algebra or a lattice? If not, what can be said about the algebra and the lattice that they generate?
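For readers who want to experiment with Exercises 2 and 3, the sketch below generates finite pieces of the objects involved. It rests on assumptions not in the text: sequences are one-sided and indexed from 0, elements of the group in 2(b) are truncated to finitely many coordinates (a carry out of the last kept coordinate is dropped), and the Toeplitz construction of 2(c) is carried out on the places $0, 1, 2, \dots$ only. The function names are ours.

```python
def add_with_carry(a, b):
    """Exercise 2(b): add two 0-1 sequences coordinatewise mod 2, carrying to the right.

    Only len(a) coordinates are kept, so a carry out of the last one is dropped.
    """
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s % 2)
        carry = s // 2
    return out


def toeplitz(length):
    """Exercise 2(c) on the places 0, ..., length - 1: write 0 at every other open
    place, then 1 at every other remaining open place, then 0, and so on."""
    seq = [None] * length
    symbol = 0
    while None in seq:
        open_places = [i for i, v in enumerate(seq) if v is None]
        for i in open_places[::2]:          # every other one of the remaining places
            seq[i] = symbol
        symbol = 1 - symbol
    return seq


def morse(n):
    """Exercise 3: the Morse block of length 2^n, built by appending the dual."""
    block = [0]
    for _ in range(n):
        block = block + [1 - v for v in block]
    return block
```

With these helpers the map $S$ of 2(b) is `add_with_carry(omega, [1] + [0] * (len(omega) - 1))`. On the places $k \ge 0$, the Toeplitz construction puts a 1 exactly where $k$ has an odd number of trailing 1s in binary.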
4.5. Two examples

Although in the sense of Baire category almost every m.p.t. is weakly mixing but not strongly mixing, concrete examples of such transformations were not easy to find. We will look at a modification of one such example constructed by Kakutani (1973). Our version (adapted from Petersen and Shapiro 1973; see Petersen 1973b) is a derivative rather than a primitive induced transformation of a group rotation and has the advantage that it can easily be shown not even to be topologically strongly mixing, so that it also provides an example of a minimal, uniquely ergodic cascade that is topologically weakly mixing but not topologically strongly mixing. Recall that for strictly ergodic cascades, the following implications hold:

(1) metric strong mixing ⇒ (2) metric weak mixing ⇒ (4) topological weak mixing,
(1) metric strong mixing ⇒ (3) topological strong mixing ⇒ (4) topological weak mixing.

This example shows that 2 ⇏ 1, 4 ⇏ 3, and 2 ⇏ 3. Other examples showing that 2 ⇏ 1 are due to Chacon (1969), Katok and Stepin (1967), and Dekking and Keane (1978). That 3 ⇏ 1 is seen in examples of Petersen (1970a) and, more recently, Dekking (1980), and that 4 ⇏ 2 in one by Kolmogorov (1953) (see Parry 1981). Finally, it seems that topological strong mixing can hold without metric weak mixing for uniquely ergodic cascades. The following way to see this emerged in discussions among myself, Brian Marcus, and Marina Ratner. Every continuous reparametrization of a horocycle flow $\{T_t\}$ is topologically strongly mixing (Marcus 1975). On the other hand, since it is loosely Bernoulli (Ratner 1978), the horocycle flow has a measurable reparametrization which is an irrational flow on a torus, hence not metrically weakly mixing. By a result of Ornstein and Smorodinsky (1978), a system metrically isomorphic to this one can be arrived at by a continuous reparametrization of the horocycle flow. Again by a result of Marcus (1975), $T_t$ is uniquely ergodic for a.e. $t$. For such a $t$, $T = T_t$ will have the desired properties. A different example has been found by Oren (personal communication). In fact Lehrer (not yet published) proved that the topological model in the Jewett–Krieger Theorem can always be made topologically strongly mixing.

The second example, due to Chacon (1969), is also uniquely ergodic and metrically weakly mixing but not metrically strongly mixing. Recently del Junco (1978) showed that this transformation is actually prime: it has no proper factors, or, equivalently, the $\sigma$-algebra of all measurable subsets contains no proper invariant sub-$\sigma$-algebras. The transformation first shown to be prime was constructed by Ornstein (1967). More recent work on prime transformations is in Rudolph (1979), Fieldsteel, del Junco,
Rahe and Swanson (1980) and del Junco (1981). For the topological versions, see Petersen (1969), Furstenberg, Keynes and Shapiro (1973), Keynes and Newton (1976).

A. Metric weak mixing without topological strong mixing

The basis of this example is the von Neumann–Kakutani 'adding machine' transformation $\phi: [0, 1) \to [0, 1)$. For $n = 0, 1, 2, \dots$, let $I_n = [1 - 1/2^n, 1 - 1/2^{n+1})$.

[Figure: the intervals $I_0, I_1, I_2, I_3, \dots$ of $[0, 1)$, accumulating at 1.]

We define
$$\phi(x) = x - 1 + \frac{1}{2^n} + \frac{1}{2^{n+1}} \quad \text{on } I_n,$$
so that $\phi$ slides $I_n$ to a symmetric position on the other side of $1/2$. Then $\phi$ is an ergodic m.p.t. with discrete spectrum, having eigenvalues $e^{2\pi i\lambda}$ with $\lambda$ any dyadic rational. The map $\phi$ is isomorphic to translation by a generator on the compact group of 2-adic integers. (See Exercise 2, 4.4.)

Proposition 5.1 For any $f \in L^2$, $f \circ \phi^{2^n} \to f$ in $L^2$.

Proof: For each dyadic rational $\lambda$, let $f_\lambda$ be the corresponding eigenfunction of modulus 1. Given $f \in L^2$, we have
$$f = \sum_\lambda a_\lambda f_\lambda \ \text{ in } L^2, \quad \text{where } a_\lambda = (f, f_\lambda).$$
Then
$$f(\phi^{2^n}x) = \sum_\lambda a_\lambda e^{2\pi i 2^n\lambda} f_\lambda(x),$$
so that
$$\|f \circ \phi^{2^n} - f\|_2^2 = \sum_\lambda |a_\lambda|^2\,|e^{2\pi i 2^n\lambda} - 1|^2.$$
Now $2^n\lambda \equiv 0 \bmod 1$ if $\lambda = p/2^k$ for $k \le n$ and $p$ odd, i.e. if $\lambda$ has rank no larger than $n$. Since $|e^{2\pi i 2^n\lambda} - 1| \le 2$ for all $n$, we see that
$$\|f \circ \phi^{2^n} - f\|_2^2 \le 4\sum_{\operatorname{rank}\lambda > n} |a_\lambda|^2 \to 0,$$
since the series $\sum_\lambda |a_\lambda|^2$ converges.
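The adding machine is easy to implement exactly with rational arithmetic. The sketch below (function names are ours) applies the defining formula on $I_n$ and checks two facts. First, the orbit of 0 runs through $0, 1/2, 1/4, 3/4, 1/8, \dots$, the binary digits of $k$ written in reverse, which reflects the isomorphism with adding 1 in the 2-adic integers. Second, $\phi^{2^n}$ never changes the first $n$ binary digits, so $|\phi^{2^n}x - x| < 2^{-n}$ for every $x$; this is the pointwise fact behind Proposition 5.1.

```python
from fractions import Fraction


def phi(x):
    """Adding machine: on I_n = [1 - 2^-n, 1 - 2^-(n+1)), phi(x) = x - 1 + 2^-n + 2^-(n+1)."""
    if not 0 <= x < 1:
        raise ValueError("phi is defined on [0, 1)")
    n = 0
    while x >= 1 - Fraction(1, 2 ** (n + 1)):   # find the n with x in I_n
        n += 1
    return x - 1 + Fraction(1, 2 ** n) + Fraction(1, 2 ** (n + 1))


def reversed_binary(k):
    """The number in [0, 1) whose binary digits are those of the integer k, reversed."""
    x, denom = Fraction(0), 1
    while k:
        k, bit = divmod(k, 2)
        denom *= 2
        x += Fraction(bit, denom)
    return x


def iterate(f, x, times):
    """Apply f to x the given number of times."""
    for _ in range(times):
        x = f(x)
    return x


orbit_of_zero = [iterate(phi, Fraction(0), k) for k in range(16)]
```

Exact fractions are used so that the intervals $I_n$ are tested without rounding; the same code works for non-dyadic inputs such as `Fraction(1, 3)`.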
Now let $A = \bigcup_{n=0}^\infty I_{2n}$. Clearly if $x \in A$ then either $\phi x \in A$ or $\phi^2 x \in A$, so that $A$ has the first-return decomposition $A = A_1 \cup A_2$, where
$$A_1 = (I_2 \cup I_4 \cup \dots) \cup \big(I_0 \cap \phi^{-1}(I_2 \cup I_4 \cup \dots)\big), \qquad A_2 = I_0 \cap \phi^{-1}(I_1 \cup I_3 \cup I_5 \cup \dots).$$
(Note that writing the intersections with $I_0$ here is actually redundant.) Let $\phi_A: A \to A$ be the induced (derivative) transformation: $\phi_A = \phi^i$ on $A_i$, $i = 1, 2$.

Proposition 5.2 $\phi_A$ is weakly mixing.

Proof: Suppose that $f: A \to \{z \in \mathbb{C} : |z| = 1\}$ is an eigenfunction of $\phi_A$ with eigenvalue $e^{4\pi i\lambda}$ (for convenience). Extend $f$ to $[0, 1)$ by letting $f(x) = e^{2\pi i\lambda} f(\phi^{-1}x)$ for $x \in A^c$. Then
$$f(\phi x) = \begin{cases} e^{4\pi i\lambda} f(x) & \text{on } A_1, \\ e^{2\pi i\lambda} f(x) & \text{on } A_2 \cup \phi A_2 = A_2 \cup A^c. \end{cases}$$
Thus we have $f(\phi x) = e^{2\pi i\lambda u(x)} f(x)$, where
$$u(x) = \begin{cases} 1 & \text{if } x \in A_2 \cup \phi A_2, \\ 2 & \text{if } x \in A_1. \end{cases}$$
If $u_n(x) = u(x) + u(\phi x) + \dots + u(\phi^{n-1}x)$ for $n \ge 1$, then
$$f(\phi^n x) = e^{2\pi i\lambda u_n(x)} f(x).$$
Consider the case when $n = 2^p$ and $p = 2q$ is even. Decompose $[0, 1)$ into $2^p$ equal subintervals. The map $\phi$ permutes these intervals cyclically, mapping all but the rightmost one by translation. Thus if $x \in [0, 1)$ is not a dyadic rational, the points $x, \phi x, \dots, \phi^{n-1}x$ are distributed one to each of these subintervals. We can tell on which of these intervals $u = 1$ and on which $u = 2$; we are uncertain only of $u(x') = u(x'')$, where $x'$ and $x''$ are those points of $\{x, \phi x, \dots, \phi^{n-1}x\}$ which are in the rightmost subintervals of $(0, 1/2)$ and $(1/2, 1)$.
Beginning at 0 and $1/2$ and moving to the right, we encounter $2^{p-1}$ subintervals (of length $1/2^p$) on which $u = 1$, then $2^{p-2}$ on which $u = 2$, $2^{p-3}$ on which $u = 1$, ..., 2 on which $u = 1$, and finally 2 on which $u$ is not constant. Thus
$$u_{2^p}(x) = 2^{p-1} + 2\cdot 2^{p-2} + 2^{p-3} + 2\cdot 2^{p-4} + \dots + 2\cdot 2^2 + 2^1 + 2u(x').$$
Hence
$$u_{2^p}(x) = M_q + 2 \quad \text{or} \quad M_q + 4, \qquad \text{where } M_q = 2^{2q-1} + 2\cdot 2^{2q-2} + \dots + 2\cdot 2^2 + 2^1.$$
Now
$$\mu\{x : u_{2^p}(x) = M_q + 2\} = \mu\{x : u(x'') = 1\} = \mu\{x : x'' \in A_2 \cup \phi A_2\} = \mu\{x : x'' \in \phi A_2\} = \mu\{x : x'' \in I_1 \cup I_3 \cup I_5 \cup \dots\}.$$
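The counting in this proof can be checked numerically for a small case. The sketch below is a check under our own assumptions, not part of the argument: it reimplements $\phi$, the set $A = I_0 \cup I_2 \cup \dots$ and the function $u$ ($u = 2$ on $A_1$, $u = 1$ on $A_2$ and on $A^c$), then evaluates $u_{2^p}$ for $p = 4$ ($q = 2$) on the 2048 odd multiples of $1/4096$. With $M_2 = 2^3 + 2\cdot 2^2 + 2^1 = 18$, the values should be 20 or 22, the first on about a third of the points. Points of the rightmost subinterval $[1 - 2^{-p}, 1)$ are skipped: there $x'' = x$, and $x'$ is the last point of the orbit segment rather than $\phi^{-1}x''$, so $u(x') = u(x'')$ can fail. That set has measure $2^{-p}$ and does not affect the limit in the proof.

```python
from fractions import Fraction


def interval_index(x):
    """The n with x in I_n = [1 - 2^-n, 1 - 2^-(n+1))."""
    n = 0
    while x >= 1 - Fraction(1, 2 ** (n + 1)):
        n += 1
    return n


def phi(x):
    """Adding machine on [0, 1)."""
    n = interval_index(x)
    return x - 1 + Fraction(1, 2 ** n) + Fraction(1, 2 ** (n + 1))


def in_A(x):
    """Membership in A = I_0 u I_2 u I_4 u ..."""
    return interval_index(x) % 2 == 0


def u(x):
    """u = 2 on A_1 (x and phi x both in A); u = 1 on A_2 and on the complement of A."""
    return 2 if in_A(x) and in_A(phi(x)) else 1


def u_sum(x, n):
    """u_n(x) = u(x) + u(phi x) + ... + u(phi^{n-1} x)."""
    total = 0
    for _ in range(n):
        total += u(x)
        x = phi(x)
    return total


p = 4                                                    # p = 2q with q = 2
M_q = sum(2 ** (p - j) * (2 if j % 2 == 0 else 1) for j in range(1, p))
grid = [Fraction(2 * k + 1, 2 ** 12) for k in range(2 ** 11)]
kept = [x for x in grid if x < 1 - Fraction(1, 2 ** p)]   # skip the rightmost subinterval
values = [u_sum(x, 2 ** p) for x in kept]
share_low = values.count(M_q + 2) / len(values)          # fraction with u_{2^p} = M_q + 2
```

On this grid the share is $42/128 \approx 0.328$, which approaches $1/3$ as the grid is refined.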
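The rightmost subinterval of our partition looks like this, where the desirable region for $x''$ is darkened:

[Figure: the subinterval $[1 - 1/2^p, 1)$ divided into $I_p, I_{p+1}, I_{p+2}, \dots$, with $I_{p+1}, I_{p+3}, \dots$ darkened.]

In $[1 - 1/2^p, 1)$, the darkened region has relative measure
$$\frac{1}{4} + \frac{1}{16} + \dots = \frac{1}{3},$$
so also $\mu\{x : u_{2^p}(x) = M_q + 2\} = 1/3$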
(sliding $x$ slides $x''$). By the preceding Proposition,
$$0 = \lim_{q \to \infty}\|f \circ \phi^{2^p} - f\|_2^2 = \lim_{q \to \infty}\int_0^1 |e^{2\pi i\lambda u_{2^p}(x)} - 1|^2\,dx = \lim_{q \to \infty}\Big(\frac{1}{3}|e^{2\pi i\lambda(M_q + 2)} - 1|^2 + \frac{2}{3}|e^{2\pi i\lambda(M_q + 4)} - 1|^2\Big).$$
Therefore $\lambda(M_q + 2) \to 0$ and $\lambda(M_q + 4) \to 0 \pmod 1$ as $q \to \infty$. Subtracting, $2\lambda \to 0 \bmod 1$. Hence $e^{4\pi i\lambda} = 1$, and $\phi_A$ is metrically weakly mixing.

The maps $\phi$ and $\phi_A$ can be realized as uniquely ergodic cascades. Define a map $\omega$ from $[0, 1)$ to $\prod_{k \in \mathbb{Z}}\{0, 1\}$ by
$$\omega(x)_k = \chi_{A^c}(\phi^k x).$$
Then $\phi$ on $[0, 1)$ is carried over to the shift $\sigma$ on $X = \overline{\mathcal{O}}(\omega(0))$. $(X, \sigma)$ is uniquely ergodic since any invariant measure must be carried to Lebesgue measure by this isomorphism. (See Exercise 2, 4.4.) Let $X_0 = \{\omega \in X : \omega_0 = 0\}$. Then $\phi_A$ is isomorphic to $\sigma_0: X_0 \to X_0$, where $\sigma_0$ is the 'shift to the next 0'. $(X_0, \sigma_0)$ is again uniquely ergodic.

Proposition 5.3 $(X_0, \sigma_0)$ is metrically weakly mixing but not topologically strongly mixing.

Proof: Since $\phi_A$ is metrically weakly mixing, so is $\sigma_0$. Let $B_r$ be the initial $2^r$-block of
$$\omega(0) = \dots 010001010100010 \dots$$
For $(X_0, \sigma_0)$ to be topologically strongly mixing, it must be the case that if $K_r$ is the set of places in $\omega(0)$ at which $B_r$ appears, then the difference set $K_r - K_r$ contains all integers from some point on. Let $p_r$ denote the number of 0s in $B_r$. We will prove that if $r$ is odd and $n > r \ge 3$, then there do not exist two appearances of $B_r$ in $\omega(0)$ separated by $p_n - 1$ places. Denote by $B_n^*$ the block which agrees with $B_n$ everywhere except in the last entry, which is 'dualized': 0 is changed to 1 and 1 to 0. Notice that $B_{n+1} = B_n B_n^*$ for all $n$. We establish a list of simple observations.

(1) $B_n$ appears at the $(k\cdot 2^{n+1} + 1)$st place in $\omega(0)$ for all $k \in \mathbb{Z}$.
Proof: The $2^{n+1}$-block appearing at this place is either $B_{n+1}$ or $B_{n+1}^*$.

(2) If $B_n$ appears at the $m$th place in $\omega(0)$, then $m = k\cdot 2^n + 1$ for some $k \in \mathbb{Z}$.
Proof: By induction. The statement is clearly true if $n = 0$. If $n \ge 1$, since $B_n = B_{n-1}B_{n-1}^*$, we have $m = k\cdot 2^{n-1} + 1$ for some $k$. If $k$ were odd, by (1) we would have both $B_{n-1}$ and $B_{n-1}^*$ appearing at the $((k+1)2^{n-1} + 1)$st place.

(3) (a) If $n$ is odd, $p_{n+1} = 2p_n + 1 = 1 + \sum_{i=0}^{n} p_i$.
(b) If $n$ is even, $p_{n+1} = 2p_n - 1 = \sum_{i=0}^{n} p_i$.
(c) For $r$ odd and $n > r \ge 3$,
$$p_n = \sum_{i=r}^{n-1} p_i + \begin{cases} p_r & \text{if } n \text{ is odd}, \\ p_r + 1 & \text{if } n \text{ is even}. \end{cases}$$
Proof: (a) and (b) are proved easily by induction, and then (c) follows.

(4) $K_r - K_r$ (the difference set of the appearances of $B_r$) is contained in the collection of all numbers of the form
$$\varepsilon_r p_r + \dots + \varepsilon_m p_m,$$
where $m \ge r$, each $\varepsilon_i$ is 0, 1, or $-1$, and $\varepsilon_m = 1$.
Proof: Because of (2), it is enough to show that each initial $k\cdot 2^r$-block of $\omega(0)$, for any $k = 1, 2, \dots, 2^n - 1$, contains
$$\delta_r p_r + \dots + \delta_{r+n-1}p_{r+n-1}$$
0s, for some choice of the $\delta_i$ each equal to 0 or 1. We use induction on $n$. The statement is clear for $n = 1$. Suppose now that it holds for $n$. If $k = 2^n$, there are $p_{n+r}$ 0s in the block in question (take $\delta_{r+n} = 1$ and all other $\delta_i = 0$). If $k$ is one of $2^n + 1, \dots, 2^{n+1} - 1$, then the initial $k\cdot 2^r$-block of $\omega(0)$ is as shown below:
$$\underbrace{B_{n+r}}_{p_{n+r} \text{ 0s}}\ \underbrace{B}_{\text{initial } (k - 2^n)2^r\text{-block of } \omega(0)}$$
By induction the block $B$ contains $\delta_r p_r + \dots + \delta_{r+n-1}p_{r+n-1}$ 0s for some choice of $\delta_r, \dots, \delta_{r+n-1}$. Thus all told there are $\delta_r p_r + \dots + \delta_{r+n-1}p_{r+n-1} + p_{n+r}$ 0s.

(5) If $r$ is odd and $n > r \ge 3$, then $p_n - 1$ is not a number of the form
$$a = \varepsilon_r p_r + \dots + \varepsilon_m p_m,$$
where $m \ge r$, each $\varepsilon_i$ is 0, 1, or $-1$, and $\varepsilon_m = 1$.
Proof: Note that if $m < n$, then
$$a \le \sum_{i=r}^{n-1} p_i = p_n - p_r \ \text{ or } \ p_n - p_r - 1 \ < p_n - 1.$$
Thus we may assume that $m \ge n$. The smallest number $a$ of the above form is
$$p_m - \sum_{i=r}^{m-1} p_i = p_r \ \text{ or } \ p_r + 1.$$
If for some $k$ with $n \le k \le m - 1$ we change $\varepsilon_k$ from $-1$ to 0, we will get $p_r + p_k$ or $p_r + p_k + 1$, each of which is greater than $p_n - 1$. Thus in order to achieve $a = p_n - 1$, we must keep all these $\varepsilon_k = -1$:
$$a = p_m - \sum_{i=n}^{m-1} p_i + \varepsilon_{n-1}p_{n-1} + \dots + \varepsilon_r p_r = \varepsilon_{n-1}p_{n-1} + \dots + \varepsilon_r p_r + \begin{cases} p_n & (m \text{ odd}), \\ p_n + 1 & (m \text{ even}). \end{cases}$$
It is enough, then, to show that
$$b = \varepsilon_{n-1}p_{n-1} + \dots + \varepsilon_r p_r$$
can never be $-1$ or $-2$. Let $\varepsilon_q$ be the first nonzero coefficient, so that $b = \varepsilon_q p_q + \dots + \varepsilon_r p_r$, and $q \le n - 1$. If $\varepsilon_q = 1$, then $b > 0$. Hence $\varepsilon_q = -1$. But then the largest possible $b$ is
$$-p_q + \sum_{i=r}^{q-1} p_i \le -p_r \le -5.$$
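The bookkeeping in observations (3) can be checked by machine for small $n$. The sketch below (names are ours) builds $B_n$ by the rule $B_{n+1} = B_n B_n^*$ from $B_0 = 0$, counts the 0s $p_n$, and compares $B_n$ with the first $2^n$ coordinates of $\omega(0)$. For those coordinates it uses the fact that $\omega(0)_k = 1$ exactly when $k$ has an odd number of trailing 1s in binary: $\phi^k(0)$ is the number whose binary digits are those of $k$ reversed, and it lies in $I_n$ exactly when $k$ has $n$ trailing 1s. This is a finite check, not a proof.

```python
def next_block(block):
    """B_{n+1} = B_n B_n^*, where B_n^* is B_n with its last entry dualized."""
    return block + block[:-1] + [1 - block[-1]]


def trailing_ones(k):
    """Number of trailing 1s in the binary expansion of k >= 0."""
    count = 0
    while k & 1:
        k >>= 1
        count += 1
    return count


blocks = [[0]]                                  # B_0 = 0
for _ in range(12):
    blocks.append(next_block(blocks[-1]))
p = [block.count(0) for block in blocks]        # p_n = number of 0s in B_n

# Coordinates 0, 1, 2, ... of omega(0): 1 exactly when phi^k(0) lies in some I_n with n odd.
omega0 = [trailing_ones(k) % 2 for k in range(2 ** 12)]
```

The first values are $p_0, \dots, p_6 = 1, 1, 3, 5, 11, 21, 43$; the tests below confirm (3)(a), (b) and, for $r = 3$ and $r = 5$, the identity (c) up to $n = 12$.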
B. A prime transformation

Like the Kakutani example, Chacon's can be constructed either as a strictly ergodic symbolic cascade or as a cutting and sliding around of subintervals of $[0, 1]$ which preserves Lebesgue measure. In the symbolic version of the construction, we define a sequence of blocks by
$$A_0 = 0, \quad A_1 = 0010, \quad A_2 = 0010\,0010\,1\,0010, \quad \dots, \quad A_{n+1} = A_n A_n 1 A_n.$$
The length of $A_n$ is $l(A_n) = l_n = \frac{1}{2}(3^{n+1} - 1)$. Let $\omega_1$ be a point of $\{0, 1\}^{\mathbb{Z}}$ whose initial $l_n$-block is $A_n$ for all $n$, and let $\Omega$ be a minimal subset of $\overline{\{\sigma^n\omega_1 : n = 0, 1, 2, \dots\}}$, where $\sigma$ is the shift. Then $\Omega$ consists of all those 0,1-sequences which contain only those blocks that appear in the $A_n$. Thus there is a point $\omega \in \Omega$ whose initial $l_n$-block, for each $n$, is $A_n$. We will show that $(\Omega, \sigma)$ is strictly ergodic, metrically weakly mixing, not metrically strongly mixing, and prime.

For symbolic cascades, unique ergodicity is easy to determine by counting frequencies of blocks. If $A$ and $B$ are blocks on the symbols 0 and 1, i.e. $A = a_0a_1\dots a_m$ and $B = b_0b_1\dots b_n$, where all the $a_i$ and $b_j$ are 0 or 1, denote by $N(A, B)$ the number of appearances of $A$ in $B$, i.e.
$$N(A, B) = \operatorname{card}\{j : 0 \le j \le n - m,\ b_jb_{j+1}\dots b_{j+m} = A\}.$$
Lemma 5.4 $(\overline{\mathcal{O}}(x), \sigma)$ is uniquely ergodic if and only if there is a
sequence of integers $k_n \to \infty$ and a sequence of blocks $A_m$ which appear in $x$ with $l(A_m) \to \infty$ such that the frequency of appearance of each $A_m$ in any $k_n$-block in $x$ tends uniformly to a constant as $n \to \infty$: for each $m$ there is a constant $\mu_m$ such that given $\varepsilon > 0$, there is an $n_0$ such that if $n \ge n_0$ and $B$ is any $k_n$-block which appears in $x$, then
$$\Big|\frac{N(A_m, B)}{k_n} - \mu_m\Big| < \varepsilon.$$
Proof: Suppose that $(\overline{\mathcal{O}}(x), \sigma)$ is uniquely ergodic with invariant measure $\mu$. For each block $A$, the cylinder set
$$[A] = \{y : y_0y_1\dots y_{l(A)-1} = A\}$$
is open and closed, hence its characteristic function is continuous. By 2.8,
$$\frac{1}{n}\sum_{k=0}^{n-1}\chi_{[A]}(\sigma^k y) \to \mu[A]$$
uniformly for $y \in \overline{\mathcal{O}}(x)$. The expression on the left-hand side is the frequency of $A$ in the initial $n$-block of $y$. Clearly this implies the condition in the statement of the Lemma. Suppose now that the condition is satisfied, and that $\nu$ and $\eta$ are different ergodic invariant Borel probability measures for $(\overline{\mathcal{O}}(x), \sigma)$. Since $\{\sigma^j[A_m] : j \in \mathbb{Z}, m = 1, 2, \dots\}$ generates the $\sigma$-algebra of all measurable sets, we must have $\nu[A_m] \ne \eta[A_m]$ for some $m$, which we now fix. Then for $\nu$-almost all $y \in \overline{\mathcal{O}}(x)$,
$$\frac{N(A_m, y_0y_1\dots y_{k_n-1})}{k_n} \to \nu[A_m],$$
by the Ergodic Theorem. Since $y_0y_1\dots y_{k_n-1}$ is a $k_n$-block which appears in $x$, though, $N(A_m, y_0y_1\dots y_{k_n-1})/k_n$ should be close to $\mu_m$ for large $n$. Hence $\nu[A_m] = \mu_m$, and similarly $\eta[A_m] = \mu_m$.

The following two simple observations will be very useful.

Lemma 5.5 (1) Every block in $\omega$ of length $l_n + 1$ contains the initial point of an appearance of $A_n$ in $\omega$.
(2) $A_n$ appears in $\omega$ only where we expect it (i.e. only at those places where it is explicitly and consciously written down during the above construction).
Proof: (1) is clear. As for (2), the statement is obvious for $A_0 = 0$, and we can check visually that $A_1$ appears only twice in $A_1A_1 = 00100010$
and in $A_1 1 A_1 = 0010\,1\,0010$. Then by induction $A_{n+1}$ appears only twice in $A_{n+1}A_{n+1}$ and in $A_{n+1}1A_{n+1}$, since otherwise $A_n$ would appear at an 'unexpected' place in one of these blocks.

Proposition 5.6 $(\Omega, \sigma)$ is uniquely ergodic.

Proof: Every $l_{n+m}$-block in $\omega$ appears either in $A_{n+m}A_{n+m}$ or in $A_{n+m}1A_{n+m}$, and every $l_{n+m}$-sub-block of each of these two blocks contains between $3^n - 1$ and $3^n$ appearances of $A_m$. Therefore, if $B$ is any $l_{n+m}$-block which appears in $\omega$, we have
$$\frac{3^n - 1}{l_{n+m}} \le \frac{N(A_m, B)}{l_{n+m}} \le \frac{3^n}{l_{n+m}}, \qquad \frac{3^n}{l_{n+m}} = \frac{2\cdot 3^n}{3^{n+m+1} - 1} \to \frac{2}{3}\cdot\frac{1}{3^m} \quad \text{as } n \to \infty.$$
Since the convergence is uniform in $B$, an application of the Lemma is sufficient to conclude the proof.

Remark 5.7 Let $\mu$ be the unique invariant probability measure on $(\Omega, \sigma)$. Then
$$\mu[A_m] = \frac{2}{3}\cdot\frac{1}{3^m} \quad \text{for } m = 0, 1, 2, \dots.$$
A system isomorphic to $(\Omega, \sigma, \mu)$ can be constructed by 'cutting and stacking'. Divide $[0, 1]$ into $[0, 2/3)$ and $[2/3, 1)$. The interval $[0, 2/3)$ forms the 'initial stack', and $[2/3, 1)$ forms the 'reservoir'.
At each stage we cut the stack on the left into thirds, which we place above one another (bottom to top corresponding to left to right), except that we interpose a subinterval of the reservoir, of the appropriate length, between the second and third thirds of the stack. Thus the next step is:

[Figure: the initial stack cut into thirds and restacked, with a piece of the reservoir inserted between the second and third thirds.]

Continue in this manner.
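The symbolic and the cutting-and-stacking descriptions can be compared directly. In the sketch below (helper names are ours), a stack is a list of levels from bottom to top, each level a pair (label, width) with label 0 for pieces of $[0, 2/3)$ and label 1 for pieces of the reservoir. One stage cuts every level into thirds, stacks the three substacks left to right, and puts one reservoir piece between the second and third. The tests check that reading the labels bottom to top gives $A_n$, that the stack has height $l_n = (3^{n+1} - 1)/2$ and covers measure $1 - 3^{-(n+1)}$, and, with `appearances` implementing $N(A, B)$, the counts used in Lemma 5.5 and Proposition 5.6 for small $n$ and $m$.

```python
from fractions import Fraction


def chacon_block(n):
    """A_0 = 0 and A_{k+1} = A_k A_k 1 A_k, as a string of 0s and 1s."""
    block = "0"
    for _ in range(n):
        block = block + block + "1" + block
    return block


def appearances(a, b):
    """N(A, B): the number of j with b[j:j+len(a)] == a (overlapping appearances count)."""
    return sum(1 for j in range(len(b) - len(a) + 1) if b[j:j + len(a)] == a)


def next_stack(stack):
    """One cutting-and-stacking step: thirds stacked left to right, spacer before the last.

    A stack is a list of (label, width) levels from bottom to top; label 0 marks
    pieces of [0, 2/3) and label 1 marks pieces of the reservoir [2/3, 1).
    """
    third = [(label, width / 3) for label, width in stack]
    spacer = (1, third[0][1])                   # all levels of a stack have equal width
    return third + third + [spacer] + third


stacks = [[(0, Fraction(2, 3))]]                # stage 0: the single level [0, 2/3)
for _ in range(6):
    stacks.append(next_stack(stacks[-1]))
```

The widths show where $\mu[A_m] = \frac{2}{3}\cdot 3^{-m}$ comes from: after $m$ stages each level has width $\frac{2}{3}\cdot 3^{-m}$, and the bottom level of the stack is the set of points whose coding begins with $A_m$.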
The transformation $T$ is defined by mapping each interval in a stack linearly to the one above it. Eventually $T$ is defined on all of $[0, 1]$ in this way. The isomorphism with $(\Omega, \sigma, \mu)$ arises from corresponding $[0, 2/3)$ with $[0]$ and $[2/3, 1]$ with $[1]$: to $x \in [0, 1]$ we assign the sequence $\{\chi_{[2/3, 1]}(T^k x)\}$.

The ideas of the Kakutani and Chacon constructions are similar: in each case a delay is inserted into a fairly regular (discrete spectrum) system. This is sufficient to produce weak mixing but not strong mixing.

Proposition 5.8 $(\Omega, \sigma, \mu)$ is weakly mixing.

Proof: Suppose that $f: \Omega \to \mathbb{C}$ is an eigenfunction with eigenvalue $e^{2\pi i\lambda}$. Then for large $n$, $f$ is approximately constant on each cylinder set
$$[A_n] = \{x \in \Omega : x_0x_1\dots x_{l_n-1} = A_n\}.$$
If $f$ actually were constant on such a set, we could find $x = \dots A_nA_n1A_n\dots \in [A_n]$, in which case we would have $\sigma^{l_n}x = \dots A_n1A_n\dots \in [A_n]$ and $\sigma^{2l_n+1}x \in [A_n]$. Then $f(\sigma^{l_n}x) = e^{2\pi il_n\lambda}f(x)$ and $f(\sigma^{2l_n+1}x) = e^{2\pi i(2l_n+1)\lambda}f(x)$, but these are both equal to $f(x)$, so we could conclude that
$$e^{2\pi il_n\lambda} = e^{2\pi i(2l_n+1)\lambda} = 1, \quad \text{and} \quad e^{2\pi i\lambda} = 1.$$
The actual argument applies Lusin's Theorem to find, given $\varepsilon > 0$, a set $F$ with $\mu(F) > 1 - \varepsilon$ such that if $f(x) = e^{2\pi i\theta(x)}$, then $\theta(x)$ is uniformly continuous on $F$. Find a cylinder set $E$ of the form $E = \sigma^k[A_n]$ such that $\mu(E \cap F) > (1 - \varepsilon)\mu(E)$ and the oscillation of $\theta$ on $E \cap F$ is less than $\varepsilon$. Then
$$\theta(\sigma^{l_n}x) = l_n\lambda + \theta(x), \qquad \theta(\sigma^{2l_n+1}x) = (l_n + 1)\lambda + \theta(\sigma^{l_n}x) \pmod 1,$$
so that
$$\theta(\sigma^{2l_n+1}x) - \theta(\sigma^{l_n}x) = \lambda + \theta(\sigma^{l_n}x) - \theta(x).$$
For small $\varepsilon$, there is an $x \in (E \cap F) \cap \sigma^{-l_n}(E \cap F) \cap \sigma^{-(2l_n+1)}(E \cap F)$ (say, $x \in \sigma^k[A_{n+1}]$); then both the left side and $\theta(\sigma^{l_n}x) - \theta(x)$ are small, so we must have $\lambda = 0$.
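Proposition 5.9 $(\Omega, \sigma, \mu)$ is not strongly mixing.

Proof: For any $k = 0, 1, 2, \dots$, the set $\sigma^{l_k}[A_k] \cap [A_k]$, which consists of all points of the form $\dots A_kA_k\dots$ (with the second $A_k$ starting at coordinate 0), has measure greater than or equal to that of $[A_{k+1}]$, namely $\frac{2}{3}\cdot\frac{1}{3^{k+1}} = \frac{1}{3}\mu[A_k]$. For each $k$, the sets in $\mathscr{P}_k = \{[A_k], \sigma[A_k], \dots, \sigma^{l_k-1}[A_k]\}$ are pairwise disjoint, and together with $\sigma^{l_k}[A_k]$ they cover $\Omega$. Moreover, for each $n$, $\bigcup_{k \ge n}\mathscr{P}_k$ generates the full $\sigma$-algebra of measurable sets. Therefore, given any measurable set $E$, any $\varepsilon > 0$, and any $n$, we can find a $k \ge n$ and finitely many integers $i_j \in \{0, 1, \dots, l_k - 1\}$ such that
$$\mu\Big(E \,\triangle\, \bigcup_j \sigma^{i_j}[A_k]\Big) < \varepsilon.$$
Then
$$\begin{aligned}
\mu(\sigma^{l_k}E \cap E) &\ge \mu\Big[\Big(\sigma^{l_k}\bigcup_j \sigma^{i_j}[A_k]\Big) \cap \Big(\bigcup_j \sigma^{i_j}[A_k]\Big)\Big] - 2\varepsilon \\
&\ge \sum_j \mu(\sigma^{l_k}\sigma^{i_j}[A_k] \cap \sigma^{i_j}[A_k]) - 2\varepsilon = \sum_j \mu(\sigma^{l_k}[A_k] \cap [A_k]) - 2\varepsilon \\
&\ge \frac{1}{3}\sum_j \mu[A_k] - 2\varepsilon \ge \frac{1}{3}[\mu(E) - \varepsilon] - 2\varepsilon.
\end{aligned}$$
This is inconsistent with strong mixing if $\mu(E)$ and $\varepsilon$ are small enough.

Proposition 5.10 (del Junco 1978) $(\Omega, \sigma, \mu)$ is metrically prime: if $\phi: \Omega \to Y$ is a measurable map to a system $(Y, S, \nu)$ such that $\mu\phi^{-1} = \nu$ and $\phi\sigma = S\phi$ a.e., then either $\phi$ is one-to-one a.e. or $\phi$ is constant a.e.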
Thus the system $(\Omega, \sigma, \mu)$ has no proper factors; equivalently, the $\sigma$-algebra of measurable subsets of $\Omega$ admits no proper invariant sub-$\sigma$-algebras. Some preliminary discussion is in order before we make the combinatorial observations about $\omega$ that constitute the actual proof. First, we may assume that $Y \subset \{0, 1\}^{\mathbb{Z}}$. For if we choose a set $E \subset Y$ with $0 < \nu(E) < 1$ and define
$$\psi(x)_k = \chi_E(\phi(\sigma^k x)) \quad \text{for } x \in \Omega$$
and $\lambda(G) = \mu(\psi^{-1}G)$ for measurable $G \subset \{0, 1\}^{\mathbb{Z}}$, then $\psi: (\Omega, \sigma, \mu) \to (\{0, 1\}^{\mathbb{Z}}, \sigma, \lambda)$ will be a nontrivial homomorphism if $\phi$ is. Second, we will employ the $\bar{d}$ distance between sequences of 0s and 1s, which is defined as follows: if $x, y \in \{0, 1\}^{\mathbb{Z}}$, then
$$\bar{d}_{i,j}(x, y) = \frac{1}{i + j + 1}\operatorname{card}\{k : -i \le k \le j,\ x_k \ne y_k\}$$
and
$$\bar{d}(x, y) = \limsup_{i,j \to \infty}\bar{d}_{i,j}(x, y).$$
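Thus $\bar{d}(x, y)$ measures the average frequency of the disagreements between $x$ and $y$. We have $\bar{d}(\sigma x, \sigma y) = \bar{d}(x, y)$ and $\bar{d}(x, z) \le \bar{d}(x, y) + \bar{d}(y, z)$, so that $\bar{d}$ is an invariant pseudometric.

A map $\psi: \{0, 1\}^{\mathbb{Z}} \to \{0, 1\}^{\mathbb{Z}}$ is called a finite code if it commutes with the shift and there is an $n$ such that $\psi(x)_0$ depends only on $x_k$ for $-n \le k \le n$. The smallest such $n$ is denoted by $|\psi|$. Thus the finite codes are exactly the shift-commuting continuous maps, or endomorphisms, of $\{0, 1\}^{\mathbb{Z}}$. (Quite a lot is known about such maps; see Hedlund 1969.)

Given our measurable map $\phi: \Omega \to \{0, 1\}^{\mathbb{Z}}$ and $\varepsilon > 0$, we can find a finite code $\psi$ such that $\bar{d}(\phi(x), \psi(x)) < \varepsilon$ a.e. For let $\mathcal{A}_i^j$ denote the $\sigma$-algebra generated by the coordinates $x_k$, $i \le k \le j$. We can choose an $n$ and a (cylinder) set $B \in \mathcal{A}_{-n}^{n}$ such that if $C = \{x : \phi(x)_0 = 1\}$, then $\mu(B \triangle C) < \varepsilon$. Define $\psi$ by
$$\psi(x)_k = \chi_B(\sigma^k x).$$
Then $\phi(x)_k \ne \psi(x)_k$ if and only if $\sigma^k x \in B \triangle C$, so that
$$\bar{d}(\phi(x), \psi(x)) = \limsup_{i,j \to \infty}\frac{1}{i + j + 1}\sum_{k=-i}^{j}\chi_{B \triangle C}(\sigma^k x) = \mu(B \triangle C) < \varepsilon \quad \text{a.e.},$$
by the Ergodic Theorem and ergodicity of $(\Omega, \sigma, \mu)$.

Our goal is to prove that if $\phi: \Omega \to Y$ is the given homomorphism, and $\phi$ is not one-to-one a.e., then $\bar{d}(\phi(x), \phi(\sigma x)) = 0$ a.e. For then, as above, we will have for a.e. $x$ that
$$0 = \bar{d}(\phi(x), \phi(\sigma x)) = \limsup_{i,j \to \infty}\frac{1}{i + j + 1}\operatorname{card}\{k \in [-i, j] : \phi(x)_k \ne \phi(x)_{k+1}\} = \limsup_{i,j \to \infty}\frac{1}{i + j + 1}\sum_{k=-i}^{j}\chi_D(\sigma^k\phi(x)),$$
where $D = \{z \in \{0, 1\}^{\mathbb{Z}} : z_0 \ne z_1\}$. Choosing $x$ so that $\phi(x)$ is $\nu$-generic, we see that $\nu(D) = 0$. Thus $\phi(x)_0 = \phi(x)_1$ a.e. $d\mu$, and $\phi$ is constant a.e. on $\Omega$.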
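To make the definitions concrete, here is a finite-window sketch of $\bar{d}_{i,j}$ and of a finite code. Modelling points of $\{0, 1\}^{\mathbb{Z}}$ as Python functions from integers to $\{0, 1\}$ is an assumption of this illustration, and the names `dbar_window`, `shift` and `finite_code` are hypothetical. A single large window only approximates the $\limsup$ in the definition of $\bar{d}$.

```python
def dbar_window(x, y, i, j):
    """d-bar_{i,j}(x, y): fraction of k in [-i, j] with x_k != y_k.

    Points of {0,1}^Z are modelled (by assumption) as functions int -> {0, 1}.
    """
    return sum(x(k) != y(k) for k in range(-i, j + 1)) / (i + j + 1)


def shift(x):
    """The shift sigma: (sigma x)_k = x_{k+1}."""
    return lambda k: x(k + 1)


def finite_code(rule, n):
    """Sliding-block code psi(x)_k = rule((x_{k-n}, ..., x_{k+n})).

    Such a psi commutes with the shift, and psi(x)_0 depends only on x_{-n..n}.
    """
    def psi(x):
        return lambda k: rule(tuple(x(k + m) for m in range(-n, n + 1)))
    return psi


if __name__ == "__main__":
    x = lambda k: 1 if k % 3 == 0 else 0                 # the periodic point ...100100...
    majority = finite_code(lambda w: int(sum(w) >= 2), 1)
    # psi commutes with the shift (checked on a finite range of coordinates)
    assert all(majority(shift(x))(k) == shift(majority(x))(k) for k in range(-20, 20))
    print(dbar_window(x, shift(x), 300, 300))            # close to 2/3
    print(dbar_window(x, majority(x), 300, 300))         # close to 1/3
```

Here the majority rule is an arbitrary code with $|\psi| = 1$; for this $x$ it sends every window to 0, so $\bar{d}(x, \psi(x))$ is the frequency of 1s in $x$.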
Proof of the Proposition: Suppose that $\phi: (\Omega, \sigma, \mu) \to (\{0, 1\}^{\mathbb{Z}}, \sigma, \nu)$ is not one-to-one a.e. Let $\varepsilon > 0$. Choose a finite code $\psi$ such that $\bar{d}(\phi(x), \psi(x)) < \varepsilon$ a.e. Then
$$\bar{d}(\phi(x), \phi(\sigma x)) \le \bar{d}(\phi(x), \psi(x)) + \bar{d}(\psi(x), \psi(\sigma x)) + \bar{d}(\psi(\sigma x), \phi(\sigma x)) < \bar{d}(\psi(x), \psi(\sigma x)) + 2\varepsilon,$$
and if $\phi(x) = \phi(y)$ then $\bar{d}(\psi(x), \psi(y)) < 2\varepsilon$. Let us define $\psi$ on finite blocks $B$ by agreeing that $\psi(B)$ is the initial $l(B)$-block of the image under $\psi$ of the point $\dots 000.B000\dots$. For a block $A$, let $\overline{A}$ and $\underline{A}$ denote the block $A$ with its first (respectively last) element deleted. We will show that $\psi(\overline{A}_n)$ and $\psi(\underline{A}_n)$ differ in only a small percentage of places for infinitely many $n$. In fact, if $q_n$ is the number of places in which $\psi(\overline{A}_n)$ and $\psi(\underline{A}_n)$ differ, we will show that, for infinitely many $n$,
$$\frac{q_{n-1}}{l_{n-1}} < 36\varepsilon.$$
For this purpose, it is enough to show that if $\phi(x) = \phi(y)$, then for infinitely many $n$
$$\bar{d}(\psi(x), \psi(y)) \ge \frac{1}{18}\,\frac{q_{n-1}}{l_{n-1}}.$$
Suppose that $\phi(x) = \phi(y)$. We may assume that $y$ is not in the orbit of $x$, since for each $k \ne 0$ the invariant set $\{x : \phi(x) = \phi(\sigma^k x)\}$ must have measure 0; otherwise $\phi = \phi\sigma^k$ a.e. for some $k$, and $\sigma$ would have a non-constant eigenfunction. We may also assume that $x$ and $y$ are not in the orbit of any point which is of the form $\dots .A_n\dots$ for every $n$ or $\dots A_n.\dots$ for every $n$, since the sets
$$\bigcap_n [A_n] \quad\text{and}\quad \bigcap_n \sigma^{l_n}[A_n]$$
both have measure 0. Notice that a point which is in $[A_n]$ for infinitely many $n$ is actually in $[A_n]$ for all $n$. Thus for large $n$ the central coordinates of $x$ and $y$ are each contained in blocks $A_{n+1}$. Note that because of (2) of the Lemma (which says that $A_{n+1}$ does not overlap itself in $x$ and $y$), for each $n$, $x$ and $y$ can be resolved uniquely into sequences on the two symbols $A_{n+1}$ and 1. Let us call the appearance of $A_{n+1}$ in $x$ which contains $x_0$ the central $A_{n+1}$ in $x$, even though it may not be exactly centered. Define $k_n(x) = 1$, 2, or 3 according to whether $x_0$ is contained in the first, second, or third $A_n$ which comprises $A_{n+1}$. Let $E_n = \{z : k_n(z) = 1 \text{ or } 3\}$. The cutting and stacking construction allows us to compute $\mu(E_n \cap E_{n+1} \cap \dots \cap E_m)$. For consider the stack present at the $(n+1)$th stage.
[Figure: the stack at the $(n+1)$th stage, showing the levels where $k_n = 1$, the levels where $k_n = 3$, and a spacer.]
When this stack is cut into thirds along the dotted lines and restacked, with a spacer between the second and third chunks, exactly 2/3 of the part where $k_n = 1$ or 3 falls into the new first and third chunks, i.e. the set where $k_{n+1} = 1$ or 3. Continuing to cut into thirds and stack (with spacers), we see that
$$\mu(E_n \cap E_{n+1} \cap \dots \cap E_m) = \left(\tfrac{2}{3}\right)^{m-n}\mu(E_n).$$
Therefore, for each $n$,
$$\mu\Big(\bigcap_{k \ge n} E_k\Big) = 0,$$
and it follows that with probability 1, $k_n = 2$ for infinitely many $n$. We assume that our $x$ and $y$ are elements of this set of full measure.

We will show now that there are infinitely many $j_0$ such that $k_{j_0}(x) \ne k_{j_0}(y)$ and the central $A_{j_0}$'s of $x$ and $y$ overlap at least $l_{j_0-1}$ places. (See del Junco, Rahe and Swanson 1980.) The idea is that the spacers (1s) in the central $A_{j_0+1}$'s of $x$ and $y$ should be far enough apart that, using a sort of differential delay in $x$ and $y$, we will be able to find $\overline{A}_{j_0-1}$ and $\underline{A}_{j_0-1}$ in one sequence facing the same block $B$ in the other.

Fix an $n_0$ such that $k_{n_0}(x) = 2$. There is a $j > n_0$ such that $k_j(x) \ne k_j(y)$. For otherwise, since $k_{n_0+1}(x) = k_{n_0+1}(y)$, there is an $i$ with $|i| \le l_{n_0+1}$ such that $\sigma^i x$ and $y$ agree on a block of length $l_{n_0+1}$ which includes the center of $y$. Then $k_{n_0+2}(x) = k_{n_0+2}(y)$ implies that $\sigma^i x$ and $y$ agree on a block of length $l_{n_0+2}$ which includes the center of $y$, and so on. Hence either $\sigma^i x$ and $y$ are different and positively or negatively asymptotic (a possibility that we have excluded by discarding the orbits of the sets $\bigcap_n [A_n]$ and $\bigcap_n \sigma^{l_n}[A_n]$), or else $\sigma^i x = y$, which is also not allowed.

Let $j_0$ be the smallest $j > n_0$ such that $k_j(x) \ne k_j(y)$. I claim that the central $A_{j_0}$'s of $x$ and $y$ overlap at least $l_{j_0-1}$ places. Suppose first that $j_0 = n_0 + 1$. Then the central position falls in the second $A_{n_0}$ of the central $A_{n_0+1}$ in $x$ and in either the first or third $A_{n_0}$ of the central $A_{n_0+1}$ in $y$, so the central $A_{n_0+1}$'s in $x$ and $y$ must overlap at least $l_{n_0} = l_{j_0-1}$ places:
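[Figure: the central $A_{n_0+1}$'s of $x$ and $y$, each made up of blocks $A_{n_0}$, aligned at the center; two alternative configurations are shown.]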
The other possibility is that $j_0 > n_0 + 1$. Then $k_{j_0-1}(x) = k_{j_0-1}(y)$, so that the central $A_{j_0}$'s in $x$ and $y$ overlap at least $2l_{j_0-1}$ places.

[Figure: the central $A_{j_0}$'s of $x$ and $y$, decomposed into blocks $A_{j_0-1}$, aligned at the center.]
Fix a $j_0$ such that $k_{j_0}(x) \ne k_{j_0}(y)$ and the central $A_{j_0}$'s of $x$ and $y$ overlap at least $l_{j_0-1}$ places. We will show now that for such $j_0$, we can find within the set of places occupied by the central $A_{j_0+1}$ of $x$ the blocks $\overline{A}_{j_0-1}$ and $\underline{A}_{j_0-1}$ (in one of $x$ or $y$) facing the same block $B$ (in the other). Because of symmetry, there are only three cases to consider.

(1) Suppose $k_{j_0}(x) = 1$ and $k_{j_0}(y) = 2$.
[Figure for case (1): the central $A_{j_0}$'s of $x$ and $y$ near the center, showing blocks $A_{j_0-1}$ of one sequence facing a common block $B$ of the other.]
(2) Suppose $k_{j_0}(x) = 2$ and $k_{j_0}(y) = 3$.

[Figure for case (2): the central $A_{j_0}$'s of $x$ and $y$ near the center, decomposed into blocks $A_{j_0-1}$.]
(3) Suppose $k_{j_0}(x) = 1$ and $k_{j_0}(y) = 3$.

[Figure for case (3): the central $A_{j_0+1}$ of $y$, with two candidate blocks $B_1$ and $B_2$ in $y$ facing blocks $A_{j_0-1}$ of $x$.]

The central $A_{j_0+1}$ of $y$ is followed either by $1A_{j_0+1}$ or by $A_{j_0+1}$. If it is followed by $1A_{j_0+1}$, use $B_1$; if it is followed by $A_{j_0+1}$, use $B_2$.
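Thus if $\psi(\overline{A}_{j_0-1})$ and $\psi(\underline{A}_{j_0-1})$ differ in $q_{j_0-1}$ places, then (allowing for edge effects) $\psi(x)$ and $\psi(y)$ differ in at least $q_{j_0-1} - 4|\psi|$ places in $[-l_{j_0+1}, l_{j_0+1} - 1]$. Therefore
$$2\varepsilon > \bar{d}(\psi(x), \psi(y)) \ge \limsup_{j_0 \to \infty}\frac{q_{j_0-1} - 4|\psi|}{2l_{j_0+1}} = \limsup_{j_0 \to \infty}\frac{1}{2}\,\frac{q_{j_0-1} - 4|\psi|}{9l_{j_0-1} + 4} = \frac{1}{18}\limsup_{j_0 \to \infty}\frac{q_{j_0-1}}{l_{j_0-1}},$$
so that $q_{j_0-1}/l_{j_0-1} < 36\varepsilon$ for infinitely many $j_0$. Apply this observation to the central $A_j$'s of a typical $z \in \Omega$, for which the limit used to compute $\bar{d}(\phi(z), \phi(\sigma z))$ exists, to see that
$$\begin{aligned}
\bar{d}(\phi(z), \phi(\sigma z)) &= \lim_{i,j \to \infty}\bar{d}_{i,j}(\phi(z), \phi(\sigma z)) \\
&\le \liminf_{i,j \to \infty}\big[\bar{d}_{i,j}(\phi(z), \psi(z)) + \bar{d}_{i,j}(\psi(z), \psi(\sigma z)) + \bar{d}_{i,j}(\psi(\sigma z), \phi(\sigma z))\big] \\
&\le 2\limsup_{i,j \to \infty}\bar{d}_{i,j}(\phi(z), \psi(z)) + \liminf_{i,j \to \infty}\bar{d}_{i,j}(\psi(z), \psi(\sigma z)) \\
&< 2\varepsilon + 36\varepsilon = 38\varepsilon.
\end{aligned}$$
Therefore $\bar{d}(\phi(z), \phi(\sigma z)) = 0$ a.e., and the proof is complete.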
5 Entropy
The concept of entropy was invented by Clausius in 1854; Shannon carried it over to information theory in 1948 and Kolmogorov to ergodic theory in 1958. In each setting entropy is a measure of randomness or disorder. The importance of entropy in ergodic theory arises from its usefulness in connection with the isomorphism problem for measure-preserving transformations. There are two major theorems concerning ergodic-theoretic entropy. The first, due to Kolmogorov and Sinai, says that the full entropy of a transformation can be computed by finding its entropy with respect to a generator. This makes possible the actual computation of entropy in a variety of cases and leads to the conclusion that Bernoulli schemes of different entropies are not isomorphic; in this way Kolmogorov and Sinai settled in the negative the old question of whether or not $\mathcal{B}(\frac{1}{2}, \frac{1}{2})$ and $\mathcal{B}(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ are isomorphic. The second, due to Ornstein in 1970, says that for Bernoulli schemes entropy is a complete invariant: two Bernoulli schemes are isomorphic if and only if they have the same entropy. We will give a brief introduction to this subject in Sections 6.4 and 6.5. Entropy is a sort of categorical concept, with versions of it existing in group theory, graph theory, and other areas. We begin with a brief look, for background, at the idea of entropy in the three fields in which it plays especially important roles.
5.1. Entropy in physics, information theory, and ergodic theory

A. Physics

Rudolf Clausius created the thermodynamical concept of entropy in 1854. He coined the actual term 'entropy' fourteen years later, from the Greek roots for 'transformation content'. When a system changes from state $\Sigma_1$ to state $\Sigma_2$, the change in entropy is given by the curve integral
$$\Delta S = \int_{\Sigma_1}^{\Sigma_2}\frac{dQ}{T}$$
in state space, where $Q$ is the heat content of the system and $T$ is its temperature. Later Boltzmann showed that (after normalization) the entropy $S$ of a system is proportional to the logarithm of the relative probability of its state:
$$S = k\log P,$$
where $k$ is Boltzmann's constant. By choosing the state so that $S$ (or, equivalently, $P$) is maximized, one can derive, for example, the Maxwell-Boltzmann velocity distribution for an ideal gas. Let us consider a simple illustration. Suppose that an isolated system consists of $N$ identical molecules, each of which can occupy any one of a number of states $E_1, E_2, \dots$ in phase (position-momentum) space. Consider a state of the system in which there are $N_i$ molecules in state $E_i$ ($i = 1, 2, \dots, k$). The number of ways to achieve such a distribution is
$$\binom{N}{N_1}\binom{N - N_1}{N_2}\cdots\binom{N - (N_1 + N_2 + \dots + N_{k-1})}{N_k} = \frac{N!}{N_1!\,N_2!\cdots N_k!}.$$
This number is proportional to the relative probability of the state, and so for large $N$ the entropy of the state is approximately proportional to (using Stirling's estimate $\log n! \approx n\log n - n$)
$$\log N! - \sum_i \log N_i! \approx N\log N - \sum_i N_i\log N_i.$$
If $p_i = N_i/N$ is the probability of a given molecule being in state $E_i$, then
$$S \approx N\log N - \sum_i N_i\log N_i = N\log N - N\sum_i p_i(\log p_i + \log N) = -N\sum_i p_i\log p_i,$$
and the entropy per particle is
$$-\sum_i p_i\log p_i. \qquad (1)$$
Suppose that the state $E_i$ has energy $\varepsilon_i$ associated with it. Then maximizing $S$ under the constraints $\sum_i p_i\varepsilon_i = \varepsilon$ (fixed total energy) and $\sum_i p_i = 1$ yields the Maxwell-Boltzmann distribution, in which
$$p_i = \frac{e^{-\varepsilon_i/kT}}{\sum_j e^{-\varepsilon_j/kT}}.$$
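Both steps of this illustration can be checked numerically. The sketch below, with hypothetical function names, uses natural logarithms (the physics convention, with Boltzmann's constant absorbed into the units) and arbitrarily chosen occupation numbers and energy levels. It compares $\log\frac{N!}{N_1!\cdots N_k!}$ with $-N\sum_i p_i\log p_i$ as $N$ grows, and checks that moving away from the Maxwell-Boltzmann weights along a direction that preserves both constraints lowers $-\sum_i p_i\log p_i$.

```python
import math


def log_multinomial(counts):
    """Natural log of N! / (N_1! ... N_k!), computed with lgamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)


def entropy(p):
    """-sum p_i log p_i (natural logarithm), with 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def gibbs(energies, kT):
    """Maxwell-Boltzmann weights p_i = exp(-e_i/kT) / sum_j exp(-e_j/kT)."""
    w = [math.exp(-e / kT) for e in energies]
    z = sum(w)
    return [wi / z for wi in w]


if __name__ == "__main__":
    for scale in (10, 1000, 100000):
        counts = [3 * scale, 5 * scale, 2 * scale]
        n = sum(counts)
        stirling = n * entropy([c / n for c in counts])
        print(n, log_multinomial(counts) / stirling)   # the ratio tends to 1

    # d keeps both sum p_i and sum p_i e_i fixed when added to p.
    e = [0.0, 1.0, 2.0]
    p = gibbs(e, kT=1.0)
    d = [e[1] - e[2], e[2] - e[0], e[0] - e[1]]
    for t in (-0.01, 0.01):
        q = [pi + t * di for pi, di in zip(p, d)]
        assert entropy(q) < entropy(p)
```

With three levels, the vector $d = (\varepsilon_2 - \varepsilon_3, \varepsilon_3 - \varepsilon_1, \varepsilon_1 - \varepsilon_2)$ satisfies $\sum_i d_i = \sum_i d_i\varepsilon_i = 0$, so $p + td$ still meets both constraints for small $t$.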
The high-probability, high-entropy states are pictured as ones which achieve a high degree of disorder and randomness. They also exhibit a sort of statistical attractiveness for other states; see our discussion of the Recurrence Theorem. In fact, according to the Second Law of Thermodynamics the entropy of a developing isolated system does not decrease. If we start in a relatively low probability state, say with a warm brick next to a cold one, the two-brick system will move to the maximal-entropy state in which both bricks have the same temperature. (Of course there will always be very slight fluctuations of the temperatures of the two bricks about their exact mean; and there is a positive probability of a fluctuation large enough to recreate the original state. Note, however, that laws of physics talk about what does happen and not about what may happen.) When the bricks reach thermal equilibrium, the system is somehow more disorganized than at first, and there has been some loss of information. Thus there is a connection between the concepts of entropy in physics and in information theory.
B. Information theory

There is concrete justification for the feeling that increased entropy, hence increased disorder and randomness, is associated with a decrease in available information and an increase in uncertainty. Suppose a gas molecule moves freely in a box which is divided into two equal halves by a partition; then at any time we know which half of the box the molecule is in. Suppose now that we allow the partition to slide under the impacts of the molecule, as the temperature of the molecule is kept constant by contact with a heat reservoir. Since the molecule does work in pushing back the wall, it must draw heat from the reservoir. The heat content of the system increases and so does its entropy:
$$\Delta S = \frac{\Delta Q}{T}.$$
The increase in entropy corresponds to a loss of information, since we no longer have as precise an idea of where the molecule is at any moment.
There has been an increase in uncertainty together with the increase in disorder.

In 1948 Claude Shannon initiated the mathematical study of information transmission. The (initial) key ideas are that (1) the amount of information transmitted can be measured by the amount of uncertainty removed, and (2) the uncertainty regarding a transmission is measured by the expected value of the negative logarithm of the probability of the symbol received. Let us be specific. Consider a source, which is thought of as a ticker-tape or teletype emitting a string of symbols $\dots x_0x_1x_2\dots$, where each $x_i$ is an element of a finite alphabet $A = \{a_1, a_2, \dots, a_n\}$. Suppose the probability of receiving $a_i$ is $p_i$ for each $i = 1, \dots, n$. If each symbol is transmitted independently of what has come before, we can take as a measure of the amount of information transmitted per symbol on the average (i.e., perhaps over many transmissions) the quantity
$$H = -\sum_{i=1}^{n}p_i\log_2 p_i. \qquad (2)$$
(We use logarithms to the base 2 because usually information is measured in terms of binary bits, counting the number of yes-no questions that must be answered in order to convey it.) Four remarks suggest the appropriateness of the quantity $H$.

(1) Compare with formula (1) above.

(2) If some $p_i = 1$ while all other $p_j = 0$, then there is no uncertainty at all about what the source will do, and so there is no information being conveyed. In this case, under the convention $0\log 0 = \lim_{x \to 0^+}x\log_2 x = 0$, we also have $H = 0$.

(3) $H$ is a maximum when all $p_i = 1/n$. To see this, consider the function $f$ defined on $[0, 1]$ by
$$f(t) = \begin{cases} 0 & \text{if } t = 0, \\ -t\log_2 t & \text{if } 0 < t \le 1. \end{cases}$$
The function $f$ is continuous, nonnegative, and concave downward.
From Jensen's Inequality it follows that for any $t_1, \dots, t_n \in [0, 1]$,
$$\frac{1}{n}\sum_{i=1}^{n}f(t_i) \le f\Big(\frac{1}{n}\sum_{i=1}^{n}t_i\Big).$$
Thus for any probability vector $(p_1, p_2, \dots, p_n)$,
$$\frac{1}{n}\sum_{i=1}^{n}f(p_i) \le f\Big(\frac{1}{n}\sum_{i=1}^{n}p_i\Big) = f\Big(\frac{1}{n}\Big) = -\frac{1}{n}\log_2\frac{1}{n} = \frac{1}{n}\log_2 n,$$
so $H(p) \le \log_2 n$. But clearly $H(1/n, 1/n, \dots, 1/n) = \log_2 n$. This case, when all symbols are equally likely, is the one in which there is the most uncertainty about the output of the source and hence the one in which the source as it functions conveys the maximum amount of information.

(4) $H$ is the average of the information conveyed by the receipt of each symbol, if we take the information content of an event $E$ to be $-\log_2 P(E)$. This is a reasonable choice since the only measure of information of the form $f(P(E))$, where $f: (0, 1] \to [0, \infty)$ is continuous, which is additive for independent events,
$$f(P(E_1 \cap E_2)) = f(P(E_1)P(E_2)) = f(P(E_1)) + f(P(E_2)),$$
is $-\log_r P(E)$ for some $r > 1$. For another justification of the definition of $H$, see Khintchine (1957).

For a general source, the probability of a given symbol's being received may depend on what has been received beforehand. For example, if the ticker tape is putting out English prose, the probability of receiving a 'u' will rise dramatically each time a 'q' comes through. Such effects can be taken into account in the calculation of average information per symbol by grouping into blocks. For each $k = 1, 2, \dots$, let $\mathcal{C}_k$ be the family of all blocks of length $k$ on the symbols $a_1, a_2, \dots, a_n$. Each block $C$ of $\mathcal{C}_k$ has a certain probability $P(C)$ (possibly 0) of being received. The average information per symbol in a transmission of length $k$ is then
$$H_k = -\frac{1}{k}\sum_{C \in \mathcal{C}_k}P(C)\log_2 P(C). \qquad (3)$$
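For a source that emits independent symbols, every block probability factors, and formula (3) gives $H_k = H$ for every $k$. The sketch below, with hypothetical names `H` and `block_entropy_iid`, evaluates (2) and (3) by brute force over all $n^k$ blocks. It also illustrates remarks (2) and (3) on the value of $H$. It enumerates blocks, so it is only meant for small $n$ and $k$.

```python
import itertools
import math


def H(p):
    """Formula (2): -sum p_i log2 p_i, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def block_entropy_iid(p, k):
    """Formula (3) for a source emitting independent symbols with law p.

    Sums -P(C) log2 P(C) over all n**k blocks C of length k, then divides by k.
    """
    total = 0.0
    for block in itertools.product(range(len(p)), repeat=k):
        prob = math.prod(p[i] for i in block)
        if prob > 0:
            total -= prob * math.log2(prob)
    return total / k


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    print(H(p), math.log2(len(p)))                        # 1.5, which is below log2 3
    print([block_entropy_iid(p, k) for k in (1, 2, 3)])   # each equals 1.5 up to rounding
```

For a source with memory, such as English prose, the blocks do not factor and $H_k$ typically decreases with $k$.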
We define the entropy of the source itself to be
$$h = \lim_{k \to \infty} -\frac{1}{k}\sum_{C \in \mathcal{C}_k}P(C)\log_2 P(C). \qquad (4)$$
The limit can be proved to exist (Proposition 2.10), and it represents the average amount of information which the source conveys with every symbol it prints. Equivalently, $h$ measures the average degree of uncertainty the reader of the ticker tape has about what the next symbol will be, given whatever he or she has received so far. By letting people guess the next letter in samples of English prose, having seen 0 or 1 or 2 or ... immediately preceding letters, Shannon estimated the entropy of typical English prose to be about 1 bit per letter (cf. Proposition 2.12). This is rather low compared to the entropy $\log_2 26$ of a source printing out the 26 letters independently and with equal probabilities. The reason for this is that English prose is extremely redundant. The rules of word formation, grammar and logic all restrict the number of combinations that actually occur and thereby decrease our uncertainty about what is coming next. Of course this redundancy is essential for the practical functioning of language, since it allows for the correction of errors, lapses in attention, etc. Sources putting out poetry or mathematical proofs, for example, will have higher entropy but will also suffer more disastrously if an occasional misprint occurs.

C. Ergodic theory
The source discussed above was nothing but a finite-state stationary process. Outputs of the source are infinite strings $\dots x_0x_1x_2\dots$ of symbols of an alphabet $A = \{a_1, a_2, \dots, a_n\}$, that is to say points of the sequence space $A^{\mathbb{Z}}$. If we assume that the source is indeed stationary, so that the probability of receiving any string does not change with time, then there is a measure on $A^{\mathbb{Z}}$ which is preserved by the shift transformation $\sigma: A^{\mathbb{Z}} \to A^{\mathbb{Z}}$. If $\mu$ is the product measure generated by the weights $p_i$ for $a_i$, then we are dealing with the Bernoulli shift $\mathcal{B}(p_1, p_2, \dots, p_n)$; this is the case when the symbols are transmitted independently of what has gone before. In any case, in calculating the entropy of the source according to formula (4), we have found the 'entropy' of a measure-preserving system $(A^{\mathbb{Z}}, \sigma, \mu)$.

Given any measure-preserving system $(X, \mathscr{B}, \mu, T)$, it is easy to associate many finite-state stationary processes. Just take a partition
$$\alpha = \{A_1, A_2, \dots, A_n\}$$
of $X$ into finitely many measurable sets (the $A_i$ are pairwise disjoint, have positive measure, and cover $X$, all up to sets of measure 0). Each point $x \in X$ produces a single output of the ticker-tape as follows: at time $j$ the tape prints out symbol $A_i$ if $T^jx \in A_i$. Let us define the entropy of a partition $\alpha = \{A_1, A_2, \dots, A_n\}$ to be
$$H(\alpha) = -\sum_{i=1}^{n}\mu(A_i)\log_2\mu(A_i)$$
(cf. (2)). The appearance at the $j$th place on the tape of a block of symbols $A_{i_1}A_{i_2}\dots A_{i_k}$ corresponds to the entry of $T^jx$ into the set
$$\bigcap_{m=1}^{k}T^{-m+1}A_{i_m}.$$
This set is an element of the partition $\alpha \vee T^{-1}\alpha \vee \dots \vee T^{-k+1}\alpha$, which is the least common refinement of the partitions $\alpha, T^{-1}\alpha, \dots, T^{-k+1}\alpha$.

Definitions: If $\alpha = \{A_1, \dots, A_n\}$ and $\beta = \{B_1, \dots, B_m\}$, then $T^{-1}\alpha$ is the partition $\{T^{-1}A_i : i = 1, \dots, n\}$, $\alpha \vee \beta$ is the partition $\{A_i \cap B_j : i = 1, \dots, n,\ j = 1, \dots, m\}$, and $\beta$ is a refinement of $\alpha$, written $\beta \ge \alpha$, if each $B_j$ is, up to a set of measure 0, a subset of some $A_i$.
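These definitions can be tried out on a finite probability space. In the sketch below, the encoding is an assumption of the illustration: a partition is stored as a list of cell labels (`labels[x]` names the cell containing the point `x`), and $T$ is a list of images. The function names are hypothetical. For the rotation of four equally weighted points, the joins $\alpha \vee T^{-1}\alpha \vee \dots$ cannot refine beyond singletons, so $\frac{1}{k}H(\alpha \vee \dots \vee T^{-k+1}\alpha) \to 0$.

```python
import math
from collections import defaultdict

# Finite model (an assumption of this sketch): X = {0, ..., m-1} with weights mu[x],
# a map T given as a list (T[x] is the image of x), and a partition given by
# labels[x] = name of the cell of alpha containing x.


def cell_measures(labels, mu):
    out = defaultdict(float)
    for lab, m in zip(labels, mu):
        out[lab] += m
    return dict(out)


def H(labels, mu):
    """H(alpha) = -sum over cells A of mu(A) log2 mu(A)."""
    return -sum(m * math.log2(m) for m in cell_measures(labels, mu).values() if m > 0)


def join(a, b):
    """alpha v beta: the cell of x is the pair (alpha(x), beta(x))."""
    return list(zip(a, b))


def preimage(labels, T):
    """T^{-1} alpha: x lies in T^{-1}A exactly when T(x) lies in A."""
    return [labels[T[x]] for x in range(len(labels))]


def refines(b, a):
    """True if beta >= alpha, i.e. every cell of beta lies inside one cell of alpha."""
    seen = {}
    return all(seen.setdefault(v, u) == u for u, v in zip(a, b))


def iterated_join(labels, T, k):
    """alpha v T^{-1} alpha v ... v T^{-k+1} alpha."""
    result, current = [(lab,) for lab in labels], labels
    for _ in range(k - 1):
        current = preimage(current, T)
        result = [r + (c,) for r, c in zip(result, current)]
    return result


if __name__ == "__main__":
    mu, T, alpha = [0.25] * 4, [1, 2, 3, 0], [0, 0, 1, 1]   # rotation of 4 points
    for k in (1, 2, 3, 8):
        print(k, H(iterated_join(alpha, T, k), mu) / k)      # H_k / k = 1, 1, 2/3, 1/4
```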
Thus the quantity $H_k$ in (3) is
$$\frac{1}{k}H(\alpha \vee T^{-1}\alpha \vee \dots \vee T^{-k+1}\alpha),$$
and the entropy of the source defined by (4) is
$$h(\alpha, T) = h_\mu(\alpha, T) = \lim_{k \to \infty}\frac{1}{k}H(\alpha \vee T^{-1}\alpha \vee \dots \vee T^{-k+1}\alpha).$$
The quantity $h(\alpha, T)$ (which we will soon prove exists) is called the entropy of the transformation $T$ with respect to the partition $\alpha$. The number $h(\alpha, T)$ is a measure of the average uncertainty per unit time we have about which element of the partition $\alpha$ the point $x$ will enter next (as it is moved by $T$), given its preceding history. Of course if $\alpha$ is foolishly chosen, there may not be much uncertainty: telling which cells of $\alpha$ the points $T^jx$ actually land in may not convey much information that could not have been guessed a priori. Therefore the entropy of the transformation $T$ is defined to be the maximal uncertainty over all the finite-state processes associated with $T$:
$$h(T) = h_\mu(T) = \sup_\alpha h(\alpha, T).$$
The number $h(T)$ is a measure of our average uncertainty about where $T$ moves the points of $X$. The size of $h(T)$ reflects the 'randomness' of $T$ and the degree to which $T$ disorganizes the space. Clearly $h(T)$ is an isomorphism invariant of $T$. More generally, if $S$ is a factor of $T$, then $h(S) \le h(T)$ (because each partition downstairs determines one of equal entropy upstairs, but not necessarily conversely). Thus in fact entropy is an invariant of weak isomorphism, where we say that $S$ and $T$ are weakly isomorphic if each is a factor of the other. (Only recently Steve Polit (1974) gave an example of two weakly isomorphic transformations that are not isomorphic. Another example has been given by Rudolph (1979).)

We may also think of a partition $\alpha$ as the collection of possible outcomes of an experiment (e.g. reading the most recent symbol printed out by a ticker-tape or rolling dice). The number $H(\alpha)$ is a measure of our (expected) uncertainty about the outcome of the experiment or, equivalently, of the amount of information that is gained by performing the experiment. The common refinement (or join) $\alpha \vee \beta$ represents the compound experiment formed by performing the experiments $\alpha$ and $\beta$ simultaneously. Thus
$$\frac{1}{k}H(\alpha \vee T^{-1}\alpha \vee \dots \vee T^{-k+1}\alpha)$$
is the average amount of information per repetition obtained from $k$ repetitions of the experiment $\alpha$, $h(\alpha, T)$ is the time average of the information content of the experiment $\alpha$, and $h(T)$ is the maximum information per repetition that can be obtained from any experiment, so long as $T$ is used to advance the time (i.e., to develop the system). Is it possible to design an experiment so cleverly that it extracts all the information that is to be had, i.e., so that $h(\alpha, T) = h(T)$?
It seems that if the partitions $T^i\alpha$ ($i \in \mathbb{Z}$) together generate the full $\sigma$-algebra $\mathscr{B}$, which is to say that $\alpha$ is a generator, this might just be the case, since then the outcomes of any experiment could be described to any desired degree of accuracy in terms of some finite-length compound of $\alpha$. Kolmogorov (1958) defined the entropy of $T$ to be $h(\alpha, T)$ if $T$ has a generator $\alpha$ and $\infty$ otherwise; the preceding definition is due to Sinai (1959a); the equivalence of the two is what is usually called the Kolmogorov-Sinai Theorem, which we will prove below. It provides the chief tool for the computation of entropy and permits us to show that, for example, $\mathcal{B}(\frac{1}{2}, \frac{1}{2})$ and $\mathcal{B}(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ are not isomorphic. Before proving this theorem we need to establish a few properties of entropy and conditional entropy and to tie up the loose end concerning the existence of the limit
$$h(\alpha, T) = \lim_{k \to \infty}\frac{1}{k}H(\alpha \vee T^{-1}\alpha \vee \dots \vee T^{-k+1}\alpha).$$

5.2. Information and conditioning
In the preceding section we saw that $-\log_2\mu(E)$ is a reasonable measure of the information content of the announcement of the occurrence of an event $E$. Let $\alpha = \{A_1, A_2, \dots\}$ be a countable partition of $X$ into measurable sets. (For convenience we will deal sometimes with countable partitions, sometimes with finite ones.) For each $x \in X$, denote by $\alpha(x)$ the element of $\alpha$ to which $x$ belongs. Then the information function associated to $\alpha$ is defined to be
$$I_\alpha(x) = -\log_2\mu(\alpha(x)) = -\sum_{A \in \alpha}\log_2\mu(A)\,\chi_A(x),$$
so that $I_\alpha(x)$ takes the constant value $-\log_2\mu(A)$ on the cell $A$ of $\alpha$. $I_\alpha(x)$ measures the gain in information when we learn to which element of $\alpha$ the point $x$ belongs. Clearly
$$H(\alpha) = -\sum_{A \in \alpha}\mu(A)\log_2\mu(A) = \int_X I_\alpha(x)\,d\mu(x).$$
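On a finite space the identity $H(\alpha) = \int_X I_\alpha\,d\mu$ is just a weighted sum. The sketch below, with hypothetical names and a partition encoded (by assumption) as a list of cell labels, computes $I_\alpha$ pointwise and integrates it.

```python
import math


def information(labels, mu):
    """I_alpha(x) = -log2 mu(alpha(x)) for each point x of a finite space."""
    cell = {}
    for lab, m in zip(labels, mu):
        cell[lab] = cell.get(lab, 0.0) + m
    return [-math.log2(cell[lab]) for lab in labels]


def entropy_as_integral(labels, mu):
    """H(alpha) computed as the integral of I_alpha with respect to mu."""
    return sum(m * i for m, i in zip(mu, information(labels, mu)))


if __name__ == "__main__":
    mu = [0.5, 0.25, 0.125, 0.125]
    alpha = ["A1", "A2", "A3", "A3"]       # the cell A3 has measure 1/4
    print(information(alpha, mu))          # [1.0, 2.0, 2.0, 2.0]
    print(entropy_as_integral(alpha, mu))  # 1.5
```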
A point $x$ can be thought of as a particular complete world history. $I_\alpha(x)$ tells how much we learn by performing the experiment $\alpha$ in the case of a particular history, and $H(\alpha)$ is the average information gain over all possible histories.

It is useful to consider conditional information and entropy, which take into account information that may already be in hand. Let $\mathscr{F} \subset \mathscr{B}$ be a sub-$\sigma$-algebra of the family of measurable subsets of $X$. Recall that for $\phi \in L^1$ the conditional expectation $E(\phi|\mathscr{F})$ of $\phi$ given $\mathscr{F}$ is an $\mathscr{F}$-measurable function on $X$ which satisfies
$$\int_F E(\phi|\mathscr{F})\,d\mu = \int_F \phi\,d\mu \quad \text{for all } F \in \mathscr{F};$$
$E(\phi|\mathscr{F})(x)$ represents our expected value for $\phi$ if we are given the foreknowledge $\mathscr{F}$, i.e., told exactly which sets of $\mathscr{F}$ the point $x$ belongs to. The conditional probability of a set $A \in \mathscr{B}$ given $\mathscr{F}$ is
$$\mu(A|\mathscr{F}) = E(\chi_A|\mathscr{F});$$
it represents our revised estimate of the likelihood of the occurrence of $A$ once we know the information in $\mathscr{F}$. Now if receipt of a symbol $A$ from a ticker-tape source conveys an amount $-\log_2\mu(A)$ of information, then if the information in $\mathscr{F}$ is already known, receipt of the symbol $A$ will have information content $-\log_2\mu(A|\mathscr{F})$. Thus we define the conditional information function of a countable partition $\alpha$ given a $\sigma$-algebra $\mathscr{F} \subset \mathscr{B}$ to be
$$I_{\alpha|\mathscr{F}}(x) = -\sum_{A \in \alpha}\log_2\mu(A|\mathscr{F})(x)\,\chi_A(x).$$
The conditional entropy of $\alpha$ given $\mathscr{F}$ is defined by
$$H(\alpha|\mathscr{F}) = \int_X I_{\alpha|\mathscr{F}}(x)\,d\mu(x).$$
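Thus $I_{\alpha|\mathscr{F}}(x)$ measures our uncertainty about the outcome of the experiment $\alpha$ once we know to which elements of $\mathscr{F}$ the point $x$ belongs, and $H(\alpha|\mathscr{F})$ is the average of these uncertainties over all possible $x$. Of course these quantities can be infinite; for finite partitions they are always finite.

Remarks 2.1
(i) $I_{\alpha|\mathscr{F}} \ge 0$ a.e., so that $H(\alpha|\mathscr{F}) \ge 0$.
(ii) $I_{\alpha|\{\emptyset, X\}} = I_\alpha$, so that $H(\alpha|\{\emptyset, X\}) = H(\alpha)$.
(iii) $I_{\alpha|\mathscr{B}} = 0$ a.e., so that $H(\alpha|\mathscr{B}) = 0$.
The proof of these statements is left as an exercise.

Remark 2.2 Two alternative definitions of $I_{\alpha|\mathscr{F}}(x)$ may suggest themselves to the reader: $E(I_\alpha|\mathscr{F})$ and
$$-\sum_{A \in \alpha}\log_2\mu(A|\mathscr{F})\,\mu(A|\mathscr{F}).$$
The first is not as interesting as $I_{\alpha|\mathscr{F}}$, since its integral is just $H(\alpha)$. The second also has integral $H(\alpha|\mathscr{F})$, since for each $A \in \alpha$
$$E(\log_2\mu(A|\mathscr{F})\,\chi_A|\mathscr{F}) = \log_2\mu(A|\mathscr{F})\,\mu(A|\mathscr{F}),$$
so that
$$H(\alpha|\mathscr{F}) = E(I_{\alpha|\mathscr{F}}) = E(E(I_{\alpha|\mathscr{F}}|\mathscr{F})) = E\Big(-\sum_{A \in \alpha}\log_2\mu(A|\mathscr{F})\,\mu(A|\mathscr{F})\Big);$$
however, it is $\mathscr{F}$-measurable and so may not be sufficiently sensitive to $\alpha$.

It is easy to describe $I_{\alpha|\mathscr{F}}$ and $H(\alpha|\mathscr{F})$ in case $\mathscr{F}$ is the (finite) $\sigma$-algebra $\mathscr{B}(\beta)$ generated by a finite partition $\beta = \{B_1, B_2, \dots, B_m\}$. Then on each set $B \in \beta$ we have $\mu(A|\mathscr{B}(\beta)) = \mu(A|B) = \mu(A \cap B)/\mu(B)$, so that
$$I_{\alpha|\mathscr{B}(\beta)} = -\log_2\mu(A|B) = -\log_2\frac{\mu(A \cap B)}{\mu(B)} \quad \text{on } A \cap B$$
and
$$H(\alpha|\mathscr{B}(\beta)) = -\sum_{A, B}\log_2\mu(A|B)\,\mu(A \cap B) = \sum_{B \in \beta}\Big(-\sum_{A \in \alpha}\log_2\mu(A|B)\,\mu(A|B)\Big)\mu(B) = -\int_X\sum_{A \in \alpha}\log_2\mu(A|\beta(x))\,\mu(A|\beta(x))\,d\mu(x).$$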
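When $\alpha$ and $\beta$ are finite, $H(\alpha|\beta)$ depends only on the table of values $\mu(A \cap B)$. The sketch below (the table layout and the function name are assumptions of the illustration) evaluates the first sum above. It shows the two extreme cases: $\beta$ independent of $\alpha$, where $H(\alpha|\beta) = H(\alpha)$, and $\beta$ determining $\alpha$, where $H(\alpha|\beta) = 0$.

```python
import math


def conditional_entropy(joint):
    """H(alpha | beta) = -sum over A, B of mu(A n B) log2 mu(A | B).

    joint[a][b] = mu(A_a intersect B_b): rows are cells of alpha, columns cells of beta.
    """
    col = [sum(row[b] for row in joint) for b in range(len(joint[0]))]
    h = 0.0
    for row in joint:
        for b, m in enumerate(row):
            if m > 0:
                h -= m * math.log2(m / col[b])
    return h


if __name__ == "__main__":
    print(conditional_entropy([[0.25, 0.25], [0.25, 0.25]]))  # 1.0 = H(alpha): beta tells nothing
    print(conditional_entropy([[0.5, 0.0], [0.0, 0.5]]))      # 0.0: beta determines alpha
    print(conditional_entropy([[0.4, 0.1], [0.1, 0.4]]))      # strictly between 0 and 1
```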
In this case we will abbreviate $I_{\alpha|\mathscr{B}(\beta)}$ to $I_{\alpha|\beta}$ and $H(\alpha|\mathscr{B}(\beta))$ to $H(\alpha|\beta)$. Notice that $I_{\alpha|\beta}$ is, on a cell $B$ of $\beta$, just the ordinary information function of the partition of the restricted probability space $(B, \mathscr{B} \cap B, \mu_B)$ (where $\mu_B(E) = \mu(E)/\mu(B)$) determined by $\alpha$. $I_{\alpha|\beta}(x)$ measures the amount of information gained when we are told which cell of $\alpha$ the point $x$ is in, if we already know which cell of $\beta$ it is in. $H(\alpha|\beta)$ measures our (average) uncertainty about the outcome of the experiment $\alpha$ if we already know the outcome of another experiment $\beta$.

We establish some useful computational properties of the conditional information and entropy. The reader should supply philosophical support for these formulas by interpreting them in terms of ticker-tapes and experiments. Recall that if $\mathscr{F}_1$ and $\mathscr{F}_2$ are $\sigma$-algebras, then $\mathscr{F}_1 \vee \mathscr{F}_2$ denotes the smallest $\sigma$-algebra containing $\mathscr{F}_1 \cup \mathscr{F}_2$. Similarly, we define $\bigvee_n \mathscr{F}_n = \sigma(\bigcup_n \mathscr{F}_n)$; also, by the join of infinitely many partitions we mean the $\sigma$-algebra they generate:
$$\bigvee_n \alpha_n = \sigma\Big(\bigcup_n \mathscr{B}(\alpha_n)\Big) = \bigvee_n \mathscr{B}(\alpha_n).$$
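Proposition 2.3 If $\alpha$ and $\beta$ are countable measurable partitions of $X$ and $\mathscr{F}$ is a sub-$\sigma$-algebra of $\mathscr{B}$, then
$$I_{\alpha \vee \beta|\mathscr{F}} = I_{\alpha|\mathscr{F}} + I_{\beta|\mathscr{B}(\alpha) \vee \mathscr{F}} \quad \text{a.e.}$$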
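Proof: For each $B \in \beta$,
$$\mu(B|\mathscr{B}(\alpha) \vee \mathscr{F}) = \sum_{A \in \alpha}\frac{\mu(B \cap A|\mathscr{F})}{\mu(A|\mathscr{F})}\,\chi_A \quad \text{a.e.},$$
where we take $0/0 = 0$. This is so because the right-hand side is measurable with respect to $\mathscr{B}(\alpha) \vee \mathscr{F}$; and for a typical generating member $A' \cap F$ (where $A' \in \alpha$ and $F \in \mathscr{F}$) of $\mathscr{B}(\alpha) \vee \mathscr{F}$ we have
$$\begin{aligned}
\int_{A' \cap F}\sum_{A \in \alpha}\frac{\mu(B \cap A|\mathscr{F})}{\mu(A|\mathscr{F})}\,\chi_A\,d\mu &= \int_F \chi_{A'}\,\frac{\mu(B \cap A'|\mathscr{F})}{\mu(A'|\mathscr{F})}\,d\mu = \int_F E\Big(\chi_{A'}\,\frac{\mu(B \cap A'|\mathscr{F})}{\mu(A'|\mathscr{F})}\Big|\mathscr{F}\Big)\,d\mu \\
&= \int_F \mu(B \cap A'|\mathscr{F})\,d\mu = \mu(B \cap A' \cap F) = \int_{A' \cap F}\chi_B\,d\mu.
\end{aligned}$$
Therefore
$$\sum_{A \in \alpha}\big[\log_2\mu(A \cap B|\mathscr{F}) - \log_2\mu(A|\mathscr{F})\big]\chi_A\chi_B = \log_2\mu(B|\mathscr{B}(\alpha) \vee \mathscr{F})\,\chi_B \quad \text{a.e.},$$
and
$$\begin{aligned}
I_{\alpha \vee \beta|\mathscr{F}} &= -\sum_{A \in \alpha,\, B \in \beta}\log_2\mu(A \cap B|\mathscr{F})\,\chi_{A \cap B} \\
&= -\sum_{A \in \alpha,\, B \in \beta}\log_2\mu(A|\mathscr{F})\,\chi_A\chi_B - \sum_{B \in \beta}\log_2\mu(B|\mathscr{B}(\alpha) \vee \mathscr{F})\,\chi_B \\
&= -\sum_{A \in \alpha}\log_2\mu(A|\mathscr{F})\,\chi_A - \sum_{B \in \beta}\log_2\mu(B|\mathscr{B}(\alpha) \vee \mathscr{F})\,\chi_B = I_{\alpha|\mathscr{F}} + I_{\beta|\mathscr{B}(\alpha) \vee \mathscr{F}}.
\end{aligned}$$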
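In the special case $\mathscr{F} = \mathscr{B}(\gamma)$ for a finite partition $\gamma$ of a finite space, we have $\mathscr{B}(\alpha) \vee \mathscr{F} = \mathscr{B}(\alpha \vee \gamma)$. Proposition 2.3 then says that, at each point, $-\log_2\frac{\mu(A \cap B \cap G)}{\mu(G)}$ splits as $-\log_2\frac{\mu(A \cap G)}{\mu(G)} - \log_2\frac{\mu(A \cap B \cap G)}{\mu(A \cap G)}$. The sketch below checks this on a randomly generated example. The finite setting, the label encoding of partitions, and the function names are all assumptions of the illustration.

```python
import math
import random


def masses(labels, mu):
    """Measure of each cell of the partition described by labels."""
    out = {}
    for lab, m in zip(labels, mu):
        out[lab] = out.get(lab, 0.0) + m
    return out


def cond_info(a, g, mu):
    """I_{alpha|B(gamma)}(x) = -log2( mu(alpha(x) n gamma(x)) / mu(gamma(x)) )."""
    ag = list(zip(a, g))
    m_ag, m_g = masses(ag, mu), masses(g, mu)
    return [-math.log2(m_ag[ag[x]] / m_g[g[x]]) for x in range(len(mu))]


def chain_rule_gap(a, b, g, mu):
    """Largest |I_{a v b | g} - I_{a | g} - I_{b | a v g}| over the points."""
    lhs = cond_info(list(zip(a, b)), g, mu)
    first = cond_info(a, g, mu)
    second = cond_info(b, list(zip(a, g)), mu)
    return max(abs(l - f - s) for l, f, s in zip(lhs, first, second))


if __name__ == "__main__":
    random.seed(0)
    m = 12
    w = [random.random() for _ in range(m)]
    mu = [wi / sum(w) for wi in w]
    a, b, g = ([random.randrange(3) for _ in range(m)] for _ in range(3))
    print(chain_rule_gap(a, b, g, mu))   # zero up to rounding error
```

Taking `g` constant recovers Corollary 2.4, $I_{\alpha \vee \beta} = I_\alpha + I_{\beta|\alpha}$.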
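Corollary 2.4 For countable measurable partitions $\alpha$ and $\beta$, $I_{\alpha \vee \beta} = I_\alpha + I_{\beta|\alpha}$ a.e.

Proof: Take $\mathscr{F} = \{\emptyset, X\}$ in the Proposition.

Proposition 2.5 Let $\alpha$ and $\beta$ be countable measurable partitions of $X$ and $\mathscr{F}$, $\mathscr{F}_1$, and $\mathscr{F}_2$ sub-$\sigma$-algebras of $\mathscr{B}$.
(1) $H(\alpha \vee \beta|\mathscr{F}) = H(\alpha|\mathscr{F}) + H(\beta|\mathscr{B}(\alpha) \vee \mathscr{F})$.
(2) If $\mathscr{F}_1 \supset \mathscr{F}_2$, then $H(\alpha|\mathscr{F}_1) \le H(\alpha|\mathscr{F}_2)$.
(3) $H(T^{-1}\alpha|T^{-1}\mathscr{F}) = H(\alpha|\mathscr{F})$.
(4) If $\alpha \le \beta$, then $H(\alpha|\mathscr{F}) \le H(\beta|\mathscr{F})$.
(5) $H(\alpha \vee \beta|\mathscr{F}) \le H(\alpha|\mathscr{F}) + H(\beta|\mathscr{F})$.
(1') $H(\alpha \vee \beta) = H(\alpha) + H(\beta|\alpha)$.
(3') $H(T^{-1}\alpha) = H(\alpha)$.
(4') If $\alpha \le \beta$, then $H(\alpha) \le H(\beta)$.
(5') $H(\alpha \vee \beta) \le H(\alpha) + H(\beta)$.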
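Proof: The primed statements follow from the corresponding unprimed ones by taking $\mathscr{F} = \{\emptyset, X\}$.
(1) This follows directly from Proposition 2.3 by integrating.
(2) Apply Jensen's Inequality (1.4.8) to the concave function $f(t) = -t\log_2 t$: for each $A \in \alpha$,
$$E(f \circ \mu(A|\mathscr{F}_1)|\mathscr{F}_2) \le f \circ E(\mu(A|\mathscr{F}_1)|\mathscr{F}_2) = f \circ \mu(A|\mathscr{F}_2).$$
Integrating gives
$$\int_X f \circ \mu(A|\mathscr{F}_1)\,d\mu \le \int_X f \circ \mu(A|\mathscr{F}_2)\,d\mu,$$
and (2) follows by summing over all $A \in \alpha$, bearing in mind Remark 2.2.
(3) It is easily checked that $\mu(T^{-1}A|T^{-1}\mathscr{F})(x) = \mu(A|\mathscr{F})(Tx)$ a.e. Therefore
$$\begin{aligned}
H(T^{-1}\alpha|T^{-1}\mathscr{F}) &= -\int_X\sum_{A \in \alpha}\log_2\mu(T^{-1}A|T^{-1}\mathscr{F})(x)\,\chi_{T^{-1}A}(x)\,d\mu(x) \\
&= -\int_X\sum_{A \in \alpha}\log_2\mu(A|\mathscr{F})(Tx)\,\chi_A(Tx)\,d\mu(x) \\
&= -\int_X\sum_{A \in \alpha}\log_2\mu(A|\mathscr{F})(x)\,\chi_A(x)\,d\mu(x) = H(\alpha|\mathscr{F}),
\end{aligned}$$
since $T$ preserves $\mu$.
(4) Since $H(\beta|\mathscr{B}(\alpha) \vee \mathscr{F}) \ge 0$ and $\alpha \vee \beta = \beta$ when $\alpha \le \beta$,
$$H(\beta|\mathscr{F}) = H(\alpha \vee \beta|\mathscr{F}) = H(\alpha|\mathscr{F}) + H(\beta|\mathscr{B}(\alpha) \vee \mathscr{F}) \ge H(\alpha|\mathscr{F}).$$
(5) Since $\mathscr{F} \subset \mathscr{B}(\alpha) \vee \mathscr{F}$, this follows from (1) and (2).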
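The finite-partition versions of (1'), (2), (4') and (5'), together with Corollary 2.6 below, can be spot-checked on random finite spaces. This is a numerical sanity check under assumed conventions (partitions as label lists, hypothetical function names), not a proof. Statement (3') is omitted because it concerns $T$. Conditioning on $\beta \vee \gamma$ plays the role of the larger $\sigma$-algebra $\mathscr{F}_1$ in (2).

```python
import math
import random


def masses(labels, mu):
    out = {}
    for lab, m in zip(labels, mu):
        out[lab] = out.get(lab, 0.0) + m
    return out


def H(a, mu):
    """H(alpha) for a partition given by labels."""
    return -sum(m * math.log2(m) for m in masses(a, mu).values() if m > 0)


def H_cond(a, g, mu):
    """H(alpha | gamma) = -sum over cells of mu(A n G) log2(mu(A n G) / mu(G))."""
    m_ag, m_g = masses(list(zip(a, g)), mu), masses(g, mu)
    return -sum(m * math.log2(m / m_g[gl]) for (_, gl), m in m_ag.items() if m > 0)


def check_proposition_2_5(trials=200, m=10, seed=1):
    rng = random.Random(seed)
    tol = 1e-9
    for _ in range(trials):
        w = [rng.random() + 1e-3 for _ in range(m)]
        mu = [x / sum(w) for x in w]
        a, b, c = ([rng.randrange(3) for _ in range(m)] for _ in range(3))
        ab = list(zip(a, b))
        assert abs(H(ab, mu) - H(a, mu) - H_cond(b, a, mu)) < tol       # (1')
        assert H_cond(a, list(zip(b, c)), mu) <= H_cond(a, b, mu) + tol # (2)
        assert H_cond(a, b, mu) <= H(a, mu) + tol                        # Corollary 2.6
        assert H([x % 2 for x in a], mu) <= H(a, mu) + tol               # (4'): coarser partition
        assert H(ab, mu) <= H(a, mu) + H(b, mu) + tol                    # (5')
    return True


if __name__ == "__main__":
    print(check_proposition_2_5())
```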
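Corollary 2.6 $H(\alpha|\mathscr{F}) \le H(\alpha)$.

Proof: Take $\mathscr{F}_1 = \mathscr{F}$ and $\mathscr{F}_2 = \{\emptyset, X\}$ in (2) above.

Proposition 2.7 $H(\alpha|\mathscr{F}) = 0$ if and only if $\alpha \subset \mathscr{F}$ up to sets of measure 0.

Proof: If $\alpha \subset \mathscr{F}$, then $\mu(A|\mathscr{F}) = \chi_A$ a.e. for each $A \in \alpha$, so that $I_{\alpha|\mathscr{F}} = 0$ a.e. Conversely, suppose that $H(\alpha|\mathscr{F}) = 0$. Then (by Remark 2.2) for each $A \in \alpha$ we must have $\log_2\mu(A|\mathscr{F})\,\mu(A|\mathscr{F}) = 0$ a.e. This implies that $\mu(A|\mathscr{F}) = 0$ or 1 a.e., and hence $\mu(A|\mathscr{F})$ is a.e. equal to $\chi_F$ for some $F \in \mathscr{F}$. Clearly $\mu(F) = \int\mu(A|\mathscr{F})\,d\mu = \mu(A)$. On the other hand,
$$\mu(A \cap F) = \int_F \chi_A\,d\mu = \int_F \mu(A|\mathscr{F})\,d\mu = \int_F \chi_F\,d\mu = \mu(F).$$
Therefore $A = F$ up to a set of measure 0.

Corollary 2.8 $H(\alpha|\beta) = 0$ if and only if $\alpha \le \beta$.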
We say that two measurable partitions $\alpha$ and $\beta$ are independent, and write $\alpha\perp\beta$, in case $\mu(A\cap B)=\mu(A)\mu(B)$ for each $A\in\alpha$ and $B\in\beta$. It is reasonable that the information gained from independent experiments should be the sum of the amounts produced by each.
Proposition 2.9 The following statements about two countable measurable partitions $\alpha$ and $\beta$ of finite entropy are equivalent: (1) $\alpha\perp\beta$. (2) $H(\alpha\vee\beta)=H(\alpha)+H(\beta)$. (3) $H(\alpha|\beta)=H(\alpha)$.
Proof: The equivalence of (2) and (3) is clear in view of Proposition 2.5 (1'). If $\alpha\perp\beta$, then for each $A\in\alpha$ we have $\mu(A|\mathscr B(\beta))=\mu(A)$ a.e.; hence $I_{\alpha|\beta}=I_\alpha$ a.e. and $H(\alpha|\beta)=H(\alpha)$. Conversely, suppose that $H(\alpha|\beta)=H(\alpha)$. Then, with $f(t)=-t\log_2t$,
$$\sum_{A\in\alpha}\int_X\big[f(\mu(A))-f(\mu(A|\mathscr B(\beta)))\big]\,d\mu=0.$$
In the first step of the proof of Proposition 2.5 (2) take $\mathscr F_2=\{\emptyset,X\}$ to see that each term in this sum is nonnegative. Therefore for each $A\in\alpha$,
$$f(\mu(A))=\int_Xf(\mu(A|\mathscr B(\beta)))\,d\mu,\quad\text{that is,}\quad-\mu(A)\log_2\mu(A)=-\sum_{B\in\beta}\mu(B)\,\frac{\mu(A\cap B)}{\mu(B)}\log_2\frac{\mu(A\cap B)}{\mu(B)}.$$
Suppose that $\beta=\{B_1,B_2,\dots\}$; let $t_k=\mu(B_k)$ and $x_k=\mu(A|B_k)=\mu(A\cap B_k)/\mu(B_k)$ for each $k$. We have
$$(\mu(A),f(\mu(A)))=\sum_kt_k\,(x_k,f(x_k)).$$
5. Entropy
Now the region below the graph of $f$ is convex, and since $f$ is strictly concave, every point of the graph is an extreme point of this region; we have written one such extreme point as a convex combination of other points of the graph. This is impossible unless all the $x_k$ are the same, that is to say $\mu(A\cap B_k)/\mu(B_k)=\mu(A)$ for all $k$. Therefore $\alpha\perp\beta$.
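A small finite sanity check of Propositions 2.5 (1') and 2.9 may help. The sketch below is an assumption of this edit rather than anything in the text: two finite partitions are represented by a joint table in which `joint[i][j]` stands for $\mu(A_i\cap B_j)$, and the helper names are illustrative. It compares an independent table with a dependent one that has the same $\alpha$-marginal.

```python
import math

def entropy(probs):
    """H in bits of a finite probability vector, with f(t) = -t log2 t and f(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Return (mu(A_i))_i and (mu(B_j))_j from joint[i][j] = mu(A_i intersect B_j)."""
    rows = [sum(r) for r in joint]
    cols = [sum(r[j] for r in joint) for j in range(len(joint[0]))]
    return rows, cols

def join_entropy(joint):
    """H(alpha join beta): the cells of the join are the sets A_i intersect B_j."""
    return entropy([p for r in joint for p in r])

def conditional_entropy(joint):
    """H(alpha | beta) = -sum_{i,j} mu(A_i cap B_j) log2 mu(A_i | B_j)."""
    _, cols = marginals(joint)
    return -sum(p * math.log2(p / cols[j])
                for r in joint for j, p in enumerate(r) if p > 0)

pa, pb = [0.5, 0.25, 0.25], [0.3, 0.7]
independent = [[a * b for b in pb] for a in pa]        # alpha and beta independent
dependent = [[0.4, 0.1], [0.05, 0.2], [0.05, 0.2]]     # same alpha-marginal as above

for name, J in (("independent", independent), ("dependent", dependent)):
    rows, cols = marginals(J)
    print(name, entropy(rows), conditional_entropy(J))
```

For the independent table $H(\alpha|\beta)$ coincides with $H(\alpha)$; for the dependent one it is strictly smaller, as Proposition 2.9 predicts.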
Proposition 2.10 For each countable measurable partition $\alpha$ of $X$,
$$h(\alpha,T)=\lim_{n\to\infty}\frac1nH(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-n+1}\alpha)$$
exists (it is possibly $+\infty$).
Proof: Suppose that $H(\alpha)<\infty$, since otherwise clearly $h(\alpha,T)=\infty$. For each $n=1,2,\dots$ let $H_n=H(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-n+1}\alpha)$. By Proposition 2.5 (4), $\{H_n\}$ is increasing. Notice that this sequence is subadditive:
$$H_{n+m}=H\big(\alpha\vee\cdots\vee T^{-n+1}\alpha\vee T^{-n}(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-m+1}\alpha)\big)\le H(\alpha\vee\cdots\vee T^{-n+1}\alpha)+H(\alpha\vee\cdots\vee T^{-m+1}\alpha)=H_n+H_m.$$
Now clearly $H_1\le H_j$ and $H_j\le jH_1$, so that $\{H_j/j:j=1,2,\dots\}$ is bounded; hence
$$L=\liminf_{j\to\infty}\frac{H_j}{j}<\infty.$$
Given $\varepsilon>0$, choose a large $j$ such that $H_j/j<L+\varepsilon$. For each $n>j$, choose $i(n)$ such that $[i(n)-1]j<n\le i(n)j$. Then $H_n\le H_{i(n)j}$ and hence
$$\frac{H_n}{n}\le\frac{H_{i(n)j}}{[i(n)-1]j}\le\frac{i(n)H_j}{[i(n)-1]j}<\frac{i(n)}{i(n)-1}(L+\varepsilon).$$
Thus if $n$ is large enough, we will have
$$L-\varepsilon<\frac{H_n}{n}<L+2\varepsilon.$$
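Proposition 2.11 If $\mathscr F_1\subset\mathscr F_2\subset\cdots$ are sub-$\sigma$-algebras of $\mathscr B$, $\mathscr F_\infty=\bigvee_{n=1}^\infty\mathscr F_n$, and $\alpha$ is a finite measurable partition, then
$$\lim_{n\to\infty}H(\alpha|\mathscr F_n)=H(\alpha|\mathscr F_\infty).$$
Proof: By the Martingale Convergence Theorem (see Section 3.4, noting that $E(\chi_A|\mathscr F_n)=\mu(A|\mathscr F_n)$ for all $n$),
$$\mu(A|\mathscr F_n)\to\mu(A|\mathscr F_\infty)\quad\text{a.e. for each }A\in\alpha.$$
Composing with the bounded continuous function $f(t)=-t\log_2t$, we have $f(\mu(A|\mathscr F_n))\to f(\mu(A|\mathscr F_\infty))$ a.e. for each $A\in\alpha$. By the Bounded Convergence Theorem,
$$\int_Xf(\mu(A|\mathscr F_n))\,d\mu\to\int_Xf(\mu(A|\mathscr F_\infty))\,d\mu\quad\text{for each }A\in\alpha,$$
and the result follows by summing over the finitely many $A\in\alpha$.
Proposition 2.12 If $\alpha$ is a finite partition, then
$$h(\alpha,T)=\lim_{n\to\infty}H\Big(\alpha\,\Big|\,\bigvee_{k=1}^nT^{-k}\alpha\Big)=H\Big(\alpha\,\Big|\,\bigvee_{k=1}^\infty T^{-k}\alpha\Big).$$
Proof: By Proposition 2.5 (1),
$$H\Big(\alpha\,\Big|\,\bigvee_{k=1}^jT^{-k}\alpha\Big)=H(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-j}\alpha)-H\Big(\bigvee_{k=1}^jT^{-k}\alpha\Big)=H\Big(\bigvee_{k=0}^jT^{-k}\alpha\Big)-H\Big(\bigvee_{k=0}^{j-1}T^{-k}\alpha\Big).$$
Sum over $j=1,\dots,n$ to get
$$\sum_{j=1}^nH\Big(\alpha\,\Big|\,\bigvee_{k=1}^jT^{-k}\alpha\Big)=H\Big(\bigvee_{k=0}^nT^{-k}\alpha\Big)-H(\alpha).$$
Notice that $H(\alpha|\bigvee_{k=1}^nT^{-k}\alpha)$ is nonnegative and decreasing, so that its limit exists as $n\to\infty$. Therefore the limit of its Cesàro averages exists as well. Thus dividing the preceding equation through by $n+1$ and taking limits gives
$$\lim_{n\to\infty}H\Big(\alpha\,\Big|\,\bigvee_{k=1}^nT^{-k}\alpha\Big)=\lim_{n\to\infty}\frac1{n+1}H\Big(\bigvee_{k=0}^nT^{-k}\alpha\Big)=h(\alpha,T).$$
(This also provides another proof of Proposition 2.10.) For the remaining equality, apply Proposition 2.11 with $\mathscr F_n=\bigvee_{k=1}^nT^{-k}\alpha$ and $\mathscr F_\infty=\bigvee_{k=1}^\infty T^{-k}\alpha$.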
This formula justifies our speaking earlier of the entropy of a source as the average uncertainty about what symbol it will print next, given its output so far. Of course mathematically it is difficult to distinguish the past from the future: if $T$ is the ordinary left shift and $\alpha$ is the partition according to the entries observed at time 0, then $\bigvee_{k=1}^\infty T^{-k}\alpha$ actually represents the readings made in the future rather than the past. We could redefine our terms, transformations, or directions, or note that it doesn't really make any difference which way we go; see Exercise 4.
Remark Both 2.11 and 2.12 remain true if $\alpha$ is assumed to be a countable partition of finite entropy. For the proof, we need Corollary 6.2.2 below.
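The monotonicity and Cesàro argument of Propositions 2.10 and 2.12 can be watched numerically on a stationary process that is not itself Markov. This is a sketch under assumptions of this edit, not an example from the text: a 3-state Markov chain (the matrix `A` is an arbitrary choice) is observed only through the hypothetical lumping `phi`, which merges states 1 and 2. The code computes $H_n$ by enumerating all $n$-blocks and checks that $H_{n+1}-H_n$, which equals $H(\alpha|\bigvee_{k=1}^nT^{-k}\alpha)$ by the computation in the proof of 2.12, is nonincreasing and that $\{H_n\}$ is subadditive.

```python
import itertools
import math

# Hypothetical example: a 3-state chain observed through phi (states 1 and 2 merged).
A = [[0.5, 0.25, 0.25],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]
phi = [0, 1, 1]

def stationary(A, iters=2000):
    """Approximate the fixed row vector p with pA = p by power iteration."""
    k = len(A)
    p = [1.0 / k] * k
    for _ in range(iters):
        p = [sum(p[r] * A[r][s] for r in range(k)) for s in range(k)]
    return p

P = stationary(A)

def word_prob(word):
    """mu{y_0 = word[0], ..., y_{n-1} = word[n-1]} via the forward recursion."""
    k = len(P)
    v = [P[s] if phi[s] == word[0] else 0.0 for s in range(k)]
    for y in word[1:]:
        v = [sum(v[r] * A[r][s] for r in range(k)) if phi[s] == y else 0.0
             for s in range(k)]
    return sum(v)

def block_entropy(n):
    """H_n: entropy of the partition of the observed process into n-blocks."""
    total = 0.0
    for word in itertools.product((0, 1), repeat=n):
        q = word_prob(word)
        if q > 0:
            total -= q * math.log2(q)
    return total

H = {n: block_entropy(n) for n in range(1, 11)}
diffs = [H[n + 1] - H[n] for n in range(1, 10)]   # conditional entropy given n further symbols
for n in range(1, 11):
    print(n, round(H[n] / n, 5))
```

Because $H_n/n$ is the Cesàro average of the nonincreasing differences (with $H_1$ as first term), it always dominates the latest difference, which is the inequality the Cesàro step in 2.12 exploits.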
Several more formulas will be needed for the computations of the next section. We prove one and leave several others for exercises.
Proposition 2.13 For any countable measurable partitions $\alpha$ and $\beta$,
$$h(\beta,T)\le h(\alpha,T)+H(\beta|\alpha).$$
Proof: Let $\beta_0^{m-1}=\beta\vee T^{-1}\beta\vee\cdots\vee T^{-m+1}\beta$ and $\alpha_0^{m-1}=\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-m+1}\alpha$ for $m=1,2,\dots$. Then for each $m$,
$$H(\beta_0^{m-1}|\alpha_0^{m-1})\le H(\beta|\alpha)+H(T^{-1}\beta|T^{-1}\alpha)+\cdots+H(T^{-m+1}\beta|T^{-m+1}\alpha)=mH(\beta|\alpha).$$
Therefore
$$H(\beta_0^{m-1})\le H(\beta_0^{m-1}\vee\alpha_0^{m-1})=H(\alpha_0^{m-1})+H(\beta_0^{m-1}|\alpha_0^{m-1})\le H(\alpha_0^{m-1})+mH(\beta|\alpha),$$
and
$$h(\beta,T)=\lim_{m\to\infty}\frac1mH(\beta_0^{m-1})\le\lim_{m\to\infty}\frac1mH(\alpha_0^{m-1})+H(\beta|\alpha)=h(\alpha,T)+H(\beta|\alpha).$$
Exercises
1. Show that $\alpha\le\beta$ implies $I_{\alpha|\mathscr F}\le I_{\beta|\mathscr F}$ a.e., but it is not even always true that $I_{\alpha|\mathscr F}\le I_\alpha$ a.e.
2. Extend Proposition 2.9 by proving that $H(\alpha|\mathscr F)=H(\alpha)$ if and only if the two $\sigma$-algebras $\mathscr B(\alpha)$ and $\mathscr F$ are independent, in that $\mu(E\cap F)=\mu(E)\mu(F)$ for all $E\in\mathscr B(\alpha)$ and all $F\in\mathscr F$.
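3. Supply the proofs of Remarks 2.1.
4. Show that
(a) $h(\alpha,T)=\lim_{n\to\infty}H(T^{-n}\alpha|\bigvee_{k=0}^{n-1}T^{-k}\alpha)$.
(b) $h(\alpha,T)=\lim_{n\to\infty}H(\alpha|\bigvee_{k=1}^nT^k\alpha)=H(\alpha|\bigvee_{k=1}^\infty T^k\alpha)$.
(c) $h(\alpha,T)=h(\alpha,T^{-1})$.
Interpret these formulas in terms of ticker-tapes.
5. Define $I_{\mathscr F|\mathscr G}$ for $\mathscr F$ and $\mathscr G$ sub-$\sigma$-algebras of $\mathscr B$. Prove that
$$I_{\mathscr F\vee\mathscr G|\mathscr H}=I_{\mathscr G|\mathscr H}+I_{\mathscr F|\mathscr G\vee\mathscr H}.$$
6. Show that $\alpha\le\beta$ implies $h(\alpha,T)\le h(\beta,T)$.
7. Show that $h(\bigvee_{k=-n}^nT^{-k}\alpha,T)=h(\alpha,T)$.
8. Show that $h(T^k)=|k|h(T)$. (Hint: For any partition $\alpha$, consider the partition $\alpha^k=\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-k+1}\alpha$ and estimate $h(\alpha^k,T^k)$.)
9. Show that $I_{T^{-1}\alpha}=I_\alpha\circ T$.
10. Show that $h_\mu(\alpha,T)$ and $h_\mu(T)$ are affine functions of $\mu$ on the space $\mathscr M_T$ of $T$-invariant probability measures on $(X,\mathscr B)$.
11. Show that $h(T)=\sup\{h(\alpha,T):\alpha$ is a countable measurable partition with $H(\alpha)<\infty\}$.
12. Prove Pinsker's Formula:
$$H(\alpha\vee\beta|(\alpha\vee\beta)_1^\infty)=H(\beta|\beta_1^\infty)+H(\alpha|\alpha_1^\infty\vee\beta_{-\infty}^\infty).$$
(Hint: Note that
$$H(\alpha|\alpha_1^\infty\vee\beta_{-\infty}^\infty)=\lim_{n\to\infty}\frac1nH(\alpha_{-n}^0|\alpha_1^\infty\vee\beta_{-n}^\infty)\quad\text{and}\quad H(\beta|\beta_1^\infty)=\lim_{n\to\infty}\frac1nH(\beta_{-n}^0|\alpha_1^\infty\vee\beta_1^\infty).)$$
5.3. Generators and the Kolmogorov-Sinai Theorem
The computation of $h(T)$ from its definition as $\sup_\alpha h(\alpha,T)$ is seldom feasible. However, if there should exist a partition $\alpha$ for which $h(\alpha,T)=h(T)$, then of course $h(T)$ can be found. This is the case when $T$ has a partition which generates the full $\sigma$-algebra $\mathscr B$. In this section we will deal again mainly with finite measurable partitions of $X$. For any partition $\alpha$ we use the notation $\alpha_m^n=\bigvee_{k=m}^nT^{-k}\alpha$; in case $m=-\infty$ or $n=\infty$, recall that we intend by this notation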
the $\sigma$-algebra generated by all the partitions involved:
$$\alpha_1^\infty=\bigvee_{k=1}^\infty\mathscr B(T^{-k}\alpha),\qquad\alpha_{-\infty}^{-1}=\bigvee_{k=1}^\infty\mathscr B(T^k\alpha),\qquad\alpha_{-\infty}^\infty=\bigvee_{k=-\infty}^\infty\mathscr B(T^{-k}\alpha).$$
These $\sigma$-algebras may be thought of as representing all past (or future, depending on how you look at it) performances of the experiment $\alpha$, all future (or past) performances of $\alpha$, and all performances of $\alpha$, respectively. A finite partition $\alpha$ is called a generator with respect to $T$ in case $\alpha_{-\infty}^\infty=\mathscr B$ up to sets of measure 0. We now have the tools to prove the following important result rather easily. Sinai's interesting proof, based on the entropy metric, is sketched in the Exercises.
Theorem 3.1 Kolmogorov-Sinai Theorem (Kolmogorov 1958, Sinai 1959b) If $\alpha$ is a generator with respect to $T$, then
$$h(T)=h(\alpha,T).$$
Proof: It is enough to show that for each finite partition $\beta$ we have $h(\beta,T)\le h(\alpha,T)$. Now for any $n=1,2,\dots$, by Proposition 2.13 and Exercise 2.7,
$$h(\beta,T)\le h(\alpha_{-n}^n,T)+H(\beta|\alpha_{-n}^n)=h(\alpha,T)+H(\beta|\alpha_{-n}^n).$$
But since $\beta\subset\alpha_{-\infty}^\infty$ up to sets of measure 0, by Propositions 2.7 and 2.11 we have
$$0=H(\beta|\alpha_{-\infty}^\infty)=\lim_{n\to\infty}H(\beta|\alpha_{-n}^n).$$
Thus the result follows by letting $n\to\infty$ in the first formula.
According to the following important theorem of Krieger, entropy can be computed in the most interesting cases. For the proof, see Denker, Grillenberger and Sigmund (1976).
Theorem 3.2 Krieger Generator Theorem (1970) If $T$ is an ergodic m.p.t. on a Lebesgue space with $h(T)<\infty$, then $T$ has a finite generator.
Examples of the computation of entropy
Example 3.3 Zero entropy Suppose that $T$ has a one-sided generator, i.e. that there is a finite partition $\alpha$ such that $\alpha_1^\infty=\mathscr B$ up to sets of
measure 0. The existence of such a partition $\alpha$ means that in some sense the present and future of the system $(X,\mathscr B,\mu,T)$ are completely determined by its past. Then of course $\alpha$ is a generator, so we have
$$h(T)=h(\alpha,T)=H(\alpha|\alpha_1^\infty)=H(\alpha|\mathscr B)=0,$$
as might be expected. For example, every ergodic rotation of the circle has entropy zero: if $\alpha$ is a partition of the circle into two disjoint semicircles, then $\alpha_1^\infty=\mathscr B$ up to sets of measure 0. Another way to see this is to notice that, with $\alpha$ and $T$ as above, the partition $\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-m+1}\alpha$ consists of $2m$ intervals on the circle. Thus $H(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-m+1}\alpha)\le\log_2(2m)$ (the entropy is always a maximum for a partition into pieces of equal measures), and $H(\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-m+1}\alpha)/m\to0$. This argument shows that $h(\alpha,T)$ depends on the asymptotic numbers and relative sizes of the cells of $\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-m+1}\alpha$. We will have more to say about this later (Shannon-McMillan-Breiman Theorem).
More generally, every discrete spectrum system has entropy zero. For, by Exercise 10 of Section 2.4, if $T$ has discrete spectrum then there is a sequence of integers $n_k\to\infty$ with $T^{n_k}f\to f$ in $L^2$ for each $f\in L^2$. Let $\alpha=\{A_1,A_2,\dots,A_n\}$ be a partition of $X$ and $f(x)=i$ if $x\in A_i$. Then $T^{n_k}f\to f$ in $L^2$ implies that, up to sets of measure 0, $\alpha\subset\alpha_1^\infty$, so that $h(\alpha,T)=H(\alpha|\alpha_1^\infty)=0$.
Example 3.4 Entropy of Bernoulli schemes Consider the Bernoulli scheme $B(p_1,p_2,\dots,p_n)$ on the alphabet $\{a_1,a_2,\dots,a_n\}$. The time-zero cylinder sets $A_i=\{x:x_0=a_i\}$, $i=1,2,\dots,n$, form a measurable partition $\alpha$. Since $\sigma^{-k}A_i=\{x:x_k=a_i\}$, $\bigvee_{k=-\infty}^\infty\mathscr B(\sigma^{-k}\alpha)$ contains all cylinder sets and hence equals $\mathscr B$ up to sets of measure 0. Thus $\alpha$ is a generator and
$$h(\sigma)=h(\alpha,\sigma)=\lim_{m\to\infty}\frac1mH(\alpha\vee\sigma^{-1}\alpha\vee\cdots\vee\sigma^{-m+1}\alpha).$$
Now $\alpha\vee\sigma^{-1}\alpha\vee\cdots\vee\sigma^{-m+1}\alpha$ is the partition by $m$-blocks: its elements are sets of the form $A_{i_1}\cap\sigma^{-1}A_{i_2}\cap\cdots\cap\sigma^{-m+1}A_{i_m}$. Because the measure is product measure, the partitions $\alpha,\sigma^{-1}\alpha,\dots,\sigma^{-m+1}\alpha$ are independent. By Proposition 2.9,
$$H(\alpha\vee\sigma^{-1}\alpha\vee\cdots\vee\sigma^{-m+1}\alpha)=mH(\alpha)=-m\sum_{i=1}^np_i\log_2p_i,$$
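and therefore the average uncertainty about what will appear next in any sequence is
$$h(\sigma)=-\sum_{i=1}^np_i\log_2p_i.$$
Thus $B(\tfrac12,\tfrac12)$ has entropy $\log_22$ and $B(\tfrac13,\tfrac13,\tfrac13)$ has entropy $\log_23$, so that these two systems cannot be isomorphic. In this way Kolmogorov settled a question that had been embarrassing ergodic theorists for two decades.
Example 3.5 Entropy of Markov shifts Let $A=(a_{ij})$ be an $n\times n$ stochastic matrix with fixed (row) probability vector $p$ ($pA=p$). Again let $\alpha$ be the partition into time-zero cylinder sets $\{x:x_0=a_i\}$, $i=1,2,\dots,n$. This time, a typical element $\{x:x_0=a_{i_0},x_1=a_{i_1},\dots,x_m=a_{i_m}\}$ of $\alpha\vee\sigma^{-1}\alpha\vee\cdots\vee\sigma^{-m}\alpha$ has measure $p_{i_0}a_{i_0i_1}\cdots a_{i_{m-1}i_m}$, so that, with $f(t)=-t\log_2t$,
$$\begin{aligned}H(\alpha\vee\sigma^{-1}\alpha\vee\cdots\vee\sigma^{-m}\alpha)&=\sum_{i_0,\dots,i_m}f(p_{i_0}a_{i_0i_1}\cdots a_{i_{m-1}i_m})\\&=-\sum_{i_0,\dots,i_m}p_{i_0}a_{i_0i_1}\cdots a_{i_{m-1}i_m}\big[\log_2(p_{i_0}a_{i_0i_1}\cdots a_{i_{m-2}i_{m-1}})+\log_2a_{i_{m-1}i_m}\big]\\&=\sum_{i_0,\dots,i_{m-1}}f(p_{i_0}a_{i_0i_1}\cdots a_{i_{m-2}i_{m-1}})\sum_{i_m}a_{i_{m-1}i_m}+\sum_{i_0,\dots,i_{m-1}}p_{i_0}a_{i_0i_1}\cdots a_{i_{m-2}i_{m-1}}\sum_{i_m}f(a_{i_{m-1}i_m})\\&=H(\alpha\vee\sigma^{-1}\alpha\vee\cdots\vee\sigma^{-m+1}\alpha)+\sum_{i,j}p_if(a_{ij})\qquad(\text{using }\textstyle\sum_ja_{ij}=1\text{ and }pA=p)\\&=\cdots=-\sum_ip_i\log_2p_i-m\sum_{i,j}p_ia_{ij}\log_2a_{ij}.\end{aligned}$$
It follows that the Markov shift determined by $A$ has entropy
$$h(\sigma)=-\sum_{i,j}p_ia_{ij}\log_2a_{ij}.$$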
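The telescoping computation of Example 3.5 can be checked by brute force. In this sketch (an assumption of this edit: the $2\times2$ matrix is an arbitrary example with stationary vector $(0.8,0.2)$, and the helper names are hypothetical) the entropy of the partition into $(m+1)$-blocks is summed directly over all blocks and compared with the closed form $-\sum_ip_i\log_2p_i-m\sum_{i,j}p_ia_{ij}\log_2a_{ij}$. Bernoulli schemes appear as the special case in which every row of the matrix equals $p$.

```python
import itertools
import math

def H(v):
    """Entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in v if x > 0)

def block_entropy(p, A, m):
    """Entropy of alpha join sigma^{-1} alpha ... join sigma^{-m} alpha, summed over (m+1)-blocks."""
    total = 0.0
    for word in itertools.product(range(len(p)), repeat=m + 1):
        prob = p[word[0]]
        for a, b in zip(word, word[1:]):
            prob *= A[a][b]
        if prob > 0:
            total -= prob * math.log2(prob)
    return total

def markov_shift_entropy(p, A):
    """h(sigma) = -sum_{i,j} p_i a_ij log2 a_ij, i.e. sum_i p_i H(row i of A)."""
    return sum(p[i] * H(A[i]) for i in range(len(p)))

A = [[0.9, 0.1], [0.4, 0.6]]
p = [0.8, 0.2]                      # pA = p for this A
for m in range(6):
    closed_form = H(p) + m * markov_shift_entropy(p, A)
    print(m, round(block_entropy(p, A, m), 6), round(closed_form, 6))
```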
If we read sequences one entry at a time from right to left, the number $-\sum_{i,j}p_ia_{ij}\log_2a_{ij}$ can be interpreted as the average uncertainty about what symbol will appear next, given the one we see at present. In fact, in this case $H(\alpha|\bigvee_{k=1}^\infty\sigma^{-k}\alpha)=H(\alpha|\sigma^{-1}\alpha)$; this is consistent with the Markov property, according to which the probability of what happens depends only on the immediately preceding situation rather than on the full past. The following fact is often useful when computing entropy.
Proposition 3.6 If $\alpha_1\le\alpha_2\le\cdots$ is an increasing sequence of finite partitions and $\bigvee_{n=1}^\infty\mathscr B(\alpha_n)=\mathscr B$ up to sets of measure 0, then
$$h(T)=\lim_{n\to\infty}h(\alpha_n,T).$$
Proof: Let $\beta$ be any finite measurable partition of $X$. According to Proposition 2.13, for each $n=1,2,\dots$ we have
$$h(\beta,T)\le h(\alpha_n,T)+H(\beta|\alpha_n).$$
By Proposition 2.11, $H(\beta|\alpha_n)\to H(\beta|\mathscr B)=0$. Therefore $h(\beta,T)\le\lim\uparrow h(\alpha_n,T)$. It follows that $h(T)=\lim_nh(\alpha_n,T)$.
Example 3.7 Entropy of a product transformation Let $T_1:X_1\to X_1$ and $T_2:X_2\to X_2$ be m.p.t.s on probability spaces $(X_1,\mathscr B_1,\mu_1)$ and $(X_2,\mathscr B_2,\mu_2)$. Recall that the product transformation $T=T_1\times T_2$ on $X_1\times X_2$ is defined by $T(x_1,x_2)=(T_1x_1,T_2x_2)$. If $\alpha_1$ and $\alpha_2$ are finite partitions of $X_1$ and $X_2$, respectively, define partitions $\bar\alpha_1$ and $\bar\alpha_2$ of $X_1\times X_2$ by
$$\bar\alpha_1=\{A\times X_2:A\in\alpha_1\},\qquad\bar\alpha_2=\{X_1\times A:A\in\alpha_2\}.$$
Then $\bar\alpha_1$ and $\bar\alpha_2$ are independent, as are $(\bar\alpha_1)_0^j$ and $(\bar\alpha_2)_0^j$ for all $j$. Let $\alpha=\bar\alpha_1\vee\bar\alpha_2$. Because $T^{-j}\alpha=T^{-j}\bar\alpha_1\vee T^{-j}\bar\alpha_2$, for each $n=1,2,\dots$
$$H(\alpha_0^{n-1})=H((\bar\alpha_1)_0^{n-1})+H((\bar\alpha_2)_0^{n-1})=H((\alpha_1)_0^{n-1})+H((\alpha_2)_0^{n-1}),$$
and consequently
$$h(\alpha,T)=h(\alpha_1,T_1)+h(\alpha_2,T_2).$$
Now if we take increasing sequences of partitions $\{\alpha_1^{(n)}\}$ of $X_1$ and $\{\alpha_2^{(n)}\}$ of $X_2$ with $\bigvee_{n=1}^\infty\mathscr B(\alpha_1^{(n)})=\mathscr B_1$ and $\bigvee_{n=1}^\infty\mathscr B(\alpha_2^{(n)})=\mathscr B_2$ up to sets of measure 0, then $\bigvee_{n=1}^\infty\mathscr B(\alpha_1^{(n)}\times\alpha_2^{(n)})$ is the $\sigma$-algebra of $X_1\times X_2$ up to sets of measure 0. By Proposition 3.6, $h(T_1\times T_2)=h(T_1)+h(T_2)$.
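In the special case of two Markov shifts the additivity of Example 3.7 can be checked directly. The sketch assumes (a standard fact, not stated in the text) that the product of two Markov shifts is again a Markov shift, on pairs of states, with transition probabilities $a_{ij}b_{kl}$ and stationary vector $(p_iq_k)$; the matrices and the helper `kron` are illustrative choices of this edit.

```python
import math

def H(v):
    """Entropy in bits of a probability vector."""
    return -sum(x * math.log2(x) for x in v if x > 0)

def markov_shift_entropy(p, A):
    """-sum_{i,j} p_i a_ij log2 a_ij."""
    return sum(p[i] * H(A[i]) for i in range(len(p)))

def kron(A, B):
    """Transition matrix of the product chain: (i, k) -> (j, l) with probability a_ij * b_kl.
    The pair (i, k) is stored at index i * len(B) + k."""
    return [[A[i][j] * B[k][l] for j in range(len(A)) for l in range(len(B))]
            for i in range(len(A)) for k in range(len(B))]

A, p = [[0.9, 0.1], [0.4, 0.6]], [0.8, 0.2]
B, q = [[0.2, 0.8], [0.5, 0.5]], [5 / 13, 8 / 13]
AB = kron(A, B)
pq = [pi * qk for pi in p for qk in q]        # product of the two stationary vectors
print(markov_shift_entropy(pq, AB),
      markov_shift_entropy(p, A) + markov_shift_entropy(q, B))
```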
Exercises
1. Calculate the entropy of a Markov shift by using the formula $h(\alpha,T)=H(\alpha|\alpha_1^\infty)$. Obtain the entropy of a Bernoulli shift as a corollary.
2. What can you say about the entropy of an infinite product transformation? What about an inverse limit of m.p.t.s?
3. The entropy of a stationary stochastic process $\{f_n:-\infty<n<\infty\}$ is 0 if and only if $f_0$ is a measurable function of $f_{-1},f_{-2},f_{-3},\dots$.
4. Given any $r>0$, there is a Bernoulli shift of entropy $r$.
5. Show that the set $\Phi$ of equivalence classes (mod 0) of finite measurable partitions of $(X,\mathscr B,\mu)$ is a metric space under the entropy metric $d(\alpha,\beta)=H(\alpha|\beta)+H(\beta|\alpha)$.
6. Show that $|h(\alpha,T)-h(\beta,T)|\le d(\alpha,\beta)$ for $\alpha,\beta\in\Phi$, so that $h(\alpha,T)$ is a continuous function of $\alpha$.
7. Show that if $\alpha_1\le\alpha_2\le\cdots$ are finite partitions with $\bigvee_{k=1}^\infty\mathscr B(\alpha_k)=\mathscr B$ up to sets of measure 0, then $\{\beta\in\Phi:\beta\le\alpha_n$ for some $n\}$ is dense in $\Phi$ (with respect to $d$). (Hint: Given $\alpha\in\Phi$, approximate all the elements of $\alpha$ well by members of some $\mathscr B(\alpha_n)$, and take $\beta$ to be the partition the latter generate. Then use Exercise 6.)
8. Use Exercises 5-7 to give another (namely Sinai's) proof of the Kolmogorov-Sinai Theorem. (Hint: Let $\alpha$ be a generator and $\alpha_n=\alpha_{-n}^n$. Show that if $\beta\le\alpha_n$ for some $n$, then $h(\beta,T)\le h(\alpha,T)$.)
9. Show that if $p_i\le q_j$ for all $i$ and $j$, then the entropy of the Bernoulli shift $B(p_1,p_2,\dots,p_n)$ is no less than that of $B(q_1,q_2,\dots,q_m)$.
10. Show that if $\alpha$ is a countable generator for $T$ with $H(\alpha)<\infty$, then $h(T)=h(\alpha,T)$. What can you say if you find a countable generator $\alpha$ for $T$ with $H(\alpha)=\infty$?
6 More about entropy
This chapter treats several topics in entropy theory that are somewhat beyond the basics. We begin by computing the entropies of automorphisms of the torus, skew products, and induced transformations. The following sections discuss convergence of the information per unit time (Shannon-McMillan-Breiman Theorem) and the topological version of entropy for cascades. We give an introduction to the Ornstein Isomorphism Theorem, which says that two Bernoulli schemes are isomorphic if and only if they have the same entropy. Ornstein's associated theory of sufficient conditions for m.p.t.s to be isomorphic to Bernoulli shifts has produced a surprising list of examples, including classical ones like geodesic maps and automorphisms of the torus, that are metrically indistinguishable from repeated independent random experiments. In the final section we present the Keane-Smorodinsky construction of the isomorphism whose existence is implied by Ornstein's theorem. Their work actually strengthens Ornstein's result, since they are able to construct the isomorphism explicitly, and the map is finitary: each coordinate of the image of a point can be calculated from knowledge of only a finite piece of the history of that point. (Alternatively, the map is a homeomorphism once a set of measure 0 has been deleted.) This means that in principle such a coding can actually be carried out mechanically.
6.1. More examples of the computation of entropy
A. Entropy of an automorphism of the torus Let $K^n=\mathbb R^n/\mathbb Z^n$ be the $n$-dimensional torus and $T:K^n\to K^n$ the Haar-measure-preserving map determined by an $n\times n$ integer matrix $A$ with determinant $\pm1$. Assume that $T$ is ergodic, so that among the eigenvalues $\lambda_1,\dots,\lambda_n$ of $A$ there are no roots of unity. (Then in fact $T$ is even strongly mixing; see Section 2.5.) The formula
$$h(T)=\sum_{|\lambda_i|>1}\log_2|\lambda_i|$$
for the entropy of $T$ was found by Sinai (1959b). We sketch, for $n=2$, the proof given by Berg (1968).
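For $n=2$ the formula is easy to evaluate, and the ergodicity hypothesis reduces to a test on trace and determinant, following the eigenvalue argument given below for the $2\times2$ case. This is a sketch, not Berg's partition construction: the function `toral_entropy_2x2` is a hypothetical helper that simply applies Sinai's formula to the roots of $x^2-(\operatorname{trace}A)x+\det A$.

```python
import math

def toral_entropy_2x2(M):
    """Sinai's formula h(T) = sum of log2|lambda| over eigenvalues with |lambda| > 1,
    for T(v) = Mv mod 1 on the 2-torus, M an integer matrix with det M = +-1.
    Raises ValueError when an eigenvalue is a root of unity (T not ergodic)."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    if det not in (1, -1):
        raise ValueError("determinant must be +-1")
    disc = tr * tr - 4 * det          # discriminant of x^2 - tr*x + det
    if disc <= 0:
        # complex or repeated eigenvalues have modulus 1 and are roots of unity here
        raise ValueError("eigenvalues are roots of unity")
    root = math.sqrt(disc)
    lams = ((tr + root) / 2, (tr - root) / 2)
    expanding = [abs(x) for x in lams if abs(x) > 1]
    if not expanding:                 # det = -1, tr = 0 gives the eigenvalues +1 and -1
        raise ValueError("eigenvalues are roots of unity")
    return sum(math.log2(x) for x in expanding)

cat = [[2, 1], [1, 1]]
fib = [[1, 1], [1, 0]]                # fib squared equals cat
print(toral_entropy_2x2(cat), toral_entropy_2x2(fib))
```

Since the second matrix squared is the first, the two values differ by a factor of 2, in line with $h(T^k)=|k|h(T)$ (Exercise 5.2.8).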
The idea is to construct a sequence of partitions $\alpha_1\le\alpha_2\le\cdots$ with $\bigvee_{n=1}^\infty\mathscr B(\alpha_n)=\mathscr B$ up to sets of measure 0 and such that the entropy of $T$ with respect to each $\alpha_n$ is the same. Then the result will follow from Proposition 5.3.6. Consider now the integer matrix $A=\begin{pmatrix}a&b\\c&d\end{pmatrix}$ with determinant $\pm1$ which determines $T:K^2\to K^2$ by $T\begin{pmatrix}x\\y\end{pmatrix}=A\begin{pmatrix}x\\y\end{pmatrix}\bmod1$. $A$ has characteristic polynomial $\chi_A(x)=x^2-(\operatorname{trace}A)x+\det A$. I claim that the eigenvalues $\lambda_1,\lambda_2$ of $A$ are real, with, say, $|\lambda_1|>1$ and $|\lambda_2|<1$. For if $A$ had a complex eigenvalue $\lambda$, then its other eigenvalue would be the complex conjugate $\bar\lambda$. Then $\lambda+\bar\lambda=2\operatorname{Re}\lambda=\operatorname{trace}A$ and $\lambda\bar\lambda=|\lambda|^2=\det A$, so that $\det A=1$ and $|\lambda|=1$. Since trace $A$ is an integer, we would have $\operatorname{Re}\lambda=0$, $\pm\frac12$, or $\pm1$. The corresponding $\lambda$s would be $\pm i$, $\pm\frac12\pm i\frac{\sqrt3}2$, and $\pm1$, all roots of unity, which we have ruled out. The eigenspaces $V_1$ and $V_2$ corresponding to the eigenvalues $\lambda_1$ and $\lambda_2$, when $A$ is considered as a linear transformation of the plane, are two lines through the origin, each with irrational slope. (This is so because if $m=\operatorname{trace}A$, then $\lambda_{1,2}=(m\pm\sqrt{m^2-4\det A})/2$, and $m^2\mp4$ is not a perfect square in the cases that remain: $m^2+4$ is a perfect square only for $m=0$, and $m^2-4$ only for $m=\pm2$, and each of these cases would give an eigenvalue $\pm1$, which is impossible.) Remembering the Kronecker-Weyl Theorem, the projection of each of these lines to the torus $K^2=\mathbb R^2/\mathbb Z^2$ is dense. Assume without loss of generality that $\lambda_1,\lambda_2>0$ and for the sake of the pictures that both eigenvectors are in the first quadrant. One of our partitions $\alpha$ is constructed as follows. Beginning at the origin, take a long segment of $V_1$ and project it to the torus. Call the projection $P_1$.
Do the same for $V_2$ to get $P_2$.
Extend any 'hanging ends' like those indicated by the circles so as just to touch the projection of the other half-line.
Finally, beginning at whichever of $(1,0)$, $(1,1)$, or $(0,1)$ is possible, retreat along $V_1$, $V_2$, or both until just coming into contact with the already existing dividing lines; this guarantees that there will not be a 'hanging edge' at $(0,0)\in K^2$.
We obtain a partition $\alpha$ of $K^2$ into finitely many parallelograms with disjoint interiors whose boundaries are made up of segments of $P_1$ and $P_2$, the projections of $V_1$ and $V_2$ into the torus. Now the action of $A$ on $K^2$ is to expand vectors in the direction of $V_1$ by a factor of $\lambda_1$ and to contract those in the $V_2$ direction by a factor $\lambda_2$. Thus one of these small parallelograms behaves under $A$ as shown:
[Figure: a parallelogram $R$ of $\alpha$ and its image under $A$]
(after which it is reduced mod 1). Because $V_2$ is an invariant subspace for $A$, we have $AP_2\subset P_2$. This means that the dotted (contracting)
boundaries of the above parallelogram must again reach the dotted boundary, and they can't stop short as one end does here:
[Figure: an image parallelogram whose dotted (contracting) edge stops short of the dotted boundary]
Similarly, $A^{-1}P_1\subset P_1$. This means that the image $AR$ of $R$ contains no part of the solid (expanding) boundary $P_1$ in its interior. Therefore, $A$ carries $R$ into something that sits nicely with respect to the partition $\alpha$:
In the expanding direction, $AR$ extends fully across each element of $\alpha$ whose interior it hits; and $AR$ contains no part of the expanding boundary in its interior. These are the key properties of a Markov partition. Such partitions were discovered by Adler and Weiss (1967) and have been studied extensively by Sinai (1968) and Bowen (1970a), (1970b), (1978). The existence of such a partition greatly facilitates the representation of a transformation by symbolic dynamics and, what is of most immediate interest for us, the calculation of entropy. Fix $k=1,2,\dots$. The partition $\alpha\vee T\alpha\vee\cdots\vee T^k\alpha$ arises by subdividing each parallelogram of $\alpha$ into long thin parallelograms by subdividing the contracting boundary of $\alpha$ only. In the figure, the large parallelogram is an element of $\alpha$; the small ones belong to $\alpha\vee T\alpha\vee\cdots\vee T^k\alpha$.
The contracting boundary of each element $E$ of $\alpha\vee T\alpha\vee\cdots\vee T^k\alpha$ is between $\lambda_1^{-k}$ times the lengths of the shortest and longest contracting boundaries of the elements of $\alpha$, and its expanding boundary has the same length as an expanding boundary of one of the elements of $\alpha$; therefore
$$\frac1c\le\lambda_1^k\mu(E)\le c$$
for some constant $c>0$, for all $E\in\alpha\vee T\alpha\vee\cdots\vee T^k\alpha$ and all $k=1,2,\dots$. Now we can write
$$\begin{aligned}H(\alpha\vee T\alpha\vee\cdots\vee T^k\alpha)&=-\sum_E\mu(E)\log_2\mu(E)=-\sum_Eb_E\lambda_1^{-k}\log_2(b_E\lambda_1^{-k})\\&\qquad(\text{where }b_E=\lambda_1^k\mu(E),\text{ so that }1/c\le b_E\le c\text{ for all }E)\\&=-\sum_E\mu(E)\log_2b_E+k\log_2\lambda_1\sum_E\mu(E),\end{aligned}$$
and make the estimate
$$\big|H(\alpha\vee T\alpha\vee\cdots\vee T^k\alpha)-k\log_2\lambda_1\big|\le\sum_E\mu(E)\,|\log_2b_E|\le|\log_2c|.$$
Then clearly
$$h(\alpha,T)=\lim_{k\to\infty}\frac1{k+1}H(\alpha\vee T\alpha\vee\cdots\vee T^k\alpha)=\log_2\lambda_1.$$
By beginning with longer and longer segments of the eigenspaces $V_1$ and $V_2$, we can produce a sequence of partitions $\alpha_1,\alpha_2,\dots$ for which $\bigvee_{n=1}^\infty\mathscr B(\alpha_n)=\mathscr B$ up to sets of measure 0. That $h(T)=\log_2\lambda_1$ then follows from Proposition 5.3.6. Adler and Weiss (1970) used these Markov partitions to show that entropy is a complete invariant for automorphisms of the two-dimensional torus. This result was subsumed in the later work of Katznelson (1971),
who showed that ergodic toral automorphisms are isomorphic to Bernoulli shifts.
B. Entropy of a skew product Let us consider a slightly more general type of skew product than we discussed in 2.4. Let $T:X\to X$ be an m.p.t. of a probability space $(X,\mathscr B,\mu)$, and $\{S_x:x\in X\}$ a family of m.p.t.s of another probability space $(Y,\mathscr C,\nu)$. Suppose that $\{S_x:x\in X\}$ is measurable in the sense that the map $(x,y)\to S_xy$ is measurable from $X\times Y$ to $Y$. We can then define a transformation $\tau=T\times\{S_x\}$ on $X\times Y$ by
$$\tau(x,y)=(Tx,S_xy).$$
As before, $\tau$ is easily verified to be measurable and measure-preserving. The entropy of $\tau$ was computed by Abramov and Rokhlin (1962). For a finite partition $\beta$ of $Y$ and $n=1,2,\dots$ define
$$\beta_1^n(x)=S_x^{-1}\beta\vee S_x^{-1}S_{Tx}^{-1}\beta\vee\cdots\vee S_x^{-1}S_{Tx}^{-1}\cdots S_{T^{n-1}x}^{-1}\beta$$
and $\beta_1^\infty(x)=\bigvee_{n=1}^\infty\beta_1^n(x)$. Then $H(\beta|\beta_1^\infty(x))$ can be seen to be a measurable function of $x$. We let
$$h_T(\beta,S)=\int_XH(\beta|\beta_1^\infty(x))\,d\mu(x)$$
and define the fiber entropy to be
$$h_T(S)=\sup\{h_T(\beta,S):\beta\text{ a finite measurable partition of }Y\}.$$
We will show that
$$h(T\times\{S_x\})=h(T)+h_T(S).$$
First a couple of preliminary results. Let $\beta_0^n(x)=\beta\vee\beta_1^n(x)$.
Proposition 1.1 $h_T(\beta,S)=\lim_{n\to\infty}(1/n)\int_XH(\beta_0^n(x))\,d\mu(x)$.
Proof: For each $x\in X$,
$$H(\beta_0^n(x))=H(\beta\vee S_x^{-1}\beta_0^{n-1}(Tx))=H(\beta_0^{n-1}(Tx))+H(\beta|S_x^{-1}\beta_0^{n-1}(Tx)).$$
Since $\int_XH(\beta_0^k(Tx))\,d\mu=\int_XH(\beta_0^k(x))\,d\mu$ for all $k$, we may repeat this calculation to find that
$$\int_XH(\beta_0^n(x))\,d\mu=\int_X\big[H(\beta|S_x^{-1}\beta_0^{n-1}(Tx))+H(\beta|S_x^{-1}\beta_0^{n-2}(Tx))+\cdots+H(\beta|S_x^{-1}\beta_0^1(Tx))+H(\beta|S_x^{-1}\beta_0^0(Tx))\big]\,d\mu+H(\beta),$$
so that
$$\frac1n\int_XH(\beta_0^n(x))\,d\mu(x)=\frac1n\sum_{k=0}^{n-1}\int_XH(\beta|S_x^{-1}\beta_0^k(Tx))\,d\mu(x)+\frac{H(\beta)}n.$$
Now $H(\beta|S_x^{-1}\beta_0^k(Tx))=H(\beta|\beta_1^{k+1}(x))$ decreases a.e. to $H(\beta|\beta_1^\infty(x))$ (Proposition 5.2.11); the integrals of these functions converge to
$$\int_XH(\beta|\beta_1^\infty(x))\,d\mu=h_T(\beta,S)$$
by the Bounded Convergence Theorem. Since the Cesàro means of a convergent sequence have the same limit as the sequence itself,
$$\lim_{n\to\infty}\frac1n\int_XH(\beta_0^n(x))\,d\mu(x)=\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\int_XH(\beta|\beta_1^{k+1}(x))\,d\mu(x)=h_T(\beta,S).$$
(The similarity of this part of the argument to the proof of Proposition 5.2.12 is more than coincidental.)
Let $\alpha$ be a finite partition of $X$. Extend $\alpha$ to a partition $\bar\alpha=\{A\times Y:A\in\alpha\}$ of $X\times Y$. Similarly, let $\bar{\mathscr B}=\{A\times Y:A\in\mathscr B\}$. For a partition $\xi$ of $X\times Y$ and $x\in X$, let $\xi_x$ denote the partition of $Y$ determined by the $x$-sections of members of $\xi$: the member of $\xi_x$ determined by a set $Z\in\xi$ is $Z_x=\{y\in Y:(x,y)\in Z\}$.
Proposition 1.2 For a finite partition $\xi$ of $X\times Y$,
$$H(\xi|\bar{\mathscr B})=\int_XH(\xi_x)\,d\mu(x).$$
Proof: Note first that for any $Z\in\xi$,
$$(\mu\times\nu)(Z|\bar{\mathscr B})=\nu(Z_x)\quad\text{a.e. }d(\mu\times\nu).$$
For $\nu(Z_x)$, being a function of $x$ alone, is $\bar{\mathscr B}$-measurable on $X\times Y$; and its integral over a typical set $A\times Y\in\bar{\mathscr B}$ is
$$\int_{A\times Y}\nu(Z_x)\,d\mu(x)\,d\nu(y)=\int_A\nu\{y\in Y:(x,y)\in Z\}\,d\mu(x)=\int_A\int_Y\chi_Z(x,y)\,d\nu(y)\,d\mu(x)=\int_{A\times Y}\chi_Z\,d(\mu\times\nu).$$
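Therefore on each $Z\in\xi$ we have
$$-\log_2(\mu\times\nu)(Z|\bar{\mathscr B})=-\log_2\nu(Z_x)\quad\text{a.e.},$$
and hence
$$I_{\xi|\bar{\mathscr B}}(x,y)=I_{\xi_x}(y)\quad\text{a.e.}$$
Then clearly
$$H(\xi|\bar{\mathscr B})=\int_{X\times Y}I_{\xi|\bar{\mathscr B}}(x,y)\,d\nu\,d\mu=\int_X\int_YI_{\xi_x}(y)\,d\nu(y)\,d\mu(x)=\int_XH(\xi_x)\,d\mu(x).$$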
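Proposition 1.3 $h(T\times\{S_x\})=h(T)+h_T(S)$.
Proof: As in the computation of the entropy of a direct product, it is enough to show that the supremum of $h(\zeta,\tau)$ over all finite partitions $\zeta$ of $X\times Y$ which are of the form $\zeta=\alpha\times\beta$, where $\alpha$ and $\beta$ are finite partitions of $X$ and $Y$, respectively, is $h(T)+h_T(S)$. Now for such $\zeta$ and $n=1,2,\dots$,
$$H(\zeta_0^n)=H(\bar\alpha_0^n\vee\zeta_0^n)=H(\bar\alpha_0^n)+H(\zeta_0^n|\bar\alpha_0^n)\ge H(\alpha_0^n)+H(\zeta_0^n|\bar{\mathscr B})=H(\alpha_0^n)+\int_XH((\zeta_0^n)_x)\,d\mu(x)$$
(where $\alpha_0^n=\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-n}\alpha$ and $\zeta_0^n=\zeta\vee\tau^{-1}\zeta\vee\cdots\vee\tau^{-n}\zeta$). Because $\zeta=\alpha\times\beta$, each $(\zeta_0^n)_x=\beta_0^n(x)$. Using Proposition 1.1, then,
$$h(\zeta,\tau)=\lim_{n\to\infty}\frac1{n+1}H(\zeta_0^n)\ge\lim_{n\to\infty}\frac1{n+1}H(\alpha_0^n)+\lim_{n\to\infty}\frac1{n+1}\int_XH(\beta_0^n(x))\,d\mu(x)=h(\alpha,T)+h_T(\beta,S).$$
It follows by taking suprema over $\alpha$ and $\beta$ that
$$h(\tau)\ge h(T)+h_T(S).$$
For the reverse inequality, begin by fixing finite partitions $\zeta$ of $X\times Y$ and $\alpha$ of $X$. We have, for $n,m=1,2,\dots$,
$$\frac{H(\zeta_0^{nm-1})}{nm}\le\frac{H(\zeta_0^{nm-1}\vee\bar\alpha_0^{nm-1})}{nm}=\frac{H(\alpha_0^{nm-1})}{nm}+\frac{H(\zeta_0^{nm-1}|\bar\alpha_0^{nm-1})}{nm}.$$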
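Because
$$H(\zeta_0^{nm-1}|\bar\alpha_0^{nm-1})=H\big(\zeta_0^{m-1}\vee\tau^{-m}\zeta_0^{m-1}\vee\cdots\vee\tau^{-(n-1)m}\zeta_0^{m-1}\,\big|\,\bar\alpha_0^{nm-1}\big)\le\sum_{k=0}^{n-1}H(\tau^{-km}\zeta_0^{m-1}|\tau^{-km}\bar\alpha_0^{m-1})=nH(\zeta_0^{m-1}|\bar\alpha_0^{m-1}),$$
this says that
$$\frac{H(\zeta_0^{nm-1})}{nm}\le\frac{H(\alpha_0^{nm-1})}{nm}+\frac{H(\zeta_0^{m-1}|\bar\alpha_0^{m-1})}m.$$
Taking limits as $n\to\infty$ gives
$$h(\zeta,\tau)\le h(\alpha,T)+\frac1mH(\zeta_0^{m-1}|\bar\alpha_0^{m-1}).$$
Now choose $\alpha$ from a sequence $\alpha_1\le\alpha_2\le\cdots$ for which $\bigvee_{k=1}^\infty\mathscr B(\alpha_k)=\mathscr B$ up to sets of measure 0. Since $h(\alpha_k,T)\le h(T)$ and, by Proposition 5.2.11, $H(\zeta_0^{m-1}|(\bar\alpha_k)_0^{m-1})\to H(\zeta_0^{m-1}|\bar{\mathscr B})$, we can conclude that
$$h(\zeta,\tau)\le h(T)+\frac1mH(\zeta_0^{m-1}|\bar{\mathscr B}),$$
which in a case when $\zeta=\alpha\times\beta$ (a different $\alpha$ from the one that just disappeared) says, by Proposition 1.2, that
$$h(\zeta,\tau)\le h(T)+\frac1m\int_XH(\beta_0^{m-1}(x))\,d\mu(x).$$
Letting $m\to\infty$ gives
$$h(\zeta,\tau)\le h(T)+h_T(\beta,S),$$
and taking suprema over $\alpha$ and $\beta$ shows that
$$h(\tau)\le h(T)+h_T(S).$$
C. Entropy of an induced transformation Let $T:X\to X$ be an m.p.t. on $(X,\mathscr B,\mu)$ and $E\in\mathscr B$ a set of positive measure. Recall that the induced or derivative transformation $T_E:E\to E$ on $(E,\mathscr B\cap E,\mu_E)$ is defined by
$$T_Ex=T^{n_E(x)}x,\quad\text{where}\quad n_E(x)=\inf\{n\ge1:T^nx\in E\}.$$
The formula
$$h(T_E)=\frac{h(T)}{\mu(E)}$$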
was found by Abramov (1959); we give here the simple proof due to H. Scheller (see Krengel 1970, Denker, Grillenberger and Sigmund 1976, Neveu 1969). Notice that in general $h(T_E)\ge h(T)$: our uncertainty about the action of $T_E$ is greater, since points have some chance to wander around while they're in $E^c$. Let $\alpha$ be a countable measurable partition of $E$ with $H(\alpha)<\infty$; we will also consider $\alpha$ to be a partition of $X$, by adjoining $E^c$. We may assume also that all the sets $E_n=\{x\in E:n_E(x)=n\}$ are in $\mathscr B(T_E^{-1}\alpha)$. To see this, it is necessary to verify that the partition $\{E_n\}$ of $E$ has finite entropy. If the measures of the $E_n$, arranged in decreasing order, are $e_1\ge e_2\ge\cdots\ge e_n\ge\cdots$, then the non-ergodic version of Kac's formula (2.4.6) gives
$$1\ge\sum_{k=1}^\infty ke_k\ge\Big(\sum_{k=1}^nk\Big)e_n=\frac{n(n+1)}2e_n\ge\frac{n^2}2e_n,$$
so that $e_n\le2/n^2$ for all $n$. Since $-x\log_2x\le x^{2/3}$ for $x$ sufficiently close to 0,
$$-e_n\log_2e_n\le e_n^{2/3}\le2^{2/3}n^{-4/3}$$
for large $n$, so that $H(\{E_n\})=-\sum_ke_k\log_2e_k<\infty$.
Since it is possible to find an increasing sequence $\{\alpha_k\}$ of such partitions $\alpha$ for which $\mathscr B((\alpha_k)_{-\infty}^\infty)$ increases to $\mathscr B$ (up to sets of measure 0), it follows from Proposition 5.3.6 and Exercises 5.2.7 and 5.2.11 that $h(T)$ is the supremum of $h(\alpha,T)$ over all such $\alpha$. For each $n=1,2,\dots$, $E_n=T^{-n}(T_EE_n)$, since $T_E|_{E_n}=T^n$, so that each $E_n\in\alpha_1^\infty$ and hence also $E\in\alpha_1^\infty$. We claim that the following equation holds relating two sub-$\sigma$-algebras of $\mathscr B\cap E$:
$$\bigvee_{k=1}^\infty T_E^{-k}\alpha=\alpha_1^\infty\cap E.$$
First notice that if $A\subset E$, then
$$T_E^{-1}A=\bigcup_{k=1}^\infty T_E^{-1}(T_EE_k\cap A)=\bigcup_{k=1}^\infty E_k\cap T^{-k}A.$$
Since each $E_k\in\alpha_1^\infty\cap E$, if $A\in\alpha$ then each set in the above union is in $\alpha_1^\infty\cap E$ also, and hence $T_E^{-1}A\in\alpha_1^\infty\cap E$. If $A\in\alpha\vee T_E^{-1}\alpha$, again $T_E^{-1}A\in\alpha_1^\infty\cap E$. Continuing in this way, we see that
$$\bigvee_{k=1}^\infty T_E^{-k}\alpha\subset\alpha_1^\infty\cap E.$$
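For the reverse inclusion, notice that if $A\in\alpha$ is a subset of $E$, then
$$\begin{aligned}T^{-n}A\cap E={}&(T_E^{-1}A\cap E_n)\cup\bigcup_{j_1+j_2=n}(T_E^{-2}A\cap E_{j_1}\cap T_E^{-1}E_{j_2})\cup\cdots\\&\cup(T_E^{-n}A\cap E_1\cap T_E^{-1}E_1\cap\cdots\cap T_E^{-(n-1)}E_1)\in\bigvee_{k=1}^\infty T_E^{-k}\alpha,\end{aligned}$$
because each $E_i\in T_E^{-1}\alpha$. For a sub-$\sigma$-algebra $\mathscr F\subset\mathscr B$ and $A\subset E\in\mathscr F$,
$$\mu(A|\mathscr F)=\mu_E(A|E\cap\mathscr F)\quad\text{a.e. on }E;$$
and if $A\in E\cap\mathscr F$, then of course $\mu_E(A|E\cap\mathscr F)=\chi_A$ a.e. Therefore $E^c\in\alpha_1^\infty$ and $\alpha(x)=E^c$ on $E^c$ imply that $I_{\alpha|\alpha_1^\infty}(x)=0$ a.e. on $E^c$, and
$$\begin{aligned}h(\alpha,T)=H(\alpha|\alpha_1^\infty)&=\int_XI_{\alpha|\alpha_1^\infty}\,d\mu=\int_EI_{\alpha|\alpha_1^\infty}\,d\mu+\int_{E^c}I_{\alpha|\alpha_1^\infty}\,d\mu\\&=\mu(E)\int_EI_{\alpha|\alpha_1^\infty\cap E}\,d\mu_E=\mu(E)\int_EI_{\alpha|\bigvee_{k=1}^\infty T_E^{-k}\alpha}\,d\mu_E=\mu(E)h(\alpha,T_E),\end{aligned}$$
where the information functions on $E$ are computed with respect to $\mu_E$. Abramov's formula follows by taking the supremum over all $\alpha$ of the type under consideration.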
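Abramov's formula, and the Kac estimate used above, can be checked in closed form on one simple example. The sketch assumes (a standard fact not proved in the text) that for the Bernoulli scheme $B(1-q,q)$ on $\{0,1\}$ and $E=\{x:x_0=1\}$, so that $\mu(E)=q$, the induced map $T_E$ is isomorphic to the Bernoulli shift whose symbols are the return blocks $0^{n-1}1$, with $\{E_n\}$ as generating partition and $\mu_E(E_n)=(1-q)^{n-1}q$. Then $h(T_E)=H_{\mu_E}(\{E_n\})$, and Abramov predicts the value $h(T)/\mu(E)$. The helper names are hypothetical.

```python
import math

def binary_entropy(q):
    """Entropy in bits of the Bernoulli scheme B(1 - q, q)."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def return_time_entropy(q, n_max=20_000):
    """H of the partition {E_n} of E = {x_0 = 1} under mu_E,
    where mu_E(E_n) = (1 - q)^(n - 1) q is the chance that the first return takes n steps."""
    total = 0.0
    for n in range(1, n_max + 1):
        e = (1 - q) ** (n - 1) * q
        if e == 0.0:
            break                      # all remaining terms underflow to 0
        total -= e * math.log2(e)
    return total

def mean_return_time(q, n_max=20_000):
    """Kac: the integral of n_E with respect to mu_E should equal 1 / mu(E) = 1 / q."""
    return sum(n * (1 - q) ** (n - 1) * q for n in range(1, n_max + 1))

for q in (0.1, 0.3, 0.5, 0.9):
    print(q, return_time_entropy(q), binary_entropy(q) / q, mean_return_time(q))
```

The agreement of the first two columns is a check of one closed form, namely $-\sum_n(1-q)^{n-1}q\log_2\big((1-q)^{n-1}q\big)=h(q)/q$, and not a substitute for the proof above.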
Exercises
1. Show that if $Y$ is a compact group, $\nu$ is Haar measure on $Y$, $f:X\to Y$ is measurable, and $\tau(x,y)=(Tx,y+f(x))$, then $h(\tau)=h(T)$.
2. Show that given any $r\in(0,\infty)$, there is a Bernoulli scheme of entropy $r$. Is the same true of toral automorphisms?
3. Give examples of m.p.t.s of infinite entropy.
4. Derive the formula for the entropy of a primitive (skyscraper) transformation built over a base $T:X\to X$.
6.2. The Shannon-McMillan-Breiman Theorem
Consider again a ticker-tape-type source like the one in our discussion of information theory. At each time, measure the amount of information per symbol received that has been conveyed up to that time. It is plausible that in the long run this average amount of information per symbol should converge to the entropy of the source, at least provided that the message that we are receiving accurately reflects the statistics of the source (which happens almost surely if the source is ergodic). Let $T:X\to X$ be an ergodic m.p.t. on $(X,\mathscr B,\mu)$ and $\alpha$ a countable partition of $X$ with $H(\alpha)<\infty$. Recall that $I_{\alpha_0^{n-1}}(x)=I_{\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-n+1}\alpha}(x)$ measures the amount of information conveyed when the first $n$ symbols of the message $x$ are received (that is, when we know to which elements of $\alpha$ the points $x,Tx,\dots,T^{n-1}x$ belong). We know (Proposition 5.2.10) that
$$\frac1{n+1}H(\alpha_0^n)=\frac1{n+1}\int_XI_{\alpha_0^n}(x)\,d\mu\to h(\alpha,T);$$
the preceding paragraph proposes that in fact the integrand
$$\frac1{n+1}I_{\alpha_0^n}(x)$$
already converges a.e. to $h(\alpha,T)$. That this does in fact occur when $T$ is ergodic is the content of the theorem proved in increasing degrees of generality by C. Shannon (1948), B. McMillan (1953), L. Carleson (1958), Leo Breiman (1957), (1960), A. Ionescu Tulcea (1960), and K. L. Chung (1961). This theorem can also be interpreted as providing an estimate of the speed with which the process of moving by $T$ and intersecting chops up the atoms of $\alpha$. For a typical $A_n\in\alpha_0^n$ and large $n$, we have
$$-\frac1{n+1}\log_2\mu(A_n)\approx h(\alpha,T),\quad\text{so that}\quad\mu(A_n)\approx2^{-(n+1)h(\alpha,T)}.$$
Thus the sizes of the atoms decrease exponentially at a rate determined by the entropy. We will formulate this version of the theorem more precisely below (see Corollary 2.5). The proof of the Shannon-McMillan-Breiman Theorem again depends on a maximal inequality, which in this case is stronger than the usual weak-type (1,1) estimate. Continue to let $T:X\to X$ denote an ergodic m.p.t. on $(X,\mathscr B,\mu)$ and $\alpha$ a countable measurable partition of $X$ with finite entropy.
Lemma 2.1 For each $n=1,2,\dots$ let $f_n=I_{\alpha|T^{-1}\alpha\vee\cdots\vee T^{-n}\alpha}$, and let $f^*=\sup_{n\ge1}f_n$. Then for each $\lambda\ge0$ and each $A\in\alpha$,
$$\mu\{x\in A:f^*(x)>\lambda\}\le2^{-\lambda}.$$
Proof: For each A ea and n = 1, 2, ... , let
fnA = I Aial, = — log2p(A ; T -1 a y ... y T - "a)
and = tx: f ;4(x), ... , f_ 1(x) ....‹.. A. fnA(x) > A l .
Of course f:i4 = f. on A. Since BkA e AT - 'a y ... y T -k a), p(BkA n A) = f xA dp = f p(AIT - l a y .•• y T -k a)dp A Bk 8„A . f 2 - f: dii, B:
Therefore x
00
p{xe A: f *(x) > A} =
E
AB kA n A) < 2 - '
k=1
y 103,1)‘. 2'. k=1
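Corollary 2.2 $f^*\in L^1$.
Proof: Of course $\mu\{x\in A: f^*(x)>\lambda\}\le\mu(A)$, so $\mu\{x\in A: f^*(x)>\lambda\}\le\min\{\mu(A),2^{-\lambda}\}$. As in the proof of Theorem 3.1.16,
$$\int_Xf^*\,d\mu = \int_0^\infty\mu\{f^*>\lambda\}\,d\lambda = \sum_{A\in\alpha}\int_0^\infty\mu\{x\in A: f^*(x)>\lambda\}\,d\lambda\le\sum_{A\in\alpha}\int_0^\infty\min\{\mu(A),2^{-\lambda}\}\,d\lambda$$
$$= \sum_{A\in\alpha}\left[\int_0^{-\log_2\mu(A)}\mu(A)\,d\lambda+\int_{-\log_2\mu(A)}^\infty2^{-\lambda}\,d\lambda\right] = \sum_{A\in\alpha}\left[-\mu(A)\log_2\mu(A)+\frac{\mu(A)}{\log2}\right] = H(\alpha)+\frac{1}{\log2}<\infty.$$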
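The closed-form evaluation of $\int_0^\infty\min\{\mu(A),2^{-\lambda}\}\,d\lambda$ in the last display can be checked numerically. The Python sketch below is only an illustration: the partition masses and the function names are hypothetical. It compares the closed form $-\mu(A)\log_2\mu(A)+\mu(A)/\log2$ (with $\log2$ the natural logarithm) against a midpoint-rule integral, then sums over atoms to recover the bound $H(\alpha)+1/\log2$.

```python
import math

def tail_integral_exact(mass):
    """Closed form of the integral of min(mass, 2**-lam) for lam in [0, inf)."""
    return -mass * math.log2(mass) + mass / math.log(2)

def tail_integral_numeric(mass, upper=60.0, steps=100_000):
    """Midpoint-rule approximation of the same integral, truncated at `upper`
    (the neglected tail is at most 2**-upper / log 2)."""
    h = upper / steps
    return h * sum(min(mass, 2.0 ** -((i + 0.5) * h)) for i in range(steps))

# Hypothetical atom masses of a finite partition alpha (they sum to 1).
masses = [0.5, 0.25, 0.125, 0.125]
entropy = -sum(m * math.log2(m) for m in masses)      # H(alpha)
bound = sum(tail_integral_exact(m) for m in masses)    # sum over the atoms
print(entropy, bound, entropy + 1 / math.log(2))
```

The numeric integral only confirms the elementary calculus. The content of Corollary 2.2 is the maximal inequality of Lemma 2.1, which makes the first inequality in the display possible.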
Theorem 2.3 (Shannon-McMillan-Breiman Theorem) Let $T: X\to X$ be an ergodic m.p.t. on $(X,\mathcal{B},\mu)$ and $\alpha$ a countable measurable partition of $X$ with $H(\alpha)<\infty$. Then
$$\lim_{n\to\infty}\frac{1}{n+1}I_{\alpha\vee T^{-1}\alpha\vee\cdots\vee T^{-n}\alpha}(x) = h(\alpha,T)\quad\text{a.e.}$$
Proof: Let $f_n = I_{\alpha|T^{-1}\alpha\vee\cdots\vee T^{-n}\alpha}$ for each $n=1,2,\ldots$, as before. By making
6. More about entropy
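repeated use of the formulas $I_{\alpha\vee\beta} = I_\beta+I_{\alpha|\beta}$ and $I_{T^{-1}\alpha} = I_\alpha\circ T$ (see Proposition 5.2.3 and Exercise 5.2.9), one can see that
$$I_{\alpha_0^n} = I_{T^{-1}\alpha\vee\cdots\vee T^{-n}\alpha}+I_{\alpha|T^{-1}\alpha\vee\cdots\vee T^{-n}\alpha} = I_{\alpha\vee\cdots\vee T^{-(n-1)}\alpha}\circ T+f_n$$
$$= f_n+f_{n-1}\circ T+\cdots+f_1\circ T^{n-1}+I_\alpha\circ T^n = \sum_{k=0}^nf_{n-k}\circ T^k,$$
where we have defined $f_0 = I_\alpha$. Let $f = I_{\alpha|\alpha_1^\infty}$, where $\alpha_1^\infty = \bigvee_{k=1}^\infty T^{-k}\alpha$. By Corollary 2.2 and the Martingale Convergence Theorem (see 3.4), $f = \lim_{n\to\infty}f_n$ both pointwise a.e. and in $L^1$. Then we may write
$$\frac{1}{n+1}I_{\alpha_0^n} = \frac{1}{n+1}\sum_{k=0}^nf_{n-k}\circ T^k = \frac{1}{n+1}\sum_{k=0}^nf\circ T^k+\frac{1}{n+1}\sum_{k=0}^n(f_{n-k}-f)\circ T^k.$$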
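By the Ergodic Theorem,
$$\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^nf\circ T^k = \int_Xf\,d\mu = \int_XI_{\alpha|\bigvee_{k=1}^\infty T^{-k}\alpha}\,d\mu = H\Big(\alpha\Big|\bigvee_{k=1}^\infty T^{-k}\alpha\Big) = h(\alpha,T)\quad\text{a.e.};$$
therefore, in order to prove the Theorem it suffices to show that
$$(1)\qquad\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^n|f_{n-k}-f|\circ T^k = 0\quad\text{a.e.}$$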
For each $N=1,2,\ldots$, let
$$F_N = \sup_{k\ge N}|f_k-f|.$$
Then
$$\frac{1}{n+1}\sum_{k=0}^n|f_{n-k}-f|\circ T^k = \frac{1}{n+1}\sum_{k=0}^{n-N}|f_{n-k}-f|\circ T^k+\frac{1}{n+1}\sum_{k=n-N+1}^n|f_{n-k}-f|\circ T^k$$
$$\le\frac{1}{n+1}\sum_{k=0}^{n-N}F_N\circ T^k+\frac{1}{n+1}\sum_{k=n-N+1}^n|f_{n-k}-f|\circ T^k.$$
Fix $N$ and let $n\to\infty$. Because $|f_{n-k}-f|\le f^*+f\in L^1$ for all $k$, the second term above, which has only $N$ summands, tends to zero a.e. Similarly, $0\le F_N\le f^*+f$, so we may apply the Ergodic Theorem to the first term:
$$\limsup_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n-N}|f_{n-k}-f|\circ T^k\le\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n-N}F_N\circ T^k = \int_XF_N\,d\mu.$$
The Dominated Convergence Theorem and the observation that $F_N\to0$ a.e. then yield (1).
Remark 2.4 Because the convergence in the Ergodic Theorem also takes place in $L^1$, and the $F_N$ above are dominated by an integrable function, also
$$\frac{1}{n+1}I_{\alpha_0^n}\to h(\alpha,T)\quad\text{in }L^1.$$
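For a Bernoulli shift the theorem can be watched numerically. In the Python sketch below, the weights $(0.3,0.7)$, the sample sizes, and the function name are our own illustrative choices. $\alpha$ is the time-0 partition, so $h(\alpha,T) = H(\alpha)$, and the atom $A_n$ of $\alpha_0^n$ containing a point is the cylinder fixed by its coordinates $x_0,\ldots,x_n$. The function samples one point and returns $-\frac{1}{n+1}\log_2\mu(A_n)$.

```python
import math
import random

def smb_estimate(probs, n, seed=0):
    """Sample x from the Bernoulli measure with the given weights and return
    -(1/(n+1)) * log2 mu(A_n), where A_n is the atom of alpha_0^n containing x,
    i.e. the cylinder determined by x_0, ..., x_n."""
    rng = random.Random(seed)
    coords = rng.choices(range(len(probs)), weights=probs, k=n + 1)
    log2_mass = sum(math.log2(probs[c]) for c in coords)   # log2 mu(A_n)
    return -log2_mass / (n + 1)

probs = [0.3, 0.7]
h = -sum(p * math.log2(p) for p in probs)   # h(alpha, T) = H(alpha) for this shift
for n in (10, 1_000, 100_000):
    print(n, round(smb_estimate(probs, n), 4), round(h, 4))
```

Because the coordinates are independent here, $I_{\alpha_0^n}(x) = \sum_{k=0}^n-\log_2\mu(A_{x_k})$, and the statement reduces to the strong law of large numbers. The martingale argument above is what handles general ergodic $T$.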
The following consequence of the Shannon-McMillan-Breiman Theorem is sometimes called the entropy equipartition property.
Corollary 2.5 Let $T: X\to X$ be ergodic and $\alpha$ a countable measurable partition of $X$ with $H(\alpha)<\infty$. Given $\varepsilon>0$ there is an $n_0$ such that for $n\ge n_0$ the elements of $\alpha_0^n$ can be divided into two classes, $\mathscr{G}$ and $\mathscr{B}$ (the 'good' and 'bad' atoms), such that:
(i) $\mu(\bigcup_{B\in\mathscr{B}}B)<\varepsilon$;
(ii) for each $G\in\mathscr{G}$, $2^{-(n+1)(h(\alpha,T)+\varepsilon)}<\mu(G)<2^{-(n+1)(h(\alpha,T)-\varepsilon)}$.
Proof: Since $\frac{1}{n+1}I_{\alpha_0^n}\to h(\alpha,T)$ a.e., the convergence also takes place in measure: for each $\varepsilon>0$ there is an $n_0$ such that if $n\ge n_0$ then
$$\mu\left\{x: \left|\frac{1}{n+1}I_{\alpha_0^n}(x)-h(\alpha,T)\right|\ge\varepsilon\right\}<\varepsilon.$$
Of course $I_{\alpha_0^n}(x)$ is constant on each atom of $\alpha_0^n$. Thus we need only let $\mathscr{G}$ be the collection of all those atoms $G$ of $\alpha_0^n$ on which
$$\left|\frac{1}{n+1}I_{\alpha_0^n}(x)-h(\alpha,T)\right|<\varepsilon,$$
and let $\mathscr{B}$ be all those atoms that are left over.
Exercises
1. Formulate and prove a version of the Shannon-McMillan-Breiman Theorem for nonergodic $T$.
2. Compare the sizes of $\mathscr{G}$, $\mathscr{B}$, and $r^n$ in case $\alpha = \{A_1,A_2,\ldots,A_r\}$ is a finite partition with $h(\alpha,T)<\log_2r$. Interpret in terms of teletypes.
3. Let $\mathcal{B}_1\subset\mathcal{B}_2\subset\cdots$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal{B}$, $\mathcal{B}_\infty = \bigvee_{i=1}^\infty\mathcal{B}_i$, and $\alpha$ a countable measurable partition of $X$ with $H(\alpha)<\infty$. Then $I_{\alpha|\mathcal{B}_n}\to I_{\alpha|\mathcal{B}_\infty}$ a.e. and in $L^1$.
4. Use the Shannon-McMillan-Breiman Theorem to compute the entropy of an ergodic Markov shift.
6.3. Topological entropy
A topological analogue of entropy for a cascade $(X,T)$ ($X$ compact Hausdorff, $T: X\to X$ a homeomorphism) was introduced by Adler, Konheim, and McAndrew (1965). Later, Bowen (1971) gave a different definition for (possibly noncompact) metric or uniform spaces. A similar idea appears in Dinaburg (1970). The relationship between topological entropy and metric entropy was found by Goodwyn (1969, 1972), Dinaburg (1970), and Goodman (1971): the topological entropy of a homeomorphism is the supremum of its metric entropies over all ergodic invariant Borel probability measures. Given this 'variational principle', many of the properties of topological entropy follow immediately from the corresponding statements for metric entropy.
In the case of a homeomorphism $T: X\to X$ of a compact Hausdorff space $X$, we deal with open covers $\mathcal{U}$ of $X$ rather than with measurable partitions. We say that $\mathcal{V}$ is a refinement of $\mathcal{U}$, and write $\mathcal{V}\succ\mathcal{U}$, if every $V\in\mathcal{V}$ is a subset of some $U\in\mathcal{U}$. The least common refinement or join, $\mathcal{U}\vee\mathcal{V}$, of $\mathcal{U}$ and $\mathcal{V}$ consists of the nonempty members of $\{U\cap V: U\in\mathcal{U},V\in\mathcal{V}\}$. For each open cover $\mathcal{U}$ of $X$, let $N(\mathcal{U})$ be the minimum of the cardinalities of the subcovers of $\mathcal{U}$, and $H(\mathcal{U}) = \log_2N(\mathcal{U})$.
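Proposition 3.1 $H(\mathcal{U}\vee\mathcal{V})\le H(\mathcal{U})+H(\mathcal{V})$.
Proof: $N(\mathcal{U}\vee\mathcal{V})\le N(\mathcal{U})N(\mathcal{V})$.
For $-\infty<m\le n<\infty$, define
$$\mathcal{U}_m^n = \bigvee_{k=m}^nT^{-k}\mathcal{U} = T^{-m}\mathcal{U}\vee T^{-(m+1)}\mathcal{U}\vee\cdots\vee T^{-n}\mathcal{U}.$$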
Proposition 3.2 $h(\mathcal{U},T) = \lim_{n\to\infty}\frac1nH(\mathcal{U}_0^{n-1})$ exists, and $h(\mathcal{U},T)\le H(\mathcal{U})$.
Proof: If $H_n = H(\mathcal{U}_0^{n-1})$, then
$$H_{n+m} = H(\mathcal{U}_0^{n-1}\vee T^{-n}\mathcal{U}_0^{m-1})\le H(\mathcal{U}_0^{n-1})+H(T^{-n}\mathcal{U}_0^{m-1}) = H_n+H_m.$$
Then the proof of Proposition 5.2.10 applies to show that $\{H_n/n\}$ is bounded by $H_1 = H(\mathcal{U})$ and has a limit as $n\to\infty$.
Definition 3.3 The topological entropy of a cascade $(X,T)$ is defined to be
$$h_{\mathrm{top}}(T) = \sup_{\mathcal{U}}h(\mathcal{U},T).$$
Notice that $\mathcal{V}\succ\mathcal{U}$ implies $h(\mathcal{V},T)\ge h(\mathcal{U},T)$. Thus $h_{\mathrm{top}}(T)$ is the increasing limit of the net $h(\mathcal{U},T)$, where the index set is the family of all open covers of $X$. The following fact, which is useful for the computation of topological entropy, is apparent.
Proposition 3.4 If $\{\mathcal{U}_n: n=1,2,\ldots\}$ is a refining sequence of open covers, in that $\mathcal{U}_1\prec\mathcal{U}_2\prec\cdots$ and for each finite open cover $\mathcal{V}$ of $X$ we have $\mathcal{V}\prec\mathcal{U}_n$ for some $n$, then
$$h_{\mathrm{top}}(T) = \lim_{n\to\infty}h(\mathcal{U}_n,T).$$
Remark 3.5 If $\mathcal{U}_1\prec\mathcal{U}_2\prec\cdots$ and
$$\operatorname{diam}\mathcal{U}_n = \sup_{U\in\mathcal{U}_n}\operatorname{diameter}(U)\to0,$$
then $\{\mathcal{U}_n\}$ is a refining sequence. For given any open cover $\mathcal{V}$, we can choose $n$ large enough that all the members of $\mathcal{U}_n$ have diameter smaller than the Lebesgue number of $\mathcal{V}$. This forces each member of $\mathcal{U}_n$ to be contained in a member of $\mathcal{V}$.
Example 3.6 Let $\sigma$ be the shift transformation on $\{0,1\}^{\mathbb{Z}}$ and $X$ a closed shift-invariant subset. The covers consisting of the cylinder sets of rank $n$ form a natural refining sequence. Here $\mathcal{U}_n$ consists of all nonempty cylinder sets of the form
$$\{x\in X: x_{-n} = i_{-n},\ x_{-n+1} = i_{-n+1},\ldots,x_0 = i_0,\ldots,x_n = i_n\},$$
for some choice of $i_{-n},i_{-n+1},\ldots,i_n\in\{0,1\}$. Note that, writing $\mathcal{U} = \mathcal{U}_0$, we have $\mathcal{U}_n = \mathcal{U}_{-n}^n$.
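As in Exercise 5.2.7, $h(\mathcal{U}_n,\sigma) = h(\mathcal{U},\sigma)$ for all $n$, because
$$h(\mathcal{U},\sigma)\le h(\mathcal{U}_n,\sigma) = \lim_{k\to\infty}\frac1kH((\mathcal{U}_n)_0^{k-1}) = \lim_{k\to\infty}\frac1kH(\mathcal{U}_{-n}^{n+k-1}) = \lim_{k\to\infty}\frac1kH(\mathcal{U}_{-n}^{-1}\vee\mathcal{U}_0^{n+k-1})$$
$$\le\lim_{k\to\infty}\frac1k\left[H(\mathcal{U}_{-n}^{-1})+H(\mathcal{U}_0^{n+k-1})\right] = \lim_{k\to\infty}\frac1kH(\mathcal{U}_0^{n+k-1}) = h(\mathcal{U},\sigma).$$
Thus
$$h_{\mathrm{top}}(\sigma|X) = \lim_{n\to\infty}h(\mathcal{U}_n,\sigma) = h(\mathcal{U},\sigma) = \lim_{m\to\infty}\frac1mH(\mathcal{U}_0^{m-1}).$$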
The number of (nonempty) elements in $\mathcal{U}_0^{m-1}$ is just the number of $m$-blocks that can appear in a sequence in $X$. Therefore we have the important result that the topological entropy of a symbolic cascade is
$$\lim_{m\to\infty}\frac1m\log_2(\text{number of }m\text{-blocks}).$$
The topological entropy of the full two-shift is 1. If $X$ consists of a single periodic orbit, then the number $N_n$ of $n$-blocks is bounded, so that in this case $h_{\mathrm{top}}(\sigma|X) = 0$. If $X$ consists of all sequences containing only even-length maximal strings of 0's and of 1's, then $2^k\le N_{2k}\le3\cdot2^k$, so that $h_{\mathrm{top}}(\sigma|X) = \frac12$. (This is reasonable because in this case $(\sigma|X)^2$ is essentially a full two-shift acting on pairs of symbols.) Exercise 4 asks for the (less trivial) computation of the topological entropy of an arbitrary subshift of finite type. This example shows that the topological entropy is indeed a measure of the absolute information content of a transformation $T$, or, equivalently, of the extent to which $T$ scatters points around the space $X$. It is a bit of a miracle that this definition in terms of open covers has proved to be so right and fruitful, as convincingly shown by Theorem 3.10.
We turn now to the definition of entropy given by Bowen. Let $T: X\to X$ be a homeomorphism of a compact metric space $X$. In analogy with the preceding example, we wish to count the number of different orbit-blocks of length $n$ that can be observed, where now (in order to get a finite number) we fail to distinguish points that are closer together than some positive distance $\varepsilon$. Let us say that $x$ and $y$ are $(n,\varepsilon)$-separated if their initial orbit blocks of length $n$ can be distinguished by such a myopic observer, i.e., if $d(T^kx,T^ky)>\varepsilon$ for some $k = 0,1,\ldots,n-1$. A set $E\subset X$ is called $(n,\varepsilon)$-separated if $x$ and $y$ are $(n,\varepsilon)$-separated whenever $x,y\in E$ and $x\ne y$. Then the maximum number of distinguishable orbit $n$-blocks is
$$s(n,\varepsilon) = \max\{\operatorname{card}E: E\subset X\text{ is }(n,\varepsilon)\text{-separated}\}.$$
We define
$$h(T,\varepsilon) = \limsup_{n\to\infty}\frac1n\log_2s(n,\varepsilon)\qquad\text{and}\qquad\bar h_{\mathrm{top}}(T) = \lim_{\varepsilon\to0^+}h(T,\varepsilon).$$
The latter limit exists because $h(T,\varepsilon)$ clearly increases as $\varepsilon$ decreases to 0. It represents the information we gain from watching $T$ if we can make arbitrarily accurate observations.
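The block-counting formula for symbolic cascades given above lends itself to brute force. The Python sketch below is only an illustration. Its membership predicates are our own encodings of three subshifts of $\{0,1\}^{\mathbb{Z}}$: the full shift; the 'golden-mean' shift forbidding the block 11 (an example not in the text); and the even-run example, where a finite word occurs exactly when all positions at which the symbol changes have the same parity. Allowed $m$-blocks are counted by enumerating all $2^m$ words, so this is practical only for small $m$.

```python
import math
from itertools import product

def count_blocks(m, allowed):
    """Number of 0/1 words of length m accepted by the predicate `allowed`."""
    return sum(1 for w in product((0, 1), repeat=m) if allowed(w))

def entropy_estimate(m, allowed):
    """(1/m) log2 (number of allowed m-blocks)."""
    return math.log2(count_blocks(m, allowed)) / m

def full_shift(w):
    return True

def golden_mean(w):
    """Allowed iff the block 11 does not occur (illustrative example)."""
    return all(not (a == 1 and b == 1) for a, b in zip(w, w[1:]))

def even_runs(w):
    """Allowed iff w occurs in a sequence all of whose maximal runs have even
    length, i.e. all positions where the symbol changes have the same parity."""
    changes = {i % 2 for i in range(1, len(w)) if w[i] != w[i - 1]}
    return len(changes) <= 1

for m in (8, 12, 16):
    print(m, entropy_estimate(m, full_shift),
          entropy_estimate(m, golden_mean), entropy_estimate(m, even_runs))
```

The golden-mean counts are Fibonacci numbers, so those estimates approach $\log_2\frac{1+\sqrt5}{2}\approx0.694$. The even-run counts come out as $3\cdot2^k-2$ for $m = 2k$, so those estimates approach $\frac12$, but only at the slow rate $O(1/m)$.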
If $X$ is not compact, for each compact $K\subset X$ let
$$s_K(n,\varepsilon) = \max\{\operatorname{card}E: E\subset K\text{ is }(n,\varepsilon)\text{-separated}\},$$
$$h_K(T,\varepsilon) = \limsup_{n\to\infty}\frac1n\log_2s_K(n,\varepsilon),\qquad h_K(T) = \lim_{\varepsilon\to0^+}h_K(T,\varepsilon),$$
and define $\bar h_{\mathrm{top}}(T) = \sup_Kh_K(T)$.
In this way the topological entropy is defined for a homeomorphism of any metric space. Similar definitions apply also to uniform spaces, but we will not need to consider such a general situation. This definition of $\bar h_{\mathrm{top}}$ is related to Kolmogorov's concept of $\varepsilon$-entropy; see Walters (1975), Dinaburg (1970).
One can also take a reverse approach to the definition of $\bar h_{\mathrm{top}}(T)$. For each $\varepsilon>0$ and $n = 1,2,\ldots$, call a set $F\subset X$ $(n,\varepsilon)$-spanning if for each $x\in X$ there is $y\in F$ with $d(T^kx,T^ky)\le\varepsilon$ for all $k = 0,1,\ldots,n-1$.
Let $E$ be a maximal $(n,\varepsilon)$-separated set. Then $E$ must also be $(n,\varepsilon)$-spanning, since if we try to add any point $y$ to $E$, $E\cup\{y\}$ is no longer $(n,\varepsilon)$-separated, which is to say that the initial orbit $n$-blocks of $y$ and some $x\in E$ will no longer be distinguishable by an $\varepsilon$-accurate observer. Therefore, if
$$r(n,\varepsilon) = \min\{\operatorname{card}F: F\subset X\text{ is }(n,\varepsilon)\text{-spanning}\},$$
then $r(n,\varepsilon)\le s(n,\varepsilon)$.
On the other hand, I claim that $s(n,\varepsilon)\le r(n,\varepsilon/2)$. For fix an $(n,\varepsilon/2)$-spanning set $F$ and let $E$ be an $(n,\varepsilon)$-separated subset of $X$. We can define a map $E\to F$ by choosing for each $x\in E$ a point $y\in F$ whose initial orbit block of length $n$ '$\varepsilon/2$-shadows' that of $x$. This map must be one-to-one, since any two points of $E$ have $\varepsilon$-distinguishable initial orbit $n$-blocks (if $x,x'\in E$ were both sent to $y$, the triangle inequality would give $d(T^kx,T^kx')\le\varepsilon$ for all $k<n$). Thus $\operatorname{card}(E)\le\operatorname{card}(F)$, and the result follows. Now the equivalence of the following alternative definition of $\bar h_{\mathrm{top}}(T)$ with the one already given is obvious.
Proposition 3.7 $\bar h_{\mathrm{top}}(T) = \lim_{\varepsilon\to0^+}\limsup_{n\to\infty}\frac1n\log_2r(n,\varepsilon)$.
Next we establish the equivalence of the Adler, Konheim, McAndrew and Bowen definitions of topological entropy.
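The argument that a maximal $(n,\varepsilon)$-separated set is automatically $(n,\varepsilon)$-spanning can be seen in a small computation. The Python sketch below uses the doubling map $x\mapsto2x\bmod1$ on the circle with arc-length metric, an example of our own. It is not a homeomorphism, but the counting argument above does not use invertibility. Everything is restricted to the finite grid $\{j/M\}$, with points stored as integers mod $M$ so the map is computed exactly. A single greedy pass keeps every grid point that is $(n,\varepsilon)$-separated from those already kept. The result is maximal among grid points, hence spanning for the grid. This illustrates the combinatorics behind $r(n,\varepsilon)\le s(n,\varepsilon)$; it does not compute $s(n,\varepsilon)$ for the whole circle.

```python
def orbit(j, n, M):
    """Integers j, 2j, 4j, ... (mod M): the first n points of the orbit of j/M."""
    pts = []
    for _ in range(n):
        pts.append(j)
        j = (2 * j) % M
    return pts

def circle_dist(a, b, M):
    """Arc-length distance between a/M and b/M on the circle of length 1."""
    d = abs(a - b)
    return min(d, M - d) / M

def bowen_dist(oa, ob, M):
    """d_n(x, y) = max over k < n of d(T^k x, T^k y), from stored orbit segments."""
    return max(circle_dist(a, b, M) for a, b in zip(oa, ob))

def greedy_separated(n, eps, M):
    """Scan the grid once, keeping each point that is (n, eps)-separated from
    all points kept so far; the result is maximal among grid points."""
    kept = []
    for j in range(M):
        oj = orbit(j, n, M)
        if all(bowen_dist(oj, ok, M) > eps for ok in kept):
            kept.append(oj)
    return kept

M, eps = 512, 0.1
for n in range(1, 6):
    print(n, len(greedy_separated(n, eps, M)))
```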
Proposition 3.8 $\bar h_{\mathrm{top}}(T) = h_{\mathrm{top}}(T)$.
Proof: Given $\varepsilon>0$, let $\mathcal{U}$ be an open cover of $X$ with $\operatorname{diam}\mathcal{U} = \sup_{U\in\mathcal{U}}\operatorname{diam}U<\varepsilon$. Then
$$s(n,\varepsilon)\le N(\mathcal{U}_0^{n-1})\quad\text{for each }n = 1,2,\ldots,$$
because two points of any $(n,\varepsilon)$-separated set cannot both be in the same element of $\mathcal{U}_0^{n-1}$. It follows that $h(T,\varepsilon)\le h(\mathcal{U},T)\le h_{\mathrm{top}}(T)$, and hence $\bar h_{\mathrm{top}}(T)\le h_{\mathrm{top}}(T)$.
For the reverse inequality, given any open cover $\mathcal{U}$ of $X$ let $\varepsilon$ be a Lebesgue number of $\mathcal{U}$. Fix $n = 1,2,\ldots$ and let $F$ be a minimal $(n,\varepsilon)$-spanning set. We will use $F$ to find a subcover of $\mathcal{U}_0^{n-1}$ of cardinality less than or equal to $\operatorname{card}F = r(n,\varepsilon)$. For each $y\in F$ and $k = 0,1,\ldots,n-1$, the ball $B(T^ky,\varepsilon)$ of radius $\varepsilon$ centered at $T^ky$ is contained in some single member $U(k,y)$ of $\mathcal{U}$. Given $x\in X$, there is $y\in F$ whose initial orbit $n$-block is $\varepsilon$-indistinguishable from that of $x$; this says
$$x\in U(0,y)\cap T^{-1}U(1,y)\cap\cdots\cap T^{-(n-1)}U(n-1,y).$$
Thus
$$\{U(0,y)\cap T^{-1}U(1,y)\cap\cdots\cap T^{-(n-1)}U(n-1,y): y\in F\}$$
is a subcover of $\mathcal{U}_0^{n-1}$, and it clearly has cardinality less than or equal to that of $F$. It follows that $N(\mathcal{U}_0^{n-1})\le r(n,\varepsilon)$, and hence $h_{\mathrm{top}}(T)\le\bar h_{\mathrm{top}}(T)$.
We turn now to the proof of the variational principle, which appears as a conjecture in the original paper of Adler, Konheim, and McAndrew (1965). This principle shows clearly the importance and naturalness of the concept of topological entropy. There is an interesting theory of when the supremum is attained, and of when it is attained by a single measure; see Denker, Grillenberger and Sigmund (1976) for an introduction to this subject of current research. First we need one property of $h_{\mathrm{top}}$.
Proposition 3.9 For any $k = 1,2,\ldots$, $h_{\mathrm{top}}(T^k) = k\,h_{\mathrm{top}}(T)$.
Proof: For an open cover $\mathcal{U}$ of $X$,
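$$h_{\mathrm{top}}(T^k)\ge h(\mathcal{U}_0^{k-1},T^k) = \lim_{n\to\infty}\frac1nH(\mathcal{U}_0^{k-1}\vee T^{-k}\mathcal{U}_0^{k-1}\vee\cdots\vee T^{-(n-1)k}\mathcal{U}_0^{k-1}) = k\lim_{n\to\infty}\frac{1}{nk}H(\mathcal{U}_0^{nk-1}) = k\,h(\mathcal{U},T).$$
Also,
$$\mathcal{U}\vee(T^k)^{-1}\mathcal{U}\vee\cdots\vee(T^k)^{-(n-1)}\mathcal{U}\prec\mathcal{U}\vee T^{-1}\mathcal{U}\vee\cdots\vee T^{-(nk-1)}\mathcal{U}$$
implies that
$$h(\mathcal{U},T) = \lim_{n\to\infty}\frac{1}{nk}H(\mathcal{U}\vee T^{-1}\mathcal{U}\vee\cdots\vee T^{-(nk-1)}\mathcal{U})\ge\lim_{n\to\infty}\frac{1}{nk}H(\mathcal{U}\vee(T^k)^{-1}\mathcal{U}\vee\cdots\vee(T^k)^{-(n-1)}\mathcal{U}) = \frac1kh(\mathcal{U},T^k).$$
Combining these inequalities and taking the supremum over all $\mathcal{U}$ yields the result.
Theorem 3.10 Let $T: X\to X$ be a homeomorphism on a compact metric space $X$, and denote by $\mathcal{M}_T$ the set of all $T$-invariant Borel probability measures on $X$. Then
$$h_{\mathrm{top}}(T) = \sup_{\mu\in\mathcal{M}_T}h_\mu(T).$$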
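The variational principle can be seen at work on the golden-mean shift of sequences in $\{0,1\}^{\mathbb{Z}}$ with no two consecutive 1's. This example is our addition. Its topological entropy is $\log_2\varphi$, with $\varphi$ the golden ratio, because its block counts are Fibonacci numbers. The Python sketch below restricts attention, purely for illustration, to the one-parameter family of stationary Markov measures on this shift. It uses the standard entropy formula $h = -\sum_i\pi_i\sum_jP_{ij}\log_2P_{ij}$ for Markov measures. Every member of the family has entropy at most $\log_2\varphi$, with equality at $P(0\to1) = 1/\varphi^2$. The maximizing measure is usually called the Parry measure, a name not used in this text.

```python
import math

def binary_entropy(p):
    """-p log2 p - (1-p) log2 (1-p), with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def markov_entropy(p):
    """Metric entropy of the stationary Markov measure on the golden-mean shift
    with transitions 0->1 w.p. p, 0->0 w.p. 1-p, 1->0 w.p. 1."""
    pi0 = 1.0 / (1.0 + p)           # stationary probability of state 0
    return pi0 * binary_entropy(p)  # state 1 has a deterministic successor

phi = (1 + math.sqrt(5)) / 2
h_top = math.log2(phi)              # topological entropy of the golden-mean shift
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=markov_entropy)
print(best, markov_entropy(best), h_top)
```

Only the inequality $h_\mu\le h_{\mathrm{top}}$ for this particular family is illustrated here. The theorem itself is a statement about all invariant measures.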
Proof: (Misiurewicz, 1976) Fix an invariant measure $\mu$ for $T$; we will show that $h_\mu(T)\le h_{\mathrm{top}}(T)$. Let $\alpha$ be a finite measurable partition of $X$. For each $A_i\in\alpha$ choose a compact set $B_i\subset A_i$ which approximates $A_i$ so closely in measure that if $\beta$ is the partition $\{B_1,B_2,\ldots,(\bigcup_jB_j)^c\}$, then $H_\mu(\alpha|\beta)<1$. Notice that since $H_\mu(\alpha_0^{n-1}|\beta_0^{n-1})\le nH_\mu(\alpha|\beta)$ for each $n = 1,2,\ldots$ (see the proof of Proposition 2.13), we have
$$H_\mu(\alpha_0^{n-1})\le H_\mu(\alpha_0^{n-1}\vee\beta_0^{n-1}) = H_\mu(\beta_0^{n-1})+H_\mu(\alpha_0^{n-1}|\beta_0^{n-1})\le H_\mu(\beta_0^{n-1})+n,$$
and hence
$$h_\mu(\alpha,T) = \lim_{n\to\infty}\frac1nH_\mu(\alpha_0^{n-1})\le h_\mu(\beta,T)+1.$$
Now form an open cover $\mathcal{U}_1 = \{U_i\}$ of $X$ by letting $U_i = (\bigcup_{j\ne i}B_j)^c$ for each $i$. Then each set $U_i$ intersects at most two members of $\beta$: certainly $B_i$, and possibly also $(\bigcup_jB_j)^c$. Let $\mathcal{U}$ be a subcover of $\mathcal{U}_1$ of minimal cardinality. Then for all $n = 1,2,\ldots$,
$$\operatorname{card}(\beta_0^{n-1})\le2^n\operatorname{card}(\mathcal{U}_0^{n-1}).$$
For $\operatorname{card}(\mathcal{U}_0^{n-1})$ counts the number of different $\mathcal{U},n$-names of points of $X$, i.e., the number of different sequences $U_{i_0},U_{i_1},\ldots,U_{i_{n-1}}$ such that for some $x\in X$, $T^kx\in U_{i_k}$ for $k = 0,1,\ldots,n-1$; similarly for $\operatorname{card}(\beta_0^{n-1})$. And once we know the $\mathcal{U},n$-name of a point $x\in X$, then at each time $k$ we know which element of $\mathcal{U}$ the point $T^kx$ belongs to, and have a choice of only two elements of $\beta$ to which $T^kx$ might belong. Therefore the number of different $\beta,n$-names is at most $2^n$ times the number of different $\mathcal{U},n$-names, and the assertion follows. Having this, it is easy to compute that
$$H_\mu(\beta_0^{n-1})\le\log_2\operatorname{card}(\beta_0^{n-1})\le n+\log_2N(\mathcal{U}_0^{n-1}),$$
and therefore
$$h_\mu(\alpha,T)\le h_\mu(\beta,T)+1 = \lim_{n\to\infty}\frac1nH_\mu(\beta_0^{n-1})+1\le2+h(\mathcal{U},T).$$
Since $\alpha$ was arbitrary, we have $h_\mu(T)\le2+h_{\mathrm{top}}(T)$.
This same calculation applies if $T$ is replaced by $T^k$ for any $k = 1,2,\ldots$. The conclusion then says that $k\,h_\mu(T)\le2+k\,h_{\mathrm{top}}(T)$.
Since this holds for all $k$, we must have $h_\mu(T)\le h_{\mathrm{top}}(T)$.
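In order to prove the reverse inequality, let $\varepsilon>0$; for each $n = 1,2,\ldots$ choose a maximal $(n,\varepsilon)$-separated set $E_n$ with $\operatorname{card}E_n = s(n,\varepsilon)$, let
$$\mu_{E_n} = \frac{1}{\operatorname{card}E_n}\sum_{x\in E_n}\delta_x$$
(where $\delta_x$ denotes the unit point mass at $x$), and
$$\mu_n = \frac1n\sum_{k=0}^{n-1}\mu_{E_n}\circ T^{-k}.$$
Choose a subsequence $\{n_k\}$ such that
$$h(T,\varepsilon) = \limsup_{n\to\infty}\frac1n\log_2s(n,\varepsilon) = \lim_{k\to\infty}\frac{1}{n_k}\log_2s(n_k,\varepsilon).$$
By passing to a further subsequence if necessary, we may assume that $\{\mu_{n_k}\}$ converges to some measure $\mu$ in the weak* topology:
$$\int_Xf\,d\mu_{n_k}\to\int_Xf\,d\mu\quad\text{for all }f\in\mathcal{C}(X).$$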
It is easy to see that $\mu\in\mathcal{M}_T$. Choose a finite measurable partition $\alpha$ of $X$ such that for all $A\in\alpha$,
(i) $\operatorname{diam}(A)<\varepsilon$,
(ii) $\mu(\partial A) = 0$.
(Notice that for any $x\in X$, at most countably many balls $B(x,\delta)$ have $\mu(\bar B(x,\delta)\setminus B(x,\delta))>0$. Cover $X$ by finitely many balls $B$ of diameter less than $\varepsilon$ for which $\mu(\partial B) = 0$, and let $\alpha$ be the partition generated by this cover.) The importance of this second condition is the following:
Fact: If $\mu_n\to\mu$ weak* and $A$ is a measurable set with $\mu(\partial A) = 0$, then $\mu_n(A)\to\mu(A)$.
Proof: Choose $0\le f_k\in\mathcal{C}(X)$ with $f_k\searrow\chi_{\bar A}$. Then
$$\limsup_{n\to\infty}\mu_n(\bar A)\le\lim_{n\to\infty}\int f_k\,d\mu_n = \int f_k\,d\mu\searrow\mu(\bar A),$$
so that
$$\limsup_{n\to\infty}\mu_n(A)\le\limsup_{n\to\infty}\mu_n(\bar A)\le\mu(\bar A) = \mu(A).$$
Similarly
$$\limsup_{n\to\infty}\mu_n(X\setminus A)\le\mu(X\setminus A),$$
or
$$1-\liminf_{n\to\infty}\mu_n(A)\le1-\mu(A).$$
Together these say that
$$\limsup_{n\to\infty}\mu_n(A)\le\mu(A)\le\liminf_{n\to\infty}\mu_n(A).$$
For each $A\in\alpha_0^{n-1}$, $\operatorname{card}(E_n\cap A)\le1$, because two different points of $E_n$ cannot stay within $\varepsilon$ of one another during their initial orbit $n$-blocks. Therefore
$$H_{\mu_{E_n}}(\alpha_0^{n-1}) = -\sum_{A\in\alpha_0^{n-1}}\mu_{E_n}(A)\log_2\mu_{E_n}(A) = -\sum_{\substack{A\in\alpha_0^{n-1}\\A\cap E_n\ne\emptyset}}\frac{1}{\operatorname{card}E_n}\log_2\frac{1}{\operatorname{card}E_n} = \log_2\operatorname{card}E_n.$$
Fix $m = 1,2,\ldots$ and let $n = dm+r$, where $0\le r<m$ and $d\ge0$ are integers. We may write, for each $k = 0,1,\ldots,m-1$,
$$\alpha_0^{n-1} = \alpha_0^{dm+r-1}\prec\bigvee_{j=0}^{d-1}T^{-(jm+k)}\alpha_0^{m-1}\vee\left(\alpha_0^{k-1}\vee\alpha_{dm+k}^{dm+r-1}\right),$$
where a join over an empty range of indices is omitted. Because the partition in parentheses has no more than $(\operatorname{card}\alpha)^{2m}$ atoms,
$$(1)\qquad H_{\mu_{E_n}}(\alpha_0^{n-1})\le\sum_{j=0}^{d-1}H_{\mu_{E_n}}(T^{-(jm+k)}\alpha_0^{m-1})+H_{\mu_{E_n}}\left(\alpha_0^{k-1}\vee\alpha_{dm+k}^{dm+r-1}\right)\le\sum_{j=0}^{d-1}H_{\mu_{E_n}}(T^{-(jm+k)}\alpha_0^{m-1})+2m\log_2\operatorname{card}\alpha.$$
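Using the concavity of the function $f(t) = -t\log_2t$, for each $A\in\alpha_0^{m-1}$
$$f(\mu_n(A)) = f\left(\frac1n\sum_{k=0}^{n-1}\mu_{E_n}(T^{-k}A)\right)\ge\frac1n\sum_{k=0}^{n-1}f(\mu_{E_n}(T^{-k}A)),$$
and hence, summing on $A$ and using (1),
$$H_{\mu_n}(\alpha_0^{m-1})\ge\frac1n\sum_{k=0}^{n-1}H_{\mu_{E_n}}(T^{-k}\alpha_0^{m-1})\ge\frac1n\sum_{k=0}^{m-1}\sum_{j=0}^{d-1}H_{\mu_{E_n}}(T^{-(jm+k)}\alpha_0^{m-1})$$
$$\ge\frac1n\sum_{k=0}^{m-1}\left[H_{\mu_{E_n}}(\alpha_0^{n-1})-2m\log_2\operatorname{card}\alpha\right] = \frac mnH_{\mu_{E_n}}(\alpha_0^{n-1})-\frac{2m^2}{n}\log_2\operatorname{card}\alpha$$
$$= \frac mn\log_2\operatorname{card}E_n-\frac{2m^2}{n}\log_2\operatorname{card}\alpha = \frac mn\log_2s(n,\varepsilon)-\frac{2m^2}{n}\log_2\operatorname{card}\alpha.$$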
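Let $n = n_k$ and $k\to\infty$, while $m$ is held fixed. Because $H_{\mu_{n_k}}(\alpha_0^{m-1})\to H_\mu(\alpha_0^{m-1})$ by the Fact above and our choice of $\alpha$, we are able to conclude that
$$H_\mu(\alpha_0^{m-1})\ge m\,h(T,\varepsilon).$$
When $m\to\infty$, this yields
$$h_\mu(\alpha,T)\ge h(T,\varepsilon),$$
and letting $\varepsilon\to0^+$ (which, incidentally, requires us to take different $\alpha$'s, and the measure $\mu$ also depends on $\varepsilon$) gives $\sup_{\mu\in\mathcal{M}_T}h_\mu(T)\ge\bar h_{\mathrm{top}}(T) = h_{\mathrm{top}}(T)$.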
Exercises
1. If $X$ is a strictly ergodic (see Section 4.2) subset of the two-shift $(\{0,1\}^{\mathbb{Z}},\sigma)$, then $h_{\mathrm{top}}(\sigma|X)<1$.
2. Compute the topological entropy of the Morse minimal set, which is the orbit closure in $(\{0,1\}^{\mathbb{Z}},\sigma)$ of the sequence $\ldots011001.01101001\ldots$ (see Exercise 4.4.3).
3. Prove that if $(X,T)$ is an extension of $(Y,S)$, then $h_{\mathrm{top}}(T)\ge h_{\mathrm{top}}(S)$, by using (a) the Adler-Konheim-McAndrew definition; (b) the Bowen definition.
4. Let $(X_A,\sigma_A)$ be the subshift of finite type in $(\{0,1,\ldots,n-1\}^{\mathbb{Z}},\sigma)$ determined by an $n\times n$ 0,1-matrix $A$ (see Section 4.2).
(a) If $N_m$ denotes the number of fixed points in $X_A$ of $\sigma_A^m$, then $h_{\mathrm{top}}(\sigma_A) = \limsup_{m\to\infty}\frac1m\log_2N_m$.
(b) $N_m = \operatorname{trace}(A^m)$.
(c) $h_{\mathrm{top}}(\sigma_A) = \log_2|\lambda_{\max}|$, where $\lambda_{\max}$ is the (unique) eigenvalue of $A$ which has maximal modulus.
(d) Describe $X_A$ and find $h_{\mathrm{top}}(\sigma_A)$ in case $A = \begin{pmatrix}1&1\\1&0\end{pmatrix}$.
5. Prove as easily as possible that $h_{\mathrm{top}}(S\times T) = h_{\mathrm{top}}(S)+h_{\mathrm{top}}(T)$.
6. Show that if $\mathcal{E}_T$ is the set of all ergodic $T$-invariant Borel probability measures on $X$, then $h_{\mathrm{top}}(T) = \sup_{\mu\in\mathcal{E}_T}h_\mu(T)$.
6.4. Introduction to Ornstein theory
Ornstein's discovery in 1970 that entropy completely classifies Bernoulli shifts up to isomorphism (Ornstein 1970a) created a revolution in ergodic theory. There are now known several conditions that guarantee that a given m.p.t. is Bernoulli, an extensive list of transformations satisfying these conditions, many interesting examples including K-automorphisms that are not Bernoulli, and parallel theories for finitary maps (see Section 6.5 and Rudolph 1981) and for equivalence when taking induced transformations is also allowed (variously called Kakutani equivalence, monotone equivalence, or the loosely Bernoulli theory; see Feldman 1976). Here we can give only a brief introduction to this vast and quickly developing subject. The central result, that two Bernoulli shifts are isomorphic if and only if they have the same entropy, is completely proved in the next section. For a detailed treatment of the theory, we recommend first Shields (1973) and then Ornstein (1974).
Let $T: X\to X$ be an ergodic m.p.t. on a probability space $(X,\mathcal{B},\mu)$. Each finite measurable partition $\alpha = \{A_1,A_2,\ldots,A_n\}$ naturally determines a map $\phi_\alpha: X\to\Omega = \{1,2,\ldots,n\}^{\mathbb{Z}}$ by
$$(\phi_\alpha x)_k = j\quad\text{if and only if}\quad T^kx\in A_j:$$
the $k$th coordinate of $\phi_\alpha x$ is the number of the atom of $\alpha$ to which $T^kx$ belongs. Obviously $\phi_\alpha(Tx) = \sigma(\phi_\alpha x)$, where as usual $\sigma$ is the left shift transformation. The map $\phi_\alpha$ carries $\mu$ down to a $\sigma$-invariant measure $\nu$ on $\Omega$:
$$\nu(E) = \mu(\phi_\alpha^{-1}E)\quad\text{for measurable }E\subset\Omega.$$
In this way $\alpha$ determines a factor map $\phi_\alpha: (X,\mathcal{B},\mu,T)\to(\Omega,\mathcal{F},\nu,\sigma)$. This simple concept of using a partition to code a system in terms of sequences on a finite alphabet is extremely important. There is some helpful terminology associated with the technique: the sequence $\phi_\alpha x$ is called the $\alpha,T$-name of $x$ (or $\alpha$-name of $x$), and for each $k = 1,2,\ldots$, the initial $k$-block $(\phi_\alpha x)_0(\phi_\alpha x)_1\cdots(\phi_\alpha x)_{k-1}$ is called the $\alpha,k$-name of $x$ (under $T$). A point $x\in X$ will be called $\alpha$-generic for $T$ (or simply generic) if each block $i_1i_2\cdots i_k$ in the $\alpha,T$-name of $x$ appears with the right frequency, namely $\mu(A_{i_1}\cap T^{-1}A_{i_2}\cap\cdots\cap T^{-k+1}A_{i_k})$. By the Ergodic Theorem, almost every point of $X$ is $\alpha$-generic for $T$.
Proposition 4.1 The map $\phi_\alpha$ is an isomorphism if and only if $\alpha$ is a generator with respect to $T$.
Proof: If $\phi_\alpha$ is an isomorphism, then $\phi_\alpha^{-1}\mathcal{F} = \mathcal{B}$ up to sets of measure 0. But if $\mathcal{F}_k\subset\mathcal{F}$ is the $\sigma$-algebra generated by all cylinder sets $\{\omega\in\Omega: \omega_{-k} = i_{-k},\ldots,\omega_k = i_k\}$ ($i_{-k},\ldots,i_k\in\{1,2,\ldots,n\}$), then $\phi_\alpha^{-1}\mathcal{F}_k = \mathcal{B}(\alpha_{-k}^k)$. Therefore $\mathcal{B}(\alpha_{-\infty}^\infty)\supset\phi_\alpha^{-1}(\bigcup_k\mathcal{F}_k)$, which generates $\phi_\alpha^{-1}\mathcal{F} = \mathcal{B}$, so $\alpha$ is a generator.
Conversely, suppose that $\alpha$ is a generator. The measure algebra map $\phi_\alpha^{-1}: (\hat{\mathcal{F}},\nu)\to(\hat{\mathcal{B}},\mu)$ is always an isomorphism into, since $\phi_\alpha^{-1}U = \phi_\alpha^{-1}V$ mod 0 implies $\nu(U\triangle V) = \mu(\phi_\alpha^{-1}(U\triangle V)) = 0$, so that $U = V$ mod 0. If $\mathcal{B}(\alpha_{-\infty}^\infty) = \mathcal{B}$ up to sets of measure 0, then $\phi_\alpha^{-1}: (\hat{\mathcal{F}},\nu)\to(\hat{\mathcal{B}},\mu)$ is also onto, hence an isomorphism of measure algebras, which, by von Neumann's theorem (1.4.7), because we are dealing with Lebesgue spaces, arises from a point isomorphism mod 0.
Let us say that a partition $\alpha$ is independent if the $\sigma$-algebras $\mathcal{B}(T^k\alpha)$, $k\in\mathbb{Z}$, are independent: for any choice of distinct powers $i_1,\ldots,i_r\in\mathbb{Z}$ and not necessarily distinct atoms $C_1,\ldots,C_r\in\alpha$,
$$\mu(T^{-i_1}C_1\cap T^{-i_2}C_2\cap\cdots\cap T^{-i_r}C_r) = \prod_{j=1}^r\mu(C_j).$$
(The main example of an independent partition to bear in mind is the time-0 partition of a Bernoulli shift.)
Proposition 4.2 If $T$ has an independent generator, then $T$ is isomorphic to a Bernoulli shift.
Proof: Let $\alpha = \{A_1,A_2,\ldots,A_n\}$ be the independent generator. Then the image $\nu$ of $\mu$ under $\phi_\alpha$ is product measure. Thus the system $(\Omega,\mathcal{F},\nu,\sigma)$ is the Bernoulli shift on $n$ symbols with weights $\mu(A_1),\mu(A_2),\ldots,\mu(A_n)$. The map $\phi_\alpha$ is an isomorphism by Proposition 4.1.
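The coding by $\alpha,T$-names that underlies Propositions 4.1 and 4.2 is easy to compute for a concrete system. The Python sketch below uses, as an example of our own, the rotation $T(x) = x+\theta\bmod1$ with $\theta = (\sqrt5-1)/2$ and the partition $\alpha = \{[0,\frac12),[\frac12,1)\}$, with atoms numbered from 1 as in the text. It records the $\alpha,k$-name of one point and compares empirical block frequencies with $\mu(A_1) = \frac12$ and $\mu(A_1\cap T^{-1}A_1) = \theta-\frac12$, which is what $\alpha$-genericity demands. For an irrational rotation and interval partitions, every point is generic by Weyl's equidistribution theorem, so the choice of starting point should not matter.

```python
import math

def alpha_name(x, theta, cuts, k):
    """First k symbols of the alpha,T-name of x for T(x) = x + theta mod 1, where
    alpha = {A_1, ..., A_n} with A_j = [cuts[j-1], cuts[j])."""
    name = []
    for _ in range(k):
        j = next(i for i in range(1, len(cuts)) if x < cuts[i])  # x lies in A_j
        name.append(j)
        x = (x + theta) % 1.0
    return name

def block_frequency(name, block):
    """Relative frequency of `block` among the windows of `name` of the same length."""
    L = len(block)
    windows = len(name) - L + 1
    hits = sum(1 for i in range(windows) if tuple(name[i:i + L]) == tuple(block))
    return hits / windows

theta = (math.sqrt(5) - 1) / 2      # irrational rotation number
cuts = [0.0, 0.5, 1.0]              # alpha = {[0, 1/2), [1/2, 1)}
name = alpha_name(0.1, theta, cuts, 20_000)
print(name[:12])
print(block_frequency(name, [1]), 0.5)              # compare with mu(A_1)
print(block_frequency(name, [1, 1]), theta - 0.5)   # compare with mu(A_1 ∩ T^-1 A_1)
```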
Notice that if $\alpha$ is an independent generator for $T$, then of course $h(T) = h(\alpha,T)$ = the entropy of the Bernoulli shift downstairs. One form of Ornstein's theorem is a remarkable strengthening of the above Proposition, in which the hypothesis of independence of $\alpha$ is replaced by a much weaker sort of asymptotic independence.
We deal henceforth with ordered finite measurable partitions. The distribution of a partition $\alpha = \{A_1,A_2,\ldots,A_n\}$ is the probability vector
$$\operatorname{dist}(\alpha) = (\mu(A_1),\mu(A_2),\ldots,\mu(A_n));$$
if $B\in\mathcal{B}$, we define
$$\operatorname{dist}(\alpha|B) = (\mu(A_1|B),\mu(A_2|B),\ldots,\mu(A_n|B)),\qquad\text{where }\mu(A_i|B) = \frac{\mu(A_i\cap B)}{\mu(B)}.$$
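Definition 4.3 (1) Let $\varepsilon>0$ and let $\alpha$ and $\beta$ be partitions. We say that $\alpha$ is $\varepsilon$-independent of $\beta$ and write $\alpha\perp_\varepsilon\beta$ in case there is a family $\mathscr{G}$ of atoms of $\beta$ such that
(i) $\mu(\bigcup\mathscr{G})>1-\varepsilon$;
(ii) if $B\in\mathscr{G}$, then $\|\operatorname{dist}(\alpha|B)-\operatorname{dist}(\alpha)\| = \sum_{A\in\alpha}|\mu(A|B)-\mu(A)|<\varepsilon$.
(2) Let $\varepsilon>0$. A sequence $\alpha_1,\alpha_2,\ldots$ of partitions is called $\varepsilon$-independent in case for each $n = 2,3,\ldots$, $\alpha_n$ is $\varepsilon$-independent of $\alpha_1\vee\cdots\vee\alpha_{n-1}$.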
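Definition 4.3(1) becomes a finite check once the joint distribution of $\alpha$ and $\beta$ is known. The Python sketch below assumes that joint distribution is supplied as a table with rows indexed by atoms of $\alpha$ and columns by atoms of $\beta$; the table format and the function name are ours. Enlarging $\mathscr{G}$ only helps condition (i), so it suffices to take $\mathscr{G}$ to be all atoms of $\beta$ satisfying condition (ii).

```python
def is_eps_independent(joint, eps):
    """joint[i][j] = mu(A_i ∩ B_j) for ordered partitions alpha = {A_i}, beta = {B_j}.
    Decide whether alpha is eps-independent of beta (Definition 4.3(1)): collect the
    atoms B satisfying condition (ii) and test condition (i) on their union."""
    n_alpha, n_beta = len(joint), len(joint[0])
    mu_A = [sum(row) for row in joint]
    good_mass = 0.0
    for j in range(n_beta):
        mu_B = sum(joint[i][j] for i in range(n_alpha))
        if mu_B == 0:
            continue
        dist = sum(abs(joint[i][j] / mu_B - mu_A[i]) for i in range(n_alpha))
        if dist < eps:
            good_mass += mu_B
    return good_mass > 1 - eps

independent = [[0.06, 0.14], [0.24, 0.56]]   # mu(A) = (0.2, 0.8), mu(B) = (0.3, 0.7)
dependent = [[0.5, 0.0], [0.0, 0.5]]         # beta determines alpha completely
print(is_eps_independent(independent, 0.01), is_eps_independent(dependent, 0.1))
```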
Definition 4.4 A partition $\alpha$ is called weakly Bernoulli (WB) for an ergodic m.p.t. $T$ if given $\varepsilon>0$ there is an $N$ such that for all $m = 1,2,\ldots$,
$$\alpha_0^m\perp_\varepsilon\alpha_{-N-m}^{-N}.$$
Thus for a weakly Bernoulli $\alpha$, considering the factor space determined by $\phi_\alpha$ as above, cylinder sets of arbitrary index length are $\varepsilon$-independent provided only that their index sets are separated by distance $N = N(\varepsilon)$.
Theorem 4.5 (Friedman and Ornstein 1970) If $T$ has a weakly Bernoulli generator, then $T$ is isomorphic to any Bernoulli shift of the same entropy.
Of course every independent generator is WB and in particular the time-0 partitions of Bernoulli shifts are WB; it follows that any two Bernoulli shifts of the same entropy are isomorphic. The next section contains a proof of this result.
Example 4.6 The time-0 partition of a mixing Markov shift is WB. Let $(\Omega,\mu,\sigma)$ be the Markov shift on $\{1,2,\ldots,r\}^{\mathbb{Z}}$ determined by an $r\times r$ stochastic matrix $A = (a_{ij})$ which fixes a positive (row) probability vector $p$: $pA = p$ (see Sections 1.2.D, 2.4, 2.5). Let $\alpha$ be the partition into the sets $\alpha_i = \{\omega\in\Omega: \omega_0 = i\}$, $i = 1,2,\ldots,r$. Fix $m$ and $N$; by shift invariance it is enough to compare atoms $F$ of $\alpha_{-m}^0$ with atoms $E$ of $\alpha_N^{N+m}$. Choose
$$F = \{\omega: \omega_{-m} = i_{-m},\ldots,\omega_0 = i_0\},\qquad E = \{\omega: \omega_N = i_N,\ldots,\omega_{N+m} = i_{N+m}\}.$$
Then
$$\mu(E|F) = a_{i_0i_N}^{(N)}a_{i_Ni_{N+1}}\cdots a_{i_{N+m-1}i_{N+m}},\qquad\mu(E) = p_{i_N}a_{i_Ni_{N+1}}\cdots a_{i_{N+m-1}i_{N+m}},$$
where $a_{i_0i_N}^{(N)} = (A^N)_{i_0i_N}$ is the $N$-step transition probability from state $i_0$ to state $i_N$. Recall that strong mixing is characterized by
$$\lim_{N\to\infty}a_{ij}^{(N)} = p_j\quad\text{for all }i\text{ and }j.$$
Thus
$$\sum_{E,F}|\mu(E\cap F)-\mu(E)\mu(F)| = \sum_F\mu(F)\sum_E|\mu(E|F)-\mu(E)| = \sum_F\mu(F)\sum_{i_N}|a_{i_0i_N}^{(N)}-p_{i_N}|\le\max_i\sum_j|a_{ij}^{(N)}-p_j|,$$
which tends to 0 uniformly in $m$. This proves that $\alpha$ is WB.
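The estimate in Example 4.6 reduces the WB property to the convergence $a_{ij}^{(N)}\to p_j$. The Python sketch below computes the bound $\max_i\sum_j|a_{ij}^{(N)}-p_j|$ for a hypothetical $2\times2$ mixing chain with stationary vector $p = (2/7,5/7)$. The matrix has eigenvalues $1$ and $0.3$, so the bound should decay like $0.3^N$. Plain lists are used to keep the example dependency-free.

```python
def mat_mult(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def wb_defect(A, p, N):
    """max_i sum_j |a_ij^(N) - p_j|, the bound on sum_E |mu(E|F) - mu(E)| above."""
    P = A
    for _ in range(N - 1):
        P = mat_mult(P, A)
    return max(sum(abs(P[i][j] - p[j]) for j in range(len(p))) for i in range(len(A)))

A = [[0.5, 0.5], [0.2, 0.8]]   # hypothetical mixing transition matrix
p = [2 / 7, 5 / 7]             # stationary row vector: pA = p
for N in (1, 5, 10, 20):
    print(N, wb_defect(A, p, N))
```

For this matrix $A^N-\mathbf{1}p = 0.3^{N-1}(A-\mathbf{1}p)$, so the printed values are $\frac37\cdot0.3^{N-1}$. The bound does not depend on $m$, which is exactly the uniformity the example needs.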
Corollary 4.7 A mixing Markov shift is isomorphic to every Bernoulli shift of the same entropy.
In order to state weakenings of the WB condition and the associated strengthenings of the above theorem, we need to formulate a definition of a metric, the $\bar d$-distance, on the set of finite-state stationary processes. The $\bar d$-distance is a limit of the Hamming metrics of information theory. Two processes will be thought of as being close if their printouts are similar except possibly on a set of times of low frequency. If $\alpha = \{A_1,A_2,\ldots,A_n\}$ and $\beta = \{B_1,B_2,\ldots,B_n\}$ are ordered partitions of the same probability space, define
$$d(\alpha,\beta) = \sum_i\mu(A_i\triangle B_i).$$
Let $T: X\to X$ and $S: Y\to Y$ be ergodic m.p.t.s. To measure the distance between the processes $(\alpha,T)$ and $(\beta,S)$, we need first to represent them on the same space. To this end, take a Lebesgue space $Z$ and isomorphisms $\phi: X\to Z$ and $\psi: Y\to Z$. We let
$$\bar d_{\phi,\psi}^{(m)}((\alpha,T),(\beta,S)) = \frac1m\sum_{k=0}^{m-1}d(\phi T^{-k}\alpha,\psi S^{-k}\beta)\qquad(m = 1,2,\ldots),$$
$$\bar d^{(m)}((\alpha,T),(\beta,S)) = \inf_{\phi,\psi}\bar d_{\phi,\psi}^{(m)}((\alpha,T),(\beta,S)),$$
and
$$\bar d((\alpha,T),(\beta,S)) = \sup_m\bar d^{(m)}((\alpha,T),(\beta,S)) = \lim_{m\to\infty}\bar d^{(m)}((\alpha,T),(\beta,S)).$$
(The last equality is the content of Exercise 3.)
Proposition 4.8 The following statements are equivalent:
(1) $\bar d((\alpha,T),(\beta,S))<\delta$.
(2) The $\alpha,T$-name of every $\alpha$-generic point for $T$ can be changed on a set of frequency less than $\delta$ to produce the $\beta,S$-name of a $\beta$-generic point for $S$.
(3) There exist an $\alpha$-generic point $x$ for $T$ and a $\beta$-generic point $y$ for $S$ such that the $\alpha,T$-name of $x$ and the $\beta,S$-name of $y$ differ on a subsequence of density less than $\delta$.
The $\bar d$ distance is involved in another useful criterion for isomorphism.
Proposition 4.9 Two finite-entropy ergodic m.p.t.s, $T$ on $X$ and $S$ on $Y$, are isomorphic if and only if there exist generators $\alpha$ for $T$ and $\beta$ for $S$ such that $\bar d((\alpha,T),(\beta,S)) = 0$.
Proof: The 'only if' part follows by taking $\alpha$ to be a finite generator for
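$T$ and $\beta$ its image under the given isomorphism. Conversely, given $\alpha$ and $\beta$ we have isomorphisms $\phi_\alpha: (X,\mathcal{B},\mu,T)\to(\Omega,\mathcal{F},\nu,\sigma)$ and $\phi_\beta: (Y,\mathcal{C},\mu',S)\to(\Omega,\mathcal{F},\nu',\sigma)$. The condition $\bar d((\alpha,T),(\beta,S)) = 0$ forces $\nu = \nu'$, and so $T$ is isomorphic to $S$.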
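Condition (3) of Proposition 4.8 suggests a simple way to bound $\bar d$ from above. Produce the two names side by side from a common source of randomness and measure how often they disagree. The Python sketch below does this for two i.i.d. 0/1 processes with $P(1) = p$ and $P(1) = q$, a hypothetical example, coupling them through one uniform random variable per time. The disagreement frequency comes out close to $|p-q|$. Since the sampled names are almost surely generic, this indicates that $\bar d$ between the two processes is at most about $|p-q|$. The coupling is our choice and need not be optimal for other pairs of processes.

```python
import random

def coupled_disagreement(p, q, n, seed=0):
    """Generate n coordinates of two 0/1 names from shared uniforms
    (x_k = 1 iff u_k < p, y_k = 1 iff u_k < q) and return the fraction of
    times at which the names differ."""
    rng = random.Random(seed)
    differ = 0
    for _ in range(n):
        u = rng.random()
        differ += (u < p) != (u < q)
    return differ / n

print(coupled_disagreement(0.3, 0.4, 200_000))   # close to |p - q| = 0.1
```

This is the 'joining' picture of Exercise 5 below in its simplest form. Each time step prints one symbol of each process, and the names differ exactly when $u_k$ falls between $p$ and $q$.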
Now we use the $\bar d$ distance to formulate a weakening of the WB condition. We will require that given $\varepsilon>0$ there is an $N$ such that for all $m = 1,2,\ldots$, most atoms of $\alpha_{-m}^0$ have been moved, on the average during the time interval $[0,N-1]$, in a way that is approximately independent of $\alpha$. To make this more precise, extend the above definition of the $\bar d$ metric to any pair of finite sequences of partitions $\alpha_1,\ldots,\alpha_N$ of $X$ and $\beta_1,\ldots,\beta_N$ of $Y$ by
$$\bar d(\{\alpha_i\},\{\beta_i\}) = \inf\frac1N\sum_{i=1}^Nd(\phi\alpha_i,\psi\beta_i),$$
where the infimum is again taken over all isomorphisms $\phi: X\to Z$, $\psi: Y\to Z$. Given a partition $\alpha$ of $X$ and a measurable set $E\subset X$, $\alpha$ determines a natural partition $\alpha|E = \{A\cap E: A\in\alpha\}$ of $E$.
Definition 4.10 A partition $\alpha$ is called very weakly Bernoulli (VWB) for an ergodic m.p.t. $T$ if given $\varepsilon>0$ there is an $N$ such that for all $m = 1,2,\ldots$, there is a family $\mathscr{G}_m$ of atoms of $\alpha_{-m}^0 = \bigvee_{i=-m}^0T^{-i}\alpha$ such that
(i) $\mu(\bigcup\mathscr{G}_m)>1-\varepsilon$;
(ii) if $A\in\mathscr{G}_m$, then
$$\bar d(\{T^{-i}\alpha|A: i = 1,2,\ldots,N\},\{T^{-i}\alpha: i = 1,2,\ldots,N\})<\varepsilon.$$
Remark 4.11 Every WB partition is VWB.
Theorem 4.12 (Ornstein 1970b) If $T$ has a very weakly Bernoulli generator, then $T$ is isomorphic to any Bernoulli shift of the same entropy.
We consider one further weakening of the independence condition and corresponding strengthening of Ornstein's theorem. An independent partition $\alpha$ (also WB and VWB partitions) can be shown to have the curious property that any partition that approximates it well enough in entropy, and in distribution up to a certain time $n$, must approximate it well, on average (i.e. in the $\bar d$ sense), for all time. Ornstein found that this weak property of being 'finitely determined' was still enough to allow him to prove the isomorphism theorem.
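Definition 4.13 A partition $\alpha$ is finitely determined (FD) with respect to an ergodic m.p.t. $T$ if given $\varepsilon>0$ there are $\delta$ and $N$ such that if $(\beta,S)$ is any ergodic process with
(i) $\operatorname{card}\beta = \operatorname{card}\alpha$,
(ii) $|h(\alpha,T)-h(\beta,S)|<\delta$, and
(iii) $\|\operatorname{dist}(\alpha_0^{N-1})-\operatorname{dist}(\beta_0^{N-1})\| = \sum_{i_0,\ldots,i_{N-1}}|\mu(A_{i_0}\cap T^{-1}A_{i_1}\cap\cdots\cap T^{-N+1}A_{i_{N-1}})-\nu(B_{i_0}\cap S^{-1}B_{i_1}\cap\cdots\cap S^{-N+1}B_{i_{N-1}})|<\delta$,
then
$$\bar d((\alpha,T),(\beta,S))<\varepsilon.$$
Proposition 4.14 WB $\Rightarrow$ VWB $\Rightarrow$ FD.
Theorem 4.15 (Ornstein 1970b) The following statements about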
an ergodic m.p.t. $T$ are equivalent:
1. $T$ is isomorphic to a Bernoulli shift.
2. $T$ has a finitely determined generator.
3. Every partition is finitely determined with respect to $T$.
Corollary 4.16 A factor of a Bernoulli shift is Bernoulli.
Remark 4.17 Ornstein and Weiss (1974) showed that in fact every finitely determined partition is very weakly Bernoulli. Thus by 4.15 (3), every partition of a Bernoulli shift is VWB.
Remark 4.18 Some systems that have so far been proved to be Bernoulli. Note: A flow $\{T_t\}$ is Bernoulli if $T_1$ is, in which case all $T_t$ are.
1. Mixing Markov shifts (Friedman and Ornstein 1970).
2. Ergodic automorphisms of the $n$-torus (Katznelson 1971).
3. More generally, automorphisms of compact abelian groups (Lind 1977 and Miles and Thomas 1978) and certain ergodic automorphisms of nilmanifolds (Dani 1976).
4. A billiard system (one ball on a square table with finitely many convex obstacles) (Ornstein and Gallavotti 1974).
5. The hard-sphere gas in a rectangular box (Sinai, unpublished; see Ornstein 1978).
6. Geodesic flow on a surface of negative curvature (Ornstein and Weiss 1973).
7. More generally, any mixing Anosov flow with a smooth invariant measure (Ratner 1974 and Bunimovitch 1973, 1974).
8. Brownian motion in a rectangular region with reflecting boundary (Ornstein and Shields 1973).
9. Natural extensions (see 1.3.G) of certain noninvertible maps of the interval, e.g. $x\mapsto\beta x\bmod1$, where $\beta>1$ (Adler 1973, Smorodinsky 1973).
10. Shifts on certain Ising models (Gallavotti 1973, Ledrappier 1973, Liberto, Gallavotti and Russo 1973).
Thus many classical dynamical systems and algebraically constructed systems not only contain random behavior, but from the point of view of measure theory actually are the same as repeated random experiments. Whether one views this as a demonstration of the unsuspected complexity of phenomena or of the inappropriateness of the concept of metric isomorphism, the collection of results is revolutionary, and the revolution is continuing.
Exercises
1. Show that there is a function $f(\varepsilon)$, which decreases to 0 as $\varepsilon\to0$, such that $\alpha\perp_\varepsilon\beta$ implies $\beta\perp_{f(\varepsilon)}\alpha$. (Hint: Consider the relationship between $\alpha\perp_\varepsilon\beta$ and $\sum_{A,B}|\mu(A\cap B)-\mu(A)\mu(B)|<\delta$.)
2. Show that there is a function $g(\delta)$, which decreases to 0 as $\delta\to0$, such that $H(\alpha)-H(\alpha|\beta)<\delta$ implies $\alpha\perp_{g(\delta)}\beta$. (Hint: If $H(\alpha)-H(\alpha|\beta)<\delta$, then for a set of atoms $B$ of $\beta$ of total measure at least $1-\sqrt\delta$, $\sum_{A\in\alpha}\mu(A|B)\log_2[\mu(A|B)/\mu(A)]<\sqrt\delta$. If $\theta(t) = t-\log t-1$, then the first step implies that $\theta(\mu(A)/\mu(A|B))<\sqrt[4]{\delta}$ for all 'good' $B\in\beta$ and a set of atoms $A$ of $\alpha$ of total measure at least $1-\sqrt[4]{\delta}$. Then use continuity of $\theta^{-1}$ and sum over the 'good' $A\in\alpha$.) Cf. Proposition 5.2.9.
3. Prove that $\sup_m\bar d^{(m)}((\alpha,T),(\beta,S)) = \lim_{m\to\infty}\bar d^{(m)}((\alpha,T),(\beta,S))$. (Hint: If $\bar d^{(m)}>r$, then $\bar d^{(km)}>r$ for all $k$, and so for large $n$, $\bar d^{(n)}>r-\varepsilon$.)
4. Show that the entropy $h(\alpha,T)$ is a continuous function of ergodic finite-state stationary processes with respect to the $\bar d$-distance.
5. Show that $\bar d((\alpha,T),(\beta,S))$ can be found in the following way. Take an ergodic transformation $R: Z\to Z$ which has $(\alpha,T)$ and $(\beta,S)$ as marginals, in that there are partitions $\alpha'$ and $\beta'$ of $Z$ with $(\alpha',R)\cong(\alpha,T)$ and $(\beta',R)\cong(\beta,S)$, and compute the partition distance $d(\alpha',\beta')$; then take the infimum over all such systems. (We may think of the $Z$ process as printing out both an $\alpha$ symbol and a $\beta$ symbol at each time. If these symbols agree with high probability then the $\bar d$-distance between the two processes is small.)
6.5. Finitary coding between Bernoulli shifts

Let $\mathscr{B}(p) = \mathscr{B}(p_0, \dots, p_{a-1})$ (with measure $\mu$) and $\mathscr{B}(q) = \mathscr{B}(q_0, \dots, q_{b-1})$ (with measure $\nu$) be two Bernoulli shifts, on alphabets $A = \{0, 1, \dots, a-1\}$ and $B = \{0, 1, \dots, b-1\}$, of equal entropy $h$:
$$h = H(p) = -\sum_{i=0}^{a-1} p_i \log_2 p_i = -\sum_{j=0}^{b-1} q_j \log_2 q_j = H(q).$$
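As a quick numerical check of the entropy formula, here is a minimal Python sketch (the function name `entropy` is ours, not the text's). It evaluates $H(p)$ for the two probability vectors in Meshalkin's example, which is mentioned later in this section. Both come out to exactly 2 bits.

```python
from math import log2

def entropy(probs):
    """Return H(p) = -sum_i p_i log2 p_i, skipping zero weights."""
    if abs(sum(probs) - 1) > 1e-12:
        raise ValueError("weights must sum to 1")
    return -sum(w * log2(w) for w in probs if w > 0)

p = [1/4, 1/4, 1/4, 1/4]          # B(1/4, 1/4, 1/4, 1/4)
q = [1/2, 1/8, 1/8, 1/8, 1/8]     # B(1/2, 1/8, 1/8, 1/8, 1/8)
print(entropy(p), entropy(q))     # 2.0 2.0
```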
According to Ornstein's Theorem (Theorem 4.5, Theorem 4.12, Theorem 4.15), the systems $\mathscr{B}(p)$ and $\mathscr{B}(q)$ are metrically isomorphic. Ornstein's proof does not actually produce an isomorphism, although it has the advantage of giving various conditions sufficient for any system to be isomorphic to a Bernoulli shift. Further, the isomorphism, or coding, $\phi: \mathscr{B}(p) \to \mathscr{B}(q)$ may not be effectively realizable in actual practice, i.e., by a real machine. To determine the 0th entry $(\phi x)_0$ of the image of a sequence $x \in \mathscr{B}(p)$, one may in general need to know the entire sequence $x = \cdots x_{-1}x_0x_1\cdots$. Keane and Smorodinsky (1979a; see also 1977, 1979b, 1979c) gave a new proof of Ornstein's theorem which does construct the isomorphism explicitly, and in which the isomorphism is finitary: for a.e. $x \in \mathscr{B}(p)$, $(\phi x)_0$ depends only on a finite number (which may depend on $x$) of entries in $x$. (Unfortunately, in general the expectation of this coding time may be infinite.) This section is devoted to the major case of the Keane–Smorodinsky proof. First, let us mention several characterizations of finitary maps. A map $\phi: \mathscr{B}(p) \to \mathscr{B}(q)$ is called finitary if it satisfies any one of the following three conditions, all of which can be shown to be equivalent (Exercise 1):
Conditions 5.1
(1) There are sets of measure 0, $N \subset \mathscr{B}(p)$ and $M \subset \mathscr{B}(q)$, such that $\phi: \mathscr{B}(p) \setminus N \to \mathscr{B}(q) \setminus M$ is continuous.
(2) For a.e. $x \in \mathscr{B}(p)$ there is an integer $j(x)$ such that if $x$ and $x'$ agree on their central $j(x)$-blocks, then $(\phi x)_0 = (\phi x')_0$.
(3) The inverse image $\phi^{-1}\{y \in \mathscr{B}(q) : y_0 = j\}$ of each time-0 cylinder set in $\mathscr{B}(q)$ is, up to a set of measure 0, a countable union of cylinder sets in $\mathscr{B}(p)$.

The Keane–Smorodinsky proof produces a metric isomorphism $\phi: \mathscr{B}(p) \to \mathscr{B}(q)$, when $H(p) = H(q)$, such that both $\phi$ and $\phi^{-1}$ are finitary. It is not true in general that the inverse of a finitary isomorphism is finitary (Rudolph), although this may be true of maps between Bernoulli shifts. Meshalkin (1959) and Blum and Hanson (1963) constructed finitary metric isomorphisms between Bernoulli shifts in certain special cases, such as between $\mathscr{B}(\frac14, \frac14, \frac14, \frac14)$ and $\mathscr{B}(\frac12, \frac18, \frac18, \frac18, \frac18)$, each of which has entropy 2. Finitary codes between subshifts of finite type of equal entropy were constructed by Adler and Marcus (1979). Denker and Keane (1979) and Denker (1977) began a general study of the finitary category. It can be shown (Exercise 2) that $\mathscr{B}(p)$ and $\mathscr{B}(q)$ are isomorphic under a measure-preserving homeomorphism if and only if $p$ is a permutation of $q$. Rudolph (1981) has worked out the theory, parallel to Ornstein's, of sufficient conditions for a given system to be finitarily isomorphic to a Bernoulli shift. Notice that if both alphabets have two elements ($a = b = 2$), then the isomorphism can be constructed easily. For the function
$$g(t) = -t \log_2 t - (1-t)\log_2(1-t)$$
on $[0, 1]$ has the property that $g(t_1) = g(t_2)$ if and only if $t_1 = t_2$ or $t_1 = 1 - t_2$. Thus $H(p_0, 1-p_0) = H(q_0, 1-q_0)$ implies that either $p_0 = q_0$ or $p_0 = 1 - q_0$. For the isomorphism, then, we take either the identity map or else the 'dualizing' map which sends 0 to 1 and 1 to 0. Henceforth we will therefore assume that $a \ge 2$ and $b > 2$. If actually $a = 2$, the same proof that we are about to give for the case when $a > 2$ and $b > 2$ will carry through, provided that a few technical modifications are made here and there (Exercises 5 and 6). For the sake of the exposition, we will limit our attention to the major case when $a \ge 3$ and $b \ge 3$.

A. Sketch of the proof
There is an easy reduction to the case when the probability vectors $p$ and $q$ contain a common weight, say $p_0 = q_0$. Then the symbol 0 appears with the same probability in the sequences of $\mathscr{B}(q)$ as in those of $\mathscr{B}(p)$. So it is natural (and very helpful) to consider codes $\phi: \mathscr{B}(p) \to \mathscr{B}(q)$ which take 0 to 0; i.e., we will have $(\phi x)_i = 0$ if and only if $x_i = 0$, for all $i \in \mathbb{Z}$. This gives us a sort of frame for the code, and we only need to tell which of
the remaining symbols $1, \dots, b-1$ fill in the blanks in $\phi x$. The 0s are called markers, and the Bernoulli shift $\mathscr{B}(p_0, 1-p_0)$ is called the marker process. It is a factor of both $\mathscr{B}(p)$ and $\mathscr{B}(q)$ in the obvious way (code 0 to 0 and any other symbol to 1):
$$\begin{array}{rcll}
x &=& \cdots 00\,\_\cdots\_\,0\,\_\cdots\_\,000\,\_\cdots\_\,0\,\cdots & \in \mathscr{B}(p)\\
&& \downarrow \pi &\\
z &=& \cdots 0011\cdots11011\cdots11000111\cdots11011\cdots & \in \mathscr{B}(p_0, 1-p_0)\\
&& \uparrow \tau &\\
\phi x = y &=& \cdots 00\,\_\cdots\_\,0\,\_\cdots\_\,000\,\_\cdots\_\,0\,\cdots & \in \mathscr{B}(q)
\end{array}$$
The idea now is to look at a very long central block of $x$ and observe the 'filler' in it (i.e., what is in the spaces between 0s). We then use this information to fill in many of the corresponding places in $y$, including at least $y_0$. This amounts to defining $\phi$ so that it sends fibers of $\pi$ to fibers of $\tau$. The long blocks looked at are selected to be of a kind which can inherently never overlap themselves, namely the blocks between two enormously long record runs of consecutive 0s (record as we look outward from the center). Such a block of 0s and spaces is called a skeleton. Almost all $x \in \mathscr{B}(p)$ have skeletons of all 'ranks'. We think of a skeleton as appearing in both $x$ and $y$. Given a skeleton, its blanks can be filled by finite blocks on the symbols $A_0 = \{1, \dots, a-1\}$ to get a block in $x$, or on $B_0 = \{1, \dots, b-1\}$ to get a block in $y$. These are called the filler alphabets. The relative probabilities of the various filler blocks in $x$ are determined by products of the filler measure
$$\mu_0(k) = \frac{p_k}{1 - p_0} \qquad (k = 1, \dots, a-1),$$
and those for $y$ by products of
$$\nu_0(k) = \frac{q_k}{1 - q_0} \qquad (k = 1, \dots, b-1).$$
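The passage from $p$ to the filler measure $\mu_0$ is just conditioning on 'not a marker'. The following Python sketch computes $\mu_0$ and $\nu_0$ for one illustrative pair with a common weight $p_0 = q_0 = 1/2$. The vectors are our own example, not taken from the text. The sketch confirms that both the full entropies and the filler entropies agree.

```python
from math import log2

def entropy(probs):
    """Entropy in bits, skipping zero weights."""
    return -sum(w * log2(w) for w in probs if w > 0)

def filler_measure(probs):
    """Condition on 'not a marker': mu_0(k) = p_k / (1 - p_0), k = 1, ..., len - 1."""
    p0 = probs[0]
    return [pk / (1 - p0) for pk in probs[1:]]

p = [1/2, 1/8, 1/8, 1/8, 1/8]            # a = 5 (illustrative choice)
q = [1/2, 1/4, 1/16, 1/16, 1/16, 1/16]   # b = 6, same marker weight p_0 = q_0
mu0, nu0 = filler_measure(p), filler_measure(q)
print(mu0)   # [0.25, 0.25, 0.25, 0.25]
print(nu0)   # [0.5, 0.125, 0.125, 0.125, 0.125]
print(entropy(p), entropy(q), entropy(mu0), entropy(nu0))  # 2.0 2.0 2.0 2.0
```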
When there are very many spaces in the skeleton we are looking at, the Shannon–McMillan–Breiman Theorem (Corollary 2.5) implies, because $H(\mu_0(1), \dots, \mu_0(a-1)) = H(\nu_0(1), \dots, \nu_0(b-1))$, that the blocks available to go into the spaces in $\mathscr{B}(p)$ and in $\mathscr{B}(q)$ have roughly the same numbers and sizes. This gives us hope that they can be matched up, and that by looking at the filler in $x$ we will be able to fill in many of the blanks in the corresponding portion of $y$. Given a skeleton, the fillers $F$ for $x$ can be thought of as cylinder sets of the Bernoulli shift on the alphabet $A_0$ with weights given by $\mu_0$. The fillers $G$ for $y$ can likewise be thought of as cylinder sets of the Bernoulli shift on the alphabet $B_0$ with weights given by $\nu_0$. These are the filler Bernoulli shifts. Each filler
$F$ has associated with it a certain set $J(F)$ of places in the skeleton whose entries it determines; as we pass to longer skeletons, places previously fixed stay fixed, and with the same entries. At any given stage we are interested in the equivalence classes of fillers, where two fillers are regarded as equivalent if they agree on their fixed sets. These classes form the cylinder sets of the candidates for the $x$ and $y$ that will be matched to one another by the isomorphism. We look at progressively longer skeletons. At each odd stage, look at the equivalence classes of fillers in points $x$ of $\mathscr{B}(p)$. Associate to each a set of equivalence classes of possible fillers for the corresponding points of $y = \phi x$ in a measure-nondecreasing way; such an association is called a society. At each even stage do the same, except with the roles of $\mathscr{B}(p)$ and $\mathscr{B}(q)$ reversed. When we pass from one stage to the next, the association is made in a manner consistent with what went before and in a particularly efficient way. That this is possible follows from an extension of the famous combinatorial Marriage Lemma. The Marriage Lemma has found many previous applications throughout mathematics, including in ergodic theory (see Rota and Harper 1971 and Ornstein 1970a). The sequence of skeletons and associations of equivalence classes of fillers produced in this way is called an assignment. The parameters of the procedure can be fixed as follows. For all $x \in \mathscr{B}(p)$ with the possible exception of a set of measure 0, when we look at a long enough central skeleton of $x$, our assignment associates to the equivalence class of its actual filler in $x$ a single equivalence class of fillers in $\mathscr{B}(q)$. Then all those places in $\phi x$ can be filled in. Further, we will show that the central place of $\phi x$ is among this set of then-determined places. Thus $(\phi x)_0$ is determined by finitely many coordinates of $x$, and the map is finitary.
The measure-nondecreasing condition in our assignments makes $\phi$ measure-preserving. The finitary map $\psi$ constructed from $\mathscr{B}(q)$ to $\mathscr{B}(p)$ simultaneously and consistently with $\phi$ will be $\phi^{-1}$ (once sets of atypical points, of measure 0, are discarded). The map $\phi$ is actually defined on entire orbits, and so is clearly shift-invariant. Now we go on to look at the details, which involve nothing more than some easy combinatorics and a few estimates arising from the Shannon–McMillan–Breiman Theorem. If you prefer to look at its proof rather than its statement, they arise from the Weak Law of Large Numbers (i.e., the Ergodic Theorem, with convergence only in probability rather than almost everywhere). A certain amount of notation and bookkeeping is necessary; behind this, though, there is a very simple and beautiful argument.
B. Reduction to the case of a common weight

We are dealing with two Bernoulli shifts $\mathscr{B}(p) = \mathscr{B}(p_0, \dots, p_{a-1})$ and $\mathscr{B}(q) = \mathscr{B}(q_0, \dots, q_{b-1})$ on alphabets $A = \{0, 1, \dots, a-1\}$ and $B = \{0, 1, \dots, b-1\}$, where $a \ge 3$ and $b \ge 3$, each of entropy $h$. The following Lemma allows us to assume that $p$ and $q$ have a weight in common, say $p_0 = q_0$. As outlined above, this produces the common factor marker process. It also makes possible the construction of the frame for the coding, according to which 0 goes to 0, and sets off the search for skeletons in the sequences of the two processes.

Lemma 5.2 Marker Lemma There is a probability vector $r = (r_0, \dots, r_{c-1})$ with $H(r) = -\sum r_i \log_2 r_i = h$ such that $r_0 = p_i$ for some $i$ and $r_1 = q_j$ for some $j$.

Proof: Assume that $p_0 \ge p_1 \ge \cdots \ge p_{a-1}$, $q_0 \ge q_1 \ge \cdots \ge q_{b-1}$, and $p_{a-1} \ge q_{b-1}$. Put $r_0 = p_0$ and $r_1 = q_{b-1}$. I claim that
$$H(r_0, r_1, 1 - (r_0 + r_1)) \le H(p_0, p_{a-1}, 1 - (p_0 + p_{a-1})).$$
Intuitively this is clear, because the vector on the right is more equidistributed than the one on the left (recall that $a \ge 3$):
[Figure: the weights $p_0$ and $p_{a-1}$ compared with $r_0 = p_0$ and $r_1 = q_{b-1} \le p_{a-1}$.]
More precisely, it is enough to observe that the function
$$g(t) = -p_0 \log_2 p_0 - t \log_2 t - (1 - (t + p_0)) \log_2 (1 - (t + p_0))$$
increases as $t$ increases to $(1 - p_0)/2$, so that $g(q_{b-1}) \le g(p_{a-1})$. (Note that $p_{a-1} \le (1-p_0)/2$, since $a \ge 3$ gives $p_{a-1} \le p_1$ and $p_1 + p_{a-1} \le 1 - p_0$.) Of course $H(p_0, p_{a-1}, 1 - (p_0 + p_{a-1})) \le H(p) = h$, since $\alpha \le \beta$ implies $H(\alpha) \le H(\beta)$. Now we only need to choose $c \ge 3$ and $r_2, \dots, r_{c-1}$ such that $H(r_0, r_1, \dots, r_{c-1}) = h$. This can be done as in Exercise 5.3.4. Use Lagrange multipliers to see the following: subject to the constraints $r_i \ge 0$ and $\sum_{i=2}^{c-1} r_i = 1 - r$, where $r = r_0 + r_1$, the continuous function $-\sum_{i=2}^{c-1} r_i \log_2 r_i$ takes any value between $-(1 - r)\log_2(1 - r)$ and $-(1 - r)\log_2(1 - r) + (1 - r)\log_2(c - 2)$.
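The proof of the Marker Lemma is constructive enough to be carried out numerically. The Python sketch below follows it:

- sort the weights;
- take $r_0$ to be the largest weight of one vector and $r_1$ the smallest weight of the other, swapping the roles of $p$ and $q$ if needed so that $p_{a-1} \ge q_{b-1}$;
- spread the remaining mass $1 - r_0 - r_1$ over $c - 2$ further symbols.

In place of the Lagrange multiplier argument, the sketch assumes a specific one-parameter family that interpolates between a point mass and the uniform vector on $c - 2$ symbols. It solves for the required entropy by bisection along that family. The name `marker_vector` is hypothetical.

```python
from math import log2, isclose

def H(ws):
    """Entropy in bits of a list of nonnegative weights (zeros skipped)."""
    return -sum(w * log2(w) for w in ws if w > 0)

def marker_vector(p, q, iterations=200):
    """Sketch of Lemma 5.2: return r with H(r) = H(p) = H(q), where r[0] is a
    weight of one of the vectors and r[1] a weight of the other."""
    p, q = sorted(p, reverse=True), sorted(q, reverse=True)
    if not isclose(H(p), H(q)):
        raise ValueError("p and q must have equal entropy")
    if len(p) < 3 or len(q) < 3:
        raise ValueError("this sketch covers only the case a, b >= 3")
    if p[-1] < q[-1]:                # arrange p_{a-1} >= q_{b-1}
        p, q = q, p
    h = H(p)
    r0, r1 = p[0], q[-1]
    rest = 1 - (r0 + r1)             # positive, since a >= 3
    target = h - H([r0, r1])         # entropy still to be supplied by r_2, ...
    # Smallest n = c - 2 whose maximal entropy on `rest` reaches the target.
    n = 1
    while -rest * log2(rest) + rest * log2(n) < target:
        n += 1

    def family(s):
        # s = 0: point mass on one symbol; s = 1: uniform on n symbols.
        ws = [rest * s / n] * n
        ws[0] += rest * (1 - s)
        return ws

    lo, hi = 0.0, 1.0                # H(family(s)) is nondecreasing in s
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if H(family(mid)) < target:
            lo = mid
        else:
            hi = mid
    return [r0, r1] + family(hi)

r = marker_vector([1/4, 1/4, 1/4, 1/4], [1/2, 1/8, 1/8, 1/8, 1/8])
print(len(r), r[:2], H(r))
```

Applied to Meshalkin's pair, this produces a five-symbol vector $r$ that shares the weight $1/4$ with the first vector and $1/8$ with the second.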
Suppose the finitary code can be produced whenever there is a common weight. Then, given arbitrary $p$ and $q$ of equal entropy, we can find an isomorphism from each to a $\mathscr{B}(r)$, where $r$ is as in the Lemma. Composing one of these maps with the inverse of the other yields the isomorphism from $\mathscr{B}(p)$ to $\mathscr{B}(q)$. We assume henceforth that $p_0 = q_0$.

C. Framing the code

Let $N_1 < N_2 < \cdots$ be a sequence of integers, to be determined later. For each $r = 1, 2, \dots$ a skeleton $S$ of rank $r = r(S)$ is a string of blocks of
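0s and spaces
$$S = 0^{n_0}\,\underbrace{\_\,\cdots\,\_}_{l_1}\,0^{n_1}\,\underbrace{\_\,\cdots\,\_}_{l_2}\,\cdots\,\underbrace{\_\,\cdots\,\_}_{l_m}\,0^{n_m}$$
(where $0^i$ means a block of $i$ consecutive 0s) such that
(i) $n_0, n_m \ge N_r$,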
(ii) $N_r > n_k$ for $k \ne 0, m$.

Skeletons in a sequence $x \in \mathscr{B}(p)$ or $y \in \mathscr{B}(q)$ are found by beginning at the center and moving out to both the left and the right until we first encounter a block of at least $N_r$ consecutive 0s; in such a case we include all the 0s in the skeleton. We say that a skeleton $S$ appears in a sequence $x$ if the spaces of $S$ can be filled in by nonzero symbols so that the resulting block appears in $x$ neither preceded nor followed by a 0. The number of spaces $l = l_1 + \cdots + l_m$ is called the length $l(S)$ of the skeleton $S$. If $S$ is as above, then by a subskeleton of $S$ we mean a skeleton $S_0$ of some rank $r_0$ that takes the form
$$S_0 = 0^{n_k}\,\underbrace{\_\,\cdots\,\_}_{l_{k+1}}\,0^{n_{k+1}}\,\cdots\,\underbrace{\_\,\cdots\,\_}_{l_i}\,0^{n_i}$$
for some choice of $k$ and $i$. Two subskeletons of the same skeleton are called disjoint if their sets $\{k+1, \dots, i\}$ (as above) are disjoint (the right endblock of 0s of one could be the left endblock of 0s of the other). We say that a family of subskeletons of $S$ forms a decomposition of $S$, and write $S = S_1 \times \cdots \times S_j$, in case the $S_i$ are pairwise disjoint and the union of their sets of indices $\{k+1, \dots, i\}$ is all of $\{1, \dots, m\}$.

Lemma 5.3 Skeleton Lemma
(1) Each skeleton $S$ of rank $r > 1$ admits a unique decomposition, called the rank decomposition of $S$, into subskeletons of rank $r - 1$.
(2) For a.e. $x \in \mathscr{B}(p)$, either $x_0 = 0$ or for each $r \ge 1$ there is a unique skeleton $S_r(x)$ of rank $r$ which includes the central coordinate of $x$.
(3) Given any sequence $L_1, L_2, \dots$ of lengths, we can choose $N_1 < N_2 < \cdots$
such that for a.e. $x \in \mathscr{B}(p)$ and a.e. $y \in \mathscr{B}(q)$, $l(S_r(x)) \ge L_r$ and $l(S_r(y)) \ge L_r$ for all sufficiently large $r$.

Proof: (1) Starting at the left of $S$, move to the right until we hit the first block of 0s of length at least $N_{r-1}$; all the places passed on the way comprise $S_1$. Continue.
(2) Beginning at the center of $x$, move out to the left and right until in each direction we hit a string of consecutive 0s of length at least $N_r$. Almost every point contains infinitely many such strings, so such an $S_r(x)$ can be found with probability 1.
(3) We want to choose the $N_r$ so that if $E_r = \{x \in \mathscr{B}(p) : l(S_r(x)) \ge L_r\}$,
then
$$\mu\Big(\bigcup_{n=1}^{\infty}\bigcap_{r \ge n} E_r\Big) = 1.$$
(Because the search for skeletons involves only the marker process, the same $N_r$ will do also for $\mathscr{B}(q)$.) This can be done by making the $N_r$ so large that we have to wait a very long time to see repetitions of $0^{N_r}$. By the easy half of the Borel–Cantelli Lemma, it is enough to achieve
$$\sum_{r=1}^{\infty} \mu(E_r^c) < \infty,$$
for then
$$\mu\Big(\bigcap_{n=1}^{\infty}\bigcup_{r \ge n} E_r^c\Big) = 0.$$
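Before the estimate of $\mu(E_r^c)$, it may help to see concretely the skeleton search used in the definition above and in part (2) of this proof. The Python sketch below works in a finite window of a sequence, in which 0 is the marker symbol and any nonzero entry is filler. The name `central_skeleton`, and the convention of returning `None` when the window is too short to contain both end-blocks, are our own assumptions.

```python
def central_skeleton(x, center, N):
    """Locate the skeleton with threshold N (playing the role of N_r) around x[center].

    Moving outward from the center, stop in each direction at the first run of
    at least N consecutive 0s, and include that whole run (as far as the window
    shows it).  Returns (start, stop, pattern, length), where pattern writes '0'
    for a marker and '_' for a blank, and length is the number of blanks, l(S).
    Returns None if x[center] == 0 or the window lacks an end-block.
    """
    if x[center] == 0:
        return None
    i, run = center - 1, 0                 # scan left for N consecutive 0s
    while i >= 0:
        run = run + 1 if x[i] == 0 else 0
        if run >= N:
            break
        i -= 1
    else:
        return None
    while i > 0 and x[i - 1] == 0:         # include the whole left end-block
        i -= 1
    j, run = center + 1, 0                 # scan right for N consecutive 0s
    while j < len(x):
        run = run + 1 if x[j] == 0 else 0
        if run >= N:
            break
        j += 1
    else:
        return None
    while j + 1 < len(x) and x[j + 1] == 0:  # include the whole right end-block
        j += 1
    block = x[i:j + 1]
    pattern = "".join("0" if v == 0 else "_" for v in block)
    return i, j, pattern, sum(1 for v in block if v != 0)

x = [1, 0, 0, 0, 2, 0, 1, 0, 0, 3, 1, 0, 0, 0, 1]
print(central_skeleton(x, 6, 3))   # (1, 13, '000_0_00__000', 4)
```

Starting from any other nonzero place inside the same skeleton (for instance index 4) returns the same skeleton, in line with the uniqueness in part (2).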
Now $E_r^c = \{x : l(S_r(x)) < L_r\}$ can be seen from this picture to satisfy
$$E_r^c \subset \{x : 0^{N_r} \text{ appears in } x_0x_1\cdots x_{(L_r+1)N_r}\},$$
[Figure: $x = \cdots 0^{N_r}\,$[fewer than $L_r$ nonzero entries]$\,0^{N_r}\cdots$ with $x_0$ at the center; the skeleton $S_r(x)$ is marked, together with the lengths $N_r$, $(L_r + 1)N_r$ and $N_r$.]
so that, denoting by $[B]$ the cylinder set determined by a block $B$,
$$\mu(E_r^c) \le \mu\Big(\bigcup_{i=0}^{L_rN_r}\{x : 0^{N_r} \text{ appears at the } i\text{th place in } x\}\Big) \le (L_rN_r + 1)\,\mu[0^{N_r}] = (L_rN_r + 1)\,p_0^{N_r} < \frac{1}{r^2}$$
for appropriate choice of $N_r$.

D. What to put in the blanks
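We define for $\mathscr{B}(p)$ and $\mathscr{B}(q)$:

the filler alphabets $A_0 = \{1, \dots, a-1\}$ and $B_0 = \{1, \dots, b-1\}$;

the filler measures $\mu_0(k) = \dfrac{p_k}{1 - p_0}$ $(k = 1, \dots, a-1)$ and $\nu_0(k) = \dfrac{q_k}{1 - q_0}$ $(k = 1, \dots, b-1)$;

the filler entropies $-\sum_{k=1}^{a-1} \mu_0(k)\log_2 \mu_0(k) = g = -\sum_{k=1}^{b-1} \nu_0(k)\log_2 \nu_0(k)$;

the filler Bernoulli shifts $(A_0^{\mathbb{Z}}, \mu_0^{\mathbb{Z}}) = \mathscr{B}(\mu_0(1), \dots, \mu_0(a-1))$ and $(B_0^{\mathbb{Z}}, \nu_0^{\mathbb{Z}}) = \mathscr{B}(\nu_0(1), \dots, \nu_0(b-1))$.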
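A short computation, which expands the check using only the definitions above, shows why the two filler entropies agree once $p_0 = q_0$. Since $p_k = (1 - p_0)\mu_0(k)$ for $k \ge 1$ and $\sum_k \mu_0(k) = 1$,
$$
\begin{aligned}
H(p) &= -p_0\log_2 p_0 - \sum_{k=1}^{a-1}(1-p_0)\mu_0(k)\big[\log_2(1-p_0) + \log_2\mu_0(k)\big]\\
&= -p_0\log_2 p_0 - (1-p_0)\log_2(1-p_0) - (1-p_0)\sum_{k=1}^{a-1}\mu_0(k)\log_2\mu_0(k)\\
&= H(p_0, 1-p_0) + (1-p_0)\,H(\mu_0(1), \dots, \mu_0(a-1)).
\end{aligned}
$$
In the same way $H(q) = H(q_0, 1-q_0) + (1-q_0)\,H(\nu_0(1), \dots, \nu_0(b-1))$. Since $H(p) = H(q)$ and $p_0 = q_0 < 1$, the two filler entropies coincide.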
(It is easy to check that the two entropies are equal.) Again a word about what is going on. We will try to pick blocks from these filler Bernoulli shifts to fill in the spaces in skeletons. Look at a single fiber over the marker process in each of $\mathscr{B}(p)$ and $\mathscr{B}(q)$. Suppose $r$ is odd. For each $x \in \mathscr{B}(p)$ in this fiber, we associate to the equivalence class of the actual filler $F_r(x)$ in $S_r(x)$ a set of equivalence classes of fillers from $(B_0^{\mathbb{Z}}, \nu_0^{\mathbb{Z}})$. We think of these as the possibilities for the fillers for $S_r(\phi x)$. (The equivalence classes consist of fillers which agree on the places that can be determined from that time on.) If $r$ is even, we do the same with the roles of $\mathscr{B}(p)$ and $\mathscr{B}(q)$ reversed. Eventually the assigned equivalence classes become singletons. As $r \to \infty$, almost surely all blanks are filled in and almost all points in the two fibers are paired. To avoid excessive notation, we will use $\mu_0$ and $\nu_0$ also to denote the product measures of various dimensions obtained from powers of $\mu_0$ or $\nu_0$. Let $S$ be a skeleton of length $l$. Its index set is $I(S) = \{1, 2, \dots, l\}$,
and its filler sets are
$$\mathscr{F}(S) = A_0^l = \{F = f_1f_2\cdots f_l : \text{all } f_i \in A_0\} \quad \text{for } \mathscr{B}(p)$$
and
$$\mathscr{G}(S) = B_0^l = \{G = g_1g_2\cdots g_l : \text{all } g_i \in B_0\} \quad \text{for } \mathscr{B}(q).$$

Step 5.4 Determining the $N_r$. Fix a sequence $\epsilon_1 > \epsilon_2 > \cdots \to 0$. Let $\alpha$ be the time-0 partition of $(A_0^{\mathbb{Z}}, \mu_0^{\mathbb{Z}}, \sigma)$, and let $\eta = \min_{1 \le i \le a-1} \mu_0(i)$. Fix $r = 1, 2, \dots$. By the Shannon–McMillan–Breiman Theorem (Corollary 2.5), there is a $k_r$ with the following property. If $k \ge k_r$, then $\alpha_0^{k-1}$ (which is identified with $\mathscr{F}(S)$ in case $l(S) = k$) is the disjoint union of two collections of atoms such that
(i) the atoms in the first collection have total $\mu_0$-measure less than $\epsilon_r$;
(ii) each atom $A$ in the second collection satisfies
$$\frac{1}{\eta}2^{-k(g+\epsilon_r)} < 2^{-k(g+\epsilon_r/2)} < \mu_0(A) < 2^{-k(g-\epsilon_r/2)} < \frac{1}{\eta}2^{-k(g-\epsilon_r)}.$$
(We have chosen $k_r$ large enough that $2^{-k\epsilon_r/2} < \eta$ for $k \ge k_r$.) This allows us now to choose the $L_r$ (depending on the $\epsilon_r$) in such a way that
(i) $L_r \ge k_r$ for all $r$,
(ii) $\lim_{r \to \infty} L_r(\epsilon_{r-1} - \epsilon_r) = \infty$.
Then the $L_r$ determine the $N_r$ by the Skeleton Lemma (5.3(3)).

Step 5.5 Definition of equivalence of fillers. Let $S$ be a skeleton of rank $r$ and length $l$. We will define an equivalence relation $\sim$ on the set $\mathscr{F}(S)$ of fillers for $S$ in $\mathscr{B}(p)$. To do so, we attach to each $F \in \mathscr{F}(S)$ a set $J(F) \subset \{1, 2, \dots, l\}$ (of (thenceforth) determined places in $S$), agreeing
that $F \sim F'$ if and only if
(i) $J(F) = J(F')$,
(ii) $F = F'$ on $J(F)$.
Actually $J(F)$ is defined by a sort of stopping rule in such a way that (ii) forces (i). A similar definition is made for $\mathscr{G}(S)$ (using $\nu_0$ in place of $\mu_0$), and the resulting sets of equivalence classes are denoted by $\bar{\mathscr{F}}(S)$ and $\bar{\mathscr{G}}(S)$. If $F \in \mathscr{F}(S)$, then $\bar F$ denotes the equivalence class to which it belongs. Appropriate powers of $\mu_0$ and $\nu_0$ determine probability measures on $\mathscr{F}(S)$ and $\mathscr{G}(S)$, and on $\bar{\mathscr{F}}(S)$ and $\bar{\mathscr{G}}(S)$, in the obvious way.
$J(F)$ is defined by induction on the rank $r$ of $S$. Suppose that $r(S) = 1$, $l(S) = l$, and $F = a_1 \cdots a_l \in \mathscr{F}(S)$. Define $J(F)$ to be the initial segment of $1, 2, \dots, l$ whose corresponding cylinder set's $\mu_0$-measure first drops below the upper bound found in Step 5.4 for atoms of $\alpha_0^{l-1}$:
$$J(F) = \Big\{k : 1 \le k \le l \text{ and } \mu_0(a_1)\cdots\mu_0(a_{k-1}) \ge \frac{1}{\eta}2^{-l(g-\epsilon_1)}\Big\}.$$
Then usually $J(F) \ne \{1, \dots, l\}$. Suppose now that $J(F)$ is already defined for all $F \in \mathscr{F}(S)$ whenever $r(S) < r$. If $S$ is a skeleton of rank $r$, form the rank decomposition $S = S_1 \times \cdots \times S_j$ of $S$ into disjoint subskeletons of rank $r - 1$. (Throughout, amalgamate, consolidate, and keep track of all index sets correctly.) If $F \in \mathscr{F}(S)$, then in an obvious way $F$ is a concatenation $F = F_1 \times \cdots \times F_j$, with each $F_i \in \mathscr{F}(S_i)$. We define
$$J_0(F) = \bigcup_{i=1}^{j} J(F_i) = \text{all the previously fixed places},$$
and we also add any new places, up to wherever the measures of atoms fall below the new (smaller) upper bound. Let $\{1, \dots, l\} \setminus J_0(F) = \{t_1, \dots, t_u\}$, where $t_1 < \cdots < t_u$, and define
$$J(F) = J_0(F) \cup \Big\{t_v : 1 \le v \le u \text{ and } \Big(\prod_{t \in J_0(F)} \mu_0(a_t)\Big)\mu_0(a_{t_1})\cdots\mu_0(a_{t_{v-1}}) \ge \frac{1}{\eta}2^{-l(g-\epsilon_r)}\Big\}.$$
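For a rank-1 skeleton the stopping rule is easy to express in code. The Python sketch below (function names hypothetical) computes $J(F)$ for a filler $F = a_1 \cdots a_l$. It takes the filler measure $\mu_0$ as a dictionary, together with the filler entropy $g$ and a value of $\epsilon$. As in Step 5.4, it takes $\eta$ to be the smallest filler weight. The example checks the lower bound $\mu_0(\bar F) \ge 2^{-l(g-\epsilon)}$ for the class measure in a case where all the arithmetic is exact in binary floating point.

```python
from math import prod

def fixed_places_rank1(filler, mu0, g, eps):
    """Rank-1 J(F): all k with mu0(a_1)...mu0(a_{k-1}) >= bound, where
    bound = (1/eta) * 2**(-l*(g - eps)).  The prefix products decrease, so this
    is an initial segment, and it includes the place after which the product
    first falls below the bound."""
    l = len(filler)
    eta = min(mu0.values())
    bound = 2 ** (-l * (g - eps)) / eta
    J, prefix = [], 1.0
    for k, symbol in enumerate(filler, start=1):
        if prefix < bound:
            break
        J.append(k)
        prefix *= mu0[symbol]
    return J

def class_measure(filler, J, mu0):
    """mu0 of the equivalence class: product of mu0(a_k) over k in J."""
    return prod(mu0[filler[k - 1]] for k in J)

mu0 = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # filler entropy g = 2 bits
F = [3, 1, 4, 1, 2, 2, 3, 4]                  # l = 8
J = fixed_places_rank1(F, mu0, g=2.0, eps=0.5)
print(J)                                      # [1, 2, 3, 4, 5, 6]
print(class_measure(F, J, mu0) >= 2 ** (-8 * (2.0 - 0.5)))   # True
```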
Thus at each stage $r$ we include all places previously fixed. We then throw in as many new places as possible until the measure of the atom formed starting at the left drops below the $r$th bound, including also this last place. It is clear that if $F'$ agrees with $F$ on $J(F)$, then $J(F') = J(F)$. Notice that if $S$ and $S'$ are skeletons of the same length, then $\mathscr{F}(S) = \mathscr{F}(S')$, but the equivalence relations on $\mathscr{F}(S)$ and $\mathscr{F}(S')$ may differ. The key estimates of the proof are contained in the following Lemma, which gives us some idea of the number and the sizes of the equivalence classes in $\bar{\mathscr{F}}(S)$. (Bear in mind that eventually a set of possible equivalence classes will be forced to be a singleton.) Here $\theta = \max_{1 \le i \le a-1} \mu_0(i)$.

Lemma 5.6 Filler Lemma
1. If $S$ is a skeleton of rank $r$ and length $l$, and if $F \in \mathscr{F}(S)$, then $\mu_0(\bar F) \ge 2^{-l(g-\epsilon_r)}$.
2. If $S$ is a skeleton of rank $r$ and length $l \ge L_r$, then for all $F \in \mathscr{F}(S)$ except maybe a set of $\mu_0$ ($l$-fold product) measure less than $\epsilon_r$,
(a) $\mu_0(\bar F) \le \frac{1}{\eta}2^{-l(g-\epsilon_r)}$,
(b) $\frac{1}{l}\operatorname{card} J(F) > 1 - \frac{2\epsilon_r}{|\log_2 \theta|}$.
Proof: (1) Let $F = a_1 \cdots a_l$ and $J(F) = J_0(F) \cup \{t_1, \dots, t_w\}$. Then
$$\mu_0(\bar F) = \mu_0\Big(\bigcup_{F' \sim F} F'\Big) = \prod_{i \in J(F)} \mu_0(a_i) = \Big(\prod_{i \in J_0(F)} \mu_0(a_i)\Big)\mu_0(a_{t_1})\cdots\mu_0(a_{t_w}) \ge \eta \cdot \frac{1}{\eta}2^{-l(g-\epsilon_r)} = 2^{-l(g-\epsilon_r)},$$
by the definition of $J(F)$.
(2a) If $J(F) \ne I(S)$, then the product of all the $\mu_0(a_i)$, $i \in J(F)$, is no more than $\frac{1}{\eta}2^{-l(g-\epsilon_r)}$, by definition of $J(F)$. It remains only to consider the case when $J(F) = I(S) = \{1, 2, \dots, l\}$. Then $\bar F = F$. But by the Shannon–McMillan–Breiman Theorem (see Step 5.4), the set of fillers $F \in \mathscr{F}(S)$ (i.e., atoms of $\alpha_0^{l-1}$) for which $\mu_0(F) > \frac{1}{\eta}2^{-l(g-\epsilon_r)}$ together comprise a set of $\mu_0$-measure less than $\epsilon_r$.
(2b) It is enough to prove the following: if $\operatorname{card} J(F)/l \le 1 - 2\epsilon_r/|\log_2 \theta|$, then $\mu_0(F) \le \frac{1}{\eta}2^{-l(g+\epsilon_r)}$. For then, again by Step 5.4, $F$ will be contained in the same set of measure less than $\epsilon_r$ mentioned in the proof of 2(a). If (b) does not hold for $F$ but (a) does, then $\operatorname{card}(I(S) \setminus J(F)) \ge 2l\epsilon_r/|\log_2 \theta|$, so that
$$\mu_0(F) = \Big(\prod_{i \in J(F)} \mu_0(a_i)\Big)\Big(\prod_{i \in I(S) \setminus J(F)} \mu_0(a_i)\Big) \le \mu_0(\bar F)\,\theta^{2l\epsilon_r/|\log_2\theta|} \le \frac{1}{\eta}2^{-l(g-\epsilon_r)}\,2^{-2l\epsilon_r} = \frac{1}{\eta}2^{-l(g+\epsilon_r)},$$
where we have also used the arcane fact that $\theta^{1/|\log_2 \theta|} = 2^{-1}$ if $0 < \theta < 1$.
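The 'arcane fact' used at the end of the proof is easy to confirm numerically, and so is the resulting factor $\theta^{2l\epsilon_r/|\log_2\theta|} = 2^{-2l\epsilon_r}$. The Python sketch below does so for a few sample values; the sample values of $\theta$, $l$ and $\epsilon$ are arbitrary choices of ours.

```python
from math import log2, isclose

def arcane(theta):
    """theta ** (1 / |log2 theta|), which equals 1/2 whenever 0 < theta < 1."""
    return theta ** (1 / abs(log2(theta)))

for theta in (0.9, 0.5, 0.25, 0.01):
    assert isclose(arcane(theta), 0.5)

# The factor from the proof of (2b): theta ** (2*l*eps/|log2 theta|) == 2 ** (-2*l*eps).
theta, l, eps = 0.3, 100, 0.02
lhs = theta ** (2 * l * eps / abs(log2(theta)))
print(lhs, 2 ** (-2 * l * eps))
```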
Of course a similar lemma holds for $\mathscr{G}(S)$ and $\nu_0$. In the following, $\eta$ and $\theta$ will be the minimum and maximum of all the $\mu_0(i)$ and $\nu_0(i)$.
E. Sociology

At an odd stage $r$, consider each element $x$ of a fiber in $\mathscr{B}(p)$ and the equivalence class $\bar F$ of the actual filler of the $r$-skeleton $S$ of $x$. To $\bar F$ we associate a set of 'suitable partners,' namely a set of equivalence classes of possible fillers of the (same) $r$-skeleton of $\phi x$. This is done by means of a map
$$\mathscr{S}: \bar{\mathscr{F}}(S) \to 2^{\bar{\mathscr{G}}(S)},$$
which is called a society if it also satisfies a measure-nondecreasing property which will eventually imply that $\phi$ is measure-preserving. At even stages we do the same, except that the roles of $\mathscr{B}(p)$ and $\mathscr{B}(q)$ are reversed. At each step the associations are consistent with ones previously established; this is made possible by forming the dual society when we pass from step $r$ to step $r + 1$. The set of suitable partners tends to decrease as we refine the societies by using the Marriage Lemma, until at some stage it shrinks to a singleton. In this section we discuss societies on arbitrary finite sets $U$ and $V$. For mnemonic purposes, although at the risk of arousing prurient interest, these can be thought of as a set of boys and a set of girls, respectively.

Definition 5.7 Let $(U, \rho)$ and $(V, \sigma)$ be probability spaces of finite cardinality. By a society $\mathscr{S}$ from $U$ to $V$, denoted $\mathscr{S}: U \rightsquigarrow V$, we mean a map $\mathscr{S}: U \to 2^V$ such that
$$\rho(B) \le \sigma(\mathscr{S}B) \quad \text{for all } B \subset U,$$
where $\mathscr{S}B = \bigcup_{b \in B} \mathscr{S}b$.
A society $\mathscr{S}$ is thought of as an association to each boy $b$ of a set of girls $\mathscr{S}b$ whom he 'knows.' The association is such that the size of any set of boys is no larger than the size of the set of all the girls whom they know. In the situation of the classical Marriage Theorem (see 5.13 below), $\rho$ and $\sigma$ are just normalized counting measures. We say that a society $\mathscr{R}: U \rightsquigarrow V$ is a refinement of a society $\mathscr{S}: U \rightsquigarrow V$, and write $\mathscr{R} < \mathscr{S}$, in case $\mathscr{R}(b) \subset \mathscr{S}(b)$ for all $b \in U$. The promiscuity number $\pi(\mathscr{S})$ of a society $\mathscr{S}$ is defined to be the number of girls who know more than one boy:
$$\pi(\mathscr{S}) = \operatorname{card}\{g \in V : \text{there are } b_1 \ne b_2 \text{ in } U \text{ with } g \in \mathscr{S}(b_1) \cap \mathscr{S}(b_2)\}.$$
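For very small $U$ and $V$ these definitions can be checked by brute force. The Python sketch below (helper names ours) represents a society as a dictionary sending each boy to the set of girls he knows, with $\rho$ and $\sigma$ given as dictionaries of exact fractions. The function `is_society` tests $\rho(B) \le \sigma(\mathscr{S}B)$ over all nonempty $B \subset U$. This takes time exponential in $\operatorname{card} U$, so it is meant only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def image(S, boys):
    """S B: the set of all girls known by some boy in B."""
    girls = set()
    for b in boys:
        girls |= S[b]
    return girls

def is_society(S, rho, sigma):
    """Check rho(B) <= sigma(S B) for every nonempty B subset of U."""
    U = list(rho)
    for k in range(1, len(U) + 1):
        for B in combinations(U, k):
            if sum(rho[b] for b in B) > sum(sigma[g] for g in image(S, B)):
                return False
    return True

def is_refinement(R, S):
    """R < S: under R each boy knows a subset of the girls he knows under S."""
    return all(R[b] <= S[b] for b in S)

def promiscuity(S):
    """pi(S): the number of girls known by more than one boy."""
    count = {}
    for girls in S.values():
        for g in girls:
            count[g] = count.get(g, 0) + 1
    return sum(1 for c in count.values() if c > 1)

third = Fraction(1, 3)
rho = {"b1": third, "b2": third, "b3": third}
sigma = {"g1": third, "g2": third, "g3": third}
S = {b: {"g1", "g2", "g3"} for b in rho}          # everybody knows everybody
R = {"b1": {"g1"}, "b2": {"g2"}, "b3": {"g3"}}    # a perfect matching
bad = {"b1": {"g1"}, "b2": {"g1"}, "b3": {"g3"}}  # b1 and b2 share one girl
print(is_society(S, rho, sigma), promiscuity(S))  # True 3
print(is_society(R, rho, sigma), is_refinement(R, S), promiscuity(R))  # True True 0
print(is_society(bad, rho, sigma))                # False
```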
The dual of a society $\mathscr{S}: U \rightsquigarrow V$ is the map $\mathscr{S}^*: V \to 2^U$ defined by
$$\mathscr{S}^*g = \{b \in U : g \in \mathscr{S}(b)\}.$$

Proposition 5.8 $\mathscr{S}^*$ is a society from $V$ to $U$.

Proof: Let $G \subset V$. The boys in $(\mathscr{S}^*G)^c$ know no girls in $G$, so
$$\mathscr{S}[(\mathscr{S}^*G)^c] \subset G^c.$$
Thus
$$\sigma(G) = 1 - \sigma(G^c) \le 1 - \sigma(\mathscr{S}[(\mathscr{S}^*G)^c]) \le 1 - \rho[(\mathscr{S}^*G)^c] = \rho(\mathscr{S}^*G).$$
This easy, but mildly surprising, proposition establishes the symmetry between the roles of $U$ and $V$, and so exculpates the discussion of any suspicion of sexism.

Remark 5.9 One particularly nice way to generate a society is from a joining of the probability measures $\rho$ and $\sigma$, that is, a probability measure $\lambda$ on $U \times V$ which projects to $\rho$ and $\sigma$:
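$$(1)\qquad \sum_{b \in U} \lambda(b, g) = \sigma(g), \qquad \sum_{g \in V} \lambda(b, g) = \rho(b) \qquad \text{for all } b, g.$$
We then define $\mathscr{S} = \mathscr{S}_\lambda: U \rightsquigarrow V$ by $g \in \mathscr{S}b$ if and only if $\lambda(b, g) > 0$. Then
$$\sigma(\mathscr{S}B) = \sum_{g \in \mathscr{S}B} \sigma(g) = \sum_{g \in \mathscr{S}B}\sum_{b \in U} \lambda(b, g) \ge \sum_{b \in B}\sum_{g \in \mathscr{S}b} \lambda(b, g) = \sum_{b \in B}\sum_{g \in V} \lambda(b, g) = \sum_{b \in B} \rho(b) = \rho(B),$$
so that $\mathscr{S}$ really is a society. Not all societies arise in this way, even if we allow $\ge$ in one of the equations in (1). However, every society has a refinement which does. To prove this we need one preliminary result.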
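Before that result, the construction of Remark 5.9 and the dual of Proposition 5.8 can be tried out on a small example. In the Python sketch below (helper names ours, joining chosen arbitrarily), a joining $\lambda$ is a dictionary on pairs $(b, g)$. The sketch reads off its marginals, forms the society $\mathscr{S}_\lambda$, and checks by brute force that both $\mathscr{S}_\lambda$ and its dual are societies.

```python
from fractions import Fraction
from itertools import combinations

def is_society(S, rho, sigma):
    """Brute-force check of rho(B) <= sigma(S B) for all nonempty B."""
    U = list(rho)
    for k in range(1, len(U) + 1):
        for B in combinations(U, k):
            known = set().union(*(S[b] for b in B))
            if sum(rho[b] for b in B) > sum(sigma[g] for g in known):
                return False
    return True

def marginals(lam):
    """Projections of a measure lam on U x V to U and to V."""
    rho, sigma = {}, {}
    for (b, g), w in lam.items():
        rho[b] = rho.get(b, 0) + w
        sigma[g] = sigma.get(g, 0) + w
    return rho, sigma

def society_from_joining(lam):
    """S_lambda b = {g : lambda(b, g) > 0}."""
    S = {}
    for (b, g), w in lam.items():
        S.setdefault(b, set())
        if w > 0:
            S[b].add(g)
    return S

def dual(S, girls):
    """S* g = {b : g in S b}."""
    return {g: {b for b, known in S.items() if g in known} for g in girls}

quarter = Fraction(1, 4)
lam = {(0, "x"): quarter, (0, "y"): quarter, (1, "y"): quarter,
       (1, "z"): quarter, (0, "z"): 0, (1, "x"): 0}
rho, sigma = marginals(lam)
S = society_from_joining(lam)
S_star = dual(S, sigma)
print(S)   # boy 0 knows x and y; boy 1 knows y and z
print(is_society(S, rho, sigma), is_society(S_star, sigma, rho))   # True True
```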
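Lemma 5.10 Given a society $\mathscr{S}: U \rightsquigarrow V$, there is a society $\mathscr{R} < \mathscr{S}$ such that $\operatorname{card}(\mathscr{R}b_1 \cap \mathscr{R}b_2) \le 1$ whenever $b_1 \ne b_2$.

Proof: Suppose that there are $b_1 \ne b_2$ in $U$ and $g_1 \ne g_2$ in $V$ such that $\{g_1, g_2\} \subset \mathscr{S}b_1 \cap \mathscr{S}b_2$. Define
$$\mathscr{R}_ib = \begin{cases}\mathscr{S}b & \text{if } b \ne b_i,\\ \mathscr{S}b \setminus \{g_i\} & \text{if } b = b_i,\end{cases} \qquad \text{for } i = 1, 2.$$
We claim that at least one of $\mathscr{R}_1$, $\mathscr{R}_2$ is a society. The required $\mathscr{R} < \mathscr{S}$ can then be produced by repeating this construction as many times as necessary. Suppose that $\mathscr{R}_1$ is not a society. Then there is $B_0 \subset U$ with $\rho(B_0) > \sigma(\mathscr{R}_1B_0)$. Notice that if $b_1 \notin B_0$, then clearly $\mathscr{R}_1B_0 = \mathscr{S}B_0$. Moreover, because $g_1 \in \mathscr{S}b_2 = \mathscr{R}_1b_2$, $b_2 \in B_0$ also implies that $\mathscr{R}_1B_0 = \mathscr{S}B_0$. We must have, therefore, $b_1 \in B_0$ and $b_2 \notin B_0$. Let $B \subset U$ be given; we will verify that $\rho(B) \le \sigma(\mathscr{R}_2B)$, so that $\mathscr{R}_2$ is a society. If $b_2 \notin B$ or $b_1 \in B$, then by the preceding argument $\mathscr{R}_2B = \mathscr{S}B$, and we are done. Assume then that $b_2 \in B$ and $b_1 \notin B$. Then $b_1 \in B_0$ and $b_2 \in B \setminus B_0$ imply that $\mathscr{S}(B \cup B_0) = \mathscr{R}_1B_0 \cup \mathscr{R}_2(B \setminus B_0)$, and we may calculate that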
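$$\rho(B_0) + \rho(B \setminus B_0) \le \sigma(\mathscr{S}(B \cup B_0)) = \sigma\big(\mathscr{R}_1B_0 \cup [\mathscr{R}_2(B \setminus B_0) \setminus \mathscr{R}_1(B \cap B_0)]\big) \le \sigma(\mathscr{R}_1B_0) + \sigma\big(\mathscr{R}_2(B \setminus B_0) \setminus \mathscr{R}_1(B \cap B_0)\big),$$
and hence $\rho(B \setminus B_0) \le \sigma\big(\mathscr{R}_2(B \setminus B_0) \setminus \mathscr{R}_1(B \cap B_0)\big)$. Because $b_1, b_2 \notin B \cap B_0$, we have $\mathscr{R}_1(B \cap B_0) = \mathscr{R}_2(B \cap B_0) = \mathscr{S}(B \cap B_0)$, so that $\rho(B \cap B_0) \le \sigma(\mathscr{R}_1(B \cap B_0))$. Adding these inequalities gives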
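$$\rho(B) = \rho(B \setminus B_0) + \rho(B \cap B_0) \le \sigma(\mathscr{R}_1(B \cap B_0)) + \sigma\big(\mathscr{R}_2(B \setminus B_0) \setminus \mathscr{R}_1(B \cap B_0)\big) = \sigma\big(\mathscr{R}_1(B \cap B_0) \cup \mathscr{R}_2(B \setminus B_0)\big) = \sigma\big(\mathscr{R}_2(B \cap B_0) \cup \mathscr{R}_2(B \setminus B_0)\big) = \sigma(\mathscr{R}_2B).$$

Proposition 5.11 Every society has a refinement which is generated by a joining.

Proof: Let $\mathscr{S}: U \rightsquigarrow V$ be a society and fix $n = 1, 2, \dots$. Split each girl $g \in V$ into $n$ equal-weight subgirls $g_j$, $j = 1, 2, \dots, n$, to form a new set $V_n$ and society $\mathscr{S}_n: U \rightsquigarrow V_n$. More precisely, we may define $V_n = V \times \{1, 2, \dots, n\}$, $\sigma_n(g, j) = \sigma(g)/n$ for all $j$, and $\mathscr{S}_nb = \mathscr{S}b \times \{1, 2, \dots, n\}$ for all $b \in U$. By the Lemma, there is a society $\mathscr{R}_n < \mathscr{S}_n$ such that $\operatorname{card}(\mathscr{R}_nb_1 \cap \mathscr{R}_nb_2) \le 1$ whenever $b_1 \ne b_2$. We define
$$\lambda_n(b, g) = \sigma_n\big(\mathscr{R}_nb \cap (\{g\} \times \{1, 2, \dots, n\})\big)$$
(the proportion of $\mathscr{R}_nb$ that falls 'within' the girl $g$). Then
$$(2)\qquad \sum_{g \in V} \lambda_n(b, g) = \sum_{g \in V} \sigma_n\big(\mathscr{R}_nb \cap (\{g\} \times \{1, 2, \dots, n\})\big) = \sigma_n(\mathscr{R}_nb) \ge \rho(b) \quad \text{for all } b \in U.$$
Moreover, the sets $\mathscr{R}_nb \cap (\{g\} \times \{1, 2, \dots, n\})$, $b \in U$, cover $\mathscr{R}_nU \cap (\{g\} \times \{1, 2, \dots, n\})$. They can be made disjoint by discarding at most one element of the second, two of the third, etc. Hence
$$(3)\qquad \sum_{b \in U} \lambda_n(b, g) \le \sigma_n\big(\mathscr{R}_nU \cap (\{g\} \times \{1, 2, \dots, n\})\big) + \frac{(\operatorname{card} U - 1)(\operatorname{card} U)\,\sigma(g)}{2n}.$$
Choose $\lambda_n'(b, g) \le \lambda_n(b, g)$ for all $(b, g)$ so that equality holds in (2); this does not affect (3). Notice that $\lambda_n'(b, g) > 0$ only if $g \in \mathscr{S}b$. Choose $(\lambda(b, g))$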
to be any cluster point of the sequence of matrices $(\lambda_n'(b, g))$. Then clearly $\lambda$ is a joining of $\rho$ and $\sigma$ which generates a society that refines $\mathscr{S}$.

Lemma 5.12 Marriage Lemma Given a society $\mathscr{S}: U \rightsquigarrow V$, there is a society $\mathscr{R} < \mathscr{S}$ such that $\pi(\mathscr{R}) < \operatorname{card} U$.

Proof: Choose $\mathscr{R}$ to be a minimal refinement of $\mathscr{S}$. Then $\mathscr{R}$ is given by a joining $\lambda$ of $\rho$ and $\sigma$. By a cycle we will mean a sequence $b_1, \dots, b_k$ of $k \ge 2$ different boys for which there are different girls $g_1, \dots, g_k$ such that
$$g_i \in \mathscr{R}(b_i) \cap \mathscr{R}(b_{i+1}) \quad \text{for all } i,$$
where $b_{k+1} = b_1$. We will show that this minimal $\mathscr{R}$ has no cycles, and then the estimation of $\pi(\mathscr{R})$ will be easy. If there is a cycle as above, let $m = \min\{\lambda(b_j, g_j) : j = 1, \dots, k\}$ and define
$$\lambda'(b, g) = \begin{cases}\lambda(b_i, g_i) - m & \text{if } b = b_i \text{ and } g = g_i,\\ \lambda(b_{i+1}, g_i) + m & \text{if } b = b_{i+1} \text{ and } g = g_i,\\ \lambda(b, g) & \text{otherwise.}\end{cases}$$
It is clear that $\lambda'$ is again a probability measure with marginals $\rho$ and $\sigma$. Because the support of $\lambda'$ is a proper subset of the support of $\lambda$, the society $\mathscr{R}'$ generated by $\lambda'$ is a strict refinement of $\mathscr{R}$. Since this is impossible, there cannot be any cycles. Now make a graph whose vertices are all the elements of $U$. Each girl in $V$ who knows more than one boy determines just one edge of the graph, connecting any one pair of boys whom she knows. By the foregoing argument, this graph has no cycles, and hence each of its components is a tree. Every tree has fewer edges than vertices, and hence so does the entire graph. This says exactly that $\pi(\mathscr{R}) < \operatorname{card} U$.

It is curious that the hypothesis of this Lemma involves the measures $\rho$ and $\sigma$, but its conclusion refers only to cardinalities. Perhaps one can best understand this Lemma by observing that it is a kind of strengthening of the usual Marriage Lemma.
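The cycle-removal step in the proof of the Marriage Lemma can be iterated mechanically. The Python sketch below is our own constructive rendering of that argument, not a literal search for a minimal refinement:

- start from any joining $\lambda$;
- find a cycle $b_1, g_1, \dots, b_k, g_k$ in the support of $\lambda$;
- shift mass $m$ around it exactly as in the formula for $\lambda'$;
- repeat until the support has no cycles.

Exact fractions keep the marginals exactly preserved. The example joining (the uniform product joining on three boys and three girls) is an arbitrary choice.

```python
from fractions import Fraction

def find_cycle(edges):
    """Return a cycle [v1, v2, ...] in the bipartite graph with the given
    (boy, girl) edges, vertices tagged ('b', boy) / ('g', girl); else None."""
    adj = {}
    for b, g in edges:
        adj.setdefault(("b", b), []).append(("g", g))
        adj.setdefault(("g", g), []).append(("b", b))
    on_stack, done, stack = set(), set(), []

    def dfs(v, parent):
        on_stack.add(v)
        stack.append(v)
        for w in adj[v]:
            if w == parent:
                continue
            if w in on_stack:                  # back edge closes a cycle
                return stack[stack.index(w):]
            if w not in done:
                cycle = dfs(w, v)
                if cycle:
                    return cycle
        on_stack.discard(v)
        stack.pop()
        done.add(v)
        return None

    for v in adj:
        if v not in done:
            cycle = dfs(v, None)
            if cycle:
                return cycle
    return None

def cancel_cycles(lam):
    """Shift mass around cycles, as in the proof, until the support of lam has none."""
    lam = {e: Fraction(w) for e, w in lam.items() if w > 0}
    while (cycle := find_cycle(list(lam))) is not None:
        if cycle[0][0] == "g":                 # rotate so the cycle starts at a boy
            cycle = cycle[1:] + cycle[:1]
        k = len(cycle) // 2
        boys = [cycle[2 * i][1] for i in range(k)]
        girls = [cycle[2 * i + 1][1] for i in range(k)]
        m = min(lam[(boys[i], girls[i])] for i in range(k))
        for i in range(k):
            lam[(boys[i], girls[i])] -= m              # lambda(b_i, g_i) - m
            lam[(boys[(i + 1) % k], girls[i])] += m    # lambda(b_{i+1}, g_i) + m
        lam = {e: w for e, w in lam.items() if w > 0}
    return lam

def promiscuity_of_support(lam):
    """Number of girls g with lam(b, g) > 0 for at least two boys b."""
    count = {}
    for (b, g), w in lam.items():
        if w > 0:
            count[g] = count.get(g, 0) + 1
    return sum(1 for c in count.values() if c > 1)

def marginals(lam):
    rho, sigma = {}, {}
    for (b, g), w in lam.items():
        rho[b] = rho.get(b, 0) + w
        sigma[g] = sigma.get(g, 0) + w
    return rho, sigma

ninth = Fraction(1, 9)
lam = {(b, g): ninth for b in range(3) for g in "xyz"}   # product joining
new = cancel_cycles(lam)
print(marginals(new) == marginals(lam))      # True
print(find_cycle(list(new)) is None)         # True
print(promiscuity_of_support(new), "< 3")
```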
Lemma 5.13 Usual Marriage Lemma Suppose there is a set of $n$ boys, each of whom knows at least one of a set of $n$ girls, and that any set of $k$ boys knows at least $k$ girls, $k = 1, 2, \dots, n$. Then one can hold a mass wedding in which each boy marries a girl whom he knows.

Proof: When $\rho$ and $\sigma$ give equal weights $1/n$ to all the boys and girls, the map $\mathscr{S}$ which assigns to each boy all the girls whom he knows is a society. Find the $\mathscr{R} < \mathscr{S}$ with no cycles as above. Each girl knows at least one boy; remove from consideration those girls who know exactly one boy. The boys who know the remaining girls are the vertices of a graph; each of the remaining girls is placed down as an edge joining any pair of boys whom she knows. Let $B$ be the set of vertices of any component. Even though we are allowing repeats, still there are no cycles as defined above; this forces $\rho(B) > \sigma(\mathscr{R}B)$, violating the society condition. Thus the graph is empty, and under $\mathscr{R}$ each girl knows exactly one boy.

The following Proposition will be needed to construct the societies involved in the definition of the finitary isomorphism.
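Proposition 5.14 Let $\mathscr{S}_i: (U_i, \rho_i) \rightsquigarrow (V_i, \sigma_i)$ be societies for $i = 1, 2$. Then the product map $\mathscr{S}(b_1, b_2) = \mathscr{S}_1b_1 \times \mathscr{S}_2b_2$ is a society from $U_1 \times U_2$ to $V_1 \times V_2$.

Proof: Each $\mathscr{S}_i$ has a refinement $\mathscr{R}_i$ which is generated by a joining $\lambda_i$. Then $\lambda_1 \times \lambda_2$ is a joining of $\rho_1 \times \rho_2$ and $\sigma_1 \times \sigma_2$, and the society it generates is a refinement of $\mathscr{S}$. Therefore $\mathscr{S}$ is also a society.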
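Proposition 5.14 can be checked by brute force on a toy example. The Python sketch below (helper names ours, example societies arbitrary) forms the product society and the product measures. It then confirms the society inequality for all 15 nonempty sets of pairs of boys.

```python
from fractions import Fraction
from itertools import combinations

def is_society(S, rho, sigma):
    """Brute-force check of rho(B) <= sigma(S B) for all nonempty B."""
    U = list(rho)
    for k in range(1, len(U) + 1):
        for B in combinations(U, k):
            known = set().union(*(S[b] for b in B))
            if sum(rho[b] for b in B) > sum(sigma[g] for g in known):
                return False
    return True

def product_society(S1, S2):
    """S(b1, b2) = S1 b1 x S2 b2."""
    return {(b1, b2): {(g1, g2) for g1 in S1[b1] for g2 in S2[b2]}
            for b1 in S1 for b2 in S2}

def product_measure(m1, m2):
    return {(u, v): m1[u] * m2[v] for u in m1 for v in m2}

half = Fraction(1, 2)
rho1, sigma1 = {"a": half, "b": half}, {"x": half, "y": half}
S1 = {"a": {"x"}, "b": {"x", "y"}}
rho2, sigma2 = {"c": half, "d": half}, {"z": half, "w": half}
S2 = {"c": {"z"}, "d": {"w"}}
assert is_society(S1, rho1, sigma1) and is_society(S2, rho2, sigma2)
S = product_society(S1, S2)
print(is_society(S, product_measure(rho1, rho2), product_measure(sigma1, sigma2)))  # True
```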
F. Construction of the isomorphism

A point $z \in \mathscr{B}(p_0, 1 - p_0)$ in the marker process determines the $r$-skeletons $S_r$ of all points in the fibers over $z$ of both $\mathscr{B}(p)$ and $\mathscr{B}(q)$. By induction on $r$ we will define, for all $r$-skeletons $S$, societies $\mathscr{R}_S: \bar{\mathscr{F}}(S) \rightsquigarrow \bar{\mathscr{G}}(S)$ if $r$ is odd and $\mathscr{R}_S: \bar{\mathscr{G}}(S) \rightsquigarrow \bar{\mathscr{F}}(S)$ if $r$ is even, independently of the $z$ in which $S$ appears.
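Let $r = 1$. For each 1-skeleton $S$, $\mathscr{S}_S$ is defined to be the trivial society in which each boy knows all the girls:
$$\mathscr{S}_S\bar F = \bar{\mathscr{G}}(S) \quad \text{for all } \bar F \in \bar{\mathscr{F}}(S).$$
(All $l(S)$-blocks in the filler process of $\mathscr{B}(q)$ are candidates for filling in the blanks in $\phi x$.) Using the Marriage Lemma, choose $\mathscr{R}_S < \mathscr{S}_S$ to be a society with $\pi(\mathscr{R}_S) < \operatorname{card} \bar{\mathscr{F}}(S)$. Suppose now that $S$ is a skeleton of even rank $r$, and that the $\mathscr{R}_S$ have been defined for all ranks $< r$. Form the rank decomposition $S = S_1 \times \cdots \times S_j$ of $S$ into skeletons of rank $r - 1$. Look at the known societies $\mathscr{R}_{S_i}: \bar{\mathscr{F}}(S_i) \rightsquigarrow \bar{\mathscr{G}}(S_i)$, and form their duals and the product society (cf. Proposition 5.14)
$$\mathscr{S}_S = \mathscr{R}_{S_1}^* \times \cdots \times \mathscr{R}_{S_j}^*: \tilde{\mathscr{G}}(S) = \bar{\mathscr{G}}(S_1) \times \cdots \times \bar{\mathscr{G}}(S_j) \rightsquigarrow \tilde{\mathscr{F}}(S) = \bar{\mathscr{F}}(S_1) \times \cdots \times \bar{\mathscr{F}}(S_j).$$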
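Recall that each $F \in \mathcal{F}(S)$ is determined by fixing the entries for a certain set of places $J(F)$ among the blanks in $S$. The elements of
$$\hat{\mathcal{F}}(S) = \mathcal{F}(S_1) \times \cdots \times \mathcal{F}(S_t)$$
are equivalence classes determined by fixing certain places in a (usually) proper subset of these. Because this is a coarser equivalence relation, each element of $\hat{\mathcal{F}}(S)$ is a disjoint union of equivalence classes in $\mathcal{F}(S)$, and so we may regard $\mathcal{S}_S$ as a society from $\mathcal{G}(S_1) \times \cdots \times \mathcal{G}(S_t)$ to $\mathcal{F}(S)$. (A subset of $\hat{\mathcal{F}}(S)$ naturally determines one of $\mathcal{F}(S)$.) Choose $\mathcal{M}_S < \mathcal{S}_S$ by the Marriage Lemma with $n(\mathcal{M}_S) < \operatorname{card} \mathcal{G}(S)$. Consider $\mathcal{R}_S : \mathcal{G}(S) \to \mathcal{F}(S)$ as a society from $\mathcal{G}(S)$ to $\mathcal{F}(S)$ by putting
$$\mathcal{R}_S(G) = \mathcal{M}_S(\hat{G}) \quad \text{for each filler } G \in \mathcal{G}(S),$$
where $\hat{G}$ is the element of $\mathcal{G}(S_1) \times \cdots \times \mathcal{G}(S_t)$ containing $G$; this is clearly a well-defined map. $\mathcal{R}_S$ is a society because, for every set $A$ of fillers, with $\hat{A} = \{\hat{G} : G \in A\}$, we have $\sigma(\mathcal{R}_S A) = \sigma(\mathcal{M}_S \hat{A}) \ge \mu(\hat{A}) \ge \mu(A)$ ($\mu$ and $\sigma$ are appropriate powers of $\nu_0$ and $\mu_0$). If $S$ is a skeleton of odd rank $r$, we perform a similar construction except in the opposite direction, ending up with $\mathcal{R}_S : \mathcal{F}(S) \to \mathcal{G}(S)$.
Now we are prepared to define the isomorphisms $\Phi : \mathcal{B}(p) \to \mathcal{B}(q)$ and $\Psi : \mathcal{B}(q) \to \mathcal{B}(p)$. For each $x \in \mathcal{B}(p)$, let $i_r(x)$ denote the index in $\{1, 2, \ldots, l(S_r(x))\}$ of the 0th coordinate of $x$. Recall that $F_r(x)$ denotes the $l(S_r(x))$-block that occupies in $x$ the places occupied by blanks in $S_r(x)$.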
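The rank-1 step above needs a refinement $\mathcal{R}_S < \mathcal{S}_S$ of the trivial society in which fewer than $\operatorname{card} \mathcal{F}(S)$ girls know more than one boy (this is how $n(\cdot)$ is used in the proof of the Assignment Lemma). One concrete way to obtain such a refinement, offered as an illustration and not as the construction behind the Marriage Lemma, is the northwest-corner rule from transportation problems: it builds a joining $\lambda$ of $\mu$ and $\sigma$ whose support is a staircase with at most $\operatorname{card} U + \operatorname{card} V - 1$ cells and no cycles, so at most $\operatorname{card} U - 1$ girls are known by two or more boys. The support of any joining generates a society, since $\sigma(\mathcal{M}A) \ge \lambda(A \times \mathcal{M}A) = \mu(A)$. Boys and girls are indexed $0, 1, \ldots$ here, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def northwest_corner(mu, sigma):
    """Joining of two probability vectors by the northwest-corner rule.

    Returns {(boy, girl): mass}, keeping only positive masses.
    """
    mu, sigma = list(mu), list(sigma)  # work on copies
    i = j = 0
    lam = {}
    while i < len(mu) and j < len(sigma):
        mass = min(mu[i], sigma[j])
        if mass > 0:
            lam[(i, j)] = mass
        mu[i] -= mass
        sigma[j] -= mass
        if mu[i] == 0:
            i += 1  # boy i has no mass left
        else:
            j += 1  # girl j has no mass left
    return lam


def refinement(lam, n_boys):
    """M b = set of girls g with lam(b, g) > 0."""
    M = {b: set() for b in range(n_boys)}
    for (b, g) in lam:
        M[b].add(g)
    return M


def multiply_known_girls(M):
    """Number of girls known (under M) by more than one boy."""
    count = {}
    for girls in M.values():
        for g in girls:
            count[g] = count.get(g, 0) + 1
    return sum(1 for c in count.values() if c > 1)


def is_society(M, mu, sigma):
    """Brute-force check that sigma(M A) >= mu(A) for every set A of boys."""
    boys = list(M)
    for k in range(len(boys) + 1):
        for A in combinations(boys, k):
            girls = set().union(*(M[b] for b in A))
            if sum((sigma[g] for g in girls), Fraction(0)) < sum((mu[b] for b in A), Fraction(0)):
                return False
    return True


mu = [Fraction(1, 3)] * 3
sigma = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
lam = northwest_corner(mu, sigma)
M = refinement(lam, len(mu))
```

For these weights the support is $\{(0,0), (1,0), (1,1), (2,1), (2,2)\}$: girls 0 and 1 are each known by two boys, and $2 < 3$ boys, as the count predicts.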
6. More about entropy
Lemma 5.15 (Assignment Lemma)
For almost all $x \in \mathcal{B}(p)$ with $x_0 \ne 0$ there is an even $r = r(x)$ such that
(1) with regard to the society $\mathcal{R}_{S_r(x)} : \mathcal{G}(S_r(x)) \to \mathcal{F}(S_r(x))$, $\mathcal{R}^*_{S_r(x)} F_r(x)$ is a singleton, $G_r(x)$;
(2) $i_r(x) \in J_0(G_r(x))$ (the union of the places fixed in the skeletons of the rank decomposition of $S_r$).
Before proving this Lemma, let us show how it allows us to define the isomorphism $\Phi$. For almost all $x \in \mathcal{B}(p)$ choose such an even $r$, and define
$$(\Phi x)_0 = \begin{cases} 0 & \text{if } x_0 = 0, \\ \text{the } i_r(x)\text{th entry in } G_r(x) & \text{if } x_0 \ne 0. \end{cases}$$
Because subsequent societies arise from duals and refinements of previous ones, $(\Phi x)_0$ is independent of $r$. In words, the map is defined by looking at each stage at the set of girls determined by $x$, waiting until there is only one suitable boy, and then reading off the 0th coordinate of this boy (which all boys from then on will have in common). The map $\Phi$ is actually defined on entire orbits by
$$(\Phi x)_j = \begin{cases} 0 & \text{if } x_j = 0, \\ \text{the } i_r(\sigma^j x)\text{th entry in } G_r(\sigma^j x) & \text{if } x_j \ne 0, \end{cases}$$
and it clearly commutes with the shift. Actually, at any stage $r$ we can fill in all the places in … Using odd $r$'s, we obtain a map $\Psi : \mathcal{B}(q) \to \mathcal{B}(p)$ which must be the inverse of $\Phi$ (wherever defined), since the societies involved at each stage are duals of one another: $\Psi$ has to assign to $\Phi(x)$ (equivalence classes of) fillers with the same 0 entry as $x$.
To show that $\Phi$ is measure-preserving, we first need to disintegrate the measure $\nu$ on $\mathcal{B}(q)$ with respect to the factor map $\mathcal{B}(q) \to \mathcal{B}(p_0, 1 - p_0)$. In the current situation this is relatively easy, because we are dealing with essentially a product space. If $m$ is the probability measure on $\mathcal{B}(p_0, 1 - p_0)$ and $C \subset \mathcal{B}(q)$ is a cylinder set, then it is easy to check that
$$\nu(C) = \int_{\mathcal{B}(p_0, 1 - p_0)} \nu_0(C_z) \, dm(z),$$
where $C_z = \{ g \in B_0^z : (z, g) \in C \}$ is the section of $C$ over $z$. ($C_z$ may change with $z$, but $\nu_0(C_z)$, when not 0, is constant.) A similar analysis applies to $\mathcal{B}(p)$, $\mu$, and $\mu_0$. Fix a cylinder set $C \subset \mathcal{B}(q)$; we will show that $\mu(\Phi^{-1} C) \ge \nu(C)$. This will imply that $\mu \Phi^{-1} \ge \nu$ and, by considering complements, that $\mu \Phi^{-1} = \nu$.
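The disintegration formula says that a cylinder's $\nu$-measure is obtained by first choosing the marker pattern $z$ (which places carry 0) and then the filler. On a finite window the integral becomes a finite sum, and the identity can be checked exactly. The sketch below assumes the marker probability is $q_0$ (that is, $q_0 = p_0$, as a common marker process requires), represents $C$ as a finite set of words on $n$ coordinates, and gives each nonzero symbol $a$ the filler weight $q_a/(1 - q_0)$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def cylinder_measure(C, q):
    """nu(C) for a set C of words on a window: sum of Bernoulli product weights."""
    total = Fraction(0)
    for w in C:
        weight = Fraction(1)
        for a in w:
            weight *= q[a]
        total += weight
    return total


def disintegrated_measure(C, q, n):
    """Sum over marker patterns z of m(z) * nu_0(C_z).

    z_i = 1 marks a nonzero place; m is Bernoulli(q[0], 1 - q[0]) on markers,
    and nu_0 gives each nonzero symbol a the weight q[a] / (1 - q[0]).
    """
    q0 = q[0]
    total = Fraction(0)
    for z in product((0, 1), repeat=n):
        m_z = Fraction(1)
        for bit in z:
            m_z *= (1 - q0) if bit else q0
        # nu_0 of the section C_z: the words of C whose zero/nonzero pattern is z
        nu0 = Fraction(0)
        for w in C:
            if all((a != 0) == (bit == 1) for a, bit in zip(w, z)):
                weight = Fraction(1)
                for a in w:
                    if a != 0:
                        weight *= q[a] / (1 - q0)
                nu0 += weight
        total += m_z * nu0
    return total


q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
C = [w for w in product(range(4), repeat=3) if w[0] != 0 and w[2] == 1]
```

For this $C$ (nonzero at place 0, symbol 1 at place 2), both sides equal $\tfrac12 \cdot \tfrac14 = \tfrac18$.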
For a fixed $z \in \mathcal{B}(p_0, 1 - p_0)$, the section $C_z$ is a cylinder set in $B_0^z$. Fix an $r$ large enough that the $r$-skeleton $S_r(z)$ of all $x$ and $y$ in the fibers over $z$ includes all the places which determine $C$. As in the second part of the proof of the Assignment Lemma below (denoting by $G_r(z, g)$ the filler of $S_r(z)$ in $(z, g) \in \mathcal{B}(q)$), unless $g$ is in a subset $U_r$ of $B_0^z$ having measure less than $\epsilon_r$, where $\epsilon_r \to 0$, the set of places determining $C_z$ is contained in the fixed set $J_0(G_r(z, g))$. (This will allow us to tell whether or not $(z, g) \in C$ by considering its filler equivalence class in $\mathcal{G}(S_r(z))$.) Also, as in the first part of the proof of the Assignment Lemma, if $f \in A_0^z$ avoids a certain set $E_r$ of measure less than $\epsilon_r$, then under the society $\mathcal{R}_{S_r(z)} : \mathcal{G}(S_r) \to \mathcal{F}(S_r)$, $\mathcal{R}^*_{S_r(z)}(f)$ is a singleton. (And hence we can identify $\Phi(z, f)$ on the fixed set of this singleton equivalence class.) The set $C_z$ determines a set $G$ of fillers for $S_r(z)$. Let $\tilde{C}$ denote the saturation of $G$ (i.e. the set of equivalence classes of members of $G$). As long as we avoid $U_r$ and $E_r$, each filler $f \in A_0^z$ which is in $\mathcal{M}_{S_r} \tilde{C}$ has an equivalence class in $\mathcal{F}(S_r)$ which 'knows' a single equivalence class in $\mathcal{G}(S_r)$, and the fixed set of this last equivalence class contains all the places which $C_z$ fixes. Since all such $f$ are assigned by $\Phi$ to something in $C_z$ (we are regarding $\Phi$ also as a map of the filler Bernoulli shifts, which is permissible because $\Phi$ respects fibers), we have
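$$\mu_0(\Phi^{-1} C_z) \ge \mu_0(\mathcal{M}_{S_r} \tilde{C}) - 2\epsilon_r = \sigma(\mathcal{M}_{S_r} \tilde{C}) - 2\epsilon_r \ge \mu(\tilde{C}) - 2\epsilon_r \ge \nu_0(G) - 2\epsilon_r = \nu_0(C_z) - 2\epsilon_r.$$
Applying the disintegration formula gives $\mu(\Phi^{-1} C) \ge \nu(C) - 2\epsilon_r$, and letting $r \to \infty$ concludes the argument.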
Proof of the Assignment Lemma: Fix $z \in \mathcal{B}(p_0, 1 - p_0)$, thereby determining the skeletons $S_r$ for all $r$. By the Skeleton Lemma (5.3(3)), $l_r = l(S_r) \ge r$ for sufficiently large $r$. For even $r$, let us estimate the measure of the set of complete fillers
$$E_r = \{ f \in A_0^z : \text{with regard to } \mathcal{R}_{S_r} : \mathcal{G}(S_r) \to \mathcal{F}(S_r),\ \operatorname{card} \mathcal{R}^*_{S_r}(F_r(z, f)) > 1 \}.$$
(We are still identifying the pairs $(z, f)$ with points of $\mathcal{B}(p)$.) The cardinality of the set of $F$ for which $\operatorname{card} \mathcal{R}^*_{S_r} F > 1$ is less than $\operatorname{card} \mathcal{G}(S_r)$, by the Marriage Lemma (5.12). And by the Filler Lemma (5.6(2a)), most $F$ have measure no more than $2^{-l_r(h - \epsilon_r)}$. Therefore
$$\mu_0(E_r) \le 2^{-l_r(h - \epsilon_r)} \operatorname{card} \mathcal{G}(S_r) + \epsilon_r.$$
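Now look at the rank decomposition $S_r = S_1 \times \cdots \times S_t$. For each $G \in \mathcal{G}(S_i)$, by (1) of the Filler Lemma,
$$\nu_0(G) \ge 2^{-l(S_i)(h + \epsilon_{r-1})},$$
so that
$$\operatorname{card} \mathcal{G}(S_i) \le 2^{l(S_i)(h + \epsilon_{r-1})} \quad \text{and} \quad \operatorname{card} \mathcal{G}(S_r) = \prod_{i=1}^{t} \operatorname{card} \mathcal{G}(S_i) \le 2^{(h + \epsilon_{r-1}) \sum_i l(S_i)}.$$
Thus, for sufficiently large $r$ (depending on $z$),
$$\mu_0(E_r) \le 2^{-l_r(h - \epsilon_r) + (h + \epsilon_{r-1}) \sum_i l(S_i)} + \epsilon_r \to 0.$$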
The sets $E_r$ decrease as $r$ increases; therefore
$$\mu\{x : x \in E_r \text{ for all even } r\} = \mu\Big(\bigcap_{k=1}^{\infty} E_{2k}\Big) = \lim_{k \to \infty} \mu(E_{2k}) = 0,$$
proving (1). To prove (2), refer to (2b) of the Filler Lemma (5.6). Again fix a point in the marker process, $z \in \mathcal{B}(p_0, 1 - p_0)$, and consider all the points $x = (z, f)$ ($f \in A_0^z$) in the fiber of $\mathcal{B}(p)$ over $z$. This set breaks up into $l_r$ sets of equal measure according to where the 0 coordinate is located, since the shift maps any one such set to another. Thus, again using the rank decomposition of $S_r$,
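$$\mu_0\{ f \in A_0^z : i_r(x) \notin J_0(G_r(x)) \} \le 1 - \frac{\operatorname{card} J_0(G_r(x))}{l_r},$$
and (2b) of the Filler Lemma, applied to the skeletons $S_1, \ldots, S_t$ of the rank decomposition, bounds the right-hand side by a quantity that tends to 0 as $r \to \infty$. Hence (2) follows by the same argument used for (1). The sequence $\epsilon_r$ tending to 0 has remained completely arbitrary.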
Exercises
1. Prove the equivalence of Conditions 5.1.
2. Show that $\mathcal{B}(p)$ and $\mathcal{B}(q)$ are metrically isomorphic under a homeomorphism if and only if $p$ is a permutation of $q$.
3. Show that not every society arises from a joining, even if $\ge$ is allowed in one of the equations (1) on p. 293.
4. Extend Proposition 5.11 to societies on arbitrary probability spaces.
5. Show that if $a = 2$ and $b \ge 3$, there are a probability vector $r$ with at least 3 entries and $H(r) = h$, and a $k \ge 1$, such that $p_0^k p_1 = r_0^k r_1$. (Hint: choose $r_0 > \max\{p_0, p_1\}$ and let $r_1 = (p_0/r_0)^k p_1$, with $k$ large enough that $H(r_0, r_1, 1 - (r_0 + r_1)) < h$.)
6. Prove that $\mathcal{B}(p) \approx \mathcal{B}(q)$ if $a = 2$ and $b \ge 3$. (Hint: It is enough to consider the case $p_0^k p_1 = q_0^k q_1$. Replace 0 in the definition of skeleton by $0^k 1$. What other modifications will allow the proof to carry through?)
7. Let $(U, \mu)$ and $(V, \sigma)$ be probability spaces and $q$ a measure on $U \times V$. According to a theorem of Strassen (1965) (see Jacobs 1978) there is a probability measure $\lambda \le q$ on $U \times V$ with marginals $\mu$ and $\sigma$ if and only if
$$q(E \times F) + 1 \ge \mu(E) + \sigma(F)$$
for all measurable $E \subset U$, $F \subset V$. Use this to prove Proposition 5.11. (Hint: let $q(b, g) = \operatorname{card}(\mathcal{S}b \cap \{g\})$.)
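The hint to Exercise 5 can be followed numerically. The sketch below assumes that $h = H(p_0, p_1)$ is the entropy of the two-symbol shift, uses base-2 logarithms (the convention of the entropy estimates above), and returns the least $k$ that the hint asks for. The search terminates because $r_0 > \max\{p_0, p_1\} \ge \tfrac12$ forces $H(r_0, 1 - r_0) < h$, and $H(r_0, r_1, 1 - r_0 - r_1) \to H(r_0, 1 - r_0)$ as $k \to \infty$. The remaining step of the exercise, raising the entropy to exactly $h$, is not attempted. The name `hint_vector` is hypothetical.

```python
import math


def entropy(vec):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    return -sum(x * math.log2(x) for x in vec if x > 0)


def hint_vector(p0, p1, r0):
    """Follow the hint to Exercise 5 for p = (p0, p1) and a chosen r0.

    Returns the least k >= 1 for which r1 = (p0 / r0)**k * p1 leaves a positive
    remainder 1 - r0 - r1 and H(r0, r1, 1 - r0 - r1) < h = H(p0, p1).
    By construction r0**k * r1 == p0**k * p1 (up to floating-point rounding).
    """
    if not max(p0, p1) < r0 < 1:
        raise ValueError("need max(p0, p1) < r0 < 1")
    h = entropy((p0, p1))
    k = 1
    while True:
        r1 = (p0 / r0) ** k * p1
        rest = 1 - r0 - r1
        if rest > 0 and entropy((r0, r1, rest)) < h:
            return k, (r0, r1, rest)
        k += 1


k, r = hint_vector(0.3, 0.7, 0.8)
```

Exact rationals are not needed here, since only strict inequalities between entropies are tested.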
References
The following list, while not a complete set of references for ergodic theory, does include all the articles and books mentioned in the text. Asterisks indicate books and survey articles, several of which contain extensive bibliographies.
ABRAMOV, L. M. (1959) Entropy of induced automorphisms, Dokl. Akad. Nauk SSSR 128, 647-50.
(1962) Metric automorphisms with quasi-discrete spectrum, Izv. Akad. Nauk SSSR Ser. Mat. 26, 513-30. Amer. Math. Soc. Transl. 39 (1964), 37-56.
ABRAMOV, L. M. and ROKHLIN, V. A. (1962) The entropy of a skew product of measure-preserving transformations, Vestnik Leningrad Univ. 17, 5-13. Amer. Math. Soc. Transl. (Ser. 2) 48 (1965), 225-65.
ADLER, ROY L. (1963) A note on the entropy of skew product transformations, Proc. Amer. Math. Soc. 14, 665-669.
(1973) F-expansions revisited, Recent Advances in Topological Dynamics, Lecture Notes in Mathematics 318, Springer-Verlag, New York, 1-5.
ADLER, ROY L., KONHEIM, A. G., and McANDREW, M. H. (1965) Topological entropy, Trans. Amer. Math. Soc. 114, 309-19.
ADLER, ROY L. and MARCUS, BRIAN (1979) Topological Entropy and Equivalence of Dynamical Systems, Memoirs Amer. Math. Soc. 219.
ADLER, ROY L. and WEISS, BENJAMIN (1967) Entropy, a complete metric invariant for automorphisms of the torus, Proc. Nat. Acad. Sci. USA 57, 1573-6.
(1970) Similarity of Automorphisms of the Torus, Memoirs Amer. Math. Soc. 98.
AKCOGLU, M. A. (1975) A pointwise ergodic theorem in $L_p$-spaces, Canad. J. Math. 27, 1075-82.
ALPERN, STEVE (1978) Approximation to and by measure preserving homeomorphisms, J. London Math. Soc. 18, 305-15.
ANOSOV, D. V. and KATOK, A. B. (1970) New examples in smooth ergodic theory: Ergodic diffeomorphisms, Trudy Moskov. Mat. Obsc. 23. Trans. Moscow Math. Soc. 23, 1-35.
ANZAI, HIROTADA (1951) Ergodic skew product transformations on the torus, Osaka Math. J. 3, 83-99.
ARNOLD, V. I. and AVEZ, A. *(1968) Ergodic Problems of Classical Mechanics, W. A. Benjamin, Inc., New York.
AUSLANDER, L., GREEN, L. and HAHN, F. *(1963) Flows on Homogeneous Spaces, Ann. of Math. Studies 53, Princeton University Press, Princeton, New Jersey.
AVEZ, A. (see Arnold)
BANACH, M. STEFAN (1926) Sur la convergence presque partout de fonctionnelles linéaires, Bull. Sci. Math. (2) 50, 27-32, 36-43.
BELLOW, A. and FURSTENBERG, H. (1979) An application of number theory to ergodic theory and the construction of uniquely ergodic models, Israel J. Math. 33, 231-40.
BERG, KENNETH R. (1968) Entropy of torus automorphisms, Topological Dynamics, An International Symposium, Joseph Auslander and Walter Gottschalk, eds., W. A. Benjamin, New York.
BESICOVITCH, A. S. *(1932) Almost Periodic Functions, Cambridge University Press; Dover Publications, New York, 1954.
BILLINGSLEY, PATRICK *(1965) Ergodic Theory and Information, John Wiley & Sons, New York.
BIRKHOFF, GEORGE D. *(1927) Dynamical Systems, Colloquium Publications IX, American Mathematical Society, Providence, Rhode Island; also 1966.
(1931) Proof of the ergodic theorem, Proc. Nat. Acad. Sci. USA 17, 656-60.
BLUM, J. R. and HANSON, D. L. (1963) On the isomorphism problem for Bernoulli schemes, Bull. Amer. Math. Soc. 69, 221-3.
BOCHNER, S. (1927) Beiträge zur Theorie der fastperiodischen Funktionen I, II, Math. Ann. 96, 119-47, 383-409.
BOCHNER, S. and von NEUMANN, J. (1935) Almost periodic functions in groups, II, Trans. Amer. Math. Soc. 37, 21-50.
BOGOLIOUBOFF, NICOLAS (see Krylov)
BOHL, P. (1909) Über ein in der Theorie der säkularen Störungen vorkommendes Problem, Journ. Reine u. Angew. Math. 135, 189-283.
BOHR, HARALD (1925a) Zur Theorie der fastperiodischen Funktionen, Acta Math. 45, 29-127.
(1925b) Zur Theorie der fastperiodischen Funktionen, II, Acta Math. 46, 101-214.
(1926) Zur Theorie der fastperiodischen Funktionen, III, Acta Math. 47, 237-81.
*(1932) Fastperiodische Funktionen, Ergebnisse der Math. und ihrer Grenzgebiete I 5, J. Springer, Berlin; Eng. transl. Chelsea Pub. Co., New York, 1947.
BOOLE, G. (1857) On the comparison of transcendents with certain applications to the theory of definite integrals, Philos. Trans. R. Soc. London 147, Part III, 745-803.
BOWEN, RUFUS (1970a) Markov partitions and minimal sets for Axiom A diffeomorphisms, Amer. J. Math. 92, 907-18.
(1970b) Markov partitions for Axiom A diffeomorphisms, Amer. J. Math. 92, 725-47.
(1971) Entropy for group endomorphisms and homogeneous spaces, Trans. Amer. Math. Soc. 153, 401-14.
*(1978) On Axiom A diffeomorphisms, CBMS No. 35, American Mathematical Society, Providence, Rhode Island.
BREIMAN, LEO (1957) The individual ergodic theorem of information theory, Ann. Math. Stat. 28, 809-11. Correction, ibid. 31 (1960), 809-10.
BREZIN, JONATHAN and MOORE, CALVIN C. (1981) Flows on homogeneous spaces: a new look, Amer. J. Math. 103, 571-613.
BRONSTEIN, I. U. *(1979) Extensions of Minimal Transformation Groups, Sijthoff & Noordhoff, Alphen aan den Rijn, The Netherlands.
BROWN, JAMES R. *(1976) Ergodic Theory and Topological Dynamics, Academic Press, New York.
BUNIMOVITCH, L. A. (1973) Inclusion of Bernoulli shifts in some special flows, Uspehi Mat. Nauk 28 (3), 171-2.
(1974) On a class of special flows, Izv. Akad. Nauk SSSR Ser. Mat. 38 (1), 213-27.
BURKHOLDER, D. L. (1962) Successive conditional expectations of an integrable function, Ann. Math. Stat. 33, 887-93.
BURKHOLDER, D. L.; GUNDY, R. F. and SILVERSTEIN, M. L. (1971) A maximal function characterization of the class $H^p$, Trans. Amer. Math. Soc. 157, 137-53.
CALDERÓN, A. P. (1950) On the behavior of harmonic functions at the boundary, Trans. Amer. Math. Soc. 68, 47-54.
(1968) Ergodic theory and translation-invariant operators, Proc. Nat. Acad. Sci. U.S.A. 59, 349-53.
(1977) Cauchy integrals on Lipschitz curves and related operators, Proc. Nat. Acad. Sci. U.S.A. 74, 1324-1327.
CARATHÉODORY, CONSTANTIN (1939) Die Homomorphieen von Somen und die Multiplikation von Inhaltsfunktionen, Annali della R. Scuola Normale Superiore di Pisa (Ser. 2) 8, 105-30.
CARLESON, LENNART (1958) Two remarks on the basic theorems of information theory, Math. Scand. 6, 175-80.
CAVALLIN … Contributions to the theory of the secular perturbations of the planets, Meddelanden från Lunds astronomiska observatorium 19.
CHACON, R. V. (1964) A class of linear transformations, Proc. Amer. Math. Soc. 15, 560-4.
(1969) Weakly mixing transformations which are not strongly mixing, Proc. Amer. Math. Soc. 22, 559-62.
CHACON, R. V. and ORNSTEIN, D. S. (1960) A general ergodic theorem, Illinois J. Math. 4, 153-60.
CHUNG, K. L. (1961) A note on the ergodic theorem of information theory, Ann. Math. Stat. 32, 612-14.
CORNFELD, I. P., FOMIN, S. V., and SINAI, YA. G. *(1981) Ergodic Theory, Springer-Verlag, New York.
COTLAR, M. (1955) A unified theory of Hilbert transforms and ergodic theorems, Rev. Mat. Cuyana 1, 105-67.
DANI, S. G. (1976) Bernoullian translations and minimal horospheres on homogeneous spaces, J. Indian Math. Soc. 40, 245-84.
DEKKING, F. M. *(1980) Combinatorial and Statistical Properties of Sequences Generated by Substitutions, Mathematisch Instituut Katholieke Universiteit van Nijmegen.
DEKKING, F. M. and KEANE, M. (1978) Mixing properties of substitutions, Z. Wahrsch. verw. Gebiete 42, 23-33.
del JUNCO, ANDRES (1978) A simple measure-preserving transformation with trivial centralizer, Pacific J. Math. 79, 357-62.
(1981a) Disjointness of measure-preserving transformations, minimal self-joinings and category, Ergodic Theory and Dynamical Systems I, Proc. special year Maryland, 1979-80, A. Katok, ed., Progress in Math. 10, Birkhäuser, Boston, 81-9.
(1981b) A family of counter examples in ergodic theory, to appear.
del JUNCO, A., RAHE, M. and SWANSON, L. (1980) Chacon's automorphism has minimal self-joinings, J. d'Analyse Math. 37, 276-84.
del JUNCO, A. and ROSENBLATT, J. (1979) Counterexamples in ergodic theory and number theory, Math. Ann. 245, 185-97.
DENKER, MANFRED (1973) On strict ergodicity, Math. Z. 134, 231-53.
(1977) Generators and almost topological isomorphisms, Société Mathématique de France, Astérisque 49, 23-5.
DENKER, MANFRED and EBERLEIN, ERNST (1974) Ergodic flows are strictly ergodic, Adv. in Math. 13, 437-73.
DENKER, MANFRED; GRILLENBERGER, CHRISTIAN and SIGMUND, KARL *(1976) Ergodic Theory on Compact Spaces, Lecture Notes in Mathematics 527, Springer-Verlag, New York.
DENKER, MANFRED and KEANE, MICHAEL (1979) Almost topological dynamical systems, Israel J. Math. 34, 139-60.
DERRIENNIC, Y. (1973) On the integrability of the supremum of ergodic ratios, Ann. Prob. 1, 338-40.
(1980) Quelques applications du théorème ergodique sous-additif, Société Mathématique de France, Astérisque 74, 183-201.
DINABURG, E. I. (1970) The relation between topological entropy and metric entropy, Dokl. Akad. Nauk SSSR 190, 19-22. Soviet Math. Dokl. 11, 13-16.
DOOB, J. L. (1940) Regularity properties of certain families of chance variables, Trans. Amer. Math. Soc. 47, 455-86.
*(1953) Stochastic Processes, John Wiley & Sons, New York.
EBERLEIN, ERNST (see Denker)
EHRENFEST, PAUL and TATIANA *(1957) The Conceptual Foundations of the Statistical Approach in Mechanics, Cornell University Press, Ithaca, New York.
ELLIS, ROBERT (1958) Distal transformation groups, Pacific J. Math. 8, 401-5.
*(1969) Lectures on Topological Dynamics, W. A. Benjamin, Inc., New York.
(1973) The Veech structure theorem, Trans. Amer. Math. Soc. 186, 203-18.
ENGEL, DAVID and KAKUTANI, SHIZUO (1981) Maximal ergodic equalities, to appear.
ENGLAND, JAMES W. (see Martin)
ENGLAND, JAMES W. and MARTIN, N. F. B. (1968) On weak mixing metric automorphisms, Bull. Amer. Math. Soc. 74, 505-7.
ERDŐS, P. (1964) Problems and results on Diophantine approximations, Comp. Math. 16, 52-65.
ERDŐS, PAUL and TURÁN, PAUL (1936) On some sequences of integers, J. London Math. Soc. 11, 261-4.
FATHI, ALBERT and HERMAN, MICHAEL R. (1977) Existence de difféomorphismes minimaux, Société Mathématique de France, Astérisque 49, 37-59.
FEFFERMAN, C. and STEIN, E. M. (1972) $H^p$ spaces of several variables, Acta Math. 129, 137-193.
FELDMAN, J. (1976) New K-automorphisms and a problem of Kakutani, Israel J. Math. 24, 16-38.
FELLER, WILLIAM *(1950) An Introduction to Probability Theory and its Applications, Vol. I, John Wiley & Sons, New York.
FIELDSTEEL, A. (1979) An uncountable family of prime transformations not isomorphic to their inverses, unpublished.
FOGUEL, SHAUL R. *(1980) Selected Topics in the Study of Markov Operators, Carolina Lecture Series, Department of Mathematics, University of North Carolina.
FOMIN, S. V. (see Cornfeld)
FRIEDMAN, NATHANIEL A. *(1970) Introduction to Ergodic Theory, Van Nostrand Reinhold Company, New York.
FRIEDMAN, N. A. and ORNSTEIN, D. S. (1970) On isomorphism of weak Bernoulli transformations, Adv. in Math. 5, 365-94.
FURSTENBERG, HARRY (see also Bellow) (1960) Stationary Processes and Prediction Theory, Annals of Math. Studies 44, Princeton University Press, Princeton, N.J.
(1963) The structure of distal flows, Amer. J. Math. 85, 477-515.
(1967) Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation, Math. Systems Theory 1, 1-49.
(1977) Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. d'Analyse Math. 31, 204-56.
*(1981) Recurrence in Ergodic Theory and Combinatorial Number Theory, Princeton University Press, Princeton, New Jersey.
FURSTENBERG, H. and KATZNELSON, Y. (1978) An ergodic Szemerédi theorem for commuting transformations, J. d'Analyse Math. 34, 275-91.
FURSTENBERG, HARRY; KEYNES, HARVEY and SHAPIRO, LEONARD (1973) Prime flows in topological dynamics, Israel J. Math. 14, 26-38.
FURSTENBERG, H. and WEISS, B. (1978) Topological dynamics and combinatorial number theory, J. d'Analyse Math. 34, 61-85.
GALLAVOTTI, GIOVANNI (see also Liberto, Ornstein) (1973) Ising model and Bernoulli schemes in one dimension, Comm. Math. Phys. 32, 183-90.
GARSIA, ADRIANO M. (1965) A simple proof of E. Hopf's maximal ergodic theorem, J. Math. Mech. 14, 381-2.
*(1970) Topics in Almost Everywhere Convergence, Markham Pub. Co., Chicago.
*(1973) Martingale Inequalities. Seminar Notes on Recent Progress, W. A. Benjamin, Inc., Reading, Massachusetts.
GIRSANOV, I. V. (1958) Spectra of dynamical systems generated by stationary Gaussian processes, Dokl. Akad. Nauk SSSR 119, 851-3.
GLASNER, SHMUEL (1975) Compressibility properties in topological dynamics, Amer. J. Math. 97, 148-71.
*(1976) Proximal Flows, Lecture Notes in Mathematics 517, Springer-Verlag, New York.
GOODMAN, T. N. T. (1971) Relating topological entropy and measure entropy, Bull. London Math. Soc. 3, 176-80.
GOODWYN, L. WAYNE (1969) Topological entropy bounds measure-theoretic entropy, Proc. Amer. Math. Soc. 23, 679-88.
(1972) Comparing topological entropy with measure-theoretic entropy, Amer. J. Math. 94, 366-88.
GOTTSCHALK, W. H. (1944) Orbit-closure decompositions and almost periodic properties, Bull. Amer. Math. Soc. 50, 915-19.
GOTTSCHALK, WALTER HELBIG and HEDLUND, GUSTAV ARNOLD *(1955) Topological Dynamics, Colloquium Publications XXXVI, American Mathematical Society, Providence, Rhode Island.
GRAHAM, R. L. and ROTHSCHILD, B. L. (1971) Ramsey's theorem for n-parameter sets, Trans. Amer. Math. Soc. 159, 257-92.
GRAHAM, RONALD L.; ROTHSCHILD, BRUCE L. and SPENCER, JOEL H. *(1981) Ramsey Theory, John Wiley & Sons, New York.
GRANT, EDWARD (see also Oresme) *(1971) Nicole Oresme and the Kinematics of Circular Motion, University of Wisconsin Press, Madison, Wisconsin.
GREEN, L. (see Auslander)
GRILLENBERGER, CHRISTIAN (see also Denker) (1970) Zwei kombinatorische Konstruktionen für strikt ergodische Folgen, Thesis, Universität Erlangen-Nürnberg.
(1973a) Constructions of strictly ergodic systems I: Given entropy, Z. Wahrsch. verw. Geb. 25, 323-34.
(1973b) Constructions of strictly ergodic systems II: K-systems, Z. Wahrsch. verw. Geb. 25, 335-42.
GRILLENBERGER, CHRISTIAN and SHIELDS, PAUL (1975) Construction of strictly ergodic systems III. A Bernoulli system, Z. Wahrsch. verw. Geb. 33, 215-17.
GUNDY, R. F. (see also Burkholder) (1969) On the class L log L, martingales, and singular integrals, Studia Math. 33, 109-18.
GUREVICH, B. M. (1961) The entropy of horocycle flows, Dokl. Akad. Nauk SSSR 136, 768-70. Soviet Math. Dokl. 2, 124-30.
GYLDÉN, J. A. H. (1895) Sur la transformation des agrégata périodiques, Astronomiska iakttagelser och undersökningar på Stockholms Observatorium 5, no. 4.
HAHN, FRANK J. (see also Auslander) (1965) Skew product transformations and the algebras generated by exp(p(n)), Illinois J. Math. 9, 178-89.
(1967) A fixed point theorem, Math. Systems Theory 1, 55-57.
HAHN, FRANK and KATZNELSON, YITZHAK (1967) On the entropy of uniquely ergodic transformations, Trans. Amer. Math. Soc. 126, 335-60.
HAJIAN, ARSHAG B. and KAKUTANI, SHIZUO (1964) Weakly wandering sets and invariant measures, Trans. Amer. Math. Soc. 110, 136-51.
HALMOS, PAUL R. (1943) On automorphisms of compact groups, Bull. Amer. Math. Soc. 49, 619-24.
(1944) In general a measure preserving transformation is mixing, Ann. of Math. 45, 786-92.
(1947) Invariant measures, Ann. of Math. 48, 735-54.
(1949a) A non-homogeneous ergodic theorem, Trans. Amer. Math. Soc. 66, 284-88.
*(1949b) Measurable transformations, Bull. Amer. Math. Soc. 55, 1015-34.
*(1950) Measure Theory, D. Van Nostrand Co., Inc., Princeton, New Jersey.
*(1956) Lectures on Ergodic Theory, Chelsea Publishing Co., New York.
HALMOS, PAUL R. and von NEUMANN, JOHN (1942) Operator methods in classical mechanics, II, Ann. of Math. 43, 332-50.
HANSEL, GEORGES (1974) Strict uniformity in ergodic theory, Math. Z. 135, 221-48.
HANSEL, G. and RAOULT, J. P. (1973) Ergodicity, uniformity, and unique ergodicity, Indiana Univ. Math. J. 23, 221-37.
HANSON, D. L. (see Blum)
HARDY, G. H. and LITTLEWOOD, J. E. (1930) A maximal theorem with function-theoretic applications, Acta Math. 54, 81-116.
HARPER, L. H. (see Rota)
HARTMAN, PHILIP (1947) On the ergodic theorems, Amer. J. Math. 69, 193-99.
HECKE, E. (1922) Über analytische Funktionen und die Verteilung von Zahlen mod. eins, Abh. Math. Sem. Hamburg Univ. 1, 54-76.
HEDLUND, G. A. (see also Gottschalk) (1969) Endomorphisms and automorphisms of the shift dynamical system, Math. Systems Theory 3, 320-75.
HERMAN, MICHAEL R. (see Fathi)
HILBERT, D. (1892) Ueber die Irreducibilität ganzer rationaler Functionen mit ganzzahligen Coefficienten, J. Math. 110, 104-29.
HINDMAN, N. (1974) Finite sums from sequences within cells of a partition of N, J. Combinatorial Theory A17, 1-11.
HOCKING, JOHN G. and YOUNG, GAIL S. *(1961) Topology, Addison-Wesley, Reading, Massachusetts.
HOPF, EBERHARD *(1937) Ergodentheorie, J. Springer, Berlin. Also Chelsea, New York, 1948.
(1944) Über eine Ungleichung der Ergodentheorie, S.-B. Math.-Nat. Abt. Bayer. Akad. Wiss., 171-76.
(1954) The general temporally discrete Markoff process, J. Rat. Mech. Anal. 3, 13-45.
HUREWICZ, WITOLD (1944) Ergodic theorem without invariant measure, Ann. of Math. 45, 192-206.
IONESCU TULCEA, ALEXANDRA (see also Bellow) (1960) Contributions to information theory for abstract alphabets, Arkiv Mat. 4, 235-47.
JACOBS, KONRAD (1957) Fastperiodizitätseigenschaften allgemeiner Halbgruppen in Banach-Räumen, Math. Z. 67, 83-92.
*(1960) Neuere Methoden und Ergebnisse der Ergodentheorie, Springer-Verlag, Berlin.
*(1962/1963) Lecture Notes on Ergodic Theory, Parts I and II, Matematisk Institut, Aarhus Universitet.
(1970a) Lipschitz functions and the prevalence of strict ergodicity for continuous time flows, Contributions to Ergodic Theory and Probability, Lecture Notes in Mathematics 160, Springer-Verlag, New York, 87-124.
(1970b) Systèmes dynamiques Riemanniens, Czechos. Math. J. 20(90), 628-31.
(1978) Measure and Integral, Academic Press, New York.
JEWETT, ROBERT I. (1970) The prevalence of uniquely ergodic systems, J. Math. Mech. 19, 717-29.
JONES, LEE KENNETH (1971) A mean ergodic theorem for weakly mixing operators, Adv. in Math. 7, 211-16.
JONES, R. L. (1977) Inequalities for the ergodic maximal function, Studia Math. 60, 111-29.
KAC, M. (1947) On the notion of recurrence in discrete stochastic processes, Bull. Amer. Math. Soc. 53, 1002-10.
KAKUTANI, SHIZUO (see also Engel, Hajian, Yosida) (1938a) Two fixed-point theorems concerning bicompact convex sets, Proc. Japan Acad. 14, 242-5.
(1938b) Iteration of linear operations in Banach spaces, Proc. Japan Acad. 14, 295-300.
(1943) Induced measure preserving transformations, Proc. Japan Acad. 19, 635-41.
*(1952) Ergodic theory, Proc. Int. Cong. Math. Cambridge, Mass. 1950 (2), 128-42.
(1973) Examples of ergodic measure preserving transformations which are weakly mixing but not strongly mixing, Recent Advances in Topological Dynamics, Lecture Notes in Mathematics 318, Springer-Verlag, New York, 143-9.
KAKUTANI, SHIZUO and PETERSEN, KARL (1981) The speed of convergence in the Ergodic Theorem, Monatsh. Math. 91, 11-18.
KATOK, A. B. (see Anosov)
KATOK, A. B.; SINAI, YA. G. and STEPIN, A. M. *(1975) Theory of dynamical systems and general transformation groups with invariant measure, Progress in Science and Technology, Math. Analysis 13. J. Soviet Math. 7 (1977), 974-1065.
KATOK, A. B. and STEPIN, A. M. (1967) Approximations in ergodic theory, Uspehi Mat. Nauk 22, 81-106. Russ. Math. Surveys 22, 77-102.
(1970) Metric properties of measure preserving homeomorphisms, Uspehi Mat. Nauk 25, 193-220. Russ. Math. Surveys 25, 191-220.
KATZNELSON, YITZHAK (see also Furstenberg, Hahn) (1971) Ergodic automorphisms of $T^n$ are Bernoulli shifts, Israel J. Math. 10, 186-95.
KEANE, MICHAEL (see Dekking, Denker)
KEANE, MICHAEL and SMORODINSKY, MEIR (1977) A class of finitary codes, Israel J. Math. 26, 352-71.
(1979a) Bernoulli schemes of the same entropy are finitarily isomorphic, Ann. of Math. 109, 397-406.
(1979b) Finitary isomorphism of irreducible Markov shifts, Israel J. Math. 34, 281-6.
(1979c) The finitary isomorphism theorem for Markov shifts, Bull. Amer. Math. Soc. 1, 436-8.
KEYNES, HARVEY B. (see also Furstenberg) *(1972) Lectures on Ergodic Theory, School of Mathematics, University of Minnesota.
KEYNES, HARVEY B. and ROBERTSON, JAMES B. (1968) On ergodicity and mixing in topological transformation groups, Duke Math. J. 35, 809-19.
(1969) Eigenvalue theorems in topological transformation groups, Trans. Amer. Math. Soc. 139, 359-69.
KEYNES, H. B. and NEWTON, D. (1976) Real prime flows, Trans. Amer. Math. Soc. 218, 237-55.
KHINTCHINE, A. I. (1923) Über dyadische Brüche, Math. Z. 18, 109-16.
(1933) Zu Birkhoffs Lösung des Ergodenproblems, Math. Ann. 107, 485-8.
(1934) Eine Verschärfung des Poincaréschen "Wiederkehrsatzes", Comp. Math. 1, 177-9.
*(1949) Mathematical Foundations of Statistical Mechanics, Dover Publications, New York.
*(1948) Three Pearls of Number Theory, Graylock Press, New York.
*(1957) Mathematical Foundations of Information Theory, Dover Publications, New York.
KOLMOGOROV, A. N. (1925) Sur les fonctions harmoniques conjuguées et les séries de Fourier, Fund. Math. 7, 24-9.
(1928) Über die Summen durch den Zufall bestimmter unabhängiger Größen, Math. Ann. 99, 309-19.
(1929) Über das Gesetz des iterierten Logarithmus, Math. Ann. 101, 126-35.
(1930) Bemerkungen zu meiner Arbeit 'Über die Summen zufälliger Größen', Math. Ann. 102, 484-8.
(1937) Ein vereinfachter Beweis des Birkhoff-Khintchineschen Ergodensatzes, Recueil Math. (Mat. Sb.) 44, 367-8.
(1953) On dynamical systems with an integral invariant on the torus, Dokl. Akad. Nauk SSSR 93, 763-6.
(1958) New metric invariants of transitive dynamical systems and automorphisms of Lebesgue spaces, Dokl. Akad. Nauk SSSR 119, 861-4.
KONHEIM, A. G. (see Adler)
KOOPMAN, B. O. (1931) Hamiltonian systems and transformations in Hilbert space, Proc. Nat. Acad. Sci. U.S.A. 17, 315-18.
KOOPMAN, B. O. and von NEUMANN, J. (1932) Dynamical systems of continuous spectra, Proc. Nat. Acad. Sci. 18, 255-63.
KRENGEL, ULRICH (1970) On certain analogous difficulties in the investigation of flows in a probability space and of transformations in an infinite measure space, Functional Analysis, Proceedings of a symposium held at Monterey, California, October 1969, Carroll O. Wilde, ed., Academic Press, New York, 75-91.
(1978) On the speed of convergence in the ergodic theorem, Monatsh. Math. 86, 3-6.
KRIEGER, WOLFGANG (1970) On entropy and generators of measure-preserving transformations, Trans. Amer. Math. Soc. 149, 453-64.
(1972) On unique ergodicity, Proc. Sixth Berkeley Symp. (1970) 1, University of California Press, Berkeley and Los Angeles, 327-46.
KRONECKER, LEOPOLD (1884) Näherungsweise ganzzahlige Auflösung linearer Gleichungen, S.-B. Preuss. Akad. Wiss., 1179-93, 1271-99. Werke III (1), 47-109.
KRYLOV, NICOLAS and BOGOLIOUBOFF, NICOLAS (1937) La théorie générale de la mesure dans son application à l'étude des systèmes dynamiques de la mécanique non linéaire, Ann. of Math. 38, 65-113.
KUIPERS, L. and NIEDERREITER, H. *(1974) Uniform Distribution of Sequences, John Wiley & Sons, New York.
LAGRANGE, J. L. (1870) Oeuvres de Lagrange, Tome V, Paris.
LEBESGUE, H. *(1904) Leçons sur l'intégration et la recherche des fonctions primitives, Gauthier-Villars, Paris.
LEDRAPPIER, F. (1973) Mesures d'équilibre sur un réseau, Comm. Math. Phys. 33, 119-28.
LEVY, PAUL *(1937) Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris.
LIBERTO, FRANCESCO di; GALLAVOTTI, GIOVANNI and RUSSO, LUCIO (1973) Markov processes, Bernoulli schemes, and Ising model, Comm. Math. Phys. 33, 259-82.
LIND, D. A. (1977) The structure of skew products with ergodic group automorphisms, Israel J. Math. 28, 205-48.
LITTLEWOOD, J. E. (see Hardy)
LOOMIS, LYNN H. (1946) A note on the Hilbert transform, Bull. Amer. Math. Soc. 52, 1082-6.
MACKEY, GEORGE W. *(1974) Ergodic theory and its significance for statistical mechanics and probability theory, Adv. in Math. 12, 178-268.
MARCUS, BRIAN (see also Adler) (1975) Unique ergodicity of the horocycle flow: variable negative curvature case, Israel J. Math. 21, 133-44.
(1976) Reparametrization of uniquely ergodic flows, J. Differential Equations 22, 227-35.
MARCUS, BRIAN and PETERSEN, KARL (1979) Balancing ergodic averages, Ergodic Theory, Lecture Notes in Mathematics 729, Springer-Verlag, New York, 126-43.
MARTIN, NATHANIEL F. G. (see England)
MARTIN, NATHANIEL F. G. and ENGLAND, JAMES W. *(1981) Mathematical Theory of Entropy, Addison-Wesley Pub. Co., Reading, Massachusetts.
McANDREW, M. H. (see Adler)
McMILLAN, BROCKWAY (1953) The basic theorems of information theory, Ann. Math. Stat. 24, 196-219.
MESHALKIN, L. D. (1959) A case of isomorphism of Bernoulli schemes, Dokl. Akad. Nauk SSSR 128, 41-4.
MILES, G. and THOMAS, R. K. (1978) Generalized torus automorphisms are Bernoullian, Studies in Probability and Ergodic Theory, Adv. in Math., Supplementary Studies 2, Academic Press, New York.
MISIUREWICZ, M. (1976) A short proof of the variational principle for a $\mathbb{Z}_+^N$ action on a compact space, Int. Conf. Dyn. Systems in Math. Physics, Société Mathématique de France, Astérisque 40, 147-58.
MOORE, CALVIN C. (see Brezin)
NEMYTSKII, V. V. and STEPANOV, V. V. *(1960) Qualitative Theory of Differential Equations, Princeton University Press, Princeton, New Jersey.
NEVEU, J. (1969) Une démonstration simplifiée et une extension de la formule d'Abramov sur l'entropie des transformations induites, Z. Wahrsch. verw. Geb. 13, 135-140.
*(1975) Discrete-Parameter Martingales, North-Holland, Amsterdam; American Elsevier, New York.
(1979) The filling scheme and the Chacon-Ornstein theorem, Israel J. Math. 33, 368-77.
NEWTON, D. and PARRY, W. (1966) On a factor automorphism of a normal dynamical system, Ann. Math. Stat. 37, 1528-33.
NIEDERREITER, H. (see Kuipers)
ORESME, NICOLE (see also Grant) *(1351?) De proportionibus proportionum and Ad pauca respicientes, Edward Grant, ed., University of Wisconsin Press, Madison, 1966.
ORNSTEIN, DONALD S. (see also Chacon, Friedman) (1970a) Bernoulli shifts with the same entropy are isomorphic, Adv. in Math. 4, 337-52. (1970b) Imbedding Bernoulli shifts in flows, Contributions to Ergodic Theory and Probability, Lecture Notes in Mathematics 160, Springer-Verlag, New York, 178-218. (1970c) Two Bernoulli shifts with infinite entropy are isomorphic, Adv. in Math. 5, 339-48. (1971) A remark on the Birkhoff ergodic theorem, Illinois J. Math. 15, 77-9. (1972) On the root problem in ergodic theory, Proc. Sixth Berkeley Symposium (1970), University of California Press, Berkeley and Los Angeles, 345-56. *(1974) Ergodic Theory, Randomness, and Dynamical Systems, Yale Mathematical Monographs 5, Yale University Press, New Haven, Connecticut. *(1978) A survey of some recent results in ergodic theory, Studies in Probability Theory, Murray Rosenblatt, ed., Mathematical Association of America, 229-62.
ORNSTEIN, DONALD S. and GALLAVOTTI, GIOVANNI (1974) The billiard flow with a convex scatterer is Bernoulli, Comm. Math. Phys. 38, 83-101.
ORNSTEIN, D. S. and SHIELDS, P. C. (1973) Mixing Markov shifts of kernel type are Bernoulli, Adv. in Math. 10, 143-6.
ORNSTEIN, D. S. and SMORODINSKY, M. (1978) Continuous speed changes for flows, Israel J. Math. 31, 161-8.
ORNSTEIN, DONALD and WEISS, BENJAMIN (1973) Geodesic flows are Bernoullian, Israel J. Math. 14, 184-98. (1974) Finitely determined implies very weak Bernoulli, Israel J. Math. 17, 94-104.
OXTOBY, JOHN C. *(1952) Ergodic sets, Bull. Amer. Math. Soc. 58, 116-36. (1973) Approximation by measure-preserving homeomorphisms, Recent Advances in Topological Dynamics, Lecture Notes in Mathematics 318, Springer-Verlag, New York, 206-217.
OXTOBY, J. C. and ULAM, S. M. (1941) Measure-preserving homeomorphisms and metrical transitivity, Ann. of Math. 42, 874-920.
PARASJUK, O. S. (1953) Horocycle flows on surfaces of constant negative curvature, Uspehi Mat. Nauk 8, 125-6.
PARRY, WILLIAM (see also Newton) (1969a) Compact abelian group extensions of discrete dynamical systems, Z. Wahrsch. Verw. Geb. 13, 95-113. *(1969b) Entropy and Generators in Ergodic Theory, W. A. Benjamin, Inc., New York. (1969c) Ergodic properties of affine transformations and flows on nilmanifolds, Amer. J. Math. 91, 757-71. (1971) Metric classification of ergodic nilflows and unipotent affines, Amer. J. Math. 93, 819-28. *(1981) Topics in Ergodic Theory, Cambridge University Press, Cambridge.
PARTHASARATHY, K. L. *(1967) Probability Measures on Metric Spaces, Academic Press, New York.
PETERSEN, KARL E. (see also Kakutani, Marcus) (1969) Prime Flows, Ph.D. Dissertation, Yale University. (1970a) A topologically strongly mixing symbolic minimal set, Trans. Amer. Math. Soc. 148, 603-12. (1970b) Disjointness and weak mixing of minimal sets, Proc. Amer. Math. Soc. 24, 278-80. (1971) Extension of minimal transformation groups, Math. Systems Theory 5, 365-75. (1973a) On a series of cosecants related to a problem in ergodic theory, Comp. Math. 26, 313-17. (1973b) Spectra of induced transformations, Recent Advances in Topological Dynamics, Lecture Notes in Math. 318, Springer-Verlag, New York, 226-30. *(1977) Brownian Motion, Hardy Spaces and Bounded Mean Oscillation, London Math. Soc. Lecture Notes Series 28, Cambridge University Press, Cambridge. (1979) The converse of the dominated ergodic theorem, J. Math. Anal. Appl. 67, 431-6.
PETERSEN, KARL and SHAPIRO, LEONARD (1973) Induced flows, Trans. Amer. Math. Soc. 177, 375-90.
PLANTE, J. *(1976) Introduction to Qualitative Theory of Differential Equations, Carolina Lecture Series, Department of Mathematics, University of North Carolina.
POINCARÉ, HENRI Les méthodes nouvelles de la mécanique céleste, I (1892), II (1893), and III (1899), Gauthier-Villars, Paris. Also Dover, New York, 1957 and NASA TTF 450-2, Washington, D.C., 1967.
POLIT, STEVE (1974) Weakly isomorphic maps need not be isomorphic, Ph.D. Dissertation, Stanford University.
POSTNIKOV, A. G. *(1966) Ergodic problems in the theory of congruences and of Diophantine approximations, Trudy Mat. Inst. im. V. A. Steklova 82, 3-112. Proc. Steklov Institute of Math. 82, American Mathematical Society, Providence, Rhode Island, 1967.
PRASAD, V. S. (1979) Ergodic measure preserving homeomorphisms of R^n, Indiana Univ. Math. J. 28, 859-67.
PROHOROV, JU. V. (1956) Convergence of random processes and limit theorems in probability theory, Teor. Verojatnost. i Primenen. 1, 177-238.
RADO, R. (1943) Note on combinatorial analysis, Proc. London Math. Soc. 48, 122-60.
RAHE, M. (see del Junco)
RAOULT, J. P. (see Hansel)
RATNER, MARINA (1974) Anosov flows with Gibbs measures are also Bernoullian, Israel J. Math. 17, 380-91. (1978) Horocycle flows are loosely Bernoulli, Israel J. Math. 31, 122-32.
RIESZ, FRÉDÉRIC (1931) Sur un théorème de maximum de MM. Hardy et Littlewood, J. London Math. Soc. 7, 10-13. (1932) Sur l'existence de la dérivée des fonctions monotones et sur quelques problèmes qui s'y rattachent, Acta Sci. Math. (Szeged) 5, 208-21.
RÉNYI, A. (1958) On mixing sequences of sets, Acta Math. Acad. Sci. Hungar. 9, 215-28.
ROBERTSON, JAMES B. (see Keynes)
ROKHLIN, V. A. (see also Abramov) (1948) A 'general' measure-preserving transformation is not mixing, Dokl. Akad. Nauk SSSR 60, 349-51. *(1949a) On the fundamental ideas of measure theory, Mat. Sb. 25, 107-50. Amer. Math. Soc. Transl. 71 (1952). *(1949b) Selected topics from the metric theory of dynamical systems, Uspehi Mat. Nauk 4, 57-128. Amer. Math. Soc. Transl. Series 2 49 (1966), 171-240. *(1960) New progress in the theory of transformations with invariant measure, Uspehi Mat. Nauk 94, 3-26. Russ. Math. Surveys 15 (1960), 1-22. *(1967) Lectures on the theory of entropy of transformations with invariant measure, Uspehi Mat. Nauk 22, 3-56. Russ. Math. Surveys 22, 1-52.
ROSENBLATT, J. (see del Junco)
ROTA, GIAN-CARLO and HARPER, L. H. (1971) Matching theory, an introduction, Advances in Probability I, Peter Ney, ed., Marcel Dekker, New York, 171-215.
ROTH, KLAUS (1952) Sur quelques ensembles d'entiers, C. R. Acad. Sci. Paris Sér. A-B 234, 388-90.
ROTHSCHILD, BRUCE L. (see Graham)
ROYDEN, H. L. *(1968) Real Analysis, Macmillan Pub. Company, New York.
RUDIN, WALTER *(1967) Fourier Analysis on Groups, Interscience Publishers, New York. *(1976) Principles of Mathematical Analysis, McGraw-Hill, New York.
RUDOLPH, DANIEL J. (1979) An example of a measure preserving map with minimal self-joinings, and applications, J. d'Analyse Math. 35, 97-122. (1981) A characterization of those processes finitarily isomorphic to a Bernoulli shift, Ergodic Theory and Dynamical Systems I, Proceedings Special Year, Maryland, 1979-80, A. Katok, ed., Progress in Math. 10, Birkhäuser, Boston, 1-64.
RUSSO, LUCIO (see Liberto)
RYLL-NARDZEWSKI, CZESLAW (1962) Generalized random ergodic theorems and weakly almost periodic functions, Bull. Acad. Polon. Sci. Sér. Sci. Math. Astronom. Phys. 10, 271-5. (1967) On fixed points of semigroups of endomorphisms of linear spaces, Proc. Fifth Berkeley Symp. Math. Stat. Prob. (1965-66), II (1), University of California Press, Berkeley, 55-61.
SAWYER, S. (1966) Maximal inequalities of weak type, Ann. of Math. 84, 157-74.
SCHUR, ISSAI *(1973) Gesammelte Abhandlungen, Springer-Verlag, New York.
SHANNON, C. E. (1948) A mathematical theory of communication, Bell System Tech. J. 27, 379-423, 623-56.
SHAPIRO, LEONARD (see Petersen)
SHIELDS, PAUL C. (see also Ornstein) *(1973) The Theory of Bernoulli Shifts, University of Chicago Press, Chicago.
SIGMUND, KARL (see Denker)
SILVERSTEIN, M. L. (see Burkholder)
SINAI, YA. G. (see also Katok, Cornfeld) (1959a) The notion of entropy of a dynamical system, Dokl. Akad. Nauk SSSR 125, 768-71. (1959b) Flows with finite entropy, Dokl. Akad. Nauk SSSR 125, 1200-2. (1968) Construction of Markov partitions, Funktsional'nyi Analiz i Ego Prilozheniya 2 (no. 3), 70-80. Functional Anal. Appl. 2, 245-53. (1970) Dynamical systems with elastic reflections, Uspehi Mat. Nauk 25. Russian Math. Surveys 25, 137-89. *(1976) Introduction to Ergodic Theory, Princeton University Press, Princeton, New Jersey.
SMORODINSKY, MEIR (see also Keane, Ornstein) *(1971) Ergodic Theory, Entropy, Lecture Notes in Mathematics 214, Springer-Verlag, New York. (1973) β-automorphisms are Bernoulli shifts, Acta Math. Acad. Sci. Hungar. 24, 273-8.
SPENCER, JOEL H. (see Graham)
STEIN, ELIAS M. (see also Fefferman) (1961) On limits of sequences of operators, Ann. of Math. 74, 140-70. (1969) Note on the class L log L, Studia Math. 32, 305-10. *(1970) Singular Integrals and Differentiability Properties of Functions, Princeton University Press, Princeton, N.J.
STEIN, ELIAS and WEISS, GUIDO *(1971) Introduction to Fourier Analysis on Euclidean Spaces, Princeton University Press, Princeton, New Jersey.
STEPANOV, V. V. (see Nemytskii)
STEPIN, A. M. (see Katok)
STRASSEN, V. (1965) The existence of probability measures with given marginals, Ann. Math. Stat. 36, 423-39.
SWANSON, L. (see del Junco)
SZEMERÉDI, E. (1969) On sets of integers containing no four elements in arithmetic progression, Acta Math. Acad. Sci. Hungar. 20, 89-104. (1975) On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27, 199-245.
TCHEBYCHEF, P. L. (1866) Sur une question arithmétique, Denkschr. Akad. Wiss. St. Petersburg Nr. 4, Oeuvres I, 637-84.
THOMAS, R. K. (see Miles)
TOTOKI, HARUO *(1970) Ergodic Theory, Lecture Note Series 14, Matematisk Institut, Aarhus Universitet.
TURÁN, PAUL (see Erdős)
ULAM, S. M. (see Oxtoby)
ULAM, S. M. and von NEUMANN, JOHN (1945) Random ergodic theorems, Bull. Amer. Math. Soc. 51, 660.
van der WAERDEN, BARTEL L. (1927) Beweis einer Baudet'schen Vermutung, Nieuw Arch. Wisk. 15, 212-16. (1971) How the proof of Baudet's conjecture was found, Studies in Pure Mathematics presented to Richard Rado, L. Mirsky, ed., Academic Press, London, 251-60.
VARADARAJAN, V. S. (1958) Weak convergence of measures on separable metric spaces, Sankhya 19, 15-22. *(1962) Special Topics in Probability Theory, Lecture Notes, Courant Inst. Math. Sci.
VARGA, RICHARD S. *(1962) Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, New Jersey.
VEECH, WILLIAM A. (1970) Point-distal flows, Amer. J. Math. 92, 205-42. *(1977) Topological dynamics, Bull. Amer. Math. Soc. 83, 775-830.
VERSHIK, A. M. and YUZVINSKII, S. A. *(1970) Dynamical systems with invariant measure, Progress in Mathematics 8, Plenum, New York.
VILLE, JEAN *(1939) Étude Critique de la Notion de Collectif, Gauthier-Villars, Paris.
von NEUMANN, JOHN (see also Bochner, Halmos, Ulam, Koopman) (1932a) Einige Sätze über messbare Abbildungen, Ann. of Math. 33, 574-86. (1932b) Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci. USA 18, 70-82. (1932c) Über einen Satz von Herrn M. H. Stone, Ann. of Math. 33, 567-73. (1932d) Zur Operatorenmethode in der klassischen Mechanik, Ann. of Math. 33, 587-642. (1934) Almost periodic functions in a group, I, Trans. Amer. Math. Soc. 36, 445-92.
WALTERS, PETER *(1975) Ergodic Theory: Introductory Lectures, Lecture Notes in Mathematics 458, Springer-Verlag, New York. *(1982) An Introduction to Ergodic Theory, Springer-Verlag, New York.
WEISS, BENJAMIN (see also Adler, Furstenberg, Ornstein) *(1972) The isomorphism problem in ergodic theory, Bull. Amer. Math. Soc. 78, 668-84.
WEISS, GUIDO (see Stein)
WEYL, HERMANN (1916) Über die Gleichverteilung von Zahlen mod. Eins, Math. Ann. 77, 313-52. (1926) Integralgleichungen und fastperiodische Funktionen, Math. Ann. 97, 338-56.
WHITE, H. E., JR. (1974) The approximation of one-one measurable transformations by measure preserving homeomorphisms, Proc. Amer. Math. Soc. 44, 391-4.
WIENER, NORBERT (1939) The ergodic theorem, Duke Math. J. 5, 1-18.
WRIGHT, FRED B. (1961a) Mean least recurrence time, J. London Math. Soc. 36, 382-4. (1961b) The recurrence theorem, Amer. Math. Monthly 68, 247-8. *(1963) Ergodic Theory (Proc. Int. Symp. Tulane Univ. 1961), Academic Press, New York.
YOSIDA, KÔSAKU (1938) Mean ergodic theorem in Banach spaces, Proc. Japan Acad. 14, 292-4.
YOSIDA, KÔSAKU and KAKUTANI, SHIZUO (1939) Birkhoff's ergodic theorem and the maximal ergodic theorem, Proc. Japan Acad. 15, 165-8. (1941) Operator-theoretical treatment of Markoff process and mean ergodic theorem, Ann. of Math. 42, 188-228.
YOUNG, GAIL S. (see Hocking)
YUZVINSKII, S. A. (see Vershik)
ZIMMER, ROBERT J. (1975) Extensions of ergodic actions and generalized discrete spectrum, Bull. Amer. Math. Soc. 81, 633-6. (1976a) Extensions of ergodic group actions, Illinois J. Math. 20, 373-409. (1976b) Ergodic actions with generalized discrete spectrum, Illinois J. Math. 20, 555-88.
ZYGMUND, A. *(1959) Trigonometric Series I, Cambridge University Press, Cambridge.
Index
Bold page numbers indicate formal definitions. Greek letters are entered according to the initial letter of their name: α (alpha) under 'a'; β (beta) under 'b', etc. α-generic 274 α-name 274 α, k-name 274 α, T-name 274 Abramov, L. M. 55, 254, 258 Adler, Roy L. 253, 264, 267, 268, 273, 280, 282 affine map 10, 154 Akcoglu, M. A. 127 Alpern, Steve 72 almost periodic function 139, 133 point 136, 150 pointwise 159 sequence 136, 139, 140, 148 system 182 uniformly 136, 137, 154 Ambrose, W. 12, 64 Anosov, D. V. 72 Anosov flow 280 Anzai, Hirotada 53, 55 aperiodic matrix 59 transformation 48 approximate eigenvalue 56 approximation proof of the Ergodic Theorem 92 Artin, E. 162 assignment 284 Assignment Lemma 298, 299 atom 16 Auslander, L. 10 automorphism of a compact group (see also of a torus) 8, 21, 60, 153 higher mixing of 64 Bernoulli property of 279 of a nilmanifold 61, 279
of a torus entropy of 249, 259 ergodicity of 60, 61 strong mixing and Bernoulli properties of 61, 254, 279 autocorrelation 184
β-transformation 280 Baker's transformation 21 ballot problem 85 Banach, M. Stefan 91 Banach's Principle 91, 99 Baudet, Pierre Joseph Henry 162 Bellow, A. (see also Ionescu Tulcea) 187, 194 Bellow-Furstenberg Theorem 196, 200, 210 Berg, Kenneth R. 249 Besicovitch, A. S. 142 Bessel's Inequality 148 Bernoulli flow 279 Bernoulli shift (or scheme) 7, 21, 60, 61, 85, 86, 94, 232, 248, 278 classification up to isomorphism 273 closure under factors 279 conditions for isomorphism to 275, 276, 278, 279 construction of strictly ergodic models for 187 countable Lebesgue spectrum of 62 ergodicity of 49 higher mixing of 64 isomorphism with 254, 277, 279, 280 relation to K-automorphisms 62, 63 strong mixing of 58 unilateral 21 Billingsley, Patrick 17 Birkhoff, George D. 3, 30, 90, 119, 151, 161, 168, 186
Birkhoff Recurrence Theorem 151, 168 Blum, J. R. 282 billiards 279 binary bits 230 Bochner, S. 139 Bogoliouboff, Nicolas 45 Bohl, P. 157 Bohr, Harald 139, 142, 148 Bohr-Fourier series 149 Boltzmann, L. 3, 42, 186, 228 Boltzmann's constant 228 Boltzmann's ergodic hypothesis 42 Boltzmann's H-Theorem 35 Boole, G. 107, 109 Borel, E. 3 bounded gaps (see also relatively dense) 37, 136 Bowen, Rufus 168, 252, 264, 267, 273 Bowen definition of topological entropy 266, 267 Breiman, Leo 260 Brownian motion 280 Bunimovitch, L. A. 280 Burkholder, D. L. 88, 89 Calderón, A. P. 91, 107, 108, 114 Calderón-Zygmund decomposition 107 Carathéodory, Constantin 16 Carleson, Lennart 260 cascade 150 Cavallin 157 center 151 Chacon, R. V. 119, 127, 211, 216 Chacon-Ornstein Theorem 76, 119, 121, 126, 132 character group 20 Clausius, Rudolf 227 coboundary 92 code 274 conditional distribution 275 entropy 235, 238 expectation 17, 235 information function 235, 237, 242, 243, 263 probability 18, 235 conjugate function 90, 107 conservative 38, 39, 41, 121, 123, 125, 126 conservative part (see also Hopf decomposition) 41, 125 contraction 24, 75, 120 converse dominated theorems 88 Cotlar, M. 108 countable partition 234, 242 Covering Lemma 107, 108 crossings 82 crossing set 80
cutting and stacking 219, 223 cycle 295 d̄-distance 221, 277, 281 Dani, S. G. 61, 279 decomposition (of a skeleton) 287 del Junco, Andrés 94, 211, 221, 224 Dekking, F. M. 211 Denker, Manfred 161, 187, 244, 258, 268, 282 density zero 65, 70, 73, 163 derivative transformation (see also induced transformation) 47, 200 Derriennic, Y. 87, 89 diagonal measure 176, 178 diffeomorphism (realization of m.p.t.s by) 186 Dinaburg, E. I. 264, 267 Dirichlet, P. G. Lejeune 157 discrepancy 94 discrete spectrum 44, 55, 64, 154 disintegration of a measure 183, 298 disjointness of skeletons 286 dissipative 126 dissipative part (see also Hopf decomposition) 41, 123, 125 distal 154, 158, 159, 160 distribution (of a partition) 275 Dominated Ergodic Theorem 75, 87 Doob, J. L. 7, 91, 103 dual (of a society) 292, 293 ε-entropy 267 ε-independent 275 ε-period 136 Eberlein, Ernst 187 Ehrenfest, Paul and Tatiana 35, 41 eigenfunction 65, 133, 148 and weak mixing 64, 72 continuous 153, 160 generalized 55 of a group action 182 eigenvalue 43, 65 approximate 56 Ellis, Robert 158, 159, 160, 161 Ellis semigroup 158, 173 endomorphisms of shift dynamical system 221 Engel, David 76, 81, 82 England, James W. 73 entropy (see also topological entropy, variational principle) 4, 186, 187, 227, 244 as a function of the measure 243 as an isomorphism invariant 233 computations automorphism of the torus 249, 259
Bernoulli shift 245, 248 discrete spectrum system 245 induced transformation 257, 259 infinite product 248 inverse limit 248 Markov shift 246, 248 product transformation 247 rotation of the circle 245 skew product 254, 259 strictly ergodic system 273 subshift of finite type 273 convergence theorem for 241 infinite 259 of a K-automorphism 63 of a Lebesgue spectrum system 63 of a partition 232 of a source 231, 242, 259 of a transformation definition 233 convergence theorem for 247 of a transformation with respect to a partition 241, 243 definition 233 existence 240 inequality for 242 properties of 238 topological 264, 265, 266, 267 zero 244 entropy metric 244, 248 enveloping semigroup (see Ellis semigroup) equicontinuous 153, 154, 156 equidistribution 156 equilibrium 34 equimeasurable functions 114 equivalence of fillers 284, 289 Erdős, P. 162, 167 ergodic decomposition 81 ergodic Hilbert transform (see Hilbert transform, ergodic) ergodic hypothesis 3, 42, 186 ergodicity 3, 41, 42, 56, 58, 60, 61 conditions for 57 examples of 49-54 relation to topological ergodicity 152 topological 151, 152 Ergodic Szemerédi Theorem (see also Furstenberg-Katznelson Theorem) 163 Ergodic Theorem (individual or pointwise; see also Hurewicz, Chacon-Ornstein, Dominated, Hilbert transform, Local, Maximal, Mean, Random, and Shannon-McMillan-Breiman Theorems) 3, 23, 27, 30, 44, 52, 91, 92, 93, 103, 119, 198, 222, 262 approximation proof of 92 for Markov operators 33 expected return time 35, 46 extension 11 of a cascade 159 with respect to a group action 182 factor (see also homomorphism) 10, 21 of a Bernoulli shift 279 of a cascade 159, 162 with respect to a group action 182
Fathi, Albert 72 Fefferman, C. 89, 91 Feller, William 85 Feldman, J. 274 fiber entropy 254 fiber measures 183 fiber product 183 Fieldsteel, A. 211 filler 283 alphabet 283, 288 Bernoulli shift 284, 288 entropy 288 equivalence 289 Lemma 291, 299, 300 measure 283, 288 set (of a skeleton) 289 filling scheme 76, 119, 120, 122, 123, 126, 127, 129, 130 finitary 249, 273, 282 finite code 221 finitely determined 279 flow 1, 76, 150 flow built under a function 11, 21 Foguel, Shaul R. 33, 123, 127 Fourier conjugate 107 Fourier transform 20 Friedman, N. A. 59, 276, 279 Fundamental Theorem of Calculus 90, 91 Furstenberg, H. 55, 133, 151, 157, 159, 161, 163, 164, 165, 173, 186, 187, 194, 211 Furstenberg-Katznelson Theorem 165, 166, 185 Furstenberg Structure Theorem for distal cascades 160, 162 Furstenberg-Weiss Theorem 164, 170, 171 Gallavotti, Giovanni 279, 280 Garsia, Adriano M. 75, 89, 91 gauge (of a partition) 196 Gaussian system 8, 63 Gelfand, I. M. 189 Gelfand's problem 50
generalized discrete spectrum 55 generalized eigenfunction 55 generic 4, 72, 175, 274 generating semialgebra 14 generator 234, 243, 244, 248, 274 geodesic flow 9, 280
geodesic cascade 153 Girsanov, I. V. 63 Glasner, Shmuel 154, 155 Goodman, T. N. T. 264 Goodwyn, L. Wayne 264 Gottschalk, W. H. 9, 10, 136, 150, 161 Gottschalk's Theorem 168 Graham, Ronald L. 172 Grant, Edward 156, 157 Green, L. 10 Grillenberger, Christian 161, 187, 244, 258 group of automorphisms of [0, 1] 71 group rotation (see also rotations, translations, discrete spectrum) 64, 181 Grünwald's Theorem 192, 186 Gundy, R. F. 88, 89 Gurevich, B. M. 63 Gyldén 157 H-Theorem 35 Haar measure 20 Hahn, Frank 10, 154, 155, 186 Hajian, Arshag B. 41 Halmos, Paul R. 17, 34, 39, 48, 60, 64, 71, 72 Hamilton's equations 5 Hamiltonian function 5 Hamming metric 277 Hansel, G. 187 Hanson, D. L. 282 hard-sphere gas 42, 279 harmonic function 108 Harper, L. H. 284 Hartman, Philip 77 Hardy, G. H. 89, 90 Hardy-Littlewood Maximal Function 91 Hardy-Littlewood Maximal Theorem 90 Hardy spaces 89, 90 Hecke, E. 162 Hedlund, G. A. 9, 10, 161, 221 Herman, Michael R. 72 higher-degree mixing 63, 64, 73 higher-dimensional van der Waerden Theorem 172 Hilbert, D. 173 Hilbert transform ergodic 90, 108, 113, 116, 119 maximal inequality for 110, 113, 115 real-variable 90, 107, 108, 115 Hindman's Theorem 172, 173, 187, 195, 196 hole 120, 121, 128 homogeneous space 10 pair 168 subset 168 homomorphism (see also factor) 11, 21 of cascades 159 of measure algebras 15
Hopf, Eberhard 74, 75, 123 Hopf decomposition (see also conservative, dissipative parts) 123, 125 Hopf Maximal Ergodic Theorem 75, 123, 130, 131 horocycle cascade 153 horocycle flow 9, 63, 211 Horowitz, J. 103 Hurewicz' Ergodic Theorem 131
idempotent 158 incompressible transformation 38, 39, 126 independence 239, 242, 275 index set (of a skeleton) 289 induced transformation (see also derivative and primitive transformations) 12, 34, 39, 40, 41, 45, 56, 210, 257, 259 infinite product transformation (entropy of) 248 infinitely recurrent transformation 38, 39 information function (see also conditional information function) 235, 238, 243 invariant set 42, 125, 126 inverse limit 12, 21, 160 entropy 248 of cascades 160, 162 inversion formula 20 Ionescu Tulcea, Alexandra (see also Bellow) 260 IP-set 173, 195, 196 irreducible stochastic matrix 52, 59 Ising model 280 isometric extension 160 isomorphism 4, 16, 58, 60, 61, 63, 234, 246, 254, 273, 274, 275, 276, 277, 278, 279, 280, 301 Isomorphism Theorem for Lebesgue spaces 16, 71, 187 Jacobs, Konrad 12, 187, 301 Jensen's Inequality 18, 22 Jewett, Robert I. 186 Jewett-Krieger Theorem 175, 187, 188 join of open covers 264 of partitions 233, 237 joining 281, 293, 301 Jones, Lee Kenneth 70, 163 Jones, R. L. 89, 100 K-automorphism 62, 63, 187, 273 Kac, M. 46 Kac' Theorem 37, 46 Kakutani, Shizuo (see also von Neumann-Kakutani adding machine) n, 26, 27, 39, 41, 64, 70, 76, 82, 91, 94, 154, 163, 187, 210, 216 Kakutani equivalence 273
Kakutani Fixed Point Theorem 154 Kakutani-Hahn Theorem 155 Kakutani-Rokhlin Lemma (see also skyscraper, tower) 48, 94, 95 Kakutani skyscraper decomposition 45, 48 Katok, A. B. 72, 211 Katznelson, Yitzhak (see also Furstenberg-Katznelson Theorem) 27, 61, 133, 165, 186, 253, 279 Keane, Michael 211, 281, 282 Keplerian element 157 Keynes, Harvey B. 152, 153, 157, 211 Khintchine, A. L 6, 34, 37, 98, 162, 231 Khintchine Recurrence Theorem 37, 195 Kolmogorov, A. N. 4, 27, 62, 90, 94, 98, 152, 211, 227, 234, 246 Kolmogorov Consistency Theorem 22, 36 Kolmogorov-Sinai Theorem 234, 244, 248 Konheim, A. G. 264, 267, 268, 273 Koopman, B. O. 64, 65 Krengel, Ulrich 99, 258 Krickeberg Decomposition 104 Krieger, Wolfgang (see also Jewett-Krieger Theorem) 187 Krieger Generator Theorem 244 Kronecker, Leopold 158 Kronecker system 182 Kronecker-Weyl Theorem 250 Krylov, Nicolas 45 Kuipers, L. 161
Lagrange, J. L. 157 Law of the Iterated Logarithm 98 Law of Random Signs 94 least common refinement 233, 264 Lebesgue, H. 101 Lebesgue Differentiation Theorem 101, 102 Lebesgue set 102 Lebesgue space 16 Lebesgue spectrum 61, 62, 63 Ledrappier, F. 280 Lehrer, E. 211 Liberto, Francesco di 280 Lind, D. A. 22, 61, 279 Liouville number 94 Liouville's Theorem 5, 22, 186 Littlewood, J. E. 89, 90, 91 Local Ergodic Theorem 79, 100, 102, 114 Loomis, Lynn H. 107 Loosely Bernoulli 211, 274 Mackey, George W. 55 Marcus, Brian 76, 211, 282 markers 283 Marker Lemma 285 marker process 283 Markov, A. A. 42 Markov operator 33, 120, 121
Markov partition 252, 253 Markov process 119 Markov shift 7, 35, 52, 56, 60 entropy of 246, 248 ergodicity of 51 higher mixing of 64 isomorphism to Bernoulli shifts 276, 279 strong mixing of 59 weak Bernoulli property of 276 weak mixing of 64 Martin, Nathaniel F. G. 73 martingale 103 Martingale Convergence Theorem (see also Submartingale Convergence Theorem) 103, 262 marriage lemmas 284, 292, 295, 296, 297, 299 maximal almost periodic factor 182 maximal equality 74, 76, 80, 81, 85, 87 maximal function 89, 91, 119 eventual 119 ergodic 74, 75 ergodic Hilbert 113 Hardy-Littlewood 90, 100 Hilbert 107 nontangential 90 Maximal Ergodic Theorem 27, 31, 91, 92, 93 for operators 75, 130 for flows 76 maximal inequality 74, 76, 90, 91, 100, 103, 107, 110, 113, 115, 116, 260 maximum spectral type 19, 61 Maxwell-Boltzmann velocity distribution 228 McAndrew, M. H. 264, 267, 268, 273 McMillan, Brockway 260 Mean Ergodic Theorem 3, 23, 37, 176, 177 in Banach space 26 in Hilbert space 24 mean motion, problem of 157 mean sojourn time 44, 45 mean value (of almost-periodic sequence) 137, 139 measure algebra 15 measure algebra of a measure space 15 measure-preserving homeomorphism 72 measure-preserving transformation 2 Meshalkin, L. D. 282 metric indecomposability 42 metric space of a measure algebra 16 metrically prime 221 metrically transitive 42 Miles, G. 61, 279 minimal 42, 136, 150 mixing (see also higher mixing, strong mixing, topological mixing, uniform mixing, weak mixing) 73, 210
Misiurewicz, M. 269 monothetic group 154
monotone equivalence 274 Morse sequence (or set) 210, 273 Multidimensional Szemerédi Theorem 165 multiple recurrence (see also Furstenberg, Furstenberg-Katznelson, Furstenberg-Weiss, Ergodic Szemerédi Theorems) 46, 163, 164, 186 (n, ε)-separated 266 (n, ε)-spanning 267 natural extension 13, 21, 280 Nemytskii, V. V. 161 Neveu, J. 104, 258 Newton, D. 211 Niederreiter, H. 161 nonatomic 16 nonsingular 2, 41, 131 nonwandering set 151
one-parameter group (see also flow) 132 optional sampling 104, 105 orbit 2, 150 Oren, Ishai 211 Oresme, Nicole 156, 157 Ornstein, Donald S. 4, 27, 59, 63, 73, 88, 119, 211, 273, 274, 275, 276, 278, 279, 280, 284 oscillations 80, 82 oscillation set 80 Oxtoby, J. C. 45, 72
Parry, William 10, 55, 211 Parseval's Formula 149 Parthasarathy, K. L. 161 partition 232 countable 234, 242 distribution of 275 Petersen, Karl E. 55, 76, 87, 89, 94, 152, 153, 157, 187, 210, 211 phase space 5, 228 physical system 34 Pigeonhole Principle 157 Pinsker's formula 243 Plancherel Theorem 20 Plante, J. 6 Poincaré, Henri 161, 186 Poincaré Recurrence Theorem 34, 35, 37, 38, 151, 163, 229 point-distal 161 point homomorphism mod 0 17 pointwise almost periodic 154, 159 Poisson integral 108 Polit, Steve 233 Pontryagin Duality Theorem 20 positive contraction 89, 120 positive upper density 163
Prasad, V. S. 72 Prasjuk, O. S. 63 prime transformation 211, 216, 217 proximal 154, 158, 161, 173 product 11 Prohorov, Ju. V. 161 promiscuity number 293
quasi-discrete spectrum 55 quasi-eigenfunction 55 quasi-ergodic hypothesis 42 r-recurrent point 186 Rado, R. 162 Rahe, M. 211, 224 Random Ergodic Theorem 99 random walk 85, 86 rank decomposition (of a skeleton) 287 Raoult, J. P. 187 rationally independent 51 Ratner, Marina 211, 280 realization problem 4, 186 recurrence 4, 33,41 topological 150 regional 151
recurrent point 34, 151 transformation 38, 39 refinement of an open cover 264 of a partition 233 of a society 293 refining sequence of open covers 265 regional transitivity 152 relatively compact extension 184 relatively dense 37, 136, 150 relatively ergodic extension 183 relatively weakly mixing extension 183, 185 Rényi, A. )b return time 56
reverse maximal inequality 88 Riesz, Frédéric 77, 90, 91, 100 Rising Sun Lemma 77, 78, 85, 90 Robertson, James B. 152, 153 Rokhlin, V. A. (see also Kakutani-Rokhlin Lemma) 17, 63, 72, 95, 254 Rosenblatt, J. 94 Rost, H. 131 Rota, Gian-Carlo 284 rotations of compact abelian groups (see also group rotation, translation) 8, 133 rotation of the circle (see also group rotation, translation) 8, 49, 156, 157 Roth, Klaus 162 Rothschild, Bruce L. 172 Rudin, Walter 21
Rudolph, Daniel J. 211, 233, 273, 282 Russo, Lucio 280 Ryll-Nardzewski, Czeslaw 155 Ryll-Nardzewski Fixed Point Theorem 155 σ-ideal 15 sand 120, 121, 128 Sawyer, S. 91 Scheller, H. 258 Schreier, O. 162 Schur, Issai 162 Second Law of Thermodynamics 35, 229 semialgebra 14, 58 semilocal operator 114 separable measure algebra 16 Shannon, C. E. 227, 230, 232, 260 Shannon-McMillan-Breiman Theorem 245, 261, 263, 284, 285, 289, 291 Shapiro, Leonard 157, 210, 211 shift transformation 7 Shields, Paul C. 274, 280 Sierpinski, Waclaw 108 Sigmund, Karl 161, 244, 258 Silverstein, M. L. 89 Sinai, Ya. G. (see also Kolmogorov-Sinai Theorem) 4, 42, 234, 248, 249, 252, 279 singular integral 107 skeleton 283, 286 decomposition of 287 Skeleton Lemma 287 skew product 11, 21, 53, 55, 100, 254 skyscraper (see also Kakutani-Rokhlin Lemma, tower) 40, 45, 48, 200 Smorodinsky, Meir 211, 280, 281, 282 society 284, 292, 301 dual of 293 source 230, 231, 242, 259 space mean 42, 44 spectral measure 19, 69 Spectral Theorem 19, 65, 68, 133, 148 spectral type 55 speed of approximation 72 speed of convergence in Ergodic Theorem 90, 93 stationary stochastic process 6, 85, 87, 91, 98, 248 Stein, Elias 88, 89, 91, 107, 119 Stepanov, V. V. 161 Stepin, A. M. 72, 221 stochastic matrix 36, 51, 52, 59 stopping time 104, 105, 106 Strassen, V. 301 strictly ergodic (see also uniquely ergodic) 152, 156, 187, 194, 217 Strong Law of Large Numbers 3 strong mixing (see also higher mixing,
mixing) 4, 57, 58, 59, 60, 61, 62, 210, 211 failure of, with weak mixing 217, 219, 220 Ornstein's condition for 73 of a sequence of sets 73 topological 151, 152, 210, 212, 214 strong topology (on group of m.p.t.s) 71 subadditive sequence 240 sub-Markov operator 120 submartingale 91, 103 Submartingale Convergence Theorem 104, 106 subshift of finite type 153, 162, 273 subskeleton 286 supermartingale 103, 105, 106 supremum (of a family of factors) 185 Swanson, L. 221, 224 Symbolic cascades 153 SZ 185 Szemerédi, E. 162 Szemerédi Theorem 159, 163 Szemerédi-Furstenberg Theorem (see also Ergodic Szemerédi Theorem) 150 Tchebychef, P. L. 158 Thomas, R. K. 61, 279 three-series theorem 94 time mean 42, 44 Toeplitz sequence 209 topological dynamics 133, 150 topological entropy 264, 265 Bowen definition of 266, 267 of a symbolic cascade 266 topological ergodicity 151, 152 topological weak mixing 151 topological weak mixing without strong mixing 210, 212, 214 topological strong mixing 151 total function 189, 191, 202, 208 Totoki, Haruo 9 tower (see also skyscraper, Kakutani-Rokhlin Lemma) 197, 200 transition matrix 36, 51, 52, 59 transitivity metric 42 regional 152 translation on a compact group (see also group rotation, rotation) 153 translation of the torus (ergodicity of) 51 trigonometric polynomial 148 Turán, Paul 162 624 n-name 270 Ulam, S. M. 72, 99 uniform distribution mod 1, 156 uniform function 189, 190, 202, 208, 210
uniform mixing 64 uniformly almost periodic 136, 137, 154 uniquely ergodic (see also strictly ergodic) 42, 138, 152, 186, 187, 217 unitary operator determined by a m.p.t. 24, 43 upcrossing lemma 104 van der Waerden, Bartel L. 162 van der Waerden's Theorem 162, 167, 191, 172 Varadarajan, V. S. 161 Varadhan, S. 133 Varga, Richard S. 7 Variational Principle 264, 268, 269, 273 Veech, William A. 160, 161 very weakly Bernoulli 278, 279 Ville, Jean 91 virtual group 55 von Neumann, John 3, 17, 23, 55, 64, 65, 80, 99, 139, 188 von Neumann-Kakutani adding machine 209, 211 von Neumann's Theorems on Lebesgue spaces 17, 71 Walters, Peter 267 wandering point 151 wandering set 38, 41, 126 Weak Law of Large Numbers 285 weak mixing (see also mixing) 57, 58, 64, 65, 152
and density 0 73 and eigenfunctions 64, 72 characterizations of 65, 180 of higher degree 151, 163, 179 of roots and powers 72 topological 151, 152, 180, 210, 212, 214 without strong mixing 71, 210, 212, 214, 217, 219, 220 weak topology (on group of m.p.t.s) 71, 73 weak-type (1, 1) inequality (see also maximal inequality, Maximal Ergodic Theorem) 113, 115, 119 weak-type (p, q) inequality 91 weakly Bernoulli 276, 278, 279 weakly wandering set 41 Weiss, Benjamin (see also Furstenberg-Weiss Theorem) 151, 164, 173, 187, 253, 279, 280 Weiss, Guido 107 Weyl, Hermann (see also Kronecker-Weyl Theorem) 50, 149, 156, 158 White, H. E., Jr. 72 Wiener, Norbert 27, 75, 76, 87, 88, 91, 102, 108, 114 Wright, Fred B. 39, 46 Yosida, Kôsaku 25, 27, 76, 91
Zermelo, E. 35 zero entropy 244 Zimmer, Robert J. 55 Zygmund, A. 107