Large Deviations
This is Volume 137 in PURE AND APPLIED MATHEMATICS
H. Bass, A. Borel, J. Moser, S.-T. Yau, editors Paul A. Smith and Samuel Eilenberg, founding editors A complete list of titles in this series appears at the end of this volume.
Large Deviations Jean-Dominique Deuschel Department of Mathematics Cornell University Ithaca, New York
Daniel W. Stroock Department of Mathematics Massachusetts Institute of Technology Cambridge, Massachusetts
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers Boston San Diego New York Berkeley London Sydney Tokyo Toronto
Copyright © 1989 by Academic Press, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101 United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD. 24-28 Oval Road. London NW1 7DX
Library of Congress Cataloging-in-Publication Data Deuschel, Jean-Dominique, Date Large deviations / Jean-Dominique Deuschel, Daniel W. Stroock. p. cm. - (Pure and applied mathematics; v. 137) Rev. ed. of An introduction to the theory of large deviations / D.W. Stroock. c1984. Bibliography: p. Includes index. ISBN 0-12-213150-9 1. Large deviations. I. Stroock, Daniel W. Introduction to the theory of large deviations. II. Title. III. Series: Pure and applied mathematics (Academic Press); 137. QA3.P8 vol. 137 [QA273.67] 89-397 510 s-dc19 CIP [519.5'34]
Printed in the United States of America 89 90 91 92 9 8 7 6 5 4 3 2 1
For
Monroe D. Donsker who has always liked it best in function space
Preface

The title of this book to the contrary notwithstanding, there is no more a “theory” of large deviations than there is a “theory” of partial differential equations; and what passes for the “theory” is, in reality, little more than a grab-bag of techniques which have been successfully applied to special situations and are therefore worth trying in sufficiently closely related settings. Thus, even though the title implies that a master key is contained herein, the reader will discover that reading this book prepares him to analyze large deviations in the same sense as the manual for his computer prepared him to write his first program; that is, hardly at all!

In spite of the preceding admission, we have written this book in the belief that even (and, perhaps, particularly) when a field possesses no “CAUCHY integral formula,” a useful purpose can be served by a book which surveys a few outstanding successes and attempts to codify some of the principles on which those successes are based. In the present case, the examples of success are plentiful but the underlying principles are few and somewhat elusive. We hope that the brief synopsis given below will help the reader spot and understand these few principles, at least in so far as we have recognized and understood them ourselves.

After attempting, in Section 1.1, a heuristic explanation of the ideas on which the theory of large deviations rests, the remainder of Chapter I is devoted to a detailed account of two basic examples. The first of these, which is the content of Section 1.2, is CRAMÉR’S renowned theorem on the large deviations of the CESÀRO means of independent ℝ-valued random variables from the Law of Large Numbers. In order to emphasize, as soon as possible, that large deviations can be successfully analyzed even in an infinite dimensional context, for our second example we have chosen
SCHILDER’S Theorem for re-scaled WIENER’S measure. The derivation is carried out in Section 1.3, and applications, first to STRASSEN’S Law of the Iterated Logarithm and second to the estimates of VENTCEL and FREIDLIN, are given in Section 1.4. In connection with the VENTCEL-FREIDLIN estimates, we have assumed that the reader is familiar with the elements of ITÔ’S theory of stochastic differential equations; however, because the rest of the book relies on neither the contents of Section 1.4 nor a knowledge of ITÔ calculus, readers who are not acquainted with the quirks of stochastic integration need not (on that account) be too concerned about what lies ahead.

Armed with the examples from Chapter I, we turn in Chapter II to the formulation of two of the guiding principles on which the rest of the book is more or less based. The first of these is contained in Lemma 2.1.4 which provides a reasonably general statement of the “covariant” nature of large deviations results under mappings which are sufficiently continuous. (The treatment given in Section 1.4 of the VENTCEL-FREIDLIN estimates should be ample evidence of the potential power of this principle.) In order to formulate the second general principle set forth in this chapter, we start in Section 2.1 with VARADHAN’S version of the LAPLACE asymptotic formula (cf. Theorem 2.1.10) and combine this in Section 2.2 with a little elementary convex analysis to arrive at the conclusion (drawn in Theorem 2.2.21) that when large deviations are governed by a convex rate function then that rate function must be the LEGENDRE transform of the logarithmic moment generating function.
Since, as we saw in Chapter I, the rate functions produced in both CRAMÉR’S and SCHILDER’S Theorems are in fact LEGENDRE transforms of the corresponding logarithmic moment generating functions, this observation leads one to guess that there may be circumstances in which the easiest approach to large deviation results will consist of two steps: one being an abstract existential proof that the large deviations are governed by a convex rate function and the second being the “computation” of a LEGENDRE transform. (Such a procedure is reminiscent of the time-honored technique to describe the solution to a partial differential equation by first invoking some abstract existence principle and only then trying to actually say something concrete about its properties.)

The contents of Chapters III and IV may be viewed as a sequence of examples to which the principles developed in Chapter II can be applied. In Chapter III, all the examples concern partial sums of independent random variables. After introducing, in Section 3.1, a general argument (cf. Theorem 3.1.6 and its Corollary 3.1.7) for carrying out an abstract existential
proof that large deviation results for such sums are governed by convex rate functions, we return in the rest of the chapter to CRAMÉR’S Theorem; this time in its full glory as a statement about random variables taking values either in a space of probability measures or in a BANACH space. Thus, Section 3.2 contains a proof of SANOV’S Theorem (cf. Theorem 3.2.17) for empirical distributions; and Section 3.3 is devoted to the BANACH space version of CRAMÉR’S Theorem. (In connection with the derivation of these results, we introduce in Lemma 3.2.7 a somewhat technical mini-principle which turns out to play an important role throughout the rest of the book.) Finally, in Section 3.4, we show that SCHILDER’S Theorem is a special case of the BANACH space statement of CRAMÉR’S Theorem and, in fact, that a SCHILDER-like result can be proved for general GAUSSian measures.
As we said before, Chapter IV is again an application of the principles laid down in Chapter II. In particular, we now take up the study of SANOV-type theorems for MARKOV processes which do not necessarily have independent increments. In order to make the development here mimic the one in Chapter III, we impose extremely strong hypotheses to guarantee that the processes with which we are dealing possess ergodic properties which are nearly as good as those possessed by processes with independent increments. As a result, basically the same ideas as those in Chapter III apply to nice additive functionals of such processes and allow us to prove (cf. Theorems 4.1.14 and 4.2.16) that these functionals have large deviations which are governed by a convex rate function. In particular, after identifying the rate functions involved, we use these considerations to obtain a variant of the original DONSKER-VARADHAN theory for the large deviations of the normalized occupation time distribution (i.e., the empirical distribution of the position) of a MARKOV process (cf. Theorems 4.1.43 and 4.2.43). Because it is technically the simpler, we do MARKOV chains (i.e., MARKOV processes with a discrete time-parameter) in Section 4.1 and move to the continuous-time setting in Section 4.2; and in Section 4.4 we show how, under the hypotheses used in Sections 4.1 and 4.2, one can realize the large deviation theory for the empirical distribution of the whole process as the projective limit of the theory for the position. Section 4.3, which is somewhat a digression from the main theme and should probably be skipped on first reading, contains DONSKER and VARADHAN’S analysis of the WIENER sausage problem.

To some extent, Chapter V represents a retreat from the pattern set in Chapters III and IV and a return to the more “hands-on” approach of Chapter I. Thus, just as in Chapter I, the approach in Chapter V is to first
get an upper bound, basically as an application of CHEBYSHEV’S inequality; then a lower bound via ergodic considerations; and finally a reconciliation of the two. A rather general treatment of the upper bound is given in Section 5.1, where, in Theorem 5.1.6 and Corollary 5.1.11, we sharpen results obtained earlier in Theorem 2.2.4. In preparation for the derivation of the lower bound, we digress in Section 5.2 and give a brief résumé of a few more or less familiar results from ergodic theory. As a first application of these considerations, we present in Section 5.3 a very general large deviation result for the empirical distribution of the position of a symmetric MARKOV process (cf. Theorem 5.3.10). Our second application is the content of Section 5.4, where we prove CHIYONOBU and KUSUOKA’S recent theorem about the process level large deviations of a (not necessarily MARKOV) hypermixing process (cf. Theorem 5.4.27); and, in Section 5.5, we discuss the hypermixing property for processes which are ε-MARKOV.

The motivation behind Chapter V has been our desire to get away from the extremely strong ergodic assumptions on which the techniques in Chapters III and IV depend and to replace them with assumptions which have a better chance of holding in either non-compact or infinite dimensional situations. In order to test and compare the scope of the various techniques which are contained in Chapters IV and V, we describe in Chapter VI some analytic results with which one can see, at least in the context of diffusion processes, the relative position of these results as measured on the scale of elliptic coercivity.

The contents of Chapters I through IV constitute a reasonably thorough introduction to the basic ideas of the theory and more or less record lectures given by the second author during the fall of 1987.
Thus, we consider these four chapters as a suitable package on which to base a semester length course for advanced graduate students with a strong background in analysis and some knowledge of probability theory. In this connection, we point out that each section ends with a large selection of exercises. Although some of these exercises are quite routine and do not require any particular ingenuity on the part of the student, others are more demanding. Indeed, we have not hesitated to include in the exercises a good deal of important material. In particular, it is only in the exercises that one can find most of the applications. Finally, a word about the history of this book may be in order. In 1983, the second author gave a course, at the University of Colorado, in which he taught himself and one or two others something about the modern theory
of large deviations. Having expended considerable effort on the task, he decided to set down everything which he then knew about the subject in a little book [101]. That was five years ago. In the intervening years, both the subject as well as his understanding of it have grown; and, with the aid and comfort provided by a fellow sufferer, he took on the more ambitious project of basing a full blown exposition on the course which he gave in fall of 1987 at M.I.T. Thus, the present book is a great deal longer: both because it contains more material and because the exposition is more detailed. Unfortunately, in the process of removing some of the more glaring imperfections and omissions in [101], we are confident that we have introduced a sufficient number of new flaws to keep our readers somewhat annoyed and, occasionally, thoroughly confounded. However, the responsibility for these flaws is entirely ours and not that of the ever patient students in 18.158, who struggled with the class notes out of which this final version evolved. In particular, we take this opportunity to thank STEVE FROMM for goading us into addressing several of the more perplexing inanities in those class notes. Also, we are indebted to MICHAEL SHARPE who saved us many harrowing hours manipulating TeX into doing our bidding (cf. the similarity between the format, if not the content, of the present volume and volume # 133 in the same series); and, last but not least, it is a pleasure for us to thank our typist for Eir beautiful work.

Cambridge, MA
December 31, 1988
Contents

Chapter I: Some Examples
1.1: The General Idea
1.2: The Classical CRAMÉR Theorem
1.3: SCHILDER’S Theorem
1.4: Two Applications of SCHILDER’S Theorem

Chapter II: Some Generalities
2.1: The Large Deviation Principle
2.2: Large Deviations and Convex Analysis

Chapter III: Generalized CRAMÉR Theory
3.1: Preliminary Formulation
3.2: SANOV’S Theorem
3.3: CRAMÉR’S Theorem for BANACH Spaces
3.4: Large Deviations for GAUSSian Measures

Chapter IV: Uniform Large Deviations
4.1: MARKOV Chains
4.2: Continuous Time MARKOV Processes
4.3: The WIENER Sausage
4.4: Process Level Large Deviations

Chapter V: Non-Uniform Results
5.1: Generalities about the Upper Bound
5.2: A Little Ergodic Theory
5.3: The General Symmetric MARKOV Case
5.4: The Large Deviation Principle for Hypermixing Processes
5.5: Hypermixing in the Epsilon MARKOV Case

Chapter VI: Analytic Considerations
6.1: When is a MARKOV Process Hypermixing?
6.2: Symmetric Diffusions on a Manifold
6.3: Hypoelliptic Diffusions on a Compact Manifold

Historical Notes and References

Notation Index

Subject Index
I
Some Examples
1.1 The General Idea
Let $E$ be a Polish space (i.e., a complete, separable metric space) and suppose that $\{\mu_\epsilon : \epsilon > 0\}$ is a family of probability measures on $E$ with the property that $\mu_\epsilon \Rightarrow \delta_p$ as $\epsilon \to 0$ for some $p \in E$ (i.e., $\mu_\epsilon$ tends weakly to the point mass $\delta_p$). Then, for each open set $U \ni p$, we have that $\mu_\epsilon(U^c) \to 0$; and so we can reasonably say that, as $\epsilon \to 0$, the measures $\mu_\epsilon$ “see $p$ as being typical.” Equivalently, one can say that events $\Gamma \subseteq E$ lying outside of a neighborhood of $p$ describe increasingly “deviant” behavior. What is often an important and interesting problem is the determination of just how “deviant” a particular event is. That is, given an event $\Gamma$ for which $p \notin \overline{\Gamma}$, one wants to know the rate at which $\mu_\epsilon(\Gamma)$ is tending to 0. In general, a detailed answer to this question is seldom available. However, if one restricts one's attention to events which are “very deviant” in the sense that $\mu_\epsilon(\Gamma)$ goes to zero exponentially fast and if one only asks about the exponential rate, then one has a much better chance of finding a solution and one is studying the large deviations of the family $\{\mu_\epsilon : \epsilon > 0\}$.

In order to understand why the analysis of large deviations ought to be relatively easy and what one should expect such an analysis to yield, consider the case in which all of the measures $\mu_\epsilon$ are absolutely continuous with respect to some fixed reference measure $m$. Since $\mu_\epsilon \Rightarrow \delta_p$, it is reasonable to suppose that

$$\frac{d\mu_\epsilon}{dm} = g_\epsilon \exp[-I/\epsilon]$$

where $\epsilon \log g_\epsilon \to 0$ uniformly fast as $\epsilon \to 0$ and $I$ is a non-negative function which vanishes only at the point $p$. One then has, for any $\Gamma$ with $m(\Gamma) < \infty$,

$$\mu_\epsilon(\Gamma)^{\epsilon} = \Big(\int_\Gamma g_\epsilon \exp[-I/\epsilon]\,dm\Big)^{\epsilon} = e^{o(1)}\,\big\|\exp[-I]\big\|_{L^{1/\epsilon}(\Gamma,m)}$$

and so (since $m(\Gamma) < \infty$)

$$\mu_\epsilon(\Gamma)^{\epsilon} \to \operatorname{ess\,sup}\{\exp[-I(q)] : q \in \Gamma\}$$

as $\epsilon \to 0$. (The “essential” here refers to the measure $m$.) Hence, in the situation described above, we have, at least when $m(\Gamma) < \infty$:

(1.1.1) $$\lim_{\epsilon \to 0} \epsilon \log \mu_\epsilon(\Gamma) = -\operatorname{ess\,inf}\{I(q) : q \in \Gamma\}.$$
In particular, the factor $g_\epsilon$ plays no role in the analysis of large deviations; and it is this fact which accounts for the relative simplicity of this sort of analysis. Moreover, it is often easy to extend (1.1.1) to cover all $\Gamma$'s. For instance, such an extension can certainly be made if one knows that for each $L > 0$ there is a $\Gamma_L$ such that

(1.1.2) $$m(\Gamma_L) < \infty \quad\text{and}\quad \limsup_{\epsilon \to 0} \epsilon \log\big(\mu_\epsilon(\Gamma_L^c)\big) \le -L.$$

In particular, we see that if $E = \mathbb{R}^d$, $\lambda_{\mathbb{R}^d}$ is LEBESGUE’S measure on $\mathbb{R}^d$, and

(1.1.3) $$\gamma_\epsilon(dq) = (2\pi\epsilon)^{-d/2} \exp\big[-|q|^2/(2\epsilon)\big]\,\lambda_{\mathbb{R}^d}(dq),$$

then

(1.1.4) $$\lim_{\epsilon \to 0} \epsilon \log\big(\gamma_\epsilon(\Gamma)\big) = -\operatorname{ess\,inf}\{|q|^2/2 : q \in \Gamma\}$$

for all measurable $\Gamma$ in $\mathbb{R}^d$.

Although the preceding gives some insight into the phenomena of large deviations, it relies entirely on the existence of the reference measure $m$ and therefore does not apply to many situations of interest (e.g., it will nearly never apply when $E$ is an infinite dimensional space). When there is no reference measure, it is clear that (1.1.1) has got to be replaced by an expression in which $m$ does not appear. Taking a hint from the theory of weak convergence, one is tempted to guess that a reasonable replacement
for (1.1.1) in more general situations is the statement that there exists a function $I : E \to [0, \infty]$ with the property that

(1.1.5) $$-\inf_{\Gamma^\circ} I \le \liminf_{\epsilon \to 0} \epsilon \log\big(\mu_\epsilon(\Gamma)\big) \le \limsup_{\epsilon \to 0} \epsilon \log\big(\mu_\epsilon(\Gamma)\big) \le -\inf_{\overline{\Gamma}} I.$$

For instance, it is easy to pass from (1.1.4) to (1.1.5) with $\mu_\epsilon = \gamma_\epsilon$ and $I(q) = |q|^2/2$.

With the preceding in mind, we will adopt the attitude that the study of large deviations for $\{\mu_\epsilon : \epsilon > 0\}$ centers around the identification of an appropriate $I$ for which (1.1.5) holds. Before attempting to lay out a general strategy, we will begin by presenting two classical cases in which such a program can be successfully carried to completion.

1.1.6 Exercise.
Let $E = [0, \infty)$ and define
for $\epsilon \in (0, \infty)$. Show that (1.1.5) holds with $I(q) = q$, $q \in [0, \infty)$.
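As a numerical illustration of (1.1.4) in the simplest case $d = 1$ and $\Gamma = [a, \infty)$, the following sketch evaluates $\epsilon \log \gamma_\epsilon([a,\infty))$ through the complementary error function and compares it with $-a^2/2$. It assumes that $\gamma_\epsilon$ is the centered Gaussian measure on $\mathbb{R}$ with variance $\epsilon$; the function name `scaled_log_tail` is ours and is used only for illustration.

```python
import math


def scaled_log_tail(a: float, eps: float) -> float:
    """Return eps * log(gamma_eps([a, inf))) for the centered Gaussian
    measure gamma_eps on R with variance eps (an assumed normalization)."""
    # For X ~ N(0, eps): P(X >= a) = 0.5 * erfc(a / sqrt(2 * eps)).
    tail = 0.5 * math.erfc(a / math.sqrt(2.0 * eps))
    return eps * math.log(tail)


if __name__ == "__main__":
    a = 1.0
    for eps in (0.1, 0.01, 0.002):
        # The second column should drift toward the third as eps shrinks.
        print(eps, scaled_log_tail(a, eps), -a * a / 2.0)
```

For very small values of `eps` the tail probability underflows in double precision, so the sketch is only meant for moderate values such as those shown.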
1.2 The Classical Cramér Theorem
Let $\mu$ be a probability measure on $\mathbb{R}$ and, for $n \ge 1$, let $\mu^n$ on $\mathbb{R}^n$ denote the $n$-fold tensor product of $\mu$ with itself. Next, let $\mu_n$ on $\mathbb{R}$ denote the distribution of $x \in \mathbb{R}^n \mapsto \frac{1}{n}\sum_{i=1}^n x_i$ under $\mu^n$. Assuming that $\int_{\mathbb{R}} |x|\,\mu(dx) < \infty$, the weak law of large numbers says that $\mu_n \Rightarrow \delta_p$, where $p = \int_{\mathbb{R}} x\,\mu(dx)$. Thus, $\{\mu_n : n \ge 1\}$ is a candidate for a theory of large deviations (take $\mu_\epsilon = \mu_n$ for $n - 1 < 1/\epsilon \le n$ in order to make the notation here conform with that in Section 1.1). Moreover, in the case when $\mu(dx) = \gamma_1(dx)$ (cf. (1.1.3) and take the $d$ there to be 1), we have that $\mu_n = \gamma_{1/n}$. Hence, at least for this special case, we know the theory of large deviations. Namely, we know that we can take $I(x) = |x|^2/2$. The purpose of the present section is to find the large deviation theory for other choices of $\mu$.

We begin our program by introducing the logarithmic moment generating function

(1.2.1) $$\Lambda_\mu(\lambda) \equiv \log\Big(\int_{\mathbb{R}} \exp[\lambda x]\,\mu(dx)\Big), \quad \lambda \in \mathbb{R}.$$

Note that $\lambda \in \mathbb{R} \mapsto \Lambda_\mu(\lambda) \in (-\infty, \infty]$ is a lower semi-continuous convex function. Indeed, by truncation, it is easy to write $\Lambda_\mu$ as the non-decreasing limit of smooth functions, and the convexity of $\Lambda_\mu$ follows from HÖLDER’S inequality. Next, let $\Lambda_\mu^*$ be the LEGENDRE transform of $\Lambda_\mu$:

(1.2.2) $$\Lambda_\mu^*(x) \equiv \sup\{\lambda x - \Lambda_\mu(\lambda) : \lambda \in \mathbb{R}\}, \quad x \in \mathbb{R}.$$
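Note that, by its definition as the point-wise supremum of linear functions, $\Lambda_\mu^*$ is necessarily lower semi-continuous and convex. In order to develop some feeling for the relationship between $\Lambda_\mu$, $\Lambda_\mu^*$, and $\mu$, we present the following elementary lemma.

1.2.3 Lemma. Let $\mu$ be a probability measure on $\mathbb{R}$. Then $\Lambda_\mu^* \ge 0$. Moreover:

(i) If $\int_{\mathbb{R}} |x|\,\mu(dx) < \infty$ and $p = \int_{\mathbb{R}} x\,\mu(dx)$, then $\Lambda_\mu^*(p) = 0$, $\Lambda_\mu^*$ is non-decreasing on $[p, \infty)$ and non-increasing on $(-\infty, p]$. In addition, for $q \ge p$, $\Lambda_\mu^*(q) = \sup\{\lambda q - \Lambda_\mu(\lambda) : \lambda \ge 0\}$ and $\mu([q, \infty)) \le \exp[-\Lambda_\mu^*(q)]$; and, for $q \le p$, $\Lambda_\mu^*(q) = \sup\{\lambda q - \Lambda_\mu(\lambda) : \lambda \le 0\}$ and $\mu((-\infty, q]) \le \exp[-\Lambda_\mu^*(q)]$.

(ii) If $\Lambda_\mu(\lambda) < \infty$ for all $\lambda$'s in a neighborhood of 0, then $\Lambda_\mu^*(x) \to \infty$ as $|x| \to \infty$.

(iii) If $\Lambda_\mu(\lambda) < \infty$ for all $\lambda \in \mathbb{R}$, then $\Lambda_\mu \in C^\infty(\mathbb{R})$ and $\Lambda_\mu^*(x)/|x| \to \infty$ as $|x| \to \infty$.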
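PROOF: We begin by noting that, since $\lambda x - \Lambda_\mu(\lambda) = 0$ for $\lambda = 0$ and every $x \in \mathbb{R}$, $\Lambda_\mu^* \ge 0$. Now suppose that $\int_{\mathbb{R}} |x|\,\mu(dx) < \infty$ and set $p = \int_{\mathbb{R}} x\,\mu(dx)$. To see that $\Lambda_\mu^*(p) = 0$, we use JENSEN’S inequality to obtain

(1.2.4) $$\Lambda_\mu(\lambda) \ge \lambda p \quad\text{for all } \lambda \in \mathbb{R}.$$

In particular, this shows that $\lambda p - \Lambda_\mu(\lambda) \le 0$ for all $\lambda \in \mathbb{R}$ and so $\Lambda_\mu^*(p) \le 0$. Since $\Lambda_\mu^*$ is non-negative and convex, this proves that $\Lambda_\mu^*(p) = 0$, $\Lambda_\mu^*$ is non-decreasing on $[p, \infty)$, and $\Lambda_\mu^*$ is non-increasing on $(-\infty, p]$. To complete the proof of (i), we first note that, as a consequence of (1.2.4), if $q \ge p$ then $\Lambda_\mu^*(q) = \sup\{\lambda q - \Lambda_\mu(\lambda) : \lambda \ge 0\}$ and if $q \le p$ then $\Lambda_\mu^*(q) = \sup\{\lambda q - \Lambda_\mu(\lambda) : \lambda \le 0\}$. Hence, if $q \ge p$, then, since (by CHEBYSHEV’S inequality)

$$\mu([q, \infty)) \le \exp\big[-(\lambda q - \Lambda_\mu(\lambda))\big], \quad \lambda \ge 0,$$

we see that

$$\mu([q, \infty)) \le \exp[-\Lambda_\mu^*(q)].$$

Similarly, if $q \le p$, then

$$\mu((-\infty, q]) \le \exp[-\Lambda_\mu^*(q)].$$

We next turn to the proof of (ii) and (iii). To this end, note that if $\lambda > 0$ ($\lambda < 0$) and $\Lambda_\mu(\lambda) < \infty$, then $\liminf_{x \to \infty} \Lambda_\mu^*(x)/x \ge \lambda$ ($\limsup_{x \to -\infty} \Lambda_\mu^*(x)/x \le \lambda$). Hence, the only assertion left to be proved is that $\Lambda_\mu \in C^\infty(\mathbb{R})$ if $\Lambda_\mu(\lambda) < \infty$ for all $\lambda$. But, by TAYLOR’S Theorem and the LEBESGUE Dominated Convergence Theorem, it is easy to check that $\lambda \in (-\delta, \delta) \mapsto \Lambda_\mu(\lambda)$ is, in fact, real-analytic as long as $\Lambda_\mu(\pm\delta) < \infty$. ∎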
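As a consequence of part (i) of Lemma 1.2.3 we have the following.

1.2.5 Lemma. If $\int_{\mathbb{R}} |x|\,\mu(dx) < \infty$ then for every closed set $F \subseteq \mathbb{R}$

$$\limsup_{n \to \infty} \frac{1}{n} \log\big(\mu_n(F)\big) \le -\inf_F \Lambda_\mu^*.$$

PROOF: Let $p = \int_{\mathbb{R}} x\,\mu(dx)$ and note that $\int_{\mathbb{R}} |x|\,\mu_n(dx) \le \int_{\mathbb{R}} |x|\,\mu(dx) < \infty$ and $\int_{\mathbb{R}} x\,\mu_n(dx) = p$ for all $n \ge 1$. Next, observe that if $\Lambda_n = \Lambda_{\mu_n}$, then $\Lambda_n(\lambda) = n\Lambda_\mu(\lambda/n)$, and therefore that $\Lambda_n^* = n\Lambda_\mu^*$. Now suppose that $q \ge p$ ($q \le p$). Then, by (i) applied to $\mu_n$, we see that $\mu_n([q, \infty)) \le \exp[-n\Lambda_\mu^*(q)]$ ($\mu_n((-\infty, q]) \le \exp[-n\Lambda_\mu^*(q)]$). Since $\Lambda_\mu^*$ is non-decreasing (non-increasing) on $[p, \infty)$ (on $(-\infty, p]$), this proves the result when either $F \subseteq [p, \infty)$ or $F \subseteq (-\infty, p]$. On the other hand, if both $F \cap [p, \infty) \ne \emptyset$ and $F \cap (-\infty, p] \ne \emptyset$, let $q_+ = \inf\{x \ge p : x \in F\}$ and $q_- = \sup\{x \le p : x \in F\}$. Then

$$\mu_n(F) \le \exp[-n\Lambda_\mu^*(q_-)] + \exp[-n\Lambda_\mu^*(q_+)] \le 2\exp\big[-n \inf_F \Lambda_\mu^*\big],$$

and so the result holds in this case also.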
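The upper bound in Lemma 1.2.5 can be checked exactly in a case where everything is computable. The sketch below takes $\mu$ to be the fair-coin measure $\frac{1}{2}(\delta_0 + \delta_1)$, an example chosen by us rather than taken from the text, for which $\Lambda_\mu(\lambda) = \log((1 + e^\lambda)/2)$ and $\Lambda_\mu^*(q) = q\log q + (1-q)\log(1-q) + \log 2$ on $[0,1]$. The names `rate_fair_coin` and `upper_tail` are illustrative only.

```python
import math


def rate_fair_coin(q: float) -> float:
    """Legendre transform of Lambda(lam) = log((1 + e^lam) / 2) at q in [0, 1],
    with the convention 0 log 0 = 0."""
    def xlogx(x: float) -> float:
        return 0.0 if x == 0.0 else x * math.log(x)
    return xlogx(q) + xlogx(1.0 - q) + math.log(2.0)


def upper_tail(n: int, k_min: int) -> float:
    """Exact value of mu_n([k_min / n, inf)): the probability of at least
    k_min heads in n tosses of a fair coin."""
    favourable = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return favourable / 2 ** n


if __name__ == "__main__":
    q = 0.75  # exactly representable, so n * q is exact for n divisible by 4
    for n in (4, 40, 400, 2000):
        tail = upper_tail(n, 3 * n // 4)
        bound = math.exp(-n * rate_fair_coin(q))
        # tail never exceeds bound, and -log(tail)/n approaches the rate.
        print(n, tail, bound, -math.log(tail) / n, rate_fair_coin(q))
```

The threshold is passed as an integer count so that no floating-point rounding enters the definition of the event.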
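1.2.6 Theorem. (CRAMÉR) Assume that $\Lambda_\mu(\lambda) < \infty$ for every $\lambda \in \mathbb{R}$. Then for every measurable $\Gamma \subseteq \mathbb{R}$ one has that

$$-\inf_{\Gamma^\circ} \Lambda_\mu^* \le \liminf_{n \to \infty} \frac{1}{n}\log\big(\mu_n(\Gamma)\big) \le \limsup_{n \to \infty} \frac{1}{n}\log\big(\mu_n(\Gamma)\big) \le -\inf_{\overline{\Gamma}} \Lambda_\mu^*.$$

(We adopt here, and throughout, the convention that the infimum over the null set is $+\infty$.)

PROOF: In view of Lemma 1.2.5, we need only show that if $q \in \mathbb{R}$ and $\delta > 0$, then

(1.2.7) $$\liminf_{n \to \infty} \frac{1}{n}\log\big[\mu_n((q - \delta, q + \delta))\big] \ge -\Lambda_\mu^*(q).$$

In proving (1.2.7), we first suppose that there is a $\lambda \in \mathbb{R}$ for which $\Lambda_\mu^*(q) = \lambda q - \Lambda_\mu(\lambda)$. Consider the probability measure

$$\tilde\mu(dx) = \exp[\lambda x - \Lambda_\mu(\lambda)]\,\mu(dx)$$

and define the measures $\tilde\mu_n$ accordingly. Note that $\int_{\mathbb{R}} |x|\,\tilde\mu(dx) < \infty$ and that

$$\int_{\mathbb{R}} x\,\tilde\mu(dx) = \Lambda_\mu'(\lambda).$$

At the same time, note that $\frac{d}{dt}\big(tq - \Lambda_\mu(t)\big)\big|_{t=\lambda} = 0$ since $t \in \mathbb{R} \mapsto tq - \Lambda_\mu(t)$ achieves its maximum value at $\lambda$. Combining these, we conclude that $q = \int_{\mathbb{R}} x\,\tilde\mu(dx)$ and therefore (by the Weak Law of Large Numbers) that $\tilde\mu_n((q - \delta, q + \delta)) \to 1$ as $n \to \infty$. Assuming that $\lambda \ge 0$, note that

$$\mu_n((q - \delta, q + \delta)) = \int_{\Delta_n} \exp\Big[-\lambda \sum_{k=1}^n x_k + n\Lambda_\mu(\lambda)\Big]\,\tilde\mu^n(dx) \ge \exp\big[-n\big(\lambda(q + \delta) - \Lambda_\mu(\lambda)\big)\big]\,\tilde\mu_n((q - \delta, q + \delta)),$$

where

$$\Delta_n = \Big\{x \in \mathbb{R}^n : \frac{1}{n}\sum_{k=1}^n x_k \in (q - \delta, q + \delta)\Big\}.$$

From this and the preceding comments, we conclude that

$$\liminf_{n \to \infty} \frac{1}{n}\log\big[\mu_n((q - \delta, q + \delta))\big] \ge -\Lambda_\mu^*(q) - \lambda\delta$$

for every $\delta > 0$. Since the left hand side of the above is clearly nondecreasing as a function of $\delta > 0$, we have now proved (1.2.7) for the case when there is a $\lambda \ge 0$ for which $\Lambda_\mu^*(q) = \lambda q - \Lambda_\mu(\lambda)$. Clearly, the same argument (with $q - \delta$ replacing $q + \delta$) works when $\Lambda_\mu^*(q) = \lambda q - \Lambda_\mu(\lambda)$ for some $\lambda \le 0$.

We must now handle the case in which $\Lambda_\mu^*(q) > \lambda q - \Lambda_\mu(\lambda)$ for all $\lambda \in \mathbb{R}$. If $q \ge \int_{\mathbb{R}} x\,\mu(dx)$, then (cf. (i) of Lemma 1.2.3) there exists a sequence $\lambda_\ell \nearrow \infty$ such that $\lambda_\ell q - \Lambda_\mu(\lambda_\ell) \nearrow \Lambda_\mu^*(q)$. Since it is clear that

$$\int_{(-\infty, q)} \exp[\lambda_\ell(x - q)]\,\mu(dx) \to 0 \quad\text{as } \ell \to \infty,$$

we have that

$$\lim_{\ell \to \infty} \int_{[q, \infty)} \exp[\lambda_\ell(x - q)]\,\mu(dx) = \exp[-\Lambda_\mu^*(q)].$$

But this is possible only if $\mu((q, \infty)) = 0$ and $\mu(\{q\}) = \exp[-\Lambda_\mu^*(q)]$. Hence, $\mu^n(\{(q, \ldots, q)\}) = \exp[-n\Lambda_\mu^*(q)]$, and so $\mu_n(\{q\}) \ge \exp[-n\Lambda_\mu^*(q)]$. Clearly, this implies (1.2.7) holds for every $\delta > 0$. An analogous argument can be used in the case when $q \le \int_{\mathbb{R}} x\,\mu(dx)$. ∎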
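The change of measure used for the lower bound can be made concrete for the fair-coin measure $\mu = \frac{1}{2}(\delta_0 + \delta_1)$, which is our illustrative choice and not an example from the text. The sketch below builds the tilted measure with density $\exp[\lambda x - \Lambda_\mu(\lambda)]$ relative to $\mu$, picks $\lambda = \log(q/(1-q))$, and checks that the tilted mean is $q$ and that $\lambda q - \Lambda_\mu(\lambda)$ agrees with the closed form of $\Lambda_\mu^*(q)$. All function names are hypothetical.

```python
import math


def log_mgf(lam: float) -> float:
    """Logarithmic moment generating function of the fair-coin measure on {0, 1}."""
    return math.log((1.0 + math.exp(lam)) / 2.0)


def tilted_weights(lam: float) -> dict:
    """Weights of the tilted measure exp[lam * x - Lambda(lam)] mu(dx) on {0, 1}."""
    big_lambda = log_mgf(lam)
    return {x: math.exp(lam * x - big_lambda) * 0.5 for x in (0, 1)}


def tilting_parameter(q: float) -> float:
    """The lam at which t -> t * q - Lambda(t) is maximal, for q in (0, 1)."""
    return math.log(q / (1.0 - q))


if __name__ == "__main__":
    q = 0.75
    lam = tilting_parameter(q)
    weights = tilted_weights(lam)
    mean = sum(x * w for x, w in weights.items())
    # Under the tilted measure the formerly "deviant" value q is the mean.
    print(weights, mean, lam * q - log_mgf(lam))
```

Under the tilted measure the Weak Law of Large Numbers concentrates the sample mean near $q$, which is the mechanism behind (1.2.7).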
1.2.8 Remark.
The reader should take note of the structure of the preceding line of reasoning. Namely, the upper bound comes from optimizing over a family of CHEBYSHEV inequalities; while the lower bound comes from introducing a RADON-NIKODYM factor in order to make what was originally “deviant” behavior look like typical behavior. This pattern of proof is one of the two most powerful tools in the theory of large deviations. In particular, it will be used in the next section as well as Sections 5.3 and 5.4.

1.2.9 Exercise.
Assuming that $\int_{\mathbb{R}} |x|\,\mu(dx) < \infty$, show that
(1.2.10)
Hint: Set $p = \int_{\mathbb{R}} x\,\mu(dx)$ and show that if $\Lambda_\mu^*(q) < \infty$, then
according to whether $q \ge p$ or $q \le p$.

1.2.11 Exercise.
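(i) Show that for every $p \in \mathbb{R}$: $\Lambda_{\mu_p}^*(x) = \Lambda_\mu^*(x - p)$, $x \in \mathbb{R}$, where $\mu_p = \delta_p * \mu$ and we use $\nu * \mu$ to denote the convolution of $\nu$ with $\mu$.

(ii) If $\mu = \alpha\delta_a + (1 - \alpha)\delta_b$, where $a < b$ and $\alpha \in (0, 1)$, show that

$$\Lambda_\mu^*(x) = \begin{cases} \dfrac{x - a}{b - a}\log\dfrac{x - a}{(1 - \alpha)(b - a)} + \dfrac{b - x}{b - a}\log\dfrac{b - x}{\alpha(b - a)} & \text{for } x \in [a, b] \\[2ex] \infty & \text{for } x \notin [a, b] \end{cases}$$

where $0 \log 0 = 0$.

(iii) If $\mu(dx) = \chi_{[0,\infty)}(x)\,e^{-x}\,dx$, show that

$$\Lambda_\mu^*(x) = \begin{cases} \infty & \text{for } x \le 0 \\ x - 1 - \log x & \text{for } x > 0. \end{cases}$$

(iv) If $\mu(dx) = (2\pi\sigma^2)^{-1/2}\exp[-(x - a)^2/2\sigma^2]\,dx$, where $a \in \mathbb{R}$ and $\sigma > 0$, show that

$$\Lambda_\mu^*(x) = \frac{(x - a)^2}{2\sigma^2}, \quad x \in \mathbb{R}.$$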
1.3 Schilder’s Theorem

In this section we give an example of a large deviation result for a certain family of measures on an infinite dimensional space. Let $d \in \mathbb{Z}^+$ be given and set

$$\Theta = \Big\{\theta \in C([0, \infty); \mathbb{R}^d) : \theta(0) = 0 \text{ and } \lim_{t \to \infty} \frac{|\theta(t)|}{t} = 0\Big\}, \qquad \|\theta\|_\Theta = \sup_{t \ge 0} \frac{|\theta(t)|}{1 + t},$$

and observe that $(\Theta, \|\cdot\|_\Theta)$ is a separable real BANACH space. In order to represent the dual $\Theta^*$ of $\Theta$, note that $\Theta$ is naturally isometric to the space of continuous paths on $[0, \infty)$ which vanish at 0 and at $\infty$ (namely, map $\theta$ to the path $t \mapsto (1 + t)^{-1}\theta(t)$); and use this isometry to identify $\Theta^*$ with the space of $\mathbb{R}^d$-valued, BOREL measures $\lambda$ on $[0, \infty)$ with the properties that $\lambda(\{0\}) = 0$ and $\int_{[0,\infty)} (1 + t)\,|\lambda|(dt) < \infty$, where $|\lambda|$ denotes the variation measure associated with $\lambda$. With this identification, the duality relation ${}_{\Theta^*}\langle\lambda, \theta\rangle_\Theta$ is given by $\int_{[0,\infty)} \theta(t) \cdot \lambda(dt)$ (the “$\cdot$” here standing for the ordinary inner product in $\mathbb{R}^d$) and $\|\lambda\|_{\Theta^*} = \int_{[0,\infty)} (1 + t)\,|\lambda|(dt)$.

Let $\mathcal{B} = \mathcal{B}_\Theta$ denote the BOREL field over $\Theta$; and, for $t \ge 0$, let $\mathcal{B}_t$ denote the smallest $\sigma$-algebra over $\Theta$ with respect to which all of the maps $\theta \mapsto \theta(s)$, $s \in [0, t]$, are measurable. As is easy to check, $\mathcal{B} = \sigma\big(\bigcup_{t \ge 0} \mathcal{B}_t\big)$. The following remarkable existence theorem is due to N. WIENER [112]. We have added a few small embellishments to WIENER’S original statement.
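1.3.2 Theorem. (WIENER) There is a unique probability measure $W$ on $(\Theta, \mathcal{B})$ with the property that

(1.3.3) $$\int_\Theta \exp\big[\sqrt{-1}\;{}_{\Theta^*}\langle\lambda, \theta\rangle_\Theta\big]\,W(d\theta) = \exp[-\Lambda_W(\lambda)], \quad \lambda \in \Theta^*,$$

where

(1.3.4) $$\Lambda_W(\lambda) \equiv \frac{1}{2}\iint_{[0,\infty)^2} s \wedge t\;\lambda(ds) \cdot \lambda(dt).$$

Moreover, if $P$ is a probability measure on $(\Theta, \mathcal{B})$, then $P = W$ if and only if any one of the following holds:

(i) For all $0 \le s < t$, the random variable $\theta \mapsto \theta(t) - \theta(s)$ under $P$ is independent of $\mathcal{B}_s$ and is GAUSSian with mean 0 and covariance $(t - s)I_{\mathbb{R}^d}$.

(ii) For all $0 \le s < t$ and $\Gamma \in \mathcal{B}_{\mathbb{R}^d}$,

(1.3.5) $$P\big(\{\theta : \theta(t) \in \Gamma\}\,\big|\,\mathcal{B}_s\big)(\psi) = \gamma_{t-s}\big(-\psi(s) + \Gamma\big)$$

for $P$-almost all $\psi \in \Theta$. (The measure $\gamma_{t-s}$ in (1.3.5) is the one described in (1.1.3).)

(iii) For every $n \in \mathbb{Z}^+$, $0 \le t_1 < \cdots < t_n$, and $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$,

Finally, $W$ has the properties that

(iv) For each $a \in (0, \infty)$, $W$ is invariant under the scaling transformation $\theta \mapsto a^{1/2}\theta(\cdot/a)$ and the time shift transformation $\theta \mapsto \theta(\cdot + a) - \theta(a)$.

(v) $W$ is invariant under the time inversion transformation $\theta \mapsto (\cdot)\,\theta(1/\cdot)$ ($\equiv 0$ at $t = 0$).

(vi) For each $\alpha \in (0, 1/2)$ and $T > 0$,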
The reader is assumed to be familiar with some form of WIENER’S basic existence theorem and with the basic properties of WIENER’S measure $W$. In particular, he is advised to reconcile the statement which he knows with the one given above.

We can now describe the family of measures which we want to study in this section. Namely, for each $\epsilon > 0$, let $W_\epsilon$ denote the distribution of $\theta \mapsto \epsilon^{1/2}\theta$ under $W$. Clearly $W_\epsilon \Rightarrow \delta_0$, where $\delta_0$ is the point mass at the path which never leaves 0. Hence, we are again dealing with a family for which it is reasonable to ask about large deviations.

Before getting into the details, it may be helpful to make a couple of remarks. In the first place, it should be noted that, at least formally, we are dealing here with a situation like the one discussed in Section 1.1. Indeed, an often useful heuristic representation of WIENER’S measure is the formula

(1.3.6) $$W(d\theta) = \frac{1}{c}\exp\Big[-\frac{1}{2}\int_0^\infty |\dot\theta(t)|^2\,dt\Big]\,d\theta.$$

The expression in (1.3.6) is somewhat fanciful. Indeed, none of the quantities on the right hand side makes sense by itself. In particular, “$d\theta$” stands for the (non-existent) translation invariant measure on $\Theta$, “$\dot\theta$” denotes the derivative (which $W$-almost surely fails to exist) of $\theta$, and the constant “$c$” is infinite. Thus, (1.3.6) is at best just a schematic representation of what one gets by formally passing to the limit in the expression for the $W$ measure of a subset of $\Theta$ whose description involves a continuum of times. Leaving such technicalities aside, one has to admit that to whatever degree one accepts (1.3.6), one has to grant the expression

$$W_\epsilon(d\theta) = \frac{1}{c_\epsilon}\exp\Big[-\frac{1}{2\epsilon}\int_0^\infty |\dot\theta(t)|^2\,dt\Big]\,d\theta$$

an equal degree of acceptance; and, on the basis of this expression combined with the discussion in Section 1.1, one is led to predict that the function $I$ governing the large deviations of $\{W_\epsilon : \epsilon > 0\}$ ought to be $\frac{1}{2}\int_0^\infty |\dot\theta(t)|^2\,dt$.

A second remark, and the one on which our analysis will be based, is that the family $\{W_\epsilon : \epsilon > 0\}$ is related to the sort of family which was handled in Section 1.2; and, as we will see later (cf. Sections 3.3 and 3.4), the result which we are about to obtain can be considered as a consequence of CRAMÉR’S Theorem for measures on $\Theta$. To understand the relationship with the situation dealt with in CRAMÉR’S Theorem, note that the measure $W_{1/n}$ here is precisely the distribution under $W^n$ of the random variable

$$(\theta_1, \ldots, \theta_n) \in \Theta^n \mapsto \frac{1}{n}\sum_{m=1}^n \theta_m \in \Theta.$$

Hence, on the basis of CRAMÉR’S Theorem, we should predict that the $I$ governing the large deviations of $\{W_\epsilon : \epsilon > 0\}$ is the LEGENDRE transform of the logarithmic moment generating function for $W$. In this connection, one should observe that the quantity $\Lambda_W$ introduced in WIENER’S Theorem above is the logarithmic moment generating function

$$\lambda \in \Theta^* \mapsto \log\Big(\int_\Theta \exp\big[{}_{\Theta^*}\langle\lambda, \theta\rangle_\Theta\big]\,W(d\theta)\Big)$$

for $W$. Thus, what we are now predicting is that the function

(1.3.7) $$\Lambda_W^*(\psi) \equiv \sup\big\{{}_{\Theta^*}\langle\lambda, \psi\rangle_\Theta - \Lambda_W(\lambda) : \lambda \in \Theta^*\big\}, \quad \psi \in \Theta,$$

is the function which governs the large deviations of $\{W_\epsilon : \epsilon > 0\}$. We begin our rigorous analysis with a lemma which shows, among other things, that the two predictions made above are at least consistent.
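1.3.8 Lemma. Given $\lambda \in \Theta^*$ define $\psi_\lambda \in \Theta$ by

(1.3.9)

Then, for all $\lambda, \eta \in \Theta^*$,

Next, define $H^1 = H^1([0, \infty); \mathbb{R}^d)$ to be the space of $\psi \in \Theta$ with the property that $\psi(t) = \int_0^t \dot\psi(s)\,ds$, $t \ge 0$, for some $\dot\psi \in L^2([0, \infty); \mathbb{R}^d)$; and set $\|\psi\|_{H^1} = \|\dot\psi\|_{L^2([0,\infty);\mathbb{R}^d)}$ for $\psi \in H^1$. Then

(1.3.12) $$\Lambda_W^*(\psi) = \begin{cases} \frac{1}{2}\|\psi\|_{H^1}^2 & \text{if } \psi \in H^1 \\ \infty & \text{if } \psi \in \Theta \setminus H^1. \end{cases}$$

In particular,

(1.3.13) $$\Lambda_W^*(\psi_\lambda) = \Lambda_W(\lambda), \quad \lambda \in \Theta^*,$$

and for each $L \ge 0$, $\{\psi \in \Theta : \Lambda_W^*(\psi) \le L\} \subset\subset \Theta$ (i.e., is a compact subset).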
PROOF:We first observe that the second equality in (1.3.10) is an elementary integration by parts. Second, we note that it suffices to prove the first equality in (1.3.10) when X = 77 (since the general case then follows from this by polarization) and that, by an elementary approximation argument, we need only handle X E 0’ which are non-atomic and compactly supported. But in this case we have: s A t A(&)
=-
s,
rm)
. A(&)
Wdt)
=2 i
tdlX((4 . ) I2
=
0
1,
d
IX((t,
. ) I2
.
lo.,
s X(ds)
dt.
00)
Turning to the proof of (1.3.12), first suppose that $ E H I . Then
12
Large Deviations
and therefore, by (1.3.10),
Ah($)
is equal to
f
4 ( t ) X((t,00)) dt - -
lo
IX((t,O O ) )dt~ :~X E 0'
F)
I
Hence, the proof of (1.3.12) will be complete once we show that $\psi \in H^1$ whenever $\Lambda_W^*(\psi) < \infty$. But if $\phi \in C_c^\infty([0,\infty);\mathbb{R}^d)$ (i.e., it is a smooth path with compact support) and we define $\lambda \in \Theta^*$ by $\phi(t) = \lambda((t,\infty))$, $t \ge 0$, then
and so there exists a unique $\dot\psi \in L^2([0,\infty);\mathbb{R}^d)$ such that
$$-\int_0^\infty \psi(t)\cdot\dot\phi(t)\,dt = \int_0^\infty \dot\psi(t)\cdot\phi(t)\,dt, \quad \phi \in C_c^\infty([0,\infty);\mathbb{R}^d).$$
From here it is an easy step to the conclusion that $\psi(t) = \int_0^t \dot\psi(s)\,ds$, $t \ge 0$, and therefore that $\psi \in H^1$. Given (1.3.12), (1.3.13) is clearly a consequence of (1.3.11). To complete the proof, note first that, directly from its definition in (1.3.7), $\Lambda_W^*$ is lower semi-continuous. Thus, the fact that $\{\psi : \Lambda_W^*(\psi) \le L\}$ is compact follows immediately from (1.3.12) and the easily verified observation that bounded subsets of $H^1$ are relatively compact in $\Theta$. ∎

We will now prove a slightly deficient form of the right hand side of (1.1.5) with $I = \Lambda_W^*$. The reader should remark the similarity of this argument with the proof of Lemma 1.2.5.
1.3.14 Lemma. Let $\psi \in \Theta$ be given. Then for each $\delta > 0$ there exists an $r > 0$ such that
(1.3.15)
(Here, and throughout, $B(x,r)$ denotes the open ball of radius $r$ around a point $x$ in a metric space; and $\overline{B}(x,r)$ denotes the corresponding closed ball.) In particular, if $K \subset\subset \Theta$, then
$$\limsup_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(K)\big) \le -\inf_K \Lambda_W^*. \tag{1.3.16}$$
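PROOF: To prove (1.3.15), note that
$$W_\epsilon\big(\overline{B}(\psi,r)\big) = W\big(\overline{B}(\psi/\epsilon^{1/2},\, r/\epsilon^{1/2})\big)$$
for all $\lambda \in \Theta^*$. If $\Lambda_W^*(\psi) = \infty$, choose $\lambda \in \Theta^*$ so that ${}_{\Theta^*}\langle\lambda,\psi\rangle_\Theta - \Lambda_W(\lambda) \ge 1 + 1/\delta$ and $r = 1/(1 + \|\lambda\|_{\Theta^*})$. If $\Lambda_W^*(\psi) < \infty$, choose $\lambda \in \Theta^*$ so that ${}_{\Theta^*}\langle\lambda,\psi\rangle_\Theta - \Lambda_W(\lambda) \ge \Lambda_W^*(\psi) - \delta/2$ and $r = \frac{\delta}{2(1 + \|\lambda\|_{\Theta^*})}$. To prove (1.3.16), set $\ell = \inf_K \Lambda_W^*$ and, for given $\delta > 0$, use (1.3.15) and the compactness of $K$ to choose $\psi_1,\dots,\psi_n \in K$ and $r_1,\dots,r_n \in (0,\infty)$ so that $K \subseteq \bigcup_1^n B(\psi_k, r_k)$ and
Then
and so
Finally, let $\delta \searrow 0$. ∎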
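The choice of $\lambda$ in the preceding proof is an exponential Chebyshev estimate, and the exponent it produces is a LEGENDRE transform. A minimal one-dimensional sketch of that step follows; it assumes a standard normal variable $Z$ in place of the path $\theta$, so that the logarithmic moment generating function is $\lambda^2/2$, and the grid of $\lambda$'s and the function names are hypothetical choices made only for illustration.

```python
def legendre_transform(log_mgf, x, lam_grid):
    """Approximate sup over lam in lam_grid of lam * x - log_mgf(lam)."""
    return max(lam * x - log_mgf(lam) for lam in lam_grid)


def gaussian_log_mgf(lam):
    # Logarithmic moment generating function of a standard normal variable.
    return 0.5 * lam * lam


# Hypothetical grid of dual variables: [-10, 10] with step 0.001.
LAM_GRID = [k / 1000.0 for k in range(-10000, 10001)]


def chebyshev_exponent(a):
    """Exponent c(a) in the bound P(eps**0.5 * Z >= a) <= exp(-c(a) / eps), a >= 0.

    For each lam >= 0, Chebyshev's inequality gives the exponent
    lam * a - lam**2 / 2; optimizing over lam yields the Legendre transform.
    """
    return legendre_transform(gaussian_log_mgf, a, LAM_GRID)


if __name__ == "__main__":
    for a in (0.5, 1.0, 1.5):
        # The grid value should agree with the closed form a**2 / 2.
        print(a, chebyshev_exponent(a), 0.5 * a * a)
```

In this toy case the supremum is attained at $\lambda = a$, which is why the grid value matches $a^2/2$ whenever $a$ lies on the grid.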
1.3.17 Remark. Suppose that $\{\mu_\epsilon : \epsilon > 0\}$ is a family of probability measures on a BANACH space $(X, \|\cdot\|)$ and let $\Lambda_{\mu_\epsilon}$ denote the logarithmic moment generating function for $\mu_\epsilon$ (i.e.,
$$\Lambda_{\mu_\epsilon}(\lambda) = \log\int_X \exp\big[{}_{X^*}\langle\lambda,x\rangle_X\big]\,\mu_\epsilon(dx)$$
for $\lambda \in X^*$). Further, assume that
$$\Lambda(\lambda) \equiv \lim_{\epsilon \to 0}\epsilon\Lambda_{\mu_\epsilon}(\lambda/\epsilon) \tag{1.3.18}$$
exists for every $\lambda \in X^*$. Then the argument used to prove Lemma 1.3.14 leads to the conclusion that for any $K \subset\subset X$
$$\limsup_{\epsilon \to 0}\,\epsilon\log\big(\mu_\epsilon(K)\big) \le -\inf_K \Lambda^*,$$
where
$$\Lambda^*(x) \equiv \sup\big\{{}_{X^*}\langle\lambda,x\rangle_X - \Lambda(\lambda) : \lambda \in X^*\big\}$$
is the LEGENDRE transform of $\Lambda$. In the particular case treated in Lemma 1.3.14, we had that $\epsilon\Lambda_{W_\epsilon}(\lambda/\epsilon) = \Lambda_W(\lambda)$, and so (1.3.18) was trivial. See Theorem 2.2.4 for more details on this subject.

Although the result obtained in Lemma 1.3.14 is restricted to compact subsets and is therefore less than we really want, we will turn to the left hand side of (1.1.5) before addressing the problem of extending (1.3.16) to all closed sets. Just as in the proof of Theorem 1.2.6, the key to proving the left hand side of (1.1.5) is the use of an efficient method for moving the "center" (or mean) of the measures $W_\epsilon$. In the present setting, this key is contained in the following important quasi-invariance property of WIENER'S measure.
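1.3.19 Lemma. (CAMERON & MARTIN) Given $\lambda \in \Theta^*$, let $W^\lambda$ denote the distribution of $\theta \mapsto \theta + \psi_\lambda$ under $W$, where $\psi_\lambda$ is the element of $\Theta$ described in (1.3.9). Then $W^\lambda \ll W$ and
$$\frac{dW^\lambda}{dW}(\theta) = R_\lambda(\theta) \equiv \exp\big[{}_{\Theta^*}\langle\lambda,\theta\rangle_\Theta - \Lambda_W(\lambda)\big], \quad \theta \in \Theta. \tag{1.3.20}$$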
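PROOF: Define the measure $P$ on $\Theta$ by $P(d\theta) = (1/R_\lambda(\theta))\,W^\lambda(d\theta)$. Then the required result is equivalent to the statement that $P = W$. But, for any $\eta \in \Theta^*$,
$$\begin{aligned}\int_\Theta \exp\big[{}_{\Theta^*}\langle\eta,\theta\rangle_\Theta\big]\,P(d\theta) &= \exp\big[{}_{\Theta^*}\langle\eta - \lambda,\psi_\lambda\rangle_\Theta + \Lambda_W(\lambda)\big]\int_\Theta \exp\big[{}_{\Theta^*}\langle\eta - \lambda,\theta\rangle_\Theta\big]\,W(d\theta) \\ &= \exp\big[{}_{\Theta^*}\langle\eta,\psi_\lambda\rangle_\Theta - \Lambda_W(\lambda) + \Lambda_W(\eta - \lambda)\big] = \exp\big[\Lambda_W(\eta)\big],\end{aligned}$$
where we have made repeated use of (1.3.10) and (1.3.11). From the preceding, we see that the function
extends to an entire function on the whole complex plane, and, in particular, that
That is, $P = W$. ∎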
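A finite-dimensional analogue of (1.3.20) can be checked directly. The sketch below is only an illustration under stated assumptions: it takes $d = 1$ and $\lambda = a_1\delta_s + a_2\delta_t$ with $0 < s < t$, and it uses the standard fact that $(\theta(s),\theta(t))$ is centered Gaussian under $W$ with covariance entries $\min(t_i,t_j)$, so that $\psi_\lambda(u) = a_1\min(s,u) + a_2\min(t,u)$ and $\Lambda_W(\lambda)$ is half the associated quadratic form. All function names are hypothetical.

```python
import math


def gaussian_log_density(x, s, t):
    """Log density at x = (x1, x2) of (B(s), B(t)), B a Brownian motion, 0 < s < t."""
    det = s * (t - s)  # determinant of the covariance [[s, s], [s, t]]
    x1, x2 = x
    # Quadratic form x^T C^{-1} x, with C^{-1} = [[t, -s], [-s, s]] / det.
    quad = (t * x1 * x1 - 2.0 * s * x1 * x2 + s * x2 * x2) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)


def shift_and_log_mgf(a, s, t):
    """Return (psi(s), psi(t)) and (1/2) a^T C a for lambda = a1*delta_s + a2*delta_t."""
    a1, a2 = a
    m1 = s * a1 + s * a2  # psi(s) = a1*min(s, s) + a2*min(t, s)
    m2 = s * a1 + t * a2  # psi(t) = a1*min(s, t) + a2*min(t, t)
    return (m1, m2), 0.5 * (a1 * m1 + a2 * m2)


def log_density_ratio(x, a, s, t):
    """Log of (density of the shifted vector) / (density of the original) at x."""
    (m1, m2), _ = shift_and_log_mgf(a, s, t)
    shifted = gaussian_log_density((x[0] - m1, x[1] - m2), s, t)
    return shifted - gaussian_log_density(x, s, t)


def cameron_martin_exponent(x, a, s, t):
    """The exponent <lambda, theta> - Lambda_W(lambda) evaluated on the two marginals."""
    _, log_mgf = shift_and_log_mgf(a, s, t)
    return a[0] * x[0] + a[1] * x[1] - log_mgf


if __name__ == "__main__":
    x, a = (0.3, -0.7), (1.2, -0.5)
    print(log_density_ratio(x, a, 0.5, 2.0), cameron_martin_exponent(x, a, 0.5, 2.0))
```

The two printed numbers should agree up to rounding, which is the two-time-point version of the statement $dW^\lambda/dW = R_\lambda$.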
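1.3.21 Lemma. For every open $G \subseteq \Theta$,
$$\liminf_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(G)\big) \ge -\inf_G \Lambda_W^*.$$
PROOF: What we must show is that
$$\liminf_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(G)\big) \ge -\Lambda_W^*(\psi) \tag{1.3.22}$$
for every $\psi \in G$. By (1.3.12), we need only check (1.3.22) for $\psi \in G \cap H^1$. But if $\psi \in G \cap H^1$, then we can find $\{\psi_n\}_1^\infty \subseteq C_c^\infty([0,\infty);\mathbb{R}^d)$ such that $\|\psi_n - \psi\|_{H^1} \to 0$. In particular, $\psi_n \in G$ for all sufficiently large $n$'s and $\Lambda_W^*(\psi_n) \to \Lambda_W^*(\psi)$. Hence, we need only prove (1.3.22) for $\psi \in C_c^\infty([0,\infty);\mathbb{R}^d)$. Given such a $\psi$, define $\lambda \in \Theta^*$ by $\lambda((t,\infty)) = \dot\psi(t)$ and choose $r > 0$ so that $B(\psi,r) \subseteq G$. Then $\psi = \psi_\lambda$ and so, for $0 < \delta < r$,
$$W_\epsilon(G) \ge W_\epsilon\big(B(\psi,\delta)\big) = W^{-\lambda/\epsilon^{1/2}}\big(B(0,\delta/\epsilon^{1/2})\big)$$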
Since, by (1.3.13), $\Lambda_W(\lambda) = \Lambda_W^*(\psi)$, we see that (1.3.22) holds. ∎

We must now return to the problem of removing the compactness restriction from Lemma 1.3.14. Our idea will be to produce a family of compact sets $K_L$, $L > 0$, with the property that
$$\limsup_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(K_L^c)\big) \le -L, \quad L > 0. \tag{1.3.23}$$
What (1.3.23) says is that, as $L \nearrow \infty$, the events $K_L^c$ become so "deviant" that they cannot even be seen on the scale at which we are looking; and, therefore, they cannot contribute to our calculation (cf. the proof of Theorem 1.3.27 below).
There are several ways in which one can go about constructing the sets $K_L$. The method which we will adopt here will be to construct a function $\Phi : \Theta \to [0,\infty]$ with the properties that
(1) $\Phi$ is sub-additive and $\Phi(\alpha\theta) = |\alpha|\Phi(\theta)$ for all $\alpha \in \mathbb{R}$ and $\theta \in \Theta$,
(2) $\{\theta : \Phi(\theta) \le L\} \subset\subset \Theta$ for each $L > 0$,
(3) $W(\{\theta : \Phi(\theta) < \infty\}) = 1$.
In order to construct such a $\Phi$ and to pass from the fact that it exists to (1.3.23), we will make use of the following beautiful and powerful estimate due to X. FERNIQUE [45].
1.3.24 Theorem. (FERNIQUE) Let $X$ be a real, separable FRÉCHET space and $\Phi : X \to [0,\infty]$ a measurable subadditive function with the property that $\Phi(\alpha x) = |\alpha|\Phi(x)$ for all $\alpha \in \mathbb{R}$ and $x \in X$. Next, let $\mu$ be a probability measure on $(X, \mathcal{B}_X)$ with the property that $\mu^2$ on $(X^2, \mathcal{B}_{X^2})$ is invariant under the transformation
If $\mu(\{x : \Phi(x) < \infty\}) = 1$, then there exists an $\alpha > 0$ for which
PROOF: Given $0 < s < t$, we have
and therefore
Working by induction, we conclude from this that
where
Thus if
ct
< a / ( 2 P ) , then
1.3.25 Lemma. For $\theta \in \Theta$ set
Then $\{\theta \in \Theta : \Phi(\theta) \le L\} \subset\subset \Theta$ for each $L > 0$; and there exists an $\alpha > 0$ such that
$$\int_\Theta \exp\big[\alpha\Phi(\theta)^2\big]\,W(d\theta) < \infty. \tag{1.3.26}$$
In particular, if $K_L = \{\theta : \Phi(\theta)^2 \le L/\alpha\}$, then $K_L \subset\subset \Theta$ and (1.3.23) holds.

PROOF: The proof that, for every $R \in (0,\infty)$, $\{\theta : \Phi(\theta) \le R\} \subset\subset \Theta$ is a standard application of the ASCOLI-ARZELÀ criterion combined with a diagonalization argument. The details are left to the reader.
To prove that (1.3.26) holds for some $\alpha > 0$, we first observe that $W^2$ has the invariance property required in FERNIQUE'S Theorem. (Indeed, any centered GAUSSian measure on a FRÉCHET space will have this property.) Thus, the existence of $\alpha$ will follow once we show that $W(\{\theta : \Phi(\theta) < \infty\}) = 1$. To this end, note that, by parts (iv) and (vi) of Theorem 1.3.2 combined with FERNIQUE'S Theorem,
for some $A_d < \infty$. At the same time, again as a consequence of FERNIQUE'S Theorem and elementary properties of $W$, we see that
for some $B_d < \infty$. Finally, since, by (iv) in Theorem 1.3.2,
for some $C_d < \infty$, we can combine these into the estimate that
which is more than enough for our purposes. Knowing (1.3.26), we can proceed to prove (1.3.23) as follows:
$$W_\epsilon(K_L^c) = W\big(\{\theta : \alpha\Phi(\theta)^2 > L/\epsilon\}\big) \le \exp[-L/\epsilon]\int_\Theta \exp\big[\alpha\Phi(\theta)^2\big]\,W(d\theta).$$
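Together with (1.3.26), this surely leads to (1.3.23). ∎

1.3.27 Theorem. (SCHILDER) For every $\Gamma \in \mathcal{B}_\Theta$:
$$-\inf_{\Gamma^\circ}\Lambda_W^* \le \liminf_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(\Gamma)\big) \le \limsup_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(\Gamma)\big) \le -\inf_{\overline{\Gamma}}\Lambda_W^*. \tag{1.3.28}$$
PROOF: In view of Lemma 1.3.21, all that we have to do is show that
$$\limsup_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(F)\big) \le -\inf_F \Lambda_W^*$$
for each closed set $F$. To this end, let $\ell = \inf_F \Lambda_W^*$, and, for $L > 0$, set $F_L = F \cap K_L$, where $K_L$ is the compact subset produced in the preceding lemma. Then $W_\epsilon(F) \le W_\epsilon(F_L) + W_\epsilon(K_L^c)$; and so, by Lemma 1.3.14 and (1.3.23),
$$\limsup_{\epsilon \to 0}\,\epsilon\log\big(W_\epsilon(F)\big) \le -(\ell \wedge L).$$
After letting $L \nearrow \infty$, we arrive at the required result. ∎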
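SCHILDER'S Theorem can be sanity-checked in a case where the probability is explicit. The sketch below is an illustration under assumptions that are not part of the text: it takes $d = 1$ and the set $\Gamma = \{\theta : \sup_{0 \le t \le 1}\theta(t) \ge a\}$, and it uses the classical reflection principle, by which $W_\epsilon(\Gamma) = \mathrm{erfc}\big(a/\sqrt{2\epsilon}\big)$. The predicted rate is the smallest value of $\frac{1}{2}\int|\dot\psi|^2\,dt$ over paths reaching level $a$ by time 1, namely $a^2/2$, attained by the straight line $\psi(t) = a\min(t,1)$. The function names are hypothetical.

```python
import math


def sup_tail_probability(eps, a):
    """P(sup over [0, 1] of eps**0.5 * B(t) >= a) for a 1-d Brownian motion B, a > 0.

    Reflection principle: the probability equals 2 * P(B(1) >= a / eps**0.5).
    """
    return math.erfc(a / math.sqrt(2.0 * eps))


def scaled_log_probability(eps, a):
    """The quantity eps * log W_eps(Gamma) whose limit the theorem describes."""
    return eps * math.log(sup_tail_probability(eps, a))


def predicted_rate(a):
    # Infimum of (1/2) * integral of |psi'|^2 over paths that reach level a by time 1.
    return 0.5 * a * a


if __name__ == "__main__":
    # Much smaller values of eps would underflow erfc to 0.0 and break the logarithm.
    for eps in (0.1, 0.01, 0.001):
        print(eps, scaled_log_probability(eps, 1.0), -predicted_rate(1.0))
```

The printed values approach $-a^2/2$ only slowly, because the prefactor in front of the exponential contributes a term of order $\epsilon\log(1/\epsilon)$.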
1.3.29 Exercise. Given $\psi \in \Theta$ and $n \ge 1$, define
Show that
$V(\psi) < \infty$ if and only if $\psi \in H^1$, and $V(\psi) = \|\psi\|_{H^1}^2$ for $\psi \in H^1$.
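1.3.32 Exercise. Lemma 1.3.19 is not a complete statement of CAMERON and MARTIN'S result [15]. Indeed, suppose that $\psi \in H^1$ and choose $\{\psi_n\}_1^\infty \subseteq C_c^\infty([0,\infty);\mathbb{R}^d)$ so that $\|\psi_n - \psi\|_{H^1} \to 0$. Set
$$\Phi_n(\theta) = {}_{\Theta^*}\langle\lambda_n,\theta\rangle_\Theta, \quad \theta \in \Theta,$$
where $\lambda_n$ is the element of $\Theta^*$ defined by $\lambda_n((t,\infty)) = \dot\psi_n(t)$, $t \ge 0$. Show that $\Phi_n \to \Phi$ in $L^2(W)$, where $\Phi$ under $W$ is GAUSSian with mean 0 and variance $\|\psi\|_{H^1}^2$. Next, show that $\exp[\Phi_n - \Lambda_W(\lambda_n)] \to \exp[\Phi - \frac{1}{2}\|\psi\|_{H^1}^2]$ in $L^1(W)$. Finally, conclude from this that if $W^\psi$ denotes the distribution of $\theta \mapsto \theta + \psi$ under $W$, then $W^\psi \ll W$ and
$$\frac{dW^\psi}{dW} = \exp\Big[\Phi - \frac{1}{2}\|\psi\|_{H^1}^2\Big]. \tag{1.3.33}$$
The expression (1.3.33) is often called the Cameron-Martin formula.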
1.3.34 Exercise. The purpose of this exercise is to show that the result obtained in Exercise 1.3.32 is optimal. The proof outlined below relies on a knowledge of DOOB'S Martingale Convergence Theorem. In particular, one needs to know that if $P$ and $Q$ are probability measures on a measurable space $(E, \mathcal{F})$, $\{\mathcal{F}_n\}_1^\infty$ is a non-decreasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$ with the property that $\mathcal{F} = \sigma\big(\bigcup_1^\infty \mathcal{F}_n\big)$, and the restriction $Q_n$ of $Q$ to $\mathcal{F}_n$ is absolutely continuous with respect to the restriction $P_n$ of $P$ to $\mathcal{F}_n$, then $R_n \equiv dQ_n/dP_n \to R$ (a.s., $P$) where $R$ is the RADON-NIKODYM derivative of the absolutely continuous part of $Q$ with respect to $P$.

(i) Given $\psi \in \Theta$ and $n \ge 0$, let $\psi_n$ be the element of $\Theta$ such that $\psi_n(k/2^n) = \psi(k/2^n)$ for $0 \le k \le 4^n$, $\psi_n(t) = \psi(2^n)$ for $t \ge 2^n$, and $\psi_n$ is linear on each of the intervals $[(k-1)/2^n, k/2^n]$, $1 \le k \le 4^n$. Show that $\psi_n \in H^1$ and that $\|\psi_n\|_{H^1}^2 = V_{2^n}(\psi)$ (cf. (1.3.30)).

(ii) Given $n \ge 0$, define $\mathcal{F}_n = \sigma\big(\theta(k/2^n) : 0 \le k \le 4^n\big)$ (i.e., the smallest $\sigma$-algebra over $\Theta$ with respect to which all the functions $\theta \mapsto \theta(k/2^n)$, $0 \le k \le 4^n$, are measurable). Note that $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$ and show that $\mathcal{B}_\Theta = \sigma\big(\bigcup_0^\infty \mathcal{F}_n\big)$.
(iii) Given $\psi \in \Theta$, let $W^\psi$ denote the distribution of $\theta \mapsto \theta + \psi$ under $W$. Referring to (i) and (ii) above, let $W_n^\psi$ and $W_n$ be the restriction of $W^\psi$ and $W$, respectively, to $\mathcal{F}_n$; and define $R_n = \exp\big[\Phi_n - \frac{1}{2}\|\psi_n\|_{H^1}^2\big]$, where $\Phi_n$ is the function corresponding to $\psi_n$ as in Exercise 1.3.32. Show that $W_n^\psi \ll W_n$ and that $\frac{dW_n^\psi}{dW_n} = R_n$ (a.s., $W_n$).

(iv) Referring to (iii), show that
Now suppose that $\psi \in \Theta \setminus H^1$ and define $W^\psi$ accordingly. Using the preceding, show that $W^\psi \perp W$ (i.e., is singular with respect to $W$). This is the sense in which Exercise 1.3.32 gives the optimal result.

1.3.35 Exercise. Given $T > 0$, define $I_T : \Theta \to [0,\infty]$ by
Show that if $\Gamma \in \mathcal{B}_T$, then
1.4 Two Applications of Schilder’s Theorem
We continue in this section with the notation introduced in Section 1.3. Perhaps the single most striking application of SCHILDER'S Theorem is to the derivation of V. STRASSEN'S renowned Law of the Iterated Logarithm [100]. Thus, our first goal in this section will be to show how SCHILDER'S Theorem provides the key estimates in the proof of the following statement.

1.4.1 Theorem. (STRASSEN) For $n \ge 3$, define
where $\beta(n) = (2n\log(\log n))^{1/2}$; and set $K = \{\psi \in H^1 : \|\psi\|_{H^1} \le 1\}$. Then for $W$-almost every $\theta \in \Theta$ the sequence $\{\xi_n(\theta)\}_{n=3}^\infty$ has the properties:

(i) $\{\xi_n(\theta)\}_3^\infty$ is relatively compact in $\Theta$ and every limit point is an element of $K$.

(ii) For every $\psi \in K$ there is a subsequence of $\{\xi_n(\theta)\}_3^\infty$ which converges in $\Theta$ to $\psi$.

In particular, for every $\Phi \in C(\Theta;\mathbb{R})$,
$$W\Big(\Big\{\theta : \limsup_{n \to \infty}\Phi\big(\xi_n(\theta)\big) = \sup_K \Phi\Big\}\Big) = 1. \tag{1.4.2}$$
In the proof of (i) we will use the following elementary observation.

1.4.3 Lemma. Let $S \subseteq (1,2)$ and assume that $1 \in \overline{S}$. If $\theta \in \Theta$ has the property that
(1.4.4)
for every $s \in S$ ($[r]$ denotes the greatest integer less than or equal to $r \in \mathbb{R}$), then $\lim_{n \to \infty}\|\xi_n(\theta) - K\|_\Theta = 0$.

PROOF: Assume that (1.4.4) holds for every $s \in S$ and set
for $n \ge 3$, $s \in S$, and $m$ for which $s^m \ge 3$. Given $\delta > 0$, choose $s \in S$ so that
$$\Big(1 - \frac{1}{s}\Big)^{1/2} < \delta \quad\text{and}\quad (s - 1) < \frac{\delta}{1 + \delta}.$$
Next, choose $M \in \mathbb{Z}^+$ so that
Then for $m \ge M + 1$ and $s^{m-1} \le n \le s^m$ we have that
and that
Also, since
for all $\psi \in K$, we see that
Combining these, we conclude that $\|\xi_n(\theta) - K\|_\Theta < 10\delta$ as long as $n \ge s^{M+1}$. ∎
PROOF OF THEOREM 1.4.1: We begin by proving (i). Because $K \subset\subset \Theta$, it suffices for us to show that $\xi_n(\theta) \to K$ for $W$-almost every $\theta \in \Theta$. Moreover, because of Lemma 1.4.3, we will know this as soon as we show that
(1.4.5)
for every $s > 1$. Indeed, if (1.4.5) holds for every $s > 1$ and we take $S = \{1 + 1/n : n \ge 1\}$, then we can choose one $W$-null set $A$ so that
for every $\theta \notin A$ and $s \in S$. To prove (1.4.5), let $\delta > 0$ be given and set $K^{(\delta)} = \{\psi \in \Theta : \|\psi - K\|_\Theta < \delta\}$. Then $\gamma \equiv \inf\{\|\psi\|_{H^1}^2 : \psi \notin K^{(\delta)}\} > 1$; and therefore, by SCHILDER'S Theorem and $W$-scaling invariance (cf. part (iv) of Theorem 1.3.2),
for all sufficiently large $m$, where
$$\epsilon_s(m) \equiv \frac{1}{2\log(\log[s^m])}.$$
Since $\gamma$ and $s$ are strictly larger than 1, it follows immediately from the preceding that
$$\sum_{s^m \ge 3} W\big(\{\theta : \xi_{[s^m]}(\theta) \notin K^{(\delta)}\}\big) < \infty;$$
and therefore, by the BOREL-CANTELLI Lemma,
$$W\big(\{\theta : \xi_{[s^m]}(\theta) \in K^{(\delta)} \text{ for all sufficiently large } m \in \mathbb{Z}^+\}\big) = 1.$$
Since this is true for every $\delta > 0$, (1.4.5) and, therefore, (i) have now been proved.

Set $\Theta' = \{\theta : \lim_{n \to \infty}\|\xi_n(\theta) - K\|_\Theta = 0\}$. We now know that $W(\Theta') = 1$. Note that for any $\theta \in \Theta'$ and $\psi \in K$,
Thus, in proving (ii), we need only show that for each $k \ge 3$ and $\psi \in K$
$$\liminf_{n \to \infty}\ \sup_{1/k \le t \le k}\big|\xi_n(t,\theta) - \psi(t)\big| = 0$$
for $W$-almost every $\theta \in \Theta$. Moreover, by another application of (1.4.6), this will be shown if we can prove that
(1.4.7)
and
$$\psi_k(t) = \begin{cases} 0 & \text{if } t \in [0, 1/k] \\ \psi(t) - \psi(1/k) & \text{if } t \in (1/k, k) \\ \psi(k) - \psi(1/k) & \text{if } t \in [k, \infty). \end{cases}$$
The advantage gained by dealing with the functions $\eta_{k,m}$ instead of the original $\xi_n$'s is that, for fixed $k$, the $\eta_{k,m}$'s are mutually independent under $W$. Hence, by the BOREL-CANTELLI Lemma, we will have proved (1.4.7) once we show that
(1.4.8)
for each $\delta > 0$. To prove (1.4.8), note that, by (iv) in Theorem 1.3.2,
for every $\delta > 0$, we can use SCHILDER'S Theorem to find a $\gamma < 1$ such that
for all sufficiently large $m$'s. When combined with the preceding, this clearly proves (1.4.8) and therefore (ii). Given (i) and (ii), the proof of (1.4.2) is easy and is left to the reader. ∎

Our second application of SCHILDER'S Theorem will be to VENTCEL and FREIDLIN'S estimate on the large deviations of randomly perturbed dynamical systems [109]. Our approach is based on the ideas of R. AZENCOTT
PI
Given $T > 0$, let $(\Omega_T, \|\cdot\|_{\Omega_T})$ be the BANACH space $C([0,T];\mathbb{R}^d)$ with the uniform norm. The theory of VENTCEL and FREIDLIN deals with families of measures $\{P_\epsilon : \epsilon > 0\}$ on $(\Omega_T, \mathcal{B}_{\Omega_T})$ of which the following is a typical example. For a given bounded, uniformly LIPSCHITZ continuous function $b : \mathbb{R}^d \to \mathbb{R}^d$, define $\theta \in \Theta \mapsto X(\theta) \in \Omega_T$ by
$$X(t,\theta) = \theta(t) + \int_0^t b\big(X(s,\theta)\big)\,ds, \quad t \in [0,T];$$
and for $\epsilon > 0$ let $P_\epsilon = W_\epsilon \circ X^{-1}$ be the distribution of $\theta \mapsto X(\theta)$ under $W_\epsilon$. Since an equivalent description of $P_\epsilon$ is as the distribution of $\theta \mapsto X(\epsilon^{1/2}\theta)$ under $W$, it is clear that $P_\epsilon \Rightarrow \delta_{X^0}$, where $X^0 \in \Omega_T$ is the integral curve of $b$ which starts at 0. Moreover, it is easy to see that $\theta \in \Theta \mapsto X(\theta) \in \Omega_T$ is a continuous mapping. Thus, if $G \subseteq \Omega_T$ is open, then SCHILDER'S Theorem (cf. Exercise 1.3.35) says that
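$$\begin{aligned}\liminf_{\epsilon \to 0}\,\epsilon\log\big[P_\epsilon(G)\big] &= \liminf_{\epsilon \to 0}\,\epsilon\log\big[W_\epsilon\big(X^{-1}(G)\big)\big] \\ &\ge -\inf\{I_T(\psi) : X(\psi) \in G\} = -\inf\{I_T \circ X^{-1}(\phi) : \phi \in G \text{ and } \phi(0) = 0\} \\ &= -\inf\Big\{\frac{1}{2}\int_0^T \big|\dot X(t) - b\big(X(t)\big)\big|^2\,dt : X \in G \cap H^1_T \text{ and } X(0) = 0\Big\},\end{aligned}$$
where
$$H^1_T = H^1([0,T];\mathbb{R}^d) = \big\{\psi|_{[0,T]} : \psi \in H^1\big\}. \tag{1.4.9}$$
Similarly, if $F \subseteq \Omega_T$ is closed, then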
In other words, SCHILDER'S Theorem leads directly to a large deviation result for $\{P_\epsilon : \epsilon > 0\}$.

The preceding example of VENTCEL and FREIDLIN'S theory is as simple as it is because the map $\theta \mapsto X(\theta)$ is especially pleasant; in particular, it is continuous and its inverse is easy to compute. In general, the maps involved are not only more complicated but are not even continuous. To be precise, let $a : \mathbb{R}^d \to \mathbb{R}^d \otimes \mathbb{R}^d$ be symmetric matrix valued, $b : \mathbb{R}^d \to \mathbb{R}^d$, and assume that there exists an $M \in [1,\infty)$ such that
(In (1.4.11) and elsewhere, $\|\cdot\|_{\mathrm{H.S.}}$ stands for the HILBERT-SCHMIDT norm.) Next, for $x \in \mathbb{R}^d$ and $\epsilon > 0$, let $X_\epsilon^x : [0,T] \times \Theta \to \mathbb{R}^d$ be the $W$-almost surely unique $\{\mathcal{B}_t : t \in [0,T]\}$-progressively measurable solution to the ITÔ stochastic integral equation
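$$X_\epsilon^x(t,\theta) = x + \epsilon^{1/2}\int_0^t \sigma\big(X_\epsilon^x(s,\theta)\big)\,d\theta(s) + \int_0^t b\big(X_\epsilon^x(s,\theta)\big)\,ds, \quad t \in [0,T], \tag{1.4.12}$$
where $\sigma = a^{1/2}$; and define $P_\epsilon^x = W \circ (X_\epsilon^x)^{-1}$ on $(\Omega_T, \mathcal{B}_{\Omega_T})$ (since $X_\epsilon^x(\cdot,\theta) \in \Omega_T$ for $W$-almost every $\theta \in \Theta$, there is no problem with considering $P_\epsilon^x$ on $\Omega_T$). Once again, $P_\epsilon^x \Rightarrow \delta_{X_0^x}$, where $X_0^x \in \Omega_T$ is the integral curve
Moreover, if one pretends that (1.4.12) means that
$$\dot X_\epsilon^x(t,\theta) = \epsilon^{1/2}\sigma\big(X_\epsilon^x(t,\theta)\big)\dot\theta(t) + b\big(X_\epsilon^x(t,\theta)\big), \quad t \in [0,T],$$
(this is not even formally correct, since we are dealing with ITÔ and not STRATONOVICH integrals; however, this error becomes negligible as $\epsilon \to 0$) and one ignores all continuity questions, then one can repeat the argument given in the preceding paragraph and thereby arrive at the conjecture that the large deviations of $\{P_\epsilon^x : \epsilon > 0\}$ are governed by the function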
according to whether $X - x \notin H^1_T$ or $X - x \in H^1_T$. Considering all the objections which one can raise to the above naive line of reasoning, it is somewhat remarkable that the conjecture to which it leads is, nonetheless, absolutely correct.

In order to get around the most serious flaw in our heuristic argument (namely, our treatment of the maps $\theta \in \Theta \mapsto X_\epsilon^x(\theta) \in \Omega_T$ as if they were continuous), we introduce EULER approximations. Namely, set
$$\tau_n(t) = \frac{[nt]}{n}, \quad n \in \mathbb{Z}^+ \text{ and } t \in [0,\infty)$$
(recall that $[r]$ is the integer part of $r \in \mathbb{R}$), and consider the maps $\theta \in \Theta \mapsto X_{n,\epsilon}^x(\theta) \in \Omega_T$ given by
for $t \in [0,T]$. Clearly the maps $\theta \in \Theta \mapsto X_{n,\epsilon}^x(\theta) \in \Omega_T$ are continuous. Moreover, $X_{n,\epsilon}^x(\theta) = X_{n,1}^x(\epsilon^{1/2}\theta)$; and so, just as in the original case considered, we can apply SCHILDER'S Theorem to deduce that
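$$-\inf_{\Gamma^\circ} I^{a,b}_{n,T,x} \le \liminf_{\epsilon \to 0}\,\epsilon\log\big[W\big(\{\theta : X_{n,\epsilon}^x(\theta) \in \Gamma\}\big)\big] \le \limsup_{\epsilon \to 0}\,\epsilon\log\big[W\big(\{\theta : X_{n,\epsilon}^x(\theta) \in \Gamma\}\big)\big] \le -\inf_{\overline{\Gamma}} I^{a,b}_{n,T,x}, \tag{1.4.15}$$
where
according to whether $X - x \notin H^1_T$ or $X - x \in H^1_T$. Since it is clear that $X_{n,\epsilon}^x \to X_\epsilon^x$ in $W$-measure and that $I^{a,b}_{n,T,x} \to I^{a,b}_{T,x}$ as $n \to \infty$, all that stands between us and the conjectured result are estimates which allow us to exchange the order in which $n$-limits and $\epsilon$-limits are taken. The following lemma takes care of the required facts about the convergence of $\{I^{a,b}_{n,T,x}\}_1^\infty$ to $I^{a,b}_{T,x}$.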
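The reason the EULER approximations are continuous in $\theta$ is that, once the coefficients are frozen at the grid times $k/n$, the path is built from finitely many increments of $\theta$. A minimal sketch of this construction at the grid times follows. It is an illustration only: it takes $d = 1$, it assumes the usual frozen-coefficient scheme (coefficients evaluated at $\tau_n(s)$), and the function name and its parameters are hypothetical.

```python
import math


def euler_path(theta, n, T, sigma, b, x=0.0, eps=1.0):
    """Values at the grid times k/n <= T of a frozen-coefficient Euler scheme (d = 1).

    theta : callable t -> theta(t), a continuous driving path with theta(0) = 0
    sigma, b : callables giving the diffusion and drift coefficients
    On [k/n, (k+1)/n] both coefficients are evaluated at the value of the
    approximation at time k/n, so the result depends on theta only through
    the finitely many increments theta((k+1)/n) - theta(k/n).
    """
    steps = int(math.floor(n * T))
    path = [x]
    for k in range(steps):
        t0, t1 = k / n, (k + 1) / n
        frozen = path[-1]
        noise = math.sqrt(eps) * sigma(frozen) * (theta(t1) - theta(t0))
        path.append(frozen + noise + b(frozen) * (t1 - t0))
    return path


if __name__ == "__main__":
    # With theta = 0 and b(y) = -y the scheme follows the integral curve x * exp(-t).
    path = euler_path(lambda t: 0.0, 1000, 1.0, lambda y: 1.0, lambda y: -y, x=1.0)
    print(path[-1], math.exp(-1.0))
```

In this sketch, running the scheme with parameter `eps` on `theta` gives the same grid values as running it with `eps=1.0` on the rescaled path `eps**0.5 * theta`, which mirrors the scaling identity used above to bring SCHILDER'S Theorem to bear.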
1.4.17 Lemma. For each $x \in \mathbb{R}^d$, $\{X : I^{a,b}_{T,x}(X) \le L\} \subset\subset \Omega_T$ for all $L \ge 0$ and $\inf_\Gamma I^{a,b}_{n,T,x} \to \inf_\Gamma I^{a,b}_{T,x}$ as $n \to \infty$ for every $\Gamma \subseteq \Omega_T$.

PROOF: Assume that $x = 0$ and set $I = I^{a,b}_{T,0}$ and $I_n = I^{a,b}_{n,T,0}$. Because $\{X : I(X) \le L\}$ is a bounded subset of $H^1_T$, we will know that it is compact in $\Omega_T$ as soon as we show that it is closed there. To this end, suppose that $\{X_n\}_1^\infty$ is a sequence of elements in $\Omega_T$ with the properties that $X_n \to X$ in $\Omega_T$ and $\sup_n I(X_n) \le L$. Then $X \in H^1_T$ and $X_n \to X$ weakly in $H^1_T$. Since this means that
weakly in $L^2([0,T];\mathbb{R}^d)$, it follows that $I(X) \le \liminf_{n \to \infty} I(X_n) \le L$. Thus, $\{X : I(X) \le L\} \subset\subset \Omega_T$. To prove the convergence assertion, first note that $\inf_\Gamma I = \infty$ if and only if $\inf_\Gamma I_n = \infty$ for every $n \ge 1$. Next, note that if $B$ is a bounded subset of $H^1_T$, then
$$\lim_{n \to \infty}\ \sup_{X \in B}\big|I_n(X) - I(X)\big| = 0. \tag{1.4.18}$$
In particular, this proves that $\inf_\Gamma I \ge \limsup_{n \to \infty}\inf_\Gamma I_n$. Finally, if $\ell = \inf_\Gamma I < \infty$, then we can choose a bounded subset $B$ of $H^1_T$ so that $I(X) \wedge \inf_{n \ge 1} I_n(X) \ge \ell + 1$ for $X \notin B$. Hence, because $\inf_\Gamma I_n \le \ell + 1$ for all sufficiently large $n$'s, we can use (1.4.18) to conclude that
$$\inf_\Gamma I = \inf_{\Gamma \cap B} I = \lim_{n \to \infty}\inf_{\Gamma \cap B} I_n = \lim_{n \to \infty}\inf_\Gamma I_n. \quad ∎$$
As a preliminary to our estimate on the rate of convergence of the $X_{n,\epsilon}^x$'s to $X_\epsilon^x$, we present the following standard estimate for stochastic integrals.

1.4.19 Lemma. Let $\alpha : [0,\infty) \times \Theta \to \mathbb{R}^N \otimes \mathbb{R}^d$ and $\beta : [0,\infty) \times \Theta \to \mathbb{R}^N$ be bounded $\{\mathcal{B}_t\}$-progressively measurable functions and set

PROOF: Set $\tilde Y(t,\theta) = Y(t,\theta) - Y(s,\theta) - \int_s^t \beta(\tau,\theta)\,d\tau$ for $t \ge s$. For $\rho > 0$ and $\xi \in S^{N-1}$, define
$$\zeta(\theta) = \inf\{t \ge s : \xi\cdot\tilde Y(t,\theta) \ge r\}.$$
By ITÔ'S formula and DOOB'S Stopping Time Theorem,
is a bounded martingale. Therefore
= exp [-pr
7 1
+ AT^^
.
Hence, after minimizing with respect to $\rho > 0$, we see that
and from this it is an easy step to
and thence to (1.4.20). ∎

We next show that $X_{n,\epsilon}^x$ approximates $X_\epsilon^x$ sufficiently fast as $n \to \infty$.
1.4.21 Lemma. For each $x \in \mathbb{R}^d$ and all $\delta > 0$,
as $n \to \infty$.

PROOF: We assume that $x = 0$ and $T = 1$. (The reduction to this case is left to the reader.) Set $X_\epsilon = X_\epsilon^0$, $X_{n,\epsilon} = X_{n,\epsilon}^0$, and $Y_{n,\epsilon} = X_{n,\epsilon} - X_\epsilon$; and, for $\rho > 0$, define
Clearly,
and so it suffices for us to prove that
(1.4.23)
for each $\rho > 0$ and that
$$\lim_{\rho \to 0}\ \sup_{n \ge 1}\ \limsup_{\epsilon \to 0}\Big[\epsilon\log W\big(\{\theta : \zeta^\rho_{n,\epsilon}(\theta) < 1\}\big)\Big] = -\infty. \tag{1.4.24}$$
To prove (1.4.23) we use Lemma 1.4.19 to obtain
from which (1.4.23) is immediate. The proof of (1.4.24) goes as follows. Set $f_{\epsilon,\rho}(y) = (\rho^2 + |y|^2)^{1/\epsilon}$. Note that there is a $K < \infty$, which is independent of $n$, $\epsilon$, and $\rho$, such that
($\nabla^2 f$ denotes the HESSian matrix of $f$.) Hence, an application of ITÔ'S formula to $f_{\epsilon,\rho}(Y_{n,\epsilon}(t,\theta))$ together with DOOB'S Stopping Time Theorem shows that there is a $C < \infty$, which is independent of $n$, $\epsilon$, and $\rho$, such that
and therefore,
$$u_{n,\epsilon}(1) \le \exp\Big[\frac{1}{\epsilon}\big(C + \log\rho^2\big)\Big].$$
Since
$$W\big(\{\theta : \zeta^\rho_{n,\epsilon}(\theta) < 1\}\big) \le (\rho^2 + \delta^2)^{-1/\epsilon}\,u_{n,\epsilon}(1),$$
this completes the proof of (1.4.24). ∎

We have, at last, made all the preparations necessary to prove VENTCEL and FREIDLIN'S estimate.

1.4.25 Theorem. (VENTCEL & FREIDLIN) Let $X_\epsilon^x$ be the solution to (1.4.12) and define $\{P_\epsilon^x : \epsilon > 0\}$ on $(\Omega_T, \mathcal{B}_{\Omega_T})$ accordingly. Then, for all $\Gamma \in \mathcal{B}_{\Omega_T}$,
$$-\inf_{\Gamma^\circ} I^{a,b}_{T,x} \le \liminf_{\epsilon \to 0}\,\epsilon\log\big(P_\epsilon^x(\Gamma)\big) \le \limsup_{\epsilon \to 0}\,\epsilon\log\big(P_\epsilon^x(\Gamma)\big) \le -\inf_{\overline{\Gamma}} I^{a,b}_{T,x}, \tag{1.4.26}$$
where $I^{a,b}_{T,x}$ is defined in (1.4.13). (See Exercise 2.1.25 below for a slightly more general statement.)
PROOF: Let $F$ be a closed subset of $\Omega_T$. Then, for any $\delta > 0$ and $n \ge 1$,
($F^{(\delta)}$ denotes the open $\delta$-neighborhood around $F$.) Thus, by (1.4.15), Lemma 1.4.17, and Lemma 1.4.21, we see that
(1.4.27)
for every $\delta > 0$. Finally, set $\ell_\delta = \inf_{\overline{F^{(\delta)}}} I^{a,b}_{T,x}$ for $\delta \ge 0$. It is clear that $\ell_\delta \nearrow \ell \le \ell_0$ as $\delta \searrow 0$. Suppose that $\ell < \ell_0$. We could then find $\{X_n\}_1^\infty$ and $\ell < L < \ell_0$ so that $X_n \in \overline{F^{(1/n)}}$ and $I^{a,b}_{T,x}(X_n) \le L$. Further, by Lemma 1.4.17, we could assume that $X_n \to X$. But clearly this would mean that $X \in F$ and, again by Lemma 1.4.17, that $\inf_F I^{a,b}_{T,x} \le I^{a,b}_{T,x}(X) < \ell_0$. Hence we can let $\delta \searrow 0$ in (1.4.27) and thereby get the right hand side of (1.4.26).

Next, let $G$ be an open set in $\Omega_T$. Then, for each $X \in G$ and $n \ge 1$, we see that
as long as $B(X, 2\delta) \subseteq G$. Using (1.4.15), Lemma 1.4.17, and Lemma 1.4.21, we conclude from this that
$$\liminf_{\epsilon \to 0}\,\epsilon\log\big(P_\epsilon^x(G)\big) \ge -I^{a,b}_{T,x}(X). \quad ∎$$
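For the typical example considered at the beginning of this discussion (identity diffusion matrix), the rate is $\frac{1}{2}\int_0^T |\dot X(t) - b(X(t))|^2\,dt$, and it is easy to evaluate on a discretized path. The sketch below is a rough Riemann-sum approximation under assumptions made only for illustration: $d = 1$, a uniform time grid, and forward differences for $\dot X$; the function name is hypothetical.

```python
def discrete_action(path, dt, b):
    """Riemann-sum approximation of (1/2) * integral of |X'(t) - b(X(t))|^2 dt (d = 1).

    path : list of values X(0), X(dt), X(2 dt), ...
    dt   : spacing of the uniform time grid
    b    : callable giving the drift
    Derivatives are replaced by forward differences, so the result is only
    an approximation of the continuum functional.
    """
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        velocity = (x1 - x0) / dt
        total += 0.5 * (velocity - b(x0)) ** 2 * dt
    return total


if __name__ == "__main__":
    n = 1000
    line = [k / n for k in range(n + 1)]  # the path X(t) = t on [0, 1]
    # With no drift the straight line costs 1/2; with drift b = 1 it is an
    # integral curve of b and costs nothing.
    print(discrete_action(line, 1.0 / n, lambda y: 0.0))
    print(discrete_action(line, 1.0 / n, lambda y: 1.0))
```

Paths that follow the drift cost nothing, which matches the statement that the measures concentrate on the integral curve of $b$ as $\epsilon \to 0$.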
1.4.28 Exercise.
STRASSEN'S Theorem is the function space version of the Classical Law of the Iterated Logarithm. That is, given real-valued, identically distributed, independent random variables $X_1,\dots,X_n,\dots$ with mean 0 and variance 1, set $S_n = \sum_1^n X_m$, $n \ge 1$. Then the Classical Law of the Iterated Logarithm is the statement that
$$\limsup_{n \to \infty}\frac{S_n}{\beta(n)} = 1 \quad\text{almost surely.} \tag{1.4.29}$$
When the $X_n$'s are standard GAUSSian random variables, (1.4.29) is an immediate consequence of (1.4.2) with $\Phi(\psi) = \psi(1)$ since, in this case, $\{S_n\}_1^\infty$ has the same distribution as the distribution of $\theta \mapsto \{\theta(n)\}_1^\infty$ under $W$. It turns out that the general classical result can also be seen as a consequence of STRASSEN'S Theorem. The proof entails the use of the SKOROKHOD Representation Theorem [97]. We outline below how this argument proceeds in the special case when the $X_n$'s are standard BERNOULLI random variables (i.e., $P(X_n = 1) = P(X_n = -1) = \frac{1}{2}$). Throughout the rest of this exercise, the $X_n$'s are BERNOULLI and $d = 1$.

(i) Define $\tau_0(\theta) = 0$ and
$$\tau_{n+1}(\theta) = \inf\big\{t - E_n(\theta) : t \ge E_n(\theta) \text{ and } |\theta(t) - \theta(E_n(\theta))| \ge 1\big\}, \quad n \ge 0,$$
where $E_n(\theta) = \sum_{m=0}^n \tau_m(\theta)$. Show that the $\tau_n$'s under $W$ are identically distributed, independent, and have mean 1. Next, set $S_n = \sum_1^n Y_m$, where $Y_n(\theta) = \theta(E_n) - \theta(E_{n-1})$ for $n \ge 1$, and show that the $Y_n$'s (under $W$) are independent standard BERNOULLI random variables. (Both of these assertions turn on the fact that if $\tau$ is a $\{\mathcal{B}_t\}$-stopping time with $W(\tau < \infty) = 1$, then $\theta \in \Theta \mapsto \theta(\cdot + \tau(\theta)) - \theta(\tau(\theta)) \in \Theta$ under $W$ is independent of $\mathcal{B}_\tau$ and has distribution $W$.) Conclude that $\{S_n/\beta(n)\}_1^\infty$ has the same distribution as the distribution of
$$\theta \longmapsto \big\{\xi_n\big(E_n(\theta)/n, \theta\big)\big\}_1^\infty$$
under $W$. In particular, (1.4.29) for BERNOULLI random variables is equivalent to
(1.4.30)
for $W$-almost every $\theta$.

(ii) Use the Strong Law of Large Numbers to show that $E_n(\theta)/n \to 1$ $W$-almost surely; and from this, together with Theorem 1.4.1, conclude that (1.4.30) holds for $W$-almost every $\theta$.

The construction of the $\tau_n$'s for more general random variables is more difficult. (The content of SKOROKHOD'S Theorem is that such $\tau_n$'s always exist.) However, once their existence has been established, the rest of the argument is the same as the one just given for the BERNOULLI case.

1.4.31 Exercise.
There is a more direct approach which can be taken to prove the left hand side of (1.4.26). Namely, given $\psi \in H^1_T$, let $\theta \mapsto X_\epsilon^{x,\psi}(\theta)$ be the $W$-almost surely unique $\{\mathcal{B}_t : t \in [0,T]\}$-progressively measurable solution to
for $t \in [0,T]$.

(i) Show that the distribution of $\theta \mapsto X_\epsilon^{x,\psi}(\theta)$ under $W$ is the same as that of $\theta \mapsto X_\epsilon^x(\theta)$ under $W^{\psi/\epsilon^{1/2}}$. (See Exercise 1.3.32 for the notation, and think of $\psi$ as being the element of $H^1$ with $\psi(t) = \psi(T)$ for $t \ge T$.)

(ii) Define $Y^x(\psi) \in \Omega_T$ by
Using (i) above, Exercise 1.3.32, and HÖLDER'S inequality, show that for every $q \in [1,\infty)$ and $r > 0$,
Conclude from this that
for all $r > 0$.

(iii) From (ii), show that for every open $G$ in $\Omega_T$,
$$\liminf_{\epsilon \to 0}\,\epsilon\log\big(P_\epsilon^x(G)\big) \ge -\inf\big\{I_T(\psi) : \psi \in H^1_T \text{ and } Y^x(\psi) \in G\big\}; \tag{1.4.32}$$
and show that this is equivalent to the left hand side of (1.4.26).

It should be noted that the preceding derivation does not use in any way the strict positivity of $a(\cdot)$ until the very end. Thus, (1.4.32) holds even if $a$ is allowed to degenerate. However, when $a$ can degenerate, it is not so easy to give as nice an expression as that in (1.4.14) for the quantity on the right hand side of (1.4.32). (cf. Exercise 2.1.25 below.)

1.4.33 Exercise.
Replace (1.4.10) and (1.4.11), respectively, by the assumptions that
$$0 < a(x) \le M(1 + |x|^2)I_{\mathbb{R}^d} \quad\text{and}\quad |b(x)| \le M(1 + |x|^2)^{1/2}, \quad x \in \mathbb{R}^d, \tag{1.4.34}$$
for some $M \in (0,\infty)$ and that, for each $r \in (0,\infty)$,
(1.4.35)
for some $M_r \in (0,\infty)$. Show that for each $x \in \mathbb{R}^d$, $\epsilon > 0$, and $T > 0$, there is a $W$-almost surely unique $\{\mathcal{B}_t : t \in [0,T]\}$-progressively measurable solution $\theta \mapsto X_\epsilon^x(\theta)$ to (1.4.12) and that both
and
are $-\infty$ for every $r > 0$. Conclude from these not only that Theorem 1.4.25 continues to hold when (1.4.10) and (1.4.11) are replaced by (1.4.34) and (1.4.35) but also that (1.4.26) can be improved to the statement that
$$-\inf_{\Gamma^\circ} I^{a,b}_{T,x} \le \liminf_{\epsilon \to 0}\,\epsilon\log\big(P_\epsilon^{x_\epsilon}(\Gamma)\big) \le \limsup_{\epsilon \to 0}\,\epsilon\log\big(P_\epsilon^{x_\epsilon}(\Gamma)\big) \le -\inf_{\overline{\Gamma}} I^{a,b}_{T,x} \tag{1.4.36}$$
whenever $x_\epsilon \to x$. Also, observe that it is still true that $\{X : I^{a,b}_{T,x}(X) \le L\} \subset\subset \Omega_T$ for every $L \ge 0$.
II
Some Generalities

2.1 The Large Deviation Principle

Having seen several examples for which it is possible to carry out a successful analysis of the large deviations, we will now attempt to formulate into general principles some of the ideas and techniques which proved useful in those examples. Because we never use completeness in this section, we will take $E$ throughout this section to be a separable metric space. A function $I : E \to [0,\infty]$ is said to be a rate function if it is lower semi-continuous. Given a family $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ (we often use $M_1(E)$ to denote the space of probability measures on $(E, \mathcal{B}_E)$), we will say that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate function $I$ or, equivalently, that the rate function $I$ governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$ if (1.1.5) holds for every $\Gamma \in \mathcal{B}_E$. It is clear that if $I$ is a rate function which governs the large deviations of some family $\{\mu_\epsilon : \epsilon > 0\}$ then it must be true that $\inf_E I = 0$.
The following result is elementary but reassuring.
2.1.1 Lemma. For any given $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ there is at most one rate function governing the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$.

PROOF: Suppose there were two, and name them $I_1$ and $I_2$. Because of lower semi-continuity, we know that $I_j(p) = \lim_{r \searrow 0}\inf_{B(p,r)} I_j$ for every $p \in E$. Thus it suffices for us to show that, for each $p \in E$, $\inf_{B(p,r)} I_1 = \inf_{B(p,r)} I_2$ for each $r$ in a dense subset of $(0,\infty)$. To this end, observe that
for any $r > 0$ with the property that $\inf_{B(p,r)} I_j = \inf_{\overline{B}(p,r)} I_j$. In particular, this will be the case if $r > 0$ is a continuity point for the nonincreasing function $s \in (0,\infty) \mapsto \inf_{B(p,s)} I_j$; and therefore we see that $\inf_{B(p,r)} I_1 = \inf_{B(p,r)} I_2$ for all but a countable number of $r > 0$. ∎
In all our examples, the governing rate function was not only lower semi-continuous but also had the property that the level sets $\{q \in E : I(q) \le L\}$ were compact for all $L \ge 0$. Because such rate functions play a prominent role and since the additional property is extremely useful, we will say that $I : E \to [0,\infty]$ is a good rate function if $\{q \in E : I(q) \le L\} \subset\subset E$ for all $L \ge 0$. Some elementary properties of good rate functions are listed in the next result.

2.1.2 Lemma. Let $I$ be a good rate function. Then, for each closed $F$ in $E$,
(2.1.3)
(Recall that $\Gamma^{(\delta)} = \{q \in E : \mathrm{dist}(q,\Gamma) < \delta\}$ for any subset $\Gamma$.) In addition, if $\Phi : E \to [-\infty,\infty)$ is an upper semi-continuous function, then for any closed $F \subseteq E$ on which $\Phi$ is bounded above there is a $q \in F$ such that $\Phi(q) - I(q) = \sup_F(\Phi - I)$.

PROOF: The derivation of (2.1.3) in this general setting differs in no way from the one given for the special case handled at the end of the first paragraph in the proof of Theorem 1.4.25; thus, we will not repeat the argument here. To prove the second assertion, first note that there is nothing to do if $\sup_F(\Phi - I) = -\infty$. Thus, we assume that $\ell = \sup_F(\Phi - I) > -\infty$, in which case we know that $\ell \in (-\infty,\infty)$. Choose $\{q_n\}_1^\infty \subseteq F$ so that $\Phi(q_n) - I(q_n) \ge \ell - \frac{1}{n}$. Because $\{q_n\}_1^\infty \subseteq \{q : I(q) \le M - \ell + 1\}$, where $M = \sup_F \Phi$, there is a subsequence of $\{q_n\}_1^\infty$ which converges to some $q$; and, because $\Phi - I$ is upper semi-continuous, not only is $q \in F$ but also $\Phi(q) - I(q) \ge \ell$. ∎
Another advantage that good rate functions have is that the full large deviation principle is a covariant notion when the rate function is good. (In this connection, we use here and elsewhere the notation $\mu \circ f^{-1}$ to denote the covariant image of a measure $\mu$ under a measurable map $f$. Thus, $\mu \circ f^{-1}(\Gamma) = \mu(f^{-1}(\Gamma))$ for measurable subsets $\Gamma$ of the image space.) That is, such principles can be "pushed forward" under mappings which are "nearly continuous." We have already seen an example of this when we discussed in Section 1.4 the passage from SCHILDER'S Theorem to the estimate of VENTCEL and FREIDLIN. The next lemma provides a general statement of this technique. (See also Exercise 2.1.20 below.)

2.1.4 Lemma. Let $I$ be a good rate function on $E$, $f$ a measurable map from $E$ into a second separable metric space $(E', \rho')$, and assume that there exists a sequence $\{f_n\}_1^\infty \subseteq C(E;E')$ such that
$$\lim_{n \to \infty}\ \sup\big\{\rho'\big(f_n(q), f(q)\big) : q \in E \text{ with } I(q) \le L\big\} = 0 \quad\text{for each } L \in (0,\infty).$$
Then the map $I' : E' \to [0,\infty]$ given by
$$I'(q') = \inf\{I(q) : q \in E \text{ and } q' = f(q)\}, \quad q' \in E',$$
is a good rate function on $E'$. Moreover, if, in addition, $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ has the property that
$$\lim_{n \to \infty}\ \limsup_{\epsilon \to 0}\,\epsilon\log\Big[\mu_\epsilon\big(\{q : \rho'(f_n(q), f(q)) \ge \delta\}\big)\Big] = -\infty$$
for each $\delta \in (0,\infty)$, then $I'$ governs the large deviations of $\{\mu_\epsilon \circ f^{-1} : \epsilon > 0\}$ whenever $I$ governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$. In particular, if $f \in C(E;E')$ and $I$ is a good rate function on $E$ which governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$, then $I'$ is a good rate function on $E'$ which governs the large deviations of $\{\mu_\epsilon \circ f^{-1} : \epsilon > 0\}$.
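The formula $I'(q') = \inf\{I(q) : q' = f(q)\}$ can be explored numerically in a simple case. The sketch below is an illustration under assumptions chosen only for this purpose: $E = E' = \mathbb{R}$, $I(q) = q^2/2$, and the continuous map $f(q) = q^2$, for which $I'(q') = q'/2$ when $q' \ge 0$ and $I'(q') = \infty$ otherwise. The fibre $f^{-1}(q')$ is replaced by grid points whose image lies within a tolerance of $q'$, so the output is only approximate; the function name and parameters are hypothetical.

```python
import math


def pushed_forward_rate(rate, f, target, grid, tol):
    """Grid approximation of I'(q') = inf{ I(q) : f(q) = q' }.

    Grid points q with |f(q) - target| <= tol stand in for the fibre
    f^{-1}(target); the infimum over an empty set is +infinity.
    """
    values = [rate(q) for q in grid if abs(f(q) - target) <= tol]
    return min(values) if values else math.inf


# Hypothetical grid on [-3, 3] with step 0.001.
GRID = [k / 1000.0 for k in range(-3000, 3001)]


def quadratic_rate(q):
    return 0.5 * q * q


def square(q):
    return q * q


if __name__ == "__main__":
    for target in (-1.0, 0.0, 1.0, 4.0):
        approx = pushed_forward_rate(quadratic_rate, square, target, GRID, 1e-2)
        exact = 0.5 * target if target >= 0 else math.inf
        print(target, approx, exact)
```

Because the tolerance enlarges the fibre, the grid value can only undershoot the exact one; shrinking `tol` together with the grid spacing tightens the agreement.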
PROOF: One should observe that the case when $f$ is continuous everywhere on $E$ is trivial and therefore really should not be thought of as a consequence of the general result. First, observe that $f$ is continuous on $K_L \equiv \{q \in E : I(q) \le L\}$ for each $L \in [0,\infty)$. Second, suppose that $q' \in E'$ with $I'(q') < \infty$. Then, for some $L \in [0,\infty)$,
$$I'(q') = \inf\{I(q) : q \in K_L \text{ and } q' = f(q)\};$$
and therefore, by Lemma 2.1.2, there is a $q \in f^{-1}(q')$ for which $I(q) = I'(q')$. With these preliminaries, we can easily prove that $I'$ is a good rate function. Indeed, if $L \in [0,\infty)$ and
$$\{q_n'\}_1^\infty \subseteq K_L' \equiv \{q' \in E' : I'(q') \le L\},$$
then there is a $\{q_n\}_1^\infty \subseteq K_L$ such that $q_n' = f(q_n)$ and $I'(q_n') = I(q_n)$ for each $n \in \mathbb{Z}^+$. Thus, since $K_L \subset\subset E$, we can choose a subsequence $\{q_{n_m}\}_{m=1}^\infty$ so that $q_{n_m} \to q \in K_L$. Because $f|_{K_L}$ is continuous, this means that
$$q' \equiv f(q) = \lim_{m \to \infty} q_{n_m}' \quad\text{and}\quad I'(q') \le I(q) \le L.$$
That is, $K_L' \subset\subset E'$.
Large Deviations
38
In preparation for the second part of the proof, we next show that, for each closed $F'$ in $E'$,
$$\inf_{F'} I' = \lim_{\delta\searrow 0}\,\lim_{n\to\infty}\,\inf\{I(q) : \rho'(f_n(q), F') \le \delta\}.$$
To this end, first suppose that $p' \in F'$ with $I'(p') < \infty$ and $\delta \in (0,\infty)$ are given. Choose $p \in f^{-1}(p')$ so that $I(p) = I'(p')$. Noting that $f_n(p) \to p'$ as $n \to \infty$, we see that there is an $N \in \mathbb{Z}^+$ such that

for all $n \ge N$; and therefore, we now know that

To prove the opposite inequality, assume that

We can then choose $\{q_m\}_1^\infty \subseteq K_{\ell+1}$ and $n_m \to \infty$ so that

for each $m \in \mathbb{Z}^+$. Furthermore, because $K_{\ell+1} \subset\subset E$ and $I$ is lower semi-continuous, we may and will assume that $q_m \to q \in K_\ell$. Hence, since $f|_{K_{\ell+1}}$ is continuous and therefore $q' = f(q) \in F'$, we have that
$$\inf_{F'} I' \le I(q) \le \ell.$$
To complete the proof, assume that $I$ governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$ and that
$$\lim_{n\to\infty} \varlimsup_{\epsilon\to 0} \epsilon \log\bigl[\mu_\epsilon(\Gamma(n;\delta))\bigr] = -\infty, \quad \delta \in (0,\infty),$$
where
$$\Gamma(n;\delta) = \{q : \rho'(f_n(q), f(q)) \ge \delta\}.$$
Given an open set $G'$ in $E'$ and $p' \in G'$ with $I'(p') < \infty$, choose $p \in f^{-1}(p')$ so that $I'(p') = I(p)$ and $\delta \in (0,\infty)$ so that
$$2\delta < \rho'\bigl(p', (G')^c\bigr).$$
Then, since each $f_n$ is continuous and $f_n(p) \to f(p)$, there is an $N \in \mathbb{Z}^+$ and a sequence $\{r_n\}_{n=N}^\infty \subseteq (0,\infty)$ such that
$$B(p, r_n) \subseteq f_n^{-1}\bigl(B'(p',\delta)\bigr), \quad n \ge N,$$
where $B'(p',\delta)$ is the $\rho'$-ball in $E'$ of radius $\delta$ around $p'$. Hence, for $n \ge N$,
$$B(p, r_n) \subseteq f^{-1}(G') \cup \Gamma(n;\delta);$$
and therefore, by choosing $n \ge N$ so that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl[\mu_\epsilon(\Gamma(n;\delta))\bigr] \le -I'(p') - 1,$$
we see, from the large deviation principle for $\{\mu_\epsilon : \epsilon > 0\}$, that

from which we conclude that
$$\varliminf_{\epsilon\to 0} \epsilon \log\bigl[\mu_\epsilon \circ f^{-1}(G')\bigr] \ge -\inf_{G'} I'.$$
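Finally, for closed $F'$ in $E'$, set

and note that
$$f^{-1}(F') \subseteq f_n^{-1}\bigl(F'^{(\delta)}\bigr) \cup \Gamma(n;\delta)$$
for $n \in \mathbb{Z}^+$ and $\delta \in (0,\infty)$. Hence, for every $n \in \mathbb{Z}^+$ and $\delta > 0$,

where
$$R(n;\delta) \equiv -\varlimsup_{\epsilon\to 0} \epsilon \log\bigl[\mu_\epsilon(\Gamma(n;\delta))\bigr].$$
Since, by hypothesis, $R(n;\delta) \to \infty$ as $n \to \infty$ for each $\delta \in (0,\infty)$ and because, by the preceding paragraph,

the large deviation principle for $\{\mu_\epsilon : \epsilon > 0\}$ now leads to

∎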
Another situation which we encountered in Chapter I (cf. the proof of SCHILDER'S Theorem) is that of a deficient large deviation principle; namely, one in which the right hand side of (1.1.5) has been proved only when the set $\Gamma$ is relatively compact. As it was there, such a large deviation principle is usually a preliminary step on the way to proving a full large deviation principle. Nonetheless, it arises sufficiently often to warrant our giving it a name. Thus, if $I$ is a rate function and $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ satisfies
$$\varliminf_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(G)\bigr) \ge -\inf_G I \quad\text{for all open } G \text{ in } E$$
and
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(K)\bigr) \le -\inf_K I \quad\text{for all } K \subset\subset E,$$
then we will say that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with rate function $I$. The passage from a weak to a full large deviation principle is often accomplished by an application of the following simple observation.
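2.1.5 Lemma. Let $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$, and assume that, for each $L \ge 0$, there exists a $K_L \subset\subset E$ with the property that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(K_L^c)\bigr) \le -L. \tag{2.1.6}$$
If $I$ is a rate function and $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with rate function $I$, then not only is $I$ a good rate function, but it also governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$.

PROOF: First note that
$$\inf_{K_L^c} I \ge -\varliminf_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(K_L^c)\bigr) \ge L;$$
and so $\{q : I(q) \le L\} \subseteq K_{L+1}$. Since $I$ is lower semi-continuous, this proves that $I$ is a good rate function. Next, let $F$ be a closed subset in $E$ and set $F_L = F \cap K_L$ for $L \ge 0$. Then
$$\mu_\epsilon(F) \le \mu_\epsilon(F_L) + \mu_\epsilon(K_L^c),$$
and so
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(F)\bigr) \le -\Bigl(\inf_{F_L} I \wedge L\Bigr) \le -\Bigl(\inf_F I \wedge L\Bigr)$$
for every $L \ge 0$. Thus we get the required result upon letting $L \nearrow \infty$. ∎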
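The step from the two-term bound on $\mu_\epsilon(F)$ to an exponential estimate rests on an elementary fact about sums of exponentially small quantities. The following is a sketch of that fact, written out for arbitrary non-negative numbers $a_\epsilon$, $b_\epsilon$ (these symbols are introduced here for illustration only).

```latex
% For a_\epsilon, b_\epsilon \ge 0:
\max\{a_\epsilon, b_\epsilon\} \le a_\epsilon + b_\epsilon \le 2\max\{a_\epsilon, b_\epsilon\},
% so, since \epsilon \log 2 \to 0,
\varlimsup_{\epsilon \to 0} \epsilon \log(a_\epsilon + b_\epsilon)
   = \max\Bigl\{ \varlimsup_{\epsilon \to 0} \epsilon \log a_\epsilon ,\;
                 \varlimsup_{\epsilon \to 0} \epsilon \log b_\epsilon \Bigr\}.
% Applied with a_\epsilon = \mu_\epsilon(F_L) (F_L is compact, so the weak upper bound applies)
% and b_\epsilon = \mu_\epsilon(K_L^c) (controlled by (2.1.6)):
\varlimsup_{\epsilon \to 0} \epsilon \log \mu_\epsilon(F)
   \le \max\Bigl\{ -\inf_{F_L} I ,\; -L \Bigr\}.
```

Only the larger of the two exponential rates survives, which is why the tail set $K_L^c$ becomes irrelevant once $L$ exceeds $\inf_F I$.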
We will say that a family $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ is exponentially tight if, for each $L > 0$, there is a $K_L \subset\subset E$ for which (2.1.6) holds. We end this section with a result which, in its original version, was first proved by S.R.S. VARADHAN [loti].

2.1.7 Lemma. Let $I$ be a rate function and suppose that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with rate function $I$. If the function $\Phi : E \to [-\infty,\infty]$ is lower semi-continuous, then
$$\varliminf_{\epsilon\to 0} \epsilon \log\Bigl(\int \exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr) \ge \sup\{\Phi(q) - I(q) : q \in E \text{ and } \Phi(q) \wedge I(q) < \infty\}.$$
(Throughout we adopt the convention that the supremum over the empty set is $-\infty$.)

PROOF: Let $q \in E$ satisfy $\Phi(q) \wedge I(q) < \infty$. Then, for each $r > 0$,

Since $\Phi(q) = \lim_{r\to 0} \inf_{B(q,r)} \Phi$, we conclude that

∎
2.1.8 Lemma. Assume that $I$ is a good rate function and that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate function $I$. If $\Phi : E \to [-\infty,\infty)$ is an upper semi-continuous function which satisfies

then
$$\varlimsup_{\epsilon\to 0} \epsilon \log\Bigl(\int \exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr) \le \sup_E(\Phi - I).$$

PROOF: We first work in the case when $\Phi \le M$ for some $M \in (0,\infty)$. Given $L > 0$, set $K_L = \{q : I(q) \le L\}$. Since $\Phi$ is upper semi-continuous and $K_L \subset\subset E$, we can choose, for given $\delta > 0$, a finite set $\{q_m\}_{m=1}^n \subseteq K_L$ and positive numbers $r_1, \ldots, r_n$ so that

for $1 \le m \le n$, where $B_m = B(q_m, r_m)$. Thus, if $G = \bigcup_{m=1}^n B_m$, then

and so
$$\varlimsup_{\epsilon\to 0} \epsilon \log\Bigl(\int \exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr) \le \sup_E(\Phi - I) \vee (M - L) + 2\delta.$$
Now let $\delta \searrow 0$ and $L \nearrow \infty$. To treat the general case, set $\Phi_M = \Phi \wedge M$ for $M \in (0,\infty)$, and use the preceding to show that

where

∎
2.1.10 Theorem. (VARADHAN) Let $I$ be a good rate function and assume that $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ satisfies the full large deviation principle with rate function $I$. If $\Phi \in C(E;\mathbb{R})$ satisfies (2.1.9), then
$$\lim_{\epsilon\to 0} \epsilon \log\Bigl(\int \exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr) = \sup_E(\Phi - I). \tag{2.1.11}$$
In particular, (2.1.11) holds if $\Phi \in C(E;\mathbb{R})$ satisfies (2.1.12) for some $\alpha \in (1,\infty)$.
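A quick sanity check of (2.1.11) in the simplest Gaussian setting may help fix ideas. The example below is not from the text: it assumes $\mu_\epsilon = N(0,\epsilon)$ on $\mathbb{R}$, for which the rate function is $I(q) = q^2/2$, and takes the linear function $\Phi(q) = aq$ with a fixed $a \in \mathbb{R}$, for which the integral can be computed in closed form.

```latex
% Assumptions: \mu_\epsilon = N(0,\epsilon), I(q) = q^2/2, \Phi(q) = a q.
% Left-hand side of (2.1.11), using the Gaussian moment generating function:
\int \exp[\Phi/\epsilon]\, d\mu_\epsilon
   = E\bigl[\exp\bigl(a \epsilon^{-1/2} Z\bigr)\bigr]
   = \exp\Bigl[\frac{a^2}{2\epsilon}\Bigr]
\quad\Longrightarrow\quad
\epsilon \log \int \exp[\Phi/\epsilon]\, d\mu_\epsilon = \frac{a^2}{2}.
% Right-hand side of (2.1.11):
\sup_{q \in \mathbb{R}} \Bigl( a q - \frac{q^2}{2} \Bigr) = \frac{a^2}{2}
\quad\text{(attained at } q = a\text{)}.
```

Here the two sides agree for every $\epsilon > 0$, not just in the limit, which is a special feature of the Gaussian case.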
PROOF: In view of Lemma 2.1.7 and Lemma 2.1.8, all that we have to do is check that (2.1.12) implies (2.1.9). But, by HOLDER'S inequality,

from which (2.1.9) follows immediately when (2.1.12) holds. ∎

2.1.13 Exercise.
(i) Define EULER'S Gamma function by
$$\Gamma(\gamma) = \int_{(0,\infty)} t^{\gamma-1} e^{-t}\,dt, \quad \gamma \in (0,\infty).$$
Note that
$$\gamma^{-\gamma+1}\,\Gamma(\gamma) = \gamma \int_{(0,\infty)} t^{\gamma-1} e^{-\gamma t}\,dt;$$
and using Theorem 2.1.10 together with Exercise 1.1.6, conclude that

This is, of course, a very weak version of STIRLING'S formula and, as such, it serves as a good example of both the virtues and the deficiencies in the asymptotic theory with which we are dealing.
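The computation suggested in (i) can be outlined as follows. This is only a heuristic sketch at the level of Laplace's method; the rigorous justification via Theorem 2.1.10 is the content of the exercise, and the final identity is stated here as the expected conclusion rather than quoted from the text.

```latex
% Substituting t = \gamma s in the definition of \Gamma(\gamma):
\gamma^{-\gamma}\,\Gamma(\gamma)
   = \int_{(0,\infty)} s^{\gamma-1} e^{-\gamma s}\, ds
   = \int_{(0,\infty)} s^{-1} \exp\bigl[\gamma(\log s - s)\bigr]\, ds .
% With \epsilon = 1/\gamma, the exponential rate is governed by
\sup_{s > 0} (\log s - s) = -1 \quad\text{(attained at } s = 1\text{)},
% which suggests
\lim_{\gamma \to \infty} \frac{1}{\gamma} \log\bigl(\gamma^{-\gamma}\,\Gamma(\gamma)\bigr) = -1,
\qquad\text{i.e.}\qquad
\log \Gamma(\gamma) = \gamma \log \gamma - \gamma + o(\gamma).
```

The limit captures only the exponential order of $\Gamma(\gamma)$; the factor $\sqrt{2\pi/\gamma}$ in the full STIRLING formula is invisible at this scale.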
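(ii) Let $W$ be WIENER'S measure on $\Theta$ with $d = 1$; and, for given $\beta \in \mathbb{R}$, define $X_\beta : [0,\infty) \times \Theta \to \mathbb{R}$ by the equation

and $\sigma_\beta : \Theta \to [0,\infty)$ by

If, for $\epsilon > 0$, $\mu_{\beta,\epsilon} \in M_1([0,\infty))$ is the distribution of $\theta \mapsto \epsilon^{1/2} \sigma_\beta(\theta)$ under $W$, show that $\{\mu_{\beta,\epsilon} : \epsilon > 0\}$ satisfies the full large deviation principle with the good rate function $I_\beta : [0,\infty) \to [0,\infty]$ given by

where
$$\alpha_\beta = \begin{cases} \beta^2 - \omega_\beta^2 & \text{for } \beta \ge 1 \\ \beta^2 + \omega_\beta^2 & \text{for } \beta < 1 \end{cases}$$
and
$$\omega_\beta = \begin{cases} \text{the } \omega \in (0,\pi) \text{ such that } \omega\cos\omega = \beta\sin\omega & \text{if } \beta \in (-\infty,1) \\ 0 & \text{if } \beta = 1 \\ \text{the } \omega \in (0,\infty) \text{ such that } \omega\cosh\omega = \beta\sinh\omega & \text{if } \beta \in (1,\infty). \end{cases}$$
Hint: Note that, by Lemma 2.1.4 combined with SCHILDER'S Theorem, the desired large deviation principle holds with
$$I_\beta(\sigma) = \frac{1}{2} \inf\Bigl\{ \int_0^1 \bigl(\dot\psi(t) - \beta\psi(t)\bigr)^2\,dt : \psi \in H^1 \text{ with } \int_0^1 \psi(t)^2\,dt = \sigma^2 \Bigr\} = I_\beta(1)\,\sigma^2;$$
and use the calculus of variations to evaluate $I_\beta(1)$.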
(iii) Next, define $Y_\beta : [0,\infty) \times \Theta^2 \to [0,\infty)$ to be the solution of the ITÔ stochastic integral equation

under $W^2$, and note that
$$Y_\beta(t,\theta,\theta') = \exp\Bigl[ \int_0^t X_\beta(s,\theta)\,d\theta'(s) - \frac{1}{2} \int_0^t X_\beta^2(s,\theta)\,ds \Bigr].$$
Letting $P_\beta \in M_1([0,\infty))$ denote the distribution of
$$(\theta,\theta') \in \Theta^2 \longmapsto Y_\beta(1,\theta,\theta')$$
under $W^2$, check that $P_\beta(dy) = p_\beta(y)\,dy$ where $p_\beta(y) = \frac{1}{y}\,q_\beta(\log y)$ and, for $z \ne 0$,

where $\epsilon = 1/|z|$. Finally, use ii) above and VARADHAN'S Theorem to show from this that
and that
lim c log
€+O
-1
(J
(O@)
[
1 (1 - 0 2 / 2 ) 2 exp -E
2a2
+ 0 2 / 2 y + I p ( 0 ) ] = ;[I + (1+
~ C X ~ ) ~ / ~ ] .
2a2
2.1.14 Exercise.
Let $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ and a rate function $I : E \to [0,\infty]$ be given.

(i) If $I$ is good and $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate $I$, show that there is a $q \in E$ for which $I(q) = 0$.

(ii) Assuming that $E$ is locally compact and that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate $I$, show that $I$ is good if and only if $\{\mu_\epsilon : \epsilon > 0\}$ is exponentially tight.
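for every lower semi-continuous $\Phi : E \to [-\infty,\infty]$.

(iv) If
$$\lim_{r\searrow 0} \varlimsup_{\epsilon\to 0} \epsilon \log\bigl[\mu_\epsilon(B(q,r))\bigr] \le -I(q), \quad q \in E,$$
show that

In particular, this means, of course, that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(K)\bigr) \le -\inf_K I, \quad K \subset\subset E.$$
Also, check that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\Bigl[\int \exp[\Phi/\epsilon]\,d\mu_\epsilon\Bigr] \le \sup_E(\Phi - I)$$
for every upper semi-continuous $\Phi : E \to [-\infty,\infty)$ which satisfies not only (2.1.9) but also the condition that
$$\{q \in E : -\Phi(q) \le L\} \subset\subset E, \quad L \in [0,\infty).$$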
(v) Assume that
$$\lim_{r\searrow 0} \varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(B(q,r))\bigr) = -I(q) = \lim_{r\searrow 0} \varliminf_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(B(q,r))\bigr), \quad q \in E.$$
Show that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with rate $I$ and that

for $\Phi \in C(E;\mathbb{R})$ which satisfy (2.1.9) and the condition
$$\{q \in E : -\Phi(q) \le L\} \subset\subset E, \quad L \in [0,\infty).$$
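2.1.15 Exercise.
For each $i$ from an index set $\mathcal{I}$ let $\{\mu_{i,\epsilon} : \epsilon > 0\}$ be a family of probability measures on $E$. Assume that there exists a good rate function $I : E \to [0,\infty]$ with the property that

(2.1.16)

Show that for any $\Phi \in C(E;\mathbb{R})$ which satisfies

one has that
$$\lim_{\epsilon\to 0} \sup_{i\in\mathcal{I}} \Bigl| \epsilon \log\Bigl(\int \exp[\Phi/\epsilon]\,d\mu_{i,\epsilon}\Bigr) - \sup\{\Phi(q) - I(q) : q \in E\} \Bigr| = 0.$$
In particular, show that (2.1.17), and therefore (2.1.18), holds if $\Phi \in C(E;\mathbb{R})$ satisfies (2.1.19) for some $\alpha \in (1,\infty)$.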
2.1.20 Exercise.
This exercise contains several variations on the theme of Lemma 2.1.4. Throughout, $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$, $I : E \to [0,\infty]$ is a good rate function, and $E'$ is a second separable metric space.

(i) Assume that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate $I$. Further, assume that there is a non-decreasing family $\{F_L : L \ge 0\}$ of closed sets in $E$ with the properties that $\mu_\epsilon(F_\infty) = 1$, $\epsilon > 0$, where $F_\infty \equiv \bigcup_{L\ge 0} F_L$, and that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(F_L^c)\bigr) \le -L, \quad L \ge 0.$$
Finally, suppose that $f : F_\infty \to E'$ is a function whose restriction to each $F_L$, $L \ge 0$, is continuous. If $\mu'_\epsilon \in M_1(E')$ is defined by

for $\epsilon > 0$ and $\Gamma' \in \mathcal{B}_{E'}$, and if
$$I'(q') = \inf\{I(q) : q \in F_\infty \text{ and } f(q) = q'\}, \quad q' \in E',$$
show that $I'$ is a good rate function on $E'$ and that it governs the large deviations of $\{\mu'_\epsilon : \epsilon > 0\}$.
(ii) Let $\{f_\epsilon : \epsilon \ge 0\}$ be a family of continuous maps from $E$ into $E'$, set
$$I'_0(q') = \inf\{I(q) : q \in E \text{ and } q' = f_0(q)\}, \quad q' \in E',$$
and assume that

where $\rho'$ denotes the metric on $E'$. Assuming that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate $I$, show that $\{\mu_\epsilon \circ f_\epsilon^{-1} : \epsilon > 0\}$ satisfies the full large deviation principle with the good rate function $I'_0$.

(iii) Let $f : [0,\infty) \times E \to E'$ be a measurable function for which there exists a sequence $\{f_n\}_1^\infty \subseteq C([0,\infty) \times E; E')$ with the properties that $f_n(0,\cdot) \to f(0,\cdot)$ uniformly on each level set of $I$ and

Assuming that $\{\mu_\epsilon : \epsilon > 0\}$ is exponentially tight and that $I$ governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$, show that the function $I' : E' \to [0,\infty]$ given by
$$I'(q') = \inf\{I(q) : q' = f(0,q)\}, \quad q' \in E',$$
is a good rate function and that it governs the large deviations of $\{\mu_\epsilon \circ f(\epsilon)^{-1} : \epsilon > 0\}$.
(iv) Again assume that $\{\mu_\epsilon : \epsilon > 0\}$ is an exponentially tight family whose large deviations are governed by $I$. Next, let $X$ be a compact metric space and suppose that $f : [0,\infty) \times E \times X \to E'$ is a measurable map with the property that there is a sequence $\{f_n\}_1^\infty \subseteq C([0,\infty) \times E \times X; E')$ such that

for $L \in [0,\infty)$, and

for $\delta \in (0,\infty)$. Finally, define $I'_x : E' \to [0,\infty]$ for $x \in X$ by
$$I'_x(q') = \inf\{I(q) : q' = f(0,q,x)\}, \quad q' \in E'.$$
Show that $I'_x$ is a good rate function for each $x \in X$. In addition, show that if $x_\epsilon \to x$ as $\epsilon \searrow 0$ and if $g(\epsilon,q) = f(\epsilon,q,x_\epsilon)$ for $(\epsilon,q) \in (0,\infty) \times E$, then $I'_x$ governs the large deviations of $\{\mu_\epsilon \circ g(\epsilon)^{-1} : \epsilon > 0\}$.

Hint: By using the exponential tightness of $\{\mu_\epsilon : \epsilon > 0\}$, show that, for every $\delta \in (0,\infty)$,

(v) Refer to the setting of part (iv) above and suppose that $\Phi \in C(E';\mathbb{R})$ has the property that

for some $\alpha \in (1,\infty)$. Show that
2.1.21 Exercise.
The purpose of this exercise is to check that the full large deviation principle behaves in a functorial fashion under projective limits. To be precise, suppose that $\{E_n : n \in \mathbb{Z}^+\}$ is a sequence of Polish spaces and that, for each $n \in \mathbb{Z}^+$, $\rho_n$ is a complete metric for $E_n$ and $\pi_{n+1,n} : E_{n+1} \to E_n$ is a mapping with the property that
$$\rho_n(\pi_{n+1,n} x_{n+1}, \pi_{n+1,n} y_{n+1}) \le \rho_{n+1}(x_{n+1}, y_{n+1})$$
for all $x_{n+1}, y_{n+1} \in E_{n+1}$. Define $E$ to be the set of $\mathbf{x} = (x_1, \ldots, x_n, \ldots) \in \prod_{n=1}^\infty E_n$ such that $x_n = \pi_{n+1,n} x_{n+1}$ for every $n \in \mathbb{Z}^+$, and let $\pi_n$ denote the restriction to $E$ of the natural projection map from $\prod_{n=1}^\infty E_n$ onto $E_n$. Give $E$ the topology which it inherits from the product topology on $\prod_{n=1}^\infty E_n$, and define

(2.1.22)

(i) Show that $E$ is a Polish space and, in fact, that $\rho$ is a complete metric on $E$. Further, check that $G$ is an open subset of $E$ if and only if $G = \bigcup_{n=1}^\infty \pi_n^{-1} G_n$, where $G_n$ is an open subset of $E_n$ for each $n \in \mathbb{Z}^+$. Also, check that for each closed subset $F$ of $E$ and every $\delta > 0$, there is an $n \in \mathbb{Z}^+$ and a closed subset $F_n$ of $E_n$ such that $F \subseteq \pi_n^{-1} F_n \subseteq F^{(\delta)}$, where $F^{(\delta)}$ is computed relative to the metric $\rho$. Finally, show that $K \subset\subset E$ if and only if $K = \bigcap_{n=1}^\infty \pi_n^{-1} K_n$, where $K_n \subset\subset E_n$ for each $n \in \mathbb{Z}^+$. The metric space $(E,\rho)$ is called the projective limit of the sequence $\{(E_n, \pi_{n+1,n}, \rho_n) : n \in \mathbb{Z}^+\}$.

(ii) For each $n \in \mathbb{Z}^+$ suppose that $I_n : E_n \to [0,\infty]$ is a good rate function, and define
$$I(\mathbf{x}) = \sup_{n\in\mathbb{Z}^+} I_n(\pi_n \mathbf{x}), \quad \mathbf{x} \in E. \tag{2.1.23}$$
Show that $I$ is a good rate function and that
$$I_n(x_n) = \inf\{I(\mathbf{x}) : x_n = \pi_n(\mathbf{x})\}, \quad n \in \mathbb{Z}^+ \text{ and } x_n \in E_n.$$

(iii) Again let $I_n$ be a good rate function on $E_n$ for each $n \in \mathbb{Z}^+$ and define $I$ accordingly as in (2.1.23). Next, let $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$; and, for each $n \in \mathbb{Z}^+$ set $\mu_{n,\epsilon} = \mu_\epsilon \circ \pi_n^{-1}$. If, for each $n \in \mathbb{Z}^+$,
$$\varliminf_{\epsilon\to 0} \epsilon \log\bigl(\mu_{n,\epsilon}(G_n)\bigr) \ge -\inf_{G_n} I_n$$
for every open $G_n$ in $E_n$, show that
$$\varliminf_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(G)\bigr) \ge -\inf_G I$$
for every open $G$ in $E$. Similarly, if, for each $n \in \mathbb{Z}^+$,
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_{n,\epsilon}(F_n)\bigr) \le -\inf_{F_n} I_n$$
for all closed $F_n$ in $E_n$, show that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(F)\bigr) \le -\inf_F I$$
for every closed $F$ in $E$.
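2.1.24 Exercise.
Assume that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with respect to the good rate function $I$, and suppose that $\Phi \in C(E;\mathbb{R})$ satisfies the condition in (2.1.9).

(i) Show that
$$\varliminf_{\epsilon\to 0} \epsilon \log\Bigl(\int_G \exp[\Phi(q)/\epsilon]\,\mu_\epsilon(dq)\Bigr) \ge \sup_G(\Phi - I)$$
for open $G \subseteq E$ and that
$$\varlimsup_{\epsilon\to 0} \epsilon \log\Bigl(\int_F \exp[\Phi(q)/\epsilon]\,\mu_\epsilon(dq)\Bigr) \le \sup_F(\Phi - I)$$
for closed $F \subseteq E$. In particular, conclude that
$$\lim_{L\to\infty} \inf\{I(q) - \Phi(q) : q \in E \text{ with } \Phi(q) > L\} = \infty.$$
Hint: For $\Gamma \in \mathcal{B}_E$, set
$$\Phi_\Gamma(q) = \begin{cases} \Phi(q) & \text{if } q \in \Gamma \\ -\infty & \text{if } q \notin \Gamma; \end{cases}$$
and apply Lemma 2.1.7 and Lemma 2.1.8 to $\Phi_G$ and $\Phi_F$, respectively.

(ii) For $\epsilon > 0$, define

Next, set $J(q) = I(q) - \Phi(q) - a$, $q \in E$, where $a = \inf_E(I - \Phi)$. Show that $J$ is a good rate function and that it governs the large deviations of $\{\nu_\epsilon : \epsilon > 0\}$. Finally, check that when there is precisely one $p \in E$ at which $J$ vanishes then the measures $\nu_\epsilon$ converge to $\delta_p$.

2.1.25 Exercise.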
In Theorem 1.4.25, we made the assumption that the diffusion matrix $a : \mathbb{R}^d \to \mathbb{R}^d \otimes \mathbb{R}^d$ was symmetric and that $a$ together with the drift coefficient $b : \mathbb{R}^d \to \mathbb{R}^d$ satisfy (1.4.10) and (1.4.11). Here we replace those assumptions by
{
I;”.:”,X)= inf IT($) : $ E H1 and, for t E [O,T],
for X E RT (cf. (1.3.36) for the notation here) is a good rate function and that (1.4.26) continues to hold. Next (cf. Exercise 1.4.33) extend this result to cover the case when the preceding upper bounds on a and b are replaced by
$$0 \le a(x) \le M(1 + |x|^2)^{1/2}\, I_{\mathbb{R}^d} \quad\text{and}\quad |b(x)| \le M(1 + |x|^2)^{1/2}, \quad x \in \mathbb{R}^d.$$
2.2 Large Deviations and Convex Analysis
As we saw in Chapter I, it is sometimes the case that the state space $E$ is a separable BANACH space. Furthermore, even when $E$ is not itself a vector space, it often turns out that it is a convex subset of one. For this reason we formulate the following somewhat cumbersome hypothesis about $E$.

(C) $E$ is a closed convex subset of the locally convex, HAUSDORFF topological (real) vector space $X$, and $E$ is a Polish space with respect to the topology that it inherits as a subset of $X$.

2.2.1 Remark.
The two examples which should be kept in mind are when $E$ is itself a separable BANACH space (in which case we take $X = E$) and when $E = M_1(\Sigma)$, where $\Sigma$ is a Polish space. In the latter case, we take $X = M(\Sigma)$ to be the space of all finite signed measures on $\Sigma$ and endow $M(\Sigma)$ with the topology generated by the sets
$$\Bigl\{ \beta \in M(\Sigma) : \Bigl| \int \varphi\,d\beta - \int \varphi\,d\alpha \Bigr| < r \Bigr\}, \tag{2.2.2}$$
where $\alpha \in M(\Sigma)$, $\varphi \in C_b(\Sigma;\mathbb{R})$, and $r > 0$. As is well known (cf. Lemma 3.2.2 below), the LEVY metric on $M_1(\Sigma)$ is a complete separable metric, which is consistent with the restriction of this topology to $M_1(\Sigma)$.
Throughout the rest of this section we will be assuming, without further comment, that we are in the situation described in (C). In this connection, we will be using $X^*$ to denote the (real) topological dual of $X$; and, for $\mu \in M_1(E)$, we will define the logarithmic moment generating function of $\mu$ to be the map $\lambda \in X^* \mapsto \Lambda_\mu(\lambda) \in [-\infty,\infty]$ given by
$$\Lambda_\mu(\lambda) = \log\Bigl( \int_E \exp\bigl[{}_{X^*}\langle \lambda, q \rangle_X\bigr]\,\mu(dq) \Bigr). \tag{2.2.3}$$
As we saw in Sections 1.2 and 1.3, when $\{\mu_\epsilon : \epsilon > 0\}$ is a family of measures on a separable BANACH space, the logarithmic moment generating functions of the $\mu_\epsilon$'s can play an important role in the analysis of the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$. It should therefore come as no surprise that the same is true even when we are working with the more general situation described by (C). The reason for this is partially explained by the next result.

2.2.4 Theorem. Let $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ and assume that
$$\Lambda(\lambda) \equiv \lim_{\epsilon\to 0} \epsilon\,\Lambda_{\mu_\epsilon}(\lambda/\epsilon) \in [-\infty,\infty] \tag{2.2.5}$$
exists for every $\lambda \in X^*$. Then $\Lambda$ is a convex function on $X^*$. Moreover, if the LEGENDRE transform $q \in E \mapsto \Lambda^*(q)$ of $\Lambda$ is defined by
$$\Lambda^*(q) = \sup\bigl\{ {}_{X^*}\langle \lambda, q \rangle_X - \Lambda(\lambda) : \lambda \in X^* \bigr\}, \tag{2.2.6}$$
then $\Lambda^*$ is a non-negative, lower semi-continuous, convex function; and, for any $F \subset\subset E$,
$$\varlimsup_{\epsilon\to 0} \epsilon \log\bigl(\mu_\epsilon(F)\bigr) \le -\inf_F \Lambda^*. \tag{2.2.7}$$
Finally, if in addition, $\{\mu_\epsilon : \epsilon > 0\}$ is exponentially tight, then (2.2.7) continues to hold for all closed subsets $F$ of $E$.
PROOF: The convexity of $\Lambda$ follows from that of the $\Lambda_{\mu_\epsilon}$'s, which in turn is a consequence of HOLDER'S inequality. To see that $\Lambda^*(q) \ge 0$ for every $q \in E$, simply note that $\Lambda(0) = 0$. Also, because $\Lambda^*$ is the point-wise supremum over continuous affine functions on $E$, it is lower semi-continuous and convex. The proof of (2.2.7) for compact $F$ is little more than a re-run of the argument used to derive (1.3.16). (Note that when $\mu_\epsilon = W_\epsilon$, one has that $\epsilon\,\Lambda_{\mu_\epsilon}(\lambda/\epsilon) = \Lambda_W(\lambda)$ for all $\epsilon > 0$.) Namely, let $p \in E$ and $\delta \in (0,1]$ be given and choose $\lambda \in X^*$ so that
$${}_{X^*}\langle \lambda, p \rangle_X - \Lambda(\lambda) \ge \begin{cases} 1/\delta & \text{if } \Lambda^*(p) = \infty \\ \Lambda^*(p) - \delta/2 & \text{if } \Lambda^*(p) < \infty. \end{cases}$$
Next choose $r > 0$ so that ${}_{X^*}\langle \lambda, p - q \rangle_X \le \delta/2$ for $q \in B(p,r)$. Since

we see that

for all sufficiently small $\epsilon$ and $r$. Once one has (2.2.8), (2.2.7) for compact $F$ follows from the last part of (iv) in Exercise 2.1.14. Finally, the extension to all closed $F$ when $\{\mu_\epsilon : \epsilon > 0\}$ is exponentially tight is precisely the same as the last part of the proof of Lemma 2.1.5. ∎
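To see the objects of Theorem 2.2.4 in the simplest case, the following sketch computes $\Lambda$ and $\Lambda^*$ for a one-dimensional Gaussian family. The choice $\mu_\epsilon = N(0,\epsilon)$ on $E = X = \mathbb{R}$ is an assumption made here purely for illustration.

```latex
% Assumption: \mu_\epsilon = N(0,\epsilon) on \mathbb{R}, so X^* = \mathbb{R}
% and the duality pairing is ordinary multiplication.
\Lambda_{\mu_\epsilon}(\lambda) = \log \int e^{\lambda q}\, \mu_\epsilon(dq) = \frac{\epsilon \lambda^2}{2},
\qquad
\epsilon\, \Lambda_{\mu_\epsilon}(\lambda/\epsilon)
   = \epsilon \cdot \frac{\epsilon}{2} \cdot \frac{\lambda^2}{\epsilon^2}
   = \frac{\lambda^2}{2} = \Lambda(\lambda).
% Legendre transform, as in (2.2.6):
\Lambda^*(q) = \sup_{\lambda \in \mathbb{R}} \Bigl( \lambda q - \frac{\lambda^2}{2} \Bigr) = \frac{q^2}{2}
\quad\text{(attained at } \lambda = q\text{)}.
```

In this case $\Lambda^*$ coincides with the familiar Gaussian rate function, so the upper bound (2.2.7) is sharp.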
Although the preceding indicates that, when $\Lambda$ exists, its LEGENDRE transform $\Lambda^*$ is a good candidate for the rate function governing the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$, we know that, in general, $\Lambda^*$ will not be the correct rate function. Indeed, from Lemma 2.1.4 we know that when the $\mu_\epsilon$'s come from pushing measures $\nu_\epsilon$ forward under a continuous mapping $f$ and if $J$ governs the large deviations of $\{\nu_\epsilon : \epsilon > 0\}$, then the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$ will be governed by the rate function $I$ given by $I(p) = \inf\{J(q) : p = f(q)\}$. Since it is extremely unlikely that such an $I$ will be convex even if $J$ is, we see that convexity of $I$ will be more the exception than the rule. With the preceding in mind and assuming that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with some rate function $I$, one might ask if convexity is the only obstruction to the identification of $I$ with $\Lambda^*$. As we are about to see, the answer to this question is, apart from minor technicalities, "yes." There are two steps in the proof. The first one is the easy application of Theorem 2.1.10 alluded to above.

2.2.9 Lemma. Let $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ satisfy the condition that

If $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with the good rate function $I$, then the limit $\Lambda(\lambda)$ in (2.2.5) exists for every $\lambda \in X^*$ and satisfies
$$\Lambda(\lambda) = \sup\bigl\{ {}_{X^*}\langle \lambda, q \rangle_X - I(q) : q \in E \bigr\}, \quad \lambda \in X^*. \tag{2.2.11}$$

PROOF: Note that (2.2.10) guarantees that

for each $\lambda \in X^*$. Hence, we can apply Theorem 2.1.10 to the function

and thereby conclude not only that $\Lambda(\lambda)$ exists but also that (2.2.11) holds. ∎

2.2.12 Remark.
Let everything be as in Lemma 2.2.9 and define

(2.2.13)

Obviously, (2.2.11) is then equivalent to
$$\Lambda(\lambda) = \sup\bigl\{ {}_{X^*}\langle \lambda, x \rangle_X - \hat I(x) : x \in X \bigr\}, \quad \lambda \in X^*. \tag{2.2.14}$$
Moreover, $\hat I$ is always lower semi-continuous on $X$. Finally, $\hat I$ is convex on $X$ if $I$ is convex on $E$.
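The remark preceding Lemma 2.2.9, that pushing forward under a continuous map typically destroys convexity, can be made concrete. The sketch below is an illustration under assumptions that are not part of the text: $\nu_\epsilon = N(0,\epsilon)$ on $\mathbb{R}$ with rate function $J(q) = q^2/2$, and $f(q) = q^3$.

```latex
% Assumptions: \nu_\epsilon = N(0,\epsilon), J(q) = q^2/2, f(q) = q^3, \mu_\epsilon = \nu_\epsilon \circ f^{-1}.
I(p) = \inf\{\, q^2/2 : q^3 = p \,\} = \tfrac{1}{2}\, |p|^{2/3}, \qquad p \in \mathbb{R}.
% I is not convex: at the midpoint of 0 and 1,
I(\tfrac12) = \tfrac12 \cdot 2^{-2/3} \approx 0.315 \;>\; \tfrac14 = \tfrac12\bigl( I(0) + I(1) \bigr).
% Under \mu_\epsilon the coordinate has the law of \epsilon^{3/2} Z^3 (Z standard normal), so
\Lambda_{\mu_\epsilon}(\lambda/\epsilon) = \log E\bigl[ \exp(\lambda\, \epsilon^{1/2} Z^3) \bigr] = \infty
   \quad\text{for } \lambda \ne 0,
\qquad \Lambda(0) = 0,
% and therefore
\Lambda^*(p) = \sup_{\lambda} \bigl( \lambda p - \Lambda(\lambda) \bigr) = 0 \quad\text{for every } p .
```

Here $\Lambda^* \equiv 0$ lies strictly below $I$ away from the origin, so the LEGENDRE transform recovers only a convex lower bound for the true rate function.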
The second step in our program is contained in the following theorem about one of the basic properties of the LEGENDRE transform. If one looks carefully at the proof, one realizes that this property is an analytic statement of the geometric fact that at each point on the graph of a convex function there is a tangent line which never goes above the graph.

2.2.15 Theorem. Let $f : X \to (-\infty,\infty]$ be a lower semi-continuous, convex function and define $g : X^* \to (-\infty,\infty]$ by
$$g(\lambda) = \sup\bigl\{ {}_{X^*}\langle \lambda, x \rangle_X - f(x) : x \in X \bigr\}.$$
If $f$ is not identically equal to $\infty$, then
$$f(x) = \sup\bigl\{ {}_{X^*}\langle \lambda, x \rangle_X - g(\lambda) : \lambda \in X^* \bigr\}, \quad x \in X. \tag{2.2.16}$$
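A one-dimensional instance of (2.2.16) may be useful before the proof. The example below, with $X = X^* = \mathbb{R}$ and $f(x) = |x|$, is chosen here for illustration and does not appear in the text.

```latex
% Assumption: X = X^* = \mathbb{R}, f(x) = |x| (convex, continuous, not identically \infty).
g(\lambda) = \sup_{x \in \mathbb{R}} \bigl( \lambda x - |x| \bigr)
   = \begin{cases} 0, & |\lambda| \le 1, \\ \infty, & |\lambda| > 1. \end{cases}
% Transforming back as in (2.2.16):
\sup_{\lambda \in \mathbb{R}} \bigl( \lambda x - g(\lambda) \bigr)
   = \sup_{|\lambda| \le 1} \lambda x = |x| = f(x).
```

The lines $x \mapsto \lambda x$ with $|\lambda| \le 1$ are exactly the affine functions lying below the graph of $f$, which is the geometric picture the proof develops in general.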
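PROOF: The first step in the proof is to develop the geometric picture alluded to above. To this end, define
$$\mathcal{E}(f) = \{(x,\alpha) \in X \times \mathbb{R} : \alpha \ge f(x)\}$$
and
$$\mathcal{E}^*(f) = \bigl\{(\lambda,\beta) \in X^* \times \mathbb{R} : f(x) \ge {}_{X^*}\langle \lambda, x \rangle_X - \beta \text{ for every } x \in X\bigr\}.$$
It is then an easy matter to check from our assumptions that $\mathcal{E}(f)$ is a non-empty, closed, convex subset of $X \times \mathbb{R}$. Indeed, the closedness and convexity of $\mathcal{E}(f)$ come from the lower semi-continuity and convexity of $f$; and it is clear that $(x_0, f(x_0)) \in \mathcal{E}(f)$, where $x_0$ is any element of $X$ for which $f(x_0) < \infty$. On the other hand, although $\mathcal{E}^*(f)$ is obviously closed and convex, it is less obvious that it is non-empty. To see that $\mathcal{E}^*(f) \ne \emptyset$, choose $x_0 \in X$ as above and apply the HAHN-BANACH Theorem to find a $(\mu,\rho,\gamma) \in X^* \times \mathbb{R} \times \mathbb{R}$ with the properties that the closed affine half space
$$H(\mu,\rho,\gamma) \equiv \bigl\{(x,\alpha) \in X \times \mathbb{R} : {}_{X^*}\langle \mu, x \rangle_X - \rho\alpha \le \gamma\bigr\} \tag{2.2.17}$$
contains the set $\mathcal{E}(f)$ but not the point $(x_0, f(x_0) - 1)$. Then, since
$${}_{X^*}\langle \mu, x_0 \rangle_X - \rho f(x_0) \le \gamma$$
while
$${}_{X^*}\langle \mu, x_0 \rangle_X - \rho\bigl(f(x_0) - 1\bigr) > \gamma,$$
we see that $\rho > 0$, and therefore that
$$(\lambda_0,\beta_0) \equiv \Bigl(\frac{\mu}{\rho}, \frac{\gamma}{\rho}\Bigr) \in \mathcal{E}^*(f). \tag{2.2.18}$$
Next, noting that
$$\beta \ge g(\lambda) \text{ for every } (\lambda,\beta) \in \mathcal{E}^*(f) \quad\text{and}\quad (\lambda, g(\lambda)) \in \mathcal{E}^*(f) \text{ for any } \lambda \in X^* \text{ with } g(\lambda) < \infty,$$
one sees that
$$g(\lambda) = \inf\{\beta : (\lambda,\beta) \in \mathcal{E}^*(f)\},$$
and therefore that (2.2.16) is equivalent to
$$f(x) = \sup\bigl\{ {}_{X^*}\langle \lambda, x \rangle_X - \beta : (\lambda,\beta) \in \mathcal{E}^*(f) \bigr\}, \quad x \in X. \tag{2.2.19}$$
Since it is clear that $f(x) \ge {}_{X^*}\langle \lambda, x \rangle_X - \beta$ for any $x \in X$ and $(\lambda,\beta) \in \mathcal{E}^*(f)$, we will have proved (2.2.19) as soon as we show that, for each $(x,\alpha) \notin \mathcal{E}(f)$, there is a $(\lambda,\beta) \in \mathcal{E}^*(f)$ such that
$${}_{X^*}\langle \lambda, x \rangle_X - \beta > \alpha. \tag{2.2.20}$$
In order to prove the existence of the pair $(\lambda,\beta) \in \mathcal{E}^*(f)$ in (2.2.20), suppose that $x \in X$ and that $\alpha < f(x)$ are given. Then, since $(x,\alpha) \notin \mathcal{E}(f)$, the HAHN-BANACH Theorem again provides the existence of $(\mu,\rho,\gamma) \in X^* \times \mathbb{R} \times \mathbb{R}$ so that the $H(\mu,\rho,\gamma)$ in (2.2.17) contains $\mathcal{E}(f)$ and $(x,\alpha) \notin H(\mu,\rho,\gamma)$. In particular, since ${}_{X^*}\langle \mu, x_0 \rangle_X - \rho\xi \le \gamma$ for $\xi \ge f(x_0)$, we know that $\rho \ge 0$. Hence, for every $\delta > 0$,

where $(\lambda_0,\beta_0)$ is the element of $\mathcal{E}^*(f)$ described in (2.2.18). (The introduction of $\delta > 0$ here is to take care of the case when the tangent hyperplane is vertical and therefore $\rho = 0$.) At the same time, for sufficiently small $\delta > 0$ one has that

Hence, (2.2.20) holds with $(\lambda,\beta) = (\lambda_\delta,\beta_\delta)$ for any sufficiently small $\delta > 0$. ∎

By combining Lemma 2.2.9 and Remark 2.2.12 with Theorem 2.2.15, we arrive at the following useful algorithm for identifying convex rate functions.

2.2.21 Theorem. Assume that $\{\mu_\epsilon : \epsilon > 0\} \subseteq M_1(E)$ satisfies (2.2.10) and that $I$ is a convex, good rate function which governs the large deviations of $\{\mu_\epsilon : \epsilon > 0\}$. Then, not only does the limit $\Lambda(\lambda)$ in (2.2.5) exist for every $\lambda \in X^*$, but also (2.2.11) holds and
$$I(q) = \Lambda^*(q) \equiv \sup\bigl\{ {}_{X^*}\langle \lambda, q \rangle_X - \Lambda(\lambda) : \lambda \in X^* \bigr\}, \quad q \in E. \tag{2.2.22}$$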
2.2.23 Exercise.
Suppose that $\{\mu_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with rate function $I$. Further, assume that the limit $\Lambda(\lambda)$ in (2.2.5) exists for each $\lambda \in X^*$.

(i) Show that $\Lambda^* \le I$.

(ii) If one knows, in addition, that $\Lambda$ and $I$ satisfy (2.2.11), show that $\Lambda^* \ge f$ for every lower semi-continuous convex $f : E \to (-\infty,\infty]$ which satisfies $f \le I$. In other words, $\Lambda^*$ is the lower semi-continuous, convex minorant of $I$.
III General Cramér Theory

3.1 Preliminary Formulation
We want in this section to extend the CRAMER Theorem (cf. Theorem 1.2.6) to a more general setting. In order to describe the setting which we have in mind, it will be necessary to introduce the following embellished form of the hypothesis (C) made at the beginning of Section 2.2.

(c) $E$ and $X$ are the same as they were in (C). In addition there is a metric $\rho$ on $E$ which is compatible with the topology on $E$ induced by the topology on $X$ and a measurable norm $\|\cdot\|$ on $X$ (which need not be compatible with the topology on $X$) such that: $(E,\rho)$ is Polish; $\|\cdot\|$ is bounded on $\rho$-bounded subsets of $E$;

for all $\alpha \in [0,1]$ and all elements $p_1, p_2, q_1, q_2 \in E$; and

Without further mention, we will be working in this section with the situation which we now describe. $E$, $X$, $\rho$, and $\|\cdot\|$ are as in (c), and $\Omega = E^{\mathbb{Z}^+}$ is given the product topology. Note that, since $E$ is a separable metric space, the BOREL field $\mathcal{B}_\Omega$ over $\Omega$ coincides with the product $\sigma$-algebra $(\mathcal{B}_E)^{\mathbb{Z}^+}$. Next, for $n \in \mathbb{Z}^+$, we use $X_n : \Omega \to E$ to denote the $n$th coordinate map (i.e., $X_n(\omega) = \omega_n$). In view of the preceding remark about $\mathcal{B}_\Omega$, one sees that not only is each of the maps $X_n$ measurable from $(\Omega,\mathcal{B}_\Omega)$ into $(X,\mathcal{B}_X)$ but so are linear combinations of these maps. Given $0 \le m \le n$, we will use $S_n^m$ to denote $\sum_{k=m+1}^n X_k$ ($\equiv 0$ when $m = n$) and $\bar S_n^m$ to stand for $S_n^m/(n-m)$; and when $m = 0$, we will usually drop the superscript. Finally, $\mu \in M_1(E)$, $P \equiv \mu^{\mathbb{Z}^+}$ (again using the remark about $\mathcal{B}_\Omega$, one sees that $P \in M_1(\Omega)$) and $\mu_n \in M_1(E)$ is the distribution of $\bar S_n$ under $P$.

Our purpose will be to study the large deviation theory for the family $\{\mu_n : n \ge 1\}$. Obviously, to whatever extent we succeed, we will have generalized CRAMER'S Theorem. Our approach is an amalgam of ideas coming from D. RUELLE via O. LANFORD and the results obtained in Section 2.2. In particular, we will first use LANFORD'S argument to show, in complete generality, that $\{\mu_n : n \ge 1\}$ satisfies a weak large deviation principle with a convex rate function. We will then do our best to replace the weak principle with the full large deviation principle and to identify the governing rate function. The main reason for our needing to make the assumptions in (c) is that we will want to use the technical facts proved in the following lemma.
3.1.1 Lemma. Let $A$ be a non-empty, open convex subset of $E$. Then for any $K \subset\subset A$, the closed convex hull $\hat K$ of $K$ is also a compact subset of $A$. In particular, if $\nu \in M_1(E)$, then, for each $\delta \in (0,1)$, there is a convex $K \subset\subset A$ such that $\nu(K) \ge (1-\delta)\,\nu(A)$.

PROOF: Suppose that $K \subset\subset A$. Given $0 < \delta < \rho(K, A^c)$, choose $p_1, \ldots, p_M \in K$ so that
$$K \subseteq \bigcup_{1}^{M} B(p_m,\delta)$$
and denote by $\Gamma(\delta)$ the set of points $\sum_1^M \alpha_m q_m$, where $\{\alpha_m\}_1^M \subseteq [0,1]$ with $\sum_1^M \alpha_m = 1$ and $q_m \in B(p_m,\delta)$, $1 \le m \le M$. Clearly, $\Gamma(\delta) \subseteq A$ and is closed in $E$. Moreover, because (c) implies that $\rho$-balls are convex, it is easy to show that $\Gamma(\delta)$ is convex. Hence, $\hat K \subseteq \Gamma(\delta)$. This not only proves that $\hat K \subseteq A$, but it also gives us an easy way to see that $\hat K$ is compact. Indeed, again using (c), one sees that

where $\{p_1, \ldots, p_M\}^{\wedge}$ is the convex hull of $\{p_1, \ldots, p_M\}$ and, as such, is compact. Since $\hat K \subseteq \Gamma(\delta)$ and $\delta$ can be taken arbitrarily small, it follows immediately that $\hat K$ is totally bounded and therefore, since it is closed in $E$, compact.

Given the first part, the second part of the lemma is an immediate consequence of the well-known ULAM'S Lemma which says that, because $E$ and therefore $A$ are Polish spaces, there is a $K \subset\subset A$ such that $\nu(K) \ge (1-\delta)\,\nu(A)$; and obviously, the first part says that we may as well take $K$ to be convex. ∎

Our first application of Lemma 3.1.1 occurs already in the second part of the next key result.
3.1.2 Lemma. For each convex $C \in \mathcal{B}_E$, $n \in \mathbb{Z}^+ \mapsto \mu_n(C)$ is super-multiplicative. In addition, if $A$ is an open convex subset of $E$, then either $\mu_n(A) = 0$ for all $n \in \mathbb{Z}^+$ or there exists an $N \in \mathbb{Z}^+$ such that $\mu_n(A) > 0$ for all $n \ge N$.

PROOF: To prove the first assertion, observe that, by convexity,

and therefore, by shift invariance and independence,

We next turn to the second assertion. Suppose that $\mu_m(A) > 0$ for some $m \in \mathbb{Z}^+$, and, using Lemma 3.1.1, choose a convex $K \subset\subset A$ so that $\mu_m(K) > 0$. Let $0 < 2\delta < \rho(K, A^c)$, take $G = \{q \in E : \|q - K\| < \delta\}$, and set $M = \sup\{\|q\| : q \in K\}$. Then, for $n = sm + r$, where $0 \le r < m$,

as long as $mM < n\delta$. Thus, if we choose $N$ so that $mM < N\delta$ and

then, since $K$ is convex, we have that

for all $n \ge N$. ∎

Before we can use Lemma 3.1.2, we recall the following simple fact about sub-additive functions.
3.1.3 Lemma. Let $f : \mathbb{Z}^+ \to [0,\infty]$ be a sub-additive function and assume that there is an $N \in \mathbb{Z}^+$ such that $f(n) < \infty$ for all $n \ge N$. Then
$$\lim_{n\to\infty} \frac{f(n)}{n} = \inf_{n \ge N} \frac{f(n)}{n} \in [0,\infty).$$

PROOF: For $m \ge N$, set $M_m = \max\{f(n) : m \le n \le 2m\}$. For $n \ge m \ge N$,

where $s = [n/m]$ and $r = n - ms$. Hence,
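The sub-additivity estimate behind Lemma 3.1.3 can be written out as follows. This is a sketch of the standard argument, reconstructed with the notation $M_m$, $s$, $r$ of the proof; it is not quoted from the text.

```latex
% Write n = (s-1)m + (m+r) with s = [n/m] \ge 1 and 0 \le r < m, so that m \le m + r < 2m.
% Sub-additivity, f(a+b) \le f(a) + f(b), then gives
f(n) \le (s-1)\, f(m) + f(m+r) \le (s-1)\, f(m) + M_m .
% Divide by n and let n \to \infty with m \ge N fixed; (s-1)m/n \to 1 and M_m/n \to 0:
\varlimsup_{n \to \infty} \frac{f(n)}{n}
   \le \varlimsup_{n \to \infty} \Bigl( \frac{(s-1)m}{n} \cdot \frac{f(m)}{m} + \frac{M_m}{n} \Bigr)
   = \frac{f(m)}{m} .
% Taking the infimum over m \ge N:
\varlimsup_{n \to \infty} \frac{f(n)}{n}
   \le \inf_{m \ge N} \frac{f(m)}{m}
   \le \varliminf_{n \to \infty} \frac{f(n)}{n} .
```

The quantity $M_m$ is finite because every $n$ in $[m, 2m]$ satisfies $n \ge N$, and the limit lies in $[0,\infty)$ because $f \ge 0$ and $f(N) < \infty$.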
By combining Lemma 3.1.2 with Lemma 3.1.3, we know that if $\mathcal{C}^\circ$ denotes the collection of all non-empty, convex open sets $A$ in $E$, then
$$\mathcal{L}(A) \equiv \mathcal{L}_\mu(A) \equiv -\lim_{n\to\infty} \frac{1}{n} \log\bigl(\mu_n(A)\bigr) \in [0,\infty] \tag{3.1.4}$$
exists for every $A \in \mathcal{C}^\circ$. Noting that if $I$ is the rate function governing the large deviations of $\{\mu_n : n \ge 1\}$, then (cf. the proof of Lemma 2.1.1)
$$I(q) = I_\mu(q) \equiv \lim_{r\searrow 0} \mathcal{L}_\mu(B(q,r)) = \sup\{\mathcal{L}_\mu(A) : q \in A \in \mathcal{C}^\circ\}, \tag{3.1.5}$$
we see that there is no alternative to our adopting (3.1.5) as the definition of $I$. Of course, we still have to check that this $I$ does indeed give rise to a large deviation principle.

3.1.6 Theorem. The function $I_\mu$ in (3.1.5) is a convex rate function on $E$ and $\{\mu_n : n \ge 1\}$ satisfies the weak large deviation principle with rate function $I_\mu$. Furthermore, if $G$ is a finite union of elements from $\mathcal{C}^\circ$, then
$$\lim_{n\to\infty} \frac{1}{n} \log\bigl(\mu_n(G)\bigr) = -\inf_G I_\mu.$$
9.
62
Large Deviations
and A 2 A1aAa.Then
C ( A ) = - lim
n+m
1 2n
- log ( p z n ( A ) )
{w :
1 =-
2
5
(- lim
1
- log
n-mn
Ip(q1)
sn(w) E A1 and Sn(u) E Az}))
1 [ p n ( A l ) ]- lim - log [ p n n+m n
(&)I)
+U 4 2 ) . 7
2
+
and from this we conclude that I p ( q ) 5 ( I p ( q l ) I p ( q 2 ) ) / 2 . Because we already know that I p is lower semi-continuous, the convexity of I p is now proved by a familiar iteration argument followed by a passage to the limit. The fact that lim_ ,~ log ( p n ( G ) )2 - infG I p for arbitrary open G in E is built into the definition of I p . Next, suppose that K cc E and let C < infK I p . Then, there is a finite cover { A l , . . . , A M }& C" of K with the property that C(Am) > C for each 1 5 m 5 M . Hence,
and so we have proved that En+m log ( p n ( K ) )5 - infK I p and therefore that the weak large deviation principle holds. To complete the proof, suppose that G = M A,, where {A,}? G C". Then an easy argument shows that
Ul
1 n
- lim -log(pn(G)) n-m
=
min L ( A m ) .
ljm<M
Hence, it suffices for us to check that $\mathcal{L}(A) = \inf_A I_\mu$ for every $A \in \mathcal{C}^\circ$; and since we already know that $\mathcal{L}(A) \le \inf_A I_\mu$, this comes down to checking that $\mathcal{L}(A) \ge \inf_A I_\mu$ when $\mathcal{L}(A) < \infty$. To this end, let $\delta \in (0,1)$ be given and choose $N$ so that $-\frac1n\log(\mu_n(A)) \le \mathcal{L}(A) + \delta$ for $n \ge N$. Next, we use Lemma 3.1.1 to find a convex $K \subset\subset A$ such that
$$\left|\frac1N\log(\mu_N(A)) - \frac1N\log(\mu_N(K))\right| < \delta.$$
Then, by sub-additivity and the preceding paragraph,
$$\inf_A I_\mu \le \inf_K I_\mu \le -\varlimsup_{n\to\infty}\frac1n\log(\mu_n(K)) \le -\frac1N\log(\mu_N(K)) \le \mathcal{L}(A) + 2\delta.$$
3.1.7 Corollary. If for each $L \ge 0$ there is a $K_L \subset\subset E$ such that
$$\varlimsup_{n\to\infty}\frac1n\log(\mu_n(K_L^c)) \le -L,\tag{3.1.8}$$
then $I_\mu$ is a good rate function and $\{\mu_n : n \ge 1\}$ satisfies the full large deviation principle with rate function $I_\mu$. If, in addition,
for every $\lambda \in X^*$, then
and
PROOF: The first assertion is no more than the conjunction of Theorem 3.1.6 and Lemma 2.1.5. The rest is an immediate consequence of the first part together with Theorem 2.2.21. ∎
3.1.11 Exercise.
In the case when $E = X$ is finite dimensional and $\rho(p,q) = \|q - p\|$, show that the whole of Corollary 3.1.7 applies as soon as $\Lambda_\mu(\lambda) < \infty$ for every $\lambda \in X^*$. This is, of course, the CRAMÉR Theorem in the general finite dimensional setting.
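As a small numerical illustration of the finite dimensional CRAMÉR setting, the following sketch approximates a LEGENDRE transform $\Lambda^*(x) = \sup_\lambda(\lambda x - \Lambda(\lambda))$ in the simplest case $E = \mathbb{R}$ with a Bernoulli distribution. It assumes that the rate function is this LEGENDRE transform of the logarithmic moment generating function (as in the classical theorem of Section 1.2); the function names, the search bracket, and the choice of a Bernoulli law are illustrative assumptions, not part of the text.

```python
import math


def log_mgf_bernoulli(lam, p=0.5):
    """Lambda(lam) = log E[exp(lam * X)] for X ~ Bernoulli(p) (illustrative choice)."""
    return math.log(1.0 - p + p * math.exp(lam))


def legendre_transform(log_mgf, x, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup over lam in [lo, hi] of lam * x - log_mgf(lam).

    The objective is concave in lam, so a ternary search is valid as long as
    the maximiser lies inside the (assumed) bracket [lo, hi].
    """
    def objective(lam):
        return lam * x - log_mgf(lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def bernoulli_rate_closed_form(x, p=0.5):
    """Closed form x log(x/p) + (1 - x) log((1 - x)/(1 - p)), valid for 0 < x < 1."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))


if __name__ == "__main__":
    for x in (0.5, 0.6, 0.7, 0.9):
        print(x, legendre_transform(log_mgf_bernoulli, x), bernoulli_rate_closed_form(x))
```

The numerical supremum and the closed form should agree to within the tolerance of the search; the rate vanishes at the mean $x = 1/2$ and grows as $x$ moves away from it.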
3.2 Sanov's Theorem
In this section we will specialize to the situation described in the second example of Remark 2.2.1. That is, $E = M_1(\Sigma)$ and $X = M(\Sigma)$, where $\Sigma$ is a Polish space and $M(\Sigma)$ is given the topology which is generated by the sets in (2.2.2). Clearly, the topology inherited by $M_1(\Sigma)$ as a subset of $M(\Sigma)$ is the weak topology (i.e., the topology corresponding to convergence against bounded continuous test functions). In order to show that $M_1(\Sigma)$ and $M(\Sigma)$ satisfy the hypothesis at the beginning of Section 3.1, we must produce the metric $\rho$ on $M_1(\Sigma)$ and the norm $\|\cdot\|$ on $M(\Sigma)$. The latter is easy; namely, we take $\|\alpha\|$ to be the total variation $\|\alpha\|_{\mathrm{var}}$ of $\alpha \in M(\Sigma)$. Since
we see that $\|\cdot\|_{\mathrm{var}}$ is lower semi-continuous and therefore certainly measurable on $M(\Sigma)$; and clearly $\|\cdot\|_{\mathrm{var}}$ is bounded on $M_1(\Sigma)$. We now turn to the metric for $M_1(\Sigma)$. Following LÉVY and PROHOROV, define the Lévy metric
$$\rho(\alpha,\nu) = \inf\{\delta > 0 : \alpha(F) \le \nu(F^{(\delta)}) + \delta \text{ and } \nu(F) \le \alpha(F^{(\delta)}) + \delta \text{ for every closed } F \text{ in } \Sigma\}\tag{3.2.1}$$
for $\alpha, \nu \in M_1(\Sigma)$, where $F^{(\delta)}$ is defined relative to a complete metric on $\Sigma$. An easy argument shows that $\rho$ is a metric and that it satisfies the convexity property required in (6). Since it is clear that $\rho(\alpha,\nu) \le \|\nu - \alpha\|_{\mathrm{var}}$, all that remains is to show that $\rho$ is compatible with the weak topology and that $(M_1(\Sigma),\rho)$ is Polish. Before proving this, we will need to recall some elementary properties of the weak topology.
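(i) The weak topology is second countable.
(ii) $\alpha_n \Longrightarrow \nu$ if and only if $\varlimsup_{n\to\infty}\alpha_n(F) \le \nu(F)$ for every closed $F$ in $\Sigma$.
(iii) $\Gamma \subseteq M_1(\Sigma)$ is relatively compact if and only if for each $\delta > 0$ there is a $K \subset\subset \Sigma$ such that $\alpha(K) \ge 1 - \delta$ for every $\alpha \in \Gamma$. (Such a subset $\Gamma$ is said to be tight.)
(iv) If $\mathcal{F} \subseteq C_b(\Sigma;\mathbb{R})$ is uniformly bounded on all of $\Sigma$ and is equi-continuous on each compact subset of $\Sigma$, then $\alpha_n \Longrightarrow \nu$ implies that
$$\lim_{n\to\infty}\sup_{\phi\in\mathcal{F}}\left|\int_\Sigma\phi\,d\alpha_n - \int_\Sigma\phi\,d\nu\right| = 0.$$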
All of these facts are well-known, and their proofs can be found in any standard text in which the modern theory of weak convergence is discussed. We will now use them to check that the LÉVY metric possesses the properties which we want.
3.2.2 Lemma. (LÉVY & PROHOROV) The metric $\rho$ in (3.2.1) is compatible with the weak topology on $M_1(\Sigma)$, and $(M_1(\Sigma),\rho)$ is Polish.
PROOF: In view of property (ii) above, it is obvious that $\alpha_n \Longrightarrow \nu$ if $\rho(\alpha_n,\nu) \to 0$. To prove the opposite implication, let $\delta > 0$ be given and for each closed $F$ in $\Sigma$ define
$$\psi_F(\sigma) = \frac{\operatorname{dist}\big(\sigma,(F^{(\delta)})^c\big)}{\operatorname{dist}(\sigma,F) + \operatorname{dist}\big(\sigma,(F^{(\delta)})^c\big)},\quad \sigma \in \Sigma,$$
where "dist" is measured with the same metric on $\Sigma$ as the one used in the definition of $F^{(\delta)}$. It is then an easy matter to check that $\{\psi_F : F \text{ closed in } \Sigma\}$ is uniformly bounded and equi-continuous on $\Sigma$. Hence, by property (iv), if $\alpha_n \Longrightarrow \nu$, then $\int\psi_F\,d\alpha_n \to \int\psi_F\,d\nu$ at a rate which is independent of $F$; and, since $\chi_F \le \psi_F \le \chi_{F^{(\delta)}}$, we conclude from this that $\rho(\alpha_n,\nu) \to 0$ if $\alpha_n \Longrightarrow \nu$. (We use the notation $\chi_\Gamma$ to denote the indicator (or characteristic) function of a set $\Gamma$.) We have therefore proved that $\rho$ is compatible with the weak topology on $M_1(\Sigma)$.
To prove that $\rho$ is a complete metric on $M_1(\Sigma)$, suppose that
$$\sup_{n>m}\rho(\alpha_n,\alpha_m) \to 0 \quad\text{as } m \to \infty.$$
We must show that $\{\alpha_n\}_1^\infty$ is relatively compact. To this end, let $\delta > 0$ be given and, for $\ell \in \mathbb{Z}^+$, choose $m_\ell \in \mathbb{Z}^+$ so that $\rho(\alpha_n,\alpha_{m_\ell}) \le \delta/2^\ell$ for all $n \ge m_\ell$, and then (using property (iii)) choose $K_\ell \subset\subset \Sigma$ so that $\alpha_k(K_\ell) \ge 1 - \delta/2^\ell$ for all $1 \le k \le m_\ell$. Finally, set
$$K = \bigcap_{\ell=1}^\infty K_\ell^{(\delta/2^\ell)},$$
note that $K$ is closed and totally bounded with respect to a complete metric and is therefore compact, and check that $\alpha_n(K^c) \le 2\delta$ for all $n \in \mathbb{Z}^+$. Thus, by property (iii), $\{\alpha_n\}_1^\infty$ is indeed relatively compact. ∎
Before getting down to the main business of this section, there is one more general fact about the space $M(\Sigma)$ which it will be useful to have at our disposal. Namely, we want a good representation of $M(\Sigma)^*$.
3.2.3 Lemma. The duality relation
$$\langle\phi,\alpha\rangle = \int_\Sigma\phi\,d\alpha\tag{3.2.4}$$
determines a representation of $M(\Sigma)^*$ as $C_b(\Sigma;\mathbb{R})$.
PROOF: Clearly, for each $\phi \in C_b(\Sigma;\mathbb{R})$, $\alpha \in M(\Sigma) \longmapsto \int_\Sigma\phi\,d\alpha$ determines a unique element of $M(\Sigma)^*$. Thus, all that we have to show is that every element of $M(\Sigma)^*$ arises in this way. Let $\lambda \in M(\Sigma)^*$ be given and define $\phi(\sigma) = \lambda(\delta_\sigma)$, $\sigma \in \Sigma$. Clearly, $\phi$ is continuous. Moreover, because of the way in which the topology on $M(\Sigma)$ is defined, we can find a finite set $\{\psi_m\}_1^M \subseteq C_b(\Sigma;\mathbb{R})$ such that
and from this it is clear that $\phi$ is bounded. Finally, it is obvious that $\lambda(\alpha) = \int_\Sigma\phi\,d\alpha$ if $\alpha$ is a linear combination of point masses; and, because such $\alpha$'s are dense in $M(\Sigma)$, it follows that this equation holds for all $\alpha \in M(\Sigma)$. ∎
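Returning to the problem of large deviations, let $Q \in M_1(M_1(\Sigma))$ be given and define $Q_n \in M_1(M_1(\Sigma))$ to be the distribution of
$$\boldsymbol{\nu} = (\nu_1,\dots,\nu_n) \in M_1(\Sigma)^n \longmapsto \frac1n\sum_{k=1}^n\nu_k \in M_1(\Sigma)$$
under $Q^n \in M_1(M_1(\Sigma)^n)$. By the Weak Law of Large Numbers combined with the second countability of the weak topology on $M_1(\Sigma)$, one can easily check that $Q_n \Longrightarrow \delta_{\mu_Q}$, where $\mu_Q \in M_1(\Sigma)$ is defined by
$$\mu_Q(\Gamma) = \int_{M_1(\Sigma)}\nu(\Gamma)\,Q(d\nu),\quad \Gamma \in \mathcal{B}_\Sigma.$$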
Thus, it is reasonable to inquire about the large deviations of $\{Q_n : n \ge 1\}$. In fact, by the results which we proved in Section 3.1, we will know that the large deviations of $\{Q_n : n \ge 1\}$ are governed by the rate function
$$I_Q(\nu) = \Lambda_Q^*(\nu) \equiv \sup\left\{\int_\Sigma\phi\,d\nu - \Lambda_Q(\phi) : \phi \in C_b(\Sigma;\mathbb{R})\right\}\tag{3.2.5}$$
for $\nu \in M_1(\Sigma)$, where
$$\Lambda_Q(\phi) = \log\left(\int_{M_1(\Sigma)}\exp\left[\int_\Sigma\phi\,d\nu\right]Q(d\nu)\right),\tag{3.2.6}$$
and that $I_Q$ is good, as soon as we show that $\{Q_n : n \ge 1\}$ is exponentially tight. In order to do so, we employ the following remarkably useful general observation which will serve us well not only here but also later on.
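3.2.7 Lemma. Let $\mu \in M_1(\Sigma)$ be fixed and suppose that $\{V_m\}_{m=1}^\infty$ is a bounded sequence of non-negative, measurable functions on $\Sigma$ which tends to 0 in $\mu$-measure as $m \to \infty$. Then, for each $M \in [1,\infty)$ and $\beta \in [1,\infty)$ there is a subsequence $\{V_{m_\ell}\}_{\ell=1}^\infty$ with the property that (3.2.8)
for $0 < \epsilon \le 1$ and $L \in [1,\infty)$, whenever $\{R_\epsilon : \epsilon > 0\} \subseteq M_1(M_1(\Sigma))$ is a family which satisfies
$$\sup_{0<\epsilon\le1}\left(\int_{M_1(\Sigma)}\exp\left[\frac1\epsilon\int_\Sigma V\,d\nu\right]R_\epsilon(d\nu)\right)^{\epsilon} \le M\int_\Sigma\exp[\beta V]\,d\mu\tag{3.2.9}$$
for every bounded measurable $V : \Sigma \to [0,\infty)$. In particular, for each $L \in [1,\infty)$ there is a $C_L \subset\subset M_1(\Sigma)$ such that
$$R_\epsilon(C_L^c) \le \exp[-L/\epsilon],\quad 0 < \epsilon \le 1,\tag{3.2.10}$$
for any $\{R_\epsilon : \epsilon > 0\}$ satisfying (3.2.9).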
PROOF: Obviously, for any $\delta > 0$ and measurable $V : \Sigma \to [0,\infty)$,
for every $0 < \epsilon \le 1$ and $T \in (0,\infty)$. Now let $\ell \in \mathbb{Z}^+$ be given, take $\delta = 1/\ell$, $T = \ell(\ell + \log(2M) + 1)$, and choose $m_\ell \in \mathbb{Z}^+$ so that
One then has that
$$R_\epsilon\left(\left\{\nu \in M_1(\Sigma) : \int_\Sigma V_{m_\ell}\,d\nu \ge \frac1\ell\right\}\right) \le \exp[-(\ell+1)/\epsilon]$$
for $0 < \epsilon \le 1$ and $\ell \in \mathbb{Z}^+$; and (3.2.8) is an immediate consequence of this. To get (3.2.10), choose $\{K_m\}$ to be a sequence of compact subsets of $\Sigma$ for which $\mu(K_m^c) \to 0$ as $m \to \infty$. Next, apply the preceding (with $V_m = \chi_{K_m^c}$) to see that there is a subsequence $\{K_{m_\ell}\}_{\ell=1}^\infty$ for which (3.2.10)
To see how the exponential tightness of $\{Q_n : n \ge 1\}$ follows from Lemma 3.2.7, set $\mu_Q = \int_{M_1(\Sigma)}\nu\,Q(d\nu)$ and note that
where we have used JENSEN's inequality at the final step. Obviously, exponential tightness for $\{Q_n : n \ge 1\}$ is now a trivial consequence of (3.2.10); and, as we mentioned just before Lemma 3.2.7, this is all that we needed in order to know that the $I_Q$ in (3.2.5) is good and that it governs the large deviations of $\{Q_n : n \ge 1\}$.
A particularly interesting case of this general result is the one in which $Q$ is the distribution $\tilde\mu$ of $\sigma \in \Sigma \longmapsto \delta_\sigma \in M_1(\Sigma)$ under some $\mu \in M_1(\Sigma)$. In this case, $Q_n$ is the distribution $\tilde\mu_n$ of the empirical distribution functional
$$\boldsymbol{\sigma} = (\sigma_1,\dots,\sigma_n) \in \Sigma^n \longmapsto L_n(\boldsymbol{\sigma}) = \frac1n\sum_{m=1}^n\delta_{\sigma_m}\tag{3.2.11}$$
under $\mu^n$ and the measure $\mu_Q$ introduced above coincides with $\mu$. Specializing the preceding to this case, we see that
and therefore that the large deviations of $\{\tilde\mu_n : n \ge 1\}$ are governed by the good rate function
$$I_{\tilde\mu}(\nu) = \Lambda_\mu^*(\nu)\tag{3.2.12}$$
for $\nu \in M_1(\Sigma)$. However, before stating this as a theorem, we want to develop a more tractable expression for $I_{\tilde\mu}$.
3.2.13 Lemma. For $\nu \in M_1(\Sigma)$, define
$$H(\nu|\mu) = \begin{cases}\displaystyle\int_\Sigma f\log f\,d\mu & \text{if } \nu \ll \mu \text{ and } f = \dfrac{d\nu}{d\mu},\\[2mm] \infty & \text{otherwise.}\end{cases}\tag{3.2.14}$$
Then the $I_{\tilde\mu}$ in (3.2.12) is equal to $H(\cdot|\mu)$.
PROOF: We first show that if $\nu \ll \mu$ and $\nu_\theta \equiv \theta\mu + (1-\theta)\nu$ for $\theta \in [0,1]$, then $H(\nu|\mu) = \lim_{\theta\downarrow0}H(\nu_\theta|\mu)$. To this end, set $f = \frac{d\nu}{d\mu}$ and $f_\theta = \theta + (1-\theta)f$. Since $x \in [0,\infty) \longmapsto x\log x$ is convex, JENSEN's inequality says that
At the same time, since $x \in [0,\infty) \longmapsto \log x$ is non-decreasing and concave, $\log f_\theta \ge (\log\theta) \vee ((1-\theta)\log f)$; and therefore
After combining these two, one clearly gets the asserted convergence.
We next show that if $\nu \ll \mu$, then $I_{\tilde\mu}(\nu) \le H(\nu|\mu)$. In view of the preceding and the obvious fact that $\nu \in M_1(\Sigma) \longmapsto \Lambda_\mu^*(\nu)$ is lower semi-continuous, we may and will assume that $f \ge \theta$ for some $\theta \in (0,1)$. In particular, by JENSEN's inequality, we then have
from which it is clear that $I_{\tilde\mu}(\nu) \le H(\nu|\mu)$.
As a consequence of the preceding, all that remains is to show that if $I_{\tilde\mu}(\nu) < \infty$, then $d\nu = f\,d\mu$ and
$$I_{\tilde\mu}(\nu) \ge \int_\Sigma f\log f\,d\mu.\tag{3.2.15}$$
Given $\nu$ with $I_{\tilde\mu}(\nu) < \infty$, one has
$$\int_\Sigma\phi\,d\nu - \log\int_\Sigma e^{\phi}\,d\mu \le I_{\tilde\mu}(\nu)\tag{3.2.16}$$
for every bounded continuous $\phi$. Since the class of $\phi$'s for which (3.2.16) holds is closed under bounded point-wise convergence, (3.2.16) continues to be true for every bounded $\mathcal{B}_\Sigma$-measurable $\phi$. In particular, we can now show that $\nu \ll \mu$. Indeed, suppose that $\Gamma \in \mathcal{B}_\Sigma$ with $\mu(\Gamma) = 0$. Then, by (3.2.16) with $\phi = r\chi_\Gamma$, $r\nu(\Gamma) \le I_{\tilde\mu}(\nu)$, $r > 0$; and therefore $\nu(\Gamma) = 0$. Knowing that $\nu \ll \mu$, set $f = \frac{d\nu}{d\mu}$. If $f$ is uniformly positive and uniformly bounded, then (3.2.15) is an immediate consequence of (3.2.16) with $\phi = \log f$. If $f$ is uniformly positive but not necessarily uniformly bounded, set $f_n = f \wedge n$ and use (3.2.16) together with FATOU's Lemma to justify
Finally, to treat the general case, define $\nu_\theta$ and $f_\theta = \theta + (1-\theta)f$ for $\theta \in [0,1]$ as in the first paragraph of this proof. By the preceding, $\int_\Sigma f_\theta\log f_\theta\,d\mu \le I_{\tilde\mu}(\nu_\theta)$ as long as $\theta \in (0,1)$. Moreover, since $\theta \in [0,1] \longmapsto I_{\tilde\mu}(\nu_\theta)$ is bounded, lower semi-continuous, and convex on $[0,1]$, it is continuous there. In conjunction with the result obtained in the first paragraph, this now completes the proof. ∎
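The identification of the supremum over test functions with the relative entropy can be checked by hand when $\Sigma$ is a finite set. The sketch below is only an illustration under that assumption: measures are represented as lists of point masses, the helper names are hypothetical, and the test functions tried are arbitrary. It evaluates $\int\phi\,d\nu - \log\int e^\phi\,d\mu$ for a few $\phi$ and compares with $\int f\log f\,d\mu$, the bound being attained at $\phi = \log f$ when $f$ is strictly positive.

```python
import math


def relative_entropy(nu, mu):
    """H(nu|mu) = sum_i nu_i log(nu_i / mu_i) on a finite set; infinite unless nu << mu."""
    total = 0.0
    for n_i, m_i in zip(nu, mu):
        if n_i == 0.0:
            continue  # convention 0 log 0 = 0
        if m_i == 0.0:
            return math.inf  # nu is not absolutely continuous with respect to mu
        total += n_i * math.log(n_i / m_i)
    return total


def dual_objective(phi, nu, mu):
    """The quantity (integral of phi d nu) - log(integral of exp(phi) d mu)."""
    mean_phi = sum(p * n for p, n in zip(phi, nu))
    log_mgf = math.log(sum(math.exp(p) * m for p, m in zip(phi, mu)))
    return mean_phi - log_mgf


if __name__ == "__main__":
    mu = [0.5, 0.3, 0.2]  # illustrative reference measure
    nu = [0.2, 0.3, 0.5]  # illustrative target measure
    h = relative_entropy(nu, mu)
    phi_star = [math.log(n / m) for n, m in zip(nu, mu)]  # phi = log f
    print("H(nu|mu)            :", h)
    print("objective at log f  :", dual_objective(phi_star, nu, mu))
    for phi in ([0.0, 0.0, 0.0], [1.0, -1.0, 2.0], [-3.0, 0.5, 0.1]):
        print("objective at", phi, ":", dual_objective(phi, nu, mu))
```

Every test function gives a value no larger than the relative entropy, which is the finite-set shadow of (3.2.16).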
The quantity $H(\nu|\mu)$ in (3.2.14) is called the relative entropy of $\nu$ with respect to $\mu$. As we will see in the sequel, the relative entropy functional plays a central role in the theory of large deviations. We have now proved the following theorem which, at least when $\Sigma = \mathbb{R}$, was originally derived by SANOV.
3.2.17 Theorem. (SANOV) Let $\mu$ be a probability measure on the Polish space $\Sigma$ and let $\tilde\mu_n \in M_1(M_1(\Sigma))$ be the distribution under $\mu^n$ of the function $L_n$ in (3.2.11). Also, define $H(\cdot|\mu)$ as in (3.2.14). Then $H(\cdot|\mu)$ is a good, convex rate function on $M_1(\Sigma)$ and $\{\tilde\mu_n : n \ge 1\}$ satisfies the full large deviation principle with rate function $H(\cdot|\mu)$.
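For a concrete feeling of what SANOV's Theorem asserts, one can take $\Sigma = \{0,1\}$ with $\mu$ the fair coin and look at the set of $\nu$ with $\nu(\{1\}) \ge 7/10$. The following sketch, whose function names and parameter choices are illustrative assumptions only, computes $-\frac1n\log\mu^n(L_n \in \Gamma)$ exactly from binomial coefficients and compares it with the infimum of the relative entropy over $\Gamma$, which in this example is attained at the Bernoulli law with parameter $7/10$.

```python
import math


def bernoulli_relative_entropy(x, p):
    """H(Bernoulli(x) | Bernoulli(p)) with the convention 0 log 0 = 0."""
    total = 0.0
    for a, b in ((x, p), (1.0 - x, 1.0 - p)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total


def empirical_rate(n, num=7, den=10):
    """-(1/n) log P(L_n({1}) >= num/den) for n fair coin tosses, computed exactly."""
    k_min = -(-num * n // den)  # ceil(num * n / den) in integer arithmetic
    tail_count = sum(math.comb(n, k) for k in range(k_min, n + 1))
    log_prob = math.log(tail_count) - n * math.log(2.0)
    return -log_prob / n


if __name__ == "__main__":
    target = bernoulli_relative_entropy(0.7, 0.5)
    for n in (100, 500, 2000):
        print(n, empirical_rate(n), target)
```

The exact rates stay above the entropy value for every $n$ and approach it slowly, the gap being of the order $(\log n)/n$ in this example.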
Before dropping this topic, it seems appropriate to address a deficiency in the preceding result. Namely, the Weak Law of Large Numbers tells us that
for every bounded measurable $\phi : \Sigma \to \mathbb{R}$; not just for bounded continuous $\phi$'s. Thus, $\{\tilde\mu_n\}_1^\infty$ actually tends to $\delta_\mu$ in the strong topology on $M_1(\Sigma)$ and not just the weak topology. It is therefore reasonable to ask whether one cannot also develop the corresponding large deviation theory relative to the strong topology. As we are about to see, not only is it possible to do so, but it is even a rather easy exercise to transform what we already have into a statement about the strong topology.
The strong topology (or $\tau$-topology) on $M_1(\Sigma)$ is the topology $U$ generated by the sets
as $\phi$ runs over the space $B(\Sigma;\mathbb{R})$ of bounded measurable functions on $\Sigma$ into $\mathbb{R}$. Obviously, the strong topology is stronger than the weak one. In addition, it is clear that, for each $\alpha \in M_1(\Sigma)$, the sets
$$U(\alpha; A_1,\dots,A_N; \epsilon) = \bigcap_{k=1}^N U(\alpha; \chi_{A_k}; \epsilon)\tag{3.2.18}$$
for $N \in \mathbb{Z}^+$, $A_1,\dots,A_N \in \mathcal{B}_\Sigma$, and $\epsilon > 0$ constitute a $U$-neighborhood basis at $\alpha$. In particular, $M_1(\Sigma)$ with the strong topology is usually not even first countable! The key which will allow us to transform the SANOV Theorem to the strong topological setting is contained in the following, whose proof turns on another application of Lemma 3.2.7.
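3.2.19 Lemma. Let $\mu \in M_1(\Sigma)$ and assume that $\{R_\epsilon : \epsilon > 0\} \subseteq M_1(M_1(\Sigma))$ satisfies (3.2.9) for some $\beta, M \in [1,\infty)$ and all measurable $V : \Sigma \to [0,\infty]$. Further, assume that $\{R_\epsilon : \epsilon > 0\}$ satisfies the weak large deviation principle with the rate function $I$. Then,
$$H(\nu|\mu) \le \beta\big(I(\nu) + \log(2M)\big),\quad \nu \in M_1(\Sigma).\tag{3.2.20}$$
Moreover, given $N \in \mathbb{Z}^+$ and $A_1,\dots,A_N \in \mathcal{B}_\Sigma$, define $f : M_1(\Sigma) \to \mathbb{R}^N$ by
$$f(\nu) = (\nu(A_1),\dots,\nu(A_N)),\quad \nu \in M_1(\Sigma).$$
Then $f$ is measurable and
$$-\inf\{I(\nu) : f(\nu) \in \Delta^\circ\} \le \varliminf_{\epsilon\to0}\epsilon\log\big(R_\epsilon(f^{-1}(\Delta))\big) \le \varlimsup_{\epsilon\to0}\epsilon\log\big(R_\epsilon(f^{-1}(\Delta))\big) \le -\inf\{I(\nu) : f(\nu) \in \overline{\Delta}\}$$
for every $\Delta \in \mathcal{B}_{\mathbb{R}^N}$.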
PROOF: First observe that, by Lemmas 2.1.5 and 3.2.7, our hypotheses imply that $I$ must be a good rate function and that $\{R_\epsilon : \epsilon > 0\}$ satisfies the full large deviation principle with rate $I$.
To prove (3.2.20), use Theorem 2.1.10 and (3.2.9) to obtain
for all $\nu \in M_1(\Sigma)$ and $V \in C_b(\Sigma;[0,\infty))$; and so, for each $\nu \in M_1(\Sigma)$,
for every bounded measurable $V : \Sigma \to [0,\infty)$. In particular, just as in the proof of Lemma 3.2.13, $I(\nu) = \infty$ if $\nu$ is not absolutely continuous with respect to $\mu$. On the other hand, if $\nu \ll \mu$ and $f = \frac{d\nu}{d\mu}$, take $V_n = \frac1\beta\log(1 + f \wedge n)$ and conclude that
$$I(\nu) \ge \frac1\beta\int_\Sigma\log(1 + f \wedge n)\,d\nu - \log(2M).$$
Finally, let $n \to \infty$ and thereby arrive at (3.2.20).
We now turn to the proof of the second assertion. In view of Lemma 2.1.4, all that we have to do is produce a sequence $\{f_\ell\}_1^\infty \subseteq C(M_1(\Sigma);\mathbb{R}^N)$ with the properties that
and
To this end, choose for each $1 \le k \le N$ a sequence
in $\mu$-measure as $m \to \infty$. Next, apply Lemma 3.2.7 to find a subsequence $\{V_{m_\ell}\}_{\ell=1}^\infty$ for which
for $0 < \epsilon \le 1$ and $L \in [1,\infty)$. Finally, set
Clearly, $f_\ell \in C(M_1(\Sigma);\mathbb{R}^N)$ for each $\ell \in \mathbb{Z}^+$. Moreover, since (cf. part (i) of Exercise 3.2.23 below) $\{\nu : H(\nu|\mu) \le K\}$ is uniformly absolutely continuous with respect to $\mu$ for every $K \in (0,\infty)$, one sees from (3.2.20) that
$$\int_\Sigma V_{m_\ell}\,d\nu \to 0$$
uniformly over $\nu$'s with $I(\nu) \le L$; and therefore $f_\ell(\nu) \to f(\nu)$ uniformly for $\nu$'s in such sets. It is therefore clear that these $f_\ell$'s will serve. ∎
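3.2.21 Theorem. Let $\{R_\epsilon : \epsilon > 0\}$ and $I$ be as in Lemma 3.2.19 above. For each $L \ge 0$ the set $\{\nu \in M_1(\Sigma) : I(\nu) \le L\}$ is compact in the strong topology. Moreover, if $\Gamma$ is a measurable subset of $M_1(\Sigma)$ and $\Gamma^{\circ\tau}$ and $\overline{\Gamma}^{\tau}$ denote, respectively, the interior and closure of $\Gamma$ in the strong topology, then
$$-\inf_{\nu\in\Gamma^{\circ\tau}}I(\nu) \le \varliminf_{\epsilon\to0}\epsilon\log(R_\epsilon(\Gamma)) \le \varlimsup_{\epsilon\to0}\epsilon\log(R_\epsilon(\Gamma)) \le -\inf_{\nu\in\overline{\Gamma}^{\tau}}I(\nu).$$
In particular, for any $\mu \in M_1(\Sigma)$ and all $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$,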
PROOF: To see that $\{\nu : I(\nu) \le L\}$ is strongly compact, simply observe that the strong and weak topologies coincide on subsets of $M_1(\Sigma)$ consisting of measures which are uniformly absolutely continuous with respect to a fixed element of $M_1(\Sigma)$ and use (3.2.20) together with part (i) of Exercise 3.2.23 below.
To prove the lower bound, note that if $\alpha \in \Gamma^{\circ\tau}$, then there exist $N \in \mathbb{Z}^+$, $A_1,\dots,A_N \in \mathcal{B}_\Sigma$, and an open set $\Delta$ in $\mathbb{R}^N$ such that
$$\alpha \in \{\nu : (\nu(A_1),\dots,\nu(A_N)) \in \Delta\} \subseteq \Gamma;$$
and therefore Lemma 3.2.19 shows that
To prove the upper bound, let $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$ be given and suppose that $F \supseteq \Gamma$ is a strongly closed subset of $M_1(\Sigma)$. Using $\mathcal{P}$ to denote the collection of all finite partitions $P$ of $\Sigma$ into measurable sets and given $P = \{A_1,\dots,A_N\} \in \mathcal{P}$, let $\Delta(P)$ be the closure in $\mathbb{R}^N$ of
and set $F(P) = \{\nu : (\nu(A_1),\dots,\nu(A_N)) \in \Delta(P)\}$. Then, by Lemma 3.2.19, we know that
$$\varlimsup_{\epsilon\to0}\epsilon\log(R_\epsilon(\Gamma)) \le \varlimsup_{\epsilon\to0}\epsilon\log\big[R_\epsilon(F(P))\big] \le -\inf_{\nu\in F(P)}I(\nu).$$
Thus, we will have the upper bound once we show that
$$\inf_{\nu\in F}I(\nu) = \sup_{P\in\mathcal{P}}\inf_{\nu\in F(P)}I(\nu).$$
In proving this, we may and will assume that there is an $L \in (0,\infty)$ such that $\inf_{\nu\in F(P)}I(\nu) \le L$ for every $P \in \mathcal{P}$.
Noting that (because the level sets of $I$ are strongly compact) there is, for each $P \in \mathcal{P}$, a $\nu_P \in F(P)$ such that $I(\nu_P) = \inf_{\nu\in F(P)}I(\nu)$ and using the fact that $\{\nu_P : P \in \mathcal{P}\} \subseteq \{\nu : I(\nu) \le L\}$, choose any subnet $\{\nu_P : P \in \mathcal{P}'\}$ of $\{\nu_P : P \in \mathcal{P}\}$ (think of $\mathcal{P}$ being partially ordered by "refinement") so that $\{\nu_P : P \in \mathcal{P}'\}$ converges strongly to some $\alpha$. Clearly, all that we have to do is check that $\alpha \in F$. To this end, let $N \in \mathbb{Z}^+$, $A_1,\dots,A_N \in \mathcal{B}_\Sigma$, and $\delta > 0$ be given; and denote by $U$ the corresponding set defined by (3.2.18). Next, choose $P \in \mathcal{P}'$ so that $\{A_1,\dots,A_N\}$ is contained in the algebra over $\Sigma$ which is generated by $P$ and $\sum_{A\in P}|\nu_P(A) - \alpha(A)|$
3.2.22 Exercise.
(i) On the basis of the reader's previous information about the weak topology, derive the four properties listed at the beginning of this section.
(ii) Suppose that $\{(\Sigma_n,\rho_n)\}_1^\infty$ is a sequence of complete separable metric spaces and that, for each $n \in \mathbb{Z}^+$, $\pi_{n+1,n} : \Sigma_{n+1} \to \Sigma_n$ is a mapping with the property that
Let $(\Sigma,\rho)$ be the projective limit of $\{(\Sigma_n,\pi_{n+1,n},\rho_n) : n \in \mathbb{Z}^+\}$, and show that $M_1(\Sigma)$ is homeomorphic to the projective limit of $\{M_1(\Sigma_n)\}_1^\infty$ when $\mu_{n+1} \longmapsto \mu_{n+1}\circ\pi_{n+1,n}^{-1}$ is the mapping from $M_1(\Sigma_{n+1})$ into $M_1(\Sigma_n)$ for each $n \in \mathbb{Z}^+$.
Hint: The only difficult step here is to check that if $\mu_n \in M_1(\Sigma_n)$ and $\mu_n = \mu_{n+1}\circ\pi_{n+1,n}^{-1}$ for each $n \in \mathbb{Z}^+$, then there is a $\mu \in M_1(\Sigma)$ such that $\mu_n = \mu\circ\pi_n^{-1}$, $n \in \mathbb{Z}^+$. However, the existence of such a $\mu$ can be seen as a consequence of the KOLMOGOROV Extension Theorem as presented in Theorem 1.1.10 of [104].
3.2.23 Exercise.
In this exercise we outline an approach to the SANOV Theorem which avoids the estimate in Lemma 3.2.7 and resembles, to a much greater extent, the ideas behind our proof of the classical CRAMÉR Theorem. Even though the proof avoids Lemma 3.2.7, it nonetheless uses the equation $\Lambda_\mu^* = H(\cdot|\mu)$ provided by Lemma 3.2.13.
(i) It turns out (cf. Corollary 5.1.11) that both the goodness of $\Lambda_\mu^*$ as well as the upper bound (in terms of $\Lambda_\mu^*$) for all closed sets $C \subseteq M_1(\Sigma)$ follow once one knows that $\Lambda_\mu$ has the property: for each $M \in (0,\infty)$, there is a $K(M) \subset\subset \Sigma$ such that $\Lambda_\mu(V) \le 1$ whenever $V \in C_b(\Sigma;\mathbb{R})$ vanishes on $K(M)$ and is bounded by $M$. Prove that $\Lambda_\mu$ has this property. Although it is true that the goodness of $\Lambda_\mu^*$ follows from the general principles which will be derived in Section 5.1, it is actually very easy to check that the level sets $\{\nu : H(\nu|\mu) \le L\}$ are not only weakly but even strongly compact. Indeed, it is clear that the set $\{f \in L^1(\mu)^+ : \int_\Sigma f\log f\,d\mu \le L\}$ is uniformly $\mu$-integrable and is therefore weakly compact as a subset of $L^1(\mu)$. Since weak convergence in $L^1(\mu)$ gives rise to strong convergence in $M(\Sigma)$ of the associated measures, it follows that $\{\nu : H(\nu|\mu) \le L\}$ is strongly compact.
(ii) We now outline a proof of the lower bound in Theorem 3.2.21 which is based on the same principle as the one used to prove the lower bound in the classical CRAMÉR Theorem. Since this lower bound is better than the lower bound in SANOV's Theorem, this will complete the program for this exercise. Let $H(\nu|\mu) < \infty$ and suppose that $G \in \mathcal{B}_{M_1(\Sigma)}$ is a strongly open neighborhood of $\nu$. Set $f = \frac{d\nu}{d\mu}$, define $F_n(\boldsymbol\sigma) = \prod_{m=1}^n f(\sigma_m)$ for $\boldsymbol\sigma \in \Sigma^n$, and let $A_n = \{\boldsymbol\sigma \in \Sigma^n : L_n(\boldsymbol\sigma) \in G \text{ and } F_n(\boldsymbol\sigma) > 0\}$. Using the Law of Large Numbers, check that $\nu^n(A_n) \to 1$ as $n \to \infty$. Next, using JENSEN's inequality and the fact that $x\log x \ge -e^{-1}$, $x \in [0,\infty)$, verify the following steps:
as long as $\nu^n(A_n) > 0$. Combining this with $\nu^n(A_n) \to 1$, arrive at the lower bound in Theorem 3.2.21.
3.2.24 Exercise.
Prove that
$$\|\nu - \mu\|_{\mathrm{var}}^2 \le 2H(\nu|\mu),\quad \mu,\nu \in M_1(\Sigma).\tag{3.2.25}$$
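Before attempting the proof, it can be reassuring to test (3.2.25) numerically on a two-point space. The sketch below is an illustration only: it assumes measures given as lists of point masses, takes $\|\nu - \mu\|_{\mathrm{var}}$ to be $\sum_i|\nu_i - \mu_i|$ (the $L^1(\mu)$-norm of $f - 1$), and uses hypothetical helper names. It also evaluates the elementary inequality $3(x-1)^2 \le (4+2x)(x\log x - x + 1)$ on a grid.

```python
import math


def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)


def relative_entropy(nu, mu):
    """H(nu|mu) on a finite set; every mu_i is assumed to be strictly positive."""
    return sum(m * xlogx(n / m) for n, m in zip(nu, mu))


def variation_norm(nu, mu):
    """||nu - mu||_var taken as sum_i |nu_i - mu_i|, i.e. the L^1(mu)-norm of f - 1."""
    return sum(abs(n - m) for n, m in zip(nu, mu))


def pinsker_gap(nu, mu):
    """2 H(nu|mu) - ||nu - mu||_var^2, which (3.2.25) asserts is non-negative."""
    return 2.0 * relative_entropy(nu, mu) - variation_norm(nu, mu) ** 2


def elementary_gap(x):
    """(4 + 2x)(x log x - x + 1) - 3 (x - 1)^2 for x >= 0."""
    return (4.0 + 2.0 * x) * (xlogx(x) - x + 1.0) - 3.0 * (x - 1.0) ** 2


if __name__ == "__main__":
    worst = min(
        pinsker_gap([p / 100.0, 1.0 - p / 100.0], [q / 100.0, 1.0 - q / 100.0])
        for p in range(0, 101)
        for q in range(1, 100)
    )
    print("smallest gap in (3.2.25) on the grid:", worst)
    print("smallest elementary gap on the grid :", min(elementary_gap(k / 100.0) for k in range(0, 501)))
```

Both minima should come out non-negative up to rounding, with equality only where $\nu = \mu$ (respectively $x = 1$).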
A proof of (3.2.25) can be based on the observation that
$$3(x-1)^2 \le (4+2x)(x\log x - x + 1),\quad x \in [0,\infty),$$
the fact that $\|\nu - \mu\|_{\mathrm{var}} = \|f - 1\|_{L^1(\mu)}$ if $\nu \ll \mu$ and $f = \frac{d\nu}{d\mu}$, and SCHWARZ's inequality.
3.2.26 Exercise.
For $R \in M_1(M_1(\Sigma))$, define $\mu_R \in M_1(\Sigma)$ by
$$\mu_R = \int_{M_1(\Sigma)}\nu\,R(d\nu)$$
and show that $R \in M_1(M_1(\Sigma)) \longmapsto \mu_R \in M_1(\Sigma)$ is a continuous mapping. Next, let $Q \in M_1(M_1(\Sigma))$ be given, and show that
$$I_Q(\nu) = \Lambda_Q^*(\nu) = \inf\{H(R|Q) : \mu_R = \nu\},\quad \nu \in M_1(\Sigma).$$
(Hint: Either use the variational formula for relative entropy, or combine the results of the present section with Lemma 2.1.4.) Finally, apply JENSEN's inequality to check that
Conclude that
and therefore that $\nu \ll \mu_Q$ if $I_Q(\nu) < \infty$.
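3.2.28 Exercise.
Let $\{Q_\epsilon : \epsilon > 0\}$ be a family of probability measures on $(M_1(\Sigma),\mathcal{B}_{M_1(\Sigma)})$ and $I : M_1(\Sigma) \to [0,\infty]$ a rate function with the property that $\{\nu : I(\nu) \le L\}$ is compact in the strong topology for each $L > 0$. Assume, in addition, that
$$-\inf_{\Gamma^{\circ\tau}}I \le \varliminf_{\epsilon\to0}\epsilon\log(Q_\epsilon(\Gamma)) \le \varlimsup_{\epsilon\to0}\epsilon\log(Q_\epsilon(\Gamma)) \le -\inf_{\overline{\Gamma}^{\tau}}I$$
(see Theorem 3.2.21 for the notation here) for all $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$. Given a $\mathcal{B}_{M_1(\Sigma)}$-measurable function $\Phi : M_1(\Sigma) \to \mathbb{R}$ which satisfies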
O z z l (/MI@)
for some $\alpha \in (1,\infty)$, show that
if $\Phi$ is continuous with respect to the strong topology.
3.3 Cramér's Theorem for Banach Spaces
In this section, we will be assuming that $E = X$ is a separable real BANACH space with norm $\|\cdot\|_E$, and we will be attempting to prove the analogue of CRAMÉR's Theorem (cf. Theorem 1.2.6) in this setting. That is, we want to show that if $\mu$ is a probability measure on $(E,\mathcal{B}_E)$ for which (3.3.1)
and if $\mu_n$ denotes the distribution of $\overline{S}_n$ under $\mu^n$ (cf. the second paragraph of Section 3.1), then $\{\mu_n : n \ge 1\}$ satisfies the full large deviation principle with rate function $\Lambda_\mu^*$, where
and
However, before getting into the details, it might be helpful to know what it is that "non-deviant behavior" means in this situation. For this reason, we begin with a proof of the Strong Law of Large Numbers for $E$-valued random variables.
3.3.4 Theorem. (RANGA RAO) Let $\{X_n\}_1^\infty$ be a sequence of independent, identically distributed, $E$-valued random variables on some probability space $(\Omega,\mathcal{M},P)$; and let $\mu$ be the distribution of $X_1$. If $\int_E\|x\|_E\,\mu(dx) < \infty$, then there is an $m(\mu) \in E$ such that
$$\overline{X}_n(\omega) \equiv \frac1n\sum_{k=1}^n X_k(\omega) \longrightarrow m(\mu)$$
for $P$-almost every $\omega \in \Omega$. Moreover, $m(\mu)$ is the unique element $y$ of $E$ with the property that
$${}_{E^*}\langle\lambda,y\rangle_E = \int_E {}_{E^*}\langle\lambda,x\rangle_E\,\mu(dx),\quad \lambda \in E^*.\tag{3.3.5}$$
PROOF: First, by KOLMOGOROV's Zero-One Law, if $\overline{X}_n(\omega) = \frac1n\sum_1^n X_k(\omega)$ has a limit for $P$-almost every $\omega \in \Omega$, then that limit is, $P$-almost surely, independent of $\omega \in \Omega$. Knowing this and using the Classical Strong Law of Large Numbers, one can easily check that, if it exists, then the $P$-almost sure limit of $\overline{X}_n$ must satisfy (3.3.5). Hence, all that we have to do is show that convergence takes place.
Note that the Classical Strong Law together with the second countability of the weak topology on $M_1(E)$ lead to the conclusion that
$$\frac1n\sum_{k=1}^n\delta_{X_k(\omega)} \Longrightarrow \mu\tag{3.3.6}$$
for $P$-almost every $\omega \in \Omega$. As we are about to see, this turns out to be surprisingly close to the result which we are seeking. Indeed, suppose, for the moment, that there is an $R \in (0,\infty)$ such that $\|X_n(\omega)\|_E \le R$ for all $n \in \mathbb{Z}^+$ and $\omega \in \Omega$. For $\lambda \in E^*$, define
$$g_\lambda^{(R)}(x) = \eta_R(x)\,{}_{E^*}\langle\lambda,x\rangle_E,\quad x \in E,\tag{3.3.7}$$
where $\eta_R \in C(E;[0,1])$ satisfies
Then $\{g_\lambda^{(R)} : \|\lambda\|_{E^*} \le 1\}$ is uniformly bounded and equi-continuous. Hence, by property (iv) of the weak topology (cf. the first paragraph of Section 3.2) and (3.3.6)
and therefore
for $P$-almost every $\omega \in \Omega$. Thus, for $P$-almost every $\omega \in \Omega$,
is a CAUCHY sequence in $E$; and, as we pointed out in the preceding paragraph, that is all that we need. In other words, our result has been proved in the case of bounded random variables.
To handle the general case, define, for $R \in (0,\infty)$,
$$X_n^{(R)}(\omega) = \chi_{[0,R]}\big(\|X_n(\omega)\|_E\big)\,X_n(\omega)$$
and $Y_n^{(R)}(\omega) = \|X_n(\omega) - X_n^{(R)}(\omega)\|_E$, $\omega \in \Omega$. Given $\epsilon > 0$, choose $R \in (0,\infty)$ so that
and use the preceding result applied to $\{X_n^{(R)}\}_1^\infty$ together with the Classical Strong Law applied to $\{Y_n^{(R)}\}_1^\infty$ to conclude that
Since this clearly shows that $\{\overline{X}_n(\omega)\}_1^\infty$, where $\overline{X}_n = \frac1n\sum_1^n X_k$, is CAUCHY in $E$ for $P$-almost every $\omega \in \Omega$, the proof is complete. ∎
The quantity $m(\mu)$ in (3.3.5) is called the mean of the measure $\mu$. Obviously, a consequence of Theorem 3.3.4 is the fact that $\mu_n \Longrightarrow \delta_{m(\mu)}$. In order to study the large deviations from this, we again want to use Corollary 3.1.7. Thus, we must show that there are compact $K_L$'s for which (3.1.8) holds. Actually, most of the work required for their construction has already been done in Lemma 3.2.7; all that we need in addition is the following relatively simple observation.
3.3.8 Lemma. If $\Gamma \subseteq M_1(E)$ satisfies
then $\nu \in \Gamma \longmapsto m(\nu)$ is a continuous map. In particular, if $f : E \to [0,\infty]$ is a lower semi-continuous function satisfying
and if
then, for each $L \in [0,\infty)$, $\Gamma(f,L)$ is closed and $\nu \in \Gamma(f,L) \longmapsto m(\nu)$ is continuous. Thus, (3.3.9)
is a measurable subset of $M_1(E)$ on which $\nu \longmapsto m(\nu)$ is a measurable mapping.
PROOF: The second assertion is just an application of the first assertion once one notes that, for $\nu \in \Gamma(f,L)$,
Thus, we need only prove the first assertion. Moreover, since $\nu \longmapsto \int_{\|x\|_E>R}\|x\|_E\,\nu(dx)$ is lower semi-continuous for each $R \in (0,\infty)$, we may and will assume from the outset that the set $\Gamma$ is closed in $M_1(E)$. Now, suppose that $\{\nu_n\}_1^\infty \subseteq \Gamma$ and that $\nu_n \Longrightarrow \nu$. Then
where the functions $g_\lambda^{(R)}$ are the ones defined in (3.3.7). Hence, for each $R \in (0,\infty)$,
from which the desired result is clear. ∎
The preceding enables us to prove the following variant of Lemma 3.2.7.
3.3.10 Lemma. Let $\mu \in M_1(E)$ be given, assume that
Then $f(0) = 0$; $x \in E \longmapsto f(x) \in [0,\infty]$ is lower semi-continuous;
$$\lim_{\|x\|_E\to\infty}\frac{f(x)}{\|x\|_E} = \infty;$$
and
$$\int_E\exp[sf]\,d\mu \le \frac{1}{1-s},\quad s \in [0,1).$$
Finally, if $\{R_\epsilon : \epsilon > 0\} \subseteq M_1(M_1(E))$ satisfies
for some $\beta, M \in [1,\infty)$ and all measurable $V : E \to [0,\infty]$, then for each $L \in [1,\infty)$ there is a $K_L \subset\subset E$ (whose choice depends only on $\mu$, $\beta$, and $M$ and is otherwise independent of $\{R_\epsilon : \epsilon > 0\}$) such that
$$\mu_\epsilon(K_L^c) \le \exp[-L/\epsilon],\quad 0 < \epsilon \le 1,$$
where $\mu_\epsilon \in M_1(E)$ denotes the distribution of $\nu \in \Gamma(f) \longmapsto m(\nu) \in E$ under $R_\epsilon$ and $\Gamma(f)$ is the set in (3.3.9).
PROOF: Define
$$h(T) = \sup\{\alpha T - g(\alpha) : \alpha \in [0,\infty)\},\quad T \in [0,\infty).$$
It is then clear that $h(0) = 0$ and that $T \in [0,\infty) \longmapsto h(T) \in [0,\infty]$ is a lower semi-continuous, non-decreasing function for which
$$\lim_{T\to\infty}\frac{h(T)}{T} = \infty.$$
Furthermore, just as in the proof of Lemma 1.2.5 (only even more easily), one sees that
$$\mu(\{x \in E : \|x\|_E \ge T\}) \le e^{-h(T)},\quad T \in [0,\infty);$$
from which it is easy to show that
Now let $\{R_\epsilon : \epsilon > 0\}$ satisfying our hypothesis be given. By Lemma 3.2.7, we know that, for each $L \in [1,\infty)$, there is a $C_L \subset\subset M_1(E)$ (depending only on $\mu$, $\beta$, and $M$) such that
$$R_\epsilon(C_L^c) \le e^{-L/\epsilon},\quad 0 < \epsilon \le 1.$$
In addition, since
we have, when $T_L = 2\beta(L + \log(4M))$, that
$$R_\epsilon\big(\Gamma(f,T_L)^c\big) \le e^{-T_L/(2\beta\epsilon)}\int_{M_1(E)}\exp\left[\frac{1}{2\beta\epsilon}\int_E f\,d\nu\right]R_\epsilon(d\nu) \le \frac{e^{-L/\epsilon}}{2}$$
for all $L \in [1,\infty)$ and $0 < \epsilon \le 1$. Hence, if
then, as the continuous image of a compact set, $K_L \subset\subset E$ and
for all $L \in [1,\infty)$ and $0 < \epsilon \le 1$. ∎
We now have all the machinery which we need to prove the following extension of the CRAMÉR Theorem.
3.3.11 Theorem. (DONSKER & VARADHAN) Assume that (3.3.1) holds. Then, for each $L \in [0,\infty)$, there is a $K_L \subset\subset E$ such that
$$\mu_n(K_L^c) \le e^{-nL},\quad n \in \mathbb{Z}^+ \text{ and } L \in [1,\infty).$$
In particular, $\{\mu_n : n \in \mathbb{Z}^+\}$ is exponentially tight and the function $\Lambda_\mu^*$ in (3.3.2) is a good rate function which governs the large deviations of $\{\mu_n : n \ge 1\}$.
PROOF: In view of Corollary 3.1.7, all that we need to check is the first assertion. To this end, define $\tilde\mu_n \in M_1(M_1(E))$, as in Section 3.2, to be the distribution of
$$\mathbf{x} \in E^n \longmapsto L_n(\mathbf{x}) = \frac1n\sum_{m=1}^n\delta_{x_m}$$
under $\mu^n$. It is then clear that the measures $R_{1/n} \equiv \tilde\mu_n$ satisfy the hypotheses of Lemma 3.3.10 with respect to $\mu$; and therefore the existence of the required $K_L$'s is an immediate consequence of that lemma. ∎
3.3.12 Exercise.
It is amusing and instructive to prove the large deviation result of this section as a corollary of the SANOV Theorem. To this end, let $f : E \to [0,\infty]$ be the function in the proof of Lemma 3.3.10 and define the sets $\Gamma(f,L)$, $L \ge 0$, accordingly. Noting that the mean value functional $m$ is continuous on each $\Gamma(f,L)$, use part (i) of Exercise 2.1.20 together with the SANOV Theorem to show that $I : E \to [0,\infty]$ given by (3.3.13)
is a good rate function on $E$ and that it governs the large deviations of $\{\mu_n : n \ge 1\}$. Conclude, in particular, that $I = \Lambda_\mu^*$.
A direct proof of the equality in (3.3.13) is not easy, even in special cases. For example, suppose that $E = \Theta$ and that $\mu$ is WIENER's measure $\mathcal{W}$ as in Section 1.3. As we pointed out in the discussion (1.3.7), the large deviation principle proved in SCHILDER's Theorem can be thought of as an example of CRAMÉR's Theorem; and so (3.3.13) gives us another variational formula for $I_{\mathcal{W}} = \Lambda_{\mathcal{W}}^*$. To give a direct proof of (3.3.13) in this case, one can use the fact that $P \in M_1(\Theta)$ is absolutely continuous with respect to $\mathcal{W}$ if and only if there is a $\{\mathcal{B}_t : t \in [0,\infty)\}$-progressively measurable map $b : [0,\infty)\times\Theta \to \mathbb{R}^d$ such that
$$\int_0^\infty|b(t,\theta)|^2\,dt < \infty\quad(\text{a.s., } \mathcal{W})$$
and
$$P(d\theta) = \exp\left[\int_0^\infty b(s,\theta)\cdot d\theta(s) - \frac12\int_0^\infty|b(s,\theta)|^2\,ds\right]\mathcal{W}(d\theta);$$
in which case the distribution of
$$\theta \in \Theta \longmapsto \theta - \int_0^{\cdot}b(s,\theta)\,ds$$
under $P$ is $\mathcal{W}$. Using this, one can check that, when $P \ll \mathcal{W}$,
Finally, for a given $\psi \in H^1$, show that, among $P \in M_1(\Theta)$ satisfying $\int_\Theta\|\theta\|_\Theta\,P(d\theta) < \infty$ and $m(P) = \psi$, the one which minimizes $H(P|\mathcal{W})$ is $\mathcal{W}_\psi$.
3.4 Large Deviations for Gaussian Measures
In this section, we will generalize SCHILDER's Theorem to cover all centered Gaussian measures on a separable, real BANACH space. That is, we will be assuming that $(E,\|\cdot\|_E)$ is a separable, real BANACH space and that $\mu$ is a probability measure on $(E,\mathcal{B}_E)$ with the property that
for some symmetric, bilinear map $Q_\mu : E^*\times E^* \to [0,\infty)$. The bilinear map $Q_\mu$ is called the covariance of $\mu$. At least when $E$ is infinite dimensional, the archetype for this sort of measure is WIENER's measure $\mathcal{W}$ on the space $\Theta$ (cf. Section 1.3), in which case
and SCHILDER's Theorem gave us a large deviation principle for the family $\{\mathcal{W}_\epsilon : \epsilon > 0\}$. Our aim here will be to come as close as possible to duplicating SCHILDER's result in general.
3.4.2 Lemma. There exists an $\alpha \in (0,\infty)$ such that
$$\int_E\exp\big[\alpha\|x\|_E^2\big]\,\mu(dx) < \infty.\tag{3.4.3}$$
In particular,
$$2\Lambda_\mu(\lambda) = Q_\mu(\lambda,\lambda) = \int_E {}_{E^*}\langle\lambda,x\rangle_E^2\,\mu(dx) \le B\|\lambda\|_{E^*}^2\tag{3.4.4}$$
for $\lambda \in E^*$, where $B = \int_E\|x\|_E^2\,\mu(dx) \in (0,\infty)$. Finally, $\Lambda_\mu^*$ is a good rate function; and, for all $x \in E$ and $t \in \mathbb{R}$, $\Lambda_\mu^*(tx) = t^2\Lambda_\mu^*(x)$. (See Exercise 3.4.15 below for a little more information.)
PROOF: The existence of an $\alpha > 0$ for which (3.4.3) holds is a consequence of FERNIQUE's Theorem (cf. Theorem 1.3.24). Furthermore, the equalities in (3.4.4) are all obtained from consideration of the $\mathbb{R}$-valued, centered Gaussian random variable $x \in E \longmapsto {}_{E^*}\langle\lambda,x\rangle_E$; the inequality is trivial; and the finiteness of $B$ follows trivially from (3.4.3). Finally, given (3.4.3), the fact that $\Lambda_\mu^*$ is a good rate function is covered in the statement of Theorem 3.3.11; and the homogeneity of $\Lambda_\mu^*$ is an immediate consequence of the homogeneity of $\Lambda_\mu$. ∎
Following the pattern in SCHILDER's Theorem, we now define $\mu_\epsilon$ to be the distribution of $x \in E \longmapsto \epsilon^{1/2}x \in E$ under $\mu$; and, as a first approximation to his result, we present the following.
86 3.4.5 Theorem. The family { p E : E
> 0) satisfies the full large deviation
principle with the good rate function A;. PROOF:We have already pointed out that, as a consequence of Theorem 3.3.11, A; is a good rate function. Furthermore, since p1ln is the distribution under p n of x E E n x k (i.e., plln here is the same measure as the one which we denoted by p n in Section 3.3), Theorem 3.3.11 allows us to also conclude from (3.4.3) that { p l l n : n 1 1) satisfies the full large deviation principle with rate A;. In order to pass from this statement to ] 1 and y ( ~ = ) en(€)for E > 0. It is the desired one, set n(E) = [ 1 / ~ V y(~)’/’x under p l / n ( B )has distribution p E and that then clear that x y(e) E [l-- E , 11 for 0 < E < 1. Now suppose that F is a closed subset of E and set F = {y-lI2x : y E 13 and x E F}. Then F is also closed, and so
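$\varlimsup_{\epsilon\to0} \epsilon\log\bigl(\mu_\epsilon(F)\bigr) = \varlimsup_{\epsilon\to0} \frac{\gamma(\epsilon)}{n(\epsilon)}\log\bigl(\mu_{1/n(\epsilon)}(\gamma(\epsilon)^{-1/2}F)\bigr) \le \varlimsup_{n\to\infty} \frac{1}{n}\log\bigl(\mu_{1/n}(\tilde F)\bigr) \le -\inf_{\tilde F}\Lambda_\mu^*.$

Since

$\inf_{\tilde F}\Lambda_\mu^* = \inf_{\gamma\in[1/2,1]} \gamma^{-1}\inf_F \Lambda_\mu^* = \inf_F \Lambda_\mu^*,$

this proves the upper bound in the large deviation principle. To prove the lower bound, let $G$ be an open set in $E$ and suppose that $x \in G$. Then we can find an open neighborhood $U$ of $x$ and an $\epsilon_0 \in (0, 1/2]$ such that $U \subseteq \gamma(\epsilon)^{-1/2}G$ for all $0 < \epsilon < \epsilon_0$. Hence

$\varliminf_{\epsilon\to0} \epsilon\log\bigl(\mu_\epsilon(G)\bigr) = \varliminf_{\epsilon\to0} \frac{\gamma(\epsilon)}{n(\epsilon)}\log\bigl(\mu_{1/n(\epsilon)}(\gamma(\epsilon)^{-1/2}G)\bigr) \ge \varliminf_{n\to\infty} \frac{1}{n}\log\bigl(\mu_{1/n}(U)\bigr) \ge -\inf_U \Lambda_\mu^* \ge -\Lambda_\mu^*(x).$

Thus, the lower bound is also proved. ∎

As a dividend of Theorem 3.4.5, we get the following sharpening of the estimate in (3.4.3).

3.4.6 Corollary. (DONSKER & VARADHAN) Set

(3.4.7)  $a = \inf\{\Lambda_\mu^*(x) : \|x\|_E = 1\}$ and $b = \sup\{Q_\mu(\lambda,\lambda) : \|\lambda\|_{E^*} = 1\}.$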
Then

(3.4.8)  $\lim_{R\to\infty} \frac{1}{R^2}\log\bigl[\mu(\{x \in E : \|x\|_E \ge R\})\bigr] = -a = -\frac{1}{2b}.$

In particular, (3.4.3) holds for $\alpha \in (0,a)$, and it fails for $\alpha \in (a,\infty)$.
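PROOF: It suffices to prove (3.4.8). To prove the first equality, set $B = B(0,1)$ and note that

$\inf_{B^c}\Lambda_\mu^* = \inf_{r\ge1}\,\inf_{x\in\partial B}\Lambda_\mu^*(rx) = a\inf_{r\ge1} r^2 = a$

and similarly that $\inf_{(\overline{B})^c}\Lambda_\mu^* = a$. Hence, by Theorem 3.4.5, we see that

$\lim_{R\to\infty} \frac{1}{R^2}\log\bigl[\mu(B(0,R)^c)\bigr] = \lim_{\epsilon\to0} \epsilon\log\bigl[\mu_\epsilon(B^c)\bigr] = -\inf_{B^c}\Lambda_\mu^* = -a.$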
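To prove the second equality in (3.4.8), first observe that

$\Lambda_\mu^*(x) = \sup\bigl\{ t\,{}_{E^*}\langle\lambda,x\rangle_E - \tfrac{t^2}{2}Q_\mu(\lambda,\lambda) : t \in \mathbb{R} \text{ and } \|\lambda\|_{E^*} = 1 \bigr\} = \sup\bigl\{ \Lambda_{\mu_\lambda}^*\bigl({}_{E^*}\langle\lambda,x\rangle_E\bigr) : \|\lambda\|_{E^*} = 1 \bigr\},$

where $\mu_\lambda$ denotes the distribution under $\mu$ of $x \in E \mapsto {}_{E^*}\langle\lambda,x\rangle_E \in \mathbb{R}$, and we have used (iv) of Exercise 1.2.11 to see that

(3.4.9)  $\Lambda_{\mu_\lambda}^*(\xi) = \frac{\xi^2}{2Q_\mu(\lambda,\lambda)}, \quad \xi \in \mathbb{R}.$

In particular, if $\|x\|_E = 1$, then

$\Lambda_\mu^*(x) = \sup\Bigl\{ \frac{{}_{E^*}\langle\lambda,x\rangle_E^2}{2Q_\mu(\lambda,\lambda)} : \|\lambda\|_{E^*} = 1 \Bigr\} \ge \frac{1}{2b}\sup\bigl\{ {}_{E^*}\langle\lambda,x\rangle_E^2 : \|\lambda\|_{E^*} = 1 \bigr\} = \frac{1}{2b},$

and so $a \ge \frac{1}{2b}$. To prove the opposite inequality, suppose $\|\lambda\|_{E^*} = 1$ and note that

$-a = \lim_{R\to\infty} \frac{1}{R^2}\log\bigl[\mu(\{x : \|x\|_E \ge R\})\bigr] \ge \lim_{R\to\infty} \frac{1}{R^2}\log\bigl[\mu_\lambda(\{\xi : |\xi| \ge R\})\bigr] = -\inf\{\Lambda_{\mu_\lambda}^*(\xi) : |\xi| = 1\} = -\frac{1}{2Q_\mu(\lambda,\lambda)},$

where we have applied the first equality in (3.4.8) to both $\mu$ and to $\mu_\lambda$, and we again used (3.4.9) to get the final equality. Since this shows that $a \le \frac{1}{2Q_\mu(\lambda,\lambda)}$ whenever $\|\lambda\|_{E^*} = 1$, we have now shown that $a = \frac{1}{2b}$. ∎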
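The tail asymptotics in (3.4.8) can be checked numerically in the simplest case. The following is a minimal sketch, assuming $E = \mathbb{R}$ and $\mu = N(0,\sigma^2)$, so that $b = \sigma^2$ and the predicted limit is $-1/(2\sigma^2)$; the function name `scaled_log_tail` and the chosen values of $\sigma$ and $R$ are illustrative assumptions, not part of the text.

```python
import math

def scaled_log_tail(sigma: float, R: float) -> float:
    """Return R**-2 * log P(|X| >= R) for X ~ N(0, sigma**2)."""
    # For a centered Gaussian, P(|X| >= R) = erfc(R / (sigma * sqrt(2))).
    tail = math.erfc(R / (sigma * math.sqrt(2.0)))
    return math.log(tail) / (R * R)

sigma = 2.0
b = sigma ** 2             # when E = R, b in (3.4.7) is just the variance
limit = -1.0 / (2.0 * b)   # limit predicted by (3.4.8)

for R in (10.0, 20.0, 40.0, 60.0):
    # The middle column should approach the last one as R grows.
    print(R, scaled_log_tail(sigma, R), limit)
```

The convergence is slow (the error is of order $\log R / R^2$), which is typical of statements made on a logarithmic scale.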
Before leaving the topic of centered GAUSSIAN measures $\mu$, we want to show that one can always develop a representation of $\Lambda_\mu^*$ analogous to the one for $\Lambda_{\mathcal{W}}^*$ given in (1.3.12). For this purpose, it will be convenient to introduce a new notion. Namely, we will say that $(E, H, S, \mu)$ is a Wiener quadruple if

(i) $E$ is a separable, real BANACH space,
(ii) $H$ is a separable, real HILBERT space,
(iii) $S$ is a continuous, linear injection from $H$ into $E$,
(iv) $\mu$ is a probability measure on $(E, \mathcal{B}_E)$ with the property that

(3.4.10)  $\int_E \exp\bigl[\sqrt{-1}\,{}_{E^*}\langle\lambda,x\rangle_E\bigr]\,\mu(dx) = \exp\bigl[-\tfrac12\|S^*\lambda\|_H^2\bigr], \quad \lambda \in E^*,$

where $S^* : E^* \to H$ is the adjoint map to $S$.

Obviously, if $(E, H, S, \mu)$ is a WIENER quadruple, then $\mu$ is a centered GAUSSIAN measure on $E$ and the covariance of $\mu$ is

$Q_\mu(\lambda,\lambda') = (S^*\lambda, S^*\lambda')_H.$

In particular, $S^*$ is bounded from $E^*$ to $H$ with operator norm given by

$\|S^*\|_{E^*\to H} = \sup\{Q_\mu(\lambda,\lambda)^{1/2} : \|\lambda\|_{E^*} = 1\} = b^{1/2}.$

Hence, the norm of $S$ also satisfies

(3.4.11)  $\|S\|_{H\to E} = b^{1/2}.$
3.4.12 Theorem. If $\mu$ is a centered GAUSSIAN measure on the separable, real BANACH space $E$, then there exist a separable, real HILBERT space $H$ and a continuous, linear injection $S : H \to E$ such that $(E, H, S, \mu)$ is a WIENER quadruple. Moreover, if $(E, H, S, \mu)$ is any WIENER quadruple, then $S$ is a compact map, $S$ satisfies (3.4.11), and

(3.4.13)  $\Lambda_\mu^*(x) = \tfrac12\|S^{-1}x\|_H^2$ if $x \in SH$, and $\Lambda_\mu^*(x) = \infty$ if $x \in E \setminus SH$.
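A finite-dimensional instance may help fix ideas about (3.4.13). The sketch below assumes $E = H = \mathbb{R}^2$ and takes $S$ to be a hypothetical invertible $2\times2$ matrix, so that $\mu = N(0, SS^{\top})$, $S^*$ is the transpose, and $SH = E$. It compares $\tfrac12|S^{-1}x|^2$ with the Legendre transform $\sup_\lambda\{\lambda\cdot x - \tfrac12|S^{\top}\lambda|^2\}$, whose maximizer is $\lambda = (S^{\top})^{-1}S^{-1}x$. All names in the code are illustrative assumptions.

```python
import random

# Hypothetical injection S : H = R^2 -> E = R^2 (any invertible matrix would do).
S = [[2.0, 0.0],
     [1.0, 1.0]]
DET = S[0][0] * S[1][1] - S[0][1] * S[1][0]

def s_adjoint(lam):
    """S^* lam, i.e. the transpose of S applied to lam."""
    return [S[0][0] * lam[0] + S[1][0] * lam[1],
            S[0][1] * lam[0] + S[1][1] * lam[1]]

def s_inverse(x):
    """S^{-1} x via the closed-form 2x2 inverse."""
    return [(S[1][1] * x[0] - S[0][1] * x[1]) / DET,
            (-S[1][0] * x[0] + S[0][0] * x[1]) / DET]

def s_adjoint_inverse(h):
    """(S^*)^{-1} h via the closed-form inverse of the transpose."""
    return [(S[1][1] * h[0] - S[1][0] * h[1]) / DET,
            (-S[0][1] * h[0] + S[0][0] * h[1]) / DET]

def rate(x):
    """Right-hand side of (3.4.13): (1/2) |S^{-1} x|^2."""
    h = s_inverse(x)
    return 0.5 * (h[0] ** 2 + h[1] ** 2)

def legendre_objective(lam, x):
    """<lam, x> - (1/2) |S^* lam|^2, whose supremum over lam is the rate."""
    v = s_adjoint(lam)
    return lam[0] * x[0] + lam[1] * x[1] - 0.5 * (v[0] ** 2 + v[1] ** 2)

x = [1.0, 3.0]
lam_star = s_adjoint_inverse(s_inverse(x))   # maximizer of the objective
print(rate(x), legendre_objective(lam_star, x))

# No randomly drawn lam should beat the maximizer.
rng = random.Random(0)
trials = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(1000)]
never_exceeds = all(legendre_objective(l, x) <= rate(x) + 1e-9 for l in trials)
print(never_exceeds)
```

In infinite dimensions the interesting feature of (3.4.13), namely that the rate is infinite off the range $SH$, has no counterpart in this toy case because $S$ is onto.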
PROOF: To prove the first statement, let $H$ denote the closure in $L^2(\mu)$ of the subspace spanned by the functions ${}_{E^*}\langle\lambda,\cdot\rangle_E$, $\lambda \in E^*$; and set $\|h\|_H = \|h\|_{L^2(\mu)}$ for $h \in H$. In order to define $S$, we must first define (cf. Theorem 3.3.4) "$m(f\mu) \in E$" for $f \in L^2(\mu)$. To this end, assume that $f \in L^2(\mu)$ is non-negative and that $\int_E f\,d\mu = 1$. Then, since

$\int_E \|x\|_E\,f(x)\,\mu(dx) \le \|f\|_{L^2(\mu)}\Bigl(\int_E \|x\|_E^2\,\mu(dx)\Bigr)^{1/2} < \infty,$

we can define $m(f\mu)$, where $f\mu$ is the probability measure $\nu$ given by $\nu(dx) = f(x)\mu(dx)$. Now, extend $f \mapsto m(f\mu)$ to the whole of $L^2(\mu)$ by linearity; and let $S$ be the restriction of this map to $H$. Note that

(3.4.14)  ${}_{E^*}\langle\lambda, Sh\rangle_E = \bigl({}_{E^*}\langle\lambda,\cdot\rangle_E, h\bigr)_H, \quad \lambda \in E^* \text{ and } h \in H.$
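In particular, if $h \in H$ and $Sh = 0$, then $h \perp_H \{{}_{E^*}\langle\lambda,\cdot\rangle_E : \lambda \in E^*\}$, and therefore $h = 0$. That is, $S$ is an injection. Finally, to complete the proof that $(E, H, S, \mu)$ is a WIENER quadruple, let $\lambda \in E^*$ be given and, using (3.4.14), check that $S^*\lambda = {}_{E^*}\langle\lambda,\cdot\rangle_E$ and therefore that $\|S^*\lambda\|_H^2 = Q_\mu(\lambda,\lambda)$.

Now let $(E, H, S, \mu)$ be any WIENER quadruple. We have already seen that (3.4.11) holds. Moreover, since $\Lambda_\mu^*$ is a good rate function, the compactness of $S$ will follow as soon as we show that (3.4.13) is true. To prove (3.4.13), first suppose that $x = Sh$ for some $h \in H$. Then

$\Lambda_\mu^*(x) = \sup\bigl\{ {}_{E^*}\langle\lambda, Sh\rangle_E - \tfrac12\|S^*\lambda\|_H^2 : \lambda \in E^* \bigr\} = \sup\bigl\{ (S^*\lambda, h)_H - \tfrac12\|S^*\lambda\|_H^2 : \lambda \in E^* \bigr\} = \sup\bigl\{ (h', h)_H - \tfrac12\|h'\|_H^2 : h' \in H \bigr\} = \tfrac12\|h\|_H^2,$

since $S$ is an injection and therefore $S^*E^*$ is dense in $H$. Conversely, suppose that $x \in E$ and that $\Lambda_\mu^*(x) < \infty$. Then, since

$t\,{}_{E^*}\langle\lambda,x\rangle_E - \tfrac{t^2}{2}\|S^*\lambda\|_H^2 \le \Lambda_\mu^*(x), \quad t \in \mathbb{R} \text{ and } \lambda \in E^*,$

we have that

$\bigl|{}_{E^*}\langle\lambda,x\rangle_E\bigr| \le \bigl(2\Lambda_\mu^*(x)\bigr)^{1/2}\|S^*\lambda\|_H, \quad \lambda \in E^*.$

Hence, because $S^*E^*$ is dense in $H$, there is a unique, continuous, linear functional $F$ on $H$ such that $F(S^*\lambda) = {}_{E^*}\langle\lambda,x\rangle_E$, $\lambda \in E^*$; and therefore, by the RIESZ Representation Theorem, we know that there is a unique $h \in H$ with the property that

${}_{E^*}\langle\lambda, Sh\rangle_E = (S^*\lambda, h)_H = {}_{E^*}\langle\lambda,x\rangle_E, \quad \lambda \in E^*.$

Thus, we conclude that $x = Sh$ and therefore that (3.4.13) holds. ∎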
3.4.15 Exercise.

Let $\mu$ be a centered GAUSSIAN measure on the separable, real BANACH space $E$. Show that $(\lambda,\lambda') \in E^* \times E^* \mapsto Q_\mu(\lambda,\lambda')$ is continuous with respect to the weak* topology; and conclude from this that there is a $\lambda_0 \in E^*$ with $\|\lambda_0\|_{E^*} = 1$ and $a = \frac{1}{2Q_\mu(\lambda_0,\lambda_0)}$, where $a$ is defined as in (3.4.7). In particular, use this to show that

3.4.16 Exercise. Let $E = C(\Sigma;\mathbb{R})$, where $\Sigma$ is a compact metric space and we think of $E$ as a BANACH space with the uniform norm. Given a centered GAUSSIAN measure $\mu$ on $E$, define $q_\mu(s,t) = Q_\mu(\delta_s,\delta_t)$ for $s, t \in \Sigma$. Show that $q_\mu \in C(\Sigma^2;[0,\infty))$ and that

Next, show that $q_\mu(s,t)^2 \le q_\mu(s,s)\,q_\mu(t,t)$ for all $s, t \in \Sigma$, and use this to conclude that $b = \sup_{s\in\Sigma} q_\mu(s,s)$, where $b$ is defined as in (3.4.7).
IV
Uniform Large Deviations
4.1 Markov Chains
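In this section, we present a theory which generalizes the results in Chapter III in an important direction. Namely, we will see how to replace the independence which we assumed there with the MARKOV property here and still end up with a SANOV-type result for the empirical distribution functional. Of course, we will have to impose strong ergodicity conditions in order to assure that there is a "typical behavior" from which large deviations may occur (cf. Example 4.1.1 below).

Throughout, $\Sigma$ will denote a Polish space and $E$, $X$, $\rho$, and $\|\cdot\|$ will be as in (C) at the beginning of Section 3.2. Set $\hat\Sigma = \Sigma \times E$, $\hat\Omega = \hat\Sigma^{\mathbb{N}}$; and, for $n \in \mathbb{N}$, let $\hat\omega \in \hat\Omega \mapsto \hat\Sigma_n(\hat\omega) = (\Sigma_n(\hat\omega), X_n(\hat\omega)) \in \Sigma \times E$ denote the $n$th coordinate map on $\hat\Omega$, and set $\hat{\mathcal{B}}_n = \sigma(\hat\Sigma_m : 0 \le m \le n)$. Next, let

$\hat\sigma \in \hat\Sigma \mapsto \hat\Pi(\hat\sigma,\cdot) \in M_1(\hat\Sigma)$

be a transition probability function on $\hat\Sigma$ (i.e., $\hat\sigma \in \hat\Sigma \mapsto \hat\Pi(\hat\sigma,\hat\Gamma)$ is measurable for every $\hat\Gamma \in \mathcal{B}_{\hat\Sigma}$); and for each $\hat\sigma \in \hat\Sigma$ denote by $P_{\hat\sigma}$ the unique probability measure on $(\hat\Omega, \mathcal{B}_{\hat\Omega})$ with the properties that

$P_{\hat\sigma}(\{\hat\omega : \hat\Sigma_0(\hat\omega) = \hat\sigma\}) = 1$

and

$P_{\hat\sigma}\bigl(\{\hat\omega : \hat\Sigma_{n+1}(\hat\omega) \in \hat\Gamma\} \,\big|\, \hat{\mathcal{B}}_n\bigr) = \hat\Pi(\hat\Sigma_n(\cdot),\hat\Gamma) \quad (\text{a.s., } P_{\hat\sigma})$

for each $n \in \mathbb{N}$ and $\hat\Gamma \in \mathcal{B}_{\hat\Sigma}$. That is, $P_{\hat\sigma}$ is the distribution of the MARKOV chain on $\hat\Sigma$ starting from $\hat\sigma$ with transition function $\hat\Pi$. Finally, define

$S_n(\hat\omega) = \sum_{k=1}^n X_k(\hat\omega)$ and $\bar S_n(\hat\omega) = \frac{1}{n}S_n(\hat\omega)$ for $n \in \mathbb{Z}^+$,

and let $\mu_{\hat\sigma,n} \in M_1(E)$ be the distribution of $\bar S_n$ under $P_{\hat\sigma}$. What we want to do is study the large deviation theory for the families $\{\mu_{\hat\sigma,n} : n \ge 1\}$, $\hat\sigma \in \hat\Sigma$; and, in so far as possible, our treatment will be based on the ideas introduced in Chapter III.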
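4.1.1 Example. The example to be kept in mind is the one in which $E = M_1(\Sigma)$ and

$\hat\Pi\bigl((\sigma,\nu),\cdot\bigr) = \int_\Sigma \delta_{(\tau,\delta_\tau)}(\cdot)\,\Pi(\sigma,d\tau),$

where $\sigma \in \Sigma \mapsto \Pi(\sigma,\cdot)$ is a transition probability function on $\Sigma$. In this case, it is unnecessary to deal with $\hat\Omega$ at all. Instead, one should set $\Omega = \Sigma^{\mathbb{N}}$, let $\omega \in \Omega \mapsto \Sigma_n(\omega) \in \Sigma$ be the $n$th coordinate map on $\Omega$, and take $P_\sigma \in M_1(\Omega)$ to be the measure defined by the conditions

$P_\sigma(\{\omega : \Sigma_0(\omega) = \sigma\}) = 1$

and

$P_\sigma\bigl(\{\omega : \Sigma_{n+1}(\omega) \in \Gamma\} \,\big|\, \sigma(\Sigma_m : 0 \le m \le n)\bigr) = \Pi(\Sigma_n(\cdot),\Gamma) \quad (\text{a.s., } P_\sigma)$

for all $n \in \mathbb{N}$ and $\Gamma \in \mathcal{B}_\Sigma$. (In other words, $P_\sigma$ is the distribution of the MARKOV chain on $\Sigma$ starting at $\sigma$ and having transition function $\Pi$.) It is then an easy matter to check that for any $\hat\sigma = (\sigma,\nu)$, $\mu_{\hat\sigma,n}$ is the distribution of the empirical distribution functional

(4.1.3)  $L_n(\omega) = \frac{1}{n}\sum_{k=1}^n \delta_{\Sigma_k(\omega)}$

under $P_\sigma$. In particular, $\mu_{(\sigma,\nu),n}$ is independent of $\nu \in M_1(\Sigma)$, and therefore, in this case, we will use the notation $\mu_{\sigma,n}$ instead.

In order to explain why it is that one might suspect that the measures $\mu_{\sigma,n}$, $n \in \mathbb{Z}^+$, are candidates for a large deviation theory, assume for the moment that the MARKOV chain determined by $\Pi$ is sufficiently ergodic to allow one to conclude that there is a $\mu \in M_1(\Sigma)$ with the property that $L_n(\omega) \Rightarrow \mu$ (a.e., $P_\sigma$) for each $\sigma \in \Sigma$. One would then have that $\mu_{\sigma,n} \Rightarrow \delta_\mu$ for every $\sigma \in \Sigma$; and, obviously, a large deviation theory for $\{\mu_{\sigma,n} : n \in \mathbb{Z}^+\}$ would then be the precise analogue for MARKOV chains of what SANOV did for sums of independent random variables: the difference being that here we have to rely on ergodicity, whereas there we had the Strong Law working for us.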
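To make the empirical distribution functional of Example 4.1.1 concrete, here is a minimal simulation sketch. It assumes a hypothetical two-state space $\Sigma = \{0,1\}$ and a hypothetical transition matrix `P` whose invariant measure is $(0.8, 0.2)$; the function name and parameters are illustrative, not taken from the text.

```python
import random

# Hypothetical transition matrix on Sigma = {0, 1}; its invariant measure is (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]

def empirical_distribution(P, start, n, rng):
    """Return L_n = (1/n) * sum_{k=1}^n delta_{Sigma_k} for the chain started at `start`."""
    counts = [0] * len(P)
    state = start
    for _ in range(n):
        # Draw the next state from the row of P indexed by the current state.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        counts[state] += 1
    return [c / n for c in counts]

rng = random.Random(0)
L = empirical_distribution(P, start=0, n=20000, rng=rng)
print(L)  # for an ergodic chain this should be close to the invariant measure
```

The large deviation question is how fast the probability that `L` stays away from $(0.8, 0.2)$ decays as `n` grows, uniformly in the starting state.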
It will be convenient to have the notation

$S_m^n(\hat\omega) = \sum_{k=m+1}^n X_k(\hat\omega)$ and $\bar S_m^n(\hat\omega) = \frac{1}{n-m}S_m^n(\hat\omega)$ for $0 \le m < n$.
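We now present the analogue of Lemma 3.1.2.

4.1.4 Lemma. For $n \ge 1$ define

$P_n(\Gamma) = \inf_{\hat\sigma\in\hat\Sigma} \mu_{\hat\sigma,n}(\Gamma), \quad \Gamma \in \mathcal{B}_E.$

Then, for each convex $\Gamma \in \mathcal{B}_E$, $n \in \mathbb{Z}^+ \mapsto P_n(\Gamma)$ is super-multiplicative. In addition, if

(4.1.5)

then for every $\rho$-bounded $\Gamma \in \mathcal{B}_E$ which is convex, either $P_n(\Gamma) = 0$ for every $n \in \mathbb{Z}^+$, or, for each $\delta > 0$, there is an $m \in \mathbb{Z}^+$ such that $P_n(\Gamma^{(\delta)}) > 0$, $n \ge m$. (Throughout, $\Gamma^{(\delta)}$ is defined relative to the metric $\rho$ on $E$.)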
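PROOF: Suppose that $\Gamma \in \mathcal{B}_E$ is convex. Then, for $\hat\sigma \in \hat\Sigma$ and $m, n \in \mathbb{Z}^+$,

$\mu_{\hat\sigma,m+n}(\Gamma) = P_{\hat\sigma}(\{\hat\omega : \bar S_{m+n}(\hat\omega) \in \Gamma\}) \ge P_{\hat\sigma}(\{\hat\omega : \bar S_m^{m+n}(\hat\omega) \in \Gamma \text{ and } \bar S_m(\hat\omega) \in \Gamma\}) = \int_{\{\hat\omega : \bar S_m(\hat\omega)\in\Gamma\}} \mu_{\hat\Sigma_m(\hat\omega),n}(\Gamma)\,P_{\hat\sigma}(d\hat\omega) \ge \mu_{\hat\sigma,m}(\Gamma)\,P_n(\Gamma).$

Hence, $P_{m+n}(\Gamma) \ge P_m(\Gamma)P_n(\Gamma)$.

To prove the second statement, assume that $\Gamma \in \mathcal{B}_E$ is $\rho$-bounded and convex; and suppose that $P_m(\Gamma) > 0$ for some $m \in \mathbb{Z}^+$. For $n > m$, set $q_n = [n/m]$ and $r_n = n - q_n m$. Then, since, by (C), $\Gamma$ is $\|\cdot\|$-bounded,

$\mu_{\hat\sigma,n}(\Gamma^{(\delta)}) = P_{\hat\sigma}(\{\hat\omega : \bar S_n(\hat\omega) \in \Gamma^{(\delta)}\}) \ge P_{\hat\sigma}(\{\hat\omega : \bar S_{r_n}^n(\hat\omega) \in \Gamma \text{ and } \|\bar S_n(\hat\omega) - \bar S_{r_n}^n(\hat\omega)\| < \delta\}) \ge P_{\hat\sigma}(\{\hat\omega : \|S_{r_n}(\hat\omega)\| < n\delta/2 \text{ and } \bar S_{r_n}^n(\hat\omega) \in \Gamma\}) \ge P_{\hat\sigma}(\{\hat\omega : \|S_{r_n}(\hat\omega)\| < n\delta/2\})\,P_{q_n m}(\Gamma)$

for all sufficiently large $n$’s. Since $P_{q_n m}(\Gamma) \ge P_m(\Gamma)^{q_n} > 0$ by the super-multiplicativity just proved, and since (4.1.5) controls the first factor uniformly in $\hat\sigma \in \hat\Sigma$, we now see that $P_n(\Gamma^{(\delta)}) > 0$ for all large enough $n$’s. ∎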
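Assuming that (4.1.5) holds and proceeding as in Section 3.1, we first use Lemma 4.1.4 together with Lemma 3.1.3 to define

$\mathcal{L}(q,r) = \infty$ if $\sup_{n\in\mathbb{Z}^+} P_n(B(q,r/2)) = 0$, and $\mathcal{L}(q,r) = -\lim_{n\to\infty} \frac{1}{n}\log P_n(B(q,r))$ otherwise,

for $(q,r) \in E \times (0,\infty)$; and then take

(4.1.6)  $I_{\hat\Pi}(q) = \sup_{r>0} \mathcal{L}(q,r), \quad q \in E.$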
4.1.7 Lemma. Assume that (4.1.5) holds. Then the function $I_{\hat\Pi}$ in (4.1.6) is lower semi-continuous and convex. In addition, for every open $G$ in $E$,

(4.1.8)  $\varliminf_{n\to\infty} \frac{1}{n}\log P_n(G) \ge -\inf_G I_{\hat\Pi}.$
PROOF: The proof is, more or less, the same as that of Theorem 3.1.6. To see that $I_{\hat\Pi}$ is lower semi-continuous, suppose that $\ell < I_{\hat\Pi}(p)$. Then $\mathcal{L}(p,r) \ge \ell$ for some $r > 0$; and therefore $I_{\hat\Pi}(q) \ge \mathcal{L}(q,r/2) \ge \mathcal{L}(p,r) \ge \ell$ for all $q \in B(p,r/4)$. To prove the convexity, let $p, q \in E$ with $I_{\hat\Pi}(p) \vee I_{\hat\Pi}(q) < \infty$ be given. For $r > 0$, choose $\delta > 0$ so that

Then

for all large enough $n \in \mathbb{Z}^+$. Hence,

and so $I_{\hat\Pi}\bigl(\frac{p+q}{2}\bigr) \le \bigl(I_{\hat\Pi}(p) + I_{\hat\Pi}(q)\bigr)/2$. Since $I_{\hat\Pi}$ is lower semi-continuous, it follows from this that it is also convex. Next, suppose that $G$ is open in $E$ and that $p \in G$ with $I_{\hat\Pi}(p) < \infty$. Then, for $r > 0$ with $B(p,r) \subseteq G$,

$\varliminf_{n\to\infty} \frac{1}{n}\log P_n(G) \ge \varliminf_{n\to\infty} \frac{1}{n}\log P_n(B(p,r)) \ge -\mathcal{L}(p,r);$

and therefore $\varliminf_{n\to\infty} \frac{1}{n}\log P_n(G) \ge -I_{\hat\Pi}(p)$. ∎

We now introduce an assumption which, among other things, guarantees that our MARKOV chain is uniformly ergodic (cf. Exercise 4.1.48 below). Namely, we will assume that there exist $\ell, N \in \mathbb{Z}^+$ and $M \in [1,\infty)$ with $\ell \le N$ such that
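$(\hat{\mathbf{U}})$  $\hat\Pi^\ell(\hat\sigma,\cdot) \le \frac{M}{N}\sum_{m=1}^N \hat\Pi^m(\hat\tau,\cdot)$ for all $\hat\sigma, \hat\tau \in \hat\Sigma$, and $\sup_{\hat\sigma\in\hat\Sigma} \int_E \exp[\alpha\|x\|]\,\hat\Pi_E(\hat\sigma,dx) < \infty$ for $\alpha \in (0,\infty)$,

where

$\hat\Pi^{m+1}(\hat\sigma,\cdot) = \int_{\hat\Sigma} \hat\Pi(\hat\tau,\cdot)\,\hat\Pi^m(\hat\sigma,d\hat\tau), \quad m \ge 1,$

and $\hat\Pi_E(\hat\sigma,\Gamma) = \hat\Pi(\hat\sigma,\Sigma\times\Gamma)$ for $\Gamma \in \mathcal{B}_E$. The next lemma contains some important preliminary consequences of $(\hat{\mathbf{U}})$.

4.1.9 Lemma. Assume that $(\hat{\mathbf{U}})$ holds. If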
then, for ail m E Z+, S > 0, n E N, and (4.1. l o )
Q
E ( 0 , ~ :)
sup P8 ( ( 2 : I I S ~ + ~ ( L .2J 61) ) I I 5 exp[-crb
eE2
+rn~,].
In particular, this means that (4.1.5) is satisfied. Furthermore, if to E MI$) and Q E M1(E) is defined by
Q(r)= $
c J: N
Ic=l
c r)fio(de), r E B E ,
~ ( 6 ,x
c
then (4.1.11)
J,e x ~ [ ~ l l ~Ql( ld]x ) 5 exp[&],
Q
E
(o,~),
and (4-1.12)
Ln -
F(xt+m(G), .. . t x n t + m ( G ) ) p e ( u )I M"
F ( x ) Q"(dx)
for all m E N, n E Z+, and all measurable F : En [O,oo). Finally, if either E is the space of probability measures on some Polish space or (X,11 . 11) is a separable BANACH space, then, for each L 2 0, there is a K L cc E for which
PROOF: Noting that
4
we see that (4.1.10) will be proved as soon as we show that
IJ,eXP[allSm(G)lll (JEexP[.ll.lll
) R&W3
fiE(~m(4,a)
and so the required estimate follows by induction on m. Next let CO be given and define Q accordingly. Then, since
firn+1(S,. ) =
J-c fi(i,- ) A m ( 6 , d i ) 5 supfi(i, *), (€5
-
we see that Q satisfies (4.1.11). In addition, since, for any m E N, n E Z+, and measurable F : En [0,oa)
and therefore the desired result follows easily by induction on n. Given (4.1.10) and (4.1.12), the proof of (4.1.13) can be accomplished as follows. Using Lemma 3.2.7 in the case when E is the space of probability measures on a Polish space and Theorem 3.3.11 when ( X , 11 11) is space, we can find compact sets K L in E such that a separable BANACH limn-m log (Qn(KE))5 - ( M L ) . Furthermore, by Lemma 3.1.1, we may assume that these K L 's are convex. Hence,
+
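and so, by (4.1.12), the required estimate follows. ∎

We are now ready to prove the basic large deviation result of this section.

4.1.14 Theorem. Assume that $(\hat{\mathbf{U}})$ holds and let $I_{\hat\Pi}$ be the function defined in (4.1.6). Then, for every $K \subset\subset E$,

(4.1.15)  $\varlimsup_{n\to\infty} \frac{1}{n}\log\Bigl[\sup_{\hat\sigma\in\hat\Sigma} \mu_{\hat\sigma,n}(K)\Bigr] \le -\inf_K I_{\hat\Pi}.$

Furthermore, if either $E$ is the space of probability measures on some Polish space or $(X, \|\cdot\|)$ is a separable BANACH space, then $I_{\hat\Pi}$ is a good rate function and, for all $\Gamma \in \mathcal{B}_E$,

(4.1.16)  $-\inf_{\Gamma^\circ} I_{\hat\Pi} \le \varliminf_{n\to\infty} \frac{1}{n}\log\Bigl[\inf_{\hat\sigma\in\hat\Sigma} \mu_{\hat\sigma,n}(\Gamma)\Bigr] \le \varlimsup_{n\to\infty} \frac{1}{n}\log\Bigl[\sup_{\hat\sigma\in\hat\Sigma} \mu_{\hat\sigma,n}(\Gamma)\Bigr] \le -\inf_{\overline{\Gamma}} I_{\hat\Pi}.$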
In particular, (4.1.17)
infpi,ne(r) I
d,,(r>I suppi,ne(r), r E BE. i
What we will show first is that everything holds when pa,n is replaced by and Ifi is replaced by Ufi.It will then be a relatively simple matter to pass to the desired statements. We begin by showing that (4.1.18) for every p E E . To this end, suppose that 0 < a < I f i ( p ) and choose 6 > 0 so that L(p,46) > a. Then, by (U),for all 6 , f E 2,
At the same time, for each 1 I m 5 N ,
Since P, (B(p,26)) 5 exp [-na] for sufficiently large n 2 1, we conclude from the above that
and therefore that (4.1.18) holds. Given (4.1.18), one can proceed in exactly the same way as one did in part (iv) of Exercise 2.1.14 to show that
In particular, we now know that (4.1.15) holds when p ; ~ and , ~ 1, are replaced by p:,, and [In, respectively. Furthermore, from (4.1.8) and (4.1.17), we see that
for every open G in E. Finally, suppose that E is either the space of probability measures on some Polish space or that (X, 11 . 11) is a separable BANACHspace. By (4.1.13), we know that there is a family { K L : L 2 0) of compact, convex E such that
Hence, just as in the proof of Lemma 2.1.5, we conclude not only that I= is good but also that (4.1.16) holds with ~ 2 and , CIfi ~ in place of p ; ~and , ~
In. In order to complete the proof, note that, from (4.1.10),
for every $\Gamma \in \mathcal{B}_E$. Hence, (4.1.15) follows from (4.1.19); and, when $E$ is either the space of probability measures on some Polish space or a separable BANACH space, the right hand side of (4.1.16) is an easy consequence of the fact that it holds when $I_{\hat\Pi}$ and $\mu_{\hat\sigma,n}$ are replaced by $\ell I_{\hat\Pi}$ and $\mu^{\ell}_{\hat\sigma,n}$, respectively. Since the left hand side of (4.1.16) is precisely (4.1.8), the proof is now complete. ∎

4.1.20 Corollary. Assume that (U) holds and that either $E$ is the space of probability measures on some Polish space $\Sigma'$ or that $(X, \|\cdot\|)$ is a separable real BANACH space. Then, for every $\Phi \in C(E;\mathbb{R})$ which satisfies
(4.1.21)
SUP SUP nEZ+
( L e x ~ [ n a @ &] , n )
6 ~ 2
<
0;)
for some a E (1, oo),one has that
-
sup{@(q)-I&)
:q E
E } = 0.
In particular, if (4.1.23)
Afi(X)
-1
= n+m lim n 6supApa,,(nX), €E
X E X’,
then Afi(X) E R, X E X*,
and (4.1.25)
n+m
- Afi(X) = 0,
A E X*
(Remember that, when E = Ml(C’), X* = Cb(C’;R) and
for X E c b (C’; R) and q E MI (C‘) .)
PROOF:The first assertion is an immediate consequence of (4.1.16) combined with Exercise 2.1.15. Once one has (4.1.22), (4.1.24) follows from the estimate (4.1.10) together with Theorem 2.2.21. Finally, (4.1.25) is just a special case of (4.1.22). I 4.1.26 Remark.
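It should be clear that Theorem 4.1.14 and Corollary 4.1.20 applied to the case when $\hat\Pi(\hat\sigma,\cdot) = \mu \in M_1(E)$, $\hat\sigma \in \hat\Sigma$, can be used to recover both the SANOV as well as the CRAMÉR Theorems.

What we want to do now is turn our attention to the situation described in Example 4.1.1. In other words: $E = M_1(\Sigma)$; $\Pi$ is a transition probability function on $\Sigma$; for each $\sigma \in \Sigma$, $P_\sigma$ on $\Omega = \Sigma^{\mathbb{N}}$ is the MARKOV chain starting at $\sigma$ with transition probability $\Pi$; and $\mu_{\sigma,n}$, $n \ge 1$, is the distribution of the empirical distribution functional $\omega \mapsto L_n(\omega)$ in (4.1.3). As a consequence of the preceding, we know that if there exist $\ell, N \in \mathbb{Z}^+$ with $1 \le \ell \le N$ and $M \in [1,\infty)$ such that

(U)  $\Pi^\ell(\sigma,\cdot) \le \frac{M}{N}\sum_{m=1}^N \Pi^m(\tau,\cdot)$ for all $\sigma, \tau \in \Sigma$

(the second part of $(\hat{\mathbf{U}})$ is trivially satisfied in this case) then for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$

(4.1.27)  $-\inf_{\Gamma^\circ}\Lambda_\Pi^* \le \varliminf_{n\to\infty} \frac{1}{n}\log\Bigl[\inf_{\sigma\in\Sigma} P_\sigma(\{\omega : L_n(\omega) \in \Gamma\})\Bigr] \le \varlimsup_{n\to\infty} \frac{1}{n}\log\Bigl[\sup_{\sigma\in\Sigma} P_\sigma(\{\omega : L_n(\omega) \in \Gamma\})\Bigr] \le -\inf_{\overline{\Gamma}}\Lambda_\Pi^*,$

where

(4.1.28)  $\Lambda_\Pi^*(\nu) = \sup\Bigl\{ \int_\Sigma V\,d\nu - \Lambda_\Pi(V) : V \in C_b(\Sigma;\mathbb{R}) \Bigr\}$

and

(4.1.29)  $\Lambda_\Pi(V) = \lim_{n\to\infty} \frac{1}{n}\log\Bigl[\sup_{\sigma\in\Sigma} E^{P_\sigma}\Bigl[\exp\Bigl(\sum_{k=1}^n V(\Sigma_k)\Bigr)\Bigr]\Bigr]$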
for $V \in C_b(\Sigma;\mathbb{R})$. Of course, these functions $\Lambda_\Pi$ and $\Lambda_\Pi^*$ make perfectly good sense even when one does not assume that (U) holds; and, as we will see below, the program on which we are about to embark makes no use of (U).

Let $(B(\Sigma;\mathbb{R}), \|\cdot\|_B)$ denote the BANACH space of bounded, measurable real-valued functions on $\Sigma$ with $\|\cdot\|_B$ being the uniform norm; and, again using the expression in (4.1.29), extend $\Lambda_\Pi$ to the whole of $B(\Sigma;\mathbb{R})$. Clearly,

(4.1.30)  $|\Lambda_\Pi(V) - \Lambda_\Pi(W)| \le \|V - W\|_B, \quad V, W \in B(\Sigma;\mathbb{R}).$

Also, as a consequence of HÖLDER’S inequality, note that $\Lambda_\Pi$ is a convex function on $B(\Sigma;\mathbb{R})$. Our aim is to find alternative expressions for $\Lambda_\Pi$ and $\Lambda_\Pi^*$. In particular, we want to give an expression for $\Lambda_\Pi^*$ which is more directly related to $\Pi$ itself. In doing so, we will need to introduce the operators $\Pi_V : B(\Sigma;\mathbb{R}) \to B(\Sigma;\mathbb{R})$, $V \in B(\Sigma;\mathbb{R})$, defined by

(4.1.31)  $[\Pi_V\phi](\sigma) = \exp[V(\sigma)] \int_\Sigma \phi(\tau)\,\Pi(\sigma,d\tau)$

for $\sigma \in \Sigma$ and $\phi \in B(\Sigma;\mathbb{R})$. When $V \equiv 0$, we will use $\Pi$ to denote the operator $\Pi_V$. Also, it will be useful to recall the concept of the logarithmic spectral radius of a bounded linear operator $L : B(\Sigma;\mathbb{R}) \to B(\Sigma;\mathbb{R})$. Namely, the logarithmic spectral radius $\rho(L)$ of $L$ is the number given by

(4.1.32)  $\rho(L) = \lim_{n\to\infty} \frac{1}{n}\log\bigl(\|L^n\|_{op}\bigr),$

where $\|L\|_{op} \equiv \sup\{\|L\phi\|_B : \|\phi\|_B \le 1\}$ and $\|\cdot\|_B$ is the uniform norm on $B(\Sigma;\mathbb{R})$. (Note that $n \in \mathbb{Z}^+ \mapsto \log(\|L^n\|_{op}) \in \mathbb{R}$ is subadditive and therefore that the limit in (4.1.32) necessarily exists.) The first step in our program is taken in the following trivial version of the FEYNMAN-KAC formula.
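4.1.33 Lemma. For any $V \in B(\Sigma;\mathbb{R})$,

(4.1.34)  $E^{P_\sigma}\Bigl[\exp\Bigl(\sum_{k=1}^n V(\Sigma_k)\Bigr)\phi(\Sigma_{n+1})\Bigr] = \exp[-V(\sigma)]\,[\Pi_V^{n+1}\phi](\sigma), \quad (n,\sigma) \in \mathbb{Z}^+ \times \Sigma,$

for all $\phi \in B(\Sigma;\mathbb{R})$. In particular,

(4.1.35)  $\Lambda_\Pi(V) = \rho(\Pi_V) = \lim_{n\to\infty} \frac{1}{n}\log\bigl[\|[\Pi_V^n 1]\|_B\bigr], \quad V \in B(\Sigma;\mathbb{R}).$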
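On a finite state space the operator $\Pi_V$ of (4.1.31) is a matrix with positive entries, and the limit in (4.1.35) is the logarithm of its Perron eigenvalue. The following minimal sketch assumes a hypothetical two-state transition matrix `P` and a hypothetical function `V`; it iterates $\Pi_V$ on the constant function 1 and compares the growth rate with the closed-form eigenvalue of the $2\times2$ matrix. The function names are illustrative assumptions.

```python
import math

# Hypothetical transition matrix on a two-point state space.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def apply_tilted(V, phi):
    """[Pi_V phi](i) = exp(V[i]) * sum_j P[i][j] * phi[j], following (4.1.31)."""
    return [math.exp(V[i]) * sum(P[i][j] * phi[j] for j in range(2))
            for i in range(2)]

def log_growth(V, n):
    """(1/n) * log of the uniform norm of Pi_V^n 1, renormalizing at each step."""
    phi = [1.0, 1.0]
    log_norm = 0.0
    for _ in range(n):
        phi = apply_tilted(V, phi)
        m = max(phi)              # entries stay positive, so max is the uniform norm
        log_norm += math.log(m)
        phi = [p / m for p in phi]
    return log_norm / n

def log_perron(V):
    """log of the largest eigenvalue of the 2x2 matrix (exp(V[i]) * P[i][j])."""
    a, b = math.exp(V[0]) * P[0][0], math.exp(V[0]) * P[0][1]
    c, d = math.exp(V[1]) * P[1][0], math.exp(V[1]) * P[1][1]
    tr, det = a + d, a * d - b * c
    return math.log((tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0)

V = [0.3, -0.5]
print(log_growth(V, 2000), log_perron(V))
```

With $V \equiv 0$ both quantities vanish, and for general $V$ the value is bounded by $\|V\|_B$ in absolute value, in agreement with (4.1.30).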
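PROOF: To prove (4.1.34), note that

and that

Hence, (4.1.34) follows by induction. Once one has (4.1.34), (4.1.35) is obvious. ∎

4.1.36 Lemma. Let $\Pi$ be a transition probability function on $\Sigma$ and suppose that $\mathcal{A}$ is a closed subalgebra of $B(\Sigma;\mathbb{R})$ with the properties that $1 \in \mathcal{A}$ and $f \circ \phi \in \mathcal{A}$ whenever $f \in C^\infty(\mathbb{R};\mathbb{R})$ and $\phi \in \mathcal{A}$. If $\mathcal{A}$ is invariant under the operator $\Pi$, then

(4.1.37)  $\sup\Bigl\{ \int_\Sigma V\,d\nu - \Lambda_\Pi(V) : V \in \mathcal{A} \Bigr\} = \sup\Bigl\{ -\int_\Sigma \log\frac{[\Pi u]}{u}\,d\nu : u \in \mathcal{A} \text{ and } u \ge 1 \Bigr\}.$

In particular, if $\Pi$ is FELLER continuous (i.e., $C_b(\Sigma;\mathbb{R})$ is invariant under $\Pi$), then $\Lambda_\Pi^* = J_\Pi$, where

(4.1.38)  $J_\Pi(\nu) = \sup\Bigl\{ -\int_\Sigma \log\frac{[\Pi u]}{u}\,d\nu : u \in C_b(\Sigma;[1,\infty)) \Bigr\}$

for $\nu \in M_1(\Sigma)$. Finally, in any case,

(4.1.39)  $J_\Pi(\nu) = \sup\Bigl\{ -\int_\Sigma \log\frac{[\Pi u]}{u}\,d\nu : u \in B(\Sigma;[1,\infty)) \Bigr\} = \sup\Bigl\{ \int_\Sigma V\,d\nu - \Lambda_\Pi(V) : V \in B(\Sigma;\mathbb{R}) \Bigr\}$

for every $\nu \in M_1(\Sigma)$.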
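PROOF: Let $\Lambda_{\mathcal{A}}^*$ and $J_{\mathcal{A}}$ denote the left and right hand sides of (4.1.37), respectively. Given $u \in \mathcal{A}$ with $u \ge 1$, set $V = \log\frac{u}{[\Pi u]}$. Then, since $[\Pi_V u] = u$, $\Lambda_\Pi(V) = 0$, and so

$-\int_\Sigma \log\frac{[\Pi u]}{u}\,d\nu = \int_\Sigma V\,d\nu - \Lambda_\Pi(V) \le \Lambda_{\mathcal{A}}^*(\nu).$

Thus, we now know that $\Lambda_{\mathcal{A}}^* \ge J_{\mathcal{A}}$. To prove the opposite inequality, suppose that $\alpha > \Lambda_\Pi(V)$. Then, by (4.1.35), $\sum_{n=0}^\infty \exp[-n\alpha]\,[\Pi_V^n 1]$ converges uniformly to an element $u_V \in \mathcal{A}$ which satisfies $u_V \ge 1$. In addition, since $[\Pi_V u_V] = e^\alpha(u_V - 1)$,

$-\int_\Sigma \log\frac{[\Pi u_V]}{u_V}\,d\nu = \int_\Sigma V\,d\nu - \alpha - \int_\Sigma \log\frac{u_V - 1}{u_V}\,d\nu \ge \int_\Sigma V\,d\nu - \alpha.$

Hence,

$\int_\Sigma V\,d\nu - \Lambda_\Pi(V) \le J_{\mathcal{A}}(\nu), \quad V \in \mathcal{A} \text{ and } \nu \in M_1(\Sigma);$

and clearly this is more than enough to conclude that $\Lambda_{\mathcal{A}}^* \le J_{\mathcal{A}}$. Finally, note that $J_\Pi(\nu)$ is obviously dominated by the right hand side of (4.1.39). At the same time, for fixed $\nu \in M_1(\Sigma)$, the set of $u \in B(\Sigma;[1,\infty))$ for which $-\int_\Sigma \log\frac{[\Pi u]}{u}\,d\nu \le J_\Pi(\nu)$ contains $C_b(\Sigma;[1,\infty))$, is closed under bounded point-wise convergence, and therefore contains $B(\Sigma;[1,\infty))$. ∎

We now want to show that one can say more about the topic in Lemma 4.1.36 when (U) holds.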
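4.1.40 Lemma. Assume that (U) holds; and, given $\nu \in M_1(\Sigma)$, define $\mu \in M_1(\Sigma)$ by

Then, for each $V \in B(\Sigma;\mathbb{R})$,

and therefore

(4.1.41)  $\Lambda_\Pi(V) \le \frac{1}{\ell}\bigl(\Lambda_{\bar\mu}(\ell V) + \log M\bigr), \quad V \in B(\Sigma;\mathbb{R}).$

(See the discussion preceding (3.2.11) for the notation $\bar\mu$, and use (3.2.6) with $Q = \bar\mu$ to define $\Lambda_{\bar\mu}$.) In addition, if $V \in B(\Sigma;\mathbb{R})$ and $\{V_n\}_1^\infty$ is a bounded sequence in $B(\Sigma;\mathbb{R})$ such that $V_n(\sigma) \to V(\sigma)$ for $\mu$-almost every $\sigma \in \Sigma$, then $\Lambda_\Pi(V_n) \to \Lambda_\Pi(V)$. In particular,

(4.1.42)  $\Lambda_\Pi^* = J_\Pi$

even when $\Pi$ is not necessarily FELLER continuous.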
PROOF:Choose and fix uo E C and set
p =
k C:=,
from which (4.1.41) follows. To prove the asserted convergence result, let {V,}? Then, by convexity, (4.1.34), and the preceding, 1 1 h ( v ) 5 -AII(PK) + - A I I ( P ' ( ~ - K))
P
P
-
IIm(q,, Using 6, E Ml(C)
(4.1.12) (note that the Q there is the distribution of u E C under the p here) and HOLDER'Sinequality, note that
and V be given.
P'
iogM
+ A,(P'e(v
a ) .
-
vn>>
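for every $p \in (1,\infty)$. Since $\lim_{n\to\infty} \Lambda_{\bar\mu}(p'\ell(V - V_n)) = 0$ for every $p \in (1,\infty)$, one concludes that $\Lambda_\Pi(V) \le \varliminf_{n\to\infty} \Lambda_\Pi(V_n)$ by letting $p \searrow 1$ in the preceding. Because the same argument leads to $\varlimsup_{n\to\infty} \Lambda_\Pi(V_n) \le \Lambda_\Pi(V)$, we now have that $\Lambda_\Pi(V) = \lim_{n\to\infty} \Lambda_\Pi(V_n)$.

With the preceding in hand, we know that $V \in B(\Sigma;\mathbb{R}) \mapsto \Lambda_\Pi(V) \in \mathbb{R}$ is continuous under bounded point-wise convergence; and therefore, it is an easy matter to check that, for each $\nu \in M_1(\Sigma)$,

$\Lambda_\Pi^*(\nu) = \sup\Bigl\{ \int_\Sigma V\,d\nu - \Lambda_\Pi(V) : V \in B(\Sigma;\mathbb{R}) \Bigr\}.$

Thus, by (4.1.39), $J_\Pi = \Lambda_\Pi^*$. ∎

By combining the above considerations with Theorem 4.1.14, we now arrive at the following version of a theorem proved originally by DONSKER and VARADHAN [30].

4.1.43 Theorem. (DONSKER & VARADHAN) Assume that (U) holds and define $J_\Pi$ as in (4.1.38). Then $J_\Pi$ is a good convex rate function and for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$

(4.1.44)  $-\inf_{\Gamma^\circ} J_\Pi \le \varliminf_{n\to\infty} \frac{1}{n}\log\Bigl[\inf_{\sigma\in\Sigma} P_\sigma(\{\omega : L_n(\omega) \in \Gamma\})\Bigr] \le \varlimsup_{n\to\infty} \frac{1}{n}\log\Bigl[\sup_{\sigma\in\Sigma} P_\sigma(\{\omega : L_n(\omega) \in \Gamma\})\Bigr] \le -\inf_{\overline{\Gamma}} J_\Pi.$

Having gone to the trouble of replacing $\Lambda_\Pi^*$ in Theorem 4.1.14 by $J_\Pi$, it is only reasonable to ask whether the effort was worthwhile. A partial answer is provided by the following sharpened version of a result due to DONSKER and VARADHAN.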
4.1.45 Lemma. For any transition probability function $\Pi$,

where (4.1.47)

In particular, $J_\Pi(\mu) = 0$ if and only if $\mu = \mu\Pi$.

PROOF: By JENSEN’S inequality and Lemma 3.2.13,

Now apply Exercise 3.2.24. Finally, suppose that $\mu = \mu\Pi$. Then, by JENSEN’S inequality,

$\int_\Sigma \log\frac{[\Pi u]}{u}\,d\mu \ge \int_\Sigma \bigl([\Pi\log u] - \log u\bigr)\,d\mu = 0$

for every $u \in C_b(\Sigma;[1,\infty))$. Hence, $J_\Pi(\mu) = 0$. ∎
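The characterization "$J_\Pi(\mu) = 0$ if and only if $\mu = \mu\Pi$" can be observed numerically on a two-point state space. The sketch below assumes a hypothetical transition matrix `P` with invariant measure $(0.8, 0.2)$ and evaluates the supremum of $-\int \log([\Pi u]/u)\,d\nu$ by a crude grid search. Because the integrand is unchanged when $u$ is multiplied by a positive constant, the constraint $u \ge 1$ can be replaced by the normalization $u = (1, s)$ with $s > 0$. The grid parameters and function names are illustrative assumptions.

```python
import math

# Hypothetical transition matrix on {0, 1}; its invariant measure is (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]

def dv_objective(nu, u):
    """-sum_i nu[i] * log((P u)[i] / u[i]), the quantity whose supremum is J."""
    total = 0.0
    for i in range(2):
        Pu_i = P[i][0] * u[0] + P[i][1] * u[1]
        total -= nu[i] * math.log(Pu_i / u[i])
    return total

def rate_J(nu, grid=2001, span=10.0):
    """Grid approximation of J(nu) over u = (1, s), with log s in [-span, span]."""
    best = 0.0  # the constant function u = (1, 1) always gives the value 0
    for k in range(grid):
        s = math.exp(-span + 2.0 * span * k / (grid - 1))
        best = max(best, dv_objective(nu, [1.0, s]))
    return best

invariant = [0.8, 0.2]
print(rate_J(invariant))      # vanishes up to rounding at the invariant measure
print(rate_J([0.5, 0.5]))     # strictly positive away from it
```

For this particular `P` the value at $(0.5, 0.5)$ can also be found by hand: minimizing $(0.9 + 0.1s)(0.4 + 0.6s)/s$ over $s > 0$ gives $s = \sqrt{6}$ and $J \approx 0.067$.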
4.1.48 Exercise. The condition (U) is more than enough to guarantee that there exists precisely one $\mu \in M_1(\Sigma)$ which is $\Pi$-invariant (i.e., $\mu = \mu\Pi$). In this exercise, we will give two approaches to the proof of this fact. The first of these approaches makes direct use of the results which we have just proved; the second, and in many ways better, approach is a particularly simple example of the DOEBLIN theory of ergodicity for MARKOV chains.

(i) Let $\mathfrak{J}_\Pi = \{\mu \in M_1(\Sigma) : \mu = \mu\Pi\}$. Clearly $\mathfrak{J}_\Pi$ is a convex set. Furthermore, by Lemma 4.1.45, $\mathfrak{J}_\Pi = \{\mu \in M_1(\Sigma) : J_\Pi(\mu) = 0\}$; and, by (4.1.44) applied to $\Gamma = M_1(\Sigma)$, we know that $\inf_{M_1(\Sigma)} J_\Pi = 0$ and therefore (cf. Lemma 2.1.2) that there is at least one $\mu \in \mathfrak{J}_\Pi$. To show that there is only one such $\mu$, first observe that $\{\mu : J_\Pi(\mu) = 0\} \subset\subset M_1(\Sigma)$. Now suppose there were more than one element of $\mathfrak{J}_\Pi$, and apply the KREIN-MILMAN Theorem to deduce that there would then have to exist distinct extreme elements $\mu_1$ and $\mu_2$ of $\mathfrak{J}_\Pi$. Finally, show that this is impossible since, on the one hand, (U) says that $\mu_1$ would have to be equivalent to $\mu_2$ while, on the other hand, standard ergodic theoretic considerations (cf. (iv) of Exercise 5.2.28 below) guarantee that $\mu_1 \perp \mu_2$.

(ii) We are now going to outline a quite different approach to the same question. Namely, we are going to show that if $(E, \mathcal{F})$ is any measurable space and $\Pi$ is a transition probability function on $(E, \mathcal{F})$ with the property that

(4.1.49)  $\Pi(x,\cdot) \ge \alpha\rho, \quad x \in E,$

for some $\alpha \in (0,1]$ and $\rho \in M_1((E,\mathcal{F}))$, then there is a unique $\mu \in M_1((E,\mathcal{F}))$ such that

(4.1.50)  $\|\nu\Pi^n - \mu\|_{var} \le 2(1-\alpha)^n, \quad n \in \mathbb{Z}^+ \text{ and } \nu \in M_1((E,\mathcal{F})).$

Show that, if it exists at all, then such a $\mu$ must be the one and only $\Pi$-invariant element of $M_1((E,\mathcal{F}))$. Also, show that when $\Pi$ satisfies (U) and $\bar\Pi \equiv \frac{1}{N}\sum_{m=1}^N \Pi^m$, then $\bar\Pi$ satisfies (4.1.49) with $\alpha = \frac{1}{M}$ and $\rho = \Pi^\ell(\sigma,\cdot)$ for any $\sigma \in \Sigma$; and conclude that the corresponding $\mu$ in (4.1.50) is the one and only element of $\mathfrak{J}_\Pi$. (Hint: check that $\mu\Pi \in \mathfrak{J}_{\bar\Pi}$.)

Turning to the proof of (4.1.50), set $\hat E = E \times \{-1,1\}$ and define $\hat\Pi$ to be the transition probability function on $(\hat E, \hat{\mathcal{F}})$, $\hat{\mathcal{F}} \equiv \mathcal{F} \times \mathcal{B}_{\{-1,1\}}$, determined by

Next, let $\{\hat P_{(x,\xi)} : (x,\xi) \in \hat E\}$ be the MARKOV chain on $\hat\Omega = \hat E^{\mathbb{N}}$ with transition probability function $\hat\Pi$, and use $X_n(\hat\omega)$ and $\Xi_n(\hat\omega)$ to denote the projections on $E$ and $\{-1,1\}$, respectively, of the position of $\hat\omega \in \hat\Omega$ at time $n \in \mathbb{N}$. Now check that

$\hat P_{(x,\xi)}(\{\hat\omega : X_n(\hat\omega) \in \Gamma\}) = \Pi^n(x,\Gamma), \quad n \in \mathbb{Z}^+,\ \Gamma \in \mathcal{F}, \text{ and } (x,\xi) \in \hat E.$

Finally, define $\tau(\hat\omega) = \inf\{n \in \mathbb{Z}^+ : \Xi_n(\hat\omega) = 1\}$, show that

$\Pi^n(x,\Gamma) = \alpha\sum_{m=1}^n (1-\alpha)^{m-1}\rho\Pi^{n-m}(\Gamma) + \hat P_{(x,-1)}(\{\hat\omega : X_n(\hat\omega) \in \Gamma \text{ and } \tau(\hat\omega) > n\}),$

and conclude that

$\|\nu_1\Pi^n - \nu_2\Pi^n\|_{var} \le 2(1-\alpha)^n$

for all $n \in \mathbb{Z}^+$ and $\nu_1, \nu_2 \in M_1((E,\mathcal{F}))$. In particular, this means that, for each $\nu \in M_1((E,\mathcal{F}))$,

$\|\nu\Pi^{m+n} - \nu\Pi^n\|_{var} \le 2(1-\alpha)^n, \quad m, n \in \mathbb{Z}^+;$

and, therefore, not only does $\{\nu\Pi^n\}_1^\infty$ converge in variation to some $\mu \in M_1((E,\mathcal{F}))$, but also the limit is independent of $\nu$ and (4.1.50) holds.
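The geometric bound (4.1.50) is easy to observe on a finite state space. The sketch below assumes a hypothetical three-state transition matrix `P`; the minorization constant in (4.1.49) is taken to be $\alpha = \sum_j \min_i P_{ij}$, with $\rho$ the normalized vector of column minima, which is one convenient (not the only) choice. The variation norm is computed as the sum of absolute differences, so that it is at most 2.

```python
# Hypothetical transition matrix on a three-point state space.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
n_states = len(P)

def step(nu):
    """One step nu -> nu P, with nu treated as a row vector."""
    return [sum(nu[i] * P[i][j] for i in range(n_states)) for j in range(n_states)]

def var_norm(a, b):
    """Variation norm of a - b, normalized so that it never exceeds 2."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Minorization P(x, .) >= alpha * rho with alpha the sum of the column minima.
col_min = [min(P[i][j] for i in range(n_states)) for j in range(n_states)]
alpha = sum(col_min)
rho = [m / alpha for m in col_min]

# Approximate the invariant measure by iterating far past the mixing time.
mu = [1.0 / n_states] * n_states
for _ in range(500):
    mu = step(mu)

def worst_distance(n):
    """Maximum over point masses nu of the variation norm of nu P^n - mu."""
    worst = 0.0
    for x in range(n_states):
        nu = [1.0 if i == x else 0.0 for i in range(n_states)]
        for _ in range(n):
            nu = step(nu)
        worst = max(worst, var_norm(nu, mu))
    return worst

for n in range(1, 11):
    # Left column: observed distance; right column: the bound 2 * (1 - alpha)^n.
    print(n, worst_distance(n), 2.0 * (1.0 - alpha) ** n)
```

Since every initial measure is a convex combination of point masses, checking the point masses is enough to check the bound for all $\nu$.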
4.1.51 Exercise. Let $\Pi(\sigma,\cdot)$ be a transition probability on $\Sigma$.

(i) Suppose that $\mu \in M_1(\Sigma)$ is $\Pi$-invariant and define the relative entropy functional $H(\cdot|\mu)$ as in (3.2.14). Show that

$H(\nu\Pi|\mu) \le H(\nu|\mu), \quad \nu \in M_1(\Sigma).$

Hint: Apply Lemma 3.2.13 and use JENSEN’S inequality.

(ii) Assume that (U) holds and let $\mu$ be the $\Pi$-invariant measure produced in Exercise 4.1.48. Note that, by taking the $\nu$ in Lemma 4.1.40 to be $\mu$, one sees that (4.1.41) holds for this $\mu$. From this observation, show that

(4.1.52)  $H(\nu|\mu) \le \ell J_\Pi(\nu) + \log M, \quad \nu \in M_1(\Sigma).$

In particular, conclude that if $J_\Pi(\nu) < \infty$ then $\nu \ll \mu$ and $\int_\Sigma f\log f\,d\mu \le \ell J_\Pi(\nu) + \log M$ where $f = \frac{d\nu}{d\mu}$.
4.1.53 Exercise. Again assume that (U) holds. Using Theorem 3.2.21, show that, for every r E B M l ( E ) ,
(4.1.54)
P, ( { w : Ln(w)
ror
n+co
E
r})
n-ma
n
where the notation is the same as that used in Theorem 3.2.21. In particular, using (4.1.54) and the reasoning which led to Theorem 2.1.10, show that n-m fim UEE suP[:log
( j n e X P [ n @ ( ~ n ( ~ pu(du)) ))l
(4.1.55) - sllp{lp(v)
-
for every measurable @ : the strong topology and satisfies
for some a E (1, m).
- Jn(v) : v E M1(E)}] = 0
R which is continuous with respect to
4.1.56 Exercise.

Return to the original setting of this section and assume not only that $(\hat{\mathbf{U}})$ holds but also that $E$ is a separable BANACH space. For $\nu \in M_1(\hat\Sigma)$, define $\nu_E \in M_1(E)$ by $\nu_E(\Gamma) = \nu(\Sigma \times \Gamma)$, $\Gamma \in \mathcal{B}_E$. Next, define

$I(x) = \inf\Bigl\{ J_{\hat\Pi}(\nu) : \int_E \|y\|\,\nu_E(dy) < \infty \text{ and } m(\nu_E) = x \Bigr\}, \quad x \in E.$

(See Theorem 3.3.4 for the definition of $m(\nu_E)$.) Using the technique in Exercise 3.3.12, show that $I$ is a good rate function on $E$ and that (4.1.16) holds with this $I$ in place of $I_{\hat\Pi}$. In particular, conclude that $I = \Lambda_{\hat\Pi}^* = I_{\hat\Pi}$.

4.1.57 Exercise. In order to provide an example in which $S_n$ is an additive function other than the empirical distribution, let $\Pi$ be a transition probability on $\Sigma$ which satisfies (U), $E$ be a separable BANACH space, $\nu$ a probability measure on $E$, and suppose that $F : \Sigma \times E \to E$ is a continuous function satisfying
-
Next, define the transition probability function
fi on 2 = C x E so that
@ & , A x B ) = l l ( c ~ , A ) v (: {F(a,z) ~ E B } ) for A E BE and B E B E . Check that this fI satisfies variables E
fi
(a),and note that the corresponding random
-
S,(LJ) E E , n E z+,
are distributed under P(m,z, in the same way as the random variables
are under P, x vz+ . Finally, show that
I=(.) = sup{ p ( A , z ) -~ An(Vx) : A E E ' } where
4.2 Continuous Time Markov Processes
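We now want to carry out the program of Section 4.1 for the case of MARKOV processes having a continuous time-parameter. Again, let $\Sigma$ be a Polish space and let $E$, $X$, $\rho$, and $\|\cdot\|$ be as in (C) at the beginning of Section 3.1. Next, let $(t,\sigma) \in (0,\infty) \times \Sigma \mapsto \hat P(t,\sigma,\cdot) \in M_1(\Sigma \times E)$ be a measurable map with the property that

$\hat P(s+t,\sigma,\hat\Gamma) = \int_{\Sigma\times E} \hat P\bigl(t,\xi,\hat\Gamma(s,t,x)\bigr)\,\hat P(s,\sigma,d\xi \times dx)$

for $s, t \in (0,\infty)$, $\sigma \in \Sigma$, and $\hat\Gamma \in \mathcal{B}_{\Sigma\times E}$, where

Using $P(t,\sigma,\Gamma)$ to denote $\hat P(t,\sigma,\Gamma \times E)$ for $\Gamma \in \mathcal{B}_\Sigma$, note that $(t,\sigma) \in (0,\infty) \times \Sigma \mapsto P(t,\sigma,\cdot)$ is a continuous-time transition probability function. That is, it satisfies the Chapman-Kolmogorov equation

$P(s+t,\sigma,\Gamma) = \int_\Sigma P(t,\xi,\Gamma)\,P(s,\sigma,d\xi).$

Throughout, we will be assuming that, in addition, $P(t,\sigma,\cdot)$ is continuous at 0 in the sense that

(4.2.1)  $P(t,\sigma,\cdot) \Rightarrow \delta_\sigma$ as $t \searrow 0$, $\sigma \in \Sigma$.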
Because, in the continuous-time context, there is no canonical choice of the sample space on which to put the associated MARKOVprocess, we will simply assume that the following items exist:
(i) A measurable space (0,331 with ) a non-decreasing family of sub cr-algebras whose union generates m. :
(ii) A
-
{m: t 2 0)
t 2 0)-progressively measurable map (t,w) E (0, cm) x R
(C,(w),S,(w))E C x E
such that, for each w E R, &(w)
=o\tlim Ct(w)
exists, limtL0 St(w) = 0, and -8
St(#)
where & ( w )
= st(w) t-s tSt(w)for t
E E, E (0, m).
0 5 s
< 00 and w E R,
(iii) A measurable family {P,, : u E C} of probability measures on (R,M) with the properties that for every u E C:
P6({W : C,(w)
= .})
=1
and, for every f' E B C ~ E ,
for all s, t E (0, GO).
-
Finally, for $(t,\sigma) \in (0,\infty) \times \Sigma$, we will use $\mu_{\sigma,t} \in M_1(E)$ to denote the distribution of $\omega \in \Omega \mapsto \bar S_t(\omega) \in E$ under $P_\sigma$.

4.2.2 Remark. In the first place, it should be noticed that, even though there is a certain amount of ambiguity about the way in which the family $\{P_\sigma : \sigma \in \Sigma\}$ is realized, the measures $\mu_{\sigma,t}$ are uniquely determined by $\hat P(t,\sigma,\cdot)$. Secondly, as in Section 4.1, our basic example will be the case in which $E = M_1(\Sigma)$ and $\bar S_t(\omega)$ is the empirical distribution functional for the position process $t \in [0,\infty) \mapsto \Sigma_t(\omega)$ and, as such, is given by

$\bar S_t(\omega) = \int_{[0,t]} \delta_{\Sigma_s(\omega)}\,\lambda_{[0,t]}(ds),$

where $\lambda_{[0,t]}$ denotes normalized LEBESGUE measure on $[0,t]$. (Note that,

when the paths $t \in [0,\infty) \mapsto \Sigma_t(\omega) \in \Sigma$ have any reasonable regularity.) Just as in the case of MARKOV chains, when we are dealing with this situation, there is no need to introduce $E$ as a part of the state space or to have the joint distribution of $\omega \in \Omega \mapsto (\Sigma_t(\omega), \bar S_t(\omega))$ given as part of our data, because it is already determined by $P(t,\sigma,\cdot)$.

For the most part, the argument which we will use in the present situation to get our basic result (Theorem 4.2.16 below) differs very little from that used in the preceding section. Thus, we will avoid repetition and only provide details at those places where new techniques are needed.
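4.2.3 Lemma. For $t \in (0,\infty)$, define

$P_t(\Gamma) = \inf_{\sigma\in\Sigma} \mu_{\sigma,t}(\Gamma), \quad \Gamma \in \mathcal{B}_E.$

If $\Gamma \in \mathcal{B}_E$ is convex, then $t \in (0,\infty) \mapsto P_t(\Gamma)$ is super-multiplicative. In addition, if

(4.2.4)  $\lim_{R\to\infty}\,\sup_{\sigma\in\Sigma}\,\sup_{t\in[0,T]} P_\sigma(\{\omega : \|S_t(\omega)\| \ge R\}) = 0, \quad T \in (0,\infty),$

then, for every $\rho$-bounded, convex $\Gamma \in \mathcal{B}_E$, either $\sup_{t>0} P_t(\Gamma) = 0$ or, for every $\delta > 0$, there exist $0 < a < b < \infty$ such that $\inf_{t\in[a,b]} P_t(\Gamma^{(\delta)}) > 0$.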
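PROOF: The super-multiplicative property is proved in exactly the same way as it was in Lemma 4.1.4. As for the second part, suppose that, for some $s \in (0,\infty)$, $P_s(\Gamma) > 0$; and, for $t > s$, define $q_t \in \mathbb{Z}^+$ and $r_t \in [0,s)$ so that $t = q_t s + r_t$. Since $\Gamma$ is bounded and
$\bar S_t - \bar S_{q_t s} = \frac{S_t - S_{q_t s}}{t} - \frac{r_t}{t}\,\bar S_{q_t s},$
one sees that, for sufficiently large $t > s$:
$P_\sigma(\{\omega : \bar S_t(\omega) \in \Gamma^{(\delta)}\}) \ge P_\sigma(\{\omega : \bar S_{q_t s}(\omega) \in \Gamma \text{ and } \|S_t(\omega) - S_{q_t s}(\omega)\| < t\delta/2\})$
$\ge P_s(\Gamma)^{q_t} \inf_{\tau\in\Sigma} P_\tau(\{\omega : \|S_{r_t}(\omega)\| < t\delta/2\});$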
from which the desired conclusion is clear. ∎
We next need the following variant of Lemma 3.1.3.
4.2.5 Lemma. Let $f : (0,\infty) \to [0,\infty]$ be a sub-additive function (i.e., $f(s+t) \le f(s) + f(t)$ for all $s, t \in (0,\infty)$) with the property that $\sup_{t\in[a,b]} f(t) < \infty$ for some $0 < a < b$. Then there is a $T \in (0,\infty)$ such that
$\lim_{t\to\infty} \frac{f(t)}{t} = \inf_{t\ge T} \frac{f(t)}{t} \in \mathbb{R}.$
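PROOF: Set $M = \sup_{t\in[a,b]} f(t)$, choose $q_0 \in \mathbb{Z}^+$ so that $q_0(b-a) \ge a$, and let $T = q_0 a$. For $t \ge T$ put $q = [t/a]$, and note that $t/q \in [a,b]$ and therefore that $f(t) \le qM \le \frac{t}{a}M$. Now let $t_1 \ge T$ be given. For $t \ge 2t_1$, set $q_t = [t/t_1]$ and $r_t = t - q_t t_1$. Then
$f(t) \le (q_t - 1) f(t_1) + f(t_1 + r_t)$
and so
$\frac{f(t)}{t} \le \frac{q_t - 1}{t}\, f(t_1) + \frac{t_1 + r_t}{a t}\, M.$
Since $q_t/t \to 1/t_1$ and $t_1 + r_t < 2t_1$, letting $t \to \infty$ gives $\limsup_{t\to\infty} f(t)/t \le f(t_1)/t_1$ for every $t_1 \ge T$, which is the desired conclusion. ∎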
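The conclusion of Lemma 4.2.5 can be illustrated numerically. The following is only a sketch: the function $f(t) = t + \sqrt{t}$ is a hypothetical example chosen because it is sub-additive and bounded on every compact interval, and it is not taken from the text. For this choice $f(t)/t = 1 + t^{-1/2}$ decreases to its infimum 1.

```python
import math


def f(t: float) -> float:
    # Hypothetical sub-additive function: sqrt is sub-additive, hence so is t + sqrt(t).
    return t + math.sqrt(t)


def ratio(t: float) -> float:
    # The quantity f(t)/t whose limit Lemma 4.2.5 identifies with an infimum.
    return f(t) / t


# Spot-check sub-additivity f(s + t) <= f(s) + f(t) on a small grid.
grid = [0.1 * k for k in range(1, 200)]
subadditive_ok = all(f(s + t) <= f(s) + f(t) + 1e-12 for s in grid for t in grid)

# Sample f(t)/t along t = 1, 10^2, ..., 10^12; the values decrease toward 1.
samples = [ratio(10.0 ** k) for k in range(0, 13, 2)]
```

Here the infimum is not attained at any finite $t$, which is consistent with the lemma asserting only that the limit equals the infimum over $t \ge T$.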
Assuming that (4.2.4) holds, define
and
(4.2.6) $I_{\hat P}(p) = \sup_{r>0} \ell(p,r) = \lim_{r\searrow 0} \ell(p,r).$
With the same argument as we used to prove Lemma 4.1.7, one can easily check the following.
4.2.7 Lemma. Assume that (4.2.4) holds and refer to the preceding. Then the map $I_{\hat P} : E \to [0,\infty]$ is lower semi-continuous and convex; and, for open $G$ in $E$,
In order to get the complementary upper bound, we must introduce an assumption which will play the same role here as (U) played in Section 4.1. Namely, we will assume that
for some $M \in [1,\infty)$ and $\rho_1, \rho_2 \in M_1((0,1])$.
4.2.8 Remark. Although we have stated $(\tilde U)$ in such a way that time $t = 1$ appears to have special importance, this is in fact not the case. Indeed, as will be apparent from the development given below, we could deal equally well with the case in which the probability measures $\rho_1$ and $\rho_2$ are supported on any bounded interval in $(0,\infty)$. We have chosen the interval $(0,1]$ only for convenience.
As an easy consequence of $(\tilde U)$ we can take the following initial step toward the upper bound.
4.2.9 Lemma. Assume that $(\tilde U)$ holds.
(i) If
then (4.2.10)
for all $\alpha \in (0,\infty)$ and $(t,\sigma) \in (0,\infty)\times\Sigma$. In particular, (4.2.4) holds.
(ii) For every $p \in E$
and so
PROOF: The proof of (4.2.10) can be accomplished by induction on $[t]$ using the MARKOV property. The details are left to the reader (cf. Lemma 4.1.9). By the usual HEINE-BOREL argument (cf. (iv) of Exercise 2.1.14), we need only prove the first assertion in (ii). To this end, let $a < I_{\hat P}(p)$ and choose $\delta > 0$ so that $\ell(p,3\delta) > a$. Then, for any $\sigma, \tau \in \Sigma$ and $t > 0$, $\mu_{\sigma,t}(B(p,\delta))$
from which the required estimate is immediate. ∎
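Combining Lemma 4.2.7 with Lemma 4.2.9, we now see that, under $(\tilde U)$, $\{\mu_{\sigma,t} : t > 0\}$ satisfies the weak large deviation principle, uniformly in $\sigma \in \Sigma$, with rate function $I_{\hat P}$. In order to show that $I_{\hat P}$ is a good rate function and that the full (uniform) large deviation principle holds under $(\tilde U)$, it will be useful to have some additional notation. Set $\tilde\Omega = \Omega \times (0,1]^{\mathbb{N}}$; and, for $\sigma \in \Sigma$, define $\tilde P_\sigma = P_\sigma \times \rho_1^{\mathbb{N}}$ on $(\tilde\Omega, \mathcal{B} \times \mathcal{B}_{(0,1]}^{\mathbb{N}})$. Next, for $\tilde\omega = (\omega, \mathbf{t}) \in \tilde\Omega$, set $\Sigma_\cdot(\tilde\omega) = \Sigma_\cdot(\omega)$, $S_\cdot(\tilde\omega) = S_\cdot(\omega)$; and define
$\tau_n(\tilde\omega) = n + \sum_{m=0}^{n} t_m, \quad n \in \mathbb{N},$
and
where
Finally, for $(T,\sigma) \in (0,\infty)\times\Sigma$, we denote by $\tilde\mu_{\sigma,T} \in M_1(E)$ the distribution under $\tilde P_\sigma$ of $\tilde\omega \in \tilde\Omega \mapsto \tilde S_T(\tilde\omega) \in E$.
4.2.11 Lemma. Assume that $(\tilde U)$ holds and refer to the preceding. Then, for $\delta \in (0,\infty)$,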
Furthermore, given $\nu \in M_1(\Sigma)$, define $M_\nu \in M_1(E)$ by
for $\Gamma \in \mathcal{B}_E$. Then
and, for every $n \in \mathbb{Z}^+$ and all measurable $F : E^n \to [0,\infty)$,
h
(4.2.13)
x m + z ( 4 ? .*
J$Lh(4,
IM"
Ln
-
7
-
xm+2("-1)(W))
RT(d4
F ( x )M X W .
In particular, when $E$ is either the space of probability measures on some Polish space or $E = (X, \|\cdot\|)$ is a separable BANACH space, there exists for each $L \in [1,\infty)$ a $K_L \subset\subset E$ such that
Finally, in the case described in Remark 4.2.2, for every measurable function $V : \Sigma \to [0,\infty]$,
where $\mu_\nu \in M_1(\Sigma)$ is given by
(4.2.15)
and, for $T \in [1,\infty)$,
After combining these with the above and again applying HÖLDER's inequality, we arrive at
for all $(T,\sigma) \in [1,\infty)\times\Sigma$ and $\alpha \in (0,\infty)$; and clearly (4.2.12) is an easy step from here.
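To prove (4.2.13), let $\nu \in M_1(\Sigma)$ be given and define $M_\nu \in M_1(E)$ accordingly. The estimate on the exponential moments of $M_\nu$ is an easy consequence of (4.2.10). Next, denote by $\tilde\mu_\sigma \in M_1(E)$ the distribution under $\tilde P_\sigma$ of $\tilde\omega \in \tilde\Omega \mapsto X_1(\tilde\omega) \in E$; and note that, by the second part of $(\tilde U)$, $\tilde\mu_\sigma \le M M_\nu$ for all $\sigma \in \Sigma$. At the same time, by the MARKOV property, we have that
where $\tilde\mu_{\tilde\omega} = \tilde\mu_{\sigma(\tilde\omega)}(\cdot)$ and $\sigma(\tilde\omega) = \Sigma_{1+\tau_{m+2(n-1)}(\tilde\omega)}(\omega)$ for $\tilde\omega = (\omega, \mathbf{t})$. Combining this with the preceding and using induction on $n$, one quickly arrives at (4.2.13).
In order to use (4.2.13) to check the uniform exponential tightness of $\{\tilde\mu_{\sigma,t} : t > 0\}$ when $E = (X, \|\cdot\|)$ is a separable BANACH space, define $\tilde\omega \in \tilde\Omega \mapsto L_T(\tilde\omega) \in M_1(E)$ by (cf. the paragraph preceding the statement of this lemma)
$L_T(\tilde\omega) = \sum_{m=1}^{n(T,\tilde\omega)} p_m(T,\tilde\omega)\, \delta_{X_m(\tilde\omega)};$
and let $Q_{\sigma,T} \in M_1(M_1(E))$ be the distribution of $L_T$ under $\tilde P_\sigma$. Since (cf. Theorem 3.3.4) $\tilde S_T(\tilde\omega) = m(L_T(\tilde\omega))$,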
the desired exponential tightness follows immediately from the last part of Lemma 3.3.10 once one notices that, by (4.2.13),
for $T \in [1,\infty)$ and measurable $V : E \to [0,\infty]$.
When $E = M_1(\Sigma')$ for some Polish space $\Sigma'$, we apply the preceding and JENSEN's inequality to conclude that
for all $T \in [1,\infty)$ and measurable $V : \Sigma' \to [0,\infty]$, where $\mu'_\nu \in M_1(\Sigma')$ is given by
$\mu'_\nu = \int_{M_1(\Sigma')} \alpha'\, M_\nu(d\alpha').$
Thus, Lemma 3.2.7 applies and yields the desired tightness. Finally, to prove (4.2.14) in the situation described in Remark 4.2.2, it suffices to apply the preceding (with $\Sigma' = \Sigma$) and to observe both that the $\mu'_\nu$ above coincides with the $\mu_\nu$ in (4.2.15) and that
for $\tilde\omega = (\omega, \mathbf{t})$, $T \in [1,\infty)$, and measurable $V : \Sigma \to [0,\infty]$. ∎
We are now in a position to prove the main result of this section.
4.2.16 Theorem. Assume that $(\tilde U)$ holds and that $E$ is either a separable BANACH space or the space of probability measures on a Polish space. Then the $I_{\hat P}$ in (4.2.6) is a good, convex rate function and, for every $\Gamma \in \mathcal{B}_E$,
PROOF: We have already proved everything except the goodness of $I_{\hat P}$ and the upper bound for closed sets. But, by Lemma 4.2.7 combined with (4.2.12), we see that
$\liminf_{t\to\infty} \frac{1}{t} \log\Big[\inf_{\sigma\in\Sigma} \tilde\mu_{\sigma,t}(G)\Big] \ge -\inf_G I_{\hat P}$
for all open $G$ in $E$. In particular, since $\{\tilde\mu_{\sigma,t} : t > 0\}$ is uniformly exponentially tight, we now see that, for each $L \in [1,\infty)$ there is a $K_L \subset\subset E$ such that
$\inf_{K_L^c} I_{\hat P} \ge -\liminf_{t\to\infty} \frac{1}{t} \log\Big[\inf_{\sigma\in\Sigma} \tilde\mu_{\sigma,t}(K_L^c)\Big] \ge L$
and, since we already know that $I_{\hat P}$ is lower semi-continuous, this completes the proof that $I_{\hat P}$ is good.
Turning to the upper bound for closed sets, note that by combining (ii) of Lemma 4.2.9 with (4.2.12) one sees that
for every $p \in E$. Thus, again by the uniform exponential tightness of $\{\tilde\mu_{\sigma,t} : t > 0\}$, we know that the upper bound holds when $\mu_{\sigma,t}$ is replaced by $\tilde\mu_{\sigma,t}$. But, by (4.2.12), this means that
for every closed $F \subseteq E$ and $\delta > 0$; and, because (cf. (2.1.3)) $I_{\hat P}$ is good, this is all that we need in order to get the upper bound. ∎
Applying the results of Chapter II, we can now take the following preliminary step toward the identification of the rate function $I_{\hat P}$.
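4.2.17 Corollary. Let everything be as in the statement of Theorem 4.2.16. Then, for each continuous $\Phi : E \to \mathbb{R}$ which satisfies
for some $\alpha \in (1,\infty)$,
$\lim_{t\to\infty} \sup_{\sigma\in\Sigma} \Big| \frac{1}{t} \log\Big( \int_E \exp[t\Phi(x)]\, \mu_{\sigma,t}(dx) \Big) - \sup_{x\in E} \big( \Phi(x) - I_{\hat P}(x) \big) \Big| = 0.$
In particular, if
(4.2.18) $\Lambda_{\hat P}(\lambda) = \lim_{t\to\infty} \frac{1}{t} \sup_{\sigma\in\Sigma} \Lambda_{\mu_{\sigma,t}}(t\lambda), \quad \lambda \in X^*,$
then $\Lambda_{\hat P}(\lambda) \in \mathbb{R}$ for $\lambda \in X^*$,
(4.2.19) $\Lambda_{\hat P}(\lambda) = \sup\{ {}_{X^*}\langle \lambda, x \rangle_X - I_{\hat P}(x) : x \in E \}, \quad \lambda \in X^*,$
$I_{\hat P}(x) = \sup\{ {}_{X^*}\langle \lambda, x \rangle_X - \Lambda_{\hat P}(\lambda) : \lambda \in X^* \}, \quad x \in E,$
and
(4.2.20) $\lim_{t\to\infty} \sup_{\sigma\in\Sigma} \Big| \frac{1}{t} \Lambda_{\mu_{\sigma,t}}(t\lambda) - \Lambda_{\hat P}(\lambda) \Big| = 0, \quad \lambda \in X^*.$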
We are now at the same stage in our development here as we were after proving Corollary 4.1.20 in Section 4.1; and, once again, we want to develop the analogue of the identification made in (4.1.42). Thus, from now on, we will be assuming that we are in the situation described in Remark 4.2.2 and we introduce
(4.2.21) $\Lambda_P(V) = \lim_{t\to\infty} \frac{1}{t} \log\Big( \sup_{\sigma\in\Sigma} \int_\Omega \exp\Big[ \int_0^t V(\Sigma_s(\omega))\, ds \Big] P_\sigma(d\omega) \Big)$
for $V \in B(\Sigma;\mathbb{R})$. By Corollary 4.2.17, we know that, under $(\tilde U)$,
(4.2.22) $I_P(\nu) = \Lambda_P^*(\nu) = \sup\Big\{ \int_\Sigma V(\sigma)\, \nu(d\sigma) - \Lambda_P(V) : V \in C_b(\Sigma;\mathbb{R}) \Big\}$
for $\nu \in M_1(\Sigma)$. Clearly,
$|\Lambda_P(V) - \Lambda_P(W)| \le \|V - W\|_B, \quad V, W \in B(\Sigma;\mathbb{R});$
and, by HÖLDER's inequality, one sees that $V \in B(\Sigma;\mathbb{R}) \mapsto \Lambda_P(V)$ is convex. Our goal is to find alternative expressions for these functionals; and, just as in the discrete time setting, we point out that the identification itself does not rely on hypothesis $(\tilde U)$.
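The two properties just stated can be seen concretely on the simplest possible example. The sketch below assumes a two-state chain with hypothetical jump rates `a` and `b`, and it assumes the standard finite-state fact that the limit in (4.2.21) is then the largest eigenvalue of $Q + \mathrm{diag}(V)$, where $Q$ is the generator; neither the example nor the function name comes from the text.

```python
import math


def log_spectral_radius(v1: float, v2: float, a: float = 2.0, b: float = 1.0) -> float:
    """Largest eigenvalue of Q + diag(V) for the two-state generator
    Q = [[-a, a], [b, -b]]; the rates a and b are illustrative assumptions."""
    a11, a22 = v1 - a, v2 - b
    return 0.5 * ((a11 + a22) + math.sqrt((a11 - a22) ** 2 + 4.0 * a * b))


V, W = (0.7, -1.3), (-0.4, 0.9)
lam_V = log_spectral_radius(*V)
lam_W = log_spectral_radius(*W)

# Value at the midpoint of V and W, used to check convexity.
lam_mid = log_spectral_radius(0.5 * (V[0] + W[0]), 0.5 * (V[1] + W[1]))

# Uniform distance between V and W, used to check the Lipschitz bound.
sup_dist = max(abs(V[0] - W[0]), abs(V[1] - W[1]))
```

In this example one can compare `abs(lam_V - lam_W)` with `sup_dist` and `lam_mid` with the average of `lam_V` and `lam_W`; adding a constant to $V$ shifts the value by the same constant.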
What we have to do first is interpret Ap(V) as the logarithmic spectral radius of an appropriate operator; and this will again involve the FEYNMAN-KAC formula. However, in the present setting, there are a few more technical details which have to be confronted. In the first place, we must do a little elementary perturbation theory for semigroups. Define
(4.2.24)
for $(t,\sigma) \in (0,\infty)\times\Sigma$. In particular, if
for $\phi \in B(\Sigma;\mathbb{R})$, then $\{P_t^V : t > 0\}$ is a semigroup of bounded operators on $B(\Sigma;\mathbb{R})$; and, in fact, $\|P_t^V\|_{op} \le \exp[t\|V^+\|_B]$, $t > 0$.
PROOF: The existence as well as the uniqueness of $u_\phi^V$ is an elementary application of the standard PICARD iteration procedure for solving equations of VOLTERRA type. Furthermore, as a consequence of the uniqueness, one can easily prove the semigroup property for $\{P_t^V : t > 0\}$ by checking that $u(t,\cdot) = u_\phi^V(s+t,\cdot)$ satisfies (4.2.24) with $\phi$ replaced by $[P_s^V\phi]$. Finally, the asserted bound follows immediately from
We can now state and prove the FEYNMAN-KAC formula in the context of continuous-time processes.
4.2.25 Theorem. For each $V \in B(\Sigma;\mathbb{R})$ and all $\phi \in B(\Sigma;\mathbb{R})$,
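In particular, if
$P^V(t,\sigma,\Gamma) \equiv \int_{\{\omega : \Sigma_t(\omega) \in \Gamma\}} \exp\Big[ \int_{[0,t]} V(\Sigma_s(\omega))\, ds \Big] P_\sigma(d\omega)$
for $(t,\sigma,\Gamma) \in (0,\infty)\times\Sigma\times\mathcal{B}_\Sigma$, then $[P_t^V\phi](\sigma) = \int_\Sigma \phi(\xi)\, P^V(t,\sigma,d\xi)$ for all $\phi \in B(\Sigma;\mathbb{R})$ and $(t,\sigma) \in (0,\infty)\times\Sigma$.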
PROOF: Let $u(t,\sigma)$ denote the right hand side of (4.2.26). Then, by the MARKOV property,
$u(t,\sigma) - [P_t\phi](\sigma) = \int_{[0,t]} [P_s(V u(t-s,\cdot))](\sigma)\, ds;$
from which it is obvious that $u(t,\cdot)$ satisfies (4.2.24) and is therefore equal to $[P_t^V\phi]$. ∎
Armed with (4.2.26), the interpretation of $\Lambda_P(V)$ as a logarithmic spectral radius is essentially trivial.
4.2.27 Corollary. For every $V \in B(\Sigma;\mathbb{R})$, $\Lambda_P(V)$ is equal to the logarithmic spectral radius
of the operator $P_1^V$. (The limit in (4.2.28) exists by sub-additivity.)
Our problem now is to use (4.2.28) to pass to a more pleasing expression for $\Lambda_P^*$; unfortunately, this will involve us in a new round of technicalities. Define $B_0$ to be the space of $\phi \in B(\Sigma;\mathbb{R})$ with the property that $\phi(\sigma) = \lim_{t\searrow 0} [P_t\phi](\sigma)$ for every $\sigma \in \Sigma$.
4.2.29 Lemma. If $\phi \in B(\Sigma;\mathbb{R})$ and
(4.2.30) $\lim_{t\searrow 0} [P_t^V\phi](\sigma) = \phi(\sigma), \quad \sigma \in \Sigma,$
for some $V \in B(\Sigma;\mathbb{R})$, then $\phi \in B_0$. Conversely, if $\phi \in B_0$, then (4.2.30) holds for every $V \in B(\Sigma;\mathbb{R})$. Further, $C_b(\Sigma;\mathbb{R}) \subseteq B_0$ and $B_0$ is a closed linear subspace of $B(\Sigma;\mathbb{R})$ which is $\{P_t^V : t > 0\}$-invariant for every $V \in B(\Sigma;\mathbb{R})$. Finally, if $\psi \in C_b(\Sigma;\mathbb{R})$ and $\phi \in B_0$, then $\phi\psi \in B_0$.
PROOF: The first assertion and its converse are obvious consequences of (4.2.24), and the linearity as well as the closedness need no comment. Moreover, the asserted invariance of $B_0$ follows immediately from the first part together with the semigroup property; and clearly the inclusion $C_b(\Sigma;\mathbb{R}) \subseteq B_0$ is guaranteed by (4.2.1). In fact, not only do we have the asserted inclusion, but we even have that
In particular, if $\phi \in B_0$ and $\psi \in C_b(\Sigma;\mathbb{R})$, then
as $t \searrow 0$ for every $\sigma \in \Sigma$; in other words, $\phi\psi \in B_0$. ∎
We now define $D^V$ to be the space of $\phi \in B(\Sigma;\mathbb{R})$ with the property that
for some $\psi \in B_0$. Note that for $\phi \in D^V$, the associated $\psi \in B_0$ is uniquely determined by
When $V \equiv 0$, we will use $D$ and $L$, respectively, in place of $D^0$ and $L^0$.
4.2.31 Lemma. For every $V \in B(\Sigma;\mathbb{R})$, $D^V \subseteq B_0$, $D^V$ is $\{P_t^V : t > 0\}$-invariant, and $[L^V \circ P_t^V\phi] = [P_t^V \circ L^V\phi]$ for $t \in (0,\infty)$ and $\phi \in D^V$. Moreover, if $\lambda > \rho(P^V)$ and
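for $\phi \in B(\Sigma;\mathbb{R})^+$, then $R_\lambda^V$ admits a unique extension as a bounded linear operator taking $B(\Sigma;\mathbb{R})$ into $B_0$; and, for each $\phi \in B(\Sigma;\mathbb{R})$,
for $t \in (0,\infty)$. In particular, if $\phi \in B_0$, then, for every $\lambda > \rho(P^V)$, $[R_\lambda^V\phi] \in D^V$ and
$[L^V \circ R_\lambda^V\phi] = \lambda[R_\lambda^V\phi] - \phi.$
Finally, if $V \in C_b(\Sigma;\mathbb{R})$ and $\lambda > \rho(P^V)$, then, for every $\phi \in B_0$, $[R_\lambda^V\phi] \in D \cap D^V$ and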
PROOF: The preliminary assertions are all trivial consequences of the definition just given. To see that $R_\lambda^V$ can be extended as a bounded linear operator on $B(\Sigma;\mathbb{R})$, it suffices to note that $R_\lambda^V$ is non-negativity preserving and that
$\|[R_\lambda^V 1]\|_B \le \int_{(0,\infty)} e^{-\lambda t}\, \|[P_t^V 1]\|_B\, dt < \infty$
as long as $\lambda > \rho(P^V)$. Moreover, in proving (4.2.33), we may assume that $\phi \in B(\Sigma;\mathbb{R})^+$, in which case all the steps taken below are easily justified:
Clearly this proves (4.2.33) and, therefore, also that $[R_\lambda^V\phi] \in B_0$ for all $\phi \in B(\Sigma;\mathbb{R})$ and that $[R_\lambda^V\phi] \in D^V$ with $[L^V \circ R_\lambda^V\phi] = \lambda[R_\lambda^V\phi] - \phi$ when $\phi \in B_0$.
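Finally, suppose that $V \in C_b(\Sigma;\mathbb{R})$ and that $\lambda > \rho(P^V) \vee 0$. It is then easy to check from (4.2.24) and (4.2.32) that
$[R_\lambda^V\phi] = [R_\lambda\phi] + [R_\lambda(V[R_\lambda^V\phi])], \quad \phi \in B(\Sigma;\mathbb{R}).$
Hence, in this case, if $\phi \in B_0$, then not only is $[R_\lambda^V\phi] \in D^V$ but also (cf. the last part of Lemma 4.2.29) $[R_\lambda^V\phi] \in D$. Thus, for such $\phi$'s, we also have that
$\lambda[R_\lambda^V\phi] - [L \circ R_\lambda^V\phi] = \phi + V[R_\lambda^V\phi];$
from which (4.2.34) is immediate. To handle the case when $\lambda \in (\rho(P^V), 0]$, simply observe that $\rho(P^{V+\alpha}) = \rho(P^V) + \alpha$ and that $R_{\lambda+\alpha}^{V+\alpha} = R_\lambda^V$ for every $\alpha \in \mathbb{R}$. ∎
We are now ready to return to the problem of finding alternate descriptions of $\Lambda_P^*$.
4.2.35 Lemma. For $\nu \in M_1(\Sigma)$ define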
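(4.2.36) $\bar J_P(\nu) = \sup\Big\{ -\int_\Sigma \frac{Lu}{u}\, d\nu : u \in D \cap B(\Sigma;[1,\infty)) \Big\}.$
Then
(4.2.37) $\Lambda_P^*(\nu) \le \bar J_P(\nu) \le \bar\Lambda_P^*(\nu) \equiv \sup\Big\{ \int_\Sigma V\, d\nu - \Lambda_P(V) : V \in B(\Sigma;\mathbb{R}) \Big\}.$
Moreover, if $\{P_t : t > 0\}$ is FELLER continuous (i.e., $C_b(\Sigma;\mathbb{R})$ is $\{P_t : t > 0\}$-invariant) and $D_c = \{u \in D \cap C_b(\Sigma;\mathbb{R}) : Lu \in C_b(\Sigma;\mathbb{R})\}$, then
(4.2.38) $\Lambda_P^*(\nu) = J_P(\nu) \equiv \sup\Big\{ -\int_\Sigma \frac{Lu}{u}\, d\nu : u \in D_c \cap C_b(\Sigma;[1,\infty)) \Big\}.$
PROOF: Let $u \in D \cap B(\Sigma;[1,\infty))$ and set $V_u = -\frac{Lu}{u}$. It is then a trivial matter to check that the function $w : [0,\infty)\times\Sigma \to \mathbb{R}$ given by $w(t,\cdot) = u$ satisfies
$w(t,\sigma) = [P_t u](\sigma) + \int_{[0,t]} [P_{t-s}(V_u\, w(s,\cdot))](\sigma)\, ds, \quad t > 0.$
Hence, $u = [P_t^{V_u} u]$, $t > 0$. But this means that $\Lambda_P(V_u) = \rho(P^{V_u}) = 0$; and so, for every $\nu \in M_1(\Sigma)$,
$-\int_\Sigma \frac{Lu}{u}\, d\nu \le \bar\Lambda_P^*(\nu).$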
Clearly, this proves the second part of (4.2.37). Moreover, if $u \in D_c \cap C_b(\Sigma;[1,\infty))$ and therefore $V_u \in C_b(\Sigma;\mathbb{R})$, then the same argument shows that $J_P \le \Lambda_P^*$. To complete the proof, let $V \in C_b(\Sigma;\mathbb{R})$ and $\lambda > \Lambda_P(V)$ be given. Set $u = [R_\lambda^V 1]$ and observe that, by Corollary 4.2.27 and the last part of Lemma 4.2.31, $u \in D$ and that $\lambda u - (Lu + Vu) = 1$. In addition, by the FEYNMAN-KAC formula, one sees that $u \ge \epsilon$ for some $\epsilon > 0$. Hence,
from which the first part of (4.2.37) follows after one lets $\lambda \searrow \Lambda_P(V)$ and then takes the supremum over $V \in C_b(\Sigma;\mathbb{R})$. Finally, in the FELLER continuous case, one can easily check that $[R_\lambda^V 1] \in D_c$; and so the preceding shows that $\Lambda_P^*(\nu) \le J_P(\nu)$. ∎
4.2.39 Theorem. For $h > 0$ let $\Pi_h$ be the transition probability function given by $\Pi_h(\sigma,\cdot) = P(h,\sigma,\cdot)$. Then,
(4.2.40) $J_{\Pi_h}(\nu) \le h\,\bar\Lambda_P^*(\nu)$ for $h > 0$ and $\bar J_P(\nu) \le \liminf_{h\searrow 0} \frac{1}{h} J_{\Pi_h}(\nu)$
for $\nu \in M_1(\Sigma)$; and so, $\mu \in M_1(\Sigma)$ is $\{P_t : t > 0\}$-invariant if $\bar\Lambda_P^*(\mu) = 0$, and $\bar J_P(\mu) = 0$ if $\mu$ is $\{P_t : t > 0\}$-invariant. (See (4.1.38) for the definition of $J_{\Pi_h}$.) In particular, if $\Lambda_P^* = \bar\Lambda_P^*$, then
(4.2.41) $\bar J_P(\nu) = \bar\Lambda_P^*(\nu)$, $J_{\Pi_h}(\nu) \le h\,\bar J_P(\nu)$, and $\bar J_P(\nu) = \lim_{h\searrow 0} \frac{1}{h} J_{\Pi_h}(\nu)$
for all $\nu \in M_1(\Sigma)$; and so $\mu \in M_1(\Sigma)$ is $\{P_t : t > 0\}$-invariant if and only if $\bar J_P(\mu) = 0$. Finally, when $\{P_t : t > 0\}$ is FELLER continuous, then
(4.2.42) $J_P(\nu) = \Lambda_P^*(\nu)$, $J_{\Pi_h}(\nu) \le h\,J_P(\nu)$, and $J_P(\nu) = \lim_{h\searrow 0} \frac{1}{h} J_{\Pi_h}(\nu)$
for $\nu \in M_1(\Sigma)$; and so, in this case, $\mu \in M_1(\Sigma)$ is $\{P_t : t > 0\}$-invariant if and only if $J_P(\mu) = 0$.
PROOF: To prove the first part of (4.2.40), first use (4.1.39) to see that
$J_{\Pi_h}(\nu) \le \sup\Big\{ \int_\Sigma V\, d\nu - \Lambda_{\Pi_h}(V) : V \in B(\Sigma;\mathbb{R}) \Big\}.$
Thus, the inequality will be established once we note that, by JENSEN's inequality and Lemma 4.1.33,
To prove the second part of (4.2.40), let $u \in D \cap B(\Sigma;[1,\infty))$ be given and note that, since $1 - x \le -\log x$ for $x \in (0,\infty)$,
and therefore that
$-\int_\Sigma \frac{Lu}{u}\, d\nu = \lim_{h\searrow 0} \frac{1}{h} \int_\Sigma \frac{u - P_h u}{u}\, d\nu \le \liminf_{h\searrow 0} \frac{1}{h} J_{\Pi_h}(\nu).$
Clearly, this completes the proof of (4.2.40). Moreover, if $\bar\Lambda_P^*(\mu) = 0$, then, by (4.2.40) and Lemma 4.1.45, one sees that $\mu P_h = \mu$ for all $h > 0$. On the other hand, if $\mu$ is $\{P_t : t > 0\}$-invariant, then, by Lemma 4.1.45, $J_{\Pi_h}(\mu) = 0$ for all $h > 0$, and therefore $\bar J_P(\mu) = 0$ by (4.2.40).
Next, suppose that $\Lambda_P^* = \bar\Lambda_P^*$. Then (4.2.41) follows immediately from (4.2.37) and (4.2.40).
Finally, suppose that $\{P_t : t > 0\}$ is FELLER continuous. Then, by Lemma 4.1.36, $J_{\Pi_h} = \Lambda_{\Pi_h}^*$. At the same time, by the same argument as the one which led to the first part of (4.2.40), $\Lambda_{\Pi_h}^*(\nu) \le h\Lambda_P^*(\nu)$; and, by (4.2.38), $\Lambda_P^* = J_P$. Clearly this proves that $J_{\Pi_h} \le hJ_P$. Hence, by the last part of (4.2.40), we now see that
$J_P(\nu) \le \bar J_P(\nu) \le \liminf_{h\searrow 0} \frac{1}{h} J_{\Pi_h}(\nu) \le J_P(\nu).$ ∎
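The relation between the rate of the time-$h$ skeleton and the continuous-time rate can be observed numerically in a toy case. The sketch below assumes a two-state chain with hypothetical jump rates `A` and `B`, uses the closed form of $\exp(hQ)$ for that chain, and takes the discrete-time functional to be $\sup_u -\sum_i \nu_i \log((\Pi_h u)_i/u_i)$ over positive $u$, approximated on a grid. The closed-form continuous-time rate $(\sqrt{Ap} - \sqrt{B(1-p)})^2$ is a known formula for the two-state chain and is assumed here rather than derived from the text.

```python
import math

A, B = 2.0, 1.0  # hypothetical jump rates 1 -> 2 and 2 -> 1


def skeleton(h: float):
    """Transition matrix exp(hQ) for Q = [[-A, A], [B, -B]], in closed form."""
    decay = math.exp(-(A + B) * h)
    pi1, pi2 = B / (A + B), A / (A + B)  # stationary distribution
    return [[pi1 + pi2 * decay, pi2 - pi2 * decay],
            [pi1 - pi1 * decay, pi2 + pi1 * decay]]


def discrete_rate(p: float, h: float, steps: int = 40001, span: float = 20.0) -> float:
    """Grid approximation of sup_u -sum_i nu_i log((Pi_h u)_i / u_i) for
    nu = (p, 1 - p); by scale invariance it suffices to take u = (1, e^y)."""
    P = skeleton(h)
    best = 0.0  # the value at constant u
    for k in range(steps):
        y = -span + 2.0 * span * k / (steps - 1)
        x = math.exp(y)
        val = -(p * math.log(P[0][0] + P[0][1] * x)
                + (1.0 - p) * math.log((P[1][0] + P[1][1] * x) / x))
        best = max(best, val)
    return best


def continuous_rate(p: float) -> float:
    """Assumed closed form of the continuous-time rate for the two-state chain."""
    return (math.sqrt(A * p) - math.sqrt(B * (1.0 - p))) ** 2


p = 0.9
ratios = [discrete_rate(p, h) / h for h in (0.5, 0.1, 0.02)]
limit = continuous_rate(p)
```

In this example each entry of `ratios` stays below `limit`, and the entry for the smallest `h` is the closest to it, in line with the inequalities and the limit stated in the theorem.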
We have now proved the following version of a result which was originally derived by DONSKER and VARADHAN [30].
4.2.43 Theorem. (DONSKER & VARADHAN) Assume that $(\tilde U)$ holds and define $\bar J_P$ as in (4.2.36). Then $\bar J_P$ is a good rate function, $\bar\Lambda_P^* = \bar J_P = \Lambda_P^*$ and, for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$,
In particular, if, in addition, $\{P_t : t > 0\}$ is FELLER continuous and $J_P$ is defined as in (4.2.38), then $J_P = \bar J_P$; and so (4.2.44) holds with $J_P$ in place of $\bar J_P$.
PROOF: In view of Theorem 4.2.16, (4.2.19), and the second part of Theorem 4.2.39, all that we have to do is show that $\Lambda_P^* = \bar\Lambda_P^*$. But, with the aid of (4.2.14), this follows by the same argument as we used to prove Lemma 4.1.40. ∎
A truth which is familiar to MARKOV process devotees is that life often becomes simpler when one deals with symmetric transition probability functions. Thus, it should come as no surprise that the preceding theory of large deviations takes a more pleasing form when applied to such processes. In particular, we will close this section by showing that the rate function can often be expressed in terms of the DIRICHLET form associated with the symmetric process.
We begin by recalling a few of the basic facts about symmetric MARKOV semigroups. The $\sigma$-finite measure $m$ on $(\Sigma, \mathcal{B}_\Sigma)$ is said to be reversing for the transition probability function $P(t,\sigma,\cdot)$ if the measures $m_t$, $t \in (0,\infty)$, defined on $(\Sigma^2, \mathcal{B}_\Sigma^2)$ by
are symmetric (i.e., $m_t(\Gamma_1\times\Gamma_2) = m_t(\Gamma_2\times\Gamma_1)$ for all $\Gamma_1, \Gamma_2 \in \mathcal{B}_\Sigma$). Clearly, $m$ being reversing for $P(t,\sigma,\cdot)$ is equivalent to the statement that the semigroup $\{P_t : t > 0\}$ is $m$-symmetric in the sense that, for each $t \in (0,\infty)$,
In particular, by taking $\psi = 1$ in the preceding, we see that for any $t \in (0,\infty)$ and $\phi \in B(\Sigma;[0,\infty))$,
$\int_\Sigma P_t\phi\, dm = \int_\Sigma \phi\, dm.$
In other words, $m$ is $\{P_t : t > 0\}$-invariant; and therefore, by JENSEN's inequality,
$\|P_t\phi\|_{L^2(m)}^2 \le \|P_t(\phi^2)\|_{L^1(m)} = \|\phi\|_{L^2(m)}^2$
for all $t \in (0,\infty)$ and $\phi \in B(\Sigma;\mathbb{R})$. After combining this with the fact that $[P_t\phi](\sigma) \to \phi(\sigma)$, $\sigma \in \Sigma$, for $\phi \in C_b(\Sigma;\mathbb{R})$, one can easily show that $\{P_t : t > 0\}$ determines a unique strongly continuous semigroup $\{\bar P_t : t > 0\}$ of self-adjoint contractions on $L^2(m)$ such that $\bar P_t\phi = P_t\phi$ for $\phi \in B(\Sigma;\mathbb{R}) \cap L^2(m)$. Use $\bar L$ to denote the generator of the semigroup $\{\bar P_t : t > 0\}$ and note that $\bar L$ is a non-positive self-adjoint operator on $L^2(m)$. (The self-adjointness of $\bar L$ follows from that of the $\bar P_t$'s and the non-positivity is a consequence of their contractive property.) Moreover, by either STONE's Theorem or the HILLE-YOSIDA Theorem, one knows that
where $\{E_\lambda : \lambda \in [0,\infty)\}$ is the spectral resolution of the identity for $-\bar L$. Finally, define the DIRICHLET form $\mathcal{E}$ to be the quadratic mapping given by
let $D(\mathcal{E}) = \{\phi \in L^2(m) : \mathcal{E}(\phi,\phi) < \infty\}$; and note that, by (4.2.46),
What we want to show is that, under reasonable assumptions, the function $J_{\mathcal{E}} : M_1(\Sigma) \to [0,\infty]$ given by
(4.2.49) $J_{\mathcal{E}}(\mu) = \begin{cases} \mathcal{E}(f^{1/2}, f^{1/2}) & \text{if } \mu \ll m \text{ and } f = \frac{d\mu}{dm} \\ \infty & \text{otherwise} \end{cases}$
governs the large deviations of $\{L_t : t > 0\}$.
4.2.50 Lemma. For $V \in B(\Sigma;\mathbb{R})$ define $\{P_t^V : t > 0\}$ as in Lemma 4.2.23. Assuming that (4.2.45) holds,
and so, for each $t \in (0,\infty)$, there is a unique continuous extension $\bar P_t^V$ to $L^2(m)$ of $P_t^V$ on $B(\Sigma;\mathbb{R}) \cap L^1(m)$. Moreover, $\{\bar P_t^V : t > 0\}$ is a strongly continuous semigroup of bounded, self-adjoint operators on $L^2(m)$; and
(4.2.51) $\Lambda_{\mathcal{E}}(V) \equiv \sup\Big\{ \int_\Sigma V\, d\mu - J_{\mathcal{E}}(\mu) : \mu \in M_1(\Sigma) \Big\} = \lim_{t\to\infty} \frac{1}{t} \log\big( \|\bar P_t^V\|_{L^2(m)\to L^2(m)} \big) \le \Lambda_P(V),$
where we have used $\|\cdot\|_{L^2(m)\to L^2(m)}$ to denote the norm for operators on $L^2(m)$ into itself.
PROOF: By (4.2.26), it is obvious that $|[P_t^V\phi](\sigma)| \le e^{t\|V\|_B}[P_t|\phi|](\sigma)$, $\sigma \in \Sigma$. Hence, the first assertion follows immediately from the fact that $P_t$ itself acts contractively on $L^2(m)$; and so there is no problem about proving the existence and uniqueness of the extensions $\bar P_t^V$. In addition, it is clear that $\{\bar P_t^V : t > 0\}$ forms a semigroup and that this semigroup is strongly continuous on $L^2(m)$. In order to show that the $\bar P_t^V$'s are self-adjoint, first observe that
for $(t,\sigma) \in (0,\infty)\times\Sigma$ and $\phi \in B(\Sigma;\mathbb{R})$. Indeed, using the expression in (4.2.26) for $[P_t^V\phi](\sigma)$, one sees that
Now let $(\bar P_t^V)^*$ denote the adjoint of $\bar P_t^V$. Then, for $\phi, \psi \in B(\Sigma;\mathbb{R}) \cap L^1(m)$, one sees from (4.2.52) and (4.2.24), respectively, that
and
where we have used the self-adjointness of $\bar P_t$ to get the first of these expressions. Starting from the above, it is an easy step to
and thence to $(\bar P_t^V)^* = \bar P_t^V$.
Having established that $\{\bar P_t^V : t > 0\}$ is a strongly continuous semigroup of bounded, self-adjoint operators on $L^2(m)$, we can now say that
(4.2.53) $\bar P_t^V = \int_{[\lambda_V,\infty)} e^{-\lambda t}\, dE_\lambda^V, \quad t \in (0,\infty),$
where $\{E_\lambda^V : \lambda \in [\lambda_V,\infty)\}$ is the spectral resolution of the identity for $-\bar L^V$ and $\bar L^V$ is the generator of $\{\bar P_t^V : t > 0\}$. In particular, $-\lambda_V = \lim_{t\to\infty} \frac{1}{t} \log\big( \|\bar P_t^V\|_{L^2(m)\to L^2(m)} \big)$. Thus, we will be done once we show that $-\lambda_V \le \Lambda_P(V)$ and that $-\lambda_V = \Lambda_{\mathcal{E}}(V)$. To see the first of these, let $\lambda > \lambda_V$ be given. Then there is a $\psi \in L^2(m)$ such that $\psi = E_\lambda^V\psi \ne 0$. Thus, we can find a $\phi \in B(\Sigma;\mathbb{R}) \cap L^1(m)$ such that $E_\lambda^V\phi \ne 0$. But this means, on the one hand, that
and, on the other hand, that
In other words, $-\lambda_V \le \Lambda_P(V)$. To prove that $-\lambda_V = \Lambda_{\mathcal{E}}(V)$, first note that
and therefore that
Next, using (4.2.24) and (4.2.48), check that
By combining these, we see that
Thus, all that remains is to check that the preceding supremum is unchanged if we restrict ourselves to non-negative $\phi$'s. But, for $\phi \in L^1(m) \cap B(\Sigma;\mathbb{R})$,
and so, by (4.2.48) and an easy limit argument, we see that
as $t \searrow 0$ for every $\phi \in L^2(m)$. In particular, we now know that
$\mathcal{E}(|\phi|, |\phi|) \le \mathcal{E}(\phi, \phi).$ ∎
4.2.55 Lemma. Assume that $m$ is a reversing measure for $P(t,\sigma,\cdot)$ and define $\mathcal{E}$ and $J_{\mathcal{E}}$ accordingly. Then (cf. (4.2.37) and Theorem 4.2.39 for the notation)
and
(4.2.57) $J_{\mathcal{E}}(\mu) = \lim_{h\searrow 0} \frac{1}{h} J_{\Pi_h}(\mu)$ for $\mu \in M_1(\Sigma)$ satisfying $\mu \ll m$.
PROOF: Obviously,
and so (4.2.56) follows immediately from (4.2.37) and (4.2.51). (See Exercise 4.2.63 below for more information about the relationship between $J_{\mathcal{E}}$ and $\Lambda_{\mathcal{E}}$.) To prove (4.2.57), let $\mu \in M_1(\Sigma)$ with $\mu \ll m$ be given and set $f = \frac{d\mu}{dm}$. Noting that $\log(1-x) \le -x$, $x \in (-\infty,1]$, and that $\frac{u - P_h u}{u} \le 1$ for any $u \in B(\Sigma;(0,\infty))$, we see (cf. (4.1.39)) that
$J_{\Pi_h}(\mu) \ge \int_\Sigma \frac{f^{1/2}\wedge n - P_h(f^{1/2}\wedge n)}{f^{1/2}\wedge n + \epsilon}\, f\, dm$
for all $n \in \mathbb{Z}^+$ and $\epsilon > 0$. At the same time, for all $n \in \mathbb{Z}^+$ and $\epsilon > 0$,
and
Thus, by LEBESGUE's Dominated Convergence Theorem, we have that
Clearly the desired result follows from this together with (4.2.40) and (4.2.48). ∎
By combining the preceding with our earlier results, we arrive at the following version of a result which, once again, is due originally to DONSKER and VARADHAN [30].
4.2.58 Theorem. Assume that $m$ is $P(t,\sigma,\cdot)$-reversing. If $\mu \ll m$ whenever $\bar\Lambda_P^*(\mu) < \infty$, then $\bar\Lambda_P^* = J_{\mathcal{E}}$; and, when $P(t,\sigma,\cdot)$ is FELLER continuous, $J_P = J_{\mathcal{E}}$ if $\mu \ll m$ whenever $J_P(\mu) < \infty$. Finally, under the condition $(\tilde U)$, $\bar J_P = J_{\mathcal{E}}$ and so $\bar J_P$ can be replaced by $J_{\mathcal{E}}$ throughout (4.2.44).
PROOF: The only assertion that has not been covered already is the last one. However, this one would be obvious if we knew that $(\tilde U)$ implied that $\nu \ll m$ whenever $\bar J_P(\nu) < \infty$; and this latter fact is established in Exercise 4.2.59 below. ∎
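The identification of the rate function with the DIRICHLET form can be checked by hand in the smallest reversible example. The following sketch assumes a two-state chain with hypothetical jump rates `A` and `B`, for which the reversing probability is $m = (B, A)/(A+B)$ and $\mathcal{E}(\phi,\phi) = m_1 A (\phi_1 - \phi_2)^2$. It also assumes, as a standard finite-state fact, that $\Lambda_P(V)$ is the largest eigenvalue of $Q + \mathrm{diag}(V)$. The code compares $\mathcal{E}(f^{1/2}, f^{1/2})$ with a grid approximation of the LEGENDRE transform of that eigenvalue.

```python
import math

A, B = 2.0, 1.0  # hypothetical jump rates 1 -> 2 and 2 -> 1


def dirichlet_rate(p: float) -> float:
    """E(sqrt f, sqrt f) for mu = (p, 1 - p) and f = d mu / d m, where
    m = (B, A) / (A + B) is reversing and E(phi, phi) = m_1 * A * (phi_1 - phi_2)**2."""
    m1, m2 = B / (A + B), A / (A + B)
    f1, f2 = p / m1, (1.0 - p) / m2
    return m1 * A * (math.sqrt(f1) - math.sqrt(f2)) ** 2


def top_eigenvalue(v1: float, v2: float) -> float:
    """Largest eigenvalue of Q + diag(V) with Q = [[-A, A], [B, -B]]."""
    a11, a22 = v1 - A, v2 - B
    return 0.5 * ((a11 + a22) + math.sqrt((a11 - a22) ** 2 + 4.0 * A * B))


def legendre_rate(p: float, steps: int = 60001, span: float = 30.0) -> float:
    """Grid approximation of sup_V { <V, mu> - Lambda(V) }. Adding a constant
    to V leaves the bracket unchanged, so V = (v, 0) suffices."""
    best = -math.inf
    for k in range(steps):
        v = -span + 2.0 * span * k / (steps - 1)
        best = max(best, p * v - top_eigenvalue(v, 0.0))
    return best


pairs = [(dirichlet_rate(p), legendre_rate(p)) for p in (0.1, 0.3, 0.5, 0.8)]
```

Each pair in `pairs` holds the two candidate values of the rate at the same measure; in this toy case they agree up to the grid resolution.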
4.2.59 Exercise. Assume that $(\tilde U)$ holds.
(i) Proceeding as in Exercise 4.1.48, show that there is one and only one $\mu \in M_1(\Sigma)$ which is $\{P_t : t > 0\}$-invariant (i.e., $\mu = \mu P_t$, $t > 0$) and observe that when $\nu = \mu$ the measure $\mu_\nu$ in (4.2.15) coincides with $\mu$ itself. Use this observation together with (3.2.20) to check that
(4.2.60) $H(\nu|\mu) \le 16\big( 3\bar J_P(\nu) + \log(2M) \big), \quad \nu \in M_1(\Sigma).$
(ii) Conclude from (i) that: $\mu = m$ if $m$ is any $\{P_t : t > 0\}$-reversing $\sigma$-finite measure, $\nu \ll \mu$ whenever $\bar J_P(\nu) < \infty$, and $\{\nu : I_P(\nu) \le L\}$ is compact with respect to the strong topology on $M_1(\Sigma)$ for each $L \ge 0$.
4.2.61 Exercise. Again assume that $(\tilde U)$ holds. By (ii) of the preceding exercise, we know that $\{\bar J_P \le L\}$ is compact in the strong topology for each $L \ge 0$. The purpose of the present exercise is to show that (4.2.44) continues to hold when $\Gamma^\circ$ and $\bar\Gamma$ are replaced by $\Gamma^{\circ\tau}$ and $\bar\Gamma^{\tau}$, respectively. (See the last part of Section 3.2 for the notation here.) This can be done in two quite easy steps.
(i) Use the results in Lemma 4.2.11 together with Theorem 3.2.21 to show that
and
for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$. (Recall that, in the course of proving Theorem 4.2.16, we showed that $\{\tilde\mu_{\sigma,t} : t > 0\}$ satisfies the uniform full large deviation principle with rate $I_P$.)
(ii) Starting with (i) and using (4.2.12), complete the program. In particular, show that
(4.2.62) $\lim_{t\to\infty} \sup_{\sigma\in\Sigma} \Big| \frac{1}{t} \log\Big( \int_\Omega \exp[t\Phi(L_t(\omega))]\, P_\sigma(d\omega) \Big) - \sup\{ \Phi(\nu) - \bar J_P(\nu) : \nu \in M_1(\Sigma) \} \Big| = 0$
-
135
W which is continuous with respect to
for some LY E (1,oo). 4.2.63 Exercise.
-
Assume that $m$ is reversing for $P(t,\sigma,\cdot)$ and define the DIRICHLET form $\mathcal{E}$ and the associated function $J_{\mathcal{E}} : M_1(\Sigma) \to [0,\infty]$ accordingly. In Theorem 4.2.58, we saw that $J_{\mathcal{E}} = \bar\Lambda_P^*$ under the condition that $\mu \ll m$ whenever $\bar\Lambda_P^*(\mu) < \infty$. In this exercise, we will show that in general
Notice that although (4.2.64) proves that $\mu \in M_1(\Sigma) \mapsto J_{\mathcal{E}}(\mu)$ is lower semi-continuous with respect to the strong topology on $M_1(\Sigma)$, it does not show that it is lower semi-continuous with respect to the weak topology!
(i) Show that the map $\phi \in L^2(m) \mapsto \mathcal{E}(\phi,\phi)$ is lower semi-continuous.
(ii) Using (4.2.54) and the preceding, show that the map $f \in L^1(m)^+ \mapsto \mathcal{E}(f^{1/2}, f^{1/2})$ is lower semi-continuous and convex. (Hint: Because of (4.2.54), the convexity assertion comes down to an application of the triangle inequality for $\mathbb{R}^2$.) Conclude from this that $J_{\mathcal{E}}$ is convex and that
$\mathcal{E}(f^{1/2}, f^{1/2}) = \sup\Big\{ \int_\Sigma V f\, dm - \Lambda_{\mathcal{E}}(V) : V \in B(\Sigma;\mathbb{R}) \Big\}$
for $f \in L^1(m)^+$ with $\|f\|_{L^1(m)} = 1$. (Hint: Use Theorem 2.2.15 with $X = L^1(m)$.)
(iii) In view of (ii), we see that (4.2.64) holds when $\mu \ll m$. Thus, to complete the proof of (4.2.64), all that one has to do is check that the right side of (4.2.64) is infinite when $\mu$ is not absolutely continuous with respect to $m$. To this end, note that $\Lambda_{\mathcal{E}}(V) \le 0$ if $V \in B(\Sigma;\mathbb{R})$ vanishes $m$-almost everywhere and then use this fact to complete the derivation of (4.2.64).
4.2.65 Exercise.
Let $P(t,\sigma,\cdot)$ be a transition probability function on $\Sigma$ and assume that $P(t,\sigma,\cdot) \Rightarrow \delta_\sigma$ as $t \searrow 0$ for each $\sigma \in \Sigma$. Next, let $\{P_t : t > 0\}$ be the
semigroup on $B(\Sigma;\mathbb{R})$ determined by $P(t,\sigma,\cdot)$ and suppose that $\mu \in M_1(\Sigma)$ is $\{P_t : t > 0\}$-invariant. Using (i) in Exercise 4.1.51, show that
$H(\nu P_t|\mu) \nearrow H(\nu|\mu), \quad \nu \in M_1(\Sigma),$
as $t \searrow 0$.
4.2.66 Exercise.
Let $(s,\sigma,t) \in \mathbb{R}\times\Sigma\times(0,\infty) \mapsto P(s,\sigma;t,\cdot) \in M_1(\Sigma)$ be a time inhomogeneous transition probability function in the sense that it is measurable,
$P(s,\sigma;t,\cdot) \Rightarrow \delta_\sigma$ as $t \searrow 0$ for all $(s,\sigma) \in \mathbb{R}\times\Sigma$,
and it satisfies the CHAPMAN-KOLMOGOROV equation:
(4.2.67) $P(s,\sigma;t+t',\cdot) = \int_\Sigma P(s+t,\tau;t',\cdot)\, P(s,\sigma;t,d\tau)$
for all $(s,\sigma) \in \mathbb{R}\times\Sigma$ and all $t, t' \in (0,\infty)$. In addition, we will assume that $(s,\sigma,t) \in \mathbb{R}\times\Sigma\times(0,\infty) \mapsto P(s,\sigma;t,\cdot)$ is periodic in the sense that
$P(s+1,\sigma;t,\cdot) = P(s,\sigma;t,\cdot), \quad (s,\sigma) \in \mathbb{R}\times\Sigma$ and $t \in [0,\infty)$;
and, finally, we impose the condition that there exist $M \in (0,\infty)$ and $T \in (0,\infty)$ for which
(4.2.68) $P(s,\sigma;t,\cdot) \le M\,P(s,\tau;t,\cdot), \quad t \in [T,T+1],$
for all $(s,\sigma,\tau) \in \mathbb{R}\times\Sigma^2$.
(i) Define $]r[ \,= r - [r]$ for $r \in \mathbb{R}$; and, using (4.2.67), (4.2.68), and periodicity, show that
for all $t \in [T,T+1]$ and $(s,\sigma), (s',\sigma') \in \mathbb{R}\times\Sigma$ with $\delta = s' - s \in [0,1]$.
(ii) Set $\hat\Sigma = [0,1)\times\Sigma$, where we think of $[0,1)$ as the compact metric space in which the distance between points $\xi$ and $\eta$ is given by
Next, define
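so that
$\hat P(t,\hat\sigma,\hat\Gamma) = \int_\Sigma \chi_{\hat\Gamma}(]\xi+t[,\tau)\, P(\xi,\sigma;t,d\tau), \quad t \in (0,\infty),$
for $\hat\sigma = (\xi,\sigma) \in [0,1)\times\Sigma$ and $\hat\Gamma \in \mathcal{B}_{\hat\Sigma}$. Using periodicity, check that $(t,\hat\sigma) \in (0,\infty)\times\hat\Sigma \mapsto \hat P(t,\hat\sigma,\cdot)$ is a (time homogeneous) transition probability function and show, from (4.2.69), that
for all $\hat\sigma, \hat\tau \in \hat\Sigma$. In particular, if $\{\hat P_{\hat\sigma} : \hat\sigma \in \hat\Sigma\}$ on $(\hat\Omega, \hat{\mathcal{B}})$ is a MARKOV family corresponding to $\hat P(t,\hat\sigma,\cdot)$ and if
$(t,\hat\omega) \in (0,\infty)\times\hat\Omega \mapsto \hat L_t(\hat\omega) \in M_1(\hat\Sigma)$
is defined accordingly, conclude that the large deviations of $\{\hat L_t(\hat\omega) : t > 0\}$ under $\{\hat P_{\hat\sigma} : \hat\sigma \in \hat\Sigma\}$ are uniformly governed by the good rate function
$\bar J_{\hat P}(\hat\nu) = \sup\Big\{ -\int_{\hat\Sigma} \frac{\hat L\hat u}{\hat u}\, d\hat\nu : \hat u \in \hat D \cap B(\hat\Sigma;[1,\infty)) \Big\},$
where the generator $\hat L$ on $\hat D$ is defined for $\hat P(t,\hat\sigma,\cdot)$ as in the discussion preceding Lemma 4.2.31.
(iii) Given $\hat\nu \in M_1(\hat\Sigma)$, let $\nu_1 \in M_1([0,1))$ denote its marginal distribution on $[0,1)$ (i.e., $\nu_1(A) = \hat\nu(A\times\Sigma)$ for $A \in \mathcal{B}_{[0,1)}$). Referring to part (ii), show that $\bar J_{\hat P}(\hat\nu) = \infty$ unless $\nu_1$ is equal to LEBESGUE's measure on $[0,1)$. (For more information on this subject, the reader might want to consult [27].)
4.2.70 Exercise.
Let everything be as it was at the beginning of this section, assume that condition $(\tilde U)$ holds, and let $I = I_{\hat P}$ be the corresponding good rate function which appears in Theorem 4.2.16. In addition, assume that $t \in [0,\infty) \mapsto S_t(\omega) \in E$ is continuous for every $\omega \in \Omega$, and consider the process $t \in [0,\infty) \mapsto \Theta_t(\omega) \in C \equiv C([0,1];E)$, $\omega \in \Omega$, given by $\Theta_t(\omega)(s) = S_{st}(\omega)$, $s \in [0,1]$. The purpose of this exercise is to investigate the large deviations as $t \to \infty$ of
under the measures $P_\sigma$.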
(i) For $n \in \mathbb{N}$, define
for $1 \le \ell \le 2^n$, and define $I_n : E^{2^n} \to [0,\infty]$ by
$I_n(x) = \frac{1}{2^n} \sum_{\ell=1}^{2^n} I(x_\ell), \quad x \in E^{2^n}.$
Given $\Gamma = \Gamma_1 \times \cdots \times \Gamma_{2^n}$, where $\Gamma_\ell \in \mathcal{B}_E$ for each $1 \le \ell \le 2^n$, show that
Conclude from this both that the $E^{2^n}$-valued family $\{\Delta_n\bar\Theta_t : t > 0\}$ is exponentially tight and second that $I_n$ is a good rate function which governs the large deviations of this family uniformly under $\{P_\sigma : \sigma \in \Sigma\}$.
(ii) Define $\pi_n : E^{2^n} \to C$ so that
and $t \in [(\ell-1)/2^n, \ell/2^n] \mapsto \pi_n(x)(t)$ is linear for each $1 \le \ell \le 2^n$. Next, set $\Pi_n = \pi_n \circ \Delta_n$, $C_n = \pi_n(E^{2^n})$, and define
Check that $\mathcal{I}_n$ is a convex rate function and that
for every $\Gamma \in \mathcal{B}_C$. (We give $C$ the topology of uniform convergence.)
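(iii) Show that $\mathcal{I}_n \le \mathcal{I}_{n+1}$, $n \in \mathbb{N}$, and define $\mathcal{I} : C \to [0,\infty]$ by
$\mathcal{I} = \lim_{n\to\infty} \mathcal{I}_n = \sup_{n\in\mathbb{N}} \mathcal{I}_n.$
Note that $\mathcal{I}$ is a convex rate function, and show that
$\liminf_{t\to\infty} \frac{1}{t} \log\Big[ \inf_{\sigma\in\Sigma} P_\sigma(\{\omega : \bar\Theta_t(\omega) \in G\}) \Big] \ge -\inf_G \mathcal{I}$
for every open $G \subseteq C$ and that
$\limsup_{t\to\infty} \frac{1}{t} \log\Big[ \sup_{\sigma\in\Sigma} P_\sigma(\{\omega : \bar\Theta_t(\omega) \in K\}) \Big] \le -\inf_K \mathcal{I}$
for every $K \subset\subset C$. Conclude, in particular, that if $\{\bar\Theta_t : t > 0\}$ is exponentially tight uniformly under $\{P_\sigma : \sigma \in \Sigma\}$, then $\mathcal{I}$ is good and governs the large deviations of $\{\bar\Theta_t : t > 0\}$ uniformly under $\{P_\sigma : \sigma \in \Sigma\}$.
Hint: In proving the upper bound, let $\epsilon > 0$ be given and choose $\psi_1, \ldots, \psi_M \in K$ and $\delta_1, \ldots, \delta_M \in (0,\infty)$ so that
Next, set $\delta = \delta_1 \wedge \cdots \wedge \delta_M$ and choose $n \in \mathbb{N}$ so that $\|\Pi_n\psi - \psi\| < \delta$ for every $\psi \in K$. Conclude that
where we have used $K_\ell$ to denote $K \cap B(\psi_\ell, 2\delta_\ell)$.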
4.3 The Wiener Sausage
In the preceding two sections, we discussed the large deviations of the empirical distribution measured in the weak topology on $M_1(\Sigma)$. Although, as we pointed out in Exercise 4.1.53 and Exercise 4.2.61, the conditions (U) and $(\tilde U)$ allow one to transform weak topology results into ones about the strong topology, thus far we have said nothing about the variation-norm (or uniform) topology. The reason for this should be obvious; namely, the empirical distributions are likely to be mutually singular to one another and therefore behave very badly in the variation norm. (This is particularly clear in the case of MARKOV chains, when the empirical distributions are purely atomic.) Thus, in some sense, it is not even reasonable to hope for a satisfactory theory with respect to the variation norm unless one first carries out a "mollification procedure" to make the empirical distributions more "friendly" to one another. The purpose of the present section is to show how this can be done.
Throughout this section we will be working in the setting of Remark 4.2.2 in Section 4.2 and will be assuming (without always mentioning it) that $(\tilde U)$ holds. In particular, all the notation is the same as it was in Section 4.2.
4.3.1 Lemma. For each $V\in B(\Sigma;\mathbb R)$,

for $(t,\sigma)\in(0,\infty)\times\Sigma$.
PROOF: Given $V\in B(\Sigma;\mathbb R)$, set

for $s,t\in(0,\infty)$ and $\sigma\in\Sigma$. Note that, on the one hand,

for $(s,t,\sigma)\in[0,1]\times(0,\infty)\times\Sigma$; while, on the other hand, the MARKOV property leads to
$$\le M\exp\bigl[4\|V\|_B\bigr]\,\Phi(0,t;\tau);$$
and so, if
$$\phi(t)=\inf_{\sigma\in\Sigma}\Phi(0,t;\sigma),\quad t\in(0,\infty),$$
then
$$\Phi(0,t;\sigma)\le M\exp\bigl[4\|V\|_B\bigr]\,\phi(t),\quad(t,\sigma)\in(0,\infty)\times\Sigma.$$
Therefore, all that we have to do is show that $\phi(t)\le\exp[t\Lambda_P(V)]$ for $t\in(0,\infty)$. But, by (4.2.20), we know that
$$\lim_{t\to\infty}\frac1t\log\bigl(\phi(t)\bigr)=\Lambda_P(V).$$
In addition, by the MARKOV property,
$$\phi(s+t)\ge\phi(s)\,\phi(t)\quad\text{for }(s,t)\in(0,\infty)^2;$$
and so $t\in(0,\infty)\mapsto\phi(t)\in\mathbb R$ is super-multiplicative. Thus, by Lemma 4.2.5,
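The mechanism at the end of this proof (super-multiplicativity together with the existence of the limit forces $\phi(t)\le\exp[t\Lambda]$ for every $t$) can be checked numerically in a discrete-time analogue. The following is a minimal sketch under assumptions that are not part of the text: a hypothetical three-state chain `Pi`, a hypothetical potential `V`, and $\phi(n)=\min_x E_x[\exp(V(X_1)+\dots+V(X_n))]$, with the logarithm of the Perron root of the matrix `B` standing in for $\Lambda_P(V)$.

```python
import numpy as np

# Hypothetical three-state chain and potential, chosen only for illustration.
Pi = np.array([[0.7, 0.2, 0.1],
               [0.3, 0.4, 0.3],
               [0.2, 0.3, 0.5]])
V = np.array([0.5, -1.0, 0.2])

# B(x, y) = Pi(x, y) * exp(V(y)), so that (B^n 1)(x) = E_x[exp(V(X_1)+...+V(X_n))].
B = Pi * np.exp(V)[None, :]

def phi(n):
    """Smallest value, over starting points, of the exponential moment at time n."""
    return float((np.linalg.matrix_power(B, n) @ np.ones(len(V))).min())

# Logarithm of the Perron root of B: the discrete stand-in for Lambda_P(V).
Lam = float(np.log(np.abs(np.linalg.eigvals(B)).max()))

phis = {n: phi(n) for n in range(1, 41)}

# Super-multiplicativity phi(m + n) >= phi(m) phi(n), up to rounding.
super_mult = all(phis[m + n] >= phis[m] * phis[n] * (1 - 1e-12)
                 for m in range(1, 21) for n in range(1, 21))

# The resulting bound phi(n) <= exp(n * Lam) for every n, up to rounding.
upper_bound = all(phis[n] <= np.exp(n * Lam) * (1 + 1e-12) for n in phis)

rates = [float(np.log(phis[n]) / n) for n in (1, 5, 20, 40)]
print(super_mult, upper_bound, rates, Lam)
```

In this sketch the values `rates` approach `Lam` from below, which is the discrete counterpart of the inequality being proved.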
Before stating our next result, we need to introduce a little more structure. Namely, for each $m\in\mathbb Z^+$ let $\rho_m$ be a probability measure on $(0,1/m]$ and define
$$K_m=\int_{(0,1/m]}P_t\,\rho_m(dt),\quad m\in\mathbb Z^+.\tag{4.3.3}$$
When $K_m$ acts on a function $\phi\in B(\Sigma;\mathbb R)$, we will write $[K_m\phi]$; when it acts on a $\nu\in M_1(\Sigma)$, we write $\nu K_m$. In addition, for each $t\in(0,\infty)$ we will suppose that we are given a FELLER continuous transition probability $S_t(\sigma,\cdot)$ on $\Sigma$; and we will use $[S_t\phi]$, $\phi\in B(\Sigma;\mathbb R)$, and $\nu S_t$, $\nu\in M_1(\Sigma)$, to denote
$$\int_\Sigma\phi(\tau)\,S_t(\cdot,d\tau)\in B(\Sigma;\mathbb R)\quad\text{and}\quad\int_\Sigma S_t(\tau,\cdot)\,\nu(d\tau)\in M_1(\Sigma),$$
respectively. (Recall that FELLER continuity is simply the statement that $[S_t\phi]\in C_b(\Sigma;\mathbb R)$ whenever $\phi\in C_b(\Sigma;\mathbb R)$.) Our development from here on will rely on our making the following hypotheses about these quantities.
(i) There exists a separable, closed subspace $E$ of the BANACH space $(M(\Sigma),\|\cdot\|_{var})$ such that
$$\nu\in M_1(\Sigma)\mapsto\nu K_m\in E_1\equiv E\cap M_1(\Sigma)$$
is a continuous mapping into $E_1$ (with the $\|\cdot\|_{var}$-topology) for each $m\in\mathbb Z^+$.
(ii) For every $\nu\in M_1(\Sigma)$, $\nu S_t\in E_1$ for each $t\in(0,\infty)$ and $\nu S_t\Rightarrow\nu$ as $t\to\infty$; and for $\nu\in E_1$, $\|\nu-\nu S_t\|_{var}\to0$ as $t\to\infty$.
(iii) For each $\alpha>0$ and $K\subset\subset\Sigma$, there is a function $t\in(0,\infty)\mapsto n(t;\alpha,K)\in\mathbb Z^+$ satisfying

such that $\{S_t(\sigma,\cdot):\sigma\in K\}$ can be covered by $n(t;\alpha,K)$ open $\|\cdot\|_{var}$-balls of radius $\alpha$.
Throughout, the topology on $E$ and $E_1$ will be the one determined by $\|\cdot\|_{var}$.
The following lemma summarizes a few elementary consequences of the above assumptions.
4.3.4 Lemma. For each $L\in(0,\infty)$,
$$\{\nu:\bar J_P(\nu)\le L\}\subset\subset E_1$$
and
$$\lim_{t\to\infty}\sup\bigl\{\|\nu-\nu S_t\|_{var}:\bar J_P(\nu)\le L\bigr\}=0.$$
Moreover, for each $t\in(0,\infty)$, $m\in\mathbb Z^+$, $\ell\in\mathbb Z^+$, and $K\subset\subset\Sigma$ there exists an $N(t,\ell,K)=(2\ell)^{n(t;1/\ell,K)}$ element subset $\mathcal W(t,m,\ell,K)$ of $\mathcal V\equiv\{V\in C_b(\Sigma;\mathbb R):\|V\|_B\le1\}$ such that
$$\bigl\{B\bigl([S_t\circ(K_m-I)W],10/\ell\bigr):W\in\mathcal W(t,m,\ell,K)\bigr\}$$
covers
$$\bigl\{[S_t\circ(K_m-I)V]:V\in\mathcal V\bigr\}.$$
(The balls here are taken with respect to the uniform norm $\|\cdot\|_B$ on $C_b(\Sigma;\mathbb R)$.)
PROOF: To see that the level sets of $\bar J_P$ are compact in $E_1$, recall (cf. (4.1.46) and (4.2.41)) that (4.3.5)
In particular, since $\nu K_m\in E_1$ for every $m\in\mathbb Z^+$, this means that $\nu\in E_1$ if $\bar J_P(\nu)<\infty$. In fact, since, as the continuous image of a compact set,

for each $m\in\mathbb Z^+$ and $L\in(0,\infty)$, it also means that $\{\nu:\bar J_P(\nu)\le L\}$ is totally bounded in $E_1$ and is therefore compact there.
To prove the uniform convergence in $E$ of $\nu S_t$ to $\nu$ for $\nu$'s in $\bar J_P$-level sets, simply note that since $\|\nu S_t-\nu\|_{var}\to0$ for each $\nu\in E_1$ this convergence must take place uniformly fast over compact sets.
To check the last assertion, let $t\in(0,\infty)$, $m\in\mathbb Z^+$, $\ell\in\mathbb Z^+$, and $K\subset\subset\Sigma$ be given. Then, by the third property listed above, there exist $n=n(t;1/\ell,K)$ points $\sigma_1,\dots,\sigma_n\in K$ with the property that the open sets

cover $K$. Now let $\{\eta_k\}_{k=1}^n\subseteq C_b(\Sigma;[0,1])$ be chosen so that $\{\sigma:\eta_k(\sigma)>0\}$ is a relatively compact subset of $U_k$ and $\sum_{k=1}^n\eta_k\equiv1$ on $K$; and define $\psi_\alpha$ for $\alpha=(\alpha_1,\dots,\alpha_n)\in\mathfrak A\equiv\{-2\ell+1,\dots,2\ell\}^n$ by

It is then an easy matter to check that for each $V\in\mathcal V$ there is an $\alpha\in\mathfrak A$ with the property that

Finally, given $\alpha\in\mathfrak A$, choose $W_\alpha\in\mathcal V$ so that

if such an element of $\mathcal V$ exists, and set $W_\alpha=0$ otherwise. ∎
4.3.6 Lemma. For every $\delta\in(0,\infty)$,
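PROOF: Let $L\in(0,\infty)$ be given. Using the estimates obtained in Lemma 4.2.11, choose $K\subset\subset\Sigma$ so that
$$\overline{\lim_{t\to\infty}}\,\frac1t\log\Bigl(\sup_{\sigma\in\Sigma}\mu_{\sigma,t}\bigl(\{\nu:\nu(K^c)\ge\delta/16\}\bigr)\Bigr)\le-L.$$
Next, let $\ell$ be the smallest element of $\mathbb Z^+$ which dominates $80/\delta$, and (referring to Lemma 4.3.4) note that
$$\begin{aligned}
\|\nu S_t-\nu S_tK_m\|_{var}
&=\sup\Bigl\{\int_\Sigma[S_t\circ(K_m-I)V]\,d\nu:V\in\mathcal V\Bigr\}\\
&\le\frac\delta8+\sup\Bigl\{\int_K[S_t\circ(K_m-I)V]\,d\nu:V\in\mathcal V\Bigr\}\\
&\le\frac\delta4+\max\Bigl\{\int_K[S_t\circ(K_m-I)W]\,d\nu:W\in\mathcal W(t,m,\ell,K)\Bigr\}\\
&\le\frac{3\delta}8+\max\Bigl\{\int_\Sigma[S_t\circ(K_m-I)W]\,d\nu:W\in\mathcal W(t,m,\ell,K)\Bigr\}
\end{aligned}$$
for any $\nu\in M_1(\Sigma)$ satisfying $\nu(K^c)\le\delta/16$. Thus,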
I (-L) v ( A + F W ) , and F ( m ) is defined to be where A = suptE[l,oo)logo) t
Since, by Lemma 4.3.1, F(rn) 5 -L
+ G(rn),where
G ( m ) = t-rw i& sup{Ap(R[Sto (K, - I ) V ] ) : V E V } and R = $, all that remains is to check that limm--roo G(rn) = 0. To this end, recall that
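$$\Lambda_P(V)=\sup\Bigl\{\int_\Sigma V\,d\nu-\bar J_P(\nu):\nu\in M_1(\Sigma)\Bigr\}$$
for any $V\in C_b(\Sigma;\mathbb R)$. Hence, for any $V\in\mathcal V$,
$$\begin{aligned}
\Lambda_P\bigl(R[S_t\circ(K_m-I)V]\bigr)
&\le0\vee\Bigl(\sup\Bigl\{R\int_\Sigma[S_t\circ(K_m-I)V]\,d\nu:\bar J_P(\nu)\le2R\Bigr\}\Bigr)\\
&\le R\sup\bigl\{\|\nu S_tK_m-\nu S_t\|_{var}:\bar J_P(\nu)\le2R\bigr\}\\
&\le R\sup\bigl\{2\|\nu S_t-\nu\|_{var}+\|\nu-\nu K_m\|_{var}:\bar J_P(\nu)\le2R\bigr\},
\end{aligned}$$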
where we have used
in the derivation of the last line. Thus, the desired result follows immediately from Lemma 4.3.4 and (4.3.5). ∎
With these preliminaries, we can now state and prove the following version of the main theorem in [31].
4.3.7 Theorem. Assume that ($\tilde{\mathrm U}$) holds and refer to the preceding. Then $\bar J_P|_{E_1}$ is a good rate function on $(E_1,\|\cdot\|_{var})$; and, for every $\Gamma\in\mathcal B_{E_1}$,

where $\Gamma^{\circ\,var}$ and $\overline\Gamma^{\,var}$ are, respectively, the $\|\cdot\|_{var}$-interior and closure of the set $\Gamma$.
PROOF: Define $f:M_1(\Sigma)\to E_1$ by

and, for $m\in\mathbb Z^+$, let $f_m$ denote the map $\nu\in M_1(\Sigma)\mapsto\nu K_m\in E_1$. Then, by (4.3.5), $\|f_m(\nu)-f(\nu)\|_{var}\to0$ uniformly on each level set of $\bar J_P$; and so, by the first part of Lemma 2.1.4, $\bar J_P|_{E_1}$ is a good rate function on $E_1$. Next, observe that, for every $\Gamma\in\mathcal B_{E_1}$ and $\delta\in(0,\infty)$, ($\tilde{\mathrm U}$) can be used to easily prove that

for all sufficiently large $t\in(0,\infty)$; and therefore, (4.3.8) reduces to proving that
for all $\Gamma\in\mathcal B_{M_1(\Sigma)}$ and some $\sigma_0\in\Sigma$. Indeed, suppose that (4.3.9) holds. Then for any open subset $G$ of $E_1$ and $\nu\in G$ we would have that

At the same time, for any closed subset $F$ of $E_1$, we would have that
$$\le-\lim_{\delta\searrow0}\inf\bigl\{\bar J_P(\nu):\|\nu-F\|_{var}<\delta\bigr\}=-\inf_F\bar J_P,$$
where we have used Lemma 2.1.2 to get the final equality.
In order to prove (4.3.9), we will again apply Lemma 2.1.4. Namely, for $t\in(0,\infty)$, denote by $Q_t$ the distribution of $\nu\in M_1(\Sigma)\mapsto\nu S_t\in E_1$ under $\mu_{\sigma_0,t}$; and observe that, since $\nu S_t\Rightarrow\nu$ as $t\to\infty$ uniformly for $\nu$'s in compact subsets of $M_1(\Sigma)$, the estimates in Lemma 4.2.11 are sufficient to justify applying part (ii) of Exercise 2.1.20 and thereby conclude that $\bar J_P$ governs the large deviations of $\{Q_t:t>0\}$ as a family of probability measures on $M_1(\Sigma)$. In addition, if $f$ and $\{f_m\}_1^\infty$ are the functions defined above, then (4.3.5) and Lemma 4.3.6 tell us that all the hypotheses of Lemma 2.1.4 are met by these functions and the family $\{Q_t:t>0\}$. Hence, as a consequence of Lemma 2.1.4, we now see that $\bar J_P|_{E_1}$ governs the large deviations of $\{Q_t:t>0\}$ as a family of probability measures on $E_1$; and this is just another way of saying that (4.3.9) holds. ∎
The principal reason for DONSKER and VARADHAN'S interest in Theorem 4.3.7 is that they wanted to apply it to the following rather strange computation. Namely, let $N\in\mathbb Z^+$ be given and, as in Section 1.3, denote by $W$ WIENER'S measure on $\Theta$. Given $\epsilon>0$, $t\in(0,\infty)$, and $\theta\in\Theta$, define
$$C_t^{(\epsilon)}(\theta)=\bigl\{x\in\mathbb R^N:|x-\theta(s)|<\epsilon\text{ for some }s\in[0,t]\bigr\}$$
to be the $\epsilon$-sausage around $\theta|_{[0,t]}$. Using $|\Gamma|$ to denote the LEBESGUE measure of $\Gamma\in\mathcal B_{\mathbb R^N}$, note that $\theta\in\Theta\mapsto|C_t^{(\epsilon)}(\theta)|$ is measurable and set
$$\mathcal A^{(\epsilon)}(t;\gamma)=\int_\Theta\exp\bigl[-\gamma\,|C_t^{(\epsilon)}(\theta)|\bigr]\,W(d\theta),\quad t\in(0,\infty),$$
for fixed $\gamma\in(0,\infty)$. In order to verify a conjecture made by some physicists, what DONSKER and VARADHAN wanted to do is compute the asymptotic behavior of $\mathcal A^{(\epsilon)}(t;\gamma)$ as $t\to\infty$; and we will devote the rest of this section to showing what they did.
The first step is to rewrite $\mathcal A^{(\epsilon)}(t;\gamma)$ in such a way that it becomes clearer what one should expect. To this end, observe that, by BROWNIAN scaling (cf. (iv) of Theorem 1.3.2), for each $\alpha\in(0,\infty)$:

have the same distribution under $W$. Thus, since

we see, upon taking $\alpha=t^{2/N}$, that

where $\epsilon(t)=\epsilon/t^{1/N}$. Looking at the form of $X^{(\epsilon)}(t;\gamma)$, one is led to guess that

and therefore, by (4.3.10),
might be the appropriate limit to compute. Further evidence that the preceding is a step in the right direction is provided by the following relatively simple computation.
4.3.11 Lemma. Let $G$ be a bounded, non-empty, open subset of $\mathbb R^N$ and set

(The space $C_c^\infty(G;\mathbb R)$ consists of those $\phi\in C^\infty(\mathbb R^N;\mathbb R)$ with compact support in $G$.) Then

(See Remark 4.3.33 below.)
PROOF: For $x\in\mathbb R^N$ and $\theta\in\Theta$, let $\theta_x$ be the path $t\in[0,\infty)\mapsto x+\theta(t)\in\mathbb R^N$ and define $C_t^{(\epsilon)}(\theta_x)$ accordingly. It is then clear, by the translation invariance of LEBESGUE'S measure, that

for all $x\in\mathbb R^N$. Next, define
$$\zeta(x,\theta)=\inf\{t\ge0:\theta_x(t)\notin G\}.$$

where $u_G(t,x)=W(\{\theta:\zeta(x,\theta)>t\})$. Thus, all that we have to do is show that (4.3.13)

The proof of (4.3.13) depends on an elementary fact about the relation between WIENER'S measure and the FRIEDRICHS extension of $\frac12\Delta$ on $C_c^\infty(G;\mathbb R)$. (We use $\Delta$ here to denote the standard LAPLACE operator on $\mathbb R^N$.) Namely, if $Q_t$ is the operator on $B(G;\mathbb R)$ defined by
$$[Q_t\phi](x)=\int_{\{\theta:\zeta(x,\theta)>t\}}\phi\bigl(\theta_x(t)\bigr)\,W(d\theta),\quad x\in G\text{ and }\phi\in B(G;\mathbb R),$$
then $\{Q_t:t>0\}$ is a sub-MARKOVian semigroup on $B(G;\mathbb R)$ which is weakly continuous on $C_b(G;\mathbb R)$ and satisfies

for all $\phi,\psi\in B(G;\mathbb R)$. In particular, each $Q_t$ admits a unique extension $\bar Q_t$ as a self-adjoint contraction on $L^2(G)$, and $\{\bar Q_t:t>0\}$ becomes a strongly continuous semigroup of self-adjoint contractions whose generator coincides with $\mathcal L$. That is, $\bar Q_t=e^{t\mathcal L}$, $t\in[0,\infty)$. (For more information on such matters, the reader might want to consult [50] or [51].)
With the preceding in hand, we now see that

and so (4.3.13) comes down to checking that (4.3.14)

After combining these we see that

and obviously (4.3.14) is an immediate consequence of this. ∎
Considering how crude the idea behind (4.3.12) appears to be, one may be surprised that, after making the optimal choice of $G$, the right hand side of (4.3.12) turns out to be the limit which we are seeking. The intuitive explanation for this is that a WIENER path $\theta$ either takes an excursion which carries it far away from the origin, with the result that $|C_t^{(\epsilon(t))}(\theta)|$ becomes very large as $t\to\infty$, or $\theta$ remains in some fixed bounded open $G$, in which case its “sausage” eventually fills up the whole of $G$. Although this intuitive picture is appealing, it does not lend itself easily to a rigorous proof. Instead, our derivation of the upper bound will rely on an application of Theorem 4.3.7 and will not make any direct reference to the preceding intuition. In order to arrive at a situation to which that theorem is applicable, we will need to make some preliminary preparations.
Let $R\in(0,\infty)$ be chosen and fixed, and set

Next, introduce on $\Sigma(R)$ the metric
$$D_R(x,y)\equiv\min\{|x+Rk-y|:k\in\mathbb Z^N\},\quad x,y\in\Sigma(R);$$
and observe that $(\Sigma(R),D_R)$ becomes a compact metric space for which the corresponding BOREL field $\mathcal B_{\Sigma(R)}$ coincides with the field $\mathcal B_{\mathbb R^N}[\Sigma(R)]$ of $\mathcal B_{\mathbb R^N}$-measurable subsets of $\Sigma(R)$. Also, define $F_R:\mathbb R^N\to\Sigma(R)$ by

($[\xi]\equiv\max\{n\in\mathbb Z:n\le\xi\}$ for $\xi\in\mathbb R$), and note that $F_R$ is a continuous surjection which is locally isometric. Using $\lambda_R$ to denote the restriction to $\Sigma(R)$ of LEBESGUE'S measure, define $(t,x)\in(0,\infty)\times\Sigma(R)\mapsto P_R(t,x,\cdot)\in M_1(\Sigma(R))$ by
$$P_R(t,x,dy)=\sum_{k\in\mathbb Z^N}\gamma_t(x+Rk-y)\,\lambda_R(dy).$$
It is then an easy matter to check that $P_R(t,x,\cdot)$ is a $\lambda_R$-symmetric, FELLER-continuous transition probability function on $\Sigma(R)$ and that
$$P_R(t,x,\cdot)\Rightarrow\delta_x\quad\text{as }t\searrow0\text{ for each }x\in\Sigma(R).$$
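The periodic metric and the wrapping map are easy to experiment with. The sketch below is only illustrative and rests on two assumptions that the text above does not display: that the cube in question is $[0,R)^N$, and that the wrapping map acts coordinatewise by $x_j\mapsto x_j-R[x_j/R]$, which is what the bracket notation suggests. The function names `F_R` and `D_R` simply mirror the notation.

```python
import itertools
import math

def F_R(x, R):
    """Wrap a point of R^N into the cube [0, R)^N, coordinate by coordinate."""
    return tuple(xi - R * math.floor(xi / R) for xi in x)

def D_R(x, y, R):
    """Periodic distance min{|x + R k - y| : k in Z^N} for x, y in [0, R)^N."""
    # For points of the cube only shifts k with entries in {-1, 0, 1} can be optimal.
    return min(
        math.sqrt(sum((xi + R * ki - yi) ** 2 for xi, yi, ki in zip(x, y, k)))
        for k in itertools.product((-1, 0, 1), repeat=len(x))
    )

R = 1.0

# Two points near opposite corners of the unit square are close in the periodic metric.
near_opposite_corners = D_R((0.1, 0.1), (0.9, 0.9), R)

# Local isometry: nearby points of R^N keep their Euclidean distance after wrapping.
x, y = (2.30, -0.40), (2.35, -0.45)
wrapped = D_R(F_R(x, R), F_R(y, R), R)
euclid = math.dist(x, y)
print(near_opposite_corners, wrapped, euclid)
```

The restriction to shifts in $\{-1,0,1\}^N$ is a design shortcut that is valid only because both arguments already lie in the cube.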
We will use $\mathcal E_R$ to denote the corresponding DIRICHLET form (cf. (4.2.47) and (4.2.48)) and will denote the associated function $\bar J_{P_R}:M_1(\Sigma(R))\to[0,\infty]$ (cf. (4.2.49)) by $I_R$. Finally, choose and fix an even function $\rho\in C_c^\infty\bigl(B_{\mathbb R^N}(0,1);[0,\infty)\bigr)$ having total integral one, and define
$$(t,x)\in(0,\infty)\times\Sigma(R)\mapsto S_{t,R}(x,\cdot)\in M_1(\Sigma(R))$$
by

(Recall that $\epsilon(t)\equiv\epsilon/t^{1/N}$.)
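4.3.15 Lemma. Set
$$E(R)=\{\mu\in M(\Sigma(R)):\mu\ll\lambda_R\},$$
give $E(R)$ the topology induced by the variation norm, and set $E_1(R)=E(R)\cap M_1(\Sigma(R))$. Then $I_R:E_1(R)\to[0,\infty]$ is a good rate function. Moreover, if $(t,\theta)\in(0,\infty)\times\Theta\mapsto L_{t,R}(\theta)\in M_1(\Sigma(R))$ is defined by

then, for every measurable subset $\Gamma$ of $E(R)$,
$$\begin{aligned}
-\inf_{\Gamma^{\circ\,var}}I_R
&\le\underline{\lim_{t\to\infty}}\,\frac1t\log\Bigl(W\bigl(\{\theta:L_{t,R}(\theta)S_{t,R}\in\Gamma\}\bigr)\Bigr)\\
&\le\overline{\lim_{t\to\infty}}\,\frac1t\log\Bigl(W\bigl(\{\theta:L_{t,R}(\theta)S_{t,R}\in\Gamma\}\bigr)\Bigr)\le-\inf_{\overline\Gamma^{\,var}}I_R.
\end{aligned}\tag{4.3.16}$$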
PROOF: Set $\Omega(R)=C([0,\infty);\Sigma(R))$ and turn $\Omega(R)$ into a Polish space by giving it the topology of uniform convergence on finite intervals. Next, for $x\in\Sigma(R)$, let $P_{x,R}\in M_1(\Omega(R))$ be the distribution of $\theta\in\Theta\mapsto F_R\circ\theta_x\in\Omega(R)$ under $W$. Using $\Sigma_{t,R}(\omega)\in\Sigma(R)$ to denote the position of $\omega\in\Omega(R)$ at time $t\in[0,\infty)$ and setting

we see that (4.3.16) is equivalent to
$$-\inf_{\Gamma^{\circ\,var}}I_R\le\underline{\lim_{t\to\infty}}\,\frac1t\log\Bigl(P_{0,R}\bigl(\{\omega:L_{t,R}(\omega)S_{t,R}\in\Gamma\}\bigr)\Bigr)$$

and in order to prove this, it suffices to check that Theorem 4.3.7 applies to $\omega\in\Omega(R)\mapsto L_{t,R}(\omega)S_{t,R}\in M_1(\Sigma(R))$ under the family $\{P_{x,R}:x\in\Sigma(R)\}$.
We begin by showing that $\{P_{x,R}:x\in\Sigma(R)\}$ is a time-homogeneous MARKOV family with transition probability function $P_R(t,x,\cdot)$. To this end, let $\mathcal B_{t,R}$ be the $\sigma$-algebra over $\Omega(R)$ generated by the maps $\omega\in\Omega(R)\mapsto\Sigma_{s,R}(\omega)\in\Sigma(R)$ for $s\in[0,t]$. Next, given $0\le s<t<\infty$, $A\in\mathcal B_{s,R}$, and $\Gamma\in\mathcal B_{\Sigma(R)}$, note that

and therefore, by (ii) of Theorem 1.3.2,

That is, $\{P_{x,R}:x\in\Sigma(R)\}$ is indeed a MARKOV family with transition probability function $P_R(t,x,\cdot)$.
We next note that

for some $M_R\in[1,\infty)$, and therefore that $P_R(t,x,\cdot)$ satisfies ($\tilde{\mathrm U}$). Further, it is clear that $x\in\Sigma(R)\mapsto P_R(1/m,x,\cdot)\in E_1(R)$ is continuous for each $m\in\mathbb Z^+$. Thus, all that remains is to check that $(t,x)\in(0,\infty)\times\Sigma(R)\mapsto S_{t,R}(x,\cdot)\in M_1(\Sigma(R))$ satisfies the conditions (ii) and (iii) stated prior to Lemma 4.3.4. But it is easy to see that

for some $K_R\in(0,\infty)$. Hence, (ii) certainly holds. In addition, since there is a $C_R\in(0,\infty)$ such that, for every $r\in(0,\infty)$, $\Sigma(R)$ can be covered by fewer than $[C_R/r^N]$ $D_R$-balls of radius $r$, one can easily use this same estimate to check that (iii) is also satisfied. ∎
In order to apply Lemma 4.3.15 to our problem, we make a sequence of simple observations. In the first place, it is clear that $F_R(\Gamma)\in\mathcal B_{\Sigma(R)}$ and that $|F_R(\Gamma)|\le|\Gamma|$ for every $\Gamma\in\mathcal B_{\mathbb R^N}$. Secondly, if

In particular, these remarks lead to

and therefore, since

we now have that

4.3.18 Lemma. For every $R\in(0,\infty)$,
$$\overline{\lim_{t\to\infty}}\,\frac1t\log\bigl(X^{(\epsilon)}(t;\gamma)\bigr)\le-\inf\bigl\{\gamma\,|\{f>0\}|+\mathcal E_R(f^{1/2},f^{1/2})\bigr\},\tag{4.3.19}$$
where the infimum is taken over the set

PROOF: Define $\Phi_R:E_1(R)\to[0,R^N]$ by

Since, for $\mu\in E_1(R)$,

where $f=\frac{d\mu}{d\lambda_R}$, it is an easy matter to see that $\Phi_R$ is lower semi-continuous on $E_1(R)$. Furthermore, by (4.3.17),
$$\Phi_R\bigl(L_{t,R}(\theta)S_{t,R}\bigr)\le|C_t^{(\epsilon(t))}(\theta)|,\quad(t,\theta)\in(0,\infty)\times\Theta.$$
Thus, (4.3.19) follows from Lemma 4.3.15 together with Lemma 2.1.8. ∎
Let $\mathfrak G_b$ denote the collection of all non-empty, bounded, open subsets of $\mathbb R^N$. Then, by combining (4.3.12) with (4.3.19), we arrive at
$$\inf\{\gamma|G|+\lambda(G):G\in\mathfrak G_b\}$$
Thus we will know that our limit exists as soon as we show that
$$\inf\{\gamma|G|+\lambda(G):G\in\mathfrak G_b\}$$
The proof of (4.3.21) requires some work. In particular, we must find a more tractable expression for the right hand side of (4.3.21), and this is most easily done by introducing some SOBOLEV-space terminology. Thus, define the SOBOLEV space $H^1(\mathbb R^N)$ to be the completion of $C_c^\infty(\mathbb R^N;\mathbb R)$ with respect to the HILBERT norm

(Throughout this section we will use the classical notation $\nabla\phi$ to denote the EUCLIDean gradient of the function $\phi$.) It is then a familiar and elementary fact that $\phi\in L^2(\mathbb R^N)$ is an element of $H^1(\mathbb R^N)$ if and only if there is a (necessarily unique) $\nabla\phi\in(L^2(\mathbb R^N))^N$ with the property that
$$\int_{\mathbb R^N}\nabla\phi\cdot\Psi\,dx=-\int_{\mathbb R^N}\phi\,(\nabla\cdot\Psi)\,dx,\quad\Psi\in\bigl(C_c^\infty(\mathbb R^N;\mathbb R)\bigr)^N,$$
where
$$\nabla\cdot\Psi\equiv\sum_{j=1}^N\frac{\partial\Psi_j}{\partial x_j}$$
is the EUCLIDean divergence of $\Psi$. Moreover, if $\phi\in H^1(\mathbb R^N)$, then (4.3.22) continues to hold, and therefore $\phi\in H^1(\mathbb R^N)\mapsto\nabla\phi\in(L^2(\mathbb R^N))^N$ is a continuous map.
By analogy with the preceding, we next introduce the SOBOLEV spaces $H^1(\Sigma(R))$ for $R\in(0,\infty)$. To this end, define $C^\infty(\Sigma(R);\mathbb R)$ to be the space of $\phi\in C(\Sigma(R);\mathbb R)$ with the property that $\phi\circ F_R\in C^\infty(\mathbb R^N;\mathbb R)$; and, for $\phi\in C^\infty(\Sigma(R);\mathbb R)$, define $\nabla_R\phi$ to be the restriction of $\nabla(\phi\circ F_R)$ to $\Sigma(R)$. Then, $H^1(\Sigma(R))$ is the HILBERT space obtained by completing $C^\infty(\Sigma(R);\mathbb R)$ with respect to the HILBERT norm
$$\|\phi\|_{H^1(\Sigma(R))}\equiv\Bigl(\|\phi\|^2_{L^2(\lambda_R)}+\bigl\||\nabla_R\phi|\bigr\|^2_{L^2(\lambda_R)}\Bigr)^{1/2};\tag{4.3.23}$$
and, just as before, $\phi\in L^2(\lambda_R)$ is an element of $H^1(\Sigma(R))$ if and only if there is a (unique) $\nabla_R\phi\in(L^2(\lambda_R))^N$ such that

where $\nabla_R\cdot\Psi$ is the restriction to $\Sigma(R)$ of $\nabla\cdot(\Psi\circ F_R)$. In addition, $\|\phi\|_{H^1(\Sigma(R))}$ continues to be given by (4.3.23) for all $\phi\in H^1(\Sigma(R))$.
4.3.24 Lemma. Let $R\in(0,\infty)$. If $\phi\in L^2(\lambda_R)$ and $\mathcal E_R(\phi,\phi)<\infty$, then $\phi\in H^1(\Sigma(R))$ and (4.3.25)

Furthermore, if $R>4$ and
$$\Gamma(R)=\{x\in\mathbb R^N:R^{1/2}\le x_j\le R-R^{1/2}\text{ for }1\le j\le N\},$$
then, for each $\psi\in H^1(\Sigma(R))^+$, there is a $\phi\in H^1(\Sigma(R))^+$ such that

and

PROOF: We begin the proof of the first assertion by checking that (4.3.25) holds for $\phi\in C^\infty(\Sigma(R);\mathbb R)$. To this end, we use (4.2.54) to see that $2\mathcal E_R(\phi,\phi)$ is equal to
and we then apply TAYLOR'S theorem to get (4.3.25). Next, let $\phi\in L^2(\lambda_R)$ with $\mathcal E_R(\phi,\phi)<\infty$ be given, and set

Clearly $\phi_t\in C^\infty(\Sigma(R);\mathbb R)$ for each $t\in(0,\infty)$. Moreover, $\phi_t\to\phi$ in $L^2(\lambda_R)$ and, by (4.2.46) and (4.2.47), $\mathcal E_R(\phi-\phi_t,\phi-\phi_t)\to0$ as $t\to0$. Thus, by (4.3.25) for elements of $C^\infty(\Sigma(R);\mathbb R)$, we know that $\phi_t$ converges in $H^1(\Sigma(R))$ as $t\to0$; and, because $\phi_t\to\phi$ in $L^2(\lambda_R)$, $\phi_t$ must be converging to $\phi$ in $H^1(\Sigma(R))$. In particular, $\phi\in H^1(\Sigma(R))$; and, since, again by (4.2.46), $\mathcal E_R(\phi_t,\phi_t)\to\mathcal E_R(\phi,\phi)$ as $t\to0$, this completes the proof of (4.3.25).
In order to prove the second assertion, we introduce on $L^2(\lambda_R)$ the translation operators $\tau_{a,R}$, $a\in\mathbb R^N$, defined by
$$[\tau_{a,R}\phi](x)=\phi\circ F_R(x-a),\quad\phi\in L^2(\lambda_R).$$
Since, as is easily checked,
$$\lambda_R\bigl(\{x\in\Sigma(R):[\tau_{a,R}\phi]>t\}\bigr)=\lambda_R\bigl(\{x\in\Sigma(R):\phi>t\}\bigr)$$
for all $\mathcal B_{\Sigma(R)}$-measurable $\phi$'s, $a\in\mathbb R^N$, and $t\in\mathbb R$, one sees that each $\tau_{a,R}$ induces an isometry on both $L^2(\lambda_R)$ and $H^1(\Sigma(R))$. Moreover, both
$$(a,\phi)\in\mathbb R^N\times L^2(\lambda_R)\mapsto[\tau_{a,R}\phi]\in L^2(\lambda_R)$$
and
$$(a,\phi)\in\mathbb R^N\times H^1(\Sigma(R))\mapsto[\tau_{a,R}\phi]\in H^1(\Sigma(R))$$
are continuous mappings.
Now let $\psi\in H^1(\Sigma(R))^+$ be given. Then (4.3.26) holds when $\phi=[\tau_{a,R}\psi]$ for any $a\in\mathbb R^N$. In addition,

and therefore, there must exist an $a\in\mathbb R^N$ for which (4.3.27) holds with $\phi=[\tau_{a,R}\psi]$. ∎
4.3.28 Lemma. The right hand side of (4.3.21) dominates
$$\inf\Bigl\{\gamma\,|\{\phi>0\}|+\frac12\int_{\mathbb R^N}|\nabla\phi|^2\,dx:\phi\in H^1(\mathbb R^N)^+\text{ with }\|\phi\|_{L^2(\mathbb R^N)}=1\Bigr\}.$$
PROOF: Clearly there is an $R_0\ge16$ and a $C_1\in(0,\infty)$ such that

For $R\ge R_0$, set

and define

where $\rho\in C_c^\infty\bigl(B_{\mathbb R^N}(0,1);[0,\infty)\bigr)$ has total integral one. Obviously: $\eta_R=1$ on $\Gamma(R)$; $\eta_R\in C_c^\infty(\Sigma^\circ(R);[0,1])$, where
$$\Sigma^\circ(R)=\{x\in\mathbb R^N:0<x_j<R\text{ for }1\le j\le N\};$$
and there exists a $C_2\in(0,\infty)$ for which

Hence, for any $\phi\in H^1(\Sigma(R))$, not only is $\eta_R\phi$ an element of $H^1(\mathbb R^N)$ but also

With these preliminaries, we can now complete the proof as follows. By Lemma 4.3.24, we know that the right hand side of (4.3.21) dominates

Now let $R>R_0$ and $\phi\in H^1(\Sigma(R))^+$ with $\|\phi\|_{L^2(\lambda_R)}=1$ and

for some $C_3,C_4\in(0,\infty)$; and clearly the desired result follows from this. ∎
At this point what we know is that
$$-\inf\{\gamma|G|+\lambda(G):G\in\mathfrak G_b\}$$
Although (4.3.29) appears to be still some distance from our goal, it, in conjunction with a beautiful result from classical potential theory, turns out to be all that we need. To be precise, for measurable $\phi:\mathbb R^N\to[0,\infty)$ define the decreasing rearrangement of $\phi$ to be the non-negative measurable function $\tilde\phi$ on $\mathbb R^N$ with the property that

where $\Omega_N\equiv|B_{\mathbb R^N}(0,1)|$. Obviously, $|\{\tilde\phi>t\}|=|\{\phi>t\}|$ for every $t\in[0,\infty)$, and therefore $\phi\in L^2(\mathbb R^N)\mapsto\tilde\phi\in L^2(\mathbb R^N)$ is an isometry. The beautiful result alluded to states that $\tilde\phi\in H^1(\mathbb R^N)$ and (4.3.30)

if $\phi\in H^1(\mathbb R^N)$. For an elegant proof of this statement, see [74].
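A discrete, one-dimensional sketch may help to visualize what the rearrangement does. It is an illustration only, under assumptions that are not in the text: the function is sampled on a uniform grid, the rearrangement is carried out by sorting the sampled values and placing the largest ones closest to the origin, and the energy $\frac12\int|\nabla\phi|^2\,dx$ is replaced by a finite-difference sum. The two-bump function `phi` and the helper names are hypothetical.

```python
import numpy as np

def symmetric_decreasing_rearrangement(values, x):
    """Place the sampled values in decreasing order of size, by increasing |x|."""
    order = np.argsort(np.abs(x), kind="stable")   # grid points, nearest to 0 first
    out = np.empty_like(values)
    out[order] = np.sort(values)[::-1]             # largest value nearest the origin
    return out

def dirichlet_energy(values, h):
    """Finite-difference stand-in for (1/2) * integral of |phi'|^2."""
    return 0.5 * float(np.sum(np.diff(values) ** 2) / h)

x = np.linspace(-8.0, 8.0, 1601)
h = x[1] - x[0]

# Hypothetical non-negative function with two separated bumps.
phi = np.exp(-(x - 2.0) ** 2) + 0.5 * np.exp(-4.0 * (x + 2.0) ** 2)
phi_star = symmetric_decreasing_rearrangement(phi, x)

l2_before = float(np.sum(phi ** 2) * h)
l2_after = float(np.sum(phi_star ** 2) * h)
energy_before = dirichlet_energy(phi, h)
energy_after = dirichlet_energy(phi_star, h)
print(l2_before, l2_after, energy_before, energy_after)
```

Because the rearranged samples are a permutation of the original ones, the discrete $L^2$ norm is preserved, while in this example merging the two bumps into one lowers the discrete energy.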
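4.3.31 Theorem. (DONSKER & VARADHAN) Set

where the infimum is taken over $\phi\in C_c^\infty\bigl(B_{\mathbb R^N}(0,(1/\Omega_N)^{1/N})\bigr)$ with $\|\phi\|_{L^2(\mathbb R^N)}=1$. Then, for every $\epsilon\in(0,\infty)$,
$$\lim_{t\to\infty}\frac1{t^{N/(N+2)}}\log\Bigl(\int_\Theta\exp\bigl[-\gamma\,|C_t^{(\epsilon)}(\theta)|\bigr]\,W(d\theta)\Bigr)=-k_N(\gamma).$$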
PROOF: In view of (4.3.10) and (4.3.29), all that we have to do is check that
$$\inf\{\gamma|G|+\lambda(G):G\in\mathfrak G_b\}\le k_N(\gamma)\tag{4.3.32}$$

To this end, note that, by an obvious scaling argument,

where $B_A$ denotes the open ball in $\mathbb R^N$ around the origin with volume $A$. Hence,
$$\inf\{\gamma|G|+\lambda(G):G\in\mathfrak G_b\}\le\inf\{\gamma|B_A|+\lambda(B_A):A\in(0,\infty)\}$$

which is the left hand side of (4.3.32). To prove the right hand side of (4.3.32), suppose that $\phi\in H^1(\mathbb R^N)^+$ with $\|\phi\|_{L^2(\mathbb R^N)}=1$ and $A\equiv|\{\phi>0\}|<\infty$ is given. Then, by the result cited above,

where $\tilde\phi$ is the decreasing rearrangement of $\phi$. At the same time, by an elementary mollification procedure, one can easily check that

for every $\delta\in(0,\infty)$. Thus, after letting $\delta\searrow0$, we conclude that
4.3.33 Remark. The reader who is uncomfortable with the sort of DIRICHLET-form technology used in the proof of Lemma 4.3.11 should note that the proof of Theorem 4.3.31 only required our knowing (4.3.12) when $G$ is a ball around the origin, in which case (4.3.12) can be easily derived from familiar, classical facts about the eigenvalues and eigenfunctions for $\frac12\Delta$ with boundary condition 0.
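The minimization over balls that appears in the proof of Theorem 4.3.31 can be carried out numerically in low dimensions. The sketch below rests on assumptions that are not displayed in the text: $\lambda(G)$ is read as the principal DIRICHLET eigenvalue of $-\frac12\Delta$ on $G$, so that for a ball of radius $r$ in $\mathbb R^N$ it equals $j^2/(2r^2)$ with $j$ the first positive zero of the BESSEL function $J_{N/2-1}$ (namely $\pi/2$, approximately $2.404826$, and $\pi$ for $N=1,2,3$). The function names are hypothetical, and the numbers produced are those of this reading rather than a statement of the constant in the theorem.

```python
import math

# First positive zero of the Bessel function J_{N/2 - 1} for N = 1, 2, 3 (assumed inputs).
J_ZERO = {1: math.pi / 2.0, 2: 2.404825557695773, 3: math.pi}
# Lebesgue measure of the unit ball in R^N.
UNIT_BALL_VOLUME = {1: 2.0, 2: math.pi, 3: 4.0 * math.pi / 3.0}

def objective(r, gamma, N):
    """gamma * |B_r| + lambda(B_r), with lambda the eigenvalue of -(1/2) Laplacian."""
    return gamma * UNIT_BALL_VOLUME[N] * r ** N + J_ZERO[N] ** 2 / (2.0 * r ** 2)

def minimise_over_balls(gamma, N, lo=1e-3, hi=1e3, iters=200):
    """Ternary search in log(r); the objective is convex in log(r)."""
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if objective(math.exp(m1), gamma, N) < objective(math.exp(m2), gamma, N):
            b = m2
        else:
            a = m1
    r = math.exp(0.5 * (a + b))
    return r, objective(r, gamma, N)

def closed_form(gamma, N):
    """Value at the critical radius r^(N+2) = j^2 / (N * gamma * |B_1|)."""
    r = (J_ZERO[N] ** 2 / (N * gamma * UNIT_BALL_VOLUME[N])) ** (1.0 / (N + 2))
    return 0.5 * (N + 2) * gamma * UNIT_BALL_VOLUME[N] * r ** N

results = {N: minimise_over_balls(1.0, N)[1] for N in (1, 2, 3)}
scaling_ratio_N1 = minimise_over_balls(8.0, 1)[1] / results[1]
print(results, scaling_ratio_N1)
```

Under these assumptions the minimal value scales like $\gamma^{2/(N+2)}$, which is the same exponent bookkeeping that produces the normalization $t^{N/(N+2)}$ in the theorem.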
4.4 Process Level Large Deviations
In the preceding three sections, we discussed the large deviation theory for the empirical distribution of the position of a MARKOV process. In this section, we will develop the same theory for the empirical distribution of the whole process.
We begin in the setting of MARKOV chains. Thus, let $\Pi$ be a transition probability function on a Polish space $\Sigma$ and denote by $\{P_\sigma:\sigma\in\Sigma\}$ the associated MARKOV family of probability measures on $\Omega=\Sigma^{\mathbb N}$. For $n\in\mathbb N$, define $\theta_n:\Omega\to\Omega$ so that $\Sigma_m(\theta_n\omega)=\Sigma_{m+n}(\omega)$ (recall that $\Sigma_n(\omega)$ is the position of $\omega\in\Omega$ at time $n\in\mathbb N$); and, given $n\in\mathbb Z^+$, define

Once again, under the conditions introduced in Section 4.1, ergodic considerations predict that $R_n(\omega)\Rightarrow P_\mu$ almost surely, where $P_\mu=\int_\Sigma P_\sigma\,\mu(d\sigma)$ and $\mu\in M_1(\Sigma)$ is the $\Pi$-invariant measure discussed in Exercise 4.1.48. Our goal is to describe the large deviation theory for the families
$$\{P_\sigma\circ(R_n)^{-1}:n\ge1\},\quad\sigma\in\Sigma.$$
Note that $L_n(\omega)=R_n(\omega)\circ\Sigma_0^{-1}$ and therefore that the result which we are now pursuing is “higher” than the earlier one.
We will begin with the more modest task of studying the analogous problem for the finite dimensional marginals of the $R_n(\omega)$'s. Namely, for $1\le k<\ell<\infty$, define

and, for $d\ge2$, consider the map

and let $\mu^{(d)}_{\sigma,n}\in M_1(M_1(\Sigma^d))$ denote the distribution of $\omega\mapsto L_n^{(d)}(\omega)$ under $P_\sigma$. We will now develop the large deviation theory for the families $\{\mu^{(d)}_{\sigma,n}:n\ge1\}$ when $\Pi$ satisfies (U). To this end, define the transition probability function $\Pi^{(d)}$ on $\Sigma^d$ by (4.4.3)
for $\sigma^{(d)}\in\Sigma^d$ and $\Gamma\in\mathcal B_{\Sigma^d}$; and let $\{P^{(d)}_{\sigma^{(d)}}:\sigma^{(d)}\in\Sigma^d\}$ be the associated MARKOV family on $\Omega^{(d)}=(\Sigma^d)^{\mathbb N}$. Noting that
$$(\Pi^{(d)})^{d+\ell-1}(\sigma^{(d)},d\tau^{(d)})=\Pi^\ell(\sigma^{(d)}_d,d\tau^{(d)}_1)\,\Pi(\tau^{(d)}_1,d\tau^{(d)}_2)\cdots\Pi(\tau^{(d)}_{d-1},d\tau^{(d)}_d)$$
for $\ell\in\mathbb Z^+$, one sees that (U) implies that

for $\sigma^{(d)}\in\Sigma^d$. Thus, when $\Pi$ satisfies (U), Theorem 4.1.43 applies to the empirical distribution of the position of the MARKOV chain $\{P^{(d)}_{\sigma^{(d)}}:\sigma^{(d)}\in\Sigma^d\}$ and tells us that
$$J^{(d)}_\Pi(\nu)\equiv J_{\Pi^{(d)}}(\nu),\quad\nu\in M_1(\Sigma^d),\tag{4.4.4}$$
is a good rate function and that

for every $\Gamma\in\mathcal B_{M_1(\Sigma^d)}$; where

and $\Sigma^{(d)}_n(\omega^{(d)})$ is the position of $\omega^{(d)}$ at time $n\in\mathbb N$. Since, by the MARKOV property, it is an easy matter to check that for any $n\in\mathbb Z^+$, $\sigma\in\Sigma$, and $\sigma^{(d)}\in\Sigma^d$ with $\sigma^{(d)}_d=\sigma$:

and therefore that

for all $n\in\mathbb Z^+$ and $\sigma\in\Sigma$, we have now proved the following uniform large deviation result.
4.4.5 Lemma. Assume that (U) holds. Then the function $J^{(d)}_\Pi$ is a good rate function on $M_1(\Sigma^d)$ and

for all $\Gamma\in\mathcal B_{M_1(\Sigma^d)}$.
We next want to give an alternative expression for $J^{(d)}_\Pi$. In order to develop this other expression, it will be necessary to recall a basic property of probability measures on a Polish space. Namely, given a Polish space $E$, a countably generated sub-$\sigma$-algebra $\mathcal F$ of $\mathcal B_E$, and a $P\in M_1(E)$, there is a map $x\in E\mapsto P^{\mathcal F}(x,\cdot)\in M_1(E)$ with the properties that
(1) $x\in E\mapsto P^{\mathcal F}(x,B)$ is $\mathcal F$-measurable for every $B\in\mathcal B_E$;
(2) $P^{\mathcal F}(x,A)=\chi_A(x)$, $x\in E$, for each $A\in\mathcal F$;
(3) $P(A\cap B)=\int_AP^{\mathcal F}(x,B)\,P(dx)$ for all $A\in\mathcal F$ and $B\in\mathcal B_E$.
The map $x\in E\mapsto P^{\mathcal F}(x,\cdot)$ is called a regular conditional probability distribution of $P$ given $\mathcal F$ (abbreviated by r.c.p.d. of $P$ given $\mathcal F$). The existence of a regular conditional probability distribution is a well-known but non-trivial fact (cf. Theorem 1.1.8 in [104]) about the measure theory of Polish spaces. On the other hand, it is easy to see that any two r.c.p.d.'s of $P$ given $\mathcal F$ can differ only on an $\mathcal F$-measurable, $P$-null set.
4.4.7 Lemma. Let $E$ be a Polish space and $\mathcal F$ a countably generated sub-$\sigma$-algebra of $\mathcal B_E$. Given $P,Q\in M_1(E)$, let $x\in E\mapsto P^{\mathcal F}(x,\cdot)$ and $x\in E\mapsto Q^{\mathcal F}(x,\cdot)$ be, respectively, r.c.p.d.'s of $P$ and $Q$ given $\mathcal F$. Then $x\in E\mapsto H\bigl(Q^{\mathcal F}(x,\cdot)\,|\,P^{\mathcal F}(x,\cdot)\bigr)$ is $\mathcal F$-measurable; and (4.4.8)

where $P|_{\mathcal F}$ and $Q|_{\mathcal F}$ are the restrictions of $P$ and $Q$ to $\mathcal F$.
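On a finite set everything in Lemma 4.4.7 can be computed by hand, which makes for a useful sanity check. The sketch below assumes that (4.4.8) is the usual chain rule for relative entropy, $H(Q|P)=H(Q|_{\mathcal F}\,|\,P|_{\mathcal F})+\int H(Q^{\mathcal F}(x,\cdot)|P^{\mathcal F}(x,\cdot))\,Q(dx)$; that reading, the six-point space, the partition `blocks` generating $\mathcal F$, and the vectors `P` and `Q` are all assumptions made for the illustration. On a finite set the r.c.p.d. is simply the conditional distribution on the block containing the point.

```python
import numpy as np

def relative_entropy(q, p):
    """H(q|p) = sum q log(q/p) on a finite set, with the convention 0 log 0 = 0."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any((p == 0) & (q > 0)):
        return float("inf")
    m = q > 0
    return float(np.sum(q[m] * np.log(q[m] / p[m])))

# Hypothetical data: E = {0,...,5}; the sigma-algebra F is generated by this partition.
blocks = [[0, 1], [2, 3, 4], [5]]
P = np.array([0.10, 0.20, 0.15, 0.25, 0.10, 0.20])
Q = np.array([0.30, 0.05, 0.05, 0.20, 0.25, 0.15])

# Restrictions of P and Q to F: the masses of the blocks.
P_coarse = np.array([P[b].sum() for b in blocks])
Q_coarse = np.array([Q[b].sum() for b in blocks])
coarse = relative_entropy(Q_coarse, P_coarse)

# Q-average of the relative entropies of the conditional distributions on each block.
fibres = sum(Q[b].sum() * relative_entropy(Q[b] / Q[b].sum(), P[b] / P[b].sum())
             for b in blocks)

total = relative_entropy(Q, P)
print(total, coarse, fibres)
```

The printed values satisfy `total = coarse + fibres` up to rounding, and in particular the entropy of the restrictions never exceeds the full relative entropy.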
PROOF: First note that since, by Lemma 3.2.13,

we have that
$$(\nu,\mu)\in(M_1(E))^2\mapsto H(\nu|\mu)$$
is a lower semi-continuous function; and therefore the $\mathcal F$-measurability of
$$x\in E\mapsto H\bigl(Q^{\mathcal F}(x,\cdot)\,|\,P^{\mathcal F}(x,\cdot)\bigr)$$
is established. Second, observe that if either side of (4.4.8) is finite, then $Q\ll P$. Indeed, $Q\ll P$ by definition if the left hand side is finite. On the other hand, if the right hand side is finite, then $Q|_{\mathcal F}\ll P|_{\mathcal F}$ and there is a $Q$-null set $A\in\mathcal F$ such that $Q^{\mathcal F}(x,\cdot)\ll P^{\mathcal F}(x,\cdot)$ for $x\notin A$. Thus, if $\Gamma\in\mathcal B_E$ is a $P$-null set, then there is a $P$-null set $B\in\mathcal F$ such that $Q^{\mathcal F}(x,\Gamma)=0$ for all $x\notin B$. Since $Q(B)=0$, we conclude that $Q(\Gamma)=0$.
In view of the preceding, it remains only to handle $P$ and $Q$ for which $Q\ll P$. But if $Q\ll P$, then we may and will assume that $Q^{\mathcal F}(x,\cdot)\ll P^{\mathcal F}(x,\cdot)$ for all $x\in E$. Hence, because $\mathcal F$ is countably generated, one can use the Martingale Convergence Theorem to construct an $\mathcal F\times\mathcal B_E$-measurable $f:E^2\to[0,\infty)$ with the property that
$$Q^{\mathcal F}(x,\Gamma)=\int_\Gamma f(x,y)\,P^{\mathcal F}(x,dy),\quad x\in E\text{ and }\Gamma\in\mathcal B_E.$$
Noting that $P^{\mathcal F}(x,[x]_{\mathcal F})=1$ where $[x]_{\mathcal F}\equiv\bigcap\{A\in\mathcal F:x\in A\}$ is the $\mathcal F$-atom containing $x$, one sees that, for each $x\in E$, $f(x,y)=f(y,y)$ (a.e., $P^{\mathcal F}(x,\cdot)$); and therefore
$$Q^{\mathcal F}(x,\Gamma)=\int_\Gamma h(y)\,P^{\mathcal F}(x,dy),\quad x\in E\text{ and }\Gamma\in\mathcal B_E,$$
where $h(y)=f(y,y)$, $y\in E$. Hence, if $g:E\to[0,\infty)$ is an $\mathcal F$-measurable function such that $Q(A)=\int_Ag(y)\,P(dy)$, $A\in\mathcal F$, then $g\cdot h=\frac{dQ}{dP}$. In particular,

With the preceding result in hand, we will be ready to give another expression for $J^{(d)}_\Pi$ as soon as we have introduced a couple of notions. In the first place, we will say that $\nu\in M_1(\Sigma^d)$ is shift-invariant if
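Second, given $d\ge2$ and $\mu\in M_1(\Sigma^{d-1})$, we will denote by $\mu\otimes_d\Pi$ the element of $M_1(\Sigma^d)$ defined by

for $\Gamma\in\mathcal B_{\Sigma^d}$. Note that if $\pi^{(d)}_{d-1}$ denotes the mapping
$$\sigma^{(d)}\in\Sigma^d\mapsto(\sigma^{(d)}_1,\dots,\sigma^{(d)}_{d-1})\in\Sigma^{d-1},$$
then $\mu\otimes_d\Pi$ is uniquely determined by the fact that $(\mu\otimes_d\Pi)\circ(\pi^{(d)}_{d-1})^{-1}=\mu$ together with the fact that
$$\sigma^{(d)}\in\Sigma^d\mapsto\delta_{\pi^{(d)}_{d-1}(\sigma^{(d)})}\otimes_d\Pi$$
is a r.c.p.d. of $\mu\otimes_d\Pi$ given $\mathcal B^{(d)}_{d-1}\equiv(\pi^{(d)}_{d-1})^{-1}(\mathcal B_{\Sigma^{d-1}})$.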
4.4.9 Lemma. Let $\Pi$ be any transition probability on $\Sigma$, and define $J^{(d)}_\Pi$ accordingly. If $\nu\in M_1(\Sigma^d)$, then
$$J^{(d)}_\Pi(\nu)=\begin{cases}H(\nu\,|\,\nu_{d-1}\otimes_d\Pi)&\text{if }\nu\text{ is shift-invariant}\\\infty&\text{otherwise,}\end{cases}\tag{4.4.10}$$
where $\nu_{d-1}\equiv\nu\circ(\pi^{(d)}_{d-1})^{-1}$.
PROOF: First suppose that $\nu$ is not shift-invariant. Then there is a $\psi\in C_b(\Sigma^{d-1};\mathbb R)$ for which

Thus, if $u_\alpha(\sigma^{(d)})=\exp\bigl[\alpha\psi(\sigma^{(d)}_1,\dots,\sigma^{(d)}_{d-1})\bigr]$ for $\alpha>0$ and $\sigma^{(d)}\in\Sigma^d$, then $\log\bigl([\Pi^{(d)}u_\alpha](\sigma^{(d)})\bigr)=\alpha\psi(\sigma^{(d)}_2,\dots,\sigma^{(d)}_d)$, and so

which means that $J^{(d)}_\Pi(\nu)=\infty$.
We next suppose that $\nu$ is shift-invariant. Let $\sigma^{(d)}\in\Sigma^d\mapsto\nu^{(d-1)}(\sigma^{(d)},\cdot)$ be a r.c.p.d. of $\nu$ given $\mathcal B^{(d)}_{d-1}$; and note that, by Lemma 4.4.7,

At the same time, by Lemma 3.2.13,

which obviously dominates

for every $\phi\in C_b(\Sigma^d;\mathbb R)$. But, by shift invariance, the preceding says that $H(\nu\,|\,\nu_{d-1}\otimes_d\Pi)$ dominates

Thus, we have now shown that $J^{(d)}_\Pi(\nu)\le H(\nu\,|\,\nu_{d-1}\otimes_d\Pi)$. On the other hand, by Lemma 3.2.13, JENSEN'S inequality, and shift-invariance:
and clearly this completes the proof. ∎
We are now ready to return to the problem, posed at the beginning of this section, of examining the large deviation theory for the $M_1(\Omega)$-valued random variables in (4.4.1). Actually, as we are about to see, we are already quite close to having such a theory. Indeed, by (ii) in Exercise 3.2.22, we can identify $M_1(\Omega)$ as the projective limit of the sequence $\{M_1(\Sigma^d):d\ge2\}$. Furthermore, if $\pi_d$ is the projection map
$$\omega\in\Omega\mapsto\pi_d(\omega)=(\Sigma_0(\omega),\dots,\Sigma_{d-1}(\omega))\in\Sigma^d,$$
then it is obvious that $\mu^{(d)}_{\sigma,n}$ is the distribution of $\omega\mapsto R_n(\omega)\circ(\pi_d)^{-1}$ under $P_\sigma$. Hence, just as in Exercise 2.1.21, if we set
$$J^{(\infty)}_\Pi(Q)\equiv\sup_{d\ge2}J^{(d)}_\Pi\bigl(Q\circ(\pi_d)^{-1}\bigr),\quad Q\in M_1(\Omega),\tag{4.4.11}$$
then we have the following uniform large deviation result as a consequence of Lemma 4.4.5.
4.4.12 Theorem. Assume that (U) holds, and define $J^{(\infty)}_\Pi$ as in (4.4.11). Then $J^{(\infty)}_\Pi$ is a good rate function and

for every $\Gamma\in\mathcal B_{M_1(\Omega)}$. (See (4.4.16) below for more information about $J^{(\infty)}_\Pi$.)
Although Theorem 4.4.12 in conjunction with Lemma 4.4.9 provides a reasonably satisfactory description of the large deviation theory under consideration, it would be an even better theory if we could find a more direct method of computing $J^{(\infty)}_\Pi(Q)$. Unfortunately, in order to get a nicer expression for $J^{(\infty)}_\Pi$ it will be necessary for us to introduce some additional notation.
We will say that $Q\in M_1(\Omega)$ is shift-invariant if $Q=Q\circ\theta_n^{-1}$ for every $n\ge1$, and we will use $M_1^S(\Omega)$ to denote the set of all shift-invariant $Q\in M_1(\Omega)$. Note that, by Lemma 4.4.9, we need only concern ourselves with the computation of $J^{(\infty)}_\Pi(Q)$ for $Q\in M_1^S(\Omega)$ since $J^{(\infty)}_\Pi(Q)=\infty$ for $Q\notin M_1^S(\Omega)$. Next, for $n\in\mathbb Z$, set $\mathbb Z_n=\mathbb Z\cap(-\infty,n]$, $\Omega^*_n=\Sigma^{\mathbb Z_n}$, and use $\Sigma_m(\omega^*)$ to denote the position of $\omega^*\in\Omega^*_n$ at time $m\in\mathbb Z_n$. Given $n\in\mathbb Z$ and $Q\in M_1^S(\Omega)$, one can use the KOLMOGOROV Extension Theorem to show that there is a unique $Q^*_n\in M_1(\Omega^*_n)$ with the property that
$$Q^*_n\bigl(\{\omega^*\in\Omega^*_n:(\Sigma_{-d+n}(\omega^*),\dots,\Sigma_n(\omega^*))\in\Gamma\}\bigr)=Q\bigl(\{\omega\in\Omega:(\Sigma_0(\omega),\dots,\Sigma_d(\omega))\in\Gamma\}\bigr)$$
for all $d\ge1$ and $\Gamma\in\mathcal B_{\Sigma^{d+1}}$. Next, given $P\in M_1(\Omega^*_0)$, define $P\otimes_0\Pi$ to be the unique element of $M_1(\Omega^*_1)$ satisfying

for all $d\ge1$ and $\psi\in B(\Sigma^{d+2};\mathbb R)$. Finally, for $n\in\mathbb Z$ and $k\le\ell\le n$, let $\mathcal B^{(n)}_{k,\ell}$ denote the $\sigma$-algebra over $\Omega^*_n$ generated by the map
$$\omega^*\in\Omega^*_n\mapsto(\Sigma_k(\omega^*),\dots,\Sigma_\ell(\omega^*))\in\Sigma^{\ell-k+1};$$
and, for $P,P'\in M_1(\Omega^*_n)$, let $H^{(n)}_{k,\ell}(P'|P)$ denote the relative entropy of the restrictions to $\mathcal B^{(n)}_{k,\ell}$ of the measures $P$ and $P'$. It is then an easy matter to see that, for any $Q\in M_1^S(\Omega)$ and $d\ge1$, Lemma 4.4.9 becomes the statement that

Hence,
$$J^{(\infty)}_\Pi(Q)=\sup_{d\ge1}H^{(1)}_{-d,1}\bigl(Q^*_1\,|\,Q^*_0\otimes_0\Pi\bigr),\quad Q\in M_1^S(\Omega).\tag{4.4.14}$$
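In order to take advantage of the expression in (4.4.14), we will need the following simple continuity result for the relative entropy functional.

4.4.15 Lemma. Let $(E, \mathcal{F})$ be a measurable space, and suppose that $\{\mathcal{F}_n\}_1^\infty$ is a non-decreasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$ such that $\bigcup_1^\infty \mathcal{F}_n$ generates $\mathcal{F}$. Then, for any pair of probability measures $P$ and $Q$ on $(E, \mathcal{F})$,

$$H\bigl(Q|_{\mathcal{F}_n} \,\big|\, P|_{\mathcal{F}_n}\bigr) \nearrow H(Q|P) \quad \text{as } n \to \infty.$$

PROOF: By the argument used to prove Lemma 3.2.13, we know that

$$H\bigl(Q|_{\mathcal{F}_n} \,\big|\, P|_{\mathcal{F}_n}\bigr) = \sup\Bigl\{\int_E \psi\, dQ - \log \int_E e^{\psi}\, dP : \psi \in B(E, \mathcal{F}_n; \mathbb{R})\Bigr\},$$

where $B(E, \mathcal{F}_n; \mathbb{R})$ denotes the space of bounded, $\mathcal{F}_n$-measurable $\psi : E \to \mathbb{R}$. At the same time,

$$H(Q|P) = \sup\Bigl\{\int_E \psi\, dQ - \log \int_E e^{\psi}\, dP : \psi \in B(E, \mathcal{F}; \mathbb{R})\Bigr\}.$$

Hence, it is clear that $n \mapsto H(Q|_{\mathcal{F}_n} \,|\, P|_{\mathcal{F}_n})$ is non-decreasing and that its limit does not exceed $H(Q|P)$. On the other hand, the class of $\psi \in B(E, \mathcal{F}; \mathbb{R})$ for which

$$\int_E \psi\, dQ - \log \int_E e^{\psi}\, dP \leq \lim_{n \to \infty} H\bigl(Q|_{\mathcal{F}_n} \,\big|\, P|_{\mathcal{F}_n}\bigr)$$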
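is closed under bounded, point-wise convergence and, obviously, contains $B(E, \mathcal{F}_n; \mathbb{R})$ for all $n \geq 1$. ∎

Combining Lemma 4.4.15 with (4.4.14), we now see that

$$J_\Pi^{(\infty)}(Q) = \begin{cases} H\bigl(Q_1^* \,\big|\, Q_0^* \otimes_0 \Pi\bigr) & \text{if } Q \in M_1^S(\Omega) \\ \infty & \text{otherwise.} \end{cases} \tag{4.4.16}$$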
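The monotone convergence asserted in Lemma 4.4.15 is easy to watch on a finite space. The following is only an illustrative sketch under assumptions of our own: $E$ is taken to be an eight-point set, $\mathcal{F}_n$ is generated by the partition of $E$ into $2^n$ consecutive blocks, and the weights `P` and `Q`, like the helper names `relative_entropy` and `coarsen`, are hypothetical choices made for the example.

```python
import math

def relative_entropy(q, p):
    """H(Q|P) = sum_i q_i log(q_i / p_i) on a finite space (0 log 0 = 0)."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:
            if pi == 0.0:
                return math.inf
            total += qi * math.log(qi / pi)
    return total

def coarsen(weights, n_blocks):
    """Restrict a distribution to the sigma-algebra generated by
    n_blocks consecutive blocks of equal size."""
    size = len(weights) // n_blocks
    return [sum(weights[i * size:(i + 1) * size]) for i in range(n_blocks)]

# Hypothetical data: P is uniform on 8 points, Q is an arbitrary law.
P = [1 / 8] * 8
Q = [0.30, 0.05, 0.10, 0.15, 0.02, 0.08, 0.20, 0.10]

# Relative entropy of the restrictions to F_0, F_1, F_2, F_3 = F.
entropies = [
    relative_entropy(coarsen(Q, 2 ** n), coarsen(P, 2 ** n))
    for n in range(4)
]
for n, h in enumerate(entropies):
    print(f"n = {n}: H(Q|F_n | P|F_n) = {h:.6f}")
```

The printed values increase with $n$ and the last one coincides with $H(Q|P)$, since $\mathcal{F}_3$ is already the full $\sigma$-algebra in this toy setting.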
When (4.4.16) is put together with (4.4.12), we obtain a version of the process level large deviation result proved originally by DONSKER and VARADHAN [36].

Having dealt with the discrete time setting, we now want to see whether we cannot prove the analogous result in the continuous-time context. However, before we can do so, we must arrange that the sample space $\Omega$ at the beginning of Section 4.2 be itself amenable to a Polish structure. For this reason, we will assume that $\Omega$ is the space of right-continuous paths $\omega : [0,\infty) \to \Sigma$ which have a left limit at each $t \in (0,\infty)$. Next, define $\Omega_T = D([0,T];\Sigma)$ for $T \in (0,\infty)$ to be the SKOROKHOD space of right-continuous $\omega_T : [0,T] \to \Sigma$ which have a left limit at each $t \in (0,T]$ and are left-continuous at $T$, and endow $\Omega_T$ with the SKOROKHOD topology. That is, we give $\Omega_T$ the topology induced by the metric

for $\omega_T, \omega_T' \in \Omega_T$, where $\lambda$ runs over all increasing homeomorphisms of $[0,T]$ onto itself. The following facts about the SKOROKHOD topology on $\Omega_T$ are standard and will be important for our development below.
(i) The SKOROKHOD topology on $\Omega_T$ is Polish in the sense that it is separable and admits a complete metric.

(ii) The $\sigma$-algebra over $\Omega_T$ generated by the maps $\omega_T \in \Omega_T \longmapsto \omega_T(t) \in \Sigma$, $t \in [0,T]$, coincides with the BOREL field $\mathcal{B}_{\Omega_T}$.
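To see why one allows time changes $\lambda$ at all, it may help to compare two step paths whose jumps occur at slightly different times. The sketch below is an illustration under stated assumptions, not the metric of the text: it takes $T = 1$, real-valued paths, and the classical quantity $\max\{\sup_t|\lambda(t)-t|,\ \sup_t|\omega(t)-\omega'(\lambda(t))|\}$ evaluated at one particular piecewise-linear $\lambda$, which gives an upper bound for the usual infimum over $\lambda$. The names `step_path`, `time_change`, and `sup_over_grid` are ours, and suprema are approximated on a finite grid.

```python
def step_path(jump_time):
    """Right-continuous path on [0, 1] jumping from 0 to 1 at jump_time."""
    return lambda t: 1.0 if t >= jump_time else 0.0

def time_change(a, b, T=1.0):
    """Piecewise-linear increasing homeomorphism of [0, T] sending a to b."""
    def lam(t):
        if t <= a:
            return t * b / a
        return b + (t - a) * (T - b) / (T - a)
    return lam

def sup_over_grid(f, T=1.0, n=10_000):
    """Approximate sup of f over [0, T] by a maximum over a uniform grid."""
    return max(f(T * k / n) for k in range(n + 1))

delta = 0.01
w, w_prime = step_path(0.5), step_path(0.5 + delta)
lam = time_change(0.5, 0.5 + delta)

# Without a time change the two paths are at uniform distance 1.
uniform_dist = sup_over_grid(lambda t: abs(w(t) - w_prime(t)))

# With the time change, the paths match exactly and only time is distorted.
time_distortion = sup_over_grid(lambda t: abs(lam(t) - t))
path_mismatch = sup_over_grid(lambda t: abs(w(t) - w_prime(lam(t))))
skorokhod_upper_bound = max(time_distortion, path_mismatch)

print(uniform_dist, skorokhod_upper_bound)
```

The uniform distance stays equal to 1 however small `delta` is, whereas the bound obtained from the time change is of order `delta`; this is the sense in which the SKOROKHOD topology regards nearby jump times as close.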
Next, for $0 < T_1 < T_2 < \infty$, define $\pi^{(T_2)}_{T_1} : \Omega_{T_2} \to \Omega_{T_1}$ so that

Unfortunately, although these natural restriction maps are, by (ii), measurable, they are not continuous. Thus, it is not possible for us to simply define the topology on $\Omega$ as the projective limit of the topologies on the $\Omega_T$'s. However, we will postpone the consideration of this technicality until later on in our development and will, for now, content ourselves with the introduction of the projective limit measurable structure on $\Omega$; which, according to (ii), is the one induced by the position maps $\omega \in \Omega \longmapsto \Sigma_t(\omega)$, $t \in [0,\infty)$. Thus, we will use $\mathcal{B}_t$, $t \in [0,\infty)$, to stand for the $\sigma$-algebra over $\Omega$ generated by $\omega \in \Omega \longmapsto \Sigma_s(\omega)$, $s \in [0,t]$, and we will use $\mathcal{B}$ to denote the smallest $\sigma$-algebra over $\Omega$ containing $\bigcup_{t \in [0,\infty)} \mathcal{B}_t$. Obviously, $(t,\omega) \in [0,\infty) \times \Omega \longmapsto \Sigma_t(\omega) \in \Sigma$ is $\{\mathcal{B}_t : t \in [0,\infty)\}$-progressively measurable. In addition, for each $T \in (0,\infty)$, the map $\pi_T : \Omega \to \Omega_T$ defined by

is a measurable surjection; and, in fact,

$$\sigma\Bigl(\bigcup_{t \in [0,T)} \mathcal{B}_t\Bigr) = \pi_T^{-1}\bigl(\mathcal{B}_{\Omega_T}\bigr).$$
4.4.17 Warning.
Throughout the rest of this section, $\Omega$ will be the path-space just described, and we will be assuming that the transition probability function $P(t,\sigma,\cdot)$ permits us to realize the corresponding MARKOV family $\{P_\sigma : \sigma \in \Sigma\}$ on $(\Omega, \mathcal{B})$ with $\Sigma_t(\omega)$ being the position of $\omega \in \Omega$ at time $t \in [0,\infty)$. Furthermore, we will be assuming that

$$P_\sigma\bigl(\{\omega : \Sigma_t(\omega) = \lim_{s \nearrow t} \Sigma_s(\omega)\}\bigr) = 1, \quad (t,\sigma) \in (0,\infty) \times \Sigma. \tag{4.4.18}$$
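In what follows, it will be convenient to have a notation for the "splice" of two paths. For this reason, if $T \in (0,\infty)$, $\omega_T \in \Omega_T$, and $\omega' \in \Omega$, define $\bar\omega_T \in \Omega$ by $\bar\omega_T(t) = \omega_T(t \wedge T)$, $t \in [0,\infty)$, and $\omega_T \otimes_T \omega' \in \Omega$ so that $\omega_T \otimes_T \omega' = \bar\omega_T$ if $\Sigma_0(\omega') \neq \omega_T(T)$ and

if $\Sigma_0(\omega') = \omega_T(T)$. Observe that the map $(\omega_T, \omega') \in \Omega_T \times \Omega \longmapsto \omega_T \otimes_T \omega'$ is measurable. When $\omega, \omega' \in \Omega$, set $\omega \otimes_T \omega' = (\pi_T \omega) \otimes_T \omega'$. Finally, for

for $\psi \in B((\Omega, \mathcal{B}); \mathbb{R})$. We will also need the time-shift semigroup $(\theta_t : t \geq 0)$ on $\Omega$. Namely, for $t \in [0,\infty)$, define the time-shift $\theta_t : \Omega \to \Omega$ by $\Sigma_s(\theta_t \omega) = \Sigma_{s+t}(\omega)$, $s \in [0,\infty)$; and note that $(t,\omega) \in [0,\infty) \times \Omega \longmapsto \theta_t \omega \in \Omega$ is measurable.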
With these preliminaries taken care of, we can begin to formulate the problem which we want to study. To this end, let $\omega \in \Omega \longmapsto R_t(\omega) \in M_1((\Omega, \mathcal{B}))$ (the probability measures on the measurable space $(\Omega, \mathcal{B})$) be the map given by (4.4.19)

where $\lambda_{[0,t]}$ denotes normalized LEBESGUE measure on $[0,t]$. (Cf. the comment in Remark 4.2.2 following the definition of $L_t(\omega)$ for the appropriate expression when the paths are regular.) What we want to do is analyze the large deviations of $\{R_t : t > 0\}$ under the measures $P_\sigma$. Of course, as yet, we do not even have a topological structure on $M_1((\Omega, \mathcal{B}))$ and therefore are not really in a position to carry out such an analysis. Nonetheless, just as in the MARKOV chain case, our analysis will actually be accomplished at the level of the finite time-marginals of the $R_t$'s; and this analysis we are ready to do. Unfortunately, although the ideas here are just as intrinsically simple as the ones in the discrete time setting, technicalities introduced by the continuity of time tend to make them appear more complicated than they really are.

Given $T \in (0,\infty)$ and $\omega_T \in \Omega_T$, define

It is a relatively easy matter to check that $\omega_T \in \Omega_T \longmapsto P^{(T)}_{\omega_T} \in M_1((\Omega, \mathcal{B}))$ is measurable. What is less obvious is that $\{P^{(T)}_{\omega_T} : \omega_T \in \Omega_T\}$ satisfies the MARKOV property described in the following.

4.4.21 Lemma. For each $T \in (0,\infty)$, $\omega_T \in \Omega_T$, and $\omega' \in \Omega$,
In addition, for each $T \in (0,\infty)$, $\omega_T \in \Omega_T$, $s \in [0,\infty)$, and $A \in \mathcal{B}_{s+T}$,
S, (esw/> Q
P~T)(~~I)
(4.4.22)
for every $\Phi \in B((\Omega, \mathcal{B}); \mathbb{R})$.
PROOF:To prove the first assertion, note that d i s t ( q , V ( @ t ( W@T w ‘ ) ) ) 5 dist ( W T , r T ( O t Z T ) ) dist (T~+T;T, T
+
~ + T ( ~ @T T
w’))
To prove (4.4.22), set
A , = {w’ : WT
@T w‘ E
A},
and first suppose that s 5 T . Then Os(wT @T w ’ ) = (O,Zs,) so, by the MARKOVproperty for the P, ’s,
@ . T - ~ w’,
and
since, by (4.4.18), C,(w’) = [ T T ( ~ , ( w T @ T W ’ ) ) ]( T )for P,,(T)-alrnost every w’ E fl. When s > T , a similar argument, based on the identities
-
for w’ E R with Co(w‘) = W T (T ),yields the desired result. For T E (0,oo) define ( t , W T ) E (0, m) XRT p ( T ) ( tW,T , .) E M I ( ~ T ) so that
Then, by (4.4.22), P ( T ) ( tW, T , is a transition probability function on f l ~ . In addition, by the first part of Lemma 4.4.21, P ( T ) ( t , ~ ~; ,I . )S, as t \ 0. Finally, if BiT) = Bt+T, t E [0,m), then the map a)
( t , w ) E [o,m) x
R
-
= TT(etW)
c ~ ) ( W )
E RT
is {Z3iT): t E [0, 00))-progressively measurable and
r}
=
H T )(t,xiT),r)
(a.e., pi:)) for all s, t E [0,m) and every l? E Bn,. Thus, we are in the situation treated in Section 4.2. In fact, if the original transition probability function P(t,a,.) on C satisfies then it is clear that ~ i T ) ( { w :’ C L T ; ( W ’ ) E
(o),
p ( T ) (ft T ,W T , ’) pl(dt)
(W’)
6;,
P ( T ) ( t + T , w ~ , ’ ) p z ( d t ) , W T , W & E flT.
Thus (cf. Remark 4.2.8 as well as Exercise 4.2.61), the following statement is just an application of the results proved in Section 4.2. 4.4.23 Lemma. Assume that P ( t , a , . ) satisfies given, and define $)
-
= Jp(q: MI( f l ~ )
(a),let T E (0,m) be
[O, m]
in terms of P ( T ) ( t ,.)~ in ~ ,the same way as Jp is defined by (4.2.36) in terms of P(t,cr, .). Then: the level sets of $? are strongly compact; -(TI 1 1 (4.4.24) Jp (v) = sup-J ( T ) ( v= ) lim -J ( T ) ( v ) , v E MI(!&), h>O h * h h\Oh nh where ~(LT’(wT,.) (4.1.38); and
= P ( T ) ( hW, T , . )
and
is defined accordingly as in
and $\lambda_{[0,t]}$ denotes normalized LEBESGUE measure on $[0,t]$. The reason for our choosing to state (4.4.25) relative to the strong topology on $M_1(\Omega_T)$ will become clear as we develop the theory for unbounded time intervals.
Noting that the distribution of W'
EQ
(Rt(d))0 rG1
E Mi
(a,)
under P,,(T)coincides with that of W'
E Q I-+ LIT)(&&) E Mi (RT)
under Pw, (TI, we see that, as long as
CJ
= WT(T),
Moreover, having stated Lemma 4.4.23 in terms of the strong topology, we can circumvent the technical objection (raised after our initial discussion of the SKOROKHOD topology) to putting an inductive limit topology on $\Omega$. That is, we will entirely avoid putting a topology on $\Omega$ itself and will, instead, go directly to the projective limit strong topology on $M_1((\Omega, \mathcal{B}))$. To be precise, we consider the topology on $M_1((\Omega, \mathcal{B}))$ for which the sets (4.4.26)

form a neighborhood basis at $Q$ as the test function runs over the bounded functions which are $\mathcal{B}_T$-measurable for some $T \in [0,\infty)$. Clearly this is the projective limit, under the maps

$$Q \in M_1((\Omega, \mathcal{B})) \longmapsto Q \circ \pi_T^{-1} \in M_1(\Omega_T),$$

of the strong topology on the spaces $M_1(\Omega_T)$. In particular,

$$K \subset\subset M_1((\Omega, \mathcal{B})) \quad \text{if and only if} \quad K = \bigcap_{d=1}^{\infty} \{Q : Q \circ \pi_d^{-1} \in K_d\},$$

where $K_d$ is a strongly compact subset of $M_1(\Omega_d)$ for each $d \in \mathbb{Z}^+$. Finally, we will say that $\Gamma \subseteq M_1((\Omega, \mathcal{B}))$ is measurable if it is an element of the $\sigma$-algebra over $M_1((\Omega, \mathcal{B}))$ generated by the sets in (4.4.26). In case there is any doubt about it, we point out that this notion of measurability will, in most cases, be much more restrictive than the one determined by the BOREL structure associated with the projective limit strong topology.

Having made these preparations, we can now prove the following large deviation principle.
4.4.27 Theorem. Assume that $P(t,\sigma,\cdot)$ satisfies $(\tilde{U})$ and define the function $\bar J_P^{(\infty)} : M_1((\Omega, \mathcal{B})) \to [0,\infty]$ by

$$\bar J_P^{(\infty)}(Q) = \sup\bigl\{\bar J_P^{(T)}(Q \circ \pi_T^{-1}) : T \in (0,\infty)\bigr\} \tag{4.4.28}$$

for each $Q \in M_1((\Omega, \mathcal{B}))$. (See (4.4.38) below for more information about $\bar J_P^{(\infty)}$.) Then the level sets of $\bar J_P^{(\infty)}$ are compact; and, for every measurable $\Gamma \subseteq M_1((\Omega, \mathcal{B}))$,

$$-\inf_{\Gamma^\circ} \bar J_P^{(\infty)} \leq \varliminf_{t \to \infty} \frac{1}{t} \log\Bigl(\inf_{\sigma \in \Sigma} P_\sigma(\{\omega : R_t(\omega) \in \Gamma\})\Bigr) \leq \varlimsup_{t \to \infty} \frac{1}{t} \log\Bigl(\sup_{\sigma \in \Sigma} P_\sigma(\{\omega : R_t(\omega) \in \Gamma\})\Bigr) \leq -\inf_{\bar\Gamma} \bar J_P^{(\infty)}. \tag{4.4.29}$$

PROOF: In view of (4.4.26) and (4.4.25), the only part of this statement which requires comment is the last inequality in (4.4.29). The main difficulty in the proof stems from the fact that the strong topology on $M_1(\Omega_T)$ is not first countable. In particular, it is not immediately clear whether the strongly compact sets $\{\nu \in M_1(\Omega_T) : \bar J_P^{(T)}(\nu) \leq L\}$ are sequentially compact in the strong topology. To see that they are, let $\{\nu_n\}_1^\infty \subseteq M_1(\Omega_T)$ satisfying $\bar J_P^{(T)}(\nu_n) \leq L < \infty$ for all $n \in \mathbb{Z}^+$ be given. Then, because $\{\nu \in M_1(\Omega_T) : \bar J_P^{(T)}(\nu) \leq L\}$ is weakly compact, we can choose a subsequence $\{\nu_{n'}\}$ which converges weakly to some $\nu \in M_1(\Omega_T)$. But $\{\nu \in M_1(\Omega_T) : \bar J_P^{(T)}(\nu) \leq L\}$ is also strongly compact, and therefore $\nu_{n'} \to \nu$ in the strong topology.

Once one has the preceding, it becomes an easy matter to check that if
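$$\{Q_d\}_{d=1}^{\infty} \subseteq M_1((\Omega, \mathcal{B})) \quad \text{and} \quad \sup_{d \geq 1} \bar J_P^{(d)}(Q_d \circ \pi_d^{-1}) < \infty,$$

then there is a subsequence $\{Q_{d_n}\}$ which converges in $M_1((\Omega, \mathcal{B}))$.

Now suppose that $\Gamma$ is a measurable subset of $M_1((\Omega, \mathcal{B}))$ and that $F \supseteq \Gamma$ is a closed subset of $M_1((\Omega, \mathcal{B}))$. Choose $\{F_d\}_1^\infty$ so that $F = \bigcap_{d=1}^{\infty} \{Q : Q \circ \pi_d^{-1} \in F_d\}$ and each $F_d$ is a strongly closed set in $M_1(\Omega_d)$. We then know, from (4.4.25), that

$$\varlimsup_{t \to \infty} \frac{1}{t} \log\Bigl(\sup_{\sigma \in \Sigma} P_\sigma(\{\omega : R_t(\omega) \in \Gamma\})\Bigr) \leq -\sup_{d \in \mathbb{Z}^+} \inf_{F_d} \bar J_P^{(d)}.$$

Next, suppose that $C = \sup_{d \in \mathbb{Z}^+} \inf_{F_d} \bar J_P^{(d)} < \infty$. Then we can choose $\{Q_d\}_1^\infty \subseteq F$ so that $\bar J_P^{(d)}(Q_d \circ \pi_d^{-1}) = \inf_{F_d} \bar J_P^{(d)} \leq C$. Thus, by the preceding paragraph, we can find a subsequence $\{Q_{d_n}\}$ and a $Q \in M_1((\Omega, \mathcal{B}))$ so that $Q_{d_n} \to Q$ in $M_1((\Omega, \mathcal{B}))$. Since $F$ is closed, $Q \in F$, and clearly $\bar J_P^{(\infty)}(Q) \leq C$. ∎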
Once again, we want to develop a better expression for our rate function. Our development will turn on (4.4.24) combined with the sort of reasoning with which we solved the analogous problem in the case of MARKOV chains.

In what follows, it will be handy to have some more notation. In the first place, if $\nu \in M_1(\Omega_T)$ for some $T \in (0,\infty)$ or if $Q \in M_1((\Omega, \mathcal{B}))$, then we will use $\nu_t$, $t \in [0,T]$, or $Q_t$, $t \in [0,\infty)$, to denote $\nu \circ (\pi_t^{(T)})^{-1}$ or $Q \circ \pi_t^{-1}$, respectively. Secondly, for $T \in (0,\infty)$ and $h \in [0,T]$, define $\theta_h^{(T)} : \Omega_T \to \Omega_T$ by $\theta_h^{(T)} \omega_T = \pi_T(\theta_h \bar\omega_T)$ and say that $\nu \in M_1(\Omega_T)$ is shift-invariant if $[\nu \circ (\theta_h^{(T)})^{-1}]_{T-h} = \nu_{T-h}$ for all $h \in [0,T]$. Similarly, we say that $Q \in M_1((\Omega, \mathcal{B}))$ is shift-invariant if $Q \circ \theta_h^{-1} = Q$ for every $h \in [0,\infty)$; and we will use $M_1^S(\Omega_T)$ and $M_1^S((\Omega, \mathcal{B}))$ to denote the sets of shift-invariant elements of $M_1(\Omega_T)$ and $M_1((\Omega, \mathcal{B}))$, respectively.
4.4.30 Lemma. $Q \in M_1((\Omega, \mathcal{B}))$ is shift-invariant if and only if $Q_T \in M_1^S(\Omega_T)$ for every $T \in (0,\infty)$. Moreover, if $\nu \in M_1^S(\Omega_T)$, then

In particular, if $Q \in M_1^S((\Omega, \mathcal{B}))$, then, for each $t \in (0,\infty)$, $\Sigma_t(\omega) = \lim_{s \nearrow t} \Sigma_s(\omega)$ for $Q$-almost every $\omega \in \Omega$.

PROOF: Obviously it suffices to prove the second assertion. To this end, define

Because each $\omega_T$ has at most countably many discontinuities,

where $\lambda_{[0,T]}$ denotes normalized LEBESGUE measure on $[0,T]$. On the other hand, if $\nu \in M_1^S(\Omega_T)$, then

is independent of $t \in (0,T]$; and therefore, by FUBINI's Theorem, we get the desired result. ∎
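For $T \in [0,\infty)$ and $\nu \in M_1(\Omega_T)$, we set (4.4.31)

and when $Q \in M_1((\Omega, \mathcal{B}))$, we use $Q \otimes_T P_\cdot$ instead of $Q_T \otimes_T P_\cdot$. Note that $\nu \otimes_T P_\cdot$ is the unique $Q \in M_1((\Omega, \mathcal{B}))$ with the properties that $Q_T = \nu$ and $\omega \in \Omega \longmapsto P^{(T)}_{\pi_T \omega}$ is a r.c.p.d. of $Q$ given $\pi_T^{-1}(\mathcal{B}_{\Omega_T})$. In addition, by (4.4.18) and the MARKOV property for $\{P_\sigma : \sigma \in \Sigma\}$, one can easily check that

for $0 \leq T_1 < T_2 < \infty$ and $\nu \in M_1(\Omega_{T_1})$.

4.4.33 Lemma. Let $T \in (0,\infty)$ and $\nu \in M_1(\Omega_T)$ be given. If $\nu$ is not shift-invariant, then

On the other hand, if $\nu \in M_1^S(\Omega_T)$, then, for $h \in (0,T)$, (4.4.34)

where $\omega_T \in \Omega_T \longmapsto \nu^{(T-h)}(\omega_T, \cdot)$ is a r.c.p.d. of $\nu$ given $(\pi^{(T)}_{T-h})^{-1}(\mathcal{B}_{\Omega_{T-h}})$. In particular, if $\nu \in M_1^S(\Omega_T)$ and $s, t > 0$ satisfy $s + t < T$, then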
PROOF:First note that
( ~ 2 ,(flT_h)-measurable, )-
for any $ E B ( ~ T R).; In particular, if '$ is = $ o Of); and if v E M ~ ( R T )then , then [IIiT)$]
1
With these preliminaries, the argument used to prove Lemma 4.4.9 can be easily adapted to prove the first assertion of the present lemma as well as (4.4.34). Finally, by combining Lemma 4.4.7 with (4.4.32), we see that

Thus, if $\nu$ is shift-invariant, then (4.4.35) follows from (4.4.34). ∎

As was the case in the MARKOV chain setting, in order to complete our program it will be convenient to move our measures to the left half-line. Thus, for $T \in [0,\infty)$, let $\Omega_T^*$ be the space of right-continuous paths $\omega_T^* : (-\infty, T] \to \Sigma$ which have a left limit at each $t \in (-\infty, T]$ and are left-continuous at $T$. For $-\infty < s \leq t \leq T < \infty$, denote by $\mathcal{B}^{(T)}_{[s,t]}$ the $\sigma$-algebra over $\Omega_T^*$ generated by the maps $\omega_T^* \in \Omega_T^* \longmapsto \omega_T^*(\tau) \in \Sigma$ for $\tau \in [s,t]$; and use $\mathcal{B}^{(T)}$ to stand for the smallest $\sigma$-algebra over $\Omega_T^*$ which contains $\mathcal{B}^{(T)}_{[s,T]}$ for all $s \in (-\infty, T]$.

4.4.36 Lemma. Let $Q \in M_1^S((\Omega, \mathcal{B}))$ be given. Then, for every $T \in [0,\infty)$, there is a unique $Q_T^* \in M_1^S((\Omega_T^*, \mathcal{B}^{(T)}))$ with the property that

for every $n \in \mathbb{Z}^+$, $-\infty < t_1 < \cdots < t_n \leq T$, and $\Gamma \in \mathcal{B}_{\Sigma^n}$.
PROOF: The uniqueness assertion is obvious; and clearly it suffices to prove existence in the case when $T = 0$. For $d \in \mathbb{Z}^+$, let $\Omega_{[-d,0]}$ be the space of right-continuous paths $\omega_{[-d,0]} : [-d,0] \to \Sigma$ which have a left limit at each $t \in [-d,0]$ and are continuous at each $t \in [-d,0]$ for which $-t \in \mathbb{Z}$. Then (cf. Exercise 4.4.40 below), $\Omega_{[-d,0]}$ becomes a Polish space when it is given the topology determined by the SKOROKHOD metric in which the homeomorphisms $\lambda : [-d,0] \to [-d,0]$ have the property that $\lambda(t) = t$ for every $t \in [-d,0] \cap \mathbb{Z}$. Also, it is then easy to see that the natural restriction mapping taking $\Omega_{[-d-1,0]}$ onto $\Omega_{[-d,0]}$ is continuous for each $d \in \mathbb{Z}^+$; and, clearly, the projective limit of $\{\Omega_{[-d,0]} : d \in \mathbb{Z}^+\}$ can be identified with the space $\Omega_{(-\infty,0]}$ consisting of those paths $\omega_0^* \in \Omega_0^*$ which are continuous at $-n$ for every $n \in \mathbb{N}$.
for all $n \in \mathbb{Z}^+$, $0 \leq t_1 < \cdots < t_n \leq d$, and $\Gamma \in \mathcal{B}_{\Sigma^n}$. Moreover, the family $\{Q_{[-d,0]} : d \in \mathbb{Z}^+\}$ is consistently defined on the spaces $\{\Omega_{[-d,0]} : d \in \mathbb{Z}^+\}$. Hence, by KOLMOGOROV's Extension Theorem, there is a unique $Q_0^* \in M_1^S((\Omega_0^*, \mathcal{B}^{(0)}))$ which extends all the $Q_{[-d,0]}$'s; and clearly this is the measure which we were seeking. ∎

Given $T \in [0,\infty)$, $\omega_T \in \Omega_T$, and $\omega_0^* \in \Omega_0^*$, define $\omega_0^* \otimes_0 \omega_T \in \Omega_T^*$ so that $(\omega_0^* \otimes_0 \omega_T)(t) = \omega_0^*(t \wedge 0)$ if $\omega_0^*(0) \neq \omega_T(0)$ and

if $\omega_0^*(0) = \omega_T(0)$. It is then an easy matter to check that

is measurable. Thus, for $Q \in M_1^S((\Omega, \mathcal{B}))$ and $T \in [0,\infty)$, we can determine $(Q_0^* \otimes_0 P_\cdot)_T \in M_1((\Omega_T^*, \mathcal{B}^{(T)}))$ by

for all $\Gamma \in \mathcal{B}^{(T)}$. Finally, for $T \in [0,\infty)$, $s \in (-\infty, T]$, and $\mu_T^*, \nu_T^* \in M_1((\Omega_T^*, \mathcal{B}^{(T)}))$, we will set

After one reconciles the notation just introduced with our earlier notation, one finds that (4.4.34) says that, for all $0 < h < T$,

and, as we are about to see, (4.4.37) is the key to the last step in our identification of $\bar J_P^{(\infty)}$.
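4.4.38 Theorem. Let $Q \in M_1((\Omega, \mathcal{B}))$ be given. Then, for any $h > 0$,

PROOF: If $Q \notin M_1^S((\Omega, \mathcal{B}))$, then, by Lemma 4.4.30 and Lemma 4.4.33, $\bar J_P^{(\infty)}(Q) = \infty$. Thus, we will now assume that $Q \in M_1^S((\Omega, \mathcal{B}))$. Set $f(h,T) = J_{\Pi_h^{(T)}}(Q_T)$ for $0 < h < T < \infty$. Then $f(h, \cdot)$ is non-decreasing on $(h, \infty)$ for each $h \in (0,\infty)$; and, by (4.4.35), $f(s+t, T) = f(t, T-s) + f(s, T)$ as long as $s + t < T$. In particular, if $h \in (0,\infty)$ and $T \in (1,\infty)$ and $n \in \mathbb{Z}^+$, then by induction on $0 \leq \ell \leq n$:

$$f\Bigl(\frac{\ell h}{n}, T\Bigr) = \sum_{k=0}^{\ell-1} f\Bigl(\frac{h}{n}, T - \frac{kh}{n}\Bigr);$$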
and so nf(k,T)
2f(h,T)Lnf
;,T-l (h
),
TE(2,00)andn~Z+.
Consequently,
for every n E Z+; and therefore, by (4.4.24),
and clearly the desired result now follows immediately from (4.4.37) and Lemma 4.4.7. ∎

In conjunction with Theorem 4.4.38, Theorem 4.4.27 becomes a version of DONSKER and VARADHAN's result on this subject [36].

4.4.40 Exercise.
Working with the SKOROKHODtopology is notoriously unpleasant; and, in order not to burden the presentation with even more technicalities, we have swept some annoying details under the rug. What follows is a selection of some points which we have used without proof.
(i) Show that, for each $T \in (0,\infty)$ and $t \in [0,T]$, the map $\omega_T \in \Omega_T \longmapsto \omega_T(t) \in \Sigma$ is $\mathcal{B}_{\Omega_T}$-measurable. This fact, which is well-known when $\Sigma = \mathbb{R}$, can be proved for general $\Sigma$'s by using the fact that every Polish space may be continuously embedded as a $G_\delta$ in $[0,1]^{\mathbb{N}}$ and applying the $\Sigma = \mathbb{R}$ result to each of the coordinates of the embedding.

(ii) In the proof of Lemma 4.4.36, we tacitly used the fact that if $d \in \mathbb{Z}^+$ and we define the SKOROKHOD distance $\operatorname{dist}(\omega_{[-d,0]}, \omega'_{[-d,0]})$ between paths $\omega_{[-d,0]}, \omega'_{[-d,0]} \in \Omega_{[-d,0]}$ by

where $\lambda$ runs over increasing homeomorphisms of $[-d,0]$ satisfying $\lambda(t) = t$ for $t \in [-d,0] \cap \mathbb{Z}$, then the resulting metric makes $\Omega_{[-d,0]}$ into a Polish space and the natural restriction maps from $\Omega_{[-d-1,0]}$ onto $\Omega_{[-d,0]}$ continuous. Check this fact.

4.4.41 Exercise.
A remarkable dividend of looking at large deviations at the level of processes is that the rate functions $J_\Pi^{(\infty)}$ and $\bar J_P^{(\infty)}$ have the pleasing property that they are affine on the space of shift-invariant probability measures. (As we will see in Section 5.3 below, this fact can be made to play an extremely important role in the derivation of process-level large deviation results.) In this exercise, we outline a simple way to see this fact for $\bar J_P^{(\infty)}$; an analogous approach leads to the same fact for $J_\Pi^{(\infty)}$. What we want to show is that, for $Q, Q' \in M_1^S(\Omega)$,

$$\bar J_P^{(\infty)}(\alpha Q + (1-\alpha) Q') = \alpha \bar J_P^{(\infty)}(Q) + (1-\alpha) \bar J_P^{(\infty)}(Q'), \quad \alpha \in (0,1). \tag{4.4.42}$$

Since we already know that $\bar J_P^{(\infty)}$ is convex, all that we need to do is check that the right hand side of (4.4.42) is dominated by the left. The first step will be to develop yet another expression (cf. (4.4.43) below) for $\bar J_P^{(\infty)}$.

(i) Given $\nu \in M_1(\Sigma)$, set $P_\nu = \int_\Sigma P_\sigma\, \nu(d\sigma)$. Using (4.4.8) and (4.4.34), show that for any $Q \in M_1^S(\Omega)$, $\nu \in M_1(\Sigma)$, and $T \in [0,\infty)$:

$$H\bigl(Q_{T+h} \,\big|\, (P_\nu)_{T+h}\bigr) = H\bigl(Q_T \,\big|\, (P_\nu)_T\bigr) + J_{\Pi_h^{(T+h)}}(Q_{T+h}), \quad h \in (0,\infty).$$
Starting from the preceding and using (4.4.39), conclude that (4.4.43)

(ii) To complete the proof of (4.4.42), prove that

$$(\alpha a + (1-\alpha) b) \log(\alpha a + (1-\alpha) b) \geq \alpha a \log a + (1-\alpha) b \log b - \frac{|b - a|}{e}$$

for every $\alpha \in (0,1)$ and all $a, b \in [0,\infty)$. Now suppose that $Q, Q' \in M_1^S(\Omega)$ and $\alpha \in (0,1)$ are given, set $\nu = \alpha Q_0 + (1-\alpha) Q_0'$, and use the preceding together with (4.4.43) to conclude that

$$\bar J_P^{(\infty)}(\alpha Q + (1-\alpha) Q') \geq \alpha \bar J_P^{(\infty)}(Q) + (1-\alpha) \bar J_P^{(\infty)}(Q').$$

(iii) The equation (4.4.43) is interesting in its own right. Indeed, it expresses $\bar J_P^{(\infty)}(Q)$ as a specific relative entropy. This expression becomes particularly interesting in the case when one knows (as one does if $P(t,\sigma,\cdot)$ satisfies $(\tilde{U})$) a priori that there is a $\{P_t : t > 0\}$-invariant $\mu \in M_1(\Sigma)$ with the property that $H(Q_0|\mu) < \infty$ for every $Q \in M_1^S(\Omega)$ with $\bar J_P^{(\infty)}(Q) < \infty$. Indeed, show that, in this case, one can replace (4.4.43) by (4.4.44)
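Before attempting part (ii) of Exercise 4.4.41, the reader may find it reassuring to test the elementary inequality numerically. We read it here as the statement that the convexity gap $\alpha\, a \log a + (1-\alpha)\, b \log b - m \log m$, with $m = \alpha a + (1-\alpha) b$, never exceeds $|b-a|/e$. The sketch below is only a sanity check on randomly sampled triples; the names `xlogx` and `gap`, the sampling range, and the seed are our own choices, and a passing check is of course no substitute for a proof.

```python
import math
import random

def xlogx(x):
    """x log x with the usual convention 0 log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def gap(alpha, a, b):
    """alpha*a*log(a) + (1-alpha)*b*log(b) - m*log(m), m = alpha*a + (1-alpha)*b."""
    m = alpha * a + (1 - alpha) * b
    return alpha * xlogx(a) + (1 - alpha) * xlogx(b) - xlogx(m)

random.seed(0)
worst_slack = math.inf
for _ in range(100_000):
    alpha = random.uniform(0.0, 1.0)
    a = random.uniform(0.0, 10.0)
    b = random.uniform(0.0, 10.0)
    # slack >= 0 means the inequality holds for this triple
    slack = abs(b - a) / math.e - gap(alpha, a, b)
    worst_slack = min(worst_slack, slack)

# The choice a = 0, b = 1, alpha = 1 - 1/e makes the gap equal to 1/e.
extremal_gap = gap(1 - 1 / math.e, 0.0, 1.0)
print(worst_slack, extremal_gap)
```

The extremal triple shows that the constant $1/e$ cannot be improved, which is a useful hint about where the bound comes from: it is the maximum of $x \mapsto -x \log x$ on $[0,1]$.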
4.4.45 Exercise.
Let $\Pi$ be a transition probability on $\Sigma$, and define $J_\Pi : M_1(\Sigma) \to [0,\infty]$ accordingly (as in (4.1.38)). Also, for given $\nu \in M_1(\Sigma)$, let $M_1^{(\nu)}(\Sigma^2)$ denote the space of $\mu \in M_1(\Sigma^2)$ with the property that $\mu \circ \pi_1^{-1} = \nu = \mu \circ \pi_2^{-1}$, where $\pi_i$, $i \in \{1,2\}$, is used here to denote the $i$th projection from $\Sigma^2$ into $\Sigma$.

(i) Assume that $\Pi$ satisfies the condition (U) of Section 4.1, and use the results in this section together with those in Section 4.1 to prove the equality

$$J_\Pi(\nu) = \inf\bigl\{H(\mu \,|\, \nu \otimes_2 \Pi) : \mu \in M_1^{(\nu)}(\Sigma^2)\bigr\} \tag{4.4.46}$$

as an application of the last part of Lemma 2.1.4. Conclude, in particular, that if $J_\Pi(\nu) < \infty$, then there must exist a $\mu \in M_1^{(\nu)}(\Sigma^2)$ such that $J_\Pi(\nu) = H(\mu \,|\, \nu \otimes_2 \Pi)$.
(ii) Half of (4.4.46) is trivial and depends in no way on the condition (U). Namely, to see that the left hand side of (4.4.46) is always dominated by the right, check directly from the definitions of $J_\Pi$ and $J_\Pi^{(2)}$ (cf. (4.4.4)) that $J_\Pi(\nu) \leq J_\Pi^{(2)}(\mu)$ for every $\mu \in M_1^{(\nu)}(\Sigma^2)$, and then apply Lemma 4.4.9.

(iii) Even when $\Pi$ satisfies (U), a direct proof that the left hand side of (4.4.46) dominates the right is not so easy. Thus, all that we will attempt to do here is explain how the existence of a $\mu \in M_1^{(\nu)}(\Sigma^2)$ satisfying $J_\Pi(\nu) = H(\mu \,|\, \nu \otimes_2 \Pi)$ is related to the functions $u \in B(\Sigma; [1,\infty))$ in terms of which $J_\Pi(\nu)$ is defined. Given a $u \in B(\Sigma; [1,\infty))$, consider the transition probability $\Pi_u$ defined by
(Note that, in the notation of Section 4.1, the II, above would have been denoted there by IIv with V = log &.)Next, define p, = v 82 II,, check that
and conclude that

$$J_\Pi(\nu) = -\int_\Sigma \log \frac{\Pi u}{u}\, d\nu = H(\mu_u \,|\, \nu \otimes_2 \Pi) \tag{4.4.47}$$

for $\mu_u \in M_1^{(\nu)}(\Sigma^2)$. Conversely, use Lemma 3.2.13 to check that
and conclude that
sE
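Summarizing, we now see that $J_\Pi(\nu) = -\int_\Sigma \log \frac{\Pi u}{u}\, d\nu$ if and only if $\mu_u \in M_1^{(\nu)}(\Sigma^2)$, in which case $J_\Pi(\nu) = H(\mu_u \,|\, \nu \otimes_2 \Pi)$. The problem is, of course, that one cannot expect, in general, that there will exist a $u \in B(\Sigma; [1,\infty))$ for which $J_\Pi(\nu) = -\int_\Sigma \log \frac{\Pi u}{u}\, d\nu$.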
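On a two-point state space both sides of (4.4.46) can be computed by brute force, which gives a concrete feeling for the identity. The following is a rough numerical sketch under assumptions of our own: the transition matrix `PI` and the measure `NU` are hypothetical, $J_\Pi(\nu)$ is taken in the form $-\inf_u \int \log(\Pi u/u)\, d\nu$, and both extrema are approximated by grid searches. Since $\Pi u/u$ is unchanged when $u$ is multiplied by a positive constant, it suffices to scan $u = (1, x)$ with $x > 0$; and a measure on pairs with both marginals equal to $\nu$ is determined by the single number $c = \mu(\{(0,1)\}) = \mu(\{(1,0)\})$.

```python
import math

PI = [[0.9, 0.1], [0.4, 0.6]]   # hypothetical transition matrix on {0, 1}
NU = [0.5, 0.5]                 # hypothetical probability measure on {0, 1}

def j_variational(n_grid=20_000):
    """Approximate -inf_u sum_i nu_i log((Pi u)_i / u_i) over u = (1, x)."""
    best = -math.inf
    for k in range(n_grid + 1):
        x = math.exp(-8.0 + 16.0 * k / n_grid)   # logarithmic grid for x > 0
        pu0 = PI[0][0] + PI[0][1] * x            # (Pi u)(0) with u(0) = 1
        pu1 = PI[1][0] + PI[1][1] * x            # (Pi u)(1) with u(1) = x
        value = -(NU[0] * math.log(pu0) + NU[1] * math.log(pu1 / x))
        best = max(best, value)
    return best

def j_entropy(n_grid=20_000):
    """Approximate inf of H(mu | nu (x) Pi) over mu with both marginals nu."""
    best = math.inf
    c_max = min(NU)
    for k in range(1, n_grid):
        c = c_max * k / n_grid                   # c = mu(0,1) = mu(1,0)
        mu = [[NU[0] - c, c], [c, NU[1] - c]]
        h = sum(
            mu[i][j] * math.log(mu[i][j] / (NU[i] * PI[i][j]))
            for i in range(2) for j in range(2)
        )
        best = min(best, h)
    return best

J_SUP = j_variational()
J_INF = j_entropy()
print(J_SUP, J_INF)
```

For these particular numbers the maximizing $u$ can also be found by hand (it has $x = \sqrt{6}$), and the two grid searches agree to within the resolution of the grids, as (4.4.46) predicts for a strictly positive transition matrix.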
4.4.48 Exercise.

It is no accident that the rate function governing the large deviations of the empirical process is infinite off of the space of shift-invariant measures. To see this, let $\Omega = \Sigma^{\mathbb{N}}$, define $\omega \in \Omega \longmapsto R_n(\omega) \in M_1(\Omega)$ as in (4.4.1), and suppose that $P \in M_1(\Omega)$ and $I : M_1(\Omega) \to [0,\infty]$ satisfy

$$\varliminf_{n \to \infty} \frac{1}{n} \log\bigl(P(\{\omega : R_n(\omega) \in G\})\bigr) \geq -I(Q)$$

for every open $G$ in $M_1(\Omega)$ and $Q \in G$. Show that $I$ must be identically infinite off of $M_1^S(\Omega)$.

Hint: First check that $M_1^S(\Omega)$ is a closed subset of $M_1(\Omega)$; and, second, note that, for any $\epsilon > 0$, there is an $N \in \mathbb{Z}^+$ such that the LÉVY distance between elements $Q$ and $Q'$ of $M_1(\Omega)$ is less than $\epsilon$ if
(The map
~FIO,NI is
the projection of R onto EN obtained by restricting a
n [0, N ] . ) Finally, for any w E R and n E 7+,let Gn E R be the path determined by Ck,+e(G,)
= & ( w ) for k E N and 1 5 l < n;
and show both that R, (G,) E MT(R) and that
V
Non-Uniform Results
5.1 Generalities about the Upper Bound
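We begin by restating Theorem 2.2.4 for the setting in which we will be working. Namely, let $\Omega$ be a Polish space and suppose that $\{Q_\epsilon : \epsilon > 0\}$ is a family of probability measures on $M_1(\Omega)$ with the property that

$$\Lambda(V) = \lim_{\epsilon \to 0} \epsilon \log\Bigl(\int_{M_1(\Omega)} \exp\Bigl[\frac{1}{\epsilon} \int_\Omega V(\omega)\, \mu(d\omega)\Bigr] Q_\epsilon(d\mu)\Bigr) \tag{5.1.1}$$

exists for every $V \in C_b(\Omega; \mathbb{R})$. We then know that

$$\varlimsup_{\epsilon \to 0} \epsilon \log\bigl(Q_\epsilon(C)\bigr) \leq -\inf_C \Lambda^* \tag{5.1.2}$$

for $C \subset\subset M_1(\Omega)$, where $\Lambda^* : M_1(\Omega) \to [0,\infty]$, given by

$$\Lambda^*(\mu) = \sup\Bigl\{\int_\Omega V\, d\mu - \Lambda(V) : V \in C_b(\Omega; \mathbb{R})\Bigr\}, \tag{5.1.3}$$

is the LEGENDRE transform of $\Lambda$. Our goal in this section is to find out when we can remove the restriction that the $C$ in (5.1.2) be compact.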
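A finite example may help fix ideas about the pair $(\Lambda, \Lambda^*)$. The sketch below rests on assumptions which are ours and not part of the general setting: $\Omega$ is a three-point space, and $Q_{1/n}$ is taken to be the distribution of the empirical measure of $n$ independent samples from a reference law `p`, in which case the quantity in (5.1.1) equals $\log \int e^{V}\, dp$ for every $n$ and the LEGENDRE transform is the relative entropy $H(\mu|p)$. The names `Lambda`, `objective`, and `V_star`, as well as the weights and the seed, are hypothetical choices for the illustration.

```python
import math
import random

p = [0.5, 0.3, 0.2]    # hypothetical reference law on a three-point space
mu = [0.2, 0.2, 0.6]   # hypothetical measure at which Lambda* is evaluated

def Lambda(V):
    """Lambda(V) = log of the p-integral of exp(V)."""
    return math.log(sum(pi * math.exp(v) for pi, v in zip(p, V)))

def objective(V):
    """The quantity inside the supremum in (5.1.3): integral of V dmu - Lambda(V)."""
    return sum(m * v for m, v in zip(mu, V)) - Lambda(V)

# Candidate value of the supremum: the relative entropy H(mu|p),
# attained at V* = log(dmu/dp).
relative_entropy = sum(m * math.log(m / pi) for m, pi in zip(mu, p))
V_star = [math.log(m / pi) for m, pi in zip(mu, p)]

# Randomly sampled V's should never beat the candidate supremum.
random.seed(1)
best_random = max(
    objective([random.uniform(-5.0, 5.0) for _ in p]) for _ in range(20_000)
)

print(objective(V_star), relative_entropy, best_random)
```

Note also that `Lambda` is non-decreasing, convex, and satisfies $\Lambda(c\mathbf{1}) = c$; these are exactly the structural properties of $\Lambda$ on which the rest of this section relies.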
Given a function $\Phi : C_b(\Omega; \mathbb{R}) \to \mathbb{R}$, we will say that $\Phi$ is non-decreasing if $\Phi(V_1) \leq \Phi(V_2)$ whenever $V_1 \leq V_2$; and we will say that $\Phi$ is tight if for each $M \in (0,\infty)$ there is a $K(M) \subset\subset \Omega$ such that $\Phi(V) \leq 1$ whenever $V$ is an element of $C_b(\Omega; \mathbb{R})$ which vanishes on $K(M)$ and is bounded by $M$.
5.1.4 Lemma. Let $\Phi : C_b(\Omega; \mathbb{R}) \to \mathbb{R}$ be a non-decreasing, convex function with the property that $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Then $|\Phi(V_2) - \Phi(V_1)| \leq \|V_2 - V_1\|_B$ for all $V_1, V_2 \in C_b(\Omega; \mathbb{R})$. Moreover, if, in addition, $\Phi$ is tight, then for every $\epsilon > 0$ and $M \in (0,\infty)$ there is a $K(\epsilon, M) \subset\subset \Omega$ such that $|\Phi(V_2) - \Phi(V_1)| \leq \epsilon$ for all $V_1, V_2 \in C_b(\Omega; \mathbb{R})$ with the properties that $V_1 = V_2$ on $K(\epsilon, M)$ and $\|V_1\|_B \vee \|V_2\|_B \leq M$.
PROOF: First, note that $\Phi(V) \leq \Phi(\|V\|_B \mathbf{1}) = \|V\|_B$ and that

$$0 = \Phi(0) = \Phi\Bigl(\frac{V + (-V)}{2}\Bigr) \leq \frac{\Phi(V) + \Phi(-V)}{2}.$$

Thus, $|\Phi(V)| \leq \|V\|_B$, $V \in C_b(\Omega; \mathbb{R})$. Second, using convexity and writing

one sees that

for $\theta \in (0,1)$. From (5.1.5) and the remark preceding it, we have that

for all $\theta \in (0,1)$; and, therefore, after letting $\theta \searrow 0$ and reversing the roles of $V_1$ and $V_2$, one gets the first assertion.

To prove the second assertion, let $\epsilon > 0$ and $M \in (0,\infty)$ be given and use (5.1.5) to see that

as long as $\|V_1\|_B \vee \|V_2\|_B \leq M$. Finally, define $\theta \in (0,1)$ so that
$$\frac{\theta}{2}(1 + 4M) = \epsilon \wedge \frac{1}{2},$$

and set $K(\epsilon, M) = K(4M/\theta)$, where $\{K(M) : M \in (0,\infty)\}$ is the family of compact sets which appears in the definition of tightness for $\Phi$. After reversing the roles of $V_1$ and $V_2$, one then arrives at the desired conclusion. ∎

Before presenting the next result, we need to introduce some notation. Let $\hat\rho$ be a compatible metric on $\Omega$ with the property that $(\Omega, \hat\rho)$ is totally bounded, and denote by $\hat\Omega$ the completion of $\Omega$ with respect to $\hat\rho$. Obviously, $\hat\Omega$ is compact and, because it is Polish, $\Omega$ can be thought of as a dense $G_\delta$ subset of $\hat\Omega$. In particular, we will identify $M_1(\Omega)$ with the subset of those $\hat\mu \in M_1(\hat\Omega)$ for which $\hat\mu(\hat\Omega \setminus \Omega) = 0$. In addition, if $C_u(\Omega; \mathbb{R})$ denotes the space of bounded, $\hat\rho$-uniformly continuous functions on $\Omega$, then

$$\hat V \in C(\hat\Omega; \mathbb{R}) \longmapsto \hat V|_\Omega \in C_u(\Omega; \mathbb{R})$$

is a surjective isometry. What the following theorem turns on is the observation that "tightness" allows one to work on the compact space $\hat\Omega$ and then transfer one's conclusions there back to $\Omega$ itself.
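5.1.6 Theorem. Let $\Phi : C_b(\Omega; \mathbb{R}) \to \mathbb{R}$ be a non-decreasing, convex function with the property that $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$; and define $\Psi$ on $M_1(\Omega)$ by

$$\Psi(\mu) = \sup\Bigl\{\int_\Omega V\, d\mu - \Phi(V) : V \in C_b(\Omega; \mathbb{R})\Bigr\}. \tag{5.1.7}$$

Then $\Psi$ is a convex rate function. Moreover, if $\Phi$ is tight, then $\Psi$ is good, there is a $\mu_0 \in M_1(\Omega)$ at which $\Psi$ vanishes,

$$\Phi(V) = \sup\Bigl\{\int_\Omega V\, d\mu - \Psi(\mu) : \mu \in M_1(\Omega)\Bigr\}, \quad V \in C_b(\Omega; \mathbb{R}), \tag{5.1.8}$$

and

$$\hat\Psi(\hat\mu) = \begin{cases} \Psi(\hat\mu) & \text{if } \hat\mu \in M_1(\Omega) \\ \infty & \text{if } \hat\mu \in M(\hat\Omega) \setminus M_1(\Omega), \end{cases} \tag{5.1.9}$$

where $\hat\Psi$ is defined on $M(\hat\Omega)$ by

$$\hat\Psi(\hat\mu) = \sup\Bigl\{\int_{\hat\Omega} \hat V\, d\hat\mu - \Phi(\hat V|_\Omega) : \hat V \in C(\hat\Omega; \mathbb{R})\Bigr\}. \tag{5.1.10}$$

Conversely, suppose that $\Psi : M_1(\Omega) \to [0,\infty]$ is a convex rate function which vanishes at some $\mu_0 \in M_1(\Omega)$; and define $\Phi$ on $C_b(\Omega; \mathbb{R})$ by (5.1.8). Then $\Phi$ is a non-decreasing, convex function which satisfies $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$; $\Psi$ can be recovered from $\Phi$ via (5.1.7); and $\Psi$ is good if and only if $\Phi$ is tight.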
PROOF: Let $\Phi$ be a function of the sort described in the first part of the theorem, and define $\Psi$ accordingly by (5.1.7). Obviously, $\Psi$ is lower semi-continuous and convex. In addition, since $\Phi(0) = 0$, it is clear that $\Psi \geq 0$.

Next, add the assumption that $\Phi$ is tight. To see that $\Psi$ is good, let $\{K(M) : M \in (0,\infty)\}$ be the compact subsets of $\Omega$ described in the tightness property for $\Phi$. If $\Psi(\mu) \leq L$, then

for all $V \in C_b(\Omega; \mathbb{R})$ satisfying $\|V\|_B \leq M$ and $V = 0$ on $K(M)$. Hence, $\Psi(\mu) \leq L$ implies that $\mu(K(M)^c) \leq (L+1)/M$ for all $M \in (0,\infty)$; and therefore

is compact in $M_1(\Omega)$.

We next turn to the proof of (5.1.9). To see that $\hat\Psi(\hat\mu) = \infty$ unless $\hat\mu \in M_1(\Omega)$, suppose that $\hat\mu \in M(\hat\Omega) \setminus M_1(\Omega)$. If $\hat\mu$ is not a probability measure, then $\hat\Psi(\hat\mu) = \infty$ follows easily from $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Thus, suppose that $\hat\mu \in M_1(\hat\Omega) \setminus M_1(\Omega)$. Then $\hat\mu = \theta\mu + (1-\theta)\hat\nu$, where $\mu \in M_1(\Omega)$, $\hat\nu \in M_1(\hat\Omega)$ with $\hat\nu(\Omega) = 0$, and $\theta \in [0,1)$. Since $\Omega$ is a $G_\delta$ subset of $\hat\Omega$, $\hat\Omega \setminus \Omega$ can be written as the countable union of compact subsets of $\hat\Omega$. Hence, there exists a compact $\hat K \subseteq \hat\Omega \setminus \Omega$ for which $\hat\nu(\hat K) \geq \frac{1}{2}$. Now let $M \in (0,\infty)$ be given and use the TIETZE Extension Theorem to construct a $\hat V_M \in C(\hat\Omega; [0,M])$ with the properties that $\hat V_M = 0$ on $K(M)$ and $\hat V_M = M$ on $\hat K$. We then have that

$$\hat\Psi(\hat\mu) \geq \int_{\hat\Omega} \hat V_M\, d\hat\mu - \Phi(\hat V_M|_\Omega) \geq \frac{(1-\theta)M}{2} - 1, \quad M \in (0,\infty);$$
and this shows that $\hat\Psi(\hat\mu) = \infty$.

To complete the proof of (5.1.9), we must still check that $\hat\Psi(\mu) = \Psi(\mu)$ for $\mu \in M_1(\Omega)$. Obviously, $\hat\Psi(\mu) \leq \Psi(\mu)$, and so it suffices to check that $\int_\Omega V\, d\mu - \Phi(V) \leq \hat\Psi(\mu)$ for all $V \in C_b(\Omega; \mathbb{R})$. Given $V \in C_b(\Omega; \mathbb{R})$ and $\epsilon > 0$, set $M = \|V\|_B$, choose $K(\epsilon, M) \subset\subset \Omega$ as in the last part of Lemma 5.1.4, and take $K \subset\subset \Omega$ so that $K \supseteq K(\epsilon, M)$ and $\mu(K^c) < \epsilon/(M+1)$. Now use the TIETZE Extension Theorem to construct a $\hat V \in C(\hat\Omega; \mathbb{R})$ so that $\hat V = V$ on $K$ and $\|\hat V\|_B \leq \|V\|_B$. Then

Continuing in the setting of the preceding paragraph, we next want to derive (5.1.8). To this end, first observe that, because of (5.1.10), (5.1.9), and the fact that $M(\hat\Omega)$ is the dual of $C(\hat\Omega; \mathbb{R})$, Theorem 2.2.15 implies (5.1.8) for $V \in C_u(\Omega; \mathbb{R})$. Also, it is clear that for all $V \in C_b(\Omega; \mathbb{R})$ the left hand side of (5.1.8) dominates the right. With these preliminaries in mind, let $V \in C_b(\Omega; \mathbb{R})$ and $0 < \epsilon \leq 1$ be given. Set $M = \|V\|_B$ and (recalling that we already know that $\Psi$ is good) choose $K \subset\subset \Omega$ so that $K \supseteq K(\epsilon, M)$ and $\mu(K^c) < \epsilon/(M+1)$ whenever $\Psi(\mu) \leq 2M + 1$. Next, construct $W \in C_u(\Omega; \mathbb{R})$ so that $\|W\|_B \leq M$ and $W = V$ on $K$, and choose $\mu \in M_1(\Omega)$ so that $\Phi(W) \leq \int_\Omega W\, d\mu - \Psi(\mu) + \epsilon$. Then, $\Psi(\mu) \leq 2M + 1$, and so

In other words, (5.1.8) is now proved. Finally, by taking $V = \mathbf{1}$ in (5.1.8), we see that $\inf_{M_1(\Omega)} \Psi = 0$; and therefore, by Lemma 2.1.2, there is a $\mu_0$ at which $\Psi$ vanishes.

It remains to prove the converse assertions. Let $\Psi$ be given as in the second part of the theorem, and define $\Phi$ by (5.1.8). It is then an easy matter to check that $\Phi$ is a non-decreasing, convex function for which $\Phi(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Moreover, the ability to recover $\Psi$ via (5.1.7) is a simple application of Theorem 2.2.15. In particular, by the first part of this theorem, $\Psi$ is good if $\Phi$ is tight. Finally, to see that $\Phi$ is tight if $\Psi$ is good, let $M \in (0,\infty)$ be given; and choose $K \subset\subset \Omega$ so that $\mu(K^c) < 1/M$ whenever $\Psi(\mu) \leq M$. Then the right hand side of (5.1.8) is dominated by 1 for all $V \in C_b(\Omega; \mathbb{R})$ which vanish on $K$ and satisfy $\|V\|_B \leq M$. ∎
5.1.11 Corollary. Let $\{Q_\epsilon : \epsilon > 0\}$ be a family of probability measures on $M_1(\Omega)$ and assume that the limit $\Lambda(V)$ in (5.1.1) exists for each $V \in C_b(\Omega; \mathbb{R})$. Then $\Lambda$ is a non-decreasing, convex function with the property that $\Lambda(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Moreover, if $\Lambda$ is tight, then the function $\Lambda^*$ in (5.1.3) is good and (5.1.2) holds for every closed set $C \subseteq M_1(\Omega)$.

PROOF: The only assertion which is not an immediate consequence of Theorem 5.1.6 is the final one. To handle this one, denote by $\hat Q_\epsilon$ the measure on $M_1(\hat\Omega)$ induced from $Q_\epsilon$ by the inclusion $M_1(\Omega) \subseteq M_1(\hat\Omega)$. Then

for $\hat V \in C(\hat\Omega; \mathbb{R})$. Thus, if $\hat\Lambda^*$ is defined in terms of $\Lambda$ as in (5.1.10), then

for all closed $\hat C \subseteq M_1(\hat\Omega)$. At the same time, if $\Lambda$ is tight, then, by (5.1.9), $\inf_{\hat C} \hat\Lambda^* = \inf_{\hat C \cap M_1(\Omega)} \Lambda^*$; and clearly this shows that (5.1.2) holds for every closed $C$. ∎
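5.1.12 Exercise. It turns out that there is no need to know that the limit $\Lambda(V)$ in (5.1.1) exists in order to get an upper bound. Indeed, let $\{Q_\epsilon : \epsilon > 0\} \subseteq M_1(M_1(\Sigma))$, suppose that $\Phi : C_b(\Sigma; \mathbb{R}) \to \mathbb{R}$ is a function which dominates

$$\bar\Lambda(V) = \varlimsup_{\epsilon \to 0} \epsilon \log\Bigl(\int_{M_1(\Sigma)} \exp\Bigl[\frac{1}{\epsilon} \int_\Sigma V(\sigma)\, \mu(d\sigma)\Bigr] Q_\epsilon(d\mu)\Bigr) \tag{5.1.13}$$

for $V \in C_b(\Sigma; \mathbb{R})$; and let $\Psi$ be defined on $M_1(\Sigma)$ in terms of $\Phi$ as in (5.1.7).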
Large Deviations
190
(i) Show that
-
lim E log (Qd(C)) 5 - inf Q
(5.1.14)
C
€40
for all C cc MI@). Next, show that h is a non-decreasing, convex function which satisfies h(c1) = c, c E R; and conclude that (5.1.14) continues to hold for all closed C M1(C) if is tight. In particular, these considerations apply when @ = h; in which cwe we will use k to denote the corresponding \Ir .
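(ii) Suppose that there exists a function $F : \Sigma \to \mathbb{R}$ with the properties that $F$ is bounded below, $\{\sigma : F(\sigma) \le M\} \subset\subset \Sigma$ for every $M \in [0,\infty)$, and
Show that $\bar\Lambda$ is then tight; and conclude that $\bar\Lambda^*$ is good, that (5.1.14) holds with $\Psi = \bar\Lambda^*$ for every closed $C \subseteq M_1(\Sigma)$, and that
(5.1.16)  $\bar\Lambda(V) = \sup\left\{ \int_\Sigma V\,d\mu - \bar\Lambda^*(\mu) : \mu \in M_1(\Sigma) \right\}$
for every $V \in C_b(\Sigma;\mathbb{R})$.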
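The duality expressed by (5.1.7) and (5.1.16) can be seen concretely on a finite state space. The following is only a sketch under stated assumptions: it does not use the $\bar\Lambda$ of the exercise but the stand-in $\Phi(V) = \log \sum_i m_i e^{V_i}$, which is non-decreasing, convex, and satisfies $\Phi(c\mathbf{1}) = c$, and it takes for $\Psi$ the relative entropy $H(\mu|m)$ (the classical Gibbs variational formula). The names `log_mgf`, `relative_entropy`, `gibbs`, and `dual_value` are hypothetical helpers introduced for the illustration.

```python
import math
import random


def log_mgf(V, m):
    """Stand-in Phi(V) = log sum_i m_i exp(V_i) (an assumption, not the exercise's function)."""
    vmax = max(V)
    return vmax + math.log(sum(mi * math.exp(v - vmax) for v, mi in zip(V, m)))


def relative_entropy(mu, m):
    """H(mu|m) = sum_i mu_i log(mu_i / m_i), with 0 log 0 = 0; all m_i > 0 is assumed."""
    return sum(p * math.log(p / q) for p, q in zip(mu, m) if p > 0)


def gibbs(V, m):
    """Probability vector proportional to m_i exp(V_i), the candidate maximizer."""
    vmax = max(V)
    w = [mi * math.exp(v - vmax) for v, mi in zip(V, m)]
    z = sum(w)
    return [x / z for x in w]


def dual_value(V, mu, m):
    """The quantity  int V dmu - H(mu|m)  whose supremum over mu should equal Phi(V)."""
    return sum(v * p for v, p in zip(V, mu)) - relative_entropy(mu, m)


rng = random.Random(0)
m = [0.1, 0.2, 0.3, 0.4]          # reference measure on a 4-point space
V = [1.0, -0.5, 2.0, 0.25]        # a bounded "function" on that space

phi = log_mgf(V, m)
attained = dual_value(V, gibbs(V, m), m)

# Randomly drawn probability vectors should never beat the Gibbs measure.
trials = []
for _ in range(1000):
    w = [rng.random() for _ in m]
    s = sum(w)
    trials.append(dual_value(V, [x / s for x in w], m))
best_random = max(trials)
```

In this toy setting `attained` agrees with `phi` up to rounding, and `best_random` stays below it, which is the finite-dimensional shadow of recovering a convex function from its LEGENDRE transform.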
5.1.17 Exercise. Return to the setting of Remark 4.2.2 in Section 4.2, and define $\bar\Lambda_P(V)$ to be
for $V \in C_b(\Sigma;\mathbb{R})$.
(i) Check that $\bar\Lambda_P$ is non-decreasing, convex, and satisfies $\bar\Lambda_P(c\mathbf{1}) = c$, $c \in \mathbb{R}$. Thus, if (5.1.7) is used to define $\bar\Lambda_P^*$ from $\bar\Lambda_P$, then
(cf. Remark 4.2.2) holds always for $C \subset\subset M_1(\Sigma)$ and will hold for every closed $C \subseteq M_1(\Sigma)$ if $\bar\Lambda_P$ is tight.
(ii) Show that if $F : \Sigma \to \mathbb{R}$ is a function which is bounded below and has the properties that $\{\sigma : F(\sigma) \le M\} \subset\subset \Sigma$, $M \in [0,\infty)$, and
then $\bar\Lambda_P$ is tight.
(iii) Let $F : \Sigma \to \mathbb{R}$ be a lower semi-continuous function which is bounded below, and suppose that there is a measurable $u : [0,\infty) \times \Sigma \to [0,\infty)$ which satisfies (see the paragraph preceding Lemma 4.2.23)
Show that
Finally, if $\{\sigma : F(\sigma) \le M\} \subset\subset \Sigma$, $M \in [0,\infty)$, $u$ is uniformly positive, and
conclude that x p is tight. At least when dealing with processes whose paths are continuous, one often finds the function u by a localization procedure. Namely, one starts with a function F with compact level sets and seeks a non-decreasing, locally bounded sequence of functions un E D which satisfy u, 2 1 and Lu, = -Fun on a sequence of open sets U, which exhaust C; and one then takes u to be the limit of the u, 's. (iv) It is clear that A p 5 A p (where A p is defined in (4.2.21)) and therefore that h > ( v )5 i > ( v ) and also that
for all v E
(cf. (4.2.22) and (4.2.37) for the notation here). Thus (cf. (4.2.36) and (4.2.38)),we see that J p 5 and that, when P(t,a, is FELLER-continuous, J p 5 A>. Check that the following line of reasoning leads to A> 5 J p and thence to (5.1.21)
xi
A> = J p
-*
if A, = A>.
a)
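Let $V \in C_b(\Sigma;\mathbb{R})$ be given and define $\{P_t^V : t > 0\}$ accordingly as in Lemma 4.2.23. Given $\lambda > \bar\Lambda_P(V)$, define
Show that $\inf_{n\in\mathbb{Z}^+} \inf_{\sigma\in\Sigma} u_n(\sigma) > 0$, $u_n \in D$ (cf. the discussion preceding Lemma 4.2.31), and that
$\lambda u_n - V u_n - L u_n = 1 - v_n$,
where $v_n \equiv e^{-\lambda n}\,[P_n^V \mathbf{1}]$.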
Next, check that
and therefore that $\sup_{n\in\mathbb{Z}^+} \|v_n/u_n\|_B < \infty$. Since $\lambda > \bar\Lambda_P(V)$, conclude that $v_n/u_n \to 0$ boundedly. After combining this with the preceding, one is led to
and from here it is an easy step to the desired conclusion. Finally, by the same reasoning which just led to (5.1.21), prove that (5.1.22)
$\bar\Lambda_P^* = \Lambda_P^* = J_P$
when $P(t,\sigma,\cdot)$ is FELLER-continuous.
(v) Formulate and verify the results in (i) through (iv) for the discrete-time setting.
5.2 A Little Ergodic Theory
Before attempting to develop lower bounds which will complement the upper bounds obtained in Section 5.1, we make a digression in which we will discuss a few essential facts from ergodic theory. Because it is not so readily available in standard texts, we will work in the continuous parameter setting. We begin our discussion with the lovely Sunrise Lemma of F. RIESZ [91]. To understand both the name as well as the intuition behind what is going on, think about the distribution of light and shade in a (one-dimensional) mountainous region at precisely the moment when the sun comes up over the horizon. In the lemma, the sun is on the right, the set $E$ is the region in the shade, and "$F(s)$ is the altitude at $s$."
5.2.1 Lemma. Let $I = [a,b]$ be a non-empty compact interval and $F : I \to \mathbb{R}$ a continuous function. Denote by $E$ the set of $s \in I^\circ$ with the property that $F(t) > F(s)$ for some $t \in (s,b)$. Then $E$ is an open subset of $\mathbb{R}$; and if $E \ne \emptyset$, then it is the union of countably many mutually disjoint open intervals $(\alpha,\beta)$ each of which has the property that $F(\beta) \ge F(\alpha)$.
PROOF: Clearly, $E$ is open in $\mathbb{R}$, and therefore all that we have to do is check that if $(\alpha,\beta)$ is a non-empty connected component of $E$ then $F(\beta) \ge F(\alpha)$. To this end, suppose that $F(\beta) < F(\alpha)$ and set $\lambda = (F(\alpha)+F(\beta))/2$. Then $C \equiv \{s \in (\alpha,\beta) : F(s) = \lambda\}$ is a non-empty, compact subset of $(\alpha,\beta)$. Let $\gamma = \max\{s : s \in C\}$, and observe that $F(t) < \lambda$ for all $t \in (\gamma,\beta]$. In addition, since $\beta \notin E$, $F(t) \le F(\beta) < \lambda$ for every $t \in (\beta,b)$. Hence, $F(t) < \lambda = F(\gamma)$ for all $t \in (\gamma,b)$, and therefore $\gamma \notin E$. However, $\gamma \in (\alpha,\beta) \subseteq E$; and so we have a contradiction. ∎
As a direct consequence of Lemma 5.2.1, we get the following sharp form of the HARDY-LITTLEWOOD Maximal Inequality [58].
5.2.2 Theorem. Given a function $f \in L^1(\mathbb{R})$, define
(5.2.3)
Then $s \in \mathbb{R} \mapsto \bar f(s) \in [0,\infty)$ is lower semi-continuous and
for all $\lambda \in (0,\infty)$. (We use $|\Gamma|$ to denote the LEBESGUE measure of $\Gamma \subseteq \mathbb{R}$.) In particular, for all $p \in (1,\infty]$,
(5.2.5)
PROOF: Without loss of generality, we will assume that $f \ge 0$. Given $n \in \mathbb{Z}^+$ and $\lambda \in (0,\infty)$, set $I_n = [-n,n]$ and define
and
for $s \in [-n,n)$. Clearly, $\{s \in I_n^\circ : \bar f_n(s) > \lambda\}$ coincides with the set $E_{n,\lambda}$ in Lemma 5.2.1 corresponding to the function $F_{n,\lambda}$ on $I_n$. Moreover, by that lemma, we know that $E_{n,\lambda}$ is either empty or the countable union of mutually disjoint intervals $(\alpha,\beta)$ with the property that $\lambda(\beta-\alpha) \le \int_\alpha^\beta f(t)\,dt$. Hence,
After letting $n \nearrow \infty$, one quickly concludes from the above that
$\lambda\,|\{s : \bar f(s) > \lambda\}| \le \int_{\{s : \bar f(s) > \lambda\}} f(t)\,dt, \quad \lambda \in (0,\infty);$
and so (5.2.4) results from taking left limits in the preceding.
Once one knows (5.2.4), one can get (5.2.5) for $p \in (1,\infty)$ and bounded, non-negative $f \in L^1(\mathbb{R})$ by simply noting that
where we have used HÖLDER's inequality in the last step. The derivation of the general result is now an easy limit argument. Since (5.2.5) is obvious when $p = \infty$, the proof is now complete. ∎
We are now ready to start doing ergodic theory. Let $(\Omega,\mathcal{B})$ be a measurable space. The family $\Theta = \{\theta_t : t \in [0,\infty)\}$ is called a measurable, one-parameter semigroup of transformations on $(\Omega,\mathcal{B})$ if $(t,\omega) \in [0,\infty)\times\Omega \mapsto \theta_t(\omega) \in \Omega$ is a $\mathcal{B}_{[0,\infty)}\times\mathcal{B}$-measurable function from $[0,\infty)\times\Omega$ into $(\Omega,\mathcal{B})$ and $\theta_{s+t} = \theta_s\circ\theta_t$ for all $s,t \in [0,\infty)$. A set $A \subseteq \Omega$ is said to be $\Theta$-invariant if $A = \theta_t^{-1}A$, $t \in [0,\infty)$; and a measure $Q \in M_1((\Omega,\mathcal{B}))$ is said to be $\Theta$-invariant if $Q = Q\circ\theta_t^{-1}$, $t \in [0,\infty)$. We will use $\mathfrak{I}_\Theta$ and $M_1^\Theta((\Omega,\mathcal{B}))$, respectively, to denote the $\Theta$-invariant subsets $A \in \mathcal{B}$ and $\Theta$-invariant measures $Q \in M_1((\Omega,\mathcal{B}))$.
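5.2.6 Theorem. (MAXIMAL ERGODIC INEQUALITY) Let $(\Omega,\mathcal{B})$ be a measurable space and $\Theta = \{\theta_t : t \in [0,\infty)\}$ a measurable, one-parameter semigroup of transformations on $(\Omega,\mathcal{B})$. Then the set $\mathfrak{I}_\Theta$ is a sub-$\sigma$-algebra of $\mathcal{B}$. Next, given a measurable $f : \Omega \to \mathbb{R}$, let $\Omega_f$ be the set of $\omega \in \Omega$ with the property that $\int_0^T |f(\theta_t\omega)|\,dt < \infty$ for every $T \in [0,\infty)$. Then $\Omega_f \in \mathcal{B}$, and $Q(\Omega_f) = 1$ for all $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ and $f \in L^1(Q)$. Finally, given a measurable $f : \Omega \to \mathbb{R}$, define $f_T : \Omega \to \mathbb{R}$ for $T \in (0,\infty)$ by
$f_T(\omega) = \frac{1}{T}\int_0^T f(\theta_t\omega)\,dt$ if $\omega \in \Omega_f$, and $f_T(\omega) = 0$ otherwise.
Then $(T,\omega) \in (0,\infty)\times\Omega \mapsto f_T(\omega) \in \mathbb{R}$ is measurable, $T \in (0,\infty) \mapsto f_T(\omega) \in \mathbb{R}$ is continuous for each $\omega \in \Omega$, and, for every $Q \in M_1^\Theta((\Omega,\mathcal{B}))$, one has that
(5.2.7)  $Q(\{\omega : Mf(\omega) \ge \lambda\}) \le \frac{1}{\lambda}\,\|f\|_{L^1(Q)}, \quad \lambda \in (0,\infty),$
and
where
$Mf(\omega) = \sup_{T\in(0,\infty)} |f_T(\omega)|, \quad \omega \in \Omega.$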
PROOF: The only thing that we need to do is check that (5.2.7) and (5.2.8) hold for bounded measurable $f : \Omega \to [0,\infty)$. Let such an $f$ be given; and, for $m \in \mathbb{Z}^+$ and $\omega \in \Omega$, define
and
It is then an easy matter to see that for 1 5 m < n and t E (0,n- m],
~rnf(6t~ i )J n , w ( t ) Hence, by (5.2.4), for all X E ( 0 , ~ and ) 15 m XQ ( { w : M m f b ) 2 A } ) cn-m
By first letting $n \nearrow \infty$ and then $m \nearrow \infty$, one easily gets (5.2.7) from this. Similarly, if $p \in (1,\infty)$ and one uses (5.2.5), then one has that
from which (5.2.8) follows when $p \in (1,\infty)$. Since (5.2.8) is trivial for $p = \infty$, we are now done. ∎
The most familiar application of the Maximal Ergodic Inequality is the renowned Individual Ergodic Theorem.
5.2.9 Theorem. (ERGODIC THEOREM) Referring to Theorem 5.2.6, let $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ and $f \in L^1(Q)$ be given. Then $f_T \to E^Q[f\,|\,\mathfrak{I}_\Theta]$ $Q$-almost surely and in $L^1(Q)$. Moreover, for each $p \in (1,\infty)$ and $f \in L^p(Q)$, (5.2.10)
I(~TII~~(~)
PROOF: Because $\|f_T\|_{L^1(Q)} = \|f\|_{L^1(Q)}$ for $f \in L^1(Q)^+$, the first assertion reduces to checking that the convergence takes place $Q$-almost surely. Moreover, because of (5.2.8), (5.2.10) will follow by LEBESGUE's Dominated Convergence Theorem as soon as we know the $Q$-almost sure convergence result. Thus, all that we will do is show that $f_T \to E^Q[f\,|\,\mathfrak{I}_\Theta]$ $Q$-almost surely for $f \in L^1(Q)$. Let $\mathfrak{F}$ denote the space of $f \in L^1(Q)$ such that
$\lim_{T\to\infty} f_T = E^Q[f\,|\,\mathfrak{I}_\Theta]$ (a.s., $Q$).
Clearly $\mathfrak{F}$ is a linear subspace of $L^1(Q)$, and, as a consequence of (5.2.7), we know that it is also closed in $L^1(Q)$. Thus, we will know that $\mathfrak{F} = L^1(Q)$ once we show that $\mathfrak{F}$ contains a dense subspace of $L^2(Q)$. To this end, let $L$ be the linear span of functions $f = g - g\circ\theta_t$ as $g$ runs over bounded measurable functions and $t$ runs over $[0,\infty)$. Noting that $E^Q[g\,|\,\mathfrak{I}_\Theta] = E^Q[g\circ\theta_t\,|\,\mathfrak{I}_\Theta]$
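(a.s., $Q$), we see that $E^Q[f\,|\,\mathfrak{I}_\Theta]$ can be taken to be 0 for such an $f$. At the same time,
$f_T(\omega) = \frac{1}{T}\left[\int_0^t g(\theta_s\omega)\,ds - \int_T^{T+t} g(\theta_s\omega)\,ds\right] \to 0$
as $T \nearrow \infty$. Thus, $L \subseteq \mathfrak{F}$, and so it remains only to check that the perpendicular complement $L^\perp$ of $L$ in $L^2(Q)$ is also contained in $\mathfrak{F}$. But, if $h \in L^\perp$, then $h = h\circ\theta_t$ (a.s., $Q$) for every $t \in [0,\infty)$. Hence, by FUBINI's Theorem, for each $T \in (0,\infty)$, $h_T = h$ (a.s., $Q$); and, because $T \in (0,\infty) \mapsto h_T(\omega)$ is continuous for $Q$-almost every $\omega \in \Omega$, this means that there is one $Q$-null set $A \in \mathcal{B}$ such that $h_T(\omega) = h(\omega)$ for all $T \in (0,\infty)$ and $\omega \notin A$. In particular, if $G = \{\omega : \lim_{T\to\infty} h_T(\omega) \text{ exists}\}$, then $G \in \mathfrak{I}_\Theta$ and $A^c \subseteq G$. Thus, if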
-
x
then ?E(w) = x(Btw) for all (t,w) E [O,m) x R and = h (a.s.,Q). In particular, is itself &measurable, and so we conclude that
x
We now want to study the structure of the set $M_1^\Theta((\Omega,\mathcal{B}))$. Clearly it is a convex subset of $M_1((\Omega,\mathcal{B}))$. In the following lemma, we show that the extreme elements of $M_1^\Theta((\Omega,\mathcal{B}))$ coincide with the ergodic elements of $M_1^\Theta((\Omega,\mathcal{B}))$ (i.e., those $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ with the property that $Q(I) \in \{0,1\}$, $I \in \mathfrak{I}_\Theta$). We will use $EM_1^\Theta((\Omega,\mathcal{B}))$ to denote the set of ergodic elements of $M_1^\Theta((\Omega,\mathcal{B}))$.
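5.2.11 Lemma. If $Q_1$ and $Q_2$ are elements of $M_1^\Theta((\Omega,\mathcal{B}))$, then either $Q_1 = Q_2$ on $\mathcal{B}$ or there is an $I \in \mathfrak{I}_\Theta$ for which $Q_2(I) \ne Q_1(I)$. In particular, distinct elements of $EM_1^\Theta((\Omega,\mathcal{B}))$ are singular, and $EM_1^\Theta((\Omega,\mathcal{B}))$ is precisely the set of extreme elements of $M_1^\Theta((\Omega,\mathcal{B}))$.
PROOF: To prove the first statement, suppose that $Q_1, Q_2 \in M_1^\Theta((\Omega,\mathcal{B}))$ and that $Q_1(\Gamma) \ne Q_2(\Gamma)$ for some $\Gamma \in \mathcal{B}$. Set $f = \chi_\Gamma$ and define $g = \varlimsup_{T\to\infty} f_T$. Then $g$ is a bounded $\mathfrak{I}_\Theta$-measurable function and
Thus, $Q_1$ differs from $Q_2$ on $\mathfrak{I}_\Theta$.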
The singularity of distinct ergodic measures is an immediate consequence of the preceding. Furthermore, if $Q$ is ergodic and
$Q = \alpha Q_1 + (1-\alpha) Q_2$
for some $\alpha \in (0,1)$ and $Q_1, Q_2 \in M_1^\Theta((\Omega,\mathcal{B}))$, then it is clear that $Q_1$ coincides with $Q_2$ on $\mathfrak{I}_\Theta$ and therefore, by the preceding, on $\mathcal{B}$. Thus, $Q$ is extreme in $M_1^\Theta((\Omega,\mathcal{B}))$ if it is ergodic. Finally, if $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ is not ergodic, choose $I \in \mathfrak{I}_\Theta$ so that $\alpha \equiv Q(I) \in (0,1)$ and set
$Q_1(d\omega) = \frac{\chi_I(\omega)}{\alpha}\,Q(d\omega)$ and $Q_2(d\omega) = \frac{\chi_{I^c}(\omega)}{1-\alpha}\,Q(d\omega)$.
It is obvious that $Q_1$ and $Q_2$ are distinct elements of $M_1^\Theta((\Omega,\mathcal{B}))$, and therefore we see that $Q$ cannot be an extreme element of $M_1^\Theta((\Omega,\mathcal{B}))$. ∎
By general principles of convex analysis, the preceding leads one to suspect that all the elements of $M_1^\Theta((\Omega,\mathcal{B}))$ ought to be recoverable from elements of $EM_1^\Theta((\Omega,\mathcal{B}))$. Indeed, the recovery process ought to consist of taking (finite) convex combinations followed by a "limit procedure." The rest of this section is devoted to making these ideas precise, at least when additional structure is imposed. In particular, from now on we will be assuming that $\Omega$ is a Polish space and that $\mathcal{B} = \mathcal{B}_\Omega$. In addition, we will assume that $\Theta$ is a measurable semigroup of continuous transformations $\theta_t : \Omega \to \Omega$. What we are going to show, under these assumptions, is the existence for every $Q \in M_1^\Theta(\Omega) \equiv M_1^\Theta((\Omega,\mathcal{B}_\Omega))$ of a $P_Q \in M_1(M_1(\Omega))$ with the properties that $P_Q$ is concentrated on $EM_1^\Theta(\Omega) \equiv EM_1^\Theta((\Omega,\mathcal{B}_\Omega))$ and
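(5.2.12)
5.2.13 Remark. Notice that the existence of $P_Q$ ought to be a "trivial" consequence of the following line of reasoning. Namely, since $\Omega$ is Polish, there is a conditional probability distribution $\omega \in \Omega \mapsto Q_\omega \in M_1(\Omega)$ of $Q$ given $\mathfrak{I}_\Theta$ (abbreviated by c.p.d. of $Q|\mathfrak{I}_\Theta$). That is, $\omega \mapsto Q_\omega$ is $\mathfrak{I}_\Theta$-measurable and
$Q(I \cap \Gamma) = \int_I Q_\omega(\Gamma)\,Q(d\omega), \quad I \in \mathfrak{I}_\Theta \text{ and } \Gamma \in \mathcal{B}_\Omega.$
(See Theorem 1.1.6 in [104] for the proof that $\omega \mapsto Q_\omega$ exists.) Moreover, it is not hard to believe that when $Q \in M_1^\Theta(\Omega)$ then $Q_\omega \in M_1^\Theta(\Omega)$ for $Q$-almost every $\omega \in \Omega$. Thus, if we knew that $\omega \mapsto Q_\omega$ were regular (i.e.,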
$Q_\omega(I) = \chi_I(\omega)$, $(\omega,I) \in \Omega\times\mathfrak{I}_\Theta$), then we would be done. Indeed, we would then know that $Q_\omega \in EM_1^\Theta(\Omega)$ for $Q$-almost every $\omega \in \Omega$, and we could therefore take $P_Q$ to be the distribution of $\omega \mapsto Q_\omega$ under $Q$. Unfortunately (cf. Exercise 5.2.21 below) the $\sigma$-algebra $\mathfrak{I}_\Theta$ will nearly never be countably generated, and therefore it is not clear that $\omega \mapsto Q_\omega$ can be chosen to be regular. Thus, some more work is required.
In order to handle the problem raised above, it will be useful to have specified a countable class $\mathfrak{F} \subseteq C_b(\Omega;\mathbb{R})$ which has the properties that $\mathfrak{F}$ determines weak convergence (i.e., $Q_n \Rightarrow Q$ if and only if $\int f\,dQ_n \to \int f\,dQ$ for every $f \in \mathfrak{F}$) and that it generates, under bounded pointwise convergence, all of the bounded measurable functions on $\Omega$ to $\mathbb{R}$. For example, one can take $\mathfrak{F}$ to be a countable dense subset of the space of bounded, $\rho$-uniformly continuous functions from $\Omega$ to $\mathbb{R}$, where $\rho$ is some totally bounded metric for $\Omega$. Next, for $f \in C_b(\Omega;\mathbb{R})$, let $\Omega(f)$ be the set of $\omega \in \Omega$ at which
$f^*(\omega) \equiv \lim_{T\to\infty} f_T(\omega)$
exists. Then it is clear that
Also, define Roo to be the set of w E R for which
converges in MI(S2) to some limit 6: as T /” m. Although it is not clear whether Roo E Bn, it is obvious that 000 is (0, : t > 0)-invariant and that 6: E Mf(R) for every w E ROO.In addition,
so
-
f d6: for all w E ROOand f E Cb(R; R). Finally, given and f *(w) = Q E MY(R) and a c.p.d. w Qw of Q)&, set
and note both that RQ E 38 and that
RQ = {w
E 000 : : 6 = Qw}.
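5.2.14 Lemma. The set $M_1^\Theta(\Omega)$ is closed in $M_1(\Omega)$, and $Q \in M_1^\Theta(\Omega)$ is an element of $EM_1^\Theta(\Omega)$ if and only if
$f^* = \int_\Omega f\,dQ$ (a.s., $Q$)
for every $f \in \mathfrak{F}$. In particular, $EM_1^\Theta(\Omega) \in \mathcal{B}_{M_1(\Omega)}$. Moreover, $Q(\Omega_Q) = 1$ for each $Q \in M_1^\Theta(\Omega)$; and therefore, for each $Q \in M_1^\Theta(\Omega)$, $Q_\omega \in M_1^\Theta(\Omega)$ for $Q$-almost every $\omega \in \Omega$.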
PROOF: Since $Q \in M_1^\Theta(\Omega)$ if and only if
$\int_\Omega f(\theta_t\omega)\,Q(d\omega) = \int_\Omega f(\omega)\,Q(d\omega)$ for all $t \in (0,\infty)$ and $f \in C_b(\Omega;\mathbb{R})$;
and because $f\circ\theta_t \in C_b(\Omega;\mathbb{R})$, $t \in (0,\infty)$, whenever $f \in C_b(\Omega;\mathbb{R})$, it is clear how to write $M_1^\Theta(\Omega)$ as the intersection of closed sets $C(t,f)$, $(t,f) \in (0,\infty)\times C_b(\Omega;\mathbb{R})$.
To prove the characterization of $EM_1^\Theta(\Omega)$, it is enough to show that the stated condition is sufficient. But, if $f^* = \int_\Omega f\,dQ$ (a.s., $Q$) for every $f \in \mathfrak{F}$, then $E^Q[f\,|\,\mathfrak{I}_\Theta]$ is $Q$-almost surely constant for every $f \in \mathfrak{F}$. Since the class of $f \in B(\Omega;\mathbb{R})$ which have this property is closed under bounded point-wise convergence, we see that $E^Q[f\,|\,\mathfrak{I}_\Theta]$ is $Q$-almost surely constant for every $f \in B(\Omega;\mathbb{R})$; and obviously, this is tantamount to the assertion that $Q$ is ergodic.
Finally, if $Q \in M_1^\Theta(\Omega)$, then the equality $Q(\Omega_Q) = 1$ is an immediate consequence of the Individual Ergodic Theorem together with the fact that, for each $f \in B(\Omega;\mathbb{R})$, $\omega \in \Omega \mapsto \int_\Omega f\,dQ_\omega$ is a version of $E^Q[f\,|\,\mathfrak{I}_\Theta]$. ∎
5.2.15 Lemma. For every Q E MY(R), Qw = : 6 E EMY(0) for Qalmost every w E R. In particular, if R b = {w E RQ : Qw E EMY(R)},
then
fib E 38, Qb C_ ROO, and Q = PROOF:Note that Q({w :
Qw
4 EM?(fl)})
J"b
6: Q ( L ) .
At the same time, for each $f \in \mathfrak{F}$ and $\varepsilon > 0$,
and
In the preceding, we have used the fact that $f_T \in C_b(\Omega;\mathbb{R})$ in order to pass from the second to the third lines, and we have used $(\chi_{\Omega_\infty} f^*)\circ\theta_s = \chi_{\Omega_\infty} f^*$, $s \in [0,\infty)$, in the passage to the last line. ∎
Clearly, the preceding shows that $\omega \mapsto Q_\omega$ admits a regular version; and therefore, by the reasoning at the end of Remark 5.2.13, we have the following result as an immediate consequence of Lemma 5.2.15.
5.2.16 Theorem. (ERGODIC DECOMPOSITION THEOREM) Let $\Omega$ be a Polish space and $\Theta = \{\theta_t : t \in [0,\infty)\}$ a measurable semigroup of continuous transformations on $\Omega$. Then, for each $Q \in M_1^\Theta(\Omega)$, there is a $P_Q \in M_1(M_1(\Omega))$ with the properties that $P_Q(EM_1^\Theta(\Omega)) = 1$ and (5.2.12) holds.
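A toy instance may help fix ideas about what the Ergodic Decomposition Theorem asserts. The sketch below is an illustration under simplifying assumptions, not the construction used in the text: it replaces the Polish space by a six-point set, the semigroup by the iterates of a single permutation (discrete time), and uses exact rational arithmetic. In that setting the ergodic invariant measures are the uniform measures on the cycles, and the decomposing measure puts mass $Q(\text{cycle})$ on each of them. All function names are hypothetical.

```python
from fractions import Fraction


def cycles(perm):
    """Orbits of the map i -> perm[i] on {0, ..., n-1}."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        orbit, i = [], start
        while i not in seen:
            seen.add(i)
            orbit.append(i)
            i = perm[i]
        out.append(orbit)
    return out


def push_forward(q, perm):
    """Image measure of q under the map: the mass at i moves to perm[i]."""
    out = [Fraction(0)] * len(q)
    for i, p in enumerate(q):
        out[perm[i]] += p
    return out


def uniform_on(orbit, n):
    """Uniform probability measure on one orbit (an ergodic invariant measure here)."""
    return [Fraction(1, len(orbit)) if i in orbit else Fraction(0) for i in range(n)]


def empirical_measure(start, perm, steps):
    """Time average (1/steps) * sum_{k < steps} delta at the k-th iterate of start."""
    out = [Fraction(0)] * len(perm)
    i = start
    for _ in range(steps):
        out[i] += Fraction(1, steps)
        i = perm[i]
    return out


perm = [1, 2, 0, 4, 3, 5]                      # cycles (0 1 2), (3 4), (5)
n = len(perm)
orbits = cycles(perm)
ergodic = [uniform_on(o, n) for o in orbits]

# An invariant measure built as a mixture of the ergodic ones.
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
Q = [sum(w * e[i] for w, e in zip(weights, ergodic)) for i in range(n)]

# The decomposition is recovered from Q alone: weight of a cycle = Q(cycle).
recovered = [sum(Q[i] for i in o) for o in orbits]
rebuilt = [sum(w * e[i] for w, e in zip(recovered, ergodic)) for i in range(n)]
```

Here `empirical_measure(3, perm, 2)` coincides with the uniform measure on the cycle through 3, mirroring the role that limits of time averages play in the proof.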
Before closing this section, we record what our results look like in the case when $\Theta = \{\theta_t : t \in \mathbb{R}\}$ is a measurable group of transformations (i.e., $\theta_{s+t} = \theta_s\circ\theta_t$ for all $s,t \in \mathbb{R}$) on $\Omega$. Note that invariance of measures or functions under $\Theta$ is equivalent to invariance under either of the semigroups $\Theta^+ \equiv \{\theta_t : t \in [0,\infty)\}$ or $\Theta^- \equiv \{\theta_{-t} : t \in [0,\infty)\}$. Thus, by treating $\Theta^+$ and $\Theta^-$ separately, one sees that for every $Q \in M_1^\Theta((\Omega,\mathcal{B}))$ and $f \in L^1(Q)$,
for A E (0, m),
(5.2.18)
both Q-almost surely and in L1(Q), and
(5.2.20)
if $p \in (1,\infty)$ and $f \in L^p(Q)$. Finally, when $\Omega$ is a Polish space and the $\theta_t$'s are continuous, then the Ergodic Decomposition Theorem again applies and yields (5.2.12) with a $P_Q$ which is concentrated on the ergodic elements of $M_1^\Theta(\Omega)$.
5.2.21 Exercise.
As was mentioned in Remark 5.2.13, the $\sigma$-algebra $\mathfrak{I}_\Theta$ is hardly ever countably generated. To see why this is the case, assume that $\Theta = \{\theta_t : t \in \mathbb{R}\}$ is a measurable group of transformations on $(\Omega,\mathcal{B})$ with the property that every orbit $[\omega]_\Theta \equiv \{\theta_t\omega : t \in \mathbb{R}\}$, $\omega \in \Omega$, is an element of $\mathcal{B}$ and that there exists a $Q \in EM_1^\Theta((\Omega,\mathcal{B}))$ such that $Q([\omega]_\Theta) = 0$ for every $\omega \in \Omega$. Under these circumstances, it is impossible for $\mathfrak{I}_\Theta$ to be countably generated. Indeed, suppose that $\mathfrak{I}_\Theta = \sigma(\{A_\ell\}_1^\infty)$. Choose $\{B_\ell\}_1^\infty$ so that
$B_\ell = A_\ell$ if $Q(A_\ell) = 1$, and $B_\ell = A_\ell^c$ if $Q(A_\ell) = 0$.
Show that $C \equiv \bigcap_{\ell=1}^\infty B_\ell = [\omega]_\Theta$ for some $\omega \in \Omega$, and conclude that $1 = Q(C) = Q([\omega]_\Theta) = 0$. In particular, this rules out the possibility that $\mathfrak{I}_\Theta$ is countably generated. For a simple example of such a situation, take $\Omega$ to be the 2-torus $S^1\times S^1$ and $\{\theta_t : t \in [0,\infty)\}$ to be the linear flow generated by a constant vector field of slope $\gamma$, where $\gamma$ is an irrational number. Check that all the orbits are then $F_\sigma$ subsets of $\Omega$ and that the normalized LEBESGUE measure on $\Omega$ is an ergodic, invariant measure which assigns measure 0 to each of these orbits.
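The ergodicity claimed for the torus example can be probed numerically. The sketch below rests on assumptions that are not taken from the text: the torus is parametrized as $[0,1)^2$, the flow is taken to be $\theta_t(x,y) = (x+t,\; y+\gamma t) \bmod 1$ with $\gamma = \sqrt{2}$, the test function is $f(x,y) = \cos(2\pi x)\cos(2\pi y)$ (whose integral against normalized LEBESGUE measure is 0), and the time integral is approximated by a midpoint rule. The helper `time_average` is hypothetical.

```python
import math


def time_average(f, x0, y0, gamma, T, dt=0.01):
    """Midpoint-rule approximation of (1/T) * int_0^T f(theta_t(x0, y0)) dt
    for the assumed linear flow theta_t(x, y) = (x + t, y + gamma * t) mod 1."""
    steps = int(round(T / dt))
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += f((x0 + t) % 1.0, (y0 + gamma * t) % 1.0)
    return total * dt / T


def f(x, y):
    # Mean zero with respect to normalized Lebesgue measure on the torus.
    return math.cos(2 * math.pi * x) * math.cos(2 * math.pi * y)


# Irrational slope: time averages should approach the space average 0.
gamma = math.sqrt(2.0)
averages = {T: time_average(f, 0.3, 0.7, gamma, T) for T in (10, 100, 1000)}

# Rational slope (gamma = 1), for contrast: orbits are closed curves, Lebesgue
# measure is not ergodic, and the limit depends on the starting point.
rational_average = time_average(f, 0.3, 0.7, 1.0, 1000)
rational_limit = 0.5 * math.cos(2 * math.pi * (0.3 - 0.7))
```

With the irrational slope the averages shrink roughly like $1/T$, while with slope 1 the average settles near `rational_limit`, which is not the space average.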
5.2.22 Exercise.
For the sake of completeness, work out the theory developed in this section for the case of a discrete 1-parameter semigroup $\{\theta_n : n \in \mathbb{Z}^+\}$. Of course, since $\theta_n = \theta^n$ where $\theta \equiv \theta_1$, the appropriate notions of invariance are simply that $Q = Q\circ\theta^{-1}$ and $f = f\circ\theta^{-1}$.
(i) From the HARDY-LITTLEWOOD inequality, derive
for all $\lambda \in (0,\infty)$ and any sequence $\{a_n\}_1^\infty$. (Here we use $|\Gamma|$ to denote the LEBESGUE measure of $\Gamma \subseteq \mathbb{Z}^+$; in other words, the cardinality of $\Gamma$.)
(ii) Knowing (i), prove that for any $\theta$-invariant $Q \in M_1((\Omega,\mathcal{B}))$ and any $f \in L^1(Q)$,
for $\lambda \in (0,\infty)$,
(5.2.26)  $\frac{1}{n}\sum_{m=1}^{n} f(\theta^m\omega) \to E^Q[f\,|\,\mathfrak{I}_\theta](\omega)$ (a.s., $Q$) and in $L^1(Q)$,
and
if $p \in (1,\infty)$ and $f \in L^p(Q)$.
(iii) Assuming that $\Omega$ is a Polish space and that $\theta$ is continuous, state and prove the appropriate version of the Ergodic Decomposition Theorem (i.e., Theorem 5.2.16).
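Before attempting part (i), it can be reassuring to test a discrete maximal inequality of the expected shape on random data. The display for (i) is not reproduced here, so the following sketch checks an assumed form: for a finite non-negative sequence $a$ and the forward maximal averages $\max_{m\ge 1} \frac{1}{m}\sum_{k=n}^{n+m-1} a_k$ (windows kept inside the sequence), the set $E_\lambda$ where the maximal average exceeds $\lambda$ satisfies $\lambda\,|E_\lambda| \le \sum_{n\in E_\lambda} a_n$. Integer data are used so that every comparison is exact; `exceedance_set` is a hypothetical helper.

```python
import random


def exceedance_set(a, lam):
    """Indices n for which some forward window a[n], ..., a[n+m-1] has average > lam.
    The comparison sum > lam * m avoids division and keeps the arithmetic exact."""
    size = len(a)
    hits = []
    for n in range(size):
        running = 0
        for m in range(1, size - n + 1):
            running += a[n + m - 1]
            if running > lam * m:
                hits.append(n)
                break
    return hits


rng = random.Random(1)
checks = []
for _ in range(200):
    a = [rng.randint(0, 9) for _ in range(40)]
    lam = rng.randint(1, 8)
    E = exceedance_set(a, lam)
    # (lam * |E|,  sum of a over E,  sum of a over everything)
    checks.append((lam * len(E), sum(a[n] for n in E), sum(a)))

all_hold = all(lhs <= mid <= total for lhs, mid, total in checks)
```

The middle quantity is the sharp right-hand side that the sunrise argument produces; bounding it by the full sum gives the weak-type estimate with constant 1.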
5.2.28 Exercise.
Let $\Pi(\sigma,\cdot)$ be a transition probability function on the measurable space $(\Sigma,\mathcal{F})$ and define the operator $[\Pi\phi](\sigma) = \int_\Sigma \phi(\tau)\,\Pi(\sigma,d\tau)$, $\sigma \in \Sigma$, for $\phi \in B((\Sigma,\mathcal{F});\mathbb{R})$. Denote by $B^\Pi((\Sigma,\mathcal{F});\mathbb{R})$ the space of $\phi \in B((\Sigma,\mathcal{F});\mathbb{R})$ which are $\Pi$-invariant (i.e., $\phi = \Pi\phi$), and let $M_1^\Pi((\Sigma,\mathcal{F}))$ be the space of $\Pi$-invariant $\mu \in M_1((\Sigma,\mathcal{F}))$ (i.e., $\mu = \mu\Pi \equiv \int_\Sigma \Pi(\sigma,\cdot)\,\mu(d\sigma)$).
(i) Prove that, for any $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$ and $\phi \in L^1(\mu)$,
for $\lambda \in (0,\infty)$ and
for $p \in (1,\infty]$.
(ii) Next, show that for each $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$ there is a unique bounded linear operator $E_\mu : L^1(\mu) \to L^1(\mu)$ with the property
$\frac{1}{n}\sum_{m=1}^{n} [\Pi^m\phi](\sigma) \to [E_\mu\phi](\sigma)$ $\mu$-almost surely and in $L^1(\mu)$.
Show that $E_\mu^2 = E_\mu$, $E_\mu\phi \ge 0$ if $\phi \ge 0$, and $E_\mu\phi = \phi$ (a.s., $\mu$) if $\phi \in B^\Pi((\Sigma,\mathcal{F});\mathbb{R})$. In particular, conclude that $E_\mu$ is a contraction on $L^p(\mu)$ for every $p \in [1,\infty]$. Finally, show that
for $p \in (1,\infty)$ and $\phi \in L^p(\mu)$.
(iii) Call an element $\mu$ of $M_1^\Pi((\Sigma,\mathcal{F}))$ $\Pi$-ergodic if
$\phi = \int_\Sigma \phi\,d\mu$ (a.s., $\mu$) for each $\phi \in B^\Pi((\Sigma,\mathcal{F});\mathbb{R})$.
Show that two $\Pi$-ergodic elements of $M_1^\Pi((\Sigma,\mathcal{F}))$ are either equal or singular.
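(iv) Set $\Omega = \Sigma^{\mathbb{N}}$, $\mathcal{B} = \mathcal{F}^{\mathbb{N}}$, and let $\{P_\sigma : \sigma \in \Sigma\}$ be the MARKOV family of probability measures on $(\Omega,\mathcal{B})$ whose transition function is $\Pi(\sigma,\cdot)$. Given $\mu \in M_1((\Sigma,\mathcal{F}))$, set $P_\mu = \int_\Sigma P_\sigma\,\mu(d\sigma)$, and check that $P_\mu$ is invariant under the shift $\theta : \Omega \to \Omega$ given by $(\theta\omega)_n = \omega_{n+1}$, $n \in \mathbb{N}$, if and only if $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$. Finally, show that $\mu \in M_1^\Pi((\Sigma,\mathcal{F}))$ is $\Pi$-ergodic if and only if $P_\mu$ is ergodic for $\theta$.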
5.2.31 Exercise.
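Let $\Theta = \{\theta_t : t \in [0,\infty)\}$ be a measurable semigroup of transformations on the measurable space $(\Omega,\mathcal{F})$, and assume that there is a sub $\sigma$-algebra $\mathcal{F}_0 \subseteq \mathcal{F}$ with the property that $\bigcup_{t\in[0,\infty)} \theta_t^{-1}\mathcal{F}_0$ generates the whole of $\mathcal{F}$. Next, for each $T \in [0,\infty)$, let $\mathcal{F}_T$ and $\mathcal{F}^T$ be the $\sigma$-algebras generated by $\bigcup_{t\in[0,T]} \theta_t^{-1}\mathcal{F}_0$ and $\bigcup_{t\in[T,\infty)} \theta_t^{-1}\mathcal{F}_0$, respectively. Finally, define the tail $\sigma$-algebra $\mathcal{T} = \bigcap_{T\in[0,\infty)} \mathcal{F}^T$.
(i) Given any $f \in B((\Omega,\mathcal{F});\mathbb{R})$, set
$f^*(\omega) = \varlimsup_{t\to\infty} f_t(\omega) = \varlimsup_{t\to\infty} \frac{1}{t}\int_0^t f(\theta_s\omega)\,ds.$
When $f$ is $\mathcal{F}_T$-measurable for some $T \in [0,\infty)$, show that the function $f^*$ is $\mathcal{T}$-measurable. Next, assuming that $Q \in M_1^\Theta((\Omega,\mathcal{F}))$ and using $\overline{\mathcal{T}}^Q$ to denote the $Q$-completion of $\mathcal{T}$, show that $f^*$ is $\overline{\mathcal{T}}^Q$-measurable for every $f \in B((\Omega,\mathcal{F});\mathbb{R})$.
(ii) Using (i), show that if $Q \in M_1^\Theta((\Omega,\mathcal{F}))$, then $\mathfrak{I}_\Theta \subseteq \overline{\mathcal{T}}^Q$; and conclude that $Q$ is ergodic if $Q(A) \in \{0,1\}$ for every $A \in \mathcal{T}$.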
5.3 The General Symmetric Markov Case
Our first application of the results obtained in Section 5.1 will be to the large deviation theory for the empirical distribution of the position of a symmetric MARKOV process. More precisely, let $\Sigma$, $P(t,\sigma,\cdot)$, and the associated MARKOV family $\{P_\sigma : \sigma \in \Sigma\} \subseteq M_1((\Omega,\mathcal{B}))$ be as they were in Section 4.4; and define
(5.3.1)  $L_t(\omega) = \frac{1}{t}\,\lambda_{[0,t]}\circ\big(\omega|_{[0,t]}\big)^{-1}, \quad (t,\omega) \in (0,\infty)\times\Omega,$
as in Remark 4.2.2. Next, assume that there is a $P(t,\sigma,\cdot)$-reversing measure $m \in M_1(\Sigma)$, and define the DIRICHLET form $\mathcal{E}$ and the associated functions $\Lambda_\mathcal{E} : B(\Sigma;\mathbb{R}) \to \mathbb{R}$ and $J_\mathcal{E} : M_1(\Sigma) \to [0,\infty]$ as we did in the final part of Section 4.2 (cf. especially (4.2.47), (4.2.51), and (4.2.49)). Finally, set $P_m = \int_\Sigma P_\sigma\,m(d\sigma)$.
5.3.2 Lemma. If $J_\mathcal{E}$ is lower semi-continuous, then
and (5.3.4)
for all $C \subset\subset M_1(\Sigma)$. Moreover, if, in addition, $J_\mathcal{E}$ is good (or, equivalently, $\Lambda_\mathcal{E}$ is tight), then (5.3.4) holds for every closed $C \subseteq M_1(\Sigma)$.
PROOF: In (ii) of Exercise 4.2.63, we saw that $J_\mathcal{E}$ is convex. Thus, by Theorem 2.2.15 and (4.2.51), if $J_\mathcal{E}$ is lower semi-continuous then (5.3.3) follows; and so, by the results in Section 5.1 (in particular part (i) of Exercise 5.1.12), all that we have to do is check that
But, because $m \in M_1(\Sigma)$, it is easy to see that
We now want to show that, under reasonable conditions, one can prove the complementary lower bound. The approach which we are going to
adopt is very reminiscent of the one which we used in the original proof that we gave in Section 1.2 of the classical CRAMER Theorem for real-valued random variables. That is, we will force certain ergodic behavior by the introduction of an appropriate RADON-NIKODYM factor and will get our lower bound by estimating the size of the factor which we have introduced. However, in order to carry out this program, we need to make the following mild assumption.
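(E) If $\{Q_T : T > 0\} \subseteq M_1((\Omega,\mathcal{B}))$ is consistent in the sense that $Q_{T_1} = Q_{T_2}$ on $\mathcal{B}_{T_1}$ for all $0 \le T_1 < T_2 < \infty$, then there exists a unique $Q \in M_1((\Omega,\mathcal{B}))$ such that $Q = Q_T$ on $\mathcal{B}_T$ for each $T \in [0,\infty)$.
Note that (E) holds if $\Omega$ is a Polish space, $\mathcal{B} = \mathcal{B}_\Omega$, each $\mathcal{B}_t$ is countably generated, and $\mathcal{B}$ is generated by $\bigcup_{t\ge 0} \mathcal{B}_t$. (Cf. Theorem 1.1.10 in [104].)
5.3.5 Lemma. Let $u \in D \cap B(\Sigma;[1,\infty))$, set $V_u = -\frac{Lu}{u}$, define
for $(t,\sigma) \in (0,\infty)\times\Sigma$ and $\Gamma \in \mathcal{B}_\Sigma$, and set
for $(t,\omega) \in [0,\infty)\times\Omega$. (See Lemma 4.2.23 and Theorem 4.2.25 for the notation here.) Then $P_u(t,\sigma,\cdot)$ is a transition probability function; and, for every $\sigma \in \Sigma$, $(X_u(t),\mathcal{B}_t,P_\sigma)$ is a non-negative martingale with mean-value 1. Moreover, for each $\sigma \in \Sigma$, there is a unique $P_\sigma^u \in M_1((\Omega,\mathcal{B}))$ satisfying
$P_\sigma^u(A) = \int_A X_u(t,\omega)\,P_\sigma(d\omega), \quad t \in [0,\infty) \text{ and } A \in \mathcal{B}_t.$
In fact, the family $\{P_\sigma^u : \sigma \in \Sigma\}$ is measurable and, for each $\sigma \in \Sigma$,
for all $s,t \in (0,\infty)$ and $A \in \mathcal{B}_s$. Finally, if
(5.3.7)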
then $m_u$ is a reversing measure for $P_u(t,\sigma,\cdot)$.
PROOF: We first check that $P_u(t,\sigma,\cdot)$ is a transition probability function. To this end, note that
Thus, the measurability of
as well as the CHAPMAN-KOLMOGOROV equation are immediate. In addition, since $u = P_t^{V_u}u$ (cf. the proof of Lemma 4.2.35), it is clear that $P_u(t,\sigma,\Sigma) = 1$. We next show that, for each $\sigma \in \Sigma$ and $\Gamma \in \mathcal{B}_\Sigma$,
for $s,t \in (0,\infty)$ and $A \in \mathcal{B}_s$. Indeed, by the MARKOV property combined with (4.2.26),
which is equivalent to (5.3.8). By taking $\Gamma = \Sigma$ in (5.3.8), we get the asserted martingale property; and therefore, by (E), the existence and uniqueness of $P_\sigma^u$ have also been established. Moreover, the measurability of $\sigma \in \Sigma \mapsto P_\sigma^u$ is a trivial consequence of the expression for $P_\sigma^u$ on each of the $\mathcal{B}_t$'s, and (5.3.6) follows easily from (5.3.8). Finally, to see that $m_u$ is reversing for $P_u(t,\sigma,\cdot)$, note that for $\phi, \psi \in B(\Sigma;\mathbb{R})$
Since, by Lemma 4.2.50, $P_t^{V_u}$ is self-adjoint on $L^2(m)$, it follows that the first expression in the above is symmetric in $\phi$ and $\psi$. ∎
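The mechanism behind Lemma 5.3.5 can be checked by hand on a finite state space. The sketch below is an analogue under explicit assumptions rather than a transcription of the lemma, whose displays are not reproduced above: the process is a continuous-time chain on three states with a symmetric generator matrix `L` (so the uniform measure `m` is reversing), $V_u$ is taken to be $-Lu/u$, the transformed semigroup is assumed to have generator $u^{-1}(L+V_u)u$, and the candidate reversing measure is assumed to be proportional to $u^2\,m$. Exact rational arithmetic is used throughout.

```python
from fractions import Fraction as Fr

# Symmetric generator: non-negative off-diagonal rates, rows summing to zero.
L = [[Fr(-3), Fr(2), Fr(1)],
     [Fr(2), Fr(-5), Fr(3)],
     [Fr(1), Fr(3), Fr(-4)]]
m = [Fr(1, 3)] * 3                 # uniform measure, reversing because L is symmetric
u = [Fr(1), Fr(2), Fr(5, 2)]       # a function u >= 1 on the state space
n = len(u)

# Assumed potential V_u = -(L u) / u, so that (L + V_u) u = 0.
Lu = [sum(L[i][j] * u[j] for j in range(n)) for i in range(n)]
V_u = [-Lu[i] / u[i] for i in range(n)]

# Assumed generator of the transformed chain: conjugate L + V_u by multiplication with u.
L_u = [[(L[i][j] + (V_u[i] if i == j else 0)) * u[j] / u[i] for j in range(n)]
       for i in range(n)]

# Candidate reversing measure, proportional to u^2 m.
Z = sum(u[i] ** 2 * m[i] for i in range(n))
m_u = [u[i] ** 2 * m[i] / Z for i in range(n)]

row_sums = [sum(row) for row in L_u]
off_diagonal_ok = all(L_u[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
detailed_balance = all(m_u[i] * L_u[i][j] == m_u[j] * L_u[j][i]
                       for i in range(n) for j in range(n))
```

Vanishing row sums correspond to $P_u(t,\sigma,\Sigma) = 1$, and the detailed balance check is the finite-state counterpart of the symmetry in $\phi$ and $\psi$ used at the end of the proof.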
5.3.9 Lemma. Assume that $J_\mathcal{E}(\mu) = 0$ only if $\mu = m$. Then for every $u \in D \cap B(\Sigma;[1,\infty))$ and every $\tau$-open neighborhood (cf. the discussion preceding Lemma 3.2.19) $G \in \mathcal{B}_{M_1(\Sigma)}$ of $m_u$,
$\lim_{t\to\infty} \int_{\{\omega : L_t(\omega) \in G\}} X_u(t,\omega)\,P_\sigma(d\omega) = 1$ in $m$-measure.
PROOF: Note that it suffices to check that if $P^u \equiv \int_\Sigma P_\sigma^u\,m_u(d\sigma)$, then $P^u(\{\omega : L_t(\omega) \in G\}) \to 1$ as $t \to \infty$. Furthermore, since $P^u$ is time-shift invariant, this latter statement will follow from the Individual Ergodic Theorem once we show that $P^u$ is ergodic relative to time-shift. Thus, all that we have to do is show that if $\{t_n\}_1^\infty \subseteq [0,\infty)$, $F \in B(\Sigma^{\mathbb{Z}^+};\mathbb{R})$, and
then $\Phi_0$ is $P^u$-almost surely constant if, for each $t \in (0,\infty)$, $\Phi_t = \Phi_0$ $P^u$-almost surely. We begin by showing that if $\phi \in B(\Sigma;\mathbb{R})$ satisfies
for each $t \in (0,\infty)$, then $\phi$ is $m_u$-almost surely constant. In fact, given such a $\phi$, we can use symmetry to check that
Since, for each $t \in (0,\infty)$, $P_u(t,\sigma,d\tau)\,m_u(d\sigma)$ is bounded above and below by constant positive multiples of $P(t,\sigma,d\tau)\,m(d\sigma)$, it follows from (4.2.54) that $\mathcal{E}(\phi,\phi) = 0$. But, this means that $J_\mathcal{E}(\mu) = 0$, where
and therefore, by hypothesis, $\phi$ is $m$-almost surely constant. Returning to the ergodicity question about $P^u$, suppose that $\Phi_t = \Phi_0$ $P^u$-almost surely for each $t \in (0,\infty)$. Set $\phi(\sigma) = \int_\Omega \Phi_0(\omega)\,P_\sigma^u(d\omega)$, and observe that for all $t \in (0,\infty)$ and $m_u$-almost every $\sigma \in \Sigma$
Thus, by the preceding, $\phi$ is $m_u$-almost surely constant. But this means that, for any $t \in (0,\infty)$ and $A \in \mathcal{B}_t$,
s,
Qo(w) P"(dw) =
J, @t(w)P"(dw)
=L
$ ( C t ( w ) ) P"(dw) = P"(A)
and clearly this leads to the conclusion that
In other words, $\Phi_0$ must be $P^u$-almost surely constant. ∎
5.3.10 Theorem. Assume that $J_\mathcal{E}(\mu) = 0$ only if $\mu = m$, and let $\nu \in M_1(\Sigma)$ have the property that, for some $T \in [0,\infty)$, $\nu P_T$ is not singular to $m$. Then for every $\tau$-open set $G \in \mathcal{B}_{M_1(\Sigma)}$ (5.3.11)
Hence, if, in addition, $J_\mathcal{E}$ is a good rate function, then
(5.3.12)
for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$.
PROOF: In view of Lemma 5.3.2, all that we have to do is check (5.3.11). Also, since, for any $T \in (0,\infty)$ and $\delta > 0$,
$P_{\nu P_T}(\{\omega : L_t(\omega) \in G\}) \le P_\nu(\{\omega : \|L_t(\omega) - G\|_{\mathrm{var}} < \delta\})$
as soon as $t$ is sufficiently large, we will assume, without loss of generality, that $\nu$ itself is not singular to $m$. In particular, this means, by Lemma 5.3.9, that
$\lim_{t\to\infty} \int_{\{\omega : L_t(\omega) \in G\}} X_u(t,\omega)\,P_\nu(d\omega) > 0.$
We begin by showing that if $u \in D \cap B(\Sigma;[1,\infty))$, then (5.3.13)
for every $\tau$-open $G \in \mathcal{B}_{M_1(\Sigma)}$ containing $m_u$. To this end, set
; therefore, by the remarks made above, for all r E ( 0 , ~ )and
2
- lim
sup
'LopEG(r)
1
V, d p =
C
5
dm,.
(5.3.13) is now proved. Finally, we will show that if $J_\mathcal{E}(\mu) < \infty$, then there exists a sequence $\{u_n\}_1^\infty \subseteq D \cap B(\Sigma;[1,\infty))$ such that $m_{u_n} \to \mu$ in the strong topology on $M_1(\Sigma)$ and $J_\mathcal{E}(m_{u_n}) \to J_\mathcal{E}(\mu)$. Clearly, when combined with (5.3.13), this will complete the proof of (5.3.11).
Let $\mu \in M_1(\Sigma)$ with $J_\mathcal{E}(\mu) < \infty$ be given. Then $\mu(d\sigma) = f(\sigma)\,m(d\sigma)$ where $f \in L^1(m)^+$ and $\mathcal{E}(f^{1/2},f^{1/2}) = J_\mathcal{E}(\mu)$. For $n \in \mathbb{Z}^+$, set
Clearly $f_n^{1/2} \to f^{1/2}$ in $L^2(m)$, and therefore, by (i) of Exercise 4.2.63,
At the same time, one can use (4.2.54) to check that
Thus, our problem reduces to that of finding $\{u_n\}_1^\infty \subseteq D \cap B(\Sigma;[1,\infty))$ for the case when $f = \frac{d\mu}{dm}$ satisfies $\alpha^2 \le f \le 1/\alpha^2$ for some $\alpha \in (0,1]$. But
where $R_\lambda$, $\lambda \in (0,\infty)$, is the operator defined in (4.2.32) with $V = 0$. Then, not only is $u_n \in B(\Sigma;[1,\infty))$ but also, by Lemma 4.2.31, $u_n \in D$. At the same time, by the Spectral Theorem,
and so $\alpha u_n \to f^{1/2}$ in $L^2(m)$ while $\alpha^2\,\mathcal{E}(u_n,u_n) \to \mathcal{E}(f^{1/2},f^{1/2})$. From these, it is easy to see that $m_{u_n} \to \mu$ strongly and that $J_\mathcal{E}(m_{u_n}) \to J_\mathcal{E}(\mu)$. ∎
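5.3.14 Exercise.
Assume that $P(t,\sigma,\cdot)$ is FELLER-continuous and that $P(t,\sigma,\cdot) \ll m$ for every $(t,\sigma) \in (0,\infty)\times\Sigma$.
(i) Show that $m$ itself is the only $\{P_t : t > 0\}$-invariant probability measure, and conclude not only that $J_\mathcal{E}(\mu) = 0$ only if $\mu = m$ but also that
$\inf_{\sigma\in\Sigma}\, \varliminf_{t\to\infty} \frac{1}{t}\log P_\sigma(\{\omega : L_t(\omega) \in G\}) \ge -\inf_G J_\mathcal{E}$
for every $\tau$-open $G \in \mathcal{B}_{M_1(\Sigma)}$.
(ii) Next, using (4.2.42), (4.1.46), and Theorem 4.2.58, show that $J_\mathcal{E} = J_P$. In particular, this means that $J_\mathcal{E}$ is lower semi-continuous.
(iii) Finally, under the additional assumption that $\Lambda_\mathcal{E}$ is tight (equivalently, $J_\mathcal{E}$ is good), use (i) and (ii) above together with the considerations in Exercise 5.1.17 to conclude that
$-\inf_{\Gamma^\circ} J_\mathcal{E} \le \inf_{\sigma\in\Sigma}\, \varliminf_{t\to\infty} \frac{1}{t}\log P_\sigma(\{\omega : L_t(\omega) \in \Gamma\}) \le \sup_{\sigma\in\Sigma}\, \varlimsup_{t\to\infty} \frac{1}{t}\log P_\sigma(\{\omega : L_t(\omega) \in \Gamma\}) \le -\inf_{\bar\Gamma} J_\mathcal{E}, \quad \Gamma \in \mathcal{B}_{M_1(\Sigma)}.$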
5.3.15 Exercise.
(i) Suppose that there exist $\alpha, \beta \in [0,\infty)$ such that
(5.3.16)  $H(\mu|m) \le \alpha J_\mathcal{E}(\mu) + \beta, \quad \mu \in M_1(\Sigma).$
Show that $J_\mathcal{E}$ is then a good rate function on $M_1(\Sigma)$. In fact, show that $\{\mu : J_\mathcal{E}(\mu) \le L\}$ is a strongly compact subset of $M_1(\Sigma)$ for every $L \in (0,\infty)$.
(ii) Next, suppose that (5.3.16) holds with $\beta = 0$, and show that, in this case, $J_\mathcal{E}(\mu) = 0$ if and only if $\mu = m$. (Hint: use (3.2.25).) In particular, (5.3.16) with $\beta = 0$ means that all the hypotheses of Theorem 5.3.10 are satisfied.
(iii) Define
$\Lambda_m(V) = \log\left(\int_\Sigma \exp[V]\,dm\right), \quad V \in B(\Sigma;\mathbb{R}),$
as in Section 3.2. Using the preceding and Lemma 3.2.13, show that (5.3.16) holds for a given $\alpha > 0$ and $\beta = 0$ if and only if
(5.3.17)  $\Lambda_\mathcal{E}(V) \le \frac{1}{\alpha}\,\Lambda_m(\alpha V), \quad V \in B(\Sigma;\mathbb{R}).$
Also, check that (5.3.17) holds as soon as the indicated domination holds for all $V \in C_b(\Sigma;\mathbb{R})$.
5.4 Large Deviations for Hypermixing Processes
In this section, we will present a very general, process-level large deviation principle. In order not to overburden the discussion with inessential technicalities, we will deal initially with continuous path processes. Afterwards, we will say what has to be done to extend the results to the SKOROKHOD setting; and we will leave the discrete time case as an exercise.
Let $\Sigma$ be a Polish space and denote by $\Omega$ the Polish space $C(\mathbb{R};\Sigma)$ with the topology of uniform convergence on compacts. For $t \in \mathbb{R}$, define $\Sigma_t : \Omega \to \Sigma$ so that $\Sigma_t(\omega)$ is the position of the path $\omega \in \Omega$ at time $t$. Next, given a closed interval $I \subseteq \mathbb{R}$, let $\Omega_I$ stand for $C(I;\Sigma)$, and for $t \in I$ define $\Sigma_t : \Omega_I \to \Sigma$ accordingly. Also, define $\pi_I : \Omega \to \Omega_I$ to be the natural projection map obtained by restriction to $I$; and set $\mathcal{B}_I = \pi_I^{-1}(\mathcal{B}_{\Omega_I})$. It is clear that $\mathcal{B}_I$ coincides with the $\sigma$-algebra over $\Omega$ generated by the maps $\Sigma_t$, $t \in I$. Given $\ell > 0$, $n \ge 2$, and real-valued functions $f_1,\dots,f_n$ on $\Omega$, we will say that $f_1,\dots,f_n$ are $\ell$-measurably separated if there exist intervals $I_1,\dots,I_n$ with the properties that $\mathrm{dist}(I_m,I_{m'}) \ge \ell$ for $1 \le m < m' \le n$ and $f_m$ is $\mathcal{B}_{I_m}$-measurable for each $1 \le m \le n$.
With the preceding notation, we can now describe an important mixing property. Namely, we will say that $P \in M_1(\Omega)$ is hypermixing if there exist a number $\ell_0 \ge 0$ and non-increasing functions $\alpha, \beta : (\ell_0,\infty) \to [1,\infty)$ and $\gamma : (\ell_0,\infty) \to [0,1]$ which satisfy
(5.4.1)  $\lim_{\ell\to\infty} \alpha(\ell) = 1, \quad \lim_{\ell\to\infty} \ell\,(\beta(\ell)-1) < \infty, \quad \lim_{\ell\to\infty} \gamma(\ell) = 0,$
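and for which

(H-1)

whenever $n \ge 2$, $\ell > \ell_0$, and $f_1,\dots,f_n$ are $\ell$-measurably separated functions; and

(H-2)

whenever $\ell > \ell_0$ and $f, g \in L^1(P)$ are $\ell$-measurably separated.

Finally, define the time-shift transformation group $\{\theta_t : t \in \mathbb{R}\}$ on $\Omega$ by $\Sigma_s(\theta_t\omega) = \Sigma_{s+t}(\omega)$ for $s, t \in \mathbb{R}$ and $\omega \in \Omega$. Clearly, $\theta_s \circ \theta_t = \theta_{s+t}$ for all $s, t \in \mathbb{R}$, $\pi_I \circ \theta_t = \pi_{t+I}$ for all $t \in \mathbb{R}$ and intervals $I$, and $(t,\omega) \in \mathbb{R}\times\Omega \mapsto \theta_t\omega \in \Omega$ is continuous. We will use $M_1^S(\Omega)$, $EM_1^S(\Omega)$, and $\mathfrak{I}$ to denote, respectively, the $\{\theta_t : t \in \mathbb{R}\}$-invariant $Q \in M_1(\Omega)$, the ergodic $Q \in M_1^S(\Omega)$, and the $\{\theta_t : t \in \mathbb{R}\}$-invariant elements of $\mathcal{B}_\Omega$. For $\omega \in \Omega$ and $T \in (0,\infty)$, let $R_T(\omega) \in M_1(\Omega)$ be the empirical process measure given by

(5.4.2)

Obviously, $(T,\omega) \in (0,\infty)\times\Omega \mapsto R_T(\omega) \in M_1(\Omega)$ is continuous.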
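The separation requirement in the definition of $\ell$-measurably separated functions is purely a statement about the intervals $I_1,\dots,I_n$, so it can be checked mechanically. The following is a minimal sketch, assuming that each interval is represented by a pair of endpoints `(a, b)` with `a <= b`; the names `interval_distance` and `are_separated` are hypothetical helpers introduced only for illustration.

```python
from itertools import combinations


def interval_distance(i, j):
    """Distance between two intervals given as endpoint pairs (a, b), a <= b.

    The distance is 0 when the intervals overlap or touch.
    """
    (a1, b1), (a2, b2) = i, j
    return max(0.0, a2 - b1, a1 - b2)


def are_separated(intervals, ell):
    """True when dist(I_m, I_m') >= ell for every pair m < m'."""
    return all(
        interval_distance(i, j) >= ell for i, j in combinations(intervals, 2)
    )


# Three intervals whose pairwise gaps are 2.0, 2.5 and 5.5.
example = [(0.0, 1.0), (3.0, 4.0), (6.5, 8.0)]
print(are_separated(example, 2.0))
print(are_separated(example, 2.5))
```

Only the geometric half of the definition is covered here; the measurability of each $f_m$ with respect to $\mathcal{B}_{I_m}$ has no counterpart in the sketch.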
Throughout the rest of this section, $P$ will denote a fixed hypermixing element of $M_1^S(\Omega)$. As a consequence of (H-2) and part (ii) of Exercise 5.2.31, it is clear that $P$ is ergodic and therefore that $R_T(\omega) \Rightarrow P$ for $P$-almost every $\omega \in \Omega$. What we want to do is derive an associated large deviation principle. The procedure which we will use is based on the following outline.

STEP 1: For any compact interval $I$ and $V \in B(\Omega_I;\mathbb{R})$, we will use (H-1) to show that

(5.4.3)

exists. At the same time, we will show that $V \in C_b(\Omega_I;\mathbb{R}) \mapsto \Lambda_I(V)$ is tight and thereby derive the upper bound

$$\limsup_{T\to\infty}\frac{1}{2T}\log\bigl[P(\{\omega : R_T(\omega) \in F\})\bigr] \le -\inf_F \Lambda^* \tag{5.4.4}$$

for closed $F \subseteq M_1(\Omega)$, where $\Lambda^* : M_1(\Omega) \to [0,\infty]$ is the good rate function defined by

$$\Lambda^*(Q) = \sup\bigl\{\Lambda_I^*(Q\circ\pi_I^{-1}) : I \text{ is a compact interval}\bigr\} \tag{5.4.5}$$

and, for each $I$, $\Lambda_I^* : M_1(\Omega_I) \to [0,\infty]$ given by

(5.4.6)

is the LEGENDRE transform of $\Lambda_I$.
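To see what a Legendre transform of a logarithmic moment functional looks like in the simplest possible setting, one can replace $\Omega_I$ by a finite set. The sketch below is a toy analogue only and is not the functional $\Lambda_I$ of this section: it assumes a finite probability vector `p`, takes $\Lambda(V) = \log\sum_i p_i e^{V_i}$, and checks numerically that the supremum of $\sum_i q_i V_i - \Lambda(V)$ is attained at $V = \log(q/p)$, where it equals the relative entropy of `q` with respect to `p`. All names are hypothetical.

```python
import math


def log_mgf(p, v):
    """Lambda(V) = log sum_i p_i exp(V_i) on a finite state space."""
    return math.log(sum(pi * math.exp(vi) for pi, vi in zip(p, v)))


def relative_entropy(q, p):
    """H(q|p) = sum_i q_i log(q_i / p_i), with 0 log 0 = 0."""
    h = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:
            if pi == 0.0:
                return math.inf  # q is not absolutely continuous w.r.t. p
            h += qi * math.log(qi / pi)
    return h


def legendre_objective(q, p, v):
    """The quantity <V, q> - Lambda(V) whose supremum over V is Lambda*(q)."""
    return sum(qi * vi for qi, vi in zip(q, v)) - log_mgf(p, v)


p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

# The maximizer is V = log(dq/dp); any other V gives a smaller value.
v_star = [math.log(qi / pi) for qi, pi in zip(q, p)]
print(legendre_objective(q, p, v_star), relative_entropy(q, p))
print(legendre_objective(q, p, [1.0, -1.0, 2.0]))
```

In this finite setting the identity is the usual variational formula for relative entropy, the same formula that appears as (5.4.20) later in the section.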
STEP 2: Given a compact interval $I$, define $H_I(Q|P)$ for $Q \in M_1(\Omega)$ to be the relative entropy $H(Q\circ\pi_I^{-1}\,|\,P\circ\pi_I^{-1})$ of $Q\circ\pi_I^{-1}$ given $P\circ\pi_I^{-1}$. Again using (H-1), we will show that

(5.4.7)

Thus, if we define the specific entropy function

(5.4.8)

then $H$ is a good rate function and

$$\limsup_{T\to\infty}\frac{1}{2T}\log\bigl[P(\{\omega : R_T(\omega) \in F\})\bigr] \le -\inf_F H \tag{5.4.9}$$

for closed $F \subseteq M_1(\Omega)$.

STEP 3: Having established the upper bound (5.4.9), we will turn to the complementary lower bound. We will use the Ergodic Theorem to check that, for $Q \in EM_1^S(\Omega)$,

$$\liminf_{T\to\infty}\frac{1}{2T}\log P(\{\omega : R_T(\omega) \in G\}) \ge -H(Q) \tag{5.4.10}$$

for open $G \ni Q$.
In order to remove the restriction that $Q$ be ergodic, we will introduce the lower semi-continuous function $J : M_1(\Omega) \to [0,\infty]$ given by

$$J(Q) = -\inf\Bigl\{\liminf_{T\nearrow\infty}\frac{1}{2T}\log\bigl[P(\{\omega : R_T(\omega) \in G\})\bigr] : G \text{ is an open set containing } Q\Bigr\} \tag{5.4.11}$$
for $Q \in M_1(\Omega)$. Clearly $J \le H$ on $M_1(\Omega) \setminus M_1^S(\Omega)$; and, by (5.4.10), we also know that $J \le H$ on $EM_1^S(\Omega)$. Thus, all that remains is to check that the domination of $J$ by $H$ on $EM_1^S(\Omega)$ extends to the whole of $M_1^S(\Omega)$; and our proof of this fact will turn on the Ergodic Decomposition Theorem. Namely, at the one place where we use (H-2), we will show that $J$ is a convex function. At the same time, we already know (cf. Exercise 4.4.41) that $H$ is affine; and these two observations will be used to conclude that

(5.4.12)

where $\mu_Q \in M_1(EM_1^S(\Omega))$ is the measure described in the Ergodic Decomposition Theorem (cf. Theorem 5.2.16). Obviously, in conjunction with the preceding, (5.4.12) is more than enough to complete the proof of the lower bound.

Keeping the preceding outline in mind, we now get down to business.

5.4.13 Lemma. For every compact interval $I$ and every $V \in B(\Omega_I;\mathbb{R})$, the limit $\Lambda_I(V)$ in (5.4.3) exists. In addition, for $\ell > \ell_0$,

(5.4.14)

In particular, the map $V \in C_b(\Omega_I;\mathbb{R}) \mapsto \Lambda_I(V)$ is a tight, convex function which satisfies $\Lambda_I(c\mathbf{1}) = c$ for $c \in \mathbb{R}$.
PROOF: Without loss of generality, we will assume throughout that $V$ is non-negative, and we will use $M$ to denote $\|V\|_B$. To prove the existence of $\Lambda_I(V)$, set

Because of shift-invariance, all that we have to do is check that the corresponding limit as $T \to \infty$ exists. To this end, let $S \in (0,\infty)$ be given and write $T = n_T S + \tau_T$, where $n_T \in \mathbb{Z}^+$ and $\tau_T \in [0,S)$, for $T > S$. Then, by (H-1) and shift-invariance, for every $\ell > \ell_0$,

Hence,

for $S \in (0,\infty)$ and $\ell > \ell_0$; and, since $\alpha(\ell) \searrow 1$ as $\ell \nearrow \infty$, this clearly implies that

In order to prove (5.4.14), let $\ell > \ell_0$ be given and set $T = \ell + |I|$. Then, again by shift-invariance and (H-1),
where we have used JENSEN's inequality in the passage from the second to the third line. After dividing through by $nT$ and then letting $n \to \infty$, we arrive at (5.4.14). Finally, the convexity of $\Lambda_I$, as well as the equality $\Lambda_I(c\mathbf{1}) = c$, $c \in \mathbb{R}$, are both immediate consequences of the definition of $\Lambda_I$. Moreover, given (5.4.14), it is clear how to choose the sets $K(M) \subset\subset \Omega_I$ to check tightness. Namely, let $\ell > \ell_0 \vee 1$ be given and choose $K(M) \subset\subset \Omega_I$ so that

$$P(\{\omega : \pi_I(\omega) \notin K(M)\}) \le \exp\bigl[-(\ell + |I|)\,\alpha(\ell)\,M\bigr]. \qquad ∎$$

Now let $\Lambda_I^* : M_1(\Omega_I) \to [0,\infty]$ be the function defined in (5.4.6). Then, by Lemma 5.4.13 and Corollary 5.1.11, $\Lambda_I^*$ is a good rate function on $M_1(\Omega_I)$ and

$$\limsup_{T\to\infty}\frac{1}{2T}\log\bigl(P(\{\omega : R_T(\omega)\circ\pi_I^{-1} \in F\})\bigr) \le -\inf_F \Lambda_I^*$$

for closed $F \subseteq M_1(\Omega_I)$. Thus, by (ii) of Exercise 2.1.21 (cf. (ii) of Exercise 3.2.22 as well), the function $\Lambda^* : M_1(\Omega) \to [0,\infty]$ in (5.4.5) is also a good rate function; and, just as in (iii) of Exercise 2.1.21, we now have (5.4.4).
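Having completed STEP 1, we now begin STEP 2 by checking that

$$\Lambda^*(Q) = \infty \quad\text{when } Q \in M_1(\Omega) \setminus M_1^S(\Omega). \tag{5.4.15}$$

To this end, suppose that $Q \notin M_1^S(\Omega)$ is given. One can then choose a compact interval $I$ and a $V \in C_b(\Omega_I;\mathbb{R})$ so that

$$\int_\Omega V\circ\pi_I\circ\theta_\ell(\omega)\,Q(d\omega) \ge \int_\Omega V\circ\pi_I(\omega)\,Q(d\omega) + 1 \tag{5.4.16}$$

for some $\ell \in \mathbb{R}$. In particular, if the compact interval $J$ is chosen so that $(\ell + I) \cup I \subseteq J$ and $W \in C_b(\Omega_J;\mathbb{R})$ is defined by $W\circ\pi_J = V\circ\pi_I\circ\theta_\ell - V\circ\pi_I$, then (5.4.16) leads to

$$\Lambda^*(Q) \ge \sup\bigl\{M - \Lambda_J(MW) : M \in (0,\infty)\bigr\}.$$

Thus, we will have completed the proof of (5.4.15) once we show that $\Lambda_J(MW) \le 0$ for every $M \in (0,\infty)$. But it is clear that, for any $T > \ell$,

and, therefore,

$$\frac{1}{2T}\log\Bigl(\int_\Omega \exp\Bigl[M\int_{-T}^{T} W\circ\pi_J(\theta_t\omega)\,dt\Bigr]\,P(d\omega)\Bigr) \longrightarrow 0 \quad\text{as } T \to \infty.$$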
To complete the proof of (5.4.7), we will use the following lemma.

5.4.17 Lemma. Let $I$ be a compact interval. Then

(5.4.18)

for all $Q \in M_1(\Omega)$ and $\ell > \ell_0$; and, for every $Q \in M_1^S(\Omega)$,

$$\cdots \le \frac{1}{2T}\,H_{I(T)}(Q|P) \tag{5.4.19}$$

for $T \in (0,\infty)$ and $V \in B(\Omega_I;\mathbb{R})$, where $I(T) = \{t : |t - I| \le T\}$.

PROOF: Recall (cf. Lemma 3.2.13) that $H_I(Q|P)$ is given by

$$\sup\Bigl\{\int V\circ\pi_I\,dQ - \log\Bigl(\int \exp[V\circ\pi_I]\,dP\Bigr) : V \in C_b(\Omega_I;\mathbb{R})\Bigr\}. \tag{5.4.20}$$

Thus, (5.4.18) is an immediate consequence of (5.4.14). To prove (5.4.19), let $Q \in M_1^S(\Omega)$ and $V \in B(\Omega_I;\mathbb{R})$ be given. For $T \in (0,\infty)$, define $V_T \in B(\Omega_{I(T)};\mathbb{R})$ so that

Because $Q$ is shift-invariant, one then has that

Finally, by (5.4.20), the right hand side of the preceding is dominated by $\frac{1}{2T}H_{I(T)}(Q|P)$ when $V \in C_b(\Omega_I;\mathbb{R})$ and therefore for general $V \in B(\Omega_I;\mathbb{R})$. ∎

From here, it is an easy matter to complete STEP 2. Indeed, by (5.4.18), for any $Q \in M_1(\Omega)$, we have that
$$\limsup_{|I|\nearrow\infty}\frac{1}{|I|}\,H_I(Q|P) \le \Lambda^*(Q).$$
On the other hand, if $Q \in M_1^S(\Omega)$, then both $H_I(Q|P)$ and $\Lambda_I^*(Q\circ\pi_I^{-1})$ depend on $I$ only through $|I|$, and, by (5.4.19),

for any $S \in (0,\infty)$ and $V \in C_b(\Omega_{[-S,S]};\mathbb{R})$. Clearly this leads immediately to

$$\Lambda^*_{[-S,S]}\bigl(Q\circ\pi_{[-S,S]}^{-1}\bigr) \le \liminf_{T\to\infty}\frac{1}{2T}\,H_{[-T,T]}(Q|P);$$

and the rest of STEP 2 is now simply a matter of notation.
We next turn to STEP 3 and verify that (5.4.10) holds for ergodic $Q \in M_1^S(\Omega)$.

5.4.21 Lemma. If $Q \in EM_1^S(\Omega)$ and $I$ is a compact interval, then

for any $G_I \in \mathcal{B}_{M_1(\Omega_I)}$ which is a $\tau$-open neighborhood of $Q\circ\pi_I^{-1}$.

PROOF: The argument is very much like the one used in (ii) of Exercise 3.2.23, only here the Ergodic Theorem plays the role that the Law of Large Numbers did there. Set $I(T) = \{t : |t - I| \le T\}$ and

and let $A_T = \{\omega : R_T(\omega)\circ\pi_I^{-1} \in G_I \text{ and } F_T(\omega) > 0\}$. Then, by the Ergodic Theorem, $Q(A_T) \to 1$ as $T \to \infty$. Thus, by JENSEN's inequality,

since

$$\cdots \ge -e^{-1} - H_{I(T)}(Q|P). \qquad ∎$$

As an essentially immediate consequence of Lemma 5.4.21, we see that

$$\liminf_{T\to\infty}\frac{1}{2T}\log\bigl(P(\{\omega : R_T(\omega) \in G\})\bigr) \ge -H(Q)$$

for any open $G \subseteq M_1(\Omega)$ and any ergodic $Q \in G$.
Continuing with STEP 3, we next define the lower semi-continuous function $J : M_1(\Omega) \to [0,\infty]$ as in (5.4.11). Our goal is to prove that $J \le H$. At the moment (cf. the preceding paragraph), we know that $J \le H$ on $EM_1^S(\Omega) \cup \bigl(M_1(\Omega) \setminus M_1^S(\Omega)\bigr)$.

5.4.22 Lemma. The function $J$ in (5.4.11) is convex.

PROOF: Since $J$ is lower semi-continuous, it suffices to check that

for $Q_1, Q_2 \in M_1(\Omega)$ satisfying $J(Q_1) \vee J(Q_2) < \infty$. To this end, let $G$ be an open set containing $Q = (Q_1 + Q_2)/2$. Choose $S \in (0,\infty)$ and $r > 0$ so that

where $I = [-S,S]$ and the balls $B_I$ are defined relative to the LÉVY metric on $M_1(\Omega_I)$. Set

and $w(T) = P(\{\omega : R_T(\omega) \in G\})$. Then, by (H-2):

as long as $\ell > \ell_0$ and $T > (2S + \ell)/r$. (The number $\beta(\ell)'$ is the HÖLDER conjugate of $\beta(\ell)$.) Since $J(Q_1) \vee J(Q_2) < \infty$ means that $u_1(T)u_2(T) \ge$
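$\exp[-MT]$ for some $M < \infty$ and all sufficiently large $T$'s, we now see that

$$w(T) \ge \tfrac{1}{2}\,u_1(T)\,u_2(T)$$

for all sufficiently large $T$'s; and clearly this leads to

$$-\liminf_{T\nearrow\infty}\frac{1}{2T}\log w(T) \le -\frac{1}{2}\liminf_{T\nearrow\infty}\frac{1}{2T}\log P\bigl(\{\omega : R_T(\omega)\circ\pi_I^{-1} \in B_I(Q_1,r)\}\bigr) - \frac{1}{2}\liminf_{T\nearrow\infty}\frac{1}{2T}\log P\bigl(\{\omega : R_T(\omega)\circ\pi_I^{-1} \in B_I(Q_2,r)\}\bigr) \le \frac{J(Q_1)}{2} + \frac{J(Q_2)}{2}.$$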
We are now in the following situation. Both of the functions $J$ and $H$ are lower semi-continuous and convex; and we know that $H(Q) \ge J(Q)$ for all $Q \in \bigl(M_1(\Omega) \setminus M_1^S(\Omega)\bigr) \cup EM_1^S(\Omega)$. Furthermore, the function $H$ is affine on $M_1^S(\Omega)$ in the sense that

$$H\bigl(\alpha Q_1 + (1-\alpha)Q_2\bigr) = \alpha H(Q_1) + (1-\alpha)H(Q_2) \tag{5.4.23}$$

for $\alpha \in [0,1]$ and $Q_1, Q_2 \in M_1^S(\Omega)$. To see this, simply observe that (cf. (ii) of Exercise 4.4.41)

$$\alpha H_I(Q_1|P) + (1-\alpha)H_I(Q_2|P) \ge H_I\bigl(\alpha Q_1 + (1-\alpha)Q_2\,\big|\,P\bigr) \ge \alpha H_I(Q_1|P) + (1-\alpha)H_I(Q_2|P) - \log 2.$$

From these remarks, it should be clear that the following lemma is all that we need in order to complete STEP 3.
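The two-sided estimate for the relative entropy of a mixture is easy to test on a finite state space. The sketch below is an illustration under that finite-state assumption, with hypothetical names throughout. It uses the standard bound in which the defect on the lower side is the binary entropy $h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)$, which never exceeds $\log 2$; after division by $|I|$, a bounded defect of this kind disappears in the limit, which is the mechanism behind the affine property.

```python
import math


def relative_entropy(q, p):
    """H(q|p) = sum_i q_i log(q_i / p_i) for finite vectors, 0 log 0 = 0."""
    h = 0.0
    for qi, pi in zip(q, p):
        if qi > 0.0:
            if pi == 0.0:
                return math.inf
            h += qi * math.log(qi / pi)
    return h


def binary_entropy(a):
    """h(a) = -a log a - (1 - a) log(1 - a), with h(0) = h(1) = 0."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log(a) - (1.0 - a) * math.log(1.0 - a)


def mixture_bounds(q1, q2, p, a):
    """Return (upper, middle, lower) with upper >= middle >= lower.

    upper  = a H(q1|p) + (1 - a) H(q2|p)        (convexity)
    middle = H(a q1 + (1 - a) q2 | p)
    lower  = upper - h(a)                        (bounded defect)
    """
    mix = [a * x + (1.0 - a) * y for x, y in zip(q1, q2)]
    upper = a * relative_entropy(q1, p) + (1.0 - a) * relative_entropy(q2, p)
    middle = relative_entropy(mix, p)
    return upper, middle, upper - binary_entropy(a)


p = [0.5, 0.3, 0.2]
q1 = [0.7, 0.2, 0.1]
q2 = [0.1, 0.1, 0.8]
print(mixture_bounds(q1, q2, p, 0.3))
```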
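5.4.24 Lemma. Let $\Phi : M_1(\Omega) \to [0,\infty]$ be a lower semi-continuous function. If $\Phi$ is convex, then for every $\mu \in M_1(M_1(\Omega))$

$$\Phi\Bigl(\int_{M_1(\Omega)} R\,\mu(dR)\Bigr) \le \int_{M_1(\Omega)} \Phi(R)\,\mu(dR). \tag{5.4.25}$$

On the other hand, if $\Phi$ is affine on $M_1^S(\Omega)$ and $\mu \in M_1(M_1^S(\Omega))$, then

$$\Phi\Bigl(\int_{M_1^S(\Omega)} R\,\mu(dR)\Bigr) = \int_{M_1^S(\Omega)} \Phi(R)\,\mu(dR). \tag{5.4.26}$$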
PROOF: We begin with the case in which $\mu(K) = 1$ for some compact subset $K$ of $M_1(\Omega)$. Throughout, $B(Q,r)$ denotes the LÉVY-metric ball in $M_1(\Omega)$ of radius $r$ around $Q$. For $m \in \mathbb{Z}^+$, choose a finite set $\{R_{m,\ell}\}_{\ell=1}^{L_m} \subseteq K$ so that the balls $B_{m,\ell} = B(R_{m,\ell}, 1/m)$, $1 \le \ell \le L_m$, cover $K$; set $A_{m,1} = K \cap B_{m,1}$ and

for $2 \le \ell \le L_m$; and take $\alpha_{m,\ell} = \mu(A_{m,\ell})$. Next, for $m \in \mathbb{Z}^+$ and $1 \le \ell \le L_m$, choose $P_{m,\ell} \in K \cap B_{m,\ell}$ so that

$$\Phi(P_{m,\ell}) \le \inf\{\Phi(R) : R \in K \cap B_{m,\ell}\} + \frac{1}{m};$$

and define $F_{m,\ell}$ by

Assuming that $\Phi$ is convex, we have that

where $\Phi_m(R) = \Phi(P_{m,\ell})$ for $R \in A_{m,\ell}$. Since $\Phi$ is lower semi-continuous, $\Phi_m(R) \to \Phi(R)$ for each $R \in K$. Thus, when $\Phi$ is bounded, LEBESGUE's Dominated Convergence Theorem shows that

as $m \to \infty$. At the same time,

and so, again by lower semi-continuity,
and together, these imply the desired result when $\Phi$ is bounded. Thus, even if $\Phi$ is not bounded, we have that (5.4.25) holds for $\Phi \wedge n$; and, therefore, a passage to the limit as $n \to \infty$ yields the result for $\Phi$'s which are not necessarily bounded.

Next, assume that $\Phi$ is affine on $M_1^S(\Omega)$. Because $M_1^S(\Omega)$ is closed, we may and will assume that the $K$ for which $\mu(K) = 1$ is contained in $M_1^S(\Omega)$; and therefore that each of the measures $F_{m,\ell}$ is an element of $M_1^S(\Omega)$. Thus, since $\int_{M_1^S(\Omega)} R\,\mu(dR) = \sum_{\ell=1}^{L_m} \alpha_{m,\ell}F_{m,\ell}$,

where $\Psi_m(R) = \Phi(F_{m,\ell})$ for $R \in A_{m,\ell}$. Noting that, by lower semi-continuity,

$$\Phi(R) \le \liminf_{m\to\infty}\Psi_m(R)$$

for each $R \in K$, we can now use FATOU's Lemma to conclude that the left hand side of (5.4.26) dominates the right hand side. At the same time, by the result in the preceding paragraph, the opposite inequality also holds.

We have now completed the proof in the case when $\mu$ is compactly supported. To handle the case when $\mu$ is not compactly supported, choose a non-decreasing sequence of compact sets $K_n$ so that $\mu(K_n) \ge (n-1)/n$; set $\alpha_n = \mu(K_n)$; and define $\sigma_n(\Gamma) = \frac{1}{\alpha_n}\mu(\Gamma \cap K_n)$ and $\tau_n(\Gamma) = \frac{1}{1-\alpha_n}\mu(\Gamma \cap K_n^c)$ for $\Gamma \in \mathcal{B}_{M_1(\Omega)}$. Since each $\sigma_n$ is compactly supported and $\int R\,\sigma_n(dR) \Rightarrow \int R\,\mu(dR)$, we see from the above that

$$\Phi\Bigl(\int_{M_1(\Omega)} R\,\mu(dR)\Bigr) \le \liminf_{n\to\infty}\Phi\Bigl(\int_{M_1(\Omega)} R\,\sigma_n(dR)\Bigr) \le \int_{M_1(\Omega)} \Phi(R)\,\mu(dR)$$

when $\Phi$ is convex. On the other hand, if $\Phi$ is affine, then
Since it is clear that

we are done. ∎

Applying Lemma 5.4.24 to $J$ and $H$, we now see that

where $\mu_Q \in M_1(EM_1^S(\Omega))$ is the measure described in the Ergodic Decomposition Theorem. Hence, we have now completed STEP 3; and therefore we have derived the following version of a theorem proved originally by T. CHIYONOBU and S. KUSUOKA in [18].

5.4.27 Theorem. Assume that $P \in M_1^S(\Omega)$ is hypermixing. Then the specific entropy function $H : M_1(\Omega) \to [0,\infty]$ in (5.4.8) exists (i.e., the indicated limit exists) and defines a good rate function which governs the large deviations of $\{P\circ R_T^{-1} : T \in (0,\infty)\}$ as $T \to \infty$.
At the beginning of this section we mentioned that there are certain technical difficulties associated with taking $\Omega$ to be the SKOROKHOD space $D(\mathbb{R};\Sigma)$ of right-continuous paths $\omega : \mathbb{R} \to \Sigma$ which have a left-limit at each $t \in \mathbb{R}$. The difficulties alluded to stem from the problem of putting a Polish topology on $\Omega$ which is the projective limit of Polish topologies on the SKOROKHOD spaces of paths on finite time intervals. To be precise, let $I$ be a compact interval and denote by $D(I;\Sigma)$ the space of right-continuous paths $\omega_I : I \to \Sigma$ which have a left-limit at each $t \in I$ and are left-continuous at the right hand end of $I$. Using SKOROKHOD's prescription, one can then put a metric $\rho_I$ on $D(I;\Sigma)$ in such a way that $(D(I;\Sigma),\rho_I)$ is a complete, separable metric space and $\rho_I$-convergence of $\{\omega_{I,\ell}\}_{\ell=1}^\infty$ to $\omega_I$ is equivalent to

$$\inf\Bigl\{\sup_{t\in I}\operatorname{dist}_\Sigma\bigl(\omega_{I,\ell}\circ\lambda(t),\,\omega_I(t)\bigr) + \sup_{t\in I}|\lambda(t) - t| : \lambda \in L_I\Bigr\} \longrightarrow 0,$$

where $\operatorname{dist}_\Sigma$ denotes the distance on $\Sigma$ determined by $\Sigma$'s metric and $L_I$ stands for the group of increasing homeomorphisms of $I$ onto itself. Furthermore, the $\rho_I$'s can be chosen so that if $I = [a,b]$ and $J = [c,d]$,
where $c \le a$ and $b \le d$, and if $\omega_J, \omega_J'$ are elements of $D(J;\Sigma)$ which are left-continuous at $b$, then

(5.4.28)

The problem comes from the fact that $\omega_J \in D(J;\Sigma)$ and $I \subseteq J$ do not guarantee that $\omega_J|_I$ is an element of $D(I;\Sigma)$, since $\omega_J$ need not be left-continuous at the right end of $I$. Worse, even if one replaces the restriction map by $\pi_I : D(J;\Sigma) \to D(I;\Sigma)$ given by

the situation in (5.4.28) does not improve substantially (i.e., the topologies still do not mesh correctly). For this reason, we will adopt a scheme for introducing a topology on $D(\mathbb{R};\Sigma)$ which is slightly different from the one which we used for $C(\mathbb{R};\Sigma)$.

From now on, $\Omega$ will denote $D(\mathbb{R};\Sigma)$; and, for compact intervals $I$, $\rho_I$ will be the metric introduced by SKOROKHOD on $D(I;\Sigma)$. Given $T \in (0,\infty)$, we will use $\Omega_T$ to denote the space $D((-T,T);\Sigma)$ of paths $\omega_T : (-T,T) \to \Sigma$ which are right-continuous and have a left limit at each $t \in (-T,T)$. Next, we define the metric $d_T$ on $\Omega_T$ by

and we take

Finally, we define $\pi_{T,S} : \Omega_T \to \Omega_S$, $0 < S < T$, and $\pi_S : \Omega \to \Omega_S$, $S \in (0,\infty)$, to be the natural restriction mappings. As a relatively straightforward application of the fact that each $\omega \in \Omega$ can have at most countably many points of discontinuity, one can use (5.4.28) to check all but the final assertion in the following lemma. The final assertion is a consequence of the well-known facts that, for each compact interval $I$, the SKOROKHOD topology on $D(I;\Sigma)$ restricts to the uniform topology on $C(I;\Sigma)$ and that the Borel field of the SKOROKHOD topology is the $\sigma$-algebra generated by the evaluation maps $\Sigma_t$, $t \in I$.
5.4.29 Lemma. Each of the spaces $(\Omega_T, d_T)$, $T \in (0,\infty)$, is a complete separable metric space; and, for all $0 < S < T$,

$$d_S(\pi_{T,S}\,\omega_T,\ \pi_{T,S}\,\omega_T') \le d_T(\omega_T,\omega_T'),\qquad \omega_T, \omega_T' \in \Omega_T.$$

Moreover, $(\Omega,d)$ is a complete, separable metric space which is homeomorphic to the projective limit of the sequence $\{(\Omega_n, \pi_{n+1,n}, d_n) : n \in \mathbb{Z}^+\}$; and $(t,\omega) \in (0,\infty)\times\Omega \mapsto \theta_t\omega$ is continuous. Finally, the relative topology which $C(\mathbb{R};\Sigma)$ inherits as a subset of $(\Omega,d)$ coincides with the topology of uniform convergence on compacts, and $\mathcal{B}_\Omega$ is the $\sigma$-algebra over $\Omega$ generated by the maps $\omega \in \Omega \mapsto \Sigma_t(\omega) \in \Sigma$, $t \in \mathbb{R}$.

Once one has the facts contained in Lemma 5.4.29, the argument used to prove Theorem 5.4.27 with $\Omega = C(\mathbb{R};\Sigma)$ applies without change to the case when $\Omega = D(\mathbb{R};\Sigma)$.

5.4.30 Exercise.
Formulate and prove the analogue of Theorem 5.4.27 for the discrete-parameter setting.

5.4.31 Exercise.

Let $\Omega$ be either $C(\mathbb{R};\Sigma)$ or $D(\mathbb{R};\Sigma)$ and let $\Sigma'$ be a second Polish space. Suppose that $F : \Omega \to \Sigma'$ is a $\mathcal{B}_{[-T,T]}$-measurable map for some $T \in [0,\infty)$, and assume that $t \in \mathbb{R} \mapsto F(\theta_t\omega) \in \Sigma'$ is an element of $\Omega' = D(\mathbb{R};\Sigma')$ for each $\omega \in \Omega$. Finally, define $\Phi : \Omega \to \Omega'$ so that $\Sigma_t'(\Phi(\omega)) = F(\theta_t\omega)$ for $t \in \mathbb{R}$. Given a $P \in M_1^S(\Omega)$ which is hypermixing, show that $P' = P\circ\Phi^{-1}$ is a hypermixing element of $M_1^S(\Omega')$.

5.4.32 Exercise.

Let $\Omega = C(\mathbb{R};\Sigma)$, and suppose that $P \in M_1(\Omega)$ admits a good rate function $J : M_1(\Omega) \to [0,\infty]$ which governs the large deviations of $\{P\circ R_T^{-1} : T > 0\}$. Next, define the empirical position measure

and observe that $L_T(\omega) = R_T(\omega)\circ\Sigma_0^{-1}$. Thus, since $\omega \in \Omega \mapsto \Sigma_0(\omega) \in \Sigma$ is continuous, and, therefore, so is $R \in M_1(\Omega) \mapsto R\circ\Sigma_0^{-1} \in M_1(\Sigma)$, the final part of Lemma 2.1.4 says that

$$I(\mu) = \inf\bigl\{J(R) : R \in M_1(\Omega) \text{ and } \mu = R\circ\Sigma_0^{-1}\bigr\},\qquad \mu \in M_1(\Sigma),$$

is a good rate function which governs the large deviations of $\{P\circ L_T^{-1} : T > 0\}$.
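For a path which takes finitely many values and jumps finitely often, the empirical position measure reduces to a table of occupation fractions. The sketch below makes two assumptions that are not spelled out in the surviving text: that $L_T$ is the two-sided time average over $[-T,T]$ (as the $1/(2T)$ normalization used throughout the section suggests), and that the path is a right-continuous step function described by its jump times. The function name and the data layout are hypothetical.

```python
def empirical_position_measure(jump_times, states, T):
    """Occupation fractions over [-T, T] of a right-continuous step path.

    jump_times: increasing times t_0 < t_1 < ..., with t_0 <= -T.
    states[k]:  value of the path on [t_k, t_{k+1}); the last state is
                held from its jump time onwards.
    Returns a dict mapping each visited state to its share of [-T, T].
    """
    occupation = {}
    last = len(states) - 1
    for k, state in enumerate(states):
        start = max(jump_times[k], -T)
        end = T if k == last else min(jump_times[k + 1], T)
        if end > start:
            occupation[state] = occupation.get(state, 0.0) + (end - start)
    return {s: w / (2.0 * T) for s, w in occupation.items()}


# A path that sits in 'a' on [-5, -1), 'b' on [-1, 0.5),
# 'a' again on [0.5, 2) and 'c' from time 2 onwards.
L3 = empirical_position_measure([-5.0, -1.0, 0.5, 2.0], ["a", "b", "a", "c"], 3.0)
print(L3)
```

The weights always sum to one, so the output is a probability vector on the visited states, which is the finite-state counterpart of an element of $M_1(\Sigma)$.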
Now let $\Omega = D(\mathbb{R};\Sigma)$ and suppose that there exist $P \in M_1(\Omega)$ and a good rate function $J : M_1(\Omega) \to [0,\infty]$ which is related to $P$ as in the preceding paragraph. What one would like is to repeat the argument just given and thereby show that the large deviations of $\{P\circ L_T^{-1} : T > 0\}$ are governed by a rate function of the sort described above. The problem is, of course, that $\omega \in \Omega \mapsto \Sigma_0(\omega) \in \Sigma$ is no longer a continuous mapping. In order to circumvent this problem, one can take the following sequence of easy steps.

(i) Set $\Omega_0 = \{\omega : \Sigma_0(\omega) = \Sigma_{0-}(\omega)\}$ and show that $\Omega_0$ is a $G_\delta$-subset of $\Omega$ and that $\omega \in \Omega_0 \mapsto \Sigma_0(\omega) \in \Sigma$ is continuous. Conclude that $M_1^0(\Omega) = \{Q \in M_1(\Omega) : Q(\Omega_0) = 1\}$ is a $G_\delta$ subset of $M_1(\Omega)$ and that $Q \in M_1^0(\Omega) \mapsto Q\circ\Sigma_0^{-1}$ is continuous. Finally, check that $M_1^S(\Omega) \subseteq M_1^0(\Omega)$.

(ii) For $(T,\omega) \in (0,\infty)\times\Omega$, define $\tilde\omega_T \in \Omega$ so that $\tilde\omega_T|_{[-T,T)} = \omega|_{[-T,T)}$ and $\theta_{2T}\tilde\omega_T = \tilde\omega_T$. Show that $(T,\omega) \in (0,\infty)\times\Omega \mapsto \tilde\omega_T \in \Omega$ is measurable and therefore so is $(T,\omega) \in (0,\infty)\times\Omega \mapsto \tilde R_T(\omega) = R_T(\tilde\omega_T) \in M_1^S(\Omega)$. In addition, check that, for each $S \in [0,\infty)$,

(iii) Suppose that $P \in M_1(\Omega)$ and that $J : M_1(\Omega) \to [0,\infty]$ is a good rate function which governs the large deviations of $\{P\circ R_T^{-1} : T \in (0,\infty)\}$ as $T \to \infty$. Show that $J|_{M_1^S(\Omega)}$ is a good rate function which governs the large deviations of $\{P\circ\tilde R_T^{-1} : T \in (0,\infty)\}$ as $T \to \infty$. Next, define

$$L_T(\omega) = \frac{1}{2T}\int_{-T}^{T}\delta_{\Sigma_t(\omega)}\,dt$$

and show that $\{P\circ L_T^{-1} : T \in (0,\infty)\}$ satisfies the full large deviation principle with respect to the good rate function

$$\mu \in M_1(\Sigma) \longmapsto I(\mu) = \inf\bigl\{J(Q) : Q \in M_1^S(\Omega) \text{ and } Q\circ\Sigma_0^{-1} = \mu\bigr\}.$$

In particular, when $P \in M_1^S(\Omega)$ is hypermixing, conclude that $\{P\circ L_T^{-1} : T \in (0,\infty)\}$ satisfies the full large deviation principle with the good rate function $I : M_1(\Sigma) \to [0,\infty]$ given by

$$I(\mu) = \inf\bigl\{H(Q) : Q\circ\Sigma_0^{-1} = \mu\bigr\}. \tag{5.4.33}$$

5.4.34 Exercise.
Let $P \in M_1^S(\Omega)$ be hypermixing. Starting from (5.4.14), show that, for each compact interval $I$, $V \in B(\Omega_I;\mathbb{R}) \mapsto \Lambda_I(V) \in \mathbb{R}$ is a continuous function of bounded, point-wise convergence. (Hint: See the proof of Lemma 4.1.40.) Conclude that

(5.4.35)

for $Q \in M_1(\Omega)$.

5.4.36 Exercise.
Let $P(t,\sigma,\cdot)$ be a transition probability function on $\Sigma$ and assume that the corresponding MARKOV family $\{P_\sigma : \sigma \in \Sigma\}$ can be realized on $D([0,\infty);\Sigma)$. Also, suppose that there is precisely one $P(t,\sigma,\cdot)$-invariant $\mu \in M_1(\Sigma)$; and denote by $P$ the unique element of $M_1^S(\Omega)$ with the property that

for $-\infty < s < t < \infty$ and $\Gamma \in \mathcal{B}_\Sigma$. (Obviously, $P\circ\Sigma_t^{-1} = \mu$ for all $t \in \mathbb{R}$.) Finally, assume that $P$ is hypermixing. The purpose of this exercise is to see when the rate function $I$ in (5.4.33) can be identified with one of the rate functions which we produced in Section 4.2.
(i) Show that if p = rn is P(t,a,-)-reversing, then I = J E , where JE is defined from the associated DIRICHLET form E (cf. (4.2.47)) as in (4.2.49). (Hint: Use (5.4.18) with I = (0) and Exercise 5.3.15.) (ii) The non-reversible case is not so satisfactory. To see what sort of thing as in Exercise 5.1.17, and J p and J p can be said, define i p , A;, and as in (4.2.38) and (4.2.36). Noting that A{,) 5 A p (cf. (4.2.21)), show that I 2 J p . Next, if, for some V E B(C;R), (5.4.38)
xi
show that i p ( V ) 5 Ai0)(V). Conclude from this, Exercise 5.1.17, and Exercise 5.4.34 that I = J p when (5.4.38) holds for every V E B(C;R). Similarly, when P ( t ,c7, -) is FELLER-continuous, show that, when (5.4.38) holds for every V E Cb(C; W), I must equal J p .
5.4.39 Exercise.

One of the more remarkable features of the hypermixing property is its behavior under products. To be precise, let $\mathfrak{J}$ be a countable index set and for each $i \in \mathfrak{J}$ let $P_i$ be a hypermixing element of $M_1^S(D(\mathbb{R};\Sigma_i))$ where each $\Sigma_i$ is a Polish space. Further, assume that there are functions $\alpha$, $\beta$, and $\gamma$ satisfying (5.4.1) such that (H-1) and (H-2) hold with $P = P_i$ for all $i \in \mathfrak{J}$. After making the obvious identification of

show that $\prod_{i\in\mathfrak{J}} P_i$ determines an element of

which is hypermixing with the same choice of functions $\alpha$, $\beta$, and $\gamma$.

5.4.40 Exercise.

Define the $\tau$-topology on $M_1(\Omega)$ to be the weakest topology with respect to which the mapping

is continuous for each compact interval $I$ and $V \in B(\Omega_I;\mathbb{R})$. Given a $\Gamma \subseteq M_1(\Omega)$, let $\Gamma^{\circ\tau}$ and $\overline{\Gamma}^{\tau}$ denote, respectively, the interior and closure of $\Gamma$ in the $\tau$-topology. Assuming that $P \in M_1^S(\Omega)$ is hypermixing, show that, for every measurable $\Gamma \subseteq M_1(\Omega)$,

$$-\inf_{Q\in\Gamma^{\circ\tau}} H(Q) \le \liminf_{t\to\infty}\frac{1}{2t}\log\bigl(P(\{\omega : R_t(\omega) \in \Gamma\})\bigr) \le \cdots$$

(Hint: Use the estimate on which (5.4.14) is based and apply Theorem 3.2.21.) With the preceding in hand, one sees that it would have been possible to avoid some of the difficulties associated with the SKOROKHOD topology by proceeding along a line of reasoning like the one which we used to complete the program in Section 4.4.
5.5 Hypermixing in the Epsilon Markov Case
In this section, we develop a sufficient condition for the hypermixing property to hold. Throughout, $\Omega$ will denote the space $D(\mathbb{R};\Sigma)$ (cf. the discussion following Theorem 5.4.27) and $P$ will denote a fixed element of $M_1^S(\Omega)$.

Recall the $\sigma$-algebras $\mathcal{B}_I = \sigma(\{\Sigma_t : t \in I\})$, where $I$ runs over intervals in $\mathbb{R}$. We will use $B_I(\Omega;\mathbb{R})$ to denote the subset of $f \in B(\Omega;\mathbb{R})$ which are $\mathcal{B}_I$-measurable. Also, given $I$, choose $\omega \in \Omega \mapsto P_\omega^I \in M_1(\Omega)$ to be a regular conditional probability distribution of $P$ given $\mathcal{B}_I$ and define $E_I : B(\Omega;\mathbb{R}) \to B_I(\Omega;\mathbb{R})$ so that $E_I f(\omega) = \int_\Omega f(\omega')\,P_\omega^I(d\omega')$. Notice that, by JENSEN's inequality,

(5.5.1)

where

for $p, q \in [1,\infty]$ and any operator $K$ defined on the bounded measurable functions $B((E,\mathcal{F});\mathbb{R})$ of a measure space $(E,\mathcal{F},\mu)$. In addition, by shift-invariance, one has that

$$E_{s+I}f = \bigl[E_I(f\circ\theta_{-s})\bigr]\circ\theta_s \quad (\text{a.s., } P) \tag{5.5.3}$$

for all $s \in \mathbb{R}$ and $f \in B(\Omega;\mathbb{R})$. Using $E_s^-$ and $E_s^+$ to denote $E_{(-\infty,s]}$ and $E_{[s,\infty)}$, respectively, we now define $P_t : B(\Omega;\mathbb{R}) \to B(\Omega;\mathbb{R})$ for $t \in (0,\infty)$ by

$$P_t f = E_0^-\bigl[E_0^+(f\circ\theta_t)\bigr] = E_0^-\bigl[(E_{-t}^+ f)\circ\theta_t\bigr]. \tag{5.5.4}$$
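Obviously,

$$\|P_t\|_{L^p(P)\to L^p(P)} = 1,\qquad p \in [1,\infty]. \tag{5.5.5}$$

In addition, if $f \in B^+_{-s}(\Omega;\mathbb{R}) \equiv B_{[-s,\infty)}(\Omega;\mathbb{R})$ and $0 < s < t < \infty$, then (cf. (5.5.3)) $P$-almost surely:

$$P_t f = E_0^-(f\circ\theta_t) = E_0^-E_{t-s}^-(f\circ\theta_t) = E_0^-\bigl(\bigl[E_0^-(f\circ\theta_s)\bigr]\circ\theta_{t-s}\bigr) = E_0^-\bigl[(P_s f)\circ\theta_{t-s}\bigr];$$

and therefore, by (5.5.1), we see that

$$\|P_t f\|_{L^p(P)} \le \|P_s f\|_{L^p(P)},\qquad 0 < s < t < \infty, \tag{5.5.6}$$

for $p \in [1,\infty]$ and $f \in B^+_{-s}(\Omega;\mathbb{R})$. Finally, another application of (5.5.3) yields

$$E_s^-E_s^+ f = \bigl[P_t(f\circ\theta_{-s-t})\bigr]\circ\theta_s \quad (\text{a.s., } P) \tag{5.5.7}$$

for $s \in \mathbb{R}$, $t \in (0,\infty)$, and $f \in B(\Omega;\mathbb{R})$.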
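We are now ready to describe an extension of the usual MARKOV property. Given $\epsilon \in [0,\infty)$, we will say that $P$ is $\epsilon$-MARKOV if

$$E_0^-E_0^+ f = E_{[-\epsilon,0]}f \quad (\text{a.s., } P),\qquad f \in B(\Omega;\mathbb{R}). \tag{EM}$$

Notice that, with only a $P$-negligible alteration in the definition, we may and will assume that

$$P_t : B(\Omega;\mathbb{R}) \to B_{[-\epsilon,0]}(\Omega;\mathbb{R}),\qquad t \in (0,\infty), \tag{5.5.8}$$

when $P$ is $\epsilon$-MARKOV. Thus, if $P$ is $\epsilon$-MARKOV and $s, t \in [\epsilon,\infty)$, then, by (5.5.7),

$$P_{s+t}f = E_0^-(f\circ\theta_{s+t}) = E_0^-E_s^-(f\circ\theta_{s+t}) = E_0^-\bigl[(P_t f)\circ\theta_s\bigr] = P_s(P_t f) \quad (\text{a.s., } P)$$

for $f \in B^+_{-\epsilon}(\Omega;\mathbb{R})$. Hence, we have the following semigroup property.

5.5.9 Lemma. If $P$ is $\epsilon$-MARKOV, then, for $s \in [\epsilon,\infty)$, $P_s$ maps $B^+_{-\epsilon}(\Omega;\mathbb{R})$ into itself and

$$P_{s+t}f = P_s(P_t f) \quad (\text{a.s., } P) \tag{5.5.10}$$

for $s, t \in [\epsilon,\infty)$ and $f \in B^+_{-\epsilon}(\Omega;\mathbb{R})$.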
We next introduce a property which, in conjunction with the $\epsilon$-MARKOV property, will guarantee hypermixing. Namely, if $P \in M_1^S(\Omega)$ is $\epsilon$-MARKOV and $T_0 \in [\epsilon,\infty)$, we say that $P$ is $T_0$-hypercontractive if

$$\|P_{T_0}f\|_{L^4(P)} \le \|f\|_{L^2(P)} \quad\text{for } f \in B^+_{-\epsilon}(\Omega;\mathbb{R}). \tag{HC}$$

5.5.11 Lemma. If $P$ is $T_0$-hypercontractive, then

PROOF: Let $f \in B^+_{-\epsilon}(\Omega;\mathbb{R})$ with $\int_\Omega f\,dP = 0$ be given. Then, by the $T_0$-hypercontraction property, for every $a \in \mathbb{R}$:

and, therefore,
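5.5.12 Theorem. Assume that $P$ is $\epsilon$-MARKOV and $T_0$-hypercontractive. Then,

(5.5.13)

for $t \in [T_0,\infty)$ and $f \in B^+_{-\epsilon}(\Omega;\mathbb{R})$. Moreover, if

then for $1 < p < q < \infty$ and $t \ge 4T_0$:

$$\|P_t f\|_{L^q(P)} \le \|f\|_{L^p(P)} \quad\text{for } f \in B^+_{-\epsilon}(\Omega;\mathbb{R}) \tag{5.5.14}$$

as long as $e^{\alpha t} \ge \frac{q-1}{p-1}$. In particular, if $p(t) = 1 + \exp(-\alpha t)$ and $q(t) = 1 + \exp(\alpha t)$, then, for $t \in [4T_0,\infty)$:

$$\|P_t f\|_{L^2(P)} \le \|f\|_{L^{p(t)}(P)} \quad\text{and}\quad \|P_t f\|_{L^{q(t)}(P)} \le \|f\|_{L^2(P)} \tag{5.5.15}$$

whenever $f \in B^+_{-\epsilon}(\Omega;\mathbb{R})$.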
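The exponents $p(t) = 1 + \exp(-\alpha t)$ and $q(t) = 1 + \exp(\alpha t)$ are tied together in two ways that are worth seeing numerically: they are HÖLDER conjugates of one another, and each pairs with the exponent 2 at exactly the threshold $e^{\alpha t} = (q-1)/(p-1)$. The following sketch checks both facts; the value of `alpha` is an arbitrary placeholder, since the constant $\alpha$ of the theorem is not reproduced here, and the function names are hypothetical.

```python
import math


def p_of(t, alpha):
    """Lower exponent p(t) = 1 + exp(-alpha t), which lies in (1, 2)."""
    return 1.0 + math.exp(-alpha * t)


def q_of(t, alpha):
    """Upper exponent q(t) = 1 + exp(alpha t), which lies in (2, inf)."""
    return 1.0 + math.exp(alpha * t)


def threshold(p, q):
    """The ratio (q - 1)/(p - 1) appearing in the hypercontractive estimate."""
    return (q - 1.0) / (p - 1.0)


alpha = 0.3  # placeholder value, chosen only for illustration
for t in (1.0, 4.0, 10.0):
    p, q = p_of(t, alpha), q_of(t, alpha)
    conjugacy = 1.0 / p + 1.0 / q          # equals 1 for conjugate exponents
    print(t, p, q, conjugacy, threshold(p, 2.0), threshold(2.0, q))
```

Conjugacy of $p(t)$ and $q(t)$ is what makes the two estimates in (5.5.15) dual to one another.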
PROOF:The first assertion is an immediate consequence of Lemma 5.5.9 and Lemma 5.5.11. In proving the rest of the theorem, we will use p' to denote the HOLDER conjugate of p E [l,001. Set 0 = 4/12) = 213, and define r n for n E N so that l / T k = On/4'. It is then clear that T~ \ 1 and that 1-0 e 1 1-6 e and -- - -. Tn 1 Tn-1 Tn+1 1 Tn Hence, by the RIESZ-THORIN Interpolation Theorem applied to the LEBESGUE spaces 1
-
+-
+
LP_,(P)= {4 E L p ( P ): 4 is f?[-,,,)-measurable}, we see that
and so, by induction and Lemma 5.5.9, we get ll%-wo
llL::(P)+L:y(P)
<1, Olmln.
Because I I P t l l ~ p ( p ) + ~ q=( p1) for all t E (0,m) and q 5 p , it is easy to deduce from the above that IIPtllL~( p ) - L ~ c ( p ) = 1 whenever 1 < p < q 5 2 and t E TO, m) satisfies exp[& log(3/2)] 2 * Next, set = 214 = 112 and define sn for n E SO that 113, = 6 12. Proceeding as in the preceding, one then finds that IIPtllLp- c (p)+Lp_,(p) = 1 for 2 5 p < q < m and t E T TO, m) satisfying exp[& log21 2 and, after combining this with the result obtained before, one arrives at the desired conclusion. I -6
s.
s;
5.5.16 Corollary. Let everything be as in Theorem 5.5.12, and define

5.5.17 Theorem. Assume that $P$ is $\epsilon$-MARKOV and $T_0$-hypercontractive, and define $p : (0,\infty) \to (1,\infty)$ and $\rho : (0,\infty) \to (1,\infty)$ as in Theorem 5.5.12 and Corollary 5.5.16, respectively. Then $P$ is hypermixing with $\ell_0 = 12T_0$, $\alpha(\ell) = \rho(\ell/2)$ in (H-1), and $\beta(\ell) = p(\ell/3)$ and

in (H-2). Conversely, if $P \in M_1^S(\Omega)$ satisfies (H-1), then it is $T_0$-hypercontractive for some $T_0 \in (0,\infty)$; and so, for $\epsilon$-MARKOV $P$'s, hypercontractivity is equivalent to (H-1), and (H-1) implies hypermixing.

PROOF: In proving (H-1), we assume, without loss of generality, that $f_m \in B_{[a_m,b_m]}(\Omega;[0,\infty))$ for $1 \le m \le n$, where $a_1 = 0$ and $a_{m+1} - b_m \ge \ell$ for $1 \le m < n$. Since, for any $t \in (0,\infty)$,
with t = t / 2 . To this end, note that, by Corollary 5.5.16,
II pt (fl
*
I
. . f n ) 11LZ ( p )I IIfl IILa (C) (P)I pt9 IIL2 (P)
7
and, for n 2 3,
Hence (5.5.18) follows easily by induction on n.
We now turn to the proof of (H-2). Since

we may and will assume that $f \in B_{[\ell,\infty)}(\Omega;\mathbb{R})$, $g \in B_{(-\infty,0]}(\Omega;\mathbb{R})$ and that $\int_\Omega f\,dP = 0$. But then, by (5.5.7) (with $g$ in place of $f$),

and so, by HÖLDER's inequality, (5.5.14), (5.5.13), and (5.5.15):

where $t = \ell/3$.
To prove the converse assertion, note that if $P \in M_1^S(\Omega)$ satisfies (H-1) for some $\alpha(\ell) \searrow 1$, then there is an $\ell \in (0,\infty)$ with the property that

for $g \in B_{(-\infty,0]}(\Omega;\mathbb{R})$ and $f \in B^{+}(\Omega;\mathbb{R})$. But, for such $f$ and $g$,

and so it is clear that $P$ is $\ell$-hypercontractive. ∎

5.5.19 Exercise.
The notion of $\epsilon$-MARKOV has various manifestations. A more precise name for the one which we have adopted would be backward $\epsilon$-MARKOV since it says that "the future given the past depends only on the past back to time $-\epsilon$."

(i) Show that $E_0^-E_0^+ = E_{[-\epsilon,0]}$ if and only if

for $f \in B_{(-\infty,0]}(\Omega;\mathbb{R})$ and $g \in B_{[0,\infty)}(\Omega;\mathbb{R})$.

(ii) The forward $\epsilon$-MARKOV property can be expressed as the equality $E_0^+E_0^- = E_{[0,\epsilon]}$. Show that an equivalent formulation is the statement that

for $f \in B_{(-\infty,0]}(\Omega;\mathbb{R})$ and $g \in B_{[0,\infty)}(\Omega;\mathbb{R})$; and check that Theorem 5.5.17 continues to hold when one adopts this notion of $\epsilon$-MARKOV.

(iii) A more symmetric notion of $\epsilon$-MARKOV (and the one which was adopted in [18]) is contained in the equality
Show that this definition is equivalent to
(a;R), and conclude that when it for f E B(-,,,/z~(O;R) and g E holds then so do both of the one-sided versions of the E-MARKOV property.
VI
Analytic Considerations
6.1 When Is a Markov Process Hypermixing?
In this section, we pick up the project, initiated in Exercise 5.4.36, of connecting the results obtained in Chapter V to those in Chapter IV, especially those in Sections 4.2 and 4.4. Thus, $P(t,\sigma,\cdot)$ will be a transition probability function on the Polish space $\Sigma$, and we will be assuming that the associated measurable MARKOV family $\{P_\sigma : \sigma \in \Sigma\}$ can be realized on the SKOROKHOD space $D([0,\infty);\Sigma)$. Also, we will use $\{P_t : t > 0\}$ to denote the MARKOV semigroup on $B(\Sigma;\mathbb{R})$ which is determined by $P(t,\sigma,\cdot)$, and we will suppose that there is a $\{P_t : t > 0\}$-invariant $\mu \in M_1(\Sigma)$. Finally, we will denote by $P$ the unique element of $M_1^S(\Omega)$ ($\Omega = D(\mathbb{R};\Sigma)$) with the properties that

$$P\circ\Sigma_0^{-1} = \mu \quad\text{and}\quad P(A \cap B) = \int_A P_{\Sigma_0(\omega)}(B)\,P(d\omega) \tag{6.1.1}$$

for $A \in \mathcal{B}_{(-\infty,0]}$ and $B \in \mathcal{B}_{[0,\infty)}$.

It should be obvious that, in the terminology of Section 5.5, $P$ is 0-MARKOV. In fact, the $P_t$ in (5.5.4) is given by

(6.1.2)

for $f \in B_0^+(\Omega;\mathbb{R})$; and, as a consequence, Theorem 5.5.17 is easily seen to become the following statement.

6.1.3 Theorem. The $P$ in (6.1.1) is hypermixing if and only if

$$\|P_T\|_{L^2(\mu)\to L^4(\mu)} = 1 \quad\text{for some } T \in (0,\infty). \tag{6.1.4}$$
A MARKOV semigroup for which (6.1.4) holds is said to be $\mu$-hypercontractive; and it is our goal to find conditions which guarantee this hypercontractive property. As a preliminary step in this direction, the following result is often useful.

6.1.5 Lemma. If $\|P_T\|_{L^2(\mu)\to L^4(\mu)} = 1$ then

$$\|P_t\phi - \langle\phi\rangle_\mu\|_{L^2(\mu)} \le 3^{-[t/T]/2}\,\|\phi\|_{L^2(\mu)} \tag{6.1.6}$$

for $t \in [T,\infty)$ and $\phi \in B(\Sigma;\mathbb{R})$; where we have introduced the notation

$$\langle\phi\rangle_\mu = \int_\Sigma \phi\,d\mu. \tag{6.1.7}$$

Conversely, if, for some $T_0, T_1 \in (0,\infty)$,

$$M_0 \equiv \|P_{T_0}\|_{L^2(\mu)\to L^4(\mu)} < \infty \tag{6.1.8}$$

then $\{P_t : t > 0\}$ is $\mu$-hypercontractive.
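PROOF: The first assertion is simply a translation of Lemma 5.5.11 into the present context. To prove the second part, write $\phi = a + \tilde\phi$, where $a = \langle\phi\rangle_\mu$. Then, by HÖLDER's inequality, for $t > T_0 \vee T_1$:

$$\begin{aligned}
\|P_t\phi\|_{L^4(\mu)}^4 &\le a^4 + 6a^2\|P_t\tilde\phi\|_{L^2(\mu)}^2 + 4|a|\,\|P_t\tilde\phi\|_{L^3(\mu)}^3 + \|P_t\tilde\phi\|_{L^4(\mu)}^4 \\
&\le a^4 + 8a^2\|P_t\tilde\phi\|_{L^2(\mu)}^2 + 3\|P_t\tilde\phi\|_{L^4(\mu)}^4 \\
&\le a^4 + 8\rho^{2[t/T_1]}a^2\|\tilde\phi\|_{L^2(\mu)}^2 + 3M_0^4\rho^{4[(t-T_0)/T_1]}\|\tilde\phi\|_{L^2(\mu)}^4,
\end{aligned}$$

where we have used (6.1.8) in the passage to the last line. Finally, we choose $t > T_0 \vee T_1$ so that $8\rho^{2[t/T_1]} \le 2$ and $3M_0^4\rho^{4[(t-T_0)/T_1]} \le 1$, and thereby obtain

$$\|P_t\phi\|_{L^4(\mu)}^4 \le a^4 + 2a^2\|\tilde\phi\|_{L^2(\mu)}^2 + \|\tilde\phi\|_{L^2(\mu)}^4 = \bigl(a^2 + \|\tilde\phi\|_{L^2(\mu)}^2\bigr)^2 = \|\phi\|_{L^2(\mu)}^4. \qquad ∎$$

The next result is a typical application of Lemma 6.1.5.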
6.1.9 Theorem. Suppose that there exist $T_0, T_1 \in (0,\infty)$ for which

$$P(T_i,\sigma,d\tau) = p(T_i,\sigma,\tau)\,\mu(d\tau),\qquad i \in \{0,1\} \text{ and } \sigma \in \Sigma,$$

and there is an $\epsilon > 0$ such that

Then $\{P_t : t > 0\}$ is $\mu$-hypercontractive.

PROOF: Obviously, $\|P_{T_0}\|_{L^2(\mu)\to L^4(\mu)} < \infty$. Thus, all that we have to do is check that

for some $t \in (0,\infty)$. But, by (ii) of Exercise 4.1.48 with $\Pi(\sigma,\cdot) = P(T_1,\sigma,\cdot)$, we see that (4.1.50) says that

Hence, if $E_0$ is the operator on $L^1(\mu)$ which takes $\phi \in L^1(\mu)$ into the constant function $\langle\phi\rangle_\mu$, then (because $\mu$ is $P(t,\sigma,\cdot)$-invariant)

$$\|P_t - E_0\|_{L^1(\mu)\to L^1(\mu)} \le 2 \quad\text{for every } t \in (0,\infty),$$

and, by the preceding,

Hence, by the RIESZ-THORIN Interpolation Theorem,

and clearly this means that we need only take $t = nT_1$ for some sufficiently large $n \in \mathbb{Z}^+$. ∎
6.1.10 Remark. Theorem 6.1.9 makes it reasonably clear where hypermixing stands in relation to the hypothesis under which we proved our large deviation principle in Section 4.2. Namely, hypermixing is implied by the following strong version of $(\tilde{U})$:

$(S\tilde{U})$

for some $T_1, T_2 \in (0,\infty)$ and $M \in [1,\infty)$. Indeed, there is then (cf. Exercise 4.2.59) precisely one $\{P_t : t > 0\}$-invariant $\mu \in M_1(\Sigma)$; and, by Theorem 6.1.9, the corresponding $P$ is necessarily hypermixing. Even though $(S\tilde{U})$ implies hypermixing, it is easy to see that $(\tilde{U})$ itself does not always lead to hypermixing processes. For example, uniform rotation on $S^1$ certainly satisfies $(\tilde{U})$ and is certainly not hypermixing. On the other hand, as the following example demonstrates, there are important hypermixing processes for which $(\tilde{U})$ fails.

6.1.11 Example. Define $\gamma_t : \mathbb{R} \to (0,\infty)$ for $t \in (0,\infty)$ by

and let

The corresponding MARKOV process is the famous Ornstein-Uhlenbeck process; and, as is well known, the associated measures $\{P_x : x \in \mathbb{R}\}$ live on $C([0,\infty);\mathbb{R})$. In fact, $P_x$ is the distribution under WIENER's measure $\mathcal{W}$ of the solution $X : [0,\infty)\times\Theta \to \mathbb{R}$ to

$$X(t,\theta) = x + \theta(t) - \frac{1}{2}\int_0^t X(s,\theta)\,ds.$$

(See Section 1.3 for the notation here.) Furthermore, it is obvious that $m(dx) = \gamma_1(x)\,dx$ is the one and only $\{P_t : t > 0\}$-invariant measure but that $(\tilde{U})$ cannot be satisfied by $P(t,x,\cdot)$ for any choice of $\rho_1$ and $\rho_2$. Nonetheless, as we are about to see, the $\{P_t : t > 0\}$ is $m$-hypercontractive, and therefore the corresponding $P$ in (6.1.1) is hypermixing. To verify the preceding assertion, first note that
where
From this expression it is easily seen that
$\iint p(t,x,y)^4\, m^2(dx \times dy) < \infty,$
and therefore $\|P_t\|_{L^2(m)\to L^4(m)} < \infty$ for all sufficiently large $t \in (0,\infty)$. Thus, by Lemma 6.1.5, all that remains is to check that the second part of (6.1.8) holds. To this end, observe that (as the preceding expression makes explicit) $m$ is $P(t,x,\cdot)$-reversing, and therefore, by (4.2.46) and (4.2.57),
$\|P_t\phi - \langle\phi\rangle_m\|_{L^2(m)} \le e^{-\lambda t}\|\phi\|_{L^2(m)}$, $t \in (0,\infty)$ and $\phi \in L^2(m)$,
where
(6.1.13) $\lambda = \inf\{\mathcal{E}(\phi,\phi) : \phi \in L^2(m) \text{ and } \|\phi - \langle\phi\rangle_m\|_{L^2(m)} = 1\}.$
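(We are using primes here to denote derivatives with respect to $x$.) Since $C_b^\infty(\mathbb{R};\mathbb{R})$ is $\{P_t : t > 0\}$-invariant and
$\mathcal{E}(\phi,\phi) = \lim_{t \searrow 0} \frac{1}{t}\bigl(\phi - P_t\phi, \phi\bigr)_{L^2(m)} = -\frac{1}{2}\int_{\mathbb{R}} \bigl(\phi''(x) - x\phi'(x)\bigr)\phi(x)\,m(dx) = \frac{1}{2}\|\phi'\|_{L^2(m)}^2$
and $(P_t\phi)' = e^{-t/2}P_t(\phi')$ for $\phi \in C_b^\infty(\mathbb{R};\mathbb{R})$, we know that
$-\frac{d}{dt}\|P_t\phi\|_{L^2(m)}^2 = 2\mathcal{E}(P_t\phi, P_t\phi) = e^{-t}\|P_t(\phi')\|_{L^2(m)}^2 \le e^{-t}\|\phi'\|_{L^2(m)}^2 = 2e^{-t}\mathcal{E}(\phi,\phi),$
first for all $\phi \in C_b^\infty(\mathbb{R};\mathbb{R})$ and thence for all $\phi \in L^2(m)$. Finally, since $P_t\phi \to \langle\phi\rangle_m$ in $L^2(m)$ as $t \to \infty$, we now have:
$\|\phi - \langle\phi\rangle_m\|_{L^2(m)}^2 = \int_0^\infty \Bigl(-\frac{d}{dt}\|P_t\phi\|_{L^2(m)}^2\Bigr)dt \le 2\mathcal{E}(\phi,\phi).$
Hence, $\lambda \ge \frac{1}{2}$ and therefore the second part of (6.1.8) holds for all $T_1 \in (0,\infty)$. Actually, $\lambda = \frac{1}{2}$ since $\|\phi - \langle\phi\rangle_m\|_{L^2(m)}^2 = 2\mathcal{E}(\phi,\phi)$ when $\phi(x) = x$, $x \in \mathbb{R}$.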
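The spectral gap $\lambda = \frac{1}{2}$ can be checked numerically. The following is a minimal sketch, not part of the original argument. It assumes the Mehler representation $[P_t\phi](x) = \mathbb{E}[\phi(e^{-t/2}x + \sqrt{1-e^{-t}}\,Z)]$ with $Z$ standard normal, which is what the stochastic integral equation above yields; the helper names (`simpson`, `gauss_mean`, `P`, `centered_l2`) are illustrative only.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def gauss_density(z):
    """Density of the invariant measure m (standard normal)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def gauss_mean(g):
    """Integral of g against m; the tails beyond |z| = 10 are ignored."""
    return simpson(lambda z: g(z) * gauss_density(z), -10.0, 10.0)

def P(t, phi):
    """Assumed Mehler form of the Ornstein-Uhlenbeck semigroup with drift -x/2."""
    c = math.exp(-t / 2)
    s = math.sqrt(1 - math.exp(-t))
    return lambda x: gauss_mean(lambda z: phi(c * x + s * z))

def centered_l2(g):
    """L^2(m) norm of g minus its m-mean."""
    mean = gauss_mean(g)
    return math.sqrt(gauss_mean(lambda x: (g(x) - mean) ** 2))

t = 0.7
phi = lambda x: x + x * x            # test function with mean 1 under m
decayed = centered_l2(P(t, phi))     # ||P_t phi - <phi>_m||
bound = math.exp(-t / 2) * centered_l2(phi)
```

For $\phi(x) = x$ the bound is attained, which is the numerical counterpart of the remark that $\lambda = \frac{1}{2}$ exactly.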
At least when $\mu$ is $P(t,\sigma,\cdot)$-reversing, the preceding example indicates that the property of $\mu$-hypercontractivity is closely related to properties of the associated DIRICHLET form. This connection is spelled out most precisely in the following version of a theorem due to L. GROSS [56].
6.1.14 Theorem. (GROSS) Suppose that $m \in M_1(\Sigma)$ is $P(t,\sigma,\cdot)$-reversing; and let $\mathcal{E}$ be the associated DIRICHLET form (cf. (4.2.47)). Given $\alpha \in (0,\infty)$ and $\beta \ge 0$,
(LS)
if and only if
(6.1.15)
for $1 < p \le q < \infty$ and $t \in (0,\infty)$ with $e^{4t/\alpha} \ge (q-1)/(p-1)$. In fact, (6.1.15) with $p = 2$ implies (LS) and therefore (6.1.15) for general $p \in (1,\infty)$.
PROOF: Recall the operator $\bar{L}$ which generates the semigroup $\{\bar{P}_t : t > 0\}$ on $L^2(m)$ (cf. the discussion preceding (4.2.46)), let $\phi \in B(\Sigma;(0,\infty)) \cap \mathrm{Dom}(\bar{L})$ be given, and set $\phi_t = P_t\phi$. Then, for any $q \in [1,\infty)$,
Note (cf. the argument leading to (4.2.54)) that, for any $\psi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar{L})$,
where we have used the fact that, for any $a, b \in (0,\infty)$ and $q \in [1,\infty)$,
which follows, in turn, from
for 71 E (1, co). Hence, we now see that
At the same time, if $t \in (0,\infty) \mapsto q(t) \in (1,\infty)$ is smooth and $\psi \in B(\Sigma;[0,\infty))$, then
Therefore, after combining this with the above, we have that for smooth $t \in (0,\infty) \mapsto q(t) \in (1,\infty)$ and $\phi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar{L})$:
(6.1.16)
Now suppose that (LS) holds and, for given $p \in (1,\infty)$, set $q(t) = 1 + (p-1)e^{4t/\alpha}$. Then $q'(t) = 4(q(t)-1)/\alpha$ and so (6.1.16) says that
and therefore that
at least for $\phi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar{L})$. Since the passage from this to general $\phi \in L^p(m)$ is trivial and $\|P_t\|_{L^r(m)\to L^r(m)} = 1$ for all $r \in [1,\infty]$, we
have now proved that (LS) implies (6.1.15). On the other hand, if one takes $q(t) = 1 + e^{4t/\alpha}$, then one finds that (6.1.16) becomes an equality at $t = 0$. Hence, when (6.1.15) holds with $p = 2$ and therefore $\frac{d}{dt}\|\phi_t\|_{L^{q(t)}(m)} \le 0$ at $t = 0$, (LS) follows for $\phi \in B(\Sigma;[0,\infty)) \cap \mathrm{Dom}(\bar{L})$. At this point it is an easy step to (LS) for all $\phi \in B(\Sigma;[0,\infty))$ and thence, via (4.2.54), for all $\phi \in L^2(m)$. ∎
An estimate of the form in (LS) is called a logarithmic Sobolev inequality.
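The exponent path used in the proof is easy to tabulate. The sketch below is illustrative only: the function names `q_path` and `time_to_reach` are not from the text, and the code does nothing beyond evaluating $q(t) = 1 + (p-1)e^{4t/\alpha}$ and inverting the condition $e^{4t/\alpha} \ge (q-1)/(p-1)$.

```python
import math

def q_path(t, p, alpha):
    """q(t) = 1 + (p - 1) * exp(4 t / alpha), the exponent path from the proof."""
    return 1.0 + (p - 1.0) * math.exp(4.0 * t / alpha)

def time_to_reach(p, q, alpha):
    """Smallest t with exp(4 t / alpha) >= (q - 1) / (p - 1), for 1 < p <= q."""
    if not (1.0 < p <= q):
        raise ValueError("need 1 < p <= q")
    return alpha / 4.0 * math.log((q - 1.0) / (p - 1.0))

alpha = 4.0                                  # an arbitrary illustrative constant
t_2_to_4 = time_to_reach(2.0, 4.0, alpha)    # equals alpha * log(3) / 4
```

With $p = 2$ and $q = 4$ this returns $(\alpha \log 3)/4$, the time at which the semigroup maps $L^2(m)$ into $L^4(m)$ in the setting of the theorem.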
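6.1.17 Corollary. Assume that $m \in M_1(\Sigma)$ is $P(t,\sigma,\cdot)$-reversing and define $\Lambda_{\mathcal{E}}$ and $J_{\mathcal{E}}$ accordingly (as in (4.2.62) and (4.2.60), respectively). Then the following three properties are equivalent:
(6.1.18)
(6.1.19)
with $e^{4t/\alpha} \ge (q-1)/(p-1)$, and
(6.1.20) $\Lambda_{\mathcal{E}}(V) \le \frac{1}{\alpha}\log\Bigl(\int_\Sigma \exp[\alpha V]\,dm\Bigr), \quad V \in C_b(\Sigma;\mathbb{R}).$
Moreover, if any one of these holds, then
(6.1.21)
for $t \in (0,\infty)$ and $\phi \in L^2(m)$.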
PROOF: Note that (6.1.18) is equivalent to (LS), first for non-negative $\phi$'s and then (by (4.2.54)) for all $\phi$'s. Thus, by Theorem 6.1.14, (6.1.18) and (6.1.19) are equivalent. At the same time, the equivalence of (6.1.18) to (6.1.20) is the content of Exercise 5.3.15. Finally, by (6.1.6), one knows that $\|P_t\|_{L^2(m)\to L^4(m)} = 1$ implies that $\|P_t\phi - \langle\phi\rangle_m\|_{L^2(m)} \le 3^{-1/2}\|\phi\|_{L^2(m)}$. In particular, when (6.1.19) holds, then one can take $t = (\alpha\log 3)/4$. After combining this with the Spectral Theorem (cf. (4.2.57)), one concludes that $E_0\phi = \langle\phi\rangle_m$, that $E_\lambda - E_0 = 0$ for $\lambda \in [0, 2/\alpha)$, and therefore that (6.1.21) holds. ∎
We conclude this section with a result which sharpens for the reversible setting the sort of topics treated in Theorem 5.5.12 and Lemma 6.1.5.
6.1.22 Theorem. Assume that $m$ is $P(t,\sigma,\cdot)$-reversing.
(i) Suppose that $\|P_T\|_{L^p(m)\to L^q(m)} = 1$ for some $T \in (0,\infty)$ and $1 < p < q < \infty$. Then (6.1.18) holds with
(6.1.23)
In particular, if $\{P_t : t > 0\}$ is $m$-hypercontractive at time $T \in (0,\infty)$, then
(6.1.24) $\|P_t\|_{L^p(m)\to L^q(m)} = 1$
for $1 < p < q < \infty$ and $t \in (0,\infty)$ with $e^{t/T} \ge$
(ii) Assume that
(6.1.25) $\|\phi - \langle\phi\rangle_m\|_{L^2(m)}^2 \le \gamma\,\mathcal{E}(\phi,\phi), \quad \phi \in L^2(m),$
and that (LS) holds for some $\alpha, \beta, \gamma \in (0,\infty)$. Then
and so $\{P_t : t > 0\}$ is $m$-hypercontractive.
PROOF: To prove (i) we will use the criterion provided by the equivalence of (6.1.18) and (6.1.20). To this end, we first show that, for given $V \in B(\Sigma;\mathbb{R})$,
where $\alpha = \alpha(T,p,q)$. Indeed, for $\phi \in B(\Sigma;[0,\infty))$, set
$a_{n,t}(\sigma) = \int_\Omega \exp\Bigl[T\sum_{m=0}^{n-1} V(\Sigma_{mT}(\omega))\Bigr]\phi(\Sigma_{nT-t}(\omega))\,P_\sigma(d\omega), \quad \sigma \in \Sigma.$
Then, by Theorem 4.2.25, JENSEN's inequality, and the MARKOV property:
and so
But
and therefore
Since, by our hypothesis and HÖLDER's inequality, it is easy to see that
we now get the asserted estimate after letting $n \to \infty$. To complete the proof of (i), we reason as follows. If $p = 2$, there is nothing to do, since
$\Lambda_{\mathcal{E}}(V) = \lim_{t\to\infty} \frac{1}{t}\log\bigl(\|P_t^V\|_{L^2(m)\to L^2(m)}\bigr).$
On the other hand, if $1 < p < 2$, then (by precisely the same argument as we used to prove (5.5.14)) we can find a $T_1 \in (0,\infty)$ for which $\|P_{T_1}\|_{L^p(m)\to L^2(m)} = 1$; and therefore
$\Lambda_{\mathcal{E}}(V) = \lim_{n\to\infty} \frac{1}{nT}\log\bigl(\|P^V_{nT+T_1}\|_{L^2(m)\to L^2(m)}\bigr)$
A similar argument applies when $2 < p < \infty$. To prove (ii), we will show that
(6.1.26)
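and clearly this will lead immediately to the desired result. Note that in order to prove (6.1.26), it suffices to show that
$\int_\Sigma (1+t\psi)^2 \log\Bigl(\frac{(1+t\psi)^2}{1+t^2}\Bigr)dm \le t^2\int_\Sigma \psi^2\log(\psi^2)\,dm + 2t^2, \quad t \in \mathbb{R},$
for all $\psi \in L^2(m)$ with $\langle\psi\rangle_m = 0$ and $\|\psi\|_{L^2(m)} = 1$. To this end, let $\delta > 0$ be given and set
$f_\delta(t) = \int_\Sigma (1+t\psi)^2\log\bigl((1+t\psi)^2+\delta\bigr)dm - (1+t^2)\log(1+t^2) - t^2\int_\Sigma \psi^2\log(\psi^2)\,dm$
for $t \in \mathbb{R}$. Then $f_\delta(0) = \log(1+\delta)$,
$f_\delta'(t) = 2\int_\Sigma (1+t\psi)\psi\log\bigl((1+t\psi)^2+\delta\bigr)dm + 2\int_\Sigma \frac{(1+t\psi)^3\psi}{(1+t\psi)^2+\delta}\,dm - 2t\Bigl[1 + \log(1+t^2) + \int_\Sigma \psi^2\log(\psi^2)\,dm\Bigr],$
so that $f_\delta'(0) = 0$, and
$f_\delta''(t) = 2\int_\Sigma \psi^2\log\Bigl(\frac{(1+t\psi)^2+\delta}{\psi^2}\Bigr)dm + 10\int_\Sigma \frac{(1+t\psi)^2\psi^2}{(1+t\psi)^2+\delta}\,dm - 4\int_\Sigma \frac{(1+t\psi)^4\psi^2}{[(1+t\psi)^2+\delta]^2}\,dm - 2\log(1+t^2) - \frac{4t^2}{1+t^2} - 2$
$\le 2\log\Bigl(1+\frac{\delta}{1+t^2}\Bigr) - \bigl[4A(t,\delta)^2 - 10A(t,\delta)\bigr] - \frac{4t^2}{1+t^2} - 2 \le 2\log(1+\delta) + 4,$
where
$A(t,\delta) = \int_\Sigma \frac{(1+t\psi)^2\psi^2}{(1+t\psi)^2+\delta}\,dm \in [0,1]$
and we have used JENSEN's inequality in the passage to the last line. From these and TAYLOR's Theorem, we conclude that
$f_\delta(t) \le \log(1+\delta) + \bigl(\log(1+\delta) + 2\bigr)t^2,$
and therefore the required estimate follows once one lets $\delta \searrow 0$. ∎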
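The inequality to which (6.1.26) was reduced, namely $\int(1+t\psi)^2\log\bigl((1+t\psi)^2/(1+t^2)\bigr)dm \le t^2\int\psi^2\log(\psi^2)\,dm + 2t^2$ for centred, normalised $\psi$, can be sanity-checked on a finite space. The sketch below is only an illustration under assumptions that are not in the text: a three-point space with uniform $m$ and the particular choice $\psi = (\sqrt{2}, -1/\sqrt{2}, -1/\sqrt{2})$, which has mean $0$ and $L^2(m)$-norm $1$.

```python
import math

def xlogx(x):
    """x * log(x), extended by continuity to x = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def integral(values, weights):
    """Integral of a function on a finite space against the given weights."""
    return sum(w * v for v, w in zip(values, weights))

def inequality_gap(t, psi, weights):
    """Right side minus left side of the inequality; non-negative if it holds."""
    g = [(1.0 + t * x) ** 2 for x in psi]
    # int g log(g / (1 + t^2)) dm, using int g dm = 1 + t^2 for centred, normalised psi
    lhs = integral([xlogx(v) for v in g], weights) - xlogx(1.0 + t * t)
    rhs = t * t * integral([xlogx(x * x) for x in psi], weights) + 2.0 * t * t
    return rhs - lhs

weights = [1.0 / 3.0] * 3
psi = [math.sqrt(2.0), -1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]
gaps = [inequality_gap(k / 10.0, psi, weights) for k in range(-50, 51)]
```

Every entry of `gaps` should be non-negative up to rounding, including at the values of $t$ where $1 + t\psi$ vanishes at one of the points.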
6.1.27 Exercise. Referring to Lemma 5.3.5, let $u \in D \cap B(\Sigma;[1,\infty))$ be given and define $m_u \in M_1(\Sigma)$ and the transition probability function $P_u(t,\sigma,\cdot)$ accordingly.
(i) Show that for any $\phi \in B(\Sigma;\mathbb{R})$ and $\mu \in M_1(\Sigma)$
(6.1.28) $\int_\Sigma \phi^2\log\Bigl(\frac{\phi^2}{\|\phi\|_{L^2(\mu)}^2}\Bigr)d\mu = \inf\Bigl\{\int_\Sigma \bigl[\phi^2\log\phi^2 - \phi^2\log t - \phi^2 + t\bigr]d\mu : t \in (0,\infty)\Bigr\}.$
Next, check that $x\log x - x\log t - x + t \ge 0$ for all $(t,x) \in (0,\infty) \times [0,\infty)$; and use this in conjunction with (6.1.28) to show that
(6.1.29) $H(\nu|m_u) \le \|u\|_B^2\,H(\nu|m), \quad \nu \in M_1(\Sigma).$
(ii) Let $\mathcal{E}_u$ denote the DIRICHLET form associated with $P_u(t,\sigma,\cdot)$ and $m_u$. Using (4.2.54), show that
(6.1.30) $\mathcal{E}(\phi,\phi) \le \|u\|_B^2\,\mathcal{E}_u(\phi,\phi), \quad \phi \in B(\Sigma;\mathbb{R}).$
(iii) By combining parts (i) and (ii), show that (6.1.18) implies that
$H(\nu|m_u) \le \alpha\|u\|_B^4\,J_{\mathcal{E}_u}(\nu), \quad \nu \in M_1(\Sigma).$
In particular, this means that the hypermixing property is preserved by the transformation described in Lemma 5.3.5.
6.1.31 Exercise.
Let $m \in M_1(\Sigma)$ be a $P(t,\sigma,\cdot)$-reversing measure. More familiar than logarithmic SOBOLEV inequalities are classical Sobolev inequalities of the form
(6.1.32) $\|\phi\|_{L^p(m)}^2 \le A\bigl(\mathcal{E}(\phi,\phi) + B\|\phi\|_{L^2(m)}^2\bigr), \quad \phi \in L^2(m;\mathbb{R}),$
for some $p \in (2,\infty)$ and $A, B \in [0,\infty)$. One naturally expects that a classical SOBOLEV inequality ought to be a stronger statement than a logarithmic one. To verify this, let $\phi \in B(\Sigma;\mathbb{R})$ with $\|\phi\|_{L^2(m)} = 1$ be given, and use JENSEN's inequality to check that
Thus, (6.1.32) implies that
In particular, if one has, in addition to (6.1.32), that
then, by part (ii) of Theorem 6.1.22,
+
PA(1 BC) P-2
+ 12.
JE(v), v E M1(C).
6.1.33 Exercise.
In his article [56], GROSS considered the "two-point" space $\Sigma = \{-1,1\}$ with the BERNOULLI measure $m = (\delta_{-1} + \delta_1)/2$ and the transition probability function
$P(t,\sigma,\{\tau\}) = \frac{1+e^{-t}}{2}$ if $\sigma = \tau$, and $P(t,\sigma,\{\tau\}) = \frac{1-e^{-t}}{2}$ if $\sigma = -\tau$.
Obviously, $m$ is $P(t,\sigma,\cdot)$-reversing. Using $\mathcal{E}$ to denote the DIRICHLET form associated with $P(t,\sigma,\cdot)$ and $m$, show that
(6.1.34)
and conclude from this that the associated semigroup $\{P_t : t > 0\}$ has the property that $\|P_t\|_{L^p(m)\to L^q(m)} = 1$ as long as $1 < p < q < \infty$ and $e^{2t} \ge (q-1)/(p-1)$. Finally, check that (6.1.34) is optimal.
Hint: First observe that it suffices to prove (6.1.34) for $\phi$'s of the form $\phi_b(\sigma) = 1 + b\sigma$, where $b \in [0,1]$; and then show that (6.1.34) for $\phi_b$ is equivalent to
$h(b) \equiv (1+b)^2\log(1+b) + (1-b)^2\log(1-b) - (1+b^2)\log(1+b^2) \le 2b^2$
for $b \in [0,1]$. Finally, prove the preceding by checking that $h(0) = h'(0) = 0$ and that $h''(b) \le 4$.
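Both the scalar inequality in the hint and the resulting contraction property can be explored numerically. The following is a sketch rather than a proof. It uses only the transition function given above, from which $P_t$ sends $a + b\sigma$ to $a + e^{-t}b\sigma$; the function names are illustrative.

```python
import math

def h(b):
    """h(b) from the hint, with (1 - b)^2 log(1 - b) read as 0 at b = 1."""
    last = (1.0 - b) ** 2 * math.log(1.0 - b) if b < 1.0 else 0.0
    return ((1.0 + b) ** 2 * math.log(1.0 + b) + last
            - (1.0 + b * b) * math.log(1.0 + b * b))

def lp_norm(a, b, p):
    """L^p(m) norm of sigma -> a + b*sigma under m = (delta_{-1} + delta_1)/2."""
    return (0.5 * abs(a + b) ** p + 0.5 * abs(a - b) ** p) ** (1.0 / p)

def apply_semigroup(t, a, b):
    """Coefficients of P_t(a + b*sigma) = a + exp(-t) * b * sigma."""
    return a, math.exp(-t) * b

def contraction_holds(p, q, t, grid=20, tol=1e-12):
    """Check ||P_t phi||_q <= ||phi||_p for phi = a + b*sigma on a grid of (a, b)."""
    for i in range(-grid, grid + 1):
        for j in range(-grid, grid + 1):
            a, b = i / 10.0, j / 10.0
            a_t, b_t = apply_semigroup(t, a, b)
            if lp_norm(a_t, b_t, q) > lp_norm(a, b, p) + tol:
                return False
    return True

critical_t = 0.5 * math.log(3.0)   # exp(2t) = (4 - 1)/(2 - 1)
```

At `critical_t` the map from $L^2(m)$ to $L^4(m)$ is still a contraction on the grid, while for noticeably smaller $t$ the check fails, which is consistent with the threshold $e^{2t} \ge (q-1)/(p-1)$.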
6.1.35 Exercise. Referring to the situation in Corollary 6.1.17 and assuming (6.1.18) holds, show that
(6.1.36) $H(\nu P_t|m) \le \exp\Bigl[-\frac{4t}{\alpha}\Bigr]H(\nu|m), \quad \nu \in M_1(\Sigma)$ and $t \in [0,\infty)$.
Hint: Assuming that $f$ is a uniformly positive element of $\mathrm{Dom}(\bar{L})$ which is bounded, set $f_t = [P_tf]$ and check that
Next, using (4.2.54) in the same sort of way that we used it in the proof of Theorem 6.1.14, show that
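On the two-point space of Exercise 6.1.33 the entropy decay in (6.1.36) can be checked directly. The sketch below rests on two assumptions that should be read as such: that the exponent in (6.1.36) is $-4t/\alpha$, and that $\alpha = 2$ for the two-point space (the value suggested by comparing $e^{4t/\alpha}$ with the threshold $e^{2t} \ge (q-1)/(p-1)$ there). Since $m$ is reversing, $\nu P_t$ has density $P_tf$ when $\nu$ has density $f$.

```python
import math

def xlogx(x):
    """x * log(x), extended by continuity to x = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def entropy(b):
    """H(nu|m) when nu has density 1 + b*sigma w.r.t. m = (delta_{-1} + delta_1)/2."""
    return 0.5 * (xlogx(1.0 + b) + xlogx(1.0 - b))

def entropy_after(t, b):
    """H(nu P_t | m): the density 1 + b*sigma is mapped to 1 + exp(-t)*b*sigma."""
    return entropy(math.exp(-t) * b)

ALPHA = 2.0   # assumed logarithmic Sobolev constant for the two-point space

def decay_holds(t, b, tol=1e-12):
    """Check H(nu P_t | m) <= exp(-4 t / ALPHA) * H(nu | m)."""
    return entropy_after(t, b) <= math.exp(-4.0 * t / ALPHA) * entropy(b) + tol
```

A usage example is `all(decay_holds(s / 10.0, b / 20.0) for s in range(0, 31) for b in range(0, 21))`, which scans densities $1 + b\sigma$ with $b \in [0,1]$ and times $t \in [0,3]$.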
6.2 Symmetric Diffusions on a Manifold
The purpose of this section is to provide a ready source of examples to which the results in Chapter V and Section 6.1 are applicable. The setting in which we will be working is that of differentiable manifolds. Thus, we will assume that $\Sigma$ is a separable, connected, $N$-dimensional $C^\infty$-manifold on which there is given a complete RIEMANNian structure; and we will denote by $\lambda$ the associated RIEMANNian measure on $\Sigma$. Given vector fields $X, Y \in \Gamma(T(\Sigma))$, $(X|Y) \in C^\infty(\Sigma;\mathbb{R})$ will be the RIEMANNian inner product of $X$ and $Y$; and $|X| = (X|X)^{1/2}$ is the length of $X$. (We use $T(\Sigma)$ to denote the tangent bundle over $\Sigma$ and $\Gamma(T(\Sigma))$ to denote the space of smooth sections.) Also, we use $\nabla_XY \in \Gamma(T(\Sigma))$ to denote the associated (LEVI-CIVITA) RIEMANNian covariant derivative of $Y$ with respect to $X$. That is, $\nabla$ is defined to be the KOSZUL connection which satisfies
(6.2.1) $\nabla_XY - \nabla_YX = [X,Y], \quad X, Y \in \Gamma(T(\Sigma)),$
where $[X,Y] = XY - YX$ is the commutator of $X$ and $Y$, and
(6.2.2) $X(Y|Z) = (\nabla_XY|Z) + (Y|\nabla_XZ)$ for $X, Y, Z \in \Gamma(T(\Sigma))$.
In addition, we will use $\mathrm{grad}\,\phi \in \Gamma(T(\Sigma))$ and $\mathrm{div}\,X \in C^\infty(\Sigma;\mathbb{R})$ to denote the gradient of $\phi \in C^\infty(\Sigma;\mathbb{R})$ and the divergence of $X \in \Gamma(T(\Sigma))$. Thus, for $X \in \Gamma(T(\Sigma))$:
(6.2.3) $X\phi = (X|\mathrm{grad}\,\phi), \quad \phi \in C^\infty(\Sigma;\mathbb{R}),$
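and
(6.2.4) $\int_\Sigma X\phi\,d\lambda = -\int_\Sigma \phi\,\mathrm{div}\,X\,d\lambda$ for $\phi \in C_c^\infty(\Sigma;\mathbb{R})$,
where $C_c^\infty(\Sigma;\mathbb{R})$ denotes the class of $\phi \in C^\infty(\Sigma;\mathbb{R})$ which have compact support. In particular, with the use of normal coordinates, one can easily check that
if $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ is orthonormal at $\sigma$. Finally, we will use $\Delta$ to denote the Laplace-Beltrami operator given by
$\Delta\phi = \mathrm{div}(\mathrm{grad}\,\phi), \quad \phi \in C^\infty(\Sigma;\mathbb{R}).$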
The reason for our introducing the preceding terminology is that we are going to be dealing with diffusions on $\Sigma$ corresponding to an operator $L$ of the form
(6.2.6) $[L\phi] = \frac{e^U}{2}\,\mathrm{div}\bigl(e^{-U}\mathrm{grad}\,\phi\bigr) = \frac{1}{2}\bigl([\Delta\phi] - (\mathrm{grad}\,U|\mathrm{grad}\,\phi)\bigr)$
for $\phi \in C^\infty(\Sigma;\mathbb{R})$, where $U$ is a fixed element of $C^\infty(\Sigma;\mathbb{R})$ which satisfies
(6.2.7)
(Note that Example 6.1.11 corresponds to $\Sigma = \mathbb{R}$ with the standard EUCLIDean structure and $U(x) = (x^2 + \log 2\pi)/2$.) Our first step will be to make sure that such a diffusion exists and that the measure $m \in M_1(\Sigma)$ given by
(6.2.8) $m(d\sigma) = e^{-U(\sigma)}\,\lambda(d\sigma)$
is reversing for the corresponding transition probability function. To be precise, we will prove the following.
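The symmetry of $L$ with respect to $m$ can be checked numerically in the one-dimensional case of Example 6.1.11. The sketch below is illustrative only and makes its assumptions explicit: $\Sigma = \mathbb{R}$, $U(x) = (x^2 + \log 2\pi)/2$ so that $e^{-U}$ is the standard normal density, and test functions supplied together with their exact first and second derivatives. The helper names are not from the text.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def m_density(x):
    """exp(-U(x)) for U(x) = (x^2 + log(2*pi))/2."""
    return math.exp(-(x * x + math.log(2 * math.pi)) / 2)

def pair(f, g):
    """L^2(m) inner product of f and g; tails beyond |x| = 10 are ignored."""
    return simpson(lambda x: f(x) * g(x) * m_density(x), -10.0, 10.0)

def generator(d1, d2):
    """[L phi] = (phi'' - U' phi')/2 with U'(x) = x, from exact derivatives d1, d2."""
    return lambda x: 0.5 * (d2(x) - x * d1(x))

phi, dphi, d2phi = math.sin, math.cos, (lambda x: -math.sin(x))
psi, dpsi, d2psi = (lambda x: x), (lambda x: 1.0), (lambda x: 0.0)

left = pair(psi, generator(dphi, d2phi))    # (psi, L phi)
right = pair(phi, generator(dpsi, d2psi))   # (phi, L psi)
dirichlet = -0.5 * pair(dphi, dpsi)         # -(1/2) * integral of phi' psi' dm
```

The three quantities `left`, `right`, and `dirichlet` should agree up to quadrature error, which is the integration-by-parts identity behind the reversibility of $m$.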
6.2.9 Theorem. Set $\Omega = C([0,\infty);\Sigma)$ and define the evaluation map $\Sigma_t : \Omega \to \Sigma$ and the $\sigma$-algebra $\mathcal{B}_t$ for $t \in [0,\infty)$ accordingly. Then, for each $\sigma \in \Sigma$, there is precisely one $P_\sigma \in M_1(\Omega)$ with the property that
(6.2.10)
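is a mean-zero martingale for every $\phi \in C_c^\infty(\Sigma;\mathbb{R})$. Moreover, the map $\sigma \in \Sigma \mapsto P_\sigma \in M_1(\Omega)$ is continuous and the family $\{P_\sigma : \sigma \in \Sigma\}$ is (time-homogeneous) MARKOV. Finally, let $P(t,\sigma,\cdot)$ denote the associated transition probability function (i.e., $P(t,\sigma,\Gamma) = P_\sigma(\{\omega : \Sigma_t(\omega) \in \Gamma\})$). Then the measure $m$ in (6.2.8) is $P(t,\sigma,\cdot)$-reversing. In fact, the corresponding DIRICHLET form $\mathcal{E}$ is given by
(6.2.11) $\mathcal{E}(\phi,\phi) = \frac{1}{2}\int_\Sigma |\mathrm{grad}\,\phi|^2\,dm$
for $\phi \in L^2(m) \cap C^\infty(\Sigma;\mathbb{R})$ with $|\mathrm{grad}\,\phi| \in L^2(m)$; and $\mathcal{E}$ is the closure of its own restriction to $C_c^\infty(\Sigma;\mathbb{R})$ in the sense that $\phi \in L^2(m)$ is an element of $\mathrm{Dom}(\mathcal{E})$ (i.e., satisfies $\mathcal{E}(\phi,\phi) < \infty$) if and only if $\phi$ is the limit in $L^2(m)$ of a sequence $\{\phi_n\}_1^\infty \subseteq C_c^\infty(\Sigma;\mathbb{R})$ with the property that
in which case $\mathcal{E}(\phi,\phi) = \lim_{n\to\infty}\mathcal{E}(\phi_n,\phi_n)$. In particular, if $\{\bar{P}_t : t > 0\}$ is the semigroup on $L^2(m)$ determined by $P(t,\sigma,\cdot)$, then for every $\phi \in L^2(m)$, $[\bar{P}_t\phi] \to \int_\Sigma \phi\,dm$ in $L^2(m)$ as $t \to \infty$.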
Aside from rather mundane probabilistic considerations, the proof of Theorem 6.2.9 comes down to showing that the diffusion "generated" by $L$ does not explode (cf. Chapter 10 of [104]); and the key to checking this is contained in the following variant of a lemma due to M. GAFFNEY [52], which shows how to utilize the completeness assumption that we have made about the RIEMANNian structure on $\Sigma$. (For the required standard facts about RIEMANNian geometry, the reader might want to consult MILNOR's marvelous [76].)
6.2.12 Lemma. (GAFFNEY) There exists a $\psi \in C^\infty(\Sigma;[0,\infty))$ with the properties that the level set $\{\sigma : \psi(\sigma) \le R\}$ is compact for each $R \in (0,\infty)$ and that $|\mathrm{grad}\,\psi|$ is bounded. In particular, there exists a non-decreasing sequence $\{\eta_n\}_1^\infty \subseteq C_c^\infty(\Sigma;[0,1])$ with the properties that
$\{\sigma : \eta_n(\sigma) = 1\} \nearrow \Sigma$ and $\bigl\||\mathrm{grad}\,\eta_n|\bigr\|_B \to 0$ as $n \to \infty$.
PROOF: Choose and fix a reference point $\sigma_0 \in \Sigma$, and set
$\phi(\sigma) = \mathrm{dist}(\sigma,\sigma_0), \quad \sigma \in \Sigma,$
where "distance" is being measured with respect to the RIEMANNian distance function on $\Sigma$. Because $\Sigma$ is connected, $\Sigma = \{\sigma : \phi(\sigma) < \infty\}$; and by the triangle inequality, it is obvious that $\phi$ is LIPSCHITZ continuous with LIPSCHITZ constant 1. Moreover, because the RIEMANNian structure on $\Sigma$ is complete, the level sets $K(R) \equiv \{\sigma : \phi(\sigma) \le R\}$ are compact, and clearly they exhaust $\Sigma$. Thus, we can find an open cover $\{U_m\}_1^\infty$ and an atlas $\{(W_m,\Phi_m)\}_1^\infty$ with the following properties:
(i) Every pair of points in $W_m$ are joined by a unique geodesic which lies entirely inside of $W_m$.
(ii) $\bar{U}_m \subset\subset W_m$.
(iii) For every $R \in (0,\infty)$ there are only finitely many $m \in \mathbb{Z}^+$ with $W_m \cap K(R) \ne \emptyset$; and if $W_m \cap K(R) \ne \emptyset$, then $W_m \subseteq K(R+1)$.
Finally, choose $\{\alpha_m\}_1^\infty \subseteq C_c^\infty(\Sigma;[0,1])$ to be a partition of unity which is subordinate to $\{U_m\}_1^\infty$.
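$\phi_{m,\epsilon}(\sigma) = \int_{\Phi_m(W_m)} \phi\circ\Phi_m^{-1}(y)\,\rho_\epsilon\bigl(\Phi_m(\sigma) - y\bigr)\,dy, \quad \sigma \in U_m,$
where $\rho_\epsilon(y) = \epsilon^{-N}\rho(y/\epsilon)$ and $\rho \in C^\infty(\mathbb{R}^N;[0,\infty))$ is compactly supported in the unit ball and has total (LEBESGUE) integral 1. Clearly, $\phi_{m,\epsilon} \in C^\infty(U_m;[0,\infty))$. In addition, for every $\sigma \in U_m$,
Similarly, for all $\sigma, \tau \in U_m$,
$|\phi_{m,\epsilon}(\tau) - \phi_{m,\epsilon}(\sigma)| \le \sup_{|y|<\epsilon} \mathrm{dist}\bigl(\Phi_{m,y}(\tau), \Phi_{m,y}(\sigma)\bigr).$
Using $D\Phi_{m,y}$ to denote the JACOBIan matrix of $\Phi_{m,y}$ and noting that there is a $C_m \in (0,\infty)$ such that
$(1 - C_m\epsilon)I_\sigma \le \bigl(D\Phi_{m,y}(\sigma)\bigr)^*D\Phi_{m,y}(\sigma) \le (1 + C_m\epsilon)I_\sigma$
for $|y| < \epsilon$ and $\sigma \in U_m$ (where $I_\sigma$ is used here to denote the identity map on $T_\sigma(\Sigma)$ and the asterisk indicates the adjoint relative to the RIEMANNian metric) one easily sees that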
when
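(6.2.13) $\phi_m \equiv \phi_{m,\epsilon_m}$ on $U_m$.
We now set
$\psi = \sum_{m=1}^\infty \alpha_m\phi_m,$
where the $\phi_m$'s are the ones defined in (6.2.13). Obviously, $\psi \in C^\infty(\Sigma;[0,\infty))$. In addition, $|\psi - \phi| \le 1$ on $\Sigma$; and therefore the level sets of $\psi$ are compact. Finally,
$\mathrm{grad}\,\psi = \sum_{m=1}^\infty \alpha_m\,\mathrm{grad}\,\phi_m + \sum_{m=1}^\infty \phi_m\,\mathrm{grad}\,\alpha_m.$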
Clearly, the first sum contributes at most contribution of the second sum, note that grada,
4 to lgrad$l.
To estimate the
= 0;
m=l and therefore that the length of the second sum is dominated by
Hence, lgrad$I 5 2. To complete the proof, choose r] E CM(R, [0,1]) SO that rl E 1 on (--00, 11, r] 0 on [2, oo),and lr]’l 5 1 everywhere. It is then clear that the functions given by
have the desired properties. I
PROOF OF THEOREM 6.2.9: Let $\hat\Sigma = \Sigma \cup \{p\}$ be the one-point compactification of $\Sigma$, and set $\hat\Omega = C([0,\infty);\hat\Sigma)$. Then, by standard diffusion theoretic techniques (see Chapter 10 of [104] or [51]) one can show that, for each $\sigma \in \Sigma$, there is a unique $P_\sigma \in M_1(\hat\Omega)$ with the properties that the expression in (6.2.10) is a mean-zero martingale for all $\phi \in C_c^\infty(\Sigma;\mathbb{R})$ and
for $s \in [0,\infty)$. (We extend compactly supported functions on $\Sigma$ to $\hat\Sigma$ by taking them equal to 0 at $p$.) Moreover, $\sigma \in \Sigma \mapsto P_\sigma \in M_1(\hat\Omega)$ is continuous; and if $P_p$ is the measure concentrated on the constant path at $p$, then $\{P_\sigma : \sigma \in \hat\Sigma\}$ forms a time-homogeneous MARKOV family. Hence, if $\zeta(\omega) = \inf\{t \in [0,\infty) : \Sigma_t(\omega) = p\}$ and
for every $\phi \in B(\Sigma;\mathbb{R})$, then $\{Q_t : t > 0\}$ forms a FELLER-continuous, sub-MARKOVian semigroup which is weakly continuous at 0. What is not so obvious, but is nonetheless true (cf. [51]), is the fact that the symmetry of $L|_{C_c^\infty(\Sigma;\mathbb{R})}$ in $L^2(m)$:
implies that the $Q_t$'s are also symmetric on $L^2(m)$. Thus each $Q_t$ admits a unique extension as a self-adjoint contraction operator $\bar{Q}_t$ on $L^2(m)$ and the semigroup $\{\bar{Q}_t : t > 0\}$ is strongly continuous. In fact, $\{\bar{Q}_t : t > 0\}$ is the semigroup which is generated by the FRIEDRICHS extension $\bar{L}$ of $L|_{C_c^\infty(\Sigma;\mathbb{R})}$. Using $\{E_\lambda : \lambda \in [0,\infty)\}$ to denote the spectral resolution of $-\bar{L}$, we have the representation
In particular, if
then (6.2.14) leads to
for $t \in (0,\infty)$. A basic fact about the FRIEDRICHS extension of a nonnegative operator is that its DIRICHLET form is the closure of its quadratic form. Thus, in the present situation, $\mathcal{E}$ is the closure of its restriction to $C_c^\infty(\Sigma;\mathbb{R})$. We next want to prove (6.2.11). To this end, let $\phi \in C^\infty(\Sigma;\mathbb{R}) \cap L^2(m)$ with $|\mathrm{grad}\,\phi| \in L^2(m)$ be given, and observe that, by (6.2.15) and the fact that $\mathcal{E}$ is closed, all that we have to do is produce a sequence $\{\phi_n\}_1^\infty \subseteq C_c^\infty(\Sigma;\mathbb{R})$ such that $\phi_n \to \phi$ in $L^2(m)$ and
$\int_\Sigma |\mathrm{grad}\,\phi_n - \mathrm{grad}\,\phi|^2\,dm \to 0$ as $n \to \infty$.
To this end, choose the functions $\eta_n$ as in the last part of Lemma 6.2.12 and simply take $\phi_n = \eta_n\phi$. As an immediate consequence of (6.2.11) and (6.2.16) with $\phi = 1$, we see that $\int_\Sigma [Q_t1]\,dm \ge 1$ for all $t \in (0,\infty)$; and because $[Q_t1]$ is continuous and dominated by 1, this proves that $[Q_t1] \equiv 1$. Equivalently, we now know that $P_\sigma(\{\omega : \zeta(\omega) \le t\}) = 0$ for every $(t,\sigma) \in [0,\infty) \times \Sigma$; and therefore the measures $P_\sigma$ are actually concentrated on $\Omega$. In particular, $\{P_\sigma : \sigma \in \Sigma\}$ is itself a FELLER-continuous time-homogeneous MARKOV family of probability measures on $\Omega$; and all of the statements which we have made about the $Q_t$'s immediately become statements about the semigroup $\{P_t : t > 0\}$ determined by $\{P_\sigma : \sigma \in \Sigma\}$. We still have to prove the final assertion of the theorem. Using the spectral representation of $\bar{P}_t = \bar{Q}_t$, one sees that it is sufficient to show that the range of the projection $E_0$ is the constant functions. Equivalently, this comes down to checking that $\phi$ is constant if $\phi \in L^2(m)$ with $\mathcal{E}(\phi,\phi) = 0$. To this end, assume that $\mathcal{E}(\phi,\phi) = 0$. One then has that $\mathcal{E}(\phi,\psi) = 0$ for every $\psi \in \mathrm{Dom}(\mathcal{E})$ and therefore that
for every $\psi \in C_c^\infty(\Sigma;\mathbb{R})$. But this means that $[L\phi] = 0$ in the sense of distributions and therefore, by standard elliptic regularity theory, that $\phi \in C^\infty(\Sigma;\mathbb{R})$. In particular, this now leads to the conclusion that
and, therefore, that $\mathrm{grad}\,\phi = 0$ everywhere. Clearly the constancy of $\phi$ follows from this and the connectedness of $\Sigma$. ∎
From now on $m$ will be the probability measure in (6.2.8) and we will use $\langle\phi\rangle_m$ to denote the $m$-integral of a $\phi \in L^1(m)$. Also, $P(t,\sigma,\cdot)$ will be the transition probability function for the MARKOV family $\{P_\sigma : \sigma \in \Sigma\}$ produced in Theorem 6.2.9, and $\{P_t : t > 0\}$ will be the corresponding FELLER-continuous semigroup. Before proceeding, we will need the following technical addendum to Theorem 6.2.9.
6.2.17 Lemma. Set
(6.2.18)
Then, for each $f \in \mathcal{F}$, $(t,\sigma) \in (0,\infty) \times \Sigma \mapsto [P_tf](\sigma)$ is smooth,
(6.2.19)
and $|\mathrm{grad}\,f| \in L^2(m)$. In fact,
(6.2.20) $-\bigl(g,[Lf]\bigr)_{L^2(m)} = \frac{1}{2}\bigl\langle(\mathrm{grad}\,f|\mathrm{grad}\,g)\bigr\rangle_m$ for $f, g \in \mathcal{F}$.
Finally, $\mathcal{F}$ is $\{P_t : t > 0\}$-invariant.
PROOF: Let $f \in \mathcal{F}$ and $\psi \in C_c^\infty(\Sigma;\mathbb{R})$ be given. Then,
Thus, $(t,\sigma) \in (0,\infty) \times \Sigma \mapsto [P_tf](\sigma)$ satisfies the first equality in (6.2.19) in the sense of distributions; and therefore, by elliptic regularity theory, it is a smooth function which satisfies this equality in the classical sense.
Before attempting to check the second equality in (6.2.19), we will prove $|\mathrm{grad}\,f| \in L^2(m)$, $f \in \mathcal{F}$, and (6.2.20). To this end, choose $\{\eta_n\}_1^\infty$ as in the last part of Lemma 6.2.12. Then
from which it is a simple matter to estimate $|\mathrm{grad}\,f|$ in terms of $\|f\|_{L^2(m)}\|Lf\|_{L^2(m)}$. Thus, we now know that $|\mathrm{grad}\,f| \in L^2(m)$ for all $f \in \mathcal{F}$, and once one knows this, the proof of (6.2.20) is easy:
Returning to the proof of the second equality in (6.2.19), note that we already know that (6.2.19) holds for elements of $C_c^\infty(\Sigma;\mathbb{R})$; and therefore, if $\psi \in C_c^\infty(\Sigma;\mathbb{R})$, then
where, in the passage from the first to the second lines, we have used the facts that $[P_t\psi] \in \mathcal{F}$, $t \in [0,\infty)$, and therefore that (6.2.20) applies. Clearly the second equality in (6.2.19) follows from the above. Moreover, we now see that $\mathcal{F}$ is $\{P_t : t > 0\}$-invariant, since the only thing that we had left to check is that $[LP_tf] \in L^2(m)$, and this is obvious from the second equality in (6.2.19). ∎
Our goal now is to find conditions which will tell us when the results in Sections 5.3 and 5.4 apply to the processes described in Theorem 6.2.9. We begin with the following.
6.2.21 Theorem. Set $V = |\mathrm{grad}\,U|^2 - \Delta U$ and assume that the level sets $\{\sigma \in \Sigma : V(\sigma) \le R\}$, $R \in [0,\infty)$, are compact. Then $J_{\mathcal{E}}$ is a good rate
function and
for every measurable $\Gamma \subseteq M_1(\Sigma)$.
PROOF: Recall that, for any $R \in [0,\infty)$, the set
is relatively compact in $L^2(\mathbb{R}^N)$, where $B$ is the open unit ball in $\mathbb{R}^N$. Hence, with the use of a partition of unity, one can easily check that, for any relatively compact open set $G \subseteq \Sigma$,
$\Bigl\{\phi \in C_c^\infty(G;\mathbb{R}) : \int_\Sigma |\mathrm{grad}\,\phi|^2\,dm \le R\Bigr\}$
is a relatively compact subset of $L^2(m)$ for every $R \in [0,\infty)$. Knowing this and using, once again, the functions $\eta_n$ from Lemma 6.2.12, one concludes that, for each $R \in [0,\infty)$,
$\Phi(R) \equiv \{\phi \in C_c^\infty(\Sigma;\mathbb{R}) : \mathcal{E}(\phi,\phi) \le R\}$
is relatively compact in $L^2_{\mathrm{loc}}(m)$. That is, every sequence $\{\phi_n\}_1^\infty \subseteq \Phi(R)$ contains a subsequence $\{\phi_{n_\ell}\}$ which is $L^2(m)$-convergent on each compact subset of $\Sigma$. Thus, we would know that $\Phi(R)$ is relatively compact in $L^2(m)$ if we could produce a sequence $\{K_\ell\}_1^\infty$ of compact subsets in $\Sigma$ such that
(6.2.23) $\lim_{\ell\to\infty} \sup_{\phi\in\Phi(R)} \int_{K_\ell^c} \phi^2\,dm = 0.$
To prove (6.2.23) under the stated hypothesis on $V$, note that if $\phi \in C_c^\infty(\Sigma;\mathbb{R})$ and $\psi = e^{-U/2}\phi$, then
$\int_\Sigma |\mathrm{grad}\,\psi|^2\,d\lambda = 2\mathcal{E}(\phi,\phi) + \int_\Sigma [LU]\phi^2\,dm + \frac{1}{4}\int_\Sigma |\mathrm{grad}\,U|^2\phi^2\,dm$
and therefore
Since the level sets of $V$ are compact, it is clear from this how to choose the sets $K_\ell$. To complete the proof that $J_{\mathcal{E}}$ is good, remember that $\mathcal{E}$ is the closure of its restriction to $C_c^\infty(\Sigma;\mathbb{R})$ and conclude that
where $\overline{\Phi(R)}$ is the closure of $\Phi(R)$ in $L^2(m)$. Thus, if $\{\nu_n\}_1^\infty \subseteq M_1(\Sigma)$ with $J_{\mathcal{E}}(\nu_n) \le R$, $n \in \mathbb{Z}^+$, then $d\nu_n = \phi_n^2\,dm$, where $\{\phi_n\}_1^\infty \subseteq \overline{\Phi(R)}$. Now choose a subsequence $\{\phi_{n_\ell}\}$ which converges in $L^2(m)$ to an element $\phi$ of $\overline{\Phi(R)}$. It is then clear that $\nu_{n_\ell} \Rightarrow \nu$, where $d\nu = \phi^2\,dm$. Moreover, since (cf. (4.2.54))
$J_{\mathcal{E}}(\nu) = \mathcal{E}(|\phi|,|\phi|) \le \mathcal{E}(\phi,\phi) \le R,$
it is also clear that $J_{\mathcal{E}}(\nu) \le R$. The rest of the proof is nothing but an application of elliptic regularity theory and Exercise 5.3.14. Indeed, elliptic regularity theory assures us that $P(t,\sigma,d\tau) = p(t,\sigma,\tau)\,m(d\tau)$, where $p \in C^\infty\bigl((0,\infty) \times \Sigma \times \Sigma;(0,\infty)\bigr)$. ∎
Having found a condition which enables us to apply the results in Section 5.3, we next want to see what we can do to bring the results in Section 5.4 to bear. As we pointed out in Remark 6.1.10, the strong form of $(\tilde{\mathbf{U}})$ in (SU) is more than enough to guarantee that the semigroup $\{P_t : t > 0\}$ is hypercontractive. Of course, at least from the standpoint of large deviation theory, this is not a very useful observation since (SU) itself implies far stronger large deviation results than does hypermixing. On the other hand, Example 6.1.11 clearly demonstrates that there are interesting situations in which (SU) fails to hold but $\{P_t : t > 0\}$ is nonetheless hypercontractive; and what we want to do now is develop machinery for recognizing such situations. Thus, we are about to embark on a program which will eventually give us a criterion with which to determine when $\{P_t : t > 0\}$ is hypercontractive even though (SU) may fail. The program which we have in mind is based on the work of BAKRY and EMERY [3] and entails the analysis of the function
where $f$ is a uniformly positive element of $\mathcal{F}$ (cf. (6.2.18)) and we use $f_t$ to denote $[P_tf]$. Using Lemma 6.2.17, one can easily justify the steps:
Thus, since, by the last part of Theorem 6.2.9,
(6.2.25)
Clearly (6.2.25) is potentially related to a logarithmic SOBOLEV inequality. In particular, it indicates that we would be well-advised to study quantities related to the integrand on the right hand side. With this in mind, we introduce, for $\delta \in (0,\infty)$, the function
(6.2.26) $w_\delta(t,\sigma) = \bigl(|\mathrm{grad}\,f_t(\sigma)|^2 + \delta\bigr)^{1/2}, \quad (t,\sigma) \in [0,\infty) \times \Sigma.$
By straight-forward computation (ITÔ's transformation rule for second order operators), one can show that
(6.2.27)
where
(6.2.28)
and
(6.2.29)
Our next goal is to interpret the quantity $v(t,\sigma)$ in (6.2.28). In doing so, it will be necessary to recall some more notions from RIEMANNian geometry. In the first place, if $g \in C^\infty(\Sigma;\mathbb{R})$, then the Hessian, $\mathrm{Hess}\,g$, is the element of $\Gamma(T^*(\Sigma) \otimes T^*(\Sigma))$ given by $\mathrm{Hess}\,g(X,Y) = XYg - \nabla_XYg$ for $X, Y \in \Gamma(T(\Sigma))$. Note that, because the LEVI-CIVITA connection is torsion free, $\mathrm{Hess}\,g$ is symmetric. Also, an elementary calculation leads to
(6.2.30) $\mathrm{Hess}\,g(X,Y) = (\nabla_X\mathrm{grad}\,g|Y), \quad X, Y \in \Gamma(T(\Sigma)).$
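A second notion which we will need is that of the RICCI curvature tensor. For this purpose, recall that the Riemann curvature is the tensor $R \in \Gamma(T^*(\Sigma)^{\otimes 4})$ defined by
$R(X,V,Y,W) = -\bigl(\nabla_X \circ \nabla_VY - \nabla_V \circ \nabla_XY - \nabla_{[X,V]}Y \,\big|\, W\bigr)$
for $X, Y, V, W \in \Gamma(T(\Sigma))$, and that the Ricci curvature is the tensor $\mathrm{Ric} \in \Gamma(T^*(\Sigma)^{\otimes 2})$ such that
(6.2.31) $\mathrm{Ric}(X,Y)(\sigma) = \sum_{k=1}^N R(X,E_k,Y,E_k)(\sigma), \quad X, Y \in \Gamma(T(\Sigma)),$
as long as $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ is orthonormal at $\sigma$. We will now show that
(6.2.32) $v(t,\cdot) = (\mathrm{Ric} + \mathrm{Hess}\,U)(\mathrm{grad}\,f_t,\mathrm{grad}\,f_t) + \|\mathrm{Hess}\,f_t\|_{\mathrm{H.S.}}^2,$
where, for any $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ which is orthonormal at $\sigma$,
$\|\mathrm{Hess}\,f_t\|_{\mathrm{H.S.}}(\sigma) = \Bigl(\sum_{k,\ell=1}^N \bigl(\mathrm{Hess}\,f_t(E_k,E_\ell)(\sigma)\bigr)^2\Bigr)^{1/2}$
is the HILBERT-SCHMIDT norm of $\mathrm{Hess}\,f_t(\sigma)$. In the derivation of (6.2.32), a central role will be played by the identity
(6.2.33) $\mathrm{grad}\,(\mathrm{grad}\,u|\mathrm{grad}\,w) = \nabla_{\mathrm{grad}\,u}\,\mathrm{grad}\,w + \nabla_{\mathrm{grad}\,w}\,\mathrm{grad}\,u$
for $u, w \in C^\infty(\Sigma;\mathbb{R})$. To prove (6.2.33), set $Y = \mathrm{grad}\,u$ and $Z = \mathrm{grad}\,w$. Then, for $X \in \Gamma(T(\Sigma))$:
$(X|\nabla_YZ + \nabla_ZY) = Y(X|Z) - (\nabla_YX|Z) + Z(X|Y) - (\nabla_ZX|Y)$
$= YXw + ZXu - (\nabla_XY|Z) - (Y|\nabla_XZ) - ([Y,X]|Z) - ([Z,X]|Y)$
$= XYw + XZu - X(Y|Z) = X(Y|Z) = (X|\mathrm{grad}\,(Y|Z)),$
where we made use of the torsion free nature of $\nabla$. Turning to the proof of (6.2.32), note that
$v(t,\cdot) = \frac{1}{2}\Delta(\mathrm{grad}\,f_t|\mathrm{grad}\,f_t) - \frac{1}{2}\bigl(\mathrm{grad}\,U \big| \mathrm{grad}\,(\mathrm{grad}\,f_t|\mathrm{grad}\,f_t)\bigr) - (\mathrm{grad}\,\Delta f_t|\mathrm{grad}\,f_t) + \bigl(\mathrm{grad}\,(\mathrm{grad}\,U|\mathrm{grad}\,f_t) \big| \mathrm{grad}\,f_t\bigr)$
$= v_0(t,\cdot) - \frac{1}{2}\bigl(\mathrm{grad}\,U \big| \mathrm{grad}\,(\mathrm{grad}\,f_t|\mathrm{grad}\,f_t)\bigr) + \bigl(\mathrm{grad}\,(\mathrm{grad}\,U|\mathrm{grad}\,f_t) \big| \mathrm{grad}\,f_t\bigr).$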
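At the same time, by (6.2.33) (with $u = U$ and $w = f_t$), and (6.2.30):
$-\frac{1}{2}\bigl(\mathrm{grad}\,U \big| \mathrm{grad}\,(\mathrm{grad}\,f_t|\mathrm{grad}\,f_t)\bigr) + \bigl(\mathrm{grad}\,(\mathrm{grad}\,U|\mathrm{grad}\,f_t) \big| \mathrm{grad}\,f_t\bigr)$
$= -\frac{1}{2}\,\mathrm{grad}\,U\,(\mathrm{grad}\,f_t|\mathrm{grad}\,f_t) + (\nabla_{\mathrm{grad}\,U}\,\mathrm{grad}\,f_t|\mathrm{grad}\,f_t) + \mathrm{Hess}\,U(\mathrm{grad}\,f_t,\mathrm{grad}\,f_t)$
$= \mathrm{Hess}\,U(\mathrm{grad}\,f_t,\mathrm{grad}\,f_t).$
Thus, all that remains is to show that
(6.2.34) $v_0(t,\cdot) = \mathrm{Ric}(\mathrm{grad}\,f_t,\mathrm{grad}\,f_t) + \|\mathrm{Hess}\,f_t\|_{\mathrm{H.S.}}^2.$
In order to check (6.2.34), it will be convenient to fix a $\sigma \in \Sigma$ and to choose $\{E_k\}_1^N \subseteq \Gamma(T(\Sigma))$ so that $\{E_k(\sigma)\}_1^N$ is orthonormal and $\nabla_XE_k(\sigma) = 0$ for $1 \le k \le N$ and $X \in \Gamma(T(\Sigma))$. For example, one can choose a normal coordinate system $(x^1,\ldots,x^N)$ in a neighborhood $O$ of $\sigma$ and arrange that $E_k = \frac{\partial}{\partial x^k}$ in $O$. By (6.2.33), one then has
$\frac{1}{2}\Delta\bigl(|\mathrm{grad}\,f_t|^2\bigr)(\sigma) = \mathrm{div}\bigl(\nabla_{\mathrm{grad}\,f_t}\,\mathrm{grad}\,f_t\bigr)(\sigma) = \sum_{k=1}^N \bigl(\nabla_{E_k}\nabla_{\mathrm{grad}\,f_t}\,\mathrm{grad}\,f_t \big| E_k\bigr)(\sigma)$
and, by (6.2.2),
$(\mathrm{grad}\,\Delta f_t|\mathrm{grad}\,f_t)(\sigma) = \sum_{k=1}^N \bigl(\mathrm{grad}\,(\nabla_{E_k}\mathrm{grad}\,f_t|E_k) \big| \mathrm{grad}\,f_t\bigr)(\sigma)$
$= \sum_{k=1}^N \bigl(\nabla_{\mathrm{grad}\,f_t}\nabla_{E_k}\mathrm{grad}\,f_t \big| E_k\bigr)(\sigma) + \sum_{k=1}^N \bigl(\nabla_{E_k}\mathrm{grad}\,f_t \big| \nabla_{\mathrm{grad}\,f_t}E_k\bigr)(\sigma)$
$= \sum_{k=1}^N \bigl(\nabla_{\mathrm{grad}\,f_t}\nabla_{E_k}\mathrm{grad}\,f_t \big| E_k\bigr)(\sigma).$
Thus, after subtracting the second of these from the first, we arrive at
$v_0(t,\sigma) = \mathrm{Ric}(\mathrm{grad}\,f_t,\mathrm{grad}\,f_t)(\sigma) + \sum_{k=1}^N \bigl(\nabla_{[E_k,\mathrm{grad}\,f_t]}\mathrm{grad}\,f_t \big| E_k\bigr)(\sigma).$
Finally, note that, because the HESSian is symmetric and $\nabla$ is torsion free,
$\bigl(\nabla_{[E_k,\mathrm{grad}\,f_t]}\mathrm{grad}\,f_t \big| E_k\bigr)(\sigma) = \mathrm{Hess}\,f_t\bigl([E_k,\mathrm{grad}\,f_t],E_k\bigr)(\sigma) = \bigl(\nabla_{E_k}\mathrm{grad}\,f_t \big| [E_k,\mathrm{grad}\,f_t]\bigr)(\sigma)$
$= \bigl(\nabla_{E_k}\mathrm{grad}\,f_t \big| \nabla_{E_k}\mathrm{grad}\,f_t\bigr)(\sigma) - \bigl(\nabla_{E_k}\mathrm{grad}\,f_t \big| \nabla_{\mathrm{grad}\,f_t}E_k\bigr)(\sigma) = \bigl(\nabla_{E_k}\mathrm{grad}\,f_t \big| \nabla_{E_k}\mathrm{grad}\,f_t\bigr)(\sigma).$
Thus, (6.2.34) follows after summing the preceding over $1 \le k \le N$.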
Having dealt with $v(t,\sigma)$, we next want to estimate $W_\delta(t,\sigma)$ in (6.2.29). Remembering that the square of the HILBERT-SCHMIDT norm dominates the square of the largest eigenvalue of a symmetric matrix, use (6.2.33) to check that
and therefore
(6.2.35)
By combining (6.2.27), (6.2.32), and (6.2.35), we arrive at the important relation
(6.2.36) $[Lw_\delta](t,\sigma) - \frac{\partial w_\delta}{\partial t}(t,\sigma) \ge \frac{(\mathrm{Ric} + \mathrm{Hess}\,U)(\mathrm{grad}\,f_t,\mathrm{grad}\,f_t)(\sigma)}{2w_\delta(t,\sigma)}.$
In particular, if we now make the assumption that
(B&E) $\mathrm{Ric} + \mathrm{Hess}\,U \ge 2\epsilon I$
for some $\epsilon > 0$, then
(6.2.37)
6.2.38 Lemma. Let $T \in (0,\infty)$ and $w \in C^\infty([0,T] \times \Sigma;[0,\infty))$ be given, and assume that $t \in [0,T] \mapsto \|w(t,\cdot)\|_{L^2(m)}$ is bounded. If
then
PROOF: Choose $\{\eta_n\}_1^\infty$ as in the last part of Lemma 6.2.12 and set
Then
from which the desired inequality follows after one takes the limit as $n \to \infty$. ∎
With the preceding preparations, we are at last ready to prove the estimate toward which our efforts have been directed.
6.2.39 Lemma. Assume that (B&E) holds for some $\epsilon > 0$. Then, for every uniformly positive element $f$ of $\mathcal{F}$,
(6.2.40) $|\mathrm{grad}\,[P_Tf]| \le e^{-\epsilon T}[P_T|\mathrm{grad}\,f|].$
PROOF: Define $w_\delta$ as in (6.2.26). Then, by Lemma 6.2.17, (6.2.37), and Lemma 6.2.38, we know that
(6.2.41)
Now let $\phi, \psi \in C_c^\infty(\Sigma;[0,\infty))$ be given and set
-(u6(T-t,')(grad[p~4]Jgradi)) m dt - &'I2
Jd'e-tt "Pt+l, 4 L z ( m ) d t .
-
Now let $\{\eta_n\}_1^\infty$ be the sequence produced in Lemma 6.2.12, replace $\psi$ in the preceding by $\eta_n$, let $n \to \infty$ and $\delta \searrow 0$, and use the above together with (6.2.41) to conclude that
$\bigl(\phi, |\mathrm{grad}\,[P_Tf]|\bigr)_{L^2(m)} \le e^{-\epsilon T}\bigl(\phi, [P_T|\mathrm{grad}\,f|]\bigr)_{L^2(m)}.$
Finally, because this is true for an arbitrary $\phi \in C_c^\infty(\Sigma;[0,\infty))$, it obviously implies (6.2.40). ∎
6.2.42 Theorem. Assume that (B&E) holds for some $\epsilon > 0$. Then, for all $1 < p \le q < \infty$,
(6.2.43) $\|P_t\|_{L^p(m)\to L^q(m)} = 1$ for $t \in (0,\infty)$ with $e^{2\epsilon t} \ge \frac{q-1}{p-1}.$
In particular, $\{P_t : t > 0\}$ is hypercontractive at time $(\log 3)/2\epsilon$ and
(6.2.44)
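For the Ornstein-Uhlenbeck case of Example 6.1.11 the threshold in (6.2.43) can be tested in closed form. The sketch below is an illustration under stated assumptions: $\Sigma = \mathbb{R}$ and $U(x) = (x^2 + \log 2\pi)/2$, so that $\mathrm{Ric} = 0$, $\mathrm{Hess}\,U = 1$, and (B&E), read as $\mathrm{Ric} + \mathrm{Hess}\,U \ge 2\epsilon I$, holds with $\epsilon = \frac{1}{2}$; it also assumes the Mehler formula for $P_t$. For the test functions $x \mapsto e^{ax}$ both norms are explicit Gaussian integrals. The function names are hypothetical.

```python
import math

def log_norm_exp(a, p):
    """log ||exp(a x)||_{L^p(m)} for standard normal m, namely p * a^2 / 2."""
    return p * a * a / 2.0

def log_norm_semigroup_exp(a, t, q):
    """log ||P_t exp(a x)||_{L^q(m)}, using the assumed Mehler identity
    P_t exp(a x) = exp(a e^{-t/2} x + a^2 (1 - e^{-t}) / 2)."""
    decay = math.exp(-t)
    return a * a * (1.0 - decay) / 2.0 + q * a * a * decay / 2.0

def critical_time(eps, p, q):
    """Smallest t with exp(2 * eps * t) >= (q - 1) / (p - 1)."""
    return math.log((q - 1.0) / (p - 1.0)) / (2.0 * eps)

EPS = 0.5                                 # Hess U = 1 = 2 * EPS for the quadratic U above
t_star = critical_time(EPS, 2.0, 4.0)     # equals log(3) / (2 * EPS) = log(3)
```

At and beyond `t_star` the $L^4(m)$ norm of $P_te^{ax}$ does not exceed the $L^2(m)$ norm of $e^{ax}$, whereas for any smaller $t$ and $a \ne 0$ it does, so exponentials show that the time $(\log 3)/2\epsilon$ cannot be improved in this example.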
PROOF:Let f be a uniformly positive element of F. Then, from (6.2.40), we have that
and so, by (6.2.25),
Next, let q E ( 1 , ~and ) a uniformly positive 4 E F be given. Choosing 2 6.2.12, set fn = (qn+qj2 l / n ) . Plugging this fn into the above, noting that
+
{qn}yas in Lemma
and then letting n
-
00,
we arrive at
Since $t [Pt4]is a uniformly positive element of F whenever 4 itself is, we can use this in (6.1.16) with q ( t ) = 1 ( p - l)e2Etto conclude that
t E [O, 0 0 )
-
+
IIpt411L.ct,(m,
is non-increasing; and from this point it is an easy step to (6.2.43). Finally, (6.2.44) follows from (6.2.43) together with Theorem 6.1.14 and Corollary 6.1.17. ∎
6.2.45 Corollary. Assume that there is a bounded $V \in C^\infty(\Sigma;\mathbb{R})$ with the property that
(6.2.46) $\mathrm{Ric} + \mathrm{Hess}(U + V) \ge \epsilon I$
for some $\epsilon > 0$. Then $\{P_t : t > 0\}$ is hypercontractive.
PROOF: Without loss of generality, we will assume that $\int_\Sigma e^{-V}\,dm = \int_\Sigma e^{-(U+V)}\,d\lambda = 1$.
Define $m' \in M_1(\Sigma)$ and the DIRICHLET form $\mathcal{E}'$ relative to $U + V$. By Theorem 6.1.14 and Theorem 6.2.42,
Using the technique in part (i) of Exercise 6.1.27, one sees that
At the same time, by (6.2.11),
and therefore
Thus, we find ourselves at the same place as we were when we started the second paragraph in the proof of Theorem 6.2.42; and therefore the same argument applies here. ∎
6.2.47 Exercise.
Let $\Sigma = \mathbb{R}^N$ and give $\mathbb{R}^N$ the standard EUCLIDean structure. Then the RIEMANNian measure is LEBESGUE'S measure and $\Delta$ is the standard EUCLIDean LAPLACE operator. Let $U \in C^\infty(\mathbb{R}^N;\mathbb{R})$ be a function which is bounded below and satisfies (6.2.7), and define $m \in M_1(\mathbb{R}^N)$ and $L$ on $C_c^\infty(\mathbb{R}^N;\mathbb{R})$ accordingly. Finally, let $\mathcal{E}$ be the corresponding DIRICHLET form described in Theorem 6.2.9, and define $V$ as in Theorem 6.2.21.
(i) It is interesting to see that, at least for the setting just described, Theorem 6.2.21 is quite sharp. To see this, suppose that there is an $r \in (0,\infty)$ and a sequence $\sigma_n \to \infty$ with the property that
$$\sup_{n \in \mathbb{Z}^+}\ \sup_{\tau \in B(\sigma_n, r)} V(\tau) < \infty,$$
where $B(\sigma, r)$ denotes the open EUCLIDean ball with center $\sigma$ and radius $r$. Choose $\psi \in C_c^\infty(B(0,r); [0,\infty))$ with
$$\int_{\mathbb{R}^N} \psi^2\,dx = 1,$$
and set $\phi_n = \exp(U/2)\,\psi_n$, where $\psi_n(\tau) = \psi(\tau + \sigma_n)$, $\tau \in \mathbb{R}^N$. Show that $\|\phi_n\|_{L^2(m)} = 1$ for all $n \in \mathbb{Z}^+$ and
$$\sup_{n \in \mathbb{Z}^+} \mathcal{E}(\phi_n, \phi_n) < \infty;$$
and conclude from this that the associated $J_{\mathcal{E}}$ cannot be good.
(ii) Assume that
where $\alpha \in (0,\infty)$ and $c_\alpha$ is chosen so that the normalization condition is satisfied. Show that $J_{\mathcal{E}}$ is good if and only if $\alpha \in (1,\infty)$ and that the associated semigroup $\{P_t : t > 0\}$ is hypercontractive if $\alpha \in [2,\infty)$. Finally, if $\alpha \in (1,2)$, show that (LS) fails and therefore that $\{P_t : t > 0\}$ is not hypercontractive. (Hint: Try test functions of the form $e^{\beta U}$ with $\beta \in (0, \tfrac{1}{2})$.)
(iii) The preceding result showed that the ORNSTEIN-UHLENBECK semigroup in Exercise 6.1.11 (i.e., the case when $\alpha = 2$) is at the borderline of hypercontractivity. By a remarkable coincidence, it turns out that Theorem 6.2.42 predicts the optimal hypercontractive result for this semigroup. To see this, check that in this case (B&E) holds with $\epsilon = \tfrac{1}{2}$ and therefore that
$$\|P_t\|_{L^p(m) \to L^q(m)} = 1 \quad\text{for } e^t \ge \frac{q-1}{p-1}.$$
Using the fact (cf. the last part of Example 6.1.11) that
and therefore that the predicted result is optimal. Actually, one can do even better. Namely, by considering the functions $\phi_\gamma(x) = \exp(\gamma|x|^2)$, one can show that for any $1 < p < q < \infty$ and $t \in (0,\infty)$ with $e^t < (q-1)/(p-1)$,
$$\|P_t\|_{L^p(m) \to L^q(m)} = \infty.$$
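The blow-up claim in part (iii) can be checked in closed form in one dimension. The sketch below is only an illustration under stated assumptions: it assumes that $m$ is the standard GAUSSian measure on $\mathbb{R}$ and that the semigroup has the Mehler form $[P_t f](x) = \int f\big(e^{-t/2}x + (1 - e^{-t})^{1/2}y\big)\,m(dy)$, a normalization which is not displayed in this section and is chosen to match the time scale $e^t$ used above. Under that assumption, $P_t$ maps $\exp(\gamma x^2)$ to a constant multiple of $\exp(\gamma_t x^2)$ with $\gamma_t = \gamma e^{-t}/(1 - 2\gamma(1 - e^{-t}))$. The function names are hypothetical.

```python
import math


def critical_time(p, q, eps=0.5):
    """Smallest t with exp(2*eps*t) >= (q-1)/(p-1), the threshold in (6.2.43)."""
    return math.log((q - 1.0) / (p - 1.0)) / (2.0 * eps)


def gauss_norm(gamma, p):
    """L^p norm of x -> exp(gamma*x**2) under the standard Gaussian on R."""
    if 2.0 * p * gamma >= 1.0:
        return math.inf
    return (1.0 - 2.0 * p * gamma) ** (-1.0 / (2.0 * p))


def norm_ratio(p, q, t, gamma):
    """||P_t f||_q / ||f||_p for f = exp(gamma*x**2), using the ASSUMED Mehler form."""
    if 2.0 * p * gamma >= 1.0:
        raise ValueError("f must lie in L^p: need 2*p*gamma < 1")
    a2 = math.exp(-t)                       # square of the contraction factor
    d = 1.0 - 2.0 * gamma * (1.0 - a2)      # positive because 2*gamma < 1/p < 1
    prefactor, gamma_t = d ** -0.5, gamma * a2 / d
    return prefactor * gauss_norm(gamma_t, q) / gauss_norm(gamma, p)
```

For $p = 2$, $q = 4$ the threshold is $t = \log 3$. Below it (for instance $t = 0.5$, $\gamma = 0.2$) the ratio is infinite, while for $t$ above the threshold every admissible $\gamma$ gives a ratio of at most 1, in agreement with (6.2.43).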
The facts contained in this exercise were first obtained by E. NELSON [79] and constitute the origins of all hypercontractivity considerations.
6.2.48 Exercise.
It is interesting to look at the BAKRY-EMERY argument when $\Sigma$ is compact; even though, in that case, we already know that (SU) holds and therefore that $\{P_t : t > 0\}$ is more than hypercontractive. In this exercise we outline the argument for the compact case and point out that the argument is not only simpler but also leads to a slightly sharper statement. Observe that the key to the simplification is hidden entirely in the fact that the space $C^\infty(\Sigma;\mathbb{R})$ is invariant under both $L$ and $\{P_t : t > 0\}$.
(i) Let $f \in C^\infty(\Sigma;\mathbb{R})$ be uniformly positive and set $H(t) = \langle f_t \log f_t \rangle_m$, where, once again, $f_t = [P_t f]$. First show that
where $\psi_t = \log f_t$, and second that
Now conclude that the condition
+
+
(e@[ ~ ~ H e s s ~ ~(Ric ~ ~ , HessU)(grad$,grad$)]) s.
(B&E’)
m
2 2t(e@lgrad$I2) m for 1c, E C”(C;W) implies (6.2.43).
(ii) The major advantage that (B&E') has over (B&E) is that it leaves open the possibility of applying it even when no point-wise estimate holds. For example, consider the case when $\Sigma$ is the flat $N$-torus (= $(\mathbb{R}/\mathbb{Z})^N$) and $U \equiv 0$. Then, since the RICCI curvature vanishes, the left hand side of (B&E') becomes
which is easily seen to dominate
where $(\theta_1, \ldots, \theta_N)$ is the standard coordinate system on $\Sigma$. Thus, in this case, (B&E') holds for all $N \in \mathbb{Z}^+$ with a given $\epsilon$ if it holds when $N = 1$ for that $\epsilon$. Therefore, assume that $N = 1$, and observe that when $h = e^{\psi/2}$ then the preceding dominates $4\|h''\|_{L^2(\lambda)}^2$, whereas the factor to be estimated on the right hand side of (B&E') becomes $4\|h'\|_{L^2(\lambda)}^2$. Use these observations to show that (B&E') holds with $\epsilon =$
i.
6.3 Hypoelliptic Diffusions on a Compact Manifold
In this section we will describe a particularly good situation to which the results in Section 4.2 apply and will attempt to give a more pleasing expression for the associated rate function, even when the process involved is not symmetric. The general setting in which we will be working is as follows. The space $\Sigma$ will be a connected, compact, $N$-dimensional differentiable manifold; and $\lambda$ will denote a fixed probability measure on $\Sigma$ which is "smooth" in the sense that, for any coordinate chart $(W, a)$, there is an $a \in C^\infty(W; (0,\infty))$ for which
In particular, for any $X \in \Gamma(T(\Sigma))$, there is a (unique) $g_X \in C^\infty(\Sigma;\mathbb{R})$ with the property that
where
(6.3.1) $X^*\psi = -X\psi + g_X\psi, \quad \psi \in C^\infty(\Sigma;\mathbb{R}).$
Now suppose that $X_1, \ldots, X_d$, and $Y$ are given elements of $\Gamma(T(\Sigma))$ and define the operator
The following theorem contains a few important facts about the diffusion determined by $L_Y$.
6.3.2 Theorem. Let $\Omega = C([0,\infty);\Sigma)$, $\omega \in \Omega \longmapsto \Sigma_t(\omega) \in \Sigma$, $t \in [0,\infty)$, and $\{\mathcal{B}_t : t > 0\}$ be as in Theorem 6.2.9. Then, for each $\sigma \in \Sigma$, there is a unique $P_\sigma^Y \in M_1(\Omega)$ for which
is a mean-zero martingale. In addition, $\{P_\sigma^Y : \sigma \in \Sigma\}$ is a FELLER continuous MARKOV family. Finally, let $\{P_t^Y : t > 0\}$ denote the associated MARKOV semigroup. Then, for each $\phi \in C^\infty(\Sigma;\mathbb{R})$, the function $(t,\sigma) \in [0,\infty) \times \Sigma \longmapsto [P_t^Y\phi](\sigma) \in \mathbb{R}$ is an element of $C^\infty([0,\infty) \times \Sigma;\mathbb{R})$ which satisfies
(6.3.3) $\dfrac{\partial u}{\partial t}(t,\sigma) = [L_Y u](t,\sigma), \quad (t,\sigma) \in [0,\infty) \times \Sigma$, with $u(0,\cdot) = \phi$;
$\lambda$ is $\{P_t^Y : t > 0\}$-invariant if and only if $g_Y = 0$; and $\lambda$ is $\{P_t^Y : t > 0\}$-reversing if and only if $Y = 0$.
PROOF: There are many ways in which one can prove each of these facts. For the sake of completeness, we will outline a proof which should be pleasing to the probabilists, if no one else. Without loss of generality, we assume that $\Sigma$ is an embedded submanifold of $\mathbb{R}^n$ for a suitably large $n \in \mathbb{Z}^+$ and that the vector fields $X_1, \ldots, X_d$, and $Y$ are the restrictions to $\Sigma$ of vector fields $\hat X_1, \ldots, \hat X_d$, and $\hat Y$ on $\mathbb{R}^n$ with coefficients in $C_b^\infty(\mathbb{R}^n;\mathbb{R})$ (i.e., bounded continuous derivatives of all orders). At the same time, we think of each of the functions $g_{X_k}$ as the restriction to $\Sigma$ of some $\hat g_{X_k} \in C^\infty(\mathbb{R}^n;\mathbb{R})$, and then set $\hat X_k^* = -\hat X_k + \hat g_{X_k}$. Hence, if $\hat\Omega = C([0,\infty);\mathbb{R}^n)$, then one can use ITÔ'S theory of stochastic integral equations to construct a FELLER-continuous, MARKOV family $\{\hat P_x : x \in \mathbb{R}^n\} \subseteq M_1(\hat\Omega)$ with the property that, for every $x \in \mathbb{R}^n$,
is a mean-zero martingale for every $\hat\phi \in C_b^\infty(\mathbb{R}^n;\mathbb{R})$, where $\hat\omega \in \hat\Omega \longmapsto \hat\Sigma_t(\hat\omega) \in \mathbb{R}^n$ and $\hat{\mathcal{B}}_t$ are defined by analogy to their "unhatted" counterparts, and
$$\hat L = -\sum_{k=1}^d \hat X_k^* \hat X_k + \hat Y.$$
In fact, one knows that it is possible to differentiate the solution to ITÔ'S equations as a function of the starting point $x$. As a consequence, one finds first that the associated semigroup $\{\hat P_t : t > 0\}$ maps $C_b^\infty(\mathbb{R}^n;\mathbb{R})$ into itself and then that $(t,x) \in [0,\infty) \times \Sigma \longmapsto [\hat P_t\hat\phi](x) \in \mathbb{R}$ is a smooth solution to
$$\frac{\partial \hat u}{\partial t}(t,x) = [\hat L\hat u](t,x), \quad (t,x) \in [0,\infty) \times \Sigma, \text{ with } \hat u(0,\cdot) = \hat\phi$$
for each $\hat\phi \in C_b^\infty(\mathbb{R}^n;\mathbb{R})$. Finally, if $x = \sigma \in \Sigma$, then one can easily show that $\hat P_\sigma(\Omega) = 1$; and so we get all the required existence results by simply taking $P_\sigma^Y = \hat P_\sigma \restriction \mathcal{B}_\Omega$, $\sigma \in \Sigma$. Furthermore, the asserted uniqueness statement follows easily (cf. Theorem 6.3.2 in [104]) from the fact that we now also know how to find a smooth solution to (6.3.3) for every smooth $\phi$; namely, one simply chooses $\hat\phi \in C_b^\infty(\mathbb{R}^n;\mathbb{R})$ so that $\hat\phi \restriction \Sigma = \phi$ and then takes $u(t,\sigma) = [\hat P_t\hat\phi](\sigma)$.
To complete the proof, let $\phi, \psi \in C^\infty(\Sigma;\mathbb{R})$ be given and note that, for any $T \in (0,\infty)$,
for $t \in [0,T]$. Hence, with $\psi = 1$, we see that $\lambda$ is $\{P_t^Y : t > 0\}$-invariant if and only if $g_Y = 0$. At the same time, if $Y = 0$, then
whereas, if $\lambda$ is $\{P_t^Y : t > 0\}$-reversing, then $(Y\psi, \phi)_{L^2(\lambda)} = 0$. ∎
6.3.4 Remark.
Note that if $U \in C^\infty(\Sigma;\mathbb{R})$ and $Y^U \in \Gamma(T(\Sigma))$ and $m_U \in M_1(\Sigma)$ are defined by
where $Z_U = \int_\Sigma e^{-U}\,d\lambda$, then $m_U$ is $\{P_t^Y : t > 0\}$-reversing if and only if $Y = Y^U$. Indeed, for any $X \in \Gamma(T(\Sigma))$, one can easily check that
from which it is clear that the reasoning used to prove the last part of the preceding theorem applies with $m_U$ replacing $\lambda$ and $X_k^* + (X_k U)$ replacing $X_k^*$.
As yet we have not made any assumptions which would guarantee the sort of conditions required to make the results in Section 4.2 applicable. For this reason, we will now add the following hypothesis:
(H) $\mathrm{Lie}(X_1, \ldots, X_d) = T(\Sigma),$
where $\mathrm{Lie}(X_1, \ldots, X_d)$ denotes the LIE subalgebra of $\Gamma(T(\Sigma))$ generated by $\{X_1, \ldots, X_d\}$ and the equality means that, at each $\sigma \in \Sigma$,
$$\{X(\sigma) : X \in \mathrm{Lie}(X_1, \ldots, X_d)\} = T_\sigma(\Sigma).$$
According to HÖRMANDER'S famous theorem (see [63]), the hypothesis (H) is more than enough to guarantee that, for any $Y \in \Gamma(T(\Sigma))$, the operator
$$\frac{\partial}{\partial t} + L_Y$$
is "hypoelliptic." In particular, this means that
$$P^Y(t,\sigma,d\tau) = p^Y(t,\sigma,\tau)\,\lambda(d\tau),$$
where the function $p^Y$ is a non-negative element of $C^\infty((0,\infty) \times \Sigma \times \Sigma;\mathbb{R})$. In addition, (H) is sufficient to guarantee that $p^Y$ must be everywhere strictly positive. To see this, one can either invoke BONY'S strong maximum principle (see [13]) or one can use the "support theorem" in [103]. Thus, with (H), we have more than enough information to see that not only does $(\tilde{\mathbf{U}})$ hold but even that, for every $t \in (0,\infty)$, the condition
(6.3.6) $\dfrac{1}{M_t}\lambda \le P^Y(t,\sigma,\cdot) \le M_t\lambda, \quad \sigma \in \Sigma,$
holds for some $M_t \in [1,\infty)$. In view of the preceding, we now know that (H) allows us to apply the results of Section 4.2, and the following lemma summarizes what we can say immediately on the basis of those results.
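6.3.7 Lemma. Assume that (H) holds, and define $\omega \in \Omega \longmapsto L_t(\omega) \in M_1(\Sigma)$, $t \in (0,\infty)$, as in Remark 4.2.2. Then, for every $\Gamma \in \mathcal{B}_{M_1(\Sigma)}$,
where
(6.3.9) $J^Y(\nu) = \sup\left\{ -\displaystyle\int_\Sigma \frac{L_Y u}{u}\,d\nu : u \in C^\infty(\Sigma; [1,\infty)) \right\}, \quad \nu \in M_1(\Sigma).$
Moreover, if $\mathcal{E}$ denotes the DIRICHLET form corresponding to $(t,\sigma) \in (0,\infty) \times \Sigma \longmapsto P^0(t,\sigma,\cdot) \in M_1(\Sigma)$ and $\lambda$, then
(6.3.10) $J^0(\nu) = J_{\mathcal{E}}(\nu) = \begin{cases} \mathcal{E}(f^{1/2}, f^{1/2}) & \text{if } d\nu = f\,d\lambda \\ \infty & \text{otherwise,} \end{cases}$
where $J^0 \equiv J^Y$ with $Y = 0$.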
PROOF: Let $\bar L$ be the operator defined in the discussion preceding Lemma 4.2.31 and define $D$ as in Lemma 4.2.35. In view of Theorem 4.2.43, the first assertion will be proved once we note that $D \supseteq C^\infty(\Sigma;\mathbb{R})$, $\bar L u = L_Y u$ for $u \in C^\infty(\Sigma;\mathbb{R})$, and that, for every $u \in D$ there is a sequence $\{u_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ such that $(u_n, L_Y u_n) \to (u, \bar L u)$ uniformly as $n \to \infty$. Clearly the only one of these needing comment is the last. But, for every $u \in D$, $u_n \equiv [P_{1/n}^Y u] \in C^\infty(\Sigma;\mathbb{R})$ and $L_Y u_n = [P_{1/n}^Y \bar L u]$. Finally, since $(\tilde{\mathbf{U}})$ holds, the second assertion is an immediate consequence of Theorem 4.2.58. ∎
6.3.11 Remark.
In connection with Remark 6.3.4, one should notice that the last part of Lemma 6.3.7 can be immediately modified to say that $J^Y = J_{\mathcal{E}^U}$ when $Y$ is the $Y^U$ in (6.3.5) and $\mathcal{E}^U$ is the DIRICHLET form associated with the corresponding symmetric MARKOV semigroup on $L^2(m_U)$.
Our main goal in the rest of this section will be to obtain a better expression for the rate function $J^Y$, even in cases when Remark 6.3.11 does not apply. In particular, what we are seeking is an expression in which one can clearly see the distinct contributions made to $J^Y$ by the "symmetrizable" and "non-symmetrizable" parts of $L_Y$. In order to carry out our program, it will be useful to introduce the following notions. In the first place, for $\phi \in C^\infty(\Sigma;\mathbb{R})$ define $X\phi \in C^\infty(\Sigma;\mathbb{R}^d)$ by
$$X\phi = \begin{bmatrix} X_1\phi \\ \vdots \\ X_d\phi \end{bmatrix}.$$
Next, for $p \in [1,\infty)$, define $W_p^{(1)}(X,\lambda)$ to be the space of $\phi \in L^p(\lambda)$ for which there exists a sequence $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ with the properties that
as $m \to \infty$.
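6.3.13 Lemma. For any $p \in [1,\infty)$, there is a unique continuous linear mapping
$$\bar X^{(p)} : W_p^{(1)}(X,\lambda) \longrightarrow L^p(\lambda;\mathbb{R}^d)$$
for which $\bar X^{(p)}\phi = X\phi$ whenever $\phi \in C^\infty(\Sigma;\mathbb{R})$. In fact, $\bar X^{(p)}\phi$ is the unique element of $L^p(\lambda;\mathbb{R}^d)$ with the property that
and therefore, $\bar X^{(p)}\phi = \bar X^{(q)}\phi$ $\lambda$-almost everywhere when $\phi \in W_p^{(1)}(X,\lambda) \cap W_q^{(1)}(X,\lambda)$. Moreover, if $\eta \in C^1(\mathbb{R};\mathbb{R})$ and $\phi$ is an element of $W_p^{(1)}(X,\lambda)$ which satisfies $\eta \circ \phi \in L^q(\lambda)$ and $(\eta' \circ \phi)\bar X^{(p)}\phi \in L^q(\lambda;\mathbb{R}^d)$ for some $q \in [1,\infty)$, then $\eta \circ \phi \in W_q^{(1)}(X,\lambda)$ and
$$\bar X^{(q)}(\eta \circ \phi) = (\eta' \circ \phi)\,\bar X^{(p)}\phi.$$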
PROOF: We first note that, for any $\phi \in L^p(\lambda)$, there is at most one $\Phi \in L^p(\lambda;\mathbb{R}^d)$ with the property that
(6.3.14)
for every $\Psi \in C^\infty(\Sigma;\mathbb{R}^d)$. Second, we observe that if $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ satisfies (6.3.12), then $X\phi_n$ converges in $L^p(\lambda;\mathbb{R}^d)$ to a $\Phi \in L^p(\lambda;\mathbb{R}^d)$ for which (6.3.14) holds. Thus, both the existence and uniqueness statements follow immediately, and all the other statements are easy applications of these. ∎
Because the program which we have in mind rests on $L_Y$ being a compact perturbation of $L_0$, we will have to assume that
(6.3.15) $Y = \displaystyle\sum_{k=1}^d a_k X_k$ for some $\{a_k\}_1^d \subseteq C^\infty(\Sigma;\mathbb{R}).$
The importance of (6.3.15) is already apparent in the next result.
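6.3.16 Lemma. Assume that (H) holds. Then $W_2^{(1)}(X,\lambda) = \mathrm{Dom}(\mathcal{E})$ and
$$\mathcal{E}(\phi,\phi) = \|\bar X^{(2)}\phi\|_{L^2(\lambda;\mathbb{R}^d)}^2 \quad\text{for } \phi \in \mathrm{Dom}(\mathcal{E}).$$
If, in addition, $Y$ is given by (6.3.15), then $J^Y(\nu) < \infty$ if and only if $d\nu = f\,d\lambda$, where $f$ is non-negative and $f^{1/2} \in W_2^{(1)}(X,\lambda)$.
PROOF: To prove the first part, note that
for $\phi \in C^\infty(\Sigma;\mathbb{R})$. Thus, since $\phi \in \mathrm{Dom}(\mathcal{E})$ and
$$\mathcal{E}(\phi,\phi) = \lim_{n\to\infty} \mathcal{E}(\phi_n,\phi_n) \quad\text{if } \phi = \lim_{n\to\infty} \phi_n \text{ in } L^2(\lambda)$$
when $\{\phi_n\}_1^\infty \subseteq \mathrm{Dom}(\mathcal{E})$ satisfies
we see that $W_2^{(1)}(X,\lambda) \subseteq \mathrm{Dom}(\mathcal{E})$ and that
$$\mathcal{E}(\phi,\phi) = \|\bar X^{(2)}\phi\|_{L^2(\lambda;\mathbb{R}^d)}^2$$
for $\phi \in W_2^{(1)}(X,\lambda)$. To prove the opposite inclusion, let $\phi \in \mathrm{Dom}(\mathcal{E})$ be given and set $\phi_n = [P_{1/n}^0\phi]$, $n \in \mathbb{Z}^+$. Then, because of (H), $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$, and clearly $\phi_n \to \phi$. At the same time, by the Spectral Theorem,
as $m \to \infty$.
Turning to the second part, note that (cf. Theorem 4.2.58) there is nothing to do when $Y = 0$. On the other hand, if $Y$ is given by (6.3.15), then, after writing $u \in C^\infty(\Sigma; [1,\infty))$ as $e^{-\phi}$, we see that
Hence, if we take
(6.3.18) $A = \begin{bmatrix} a_1 \\ \vdots \\ a_d \end{bmatrix},$
then we find that
By reversing the preceding argument, we also find that
and so we now see that $J^Y(\nu) < \infty$ if and only if $J^0(\nu) < \infty$.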
In order to complete our program, let $Y$ be given by (6.3.15), define $A$ as in (6.3.18); and, for $\nu \in M_1(\Sigma)$, define $A_\nu$ to be the orthogonal projection in $L^2(\nu;\mathbb{R}^d)$ of $A$ onto
$$\overline{\{X\phi : \phi \in C^\infty(\Sigma;\mathbb{R})\}}^{L^2(\nu;\mathbb{R}^d)},$$
and set
$$P(A;\nu) = \|A_\nu\|_{L^2(\nu;\mathbb{R}^d)}^2.$$
Since
it is clear that $\nu \longmapsto P(A;\nu)$ is lower semi-continuous and convex.
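To get a feeling for the quantity $P(A;\nu)$, it may help to look at the simplest case, which is not treated in the text and is offered here only as a hedged illustration: take $\Sigma = \mathbb{R}/\mathbb{Z}$, $d = 1$, $X = d/d\theta$, a constant coefficient $A \equiv a$, and $d\nu = f\,d\theta$ with $f$ bounded above and away from 0. Under these assumptions the closure of $\{\phi'\}$ in $L^2(\nu)$ consists of the functions with vanishing $d\theta$-integral, the projection is $A_\nu = a - c/f$ with $c = a / \int (1/f)\,d\theta$, and a short computation gives $P(A;\nu) = a^2\big(1 - 1/\int (1/f)\,d\theta\big)$. The sketch below evaluates that closed form by a Riemann sum; the function names and the sample density are hypothetical.

```python
import math


def projection_energy(a, density, n=2000):
    """P(A; nu) on the circle R/Z for X = d/dtheta, A = a constant, d(nu) = f d(theta).

    Uses the closed form a**2 * (1 - 1 / integral(1/f)), which assumes that f is
    bounded above and away from 0 and integrates to 1 over [0, 1).
    """
    inv_mean = sum(1.0 / density(k / n) for k in range(n)) / n  # Riemann sum of 1/f
    return a * a * (1.0 - 1.0 / inv_mean)


def tilted(theta, b=0.5):
    """Sample probability density 1 + b*cos(2*pi*theta) on [0, 1), with |b| < 1."""
    return 1.0 + b * math.cos(2.0 * math.pi * theta)
```

For the uniform density the result is 0, reflecting the fact that a constant is orthogonal to every derivative in $L^2(d\theta)$. For the tilted density, $\int (1/f)\,d\theta = (1 - b^2)^{-1/2}$, so the value is $a^2\big(1 - \sqrt{1 - b^2}\big)$, which is strictly positive.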
6.3.19 Theorem. Assume that (H) holds and that $Y$ is given by (6.3.15). Then
(6.3.20) $J^Y(\nu) = J_{\mathcal{E}}(\nu) + \dfrac{1}{4}P(A;\nu) + \dfrac{1}{2}\displaystyle\int_\Sigma R_Y\,d\nu,$
where $A$ is defined as in (6.3.18) and
$$R_Y = -\sum_{k=1}^d X_k^* a_k.$$
PROOF: In view of Lemma 6.3.16, we need only consider $\nu \in M_1(\Sigma)$ for which $d\nu = f\,d\lambda$ for some non-negative $f$ with $f^{1/2} \in W_2^{(1)}(X,\lambda)$. In addition, since both sides of (6.3.20) are lower semi-continuous and convex, we may and will assume that $f \ge \epsilon$ for some $\epsilon > 0$. (Otherwise, set $\nu_\epsilon = (1-\epsilon)\nu + \epsilon\lambda$ and let $\epsilon \searrow 0$.)
We begin by proving that
(6.3.21) $J^Y(\nu) = J_{\mathcal{E}}(\nu) + \dfrac{1}{4}P(A;\nu) - \left(A_\nu, \dfrac{\bar X^{(2)} f^{1/2}}{f^{1/2}}\right)_{L^2(\nu;\mathbb{R}^d)}.$
To this end, choose $\{\phi_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ so that
and set $\Phi_n = X\phi_n - \tfrac{1}{2}A_\nu$. Then (cf. (6.3.17)) $J^Y(\nu)$ equals
and, for any given $\psi \in C^\infty(\Sigma;\mathbb{R})$,
At the same time,
-(2)
Hence, since x + -
1/2
E L2 ( u ; R d ) ,
where
5
c i/\X+I\L2(u;Rd)7
"9
for some $C \in (0,\infty)$ depending only on $A$ and $f$. Clearly, by using (6.3.17) with $Y = 0$ to compute $J_{\mathcal{E}}(\nu)$, one can easily use the preceding to get (6.3.21).
To prove (6.3.20) from (6.3.21), all that we have to do is check that
and this comes down to showing that there exists a sequence $\{g_n\}_1^\infty \subseteq C^\infty(\Sigma;\mathbb{R})$ such that
For this purpose, choose $\{u_n\}_1^\infty \subseteq C^\infty(\Sigma; [1,\infty))$ so that
and set $g_n = \log u_n$. One then has that
We can now complete the proof by simply noting that
6.3.22 Exercise. Let $X_1, \ldots, X_d$, and $Y$ be smooth vector fields on the connected, compact manifold $\Sigma$; and assume that the $X_k$'s satisfy (H). Next, set $\hat\Sigma = \Sigma \times \mathbb{R}$ and define the vector fields $\hat X_1, \ldots, \hat X_d$, and $\hat Y$ on $\hat\Sigma$ by
and
for $\phi \in C^\infty(\hat\Sigma;\mathbb{R})$, where $b_1, \ldots, b_d$, and $c$ are given elements of $C^\infty(\Sigma;\mathbb{R})$. Finally, define
$$\hat L = \sum_{k=1}^d \hat X_k^2 + \hat Y \quad\text{on } C^\infty(\hat\Sigma;\mathbb{R}).$$
One can then show that $\hat L$ determines a (unique) FELLER-continuous, MARKOV family $\{\hat P_{\hat\sigma} : \hat\sigma \in \hat\Sigma\}$ of probability measures on $\hat\Omega = C([0,\infty);\hat\Sigma)$ with the property that
is a mean-zero martingale for every $\hat\sigma \in \hat\Sigma$ and all $\phi \in C^\infty(\hat\Sigma;\mathbb{R})$. In fact, as aficionados of stochastic differential equations will easily verify, if $\hat\sigma = (\sigma,\xi) \in \Sigma \times \mathbb{R}$, then $\hat P_{\hat\sigma}$ is the joint distribution under $d$-dimensional WIENER'S measure $\mathcal{W}$ of the solution to
together with
(In both of these expressions, the stochastic integrals are taken in the sense of STRATONOVICH.)
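(i) Write
$$\hat\Sigma_t(\hat\omega) = \big(\Sigma_t(\hat\omega), S_t(\hat\omega)\big) \in \Sigma \times \mathbb{R},$$
show that the hypotheses at the beginning of Section 4.2 are met by the measures $P_\sigma \equiv \hat P_{(\sigma,0)}$, $\sigma \in \Sigma$, and check that the condition $(\tilde{\mathbf{U}})$ is satisfied. Conclude that, for every $\beta \in \mathbb{R}$, the limit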
(this would have been denoted by ha(@)in Section 4.2) exists uniformly under {P,, : in u E C and that the the large deviations of s,(w)= u E C} are uniformly governed by the good rate function I given by
$$I(\alpha) = \sup\{\alpha\beta - \Lambda(\beta) : \beta \in \mathbb{R}\}, \quad \alpha \in \mathbb{R}.$$
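(ii) In order to get a handle on $\Lambda(\beta)$, define
and show that $w_\beta$ is the smooth solution to
$$\frac{\partial w_\beta}{\partial t} = [L_\beta w_\beta] + Q_\beta w_\beta \quad\text{on } [0,\infty) \times \Sigma \text{ with } w_\beta(0,\cdot) = 1,$$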
k=l
k=l
and k=l
Hint: Consider the function G g ( t , 6 ) = e g E w p ( t , u ) , t E [ O , o o ) and 6 = (a,<)E C x R,
and apply the FEYNMAN-KAC formula to see that aGg at
-(t,
6)= [LGp]( t ,6)+ @[Gp(t,6).
Now apply Theorem 4.2.43 to the MARKOV process determined by L p and conclude that
(iii) Finally, we add the assumption that
introduce a smooth probability measure $\lambda$ on $\Sigma$, and, using $X^*$ to denote the $\lambda$-adjoint of $X$, define $g_X$ by (6.3.1). Applying Theorem 6.3.19, check that
$$J^\beta(\nu) = J_{\mathcal{E}}(\nu) + \frac{1}{4}P(A^\beta;\nu) + \frac{1}{2}\int_\Sigma R_\beta\,d\nu,$$
where E is the DIRICHLET form obtained by closing
f#J
E C"(C; R)
-c J k=l
()'&
'
in L 2 ( X ) ,
and
After combining the preceding with part (ii), show that
where
B=
["1 bd
and Xf#J =
[
x'"]
.
Xd#
A more complete discussion of these, and related, matters can be found in [102] and [9].
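Part (i) of Exercise 6.3.22 expresses the rate function as the LEGENDRE transform $I(\alpha) = \sup\{\alpha\beta - \Lambda(\beta) : \beta \in \mathbb{R}\}$. When $\Lambda$ is only available numerically, the supremum can be approximated on a finite grid of $\beta$ values. The following is a minimal sketch of that approximation; the quadratic $\Lambda(\beta) = \beta^2/2$ used to exercise it is a hypothetical stand-in (its transform is $\alpha^2/2$) and is not the $\Lambda$ of the exercise.

```python
def legendre_transform(cumulant, alpha, betas):
    """Approximate sup over beta of alpha*beta - cumulant(beta) on a finite grid."""
    return max(alpha * b - cumulant(b) for b in betas)


def quadratic_cumulant(beta):
    """Hypothetical stand-in for Lambda: beta**2 / 2, whose transform is alpha**2 / 2."""
    return 0.5 * beta * beta


# Grid on [-10, 10] with step 0.001; it must contain the maximizing beta
# for the approximation to be accurate.
grid = [k / 1000.0 for k in range(-10000, 10001)]
rate_at_1_5 = legendre_transform(quadratic_cumulant, 1.5, grid)
```

Because the grid is finite, the result is always a lower bound for the true supremum, and it is reliable only when the maximizer lies well inside the grid.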
Historical Notes and References
DISCLAIMER
The authors make no claims for the completeness and few claims for the accuracy or value of what follows. In other words, we are doing no more than perpetuating the mathematical tradition of covering up the tracks which we and others may or may not have made. For example, we are aware that there is a huge body of excellent work in the statistics literature on the subject of large deviations and that we have done little more than acknowledge the statisticians' just claim of paternity. In addition, we have given rather short shrift to some very beautiful mathematics in which large deviation theory is employed to find the asymptotics of heat kernels. In particular, we have essentially ignored the important contributions of MOLCHANOV [77], AZENCOTT [1], BISMUT [10], and their students to this topic. Thus, the interested reader should consult VENTCEL and FREIDLIN'S book [111] for applications to dynamical systems. In another direction, ELLIS'S book [39] describes connections with statistical mechanics.
CHAPTER I
§1.1 & §1.2 Historically, the theory of large deviations emerged as an attempt to carry the well known CENTRAL LIMIT THEOREM one step further. To be more precise, suppose that $\mu \in M_1(\mathbb{R})$ has mean 0 and variance 1. Then the classical CENTRAL LIMIT THEOREM says that, for each $x \in \mathbb{R}$,
converges as $n \to \infty$ to
The problem in which people became interested was that of determining just how fast the tails of the approximants were approaching those of the GAUSSian. As early as 1928 KHINCHIN [69], followed by SMIRNOFF [98], studied this question by replacing the fixed $x$ by a function $x(n)$ which tends to $\infty$ as $n \to \infty$. By restricting their attention to BERNOULLI random variables, they were able to give a very precise answer in the case when $\lim_{n\to\infty}$ $= 0$. Although KHINCHIN said that he was studying "große Abweichungen," from the standpoint of the theory presented in this book the deviations with which he was dealing would have to be considered "moderate deviations." For related work and other references, see LINNIK [75], RICHTER [90] and PETROV [87].
The archetype for large deviation results of the sort dealt with in this book is the one which we have called CRAMÉR'S Theorem. In [20] CRAMÉR proved this theorem for distributions $\mu$ on $\mathbb{R}$ which are not singular to LEBESGUE'S measure. From the viewpoint of later developments, the most significant idea introduced by CRAMÉR was that of transforming the given measure (apparently the transformation itself antedates CRAMÉR'S use of it and goes back to ESSCHER [43]). As distinguished from the use to which we put it in the proof of (1.2.7), CRAMÉR uses the transformation as an initial step in a program which eventually enabled him to bring to bear refined estimates about the CENTRAL LIMIT THEOREM. The first proof of the general statement in Theorem 1.2.6 is due to CHERNOFF [17]. CHERNOFF uses CHEBYSHEV'S inequality (in exactly the same way as we) to obtain the upper bound, but he gets the lower bound via approximation by discrete distributions and a clever application of STIRLING'S formula. CHERNOFF'S motivation came from statistics. In particular, he was interested in questions about the asymptotic efficiency of statistical tests and initiated a program which has been carried further by several statisticians: BAHADUR [4] and [5], BAHADUR and RANGA RAO [6], BARNDORFF-NIELSEN [8], and DACUNHA-CASTELLE [22].
§1.3 & §1.4 Theorem 1.3.27, which appeared in SCHILDER'S thesis [95], is the first example of a large deviation result for measures on a function space. At the time, SCHILDER was a student of M. DONSKER, and it seems clear from DONSKER'S earlier work that the idea for such a result should be credited to him. Be that as it may, what we have called SCHILDER'S Theorem contains only the first step of a program in which it was envisioned that function-space integral techniques could be used to provide an entire asymptotic expansion for the quantities under consideration. Although SCHILDER'S thesis contains the first examples of this line of reasoning, the real breakthrough came in the article of VARADHAN [106] where the foundations were laid for large deviation theory as we have presented it here. It seems that BOROVKOV [14] should also be cited as one of the first to study large deviation theory in a function space context, although his work does not appear to have had a great deal of influence even in Russia. Also, slightly later and apparently independently, VENTCEL and FREIDLIN in [109], [110], and [111] started to use essentially the same function-space integral ideas to analyze randomly perturbed dynamical systems. Our brief presentation of their estimates as an application of SCHILDER'S Theorem is based on the ideas of AZENCOTT [1] who is also responsible for much of the recent progress toward the completion of SCHILDER'S program.
CHAPTER II
§2.1 The formulation given for the principle of large deviations as well as the LAPLACE asymptotic result contained in Theorem 2.1.10 appear for the first time in VARADHAN'S pioneering work [106]. Lemma 2.1.4 is simply an abstraction of ideas which had already been used by several authors, in particular, by AZENCOTT in his treatment of the VENTCEL-FREIDLIN estimates. Part ii) of Exercise 2.1.13 stems from a problem posed by G. STEIN (and answered independently, with entirely different methods, by E.M. STEIN); projective limits of large deviation principles (cf. Exercise 2.1.21) play a prominent role in DAWSON and GÄRTNER [24]; and Exercise 2.1.24 is an adaptation of a technique used extensively by ELLIS in [39].
§2.2 The basic relation between large deviations and the LEGENDRE transform of the logarithmic moment generating function is already present in CRAMÉR'S and CHERNOFF'S papers; and the role that convex analysis has to play in the theory became increasingly evident in the work of several authors (cf. especially J. GÄRTNER [53] and also the comments below on §3.1). The systematic use of the LEGENDRE transformation as a tool for identifying rate functions is an underlying principle in the second half of [101].
CHAPTER III
The general outline of this chapter is taken from AZENCOTT'S excellent treatment in [1].
§3.1 The use of sub-additivity to show that limits like those in 3.1.4 exist was introduced by RUELLE in [92] and [93] and systematically exploited by LANFORD in [73]. At the time, they were dealing with the problem of thermodynamical limits and the characterization of specific entropy in terms of GIBBS' variational principle. The first authors to apply this technique specifically to large deviation theory were BAHADUR and ZABELL [7] who, after significantly generalizing the idea, used it to derive both SANOV'S Theorem and the BANACH space case of CRAMÉR'S Theorem (cf. Theorems 3.2.17 and 3.3.11).
§3.2 SANOV'S elegant result was at first so surprising that several authors expressed doubts about its veracity! The theorem, which SANOV proved only for empirical distributions of $\mathbb{R}$-valued random variables, was extended by many authors: HOADLEY [60], HOEFFDING [61], and at last achieved the form stated here in DONSKER and VARADHAN [33]. The first statement of SANOV'S Theorem relative to the strong topology (cf. Theorem 3.2.21) appeared in GROENEBOOM, OOSTERHOFF and RUYMGAART [55]. The relative entropy function had been introduced into statistics by KULLBACK and LEIBLER [71], and its properties were investigated by CSISZÁR [21]. Lemma 3.2.13 (which is implicit in the cited work by RUELLE and LANFORD on GIBBS' variational principle) owes its present form to DONSKER and VARADHAN [30]. Finally, the estimate (3.2.25) is taken from KEMPERMAN [68].
§3.3 & §3.4 The large deviation results as they are stated in these two sections were first obtained by DONSKER and VARADHAN [33]. However, Theorem 3.4.5 was proved earlier by FREIDLIN [49] and VENTCEL [108] for both HILBERT space as well as $C([0,1])$. The important idea of obtaining the required exponential tightness in the BANACH space setting from the corresponding result for the empirical distributions (cf. Lemma 3.3.10) is due to DONSKER and VARADHAN but was, to some extent, anticipated by RANGA RAO [89] in his elegant proof of the Strong Law (cf. Theorem 3.3.4). The applications to GAUSSian measures (in particular, Corollary 3.4.6) given in Section 3.4 are again basically due to DONSKER and VARADHAN, although the outline of our treatment follows that of AZENCOTT [1] and Corollary 3.4.6 itself should be viewed as the culmination of the program initiated by LANDAU and SHEPP [72]. Exercise 3.3.12 is taken from FÖLLMER [47], and more recent results about BANACH spaces can be found in BOLTHAUSEN [11].
CHAPTER IV
Credit for the theory of large deviations of the occupation time functional for a MARKOV process should unquestionably go to DONSKER and VARADHAN; even though similar ideas and results were formulated and proved a little later by GÄRTNER [53]. The moving force behind DONSKER and VARADHAN'S investigation was DONSKER'S stubborn conviction that something deep must underlie KAC'S formula [66] for the smallest eigenvalue of a SCHRÖDINGER operator; and it is this force which led first to [29] and eventually to [30], [31], [33], [34], [36], and [107].
§4.1 & §4.2 The results contained herein are, more or less, the same as those in [30]. However, in addition to some technical improvements in the hypotheses under which they worked, our presentation is entirely different from DONSKER and VARADHAN'S. Indeed, their derivation is much more a direct application of the principles underlying the proof in Section 1.2 of CRAMÉR'S theorem, whereas ours (which is a slight embellishment of the one adopted in [101]) is an immediate descendant of BAHADUR and ZABELL'S approach to CRAMÉR'S Theorem. Moreover, our procedure for identifying the rate function is quite different from DONSKER and VARADHAN'S and was influenced by the heuristic exposition given in [67] by KAC. Finally, the possibility of working in the strong topology as well as that of dispensing with FELLER continuity were first considered, in this setting, by BOLTHAUSEN [12]. The extent to which these results extend to general irreducible MARKOV chains has been investigated by various authors: DE ACOSTA [25] and [26], N.C. JAIN [64] and [65], NEY [80], and NEY and NUMMELIN [81].
§4.3 Apart from minor changes in the ingredients, the recipe which we have used to cook the WIENER sausage is the same as the original one in [31] and [32]. In [35], DONSKER and VARADHAN applied the same ideas to random walks; and recently SZNITMAN [99] has carried out the analogous computation for hyperbolic spaces.
§4.4 DONSKER and VARADHAN in [36] were the first to formulate and prove the large deviation principle at the level of processes. Aside from the intrinsic aesthetic appeal of this formulation (in particular, the reappearance of entropy as the rate function), it was only after introducing this formulation that they were able to solve the so-called Polaron problem in [37]. It should be remarked that some of the arguments in this section are intimately related to similar results in information theory. In particular, the lower bound at the process level can be viewed as an application of SHANNON-MCMILLAN technology; and this is the way it was developed first in MOY [78] and later in FÖLLMER [46] and [47], and OREY [83] and [84]. However, direct application of SHANNON-MCMILLAN ideas does not prove the lower bound except at ergodic points; and there is work to be done before one can handle the general case. In an attempt to get away from processes and to handle random fields, several authors have carried out a version of the DONSKER and VARADHAN program in connection with GIBBS measures for lattice systems (cf. COMETS [19], FÖLLMER and OREY [48] and OLLA [82]). ELLIS was the first to suggest that the process level result ought to be obtainable from a multi-dimensional position level result, and he carried out such a program first for independent variables (cf. [39]) and, more recently, for MARKOV chains (cf. [40] and [41]). Our own treatment in Section 4.4 is based on the same idea.
CHAPTERV 55.1 This section basically reproduces the first part of Chapter 8 of The criterion in EXERCISE 5.1.17 iii) already appeared in [33].
[loll.
55.2 So far as we know, the first person to see the Maximal Ergodic Inequality as an easy corollary of the Sunrise Lemma was P. HARTMAN [59]. The history of the Ergodic Decompostion Theorem starts with the paper [70] by KRYLOVand BOGOLIOUBOFF. Their results were re-worked by OXTOBYin [86],and it is on OXTOBY’S ideas that our own proof is based. 55.3 This section is taken from the second part of Chapter 8 of [ l o l l . The motivation here is to provide conditions which have a chance of holding even in an infinite dimensional context. See [62] for an example of this sort.
55.4 & 55.5 These two sections are based on CHIYONOBU and KUSUOKA [18]. Our proof of the upper bound of Theorem 5.4.27 takes into account the results of 5.1 but otherwise differs very little from theirs. On the other
Large Deviations
290
hand, our proof of the lower bound is based on the methods of Section 3.1 and, as such, is quite different from theirs. The hypermixing property was formulated and used (in the context of constructive quantum field theory) by GUERRA, ROSENand SIMON[57]. It is clear that the notion is intimately related to NELSON’Sideas about hypercontractive semigroups (cf. [79] and the discussion for Chapter VI below). Lemma 5.5.9 is due to SIMON[96], and Corollary 5.5.16 was first derived in [57]. Other references to large deviations for non-MARKOV processes are: DONSKERand VARADHAN [38] for GAussian processes and OREY [84], OREYand PELIKAN [85],and TAKAHASHI [lo51 for dynamical systems.
CHAPTER VI
§6.1 This section is an expanded version of the contents of Chapter 9 of [101]. Hypercontractivity was introduced by NELSON [79] in connection with his construction of a two-dimensional quantum field, where he proved it for the ORNSTEIN-UHLENBECK semigroup and used it in the form that it appears in (6.1.20). A precursor of the logarithmic Sobolev inequality can be found in the article [44] by FEDERBUSH, but its systematic exploitation appears for the first time in GROSS’S [56], and [101] may be the first place where it is stated in full generality. The second part of LEMMA 6.1.5 is due to GLIMM [54].
§6.2 The key to our handling of diffusions on a non-compact RIEMANNian manifold is contained in Lemma 6.2.12 which, in turn, is taken from GAFFNEY [52]. In particular, it is GAFFNEY’S result which tells us how to exploit completeness. BAKRY and EMERY [3] were the first ones to provide the local condition for hypercontractivity given in Theorem 6.2.42. Although our treatment is derived from their ideas, it is not clear from their presentation when one can work in a non-compact setting. On the other hand, their formulation is couched in more abstract terms and therefore may be applied in situations which are not covered by us. The formula (6.2.32) is familiar to differential geometers, who think of it as an application of the BOCHNER-LICHNEROWICZ-WEITZENBÖCK formula. Closely related topics are treated in YAU [113], DAVIES and SIMON [23], and BAKRY [2]. Finally, CARLEN and STROOCK [16] apply BAKRY and EMERY’S criterion to certain infinite dimensional diffusion processes. The idea outlined in Exercise 6.2.48 comes from EMERY and YUKICH [42].
§6.3 The contents of this section are taken from [9], where they are used to address the sort of question raised in Exercise 6.3.20. Related computations and ideas appear in the article [88] by R. PINSKY.
REFERENCES
[1] R. Azencott, Grandes déviations et applications, in “Ecoles d’Eté de Probabilités de Saint-Flour VIII-1978,” edited by P.L. Hennequin. Lecture Notes in Mathematics 774, Springer, Berlin, 1980, pp. 1-176.
[2] D. Bakry, Un critère de non-explosion pour certaines diffusions sur une variété riemannienne complète, C.R. Acad. Sc. Paris Série I 303 (1986), 23-25.
[3] D. Bakry and M. Emery, Diffusions hypercontractives, in “Séminaire de probabilités XIX,” Lecture Notes in Mathematics 1123, Springer, Berlin, 1985, pp. 179-206.
[4] R.R. Bahadur, Rates of convergence of estimates and test statistics, Ann. Math. Statist. 38 (1967), 303-324.
[5] R.R. Bahadur, “Some Limit Theorems in Statistics,” Society for Industrial and Applied Mathematics, Philadelphia, 1971.
[6] R.R. Bahadur and R. Ranga Rao, On deviations of the sample mean, Ann. Math. Statist. 31 (1960), 1015-1027.
[7] R.R. Bahadur and S.L. Zabell, Large deviations of the sample mean in general vector spaces, Ann. Probab. 7 (1979), 587-621.
[8] O. Barndorff-Nielsen, “Information and Exponential Families in Statistical Theory,” Wiley, Chichester, 1978.
[9] P.H. Baxendale and D.W. Stroock, Large deviations and stochastic flows of diffeomorphisms, Probab. Th. and Rel. Fields 80 (1988), 169-216.
[10] J-M. Bismut, “Large Deviations and the Malliavin Calculus,” Birkhäuser, Basel, 1984.
[11] E. Bolthausen, On the probability of large deviations in Banach spaces, Ann. Probab. 12 (1984), 427-435.
[12] E. Bolthausen, Markov process large deviations in the τ-topology, Stoch. Proc. and Appl. 25 (1987), 95-108.
[13] J.-M. Bony, Principe du maximum, inégalité de Harnack et unicité du problème de Cauchy pour les opérateurs elliptiques dégénérés, Ann. Inst. Fourier XIX no. 1 (1969), 277-304.
[14] A.A. Borovkov, Boundary-value problems for random walks and large deviations in function spaces, Th. Prob. Appl. 12 (1967), 575-595.
[15] R.H. Cameron and W.T. Martin, Transformations of Wiener integrals under translations, Ann. Math. 45 (1955), 386-396.
[16] E.A. Carlen and D.W. Stroock, An application of the Bakry-Emery criterion to infinite dimensional diffusions, in “Séminaire de probabilités XX,” Lecture Notes in Mathematics 1204, Springer, Berlin, 1986, pp. 341-347.
[17] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statist. 23 (1952), 493-507.
[18] T. Chiyonobu and S. Kusuoka, The large deviation principle for hypermixing processes, Probab. Th. and Rel. Fields 78 (1988), 627-649.
[19] F. Comets, Grandes déviations pour des champs de Gibbs sur Z^d, C.R. Acad. Sc. Paris, Série I 303 (1986), p. 511.
[20] H. Cramér, Sur un nouveau théorème-limite de la théorie des probabilités, Actualités Scientifiques et Industrielles 736 (1938), 5-23. Colloque consacré à la théorie des probabilités, Vol. 3, Hermann, Paris.
[21] I. Csiszár, I-divergence geometry of probability distributions and minimization problems, Ann. Probab. 3 (1975), 146-158.
[22] D. Dacunha-Castelle, Formule de Chernoff pour une suite de variables réelles, in “Grandes Déviations et Applications Statistiques,” Astérisque 68, Société Mathématique de France, Paris, 1979, pp. 19-24.
[23] E.B. Davies and B. Simon, Ultracontractivity and the heat kernel for Schrödinger operators and Dirichlet Laplacians, J. Func. Anal. 59 (1984), 335-395.
[24] D.W. Dawson and J. Gärtner, Long time fluctuations of weakly interacting diffusions, Stochastics 20 (1987), 247-308.
[25] A. de Acosta, Upper bounds for large deviations of dependent random vectors, Z. Wahrsch. verw. Geb. 69 (1985), 551-565.
[26] A. de Acosta, Large deviations for vector valued functionals of a Markov chain: lower bounds, Ann. Probab. 16 (1988), 925-960.
[27] J.D. Deuschel and D.W. Stroock, A function space large deviation principle for certain stochastic integrals, Probab. Th. Rel. Fields (to appear).
[28] W. Doeblin, Elément d’une théorie générale des chaînes simples constantes de Markoff, Ann. Sc. Ecole Norm. Sup. 57 (1940), 61-111.
[29] M.D. Donsker and S.R.S. Varadhan, in “Functional Integration and Its Applications,” Proceedings of the International Conference Held at Cumberland Lodge, Windsor Great Park, London, April 1974, Edited by A.M. Arthurs, Clarendon, Oxford, pp. 15-33.
[30] M.D. Donsker and S.R.S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, I, Comm. Pure Appl. Math. 28 (1975), 1-47.
[31] M.D. Donsker and S.R.S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, II, Comm. Pure Appl. Math. 28 (1975), 279-301.
[32] M.D. Donsker and S.R.S. Varadhan, Asymptotics for the Wiener sausage, Comm. Pure Appl. Math. 28 (1975), 525-565.
[33] M.D. Donsker and S.R.S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, III, Comm. Pure Appl. Math. 29 (1976), 389-461.
[34] M.D. Donsker and S.R.S. Varadhan, On the principal eigenvalue of second-order elliptic differential operators, Comm. Pure Appl. Math. 29 (1976), 595-621.
[35] M.D. Donsker and S.R.S. Varadhan, On the number of distinct sites visited by a random walk, Comm. Pure Appl. Math. 32 (1979), 721-747.
[36] M.D. Donsker and S.R.S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, IV, Comm. Pure Appl. Math. 36 (1983), 183-212.
[37] M.D. Donsker and S.R.S. Varadhan, Asymptotics for the polaron, Comm. Pure Appl. Math. 36 (1983), 505-528.
[38] M.D. Donsker and S.R.S. Varadhan, Large deviations for stationary Gaussian processes, Comm. Math. Phys. 97 (1985), 187-210.
[39] R.S. Ellis, “Entropy, Large Deviations and Statistical Mechanics,” Springer, Berlin, 1985.
[40] R.S. Ellis, Large deviation for the empirical measure of a Markov chain with an application to the multivariate empirical measure, Ann. Probab. 16 (1988), 1496-1508.
[41] R.S. Ellis and A. Wyner, Uniform large deviation property of the empirical measure of a Markov chain, Ann. Probab. (to appear).
[42] M. Emery and J.E. Yukich, A simple proof of the logarithmic inequality on the circle, in “Séminaire de Probabilités XXI,” Lecture Notes in Mathematics 1247, Springer, Berlin, 1987, pp. 173-176.
[43] F. Esscher, On the probability function in the collective theory of risk, Skandinavisk Aktuarietidskrift 15 (1932), 175-195.
[44] P. Federbush, Partially alternative derivation of a result of Nelson, J. Math. Phys. 10 no. 1 (1969), 50-52.
[45] X. Fernique, Régularité des trajectoires des fonctions aléatoires gaussiennes, in “Ecoles d’Eté de Probabilités de Saint-Flour IV-1974,” edited by P.L. Hennequin. Lecture Notes in Mathematics 480, Springer, Berlin, 1975, pp. 1-97.
[46] H. Föllmer, On entropy and information gain in random fields, Z. Wahrsch. verw. Geb. 26 (1973), 207-217.
[47] H. Föllmer, Random fields and diffusion processes, in “Ecoles d’Eté de Probabilités de Saint-Flour XVI-1986” (to appear).
[48] H. Föllmer and S. Orey, Large deviations for the empirical field of a Gibbs measure, Ann. Probab. 16 (1988), 961-977.
[49] M.I. Freidlin, Action functional for a class of stochastic processes, Th. Prob. Appl. 17 (1972), 511-515.
[50] M. Fukushima, “Dirichlet Forms and Markov Processes,” North-Holland, Amsterdam, 1980.
[51] M. Fukushima and D.W. Stroock, Reversibility of solutions to martingale problems, in “Probability, Statistical Mechanics, and Number Theory, Advances in Mathematics Supplemental Studies, Vol. 9,” Academic Press, New York, 1986, pp. 107-123.
[52] M.P. Gaffney, The conservation property of the heat equation on riemannian manifolds, Comm. Pure Appl. Math. 12 (1959), 1-11.
[53] J. Gärtner, On large deviations from the invariant measure, Th. Prob. Appl. 22 (1977), 24-39.
[54] J. Glimm, Boson fields with nonlinear self-interaction in two dimensions, Comm. Math. Phys. 8 (1968), 12-25.
[55] P. Groeneboom, J. Oosterhoff and F.H. Ruymgaart, Large deviation theorems for empirical probability measures, Ann. Probab. 7 (1979), 553-586.
[56] L. Gross, Logarithmic Sobolev inequalities, Amer. J. Math. 97 (1976), 1061-1083.
[57] F. Guerra, L. Rosen and B. Simon, The P(φ)_2 Euclidean quantum field theory as classical statistical mechanics, Ann. of Math. 101 (1975), 111-259.
[58] G.H. Hardy and J.E. Littlewood, A maximal theorem with function-theoretic applications, Acta Math. 5 (1930), 81-116.
[59] P. Hartman, On the ergodic theorem, Am. J. Math. 69 (1947), 193-199.
[60] A.B. Hoadley, On the probability of large deviations of functions of several empirical cdf’s, Ann. Math. Stat. 38 (1967), 360-382.
[61] W. Hoeffding, On probabilities of large deviations, in “Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability,” Univ. of California Press, Berkeley, 1965, pp. 203-219.
[62] R. Holley and D.W. Stroock, Logarithmic Sobolev inequalities and stochastic Ising models, J. Stat. Physics 46 (1987), 1159-1194.
[63] L. Hörmander, Hypoelliptic second order differential equations, Acta Math. 119 (1967), 147-171.
[64] N.C. Jain, A Donsker-Varadhan type invariance principle, Z. Wahrsch. verw. Geb. 59 (1982), 117-138.
[65] N.C. Jain, Large deviation lower bounds for additive functionals of Markov processes, Ann. Prob. (to appear).
[66] M. Kac, On some connections between probability theory and differential and integral equations, in “Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability,” Univ. of California Press, Berkeley, 1950, pp. 189-215.
[67] M. Kac, “Integration in Function Spaces,” Fermi Lectures, Academia Nazionale dei Lincei Scuola Normale Superiore, Pisa, 1980.
[68] J.H.B. Kemperman, On the optimum rate of transmitting information, in “Probability and Information Theory,” Lecture Notes in Math. 89, Springer, Berlin, 1967, pp. 120-169.
[69] A.I. Khinchin, Über einen neuen Grenzwertsatz der Wahrscheinlichkeitsrechnung, Math. Annalen 101 (1929), 745-752.
[70] N. Krylov and N. Bogolioubov, La théorie générale de la mesure dans son application à l’étude des systèmes de la mécanique non linéaires, Ann. of Math. 38 (1937), 65-113.
[71] S. Kullback and R.A. Leibler, On information and sufficiency, Ann. Math. Statist. 22 (1951), 79-86.
[72] H.J. Landau and L.A. Shepp, On the supremum of a Gaussian process, Sankhya Ser. A no. 32 (1970), 369-378.
[73] O.E. Lanford, Entropy and equilibrium states in classical statistical mechanics, in “Statistical Mechanics and Mathematical Problems,” Edited by A. Lenard. Lecture Notes in Physics 20, Springer, Berlin, 1973, pp. 1-113.
[74] E. Lieb, Existence and uniqueness of the minimizing solution of Choquard’s non-linear equation, Studies in Appl. Math. 57 (1977), 93-105.
[75] Y.V. Linnik, On the probability of large deviations for the sums of independent variables, in “Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability,” Univ. of California Press, Berkeley, 1961, pp. 289-306.
[76] J. Milnor, “Morse Theory,” Princeton Univ. Press, Princeton, 1969.
[77] S. Molchanov, Diffusion processes and Riemannian geometry, Russian Math. Surveys 30 (1975), 1-53.
[78] S.T. Moy, Generalisation of Shannon-McMillan Theorem, Pacific J. Math. 11 (1960), 1371-1438.
[79] E. Nelson, The free Markov field, J. Func. Anal. 12 (1973), 211-227.
[80] P. Ney, Dominating points and the asymptotics of large deviations for random walk on R^d, Ann. Probab. 11 (1983), 158-167.
[81] P. Ney and E. Nummelin, Markov additive processes II: large deviations, Ann. Probab. 15 (1987), 593-609.
[82] S. Olla, Large deviations for Gibbs random fields, Prob. Th. Rel. Fields 77 (1988), 343-357.
[83] S. Orey, On the Shannon-Perez-Moy theorem, in “Proceedings on Particle systems, random media and large deviations (New Brunswick, Maine),” Contemp. Math. 41, A.M.S., Providence R.I., 1985, pp. 319-327.
[84] S. Orey, Large deviations in ergodic theory, in “Seminar on Stochastic Processes,” Edited by E. Cinlar, K.L. Chung and R.K. Getoor, Birkhäuser, Basel, 1985, pp. 195-249.
[85] S. Orey and S. Pelikan, Large deviation principles for stationary processes, Ann. Probab. 16 (1988), 1481-1496.
[86] J.C. Oxtoby, Ergodic sets, Bull. Amer. Math. Soc. 58 (1952), 116-136.
[87] V.V. Petrov, “Sums of Independent Random Variables,” translated by A.A. Brown, Springer, Berlin, 1975.
[88] R. Pinsky, On evaluating the Donsker-Varadhan I-functional, Ann. Probab. 13 (1985), 342-362.
[89] R. Ranga Rao, Relations between weak and uniform convergence of measures with applications, Ann. Math. Statis. 33 (1962), 659-680.
[90] V. Richter, Local limit theorems for large deviations, Th. Prob. Appl. 2 (1957), 206-220.
[91] F. Riesz, Sur un théorème de maximum de MM. Hardy et Littlewood, J. London Math. Soc. 7 (1931), 10-13.
[92] D. Ruelle, Correlation functionals, J. Math. Physics 6 (1965), 201-220.
[93] D. Ruelle, A variational formulation of equilibrium statistical mechanics and the Gibbs phase rule, Comm. Math. Phys. 5 (1967), 324-329.
[94] I.N. Sanov, On the probability of large deviations of random variables, (in Russian), Mat. Sb. 42 (1957), 11-44. (English translation in Selected Translations in Mathematical Statistics and Probability I (1961), pp. 213-244.)
[95] M. Schilder, Some asymptotics formulae for Wiener integrals, Trans. Amer. Math. Soc. 125 (1966), 63-85.
[96] B. Simon, “The P(φ)_2 Euclidean (Quantum) Field Theory,” Princeton Univ. Press, Princeton, 1974.
[97] A.V. Skorokhod, Limit theorems for stochastic processes, Th. Prob. and Appl. 1 (1956), 261-290.
[98] N. Smirnoff, Über Wahrscheinlichkeiten grosser Abweichungen, Rec. Soc. Math. Moscou 40 (1933), 441-455.
[99] A.S. Sznitman, Lipschitz tail and Wiener sausage on hyperbolic space, Comm. Pure Appl. Math. (to appear).
[100] V. Strassen, An invariance principle for the law of the iterated logarithm, Z. Wahrsch. verw. Geb. 3 (1964), 211-226.
[101] D.W. Stroock, “An Introduction to the Theory of Large Deviations,” Springer, Berlin, 1984.
[102] D.W. Stroock, On the rate at which a homogeneous diffusion approaches a limit, an application of the large deviation theory of certain stochastic integrals, Ann. Probab. 14 (1986), 840-859.
[103] D.W. Stroock and S.R.S. Varadhan, On the support of diffusion processes, with applications to the strong maximum principle, in “Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability,” Univ. California Press, Berkeley, 1970, pp. 333-360.
[104] D.W. Stroock and S.R.S. Varadhan, “Multidimensional Diffusion Processes,” Springer, Berlin, 1979.
[105] Y. Takahashi, Entropy function (free energy) for dynamical systems and their random perturbations, in “Proceedings Taniguchi Symposium on Stochastic Analysis at Katata and Kyoto,” edited by K. Itô, Kinokuniya and North Holland, Tokyo, 1982.
[106] S.R.S. Varadhan, Asymptotic probabilities and differential equations, Comm. Pure Appl. Math. 19 (1966), 261-286.
[107] S.R.S. Varadhan, “Large Deviations and Applications,” Society for Industrial and Applied Mathematics, Philadelphia, 1984.
[108] A.D. Ventcel, Action functional for gaussian random function, Th. Prob. Appl. 17 (1972), 515-517.
[109] A.D. Ventcel and M.I. Freidlin, On small perturbations of dynamical systems, Russian Math. Surveys 25 (1970), 1-55.
[110] A.D. Ventcel and M.I. Freidlin, Some problems concerning stability under small random perturbations, Th. Prob. Appl. 17 (1972), 269-283.
[111] A.D. Ventcel and M.I. Freidlin, “Random Perturbations of Dynamical Systems,” translated by J. Szücs, Springer, Berlin, 1984.
[112] N. Wiener, Differential spaces, J. Math. Phys. 2 (1923), 131-174.
[113] S.T. Yau, On the heat kernel of a complete Riemannian manifold, J. Math. Pures et Appl. 57 (1978), 191-201.
Frequently Used Notation
Description, Page
BOREL field over E, 8
bounded, measurable functions on Σ, 101
uniform norm on B(Σ;R), 101
open ball of radius r around x, 12
BAKRY-EMERY hypercontractivity criterion, 264
smooth functions with compact support, 12, 147
indicator function of Γ, 65
compact subset of, 11
domains of L & L^V, 123
FELLER domain of L, 125
domain of ℰ, 129
splice of measures, 171
SKOROKHOD’S path space, 169
LAPLACE-BELTRAMI operator, 251
divergence of X, 250
DIRICHLET form, 129
conditional expectation operator, 231
ε-MARKOV property, 232
open δ-hull of F, 31
LEBESGUE measure of Γ, 146
closure & interior in the τ-topology, 73
GAUSSian measure with variance c, 2
RIEMANNian gradient of φ, 250
smooth sections of T(Σ), 250
relative entropy functional, 68
relative entropy restricted to the interval I, 215
specific entropy function, 215
underlying HILBERT space for W, 11
associated norm, 11
SOBOLEV space, 154
associated norm, 154
hypermixing conditions, 214
HESSian of f, 261
HILBERT-SCHMIDT norm, 26, 262
HÖRMANDER’S hypoellipticity condition, 273
Π-invariant probability measures, 106
θ-invariant sets, 194
time-shift invariant sets, 214
integer part of T, 21
rate function determined by Π, 103
position level rate functions for P(t, σ, ·), 125
position level rate function for ℰ, 129
process level rate function for Π, 167
process level rate function for P(t, σ, ·), 175
empirical distribution functional, 68, 92
empirical distribution functional, 111
generators of {P_t : t > 0} & {P_t^V : t > 0}, 123
generator of {P̄_t : t > 0}, 129
norm for operator on L²(m) into itself, 130
norm for operators on L^p(μ) into L^q(μ), 231
lowest eigenvalue of Δ in G, 147
LEGENDRE transform of J_ℰ, 130
logarithmic moment function of μ, 3, 78
logarithmic moment function of, 68
LEGENDRE transform of Λ_μ, 4, 78
logarithmic moment function for Π, 101
LEGENDRE transform of Λ_Π, 101
logarithmic moment function for P(t, σ, ·), 120
LEGENDRE transform of Λ_P, 120
variant of the preceding, 125
variant of Λ_P, 190
LEGENDRE transform of Λ_P, 190
variant of the preceding, 191
logarithmic moment function of W, 8
LEGENDRE transform of Λ_W, 10
logarithmic SOBOLEV inequality, 242
space of probability measures on Σ, 64
space of finite signed measures on Σ, 64
θ-invariant probability measures on Ω, 194
ergodic elements of M_1^θ((Ω, B)), 197
time-shift invariant probability measures on Ω, 167, 176
time-shift ergodic probability measures on Ω, 214
mean of the measure μ, 78
(d,
mean of q5 under m splice of p with Il p@dn distribution of 5, under Pe Pd,n distribution of L, under and P, Pa,, distribution of L, under pn Pn distribution of St under P, Pu,t covariant derivative of Y relative to X VXY splice of v with P, u @T p* splice of paths W T 8 w' f i ( b , & l-I(o,-) transition probability functions [discrete-time) FEYNM AN- K AC kernel (discrete-time) law of the MARKOVchain starting from b & c cont inuous-time transition probability function variant of the preceding FEYNMAN-KAC semigroup FEYNMAN-KAC kernel (continuous time) L2(m)-extension of {Pi : t > 0) law of the MARKOVprocess starting from o r.c.p.d. of P given B1 splice of P with II empirical process measures variant of the preceding regular conditional probability distribution RICCIcurvature tensor partial sums normalized partial sums additive & normalized additive functionah unit circle €-sausage around 01[ o , t ~ tail a-field Rd-valued WIENER paths and dual associated norm duality relation between 0 ' and 0 tangent bundle over C a uniform ergodic condition for fi(8,.) a uniform ergodic condition for n(a, a uniform ergodic condition for ?(t, a,.) a variant of the preceding (total) variation norm WIENER'S measure a)
d)
25 7 165 92 92 68 111 250 177 f 70 91, 92 101 91, 92 110
110 121 122 129 111 231 168 161, 171 214 163 262 59, 91, 93 59, 91, 93 If0 202 146 205 8 8 8 250 95 100 113
240 64 8
weak convergence of measures, 1
a SOBOLEV space, 275
RIEMANNian inner product of X and Y, 250
commutator of X and Y, 250
topological dual of X, 53
duality relation between X* and X, 53
Subject Index
affine property of specific entropy, 181
Azencott, 24
Cameron and Martin’s formula, 14, 19
Chapman-Kolmogorov equation, 110
Chiyonobu and Kusuoka, 225
classical Sobolev inequality, 248
conditional probability distribution, 198
covariant notion of large deviations, 36
Cramér’s theorem
    classical, 5
    for Banach spaces, 83
    for Gaussian measures, 86
    generalized, 61
    in R^N, 63
decreasing rearrangement, 158
Dirichlet form, 129
discrete one-parameter semigroup, 203
Doeblin’s theory of ergodicity for Markov chain, 106
Donsker and Varadhan, 83, 86, 105, 127, 133, 146, 159, 169, 180
empirical distribution functional, 68, 92
    of the position process, 111
    of the whole process, 161, 171
empirical process measure, 214
entropy, see relative entropy
ε-Markov, 232
    backward, 236
    forward, 236
ergodic
    decomposition theorem, 201
    elements, 197
    individual theorem, 196
    maximal inequality, 195
exponentially tight, 41
Feller continuous, 103, 125
Fernique’s theorem, 16
Feynman-Kac formula
    discrete-time, 102
    continuous-time, 121
Gaffney’s lemma, 252
Gaussian measure on Banach spaces, 85
    covariance of, 85
    tail estimate of, 86
good rate function, 36
Gross’s logarithmic Sobolev inequalities, 242, 249
Hardy-Littlewood maximal inequality, 193
Hessian, 261
Hörmander’s condition, 273
hypercontractive, 232, 238
hypermixing, 213
individual ergodic theorem, see ergodic
ℓ-measurably separated, 213
Lanford, 59
Laplace-Beltrami operator, 251
large deviation principle
    for hypermixing processes, 225
    full, 35
    for symmetric Markov processes, 133, 210
    uniform, for Markov chains, 97, 105
    uniform, for Markov processes, 119, 127
    uniform, for Markov chains at process level, 167, 169
    uniform, for Markov processes at process level, 175, 180
    uniform, for Markov processes w.r.t. the variation-norm topology, 145
    weak, 40
law of the iterated logarithm
    classical, 32
    Strassen’s, 21
Legendre transform, 4, 55
Lévy metric, 64
logarithmic moment generating function, 3, 53
logarithmic Sobolev inequality, 242
logarithmic spectral radius, 101, 122
lower bound for symmetric Markov processes, 210
lower-semicontinuous convex minorant, 57
m-symmetric, 128
maximal ergodic inequality, see ergodic
mean of the measure, 80
measurable group of transformation, 201
measurable one parameter semigroup of transformations, 194
moment generating function, see logarithmic moment generating function
non-decreasing function, 185
Ornstein-Uhlenbeck process, 240, 269
Π-ergodic, 204
Π-invariant, 106
Polish space, 1
projective limits, 50
    strong topology, 174
{P_t : t > 0}-invariant, 134
Ranga Rao’s theorem, 78
rate function, 35
    good, 36
regular conditional probability distribution, 163
relative entropy, 70
    variational formula for, 68
reversing measure, 128
Ricci curvature, 262
Riemann curvature, 262
Riesz’s sunrise lemma, 193
Ruelle, 59
Sanov’s theorem, 70
    w.r.t. the strong topology, 73
Schilder’s theorem, 18
shift-invariant, 164, 167, 176
Skorokhod
    space, 169
    topology, 169
    representation theorem, 32
smooth probability measure, 271
Sobolev
    classical inequality, 248
    logarithmic inequality, 242
    space, 154
specific relative entropy, 182, 215
    affine property of, 181, 222
Strassen’s theorem, 21
strong law of large numbers in Banach spaces, 78
symmetric Markov process, 128
tail estimate for Gaussian measures, 86
tail σ-algebra, 205
θ-invariant, 194
tight
    function, 185
    set, 64
time-shift
    semigroup, 171
    transformation group, 214
topology
    strong, 71
    τ-, 71
    uniform norm, 140
    variation-norm, 140
    weak, 52, 64
transition probability function, 91, 110
upper bound, 189
Varadhan’s theorem, 43
Ventcel and Freidlin’s estimate, 31
Wiener measure, 8
    scale invariance property of, 9
    quasi-invariance property of, 14
Wiener quadruple, 88
Wiener sausage, 146
    asymptotics of, 159
PURE AND APPLIED MATHEMATICS VOl. 1 VOl. 2 VOl. 3
VOl. 4 VOl. Vol. VOl. Vol. VOl.
5
6
7 8 9
VOl. 10
VOl. 11* VOl. 12* Vol. 13 Vol. 14 Vol. 15* Vol. Vol. Vol. VOl. Vol. VOl.
16* 17 18 19 20 21
VOl. 22 Vol. 23* VOl. 24
Arnold Sommerfeld, Partial Differential Equations in Physics Reinhold Baer , Linear Algebra and Projective Geometry Herbert Busemann and Paul Kelly, Projective Geometry and Projective Metrics Stefan Bergman and M. Schiffer, Kernel Functions and Elliptic Differential Equations in Mathematical Physics Ralph Philip Boas, Jr., Entire Functions Herbert Busemann, The Geometry of Geodesics Claude Chevalley, Fundamental Concepts of Algebra Sze-Tsen Hu, Homotopy Theory A. M. Ostrowski, Solution of Equations in Euclidean and Banach Spaces, Third Edition of Solution of Equations and Systems of Equations J . Dieudonnt, Treatise on Analysis: Volume I, Foundations of Modern Analysis; Volume II; Volume III; Volume IV; Volume V; Volume VI; Volume VII S. I . Goldberg, Curvature and Homology Sigurdur Helgason, Differential Geometry and Symmetric Spaces T . H. Hildebrandt, Introduction to the Theory of Integration Shreeram Abhyankar, Local Analytic Geometry Richard L. Bishop and Richard J. Crittenden, Geometry of Manifolds Steven A. Gad, Point Set Topology Barry Mitchell, Theory of Categories Anthony P . Morse, A Theory of Sets Gustave Choquet, Topology Z. I. Borevich and I. R. Shafarevich, Number Theory JosC Luis Massera and Juan Jorge Schaffer, Linear Differential Equations and Function Spaces Richard D. Schafer, A n Introduction to Nonassociative Algebras Martin Eichler, Introduction to the Theory of Algebraic Numbers and Functions Shreeram Abhyanker, Resolution of Singularities of Embedded Algebraic Surfaces
Presently out of print
Vol. 25 Vol. Vol. Vol. Vol.
26 27 28* 29
Vol. 30 Vol. 31 Vol. 32 VOl. 33 VOl. 34* VOl. 35
Vol. VOl. Vol. VOl. Vol.
36 37 38 39 40*
Vol. 41* Vol. 42 VOl. 43 VOl. 44
VOl. 45 Vol. 46 VOl. 47
Vol. 48
Franqois Treves, Topological Vector Spaces, Distributions, and Kernels Peter D. Lax and Ralph S . Phillips, Scattering Theory Oystein Ore, The Four Color Problem Maurice Heins, Complex Function Theory R. M. Blumenthal and R. K . Getoor, Markov Processes and Potential Theory L. J . Mordell, Diophantine Equations J . Barkley Rosser, Simplified Independence Pro0fs: Boolean Valued Models of Set Theory William F . Donoghue, Jr., Distributions and Fourier Transforms Marston Morse and Stewart S . Cairns, Critical Point Theory in Global Analysis and Differential Topology Edwin Weiss, Cohomology of Groups Hans Freudenthal and H. De Vries, Linear Lie Groups Laszlo Fuchs, Infinite Abelian Groups Keio Nagami, Dimension Theory Peter L. Duren, Theory of H p Spaces Bod0 Pareigis, Categories and Functors Paul L. Butzer and Rolf J . Nessel, Fourier Analysis and Approximation: Volume I, One-Dimensional Theory Eduard PrugoveCki, Quantum Mechanics in Hilbert Space D. V. Widder, An Introduction to Transform Theory Max D . Larsen and Paul J. McCarthy, Multiplicative Theory of Ideals Ernst-August Behrens, Ring Theory Morris Newman, Integral Matrices Glen E. Bredon, Introduction to Compact Transformation Groups Werner Greub, Stephen Halperin, and Ray Vanstone, Connections, Curvature, and Cohomology: Volume I, De Rham Cohomology of Manifolds and Vector Bundles Volume 11, Lie Groups, Principal Bundles, and Characteristic Classes Volume III, Cohomology of Principal Bundles and Homogeneous Spaces Xia Dao-xing, Measure and Integration Theory of InfiniteDimensional Spaces: Abstract Harmonic Analysis
Ronald G. Douglas, Banach Algebra Techniques in Operator Theory Vol. 50 Willard Miller, Jr ., Symmetry Groups and Theory Applications Arthur A. Sagle and Ralph E. Walde, Introduction to Lie Vol. 51 Groups and Lie Algebras T . Benny Rushing, Topological Embeddings Vol. 52 VOl. 53* James W. Vick, Homology Theory: A n Introduction to Algebraic Topology E. R. Kolchin, Differential Algebra and Algebraic Groups VOl. 54 VOl. 55 Gerald J. Janusz, Algebraic Number Fields Vol. 56 A. S . B. Holland, Introduction to the Theory of Entire Functions VOl. 57 Wayne Roberts and Dale Varberg, Convex Functions Vol. 58 H. M. Edwards, Riemann’s Zeta Function VOl. 59 Samuel Eilenberg, Automata, Languages, and Machines: Volume A, Volume B Vol. 60 Morris Hirsch and Stephen Smale, Differential Equations, Dynamical Systems, and Linear Algebra Wilhelm Magnus, Noneuclidean Tesselations and Their Group Vol. 61 Vol. 62 FranGois Treves, Basic Linear Partial Differential Equations Vol. 63* William M . Boothby, A n Introduction to Differentiable Manifolds and Riemannian Geometry Vol. 64 Brayton Gray, Homotopy Theory: An introduction to Algebraic Topology Vol. 65 Robert A. Adams, Sobolev Spaces Vol. 66 John J. Benedetto, Spectral Synthesis Vol. 67 D. V. Wilder, The Heat Equation Vol. 68 Irving Ezra Segal, Mathematical Cosmology and Extragalactic Astronomy Vol. 69 I . Martin Isaacs, Character Theory of Finite Groups Vol. 70 James R. Brown, Ergodic Theory and Topological Dynamics Vol. 71 C. Truesdell, A First Course in Rational Continuum Mechanics: Volume I , General Concepts Vol. 72 K. D. Stroyan and W. A. J. Luxemburg, Introduction to the Theory of Infinitesimals VOl. 73 B. M. Puttaswamaiah and John D. Dixon, Modular Representations of Finite Groups VOl. 74 Melvyn Berger ,Nonlinearity and Functional Analysis: Lectures on Nonlinearity Problems in Mathematical Analysis VOl. 75 George Gratzer, Lattice Theory
VOl. 49
Vol. 76
Charalambos D. Aliprantis and Owen Burkinshaw, Locally Solid Riesz Spaces Jan Mikusinski, The Bochner Integral VOl. 77 Vol. 78 Michiel Hazelwinkel, Formal Groups and Applications Vol. 79 Thomas Jech, Set Theory Vol. 80 Sigurdur Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces Vol. 81 Carl L. DeVito, Functional Analysis Vol. 82 Robert B . Burckel, An Introduction to Classical Complex Analysis Vol. 83 C. Truesdell and R. G. Muncaster, Fundamentals of Maxwell’s Kinetic Theory of a Simple Monatomic Gas: Treated as a Branch of Rational Mechanics Vol. 84 Louis Halle Rowen, Polynomial Identities in Ring Theory Vol. 85 Joseph J. Rotman, An Introduction to Homological Algebra Vol. 86 Barry Simon, Functional Integration and Quantum Physics Vol. 87 Dragos M. Cvetkovic, Michael Doob, and Horst Sachs, Spectra of Graphs Vol. 88 David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications VOl. 89 Herbert Seifert, W . Threlfall, A Textbook of Topology Vol. 90 Grezegorz Rozenberg and Art0 Salomaa, The Mathematical Theory of L Systems Vol. 91 Donald W. Kahn, Introduction to Global Analysis Vol. 92 Eduard PrugoveCki, Quantum Mechanics in Hilbert Space, Second Edition VOl. 93 Robert M. Young, An Introduction to Nonharmonic Fourier Series VOl. 94 M. C. Irwin, Smooth Dynamical Systems Vol. 96 John B. Garnett, Bounded Analytic Functions VOl. 97 Jean Dieudonnk, A Panorama of Pure Mathematics: As Seen by N. Bourbaki Vol. 98 Joseph G. Rosenstein, Linear Orderings VOl. 99 M. Scott Osborne and Garth Warner, The Theory of Eisenstein Systems VOl. 100 Richard V. Kadison and John R. Ringrose, Fundamentals of the Theory of Operator Algebras: Volume 1, Elementary Theory; Volume 2, Advanced Theory VOl. 101 Howard Osborn, Vector Bundles: Volume I , Foundations and Stiefel-Whitney Classes
Vol. 102 Avraham Feintuch and Richard Saeks, System Theory: A Hilbert Space Approach
Vol. 103 Barrett O'Neill, Semi-Riemannian Geometry: With Applications to Relativity
Vol. 104 K. A. Zhevlakov, A. M. Slin'ko, I. P. Shestakov, and A. I. Shirshov, Rings That Are Nearly Associative
Vol. 105 Ulf Grenander, Mathematical Experiments on the Computer
Vol. 106 Edward B. Manoukian, Renormalization
Vol. 107 E. J. McShane, Unified Integration
Vol. 108 A. P. Morse, A Theory of Sets, Revised and Enlarged Edition
Vol. 109 K. P. S. Bhaskara-Rao and M. Bhaskara-Rao, Theory of Charges: A Study of Finitely Additive Measures
Vol. 110 Larry C. Grove, Algebra
Vol. 111 Steven Roman, The Umbral Calculus
Vol. 112 John W. Morgan and Hyman Bass, editors, The Smith Conjecture
Vol. 113 Sigurdur Helgason, Groups and Geometric Analysis: Integral Geometry, Invariant Differential Operators, and Spherical Functions
Vol. 114 E. R. Kolchin, Differential Algebraic Groups
Vol. 115 Isaac Chavel, Eigenvalues in Riemannian Geometry
Vol. 116 W. D. Curtis and F. R. Miller, Differential Manifolds and Theoretical Physics
Vol. 117 Jean Berstel and Dominique Perrin, Theory of Codes
Vol. 118 A. E. Hurd and P. A. Loeb, An Introduction to Nonstandard Real Analysis
Vol. 119 Charalambos D. Aliprantis and Owen Burkinshaw, Positive Operators
Vol. 120 William M. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry, Second Edition
Vol. 121 Douglas C. Ravenel, Complex Cobordism and Stable Homotopy Groups of Spheres
Vol. 122 Sergio Albeverio, Jens Erik Fenstad, Raphael Høegh-Krohn, and Tom Lindstrøm, Nonstandard Methods in Stochastic Analysis and Mathematical Physics
Vol. 123 Alberto Torchinsky, Real-Variable Methods in Harmonic Analysis
Vol. 124 Robert J. Daverman, Decomposition of Manifolds
Vol. 125 J. M. G. Fell and R. S. Doran, Representations of *-Algebras, Locally Compact Groups, and Banach *-Algebraic Bundles: Volume 1, Basic Representation Theory of Groups and Algebras
Vol. 126 J. M. G. Fell and R. S. Doran, Representations of *-Algebras, Locally Compact Groups, and Banach *-Algebraic Bundles: Volume 2, Induced Representations, the Imprimitivity Theorem, and the Generalized Mackey Analysis
Vol. 127 Louis H. Rowen, Ring Theory, Volume I
Vol. 128 Louis H. Rowen, Ring Theory, Volume II
Vol. 129 Colin Bennett and Robert Sharpley, Interpolation of Operators
Vol. 130 Jürgen Pöschel and Eugene Trubowitz, Inverse Spectral Theory
Vol. 131 Jens Carsten Jantzen, Representations of Algebraic Groups
Vol. 132 Nolan R. Wallach, Real Reductive Groups I
Vol. 133 Michael Sharpe, General Theory of Markov Processes
Vol. 134 Igor Frenkel, James Lepowsky, and Arne Meurman, Vertex Operators and the Monster
Vol. 135 Donald Passman, Infinite Crossed Products
Vol. 136 Heinz-Otto Kreiss and Jens Lorenz, Initial-Boundary Value Problems and the Navier-Stokes Equations
Vol. 137 Jean-Dominique Deuschel and Daniel W. Stroock, Large Deviations