Probability with Martingales David Williams Statistical Laboratory, DPMMS Cambridge University
Th~ right of th~ Uniwrsity 0/ Cambridg~ to print and s~1I all manner of books was granted by Henry VIII in 153~.
The Uniwrsity has printed and published continuously since /58<1.
CAMBRIDGE UNIVERSITY PRESS Cambridge New York Port Chester Melbourne Sydney
Published by the Press Syndicate of the University of Cambridge The Pitt Building, Trumpington Street, Cambridge CB2 lRP 40 West 20th Street, New York, NYl00ll, USA 10, Stamford Road, Oakleigh, Melbourne 3166, Australia
© Cambridge University Press 1991 Printed in Great Britain at the University Press, Cambridge
British Library cataloguing in publication data available Library of Congress cataloging in publication data available
Contents
Preface - please read!
Xl
A Question of Terminology
xiii
A Guide to Notation
xiv
Chapter 0: A Branching-Process Example
1
0.0. Introductory remarks. 0.1. Typical number of children, X. 0.2. Size of nth generation, Zn. 0.3. Use of conditional expectations. 0.4. Extinction probability, IT. 0.5. Pause for thought: measure. 0.6. Our first martingale. 0.7. Convergence (or not) of expectations. 0.8. Finding the distribution of Moo. 0.9. Concrete example.
PART A: FOUNDATIONS
14
Chapter 1: Measure Spaces
1.0. Introductory remarks. 1.1. Definitions of algebra, a-algebra. 1.2. Examples. Borel a-algebras, B(S), B = B(R). 1.3. Definitions concerning set functions. 1.4. Definition of measure space. 1.5. Definitions concerning measures. 1.6. Lemma. Uniqueness of extension, IT-systems. 1.7. Theorem. Caratheodory's extension theorem. 1.8. Lebesgue measure Leb on ((0, l],B(O, 1]). 1.9. Lemma. Elementary inequalities. 1.10. Lemma. Monotone-convergence properties of measures. 1.11. Example/Warning. 23
Chapter 2: Events
2.1. Model for experiment: (n, F, P). 2.2. The intuitive meaning. 2.3. Examples of (n,F) pairs. 2.4. Almost surely (a.s.) 2.5. Reminder: limsup,liminf,! lim, etc. 2.6. Definitions. lim sup En, (En,i.o.). 2.7. v
vi
Contents
First Borel-Cantelli Lemma (BC1). 2.S. Definitions. lim inf En, (En,ev). 2.9. Exercise. Chapter 3: Random Variables
29
3.1. Definitions. E-measurable function, mE, (mE)+ ,bE. 3.2. Elementary Propositions on measurability. 3.3. Lemma. Sums and products of measurable functions are measurable. 3.4. Composition Lemma. 3.5. Lemma on measurability of infs, liminfs of functions. 3.6. Definition. Random variable. 3.7. Example. Coin tossing. 3.S. Definition. a-algebra generated by a collection of functions on Q. 3.9. Definitions. Law, Distribution Function. 3.10. Properties of distribution functions. 3.11. Existence of random variable with given distribution function. 3.12. Skorokod representation of a random variable with prescribed distribution function. 3.13. Generated a-algebras - a discussion. 3.14. The Monotone-Class Theorem. Chapter 4: Independence
3S
4.1. Definitions of independence. 4.2. The 1T-system Lemma; and the more familiar definitions. 4.3. Second Borel-Cantelli Lemma (BC2). 4.4. Example. 4.5. A fundamental question for modelling. 4.6. A coin-tossing model with applications. 4.7. Notation: lID RVs. 4.S. Stochastic processes; Markov chains. 4.9. Monkey typing Shakespeare. 4.10. Definition. Tail 17algebras. 4.11. Theorem. Kolmogorov's 0-1 law. 4.12. Exercise/Warning. Chapter 5: Integration
49
5.0. Notation, etc. 11(1) :=: J f dl1, 11(1; A). 5.1. Integrals of non-negative simple functions, SF+. 5.2. Definition of 11(1), f E (mE)+. 5.3. MonotoneConvergence Theorem (MON). 5.4. The Fatou Lemmas for functions (FATOU). 5.5. 'Linearity'. 5.6. Positive and negative parts of f. 5.7. Integrable function, £l(S,E,I1). 5.S. Linearity. 5.9. Dominated Convergence Theorem (DOM). 5.10. Scheffe's Lemma (SCHEFFE). 5.11. Remark on uniform integrability. 5.12. The standard machine. 5.13. Integrals over subsets. 5.14. The measure f 11, f E (mE)+. Chapter 6: Expectation
5S
Introductory remarks. 6.1. Definition of expectation. 6.2. Convergence theorems. 6.3. The notation E(X; F). 6.4. Markov's inequality. 6.5. Sums of non-negative RVs. 6.6. Jensen's inequality for convex functions. 6.7. Monotonicity of £P norms. 6.S. The Schwarz inequality. 6.9. £2: Pythagoras, covariance, etc. 6.10. Completeness of £P (1:::; P < 00). 6.11. Orthogonal projection. 6.12. The 'elementary formula' for expectation. 6.13. Holder from Jensen.
Conieni3
Chapter 7: An Easy Strong Law
vii 71
7.1. 'Independence means multiply' - again! 7.2. Strong Law - first version. 7.3. Chebyshev's inequality. 7.4. Weierstrass approximation theorem.
Chapter 8: Product Measure
75
8.0. Introduction and advice. 8.1. Product measurable structure, ~1 X ~2' 8.2. Product measure, Fubini's Theorem. 8.3. Joint laws, joint pdfs. 8.4. Independence and product measure. 8.5. B(R)n = B(Rn). 8.6. The n-fold extension. 8.7. Infinite products of probability triples. 8.8. Technical note on the existence of joint laws.
PART B: MARTINGALE THEORY
Chapter 9: Conditional Expectation
83
9.1. A motivating example. 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933). 9.3. The intuitive meaning. 9.4. Conditional expectation as least-squares-best predictor. 9.5. Proof of Theorem 9.2. 9.6. Agreement with traditional expression. 9.7. Properties of conditional expectation: a list. 9.8. Proofs of the properties in Section 9.7. 9.9. Regular conditional probabilities and pdfs. 9.10. Conditioning under independe?ce assumptions. 9.11. Use of symmetry: an example.
Chapter 10: Martingales
93
10.1. Filtered spaces. 10.2. Adapted processes. 10.3. Martingale, supermartingale, submartingale. 10.4. Some examples of martingales. 10.5. Fair and unfair games. 10.6. Previsible process, gambling strategy. 10.7. A fundamental principle: you can't beat the system! 10.8. Stopping time. 10.9. Stopped supermartingales are supermartingales. 10.10. Doob's OptionalStopping Theorem. 10.11. Awaiting the almost inevitable. 10.12. Hitting times for simple random walk. 10.13. Non-negative super harmonic functions for Markov chains.
Chapter 11: The Convergence Theorem
106
11.1. The picture that says it all. 11.2. Up crossings. 11.3. Doob's Upcrossing Lemma. 11.4. Corollary. 11.5. Doob's 'Forward' Convergence Theorem. 11.6. Warning. 11.7. Corollary.
T
viii
Contents
Chapter 12: Martingales bounded in £2
110
12.0. Introduction. 12.1. Martingales in £2: orthogonality of increments. 12.2. Sums of zero-mean independent random variables in £2. 12.3. Random signs. 12.4. A symmetrization technique: expanding the sample space. 12.5. Kolmogorov's Three-Series Theorem. 12.6. Cesaro's Lemma. 12.7. Kronecker's Lemma. 12.8. A Strong Law under variance constraints. 12.9. Kolmogorov's Truncation Lemma. 12.10. Kolmogorov's Strong Law of Large Numbers (SLLN). 12.11. Doob decomposition. 12.12. The anglebrackets process (M). 12.13. Relating convergence of M to finiteness of (M)oo. 12.14. A trivial 'Strong Law' for martingales in £2. 12.15. Levy's extension of the Borel-Cantelli Lemmas. 12.16. Comments. Chapter 13: Vniform Integrability
126
13.1. An 'absolute continuity' property. 13.2. Definition. UI family. 13.3. Two simple sufficient conditions for the UI property. 13.4. UI property of conditional expectations. 13.5. Convergence in probability. 13.6. Elementary proof of (BDD). 13.7. A necessary and sufficient condition for £1 convergence. Chapter 14: VI Martingales
133
14.0. Introduction. 14.1. UI martingales. 14.2. Levy's 'Upward' Theorem. 14.3. Martingale proof of Kolmogorov's 0-1 law. 14.4. Levy's 'Downward' Theorem. 14.5. Martingale proof of the Strong Law. 14.6. Doob's Submartingale Inequality. 14.7. Law of the Iterated Logarithm: special case. 14.8. A standard estimate on the normal distribution. 14.9. Remarks on exponential bounds; large deviation theory. 14.10. A consequence of Holder's inequality. 14.11. Doob's £P inequality. 14.12. Kakutani's Theorem on 'product' martingales. 14.13.The Radon-Nikodym theorem. 14.14. The Radon-Nikodym theorem and conditional expectation. 14.15. Likelihood ratio; equivalent measures. 14.16. Likelihood ratio and conditional expectation. 14.17. Kakutani's Theorem revisited; consistency of LR test. 14.18. Note on Hardy spaces, etc. Chapter 15: Applications
153
15.0. Introduction - please read! 15.1. A trivial martingale-representation result. 15.2. Option pricing; discrete Black-Scholes formula. 15.3. The Mabinogion sheep problem. 15.4. Proof of Lemma 15.3( c). 15.5. Proof of result 15.3( d). 15.6. Recursive nature of conditional probabilities. 15.7. Bayes' formula for bivariate normal distributions. 15.8. Noisy observation of a single random variable. 15.9. The Kalman-Bucy filter. 15.10. Harnesses entangled. 15.11. Harnesses unravelled, 1. 15.12. Harnesses unravelled, 2.
Contents
ix
PART C: CHARACTERISTIC FUNCTIONS Chapter 16: Basic Properties of CFs
172
16.1. Definition. 16.2. Elementary properties. 16.3. Some uses of characteristic functions. 16.4. Three key results. 16.5. Atoms. 16.6. Levy's Inversion Formula. 16.7. A table. Chapter 17: Weak Convergence
179
17.1. The 'elegant' definition. 17.2. A 'practical' formulation. 17.3. Skorokhod representation. 17.4. Sequential compactness for Prob(R). 17.5. Tightness. Chapter 18: The Central Limit Theorem
185
IS. 1. Levy's Convergence Theorem. 18.2. 0 and 0 notation. 18.3. Some important estimates. lS.4. The Central Limit Theorem. 18.5. Example. lS.6. CF proof of Lemma 12.4.
APPENDICES Chapter A1: Appendix to Chapter 1
192
ALl. A non-measurable subset A of 51. A1.2. d-systems. A1.3. Dynkin's Lemma. A1.4. Proof of Uniqueness Lemma 1.6. A1.5. A-sets: 'algebra' case. A1.6. Outer measures. A1.7. Caratheodory's Lemma. A1.S. Proof of Caratheodory's Theorem. A1.9. Proof of the existence of Lebesgue measure on «0,1],8(0,1]). A1.10. Example of non-uniqueness of extension. A1.11. Completion of a measure space. Al.12. The Baire category theorem. Chapter A3: Appendix to Chapter 3
205
A3.1. Proof of the Monotone-Class Theorem 3.14. A3.2. Discussion of generated a-algebras. Chapter A4: Appendix to Chapter 4
208
A4.1. Kolmogorov's Law of the Iterated Logarithm. A4.2. Strassen's Law of the Iterated Logarithm. A4.3. A model for a Markov chain. Chapter A5: Appendix to Chapter 5
211
A5.1. Doubly monotone arrays. A5.2. The key use of Lemma 1.1O(a). A5.3. 'Uniqueness of integral'. A5.4. Proof of the Monotone-Convergence Theorem.
x
Contents
Chapter A9: Appendix to Chapter 9
214
A9.I. Infinite products: setting things up. A9.2. Proof of A9.l(e). Chapter A13: Appendix to Chapter 13
217
A13.I. Modes of convergence: definitions. A13.2. Modes of convergence: relationships. Chapter A14: Appendix to Chapter 14
219
A14.I. The a-algebra :FT, T a stopping time. A14.2. A special case of OST. A14.3. Doob's Optional-Sampling Theorem for VI martingales. A14.4. The result for VI submartingales. Chapter A16: Appendix to Chapter 16
222
A16.I. Differentiation under the integral sign.
Chapter E: Exercises
224
References
243
Index
246
Preface - please read!
The most important chapter in this book is Chapter E: Exercises. I havp left the interesting things for you to do. You can start now on the 'EG' exercises, but see 'More about exercises' later in this Preface. The book, which is essentially the set of lecture notes for a third-year undergraduate course at Cambridge, is as lively an introduction as I can manage to the rigorous theory of probability. Since much of the book is devoted to martingales, it is bound to become very lively: look at those Exercises on Chapter lO! But, of course, there is that initial plod through the measure-theoretic foundations. It must be said however that measure theory, that most arid of subjects when done for its own sake, becomes amazingly more alive when used in probability, not only because it is then applied, but also because it is immensely enriched. You cannot avoid measure theory: an event in probability is a measurable set, a random variable is a measurable function on the sample space, the expectation of a random variable is its integral with respect to the probability measure; and so on. To be sure, one can take some central results from measure theory as axiomatic in the main text, giving careful proofs in appendices; and indeed that is exactly what I have done. Measure theory for its own sake is based on the fundamental addition rule for measures. Probability theory supplements that with the multiplication rule which describes independence; and things are already looking up. But what really enriches and enlivens things is that we deal with lots of a-algebras, not just the one a-algebra which is the concern of measure theory. In planning this book, I decided for every topic what things I considered just a bit too advanced, and, often with sadness, I have ruthlessly omitted them. For a more thorough training in many of the topics covered here, see Billingsley (1979), Chow and Teicher (1978), Chung (1968), Kingman and Xl
Xll
Preface
Taylor (1966), Laha and Rohatgi (1979), and Neveu (1965). As regards measure theory, I learnt it from Dunford and Schwartz (1958) and Halmos (1959). After reading this book, you must read the still-magnificent Breiman (1968), and, for an excellent indication of what can be done with discrete martingales, Hall and Heyde (1980). Of course, intuition is much more important than knowledge of measure theory, and you should take every opportunity to sharpen your intuition. There is no better whetstone for this than Aldous (1989), though it is a very demanding book. For appreciating the scope of probability and for learning how to think about it, Karlin and Taylor (1981), Grimmett and Stirzaker (1982), Hall (1988), and Grimmett's recent superb book, Grimmett (1989), on percolation are strongly recommended. More about exercises. In compiling Chapter E, which consists exactly of the homework sheet I give to the Cambridge students, I have taken into account the fact that this book, like any other mathematics book, implicitly contains a vast number of other exercises, many of which are easier than those in Chapter E. I refer of course to the exercises you create by reading the statement of a result, and then trying to prove it for yourself, before you read the given proof. One other point about exercises: you will, for example, surely forgive my using expectation E in Exercises on Chapter 4 before E is treated with full rigour in Chapter 6.
Acknowledgements. My first thanks must go to the students who have endured the course on which the book is based and whose quality has made me try hard to make it worthy of them; and to those, especially David Kendall, who had developed the course before it became my privilege to teach it. My thanks to David Tranah and other staff of CUP for their help in converting the course into this book. Next, I must thank Ben Garling, James Norris and Chris Rogers without whom the book would have contained more errors and obscurities. (The many faults which surely remain in it are my responsibility.) Helen Rutherford and I typed part of the book, but the vast majority of it was typed by Sarah Shea-Simonds in a virtuoso performance worthy of Horowitz. My thanks to Helen and, most especially, to Sarah. Special thanks to my wife, Sheila, too, for all her help. But my best thanks - and yours if you derive any benefit from the book - must go to three people whose names appear in capitals in the Index: J.L. Doob, A.N. Kolmogorov and P. Levy: without them, there wouldn't have been much to write about, as Doob (1953) splendidly confirms. Stati:Jtical Laboratory, Cambridge
David Williams October 1990
A Question of Terminology Random variables: functions or equivalence classes?
At the level of this book, the theory would be more 'elegant' if we regarded a random variable as an equivalence class of measurable functions on the sample space, two functions belonging to the same equivalence class if and only if they are equal almost everywhere. Then the conditional-expectation map
X
t-+
E(Xlg)
would be a truly well-defined contraction map from LP(n,F, P) to LP(n, g, P) for p ;::: 1; and we would not have to keep mentioning versions (representatives of equivalence classes) and would be able to avoid the endless 'almost surely' qualifications. I have however chosen the 'inelegant' route: firstly, I prefer to work with functions, and confess to preferring
4+5
= 2 mod 7
to
[4h + [5h = [2h·
But there is a substantive reason. I hope that this book will tempt you to progress to the much more interesting, and more important, theory where the parameter set of our process is uncountable (e.g. it may be the timeparameter set [0,00». There, the equivalence-class formulation just will not work: the 'cleverness' of introducing quotient spaces loses the subtlety which is essential even for formulating the fundamental results on existence of continuous modifications, etc., unless one performs contortions which are hardly elegant. Even if these contortions allow one to formulate results, one would still have to use genuine functions to prove them; so where does the reality lie?!
xiii
A Guide to Notation
~ signifies something important, ~~ something very important, and the Martingale Convergence Theorem.
~~~
I use ':=' to signify 'is defined to equal'. This Pascal notation is particularly convenient because it can also be used in the reversed sense. I use analysts' (as opposed to category theorists') conventions:
N:= {1,2,3, ... } ~ {O, 1,2, ... } =: Z+. Everyone is agreed that R+ := [0,00). For a set B contained in some universal set S, IB denotes the indicator function of B: that is IB : S -+ {a, I} and
° ifsEl!,
IB(S):={l
otherWise.
For a, bE R, a vb:= max(a, b).
a 1\ b:= min(a, b),
CF:characteristic function; DF: distribution function; pdf: probability density function. a-algebra, a(C) (1.1); aCY, : 1 E C) (3.8, 3.13). 7r-system (1.6); d-system
(A1.2).
a.e.:
almost everywhere (1.5)
a.s.:
almost surely (2.4)
bE:
the space of bounded E-measurable functions (3.1) xiv
xv
A Guide to Notation
13 := 13(R) (1.2)
8(S):
the Borel q-algebra on S,
C.X:
discrete stochastic integral (10.6)
d>./dJ.l:
Radon-Nikodym derivative (5.14)
dQ/dP:
Likelihood Ratio (14.13)
E(X):
expectation E(X):= JQX(w)P(dw) of X (6.3)
E(XjF):
fFXdP (6.3)
E(XI9):
conditional expectation (9.3)
(En,ev):
lim inf En (2.8)
(En,i.o.):
lim sup En (2.6)
Ix:
probability density function (pdf) of X (6.12).
Ix,Y:
joint pdf (8.3)
IxlY:
conditional pdf (0.6)
Fx:
distribution function of X (3.9)
liminf:
for sets, (2.8)
lim sup:
for sets, (2.6)
x
=i limx n :
Xn
i x in that
Xn ::; X n +l
(Vn) and
log:
natural (base e) logarithm
Cx, Ax:
law of X (3.9)
CP, LP:
Lebesgue spaces (6.7, 6.13)
Leb:
Lebesgue measure (1.8)
Xn --+
x.
space of E-measurable functions (3.1) process M stopped at time T (10.9)
{M}:
angle-brackets process (12.12)
J.l(f):
integral of I with respect to J.l (5.0, 5.2)
J.l(fjA):
fA fdJ.l (5.0, 5.2)
cpx:
CF of X (Chapter 16)
cp:
pdf of standard normal N(O,l) distribution
DF of N(O,l) distribution
XT:
X stopped at time T (10.9)
Chapter 0
A Branching-Process Example (This Chapter is not essential for the remainder of the book. You can start with Chapter 1 if you wish.)
0.0. Introductory remarks The purpose of this chapter is threefold: to take something which is probably well known to you from books such as the immortal Feller (1957) or Ross (1976), so that you start on familiar ground; to make you start to think about some of the problems involved in making the elementary treatment into rigorous mathematics; and to indicate what new results appear if one applies the somewhat more advanced theory developed in this book. We stick to one example: a branching process. This is rich enough to show that the theory has some substance. 0.1. Typical number of children, X In our model, the number of children of a typical animal (see Notes below for some interpretations of 'child' and 'animal') is a random variable X with values in Z+. We assume that
P(X = 0) > O. We define the generating function where
1
1(8) := E(8 x ) =
of X as the map
L
1
8k p(X = k).
kEZ+
Standard theorems on power series imply that, for 8 E [0,1],
and JJ := E(X) = 1'(1) =
L 1
kP(X = k) ~
(X).
[0,1]
--->
[0,1],
2
Chapter 0: A Branching-Process Example
(0.1) ..
Of course, f'(I) is here interpreted as lim f(B) - f(l) Bj! () - 1
= lim 1 - f(B) Btl
1 - () ,
since f(l) = 1. We assume that Jl
< 00.
Notes. The first application of branching-process theory was to the question of survival of family names; and in that context, animal = man, and child = son. In another context, 'animal' can be 'neutron', and 'child' of that neutron will signify a neutron released if and when the parent neutron crashes into a nucleus. Whether or not the associated branching process is supercritical can be a matter of real importance. \Ve can often find branching processes embedded in richer structures and can then use the results of this chapter to start the study of more interesting things. For superb accounts of branching processes, see Athreyaand Ney (1972), Harris (1963), Kendall (1966, 1975). 0.2. Size of nth generation, Zn To be a bit formal: suppose that we are given a doubly infinite sequence
(a) of independent identically distributed random variables (lID RVs) , each with the same distribution as X: p(x~m)
= k) = P(X =
k).
The idea is that for n E Z+ and r E N, the variable X~n+1) represents the number of children (who will be in the (n+ l)th generation) ofthe rth animal (if there is one) in the nth generation. The fundamental rule therefore is that if Zm signifies the size of the nth generation, then
(b)
Z n+l --
X(n+l) 1
+ ••• + X(n+l) Zn·
We assume that Zo 1, so that (b) gives a full recursive definition of the sequence (Zm : m E Z+) from the sequence (a). Our first task is
Chapter 0: A Branching-Process Example
.. (O.S)
to calculate the distribution function of Zn, or equivalently to find the generating function
(c)
0.3. Use of conditional expectations
The first main result is that for n E Z+ (and 0 E [0,1])
(a) so that for each n E Z+,
in
is the n-fold composition
in = i 0 f
(b)
0 ... 0
f·
Note that the O-fold composition is by convention the identity map fo(O) = indeed, forced by - the fact that Zo = 1.
e, in agreement with -
To prove (a), we use - at the moment in intuitive fashion - the following very special case of the very useful Tower Property of Conditional Expectation:
(c)
E(U)
= EE(UIV);
to find the expectation of a random variable U, first find the conditional expectation E(UIV) of U given V, and then find the expectation of that. We prove the ultimate form of (c) at a later stage. We apply (c) with U =
Now, for k satisfies
E
eZn +
1
and V = Zn:
Z+, the conditional expectation of
OZn+l
given that Zn = k
(d) But Zn is constructed from variables X~r) with r S n, and so Zn is inden +1 ), ..• n+1 ). The conditional expectation given Zn = k pendent of in the right-hand term in (d) must therefore agree with the absolute expectation
xi
(e)
,xi
Chapter 0: A Branching-Process Example
-4
(o.S) ..
But the expression at (e) is a expectation of the product of independent random variables and as part of the family of 'Independence means multiply' results, we know that this expectation of a product may be rewritten as the product of expectations. Since (for every nand r) E(8x~n+l»)
= f(8),
we have proved that
and this is what it means to say that
= k, the conditional expectation is equal to the conditional expectation E(UIV = k) of U given that V = k. (Sounds reasonable!)] Property (c) now yields [If V takes only integer values, then when V E(UIV) of U given
~7
E8 Zn + 1
= Ef( 8)Zn ,
and, since
o
result (a) is proved. Independence and conditional expectations are two of the main topics in this course.
0.4. Extinction probability, 7r Let 7r n := P(Zn = 0). Then 7r n = fn(O), so that, by (0.3,b),
(a)
7r n +l
= f(7r n ).
Measure theory confirms our intuition about the extinction probability:
(b)
7r:= P(Zm = 0 for some m)
Because
=i lim7rn •
f is continuous, it follows from (a) that
(c)
7r=f(7r).
The function f is analytic on (0,1), and is non-decreasing and convex (of non-decreasing slope). Also, f(1) = 1 and f(O) = P(X = 0) > O. The slope 1'(1) of f at 1 is f-l = E(X). The celebrated pictures opposite now make the following Theorem obvious.
THEOREM If E(X) > 1, then the extinction probability 7r is the unique root of the equation 7r = f( 7r) which lies strictly between 0 and 1. If E(X) :s; 1, then 7r = 1.
.. (0·4)
Chapter 0: A Branching-Process Example
y = f(x)
~2r---------------~~-----r
~l~~------------~
y=x
o~----------~----~--------~
o
~l
=
Case 1: subcritical, J-l f' (1) < 1. Clearly, ~ The critical case J-l = 1 has a similar picture.
Case 2: supercritical, J-l = f'(I)
> 1.
Now,
11"
= 1.
< 1.
~2
6
Chapter 0: A Branching-Process Example
(0.5) ..
0.5. Pause for thought: measure
Now that we have finished revising what introductory courses on probability theory say about branching-process theory, let us think about why we must find a more precise language. To be sure, the claim at (OA,b) that
(a)
1['
=i lim 7rn
is intuitively plausible, but how could one prove it? We certainly cannot prove it at present because we have no means of stating with puremathematical precision what it is supposed to mean. Let us discuss this further. Back in Section 0.2, we said 'Suppose that we are given a doubly infinite sequence {X~m) : m, r E N} of independent identically distributed random variables each with the same distribution as X'. What does this mean? A random variable is a (certain kind of) function on a sample space n. We could follow elementary theory in taking n to be the set of all outcomes, in other words, taking to be the Cartesian product
n
the typical element w of
n being w
= (w~r)
: r EN,s EN),
and then setting X!r)(w) = w~r). Now n is an uncountable set, so that we are outside the 'combinatorial' context which makes sense of 7r n in the elementary theory. Moreover, if one assumes the Axiom of Choice, one can prove that it is impossible to assign to all subsets of n a probability satisfying the 'intuitively obvious' axioms and making the X's IID RVs with the correct common distribution. So, we have to know that the set of w corresponding to the event 'extinction occurs' is one to which one can uniquely assign a probability (which will then provide a definition of 7r). Even then, we have to prove (a). Example. Consider for a moment what is in some ways a bad attempt to construct a 'probability theory'. Let C be the class of subsets C of N for which the 'density'
p(C) := litm Uk : I S k S n; k E C} n
00
exists. Let C n := {I, 2, ... , n}. Then C n E C and C n i N in the sense that Cn ~ C n + 1 , Vn and also U C n = N. However, p(Cn) = 0, Vn, but peN) = 1.
.. (0.6)
Chapter 0: A Branching-Process Example
7
Hence the logic which will allow us correctly to deduce (a) from the fact that {Zn = O} i {extinction occurs} fails for the (N,C,p) set-up:
(~~,C,p)
is not 'a probability triple'.
0
There are problems. Measure theory resolves them, but provides a huge bonus in the form of much deeper results such as the Martingale Convergence Theorem which we now take a first look at - at an intuitive level, I hasten to add. 0.6. Our first martingale Recall from (0.2,b) that
Z n+1 -- X(n+l) 1
+ ... + X(n+l) Zn'
where the x.(n+l) variables are independent of the values ZI, Z2, . . , , Zn. It is clear from this that
a result which you will probably recognize as stating that the process Z = (Zn : n ~ 0) is a Markov chain. We therefore have
j
or, in a condensed and better notation,
(a) Of course, it is intuitively obvious that
(b) because each of the Zn animals in the nth generation has on average 11 children. We can confirm result (b) by differentiating the result
with respect to
e and setting e = 1.
Chapter 0: A Branching-Process Example
8
(0.6) ..
Now define
n 2:
(c)
o.
Then
E(Mn+lIZo, ZI,"" Zn) = M n, which exactly says that
(d) M is a martingale relative to the Z process. Given the history of Z up to stage n, the next value Mn+l of M is on average what it is now: AI is 'constant on average' in this very sophisticated sense of conditional expectation given 'past' and 'present'. The true statement
(e)
E(Mn)
= 1,
Vn
is of course infinitely cruder. A statement S is said to be true almost surely (a.s.) or with probability 1 if (surprise, surprise!)
peS is true) =1. Because our martingale J..,[ is non-negative (Aln 2: 0, Vn), the Martingale Convergence Theorem implies that it is almost surely true that
(f)
]..[00 :=
lim Mn
exists.
Note that if Moo > 0 for some outcome (which can happen with positive probability only when p. > 1), then the statement (a.s.) is a precise formulation of 'exponential growth'. A particularly fascinating question is: suppose that p. > 1; what is the behaviour of Z conditional on the value of Moo?
0.7. Convergence (or not) of expectations We know that Moo := lim Mn exists with probability 1, and that E(Mn) = 1, \In. We might be tempted to believe that E(Moo) = 1. However, we already know that if p. s: 1, then, almost surely, the process dies out and Mn IS eventually O. Hence
(a)
if p.
s: 1,
then Moo = 0 (a.s.) and 0= E(Moo) =l-lim E(Mn)
= 1.
.. (0.8)
Chapter 0: A Branching-Process Example
9
This is an excellent exan1ple to keep in mind when we come to study Fatou's Lemma, valid for any sequence (Yn ) of non-negative random variables: E(liminfYn) ~ liminfE(Yn ). What is 'going wrong' at (a) is that (when J.l ~ 1) for large n, the chances are that lvln will be large if Mn is not and, very roughly speaking, this large value times its small probability will keep E(Mn) at 1. See the concrete examples in Section 0.9.
°
Of course, it is very important to know when
(b)
limE(·)
= E(lim·),
and we do spend quite a considerable time studying this. The best general theorems are rarely good enough to get the best results for concrete problems, as is evidenced by the fact that E(Moo) = 1 if and only if both Jl.
(c)
> 1 and E(XlogX) < 00,
where X is the typical number of children. Of course 0 log 0 = O. If J.l > 1 and E(XlogX) = 00, then, even though the process may not die out, Moo = 0, a.s.
o.s.
Finding the distribution of Moo
Since Mn
-+
Moo (a.s.), it is obvious that for A > 0, exp( -AMn) -+ exp( ->.Moo)
(a.s.)
Now since each Mn ~ 0, the whole sequence (exp(->.Mn» is bounded in absolute value by the constant 1, independently of the outcome of our experiment. The Bounded Convergence Theorem says that we can now assert what we would wish:
(a)
Eexp(-AMoo)
= limEexp(->.Mn ).
Since Mn = Zn/ Jl.n and E(8 Zn ) = fn(8), we have
(b) so that, in principle (if very rarely in practice), we can calculate the left-hand side of (a). However, for a non-negative random variable Y, the distribution function y 1-+ P(Y ::; y) is completely determined by the map A 1-+ E exp( ->'Y)
on
(0,00).
Chapter 0: A Branching-Process Example
10
(0.8) ..
Hence, in principle, we can find the distribution of Moo. We have seen that the real problem is to calculate the function
L(>.)
:= E exp( ->.Moo).
Using (b), the fact that fn+l = ! 0 fn, and the continuity of L (another consequence of the Bounded Convergence Theorem), you can immediately establish the functional equation:
(c)
L(>'Il)
= !(L(>.)).
0.9. Concretfil example
This concrete example is just about the only one in which one can calculate everything explicitly, but, in the way of mathematics, it is useful in many contexts. We take the 'typical number of children' X to have a geometric distribution:
(a) where
o < p < 1,
q:= 1 - p.
Then, as you can easily check,
(b)
I!q8'
f(8) =
and 7r
To calculate f 0 f 0 ••• 0 of the upper half-plane. If
=
{P1/ q
P
if q if q
!, we use a G=
Il =~,
> p, :5 p.
device familiar from the geometry
(911 912) g21
g22
is a non-singular 2 x 2 matrix, define the fractional linear transformation:
(c)
.. (0.9)
11
Chapter 0: A Branching-Process Example
Then you can check that if H is another such matrix, then G(H(B»
= (GH)(B),
so that composition of fractional linear transformations corresponds to matrix multiplication. Suppose that p
we find that the
nth
f:.
q. Then, by the S-I AS = A method, for example, power of the matrix corresponding to I is
( 0 p)n = -q
1
(q - p)
(1 p) (pn 0) (q -p)
-I
1
q
0
-1
qn
1
'
so that
(d) If p, = q/p ~ 1, then limn In(B) process dies out. Suppose now that p,
=
1, corresponding to the fact that the
> 1. Then you can easily check that, for A > 0,
L(A): = Eexp(-AMoo) = limln(exp(-Ajp,n)) PA qA
+q +q -
P p
= 7re- A.O + lOO(l_7r?e-AXe-(I-1f)Xdx,
from which we deduce that and
P(Moo = 0) = P(x
7r,
< Moo < x + dx)
or, better,
P(Moo > x)
= (1 - 7r?e-(I-1f)xdx
= (l-7r)e-(I-1f)x
(x> 0),
(x > 0) ..
Suppose that p, < 1. In this case, it is interesting to ask: what is the distribution of Zn conditioned by Zn =1= O? We find that
where
Chapter 0: A Branching-Process Example
12 so 0
< an < 1 and an + /3..
= 1.
an
-+
As n
-+ 00,
1 - fJ,
/3n
(0.9)
we see that
-+
fJ,
so (this is justified) lim P(Zn = klZn =I- 0) = (1 - fJ)fJ k- 1
(e)
(k EN).
n-OO
Suppose that fJ = 1. You can show by induction that
f n (0)
= n - (n - 1)0 , (n + 1) - nO
and that corresponding to
P(Zn/n > xlZn =I- 0)
(f)
-+
e- x ,
x> O.
'The FatoD factor' We know that when fJ ::; 1, we have E(Mn) = 1, \In, but E(Moo) = we get some insight into this? First consider the case when fJ for large n,
o.
Can
< 1. Result (e) makes it plausible that
E(ZnIZn =I- 0) is roughly (1- fJ) E kfJk-l = 1/(1- fJ)· We know that
P(Zn =I- 0) = 1 - fn(O) is roughly (1 - fJ)fJ n , so we should have (roughly)
E(Mn) = E
(!= IZn
= (1 _
=I- 0) P(Zn =f 0)
~)fJn (1 -
#)fJ n = 1,
which might help explain how the 'balance' E(Mn) values times small probabilities.
= 1 is achieved by big
.. (0.9)
Chapter 0: A Branching-Process Example
Now consider the case when
P(Zn
~
= 1.
19
Then
=I 0) = l/(n + 1),
and, from (f), Zn/n conditioned by Zn =I 0 is roughly exponential with mean 1, so that Mn = Zn conditioned by Zn =I 0 is on average of size about n, the correct order of magnitude for balance. Warning. We have just been using for 'correct intuitive explanations' exactly the type of argument which might have misled us into thinking that E(.1\100) = 1 in the first place. But, of course, the result
is a matter of obvious fact.
PART A: FOUNDATIONS Chapter 1
Measure Spaces
1.0. Introductory remarks Topology is about open sets. The characterizing property of a continuous function I is that the inverse image 1-1 (G) of an open set G is open. Measure theory is about measurable sets. The characterizing property of a measurable function I is that the inverse image 1-1 (A) of any measurable set is measurable. In topology, one axiomatizes the notion of 'open set', insisting in particular that the union of any collection of open sets is open, and that the intersection of a finite collection of open sets is open. In measure theory, one axiomatizes the notion of 'measurable set', insisting that the union of a countable collection of measurable sets is measurable, and that the intersection of a countable collection of measurable sets is also measurable. Also, the complement of a measurable set must be measurable, and the whole space must be measurable. Thus the measurable sets form a u-algebra, a structure stable (or 'closed') under count ably many set operations. Without the insistence that 'only count ably many operations are allowed', measure theory would be self-contradictory - a point lost on certain philosophers of probability.
The probability that a point chosen at random on the surface of the unit sphere 52 in R3 falls into the subset F of 52 is just the area of F divided by the total area 411". What could be easier? However, Banach and Tarski showed (see Wagon (1985)) that if the Axiom of Choice is assumed, as it is throughout conventional mathematics, then there exists a subset F of the unit sphere 52 in R3 such that for
Chapter 1: Measure Spaces
.. (1.1) 3 :5 k < of F:
00
(and even for k
= (0), 52
15
is the disjoint union of k exact copies
;=1 where each T;(k) is a rotation. If F has an 'area', then that area must simultaneously be 411"/3,411"/4, ... ,0. The only conclusion is that the set F is non-measurable (not Lebesgue measurable): it is so complicated that one cannot assign an area to it. Banach and Tarski have not broken the Law of Conservation of Area: they have simply operated outside its jurisdiction. Remarks. (i) Because every rotation T has a fixed point x on S2 such that = x, it is not possible to find a subset A of 52 and a rotation T such that AU T(A) = 52 and An T(A) = 0. So, we could not have taken k = 2. (ii) Banach and Tarski even proved that given any two bounded subsets A and B of R3 each with non-empty interior, it is possible to decompose A into a certain finite number n of disjoint pieces A = U~1 A; and B into the same number n of disjoint pieces B = U7=1 B;, in such a way that, for each i, A; is Euclid-congruent to B;!!! So, we can disassemble A and rebuild it as B. (iii) Section A1.1 (optional!) in the appendix to this chapter gives an Axiom-of-Choice construction of a non-measurable subset of 51.
T( x)
This chapter introduces u-algebras, 1I"-systems, and measures and emphasizes monotone-convergence properties of measures. We shall see in later chapters that, although not all sets are measurable, it IS always the case for probability theory that enough sets are measurable.
1.1. Definitions of algebra, u-algebra Let 5 be a set. Algebra on 5 A collection ~o of subsets of 5 is called an algebra on 5 (or algebra of subsets of 5) if (i) 5 E ~o, (ii) F E ~o ::::} FC:= 5\F E ~o, (iii) F,G E ~o ::::} FuG E ~o. [Note that 0 = 5 c E ~o and
16
Chapter 1: Measure Spaces
(1.1) ..
Thus, an algebra on S is a family of subsets of S stable under finitely many set operations. Exercise (optional). Let C be the class of subsets C of N for which the 'density' lim m-1Hk: 15 k 5 m;k E C} mToo
exists. We might like to think of this density (if it exists) as 'the probability that a number chosen at random belongs to C'. But there are many reasons why this does not conform to a proper probability theory. (We saw one in Section 0.5.) For example, you should find elements F and Gin C for which FnG ~ C. Note on terminology ('algebra versus field'). An algebra in our sense is a true algebra in the algebraists' sense with n as product, and symmetric difference A6.B := (A U B)\(A n B) as 'sum', the underlying field of the algebra being the field with 2 elements. (This is why we prefer 'algebra of subsets' to 'field of subsets': there is no way that an algebra of subsets is a field in the algebraists' sense - unless 1:0 is trivial, that is, 1:0 = {S, 0}.) u-algebra on S A collection 1: of subsets of S is called a u-algebra on S (or u-algebra of subsets of S) if 1: is an algebra on S such that whenever Fn E 1: (n EN), then n
[Note that if 1: is a u-algebra on Sand Fn E 1: for n E N, then
n
n
Thus, a u-algebra on S is a family of subsets of S 'stable under any countable collection of set operations'. Note. Whereas it is usually possible to write in 'closed form' the typical element of many of the algebras of sets which we shall meet (see Section 1.8 below for a first example), it is usually impossible to write down the typical element of a u-algebra. This is the reason for our concentrating where possible on the much simpler '1r-systems'. Measurable space A pair (S,1:), where S is a set and 1: is a u-algebra on S, is called a measurable space. An element of 1: is called a 1:-measurable subset of S.
.. (1.2)
Chapter 1 : Measure Spaces
17
a(C), a-algebra generated by a class C of subsets Let C be a class of subsets of S. Then a(C), the a-algebra generated by C, is the smallest a-algebra I: on S such that C ~ I: . It is the intersection of all a-algebras on S which have C as a subclass. (Obviously, the class of all subsets of S is a a-algebra which extends C.)
1.2. Examples. Borel a-algebras, B(S), B = 8(R) Let S be a topological space.
8(S) B(S), the Borel a-algebra on S, is the a-algebra generated by the family of open subsets of S. With slight abuse of notation, B(S):= a(open sets). B:= B(R) It is standard shorthand that B := B(R). The a-algebra B is the most important of alIa-algebras. Every subset of R which you meet in everyday use is an element of B; and indeed it is difficult (but possible!) to find a subset of R constructed explicitly (without the Axiom of Choice) which is not in B. Elements of 8 can be quite complicated. However, the collection 7r(R) := {( -00, xl
:x
E R}
(not a standard notation) is very easy to understand, and it is often the case that all we need to know about B is that
(a)
B = a(7r(R)). Proof of (a). For each x in R, (-00, xl = nnEN( -00, x + n -1), so that as a countable intersection of open sets, the set (-00, xl is in B.
All that remains to be proved is that every open subset G of R is in a(7r(R)). But every such G is a countable union of open intervals, so we need only show that, for a, b E R with a < b, (a,b) E a(7r(R)).
But, for any u with u > a,
(a, uJ = (-00, uJ n (-00, ale E a(7r(R)), and since, for e
= Hb -
a),
(a, b) = U(a, b - en-I], n
we see that ( a, b) E a( 7r(R)), and the proof is complete.
o
(1.3) ..
Chapter 1: Measure Spaces
18
1.3. Definitions concerning set functions
Let S be a set, let Eo be an algebra on S, and let J.to be a non-negative set function
J.to : Eo
-+
[O,ooJ.
Additive Then J.to is called additive if J.to(0)
F nG = 0
= 0 and, for F, G E Eo,
J.to(F U G) = J.to(F)
=}
+ J.to(G).
Countably additive The map J.to is called countably additive (or a-additive) if J.t(0) = 0 and whenever (Fn : n E N) is a sequence of disjoint sets in Eo with union F = UFn in Eo (note that this is an assumption since Eo need not be a a-algebra), then n
Of course (why?), a count ably additive set function is additive. 1.4. Definition of measure space
Let (S, E) be a measurable space, so that E is a a-algebra on S. A map
J.t: E -+ [O,ooJ. is called a measure on (S, E) if J.t is countably additive. The triple (S, E, J.t) is then called a measure space. 1.5. Definitions concerning measures
Let (S, E, J.t) be a measure space. Then J.t (or indeed the measure space (S, E, J.t» is called finite if J.t(S) a-finite
< 00,
if there is a sequence (Sn : n E N) of elements of E such that
J.t(Sn) <
00
(Vn E N) and USn = S.
Warning. Intuition is usually OK for finite measures, and adapts well for a-finite measures. However, measures which are not a-finite can be crazy; fortunately, there are no such measures in this book.
.. (1.6)
19
Chapter 1: Measure Spaces
Probability measure, probability triple Our measure P is called a probability measure if
peS) = 1, and (S,}j, p) is then called a probability triple. p-null element of}j, almost everywhere (a.e.) An element F of}j is called p-null if p(F) = O. A statement S about points s of S is said to hold almost everywhere (a.e.) if F:= {s: S(s) is false} E}j and Jl.(F)
= o.
1.6. LEMMA. Uniqueness of extension, 7r-systems
Moral: a-algebras are 'difficult', but IT-systems are 'easy'; so we aim to work with the latter. ~(a)
Let S be a set. Let I be a IT-system on S, that is, a family of subsets of S stable under finite intersection:
Let }j := a(I). Suppose that Jl.I and P2 are measures on (S,}j) such that PI(S) = /-I2(S) < 00 and Jl.I = Jl.2 on I. Then
PI ~(b)
= Jl.2
on }j.
Corollary. If two probability measures agree on a IT-system, then they agree on the a-algebra generated by that 7r-system.
The example 13 = a(7r(R)) is of course the most important example of the }j = a(I) in the theorem. This result will play an important role. Indeed, it will be applied more frequently than will the celebrated existence result in Section 1.7. Because of this, the proof of Lemma 1.6 given in Sections A1.2-1.4 of the appendix to this chapter should perhaps be consulted - but read the remainder of this chapter first.
1.1. THEOREM. ~
(1.7) ..
Chapter 1: Measure Spaces
20
CaratlH~odory's
Extension Theorem
Let S be a set, let I:o be an algebra on S, and let
If flo is a countably additive map flo : I:o measure fl on (S, I:) such that fl
= flo
-+
[0,00]' then there exists a
on I:o.
If floeS) < 00, then, by Lemma 1.6, this extension is unlque - an algebra is a 7r-system!
In a sense, this result should have more ~ signs than any other, for without it we could not construct any interesting models. However, once we have our model, we make no further use of the theorem. The proof of this result given in Sections A1.5-1.8 of the appendix is there for completeness. It will do no harm to assume the result for this course. Let us now see how the theorem is used.
1.8. Lebesgue measure Leb on «0,1]' B(O, 1]) Let S = (0,1]. For F ~ S, say that FE I:o if F may be written as a finite umon
where r E N, 0 (0,1] and
~ al
S; bI S; ..• S;
ar
S; br S; 1. Then I:o is an algebra on
I: := a(I: o) = B(O, 1]. (We write B(O, 1] instead of B«O, 1]).) For F as at (*), let
flo(F)
= L(bk -
ak).
k::;r
Then flo is well-defined and additive on I:o (this is easy). Moreover, flo is countably additive on I: o. (This is not trivial. See Section A1.9.) Hence, by Theorem 1.7, there exists a unique measure fl on «0,1], B(O, 1]) extending flo on I:o. This measure fl is called Lebesgue measure on «0,1], B(O, 1]) or (loosely) Lebesgue measure on (0,1]. We shall often denote fl by Leb. Lebesgue measure (still denoted by Leb) on ([0, l],B[O, 1]) is of course obtained by a trivial modification, the set {OJ having Lebesgue measure o. Of course, Leb makes precise the concept of length. In a similar way, we can construct (a-finite) Lebesgue measure (which we also denote by Leb) on R (more strictly, on (R, B(R».
21
Chapter 1: Measure Spaces
·.(1.10)
1.9. LEMMA. Elementary ineqnalities Let (S, E, jl) be a measure space. Then
(a) jl(A. U B) S jl(A) + jl(B)
(A,B E E),
... (b) jl(Ui.$n Fi) ~ L:i.$n jl(Fi) Furthermore, if jl(S)
<
00,
then
(c) jl(A.UB)=jl(A)+jl(B)-jl(AnB)
(A,BEE),
(d) (inclusion-exclusion formula): for Fl, F2' ... ' Fn E E,
jl(Ui.$n Fi) = I:i.$n jl(Fi) - I: I:i
successive partial sums alternating between over- and under-estimates.
You will surely have seen some version of these results previously. Result (c) is obvious because AU B is the disjoint union AU (B\(A n B». But (c)*(a)*(b) - check that 'infinities do not matter'. You can deduce (d) from (c) by induction, but, as we shall see later, the neat way to prove (d) is by integration. 1.10. LEMMA. Monotone-convergence properties of measures . These results are often needed for making things rigorous. (Peep ahead to the 'Monkey typing Shakespeare' Section 4.9.) Again, let (S, E, jl) be a measure space . ... (a) If Fn E E
(n EN) and Fn
T F, then jl(Fn) TJl(F).
Notes. Fn T F means: Fn ~ F n+ l (\In EN), the fundamental property of measure. Proof of (a). Write G 1 := F 1 , G n (n E N) are disjoint, and
G n := Fn \Fn- l
jl(Fn) = jl(G 1 U G 2 U ... U Gn) =
UFn = F. Result (a) is (n 2:: 2). Then the sets
I: jl(GkH I: jl(Gk) = k.$n
jl(F).
D
k
Application. In a proper formulation of the branching-process example of Chapter 0,
{Zn = O}
T{extinction occurs},
so that
11" n
T11".
(A proper formulation of the branching-process example will be given later.)
(1.10) ..
Chapter 1: Measure Spaces
22
for some k, then p(G n ) ! p(G). Proof of (b). For n E N, let Fn := Gk \GHn, and now apply part (a). 0
~(b) If
Gn E I:, Gn ! G and p(Gk) <
00
Example - to indicate what can 'go wrong'. For n E N, let
Hn := (n,oo). Then Leb(Hn ) = ~( c)
00, "In,
but Hn
! 0.
The union of a countable number of p-null sets is p-null.
This is a trivial corollary of results (1.9,b) and (1.lO,a).
1.11. Example/Warning Let (S, I:, p) be ([0,1], B[O, 1], Leb). Let c:(k) be a sequence of strictly positive numbers such that c:(k) ! 0. For a single point x of S, we have {x} ~ (x - c:(k),x
(a)
+ c:(k)) n S,
so that for every k, p({x}) < 2c:(k), and so p({x}) = 0. That {x} is B(S)measurable follows because {x} is the intersection of the countable number of open subsets of S on the right-hand side of (a). Let V = Q n [0,1], the set of rationals in [0,1]. Since V is a countable union of singletons: V = {v n : 11 E N}, it is clear that V is B[O,l]measurable and that Leb(V) = 0. We can include V in an open subset of S of measure at most 4c:( k) as follows:
nEN
n
°
nk k
Clearly, H := G satisfies Leb(H) = and V ~ H. Now, it is a consequence of the B aire category theorem (see the appendix to this chapter) that H is uncountable, so
(b) the set H is an uncountable set of measure 0; moreover,
H= nUln,k =I un1n,k = V. k
n
n
k
Throughout the subject, we have to be careful about interchanging orders of operations.
Chapter 2
Events
2.1. Model for experiment:
(n, F, P)
A model for an experiment involving randomness takes the form of a probability triple (n, F, P) in the sense of Section 1.5.
Sample space fl is a set called the sample space_
Sample point A point w of n is called a sample point.
Event The a-algebra F on n is called the family of events, so that an event is an element of F, that is, an F-measurable subset of
n_
By definition of probability triple, P is a probability measure on
(n, F).
2.2. The intuitive meaning Tyche, Goddess of Chance, chooses a point w of n 'at random' according to the law P in that, for F in F, P(F) represents the 'probability' (in the sense understood by our intuition) that the point w chosen by Tyche belongs to F. The chosen point w determines the outcome of the experiment. Thus there is a map --+ set of outcomes, w 1-+ outcome.
n
There is no reason why this 'map' (the co-domain lies in our intuitio~!) should be one-one. Often it is the case that although there is some obvious 'minimal' or 'canonical' model for an experiment, it is better to use som~ richer model. (For example, we can read off many properties of coin tossing by imbedding the associated random walk in a Brownian motion.)
23
(2.9) ..
Chapter 2: Events
24
2.3. Examples of (11, F) pairs We leave the question of assigning probabilities until later. (a) Experiment: Toss coin twice. We can take 11 = {HH, HT, TH, TT},
F = P(11) := set of all subsets of 11.
In this model, the intuitive event 'At least one head is obtained' is described by the mathematical event (element of F) {H H, HT, T H}. (b) Experiment: Toss coin infinitely often. We can take
so that a typical point w of 11 is a sequence
We certainly wish to speak of the intuitive event 'w n = W', where W E {H, T}, and it is natural to choose
F
= u( {w E 11 : Wn = W}
: n E N, W E {H, T}).
Although F 1= P(11) (accept this!), it turns out that F is big enough; for example, we shall see in Section 3.7 that the truth set
F
= {W : ~(k $
n
~Wk = H)
-+
~}
of the statement number of heads in n tosses n
1 2
-+ -
is an element of F. Note that we~n use the current model as a more informative model for the experiment in (a), using the map W >-+ (Wl,W2) of sample points to outcomes.
°
( c) Experiment: Choose a point between and 1 uniformly at random. Take 11 = [O,l],F = B[O,l],w signifying the point chosen. In this case, we obviously take P =Leb. The sense in which this model contains model (b) for the case of a fair coin will be explained later.
.. (2.5)
Chapter 2: Events
25
2.4. Almost surely (a.s.) A statement S about outcomes is said to be true almost surely (a.s.), or with probability 1 (w.p.1), if
F:= {w : Sew) is true} E :F and P(F) (a) Proposition. If Fn E:F
(n EN) and P(Fn)
penn Fn) Proof. P(F~)
= 1.
= 1, Vn,
then
= 1.
= O,Vn, so, by Lemma 1.l0(c),
P(UnF~)
= O. But nFn =
(UF~Y.
0
(b) Something to think about. Some distinguished philosophers have tried to develop probability without measure theory. One of the reasons for difficulty is the following. When the discussion (2.3,b) is extended to define the appropriate probability measure for fair coin tossing, the Strong Law of Large Numbers (SLLN) states that F E :F and P(F) = 1, where F, the truth set of the statement 'proportion of heads in n tosses -+ !', is defined formally in (2.3,b). Let A be the set of all maps a : N For a E A, let
-+
N such that a(l)
_ { . #(k :::; n : W",(k) = H) F"'w. n
-+
< a(2) < ...
~} 2
the 'truth set of the Strong Law for the subsequence a'. Then, of course, we have P(F",) = 1, Va E A. Exercise. Prove that
n Fa = 0.
",EA
(Hint. For any given w, find an a ... . ) The moral is that the concept of 'almost surely' gives us (i) absolute precision, but also (ii) enough flexibility to avoid the self-contradictions into which those innocent of measure theory too easily fall. (Of course, since philosophers are pompous where we are precise, they are thought to think deeply ... .) 2.5. Reminder: lim sup, liminf, (a) Let (x n
:
1 lim, etc.
n E N) be a sequence of real numbers. We define
limsupxn := inf {sup ,
m
n~m
xn} =1 lim {sup xn} E [-oo,ooJ. m
n~m
(2.5) ..
Chapter 2: Events
26
Obviously, Ym := SUPn>m Xn is monotone non-increasing in m, so that the limit of the sequence Y';; exists in [-00, ooJ. The use of jlim or llim to signify monotone limits will be handy, as will Yn 1 Yoo to signify Yeo =llim Yn' (b) Analogously, liminfxn :=suP { inf xn} =ilim{ inf xn} E [-00,00]. m
n2:m
m
n~m
(c) We have Xn
converges in [-00,00]
{=:::}
limsupxn = liminfx n ,
and then limx n = limsupxn = liminfxn. ~(d)
Note that (i) if z > limsupxn, then x n < z eventually (that is, for all sufficiently large n) (ii) if z
< lim sup X n , then Xn > z infinitely often
(that is, for infinitely many n).
2.6. Definitions. lim sup En, (En, i.o.) The event (in the rigorous formulation: the truth set of the statement) 'number of heads/ number of tosses -> !' is built out of simple events such as 'the nth toss results in heads' in a rather complicated way. We need a systematic method of being able to handle complicated combinations of events. The idea of taking lim infs and lim sups of sets provides what is required.
It might be helpful to note the tautology that, if E is an event, then
E = {w: wEE}. Suppose now that (En: n E N) is a sequence of events. ~(a)
We define
(En' i.o.): = (En infinitely often) : = lim sup En :=
nU m
= {w : for every m,
En
n2:m
3n(w)
~
m such that w E En(w)}
= {w: wEEn for infinitely many n}. I
.. (2.8) ~(b)
Chapter 2: Events
27
(Reverse Fatou Lemma - needs FINITENESS of P) P(limsupEn) ~ limsupP(En).
Proof. Let G m := Un>m En. Then (look at the definition in (a)) Gm 1 G, where G:= lim sup En-:- By result (l.lO,b), P(G m ) 1 peG). But, clearly,
P(G m) ~ sup PeEn). n2: m
Hence, P( G)
~llim { sup P( En)} =: lim sup P( En). m
n~m
o
2.7. First Borel-Cantelli Lemma (BCl) Let (En : n E N) be a sequence of events such that Ln PeEn) < 00. Then P(limsupEn) = peEn, i.o.) = 0. Proof. With the notation of (2.6,b), we have, for each m,
peG) ~ P(Gm) ~ using (1.9,b) and (l.lO,a). Now let m
i
L
peEn),
00.
o
Notes. (i) An instructive proof by integration will be given later.
(ii) Many applications of the First Borel-Cantelli Lemma will be given within this course. Interesting applications require concepts of independence, random variables, etc .. 2.8. Definitions. liminfEn,
(En, ev)
Again suppose that (En : n E N) is a sequence of events. ~(a) We define
(En, ev) : = (En eventually) : = lim inf En :=
U n En
= {w: for some mew), wE En,Vn ~ mew)} = {w: wEEn for all large n}. (b) Note that (En, evy = (E~, i.o.). ~~(c)
(Fatou's Lemma for sets - true for ALL measure spaces) P(liminf En)
~
liminfP(E n ).
Exercise. Prove this in analogy with the proof of result (2.6,b), using (l.lO,a) rather than (1.10,b).
28
(2.9) ..
Chapter 2: Events
2.9. Exercise
For an event E, define the indicator function IE on
n via
I ( ) ._ {I, if wEE, E W.0, if w ~ E. Let (En: n E N) be a sequence of events. Prove that, for each w,
and establish the corresponding result for lim infs.
Chapter 3
Random Variables
Let
(S,~)
be a measurable space, so that
~
is a a-algebra on S.
3.1. Definitions. ~-measurable function, m~, (m~)+, b~ Suppose that h: S
-+
R. For A
~
R, define
h-1(A) := {s E S: h(s) E A}.
Then h is called ~-measurable if h- 1 : B So, here is a picture of a
~-measurable
-+ ~,
that is, h-1(A) E ~, VA E B.
function h:
We write m~ for the class of ~-measurable functions on S, and (m~)+ for the class of non-negative elements in mL We denote by b~ the class of bounded ~-measurable functions on S.
Note. Because lim sups of sequences even of finite-valued functions may be infinite, and for other reasons, it is convenient to extend these definitions to functions h taking values in [-00,00] in the obvious way: h is called ~-measurable if h- 1 : B[-oo,oo]-+ ~. Which of the various results stated for real-valued functions extend to functions with values in [-00, ooj, and what these extensions are, should be obvious. Borel function A function h from a topological space S to R is called Borel if h is B(S)measurable. The most important case is when S itself is R.
29
(9.2)..
Chapter 9: Random Variables
90
3.2. Elementary Propositions on measurability The map h- l preseMJes all set operations: h-I(Ua Aa) = Ua h-I(Aa ), h-I(AC) = (h-I(AW, Proof This is just definition chasing.
(a)
etc. 0
IfC ~ 8 and u(C) = 8, then h- l : C -+ E => hEmE. Proof. Let £ be the class of elements B in 8 such that h-I(B) E E. By result (a), £ is a u-algebra, and, by hypothesis, £ ;;2 C. 0 (c) If S is topological and h : S -+ R is continuous, then h is Borel. Proof. Take C to be the class of open subsets of R, and apply result (b). 0 ~(d) For any measurable space (S, E), a function h: S -+ R is E-measurable if {h S c} := {s E S: h(s) S c} E E (Vc E R).
~(b)
Proof Take C to be the class 7I'(R) of intervals of the form (-00, cl, cE R, and apply result (b). o Note. Obviously, similar results apply in which {h S c} is replaced by {h> c}, {h ~ c}, etc.
3.3. LEMMA. Sums and products of measurable functions are m.easurable ~
mE is an algebra over R, that is, if A E Rand h, hI, hz E mE, then
hI
+ hz E mE,
hlh2 E mE,
Ah E mE.
Example of proof Let c E R. Then for s E S, it is clear that hl(S)+h2(S) if and only if for some rational q, we have
>c
In other words,
{hl+hz>c}= U({h l >q}n{h2 >c-q}), qEQ
a countable union of elements of E.
o
.. (S.6)
Chapter S: Random Variables
31
3.4. Composition Lemma. If h E
m~
and f E mB, then
f
0
hE
m~.
Proof. Draw the picture:
S~R-LR
~C8C8 Note. There are obvious generalizations based on the definition (important in more advanced theory): if (St, ~d and (S2, ~2) are measurable spaces and h : SI -+ S2, then h is called ~d~2-measurable if h- l : ~2 -+ ~l' From this point of view, what we have called ~-measurable should read ~/B-measurable (or perhaps ~/8[-00, ooJ-measurable).
3.5. LEMMA on measurability of infs, lim infs of functions ~
Let (hn : n E N) be a sequence of elements of m~. Then
(i) inf hn, (ii) liminf hn' (iii) limsuph n are ~-measurable (into ([-00,00], 8[-00,00]), but we shall still write inf h n E m~ (for example)). Further, (iv) {s : limhn(s) exists in
R} E ~.
nn
Proof. (i) {inf h n ~ c} = {hn ~ c}. (ii) Let Ln(s) := inf{hr(s) : r ~ n}. Then Ln E m~, by part (i). But L(s):= liminf hn(s) =i limLn(s) = sup Ln(s), and {L:::; c} = nn{Ln:::; c} E~. (iii) This part is now obvious. (iv) This is also clear because the set on which limh n exists in R is
{lim sup h n < oo}
n {liminf h n > -oo} n g-l( {O}),
where g := lim sup h n -liminf hn'
o
3.6. Definition. Random variable ~Let
(n,F) be our (sample space, family of events). A random variable is an element of mF. Thus, X :
n -+ R,
X-I: B -+ F.
Chapter 9: Random Variables
92
(9.7) ..
3.7. Example. Coin tossing Let
n = {H, T}N, W = (Wt,W2, •• . ), Wn :F = a( {w : Wn
= W}
Let
E {H, T}. As in (2.3,b), we define
: n E N, W E {H, T}). ifw n = H, if Wn = T.
The definition of :F guarantees that each Xn is a random variable. By Lemma 3.3,
Sn := Xl
+ X 2 + ... + Xn =
number of heads in n tosses
is a random variable. Next, for p E [0,1], we have number of heads } A:= { w: -+p ={w:L+(w)=p}n{w:L-(w)=p}, number of tosses
where L+ := limsupn-ISn and L- is the corresponding lim info By Lemma
3.5, A E:F. ~~
Thus, we have taken an important step towards the Strong Law: the result is meaningful! It only remains to prove that it is true! 3.8. Definition. a-algebra generated by a collection of functions
on
n
This is an important idea, discussed further in Section 3.14. (Compare the weakest topology which makes every function in a given family continuous, etc.) In Example 3.7, we have a given set
n,
a family (Xn : n E N) of maps Xn :
n -+ R.
The best way to think of the a-algebra :F in that example is as
:F = a(Xn : n E N) in the sense now to be described. ~~Generally, if we have a collection
Y
(YI' : I E C) of maps YI' : n -+ R, then
:= a(YI' : I E C)
·.(3.10)
99
Chapter 3: Random Variables
is defined to be the smallest a-algebra Y on Q such that each map Y-y (, E C) is Y-measurable. Clearly, a(Y-y : 1 E C) = a( {w E Q: Y-y(w) E B} : 1 E C, BE 8). If X is a random variable for some (Q, F), then, of course, a(X)
~
F.
Remarks. (i) The idea introduced in this section is something which you will pick up gradually as you work through the course. Don't worry about it now; think about it, yes! (ii) Normally, 7r-systems come to our aid. For example, if (Xn : n E N) is a collection of functions on Q, and Xn denotes a(Xk : k ~ n), then the union U Xn is a 7r-system (indeed, an algebra) which generates a(Xn : n EN). 3.9. Definitions. Law, distribution function Suppose that X is a random variable carried by some probability triple (Q, F, P). We have
[0,1]
Define the law
ex
or indeed [0,1] of X by
ex:= ex
P +--
P +--
x- 1
F +--8, x- 1 a(X) +--8.
ex :
P oX-l, 8 -+ [0,1] . . Then (Exercise!) is a probability measure on (R,8). Since 7r(R) = {( -00, c] : c E R} is a 7r-system which generates 8, Uniqueness Lemma 1.6 shows that ex is determined by the function Fx : R -+ [0,1] defined as follows:
Fx(c) := ex( -00, c] = P(X ~ c) = P{w : X(w) ~ c}. The function Fx is called the distribution function of X. 3.10. Properties of distribution functions Suppose that F is the distribution function F = Fx of some random variable X. Then (a) F: R -+ [0,1], F j(that is, x ~ y => F(x) ~ F(y)), (b) limx_ooF(x) = 1, limx __ ooF(x) = 0, (c) F is right-continuous. Proof of (c). By using Lemma (1.10,b), we see that
P(X
~ x
+ n- l ) !
P(X
~ x),
and this fact together with the monotonicity of Fx shows that Fx is rightcontinuous. Exercise! Clear up any loose ends.
(S.ll) ..
Chapter S: Random Variables
3.11. Existence of random variable with given distribution function ~If
F has the properties (a,b,c) in Section 3.10, then, by analogy with Section 1.8 on the existence of Lebesgue measure, we can construct a unique probability measure C on (R, B) such that
Take (f!,F,P)
=
C(-oo,xJ = F(x), Vx. (R,B,C), X(w) = w. Then it is tautological that Fx(x) = F(x), Vx.
Note. The measure C just described is called the Lebesgue-Stieltjes measure associated with F. Its existence is proved in the next section.
3.12. Skorokhod representation of a random variable with prescribed distribution function Again let F : R --+ [0, 1J have properties (3.10,a,b,c). We can construct a random variable with distribution function F carried by
(f!,F, P) = ([0, IJ, B[O, 1], Leb) as follows. Define (the right-hand equalities, which you can prove, are there for clarification only) (al)
X+(w) := inf{z : F(z) > w} = sup{y: F(y) ::; w},
(a1)
X-(w):= inf{z: F(z)
~
w} = sup{y: F(y) < w}.
The following picture shows cases to watch out for.
1
1
W
~ I
--,
I
OL-~------L---------~-------
X-(F(x))
x
By definition of X-,
(w ::; F(c))
=}
(X-(w) ::; c).
X+(F(x))
Chapter 9: Random Variables
.. (9.12)
95
Now,
(z > X-(w))
=>
(F(z)
~
w),
so, by the right-continuity of F, F(X-(w» ~ w, and
Thus, (w :=:; F(c»
<==* (X-(w):=:; c), so that
P(X- :=:; c) = F(c). (b)
The variable X- therefore has distribution function F, and the measure .c in Section 9.11 is just the law of X-.
It will be important later to know that
(c)
X+ also has distribution function F, and that, indeed,
Proof of (c). By definition of X+,
(w < F(c»
=>
(X+(w) :=:; c),
so that F(c):=:; P(X+ :=:; c). Since X- :=:; X+, it is clear that
{X- =J X+} =
U{X- :=:; c < X+}. cEQ
But, for every c E R,
Since Q is countable, the result follows.
o
Remark. It is in fact true that every experiment you will meet in this (or any other) course can be modelled via the triple ([0,1),8[0,1]' Leb). (You will start to be convinced of this by the end of the next ch~pter.) However, this observation normally has only curiosity value. '
{3.13}..
Chapter 3: Random Variables
36
3.13. Generated u-algebras - a discussion Suppose that (n, F, P) is a model for some experiment, and that the experiment has been performed, so that (see Section 2.2) Tyche has made her choice of w. Let (Y.." : I E C) be a collection of random variables associated with our experiment, and suppose that someone reports to you the following information about the chosen point w: (*) the values Y..,,(w), that is, the observed values of the random variables Y"),(,EC). Then the intuitive significance of the u-algebra Y := u(Y")' : I E C) is that it consists precisely of those events F for which, for each and every w, you can decide whether or not F has occurred (that is, whether or not w E F) on the basis of the information (*); the information (*) is precisely equivalent to the following information: (**) the values IF(w) (F E Y). (a) Exercise. Prove that the u-algebra u(Y) generated by a single random variable Y is given by
u(Y)
= y-I(B):= ({w: Yew) E B}:
BE B),
and that u(Y) is generated by the 7r-system
7r(Y):= ({w: Y(w)::; x}: x E R) = y-l(rr(R».
o
The following results might help clarify things. Good advice: stop reading this section after (c)! Results (b) and ( c) are proved in the appendix to this chapter. (b) If Y : n -> R, then Z : n -> R is an u(Y)-measurable function if and only if there exists a Borel function I : R -> R such that Z = I(Y). (c) If YI, Y2 , ••• , Yn are functions from n to R, then a function Z : n -> R is u(Yi., Y2 , ••• , Yn)-measurable if and only if there exists a Borel function f on Rn such that Z = f(Yi.,}2, ... , Yn). We shall see in the appendix that the more correct measurability condition on I is that I be 'Bn-measurable'. (d) If (Y")' : I E C) is a collection (parametrized by the infinite set C) of functions from n to R, then Z : n -> R is u(Y")' : I E C)-measurable if and only if there exists a countable sequence (,i: i E N) of elements of C and a Borel function I on RN such that Z = f(Y")'" Y")'2' .. .). Warning - for the over-enthusiastic only. For uncountable C, B(RG) is much larger than the C-fold product measure space IT")'EC B(R). It is the latter rather than the former which gives the appropriate type of f in (d).
.. (9.14)
Chapter 9: Random Variables
97
3.14. The Monotone-Class Theorem In the same way that Uniqueness Lemma 1.6 allows us to deduce results about a-algebras from results about 7I"-systems, the following 'elementary' version of the Monotone-Class Theorem allows us to deduce results about general measurable functions from results about indicators of elements of 71"systems. Generally, we shall not use the theorem in the main text, preferring 'just to use bare hands'. However, for product measure in Chapter 8, it becomes indispensable.
THEOREM. ~
Let 1-1. be a claS8 of bounded functions from a set S into R satisfying the following conditions: (i) 1i is a vector space over R; (ii) the constant function 1 is an element of 1i; (iii) if (In) is a sequence of non-negative functions in 1-1. such that In i I where f is a bounded function on S, then f E 1i. Then il 1i contains the indicator function of every set in some 71"system I, then 1i contains every bounded a(I)-measurable function on S.
For proof, see the appendix to this chapter.
Chapter 4-
Independence
Let (n,.r, P) be a probability triple. 4.1. Definitions of independence
Note. We focus attention on the q-algebra formulation (and describe the more familiar forms of independence in terms of it) to acclimatize ourselves to thinking of IT-algebras as the natural means of summarizing information. Section 4.2 shows that the fancy q-algebra definitions agree with the ones from elementary courses. Independent q-algebras
gl, ~f2, ... of.r are called independent if, whenever Gi E (i E N) and iI, ... ,in are distinct, then
~Sub-q-algebras
(ii
n
P(Gi 1 n ... n Gin)
= II P(Gik)· k=l
Independent random variables ~Random
variables Xl,X 2 , ••• are called independent if the IT-algebras
are independent. Independent events ~ Events
E 1 , E 2 , • •• are called independent if the q-algebras E1 , E2 , • •• are independent, where En
is the q-algebra {0,E n ,n\En ,n}.
Since En = q(IEn ), it follows that events E 1 , E 2 , .. . are independent if and only if the random variables IE" IE., ... are independent.
98
Chapter
··(4·2)
4:
Independence
39
4.2. The 7r-system Lemma; and the more familiar definitions We know from elementary theory that events E 1 , E 2 , ••• are independent if and only if whenever n E N and i 1 , •• • , in are distinct, then n
P(Ei 1 n ... n E;n) =
II P(E;.), k=1
corresponding results involving complements of the Ej, etc., being consequences of this. We now use the Uniqueness Lemma 1.6 to obtain a significant generalization of this idea, allowing us to st udy independence via (manageable) 7r-systems rather than (awkward) a-algebras. Let us concentrate on the case of two a-algebras. ~~(a)
LEMMA. Suppose that 9 and 1i are sub-a-algebras of F, and that I and .:1 are 7r-systems with a(I)
= 9,
a(.:1)
= 1i.
Then 9 and 1i are independent if and only if I and.:1 are independent in that P(I n J) = P(I)P( J), I E I, J E.:1.
Proof Suppose that I and.:1 are independent. For fixed I in I, the measures (check that they are measures!)
H
I-t
P(I n H) and H
I-t
P(I)P(H)
on (n,1i) have the same total mass P(I), and agree on.:1. By Lemma 1.6, they therefore agree on a(.:1) = 1i. Hence,
P(I n H) = P(I)P(H),
I E I, HE 1i.
Thus, for fixed H in 1i, the measures
G I-t peG n H) and G
I-t
P(G)P(H)
on (n,9) have the same total mass P(H), and agree on I. They therefore . agree on a(I) = 9; and this is what we set out to prove. 0
Chapter 4: Independence
40
Suppose now that X and Y are two random variables on (n,:F, P) such that, whenever x, y E R, P(X ~ XjY ~ y) = P(X ~ x)P(Y ~ y).
(b)
Now, (b) says that the 7r-systems 7r(X) and 7r(Y) (see Section 3.13) are independent. Hence u(X) and u(Y) are independent: that is, X and Y are independent in the sense of Definition 4.1. In the same way, we can prove that random variables Xl, X 2, •.• ,Xn are independent if and only if n
P(Xk ~ Xk : 1 ~ k ~ n) =
II P(Xk ~ Xk), k=l
and all the familiar things from elementary theory. Command: Do Exercise E4.1 now. 4.3. Second Borel-Cantelli Lemma (BC2) ~~
If (En: n E N) is a sequence of independent events, then
L
peEn)
= 00 =>
peEn, Lo.)
= P(limsup En) =
Proof. First, we have
(limsupEn)C
= liminf E! = U
n
1.
E!.
m n~m
With Pn denoting peEn), we have
this equation being true if the condition {n ;::: m} is replaced by condition {r ;::: n ;::: m}, because of independence, and the limit as r T 00 being justified by the monotonicity of the two sides. For x ;::: 0,
1 - x ~ exp( -x), so that, since
E Pn =
00,
II (1 - Pn) ~ exp (- LPn) = o.
n~m
So, P [(lim sup EnYJ =
n~m
o.
Exercise. Prove that if 0 ~ Pn < 1 and S := EPn < 00, then IT(l- Pn) O. Hint. First show that if S < 1, then IT(1 - Pn) ;::: 1 - S.
0
>
41
Chapter 4: Independence
··(4·4) 4.4. Example
Let (Xn : n E N) be a sequence of independent random variables, each exponentially distributed with rate 1:
Then, for a > 0, P(Xn > alogn) = n-"', so that, using (BC1) and (BC2), (aO)
P(Xn > alogn for infinitely many n)
if a> 1, if a ::s 1.
= { 01
Now let L:= lim sup(Xn/log n). Then
P(L
~ 1) ~ P(Xn > logn, i.o.) = 1,
and, for kEN,
P(L > 1 + 2k- 1 ) ::s P (Xn > (1 + k- 1 ) log n, i.o.) = Uk {L > 1 + 2k- 1 } is P-null, and hence
= O.
Thus, {L > I}
L
= 1 almost surely.
Something to think about In the same way, we can prove the finer result (al)
P(Xn > logn + a log log n, i.o. )
=
{ 01
if a> 1, if a ::s 1,
or, even finer,
P(X 2) n (a > 1ogn+ 1og 1ogn+a 1og 1og 1ogn,
1.0.
) -_ {O1 if if aa
::s> 1;1,
or etc. By combining in an appropriate way (think about this!) the sequence of statements (aO),(al),(a2), ... with the statement that the union of a countable number of null sets is null while the intersection of a sequence of probability-l sets has probability 1, we can obviously make remarkably precise statements about the size of the big elements in the sequence (X n ). I have included in the appendix to this chapter the statement of a truly fantastic theorem about precise description of long-term behaviour: Sirassen's Law.
42
Chapter
4:
(4·4) ..
Independence
A number of exercises in Chapter E are now accessible to you.
4.5. A fundamental question for modelling Can we con~truct a ~equence (Xn : n E N) of independent random variables, Xn having prescribed distribution function Fn P We have to be able to answer Yes to this question - for example, to be able to construct a rigorous model for the branching-process model of Chapter 0, or indeed for Example 4.4 to make sense. Equation (0.2,b) makes it clear that a Yes answer to our question is all that is needed for a rigorous branching-process model.
The trick answer based on the existence of Lebesgue measure given in the next section does settle the question. A more satisfying answer is provided by the theory of product measure, a topic deferred to Chapter 8. 4.6. A coin-tossing model with applications Let (n, F, P) be ([0,1], B[O, 1], Leb). For wEn, expand w in binary:
w = 0.WIW2 ... (The existence of two different expansions of a dyadic rational is not going to cause any problems because the set 0 (say) of dyadic rationals in [0,1] has Lebesgue measure 0 - it is a countable set!) An an Exercise, you can prove that the sequence (~n : n EN), where ~n(w):= wn, is a sequence of independent variables each taking the values 0 or 1 with probability t for either possibility. Clearly, (~n : n E N) provides a model for coin tossing.
Now define
Yi(w):= 0.WIW3W6.·· , Y2(w):= 0.W2wSw9 ... ,
}'jew)
:=
0.W4 W SW 13· •• ,
and so on. We now need a bit of common sense. Since the sequence
has the same 'coin-tossing' properties as the full sequence (w n is clear that Y1 has the uniform distribution on [0,1]; and similarly for the other Y's.
:
n EN), it
··(4·8)
Chapter
4:
Independence
Since the sequences (1,3,6, ... ), (2,5,9, ... ), ... which give rise to Yb Y2 , .•• are disjoint, and therefore correspond to different sets of tosses of our 'coin', it is intuitively obvious that ~
Y I , Y 2 , • •• are independent random variables, each uniformly distributed on [0,1]. Now suppose that a sequence (Fn : n E N) of distribution functions is given. By the Skorokhod representation of Section 3.12, we can find functions gn on [0,1] such that
Xn := gn(Yn) has distribution function Fn. But because the V-variables are independent, the same is obviously true of the X -variables. ~
We have therefore succeeded in constructing a family (Xn : n E N) of independent random variables with prescribed distribution functions. Exercise. Satisfy yourself that you could if forced carry through these intuitive arguments rigorously. Obviously, this is again largely a case of utilizing the Uniqueness Lemma 1.6 in much the same way as we did in Section 4.2. 4.7. Notation: lID RVs Many of the most important problems in probability concern sequences of random variables (RVs) which are independent and identically distributed (lID). Thus, if (Xn) is a sequence of lID variables, then the Xn are independent and all have the same distribution function F (say):
P(Xn
~
x) = F(x),
Vn,Vx.
Of course, we now know that for any given distribution function F, we can construct a triple (n,F,p) carrying a sequence of lID RVs with common distribution function F. In particular, we can construct a rigorous model for our branching process. 4.8. Stochastic processes; Markov chains ~A
stochastic process Y parametrized by a set C is a collection
Y
= (Y,
: '"Y E C)
of random variables on some triple (n, F, P). The fundamental question about existence of a stochastic process with prescribed joint distributions is (to all intents and purposes) settled by the famous Daniell-Kolmogorov theorem, which is just beyond the scope of this course.
Chapter
44
4:
Independence
Our concern will be mainly with processes X = (Xn : n E Z+) indexed (or parametrized) by Z+. We think of Xn as the value of the process X at time n. For wEn, the map 11 1-+ Xn(W) is called the sample path of X corresponding to the sample point w. A very important example of a stochastic process is provided by a Markov chain. ~~Let
E be a finite or countable set. Let P = (Pij : i,j E E) be a 3tocha3tic Ex E matrix, so that for i,j E E, we have Pij ~ 0,
Let J-l be a probability measure on E, so that J-l is specified by the values = (Zn: n E Z+) on E with initial di3tribution J-l and I-step transition matrix P is meant a stochastic process Z such that, whenever n E Z+ and io,i 1 , .•. ,in E E,
J-li:= J-l({i}), (i E E). By a time-homogeneou3 Markov chain Z
Exercise. Give a construction of such a chain Z expressing Zn( w) explicitly in terms of the values at w of a suitable family of independent random variables. See the appendix to this chapter. 4.9. Monkey typing Shakespeare Many interesting events must have probability 0 or 1, and we often show that an event F has probability 0 or 1 by using some argument based on independence to show that P(F)2 = P(F). Here is a silly example, to which we apply a silly method, but one which both illustrates very clearly the use of the monotonicity properties of measures in Lemma 1.10 and has a lot of the flavour of the Kolmogorov 0-1 law. See the 'Easy exercise' towards the end of this section for an instantaneous solution to the problem. Let us agree that correctly typing WS, the Collected Works of Shakespeare, amounts to typing a particular sequence of N symbols on a typewriter. A monkey types symbols at random, one per unit time, producing an infinite sequence (Xn) of lID RVs with values in the set of all possible symbols. We agree that €
:= inf{P(Xl = x): x is a symbol}
> O.
Let H be the event that the monkey produces infinitely many copies of WS. Let H k be the event that the monkey will produce at least k copies of WS in
Chapter
··(4·9)
4:
45
Independence
e\J ("A+-
all, and let H m,k be the pr~bability. that it will produce at least k copies by time m. Finally, let H(m) be the event that the monkey produces infinitely many copies of WS over the time period [m + 1,00). Because the monkey's behaviour over [1, m] is independent of its behaviour over [m + 1,00), we have
But logic tells us that, for every m, H(m)
P(Hm,k
r
n H) =
= H!
Hence,
P(Hm,k)P(H).
r
r
But, as m 00, Hm,k Hk, and (Hm,k n H) (Hk obvious that Hk ;2 H. Hence, by Lemma 1.10(a),
n H) = H, it being
P(H) = P(Hk)P(H). However, as k
r 00, Hk ! H, and so, by Lemma 1.10(b), P(H)
= P(H)P(H),
whence P(H) = 0 or 1. The Kolmogorov 0-1 law produces a huge class of important events E for which we must have P(E) = 0 or P(E) = 1. Fortunately, it does not tell us which - and it therefore generates a lot of interesting problems! Easy exercise. Use the Second Borel-Cantelli Lemma to prove that P(H) = 1. Hint. Let El be the event that the monkey produces WS right away, that is, during time period [1, N]. Then P(El) 2': eN. Tricky exercise ( to which we shall return). If the monkey types only capital letters, and is on every occasion equally likely to type any of the 26, how long on average will it take him to produce the sequence 'ABRACADABRA' ?
The next three sections involve quite subtle topics which take time to assimilate. They are not strictly necessary for subsequent chapters. The Kolmogorov 0-1 law is used in one of our two proofs of the Strong Law for lID RVs, but by that stage a quick martingale proof (of the 0-1 law) will have been provided.
Note. Perhaps the otherwise-wonderful 'lEX makes its T too like I. Below, I use K instead of I to avoid the confusion. Script X, X, is too like Greek chi, X, too; but we have to live with that.
46
Chapter
4:
(4. 10) ..
Independence
4.10. Definition. Tail u-algebras ~~Let
X I ,X2 , ••• be random variables. Define Tn := u(Xn +I ,Xn + 2 , •••
(a)
T:=
),
n
Tn.
n
The u-algebra T is called the tail u-algebra of the sequence (X n
:
n EN).
Now, T contains many important events: for example,
(bl)
FI
(b2)
F2 :=
(b3)
) . Xl + X 2 + ... + Xk Fa:= ( hm -----=-k---- exists .
:=
(limXk exists) := {w : limXk(w) exists}, k
(2: Xk converges),
Also, there are many important variables which are in mT: for example,
(c) which may be
±oo,
of course.'
Exercise. Prove that FI ,F2 and F3 are are T-measurable, that the event H in the monkey problem is a tail event, and that the various events of probability 0 and 1 in Section 4.4 are tail events.
Hint - to be read only after you have already tried hard. Look at F3 for example. For each n, logic tells us that Fa is equal to the set (n) { l' Xn+l(w)+",+Xn+k(w) 't} F3 := W: If! k eXls s .
Now, X n + b X n + 2 , ... are all random variables on the triple (n, Tn, P). That Fin) E Tn now follows from Lemmas 3.3 and 3.5. 4.11. THEOREM. Kolmogorov's 0-1 Law ~~
Let (Xn : n E N) be a sequence of independent random variables, and let T be the tail u-algebra of (Xn : n EN). Then T is P-trivial: that is, (i) FE T =? P(F) = 0 or P(F) = 1, (ii) if is aT-measurable random variable, then, is almost deterministic in that for some constant c in [-00,00],
e
e
pee = c) = 1.
.. (-/.11) We allow
Chapter ~
=
4:
47
Independence
±oo at (ii) for obvious reasons.
Proof of (i). Let
Xn := u(X1 , ••• , Xn),
Tn := u(Xn +l,Xn+2 , •• • ).
Step 1: We claim that Xn and Tn are independent. Proof of claim. The class K of events of the form
{w: X;(w)
~
xi: 1 ~ k ~ n},
is a 7r-system which generates X n . The class
{W : Xj(w)
~
Xj : n + 1 ~ j
Xi
:r of sets of the form
n + r},
~
E R U {oo}
rEN,
xjERU{oo}
is a 7r-system which generates Tn. But the assumption that the sequence (Xk) is independent implies that K and:r are independent. Lemma 4.2(a) now clinches our claim. Step 2: Xn and T are independent. This is obvious because T
~
Tn.
Step !i: We claim that Xoo := u(Xn : n E N) and T are independent. Proof of claim. Because Xn ~ X n +1 , "In, the class Koo := U Xn is a 7rsystem (it is generally NOT a u-algebra!) which generates X(X). Moreover, Koo and T are independent, by Step 2. Lemma 4.2(a) again clinches things. Step
4.
Since T ~ X(X)' T is independent of T! Thus,
FE T and P(F)
= 0 or
=>
P(F)
= P(F n F) = P(F)P(F),
1.
o
PrClof of (ii). By part (i), for every x in R, P(~ ~ x) = 0 or 1. Let c := sup{x : P(( ~ x) = OJ. Then, if c = -00, it is clear that P(( = -00) = 1; and if c = 00, it is clear that P(~ = 00) = 1.
So, suppose that c is finite. Then P(( ~ c -lin) = 0, "In, so that
while, since P(( ~ c + lin) = 1, "In, we have
48
Chapter
Hence,
4:
(4. 11) ..
Independence
o
PC, = c) = 1.
Remarks. The examples in Section 4.10 show how striking this result is. For example, if XI, X 2 , ••• is a sequence of independent random variables, then either Xn converges) = 0
P(L:
or
peE Xn converges) = 1.
The Three Series Theorem (Theorem 12.5) completely settles the question of which possibility occurs. So, you can see that the 0-1 law poses numerous interesting questions. Example. In the branching-process example of Chapter 0, the variable
is measurable on the tail u-algebra of the sequence (Zn : n E N) but need not be almost deterministic. But then the variables (Zn : n E N) are not independent. 4.12. Exercise/Warning Let
Yo, VI, 12, ...
be independent random variables with P(Yn
= +1) = P(Yn = -1) = t,
"In.
For n E N, define Xn := YoYi ... Y n • Prove that the variables Xl, X 2 , ••• are independent. Define
y:= u(YI, Y2, .. .),
Tn := u(Xr
: r
> n).
Prove that
Hint. Prove that Yo E mC and that Yo is independent of R. Notes. The phenomenon illustrated by this example tripped up even Kolmogorov and Wiener. The very simple illustration given here was shown to me by Martin Barlow and Ed Perkins. Deciding when we can assert that (for Y a u-algebra and (Tn) a decreasing sequence of u-algebras )
is a tantalizing problem in many probabilistic contexts.
Chapter 5
Integration
5.0. Notation, etc. /-1(/) :=: I
I
dl-', /-1(/; A)
Let (S, E, p.) be a measure space. We are interested in defining for suitable elements I of mE the (Lebesgue) integral of I with respect to /-I, for which we shall use the alternative notations:
/-1(1) :=: IsI(s)/-I(ds) :=: Isld/-l. It is worth mentioning now that we shall also use the equivalent notations for A E E:
(with a true definition on the extreme right!) It should be clear that, for example,
1-'(/;1
~
x):= /-1(/; A), where A = {s E S: I(s)
~
x}.
Something else worth emphasizing now is that, of course, summation is a special type 01 integration. If (an: n E N) is a sequence of real numbers, then with S = N, E = peN), and /-I the measure on (S, E) with 1-'( {k}) = 1 for every k in N, then s 1-+ a. is /-I-integrable if and only if:E lanl < 00, and then
We begin by considering the integral of a function I in (mE)+, allowing such an I to take values in the extended hall-line [0,00].
50
Chapter 5: Integration
{5.1}..
5.1. Integrals of non-negative simple functions, SF+ If A is an element of ~, we define
The use of po rather than p signifies that we currently have only a naive integral defined for simple functions. An element / of (m~)+ is called simple, and we shall then write / E SF+, if / may be written as a finite sum
(a) where ak E [0,00] and Ak E
~.
We then define
(b)
(with 0.00:= 0 =: 00.0).
The first point to be checked is that /10(/) is well-defined; for / will have many different representations of the form (a), and we must ensure that they yield the same value of /10(/) in (b). Various desirable properties also need to be checked, namely (c), (d) and (e) now to be stated: (c)
if /,g E SF+ and /1(/
i- g) = 0 then /10(/) = /1o(g); SF+ and e ;::: 0 then / + 9 and e/ are in
(d) ('Linearity') if /,g E and /10(/ + g) = 1'0(/) + /1o(g),
SF+,
po(e!) = el'o(/);
(e)
(Monotonicity) if /,g E SF+ and / 5 g, then /10(/) 5/10(g);
(f)
if /,g E SF+ then /1\9 and / V 9 are in SF+.
Checking all the properties just mentioned is a little messy, but it involves no point of substance, and in particular no analysis. We skip this, and turn our attention to what matters: the Monotone-Convergence Theorem.
5.2. Definition of /1(/), / E (m~)+ ~For
(a)
/ E (mE)+ we define
p(/) := sup{/1o(h) : h E SF+, h 5 f} 5 00.
Clearly, for / E SF+, we have 1'(/) = /10(/)' The following result is important.
51
Chapter 5: Integration
.. (5.3) LEMMA ~(b)
II I E (m~)+ and JL(f)
= 0,
then
JL(U
> OJ)
= O.
Proof Obviously, U> O} =T limU > n- 1 }. Hence, using (1.lO,a), we see that if JL( U > O}) > 0, then, for some n, JL( U > n- 1 }) > 0, and then
o 5.3. Monotone-Convergence Theorem (MON) ~~~(a)
If (fn) is a sequence of elements of (m~)+ such that In then
T I,
or, in other notation,
This theorem is really all there is to integration theory. We shall see that other key results such as the Fatou Lemma and the Dominated-Convergence Theorem follow trivially from it. The (MON) theorem is proved in the Appendix. Obviously, the theorem relates very closely to Lemma 1.1O(a), the monotonicity result for measures. The proof of (MON) is not at all difficult, and may be read once you have looked at the following definition of o:
0
(b)
a(r)(x):= { (i - 1)2-r
r
Then I(r) = a(r)
0
if x = 0, if (i -1)2- r < x ~ i2- r ~ r if x> r.
I satisfies I(r) E SF+, and I(r)
We have made a(r) left-continuous so that if In
TI
TI
(i EN),
so that, by (MON),
then a(r)(fn)
Ta(r) (f).
(5.9) ..
Chapter 5: Integration
52
Often, we need to apply convergence theorems such as (MON) where the hypothesis (fn i I in the case of (MON» holds almost everywhere rather than everywhere. Let us see how such adjustments may be made. (c)
= 9 (a.e.),
If I,g E (m~)+ and I
then J-l(f)
= J-l(g).
Proof. Let I(r) = a(r) 0 I, g(r) = a(r) 0 g. Then I(r) = g(r) (a.e.) and so, by (5.1,c), J-l(f(r») = J-l(g(r»). Now let r i 00, and use (MON). 0 ~(d)
If I E (m~)+ and (fn) i8 a 8equence in a J-l-null set N, In i I. Then
(m~)+
such that, except on
Proof. We have J-l(f) = J-l(fIs\N) and p(fn) = J-l(fnIS\N). But fnIS\N l IIs\N everywhere. The result now follows from (MON). 0 From now on, (MON) is understood to include this extension. We do not bother to spell out such extensions for the other convergence theorems, often stating results with 'almost everywhere' but proving them under the assumption that the exceptional null set is empty.
Note on the Riemann integral If, for example,
I
is a non-negative Riemann integrable function on ([0,1]'
B[O,I], Leb) with Riemann integral I, then there exists an increasing sequence (Ln) of elements of SF+ and a decreasing sequence (Un) of elements of SF+ such that
Ln and J-l(Ln)
TI,
J-lCUn)
1L
TL
~ I,
Un
1U ~ I
If we define
- {L
I =
0
if L = U, otherwise,
then it_is clear that j is Borel measurable, while (since J-l(L) = J-l(U) = 1) {f =J J} is a subset of the Borel set {L =J U} which Lemma 5.2(b) shows to be of measure O. So f is Lebesgue measurable (see Section Al.l1) and the Riemann integral of I equals the integral of I associated with ([0,1], Leb[O, 1], Leb), Leb[O,I] denoting the a-algebra of Lebesgue measurable subsets of [0,1].
5.4. The Fatou Lemmas for functions ~~(a)
(FATOU) For a sequence (fn) in (m~)+, p(liminf In) ~ liminf J-l(fn).
.. (5.6)
59
Chapter 5: Integration
Proof. We have
liminf fn n
For n
~ k,
we have fn
~
=i limgk,
where gk
:= infn~k In.
gk, so that p.(fn) ~ P.(gk), whence
and on combining this with an application of (MON) to (*), we obtain p.(liminf In) n
=i limp.(gk) $i lim inf p.(fn) k k n~k
o
=: liminf p.(fn). n
Reverse Fatou Lemma ~(b)
If(fn) is a sequence in (mE)+ such that for some gin (mE)+, we have In $ g, "In, and p.(g) < 00, then
p.(limsup/n) ~ limsupp.(fn).
o
Proof· Apply (FATOU) to the sequence (g - In).
5.5. 'Linearity' For a, (3 E R+ and I,g E (mE)+, p.(af
+ (3g) = ap.(f) + (3p.(g)
($ 00).
Proof· Approximate I and 9 from below by simple functions, apply (5.1,d) to the simple functions, and then use (MON). 0
5.6. Positive and negative parts of I For I E mE, we write
I = 1+ - 1-, where
I+(s):= max(f(s),O),
Then 1+,1- E (mE)+, and
r(s):= max(-f(s),O).
III = j+ + 1-.
54
(5.7) ..
Chapter 5: ,Integration
5.7. Integrable function, {,l(S,E;/1) ~For
fEmE, we say that f is /1-integrable, and write
if and then we define
Note that, for
f
E {,l (S, E, /1), I/1U)1 ~ /1(lfl),
the fanliliar rule that the modulus of the integral is less than or equal to the integral of the modulus.
We write {,l (S, E, /1)+ for the class of non-negative elements in {,l(S, E, /1). 5.8. Linearity Fora,f3 E Rand f,g E {,l(S,E,/1), af + f3g E {,l(S, E' /1) and fl(af
+ f3g) = a/1U) + f3/1(g).
Proof. This is a totally routine consequence of the result in Section 5.5. 0
5.9. Dominated-Convergence Theorem (DOM) ~
Suppose that fn'! E mE, that fn(s) --+ f( s) for every s in S and that the sequence Un) is dominated by an element 9 of {,l(S, E,/1)+: Ifn(s)1 ~ g(s), where /1(g) < fn
00.
--+
Vs E S, Vn E N,
Then
fin {,l(S, E,/1): that is, /1(lfn - fl)
whence
Command: Do Exercise E5.1 now.
--+
0,
55
Chapter 5: Integration
.. (5.11)
Proof We have Ifn Lemma 5.4(b),
fl ::; 2g, where
fl(2g) <
00,
so by the reverse Fatou
limsuPfl(lfn - fl) ::; fl(1imsup Ifn - fl)
= fl(O) = O.
Since
IflUn) - pU)1 the theorem is proved.
= IflUn -
1)1 ::; fl(lfn - fl),
o
5.10. Scheffe's Lemma (SCHEFFE) ~(i)
Suppose that fn,! E £I(S, I:, fl)+,. in particular, fn and f are nonnegative. Suppose that fn ..... f (a.e.). Then P(lfn - fl) ..... 0 if and only if flUn) ..... pU)·
Proof The 'only if' part is trivial. Suppose now that (a) Since Un - 1)- ::;
f,
(DOM) shows that
(b)
fl(Un - f)-) .....
o.
Next,
p(Un - f)+)
= PUn -
f;!n 2: f)
= flUn) - flU) - PUn - fi fn
< I).
But
IpUn - f;!n < f)l ::; Ifl(Un - f)-)I ..... 0 so that (a) and (b) together imply that
(c)
fl(Un - f)+) ..... O. Of course, (b) and (c) now yield the desired result.
o
Here is the second part of Scheffe's Lemma. ~(ii)
Suppose that fn'! E £1(S,I:,fl) and that fn ..... f (a.e.). Then P(lfn - fl) ..... 0 if and only if P(lfn!} ..... fl(lfl}·
Exercise. Prove the 'if' part of (ii) by using Fatou's Lemma to show that flU;') ..... flU±), and then applying (i). Of course, the 'only if' part is trivial. 5.11. Remark on uniform integrability The theory of uniform integrability, which we shall establish later for probability triples, gives better insight intq the matter of convergence of integrals.
Chapter 5: Integration
56
(5.12) ..
5.12. The standard machine What I call the standard machine Monotone-Class Theorem.
IS
a much cruder alternative to the
The idea is that to prove that a 'linear' result is true for all functions h in a space such as [1(S, E,fl), • first, we show the result is true for the case when h is an indicator function - which it normally is by definition; • then, we use linearity to obtain the result for h in Sp+; • next, we use (MON) to obtain the result for h E (mE)+, integrability conditions on h usually being superfluous at this stage; • finally, we show, by writing h the claimed result is true.
= h+
- h- and using linearity, that
It seems to me that, when it works, it is easier to 'watch the standard machine work' than to appeal to the monotone-class result, though there are times when the greater subtlety of the Monotone-Class Theorem is essential.
5.13. Integrals over subsets Recall that for
I
E (mE)+, we set, for A E E,
If we really want to integrate lover A, we should integrate the restriction IIA with respect to the measure flA (say) which is fl restricted to the measure space (A, EA)' EA denoting the cr-algebra of subsets of A which belong to E. So we ought to prove that
(a) The standard machine does this. If I is the indicator of a set B in A, then both sides of (a) are just fl(A n B); etc. \Ye discover that lor I E mE, we have IIA E mEA; and then
in which case (a) holds.
.. (5.14)
Chapter 5: Integration
57
5.14. The measure fJl, f E (m1:)+ Let
f
E (m:E)+. For A E 1:, define
(a) A trivial Exercise on the results of Section 5.5 and (MON) shows that
(b)
(f J1) is a measure on (S, 1:).
For h E (m1:)+, and A E 1:, we can conjecture that
(c) If h is the indicator of a set in 1:, then (c) is immediate by definition. Our standard machine produces (c), so that we have (d) Result (d) is often used in the following form: ~(e)
if f E (m1:)+ and h E (m1:), then h E Cl(S,'£,Jj1) if and only if fh E C 1 (S,1:,j1) and then (fJl)(h) = j1(fh).
Proof. We need only prove this for h 2:: 0 in which case it merely says that the measures at (d) agree on S. 0 Terminology, and the Radon-Nikodym theorem
If >. denotes the measure f Jl on (S, 1:), we say that>. has density f relative to j1, and express this in symbols via d>./dj1 =j. We note that in this case, we have for F E 1::
(f)
j1(F)
= 0 implies that
>.(F) = 0;
so that only certain measures have density relative to j1. Nikodym theorem (proved in Chapter 14) tells us that
The Radon-
(g) if j1 and>' are u-finite measures on (S,1:) such that (f) holds, then >. = fJl for some f E (m1:)+.
Chapter 6
Expectation
6.0. Introductory remarks
We work with a probability triple (n, F, P), and write £r for £r(n, F, P). Recall that a random variable (RV) is an element of mF, that is an Fmeasurable function from n to R. Expectation is just the integral relative to P. Jensen's inequality, which makes critical use of the fact that pen) = 1, is very useful and powerful: it implies the Schwarz, Holder, ... inequalities for general (S,E,p). (See Section 6.13.) We study the geometry of the space £2(n, F, P) in some detail, with a view to several later applications. 6.1. Definition of expectation For a random variable X E £1 E(X) of X by
E(X):=
In
=
£1(n, F, P), we define the expectation
XdP =
In
X(w)P(dw).
We also define E(X) (::; 00) for X E (mF)+. In short, E(X)
= P(X).
That our present definitions agree with those in terms of probability density function (if it exists) etc. will be confirmed in Section 6.12. 6.2. Convergence theorenls
Suppose that (Xn) is a sequence of RVs, that X is a RV, and that Xn almost surely: P(Xn -+ X) = 1.
-+
X
We rephrase the convergence theorems of Chapter 5 in our new notation:
58
Chapter 6: Expectation
.. (6··0 ~~(MON)
if 05 Xn
~~(FATOU) if Xn ;::
i
X, then E(Xn)
59
i E(X) 5 ooj
0, then E(X) 5liminfE(Xn)j
~(DOM) if IXn(w)1 5 Y(w) V(n,w), where E(Y)
E(IXn - XI)
-+
< 00, then
0,
so that
E(Xn)
-+
E(X)j
~(SCHEFFE) if E(lXnl) -+ E(IXI), then
E(IXn - XI) ~~(BDD)
-+
OJ
if for some finite constant /(, IXn(w)1 5 I<, V(n,w), then
E(IXn - XI)
-+
O.
The newly-added Bounded Convergence Theorem (BDD) is an immediate consequence of (DOM), obtained by taking Yew) = /(, VWj because of the fact that P(O) = 1, we have E(Y) < 00. It has a direct elementary proof which we shall examine in Section 13.7j but you might well be able to provide it now. As has been mentioned previously, uniform integrability is the key concept which gives a proper understanding of convergence theorems. We shall study this, via the elementary (BDD) result, in Chapter 13.
,
6.3. The notation E( X j F)
For X E £1 (or (mF)+) and F E F, we define E(XjF):= fFX(w)P(dw):= E(XIF), where, as ever, I ifw E F, IF(w):= { 0 ifw ¢ F. Of course, this tallies with the J.l(fj A) notation of Chapter 5. 6.4. Markov's inequality Suppose that Z E mF and that 9 : R -+ [0,00] is 8-measurable and nondecreasing. (We know that g(Z) = 9 0 Z E (mF)+.J Then ~
Eg(Z) ;:: E(g(Z)j Z ;:: c) ;:: g(c)P(Z ;:: c).
Examples:
(6·4)··
Chapter 6: Expectation
60
for Z E (m.rf,
eP(Z ~ e):::; E(Z),
eP(IXI ~ e) :::; E(IXI)
for X E (1,
(e>O), (e> 0).
~ ~ Considerable strength can often be obtained by choosing the optimum B for
e tn
(B>O,
eER).
6.5. Sums of non-negative RVs We collect together some useful results. (a) If X E (m.:F)+ and E(X) ~(b) If (Zk) is a sequence in
< 00, then P(X < (0) = 1. This is obvious.
(mF)+, then
This is an obvious consequence of linearity and (MON). ~(c) If (Zk) is a sequence in
L:Zk
(mF)+ such that L:E(Zk) <
< 00 (a.s.)
and so Zk
--+
00,
then
0 (a.s.)
This is an immediate consequence of (a) and (b). (d) The First Borel- Cantelli Lemma is a consequence of (c). For suppose that (Fk) is a sequence of events such that L: P(Fk) < 00. Take Zk = IF•. Then E(Zk) = P(Fk) and, by (c),
L: IF. = number of events Fk which occur is a.s. finite. 6.6. Jensen's inequality for convex functions ~~A
function c : G --+ R, where G is an open subinterval of R, is called convex on G if its graph lies below any of its chords: for x, y E G and 0:::; p = 1 - q:::; 1, e(px + qy) :::; pc(x) + qc(y).
It will be explained below that c is automatically continuous on G. If c is twice-differentiable on G, then e is convex if and only if e" ~ O. ~Important examples of convex functions: Ixl,x 2 ,e 9x (B E
R).
.. (6.7)
61
Chapter 6: Expectation
THEOREM. Jensen's inequality ~~
Suppo.'le that c : G -+ R i.'l a convex function on an open .'lubinterval G of Rand that X i.'l a random variable such that
E(IXI) <
00,
P(X E G)
= 1,
Elc(X)1 <
00.
Then
Ec(X)
~
c(E(X)).
Proof. The fact that c is convex may be rewritten as follows: for u, v, w E G with u < v < w, we have ~u v "
:s ~v w, where ~u v := c(v) V -
I
c(u) U
.
It is now clear (why?!) that c is continuous on G, and that for each v in G the monotone limits
(D+c)(v)
:=llim~v.w wlv
:s
exist and satisfy (D_c)(v) (D+c)(v). The functions D_c and D+c are non-decreasing, and for every v in G, for any min [(D_c)(v), (D+c)(v)) we have x E G. c(x) ~ m(x - v) + c(v), In particular, we have, almost surely, for fl := E(X),
o
and Jensen's inequality follows on taking expectations. Remark. For later use, we shall need the obvious fact that
c(x) = sup[(D_c)(q)(x - q) + c(q)] = sup(anx
(a)
n
qEG
+ bn )
(x E G)
for some sequences (an) and (b n ) in R. (Recall that c is continuous.) 6.1. Monotonicity of £P norms ~ ~For
1
:s p < 00, we say that X
E £P = O(Q,:F, P) if
E(/XIP) <
00,
Chapter 6: Expectation
62
(6.7) ..
and then we define
The monotonicity property referred to in the section title is the following: ~(a)
if 1 ~ p ~ r < 00 and Y E
~Proof.
.cr,
then Y E £P and
For n E N, define
Xn(W):= {IY(w)1 II n}p. Then X n is bounded so that X n and X~/p are both in £1. Taking c( x) xr/p on (0,00), we conclude from Jensen's inequality that
Now let n
i
00 and use (MON) to obtain the desired result.
Note. The proof is marked with a effective use of truncation.
~
=
o
because it illustrates a simple but
Vector-space property of £P (b)
Since, for a, b E R+, we have
£P is obviously a vector space.
6.8. The Schwarz inequality ~~(a)
If X and Yare in £2, then XY E £1, and
Remark. You wiJI have seen many versions of this result and of its proof before. We use truncation to make the argument rigorous.
Proof. By considering IXI and IVI instead of X and Y, we can and do restrict attention to the case when X ~ 0, Y :2: o.
Chapter 6: Expectation
·.(6.9)
69
Write Xn := X 1\ n, Y n := Y 1\ n, so that Xn and Y n are bounded. For any a,b E R, 0:5 E[(aX n + bYn)2] = a2E(X~)
+ 2abE(XnYn) + b2E(Y;),
and since the quadratic in alb (or bl a, or ... ) does not have two distinct real roots, {2E(XnYn)}2 :5 4E(X~)E(Y;):5 4E(X2)E(y2). Now let n
i
00
o
using (MON).
The following is an immediate consequence of (a):
(b)
if X and Yare in
£2, then
JO
i3 X
+ Y,
and we have the triangle law:
Remark. The Schwarz inequality is true for any measure space - see Section 6.13, which gives the extensions of (a) and (b) to £P. 6.9. £2: Pythagoras, covariance, etc.
In this section, we take a brief look at the geometry of £2 and at its connections with probabilistic concepts such as covariance, correlation, etc. Covariance and variance
If X, Y E £2, then by the monotonicity of norms, X, Y E £1, so that we may define flx := E(X), flY:= E(Y). Since the constant functions with values flX, flY are in £2, we see that
(a)
X := X
- /lx,
are in £2. By the Schwarz inequality,
(b)
Y:= Y XV
- flY
E £1, and so we may define
Cov(X, Y) := E(XY) = E[(X - flX )(Y - /lY)]'
. The Schwarz inequality further justifies expanding out the product in the final [ ] bracket to yield the alternative formula:
(c)
Cov(X,Y) = E(XY) - flXflY. As you know, the variance uf X is defined by
(d)
Var(X) := E[(X - flx )2] = E(X2) - /l3.; = Cov(X, X).
(6.9) ..
Chapter 6: Expectation
Inner product, angle For U, V E £2, we define the inner (or Jcalar) product
(U, V)
(e) and if 11U1I2 and and V by
IWII2 f:.
:= E(UV),
0, we define the cosine of the angle fJ between U
(f) This has modulus at most 1 by the Schwarz inequality. This ties in with the probabilistic idea of correlation: (g) the correlation p of X and Y i" cos a where a is the angle between and Y.
X
Orthogonality, Pythagoras theorem
£2 has the same geometry as any inner-product space (but see 'Quotienting' below). Thus the 'cosine rule' of elementary geometry holds, and the Pythagoras theorem takes the form (h) If (U, V) = 0, we say that U and V are orthogonal or perpendicular, and write U ..L V. In probabilistic language, (h) takes the form (with U, V replaced by X, Y)
(i)
Var(X
+ Y) = Var(X) + Var(Y)
if Cov(X, Y)
= 0.
Generally, for Xl, X 2 , ••• ,Xn E £2,
(j)
Var(XI
+ X 2 + ... +Xn) =
LVar(Xk) + 2L Li<jCoV(Xi,Xj ). k
I have not marked results such as (i) and (j) with they are well known to you.
~
because I am sure that
Parallelogram law Note that by the bilinearity of (-, .), (k)
IIU + VII2 2 + IIU -
VII2 2
= (U + V, U + V) + (U = 211UII2 2 + 211V1I22.
V, U - V)
.. (6.10)
Chapter 6: Expectation
65
Quotienting (or lack of it!): L2 Our space £2 does not quite satisfy the requirements for an inner product space because the best we can say is that (see (5.2,b»
110112 =
0 if and only if U = 0 almost surely.
In functional analysis, we find an elegant solution by defining an equivalence relation
U '" V if and only if U
= V almost surely
and define L2 as '£2 quotiented out by this equivalence relation'. Of course, one needs to check that if for i = 1,2, we have Cj E R and Uj, Vi E £2 with Uj '" Vi, then
that if Un -+ U in £2 and Vn '" Un and V", U, then Vn -+ V in £2; etc. As mentioned in 'A Question of Terminology', we normally do not do this quotienting in probability theory. Although one might safely do so at the moderately elementary level of this book, one could not do so at a more advanced level. For a Brownian motion {B t : t E R+}, the crucial property that t 1-+ Bt(w) is continuous would be meaningless if one replaced the true function B t on by an equivalence class.
n
6.10. Completeness of 0
(1:5 p
< 00)
Let P E [1,00). The following result (a) is important in functional analysis, and will be crucial for us in the case when p = 2. It is instructive to prove it as an exercise in our probabilistic way of thinking, and we now do so.
(a)
If (Xn) is a Cauchy sequence in £P in that sup IIXr - Xsllp -+ 0
(k -+ 00)
r.s~k
then there exists X in £P such that Xr -+ X in £P: (r-+oo).
Note. We already know that £P is a vector space. Property (a) is important in showing that £P can be made into a Banach space LP by a quotienting technique of the type mentioned at the end of the preceding section.
(6.10) ..
Chapter 6: Expectation
66
Proof of (a). We show that X may be chosen to be an almost sure limit of a subsequence (X kn )· Choose a sequence (kn : n E N) with kn
i
(X)
such that
Then
so that E
L IXk + n
1 -
Xkn I <
(x).
Hence it is almost surely true that the series
L(Xkn+l - Xk n ) converges (even absolutely!), so that
limXkn(w) exists for almost all w. Define
X(W) := limsupXkn(w), Vw. Then X is :F-measurable, and Xkn ~ X, a.s. Suppose that n E Nand r
E (IXr so that on letting t
i
Firstly, Xr - X E
.cp ,
(X)
~
kn . Then, for N 3 t
- x.k IP) = IIXr t
~
n,
X k t II P P _< 2- np ,
and using Fatou's Lemma, we obtain
so that X E
.c p •
Secondly, we see that, indeed,
~~X~O.
Note. For an easy exercise on
0
.c p
convergence, see EA13.2.
6.11. Orthogonal projection
The result on completeness of .cp obtained in the previous section has a number of important consequences for probability theory, and it is perhaps as well to develop one of these while Section 6.10 is fresh in your mind. I hope that you will allow me to present the following result on orthogonal projection as a piece of geometry for now, deferring discussion of its central role in the theory of conditional expectation until Chapter 9. We write 11·11 for 11·112 throughout this section.
67
Chapter 6: Expectation
.. (6.11)
THEOREM ~
Let K be a vector subspace of £2 which is complete in that whenever (Vn ) is a sequence in K which ha~ the Cauchy property that
sup r,.!~k
IlVr -
(k -+ (0),
V.II -+ 0
then there exists a V in K such that
(n-+oo).
IlVn - VII-+ 0
Then given X in £2, there exists Y in K such that
(i)
IIX - YII
= ~ := inf{IIX X - Y ..1. Z,
(ii)
WII : W E K},
VZ E K.
Properties (i) and (ii) ofY in K are equivalent and if Y share~ either property (i) or (ii) with Y, then
IIY - YII
=0
(equivalently,
Y
= Y, a.s.).
Definition. The random variable Y in the theorem is called a version of the orthogonal projection of X onto K. If Y is another version, then Y = Y, a.s. Proof. Choose a sequence (Yn ) in K such that
By the parallelogram law (6.9,k),
But !(yr + Y.) E K, so that IIX - HYr + Y.)1I 2 ~ ~2. It is now obvious that the sequence (Yn ) has the Cauchy property so that there exists a Y in K such that llYn - YII -+ O. Since (6.8,b) implies that IIX - YII ::; IIX - Ynll IIX -
YII
=~.
+ llYn -
YII, it is clear that
(6.11) ..
Chapter 6: Expectation
68
For any Z in K, we have Y
+ tZ E K
for t E R, and so
whence
-2t(Z,X - Y) +t 2 11Z112 2': O. This can only be the case for all t of small modulus if
(Z,X - Y)
o
= O.
Remark. The case to which we shall apply this theorem is when K has the form £2(n, g, P) for some sub-a-algebra 9 of :F. 6.12. The 'elementary formula' for expectation
Back to earth! Let X be a random variable. To avoid confusion between different £'s, let us here write Ax on (R, B) for the law of X: Ax(B) := P(X E B).
LEMMA ~
Suppose that h is a Borel measurable function from R to R. Then
h(X) E £l(n,:F,p) if and only if hE £l(R,B,Ax) and then
(a)
Eh(X)
= Ax(h) =
k
h(x)Ax(dx).
Proof We simply feed everything into the standard machine.
Result (a) is the definition of Ax if h = IB (B E B). Linearity then shows that (a) is true if h is a simple function on (R, B). (MON) then implies (a) for h a non-negative function, and linearity allows us to complete the argument. 0 Probability density function (pdf) We say that X has a probability density function (pdf) fx if there exists a Borel function fx : R - t [0,00] such that
(b)
P(X E B)
=
fa
fx(x)dx,
BEB.
.. (6.19)
Chapter 6: Expectation
69
Here we have written dx for what should be Leb( dx). In the language of Section 5.12, result (b) says that Ax has density Ix relative to Leb:
dAx dLeb
= Ix.
The function Ix is only defined almost everywhere: any function a.e. equal to fx will also satisfy (b) 'and conversely'. The above lemma extends to
E(lh(X)/) <
00
if and only if J Ih(x)llx(x)dx
and then
Eh(X)
=
l
< 00
h(x)fx(x)dx.
6.13. Holder from Jensen The truncation technique used to prove the Schwarz inequality in Section 6.8 relied on the fact that pen) < 00. However, the Schwarz inequality is true for any measure space, as is the more general Holder inequality. We conclude this chapter with a device (often useful) which yields the Holder inequality for any (S, E, f-t) from Jensen's inequality for probability triples. Let (S, E,f-t) be a measure space. Suppose that ~
p> 1 and p-l Write
I
+ q-l = 1.
E £P(S, E,f-t) if I E mE and f-t(I/IP)
<
00,
and in that case define
THEOREM Suppose that f,g E £P(S, E,f-t), hE O(S, E,f-t). Then
Ih
~(a)
(Holder's inequality)
~(b)
(MinkowskPs inequality)
E ,CI(S,E,f-t) and
70
Chapter 6: Expectation
(6.13) ..
Proof of (a). We can obviously restrict attention to the case when
f, h ~ 0 and p(fP) > o. With the notation of Section 5.14, define
so that P is a probability measure on (S, E). Define
U(8):= {Oh(8)/f(S)P-I
if f(s) > 0, iff(s) =0.
The fact that P(u)9 ::; P(u9) now yields
o Proof of (b). Using Holder's inequality, we have
p(lf + glP)
where
= p(lfllf + gIP-I) + p(lgllf + gIP-I) ::; IIfllpA + IIgllpA,
A = Illf + gIP-III, = p(lf + gIP)I!9,
and (b) follows on rearranging. (The result is non-trivial only if f, 9 E £P, and in that case, the finiteness of A follows from the vector-space property of cP.) 0
Chapter 7
An Easy Strong Law
7.1. 'Independence means multiply' - again!
THEOREM ...
Suppose that X and Yare independent RVs, and that X and Yare both in {}. Then XY E £1 and E(XY) = E(X)E(Y). In particular, if X and Yare independent elements of £2, then Cov(X, Y) = 0 and Var(X
+ Y) =
Var(X) + Var(Y).
Proof. Writing X = X+ - X-, etc., allows us to reduce the problem to the case when X ~ 0 and Y ~ O. This we do.
But then, if a(r) is our familiar staircase function, then
where the sums are over finite parameter sets, and where for each i and j, Ai (in o-(X» is independent of Bj (in o-(Y». Hence E[a(r)(X)a(r)(y)]
=L = L
Now let r
i
00
LaibjP(Ai n Bj) L aibjP(Ai)P(Bj)
= E[a(r)(X)]E[a(r)(y)].
o
and use (MON).
Remark. Note especially that if X and Y are independent then X E £1 and Y E £1 imply that XY E Cl. This is not necessarily true when X and
71
(7.0) ..
Chapter 7: An Easy Strong Law
72
Yare not independent, and we need the inequalities of Schwarz, Holder, etc. It is important that independence obviates the need for such inequalities.
1.2. Strong Law - first version The following result covers many cases of importance. You should note that though it imposes a 'finite 4th moment' condition, it makes no assumption about identical distributions for the (Xn) sequence. It is remarkable that so fine a result has so simple a proof.
THEOREM ~
Suppose that Xl, X 2, ... are independent random variables, and that for some constant K in [0,(0),
E(XA:) Let Sn
= Xl
+X2
or again, Sn/n
-4
= 0,
+ ... +Xn .
E(xt) ~ K,
Vk.
Then
0 (a.s.).
Proof. vVe have E(S~)
= E[(XI + X 2 + ... + X n )4) = E(Lxt +6LL.<.X1XJ), k J I
because, for distinct i, j, k and I,
using independence plus the fact that E(Xi) = o. [Note that, for example, the fact that E(Xf) < 00 implies that E(XJ) < 00, by the 'monotonicity of .c p norms' result in Section 6.7. Thus Xi and Xl are in .c 1 .) We know from Section 6.7 that
[E(xlW ~ E(Xt) ~ K, Hence, using independence again, for i
E(xl Xl)
Vi.
i' j,
= E(XnE(XJ) ~ K.
.. (7.9) Thus
Chapter 7: An Easy Strong Law
79
E(S!):::; nK + 3n(n - I)K:::; 3Kn 2 ,
and (see Section 6.5)
so that 'fJSn/n)4 <
00,
a.s., and
Sn/n
->
0,
o
a.s.
°
Corollary. If the condition E(Xk) = in the theorem is replaced by E(Xk) = p for some constant p, then the theorem holds with n- 1 Sn -> p (a.s.) as its conclusion. Proof. It is obviously a case of applying the theorem to the sequence (Yk), where Yk := Xk - p. But we need to know that
(a) This is obvious from Minkowski's inequality
(the constant function pIon n having £4 norm Ipl). But we can also prove (a) immediately by the elementary inequality (6.7,b). 0
The next topics indicate a different use of variance.
7.3. Chebyshev's inequality As you know this says that for c 2:: 0, and X E £2, p:= E(X);
and it is obvious.
Example. Consider a sequence (Xn) of lID RVs with values in {O, I} with p
= P(Xn = 1) = 1 -
P(Xn
= 0).
(Vi) ..
Chapter 7: An Easy Strong Law
74 Then E(Xn)
= P and Var(Xn ) = p(l- p)::; 1. Thus (using Theorem 7.1) Sn := Xl + X 2 + ... + Xn
has expectation np and variance np(l - p) ::; n/4, and we have E(n-ISn) = p, Var(n-ISn) = n- 2 Var(Sn)::; 1/(4n). Chebyshev's inequality yields P(ln-ISn - pi> b)::; 1/(4n82 ).
1.4. Weierstrass approximation theorem If f is a continuous function on [0,1] and c polynomial B such that
> 0,
then there exists a
sup IB(x) - f(x)1 ::; c. xE(O,I]
Proof. Let (Xk), Sn etc. be as in the Example in Section 7.3. You are well aware that
Hence
Bn(P):= Ef(n-ISn)
= ~f(n-Ik)(~)pk(l- pt-k,
the 'B' being in deference to Bernstein. Now f is bounded on [0,1], If(y)1 ::; K, Vy E [0,1]. Also, f is uniformly continuous on [0,1]: for our given t: > 0, there exists b > 0 such that (a) Ix - yl ::; 8 implies that If(x) - f(y)1 Now, for p E [0,1],
< !t:.
IBn(P) - f(p)1 = IE{f(n-ISn) - f(p)} I· Let us write Y n := If(n-ISn) - f(p)1 and Zn := In-ISn - pl. Then Zn ::; 8 implies that Yn < !c, and we have IBn(P) - f(p)1 ::; E(Yn) = E(Yn; Zn ::; b) + E(Yn; Zn > 8) ::; !cP(Zn ::; 8) + 2KP(Zn > b) ::; !e + 2K/( 4nb 2 ). Earlier, we chose a fixed 8 at (a). We now choose n so that
2f{/(4nb 2 ) < !c. Then IBn(P) - f(p)1 < c, for all p in [0,1]. Now do Exercise E7.1 on inverting Laplace transforms.
o
Chapter 8
Product Measure
8.0. Introduction and advice One of this chapter's main lessons of practical importance is that an 'interchange of order of integration' result
is always valid (both sides possibly being infinite) if f ~ OJ and is valid for 'signed' f (with both repeated integrals finite) provided that one (then the other) of the integrals of absolute values:
is finite. It j., a good idea to read through the chapter to get the ideas, but you are strongly recommended to postpone serious study of the contents until a later stage. Except for the matter of infinite products, it is all a case of relentless use of either the standard machine or the Monotone-Class Theorem to prove intuitively obvious things made to look complicated by the notation. When you do begin a serious study, it is important to appreciate when the more subtle Monotone-Class Theorem has to be used instead of the standard machine.
8.1. Product measurable structure, I:1 x I:2 Let (5], I:x) and (52, I: 2) be measurable spaces. Let 5 denote the Cartesian product 5 := S1 X S2. For i = 1,2, let Pi denote the ith coordinate map, so that
75
(8.1) ..
Chapter 8: Product Measure
76
The fundamental definition of E
= El
X
E2 is as the <7-algebra
~(a)
Thus E is generated by the sets of the form
together with sets of the form
Generally, a product <7-algebra is generated by Cartesian products in which one factor is allowed to vary over the <7-algebra corresponding to that factor, and all other factors are whole spaces. In the case of our product of two factors, we have
(b) and you can easily check that
(c)
I
= {Bl
X
Bz : Bi E Ei }
is a 7r-system generating E = El X E 2 • A similar remark would apply for En, but you can see that, since we may only take a countable product countable intersections in analogues of (b), products of uncountable families of <7-algebras cause problems. The fundamental definition analogous to (a) still works.
n
LEMMA (d)
Let ?t denote the class of functions which are such that for each
81
in SI, the map 8Z
for each 8Z in Sz, the map Then?t
81
f-+ f-+
f :S
--+
R which are in bE and
f(81,82) is E 2-measurable on S2, f(81' 82) is El -measurable on Sl.
= bE.
Proof. It is clear that if A E I, then fA E?t. Verification that ?t satisfies the conditions (i)-(iii) of the Monotone-Class Theorem 3.14 is straightforward. Since E = <7(I), the result follows. 0
.. (8.2)
Chapter 8: Product Measure
77
8.2. Product measure, Fubini's Theorem We continue with the notation of the preceding Section. We suppose that for i = 1,2, f.li is a finite measure on (Si, Ei). 'Ne know from the preceding Section that for f E bE, we may define the integrals
LEMMA Let 1i be the class of elements in hE such that the following property holds: I{(.) E bEl and ItO E hE2 and
Then 1i
= hE.
Proof If A E I, then, trivially, IA E 1i. Verification of the conditions of Monotone-Class Theorem 3.14 is straightforward. 0 For FEE with indicator function
f
;=
IF, we now define
Fubini's Theorem ~
The set /unction f.l is a measure on (S, E) called the product measure of f.lI and f.lz and we write f.l = f.ll X f.l2 and
Moreover, f.l is the unique measure on (S, E) for which
(a)
Ai EEj. If f
(b)
E (mE)+,
then with the obvious definitions of I{, It, we have
Chapter 8: Product Measure
78
in [0,00]. If f E m~ and p(lfl) terms in R).
(8.2)..
< 00, then equation (a) is valid (with all
Proof. The fact that fl is a measure is a consequence of linearity and (MON). The fact that Jl is then uniquely specified by (a) is obvious from Uniqueness Lemma 1.6 and the fact that 17(I) = ~.
Result (b) is automatic for f = lA, where A E I. The Monotone-Class Theorem shows that it is therefore valid for f E b~, and in particular for f in the S F+ space for (S,~, p). (MON) then shows that it is valid for f E (m~)+; and linearity shows that (b) is valid if fl(lfl) < 00. Extension ~
All of Fubini's Theorem will work if the sure spaces:
(Si'~i,Jli)
are 17-jinite mea-
We have a unique measure Jl on (S,~) satisfying (a), etc., etc. We can prove this by breaking up 17-finite spaces into countable unions of disjoint finite blocks. Warning The 17-finiteness condition cannot be dropped. The standard example is the following. For i = 1,2, take Si = [0,1] and ~i = B[O, 1]. Let PI be Lebesgue measure and let Jl2 just count the number of elements in a set. Let F be the diagonal {(x,y) E Sl x S2: x = y}. Then (check!) F E~, but
and result (b) fails, stating that 1 = O. Something to think about So, our insistence on beginning with bounded functions on products of jinite measures was necessary. Perhaps it is worth emphasizing that in our standard machine, things work because we can use indicator functions of any set in our 17-algebra, whereas when we can only use indicator functions of sets in a 7r-system, we have to use the Monotone-Class Theorem. We cannot approximate the set F in the Warning example as F =j limFn, where each Fn is a finite union of 'rectangles' Al x A 2 , each Ai being in
B[O,l].
Chapter 8: Product Measure
.. (8.3)
79
A simple application Suppose that X is a non-negative random variable on (n, F, P). Consider the measure p.:= P X Leb on (n,F) x ([0, oo),B[O, 00)). Let
A := {(w,x) : O:S x :S X(w)},
h:= IA.
Note that A is the 'region under the graph of X'. Then
INw)
= X(w),
J.l(A)
= E(X) = f
I~(x)
= P(X 2:: x).
Thus
(c)
P(X 2:: x)dx,
J[O,oo)
dx denoting Leb(dx) as usual. Thus we have obtained one of the well-known formulae for E(X) and also interpreted the integral E(X) as 'area under the graph of x'. Note. It is perhaps worth remarking that the Monotone-Class Theorem, the Fatou Lemma and the reverse Fatou Lemma for functions amount to the corresponding results for sets applied to regions under graphs.
8.3. Joint laws, joint pdfs Let X and Y be two random variables. The (joint) law.cx,Y of the pair (X, Y) is the map .cx,Y : B(R) x B(R) -10 [0,1] defined by .cX,y(r) := P[(X, Y) E r].
The system {(-oo,x] x (-00, V] : x,y E R} is a 7r-system which generates B(R) X B(R). Hence .cx,y is completely determined by the joint distribution Fx,Y of X and Y which is defined via Fx,Y(x,y):= P(X:S x;Y:S V).
We now know how to construct Lebesgue measure p.
= Leb x Leb on (R, B(R))2.
We say that X and Y have joint probability density function (joint pdf)
ix,Y on R2 iffor r E B(R) X B(R), P[(X, Y) E r]
lr = kk =
/x,y(z)p.(dz) Ir(x,v)fx,y(x,y)dxdy
Chapter 8: Product Measure
80
(8.9) ..
etc., etc., (Fubini's Theorem being used in the last step(s)). Fubini's Theorem further shows that fx(x) :=
~ fX'y(x,y)dy
acts as a pdf for X (Section 6.12), etc., etc. You don't need me to tell you any more of this sort of thing. 8.4. Independence and product measure Let X and Y be two random variables with laws .cx,.cy respectively and distribution functions Fx, Fy respectively. Then the following three statements are equivalent:
(i) X and Yare independent; (ii) .cx,y =.cx x .cy; (iii) Fx,Y(x,y)
= Fx(x)Fy(y);
moreover, if (X, Y) has 'joint' pdf fx,y then each of (i)-(iii) is equivalent to (iv) fx,Y(x,y)
= fx(x)fy(y)
for Leb x Leb almost every (x,y).
You do not wish to know more about this either.
Here again, things are nice and tidy provided we work with finite or countable products, but require different concepts (such as Baire IT-algebras) if we work with uncountable products.
B(Rn) is constructed from the topological space Rn. Now, if Pi: Rn R is the
ith
-->
coordinate map:
then Pi is continuous, and hence B(Rn)-measurable. Hence
On the other hand, B(Rn) is generated by the open subsets of Rn, and every such open subset is a countable union of open 'hypercubes' of the form
II
(ai,bi)
l~k~n
and such products are in B(R)n. Hence, B(Rn)
= B(R)n.
o
In probability theory it is almost always product structures Bn which feature rather than B(Rn). See Section S.S.
.. (B.7)
Chapter B: Product Measure
81
8.6. The n-fold extension So far in this chapter, we have studied the product measure space of two measure spaces and how this relates to the study of two random variables. You are more than able to 'generalize' 'from two to n' from your experience of similar things ill other branches of mathematics. You should give some thought to the associativity of the 'product' in product measure space, something again familiar in analogous contexts. 8.7. Infinite products of probability triples This topic is not a trivial extension of previous results. We concentrate on a restricted context (though an important one) because it allows us to get the main idea in a clear fashion; and extension to infinite products of arbitrary probaLility triples is then a purely routine exercise. Canonical model for a sequence of independent RVs Let (An : n E N) be a sequence of probability measures on (R, B). We already know from the coin-tossing trickery of Section 4.6 that we can construct a sequence (Xn) of independent RVs, Xn having law An. Here is a more elegant and systematic way of doing this.
THEOREM Let (An Define
n E N) be a sequence of probability measures on (R, B).
so that a typical element w of n is a sequence (w n ) in R. Define Xn :
n --+ R,
and let F := o-(X n : n EN). Then there exists a unique probability measure P on (n, F) such that for r E Nand B 1 , B 2 , • •• ,Br E B,
(a) We write
(n,.r,p) =
II (R,B,An). nEN
Then the sequence (Xn : n E N) is a sequence of independent RVs on (n,F,p), Xn having law An.
Chapter 8: Product Measure
82
(8.7)..
Remarks. (i) The uniqueness of P follows in the usual way from Lemma 1.6, because product sets of the form which appear on the left-hand side of (a) form a 1r-system generating:F. (ii) We could rewrite (a) more neatly as
P(
IT Bk) = IT Jh(Bk). keN
keN
To see this, use the monotone-convergence property (1.10,b) of measures. Proof of the theorem is deferred to the Appendix to Chapter 9. 8.8. Technical note on the existence of joint laws Let (n,:F), (Sb !:1) and (S2, !:2) be measurable spaces. For i = 1,2, let Xi : n -> Sj be such that XiI : !:j ->:F. Define
Then (Exercise) X-I : !: -> :F, so that X is an (S, !:)-valued random variable, and if P is a probability measure on n, we can talk about the law p. of X (equals the joint law of Xl and X 2 ) on (S,!:) : p. = Po X-Ion !:. Suppose now that SI and S2 are metrizable spaces and that !:i = 8(Si) Then S is a metrizable space under the product topology. If SI and S2 are separable, then!: = 8(S), and there is no 'conflict'. However, if SI and S2 are not separable, then 8( S) may be strictly larger than !:, X need not be an (S,8(S»-valued random variable, and the joint law of Xl and X 2 need not exist on (S, 8 (S».
(i
= 1,2).
It is perhaps as well to be warned of such things. Note that the separability of R was used in proving that 8(Rn) ~ 8 n in Section 8.5.
PART B: MARTINGALE THEORY ChapteT' 9
Conditional Expectation
9.1. A motivating example Suppose that (O,:F, P) is a probability triple and that X and Z are random variables, X taking the distinct values Xl, XZ, ••• , X m, Z taking the distinct values Zl, Z2, .. . , Zn.
Elementary conditional probability:
P(X
= xilZ = Zj):= P(X = Xi; Z = Zj)jP(Z = Zj)
and elementary conditional expectation:
E(XIZ
= Zj) = 2:XiP(X = xdZ = Zj)
are familiar to you. The random variable Y = E(XIZ), the conditional expectation of X given Z, is defined as follows: (a)
if Z(w) = Zj, then Y(w) := E(XIZ = Zj) =: Yj (say).
It proves to be very advantageous to look at this idea in a new way. 'Reporting to us the value of Z(w)' amounts to partitioning 0 into' Z-atoms' on which Z is constant:
o
I
Z =
Zl
I
Z = Zz
I .. . I
Z = Zn
I
The u-algebra 9 = u(Z) generated by Z consists of sets {Z E B}, B E B, and therefore consists precisely of the 2 n possible unions of the n Z-atoms. It is clear from (a) that Y is constant on Z-atoms, or, to put it better,
(b)
Y
is
g-measurable.
89
(9.1) ..
Chapter 9: Conditional Expectation
84
Next, since Y takes the constant value Yj on the Z-atom {Z
r
J{z=~}
YdP = YjP(Z = Zj) = LXiP(X
= Zj}, we have:
= xdZ = Zj)P(Z = Zj)
j
= LXiP(X = Xii Z = Zj) = i
r
XdP.
J{z=Zj}
If we write Gj = {Z = Zj}, this says E(YIGj) = E(XIGj)' Since for every Gin 9, IG is a sum of IGj 's, we have E(YIG) = E(XIG), or
(c)
!aYdP= LXdP,
VGE9.
Results (b) and (c) suggest the central definition of modern probability. 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933) Let (n, F, P) be a triple, and X a random variable with E(lX!) < 00. Let 9 be a sub-u-algebra of F. Then there exists a random variable Y such that
~~~
(a)
Y is 9 measurable,
(b)
E(IY!) <
(c)
for every set G in 9 (equivalently, for every set G in some 7r-system which contains n and generates 9), we have
00,
/ YdP= / XdP, G
VGE 9.
G
Moreover, if Y is another RV with these properties then Y = Y, a.s., that is, P[Y = Y] = 1. A random variable Y with properties (a )-( c) is called a version of the conditional expectation E(XI9) of X given 9, and we write Y = E(XI9), a.s.
Two versions agree a.s., and when one has become familiar with the concept, one identifies different versions and speaks of the conditional expectation E(XI9). But you should think about the 'a.s.' throughout this course. The theorem is proved in Section 9.5, except for the 7r-system assertion which you will find at Exercise E9.1. ~Notation.
We often write E(XIZ) for E(Xlu(Z)), E(XIZI, Z2, ... ) for E(Xlu(Zl,Z2," .)), etc. That this is consistent with the elementary usage is apparent from Section 9:6.below.
.. (9.5)
Chapter 9: Conditional Expectation
85
9.3. The intuitive meaning
An experiment has been performed. The only information available to you regarding which sample point w has been chosen is the set of values Z(w) for every 9-measurable random variable Z. Then Yew) = E(XI9)(w) is the expected value of X(w) given this information. The 'a.s.' ambiguity in the definition is something one has to live with in general, but it is sometimes possible to choose a canonical version of E(XI9). Note that if 9 is the trivial <7-algebra {0, n} (which contains no information), then E(XI9)(w) = E(X) for all w. 9.4. Conditional expectation as least-squares-best predictor ~~
If E(X2) < 00, then the conditional expectation Y = E(XI9) i3 a ver3ion of the orthogonal projection (3ee Section 6.11) of X onto £2(n, 9, P). Hence, Y is the lea3t-3quare3-best 9-mea3urable predictor of X: among3t all 9 -measurable functions (i. e. among3t all predictors which can be computed from the available information), Y mtmmtzes
No surprise then that conditional expectation (and the martingale theory which develops it) is crucial in filtering and control - of space-ships, of industrial processes, or whatever. 9.5. Proof of Theorem 9.2
The standard way to prove Theorem 9.2 (see Section 14.14) is via the RadonNikodym theorem described in Section 5.14. However, Section 9.4 suggests a much simpler approach, and this is what we now develop. We can then prove the general Radon-Nikodym theorem by martingale theory. See Section 14.13. First we prove the almost sure uniqueness of a version of E(XI9). Then we prove the existence of E(XI9) when X E £2; and finally, we prove the existence in general. Almost sure uniqueness of E(XI9) Suppose that X E £1 and that Y and Yare versions of E(XI9). Then Y, Y E £l(n, 9, P), and
E(Y - Y;G)
= 0,
VGE9.
86
Chapter 9: Conditional Expectation
{9.5} ..
Suppose that Y and Y are not a!.most surely equal. We may assume that the labelling is such that P(Y > Y) > O. Since
we see that P(Y - Y > n- 1 ) > 0 for some n. But the set {Y - Y is in 9, because Y and Yare (i-measurable; and
a contradiction. Hence Y
= Y, a.s.
> n- 1 }
,0
Existence of E(XI9) for X E £2 Suppose that X E £2 := £2(n, F, P). Let 9 be a sub-O'-algebra of F, and let JC := £2(9) := £2(n, 9, P). By Section 6.10 applied to 9 rather than F, we know that JC is complete for the £2 norm. By Theorem 6.11 on orthogonal projection we know that there exists Y in JC = £2(9) such that
(a) (b)
(X - Y, Z)
= 0,
VZ in £2(9).
Now, if G E 9, then Z:= Ia E £2(9) and (b) states that E(Y;G)
= E(X;G).
Hence Y i3 a version of E(XI9), as required. Existence of E(XI9) for X E £1 By splitting X as X = X+ - X_, we see that it is enough to deal with the case when X E (£1 )+. So assume that X E (£1 )+. We can now choose bounded variables Xn with 0 ~ Xn i X. Since each Xn is in £2, we can choose a version Yn of E(XnI9). We now need to establish that ( c) it is almost surely true that 0
~
Yn
i.
We prove this in a moment. Given that (c) is true, we set Y(w):= lim sup Yn(w).
Then Y E m9, and Yn i Y, a.s. But now (MON) allows us to deduce that E(Y;G)=E(X;G)
(GE9)
from the corresponding result for Yn and X n •
0
Chapter 9: Conditional Expectation
.. (9.6)
87
A positivity result Property (c) follows once we prove that
(d) if U is a non.ne9ative bounded RV, then E(ulg) :::: 0, a.s. Proof of (d). Let W be a version of E(Ulg). If P(W < 0) > 0, then for some n, the set G := {W < _n- 1 } in g has positive probability, so that
OS; E(UjG) = E(WjG) < -n-IP(G) < O.
o
This contradiction finishes the proof.
9.6. Agreement with traditional usage The case of two RVs will suffice to illustrate things. So suppose that X and Z are RVs which have a joint probability density function (pdf)
fx,z(x,z).
IR
Then Jz(z) = fx,z(x,z)dx acts as a probability density function for Z. Define the elementary conditional pdf fXlz of X given Z via if fz(z~ i- OJ otherwIse.
fXlz(xlz):= {fx,z(x,z)/fz(z)
o
Let h be a Borel function on R such that
Elh(X)1 = klh(x)lfx(x)dx <
00,
where of course fx(x) = IRfx,z(x,z)dz gives a pdffor X. Set
g(z):= k h(x)fxlz(xlz)dx. Then Y := g(Z) is a version of the conditional expectation of h(X) given
a(Z). Proof The typical element of a(Z) has the form {w : Z(w) E B}, where BE B. Hence, we must show that (a)
L := E[h(X)IB(Z)] = E[g(Z)IB(Z)] =: R.
But
L
=
JJ
h(x)IB(z)Jx,z(x,z)dxdz,
R
and result (a) follows from Fubini's Theorem.
=
J
g(z)IB(z)Jz(z)dz,
o
Some of the practice is given in Sections 15.6.15.9, which you can look at now.
88 ~~~9.1.
(9.7) ..
Chapter 9: Conditional Expectation
Properties of conditional expectation: a list
These properties are proved in Section 9.S. All X's satisfy ECIXI) < 00 in this list of properties. Of course, 9 and 1{ denote sub-a-algebras of:F. (The use of 'c' to denote 'conditional' in (cMON), etc., is obvious.)
= E(X).
(a) If Y is any version of E(XI9) then E(Y) (b) If X is 9 measurable, then E(XI9)
= X,
(Very useful, this.)
a.s.
(c) (Linearity) E(aIXI + a2X219) = aIE(XI19) + a2E(X219), a.s. Clarification: if Yi. is a version of E(XI19) and Y2 is a version of E(X219), then al YI + a2Y2 is a version of E(aIXI + a2X219). (d) (Positivity) If X
~
0, then E(XI9)
~
0, a.s.
(e) (cMON) IfO::; Xn i X, then E(XnI9)j E(XI9), a.s. (f) (cFATOU) If Xn
~
0, then E[liminfXnI9]::; liminfE[Xn I9], a.s.
(g) (cDOM) If IXn(w)1 ::; V(w), Vn, EV < E(XnI9)
-+
00,
E(XI9),
and Xn
Important corollary: IIE(XI9)llp ::;
X, a.s., then
a.s.
(h) (cJENSEN) If c : R -+ R is convex, and Elc(X)1 < E[c(X)19] ~ c(E[XI9]),
-+
00,
then
a.s.
IIXlip for p ~
1.
(i) (Tower Property) If 1{ is a sub-a-algebra of 9, then E[E(XI9)11{]
= E[XI1{],
a.s.
Note. We shorthand LHS to E[XI911il for tidiness.
(j) ('Taking out what is known') If Z is 9-measurable and bounded, then E[ZXI9]
= ZE[XI9],
a.s.
I£p> 1, p-I +q-I = 1, X E O(!l,F,P) and Z E O(!l,9,P), then (*) again holds. If X E (mF)+, Z E (m9)+, E(X) < 00 and E(ZX) < 00,
then (*) holds. (k) (Role of independence) If 1{ is independent of a(a(X),9), then E[Xla(9,1{)]
= E(XI9),
a.s.
In particular, if X is independent of 1{, then E(XI1{)
= E(X),
a.s.
.. (9.8)
89
Chapter 9: Conditional Expectation
9.8. Proofs of the properties in Section 9.7 Property (a) follows since E(Yjn) = E(Xjfl), n being an element of g. Property (b) is immediate from the definition, as is Property (c) now that its Clarification has been given. Property (d) is not obvious, but the proof of (9.5,d) transfers immediately to our current situation. Proof of (e). If 0 ~ Xn j X, then, by (d), if, for each n, Y n is a version of E(Xnlg), then (a.s.) 0 ~ Y n j. Define Y:= lim sup Y n . Then Y E mg, and Yn j Y, a.s. Now use (MON) to deduce from
= E(XnjG),
E(YnjG)
VGE g,
that E(Yj G) = E(Xj G), VG E g. (Of course we used a very similar argument in Section 9.5.) 0 Proof of (f) and (g). You should check that the argument used to obtain (FATOU) from (MON) in Section 5.4 and the argument used to obtain (DOM) from (FATOU) in Section 5.9 both transfer without difficulty to yield the conditional versions. Doing the careful derivation of (cFATOU) from (cMON) and of (cDOM) from (cFATOU) is an essential exercise for ~u. 0 Proof of (h). From (6.6,a), there exists a countable sequence «an, bn » of points in R2 such that c(x) = sup(anx n
+ bn ),
x E R.
For each fixed n we deduce via (d) from c(X) surely,
~
E[c(X)lg] ~ anE[Xlg]
anX
+ bn
that, almost
+ bn.
By the usual appeal to countability, we can say that almost surely (**) holds simultaneously for all n, whence, almost surely, E[c(X)lg] ~ sup(anE[Xlg]
+ bn ) =
c(E[Xlg])·
n
Proof of corollary to (h). Let p ~ 1. Taking c(x)
= IxI P , we see that
E(IXIPlg) ~ IE(Xlg)/P, a.s.
o
gQ
Chapter g: Conditional Expectation
(g. 8)..
o
Now take expectations, using property (a).
Property (i) is virtually immediate from the definition of conditional expectation. Proof of (j). Linearity shows that we can assume that X ~ O. Fix a version Y of E(XIQ), and fix G in Q. We must prove that if Z is Q-measurable and appropriate integrability conditions hold, then
(***)
E(ZX; G)
= E(ZY; G).
We use the standard machine. If Z is the indicator of a set in Q, then (***) is true by definition of the conditional expectation Y. Linearity then shows that (***) holds for Z E SF+(n,Q,p). Next, (MON) shows that (***) is true for Z E (mQ)+ with the understanding that both sides might be infinite. All that is necessary to establish that property (j) in the table is correct is to show that under each of the conditions given, E(IZXI) < 00. This is obvious if Z is bounded and X is in r}, and follows from the Holder 0 inequality if X E £P and Z E £q where p > 1 and p-l + q-l = 1. Proof of (k). We can assume that X ~ 0 (and E(X) < 00). For G E Q and HE 1i, XIa and H are independent, so that by Theorem 7.1,
E(X; G n H) = E[(XIa)IH] = E(XIa)P(H). Now if Y = E(XIQ) (a version of), then since Y is Q-measurable, YIa is independent of 11. so that
E[(YIa)IH] = E(YIa)P(H) and we have
E[X;GnH] = E[Y;GnH]. Thus the measures
F
t-+
E(X; F),
F
t-+
E(Y; F)
on u(Q, 1i) of the same finite total mass agree on the 1T-system of sets of the form G n H (G E Q, HE 1i), and hence agree everywhere on u(Q, 1i). This is exactly what we had to prove. 0
91
Chapter 9: Conditional Expectation
.. (9.10)
9.9. Regular conditional probabilities and pdfs For F E F, we have P(F) = E(IF). For F E F and g a sub-a-algebra of F, we define P(Flg) to be a version of E(IFlg).
By linearity and (cMON), we can show that for a fixed sequence (Fn) of disjoint elements of F, we have
(a)
(a.s.)
Except in trivial cases, there are uncountably many sequences of disjoint sets, so we cannot conclude from (a) that there exists a map
P(·,·):
nxF
-+
[0,1)
such that (bl)
for F E F, the function w
(b2)
for almost every w, the map
1-+
F
pew, F) is a version of P(Flg);
1-+
P(w,F)
is a probability measure on F. If such a map exists, it is called a regular conditional probability given g. It is known that regular conditional probabilities exist under most conditions encountered in practice, but they do not always exist. The matter is too technical for a book at this level. See, for example, Parthasarathy (1967).
Important note. The elementary conditionai" pdf fXlz(x/z) of Section 9.6 is a proper - technically, regular - conditional pdf for X given Z in that for every A in 8, W 1-+
Proof. Take h
L
= IA
fXlz(xIZ(w»dx is a version of P(X
E
AIZ).
in Section 9.6.
o
9.10. Conditioning under independence assumptions Suppose that r E N and that X b X 2, ... , X r are independent RVs, X k having law A k • If h E bBr and we define (for Xl E R)
(a)
92
(9.10) ..
Chapter 9: Conditional Expectation
then
(b)
··/(Xl ) is a version of the conditional expectation E[h(Xl ,X2 , ••• ,Xr)IXlJ.
Two proofs of (b). We need only show that for B E 8,
(c)
E[h(Xl ,X2, ... ,Xr)IB(XdJ = E[,h(Xl)IB(Xl )].
We can do this via the Monotone-Class Theorem, the class 1t of h satisfying (c) contains the indicator functions of elements in the 7l"-system of sets of the form Bl X B2 x ... x Br (Bk E 8), etc., etc. Alternatively, we can appeal to the r-fold Fubini Theorem; for (c) says that
r
irE"'
h(x)IB(Xl)(Al x A2 x ...
X
Ar)(dx)
=
r
iX1ER
,h(Xl)IB(Xl)Al(dxd,
where
o 9.11. Use of symmetry: an example
Suppose that X l ,X2 , ••. are lID RVs with the same distribution as X, where E(IX!) < 00. Let Sn := Xl + X 2 + ... + X n , and define
On := a(Sn, Sn+l,"') = a(Sn,Xn+I,Xn+2 , •• • ). We wish to calculate
E(XlIOn)' for very good reasons, as we shall see in Chapter 14. Now a(Xn+l , X n+2, ... ) is independent of a(Xl,Sn) (which is a sub-a-algebra of a(Xl , ... ,Xn )). Hence, by (9.7,k), E(XI/On) = E(XlISn). But if A denotes the law of X, then, with Sn denoting Xl + X2 + ... + X n , we have E(Xl;Sn E B) =
J···1
xlA(dxdA(dx2)'" A(dxn)
'nEB
= E(X2; Sn
E B)
= ... = E(Xn; Sn
E B).
Hence, almost surely,
E(XlISn) = ... = E(XnISn) = n-lE(Xl + ... + XnlSn)
= n-ISn.
Chapter 10
Martingales
10.1. Filtered spaces ~~As
basic datum, we now take a filtered space (fl,:F, {Fn }, P). Here, (fl,F,P) is a probability triple as usual,
{:Fn : n ;?: O} is a flltratioll, a-algebras of :F:
that is, an increasing family of sub-
We define
:Foo := a(UFn)
~ F.
n
Intuitive idea. The information about w in fl available to us at (or, if you prefer, 'just after') time n consists precisely of the values of Z(w) for allFn measurable functions Z. Usually, {:Fn } is the natural flltration
Fn = a(Wo, W}, .. . , W n ) of some (stochastic) process W = (Wn : n E Z+), and then the information about w which we have at time n consists of the values _
Wo(w), WI (w), ... , Wn(w).
10.2. Adapted process process X = (Xn : n ;?: 0) is called adapted (to the filtration {Fn}) iffor each n, Xn is Fn-measurable.
~A
Intuitive idea. If X is adapted, the value Xn(w) is known to us at time n. Usually, Fn = a(Wo, WI"'" Wn ) and Xn = In(Wo, WI'"'' W n) for some BnH-measurable function In on RnH.
99
94
(10.S) ..
Chapter 10: Martingales
10.3. Martingale, supermartingale, submartingale ~~~A
process X is called a martingale (relative to ({Fn},
(i)
X is adapted,
(ii)
E(IXn!) <
(iii)
E[XnIFn-d
00,
P» if
"In,
= X n- 1,
a.s.
(n 2:: 1).
A supermarlingale (relative to ({Fn }, (iii) is replaced by
P» is defined similarly, except that (n 2:: 1),
and a sub martingale is defined with (iii) replaced by (n 2:: 1).
A supermartingale 'decreases on average'; a submartingale 'increases on average'! [Supermartingale corresponds to superharmonic: a function f on Rn is superharmonic if and only if for a Brownian motion B on Rn, feB) is a local supermartingale relative to the natural filtration of B. Compare Section 10.13.] Note that X is a supermartingale if and only if -X is a subraartingale, and that X is a martingale if and only if it is both a supermartingale and a submartingale. It is important to note that a process X for which Xo E .c 1 (r2,Fo,P) is a martingale [respectively, supermartingale, submartingale] if and only if the process X - Xo = (Xn - Xo : n E Z+) has the same property. So we can focus attention on processes which are null at o. ~
If X is for example a supermartingale, then the Tower Property of CEs, (9.7)(i), shows that for m < n,
E[XnIFm] = E[XnIFn-lIFm] ::; E[Xn-1IFml::;··· ::; X m ,
a.s ..
10.4. Some examples of martingales As we shall see, it is very helpful to view all martingales, supermartingales and submartingales in terms of gambling. But, of course, the enormous importance of martingale theory derives from the fact that martingales crop up in very many contexts. For example, diffusion theory, which used to be studied via methods from Markov-process theory, from the theory of
Chapter 10: Martingale3
.. (10.4)
95
partial differential equations, etc., has been revolutionized by the martingale approach. Let us now look at some simple first examples, and mention an interesting question (solved later) pertaining to each. (a) Sums ofindependent zero-mean RVs. Let X I ,X2 , ••• be a sequence of independent RVs with E(IXkl) < 00, Vk, and
Define (So := 0 and)
Sn := Xl + X 2 + ... + X n, Fn :=
Fo:= {0,Q}.
2:: 1, we have (a.s.)
The first (a.s.) equality is obvious from the linearity property (9.7,c). Since Sn-l is Fn_I-measurable, we have E(Sn-dFn-d = Sn-l (a.s.) by (9.7,b); and since Xn is independent of F n- l , we have E(XnIFn-d = E(Xn) (a.s.) by (9.7,k). That must explain our notation!
Intere3ting que3tion: when does limSn exist (a.s.)? See Section 12.5. (b) Products of non-negative independent RVs of mean 1. Let
Xl, X 2 , ••• be a sequence of independent non-negative random variables with
E(Xk) Define (Mo := 1, Fo := {0, Q} and)
Then, for n
2:: 1, we have (a.s.)
so that Jvf i3 a martingale.
= 1,
Vk.
(10.4) ..
Chapter 10: Martingales
96
It should be remarked that such martingales are not at all artificial.
Interesting question. Because AI is a non-negative martingale, Moo = limMn exists (a.s.); this is part of the Martingale Convergence Theorem of the next chapter. When can we say that E(Moo) = 17 'See Sections 14.12 and 14.17. (c) Accumulating data about a random variable. Let {Fn} be our filtration, and let ~ E £1(O,F,P). Define Mn := E(~IFn) ('some version of'). By the Tower Property (9.7,i), we have (a.s.)
Hence M is a martingale.
Interesting question. In this case, we shall be able to say that Mn -+ Moo := E(~I.roo),
a.s.,
because of Levy's Upward Theorem (Chapter 14). Now Mn is the best predictor of ~ given the information available to us at time n, and A100 is the best prediction of ~ we can ever make. When can we say that ~ = E(~IFoo), a.s7 The answer is not always obvious. See Section 15.8. 10.5. Fair and unfair games Think now of Xn - X n- 1 as your net winnings per unit stake in game n in a series of games, played at times n
o.
= 1,2, . ...
(n 2:: 1)
There is no game at time
In the martingale case, (a)
E[Xn - X n - 1IFn- 1] = 0,
(game series is fair),
and in the supermartingale case, (game series is unfavourable to you). Note that (a) [respectively (b)] gives a useful way of formulating the martingale [supermartingale] property of X. 10.6. Previsible process, gambling strategy ~~We
call a process C = (Cn : n E N) previsible if Cn is F n -
1
measurable (n 2:: 1).
.. (10.8)
Chapter 1 0: Martingales
97
Note that C has parameter set N rather than Z+: Co does not exist. Think of C n as your stake on game n. You have to decide on the value of Cn bascd on the history up to (and including) time n - 1. This is the intuitive significance of the 'previsible' character of C. Your winnings on game n are Cn(Xn - Xn-d and your total winnings up to time n are
Yn =
L
Ck(Xk - Xk-1) =: (C. X)n'
l::;k::;n Note that (C. X)o
= 0, and that
The expression C. X, the martingale transform of X by C, is the discrete analogue of the stochastic integral J C dX. Stochastic-integral theory is one of the greatest achievements of the modern theory of probability. 10.7. A fundamental principle: you can't beat the system! ~~(i)
(ii)
Let C be a bounded non-negative previsible process so that, for some K in [0,00), ICn(w)1 ::; J( for every n and every w. Let X be a supermartingale [respectively martingale}. Then C.X is a supermartingale [martingale} null at O. If C is a bounded previsible process and X is a martingale, then
(C • X) is a martingale null at O. (iii)
In (i) and (ii),the bo'undedness condition on C may be replaced by the condition C n E £2, Vn, provided we also insist that Xn E £2, Vn.
Proof of (i). Write Y for C. X. Since C n is bounded non-negative and 1 measurable,
Fn -
Proofs of (ii) and (iii) are now obvious. (Look again at (9.7,j).)
10.8. Stopping time
A map T : Q ~~(a)
~
{O, 1,2, ... ; oo} is called a stopping time if,
{T ::; n} = {w : T(w) ::; n} E F n ,
Vn::; 00,
0
(10.8) ..
Chapter 10: Martingales
98 equivalently,
(b)
{T=n}={w:T(w)=n}EFn,
Note that T can be
Vn:::;oo.
00.
Proof of the equivalence of (a) and (b). If T has property (a), then
{T = n}
= {T:::; n}\{T:::; n -I} E Fn.
If T has property (b), then for k :::; n, {T
{T:::; n} =
= k} E F"
~
Fn and
U {T = k} E Fn. O:$":$n
Intuitive idea. T is a time when you can decide to stop playing our game. Whether or not you stop immediately after the nth game depends only on the history up to (and including) time n: {T = n} E Fn. Example. Suppose that (An) is an adapted process, and that B E B. Let
T = inf{n
~ 0:
An E B}
= time of first entry of A into set B.
By convention, inf(0) = 00, so that T = 00 if A never enters set B. Obviously, {T:::; n} = {A" E B} E F n ,
U
":$n so that T is a stopping time. Example. Let L = sup{n : n :::; lOjAn E B}, sup(0) = O. Convince yourself that L is NOT a stopping time (unless A is freaky). 10.9. Stopped supermartingales are supermartingales
Let X be a supermartingale, and let T be a stopping time. Suppose that you always bet 1 unit and quit playing at (immediately after) time T. Then your 'stake process' is G(T), where, for n E N,
G~.r)=I{n:$T)'
so
Gf)(w)={l ifn:::;!(w),
o
otherwIse.
Your 'winnings process' is the process with value at time n equal to
(G(T). X)n
= XTl\n -
Xo.
99
Chapter 10: Martingales
.. (10.9)
If XT denotes the process X stopped at T:
then
C(T). X
= XT -
Xo.
Now C(T) is clearly bounded (by 1) and non-negative. Moreover, C(T) is previsible because C~T) can only be 0 or 1 and, for n E N, {C~T)
= O} = {T:S n -
I} E F n -
1•
Result 10.7 now yields the following result.
THEOREM. ~~(i)
If X is a 3Upermartingale and T is a stopping time, then the stopped process XT = (XTAn : n E Z+) i" a supermartingale, so that in particular,
E(XTI\n):S E(Xo), ~~(ii)
Vn.
If X is a martingale and T i" a stopping time, then XT is a martingale, so that in particular,
E(XTl\n)
= E(Xo),
Vn.
It is important to notice that this theorem imposes no extra integrability conditions whatsoever (except of course for those implicit in the definition of supermartingale and martingale). But be very careful! Let X be a simple random walk on Z+, starting at O. Then X is a martingale. Let T be the stopping time:
T:= inf{n : Xn
= I}.
It is well known that P(T < 00) = 1. (See Section 10.12 for a martingale proof of this fact, and for a martingale calculation of the distribution of T.) However, even though
E(XTAn)
= E(Xo) for every n,
we have
1 = E(XT) =I- E(Xo)
= O.
(10.9) ..
Chapter 1 0: Martingales
100
We very much want to know when we can say that E(Xr)
= E(Xo)
for a martingale X. The following theorem gives some sufficient conditions.
10.10. Dooh's Optional-Stopping Theorem ~~(a)
Let T be a "topping time. Let X be a "upermartingale. Then Xr i" integrable and E(Xr) :S E(Xo) in each of the following situations:
(i) T is bounded (for "ome N in N, T(w) :S N, VW)j (ii) X is bounded (for some J( in R+, IXn(w)1 :S J( for every nand every w) and T is a.s. finite; (iii) E(T) <
00,
and, for some K in R+,
IXn(w) - Xn-1(w)1 :S J( (b)
V(n,w).
If any of the conditions (i)-(iii) holds and X i" a martingale, then E(Xr)
= E(Xo).
Proof of (a). We know that Xr/>on is integrable, and
E{XrAn - Xo) :S o. For (i), we can take n = N. For (ii), we can let n --+ For (iii), we have
00
in (*) using (BDD).
rAn IXrAn -Xol
= 12:(Xk -Xk-l)l:s J(T k=l
and E(KT) < 00, so that (DOM) justifies letting n --+ the answer we want. Proof of (b). Apply (a) to X and to (-X).
00
in (*) to obtain 0
o
Chapter 10: Martingales
.. (10.11)
101
Corollary ~(c)
Suppose that M is a martingale, the increments Mn - M n- 1 of which are bounded by some constant ](1. Suppose that C is a previsible process bounded by some constant ](2, and that T is a stopping time such that E(T) < 00. Then
E(C. M)T
= O.
Proof of the following final part of the Optional-Stopping Theorem is left as an Exercise. (It's clear whose lemma is needed!)
(d)
If X is a non-negative supermartingale, and T is a stopping time which is a.J. finite, then
E(XT)
~
E(Xo).
10.11. Awaiting the almost inevitable In order to be able to apply some of the results of the preceding Section, we need ways of proving that (when true!) E(T) < 00. The following announcement of the principle that 'whatever always stands a reasonable chance of happening will almost surely happen - sooner rather than later' is often useful. LEMMA ~
Suppose that T is a stopping time such that for some N in Nand some c > 0, we have, for every n in N:
P(T ~ n Then E(T) <
+ NIFn) >
c,
a.s.
00.
You will find the proof of this set as an exercise in Chapter E. Note that if T is the first occasion by which the monkey in the 'Tricky exercise' at the end of Section 4.9 first completes
ABRACADABRA, then E(T) < 00. You will find another exercise in Chapter E inviting you to apply result (c) of the preceding Section to show that E(T)
= 2611 + 264 + 26.
A large number of other Exercises are now accessible to you.
(10.le) ..
Chapter 10: Martingale"
102
10.12. Hitting times for simple random walk Suppose that (Xn : n E N) is a sequence of IID RVs, each Xn having the same distribution as X where
P(X = 1) = P(X = -1) = !. Define So := 0, Sn := Xl
+ ... + X n, and set T := inf{n : Sn = 1}.
Let
:Fn = O'(Xlo". ,Xn) = O'(So, S1, ... , Sn)'
Then the process S is adapted (to {:Fn }), so that T is a stopping time. We wish to calculate the distribution of T. For 8 E R, Ee 'lx = Hell + e- II ) = cosh 8, so that E[(sech8)e'lXn ] = 1,
Vn.
Example (1O.4,b) shows that Mil i" a martingale, where
Since T is a stopping time, and Mil is a martingale, we have
(a) ~
EM~An
= E[(sech8)TAn exp(8STAn)] = 1,
Vn.
Now insist that 8> O. Then, firstly, exp(8STAn) is bounded by ell, so Mf,An is bounded by ell. Secondly, as n 1 00, Mf,An -+ Mf, where the latter is defined to be 0 if T = 00. The Bounded Convergence Theorem allows us to let n -+ 00 in (a) to obtain EMf, = 1 = E[(sech8)T e'lJ
the term inside [.J on the right-hand side correctly being 0 if T =
(b)
E[(sech 8)T] = e- II
We now let 8 ! O. Then (sech8)T 11 if T Either (MON) or (BDD) yields EI{T
00.
Hence
for 8 > O.
< 00, and (sech8)T 1 0 if T =
= 1 = peT < 00).
00.
~
103
Chapter 10: Martingales
.. (10.13)
The above argument has been given carefully to show how to deal with possibly infinite stopping times. Put a
(c)
= sech 8 in (b) to obtain E( aT) = L a"P(T = n) = e- fJ = a-I [1 -
~l,
so that
Intuitive proof of (c) We have
(d)
f(a) : = E(a T ) = lE(a T JX 1 = !a + !af( a)2.
= 1) + !E(a T JX 1 = -1)
The intuitive reason for the very last term is that time 1 has already elapsed giving the a, and the time taken to go from -1 to 1 has the form Tl + T 2 , where Tl (the time to go from -1 to 0) and T2 (the time to go from 0 to 1) are independent, each with the same distribution as T. It is not obvious that 'Tl and T2 are independent', but it is not difficult to devise a proof: the so-called Strong Markov Theorem would allow us to justify (d). 10.13. Non-negative superharmonic functions for Markov chains Let E be a finite or countable set. Let P matrix, so that, for i,j E E, we have Pij
~ 0,
L
Pik
= (Pij)
be a stochastic Ex E
= 1.
kEE
Let Il be a probability measure on E. We know from Section 4.8 that there exists a triple (!1, F, PI') (we now signify the dependence of P on Il) carrying a Markov chain Z = (Zn : n E Z+) such that (4.8,a) holds. We write 'a.s., PI" to signify 'almost surely relative to the pl'-measure'. Let Fn := u(Zo, Zt, ... , Zn). It is easy to deduce from (4.8,a) that if we write p(i,j) instead of Pij when typographically convenient, then (a.s.,PI')
Let h be a non-negative function on E and define the function Ph on E via
(Ph)(i) = LP(i,j)h(j). j
(10.19) ..
Chapter 1 0: Martingales
104
Assume that our non-negative function h is finite and P-superharmonic in that Ph ~ h on E. Then, (cMON) shows that, a.s., PI',
so that h(Zn) is a non-negative supermartingale (whatever be the initial distribution J.l). Suppose that
th~
chain Z is irreducible recurrent in that
hi := pi(Ti
< 00)
= 1,
Vi,j E E,
where pi denotes pI' when J.l is the unit mass (J.lj below) and Tj := inf{n : n 2: 1; Zn = j}.
= bii)
at i (see 'Note'
Note that the infimum is over {n 2: I}, so that Iii is the probability of a return to i if Z starts at i. Then, by Theorem IO.IO(d), we see that if h is non-negative and P-superharmonic, then, for any i and j in E,
so that h is constant on E.
Exercise. Explain (at first intuitively, and later with consideration of rigour) why lij
= LPik/ki + Pij k#j
2: LPidkj k
and deduce that if every non-negative P-superharmonic function is constant, then Z is irreducible recurrent. 0 So we have proved that our chain Z is irreducible and recurrent if and only il every nonnegative P-superharmonic function is constant. This is a trivial first step in the links between probability and potential theory. Note. The perspicacious reader will have been upset by a lack of precision in this section. I wished to convey what is interesting first. Only the very enthusiastic should read the remainder of this section.
.. (10.13)
Chapter 10: Martingale8
105
The natural thing to do, given the one-step transition matrix P, is to take the canonical model for the Markov chain Z obtained as follows. Let £ denote the a-algebra of all subsets of E and define
(fl,F):=
II (E,£). nEZ+
In particular, a point w of fl is a sequence W=(WO,Wl, ••. )
of elements of E. For w in fl and n in Z+, define
Then, for each probability measure Jl. on (E, £), there is a unique probability measure PI' on (fl, F) such that for n E Nand i o, i 1 , ••• , in E E, we have
The uniqueness is trivial because w-sets of the form contained in [.J on the left-hand side of (*), together with 0, form a 1r-system generating F. Existence follows because we can take PI' to be the Pl'-law of the noncanonical process Z constructed in Section A4.3:
Here, we regard
Z as
the map
Z:!1--+fl W f-4 (ZO(W),ZI(W), ... ), this map
Z being :t/:F measurable in that Z-l : :F --+
:to
The canonical model thus obtained is very satisfying because the measurable space (fl, F) carries all measures PI' simultaneously.
Chapter 11
The Convergence Theorem
11.1. The picture that says it all The top part of Figure 11.1 shows a sample path n >-+ Xn(w) for a process X where Xn - X n- t represents your winnings per unit stake on game n. The lower part of the picture illustrates your total-winnings process Y := C • X under the previsible strategy C described as follows: Pick two numbers a and b with a < b. REPEAT Wait until X gets below a Play unit stakes until X gets above b and stop playing UNTIL FALSE (that is, forever!). Black blobs signify where C = 1; and open circles signify where C Recall that C is not defined at time o.
= o.
To be more formal (and to prove inductively that Cis previsible), define
C t :=
I{Xo
and, for n :2: 2,
11.2. Up crossings The number UN[a,b](w) of upcrossings of [a,b] made by n >-+ Xn(w) by time N is defined to be the largest k in Z+ such that we can find
o S; 8t < tt < 82 < t2 < ... < 8k < tk
S; N
with
Xs;(w) < a,
Xt;{w) > b
106
(lS;iS;k).
(11.2)
Chapter 11: The Convergence Theorem
10?
Figure 11.1
o
b
o
~--------~------
__+-____________________
o a
~
______
~
________
~
________
~~~~
00·
00·
y 000
I
______
(11.£) ..
Chapter 11: The Convergence Theorem
108
The fundamental inequality (recall that Yo(w) := 0) ~(D)
YN(W) ~ (b - a)UN[a, bj(w) - [XN(W) - aj-
is ob,;ous from the picture: every up crossing of [a, bj increases the Y-value by at least (b - a), while the [XN(W) - aj- overemphasizes the loss during the last 'interval of play'.
11.3. Dooh's Upcrossing Lemma ~
Let X be a supermartingale. Let UN [a, bj be the number of upcrossings of(a, bj by time N. Then
Proof The process C is previsible, bounded and ~ 0, and Y = C • X. Hence Y is a supermartingale, and E(YN) :::; O. The result now follows from
(11.2,D). 11.4. COROLLARY ~
Let X be a supermartingale bounded in £,1 in that
supE(lXnl) <
00.
n
Let a,b E R with a
< b. Then, with Uoo[a,bj :=i limN UN[a,bj,
(b - a)EUoo[a, bj :::; lal + sup E(IXnl)
< 00
n
so that
P(Uoo[a,bj
= (0) = o.
Proof By (11.3), we have, for N E N, (b-a)EUN[a,bj :::;lal+E(/XNI) :::;lal+supE(/Xnl). n
Now let N
i
00,
using (MON).
o
.. (11.7)
Chapter 11: The Convergence Theorem
109
11.5. Doob's 'Forward' Convergence Theorem ~~~
Let X be a supermartingale bounded in £1 : supE(IXnl) < 00. n
Then,- almost surely, Xoo := limX n exists and is finite. For definiteness, we define Xoo(w) := limsupXn(w), Vw, so that Xoo is :Foo measurable and Xoo = limXn' a.s. Proof(Doob). Write (noting the use of [-00,00]):
A:
= {w: Xn(w) does not converge to a limit in = {w: liminf Xn(w) < limsupXn(W)}
U
[-oo,oo]}
{w : liminf Xn(W) < a < b < limsupXn(w)}
{a,bEQ:a
=:
UAa,b
(say).
But Aa,b ~
{w : Uoo[a, b](w)
= oo},
so that, by (11.4), P(Aa,b) = O. Since A is a countable union of sets Aa,b, we see that peA) = 0, whence
Xoo:= limXn exists a.s. in [-00,00]. But Fatou's Lemma shows that E(IXool)
= E(liminf IXnl)
:::;liminf E(IXnl) :::; supE(IXnl) < 00,
so that
P(Xoo is finite)
o
= 1.
Note. There are other proofs for the discrete-parameter case. None of these is as probabilistic, and none shares the central importance of this one for the continuous-parameter case.
11.6. Warning As we saw for the branching-process example, it need not be true that -+ Xoo in £1.
Xn
11. 7. Corollary ~~
If X is a non-negative supermartingale, then Xoo := limX n exists almost surely.
Proof X is obviously bounded in £1, since E(IXnl)
= E(Xn) :::; E(Xo).
0
Chapter 12
Martingales bounded in £, 2
12.0. Introduction When it works, one of the easiest ways of proving that a martingale M is bounded in r} is to prove that it is bounded in £2 in the sense that (a)
sup I/Mnl/2 n
< 00, equiva:Iently,
supE(M~)
< 00.
n
Boundedness in £2 is often easy to check because of a Pythagorean formula (proved in Section 12.1) n
E(M~) = E(M~) + EE[(Mk - Mk-l?J. k=l
The study of sums of independent random variables, a central topic in the classical theory, will be seen to hinge on Theorem 12.2 below, both parts of which have neat martingale proofs. We shall prove the ThreeSeries Theorem, which says exactly when a sum of independent random, variables converges. We shall also prove the general Strong Law of Large Numbers for lID RVs and Levy's extension of the Borel-Cantelli Lemmas. 12.1. Martingales in
.c 2 ;
orthogonality of increments
Let M = (Mn : n ~ 0) be a martingale in £2 in that each Mn is in.c2 so that E(M~) < 00, Vn. Then for s, t, 11, V e Z+, with s ~ t ~ 11 ~ v, we know that E(M,,/Fu) = Mu (a.s.), so that M" -Mu is orthogonal to .c 2 (Fu) (see Section 9.5) and in particular,
(a)
(Mt
-
M.,M" - Mu) = O. 110
Chapter 12: Martingales bounded in C2
.. (12.1)
111
Hence the formula n
Mn
= Mo + 2)Mk -
Mk-1)
k=1 expresses Mn as the sum of orthogonal terms, and Pythagoras's theorem yields n
E(M~)
(b)
= E(MJ) + E E[(Mk -
Mk-dl·
k=1
THEOREM ~
Let M be a martingale for which Mn E C2, Vn. Then M is bounded in £.2 if and only if
EE[(Mk -Mk-I)21 <
(c)
OOj
and when this obtains,
Mn
-+
1.100 almost surely and in £.2.
Proof. It is obvious from (b) that condition (c) is equivalent to the statement M is bounded in t:.2. Suppose now that (c) holds. Then M is bounded in C2, and hence, by the property of monotonicity of norms (Section 6.7), M is bounded in C1. Doob's Convergence Theorem 11.5 shows that Moo := lim Mn exists almost surely. The Pythagorean theorem implies that
n+r (d)
E[(Mn+r - Mn?l
=
E
E[(Mk - Mk-l?l·
k=n+1 Letting r
(e)
-+ 00
and applying Fatou's Lemma, we obtain
E[(Moo - Mn?l S;
E
E[(Mk - Mk-1?1·
k;?n+l Hence
(f) so that Mn -+ M(X) in C2. Of course, (f) allows us to deduce from (d) that (e) holds with equality.
Chapter 12: Martingales bounded in £2
112
(12.2) ..
12.2. Sums of zero-mean independent variables in £2
THEOREM ~~
Suppose that (X" : kEN) is a sequence of independent random variables such that, for every k,
(a)
Then
(L>·~ < 00) (b)
implies that
e2:X" converges, a.s.).
eXk)
If the variables are bounded by some constant ]( in [0,00) in that IX,,(w)1 $](, Vk, Vw, then
e2: X" converges, a.s.)
implies that
(2: q~ < 00).
Note. Of course, the Kolmogorov 0-1 law implies that
PCE X"
converges) = 0 or 1.
Notation. We define
(with Fo := {0, Q}, Mo ;= 0, by the usual conventions). We also define n
An;= 2:q~, k=1
so that Ao ;= 0 and No := o. Proof of (a). We know from (lOA,a) that Al is a martingale. Moreover
so that, from (12.1,b), n
E(M~) =
2: q~ = An. k=1
If E q~ < 00, then M is bounded in £2, so that lim Mn exists a.s.
0
.. (1£.9)
119
Chapter 12: Martingales bounded in £,2
Proof of (b). We can strengthen (*) as follows: since Xk is independent of
:Fk-l, we have, almost surely,
A familiar argument now applies: since J.{k-l is :Fk-l measurable,
11~
2Mk-1E(Mkl.rk-J) + MLI = E(Mll:Fk-l) - MI-l (a.s.)
= E(Mil:F/;-l) -
But this result states that
N is a martingale. Now let c E (0,00) and define
T:= inf{r : IMrl > c}. "\Ve know that NT is a martingale so that, for every n,
But since IMT - MT-ll = IXTI $ J( if T is finite, we see that for every n, whence
IM!I $
J{
+c
(**) However, since EXn converges a.s., the partial sums of EXk are a.s. bounded, and it must be the case that for some c, P(T = 00) > O. It is now clear from (**) that Aoo := E 11~ < 00. 0 Remark. The proof of (b) showed that if (Xn) is a sequence of independent zero-mean RVs uniformly bounded by some constant J(, then
(P{
partial sums·of EXk are bounded}
> 0) ::::} (EXk
converges a.s.)
Generalization. Sections 12.11-12.16 present the natural martingale form of Theorem 12.2 with applications. 12.3. Random signs
Suppose that (an) is a sequence of real numbers and that (en) is a sequence of lID RVs with P(en = ±1) = !.
(12.9).
Chapter 12: Martingales bounded in £2
114
The results of Section 12.2 show that converges (a.s.) if and only if "E- a~
L: enan and that
"E- enan
(a.s.) oscillates infinitely if "E- a~
< 00,
= 00.
You should think about how to clinch the latter statement. 12.4. A symmetrization technique: expanding the sample space We need a stronger result than that provided by (12.2,b).
LEMMA SUppOiJe that (Xn) is a sequence of independent random variables bounded by a constant K in [0,(0):
IXn(w)1
~
K,
Vn, Vw.
Then
("E- Xn
converges, a.s.) ::}
("E- E(Xn) converges and "E- Var(Xn) < (0).
Proof If each Xn has mean zero, then of course, this would amount to (12.2,b). There is a nice trick which replaces each Xn by a 'symmetrized version' Z~ of mean 0 in such a way as to preserve enough of the structure.
Let (n,ft,p,(.Kn Define
n EN» be an exact copy of (n,.1', P, (Xn : n EN».
:
(n",.1'*, PO) := (n,.1', P) x (Q, ft, P) and, for w* = (w,w) E n*, define X~(w*) := Xn(w),
X~(w*):= Xn(W),
Z~(w*):= X~(w*) - .Kn(w*).
We think of X~ as Xn lifted to the larger 'sample space' (n*,.1'*, PO). It is clear (and may be proved by applying Uniqueness Lemma 1.6 in a familiar way) that the combined family (X~ : n E N) U (X~ : n E N)
is a family of independent random variables on (n*,.1'*, PO), with both and X~ having the same PO-distribution as the P-distribution of Xn: P*
0
(X~)-l
= Po X;;-l
on (R, B), etc.
X~
Chapter 12: Martingales bounded in £2
.. (12.5)
115
Now we have
(a)
(Z~: n
EN") is a zero-mean sequence of independent random variables on (O",.P,P") such that IZn(w")I:::; 2K (Vn,Vw") and Var(Z~)
= 20"~,
where O"~ := Var(Xn).
Let
G:= {w EO: EXn(w) converges}, with G d~fined similarly. Then we are given that P{G) = peG) = 1, so that P"(G x G) = 1. But EZ~(w·) converges on G x G, so that P"(EZ~ converges)
(b)
= 1.
From (a) and (b) and (12.2,b), we conclude that
LO"~ < 00, and now it follows from (12.2,a) that
(c)
E[X n - E(Xn)] converges, a.s.,
the variables in this sum being zero-mean independent, with
Since (c) holds and EX n converges (a.s.) by hypothesis, EE(Xn) converges. 0 Note. Another proof of the lemma may be found in Section 18.6.
12.5. Kolmogorov's Three-Series Theorem Let (Xn) be a sequence of independent random variables. Then E Xn converges almost surely if and only if for some (then for every) K > 0, the following three properties hold:
(i) EP(/Xn / > K) <
00,
n
(ii)
E E(X~()
converges,
n
(iii) EVar(X,{') < n
00,
Chapter 12: Martingales bounded in £2
116
(12.5) ..
where
X K( )._ {Xn(W) n w.0
if IXn(w)1 ::; K, if IXn(w)1 > J(.
Proof of 'if' part. Suppose that for some K > 0 properties (i)-(iii) hold. Then LP(Xn =F X~() = LP(IXnl > K) < 00, so that by (BC1)
P(Xn
= x!f for all but finitely many n) = 1.
It is therefore clear that we need only show that EX!f converges almost surely; and because of (ii), we need only prove that
E YnK
converges, a.s., where YnK := X!f - E(X!f).
However, the sequence (YnK random variables with
:
n E N) is a zero-mean sequence of independent
Because of (iii), the desired result now follows from (12.2,a).
o
Proof of 'only if' part. Suppose that E Xn converges, a.s., and that K is any constant in (0,00). Since it is almost surely true that Xn -+ 0 whence IXnl > K for only finitely many n, (BC2) shows that (i) holds. Since (a.s.) Xn = x!f for all but finitely many n, we know that
EX!f converges, a.s. Lemma 12.4 completes the proof.
o
Results such as the Three-Series Theorem become powerful when used in conjunction with Kronecker's Lemma (Section 12.7).
12.6. Cesaro's Lemma Suppose that (b n ) is a sequence of strictly positive real numbers with bn i 00, and that (v n ) is a convergent sequence of real numbers: Vn -+ Voo E R. Then
.. (12.7)
Chapter 12: Martingales bounded in £2
117
Here, bo := O. Proof. Let
E
> O. Choose N such that Vk > Voo - E whenever k
~
N.
Then, liminf-I- f)bk - bk-I)Vk n-oo
n k=l
~ liminf { ~ ~(bk ~
0 + Voo
bk-t)Vk
+ bn ~ bN (voo -
E) }
-E.
Since this is true for every E > 0, we have lim inf similar argument, lim sup ~ V oo , the result follows.
~
Voo; and since, by a
0
12.7. Kronecker's Lelll.llla ~
Again, let (b n ) denote a ~equence of strictly positive real numbers with bn i 00. Let (xn) be a sequence of real numbers, and define Sn :=
Xl
+ X2 + ... + X n .
Then
Proof. Let
Un
Thus
:= Lk~n(Xk/h), so that
U
oo := lim Un exists. Then
n
Sn
=L
n
bk(uk - Uk-I)
k=1
= bnu n -
L(bk - h-I)Uk-I' k=1
Cesaro's Lemma now shows that
o
Chapter 12: Martingales bounded in £2
118
{12.8} ..
12.8. A Strong Law under variance constraints
LEMMA Let (Wn) be a sequence of independent random variables such that
Proof. Dy Kronecker's Lemma, it is enough to prove that E(Wn/n) con0 verges, a.s. But this is immediate from Theorem 12.2(a). Note. We are now going to see that a truncation technique enables us to obtain the general Strong Law for lID RVs from the above lemma.
12.9. Kohnogorov's Truncation Lemma Suppose thatX1 ,X2 , ••• are IID RVs each with the same distribution as X, where E(IX\) < 00. Set 1-':= E(X). Define
y. ._ {Xn n'0
if IXnl :5 n, iflXnl > n.
Then
(i) E(Yn)
-+
1-';
(ii) P[Yn = Xn eventually} = 1; (iii)
E n-2 Var(Yn ) < 00.
Proof of (i). Let
Z ._ {X0 n·-
if IXI :5 n, if IXI > n.
Then Zn has the same distribution as Y n , so that in particular, E(Zn) E(Yn). But, as n -+ 00, we have
so, by (DOM), E(Zn)
-+
E(X) = 1-'.
=
0
119
Chapter 12: Martingales bounded in £2
.. (12.10)
Proof of (ii). We have 00
L P(Yn;f Xn) = LP(IXnl > n) = LP(IXI > n) n=l 00
= E
L I{IXI>n} = E L l :s; E(IXI) <
00,
l:::;n
n=1
so that by (Bel), result (ii) holds.
0
Proof of (iii). We have
where, for 0
< z < 00, f(z) =
L
n- 2 :s; 2/max(l,z).
n~rnax(l,z)
We have used the fact that, for n
~
1,
Hence
o
12.10. Kolmogorov's Strong Law of Large Numbers (SLLN) Let X l ,X2 , ••• be lID RVs with E(IXkl)
Sn:= Xl +X2
<
00,
Vk. Define
+ ... +Xn •
Then, with J1. := E(Xk)' Vk, n -1 Sn
-+
J1.,
almost surely.
Proof. Define Y n as in Lemma 12.9. By property (ii) of that lemma, we need only show that n- 1 -+ J1., a.s.
LYk
k:::;n
Chapter 12: Martingales bounded in £2
120
(12.10) ..
But
(a) where W k := Yk - E(Yk). But, the first term on the right-hand side of (a) tends to J1. by (12.9,i) and Cesaro's Lemma; and the second almost surely 0 converges to 0 by Lemma 12.8. Notes. The Strong Law is philosophically satisfying in that it gives a precise formulation of E(X) as 'the mean of a large number of independent realizations of X'. We know from Exercise E4.6 that if E(IXI) = 00, then
lim sup ISn lin
= 00,
a.s ..
So, we have arrived at the best possible result for the IID case. Discussion of methods. Even though we have achieved a good result, it has to be admitted that the truncation technique seems 'ad hoc': it does not have the pure-mathematical elegance - the sense of rightness - which the martingale proof and the proof by ergodic theory (the latter is not in this book) both possess. However, each of the methods can be adapted to cover Jituations which the others cannot tackle; and, in particular, classical truncation arguments retain great importance.
Properly formulated, the argument which gave the result, Theorem 12.2, on which all of this chapter has so far relied, can yield much more.
12.11. Doob decomposition
In the following theorem, the statement that 'A is a previsible process null at 0' means of course that Ao = 0 and An E m.Fn - 1 (n EN).
THEOREM ~~(a)
Let (Xn : n E Z+) be an adapted process with Xn E £1, 'In. Then X has a Doob decomposition
.. (12.12)
Chapter 12: Martingales bounded in £2
121
where M is a martingale null at 0, and A is a previsible process null at o. Moreover, this decomposition is unique modulo indistinguishability in the sense that if X = Xo + if + A is another such decomposition, then ~(b)
X is a ~ubmartingale if and only if the process A is an increasing process in the sense that
Proof. If X has a Doob decomposition as at (D), then, since M is a martingale and A is previsible, we have, almost surely,
E(Xn - Xn-1IFn-t}
= E(Mn = 0 + (An
Mn-1IFn-J) - An-I).
+ E(An
- An-1IFn-l)
Hence n
(c)
An =
L E(Xk -
Xk-tiFk-I),
a.s.,
k=l
and if we use (c) to define A, we obtain the required decomposition of X. The 'submartingale' result (b) is now obvious. 0 Remark. The Doob-Meyer decomposition, which expresses a submartingale in continuous time as the sum of a local martingale and a previsible increasing process, is a deep result which is the foundation stone for stochastic-integral theory. 12.12. The angle-brackets process (M) Let M be a martingale in £2 and null at Jensen's inequality shows that
(a)
o.
Then the conditional form of
M2 is a submartingale.
Thus}'1 has a Doob decomposition (essentially unique):
(b)
M2
= N + A,
where N is a martingale and A is a previsible increasing process, both N and A being null at o. Define Aoo :=; lim An, a.s. Notation. The process A is often written (}'1).
Chapter 12: Martingales bounded in £2
122 Since E(M~)
= E(An), we see that
M is bounded in
(c)
(12.12).
£2
if and only if E(Aoo) <
00.
It is important to note that ~(d)
An - A n- I = E(M~ - M~_II.1'n-l)
= E[(Mn -
Mn-dl.1'n-ll·
12.13. Relating convergence of M to finiteness of (M)OC) Again let M be a martingale in £2 and null at O. Define A := (M). (More strictly, let A be 'a version of' (M).)
THEOREM
limMn(w) exists for almost every w for which Aoo(w) <
~(a)
00.
n
Suppose that M has uniformly bounded increment$ in that for some in R, IM,,(w) - Mn_l(w)1 ::; 1<, Vn, Vw.
~(b)
J{
Then AOC)(w)
< 00 for a.lmost every w for which lim Mn(w) exists. n
Remark. This is obviously an extension - and a very substantial one - of Theorem 12.2. Proof of (a). Because A is previsible, it is immediate that for every kEN, S(k):= inf{n E Z+ : An + 1
> k}
defines a stopping time S(k). Moreover, the stopped proceS$ AS(k) is previsible because for B E 5, and n E N,
{AnI\S(k) E B}
= FI U Fl ,
where n-I
FI :=
U {S(k) = r; Ar E B} E .1'n
-1,
r=O
Fl := {An E B}
n {S(k) ::; n
Since
(MS(k»l _ AS(k)
= (Ml
-lY E .1'n- l • _ A)S(k)
is a martingale, we now see that (MS(k)} = AS(k). However, the process AS(k) is bounded by k, so that by (12.12,c), MS(k) is bounded in £2 and
(c)
lim.Mnl\s(k) exists almost surely. n
.. (12.14)
123
Chapter 12: M artingaZeJ bounded in [,2
However,
{Aoo < oo}
(d)
= U{S(k) = oo}. k
o
Result (a) now follows on combining (c) and (d).
Proof Qf(b). Suppose that
P(A oo
= 00, sup IMnl < 00) > O. n
Then for some e
(e)
> 0, P(T(e)
= 00,
Aoo
= 00) > 0,
where T(e) is the stopping time:
T(e) := inf{r : IMrl > e}. Now, E(Mf(c)"n - AT(c)"n)
= 0,
and MT(c) is bounded by e + I<. Thus
(f)
EAT(c)"n
:s: (e + I,
\In.
But (MON) shows that (e) and (f) are incompatible. Result (b) follows. 0 Remark. In the proof of (a), we were able to use previsibility to make the jump AS(k) - AS(k)-1 irrelevant. We could not do this for the jump MT(c) - MT(c)-1 which is why we needed the assumption about bounded increments. 12.14. A trivial 'Strong Law' for martingales in £}
Let M be a martingale in [,2 and null at 0, and let A = (M). Since (l+A)-1 is a bounded previsible process,
defines a martingale W. Moreover, since (1 E[(Wn - Wn-dIFn-ll
+ An) is F n- I
= (1 + An)-2(An :S (1
+ An_t)-I
measurable,
An-I)
- (1
+ An)-t,
a.s.
(12.14)··
Chapter 12: Martingale3 bounded in £2
124
We see that (W)oo ::; 1, a.s., so that lim Wn exists, a.s .. Kronecker's Lemma now shows that ~~(a)
Mn/An
--+
= oo}.
0 a.s. on {Aoo
12.15. Levy's extension of the Borel-CallteIIi Lemmas
THEOREM Suppo3e that for n E N, En E Fn. Define
LIE. = number of Ek(k ::; n) which occur.
Zn:=
l~k~n
Y n :=
L
~k.
l~k~n
Then, almo3t surely,
(a) (Yoo
<: 00) => (Zoo < 00),
(b) (Yoo
= 00) => (Zn/Yn
--+
1).
Remarks. (i) Since E~k = P(Ek), it follows that if EP(Ek) < 00, a.s. (BCl) therefore follows.
00,
then
Y00 <
(ii) Let (En: n E N) be a sequence of independent events associated with some triple (fl, F, P), and define Fn = a(E1 , E 2 , ••• ,En). Then 6 = P(Ek), a.s., and (BC2) follows from (b). Proof. Let M be the martingale Z - Y, so that Z = M + Y is the Doob decomposition of the submartingale Z. Then (you check!)
An := (M)n =
L ~k(l -
~k) ::; Yn,
a.s.
k~n
If Yoo < 00, then A"" < 00 and limMn exists, so that Zoo is finite. (We are skipping 'except for a null w-set' statements now.) If Y00 = Zn/Yn --+ 1.
00
and Aoo <
If Y00 = 00 and Aoo = Mn/Yn --+ 0 and Zn/Yn --+ 1.
00
then lim Mn exists and it is trivial that
00,
then M n/ An
--+
0, so that, a fortiori, 0
.. (12.16)
Chapter 12: Martingales bounded in £2
125
12.16. Comments
The last few sections have indicated just how powerful the use of (M) to study M is likely to be. In the same way as one can obtain the conditional version Theorem 12.15 of the Borel-Cantelli Lemmas, one can obtain conditional versions of the Three-Series Theorem etc. But a whole new world is opened up: see Neveu (1975), for example. In the continuous-time case, things are much more striking still. See, for example, Rogers and Williams (1987).
Chapter 13
Uniform Integrability
We have already seen a number of nice applications of martingale theory. To derive the full benefit, we need something better than the DominatedConvergence Theorem. In particular, Theorem 13.7 gives a necessary and sufficient condition for a sequence of RVs to converge on C1. The new concept required is that of a uniformly integrable (VI) family of random variables. Thi3 concept linb perfectly with conditional expectation3 and hence with martingale3. The appendix to this chapter contains a discussion of that topic loved by examiners and others: modes of convergence. Our use of the Upcrossing Lemma has meant that this topic does not feature large in the main text of this book. 13.1. An 'absolute continuity' property
LEMMA ~(a)
Suppo3e that X E C1 == C1(n,F,p). Then, given c > 0, there exi3t3 a 0 > 3uch that for F E F, P(F) < 0 implie3 that E(IXli F) < c.
°
Proof. If the conclusion is false, then, for some co > 0, we can find a sequence (Fn) of elements of F such that
Let H :== limsupFn. Then (BC1) shows that P(H) == 0, but the 'Reverse' Fatou Lemma (5.4,b) shows that
E(IXI; H) ~ coi and we have arrived at the required contradiction.
126
o
127
Chapter 19: Uniform Integrability
.. (19.9) Corollary
(b)
Suppo&e that X E £1 and that c > O. Then there exists K in [0,00) such that E(IXli IXI > K) < c.
Proof. Let 5 be as in Lemma (a). Since KP(IXI
> K) :S E(IXI),
o
we can choose K such that P(IXI > K) < 5.
13.2. Definition. VI family ~~
A class C of random variables i3 called uniformly integrable (VI) if given c > 0, there exi3ts K in [0,00) 3uch that
E(IXli IXI > K) < c,
"IX E C.
We note that for such a class C, we have (with Kl relating to c ::: 1) for every X E C, E(IXI)
= E(IX/i IXI > $ 1 + KI,
Kd + E(IXli IX/ $
K 1)
Thus, a UI family i3 bounded in £1. It is not true that a family bounded in £1 is UI.
Example. Take(n,F,P):::([O,I]'B[O,IJ,Leb). Let
Then E(IXnl) = 1, "In, so that (Xn) is bounded in £1. However, for any K > 0, we have for n > K,
so that (Xn) is not UI. Here, Xn
--+
0, but E(Xn)
-r o.
13.3. Two simple sufficient conditions for the VI property
.(a)
Supp03e that C i& a cla3s of random variable3 which is bounded in £P for 30me p > Ij thu3, for 30me A E [0,00), E(/XIP)
<
A,
"IX E C.
(19.9) ..
Chapter H: Uniform Integrability
128
Then C is Ul. Proof. If v 2:: K> 0, then v ::; X E C, we have
](l-pv p
(obviously!). Hence, for ](
> 0 and
E(jXI; IXI > K) ::; K1-PE(jXIP; IXI > K)::; K1-p A.
o
The result follows. (b)
Suppose that C is a class of random variables which is dominated by an integrable non-negative variable Y: IX(w)1 ::; Y(w),
VX E C and E(Y) <
00.
Then C is UI. Note. It is precisely this which makes (DO:M) work for our (st, F, P). Proof. It is obvious that, for K > 0 and X E C,
E(IXI; IXI > K) ::; E(Y; Y > K), and now it is only necessary to apply (13.1,b) to Y.
o
13.4. VI property of conditional expectations The mean reason that the VI property fits in so well with martingale theory is the following. See Exercise E13.3 for an important extension.
THEOREM ~~
Let X E £1. Then the class
{E(XI9) : 9 a sub-a-algebra of F} is uniformly integrable. Note. Because of the business of versions, a formal description of the class C in question would be as follows: Y E C if and only if for some sub-a-algebra 9 of F, Y is a version of E(XI9). Proof. Let e > 0 be given. Choose 5 > 0 such that, for FE :F, P(F) < 5 implies that E(IXI; F) < e.
Chapter 19: Uniform Integrability
.. {19.5}
129
Choose ]( so that K-IE(IXI) < h. Now let g be a sub-u-algebra of :F and let Y be any version of E(Xlg). By Jensen's inequality,
(a)
IYI :5 E(IXII g),
a.s.
Hence E(IYI) :5 E(IXI) and KP(IYI > K) :5 E(IYI) :5 E(IXI), so that P(IYI > K) <
o.
But {WI> K} E g, so that, from (a) and the definition of conditional expectation, o E(IYlj WI ~ K) :5 E(IXlj WI ~ K) < c. Note. Now you can see why we needed the more subtle result (13.1,a), not just the result (13.1,b) which has a simpler proof.
13.5. Convergence ill probability Let (X n) be a sequence of random variables, and let X be a random variable. We say that ~
Xn
-+
X in probability
if for every c > 0,
P(IX n -XI> c)
-+
0 as n
-+ 00.
LEMMA ~
If X n
-+
X almost surely, then Xn
-+
X in probability.
Proof.. Suppose that Xn -+ X almost surely and that c > O. Then by the Reverse Fatou Lemma 2.6(b) for sets, 0= P(IX n - XI> c, i.o.) = P(limsup{IXn - XI> c}) ~ limsupP(IXn - XI> c),
and the result is proved.
o
130
(13.5) ..
Chapter 13: Uniform Integrability
Note. As already mentioned, a discussion of the relationships between various modes of convergence may be found in the appendix to this chapter.
13.6. Elementary proof of (BDD) We restate the Bounded Convergence Theorem, but under the weaker hypothesis of 'convergence in probability' rather than 'almost sure convergence'.
THEOREM (BDD) Let (Xn) be a sequence of RV~, and let X be a RV. Suppose that Xn -+ X in probability and that for some J( in [0,00), we have for every nand w Then
E(IXn - XI) Proof Let us check that P(IXI :S K)
so that
POXI > J( + k- 1 )
= 1.
-+
O.
Indeed, for kEN,
= O. Thus
P(lXI > K)
= P(U{IXI > J( + k- 1 }) = o. k
Let c > 0 be given. Choose no such that
P(IXn - XI >i c) < Then, for n
~
E(IXn - Xi)
3~
when n
~ no·
no,
= E(IXn
- XI; IXn - XI >ic) + E(IXn >ic) +!c :$ c.
-
XI; IXn - XI :Sic)
:$ 2J(P(lXn - XI
The proof is finished.
o
This proof shows (much as does that of the Weierstrass approximation theorem) that convergence in probability is a natural concept.
.. {19.7}
Chapter 19: Uniform Integrability
13.7. A necessary and sufficient condition for
.c 1
191 convergence
THEOREM ~
.c
Let (Xn) be a sequence in .c1, and let X E 1 • Then Xn ~ X in 1 , equivalently E(IXn - XI) ~ 0, if and only if the following two condition., are 3ati3fied:
.c
(i) Xn
~
X in probability,
(ii) the 3equence (Xn) i., UI. I
Remarks. It is of course the 'if' part of the theorem which is useful. Since the result is 'best possible', it must improve on (DOM) for our (n,F,p) triple; and, of course, result 13.3(b) makes this explicit. Proof of 'if' part. Suppose that conditions (i) and (ii) are satisfied. For K E [0,00), define a function cP K : R ~ [- K, K] as follows: J(
CPJ«(x)::=
{
x -J(
if x> K, iflxl::;K, if x < -K.
Let 0: > 0 be given. By the VI property of the (Xn) sequence and (13.1,b), we can choose K so that
E{lCPK(X) - XI} <
0:
3·
But, since IcpK(X) - CPK(y)1 ::; Ix - yl, we see that CPK(Xn ) ~ CPJ«(X) in probability; and by (BDD) in the form of the preceding section, we can choose no such that, for n ~ no,
The triangle inequality therefore implies that, for n ~ no,
E(lXn - XI) <
0:,
o
and the proof is complete. Proof of 'only if' part. Suppose that Xn ~ X in Choose N such that
n ~N
=>
.c
1•
E(IXn - XI) < 0:/2.
Let
0:
> 0 be given.
(1S.7) ..
Chapter 1 S: Uniform Integrability
132
By (13.1,a), we can choose Ii > 0 such that whenever P(F) < Ii, we have
E(IXnliF) < c E(IXli F) < c/2. Since (Xn) is bounded in
(1 S n S N),
.c 1 , we can choose K K- 1 supE(IXri)
such that
< Ii.
r
Then for n ?: N, we have P(IXnl > K) < Ii and E(IX"I; IX"I > K) E(IXli IXnl > /{)
s
+ E(IX -
Xni) < c.
For n S N, we have P(IXnl > K) < Ii and E(lXnl; IXnl > K) < c. Hence (Xn) is a
ur family.
Since cP(IXn - XI> c) S E(IXn - XI) it is clear that Xn
->
X in probability.
= IIXn -
Xlh,
o
Chapter 14
ur Martingales
14.0. Introduction The first part of this chapter examines what happens when uniform integrability is combined with the martingale property. In addition to new results such as Levy's 'Upward' and 'Downward' Theorems, we also obtain new proofs of the Kolmogorov 0-1 Law and of the Strong Law of Large Numbers. The second part of the chapter (beginning at Section 14.6) is concerned with Doob's Submartingale Inequality. This result implies in particular that for p > 1 (but not for p = 1) a martingale bounded in £P is dominated by an element of £P and hence converges both almost surely and in £P. The Submartingale Inequality is also used to prove Kakutani's Theorem on product-form martingales and, in an illustration of exponential bounds, to prove a very special case of the Law of the Iterated Logarithm. The Radon-Nikodym theorem is then proved, and its relevance to likelihood ratio explained. The topic of optional sampling, important for continuous-parameter theory and in other contexts, is covered in the appendix to this chapter. 14.1. UI martingales
Let M be a UI martingale, so that M is a martingale relative to our set-up
(11,F, {Fn},P) and (Mn: n E Z+) is a UI family. Since M is VI, M is bounded in £1 (by (13.2)), and so Moo := limMn almost surely. By Theorem 13.7, it is also true that Mn -+ Moo in
exi~ts
£1:
E(IMn - MooD -+ O. We now prove that Mn = E(MooIFn), a.s. For F E·Fn , and r martingale property yields
E(Mr; F)
= E(Mn; F). 199
~
n, the
(1./.1) ..
Chapter 14: UI Martingales
134 But
IE(Mri F) - E(Mooi F)I ::; E(IMr - Moo I; F) ::; E(IMr - MooD.
Hence, on letting r
-+ 00
in (*), we obtain
E(Moo; F)
= E(Mn; F).
We have proved the following result.
THEOREM ~~
Let M be a VI martingale. Then
Moo := lim Mn exi.!ts a.s. and in Cl. Moreover, for every n,
The obvious extension to VI supermartingales may be proved similarly.
14.2. Levy's 'Upward' Theorem ~~
Let ~ E C1(n,.1",p), and define Mn := EW.1"n ), a..!. Then M is a VI martingale and almo.!t surely and in Cl. Proof. \Ve know that M is a martingale because of the Tower Property. We know from Theorem 13.4 that M is VI. Hence Aloo := limMn exists a.s. and in Cl, and it remains only to prove that Me<:> = 1], a.s., where
1]
:= E(~IFoo).
Without loss of generality, we may (and do) assume that consider the measures Ql and Q2 on (n,.1"oo), where Ql(F) := E(1]; F), If FE .1"n, then since E(1]I.1"n)
Q2(F)
= E(Moo; F),
~ ~
FE .1"00'
= E(~IFn) by the Tower Property,
O. Now
.. (14·9)
195
Chapter 14: ifI Martingales
the second equality having been proved in Section 14.1. Thus Ql and Q2 agree on the 1r-system (algebra!) U.rn' and hence they agree on .roo. Both." and Moo are.roo measurable; more strictly, Moo may be taken to be .roo measurable by defining Moo := lim sup Mn for every w. Thus,
Hence P(." > Moo)
o
= 0, and similarly P(Moo > .,,) = o.
14.3. Martingale proof of Kolmogorov's 0-1 law Recall the result. THEOREM Let Xl, X 2 , ••• be a sequence of independent RV.5. Define
n
Then if PET, PCP)
= 0 or 1.
Proof Define.rn :== a(XI' X 2 , • •• , Xn). Let PET, and let." :== IF. Since E b.roo, Levy's Upward Theorem shows that
'TJ
However, for each n, ." is Tn measurable, and hence (see Remark below) is independent of Fn. Hence by (9.7,k),
E07I.rn ) == E(.,,) = PCP),
a.s.
Hence." = PCP), a.s.; and since." only takes the values 0 and 1, the result follows. 0 Remark. Of course, we have cheated to some extent in building parts of the earlier proof into the martingale statements used in the proof just given.
Chapter 14: UI Martingales
136
(14·4).·
14.4. Levy's 'Downward' Theorem ~~
Suppose that (n,.r, P) is a probability triple, and that is a collection of sub-a--algebras of.r such that
W-n : n
E N}
9-00:= n9-k ~ ... ~ 9-(n+l) ~ 9-n ~ ... ~ 9-1. k
Let, E £1 (n, F, P) and define
Then
M- oo := limAl_ n exists a.s. and in {} and ALco
= E( ,19-00)'
a.s.
Proof. The Upcrossing Lemma applied to the martingale
(Mk,9k: -N::; k::; -1) can be used exactly as in the proof of Doob's Forward Convergence Theorem to show that lim M_ n exists a.s. The uniform-integrability result, Theorem 13.4, shows that lim M_ n exists in £1. That (*) holds (if you like with M- oo := lim sup M_ n E m9-co) follows by now-familiar reasoning: for G E 9-00 ~ 9-r, ECli G)
and now let r
= E(M_ r ; G),
1 00.
o
14.5. Martingale proof of the Strong Law
Recall the result (but add £1 convergence as bonus).
THEOREM Let X 1 ,XZ , ••• be lID RVs, with E(IXkl) < oo,Vk. common value of E(Xn ). Write
Let J-l be the
Chapter 14: UI Martingales
··(14·6)
137
Proof. Define g-n := u(Sn' Sn+l, Sn+2, . .. ),
g-oo:=
n
g-n.
n
We know from Section 9.11 that
Hence L := limn- 1 S n exists a.s. and in £1. For definiteness, define L := limsupn- 1 Sn for every w. Then for each k,
L
+Xk+n = 1·Imsup Xk+l + ... n
so that L E mT" where T" = u(Xk+I..\'''+2, .. . ). By Kolmogorov's 0-1 law, P(L = c) = 1 for some c in R. But
o Exercise. Explain how we could have deduced £1 convergence at (12.10). Hint. Recall Scheffe's Lemma 5.10, and think about how to use it. Remarks. See Meyer (1066) for important extensions and applications of the results given so far in this chapter. These extensions include: the Hewitt-Savage 0-1 law, de Finetti's theorem on exchangeable random variables, the Choquet-Deny theorem on bounded harmonic functions for random walks on groups.
14.6. Dooh's Suhmartingale Inequality THEOREM ~~(a)
Let Z be a non-negative submartingale. Then, for c > 0,
Proof. Let F:= {suP":5n Z" ::: c}. Then F is a disjoint union F
= Fo U Fl U ... U F n ,
(14.6) ..
Chapter 14: UI Martingales
138 where
Po := {Zo ~ c},
n
:=
{Zo < c} n {Zl < c} n ... n {Zk-l < c} n {Zk
~
c}.
o
Summing over k now yields the result.
The main reason for the usefulness of the above theorem is the following.
LEMMA ~(b)
If M is a martingale, c is a convexfundion, and Elc(Mn)1 < then
00,
\In,
c(M) is a submartingale. Proof. Apply the conditional form of Jensen's inequality in Table 9.7.
0
Kolmogorov's inequality ~
Let (Xn : n E N) be a sequence of independent zero-mean RVs in £2. Define u~ := Var(Xk). Write n
Sn := Xl
+ ... + X n,
Vn := Var(Sn) =
L: u%. k=l
Then, for c > 0,
Proof. We know that if we set :Fn = u(X!, X 2 , ••• ,Xn)' then S martingale. Now apply the Submartingale Inequality to S2.
= (Sn) is a 0
Note. Kolmogorov's inequality was the key step in the original proofs of Kolmogorov's Three-Series Theorem and Strong Law. 14.7. Law of the Iterated Logarithm: special case Let us see how the Submartingale Inequality may be used via so-called exponential bounds to prove a very special case of Kolmogorov's Law of the
.. (1.4-7)
Chapter 14: UI Martingales
199
Iterated Logarithm which is described in Section A4.1. (You would do well to take a quick look at this proof even though it is not needed later.)
THEOREM Let (Xn : n EN) be lID RVs each with the standard normal N(O, 1) distribution of mean 0 and variance 1. Define
Then, almost surely,
· sup 11m
Sn
1
(2n log log n)l
= 1.
Proof. Throughout the proof, we shall write
h(n):= (2nloglogn)t,
n
~3.
(It will be understood that integers occurring in the proof are greater than e, when this is necessary.) Step 1: An exponential bound. Define Fn:= q(X1 ,X2 , ••• ,Xn ). Then Sis a martingale relative to {Fn }. It is well known that for () E R, n E N,
The function x
14
efJ:J;
is convex on R, so that e fJSn is a sub martingale
and, by the Submartingale Inequality, we have, for () > 0,
This is a type of exponential bound much used in modern probability theory. In our special case, we have
-
Chapter 14: U1 Martingales
140
and for c > 0, choosing the best 8, namely
(14. 7) ..
c/n, we obtain
(a)
Step 2: Obtaining an upper bound. Let ]( be a real number with ]( > 1. (We are interested in cases when]( is close to 1.) Choose C n := Kh(I.;n-1). Then
The First Borel-Cantelli Lemma therefore shows that, almost surely, for all large n (all n ~ no(w)) we have for 1.;n-1 :::; k:::; ](n,
Sk:::; sup Sk :::;
Cn
= ](h(](n-1):::; ](h(k).
k5,l(n
Hence, for K > 1, limsuph(k)-lSk
:::;](,
a.s.
k
By taking a sequence of ](-values converging down to 1, we obtain
Step 3: Obtaining a lower bound. Let N be an integer with N > 1. (We are interested in cases when N is very large.) Let e be a number in (0,1). (Of course, e will be small in the cases which interest us.) Write S(r) for Sr, etc., when typographically more convenient. For n E N, define the event
Then (see Proposition 14.S(b) below),
where Thus, ignoring 'logarithmic terms', P(Fn ) is roughly (n logN)-(1-~)2 so that L:: P(Fn ) = 00. However, the events Fn (n E N) are clearly independent, so
.. (14.8)
Chapter 14: UI Martingales
141
that (BC2) shows that, almost surely, infinitely many Fn occur. Thus, for infinitely many n,
But, by Step 2, S(Nn) > -2h(Nn) for all large n, so that for infinitely many n, we have
It now follows that limsuph(k)-lSk
~
limsuph(Nn+l)-lS(Nn+l)
k
n
(You should check that 'the logarithmic terms do disappear'.) The rest is obvious. 0
\. 14.8. A standard estimate on the normal distribution We used part of the following result in the previous section.
Proposition Suppose that X has the standard normal distribution, so that, for x E R,
P(X > x)
=1-
(x)
=
1
00
cp(y)dy
where Then, for x > 0,
P(X > x) S; x-lcp(x), P(X > x) ~ (x + x-l )-lcp(X).
(a) (b)
Proof. Let x > O. Since cp'(Y)
cp(x) yielding (a).
=
1
= -ycp(y),
00
ycp(y)dy
~x
1
00
cp(y)dy,
(14·B) ..
Chapter 14: UI Martingales
142 Since (y-l<.p(y))'
x-1rp(x) =
= -(1 + y-2)<.p(y),
1
00
(1
+ y-2)rp(y)dy ~ (1 + x- 2)
1
00
rp(y)dy,
o
yielding (b).
14.9. Remarks on exponential bounds; large-deviation theory Obtaining exponential bounds is related to the very powerful theory of large deviations - see Varadhan (1984), Deuschel and Stroock (1989) - which has an ever-growing number of fields of application. See Ellis (1985). You can study exponential bounds in the very specific context of martingales in Neveu (1975), Chow and Teicher (1978), Garsia (1973), etc. Much of the literature is concerned with obtaining exponential bounds which are in a sense best possible. However, 'elementary' results such as the Azuma-Hoeffding inequality in Exercise E14.1 are very useful in numerous applications. See for example the applications to combinatorics in Bollobas (1987). 14.10. A consequence of Holder's inequality Look at the statement of Doob's £P inequality in the next section in order to see where we are going.
LEMMA Suppose that X and Yare non-negative RVs such that cP(X :2: c) ~ E(Yj X :2: c) for every c > O. Then, for p > 1 and p-l
+ q-l = 1,
we have
Proof. We obviously have
Using Fubini's Theorem with non-negative integrands, we obtain
Chapter 14: UI
.. {14.11} L
=
Martingale~
1: (10 I{X~c}(w)P(dw»)
= 10
149
pcP-Idc
(1::~) pcP-IdC) P(dw) = E(XP).
Exactly similarly, we find that
We apply Holder's inequality to conclude that
(**) Suppo~e that IIYII, < 00, and suppose for now that since (p - l)q = p, we have
IIXII, < 00 also.
Then
so (**) implies that IIXlip :$ qIlYII,. For general X, note that the hypothesis remains true for X 1\ n. Hence IIX 1\ nll p :$ qlIYII, for all n, and the result follows using (MON). 0
-
14.11. Dooh's CP inequality THEOREM ~~(a)
Let p > 1 and define q ~o thatp-l +q-l = 1. Let Z be a non-negative submartingale bounded in CP, and define {thi~ i~ standard notation}
Z·:= sup Zk. kEZ+
Then Z· E CP, and indeed
IIZ·llp :$ q sup IIZrll,. r
The ~ubmartingale Z is therefore dominated by the element C'. A~ n --+ 00, Zoo := lim Zn exists a.s. and in CP and
z·
of
Chapter 14: UI Martingales
144 ~~(b)
(14·11) ..
If Z is of the form IMI, where M is a martingale bounded in CP, then Moo := limMn exists a.s. and in CP, and of course Zoo = IMool, a.s.
Proof. For n E Z+, define Z~ := SUPk
Property (*) now follows from the Monotone-Convergence Theorem. Since (-Z) is a supermartingale bounded in CP, and therefore in Cl, we know that Zoo := lim Zn exists a.s. However, IZn - ZIP
s:; (2Z*)P
E
CP,
so that (DOM) shows that Zn -+ Z in CPo Jensen's inequality shows that I!Zrl!p is non-decreasing in r, and all the rest is straightforward. 0
14.12. Kakutani's theorem
011
'product' martingales
Let X},X 2 , ••• be independent non-negative RVs, each of mean 1. DeJine Mo := 1, and, for n E N, let
Then M is a non-negative martingale, so that
11100 := limMn
exists a.s.
The following Jive statements are equivalent:
(i) E(Moo) = 1, (ii) Mn -+ Moo in Cl; (iii) M is UI; (iv)
IT an > 0 where 0 < an := E(Xj) s:; 1,
(v) 2:(1- an) <
00.
If one (then everyone) of the above Jive statements fails to hold, then
P(Moo
= 0) =
1.
Remark. Something of the significance of this theorem is explained in Section 14.17.
.. (1,4.19)
Chapter 14: UI Martingales
Proof. That an obvious.
~
145
1 follows from Jensen's inequality. That an > 0 is
First, suppose that 3tatement (iv) holds. Then define
Then N is a martingale for the same reason that M is. See (lOA,b). We h~e
EN~ = 1/(ala2'" an )2 ~ 1/(11 al,)2 < 00,
so that N is bounded in £2. By Doob's £2 inequality, E
~
('1 "\
tv 0..-: ';0", ) h
(s~pIMnl) ~ E (s~pINnI2) ~ 4s~pEIN~1) < 00,
,
so that M is dominated by M* := sUPn IMnl E £1. Hence M is UI and properties (i)-(iii) hold. Now consider the ca3e when IT an = O. Define N as at (*). Since N is a non-negative martingale, limNn exists a.s. But since IT an = 0, we are forced to conclude that Moo = 0, a.s.
The equivalence of (iv) and (v) is known to us from (4.3). The theorem is proved. 0
14.13. The Radon-Nikodym theorem Martingale theory yields an intuitive - and 'constructive' - proof of the Radon-Nikodym theorem. We are guided by Meyer (1966). \Ve begin with a special case.
THEOREM ~~(I)
Suppose that in that
(n, F, P)
is a probability triple in which F is separable
F = u( Fn : n E N) for some sequence (Fn) of .mb3ets of n. Supp03e that Q is a finite measure on (n, F) which i8 absolutely continuous relative to P in that
(a)
for
FE F,
P(F)
= 0 => Q(F) = O.
Chapter 14: UI Martingales
146
(14·13) ..
Then there exists X in Cl(fl,F,P) such that Q
= XP
(see Section
5.14) in that
Q(F)
=
i
XdP
= E(XiF),
"iF E.F.
The variable X is called a version of the Radon-Nikodym derivative of Q relative to P on (fl, F). Two such versions agree a.s. We write
dQ dP
=X
'L
on or,
a.s.
Remark. Most of the <1-algebras we have encountered are separable. (The <1-algebra of Lebesgue-measurable subsets of [0,1] is not.)
Proof. With the method of Section 13.1(a) in mind, you can prove that property (a) implies that
(b)
given
0:
> 0, there exists 6 >
°
such that, for F E F,
P(F) < 6 =? Q(F) < 0:. Next, define
Fn := <1(Fl' F2 , ••• ,Fn). Then for each n, Fn consists of the 2 r (n) possible unions of 'atoms'
An,l, . .. ,An,r(n) of F n, an atom A of Fn being an element of Fn such that 0 is the only proper subset of A which is again an element of Fn. (Each atom A will have the form where each Hi is either Fi or Fr.) Define a function Xn : fl
-+
[0,00) as follows: if w E An,k, then
P(An,k) = 0, P(An,k) > 0.
(c)
E(Xni F)
= Q(F),
Chapter 14: UI Martingales
.. (14·19)
147
The variable Xn is the obvious version of dQ/dP on (f2,Fn). It is obvious from (c) that X = (X n : n E Z+) is a martingale relative to the filtration (Fn : n E Z+), and since this martingale is non-negative,
Xoo := limXn, Let
€
exists, a.s.
> 0, choose 6 as at (a), and let
I( E (0,00) be such that
Then so that E(XniXn > K)
= Q(X n > K) < €.
The martingale X is therefore UI, so that
It now follows from (c) that the measures
F
f-+
E(XiF)
and
F
f-+
Q(F)
agree on the 7r-system U F n , so that they agree on F. All that remains is the proof of uniqueness, which is now standard for us. 0 Remark. The familiarity of all of the arguments in the above proof emphasizes the close link between the Radon-Nikodym and conditional expectation which is made explicit in Section 14.14. Now for the next part of the theorem ...
(II)
The assumption that F is separable can be dropped from Part 1.
Once one has Part II, one can easily extend the result to the case when P and Q are <7-finite measures by partitioning f2 into sets on which both are finite. Proving Part II of the theorem is a piece of 'abstract nonsense' based on the fact that £1 (or, more strictly, L1) is a metric space, and in particular on the role of sequential convergence in metric spaces. You might well want to take Part II for granted and skip the remainder of this section. Let Sep be the class of all separable sub-a-algebras of F. Part I shows that for 9 E Sep, there exists Xo in £1(f2, 9, P) such that
dQjdP
= Xo i equivalently, E(XOi G) = Q(G), G E 9.
(14. 19}.
Chapter 14: UI Martingales
148
We are going to prove that there exists X in £1(n,:F, P) such that Xa ..... X in £1
(d) in the sense that given e
> 0, there exists IC in Sep such that
if IC S; g E Sep, then IIXa - XIiI
< e.
First, we note that it is enough to prove that
(e)
(Xa : g E Sep) is Cauchy in £1
in the sense that given e > 0, there exists IC in Sep such that if IC S; gi E Sep for i = 1,2, then IIXal -Xa2lh < e.
Proof that (e) implies (d). Suppose that (e) holds. Choose IC n E Sep such that if IC n S; gi E Sep for i = 1,2, then
Let H(n) = u(ICI,IC 2 , ••• ,ICn ). Then (see the proof of (6.1O,a)) the limit X:= limX1t (n) exists a.s. and in £1, and indeed,
Set X := limsupX1t(n) for definiteness. For any g E Sep with g ;;2 Hn we have Result (d) follows.
o
Proof of (e). H (e) is false, then (why?!) we can find eo> 0 and a sequence IC(O) S; IC(l) S; ... of elements of Sep such that
However, it is easily seen that (XA::(n») is a UI martingale relative to the filtration (IC(n)), so that XA::(n) converges in £1. The contradiction establishes that (e) is true. 0
Proof of Part II of the theorem. We need only show that for X as at (d) and for F E :F, we have E(XjF) = Q(F).
Chapter 14: UI Martingales
.. (14. 15)
149
Choose K such that for K <;;;; 9 E Sep, IIX" -XIII < e. Then a(K,F) E Sep, where a(K, F) is the smallest a-algebra extending K and including F; and, by a familiar argument,
IE(X; F) - Q(F)I
= IE(X - X,,(IC,F); F)I :::; IIX - X,,(IC,F) lit < e.
o
The result follows.
14.14. The Radon-Nikodym theorem and conditional expectation
Suppose that (11, F, P) is a probability triple, and that 9 is a sub-a-algebra of F. Let X be a non-negative element of £1 (11, F, P). Then
Q(X) := E(X; G),
G E 9,
defines a finite measure Q on (11,9). Moreover, Q is clearly absolutely continuous relative to P on 9, so that, by the Radon-Nikodym theorem, (a version ... ) Y := dQ/ dP exists on (11,9). Now Y is 9-measurable, and
E(Y; G) = Q(G) = E(X; G),
G E 9.
Hence Y is a version of the conditional expectation of X given 9:
Y
= E(XI9),
a.s.
Remark. The right context for appreciating the close inter-relations between martingale convergence, conditional expectation, the Radon-Nikodym theorem, etc., is the geometry of Banach spaces. 14.15. Likelihood ratio, equivalent measures
Let P and Q be probability measures on (11,F) such that Q is absolutely continuous relative to P, 50 that a version X of dQ/dP on F exists. We say that Y is (a version of) the likelihood ratio of Q given P. Then P is absolutely continuous relative to Q if and only if P(X > 0) = 1, and then X-I is a version of dP / dQ. When each of P and Q is absolutely continuous relative to the other, then P and Q are said to be equivalent. Note that it then makes sense to define
l
v'dPdQ:=
l
XtdP =
l
(X-i)dQ,
FE:F;
and we can hope for a fuller understanding of what Kakutani achieved ....
150
Chapter 14: UI Martingales
(14·16) ..
14.16. Likelihood ratio and conditional expectation Let (n, F, P) be a probability triple, and let Q be a probability measure on (n, F) which is absolutely continuous relative to P with density X. Let g be a sub-u-algebra of F. What g-measurable function (modulo versions) yields dQ/dP on g? Yes, of course, it is Y = E(Xlg), for, yet again, with E denoting P-expectation, E(Y; G)
= E(X; G) = Q(G) for G E g.
Hence, if {Fn} is a filtration of (n,F), then the likelihood ratios (dQ/dP on Fn)
= E(XIFn)
form a UI martingale. (This is of course why the martingale proof of the Radon-Nikodym theorem was bound to succeed!) Here and in the next two sections, we are dropping the 'a.s.' qualifications on such statements as (*): we have outgrown them.
14.17. Kakutani's Theorem revisited; consistency of LR test Let n = RN, Xn(w) =
W
n , and define the u-algebras
F= U(Xk: kEN), Suppose that for each n, In and gn are everywhere positive probability density functions on R and let rn(x) := gn(x)/In(x). Let P [respectively, QJ be the unique measure on (n, F) which makes the variables Xn independent, Xn having probability density function In [respectively, gnJ. Clearly, but you should prove this,
M n := dQ/dP
= Y1 Y2 ••• Yn
on F n ,
where Yn = Tn(Xn). Note that the variables (Yn : n E N) are independent under P and that each has P-mean 1. For any of a multitude of familiar reasons, AI is a martingale. Now if Q is absolutely continuous relative to P on F with dQ/ dP = ~ on F, then Mn = E(elFn), and M is UI. Conversely, if Mis UI, then Moo exists (a.s., P) and E(MoofFn) = Mn, Vn. But then the probability measures
F
t-+
Q(F)
and
F
t-+
E(Mex>; F)
agree on the 7T-system UFn and so agree on F, so that Moo = dQ/dP on F. Thus Q is absolutely continuous relative to P on F if and only if M is UI.
Chapter 14; UI Martingales
.. (14. 18)
151
Kakutani's Theorem therefore implies that Q is equivalent to P on :F if and only if
equivalently if
L 1Rf {VIn(x) n
Vgn(X)
r
dx < 00;
and that then P is also absolutely continuous relative to Q. Suppose now that the Xn are identically distributed independent variables under each of P and Q. Thus, for some probability density functions I and 9 on R, we have In = I and gn = g for all n. It is clear from (*) that Q is equivalent to P if and only if I = 9 almost everywhere with respect to Lebesgue measure, in which case Q = P. Moreover, Kakutani's Theorem also tells us that if Q f:. P, then Mn ~ 0 (a.s., P) and this is exactly the con.5isiency 01 the Likelihood-Ratio Test in Statistics. 14.18. Note on Hardy spaces, etc. (prestissimof) We have seen in this chapter that for many purposes, the class of UI martingales is a natural one. The appendix to this chapter, on the OptionalSampling Theorem, provides further evidence of this. However, what we might wish to be true is not always true for UI martingales. For example, if M is a UI martingale and C is a (uniformly) bounded previsible process, then the martingale C. M need not be bounded in £.1. (Even so, C. M does converge a.s.!) For many parts of the more advanced theory, one uses the 'Hardy' space of martingales M null at 0 for which one (then each) of the following equivalent conditions holds: 'H~
IMnl
£.1,
(a)
M* := sup
(b)
[M]i, E £.1, where [M]n := L~=l(Mk - Mk_l)2 and [M]oo =i limMn •
E
By a special case of a celebrated Burkholder-Davis-Gundy theorem, there exist absolute constants C p , C p (1 :5 p < 00) such that
(c)
(l:5p
The space 'H~ is obviously sandwiched between the union of the spaces of martingales bounded in £.p (p > 1) and the space of UI martingales. Its
Chapter 14: UI Martingales
152
(14.18) ..
identification as the right intermediate space has proved very important. Its name derives from its important links with complex analysis. Proof of the B-D-G inequality or of the equivalence of (a) and (b) is too difficult to give here. But we can take a very quick look at the relevance of (b) to the C. M problem. First, (b) makes it clear that
(d)
if ME 1ib and C is a bounded previsible process, then C. M E 1i~,
and we shall now see that, in a sense, this is 'best possible'. Suppose that we have a martingale'\l null at 0 and a (bounded) previsible process C = (ck : kEN) where the Ck are lID RVs with P(ck = ±1) = L and where C and M are independent. We want to show that (e)
M E 1i~ if (as well as only if)
C •
M is bounded in
.c 1 •
We run into no difficulties of 'regularity' if we condition on M:
EI(c. M)ln
= EE{I(c. M)nlla(M)} 2: 3- 1 E([Mlh
And where did the last inequality appear from? Let (ak : kEN) be a sequence of real numbers. Think of ak as M" - M k - 1 when M is known. Define n
Xk:=akck,
W n :=X1 +···+Xn,
Vn=E(W~)=Lai. k=1
Then (see Section 7.2)
so that, certainly, E(W~) S 3v~. On combining this fact with Holder's inequality in the form
we obtain the special case of Khillchille's inequality we need:
o For more on the topics in this section, see Chow and Teicher (1978), Dellacherie and Meyer (1980), Doob (1981), Durrett (1984). The first of these is accessible to the reader of this book; the others are more advanced.
Chapter 15
Applications
15.0. Introduction - please read! The purpose of this chaptcr is to give some indication of some of the ways in which the theory which we have developed can be applied to real-world problcms. We consider only very simple examples, but at a lively pace! In Sections 15.1-15.2, we discuss a trivial case of a celebrated result from mathematical economics, the Black-Scholes option-pricing formula. The formula was developed for a continuous-parameter (diffusion) model for stock prices; see, for example, Karatzas and Schreve (1988). We present an obvious discretization which also has many treatments in the literature. What needs to be emphasized is that in the discrete case, the result has nothing to do with probability, which is why the answer is completely independent of the underlyiIJ.g probability measure. The uae of the 'martingale mea.mre' P in Section 15.£ ia nothing other than a device for expreasing aome aimple algebra/combinatorics. But in the diffusion setting, where the algebra and combinatorics are no longer meaningful, the martingale-representation theorem and Cameron-Martin-Giraanov changeof-measure theorem provide the essential language. I think that this justifies my giving a 'martingale' treatment of something which needs only juniorschool algebra. Sections 15.3-15.5 indicate the further development of the martingale formulation of optimality ill stochastic control, at which Exercise E10.2 gave a first look. We consider just one 'fun' example, the' M abinogion sheep problem'; but it is an example which illustrates rather well several techniques which may be effectively utilized in other contexts. In Sections 15.6-15.9, we look at some simple problems of filtering: estimating in real-time processes of which only noisy observations can be made. This topic has important applications in engineering (look at the IEEE journals!), in medicine, and in economics. I hope that you will look
159
{is. 0) ..
Chapter 15: Applications
154
further into this topic and into the important subject which develops when filtering is combined with stochastic-control theory. See, for example, Davis and Vintner (1985) and Whittle (1990). Sections 15.10-15.12 consist of first reflections on the problems we encounter when we try to extend the martingale concept. 15.1. A trivial martingale-representation result Let S denote the two-point set {-1, 1}, let I: denote the set of all subsets of S, let p E (0,1), and let Jl be the probability measure on (S, I:) with
Jl({l}) = p = 1- Jl({ -1}). Let N E N. Define
(n, F, P) = (S, I:, Jl)N
so that a typical element of n is
Wk E {-I, 1}. Define ek : n -> R by ek(W) := Wk, so that (el, e2, ... , eN) are lID RVs each with law Jl. For 0 :::; n :::; N, define n
Zn
:=
2:)e k -
2p + 1),
k=1
Fn Note that E(ek)
(a)
:=
<7(Zo, ZI, ... , Zn)
= l.p + (-1)(1 Z
p)
= (Zn
= <7(el' e2, ... , en).
= 2p -
1. We see that
: 0 :::; n :::; N)
i.'l a martingale (relative to ({Fn : 0 :::; n :::; N}, P».
LEMMA If M = (Mn : 0 :::; n :::; N) is a martingale (relative to ({Fn : 0 :::; n :::; N}, P»), then there exists a unique previsible process H such that
Remark. Since Fo = {0,n}, Mo is constant on common value of the E(Mn ).
n and has to be the
Proof. We simply construct H explicitly. Because Mn is Fn measurable,
Chapter 15: Applications
.. (15.2)
155
for some function !n : {-I, l}n -+ R.. Since M is a martingale, we have
0= E(Mn - Mn-IIFn-I)(W) = P!n(WI, ... ,Wn-l, 1) + (1 - P)!n(WI, ... ,Wn-I. -1) - !n-I(WI, ... ,wn-d· Hence the expressions
!n(WI, ... ,wn -I.l) - !n-I(WI, ... ,Wn-l) 2(1 - p)
(bl) and
!n-I(WI,.'. ,wn-d - !n(WI, ... ,Wn-l, -1) 2p
(b2)
are equal; and if we define Hn(w) to be their common value, then H is clearly previsible, and simple algebra verifies that M = Mo + H • Z, as required. You check that H is unique. 0
15.2. Option pricing; discrete Black-Scholes formula
Consider an economy in which there are two 'securities': bonds of fixed interest rate r, and stock, the value of which fluctuates randomly. Let N be a fixed element of N. We suppose that values of units of stock and of bond units change abruptly at times 1,2, ... ,N. For n = 0,1, ... , N, we write Bn = (1 + r)nBo for the value of 1 bond unit throughout the open time interval (n, n + 1), Sn for the value of 1 unit of stock throughout the open time interval
(n,n
+ 1).
You start just after time 0 with a fortune of value x made up of Ao units of stock and Vo of bond, so that
AoSo
+ VoBo = x.
Between times 0 and 1 you invest this in stocks and bonds, so that just before time 1, you have AI units of stock and VI of bond so that
So, (AI' Vi) represents the portfolio you have as your 'stake on the first game'.
156
(15.2) ..
Chapter 15: Applications
Just after time n -1 (where n units of bond with value
~
1) you have A n -
l
units of stock and Vn -
l
By trading stock for bonds or conversely, you rearrange your portfolio between times n - 1 and n so that just before time n, your fortune (still of value Xn-l because we assume transaction costs to be zero) is described by
Your fortune just after time n is given by
(a) and your change in fortune satisfies
Now, and
Sn - Sn-l = RnSn- l
,
where Rn is the random 'rate of interest of stock at time n'. We may now rewrite (b) as
so that if we set
(c) then
(d) Note that (c) shows Y n to be the discounted value of your fortune at time n, so that the evolution (d) is of primary interest. Let n,F,cn(1 $ n $ N), Zn(O $ n $ N) and Fn(O $ n $ N) be as in Section 15.1. Note that no probability measure has been introduced. We build a model in which each Rn takes only values a, b in (-1,00), where a < r < b,
Chapter 15: Applications
.. (15.£)
157
by setting
a+b b-a Rn = -2- + -2- en .
(e) But then
(f)
Rn - r =
Hb -
a) (en - 2p+ 1) =
Hb -
a)(Zn - Zn-l),
where we now choose r-a
(g)
p:= b-a·
Note that (d) and (f) together display Y as a 'stochastic integral' relative to Z. A European option is a contract made just after time 0 which will allow you to buy 1 unit of stock just after time N at a price K; K is the so-called striking price. If you have made such a contract, then just after time N, you will exercise the option if SN > K and will not if SN < K. Thus, the value at time N of such a contract is (SN - K)+. What should you pay for the option at time O? Black and Scholes provide an answer to this question which is based on the concept of a hedging strategy. A hedging strategy with initial value x for the described option is a portfolio management scheme {(An, Vn) : 1 :5 n :5 N} where the processes A and V are previsible relative to {Fn}, and where, with X satisfying (a) and (b), we have for every w,
(hI)
Xo(w) = x,
(h2)
Xn(w)
(h3)
XN(W) = (SN(W) - 1<)+.
~
0 (0 :5 n :5 N),
Anyone employing a hedging strategy will by appropriate portfolio management, and without going bankrupt, exactly duplicate the value of the option at time N. Note. Though Black and Scholes insist that Xn(w) ~ 0, Vn, Vw, they (and we) do not insist that the processes A and V be positive. A negative value of V for some n amounts to borrowing at the fixed interest rate r. A negative value of A corresponds to 'short-selling' stock, but after you have read the theorem, this may not worry you!
Chapter 15: Applications
158
(15.2) ..
THEOREM A hedging strategy with initial value x exists if and only if
where E is the expectation for the measure P of Section 15.1 with p as at (g). There is a unique hedging strategy with initial value Xo, and it involves no short-selling: A is never negative. On the basis of this result, it is claimed that Xo is the unique fair price at time 0 of the option. Proof. In the definition of hedging strategy, there is nowhere any mention of an underlying probability measure. Because however of the 'for every w' requirement, we should consider only measures on n for which each point w has positive mass. Of course, P is such a measure. Suppose now that a hedging strategy with initial value x exists, and let A, V, X, Y denote the associated processes. From (d) and (f),
Y
= Yo +FeZ,
where F is the previsible process with
Of course, F is bounded because there are only finitely many (n,w) combinations. Thus Y is a martingale under the P measure, since Z is; and since Yo = x and YN = (1 +r)-N(Sn - K)+ by (c) and the definition of hedging strategy, we obtain x
= Xo.
(We did not use the property that X 2:: 0.) Now consider things afresh and define
Then Y is a martingale, and by combining (f) with the martingale-representation result in Section 15.1, we see that for some unique previsible process A, (d) holds. Define
Xn := (1
+ rtYn,
Vn := (Xn - AnSn)/Bn.
Then (a) and (b) hold. Since Xo = x and XN
= (SN -
K)+,
.. (15.9)
159
Chapter 15: Applications
the only thing which remains is to prove that A is never negative. Because of the explicit formula (15.1,b1), this reduces to showing that E [(SN - K)+ISn-b Sn = (1
+ b)Sn-d
;::: E [(SN - I()+ISn-l, Sn = (1
+ a)Sn-l] ;
and this is intuitively obvious and may be proved by a simple computation on binomial coefficients.
15.3. The Mabinogion sheep problem In the Tale of Peredur ap Efrawg in the very early Welsh folk tales, The Mabinogion (see Jones and Jones (1949)), there is a magical flock of sheep, some black, some white. We sacrifice poetry for precision in specifying its behaviour. At each of times 1,2,3, ... a sheep (chosen randomly from the entire flock, independently of previous events) bleats; if this bleating sheep is white, one black sheep (if any remain) instantly becomes white; if the bleating sheep is black, then one white sheep (if any remain) instantly becomes black. No births or deaths occur. The controlled system Suppose now that the system can be controlled in that just after time 0 and just after every magical transition time 1,2, ... , any number of white sheep may be removed from the system. (White sheep may be removed on numerous occasions.) The object is to maximize the expected final number of black sheep. Consider the following example of a policy: Policy A: at each time of decision, do nothing if there are more black sheep than white sheep or if no black sheep remain; otherwise, immediately reduce the white population to one less than the black population.
The value function V for Policy A is the function
V: z+ X Z+
-+
[0,00),
where for w, b E Z+, V( w, b) denotes the expected final number of black sheep if one adopts Policy A and if there are w white and b black sheep at time O. Then V is uniquely specified by the fact that, for w, b E Z+, (a1) V(O, b) = b;
(15.S) ..
Chapter 15: Applications
160
(a2) V(w, b) = V(w -I, b) whenever w ~ band w (a3) V(w, b) b> 0 and w
=
w~b V(w
> o.
+ 1, b -
1)
+
> OJ
w~b V(w - I, b + 1) whenever w < b,
It is almost tautological that if Wn and Bn denote the numbers of white and black sheep at time n, then, if we adopt Policy A, then, whatever the initial values of Wo and B o,
(b) V(Wn' Bn) is a martingale relative to the natural filtration of {(Wn,Bn) : n ~
OJ.
(c) LEMMA The following statements are true for w, b E Z+:
(el) V(w, b)
~
V(w -l,b) wheneverw
> 0,
(c2) V(w, b) ~ w~b V( W + I, b - 1) + w~b V(w - I, b + 1) whenever w > 0 and b > O. Let us suppose that this Lemma is true. (It is proved in the next Section.) Then, for any policy whatsoever,
(d) V(Wn' Bn) is a supermartingale. The fact that V(Wn, Bn) converges means that the system must a.s. end up in an absorbing state in which sheep are of one colour. But then V(W00, Boo) is just the final number of black sheep (by definition of V). Since V(Wn' Bn) is a non-negative supermartingale, we have for deterministic TVo, Bo,
Hence, whatever the initial position, the expected final number of black sheep under any policy is no more than the expected final number of black sheep if Policy A is used. Thus Pol~cy A is optimal.
In Section 15.5, we prove the following result:
V(k, k) - (2k
+ f -..;:if) -+ 0 as k -+ 00.
Thus if we start with 10 000 black sheep and 10 000 white sheep, we finish up with (about) 19824 black sheep on average (over many 'runs'). Of course, the above argument worked because we had correctly guessed the optimal policy to begin with. In this subject area, one often has to make good guesses. Then one usually has to work rather hard to clinch results
Chapter 15: Applications
.. (15·4)
161
which correspond in more general situations to Lemma (c) and statement (d). You might find it quite an amusing exercise to prove these results for our special problem now, before reading on. For a problem in economics which utilizes analogous ideas, see Davis and Norman (1990). 15.4. Proof of Lemma 15.3(e) It will be convenient to define
(a)
Vk := V(k, k).
Everyt hing hinges on the following results: for 1 :s: c
(bI) V(k - c, k
+ c) =
Vk
+ (2k -
vk)2-(21.:-2)
:s: k,
I::
1) :-,
j=k
J
k+c-l (2k
(b2) V(k+l-c,k+c) Vk
+ (2k + 1 -
Vk) { 2 2k - 1
+! ( 2k) k'
}-1
{k+C-l ~ (2k)} j ,
which simply reflect (15.3,a3) together with the 'boundary conditions': V(k,k)=Vk,
(e)
V(k
+
l,k)
V(0,2k)=2k,
= Vk,
V(0,2k
+ 1) = 2k + 1.
Now, from (15.3,a2), Vk+l = V(k
+ 1, k + 1) =
V(k, k
+ 1),
and hence, from (b2) with c = 1, we find that
(d)
Vk+l
Pk 2Pk = -1- V k + - - ( 2 k + 1), 1 + PI.: 1 + Pk
where Pk is the chance of obtaining k heads and k tails in 2k tosses of a fair coin:
(e)
2k Pk -_ 2- (2k) k'
Result (d) is the key to proving things by induction.
(15.4) ..
Chapter 15: Applications
162
Proof of result (I5.3,cI). From (I5.3,a2), result (I5.3,cI) is automatically true when w ;::: b. Hence, we need only establish the result when w < b. Now if w < band w + b is odd, then (w,b)=(k+I-c,k+c)
for some c with 1 :$ c :$ k. But formulae (b) show that it is enough to show that for 1 :$ a :$ k, (2k
+1 -
2k 1 VI< ) { 2 -
} -I + 21 (2k) k
(
k
+2ka-I )
;::: (2k _ vk)2-(2k-2) ( 2k -1 ); k+a-I
and since (
1)/
2k ) / (2k) ;::: ( 2k k+a-I k k+a-I
we need only establish the case when a (2k
= 1:
+ 1- VI<)T(2k-I)(I + Pk)-I
(2k k
1),
e:)
;::: (2k - vk)T(2k-2) Ck;
1),
which reduces to
(f) But property (f) follows by induction from (d) using only the fact that Pk is decreasing in k.
Proof for the case when b + w is even may be achieved similarly. Proof of result (I5.3,c2). Because of (I5.3,a3), the result (I5.3,c2) is automatically true when w < b, so we need only establish it when w ;::: b. In analogy with the reduction of the 'general a' case to the 'boundary case a = l' in the proof of (I5.3,cI), it is easily shown that it is sufficient to prove (I5.3,c2) for the case when (w, b) = (k + 1, k + 1) for some k. Formulae (b) reduce this problem to showing that
{I + (2k + l)p,.JVk :$ 2k(2k + l)Pk,
163
Chapter 15: Applications
.. (15.5)
and this follows by induction from (d) using only the fact that (2k
+ l)Pk
o
is increasing in k.
15.5. Proof of result (l5.3,d) Define Then, from (15.4,d),
where
7r
4 Stirling's formula shows that find N so that
Ck
---+ 0 as k ---+
le,,1 < e
for
00,
so that given e > 0, we can
k;::: N.
Induction shows that for k ;::: N,
But, since EPk = 00, we have IT(l - p,,) lim sup lak+ll :c:; e, so that ---+ O.
a"
= 0,
and it is now clear that
Because of the accurate version of Stirling's formula:
0< 8 = B(n) < 1, we have
so that as required.
v" - (2k + i - ~) ---+ 0, o
We now take a quick look at filtering. The central idea combines Bayes' formula with a recursive property which is now illustrated by two examples.
Chapter 15: Applications
164
(15.6) ..
15.6. Recursive nature of conditional probabilities Example. Suppose that A, B, C and D are events (elements of F) each with a strictly positive probability. Let us write (for example) ABC for An B n C. Let us also introduce the notation
CA(B) := P(BIA) = P(AB)fP(A) for conditional probabilities. The 'recursive property' in which we are interested is exemplified by
'if we want to find the conditional probability of D given that A, B and C have occurred, we can assume that both A and B have occurred and find the CAB probability of D given C'. 0 Example. Suppose the X, Y, Z and Tare RVs such that (X, Y, Z, T) has a strictly positive joint pdf fX,Y,Z,T on R4: for B E B4,
P{(X,Y,Z,T) E B}
=
JJJ
lfx,y,Z,T(X,y,z,t)dxdydzdt.
Then, of course, (X, Y, Z) has joint pdf fx,Y,z on R3, where
fx,y,z(x, y, z) =
~ fX,Y,Z,T(X,y, z, t)dt.
The formula
fX.Y,Z,T(X, y, z, t) f TIX,Y,Z (ll x, y, z ) ..- f ( ) X,Y,Z x,y,z defines a ('regular') conditional pdf of T given X, Y, Z: for B E B, we have, with all dependence on w indicated,
peT E BIX, Y, Z)(w) = E(IB(T)IX, Y, Z)(w)
=l
frlx,Y,z(tIX(w), Yew), Z(w))dt.
Similarly,
y, z, t) f T,ZIX,Y (t , z Ix , y ) -_ fX,Y,Z,T(X, f X,Y (x,y ) .
.. {15.7}
165
Chapter 15: Applications
The recurrence property is exemplified by
f ) IX Y:= h,ZIX,Y f TIX,Y,Z = ( TIZ f . , ZIX,Y
o
15.7. Bayes' formula for bivariate normal distributions With a now-clear notation, we have for RVs X, Y with strictly positive joint pdf fx,Y on R2,
- fx,Y(x,y) _ fx(x)fYlx(Ylx) f XIY (I) x y Jy(y) Jy(y) . Thus
the 'constant of proportionality' depending on y but being determined by the fact that
in
fXIY(x,y)dx
= 1.
The meaning of the following Lemma is clarified within its proof.
LEMMA
(a)
Suppose that p, a, b E fl, that U, W E (0,00) and that X and Yare RVs such that C(X) = N(p, U),
Cx(Y)
= N(a + bX, W).
Then
Cy(X)
= N(X, V),
where the number V E (0,00) and the RV X are determined as follows: X p bey - a)
V=U+
W
Proof The absolute distribution of X is N(p, U), so that
fx(x) = (21rU)-! exp {_ (x
;;)2 }.
.
(15.7) ..
Chapter 15: Application3
166
The conditional pdf of Y given X is the density of N(a + bX, W), so that
fylx(yJx)
= (21lW)-' exp {(y I
a - bx)2 }
2W
.
Hence, from (**),
(X-J-L)2 (y-a-bx)2 2U 2W
10gfxly(xJy) = Cl(y) =C2(Y)where l/V
o
= l/U +b2 /W
and x/V
(x - £)2 2V '
= p/U +b(y-a)/V.
The result follows.
COROLLARY
(b)
With the notation of the Lemma, we have
Proof. Since Cy(X) = N(_x" , V), we even have
E{(X -
X?JY} = V,
a.s.
15.8. Noisy observation of a single random variable Let X, 7]1,7]2, .•• be independent RVs, with
C(X)
= N(O,u 2 ),
C(7]k) = N(O, 1).
Let (c n ) be a sequence of positive real numbers, and let
We regard each Yk as a noisy observation of X. We know that
One interesting question mentioned at (10A,c) is:
when is it true that Moo = X a.s.?, or again, is X a.s. equal to an Feo-measurable RVl
o
167
Chapter 15: Applications
.. {15.S}
Let us write Cn to signify 'regular conditional law' given Y 1 , Y2 , •• We have
•,
Yn'
Suppose that it is true that Cn - 1 (X) = N(Xn -
1,
Vn -
1)
where Xn- 1 is a linear function of Y 1, Y 2 , .•• , Y n- 1 and V n - 1 a constant in (0,00). Then, since Yn = X + CnT}n, we have
From the Lemma 15.7(a) on bivariate normals with 2
= Xn-1,U = Vn-1,a = O,b = 1, W = cn' A
Jl
we have where 1
1
1
=-+Vn Vn- 1 c~' But the recursive property indicated in Section 15.6 shows that
We have now proved by induction that
Now, of course,
However, n
Vn
= {17- 2 + 2:>k 2}-1. k=l
Our martingale M is £2 bounded and so converges in £2. We now see that
Moo = X a.s. if and only if 2: c k2 = 00 .
{15.9}..
Chapter 15: Applications
168
15.9. The Kalman-Bucy filter The method of calculation used in the preceding three sections allows immediate derivation of the famous Kalman-Bucy filter. Let A, H, C, K and 9 be real constants. Suppose that X o , Yo, el, e2, ... , '1/1, '1/2, ••• are independent RVs with
The true state of a system at time n is supposed given by X n , where (dynamics)
Xn - X n -
1
= AXn- 1 + Hen + g.
However the process X cannot be observed directly: we can only observe the process Y, where (observation)
Yn
-
Yn -
1
= CXn + K'l/n.
Just as in Section 15.8, we make the induction hypothesis that
where Cn - 1 signifies regular conditional law given Yi, Y2 , ... , Yn Xn = aXn- 1 + 9 + Hen, where 0':= 1 +.4,
we have •
Cn - 1 (X n ) = N(aXn Also, since Yn = Yn -
1
1
+ g,a 2 Vn - 1 + H 2 ).
+ CXn + K'l/n, we have
Apply the bivariate-normal Lemma 15.7(a) to find that
where
(KBl) (KB2)
C2
1 1 Vn = a 2Vn_ 1 +H2
+ ](2'
aXn_1 + 9 Vn = a 2Vn_ 1 +H2
+
.in
C(Yn - Yn-d ](2
•
1•
Since
.. (15.10)
Chapter 15: Applications
169
Equation (KB1) shows that Vn = f(Vn- 1 ). Examination of the rectangular hyperbola which is the graph of f shows that Vn -> Veo , the positive root of V = f(V). If one wishes to give a rigorous treatment of the K-B filter in continuous time, one is forced to use martingale and stochastic-integral techniques. See, for example, Rogers and Williams (1987) and references to filtering and control mentioned there. For more on the discrete-time situation, which is very important in practice, and for how filtering does link with stochastic control, see Davis and Vintner (1985) or Whittle (1990). 15.10. Harnesses entangled
The martingale concept is well adapted to processes evolving in time because (discrete) time naturally belongs to the ordered space Z+. The question arises: does the martingale property transfer in some natural way to processes parametrized by (say) Zd? Let me first explain a difficulty described in Williams (1973) that arises with models in Z (d = 1) and in Z2, though we do not study the latter here. Suppose that (Xn : n E Z) is a process such that each Xn E £1 and that ('almost surely' qualifications will be dropped)
(a) For m E Z, define
9m
= U(Xk
: k ::; m),
Hm
= u(Xr : r ?
The Tower Property shows that for a, b in Z+ with a a < r < b,
E(XrI9a, Hb)
m).
< b, we have for
= E(XrIX. : r i' sl9a, Hb) = !E(Xr- 1 19a, Hb)
so that r ....... E(Xr I9a, Hb) is the linear interpolation
Hence, for n E Z and
U
E N, we have, a.s.,
+ !E(Xr+t!9a, Hb),
170
(15.10) ..
Chapter 15: Applications
Now, the u-algebras u(gn-u,1tn+d decrease as u i 00. Because of "Varning 4.12, we had better not claim that they decrease to u(g-oc" 1tn+l). Anyway, by the Downward Theorem, we see that the random variable L:= limul-oo(Xu/u) exists (a.s.)
and u
Hence, by the Tower Property,
(b) whence we have a reversed-martingale property: E(Xn - nL/L, 1tn+d = Xn+l - (n
+ l)L.
A further application of the Downward Theorem shows that
(c)
A := limnloo(Xn
-
nL) exists (a.s.)
Hence L = lim(Xu/u). uToo
By using the arguments which led to (b) in the reversed-time sense, we now obtain
(d) From (b) and (d) and the Tower Property, E(Xn + L/Xn +1 ) = Xn+l, E(Xn+J/Xn + L) = Xn + L. Exercise E9.2 shows that Xn+l = Xn
+ L.
Xn = nL + A,
Hence (almost surely)
Vn E Z,
so that (almost) all sample paths of X are straight lines! Hammersley (1966) suggested that any analogue of (a) should be called a harness property and that the type of result just obtained conveys the idea that every low-dimensional harness is a straitjacket!
Chapter 15: Applications
.. (15.12)
171
15.11. Harnesses unravelled, 1 The reason that (15.10,a) rules out interesting models is that it is expressed in terms of the idea that each X n should be a random variable. What one should say is that Xn:Q-+R but then require only that differences
(Xr - X. : r, s E Z) be RVs (that is, be F-measurable), and that for n, k E Z with k f; n,
I call this a difference harness in Williams (1973). Easy exercise. Suppose that (Yn any function on Q. Define
:
n E Z) are IID RVs in £1. Let Xo be
if n ?: 0, ifn < O. Thus, Xn - X n-
I
= Y n , \In. Prove that X is a difference harness.
15.12. Harnesses unravelled, 2 In dimension d ?: 3, we do not need to use the 'difference-process' unravelling described in the preceding section. For d ?: 3, there is a non-trivial model, related both to Gibbs states in statistical mechanics and to quantum fields, such that each Xn(n E Zd) is a RV and, for n E Zd,
E(XnIXm : m E Zd \ {n})
= (2d)-1 'EXn+u, uEU
where U is the set of the 2d unit vectors in Zd. See Williams (1973). In addition to a fascinating etymological treatise on the terms 'martingale' and 'harness', Hammersley (1966) contains many important ideas on harnesses, anticipating later work on stochastic partial differential equations. Many interesting unsolved problems on various kinds of harness remain.
PART C: CHARACTERISTIC FUNCTIONS Chapter 16
Basic Properties of CFs
Part C is merely the briefest account of the first stages of characteristic function theory. This theory is something very different in spirit from the work in Part B. Part B was about the sample paths of processes. Characteristic function theory is on the one hand part of Fourier-integral theory, and it is proper that it finds its way into that marvellous recent book, Korner (1988); see also the magical Dym and McKean (1972). On the other hand, characteristic functions do have an essential role in both probability and statistics, and I must include these few pages on them. For full treatment, see Chow and Teicher (1978) or Lukacs (1970). Exercises indicate the analogous Laplace-transform method, and develop in full the method of moments for distributions on [0,1]. 16.1. Definition
The characteristic function (CF) 'P = 'PX of a random variable X is defined to be the map
(important:the domain is R not C) defined by 'P(B) := E(e i8X ) = E cos(BX)
+ iE sin(BX).
Let F := Fx be the distribution function of X, and let Ji. := Ji.x denote the law of X. Then
so that
172
.. (16.3)
Chapter 16: Basic Properties of CFs
173
16.2. Elementary properties
Let 'f' = 'f'X for a RV X. Then
= 1 (obviously);
~(a)
'f'(O)
~(b)
1'f'(B)1 :::; 1, VB;
~(c)
B 1-+ 'f'( B) is continuous on R;
(d) (e)
= 'f'x(B), VB, 'f'aX+b(B) = eib °'f'x(a8). 'f'(-x)(B)
You can easily prove these properties. (Use (BDD) to establish (c).) Note on differentiability (or lack of it). Standard analysis (see Theorem A16.1) implies that if n E N and E(IXln) < 00, then we may formally differentiate
In particular,
= (1 -IBI)I[-I,I](8)
so that
Amongst uses of CFs are the following: • to prove the Central Limit Theorem (CLT) and analogues, • to calculate distributions of limit RVs, • to prove the 'only if' part of the Three-Series Theorem, • to obtain estimates on tail probabilities via saddle-point approximation, • to prove such results as if X and Y are independent, and X + Y has a normal distribution, then both X and Y have normal distributions. Only the first three o{ these uses are discussed in this book.
174
Chapter 16: Basic Properties of
CF~
(16·4)··
16.4. Three key results
(a) If X and Yare independent RVs, then 'Px+y(9)
= 'Px(9)'Py(9),
V9.
Proof. This is just 'independence means multiply' again:
o (b) F may be reconstructed from 'P. See Section 16.6 for a precise statement.
(c) 'Weak' convergence of distribution functions corresponds exactly to convergence of the corresponding CFs.
See Section IS. 1 for a precise statement. The way in which these results are used in the proof of the Central Limit Theorem is as follows. Suppose that X I ,X2 , ••• are lID RVseach with mean o and variance 1. From (a) and (16.2,e), we see that if Sn := XI + ... + X n , then Eexp(i9Sn lfo) = 'Px(9Ifot· We shall obtain rigorous estimates which show that
V9.
-t(
2 ) is the CF of the standard normal distribution (as we Since 9 1-+ exp( shall see shortly), it now follows from (b) and (c) that
the distribution of Snl fo converges weakly to the distribution function 1> of the standard normal distribution N(O, 1).
In this case, this simply means that
P(Snlfo:::; x)
-+
1>(x),
x E R.
16.5. Atoms In regard to both (16.4,b) and (16.4,c), tidiness of results can be threatened by the presence of atoms.
.. (16.6)
Chapter 16: Basic Properties of CFs
175
If P(X == c) > 0, then the law Jl. of X is said to have an atom at c, and the distribution function F of X has a discontinuity at c:
p( {c}) == F(c) - F(c-)
= P(X == c).
Now p can have at most n atoms of mass at least lin, so that the number of atoms of p is at most countable.
H therefore follows that given x E R, there exists a sequence (Yn) of reals with Yn 1 x such that every Yn is a non-atom of Jl. (equivalently a point of continuity of F); and then, by right-continuity of F, F(Yn) 1 F(x). 16.6. Levy's Inversion Formula
This theorem puts the fact that F may be reconstructed from 'P in very explicit form. (Check that the theorem does imply that if F and G are distribution functions such that 'PF = 'PG on R, then F == G.)
THEOREM Let 'P be the CF of a RV X which has law 11 and distribution function F. Then, for a < b,
(a)
1
? Tjoo_7r lim
jT
e- i9a
-T
_
.(J !
e- i8b
1
'P( (J)d(J
1
= 2 P({a}) + p(a, b) + 211({b}) 1
= ?[F(b) _
1
+ F(b-)] - -[F(a) + F(a- )]. 2
Moreover, if JR 1'P«(J)ld(J < density function f, and
00,
then X has continuous probability
(b) The 'duality' between (b) and the result
(c) can be exploited as we shall see. The proof of the theorem may be omitted on a first reading.
Ba~ic
Chapter 16:
176
(16.6) ..
Properties of CPs
Proof of the theorem. For u, v E R with u
=:;
v,
(d) either from a picture or since
Let a, b E R, with a < b. Fubini's Theorem allows us to say that, for
0< T < (e)
00,
1 ---2 7r
j T e-i8a _ e- i9b
'0
-T
l
1 jT
=
-T
27r
1
= ---27r
1
-i8a
e
~(O)dO
~e
-i8b ( [
iR e iBx Il(dx)
{ j T eill(x-a) -
'0
-T
R
e i8 (z-b)
l
dO
}
)
dO Il( dx)
provided we show that .- 1
CT.-
---2 7r
1 I {jT
R
e i9 (r-a) - eill(x-b)
'0
I}
l
-T
However, inequality (d) shows that CT
dO
Il(dx)
=:; (b - a)T/7r,
< 00.
so that (e) is valid.
Next, we can exploit the evenness of the cosine function and the oddness of the sine function to obtain
(f)
1 jT
---2 7r
-T
eill(z-a) _ eiB(x-b)
dO
'0
t
= sgn(x - a)S(lx - alT) - sgn(x - b)S(ix - biT) 7r
where, as usual, sgn(x) :=
{~
-1
and S(U) :=
l
o
U
sin x dx x
if x> 0, if x = 0, if x < 0,
(U > 0).
.. {16.6}
Chapter 16: Basic Properties of
10
Even though the Lebesgue integral
1
00
o
(sinx)+ dx -x
00
=
X-I
1
00
177
sinxdx does not exist, because
(sinx)dx -x
0
CF~
= 00,
we have (see Exercise E16.1)
lim S(U) UToo
= ~. 2
The expression (f) is bounded simultaneously in x and T for our fixed a and bj and, as T T00, the expression (f) converges to
o if x < a or x > b, ! if x = a or x = b, 1 if a
<
x
< b.
The Bounded Convergence Theorem now yields result (a).
IR
Suppose now that J'P(O)JdO (a) and use (DOM) to obtain
(g)
F(b) - F(a) = -
1
211"
<
We can then let T
00.
1
e-i()a -
.0
R
e-i()b
T00 in result
tp(O)dO,
t
provided that F is continuous at a and b. However, (DOM) shows that the right-hand side of (g) is continuous in a and b and (why?!) we can conclude that F has no atoms and that (g) holds for all a and b with a < b. We now have
F(b) - F(a) b- a
(h)
1
= -211"
1 R
e-i()a - e-i()b ·O(b) cp( O)dO. Z - a
But, by (d),
Ie-iUa iO(b Hence, the assumption that b --t a in (h) to obtain
e-i()b
a)
211"
f
-
IR Jcp(O)JdO < 00 allows us to use (DOM) to let
F I (a) = f(a) := - 1 and, finally,
I
is continuous by (DOM).
1 .()
e-' acp(O)dO,
R
o
(16.7).
Chapter 16: Basic Properties of CFs
178
16.1. A table pdf
Support
CF
1 ex { (X-IJ)2} ,,$ P -~
R
exp(if-lB - ~/72B2)
2. U[O,l]
1
[0,1]
--.r
3. U[-l,l]
2
1
[-1,1]
sin 9 -11-
4. Double exponential
!e-/ x /
R
1 1+112
5. Cauchy
1 1I"(1+r2)
R
e- Ilil
6. Triangular
l-Ixl
[-1,1]
2 e-~~s II)
7. Anon
--;;r-
) -cos x
R
(1 -IBI)I[-I,I](B)
Distribution l. N(f-l, (7 2 )
2
e i ' -1
The two lines 4 and 5 illustrate the duality between (16.6,b) and (16.6,c), as do the two lines 6 and 7. Hints on verifying the table are given in the exercises on this chapter.
Chapter 17
Weak Convergence
In this chapter, we consider the appropriate concept of 'convergence' for probability measures on (R, 8). The terminology 'weak convergence' is unfortunate: the concept is closer to 'weak·' than to 'weak' convergence in the senses used by functional analysts. 'Narrow convergence' is the official pure-mathematical term. However, probabilists seem determined to use 'weak convergence' in their sense, and, reluctantly, I follow them here. We are studying the special case of weak convergence on a Polish (complete, separable, metric) space S when S = R; and we unashamedly use special features of R. For the general theory, see Billingsley (1968) or Parthasarathy (1967) - or, for a superb acount of its current scope, Ethier and Kurtz (1986). Notation. We write probeR) for the space of probability measures on R, and
for the space of bounded continuous functions on R. 17.1. The 'elegant' definition Let (p.n : n E N) be a sequence in ProbeR) and let p. E ProbeR). We say that p'n converges weakly to p. if (and only if)
(a)
fln(h)
->
p.(h),
and then write
(b) 179
Chapter 17: Weak Convergence
180
(17.1) ..
\\'e know that elements of ProbeR) correspond to distribution functions via the correspondence where
F(x)
= Jl(-oo,xj.
Weak convergence of distribution functions is defined in the obvious way:
Fn ~ F
(c)
if and only if Pn ~ Jl.
\\'e are generally interested in the case when Fn = Fx n , that is when Fn is the distribution function for some random variable X n . Then, by (6.12), we have, for h E Cb(R),
Jln(h) =
in h(x)dFn(x)
= Eh(Xn).
Note that the statement Fn ~ F is meaningful even if the Xn's are defined on different probability spaces. However, if Xn (n E N) and X are RVs on the same triple (51, F, P), then
(d)
(Xn ...... X, a.s.)
=}
(Fxn ~ Fx),
and indeed,
(e)
(Xn ...... X in prob)
=}
(Fxn ~ Fx),
Proof of (d). Suppose that Xn ...... X, a.s., and that Jln is the law of Xn and Jl is the law of X. Then, for h E Cb(R), we have h(Xn) ...... heX), a.s., and,
by (BDD),
Jln(h)
= E(Xn) ...... E(X) = p(h).
0
Exercise. Prove (e). 17.2. A 'practical' formulation Example. Atoms are a nuisance. Suppose that Xn = ~, X = O. Let fin be the law of X n, so that Jln is the unit mass at ~, and let fI be the law of X. Then, for h E Cb(R),
so that Jln ~ Jl. However,
Fn(O)
= 0 f+ F(O) =
1.
o
Chapter 17: Weak Convergence
.. (17.2)
181
LEMMA
(a)
Let (Fn) be a sequence of DFs on R; and let F be a DF on R. Then Fn ~ F if and only if
lim Fn(x) n
= F(x)
for every non-atom (that is, every point of continuity) x of F. Proof of 'only if' part. Suppose that Fn ~ F. Let x E R, and let 0 > O. Define h E Cb(R) via
hey) := Then J-ln(h)
--+
{i -
O-l(y - x)
ify ~ x, if x < y < x + 6, if Y ~ x + o.
J-l(h). Now,
so that lim sup Fn(x) ::::; F(x
+ 0).
n
!
However, F is right-continuous, so we may let
{j
(b)
Vx E R.
lim sup Fn(x) ::::; F(x),
0 to obtain
n
In similar fashion, working with y 1-+ hey + 0), we find that for x E R and 6> 0, liminf Fn(x-) ~ F(x - 6), n
so that
(c)
lim inf Fn( x-) n
~
F( x- ),
Inequalities (b) and (c) refine the desired result.
Vx E R.
o
In the next section, we obtain the 'if' part as a consequence of a nice representation.
Chapter 17: Weak Convergence
182
(17.3) ..
17.3. Skorokhod representation
THEOREM Suppose that (Fn : n E N) is a sequence of DFs on R, that F is a DF on R and that Fn(x) -; F(x) at every point x of continuity of F. Then there exists a probability triple (n,.1', P) carrying a sequence (Xn) of RVs and also a RV X such thai Fn=Fx n
,
F=Fx,
and
Xn -; X
a.s.
This is a kind of 'converse' to (17.1,d).
Proof. We simply use the construction in Section 3.12. Thus, take
(n,.1',p) = ([0, 1J,8[0, IJ,Leb), define
X+(w) :=inf{z:F(z) >w}, X-(w):= inf{z: F(z) 2: w}, and define x;t, X;; similarly. We know from Section 3.12 that X+ and Xhave DF F and that P(X+ = X-) = l. Fix w. Let z be a non-atom of F with z > X+(w). Then F(z) > w, and hence, for large n, Fn(z) > w, so that x;t(w) S; z. So limsuPn x;t(w) S; z. But (since non-atoms are dense), we can choose z! X+(w). Hence ' lim sup x;t(w) S; X+(w), and, by similar arguments, liminf X;;-(w) 2: X-(w). Since X;; S; x;t and P(X+ = X-) = 1, the result follows.
o
17.4. Sequential compactness for ProbeR)
There is a problem in working with the non-compact space R. Let /1n be the unit mass at n. No subsequence of (/1n) converges weakly in probeR), but JLn ~ /100 in probeR), where JLoo is the unit mass at 00. Here R is the
Chapter 17: Weak Convergence
·.(17.4)
183
compact metrizable space [-00,00], the definition of ProbeR) is obvious, and Jln ~ Jloo in probeR) means that Vh E C(R). (We do not need the subscript 'b' on C(R) because elements of C(R) are bounded.) It is important to keep remembering that while functions in C(R) tend to limits at +00 and -00, functions in Cb(R) need not. The space C(R) is separable (it has a countable dense subset) while the space Cb(R) is not. Let me briefly describe how one should think of the next topic. Here I assume some functional analysis, but from the next paragraph (not the next section) on, I resort to elementary bare-hands treatment. By the Riesz representation theorem, the dual space C(R)" of C(R) is the space of bounded signed measures on (R, B(R». The weak" topology of C(R)· is metrizable (because C(R) is separable), and under this topology the unit ball of C(R)" is compact and contains probeR) as a closed subset. The weak" topology of probeR) is exactly the probabilists' weak topology, so (a)
ProbeR) i" a compact metrizable space under our probabilists' weak topology. The bare-hands substitute for result (a) is the following.
LEMMA (Reily-Bray)
(b)
(*)
Let (Fn) be any sequence of distribution functions on R. Then there exist a right-continuou~ non-decreasing function F on R such that 0::; F::; 1 and a subsequence (ni) such that limFn,(x)
•
= F(x)
at every point of continuity F .
Proof. We make an obvious use of 'the diagonal principle'. Take a countable dense set C of R and label it:
C
= {Cl, C2, C3, ••• }.
Since (Fn(cd : n E N) is a bounded sequence, it contains a convergent subsequence (Fn(l,j)(Cl»: as j
-+
00.
In some subsequence of this subsequence, we shall have as j
-+
00;
(17·4) ..
Chapter 17: Weak Convergence
184
and so on. If we put
ni
H(c)
= n( i, i), then we shall have:
:= lim Fn;(c) exists for every c in
C.
Obviously, 0 :::; H :::; 1, and H is non-decreasing on C. For x E R, define
F(x) := lim H(c), cUz
the
'11' signifying that c decreases strictly to x through C. (In particular,
F(c) need not equal H(c) for c in C.) Our function F is right-continuous, as you can check. By the 'limsupery' of Sections 17.2 and 17.3, you can also check for yourself that (*) holds: I wouldn't dream of depriving you of that pleasure. 0
17.5. Tightness Of course, the function F in the Helly-Bray Lemma 17 .4(b) need not be a distribution function. It will be a distribution if and only if lim F(x)
xU-oo
= 0,
lim F(x)
zfl+oo
= 1.
Definition ~~
A sequence (Fn) of distribution functions is called tight if, given c > 0, there exists ]( > 0 such that
I1n[-K, K]
= F(I<) -
F( -I< -) > 1 - c.
You can see the idea: 'tightness stops mass being pushed out to +00 or -00'.
LEMMA Suppose that Fn is a sequence of DFs.
(a)
If Fn ~ F for some DF F, then (Fn) is tight.
(b)
If (Fn) is tight, then there exists a subsequence (Fn,) and a DF F such that Fn; ~ F.
This is a really easy limsupery exercise.
Chapter 18
The Central Limit Theorem
The Central Limit Theorem (CLT) is one of the great results of mathematics. Rere we derive it as a corollary of Levy's Convergence Theorem which says that weak convergence of DFs corresponds exactly to pointwise convergence of CFs. 18.1. Levy's Convergence Theorem ~~~
Let (Fn) be a sequence of DFs, and let
Proof. Assume for the moment that
(a)
the sequence (Fn) is tight.
Then, by the ReIly-Bray result 17.5(b), we can find a subsequence (Fnk) and a DF F such that But then, for
(J
E R, we have
(take h(x)::: eill.,). Thus 9 =
185
Chapter 18: The Central Limit Theorem
186
(18.1) ..
Now we argue by contradiction. Suppose that (Fn) does not converge weakly to F. Then, for some point x of continuity of F, we can find a subsequence which we shall denote by (Fn) and an 17 > 0 such that
'In. But (Fn) is tight, so that we can find a subsequence Fn; and a DF F such that -
Fn;
w-
-+
F.
But then 'Pn; -+ rp, so that 'P = 'Pfr = g = 'PF. Since a CF determines the corresponding DF uniquely, we see that F = F, so that, in particular, x is a non-atom of F and
The contradiction between (*) and (**) clinches the result.
o
We must now pro\'e (a). Proof of tightness of (Fn). Let c > 0 be given. Since the expression
'Pn(8)
+ 'Pn(-B) = k2cos(8x)dFn(x)
is real, it follows that g(B) + g( -B) is real (and obviously bounded above by 2). Since 9 is continu.ou.s at 0 and equal to 1 at 0, we can choose 0 > 0 such that 11 - g(B)1 < tC when 181 < 8.
We now have
0< 0- 1
1 6
{2 - g(B) - g( -B)}d8 s:;
~c.
Since g = lim 'Pn, the Bounded Convergence Theorem for the finite interval [0,8] shows that there exists no in N such that for n ::::: no,
Chapter 18: The Central Limit Theorem
.. (18.2) However,
0- 1
1 6
= 0- 1 =
187
{2 - 'Pn(O) - 'Pn( -O)}dO
1: {l
l {0- 1 1
6 6
(1 -
eil/x)
dFn(X)} dB
(1 -
eil/x)
dB} dFn(x),
the interchange of order of integration being justified by the fact that since 11 - eil/ x I : : ; 2, 'the integral of the absolute value' is clearly finite. We now have, for n ~ no,
1( ~1
c~2
R
sinOx) 1- - " dFn ~ 2
ixi>26- 1
vX
1 ( ixi>26-1
1-
1) dFn(x)
-1
dFn = Iln{X : Ixi > 20- 1 }
and it is now evident that the sequence (Fn) is tight.
o
If you now re-read Section 16.4, you will realize that the next task is to
obtain 'Taylor' estimates on characteristic functions.
18.2.
0
and 0 notation
Recall that
J(t) = O(g(t»
t
as
~
L
means that limsupIJ(t)/g(t)1 <
CX)
t-L
and that
J(t)
= o(g(t»
as
t ~ L
as
t ~
means that
J(t)/g(t)
~
0
L.
(18.S).
Chapter 18: The Central Limit Theorem
188
18.3. Some important estimates
For n
= 0,1,2, ...
and x real, define the 'remainder' Rn(x)
= e i '" -
n
)k
(.
L
~:!
k=O
Then Ro(x) = e i '" -1 =
1'"
.
ieilldy,
and from these two expressions we see that IRo(x)1 ~ min(2, Ixl). Since Rn(x) =
1'"
iRn_1 (y)dy,
we obtain by induction: . (2lxln IXln+I) IRn(x)1 ~ mm ~'(n + I)! . Suppose now that X is a zero-mean RV in £2: E(X) = 0,
u 2 := Var(X)
< 00.
Then, with If' denoting If' x, we have ~(a)
11;:'(8) - (1-lu 282)1
= IER2 (8X)1 ~ EIR2(8X)1
~ (PE (IXI2 1\ 1811:1 3 )
•
The final term within EO is dominated by the integrable RV IXI 2 and tends to 0 as 8 -+ O. Hence, by (DOM), we have ~~(b)
Next, for Izl < l, and with principal values for logs, 10g(1
+ z) -
z
=
l -=-+ z (
o 1
w)
w
dw
and since 11 + tzl ;::: l we have ~~(c)
1I0g(1 + z) - zl ~ Iz12,
= _z2
11 0
tdt
-1 + tz
.. (18.5)
Chapter 18: The Central Limit Theorem
189
18.4. The Central Limit Theorem
Let (Xn) be an IID sequence, each Xn distributed as X where
E(X) Define Sn := Xl
= 0,
+ ... + X n ,
0- 2
:::;
x)
-+
Var(X) <
00.
and set
Then, for x E R, we have, as n
P(G n
:=
(x)
-+ 00,
=
1
.;;C
v21r
jX
exp(-!y2)dy.
-00
Proof. Fix () in R. Then, using (18.3,b),
the '0' now referring to the situation when n we have, as n -+ 00,
-+ 00.
But now, using (18.3,c),
Hence 'PeJ(}) -+ exp(-!(}2), and since () 1-+ exp(-!B2) is the CF of the normal distribution, the result follows from Theorem 18.1. 0
18.5. Example
Let us look at a simple example which shows how the method may be adapted to deal with a sequence of independent but non-lID RVs. With the Record Problem E4.3 in mind, suppose that on some (n,:F, P), E 1 , E 2 , ••• are independent events with PeEn) = lin. Define
Chapter 18: The Central Limit Theorem
190
(18.5)
the 'number of records by time n' in the record context. Then
E(Nn )
= L ~ = log n + I + 0(1), CI is Euler's constant) k:::;n
Let
G n := N n -logn y10gn so that E(G n )
~
0, Var(G n )
But
Hence P(G n
:::;
x)
~
=
00,
!1 n
~
1. Then, for fixed () in R,
=
g n
{
1-
II"} .
k + ke't
and with t:= (}1y1ogn,
(x), x E R.
o
See Hall and Heyde (1980) for some very general limit theorems. 18.6. CF proof of Lemma 12.4 Lemma 12.4 gave us the 'only if' part of the Three-Series Theorem 12.5. Its statement was as follows.
191
Chapter 18: The Central Limit Theorem
.. (18.6)
LEMMA Suppose that (Xn) is a sequence of independent random variables bounded by a constant K in [0,00): IXn(w)1 :5 K, 'tin, 'tIw. Then (I:;Xn converges, a.s.) =} (I:;E(Xn) converges and I:;Var(Xn) < 00). The proof given in Section 12.4 was rather sophisticated.
Proof using chamderistic functions. First, note that, as a consequence of estimate (18.3,a), if Z is a RV such that for some constant K 1 , <12:= Var(Z) < 00,
IZI :5 K}, E(Z) = 0, then for 101 < K}-l, we have l'f'z(B)1 :5 1-
~q2B2 + ~IW KIE(Z2) 1
1
1
2
6
3
< 1 - _<1 2 B2 + _<1 2 B2 = 1 - _<1 2 B2
-
:5 exp (
_~<12B2) .
Now take Zn := Xn - E(Xn). Then
E(Zn)
= 0,
Var(Zn)
= Var(Xn),
l'f'zn(B)1 = lexp{-iBE(Xn)}'f'xn(O)1 and IZnl :5 2K. If I:; Var(Xn) (2K)-1,
= 00,
then, for 0
= l'f'xn(B)I,
< (2I()-1, we shall have, for 0 < IBI <
IT l'f'x.(B)1 = IT l'f'z.(B)1 :5 exp {_~B2 LVar(X
k )}
= 0.
However, if I:; X k converges a.s. to S, then, by (DOM),
II 'f'x.(B) = Eexp(iBSn) and 'f's(B) is continuous in B with 'f's(O)
= 1.
-+
'f's(O),
We have a contradiction.
Hence I:;Var(Xn ) = I:;Var(Zn) < 00, and, since E(Zn) = 0, Theorem 12.2(a) shows that Zn converges a.s.
L
Hence
L E(Xn) = L {Xn -
Zn}
converges a.s., and since it is a deterministic sum, it converges! This last part of the argument was used in Section 12.4. 0
APPENDICES Chapter A1
Appendix to Chapter 1
ALl. A non-measurable subset A. of SI. In the spirit of Banach and Tarski, although, of course, this relatively trivial example pre-dates theirs, we use the Axiom of Choice to show that
(a) where the Aq are disjoint sets, each of which may be obtained from any of the others by rotation. IT the set A = Ao has a 'length' then it is intuitively clear that result (a) would force 271"
= 00 x
length (A),
an impossibility.
{e i9 z
rv
To construct the family (Aq : q E Q), proceed as follows. Regard SI as : 8 E R} inside C. Define an equivalence relation,"" on SI by writing W if there exist a and fJ in R such that
Use the Axiom of Choice to produce a set A which has precisely one representative of each equivalence class. Define
Aq = eiqA = {eiqz : z EA.}. Then the family (Aq : q E Q) has the desired properties. (Obviously, Q could be replaced by Z throughout the above argument.) We do not bring this example to its fully rigorous conclusion. The remainder of this appendix is fully rigorous.
192
.. (Al.S)
--
Chapter Al: Appendix to Chapter 1
19S
We now set out to prove Uniqueness Lemma 1.6.
A1.2. d-systems. Let S be a set, and let 'D be a collection of subsets of S. Then 'D is called ad-system (on S) if
(a)
S E 'D,
(b)
if A,B E 'D and A ~ B then B\A E 'D,
(c)
if An E 'D and An
Recall that An (d)
i A, then A E 'D.
i A means: An
~
An+! ('vn) and
U An =
A.
Proposition. A collection E oj subsets oj S is a q-algebra iJ and only iJ E is both a 1r-system and ad-system.
Proof. The 'only if' part is trivial, so we prove only the 'if' part. Suppose that E is both a 1r-system and a d-system, and that E, F and En(n E N) E E. Then Ee:= S\E E E, and
Hence On := El U .. . UEn E E and, since On Finally,
i UEk, we see that UEk E E.
o Definition of d(C). Suppose that C is a class of subsets of S. We define d(C) to be the intersection of all d-systems which contain C. Obviously, d(C) is a d-system, the smallest d-system which contains C. It is also obvious that
d(C)
~
q(C).
A1.3. Dynkin's Lemma IJ I is a 1r-system, then d(I)
= q(I).
Thus any d-system which contains a 1r-system contains the q-algebra generated by that 1r-system.
Proof. Because of Proposition A1.2(d), we need only prove that d(I) is a 1r-system.
Chapter A1: Appendix to Chapter 1
194
(A1.S) ..
Step 1: Let VI := {B E d(I) : B n C E d(I), VC E I}. Because I is a 7T'system, VI ;2 I. It is easily checked that VI inherits the d-system structure from d(I). [For, clearly, S E VI' Next, if B 1 , B2 E VI and Bl ~ B 2, then, for C in I, (B2 \B 1 ) n C = (B2 n C)\(BI n C);
and, since B2 n C E d(I), BI n C E d(I) and d(I) is a d-system, we see that (B2\Bl) neE d(I), so that B2\Bl E VI. Finally, if Bn E VI(n E N) and Bn TB, then for C E I, (Bn
n C) T(B n C)
so that B n C E d(I) and B E Vl.J We have shown that VI is ad-system which contains I, so that (since VI ~ d(I) by its definition) VI = d(I). Step 2: Let '02 := {A E d(I) : B n A E d(I), VB E d(I)}. Step 1 showed that '02 contains I. But, just as in Step 1, we can prove that '0 2 inherits the d-system structure from d(I) and that therefore 'D2 = d(I). But the fact that '02 = d(I) says that d(I) is a 7r-system. 0
Al.4. Proof of Uniqueness Lemma 1.6
Recall what the crucial Lemma l.6 stated: Let S be a set. Let I be a 7T'-system on S, and let E := a(I). Suppose that J.ll and J.l2 are measures on (S, E) such that J.lI (S) = J.lz(S) <.00 and J.ll = J.l2 on I. Then
J.ll
= J.lz
on
E.
Proof. Let
'0= {F E E : J.ll(F) = J.l2(F)}. Then V is a d-system on S. [Indeed, the fact that S E V is given. If A,B E V, then
so that B\A E V. Finally, if Fn E V and Fn
so that FE V.J
TF,
then by Lemma l.IO(a),
.. (Al.5)
195
Chapter Al: Appendix to Chapter 1
Since V is a d-system and V "2 I by hypothesis, Dynkin's Lemma shows that V "2 O"(I) = ~, and the result follows. 0 Notes. You should check that no circular argument is entailed by the use of Lemma 1.10 (this is obvious).
The reason for the insistence on finiteness in the condition IlleS) /l2(S) < = is that we do not wish to try to claim at (*) that
= - = = = - =. Indeed the Lemma 1.6 is false if '< =' is omitted -
see Section A1.IO below.
We now aim to prove Caratheodory's Theorem 1. 7. A1.5. >.-sets: 'algebra' case
LEMMA Let
90
be an algebra of subsets of S and let
>. : 90
-+
[0, =]
with >'(0) = O. Call an element L of element of 90 properly': >.(L
90
a >.-sei if L 'splits every
n G) + >'(U n G) = >'(G),
VG E
90.
Then the class £0 of >.-sets is an algebra, and>' is finitely additive on £0 . Moreover, for disjoint L 1, L 2 , ... , Ln E £0 and G in 90,
Proof Step 1: Let L1 and L2 be >.-sets, and let L prove that L is a >.-set.
Now LC n L2 = L2 n Lf and LC we have, for any G in 90, >'(L Cn G)
n L~ = L 2.
= L1 n L 2.
We wish to
Hence, since L2 is a >.-set,
= >.(L 2 n L~ n G) + >'(L~ n G)
Chapter AI: Appendix to Chapter 1
196
(A1.5).
and, of course
A(L2 n G)
+ A(L2 n G) = A( G).
Since Ll is a A-set,
A(L2 n L'1 n G)
+ A(L n G) = A(L2 n G).
On adding the three equations just obtained, we see that
A(LC n G)
+ A(L n G) = A(G),
VG E 90,
so that L is indeed a A-set.
Step 2: Since, trivially, S is a A-set, and the complement of a A-set is a A-set, it now follows that £0 is an algebra. Step S: If Ll and L2 are disjoint A-sets and G E 90, then
so, since Ll is a A-set,
The proof is now easily completed.
o
A1.6. Outer measures
Let 9 be a a-algebra of subsets of S. A map
A: 9 -; [0,00] is called an outer measure on (S, 9) if
= 0;
(a)
A(0)
(b)
A is increasing: for G 1 , G 2 E 9 with G 1 ~ G 2 ,
(c)
A is countably subadditive: if (G k ) is any sequence of elements of then
9,
Chapter A1: Appendix to Chapter 1
.. (Al.7)
197
AI. 7. Caratheodory's Lemma.
Let>.. be an outer measure on the measurable space (5,9). Then the >"-sets in 9 form a (f-algebra C on which>" is countably additive, so that (5, C, >..) is a measure space. Proof. Because of Lemma A1.5, we need only show that if (LA:) is a disjoint sequence of sets in C, then L ;= U LA: E C and A:
(a) By the subadditive property of >.., for G E
(b)
>"(G)
Now let Mn
;= UA::;n
(c)
M~
;2
s:; >..(L n G) + >..(L n G). C
LA:. Lemma A1.5 shows that Mn E C, so that
= >..(Mn n G) + >"(M~ n G).
>..(G) However,
9, we have
LC, so that >"(G)
~
>..(Mn
n G) + >"(U n G).
Lemma A1.5 now allows us to rewrite (c) as
>"(G) ~
E >"(LA: n G) + >..(L
C
n G),
C
n G)
A::$n
so that
(d)
>..(G) ~
E >..(LA: n G) + >..(L A:
~
>..(L n G)
+ >"(U n G),
using the count ably subadditive property of>.. in the last step. On comparing (d) with (b), we see that equality must hold throughout (d) and (b), so that L E Cj and then on taking G = L we obtain result (a). 0
198
(A1.8) ..
Chapter A1: Appendix to Chapter 1
A1.B. Proof of Caratheodory's Theorem.
Recall that we need to prove the following. Let S be a set, let Eo be an algebra on S, and let E:= a(E o ). If flo is a countably additive map flo : Eo a measure fl on (S, E) such that
-+
[O,ooj, then there exists
fl = flo on Eo·
Proof. Step 1: Let
9 be the a-algebra of all subsets of S. For G E 9, define n
where the infimum is taken over all sequences (Fn) in Eo with G
~
UFn. n
'Ve now prove that
(a)
>. is an outer measure on (S,9).
The facts that >'(0) = 0 and ,\ is increasing are obvious. Suppose that (G n ) is a sequence in 9, such that each >'(G n ) is finite. Let c > 0 be given. For each n, choose a sequence (Fn,k : kEN) of elements of Eo such that
L:>O(Fn,k) < >'(Gn) +c2- n • k
Then G:= UGn ~ UUFn,k, so that n
k
>'(G) $
L L flo(Fn,k) < L >'(G n
k
n)
+ c.
n
Since c is arbitrary, we have proved result (a). Step 2: By Caratheodory's Lemma A1.7, >. is a measure on (S, [,), where [, is the a-algebra of >.-sets in 9. All we need show is that (b)
Eo ~ [" and>' = flo on Eo;
for then E := a(Eo) ~ [, and we can define fl to be the restriction of >. to (S,E).
Chapter AI: Appendix to Chapter 1
.. (AI.B) Step
s:
Proof that A = Po on
199
~o.
Let F E ~o. Then, clearly, A(F) ~ po(F). Now suppose that F ~ E ~o. As usual, we can define a sequence (En) of disjoint
Un Fn, where Fn sets:
such that En ~ Fn and
UEn = UFn :2 F.
Then
~o.
by using the countable additivity of Po on
Hence
so that A(F) 2: /lo(F). Step 3 is complete. Step 4: Proof that ~o ~ C. Let E E ~o and G E y. Then there exists a sequence (Fn) in ~o such that G ~ Un Fn , and
n
Now, by definition of A,
n
n
n
2: A(E n G) + A(EC n G), since EnG ~ U(E arbitrary,
n Fn) and EC n G A(G) 2: A(E n G)
~
U(EC n Fn). Thus, since
t
is
+ A(EC n G).
However, since A is subadditive,
A(G)
~
A(E n G)
We see that E is indeed a A-set.
+ A(EC n G).
o
Chapter AI: Appendix to Chapter 1
200
(AI.9) ..
A1.9. Proof of the existence of Lebesgue measure on ((0, 1],B(O, 1]).
Recall the set-up in Section 1.8. Let S = (0,1]. For F ~ S, say that F E ~o if F may be written as a finite union
°
where r E N, ~ a1 ~ b1 .$ ... ~ a r ~ br ~ 1. Then (as you should convince yourself) ~o is an algebra on (0,1] and ~
:= u (~o) = B(O, 1].
(We write B(O, 1] rather than B((O, 1]).) For F as at (*), let /1o(F) =
2:)b k -
ak).
k$r
Of course, a set F may have different expressions as a finite disjoint union of the form (*): for example,
(0,1]
= (0, !] U (L 1].
However, it is easily seen that /10 is well defined on ~o and that /10 is finitely additive on ~o. While this is obvious from a picture, you might (or might not) wish to consider how to make the intuitive argument into a formal proof. The key thing is to prove that /10 is countably additive on ~o. So, suppose that (Fn) is a sequence of disjoint elements of ~o with union F in ~o. We know that if G n = U~=1 Fk, then n
/1o(G n ) = l:>O(Fk)
and
G n iF.
k=1
To prove that /10 is count ably additive it is enough to show that /1o(G n ) i /1o(F), for then n
/1o(F) =i lim/1o(Gn ) =i li~
L /10(Fk) = L /1o(F,,). k=1
Let Hn
= F\G n.
Then Hn E ~o and Hn
! 0.
We need only prove that
.. (Al.9)
Chapter Al: Appendix to Chapter 1
201
for then It is clear that an alternative (and final!) rewording of what we need to show is the following:
(a)
if (H n) is a decreasing sequence of elements of Eo such that for some > 0,
I;;
Proof of (a). It is obvious from the definition of Eo that, for each kEN, we can choose Jk E Eo such that, with Jk denoting the closure of Jk,
But then (recall that Hn 1)
Hence, since !lo(Hn)
> 21;;, Yin, we see that for every n,
and hence nkSn h is non-empty. A fortiori then, for every n,
Kn
:=
nJ
k
is non-empty.
kSn
That
(b) now follows from the Heine-Borel theorem: for if (b) is false, then «Jky : kEN) gives a covering of [0,1] by open sets with no finite subcovering. Alternatively, we can argue directly as follows. For each n, choose a point Xn in the non-empty set Kn. Since each Xn belongs to the compact set J1 , we can find a subsequence (nq) and a point x of J 1 such that xn. ~ x. However, for each k, Xn. E Jk for all but finitely many q, and since Jk is
(Ai.9) ..
Chapter Ai: Appendix to Chapter 1
202
compact, it follows that x E
Jk • Hence
x E
nk J
k,
and property (b) holds.
o
Since po is count ably additive on ~o and poCO, 1] < 00, it follows that Po has a unique extension to a measure p on «0, l],B(O, 1]). This is Lebesgue measure Leb on «0,1], B(O, 1]). The po-sets form a IT-algebra strictly larger than B(O, 1], namely the IT-algebra of Lebesgue measurable subsets of (0,1]. See Section Al.ll. A1.10. Example of non-uniqueness of extension With (S,
(a)
~o)
as in Section A1.9, suppose that for F E Vo
(F)
:== {
° 00
if F == if F i=
~o,
0, 0.
The Caratheodory extension of Vo will be obtained as the obvious extension of (a) to ~. However, another extension ii is given by
ii(F) == number of elements in F.
A1.11. Completion of a measure space In fact (apart from an 'aside' on the Riemann integral), we do not need completions in this book. Suppose that (S,~, p) is a measure space. Define a class N of subsets of S as follows: N EN if and only if 3Z E ~ such that N ~ Z and p(Z)
= 0.
It is sometimes philosophically satisfying to be able to make precise the idea
that 'N in N is p-measurable and peN) = 0'. This is done as follows. For any subset F of S, write FE ~* if 3E, G E ~ such that E ~ F ~ G and p(G \ E) = 0. It is very easy to show that ~* is a IT-algebra on S and indeed that ~* = IT(~,N). With obvious notation we define for F E ~*, p*(F) = peE) = peG), it being easy to check that p* is well defined. Moreover, it is no problem to prove that (S, ~*, p*) is a measure space, the completion of (S,~, p).
.. (Al.12)
Chapter Al: Appendix to Chapter 1
20S
For parts of advanced probability, it is essential to complete the basic probability triple (n, F, P). In other parts of probability, when (for example) S is topological, E = B(S), and we wish to consider several different measures on (S, E), it is meaningless to insist on completion. If we begin with ([0, 1],B[0, 1], Leb), then B[O,l]* is the u-algebra of what are called Lebesgue-measurable sets of [0,1]. Then, for example, a function f : [O,lJ -+ [O,lJ is Lebesgue-measurable if the inverse image of every Borel set is Lebesgue-measurable: it need not be true that the inverse image of a Lebesgue-measurable set is Lebesgue-measurable.
A1.12. The Baire category theorem In Section 1.11, we studied a subset H of S:= [0,1] such that (i) H =
nGk for a sequence (G
of open subsets of S,
k)
k
(ii) H;2 V, where V If H were countable: H
(a)
S
= Qn S. = {hr : r
=H
UHC
EN}, then we would have
= (U{h r }) U (U G k) r
expressing S as a countable union
(b) n
of closed sets where no Fn contains an open interval. [Since V ~ Gk for every k, G k ~ VC so that G k contains only irrational points in S.] However, the Baire category theorem states that if a complete metric space S may be written as a union of a countable sequence of closed sets: then some Fn contains an open ball.
Thus the set H must be uncountable. The Baire category theorem has fundamental applications in functional analysis, and some striking applications to probability too! Proof of the Haire category theorem. Assume for the purposes of contradiction that no Fn contains an open ball. Since F{ is a non-empty open subset of S, we can find Xl in S and £1 > such that
°
204
Chapter Ai: Appendix to Chapter 1
(Ai.12) ..
B(XI' et) denoting the open ball of radius el centred at Xl. Now F2 contains no open ball, so that the open set
is non-empty, and we can find
X2
in U2 and
e2
> 0 such that
Inductively, choose a sequence (xn) in S and (en) in (0,00) so that we have
en+l < 2- l en and
Since d(xn,xn+t) < 2- l e n , it is obvious from the triangle law that (xn) is Cauchy, so that x := limxn exists, and that
contradicting the fact that
U Fn = S.
o
Chapter AS
Appendix to Chapter 3
A3.1. Proof of the Monotone-Class Theorem 3.14 Recall the statement of the theorem. ~
Let 1{ be a claM of bounded functions from a set S into R satisfying the following conditions:
(i)
1{
is a vector space over R;
(ii) the constant function 1 is an element of 1{; (iii) if (In) is a sequence of non-negative functions in 1{ such that In I where I is a bounded function on S, then I E 1{.
r
Then if 1{ contains the indicator function of every set in some 7rsystem I, then 1{ contains every bounded q(I)-measurable function on S. Proof Let 1) be the class of sets F in S such that IF E 1{. It is immediate from (i) - (iii) that 1) is a d-system. Since 1) contains the 7r-system I, 1) contains q(I). Suppose that
I
is a q(l)-measurable function such that for some K in
N,
o ~ I(s)
~ K,
Vs E S.
For n E N, define K2R
In(s)
= E iTnIA(n,i), i=O
where
A(n,i):= {s: iT n ~ I(s) < (i
+ 1)2-n}.
Since I is q(I)-measurable, every A(n, i) E q(I), so that lA(n,i) E 1{. Since is a vector space, every In E 1{. But 0 ~ In I, so that I E 1{.
r
1{
205
(AS.l)..
Chapter AS: Appendix to Chapter S
206
max(f,O) If f E ba(I), we may write f = j+ - f-, where f and f- = max( - f, 0). Then j+, f- E bO"(I) and f+, f- ~ 0, so that f+ ,f- E 'H by what we established above. o
A3.2. Discussion of generated O"-algebras This is one of those situations in which it is actually easier to understand things in a more formal abstract setting. So, suppose that nand S are sets, and that Y : D -+ S; ~
is a O"-algebra on S:
X: D -+ R.
Because y- I preserves all set operations, y-I~:=
{y-I B: B E ~}
is a O"-algebra on D, and because it is tautologically the smallest O"-algebra ~ -+ Y), we call it O"(Y):
Yon D such that Y is Y /~ measurable (in that y-I :
LEMMA (a)
X is O"(Y)-mea.mrable if and only if
X where
f
is a
~-measurable
= feY)
function from S to R.
Note. The 'if' part is just the Composition Lemma Proof of 'only if' part. It is enough to prove that
(b)
X E bO"(Y) if and only if 3f E b~ such that X
= f(Y).
(Otherwise, consider arc tan X, for example.) Though we certainly do not need the Monotone-Class Theorem to prove (b), we may as well use it. So define 'H to be the class of all bounded functions X on D such that X = icY) for some f E b~. Taking I = O"(Y), note that if F E I then F = y-I B for some B in ~, so that
IF(w) = IB(Y(w)),
207
Chapter A9: Appendix to Chapter 9
.. (A9.2)
so that IF E 1i. That 1i is a vector space containing constants is obvious. Finally, suppose that (Xn) is a sequence of elements of 1i such that, for some positive real constant K,
o :S Xn i
X :S K.
For each n, Xn = In(Y) for some In in bE. Define E bE. Then X = I(Y).
1:= lim sup In,
f
One
ha~
so that 0
to be very careful about what Lemma (a) mean" in practice.
To be sure, result (3.13,b) is the special case when (S,E) Di8cussion of (3.13,c). Suppose that Yk : define a map Y : n ~ Rn via
n~
= (R,B).
R for 1 :S k :S n. We may
The problem mentioned at (3.13,d) and in the Warning following it shows up here because, before we can apply Lemma (a) to prove (3.13,c), we need to prove that
[This amounts to proving that the product u-algebra rIl
+ X 2 + ... + X n . Sn "f2n log log n
Then, almost surely,
= +1,
· . f 1Imln
Sn
"f2n log log n
= - 1.
This result already gives very precise behaviour on the big values of partial sums. See Section 14.7 for proof in the case when the X's are normally distributed.
A4.2. Strassen's Law of the Iterated Logarithm Strassen's Law is a staggering extension of Kolmogorov's result. Let (Xn) and (Sn) be as in the previous section. For each w, let the map t ...... St(w) on [0,00) be the linear interpolation of the map n ...... Sn(w) on Z+ , so that
St(W) := (t - n)Sn+t(w) + (n
+ 1- t)Sn(w),
t E [n,n + 1).
With Kolmogorov's result in mind, define
Zn(t,w):=
Snt(w) .../2n log log n 208
t E [0,1],
209
Chapter A4: Appendix to Chapter 4
.. (A4·S)
so that t ~ Zn(t,W) on [0,1] is a rescaled version of the random walk S run up to time n. Say that a function t ~ J(t,w) is in the set f{(w) oflimiting shapes of the path associated with w if there is a sequence n 1(w ), n2 (w), ... in N such that
Zn(t,w) ...... J(t,w) uniformly in t E [0,1]. Now let f{ consist of those functions in the Lebesgue-integral form
J(t)
=
l'
h(s)ds
where
J in C[O, 1] which can be
written
11 h(s?ds :S 1.
Strassen's Theore1U
P[f{(w)
= K] = 1.
Thus, (almost) all paths have the same limiting shapes. Khinchine's law follows from Strassen's precisely because (Exercise!) sup{J(I): J E f{}
= 1,
inf{J(I): J E f{}
=
-1.
However, the only element of]{ for which J(I) = 1 is the function J(t) = t, so the big values of S occur when the whole path (when rescaled) looks like a line of slope 1. Almost every path will, in its Z rescaling, look infinitely often like the function t and infinitely often like the function -t; etc., etc.
ReJerences. For a highly-motivated classical proof of Strassen's Law, see Freedman (1971). For a proof for Brownian motion based on the powerful theory of large deviations, see Stroock (1984).
A4.3. A model for a Markov chain Let E be a countable set; let /l be a probability measure on (E, E), where
£ denotes the set of all subsets of E; and let P denote a stochastic E matrix as in Section 4.8.
X
E
Complicating the notation somewhat for reasons we shall discover later, we wish to construct a probability triple (n,.t, PI') carrying an E-valued stochastic process (Zn:n E Z+) such that for n E Z+ and i o,i 1 , ••• ,in E E, we have
-
- = io; ... ; Zn- = in) = /lioPioi, ... Pi n-
pl'(Zo
1
in'
210
Chapter A4: Appendix to Chapter 4
The trick is to make
(0., J:, PI')
carry independent E-valued variables
(.2 o;Y(i,n):i E E;n E N)
.20
having law J1 and such that PI'(Y(i, n)
= j) = p(i,j),
(i,j E E).
We can obviously do this via the construction in Section 4.6. For wEn and n E N, define
and that's it!
(A4.
Chapter AS
Appendix to Chapter 5
Our task is to prove the Monotone-Convergence Theorem 5.3. We need an elementary preliminary result. AS.I. Doubly monotone arrays Proposition. Let (y~) : r E N, n E N)
be an array of numbers in [0,00] which is doubly monotone: for fixed r, y~r)
1 as n 1 so
that
for fixed n, y~r)
1 as r 1 so
that Yn :=llimy~r) exists.
y(r)
:=llim y~r) exists; n
r
Then y(oo)
:=llimy(r) =llimYn =: r
n
YCX)'
Proof The result is almost trivial. By replacing each we can assume that the y~.) are uniformly bounded. Let
e: > 0 be given. Choose
no such that Yno
>
(y~;»
'l/CX) -
by arc tan
y~r),
te:. Then choose
ro such that yto) > Yno - !e:. Then
so that
y(oo)
~ YCX)' Similarly, Yoo ~ y(oo).
o
AS.2. The key use of Lemma I.I0(a) This is where the fundamental monotonicity property of measures is used. Please re-read Section 5.1 at this stage.
211
(A5.2).
Chapter A5: Appendix to Chapter 5
212 LEMMA
(a)
Suppose that A E
~
and that h n E SF+
Then po(hn )
and
hn
TIA.
T peA).
Proof. From (5.1,e), po(h n ) S peA), so we need only prove that
liminfpo(hn) ~ peA). Let c; > 0, and define An := {S E A: hn(s) by Lemma 1.10(a), Jl(An) TpeA). But
> 1- c:}.
Then An
TA
so that,
.
(1 - e)IAn S h n so that, by (5.1,e), (1 - c;)p(An) S po(h n ). Hence liminf po(h n ) Since this is true for every c;
~
(1 - c;)Jl(A).
o
> 0, the result follows.
LEMMA (b) Suppose that f E SF+ and that gn E SF+
Proof. We can write f as a finite sum disjoint and each a" > o. Then a;llAkgn
and
f
TIAk
=
gn
Tf·
E akIAk (n
where the sets Ak are
Too), 0
and the result follows from Lemma (a).
A5.3. 'Uniqueness of integral'
LEMMA
(a)
Suppose that f E (m~)+ and that we have two sequences (I(r» (In) of elements of SF+ such that f(r) Then
Tf,
fn
Tf·
and
.. (A5··0
213
Chapter A5: Appendix to Chapter 5
Proof. Let I~r) := I(r) 1\ In. Then as r I~r) TI(r). Hence, by Lemma A5.2(b),
T 00,
lir)
T In,
and as n
T 00,
110(f~r»
Tp.o(fn) as r T00, 110 (fir) ) Tp.o(f(r» as n T00.
o
The result now follows from Proposition A5.1. Recall from Section 5.2 that for
f
E (m:E)+, we define
11(f) := sup{l1o(h) : h E Sp+; h $ f} $
00.
By definition of p.(f), we may choose a sequence h n in S p+ such that h n $ I and p.o(h n ) Tp.(f). Let us also choose a sequence (9n) of elements of SP+ such that 9n T I. (We can do this via the 'staircase function' in Section 5.3.) Now let In := max(9n, hI, h 2 , •• • , hn). Then fn E SP+, In $ I, and since In ~ 9n, In l1o(fn) $ 11(f), and since In ~ h n , we see that
T I·
Since In
<
I,
On combining this fact with Lemma (a), we obtain the next result. (Lemma (a) 'changes our particular sequence to any sequence'.)
LEMMA (b)
Let IE (m:E)+ and let (fn) be any sequence in SP+ such that In Then
TI·
A5.4. Proof of the Monotone-Convergence Theorem Recall the statement: Let (fn) be a sequence of elements of (m:E)+ such that In
TI.
Then
Proof. Let a(r) denote the rth staircase function defined in Section 5.3. Now set fir) := a(r)(fn), f(r) := a(r)(f). Since a(r) is left-continuous, I~r) TI(r) as n T00. Since a(r)(x) T x, Vx, fir) T In as r T00. By Lemma A5.2(b), p.(fir» T p.(f(r» as n T 00; and by Lemma A5.3(b), p.(fir» T p.(fn) as r T00. We also know from Lemma A5.3(b) that p.(f(r» T11(f). The result now follows from Proposition A5.1. 0
Chapter A9
Appendix to Chapter 9
This chapter is solely devoted to the proof of the 'infinite-product' Theorem 8.6. It may be read after Section 9.10. It is probably something which a keen student who has read all previous appendices should study with a tutor.
A9.1. Infinite products: setting things up Let (An :E N) be a sequence of probability measures on (R, B). Let
n:=
II R, nEN
so that a typical element w of n is a sequence W = (w n of R. Define Xn(w) := W n , and set
:
n E N) of elements
The typical element Fn of Fn has the form
(a)
Fn =Gn
X
IT R, k>n
Fubini's Theorem shows that on the algebra (NOT a-algebra)
we may unambiguously use (a) to define a map P- : F- -+ [0,1] via
(b) and that P- is finitely additive on the algebra F- . However, for each fixed n,
214
.. (A9.~)
Chapter A9: Appendix to Chapter 9
215
(c)
(n,Fn,p-) 13 a bona fide probability triple which may be identified with TII
We want to prove that P- is countably additive on F-
(obviously with the intention of using Caratheodory's Theorem 1.7). Now we know from our proof of the existence of Lebesgue measure (see (Alo9,a)) that it is enough to show that (e) if (Hr) is a sequence of sets in F- such that Hr ;2 Hr+I. Vr, and if for 30me f: > 0, P-(Hr ) ~ e for every r, then nHr =I- 0.
A9.2. Proof of (A9.1,e) Step 1: For every r, there is some nCr) such that Hr E Fn(r) and so
= h r(WI,W2, ... ,Wn(r») for some hr E bBn(r). Recall that Xk(W) = Wk, and look again at Section A3.2. IlI.(w)
Step 2: We have
(aO) because the left-hand side of (aO) is exactly P-(Hr). If we work within the probability triple (n, Fn(r), P-), then we know from Section 9.10 that ,Aw) ::;= gr(WI) := E- hr(wt,X 2 , X 3 , ••• , Xn(r»
is an explicit version of the conditional expectation of Ill. given FI , and
Now, 0
~
gr
~
1, so that
e ~ Al (gr) ~ 1AI {gr ~ e2- I } + e2- I Al {g~I) ~ e2- I } ~
Al {gr ~ f:2- I
}
+ e2- I .
Thus
Step 3: However, since Hr;2 H r+ I , we have (working within (n,Fm,p-) where both Hr and Hr+I are in Fm) grew!) ~ gr+I(Wt},
for every WI in R.
Chapter A9: Appendix to Chapter 9
216
(A9.2) ..
Working on (R,8,AJ) we have
and gr
1 so that {gr 2: .o2- 1 } 1;
and by Lemma 1.10(b) on the continuity from above of measures, we have
Hence, there exists
wi
(say) in R such that
(al)
Step
4:
We now repeat Steps 2 and 3 applied to the situation in which
hr is replaced by hr(wr), where
We find that there exists
w~
in R such that
(a2) Proceeding inductively, we obtain a sequence w*
= (w~
: n E N)
with the property that
(* * ... ,wn(r) *) > \.J E -h rWl,w2, _c 2- n (r) ,yr. However, hr(w;,wi,··· ,w:(r»
= IH.(w*),
and can only be 0 or 1. The only conclusion is that w* E H r , '
Chapter A13
Appendix to Chapter 13
This chapter is devoted to comments on modes of convergence, regarded by many as good for the souls of students, and certainly easy to set examination questions on. A13.1. Modes of convergence: definitions Let (Xn : n E N) be a sequence of RVs and let X be a RV, all carried by our triple (Q,:F, P). Let us collect together definitions known to us. Ahnost sure convergence We say that Xn -> X almost surely if
P(Xn
->
X)
= 1.
Convergence in probability We say that Xn -> X in probability if, for every c > 0,
P(IXn -XI> c)
-> 0
as
n
-> 00.
£P convergence (p :::: 1) We say that Xn -> X in £P if each Xn is in £P and X E .0 and as
n ->
00,
equivalently, as
217
n ->
00.
Chapter AlS: Appendix to Chapter lS
218
(Al!J.2).
A13.2. Modes of convergence: relationships Let me state the facts. Convergence in probability is the weakest of the above forms of convergence Thus (a) (Xn ~ X, a.s.)
:=}
(Xn
-+
X in prob)
(b) for p 2: 1,
(Xn -+ X in .0)
=}
(Xn
-+
X in prob).
No other implication between any two of our three forms of convergence is valid. But, of course, for r 2: p 2: 1, (c) (Xn
~
X in
.c r ) :=} (Xn
-+
X in .0).
If we know that 'convergence in probability is happening quickly' in that
(d)
L: P(IXn
-
XI> e)
< 00, ' 0,
n
then (BC1) allows us to conclude that Xn -+ X, a.s. The fact that property (d) implies a.s. convergence is used in proving the following result:
(e) Xn
-+ X in probability if and only if every subsequence of (Xn) contains a further subsequence along which we have almost sure convergence to x.
The only other useful result is that -+ X in .c p if and only if the following two statements hold: (i) Xn -+ X in probability, (ii) the family (IXnIP : n 2: 1) is UI.
(f) for p 2: 1, Xn
There is only one way to gain an understanding of the above facts, and that is to prove them yourself. The exercises under EA13 provide guidance if you need it.
Chapter A14
Appendix to Chapter 14
We work with a filtered space (n,:F, {:Fn: n E Z+}, P). This chapter introduces the u-algebra :FT, where T is a stopping time. The idea is that :FT represents the information available to our observer at (or, if you prefer, immediately after) time T. The Optional-Sampling Theorem says that if X is a uniformly integrable supermartingale and S and T are stopping times with S::; T, then we have the natural extension of the supermartingale property: E(XTI:Fs) ::; X s ,
a.s.
A14.1. The u-algebra :FT, T a stopping time Recall that a map T: n -+ Z+ U {oo} is called a 3topping time if {T::; n} E :Fn ,
n E Z+ U fool,
{T=n}E:Fn,
nEZ+U{oo}.
equivalently if In each of the above statements, the 'n = 00' case follows automatically from the validity of the result for every n in Z+. Let T be a stopping time. Then, for F ~
n,
we say that F E :FT if
Fn{T:::;n}E:Fn ,
nEZ+U{oo},
Fn{T=n}E:Fn,
nEZ+U{oo}.
equivalently if Then:FT =:Fn if T
== nj
:FT
= Frx>
if T
219
==
OOj
and:FT ~ :Foo for every T.
Chapter A14: Appendix to Chapter 14
220
You can easily check that FT is a IT-algebra. You can also check that if S is another stopping time, then
Hint. If F E FSAT, then Fn{T=n}= UFn{SAT=k}.
o
k:$n
Another detail that needs to be checked is that if X is an adapted process and T is a stopping time, then XT E mFT. Here, Xoo is assumed defined in some way such that X 00 is F 00 measurable.
Proof. For B E 5, {XT E B}
n {T = n} = {Xn
E B}
n {T = n} E Fn.
0
A14.2. A special case of OST
LEMMA Let X be a supermartingale. Let T be a stopping time such that, for Mme N in N, T(w) :::; N, Vw. Then X T E .c 1 (fl,FT,P) and
Proof. Let FE FT. Then
:::; LE(Xn;Fn{T=n})=E(XT;F). n5,N (Of course, the fact that IXTI :::; E(IXTI) < 00.)
IXII + ... + IXNI
guarantees the result that 0
A14.3. Doob's Optional-Sampling Theorem for VI martingales ~~
Let M be a UI martingale. Then, for any stopping time T, E(MooIFT) = MT ,
a.s.
.. ( A14.4)
221
Chapter A14: Appendix to Chapter 14
Corollary 1 (a new Optional-Stopping Theorem!) If M i8 a UI martingale, and T and E(MT) = E(Mo).
i8
a stopping time, then E(IMTi) <
00
~
T,
Corollary 2 If M is a UI martingale and Sand T are stopping time8 with S then E(MTIFs) = Ms, a.s.
Proof of theorem. By Theorem 14.1 and Lemma A14.2, we have, for kEN,
E(MooIFk)
= Mk,
a.s.,
E(MkIFTl\k)
= MTl\k,
a.s.
Hence, by the Tower Property,
If F EFT, then (check!) F
(**) E(MoojFn{T
~
k})
n {T ~ k}
E FTl\k, so that, by (*),
= E(MTl\kjFn{T ~ k}) = E(MTjFn{T ~ k}).
We can (and do) restrict attention to the case when Moo ~ 0, whence Mn = E(MooIFn) ~ 0 for all n. Then, on letting k i 00 in (**) and using (MON), we obtain
E(MoojFn {T
< oo}) = E(MTjFn {T < oo}).
However, the fact that
E(MoojFn {T = oo}) = E(MTjFn {T = oo}). is tautological. Hence E(Moo; F)
o
= E(MT; F).
Corollary 2 now follows from the Tower Property, and Corollary 1 follows from Corollary 2! A14.4. The result for UI submartingales
A UI submartingale X has Doob decomposition
X=Xo+M+A, where (Exercise: explain why!) E(Aoo) a stopping time, then, almost surely,
<
00
and M is UI. Hence, if T is
E(XooIFT) = Xo + E(MooIFT) + E(AooIFT) = Xo + MT + E(AooIFT) ~ Xo + MT + E(ATIFT) =XT'
Chapter A16
Appendix to Chapter 16
A16.!. Differentiation under the integral sign Before stating our theorem on this topic, let us examine the type of application we need in Section 16.3. Suppose that X is a RV such that E(IXI) < 00 and that h(t, x) = ixe itr • (We can treat the real and imaginary parts of h separately.) Note that if [a, b] is a subinterval of R, then the variables {h(t,X) : t E [a,b]} are dominated by lXI, and so are UI. In the theorem, we shall have EH(t,X)
= 'Px(t) -
'Px(a),
t
E
[a, b],
and we can conclude that 'Px(t) exists and equals Eh(t,X).
THEOREM Let X be a RV carried by (n,.1", P). Suppose that a, b E R with a < b, and that
h : [a, b] x R ~ R has the properties:
(i) t (ii) x
I->
I->
h(t,x) is continuous in t for every x in R, h(t, x) is B-measurable for every t in [a, b],
(iii) the variables {h(t,X): t E [a,b]} are llI. Then
(a)
t
(b)
his B[a,b] x B-measurable,
I->
Eh( t, X) is continuous on [a, b],
222
(c)
223
Chapter A16: Appendix to Chapter 16
.. (A16.1) if H(t, x) :=
f:
h(s, x)ds for a ~ t ~ b, then for t E (a, b),
~EH(t,X)
exists and equals
dt
Eh(t,X).
Proof of (a). Since we need only consider the 'sequential case tn (a) follows immediately from Theorem 13.7. Proof of (b). Define en := 2- n(b - a), Dn := (a
--+
t', result 0
+ 61+) n [a, b),
rn(t):= inf{T E Dn: r ~ t},
t E [a, b),
t E [a, b), x E R.
hn(t,x):= h(Tn(t),x),
Then, for B E B,
U «[T,T + 6) n [a, b))
=
h;;l(B)
{x : h(T,X) E B}),
X
rED ..
so that h n is B[a, b) follows. Proof of (c). For
r
X
~
B-measurable. Since h n
--+
h on [a, b) x R, result (b) 0
[a, b) x R, define
O'(r):= ((t,w) E [a, b)
X
n: (t,X(w» E r}.
If r = A x C where A E B[a, b) and C E B, then
O'(r)
=A X
(X- 1 C) E B[a, b) x :F.
It is now clear that the class of r for which O'(r) is an element of B[a, b) x:F is a a-algebra containing B[a, b) x B. The point of all this is that
(t,w) ~ h(t,X(w)) is B x:F- measurable
since for B E B, ({t,w) : h(t,X(w» E B} is O'(h-IB). (Yes, I know, we could have obtained (*) more directly using the hn's, but it is good to have other methods.) Since the family {h(t,X) : t ~ O} is UI, it is bounded in [.1, whence
lb
Elh(t,X)ldt <
Fubini's Theorem now implies that, for a
it
Eh(s,X)ds = E
and part (c) now follows.
it
~
t
00.
~
b,
h(s,X)ds = EH(t,X),
o
Chapter E
Exercises
Starred exercises are more tricky. The first number in an exercise gives a rough indication of which chapter it depends on. 'G' stands for 'a bit of gumption is all that's necessary'. A number of exercises may also be found in the main text. Some are repeated here. We begin with an Antidote to measure-theoretic material- just for fun, though the point that probability is more than mere measure theory needs hammering home.
EG.!. Two points are chosen at random on a line AB, each point being chosen according to the uniform distribution on AB, and the choices being made independently of each other. The line AB may now be regarded as divided into three parts. What is the probability that they may be made into a triangle? EG.2. Planet X is a ball with centre O. Three spaceships A, B and C land at random on its surface, their positions being independent and each uniformly distributed on the surface. Spaceships A and B can communicate directly by radio if '(AOB < 90°. Show that the probability that they can keep in touch (with, for example, A communicating with B via C if necessary) is (71" + 2)/(471"). EG.3. Let G be the free group with two generators a and b. Start at time the unit element 1, the empty word. At each second multiply the current word on the right by one of the four elements a, a-I, b, b- I , choosing each with probability 1/4 (independently of previous choices). The choices
o with
a, a, b, a-I, a, b- 1 , a-I, a, b
at times 1 to 9 will produce the reduced word aab of length 3 at time 9. Prove that the probability that the reduced word 1 ever occurs at a positive time is 1/3, and explain why it is intuitively clear that (almost surely) (length of reduced word at time n)/n .......
224
t.
Chapter E: Exerci$e$
225
EG.4.* (Continuation) Suppose now that the elements a, a-t,b,b- I are chosen instead with respective probabilities a, a, (3, (3, where a > 0, (3 > 0, a + (3 = !. Prove that the conditional probability that the reduced word 1 ever occurs at a positive time, given that the element a is chosen at time 1, is the unique root x = rea) (say) in (0,1) of the equation
As time goes on, (it is almost surely true that) more and more of the reduced word becomes fixed, so that a final word is built up. If in the final word, the symbols a and a-I are both replaced by A and the symbols b and b- I are both replaced by B, show that the sequence of A's and B's obtained is a Markov chain on {A, B} with (for example)
PAA
a(l - x) = a ( 1-x ) +2 (3( 1-y)'
where y = r((3). What is the (almost sure) limiting proportion of occurrence of the symbol a in the final word? (Note. This result was used by Professor Lyons of Edinburgh to solve a long-standing problem in potential theory on Riemannian manifolds.)
Algebras, etc. ELI. 'Probability' for subsets of N Let V
~
N. Say that V has (Cesaro) density ,eV) and write V E CES if
,eV) := lim "(V n {1, 2, 3, ... , n}) n
exists. Give an example of sets V! and V2 in CES for which Vi Thus, CES is not an algebra.
n V2 f/. CES.
Independence E4.1. Let (n, F, P) be a probability triple. Let II, I2 and I3 be three lI'-systems on n such that, for k = 1,2,3,
Chapter E: Exercises
226 Prove that if
pelt n 12 n 13 )
= P(Il)P(I2)P(I3 )
whenever I" E I" (k = 1,2,3), then aCId, a(I2)' a(I3) are independent. Why did we require that E I,,?
n
E4.2. Let s > 1, and define (s):= EnEN n-·, as usual. Let X and Y be independent N-valued random variables with
P(X
= n) = P(Y = n) = n-·/(s).
Prove that the events (Ep : p prime) , where Ep = {X is divisible by p}, are independent. Explain Euler's formula
l/(s)
= II(l-l/p') p
probabilistically. Prove that P(no square other than 1 divides X)
= 1/(2s).
Let H be the highest common factor of X and Y. Prove that
E4.3. Let XI, X 2 , ••• be independent random variables with the same continuous distribution function. LetEl:= n, and, for n ~ 2, let
En
:=
{Xn > Xm,Vm < n} = {a 'Record' occurs at time n}.
Convince yourself and your tutor that the events E 1 , E 2 , • __ are independent, with PeEn) = l/n. Borel-Cantelli Lemmas E4.4. Suppose that a coin with probability p of heads is tossed repeatedly. Let A" be the event that a sequence of k (or more) consecutive heads occurs amongst tosses numbered 2",2" + 1,2" + 2, ... , 2"+1 - 1. Prove that p
eA . )= {I0 ", 1.0.
ifp>!, 'f p< - 2' 1
I
Hint. Let Ei be the event that there are k consecutive heads beginning at toss numbered 2" +( i-I )k_ Now make a simple use ofthe inclusion-exclusion formulae (Lemma 1.9).
227
Chapter E: Exercises
E4.S. Prove that if G is a random variable with the normal N(O,l) distribution, then, for x
> 0,
peG > x) = -1-
v"ii
1
00
e- H" dy:::;
x
1 1 --e-~
xv"ii
x' .
Let X I ,X2 , ... be a sequence of independent N(O,l) variables. Prove that, with probability 1, L :::; 1, where
L:= limsup(Xn/J2logn). (Harder. Prove that pel = 1) = 1.) [Hint. See Section 14.8.)
Let Sn := Xl + X 2 distribution. Prove that
+ ... + X n .
P(iSnl
Recall that Sn/fo has the N(O,l)
< 2Jnlogn, ev) = 1.
Note that this implies the Strong Law: P(Sn/n
-t
0) = 1.
Remark. The Law of the Iterated Logarithm states that P (limsup
Sn .../2n log log n
=
1) =
1.
Do not attempt to prove this now! See Section 14.7.
E4.6. Converse to SLLN Let Z be a non-negative RV. Let Y be the integer part of Z. Show that
and deduce that
2: P[Z 2: n] :::; E(Z) :::; 1 + 2: P[Z 2: n]. Let (Xn) be a sequence of lID RVs (independent, identically distributed random variables) with E(IXnl) = 00, 'tin. Prove that
2: P[Xnl > kn] =
00
(k E N)
and
lim sup IXn I = n
n
Deduce that if Sn = Xl
+ X 2 + ... + X n , then
lim sup ISnl = n
00,
a.s.
(Xl,
a.s.
Chapter E: Exerciaea
228
E4.1. What's fair about a fair game? Let XI, X 2 , ••• be independent RVs such that
=
X n
Prove that E(Xn)
{n2-1 - 1
n-2
with probability with probability 1 - n- 2 •
= 0, Vn, but that if Sn = Xl + X 2 + ... + X n, then Sn
-;:;- ..... -1,
a.s.
E4.8*. Blackwell's test of imagination This exercise assumes that you are familiar with continuous-parameter Markov chains with two states. For each n E N, let x{n) = {x{n)(t) : t ;:: O} be a Markov chain with state-space the two-point set {O, I} with Q-matrix
and transition function p{n)(t) = exp(tQ{n». Show that, for every t,
The processes (x{n) : n E N) are independent and x(n)(o) = 0 for every n. Each x{n) has right-continuous paths.
Suppose that Lan =
00
and L an/bn <
00.
Prove that if t is a fixed time then
p{x(n)(t) = 1 for infinitely many n} = O. Use Weierstrass's M -test to show that Ln log p~~) (t) is uniformly con vergent on [0,1], and deduce that
p{x{n)(t)
= 0 for ALL n} ..... 1
as t! O.
Prove that
p{x{n)(a) = 0, Vs 5 t, Vn}
= 0 for every t > 0
and discuss with your tutor why it is almost surely true that
229
Chapter E: Exerciu3 (**)
within every non-empty time interval, infinitely many of the x(n) chains jump.
Now imagine the whole behaviour. Notes. Almost surely, the process X = (x(n») spends almost all its time in the countable subset of {O, I}N consisting of sequences with only finitely many l's. This follows from (*) and Fubini's Theorem 8.2. However, it is a.s. true that X visits uncountable points of {O, I}N during every nonempty time interval. This follows from (**) and the Baire category theorem Al.12. By using much deeper techniques, one can show that for certain choices of (an) and (b n ), X will almost certainly visit every point of {O, I}N uncountablyoften within a finite time.
Tail a-algebras
E4.9. Let Yo, YI , Y 2 , ••• be independent random variables with P(Yn = +1) = P(Yn = -1)
= 1,
"In.
For n EN, define
Xn := YOYI
'"
Yn •
Prove that the variables Xl, X 2 , ••• are independent. Define
Y:= a(Yl> Yz, .. .),
Tn := a(Xr : r > n).
Prove that
Hint. Prove that Yo E m£ and that Yo is independent of R.
E4.10. Star Trek, 2 See EIO.l1, which you can do now.
Dominated-Convergence Theorem
ES.1. Let S := [0,1], E := 8(S), J.L := Leb. Define fn := nI(o,l/n). Prove that fn(s) -> 0 for every s in S, but that J.L(fn) = 1 for every n. Draw a picture of g:= SUPn Ifni, and show that 9 f/. £l(S, E,J.L).
Chapter E: Exercises
290
Inclusion-Exclusion Formulae ES.2. Prove the inclusion-exclusion formulae and inequalities of Section 1.9 by considering integrals of indicator functions. The Strong Law E1.1. Inverting Laplace transforms Let f be a bounded continuous function on [0,00). The Laplace transform of f is the function L on (0,00) defined by
Let X},X 2 , ••• be independent RVs each with the exponential distribution of rate -X, so P[X > xl = e-~%, E(X) = Var(X) = Show that
!,
( _l)n-I
-xnL(n-I)(-X) (n-l)!
:&.
= Ef(Sn,)
where Sn = Xl +X2 +·· ·+Xn , andL(n-1) denotes the (n_l)th derivative of L. Prove that f may be recovered from L as follows: for y > 0,
fey)
= lim (_It-l (n/y)nL(n-l)(n/ y ). nToo
(n - I)!
E1.2. The uniform distribution on the sphere sn-1
~
Rn
As usual, write sn-I = {x E Rn : Ixl = I}. You may assume that there is a unique probability measure v n- 1 on (sn-1,8(sn-I») such that vn-I(A) = /.In-1(HA) for every orthogonal n X n matrix H and every A in 8(sn-1). Prove that if X is a vector in Rn , the components of which are independent N(O,l) variables, then for every orthogonal n X n matrix H, the vector HX has the same property. Deduce that X/IXI has law /.In-1. Let ZJ, Z2, ... be independent N(O,l) variables and define
Prove that Rn/ Vn
-t
1, a.s.
Combine these ideas to prove a rather striking fact which relates the normal distribution to the 'infinite-dimensional' sphere and which is important both for Brownian motion and for Fock-space constructions in quantum mechanics:
231
Chapter E: Exerciaea
If, for each n, (yt), y 2(n), ••• , yJn») is a point chosen on sn-l according to the distribution lin-I, then
Hint. p(y1(n) ~ u)
= P(XI/Rn ~ u).
Conditional Expectation E9.1. Prove that if g is a sub-u-algebra of :F and if X E £1 (n,:F, P) and ifY E £I(n,g,p) and
(*)
E(X; G)
= E(Y; G)
for every G in a 1l"-system which contains n and generates g, then (*) holds for every G in g. E9.2. Suppose that X, Y E £1 (n,:F, P) and that
E(XJY) Prove that P(X
= Y,
a.s.,
E(YJX)
= X,
a.s.
= Y) = 1.
Hint. Consider E(X - Y; X > cl ,Y ~ c) + E(X - Y; X
~
c, Y
~
c).
Martingales EtO.1. P6lya's urn
At time 0, an urn contains 1 black ball and 1 white ball. At each time 1,2,3, ... , a ball is chosen at random from the urn and is replaced together with a new ball of the same colour. Just after time n, there are therefore n + 2 balls in the urn, of which Bn + 1 are black, where Bn is the number of black balls chosen by time n. Let Mn = (Bn + 1)/(n+2), the proportion of black balls in the urn just after time n. Prove that (relative to a natural filtration which you should specify) M is a martingale. Prove that P(Bn = k) = (n + 1)-1 for 0 ~ k ~ n. What is the distribution of e, where e:= 1imMn?
Chapter E: Exercises
232 Prove that for 0
< 8 < 1,
defines a martingale Nfl.
(Continued at EI0.8.)
EIO.2. Martingale formulation of Bellman's Optimality Principle
Your winnings per unit stake on game n are Cn, where the cn are IID RVs with P(cn=+l)=p,P(Cn=-l)=q,
where
!
Your stake C n on game n must lie between 0 and Zn-l, where Zn-l is your fortune at time n - 1. Your object is to maximize the expected 'interest rate' E 10g(ZN /Zo), where N is a given integer representing the length of the game, and Zo, your fortune at time 0, is a given constant. Let Fn = a(cl, ... ,Cn) be your 'history' up to time n. Show that if C is any (previsible) strategy, then log Zn - na is a supermartingale, where a denotes the 'entropy' a = plogp+ qlogq +log2, so that E loge Z N/ Zo) ~ N a, but that, for a certain strategy, log Zn - na is a martingale. What is the best strategy? EIO.3. Stopping times
Suppose that Sand T are stopping times (relative to (n,F, {Fn }». Prove that S 1\ T (:= mineS, T», S V T(:= max(S, T» and S + T are stopping times. EIO.4. Let Sand T be stopping times with S
~
T. Define the process
l(S,T1 with parameter set N via
l(S,T1(n,w):=
{01
if Sew) < n ~ T(w), otherwise.
Prove that l(S,T1 is previsible, and deduce that if X is a supermartingale, then
299
Chapter E: Ezerci.. e..
ElO.S. 'What alwaY6 6tand6 a re460nable chance of happening will (almost surely) happen - .. ooner rather than later. '
c
Suppose that T is a stopping time such that for some N E N and some n:
> 0, we have, for every
P(T $ n
+ NI.rn ) > C,
a.s.
Prove by induction using P(T > kN) = P(T > kN; T k = 1,2,3, ... P(T> kN) $ (1- c)". Show that E(T)
> (k -1)N) that for
< 00.
ElO.B. ABRACADABRA At each of times 1,2,3, ... , a monkey types a capital letter at random, the sequence of letters typed forming an lID sequence of RVs each chosen uniformly from amongst the 26 possible capital letters. Just before each time n ::; 1,2, ... , a new gambler arrives on the scene. He bets $ 1 that the nth letter will be A. If he loses, he leaves. If he wins, he receives $ 26 all of which he bets on the event that the (n + l)th letter will be B. If he loses, he leaves. IT he wins, he bets his whole current fortune of $ 26 2 that the (n + 2)th letter will be R and so on through the ABRACADABRA sequence. Let T be the first time by which the monkey has produced the consecutive sequence ABRACADABRA. Explain why martingale theory makes it intuitively obvious that ' E(T) ::; 2611 + 264 + 26 and use result 1O.1O(c) to prove this. (See Ross (19$3) for other such applications.)
EtO.7. Gambler's Ruin Suppose that Xl,X2 , ••• are lID RVs with P[X
= +1] = p,
P[X = -1]
= q,
where
0
< p =1-
q
< 1,
Chapter E: Exercises
and p
f:. q.
Suppose that a and b are integers with 0
Sn := a
+ Xl + ... + X n ,
< a < b. Define
T:= inf{n: Sn = 0 or Sn = b}.
Let :F = a(XI, ... , Xn) (Fo = {0, O}). Explain why T satisfies the conditions in Question EIO.5. Prove that
define martingales}.f and N. Deduce the values of peST = 0) and E(ST). EIO.B. Bayes' urn A random number e is chosen uniformly between 0 and 1, and a coin with e of heads is minted. The coin is tossed repeatedly. Let Bn be the number of heads in n tosses. Prove that (Bn) has exactly the same probabilistic structure as the (Bn) sequence in (ElO.l) on Polya's urn. Prove that N! is a regular conditional pdf of e given BI, B 2 , •• • , B n •
p~obability
(Continued at EIB.5.)
EIO.9. Show that if X is a non-negative supermartingale and T is a stopping time, then E(XTjT < 00):::; E(Xo). (Hint. Recall Fatou's Lemma.) Deduce that cP(supXn ~ c):::; E(Xo). n
EIO.IO*. The 'Star-ship Enterprise' Problem The control system on the star-ship Enterprise has gone wonky. All that one can do is to set a distance to be travelled. The spaceship will then move that distance in a randomly chosen direction, then stop. The object is to get into the Solar System, a ball of radius r. Initially, the Enterprise is at a distance Ro( > r) from the Sun. Let Rn be the distance from Sun to Enterprise after n 'space-hops'. Use Gauss's theorems on potentials due to spherically-symmetric charge distribntions to show that whatever strategy is adopted, 1/ Rn is a supermartingale, and that for any strategy which always sets a distance no greater than that from Sun to Enterprise, l/R n is a martingale. Use (EIO.9) to show that P[Enterprise gets into Solar System] :::; r/Ro• For each e > 0, you can choose a strategy which makes this probability greater than (r / Ro) - e. What kind of strategy will this be?
235
Chapter E: Exercises EI0.ll *. Star Trek, 2. 'Captain's Log ...
Mr Spock and Chief Engineer Scott have modified the control system so that the Enterprise is confined to move for ever in a fixed plane passing through the Sun. However, the next 'hop-length' is now automatically set to be the current distance to the Sun ('next' and 'current' being updated in the obvious way). Spock is muttering something about logarithms and random walks, but I wonder whether it is (almost) certain that we will get into the Solar System sometime. " '
Hint. Let Xn := log Rn-Iog R n- 1 • Prove that Xl, X 2 , ••• is an lID sequence of variables each of mean 0 and finite variance a 2 (say), where a > O. let
Prove that if a is a fixed positive number, then P[inf Sn n
= -00]2: P[Sn
2: lim sup P[Sn
~
~
-aavn,i.o.]
-aavnJ
= 1'( -0) > O.
(Use the Central Limit Theorem.) Prove that the event {infn Sn in the tail a-algebra of the (Xn) sequence.
= -(X)} is
E12.1. Branching Process A branching process Z a family
{xl
n):
= {Zn: n
2: O} is constructed in the usual way. Thus,
n, k 2: 1 } of IID Z+ -valued random variables is supposed
given. vVe define Zo:
= 1 and then define recursively:
Z n+l·. -- X(n+l) 1
+ ... + X(n+l) z"
Assume that if X denotes anyone of the J1.:
= E(X) < 00
and
0
xt),
(n 2: 0). then
< a 2 : = Var(X) < 00.
Prove that Mn: = Zn/ /-Ln defines a martingale M relative to the filtration
Fn =a(ZO,ZI"",Zn). Show that
and deduce that M is bounded in £2 if and only if J1. > 1. Show that when
/-L> 1,
Chapter E: Exerci.'Jes
296
E12.2. Vse of Kronecker's Lemma
Let E 1 ,E2 , ••• be independent events with peEn) = lin. Let Y; = IE•. Prove that 2: (Yk - t) I log k converges a.s., and use Kronecker's Lemma to deduce that N n ~1 a.s., logn ' where N n : = Y 1 + ... + Y n. An interesting application is to E4.3, when N n becomes the number of records by time n. E12.3. Star Trek, 3
Prove that if the strategy in ElO.ll is (in the obvious sense) employed and for ever - in A3 rather than in A2, then
L R;;2 < 00,
a.s.,
where Rn is the distance from the Enterprise to the Sun at time n. Note. It should be obvious which result plays the key role here, but you should try to make your argument fully rigorous.
Uniform Integrability E13.1. Prove that a clas C of RVs is VI if and only if both of the following conditions (i) and (ii) hold: (i) C is bounded in £1, so that A:= sup{E(IXI): X E C} < 00, (ii) for every e > 0, 35 > 0 such that if F E F, P(F) < 5 and X E C, then E(IXli F) < e. Hint for 'if'. For X E C, P(IXI > K) :s; K-l A.
Hint for 'only
if'. E(IXli F) :s; E(lXli IXI > K) + KP(F).
E13.2. Prove that if C and V are VI classes of RVs, and if we define
C+V
:= {X
+Y
: X E C, Y E V},
then C + V is VI. Hint. One way to prove this is to use El3.1. E13.3. Let C be a VI family of RVs. Say that Y E V if for some X E C and some sub-O"-algebra 9 of F, we have Y = E(XI9), a.s. Prove that V is VI. E14.1. Hunt's Lemma
Suppose that (Xn) is a sequence of RVs such that X:.= limX n exists a.s. and that (Xn) is dominated by Y in (£1) +:
IXn(w)1 :s; Yew),
V(n,w),
and
E(Y) < 00.
237
Chapter E: Exercises
Let {Fn} be any filtration. Prove that
Hint. Let Zm: = SUPr>m IXr - XI. Prove that Zm Prove that for n 2 m, we have, almost surely,
-+
0 a.s. and in £.1.
E14.2. Azuma-Hoeffding Inequality (a) Show that if Y is a RV with values in [-c,c] and with E(Y) for 8 E R,
Ee 9Y
:::;
cosh 8c :::; exp
(~82 c2)
= 0,
then,
•
(b) Prove that if M is a martingale null at 0 such that for some sequence (c n : n E N) of positive constants,
then, for x
> 0,
Hint for (a). Let fez):
= exp(8z),
z E [-c, c). Then, since f is convex,
fey) ~ c 2- y f(-c) c
+ c+y f(c). 2c
Hint for (b). See the proof of (14.7,a).
Characteristic Functions E16.1. Prove that lim
fT X-I sin x dx
TtooJo
=
7r
/2
by integrating f Z-Ieizdz around the contour formed by the 'upper' semicircles of radii e and T and the intervals [-T, -e] and fe, T]. E16.2. Prove that if Z has the U[-I, 1] distribution, then
cpz(8)
= (sin8)/8,
Chapter E:
238
Exerci~e~
and prove that there do not exist IID RVs X and Y such that X - Y '" U[-I, I].
E16.3. Suppose that X has the Cauchy distribution, and let 0 > o. By integrating e i8z /(1 + z2) around the semicircle formed by [-R, R] together with the 'upper' semicircle centre 0 of radius R, prove that 'Px(O) = e- 8 • Show that 'Px(O) = e- 181 for all O. Prove that if Xl, X 2 , ••• Xn are lID RVs each with the standard Cauchy distribution, then (Xl + ... + Xn)/n also has the standard Cauchy distribution. E16.4. Suppose that X has the standard normal N(O,I) distribution. Let 0> O. Consider J(27r)-! exp(-!z2)dz around the rectangular contour
(-R - iO) -; (R - iO) -; R -; (-R) -; (-R - iO), and prove that 'P X (0)
= exp( -! 02 ).
E16.5. Prove that if'P is the characteristic function of a RV X, then 'P is non-negative definite in that for complex Ct, C2, ••• , Cn and real 01 , O2 , •• • , On,
L L CjCk'PCOj j
Ok)? O.
k
(Hint. Express LHS as the expectation of ... .) Bochner's Theorem says that 'P is a characteristic function if and only if 'P(O) = 1, 'P is continuous, and 'P is non-negative definite! (It is of course understood that here 'P : R -; C.) E18.6 gives a simpler result in the same spirit.
E16.6. (a) Let (11,F,P) = ([O,IJ,8[O,I],Leb). What is the distribution of the RV Z, where Z(w) := 2w -I? Let w = I:2- n R n(w) be the binary expansion of w. Let
U(w)
=
L
2- nQn(w),
where
Qn(w) = 2R n(w)-l.
odd n
Find a random variable V independent of U such that U and V are identically distributed and U + ! V is uniformly distributed on [-1, I]. (b) Now suppose that (on some probability triple) X and Y are IID RVs such that X +!Y is uniformly distributed on [-1, I].
Chapter E: Exercises
239
Let i.p be the CF of X. Calculate r.p(9)/r.p( ~9). Show that the distribution of X must be the same as that of U in part (a), and deduce that there exists a set F E 8[-1, I] such that Leb(F) = and P(X E F) = 1.
°
°
E18.1. (a) Suppose that ,\ > and that (for n > '\)Fn is the DF associated with the Binomial distribution B(n,,\/n). Prove (using CFs) that Fn converges weakly to F where F is the DF of the Poisson distribution with parameter ,\.
(b) Suppose that XI,X 2 , ••• are IID RVs each with the density function (1- cosx)/7rx2 on R. Prove that for x E R,
+ X 2 + ... + Xn
lim P (XI
n
n--+oo
where arc tan E (- f,
:=:;
x) = ! +
7r- 1
arc tanx,
f)·
E18.2. Prove the Weak Law of Large Numbers in the following form. Suppose that X], X 2 , ••• are lID RVs, each with the same distribution as X. Suppose that X E .c} and that E(X) = Ji. Prove by the use of CFs that the distribution of
An
:=
n-I(X I + ...
+ Xn)
converges weakly to the unit mass at Ji. Deduce that
An
-+ Ji
in probability.
Of course, SLLN implies this Weak Law. Weak Convergence for Prob[O, I] E18.3. Let X and Y be RVs taking values in [0,1]. Suppose that
k
= 0,1,2, ....
Prove that
(i) Ep(X)
= Ep(Y) for every polynomial p,
(ii) Ef(X) = Ef(Y) for every continuous function f on [0,1], (iii) P(X :=:; x)
= P(Y :=:; y) for every x
in [0,1].
Hint for (ii). Use the Weierstrass Theorem 7.4. E18.4. Suppose that (Fn) is a sequence of DFs with
Fn(x) =
°for x < 0,
Fn(l) = 1,
for every n.
Chapter E: Exercises Suppose that
(*)
m,,:=lim n
J
[D,I)
x"dFn existsfork=0,1,2, ....
Use the Helly-Bray Lemma and E18.3 to show that Fn ~ F, where F is characterized by ~D,I) x"dF = m", Vk.
E1B.5. Improving on E18.3: A Moment Inversion Formula Let F be a distribution with F(O-) = 0 and F(l) = 1. Let J.l be the associated law, and define
f
m,,:=
x"dF(x).
i[D,1)
Define Q=[O,l]x[O,l]"',
e(w)
= WD,
H,,(w)
:F=BxB"',
P=J.lxLeb"',
= I[D,...o)(w,,).
This models the situation in which e is chosen with law J.l, a coin with probability e of heads is then minted, and tossed at times 1,2,... . See E10.8. The RV HI: is 1 if the kth toss produces heads, 0 otherwise. Define Sn := HI
+ H2 + ... + Hn.
By the Strong Law and Fubini's Theorem,
Sn/n
-+
e,
a.s.
Define a map D on the space of real sequences (an: n E Z+) by setting Da
= (an
- a n +l : n E Z+).
Prove that
.E (7) (Dn-im); -+ F(x)
Fn(x):= .
I~n%
at every point x of continuity of F. EIB.6* Moment Problem
Prove that if (m" : k E Z+) is a sequence of numbers in [0,1], then there exists a RV X with values in [0,1] such that E(X") = m" if and only if mo = 1 and (r,s, E Z+). Hint. Define Fn via E18.5(*), and then verify that E18.4(*) holds. You can show that the moments m",n of Fn satisfy mn,D = 1,
m n ,l = m},
You discover the algebra!
m n,2 = m2
+ n- 1 (ml
- m2),
etc.
241
Chapter E: Exercises
Weak Convergence for Prob[O, 00) E18.7. Using Laplace transforms instead of CFs Suppose that F and G are DFs on R such that F(O-)
= G(O-) = 0,
and
V).. 2: 0. Note that the integral on LHS has a contribution F(O) from {OJ. Prove that F = G. [Hint. One could derive this from the idea in E7.1. However, it is easier to use EI8.3, because we know that if X has DF F and Y has DF G, then n
= 0,1,2, ....J
Suppose that (Fn) is a sequence of distribution functions on Reach with Fn(O-) = and such that
°
exists for)" 2: that
°
and that L is continuous at 0. Prove that Fn is tight and
Fn ~ F where
J e->-,xdF(x) =
L()"),
Modes of convergence EA13.1. (a) Prove that (Xn
-+
X, a.s.)
'* (Xn
-+
X in prob).
Hint. See Section 13.5.
(b) Prove that (Xn
-+
X in prob)
fr (Xn
-+
X, a.s.).
= lEn' where E 1 , E 2 , ••• are independent events. Prove that if 2:= P(iXn - XI> e) < 00, Ve > 0, then Xn -+ X,
Hint. Let X n (c)
a.s.
Hint. Show that the set {w : Xn(w) -;. X(w)} may be written
U {w:
IX~(w)
- X(w)1 > k- 1 for infinitely many n}.
kEN
(d) Suppose that Xn
-+
X in probability. Prove that there is a subsequence -+ X, a.s.
(X n.) of (Xn) such that X nk
Hint. Combine (c) with the 'diagonal principle'.
Chapter E: Exercises
242
(e) Deduce from (a) and (d) that X n -+ X in probability if and only if every subsequence of (Xn) contains a further subsequence which converges a.s. to X. EA13.2. Recall that if ~ is a random variable with the standard normal N(O,l) distribution, then
Suppose that ~1' 6, ... are lID RVs each with the N(O,l) distribution. Let Sn = L~=1 ~k, let a, bE R, and define
Xn
= exp(aSn
-
bn).
Pro\"e that
(X" but that for
r
-+
0, a.s.)
¢:}
(b > 0)
?: 1, (Xn
-+
0 in C)
¢:}
(r < 2bja 2 ).
References
Aldous, D. (1989), Probability Approximations via the Poisson Clumping Heuristic, Springer, New York. Athreya, K.B. and Ney, P. (1972), Branching Processes, Springer, New York, Berlin. Billingsley, P. (1968), Convergence of Probability Measures, Wiley, New York. Billingsley, P. (1979), Probability and Measure, Wiley, Chichester, New York (2nd edn. 1987). Bollobas, B. (1987), Martingales, isoperimetric inequalities, and random graphs, Coll. Math. Soc. J. Bolyai 52, 113-39. Breiman,1. (1968), Probability, Addison-Wesley, Reading, Mass .. Chow, Y.-S. and Teicher, H. (1978), Probability Theory: Independence, Interchangeability, Martingales, Springer, New York, Berlin. Chung, K.L. (1968), A Course in Probability, Harcourt, Brace and Wold, New York. Davis, M.H.A. and Norman, A.R. (1990), Portfolio selection with transaction costs, Maths. of Operation Research (to appear). Davis, M.H.A. and Vintner, R.B. (1985), Stochastic Modelling and Control, Chapman and Hall, London. Dellacherie, C. and Meyer, P.-A. (1980), Probabilitis et Potentiel, Chaps. V-VIII, Hermann, Paris. Deuschel, J.-D. and Stroock, D.W. (1989), Large Deviations, Academic Press, Boston. Doob, J.L. (1953), Stochastic Processes, Wiley, New York.
249
References
Doob, J.L. (1981), Classical Potential Theory and its Probabilistic Counterpart, Springer, New York. Dunford, N. and Schwartz, J.T. (1958), Linear Operators: Part I, General Theory, Interscience, New York. Durrett, R. (1984), Brownian Motion and Martingales in Analysis, Wadsworth, Belmont, Ca. Dym, H. and McKean, H.P. (1972), Fourier Series and Integrals, Academic Press, New York. Ellis, R.S. (1985), Entropy, Large Deviations, and Statistical Mechanics, Springer, New York, Berlin. Ethier, S.N. and Kurtz, T.G. (HI86), Markov Processes: Characterization and Convergence, Wiley, New York. Feller, W. (1957), Introduction to Probability Theory and its Applications, Vol.l, 2nd edn., Wiley, New York. Freedman, D. (1971), Brownian Motion and Diffusion, Holden-Day, San Francisco. Garsia, A. (1973), Martingale Inequalities: Seminar Notes on Recent Progress, Benjamin, Reading, Mass. Grimmett, G.R. (1989), Percolation Theory, Springer, New York, Berlin. Grimmett, G.R. and Stirzaker, D.R. (1982), Probability and Random Processes, Oxford University Press. Hall, P. (1988), Introduction to the Theory of Coverage Processes, Wiley, New York. Hall, P. and Heyde, C.C. (1980), Martingale Limit Theory and its Application, Academic Press, New York. Halmos, P.J. (1959), Measure Theory, Van Nostrand, Princeton, NJ. Hammersley, J.M. (1966), Harnesses, Proc. Fifth Berkeley Symp. Math. Statist. and Prob., Vol.III, 89-117, University of Calif
References
245
Kendall, D.G. (1966), Branching processes since 1873, J. London Math. Soc. 41, 385-406. Kendall, D.G. (1975), The genealogy of genealogy: Branching processes before (and after) 1873, Bull. London Math. Soc. 7,225-53. Kingman, J.F.C. and Taylor, S.J. (1966), Introduction to Measure and Probability, Cambridge University Press. Korner, T.W. (1988), Fourier Analysis, Cambridge University Press. Laha, R. and Rohatgi, V. (1979), Probability Theory, Wiley, New York. Lukacs, E. (1970), Characteristic Functions, 2nd edn., Griffin, London. Meyer, P.-A. (1966), Probability and Potential (English translation), Blaisdell, Waltham, Mass. Neveu, J. (1965), Mathematical Foundation of the Calculus of Probability (translated from the French), Holden-Day, San Francisco. Neveu, J. (1975), Discrete-parameter Martingales, North-Holland, Amsterdam. Parthasarathy, K.R. (1967), Probability Measures on Metric Spaces, Academic Press, New York. Rogers, L.C.G. and Williams, D. (1987), Diffusions, Markov Processes, and Martingales, 2: Ito calculus, Wiley, Chichester, New York. Ross, S. (1976), A First Course in Probability, Macmillan, New York. Ross, S. (1983), Stochastic Processes, Wiley, New York. Stroock, D.W. (1984), An Introduction to the Theory of Large Deviations, Springer, New York, Berlin. Varadhan, S.R.S. (1984), Large Deviations and Applications, SIAM, Philadelphia. Wagon, S. (1985), The Banach-Tarski Paradox, Encyclopaedia of Mathematics, Vol. 24, Cambridge University Press. Whittle, P. (1990), Risk-sensitive Optimal Control, Wiley, Chichester, New York. Williams, D. (1973), Some basic theorems on harnesses, in Stochastic Analysis, eds. D.G. Kendall and E.F. Harding, Wiley, New York, pp.349-66.
Index
(Recall that there is a Guide to Notation on pages xiv-xv.) ABRACADABRA (4.9, E10.6). adapted process (10.2): Doob decomposition (12.11). a-algebra (1.1). algebra of sets (1.1). almost everywhere = a.e. (1.5); almost surely = a.s. (2.4). atoms: of a-algebra (9.1, 14.13); of distribution function (16.5). Azuma-Hoeffding inequality (E14.2). Baire category theorem (A1.12). Banach-Tarski paradox (1.0). Bayes' formula (15.7-15.9). Bellman Optimality Principle (E10.2, 15.3). Black-Scholes option-pricing formula (15.2). Blackwell's Markov chain (E4.8). Bochner's Theorem (E16.5) Borel-Cantelli Lemmas: First extension of (12.15).
=
BC1 (2.7); Second
Bounded Convergence Theorem = BDD (6.2, 13.6). branching process (Chapter 0, E12.1). Burkholder-Davis-Gundy inequality (14.18).
246
=
BC2 (4.3); Levy's
Index
Caratheodory's Lemma (A1.7). Caratheodory's Theorem: statement (1.7); proof (A1.8). Central Limit Theorem (18.4). Cesaro's Lemma (12.6). characteristic functions: definition (16.1); inversion formula (16.6); convergence theorem (18.1). Chebyshev's inequality (7.3). coin tossing (3.7). conditional expectation (Chapter 9): properties (9.7). conditional probability (9.9). consistency of Likelihood-Ratio Test (14.17). contraction property of conditional expectation (9.7,h). convergence in probability (13.5, A13.2). convergence theorems for integrals: MON (5.3); Fatou (5.4); DOM (5.9); BDD (6.2, 13.6); for UI RVs (13.7). convergence theorems for martingales: Main (11.5); for UI case (14.1); Upward (14.2); Downward (14.4). d-system (A1.2).
differentiation under integral sign (A16.1). distribution function for RV (3.10-3.11). Dominated Convergence Theorem
= DOM (5.9); conditional (9.7,g).
J.L. DOOB's Convergence Theorem (11.5); Decomposition (12.11); £P inequality (14.11); Optional Sampling Theorem (A14.3-14.4); Optional Stopping Theorem (10.10, A14.3); Submartingale Inequality (14.6); Up crossing Lemma (11.2) - and much else! Downward Theorem (14.4). Dynkin's Lemma (A1.3). events (Chapter 2): independent (4.1). expectation (6.1): conditional (Chapter 9). 'extension of measures': uniqueness (1.6); existence (1. 7).
Index
extinction probability (004). fair game (10.5): unfavourable (E4.7). Fatou Lemmas: for sets (2.6,b), 2.7,c); for functions (5.4); conditional Version (9.7 ,f). filtered space, filtration (10.1). filtering (15.6-15.9). finite and u-finite measures (1.5). Forward Convergence Theorem for supermartingales (11.5). Fubini's Theorem (8.2). gambler's ruin (ElO.7). gambling strategy (10.6). Hardy space
'HA
(14.18).
harnesses (15.10-15.12). hedging strategy (15.2). Helly-Bray Lemma (1704). hitting times (10.12). Hoeffding's inequality (E14.2). Holder's inequality (6.13). Hunt's Lemma (E14.1). independence: definitions (4.1); 1l"-system criterion (4.2); and conditioning (9.7,k, 9.10). independence and product measure (8.4). inequalities: Azuma-Hoeffding (E14.2); Burkholder-Davis-Gundy (14.18); Chebyshev (7.3); Doob's £,P (14.11); Holder (6.13); Jensen (6.6), and in conditional form (9.7,h); Khinchine - see (14.8); Kolmogorov (14.6); Markov (604); Minkowski (6.14); Schwarz (6.8). infinite products of probability measures (8.7, Chapter A9); Kakutani's Theorem on (14.12, 14.17). integration (Chapter 5).
Index
Jensen's inequality (6.6); conditional form (9.7,h). Kakutani's Theorem on likelihood ratios (14.12, 14.17). Kalman-Bucy filter (15.6-15.9). A.N. KOLMOGOROV's Definition of Conditional Expectation (9.2); Inequality (14.6); Law of the Iterated Logarithm (A4.1, 14.7); Strong Law of Large Numbers (12.10, 14.5); Three-Series Theorem (12.5); Truncation Lemma (12.9); Zero-One (0-1) Law (4.11,14.3). Kronecker's Lemma (12.7). Laplace transforms: inversion (E7.1); and weak convergence (E18.7). law of random variable (3.9): joint laws (8.3). least-squares-best predictor (9.4). Lebesgue integral (Chapter 5). Lebesgue measure = Leb (1.8, A1.9). Lebesgue spaces 0, LP (6.10). P.LEVY's Convergence Theorem for CFs (18.1); Downward Theorem for martingales (14.4); Extension of Borel-Cantelli Lemmas (12.15); Inversion formula for CFs (16.6); Upward Theorem for martingales (14.2). Likelihood-Ratio Test, consistency of (14.17). Mabinogion sheep (15.3-15.5). Markov chain (4.8, 10.13). Markov's inequality (6.4). martingale (Chapters 1O-15!): definition (10.3); Convergence Theorem (11.5); Optional-Stopping Theorem (10.9-10.10, A14.3); OptionalSampling Theorem (Chapter AI4). martingale transform (10.6) measurable function (3.1). measurable space (1.1). measure space (1.4). Minkowski's inequality (6.14). Moment Problem (E18.6).
250
Index
monkey typing Shakespeare (4.9). Monotone-Class Theorem (3.14, A1.3). Monotone-Convergence Theorem: for sets (1.10); for functions (5.3, Chapter A5); conditional version (9.7,e). narrow convergence - see weak convergence. option pricing (15.2). Optional-Sampling Theorem (Chapter AI4). Optional-Stopping Theorems (10.9-10.10, AI4.3). optional time - see stopping time (10.8). orthogonal projection (6.11): and conditional expectation (9.4-9.5). outer measures (A1.6). 1I"-system (1.6): Uniqueness Lemmas (1.6, 4.2). P6lya's urn (ElO.1, ElO.8). previsible ( = predictable) process (10.6). probability density function = pdf (6.12); joint (8.3). probability measure (1.5). probability triple (2.1). product measures (Chapter 8). Pythagoras's Theorem (6.9). Radon-Nikodym theorem (5.14, 14.13-14.14). random signs (12.3). random walk: hitting times (10.12, ElO.7); on free group (EG.3-EG.4). Record Problem (E4.3, EI2.2, 18.5). regular conditional probability (9.9). Riemann integral (5.3). sample path (4.8) sample point (2.1) sample space (2.1)
Index
251
Schwarz inequality (6.8).
Star Trek problems (E10.10, E10.11, E12.3). stopped process (10.9). stopping times (10.8); associated u-algebras (A14.1). Strassen's Law of the Iterated Logarithm (A4.2). Strong Laws (7.2, 12.10, 12.14, 14.5). submartingales and supermartingales: definitions (10.3); convergence theorem (11.5); optional stopping (10.9-10.10); optional sampling (Al4.4). superharmonic functions for Markov chains (10.13). symmetrization technique (12.4). d-system (A1.2); 7r-system (1.6). tail u-algebra (4.10-4.12, 14.3). Tchebycheff: = Chebyshev (7.3). Three-Series Theorem (12.5). tightness (17.5). Tower Property (9.7,i). Truncation Lemma (12.9). uniform integrability (Chapter 13). Up crossing Lemma (11.1-11.2). Uniqueness Lemma (1.6). weak convergence (Chapter 17): and characteristic functions (18.1); and moments (E18.3-18.4); and Laplace transforms (E18.7). Weierstrass approximation theorem (7.4). Zero-one law = 0-1 law (4.11,14.3).