Stochastic Processes

This comprehensive guide to stochastic processes gives a complete overview of the theory and addresses the most important applications. Pitched at a level accessible to beginning graduate students and researchers from applied disciplines, it is both a course book and a rich resource for individual readers. Subjects covered include Brownian motion, stochastic calculus, stochastic differential equations, Markov processes, weak convergence of processes, and semigroup theory. Applications include the Black–Scholes formula for the pricing of derivatives in financial mathematics, the Kalman–Bucy filter used in the US space program, and also theoretical applications to partial differential equations and analysis. Short, readable chapters aim for clarity rather than for full generality. More than 350 exercises are included to help readers put their new-found knowledge to the test and to prepare them for tackling the research literature.

Richard F. Bass is Board of Trustees Distinguished Professor in the Department of Mathematics at the University of Connecticut.
CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS

Editorial Board
Z. Ghahramani (Department of Engineering, University of Cambridge)
R. Gill (Mathematical Institute, Leiden University)
F. P. Kelly (Department of Pure Mathematics and Mathematical Statistics, University of Cambridge)
B. D. Ripley (Department of Statistics, University of Oxford)
S. Ross (Department of Industrial and Systems Engineering, University of Southern California)
M. Stein (Department of Statistics, University of Chicago)

This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice.

A complete list of books in the series can be found at http://www.cambridge.org/statistics. Recent titles include the following:

11. Statistical Models, by A. C. Davison
12. Semiparametric Regression, by David Ruppert, M. P. Wand and R. J. Carroll
13. Exercises in Probability, by Loïc Chaumont and Marc Yor
14. Statistical Analysis of Stochastic Processes in Time, by J. K. Lindsey
15. Measure Theory and Filtering, by Lakhdar Aggoun and Robert Elliott
16. Essentials of Statistical Inference, by G. A. Young and R. L. Smith
17. Elements of Distribution Theory, by Thomas A. Severini
18. Statistical Mechanics of Disordered Systems, by Anton Bovier
19. The Coordinate-Free Approach to Linear Models, by Michael J. Wichura
20. Random Graph Dynamics, by Rick Durrett
21. Networks, by Peter Whittle
22. Saddlepoint Approximations with Applications, by Ronald W. Butler
23. Applied Asymptotics, by A. R. Brazzale, A. C. Davison and N. Reid
24. Random Networks for Communication, by Massimo Franceschetti and Ronald Meester
25. Design of Comparative Experiments, by R. A. Bailey
26. Symmetry Studies, by Marlos A. G. Viana
27. Model Selection and Model Averaging, by Gerda Claeskens and Nils Lid Hjort
28. Bayesian Nonparametrics, edited by Nils Lid Hjort et al.
29. From Finite Sample to Asymptotic Methods in Statistics, by Pranab K. Sen, Julio M. Singer and Antonio C. Pedrosa de Lima
30. Brownian Motion, by Peter Mörters and Yuval Peres
31. Probability, by Rick Durrett
33. Stochastic Processes, by Richard F. Bass
34. Structured Regression for Categorical Data, by Gerhard Tutz
Stochastic Processes Richard F. Bass University of Connecticut
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Tokyo, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9781107008007

© R. F. Bass 2011
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2011 Printed in the United Kingdom at the University Press, Cambridge A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Bass, Richard F.
Stochastic processes / Richard F. Bass.
p. cm. – (Cambridge series in statistical and probabilistic mathematics ; 33)
Includes index.
ISBN 978-1-107-00800-7 (hardback)
1. Stochastic analysis. I. Title.
QA274.2.B375 2011
519.2′32 – dc23
2011023024

ISBN 978-1-107-00800-7 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Meredith, as always
Contents
Preface
Frequently used notation

1 Basic notions
1.1 Processes and σ-fields
1.2 Laws and state spaces

2 Brownian motion
2.1 Definition and basic properties

3 Martingales
3.1 Definition and examples
3.2 Doob's inequalities
3.3 Stopping times
3.4 The optional stopping theorem
3.5 Convergence and regularity
3.6 Some applications of martingales

4 Markov properties of Brownian motion
4.1 Markov properties
4.2 Applications

5 The Poisson process

6 Construction of Brownian motion
6.1 Wiener's construction
6.2 Martingale methods

7 Path properties of Brownian motion

8 The continuity of paths

9 Continuous semimartingales
9.1 Definitions
9.2 Square integrable martingales
9.3 Quadratic variation
9.4 The Doob–Meyer decomposition

10 Stochastic integrals
10.1 Construction
10.2 Extensions

11 Itô's formula

12 Some applications of Itô's formula
12.1 Lévy's theorem
12.2 Time changes of martingales
12.3 Quadratic variation
12.4 Martingale representation
12.5 The Burkholder–Davis–Gundy inequalities
12.6 Stratonovich integrals

13 The Girsanov theorem
13.1 The Brownian motion case
13.2 An example

14 Local times
14.1 Basic properties
14.2 Joint continuity of local times
14.3 Occupation times

15 Skorokhod embedding
15.1 Preliminaries
15.2 Construction of the embedding
15.3 Embedding random walks

16 The general theory of processes
16.1 Predictable and optional processes
16.2 Hitting times
16.3 The debut and section theorems
16.4 Projection theorems
16.5 More on predictability
16.6 Dual projection theorems
16.7 The Doob–Meyer decomposition
16.8 Two inequalities

17 Processes with jumps
17.1 Decomposition of martingales
17.2 Stochastic integrals
17.3 Itô's formula
17.4 The reduction theorem
17.5 Semimartingales
17.6 Exponential of a semimartingale
17.7 The Girsanov theorem

18 Poisson point processes

19 Framework for Markov processes
19.1 Introduction
19.2 Definition of a Markov process
19.3 Transition probabilities
19.4 An example
19.5 The canonical process and shift operators

20 Markov properties
20.1 Enlarging the filtration
20.2 The Markov property
20.3 Strong Markov property

21 Applications of the Markov properties
21.1 Recurrence and transience
21.2 Additive functionals
21.3 Continuity
21.4 Harmonic functions

22 Transformations of Markov processes
22.1 Killed processes
22.2 Conditioned processes
22.3 Time change
22.4 Last exit decompositions

23 Optimal stopping
23.1 Excessive functions
23.2 Solving the optimal stopping problem

24 Stochastic differential equations
24.1 Pathwise solutions of SDEs
24.2 One-dimensional SDEs
24.3 Examples of SDEs

25 Weak solutions of SDEs

26 The Ray–Knight theorems

27 Brownian excursions

28 Financial mathematics
28.1 Finance models
28.2 Black–Scholes formula
28.3 The fundamental theorem of finance
28.4 Stochastic control

29 Filtering
29.1 The basic model
29.2 The innovation process
29.3 Representation of $\mathcal{F}^Z$-martingales
29.4 The filtering equation
29.5 Linear models
29.6 Kalman–Bucy filter

30 Convergence of probability measures
30.1 The portmanteau theorem
30.2 The Prohorov theorem
30.3 Metrics for weak convergence

31 Skorokhod representation

32 The space C[0, 1]
32.1 Tightness
32.2 A construction of Brownian motion

33 Gaussian processes
33.1 Reproducing kernel Hilbert spaces
33.2 Continuous Gaussian processes

34 The space D[0, 1]
34.1 Metrics for D[0, 1]
34.2 Compactness and completeness
34.3 The Aldous criterion

35 Applications of weak convergence
35.1 Donsker invariance principle
35.2 Brownian bridge
35.3 Empirical processes

36 Semigroups
36.1 Constructing the process
36.2 Examples

37 Infinitesimal generators
37.1 Semigroup properties
37.2 The Hille–Yosida theorem
37.3 Nondivergence form elliptic operators
37.4 Generators of Lévy processes

38 Dirichlet forms
38.1 Framework
38.2 Construction of the semigroup
38.3 Divergence form elliptic operators

39 Markov processes and SDEs
39.1 Markov properties
39.2 SDEs and PDEs
39.3 Martingale problems

40 Solving partial differential equations
40.1 Poisson's equation
40.2 Dirichlet problem
40.3 Cauchy problem
40.4 Schrödinger operators

41 One-dimensional diffusions
41.1 Regularity
41.2 Scale functions
41.3 Speed measures
41.4 The uniqueness theorem
41.5 Time change
41.6 Examples

42 Lévy processes
42.1 Examples
42.2 Construction of Lévy processes
42.3 Representation of Lévy processes

Appendices
A Basic probability
A.1 First notions
A.2 Independence
A.3 Convergence
A.4 Uniform integrability
A.5 Conditional expectation
A.6 Stopping times
A.7 Martingales
A.8 Optional stopping
A.9 Doob's inequalities
A.10 Martingale convergence theorem
A.11 Strong law of large numbers
A.12 Weak convergence
A.13 Characteristic functions
A.14 Uniqueness and characteristic functions
A.15 The central limit theorem
A.16 Gaussian random variables
B Some results from analysis
B.1 The monotone class theorem
B.2 The Schwartz class
C Regular conditional probabilities
D Kolmogorov extension theorem

References
Index
Preface
Why study stochastic processes? This branch of probability theory offers sophisticated theorems and proofs, such as the existence of Brownian motion, the Doob–Meyer decomposition, and the Kolmogorov continuity criterion. At the same time, stochastic processes also have far-reaching applications. The explosive growth in options and derivatives in financial markets throughout the world derives from the Black–Scholes formula, while NASA relies on the Kalman–Bucy method to filter signals from satellites and probes sent into outer space.

A graduate student taking a year-long course in probability theory first learns about sequences of random variables and topics such as laws of large numbers, central limit theorems, and discrete time martingales. In the second half of the course, the student turns to stochastic processes, which are the subject of this text. Topics covered here include Brownian motion, stochastic integrals, stochastic differential equations, Markov processes, the Black–Scholes formula of financial mathematics, the Kalman–Bucy filter, and many more.

The 42 chapters of this book can be grouped into seven parts. The first part consists of Chapters 1–8, where some of the basic processes and ideas are introduced, including Brownian motion. The next group of chapters, Chapters 9–15, introduces the theory of stochastic calculus, including stochastic integrals and Itô's formula. Chapters 16–18 explore jump processes. This requires a study of the foundations of stochastic processes, which is also known as the general theory of processes. Next we take up Markov processes in Chapters 19–23. A formidable obstacle to the study of Markov processes is the notation, and I have attempted to make this as accessible as possible. Chapters 24–29 involve stochastic differential equations. Two very important applications, to financial mathematics and to filtering, appear in Chapters 28 and 29, respectively.
Probability measures on metric spaces and the weak convergence of random variables taking values in a metric space prove to be relevant to the study of stochastic processes. These and related topics are treated in Chapters 30–35. We then return to Markov processes, namely, their construction and some important examples, in Chapters 36–42. Tools used in the construction include infinitesimal generators, Dirichlet forms, and solutions to stochastic differential equations, while two important examples that we consider are diffusions on the real line and Lévy processes.

The prerequisites to this book are a sound knowledge of basic measure theory and a course in the classical aspects of probability. The probability topics needed are provided (with proofs) in an appendix.

There is far too much material in this book to cover in a single semester, and even too much for a full year. I recommend that as a minimum the following chapters be studied: Chapters 1–5, Chapters 9–13, Chapters 19–21, and Chapter 24. If possible, include either Chapter 28 or Chapter 29. In Chapter 11, the statement and corollaries of Itô's formula are very important, but the proof of Itô's formula may be omitted.

I would like to thank the many students who patiently sat through my lectures, pointed out errors, and made suggestions. I especially would like to thank my colleague Sasha Teplyaev, who taught a course from a preliminary version of this book and made a great number of useful suggestions.
Frequently used notation
Here are some notational conventions we will use. We use the letter $c$, either with or without subscripts, to denote a finite positive constant whose exact value is unimportant and which may change from line to line. We use $B(x, r)$ to denote the open Euclidean ball centered at $x$ with radius $r$. $a \wedge b$ is the minimum of $a$ and $b$, while $a \vee b$ is the maximum of $a$ and $b$. $x^+ = x \vee 0$ and $x^- = (-x) \vee 0$. The symbol $\exists$ is used in a few formulas and means "there exists." $\mathbb{Q}$, $\mathbb{Q}_+$, $\mathbb{N}$, and $\mathbb{Z}$ denote the rationals, the positive rationals, the natural numbers, and the integers, respectively. If $C$ is a matrix, $C^T$ is the transpose of $C$. For a set $A$, we use $A^c$ for the complement of $A$. If $A$ is a subset of a topological space, $\overline{A}$, $A^o$, and $\partial A$ denote the closure, interior, and boundary of $A$, respectively. Given a topological space $S$, we use $C(S)$ for the space of continuous functions on $S$, where we use the supremum norm. If $S$ is a domain in $\mathbb{R}^d$, $C^k(S)$ refers to the set of continuous functions with domain $S$ whose partial derivatives up to order $k$ are continuous. $C^\infty$ functions are those that are infinitely differentiable. We will on a few occasions use the Fourier transform, which we define by
$$\widehat{f}(u) = \int e^{iu\cdot x} f(x)\,dx$$
for $f$ integrable. This agrees with the convention in Rudin (1987). If $X$ is a stochastic process whose paths are right continuous with left limits, then $X_{t-} = \lim_{s<t,\, s\to t} X_s$.
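Here is a worked instance of this convention, which is our own illustration. If $f$ is the density of a random vector $X$, then $\widehat{f}(u) = \mathbb{E}\, e^{iu\cdot X}$ is the characteristic function of $X$. For the standard normal density on $\mathbb{R}$, use $iux - x^2/2 = -(x - iu)^2/2 - u^2/2$ to complete the square:

```latex
\widehat{f}(u) = \int_{-\infty}^{\infty} e^{iux}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx
  = e^{-u^2/2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(x-iu)^2/2}\,dx
  = e^{-u^2/2}.
```

The last step uses the standard fact that the shifted Gaussian integral still equals $\sqrt{2\pi}$. This is the characteristic function of a standard normal random variable.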
1 Basic notions
In a first course on probability one typically works with a sequence of random variables $X_1, X_2, \ldots$. For stochastic processes, instead of indexing the random variables by the positive integers, we index them by $t \in [0, \infty)$, and we think of $X_t$ as being the value at time $t$. The random variable could be the location of a particle on the real line, the strength of a signal, the price of a stock, and many other possibilities as well. We will also work with increasing families of σ-fields $\{\mathcal{F}_t\}$, known as filtrations. The σ-field $\mathcal{F}_t$ is supposed to represent what we know up to time $t$.
1.1 Processes and σ-fields

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. A real-valued stochastic process (or simply a process) is a map $X$ from $[0, \infty) \times \Omega$ to the reals. We write $X_t = X_t(\omega) = X(t, \omega)$. We will impose stronger measurability conditions shortly, but for now we require that the random variables $X_t$ be measurable with respect to $\mathcal{F}$ for each $t \ge 0$.

A collection of σ-fields $\mathcal{F}_t$ such that $\mathcal{F}_t \subset \mathcal{F}$ for each $t$ and $\mathcal{F}_s \subset \mathcal{F}_t$ if $s \le t$ is called a filtration. Define $\mathcal{F}_{t+} = \bigcap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon}$. A filtration is right continuous if $\mathcal{F}_{t+} = \mathcal{F}_t$ for all $t \ge 0$. The σ-field $\mathcal{F}_{t+}$ is supposed to represent what one knows if one looks ahead an infinitesimal amount. Most of the filtrations we will come across will be right continuous, but see Exercise 1.1.

A null set $N$ is one that has outer probability 0. This means that
$$\inf\{\mathbb{P}(A) : N \subset A,\ A \in \mathcal{F}\} = 0.$$
A filtration is complete if each $\mathcal{F}_t$ contains every null set. A filtration that is right continuous and complete is said to satisfy the usual conditions.

Given a filtration $\{\mathcal{F}_t\}$, whether or not it satisfies the usual conditions, we define $\mathcal{F}_\infty$ to be the σ-field generated by $\bigcup_{t \ge 0} \mathcal{F}_t$, that is, the smallest σ-field containing $\bigcup_{t \ge 0} \mathcal{F}_t$, and we write
$$\mathcal{F}_\infty = \bigvee_{t \ge 0} \mathcal{F}_t.$$

Recall that the arbitrary intersection of σ-fields is a σ-field, but the union of even two σ-fields need not be a σ-field. We say that a stochastic process $X$ is adapted to a filtration $\{\mathcal{F}_t\}$ if $X_t$ is $\mathcal{F}_t$ measurable for each $t$. Often one starts with a stochastic process $X$ and wants to define a filtration with respect to which $X$ is adapted.
The simplest way to do this is to let $\mathcal{F}_t$ be the σ-field generated by the random variables $\{X_s, s \le t\}$. More often one wants to have a slightly larger filtration than the one generated by $X$. We define the minimal augmented filtration generated by $X$ to be the smallest filtration that is right continuous and complete and with respect to which the process $X$ is adapted. Because it includes the null sets, $\mathcal{F}_t$ is in general strictly larger for each $t$ than the smallest σ-field with respect to which $\{X_s : s \le t\}$ is measurable. It is important to include the null sets; see Exercise 1.5. There is no widely accepted name for what we call the minimal augmented filtration. I like this nomenclature because it is descriptive and sufficiently different from "filtration generated by $X$" to avoid confusion.

The minimal augmented filtration generated by the process $X_t$ can be constructed in three steps. First, let $\{\mathcal{F}_t^{00}\}$ be the smallest filtration with respect to which $X$ is adapted, that is,
$$\mathcal{F}_t^{00} = \sigma(X_s;\ s \le t). \tag{1.1}$$
Let $\mathbb{P}^*$ be the outer probability corresponding to $\mathbb{P}$: for $A \subset \Omega$,
$$\mathbb{P}^*(A) = \inf\{\mathbb{P}(B) : B \in \mathcal{F},\ A \subset B\}.$$
Let $\mathcal{N}$ be the collection of null sets, so that $\mathcal{N} = \{A \subset \Omega : \mathbb{P}^*(A) = 0\}$. The second step is to let $\mathcal{F}_t^0$ be the smallest σ-field containing $\mathcal{F}_t^{00}$ and $\mathcal{N}$, or
$$\mathcal{F}_t^0 = \sigma(\mathcal{F}_t^{00} \cup \mathcal{N}). \tag{1.2}$$
The third step is to let
$$\mathcal{F}_t = \bigcap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon}^0. \tag{1.3}$$
Exercise 1.2 asks you to check that $\{\mathcal{F}_t\}$ is the minimal augmented filtration generated by $X$. We will refer to $\{\mathcal{F}_t^{00}\}$ as the filtration generated by $X$.

Two stochastic processes $X$ and $Y$ are said to be indistinguishable if $\mathbb{P}(X_t \ne Y_t \text{ for some } t \ge 0) = 0$. $X$ and $Y$ are versions of each other if for each $t \ge 0$ we have $\mathbb{P}(X_t \ne Y_t) = 0$. Here is an example of two processes that are versions of each other but are not indistinguishable. Let $\Omega = [0, 1]$, let $\mathcal{F}$ be the Borel σ-field on $[0, 1]$, and let $\mathbb{P}$ be Lebesgue measure on $[0, 1]$. Let $X(t, \omega) = 0$ for all $t$ and $\omega$, and let $Y(t, \omega)$ equal 1 if $t = \omega$ and 0 otherwise. For each fixed $t$, $X_t$ and $Y_t$ differ only on the set $\{\omega : \omega = t\}$, which has probability 0. For every $\omega$, however, the two paths differ at time $t = \omega$. Note that the functions $t \mapsto X(t, \omega)$ are continuous for each $\omega$, but the functions $t \mapsto Y(t, \omega)$ are not continuous for any $\omega$.

If $X$ is a stochastic process, the functions $t \mapsto X(t, \omega)$ are called the paths or trajectories of $X$. There will be one path for each $\omega$. If the paths of $X$ are continuous functions, except for a set of $\omega$'s in a null set, then $X$ is called a continuous process, or is said to be continuous. We similarly define right continuous process, left continuous process, etc. A function $f(t)$ is right continuous with left limits if $\lim_{h>0,\, h\to 0} f(t+h) = f(t)$ for all $t$ and $\lim_{h<0,\, h\to 0} f(t+h)$ exists for all $t > 0$. Almost all our stochastic processes will have the property that, except for a null set of $\omega$'s, the function $t \mapsto X(t, \omega)$ is right continuous and has left limits. One often sees cadlag to refer to paths that are right continuous with left limits; this abbreviates the French "continue à droite, limite à gauche."
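The example of two versions that are not indistinguishable can be mimicked numerically. The following Python sketch is our own illustration, not part of the text. The function names `X`, `Y`, `count_disagreements_at_time`, and `paths_differ` are hypothetical, and sampling cannot prove anything about null sets. The sketch shows two things. At a fixed time, sampled $\omega$'s essentially never separate the two processes. Every path of $Y$, however, differs from the corresponding path of $X$ at $t = \omega$.

```python
import random

# Hypothetical illustration of the example in the text:
# Omega = [0, 1] with Lebesgue measure, X(t, w) = 0,
# and Y(t, w) = 1 if t == w, 0 otherwise.


def X(t: float, w: float) -> int:
    return 0


def Y(t: float, w: float) -> int:
    return 1 if t == w else 0


def count_disagreements_at_time(t: float, n: int, seed: int = 0) -> int:
    """Sample n points w from [0, 1) and count those with X_t(w) != Y_t(w).

    For fixed t the disagreement set is {w = t}, a Lebesgue-null set,
    so a random sample essentially never hits it.
    """
    rng = random.Random(seed)
    return sum(X(t, w) != Y(t, w) for w in (rng.random() for _ in range(n)))


def paths_differ(w: float) -> bool:
    """For every w, the paths t -> X(t, w) and t -> Y(t, w) differ at t = w."""
    return X(w, w) != Y(w, w)


if __name__ == "__main__":
    print(count_disagreements_at_time(0.5, 100_000))
    rng = random.Random(1)
    print(all(paths_differ(rng.random()) for _ in range(1000)))
```

The contrast is between fixing $t$ first and then sampling $\omega$, which gives "versions", and fixing $\omega$ and looking at the whole path, which is what indistinguishability would require.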
1.2 Laws and state spaces

Let $S$ be a topological space. The Borel σ-field on $S$ is defined to be the σ-field generated by the open sets of $S$. A function $f : S \to \mathbb{R}$ is Borel measurable if $f^{-1}(G)$ is in the Borel σ-field of $S$ whenever $G$ is an open subset of $\mathbb{R}$. A random variable $Y : \Omega \to S$ is measurable with respect to a σ-field $\mathcal{F}$ of subsets of $\Omega$ if $\{\omega \in \Omega : Y(\omega) \in A\}$ is in $\mathcal{F}$ whenever $A$ is in the Borel σ-field on $S$. A stochastic process taking values in a topological space $S$ is a map $X : [0, \infty) \times \Omega \to S$, where for each $t$ the random variable $X_t$ is measurable with respect to $\mathcal{F}$.

Recall that if we have a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and $Y : \Omega \to \mathbb{R}$ is a random variable, then the law of $Y$ is the probability measure $\mathbb{P}_Y$ on the Borel subsets of $\mathbb{R}$ defined by $\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. Similarly, if $Y : \Omega \to \mathbb{R}^d$ is a $d$-dimensional random vector, then the law of $Y$ is the probability measure $\mathbb{P}_Y$ on the Borel subsets of $\mathbb{R}^d$ defined by $\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. We extend this definition to random variables $Y$ taking values in a topological space $S$. In this case $\mathbb{P}_Y$ is a probability measure on the Borel subsets of $S$ with the same definition: $\mathbb{P}_Y(A) = \mathbb{P}(Y \in A)$. In particular, if $Y$ and $Z$ are two random variables with the same state space $S$, then $Y$ and $Z$ have the same law if $\mathbb{P}(Y \in A) = \mathbb{P}(Z \in A)$ for all Borel subsets $A$ of $S$.

The relevance of the preceding paragraph to stochastic processes is this. Suppose $X$ and $Y$ are stochastic processes with continuous paths. Let $S = C[0, \infty)$ be the collection of real-valued continuous functions on $[0, \infty)$ together with the usual metric defined in terms of the supremum norm:
$$d(f, g) = \sup_{0 \le t < \infty} |f(t) - g(t)|.$$
(Strictly speaking, we should write $C([0, \infty))$, but we follow the usual convention and drop the outside parentheses.) Let the random variable $X$ taking values in $S$ be defined by setting $X(\omega)$ to be the continuous function $t \mapsto X(t, \omega)$, and define $Y$ similarly. More precisely, $X : \Omega \to S$ with
$$X(\omega)(t) = X(t, \omega), \qquad t \ge 0.$$
Then $X$ and $Y$ are random variables taking values in the metric space $S$, and saying that $X$ and $Y$ have the same law means that $\mathbb{P}(X \in A) = \mathbb{P}(Y \in A)$ for all Borel subsets $A$ of $S$. When this happens, we also say that the stochastic processes $X$ and $Y$ have the same law. Two stochastic processes $X$ and $Y$ have the same finite-dimensional distributions if, for every $n \ge 1$ and every $t_1 < \cdots < t_n$, the laws of $(X_{t_1}, \ldots, X_{t_n})$ and $(Y_{t_1}, \ldots, Y_{t_n})$ are equal.

Most often the topological spaces we will consider will also be metric spaces, but there will be a few occasions when we want to consider topological spaces that are not metric spaces. Suppose $S = \mathbb{R}^{[0,\infty)}$. We furnish $S$ with the product topology. $S$ can be identified with the collection of real-valued functions on $[0, \infty)$, but the topology is not given by the supremum norm nor by any other metric. We use $f$ for elements of $S$, where $f(t)$ is the $t$th coordinate of $f$. We call a subset $A$ of $S$ a cylindrical set if there exist $n \ge 1$, non-negative reals $t_1, t_2, \ldots, t_n$, and a Borel subset $B$ of $\mathbb{R}^n$ such that
$$A = \{f \in S : (f(t_1), \ldots, f(t_n)) \in B\}.$$
The appropriate σ-field to use on $S$ is the one generated by the collection of cylindrical sets. We want to generalize this notion slightly by allowing more general index sets and by allowing for the possibility of considering only a subset of the product space.

Definition 1.1 Let $U$ be a topological space, $T$ an arbitrary index set, and $B$ a subset of $U^T$, the collection of functions from $T$ into $U$. We say a set $C$ is a cylindrical subset of $B$ if there exist $n \ge 1$, $t_1, \ldots, t_n \in T$, and a Borel subset $A$ of $U^n$ such that
$$C = \{f \in B : (f(t_1), \ldots, f(t_n)) \in A\}.$$
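The Python sketch below makes concrete the point that membership in a cylindrical set is decided by finitely many coordinates. It is an illustration under stated assumptions, not from the text. It models a path $f \in \mathbb{R}^{[0,1]}$ as a Python function and the Borel set $A$ as a predicate on $n$-tuples. The names `cylinder_set` and `in_borel_set` are hypothetical. The code cannot check whether the predicate really describes a Borel set; that is assumed. Two paths that agree at $t_1, \ldots, t_n$ are always classified the same way, however different they are elsewhere.

```python
from typing import Callable, Sequence, Tuple

# Illustration only: a path f in R^[0, 1] is modeled as a Python function of t.
Path = Callable[[float], float]


def cylinder_set(times: Sequence[float],
                 in_borel_set: Callable[[Tuple[float, ...]], bool]) -> Callable[[Path], bool]:
    """Return a membership test for C = {f : (f(t_1), ..., f(t_n)) in A}.

    `in_borel_set` stands in for the indicator of A; we simply assume it
    describes a Borel subset of R^n (e.g. one cut out by finitely many inequalities).
    """
    ts = tuple(times)

    def contains(f: Path) -> bool:
        # Only the finitely many coordinates f(t_1), ..., f(t_n) are ever inspected.
        return in_borel_set(tuple(f(t) for t in ts))

    return contains


# Example: C = {f : f(0.5) > 0 and |f(1)| <= 2}.
C = cylinder_set([0.5, 1.0], lambda v: v[0] > 0 and abs(v[1]) <= 2)


def f(t: float) -> float:
    return t


def g(t: float) -> float:
    # Agrees with f at t = 0.5 and t = 1, and is very different elsewhere.
    return t if t in (0.5, 1.0) else -100.0


def h(t: float) -> float:
    return -t


print(C(f), C(g), C(h))  # prints: True True False
```

A set such as "all continuous $f$" cannot be written this way using finitely many coordinates, which is the theme of Exercises 1.4 and 1.5.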
Exercises

1.1 This exercise gives an example where $\{\mathcal{F}_t^{00}\}$ defined by (1.1) is not right continuous. Let $\Omega = \{a, b\}$, let $\mathcal{F}$ be the collection of all subsets of $\Omega$, and let $\mathbb{P}(\{a\}) = \mathbb{P}(\{b\}) = \frac{1}{2}$. Define
$$X_t(\omega) = \begin{cases} 0, & t \le 1;\\ 0, & t > 1 \text{ and } \omega = a;\\ t - 1, & t > 1 \text{ and } \omega = b. \end{cases}$$
Calculate $\mathcal{F}_t^{00} = \sigma(X_s;\ s \le t)$ and show $\{\mathcal{F}_t^{00}\}$ is not right continuous.

1.2 If $X$ is a stochastic process, let $\mathcal{F}_t^{00}$, $\mathcal{F}_t^0$, and $\mathcal{F}_t$ be defined by (1.1), (1.2), and (1.3), respectively. Show that $\{\mathcal{F}_t\}$ is the minimal augmented filtration generated by $X$.
1.3 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and let $\mathcal{B}[0, t]$ be the Borel σ-field on $[0, t]$. A real-valued stochastic process $X$ is progressively measurable if for each $t \ge 0$, the map $(s, \omega) \mapsto X(s, \omega)$ from $[0, t] \times \Omega$ to $\mathbb{R}$ is measurable with respect to the product σ-field $\mathcal{B}[0, t] \times \mathcal{F}_t$.
(1) If $X$ is adapted to $\{\mathcal{F}_t\}$ and we define
$$X_t^{(n)}(\omega) = \sum_{k=0}^{\infty} X_{k/2^n}(\omega)\, 1_{[k/2^n,\, (k+1)/2^n)}(t),$$
show that $X^{(n)}$ is progressively measurable for each $n \ge 1$.
(2) Use (1) to show that if $X$ is adapted to $\{\mathcal{F}_t\}$ and has left continuous paths, then $X$ is progressively measurable.
(3) If $X$ is adapted to $\{\mathcal{F}_t\}$ and we define
$$Y_t^{(n)}(\omega) = \sum_{k=0}^{\infty} X_{(k+1)/2^n}(\omega)\, 1_{[k/2^n,\, (k+1)/2^n)}(t),$$
show that for each $t \ge 0$, the map $(s, \omega) \mapsto Y^{(n)}(s, \omega)$ from $[0, t] \times \Omega$ to $\mathbb{R}$ is measurable with respect to $\mathcal{B}[0, t] \times \mathcal{F}_{t+2^{-n}}$.
(4) Show that if $X$ is adapted to $\{\mathcal{F}_t\}$ and has right continuous paths, then $X$ is progressively measurable.

1.4 Let $S = \mathbb{R}^{[0,1]}$, the set of functions from $[0, 1]$ to $\mathbb{R}$, and let $\mathcal{F}$ be the σ-field generated by the cylindrical sets. The purpose of this exercise is to show that the elements of $\mathcal{F}$ depend on only countably many coordinates. Let $S_0 = \{(x_1, x_2, \ldots)\}$, the set of sequences taking values in $\mathbb{R}$. Let $\mathcal{F}_0$ be the σ-field generated by the cylindrical subsets of $\mathbb{R}^{\mathbb{N}}$, where $\mathbb{N} = \{1, 2, \ldots\}$. Show that $B \in \mathcal{F}$ if and only if there exist $t_1, t_2, \ldots$ in $[0, 1]$ and a set $C \in \mathcal{F}_0$ such that $B = \{f \in S : (f(t_1), f(t_2), \ldots) \in C\}$.

1.5 Null sets are sometimes important! Let $S$ and $\mathcal{F}$ be as in Exercise 1.4. Show that $D \notin \mathcal{F}$, where $D = \{f \in S : f \text{ is a continuous function on } [0, 1]\}$.

1.6 Suppose $X$ is a stochastic process, $\{\mathcal{F}_t\}$ its minimal augmented filtration, and $\mathcal{F}_\infty = \bigvee_{t \ge 0} \mathcal{F}_t$. Suppose with probability one, the paths of $X$ are right continuous with left limits. Let $X_{t-} = \lim_{s<t,\, s\to t} X_s$ … $1\}$, prove $A \in \mathcal{F}_\infty$.

1.7 Suppose $X$ is a stochastic process, $\{\mathcal{F}_t\}$ is the minimal augmented filtration for $X$, and $\mathcal{F}_\infty = \bigvee_{t \ge 0} \mathcal{F}_t$. If the paths of $X$ are right continuous with left limits with probability one, show that the event $A = \{X \text{ has continuous paths}\}$ is in $\mathcal{F}_\infty$.
Notes

The older literature sometimes uses the notion of a separable stochastic process, but this is rarely seen nowadays. For much more on measurability, see Chapter 16. For the complete story on the foundations of stochastic processes, see Dellacherie and Meyer (1978).
2 Brownian motion
Brownian motion is by far the most important stochastic process. It is the archetype of Gaussian processes, of continuous time martingales, and of Markov processes. It is basic to the study of stochastic differential equations, financial mathematics, and filtering, to name only a few of its applications. In this chapter we define Brownian motion and consider some of its elementary aspects. Later chapters will take up the construction of Brownian motion and properties of Brownian motion paths.
2.1 Definition and basic properties

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $\{\mathcal{F}_t\}$ be a filtration, not necessarily satisfying the usual conditions.

Definition 2.1 $W_t = W_t(\omega)$ is a one-dimensional Brownian motion with respect to $\{\mathcal{F}_t\}$ and the probability measure $\mathbb{P}$, started at 0, if
(1) $W_t$ is $\mathcal{F}_t$ measurable for each $t \ge 0$.
(2) $W_0 = 0$, a.s.
(3) $W_t - W_s$ is a normal random variable with mean 0 and variance $t - s$ whenever $s < t$.
(4) $W_t - W_s$ is independent of $\mathcal{F}_s$ whenever $s < t$.
(5) $W_t$ has continuous paths.

If instead of (2) we have $W_0 = x$, we say we have a Brownian motion started at $x$. Definition 2.1(4) is referred to as the independent increments property of Brownian motion. The fact that $W_t - W_s$ has the same law as $W_{t-s}$, which follows from Definition 2.1(3), is called the stationary increments property. When no filtration is specified, we assume the filtration is the filtration generated by $W$, i.e., $\mathcal{F}_t = \sigma(W_s;\ s \le t)$. Sometimes a one-dimensional Brownian motion started at 0 is called a standard Brownian motion. Figure 2.1 is a simulation of a typical Brownian motion path.

We define $d$-dimensional Brownian motion with respect to a filtration $\{\mathcal{F}_t\}$ and started at $x = (x_1, \ldots, x_d)$ to be $(W_t^{(1)}, \ldots, W_t^{(d)})$. Here the $W^{(i)}$ are each one-dimensional Brownian motions with respect to $\{\mathcal{F}_t\}$ started at $x_i$, respectively, and $W^{(1)}, \ldots, W^{(d)}$ are all independent.

The law of a Brownian motion is called Wiener measure. More precisely, given a Brownian motion $W$, we can view it as a random variable taking values in $C[0, \infty)$, the space of real-valued continuous functions on $[0, \infty)$. The law of $W$ is the measure $\mathbb{P}_W$ on $C[0, \infty)$ defined by $\mathbb{P}_W(A) = \mathbb{P}(W \in A)$ for all Borel subsets $A$ of $C[0, \infty)$. The measure $\mathbb{P}_W$ is Wiener measure.

Figure 2.1 Simulation of a typical Brownian motion path (horizontal axis: $t$ from 0 to 1; vertical axis: $-2$ to $2$).

There are a number of transformations one can perform on a Brownian motion that yield a new Brownian motion. The first one is called the scaling property of Brownian motion, or simply scaling.

Proposition 2.2 If $W$ is a Brownian motion started at 0, $a > 0$, and $Y_t = a W_{t/a^2}$, then $Y_t$ is a Brownian motion started at 0.

Proof We use $\mathcal{G}_t = \mathcal{F}_{t/a^2}$ for the filtration for $Y$. Clearly $Y_t$ has continuous paths, $Y_0 = 0$, a.s., and $Y_t$ is $\mathcal{G}_t$ measurable. If $s < t$, then $Y_t - Y_s = a(W_{t/a^2} - W_{s/a^2})$ is independent of $\mathcal{F}_{s/a^2}$, hence is independent of $\mathcal{G}_s$. Finally, if $s < t$, then $Y_t - Y_s$ is a normal random variable with mean zero and
$$\operatorname{Var}(Y_t - Y_s) = a^2 \operatorname{Var}(W_{t/a^2} - W_{s/a^2}) = a^2\left(\frac{t}{a^2} - \frac{s}{a^2}\right) = t - s.$$
This suffices to give our result.

For some other transformations, see Exercises 2.3 and 2.5.

Recall what it means for a finite collection of random variables to be jointly normal; see (A.29). A stochastic process $X$ is Gaussian or jointly normal if all its finite-dimensional distributions are jointly normal, that is, if for each $n \ge 1$ and $t_1 < \cdots < t_n$, the collection of random variables $X_{t_1}, \ldots, X_{t_n}$ is a jointly normal collection.
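The variance computation in the proof of Proposition 2.2 can be sanity-checked by Monte Carlo. The Python sketch below is a hypothetical illustration, not a proof. It draws $W_{s/a^2}$ and the independent increment $W_{t/a^2} - W_{s/a^2}$ from the normal laws in Definition 2.1 and forms $Y_s = aW_{s/a^2}$ and $Y_t = aW_{t/a^2}$. It then estimates $\operatorname{Var}(Y_t - Y_s)$ and $\operatorname{Cov}(Y_s, Y_t)$. The helper names `scaled_pair` and `estimate_moments`, the parameters $a = 3$, $s = 0.2$, $t = 0.7$, the sample size, and the seed are all arbitrary choices of ours.

```python
import random


def scaled_pair(a: float, s: float, t: float, rng: random.Random) -> tuple:
    """Draw (Y_s, Y_t) with Y_u = a * W_{u/a^2} for a standard Brownian motion W.

    W_{s/a^2} ~ N(0, s/a^2), and W_{t/a^2} - W_{s/a^2} ~ N(0, (t-s)/a^2) is an
    independent increment (Definition 2.1(3) and (4)).
    """
    w_s = rng.gauss(0.0, (s / a**2) ** 0.5)
    w_t = w_s + rng.gauss(0.0, ((t - s) / a**2) ** 0.5)
    return a * w_s, a * w_t


def estimate_moments(a: float, s: float, t: float, n: int = 100_000, seed: int = 1):
    """Monte Carlo estimates of Var(Y_t - Y_s) and Cov(Y_s, Y_t), for s < t."""
    rng = random.Random(seed)
    pairs = [scaled_pair(a, s, t, rng) for _ in range(n)]
    incr = [yt - ys for ys, yt in pairs]
    m_incr = sum(incr) / n
    var_incr = sum((d - m_incr) ** 2 for d in incr) / (n - 1)
    m_s = sum(ys for ys, _ in pairs) / n
    m_t = sum(yt for _, yt in pairs) / n
    cov = sum((ys - m_s) * (yt - m_t) for ys, yt in pairs) / (n - 1)
    return var_incr, cov


if __name__ == "__main__":
    var_incr, cov = estimate_moments(a=3.0, s=0.2, t=0.7)
    # Proposition 2.2 predicts Var(Y_t - Y_s) = t - s = 0.5 and Cov(Y_s, Y_t) = s = 0.2.
    print(f"Var(Y_t - Y_s) ~ {var_incr:.3f}, Cov(Y_s, Y_t) ~ {cov:.3f}")
```

With these parameters the two estimates should be close to $0.5$ and $0.2$. Sampling error shrinks like $n^{-1/2}$, so agreement to about two decimal places is what one should expect from $10^5$ samples.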
Proposition 2.3 If $W$ is a Brownian motion, then $W$ is a Gaussian process.

Proof Suppose $W$ is a Brownian motion and let $0 = t_0 < t_1 < \cdots < t_n$. Define
$$Z_i = \frac{W_{t_i} - W_{t_{i-1}}}{\sqrt{t_i - t_{i-1}}}, \qquad i = 1, 2, \ldots, n.$$
By Definition 2.1(4), $Z_i$ is independent of $\mathcal F_{t_{i-1}}$, and hence independent of $Z_1, \ldots, Z_{i-1}$. By Definition 2.1(3), $Z_i$ is a mean-zero normal random variable with variance one. We can write
$$W_{t_j} = \sum_{i=1}^{j} (t_i - t_{i-1})^{1/2} Z_i, \qquad j = 1, \ldots, n,$$
and so $(W_{t_1}, \ldots, W_{t_n})$, being a linear transformation of the independent standard normal random variables $Z_1, \ldots, Z_n$, is jointly normal. It follows that Brownian motion is a Gaussian process.

Since the law of a finite collection of jointly normal random variables is determined by their means and covariances, let’s calculate the covariance of $W_s$ and $W_t$ when $W$ is a Brownian motion. If $s \le t$, then
$$t - s = \operatorname{Var}(W_t - W_s) = \operatorname{Var} W_t + \operatorname{Var} W_s - 2\operatorname{Cov}(W_s, W_t) = t + s - 2\operatorname{Cov}(W_s, W_t)$$
from Definition 2.1(2) and (3). Hence $\operatorname{Cov}(W_s, W_t) = s$ if $s \le t$. This is frequently written as
$$\operatorname{Cov}(W_s, W_t) = s \wedge t. \tag{2.1}$$
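The construction in the proof of Proposition 2.3 doubles as a sampling recipe. The sketch below (standard library only; `brownian_at_times` and `empirical_cov` are hypothetical names, and the time points and sample size are arbitrary) draws $(W_{t_1}, \ldots, W_{t_n})$ as cumulative sums of $(t_i - t_{i-1})^{1/2} Z_i$ and compares the empirical covariances with $s \wedge t$ from (2.1).

```python
import math
import random


def brownian_at_times(times, rng):
    """Sample (W_{t_1}, ..., W_{t_n}) for 0 < t_1 < ... < t_n using
    W_{t_j} = sum_{i <= j} (t_i - t_{i-1})^{1/2} Z_i with t_0 = 0 and
    Z_i independent standard normals, as in the proof of Proposition 2.3."""
    values, w, prev = [], 0.0, 0.0
    for t in times:
        w += math.sqrt(t - prev) * rng.gauss(0.0, 1.0)
        values.append(w)
        prev = t
    return values


def empirical_cov(times, n_samples=20000, seed=2):
    """Monte Carlo estimate of the matrix E[W_s W_t] (all means are 0)."""
    rng = random.Random(seed)
    n = len(times)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(n_samples):
        w = brownian_at_times(times, rng)
        for i in range(n):
            for j in range(n):
                acc[i][j] += w[i] * w[j]
    return [[x / n_samples for x in row] for row in acc]


times = [0.5, 1.0, 2.0]
cov = empirical_cov(times)
for i, s in enumerate(times):
    for j, t in enumerate(times):
        print(f"s={s}, t={t}: estimate {cov[i][j]:.3f}, min(s, t) = {min(s, t)}")
```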
We have the following converse.

Theorem 2.4 If $W$ is a process such that all the finite-dimensional distributions are jointly normal, $E W_s = 0$ for all $s$, $\operatorname{Cov}(W_s, W_t) = s$ when $s \le t$, and the paths of $W_t$ are continuous, then $W$ is a Brownian motion.

Proof For $\mathcal F_t$ we take the filtration generated by $W$. If we take $s = t$, then $\operatorname{Var} W_t = \operatorname{Cov}(W_t, W_t) = t$. In particular, $\operatorname{Var} W_0 = 0$, and since $E W_0 = 0$, then $W_0 = 0$, a.s. If $s \le t$, then $W_t - W_s$ is normal with mean zero, since $(W_s, W_t)$ is jointly normal, and
$$\operatorname{Var}(W_t - W_s) = \operatorname{Var} W_t - 2\operatorname{Cov}(W_s, W_t) + \operatorname{Var} W_s = t - 2s + s = t - s.$$
We have thus established all the parts of Definition 2.1 except for the independence of $W_t - W_s$ from $\mathcal F_s$. If $r \le s < t$, then
$$\operatorname{Cov}(W_t - W_s, W_r) = \operatorname{Cov}(W_t, W_r) - \operatorname{Cov}(W_s, W_r) = r - r = 0.$$
For any $r_1, \ldots, r_k \le s$ the vector $(W_t - W_s, W_{r_1}, \ldots, W_{r_k})$ is jointly normal, so $W_t - W_s$ is independent of $(W_{r_1}, \ldots, W_{r_k})$ by Proposition A.55. Since $\mathcal F_s$ is generated by the $W_r$, $r \le s$, this shows that $W_t - W_s$ is independent of $\mathcal F_s$.

We now look at two results that are more technical. These should only be skimmed on the first reading of the book: read the statements, but not the proofs. The first result says that if $W$ is a Brownian motion with respect to the filtration generated by $W$, then it is also a Brownian motion with respect to the minimal augmented filtration.
Proposition 2.5 Let $W_t$ be a Brownian motion with respect to $\{\mathcal F^{00}_t\}$, where $\mathcal F^{00}_t = \sigma(W_s; s \le t)$. Let $\mathcal N$ be the collection of null sets, $\mathcal F^0_t = \sigma(\mathcal F^{00}_t \cup \mathcal N)$, and $\mathcal F_t = \cap_{\varepsilon > 0} \mathcal F^0_{t+\varepsilon}$.
(1) $W$ is a Brownian motion with respect to the filtration $\{\mathcal F_t\}$.
(2) $\mathcal F_t = \mathcal F^0_t$ for each $t$.

Proof (1) The only property we need to check is Definition 2.1(4). If $f$ is a continuous bounded function on $\mathbb R$, $A \in \mathcal F^{00}_s$, and $s < t$, then because $W$ is a Brownian motion with respect to $\{\mathcal F^{00}_t\}$, the independent increments property shows that
$$E[f(W_t - W_s); A] = E[f(W_t - W_s)]\, P(A). \tag{2.2}$$
If $A$ is such that $A \setminus B$ and $B \setminus A$ are null sets for some $B \in \mathcal F^{00}_s$, it is easy to see that (2.2) continues to hold. By linearity, it also holds if $A$ is a finite disjoint union of such sets. If $\mathcal C_1$ is the collection of sets in $\mathcal F^0_s$ that are finite disjoint unions of such sets, then $\mathcal C_1$ is an algebra of subsets of $\Omega$ that generates $\mathcal F^0_s$. Let $\mathcal M_1$ be the collection of sets in $\mathcal F^0_s$ for which (2.2) holds. It is readily checked that $\mathcal M_1$ is a monotone class. By the monotone class theorem (Theorem B.2), $\mathcal M_1$ contains the smallest σ-field containing $\mathcal C_1$, which is $\mathcal F^0_s$. Therefore (2.2) holds for all $A \in \mathcal F^0_s$.

Now suppose $A \in \mathcal F_s = \mathcal F^0_{s+}$. Then for each $\varepsilon > 0$, $A \in \mathcal F^0_{s+\varepsilon}$, and so using (2.2) with $s$ replaced by $s + \varepsilon$ and $t$ replaced by $t + \varepsilon$, we have
$$E[f(W_{t+\varepsilon} - W_{s+\varepsilon}); A] = E[f(W_{t+\varepsilon} - W_{s+\varepsilon})]\, P(A). \tag{2.3}$$
Letting ε → 0 and using the facts that f is bounded and continuous and W has continuous paths, the dominated convergence theorem implies that
$$E[f(W_t - W_s); A] = E[f(W_t - W_s)]\, P(A). \tag{2.4}$$
This equation holds whenever $f$ is bounded and continuous and $A \in \mathcal F_s$. By a limit argument, (2.4) holds whenever $f$ is the indicator of a Borel subset of $\mathbb R$. That says that $W_t - W_s$ and $\mathcal F_s$ are independent.

(2) Fix $t$ and choose $t_0 > t$. Let $\mathcal M_2$ be the collection of sets in $\mathcal F^{00}_{t_0}$ whose conditional expectation with respect to $\mathcal F_t$ is $\mathcal F^0_t$ measurable, that is, $A \in \mathcal M_2$ if $A \in \mathcal F^{00}_{t_0}$ and $E[1_A \mid \mathcal F_t]$ is $\mathcal F^0_t$ measurable. Let $\mathcal C_2$ be the collection of events $A$ for which there exist $n \ge 1$, $0 \le s_0 < s_1 < \cdots < s_n \le t_0$ with $t$ equal to one of the $s_i$, and Borel subsets $B_1, \ldots, B_n$ of $\mathbb R$ such that
$$A = (W_{s_1} - W_{s_0} \in B_1, \ldots, W_{s_n} - W_{s_{n-1}} \in B_n).$$
Suppose $A$ is of this form, and suppose $t = s_i$. Then by the independence result that we proved in (1),
$$E[1_A \mid \mathcal F_t] = 1_{(W_{s_1} - W_{s_0} \in B_1, \ldots, W_{s_i} - W_{s_{i-1}} \in B_i)} \times P(W_{s_{i+1}} - W_{s_i} \in B_{i+1}, \ldots, W_{s_n} - W_{s_{n-1}} \in B_n),$$
which is $\mathcal F^0_t$ measurable. Thus $\mathcal C_2 \subset \mathcal M_2$. Finite unions of sets in $\mathcal C_2$ form an algebra of subsets of $\Omega$ that generates $\mathcal F^{00}_{t_0}$. It is easy to check that $\mathcal M_2$ is a monotone class, so by the monotone class theorem, $\mathcal M_2$ equals $\mathcal F^{00}_{t_0}$. By linearity and taking monotone limits, if $Y$ is non-negative and $\mathcal F^{00}_{t_0}$ measurable, then $E[Y \mid \mathcal F_t]$ is $\mathcal F^0_t$ measurable.
To finish, suppose $A \in \mathcal F_t$. Then since $t < t_0$, we see that $A \in \mathcal F^0_{t_0}$. By Exercise 2.7, there exists $Y$ that is $\mathcal F^{00}_{t_0}$ measurable such that $1_A = Y$, a.s. Then $E[Y \mid \mathcal F_t]$ is $\mathcal F^0_t$ measurable, and $E[1_A \mid \mathcal F_t] = E[Y \mid \mathcal F_t]$, a.s. Since $\mathcal F^0_t$ contains all the null sets, $1_A = E[1_A \mid \mathcal F_t]$ is also $\mathcal F^0_t$ measurable, or $A \in \mathcal F^0_t$. This proves (2).

The final item we consider in this chapter is a subtle one. The question is this: if $W$ and $\widetilde W$ are both Brownian motions, do they have all the same properties? To illustrate this issue, let’s revisit the example of Chapter 1 where $\Omega = [0, 1]$, $\mathcal F$ is the Borel σ-field on $[0, 1]$, $P$ is Lebesgue measure on $[0, 1]$, $X(t, \omega) = 0$ for all $t$ and $\omega$, and $Y(t, \omega)$ is 1 if $t = \omega$ and 0 otherwise. For each $t$, $P(X_t = Y_t) = 1$, so $X$ and $Y$ have the same finite-dimensional distributions. However, if
$$A = \{f : f \text{ is not a continuous function on } [0, 1]\},$$
then $(X \in A)$ is empty, hence a null set, but $(Y \in A) = \Omega$ is not. Even though $X$ and $Y$ have the same finite-dimensional distributions, $X$ has continuous paths but $Y$ does not.

To rephrase our question, is it true that $P(W \in A) = P(\widetilde W \in A)$ for every Borel subset $A$ of $C[0, \infty)$? We know $W$ and $\widetilde W$ have the same finite-dimensional distributions because each is jointly normal with zero means and $\operatorname{Cov}(W_s, W_t) = s \wedge t = \operatorname{Cov}(\widetilde W_s, \widetilde W_t)$. The fact that the answer to our question is yes then comes from the following theorem. We look at $C[0, t_0]$ instead of $C[0, \infty)$ for the sake of simplicity.

Theorem 2.6 Let $t_0 > 0$ and let $X$, $Y$ be random variables taking values in $C[0, t_0]$ which have the same finite-dimensional distributions. Then the laws of $X$ and $Y$ are equal.

Proof Let $\mathcal M$ be the collection of Borel subsets $A$ of $C[0, t_0]$ for which $P(X \in A)$ equals $P(Y \in A)$. We will show that $\mathcal M$ is a monotone class and then use the monotone class theorem to show that $\mathcal M$ is equal to the Borel σ-field on $C[0, t_0]$. First, let $\mathcal C$ be the collection of all cylindrical subsets of $C[0, t_0]$ (defined by Definition 1.1). Since the finite-dimensional distributions of $X$ and $Y$ are equal, $\mathcal M$ contains $\mathcal C$. It is easy to check that $\mathcal C$ is an algebra of subsets of $C[0, t_0]$. If $A_1 \supset A_2 \supset \cdots$ are elements of $\mathcal M$, then
$$P(X \in \cap_n A_n) = \lim_{n \to \infty} P(X \in A_n) = \lim_{n \to \infty} P(Y \in A_n) = P(Y \in \cap_n A_n)$$
since $P$ is a finite measure. Therefore $\cap_n A_n \in \mathcal M$. A very similar argument shows that if $A_1 \subset A_2 \subset \cdots$ are elements of $\mathcal M$, then $\cup_n A_n \in \mathcal M$. Therefore $\mathcal M$ is a monotone class. By the monotone class theorem, $\mathcal M$ contains $\sigma(\mathcal C)$, the smallest σ-field containing $\mathcal C$. We will show that $\sigma(\mathcal C)$ contains all the open sets; then $\sigma(\mathcal C)$, and hence $\mathcal M$, will contain the smallest σ-field containing the open sets, and we will be done.

Since $C[0, t_0]$ is separable, every open set is the countable union of open balls. Because $\sigma(\mathcal C)$ is a σ-field, it suffices to show that $\sigma(\mathcal C)$ contains the open balls in $C[0, t_0]$, that is, all sets of the form
$$B(f_0, r) = \{f \in C[0, t_0] : \sup_{0 \le t \le t_0} |f(t) - f_0(t)| < r\},$$
where $r > 0$ and $f_0 \in C[0, t_0]$. For each $m$ with $1/m < r$ and each $n$,
$$\{f \in C[0, t_0] : \sup_{0 \le k \le 2^n t_0} |f(k/2^n) - f_0(k/2^n)| \le r - (1/m)\}$$
is a set in $\mathcal C$. As $n \to \infty$, these sets decrease to
$$D_m = \{f \in C[0, t_0] : \sup_{0 \le t \le t_0} |f(t) - f_0(t)| \le r - (1/m)\},$$
since all the functions we are considering are continuous; hence $D_m \in \sigma(\mathcal C)$. Finally, $D_m$ increases to $B(f_0, r)$ as $m \to \infty$, so $B(f_0, r)$ is in $\sigma(\mathcal C) \subset \mathcal M$ as desired.
Exercises

2.1
Suppose $W$ is a Brownian motion on $[0, 1]$. Let $Y_t = W_{1-t} - W_1$. Show that $Y_t$ is a Brownian motion on $[0, 1]$.
2.2
This exercise shows that the projection of a $d$-dimensional Brownian motion onto a line through the origin yields a one-dimensional Brownian motion. Suppose $(W^{(1)}_t, \ldots, W^{(d)}_t)$ is a $d$-dimensional Brownian motion started from 0 and $\lambda_1, \ldots, \lambda_d \in \mathbb R$ with $\sum_{i=1}^d \lambda_i^2 = 1$. Show that $X_t = \sum_{i=1}^d \lambda_i W^{(i)}_t$ is a one-dimensional Brownian motion started from 0.
2.3
This exercise shows that rotating a Brownian motion about the origin yields another Brownian motion. Let $W$ be a $d$-dimensional Brownian motion started at 0 and let $A$ be a $d \times d$ orthogonal matrix, that is, $A^{-1} = A^T$. Show that $Y_t = AW_t$ is again a $d$-dimensional Brownian motion.
2.4
Here is a converse to Exercise 2.2: roughly speaking, if all the projections of a $d$-dimensional process $X$ onto lines through the origin are one-dimensional Brownian motions, then $X$ is a $d$-dimensional Brownian motion. Suppose $(X^1_t, \ldots, X^d_t)$ is a $d$-dimensional continuous process, i.e., one taking values in $\mathbb R^d$ with continuous paths. Let $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $X$. Suppose that whenever $\lambda_1, \ldots, \lambda_d \in \mathbb R$ with $\sum_{i=1}^d \lambda_i^2 = 1$, then $\sum_{i=1}^d \lambda_i X^i_t$ is a one-dimensional Brownian motion started at 0 with respect to the filtration $\{\mathcal F_t\}$.
(1) If $u = (u_1, \ldots, u_d) \ne 0$, let $|u| = (\sum_j u_j^2)^{1/2}$ and let $\lambda_j = u_j / |u|$. Calculate
$$E \exp\Big(i \sum_{j=1}^d u_j X^j_t\Big) = E \exp\Big(i|u| \sum_{j=1}^d \lambda_j X^j_t\Big),$$
the joint characteristic function of $X_t$.
(2) If $t_0 < t_1 < \cdots < t_n$, use independence and (1) to calculate
$$E \exp\Big(i \sum_{k=0}^{n-1} \sum_{j=1}^d u_{kj} (X^j_{t_{k+1}} - X^j_{t_k})\Big).$$
(3) Prove that $(X^1_t, \ldots, X^d_t)$ is a $d$-dimensional Brownian motion started from 0.
(Some care is needed with the filtrations. If we only know that $Y^\lambda = \sum_i \lambda_i X^i$ is a Brownian motion with respect to the filtration generated by $Y^\lambda$ for each $\lambda = (\lambda_1, \ldots, \lambda_d)$, the assertion is not true. See Revuz and Yor (1999), Exercise I.1.19.)

2.5
Let $W_t$ be a Brownian motion and suppose
$$\lim_{t \to \infty} W_t / t = 0, \quad \text{a.s.} \tag{2.5}$$
Let $Z_t = tW_{1/t}$ if $t > 0$ and set $Z_0 = 0$. (This is called time inversion.) Show that $Z$ is a Brownian motion. (We will see later that the assumption (2.5) is superfluous; see Theorem 7.2.)
2.6
Let $X$ and $Y$ be two independent Brownian motions started at 0 and let $t_0 > 0$. Let
$$Z_t = \begin{cases} X_t, & t \le t_0, \\ X_{t_0} + Y_{t - t_0}, & t > t_0. \end{cases}$$
Prove that $Z$ is also a Brownian motion.
2.7
Let $\mathcal F^{00}_t$ and $\mathcal F^0_t$ be defined as in (1.1) and (1.2). Prove that if $X$ is $\mathcal F^0_t$ measurable, there exists $Z$ such that $Z$ is $\mathcal F^{00}_t$ measurable and $X = Z$, a.s.
2.8
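Let $\mathcal F^{00}_t$ and $\mathcal F^0_t$ be defined as in (1.1) and (1.2). The symmetric difference of two sets $A$ and $B$ is defined by $A \,\triangle\, B = (A \setminus B) \cup (B \setminus A)$. Prove that $\mathcal F^0_t = \{A \subset \Omega : A \,\triangle\, B \in \mathcal N \text{ for some } B \in \mathcal F^{00}_t\}$.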
Notes
Brownian motion is named for Robert Brown, a botanist who observed the erratic motion of colloidal particles in suspension in the 1820s. Brownian motion was used by Bachelier in 1900 in his PhD thesis to model stock prices and was the subject of an important paper by Einstein in 1905. The rigorous mathematical foundations for Brownian motion were first given by Wiener in 1923.
3 Martingales
Although discrete-time martingales are useful in a first course on probability, they are nowhere near as useful as continuous-time martingales are in the study of stochastic processes. The whole theory of stochastic integrals and stochastic differential equations is based on martingales indexed by times t ∈ [0, ∞). After giving the definition and some examples, we extend Doob’s inequalities, the optional stopping theorem, and the martingale convergence theorem to continuous-time martingales. We then derive some estimates for Brownian motion using martingale techniques.
3.1 Definition and examples

We define continuous-time martingales. Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions.

Definition 3.1 $M_t$ is a continuous-time martingale with respect to the filtration $\{\mathcal F_t\}$ and the probability measure $P$ if
(1) $E|M_t| < \infty$ for each $t$;
(2) $M_t$ is $\mathcal F_t$ measurable for each $t$;
(3) $E[M_t \mid \mathcal F_s] = M_s$, a.s., if $s < t$.

Part (2) of the definition can be rephrased as saying $M_t$ is adapted to $\mathcal F_t$. If in part (3) “=” is replaced by “≥”, then $M_t$ is a submartingale, and if it is replaced by “≤”, then we have a supermartingale.

Taking expectations in Definition 3.1(3), we see that if $s < t$, then $E M_s \le E M_t$ if $M$ is a submartingale and $E M_s \ge E M_t$ if $M$ is a supermartingale. Thus submartingales tend to increase, on average, and supermartingales tend to decrease, on average.

There are many martingales associated with Brownian motion. Here are three examples.

Example 3.2 Let $M_t = W_t$, where $W_t$ is a Brownian motion. Then $M_t$ is a martingale. To verify Definition 3.1(3), we write
$$E[M_t \mid \mathcal F_s] = M_s + E[W_t - W_s \mid \mathcal F_s] = M_s + E[W_t - W_s] = M_s,$$
using the independent increments property of Brownian motion and the fact that $E[W_t - W_s] = 0$.
Example 3.3 Let $M_t = W_t^2 - t$, where $W_t$ is a Brownian motion. To show $M_t$ is a martingale, we write
$$\begin{aligned} E[M_t \mid \mathcal F_s] &= E[(W_t - W_s + W_s)^2 \mid \mathcal F_s] - t \\ &= W_s^2 + E[(W_t - W_s)^2 \mid \mathcal F_s] + 2E[W_s(W_t - W_s) \mid \mathcal F_s] - t \\ &= W_s^2 + E[(W_t - W_s)^2] + 2W_s E[W_t - W_s \mid \mathcal F_s] - t \\ &= W_s^2 + E[(W_t - W_s)^2] + 2W_s E[W_t - W_s] - t \\ &= W_s^2 + (t - s) - t = M_s. \end{aligned}$$
We used the facts that $W_s$ is $\mathcal F_s$ measurable and that $W_t - W_s$ is independent of $\mathcal F_s$.

Example 3.4 Again let $W_t$ be a Brownian motion, let $a \in \mathbb R$, and let $M_t = e^{aW_t - a^2 t/2}$. Since $W_t - W_s$ is normal with mean zero and variance $t - s$, we know $E e^{a(W_t - W_s)} = e^{a^2 (t-s)/2}$; see (A.6). Then
$$\begin{aligned} E[M_t \mid \mathcal F_s] &= e^{-a^2 t/2} e^{aW_s} E[e^{a(W_t - W_s)} \mid \mathcal F_s] \\ &= e^{-a^2 t/2} e^{aW_s} E[e^{a(W_t - W_s)}] \\ &= e^{-a^2 t/2} e^{aW_s} e^{a^2 (t-s)/2} = M_s. \end{aligned}$$
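The computations in Examples 3.3 and 3.4 can be checked by simulation. In both examples $E[M_t \mid \mathcal F_s]$ depends on $\mathcal F_s$ only through $W_s$, so it is enough to fix a value $w_s$ and average over the independent increment $W_t - W_s \sim N(0, t-s)$. The sketch below is illustrative only: `conditional_mean`, the constant `A` standing in for $a$, and the values of $w_s$, $s$, $t$ are hypothetical choices.

```python
import math
import random


def conditional_mean(f, w_s, s, t, n_samples=100000, seed=3):
    """Monte Carlo estimate of E[f(W_t, t) | W_s = w_s]. By independent
    increments, W_t = w_s + sqrt(t - s) * Z with Z ~ N(0, 1) independent of F_s."""
    rng = random.Random(seed)
    sd = math.sqrt(t - s)
    total = 0.0
    for _ in range(n_samples):
        total += f(w_s + sd * rng.gauss(0.0, 1.0), t)
    return total / n_samples


def square_martingale(w, u):
    """M_u = W_u^2 - u from Example 3.3."""
    return w * w - u


A = 0.7  # the constant a of Example 3.4 (arbitrary choice)


def exponential_martingale(w, u):
    """M_u = exp(a W_u - a^2 u / 2) from Example 3.4."""
    return math.exp(A * w - A * A * u / 2)


w_s, s, t = 0.4, 1.0, 2.5
est_sq = conditional_mean(square_martingale, w_s, s, t)
est_exp = conditional_mean(exponential_martingale, w_s, s, t)
print(est_sq, square_martingale(w_s, s))
print(est_exp, exponential_martingale(w_s, s))
```

Each printed pair should agree up to Monte Carlo error, matching $E[M_t \mid \mathcal F_s] = M_s$.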
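We give one more example of a martingale, although not one derived from Brownian motion.

Example 3.5 Recall that given a filtration $\{\mathcal F_t\}$, each $\mathcal F_t$ is contained in $\mathcal F$, where $(\Omega, \mathcal F, P)$ is our probability space. Let $X$ be an integrable $\mathcal F$ measurable random variable, and let $M_t = E[X \mid \mathcal F_t]$. Since $\mathcal F_s \subset \mathcal F_t$ when $s < t$, the tower property of conditional expectation gives
$$E[M_t \mid \mathcal F_s] = E[E[X \mid \mathcal F_t] \mid \mathcal F_s] = E[X \mid \mathcal F_s] = M_s,$$
and $M$ is a martingale.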
3.2 Doob’s inequalities

We derive the analogs of Doob’s inequalities in the stochastic process context.

Theorem 3.6 Suppose $M_t$ is a martingale or non-negative submartingale with paths that are right continuous with left limits. Then
(1) for $\lambda > 0$,
$$P(\sup_{s \le t} |M_s| \ge \lambda) \le E|M_t| / \lambda.$$
(2) If $1 < p < \infty$, then
$$E[\sup_{s \le t} |M_s|^p] \le \Big(\frac{p}{p-1}\Big)^p E|M_t|^p.$$
Proof We will do the case where $M_t$ is a martingale, the submartingale case being nearly identical. Let $D_n = \{kt/2^n : 0 \le k \le 2^n\}$. If we set $N^{(n)}_k = M_{kt/2^n}$ and $\mathcal G^{(n)}_k = \mathcal F_{kt/2^n}$, it is clear that $\{N^{(n)}_k\}$ is a discrete-time martingale with respect to $\{\mathcal G^{(n)}_k\}$. Let
$$A_n = \{\sup_{s \le t,\, s \in D_n} |M_s| > \lambda\}.$$
By Doob’s inequality for discrete-time martingales (see Theorem A.32),
$$P(A_n) = P(\max_{k \le 2^n} |N^{(n)}_k| > \lambda) \le \frac{E|N^{(n)}_{2^n}|}{\lambda} = \frac{E|M_t|}{\lambda}.$$
Note that the $A_n$ are increasing because $D_n \subset D_{n+1}$, and since $M_t$ is right continuous, $\cup_n A_n = \{\sup_{s \le t} |M_s| > \lambda\}$. Then
$$P(\sup_{s \le t} |M_s| > \lambda) = P(\cup_n A_n) = \lim_{n \to \infty} P(A_n) \le E|M_t| / \lambda.$$
If we apply this with $\lambda$ replaced by $\lambda - \varepsilon$ and let $\varepsilon \to 0$, we obtain (1).

The proof of (2) is similar. By Doob’s inequality for discrete-time martingales (see Theorem A.33),
$$E[\max_{k \le 2^n} |N^{(n)}_k|^p] \le \Big(\frac{p}{p-1}\Big)^p E|N^{(n)}_{2^n}|^p = \Big(\frac{p}{p-1}\Big)^p E|M_t|^p.$$
Since $\max_{k \le 2^n} |N^{(n)}_k|^p$ increases to $\sup_{s \le t} |M_s|^p$ by the right continuity of $M$, (2) follows by Fatou’s lemma.
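The discretization in this proof suggests a direct numerical check of Theorem 3.6(1) for Brownian motion on the grid $D_n$. The sketch below (standard library only; `doob_check` and its default parameters are hypothetical) estimates $P(\max_{k \le 2^n} |W_{kt/2^n}| > \lambda)$ and the bound $E|W_t|/\lambda$, where $E|W_1| = \sqrt{2/\pi}$.

```python
import math
import random


def doob_check(lam=1.0, t=1.0, n=6, n_samples=10000, seed=4):
    """Estimate P(max_{k <= 2^n} |W_{kt/2^n}| > lam) and the bound E|W_t| / lam
    from the proof of Theorem 3.6, using simulated Brownian paths on D_n."""
    rng = random.Random(seed)
    sd = math.sqrt(t / 2**n)
    exceed = 0
    abs_end_total = 0.0
    for _ in range(n_samples):
        w, running_max = 0.0, 0.0
        for _ in range(2**n):
            w += rng.gauss(0.0, sd)
            running_max = max(running_max, abs(w))
        if running_max > lam:
            exceed += 1
        abs_end_total += abs(w)
    return exceed / n_samples, abs_end_total / n_samples / lam


p_hat, bound = doob_check()
print(p_hat, bound, math.sqrt(2 / math.pi))  # E|W_1| = sqrt(2/pi), about 0.798
```

The estimated probability should sit below the estimated bound; the inequality is not sharp here.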
3.3 Stopping times

Throughout this section we suppose we have a filtration $\{\mathcal F_t\}$ satisfying the usual conditions.

Definition 3.7 A random variable $T : \Omega \to [0, \infty]$ is a stopping time if for all $t$, $(T < t) \in \mathcal F_t$. We say $T$ is a finite stopping time if $T < \infty$, a.s. We say $T$ is a bounded stopping time if there exists $K \in [0, \infty)$ such that $T \le K$, a.s.

Note that $T$ can take the value infinity. Stopping times are also known as optional times. Given a stochastic process $X$, we define $X_T(\omega)$ to be equal to $X(T(\omega), \omega)$; that is, for each $\omega$ we evaluate $t = T(\omega)$ and then look at $X(\cdot, \omega)$ at this time.

Proposition 3.8 Suppose $\mathcal F_t$ satisfies the usual conditions. Then
(1) $T$ is a stopping time if and only if $(T \le t) \in \mathcal F_t$ for all $t$.
(2) If $T = t$, a.s., then $T$ is a stopping time.
(3) If $S$ and $T$ are stopping times, then so are $S \vee T$ and $S \wedge T$.
(4) If $T_n$, $n = 1, 2, \ldots$, are stopping times with $T_1 \le T_2 \le \cdots$, then so is $\sup_n T_n$.
(5) If $T_n$, $n = 1, 2, \ldots$, are stopping times with $T_1 \ge T_2 \ge \cdots$, then so is $\inf_n T_n$.
(6) If $s \ge 0$ and $S$ is a stopping time, then so is $S + s$.

Proof We will just prove part of (1), leaving the rest as Exercise 3.4. Note $(T \le t) = \cap_{n \ge N} (T < t + 1/n) \in \mathcal F_{t+1/N}$ for each $N$. Thus $(T \le t) \in \cap_N \mathcal F_{t+1/N} \subset \mathcal F_{t+} = \mathcal F_t$.

For a Borel measurable set $A$, let
$$T_A = \inf\{t > 0 : X_t \in A\}. \tag{3.1}$$
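Proposition 3.9 Suppose $\mathcal F_t$ satisfies the usual conditions and $X_t$ has continuous paths.
(1) If $A$ is open, then $T_A$ is a stopping time.
(2) If $A$ is closed, then $T_A$ is a stopping time.

Proof (1) Since $A$ is open and the paths of $X$ are continuous, $(T_A < t) = \cup_{q \in \mathbb Q_+,\, q < t} (X_q \in A)$, a countable union of events in $\mathcal F_t$.

[…]

$$T_n(\omega) = \frac{k+1}{2^n} \quad \text{if } \frac{k}{2^n} \le T(\omega) < \frac{k+1}{2^n}. \tag{3.2}$$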
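For a single fixed $\omega$, the dyadic approximation (3.2) is a one-line computation. The sketch below (the name `dyadic_upper` is hypothetical, and a finite non-negative value of $T$ is assumed) makes visible the three properties used repeatedly later: $T_n > T$, $T_n \le T + 2^{-n}$, and $T_n$ is non-increasing in $n$.

```python
import math


def dyadic_upper(T, n):
    """T_n from (3.2): (k + 1) / 2^n when k / 2^n <= T < (k + 1) / 2^n.
    Assumes T is a finite non-negative float."""
    k = math.floor(T * 2**n)
    return (k + 1) / 2**n


T = 0.3
approximations = [dyadic_upper(T, n) for n in range(1, 11)]
print(approximations)
```

Because $T_n$ is strictly greater than $T$ even when $T$ is itself dyadic (for example $T = 1/2$ gives $T_1 = 1$), the approximation always lands in the future of $T$, which is what makes right continuity the relevant path property.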
Exercise 3.5 asks you to prove that the Tn are stopping times decreasing to T . Define
$$\mathcal F_T = \{A \in \mathcal F : \text{for each } t > 0,\ A \cap (T \le t) \in \mathcal F_t\}. \tag{3.3}$$
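This definition of $\mathcal F_T$, which is supposed to be the collection of events that are “known” by time $T$, is not very intuitive. But it turns out that this definition works well in applications. Exercise 3.6 gives an equivalent definition that is more appealing but not as useful.

Proposition 3.10 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions, and $S$ and $T$ are stopping times.
(1) $\mathcal F_T$ is a σ-field.
(2) If $S \le T$, then $\mathcal F_S \subset \mathcal F_T$.
(3) If $\mathcal F_{T+} = \cap_{\varepsilon > 0} \mathcal F_{T+\varepsilon}$, then $\mathcal F_{T+} = \mathcal F_T$.
(4) If $X_t$ is adapted and has right-continuous paths, then $X_T$ is $\mathcal F_T$ measurable.

Proof If $A \in \mathcal F_T$, then $A^c \cap (T \le t) = (T \le t) \setminus [A \cap (T \le t)] \in \mathcal F_t$, so $A^c \in \mathcal F_T$. The rest of the proof of (1) is easy.

Suppose $A \in \mathcal F_S$ and $S \le T$. Then $A \cap (T \le t) = [A \cap (S \le t)] \cap (T \le t)$. We have $A \cap (S \le t) \in \mathcal F_t$ because $A \in \mathcal F_S$, while $(T \le t) \in \mathcal F_t$ because $T$ is a stopping time. Therefore $A \cap (T \le t) \in \mathcal F_t$, which proves (2).

For (3), the inclusion $\mathcal F_T \subset \mathcal F_{T+}$ follows from (2). Conversely, if $A \in \mathcal F_{T+}$, then $A \in \mathcal F_{T+\varepsilon}$ for every $\varepsilon$, and so $A \cap (T + \varepsilon \le t) \in \mathcal F_t$ for all $t$. Hence $A \cap (T \le t - \varepsilon) \in \mathcal F_t$ for all $t$, or equivalently $A \cap (T \le t) \in \mathcal F_{t+\varepsilon}$ for all $t$. This is true for all $\varepsilon$, so $A \cap (T \le t) \in \mathcal F_{t+} = \mathcal F_t$. This says $A \in \mathcal F_T$.

(4) Define $T_n$ by (3.2). Note
$$(X_{T_n} \in B) \cap (T_n = k/2^n) = (X_{k/2^n} \in B) \cap (T_n = k/2^n) \in \mathcal F_{k/2^n}.$$
Since $T_n$ only takes values in $\{k/2^n : k \ge 0\}$, we conclude $(X_{T_n} \in B) \cap (T_n \le t) \in \mathcal F_t$ and so $(X_{T_n} \in B) \in \mathcal F_{T_n} \subset \mathcal F_{T+1/2^n}$, the last inclusion by (2), since $T_n \le T + 1/2^n$.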
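Hence $X_{T_n}$ is $\mathcal F_{T+1/2^n}$ measurable. If $n \ge m$, then $X_{T_n}$ is measurable with respect to $\mathcal F_{T+1/2^n} \subset \mathcal F_{T+1/2^m}$. Since $T_n \downarrow T$ and $X$ has right-continuous paths, $X_{T_n} \to X_T$, so $X_T$ is $\mathcal F_{T+1/2^m}$ measurable for each $m$. Therefore $X_T$ is measurable with respect to $\mathcal F_{T+} = \mathcal F_T$.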
3.4 The optional stopping theorem

We will need Doob’s optional stopping theorem for continuous-time martingales. An example to keep in mind is $M_t = W_{t \wedge t_0}$, where $W$ is a Brownian motion and $t_0$ is some fixed time. Exercise 3.12 is a version of the optional stopping theorem with slightly weaker hypotheses that is often useful.

Theorem 3.11 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. If $M_t$ is a martingale or non-negative submartingale whose paths are right continuous, $\sup_{t \ge 0} E M_t^2 < \infty$, and $T$ is a finite stopping time, then $E M_T \ge E M_0$.

Proof We do the submartingale case, the martingale case being very similar. By Doob’s inequality (Theorem 3.6(2) with $p = 2$),
$$E[\sup_{s \le t} M_s^2] \le 4 E M_t^2.$$
Letting $t \to \infty$, we have $E[\sup_{t \ge 0} M_t^2] < \infty$ by Fatou’s lemma.

Let us first suppose that $T < K$, a.s., for some real number $K$. Define $T_n$ by (3.2). Let $N^{(n)}_k = M_{k/2^n}$, $\mathcal G^{(n)}_k = \mathcal F_{k/2^n}$, and $S_n = 2^n T_n$. Then $S_n$ is a bounded stopping time with respect to $\{\mathcal G^{(n)}_k\}$, and by Doob’s optional stopping theorem applied to the submartingale $N^{(n)}_k$, we have
$$E M_0 = E N^{(n)}_0 \le E N^{(n)}_{S_n} = E M_{T_n}.$$
Since $M$ is right continuous, $M_{T_n} \to M_T$, a.s. The random variables $|M_{T_n}|$ are bounded by $1 + \sup_{t \ge 0} M_t^2$, so by dominated convergence, $E M_{T_n} \to E M_T$.

We apply the above to the stopping time $T \wedge K$ to get $E M_{T \wedge K} \ge E M_0$. The random variables $M_{T \wedge K}$ are bounded by $1 + \sup_{t \ge 0} M_t^2$, so by dominated convergence, we get $E M_T \ge E M_0$ when we let $K \to \infty$.
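The discrete-time optional stopping theorem used in this proof can be verified exactly on a small example. As a stand-in for $N^{(n)}_k$, the sketch below takes a simple symmetric random walk $S_k$ (a martingale) and the bounded stopping time $T = \min(\text{first } k \text{ with } S_k = \texttt{level},\ \texttt{horizon})$, and computes $E S_T$ by enumerating all $2^{\texttt{horizon}}$ equally likely step sequences with exact rational arithmetic. The random walk and the names `stopped_mean`, `horizon`, `level` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product


def stopped_mean(horizon=10, level=1):
    """Exact E[S_T] for a simple symmetric random walk S started at 0, where
    T = min(first k with S_k = level, horizon) is a bounded stopping time.
    Averages over all 2**horizon equally likely sequences of +-1 steps."""
    total = Fraction(0)
    for steps in product((-1, 1), repeat=horizon):
        s = 0
        for x in steps:
            s += x
            if s == level:  # stop the first time the walk reaches `level`
                break
        total += s
    return total / 2**horizon


print(stopped_mean(), stopped_mean(12, 2))  # both equal E[S_0] = 0
```

Dropping the horizon and stopping only at the first visit to `level` would give $S_T = \texttt{level}$ and hence $E S_T \ne 0$; this is why the theorem needs hypotheses such as boundedness of $T$ or, as in Theorem 3.11, $\sup_t E M_t^2 < \infty$ (which the random walk fails).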
3.5 Convergence and regularity

We present the continuous-time version of Doob’s martingale convergence theorem. We will see that not only do we get limits as $t \to \infty$, but also a regularity result.

Let $D_n = \{k/2^n : k \ge 0\}$, $D = \cup_n D_n$.

Theorem 3.12 Let $\{M_t : t \in D\}$ be either a martingale, a submartingale, or a supermartingale with respect to $\{\mathcal F_t : t \in D\}$ and suppose $\sup_{t \in D} E|M_t| < \infty$. Then
(1) $\lim_{t \to \infty} M_t$ exists, a.s.
(2) With probability one $M_t$ has left and right limits along $D$.

The second conclusion says that except for a null set, if $t_0 \in [0, \infty)$, then both $\lim_{t \in D,\, t \uparrow t_0} M_t$ and $\lim_{t \in D,\, t \downarrow t_0} M_t$ exist and are finite. The null set does not depend on $t_0$.

Proof Martingales are also submartingales and if $M_t$ is a supermartingale, then $-M_t$ is a submartingale, so we may without loss of generality restrict our attention to submartingales.
By Doob’s inequality (Theorem 3.6(1)),
$$P(\sup_{t \in D_n,\, t \le n} |M_t| > \lambda) \le \frac{1}{\lambda} E|M_n|.$$
Letting $n \to \infty$ and using Fatou’s lemma,
$$P(\sup_{t \in D} |M_t| > \lambda) \le \frac{1}{\lambda} \sup_t E|M_t|.$$
This is true for all $\lambda$, so with probability one, $\{|M_t| : t \in D\}$ is a bounded set. Therefore the only way either (1) or (2) can fail is if for some pair of rationals $a < b$ the number of upcrossings of $[a, b]$ by $\{M_t : t \in D\}$ is infinite. Recall that we define upcrossings as follows. Given an interval $[a, b]$ and a submartingale $M$, if $S_1 = \inf\{t : M_t \le a\}$, $T_i = \inf\{t > S_i : M_t \ge b\}$, and $S_{i+1} = \inf\{t > T_i : M_t \le a\}$, then the number of upcrossings up to time $u$ is $\sup\{k : T_k \le u\}$. Doob’s upcrossing lemma (Theorem A.34) tells us that if $V_n$ is the number of upcrossings by $\{M_t : t \in D_n \cap [0, n]\}$, then
$$E V_n \le \frac{E(M_n - a)^+}{b - a} \le \frac{E|M_n| + |a|}{b - a}.$$
Letting $n \to \infty$ and using Fatou’s lemma, the number of upcrossings of $[a, b]$ by $\{M_t : t \in D\}$ has finite expectation, hence is finite, a.s. If $N_{a,b}$ is the null set where the number of upcrossings of $[a, b]$ by $\{M_t : t \in D\}$ is infinite and $N = \cup_{a < b,\ a, b \in \mathbb Q} N_{a,b}$, then $N$ is a null set.
[…]
$$\widetilde M_t = \lim_{u \in D,\, u > t,\, u \to t} M_u.$$
It is clear that $\widetilde M$ has paths that are right continuous with left limits. Since $\widetilde M_t$ is $\mathcal F_{t+}$ measurable and $\mathcal F_{t+} = \mathcal F_t$, then $\widetilde M_t$ is $\mathcal F_t$ measurable.

Let $N$ be fixed. We will show $\{M_t : t \le N\}$ is a uniformly integrable family of random variables; see Section A.4. Let $\varepsilon > 0$. Since $M_N$ is integrable, there exists $\delta$ such that if $P(A) < \delta$, then $E[|M_N|; A] < \varepsilon$. If $L$ is large enough, $P(|M_t| > L) \le E|M_t| / L \le E|M_N| / L < \delta$. Then
$$E[|M_t|; |M_t| > L] \le E[|M_N|; |M_t| > L] < \varepsilon,$$
since $|M_t|$ is a submartingale and $(|M_t| > L) \in \mathcal F_t$. Uniform integrability is proved.
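The upcrossing count that drives the proof of Theorem 3.12 is straightforward to compute for a finite sequence such as $\{M_t : t \in D_n \cap [0, n]\}$. The following sketch (the name `count_upcrossings` and the sample path are hypothetical) follows the definition with $S_1$, $T_i$, $S_{i+1}$ literally: wait for a value $\le a$, then for a value $\ge b$, and count each completed pair.

```python
def count_upcrossings(values, a, b):
    """Number of completed upcrossings of [a, b] by a finite sequence, following
    S_1 = first index with value <= a, T_i = first index after S_i with value >= b,
    S_{i+1} = first index after T_i with value <= a."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    waiting_for_b = False  # True between some S_i and the next T_i
    for x in values:
        if not waiting_for_b and x <= a:
            waiting_for_b = True
        elif waiting_for_b and x >= b:
            count += 1
            waiting_for_b = False
    return count


path = [0, -1, 2, 0, -2, 3, -1]
print(count_upcrossings(path, -1, 2))
```

A sequence with a finite limit (or one-sided limits) has finitely many upcrossings of every $[a, b]$ with rational $a < b$; this is the property the proof extracts from $E V_n \le (E|M_n| + |a|)/(b - a)$.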
Now let $t < N$. If $B \in \mathcal F_t$,
$$E[\widetilde M_t; B] = \lim_{u \in D,\, u > t,\, u \to t} E[M_u; B] = E[M_t; B].$$
Here we used the Vitali convergence theorem (Theorem A.19) and the fact that $M_t$ is a martingale. Since $\widetilde M_t$ is $\mathcal F_t$ measurable, this proves that $\widetilde M_t = M_t$, a.s. Since $N$ was arbitrary, we have this for all $t$. We thus have found a version of $M$ that has paths that are right continuous with left limits. That $\widetilde M_t$ is a martingale is easy.

The following technical result will be used several times in this book. A function $f$ is increasing if $s < t$ implies $f(s) \le f(t)$. A process $A_t$ has increasing paths if the function $t \mapsto A_t(\omega)$ is increasing for almost every $\omega$.

Proposition 3.14 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and suppose $A_t$ is an adapted process with paths that are increasing, are right continuous with left limits, and $A_\infty = \lim_{t \to \infty} A_t$ exists, a.s. Suppose $X$ is a non-negative integrable random variable, and $M_t$ is a version of the martingale $E[X \mid \mathcal F_t]$ which has paths that are right continuous with left limits. Suppose $E[X A_\infty] < \infty$. Then
$$E \int_0^\infty X \, dA_s = E \int_0^\infty M_s \, dA_s. \tag{3.4}$$
Proof First suppose $X$ and $A$ are bounded. Let $n > 1$ and write $E \int_0^\infty X \, dA_s$ as
$$\sum_{k=1}^\infty E[X(A_{k/2^n} - A_{(k-1)/2^n})].$$
Conditioning the $k$th summand on $\mathcal F_{k/2^n}$, with respect to which $A_{k/2^n} - A_{(k-1)/2^n}$ is measurable, this is equal to
$$E\Big[\sum_{k=1}^\infty E[X \mid \mathcal F_{k/2^n}](A_{k/2^n} - A_{(k-1)/2^n})\Big].$$
Given $s$ and $n$, define $s_n$ to be that value of $k/2^n$ such that $(k-1)/2^n < s \le k/2^n$. We then have
$$E \int_0^\infty X \, dA_s = E \int_0^\infty M_{s_n} \, dA_s. \tag{3.5}$$
For any value of $s$, $s_n \downarrow s$ as $n \to \infty$, and since $M$ has right-continuous paths, $M_{s_n} \to M_s$. Since $X$ is bounded, so is $M$. By dominated convergence, the right-hand side of (3.5) converges to
$$E \int_0^\infty M_s \, dA_s.$$
This completes the proof when $X$ and $A$ are bounded. We apply this to $X \wedge N$ and $A \wedge N$, let $N \to \infty$, and use monotone convergence for the general case. The only reason we assume $X$ is non-negative is so that the integrals make sense.

The equation (3.4) can be rewritten as
$$E \int_0^\infty X \, dA_s = E \int_0^\infty E[X \mid \mathcal F_s] \, dA_s. \tag{3.6}$$
We also have
$$E \int_0^t X \, dA_s = E \int_0^t E[X \mid \mathcal F_s] \, dA_s \tag{3.7}$$
for each $t$. This follows either by following the above proof or by applying Proposition 3.14 to $A_{s \wedge t}$.
3.6 Some applications of martingales

The following estimates are very useful.

Proposition 3.15 If $W_t$ is a Brownian motion, then
$$P(\sup_{s \le t} W_s \ge \lambda) \le e^{-\lambda^2/2t}, \qquad \lambda > 0, \tag{3.8}$$
and
$$P(\sup_{s \le t} |W_s| \ge \lambda) \le 2e^{-\lambda^2/2t}, \qquad \lambda > 0. \tag{3.9}$$
Proof For any $a$ the process $\{e^{aW_t}\}$ is a submartingale. To see this, since $x \mapsto e^{ax}$ is convex, the conditional expectation form of Jensen’s inequality (Proposition A.21) implies
$$E[e^{aW_t} \mid \mathcal F_s] \ge e^{aE[W_t \mid \mathcal F_s]} = e^{aW_s}.$$
Now take $a > 0$, so that $x \mapsto e^{ax}$ is increasing. By Doob’s inequality (Theorem 3.6(1)),
$$P(\sup_{s \le t} W_s \ge \lambda) = P(\sup_{s \le t} e^{aW_s} \ge e^{a\lambda}) \le \frac{E e^{aW_t}}{e^{a\lambda}}. \tag{3.10}$$
Since $E e^{aY} = e^{a^2 \operatorname{Var} Y/2}$ if $Y$ is Gaussian with mean 0 by (A.6), it follows that the right side of (3.10) is equal to $e^{-a\lambda} e^{a^2 t/2}$. If we now set $a = \lambda/t$, we obtain (3.8). Inequality (3.9) follows by applying (3.8) to $W$ and to $-W$ and adding.
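The choice $a = \lambda/t$ in the proof is the value that minimizes the bound $e^{-a\lambda + a^2 t/2}$ over $a > 0$. The sketch below (hypothetical names `chernoff_bound` and `best_a_on_grid`; the grid and the values of $\lambda$, $t$ are arbitrary) confirms this by a brute-force search and checks that the minimum equals $e^{-\lambda^2/2t}$.

```python
import math


def chernoff_bound(a, lam, t):
    """Right side of (3.10) once E e^{a W_t} = e^{a^2 t / 2} is substituted."""
    return math.exp(-a * lam + a * a * t / 2)


def best_a_on_grid(lam, t, a_max=10.0, n_grid=100000):
    """Minimize the bound over the grid a = a_max * i / n_grid, i = 1..n_grid."""
    grid = (a_max * i / n_grid for i in range(1, n_grid + 1))
    return min(grid, key=lambda a: chernoff_bound(a, lam, t))


lam, t = 1.5, 2.0
a_star = best_a_on_grid(lam, t)
print(a_star, lam / t)  # grid minimizer vs. the choice a = lam / t
print(chernoff_bound(a_star, lam, t), math.exp(-lam**2 / (2 * t)))
```

Completing the square gives the same answer analytically: $-a\lambda + a^2 t/2 = \tfrac{t}{2}(a - \lambda/t)^2 - \lambda^2/2t$.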
Let us use martingales to calculate some probabilities. Let us suppose $a, b > 0$ and set $T = \inf\{t > 0 : W_t = -a \text{ or } W_t = b\}$, the first time Brownian motion exits the interval $[-a, b]$. By Proposition 3.9, $T$ is a stopping time. We have the following.

Proposition 3.16 Let $W$ be a Brownian motion, let $a, b > 0$, and let $T = \inf\{t > 0 : W_t \notin [-a, b]\}$. Then
$$P(W_T = -a) = \frac{b}{a+b}, \qquad P(W_T = b) = \frac{a}{a+b}, \tag{3.11}$$
and
$$E T = ab. \tag{3.12}$$
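Proof Since $W_t^2 - t$ is a martingale with $W_0 = 0$, it is easy to check that for each $u$, $W_{t \wedge u}^2 - (t \wedge u)$ is also a martingale. Applying Theorem 3.11 to this martingale and the bounded stopping time $u \wedge T$, we see that $E W_{u \wedge T}^2 = E[u \wedge T]$. Since $|W_{u \wedge T}|^2$ is bounded by $(a + b)^2$, this gives $E[u \wedge T] \le (a + b)^2$ for every $u$, so by monotone convergence $E T \le (a + b)^2 < \infty$; hence $T < \infty$, a.s., and $W_{u \wedge T} \to W_T$ as $u \to \infty$. Letting $u \to \infty$, the right-hand side tends to $E T$ by monotone convergence, and by dominated convergence the left-hand side tends to $E W_T^2 \le (a + b)^2$. Therefore
$$E T = E W_T^2. \tag{3.13}$$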
In particular, $E T < \infty$, so we know $T < \infty$, a.s. We use that $T$ is finite, a.s., to conclude that $P(W_T \in \{-a, b\}) = 1$, or
$$1 = P(W_T = -a) + P(W_T = b). \tag{3.14}$$
Since $W_t$ is a martingale, then so is $W_{t \wedge u}$ for each $u$, and therefore $E W_{u \wedge T} = 0$. Letting $u \to \infty$ and using dominated convergence (noting $|W_{u \wedge T}|$ is bounded by $a + b$), we have $E W_T = 0$, or
$$0 = (-a) P(W_T = -a) + b P(W_T = b). \tag{3.15}$$
We get (3.11) by solving (3.14) and (3.15) for the unknowns P(WT = −a) and P(WT = b). We get (3.12) by (3.13), writing
$$E T = E W_T^2 = (-a)^2 P(W_T = -a) + b^2 P(W_T = b),$$
and substituting the values from (3.11): $E T = \dfrac{a^2 b + b^2 a}{a + b} = ab$.

In proving Proposition 3.16, we used the fact that $W_{t \wedge T}$ is a martingale and $P(T < \infty) = 1$. The same proof shows the following.

Corollary 3.17 Suppose $a, b > 0$, $M_t$ is a martingale with continuous paths and with $M_0 = 0$, a.s., $T = \inf\{t \ge 0 : M_t \notin [-a, b]\}$, and $T < \infty$, a.s. Then
$$P(M_T = -a) = \frac{b}{a+b}, \qquad P(M_T = b) = \frac{a}{a+b}.$$
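A quick sanity check of (3.11) and (3.12) is possible with a discrete stand-in. The sketch below swaps Brownian motion for a simple symmetric random walk started at 0 with integer $a, b$; for that walk the same martingale argument (with $S_k$ and $S_k^2 - k$) gives exactly $P(S_T = -a) = b/(a+b)$ and $E T = ab$, the classical gambler’s-ruin answers. The random walk, the function `exit_stats`, and the parameter values are illustrative assumptions, not the Brownian computation itself.

```python
import random


def exit_stats(a, b, n_runs=20000, seed=5):
    """Run a simple symmetric random walk from 0 until it hits -a or b
    (a, b positive integers); return (fraction of runs ending at -a, mean exit time)."""
    rng = random.Random(seed)
    hit_low, total_steps = 0, 0
    for _ in range(n_runs):
        s, steps = 0, 0
        while -a < s < b:
            s += rng.choice((-1, 1))
            steps += 1
        if s == -a:
            hit_low += 1
        total_steps += steps
    return hit_low / n_runs, total_steps / n_runs


a, b = 2, 3
p_low, mean_T = exit_stats(a, b)
print(p_low, b / (a + b))  # compare with (3.11)
print(mean_T, a * b)       # compare with (3.12)
```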
We can also use martingales to get more subtle results. Suppose $r > 0$. Since $e^{rW_t - r^2 t/2}$ is a martingale, as above
$$E e^{rW_{T \wedge t} - r^2 (T \wedge t)/2} = 1.$$
The exponent is bounded by $rb$ if $r > 0$, so we can let $t \to \infty$ and use dominated convergence to get
$$E e^{rW_T - r^2 T/2} = 1.$$
This can be written as
$$e^{-ra} E[e^{-r^2 T/2}; W_T = -a] + e^{rb} E[e^{-r^2 T/2}; W_T = b] = 1.$$
Since $e^{-rW_t - r^2 t/2}$ is also a martingale, similar reasoning gives us
$$e^{ra} E[e^{-r^2 T/2}; W_T = -a] + e^{-rb} E[e^{-r^2 T/2}; W_T = b] = 1.$$
We can solve those two equations to obtain
$$E[e^{-r^2 T/2}; W_T = -a] = \frac{e^{rb} - e^{-rb}}{e^{r(a+b)} - e^{-r(a+b)}} \tag{3.16}$$
and
$$E[e^{-r^2 T/2}; W_T = b] = \frac{e^{ra} - e^{-ra}}{e^{r(a+b)} - e^{-r(a+b)}}. \tag{3.17}$$
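The step “we can solve those two equations” is a $2 \times 2$ linear system in $x = E[e^{-r^2 T/2}; W_T = -a]$ and $y = E[e^{-r^2 T/2}; W_T = b]$. The sketch below (hypothetical names `solve_by_cramer` and `closed_forms`; the values of $a$, $b$, $r$ are arbitrary) solves it by Cramer’s rule, compares with (3.16) and (3.17) written as ratios of hyperbolic sines, and checks that letting $r \to 0$ recovers (3.11).

```python
import math


def solve_by_cramer(a, b, r):
    """Solve e^{-ra} x + e^{rb} y = 1 and e^{ra} x + e^{-rb} y = 1 for
    x = E[e^{-r^2 T/2}; W_T = -a] and y = E[e^{-r^2 T/2}; W_T = b]."""
    m11, m12 = math.exp(-r * a), math.exp(r * b)
    m21, m22 = math.exp(r * a), math.exp(-r * b)
    det = m11 * m22 - m12 * m21
    x = (m22 - m12) / det  # first column replaced by (1, 1)
    y = (m11 - m21) / det  # second column replaced by (1, 1)
    return x, y


def closed_forms(a, b, r):
    """Right-hand sides of (3.16) and (3.17), written with sinh."""
    denom = math.sinh(r * (a + b))
    return math.sinh(r * b) / denom, math.sinh(r * a) / denom


a, b, r = 1.0, 2.0, 0.8
print(solve_by_cramer(a, b, r), closed_forms(a, b, r))
# As r -> 0 the left-hand sides tend to P(W_T = -a) and P(W_T = b), cf. (3.11).
print(closed_forms(a, b, 1e-4), (b / (a + b), a / (a + b)))
```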
The left-hand sides of (3.16) and (3.17), viewed as functions of $\lambda = r^2/2$, are the Laplace transforms of the quantities $P(T \in dt; W_T = -a)/dt$ and $P(T \in dt; W_T = b)/dt$, respectively, and finding the inverse Laplace transforms of the right-hand sides of (3.16) and (3.17) gives us formulas for $P(T \in dt; W_T = -a)/dt$ and $P(T \in dt; W_T = b)/dt$. If we add the two formulas, we get an expression for $P(T \in dt)/dt$, and integrating over $t$ from 0 to $t_0$ gives an expression for $P(T \le t_0)$.

We sketch how to invert the Laplace transform and leave the detailed calculations and justification for inverting a Laplace transform term by term to the interested reader. See also Karatzas and Shreve (1991), Section 2.8. The right-hand side of (3.16) is equal to
$$\frac{e^{-ra} - e^{-ra - 2rb}}{1 - e^{-2r(a+b)}}.$$
Since $e^{-2r(a+b)} < 1$, we can use
$$(1 - x)^{-1} = \sum_{n=0}^\infty x^n$$
to expand the denominator as a power series; if we set $\lambda = r^2/2$, then
$$E[e^{-\lambda T}; W_T = -a] = \sum_{n=0}^\infty \Big(e^{-(2n+1)\sqrt{2\lambda}\,a - 2n\sqrt{2\lambda}\,b} - e^{-(2n+1)\sqrt{2\lambda}\,a - (2n+2)\sqrt{2\lambda}\,b}\Big). \tag{3.18}$$
We then use the fact that the Laplace transform of
$$\frac{k}{2\sqrt{\pi t^3}}\, e^{-k^2/4t}$$
is $e^{-k\sqrt{\lambda}}$ to find the inverse Laplace transform of the right-hand side of (3.18) by inverting term by term. Similarly (see Exercises 3.15 and 3.16), if $b > 0$, $W$ is a Brownian motion, and $S = \inf\{t > 0 : W_t = b\}$, then $E e^{-\lambda S} = e^{-\sqrt{2\lambda}\, b}$. Inverting the Laplace transform,
$$P(S \in dt) = \frac{b}{\sqrt{2\pi t^3}}\, e^{-b^2/2t}\, dt, \qquad t > 0. \tag{3.19}$$
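Two of the claims above can be checked numerically: that the series (3.18) sums to the right-hand side of (3.16) with $r = \sqrt{2\lambda}$, and that the density in (3.19) has Laplace transform $e^{-\sqrt{2\lambda}\, b}$. The sketch below is illustrative: the function names, the truncation at 60 terms, and the midpoint-rule integration over $(0, 60]$ (which assumes $\lambda$ is not small, so the neglected tail is negligible) are all choices made here, not part of the text.

```python
import math


def series_318(lam, a, b, n_terms=60):
    """Partial sum of the series (3.18) for E[e^{-lam T}; W_T = -a]."""
    c = math.sqrt(2 * lam)
    return sum(
        math.exp(-(2 * n + 1) * c * a - 2 * n * c * b)
        - math.exp(-(2 * n + 1) * c * a - (2 * n + 2) * c * b)
        for n in range(n_terms)
    )


def closed_316(lam, a, b):
    """Right-hand side of (3.16) with r = sqrt(2 lam), written with sinh."""
    r = math.sqrt(2 * lam)
    return math.sinh(r * b) / math.sinh(r * (a + b))


def laplace_of_density_319(lam, b, t_max=60.0, n=200000):
    """Midpoint-rule approximation of the integral over (0, t_max] of
    e^{-lam t} times the density in (3.19); the tail beyond t_max is ignored."""
    h = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.exp(-lam * t - b * b / (2 * t)) * b / math.sqrt(2 * math.pi * t**3)
    return total * h


print(series_318(0.5, 1.0, 2.0), closed_316(0.5, 1.0, 2.0))
print(laplace_of_density_319(1.0, 1.0), math.exp(-math.sqrt(2.0) * 1.0))
```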
Exercises

3.1
If $W$ is a Brownian motion, show that
$$W_t^3 - 3\int_0^t W_s \, ds$$
is a martingale.

3.2
Suppose {Ft } is a filtration satisfying the usual conditions. Show that if Mt is a submartingale and E Mt = E M0 for all t, then M is a martingale.
3.3
Let $X$ be a submartingale. Show that $\sup_{t \ge 0} E|X_t| < \infty$ if and only if $\sup_{t \ge 0} E X_t^+ < \infty$.
3.4
Prove all parts of Proposition 3.8.
3.5
If Tn is defined by (3.2), show Tn is a stopping time for each n and Tn ↓ T .
3.6
This exercise gives an alternate definition of FT which is more appealing, but not as useful. Suppose that {Ft } satisfies the usual conditions. Show that FT is equal to the σ -field generated by the collection of random variables YT such that Y is a bounded process with paths that are right continuous with left limits and Y is adapted to the filtration {Ft }.
3.7
Suppose {Ft } is a filtration satisfying the usual conditions. Show that if T is a stopping time, then T is FT measurable.
3.8
Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions and $T$ is a stopping time. Show that if $S$ is an $\mathcal F_T$ measurable random variable with $S \ge T$, then $S$ is a stopping time.
3.9
This exercise demonstrates that the conclusion of Corollary 3.13 cannot be extended to submartingales. Find a filtration {Ft } satisfying the usual conditions and a submartingale X with respect to {Ft } such that X does not have a version with paths that are right continuous with left limits.
3.10
Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions. Show that if $S$ and $T$ are stopping times and $X$ is a bounded $\mathcal F_\infty$ measurable random variable, then
$$E[E[X \mid \mathcal F_S] \mid \mathcal F_T] = E[X \mid \mathcal F_{S \wedge T}].$$
Hint: Let $Y_t = E[X \mid \mathcal F_t]$ and $Z_t = Y_{t \wedge S}$. Show the left-hand side is equal to $Y_{S \wedge T}$.

3.11
A martingale or submartingale $M_t$ is uniformly integrable if the family $\{M_t : t \ge 0\}$ is a uniformly integrable family of random variables. Show that if $M_t$ is a uniformly integrable martingale with paths that are right continuous with left limits, then $\{M_T : T \text{ a finite stopping time}\}$ is a uniformly integrable family of random variables. Show this also holds if $M_t$ is a non-negative submartingale with paths that are right continuous with left limits.

3.12
This exercise weakens the conditions on the optional stopping theorem. Show that if $M_t$ is a uniformly integrable martingale that is right continuous with left limits and $T$ is a finite stopping time, then $E M_T = E M_0$.

3.13
Let $W$ be a Brownian motion and let $T$ be a stopping time with $E T < \infty$. Prove that $E W_T = 0$ and $E W_T^2 = E T$. This is not an easy application of the optional stopping theorem because we do not know that $W_{t \wedge T}$ is necessarily a uniformly integrable martingale.

3.14
Suppose that $(W^1_t, \ldots, W^d_t)$ is a $d$-dimensional Brownian motion. Show that if $i \ne j$, then $W^i_t W^j_t$ is a martingale.

3.15
Let $W_t$ be a Brownian motion, $b > 0$, and $T = \inf\{t > 0 : W_t = b\}$. Show $T < \infty$, a.s. Show $E T = \infty$. Hint: Take a limit in (3.11).

3.16
Suppose $W$ is a Brownian motion and $b > 0$. If $S = \inf\{t > 0 : W_t = b\}$, show that the Laplace transform of the density of $S$ is given by
$$E e^{-\lambda S} = e^{-\sqrt{2\lambda}\, b}.$$
3.17
Let $W_t$ be a Brownian motion. Show that if $\alpha > 1/2$, then
$$\lim_{t \to \infty} \frac{W_t}{t^\alpha} = 0, \quad \text{a.s.}$$
Hint: Let $\alpha_0 \in (1/2, \alpha)$, estimate
$$P\Big(\sup_{2^n \le s \le 2^{n+1}} |W_s| \ge (2^n)^{\alpha_0}\Big)$$
using (3.9), and then use the Borel–Cantelli lemma.

3.18
Let $W_t$ be a one-dimensional Brownian motion and $\alpha \in (0, 1/2]$. Prove that
$$\limsup_{t \to \infty} \frac{|W_t|}{t^\alpha} > 0, \quad \text{a.s.}$$

3.19
If $W$ is a Brownian motion and $b$ is a constant, then the process $X_t = W_t + bt$ is a Brownian motion with drift. Prove that if $b > 0$, then
$$\lim_{t \to \infty} X_t = \infty, \quad \text{a.s.}$$
4 Markov properties of Brownian motion
In later chapters we will discuss extensively the Markov property and strong Markov property. The Brownian motion case is much simpler, and we do that now.
4.1 Markov properties

Let us begin with the Markov property.

Theorem 4.1 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let $W$ be a Brownian motion with respect to $\{\mathcal F_t\}$. If $u$ is a fixed time, then $Y_t = W_{t+u} - W_u$ is a Brownian motion independent of $\mathcal F_u$.

Proof Let $\mathcal G_t = \mathcal F_{t+u}$. It is clear that $Y$ has continuous paths, is zero at time 0, and is adapted to $\{\mathcal G_t\}$. If $s < t$, then $Y_t - Y_s = W_{t+u} - W_{s+u}$, so $Y_t - Y_s$ is a mean zero normal random variable with variance $(t+u) - (s+u) = t - s$ that is independent of $\mathcal F_{s+u} = \mathcal G_s$. Hence $Y$ is a Brownian motion with respect to $\{\mathcal G_t\}$, and its increments are independent of $\mathcal G_0 = \mathcal F_u$.

The strong Markov property is the Markov property extended by replacing fixed times $u$ by finite stopping times.

Theorem 4.2 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let $W$ be a Brownian motion with respect to $\{\mathcal F_t\}$. If $T$ is a finite stopping time, then $Y_t = W_{T+t} - W_T$ is a Brownian motion independent of $\mathcal F_T$.

Proof We will first show that whenever $m \ge 1$, $t_1 < \cdots < t_m$, $f$ is a bounded continuous function on $\mathbb R^m$, and $A \in \mathcal F_T$, then
$$E[f(Y_{t_1}, \dots, Y_{t_m}); A] = E[f(W_{t_1}, \dots, W_{t_m})]\, P(A). \qquad (4.1)$$
Once we have done this, we will then show how (4.1) implies our theorem. To prove (4.1), define Tn by (3.2). We have
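$$\begin{aligned}
E[f(W_{T_n+t_1} - W_{T_n}, \dots, W_{T_n+t_m} - W_{T_n}); A]
&= \sum_{k=1}^{\infty} E[f(W_{T_n+t_1} - W_{T_n}, \dots, W_{T_n+t_m} - W_{T_n}); A, T_n = k/2^n] \\
&= \sum_{k=1}^{\infty} E[f(W_{t_1+k/2^n} - W_{k/2^n}, \dots, W_{t_m+k/2^n} - W_{k/2^n}); A, T_n = k/2^n]. \qquad (4.2)
\end{aligned}$$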
Following the usual practice in probability that “,” means “and,” we use the notation “$E[\cdots; A, T_n = k/2^n]$” as an abbreviation for “$E[\cdots; A \cap (T_n = k/2^n)]$.” Since $A \in \mathcal F_T$,
then $A \cap (T_n = k/2^n) = A \cap \big((T < k/2^n) \setminus (T < (k-1)/2^n)\big) \in \mathcal F_{k/2^n}$. We use the independent increments property of Brownian motion and the fact that $W_t - W_s$ has the same law as $W_{t-s}$ to see that the sum in the last line of (4.2) is equal to
$$\begin{aligned}
\sum_{k=1}^{\infty} &E[f(W_{t_1+k/2^n} - W_{k/2^n}, \dots, W_{t_m+k/2^n} - W_{k/2^n})]\, P(A, T_n = k/2^n) \\
&= \sum_{k=1}^{\infty} E[f(W_{t_1}, \dots, W_{t_m})]\, P(A, T_n = k/2^n) = E[f(W_{t_1}, \dots, W_{t_m})]\, P(A),
\end{aligned}$$
which is the right-hand side of (4.1). Thus
$$E[f(W_{T_n+t_1} - W_{T_n}, \dots, W_{T_n+t_m} - W_{T_n}); A] = E[f(W_{t_1}, \dots, W_{t_m})]\, P(A). \qquad (4.3)$$
Now let $n \to \infty$. Since $T_n \downarrow T$, the right continuity of the paths of $W$, the boundedness and continuity of $f$, and the dominated convergence theorem show that the left-hand side of (4.3) converges to the left-hand side of (4.1). If we take $A = \Omega$ in (4.1), we obtain
$$E[f(Y_{t_1}, \dots, Y_{t_m})] = E[f(W_{t_1}, \dots, W_{t_m})]$$
whenever $m \ge 1$, $t_1, \dots, t_m \in [0, \infty)$, and $f$ is a bounded continuous function on $\mathbb R^m$. This implies that the finite-dimensional distributions of $Y$ and $W$ are the same. Since $Y$ has continuous paths, $Y$ is a Brownian motion.

Next take $A \in \mathcal F_T$. By using a limit argument, (4.1) holds whenever $f$ is the indicator of a Borel subset of $\mathbb R^m$, or in other words,
$$P(Y \in B, A) = P(Y \in B)\, P(A) \qquad (4.4)$$
whenever $B$ is a cylindrical set. Let $\mathcal M$ be the collection of all Borel subsets $B$ of $C[0, \infty)$ for which (4.4) holds. Let $\mathcal C$ be the collection of all cylindrical subsets of $C[0, \infty)$. Then $\mathcal M$ is a monotone class containing $\mathcal C$, and $\mathcal C$ is an algebra of subsets of $C[0, \infty)$ generating the Borel σ-field of $C[0, \infty)$. By the monotone class theorem (Theorem B.2), $\mathcal M$ is equal to the Borel σ-field on $C[0, \infty)$, and since (4.4) holds for all sets $B \in \mathcal M$, this establishes the independence of $Y$ and $\mathcal F_T$.

In the future, we will not put in the details for the arguments using the monotone class theorem.

Observe that what is needed for the above proof to work is not that $W$ be a Brownian motion, but that the process $W$ have right continuous paths and that $W_t - W_s$ be independent of $\mathcal F_s$ and have the same distribution as $W_{t-s}$. We therefore have the following corollary.

Corollary 4.3 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions, and let $X$ be a process adapted to $\{\mathcal F_t\}$. Suppose $X$ has paths that are right continuous with left limits and suppose $X_t - X_s$ is independent of $\mathcal F_s$ and has the same law as $X_{t-s}$ whenever $s < t$. If $T$ is a finite stopping time, then $Y_t = X_{T+t} - X_T$ is a process that is independent of $\mathcal F_T$, and $X$ and $Y$ have the same law.
Figure 4.1 The reflection principle (levels x, b, and 2b − x are marked).
4.2 Applications

The first application is known as the reflection principle and allows us to get control of the maximum of a Brownian motion. The idea is the following. Suppose that $W_t$ is a Brownian motion and that, for some path, the Brownian motion goes above a level $b$ before time $t$, but at time $t$ the value of $W_t$ is less than $x$, where $x < b$. We could take the graph of this path and, from the first time the path reaches the level $b$ onward, reflect it across the horizontal line at level $b$ (Figure 4.1). This gives a new path that ends up above $2b - x$. Thus there is a one-to-one correspondence between the paths where the maximum up to time $t$ is above $b$ and $W_t$ is below $x$, and the paths where $W_t$ is above $2b - x$. More precisely, we have the following.

Theorem 4.4 Let $W_t$ be a Brownian motion, $b > 0$, $T = \inf\{t : W_t \ge b\}$, and $x < b$. Then
$$P\Big(\sup_{s\le t} W_s \ge b,\ W_t < x\Big) = P(W_t > 2b - x). \qquad (4.5)$$
Proof Let $T_n$ be defined by (3.2). We first show that
$$P(T_n \le t,\ W_t - W_{T_n} < x - b) = P(T_n \le t,\ W_t - W_{T_n} > b - x). \qquad (4.6)$$
Writing $[x]$ for the integer part of $x$, the left-hand side of (4.6) is equal to
$$\begin{aligned}
\sum_{k=0}^{[2^n t]} P(T_n = k/2^n,\ W_t - W_{T_n} < x - b)
&= \sum_{k=0}^{[2^n t]} P(T_n = k/2^n,\ W_t - W_{k/2^n} < x - b) \\
&= \sum_{k=0}^{[2^n t]} P(T_n = k/2^n)\, P(W_t - W_{k/2^n} < x - b),
\end{aligned}$$
using the independent increments property of Brownian motion and the fact that $(T_n = k/2^n) \in \mathcal F_{k/2^n}$. Using the symmetry of the normal distribution, that is, that $W_t - W_s$ and $W_s - W_t$ have the same law, this is the same as
$$\sum_{k=0}^{[2^n t]} P(T_n = k/2^n)\, P(W_t - W_{k/2^n} > b - x),$$
and reversing the steps above, this equals the right-hand side of (4.6). Since $W$ has continuous paths, $W_T = b$, so $(T = t) \subset (W_t = b)$. Because $W_t$ is a normal random variable, $P(T = t) = 0$. Also, $P(W_t - W_T = b - x)$ and $P(W_t - W_T = x - b)$ are both zero. If we now let $n \to \infty$ in (4.6), we obtain
$$P(T \le t,\ W_t - W_T < x - b) = P(T \le t,\ W_t - W_T > b - x).$$
Since $W_T = b$, this is the same as
$$P(T \le t,\ W_t < x) = P(T \le t,\ W_t > 2b - x). \qquad (4.7)$$
By the definition of $T$ and the continuity of the paths of $W$, the left-hand side of (4.7) is equal to the left-hand side of (4.5). If $W_t > 2b - x$, then $W_t > b$ (because $x < b$), so automatically $T \le t$; hence the right-hand side of (4.7) is equal to the right-hand side of (4.5).

Our second application will be useful when studying local time in Chapter 14.

Proposition 4.5 Let $W_t$ be a Brownian motion with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions. Let $T$ be a finite stopping time and $s > 0$. If $a < b$, then
$$P(W_{T+s} \in [a, b] \mid \mathcal F_T) \le \frac{|b - a|}{\sqrt{2\pi s}}.$$

Proof If $A \in \mathcal F_T$, let $k > 0$ and write
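$$\begin{aligned}
P(W_{T+s} \in [a, b], A) &= \sum_{j=-\infty}^{\infty} P\big(W_{T+s} \in [a, b],\ A,\ j/k \le W_T < (j+1)/k\big) \\
&\le \sum_{j=-\infty}^{\infty} P\big(W_{T+s} - W_T \in [a - (j+1)/k,\ b - j/k],\ A,\ j/k \le W_T < (j+1)/k\big).
\end{aligned}$$
By Theorem 4.2, $W_{T+s} - W_T$ has the same law as $W_s$ and is independent of $\mathcal F_T$, so this is less than or equal to
$$\begin{aligned}
\sum_{j=-\infty}^{\infty} &P\big(W_s \in [a - (j+1)/k,\ b - j/k]\big)\, P\big(A,\ j/k \le W_T < (j+1)/k\big) \\
&\le \frac{1}{\sqrt{2\pi}}\, \frac{b - a + 1/k}{\sqrt s} \sum_{j=-\infty}^{\infty} P\big(A,\ j/k \le W_T < (j+1)/k\big)
= \frac{1}{\sqrt{2\pi}}\, \frac{b - a + 1/k}{\sqrt s}\, P(A).
\end{aligned}$$
We used here the fact that the density of a normal random variable with mean zero and variance $s$ is bounded by $1/\sqrt{2\pi s}$, so the probability that it lies in an interval of length $b - a + 1/k$ is at most $(b - a + 1/k)/\sqrt{2\pi s}$. This is true for all $k$, so letting $k \to \infty$ gives $P(W_{T+s} \in [a, b], A) \le (b - a)\, P(A)/\sqrt{2\pi s}$ for every $A \in \mathcal F_T$, which is our result.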
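Both results of this section can be checked numerically. The sketch below, which assumes only the Python standard library, compares a Monte Carlo estimate of the left-hand side of (4.5) with the exact right-hand side $P(W_t > 2b - x)$, and compares the exact value of $P(W_s \in [a, b])$ with the bound $|b - a|/\sqrt{2\pi s}$ used in Proposition 4.5. The paths are Gaussian random walks on a grid, so the function names, grid size, number of paths, and seed are illustrative choices rather than anything from the text. Because the maximum is only inspected at grid points, the simulated left-hand side is biased slightly downward.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def reflection_rhs(b, x, t):
    """Exact right-hand side of (4.5): P(W_t > 2b - x), where W_t ~ N(0, t)."""
    return 1.0 - normal_cdf((2.0 * b - x) / math.sqrt(t))


def reflection_lhs_mc(b, x, t, n_paths=2000, n_steps=400, seed=1):
    """Monte Carlo estimate of P(sup_{s<=t} W_s >= b, W_t < x).

    Each path is a Gaussian random walk with n_steps increments of variance
    t / n_steps.  Only grid points are inspected, so excursions above b
    between grid points are missed and the estimate is slightly low.
    """
    rng = random.Random(seed)
    step_sd = math.sqrt(t / n_steps)
    hits = 0
    for _ in range(n_paths):
        w = 0.0
        running_max = 0.0  # W_0 = 0 is part of the supremum
        for _ in range(n_steps):
            w += rng.gauss(0.0, step_sd)
            if w > running_max:
                running_max = w
        if running_max >= b and w < x:
            hits += 1
    return hits / n_paths


def interval_prob_and_bound(a, b, s):
    """Exact P(W_s in [a, b]) and the bound |b - a| / sqrt(2 pi s) of Prop. 4.5."""
    sd = math.sqrt(s)
    exact = normal_cdf(b / sd) - normal_cdf(a / sd)
    bound = abs(b - a) / math.sqrt(2.0 * math.pi * s)
    return exact, bound


if __name__ == "__main__":
    print(reflection_lhs_mc(1.0, 0.5, 1.0), reflection_rhs(1.0, 0.5, 1.0))
    print(interval_prob_and_bound(-0.1, 0.1, 0.5))
```

Refining the grid (larger `n_steps`) shrinks the downward bias of the simulated maximum, at the cost of run time.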
Exercises

4.1 If $W$ is a Brownian motion, let $S_t = \sup_{s\le t} W_s$. Find the density for $S_t$.

4.2 With $W$ and $S$ as in Exercise 4.1, find the joint density of $(S_t, W_t)$.

4.3 Let $W$ be a Brownian motion started at $a > 0$ and let $T_0$ be the first time $W$ hits 0. Find the law of $\sup_{t\le T_0} W_t$.

4.4 Use the reflection principle to prove that if $W$ is a Brownian motion and $T = \inf\{t > 0 : W_t \in (0, \infty)\}$, then $P(T = 0) = 1$. In other words, Brownian motion enters the interval $(0, \infty)$ immediately. By symmetry it enters the interval $(-\infty, 0)$ immediately. Conclude that Brownian motion hits 0 infinitely often in every time interval $[0, t]$.

4.5 Let $W_t$ be a Brownian motion and $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $W$. Let $T = \inf\{t > 0 : W_t = \sup_{0\le s\le 1} W_s\}$. Show that $T$ is not a stopping time with respect to $\{\mathcal F_t\}$.

4.6 Let $W$ and $S$ be as in Exercise 4.1. (1) Let $0 < s < t < u$ and let $a < b$ with $b - a \le 1$. Show that there exists a constant $c$, depending on $s$, $t$, and $u$, but not on $a$ or $b$, such that
$$P\Big(S_s \in [a, b],\ \sup_{t\le r\le u} W_r \in [a, b]\Big) \le c(b - a)^2.$$
(2) Show that the path of a Brownian motion does not take on the same value as a local maximum twice. That is, if $S$ and $T$ are two distinct times at which $W$ has a local maximum, then $W_S \ne W_T$, a.s.

4.7 Let $V_t$ be the number of upcrossings of $[0, 1]$ by a Brownian motion $W$ up to time $t$. This means we let $S_1 = 0$, $T_i = \inf\{t > S_i : W_t \ge 1\}$, and $S_{i+1} = \inf\{t > T_i : W_t \le 0\}$ for $i = 1, 2, \dots$, and we set $V_t = \sup\{k : T_k \le t\}$. Show that $V_t \to \infty$, a.s., as $t \to \infty$.

4.8 Let $W$ be a Brownian motion. The zero set of Brownian motion is the random set $Z(\omega) = \{t \in [0, 1] : W_t(\omega) = 0\}$. (1) Show that $Z(\omega)$ is a closed set for each $\omega$. (2) Show that with probability one, every point of $Z(\omega)$ is a limit point of $Z(\omega)$. Conclude that $Z(\omega)$ is an uncountable set.
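4.9 Let $W$ be a one-dimensional Brownian motion and $\delta > 0$.
(1) Prove that there exists $\gamma$ such that if $t \le \gamma$, then
$$P(0 \le W_t \le \delta/2) \ge 1/4 \quad\text{and}\quad P(-\delta/2 \le W_t \le 0) \ge 1/4.$$
(2) Prove there exists $\gamma$ such that $P(\sup_{s\le\gamma} |W_s| > \delta/2) \le 1/8$.
(3) Prove that if $m \ge 1$, then
$$P\Big(\sup_{m\gamma\le s\le(m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ W_{m\gamma} \in [0, \delta/2],\ |W_{(m+1)\gamma}| \le \delta/2 \,\Big|\, \mathcal F_{m\gamma}\Big) \ge \frac18\, P\Big(\sup_{m\gamma\le s\le(m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ W_{m\gamma} \in [0, \delta/2]\Big)$$
and the same with $W_{m\gamma} \in [-\delta/2, 0]$ in place of $W_{m\gamma} \in [0, \delta/2]$. Conclude that
$$P\Big(\sup_{m\gamma\le s\le(m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ |W_{m\gamma}| \le \delta/2,\ |W_{(m+1)\gamma}| \le \delta/2 \,\Big|\, \mathcal F_{m\gamma}\Big) \ge \frac18\, P\Big(\sup_{m\gamma\le s\le(m+1)\gamma} |W_s - W_{m\gamma}| \le \delta/2,\ |W_{m\gamma}| \le \delta/2\Big).$$
(4) Use induction to prove that if $t_0 > 0$, there exists $c_1 > 0$ such that $P(\sup_{s\le t_0} |W_s| \le \delta) > c_1$.
(5) Prove that if $W$ is a $d$-dimensional Brownian motion, $t_0 > 0$, and $\delta > 0$, there exists $c_2 > 0$ such that $P(\sup_{s\le t_0} |W_s| \le \delta) > c_2$.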
4.10 The $p$-variation of a function $f$ on the interval $[0, 1]$ is defined by
$$V^p(f) = \sup\Big\{ \sum_{i=0}^{n-1} |f(t_{i+1}) - f(t_i)|^p : n \ge 1,\ 0 = t_0 < t_1 < \cdots < t_n = 1 \Big\};$$
the supremum is over all partitions of $[0, 1]$. In this exercise we will prove that if $p < 2$ and $W$ is a Brownian motion, then $V^p(W) = \infty$, a.s.
(1) Let $X_i$ be an i.i.d. sequence of random variables with finite mean. Use the strong law of large numbers to prove that if $K > E X_1$, then
$$P\Big(\sum_{i=1}^n X_i > Kn\Big) \to 0$$
as $n \to \infty$.
(2) If $p < 2$, take $r \in (p, 2)$, and let $\varepsilon_n = n^{-1/r}$. Let $S_0 = 0$ and for $i \ge 0$, set $S_{i+1} = \inf\{t > S_i : |W_t - W_{S_i}| > \varepsilon_n\}$. Set $X_i = \varepsilon_n^{-2}(S_i - S_{i-1})$ for $i \ge 1$. Prove that the $X_i$ are i.i.d. with finite mean.
(3) Use (1) to show that
$$P(S_n > 1) = P\Big(\sum_{i=1}^n X_i > \varepsilon_n^{-2}\Big) \to 0$$
as $n \to \infty$.
(4) Using the partition $\{S_0, S_1, \dots, S_n\}$, show that $V^p(W) \ge n\varepsilon_n^p$ on the event $(S_n \le 1)$.
(5) Conclude $V^p(W) = \infty$, a.s.
5 The Poisson process
At the opposite extreme from Brownian motion is the Poisson process. This is a process that changes value only by means of jumps, and even then, the jumps are nicely spaced. The Poisson process is the prototype of a pure jump process, and later we will see that it is the building block for an important class of stochastic processes known as Lévy processes.

Definition 5.1 Let $\{\mathcal F_t\}$ be a filtration, not necessarily satisfying the usual conditions. A Poisson process with parameter $\lambda > 0$ is a stochastic process $X$ satisfying the following properties:
(1) $X_0 = 0$, a.s.
(2) The paths of $X_t$ are right continuous with left limits.
(3) If $s < t$, then $X_t - X_s$ is a Poisson random variable with parameter $\lambda(t - s)$.
(4) If $s < t$, then $X_t - X_s$ is independent of $\mathcal F_s$.

Define $X_{t-} = \lim_{s\to t,\, s<t}$ … 0. If there were a jump of size 2 or larger at some time $t$ strictly less than $t_0$, then for each $n$ sufficiently large there
exists $0 \le k_n \le 2^n$ such that $X_{(k_n+1)t_0/2^n} - X_{k_n t_0/2^n} \ge 2$. Therefore
$$\begin{aligned}
P(\exists\, s < t_0 : X_s - X_{s-} \ge 2) &\le P(\exists\, k \le 2^n : X_{(k+1)t_0/2^n} - X_{kt_0/2^n} \ge 2) \\
&\le 2^n \sup_{k\le 2^n} P(X_{(k+1)t_0/2^n} - X_{kt_0/2^n} \ge 2) \\
&= 2^n P(X_{t_0/2^n} \ge 2) \\
&\le 2^n \big(1 - P(X_{t_0/2^n} = 0) - P(X_{t_0/2^n} = 1)\big) \\
&= 2^n \big(1 - e^{-\lambda t_0/2^n} - (\lambda t_0/2^n)\, e^{-\lambda t_0/2^n}\big). \qquad (5.1)
\end{aligned}$$
We used property 5.1(3) for the two equalities. By l’Hôpital’s rule, $(1 - e^{-x} - xe^{-x})/x \to 0$ as $x \to 0$. We apply this with $x = \lambda t_0/2^n$, noting that $2^n = \lambda t_0/x$, and see that the last line of (5.1) tends to 0 as $n \to \infty$. Since the left-hand side of (5.1) does not depend on $n$, it must be 0. This holds for each $t_0$.

Another characterization of the Poisson process is as follows. Let $T_1 = \inf\{t : X_t = 1\}$, the time of the first jump. Define $T_{i+1} = \inf\{t > T_i : X_t - X_{t-} = 1\}$, so that $T_i$ is the time of the $i$th jump.

Proposition 5.3 The random variables $T_1, T_2 - T_1, \dots, T_{i+1} - T_i, \dots$ are independent exponential random variables with parameter $\lambda$.

Proof In view of Corollary 4.3 it suffices to show that $T_1$ is an exponential random variable with parameter $\lambda$. If $T_1 > t$, then the first jump has not occurred by time $t$, so $X_t$ is still zero, and conversely. Hence
$$P(T_1 > t) = P(X_t = 0) = e^{-\lambda t},$$
using the fact that $X_t$ is a Poisson random variable with parameter $\lambda t$.

We can reverse the characterization in Proposition 5.3 to construct a Poisson process. We do one step of the construction, leaving the rest as Exercise 5.4. Let $U_1, U_2, \dots$ be independent exponential random variables with parameter $\lambda$ and let $T_j = \sum_{i=1}^{j} U_i$, with $T_0 = 0$. Define
$$X_t(\omega) = k \quad \text{if } T_k(\omega) \le t < T_{k+1}(\omega). \qquad (5.2)$$
An examination of the densities shows that an exponential random variable has a gamma distribution with parameters $\lambda$ and $r = 1$, so by Proposition A.49, $T_j$ is a gamma random variable with parameters $\lambda$ and $j$. Thus
$$P(X_t < k) = P(T_k > t) = \int_t^\infty \frac{\lambda e^{-\lambda x} (\lambda x)^{k-1}}{\Gamma(k)}\, dx.$$
Performing the integration by parts repeatedly shows that
$$P(X_t < k) = \sum_{i=0}^{k-1} e^{-\lambda t} \frac{(\lambda t)^i}{i!},$$
and so $X_t$ is a Poisson random variable with parameter $\lambda t$. We will use the following proposition later.
Proposition 5.4 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Suppose $X_0 = 0$, a.s., $X$ has paths that are right continuous with left limits, $X_t - X_s$ is independent of $\mathcal F_s$ if $s < t$, and $X_t - X_s$ has the same law as $X_{t-s}$ whenever $s < t$. If the paths of $X$ are piecewise constant and increasing, all the jumps of $X$ are of size 1, and $X$ is not identically 0, then $X$ is a Poisson process.

Proof Let $T_0 = 0$ and $T_{i+1} = \inf\{t > T_i : X_t - X_{t-} = 1\}$, $i = 0, 1, 2, \dots$. We will show that if we set $U_i = T_i - T_{i-1}$, then the $U_i$ are i.i.d. exponential random variables, and then appeal to Exercise 5.4. By Corollary 4.3, the $U_i$ are independent and have the same law. Hence it suffices to show $U_1$ is an exponential random variable. We observe
$$\begin{aligned}
P(U_1 > s + t) &= P(X_{s+t} = 0) = P(X_{s+t} - X_s = 0,\ X_s = 0) \\
&= P(X_{s+t} - X_s = 0)\, P(X_s = 0) = P(X_t = 0)\, P(X_s = 0) = P(U_1 > t)\, P(U_1 > s).
\end{aligned}$$
Setting $f(t) = P(U_1 > t)$, we thus have $f(t + s) = f(t) f(s)$. Since $f$ is decreasing and $0 < f(t) < 1$, we conclude $P(U_1 > t) = f(t) = e^{-\lambda t}$ for some $\lambda > 0$, or $U_1$ is an exponential random variable.
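The construction (5.2) and the integration-by-parts identity for $P(X_t < k)$ given before Proposition 5.4 lend themselves to a quick numerical check. The sketch below, which assumes only the Python standard library, builds $X_t$ from exponential interarrival times as in (5.2), estimates $E X_t$ by simulation, and compares a numerical integral of the gamma density for $P(T_k > t)$ with the Poisson sum $\sum_{i<k} e^{-\lambda t}(\lambda t)^i/i!$. The helper names, the Simpson's-rule discretization, and the truncation point of the integral are illustrative choices.

```python
import math
import random


def poisson_value_from_gaps(t, lam, rng):
    """X_t as in (5.2): the number of partial sums T_j = U_1 + ... + U_j
    of i.i.d. exponential(lam) variables U_i with T_j <= t."""
    count = 0
    arrival = rng.expovariate(lam)  # T_1
    while arrival <= t:
        count += 1
        arrival += rng.expovariate(lam)  # T_{count+1}
    return count


def mean_of_simulated_values(t, lam, n_samples=20000, seed=3):
    """Sample mean of X_t over independent copies; compare with lam * t."""
    rng = random.Random(seed)
    total = sum(poisson_value_from_gaps(t, lam, rng) for _ in range(n_samples))
    return total / n_samples


def gamma_tail(k, lam, t, upper=60.0, n=20000):
    """P(T_k > t) for T_k ~ gamma(lam, k), by Simpson's rule on [t, upper].

    The mass beyond `upper` is ignored, which is harmless when lam * upper
    is large.  n must be even.
    """
    def density(x):
        return lam * math.exp(-lam * x) * (lam * x) ** (k - 1) / math.gamma(k)

    h = (upper - t) / n
    total = density(t) + density(upper)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * density(t + i * h)
    return total * h / 3.0


def poisson_sum(k, lam, t):
    """sum_{i=0}^{k-1} e^{-lam t} (lam t)^i / i!, that is, P(X_t < k)."""
    return sum(math.exp(-lam * t) * (lam * t) ** i / math.factorial(i)
               for i in range(k))


if __name__ == "__main__":
    print(mean_of_simulated_values(1.5, 2.0))  # compare with lam * t = 3.0
    print(gamma_tail(4, 2.0, 1.5), poisson_sum(4, 2.0, 1.5))
```

The same helper `poisson_value_from_gaps` can be called at several times with one fixed sequence of gaps to produce a whole path, which is the content of Exercise 5.4.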
Exercises

5.1 Suppose $P_t$ is a Poisson process and we write $X_t = P_{t-}$. Is $P_1 - X_{1-t}$ a Poisson process on $[0, 1]$? Why or why not?

5.2 Let $P$ be a Poisson process with parameter $\lambda$. Show that
$$\lim_{n\to\infty} \sup_{t\le 1} \Big| \frac{P_{nt}}{n} - \lambda t \Big| = 0, \qquad \text{a.s.}$$

5.3 Show that if $P^{(1)}$ and $P^{(2)}$ are independent Poisson processes with parameters $\lambda_1$ and $\lambda_2$, respectively, then $P_t^{(1)} + P_t^{(2)}$ is a Poisson process with parameter $\lambda_1 + \lambda_2$.

5.4 If $X$ is defined by (5.2), show that $X$ is a Poisson process.

5.5 Let $X_t$ be a stochastic process and let $\{\mathcal F_t^{00}\}$ be the filtration generated by $X$. Suppose $X$ is a Poisson process with respect to the filtration $\{\mathcal F_t^{00}\}$. Show that $X$ is a Poisson process with respect to the minimal augmented filtration generated by $X$. Hint: Imitate the proof of Proposition 2.5.

5.6 Suppose $P_t$ is a Poisson process with parameter $\lambda$, and $f$ and $g$ are non-negative bounded deterministic functions with compact support. Find necessary and sufficient conditions on $f$ and $g$ so that $\int_0^\infty f(s)\, dP_s$ and $\int_0^\infty g(s)\, dP_s$ are independent. Hint: First show that the characteristic function of $F = \int_0^\infty f(s)\, dP_s$ is
$$E\, e^{iuF} = \exp\Big( \lambda \int_0^\infty \big(e^{iu f(s)} - 1\big)\, ds \Big).$$
5.7 We will talk about weak convergence in general metric spaces in Chapters 30–35. This exercise is concerned with the weak convergence of real-valued random variables as defined in Section A.12. Suppose for each $n$, $P_n$ is a Poisson random variable with parameter $\lambda_n$ and $\lambda_n \to \infty$ as $n \to \infty$. Prove that
$$\frac{P_n - \lambda_n}{\sqrt{\lambda_n}}$$
converges weakly to a normal random variable with mean zero and variance one. Hint: Imitate the proof of Theorem A.51.
6 Construction of Brownian motion
There are several ways of constructing Brownian motion, none of them easy. Here we give two constructions. The first is the one that Wiener used, which is based on Fourier series. The second uses martingale techniques. A method due to Lévy can be found in Bass (1995); see also Exercises 6.4 and 6.5. We will see several other constructions in later chapters.
6.1 Wiener’s construction

For any of the constructions of Brownian motion, the main step is to construct $W_t$ for $t \in [0, 1]$. Once we have done this, we get Brownian motion for all $t$ rather easily. More specifically, suppose we have a Brownian motion $Y^{(0)}$ started at 0 on the time interval $[0, 1]$. Take independent copies $Y^{(1)}, Y^{(2)}, \dots$, each on $[0, 1]$. We have $Y_0^{(i)} = 0$ for each $i$, and now to get Brownian motion started at 0, define $W_t$ to be equal to $Y_t^{(0)}$ if $t \le 1$, equal to $Y_1^{(0)} + Y_{t-1}^{(1)}$ if $1 < t \le 2$, and more generally
$$W_t = \sum_{i=0}^{[t]-1} Y_1^{(i)} + Y_{t-[t]}^{([t])}$$
if $t \ge 1$, where $[t]$ is the largest integer less than or equal to $t$. This will give Brownian motion started at 0 on the time interval $[0, \infty)$. Therefore the crux of the problem is to construct Brownian motion on $[0, 1]$. Because we are working with Fourier series, it is more convenient to look at Brownian motion on $[0, \pi]$; we can just disregard times between 1 and $\pi$ when we are done.

Throughout this chapter we make the supposition that we can find a countable sequence $Z_1, Z_2, \dots$ of independent and identically distributed mean zero normal random variables with variance one that are $\mathcal F$ measurable, where $(\Omega, \mathcal F, P)$ is our probability space. This is an extremely mild condition.

Theorem 6.1 There exists a process $\{W_t;\ 0 \le t \le 1\}$ that is Brownian motion.

Proof If we fix $t \in [0, \pi]$ and compute the Fourier series for the function $f(s) = s \wedge t$, it is an exercise in calculus to get the Fourier coefficients. We end up with
$$s \wedge t = \frac{st}{\pi} + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{\sin ks \sin kt}{k^2}. \qquad (6.1)$$
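One way to carry out the calculus behind (6.1) is sketched here. For fixed $t \in [0, \pi]$, the function $g(s) = s \wedge t - st/\pi$ is continuous, piecewise linear, and vanishes at $s = 0$ and $s = \pi$, so it is represented on $[0, \pi]$ by its Fourier sine series $g(s) = \sum_{k\ge 1} b_k \sin ks$ with $b_k = \frac{2}{\pi} \int_0^\pi g(s) \sin ks\, ds$. Integrating by parts,
$$\int_0^\pi (s \wedge t) \sin ks\, ds = \int_0^t s \sin ks\, ds + t \int_t^\pi \sin ks\, ds = \Big( -\frac{t \cos kt}{k} + \frac{\sin kt}{k^2} \Big) + \frac{t(\cos kt - \cos k\pi)}{k} = \frac{\sin kt}{k^2} - \frac{t \cos k\pi}{k},$$
$$\int_0^\pi \frac{st}{\pi} \sin ks\, ds = \frac{t}{\pi} \Big( -\frac{\pi \cos k\pi}{k} \Big) = -\frac{t \cos k\pi}{k}.$$
Subtracting, $\int_0^\pi g(s) \sin ks\, ds = \sin kt / k^2$, so $b_k = \frac{2}{\pi} \frac{\sin kt}{k^2}$, and therefore
$$s \wedge t = \frac{st}{\pi} + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{\sin ks \sin kt}{k^2}.$$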
This suggests letting $Z_0, Z_1, \dots$ be i.i.d. normal random variables with mean 0 and variance 1 and setting
$$W_t = \frac{t}{\sqrt\pi}\, Z_0 + \sqrt{\frac{2}{\pi}} \sum_{k=1}^{\infty} \frac{\sin kt}{k}\, Z_k. \qquad (6.2)$$
Assuming there is no problem with convergence, we see that $W_t$ has mean zero, since each of the $Z_i$ does, and that
$$E[W_s W_t] = \frac{st}{\pi} + \frac{2}{\pi} \sum_{k=1}^{\infty} \frac{\sin ks \sin kt}{k^2} = s \wedge t \qquad (6.3)$$
as required. We used the independence of the $Z_i$ here to show that $E[Z_i Z_j] = 0$ if $i \ne j$.

We argue that there is in fact no difficulty with the convergence. Note that $\sum_{k=1}^{m} \sin^2 kt / k^2$ increases to a finite limit as $m$ increases. Therefore
$$E\Big( \sum_{k=m}^{n} \frac{\sin kt}{k}\, Z_k \Big)^2 = \sum_{k=m}^{n} \frac{\sin^2 kt}{k^2} \to 0$$
as $m, n \to \infty$. This means that the partial sums of the series on the right of (6.2) form a Cauchy sequence in $L^2$. By the completeness of $L^2$, the sum on the right of (6.2) converges in $L^2$. A use of the Cauchy–Schwarz inequality allows us to justify the formula for the expectation of $W_s W_t$. If we let
$$W_t^j = \frac{t}{\sqrt\pi}\, Z_0 + \sqrt{\frac{2}{\pi}} \sum_{k=1}^{j} \frac{\sin kt}{k}\, Z_k,$$
then $(W_{t_1}^j, \dots, W_{t_m}^j)$ is a jointly normal collection of random variables for each $j$ whenever $t_1, \dots, t_m \in [0, \pi]$. By Remark A.56, it follows that $(W_{t_1}, \dots, W_{t_m})$ is a jointly normal collection of random variables. Therefore $W_t$ is a Gaussian process. Since each $W_t$ has mean zero and $\mathrm{Cov}(W_s, W_t) = s \wedge t$, $W_t$ has the correct finite-dimensional distributions to be a Brownian motion.

The only part remaining in the construction is to show that $W_t$ as constructed above has continuous paths, for we can then use Theorem 2.4. In what follows, pay attention to where the absolute values are placed; if one is cavalier about placing them, one will very likely run into trouble. Define
$$S_m(t) = \sum_{k=m}^{2m-1} \frac{\sin kt}{k}\, Z_k$$
and let $T_m = \sup_{0\le t\le\pi} |S_m(t)|$. We write
$$W_t = \frac{t}{\sqrt\pi}\, Z_0 + \sqrt{\frac{2}{\pi}} \sum_{n=0}^{\infty} S_{2^n}(t).$$
We will show
$$E\, T_m^2 \le \frac{c}{m^{1/2}}. \qquad (6.4)$$
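Once we have this, then by the Fubini theorem and then Jensen’s inequality,
$$E \sum_{n=0}^{\infty} T_{2^n} = \sum_{n=0}^{\infty} E\, T_{2^n} \le \sum_{n=0}^{\infty} \big(E[T_{2^n}^2]\big)^{1/2} < \infty.$$
Therefore $\sum_{n=0}^{\infty} T_{2^n} < \infty$, a.s., and by the Weierstrass M-test (see, e.g., Rudin, 1976), with probability 1, $\sum_{n=0}^{\infty} S_{2^n}(t)$ converges uniformly in $t$. Since each $S_{2^n}(t)$ is a continuous function of $t$, the uniform limit is also continuous and we are done.

We therefore have to prove (6.4). Since $\sin kt$ is the imaginary part of $e^{ikt}$, and using $\big|\sum_k a_k\big|^2 = \sum_{j,k} a_k \overline{a_j}$ for $a_k$ complex valued, we have
$$\begin{aligned}
T_m^2 &\le \sup_{0\le t\le\pi} \Big| \sum_{k=m}^{2m-1} \frac{e^{ikt}}{k}\, Z_k \Big|^2
\le \sup_{0\le t\le\pi} \Big| \sum_{j,k=m}^{2m-1} \frac{e^{ikt} e^{-ijt}}{jk}\, Z_j Z_k \Big| \\
&\le \sum_{k=m}^{2m-1} \frac{Z_k^2}{k^2} + 2 \sum_{\ell=1}^{m-1} \sup_{0\le t\le\pi} \Big| e^{i\ell t} \sum_{j=m}^{2m-\ell-1} \frac{Z_j Z_{j+\ell}}{j(j+\ell)} \Big| \\
&\le \sum_{k=m}^{2m-1} \frac{Z_k^2}{k^2} + 2 \sum_{\ell=1}^{m-1} \Big| \sum_{j=m}^{2m-\ell-1} \frac{Z_j Z_{j+\ell}}{j(j+\ell)} \Big|. \qquad (6.5)
\end{aligned}$$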
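In the third inequality we wrote
$$\sum_{j,k=m}^{2m-1} = \sum_{m\le j=k\le 2m-1} + \sum_{m\le j<k\le 2m-1} + \sum_{m\le k<j\le 2m-1},$$
noted that the last sum is the complex conjugate of the middle one, so that together they have modulus at most twice that of the middle one, and then set $\ell = k - j$. Write $I$ for the first sum on the last line of (6.5) and
$$J_\ell = \sum_{j=m}^{2m-\ell-1} \frac{Z_j Z_{j+\ell}}{j(j+\ell)}.$$
The expectation of $I$ is equal to
$$\sum_{k=m}^{2m-1} \frac{1}{k^2} \le \frac{c}{m}.$$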
We next look at the expectation of $|J_\ell|$. Since the $Z_i$ are mean zero and independent, $E[Z_{i_1} Z_{i_2} Z_{i_3} Z_{i_4}]$ is zero unless either all four subscripts are equal or else two subscripts are equal and the other two subscripts are also equal. By Jensen’s inequality,
$$E|J_\ell| \le \Big( E \Big| \sum_{j=m}^{2m-\ell-1} \frac{Z_j Z_{j+\ell}}{j(j+\ell)} \Big|^2 \Big)^{1/2} = \Big( \sum_{j=m}^{2m-\ell-1} \frac{1}{j^2 (j+\ell)^2} \Big)^{1/2}. \qquad (6.6)$$
The last equality follows by multiplying out
$$\Big( \sum_{j=m}^{2m-\ell-1} \frac{Z_j Z_{j+\ell}}{j(j+\ell)} \Big)^2$$
and noting that the expectations of the cross-product terms are zero. Since $j \ge m$ in the last line of (6.6) and there are at most $m$ terms in the sum, the last line of (6.6) is bounded by $(cm/m^4)^{1/2} = cm^{-3/2}$. Therefore
$$E \sum_{\ell=1}^{m-1} |J_\ell| \le \frac{c}{m^{1/2}}.$$
Substituting in (6.5) completes the proof of (6.4). By Proposition 2.5, the Brownian motion that we constructed is a Brownian motion with respect to the minimal augmented filtration.
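A short numerical sketch of Wiener's series may help make the construction concrete. It assumes only the Python standard library, and the function names and truncation levels are illustrative. The first function evaluates partial sums of the covariance series in (6.1) and (6.3), which should approach $s \wedge t$; the second evaluates one truncated sample path of (6.2) on a list of times in $[0, \pi]$, using a single draw of $Z_0, \dots, Z_n$ shared by all times, as the construction requires.

```python
import math
import random


def wiener_covariance(s, t, n_terms):
    """Partial sum st/pi + (2/pi) sum_{k<=n_terms} sin(ks) sin(kt) / k^2 of (6.1)."""
    total = s * t / math.pi
    for k in range(1, n_terms + 1):
        total += 2.0 / math.pi * math.sin(k * s) * math.sin(k * t) / k**2
    return total


def wiener_path(times, n_terms, seed=0):
    """Truncation of the series (6.2) after n_terms sine terms.

    One sequence Z_0, ..., Z_{n_terms} is drawn and used at every time in
    `times`, so the returned list approximates a single path.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_terms + 1)]
    path = []
    for t in times:
        value = t / math.sqrt(math.pi) * z[0]
        value += math.sqrt(2.0 / math.pi) * sum(
            math.sin(k * t) / k * z[k] for k in range(1, n_terms + 1))
        path.append(value)
    return path


if __name__ == "__main__":
    print(wiener_covariance(0.3, 0.7, 20000))  # compare with 0.3 ∧ 0.7 = 0.3
    grid = [math.pi * i / 100 for i in range(101)]
    print(wiener_path(grid, 500)[:5])
```

The truncation error in the covariance is at most $\frac{2}{\pi}\sum_{k>n} k^{-2}$, which is why a few thousand terms already give several correct digits; the uniform convergence of the path itself is the harder statement proved above.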
6.2 Martingale methods

Here, we use martingale methods to take care of the continuity of the paths. We proceed as in the previous section to construct $\{W_t;\ 0 \le t \le \pi\}$, where $W_t$ is a Gaussian process with $E W_t = 0$ and $\mathrm{Cov}(W_s, W_t) = s \wedge t$, and we need to show that $W$ has a version with continuous paths. We show that $W$ is a martingale, and so has a version with paths that are right continuous with left limits. We use Doob’s inequalities to control the oscillation of $W$ over short time intervals, and then use the Borel–Cantelli lemma to show continuity.

Theorem 6.2 If $\{W_t;\ t \le 1\}$ is a Gaussian process with $E W_t = 0$ for all $t \le 1$ and $\mathrm{Cov}(W_s, W_t) = s \wedge t$ for all $s, t \le 1$, then there is a version of $W$ that is a Brownian motion on $[0, 1]$.

Proof As in the proof of Theorem 6.1, we need to show that $W$ has a version with continuous paths. Since $\mathrm{Cov}(W_t - W_s, W_r) = r - r = 0$ if $r \le s < t$, we see by Proposition A.55 that $W_t - W_s$ is independent of $\mathcal F_s^{00} = \sigma(W_r;\ r \le s)$. Then
$$E[W_t - W_s \mid \mathcal F_s^{00}] = E[W_t - W_s] = 0,$$
so $W_t$ is a martingale. By Theorem 3.12, with probability one, $W$ has left and right limits along $D$, the dyadic rationals. Let $W_t' = \lim_{u>t,\, u\in D,\, u\to t} W_u$. Since $E(W_u - W_t)^2 = u - t \to 0$ as $u \to t$, we have $W_t' = W_t$, a.s., so $W'$ is a version of $W$ with paths that are right continuous with left limits. We now drop the primes. Set $W_t = W_1$ if $t \ge 1$. For any $t_0 \in [0, 1]$, $W_{t+t_0} - W_{t_0}$ is also a martingale, and by Jensen’s inequality for conditional expectations (Proposition A.21), $|W_{t+t_0} - W_{t_0}|^4$ is a submartingale. Using Doob’s inequalities (Theorem 3.6), if $\lambda > 0$ and $t_0, \delta \in [0, 1]$,
$$P\Big( \sup_{t_0\le t\le t_0+\delta} |W_t - W_{t_0}| \ge \lambda \Big) = P\Big( \sup_{t_0\le t\le t_0+\delta} |W_t - W_{t_0}|^4 \ge \lambda^4 \Big) \le c\, \frac{E|W_{t_0+\delta} - W_{t_0}|^4}{\lambda^4}.$$
Since $W_{t_0+\delta} - W_{t_0}$ is a mean zero normal random variable with variance $\delta$ if $t_0 + \delta \le 1$ (and variance $1 - t_0 \le \delta$ otherwise), its fourth moment is at most $3\delta^2$, and we have
$$P\Big( \sup_{t_0\le t\le t_0+\delta} |W_t - W_{t_0}| \ge \lambda \Big) \le c\, \frac{\delta^2}{\lambda^4}. \qquad (6.7)$$
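Let
$$A_n = \Big\{ \exists\, k \le 2^n : \sup_{k/2^n \le t \le (k+2)/2^n} |W_t - W_{k/2^n}| > 2^{-n/8} \Big\}.$$
From (6.7) with $\delta = 2^{-n+1}$ and $\lambda = 2^{-n/8}$,
$$P(A_n) \le 2^n \max_{k\le 2^n} P\Big( \sup_{k/2^n \le t \le (k+2)/2^n} |W_t - W_{k/2^n}| > 2^{-n/8} \Big) \le \frac{c\, 2^n\, 2^{-2n}}{2^{-n/2}} = c\, 2^{-n/2},$$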
which is summable. By the Borel–Cantelli lemma, $P(A_n \text{ i.o.}) = 0$. (The event $(A_n \text{ i.o.})$ is the event where $\omega$ is in infinitely many of the $A_n$.) Except for a set of $\omega$’s in a null set, there exists a positive integer $N$ (which will depend on $\omega$) such that if $n \ge N$, then $\omega \notin A_n$. Given $\varepsilon > 0$, take $n \ge N$ such that $2^{-n/8} < \varepsilon/2$. If $|t - s| \le 2^{-n}$ with $s, t \in [0, 1]$, then $s, t \in [k/2^n, (k+2)/2^n]$ for some $k \le 2^n$. Since $\omega \notin A_n$,
$$|W_t - W_s| \le |W_t - W_{k/2^n}| + |W_s - W_{k/2^n}| \le 2 \cdot 2^{-n/8} < \varepsilon.$$
This proves the continuity of $W_t$.

There is nothing special about the trigonometric polynomials in this second construction. Let $\langle f, g \rangle = \int_0^1 f(r) g(r)\, dr$ be the inner product for the Hilbert space $L^2[0, 1]$; we consider only real-valued functions for simplicity. Let $\{\varphi_n\}$ be a complete orthonormal system for $L^2[0, 1]$: we have $\langle \varphi_m, \varphi_n \rangle = 0$ if $m \ne n$, $\langle \varphi_n, \varphi_n \rangle = 1$ for each $n$, and $f = 0$, a.e., if $\langle f, \varphi_n \rangle = 0$ for all $n$. One property of a complete orthonormal system is Parseval’s identity, which says that
$$\langle f, f \rangle = \sum_{n=1}^{\infty} |\langle f, \varphi_n \rangle|^2;$$
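see Folland (1999). If we replace $f$ by $g$ and then by $f + g$ and use $\langle f, g \rangle = \tfrac12 \big[ \langle f+g, f+g \rangle - \langle f, f \rangle - \langle g, g \rangle \big]$, we obtain
$$\langle f, g \rangle = \sum_{n=1}^{\infty} \langle f, \varphi_n \rangle \langle g, \varphi_n \rangle.$$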
Now let
$$a_n(t) = \langle 1_{[0,t]}, \varphi_n \rangle = \int_0^t \varphi_n(r)\, dr.$$
If $Z_1, Z_2, \dots$ are independent mean zero normal random variables with variance one, let
$$W_t = \sum_{n=1}^{\infty} a_n(t)\, Z_n. \qquad (6.8)$$
Assuming there is no difficulty with the convergence, we have
$$\mathrm{Cov}(W_s, W_t) = \sum_{n=1}^{\infty} a_n(s)\, a_n(t) = \sum_{n=1}^{\infty} \langle 1_{[0,s]}, \varphi_n \rangle \langle 1_{[0,t]}, \varphi_n \rangle = \langle 1_{[0,s]}, 1_{[0,t]} \rangle = s \wedge t.$$
Exercise 6.2 asks you to verify that the process $W$ defined by (6.8) is a mean zero Gaussian process on $[0, 1]$ with the same covariances as a Brownian motion.
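As an illustration of (6.8) with a basis other than the trigonometric one, the following sketch uses the Haar system described in Exercise 6.4 below, for which each $a_n(t) = \int_0^t \varphi_n$ is a "tent" function. It assumes only the Python standard library; the function names and truncation levels are illustrative. Because the Haar functions up to level $L$ span the step functions on dyadic intervals of length $2^{-L}$, the partial sum of $\sum_n a_n(s) a_n(t)$ through level $L$ should already equal $s \wedge t$, up to rounding, when $s$ or $t$ is a multiple of $2^{-L}$.

```python
import random


def schauder(i, j, t):
    """Integral from 0 to t of the Haar function phi_ij of Exercise 6.4.

    (0, 0) gives t; for i >= 1 it is a tent of height 2^{(-i-1)/2}
    supported on [(2j-2)/2^i, 2j/2^i].
    """
    if i == 0:
        return t
    left, mid, right = (2 * j - 2) / 2**i, (2 * j - 1) / 2**i, (2 * j) / 2**i
    scale = 2 ** ((i - 1) / 2)
    if left <= t <= mid:
        return scale * (t - left)
    if mid < t <= right:
        return scale * (right - t)
    return 0.0


def basis(levels):
    """Index pairs (i, j) of the Haar functions up to the given level."""
    yield (0, 0)
    for i in range(1, levels + 1):
        for j in range(1, 2 ** (i - 1) + 1):
            yield (i, j)


def covariance_partial_sum(s, t, levels):
    """Partial sum of a_n(s) a_n(t) over the Haar basis up to `levels`."""
    return sum(schauder(i, j, s) * schauder(i, j, t) for i, j in basis(levels))


def haar_path(times, levels, seed=0):
    """Truncated series sum_n a_n(t) Z_n with one Z per basis function."""
    rng = random.Random(seed)
    coeffs = {(i, j): rng.gauss(0.0, 1.0) for i, j in basis(levels)}
    return [sum(z * schauder(i, j, t) for (i, j), z in coeffs.items())
            for t in times]


if __name__ == "__main__":
    print(covariance_partial_sum(0.375, 0.8125, 4))  # compare with 0.375
    print(haar_path([k / 16 for k in range(17)], 8)[:4])
```

Unlike the sine series, the tent functions at a fixed level have disjoint supports, which is what makes the estimate in Exercise 6.4(2) possible.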
Exercises

6.1 Let $Z_0, Z_1, Z_2, \dots$ be a sequence of independent identically distributed mean zero normal random variables with variance one. Define
$$X_t = \frac{t^2}{2\sqrt\pi}\, Z_0 + \sqrt{\frac{2}{\pi}} \sum_{k=1}^{\infty} \frac{1 - \cos kt}{k^2}\, Z_k. \qquad (6.9)$$
(1) Show that the convergence in (6.9) is absolute and uniform over $t \in [0, 1]$.
(2) Show that $X_t$ is a Gaussian process.
(3) If $W_t$ is a Brownian motion and
$$Y_t = \int_0^t W_r\, dr, \qquad t \in [0, 1],$$
show that $X$ and $Y$ have the same finite-dimensional distributions. Show that $X$ and $Y$ have the same law when viewed as random variables taking values in $C[0, 1]$. (The process $X$ is sometimes known as integrated Brownian motion.)
(4) Find $\mathrm{Cov}(X_s, X_t)$.

6.2
Let $\{\varphi_n\}$ be a complete orthonormal system for $L^2[0, 1]$. Show that the sum (6.8) converges in $L^2$ and give the details of the proof that the resulting process $W$ is a mean zero Gaussian process with $\mathrm{Cov}(W_s, W_t) = s \wedge t$ if $s, t \in [0, 1]$.

6.3 Let $D = \{k/2^n : n \ge 1,\ k = 0, 1, \dots, 2^n\}$ be the dyadic rationals. Suppose the collection of random variables $\{V_t : t \in D\}$ is jointly normal, each $V_t$ has mean zero, and $\mathrm{Cov}(V_s, V_t) = s \wedge t$. (1) Prove that the paths of $V$ are uniformly continuous over $t \in D$. (2) If we define $W_t = \lim_{s\in D,\, s\to t} V_s$, prove that $W$ is a Brownian motion.

6.4 In this and the next exercise we give the Haar function construction of Brownian motion. Let $\varphi_{00} = 1$ on $[0, 1]$ and for $i = 1, 2, \dots$ and $1 \le j \le 2^{i-1}$, set
$$\varphi_{ij}(x) = \begin{cases} 2^{(i-1)/2}, & (2j-2)/2^i \le x < (2j-1)/2^i, \\ -2^{(i-1)/2}, & (2j-1)/2^i \le x < 2j/2^i, \\ 0, & \text{otherwise.} \end{cases}$$
It is a well-known and easily proved result from analysis (see, e.g., Bass (1995), Section I.2) that the collection $\{\varphi_{ij}\}$ is a complete orthonormal system for $L^2[0, 1]$. For each $i, j$, define
$$\psi_{ij}(t) = \int_0^t \varphi_{ij}(s)\, ds,$$
for each $i$ and $j$, let $Y_{ij}$ be independent mean zero normal random variables with variance one, and let
$$V_i(t) = \sum_{j=1}^{2^{i-1}} Y_{ij}\, \psi_{ij}(t)$$
for $i \ge 1$. Set $V_0 = Y_{00}\, \psi_{00}$.
(1) Fix $i \ge 1$. Prove that each $\psi_{ij}$ is bounded by $2^{(-i-1)/2}$. Prove that the sets $\{t : \psi_{ij}(t) > 0\}$, $j = 1, \dots, 2^{i-1}$, are disjoint.
(2) Fix $i \ge 1$. Write
$$P(\exists\, t \in [0, 1] : |V_i(t)| > i^{-2}) \le P(\exists\, j \le 2^{i-1} : |Y_{ij}|\, 2^{(-i-1)/2} > i^{-2}),$$
use Proposition A.52 to estimate this, and conclude that
$$\sum_{i=1}^{\infty} P\Big( \sup_{0\le t\le 1} |V_i(t)| > i^{-2} \Big) < \infty. \qquad (6.10)$$

6.5 This is a continuation of Exercise 6.4. With $\varphi_{ij}$, $\psi_{ij}$, $Y_{ij}$, and $V_i$ as in that problem, let
$$W_t = \sum_{i=0}^{\infty} V_i(t).$$
(1) Prove that $W$ is a jointly normal Gaussian process with mean zero and $\mathrm{Cov}(W_s, W_t) = s \wedge t$. (2) Use (6.10) and the Borel–Cantelli lemma to show that $\sum_{i=1}^{n} |V_i(t)|$ converges uniformly over $[0, 1]$ as $n \to \infty$. Conclude that $W$ is a Brownian motion.
7 Path properties of Brownian motion
The paths of Brownian motion are continuous, but we will see that they are not differentiable. How continuous are they? We will see that the paths satisfy what is known as a Hölder continuity condition. A precise description of the oscillatory behavior of Brownian motion will be given by the law of the iterated logarithm.

A function $f : [0, 1] \to \mathbb R$ is said to be Hölder continuous of order $\alpha$ if there exists a constant $M$ such that
$$|f(t) - f(s)| \le M |t - s|^{\alpha}, \qquad s, t \in [0, 1]. \qquad (7.1)$$
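The condition (7.1) can be probed on a finite grid. The sketch below, which assumes only the Python standard library and uses an illustrative function name and grid, computes the largest ratio $|f(t) - f(s)|/|t - s|^{\alpha}$ over pairs of grid points; this is a lower bound for the best constant $M$ in (7.1). For $f(t) = \sqrt t$ and $\alpha = 1/2$ the ratio never exceeds 1 (it equals 1 when $s = 0$), while for $\alpha = 1$ it grows as the grid is refined, reflecting the fact that $\sqrt t$ is Hölder continuous of order $1/2$ but not Lipschitz at 0.

```python
import math


def discrete_holder_constant(times, values, alpha):
    """Largest |f(t) - f(s)| / |t - s|**alpha over distinct grid points.

    This is a lower bound for the smallest constant M that works in (7.1).
    """
    best = 0.0
    for a in range(len(times)):
        for b in range(a + 1, len(times)):
            gap = abs(times[b] - times[a])
            if gap > 0:
                ratio = abs(values[b] - values[a]) / gap ** alpha
                best = max(best, ratio)
    return best


if __name__ == "__main__":
    grid = [i / 200 for i in range(201)]
    root = [math.sqrt(t) for t in grid]
    print(discrete_holder_constant(grid, root, 0.5))
    print(discrete_holder_constant(grid, root, 1.0))
```

Applied to a simulated Brownian path on successively finer grids, the same ratio stays bounded for $\alpha < 1/2$ and blows up for $\alpha \ge 1/2$, which is the dichotomy described next; a grid computation can only suggest this, not prove it.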
We show that the paths of Brownian motion are Hölder continuous of order $\alpha$ if $\alpha < \tfrac12$. (They are not Hölder continuous of order $\alpha$ if $\alpha \ge \tfrac12$; we will see this from the law of the iterated logarithm.)

Theorem 7.1 If $\alpha < \tfrac12$, the paths of Brownian motion are Hölder continuous of order $\alpha$ on $[0, 1]$.

Proof Step 1. First we apply the Borel–Cantelli lemma to a certain sequence of sets. Let $W$ be a Brownian motion and set
$$A_n = \Big\{ \exists\, k \le 2^n - 1 : \sup_{k/2^n \le t \le (k+1)/2^n} |W_t - W_{k/2^n}| > 2^{-n\alpha} \Big\}.$$
Since $W_{t+k/2^n} - W_{k/2^n}$ is a Brownian motion,
$$\begin{aligned}
P(A_n) &\le 2^n \sup_{k\le 2^n} P\Big( \sup_{t\le 1/2^n} |W_{t+k/2^n} - W_{k/2^n}| > 2^{-n\alpha} \Big) \\
&\le 2^n P\Big( \sup_{t\le 1/2^n} |W_t| > 2^{-n\alpha} \Big) \\
&\le 2 \cdot 2^n \exp\big( -2^{-2n\alpha} / (2 \cdot 2^{-n}) \big). \qquad (7.2)
\end{aligned}$$
Here we used Proposition 3.15. Since $\alpha < \tfrac12$, we have $2^{n(1-2\alpha)} > 2n$ for $n$ large, and the last line of (7.2) is equal to $2^{n+1} \exp(-2^{n(1-2\alpha)}/2) \le 2^{n+1} e^{-n}$ if $n$ is large. Hence $\sum_n P(A_n) < \infty$, and $P(A_n \text{ i.o.}) = 0$ by the Borel–Cantelli lemma.

Step 2. Next we show that this implies the Hölder continuity. For almost every $\omega$ there exists $N$ (depending on $\omega$) such that if $n \ge N$, then $\omega \notin A_n$. Let $s \le t$ be two points in $[0, 1]$. If $2^{-(n+2)} \le t - s \le 2^{-(n+1)}$ for some $n \ge N$ and $k$ is the largest integer such that
$k/2^n \le s$, then $t \le s + 2^{-(n+1)} < (k+2)/2^n$, so, since $\omega \notin A_n$,
$$|W_t - W_s| \le |W_t - W_{t\wedge((k+1)/2^n)}| + |W_{t\wedge((k+1)/2^n)} - W_{k/2^n}| + |W_s - W_{k/2^n}| \le 3 \cdot 2^{-n\alpha} \le 3 \cdot 4^{\alpha} |t - s|^{\alpha},$$
where the last inequality uses $2^{-n} \le 4(t - s)$. We know $|W_t(\omega)|$ is bounded on $[0, 1]$ since the paths are continuous; let $K$ (depending on $\omega$) be the bound. If $|t - s| \ge 2^{-(N+1)}$, then
$$|W_t - W_s| \le 2K \le (2K)(2^{N+1})|t - s| \le (2K)(2^{N+1})|t - s|^{\alpha}.$$
Thus, no matter whether $|t - s|$ is small or large, there exists $L$ (depending on $\omega$) such that $|W_t(\omega) - W_s(\omega)| \le L |t - s|^{\alpha}$ for all $s, t \in [0, 1]$.

One of the most beautiful theorems in probability theory is the law of the iterated logarithm (LIL). It describes precisely how Brownian motion oscillates.

Theorem 7.2 Let $W$ be a Brownian motion. We have
$$\limsup_{t\to\infty} \frac{|W_t|}{\sqrt{2t \log\log t}} = 1, \qquad \text{a.s.},$$
and
$$\limsup_{t\to 0} \frac{|W_t|}{\sqrt{2t \log\log(1/t)}} = 1, \qquad \text{a.s.}$$
Proof The second assertion follows from the first by time inversion; see Exercise 2.5. Thus we only need to prove the first assertion.

Proof of upper bound: We use the Borel–Cantelli lemma. Let $\varepsilon > 0$ and then choose $q$ larger than 1 but close enough to 1 so that $(1+\varepsilon)^2/q > 1$. Let
$$A_n = \Big( \sup_{s\le q^n} |W_s| > (1+\varepsilon) \sqrt{2 q^{n-1} \log\log q^{n-1}} \Big).$$
By Proposition 3.15,
$$P(A_n) \le 2 \exp\Big( -\frac{(1+\varepsilon)^2\, 2q^{n-1} \log\log q^{n-1}}{2q^n} \Big) = 2 \exp\Big( -\frac{(1+\varepsilon)^2}{q} \big( \log(n-1) + \log\log q \big) \Big) = \frac{c}{(n-1)^{(1+\varepsilon)^2/q}},$$
where we are using our convention that the letter $c$ denotes a constant whose exact value is unimportant. This is summable in $n$, so $\sum_n P(A_n) < \infty$. By the Borel–Cantelli lemma, $P(A_n \text{ i.o.}) = 0$. Hence, except for a null set, there exists $N = N(\omega)$ such that $\omega \notin A_n$ if $n \ge N(\omega)$. If $t \ge q^N$, then for some $n \ge N + 1$ we have $q^{n-1} \le t \le q^n$, and
$$|W_t| \le \sup_{s\le q^n} |W_s| \le (1+\varepsilon) \sqrt{2 q^{n-1} \log\log q^{n-1}} \le (1+\varepsilon) \sqrt{2t \log\log t}.$$
Therefore
$$\limsup_{t\to\infty} \frac{|W_t|}{\sqrt{2t \log\log t}} \le 1 + \varepsilon, \qquad \text{a.s.} \qquad (7.3)$$
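Since $\varepsilon > 0$ is arbitrary, the upper bound is proved.

Proof of lower bound: We start with the second half of the Borel–Cantelli lemma. Let $\varepsilon > 0$ and then take $q > 1$ very large so that
$$\frac{(1-\varepsilon)^2(1+\varepsilon)}{1-q^{-1}} < 1 \quad\text{and}\quad 2/\sqrt{q} < \varepsilon/2.$$
This is possible because $(1-\varepsilon)^2(1+\varepsilon) = (1-\varepsilon^2)(1-\varepsilon) < 1$. Let
$$B_n = \Big(W_{q^{n+1}} - W_{q^n} > (1-\varepsilon)\sqrt{2q^{n+1}\log\log q^{n+1}}\Big).$$
Since Brownian motion has independent increments, the events $B_n$ are independent. Let
$$Z = \frac{W_{q^{n+1}} - W_{q^n}}{\sqrt{q^{n+1}-q^n}}.$$
Then $Z$ is a mean zero normal random variable with variance one. By Proposition A.52, we see that
$$\begin{aligned}\mathbb{P}(B_n) &= \mathbb{P}\Big(Z > (1-\varepsilon)\sqrt{2q^{n+1}\log\log q^{n+1}}\Big/\sqrt{q^{n+1}-q^n}\Big)\\ &\ge \exp\Big(-\frac{(1-\varepsilon)^2(1+\varepsilon)\,2q^{n+1}\log\log q^{n+1}}{2(q^{n+1}-q^n)}\Big)\\ &= c\exp\Big(-\frac{(1-\varepsilon)^2(1+\varepsilon)(\log(n+1)+\log\log q)}{1-q^{-1}}\Big)\end{aligned}$$
for $n$ large. Hence
$$\sum_n\mathbb{P}(B_n) \ge c\sum_n\frac{1}{(n+1)^{(1-\varepsilon)^2(1+\varepsilon)/(1-q^{-1})}} = \infty.$$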
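The two conditions on $q$ can be met by brute force. The following Python sketch is an illustrative helper, not from the text. It searches for the smallest integer $q\ge 2$ that satisfies both conditions and reports the resulting exponent, which must be below 1 for the comparison series to diverge.

```python
import math

def lower_bound_q(eps):
    """Smallest integer q >= 2 meeting both conditions of the lower-bound proof."""
    a = (1 - eps) ** 2 * (1 + eps)      # equals (1 - eps^2)(1 - eps) < 1
    q = 2
    while not (a / (1 - 1 / q) < 1 and 2 / math.sqrt(q) < eps / 2):
        q += 1
    return q

eps = 0.2
q = lower_bound_q(eps)
exponent = (1 - eps) ** 2 * (1 + eps) / (1 - 1 / q)
print(q, exponent)   # exponent < 1, so sum over n of (n+1)^(-exponent) diverges
```

The second condition ($2/\sqrt q < \varepsilon/2$) is what forces $q$ to be large; the first is already satisfied for moderate $q$.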
By the Borel–Cantelli lemma, with probability one, $\omega$ is in infinitely many $B_n$. Consequently, with probability one, infinitely often
$$W_{q^{n+1}} - W_{q^n} > (1-\varepsilon)\sqrt{2q^{n+1}\log\log q^{n+1}}. \qquad (7.4)$$
The inequality (7.4) is not exactly what we want, as we want a lower bound for $W_{q^{n+1}}$, but we can derive the desired lower bound by using the upper bound we proved in the first part of the proof. We know from (7.3) that for $n$ large enough,
$$|W_{q^n}| \le 2\sqrt{2q^n\log\log q^n} \le \frac{2}{\sqrt q}\sqrt{2q^{n+1}\log\log q^{n+1}} < \frac{\varepsilon}{2}\sqrt{2q^{n+1}\log\log q^{n+1}}.$$
Thus infinitely often
$$W_{q^{n+1}} > (1-3\varepsilon/2)\sqrt{2q^{n+1}\log\log q^{n+1}}.$$
This proves
$$\limsup_{n\to\infty}\frac{W_{q^{n+1}}}{\sqrt{2q^{n+1}\log\log q^{n+1}}} \ge 1-\frac{3\varepsilon}{2}, \quad\text{a.s.}$$
Since $\varepsilon$ is arbitrary, the lower bound follows.

The law of the iterated logarithm shows that the paths of $W_t$ are not differentiable at time 0, a.s. Applying this to the Brownian motion $s\mapsto W_{t+s} - W_t$, we see that for each $t$, $W$ is not differentiable at time $t$, a.s. But the null set $N_t$ might depend on $t$, and it is even conceivable that $\cup_{t\in[0,1]}N_t$ is not a null set. We have the following stronger result, which says that except for a set of $\omega$'s that form a null set, $t\mapsto W_t(\omega)$ is a function that does not have a derivative at any time $t\in[0,1]$.

Theorem 7.3 With probability one, the paths of Brownian motion are nowhere differentiable.

Proof Note that if $Z$ is a normal random variable with mean 0 and variance 1, then
$$\mathbb{P}(|Z|\le r) = \frac{1}{\sqrt{2\pi}}\int_{-r}^{r}e^{-x^2/2}\,dx \le 2r. \qquad (7.5)$$
Let $M, h > 0$ and let
$$A_{M,h} = \{\exists s\in[0,1] : |W_t - W_s|\le M|t-s| \text{ if } |t-s|\le h\},$$
$$B_n = \{\exists k\le 2n : |W_{k/n} - W_{(k-1)/n}|\le 4M/n,\ |W_{(k+1)/n} - W_{k/n}|\le 4M/n,\ |W_{(k+2)/n} - W_{(k+1)/n}|\le 4M/n\}.$$
We check that $A_{M,h}\subset B_n$ if $n\ge 2/h$. To see this, if $\omega\in A_{M,h}$, there exists an $s$ such that $|W_t - W_s|\le M|t-s|$ if $|t-s|\le 2/n$; let $k/n$ be the largest multiple of $1/n$ less than or equal to $s$. Then
$$|(k+2)/n - s|\le 2/n \quad\text{and}\quad |(k+1)/n - s|\le 2/n,$$
and therefore
$$|W_{(k+2)/n} - W_{(k+1)/n}| \le |W_{(k+2)/n} - W_s| + |W_s - W_{(k+1)/n}| \le 2M/n + 2M/n = 4M/n.$$
Similarly $|W_{(k+1)/n} - W_{k/n}|$ and $|W_{k/n} - W_{(k-1)/n}|$ are at most $4M/n$. Using the independent increments property, the stationary increments property, and (7.5),
$$\begin{aligned}\mathbb{P}(B_n) &\le 2n\sup_{k\le 2n}\mathbb{P}\big(|W_{k/n} - W_{(k-1)/n}|\le 4M/n,\ |W_{(k+1)/n} - W_{k/n}|\le 4M/n,\ |W_{(k+2)/n} - W_{(k+1)/n}|\le 4M/n\big)\\ &\le 2n\,\mathbb{P}\big(|W_{1/n}|\le 4M/n,\ |W_{2/n} - W_{1/n}|\le 4M/n,\ |W_{3/n} - W_{2/n}|\le 4M/n\big)\\ &= 2n\,\mathbb{P}(|W_{1/n}|\le 4M/n)\,\mathbb{P}(|W_{2/n} - W_{1/n}|\le 4M/n)\,\mathbb{P}(|W_{3/n} - W_{2/n}|\le 4M/n)\\ &= 2n\big(\mathbb{P}(|W_{1/n}|\le 4M/n)\big)^3 \le cn\Big(\frac{4M}{\sqrt n}\Big)^3,\end{aligned}$$
which tends to 0 as $n\to\infty$. Hence for each $M$ and $h$,
$$\mathbb{P}(A_{M,h}) \le \limsup_{n\to\infty}\mathbb{P}(B_n) = 0.$$
This implies that the probability that there exists $s\le 1$ such that
$$\limsup_{h\to 0}\frac{|W_{s+h} - W_s|}{|h|} \le M$$
is zero. Since $M$ is arbitrary, this proves the theorem.
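The decay of the bound on $\mathbb{P}(B_n)$ can be seen numerically. The Python sketch below uses illustrative function names. It evaluates $2n(\mathbb{P}(|W_{1/n}|\le 4M/n))^3$ exactly through the error function, using $\mathbb{P}(|Z|\le r) = \operatorname{erf}(r/\sqrt2)$ and $W_{1/n}\sim Z/\sqrt n$, and compares it with the cruder bound obtained from (7.5). Both fall off like $n^{-1/2}$ once $n$ is large compared with $M^2$.

```python
import math

def prob_abs_normal_le(r):
    """P(|Z| <= r) for a standard normal Z."""
    return math.erf(r / math.sqrt(2))

def bn_bound(n, M):
    """2n * P(|W_{1/n}| <= 4M/n)^3, using W_{1/n} ~ Z / sqrt(n)."""
    p = prob_abs_normal_le(4 * M / math.sqrt(n))
    return 2 * n * p ** 3

M = 10.0
for n in (10**2, 10**4, 10**6, 10**8):
    crude = 2 * n * (8 * M / math.sqrt(n)) ** 3   # from P(|Z| <= r) <= 2r in (7.5)
    print(n, bn_bound(n, M), crude)
```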
Exercises
7.1
Here you are asked to find a more precise description of the modulus of continuity of Brownian paths. Prove that
$$\lim_{\delta\to 0}\ \sup_{s,t\in[0,1],\,0<|t-s|<\delta}\frac{|W_t - W_s|}{\sqrt{\delta\log(1/\delta)}} < \infty, \quad\text{a.s.}$$
Hint: Imitate the proof of Theorem 7.1.
7.2
The following is part of what is known as Chung’s law of the iterated logarithm. We will see in Section 40.3 that there exists $c_1$ such that
$$\mathbb{P}\big(\sup_{s\le t}|W_s|\le\lambda\big) \le c_1e^{-\pi^2t/8\lambda^2}$$
for $t/\lambda^2$ sufficiently large. Prove that
$$\liminf_{t\to\infty}\frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}} > 0, \quad\text{a.s.}$$
7.3
Let $W_t$ be a one-dimensional Brownian motion. We will see in Section 40.3 that there exists $c_2$ such that
$$\mathbb{P}\big(\sup_{s\le t}|W_s|\le\lambda\big) \ge c_2e^{-\pi^2t/8\lambda^2}$$
if $t/\lambda^2$ is sufficiently large. Prove that
$$\liminf_{t\to\infty}\frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}} < \infty, \quad\text{a.s.}$$
This is the other half of Chung’s law of the iterated logarithm. In fact,
$$\liminf_{t\to\infty}\frac{\sup_{s\le t}|W_s|}{\sqrt{t/\log\log t}} = c, \quad\text{a.s.} \qquad (7.6)$$
Identify $c$ and prove (7.6).
7.4
A function $f$ is Hölder continuous of order $\alpha$ at a point $t$ if there exists $c$ such that $|f(u) - f(t)|\le c|u-t|^{\alpha}$ for all $u$. Suppose $\alpha > 1/2$ and $W_t$ is a Brownian motion. Show that the event $A = \{\exists\, t\in[0,1] : W \text{ is Hölder continuous of order } \alpha \text{ at } t\}$ has probability 0. Hint: Imitate the proof of nowhere differentiability, but use more than three time intervals.
7.5
Let $W$ be a one-dimensional Brownian motion and let $M_t = \sup_{s\le t}W_s$ (with no absolute value signs). Prove that if $\varepsilon > 0$, then
$$\liminf_{t\to\infty}\frac{M_t}{\sqrt t/(\log t)^{1+\varepsilon}} > 0, \quad\text{a.s.}$$
7.6
This is a complement to Exercise 4.10. Prove that if $p > 2$ and $W$ is a Brownian motion, then the $p$-variation of $W$, defined in Exercise 4.10, is finite, a.s. Hint: Use the fact that the paths of Brownian motion are Hölder continuous of order $\alpha$ if $\alpha < 1/2$.
7.7
Let $W$ be a Brownian motion and let $Z$ be the zero set: $Z = \{t\in[0,1] : W_t = 0\}$.
(1) Show there exists a constant $c$ not depending on $x$ or $\delta$ such that
$$\mathbb{P}(\exists s\le\delta : W_s = -x) \le \mathbb{P}\big(\sup_{s\le\delta}|W_s|\ge|x|\big) \le ce^{-x^2/2\delta}.$$
(2) Use the Markov property of Brownian motion to show that there exists a constant $c$ not depending on $s$ or $t$ such that
$$\mathbb{P}(Z\cap[s,t]\ne\emptyset) \le c\Big(1\wedge\sqrt{\frac{t-s}{s}}\Big).$$
7.8
Given a Borel measurable subset $A$ of $[0,1]$, define
$$H_\gamma(A) = \limsup_{\delta\to 0}\ \inf\Big\{\sum_{i=1}^\infty[b_i - a_i]^\gamma : A\subset\bigcup_{i=1}^\infty[a_i,b_i],\ \sup_i|b_i - a_i|\le\delta\Big\}.$$
In other words, cover $A$ by the union of intervals $[a_i,b_i]$ and define the analog of Lebesgue measure. The differences are that we look at $|b_i - a_i|^\gamma$ but do not require that $\gamma$ be one, and we require that none of the intervals be longer than $\delta$. The quantity $H_\gamma(A)$ is called the Hausdorff measure of $A$ with respect to the function $x^\gamma$. The Hausdorff dimension of a set $A$ is defined to be $\inf\{\gamma : H_\gamma(A) = 0\} = \sup\{\gamma : H_\gamma(A) = \infty\}$. (For subsets of $\mathbb{R}^d$ we replace the intervals $[a_i,b_i]$ by balls of radius $r_i$.) As a warm-up to this exercise, prove that the Hausdorff dimension of the standard Cantor set in $[0,1]$ is $\log 2/\log 3$.

The purpose of this exercise is to show that if $W$ is a Brownian motion and $Z = \{t\in[0,1] : W_t = 0\}$ is the zero set, then the Hausdorff dimension of $Z$ is no more than 1/2.
(1) For each $n$, let $C_n$ be the collection of intervals $[i/2^n,(i+1)/2^n]$ contained in $[0,1]$ that intersect $Z$. ($C_n$ is random.) If $\#C_n$ is the cardinality of $C_n$, use Exercise 7.7 to show
$$\mathbb{E}[\#C_n] \le \sum_{i=0}^{2^n-1}\mathbb{P}(Z\cap[i/2^n,(i+1)/2^n]\ne\emptyset) \le c2^{n/2}.$$
(2) Write
$$\sum_{[i/2^n,(i+1)/2^n]\in C_n}|2^{-n}|^\gamma = 2^{-n\gamma}\#C_n.$$
Use the Chebyshev inequality and (1) to conclude that the Hausdorff dimension of $Z$ is less than or equal to 1/2, a.s. (We will show that it is at least 1/2 in Exercise 14.10.)
8 The continuity of paths
It is often important to know whether a stochastic process has continuous paths. An important sufficient condition is the Kolmogorov continuity criterion. This criterion is also useful in showing the continuity of a family of random variables $X^a$ in the variable $a$, where $a$ is a parameter other than time. Kolmogorov’s continuity criterion is part (2) of Theorem 8.1.

Let $D_n = \{k/2^n : k\le 2^n\}$ and let $D = \cup_nD_n$. The set $D$ is known as the set of dyadic rationals in $[0,1]$. We will use
$$\sum_{i=1}^\infty i^{-2} \le 1 + \int_1^\infty x^{-2}\,dx = 2.$$
(In fact, by a standard exercise using Parseval’s identity in the theory of Fourier series, $\sum_{i=1}^\infty i^{-2}$ is actually equal to $\pi^2/6$.)

We will be considering at first a real-valued process $\{X_t : t\in D\}$. To show continuity by considering $X_t - X_s$ for all pairs $(s,t)$ doesn’t work: there are too many pairs. Kolmogorov’s proof circumvents this problem by considering only a restricted collection of pairs. To bound $X_{15/32} - X_{11/32}$, for example, we compare $X_{15/32}$ to $X_{7/16}$, compare $X_{7/16}$ to $X_{3/8}$, and compare $X_{3/8}$ to $X_{1/4}$, and we also compare $X_{11/32}$ to $X_{5/16}$ and compare $X_{5/16}$ to $X_{1/4}$. The advantage of this complicated way of matching pairs is that each comparison, for example $X_{3/8}$ to $X_{1/4}$, is used for a great many of the possible pairs $(s,t)$.

The proof of Theorem 8.1 has three main steps. Step 1 is to reduce the problem to proving the bound (8.3). The second step is to set up the comparisons that we need, and the third is to obtain estimates on all the comparisons.

Theorem 8.1 Suppose $\{X_t : t\in D\}$ is a real-valued process and there exist $c_1$, $\varepsilon$, and $p > 0$ such that
$$\mathbb{E}[|X_t - X_s|^p] \le c_1|t-s|^{1+\varepsilon}, \qquad s,t\in D. \qquad (8.1)$$
Then the following hold.
(1) There exists $c_2$ depending only on $c_1$, $p$, and $\varepsilon$ such that for $M > 0$,
$$\mathbb{P}\Big(\sup_{s,t\in D,\,s\ne t}\frac{|X_t - X_s|}{|t-s|^{\varepsilon/4p}}\ge M\Big) \le c_2/M^p. \qquad (8.2)$$
(2) With probability one, $X_t$ is uniformly continuous on $D$.

Proof Step 1. Let $\lambda_n = M2^{-(n+1)\varepsilon/4p}$ and
$$A_n = \big(|X_t - X_s|\ge\lambda_n \text{ for some } s,t\in D \text{ with } |t-s|\le 2^{-n}\big).$$
Recall our convention that the letter $c$ denotes unimportant constants which can change from line to line. We will show
$$\mathbb{P}(A_n) \le c2^{-n\varepsilon/4}M^{-p}. \qquad (8.3)$$
This implies (1) and (2) as follows. If $|X_t - X_s|\ge M|t-s|^{\varepsilon/4p}$ for some $s,t\in D$ with $s\ne t$, choose $n$ such that $2^{-(n+1)} < |t-s|\le 2^{-n}$, and then $A_n$ holds. The event on the left-hand side of (8.2) is contained in $\cup_nA_n$, and using (8.3) shows that
$$\mathbb{P}(\cup_nA_n) \le cM^{-p}\sum_{n=1}^\infty 2^{-n\varepsilon/4} = cM^{-p},$$
which implies (1). Let
$$B_M = \Big\{\sup_{s,t\in D,\,s\ne t}|X_t - X_s|/|t-s|^{\varepsilon/4p}\ge M\Big\}.$$
Note $B_M$ decreases as $M$ increases and from (1) we have $\mathbb{P}(\cap_{M=1}^\infty B_M) = 0$. Thus except for an event of probability zero, each $\omega$ is in $B_M^c$ for some $M$ (where $M$ depends on $\omega$), and this implies (2). Thus we must show (8.3).

Step 2. Define $a(j,t)$ to be the integer multiple of $2^{-j}$ that is closest to $t$ (if there are two different multiples that are equally close, we use some convention to break the tie). If $t\in D_m$, then $a(m,t) = t$. If $|t-s|\le 2^{-n}$, then $|a(n,t) - a(n,s)|\le 2^{-n+2}$. Now if $s,t\in D_m$ and $m\ge n$, we use the triangle inequality to write
$$\begin{aligned}|X_t - X_s| &= |X_{a(m,t)} - X_{a(m,s)}|\\ &\le |X_{a(n,t)} - X_{a(n,s)}| + |X_{a(n+1,t)} - X_{a(n,t)}| + \cdots + |X_{a(m,t)} - X_{a(m-1,t)}|\\ &\quad + |X_{a(n+1,s)} - X_{a(n,s)}| + \cdots + |X_{a(m,s)} - X_{a(m-1,s)}|.\end{aligned} \qquad (8.4)$$
If $|X_{a(n,t)} - X_{a(n,s)}| < \lambda_n/2$ and for each $i$
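$$|X_{a(n+i+1,t)} - X_{a(n+i,t)}| < \frac{\lambda_n}{8(i+1)^2}$$
and the same with $t$ replaced by $s$, then by (8.4)
$$|X_t - X_s| < \frac{\lambda_n}{2} + 2\sum_{i=0}^\infty\frac{\lambda_n}{8(i+1)^2} \le \lambda_n.$$
Hence if $|X_t - X_s|\ge\lambda_n$ for some $s,t\in D_m$ with $|t-s|\le 2^{-n}$, then at least one of the events $E$, $F_i$, or $G_i$, $i\ge 0$, must hold, where
$$E = \{|X_{a(n,t)} - X_{a(n,s)}|\ge\lambda_n/2 \text{ for some } s,t\in D_n \text{ with } |s-t|\le 2^{-n}\},$$
$$F_i = \{|X_{a(n+i+1,t)} - X_{a(n+i,t)}|\ge\lambda_n/8(i+1)^2 \text{ for some } t\},$$
$$G_i = \{|X_{a(n+i+1,s)} - X_{a(n+i,s)}|\ge\lambda_n/8(i+1)^2 \text{ for some } s\}.$$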
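The comparison scheme can be traced mechanically. In the Python sketch below, the helper names are hypothetical. `chain` lists the dyadic points visited when a point is moved to coarser and coarser levels. It truncates to the largest multiple of $2^{-j}$ not exceeding $t$, which reproduces the $X_{15/32}$, $X_{11/32}$ example from the start of the chapter exactly. By contrast, $a(j,t)$ in the proof rounds to the nearest multiple and can produce a slightly different chain with the same structure. The last lines check numerically that the thresholds $\lambda_n/2$ and $\lambda_n/8(i+1)^2$ add up to less than $\lambda_n$.

```python
import math
from fractions import Fraction

def truncate(t, j):
    """Largest multiple of 2**-j that is <= t (the rounding used in the example)."""
    return Fraction(math.floor(t * 2 ** j), 2 ** j)

def chain(t, finest, coarsest):
    """Dyadic points visited when moving t from level `finest` down to level `coarsest`."""
    points = []
    for j in range(finest, coarsest - 1, -1):
        p = truncate(t, j)
        if not points or points[-1] != p:   # skip repeated points
            points.append(p)
    return points

left = chain(Fraction(15, 32), 5, 2)
right = chain(Fraction(11, 32), 5, 2)
print([str(p) for p in left])    # ['15/32', '7/16', '3/8', '1/4']
print([str(p) for p in right])   # ['11/32', '5/16', '1/4']

# Threshold budget from Step 2, in units of lambda_n: 1/2 for the level-n link plus
# 1/(8(i+1)^2) for each finer link on each of the two sides.
budget = 0.5 + 2 * sum(1 / (8 * (i + 1) ** 2) for i in range(10**5))
print(budget, 0.5 + math.pi ** 2 / 24)   # both about 0.911, below 1
```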
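Step 3. For the event $E$ to hold, we must have $|X_r - X_q|\ge\lambda_n/2$ for some $q,r\in D_n$ with $|q-r|\le 2^{-n+2}$. There are at most $c2^n$ such pairs $(q,r)$, so the probability of $E$ is bounded, using Chebyshev’s inequality and (8.1), by
$$\begin{aligned}(c2^n)\sup_{q\in D_n,\,r\in D_{n+1},\,|r-q|\le 2^{-n+2}}\mathbb{P}(|X_r - X_q|\ge\lambda_n/2) &\le c2^n\,\frac{\sup_{q\in D_n,\,r\in D_{n+1},\,|r-q|\le 2^{-n+2}}\mathbb{E}[|X_r - X_q|^p]}{(\lambda_n/2)^p}\\ &\le \frac{c2^n(2^{-n+2})^{1+\varepsilon}}{\lambda_n^p} \le \frac{c2^{-n\varepsilon}}{\lambda_n^p}.\end{aligned}$$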
For $F_i$ to hold, that is, for $|X_{a(n+i+1,t)} - X_{a(n+i,t)}|$ to be greater than $\lambda_n/8(i+1)^2$ for some $t$, we must have $|X_r - X_q|\ge\lambda_n/8(i+1)^2$ for some $r\in D_{n+i}$, $q\in D_{n+i+1}$ with $|r-q|\le 2^{-n-i+2}$. There are at most $c2^{n+i}$ such pairs, and so the probability of $F_i$ is bounded by
$$\begin{aligned}(c2^{n+i})\sup_{r\in D_{n+i},\,q\in D_{n+i+1},\,|r-q|\le 2^{-n-i+2}}\mathbb{P}\Big(|X_r - X_q|\ge\frac{\lambda_n}{8(i+1)^2}\Big) &\le c\,\frac{2^{n+i}\,2^{(-n-i+2)(1+\varepsilon)}(8(i+1)^2)^p}{\lambda_n^p}\\ &\le \frac{c2^{-n\varepsilon}2^{-i\varepsilon/2}}{\lambda_n^p}.\end{aligned}$$
Here we used the fact that $2^{-i\varepsilon}(i+1)^{2p}\le c2^{-i\varepsilon/2}$ for some constant $c$ depending on $p$ and $\varepsilon$ but not $i$. We have the same bound for $G_i$. Therefore
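$$\mathbb{P}\big(\cup_i(F_i\cup G_i)\cup E\big) \le \sum_{i=0}^\infty\frac{c2^{-n\varepsilon/2}2^{-i\varepsilon/2}}{\lambda_n^p} + \frac{c2^{-n\varepsilon/2}}{\lambda_n^p} \le c2^{-n\varepsilon/2}\lambda_n^{-p}.$$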
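Two bookkeeping facts in Step 3 can be checked numerically. The first is the constant in $2^{-i\varepsilon}(i+1)^{2p}\le c2^{-i\varepsilon/2}$. The second is the identity $2^{-n\varepsilon/2}\lambda_n^{-p} = 2^{\varepsilon/4}2^{-n\varepsilon/4}M^{-p}$, which turns the last display into (8.3). The Python sketch below is an illustration only. The helper names and the sample values of $p$, $\varepsilon$, $M$ are assumptions, and the unspecified constant $c$ is taken to be 1.

```python
import math

def chaining_constant(p, eps, i_max=10_000):
    """Numerical value of sup_i (i+1)^(2p) 2^(-i eps/2), i.e. the best c in
    2^(-i eps) (i+1)^(2p) <= c 2^(-i eps/2)."""
    return max((i + 1) ** (2 * p) * 2 ** (-i * eps / 2) for i in range(i_max))

def p_an_bound(n, M, p, eps):
    """2^(-n eps/2) lambda_n^(-p) with lambda_n = M 2^(-(n+1) eps/(4p)) (taking c = 1)."""
    lam = M * 2 ** (-(n + 1) * eps / (4 * p))
    return 2 ** (-n * eps / 2) * lam ** (-p)

p, eps, M = 4.0, 1.0, 3.0
print(chaining_constant(p, eps))
for n in (1, 5, 20):
    # the same quantity written as 2^(eps/4) 2^(-n eps/4) M^(-p), the shape of (8.3)
    print(n, p_an_bound(n, M, p, eps), 2 ** (eps / 4) * 2 ** (-n * eps / 4) * M ** (-p))
```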
Letting $m\to\infty$ we have
$$\mathbb{P}(A_n) \le c2^{-n\varepsilon/2}\lambda_n^{-p} = c2^{-n\varepsilon/4}M^{-p}$$
as required.

The proof of Theorem 8.1 is an example of what is known as a metric entropy or chaining argument. In the above, the only place we relied on the fact that we were using real-valued processes was in using the triangle inequality. Therefore with only slight changes in notation, we have the following theorem.
Theorem 8.2 Suppose $X$ takes values in some metric space $S$ with metric $d_S$ and there exist $c_1$, $\varepsilon$, and $p > 0$ such that
$$\mathbb{E}[d_S(X_s,X_t)^p] \le c_1|t-s|^{1+\varepsilon}, \qquad s,t\in D. \qquad (8.5)$$
Then the following hold.
(1) There exists $c_2$ depending only on $c_1$, $p$, and $\varepsilon$ such that for $M > 0$,
$$\mathbb{P}\Big(\sup_{s,t\in D,\,s\ne t}\frac{d_S(X_s,X_t)}{|t-s|^{\varepsilon/2p}}\ge M\Big) \le c_2/M^p.$$
(2) With probability one, $X_t$ is uniformly continuous on $D$.

Remark 8.3 Theorem 8.2 holds for random variables indexed by time, but the analogous result holds for the continuity in $a$ of random variables $X^a$ indexed by some parameter $a$ running through $D$. We may also let the parameter $a$ run instead through the dyadic rationals in $[b_1,b_2]$ for any $b_1 < b_2$.

The proof of the following corollary is an adaptation of the proof of Theorem 8.1 and is left as Exercise 8.1.

Corollary 8.4 Suppose there exist $c_1$, $\varepsilon$, $N$, and $p > 0$ such that if $n\le N$,
$$\mathbb{E}[d_S(X_s,X_t)^p] \le c_1|t-s|^{1+\varepsilon}, \qquad s,t\in D_n.$$
Then there exists $c_2$ depending on $c_1$, $\varepsilon$, and $p$ but not $N$ such that for $M > 0$ and $n\le N$ we have
$$\mathbb{P}\Big(\sup_{s,t\in D_n,\,s\ne t}\frac{d_S(X_s,X_t)}{|t-s|^{\varepsilon/2p}}\ge M\Big) < c_2M^{-p}.$$
Recall the definition of Hölder continuity from (7.1).

Proposition 8.5 If $\alpha < 1/2$, then the paths of a one-dimensional Brownian motion $\{W_t ; 0\le t\le 1\}$ are Hölder continuous of order $\alpha$ with probability one.

Proof
By the stationary increments property and scaling,
$$\mathbb{E}|W_t - W_s|^p = \mathbb{E}|W_{t-s}|^p = |t-s|^{p/2}\,\mathbb{E}|W_1|^p.$$
If $\alpha < 1/2$, choose $p$ large enough so that $((p/2)-1)/p > \alpha$ and then take $\varepsilon = (p/2)-1$. (Here $\varepsilon$ is large!) Take $\gamma$ sufficiently small that $(\varepsilon/p) - \gamma > \alpha$. Then by Exercise 8.2 the paths of $W_t$ are Hölder continuous of order $\alpha$, with probability one, provided we restrict $t$ to $D$. But the paths of Brownian motion are continuous, so we see that we have Hölder continuity of order $\alpha$ when $t\in[0,1]$.
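The parameter choices in this proof can be made concrete. The Python sketch below uses hypothetical helper names. For a few values of $\alpha$ it picks an integer $p$ (one admissible choice among many), then sets $\varepsilon = p/2 - 1$ and a $\gamma$, and checks the two required inequalities. It also evaluates $\mathbb{E}|W_1|^p$ from the standard formula $\mathbb{E}|Z|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt\pi$ for a standard normal $Z$.

```python
import math

def kolmogorov_parameters(alpha):
    """One admissible choice of p, eps, gamma for the proof of Proposition 8.5."""
    assert 0 < alpha < 0.5
    # ((p/2) - 1)/p = 1/2 - 1/p > alpha  <=>  p > 1/(1/2 - alpha); +2 gives a strict margin
    p = math.floor(1 / (0.5 - alpha)) + 2
    eps = p / 2 - 1                      # so that E|W_t - W_s|^p = c |t-s|^(1+eps)
    gamma = ((eps / p) - alpha) / 2      # leaves (eps/p) - gamma > alpha
    return p, eps, gamma

def abs_normal_moment(p):
    """E|W_1|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

for alpha in (0.25, 0.4, 0.49):
    p, eps, gamma = kolmogorov_parameters(alpha)
    print(alpha, p, eps, gamma, eps / p - gamma, abs_normal_moment(p))
```

As $\alpha$ approaches $1/2$ the required $p$ grows like $1/(1/2-\alpha)$, which is why the text remarks that $\varepsilon$ is large.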
Exercises
8.1
Prove Corollary 8.4.
8.2
If the hypothesis of Theorem 8.1 holds and $\gamma < \varepsilon/p$, show that there exists $c_2$ depending only on $c_1$, $\varepsilon$, $\gamma$, and $p$ such that for $M > 0$
$$\mathbb{P}\Big(\sup_{s,t\in D,\,s\ne t}\frac{d_S(X_s,X_t)}{|t-s|^{(\varepsilon/p)-\gamma}}\ge M\Big) \le c_2M^{-p}.$$
8.3
Suppose $X$ is a real-valued process and there exist constants $c_1$, $c_2$ such that
$$\mathbb{P}(|X_t - X_s| > \lambda) \le c_1e^{-c_2\lambda\log^4(1/|t-s|)}, \qquad s,t\in[0,1].$$
Prove that with probability one, $X$ has a version which is uniformly continuous on the dyadic rationals in $[0,1]$.
8.4
Suppose $(X_t, t\in[0,1])$ is a mean zero Gaussian process and there exist $c$ and $\varepsilon > 0$ such that
$$\operatorname{Var}(X_t - X_s) \le c|t-s|^{\varepsilon}, \qquad s,t\in[0,1].$$
Prove that there is a version of $X$ that has continuous paths on $[0,1]$.
8.5
Let $X$ be as in Exercise 8.4. For what values $\alpha$ will $X$ have paths that are Hölder continuous of order $\alpha$? ($\alpha$ will depend on $\varepsilon$.)
8.6
Let $\{X_{s,t} : s,t\in[0,1]\}$ be a collection of random variables. Suppose there exist $c$, $p$, and $\varepsilon > 0$ such that
$$\mathbb{E}|X_{s',t'} - X_{s,t}|^p \le c(|t'-t| + |s'-s|)^{2+\varepsilon}.$$
Prove that with probability one, the map $(s,t)\mapsto X_{s,t}(\omega)$ is uniformly continuous on $D\times D = \{(s,t) ; s,t\in D\}$.
9 Continuous semimartingales
Roughly speaking, a semimartingale is the sum of a martingale and a process whose paths are of bounded variation. In this chapter we consider semimartingales whose paths are continuous. We will give definitions, and then investigate in more detail the class of martingales that are square integrable. Finally we present a proof of the Doob–Meyer decomposition for continuous supermartingales. The Doob–Meyer decomposition used to be considered a very hard theorem, but at least in the continuous case, an elementary proof is possible. For a proof for the general case, see Chapter 16.
9.1 Definitions
Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and let
$$\mathcal{F}_\infty = \bigvee_{t\ge 0}\mathcal{F}_t = \sigma\Big(\bigcup_{t\ge 0}\mathcal{F}_t\Big).$$
We say a process $X$ has increasing paths or that $X$ is an increasing process if the functions $t\mapsto X_t(\omega)$ are increasing with probability one. Throughout this book saying $f$ is “increasing” means that $s < t$ implies $f(s)\le f(t)$, while saying $f$ is “strictly increasing” means that $s < t$ implies $f(s) < f(t)$. A process $X$ with paths of bounded variation is just what one would expect: with probability one, the functions $t\mapsto X_t(\omega)$ are of bounded variation. We say $X$ has paths locally of bounded variation if there exist stopping times $R_n\to\infty$ such that the process $X_{t\wedge R_n}$ has paths of bounded variation for each $n$.

We turn to martingales. A martingale $M$ is a uniformly integrable martingale if the family of random variables $\{M_t\}$ is uniformly integrable. A process $X$ is a local martingale if there exist stopping times $R_n\to\infty$ such that $M^n_t = X_{t\wedge R_n}$ is a uniformly integrable martingale for each $n$. A martingale whose paths are continuous is called a continuous martingale and we similarly define a right-continuous martingale. A semimartingale is a process $X$ of the form $X_t = M_t + A_t$, where $M_t$ is a local martingale and $A_t$ is a process whose paths are locally of bounded variation. As a consequence of the Doob–Meyer decomposition we will see that submartingales and supermartingales are semimartingales.

As an example, a Brownian motion $W_t$ is a martingale and is a local martingale (let $R_n$ be identically equal to $n$), but is not a uniformly integrable martingale. We will define what it means to be a square integrable martingale in the next section; Brownian motion is not a square integrable martingale.
9.2 Square integrable martingales
Definition 9.1 A martingale is a square integrable martingale if there exists an $\mathcal{F}_\infty$ measurable random variable $M_\infty$ such that $\mathbb{E}M_\infty^2 < \infty$ and $M_t = \mathbb{E}[M_\infty\mid\mathcal{F}_t]$ for all $t$.

An example of a square integrable martingale would be $M_t = W_{t\wedge t_0}$, where $W_t$ is a Brownian motion and $t_0$ is a fixed time; in this case $M_\infty = W_{t_0}$.

Proposition 9.2 Let $\{\mathcal{F}_t\}$ be a filtration satisfying the usual conditions and $M$ a right continuous process. The following are equivalent:
(1) $M_t$ is a square integrable martingale.
(2) $M$ is a martingale with $\sup_{t\ge 0}\mathbb{E}M_t^2 < \infty$.
(3) $M$ is a martingale with $\mathbb{E}[\sup_{t\ge 0}M_t^2] < \infty$.

Proof To show (1) implies (2), suppose $M$ is a square integrable martingale. Then by Jensen’s inequality for conditional expectations (Proposition A.21),
$$\mathbb{E}M_t^2 = \mathbb{E}[(\mathbb{E}[M_\infty\mid\mathcal{F}_t])^2] \le \mathbb{E}[\mathbb{E}[M_\infty^2\mid\mathcal{F}_t]] = \mathbb{E}M_\infty^2.$$
To show (2) implies (3), for each $N$,
$$\mathbb{E}\big[\sup_{0\le t\le N}M_t^2\big] \le 4\mathbb{E}M_N^2$$
by Doob’s inequalities. That (2) implies (3) follows by letting $N\to\infty$ and using Fatou’s lemma. Now suppose (3) holds, and we will show (1) holds. Since $\mathbb{E}M_n^2$ is uniformly bounded in $n$, the martingale convergence theorem (Theorem A.35) implies that $M_n$ converges almost surely and in $L^2$. Let us call the limit $M_\infty$; we have $\mathbb{E}M_\infty^2 < \infty$ by the $L^2$ convergence. Since $\mathbb{E}M_n^2$ is uniformly bounded, $M_n$ is a uniformly integrable martingale, and by Proposition A.37, $M_n = \mathbb{E}[M_\infty\mid\mathcal{F}_n]$. If $n-1\le t\le n$, we have
$$M_t = \mathbb{E}[M_n\mid\mathcal{F}_t] = \mathbb{E}[\mathbb{E}[M_\infty\mid\mathcal{F}_n]\mid\mathcal{F}_t] = \mathbb{E}[M_\infty\mid\mathcal{F}_t],$$
as required.

For the remainder of this section all our martingales will have paths that are right continuous with left limits.

Proposition 9.3 If $M$ is a square integrable martingale and $S\le T$ are finite stopping times, then $\mathbb{E}[M_T\mid\mathcal{F}_S] = M_S$.

Proof Let $A\in\mathcal{F}_S$ and define $U(\omega) = S(\omega)1_A(\omega) + T(\omega)1_{A^c}(\omega)$. Thus $U$ is equal to $S$ if $\omega\in A$ and otherwise is equal to $T$. Since $A\in\mathcal{F}_S\subset\mathcal{F}_T$, we have that $(U\le t) = [(S\le t)\cap A]\cup[(T\le t)\cap A^c]$ is in $\mathcal{F}_t$, and therefore $U$ is a stopping time. By Proposition 3.11,
$$\mathbb{E}M_0 = \mathbb{E}M_U = \mathbb{E}[M_S;A] + \mathbb{E}[M_T;A^c]$$
and
$$\mathbb{E}M_0 = \mathbb{E}M_T = \mathbb{E}[M_T;A] + \mathbb{E}[M_T;A^c].$$
These two equations imply that $\mathbb{E}[M_S;A] = \mathbb{E}[M_T;A]$, which is what we needed to prove.

By Exercise 3.11, the conclusion is valid if $M$ is a uniformly integrable martingale. As an immediate corollary we have

Corollary 9.4 Suppose $M$ is a square integrable martingale and $T$ is a stopping time. Then $X_t = M_{t\wedge T}$ is a martingale with respect to $\{\mathcal{F}_{t\wedge T}\}$.

The proof of the following proposition is similar to that of Proposition 9.3. It may be viewed as a converse of the optional stopping theorem.

Proposition 9.5 Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions and $M$ is a process that is adapted to $\{\mathcal{F}_t\}$ such that $M_t$ is integrable for each $t$. If $\mathbb{E}M_T = 0$ for every bounded stopping time $T$, then $M_t$ is a martingale.

Proof Suppose $s < t$ and $A\in\mathcal{F}_s$. Define $T$ to be equal to $s$ if $\omega\in A$ and equal to $t$ if $\omega\notin A$. As in the proof of Proposition 9.3, but even more simply, $T$ is a stopping time, so
$$0 = \mathbb{E}M_T = \mathbb{E}[M_s;A] + \mathbb{E}[M_t;A^c].$$
The fixed time $t$ is a stopping time, hence
$$0 = \mathbb{E}M_t = \mathbb{E}[M_t;A] + \mathbb{E}[M_t;A^c].$$
Comparing, $\mathbb{E}[M_t;A] = \mathbb{E}[M_s;A]$, which proves $M$ is a martingale.

Proposition 9.6 Suppose $M_t$ is a square integrable martingale and $S\le T$ are finite stopping times. Then
$$\mathbb{E}[(M_T - M_S)^2\mid\mathcal{F}_S] = \mathbb{E}[M_T^2 - M_S^2\mid\mathcal{F}_S]. \qquad (9.1)$$
Proof By Proposition 9.3,
$$\mathbb{E}[(M_T - M_S)^2\mid\mathcal{F}_S] = \mathbb{E}[M_T^2\mid\mathcal{F}_S] - 2M_S\mathbb{E}[M_T\mid\mathcal{F}_S] + M_S^2 = \mathbb{E}[M_T^2\mid\mathcal{F}_S] - M_S^2 = \mathbb{E}[M_T^2 - M_S^2\mid\mathcal{F}_S]$$
and we are done.

If we take expectations in (9.1), we obtain
$$\mathbb{E}[(M_T - M_S)^2] = \mathbb{E}M_T^2 - \mathbb{E}M_S^2. \qquad (9.2)$$
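Identity (9.2) can be checked exactly in a discrete-time setting. This is an assumption made for illustration, since the text works in continuous time. In the Python sketch below, $M$ is a simple random walk, and $S$ and $T$ are bounded hitting times with $S\le T$ on every path. All $2^8$ equally likely paths are enumerated with exact rational arithmetic, so the two sides of (9.2) must agree exactly rather than approximately.

```python
from fractions import Fraction
from itertools import product

N = 8  # number of +/-1 steps; all 2**N paths are equally likely

def walk(steps):
    """Partial sums S_0 = 0, S_1, ..., S_N of a simple random walk path."""
    path = [0]
    for x in steps:
        path.append(path[-1] + x)
    return path

def hitting_time(path, level, cap):
    """First k with |S_k| >= level, or `cap` if that does not happen by time cap."""
    for k, s in enumerate(path[: cap + 1]):
        if abs(s) >= level:
            return k
    return cap

lhs = rhs_T = rhs_S = Fraction(0)
weight = Fraction(1, 2 ** N)
for steps in product((-1, 1), repeat=N):
    path = walk(steps)
    S = hitting_time(path, 2, 4)   # bounded stopping time
    T = hitting_time(path, 3, N)   # to reach level 3 the walk must pass level 2, so S <= T
    lhs += weight * (path[T] - path[S]) ** 2
    rhs_T += weight * path[T] ** 2
    rhs_S += weight * path[S] ** 2

print(lhs, rhs_T - rhs_S)
```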
Theorem 9.7 Suppose $M_0 = 0$, $M_t$ is a continuous local martingale, and the paths of $M_t$ are locally of bounded variation. Then $M$ is identically 0, a.s., that is, $\mathbb{P}(M_t = 0 \text{ for all } t) = 1$.

Proof Using the definition of local martingale, it suffices to suppose $M$ is a continuous uniformly integrable martingale. Let $t_0$ be fixed and let $A_t$ denote the total variation of the paths of $M$ up to time $t$. If $T_N = \inf\{t : A_t\ge N\}$, we look at $M^N_t = M_{T_N\wedge t\wedge t_0}$. Using Proposition 9.3 and the remark following it, we see that $M^N$ is also a continuous martingale with paths of bounded variation, and if $M^N$ is identically zero, then letting $N\to\infty$ and $t_0\to\infty$, we obtain our result. Therefore it suffices to suppose the total variation of $M_t$ is bounded by $N$, a.s. In particular, $M_t$ is bounded by $N$.
Let $n\ge 1$ and set
$$V_n = \sup_{k\le 2^n-1}|M_{(k+1)t_0/2^n} - M_{kt_0/2^n}|.$$
Note $V_n\le 2N$, a.s., and $V_n\to 0$, a.s., as $n\to\infty$ by the uniform continuity of the paths of $M$ on $[0,t_0]$. By dominated convergence, $\mathbb{E}V_n\to 0$ as $n\to\infty$. We write
$$\begin{aligned}\mathbb{E}M_{t_0}^2 &= \mathbb{E}\sum_{k=0}^{2^n-1}\big(M_{(k+1)t_0/2^n}^2 - M_{kt_0/2^n}^2\big)\\ &= \mathbb{E}\sum_{k=0}^{2^n-1}\big(M_{(k+1)t_0/2^n} - M_{kt_0/2^n}\big)^2\\ &\le \mathbb{E}\Big[V_n\sum_{k=0}^{2^n-1}|M_{(k+1)t_0/2^n} - M_{kt_0/2^n}|\Big]\\ &\le N\,\mathbb{E}V_n.\end{aligned}$$
The second equality follows by (9.2). Since $n$ is arbitrary and $\mathbb{E}V_n\to 0$, then $\mathbb{E}M_{t_0}^2 = 0$. By Doob’s inequalities, $\mathbb{E}[\sup_{s\le t_0}M_s^2] = 0$. Hence $M$ is identically 0 up to time $t_0$.
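The heart of this proof is the deterministic inequality $\sum_k(\Delta M_k)^2\le V_n\sum_k|\Delta M_k|$: for a continuous path of bounded variation, the sum of squared increments is squeezed to 0 as the mesh shrinks. The Python sketch below illustrates this on a smooth deterministic path. The path is an arbitrary choice made for illustration, and nothing in the sketch is random.

```python
import math

def path(t):
    """A smooth (hence bounded-variation) deterministic path, chosen for illustration."""
    return math.sin(3 * t) + t

def increments(f, t0, n):
    """Increments f((k+1) t0/2^n) - f(k t0/2^n), k = 0, ..., 2^n - 1."""
    m = 2 ** n
    return [f((k + 1) * t0 / m) - f(k * t0 / m) for k in range(m)]

t0 = 2.0
for n in (4, 8, 12):
    d = increments(path, t0, n)
    v_n = max(abs(x) for x in d)          # V_n in the proof
    abs_sum = sum(abs(x) for x in d)      # at most the total variation on [0, t0]
    sq_sum = sum(x * x for x in d)
    print(n, sq_sum, v_n * abs_sum)       # sq_sum <= V_n * abs_sum, and both shrink
```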
9.3 Quadratic variation
Definition 9.8 A continuous square integrable martingale $M_t$ has quadratic variation $\langle M\rangle_t$ (sometimes written $\langle M,M\rangle_t$) if $M_t^2 - \langle M\rangle_t$ is a martingale, where $\langle M\rangle_t$ is a continuous adapted increasing process with $\langle M\rangle_0 = 0$.

In the case where $W$ is a Brownian motion, $t_0$ is fixed, and $M_t = W_{t\wedge t_0}$, the quadratic variation of $M$ is just $\langle M\rangle_t = t\wedge t_0$ by Example 3.3. Brownian motion itself does not fit perfectly into the framework of stochastic integration because it is not a square integrable martingale, although it is a martingale; we will be dealing with this point several times in what follows.

We will show existence and uniqueness of $\langle M\rangle_t$ by means of the Doob–Meyer decomposition, Theorem 9.12, below. However we defer the proof of the Doob–Meyer decomposition until the next section. A process $Z$ is of class D if $\{Z_T : T \text{ a finite stopping time}\}$ is a uniformly integrable family of random variables.

Theorem 9.9 Let $M_t$ be a continuous square integrable martingale. There exists a continuous adapted increasing process $\langle M\rangle_t$ with $\langle M\rangle_0 = 0$ and with increasing paths such that $M_t^2 - \langle M\rangle_t$ is a martingale. If $A_t$ is a continuous adapted increasing process with $A_0 = 0$ such that $M_t^2 - A_t$ is a martingale, then $\mathbb{P}(A_t\ne\langle M\rangle_t \text{ for some } t) = 0$.

Proof
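By Jensen’s inequality for conditional expectations,
$$\mathbb{E}[M_t^2\mid\mathcal{F}_s] \ge (\mathbb{E}[M_t\mid\mathcal{F}_s])^2 = M_s^2$$
if $s < t$, and so $M_t^2$ is a submartingale. Since $M_\infty$ is square integrable, given $\varepsilon$ there exists $\delta$ such that $\mathbb{E}[M_\infty^2;A] < \varepsilon$ if $\mathbb{P}(A) < \delta$. Since $M_t^2$ is a submartingale, if $K > \mathbb{E}M_\infty^2/\delta$, then
$$\mathbb{P}(M_t^2 > K) \le \mathbb{E}M_t^2/K \le \mathbb{E}M_\infty^2/K < \delta,$$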
and consequently
$$\mathbb{E}[M_t^2;M_t^2 > K] \le \mathbb{E}[M_\infty^2;M_t^2 > K] < \varepsilon.$$
By Exercise 3.11, $M_t^2$ is of class D. Applying the Doob–Meyer decomposition (Theorem 9.12) to $-M_t^2$, we write $-M_t^2 = N_t - B_t$, where $N_t$ is a martingale and $B_t$ has increasing paths. We then set $\langle M\rangle_t = B_t$. The uniqueness follows from the uniqueness part of the Doob–Meyer decomposition.

In view of Proposition 9.3 and the definition of $\langle M\rangle$, we have
$$\mathbb{E}[(M_T - M_S)^2 - (\langle M\rangle_T - \langle M\rangle_S)\mid\mathcal{F}_S] = \mathbb{E}[M_T^2 - M_S^2 - (\langle M\rangle_T - \langle M\rangle_S)\mid\mathcal{F}_S] = 0 \qquad (9.3)$$
if $S\le T$ are finite stopping times and $M$ is a continuous square integrable martingale.

If $M$ and $N$ are two square integrable martingales, we define $\langle M,N\rangle_t$ by
$$\langle M,N\rangle_t = \tfrac12\big[\langle M+N\rangle_t - \langle M\rangle_t - \langle N\rangle_t\big]. \qquad (9.4)$$
This is sometimes called the covariation of $M$ and $N$.

An alternative representation of $\langle M\rangle_t$ is the following. A proof could be given now, but it is a bit messy. After we have Itô’s formula this will be easier.

Theorem 9.10 Let $M$ be a continuous square integrable martingale and let $t_0 > 0$. Then $\langle M\rangle_{t_0}$ is the limit in probability of
$$\sum_{k=0}^{[2^nt_0]}(M_{(k+1)/2^n} - M_{k/2^n})^2,$$
where $[2^nt_0]$ is the largest integer less than or equal to $2^nt_0$.
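A quick simulation shows Theorem 9.10 at work for Brownian motion. Take $M_t = W_{t\wedge T}$ for a fixed $T$ larger than $t_0 + 1$, so that $M$ is square integrable and $\langle M\rangle_{t_0} = t_0$. The Python sketch below is illustrative, and the helper name is hypothetical. It draws independent $N(0,2^{-n})$ increments and forms the dyadic sum from the theorem. A fixed seed is used only to make runs reproducible.

```python
import math
import random

def dyadic_quadratic_variation(t0, n, rng):
    """Sum over k = 0..[2^n t0] of (W_{(k+1)/2^n} - W_{k/2^n})^2 for one simulated path."""
    dt = 2.0 ** -n
    steps = math.floor(2 ** n * t0) + 1          # k runs from 0 to [2^n t0]
    # Brownian increments over intervals of length dt are independent N(0, dt)
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(steps))

rng = random.Random(12345)
t0 = 1.5
for n in (6, 10, 14):
    print(n, dyadic_quadratic_variation(t0, n, rng))   # approaches <M>_{t0} = t0
```

The standard deviation of the sum is about $t_0\sqrt{2/(2^nt_0)}$, so the fluctuations shrink like $2^{-n/2}$.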
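9.4 The Doob–Meyer decomposition
In this section we give a proof of the Doob–Meyer decomposition for continuous supermartingales. First we need the following inequality, which has many other uses as well.

Proposition 9.11 Suppose $A^1$ and $A^2$ are two increasing adapted continuous processes starting at zero with $A^i_\infty = \lim_{t\to\infty}A^i_t < \infty$, a.s., $i = 1,2$, and suppose there exists a positive real $K$ such that for all $t$,
$$\mathbb{E}[A^i_\infty - A^i_t\mid\mathcal{F}_t] \le K, \quad\text{a.s.}, \quad i = 1,2. \qquad (9.5)$$
Let $B_t = A^1_t - A^2_t$. Suppose there exists a non-negative random variable $V$ with $\mathbb{E}V^2 < \infty$ such that for all $t$,
$$|\mathbb{E}[B_\infty - B_t\mid\mathcal{F}_t]| \le \mathbb{E}[V\mid\mathcal{F}_t], \quad\text{a.s.} \qquad (9.6)$$
Then
$$\mathbb{E}\sup_{t\ge 0}B_t^2 \le 8\mathbb{E}V^2 + 32\sqrt2K(\mathbb{E}V^2)^{1/2}. \qquad (9.7)$$

Proof We start by showing
$$\mathbb{E}(A^i_\infty)^2 \le 2K^2, \quad i = 1,2. \qquad (9.8)$$
First suppose $A^i_\infty$ is bounded by a positive real number $L$. Note that we have $\mathbb{E}A^i_\infty = \mathbb{E}[\mathbb{E}[A^i_\infty - A^i_0\mid\mathcal{F}_0]] \le K$. A simple calculation shows that
$$(A^i_\infty)^2 = 2\int_0^\infty(A^i_\infty - A^i_t)\,dA^i_t.$$
We then have, using Proposition 3.14,
$$\begin{aligned}\mathbb{E}(A^i_\infty)^2 &= 2\mathbb{E}\int_0^\infty(A^i_\infty - A^i_t)\,dA^i_t = 2\mathbb{E}\int_0^\infty(\mathbb{E}[A^i_\infty\mid\mathcal{F}_t] - A^i_t)\,dA^i_t\\ &= 2\mathbb{E}\int_0^\infty\mathbb{E}[A^i_\infty - A^i_t\mid\mathcal{F}_t]\,dA^i_t \le 2K\,\mathbb{E}\int_0^\infty dA^i_t = 2K\,\mathbb{E}A^i_\infty \le 2K^2.\end{aligned}$$
If we let $T_L = \inf\{t : A^1_t + A^2_t\ge L\}$ and $A^{i,L}_t = A^i_{t\wedge T_L}$, then (9.5) still holds if we replace $A^i_t$ by $A^{i,L}_t$. We obtain $\mathbb{E}(A^{i,L}_\infty)^2\le 2K^2$, and then letting $L\to\infty$ and using Fatou’s lemma proves (9.8).

We next write
$$B_\infty^2 = 2\int_0^\infty(B_\infty - B_t)\,dB_t,$$
and hence
$$\begin{aligned}\mathbb{E}B_\infty^2 &= 2\mathbb{E}\int_0^\infty\mathbb{E}[B_\infty - B_t\mid\mathcal{F}_t]\,dB_t \le 2\mathbb{E}\int_0^\infty\mathbb{E}[V\mid\mathcal{F}_t]\,d(A^1_t + A^2_t)\\ &= 2\mathbb{E}\int_0^\infty V\,d(A^1_t + A^2_t) = 2\mathbb{E}[V(A^1_\infty + A^2_\infty)].\end{aligned}$$
The bound (9.8) takes care of the integrability concerns. Since (9.8) gives $\mathbb{E}[(A^1_\infty + A^2_\infty)^2] \le 2\mathbb{E}(A^1_\infty)^2 + 2\mathbb{E}(A^2_\infty)^2 \le 8K^2$, the Cauchy–Schwarz inequality yields
$$\mathbb{E}B_\infty^2 \le 2(\mathbb{E}[(A^1_\infty + A^2_\infty)^2])^{1/2}(\mathbb{E}V^2)^{1/2} \le 4\sqrt2K(\mathbb{E}V^2)^{1/2}.$$
Now let $M_t = \mathbb{E}[B_\infty\mid\mathcal{F}_t]$, $N_t = \mathbb{E}[V\mid\mathcal{F}_t]$, where we take the right-continuous versions (see Corollary 3.13), and let $X_t = M_t - B_t$. We have $|X_t| = |\mathbb{E}[B_\infty - B_t\mid\mathcal{F}_t]| \le N_t$, and using Doob’s inequalities,
$$\mathbb{E}\sup_{t\ge 0}X_t^2 \le \mathbb{E}\sup_{t\ge 0}N_t^2 \le 4\mathbb{E}N_\infty^2 \le 4\mathbb{E}V^2.$$
Also by Doob’s inequalities,
$$\mathbb{E}\sup_{t\ge 0}M_t^2 \le 4\mathbb{E}M_\infty^2 = 4\mathbb{E}B_\infty^2.$$
Since $\sup_{t\ge 0}|B_t| \le \sup_{t\ge 0}|X_t| + \sup_{t\ge 0}|M_t|$, we have $\mathbb{E}\sup_tB_t^2 \le 2\mathbb{E}\sup_tX_t^2 + 2\mathbb{E}\sup_tM_t^2 \le 8\mathbb{E}V^2 + 8\mathbb{E}B_\infty^2$, and (9.7) follows.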
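The “simple calculation” $(A_\infty)^2 = 2\int_0^\infty(A_\infty - A_t)\,dA_t$ can be checked on a grid. For left-point Riemann–Stieltjes sums over a partition $0 = t_0 < \dots < t_n$ with $A_{t_0} = 0$, the identity becomes exact in the form $2\sum_k(A_{t_n} - A_{t_k})(A_{t_{k+1}} - A_{t_k}) = A_{t_n}^2 + \sum_k(A_{t_{k+1}} - A_{t_k})^2$. The correction term vanishes as the mesh shrinks because $A$ is continuous. The Python sketch below uses an arbitrary deterministic increasing path, chosen for illustration and treated as constant after time 5.

```python
import math

def stieltjes_left(g, a_vals):
    """Left-point Riemann-Stieltjes sum of g(k) * (A(t_{k+1}) - A(t_k))."""
    return sum(g(k) * (a_vals[k + 1] - a_vals[k]) for k in range(len(a_vals) - 1))

# A hypothetical increasing path on [0, 5] with A_0 = 0, taken to be constant afterwards
n = 100_000
ts = [5 * k / n for k in range(n + 1)]
a_vals = [1 - math.exp(-t) + 0.3 * math.atan(t) for t in ts]
a_inf = a_vals[-1]

rs_sum = stieltjes_left(lambda k: a_inf - a_vals[k], a_vals)
print(a_inf ** 2, 2 * rs_sum)   # differ by the sum of squared increments, which is tiny here
```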
We now prove the Doob–Meyer decomposition for continuous supermartingales. In view of the proof of Proposition A.30, we would like to let
$$A_t = -\int_0^t\mathbb{E}\Big[\frac{dZ_s}{ds}\,\Big|\,\mathcal{F}_s\Big]\,ds,$$
but this doesn’t make sense. We instead define an approximation $A^h_t$ by (9.9) and show that $A^h_t$ converges to what we want as $h\to 0$.

Theorem 9.12 Suppose $Z_t$ is a continuous adapted supermartingale of class D. Then there exists an increasing adapted continuous process $A_t$ with paths locally of bounded variation started at 0 and a continuous local martingale $M_t$ such that $Z_t = M_t - A_t$. If $M'$ and $A'$ are two other such processes with $Z_t = M'_t - A'_t$, then $M'_t = M_t$ and $A'_t = A_t$ for all $t$, a.s.

Proof Let us prove the second assertion first. Let $S_N$ be the first time that $|M_t| + |M'_t|$ exceeds $N$. If $Z_t = M_t - A_t = M'_t - A'_t$, then $M_{t\wedge S_N} - M'_{t\wedge S_N} = A_{t\wedge S_N} - A'_{t\wedge S_N}$ is a martingale whose paths are locally of bounded variation. By Theorem 9.7, $M_{t\wedge S_N} = M'_{t\wedge S_N}$, a.s. Since this is true for all $N$, then $M_t = M'_t$, and hence also $A_t = A'_t$.

Now let us prove the existence of $M$ and $A$. Let $T_N = \inf\{t : |Z_t|\ge N\}\wedge N$ and $Z^N_t = Z_{t\wedge T_N}$. By Exercise 9.2, $Z^N$ is a supermartingale. If we prove the decomposition $Z^N_t = M^N_t - A^N_t$ for each $N$, then by the uniqueness assertion, if $N_1 < N_2$, we have $A^{N_1}_t$ and $M^{N_1}_t$ agreeing with $A^{N_2}_t$ and $M^{N_2}_t$, respectively, for $t\le T_{N_1}$. Hence given $t$, we can choose $N$ large enough so that $t\le T_N$ and then define $M_t = M^N_t$, $A_t = A^N_t$. Clearly this gives the desired decomposition. Thus we may suppose that $Z_t$ is bounded by some $N$ and that $Z_t$ is constant for $t\ge N$. Let $V_\delta = \sup_{|t-s|\le\delta}|Z_t - Z_s|$. Since $Z$ has continuous paths,
$$V_\delta = \sup_{s,t\in\mathbb{Q}_+,\,|t-s|\le\delta}|Z_t - Z_s|,$$
and therefore $V_\delta$ is measurable with respect to $\mathcal{F}_\infty$. Since the paths of $Z$ are uniformly continuous, $V_\delta\to 0$, a.s., as $\delta\to 0$, and since $|V_\delta|\le 2N$, we have by dominated convergence that $\mathbb{E}V_\delta^2\to 0$ as $\delta\to 0$. We define
$$A^h_t = \frac1h\int_0^t(Z_s - \mathbb{E}[Z_{s+h}\mid\mathcal{F}_s])\,ds. \qquad (9.9)$$
At this point we do not know even that $\mathbb{E}[Z_{s+h}\mid\mathcal{F}_s]$ has any nice measurability properties (it is not a martingale, for example); let us assume that it has a version that has continuous paths, is adapted, and is jointly measurable in $t$ and $\omega$, and prove this
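fact a bit later on. Because $Z$ is a supermartingale, $A^h$ is increasing. We have (note Exercise 9.6)
$$\begin{aligned}\mathbb{E}[A^h_\infty - A^h_t\mid\mathcal{F}_t] &= \mathbb{E}\Big[\frac1h\int_t^\infty\mathbb{E}[Z_s - Z_{s+h}\mid\mathcal{F}_s]\,ds\,\Big|\,\mathcal{F}_t\Big] = \frac1h\int_t^\infty\mathbb{E}[Z_s - Z_{s+h}\mid\mathcal{F}_t]\,ds\\ &= \mathbb{E}\Big[\frac1h\Big(\int_t^\infty Z_s\,ds - \int_{t+h}^\infty Z_s\,ds\Big)\,\Big|\,\mathcal{F}_t\Big] = \mathbb{E}\Big[\frac1h\int_t^{t+h}Z_s\,ds\,\Big|\,\mathcal{F}_t\Big]\\ &= \mathbb{E}\Big[\int_0^1Z_{t+uh}\,du\,\Big|\,\mathcal{F}_t\Big].\end{aligned}$$
Since $Z$ is bounded by $N$, it follows that $A^h$ satisfies (9.5). If $k < h$, then
$$|\mathbb{E}[(A^h_\infty - A^h_t) - (A^k_\infty - A^k_t)\mid\mathcal{F}_t]| = \Big|\mathbb{E}\Big[\int_0^1(Z_{t+uh} - Z_{t+uk})\,du\,\Big|\,\mathcal{F}_t\Big]\Big| \le \mathbb{E}[V_h\mid\mathcal{F}_t].$$
Now apply Proposition 9.11 to see that $\mathbb{E}\sup_{t\ge 0}(A^h_t - A^k_t)^2\to 0$ as $k,h\to 0$. This shows that whenever $h_n$ decreases to 0, then $A^{h_n}$ is a Cauchy sequence in a normed linear space, where the norm is given by
$$\|X\| = \Big(\mathbb{E}\sup_{t\ge 0}|X_t|^2\Big)^{1/2}, \qquad (9.10)$$
which is complete by Exercise 9.5. Therefore there exists a limit $A$. Since
$$\mathbb{E}\sup_{t\ge 0}(A^h_t - A_t)^2\to 0$$
as $h\to 0$, there exists a subsequence $h_n\to 0$ such that $\sup_{t\ge 0}(A^{h_n}_t - A_t)^2\to 0$, a.s., which proves that $A_t$ is continuous and increasing. We calculate
$$\begin{aligned}\mathbb{E}[A_\infty - A_t\mid\mathcal{F}_t] &= \lim_{h\to 0}\mathbb{E}[A^h_\infty - A^h_t\mid\mathcal{F}_t] = \lim_{h\to 0}\mathbb{E}\Big[\int_0^1Z_{t+uh}\,du\,\Big|\,\mathcal{F}_t\Big]\\ &= \mathbb{E}\Big[\int_0^1Z_t\,du\,\Big|\,\mathcal{F}_t\Big] = Z_t.\end{aligned}$$
Therefore $Z_t = \mathbb{E}[A_\infty\mid\mathcal{F}_t] - A_t$, which is the decomposition of $Z$ into a martingale minus an increasing process.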
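For intuition, here is a discrete-time analogue of the construction. The example is ours; Proposition A.30 is the discrete decomposition the text refers to. For a simple random walk $S_k$, the process $Z_k = -S_k^2$ is a supermartingale. Taking increments $Z_k - \mathbb{E}[Z_{k+1}\mid\mathcal{F}_k]$, as in (9.9) with $h = 1$, gives $A_k = k$ and the martingale $M_k = k - S_k^2$. The Python sketch below checks this exactly with `fractions.Fraction`. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def z(s):
    """The supermartingale Z_k = -S_k^2 as a function of the current position."""
    return -s * s

def doob_decomposition(steps):
    """Return [(Z_k, M_k, A_k)] along one simple-random-walk path.

    A_{k+1} - A_k = Z_k - E[Z_{k+1} | F_k] (discrete analogue of (9.9) with h = 1),
    and M_k = Z_k + A_k, so that Z_k = M_k - A_k."""
    S, A = 0, Fraction(0)
    out = [(z(S), z(S) + A, A)]
    for x in steps:
        expected_next = Fraction(z(S + 1) + z(S - 1), 2)   # E[Z_{k+1} | F_k]
        A += z(S) - expected_next
        S += x
        out.append((z(S), z(S) + A, A))
    return out

def is_martingale(N):
    """Check E[M_{k+1} | F_k] = M_k exactly for every prefix of length k < N."""
    for k in range(N):
        for prefix in product((-1, 1), repeat=k):
            m_now = doob_decomposition(prefix)[-1][1]
            m_next = sum(doob_decomposition(prefix + (x,))[-1][1] for x in (-1, 1))
            if Fraction(m_next, 2) != m_now:
                return False
    return True

print([A for (_, _, A) in doob_decomposition((1, 1, -1, 1))])   # A_k = k on every path
print(is_martingale(6))
```

Here the increasing part turns out to be deterministic, which matches $\langle M\rangle_t = t$ for Brownian motion in the continuous setting.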
Fix $h$. It remains to show that there is a version of $\mathbb{E}[Z_{s+h}\mid\mathcal{F}_s]$ that is a continuous jointly measurable adapted process. Define $Y_t = Z_{t+h}$ and define $Y^n_t$ to be equal to $Y_{k/2^n}$ if $k/2^n\le t < (k+1)/2^n$. Take the right-continuous version $\widehat Y^{k,n}_t$ of the martingale $\mathbb{E}[Y_{k/2^n}\mid\mathcal{F}_t]$ (see Corollary 3.13) and let
$$\widehat Y^n_t(\omega) = \sum_{k=0}^\infty 1_{[k/2^n,(k+1)/2^n)}(t)\,\widehat Y^{k,n}_t(\omega).$$
Note that $\widehat Y^n_t = \mathbb{E}[Y^n_t\mid\mathcal{F}_t]$, a.s., for all $t$. Moreover, $\widehat Y^n_t$ is right continuous, so we see that it is jointly measurable in $t$ and $\omega$. Now for $n > m$,
$$\sup_{t\ge 0}|\widehat Y^n_t - \widehat Y^m_t| \le \sup_{t\ge 0}\mathbb{E}[V_{2^{-m}}\mid\mathcal{F}_t]. \qquad (9.11)$$
We have already seen that there exists a subsequence such that the right-hand side of (9.11) converges to 0 almost surely. Hence along the appropriate subsequence, $\widehat Y^n_t$ converges uniformly. If we call the limit $\widehat Y_t$, we see that $\widehat Y_t$ is right continuous, adapted, and jointly measurable. If $k/2^n\le t\le(k+1)/2^n$, then $|Y_t - Y_{k/2^n}|\le V_{2^{-n}}$, so
$$|\mathbb{E}[Y_t\mid\mathcal{F}_t] - \widehat Y^n_t| = |\mathbb{E}[Y_t - Y_{k/2^n}\mid\mathcal{F}_t]| \le \mathbb{E}[V_{2^{-n}}\mid\mathcal{F}_t].$$
By the triangle inequality,
$$|\widehat Y^n_t - \widehat Y^n_s| \le 2\sup_{t\ge 0}\mathbb{E}[V_{2^{-n}}\mid\mathcal{F}_t]$$
if $k/2^n\le s,t\le(k+1)/2^n$. Therefore the largest jump of $\widehat Y^n_t$ is bounded by $2\sup_{t\ge 0}\mathbb{E}[V_{2^{-n}}\mid\mathcal{F}_t]$, and we conclude the limit $\widehat Y_t$ has continuous paths. Finally, $Y^n_t$ differs from $Y_t$ by at most $V_{2^{-n}}$, so we see by passing to the limit that $\widehat Y_t$ is a version of $\mathbb{E}[Z_{t+h}\mid\mathcal{F}_t]$.
Exercises
9.1
Let $W_t$ be a Brownian motion started at 1 and $T_0 = \inf\{t > 0 : W_t = 0\}$. Is $M_t = W_{t\wedge T_0}$ a square integrable martingale? A locally square integrable martingale? A uniformly integrable martingale? A martingale? A local martingale? A semimartingale?
9.2
Prove that if $M$ is a submartingale such that the paths of $M$ are continuous, $\sup_t|M_t|$ is integrable, and $S\le T$ are finite stopping times, then $\mathbb{E}[M_T\mid\mathcal{F}_S]\ge M_S$. Note that the last part of the proof of Proposition 9.3 breaks down here.
9.3
Suppose $M_t$ is a local martingale with continuous paths. Show that if $N > 0$, $T_N = \inf\{t : |M_t|\ge N\}$, and $M^N_t = M_{t\wedge T_N}$, then $M^N$ is a uniformly integrable martingale.
9.4
Suppose $W^1_t$ and $W^2_t$ are two independent Brownian motions, $t_0 > 0$, and $M^i_t = W^i_{t\wedge t_0}$, $i = 1,2$. Show $\langle M^1,M^2\rangle_t = 0$.
9.5
Show that the norm defined in (9.10) is complete.
9.6
Let Zt be a bounded supermartingale with continuous paths that is constant from some time t0 on. Show that for each t ∞ ∞ E E [Zs − Zs+h | Fs ] ds | Ft = E [Zs − Zs+h | Ft ] ds, a.s. t
t
9.7 We mentioned that one can prove the existence of $\langle M\rangle$ without using the Doob–Meyer theorem. Here is how that argument starts. Let $M$ be a bounded continuous martingale and for each $n$, define
$$I_n(t)=\sum_{i=0}^{[t2^n]}(M_{(i+1)/2^n}-M_{i/2^n})^2.$$
Here $[x]$ is the integer part of $x$. Prove that for each $t>0$, $E|I_n(t)-I_m(t)|^2\to0$ as $n,m\to\infty$. One can then define $\langle M\rangle_t$ as the $L^2$ limit of $I_n(t)$. Hint: If $n>m$, note that
$$M_{(i+1)/2^m}-M_{i/2^m}=\sum_{j=2^{n-m}i}^{2^{n-m}(i+1)-1}(M_{(j+1)/2^n}-M_{j/2^n}).$$
Notes
The first proof of the Doob–Meyer decomposition was by Meyer in the early 1960s and was a major breakthrough. There are now a number of alternate proofs. The proof we give here for continuous supermartingales is new.
10 Stochastic integrals
This chapter is devoted to the construction of stochastic integrals, primarily with respect to continuous square integrable martingales. The motivating example is $\int_0^tH_s\,dW_s$, where $W$ is a Brownian motion and $H$ is an adapted process satisfying certain conditions. We cannot define this integral as a Lebesgue–Stieltjes integral because the paths of Brownian motion are nowhere differentiable (Theorem 7.3), hence not of bounded variation on any interval. One way to visualize a stochastic integral is to think of $dW_s$ as "white noise" on a radio and $H_s$ as the volume control, which increases or decreases the white noise by a factor. For another model, if $W_s$ represents a stock price at time $s$ (of course, stock prices can't be negative, while Brownian motion can!) and $H_s$ is the number of shares held at time $s$, then the stochastic integral represents the net profit.
10.1 Construction
Let $M_t$ be a continuous square integrable martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions, and suppose $H_t$ is an adapted process. Under appropriate additional assumptions on $H$, we want to define
$$N_t=\int_0^tH_s\,dM_s, \tag{10.1}$$
the stochastic integral of $H$ with respect to $M$. We impose two conditions on the integrand $H_t$, a measurability one and an integrability one. First we define the predictable σ-field $\mathcal P$ on $[0,\infty)\times\Omega$. This is the smallest σ-field of subsets of $[0,\infty)\times\Omega$ with respect to which all left-continuous, bounded, adapted processes are measurable. In symbols,
$$\mathcal P=\sigma(X: X\text{ is left continuous, bounded, and adapted to }\{\mathcal F_t\}).$$
This can be rephrased by saying $\mathcal P$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all sets of the form $\{(t,\omega)\in[0,\infty)\times\Omega:X_t(\omega)>a\}$, where $a\in\mathbb R$ and $X$ is a bounded, adapted, left-continuous process. We require $H:[0,\infty)\times\Omega\to\mathbb R$ to be measurable with respect to $\mathcal P$. When this happens, we say $H$ is predictable. The integrability condition is easier to state: we require
$$E\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty. \tag{10.2}$$
Observe that $H$ will meet both requirements if $H$ is bounded, adapted, and has continuous paths. We define $\int_0^tH_s\,dM_s$ in three steps:
Step 1. When $H_s(\omega)=K(\omega)1_{(a,b]}(s)$, where $K$ is bounded and $\mathcal F_a$ measurable.
Step 2. When $H_s$ is a sum of processes of the form in Step 1.
Step 3. When $H$ is predictable and satisfies (10.2).
If $M_t=W_{t\wedge t_0}$, where $W$ is a Brownian motion and $t_0$ is a fixed time, then $\langle M\rangle_t=t\wedge t_0$, and it might help the reader to work through the proofs in this special case. Even in this situation, all the elements of the general construction are present. We will need the following easy lemma.
Lemma 10.1 The predictable σ-field $\mathcal P$ is generated by the collection $\mathcal C$ of processes of the form $X_t(\omega)=\sum_{i=1}^nK_i(\omega)1_{(a_i,b_i]}(t)$, where for each $i$, $K_i$ is a bounded $\mathcal F_{a_i}$ measurable random variable.
Proof If $X\in\mathcal C$, then $X$ is bounded, adapted, and left continuous, hence $X$ is a predictable process. Thus $\sigma(\mathcal C)\subset\mathcal P$. On the other hand, if $Y$ is a bounded, adapted, left-continuous process, we can approximate $Y$ by the processes
$$Y^n_t(\omega)=\sum_{i=0}^{n2^n}Y_{i/2^n}(\omega)1_{(i/2^n,(i+1)/2^n]}(t).$$
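Since $Y$ is left continuous, $Y^n_t\to Y_t$ for each $t>0$. Each such $Y^n$ is in $\mathcal C$. Therefore the σ-field generated by $\mathcal C$ contains $\mathcal P$.
Proposition 10.2 Suppose $H$ is as in Step 1 above. Then $N_t=K(M_{t\wedge b}-M_{t\wedge a})$ is a continuous martingale,
$$EN_\infty^2=E\int_0^\infty K^21_{(a,b]}(s)\,d\langle M\rangle_s=E[K^2(\langle M\rangle_b-\langle M\rangle_a)],$$
and
$$\langle N\rangle_t=\int_0^tK^21_{(a,b]}(s)\,d\langle M\rangle_s.$$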
Proof The continuity of the paths of $N$ is clear. Set $N_\infty=K(M_b-M_a)$. Since $K$ is bounded and $M$ is square integrable, $EN_\infty^2<\infty$. We will show $N_t=E[N_\infty\mid\mathcal F_t]$, which will prove that $N_t$ is a martingale. If $t\ge b$, then since $K$, $M_b$, and $M_a$ are $\mathcal F_t$ measurable,
$$E[N_\infty\mid\mathcal F_t]=K(M_b-M_a)=N_t.$$
If $a\le t\le b$, $K$ is $\mathcal F_t$ measurable, and
$$E[K(M_b-M_a)\mid\mathcal F_t]=K\,E[M_b-M_a\mid\mathcal F_t]=K(M_t-M_a)=N_t.$$
In particular, $N_a=E[N_\infty\mid\mathcal F_a]=0$. Finally, if $t\le a$,
$$E[N_\infty\mid\mathcal F_t]=E\big[E[N_\infty\mid\mathcal F_a]\mid\mathcal F_t\big]=0=N_t.$$
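For $EN_\infty^2$, we have by (9.2) with $S=a$ and $T=b$,
$$EN_\infty^2=E[K^2(M_b-M_a)^2]=E\big[K^2E[(M_b-M_a)^2\mid\mathcal F_a]\big]=E\big[K^2E[\langle M\rangle_b-\langle M\rangle_a\mid\mathcal F_a]\big]=E[K^2(\langle M\rangle_b-\langle M\rangle_a)].$$
To verify the formula for $\langle N\rangle_t$, let
$$L_\infty=K^2(M_b-M_a)^2-K^2(\langle M\rangle_b-\langle M\rangle_a),\qquad L_t=K^2(M_{b\wedge t}-M_{a\wedge t})^2-K^2(\langle M\rangle_{b\wedge t}-\langle M\rangle_{a\wedge t}).$$
Then
$$L_t=N_t^2-\int_0^tK^21_{(a,b]}(s)\,d\langle M\rangle_s,$$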
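and we must show that $L_t$ is a martingale. To do this, it suffices to show $L_t=E[L_\infty\mid\mathcal F_t]$. If $t\ge b$, then $L_\infty$ is $\mathcal F_t$ measurable, so $E[L_\infty\mid\mathcal F_t]=L_\infty=L_t$. If $a\le t\le b$, write $M_b-M_a=(M_b-M_t)+(M_t-M_a)$ and use that $K$ and $M_t-M_a$ are $\mathcal F_t$ measurable:
$$\begin{aligned}E[L_\infty\mid\mathcal F_t]&=K^2E\big[(M_b-M_t)^2-(\langle M\rangle_b-\langle M\rangle_t)\mid\mathcal F_t\big]+2K^2(M_t-M_a)E[M_b-M_t\mid\mathcal F_t]\\&\qquad+K^2\big[(M_t-M_a)^2-(\langle M\rangle_t-\langle M\rangle_a)\big]\\&=K^2\big[(M_t-M_a)^2-(\langle M\rangle_t-\langle M\rangle_a)\big]=L_t,\end{aligned}$$
using (9.1) and (9.3) with the stopping times there being fixed positive real numbers. In particular, $E[L_\infty\mid\mathcal F_a]=L_a=0$. Finally, if $t\le a$,
$$E[L_\infty\mid\mathcal F_t]=E\big[E[L_\infty\mid\mathcal F_a]\mid\mathcal F_t\big]=0=L_t,$$
as required.
Next suppose
$$H_s(\omega)=\sum_{j=1}^JK_j1_{(a_j,b_j]}(s), \tag{10.3}$$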
where each $K_j$ is $\mathcal F_{a_j}$ measurable and bounded. We may rewrite $H$ so that the intervals $(a_j,b_j]$ satisfy $a_1<b_1\le a_2<b_2\le\cdots\le a_J<b_J$. For example, if $H_s=K_11_{(a_1,b_1]}+K_21_{(a_2,b_2]}$ with $a_1<a_2<b_1<b_2$, we may rewrite $H_s$ as $K_11_{(a_1,a_2]}+(K_1+K_2)1_{(a_2,b_1]}+K_21_{(b_1,b_2]}$. Define
$$N_t=\sum_{j=1}^JK_j(M_{t\wedge b_j}-M_{t\wedge a_j}). \tag{10.4}$$
We need to check that rewriting $H_s$ so that $a_1<b_1\le a_2<\cdots<b_J$ does not affect the value of $N_t$, but this is routine.
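Because (10.4) is a finite sum, the "routine" check can be carried out numerically for one fixed ω, where each $K_j$ is just a number. The following is a minimal sketch, assuming NumPy; the function name `simple_integral` and the integer time grid are illustrative choices, not notation from the text. It takes the overlapping example above and its disjoint rewriting and confirms that they give the same $N_t$ at every grid time on one random-walk path.

```python
import numpy as np

def simple_integral(M, t, pieces):
    """N_t = sum_j K_j (M_{t ^ b_j} - M_{t ^ a_j}), as in (10.4), for one fixed omega.

    M      : sampled path, M[k] is the value at grid time k
    t      : grid time at which to evaluate N
    pieces : list of (K_j, a_j, b_j) with integer grid times a_j < b_j
    """
    return sum(K * (M[min(t, b)] - M[min(t, a)]) for K, a, b in pieces)

rng = np.random.default_rng(0)
M = np.concatenate([[0.0], np.cumsum(rng.normal(size=12))])  # a random-walk path on times 0..12

K1, K2 = 1.5, -0.7
overlapping = [(K1, 2, 7), (K2, 4, 9)]                # a1 < a2 < b1 < b2
disjoint = [(K1, 2, 4), (K1 + K2, 4, 7), (K2, 7, 9)]  # the rewritten form

agree = all(
    np.isclose(simple_integral(M, t, overlapping), simple_integral(M, t, disjoint))
    for t in range(len(M))
)
print(agree)  # True
```

The two sums agree because, for every $t$, both telescope to $K_1(M_{t\wedge7}-M_{t\wedge2})+K_2(M_{t\wedge9}-M_{t\wedge4})$.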
Proposition 10.3 With $H$ as in (10.3) and $N$ defined by (10.4), $N_t$ is a continuous martingale,
$$EN_\infty^2=E\int_0^\infty H_s^2\,d\langle M\rangle_s,$$
and
$$\langle N\rangle_t=\int_0^tH_s^2\,d\langle M\rangle_s. \tag{10.5}$$
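Proof By linearity, $N_t$ is a continuous martingale. With the intervals arranged so that $a_1<b_1\le a_2<\cdots<b_J$, we have
$$EN_\infty^2=E\Big[\sum_jK_j^2(M_{b_j}-M_{a_j})^2\Big]+2E\Big[\sum_{i<j}K_iK_j(M_{b_i}-M_{a_i})(M_{b_j}-M_{a_j})\Big]. \tag{10.6}$$
The cross terms vanish, because when $i<j$ we have $b_i\le a_j$, and conditioning on $\mathcal F_{a_j}$ gives
$$E\big[K_iK_j(M_{b_i}-M_{a_i})E[M_{b_j}-M_{a_j}\mid\mathcal F_{a_j}]\big]=0.$$
For the terms in the first sum in (10.6), by (9.3)
$$E[K_j^2(M_{b_j}-M_{a_j})^2]=E\big[K_j^2E[(M_{b_j}-M_{a_j})^2\mid\mathcal F_{a_j}]\big]=E\big[K_j^2E[\langle M\rangle_{b_j}-\langle M\rangle_{a_j}\mid\mathcal F_{a_j}]\big]=E[K_j^2(\langle M\rangle_{b_j}-\langle M\rangle_{a_j})].$$
Therefore, since the intervals are disjoint,
$$EN_\infty^2=E\int_0^\infty H_s^2\,d\langle M\rangle_s. \tag{10.7}$$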
The argument for $\langle N\rangle_t$ is similar.
Now suppose $H_s$ is predictable and (10.2) holds. Choose $H^n_s$ of the form given in (10.3) above such that
$$E\int_0^\infty(H^n_s-H_s)^2\,d\langle M\rangle_s\to0.$$
To see that this can be done, define
$$\|Y\|_2=\Big(E\int_0^\infty Y_t^2\,d\langle M\rangle_t\Big)^{1/2}$$
for $Y$ predictable. Then $\|Y\|_2$ is an $L^2$ norm on functions on $[0,\infty)\times\Omega$, so by Lemma 10.1 we can approximate $H$ in this norm by processes of the form given in (10.3). (When $H$ is bounded, adapted, and has continuous paths, taking $H^n_s$ equal to $H_{k/2^n}$ if $k/2^n<s\le(k+1)/2^n$ for $s<n$ and $H^n_s=0$ if $s\ge n$ will work.) By Doob's inequalities we have
$$E\Big[\sup_{t\ge0}\Big(\int_0^t(H^n_s-H^m_s)\,dM_s\Big)^2\Big]\le4E\Big(\int_0^\infty(H^n_s-H^m_s)\,dM_s\Big)^2=4E\int_0^\infty(H^n_s-H^m_s)^2\,d\langle M\rangle_s\to0.$$
The norm
$$\|Y\|_\infty=\big(E[\sup_t|Y_t|^2]\big)^{1/2} \tag{10.8}$$
is complete; this was shown in Exercise 9.5. Thus there exists a process $N_t$ such that $\sup_{t\ge0}|N_t-\int_0^tH^n_s\,dM_s|\to0$ in $L^2$. If $H^n_s$ and $\bar H^n_s$ are two sequences converging in the $\|\cdot\|_2$ norm to $H$, then
$$E\Big(\int_0^t(H^n_s-\bar H^n_s)\,dM_s\Big)^2=E\int_0^t(H^n_s-\bar H^n_s)^2\,d\langle M\rangle_s\to0,$$
so the limit is independent of which sequence $H^n$ we choose. It is easy to see, because of the $L^2$ convergence, that $N_t$ is a martingale: if $s\le t$ and $A\in\mathcal F_s$, then
$$E\Big[\int_0^tH^n_r\,dM_r;A\Big]=E\Big[\int_0^sH^n_r\,dM_r;A\Big]$$
by Proposition 10.3. Now use that
$$\Big|E\Big[\int_0^tH^n_r\,dM_r-N_t;A\Big]\Big|\le E\Big|\int_0^tH^n_r\,dM_r-N_t\Big|\le\Big(E\Big|\int_0^tH^n_r\,dM_r-N_t\Big|^2\Big)^{1/2}\to0$$
and similarly with $t$ replaced by $s$. Similar arguments using the $L^2$ convergence show that
$$EN_t^2=E\int_0^tH_s^2\,d\langle M\rangle_s \tag{10.9}$$
and
$$\langle N\rangle_t=\int_0^tH_s^2\,d\langle M\rangle_s. \tag{10.10}$$
Because $\sup_{t\ge0}|N_t-\int_0^tH^n_s\,dM_s|\to0$ in $L^2$, there exists a subsequence $\{n_k\}$ such that the convergence takes place almost surely, that is,
$$\sup_{t\ge0}\Big|\int_0^tH^{n_k}_s\,dM_s-N_t\Big|\to0,\quad\text{a.s.}$$
Since each $\int_0^tH^n_s\,dM_s$ has continuous paths, with probability one, $N_t$ has continuous paths. We write $N_t=\int_0^tH_s\,dM_s$ and call $N_t$ the stochastic integral of $H$ with respect to $M$. We summarize our construction as follows.
Theorem 10.4 Suppose the filtration $\{\mathcal F_t\}$ satisfies the usual conditions and $M_t$ is a square integrable martingale with continuous paths. Suppose $H$ is of the form
$$H_s(\omega)=\sum_{j=1}^JK_j(\omega)1_{(a_j,b_j]}(s), \tag{10.11}$$
where each $K_j$ is bounded and $\mathcal F_{a_j}$ measurable. In this case define
$$\int_0^tH_s\,dM_s=\sum_{j=1}^JK_j(M_{t\wedge b_j}-M_{t\wedge a_j}).$$
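If $H$ is predictable and
$$E\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty,$$
choose $H^n$ of the form given in (10.11) with $E\int_0^\infty(H^n_s-H_s)^2\,d\langle M\rangle_s\to0$, and define
$$N_t=\int_0^tH_s\,dM_s$$
to be the limit with respect to the norm (10.8) of $\int_0^tH^n_s\,dM_s$. Then $N_t$ is a continuous martingale,
$$EN_\infty^2=E\int_0^\infty H_s^2\,d\langle M\rangle_s,$$
and
$$\langle N\rangle_t=\int_0^tH_s^2\,d\langle M\rangle_s.$$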
Moreover the definition of Nt is independent of the particular choice of the H n .
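As a numerical sanity check of the identity $EN_\infty^2=E\int_0^\infty H_s^2\,d\langle M\rangle_s$, here is a Monte Carlo sketch, assuming NumPy; the function name is hypothetical. It takes $M$ to be a Brownian motion on $[0,1]$, so $\langle M\rangle_t=t$, and the step integrand $H_s=W_{t_j}$ on $(t_j,t_{j+1}]$, which has the shape of (10.11) with $K_j=W_{t_j}$ being $\mathcal F_{t_j}$ measurable (though not bounded, so this is a slight stretch of the theorem's hypotheses). With $n$ equal steps both sides equal $(1-1/n)/2$ exactly; the simulation only estimates them.

```python
import numpy as np

def isometry_check(n_paths=20_000, n_steps=100, T=1.0, seed=0):
    """Estimate E(int_0^T H dW)^2 and E int_0^T H_s^2 ds for H_s = W_{t_j} on (t_j, t_{j+1}]."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W_left = np.cumsum(dW, axis=1) - dW              # W at left endpoints t_0 = 0, ..., t_{n-1}
    N_T = np.sum(W_left * dW, axis=1)                # sum_j K_j (W_{b_j} - W_{a_j})
    lhs = np.mean(N_T ** 2)                          # estimates E N_T^2
    rhs = np.mean(np.sum(W_left ** 2, axis=1) * dt)  # estimates E int H_s^2 d<W>_s
    return lhs, rhs

lhs, rhs = isometry_check()
exact = (1 - 1 / 100) / 2  # = 0.495 for n_steps = 100 and T = 1
```

Evaluating the integrand at left endpoints is what makes $K_j$ measurable with respect to $\mathcal F_{t_j}$; right endpoints would break the isometry.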
10.2 Extensions
There are some extensions of the definition that are fairly routine.
Extension 1. If
$$\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty,\quad\text{a.s.},$$
but without the expectation being finite, let
$$T_N=\inf\Big\{t:\int_0^tH_s^2\,d\langle M\rangle_s>N\Big\}.$$
Then $M'_t=M_{t\wedge T_N}$ is a square integrable martingale with $\langle M'\rangle_t=\langle M\rangle_{t\wedge T_N}$, so $\int_0^tH_s^2\,d\langle M'\rangle_s\le N$. Define $\int_0^tH_s\,dM_s$ to be the quantity $\int_0^tH_s\,dM_{s\wedge T_N}$ if $t\le T_N$. If $t\le T_K\le T_N$, we need to check that $\int_0^tH_s\,dM_{s\wedge T_K}=\int_0^tH_s\,dM_{s\wedge T_N}$, so that our definition is consistent. This is part of Exercise 10.2.
Extension 2. If $M_t$ is a continuous local martingale (see Section 9.1 for the definition), let $S_n=\inf\{t:|M_t|\ge n\}$. By Exercise 9.3, $M_{t\wedge S_n}$ will be a uniformly integrable martingale, and in fact, since $M_{t\wedge S_n}$ is bounded, it is square integrable. For $t\le S_n$ we set
$$\int_0^tH_s\,dM_s=\int_0^tH_s\,dM_{s\wedge S_n}$$
and $\langle M\rangle_t=\langle M_{\cdot\wedge S_n}\rangle_t$. Again there is consistency to check, which is also part of Exercise 10.2.
Extension 3. Suppose that $X_t=M_t+A_t$ is a semimartingale with continuous paths, so that $M$ is a local martingale and $A$ is a process with paths locally of bounded variation. If
$$\int_0^\infty H_s^2\,d\langle M\rangle_s+\int_0^\infty|H_s|\,|dA_s|<\infty,$$
we define
$$\int_0^tH_s\,dX_s=\int_0^tH_s\,dM_s+\int_0^tH_s\,dA_s,$$
where the first integral on the right is a stochastic integral and the second is a Lebesgue–Stieltjes integral. For a semimartingale, we define
$$\langle X\rangle_t=\langle M\rangle_t. \tag{10.12}$$
Given two semimartingales $X$ and $Y$ we define $\langle X,Y\rangle_t$ by
$$\langle X,Y\rangle_t=\tfrac12\big[\langle X+Y\rangle_t-\langle X\rangle_t-\langle Y\rangle_t\big].$$
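On a discrete grid the polarization definition can be tried out directly. The sketch below, assuming NumPy and with illustrative function names, uses sums of squared increments as stand-ins for the brackets (an approximation that Theorem 9.10 justifies) and builds two Brownian motions $X$ and $Y$ with correlation $\rho$, for which $\langle X,Y\rangle_t=\rho t$. Algebraically, the polarized quantity is exactly the sum of products of increments.

```python
import numpy as np

def realized_bracket(dX):
    """Sum of squared increments: a discrete stand-in for <X>_t."""
    return np.sum(dX ** 2)

def realized_covariation(dX, dY):
    """Polarization: (1/2)([X+Y] - [X] - [Y]) computed from increments."""
    return 0.5 * (realized_bracket(dX + dY) - realized_bracket(dX) - realized_bracket(dY))

rng = np.random.default_rng(1)
n, T, rho = 100_000, 1.0, 0.6
dt = T / n
dB1 = rng.normal(0.0, np.sqrt(dt), n)
dB2 = rng.normal(0.0, np.sqrt(dt), n)
dX = dB1                                      # X = B^1
dY = rho * dB1 + np.sqrt(1 - rho ** 2) * dB2  # Y is a Brownian motion with <X, Y>_t = rho t
cov = realized_covariation(dX, dY)            # close to rho * T = 0.6
```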
Exercises
10.1 Prove (10.5) in Proposition 10.3.
10.2 Check the consistency of the first two extensions of the definition of stochastic integrals.
10.3 Show that if $M$ is a continuous square integrable martingale, and $T$ a finite stopping time, then $\int_0^\infty1_{[0,T]}(s)\,dM_s=M_T-M_0$.
10.4 Show that if $N_t=\int_0^tH_s\,dM_s$, where $M$ is a continuous square integrable martingale, $H$ is predictable, and $E\int_0^\infty H_s^2\,d\langle M\rangle_s<\infty$, and $L_t=\int_0^tK_s\,dN_s$, where $K$ is predictable and $E\int_0^\infty K_s^2\,d\langle N\rangle_s<\infty$, then
$$L_t=\int_0^tH_sK_s\,dM_s.$$
10.5 Show that if $M$, $H$, and $N$ are as in Exercise 10.4, then $\langle M,N\rangle_t=\int_0^tH_s\,d\langle M\rangle_s$. Hint: Derive a formula for $\langle N+M\rangle_t$ from the fact that
$$N_t+M_t=\int_0^t(1+H_s)\,dM_s.$$
10.6 Suppose that $M$ and $L$ are square integrable martingales, $H$ is predictable and satisfies (10.2), and $N_t=\int_0^tH_s\,dM_s$. Show that
$$\langle N,L\rangle_t=\int_0^tH_s\,d\langle M,L\rangle_s. \tag{10.13}$$
Sometimes the stochastic integral of $H$ with respect to $M$ is defined to be the square integrable martingale $N$ for which (10.13) holds for all square integrable martingales $L$.
10.7 Show that if $M$ and $N$ are square integrable martingales with continuous paths, then $|\langle M,N\rangle_t|\le\langle M\rangle_t^{1/2}\langle N\rangle_t^{1/2}$. Hint: Imitate an appropriate proof of the Cauchy–Schwarz inequality. This result is a special case of the inequality of Kunita–Watanabe.
11 Itô’s formula
The most important result in the theory of stochastic integration is Itô’s formula. This is also known as the change of variables formula. Let $C^k$ be the functions that are $k$ times continuously differentiable and $C^k_b$ those functions in $C^k$ such that the function and its $i$th-order derivatives are bounded for $i\le k$.
Theorem 11.1 Let $X_t$ be a semimartingale with continuous paths and suppose $f\in C^2$. Then for almost every $\omega$,
$$f(X_t)=f(X_0)+\int_0^tf'(X_s)\,dX_s+\tfrac12\int_0^tf''(X_s)\,d\langle X\rangle_s,\qquad t\ge0. \tag{11.1}$$
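Formula (11.1) can be probed numerically in the simplest case $X=W$, a Brownian motion, where $\langle W\rangle_s=s$. The sketch below, assuming NumPy and using a hypothetical function name, replaces both integrals by left-endpoint sums on a fine grid and returns the residual $f(W_T)-f(W_0)-\sum f'(W)\,\Delta W-\tfrac12\sum f''(W)\,\Delta t$ together with the correction term. For $f(x)=x^2$ the correction term equals $T$, so leaving it out would leave a residual of size about $T$; with it, only discretization error remains.

```python
import numpy as np

def ito_residual(f, df, d2f, n_steps=200_000, T=1.0, seed=2):
    """Residual of a discretized Ito formula along one Brownian path, plus the correction term."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    W_left = W[:-1]                              # integrands evaluated at left endpoints
    stochastic_term = np.sum(df(W_left) * dW)    # approximates int_0^T f'(W_s) dW_s
    correction = 0.5 * np.sum(d2f(W_left)) * dt  # approximates (1/2) int_0^T f''(W_s) ds
    residual = f(W[-1]) - f(W[0]) - stochastic_term - correction
    return residual, correction

res_sq, corr_sq = ito_residual(lambda x: x ** 2, lambda x: 2 * x, lambda x: 2.0 * np.ones_like(x))
res_cube, _ = ito_residual(lambda x: x ** 3, lambda x: 3 * x ** 2, lambda x: 6 * x)
```

For $f(x)=x^2$ the residual is exactly $\sum(\Delta W)^2-T$, the discrete quadratic variation error.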
Step 1 will be to reduce to the case when $f\in C^3_b$ and $X$ has appropriate boundedness conditions. Step 2 is a use of Taylor’s formula; see (11.2). Step 3 shows that each term converges to the appropriate quantity, and Step 4 removes the restriction that $f$ be in $C^3_b$.
Proof Step 1. If $X_t=M_t+A_t$ is the decomposition of $X$ into a local martingale $M$ and a process $A$ that has paths locally of bounded variation, let $V_t$ be the total variation of $A$ up to time $t$: $V_t=\int_0^t|dA_s|$. Let
$$T_N=\inf\{t:|M_t|>N\text{ or }\langle M\rangle_t>N\text{ or }V_t>N\}.$$
By the continuity of paths, $T_N\to\infty$, a.s., as $N\to\infty$, so for almost every $\omega$ and for each $t$, $t\wedge T_N=t$ for $N$ large enough. Since Itô’s formula is a path-by-path result, it suffices to prove Itô’s formula for $X_{t\wedge T_N}$ for each $N$, or what amounts to the same thing, we may take $N$ arbitrary and assume $M_t$, $\langle M\rangle_t$, $A_t$, and $V_t$ are all bounded by $N$. In this case, $X_t$ is bounded by $2N$. Since $X$ is bounded, we may modify $f$, $f'$, and $f''$ outside of $[-2N,2N]$ without affecting the validity of Itô’s formula. Therefore we will also assume $f\in C^2$ with compact support. Let us temporarily assume in addition that $f'''$ exists and is continuous; we will remove this last assumption later on. Let $t_0>0$, $\varepsilon>0$, $S_0=0$, and define
$$S_{i+1}=S_{i+1}(\varepsilon)=\inf\{t>S_i:|M_t-M_{S_i}|>\varepsilon\text{ or }\langle M\rangle_t-\langle M\rangle_{S_i}>\varepsilon\text{ or }V_t-V_{S_i}>\varepsilon\}\wedge t_0.$$
Note $S_i=t_0$ for $i$ sufficiently large (how large depends on $\omega$) by the continuity of the paths.
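The stopping times $S_i(\varepsilon)$ follow a simple rule that can be run on sampled paths. The sketch below, assuming NumPy, computes a discrete analogue; the function name and the grid setting are our own illustration, not part of the proof. $S_{i+1}$ is the first grid time after $S_i$ at which one of the three increments exceeds $\varepsilon$, or the final time $t_0$ if none does. Shrinking $\varepsilon$ makes the partition finer, which is what lets the Taylor remainder vanish in Step 3.

```python
import numpy as np

def epsilon_stopping_indices(M, QV, V, eps):
    """Grid indices of a discrete analogue of S_0 = 0, S_1(eps), S_2(eps), ..., ending at t_0.

    M, QV, V are arrays sampled on a common grid (martingale part, its bracket,
    and the total variation of the bounded-variation part); the last index plays t_0.
    """
    last = len(M) - 1
    indices = [0]
    s = 0
    while s < last:
        k = s + 1
        # advance while all three increments since S_i stay within eps
        while k < last and abs(M[k] - M[s]) <= eps and QV[k] - QV[s] <= eps and V[k] - V[s] <= eps:
            k += 1
        indices.append(k)
        s = k
    return np.array(indices)

rng = np.random.default_rng(3)
n, T = 20_000, 1.0
dW = rng.normal(0.0, np.sqrt(T / n), n)
M = np.concatenate([[0.0], np.cumsum(dW)])  # Brownian martingale part
QV = np.linspace(0.0, T, n + 1)             # <M>_t = t
V = 0.3 * QV                                # total variation of A_t = 0.3 t
S = epsilon_stopping_indices(M, QV, V, eps=0.05)
```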
Step 2. The key idea to proving Itô’s formula is Taylor’s theorem. We write
$$\begin{aligned}f(X_{t_0})-f(X_0)&=\sum_{i=0}^\infty[f(X_{S_{i+1}})-f(X_{S_i})]\\&=\sum_{i=0}^\infty f'(X_{S_i})(X_{S_{i+1}}-X_{S_i})+\frac12\sum_{i=0}^\infty f''(X_{S_i})(X_{S_{i+1}}-X_{S_i})^2+\sum_{i=0}^\infty R_i,\end{aligned}\tag{11.2}$$
where $R_i$ is the remainder term. We have $|R_i|\le c\|f'''\|_\infty|X_{S_{i+1}}-X_{S_i}|^3$.
Step 3. Let us first look at the terms with $f'$ in them. Let $H^\varepsilon_s=f'(X_{S_i})$ if $S_i<s\le S_{i+1}$; this is a left-continuous, adapted, hence predictable, process. By the continuity of $f'$ and $X_s$, we see that $H^\varepsilon_s$ converges boundedly and pointwise to $f'(X_s)$. In particular, $\int_0^{t_0}|H^\varepsilon_s-f'(X_s)|\,dV_s\to0$ boundedly, hence
$$E\int_0^{t_0}|H^\varepsilon_s-f'(X_s)|\,dV_s\to0.$$
Also,
$$E\Big(\int_0^{t_0}(H^\varepsilon_s-f'(X_s))\,dM_s\Big)^2=E\int_0^{t_0}|H^\varepsilon_s-f'(X_s)|^2\,d\langle M\rangle_s\to0$$
as $\varepsilon\to0$. We then have
$$\sum_if'(X_{S_i})(X_{S_{i+1}}-X_{S_i})=\int_0^{t_0}H^\varepsilon_s\,(dM_s+dA_s)\to\int_0^{t_0}f'(X_s)\,(dM_s+dA_s),$$
which leads to the $f'$ term in Itô’s formula.
Next let us look at the $f''$ terms. We can write
$$(X_{S_{i+1}}-X_{S_i})^2=(M_{S_{i+1}}-M_{S_i})^2+2(M_{S_{i+1}}-M_{S_i})(A_{S_{i+1}}-A_{S_i})+(A_{S_{i+1}}-A_{S_i})^2.$$
Note $\sum_if''(X_{S_i})(M_{S_{i+1}}-M_{S_i})(A_{S_{i+1}}-A_{S_i})$ is bounded in absolute value by
$$\varepsilon\|f''\|_\infty\sum_i|A_{S_{i+1}}-A_{S_i}|\le\varepsilon\|f''\|_\infty\int_0^{t_0}dV_s\le\varepsilon\|f''\|_\infty N,$$
which goes to 0 as $\varepsilon\to0$; this follows from the definition of $S_i$. Similarly the expression $\sum_if''(X_{S_i})(A_{S_{i+1}}-A_{S_i})^2$ also goes to 0. Therefore we need to show
$$\sum_if''(X_{S_i})(M_{S_{i+1}}-M_{S_i})^2\to\int_0^{t_0}f''(X_s)\,d\langle X\rangle_s.$$
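By an argument very similar to the one for the $f'$ terms,
$$\frac12\sum_if''(X_{S_i})(\langle M\rangle_{S_{i+1}}-\langle M\rangle_{S_i})\to\frac12\int_0^{t_0}f''(X_s)\,d\langle M\rangle_s, \tag{11.3}$$
and since $\langle X\rangle_t=\langle M\rangle_t$ for semimartingales (see (10.12)), the right-hand side of (11.3) is the correct $f''$ term. We thus need to show that
$$\sum_if''(X_{S_i})\big[(M_{S_{i+1}}-M_{S_i})^2-(\langle M\rangle_{S_{i+1}}-\langle M\rangle_{S_i})\big]\to0 \tag{11.4}$$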
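as $\varepsilon\to0$. We will show
$$E\Big(\sum_{i=0}^\infty B_i\Big)^2\to0, \tag{11.5}$$
where $B_i=f''(X_{S_i})\big[(M_{S_{i+1}}-M_{S_i})^2-(\langle M\rangle_{S_{i+1}}-\langle M\rangle_{S_i})\big]$. We have
$$E\Big(\sum_iB_i\Big)^2=E\sum_iB_i^2+2E\sum_{i<j}B_iB_j.$$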
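If $i<j$, then $B_i$ is $\mathcal F_{S_{i+1}}$ measurable, so
$$E[B_iB_j]=E\big[B_iE[B_j\mid\mathcal F_{S_{i+1}}]\big].$$
By (9.2) and the fact that $S_{i+1}\le S_j$, we see that
$$E[B_j\mid\mathcal F_{S_{i+1}}]=E\Big[f''(X_{S_j})E\big[(M_{S_{j+1}}-M_{S_j})^2-(\langle M\rangle_{S_{j+1}}-\langle M\rangle_{S_j})\mid\mathcal F_{S_j}\big]\;\Big|\;\mathcal F_{S_{i+1}}\Big]=0,$$
so the cross-products vanish. Therefore to prove (11.5) it remains to show $E\sum_iB_i^2\to0$ as $\varepsilon\to0$. We use the easy inequality $(x+y)^2\le2x^2+2y^2$. Since $f''$ is bounded,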
$$E\sum_iB_i^2\le2\|f''\|_\infty^2\sum_iE[(M_{S_{i+1}}-M_{S_i})^4]+2\|f''\|_\infty^2\sum_iE[(\langle M\rangle_{S_{i+1}}-\langle M\rangle_{S_i})^2]. \tag{11.6}$$
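The first sum on the right-hand side of (11.6) is bounded by
$$2\varepsilon^2\|f''\|_\infty^2\sum_iE[(M_{S_{i+1}}-M_{S_i})^2]=2\varepsilon^2\|f''\|_\infty^2E[M_{t_0}^2-M_0^2]\le8\varepsilon^2\|f''\|_\infty^2N^2.$$
The second sum on the right-hand side of (11.6) is bounded by
$$2\varepsilon\|f''\|_\infty^2\sum_iE[\langle M\rangle_{S_{i+1}}-\langle M\rangle_{S_i}]\le2\varepsilon\|f''\|_\infty^2E\langle M\rangle_{t_0}\le2\varepsilon\|f''\|_\infty^2N.$$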
Both of these tend to 0 as $\varepsilon\to0$. Therefore $E\sum_iB_i^2\to0$, and the proof of the convergence for the $f''$ term is complete.
The final terms to examine are the remainder terms. We have shown that $E\sum_i(X_{S_{i+1}}-X_{S_i})^2$ remains bounded as $\varepsilon\to0$. Since $|X_{S_{i+1}}-X_{S_i}|\le|M_{S_{i+1}}-M_{S_i}|+(V_{S_{i+1}}-V_{S_i})\le2\varepsilon$,
$$|R_i|\le c\varepsilon\|f'''\|_\infty(X_{S_{i+1}}-X_{S_i})^2,$$
and we see $E\sum_i|R_i|\to0$ as $\varepsilon\to0$.
Step 4. To finish up, we remove the assumption that $f\in C^3$. (We still assume that $f\in C^2$ with compact support.) Take a sequence $\{f_m\}$ of $C^3$ functions such that $f_m$, $f'_m$, and $f''_m$ converge uniformly to $f$, $f'$, and $f''$, respectively. Apply Itô’s formula with $f_m$ and then let $m\to\infty$. The terms $f_m(X_t)$ and $f_m(X_0)$ clearly converge to $f(X_t)$ and $f(X_0)$. The $f'_m$ terms converge because
$$E\Big(\int_0^{t_0}(f'_m(X_s)-f'(X_s))\,dM_s\Big)^2=E\int_0^{t_0}|f'_m(X_s)-f'(X_s)|^2\,d\langle M\rangle_s\to0$$
and
$$E\Big|\int_0^{t_0}(f'_m(X_s)-f'(X_s))\,dA_s\Big|\le E\int_0^{t_0}|f'_m(X_s)-f'(X_s)|\,dV_s\to0$$
as $m\to\infty$. The $f''_m$ terms converge by dominated convergence. This shows that (11.1) holds for each $t_0$, except for a null set $\mathcal N_{t_0}$ depending on $t_0$. Let $\mathcal N=\bigcup_{t\in\mathbb Q_+}\mathcal N_t$, where $\mathbb Q_+$ denotes the non-negative rationals. If $\omega\notin\mathcal N$, then (11.1) holds for every rational $t_0$. Each term in (11.1) is continuous, a.s. (with a null set $\mathcal N'$ independent of $t_0$). Therefore if $\omega\notin\mathcal N\cup\mathcal N'$, (11.1) holds for all $t_0$.
There is a multivariate version of Itô’s formula, which is proved in a very similar way:
Theorem 11.2 Suppose $X^1_t,\dots,X^d_t$ are continuous semimartingales, $X_t=(X^1_t,\dots,X^d_t)$, and $f$ is a $C^2$ function on $\mathbb R^d$. Then with probability one,
$$f(X_t)=f(X_0)+\int_0^t\sum_{i=1}^d\frac{\partial f}{\partial x_i}(X_s)\,dX^i_s+\frac12\int_0^t\sum_{i,j=1}^d\frac{\partial^2f}{\partial x_i\partial x_j}(X_s)\,d\langle X^i,X^j\rangle_s \tag{11.7}$$
for all $t\ge0$.
The following is known as the integration by parts formula or Itô’s product formula, and is very useful.
Corollary 11.3 If $X$ and $Y$ are semimartingales with continuous paths, then
$$X_tY_t=X_0Y_0+\int_0^tX_s\,dY_s+\int_0^tY_s\,dX_s+\langle X,Y\rangle_t.$$
Proof By Itô’s formula,
$$X_t^2=X_0^2+2\int_0^tX_s\,dX_s+\langle X\rangle_t.$$
The analogous formula holds when $X$ is replaced by $Y$ and when $X$ is replaced by $X+Y$. We then use $X_tY_t=\frac12[(X_t+Y_t)^2-X_t^2-Y_t^2]$; substituting the formulas for $X_t^2$, $Y_t^2$, and $(X_t+Y_t)^2$ that we obtained from Itô’s formula and doing some algebra yields our result.
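Corollary 11.3 has an exact discrete counterpart: for sampled paths, $X_nY_n-X_0Y_0=\sum_kX_k\Delta Y_k+\sum_kY_k\Delta X_k+\sum_k\Delta X_k\Delta Y_k$ holds term by term, because $x'y'-xy=x(y'-y)+y(x'-x)+(x'-x)(y'-y)$. The sketch below, assuming NumPy and using illustrative names, checks that identity on correlated Brownian paths started at $X_0=1$, $Y_0=2$ with correlation $0.5$; the last sum plays the role of $\langle X,Y\rangle_t$ and should be close to $0.5t$.

```python
import numpy as np

def product_rule_terms(X, Y):
    """Both sides of the discrete product rule, with left-endpoint sums as in the stochastic integral."""
    dX, dY = np.diff(X), np.diff(Y)
    lhs = X[-1] * Y[-1] - X[0] * Y[0]
    covariation = np.sum(dX * dY)  # discrete stand-in for <X, Y>_t
    rhs = np.sum(X[:-1] * dY) + np.sum(Y[:-1] * dX) + covariation
    return lhs, rhs, covariation

rng = np.random.default_rng(4)
n, T, rho = 100_000, 1.0, 0.5
dt = T / n
dB1 = rng.normal(0.0, np.sqrt(dt), n)
dB2 = rng.normal(0.0, np.sqrt(dt), n)
X = 1.0 + np.concatenate([[0.0], np.cumsum(dB1)])
Y = 2.0 + np.concatenate([[0.0], np.cumsum(rho * dB1 + np.sqrt(1 - rho ** 2) * dB2)])
lhs, rhs, cov = product_rule_terms(X, Y)
```

The identity itself is pure algebra; only the statement that `cov` is close to $\rho t$ depends on the paths being Brownian.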
Exercises
11.1 Suppose $W_t$ is a Brownian motion and $a\in\mathbb R$. Show that the amount of time Brownian motion spends at the point $a$ is zero, i.e., that
$$\int_0^t1_{\{a\}}(W_s)\,ds=0,\quad\text{a.s.},$$
for all $t>0$.
11.2 Let $a<b$ and let $f_{a,b}$ be the $C^1$ function such that $f_{a,b}(0)=f'_{a,b}(0)=0$ and
$$f'_{a,b}(x)=\int_0^x1_{[a,b]}(y)\,dy,\qquad x\in\mathbb R.$$
In other words, $f_{a,b}$ is the function whose second derivative is $1_{[a,b]}$, except that the second derivative is not defined at $a$ and $b$. Show Itô’s formula holds for $f_{a,b}$:
$$f_{a,b}(W_t)=\int_0^tf'_{a,b}(W_s)\,dW_s+\frac12\int_0^t1_{[a,b]}(W_s)\,ds.$$
11.3 If $W_t$ is a Brownian motion, $a>0$, and $T=\inf\{t>0:|W_t|=a\}$, calculate $E\int_0^T(W_s)^k\,ds$ for each non-negative integer $k$. Also calculate
$$E\int_0^T1_{[b_1,b_2]}(W_s)\,ds$$
if $[b_1,b_2]\subset[-a,a]$.
11.4 Let $W$ be a Brownian motion, let $t_0<t_1<\cdots<t_n=1$, and let $B_i=(W_{t_i}-W_{t_{i-1}})^2-(t_i-t_{i-1})$. Show there exists a constant $c_1$ not depending on $\{t_0,\dots,t_n\}$ such that
$$E\Big(\sum_{i=1}^nB_i\Big)^2\le c_1\max_{1\le i\le n}|t_i-t_{i-1}|.$$
11.5 Use Exercise 11.4 and the Borel–Cantelli lemma to prove that if $W$ is a Brownian motion, then
$$\lim_{n\to\infty}\sum_{k=1}^{2^n}(W_{k/2^n}-W_{(k-1)/2^n})^2=1,\quad\text{a.s.}$$
11.6 In our proof of Itô’s formula, the use of stopping times simplifies the proof considerably. This exercise considers a proof of Itô’s formula using fixed times. Suppose $M$ is a bounded continuous martingale, $A$ is a continuous process whose paths have total variation bounded by $N>0$, a.s., and $X_t=M_t+A_t$. (1) Writing $[x]$ for the integer part of $x$, prove that for each $t$,
$$\sum_{i=1}^{[2^nt]+1}(X_{(i+1)/2^n}-X_{i/2^n})^2$$
converges in probability to $\langle X\rangle_t$. (2) Prove that if $f$ is a $C^2$ function whose second derivative is bounded, then
$$\sum_{i=1}^{[2^nt]+1}f(X_{i/2^n})(X_{(i+1)/2^n}-X_{i/2^n})^2$$
converges in probability to
$$\int_0^tf(X_s)\,d\langle X\rangle_s.$$
Since the increments of $M$ and $A$ are not uniformly bounded by something small, this is much harder than the proof of Theorem 11.1 given in this chapter.
11.7 Here is an alternate way to prove Itô’s formula. (1) Suppose $X=M+A$, where $M$ and $A$ are as in Exercise 11.6. Write
$$\begin{aligned}X_t^2-X_0^2&=\sum_{i=0}^{[t2^n]-1}(X^2_{(i+1)/2^n}-X^2_{i/2^n})\\&=\sum_{i=0}^{[t2^n]-1}2X_{i/2^n}(X_{(i+1)/2^n}-X_{i/2^n})+\sum_{i=0}^{[t2^n]-1}(X_{(i+1)/2^n}-X_{i/2^n})^2.\end{aligned}$$
Use Exercise 11.6 to show that Itô’s formula holds when $f(x)=x^2$. (2) Derive the Itô product formula. Then use induction to show that Itô’s formula holds when $f(x)=x^n$, $n$ a positive integer. (3) Given $f\in C^2$, find polynomials $P_m$ such that $P_m$, $P'_m$, $P''_m$ converge uniformly to $f$, $f'$, $f''$, respectively, on a compact interval as $m\to\infty$. Apply Itô’s formula for $P_m$ and show that one can take limits to derive Itô’s formula for $f$.
12 Some applications of Itô’s formula
We will be using Itô’s formula throughout the book. In this chapter we give some applications, each of which will turn out to be quite useful.
12.1 Lévy’s theorem
The following is known as Lévy’s theorem. Recall that if $M$ is a local martingale with continuous paths and $T_N=\inf\{t:|M_t|\ge N\}$, we defined $\langle M\rangle_t$ to be equal to $\langle M_{\cdot\wedge T_N}\rangle_t$ if $t\le T_N$; see Section 10.2. Moreover, by Exercise 9.3, $M_{t\wedge T_N}$ is a square integrable martingale for each $N$.
Theorem 12.1 Let $M_t$ be a continuous local martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions such that $M_0=0$ and $\langle M\rangle_t=t$. Then $M_t$ is a Brownian motion with respect to $\{\mathcal F_t\}$.
Proof Fix $t_0$ and let $N_t=M_{t+t_0}-M_{t_0}$, $\mathcal F'_t=\mathcal F_{t+t_0}$. It is routine to check that $N_t$ is a martingale with respect to $\mathcal F'_t$ and that $\langle N\rangle_t=t$. Note $\mathcal F'_0$ will not be the trivial σ-field in general. We see that
$$EN_t^2=EM_{t+t_0}^2-EM_{t_0}^2=t<\infty.$$
If $f$ is a function mapping the reals to the complex numbers, we may still use Itô’s formula: just apply Itô’s formula to the real and imaginary parts of $f$. Doing this for $f(x)=e^{iux}$, where $u$ and $x$ are real, we have
$$e^{iuN_t}=1+iu\int_0^te^{iuN_s}\,dN_s-\frac{u^2}2\int_0^te^{iuN_s}\,ds. \tag{12.1}$$
If we take $T_K=\inf\{t:|N_t|\ge K\}$, then
$$e^{iuN_{t\wedge T_K}}=1+iu\int_0^{t\wedge T_K}e^{iuN_s}\,dN_s-\frac{u^2}2\int_0^{t\wedge T_K}e^{iuN_s}\,ds. \tag{12.2}$$
Take $A\in\mathcal F'_0$, multiply (12.2) by $1_A$, and take expectations. The stochastic integral is a martingale, so this term will have 0 expectation. Then let $K\to\infty$, and we are left with
$$E[e^{iuN_t};A]=P(A)-\frac{u^2}2\int_0^tE[e^{iuN_s};A]\,ds. \tag{12.3}$$
We used the Fubini theorem here. (The reason we introduced the stopping time $T_K$ is that $N_{t\wedge T_K}$ is a square integrable martingale, and hence the stochastic integral is a martingale. We might run into integrability problems if we worked with (12.1) instead of (12.2).)
Write $J(t)=E[e^{iuN_t};A]$, so we have
$$J(t)=P(A)-\frac{u^2}2\int_0^tJ(s)\,ds. \tag{12.4}$$
Since $J$ is bounded, (12.4) shows that $J$ is continuous. Since $J$ is continuous, using (12.4) again shows that $J$ is differentiable. Hence $J'(t)=-\frac{u^2}2J(t)$ with $J(0)=P(A)$. The only solution to this ordinary differential equation is
$$J(t)=P(A)e^{-u^2t/2}. \tag{12.5}$$
If we set $A=\Omega$, this tells us that $Ee^{iuN_t}=e^{-u^2t/2}$, and by the uniqueness theorem for characteristic functions (Theorem A.48), $M_{t+t_0}-M_{t_0}$ is a mean zero normal random variable with variance $t$. Equation (12.5) also tells us that
$$E[e^{iuN_t};A]=E[e^{iuN_t}]P(A) \tag{12.6}$$
when $A\in\mathcal F'_0$. Let $f$ be a $C^\infty$ function with compact support. The Fourier transform $\widehat f(u)$ will be in the Schwartz class; see Section B.2. Replacing $u$ by $-u$ in (12.6), multiplying the resulting equation by $\widehat f(u)$, and integrating over $u\in\mathbb R$, we have
$$\int\widehat f(u)E[e^{-iuN_t};A]\,du=\int\widehat f(u)E[e^{-iuN_t}]\,du\;P(A).$$
Using the Fubini theorem and the Fourier inversion theorem, and dividing by a constant, we conclude
$$E[f(N_t);A]=E[f(N_t)]P(A).$$
Since $\widehat f$ is in the Schwartz class, integrability is not a problem when applying the Fubini theorem. A limit argument shows that this equation holds with $f$ equal to $1_B$, where $B$ is a Borel subset of $\mathbb R$, hence
$$P(M_{t+t_0}-M_{t_0}\in B,A)=P(M_{t+t_0}-M_{t_0}\in B)\,P(A).$$
This shows the independence of $M_{t+t_0}-M_{t_0}$ and $\mathcal F_{t_0}$. We thus see that $M_t$ is a continuous process starting at 0 with $M_{t+t_0}-M_{t_0}$ being a mean zero normal random variable with variance $t$ independent of $\mathcal F_{t_0}$, and therefore $M$ is a Brownian motion.
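Lévy’s theorem can be illustrated with a martingale that is not presented as a Brownian motion: $M_t=\int_0^t\operatorname{sgn}(W_s)\,dW_s$, where $W$ is a Brownian motion. Since $\operatorname{sgn}(W_s)^2=1$, $\langle M\rangle_t=t$, so Theorem 12.1 says $M$ is itself a Brownian motion. The sketch below, assuming NumPy (the function name and the convention $\operatorname{sgn}(0)=1$ are our choices), approximates $M_T$ by left-endpoint sums and compares its empirical characteristic function with $e^{-u^2T/2}$, the value given by (12.5) with $A=\Omega$. On the grid each increment $\pm\Delta W_k$ is, given the past, $N(0,\Delta t)$, so the discrete sum is already exactly normal; the check therefore measures only Monte Carlo error.

```python
import numpy as np

def sign_integral(n_paths=20_000, n_steps=500, T=1.0, seed=5):
    """Left-endpoint approximation of M_T = int_0^T sgn(W_s) dW_s on many paths."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    W = np.zeros(n_paths)
    M = np.zeros(n_paths)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        M += np.where(W >= 0.0, 1.0, -1.0) * dW  # integrand uses W at the left endpoint
        W += dW
    return M

T, u = 1.0, 1.3
M_T = sign_integral(T=T)
phi_hat = np.mean(np.exp(1j * u * M_T))  # empirical E e^{iu M_T}
phi_exact = np.exp(-u ** 2 * T / 2)      # e^{-u^2 T / 2}
```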
12.2 Time changes of martingales
The next theorem says that most continuous martingales arise from Brownian motion via a time change. That is, the paths are the same, but the rate at which one moves along the paths varies. In fact, it is possible to show that all continuous martingales arise from a time change of a Brownian motion that is possibly stopped at a random time.
Theorem 12.2 Suppose $M_t$ is a continuous local martingale, $M_0=0$, $\langle M\rangle_t$ is strictly increasing, and $\lim_{t\to\infty}\langle M\rangle_t=\infty$, a.s. Let $\tau(t)=\inf\{u:\langle M\rangle_u\ge t\}$. Then $W_t=M_{\tau(t)}$ is a Brownian motion with respect to $\mathcal F'_t=\mathcal F_{\tau(t)}$.
Proof Let us first suppose that $W_t^2$ is integrable. We have by Proposition 9.3 that
$$E[W_t\mid\mathcal F'_s]=E[M_{\tau(t)}\mid\mathcal F_{\tau(s)}]=M_{\tau(s)}=W_s,$$
so $W_t$ is a martingale; it has continuous paths because $\tau$ is continuous, $\langle M\rangle$ being continuous and strictly increasing. Similarly, since $\langle M\rangle_{\tau(t)}=t$, $W_t^2-t$ is a martingale. Now apply Lévy’s theorem, Theorem 12.1. Removing the assumption that $W_t^2$ is integrable is left as Exercise 12.1.
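A case where everything is explicit: take $M_r=\int_0^r(1+s)\,dW_s$ for a Brownian motion $W$. Then $\langle M\rangle_r=((1+r)^3-1)/3$ is deterministic and strictly increasing, so $\tau(t)=(1+3t)^{1/3}-1$, and Theorem 12.2 predicts that $M_{\tau(t)}$ is normal with mean 0 and variance $t$. The sketch below, assuming NumPy (the integrand $1+s$ is our own example, not from the text), checks only that one-dimensional marginal by Monte Carlo, with the stochastic integral replaced by a left-endpoint sum.

```python
import numpy as np

def time_changed_sample(t, n_paths=20_000, n_steps=1_000, seed=6):
    """Samples of M_{tau(t)} for M_r = int_0^r (1 + s) dW_s, where <M>_{tau(t)} = t."""
    rng = np.random.default_rng(seed)
    tau = (1.0 + 3.0 * t) ** (1.0 / 3.0) - 1.0  # inverse of r -> ((1 + r)^3 - 1) / 3
    ds = tau / n_steps
    M = np.zeros(n_paths)
    for i in range(n_steps):
        s = i * ds                               # left endpoint of the i-th step
        M += (1.0 + s) * rng.normal(0.0, np.sqrt(ds), n_paths)
    return M, tau

W2, tau2 = time_changed_sample(2.0)
```

Because the clock $\langle M\rangle$ is deterministic here, the time change is a fixed reparametrization; with a random clock, $\tau(t)$ would be a stopping time and each path would need its own $\tau$.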
12.3 Quadratic variation
Itô’s formula allows us to prove Theorem 9.10 fairly simply.
Proof of Theorem 9.10 If $T_K=\inf\{t:|M_t|\ge K\}$, we will show that
$$\sum_{k=0}^{[t_02^n]}(M_{T_K\wedge(k+1)/2^n}-M_{T_K\wedge k/2^n})^2$$
converges to $\langle M\rangle_{t_0\wedge T_K}$. Since $T_K\to\infty$ as $K\to\infty$, this will prove the theorem. Thus we may assume $M$ is bounded by $K$. If $s>0$ and we let $N_t=M_{s+t}-M_s$, then $N_t$ is a martingale with respect to the filtration $\mathcal F'_t=\mathcal F_{s+t}$ and we can check that $\langle N\rangle_t=\langle M\rangle_{t+s}-\langle M\rangle_s$. By Itô’s formula applied to the process $N$, we obtain
$$(M_{t+s}-M_s)^2=2\int_s^{s+t}(M_r-M_s)\,dM_r+(\langle M\rangle_{t+s}-\langle M\rangle_s).$$
Applying this with $t=1/2^n$ and $s=k/2^n$ and summing, we see that
$$\sum_{k=0}^{[t_02^n]}(M_{(k+1)/2^n}-M_{k/2^n})^2-\langle M\rangle_{t_0}=2\int_0^{t_0}L^n_r\,dM_r+R, \tag{12.7}$$
where $L^n_r=M_r-M_{k/2^n}$ for $k/2^n\le r<(k+1)/2^n$ and $R=\langle M\rangle_{([t_02^n]+1)/2^n}-\langle M\rangle_{t_0}$. Note
$$E\Big(2\int_0^{t_0}L^n_r\,dM_r\Big)^2=4E\int_0^{t_0}(L^n_r)^2\,d\langle M\rangle_r. \tag{12.8}$$
The integrand $(L^n_r)^2$ is bounded by $4K^2$, $E\langle M\rangle_t=EM_t^2\le K^2$ is finite, and $L^n_r$ tends to 0 as $n\to\infty$. By dominated convergence, the right-hand side of (12.8) tends to 0 as $n\to\infty$. As for the remainder term, $R$ goes to 0 by the continuity of the paths of $\langle M\rangle_t$. The reason we only have convergence in probability rather than in $L^2$ is due to the stopping time argument involving $T_K$.
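For Brownian motion, $\langle W\rangle_t=t$, so Theorem 9.10 says the dyadic sums of squared increments over $[0,t_0]$ approach $t_0$. The sketch below, assuming NumPy and with an illustrative function name, samples one Brownian path on the finest dyadic grid and evaluates the sums at every coarser level. For simplicity it uses the grid $kt_0/2^n$ rather than $k/2^n$, which removes the end-point remainder $R$ of (12.7).

```python
import numpy as np

def dyadic_quadratic_variation(t0=1.0, max_level=16, seed=7):
    """Sums of squared increments of one Brownian path over [0, t0] on grids k t0 / 2^n, n = 0..max_level."""
    rng = np.random.default_rng(seed)
    n_fine = 2 ** max_level
    increments = rng.normal(0.0, np.sqrt(t0 / n_fine), n_fine)
    W = np.concatenate([[0.0], np.cumsum(increments)])  # W at k t0 / 2^max_level
    sums = {}
    for n in range(max_level + 1):
        coarse = W[:: 2 ** (max_level - n)]              # W at k t0 / 2^n, k = 0..2^n
        sums[n] = np.sum(np.diff(coarse) ** 2)
    return sums

qv = dyadic_quadratic_variation()
for n in (2, 6, 10, 16):
    print(n, qv[n])  # values approach t0 = 1 as n grows
```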
12.4 Martingale representation
The next theorem says that every martingale adapted to the filtration of a Brownian motion can be expressed as a stochastic integral with respect to the Brownian motion. This used to be a rather arcane result that was of interest only to probabilists specializing in martingales. But then it turned out that this theorem is the basis for showing the completeness of the market in the theory of financial mathematics; see Chapter 28. The martingale representation theorem is also key to the innovations approach to stochastic filtering; see Chapter 29.
Theorem 12.3 Let $\mathcal F_t$ be the minimal augmented filtration generated by a one-dimensional Brownian motion $W_t$, let $t_0>0$, and let $Y$ be $\mathcal F_{t_0}$ measurable with $EY^2<\infty$. There exists a predictable process $H_s$ with $E\int_0^{t_0}H_s^2\,ds<\infty$ such that
$$Y=EY+\int_0^{t_0}H_s\,dW_s,\quad\text{a.s.} \tag{12.9}$$
The proof consists of showing (12.9) holds for successively larger classes of random variables. Step 1 of the proof shows that the equation holds for random variables of the form $e^{iu(W_t-W_s)}$ and Step 2 shows that (12.9) holds for products of such random variables. In Step 3, it is shown that if the equation holds for a set of random variables, it holds for the closure of that set with respect to the $L^2$ norm.
Proof Step 1. Let $X_t=iuW_t+u^2t/2$. Note $\langle X\rangle_t=(iu)^2\langle W\rangle_t=-u^2t$. By Itô’s formula applied with $f(x)=e^x$,
$$\begin{aligned}e^{iuW_t+u^2t/2}&=1+\int_0^te^{X_r}\,d(iuW_r+u^2r/2)+\frac12\int_0^t(-u^2)e^{X_r}\,dr\\&=1+\int_0^tiue^{iuW_r+u^2r/2}\,dW_r.\end{aligned}$$
Therefore
$$e^{iuW_t}=e^{-u^2t/2}+\int_0^tiue^{iuW_r+u^2r/2-u^2t/2}\,dW_r. \tag{12.10}$$
The integrand in the stochastic integral in (12.10) is $e^{iuW_r}$ times a deterministic function, hence is predictable. Therefore (12.9) holds when $Y=e^{iuW_t}$ and moreover, the support of $H$ in this case is contained in $[0,t]$, that is, $H_r=0$ if $r\notin[0,t]$. Similarly, (12.9) holds when $Y=e^{iu(W_t-W_s)}$, and in this case the support of the corresponding $H$ is $[s,t]$.
Step 2. Suppose now that $Y_1$ and $Y_2$ are two random variables for which (12.9) holds with the supports of the corresponding $H_1$ and $H_2$ overlapping by at most finitely many points. To be more precise, if $Y_i=EY_i+\int_0^{t_0}H_i(s)\,dW_s$, $i=1,2$, then we suppose that, with probability one, $H_1(s)H_2(s)=0$ except for finitely many points $s$. This implies
$$\int_0^{t_0}H_1(s)H_2(s)\,ds=0.$$
Let $Z_i(t)=EY_i+\int_0^tH_i(s)\,dW_s$, $i=1,2$. Note $Z_i(0)=EY_i$ and $Z_i(t_0)=Y_i$. Then by the product formula (Corollary 11.3),
$$\begin{aligned}Y_1Y_2&=(EY_1)(EY_2)+\int_0^{t_0}Z_1(s)\,dZ_2(s)+\int_0^{t_0}Z_2(s)\,dZ_1(s)+\langle Z_1,Z_2\rangle_{t_0}\\&=(EY_1)(EY_2)+\int_0^{t_0}Z_1(s)H_2(s)\,dW_s+\int_0^{t_0}Z_2(s)H_1(s)\,dW_s+\int_0^{t_0}H_1(s)H_2(s)\,ds\\&=(EY_1)(EY_2)+\int_0^{t_0}K_s\,dW_s,\end{aligned}\tag{12.11}$$
where $K_s=Z_1(s)H_2(s)+Z_2(s)H_1(s)$, and so the support of $K_s$ is contained in the union of the supports of $H_1(s)$ and $H_2(s)$. Taking an expectation in (12.11), $E[Y_1Y_2]=(EY_1)(EY_2)$. Thus (12.9) holds for $Y_1Y_2$. Using induction, (12.9) will hold for the product of $n$ random variables $Y_i$, $i=1,\dots,n$, provided the supports of any two of the corresponding $H_i$ overlap by at most finitely many values of $s$. Combining this with Step 1, we see that if $s_1<s_2<\cdots<s_{n+1}\le t_0$, then the random variables of the form
$$Y=\exp\Big(i\sum_{j=1}^nu_j(W_{s_{j+1}}-W_{s_j})\Big) \tag{12.12}$$
satisfy (12.9).
Step 3. We claim that random variables of the form (12.12) generate $\sigma(W_s;s\le t_0)$. To see this, we proceed as in the last paragraph of the proof of Theorem 12.1, namely, we replace each $u_j$ by $-u_j$, multiply by $\widehat f(u_1,\dots,u_n)$, the Fourier transform of a $C^\infty$ function $f$ with compact support, integrate over $(u_1,\dots,u_n)\in\mathbb R^n$, use the Fubini theorem and the Fourier inversion theorem, and we obtain random variables of the form $f(W_{s_2}-W_{s_1},\dots,W_{s_{n+1}}-W_{s_n})$ for $f$ in $C^\infty$ with compact support. By a limit argument, such random variables generate $\sigma(W_s;s\le t_0)$.
We will prove that whenever $Y_n$ satisfies (12.9) and $Y_n\to Y$ in $L^2$, then $Y$ satisfies (12.9). By Exercise 2.7 and Proposition 2.5, this will prove our theorem. Suppose each $Y_n$ satisfies (12.9) with integrand $H_n(s)$ and suppose $Y_n\to Y$ in $L^2$. Then $EY_n\to EY$, and $Y_n-EY_n$ converges in $L^2$ to $Y-EY$. Since
$$E\int_0^{t_0}(H_n(s)-H_m(s))^2\,ds=E\big((Y_n-EY_n)-(Y_m-EY_m)\big)^2\to0,$$
the sequence $H_n$ is a Cauchy sequence with respect to the norm $\|X\|=(E\int_0^{t_0}X_s^2\,ds)^{1/2}$, which is an $L^2$ norm and hence complete. Therefore there exists $H_s$ (which is predictable because each $H_n(s)$ is predictable) such that $E\int_0^{t_0}H_s^2\,ds<\infty$ and $E\int_0^{t_0}(H_n(s)-H_s)^2\,ds\to0$. Hence
$$E\Big((Y_n-EY_n)-\int_0^{t_0}H_s\,dW_s\Big)^2=E\int_0^{t_0}(H_n(s)-H_s)^2\,ds\to0.$$
Since $Y_n-EY_n$ converges in $L^2$ to $Y-EY$, it follows that $Y-EY=\int_0^{t_0}H_s\,dW_s$, a.s.
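Formula (12.10) gives the integrand explicitly, so the representation can be checked along a single simulated path. The sketch below, assuming NumPy and using a hypothetical function name, approximates the stochastic integral by a left-endpoint sum and returns the modulus of the difference between $e^{iuW_t}$ and the right-hand side of (12.10). That difference is pure discretization error and shrinks as the grid is refined. Note that the constant term is $EY=e^{-u^2t/2}$, as (12.9) requires.

```python
import numpy as np

def representation_gap(u=1.0, t=1.0, n_steps=100_000, seed=8):
    """|e^{iuW_t} - (e^{-u^2 t/2} + sum_i H_{r_i} (W_{r_{i+1}} - W_{r_i}))| on one path, H as in (12.10)."""
    rng = np.random.default_rng(seed)
    dr = t / n_steps
    dW = rng.normal(0.0, np.sqrt(dr), n_steps)
    W = np.concatenate([[0.0], np.cumsum(dW)])
    r = np.arange(n_steps) * dr                                  # left endpoints r_i
    H = 1j * u * np.exp(1j * u * W[:-1] + u ** 2 * (r - t) / 2)  # integrand of (12.10) at r_i
    lhs = np.exp(1j * u * W[-1])
    rhs = np.exp(-u ** 2 * t / 2) + np.sum(H * dW)
    return abs(lhs - rhs)

gap = representation_gap()
```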
Corollary 12.4 Suppose $M_t$ is a right-continuous square integrable martingale with respect to the minimal augmented filtration $\{\mathcal F_t\}$ generated by a one-dimensional Brownian motion, and suppose $M_0=0$. Let $t_0>0$. Then there exists a predictable process $H_s$ with $E\int_0^{t_0}H_s^2\,ds<\infty$ such that with probability one
$$M_t=\int_0^tH_s\,dW_s$$
for all $t\le t_0$.
Proof Since $M_t$ is a martingale, $E[M_{t_0}\mid\mathcal F_0]=M_0$, and taking expectations, $EM_{t_0}=EM_0=0$. By Theorem 12.3, there exists a predictable process $H$ with $E\int_0^{t_0}H_s^2\,ds<\infty$ such that $M_{t_0}=\int_0^{t_0}H_s\,dW_s$. Taking conditional expectations with respect to $\mathcal F_t$, we obtain $M_t=\int_0^tH_s\,dW_s$. This holds almost surely for each $t$. Thus except for a null set of $\omega$’s, it holds for all rational $t$. Since $M_t$ is right continuous, it holds for all $t$.
Corollary 12.5 If $M_t$ is a square integrable martingale with respect to the minimal augmented filtration of a one-dimensional Brownian motion $W$, then $M_t$ has a version with continuous paths.
Proof By Corollary 3.13, $M$ has a version with right-continuous paths. By Corollary 12.4, $M$ can be written as a stochastic integral with respect to $W$. But such stochastic integrals have continuous paths by Theorem 10.4.
It is important for the martingale representation theorem that $M_t$ be a martingale with respect to the minimal augmented filtration of $W$ and not a larger filtration. For example, let $(X,Y)$ be a two-dimensional Brownian motion and let $\{\mathcal F_t\}$ be the minimal augmented filtration generated by $(X,Y)$. We show that we cannot write $Y_1$ as a stochastic integral with respect to $X_t$. If it were possible to do so, since $Y_1$ has mean zero, we would have
$$Y_1=\int_0^1H_s\,dX_s.$$
t Taking conditional expectations, Yt = 0 Hs dXs . Then X , Y t = 0 Hs ds by Exercise 10.5. But if (X , Y ) is two-dimensional Brownian motion, then X and Y are independent, and so X , Y t = 0 by Exercise 9.4, a contradiction. (However, it is true, by a proof similar to that of Theorem 12.3, if {Ft } is the minimal augmented filtration of a d-dimensional Brownian motion (W 1 , . . . , W d ) and Y is square t0 i andi Ft0 measurable, then there exist suitable dintegrable i processes Hs such that Y = E Y + i=1 0 Hs dWs .)
12.5 The Burkholder–Davis–Gundy inequalities

Next we turn to a pair of basic inequalities, those of Burkholder, Davis, and Gundy. In both of the following theorems, the constant depends on $p$, the exponent. As stated and proved below, we require $p\ge2$ for Theorems 12.6 and 12.7; in fact, the two theorems are true (with a different proof) as long as $p>0$; see Bass (1995), pp. 62–4, or Exercise 12.12. The proof we present here is a nice application of Itô's formula.
Define $M_t^*=\sup_{s\le t}|M_s|$.
Theorem 12.6 Let $M_t$ be a continuous local martingale with $M_0=0$, a.s., and suppose $2\le p<\infty$. There exists a constant $c_1$ depending on $p$ such that for any finite stopping time $T$,
$$\mathbb E(M_T^*)^p\le c_1\,\mathbb E\langle M\rangle_T^{p/2}.$$

Proof There is nothing to prove if the left-hand side is zero, so we may assume it is positive. First suppose $M_T^*$ is bounded by a positive constant $K$. Note for $p\ge2$ the function $x\mapsto|x|^p$ is $C^2$. By Doob's inequalities and then Itô's formula (together with the facts that the $dM_s$ integral, having a bounded integrand, has mean zero, and that $|M_s|\le M_T^*$ for $s\le T$), we have
$$\begin{aligned}
\mathbb E|M_T^*|^p&\le c\,\mathbb E|M_T|^p\\
&=c\,\mathbb E\int_0^Tp|M_s|^{p-1}\operatorname{sgn}(M_s)\,dM_s+\tfrac12c\,\mathbb E\int_0^Tp(p-1)|M_s|^{p-2}\,d\langle M\rangle_s\\
&\le c\,\mathbb E\int_0^T(M_T^*)^{p-2}\,d\langle M\rangle_s\\
&=c\,\mathbb E\big[(M_T^*)^{p-2}\langle M\rangle_T\big].
\end{aligned}$$
(Recall our convention about constants and the letter $c$.) Using Hölder's inequality with exponents $p/(p-2)$ and $p/2$, we obtain
$$\mathbb E(M_T^*)^p\le c\big(\mathbb E(M_T^*)^p\big)^{(p-2)/p}\big(\mathbb E\langle M\rangle_T^{p/2}\big)^{2/p}.$$
Dividing both sides by $\big(\mathbb E(M_T^*)^p\big)^{(p-2)/p}$ and then taking both sides to the power $p/2$ gives our result. We then apply the above to $T\wedge U_K$, where $U_K=\inf\{t:|M_t|\ge K\}$, let $K\to\infty$, and use Fatou's lemma.

Theorem 12.7 Let $M_t$ be a continuous local martingale with $M_0=0$, a.s., and suppose $2\le p<\infty$. There exists a constant $c_2$ depending on $p$ such that for any finite stopping time $T$,
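$$\mathbb E\langle M\rangle_T^{p/2}\le c_2\,\mathbb E(M_T^*)^p.$$

Proof As in the previous theorem, we may assume the left-hand side is positive. Set $r=p/2$. Let us first suppose $\langle M\rangle_T$ and $M_T^*$ are bounded by a positive constant $K$. Let $N_t=M_{t\wedge T}$, so that $\langle N\rangle_\infty=\langle M\rangle_T$, and let $A_t=\langle M\rangle_{t\wedge T}^{r-1}$. Using integration by parts,
$$\int_0^\infty\langle N\rangle_s\,dA_s=\langle N\rangle_\infty A_\infty-\int_0^\infty A_s\,d\langle N\rangle_s=\langle N\rangle_\infty^r-\frac1r\langle N\rangle_\infty^r.$$
Since
$$\int_0^\infty\langle N\rangle_\infty\,dA_s=\langle N\rangle_\infty^r,$$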
we then have
$$\langle N\rangle_\infty^r=r\int_0^\infty(\langle N\rangle_\infty-\langle N\rangle_s)\,dA_s.$$
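Using Propositions 3.14 and 9.6,
$$\begin{aligned}
\mathbb E\langle M\rangle_T^r=\mathbb E\langle N\rangle_\infty^r&=r\,\mathbb E\int_0^\infty(\langle N\rangle_\infty-\langle N\rangle_s)\,dA_s\\
&=r\,\mathbb E\int_0^\infty\big(\mathbb E[\langle N\rangle_\infty\mid\mathcal F_s]-\langle N\rangle_s\big)\,dA_s\\
&=r\,\mathbb E\int_0^\infty\mathbb E[\langle N\rangle_\infty-\langle N\rangle_s\mid\mathcal F_s]\,dA_s\\
&=r\,\mathbb E\int_0^\infty\mathbb E[N_\infty^2-N_s^2\mid\mathcal F_s]\,dA_s\\
&\le c\,\mathbb E\int_0^\infty\mathbb E[(N_\infty^*)^2\mid\mathcal F_s]\,dA_s\\
&=c\,\mathbb E[(N_\infty^*)^2A_\infty]\\
&=c\,\mathbb E[(M_T^*)^2\langle M\rangle_T^{r-1}].
\end{aligned}$$
We use Hölder's inequality with exponents $r$ and $r/(r-1)$, divide both sides by the quantity $(\mathbb E\langle M\rangle_T^r)^{(r-1)/r}$, and then take both sides to the $r$th power. We then get
$$\mathbb E\langle M\rangle_T^r\le c\,\mathbb E(M_T^*)^{2r},$$
which is what we wanted. To remove the restriction that $\langle M\rangle$ and $M^*$ are bounded, we apply the above to $T\wedge V_K$ in place of $T$, where $V_K=\inf\{t:\langle M\rangle_t+M_t^*\ge K\}$, let $K\to\infty$, and use Fatou's lemma.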
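A quick Monte Carlo sanity check of the two inequalities is possible for $M=W$ and $T=1$, where $\langle W\rangle_1=1$, so both right-hand sides are constants. For $p=2$ one can take $c_1=4$ (Doob's $L^2$ inequality) and $c_2=1$ (because $W_1^*\ge|W_1|$), so the inequalities read $1\le\mathbb E(W_1^*)^2\le4$. The Python sketch below is only an illustration under these assumptions; the function name, path count, grid size and seed are hypothetical choices, and it says nothing about the constants for other values of $p$.

```python
import math
import random

def sup_moment(p, n_paths=1000, n_steps=1000, seed=1):
    """Monte Carlo estimate of E (W_1^*)^p, where W_1^* is the grid maximum of |W| on [0, 1]."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 / n_steps)
    total = 0.0
    for _ in range(n_paths):
        w, w_star = 0.0, 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            w_star = max(w_star, abs(w))
        total += w_star ** p
    return total / n_paths

est = sup_moment(2)
# With T = 1 and <W>_1 = 1, the p = 2 case of Theorems 12.6 and 12.7 says 1 <= E (W_1^*)^2 <= 4.
print(est)
```

A grid maximum slightly underestimates the continuous supremum, so the estimate is biased a little low; refining the grid reduces this bias at the cost of run time.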
12.6 Stratonovich integrals

For stochastic differential geometry and also many other purposes, the Stratonovich integral is more convenient than the Itô integral. If $X$ and $Y$ are continuous semimartingales, the Stratonovich integral, denoted $\int_0^tX_s\circ dY_s$, is defined by
$$\int_0^tX_s\circ dY_s=\int_0^tX_s\,dY_s+\tfrac12\langle X,Y\rangle_t.$$
Both the beauty and the difficulty of Itô's formula are due to the quadratic variation term. The change of variables formula for the Stratonovich integral avoids this.

Theorem 12.8 Suppose $f\in C^3$ and $X$ is a continuous semimartingale. Then
$$f(X_t)=f(X_0)+\int_0^tf'(X_s)\circ dX_s.$$

Proof By Itô's formula applied to the function $f$ and the definition of the Stratonovich integral, it suffices to show that
$$\langle f'(X),X\rangle_t=\int_0^tf''(X_s)\,d\langle X\rangle_s.\qquad(12.13)$$
Applying Itô's formula to the function $f'$, which is in $C^2$,
$$f'(X_t)=f'(X_0)+\int_0^tf''(X_s)\,dX_s+\tfrac12\int_0^tf'''(X_s)\,d\langle X\rangle_s,$$
from which (12.13) follows, since the last term is continuous and of bounded variation and so contributes nothing to the covariation with $X$.

If $X$ and $Y$ are continuous semimartingales and we apply the change of variables formula with $f(x)=x^2$ to $X+Y$ and $X-Y$, we obtain
$$(X_t+Y_t)^2=(X_0+Y_0)^2+2\int_0^t(X_s+Y_s)\circ d(X_s+Y_s)$$
and
$$(X_t-Y_t)^2=(X_0-Y_0)^2+2\int_0^t(X_s-Y_s)\circ d(X_s-Y_s).$$
Taking the difference and then dividing by 4, we have the product formula for Stratonovich integrals
$$X_tY_t=X_0Y_0+\int_0^tX_s\circ dY_s+\int_0^tY_s\circ dX_s.\qquad(12.14)$$
The Stratonovich integral $\int_0^tH_s\circ dX_s$ can be represented as a limit of Riemann sums.
Proposition 12.9 Suppose $H$ and $X$ are continuous semimartingales and $t_0>0$. Then $\int_0^{t_0}H_s\circ dX_s$ is the limit in probability as $n\to\infty$ of
$$\sum_{k=0}^{2^n-1}\frac{H_{kt_0/2^n}+H_{(k+1)t_0/2^n}}{2}\big(X_{(k+1)t_0/2^n}-X_{kt_0/2^n}\big).$$

Proof We write the sum as
$$\sum_{k=0}^{2^n-1}H_{kt_0/2^n}\big(X_{(k+1)t_0/2^n}-X_{kt_0/2^n}\big)+\tfrac12\sum_{k=0}^{2^n-1}\big(H_{(k+1)t_0/2^n}-H_{kt_0/2^n}\big)\big(X_{(k+1)t_0/2^n}-X_{kt_0/2^n}\big).$$
The first sum tends to $\int_0^{t_0}H_s\,dX_s$ while by Exercise 12.10 the second sum tends to $\tfrac12\langle H,X\rangle_{t_0}$. This proves the proposition.
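The decomposition in the proof can be seen concretely with $H=X=W$ (a choice made for this sketch). The averaged sum of Proposition 12.9 then telescopes to exactly $W_{t_0}^2/2$, in line with Theorem 12.8 for $f(x)=x^2/2$, while the left-point (Itô) sum falls short by half of the sum of squared increments, which is the Exercise 12.10 sum approximating $\langle W\rangle_{t_0}=t_0$. The Python sketch below (hypothetical function name, seed and grid depth) checks these relations on one simulated path.

```python
import math
import random

def riemann_sums(n=14, t0=1.0, seed=2):
    """Left-point (Ito) and endpoint-averaged (Stratonovich) sums of W dW on the grid k t0 / 2^n."""
    rng = random.Random(seed)
    steps = 2 ** n
    sd = math.sqrt(t0 / steps)
    w = [0.0]
    for _ in range(steps):
        w.append(w[-1] + rng.gauss(0.0, sd))
    ito = sum(w[k] * (w[k + 1] - w[k]) for k in range(steps))
    strat = sum(0.5 * (w[k] + w[k + 1]) * (w[k + 1] - w[k]) for k in range(steps))
    # The sum from Exercise 12.10 with X = Y = W; it approximates <W>_{t0} = t0.
    covar = sum((w[k + 1] - w[k]) ** 2 for k in range(steps))
    return w[-1], ito, strat, covar

w1, ito, strat, covar = riemann_sums()
print(ito, strat, w1 ** 2 / 2)
```

The first two relations (Stratonovich sum equals $W_1^2/2$, and the two sums differ by half the squared-increment sum) are algebraic and hold up to rounding; only the closeness of the squared-increment sum to $t_0=1$ is probabilistic.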
Exercises

12.1 Show that $W_t$ and $W_t^2-t$ are local martingales, where $W$ is defined in the statement of Theorem 12.2.

12.2 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions, $X$ is a Brownian motion with respect to $\{\mathcal F_t\}$, and $T$ is a finite stopping time with respect to this same filtration. Let $Y$ be another Brownian motion that is independent of $\{\mathcal F_t\}$ and define
Xt , t
12.3 Suppose $M_t$ is a continuous local martingale with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions, $T$ is a stopping time with respect to $\{\mathcal F_t\}$, and $\langle M\rangle_t=t\wedge T$. Prove that $M_{t\wedge T}$ has the same law as a Brownian motion stopped at time $T$.

12.4 Here is a multidimensional version of Lévy's theorem. Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Suppose $(M_t^1,\ldots,M_t^d)$ is a $d$-dimensional process such that each component $M_t^i$ is a continuous martingale with respect to $\{\mathcal F_t\}$ with $\langle M^i\rangle_t=t$. Suppose that $\langle M^i,M^j\rangle_t=0$ if $i\ne j$. Prove that $(M_t^1,\ldots,M_t^d)$ is a $d$-dimensional Brownian motion.

12.5 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Let $A_t$ be a strictly increasing continuous process adapted to $\{\mathcal F_t\}$ with $\lim_{t\to\infty}A_t=\infty$, a.s. Suppose $(M_t^1,\ldots,M_t^d)$ is a $d$-dimensional process such that each component $M_t^i$ is a continuous martingale with respect to $\{\mathcal F_t\}$ and $\langle M^i\rangle_t=A_t$. Suppose that $\langle M^i,M^j\rangle_t=0$ if $i\ne j$. Prove that $(M_t^1,\ldots,M_t^d)$ is a time change of $d$-dimensional Brownian motion.

12.6 Suppose $M$ is a continuous local martingale such that $\langle M\rangle_t$ is deterministic. Prove that $M$ is a Gaussian process.

12.7 Suppose $M$ is a continuous local martingale with $M_0=0$, a.s. Show that there exists a Brownian motion $W$, an increasing process $\tau_t$, and a stopping time $T$ such that $M_t=W_{\tau_t\wedge T}$ for all $t$.

12.8 Let $M_t$ be a continuous local martingale. Show that the events $(M_\infty^*<\infty)$ and $(\langle M\rangle_\infty<\infty)$ differ by at most a null set.
12.9 Let $M_t$ be a continuous local martingale. Prove that
$$P\Big(\sup_{t\ge0}|M_t|>x,\ \langle M\rangle_\infty<y\Big)\le2e^{-x^2/2y}.$$
12.10 Suppose $X$ and $Y$ are continuous semimartingales and $t_0>0$. Prove that
$$\sum_{k=0}^{2^n-1}\big(X_{(k+1)t_0/2^n}-X_{kt_0/2^n}\big)\big(Y_{(k+1)t_0/2^n}-Y_{kt_0/2^n}\big)$$
converges to $\langle X,Y\rangle_{t_0}$ in probability.

12.11 Let $p>0$. Suppose $X$ and $Y$ are non-negative random variables, $\beta>1$, $\delta\in(0,1)$, and $\varepsilon\in(0,\beta^{-p}/2)$ such that
$$P(X>\beta\lambda,\ Y<\delta\lambda)\le\varepsilon P(X\ge\lambda)$$
for all $\lambda>0$. This inequality is known as a good-$\lambda$ inequality. Prove that there exists a constant $c$ (depending on $\beta$, $\delta$, $\varepsilon$, and $p$ but not $X$ or $Y$) such that
$$\mathbb EX^p\le c\,\mathbb EY^p.$$
Hint: First assume $X$ is bounded. Write
$$P(X/\beta>\lambda)\le P(X>\beta\lambda,\ Y<\delta\lambda)+P(Y\ge\delta\lambda)\le\varepsilon P(X\ge\lambda)+P(Y/\delta\ge\lambda).$$
Multiply by $p\lambda^{p-1}$, integrate over $\lambda$, and use the fact that $\varepsilon<\beta^{-p}/2$.
12.12 Use Exercise 12.11 to prove that the Burkholder–Davis–Gundy inequalities hold for all $p>0$. Hint: Use time change to reduce to the case of a Brownian motion $W$. If $T$ is a stopping time and $U=\inf\{t:W^*_{t\wedge T}>\lambda\}$, write
$$\begin{aligned}
P(W_T^*>\beta\lambda,\ T^{1/2}<\delta\lambda)&=P(W_T^*>\beta\lambda,\ T<\delta^2\lambda^2,\ U<\infty)\\
&\le P\Big(\sup_{U\le t\le U+\delta^2\lambda^2}|W_t-W_U|>(\beta-1)\lambda,\ U<\infty\Big).
\end{aligned}$$
Condition on $\mathcal F_U$, use Theorem 4.2, and notice that $P(U<\infty)=P(W_T^*>\lambda)$.

12.13 Define the $H^1$ norm of a martingale by
$$\|M\|_{H^1}=\mathbb E\Big[\sup_{t\ge0}|M_t|\Big].$$
Prove that this is a norm. Does there exist a uniformly integrable continuous martingale that is not in $H^1$?

12.14 Let $W$ be a Brownian motion and let $T$ be a stopping time. Prove that if $\mathbb ET^{1/2}<\infty$, then $\mathbb EW_T=0$.

12.15 Suppose $W=(W^1,\ldots,W^d)$ is a $d$-dimensional Brownian motion started at 0, and let $\{\mathcal F_t\}$ be the minimal augmented filtration of $W$. Suppose $Y$ is an $\mathcal F_1$ measurable random variable with mean zero and finite variance. Prove there exist predictable processes $H^1,\ldots,H^d$ such that $\mathbb E\int_0^1(H_s^i)^2\,ds<\infty$ for each $i$ and
$$Y=\sum_{i=1}^d\int_0^1H_s^i\,dW_s^i.$$

12.16 Suppose $W$ is a Brownian motion and $H$ is adapted, bounded, and right continuous. Let $t\ge0$. Show that
$$\frac{1}{W_{t+h}-W_t}\int_t^{t+h}H_s\,dW_s$$
converges in probability to $H_t$ as $h\downarrow0$.

12.17 Let $W$ be a Brownian motion and $\alpha>0$. Show that
$$\int_0^t\frac{ds}{|W_s|^\alpha}$$
is infinite almost surely if $\alpha\ge1$ but finite almost surely if $\alpha<1$.

12.18 Here is a useful inequality. Suppose $A$ is an increasing process with $A_0=0$, a.s., and suppose there exists a non-negative random variable $B$ such that for each $t$,
$$\mathbb E[A_\infty-A_t\mid\mathcal F_t]\le\mathbb E[B\mid\mathcal F_t],\quad\text{a.s.}$$
Prove that for each integer $p\ge1$, there exists a constant $c_p$ depending only on $p$ such that
$$\mathbb EA_\infty^p\le c_p\,\mathbb EB^p.$$
Hint: Write
$$A_\infty^p=p\int_0^\infty(A_\infty-A_t)^{p-1}\,dA_t,$$
take expectations, and use Proposition 3.14.
12.19 Let $W$ be a one-dimensional Brownian motion with filtration $\{\mathcal F_t\}$ and let $f(r,s)$ be a deterministic function. Define the multiple stochastic integral by
$$\int_0^t\int_0^sf(r,s)\,dW_r\,dW_s=\int_0^t\Big(\int_0^sf(r,s)\,dW_r\Big)\,dW_s,$$
provided
$$\int_0^t\int_0^sf(r,s)^2\,dr\,ds<\infty,$$
and similarly for higher-order multiple stochastic integrals.
(1) If $f:\mathbb R^m\to\mathbb R$ and $g:\mathbb R^n\to\mathbb R$ are bounded and deterministic, $n\ne m$,
$$M_t^f=\int_0^t\int_0^{r_m}\cdots\int_0^{r_2}f(r_1,\ldots,r_m)\,dW_{r_1}\cdots dW_{r_m},$$
and $M_t^g$ is defined similarly, show that $\mathbb E[M_t^fM_t^g]=0$ for all $t$.
(2) Show that the collection of random variables
$$\{M_1^f:f\text{ has domain }\mathbb R^m\text{ for some }m\text{ and is bounded and deterministic}\}$$
is dense in the set of mean zero $\mathcal F_1$ measurable random variables with respect to the $L^2(P)$ norm.
13 The Girsanov theorem
We look at what happens to a Brownian motion when we change $P$ to another probability measure $Q$. This may seem strange, but there are many applications of this, including to financial mathematics and to filtering; see Chapters 28 and 29. Another application, which we give at the end of this chapter in Section 13.2, is to determine the probability that a Brownian motion $W_s$ crosses a line $a+bs$ before time $t$.
13.1 The Brownian motion case

We start with an observation. Suppose $Y_t$ is a continuous local martingale with $Y_0=0$ and let $Z_t=e^{Y_t-\langle Y\rangle_t/2}$. Applying Itô's formula to $X_t=Y_t-\tfrac12\langle Y\rangle_t$ with the function $e^x$ yields
$$Z_t=e^{Y_t-\langle Y\rangle_t/2}=1+\int_0^te^{X_s}\,d\big(Y_s-\tfrac12\langle Y\rangle_s\big)+\tfrac12\int_0^te^{X_s}\,d\langle Y\rangle_s=1+\int_0^tZ_s\,dY_s.\qquad(13.1)$$
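Identity (13.1) can be checked numerically on a single path. The Python sketch below assumes $Y=W$ (so $\langle Y\rangle_t=t$ and $Z_t=e^{W_t-t/2}$) and compares $Z_1$ with $1$ plus the left-point Riemann sum approximating $\int_0^1Z_s\,dW_s$. The function name, grid size and seed are illustrative assumptions; the discretization error shrinks as the grid is refined.

```python
import math
import random

def exponential_martingale_check(n_steps=2**16, t=1.0, seed=3):
    """Return Z_t = exp(W_t - t/2) and 1 + (left-point sum approximating int_0^t Z_s dW_s), with Y = W."""
    rng = random.Random(seed)
    dt = t / n_steps
    w = 0.0
    z = 1.0          # Z_0 = exp(Y_0 - <Y>_0 / 2) = 1
    ito_sum = 0.0
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += z * dw                       # Z at the left endpoint times the increment
        w += dw
        z = math.exp(w - (k + 1) * dt / 2.0)    # Z at the next grid point, using <W>_s = s
    return z, 1.0 + ito_sum

z_t, approx = exponential_martingale_check()
print(z_t, approx)
```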
This can be abbreviated by $dZ_t=Z_t\,dY_t$. $Z_t$ is called the exponential of the martingale $Y$, and since $Z$ is the stochastic integral with respect to a local martingale, it is itself a local martingale.

Before stating the Girsanov theorem, we need two technical lemmas.

Lemma 13.1 Suppose $Y$ is a continuous local martingale with $Y_0=0$ and $Z_t=e^{Y_t-\langle Y\rangle_t/2}$. If $\langle Y\rangle_t$ is a bounded random variable for each $t$, then $\mathbb E|Z_t|^p<\infty$ for each $p>1$ and each $t$.

Proof Let us first suppose $Y$ is bounded in absolute value by $N$. Since $Z_t\ge0$, we have by the Cauchy–Schwarz inequality
$$\begin{aligned}
\mathbb EZ_t^p&=\mathbb E\,e^{pY_t-p\langle Y\rangle_t/2}=\mathbb E\big[e^{pY_t-p^2\langle Y\rangle_t}e^{(p^2-(p/2))\langle Y\rangle_t}\big]\\
&\le\big(\mathbb E\,e^{2pY_t-2p^2\langle Y\rangle_t}\big)^{1/2}\big(\mathbb E\,e^{(2p^2-p)\langle Y\rangle_t}\big)^{1/2}.
\end{aligned}\qquad(13.2)$$
By the exact same calculation as in (13.1) but with $Y$ replaced by $2pY$, we see $e^{2pY_t-2p^2\langle Y\rangle_t}$ is a stochastic integral of a bounded integrand with respect to a bounded martingale, and hence is a martingale. This shows that the first factor on the last line of (13.2) is 1. By our assumption that $\langle Y\rangle_t$ is bounded, the second factor on this line is finite and does not depend on $N$.
If $Y$ is not bounded, let $T_N=\inf\{s:|Y_s|\ge N\}$, apply the above argument to $Y_{t\wedge T_N}$, and let $N\to\infty$.

The second lemma is the following.

Lemma 13.2 Suppose $A_t$ is a continuous increasing process adapted to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions. Let $X$ be a bounded random variable, $H$ a bounded adapted process, $s<t$, and $B\in\mathcal F_s$. Then
$$\mathbb E\Big[X\int_s^tH_r\,dA_r;B\Big]=\mathbb E\Big[\int_s^t\mathbb E[X\mid\mathcal F_r]\,H_r\,dA_r;B\Big].$$

Proof By linearity, it suffices to suppose $X$ and $H$ are non-negative. Let $\widetilde A_r=A_{r+s}$, $\widetilde H_r=H_{r+s}$, and $\widetilde{\mathcal F}_r=\mathcal F_{r+s}$. Let $C_r=\int_0^r\widetilde H_u1_B\,d\widetilde A_u$, and so we must show
$$\mathbb E\int_0^{t-s}X\,dC_r=\mathbb E\int_0^{t-s}\mathbb E[X\mid\widetilde{\mathcal F}_r]\,dC_r.$$
This follows by Proposition 3.14.

Let $M_t$ be a non-negative continuous martingale with $M_0=1$, a.s. Define a new probability measure $Q$ by $Q(A)=\mathbb E[M_t;A]$ if $A\in\mathcal F_t$. Note $Q$ is a probability measure because $Q(\Omega)=\mathbb EM_t=\mathbb EM_0=1$. $Q$ is well-defined because if $A\in\mathcal F_s\subset\mathcal F_t$, then since $M$ is a martingale, we have $\mathbb E[M_t;A]=\mathbb E[M_s;A]$.

A more general version of the Girsanov theorem is possible (see Exercise 13.5), but the Girsanov theorem is most frequently used with Brownian motion.

Theorem 13.3 Suppose $W_t$ is a Brownian motion with respect to $P$, $H$ is bounded and predictable,
$$M_t=\exp\Big(\int_0^tH_r\,dW_r-\frac12\int_0^tH_r^2\,dr\Big),\qquad(13.3)$$
and
$$Q(B)=\mathbb E_P[M_t;B]\quad\text{if }B\in\mathcal F_t.\qquad(13.4)$$
Then $W_t-\int_0^tH_r\,dr$ is a Brownian motion with respect to $Q$.

Proof We prove the theorem by showing $W_t-\int_0^tH_r\,dr$ satisfies the hypotheses of Lévy's theorem (Theorem 12.1). We first show $W_t-\int_0^tH_r\,dr$ is a martingale with respect to $Q$. By (13.1) with $Y_t=\int_0^tH_r\,dW_r$ and $Z_t=M_t$,
$$M_t=1+\int_0^tM_rH_r\,dW_r.$$
By Exercise 10.5,
$$\langle M,W\rangle_t=\int_0^tM_rH_r\,dr.\qquad(13.5)$$
We want to show that if $B\in\mathcal F_s$, then
$$\mathbb E_Q\Big[W_t-\int_0^tH_r\,dr;B\Big]=\mathbb E_Q\Big[W_s-\int_0^sH_r\,dr;B\Big].\qquad(13.6)$$
If $B\in\mathcal F_s$, then using the definition of $Q$ and the product formula (Corollary 11.3),
$$\mathbb E_Q[W_t;B]=\mathbb E_P[M_tW_t;B]=\mathbb E_P\Big[\int_0^tM_r\,dW_r;B\Big]+\mathbb E_P\Big[\int_0^tW_r\,dM_r;B\Big]+\mathbb E_P[\langle M,W\rangle_t;B]\qquad(13.7)$$
and
$$\mathbb E_Q[W_s;B]=\mathbb E_P[M_sW_s;B]=\mathbb E_P\Big[\int_0^sM_r\,dW_r;B\Big]+\mathbb E_P\Big[\int_0^sW_r\,dM_r;B\Big]+\mathbb E_P[\langle M,W\rangle_s;B].\qquad(13.8)$$
Since $H$ is bounded, $\big\langle\int_0^\cdot H_r\,dW_r\big\rangle_t\le ct$. By Lemma 13.1, $M_t$ is a martingale and $\mathbb E|M_t|^p<\infty$ for each $t$ and each $p\ge1$. Since stochastic integrals with respect to martingales are martingales,
$$\mathbb E_P\Big[\int_0^tM_r\,dW_r;B\Big]=\mathbb E_P\Big[\int_0^sM_r\,dW_r;B\Big]\qquad(13.9)$$
and
$$\mathbb E_P\Big[\int_0^tW_r\,dM_r;B\Big]=\mathbb E_P\Big[\int_0^sW_r\,dM_r;B\Big].\qquad(13.10)$$
Combining (13.7), (13.8), (13.9), and (13.10), we see that (13.6) will follow if we show
$$\mathbb E_P[\langle M,W\rangle_t-\langle M,W\rangle_s;B]=\mathbb E_Q\Big[\int_s^tH_r\,dr;B\Big].\qquad(13.11)$$
Using Lemma 13.2 and (13.5), we have
$$\begin{aligned}
\mathbb E_Q\Big[\int_s^tH_r\,dr;B\Big]&=\mathbb E_P\Big[M_t\int_s^tH_r\,dr;B\Big]=\mathbb E_P\Big[\int_s^t\mathbb E[M_t\mid\mathcal F_r]H_r\,dr;B\Big]\\
&=\mathbb E_P\Big[\int_s^tM_rH_r\,dr;B\Big]=\mathbb E_P[\langle M,W\rangle_t-\langle M,W\rangle_s;B],
\end{aligned}$$
which proves (13.11).

A similar proof shows that $(W_t-\int_0^tH_r\,dr)^2-t$ is a martingale with respect to $Q$, and hence the quadratic variation of $W_t-\int_0^tH_r\,dr$ under $Q$ is still $t$ (or see Exercise 13.2). Since the process $W_t-\int_0^tH_r\,dr$ has continuous paths, by Lévy's theorem, $W_t-\int_0^tH_r\,dr$ is a Brownian motion under $Q$.

The assumption that $H$ be bounded can be weakened, but in practice it is more common to use a stopping time argument; for an example, see the proof of Theorem 29.3.
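A deterministic check of one consequence of Theorem 13.3: with $H_r=h$ constant (an assumption made here for simplicity), $M_t=e^{hW_t-h^2t/2}$, and under $Q$ the variable $W_t-ht$ should be $N(0,t)$. The Python sketch below computes $\mathbb E_Q[(W_t-ht)^k]=\mathbb E_P[M_t(W_t-ht)^k]$ by Simpson quadrature against the $N(0,t)$ density of $W_t$ under $P$ and compares with the normal moments $1,0,t,3t^2$. It checks only the law at a single time, not the full process statement; the helper names are hypothetical.

```python
import math

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def q_moment(h_const, k, t=1.0):
    """E_Q[(W_t - h t)^k] computed as E_P[M_t (W_t - h t)^k], taking H_r = h constant in (13.3)."""
    def integrand(x):
        density = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)  # law of W_t under P
        m_t = math.exp(h_const * x - 0.5 * h_const ** 2 * t)                  # M_t evaluated at W_t = x
        return m_t * (x - h_const * t) ** k * density
    width = 15.0 * math.sqrt(t)
    # The weighted density is centered at h t, so integrate over h t +/- 15 standard deviations.
    return simpson(integrand, h_const * t - width, h_const * t + width)

for k, expected in [(0, 1.0), (1, 0.0), (2, 2.0), (4, 12.0)]:
    print(k, q_moment(0.7, k, t=2.0), expected)
```

The design choice is quadrature rather than simulation, so the comparison is free of sampling noise.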
13.2 An example

Let us give an example of the use of the Girsanov theorem, namely, to compute the probability that Brownian motion crosses a line $a+bt$ by time $t_0$, where $a>0$. We want to find an exact expression for
$$P(\exists t\le t_0:W_t=a+bt),$$
where $W$ is a Brownian motion. Let $W_t$ be a Brownian motion under $P$. Define $Q$ on $\mathcal F_{t_0}$ by $dQ/dP=M_{t_0}$, where $M_t=e^{-bW_t-b^2t/2}$. By the Girsanov theorem, under $Q$, $\widetilde W_t=W_t+bt$ is a Brownian motion, and $W_t=\widetilde W_t-bt$. Let $A=(\sup_{s\le t_0}W_s\ge a)$. If we set $S=\inf\{t>0:W_t=a\}$, then $A=(S\le t_0)$ and $A\in\mathcal F_{S\wedge t_0}$. We write
$$P(\exists t\le t_0:W_t=a+bt)=P(\exists t\le t_0:W_t-bt=a)=P\Big(\sup_{s\le t_0}(W_s-bs)\ge a\Big).\qquad(13.12)$$
$W_t$ is a Brownian motion under $P$ while $\widetilde W_t$ is a Brownian motion under $Q$. Therefore the last line of (13.12) is equal to
$$Q\Big(\sup_{s\le t_0}(\widetilde W_s-bs)\ge a\Big).$$
This in turn is equal to
$$Q\Big(\sup_{s\le t_0}W_s\ge a\Big)=Q(A).$$
To evaluate $Q(A)$, note $M_S=e^{-ab-b^2S/2}$ and by (3.19) with $b$ replaced by $a$,
$$P(S\in ds)=\frac{a}{\sqrt{2\pi s^3}}e^{-a^2/2s}\,ds.$$
Now we use optional stopping to obtain
$$\begin{aligned}
P(\exists t\le t_0:W_t=a+bt)&=Q(A)=\mathbb E_P[M_{t_0};A]=\mathbb E_P[M_{S\wedge t_0};S\le t_0]\\
&=\mathbb E_P[M_S;S\le t_0]=\int_0^{t_0}e^{-ab-b^2s/2}\frac{a}{\sqrt{2\pi s^3}}e^{-a^2/2s}\,ds.
\end{aligned}\qquad(13.13)$$
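As a cross-check, the integral in (13.13) can be evaluated numerically and compared with the standard first-passage formula for Brownian motion with drift, which is not derived in this section:
$$P\Big(\sup_{s\le t_0}(W_s-bs)\ge a\Big)=1-\Phi\Big(\frac{a+bt_0}{\sqrt{t_0}}\Big)+e^{-2ab}\,\Phi\Big(\frac{bt_0-a}{\sqrt{t_0}}\Big),$$
with $\Phi$ the standard normal distribution function. The Python sketch below (hypothetical helper names) uses Simpson's rule; the integrand and all its derivatives vanish as $s\to0$, so the rule converges quickly. For $b=0$ the closed form reduces to the reflection principle value $2(1-\Phi(a/\sqrt{t_0}))$.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def crossing_prob_integral(a, b, t0, n=20000):
    """Simpson evaluation of the last integral in (13.13); the integrand tends to 0 as s -> 0."""
    def integrand(s):
        if s == 0.0:
            return 0.0
        return (math.exp(-a * b - b * b * s / 2.0) * a / math.sqrt(2.0 * math.pi * s ** 3)
                * math.exp(-a * a / (2.0 * s)))
    h = t0 / n
    total = integrand(0.0) + integrand(t0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def crossing_prob_closed_form(a, b, t0):
    """First-passage probability of W_s - b s to level a > 0 by time t0 (standard formula)."""
    r = math.sqrt(t0)
    return 1.0 - normal_cdf((a + b * t0) / r) + math.exp(-2.0 * a * b) * normal_cdf((b * t0 - a) / r)

print(crossing_prob_integral(1.0, 0.5, 2.0), crossing_prob_closed_form(1.0, 0.5, 2.0))
```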
Exercises

13.1 Whether a filtration satisfies the usual conditions depends on the class of null sets and hence the probability measure involved matters. Suppose $\{\mathcal F_t\}$ satisfies the usual conditions with respect to $P$, $H$ is a bounded predictable process, $W$ a Brownian motion with respect to $P$, $M$ defined by (13.3), and $Q$ defined by (13.4). If $t_0>0$ and $A\in\sigma(W_s;s\le t_0)$, show $P(A)=0$ if and only if $Q(A)=0$.
13.2 Theorem 9.10 allows us to avoid some calculations in the last paragraph of the proof of Theorem 13.3. Suppose $X$ is a continuous semimartingale under $P$ and $Q$ is a probability measure equivalent to $P$. That is, a set is a null set for $P$ if and only if it is a null set for $Q$. Show $X$ is a semimartingale under $Q$ and the quadratic variation of $X$ under $P$ equals the quadratic variation of $X$ under $Q$.

13.3 Let $W=(W^1,\ldots,W^d)$ be a $d$-dimensional Brownian motion with minimal augmented filtration $\{\mathcal F_t\}$ and let $H_1,\ldots,H_d$ be bounded predictable processes. Let
$$M_t=\exp\Big(\sum_{i=1}^d\int_0^tH_i(s)\,dW_s^i-\frac12\sum_{i=1}^d\int_0^t|H_i(s)|^2\,ds\Big).$$
Define a probability measure $Q$ by setting $Q(A)=\mathbb E_P[M_t;A]$ if $A\in\mathcal F_t$. Let $\widetilde W_t^i=W_t^i-\int_0^tH_i(s)\,ds$ for each $i$. Prove that $\widetilde W=(\widetilde W^1,\ldots,\widetilde W^d)$ is a $d$-dimensional Brownian motion under $Q$.

13.4 Let $W_t$ be a $d$-dimensional Brownian motion and let $\delta,t_0>0$. Let $f:[0,t_0]\to\mathbb R^d$ be a continuous function. Prove that there exists a constant $c>0$ such that
$$P\Big(\sup_{s\le t_0}|W_s-f(s)|<\delta\Big)>c.$$
This is known as the support theorem for Brownian motion. Hint: First assume that $f$ has a bounded derivative. Use Exercise 4.9 and the Girsanov theorem.

13.5 Here is a more general form of the Girsanov theorem. Suppose $L_t$ is a bounded continuous martingale under $P$, $M_t=e^{L_t-\langle L\rangle_t/2}$, and $Q$ is a probability measure defined by $Q(A)=\mathbb E_P[M_{t_0};A]$ if $A\in\mathcal F_{t_0}$. Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions with respect to both $P$ and $Q$. Show that if $X$ is a martingale under $P$, then $X_t-\langle X,L\rangle_t$ is a martingale under $Q$.
14 Local times
Let Wt be a one-dimensional Brownian motion. Although the Lebesgue measure of the random set {t : Wt = 0} is 0, a.s., nevertheless there is an increasing continuous process which grows only when the Brownian motion is at 0. This increasing process is known as local time at 0. We want to derive some of its properties.
14.1 Basic properties

Let $W$ be a Brownian motion. By Jensen's inequality for conditional expectations (Proposition A.21), $|W_t|$ is a submartingale, and by the Doob–Meyer decomposition (Theorem 9.12), it can be written as a martingale plus an increasing process. Since $W_t$ is itself a martingale, the increasing process grows only at times when the Brownian motion is at 0. Rather than appealing to the Doob–Meyer decomposition, we give the explicit decomposition of $|W_t|$. We define
$$\operatorname{sgn}(x)=\begin{cases}1,&x>0;\\0,&x=0;\\-1,&x<0.\end{cases}$$

Theorem 14.1 Let $W_t$ be a one-dimensional Brownian motion.
(1) There exists a non-negative increasing continuous adapted process $L_t^0$ such that
$$|W_t|=\int_0^t\operatorname{sgn}(W_s)\,dW_s+L_t^0.\qquad(14.1)$$
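(2) $L_t^0$ increases only when $W$ is at 0. More precisely, if $W_s(\omega)\ne0$ for $r\le s\le t$, then $L_r^0(\omega)=L_t^0(\omega)$.

$L_t^0$ is called the local time at 0. The equation (14.1) is called the Tanaka formula.

Proof Define
$$f_\varepsilon(x)=\begin{cases}x^2/2\varepsilon,&|x|<\varepsilon;\\|x|-(\varepsilon/2),&|x|\ge\varepsilon.\end{cases}$$
The function $f_\varepsilon$ is an approximation to the function $|\cdot|$, and note that $f_\varepsilon(0)=f_\varepsilon'(0)=0$, while $f_\varepsilon''(x)=\varepsilon^{-1}1_{[-\varepsilon,\varepsilon]}(x)$, except at $x=\pm\varepsilon$.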
We apply the extension of Itô's formula given in Exercise 11.2 to $f_\varepsilon(W_t)$ and obtain
$$f_\varepsilon(W_t)=\int_0^tf_\varepsilon'(W_s)\,dW_s+\tfrac12\int_0^tf_\varepsilon''(W_s)\,ds.$$
As we let $\varepsilon\to0$, we see that $f_\varepsilon(x)\to|x|$ uniformly, and $f_\varepsilon'(x)\to\operatorname{sgn}(x)$ boundedly. By Doob's inequalities, if $t_0>0$,
$$\mathbb E\sup_{t\le t_0}\Big|\int_0^tf_\varepsilon'(W_s)\,dW_s-\int_0^t\operatorname{sgn}(W_s)\,dW_s\Big|^2\to0,\qquad(14.2)$$
while $\sup_{t\le t_0}\big|f_\varepsilon(W_t)-|W_t|\big|\to0$, a.s. Therefore there exists an increasing process $L_t^0$ and a subsequence $\varepsilon_n\to0$ such that
$$\sup_{t\le t_0}\Big|\frac1{2\varepsilon_n}\int_0^t1_{[-\varepsilon_n,\varepsilon_n]}(W_s)\,ds-L_t^0\Big|\to0,\quad\text{a.s.}\qquad(14.3)$$
Hence for almost every $\omega$ there is convergence uniformly over $t$ in finite intervals, so $L_t^0$ is continuous in $t$. Since $\frac1{2\varepsilon_n}\int_0^t1_{[-\varepsilon_n,\varepsilon_n]}(W_s)\,ds$ increases only for those times $t$ where $|W_t|\le\varepsilon_n$, $L_t^0$ increases only on the set of times when $W_t=0$.

In the Tanaka formula, the stochastic integral term is a martingale, say $N_t$. Note $\langle N\rangle_t=t$, since $\operatorname{sgn}(x)^2=1$ unless $x=0$, and we have seen that Brownian motion spends 0 time at 0 (Exercise 11.1). Hence we have exhibited reflecting Brownian motion, namely $|W_t|$, as the sum of another Brownian motion, $N_t$, and a continuous process that increases only when $W$ is at zero.

Let $M_t$ denote $\sup_{s\le t}W_s$. Note we do not have an absolute value here. The following, due to Lévy, is often useful.

Theorem 14.2 The two-dimensional processes $(|W|,L^0)$ and $(M-W,M)$ have the same law.

Proof
Let $V_t=-N_t$ in the Tanaka formula, so that
$$|W_t|=-V_t+L_t^0.\qquad(14.4)$$
Let $S_t=\sup_{s\le t}V_s$. We will show $S_t=L_t^0$. This will prove the result, since $V$ is a Brownian motion, and hence $(M-W,M)$ is equal in law to $(S-V,S)=(|W|,L^0)$. From (14.4), $V_t=L_t^0-|W_t|$, so $V_t\le L_t^0$ for all $t$, hence $S_t\le L_t^0$, since $L^0$ is increasing. $L_t^0$ increases only when $W_t=0$ and at those times $L_t^0=V_t+|W_t|=V_t\le S_t$. Given two increasing functions with $f\le g$ and $f(0)=g(0)$, if $f(t)=g(t)$ at those times when $g$ increases, a little thought shows that $f$ and $g$ are equal for all $t$. Hence $L_t^0=S_t$ for all $t$.

Just as we defined $L_t^0$ via the Tanaka formula, we can construct local time at the level $a$ by the formula
$$|W_t-a|-|W_0-a|=\int_0^t\operatorname{sgn}(W_s-a)\,dW_s+L_t^a,\qquad(14.5)$$
and the same proof as above shows that $L_t^a$ is the limit in $L^2$ of
$$\frac1{2\varepsilon}\int_0^t1_{[a-\varepsilon,a+\varepsilon]}(W_s)\,ds.$$
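The Tanaka formula has an exact discrete analogue that makes its structure transparent. For a simple random walk $S$ started at 0 and an integer level $a$, each step satisfies $|S_{k+1}-a|-|S_k-a|=\operatorname{sgn}(S_k-a)(S_{k+1}-S_k)$ when $S_k\ne a$, and equals $1$ when $S_k=a$. Summing gives
$$|S_n-a|-|S_0-a|=\sum_{k<n}\operatorname{sgn}(S_k-a)(S_{k+1}-S_k)+\#\{k<n:S_k=a\},$$
so the number of visits to $a$ plays the role of $L^a$. The Python sketch below (hypothetical function name and parameters) checks this identity; it is an illustration, not part of the text's proof.

```python
import random

def sgn(x):
    """sgn as defined before Theorem 14.1: 1, 0, or -1."""
    return (x > 0) - (x < 0)

def discrete_tanaka(n_steps, level=0, seed=4):
    """Simple random walk S from 0: return |S_n - a| - |S_0 - a|, the sgn-sum, and the visit count to a."""
    rng = random.Random(seed)
    s = 0
    mart = 0     # sum of sgn(S_k - a)(S_{k+1} - S_k), the analogue of the stochastic integral
    visits = 0   # number of k < n with S_k = a, the analogue of L_t^a
    for _ in range(n_steps):
        if s == level:
            visits += 1
        step = rng.choice((-1, 1))
        mart += sgn(s - level) * step
        s += step
    return abs(s - level) - abs(level), mart, visits

for level in (0, 3, -2):
    lhs, mart, visits = discrete_tanaka(10000, level)
    print(level, lhs, mart + visits, visits)
```

Heuristically, rescaling space by $\sqrt n$ and time by $n$ turns the sgn-sum into $\int_0^1\operatorname{sgn}(W_s)\,dW_s$, so the visit count to 0 divided by $\sqrt n$ approximates $L_1^0$ with the normalization used in (14.1).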
14.2 Joint continuity of local times

Next we will prove that $L_t^a$ can be taken to be jointly continuous in both $t$ and $a$.

Theorem 14.3 Let $W$ be a one-dimensional Brownian motion and let $L_t^a$ be the local time of $W$ at level $a$. For each $a\in\mathbb R$ there exists a version $\widetilde L_t^a$ of $L_t^a$ so that with probability one, $\widetilde L_t^a$ is jointly continuous in $t$ and $a$.

Recall that two processes $X$ and $Y$ are versions of each other if for each $t$, $X_t=Y_t$, a.s. We will use the Kolmogorov continuity criterion, Corollary 8.2, together with Remark 8.3. We will obtain an estimate on $N_t^a-N_t^b$, where $N_t^a=\int_0^t\operatorname{sgn}(W_s-a)\,dW_s$, by means of the Burkholder–Davis–Gundy inequalities.

Proof Let $M>0$ be arbitrary. It suffices to show the joint continuity for times less than or equal to $M$ and for $|a|\le M$. Let
$$N_t^a=\int_0^{M\wedge t}\operatorname{sgn}(W_s-a)\,dW_s.$$
Since $|W_t-a|$ is uniformly continuous in $t$ and $a$ for $|t|\le M$, $|a|\le M$, by the Tanaka formula (14.5) it suffices to establish the same fact for $N_t^a$. Let $T$ be a stopping time bounded by $M$ and $a<b$. Since $(N_t^a-N_t^b)^2-\langle N^a-N^b\rangle_t$ is a martingale,
$$\begin{aligned}
\mathbb E\big[((N_M^a-N_M^b)-(N_T^a-N_T^b))^2\mid\mathcal F_T\big]&=\mathbb E\Big[\int_T^M(\operatorname{sgn}(W_s-a)-\operatorname{sgn}(W_s-b))^2\,ds\,\Big|\,\mathcal F_T\Big]\\
&=4\,\mathbb E\Big[\int_T^M1_{[a,b]}(W_s)\,ds\,\Big|\,\mathcal F_T\Big]\\
&\le4\,\mathbb E\Big[\int_T^{M+T}1_{[a,b]}(W_s)\,ds\,\Big|\,\mathcal F_T\Big]\\
&=4\,\mathbb E\Big[\int_0^M1_{[a,b]}(W_{s+T})\,ds\,\Big|\,\mathcal F_T\Big];
\end{aligned}$$
recall Exercise 11.1. From Proposition 4.5 we deduce
$$\mathbb E\Big[\int_0^M1_{[a,b]}(W_{s+T})\,ds\,\Big|\,\mathcal F_T\Big]\le\int_0^M\frac{c(b-a)}{\sqrt s}\,ds\le c(b-a).$$
Thus
$$\mathbb E\big[((N_M^a-N_M^b)-(N_T^a-N_T^b))^2\mid\mathcal F_T\big]\le c|b-a|,$$
and so by (9.3)
$$\mathbb E\big[\langle N^a-N^b\rangle_M-\langle N^a-N^b\rangle_T\mid\mathcal F_T\big]\le c|b-a|.$$
If we write $A_t=\langle N^a-N^b\rangle_t$, then we have by Proposition 3.14
$$\begin{aligned}
\mathbb EA_M^2&=2\,\mathbb E\int_0^M(A_M-A_t)\,dA_t=2\,\mathbb E\int_0^M(\mathbb E[A_M\mid\mathcal F_t]-A_t)\,dA_t\\
&=2\,\mathbb E\int_0^M\mathbb E[A_M-A_t\mid\mathcal F_t]\,dA_t\le c|b-a|\,\mathbb E\int_0^MdA_t\le c|b-a|^2.
\end{aligned}$$
Applying the Burkholder–Davis–Gundy inequalities,
$$\mathbb E\Big[\sup_{t\le M}|N_t^a-N_t^b|^4\Big]\le c|b-a|^2.\qquad(14.6)$$
By the Kolmogorov continuity criterion applied on the Banach space of continuous functions with the metric $d(f,g)=\sup_{t\le M}|f(t)-g(t)|$, we see $N_t^a$ is continuous as a function of $a$ for $a$ in the dyadic rationals in $[-M,M]$, uniformly over $t\le M$. Therefore $L_t^a$ is continuous over $a$ in the dyadic rationals in $[-M,M]$, uniformly for $t\le M$. Also, (14.5) and (14.6) imply
$$\mathbb E\Big[\sup_{t\le M}|L_t^a-L_t^b|^4\Big]\le c\big(|a-b|\wedge1\big)^2.\qquad(14.7)$$
Note that if we define $\widetilde L_t^a=\lim L_t^{b_n}$ where the limit is as $b_n\to a$ and $b_n$ is in the dyadic rationals, then (14.7) implies that $\widetilde L_t^a=L_t^a$, a.s. The uniform continuity of $L_t^a$ over $a$ in the dyadic rationals and $t\le M$ implies the joint continuity of $\widetilde L_t^a$.
14.3 Occupation times

If we integrate local times over a set, we obtain occupation times. More precisely, we have the following.

Theorem 14.4 Let $W_t$ be a Brownian motion and $L_t^y$ the local time at the level $y$, where we take $L_t^y$ to be jointly continuous in $t$ and $y$. If $f$ is non-negative and Borel measurable,
$$\int_{\mathbb R}f(y)L_t^y\,dy=\int_0^tf(W_s)\,ds,\quad\text{a.s.},\qquad(14.8)$$
with the null set independent of $f$ and $t$.

Proof Suppose we prove the above equality for each $C^2$ function $f$ with compact support and denote the null set by $N_f$. Taking a countable collection $\{f_i\}$ of non-negative $C^2$ functions with compact support that are dense in the set of non-negative continuous functions on $\mathbb R$ with compact support and letting $N=\cup_iN_{f_i}$, then if $\omega\notin N$ we have the above equality for all $f_i$. By taking limits, we have (14.8) for all bounded and continuous $f$. A further limiting procedure implies our result.
Suppose $f$ is bounded and $C^2$ with compact support. Notice that the process $\int f(y)L_t^y\,dy$ is increasing and continuous. Define
$$g(x)=\int f(y)|x-y|\,dy.\qquad(14.9)$$
By Exercise 14.1, $g$ is $C^2$ with $\tfrac12g''=f$. If we take the Tanaka formula (14.5), replace $a$ by $y$, multiply by $f(y)$, and integrate over $\mathbb R$ with respect to $y$, we see that
$$g(W_t)-g(W_0)=\text{martingale}+\int f(y)L_t^y\,dy.$$
Using Itô's formula,
$$g(W_t)-g(W_0)=\text{martingale}+\tfrac12\int_0^tg''(W_s)\,ds=\text{martingale}+\int_0^tf(W_s)\,ds.$$
Thus
$$\int f(y)L_t^y\,dy-\int_0^tf(W_s)\,ds$$
is a continuous martingale with paths locally of bounded variation, hence by Theorem 9.7 it is identically 0.
Exercises

14.1 Suppose $f$ is $C^2$ with compact support and
$$g(x)=\int f(y)|x-y|\,dy.$$
Show that $g$ is $C^2$ and $g''=2f$.

14.2 Let $L_t^y$ be the jointly continuous local times of a Brownian motion $W$. Show
$$\frac1{2\varepsilon}\int_0^t1_{[y-\varepsilon,y+\varepsilon]}(W_s)\,ds\to L_t^y,\quad\text{a.s.}$$
Show the null set can be taken to be independent of $y$. Thus there is no need to take a subsequence $\varepsilon_n$ to get almost sure convergence to $L_t^y$.

14.3 Let $W$ be a Brownian motion and fix $t$. Show that the function $x\mapsto\int_0^t1_{(-\infty,x]}(W_s)\,ds$ is continuous, a.s., but that the function $x\mapsto1_{(-\infty,x]}(W_t)$ is not continuous.

14.4 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Suppose $W_t$ is a Brownian motion and $X_t=W_t+A_t$, where $X_t\ge0$ for all $t$, a.s., and $A_t$ is an increasing continuous adapted process such that $A$ increases only at those times when $X_t=0$. Suppose also that $X_t'=W_t+A_t'$, where $X_t'\ge0$ for all $t$, a.s., and $A_t'$ is an increasing continuous adapted process that increases only when $X_t'=0$. Show that $X_t=X_t'$ and $A_t=A_t'$, a.s., for all $t\ge0$.

14.5 Let $W$ be a Brownian motion and $L_t^0$ the local time at 0. Since $L_t^0$ is increasing, for each $\omega$ there is a Lebesgue–Stieltjes measure $dL_t^0$. Show that the support of $dL_t^0$ is equal to $\{t:W_t=0\}$.
Since Theorem 14.1(2) states that $L_t^0$ does not increase when $W_t$ is not equal to 0, what you need to show is that with probability one, if $W_u(\omega)=0$ and $t<u<v$, then $L_v^0(\omega)>L_t^0(\omega)$.

14.6 Use Tanaka's formula to show that if $L_t^y$ is the local time of Brownian motion at level $y$, $a\le x\le y\le b$, and $T=\inf\{t>0:W_t\notin[a,b]\}$, then
$$\mathbb E^xL_T^y=\frac{2(x-a)(b-y)}{b-a}.$$

14.7 If $L_t^0$ is the local time of a Brownian motion at 0, show that $L_{at}^0$ has the same law as $\sqrt a\,L_t^0$.

14.8 Let $W$ be a Brownian motion with local times $L_t^y$. Set $L_t^*=\sup_yL_t^y$. Let $p>0$. Prove that there exist constants $c_1,c_2$ such that if $T$ is any finite stopping time,
$$c_1\,\mathbb ET^{p/2}\le\mathbb E(L_T^*)^p\le c_2\,\mathbb ET^{p/2}.$$
The constants $c_1,c_2$ can depend on $p$, but not on $T$. Hint: Use Exercise 12.11.

14.9 This exercise defines the local time of a continuous martingale. If $M$ is a continuous martingale, then $|M_t|$ is a submartingale and so equals a martingale plus an increasing process. The increasing process $L_t^0$ is called the local time of $M$ at 0. (1) Prove the analog of Tanaka's formula. (2) Define the local time $L_t^a$ of $M$ at $a$. Prove that $L_t^a$ is jointly continuous in $t$ and $a$. (3) Prove that
$$\int_0^tf(M_s)\,d\langle M\rangle_s=\int_{\mathbb R}L_t^af(a)\,da,\quad\text{a.s.},$$
if $f$ is non-negative and measurable.

14.10 This exercise is a complement to Exercise 7.8. Let $W$ be a Brownian motion and let us define $Z=\{t\in[0,1]:W_t=0\}$, the zero set. Let $\varepsilon\in(0,1/2)$ and let $\delta>0$. Fix $\omega$ and let $\{B_i\}$ be any countable covering of $Z(\omega)$ by closed intervals such that the interiors of the $B_i$'s are pairwise disjoint and the length of each $B_i$ is less than or equal to $\delta$. We write $B_i=[a_i,b_i]$. Since $L^0$ has the same law as the maximum of Brownian motion, there exists a $c$ (depending on $\omega$) such that
$$L_t^0-L_s^0\le c(t-s)^{\frac12-\frac\varepsilon2}$$
for each $0\le s\le t\le1$. Write
$$\sum_i|b_i-a_i|^{\frac12-\varepsilon}\ge\delta^{-\varepsilon/2}\sum_i|b_i-a_i|^{\frac12-\frac\varepsilon2}\ge\frac{\delta^{-\varepsilon/2}}{c}\sum_i\big(L_{b_i}^0-L_{a_i}^0\big)=\frac{\delta^{-\varepsilon/2}}{c}\,[L_1^0-L_0^0].$$
Show that this implies that the Hausdorff dimension of $Z$ is at least 1/2.
15 Skorokhod embedding
Suppose Y is a random variable with mean zero and finite variance. Skorokhod proved the remarkable fact that if W is a Brownian motion, there exists a stopping time T such that WT has the same law as Y . Without any restrictions on T , there is a trivial solution (see Exercise 15.1), so one wants to require that E T < ∞. Skorokhod’s construction required an additional random variable that is independent of the Brownian motion, but since that time there have been 15 or 20 other constructions, most of which don’t require the extra randomization, that is, T is a stopping time for the minimal augmented filtration generated by W . Although conceptually some constructions are easier than others, none is easy from the point of view of technical details. We will give a construction that doesn’t have any optimality properties, but is a nice example of stochastic calculus. Then we will use this to prove an embedding for random walks.
15.1 Preliminaries

A function $f:\mathbb R\to\mathbb R$ is a Lipschitz function if there exists a constant $k$ such that
$$|f(y)-f(x)|\le k|y-x|,\qquad x,y\in\mathbb R.\qquad(15.1)$$
By the mean value theorem, if $f$ has a bounded derivative, then $f$ is a Lipschitz function. We will need the following well-known theorem from the theory of ordinary differential equations.

Theorem 15.1 Suppose $F:[0,\infty)\times\mathbb R\to\mathbb R$ is a bounded function and there exists a positive real $k$ such that $|F(t,x)-F(t,y)|\le k|x-y|$ for all $t\ge0$ and all $x,y\in\mathbb R$. Let $y_0\in\mathbb R$, define the function $y^0$ by $y^0(t)=y_0$ for all $t\ge0$, and define the functions $y^i$ inductively by
$$y^{i+1}(t)=y_0+\int_0^tF(s,y^i(s))\,ds,\qquad t\ge0.\qquad(15.2)$$
Then the functions $y^i$ converge uniformly on bounded intervals to a function $y$ that satisfies
$$y(t)=y_0+\int_0^tF(s,y(s))\,ds.\qquad(15.3)$$
For any s such that F (s, y(s)) is continuous at s, y satisfies dy = F (s, y(s)). ds
(15.4)
The solution to (15.3) is unique. This inductive procedure for obtaining the solution to (15.4) is known as Picard iteration.

Proof Note each $y^i(t)$ is bounded in absolute value by $|y_0| + t\sup|F|$. Let $g_i(t) = \sup_{s\le t}|y^{i+1}(s)-y^i(s)|$. If $s\le t$, then
$$\begin{aligned}|y^{i+1}(s)-y^i(s)| &= \Big|\int_0^s [F(r,y^i(r))-F(r,y^{i-1}(r))]\,dr\Big|\\ &\le \int_0^t |F(r,y^i(r))-F(r,y^{i-1}(r))|\,dr\\ &\le k\int_0^t |y^i(r)-y^{i-1}(r)|\,dr \le k\int_0^t g_{i-1}(r)\,dr.\end{aligned}$$
Taking the supremum over $s\le t$, we have
$$g_i(t)\le k\int_0^t g_{i-1}(r)\,dr.$$
Fix $t_0$. Now $g_1(t)$ is bounded for $t\le t_0$, say by $L$. Then $g_2(t)\le k\int_0^t L\,dr = kLt$ for each $t\le t_0$, and then $g_3(t)\le k\int_0^t kLr\,dr = k^2Lt^2/2$ and $g_4(t)\le k\int_0^t (k^2Lr^2/2)\,dr = k^3Lt^3/3!$. By induction $g_i(t)\le k^{i-1}Lt^{i-1}/(i-1)!$. We conclude $\sum_{i=1}^\infty g_i(t_0)<\infty$. Then, for $n>m$,
$$\sup_{s\le t_0}|y^n(s)-y^m(s)| \le \sum_{i=m}^{n-1} g_i(t_0),$$
which tends to zero as $m$ and $n$ tend to infinity. By the completeness of the space $C[0,t_0]$, there exists a continuous function $y$ such that $\sup_{s\le t_0}|y^n(s)-y(s)|\to0$ as $n\to\infty$. $F$ is Lipschitz, hence continuous, in the $x$ variable, so taking the limit in (15.2) shows that $y$ solves (15.3). If $s\mapsto F(s,y(s))$ is continuous at a particular value of $s$, then (15.4) holds by the fundamental theorem of calculus.
To prove uniqueness, suppose $x$ and $y$ are solutions to (15.3) and set $g(t)=\sup_{s\le t}|x(s)-y(s)|$. If $s\le t$, then
$$|x(s)-y(s)| \le \int_0^s |F(r,x(r))-F(r,y(r))|\,dr \le k\int_0^t |x(r)-y(r)|\,dr \le k\int_0^t g(r)\,dr.$$
Skorokhod embedding
Taking the supremum over $s\le t$, we obtain
$$g(t)\le k\int_0^t g(r)\,dr.$$
For $t\le t_0$, $|x(t)|$ and $|y(t)|$ are bounded (each by $|y_0|+t_0\sup|F|$), so $g(t)$ is bounded for $t\le t_0$, say by $L$. We then have $g(t)\le k\int_0^t L\,dr = kLt$ for each $t\le t_0$, and then $g(t)\le k\int_0^t kLr\,dr = k^2Lt^2/2$. Iterating, we have $g(t)\le k^it^iL/i!$ for each $i$, and hence $g(t)=0$. This is true for each $t\le t_0$, hence $x(s)=y(s)$ for all $s\le t_0$. Since $t_0$ is arbitrary, $x=y$.

If the random variable $Y$ that we are considering is equal to 0, a.s., we can just let our stopping time $T$ equal 0, a.s., and then $W_T=0=Y$ if $W$ is a Brownian motion. In the remainder of this section and the next we assume $\mathbb{E}\,Y=0$ and $\mathbb{E}\,Y^2<\infty$, but that $Y$ is not identically zero. Define
$$p_s(y) = \frac{1}{\sqrt{2\pi s}}\,e^{-y^2/2s},$$
the density of a mean zero normal random variable with variance $s$. Use $p_s'(x)$ to denote the derivative of $p_s$ with respect to $x$.

Lemma 15.2 Suppose $W$ is a Brownian motion and $g:\mathbb{R}\to\mathbb{R}$ is such that $\mathbb{E}[g(W_1)^2]<\infty$. For $0<s<1$, let
$$a(s,x) = -\int p_{1-s}'(z-x)\,g(z)\,dz \tag{15.5}$$
and
$$b(s,x) = \int p_{1-s}(z-x)\,g(z)\,dz. \tag{15.6}$$
We have
$$g(W_1) = \mathbb{E}\,g(W_1) + \int_0^1 a(s,W_s)\,dW_s, \quad\text{a.s.}, \tag{15.7}$$
and
$$\mathbb{E}[g(W_1)\mid\mathcal{F}_s] = b(s,W_s), \quad\text{a.s.} \tag{15.8}$$
Proof We will first prove (15.7), and we will first look at the case when $g(x)=e^{iux}$. By Itô's formula with the function $f(x)=e^x$ applied to the semimartingale $X_t = iuW_t + u^2t/2$,
$$\begin{aligned}e^{iuW_t+u^2t/2} &= 1 + \int_0^t e^{X_s}\,d(iuW_s+u^2s/2) + \frac12\int_0^t(-u^2)e^{X_s}\,ds\\ &= 1 + iu\int_0^t e^{iuW_s+u^2s/2}\,dW_s,\end{aligned}$$
so
$$e^{iuW_1} = e^{-u^2/2} + \int_0^1 iu\,e^{iuW_s}e^{u^2(s-1)/2}\,dW_s.$$
We need to check that
$$iu\,e^{iux}e^{u^2(s-1)/2} = a(s,x).$$
Using integration by parts,
$$a(s,x) = -\int p_{1-s}'(z-x)g(z)\,dz = \int p_{1-s}(z-x)g'(z)\,dz = iu\int\frac{1}{\sqrt{2\pi(1-s)}}\,e^{-(z-x)^2/2(1-s)}e^{iuz}\,dz.$$
This is $iu$ times the characteristic function of a normal random variable with mean $x$ and variance $1-s$, and so by (A.25) equals
$$iu\,e^{iux}e^{-u^2(1-s)/2},$$
as desired. We therefore have
$$e^{iuW_1} = \mathbb{E}\,e^{iuW_1} - \int_0^1\!\!\int p_{1-s}'(z-W_s)\,e^{iuz}\,dz\,dW_s. \tag{15.9}$$
Now suppose $g$ is in the Schwartz class (see Section B.2), replace $u$ by $-u$ in (15.9), multiply by $(2\pi)^{-1}\widehat g(u)$, where $\widehat g(u)=\int e^{iux}g(x)\,dx$ is the Fourier transform of $g$, and integrate over $u\in\mathbb{R}$. Since $g(x)=(2\pi)^{-1}\int e^{-iux}\widehat g(u)\,du$, we then obtain
$$g(W_1) = \mathbb{E}\,g(W_1) - (2\pi)^{-1}\int_{\mathbb{R}}\int_0^1\!\!\int p_{1-s}'(z-W_s)\,e^{-iuz}\,\widehat g(u)\,dz\,dW_s\,du. \tag{15.10}$$
Using the Fubini theorem (check that there is no trouble with the stochastic integral; see Exercise 15.2) and the inversion formula for Fourier transforms, the triple-integral term on the right-hand side of (15.10) is equal to
$$-\int_0^1\!\!\int p_{1-s}'(z-W_s)\,g(z)\,dz\,dW_s, \tag{15.11}$$
which by (15.5) is $\int_0^1 a(s,W_s)\,dW_s$; this gives us (15.7) when $g$ is in the Schwartz class. A limit argument gives us (15.7) for all $g$ that we are interested in. To prove (15.8) we again start with the case $g(x)=e^{iux}$. We have
$$\mathbb{E}[e^{iuW_1}\mid\mathcal{F}_s] = e^{iuW_s}\,\mathbb{E}[e^{iu(W_1-W_s)}\mid\mathcal{F}_s] = e^{iuW_s}\,\mathbb{E}[e^{iu(W_1-W_s)}] = e^{iuW_s}e^{-u^2(1-s)/2},$$
using the independent increments property of Brownian motion and (A.25). On the other hand, the definition of $b(s,x)$ shows that when $g(x)=e^{iux}$, $b(s,x)$ is the characteristic function of a normal random variable with mean $x$ and variance $1-s$, so
$$b(s,x) = e^{iux}e^{-u^2(1-s)/2}.$$
Replacing $x$ by $W_s$ proves (15.8) in the case $g(x)=e^{iux}$. We extend this to general $g$ in the same way as in the proof of (15.7).
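A quick numerical check of (15.5), (15.6) and (15.8) may help. The sketch below is our own illustration, not part of the original argument: it takes $g(z)=z^3$ (not in the Schwartz class, but with $\mathbb{E}[g(W_1)^2]=15<\infty$), for which $b(s,x)=\mathbb{E}[(x+\sqrt{1-s}\,Z)^3]=x^3+3x(1-s)$ with $Z$ standard normal, and $a(s,x)=3x^2+3(1-s)$. The integrals are approximated by a trapezoidal rule on a truncated interval; the truncation width and grid size are arbitrary choices.

```python
import math

def p(t, y):
    """Normal density with mean 0 and variance t, evaluated at y."""
    return math.exp(-y * y / (2 * t)) / math.sqrt(2 * math.pi * t)

def dp(t, y):
    """Derivative of y -> p(t, y)."""
    return -(y / t) * p(t, y)

def integrate(f, lo, hi, n=20_000):
    """Composite trapezoidal rule for f on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h

def g(z):
    return z ** 3

def b(s, x, width=12.0):
    """(15.6): integral of p_{1-s}(z - x) g(z) dz, truncated to |z - x| <= width."""
    t = 1.0 - s
    return integrate(lambda z: p(t, z - x) * g(z), x - width, x + width)

def a(s, x, width=12.0):
    """(15.5): minus the integral of p'_{1-s}(z - x) g(z) dz, same truncation."""
    t = 1.0 - s
    return integrate(lambda z: -dp(t, z - x) * g(z), x - width, x + width)

for s, x in [(0.0, 0.0), (0.3, 0.7), (0.9, -1.2)]:
    print(f"s={s}, x={x}: b = {b(s, x):.8f} vs {x**3 + 3*x*(1 - s):.8f}, "
          f"a = {a(s, x):.8f} vs {3*x**2 + 3*(1 - s):.8f}")
```

The closed forms agree with the relation $\partial b/\partial x=a$, which is proved in general in Proposition 15.3 below.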
Next, we want to find a reasonable function $g$ such that $g(W_1)$ is equal in law to $Y$, where again $W$ is a Brownian motion. Let $F_Y(x)=\mathbb{P}(Y\le x)$, the distribution function of $Y$, and let $\Phi(x)=\mathbb{P}(W_1\le x)$. Then
$$\mathbb{P}(\Phi(W_1)\le x) = \mathbb{P}(W_1\le\Phi^{-1}(x)) = \Phi(\Phi^{-1}(x)) = x$$
for $x\in[0,1]$, so $\Phi(W_1)$ is a uniform random variable on $[0,1]$. Define
$$g(x) = F_Y^{-1}(\Phi(x)). \tag{15.12}$$
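Definition (15.12) is the familiar inverse-transform (quantile) construction and is easy to try numerically. The sketch below rests on assumptions of our own: $Y$ takes finitely many values (here $-2$ with probability $1/3$ and $1$ with probability $2/3$, so $\mathbb{E}\,Y=0$), `make_g` is a hypothetical helper name, and `statistics.NormalDist` from the Python standard library supplies $\Phi$. It uses the right-continuous inverse $F_Y^{-1}(u)=\inf\{y:F_Y(y)>u\}$ and checks by Monte Carlo that $g(W_1)$ has the law of $Y$.

```python
import random
from bisect import bisect_right
from statistics import NormalDist

PHI = NormalDist()  # standard normal: the law of W_1

def make_g(values, probs):
    """Hypothetical helper: g(x) = F_Y^{-1}(Phi(x)) as in (15.12), for a discrete Y.

    values: the atoms of Y in increasing order; probs: their probabilities.
    F_Y^{-1} is the right-continuous inverse F_Y^{-1}(u) = inf{y : F_Y(y) > u}.
    """
    cum, c = [], 0.0
    for q in probs:
        c += q
        cum.append(c)                  # cum[j] = F_Y(values[j])

    def g(x):
        u = PHI.cdf(x)                 # Phi(W_1) is uniform on [0, 1]
        j = bisect_right(cum, u)       # smallest j with cum[j] > u
        return values[min(j, len(values) - 1)]  # guard against rounding near u = 1

    return g

values, probs = [-2.0, 1.0], [1 / 3, 2 / 3]   # E Y = -2/3 + 2/3 = 0
g = make_g(values, probs)

rng = random.Random(2024)
n = 100_000
samples = [g(rng.gauss(0.0, 1.0)) for _ in range(n)]
freq = {v: samples.count(v) / n for v in values}
print("empirical law of g(W_1):", freq)     # compare with probs
print("empirical mean:", sum(samples) / n)  # compare with E Y = 0
```

Note that the resulting $g$ is nondecreasing, as remarked below.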
We use the right-continuous version of $F_Y^{-1}$ if $F_Y^{-1}$ is not continuous. Then
$$\mathbb{P}(g(W_1)\le x) = \mathbb{P}(\Phi(W_1)\le F_Y(x)) = F_Y(x),$$
that is, $Y$ is equal in law to $g(W_1)$, as desired. Note $g$ is an increasing function. We will need the following estimates.

Proposition 15.3 Let $g$ be defined by (15.12) and define $a$ and $b$ by (15.5) and (15.6).
(1) For each $L>0$ and $s_0<1$, $a$ is continuously differentiable on $[0,s_0]\times[-L,L]$. Also, for each $L>0$ and $s_0<1$, $a$ is bounded below by a positive constant on $[0,s_0]\times[-L,L]$.
(2) For each $L>0$ and $s_0<1$, $b$ is continuously differentiable on $[0,s_0]\times[-L,L]$.
(3) For each $s\in[0,s_0]$, the function $x\mapsto b(s,x)$ is strictly increasing. For each fixed $s$, let $B(s,x)$ be the inverse of $b(s,x)$ (so that $B(s,b(s,x))=x$ and $b(s,B(s,x))=x$). For each $L>0$ and $s_0<1$, $B$ is continuously differentiable on $[0,s_0]\times[-L,L]$.

Proof
To start, we observe that for every $r>0$,
$$\mathbb{E}\,e^{r|W_1|}\le\mathbb{E}\,e^{rW_1}+\mathbb{E}\,e^{-rW_1}<\infty.$$
Since $|z|^m\le m!\,e^{|z|}$ if $m$ is a non-negative integer, then by the Cauchy–Schwarz inequality and the fact that $\mathbb{E}\,Y^2<\infty$,
$$\begin{aligned}\int|z|^me^{r|z|}e^{-z^2/2}|g(z)|\,dz &\le m!\int e^{(r+1)|z|}e^{-z^2/2}|g(z)|\,dz = m!\sqrt{2\pi}\,\mathbb{E}\big[e^{(r+1)|W_1|}|g(W_1)|\big]\\ &\le m!\sqrt{2\pi}\,\big(\mathbb{E}\,e^{2(r+1)|W_1|}\big)^{1/2}\big(\mathbb{E}\,|g(W_1)|^2\big)^{1/2}\\ &= m!\sqrt{2\pi}\,\big(\mathbb{E}\,e^{2(r+1)|W_1|}\big)^{1/2}\big(\mathbb{E}\,Y^2\big)^{1/2}<\infty.\end{aligned}\tag{15.13}$$
We now turn to (1). For $s\le s_0$ and $|x|\le L$,
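$$\begin{aligned}|p_{1-s}'(z-x)| &\le c\,\frac{|z-x|}{(1-s)^{3/2}}\,e^{-(z-x)^2/2(1-s)} \le c|z-x|\,e^{-x^2/2(1-s)}e^{zx/(1-s)}e^{-z^2/2(1-s)}\\ &\le c(|z|+L)\,e^{|z|L/(1-s_0)}e^{-z^2/2} \le c|z|e^{c'|z|}e^{-z^2/2}+c\,e^{c'|z|}e^{-z^2/2}.\end{aligned}$$
Therefore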
$$|a(s,x)|\le\int c|z|e^{c'|z|}e^{-z^2/2}|g(z)|\,dz+\int c\,e^{c'|z|}e^{-z^2/2}|g(z)|\,dz,$$
which is finite by (15.13). This gives an upper bound for $a$.
By the mean value theorem, if $s\le s_0$, $|x|\le L$, and $|h|\le1$, then
$$|p_{1-s}'(z-x)-p_{1-s}'(z-(x+h))|\le c|h|(1+|z|^2+L^2)\sup_{|x'-x|\le1}e^{-(z-x')^2/2(1-s)},$$
so
$$\Big|\frac1h\big(p_{1-s}'(z-x)-p_{1-s}'(z-(x+h))\big)\Big|\le c(1+|z|^2)e^{c'|z|}e^{-z^2/2}.$$
In view of (15.13), we can use dominated convergence to conclude that
$$\frac{\partial a}{\partial x}(s,x)=\int p_{1-s}''(z-x)\,g(z)\,dz$$
and that $|\partial a(s,x)/\partial x|$ is bounded above on $[0,s_0]\times[-L,L]$. By a similar argument we obtain that $|\partial a(s,x)/\partial s|$ is also bounded above on $[0,s_0]\times[-L,L]$. The same argument shows that the second partial derivatives of $a$ are bounded, and hence the first partial derivatives are continuous. Using integration by parts,
$$a(s,x)=\int p_{1-s}(z-x)\,dg(z),$$
where the integral is a Lebesgue–Stieltjes integral; recall that $g$ is an increasing function. Since we are working under the assumption that $Y$ is not identically zero and $\mathbb{E}\,Y=0$, $g$ is not constant, so $dg$ is a nonzero measure and gives positive mass to some bounded interval $I$. Because $p_{1-s}(z-x)$ is bounded below by a positive constant for $z\in I$, $s\le s_0$, and $|x|\le L$, it follows that $a$ is bounded below by a positive constant for $s\le s_0$ and $|x|\le L$. The proof of (2) is quite similar. To prove (3), as above, we can use a dominated convergence argument to prove
$$\frac{\partial b}{\partial x}(s,x)=a(s,x).$$
Since $a(s,x)>0$ for each $x$ and each $s\le s_0$, we conclude that $x\mapsto b(s,x)$ is strictly increasing. The estimates for $B$ follow from the implicit function theorem applied to $f(s,x,y)=0$, where $f(s,x,y)=b(s,x)-y$.
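For a two-point law everything in Proposition 15.3 is explicit, which gives a useful check. Take, as our own example, $\mathbb{P}(Y=1)=\mathbb{P}(Y=-1)=1/2$. Then (15.12) gives $g(x)=-1$ for $x<0$ and $g(x)=1$ for $x\ge0$, so $dg$ is twice the point mass at 0, the Lebesgue–Stieltjes formula above gives $a(s,x)=2p_{1-s}(x)$, and $b(s,x)=\mathbb{E}[g(W_1)\mid W_s=x]=2\Phi(x/\sqrt{1-s})-1$ with inverse $B(s,y)=\sqrt{1-s}\,\Phi^{-1}((y+1)/2)$ for $-1<y<1$. The Python sketch checks $\partial b/\partial x=a$ by central differences, evaluates $a$ on a grid over $[0,s_0]\times[-L,L]$ (the values of $s_0$, $L$ and the grid are arbitrary), and checks that $B$ inverts $b$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution function Phi

def a(s, x):
    """a(s, x) = 2 p_{1-s}(x), since dg is twice the point mass at 0."""
    t = 1.0 - s
    return 2.0 * math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t)

def b(s, x):
    """b(s, x) = E[g(W_1) | W_s = x] = 2 Phi(x / sqrt(1 - s)) - 1."""
    return 2.0 * PHI.cdf(x / math.sqrt(1.0 - s)) - 1.0

def B(s, y):
    """Inverse of x -> b(s, x); defined for -1 < y < 1, the range of b(s, .)."""
    return math.sqrt(1.0 - s) * PHI.inv_cdf((y + 1.0) / 2.0)

h = 1e-5
for s, x in [(0.0, 0.3), (0.5, -1.0), (0.8, 2.0)]:
    fd = (b(s, x + h) - b(s, x - h)) / (2 * h)   # central difference for db/dx
    print(f"s={s}, x={x}: db/dx ~ {fd:.8f}, a(s, x) = {a(s, x):.8f}")

s0, L = 0.9, 3.0
grid = [(i * s0 / 50, -L + j * 2 * L / 60) for i in range(51) for j in range(61)]
print("min of a over the grid:", min(a(s, x) for s, x in grid))
print("B(0.5, b(0.5, 0.7)) =", B(0.5, b(0.5, 0.7)))
```

In this example the positive lower bound for $a$ is tiny near the corners $(s_0,\pm L)$, which is why Proposition 15.3 only asserts positivity on compact sets.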
15.2 Construction of the embedding
Theorem 15.4 Suppose $Y$ is a random variable with $\mathbb{E}\,Y=0$ and $\mathbb{E}\,Y^2<\infty$. There exists a Brownian motion $N$ and a stopping time $T$ with respect to the minimal augmented filtration of $N$ such that $N_T$ is equal in law to $Y$. Moreover $\mathbb{E}\,T=\mathbb{E}\,Y^2$.

Proof The idea is to define $M$ by (15.14) below and do a time change so that $N_T=M_1=g(W_1)$. To show that $T$ is a stopping time relative to the minimal augmented filtration for $N$, we set up an ordinary differential equation that the time change solves and use Picard iteration to show that the solution can be obtained in a constructive way.
The case where $Y$ is identically zero is trivial, for we take $T=0$, so we suppose $Y$ is not identically zero. Let $W_t$ be a Brownian motion and let $\{\mathcal{F}_t\}$ be its minimal augmented filtration. Define the function $g$ by (15.12) and define $a$ and $b$ for $s<1$ by (15.5) and (15.6). Define $a(s,x)=1$ and $b(s,x)=x$ if $s\ge1$.
Now let
$$M_t=\int_0^t a(s,W_s)\,dW_s, \tag{15.14}$$
and hence
$$\langle M\rangle_t=\int_0^t a(s,W_s)^2\,ds.$$
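Note $\langle M\rangle_t\to\infty$, a.s., as $t\to\infty$, since $a(s,x)=1$ for $s\ge1$. Since $\mathbb{E}\,Y=0$, then $\mathbb{E}\,g(W_1)=0$, so $M_1=g(W_1)$ by (15.7). Let $\tau_t=\inf\{s:\langle M\rangle_s\ge t\}$, the inverse of $\langle M\rangle$. By Theorem 12.2, if we set $N_t=M_{\tau_t}$, then $N$ is a Brownian motion. Let $\{\mathcal{G}_t\}$ be the minimal augmented filtration generated by $N$. We let $T=\langle M\rangle_1$. Then
$$N_T=N_{\langle M\rangle_1}=M_{\tau_{\langle M\rangle_1}}=M_1=g(W_1),$$
and $N_T$ has the same law as $Y$. For the integrability of $T$ we have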
$$\mathbb{E}\,T=\mathbb{E}\,\langle M\rangle_1=\mathbb{E}\,M_1^2=\mathbb{E}[g(W_1)^2]=\mathbb{E}\,Y^2=\operatorname{Var}Y<\infty. \tag{15.15}$$
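As an illustration of (15.15) (our own example, not from the text), let $Y$ have the law of $W_1^3$, so that $\mathbb{E}\,Y=0$, $\mathbb{E}\,Y^2=\mathbb{E}\,W_1^6=15$, and (15.12) gives $g(x)=x^3$. With $Z$ standard normal, $b(s,x)=\mathbb{E}[(x+\sqrt{1-s}\,Z)^3]=x^3+3x(1-s)$, so $a(s,x)=\partial b/\partial x=3x^2+3(1-s)$, and using $\mathbb{E}\,W_s^2=s$, $\mathbb{E}\,W_s^4=3s^2$ the expected embedding time can be computed by hand:
$$\begin{aligned}\mathbb{E}\,T=\mathbb{E}\,\langle M\rangle_1&=\int_0^1\mathbb{E}\big[(3W_s^2+3(1-s))^2\big]\,ds=\int_0^1\big(27s^2+18s(1-s)+9(1-s)^2\big)\,ds\\ &=\int_0^1(18s^2+9)\,ds=6+9=15=\mathbb{E}\,Y^2.\end{aligned}$$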
It remains to show that $T$ is a stopping time with respect to $\{\mathcal{G}_t\}$. Since $T=\lim_{s\uparrow1}\langle M\rangle_s$, it suffices to show that $\langle M\rangle_s$ is a stopping time with respect to $\{\mathcal{G}_t\}$ for each $s<1$. Fix $K$. We will show
$$\Big(\tau_t\le s,\ \sup_{r\le t}|N_r|\le K\Big)\in\mathcal{G}_t, \qquad s<1.$$
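Letting $K\to\infty$ will then show $(\langle M\rangle_s\ge t)=(\tau_t\le s)\in\mathcal{G}_t$ for $s<1$. Since $\tau$ is the inverse of $\langle M\rangle$,
$$\frac{d\tau_t}{dt}=\frac{1}{d\langle M\rangle_{\tau_t}/d\tau_t}=\frac{1}{a(\tau_t,W_{\tau_t})^2}$$
with $\tau_0=0$, a.s. With $B(s,x)$ being the inverse of $b(s,x)$ in the $x$ variable,
$$M_s=\mathbb{E}[M_1\mid\mathcal{F}_s]=\mathbb{E}[g(W_1)\mid\mathcal{F}_s]=b(s,W_s),$$
or
$$W_s=B(s,M_s), \qquad s<1.$$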
Therefore $W_{\tau_t}=B(\tau_t,M_{\tau_t})=B(\tau_t,N_t)$ on the event $(\tau_t\le s)$ if $s<1$. Thus $\tau_t$ solves the equation
$$\frac{d\tau_t}{dt}=\frac{1}{a(\tau_t,B(\tau_t,N_t))^2}, \qquad \tau_0=0,$$
or
$$\tau_t=\int_0^t\frac{1}{a(\tau_u,B(\tau_u,N_u))^2}\,du. \tag{15.16}$$
Fix $s$ and $t$ and choose $s_0\in(s,1)$. Let $S_K=\inf\{t:|N_t|\ge K\}$ and let $N_t^K=N_{t\wedge S_K}$. Define
$$\Psi(q,r)=\frac{1}{a(r,B(r,N_q^K(\omega)))^2}$$
if $r\le s_0$. Observe that $\Psi$ depends on $\omega$. Define $\Psi(q,r)=1$ for $r\ge1$ and define $\Psi(q,r)$ by linear interpolation for $r\in(s_0,1)$. Note that by Proposition 15.3, $\Psi$ is continuous, bounded, and there exists $k>0$ such that
$$|\Psi(q,r)-\Psi(q,r')|\le k|r-r'|, \qquad r,r'\in\mathbb{R},\ q\in[0,\infty).$$
On the event $(\tau_t\le s,\ \sup_{r\le t}|N_r|\le K)$, $\tau$ solves the equation
$$\tau_t=\int_0^t\Psi(u,\tau_u)\,du.$$
We solve the differential equation
$$y(t)=\int_0^t\Psi(u,y(u))\,du \tag{15.17}$$
using Theorem 15.1. The function $y^0(t)$ in the statement of Theorem 15.1 is identically zero, and the function $y^1(t)=\int_0^t\Psi(u,y^0(u))\,du$ (which depends on $\omega$ because $\Psi$ does) will be $\mathcal{G}_t$ measurable, and by induction, the functions $y^i(t)$ will be $\mathcal{G}_t$ measurable. Therefore the limit, $y(t)$, will be $\mathcal{G}_t$ measurable. Since $|N_q^K(\omega)|\le K$ for all $q$ and we are only interested in the solution to (15.17) for $y(t)\le s$, then $\tau_t=y(t)$ as long as $\tau_t\le s$; therefore $(\tau_t\le s,\ \sup_{r\le t}|N_r|\le K)=(y(t)\le s,\ \sup_{r\le t}|N_r|\le K)\in\mathcal{G}_t$, and the proof is complete.

In the above theorem, we started with a Brownian motion $W$, constructed a new Brownian motion $N$, and then defined our stopping time $T$ in terms of $N$. We can actually start with a Brownian motion $W$ and define a stopping time with respect to the minimal augmented filtration of $W$.

Corollary 15.5 Let $W$ be a Brownian motion and let $\{\mathcal{F}_t\}$ be the minimal augmented filtration for $W$. Let $Y$ be a random variable with $\mathbb{E}\,Y=0$ and $\operatorname{Var}Y<\infty$. There exists a stopping time $V$ with respect to $\{\mathcal{F}_t\}$ such that $W_V$ has the same law as $Y$.

Proof
We sketch the proof and ask you to give the details in Exercise 15.3. Define
$$\Psi(q,r)=\frac{1}{a(r,B(r,W_q(\omega)))^2}$$
and solve the equation
$$\frac{d\bar\tau_t}{dt}=\Psi(t,\bar\tau_t), \qquad \bar\tau_0=0,$$
by Picard iteration. The proof of Theorem 15.4 shows that the solution $\bar\tau_t$ will satisfy $(\bar\tau_t\le s)\in\mathcal{F}_t$ for every $t$ as long as $s<1$. Let $A$ be the inverse of $\bar\tau$, and define $V=\lim_{s\uparrow1}A_s$. Then $V$ will be the desired stopping time.
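The Picard iteration of Theorem 15.1, which drives the measurability arguments above, is easy to carry out numerically. The sketch below is illustrative only: the choice $F(t,y)=\cos y$ (bounded, and Lipschitz with $k=1$), the grid, the trapezoidal rule for the integrals, and the number of iterations are our own assumptions. Each new iterate at time $t$ uses values of the previous iterate only on $[0,t]$, which is the feature exploited above to get measurability with respect to the filtration at time $t$. For $y'=\cos y$, $y(0)=0$, the exact solution is $y(t)=\arctan(\sinh t)$, which the code uses for comparison.

```python
import math

def picard(F, y0, t0, n_steps=2000, n_iter=40):
    """Picard iteration (15.2) for y(t) = y0 + int_0^t F(s, y(s)) ds on [0, t0].

    Functions are represented by their values on a uniform grid, and each
    integral is approximated by the cumulative trapezoidal rule. Returns the
    grid and the final iterate.
    """
    h = t0 / n_steps
    ts = [k * h for k in range(n_steps + 1)]
    y = [y0] * (n_steps + 1)                 # y^0(t) = y0 for all t
    for _ in range(n_iter):
        f = [F(t, yt) for t, yt in zip(ts, y)]
        new = [y0]
        for k in range(n_steps):
            # the value at t_{k+1} uses only the previous iterate on [0, t_{k+1}]
            new.append(new[-1] + 0.5 * h * (f[k] + f[k + 1]))
        y = new
    return ts, y

ts, ys = picard(lambda t, y: math.cos(y), 0.0, 3.0)
err = max(abs(y - math.atan(math.sinh(t))) for t, y in zip(ts, ys))
print(f"max |Picard - exact| on [0, 3]: {err:.2e}")
```

The remaining error comes from the trapezoidal discretization, not from the iteration: by the bound $k^{i}t^{i}/i!$ in the proof of Theorem 15.1, forty iterations on $[0,3]$ are far more than enough.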
15.3 Embedding random walks
Let us give an application of Skorokhod embedding to show that we can find a Brownian motion that is relatively close to a random walk. Suppose $Y_1,Y_2,\ldots$ is an i.i.d. sequence of real-valued random variables with mean zero and variance one. Given a Brownian motion $W_t$ we can find a stopping time $T_1$ such that $W_{T_1}$ has the same distribution as $Y_1$. We use the strong Markov property at time $T_1$ and find a stopping time $T_2$ for $W_{T_1+t}-W_{T_1}$ so that $W_{T_1+T_2}-W_{T_1}$ has the same distribution as $Y_2$ and is independent of $\mathcal{F}_{T_1}$. We continue. We see that the $T_i$ are i.i.d. and by Theorem 15.4, $\mathbb{E}\,T_i=\mathbb{E}\,Y_i^2=1$. Let $U_k=\sum_{i=1}^kT_i$. Then for each $n$, $S_n=\sum_{i=1}^nY_i$ has the same distribution as $W_{U_n}$; in fact $(S_1,\ldots,S_n)$ and $(W_{U_1},\ldots,W_{U_n})$ have the same joint law.

Theorem 15.6
$$\sup_{i\le n}|W_{U_i}-W_i|/\sqrt n$$
tends to 0 in probability as $n\to\infty$.

Proof
We will show that for each $\varepsilon>0$,
$$\limsup_{n\to\infty}\mathbb{P}\Big(\sup_{k\le n}|W_{U_k}-W_k|>\varepsilon\sqrt n\Big)\le\varepsilon. \tag{15.18}$$
Since the paths of Brownian motion are continuous, we can find $\delta\le1$ small such that
$$\mathbb{P}\Big(\sup_{s,t\le2,\,|t-s|\le\delta}|W_t-W_s|>\varepsilon\Big)<\varepsilon/2.$$
By scaling,
$$\mathbb{P}\Big(\sup_{s,t\le2n,\,|t-s|\le\delta n}|W_t-W_s|>\varepsilon\sqrt n\Big)<\varepsilon/2. \tag{15.19}$$
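The strong law of large numbers (Theorem A.38) says that $U_n/n\to\mathbb{E}\,T_1=1$, a.s., and in fact, by Proposition A.39, we even have
$$\frac{\max_{k\le n}|U_k-k|}{n}\to0, \qquad\text{a.s.} \tag{15.20}$$
On the event $(\max_{k\le n}|U_k-k|\le\delta n)$, for each $k\le n$ both $U_k$ and $k$ lie in $[0,2n]$ (recall $\delta\le1$) and differ by at most $\delta n$. Therefore, using (15.19),
$$\begin{aligned}\mathbb{P}\Big(\max_{k\le n}|W_{U_k}-W_k|>\varepsilon\sqrt n\Big)&\le\mathbb{P}\Big(\max_{k\le n}|U_k-k|>\delta n\Big)+\mathbb{P}\Big(\sup_{s,t\le2n,\,|t-s|\le\delta n}|W_t-W_s|>\varepsilon\sqrt n\Big)\\ &\le\mathbb{P}\Big(\max_{k\le n}\frac{|U_k-k|}{n}>\delta\Big)+\frac\varepsilon2.\end{aligned}$$
By (15.20) this will be less than $\varepsilon$ if we take $n$ sufficiently large.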
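The embedding of this section can be imitated in a simulation. This is a crude sketch under assumptions of our own: Brownian motion is replaced by a simple random walk with space step $1/m$ and time step $1/m^2$; the $Y_i$ take the values $\pm1$ with probability $1/2$ each; and $T_i$ is taken to be the time the path needs to move a distance 1 away from $W_{U_{i-1}}$, a standard way to embed $\pm1$ (for the discretized walk its mean is exactly 1). The function name `embed_walk` and all parameter values are hypothetical. The code reports $U_n/n$, $\max_{k\le n}|U_k-k|/n$, and $\max_{k\le n}|W_{U_k}-W_k|/\sqrt n$, the quantities controlled in the proof of Theorem 15.6.

```python
import math
import random

def embed_walk(n, m=10, seed=1):
    """Scaled-random-walk sketch of Section 15.3 for Y_i = +/-1 (illustrative only).

    Returns (U, WU, Wint) with U[k] = U_k, WU[k] = W_{U_k}, Wint[k] = W_k for k = 0..n.
    """
    rng = random.Random(seed)
    pos = steps = base = 0              # pos and base in units of 1/m; steps counts time
    U, WU, Wint = [0.0], [0.0], [0.0]
    while len(U) <= n or len(Wint) <= n:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
        if steps % (m * m) == 0:        # integer time k = steps / m**2
            Wint.append(pos / m)
        if len(U) <= n and abs(pos - base) == m:
            # the walk has moved distance 1 from W_{U_{i-1}}: this is U_i
            U.append(steps / (m * m))
            WU.append(pos / m)
            base = pos
    return U[: n + 1], WU[: n + 1], Wint[: n + 1]

n = 2000
U, WU, Wint = embed_walk(n)
print("U_n / n =", U[n] / n)
print("max_k |U_k - k| / n =", max(abs(U[k] - k) for k in range(n + 1)) / n)
print("max_k |W_{U_k} - W_k| / sqrt(n) =",
      max(abs(WU[k] - Wint[k]) for k in range(n + 1)) / math.sqrt(n))
```

By construction the increments $W_{U_k}-W_{U_{k-1}}$ are exactly $\pm1$, so $(W_{U_k})$ is a simple symmetric random walk; the last printed ratio decreases only slowly in $n$ (compare the rate in Exercise 15.7).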
Exercises
15.1 Without some supplemental conditions on $T$, the problem of Skorokhod embedding is trivial. Suppose $W$ is a Brownian motion with respect to a filtration $\{\mathcal{F}_t\}$ satisfying the usual conditions. Suppose $Y$ is a finite random variable and suppose $h$ is a real-valued function such that $h(W_1)$ has the same law as $Y$.
(1) Show that if $T=\inf\{t>1:W_t=h(W_1)\}$, then $W_T$ and $Y$ have the same law.
(2) Give an example of a mean zero random variable $Y$ with finite variance such that if $T$ is defined as in (1), then $\mathbb{E}\,T=\infty$.
15.2 Show that the triple integral on the right-hand side of (15.10) is equal to the expression in (15.11).
15.3 A sketch was given for the proof of Corollary 15.5. Provide a detailed proof.
15.4 Here is another approach to proving Corollary 15.5. Let $Y$, $N$, $T$, and $\{\mathcal{G}_t\}$ be as in the proof of Theorem 15.4.
(1) Show that there is a random variable $U$ that is measurable with respect to $\sigma(N_s:0\le s<\infty)$ such that $U=T$, a.s.
(2) Show there is a Borel measurable map $H:C[0,\infty)\to[0,\infty)$ such that $U=H(N)$.
(3) If $W$ is a Brownian motion, define $V=H(W)$. Show $V$ is a stopping time with respect to the minimal augmented filtration generated by $W$ such that $W_V$ has the same law as $Y$.
15.5 Suppose $p\in(0,1/2)$ and $Y$ is a random variable such that $\mathbb{P}(Y=1)=\mathbb{P}(Y=-1)=p$ and $\mathbb{P}(Y=0)=1-2p$. Let $W$ be a Brownian motion. Let $S_x=\inf\{t>0:W_t=x\}$ and let $T=\inf\{t>S_x\wedge S_{-x}:W_t\in\{-1,0,1\}\}$. Determine $x$ such that $W_T$ and $Y$ have the same law.
15.6 Suppose $Y$ is a mean zero random variable and there exists a real number $K>0$ such that $|Y|\le K$, a.s. Let $W$ be a Brownian motion and let $T$ be a stopping time with $\mathbb{E}\,T<\infty$ such that $W_T$ and $Y$ have the same law. (We do not necessarily assume that $T$ was constructed by the method of Section 15.2.) Let $S_K=\inf\{t:|W_t|\ge K\}$. Prove that $T\le S_K$, a.s.
15.7 Let $Y_i$ be a sequence of i.i.d. random variables with $\mathbb{P}(Y_i=1)=\mathbb{P}(Y_i=-1)=\frac12$, and let $S_n=\sum_{i=1}^nY_i$. $S_n$ is called a simple symmetric random walk. Let $T_1,T_2,\ldots$ and $U_1,U_2,\ldots$ be as in Section 15.3.
(1) Prove that $\mathbb{E}\,T_1^p<\infty$ for all $p\ge1$.
(2) Prove that if $\varepsilon>0$,
$$\lim_{n\to\infty}\frac{\sup_{k\le n}|U_k-k|}{n^{(1/2)+\varepsilon}}=0, \qquad\text{a.s.}$$
Hint: Use Doob's inequalities to estimate $\mathbb{P}(\sup_{k\le n}|U_k-k|\ge\delta n^{(1/2)+\varepsilon})$.
(3) Show that
$$\sup_{i\le n}|W_{U_i}-W_i|/n^{(1/4)+(\varepsilon/2)}$$
tends to zero in probability as $n\to\infty$.
15.8 Let $S_n$, $T_i$, and $U_i$ be as in Exercise 15.7. Prove that
$$\lim_{n\to\infty}\frac{\sup_{i\le n}|W_{U_i}-W_i|}{\sqrt n}=0, \qquad\text{a.s.}$$
15.9 Let $S_n$ be a simple symmetric random walk; see Exercise 15.7. Let $Y$ be a bounded symmetric random variable that takes values only in $\mathbb{Z}$. ($Y$ being symmetric means that $Y$ and $-Y$ have the same law.) Does there necessarily exist a stopping time $N$ such that $S_N$ and $Y$ have the same law? Why or why not?
Notes
The survey article Obłój (2004) summarizes many different methods of Skorokhod embedding. The embedding presented here is from Bass (1983); see also Stroock (2003), pp. 213–17.
16 The general theory of processes
The name “general theory of processes” refers to the foundations of stochastic processes. Specific topics include measurability issues and classifications of stopping times. This chapter is fairly technical and abstract and should only be skimmed on the first reading of this book: read the definitions and statements of theorems, propositions, and lemmas, but not the proofs. The two main results we discuss are the measurability of hitting times, and the Doob–Meyer decomposition of submartingales, Theorem 16.29.
16.1 Predictable and optional processes
Suppose $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space. The outer probability $\mathbb{P}^*$ associated with $\mathbb{P}$ is given by
$$\mathbb{P}^*(A)=\inf\{\mathbb{P}(B):A\subset B,\ B\in\mathcal{F}\}. \tag{16.1}$$
A set $A$ is a $\mathbb{P}$-null set if $\mathbb{P}^*(A)=0$. We suppose throughout this chapter that $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions; recall from Chapter 1 that this means that each $\mathcal{F}_t$ contains all the $\mathbb{P}$-null sets and that $\cap_{\varepsilon>0}\mathcal{F}_{t+\varepsilon}=\mathcal{F}_t$ for each $t$. Let $\pi:[0,\infty)\times\Omega\to\Omega$ be defined by
$$\pi(t,\omega)=\omega. \tag{16.2}$$
We define the predictable σ-field $\mathcal{P}$ to be the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all bounded left-continuous processes adapted to $\{\mathcal{F}_t\}$. That is, $\mathcal{P}$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all sets of the form $\{(t,\omega)\in[0,\infty)\times\Omega:X_t(\omega)>a\}$, where $a\in\mathbb{R}$ and $X$ is a bounded, adapted, left-continuous process. The optional σ-field $\mathcal{O}$ is the σ-field on $[0,\infty)\times\Omega$ generated by the collection of all bounded right-continuous processes adapted to $\{\mathcal{F}_t\}$. The word for predictable in French is “prévisible.” The older literature uses “well measurable” in place of the word “optional.”
If $S$ and $T$ are random variables taking values in $[0,\infty]$, let
$$[S,T)=\{(t,\omega)\in[0,\infty)\times\Omega:S(\omega)\le t<T(\omega)\},$$
and define $(S,T]$, $(S,T)$, etc. similarly. With this notation, $[T,T]$, the graph of $T$, is equal to $\{(t,\omega)\in[0,\infty)\times\Omega:T(\omega)=t<\infty\}$. Note that $[T,T]$ is a subset of $[0,\infty)\times\Omega$, so $\pi([T,T])=(T<\infty)$.
Recall that a stopping time can take the value $\infty$. A stopping time $T$ is predictable if there exists a sequence of stopping times $T_n$ such that for all $\omega$
(1) $T_1(\omega)\le T_2(\omega)\le\cdots$,
(2) $\lim_{n\to\infty}T_n(\omega)=T(\omega)$, and
(3) if $T(\omega)>0$, then $T_n(\omega)<T(\omega)$ for each $n$.
In this case, the stopping times $T_n$ predict $T$ or announce $T$. If $T$ is a stopping time satisfying (1)–(3) above and $S=T$, a.s., then we call $S$ a predictable stopping time as well. A stopping time $T$ is totally inaccessible if $\mathbb{P}(T=S<\infty)=0$ for every predictable stopping time $S$.
For an example of a predictable stopping time, let $W_t$ be a Brownian motion started at 0 and let $T=\inf\{t>0:W_t=1\}$. The stopping time $T$ is predicted by the stopping times $T_n=\inf\{t>0:W_t=1-(1/n)\}$.
For an example of a totally inaccessible stopping time, let $P_t$ be a Poisson process with parameter 1 and let $T=\inf\{t:P_t=1\}$, the first time the Poisson process jumps. Since $P_t$ has independent increments, $P_t-t$ is a martingale, just as in Example 3.2. By (A.8), $\mathbb{E}[(P_t-t)^2]<\infty$. If $S$ is a bounded predictable stopping time, by the optional stopping theorem, $\mathbb{E}\,P_S=\mathbb{E}\,S$. If $S_n$ are stopping times predicting $S$, then by monotone convergence
$$\mathbb{E}\,P_{S-}=\lim_{n\to\infty}\mathbb{E}\,P_{S_n}=\lim_{n\to\infty}\mathbb{E}\,S_n=\mathbb{E}\,S.$$
Therefore $\mathbb{E}[P_S-P_{S-}]=0$, and since $P_t$ is an increasing process, this says that $P$ does not jump at time $S$, a.s. Applying this to $S\wedge M$ and letting $M\to\infty$, we see that $P$ does not jump at any predictable time $S$, whether or not $S$ is bounded. Since $P$ does jump at time $T$, it follows that $\mathbb{P}(T=S<\infty)=0$, so $T$ is totally inaccessible.
The proof of the following proposition is reminiscent of that of the Vitali covering theorem from measure theory.

Proposition 16.1 Let $T$ be a stopping time. There exist predictable stopping times $S_1,S_2,\ldots$ and a totally inaccessible stopping time $U$ such that $[T,T]=[U,U]\cup(\cup_{i=1}^\infty[S_i,S_i])$.

Proof
Let
$$a_1=\sup\{\mathbb{P}(S=T<\infty):S\text{ is a predictable stopping time}\}$$
and choose $S_1$ to be a predictable stopping time such that $\mathbb{P}(S_1=T<\infty)\ge\frac12a_1$. Given $S_1,\ldots,S_n$, let
$$a_{n+1}=\sup\{\mathbb{P}(S=T<\infty,\ S\ne S_1,\ldots,S\ne S_n):S\text{ is a predictable stopping time}\}$$
and choose $S_{n+1}$ such that $\mathbb{P}(S_{n+1}=T<\infty,\ S_{n+1}\ne S_1,\ldots,S_{n+1}\ne S_n)\ge\frac12a_{n+1}$. If this procedure stops after $n$ steps (because $a_{n+1}=0$), set $U(\omega)$ equal to $T(\omega)$ if $T(\omega)$ is not equal to any of $S_1(\omega),\ldots,S_n(\omega)$ and equal to infinity otherwise. It is easy to check that $U$ is a stopping time that is totally inaccessible. The other alternative is that this procedure continues indefinitely. In this case define
$$U(\omega)=\begin{cases}T(\omega), & T(\omega)\ne S_1(\omega),S_2(\omega),\ldots,\\ \infty, & \text{otherwise.}\end{cases}$$
There is no problem checking that $U$ is a stopping time, but we need to show that $U$ is totally inaccessible. The events $(S_n=T<\infty,\ S_n\ne S_1,\ldots,S_n\ne S_{n-1})$ are pairwise disjoint, so $\sum_n\frac12a_n\le1$ and hence $a_n\to0$. If there exists a predictable stopping time $S$ such that $b=\mathbb{P}(S=U<\infty)>0$, then $b>2a_n$ for some $n$, and in our construction we would have then chosen $S$ in place of the $S_n$ we did choose. Therefore such a stopping time $S$ cannot exist.

Proposition 16.2 (1) The optional σ-field $\mathcal{O}$ is generated by the collection of sets $\{[S,T):S,T\text{ stopping times}\}$.
(2) $\mathcal{O}$ is generated by the collection of sets of the form $[a,b)\times C$, where $a<b$ and $C\in\mathcal{F}_a$.
(3) The predictable σ-field $\mathcal{P}$ is generated by the collection of sets $\{(S,T]:S,T\text{ stopping times}\}$.
(4) $\mathcal{P}$ is generated by the collection of sets $\{[S,T):S,T\text{ predictable stopping times}\}$.
(5) $\mathcal{P}$ is generated by the collection of sets of the form $[b,c)\times C$, where $a<b<c$ and $C\in\mathcal{F}_a$.

Proof (1) Since $1_{[S,T)}$ is a bounded right-continuous process that is adapted to $\{\mathcal{F}_t\}$, sets of the form $[S,T)$ are optional. Now suppose $X$ is a bounded adapted process with right-continuous paths. Let $\varepsilon>0$, let $U_0=0$, a.s., and let
$$U_{i+1}=\inf\{t>U_i:|X_t-X_{U_i}|>\varepsilon\}, \qquad i\ge0. \tag{16.3}$$
Since $X$ has right-continuous paths,
$$(U_1<t)=\bigcup_{q\in\mathbb{Q}^+,\,q<t}(|X_q-X_0|>\varepsilon),$$
where $\mathbb{Q}^+$ denotes the positive rationals, and it follows that $U_1$ is a stopping time. Similarly $U_i$ is a stopping time for each $i$; Exercise 16.4 asks you to prove this. If we set
$$X_t^\varepsilon(\omega)=\sum_{i=0}^\infty X_{U_i}(\omega)1_{[U_i(\omega),U_{i+1}(\omega))}(t),$$
then $\sup_t|X_t-X_t^\varepsilon|\le\varepsilon$. Therefore it suffices to show that each process $X^\varepsilon$ is measurable with respect to the σ-field $\widetilde{\mathcal{O}}$ generated by the collection of sets of the form $[S,T)$. To do that, it suffices to show that processes of the form $Y_t(\omega)=1_A(\omega)1_{[U_i(\omega),U_{i+1}(\omega))}(t)$, where $A\in\mathcal{F}_{U_i}$, are measurable with respect to $\widetilde{\mathcal{O}}$. If we set $S(\omega)$ equal to $U_i(\omega)$ if $\omega\in A$ and equal to $\infty$ otherwise, and we set $T(\omega)$ equal to $U_{i+1}(\omega)$ if $\omega\in A$ and $\infty$ otherwise, then $Y_t(\omega)=1_{[S(\omega),T(\omega))}(t)$.
(2) If $C\in\mathcal{F}_a$, then $1_C(\omega)1_{[a,b)}(t)$ is a bounded right-continuous adapted process, so it is optional. By (1), every bounded right-continuous adapted process can be approximated by linear combinations of processes of the form $1_{[S,T)}$. Now $1_{[S,T)}=1_{[S,\infty)}-1_{[T,\infty)}$, and $1_{[S,\infty)}$
is the limit of $1_{[S_n,\infty)}$, where $S_n=k/2^n$ if $(k-1)/2^n\le S<k/2^n$, and we can similarly approximate $1_{[T,\infty)}$. Note
$$1_{[S_n(\omega),\infty)}(t)=\sum_{k=1}^\infty 1_{((k-1)/2^n\le S(\omega)<k/2^n)}\,1_{[k/2^n,\infty)}(t).$$
Since $((k-1)/2^n\le S(\omega)<k/2^n)\in\mathcal{F}_{k/2^n}$ and $1_{[k/2^n,\infty)}(t)$ is the limit of $1_{[k/2^n,m)}(t)$ as $m\to\infty$, we see that every bounded right-continuous adapted process is measurable with respect to the σ-field generated by processes of the form $1_A(\omega)1_{[a,b)}(t)$, where $A$ is $\mathcal{F}_a$ measurable.
For (3), $1_{(S,T]}$ is left continuous, bounded, and adapted, hence predictable. Any left-continuous adapted bounded process can be approximated by processes of the form
$$\sum_{k=0}^{n2^n-1}X_{k/2^n}(\omega)1_{(k/2^n,(k+1)/2^n]}(t),$$
which in turn can be approximated by linear combinations of processes of the form $Y=1_A(\omega)1_{(a,b]}(t)$, where $A$ is $\mathcal{F}_a$ measurable. Such a process $Y$ is of the form $1_{(S,T]}$ if we define $S$ and $T$ by
$$S(\omega)=\begin{cases}a,&\omega\in A,\\ \infty,&\omega\notin A,\end{cases}\qquad T(\omega)=\begin{cases}b,&\omega\in A,\\ \infty,&\omega\notin A.\end{cases}$$
To prove (4), note that $S+\frac1k$ is always a predictable stopping time (predicted by the stopping times $S_n=S+\frac1k-\frac1n$ for $n>k$). We have
$$(S,T]=\bigcup_k\Big\{\bigcap_m\big[S+\tfrac1k,\,T+\tfrac1m\big)\Big\}.$$
On the other hand, if $S$ and $T$ are predictable and are predicted by sequences $S_n$ and $T_m$, respectively, then
$$[S,T)=\bigcap_n\Big\{\bigcup_m(S_n,T_m]\Big\}.$$
(4) now follows by using (3).
(5) As long as $a+(1/n)<b$, the processes $1_C(\omega)1_{(b-(1/n),c-(1/n)]}(t)$ are left continuous, bounded, and adapted, hence predictable. The process $1_C(\omega)1_{[b,c)}(t)$ is the limit of these processes as $n\to\infty$, so is predictable. On the other hand, if $X_t$ is a bounded adapted left-continuous process, it can be approximated by
$$\sum_{k=1}^{n2^n-1}X_{(k-1)/2^n}(\omega)1_{(k/2^n,(k+1)/2^n]}(t).$$
Each summand can be approximated by linear combinations of processes of the form $1_C(\omega)1_{(b,c]}(t)$, where $C\in\mathcal{F}_a$ and $a<b<c$. Finally, $1_C(\omega)1_{(b,c]}(t)$ is the limit of $1_C(\omega)1_{[b+(1/n),c+(1/n))}(t)$ as $n\to\infty$. A consequence of Proposition 16.2(1) and (4) is that $\mathcal{P}\subset\mathcal{O}$.
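The two approximation devices in the proof of Proposition 16.2, the times $U_i$ of (16.3) and the dyadic times $S_n$, can be tried on a concrete path. The Python sketch below makes simplifying assumptions of our own: the path is a step function, constant on the cells $[t_k,t_{k+1})$ of a grid (so it is right continuous and the infimum in (16.3) is attained at a grid point), the random path and the value of $\varepsilon$ are arbitrary, and the helper names are hypothetical. It checks that $\sup_t|X_t-X_t^\varepsilon|\le\varepsilon$ and that $S<S_n\le S+2^{-n}$.

```python
import math
import random

def skeleton_times(xs, eps):
    """Grid indices of U_0 < U_1 < ... from (16.3) for a step path.

    xs[k] is the value of the path on [t_k, t_{k+1}); U_{i+1} is the first
    index after U_i at which the path is more than eps away from X_{U_i}.
    """
    us = [0]
    for k in range(1, len(xs)):
        if abs(xs[k] - xs[us[-1]]) > eps:
            us.append(k)
    return us

def skeleton_path(xs, us):
    """X^eps = sum_i X_{U_i} 1_{[U_i, U_{i+1})}, evaluated at every grid index."""
    out = []
    for i, u in enumerate(us):
        end = us[i + 1] if i + 1 < len(us) else len(xs)
        out.extend([xs[u]] * (end - u))
    return out

def dyadic_upper(s, n):
    """S_n = k / 2**n when (k - 1) / 2**n <= s < k / 2**n."""
    return (math.floor(s * 2**n) + 1) / 2**n

rng = random.Random(3)
xs = [0.0]
for _ in range(5000):                      # a random step path, for illustration
    xs.append(xs[-1] + rng.gauss(0.0, 0.05))

eps = 0.2
us = skeleton_times(xs, eps)
approx = skeleton_path(xs, us)
print("number of U_i:", len(us))
print("sup |X - X^eps| =", max(abs(x - y) for x, y in zip(xs, approx)))

s = 0.7312
for n in (1, 4, 10):
    print(n, dyadic_upper(s, n))           # strictly above s, within 2**-n of it
```

The event $(S_n=k/2^n)=((k-1)/2^n\le S<k/2^n)$ is decided by time $k/2^n$, which is why the dyadic approximation is taken from above.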
16.2 Hitting times
Let $\mathcal{S}$ be a separable metric space. Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions and $X$ is a stochastic process taking values in $\mathcal{S}$ whose paths are right continuous and such that the jump times are totally inaccessible. Saying the jump times are totally inaccessible means that if $T$ is a predictable stopping time, then $X_{T-}=X_T$, a.s., where $X_{T-}=\lim_{s\uparrow T}X_s$. For a Borel subset $B$ of $\mathcal{S}$, let
$$U_B=\inf\{t\ge0:X_t\in B\}, \qquad T_B=\inf\{t>0:X_t\in B\}.$$
$T_B$ is known as the first hitting time of $B$ and $U_B$ as the first entry time of $B$.

Proposition 16.3 (1) If $A$ is an open set, then $T_A$ and $U_A$ are stopping times. (2) If $A$ is a compact set, then $T_A$ and $U_A$ are stopping times.

Proof
(1) Since the paths of $X_t$ are right continuous and $A$ is open, for each $t$,
$$(T_A<t)=\bigcup_{q\in\mathbb{Q}^+,\,q<t}(X_q\in A)\in\mathcal{F}_t,$$
where $\mathbb{Q}^+$ denotes the non-negative rationals. Thus $T_A$ is a stopping time. Since
$$(U_A<t)=(T_A<t)\cup(X_0\in A)\in\mathcal{F}_t, \tag{16.4}$$
then $U_A$ is also a stopping time.
(2) Now suppose $A$ is compact and let $A_n=\{x\in\mathcal{S}:d(x,A)<1/n\}$. Each set $A_n$ is open, hence $T_{A_n}$ is a stopping time for each $n$. The $T_{A_n}$ increase; let $T$ be the limit. If we show $T=T_A$, a.s., this will prove $T_A$ is a stopping time. Since $A\subset A_n$, then $T_{A_n}\le T_A$ for each $n$. Therefore $T\le T_A$. On the other hand, if $n>m$, then $X_{T_{A_n}}\in\overline{A_n}\subset\overline{A_m}$, where $\overline{A_m}$ is the closure of $A_m$. Either $T_{A_n}(\omega)=T(\omega)$ for all $n$ sufficiently large, in which case $X_{T(\omega)}(\omega)\in\overline{A_m}$, or else $T_{A_n}(\omega)<T(\omega)$ for all $n$. In the latter case, $X_{T(\omega)}(\omega)=\lim_{n\to\infty}X_{T_{A_n}(\omega)}(\omega)\in\overline{A_m}$ except for $\omega$'s in a null set, since the jump times of $X$ are totally inaccessible. In either case, $X_T\in\overline{A_m}$. This is true for all $m$, so $X_T\in\cap_m\overline{A_m}=A$, and therefore $T_A\le T$. We conclude $T_A$ is a stopping time. To prove $U_A$ is a stopping time, we argue using (16.4) as above.
For the proof of the following, which uses Choquet's capacity theorem, we refer the reader to Blumenthal and Getoor (1968), Section I.10. Fix $t$ and define
$$R_t(A)=\{\omega:X_s(\omega)\in A\text{ for some }s\in[0,t]\}=(U_A\le t). \tag{16.5}$$
Theorem 16.4 If $A$ is a Borel subset of $\mathcal{S}$, then $R_t(A)\in\mathcal{F}_t$ and there exists an increasing sequence of compact sets $K_n$ contained in $A$ such that $\mathbb{P}(R_t(K_n))\uparrow\mathbb{P}(R_t(A))$.
Since $(U_A\le t)=R_t(A)$, we have the following as an immediate corollary.
Theorem 16.5 For all Borel sets $A$, $U_A$ is a stopping time.
Here is the main theorem of this section.

Theorem 16.6 Suppose $\{\mathcal{F}_t\}$ is a filtration satisfying the usual conditions and $X$ is a right continuous process whose jump times are totally inaccessible. If $B$ is a Borel subset of $\mathcal{S}$, then $T_B$ is a stopping time.

Proof If we let $Y_t^\delta=X_{t+\delta}$ and $U_B^\delta=\inf\{t\ge0:Y_t^\delta\in B\}$, then by the above, $U_B^\delta$ is a stopping time with respect to the filtration $\{\mathcal{F}_t^\delta\}$, where $\mathcal{F}_t^\delta=\mathcal{F}_{t+\delta}$. It follows that $\delta+U_B^\delta$ is a stopping time with respect to the filtration $\{\mathcal{F}_t\}$. Since $(1/m)+U_B^{1/m}\downarrow T_B$, then $T_B$ is a stopping time with respect to $\{\mathcal{F}_t\}$.
We now show that the hitting times of Borel sets can be approximated by the hitting times of compact sets.

Proposition 16.7 There exists an increasing sequence of compact sets $K_n$ contained in $B$ such that $U_{K_n}\downarrow U_B$ on $(U_B<\infty)$, $\mathbb{P}$-a.s.

Proof For each $t$ we can find an increasing sequence of compact sets $L_n^t$ contained in $B$ with $\mathbb{P}(R_t(L_n^t))\uparrow\mathbb{P}(R_t(B))$. Let $q_j$ be an enumeration of the non-negative rationals. Let $K_n=L_n^{q_1}\cup\cdots\cup L_n^{q_n}$. Then the $K_n$ are compact, form an increasing sequence, and are all contained in $B$. Thus $U_{K_n}$ decreases, say to $S$, and since $U_{K_n}\ge U_B$ for all $n$, then $S\ge U_B$. If we prove $S\le U_B$, $\mathbb{P}$-a.s., then $S=U_B$, and we have our result. If $U_B<S$, there exists a rational $q_j$ with $U_B<q_j<S$. Hence it suffices to prove $\mathbb{P}(U_B<q_j<S)=0$ for all $j$. If $U_B<q_j$, then $\omega\in R_{q_j}(B)$. Since $R_{q_j}(L_n^{q_j})\uparrow R_{q_j}(B)$, a.s., then except for a null set, $\omega$ will be in $R_{q_j}(L_n^{q_j})$ for all $n$ large enough, hence in $R_{q_j}(K_n)$ if also $n\ge j$. Then $U_{K_n}(\omega)\le q_j$ for such $n$, so $S\le q_j$. Therefore $\mathbb{P}(U_B<q_j<S)=0$.

Theorem 16.8 There exists an increasing sequence of compact sets $K_n$ contained in $B$ such that $T_{K_n}\downarrow T_B$.

Proof Let $Y_t^\delta=X_{t+\delta}$ and $U_B^\delta=\inf\{t\ge0:Y_t^\delta\in B\}$. Applying the above proposition to $Y_t^{1/m}$, for each $m$ there exist compact sets $L_n^m$, increasing in $n$ and contained in $B$, such that $U_{L_n^m}^{1/m}\downarrow U_B^{1/m}$. Let $K_n=L_n^1\cup\cdots\cup L_n^n$. Then $K_n$ is an increasing sequence of compact sets contained in $B$, and $U_{K_n}^{1/m}\downarrow U_B^{1/m}$ for each $m$. Also, for each $n$, $1/m+U_{K_n}^{1/m}\downarrow T_{K_n}$ and $1/m+U_B^{1/m}\downarrow T_B$. We write
$$\begin{aligned}T_B&=\lim_m(1/m+U_B^{1/m})=\lim_m\lim_n(1/m+U_{K_n}^{1/m})\\ &=\lim_n\lim_m(1/m+U_{K_n}^{1/m})=\lim_nT_{K_n}.\end{aligned}$$
Since $1/m+U_{K_n}^{1/m}$ is decreasing in both $m$ and $n$, the change in the order of taking limits is justified. Since $T_{K_n}$ is decreasing, this completes the proof.
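Part (2) of Proposition 16.3, and the predictable stopping time of Section 16.1, can be seen on a single simulated path. This is a sketch under assumptions of our own: $\mathcal{S}=\mathbb{R}$, $A=\{1\}$, $A_n=\{x:|x-1|<1/n\}$, and Brownian motion is approximated by a Gaussian random walk on a grid with linear interpolation between grid points, so the path is continuous. For a continuous path started at 0, $T_{A_n}$ is the first passage time to $1-1/n$; the code checks that these times increase strictly and stay below $T_A$, the first passage time to 1. The function names and parameters are hypothetical, and paths that do not reach 1 within the time horizon are simply discarded and redrawn.

```python
import math
import random

def gaussian_walk(t_max, n_steps, seed):
    """Random-walk approximation of Brownian motion on a uniform grid of mesh t_max / n_steps."""
    rng = random.Random(seed)
    h = t_max / n_steps
    xs = [0.0]
    for _ in range(n_steps):
        xs.append(xs[-1] + rng.gauss(0.0, math.sqrt(h)))
    return h, xs

def first_passage(h, xs, level):
    """First time the linearly interpolated path reaches level (> xs[0]), or None."""
    for k in range(1, len(xs)):
        if xs[k] >= level:
            # on [t_{k-1}, t_k] the path goes from xs[k-1] < level to xs[k] >= level
            frac = (level - xs[k - 1]) / (xs[k] - xs[k - 1])
            return (k - 1 + frac) * h
    return None

seed = 0
while True:                                   # redraw until the path reaches 1
    h, xs = gaussian_walk(20.0, 20_000, seed)
    T = first_passage(h, xs, 1.0)             # T_A for A = {1}
    if T is not None:
        break
    seed += 1

Tn = [first_passage(h, xs, 1.0 - 1.0 / n) for n in range(2, 51)]  # T_{A_n}, n = 2..50
print("T_A =", T)
print("T_{A_2}, ..., T_{A_6} =", Tn[:5])
print("T_A - T_{A_50} =", T - Tn[-1])
```

Because the simulated path is continuous, the jump-time hypothesis of Proposition 16.3 holds trivially here; the sequence $T_{A_n}$ is exactly the announcing sequence $\inf\{t>0:W_t=1-(1/n)\}$ used for the predictable stopping time in Section 16.1.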
16.3 The debut and section theorems
If $E\subset[0,\infty)\times\Omega$, let
$$D_E=\inf\{t\ge0:(t,\omega)\in E\},$$
the debut of $E$. An important generalization of Theorem 16.6 is the following, known as the debut theorem.

Theorem 16.9 If $E\in\mathcal{O}$, then $D_E$ is a stopping time.

The proof of this theorem is beyond the scope of this book, and we refer the reader to Dellacherie and Meyer (1978) for a proof. Using Theorem 16.9, we can weaken the assumptions on $X$ in Theorem 16.6.

Theorem 16.10 If $X$ is an optional process taking values in $\mathcal{S}$ and $B$ is a Borel subset of $\mathcal{S}$, then $U_B$ and $T_B$ are stopping times.

Proof Since $B$ is a Borel subset of $\mathcal{S}$ and $X$ is an optional process, then $1_B(X_t)$ is also an optional process. $U_B$ is then the debut of the set $E=\{(s,\omega):1_B(X_s(\omega))=1\}$, and therefore is a stopping time. To prove that $T_B$ is a stopping time, we argue exactly as in the proof of Theorem 16.6.

Remark 16.11 In the theory of Markov processes, the notion of completion of a σ-field is a bit different. However it is still the case that the hitting times of Borel sets by right-continuous processes are stopping times. See Remark 20.4.

The optional section theorem is the following.

Theorem 16.12 If $E$ is an optional set and $\varepsilon>0$, there exists a stopping time $T$ such that $[T,T]\subset E$ and $\mathbb{P}(\pi(E))\le\mathbb{P}(T<\infty)+\varepsilon$.

The statement of the predictable section theorem is very similar.

Theorem 16.13 If $E$ is a predictable set and $\varepsilon>0$, there exists a predictable stopping time $T$ such that $[T,T]\subset E$ and $\mathbb{P}(\pi(E))\le\mathbb{P}(T<\infty)+\varepsilon$.

Again we refer to Dellacherie and Meyer (1978) for proofs. We note that Proposition 16.7 is a precursor of the optional section theorem. To see this, let $A$ be a Borel set and let $E=\{(t,\omega):X_t\in A\}$. Then $D_E=U_A$. If the process is right continuous, then $X_{U_{K_n}}\in K_n\subset A$, where the $K_n$ are as in Proposition 16.7, and the graphs of the $U_{K_n}$ are contained in $E$.
Here is a corollary of Theorems 16.12 and 16.13.
Corollary 16.14 (1) If $X$ and $Y$ are optional processes such that $\mathbb{P}(X_T=Y_T)=1$ for every finite stopping time $T$, then $X$ and $Y$ are indistinguishable: $\mathbb{P}(X_t=Y_t\text{ for all }t)=1$. (2) If $X$ and $Y$ are predictable processes with $\mathbb{P}(X_T=Y_T)=1$ for every finite predictable stopping time $T$, then $X$ and $Y$ are indistinguishable.

Proof We prove (1), the proof of (2) being similar. Let $F=\{(t,\omega):X_t(\omega)\ne Y_t(\omega)\}$. Then $F$ is an optional set, and if $\mathbb{P}(\pi(F))>0$, there exists a stopping time $U$ with $[U,U]\subset F$
and $\mathbb{P}(U<\infty)>0$. Choose $N$ so large that $\mathbb{P}(U<N)>0$ and let $T=U\wedge N$. Then $T$ is a finite stopping time and $X_T\ne Y_T$ on $(U<N)$, so $\mathbb{P}(X_T=Y_T)<1$, a contradiction.
Another application of the section theorems is the following.

Proposition 16.15 Suppose $[T,T]$ is a predictable set. Then $T$ is a predictable stopping time.

Proof Since $T$ is the debut of $[T,T]$, then $T$ is a stopping time. By the predictable section theorem, Theorem 16.13, for each $n$ there exists a predictable stopping time $S_n$ such that $[S_n,S_n]\subset[T,T]$ and
P(π ([Sn , Sn ])) ≥ P(π ([T, T ])) − 2−n . Saying [Sn , Sn ] ⊂ [T, T ] implies that for each ω, either Sn (ω) = T (ω) or else Sn (ω) = ∞. The set of ω’s for which T (ω) < ∞ but Sn (ω) = ∞ has probability at most 2−n . Let Qn = S1 ∧ · · · ∧ Sn . Then the Qn ’s are predictable stopping times by Exercise 16.1, they decrease, [Qn , Qn ] ⊂ [T, T ], and P(π ([Qn , Qn ])) ≥ P(π ([T, T ]))−2−n+1 . Let Q = limn Qn . If Q(ω) < ∞, then Qn (ω) < ∞ for all n sufficiently large (how large depends on ω); since Qn (ω) is either equal to T (ω) or to ∞, Qn (ω) = Q(ω) for all n sufficiently large, and hence Q(ω) = T (ω). If T (ω) < ∞, then except for a set of ω’s of probability zero, Qn (ω) = T (ω) for n sufficiently large. Therefore Q = T , a.s. Choose Rnm predicting Qn as m → ∞. Choose mn large enough such that
P(Rnmn + 2−n < Qn < ∞) < 2−n
and
P(Rnmn < n, Qn = ∞) < 2−n .
Let $U_n = n \wedge R_{n m_n} \wedge R_{n+1, m_{n+1}} \wedge \cdots$. Fix $n$ for the moment. If $0 < Q(\omega) < \infty$, then $R_{j m_j}(\omega) < Q_j(\omega) = Q(\omega)$ for all $j$ sufficiently large. Choosing $j > n$ sufficiently large, $U_n(\omega) \le R_{j m_j}(\omega) < Q(\omega)$. The $U_n$ increase; we show that their limit is $Q$. By the Borel–Cantelli lemma, if $Q(\omega) < \infty$, then $R_{n m_n}(\omega) \ge Q_n(\omega) - 2^{-n} = Q(\omega) - 2^{-n}$ for all $n$ sufficiently large, except for a set of $\omega$'s of probability zero. Therefore $U_n(\omega) \ge Q(\omega) - 2^{-n+1}$ for $n$ sufficiently large, and we conclude that $U_n(\omega) \uparrow Q(\omega)$, except for a set of $\omega$'s of probability zero. If $Q(\omega) = \infty$, then $Q_n(\omega) = \infty$ for all $n$. By the Borel–Cantelli lemma, except for a set of probability zero, $R_{n m_n} \ge n$ for $n$ sufficiently large. Hence $U_n(\omega) = n$ for $n$ sufficiently large, so $U_n(\omega) < Q(\omega)$ and $U_n(\omega) \uparrow Q(\omega)$. Thus $Q$ is predictable and $T = Q$, a.s. (We leave consideration of those $\omega$ for which $Q(\omega) = 0$ to the reader.)

Proposition 16.16 Let $X_t$ be a predictable process with paths that are right continuous with left limits. If $a \in \mathbb R$ and $T = \inf\{t > 0 : X_t \ge a\}$, then $T$ is a predictable stopping time.

Proof The set $A = \{(t,\omega) : X_t(\omega) \ge a\}$ is a predictable set. Since $X_t$ is right continuous, $[T,\infty) = A \cup (T,\infty) \in \mathcal P$ by Proposition 16.2, and so $[T,T] = [T,\infty) \setminus (T,\infty) \in \mathcal P$. Now apply Proposition 16.15.
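For a single path sampled on a finite grid, the infimum defining the debut becomes a first-index search. The Python sketch below illustrates only that reduction, under the assumption of a finite grid (the names `debut` and `first_entry_time` are hypothetical, not from the text): for a fixed $\omega$ it returns the first grid time at which $(t,\omega)$ lies in $E$, with $\inf\emptyset = \infty$, and specializes to $U_A$, the debut of $E = \{(t,\omega) : X_t \in A\}$. It says nothing about measurability, which is the real content of Theorem 16.9.

```python
import math
from typing import Callable, Sequence


def debut(times: Sequence[float], in_E: Callable[[int], bool]) -> float:
    """Debut of E along one fixed path sampled at the grid `times`.

    in_E(k) reports whether (times[k], omega) belongs to E.  On a finite grid
    the infimum in D_E = inf{t >= 0 : (t, omega) in E} is attained, so it is
    the first grid time at which in_E is true; inf of the empty set is +inf.
    """
    for k, t in enumerate(times):
        if in_E(k):
            return t
    return math.inf


def first_entry_time(times: Sequence[float], path: Sequence[float],
                     in_A: Callable[[float], bool]) -> float:
    """U_A for a sampled path: the debut of E = {(t, omega) : X_t in A}."""
    return debut(times, lambda k: in_A(path[k]))


# Example: first grid time at which the sampled path enters A = [1, infinity).
times = [0.0, 0.5, 1.0, 1.5]
path = [0.0, 0.4, 1.2, 0.3]
print(first_entry_time(times, path, lambda x: x >= 1.0))  # prints 1.0
```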
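16.4 Projection theorems

Let $\mathcal B[0,\infty)$ be the Borel σ-field on $[0,\infty)$, let $\mathcal F_\infty = \bigvee_{t\ge0}\mathcal F_t$, and let $\mathcal H$ be the product σ-field
$$\mathcal H = \mathcal B[0,\infty) \times \mathcal F_\infty. \tag{16.6}$$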
The following is the optional projection theorem.

Theorem 16.17 Let $X$ be a bounded process that is $\mathcal H$ measurable. There exists a unique optional process ${}^oX$ such that
$${}^oX_T 1_{(T<\infty)} = E[X_T 1_{(T<\infty)} \mid \mathcal F_T] \tag{16.7}$$
for all stopping times $T$, including those taking infinite values. If $X \ge 0$, then ${}^oX \ge 0$.

${}^oX$ is called the optional projection of $X$. If $X$ is already optional, then by the uniqueness result, Corollary 16.14, ${}^oX = X$. If we take our stopping time $T$ in (16.7) equal to a fixed time $t$, we have
$${}^oX_t = E[X_t \mid \mathcal F_t], \quad \text{a.s.} \tag{16.8}$$
This observation is sometimes useful when X is not an adapted process and one wants a version of E [Xt | Ft ] that is jointly measurable in t and ω. If (16.7) holds, then taking expectations shows that
$$E[{}^oX_T; T<\infty] = E[X_T; T<\infty] \tag{16.9}$$
for all stopping times $T$. Conversely, suppose (16.9) holds for all stopping times $T$. If $S$ is a stopping time and $A \in \mathcal F_S$, let $S_A$ be defined by
$$S_A(\omega) = \begin{cases} S(\omega), & \omega \in A,\\ \infty, & \omega \notin A. \end{cases} \tag{16.10}$$
Then (16.9) with $T$ replaced by $S_A$ implies that
$$E[{}^oX_S 1_{(S<\infty)}; A] = E[X_S 1_{(S<\infty)}; A].$$
Since ${}^oX_S 1_{(S<\infty)}$ is $\mathcal F_S$ measurable, this implies (16.7) holds for the stopping time $S$. Consequently, (16.7) holding for all stopping times $T$ is equivalent to (16.9) holding for all stopping times $T$.

Proof of Theorem 16.17 The uniqueness is immediate from Corollary 16.14. We look at existence. If $X_t(\omega) = 1_F(\omega)1_{[a,b)}(t)$ where $F \in \mathcal F_\infty$, we set ${}^oX_t$ equal to $E[1_F \mid \mathcal F_t]1_{[a,b)}(t)$, where we use Corollary 3.13 to take the right-continuous version of the martingale $E[1_F \mid \mathcal F_t]$. We check:
$$E[{}^oX_T; T<\infty] = E\big[E[1_F \mid \mathcal F_T]\,1_{[a,b)}(T); T<\infty\big] = E[1_F 1_{[a,b)}(T); T<\infty] = E[X_T; T<\infty]$$
since $(T<\infty)$ and $1_{[a,b)}(T)$ are both $\mathcal F_T$ measurable. We then use linearity and limits to define ${}^oX$ for bounded measurable $X$. The positivity of ${}^oX$ when $X \ge 0$ is clear from the construction.

Almost the same proof gives the following.

Theorem 16.18 Let $X$ be a bounded measurable process. There exists a unique predictable process ${}^pX$, called the predictable projection of $X$, such that
$$E[{}^pX_T; T<\infty] = E[X_T; T<\infty]$$
for every predictable stopping time $T$. If $X \ge 0$, then ${}^pX \ge 0$.

Proof Uniqueness is as before. If $X_t = 1_F(\omega)1_{(a,b]}(t)$, we let ${}^pX_t = 1_{(a,b]}(t)Z_{t-}(\omega)$, where $Z_{t-}$ denotes the left-hand limit of $Z$ at time $t$ and $Z_t$ is the right-continuous version of the martingale $E[1_F \mid \mathcal F_t]$. We use linearity and limits to define ${}^pX$ for bounded measurable $X$. The positivity of ${}^pX$ when $X \ge 0$ is clear.
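A finite, discrete-time toy model makes (16.8), (16.9) and Theorem 16.18 concrete. The sketch below assumes $N$ fair $\pm1$ coin flips $\xi_k$ with partial sums $S_k$ and the filtration generated by the first $k$ flips; these choices and all function names are illustrative, not from the text. The process $X_k = 1(S_N > 0)$ is not adapted, and its optional projection ${}^oX_k = P(S_N > 0 \mid \mathcal F_k)$ depends only on $S_k$, so it can be computed from a binomial count. In discrete time a predictable stopping time is one announced a step ahead, such as $S+1$ for a stopping time $S$, and the predictable projection is ${}^pX_k = E[X_k \mid \mathcal F_{k-1}]$, which is 0 for $X_k = \xi_k$. Exact enumeration with rational arithmetic then confirms (16.9) for the first hitting time $S$ of level 1, gives $E[\xi_{S+1}; S+1 \le N] = 0 = E[{}^pX_{S+1}; S+1 \le N]$, and shows that the identity fails at the non-predictable time $S$ itself, where $\xi_S = 1$ whenever $S \le N$.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 6  # hypothetical toy model: N fair +/-1 flips xi_1, ..., xi_N


def paths():
    """Yield (steps, partial sums S_0, ..., S_N) for all 2**N equally likely paths."""
    for steps in product((-1, 1), repeat=N):
        partial = [0]
        for x in steps:
            partial.append(partial[-1] + x)
        yield steps, partial


def opt_proj(s_k, k):
    """Optional projection of X = 1(S_N > 0) at time k: P(S_N > 0 | F_k).

    The last n = N - k flips are independent of F_k, and S_N = s_k + 2B - n,
    where B ~ Binomial(n, 1/2) counts the remaining up-steps.
    """
    n = N - k
    hits = sum(comb(n, b) for b in range(n + 1) if s_k + 2 * b - n > 0)
    return Fraction(hits, 2 ** n)


def first_hit(partial):
    """S = first k >= 1 with S_k = 1, or None (playing the role of infinity)."""
    return next((k for k in range(1, N + 1) if partial[k] == 1), None)


def check():
    """Return both sides of (16.9) at S, then E[xi_{S+1}; S+1 <= N] and E[xi_S; S <= N]."""
    w = Fraction(1, 2 ** N)
    lhs = rhs = pred = nonpred = Fraction(0)
    for steps, partial in paths():
        S = first_hit(partial)
        if S is None:  # S = infinity contributes nothing
            continue
        # (16.9) for the stopping time S and X_k = 1(S_N > 0).
        lhs += w * opt_proj(partial[S], S)
        rhs += w * int(partial[N] > 0)
        # X_k = xi_k = steps[k - 1] has predictable projection E[xi_k | F_{k-1}] = 0.
        nonpred += w * steps[S - 1]  # xi_S, at the non-predictable time S
        if S + 1 <= N:
            pred += w * steps[S]  # xi_{S+1}, at the predictable time S + 1
    return lhs, rhs, pred, nonpred
```

At a fixed time, `opt_proj(0, 0)` is just $P(S_6 > 0) = 22/64$, which is (16.8) with $t = 0$.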
16.5 More on predictability

If $U$ is a random time, i.e., an $\mathcal F_\infty$ measurable map from $\Omega$ to $[0,\infty]$, define
$$\mathcal F_{U-} = \sigma\{X_U : X \text{ is bounded and predictable}\}.$$

Lemma 16.19 Suppose $T$ is a predictable stopping time predicted by stopping times $T_n$. Then $\mathcal F_{T-} = \bigvee_{n=1}^\infty \mathcal F_{T_n}$.

Proof If $X$ is left continuous, adapted, and bounded, then $X_T = \lim_m X_{T_m}$ and $X_{T_m} \in \mathcal F_{T_m} \subset \bigvee_n \mathcal F_{T_n}$, so $X_T \in \bigvee_n \mathcal F_{T_n}$. An argument using the monotone class theorem shows $\mathcal F_{T-} \subset \bigvee_n \mathcal F_{T_n}$. On the other hand, suppose $A \in \mathcal F_{T_n}$ for some $n$. Define $X = 1_{(U_n,\infty)}$, where $U_n = T_n$ if $\omega \in A$ and $\infty$ otherwise. Since $T_n < T$ on $(T>0)$, $X_T = 1_A$ there. (We leave consideration of what happens on the event $(T=0)$ to the reader.) $X$ is predictable since it is left continuous, adapted, and bounded, so $A$ is $\mathcal F_{T-}$ measurable. Therefore $\mathcal F_{T_n} \subset \mathcal F_{T-}$ for all $n$, and we conclude $\bigvee_n \mathcal F_{T_n} \subset \mathcal F_{T-}$.

Corollary 16.20 Suppose $T$ is a predictable stopping time. If $M$ is a uniformly integrable martingale with right-continuous paths, then
$$E[M_T \mid \mathcal F_{T-}] = M_{T-}.$$

Proof If $X_t = M_{t-}$, then $X$ is left continuous, hence predictable, so $M_{T-} = X_T$ is $\mathcal F_{T-}$ measurable by the definition of $\mathcal F_{T-}$ and a limit argument. Suppose the sequence $T_n$ predicts $T$. If $A \in \mathcal F_{T_m}$ and $n > m$, then $A \in \mathcal F_{T_m} \subset \mathcal F_{T_n}$, and by optional stopping (see Exercise 3.12),
$$E[M_T; A] = E[M_{T_n}; A] \to E[M_{T-}; A]$$
as $n \to \infty$. Since $\mathcal F_{T-} = \bigvee_m \mathcal F_{T_m}$, we have $E[M_T; A] = E[M_{T-}; A]$ for all $A \in \mathcal F_{T-}$. Now use the definition of conditional expectation.

Corollary 16.21 Let $S$ be a predictable stopping time, $M$ a square integrable martingale, and $N_t = \Delta M_S 1_{(t\ge S)}$. Then $N_t$ is a square integrable martingale.
Proof Since $|N_t| \le 2\sup_{s\ge0}|M_s|$, $N$ is square integrable. We will show $N$ is a martingale by showing $E N_T = 0$ for all bounded stopping times $T$, and then appealing to Proposition 9.5. If $T$ is a bounded stopping time, then $(T \ge S) \in \mathcal F_{S-}$; to see this, if $S_m$ is a sequence of stopping times predicting $S$, then $(T \ge S) = \cap_m (T \ge S_m) \in \bigvee_m \mathcal F_{S_m}$. Using Corollary 16.20,
$$E N_T = E[\Delta M_S 1_{(T\ge S)}] = E[M_S; T \ge S] - E[M_{S-}; T \ge S] = 0,$$
and we are done.

We now show that every stopping time for Brownian motion is predictable.

Proposition 16.22 Let $\{\mathcal F_t\}$ be the minimal augmented filtration of a Brownian motion. If $T$ is a stopping time with respect to $\{\mathcal F_t\}$, then $T$ is a predictable stopping time.

Proof Let $T$ be a stopping time for Brownian motion. Let $g$ be a continuous strictly increasing function from $[0,\infty]$ to $[0,1]$, e.g., $g(s) = (2/\pi)\arctan s$. Let $M_t$ be the right-continuous modification of the martingale $E[g(T) \mid \mathcal F_t]$. The property of Brownian motion that is key here is that every martingale adapted to the filtration of a Brownian motion is continuous; see Corollary 12.5. Hence $M_t$ can be taken to be continuous. Let $V_t = M_t - g(T \wedge t)$. Then $V_t$ has continuous paths, and since $g(T \wedge t)$ increases with $t$, $V$ is a supermartingale. We have $V_t = E[g(T) - g(T \wedge t) \mid \mathcal F_t]$, so $V$ is non-negative. Clearly $V_T = 0$. If $S$ is the first time that $V_t$ is 0, then $S \le T$. Also, $0 = E V_S = E[g(T) - g(T \wedge S)]$, so $S \ge T$. We let $T_n = \inf\{t : V_t = 1/n\}$. By the continuity of $V$, it is clear that each $T_n$ is strictly less than $T$ if $T > 0$ and the $T_n$ increase up to $T$. Hence $T$ is predictable.

Now let us suppose that $A_t$ is a right-continuous adapted process whose paths are increasing. We call such a process an increasing process. $\Delta A_t$ denotes the jump of $A$ at time $t$, that is, $\Delta A_t = A_t - A_{t-}$.

Proposition 16.23 Suppose $A_t$ is an increasing process such that (1) $\Delta A_T = 0$ whenever $T$ is a totally inaccessible stopping time, and (2) $\Delta A_T$ is $\mathcal F_{T-}$ measurable whenever $T$ is a predictable stopping time. Then $A$ is predictable.

Proof Let $U_{mi}$ be the $i$th time $|\Delta A_t| \in (2^{-m}, 2^{-m+1}]$. The $U_{mi}$ are predictable stopping times by Exercise 16.5. We decompose each $U_{mi}$ as in Proposition 16.1. Since $A$ does not jump at totally inaccessible times, none of the $U_{mi}$ has a totally inaccessible part.
We do this for each $m$ and $i$ and obtain a countable collection of predictable stopping times, the union of whose graphs contains all the jump times of $A$. We order them in some way as $R_1, R_2, \ldots$. Define $T_1 = R_1$, and define $T_2$ by setting $T_2(\omega) = R_2(\omega)$ if $R_2(\omega) \ne R_1(\omega)$ and infinity otherwise. In general, set $T_n(\omega) = R_n(\omega)$ if $R_n(\omega) \ne R_1(\omega), \ldots, R_{n-1}(\omega)$, and $T_n(\omega) = \infty$ otherwise. We thus get a sequence of predictable stopping times $T_n$ with disjoint graphs and
$\cup_n [T_n,T_n]$ includes all the jumps of $A$, except for a set of $\omega$'s of probability zero. The $T_n$ are predictable stopping times by Exercise 16.6. Since $A$ jumps only at the predictable stopping times $T_n$, we can write
$$A_t = A^c_t + \sum_n \Delta A_{T_n} 1_{[T_n,\infty)}(t),$$
where $A^c$ is a continuous increasing process. By hypothesis, $\Delta A_{T_n}$ is $\mathcal F_{T_n-}$ measurable. Therefore the proof will be complete once we show that $\Delta A_{T_n} 1_{[T_n,\infty)}$ is a predictable process. Approximating the $\mathcal F_{T_n-}$ measurable random variable $\Delta A_{T_n}$ by simple functions, it suffices to show that the process $Y_t = 1_B(\omega)1_{[T,\infty)}(t)$ is predictable if $T$ is a predictable stopping time and $B \in \mathcal F_{T-}$. Since $Y_t = 1_{[T_B,\infty)}(t)$, where $T_B$ is equal to $T$ if $\omega \in B$ and equal to infinity otherwise, the predictability of $Y$ follows by Exercise 16.3.
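The relabelling in the proof of Proposition 16.23, $T_n = R_n$ if $R_n$ differs from $R_1, \ldots, R_{n-1}$ and $T_n = \infty$ otherwise, is a pointwise operation, so for one fixed $\omega$ it can be written out directly. A minimal sketch, assuming the values $R_1(\omega), R_2(\omega), \ldots$ are given as a finite list (the function name is hypothetical):

```python
import math


def disjoint_graphs(R):
    """Pointwise relabelling for one fixed omega.

    R lists R_1(omega), R_2(omega), ...; the result T has T_n = R_n if R_n
    differs from R_1, ..., R_{n-1}, and T_n = inf otherwise.
    """
    seen = set()
    T = []
    for r in R:
        T.append(math.inf if r in seen else r)
        seen.add(r)
    return T


print(disjoint_graphs([1.0, 2.0, 1.0, math.inf, 2.0, 3.0]))
# prints [1.0, 2.0, inf, inf, inf, 3.0]
```

The finite values in the output are distinct, which is the statement that the graphs $[T_n,T_n]$ are disjoint, and they are exactly the finite values among the $R_n$, so no jump time is lost.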
16.6 Dual projection theorems

In this section $A_t$ is a right-continuous increasing process with $A_0 = 0$, a.s. We do not necessarily assume that $A_t$ is adapted, only that $A$ is measurable with respect to $\mathcal H$ defined by (16.6). Define $\mu_A$ on elements of $\mathcal H$ by
$$\mu_A(B) = E\int_0^\infty 1_B(t,\omega)\,dA_t(\omega).$$
We define $\mu_A(X)$ by $E\int_0^\infty X_t\,dA_t$ if $X$ is bounded and $\mathcal H$ measurable. Note that if $X = 0$, then $\mu_A(X) = 0$.
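For a finite sample space and a finite time grid, $\mu_A$ and the Radon–Nikodym construction used in the proof of Theorem 16.24 reduce to finite sums. The sketch below assumes $\Omega = \{a, b\}$ with hypothetical probabilities and hypothetical increments $\Delta A(t,\omega)$ at grid times $0,\ldots,3$; `mu_A` computes $E\sum_t X(t,\omega)\Delta A(t,\omega)$, and `recover_A` rebuilds $A_q(\omega) = \nu_q(\{\omega\})/P(\omega)$ from the point masses of the measure. All names are illustrative.

```python
from fractions import Fraction

# Hypothetical finite model: Omega = {"a", "b"} and grid times 0, 1, 2, 3.
P = {"a": Fraction(1, 3), "b": Fraction(2, 3)}
TIMES = range(4)


def mu_A(dA, X):
    """Discrete analog of mu_A(X) = E int X_t dA_t:
    sum over omega of P(omega) * sum_t X(t, omega) * dA[omega][t]."""
    return sum(P[w] * sum(X(t, w) * dA[w][t] for t in TIMES) for w in P)


def point_masses(dA):
    """mu_A({t} x {omega}) = P(omega) * dA[omega][t]."""
    return {w: [P[w] * dA[w][t] for t in TIMES] for w in P}


def recover_A(mass):
    """Rebuild A from point masses: nu_q({w}) = mu([0, q] x {w}), and
    A_q(w) = nu_q({w}) / P(w) is the Radon-Nikodym derivative dnu_q/dP."""
    return {w: [sum(mass[w][: q + 1]) / P[w] for q in TIMES] for w in P}


# Increments of A on each path; dA[w][0] = 0 so that A_0 = 0.
dA = {"a": [0, 1, 0, 2], "b": [0, 0, 3, 1]}
A = recover_A(point_masses(dA))
```

Recovering $A$ from $\mu_A$ returns the cumulative sums of the increments on each path, which is the finite-dimensional content of the uniqueness and existence statements of Theorem 16.24.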
Theorem 16.24 Suppose $\mu$ is a bounded positive measure on $\mathcal H$ such that $\mu(X) = 0$ whenever $X = 0$. Then there exists a unique right-continuous increasing process $A$ with $A_0 = 0$, a.s., such that $\mu = \mu_A$.

Proof First, uniqueness. If $\mu = \mu_A = \mu_B$, let $t > 0$, $\varepsilon > 0$, and let $C$ be the set of $\omega$'s where $A_t(\omega) > B_t(\omega) + \varepsilon$. Then $\mu_A([0,t]\times C) \ge \mu_B([0,t]\times C) + \varepsilon P(C)$, which implies $P(C) = 0$. Since $\varepsilon$ is arbitrary, $A_t \le B_t$, a.s., and by symmetry $A_t = B_t$, a.s. Since $A$ and $B$ are right continuous, we conclude $A = B$.

To prove existence, for each rational $q$, define $\nu_q(C) = \mu([0,q]\times C)$. Clearly $\nu_q$ is absolutely continuous with respect to $P$. Let $\widetilde A_q$ be the Radon–Nikodym derivative of $\nu_q$ with respect to $P$. Since $\mu$ is positive, $\widetilde A_q$ is increasing in $q$. Let $A_t = \limsup_{q\to t,\,q>t}\widetilde A_q$. It is easy to check that $\mu_A = \mu$.

Theorem 16.25 Suppose $A$ is right continuous, $A_0 = 0$, a.s., and $\mu_A(X) = \mu_A({}^oX)$ for every bounded $\mathcal H$ measurable process $X$. Then $A_t$ is optional.

Proof Since $A_t$ is right continuous, we need only show that $A_t$ is adapted. Fix $t$ and let $Y$ be a bounded $\mathcal F_\infty$ measurable random variable, $Z = Y - E[Y \mid \mathcal F_t]$, and $X_s(\omega) = 1_{[0,t]}(s)Z(\omega)$. If $T$ is a stopping time, then $(T \le t) \in \mathcal F_t$, and so by the definitions of $X$ and $Z$,
$$E[{}^oX_T; T<\infty] = E[X_T; T<\infty] = E[Z; T \le t] = 0.$$
This implies ${}^oX = 0$ by the definition of ${}^oX$. Hence
$$E[A_t Z] = E\int_0^\infty X_s\,dA_s = \mu_A(X) = \mu_A({}^oX) = 0.$$
Thus E [At Y ] = E [At E [Y | Ft ] ]. We write
E [At Y ] = E [At E [Y | Ft ] ] = E [E [(At E [Y | Ft ]) | Ft ] ] = E [E [At | Ft ]E [Y | Ft ] ] = E [E [(Y E [At | Ft ]) | Ft ] ] = E [Y E [At | Ft ] ]. Hence E [At Y ] = E [Y E [At | Ft ] ] for all bounded Y , or At = E [At | Ft ], a.s., which says that At is Ft measurable. Theorem 16.26 If μA (X ) = μA ( pX ) for all bounded X , then A is predictable and can be taken to be right continuous. Proof
By hypothesis, together with Exercise 16.8, μA (oX ) = μA ( p (oX )) = μA ( p X ) = μA (X ).
By Theorem 16.25, $A_t$ is right continuous and optional. We need to show that $A$ does not jump at totally inaccessible times and that $\Delta A_T$ is $\mathcal F_{T-}$ measurable at predictable times $T$; we then use Proposition 16.23. Let $T$ be a totally inaccessible stopping time and let $B = (\Delta A_T > 0)$. Set $T_B$ equal to $T$ on $B$ and equal to infinity otherwise. It is easy to check that $T_B$ is also totally inaccessible. Let $X = 1_{[T_B,T_B]}$. If $U$ is a predictable stopping time, $E[X_U; U<\infty] = P(T_B = U < \infty) = 0$. By the definition of predictable projection, ${}^pX = 0$. Hence
$$E[\Delta A_T; \Delta A_T > 0] = E[\Delta A_{T_B}; T_B < \infty] = \mu_A(X) = \mu_A({}^pX) = 0.$$
Now suppose $T$ is a predictable stopping time. Let $Y$ be a bounded $\mathcal F_\infty$ measurable random variable, set $Z = Y - E[Y \mid \mathcal F_{T-}]$, and $X = Z1_{[T,T]}$. Let $S$ be any predictable stopping time. Then if $W = 1_{[S,S]}$, $W = \lim_{n\to\infty}1_{[S,S+(1/n))}$ is a predictable process by Proposition 16.2(4). By the definition of $\mathcal F_{T-}$, $W_T$ is $\mathcal F_{T-}$ measurable. This is the same as saying $(S = T < \infty) \in \mathcal F_{T-}$. Therefore
$$E[X_S; S<\infty] = E[Z; S = T < \infty] = 0.$$
This implies ${}^pX = 0$, and then $0 = \mu_A({}^pX) = \mu_A(X) = E[Z\,\Delta A_T; T<\infty]$. Similarly to the proof of Theorem 16.25,
$$E[\Delta A_T Y] = E\big[\Delta A_T\,E[Y \mid \mathcal F_{T-}]\big] = E\big[E[\Delta A_T \mid \mathcal F_{T-}]\,E[Y \mid \mathcal F_{T-}]\big] = E\big[Y\,E[\Delta A_T \mid \mathcal F_{T-}]\big].$$
Since this holds for all $Y$, $\Delta A_T = E[\Delta A_T \mid \mathcal F_{T-}]$ is $\mathcal F_{T-}$ measurable.

We now define the dual optional projection and the dual predictable projection of an increasing process. Given a right-continuous increasing, not necessarily adapted process $A_t$ with $A_0 = 0$, a.s., define $\mu^o$ by
$$\mu^o(X) = \mu_A({}^oX) \tag{16.11}$$
for bounded $\mathcal H$ measurable $X$. Exercise 16.11 asks you to prove that $\mu^o$ is a measure. Clearly $\mu^o({}^oX) = \mu_A({}^o({}^oX)) = \mu_A({}^oX) = \mu^o(X)$. By Theorem 16.17, ${}^oX \ge 0$ if $X \ge 0$, hence $\mu^o$ is a positive measure. If $X = 0$, then ${}^oX = 0$, so $\mu^o(X) = \mu_A({}^oX) = 0$. Therefore by Theorems 16.24 and 16.25, $\mu^o$ corresponds to an optional increasing process $A^o$, called the dual optional projection of $A$.

The dual optional projection is used in excursion theory. More commonly used is the dual predictable projection, which is defined in a very similar way. Define $\mu^p(X) = \mu_A({}^pX)$, and let $A^p$ be the predictable increasing process associated with $\mu^p$. We often denote $A^p$ by $\widetilde A$ and call it the compensator of $A$. The reason for this terminology is the following proposition.

Proposition 16.27 Let $A_t$ be an adapted increasing process with $A_0 = 0$, a.s. Then $A_t - \widetilde A_t$ is a martingale.

Proof
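Let $s < t$, let $B \in \mathcal F_s$, and define
$$S(\omega) = \begin{cases} s, & \omega \in B,\\ \infty, & \omega \notin B, \end{cases} \qquad T(\omega) = \begin{cases} t, & \omega \in B,\\ \infty, & \omega \notin B. \end{cases}$$
Let $X = 1_{(S,T]}$. Then
$$E[A_t - A_s; B] = \mu_A(X) = \mu_A({}^pX) = \mu_{A^p}(X) = E[A^p_t - A^p_s; B],$$
which does it.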
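A standard example, stated here as an assumption rather than derived: if $S$ is exponential with rate $\lambda$ and $A_t = 1_{(t\ge S)}$, then with respect to the filtration generated by $A$ the compensator is $\widetilde A_t = \lambda(t \wedge S)$, a continuous process. The Monte Carlo sketch below (parameter values and seed are arbitrary) estimates $E[A_t - \widetilde A_t]$ and $E[(A_t - \widetilde A_t) - (A_s - \widetilde A_s); S > s]$; both should be close to 0, in line with Proposition 16.27, since the event $(S > s)$ belongs to the σ-field generated by $A$ up to time $s$.

```python
import random


def compensator_check(lam=1.0, s=0.5, t=1.0, n=200_000, seed=0):
    """Monte Carlo estimates of E[A_t - Atilde_t] and
    E[(A_t - Atilde_t) - (A_s - Atilde_s); S > s].

    Model (an assumption for illustration): S ~ Exponential(lam),
    A_t = 1(t >= S), and Atilde_t = lam * min(t, S).
    """
    rng = random.Random(seed)
    total = 0.0
    increment = 0.0
    for _ in range(n):
        S = rng.expovariate(lam)
        m_t = (1.0 if t >= S else 0.0) - lam * min(t, S)
        m_s = (1.0 if s >= S else 0.0) - lam * min(s, S)
        total += m_t
        if S > s:  # the event (S > s) is known at time s
            increment += m_t - m_s
    return total / n, increment / n
```

Exactly, $E[A_t] = 1 - e^{-\lambda t} = \lambda\int_0^t e^{-\lambda u}\,du = E\widetilde A_t$, so both estimates differ from 0 only by sampling error of order $n^{-1/2}$.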
16.7 The Doob–Meyer decomposition

Proposition 16.28 If $M$ is a predictable uniformly integrable martingale with paths that are right continuous with left limits, then $M$ is continuous.

Proof Let $\varepsilon > 0$ and let $T = \inf\{t : |\Delta M_t| > \varepsilon\}$. $T$ is a predictable stopping time by Exercise 16.2. By Corollary 16.20, $E[M_T \mid \mathcal F_{T-}] = M_{T-}$. By the definition of $\mathcal F_{T-}$ and a limit argument, $M_T$ is $\mathcal F_{T-}$ measurable, and thus $E[M_T \mid \mathcal F_{T-}] = M_T$. Hence $M_T = M_{T-}$ at all predictable stopping times, and in particular at time $T$. But $\varepsilon$ is arbitrary, so $M$ has no jumps.

We say a process $X$ is of class D if the family $\{X_T : T \text{ a stopping time}\}$ is uniformly integrable. If $Z_t$ is a supermartingale, then $-Z_t$ is a submartingale, and it is only a matter of convenience whether we state the Doob–Meyer decomposition in terms of submartingales or supermartingales. The Doob–Meyer decomposition is the following.

Theorem 16.29 Suppose $Z_t$ is a submartingale of class D with paths that are right continuous with left limits and such that $Z_0 = 0$, a.s. Then $Z_t = M_t + A_t$, where $M_t$ is a uniformly integrable right-continuous martingale with $M_0 = 0$, a.s., and $A_t$ is a predictable increasing process with $A_0 = 0$, a.s. The decomposition is unique.

The existence is the hard part. We define a measure $\mu$ by $\mu((S,T]) = E[Z_T - Z_S]$ for stopping times $S \le T$, and then let $A$ be the increasing process such that $\mu_A(X) = \mu({}^pX)$.

Proof We start with uniqueness. If $Z_t = M_t + A_t = N_t + B_t$, then $M_t - N_t = B_t - A_t$, and so $M_t - N_t$ is a predictable uniformly integrable martingale. By Proposition 16.28, $M_t - N_t$ is
a continuous martingale. Since $M_t - N_t = B_t - A_t$, $M_t - N_t$ is a continuous martingale whose paths are of bounded variation on each finite time interval, hence $M_t - N_t = 0$ by Theorem 9.7. This proves uniqueness.

We turn to existence. By the martingale convergence theorem (Theorem 3.12), $Z_\infty = \lim_{t\to\infty}Z_t$ exists, a.s. By Fatou's lemma, $E|Z_\infty| < \infty$. Let $\mathcal I$ denote the collection of finite unions of subsets of $[0,\infty)\times\Omega$ of the form $(S,T]$, where $S \le T$ are stopping times. Define $\mu((S,T]) = E[Z_T - Z_S]$. Since $Z$ is a submartingale, $\mu$ is non-negative. We note that $\mathcal I$ is an algebra and that $\mu$ is finitely additive on $\mathcal I$. If $K = (S_1,T_1] \cup \cdots \cup (S_n,T_n]$ with $S_1 \le T_1 \le S_2 \le \cdots \le T_n$, set $\overline K = [S_1,T_1] \cup \cdots \cup [S_n,T_n]$. If $H = (S,T]$ and $\varepsilon > 0$, let
$$S_n(\omega) = \begin{cases} S(\omega) + 1/n, & S(\omega) + 1/n < T(\omega),\\ \infty, & \text{otherwise}, \end{cases} \qquad T_n(\omega) = \begin{cases} T(\omega), & S(\omega) + 1/n < T(\omega),\\ \infty, & \text{otherwise}. \end{cases}$$
Then $[S_n,T_n] \subset (S,T]$ and $S_n \downarrow S$, $T_n \downarrow T$. Since $Z$ is right continuous and of class D, $\mu((S_n,T_n]) = E[Z_{T_n} - Z_{S_n}] \to E[Z_T - Z_S] = \mu(H)$. Thus if $n$ is sufficiently large and we take $K = (S_n,T_n]$, then $\overline K \subset H$ and $\mu(K) > \mu(H) - \varepsilon$.

We now prove that $\mu$ is countably additive on $\mathcal I$. Suppose $H_n \in \mathcal I$ with $H_n \downarrow \emptyset$. We need to show that $\mu(H_n) \downarrow 0$. Let $\varepsilon > 0$ and choose $K_n \in \mathcal I$ such that $\overline K_n \subset H_n$ with $\mu(K_n) > \mu(H_n) - \varepsilon/2^n$. Let $L_n = \overline K_1 \cap \cdots \cap \overline K_n$. Then for each $n$ we have $\mu(H_n) \le \mu(L_n) + \varepsilon$. Since $L_n \subset \overline K_n \subset H_n$, we have $L_n \downarrow \emptyset$. Let $D_{L_n}$ be the debut of $L_n$. The stopping times $D_{L_n}$ increase; let $R$ be the limit. Let $F_n = F_n(\omega) = \{t : (t,\omega) \in L_n\}$. This is a closed subset of $[0,\infty)$, and $D_{L_n}(\omega) \in F_n \subset F_m$ whenever $n \ge m$ and $D_{L_n}(\omega) < \infty$. If $R(\omega) < \infty$, then $R(\omega) \in F_m$ for each $m$, which contradicts $\cap_m L_m = \emptyset$. Therefore $R = \infty$. Since $Z$ is of class D, $Z_{D_{L_n}}$ converges almost surely and in $L^1$ to $Z_\infty$. Thus $\mu(L_n) \le E[Z_\infty - Z_{D_{L_n}}] \to 0$. Hence $\limsup \mu(H_n) \le \varepsilon$, and since $\varepsilon$ is arbitrary, $\mu(H_n) \to 0$. This proves that $\mu$ is countably additive on $\mathcal I$.

By the Carathéodory extension theorem, $\mu$ may be extended to a measure on $\mathcal P$. Define $\bar\mu(X) = \mu({}^pX)$. Then $\bar\mu({}^pX) = \mu({}^p({}^pX)) = \mu({}^pX) = \bar\mu(X)$, and so there exists a predictable right-continuous increasing process $A_t$ such that $\bar\mu = \mu_A$. Since
$$E A_\infty = \mu_A(1_{(0,\infty)}) = \mu({}^p1_{(0,\infty)}) = \mu(1_{(0,\infty)}) = E[Z_\infty - Z_0] < \infty,$$
$A_\infty$ is integrable, and since $A_t$ is an increasing process, the collection of random variables $\{A_t\}$ is uniformly integrable. If $S$ is any stopping time, then by Proposition 16.2, $(S,\infty)$ is a predictable set, hence ${}^p1_{(S,\infty)} = 1_{(S,\infty)}$. We thus have
$$E[A_\infty - A_S] = \mu_A((S,\infty)) = \mu({}^p1_{(S,\infty)}) = \mu(1_{(S,\infty)}) = E[Z_\infty - Z_S].$$
Letting t > 0 and B ∈ Ft , define S = t if ω ∈ B and equal to infinity otherwise. Then
E [A∞ − At ; B] = E [A∞ − AS ] = E [Z∞ − ZS ] = E [Z∞ − Zt ; B], or Mt = Zt − At is a martingale. Proposition A.17 tells us that M is a uniformly integrable martingale. A process X is of class DL if there exist stopping times Vn → ∞ such that Xt∧Vn is of class D for each n. It is clear that there is a version of the Doob–Meyer decomposition for submartingales of class DL. Proposition 16.30 The process A is continuous if and only if E ZTn → E ZT whenever Tn ↑ T and Tn < T on (T > 0). Proof Let T be a predictable stopping time predicted by the sequence Tn . Since we know E [A∞ − ATn ] = E [Z∞ − ZTn ], then taking limits,
$$E[A_\infty - A_{T-}; T<\infty] = E[Z_\infty - Z_{T-}; T<\infty],$$
using the fact that $Z$ is of class D. Also $E[A_\infty - A_T] = E[Z_\infty - Z_T]$. Thus $E[A_T - A_{T-}] = E[Z_T - Z_{T-}]$. Then $E[A_T - A_{T-}] = 0$ if and only if $E Z_T = E Z_{T-}$.

Corollary 16.31 Let $S$ be a totally inaccessible stopping time, $Y$ a non-negative bounded random variable that is $\mathcal F_S$ measurable, and $A_t = Y1_{(t\ge S)}$. Let $\widetilde A$ be the compensator of $A$. Then $\widetilde A$ has continuous paths.

Proof Let $T$ be a stopping time and let $T_n$ be stopping times increasing to $T$. If $P(T = S) = 0$, then $\lim_{n\to\infty}A_{T_n} = A_T$, a.s., since $A$ jumps only at time $S$. If $P(T = S) > 0$, then $[T,T]$ cannot contain the graph of a predictable stopping time, since $S$ is totally inaccessible. Therefore we cannot have $T_n < T$ for all $n$ with positive probability, hence $T_n(\omega) = T(\omega)$ for all $n$ sufficiently large (depending on $\omega$). Thus again $\lim_{n\to\infty}A_{T_n} = A_T$, a.s. By Proposition 16.30, $\widetilde A$ is continuous.
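A discrete-time analog helps fix ideas about Theorem 16.29. For a submartingale $Z_n$, Doob's decomposition takes $A_n = \sum_{k=1}^n E[Z_k - Z_{k-1} \mid \mathcal F_{k-1}]$, which is predictable and increasing, and $M_n = Z_n - Z_0 - A_n$, which is a martingale. The sketch below assumes a simple symmetric random walk $S$ and $Z_n = f(S_n)$ with $f$ convex, defaulting to $f(x) = |x|$ (illustrative choices; the function name is hypothetical). For $f(x) = |x|$ the increment of $A$ at step $k$ is 1 when $S_{k-1} = 0$ and 0 otherwise, so $A_n$ counts the visits of $S$ to 0 before time $n$.

```python
from fractions import Fraction


def doob_decomposition(path, f=abs):
    """Doob decomposition Z_n = Z_0 + M_n + A_n of Z_n = f(S_n) along one path.

    `path` is S_0, ..., S_N for a simple symmetric random walk (fair +/-1 steps).
    The increment A_k - A_{k-1} = E[Z_k - Z_{k-1} | F_{k-1}] depends on S_{k-1}
    alone, so A is predictable; M = Z - Z_0 - A is then a martingale.
    """
    A = [Fraction(0)]
    for k in range(1, len(path)):
        s_prev = path[k - 1]
        # E[f(S_k) | F_{k-1}] averages the two equally likely next positions.
        drift = Fraction(f(s_prev + 1) + f(s_prev - 1), 2) - f(s_prev)
        A.append(A[-1] + drift)
    M = [f(path[n]) - f(path[0]) - A[n] for n in range(len(path))]
    return M, A


M, A = doob_decomposition([0, 1, 0, -1, 0, 1])
print([int(a) for a in A])  # prints [0, 1, 1, 2, 2, 3]
```

In continuous time the counting of visits to 0 has no direct analog, and this is one reason Theorem 16.29 needs the measure-theoretic construction above rather than a sum of conditional increments.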
16.8 Two inequalities

Proposition 16.32 Suppose $Z_t = M_t - A_t$, where $M_t$ is a uniformly integrable martingale and $A_t$ is an increasing predictable process with $A_0 = 0$, a.s. Suppose $Z$ is bounded, that is, there exists $K > 0$ such that $P(|Z_t| > K \text{ for some } t) = 0$. If $p$ is any positive integer, $E A_\infty^p < \infty$.

Proof Let $\lambda > 0$ and let $M = 4K$ (in this proof $M$ denotes this constant, not the martingale). Let $T = \inf\{t : A_t \ge \lambda\}$. Because $A_{T-} \le \lambda$,
$$P(A_\infty \ge \lambda + M) = P(A_\infty \ge \lambda + M, T<\infty) \le P(A_\infty - A_{T-} \ge M, T<\infty) \le E\Big[\frac{A_\infty - A_{T-}}{M}; A_\infty - A_{T-} \ge M, T<\infty\Big] \le \frac1M E[A_\infty - A_{T-}; T<\infty].$$
We will show
$$\frac1M E[A_\infty - A_{T-}; T<\infty] \le \frac12 P(T<\infty), \tag{16.12}$$
which, since $P(T<\infty) = P(A_\infty \ge \lambda)$, implies
$$P(A_\infty \ge \lambda + M) \le \frac12 P(A_\infty \ge \lambda). \tag{16.13}$$
Taking $\lambda = kM$ in (16.13) yields
$$P(A_\infty \ge (k+1)M) \le \frac12 P(A_\infty \ge kM).$$
Since $P(A_\infty \ge M) \le 1$, induction tells us
$$P(A_\infty \ge kM) \le \frac{1}{2^{k-1}},$$
which implies our conclusion.

Therefore we need to prove (16.12). $T$ is a predictable stopping time by Proposition 16.16. Let $T_n$ be stopping times with $T_n \uparrow T$ and $T_n < T$ on $(T>0)$. Let $n$ be fixed for the moment and let $N > 0$. If $j > n$,
E [A∞ − ATj ; Tn < N] = E [E [A∞ − ATj | FTj ]; Tn < N] = −E [E [Z∞ − ZTj | FTj ]; Tn < N] ≤ 2K P(Tn < N ) since Zt + At is a martingale, (Tn < N ) ∈ FTn ⊂ FTj , and |Z| is bounded by K. Letting j → ∞ and using Fatou’s lemma, we get
E [A∞ − AT − ; Tn < N] ≤ 2K P(Tn < N ). Letting n → ∞, by Fatou’s lemma again,
E [A∞ − AT − ; T < N] ≤ 2K P(T ≤ N ). Finally, letting N → ∞, by monotone convergence,
$$E[A_\infty - A_{T-}; T<\infty] \le 2K\,P(T<\infty).$$
By our choice of $M$, this gives (16.12).

For use in the reduction theorem in Chapter 17, we will need a variation of the preceding proposition.

Proposition 16.33 Let $U$ be a stopping time and $Y$ a non-negative integrable random variable that is $\mathcal F_U$ measurable. Let $N_t$ be the right-continuous version of $E[Y \mid \mathcal F_t]$. Suppose there exists $K > 0$ such that $N_t \le K$ if $t < U$. Let $Z_t = Y1_{(t\ge U)}$, which is an increasing process, and let $A_t$ be its compensator. If $p$ is a positive integer, then $E A_\infty^p < \infty$.

Proof
As in the proof of Proposition 16.32, it suffices to show
$$E[A_\infty - A_{T-}; T<\infty] \le K\,P(T<\infty), \tag{16.14}$$
where λ > 0 and T = inf {t : At ≥ λ}. Since A is a predictable process, then T is a predictable stopping time by Proposition 16.16. Let Tn be stopping times predicting T .
Let N, n ≥ 1. If j > n, then (Tn < N ) ∈ FTn ⊂ FTj and
$$E[A_\infty - A_{T_j}; T_n < N] = E[Z_\infty - Z_{T_j}; T_n < N]. \tag{16.15}$$
We observe that Z∞ − ZTj = 0 on the event (Tj ≥ U ), while Z∞ − ZTj = Y on the event (Tj < U ). Therefore
E [Z∞ − ZTj ; Tn < N] = E [Y ; Tj < U, Tn < N] = E [E [Y | FTj ]; Tj < U, Tn < N] = E [NTj ; Tj < U, Tn < N] ≤ K P(Tj < U, Tn < N ) ≤ K P(Tn < N ). With this and (16.15), we can now proceed as in the proof of Proposition 16.32 to obtain (16.14).
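The closing step of the proof of Proposition 16.32, "which implies our conclusion," can be spelled out as a short computation. Split $[0,\infty)$ into the intervals $[kM,(k+1)M)$, bound $A_\infty^p$ by $((k+1)M)^p$ on each, and use $P(A_\infty \ge kM) \le 2^{1-k}$ for $k \ge 1$. The same computation finishes Proposition 16.33, where (16.14) yields (16.12) with $M = 2K$ in place of $4K$.

```latex
E A_\infty^p
  \le \sum_{k=0}^{\infty} \big((k+1)M\big)^p \, P\big(kM \le A_\infty < (k+1)M\big)
  \le M^p + M^p \sum_{k=1}^{\infty} (k+1)^p \, 2^{1-k} < \infty
```

The last series converges for each fixed $p$ because $(k+1)^p 2^{-k}$ decays geometrically.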
Exercises

16.1 Show that if $S_1, \ldots, S_n$ are predictable stopping times, then so are $S_1 \wedge \cdots \wedge S_n$ and $S_1 \vee \cdots \vee S_n$.
16.2 If $A_t$ is a predictable process with paths that are right continuous with left limits and $a > 0$, show $T = \inf\{t > 0 : A_t > a\}$ is a predictable stopping time.
16.3 Show that if $T$ is a predictable stopping time, $B \in \mathcal F_{T-}$, and $T_B(\omega)$ is defined to be equal to $T(\omega)$ if $\omega \in B$ and equal to $\infty$ otherwise, then $T_B$ is a predictable stopping time.
16.4 Let $X$ be a bounded adapted right-continuous process, let $\varepsilon > 0$, let $U_0 = 0$, a.s., and define $U_i$ by (16.3) for $i \ge 1$. Show each $U_i$ is a stopping time.
16.5 Let $A$ be a predictable increasing process and let $S_k$ be the $k$th time $A$ jumps more than $\varepsilon$. Thus $S_0 = 0$, a.s., and $S_{k+1} = \inf\{t > S_k : \Delta A_t > \varepsilon\}$. Show each $S_k$ is a predictable stopping time.
16.6 Show that the stopping times $T_n$ defined in the proof of Proposition 16.23 are predictable.
16.7 Show that if $P_t$ is a Poisson process, then $({}^pP)_t = P_{t-}$.
16.8 Show that if $X$ is bounded and measurable with respect to the product σ-field $\mathcal B[0,\infty) \times \mathcal F_\infty$, then ${}^p({}^oX) = {}^pX$.
16.9 Suppose $T$ is a totally inaccessible stopping time. Show that if $X = 1_{[T,T]}$, then ${}^pX = 0$.
16.10 If $P$ is a Poisson process with parameter $\lambda$, determine $P^o_t$ and $P^p_t$.
16.11 Show that $\mu^o$ defined in (16.11) is a measure.
16.12 Let $X_t$ be a continuous process and suppose there exists $K > 0$ such that for all $t$,
$$E\big[|X_\infty - X_t| \mid \mathcal F_t\big] \le K, \quad \text{a.s.}$$
Let $X^*_\infty = \sup_{t\ge0}|X_t|$. Prove that there exists $a > 0$ depending only on $K$ such that
$$E e^{aX^*_\infty} < \infty.$$
This is sometimes called the John–Nirenberg inequality after the inequality of the same name in analysis. Hint: Imitate the proof of Proposition 16.32. This exercise is somewhat easier than the proof of that proposition because $X$ has continuous paths.
16.13 A martingale $M$ is said to be in the space BMO if
$$\sup_{t\ge0} E[M_\infty^2 - M_t^2 \mid \mathcal F_t] < \infty, \quad \text{a.s.}$$
Let $M^*_t = \sup_{s\le t}|M_s|$. Show that if $M$ is in BMO, then there exists $a > 0$ such that
$$E e^{aM^*_\infty} < \infty.$$
The name BMO comes from the "bounded mean oscillation" spaces of harmonic analysis. Hint: Use Exercise 16.12.
Notes A progressively measurable set is one whose indicator is a progressively measurable process, which is defined in Exercise 1.3. In fact, the debut of a progressively measurable set is a stopping time; see Dellacherie and Meyer (1978). An elementary proof of the general Doob–Meyer theorem along the lines of the proof given in Chapter 9 can be found in Bass (1996). See Dellacherie and Meyer (1978) for more on the general theory of processes.
17 Processes with jumps
In this chapter we investigate the stochastic calculus for processes which may have jumps as well as a continuous component. If $X$ is not a continuous process, it is no longer true that $X_{t\wedge T_N}$ is a bounded process when $T_N = \inf\{t : |X_t| \ge N\}$, since there could be a large jump at time $T_N$. We investigate stochastic integrals with respect to square integrable (not necessarily continuous) martingales, Itô's formula, and the Girsanov transformation. We prove the reduction theorem that allows us to look at semimartingales that are not necessarily bounded. Since I encouraged you to skim Chapter 16 on the first reading of this book, it is only fair that I tell you the facts that we will need from that chapter. We will need the Doob–Meyer decomposition (Theorem 16.29), Proposition 16.1, Corollaries 16.21 and 16.31, and the two inequalities in Propositions 16.32 and 16.33.
17.1 Decomposition of martingales

We assume throughout this chapter that $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions. This means that each $\mathcal F_t$ contains every $P$-null set and $\cap_{\varepsilon>0}\mathcal F_{t+\varepsilon} = \mathcal F_t$ for each $t$. Let us begin by recalling a few definitions and facts. The predictable σ-field is the σ-field of subsets of $[0,\infty)\times\Omega$ generated by the collection of bounded, left-continuous processes that are adapted to $\{\mathcal F_t\}$; see Section 10.1. A stopping time $T$ is predictable and predicted by the sequence of stopping times $T_n$ if $T_n \uparrow T$ and $T_n < T$ on the event $(T > 0)$. A stopping time $T$ is totally inaccessible if $P(T = S) = 0$ for every predictable stopping time $S$. The graph of a stopping time $T$ is $[T,T] = \{(t,\omega) : t = T(\omega) < \infty\}$; see Section 16.1. If $X_t$ is a process that is right continuous with left limits, we set $X_{t-} = \lim_{s\to t,\,s<t}X_s$
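We will use the following lemma.

Lemma 17.1 If $A_t = B_t - C_t$, where $B_t$ and $C_t$ are increasing right-continuous processes with $B_0 = C_0 = 0$, a.s., and in addition $B$ and $C$ are bounded, then, with $\widetilde A = \widetilde B - \widetilde C$ the difference of the compensators of $B$ and $C$,
$$E\sup_{t\ge0}\widetilde A_t^2 < \infty.$$

Proof By Proposition 16.32, $E\widetilde B_\infty^2 < \infty$ and $E\widetilde C_\infty^2 < \infty$, and so
$$E\sup_{t\ge0}\widetilde A_t^2 \le E\Big[2\sup_{t\ge0}\widetilde B_t^2 + 2\sup_{t\ge0}\widetilde C_t^2\Big] \le 2E\widetilde B_\infty^2 + 2E\widetilde C_\infty^2 < \infty.$$
We are done.

A key result is the following orthogonality lemma.

Lemma 17.2 Suppose $A_t$ is a bounded increasing right-continuous process with $A_0 = 0$, a.s., $\widetilde A_t$ is the compensator of $A$, and $M_t = A_t - \widetilde A_t$. Suppose $N_t$ is a right-continuous square integrable martingale such that $(\Delta N_t)(\Delta M_t) = 0$ for all $t$. Then $E M_\infty N_\infty = 0$.

Proof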
By Lemma 17.1, $M$ is square integrable. Suppose
$$H(s,\omega) = K(\omega)1_{(a,b]}(s)$$
with $K$ being $\mathcal F_a$ measurable. Since $M_t$ is of bounded variation, we have (this is a Lebesgue–Stieltjes integral here)
$$E\int_0^\infty H_s\,dM_s = E[K(M_b - M_a)] = E\big[K\,E[M_b - M_a \mid \mathcal F_a]\big] = 0.$$
We saw in Lemma 10.1 that linear combinations of such $H$'s generate the predictable σ-field. Thus by linearity and taking limits, $E\int_0^\infty H_s\,dM_s = 0$ if $H_s$ is a predictable process such that $E\int_0^\infty |H_s|\,|dM_s| < \infty$. In particular, since $N_{s-}$ is left continuous and hence predictable, $E\int_0^\infty N_{s-}\,dM_s = 0$, provided we check integrability:
$$E\int_0^\infty |N_{s-}|\,|dM_s| \le E\int_0^\infty \big(\sup_r|N_r|\big)\,|dM_s| \le E\big[\big(\sup_r|N_r|\big)(A_\infty + \widetilde A_\infty)\big] < \infty$$
by the Cauchy–Schwarz inequality. By hypothesis, $\int_0^\infty \Delta N_s\,dM_s = \sum_s \Delta N_s\,\Delta M_s = 0$, so $E\int_0^\infty N_s\,dM_s = 0$. On the other hand, using Proposition 3.14, we see
$$E M_\infty N_\infty = E\int_0^\infty N_\infty\,dM_s = E\int_0^\infty N_s\,dM_s = 0.$$
The proof is complete.
If we apply the above to $N_{t\wedge T}$, we have $E M_\infty N_T = 0$. If we then condition on $\mathcal F_T$,
$$E[M_T N_T] = E\big[N_T\,E[M_\infty \mid \mathcal F_T]\big] = E[N_T M_\infty] = 0. \tag{17.1}$$
The reason for the name “orthogonality lemma” is that by (17.1) and Proposition 9.5, $M_tN_t$ is a martingale. This implies that $\langle M,N\rangle_t$ (which we will define soon, and is defined similarly to the case of continuous martingales) is identically equal to 0.

Let $M_t$ be a square integrable martingale with paths that are right continuous with left limits, so that $E M_\infty^2 < \infty$. For each $i \in \mathbb Z$, let $T_{i1} = \inf\{t : |\Delta M_t| \in [2^i, 2^{i+1})\}$, $T_{i2} = \inf\{t > T_{i1} : |\Delta M_t| \in [2^i, 2^{i+1})\}$, and so on; $i$ can be both positive and negative. Since $M_t$ is right continuous with left limits, for each $i$, $T_{ij} \to \infty$ as $j \to \infty$. We conclude that $M_t$ has at most countably many jumps. Next we decompose each $T_{ij}$ into predictable and totally inaccessible parts by Proposition 16.1. We relabel the jump times as $S_1, S_2, \ldots$ so that each $S_k$ is either predictable or totally inaccessible, the graphs of the $S_k$ are disjoint, $M$ has a jump at each time $S_k$ and only at these times, and $|\Delta M_{S_k}|$ is bounded for each $k$; cf. the proof of Proposition 16.23. We do not assume that $S_{k_1} \le S_{k_2}$ if $k_1 \le k_2$, and in general it would not be possible to arrange this.

If $S_i$ is a totally inaccessible stopping time, let
$$A_i(t) = \Delta M_{S_i}1_{(t\ge S_i)} \tag{17.2}$$
and
$$M_i(t) = A_i(t) - \widetilde A_i(t), \tag{17.3}$$
where $\widetilde A_i$ is the compensator of $A_i$. $A_i(t)$ is the process that is 0 up to time $S_i$ and then jumps an amount $\Delta M_{S_i}$; thereafter it is constant. By Corollary 16.31, $\widetilde A_i$ is continuous. If $S_i$ is a predictable stopping time, let
$$M_i(t) = \Delta M_{S_i}1_{(t\ge S_i)}. \tag{17.4}$$
By Corollary 16.21, $M_i$ is a martingale. Note that in either case, $M - M_i$ has no jump at time $S_i$.

Theorem 17.3 Suppose $M$ is a square integrable martingale and we define $M_i$ as in (17.3) and (17.4).
(1) Each $M_i$ is square integrable.
(2) $\sum_{i=1}^\infty M_i(\infty)$ converges in $L^2$.
(3) If $M^c_t = M_t - \sum_{i=1}^\infty M_i(t)$, then $M^c$ is square integrable and we can find a version that has continuous paths.
(4) For each $i$ and each stopping time $T$, $E[M^c_T M_i(T)] = 0$.

Proof (1) If $S_i$ is a totally inaccessible stopping time and we let $B_t = (\Delta M_{S_i})^+1_{(t\ge S_i)}$ and $C_t = (\Delta M_{S_i})^-1_{(t\ge S_i)}$, then (1) follows by Lemma 17.1. If $S_i$ is predictable, (1) follows by Corollary 16.21.

(2) Let $V_n(t) = \sum_{i=1}^n M_i(t)$. By the orthogonality lemma (Lemma 17.2), $E[M_i(\infty)M_j(\infty)] = 0$ if $i \ne j$ and $E[M_i(\infty)(M_\infty - V_n(\infty))] = 0$ if $i \le n$. We thus
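have
$$\sum_{i=1}^n E M_i(\infty)^2 = E V_n(\infty)^2 \le E\big[(M_\infty - V_n(\infty))^2\big] + E V_n(\infty)^2 = E\big[(M_\infty - V_n(\infty) + V_n(\infty))^2\big] = E M_\infty^2 < \infty.$$
Therefore the series $\sum_i E M_i(\infty)^2$ converges. If $n > m$,
$$E\big[(V_n(\infty) - V_m(\infty))^2\big] = E\Big[\Big(\sum_{i=m+1}^n M_i(\infty)\Big)^2\Big] = \sum_{i=m+1}^n E M_i(\infty)^2.$$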
This tends to 0 as n, m → ∞, so Vn (∞) is a Cauchy sequence in L2 , and hence converges. (3) From (2), Doob’s inequalities, and the completeness of L2 , the random variables supt≥0 [Mt − Vn (t )] converge in L2 as n → ∞. Let Mtc = limn→∞ [Mt − Vn (t )]. There is a sequence nk such that sup |(Mt − Vnk (t )) − Mtc | → 0,
a.s.
t≥0
We conclude that the paths of Mtc are right continuous with left limits. By the construction of the Mi , M − Vnk has jumps only at times Si for i > nk . We therefore see that M c has no jumps, i.e., it is continuous. (4) By the orthogonality lemma and (17.1),
E [Mi (T )(MT − Vn (T )] = 0 if T is a stopping time and i ≤ n. Letting n tend to infinity proves (4).
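The decomposition of Theorem 17.3 can be made explicit for the compensated Poisson process $M_t=P_t-t$, stopped at a fixed time $T$.
- Every jump has size 1, and the jump times $S_1<S_2<\cdots$ are totally inaccessible.
- The compensator of $1_{(t\ge S_k)}$ is $\int_0^t1_{(S_{k-1},S_k]}(s)\,ds$. This is a standard fact, assumed here rather than taken from the text.

The Python sketch below builds the values $M_k(T)$ from (17.3) and estimates three quantities:
- $E[M_1(T)M_2(T)]$, which the orthogonality lemma says is 0;
- $\sum_kE\,M_k(T)^2$;
- $E[(\sum_kM_k(T))^2]$.

The last two should both be close to $E\,M_T^2=T$. The helper names, the seed, and the cutoff `kmax` on the number of jumps are illustrative choices. The cutoff is an approximation that is harmless when `kmax` is far above $T$.

```python
import random


def single_jump_parts(rng, T, kmax):
    """Values M_k(T), k = 1..kmax, of the one-jump martingales (17.3) for M_t = P_t - t."""
    values, prev = [], 0.0
    for _ in range(kmax):
        s = prev + rng.expovariate(1.0)           # S_k, the k-th jump time
        jump_part = 1.0 if s <= T else 0.0        # A_k(T) = 1_(T >= S_k)
        compensator = min(T, s) - min(T, prev)    # time spent in (S_{k-1}, S_k] before T
        values.append(jump_part - compensator)
        prev = s
    return values


def orthogonality_check(T=1.0, kmax=15, trials=40_000, seed=7):
    """Monte Carlo estimates of E[M_1 M_2], sum_k E[M_k^2] and E[(sum_k M_k)^2] at time T."""
    rng = random.Random(seed)
    cross = sum_sq = total_sq = 0.0
    for _ in range(trials):
        m = single_jump_parts(rng, T, kmax)
        cross += m[0] * m[1]                # M_1(T) M_2(T)
        sum_sq += sum(v * v for v in m)     # sum_k M_k(T)^2
        total_sq += sum(m) ** 2             # (P_T - T)^2 when there are fewer than kmax jumps
    return cross / trials, sum_sq / trials, total_sq / trials
```

The estimate of $\sum_kE\,M_k(T)^2$ matching $E[(\sum_kM_k(T))^2]$ is the finite-sum version of the Pythagorean identity used in part (2) of the proof.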
17.2 Stochastic integrals

If $M_t$ is a square integrable martingale, then $M_t^2$ is a submartingale by Jensen's inequality for conditional expectations. Just as in the case of continuous martingales, we can use the Doob–Meyer decomposition (this time, we use Theorem 16.29 instead of Theorem 9.12) to find a predictable increasing process starting at 0, denoted $\langle M\rangle_t$, such that $M_t^2-\langle M\rangle_t$ is a martingale. Let us define
$$[M]_t=\langle M^c\rangle_t+\sum_{s\le t}|\Delta M_s|^2. \tag{17.5}$$
Here $M^c$ is the continuous part of the martingale $M$ as defined in Theorem 17.3.

As an example, let $M_t=P_t-t$, where $P_t$ is a Poisson process with parameter 1. Then $M^c_t=0$ and
$$[M]_t=\sum_{s\le t}(\Delta P_s)^2=\sum_{s\le t}\Delta P_s=P_t,$$
because all the jumps of $P_t$ are of size one. In this case $\langle M\rangle_t=t$; this follows from Proposition 17.4 below.
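The two brackets of the compensated Poisson process can be compared numerically. Pathwise, $[M]_t=P_t$. In expectation, both should match $\langle M\rangle_t=t$, since $E\,M_t^2=E\,[M]_t=\langle M\rangle_t$.

The following is a minimal Monte Carlo sketch in Python (standard library only). The helper names, the horizon $t=2$, the seed, and the number of trials are illustrative choices, not part of the text.

```python
import random


def poisson_count(rng, rate, t):
    """Number of jumps in [0, t] of a Poisson process, built from exponential gaps."""
    count, s = 0, rng.expovariate(rate)
    while s <= t:
        count += 1
        s += rng.expovariate(rate)
    return count


def compensated_poisson_check(t=2.0, trials=20_000, seed=0):
    """Monte Carlo estimates of E[M_t^2] and E[M]_t for M_t = P_t - t (rate 1)."""
    rng = random.Random(seed)
    sum_sq = 0.0
    sum_bracket = 0.0
    for _ in range(trials):
        p_t = poisson_count(rng, 1.0, t)
        m_t = p_t - t       # M_t = P_t - t
        bracket = p_t       # [M]_t = <M^c>_t + sum of (Delta M_s)^2 = 0 + P_t
        sum_sq += m_t ** 2
        sum_bracket += bracket
    # <M>_t = t is deterministic, so it is returned unchanged for comparison
    return sum_sq / trials, sum_bracket / trials, t
```

Both estimates should be close to $t$. Individual paths, by contrast, have $[M]_t=P_t$, which is random and in general differs from $\langle M\rangle_t=t$.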
Processes with jumps
In defining stochastic integrals, one could work with $\langle M\rangle_t$. However, $[M]_t$ is the process that shows up naturally in many formulas, such as the product formula.

Proposition 17.4 $M_t^2-[M]_t$ is a martingale.

Proof By the orthogonality lemma and (17.1) it is easy to see that
$$\langle M\rangle_t=\langle M^c\rangle_t+\sum_i\langle M_i\rangle_t.$$
Since $M_t^2-\langle M\rangle_t$ is a martingale, we need only show that $[M]_t-\langle M\rangle_t$ is a martingale. Since
$$[M]_t-\langle M\rangle_t=\Big(\langle M^c\rangle_t+\sum_{s\le t}|\Delta M_s|^2\Big)-\Big(\langle M^c\rangle_t+\sum_i\langle M_i\rangle_t\Big),$$
it suffices to show that $\sum_i\langle M_i\rangle_t-\sum_i\sum_{s\le t}|\Delta M_i(s)|^2$ is a martingale. By Exercise 17.1,
$$M_i(t)^2=2\int_0^tM_i(s-)\,dM_i(s)+\sum_{s\le t}|\Delta M_i(s)|^2, \tag{17.6}$$
where the first term on the right-hand side is a Lebesgue–Stieltjes integral. We approximate this integral by a Riemann sum and use the fact that $M_i$ is a martingale. This shows that the first term on the right in (17.6) is a martingale. Thus $M_i(t)^2-\sum_{s\le t}|\Delta M_i(s)|^2$ is a martingale. Since $M_i(t)^2-\langle M_i\rangle_t$ is also a martingale, summing over $i$ completes the proof.

If $H_s$ is of the form
$$H_s(\omega)=\sum_{i=1}^nK_i(\omega)1_{(a_i,b_i]}(s), \tag{17.7}$$
where each $K_i$ is bounded and $\mathcal F_{a_i}$ measurable, define the stochastic integral by
$$N_t=\int_0^tH_s\,dM_s=\sum_{i=1}^nK_i\big[M_{b_i\wedge t}-M_{a_i\wedge t}\big].$$
Very similar proofs to those in Chapter 10 show that $N_t$ is a martingale and (with $[\cdot]$ instead of $\langle\cdot\rangle$) that $N_t^2-[N]_t$ is a martingale.

If $H$ is $\mathcal P$ measurable and $E\int_0^\infty H_s^2\,d[M]_s<\infty$, approximate $H$ by integrands $H^n_s$ of the form (17.7) so that
$$E\int_0^\infty(H_s-H^n_s)^2\,d[M]_s\to0,$$
and define $N^n_t$ as the stochastic integral of $H^n$ with respect to $M_t$. By almost the same proof as that of Theorem 10.4, the martingales $N^n_t$ converge in $L^2$. We call the limit $N_t=\int_0^tH_s\,dM_s$ the stochastic integral of $H$ with respect to $M$. A subsequence of the $N^n$ converges uniformly over $t\ge0$, a.s., and therefore the limit has paths that are right continuous with left limits. The same arguments as those of Theorem 10.4 prove that the stochastic integral is a martingale and that
$$[N]_t=\int_0^tH_s^2\,d[M]_s.$$
A consequence of this last equation is that
$$E\Big(\int_0^tH_s\,dM_s\Big)^2=E\int_0^tH_s^2\,d[M]_s. \tag{17.8}$$
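The isometry (17.8) can be illustrated with a simple integrand of the form (17.7) against the compensated Poisson process $M_t=P_t-t$, whose bracket $[M]$ increases by 1 at each jump.

The sketch below is in Python (standard library). It makes two hypothetical choices:
- the intervals $(a_i,b_i]$;
- $K_i=1/(1+M_{a_i}^2)$, which is bounded and depends only on the path up to time $a_i$, so it is $\mathcal F_{a_i}$ measurable as (17.7) requires.

The code estimates $E\,N_T$, $E\,N_T^2$, and $E\,[N]_T$, where $[N]_T=\int_0^TH_s^2\,d[M]_s$. By (17.8) the last two should agree up to Monte Carlo error.

```python
import random


def poisson_jump_times(rng, rate, horizon):
    """Jump times in [0, horizon] of a Poisson process with the given rate."""
    times, s = [], rng.expovariate(rate)
    while s <= horizon:
        times.append(s)
        s += rng.expovariate(rate)
    return times


def isometry_check(trials=20_000, seed=1):
    """Estimate E N_T, E N_T^2 and E [N]_T for N_t = int_0^t H_s dM_s, M_t = P_t - t."""
    rng = random.Random(seed)
    intervals = [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0)]   # hypothetical (a_i, b_i], T = 2
    sum_n = sum_n2 = sum_bracket = 0.0
    for _ in range(trials):
        jumps = poisson_jump_times(rng, 1.0, 2.0)

        def m(t):
            # compensated Poisson process M_t = P_t - t along this path
            return sum(1 for u in jumps if u <= t) - t

        n_T = 0.0
        bracket = 0.0
        for a, b in intervals:
            k = 1.0 / (1.0 + m(a) ** 2)     # K_i: bounded, uses the path only up to a_i
            n_T += k * (m(b) - m(a))        # K_i [M_{b_i} - M_{a_i}]
            # [N]_T = int H_s^2 d[M]_s, and [M] increases by 1 at each jump in (a_i, b_i]
            bracket += k * k * sum(1 for u in jumps if a < u <= b)
        sum_n += n_T
        sum_n2 += n_T ** 2
        sum_bracket += bracket
    return sum_n / trials, sum_n2 / trials, sum_bracket / trials
```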
17.3 Itô's formula

We will first prove Itô's formula for a special case. Namely, we suppose $X_t=M_t+A_t$, where $M_t$ is a square integrable martingale and $A_t$ is a process of bounded variation whose total variation is integrable. The extension to semimartingales without the integrability conditions will be done later in the chapter (in Section 17.5) and is easy. Define $\langle X^c\rangle_t$ to be $\langle M^c\rangle_t$.

Theorem 17.5 Suppose $X_t=M_t+A_t$, where $M_t$ is a square integrable martingale and $A_t$ is a process with paths of bounded variation whose total variation is integrable. Suppose $f$ is $C^2$ on $\mathbb R$ with bounded first and second derivatives. Then
$$f(X_t)=f(X_0)+\int_0^tf'(X_{s-})\,dX_s+\tfrac12\int_0^tf''(X_{s-})\,d\langle X^c\rangle_s+\sum_{s\le t}\big[f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s\big]. \tag{17.9}$$
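Formula (17.9) can be sanity-checked pathwise on a simulated process
$$X_t=\sigma W_t+\sum_{s\le t}\Delta X_s,$$
a Brownian motion plus independent jumps, for which $\langle X^c\rangle_t=\sigma^2t$. This sketch is an illustration under stated assumptions, not part of the proof:
- it uses a fine time grid and allows at most one jump per grid step, a Bernoulli approximation of a Poisson stream with hypothetical rate `lam`;
- within a step, the continuous increment acts before the jump, so that $X_{s-}$ is well defined;
- jump sizes are drawn uniformly from $(-1,1)$, which is also a hypothetical choice.

The code accumulates discrete versions of the stochastic integral term, the quadratic variation term, and the jump term. It then compares $f(X_T)$ with their sum plus $f(X_0)$.

```python
import math
import random


def ito_with_jumps(f, df, d2f, T=1.0, n=200_000, sigma=1.0, lam=3.0, seed=1):
    """Return (f(X_T), right-hand side of (17.9)) for one simulated path."""
    rng = random.Random(seed)
    dt = T / n
    x = 0.0                                      # X_0
    f_x0 = f(x)
    stoch = quad = jump_term = 0.0
    for _ in range(n):
        dw = sigma * rng.gauss(0.0, math.sqrt(dt))   # continuous increment sigma dW
        stoch += df(x) * dw                      # f'(X_{s-}) dX_s, left-endpoint approximation
        quad += 0.5 * d2f(x) * sigma ** 2 * dt   # (1/2) f''(X_{s-}) d<X^c>_s
        x_minus = x + dw                         # value just before a possible jump
        if rng.random() < lam * dt:
            jump = rng.uniform(-1.0, 1.0)        # hypothetical jump-size law
            stoch += df(x_minus) * jump          # jump part of f'(X_{s-}) dX_s
            jump_term += f(x_minus + jump) - f(x_minus) - df(x_minus) * jump
            x = x_minus + jump
        else:
            x = x_minus
    return f(x), f_x0 + stoch + quad + jump_term
```

For $f=\sin$ the call `ito_with_jumps(math.sin, math.cos, lambda y: -math.sin(y))` should return two nearly equal numbers. If the jump sum is left out, the two sides typically disagree, which is why the last sum in (17.9) is needed.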
Proof The proof will be given in several steps. Set
$$S(t)=\int_0^tf'(X_{s-})\,dX_s,\qquad Q(t)=\tfrac12\int_0^tf''(X_{s-})\,d\langle X^c\rangle_s,$$
and
$$J(t)=\sum_{s\le t}\big[f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s\big].$$
We use these letters as mnemonics for “stochastic integral term,” “quadratic variation term,” and “jump term,” respectively.

Step 1. Suppose $X_t$ has a single jump at time $T$, which is either a predictable stopping time or a totally inaccessible stopping time. Suppose also that there exists $N>0$ such that $|\Delta M_T|+|\Delta A_T|\le N$ a.s.

If $T$ is totally inaccessible, let $C_t=\Delta M_T1_{(t\ge T)}$ and let $\widetilde C_t$ be its compensator. If we replace $M_t$ by $M_t-C_t+\widetilde C_t$ and $A_t$ by $A_t+C_t-\widetilde C_t$, we may assume that $M_t$ is continuous. If $T$ is a predictable stopping time, replace $M_t$ by $M_t-\Delta M_T1_{(t\ge T)}$ and $A_t$ by $A_t+\Delta M_T1_{(t\ge T)}$, and again we may assume $M$ is continuous.

Let $B_t=\Delta X_T1_{(t\ge T)}$. Set $\widetilde X_t=X_t-B_t$ and $\widetilde A_t=A_t-B_t$. Then $\widetilde X_t=M_t+\widetilde A_t$, and $\widetilde X_t$ is a continuous process that agrees with $X_t$ up to but not including time $T$. We have $X_{s-}=\widetilde X_s$ and $\Delta\widetilde X_s=0$ if $s\le T$. By Theorem 11.1
$$\begin{aligned}f(\widetilde X_t)&=f(\widetilde X_0)+\int_0^tf'(\widetilde X_s)\,d\widetilde X_s+\tfrac12\int_0^tf''(\widetilde X_s)\,d\langle M\rangle_s\\&=f(\widetilde X_0)+\int_0^tf'(\widetilde X_{s-})\,d\widetilde X_s+\tfrac12\int_0^tf''(\widetilde X_{s-})\,d\langle X^c\rangle_s+\sum_{s\le t}\big[f(\widetilde X_s)-f(\widetilde X_{s-})-f'(\widetilde X_{s-})\Delta\widetilde X_s\big],\end{aligned}$$
since the sum on the last line is zero. For $t<T$, $\widetilde X_t$ agrees with $X_t$. At time $T$, $f(X_t)$ has a jump of size $f(X_T)-f(X_{T-})$. Compare the jumps of the three terms on the right:
- the integral with respect to $X$, $S(t)$, jumps $f'(X_{T-})\Delta X_T$;
- $Q(t)$ does not jump at all;
- $J(t)$ jumps $f(X_T)-f(X_{T-})-f'(X_{T-})\Delta X_T$.

Therefore both sides of (17.9) jump the same amount at time $T$, and hence in this case (17.9) holds for $t\le T$.

Step 2. Suppose there exist times $T_1<T_2<\cdots$ with $T_n\to\infty$ such that:
- each $T_i$ is either a totally inaccessible stopping time or a predictable stopping time;
- for each $i$ there exists $N_i>0$ such that $|\Delta M_{T_i}|$ and $|\Delta A_{T_i}|$ are bounded by $N_i$;
- $X_t$ is continuous except at the times $T_1,T_2,\ldots$.

Let $T_0=0$. Fix $i$ for the moment. Define $X'_t=X_{T_i+t}$, define $A'_t$ and $M'_t$ similarly, and apply Step 1 to $X'$; time $t$ for $X'$ corresponds to time $T_i+t$ for $X$. We have for $T_i\le t\le T_{i+1}$
$$f(X_t)=f(X_{T_i})+\int_{T_i}^tf'(X_{s-})\,dX_s+\tfrac12\int_{T_i}^tf''(X_{s-})\,d\langle X^c\rangle_s+\sum_{T_i<s\le t}\big[f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s\big].$$
Thus for any $t$ we have
$$\begin{aligned}f(X_{T_{i+1}\wedge t})={}&f(X_{T_i\wedge t})+\int_{T_i\wedge t}^{T_{i+1}\wedge t}f'(X_{s-})\,dX_s+\tfrac12\int_{T_i\wedge t}^{T_{i+1}\wedge t}f''(X_{s-})\,d\langle X^c\rangle_s\\&+\sum_{T_i\wedge t<s\le T_{i+1}\wedge t}\big[f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s\big].\end{aligned}$$
Summing over $i$, we have (17.9) for each $t$.

Step 3. We now do the general case. As in the paragraphs preceding Theorem 17.3, we can find stopping times $S_1,S_2,\ldots$ such that each jump of $X$ occurs at one of the times $S_i$. We can also arrange that for each $i$ there exists $N_i>0$ with $|\Delta M_{S_i}|+|\Delta A_{S_i}|\le N_i$. Moreover each $S_i$ is either a predictable stopping time or a totally inaccessible stopping time. Let $M$ be decomposed into $M^c$ and $M_i$ as in Theorem 17.3 and let
$$A^c_t=A_t-\sum_{i=1}^\infty\Delta A_{S_i}1_{(t\ge S_i)}.$$
Since $A_t$ is of bounded variation, $A^c$ will be finite and continuous. Define
$$M^n_t=M^c_t+\sum_{i=1}^nM_i(t)\qquad\text{and}\qquad A^n_t=A^c_t+\sum_{i=1}^n\Delta A_{S_i}1_{(t\ge S_i)},$$
and let $X^n_t=M^n_t+A^n_t$. We already know that $M^n$ converges uniformly over $t\ge0$ to $M$ in $L^2$. Let
$$B^n_t=\sum_{i=1}^n(\Delta A_{S_i})^+1_{(t\ge S_i)},\qquad C^n_t=\sum_{i=1}^n(\Delta A_{S_i})^-1_{(t\ge S_i)},$$
and let $B_t=\sup_nB^n_t$, $C_t=\sup_nC^n_t$. Then the fact that $A$ has paths of bounded variation implies that with probability
one, $B^n_t\to B_t$ and $C^n_t\to C_t$ uniformly over $t\ge0$, and $A_t=A^c_t+B_t-C_t$. In particular, we have convergence in total variation norm:
$$E\int_0^\infty|d(A^n_s-A_s)|\to0.$$
We define $S^n(t)$, $Q^n(t)$, and $J^n(t)$ analogously to $S(t)$, $Q(t)$, and $J(t)$, respectively. By applying Step 2 to $X^n$, we have
$$f(X^n_t)=f(X^n_0)+S^n(t)+Q^n(t)+J^n(t),$$
and we need to show convergence of each term.

We now examine the various terms. Uniformly in $t$, $X^n_t$ converges to $X_t$ in probability, that is,
$$P\Big(\sup_{t\ge0}|X^n_t-X_t|>\varepsilon\Big)\to0$$
as $n\to\infty$ for each $\varepsilon>0$. Since $\int_0^td\langle M^c\rangle_s<\infty$, by dominated convergence
$$\int_0^tf''(X^n_{s-})\,d\langle M^c\rangle_s\to\int_0^tf''(X_{s-})\,d\langle M^c\rangle_s$$
in probability. Therefore $Q^n(t)\to Q(t)$ in probability. Also, $f(X^n_t)\to f(X_t)$ and $f(X^n_0)\to f(X_0)$, both in probability.

We now show $S^n(t)\to S(t)$. Write
$$\begin{aligned}\int_0^tf'(X^n_{s-})\,dA^n_s-\int_0^tf'(X_{s-})\,dA_s&=\Big[\int_0^tf'(X^n_{s-})\,dA^n_s-\int_0^tf'(X^n_{s-})\,dA_s\Big]+\Big[\int_0^tf'(X^n_{s-})\,dA_s-\int_0^tf'(X_{s-})\,dA_s\Big]\\&=I^n_1+I^n_2.\end{aligned}$$
We see that
$$|I^n_1|\le\|f'\|_\infty\int_0^t|dA^n_s-dA_s|\to0$$
as $n\to\infty$, while by dominated convergence, $|I^n_2|$ also tends to 0. We next look at the stochastic integral part of $S^n(t)$:
$$\begin{aligned}\int_0^tf'(X^n_{s-})\,dM^n_s-\int_0^tf'(X_{s-})\,dM_s&=\Big[\int_0^tf'(X^n_{s-})\,dM^n_s-\int_0^tf'(X_{s-})\,dM^n_s\Big]+\Big[\int_0^tf'(X_{s-})\,dM^n_s-\int_0^tf'(X_{s-})\,dM_s\Big]\\&=I^n_3+I^n_4.\end{aligned}$$
The square of the $L^2$ norm of $I^n_3$ is bounded by
$$E\int_0^t|f'(X^n_{s-})-f'(X_{s-})|^2\,d[M^n]_s\le E\int_0^t|f'(X^n_{s-})-f'(X_{s-})|^2\,d[M]_s,$$
which goes to zero by dominated convergence. Also
$$I^n_4=-\int_0^tf'(X_{s-})\sum_{i=n+1}^\infty dM_i(s),$$
so by the orthogonality lemma (Lemma 17.2), the square of the $L^2$ norm of $I^n_4$ is at most
$$\|f'\|_\infty^2\sum_{i=n+1}^\infty E\,[M_i]_\infty\le\|f'\|_\infty^2\sum_{i=n+1}^\infty E\,M_i(\infty)^2,$$
which goes to zero as $n\to\infty$.

Finally, we look at the convergence of $J^n$. The idea here is to break both $J(t)$ and $J^n(t)$ into two parts:
- the jumps that might be relatively large, namely the jumps at times $S_i$ for $i\le N$, where $N$ will be chosen appropriately;
- the remaining jumps.

Let $N>1$ be chosen later. Then
$$\begin{aligned}J(t)-J^n(t)={}&\sum_{s\le t}\big[f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s\big]-\sum_{s\le t}\big[f(X^n_s)-f(X^n_{s-})-f'(X^n_{s-})\Delta X^n_s\big]\\={}&\sum_{\{i:S_i\le t\}}\big[f(X_{S_i})-f(X_{S_i-})-f'(X_{S_i-})\Delta X_{S_i}\big]-\sum_{\{i:S_i\le t\}}\big[f(X^n_{S_i})-f(X^n_{S_i-})-f'(X^n_{S_i-})\Delta X^n_{S_i}\big]\\={}&\sum_{\{i>N:S_i\le t\}}\big[f(X_{S_i})-f(X_{S_i-})-f'(X_{S_i-})\Delta X_{S_i}\big]-\sum_{\{i>N:S_i\le t\}}\big[f(X^n_{S_i})-f(X^n_{S_i-})-f'(X^n_{S_i-})\Delta X^n_{S_i}\big]\\&+\sum_{\{i\le N:S_i\le t\}}\Big(\big[f(X_{S_i})-f(X_{S_i-})-f'(X_{S_i-})\Delta X_{S_i}\big]-\big[f(X^n_{S_i})-f(X^n_{S_i-})-f'(X^n_{S_i-})\Delta X^n_{S_i}\big]\Big)\\={}&I^N_5-I^{n,N}_6+I^{n,N}_7.\end{aligned}$$
Since $M$ and $A$ are right continuous with left limits, $|\Delta M_{S_i}|\le1/2$ and $|\Delta A_{S_i}|\le1/2$ if $i$ is large enough (depending on $\omega$). Then $|\Delta X_{S_i}|\le1$, and also
$$|\Delta X_{S_i}|^2\le2|\Delta M_{S_i}|^2+2|\Delta A_{S_i}|^2\le2|\Delta M_{S_i}|^2+|\Delta A_{S_i}|.$$
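We have
$$|I^N_5|\le\|f''\|_\infty\sum_{\{i>N,\,S_i\le t\}}(\Delta X_{S_i})^2$$
and
$$|I^{n,N}_6|\le\|f''\|_\infty\sum_{\{n\ge i>N,\,S_i\le t\}}(\Delta X_{S_i})^2.$$
We know that $\sum_{i=1}^\infty|\Delta M_{S_i}|^2\le[M]_\infty<\infty$ and $\sum_{i=1}^\infty|\Delta A_{S_i}|<\infty$. Hence, given $\varepsilon>0$, we can choose $N$ large such that
$$P\big(|I^N_5|+|I^{n,N}_6|>\varepsilon\big)<\varepsilon$$
for all $n$. Once we choose $N$, we see that $I^{n,N}_7$ tends to zero in probability as $n\to\infty$, since $X^n_t$ converges in probability to $X_t$ uniformly over $t\ge0$. We conclude that $J^n(t)$ converges to $J(t)$ in probability as $n\to\infty$. This completes the proof.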
17.4 The reduction theorem

Let $M$ be a process adapted to $\{\mathcal F_t\}$. Suppose there exist stopping times $T_n$ increasing to $\infty$ such that each process $M_{t\wedge T_n}$ is a uniformly integrable martingale. Then we say $M$ is a local martingale. If each $M_{t\wedge T_n}$ is a square integrable martingale, we say $M$ is a locally square integrable martingale. We say a stopping time $T$ reduces a process $M$ if $M_{t\wedge T}$ is a uniformly integrable martingale.

Lemma 17.6 (1) The sum of two local martingales is a local martingale.
(2) If $S$ and $T$ both reduce $M$, then so does $S\vee T$.
(3) If there exist times $T_n\to\infty$ such that $M_{t\wedge T_n}$ is a local martingale for each $n$, then $M$ is a local martingale.

Proof (1) If the sequence $S_n$ reduces $M$ and the sequence $T_n$ reduces $N$, then $S_n\wedge T_n$ will reduce $M+N$.
(2) $M_{t\wedge(S\vee T)}$ is bounded in absolute value by $|M_{t\wedge T}|+|M_{t\wedge S}|$. Both $\{|M_{t\wedge T}|\}$ and $\{|M_{t\wedge S}|\}$ are uniformly integrable families of random variables. Now use Proposition A.17.
(3) Let $S_{nm}$ be a family of stopping times reducing $M_{t\wedge T_n}$ and let $S'_{nm}=S_{nm}\wedge T_n$. Renumber the stopping times $S'_{nm}$ into a single sequence $R_1,R_2,\ldots$ and let $H_k=R_1\vee\cdots\vee R_k$. Note $H_k\uparrow\infty$. To show that $H_k$ reduces $M$, it suffices by (2) to show that each $R_i$ reduces $M$. But $R_i=S'_{nm}$ for some $m,n$, so $M_{t\wedge R_i}=M_{t\wedge S_{nm}\wedge T_n}$ is a uniformly integrable martingale.

Let $M$ be a local martingale with $M_0=0$. We say that a stopping time $T$ strongly reduces $M$ if $T$ reduces $M$ and the martingale $E[\,|M_T|\mid\mathcal F_s]$ is bounded on $[0,T)$, that is, there exists $K>0$ such that
$$\sup_{0\le s<T}E[\,|M_T|\mid\mathcal F_s]\le K,\qquad\text{a.s.}$$

Lemma 17.7 (1) If $T$ strongly reduces $M$ and $S\le T$, then $S$ strongly reduces $M$.
(2) If $S$ and $T$ strongly reduce $M$, then so does $S\vee T$.
(3) If $Y_\infty$ is integrable, then $E[E[Y_\infty\mid\mathcal F_T]\mid\mathcal F_S]=E[Y_\infty\mid\mathcal F_{S\wedge T}]$.
Proof (1) Note $E[\,|M_S|\mid\mathcal F_s]\le E[\,|M_T|\mid\mathcal F_s]$ by Jensen's inequality, hence $S$ strongly reduces $M$.
(2) It suffices to show that $E[\,|M_{S\vee T}|\mid\mathcal F_t]$ is bounded for $t<T$, since by symmetry the same will hold for $t<S$. For $t<T$ this expression is bounded by
$$E[\,|M_T|\mid\mathcal F_t]+E[\,|M_S|1_{(S>T)}\mid\mathcal F_t].$$
The first term is bounded since $T$ strongly reduces $M$. For the second term, if $t<T$, $1_{(t<T)}E[\,|M_S|1_{(S>T)}\mid\mathcal F_t]=E[\,|M_S|1_{(S>T)}1_{(t<T)}$ …

Let $R_n\uparrow\infty$ be a sequence reducing $M$. Let
$$S_{nm}=R_n\wedge\inf\{t:E[\,|M_{R_n}|\mid\mathcal F_t]\ge m\}.$$
Arrange the stopping times $S_{nm}$ into a single sequence $\{U_n\}$ and let $T_n=U_1\vee\cdots\vee U_n$. In view of the preceding lemmas, we need to show $U_i$ strongly reduces $M$, which will follow if $S_{nm}$ does for each $n$ and $m$. Let $Y_t=E[\,|M_{R_n}|\mid\mathcal F_t]$, where we take a version whose paths are right continuous with left limits. $Y$ is bounded by $m$ on $[0,S_{nm})$. By Jensen's inequality for conditional expectations and Lemma 17.7,
$$\begin{aligned}E[\,|M_{S_{nm}}|1_{(t<S_{nm})}\mid\mathcal F_t]&\le E\big[E[\,|M_{R_n}|\mid\mathcal F_{S_{nm}}]1_{(t<S_{nm})}\mid\mathcal F_t\big]=E\big[E[\,|M_{R_n}|1_{(t<S_{nm})}\mid\mathcal F_{S_{nm}}]\mid\mathcal F_t\big]\\&=E[\,|M_{R_n}|1_{(t<S_{nm})}\mid\mathcal F_{S_{nm}\wedge t}]=Y_{S_{nm}\wedge t}1_{(t<S_{nm})}=Y_t1_{(t<S_{nm})}\le m.\end{aligned}$$
We are done.

Our main theorem of this section is the following.

Theorem 17.9 Suppose $M$ is a local martingale. Then there exist stopping times $T_n\uparrow\infty$ such that $M_{t\wedge T_n}=U^n_t+V^n_t$, where:
- each $U^n$ is a square integrable martingale;
- each $V^n$ is a martingale whose paths are of bounded variation, and the total variation of the paths of $V^n$ is integrable.

Moreover, $U^n_t=U^n_{T_n}$ and $V^n_t=V^n_{T_n}$ for $t\ge T_n$.

The last sentence of the statement of the theorem says that $U^n$ and $V^n$ are both constant from time $T_n$ on.

Proof It suffices to prove the following: if $M$ is a local martingale with $M_0=0$ and $T$ strongly reduces $M$, then $M_{t\wedge T}$ can be written as $U+V$ with $U$ and $V$ of the described form. Thus we
may assume $M_t=M_T$ for $t\ge T$, $|M_T|$ is integrable, and $E[\,|M_T|\mid\mathcal F_t]$ is bounded, say by $K$, on $[0,T)$.

Let $A_t=M_T1_{(t\ge T)}=M_t1_{(t\ge T)}$, let $\widetilde A$ be the compensator of $A$, let $V=A-\widetilde A$, and let $U=M-A+\widetilde A$. Then $V$ is a martingale of bounded variation. We compute the expectation of the total variation of $V$. Let $B_t=M_T^+1_{(t\ge T)}$ and $C_t=M_T^-1_{(t\ge T)}$. Then the expectation of the total variation of $A$ is bounded by $E\,|M_T|<\infty$, and the expectation of the total variation of $\widetilde A$ is bounded by
$$E\,\widetilde B_\infty+E\,\widetilde C_\infty=E\,B_\infty+E\,C_\infty\le E\,|M_T|<\infty.$$
We need to show $U$ is square integrable. Note $|M_t-A_t|=|M_t|1_{(t<T)}$ …
17.5 Semimartingales

We define a semimartingale to be a process of the form $X_t=X_0+M_t+A_t$, where:
- $X_0$ is finite, a.s., and is $\mathcal F_0$ measurable;
- $M_t$ is a local martingale;
- $A_t$ is a process whose paths have bounded variation on $[0,t]$ for each $t$.

If $M_t$ is a local martingale, let $T_n$ be a sequence of stopping times as in Theorem 17.9. We set $M^c_{t\wedge T_n}=(U^n)^c_t$ for each $n$ and
$$[M]_{t\wedge T_n}=\langle M^c\rangle_{t\wedge T_n}+\sum_{s\le t\wedge T_n}\Delta M_s^2.$$
It is easy to see that these definitions are independent of how we decompose $M$ into $U^n+V^n$ and of which sequence of stopping times $T_n$ strongly reducing $M$ we choose. We define $\langle X^c\rangle_t=\langle M^c\rangle_t$ and define
$$[X]_t=\langle X^c\rangle_t+\sum_{s\le t}\Delta X_s^2.$$

We say an adapted process $H$ is locally bounded if there exist stopping times $S_n\uparrow\infty$ and constants $K_n$ such that on $[0,S_n]$ the process $H$ is bounded by $K_n$. If $X_t$ is a semimartingale and $H$ is a locally bounded predictable process, define $\int_0^tH_s\,dX_s$ as follows. Let $X_t=X_0+M_t+A_t$, and let $R_n=T_n\wedge S_n$, where the $T_n$ are as in Theorem 17.9 and the $S_n$ are the stopping times used in the definition of locally bounded.
- Set $\int_0^{t\wedge R_n}H_s\,dM_s$ to be the stochastic integral as defined in Section 17.2.
- Define $\int_0^{t\wedge R_n}H_s\,dA_s$ to be the usual Lebesgue–Stieltjes integral.
- Define the stochastic integral with respect to $X$ as the sum of these two.

Since $R_n\uparrow\infty$, this defines $\int_0^tH_s\,dX_s$ for all $t$. One needs to check that the definition does not depend on the decomposition of $X$ into $M$ and $A$ nor on the choice of stopping times $R_n$.

We now state the general Itô formula.
Theorem 17.10 Suppose $X$ is a semimartingale and $f$ is $C^2$. Then
$$f(X_t)=f(X_0)+\int_0^tf'(X_{s-})\,dX_s+\tfrac12\int_0^tf''(X_{s-})\,d\langle X^c\rangle_s+\sum_{s\le t}\big[f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s\big].$$

Proof First suppose $f$ has bounded first and second derivatives. Let $T_n$ be stopping times strongly reducing $M_t$, let $S_n=\inf\{t:\int_0^t|dA_s|\ge n\}$, let $R_n=T_n\wedge S_n$, and let $X^n_t=X_{t\wedge R_n}-\Delta A_{R_n}1_{(t\ge R_n)}$. Since the total variation of $A_t$ is bounded on $[0,R_n)$, $X^n$ is a semimartingale which is the sum of a square integrable martingale and a process whose total variation is integrable. We apply Theorem 17.5 to this process. $X^n_t$ agrees with $X_t$ on $[0,R_n)$. As in the proof of Theorem 17.5, we look at the jump at time $R_n$: both sides of Itô's formula jump the same amount there, and so Itô's formula holds for $X_t$ on $[0,R_n]$.

If we now only assume that $f$ is $C^2$, we approximate $f$ by a sequence $f_m$ of functions that are $C^2$ and whose first and second derivatives are bounded, and then let $m\to\infty$; we leave the details to the reader. Thus Itô's formula holds for $t$ in the interval $[0,R_n]$ and for $f$ without the assumption of bounded derivatives. Finally, we observe that $R_n\to\infty$, so except for a null set, Itô's formula holds for each $t$.

The proof of the following corollary is similar to the proof of Itô's formula.

Corollary 17.11 Suppose $X_t=(X^1_t,\ldots,X^d_t)$ is a process taking values in $\mathbb R^d$ such that each component is a semimartingale, and $f$ is a $C^2$ function on $\mathbb R^d$. Then
$$\begin{aligned}f(X_t)={}&f(X_0)+\int_0^t\sum_{i=1}^d\frac{\partial f}{\partial x_i}(X_{s-})\,dX^i_s+\frac12\int_0^t\sum_{i,j=1}^d\frac{\partial^2f}{\partial x_i\partial x_j}(X_{s-})\,d\langle(X^i)^c,(X^j)^c\rangle_s\\&+\sum_{s\le t}\Big[f(X_s)-f(X_{s-})-\sum_{i=1}^d\frac{\partial f}{\partial x_i}(X_{s-})\Delta X^i_s\Big].\end{aligned}$$

If $X$ and $Y$ are real-valued semimartingales, define $[X,Y]_t=\frac12([X+Y]_t-[X]_t-[Y]_t)$. The following corollary is the product formula for semimartingales with jumps.

Corollary 17.12 If $X$ and $Y$ are semimartingales of the above form,
$$X_tY_t=X_0Y_0+\int_0^tX_{s-}\,dY_s+\int_0^tY_{s-}\,dX_s+[X,Y]_t.$$
Proof Apply Theorem 17.10 with $f(x)=x^2$. Since in this case
$$f(X_s)-f(X_{s-})-f'(X_{s-})\Delta X_s=\Delta X_s^2, \tag{17.10}$$
we obtain
$$X_t^2=X_0^2+2\int_0^tX_{s-}\,dX_s+[X]_t. \tag{17.11}$$
Apply (17.11) with $X$ replaced by $Y$ and by $X+Y$, and use $X_tY_t=\frac12[(X_t+Y_t)^2-X_t^2-Y_t^2]$. This gives our result.
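Consider a path that moves only by finitely many jumps. Then:
- $\langle X^c\rangle=0$;
- the integrals are Lebesgue–Stieltjes integrals that reduce to sums over the jump times;
- polarization gives $[X,Y]_t=\sum_{s\le t}\Delta X_s\,\Delta Y_s$.

In that special case (17.11) and Corollary 17.12 become exact algebraic identities. The following Python sketch checks them with rational arithmetic; the function name and the jump data in the example are arbitrary illustrations.

```python
from fractions import Fraction


def check_jump_identities(x0, y0, jumps):
    """Check (17.11) and Corollary 17.12 for pure-jump paths with finitely many jumps.

    `jumps` lists the pairs (Delta X_s, Delta Y_s) at the successive jump times;
    the paths are constant between jumps, so <X^c> = 0 and every
    Lebesgue-Stieltjes integral reduces to a finite sum over jump times.
    """
    x, y = Fraction(x0), Fraction(y0)
    int_x_dy = int_y_dx = int_x_dx = Fraction(0)
    bracket_xy = bracket_x = Fraction(0)
    for dx, dy in jumps:
        dx, dy = Fraction(dx), Fraction(dy)
        int_x_dy += x * dy        # X_{s-} dY_s
        int_y_dx += y * dx        # Y_{s-} dX_s
        int_x_dx += x * dx        # X_{s-} dX_s
        bracket_xy += dx * dy     # [X, Y]_t is the sum of Delta X_s Delta Y_s here
        bracket_x += dx * dx      # [X]_t is the sum of (Delta X_s)^2 here
        x += dx
        y += dy
    square_ok = x * x == Fraction(x0) ** 2 + 2 * int_x_dx + bracket_x
    product_ok = x * y == Fraction(x0) * Fraction(y0) + int_x_dy + int_y_dx + bracket_xy
    return square_ok, product_ok
```

For example, `check_jump_identities(1, -2, [(3, 1), ("1/2", -4), (-2, "5/3")])` returns `(True, True)`. Each jump contributes $x\,\Delta y+y\,\Delta x+\Delta x\,\Delta y$ to the change in $xy$, and the identities telescope.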
17.6 Exponential of a semimartingale

A function with finite total variation is purely discontinuous if it has no continuous part, i.e., $a(t)=\sum_{s\le t}\Delta a(s)$.

Theorem 17.13 Let $X_t$ be a semimartingale. Define
$$Z_t=Z_0\exp\Big(X_t-\tfrac12\langle X^c\rangle_t\Big)\prod_{0\le s\le t}(1+\Delta X_s)e^{-\Delta X_s}. \tag{17.12}$$
Then:
- $Z_t$ is a semimartingale;
- $\prod_{0\le s\le t}(1+\Delta X_s)e^{-\Delta X_s}$ is a process of bounded variation whose paths are purely discontinuous;
- $Z_t$ satisfies
$$Z_t=Z_0+\int_0^tZ_{s-}\,dX_s. \tag{17.13}$$

Proof The product of finitely many purely discontinuous functions of bounded variation is again a function of the same type. Also, in each finite interval there are only finitely many jumps of $X_t$ of size larger in absolute value than $1/2$. It therefore suffices to consider
$$V_t=\prod_{\{0\le s\le t:\,|\Delta X_s|\le1/2\}}(1+\Delta X_s)e^{-\Delta X_s}.$$
Note
$$\log V_t=\sum_{s\le t}\big(\log(1+\Delta X_s)-\Delta X_s\big)1_{(|\Delta X_s|\le1/2)},$$
which is bounded in absolute value by a constant times $\sum_{s\le t}\Delta X_s^2<\infty$. Hence $\log V$ is a purely discontinuous process of bounded variation. Exercise 17.4 then tells us that $V_t=\exp(\log V_t)$ is purely discontinuous also.

We apply the multivariate version of Itô's formula (Corollary 17.11). Let $f(x,y)=e^xy$ and let $Z_t=f(K_t,V_t)$, where $K_t=X_t-\frac12\langle X^c\rangle_t$. We obtain
$$\begin{aligned}Z_t-Z_0={}&\int_0^tZ_{s-}\,dK_s+\int_0^te^{K_{s-}}\,dV_s+\frac12\int_0^tZ_{s-}\,d\langle K^c\rangle_s+\sum_{s\le t}\big[Z_s-Z_{s-}-Z_{s-}\Delta K_s-e^{K_{s-}}\Delta V_s\big]\\={}&I_1+I_2+I_3+I_4.\end{aligned}$$
We have
$$I_1=\int_0^tZ_{s-}\,dX_s-\frac12\int_0^tZ_{s-}\,d\langle X^c\rangle_s.$$
Since $V_t$ is purely discontinuous,
$$I_2=\sum_{s\le t}e^{K_{s-}}\Delta V_s.$$
Since $K^c=X^c$,
$$I_3=\frac12\int_0^tZ_{s-}\,d\langle X^c\rangle_s.$$
Note that $Z_s=Z_{s-}(1+\Delta X_s)$ and $Z_{s-}\Delta K_s=Z_{s-}\Delta X_s$, so
$$I_4=-\sum_{s\le t}e^{K_{s-}}\Delta V_s.$$
Summing yields (17.13). The solution $Z$ of (17.13) is called the exponential of the semimartingale $X$.
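Theorem 17.13 can be illustrated numerically. The Euler product $\prod_k(1+\Delta X_{t_k})$ is the natural discretization of $Z_t=1+\int_0^tZ_{s-}\,dX_s$, and it should be close to the closed form (17.12).

The Python sketch below makes the following assumptions:
- the drift `mu`, the volatility `sigma`, the jump rate `lam`, and the jump law are hypothetical;
- jump sizes lie in $(-1/2,1/2)$, so that $1+\Delta X_s>0$;
- $X_0=0$ and $Z_0=1$;
- a jump inside a grid step occurs after that step's continuous increment.

```python
import math
import random


def exponential_check(T=1.0, n=200_000, sigma=0.5, mu=0.1, lam=2.0, seed=2):
    """Compare an Euler scheme for dZ = Z_- dX with the closed form (17.12).

    Assumed model (illustrative only): X_t = mu t + sigma W_t plus jumps arriving
    at rate lam with sizes uniform on (-0.5, 0.5); X_0 = 0 and Z_0 = 1.
    """
    rng = random.Random(seed)
    dt = T / n
    x = 0.0
    z_euler = 1.0
    jump_product = 1.0          # product over jumps of (1 + Delta X_s) e^{-Delta X_s}
    for _ in range(n):
        dx = mu * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        z_euler *= 1.0 + dx     # Z_k = Z_{k-1} (1 + Delta X_k), continuous increment
        x += dx
        if rng.random() < lam * dt:
            jump = rng.uniform(-0.5, 0.5)
            z_euler *= 1.0 + jump      # at a jump, Delta Z_s = Z_{s-} Delta X_s exactly
            jump_product *= (1.0 + jump) * math.exp(-jump)
            x += jump
    # closed form (17.12) with <X^c>_T = sigma^2 T
    z_closed = math.exp(x - 0.5 * sigma ** 2 * T) * jump_product
    return z_euler, z_closed
```

The ratio of the two returned values should be very close to 1. The factor $e^{-\frac12\langle X^c\rangle_t}$ accounts for the accumulated $-\frac12(\Delta X)^2$ terms of the continuous part. The product over jumps corrects for the difference between $1+\Delta X_s$ and $e^{\Delta X_s}$.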
17.7 The Girsanov theorem

Let $P$ and $Q$ be two equivalent probability measures, that is, $P$ and $Q$ are mutually absolutely continuous. Let $M_\infty$ be the Radon–Nikodym derivative of $Q$ with respect to $P$ and let $M_t=E[M_\infty\mid\mathcal F_t]$. The martingale $M_t$ is uniformly integrable since $M_\infty\in L^1(P)$. Once a non-negative martingale hits zero, it is easy to see that it must be zero from then on; this is Exercise 17.5. Since $Q$ and $P$ are equivalent, $M_\infty>0$, a.s., and so $M_t$ never equals zero, a.s. Observe that $M_T$ is the Radon–Nikodym derivative of $Q$ with respect to $P$ on $\mathcal F_T$. Let $L_t$ be the local martingale defined by
$$L_t=\int_0^t\frac1{M_{s-}}\,dM_s,$$
so that $dM_t=M_{t-}\,dL_t$, or $M$ is the exponential of $L$.

Theorem 17.14 Suppose $X$ is a local martingale with respect to $P$. Then $X_t-D_t$ is a local martingale with respect to $Q$, where
$$D_t=\int_0^t\frac1{M_s}\,d[X,M]_s=\int_0^t\frac{M_{s-}}{M_s}\,d[X,L]_s.$$
Note that in the formula for $D$, we are using a Lebesgue–Stieltjes integral.

Proof Exercise 17.6 tells us that it suffices to show that $M_t(X_t-D_t)$ is a local martingale with respect to $P$. By Corollary 17.12,
$$d(M(X-D))_t=(X-D)_{t-}\,dM_t+M_{t-}\,dX_t-M_{t-}\,dD_t+d[M,X-D]_t.$$
The first two terms on the right are local martingales with respect to $P$. Since $D$ is of bounded variation, the continuous part of $D$ is zero, hence
$$[M,D]_t=\sum_{s\le t}\Delta M_s\,\Delta D_s=\int_0^t\Delta M_s\,dD_s.$$
Thus
$$M_t(X_t-D_t)=\text{local martingale}+[M,X]_t-\int_0^tM_s\,dD_s.$$
By the definition of $D$, $\int_0^tM_s\,dD_s=[X,M]_t$. This shows that $M_t(X_t-D_t)$ is a local martingale.
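Here is a concrete case of Theorem 17.14, worked out as an illustration; it is not taken from the text. Let $P_t$ be a rate-1 Poisson process under $P$, fix a hypothetical constant $c>0$, and let
$$M_t=c^{P_t}e^{-(c-1)t}.$$
This $M$ is the exponential of $L_t=(c-1)(P_t-t)$, and under $Q$ it makes $P_t$ a Poisson process with rate $c$.

Take $X_t=P_t-t$. Every jump has $\Delta X_s=1$ and $\Delta M_s=(c-1)M_{s-}$. Therefore
$$D_t=\int_0^tM_s^{-1}\,d[X,M]_s=\frac{c-1}{c}P_t,\qquad X_t-D_t=\frac{P_t}{c}-t.$$

Since $M_T$ is the density of $Q$ on $\mathcal F_T$, the Python sketch below estimates $E_Q[Y]$ as $E_P[M_TY]$. The $Q$-mean of $X_T-D_T$ should be near 0, while that of $X_T$ should be near $(c-1)T$.

```python
import math
import random


def poisson_sample(rng, mean):
    """Poisson(mean) variate: count events of a rate-`mean` process in [0, 1]."""
    count, s = 0, rng.expovariate(mean)
    while s <= 1.0:
        count += 1
        s += rng.expovariate(mean)
    return count


def girsanov_poisson_check(c=2.0, T=1.0, trials=50_000, seed=4):
    """Estimate E_Q[1], E_Q[X_T] and E_Q[X_T - D_T] as E_P[M_T * (...)]."""
    rng = random.Random(seed)
    sum_m = sum_mx = sum_mxd = 0.0
    for _ in range(trials):
        p_T = poisson_sample(rng, T)                 # P_T under P (rate 1)
        m_T = c ** p_T * math.exp(-(c - 1.0) * T)    # dQ/dP on F_T
        x_T = p_T - T                                # X_T = P_T - T, a P-martingale
        d_T = (c - 1.0) / c * p_T                    # D_T computed above
        sum_m += m_T
        sum_mx += m_T * x_T
        sum_mxd += m_T * (x_T - d_T)
    return sum_m / trials, sum_mx / trials, sum_mxd / trials
```

The first estimate checks that $M_T$ is a probability density. The second shows the drift $(c-1)T$ that $X$ acquires under $Q$. The third shows that subtracting $D$ removes it.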
Exercises

17.1 Suppose $a(t)$ is a deterministic right-continuous nondecreasing function of $t$ with $a(0)=0$. Prove the following formulas:
$$a(t)^2=\int_0^t\big[(a(t)-a(s))+(a(t)-a(s-))\big]\,da(s), \tag{17.14}$$
and
$$a(t)^2=\int_0^t(2a(s-)+\Delta a(s))\,da(s)=2\int_0^ta(s-)\,da(s)+\sum_{s\le t}(\Delta a(s))^2. \tag{17.15}$$
Hint: First do the case where $a$ has only finitely many discontinuities.

17.2 If $A_t$ is an increasing process and $\widetilde A_t$ is its compensator, show that $\widetilde A$ jumps only when $A$ does.

17.3 Let $P^j_t$, $j\in\mathbb Z$, be independent Poisson processes with parameter $\lambda_j$. Suppose $\lambda_j=\lambda_{-j}$ for each $j\ne0$. Suppose $\lambda_j$ decreases as $j$ increases for $j\ge1$. Let
$$X_t=\sum_{j\in\mathbb Z}jP^j_t.$$
Determine reasonable conditions on the sequence $\lambda_j$ so that $X$ is each of the following:
(a) a semimartingale;
(b) a local martingale;
(c) a martingale;
(d) a locally square integrable martingale.

17.4 Show that if $f(t)$ is a purely discontinuous function, then $e^{f(t)}$ is also.

17.5 Suppose $M$ is a non-negative right-continuous martingale and $T=\inf\{t>0:M_t=0\}$. Show that $M_t=0$ on $(t>T)$.

17.6 Suppose $P$ and $Q$ are two equivalent probability measures, $M_\infty$ is the Radon–Nikodym derivative of $Q$ with respect to $P$, and $M_t=E[M_\infty\mid\mathcal F_t]$. Show that $Y_t$ is a local martingale with respect to $Q$ if and only if $Y_tM_t$ is a local martingale with respect to $P$.

17.7 Suppose $X_t$ is an increasing process with the following properties:
- its paths are right continuous with left limits;
- $X_0=0$, a.s.;
- $X$ is purely discontinuous;
- all jumps are of size $+1$ only;
- $X_t-t$ is a martingale.

Prove that $X$ is a Poisson process. Hint: Imitate the proof of Theorem 12.1. When using Itô's formula, it is important to use the fact that $\Delta X_t$ is always 0 or 1.
17.8 Suppose $X_t$ is an increasing process with the following properties:
- its paths are right continuous with left limits;
- $X_0=0$, a.s.;
- $X$ is purely discontinuous;
- all jumps are of size $+1$ only;
- $\lim_{t\to\infty}X_t=\infty$, a.s.

Prove that $X$ is a time change of a Poisson process.

17.9 Suppose $P_t$ is a Poisson process with parameter $\lambda$, $\{\mathcal F_t\}$ is the minimal augmented filtration for $P$, and $M_t=P_t-\lambda t$. Suppose $Y$ is an $\mathcal F_1$ measurable random variable with finite mean and variance. Prove that there exists a predictable process $H$ such that
$$Y=E\,Y+\int_0^1H_s\,dM_s.$$

17.10 Let $P^1$ and $P^2$ be two independent Poisson processes with the same parameter. Let $X_t=P^1_t-P^2_t$ and let $\{\mathcal F_t\}$ be the minimal augmented filtration for $X$. Find a bounded mean zero random variable $Y$ that is $\mathcal F_1$ measurable and that does not satisfy
$$Y=\int_0^1H_s\,dX_s$$
for any predictable process $H$.
18 Poisson point processes
Poisson point processes are random measures that are related to Poisson processes. We will use them when we study Lévy processes in Chapter 42. Poisson point processes are also useful in the study of excursions, even excursions of a continuous process such as Brownian motion (see Chapter 27), and they arise when studying stochastic differential equations with jumps.

Let $\mathcal S$ be a metric space, $\mathcal G$ the collection of Borel subsets of $\mathcal S$, and $\lambda$ a measure on $(\mathcal S,\mathcal G)$.

Definition 18.1 We say a map $N:\Omega\times[0,\infty)\times\mathcal G\to\{0,1,2,\ldots\}$ (writing $N_t(A)$ for $N(\omega,t,A)$) is a Poisson point process if
(1) for each Borel subset $A$ of $\mathcal S$ with $\lambda(A)<\infty$, the process $N_t(A)$ is a Poisson process with parameter $\lambda(A)$, and
(2) for each $t$ and $\omega$, $N(t,\cdot)$ is a measure on $\mathcal G$.

A model to keep in mind is where $\mathcal S=\mathbb R$ and $\lambda$ is Lebesgue measure. For each $\omega$ there is a collection of points $\{(s,z)\}$ (where the collection depends on $\omega$). The number of points in this collection with $s\le t$ and $z$ in a subset $A$ is $N_t(A)(\omega)$. Since $\lambda(\mathbb R)=\infty$, there are infinitely many points in every time interval.

A consequence of the definition is that since $\lambda(\emptyset)=0$, $N_t(\emptyset)$ is a Poisson process with parameter 0; in other words, $N_t(\emptyset)$ is identically zero. Our main result is that $N_t(A)$ and $N_t(B)$ are independent if $A$ and $B$ are disjoint.

Theorem 18.2 Let $\{\mathcal F_t\}$ be a filtration satisfying the usual conditions. Let $\mathcal S$ be a metric space furnished with a positive measure $\lambda$. Suppose that $N_t(A)$ is a Poisson point process with respect to the measure $\lambda$. If $A_1,\ldots,A_n$ are pairwise disjoint measurable subsets of $\mathcal S$ with $\lambda(A_k)<\infty$ for $k=1,\ldots,n$, then the processes $N_t(A_1),\ldots,N_t(A_n)$ are mutually independent.

Proof We first observe the following. Because $N(t,\cdot)$ is a measure and $A_1,A_2,\ldots,A_n$ are disjoint, $\sum_{k=1}^nN_t(A_k)=N_t(\cup_{k=1}^nA_k)$ is a Poisson process with finite parameter. A Poisson process has jumps of size one only, hence no two of the $N_t(A_k)$ have jumps at the same time.
To prove the theorem, it suffices to let $0=r_0<r_1<\cdots<r_m$ and show that the random variables
$$\{N_{r_j}(A_k)-N_{r_{j-1}}(A_k):1\le j\le m,\ 1\le k\le n\}$$
are independent. For each $j$ and each $k$, $N_{r_j}(A_k)-N_{r_{j-1}}(A_k)$ is independent of $\mathcal F_{r_{j-1}}$. Hence it suffices to show that for each $j\le m$, the random variables
$$\{N_{r_j}(A_k)-N_{r_{j-1}}(A_k):1\le k\le n\}$$
are independent. We will do the case $j=m=1$ and write $r$ for $r_j$ for simplicity; the case when $j,m>1$ differs only in notation. We will prove this using induction.

We start with the case $n=2$ and show the independence of $N_r(A_1)$ and $N_r(A_2)$. Each $N_t(A_k)$ is a Poisson process, and so $N_t(A_k)$ has moments of all orders. Let $u_1,u_2\in\mathbb R$ and set
$$\varphi_k=\lambda(A_k)(e^{iu_k}-1),\qquad k=1,2.$$
Let $M^k_t=e^{iu_kN_t(A_k)-t\varphi_k}$. We see that $M^k_t$ is a martingale because $E\,e^{iu_kN_t(A_k)}=e^{t\varphi_k}$, and therefore
$$E[M^k_t\mid\mathcal F_s]=M^k_sE\big[e^{iu_k(N_t(A_k)-N_s(A_k))-(t-s)\varphi_k}\mid\mathcal F_s\big]=M^k_se^{-(t-s)\varphi_k}E\big[e^{iu_k(N_t(A_k)-N_s(A_k))}\big]=M^k_s,$$
using the independence and stationarity of the increments of a Poisson process. We have argued that no two of the $N_t(A_k)$ jump at the same time. The same is therefore true for the $M^k_t$, and so $[M^j,M^k]_t=0$ if $j\ne k$. By the product formula (Corollary 17.12) and Itô's formula (Theorem 17.10),
$$\begin{aligned}M^k_t&=1-\varphi_k\int_0^te^{iu_kN_{s-}(A_k)-s\varphi_k}\,ds+iu_k\int_0^te^{iu_kN_{s-}(A_k)-s\varphi_k}\,dN_s(A_k)\\&\qquad+\sum_{s\le t}e^{iu_kN_{s-}(A_k)-s\varphi_k}\big[e^{iu_k\Delta N_s(A_k)}-1-iu_k\Delta N_s(A_k)\big]\\&=1-\varphi_k\int_0^te^{iu_kN_{s-}(A_k)-s\varphi_k}\,ds+\sum_{s\le t}e^{iu_kN_{s-}(A_k)-s\varphi_k}\big[e^{iu_k\Delta N_s(A_k)}-1\big]\\&=1-\widetilde B^k_t+B^k_t.\end{aligned}$$
We see therefore that $M^k_t-1$ is of the form $B^k_t-\widetilde B^k_t$. Here $B^k_t$ is a complex-valued process whose paths are locally of bounded variation, and $\widetilde B^k_t$ is the compensator of $B^k_t$. Let $\overline M^k_t=M^k_{t\wedge r}-1$. Since the $M^k_t$ do not jump at the same time, the orthogonality lemma (Lemma 17.2) gives $E[\overline M^1_\infty\overline M^2_\infty]=0$, which translates to
$$E[M^1_rM^2_r]=1.$$
This implies
$$E\,e^{i(u_1N_r(A_1)+u_2N_r(A_2))}=e^{r\varphi_1}e^{r\varphi_2}=E\,e^{iu_1N_r(A_1)}\,E\,e^{iu_2N_r(A_2)}.$$
Since this holds for all $u_1,u_2$, $N_r(A_1)$ and $N_r(A_2)$ are independent. We conclude that the processes $N_t(A_1)$ and $N_t(A_2)$ are independent.
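A Poisson point process with finite total mass $\lambda(\mathcal S)$ can be simulated as follows:
- the times form a Poisson process with rate $\lambda(\mathcal S)$;
- each time carries an independent mark drawn from $\lambda/\lambda(\mathcal S)$.

This standard construction is assumed here; it is not given in the text (compare Exercise 18.6). The Python sketch below takes $\mathcal S=[0,1]$ and $\lambda$ equal to `rate` times Lebesgue measure; the parameters and sets are illustrative. It counts points in the disjoint sets $A_1=[0,0.3)$ and $A_2=[0.3,0.5)$. It then estimates the means $t\lambda(A_k)$ and the covariance of $N_t(A_1)$ and $N_t(A_2)$, which should vanish by Theorem 18.2.

```python
import random


def simulate_ppp(rng, t, rate):
    """Points (s, z), s <= t, of a Poisson point process on [0, 1] with lambda = rate * Lebesgue."""
    points = []
    s = rng.expovariate(rate)             # times: Poisson process with parameter lambda([0, 1]) = rate
    while s <= t:
        points.append((s, rng.random()))  # mark: uniform on [0, 1], i.e. lambda / lambda([0, 1])
        s += rng.expovariate(rate)
    return points


def disjoint_counts(t=1.0, rate=10.0, trials=20_000, seed=5):
    """Sample means of N_t(A_1), N_t(A_2) and their sample covariance."""
    rng = random.Random(seed)
    c1, c2 = [], []
    for _ in range(trials):
        pts = simulate_ppp(rng, t, rate)
        c1.append(sum(1 for _, z in pts if 0.0 <= z < 0.3))   # N_t(A_1)
        c2.append(sum(1 for _, z in pts if 0.3 <= z < 0.5))   # N_t(A_2)
    m1 = sum(c1) / trials
    m2 = sum(c2) / trials
    cov = sum((a - m1) * (b - m2) for a, b in zip(c1, c2)) / (trials - 1)
    return m1, m2, cov
```

With the defaults the means should be near $t\lambda(A_1)=3$ and $t\lambda(A_2)=2$, and the covariance near 0. A vanishing covariance is only a necessary condition for the independence asserted by the theorem.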
To handle the case $n=3$, we first show that $M^1_tM^2_t$ is a martingale. We write
$$\begin{aligned}E[M^1_tM^2_t\mid\mathcal F_s]&=M^1_sM^2_se^{-(t-s)(\varphi_1+\varphi_2)}E\big[e^{i(u_1(N_t(A_1)-N_s(A_1))+u_2(N_t(A_2)-N_s(A_2)))}\mid\mathcal F_s\big]\\&=M^1_sM^2_se^{-(t-s)(\varphi_1+\varphi_2)}E\big[e^{i(u_1(N_t(A_1)-N_s(A_1))+u_2(N_t(A_2)-N_s(A_2)))}\big]=M^1_sM^2_s.\end{aligned}$$
Here we used the fact that $N_t(A_1)$ and $N_t(A_2)$ are independent of each other and each has stationary and independent increments. Note that $M^3_t=e^{iu_3N_t(A_3)-t\varphi_3}$ has no jumps in common with $M^1_t$ or $M^2_t$. Therefore, if $\overline M^3_t=M^3_{t\wedge r}-1$, then
$$E\big[\overline M^3_\infty(M^1_rM^2_r-1)\big]=0,$$
and as before this leads to
$$E[M^3_r(M^1_rM^2_r)]=1.$$
As above this implies that $N_r(A_1)$, $N_r(A_2)$, and $N_r(A_3)$ are independent. The proof of the general induction step is similar.

We will also need the following corollary.

Corollary 18.3 Let $\mathcal F_t$ and $N_t(A_k)$ be as in Theorem 18.2. Suppose $Y_t$ is a process with the following properties:
- its paths are right continuous with left limits;
- $Y_t-Y_s$ is independent of $\mathcal F_s$ whenever $s<t$;
- $Y_t-Y_s$ has the same law as $Y_{t-s}$ for each $s<t$;
- $Y$ has no jumps in common with any of the $N_t(A_k)$.

Then the processes $N_t(A_1),\ldots,N_t(A_n)$, and $Y_t$ are independent.

Proof The law of $Y_0$ is the same as that of $Y_t-Y_t$, so $Y_0=0$, a.s. By the fact that $Y$ has stationary and independent increments,
$$E\,e^{iuY_{s+t}}=E\,e^{iuY_s}\,E\,e^{iu(Y_{s+t}-Y_s)}=E\,e^{iuY_s}\,E\,e^{iuY_t}.$$
This implies that the characteristic function of $Y$ is of the form $E\,e^{iuY_t}=e^{t\psi(u)}$ for some function $\psi(u)$. We fix $u\in\mathbb R$ and define $M^Y_t=e^{iuY_t-t\psi(u)}$. As in the proof of Theorem 18.2, we see that $M^Y_t$ is a martingale. Since $M^Y$ has no jumps in common with any of the $M^k_t$, we apply Lemma 17.2 to the mean zero martingales $M^Y_{t\wedge r}-1$ and $M^1_{t\wedge r}\cdots M^n_{t\wedge r}-1$. This gives
$$E[M^Y_rM^1_r\cdots M^n_r]=1.$$
This leads as above to the independence of $Y$ from all the $N_t(A_k)$'s.

We now turn to stochastic integrals with respect to Poisson point processes. In the same way that a nondecreasing function on the reals gives rise to a measure, so $N_t(A)$ gives rise
to a random measure $\mu(dt,dz)$ on the product $\sigma$-field $\mathcal B[0,\infty)\times\mathcal G$, where $\mathcal B[0,\infty)$ is the Borel $\sigma$-field on $[0,\infty)$. The measure $\mu$ is determined by
$$\mu([0,t]\times A)(\omega)=N_t(A)(\omega).$$
Define a nonrandom measure $\nu$ on $\mathcal B[0,\infty)\times\mathcal G$ by $\nu([0,t]\times A)=t\lambda(A)$ for $A\in\mathcal G$. If $\lambda(A)<\infty$, then $\mu([0,t]\times A)-\nu([0,t]\times A)$ is the same as a Poisson process minus its mean, hence is locally a square integrable martingale.

We can define a stochastic integral with respect to the random measure $\mu-\nu$ as follows. Suppose $H(\omega,s,z)$ is of the form
$$H(\omega,s,z)=\sum_{i=1}^nK_i(\omega)1_{(a_i,b_i]}(s)1_{A_i}(z), \tag{18.1}$$
where for each $i$ the random variable $K_i$ is bounded and $\mathcal F_{a_i}$ measurable, and $A_i\in\mathcal G$ with $\lambda(A_i)<\infty$. For such $H$ we define
$$N_t=\int_0^t\!\!\int H(\omega,s,z)\,d(\mu-\nu)(ds,dz)=\sum_{i=1}^nK_i(\mu-\nu)\big(((a_i,b_i]\cap[0,t])\times A_i\big). \tag{18.2}$$
Let us assume without loss of generality that the $A_i$ are disjoint. It is not hard to see (Exercise 18.3) that $N_t$ is a martingale, that $N^c=0$, and that
$$[N]_t=\int_0^t\!\!\int H(\omega,s,z)^2\,\mu(ds,dz). \tag{18.3}$$
Since $\langle N\rangle_t$ must be predictable and all the jumps of $N$ are totally inaccessible, it follows from Proposition 16.30 that $\langle N\rangle_t$ is continuous. Since $[N]_t-\langle N\rangle_t$ is a martingale, we conclude
$$\langle N\rangle_t=\int_0^t\!\!\int H(\omega,s,z)^2\,\nu(ds,dz). \tag{18.4}$$

Suppose $H(s,z)$ is a predictable process in the following sense: $H$ is measurable with respect to the $\sigma$-field generated by all processes of the form (18.1). Suppose also that
$$E\int_0^\infty\!\!\int_{\mathcal S}H(s,z)^2\,\nu(ds,dz)<\infty.$$
Take processes $H^n$ of the form (18.1) converging to $H$ in the space $L^2$ with norm $\big(E\int_0^\infty\int_{\mathcal S}H^2\,d\nu\big)^{1/2}$. The corresponding $N^n_t=\int_0^t\int H^n(s,z)\,d(\mu-\nu)$ are easily seen to be a Cauchy sequence in $L^2$. We call the limit $N_t$ the stochastic integral of $H$ with respect to $\mu-\nu$. As in the continuous case, we may prove that $E\,N_t^2=E\,[N]_t=E\,\langle N\rangle_t$. It follows from this, (18.3), and (18.4) that
$$[N]_t=\int_0^t\!\!\int_{\mathcal S}H(s,z)^2\,\mu(ds,dz),\qquad\langle N\rangle_t=\int_0^t\!\!\int_{\mathcal S}H(s,z)^2\,\nu(ds,dz). \tag{18.5}$$
One may think of the stochastic integral as follows. If $\mu$ gives unit mass to a point at time $t$ with value $z$, then $N_t$ jumps at this time $t$, and the size of the jump is $H(t,z)$.
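For a deterministic integrand, the moment identities behind (18.5) are easy to test. The Python sketch below uses a hypothetical setting:
- $\mathcal S=[0,1]$, with $\lambda$ equal to `rate` times Lebesgue measure;
- $H(s,z)=z$ on $[0,t]$.

Then $N_t$ is the sum of $H(s,z)$ over the points $(s,z)$ of $\mu$ with $s\le t$, minus $\int_0^t\int H\,d\nu=t\cdot\text{rate}/2$. Also $\langle N\rangle_t=\int_0^t\int H^2\,d\nu=t\cdot\text{rate}/3$. So one expects $E\,N_t\approx0$ and $E\,N_t^2\approx E\,[N]_t\approx t\cdot\text{rate}/3$. The parameter values and the seed are arbitrary.

```python
import random


def integral_against_compensated_ppp(t=1.0, rate=6.0, trials=20_000, seed=6):
    """Monte Carlo estimates of E N_t, E N_t^2 and E [N]_t for H(s, z) = z."""
    rng = random.Random(seed)
    compensator = t * rate / 2.0        # int_0^t int_0^1 z * rate dz ds
    sum_n = sum_n2 = sum_bracket = 0.0
    for _ in range(trials):
        jumps_sum = 0.0                 # int int H dmu: sum of H(s, z) over the points
        bracket = 0.0                   # [N]_t: sum of H(s, z)^2 over the points
        s = rng.expovariate(rate)
        while s <= t:
            z = rng.random()            # mark of the point (s, z)
            jumps_sum += z              # H(s, z) = z
            bracket += z * z
            s += rng.expovariate(rate)
        n_t = jumps_sum - compensator   # N_t = int int H d(mu - nu)
        sum_n += n_t
        sum_n2 += n_t ** 2
        sum_bracket += bracket
    return sum_n / trials, sum_n2 / trials, sum_bracket / trials
```

With the defaults, the second and third estimates should both be near $\langle N\rangle_t=2$. This matches the picture above: each point $(s,z)$ of $\mu$ produces a jump of size $H(s,z)$.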
Exercises

18.1 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions, and $P^1_t$ and $P^2_t$ are Poisson processes with respect to $\{\mathcal F_t\}$ with parameters $\lambda_1,\lambda_2$, respectively. Suppose $P^1_t+P^2_t$ is a Poisson process with parameter $\lambda_1+\lambda_2$. Prove that $P^1$ and $P^2$ are independent processes.

18.2 Suppose $\{\mathcal F_t\}$ is a filtration satisfying the usual conditions, $P_t$ is a Poisson process with respect to $\{\mathcal F_t\}$, and $W_t$ is a Brownian motion with respect to $\{\mathcal F_t\}$. Show that if $W_t+P_t$ has stationary and independent increments, then $P$ and $W$ are independent processes.

18.3 If $H$ is as in (18.1) and $N$ is defined by (18.2), show that $N$ is a martingale, $N^c=0$, and $[N]_t$ is given by (18.3).

18.4 Suppose $\{A_s,0<s<\infty\}$ is a collection of subsets of $\mathcal S$ such that $\lambda(A_s)\to\infty$ as $s\to\infty$. Show that $N_t(A_s)/\lambda(A_s)$ converges to $t$ uniformly over finite intervals, where the convergence is in probability.

18.5 Suppose $\{A_s,0<s<\infty\}$ is a collection of subsets of $\mathcal S$ such that $A_r\subset A_s$ if $r\le s$ and $\lambda(A_s)\to\infty$ as $s\to\infty$. Show that for each $t$,
$$\sup_{u\le t}\Big|\frac{N_u(A_s)}{\lambda(A_s)}-u\Big|$$
tends to zero almost surely as $s\to\infty$.

18.6 Let $\mathcal S$ be a metric space and $\lambda$ a $\sigma$-finite measure on $\mathcal S$. Construct a Poisson point process which has $\lambda$ as the corresponding measure.

18.7 Let $P^j_t$, $j=1,2,\ldots$ be independent Poisson processes with parameter $\beta_j$. Let $X_t=\sum_{j=1}^\infty a_jP^j_t$, where $a_j$ is a sequence such that $X_t$ is finite, a.s. For $A\subset\mathbb R\setminus\{0\}$, define $N_t(A)$ to be the number of times before time $t$ that $X$ has a jump whose size is in $A$:
$$N_t(A)=\sum_{s\le t}1_A(X_s-X_{s-}).$$
Prove that $N_t$ is a Poisson point process and determine $\lambda$.
19 Framework for Markov processes
It is not uncommon for a Markov process to be defined as a sextuple $(\Omega, \mathcal F, \mathcal F_t, X_t, \theta_t, \mathbb P^x)$, and for additional notation (e.g., $\zeta$, $\Delta$, $\mathcal S_\Delta$, $P_t$, $R_\lambda$, etc.) to be introduced rather rapidly. This can be intimidating for the beginner. We will explain all this notation in as gentle a manner as possible. We will consider a Markov process to be a pair $(X_t, \mathbb P^x)$ (rather than a sextuple), where $X_t$ is a single stochastic process and $\{\mathbb P^x\}$ is a family of probability measures, one probability measure $\mathbb P^x$ corresponding to each element $x$ of the state space.
19.1 Introduction
The idea that a Markov process consists of one process and many probabilities is one that takes some getting used to. To explain this, let us first look at an example. Suppose $X_0, X_1, X_2, \ldots$ is a Markov chain with stationary transition probabilities with $K$ states: $1, 2, \ldots, K$. Everything we want to know about $X$ can be determined if we know $p(i,j) = \mathbb P(X_1 = j \mid X_0 = i)$ for each $i$ and $j$ and $\mu(i) = \mathbb P(X_0 = i)$ for each $i$. We sometimes think of having a different Markov chain for every choice of starting distribution $\mu = (\mu(1), \ldots, \mu(K))$. But instead let us define a new probability space by taking $\Omega$ to be the collection of all sequences $\omega = (\omega_0, \omega_1, \ldots)$ such that each $\omega_n$ takes one of the values $1, \ldots, K$. Define $X_n(\omega) = \omega_n$. Define $\mathcal F_n$ to be the $\sigma$-field generated by $X_0, \ldots, X_n$; this is the same as the $\sigma$-field generated by sets of the form $\{\omega : \omega_0 = a_0, \ldots, \omega_n = a_n\}$, where $a_0, \ldots, a_n \in \{1, 2, \ldots, K\}$. For each $x = 1, 2, \ldots, K$, define a probability measure $\mathbb P^x$ on $\Omega$ by
$$\mathbb P^x(X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = 1_{\{x\}}(x_0)\, p(x_0, x_1) \cdots p(x_{n-1}, x_n). \tag{19.1}$$
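As a concrete check on (19.1), the following Python sketch builds the measures $\mathbb P^x$ on cylinder sets for a hypothetical three-state chain (the matrix `p` is made up for illustration and is not from the text). It confirms that each $\mathbb P^x$ puts total mass 1 on the possible values of $(X_0, \ldots, X_n)$ and gives no mass to paths with $\omega_0 \ne x$.

```python
from itertools import product

# Hypothetical transition probabilities p(i, j) for a chain with K = 3 states.
p = {
    1: {1: 0.5, 2: 0.3, 3: 0.2},
    2: {1: 0.1, 2: 0.6, 3: 0.3},
    3: {1: 0.4, 2: 0.4, 3: 0.2},
}
STATES = sorted(p)


def cylinder_prob(x, path):
    """P^x(X_0 = path[0], ..., X_n = path[n]) computed from (19.1)."""
    if path[0] != x:  # the factor 1_{x}(x_0): no mass on paths with omega_0 != x
        return 0.0
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= p[a][b]
    return prob


def total_mass(x, n):
    """Sum of (19.1) over all K^(n+1) possible values of (X_0, ..., X_n)."""
    return sum(cylinder_prob(x, path) for path in product(STATES, repeat=n + 1))


for x in STATES:
    print(x, total_mass(x, 4))  # each total is 1 up to rounding
```

The point of the construction is visible in the code: a single sample space of sequences is shared by all starting points, and only the measure, here `cylinder_prob(x, ...)`, depends on $x$.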
We have $K$ different probability measures, one for each of $x = 1, 2, \ldots, K$, and we can start with an arbitrary probability distribution $\mu$ if we define $\mathbb P^\mu(A) = \sum_{i=1}^K \mathbb P^i(A)\mu(i)$. We have lost no information by this redefinition, and it turns out this works much better when doing technical details. The value of $X_0(\omega) = \omega_0$ can be any of $1, 2, \ldots, K$; the notion of starting at $x$ is captured by $\mathbb P^x$, not by $X_0$. The probability measure $\mathbb P^x$ is concentrated on those $\omega$'s for which $\omega_0 = x$ and gives no mass to any other $\omega$.
Let us now look at Brownian motion, and see how this framework plays out there. Let $\mathbb P$ be a probability measure and let $W_t$ be a one-dimensional Brownian motion with respect to $\mathbb P$ started at 0. Then $W^x_t = x + W_t$ is a one-dimensional Brownian motion started at $x$. Let $\Omega = C[0,\infty)$ be the set of continuous functions from $[0,\infty)$ to $\mathbb R$, so that each element
$\omega$ in $\Omega$ is a continuous function. (We do not require that $\omega(0) = 0$ or that $\omega(0)$ take any particular value $x$.) Define
$$X_t(\omega) = \omega(t). \tag{19.2}$$
This will be our process. Let $\mathcal F$ be the $\sigma$-field on $\Omega = C[0,\infty)$ generated by the cylindrical subsets of $C[0,\infty)$; see Definition 1.1. Now define $\mathbb P^x$ to be the law of $W^x$. This means that $\mathbb P^x$ is the probability measure on $(\Omega, \mathcal F)$ defined by
$$\mathbb P^x(X \in A) = \mathbb P(W^x \in A), \qquad x \in \mathbb R,\ A \in \mathcal F. \tag{19.3}$$
The probability measure $\mathbb P^x$ is determined by the fact that if $n \ge 1$, $t_1 \le \cdots \le t_n$, and $B_1, \ldots, B_n$ are Borel subsets of $\mathbb R$, then
$$\mathbb P^x(X_{t_1} \in B_1, \ldots, X_{t_n} \in B_n) = \mathbb P(W^x_{t_1} \in B_1, \ldots, W^x_{t_n} \in B_n).$$
We call the pair $(X_t, \mathbb P^x)$, $x \in \mathbb R$, $t \ge 0$, a Brownian motion.
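The finite-dimensional distributions that determine $\mathbb P^x$ are easy to sample, since $W^x$ has independent Gaussian increments. The sketch below (standard library only; the helper `sample_path`, the seed, and the chosen times are illustrative assumptions) draws $X_{t_1}, \ldots, X_{t_n}$ under $\mathbb P^x$ by adding $x$ to a simulated Brownian path, and checks that $X_1$ has sample mean near $x$ and sample variance near 1.

```python
import math
import random


def sample_path(x, times, rng):
    """Values of W^x = x + W at increasing times t_1 < ... < t_n (one draw under P^x)."""
    w, prev, values = 0.0, 0.0, []
    for t in times:
        w += rng.gauss(0.0, math.sqrt(t - prev))  # independent N(0, t - prev) increment
        prev = t
        values.append(x + w)
    return values


rng = random.Random(0)  # fixed seed so the run is reproducible
x, times, n = 1.5, [0.25, 0.5, 1.0], 20_000
paths = [sample_path(x, times, rng) for _ in range(n)]
mean_1 = sum(path[-1] for path in paths) / n                 # sample mean of X_1
var_1 = sum((path[-1] - mean_1) ** 2 for path in paths) / n  # sample variance of X_1
print(mean_1, var_1)  # should be near x = 1.5 and t = 1.0
```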
19.2 Definition of a Markov process
We want to allow our Markov processes to take values in spaces other than the Euclidean ones. For now, we take our state space $\mathcal S$ to be a separable metric space, furnished with the Borel $\sigma$-field. For the beginner, just think of $\mathbb R$ in place of $\mathcal S$. To define a Markov process, we start with a measurable space $(\Omega, \mathcal F)$ and suppose we have a filtration $\{\mathcal F_t\}$ (not necessarily satisfying the usual conditions).
Definition 19.1 A Markov process $(X_t, \mathbb P^x)$ is a stochastic process $X : [0,\infty) \times \Omega \to \mathcal S$ and a family of probability measures $\{\mathbb P^x : x \in \mathcal S\}$ on $(\Omega, \mathcal F)$ satisfying the following.
(1) For each $t$, $X_t$ is $\mathcal F_t$ measurable.
(2) For each $t$ and each Borel subset $A$ of $\mathcal S$, the map $x \mapsto \mathbb P^x(X_t \in A)$ is Borel measurable.
(3) For each $s, t \ge 0$, each Borel subset $A$ of $\mathcal S$, and each $x \in \mathcal S$, we have
$$\mathbb P^x(X_{s+t} \in A \mid \mathcal F_s) = \mathbb P^{X_s}(X_t \in A), \qquad \mathbb P^x\text{-a.s.} \tag{19.4}$$
Some explanation is definitely in order. Let
$$\varphi(x) = \mathbb P^x(X_t \in A), \tag{19.5}$$
so that $\varphi$ is a function mapping $\mathcal S$ to $\mathbb R$. Part of the definition of filtration given in Chapter 1 is that each $\mathcal F_t \subset \mathcal F$. Since we are requiring $X_t$ to be $\mathcal F_t$ measurable, the event $(X_t \in A)$ is in $\mathcal F$ and it makes sense to talk about $\mathbb P^x(X_t \in A)$. Definition 19.1(2) says that the function $\varphi$ is Borel measurable. This is a very mild assumption, and will be satisfied in the examples we look at. The expression $\mathbb P^{X_s}(X_t \in A)$ on the right-hand side of (19.4) is a random variable and its value at $\omega \in \Omega$ is defined to be $\varphi(X_s(\omega))$, with $\varphi$ given by (19.5). Note that the randomness in $\mathbb P^{X_s}(X_t \in A)$ is thus all due to the $X_s$ term and not the $X_t$ term. Definition 19.1(3) can be rephrased as saying that for each $s$, $t$, each $A$, and each $x$, there is a set $N_{s,t,x,A} \subset \Omega$ that is a null set with respect to $\mathbb P^x$ and for $\omega \notin N_{s,t,x,A}$, the conditional expectation $\mathbb P^x(X_{s+t} \in A \mid \mathcal F_s)$ is equal to $\varphi(X_s)$.
We have now explained all the terms in the sextuple $(\Omega, \mathcal F, \mathcal F_t, X_t, \theta_t, \mathbb P^x)$ except for $\theta_t$. These are called shift operators and are maps from $\Omega$ to $\Omega$ such that $X_s \circ \theta_t = X_{s+t}$. We defer the precise meaning of the $\theta_t$ and the rationale for them until Section 19.5, where they will appear in a natural way.
In the remainder of the section and in Section 19.3 we define some of the additional notation commonly used for Markov processes. The first one is almost self-explanatory. We use $\mathbb E^x$ for expectation with respect to $\mathbb P^x$. As with $\mathbb P^{X_s}(X_t \in A)$, the notation $\mathbb E^{X_s} f(X_t)$, where $f$ is bounded and Borel measurable, is to be taken to mean $\psi(X_s)$ with $\psi(y) = \mathbb E^y f(X_t)$. If we want to talk about our Markov process started with distribution $\mu$, we define $\mathbb P^\mu(B) = \int \mathbb P^x(B)\,\mu(dx)$, and similarly for $\mathbb E^\mu$; here $\mu$ is a probability on $\mathcal S$.
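For a chain with finitely many states, the integral defining $\mathbb P^\mu$ is a finite sum, so the mixture can be computed directly. The sketch below reuses a hypothetical three-state transition matrix (an illustrative assumption, not from the text) and a made-up starting distribution `mu`; it shows that $\mathbb P^\mu$ of a cylinder set picks up only the term with $x = \omega_0$, and that $\mathbb P^\mu$ is again a probability measure.

```python
from itertools import product

# Hypothetical transition probabilities p(i, j) for a chain with K = 3 states.
p = {
    1: {1: 0.5, 2: 0.3, 3: 0.2},
    2: {1: 0.1, 2: 0.6, 3: 0.3},
    3: {1: 0.4, 2: 0.4, 3: 0.2},
}
STATES = sorted(p)


def cylinder_prob(x, path):
    """P^x of the cylinder {X_0 = path[0], ..., X_n = path[n]}, as in (19.1)."""
    if path[0] != x:
        return 0.0
    prob = 1.0
    for a, b in zip(path, path[1:]):
        prob *= p[a][b]
    return prob


def mixture_prob(mu, path):
    """P^mu(cylinder) = sum over x of P^x(cylinder) mu(x)."""
    return sum(mu[x] * cylinder_prob(x, path) for x in STATES)


mu = {1: 0.2, 2: 0.5, 3: 0.3}  # hypothetical starting distribution
# Only the x = path[0] term survives, so P^mu(X_0 = 2, X_1 = 3) = mu(2) p(2, 3).
print(mixture_prob(mu, (2, 3)))
```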
19.3 Transition probabilities
If $\mathcal B$ is the Borel $\sigma$-field on a metric space $\mathcal S$, a kernel $Q(x, A)$ on $\mathcal S$ is a map from $\mathcal S \times \mathcal B$ to $\mathbb R$ satisfying the following.
(1) For each $x \in \mathcal S$, $Q(x, \cdot)$ is a measure on $(\mathcal S, \mathcal B)$.
(2) For each $A \in \mathcal B$, the function $x \mapsto Q(x, A)$ is Borel measurable.
The definition of Markov transition probabilities (or simply transition probabilities) is the following.
Definition 19.2 A collection of kernels $\{P_t(x, A);\ t \ge 0\}$ are Markov transition probabilities for a Markov process $(X_t, \mathbb P^x)$ if
(1) $P_t(x, \mathcal S) = 1$ for each $t \ge 0$ and each $x \in \mathcal S$.
(2) For each $x \in \mathcal S$, each Borel subset $A$ of $\mathcal S$, and each $s, t \ge 0$,
$$P_{t+s}(x, A) = \int_{\mathcal S} P_t(y, A)\,P_s(x, dy). \tag{19.6}$$
(3) For each $x \in \mathcal S$, each Borel subset $A$ of $\mathcal S$, and each $t \ge 0$,
$$P_t(x, A) = \mathbb P^x(X_t \in A). \tag{19.7}$$
Definition 19.2(3) can be rephrased as saying that for each $x$, the measures $P_t(x, dy)$ and $\mathbb P^x(X_t \in dy)$ are the same. We define
$$P_t f(x) = \int f(y)\,P_t(x, dy) \tag{19.8}$$
when $f : \mathcal S \to \mathbb R$ is Borel measurable and either bounded or non-negative.
Lemma 19.3 Suppose $P_t$ are Markov transition probabilities. If $f$ is Borel measurable and either non-negative or bounded, then $P_t f$ is non-negative (respectively, bounded) and Borel measurable and
$$P_t f(x) = \mathbb E^x f(X_t), \qquad x \in \mathcal S. \tag{19.9}$$
Proof Using (19.7) and Definition 19.1(2), the Borel measurability and (19.9) hold when $f$ is the indicator of a set $A$. By linearity they hold for simple functions, and then using monotone convergence they hold for non-negative functions. Using linearity again, we have measurability and (19.9) holding for $f$ bounded and Borel measurable. The non-negativity (respectively, the boundedness) of $P_t f$ follows from (19.9).
The equations (19.6) are known as the Chapman–Kolmogorov equations. They can be rephrased in terms of equality of measures: for each $x$,
$$P_{s+t}(x, dz) = \int_{y \in \mathcal S} P_t(y, dz)\,P_s(x, dy). \tag{19.10}$$
Multiplying (19.10) by a bounded Borel measurable function $f(z)$ and integrating gives
$$P_{s+t} f(x) = \int P_t f(y)\,P_s(x, dy). \tag{19.11}$$
The right-hand side is the same as $P_s(P_t f)(x)$, so we have
$$P_{s+t} f(x) = P_s P_t f(x), \tag{19.12}$$
i.e., the functions $P_{s+t} f$ and $P_s P_t f$ are the same. The equation (19.12) is known as the semigroup property. By Lemma 19.3, $P_t$ is a linear operator on the space of bounded Borel measurable functions on $\mathcal S$. We can then rephrase (19.12) simply as
$$P_{s+t} = P_s P_t. \tag{19.13}$$
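For a chain with finitely many states, each kernel $P_t(x, \cdot)$ is a row of a matrix, $P_t f$ is a matrix–vector product, and (19.6) says $P_{s+t} = P_s P_t$ as matrices. The sketch below checks this numerically for a hypothetical two-state continuous-time chain that jumps from 0 to 1 at rate `A` and from 1 to 0 at rate `B`; the closed-form transition matrix it uses is the standard one for such a chain and is an assumption of the example, not something derived in the text.

```python
import math

A, B = 2.0, 1.0  # hypothetical jump rates 0 -> 1 and 1 -> 0


def transition_matrix(t):
    """P_t(x, {y}) for the two-state chain, in closed form."""
    rate = A + B
    e = math.exp(-rate * t)
    return [[B / rate + (A / rate) * e, (A / rate) * (1 - e)],
            [(B / rate) * (1 - e), A / rate + (B / rate) * e]]


def apply_kernel(P, f):
    """P f(x) = sum_y f(y) P(x, {y}): the finite-state version of (19.8)."""
    return [sum(P[x][y] * f[y] for y in range(2)) for x in range(2)]


def compose(M, N):
    """(M N)(x, {z}) = sum_y N(y, {z}) M(x, {y}): the integral in (19.6)."""
    return [[sum(M[x][y] * N[y][z] for y in range(2)) for z in range(2)] for x in range(2)]


s, t = 0.3, 0.7
direct = transition_matrix(s + t)
composed = compose(transition_matrix(s), transition_matrix(t))
ck_error = max(abs(direct[x][z] - composed[x][z]) for x in range(2) for z in range(2))

f = [1.0, -2.0]  # an arbitrary function on the state space {0, 1}
lhs = apply_kernel(transition_matrix(s + t), f)                                  # P_{s+t} f
rhs = apply_kernel(transition_matrix(s), apply_kernel(transition_matrix(t), f))  # P_s P_t f
print(ck_error, lhs, rhs)
```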
A family of operators satisfying (19.13) is called a semigroup; such families are much studied in functional analysis. We will show in Chapter 36 how to construct the Markov process corresponding to a given semigroup $P_t$. More about semigroups can also be found in Chapter 37.
One more observation about semigroups: if we take expectations in (19.4), we obtain
$$\mathbb P^x(X_{s+t} \in A) = \mathbb E^x\big[\mathbb P^{X_s}(X_t \in A)\big].$$
The left-hand side is $P_{s+t} 1_A(x)$ and the right-hand side is
$$\mathbb E^x[P_t 1_A(X_s)] = P_s P_t 1_A(x),$$
and so (19.4) encodes the semigroup property.
The resolvent or $\lambda$-potential of a semigroup $P_t$ is defined by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\,dt, \qquad \lambda \ge 0,\ x \in \mathcal S.$$
This can be recognized as the Laplace transform of $P_t$. By Lemma 19.3 and the Fubini theorem, we see that
$$R_\lambda f(x) = \mathbb E^x \int_0^\infty e^{-\lambda t} f(X_t)\,dt.$$
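The resolvent can be computed in two independent ways when $P_t$ is known explicitly. The sketch below uses a hypothetical two-state chain with jump rates `A` (from 0 to 1) and `B` (from 1 to 0), whose standard closed-form transition matrix has the shape $P_t(x,\{y\}) = c(x,y) + d(x,y)e^{-\kappa t}$ with $\kappa = A + B$ (an assumption of the example). It compares a Simpson-rule approximation of $\int_0^\infty e^{-\lambda t} P_t f\,dt$, truncated at a large time `T`, with the term-by-term Laplace transform $c/\lambda + d/(\lambda + \kappa)$.

```python
import math

A, B = 2.0, 1.0  # hypothetical jump rates 0 -> 1 and 1 -> 0
KAPPA = A + B    # P_t(x, {y}) = C[x][y] + D[x][y] * exp(-KAPPA * t)
C = [[B / KAPPA, A / KAPPA], [B / KAPPA, A / KAPPA]]    # stationary part
D = [[A / KAPPA, -A / KAPPA], [-B / KAPPA, B / KAPPA]]  # decaying part


def semigroup(t, f):
    """P_t f(x) = sum_y (C[x][y] + D[x][y] e^{-KAPPA t}) f(y)."""
    e = math.exp(-KAPPA * t)
    return [sum((C[x][y] + D[x][y] * e) * f[y] for y in range(2)) for x in range(2)]


def resolvent_quadrature(lam, f, T=40.0, n=40_000):
    """Composite Simpson rule for int_0^T e^{-lam t} P_t f dt (n must be even)."""
    h = T / n
    total = [0.0, 0.0]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        t = k * h
        values = semigroup(t, f)
        for x in range(2):
            total[x] += weight * math.exp(-lam * t) * values[x]
    return [h / 3 * v for v in total]


def resolvent_exact(lam, f):
    """Term-by-term Laplace transform: C / lam + D / (lam + KAPPA), for lam > 0."""
    return [sum((C[x][y] / lam + D[x][y] / (lam + KAPPA)) * f[y] for y in range(2))
            for x in range(2)]


f = [1.0, -2.0]
print(resolvent_quadrature(0.5, f), resolvent_exact(0.5, f))
```

Because $P_t 1 = 1$, the sketch also reproduces $R_\lambda 1 = 1/\lambda$, which is a quick sanity check on any implementation of this kind.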
Resolvents are useful because they are typically easier to work with than semigroups.
When practitioners of stochastic calculus tire of a martingale, they “stop” it. Markov process theorists are a harsher lot and they “kill” their processes. To be precise, attach an isolated point $\Delta$ to $\mathcal S$. Thus one looks at $\mathcal S_\Delta = \mathcal S \cup \{\Delta\}$, and the topology on $\mathcal S_\Delta$ is the one generated by the open sets of $\mathcal S$ and $\{\Delta\}$. The point $\Delta$ is called the cemetery point. All functions on $\mathcal S$ are extended to $\mathcal S_\Delta$ by defining them to be 0 at $\Delta$. At some random time $\zeta$ the Markov process is killed, which means that $X_t = \Delta$ for all $t \ge \zeta$. The time $\zeta$ is called the lifetime of the Markov process.
19.4 An example
Let us give an example, that of Brownian motion, of course. Let $X_t$ and $\mathbb P^x$ be defined by (19.2) and (19.3). Define $\mathcal F_t = \sigma(X_r;\ r \le t)$. Clearly Definition 19.1(1) holds. Observe that since, under $\mathbb P$, $W_t$ is a mean zero normal random variable with variance $t$,
$$\mathbb P^x(X_t \in A) = \mathbb P(W^x_t \in A) = \mathbb P(x + W_t \in A) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-(y-x)^2/2t}\,dy. \tag{19.14}$$
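When $A = [a, b]$ is an interval, the integral in (19.14) is a difference of values of the standard normal distribution function $\Phi$, which the Python standard library provides through `math.erf`. The sketch below (helper names and numerical values are illustrative) compares this closed form with a Monte Carlo estimate of $\mathbb P(x + W_t \in A)$; the closed form also makes the continuity of $x \mapsto \mathbb P^x(X_t \in A)$ evident.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def transition_prob(x, t, a, b):
    """P^x(X_t in [a, b]): the Gaussian integral (19.14) for A = [a, b]."""
    return normal_cdf((b - x) / math.sqrt(t)) - normal_cdf((a - x) / math.sqrt(t))


def monte_carlo_prob(x, t, a, b, n=100_000, seed=1):
    """Estimate P(x + W_t in [a, b]) by sampling W_t ~ N(0, t)."""
    rng = random.Random(seed)
    hits = sum(a <= x + rng.gauss(0.0, math.sqrt(t)) <= b for _ in range(n))
    return hits / n


x, t, a, b = 0.3, 2.0, -1.0, 1.0
print(transition_prob(x, t, a, b), monte_carlo_prob(x, t, a, b))
```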
By dominated convergence, $x \mapsto \mathbb P^x(X_t \in A)$ is continuous, therefore measurable. This proves Definition 19.1(2). It remains to prove Definition 19.1(3), which is the following proposition.
Proposition 19.4 Let $W$ be a Brownian motion as defined by Definition 2.1, let $W^x_t = x + W_t$, and let $(X_t, \mathbb P^x)$ be defined by (19.2) and (19.3). If $f$ is bounded and Borel measurable, then
$$\mathbb E^x[f(X_{t+s}) \mid \mathcal F_s] = \mathbb E^{X_s} f(X_t), \qquad \mathbb P^x\text{-a.s.} \tag{19.15}$$
Proof We will first prove
$$\mathbb E^x[f(X_{t+s}) \mid \mathcal F_s] = \mathbb E^{X_s} f(X_t) \tag{19.16}$$
when $f(x) = e^{iux}$. Using independent increments and the fact that $W_{t+s} - W_s$ has the same law as $W_t$, we see that under each $\mathbb P^x$, $X_{t+s} - X_s$ is independent of $\mathcal F_s$ and has the same law as a mean zero normal random variable with variance $t$. We conclude that
$$\mathbb E^x e^{iu(X_{t+s} - X_s)} = e^{-u^2 t/2};$$
see (A.25). We then write
$$\mathbb E^x[e^{iuX_{t+s}} \mid \mathcal F_s] = \mathbb E^x[e^{iu(X_{t+s} - X_s)} \mid \mathcal F_s]\, e^{iuX_s} = \mathbb E^x[e^{iu(X_{t+s} - X_s)}]\, e^{iuX_s} = e^{-u^2 t/2} e^{iuX_s}.$$
On the other hand, for any $y$,
$$\mathbb E^y e^{iuX_t} = \mathbb E\, e^{iuW^y_t} = \mathbb E\big[e^{iuW_t}\big] e^{iuy} = e^{-u^2 t/2} e^{iuy}.$$
Replacing $y$ by $X_s$ proves (19.16) for this $f$. Now suppose that $f \in C^\infty$ with compact support and let $\hat f$ be the Fourier transform of $f$. In (19.16) we replace $u$ by $-u$, multiply both sides by $\hat f(u)$, and integrate over $u \in \mathbb R$. Using
the Fourier inversion formula, we then have
$$\begin{aligned}
\mathbb E^x[f(X_{t+s}) \mid \mathcal F_s] &= (2\pi)^{-1}\, \mathbb E^x\Big[\int e^{-iuX_{t+s}} \hat f(u)\,du \,\Big|\, \mathcal F_s\Big] \\
&= (2\pi)^{-1}\, \mathbb E^{X_s} \int e^{-iuX_t} \hat f(u)\,du \\
&= \mathbb E^{X_s} f(X_t).
\end{aligned}$$
We used the Fubini theorem several times to interchange expectation and integration; this is justified because $f \in C^\infty$ with compact support implies $\hat f$ is in the Schwartz class; see Section B.2. This proves the proposition for $f$ in $C^\infty$ with compact support, and a limit argument gives it for all bounded and measurable $f$.
The same proof works for $d$-dimensional Brownian motion. Set
$$P_t(x, A) = \mathbb P^x(X_t \in A) = \mathbb P(W_t + x \in A) = \frac{1}{\sqrt{2\pi t}} \int_A e^{-(y-x)^2/2t}\,dy. \tag{19.17}$$
Clearly for each $x$ and $t$, $P_t(x, \cdot)$ is a measure with total mass 1. As we mentioned earlier, the function $x \mapsto P_t(x, A)$ is continuous, hence Borel measurable. We will show the Chapman–Kolmogorov equations. These follow from the next proposition.
Proposition 19.5 If $s, t > 0$ and $x, z \in \mathbb R$, then
$$\int_{y \in \mathbb R} \frac{1}{\sqrt{2\pi t}} e^{-(y-x)^2/2t}\, \frac{1}{\sqrt{2\pi s}} e^{-(z-y)^2/2s}\,dy = \frac{1}{\sqrt{2\pi(s+t)}} e^{-(z-x)^2/2(s+t)}. \tag{19.18}$$
Proof This is a well-known property of the Gaussian density, but we can derive (19.18) from Proposition 19.4. Let $f$ be continuous with compact support. Taking expectations in (19.15),
$$\mathbb E^x f(X_{t+s}) = \mathbb E^x[\mathbb E^{X_s} f(X_t)],$$
or $P_{t+s} f(x) = P_s P_t f(x)$. Using Lemma 19.3 and (19.17), and evaluating both sides at the starting point $z$,
$$\int f(x) \frac{1}{\sqrt{2\pi(s+t)}} e^{-(z-x)^2/2(s+t)}\,dx = \int f(x) \int \frac{1}{\sqrt{2\pi t}} e^{-(y-x)^2/2t}\, \frac{1}{\sqrt{2\pi s}} e^{-(z-y)^2/2s}\,dy\,dx.$$
Since this holds for all continuous $f$ with compact support, (19.18) holds for almost every $x$. Since both sides of (19.18) are continuous in $x$, (19.18) holds for all $x$.
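Identity (19.18) can also be confirmed numerically by quadrature in $y$. The sketch below applies a plain trapezoid rule on a truncated interval (the truncation `L`, the grid size, and the test values are illustrative assumptions); because the integrand is a smooth, rapidly decaying Gaussian, the two sides agree to many digits.

```python
import math


def p(t, x, y):
    """Gaussian transition density (2 pi t)^(-1/2) exp(-(y - x)^2 / (2 t)), t > 0."""
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)


def chapman_kolmogorov_lhs(t, s, x, z, L=30.0, n=60_000):
    """Trapezoid approximation of the left side of (19.18), integrating y over [-L, L]."""
    h = 2 * L / n
    total = 0.0
    for k in range(n + 1):
        y = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * p(t, x, y) * p(s, y, z)
    return h * total


t, s, x, z = 0.5, 1.2, 0.4, -0.7
print(chapman_kolmogorov_lhs(t, s, x, z), p(s + t, x, z))  # left and right sides of (19.18)
```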
19.5 The canonical process and shift operators
Suppose we have a Markov process $(X_t, \mathbb P^x)$ where $\mathcal F_t = \sigma(X_s;\ s \le t)$. Suppose for the moment that $X_t$ has continuous paths. For this to even make sense, it is necessary for the set $\{t \mapsto X_t \text{ is not continuous}\}$ to be in $\mathcal F$, and then we require this event to be $\mathbb P^x$-null for each $x$. Define $\widetilde\Omega$ to be the set of continuous functions on $[0,\infty)$ with values in $\mathcal S$. If $\widetilde\omega \in \widetilde\Omega$, set $\widetilde X_t = \widetilde\omega(t)$. Define $\widetilde{\mathcal F}_t = \sigma(\widetilde X_s;\ s \le t)$ and $\widetilde{\mathcal F}_\infty = \bigvee_{t \ge 0} \widetilde{\mathcal F}_t$. Finally define $\widetilde{\mathbb P}^x$ on $(\widetilde\Omega, \widetilde{\mathcal F}_\infty)$ by $\widetilde{\mathbb P}^x(\widetilde X \in \cdot) = \mathbb P^x(X \in \cdot)$. Thus $\widetilde{\mathbb P}^x$ is specified uniquely by
$$\widetilde{\mathbb P}^x(\widetilde X_{t_1} \in A_1, \ldots, \widetilde X_{t_n} \in A_n) = \mathbb P^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n)$$
for $n \ge 1$, $A_1, \ldots, A_n$ Borel subsets of $\mathcal S$, and $t_1 < \cdots < t_n$. Clearly there is so far no loss (or gain) by looking at the Markov process $(\widetilde X_t, \widetilde{\mathbb P}^x)$, which is called the canonical process.
Let us now suppose we are working with the canonical process, and we drop the tildes everywhere. We define the shift operators $\theta_t : \Omega \to \Omega$ as follows. $\theta_t(\omega)$ will be an element of $\Omega$ and therefore is a continuous function from $[0,\infty)$ to $\mathcal S$. Define $\theta_t(\omega)(s) = \omega(t+s)$. Then
$$X_s \circ \theta_t(\omega) = X_s(\theta_t(\omega)) = \theta_t(\omega)(s) = \omega(t+s) = X_{t+s}(\omega).$$
The shift operator $\theta_t$ takes the path of $X$ and chops off and discards the part of the path before time $t$.
We will use expressions like $f(X_s) \circ \theta_t$. If we apply this to $\omega \in \Omega$, then $(f(X_s) \circ \theta_t)(\omega) = f(X_s(\theta_t(\omega))) = f(X_{s+t}(\omega))$, or $f(X_s) \circ \theta_t = f(X_{s+t})$.
If the paths of $X$ are not continuous, but instead only right continuous with left limits, we can follow exactly the above procedure, except we start with $\widetilde\Omega$ being the collection of functions from $[0,\infty)$ to $\mathcal S$ that are right continuous with left limits. Even if we are not in this canonical setup, from now on we will suppose there exist shift operators mapping $\Omega$ into itself so that $X_s \circ \theta_t = X_{s+t}$.
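On the canonical space a path is simply a function of time, so the shift operators can be written down literally. The following sketch (the path `omega` is made up, and representing elements of $\Omega$ as Python functions is an illustrative choice) implements $X_s(\omega) = \omega(s)$ and $\theta_t$ as re-indexing of time, and checks $X_s \circ \theta_t = X_{s+t}$, $f(X_s) \circ \theta_t = f(X_{s+t})$, and $\theta_s \circ \theta_t = \theta_{s+t}$ at sample points.

```python
import math


def X(s, omega):
    """Coordinate process X_s(omega) = omega(s), as in (19.2)."""
    return omega(s)


def shift(omega, t):
    """theta_t: the path s -> omega(t + s), i.e. omega with the part before time t discarded."""
    return lambda s: omega(t + s)


def omega(u):
    """A hypothetical continuous path."""
    return math.sin(u) + u ** 2


t, s = 1.3, 0.7
print(X(s, shift(omega, t)), X(t + s, omega))  # X_s o theta_t = X_{t+s}
```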
Exercises
19.1 Suppose $(X_t, \mathbb P^x)$ is a Brownian motion and $S_t = \sup_{s \le t} X_s$. Show that $((X_t, S_t), \mathbb P^x)$ is a Markov process and determine the transition probabilities.
19.2 Suppose $(X_t, \mathbb P^x)$ is a Brownian motion, $f$ a non-negative, bounded, Borel measurable function, and $A_t = \int_0^t f(X_s)\,ds$. Show that $((X_t, A_t), \mathbb P^x)$ is a Markov process.
19.3 Suppose $P_t$ is a Poisson process with parameter $\lambda$. Let $\Omega$ be the collection of functions on $[0,\infty)$ which are right continuous and which have left limits, let $\mathcal F$ be the $\sigma$-field on $\Omega$ generated by the cylindrical subsets of $\Omega$, let $P^x_t = x + P_t$, and let $\mathbb P^x$ be the law of $x + P$. Show that $(X_t, \mathbb P^x)$ is a Markov process and determine the transition probabilities.
19.4 Suppose $m$ is a measure on the Borel subsets $\mathcal B$ of a metric space $\mathcal S$. Suppose for each $t > 0$ there exist jointly measurable non-negative functions $p_t : \mathcal S \times \mathcal S \to \mathbb R$ such that $\int p_t(x, y)\,m(dy) = 1$ for each $x$ and $t$ and define
$$P_t(x, A) = \int_A p_t(x, y)\,m(dy).$$
Show that the kernels $P_t$ satisfy the Chapman–Kolmogorov equations if and only if
$$\int p_s(x, y)\,p_t(y, z)\,m(dy) = p_{s+t}(x, z)$$
for every $s, t \ge 0$, every $x \in \mathcal S$, and $m$-almost every $z$.
19.5 The Ornstein–Uhlenbeck process $Y$ started at $x$ is a continuous Gaussian process with $\mathbb E\, Y_t = e^{-t/2} x$ and covariance $\mathrm{Cov}(Y_s, Y_t) = e^{-(s+t)/2}(e^{s \wedge t} - 1)$. If $X$ is the canonical process and $\mathbb P^x$ is the law of an Ornstein–Uhlenbeck process started at $x$, show that $(X_t, \mathbb P^x)$ is a Markov process and determine the transition probabilities.
Notes
For more, see Blumenthal and Getoor (1968).
20 Markov properties
We want to accomplish three things in this chapter. First, we want to talk about what it means in the Markov process context for a filtration to satisfy the usual conditions. This is now more complicated than in Chapter 1 because we have more than one probability measure. Second, we want to extend the Markov property to expressions that are more complicated than $\mathbb E^x[f(X_{s+t}) \mid \mathcal F_s]$. Third, we want to look at the strong Markov property, which means we look at expressions like $\mathbb E^x[f(X_{T+t}) \mid \mathcal F_T]$, where $T$ is a stopping time. Throughout this chapter we assume that $X$ has paths that are right continuous with left limits. To be more precise, if $N = \{\omega : \text{the function } t \mapsto X_t(\omega) \text{ is not right continuous with left limits}\}$, then we assume $N \in \mathcal F$ and $N$ is $\mathbb P^x$-null for every $x \in \mathcal S$.
20.1 Enlarging the filtration
Let us first introduce some notation. Define
$$\mathcal F^{00}_t = \sigma(X_s;\ s \le t), \qquad t \ge 0. \tag{20.1}$$
This is the smallest $\sigma$-field with respect to which each $X_s$ is measurable for $s \le t$. We let $\mathcal F^0_t$ be the completion of $\mathcal F^{00}_t$, but we need to be careful what we mean by completion here, because we have more than one probability measure present. Let $\mathcal N$ be the collection of sets that are $\mathbb P^x$-null for every $x \in \mathcal S$. Thus $N \in \mathcal N$ if $(\mathbb P^x)^*(N) = 0$ for each $x \in \mathcal S$, where $(\mathbb P^x)^*$ is the outer probability corresponding to $\mathbb P^x$. The outer probability $(\mathbb P^x)^*$ is defined by $(\mathbb P^x)^*(A) = \inf\{\mathbb P^x(B) : A \subset B,\ B \in \mathcal F\}$. Let
$$\mathcal F^0_t = \sigma(\mathcal F^{00}_t \cup \mathcal N). \tag{20.2}$$
Finally, let
$$\mathcal F_t = \mathcal F^0_{t+} = \bigcap_{\varepsilon > 0} \mathcal F^0_{t+\varepsilon}. \tag{20.3}$$
We call $\{\mathcal F_t\}$ the minimal augmented filtration generated by $X$. Ultimately, we will work only with $\{\mathcal F_t\}$, but we need the other two filtrations at intermediate stages. The reason for worrying about which filtrations to use is that $\{\mathcal F^{00}_t\}$ is too small to include many interesting sets (such as those arising in the law of the iterated logarithm, for example), while if the filtration is too large, the Markov property will not hold for that filtration.
The filtration matters when defining a Markov process; see Definition 19.1(3). We will assume throughout this section that $(X_t, \mathbb P^x)$ is a Markov process with respect to the filtration $\{\mathcal F^{00}_t\}$, that is,
$$\mathbb P^x(X_{s+t} \in A \mid \mathcal F^{00}_s) = \mathbb P^{X_s}(X_t \in A), \qquad \mathbb P^x\text{-a.s.}, \tag{20.4}$$
whenever $A$ is a Borel subset of $\mathcal S$ and $s, t \ge 0$. We will also make the following assumption, which will be needed here and also in Section 20.3.
Assumption 20.1 Suppose $P_t f$ is continuous on $\mathcal S$ whenever $f$ is bounded and continuous on $\mathcal S$.
Markov processes satisfying Assumption 20.1 are called Feller processes or weak Feller processes. If $P_t f$ is continuous whenever $f$ is bounded and Borel measurable, then the Markov process is said to be a strong Feller process.
We show that we can replace $\mathcal F^{00}_t$ in (20.4) by $\mathcal F^0_t$.
Proposition 20.2 Let $(X_t, \mathbb P^x)$ be a Markov process and suppose that (20.4) holds. If $A$ is a Borel subset of $\mathcal S$, $x \in \mathcal S$, and $s, t \ge 0$, then
$$\mathbb P^x(X_{s+t} \in A \mid \mathcal F^0_s) = \mathbb P^{X_s}(X_t \in A), \qquad \mathbb P^x\text{-a.s.} \tag{20.5}$$
Proof Since the right-hand side is a function of $X_s$ and hence $\mathcal F^0_s$ measurable, we need to show that if $B \in \mathcal F^0_s$, then
$$\mathbb P^x(X_{s+t} \in A, B) = \mathbb E^x\big[\mathbb P^{X_s}(X_t \in A); B\big]. \tag{20.6}$$
This holds for $B \in \mathcal F^{00}_s$ by (20.4). It holds for sets $B \in \mathcal N$, the class of null sets, since both sides are 0. Therefore it holds for sets $B$ such that there exists $B_1 \in \mathcal F^{00}_s$ with $B \,\triangle\, B_1$ being a null set. By linearity it holds for finite disjoint unions of sets of the form just described. The class of such finite disjoint unions is a monotone class that generates $\mathcal F^0_s$, and our result follows by the monotone class theorem, Theorem B.2.
The next step is to go from $\mathcal F^0_s$ to $\mathcal F_s$.
Proposition 20.3 Let $(X_t, \mathbb P^x)$ be a Markov process and suppose that (20.4) holds. If Assumption 20.1 holds and $f$ is a bounded Borel measurable function, then
$$\mathbb E^x[f(X_{s+t}) \mid \mathcal F_s] = \mathbb E^{X_s} f(X_t), \qquad \mathbb P^x\text{-a.s.} \tag{20.7}$$
It will turn out (see Proposition 20.7 below) that $\mathcal F^0_s$ is equal to $\mathcal F_s$, but we do not know this yet.
Proof We start with (20.5). By linearity, we have
$$\mathbb E^x[f(X_{s+t}) \mid \mathcal F^0_s] = \mathbb E^{X_s} f(X_t), \qquad \mathbb P^x\text{-a.s.}, \tag{20.8}$$
when $f$ is a simple function, then by monotone convergence when $f$ is non-negative, and then by linearity again, when $f$ is bounded and Borel measurable. In particular, we have this when $f$ is bounded and continuous.
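If $B \in \mathcal F_s = \mathcal F^0_{s+}$, then $B \in \mathcal F^0_{s+\varepsilon}$ for every $\varepsilon > 0$. Hence by (20.8) with $s$ replaced by $s + \varepsilon$, if $f$ is bounded and continuous,
$$\mathbb E^x[f(X_{s+t+\varepsilon}); B] = \mathbb E^x\big[\mathbb E^{X_{s+\varepsilon}} f(X_t); B\big]. \tag{20.9}$$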
The right-hand side is equal to $\mathbb E^x[P_t f(X_{s+\varepsilon}); B]$; since $P_t f$ is continuous and $X_t$ has paths that are right continuous with left limits, this converges as $\varepsilon \downarrow 0$ to $\mathbb E^x[P_t f(X_s); B] = \mathbb E^x[\mathbb E^{X_s} f(X_t); B]$ by dominated convergence. The left-hand side of (20.9) converges, using dominated convergence, the continuity of $f$, and the fact that $X$ has paths that are right continuous with left limits, to $\mathbb E^x[f(X_{s+t}); B]$. We therefore have
$$\mathbb E^x[f(X_{s+t}); B] = \mathbb E^x\big[\mathbb E^{X_s} f(X_t); B\big]. \tag{20.10}$$
A limit argument shows this holds whenever $f$ is bounded and measurable. Since $B$ is an arbitrary event in $\mathcal F_s$, that completes the proof.
Remark 20.4 In Chapter 16, we discussed the fact that the first time a right continuous process whose jump times are totally inaccessible hits a Borel set is a stopping time, provided the filtration satisfies the usual conditions. Even though the notion of completion of a filtration is a bit different in the context of Markov processes, the result is still true. See Blumenthal and Getoor (1968).
20.2 The Markov property
We start with the Markov property given by Proposition 20.3:
$$\mathbb E^x[f(X_{s+t}) \mid \mathcal F_s] = \mathbb E^{X_s}[f(X_t)], \qquad \mathbb P^x\text{-a.s.} \tag{20.11}$$
Since $f(X_{s+t}) = f(X_t) \circ \theta_s$, if we write $Y$ for the random variable $f(X_t)$, we have
$$\mathbb E^x[Y \circ \theta_s \mid \mathcal F_s] = \mathbb E^{X_s} Y, \qquad \mathbb P^x\text{-a.s.} \tag{20.12}$$
We wish to generalize this to other random variables $Y$.
Proposition 20.5 Let $(X_t, \mathbb P^x)$ be a Markov process and suppose (20.11) holds. Suppose $Y = \prod_{i=1}^n f_i(X_{t_i - s})$, where the $f_i$ are bounded and Borel measurable, and $s \le t_1 \le \cdots \le t_n$. Then (20.12) holds.
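Proof We will prove this by induction on $n$. The case $n = 1$ is (20.11), so we suppose the equality holds for $n$ and prove it for $n + 1$. Let $V = \prod_{j=2}^{n+1} f_j(X_{t_j - t_1})$ and $h(y) = \mathbb E^y V$. By the induction hypothesis,
$$\begin{aligned}
\mathbb E^x\Big[\prod_{j=1}^{n+1} f_j(X_{t_j}) \,\Big|\, \mathcal F_s\Big] &= \mathbb E^x\big[\mathbb E^x[V \circ \theta_{t_1} \mid \mathcal F_{t_1}]\, f_1(X_{t_1}) \mid \mathcal F_s\big] \\
&= \mathbb E^x\big[(\mathbb E^{X_{t_1}} V)\, f_1(X_{t_1}) \mid \mathcal F_s\big] = \mathbb E^x[(h f_1)(X_{t_1}) \mid \mathcal F_s].
\end{aligned}$$
By (20.11) this is $\mathbb E^{X_s}[(h f_1)(X_{t_1 - s})]$. For any $y$,
$$\begin{aligned}
\mathbb E^y[(h f_1)(X_{t_1 - s})] &= \mathbb E^y\big[(\mathbb E^{X_{t_1 - s}} V)\, f_1(X_{t_1 - s})\big] \\
&= \mathbb E^y\big[\mathbb E^y[V \circ \theta_{t_1 - s} \mid \mathcal F_{t_1 - s}]\, f_1(X_{t_1 - s})\big] = \mathbb E^y\big[(V \circ \theta_{t_1 - s})\, f_1(X_{t_1 - s})\big].
\end{aligned}$$
If we replace $V$ by its definition, replace $y$ by $X_s$, and use the definition of $\theta_{t_1 - s}$, we get the desired equality for $n + 1$ and hence the induction step.
We now come to the general version of the Markov property. As usual, $\mathcal F_\infty = \bigvee_{t \ge 0} \mathcal F_t$. The expression $Y \circ \theta_t$ for general $Y$ may seem puzzling at first. We will give some examples when we get to applications of the strong Markov property in Chapter 21.
Theorem 20.6 Let $(X_t, \mathbb P^x)$ be a Markov process and suppose (20.11) holds. Suppose $Y$ is bounded and measurable with respect to $\mathcal F_\infty$. Then
$$\mathbb E^x[Y \circ \theta_s \mid \mathcal F_s] = \mathbb E^{X_s} Y, \qquad \mathbb P^x\text{-a.s.} \tag{20.13}$$
Proof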
If in Proposition 20.5 we take $f_j(x) = 1_{A_j}(x)$ for Borel measurable $A_j$, we have
$$\mathbb E^x[1_B \circ \theta_s \mid \mathcal F_s] = \mathbb E^{X_s} 1_B \tag{20.14}$$
when $B = \{\omega : \omega(t_1) \in A_1, \ldots, \omega(t_n) \in A_n\}$. It is easy to see that the set of $B$'s for which (20.14) holds is a monotone class. By an argument using the monotone class theorem, (20.14) holds for all $B$ that are measurable with respect to $\mathcal F_\infty$. Taking linear combinations, (20.13) holds for $Y$'s that are simple random variables. Using monotone convergence, (20.13) holds for non-negative $Y$'s, and then by linearity for bounded $Y$'s.
Proposition 20.7 Let $(X_t, \mathbb P^x)$ be a Markov process with respect to $\{\mathcal F_t\}$. Let $\mathcal F^0_t$ and $\mathcal F_t$ be defined by (20.2) and (20.3). Then $\mathcal F_t = \mathcal F^0_t$ for each $t \ge 0$.
Proof Let $Y_1 = \prod_{i=1}^n f_i(X_{t_i})$ and $Y_2 = \prod_{j=1}^m g_j(X_{u_j})$, where $t_1 < \cdots < t_n \le s$ and $0 \le u_1 < \cdots < u_m$, and the $f_i$ and $g_j$ are bounded Borel measurable functions. Then by Proposition 20.5,
$$\mathbb E^x[Y_1 (Y_2 \circ \theta_s) \mid \mathcal F_s] = Y_1\, \mathbb E^{X_s} Y_2.$$
Since $\mathbb E^{X_s} Y_2$ is a function of $X_s$, the product $Y_1\, \mathbb E^{X_s} Y_2$ is $\mathcal F^0_s$ measurable. Using a monotone class argument, we conclude that if $Y$ is bounded and $\mathcal F_\infty$ measurable, then $\mathbb E^x[Y \mid \mathcal F_s]$ is $\mathcal F^0_s$
measurable. Now apply this to $Y = 1_A$ for $A \in \mathcal F_s$ to obtain that $1_A = \mathbb E^x[1_A \mid \mathcal F_s]$ is $\mathcal F^0_s$ measurable.
The following is known as the Blumenthal 0–1 law.
Proposition 20.8 Let $(X_t, \mathbb P^x)$ be a Markov process with respect to $\{\mathcal F_t\}$. If $A \in \mathcal F_0$, then for each $x$, $\mathbb P^x(A)$ is equal to 0 or 1.
Proof Suppose $A \in \mathcal F_0$. Under $\mathbb P^x$, $X_0 = x$, a.s., and then
$$\mathbb P^x(A) = \mathbb E^{X_0} 1_A = \mathbb E^x[1_A \circ \theta_0 \mid \mathcal F_0] = 1_A \circ \theta_0 = 1_A \in \{0, 1\}, \qquad \mathbb P^x\text{-a.s.},$$
since $1_A \circ \theta_0$ is $\mathcal F_0$ measurable. Our result follows because $\mathbb P^x(A)$ is a real number and not random.
20.3 Strong Markov property
Given a stopping time $T$, recall that the $\sigma$-field of events known up to time $T$ is defined to be
$$\mathcal F_T = \{A \in \mathcal F_\infty : A \cap (T \le t) \in \mathcal F_t \text{ for all } t > 0\}.$$
We define $\theta_T$ by $\theta_T(\omega)(t) = \omega(T(\omega) + t)$. Thus, for example, $X_t \circ \theta_T(\omega) = X_{T(\omega)+t}(\omega)$ and $X_T(\omega) = X_{T(\omega)}(\omega)$. Now we can state the strong Markov property. The notation and definition are admittedly a bit opaque at this stage; be patient until we reach the examples in the next chapter.
Theorem 20.9 Suppose $(X_t, \mathbb P^x)$ is a Markov process with respect to $\{\mathcal F_t\}$, that Assumption 20.1 holds, and that $T$ is a finite stopping time. If $Y$ is bounded and measurable with respect to $\mathcal F_\infty$, then
$$\mathbb E^x[Y \circ \theta_T \mid \mathcal F_T] = \mathbb E^{X_T} Y, \qquad \mathbb P^x\text{-a.s.}$$
Proof Following the proofs of Section 20.2, it is enough to prove
$$\mathbb E^x[f(X_{T+t}) \mid \mathcal F_T] = \mathbb E^{X_T} f(X_t) \tag{20.15}$$
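for $f$ bounded and Borel measurable. We can obtain this by a limit argument if we have (20.15) for $f$ bounded and continuous. Define $T_n$ to be equal to $(k+1)/2^n$ on the event $(k/2^n \le T < (k+1)/2^n)$. If $A \in \mathcal F_T$, then $A \in \mathcal F_{T_n}$, since $T \le T_n$. Therefore $A \cap (T_n = k/2^n) \in \mathcal F_{k/2^n}$ and we have by the Markov property, Theorem 20.6,
$$\begin{aligned}
\mathbb E^x[f(X_{T_n+t}); A, T_n = k/2^n] &= \mathbb E^x[f(X_{t+k/2^n}); A, T_n = k/2^n] \\
&= \mathbb E^x\big[\mathbb E^{X_{k/2^n}} f(X_t); A, T_n = k/2^n\big] \\
&= \mathbb E^x\big[\mathbb E^{X_{T_n}} f(X_t); A, T_n = k/2^n\big].
\end{aligned}$$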
Then
$$\begin{aligned}
\mathbb E^x[f(X_{T_n+t}); A] &= \sum_{k=1}^\infty \mathbb E^x[f(X_{T_n+t}); A, T_n = k/2^n] \\
&= \sum_{k=1}^\infty \mathbb E^x\big[\mathbb E^{X_{T_n}} f(X_t); A, T_n = k/2^n\big] = \mathbb E^x\big[\mathbb E^{X_{T_n}} f(X_t); A\big].
\end{aligned}$$
Now let $n \to \infty$. $\mathbb E^x[f(X_{T_n+t}); A] \to \mathbb E^x[f(X_{T+t}); A]$ by dominated convergence and the continuity of $f$ and the right continuity of $X_t$. On the other hand, using the continuity of $P_t f$, $\mathbb E^{X_{T_n}} f(X_t) = P_t f(X_{T_n}) \to P_t f(X_T) = \mathbb E^{X_T} f(X_t)$. Therefore
$$\mathbb E^x[f(X_{T+t}); A] = \mathbb E^x\big[\mathbb E^{X_T} f(X_t); A\big]$$
for all $A \in \mathcal F_T$, and hence (20.15) holds.
Recall that we are restricting our attention to Markov processes whose paths are right continuous with left limits. If we have a Markov process $(X_t, \mathbb P^x)$ whose paths are right continuous with left limits, which has shift operators $\{\theta_t\}$, and which satisfies the conclusion of Theorem 20.9, whether or not Assumption 20.1 holds, then we say that $(X_t, \mathbb P^x)$ is a strong Markov process.
A strong Markov process is said to be quasi-left continuous if $X_{T_n} \to X_T$, a.s., on $\{T < \infty\}$ whenever $T_n$ are stopping times increasing up to $T$. Unlike in the definition of predictable stopping times given in Chapter 16, we are not requiring the $T_n$ to be strictly less than $T$. A Hunt process is a strong Markov process that is quasi-left continuous. Quasi-left continuity does not imply left continuity; consider the Poisson process.
Proposition 20.10 If $(X_t, \mathbb P^x)$ is a strong Markov process and Assumption 20.1 holds, then $X_t$ is quasi-left continuous.
Proof First suppose $T$ is bounded, $T_n$ increases to $T$, $Y = \lim_{n\to\infty} X_{T_n}$, and $f$ and $g$ are bounded and continuous. If $T_n = T$ for some $n$, then $\lim_{n\to\infty} g(X_{T_n+t}) = g(X_{T+t})$, and if $T_n < T$ for all $n$, then $\lim_{n\to\infty} g(X_{T_n+t}) = g(X_{(T+t)-})$, where $X_{s-}$ is the left-hand limit at time $s$. In either case,
$$\lim_{t \to 0} \lim_{n \to \infty} g(X_{T_n+t}) = g(X_T).$$
Then
$$\begin{aligned}
\mathbb E^x[f(Y) g(X_T)] &= \lim_{t \to 0} \lim_{n \to \infty} \mathbb E^x[f(X_{T_n}) g(X_{T_n+t})] \\
&= \lim_{t \to 0} \lim_{n \to \infty} \mathbb E^x[f(X_{T_n}) P_t g(X_{T_n})] \\
&= \lim_{t \to 0} \mathbb E^x[f(Y) P_t g(Y)] = \mathbb E^x[f(Y) g(Y)].
\end{aligned}$$
By a limit argument we have
$$\mathbb E^x[h(Y, X_T)] = \mathbb E^x[h(Y, Y)] \tag{20.16}$$
for all bounded measurable functions $h$ on $\mathcal S \times \mathcal S$. Now take $h(x, y)$ to be zero if $x = y$ and one otherwise. The right-hand side of (20.16) is 0, so the left-hand side is also.
If T is not bounded, apply the argument in the preceding paragraph to the stopping time T ∧ M, where M is a positive real, and then let M → ∞.
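The approximating times $T_n$ used in the proof of Theorem 20.9 take values in the dyadic grid $\{k/2^n\}$ and decrease to $T$ strictly from above. That is what allows the Markov property at fixed times to be applied on each event $(T_n = k/2^n)$, and the right continuity of paths to be used in the limit. The sketch below evaluates $T_n$ for a single illustrative value of $T(\omega)$ and checks $T < T_n \le T + 2^{-n}$ and that $T_n$ is non-increasing in $n$.

```python
import math


def dyadic_approx(T, n):
    """T_n = (k + 1) / 2^n on the event {k / 2^n <= T < (k + 1) / 2^n}."""
    k = math.floor(T * 2 ** n)
    return (k + 1) / 2 ** n


T = 0.3  # one hypothetical value of the stopping time T(omega)
approximations = [dyadic_approx(T, n) for n in range(1, 11)]
print(approximations)
```

Note that when $T$ is itself a dyadic rational $k/2^n$, the approximation still jumps to $(k+1)/2^n$, so $T_n > T$ always; this is why $X_{T_n+t} \to X_{T+t}$ needs only right continuity.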
Exercises
20.1 Suppose that $\mathcal S$ is a locally compact separable metric space and $C_0$ is the set of continuous functions on $\mathcal S$ that vanish at infinity. To say a continuous function $f$ vanishes at infinity means that given $\varepsilon > 0$ there exists a compact set $K$ such that $|f(x)| < \varepsilon$ if $x \notin K$. Show that if Assumption 20.1 is replaced by the assumptions that $P_t f \in C_0$ whenever $f \in C_0$ and $P_t f \to f$ uniformly as $t \to 0$ whenever $f \in C_0$, then the conclusion of Theorem 20.9 still holds.
20.2 Suppose $(X_t, \mathbb P^x)$ is a Markov process with respect to a filtration $\{\mathcal F_t\}$. Suppose that $\mathcal E_t \subset \mathcal F_t$ for each $t$ and that $X_t$ is $\mathcal E_t$ measurable for each $t$. Show that $(X_t, \mathbb P^x)$ is a Markov process with respect to the filtration $\{\mathcal E_t\}$.
20.3 Give an example of a Markov process that is not a strong Markov process. Hint: Let the state space be $[0,\infty)$ and starting from $x \in (0,\infty)$, let $X$ move deterministically at constant speed to the right. Starting at 0, let $X$ wait an exponential length of time, and then begin moving at constant speed to the right.
20.4 Let $(X_t, \mathbb P^x)$ be Brownian motion and let $\{\mathcal F_t\}$ be the minimal augmented filtration. Suppose $B \in \bigvee_{t \ge 0} \mathcal F_t$ and, for some $s > 0$, $1_B = 1_A \circ \theta_s$. Show that if $B$ is a $\mathbb P^x$-null set for some $x$, then it is a $\mathbb P^x$-null set for every $x$.
20.5 Let $P_t$ be transition probabilities for a Poisson process with parameter $\lambda$. These are defined in Exercise 19.3. Show that Assumption 20.1 holds.
20.6 Suppose $(X_t, \mathbb P^x)$ is a Markov process with transition probabilities $P_t$, $f$ is a bounded Borel measurable function, $t_0 > 0$, and we define $M_t = P_{t_0 - t} f(X_t)$ for $t \le t_0$. Show that $(M_t,\ t \le t_0)$ is a $\mathbb P^x$-martingale for each $x$.
20.7 Use the Blumenthal 0–1 law to show that if $W$ is a one-dimensional Brownian motion and $T = \inf\{t > 0 : W_t > 0\}$ is the first time Brownian motion hits $(0,\infty)$, then $\mathbb P(T = 0) = 1$.
20.8 Let $A$ be a Borel subset of a metric space $\mathcal S$. Let $T_A = \inf\{t : X_t \in A\}$, where $(X_t, \mathbb P^x)$ is a strong Markov process. Show that $\mathbb P^x(T_A = 0)$ is either 0 or 1 for each $x$.
20.9 Let $(X_t, \mathbb P^x)$ be a strong Markov process and let $A$ be a Borel subset of $\mathcal S$. We define $A^r$ by setting $A^r = \{x : \mathbb P^x(T_A = 0) = 1\}$, where $T_A$ is the first hitting time of $A$. Thus $A^r$ is the set of points that are regular for $A$. Prove that for each $x$, $\mathbb P^x(X_{T_A} \in A \cup A^r) = 1$.
21 Applications of the Markov properties
We give some applications of the Markov property and the strong Markov property. In the first application, we show that $d$-dimensional Brownian motion is transient if $d \ge 3$. Next we consider estimates on additive functionals. (An example of an additive functional is $A_t = \int_0^t f(X_s)\,ds$, where $f$ is a non-negative function on the state space of the Markov process $X$.) Third is a sufficient criterion for a Markov process to have continuous paths. Finally, we discuss harmonic functions and show how to solve the classical Dirichlet problem of analysis and partial differential equations.
21.1 Recurrence and transience
Let $W_t = (W_1(t), \ldots, W_d(t))$ be a $d$-dimensional Brownian motion started at 0 with $d \ge 3$ and let $W^x_t = x + W_t$ be Brownian motion started at $x$. Let $h(y) = |y|^{2-d}$. A direct calculation of derivatives shows that
$$\Delta h(x) = \sum_{i=1}^d \frac{\partial^2 h}{\partial x_i^2}(x) = 0, \qquad x \ne 0.$$
(Noting that $\frac{\partial}{\partial y_i} |y| = \frac{\partial}{\partial y_i} (y_1^2 + \cdots + y_d^2)^{1/2} = \frac{y_i}{|y|}$ helps with the calculation.) By Exercise 9.4, $\langle W_i, W_j\rangle_t$ equals 0 if $i \ne j$ and we saw in Section 9.3 that it equals $t$ if $i = j$. Suppose $r < |x| < R$, and let $S = \inf\{t : |W^x_t| \le r \text{ or } |W^x_t| \ge R\}$. $S$ is finite, a.s., because $|W^x_t| \ge |W_1(t)| - |x|$, so $|W^x_t| > R$ once $|W_1(t)| \ge 2R$, and $W_1(t)$ exits $[-2R, 2R]$ in finite time by Theorem 7.2. By Itô's formula,
$$h(W^x_{t \wedge S}) = h(W^x_0) + \text{martingale} + \frac12 \int_0^{t \wedge S} \sum_{i=1}^d \frac{\partial^2 h}{\partial x_i^2}(W^x_s)\,ds = h(x) + \text{martingale}.$$
Therefore $h(W^x_{t \wedge S}) - h(x)$ is a martingale started at 0. The function $h$ is equal to $r^{2-d}$ on $\partial B(0, r)$, the boundary of $B(0, r)$, and equal to $R^{2-d}$ on $\partial B(0, R)$, the boundary of $B(0, R)$.
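By Corollary 3.17, we deduce
$$P(W^x_t \text{ hits } \partial B(0,r) \text{ before } \partial B(0,R)) = P\big(h(W^x_t) - h(x) \text{ hits } r^{2-d} - |x|^{2-d} \text{ before } R^{2-d} - |x|^{2-d}\big) = \frac{|x|^{2-d} - R^{2-d}}{r^{2-d} - R^{2-d}}.$$
If we let $R \to \infty$ and recall that $2 - d < 0$, we see that
$$P(W^x_t \text{ ever hits } \partial B(0,r)) = \Big(\frac{r}{|x|}\Big)^{d-2}. \qquad (21.1)$$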
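A quick numerical sanity check of the two facts just used may help: $h(y) = |y|^{2-d}$ has zero Laplacian away from the origin, and the exit probability converges to (21.1) as $R \to \infty$. The Python sketch below uses a central-difference Laplacian. The helper names, step size and test points are illustrative choices and do not come from the text.

```python
import math

def h(y):
    """h(y) = |y|^(2 - d), where d = len(y)."""
    d = len(y)
    return math.sqrt(sum(c * c for c in y)) ** (2 - d)

def laplacian(f, y, step=1e-3):
    """Central-difference approximation to sum_i d^2 f / dy_i^2 at the point y."""
    total = 0.0
    for i in range(len(y)):
        up, down = list(y), list(y)
        up[i] += step
        down[i] -= step
        total += (f(up) - 2.0 * f(y) + f(down)) / step ** 2
    return total

def hit_inner_first(x_norm, r, R, d):
    """Probability of reaching |y| = r before |y| = R from |x| = x_norm (r < x_norm < R)."""
    return (x_norm ** (2 - d) - R ** (2 - d)) / (r ** (2 - d) - R ** (2 - d))

def ever_hit(x_norm, r, d):
    """Formula (21.1): (r / |x|)^(d - 2)."""
    return (r / x_norm) ** (d - 2)

print(laplacian(h, [1.0, 0.5, -0.3]), laplacian(h, [1.2, 0.5, -0.7, 0.9]))  # both near 0
for R in (10.0, 1e3, 1e6):
    print(R, hit_inner_first(2.0, 1.0, R, 3), ever_hit(2.0, 1.0, 3))
```

As $R$ grows, the printed exit probabilities approach $(r/|x|)^{d-2} = 1/2$ for $d = 3$, $r = 1$, $|x| = 2$.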
We want to use the strong Markov property to go from (21.1) to
$$\lim_{t\to\infty}|W^x_t| = \infty.$$
(There are other ways besides the strong Markov property of showing this.) The first step in doing this is to convert to the Markov process notation. Let $(X_t, P^x)$ be a Brownian motion. What we have shown is that
$$P^x(X_t \text{ ever hits } \partial B(0,r)) = \Big(\frac{r}{|x|}\Big)^{d-2}. \qquad (21.2)$$
Let $M > 0$ and let $S_1 = \inf\{t : |X_t| \ge 2M\}$, $T_1 = \inf\{t > S_1 : |X_t| \le M\}$, $S_2 = \inf\{t > T_1 : |X_t| \ge 2M\}$, $T_2 = \inf\{t > S_2 : |X_t| \le M\}$, and so on. Another way of writing this is to define
$$S = \inf\{t > 0 : |X_t| \ge 2M\}, \qquad T = \inf\{t > 0 : |X_t| \le M\},$$
and then to let $S_1 = S$ and, for each $i \ge 1$,
$$T_i = S_i + T \circ \theta_{S_i}, \qquad S_{i+1} = T_i + S \circ \theta_{T_i}.$$
Let us explain what is going on. Given a path $\omega$, which is a continuous function from $[0,\infty)$ to $\mathbb R^d$, $T \circ \theta_{S_i}$ means to proceed along the path until time $S_i$, disregard this piece, and then see how long it takes after time $S_i$ to first enter $B(0,M)$. If we add the quantity $S_i$ to $T \circ \theta_{S_i}$, we get the first time after $S_i$ at which $X_t$ enters $B(0,M)$. Thus $T_i$ in the shift notation is the same as $\inf\{t > S_i : X_t \in B(0,M)\}$. The shift notation interpretation of $S_{i+1}$ is similar. Now we can apply the strong Markov property. Since $T_{i+1} = S_{i+1} + T \circ \theta_{S_{i+1}}$, we can write
$$P^x(T_{i+1} < \infty) = P^x(S_{i+1} < \infty,\ T\circ\theta_{S_{i+1}} < \infty) = E^x\big[P^x(T\circ\theta_{S_{i+1}} < \infty \mid \mathcal F_{S_{i+1}});\ S_{i+1} < \infty\big] = E^x\big[P^{X_{S_{i+1}}}(T < \infty);\ S_{i+1} < \infty\big].$$
At time $S_{i+1}$, we have $|X_{S_{i+1}}| = 2M$, and by (21.1)
$$P^{X_{S_{i+1}}}(T < \infty) = \big(\tfrac12\big)^{d-2}.$$
Therefore
$$P^x(T_{i+1} < \infty) \le 2^{2-d} P^x(S_{i+1} < \infty) \le 2^{2-d} P^x(T_i < \infty).$$
The last inequality is simply the fact that $S_{i+1} \ge T_i$. Since $P^x(T_1 < \infty) \le 1$, induction tells us that
$$P^x(T_i < \infty) \le 2^{(2-d)(i-1)} \to 0 \quad \text{as } i \to \infty.$$
Hence $P^x(T_i < \infty \text{ for all } i) = 0$. Since $T_i$ increases as $i$ increases, for almost all $\omega$, $T_i$ will be infinite for $i$ sufficiently large (how large will depend on $\omega$). Hence $X_t$ returns to $B(0,M)$ for a last time, a.s. Since $M$ is arbitrary, this proves that $|X_t| \to \infty$ as $t \to \infty$. We have thus proved
Proposition 21.1 If $(X_t, P^x)$ is a $d$-dimensional Brownian motion and $d \ge 3$, then $|X_t| \to \infty$ as $t \to \infty$ with $P^x$-probability one for each $x$.
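Summing the geometric bound gives a slightly more quantitative form of the same conclusion. The display below is a supplementary calculation and is not part of the original argument. It shows that the expected number of indices $i$ with $T_i < \infty$ is finite and does not depend on $M$. That number counts the returns from $\partial B(0,2M)$ into $B(0,M)$.
$$E^x\big[\#\{i \ge 1 : T_i < \infty\}\big] = \sum_{i\ge1} P^x(T_i < \infty) \le \sum_{i\ge1} 2^{(2-d)(i-1)} = \frac{1}{1 - 2^{2-d}},$$
which equals 2 when $d = 3$.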
21.2 Additive functionals
Let $D$ be a closed subset of $S$, let $f : D \to [0,\infty)$, let $S = \tau_D$, and let
$$A = \sup_{x\in D} E^x \int_0^S f(X_s)\,ds,$$
where $\tau_D = \inf\{t > 0 : X_t \notin D\}$ is the first time $X$ exits $D$.
Proposition 21.2 If $A < \infty$, then
$$\sup_{x\in D} P^x\Big(\int_0^S f(X_s)\,ds \ge 2kA\Big) \le 2^{-k}. \qquad (21.3)$$
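As a concrete sketch of how (21.3) compares with the true tail, take $X$ to be one-dimensional Brownian motion, $D = [-1,1]$ and $f \equiv 1$, so that $\int_0^S f(X_s)\,ds = \tau_D$. The sketch assumes two standard facts that are not proved in this section. The first is $E^x\tau_D = 1 - x^2$, which gives $A = 1$. The second is the eigenfunction series for $P^0(\tau_D > t)$ used below. The Python code evaluates that series at $t = 2k$ and compares it with $2^{-k}$ and with the Chebyshev bound $(2k)^{-1}$. Only the starting point 0 is checked.

```python
import math

def exit_tail(t, terms=200):
    """P^0(tau > t), tau = exit time of Brownian motion from (-1, 1).

    Assumed eigenfunction expansion (generator (1/2) d^2/dx^2):
    sum over n >= 0 of 4 (-1)^n / ((2n+1) pi) * exp(-(2n+1)^2 pi^2 t / 8).
    """
    total = 0.0
    for n in range(terms):
        m = 2 * n + 1
        total += 4.0 * (-1) ** n / (m * math.pi) * math.exp(-m * m * math.pi ** 2 * t / 8.0)
    return total

A = 1.0   # sup_x E^x tau = sup_x (1 - x^2) = 1, attained at x = 0
for k in range(1, 6):
    # true tail, exponential bound (21.3), Chebyshev bound
    print(k, exit_tail(2 * k * A), 2.0 ** -k, 1.0 / (2 * k))
```

The true tail decays even faster than $2^{-k}$ here, while the Chebyshev bound decays only like $1/k$.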
This is rather remarkable: as soon as one gets a bound on the expectation, although it must be uniform in $x$, one gets exponential tails for the distribution. A use of Chebyshev's inequality would only give the bound $(2k)^{-1}$.
Proof Let $B_t = \int_0^{t\wedge S} f(X_s)\,ds$. This is a special case of what is known as an additive functional; see Section 22.3. Let $U_1 = \inf\{t : B_t \ge 2A\}$, and let $U_{i+1} = U_i + U_1 \circ \theta_{U_i}$. To explain this formula, composing $\omega$ with $\theta_{U_i}$ means we disregard the path before time $U_i$. Thus $U_1 \circ \theta_{U_i}$ is the length of time after time $U_i$ until $B_t$ has increased an amount $2A$ over its value at $U_i$. Therefore $U_i + U_1 \circ \theta_{U_i}$ is the $(i+1)$st time $B$ has increased by $2A$. The probability $P^x(B_S \ge 2kA)$ is bounded by
$$P^x(U_k \le S) = P^x(U_{k-1} \le S,\ U_1\circ\theta_{U_{k-1}} \le S\circ\theta_{U_{k-1}}) = E^x\big[P^x(U_1\circ\theta_{U_{k-1}} \le S\circ\theta_{U_{k-1}} \mid \mathcal F_{U_{k-1}});\ U_{k-1} \le S\big] = E^x\big[P^{X_{U_{k-1}}}(U_1 \le S);\ U_{k-1} \le S\big].$$
If $U_{k-1} \le S$, then $X_{U_{k-1}} \in D$. If $y \in D$,
$$P^y(U_1 \le S) \le P^y\Big(\int_0^S f(X_s)\,ds \ge 2A\Big) \le \frac{E^y\int_0^S f(X_s)\,ds}{2A} \le \frac12$$
by Chebyshev's inequality. Then
$$P^x(U_k \le S) \le \tfrac12 P^x(U_{k-1} \le S),$$
and (21.3) follows by induction.
We give another proof of Proposition 4.5.
Proposition 21.3 Let $W$ be a one-dimensional Brownian motion. If $T$ is a finite stopping time and $a < b$, then
$$P(W_{T+t} \in [a,b] \mid \mathcal F_T) \le \frac{b-a}{\sqrt{2\pi t}}, \qquad \text{a.s.}$$
Proof Let $(X_t, P^x)$ be a one-dimensional Brownian motion. If $y \in \mathbb R$, then
$$P^y(X_t \in [a,b]) = P^0(X_t \in [a-y, b-y]) = \frac{1}{\sqrt{2\pi t}}\int_{a-y}^{b-y} e^{-z^2/2t}\,dz \le \frac{b-a}{\sqrt{2\pi t}}. \qquad (21.4)$$
By the strong Markov property,
$$P(W_{T+t} \in [a,b] \mid \mathcal F_T) = P^0(X_{T+t} \in [a,b] \mid \mathcal F_T) = E^0[1_{[a,b]}(X_t)\circ\theta_T \mid \mathcal F_T] = E^{X_T}[1_{[a,b]}(X_t)] = P^{X_T}(X_t \in [a,b]).$$
Now use (21.4) with $y$ replaced by $X_T$.
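Here is a small numerical check of (21.4) in Python. The probability $P^y(X_t \in [a,b])$ is computed from the normal distribution function via `math.erf`. It is maximized over a grid of starting points $y$ and then compared with $(b-a)/\sqrt{2\pi t}$. The bound holds uniformly in $y$, and that uniformity is what lets the proof substitute the random point $X_T$ for $y$. The parameter values and function names are arbitrary illustrative choices.

```python
import math

def prob_in_interval(y, t, a, b):
    """P^y(X_t in [a, b]) for one-dimensional Brownian motion started at y."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal cdf
    s = math.sqrt(t)
    return Phi((b - y) / s) - Phi((a - y) / s)

def bound_21_4(t, a, b):
    """Right-hand side (b - a) / sqrt(2 pi t) of (21.4)."""
    return (b - a) / math.sqrt(2.0 * math.pi * t)

t, a, b = 0.5, -0.2, 0.3
worst = max(prob_in_interval(k / 100, t, a, b) for k in range(-300, 301))
print(worst, bound_21_4(t, a, b))   # worst <= bound, uniformly in the start y
```

For a very short interval centred at the starting point, the ratio of the two sides tends to 1, so the constant in (21.4) cannot be improved.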
21.3 Continuity
Let us now come up with a criterion for a Markov process to have continuous paths. We assume we have a strong Markov process $(X_t, P^x)$ whose paths are right continuous with left limits. Let $d(\cdot,\cdot)$ be the metric for the state space $S$.
Lemma 21.4 Let $(X_t, P^x)$ be a strong Markov process with state space $S$. For all $x \in S$ and all $\lambda \ge 0$,
$$P^x\big(\sup_{s\le t} d(X_s, x) \ge \lambda\big) \le 2\sup_{s\le t}\sup_{y\in S} P^y\big(d(X_s, X_0) \ge \lambda/2\big).$$
Note that the left-hand side has the supremum inside while the right-hand side has the suprema outside the probability.
Proof Let us use the notation
$$F(t,\lambda) = \sup_{s\le t}\sup_{y\in S} P^y\big(d(X_s, X_0) \ge \lambda\big). \qquad (21.5)$$
Write $S = \inf\{t : d(X_t, X_0) \ge \lambda\}$. Then by the strong Markov property,
$$P^x\big(\sup_{s\le t} d(X_s, x) \ge \lambda\big) \le P^x(d(X_t, x) \ge \lambda/2) + P^x(S < t,\ d(X_t, X_0) \le \lambda/2) \le F(t, \lambda/2) + E^x\big[P^{X_S}(d(X_{t-S}, X_0) \ge \lambda/2);\ S < t\big] \le 2F(t, \lambda/2); \qquad (21.6)$$
see Exercise 21.2.
Proposition 21.5 Let $(X_t, P^x)$ be a strong Markov process. With $F(t,\lambda)$ defined as in (21.5), suppose
$$\frac{F(t,\lambda)}{t} \to 0 \qquad (21.7)$$
as $t \to 0$ for each $\lambda > 0$. Then $X_t$ has continuous paths with $P^x$-probability one for each $x$.
For $X$ a Brownian motion, $F(t,\lambda) \le 2e^{-\lambda^2/8t}$ by Proposition 3.15, and hence $F(t,\lambda)/t \to 0$ as $t \to 0$. Thus Brownian motion satisfies (21.7). On the other hand, (21.7) is not satisfied for the Poisson process; see Exercise 21.3.
Proof Suppose $\lambda, t_0 > 0$ and $X$ has a jump of size larger than $4\lambda$ at some time before $t_0$ with positive probability, that is,
$$P^x\big(\sup_{t\le t_0} d(X_{t-}, X_t) \ge 4\lambda\big) > 0,$$
where $X_{t-} = \lim_{s\uparrow t,\, s<t} X_s$
$$d(X_s, X_t) \ge 4\lambda;$$
$[x]$ is the largest integer less than or equal to $x$. Therefore there exists $k \le [t_0 2^n] + 1$ such that
$$\sup_{s\in[k/2^n,\,(k+1)/2^n]} d(X_s, X_{k/2^n}) \ge 2\lambda.$$
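But by Lemma 21.4
$$P^x\Big(\exists k \le [t_0 2^n] + 1 : \sup_{k/2^n \le s \le (k+1)/2^n} d(X_s, X_{k/2^n}) \ge 2\lambda\Big) \le ([t_0 2^n] + 1)\sup_y P^y\big(\sup_{s\le 2^{-n}} d(X_s, X_0) \ge 2\lambda\big) \le 2([t_0 2^n] + 1)F(2^{-n}, \lambda)$$
for every $n$. In the first inequality we used the Markov property at time $k/2^n$ and the fact that there are at most $[t_0 2^n] + 1$ intervals. Since $([t_0 2^n] + 1)2^{-n} \to t_0$, the right-hand side is $2([t_0 2^n] + 1)2^{-n}\cdot F(2^{-n},\lambda)/2^{-n}$, which tends to 0 by (21.7). Letting $n \to \infty$, we see the probability of a jump of size larger than $4\lambda$ before time $t_0$ must be zero. Since $\lambda$ and $t_0$ are arbitrary, the paths of $X$ are continuous.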
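For one-dimensional Brownian motion, $F(t,\lambda)$ in (21.5) can be computed in closed form, so (21.7) is easy to see numerically. The sketch below uses the fact that $P^y(|X_s - X_0| \ge \lambda) = \operatorname{erfc}(\lambda/\sqrt{2s})$. This quantity does not depend on $y$ and increases in $s$, so the double supremum is attained at $s = t$. The function name and the value of $\lambda$ are illustrative.

```python
import math

def F_brownian(t, lam):
    """F(t, lam) of (21.5) for one-dimensional Brownian motion: erfc(lam / sqrt(2 t))."""
    return math.erfc(lam / math.sqrt(2.0 * t))

lam = 0.5
for t in [0.1, 0.05, 0.01, 0.005]:
    bound = 2 * math.exp(-lam ** 2 / (8 * t))   # the bound quoted from Proposition 3.15
    print(t, F_brownian(t, lam), bound, F_brownian(t, lam) / t)
```

The last column, $F(t,\lambda)/t$, collapses towards 0 as $t$ decreases, which is condition (21.7).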
21.4 Harmonic functions
Suppose $(X_t, P^x)$ is a continuous Markov process satisfying the strong Markov property, and for each $x$, the paths are right continuous with left limits with $P^x$-probability one. Let $D$ be an open subset of $S$, and suppose that $\tau_D < \infty$, a.s., with respect to each $P^x$, where $\tau_D = \inf\{t : X_t \notin D\}$ is the time of the first exit from $D$. Let $f$ be a bounded measurable function on $\partial D$, the boundary of $D$.
Proposition 21.6 Define $h(x) = E^x f(X_{\tau_D})$ and $\mathcal F'_s = \mathcal F_{s\wedge\tau_D}$. Then for each $x$, $h(X_{t\wedge\tau_D})$ is a martingale under $P^x$ with respect to the filtration $\{\mathcal F'_t\}$.
Proof Let $s < t$. Consider a path $\omega$ starting at $x$ and continuing until it exits $D$ at time $\tau_D(\omega)$. If we have $u \le \tau_D$ and we cut off the first $u$ time units of the path, we have a path going from $X_u(\omega)$ and proceeding until it exits $D$. But note that the point at which it exits will not be changed by cutting off a piece from the beginning of the path. Therefore $X_{\tau_D}\circ\theta_u = X_{\tau_D}$ if $u \le \tau_D$. Using this,
$$E^x[h(X_{t\wedge\tau_D}) \mid \mathcal F_{s\wedge\tau_D}] = E^x\big[E^{X_{t\wedge\tau_D}} f(X_{\tau_D}) \mid \mathcal F_{s\wedge\tau_D}\big] = E^x\big[E^x[f(X_{\tau_D})\circ\theta_{t\wedge\tau_D} \mid \mathcal F_{t\wedge\tau_D}] \mid \mathcal F_{s\wedge\tau_D}\big] = E^x[f(X_{\tau_D}) \mid \mathcal F_{s\wedge\tau_D}] = E^x[f(X_{\tau_D})\circ\theta_{s\wedge\tau_D} \mid \mathcal F_{s\wedge\tau_D}] = E^{X_{s\wedge\tau_D}} f(X_{\tau_D}) = h(X_{s\wedge\tau_D}),$$
as required.
This becomes particularly interesting in the case when $X_t$ is a $d$-dimensional Brownian motion. Suppose $D$ is a bounded domain (i.e., a bounded open subset) in $\mathbb R^d$. There exists $M$ such that $D \subset B(0,M)$. We know $X^1_t$, the first component of $X_t$, is a one-dimensional Brownian motion, and by Theorem 7.2, $X^1_t$ will exit $[-M, M]$ in finite time, no matter what $X^1_0$ is. Therefore the time for $X_t$ to exit $D$ will be finite almost surely with respect to each $P^x$. Take $x \in D$ and take $\delta$ smaller than the distance from $x$ to the boundary of $D$. If $S = \inf\{t : |X_t - x| \ge \delta\}$, the first time $X$ leaves the ball of radius $\delta$ about $x$, then by Proposition 21.6 and optional stopping, we have
$$h(x) = E^x h(X_S). \qquad (21.8)$$
By Exercise 2.3 we know that $d$-dimensional Brownian motion is rotationally invariant. We conclude from this that the location where a Brownian motion hits the boundary of a ball of radius $\delta$ about the starting point must have a uniform distribution. Hence $X_S$ will be uniformly distributed on $\partial B(x,\delta)$. Thus (21.8) can be rewritten as
$$h(x) = \int_{\partial B(x,\delta)} h(y)\,\sigma_{x,\delta}(dy),$$
where $\sigma_{x,\delta}$ is surface measure on $\partial B(x,\delta)$ normalized to have total mass one. This holds for every $\delta$ small enough, and since $h$ is bounded (because $f$ is), it can be shown that $h$ is $C^2$ in $D$ and is harmonic there:
$$\Delta h(x) = \sum_{i=1}^d \frac{\partial^2 h}{\partial x_i^2}(x) = 0;$$
the proof is not obvious; see Bass (1995), Section II.1. We can use Proposition 21.6 to give a solution to the Dirichlet problem. In the Dirichlet problem one is given a domain $D$ in $\mathbb R^d$ and a continuous function $f$ on the boundary of $D$. One wants to find a continuous function $h$ that is harmonic inside $D$, that is, $\Delta h(x) = 0$ for $x \in D$, and that agrees with $f$ on $\partial D$. There are domains for which one cannot solve the Dirichlet problem, but a solution can be found provided the domain is moderately nice. We explain how to solve the Dirichlet problem probabilistically; the class of domains where one can do this is the same as the class where one can solve the Dirichlet problem analytically. Let us say that a point $x$ is regular for a Borel subset $A$ if $P^x(T_A = 0) = 1$, where $T_A = \inf\{t > 0 : X_t \in A\}$. Thus a point $x$ is regular for a set $A$ if, starting at $x$, the Brownian motion enters $A$ immediately. For example, a consequence of Theorem 7.2 is that the point 0 is regular for the set $A = (0,\infty)$ when we have a one-dimensional Brownian motion.
Theorem 21.7 Suppose $D$ is a bounded open domain in $\mathbb R^d$ and $f$ is a function on $\partial D$ that is continuous on $\partial D$. Let $(X_t, P^x)$ be a $d$-dimensional Brownian motion and $\tau_D = \inf\{t : X_t \in D^c\}$. If each point of $\partial D$ is regular for $D^c$, then $h(x) = E^x f(X_{\tau_D})$ is a solution to the Dirichlet problem.
The regularity condition says that starting at any point of $\partial D$, Brownian motion enters $D^c$ immediately. Uniqueness of the solution to the Dirichlet problem is easy, and we do not address this here.
Proof We have already seen in Proposition 21.6 and the remarks immediately following the proof of that proposition that $h$ is harmonic in $D$. This implies that $h$ is continuous in $D$. Thus we only need to show that $h$ agrees with $f$ on $\partial D$. Our first step is to fix $t$ and $\varepsilon$ and to show that the set $\{x : P^x(\tau_D \le t) > 1 - \varepsilon\}$ is an open set. Let $s < t$, define $\varphi_s(x) = P^x(\tau_D \le t - s)$, and let $w_s(x) = P^x(X_u \in D^c \text{ for some } u \in [s,t])$.
By the Markov property at time $s$,
$$w_s(x) = E^x P^{X_s}(X_u \in D^c \text{ for some } u \in [0, t-s]) = E^x[P^{X_s}(\tau_D \le t-s)] = E^x\varphi_s(X_s) = (2\pi s)^{-d/2}\int \varphi_s(y)e^{-|x-y|^2/2s}\,dy.$$
By dominated convergence, the last integral is a continuous function of $x$. If $w_0(x) = P^x(X_u \in D^c \text{ for some } u \in [0,t])$, then $w_s(x) \uparrow w_0(x)$ as $s \downarrow 0$, so
$$\{x : w_0(x) > 1-\varepsilon\} = \bigcup_{s\in(0,t)}\{x : w_s(x) > 1-\varepsilon\}$$
is open.
Let $z \in \partial D$. Let $\varepsilon > 0$ and choose $\eta$ such that $|f(w) - f(z)| < \varepsilon$ if $|w - z| < \eta$ and $w \in \partial D$. Pick $t$ small so that $P^0(\sup_{s\le t}|X_s| > \eta/2) < \varepsilon$; this is possible because Brownian motion has continuous paths. Because $z \in \partial D$ and every point of $\partial D$ is regular for $D^c$, $P^z(\tau_D \le t) = 1$. Finally choose $\delta < (\eta/2)\wedge\varepsilon$ so that if $|w - z| < \delta$ and $w \in D$, then $P^w(\tau_D \le t) > 1 - \varepsilon$. Now if $|w - z| < \delta$ and $w \in D$, then
$$P^w(|X_{\tau_D} - z| < \eta) \ge P^w\big(\tau_D \le t,\ \sup_{s\le t}|X_s - w| \le \eta/2\big) \ge P^w(\tau_D \le t) - P^0\big(\sup_{s\le t}|X_s| > \eta/2\big) \ge (1-\varepsilon) - \varepsilon.$$
The set $\partial D$ is a bounded and closed subset of $\mathbb R^d$, hence compact, and since $f$ is continuous on $\partial D$, there exists $M$ such that $|f|$ is bounded by $M$. If $|w - z| < \delta$ and $w \in D$,
$$|h(w) - f(z)| = |E^w f(X_{\tau_D}) - f(z)| \le \big|E^w[f(X_{\tau_D});\ |X_{\tau_D} - z| < \eta] - f(z)P^w(|X_{\tau_D} - z| < \eta)\big| + 2M P^w(|X_{\tau_D} - z| \ge \eta) \le \varepsilon P^w(|X_{\tau_D} - z| < \eta) + 4M\varepsilon \le (1 + 4M)\varepsilon.$$
We used the fact that $|f(X_{\tau_D}) - f(z)| < \varepsilon$ if $|X_{\tau_D} - z| < \eta$. Since $\varepsilon$ is arbitrary, this proves that $h(w) \to f(z)$ as $w \to z$ inside $D$.
Let us give a sufficient condition for a point to be regular for $D^c$. Let $\widetilde V_a = \{(x_1,\dots,x_d) : x_1 > 0,\ x_2^2 + \cdots + x_d^2 < a^2x_1^2\}$. The vertex of $\widetilde V_a$ is the origin. A cone $V$ in $\mathbb R^d$ is a translation and rotation of $\widetilde V_a$ for some $a$. The following is known as the Poincaré cone condition.
Proposition 21.8 Suppose there exists a cone $V$ with vertex $y \in \partial D$ such that $V \cap B(y,r) \subset D^c$ for some $r > 0$. Then $y$ is regular for $D^c$.
Proof By translation and rotation of the coordinates, we may suppose $y = 0$ and $V = \widetilde V_a$ for some $a$. Then for each $t$,
$$P^0(\tau_D \le t) \ge P^0(X_t \in D^c) \ge P^0(X_t \in V \cap B(0,r)) \ge P^0(X_t \in V) - P^0(X_t \notin B(0,r)).$$
By scaling, the last term is $P^0(X_1 \in V) - P^0(X_1 \notin B(0, r/\sqrt t))$, which converges to
$$P^0(X_1 \in V) = (2\pi)^{-d/2}\int_V e^{-|z|^2/2}\,dz > 0$$
as $t \to 0$. Observe that $P^0(\tau_D \le t)$ converges to $P^0(\tau_D = 0)$, so $P^0(\tau_D = 0) \ge P^0(X_1 \in V) > 0$. By the Blumenthal 0–1 law (Proposition 20.8), $P^0(\tau_D = 0)$ is either 0 or 1, hence $P^0(\tau_D = 0) = 1$.
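For the unit disk in $\mathbb R^2$, every boundary point satisfies the cone condition of Proposition 21.8, so Theorem 21.7 applies. The Python sketch below relies on a classical fact that is not proved here. Started at $(r\cos\theta, r\sin\theta)$ with $r < 1$, planar Brownian motion leaves the disk at angle $\varphi$ with density given by the Poisson kernel $(1 - r^2)/(2\pi(1 - 2r\cos(\theta - \varphi) + r^2))$. The sketch evaluates $h(x) = E^x f(X_{\tau_D})$ by quadrature and checks three things. First, the value at the center is the average of $f$, which is the mean value property. Second, $h$ matches the known harmonic extension of a trigonometric $f$. Third, $h$ approaches $f$ near the boundary. The boundary function and the function names are illustrative.

```python
import math

def poisson_kernel(r, theta, phi):
    """Assumed exit density (in phi) for planar BM started at (r cos theta, r sin theta)."""
    return (1 - r * r) / (2 * math.pi * (1 - 2 * r * math.cos(theta - phi) + r * r))

def dirichlet_solution(f, r, theta, n=4000):
    """h(x) = E^x f(X_tau), computed with the periodic trapezoid rule in phi."""
    dphi = 2 * math.pi / n
    return dphi * sum(f(k * dphi) * poisson_kernel(r, theta, k * dphi) for k in range(n))

f = lambda phi: math.cos(phi) + 0.5 * math.sin(3 * phi)                      # boundary data
harmonic = lambda r, th: r * math.cos(th) + 0.5 * r ** 3 * math.sin(3 * th)  # its harmonic extension

print(dirichlet_solution(f, 0.0, 0.0))                     # mean of f on the circle: 0
print(dirichlet_solution(f, 0.6, 1.0), harmonic(0.6, 1.0))
print(dirichlet_solution(f, 0.99, 1.0), f(1.0))            # close to the boundary value
```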
Continue to suppose $(X_t, P^x)$ is a $d$-dimensional Brownian motion and $D$ is a bounded domain, but now we suppose $d \ge 3$. Define
$$U(x,A) = E^x\int_0^\infty 1_A(X_s)\,ds, \qquad x \in D.$$
This is the same as the $\lambda$-resolvent of $1_A$ with $\lambda = 0$. We write
$$U(x,A) = E^x\int_0^\infty 1_A(X_s)\,ds = \int_0^\infty P^x(X_s \in A)\,ds = \int_0^\infty\int_A \frac{1}{(2\pi s)^{d/2}}e^{-|y-x|^2/2s}\,dy\,ds = \int_A\int_0^\infty \frac{1}{(2\pi s)^{d/2}}e^{-|y-x|^2/2s}\,ds\,dy.$$
Some calculus shows that the inside integral is equal to $c|x-y|^{2-d}$. If we denote $c|x-y|^{2-d}$ by $u(x,y)$, we then have that
$$U(x,A) = \int_A u(x,y)\,dy. \qquad (21.9)$$
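The calculus step can be made explicit. Substituting $v = |x-y|^2/2s$ turns the inside integral into a Gamma integral, which gives $c = \Gamma(d/2 - 1)/(2\pi^{d/2})$. This equals $1/(2\pi)$ when $d = 3$. The Python sketch below checks this constant by numerical quadrature after the substitution $s = e^v$. The integration range and the number of steps are ad hoc choices.

```python
import math

def newtonian_integral(rho, d, v_min=-30.0, v_max=80.0, steps=20_000):
    """Trapezoid-rule value of int_0^inf (2 pi s)^(-d/2) exp(-rho^2 / (2 s)) ds.

    Uses the substitution s = e^v (ds = e^v dv); the transformed integrand is
    negligible outside [v_min, v_max] for the rho and d used below.
    """
    def integrand(v):
        s = math.exp(v)
        return s * (2 * math.pi * s) ** (-d / 2) * math.exp(-rho * rho / (2 * s))
    dv = (v_max - v_min) / steps
    inner = sum(integrand(v_min + k * dv) for k in range(1, steps))
    return dv * (inner + 0.5 * (integrand(v_min) + integrand(v_max)))

def newtonian_closed_form(rho, d):
    """c * rho^(2-d) with c = Gamma(d/2 - 1) / (2 pi^(d/2))."""
    return math.gamma(d / 2 - 1) / (2 * math.pi ** (d / 2)) * rho ** (2 - d)

for d in (3, 4, 5):
    print(d, newtonian_integral(1.5, d), newtonian_closed_form(1.5, d))
```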
The expression $u(x,y)$ is called the Newtonian potential density. Note that $u(x,y)$ is a function only of $|x-y|$, it blows up as $|x-y| \to 0$, and it tends to 0 as $|x-y| \to \infty$. If $x$ is in the interior of $D$, then $u(x,\cdot)$ will be bounded on $\partial D$. Define $h_x(z) = E^z u(x, X_{\tau_D})$; we saw above that $h_x$ is harmonic. Now define $g_D(x,y) = u(x,y) - h_x(y)$; this function of two variables is called the Green's function or Green function for $D$ with pole at $x$. This is a well-known object in analysis; let us give a probabilistic interpretation. Since $u(x,y)$ is symmetric in $x$ and $y$, if $A \subset D$ we have
$$\int_A g_D(x,y)\,dx = \int_A u(x,y)\,dx - \int_A E^y u(x, X_{\tau_D})\,dx = E^y\int_0^\infty 1_A(X_s)\,ds - E^y\int_A u(x, X_{\tau_D})\,dx = E^y\int_0^\infty 1_A(X_s)\,ds - E^y E^{X_{\tau_D}}\int_0^\infty 1_A(X_s)\,ds. \qquad (21.10)$$
Using the strong Markov property and then a change of variables,
$$E^y E^{X_{\tau_D}}\int_0^\infty 1_A(X_s)\,ds = E^y E^y\Big[\int_0^\infty 1_A(X_s)\circ\theta_{\tau_D}\,ds \;\Big|\; \mathcal F_{\tau_D}\Big] = E^y\int_0^\infty 1_A(X_s)\circ\theta_{\tau_D}\,ds = E^y\int_0^\infty 1_A(X_{\tau_D+s})\,ds = E^y\int_{\tau_D}^\infty 1_A(X_s)\,ds.$$
Substituting this in (21.10) we have
$$\int_A g_D(x,y)\,dx = E^y\int_0^{\tau_D} 1_A(X_s)\,ds.$$
For this reason gD is sometimes called the occupation time density for D.
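Here is a worked check of the occupation-time interpretation. It takes two facts as known rather than proved here. First, $c = 1/(2\pi)$ when $d = 3$. Second, by Newton's theorem, the average of $|x - z|^{-1}$ over the sphere $|z| = R$ equals $R^{-1}$ when $|x| < R$. Let $d = 3$, $D = B(0,R)$, $A = D$ and $y = 0$. Under $P^0$ the exit point $X_{\tau_D}$ is uniform on $\partial B(0,R)$. Hence $h_x(0) = 1/(2\pi R)$ and $g_D(x,0) = \frac{1}{2\pi}\big(|x|^{-1} - R^{-1}\big)$. Integrating in spherical coordinates,
$$E^0\tau_D = \int_D g_D(x,0)\,dx = \frac{1}{2\pi}\int_0^R\Big(\frac1\rho - \frac1R\Big)4\pi\rho^2\,d\rho = 2\Big(\frac{R^2}{2} - \frac{R^2}{3}\Big) = \frac{R^2}{3}.$$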
Exercises
21.1 Suppose $d = 2$, $(X_t, P^x)$ is a two-dimensional Brownian motion, and $r > 0$. Imitate the argument of Proposition 21.1 but with $h(x) = \log(|x|)$ to show that $P^x(X_t \text{ hits } B(0,r)) = 1$ when $|x| > r$. Then use the strong Markov property to show that there are times $T_i \to \infty$ with $X_{T_i} \in B(0,r)$. That is, two-dimensional Brownian motion is neighborhood recurrent.
21.2 In the proof of Lemma 21.4, justify each inequality in (21.6).
21.3 Let $(X_t, P^x)$ be a Poisson process with parameter $a$ and let $F$ be defined by (21.5). Show $F(t, 1/2)/t$ does not converge to 0 as $t \to 0$.
21.4 Suppose $d \ge 3$, $(X_t, P^x)$ is a $d$-dimensional Brownian motion, and
$$Uf(x) = E^x\int_0^\infty f(X_s)\,ds.$$
Show that if $f$ is bounded and measurable with compact support, then $Uf$ is continuous and $|Uf(x)| \to 0$ as $|x| \to \infty$. Show that if $f \in C^2$ with compact support, then $Uf$ is $C^2$. Show that $\frac12\Delta Uf = -f$.
21.5 Let $W_t$ be a Brownian motion and $f$ a continuous function. Prove that if $f(W_t)$ is a submartingale, then $f$ must be convex.
21.6 Prove the maximum principle for harmonic functions. This says that if $h$ is harmonic in a bounded domain $D$, then
$$\sup_{x\in D}|h(x)| \le \sup_{x\in\partial D}|h(x)|.$$
21.7 If $W$ is a $d$-dimensional Brownian motion started at 0, find $E\,T$, where $T$ is the first time $W$ exits the ball of radius $r$ centered at the origin. Hint: Use the fact that $|W_t|^2 - dt$ is a martingale.
21.8 Let $f : \mathbb R \to \mathbb R$ be a bounded function with $|f(x) - f(y)| \le |x - y|$ for all $x, y \in \mathbb R$. Let $D_\varepsilon = \{(x,y) \in \mathbb R^2 : f(x) < y < f(x) + \varepsilon\}$ for $\varepsilon \in (0,1)$. Let $(X_t, P^x)$ be a two-dimensional Brownian motion and let $\tau_\varepsilon = \inf\{t : X_t \notin D_\varepsilon\}$. Prove that there exists a constant $c$ not depending on $\varepsilon$ such that $E^0\tau_\varepsilon \le c\varepsilon^2$. Hint: By Exercise 21.7 the expected time for two-dimensional Brownian motion to leave a ball of radius $2\varepsilon$ is less than $c\varepsilon^2$. Then use the strong Markov property repeatedly at the times $S_i$, where $S_i$ is the first time after time $S_{i-1}$ that Brownian motion has moved at least $2\varepsilon$ from $X_{S_{i-1}}$.
22 Transformations of Markov processes
There are a number of interesting transformations that make new Markov processes out of old. We will look at four: killing, conditioning, changing time, and stopping at a last exit time. These are only a few of the possible transformations.
22.1 Killed processes
One sometimes wants to consider a Markov process up until a stopping time $\zeta$, called the lifetime of the process. We affix to our state space $S$ an isolated point $\Delta$, called the cemetery state, and the topology on $S_\Delta = S \cup \{\Delta\}$ is the one generated by the collection of open sets of $S$ together with the set $\{\Delta\}$. We define the killed process $\widetilde X$ by
$$\widetilde X_t = \begin{cases} X_t, & t < \zeta,\\ \Delta, & t \ge \zeta,\end{cases} \qquad (22.1)$$
and we say we kill the process $X$ at time $\zeta$. Every function $f$ on $S$ is defined to be 0 at $\Delta$. One example of this situation would be to let $\zeta = \tau_D$, where $D$ is a subset of $S$ and $\tau_D = \inf\{t > 0 : X_t \notin D\}$, the first exit from the set $D$. Another common occurrence is to let $\zeta = S$, where $S$ is a random variable with an exponential distribution with parameter $\lambda$, i.e., $P(S > t) = e^{-\lambda t}$, such that $S$ is independent of $X$. A third possibility would be to let $\zeta = \inf\{t : \int_0^t f(X_s)\,ds \ge 1\}$, where $f$ is a non-negative function. The crucial property of $\zeta$ is that it be a terminal time:
$$\zeta = s + \zeta\circ\theta_s \quad \text{if } s < \zeta. \qquad (22.2)$$
Proposition 22.1 If $(X_t, P^x)$ is a strong Markov process and (22.2) holds, then $(\widetilde X_t, P^x)$ satisfies the Markov and strong Markov properties.
Proof As in Section 20.2, we need to show
$$E^x[f(\widetilde X_t)\circ\theta_T \mid \mathcal F_T] = E^{X_T} f(\widetilde X_t), \qquad P^x\text{-a.s.}$$
If $A \in \mathcal F_T$,
$$E^x[f(\widetilde X_t)\circ\theta_T;\ A] = E^x[f(X_{t+T});\ A,\ T + t < \zeta].$$
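On the other hand,
$$E^{X_T} f(\widetilde X_t) = E^{X_T}[f(X_t);\ t < \zeta]\,1_{(T<\zeta)} = E^x[f(X_t)\circ\theta_T;\ t\circ\theta_T < \zeta\circ\theta_T \mid \mathcal F_T]\,1_{(T<\zeta)} = E^x[f(X_{t+T});\ T + t\circ\theta_T < T + \zeta\circ\theta_T,\ T < \zeta \mid \mathcal F_T] = E^x[f(X_{t+T});\ T + t < \zeta \mid \mathcal F_T],$$
since $T + t\circ\theta_T = T + t$ and $T + \zeta\circ\theta_T = \zeta$ on $(T < \zeta)$. Hence
$$E^x[E^{X_T} f(\widetilde X_t);\ A] = E^x[f(X_{t+T});\ T + t < \zeta,\ A],$$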
as required.
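Consider the exponential lifetime in the second example above. There $E^x f(\widetilde X_t) = E^x[f(X_t);\ t < S] = e^{-\lambda t}P_t f(x)$, so the killed process has the sub-Markov semigroup $e^{-\lambda t}P_t$ on $S$. The missing mass $1 - e^{-\lambda t}$ sits at $\Delta$. The Python sketch below checks the semigroup property of this kernel for a two-state continuous-time chain whose transition matrix is known in closed form. The jump rates and $\lambda$ are arbitrary illustrative values.

```python
import math

def two_state_semigroup(t, a, b):
    """P_t for the chain on {0, 1} jumping 0 -> 1 at rate a and 1 -> 0 at rate b."""
    pi0, pi1 = b / (a + b), a / (a + b)          # stationary distribution
    e = math.exp(-(a + b) * t)
    return [[pi0 + pi1 * e, pi1 - pi1 * e],
            [pi0 - pi0 * e, pi1 + pi0 * e]]

def killed_semigroup(t, a, b, lam):
    """Kernel on S = {0, 1} of the chain killed at an independent Exp(lam) time."""
    return [[math.exp(-lam * t) * p for p in row] for row in two_state_semigroup(t, a, b)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

a, b, lam, s, t = 1.0, 2.0, 0.5, 0.3, 0.7
print(killed_semigroup(s + t, a, b, lam))
print(matmul(killed_semigroup(s, a, b, lam), killed_semigroup(t, a, b, lam)))  # same matrix
print([sum(row) for row in killed_semigroup(t, a, b, lam)], math.exp(-lam * t))
```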
22.2 Conditioned processes
Another type of transformation of a Markov process is by conditioning, also known as Doob's h-path transform. To motivate this, let $D$ be a domain in $\mathbb R^d$ and let $X_t$ be a Brownian motion killed on exiting the domain. One would like to give a precise meaning to the intuitive notion of Brownian motion conditioned to exit the domain at a certain point. Let $h$ be a positive harmonic function in $D$ (i.e., $h$ is $C^2$ in $D$, and $\Delta h = 0$ there) and suppose that $h$ is 0 everywhere on the boundary of $D$ except at one point $z$. The Poisson kernel for the ball or for the half-space gives examples of such harmonic functions. Then, heuristically, we have by the Markov property at time $t$,
$$P^x(X_t \in dy \mid X_{\tau_D} = z) = \frac{P^x(X_t \in dy,\ X_{\tau_D} = z)}{P^x(X_{\tau_D} = z)} = \frac{P^x(X_t \in dy)\,P^y(X_{\tau_D} = z)}{P^x(X_{\tau_D} = z)}.$$
If p0 (t, x, dy) represents the probability that Brownian motion started at x and killed on leaving D is in dy at time t, we then expect that the analogous probability for Brownian motion conditioned to exit D at z ought to be h(y)p0 (t, x, dy)/h(x). We now make this precise. Let us look at a strong Markov process X . We say a function h is invariant with respect to X if Pt h(x) = h(x) for all t and x, where Pt is the semigroup associated with X . If h is invariant, by the Markov property,
E x [h(Xt ) | Fs ] = E x [h(Xt−s ) ◦ θs | Fs ] = E Xs h(Xt−s ) = Pt−s h(Xs ) = h(Xs ), and so for each x, h(Xt ) is a martingale with respect to Px . Conversely, if h(Xt ) is a martingale with respect to Px for all x, Pt h(x) = E x h(Xt ) = h(x) by the definition of martingale, and so h is invariant. In the case of Brownian motion killed on leaving a domain, the invariant functions are thus the harmonic ones.
Now let $h$ be a non-negative invariant function for a strong Markov process $X$. Letting $M_t = h(X_t)/h(X_0)$, $M_t$ is a non-negative continuous martingale with $M_0 = 1$, $P^x$-a.s., as long as $h(x) > 0$. We define the h-path transform of the Markov process $X$ by setting
$$P^x_h(A) = E^x[M_t;\ A], \qquad A \in \mathcal F_t. \qquad (22.3)$$
Since $M$ is a martingale with $M_0 = 1$, $P^x_h(\Omega) = 1$. Observe that $P^x_h$ gives more mass to paths where $h(X_t)$ is big and less to where it is small. Note the similarity to the Girsanov theorem. We have the following.
Proposition 22.2 Suppose $(X_t, P^x)$ is a strong Markov process and that $h$ is non-negative and invariant. Then $(X_t, P^x_h)$ forms a strong Markov process.
Proof Suppose $A \in \mathcal F_s$ and $h(x) \ne 0$. (We leave consideration of the case where $h(x) = 0$ to the reader.) Then
$$E^x_h[f(X_{t+s});\ A] = \frac{E^x[f(X_{t+s})h(X_{t+s});\ A]}{h(x)} = \frac{E^x\big[E^{X_s}[f(X_t)h(X_t)];\ A\big]}{h(x)} = \frac{1}{h(x)}E^x\Big[\frac{E^{X_s}[f(X_t)h(X_t)]}{h(X_s)}\,h(X_s);\ A\Big]$$
by the Markov property for $X$. This is equal to $E^x\big[E^{X_s}_h[f(X_t)]\,h(X_s);\ A\big]/h(x) = E^x_h\big[E^{X_s}_h f(X_t);\ A\big]$. The Markov property follows from this. The strong Markov property is proved in almost identical fashion.
Let us consider an example. Let $(X_t, P^x)$ be a Brownian motion on the non-negative axis killed on first hitting 0. This is the same as a Brownian motion killed on exiting $(0,\infty)$. This will be a strong Markov process. Since the second derivative of the function $h(x) = x$ is 0, $h$ is harmonic on $(0,\infty)$, and so is invariant for the killed Brownian motion. Let us now condition using the function $h$ to get Brownian motion conditioned to hit infinity before hitting zero. To identify the resulting process, we argue as follows. Fix $x$ and let $T_\varepsilon = \inf\{t > 0 : X_t < \varepsilon\}$. The Radon–Nikodym derivative of $P^x_h$ with respect to $P^x$ on $\mathcal F_{t\wedge T_\varepsilon}$ is $M_{t\wedge T_\varepsilon} = h(X_{t\wedge T_\varepsilon})/h(x)$. We can rewrite $M_{t\wedge T_\varepsilon}$ as
$$M_{t\wedge T_\varepsilon} = \exp(\log X_{t\wedge T_\varepsilon} - \log x) = \exp\Big(\int_0^{t\wedge T_\varepsilon}\frac{1}{X_s}\,dX_s - \frac12\int_0^{t\wedge T_\varepsilon}\frac{1}{X_s^2}\,ds\Big),$$
using Itô's formula. By the Girsanov theorem, under $P^x_h$,
$$W_{t\wedge T_\varepsilon} = X_{t\wedge T_\varepsilon} - \int_0^{t\wedge T_\varepsilon}\frac{1}{X_s}\,ds$$
is a martingale. By Exercise 13.2, its quadratic variation is $t \wedge T_\varepsilon$, and so by Exercise 12.3, $W_{t\wedge T_\varepsilon}$ is a Brownian motion stopped at time $T_\varepsilon$. We have
$$X_{t\wedge T_\varepsilon} = x + W_{t\wedge T_\varepsilon} + \int_0^{t\wedge T_\varepsilon}\frac{1}{X_s}\,ds,$$
or $X$ satisfies the stochastic differential equation
$$dX_t = dW_t + \frac{1}{X_t}\,dt$$
for $t \le T_\varepsilon$. We will see later (Section 24.3) that this is the stochastic differential equation defining the Bessel process of order 3. The same argument shows that Brownian motion killed on exiting $(0,a)$ and then conditioned to hit $a$ before 0 is also a Bessel process of order 3 up until the time of first hitting $a$.
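A discrete analogue, which we add as an illustration and which is not in the text, makes the drift $1/X_t$ visible. Take simple symmetric random walk on $\{0, 1, \dots, N\}$, killed at 0. The function $h(x) = x$ is harmonic for this walk at interior points, and the h-transform has transition probabilities $p(x,y)h(y)/h(x)$. The Python sketch below computes these with exact rational arithmetic. From $x$ the conditioned walk steps up with probability $(x+1)/2x$ and down with probability $(x-1)/2x$. Its mean step is therefore exactly $1/x$, and it never steps to 0.

```python
from fractions import Fraction

def h_transform_walk(N):
    """Doob h-transform of simple symmetric random walk on {0, ..., N} killed at 0,
    using h(x) = x, which is harmonic for the walk at x = 1, ..., N - 1.

    Returns {x: (P_h(x, x + 1), P_h(x, x - 1))} for interior x, where
    P_h(x, y) = p(x, y) h(y) / h(x) and p(x, x +/- 1) = 1/2.
    """
    h = Fraction  # h(x) = x, kept exact with rational arithmetic
    return {
        x: (Fraction(1, 2) * h(x + 1) / h(x), Fraction(1, 2) * h(x - 1) / h(x))
        for x in range(1, N)
    }

for x, (up, down) in h_transform_walk(6).items():
    print(x, up, down, up + down, up - down)   # rows sum to 1; mean step is 1/x
```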
22.3 Time change
An additive functional is an increasing adapted process with $A_0 = 0$, a.s., such that
$$A_t = A_s + A_{t-s}\circ\theta_s \qquad (22.4)$$
if $s < t$. The simplest examples are what are known as classical additive functionals: $A_t = \int_0^t f(X_r)\,dr$, where $f$ is a non-negative measurable function. We have
$$A_t - A_s = \int_s^t f(X_r)\,dr = \Big(\int_0^{t-s} f(X_r)\,dr\Big)\circ\theta_s = A_{t-s}\circ\theta_s.$$
If we have the uniform limit of additive functionals, we again get an additive functional, and thus, for example, the local times $L^x_t$ of a one-dimensional Brownian motion are also additive functionals. Given a Markov process $X$ and an additive functional $A$, let $B_t = \inf\{u : A_u > t\}$ and $\widehat X_t = X_{B_t}$. Let $\widehat{\mathcal F}_t = \mathcal F_{B_t}$. Thus $\widehat X$ is a time change of $X$.
Proposition 22.3 Let $(X_t, P^x)$ be a strong Markov process and $A_t$ an additive functional. With $B$ defined as above, $(\widehat X_t, P^x)$ is also a strong Markov process.
Proof We verify the strong Markov property with respect to $\{\widehat{\mathcal F}_t\}$. If $T$ is a stopping time for $\{\widehat{\mathcal F}_t\}$, we have
$$E^x[f(\widehat X_{T+t}) \mid \widehat{\mathcal F}_T] = E^x[f(X(B_{T+t})) \mid \mathcal F_{B_T}].$$
$B_T$ can be seen to be a stopping time with respect to $\{\mathcal F_t\}$, and $B_{T+t} = B_T + B_t\circ\theta_{B_T}$, where the $\theta_t$ are the shift operators, so that $X(B_{T+t}) = X(B_t)\circ\theta_{B_T}$. By the strong Markov property of $X$ at time $B_T$, this is
$$E^{X(B_T)} f(X_{B_t}) = E^{\widehat X_T} f(\widehat X_t).$$
This suffices to show that $\widehat X_t$ is a strong Markov process.
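The construction of $\widehat X$ can be mimicked on a sampled path. The Python sketch below is a hypothetical helper and is not from the text. It approximates the classical additive functional $A_t = \int_0^t f(X_r)\,dr$ by a left-endpoint Riemann sum on a time grid, reads off $B_t = \inf\{u : A_u > t\}$ by binary search, and returns $X_{B_t}$. Where $f$ is large, $A$ grows quickly and $B$ grows slowly, so $\widehat X$ moves slowly along the path of $X$. With $f \equiv 2$, one gets $\widehat X_t = X_{t/2}$ up to grid error.

```python
import bisect

def time_change(times, path, f, new_times):
    """Return approximations to X(B_t) for t in new_times.

    times: increasing grid 0 = t_0 < t_1 < ...; path[i] is X(t_i).
    f:     non-negative function; A_t = int_0^t f(X_r) dr is approximated
           by a left-endpoint Riemann sum on the grid.
    """
    A = [0.0]
    for i in range(1, len(times)):
        A.append(A[-1] + f(path[i - 1]) * (times[i] - times[i - 1]))
    result = []
    for t in new_times:
        j = bisect.bisect_right(A, t)   # first grid index with A > t, i.e. B_t
        if j == len(A):
            raise ValueError("A does not exceed t on this grid; extend the path")
        result.append(path[j])
    return result

# Example: the deterministic path X_t = t on [0, 2] with f = 2, so B_t = t / 2.
times = [k / 1000 for k in range(2001)]
path = list(times)
print(time_change(times, path, lambda x: 2.0, [0.5, 1.0, 2.0]))   # about [0.25, 0.5, 1.0]
```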
22.4 Last exit decompositions
Let $A$ be a Borel set, and let $L$ be the last visit to $A$: $L = \sup\{s : X_s \in A\}$. We define $L$ to be 0 if $X$ never hits $A$. The random time $L$ is not a stopping time, but we can nevertheless kill the process $X$ at time $L$. It turns out the resulting process $Y$ is the process $X$ conditioned by the function $h(x) = P^x(T_A < \infty)$. The intuitive meaning of this is that $Y$ is $X$ conditioned to hit the set $A$. Let $T = \inf\{t : X_t \in A\}$, and set
$$Y_t = \begin{cases} X_t, & t < L,\\ \Delta, & t \ge L.\end{cases}$$
Let $\mathcal H_t = \sigma(Y_s;\ s \le t)$.
Proposition 22.4 If $(X_t, P^x)$ is a strong Markov process, then $(Y_t, P^x)$ is a Markov process with respect to $\{\mathcal H_t\}$.
Proof
If $B \subset S$ (so that $\Delta \notin B$), then
$$(Y_t \in B) = (X_t \in B,\ L > t) = (X_t \in B,\ T\circ\theta_t < \infty),$$
since L, the last time that X is in A, will be larger than t if and only if X hits A at some time after time t. We conclude that the function x → Px (Yt ∈ B) is Borel measurable. Since
$$P^x(Y_t = \Delta) = P^x(L \le t) = 1 - P^x(L > t) = 1 - P^x(T\circ\theta_t < \infty),$$
the function $x \mapsto P^x(Y_t = \Delta)$ is also Borel measurable. We need to show that if $C \in \mathcal H_s$,
$$E^x[f(Y_t);\ C] = E^x[Q_{t-s}f(Y_s);\ C], \qquad (22.5)$$
where $f$ is bounded and measurable, $h(x) = P^x(L > 0)$, and
$$Q_t g(x) = \frac{1}{h(x)}P_t(gh)(x)$$
when $h(x) \ne 0$. (Set $Q_t g(x) = 0$ if $h(x) = 0$.) It suffices to show (22.5) when $C = (Y_{r_1} \in B_1, \dots, Y_{r_n} \in B_n)$ for $r_1 \le \cdots \le r_n \le s$ and the $B_1, \dots, B_n$ are Borel sets. If we set $C_s = (X_{r_1} \in B_1, \dots, X_{r_n} \in B_n)$, then $C_s \in \mathcal F_s$, $C \cap (L > s) = C_s \cap (L > s)$, and $C \cap (L > t) = C_s \cap (L > t)$. We start with
E x [ f (Yt ); C] = E x [ f (Xt ); C, L > t] = E x [ f (Xt ); Cs , L > t] = E x [ f (Xt ); Cs , L ◦ θt > 0]. Conditioning on Ft , this is equal to
E x [ f (Xt )PXt (L > 0); Cs ] = E x [ f (Xt )h(Xt ); Cs ].
Conditioning on Fs , this in turn is equal to
$$E^x[P_{t-s}(fh)(X_s);\ C_s] = E^x[h(X_s)Q_{t-s}f(X_s);\ C_s] = E^x[P^{X_s}(L > 0)\,Q_{t-s}f(X_s);\ C_s] = E^x[Q_{t-s}f(X_s);\ C_s,\ L\circ\theta_s > 0], \qquad (22.6)$$
where we used the Markov property for the last equality. Continuing, we have that the last line of (22.6) is equal to
E x [Qt−s f (Xs ); Cs , L > s] = E x [Qt−s f (Xs ); C, L > s] = E x [Qt−s f (Ys ); C], as desired. We can also look at XL+t , where L is as above. This new process is again a strong Markov process, and this time is the process X conditioned by the function h(x) = Px (TA = ∞). The intuitive meaning of this is that XL+t is X conditioned never to hit A. Since we are looking at the process after the last visit to A, this is entirely plausible. For a proof of the Markov property of XL+t , see Meyer et al. (1972).
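A discrete example shows what the kernel $Q$ looks like. The Python sketch below is an illustration built on assumptions of our own. It takes the walk on the integers that steps $+1$ with probability $p > 1/2$ and $-1$ with probability $q = 1 - p$, with $A = \{0\}$. By the standard gambler's-ruin formula, $h(x) = P^x(T_A < \infty)$ equals 1 for $x \le 0$ and $(q/p)^x$ for $x > 0$. The one-step transitions of $Q$ are then $P(x,y)h(y)/h(x)$. Rows away from 0 sum to 1, while the row at 0 sums to $2q$. The missing mass $1 - 2q$ is the probability that the walk never returns to 0, that is, that the current visit is the last one. Above 0, the conditioned walk steps down with probability $p$.

```python
def last_exit_kernel(p, states):
    """One-step kernel Q(x, y) = P(x, y) h(y) / h(x) for the walk on the integers
    stepping +1 w.p. p > 1/2 and -1 w.p. q = 1 - p, with A = {0} and
    h(x) = P^x(T_A < infinity) = 1 for x <= 0, (q/p)^x for x > 0.
    """
    q = 1.0 - p
    def h(x):
        return 1.0 if x <= 0 else (q / p) ** x
    return {x: {x + 1: p * h(x + 1) / h(x), x - 1: q * h(x - 1) / h(x)} for x in states}

Q = last_exit_kernel(0.7, range(-3, 4))
for x, row in Q.items():
    print(x, row, sum(row.values()))   # row sum 1 except at x = 0, where it is 2q = 0.6
```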
Exercises
22.1 Let $(X_t, P^x)$ be a one-dimensional Brownian motion, $L^x_t$ the local time of Brownian motion at $x$, and $m$ a positive finite measure on $\mathbb R$. Show that $A_t = \int L^x_t\,m(dx)$ is an additive functional.
22.2 We consider the space-time process. Let $V_t = V_0 + t$. The process $V_t$ is simply the process that increases deterministically at unit speed. Thus $V_t$ can represent time. If $(X_t, P^x)$ is a Markov process, show that $((X_t, V_t), P^{(x,v)})$ is also a Markov process. Is $((X_t, V_t), P^{(x,v)})$ necessarily a strong Markov process if $(X_t, P^x)$ is a strong Markov process? For some applications, one lets $V_t = V_0 - t$, and one thinks of time running backwards. Space-time processes are useful when considering parabolic partial differential equations.
22.3 Suppose $(X_t, P^x)$ is a strong Markov process and $f$ is a non-negative invariant function for $(X_t, P^x)$. Write $Q^x$ for $P^x_f$. Suppose $g$ is a non-negative invariant function for $(X_t, Q^x)$. Show that $fg$ is a non-negative invariant function for $(X_t, P^x)$ and that $Q^x_g = P^x_{fg}$.
22.4 Suppose $A$ and $B$ are additive functionals for a Markov process and $A$ and $B$ have continuous paths. Prove that if $E^x A_t = E^x B_t$ for all $x$ and $t$, then
$$P^x(A_t \ne B_t \text{ for some } t \ge 0) = 0$$
for all $x$. Hint: Show $A_t - B_t$ is a martingale.
22.5 Suppose $A$ and $B$ are additive functionals with continuous paths and suppose $E^x A_\infty = E^x B_\infty < \infty$ for each $x$. Show
$$P^x(A_t \ne B_t \text{ for some } t \ge 0) = 0$$
for each $x$. Hint: If $f(x) = E^x A_\infty$, then
$$E^x[A_\infty \mid \mathcal F_t] - A_t = E^{X_t}A_\infty = f(X_t),$$
and similarly with $B$ in place of $A$. Then $A - B$ is a $P^x$ martingale for each $x$.
22.6 Let $A$ be an additive functional with continuous paths. Suppose there exists $K > 0$ such that $E^x A_\infty \le K$ for each $x$. Prove that there exists a constant $c$ depending only on $K$ such that
$$E^x e^{cA_\infty} < \infty, \qquad x \in S.$$
22.7 Here is an argument that the law of a Brownian motion conditioned to have a maximum at a certain level is a Bessel process of order 3. Let W be a one-dimensional Brownian motion killed on hitting 0. Let St = sups≤t Ws be the maximum. By Exercise 19.1, X = (W, S) is a Markov process. Determine the law of X for t ≤ L, where L is the last time X hits the diagonal. To define L more precisely, let D = {(w, s) : w = s, w > 0} and L = sup{t ≥ 0 : Xt ∈ D}. L is finite, a.s., because W will hit 0 in finite time with probability one.
Notes Markov processes are in some sense supposed to have the property that the past and the future are independent given the present. From this point of view, one might hope that a Markov process run backwards is again a Markov process. This is, more or less, the case; see Chung and Walsh (1969) or Rogers and Williams (2000a).
23 Optimal stopping
A nice application of Markov process theory is optimal stopping. Suppose we have a reward function $g \ge 0$ and we want to find the stopping time $T$ that maximizes the value of $E^x g(X_T)$, and we also want to find the value of this expectation. This is the optimal stopping problem. An important example of an optimal stopping problem is pricing the American put. (See Chapter 28 for more on options.) A European put is an option to sell a share of stock at a fixed price $K$ at a certain time $t_0$. If at time $t_0$ the price $S_{t_0}$ of the stock is lower than $K$, one can make a profit by buying a share of stock on the stock exchange for $S_{t_0}$ dollars, exercising the put (which means selling a share of stock for $K$ dollars), and taking home a profit of $K - S_{t_0}$. If the price of the stock is above $K$ at time $t_0$, it would be silly to exercise the put, and thus the put is worthless. An American put is almost the same, but one has the option to sell a share of stock at price $K$ at any time before time $t_0$. An American put is more valuable than a European put because if one exercises the option early, that is, sells the share of stock before time $t_0$, then one can put the money in a risk-free asset such as a bond or in the bank and earn interest on the money. When should one exercise an American put to maximize the expected return? One cannot look into the future, so the time should be a stopping time. The stopping time should depend on the stock price, the exercise price, and also the time until time $t_0$. Thus one is in the optimal stopping context with $X_t = (t, S_t)$, where $S_t$ is the stock price, and one wants to find a stopping time $T$ that maximizes a certain reward function.
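The backward-induction idea behind optimal stopping is easy to see in discrete time. The Python sketch below prices a put on a Cox–Ross–Rubinstein binomial tree. This discrete-time model is an assumption of the example and is not the continuous model used in the text. Risk-neutral pricing is also taken as given here (see Chapter 28). At each node the American value is the larger of the exercise value $(K - S)^+$ and the discounted expected value of continuing. This is the dynamic-programming form of choosing a stopping time. The European price is computed the same way without the early-exercise comparison. The Black–Scholes put formula is included only as a reference value, and all parameter values are illustrative.

```python
import math

def binomial_put(S0, K, r, sigma, T, steps, american=True):
    """Put price on a Cox-Ross-Rubinstein tree with `steps` time steps."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)          # risk-neutral up probability
    # Payoff at expiry; node j has j up-moves.
    values = [max(K - S0 * u ** j * d ** (steps - j), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):            # step backwards through the tree
        for j in range(n + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:                          # stop now if that pays more
                cont = max(cont, K - S0 * u ** j * d ** (n - j))
            values[j] = cont
    return values[0]

def black_scholes_put(S0, K, r, sigma, T):
    """Reference value for the European put in the continuous model."""
    N = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    d1 = (math.log(S0 / K) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * N(-d2) - S0 * N(-d1)

params = dict(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
european = binomial_put(steps=400, american=False, **params)
american = binomial_put(steps=400, american=True, **params)
print(european, black_scholes_put(**params), american)
```

The American price exceeds the European one, reflecting the value of being allowed to choose the stopping time.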
23.1 Excessive functions
A solution to the optimal stopping problem can be given in the Markov case through the use of excessive functions. A non-negative function $f$ is excessive for a Markov process $X$ if $P_t f(x) \le f(x)$ for all $t$ and $x$ and $P_t f(x)$ increases up to $f(x)$ pointwise as $t \to 0$. Here $P_t$ is the semigroup associated with the Markov process $X$. If $g \ge 0$, define
$$Ug(x) = \int_0^\infty P_s g(x)\,ds = E^x\int_0^\infty g(X_s)\,ds. \qquad (23.1)$$
When $g \ge 0$, $Ug$ is excessive. To see this, let $f = Ug$. Using the semigroup property and a change of variables,
$$P_t f(x) = P_t\int_0^\infty P_s g(x)\,ds = \int_0^\infty P_{s+t} g(x)\,ds = \int_t^\infty P_s g(x)\,ds.$$
This is certainly less than the integral from 0 to $\infty$, hence is less than $f(x)$, and $P_t f(x)$ increases up to $f(x)$ by monotone convergence. (It is possible that $f$ is infinite for some or all $x$.) The theory of excessive functions is an important part of Markov process theory, and we refer the reader to Blumenthal and Getoor (1968), a book which has inspired a generation of Markov process theorists. We have the following.

Lemma 23.1 If $f$ is excessive, there exist functions $g_n \ge 0$ such that $Ug_n$ increases up to $f$, where $Ug_n$ is defined by (23.1).

Proof
Let $g_n = n(f - P_{1/n} f)$. Since $f$ is excessive, $g_n \ge 0$. We have
$$Ug_n = n\int_0^\infty P_s f\,ds - n\int_0^\infty P_{s+1/n} f\,ds = n\int_0^{1/n} P_s f\,ds,$$
which is less than $f$ and increases to $f$, because $P_s f \le f$ and $P_s f$ increases as $s \downarrow 0$.

Next we have

Proposition 23.2 (1) If $f$ is excessive, $T$ is a finite stopping time, and $h(x) = E^x f(X_T)$, then $h$ is excessive. (2) If $f$ is excessive and $T$ is a finite stopping time, then $f(x) \ge E^x f(X_T)$. (3) If $f$ is excessive, then $f(X_t)$ is a supermartingale.

Proof
(1) First suppose $f = Ug$ for some non-negative function $g$. Then
$$h(x) = E^x Ug(X_T) = E^x E^{X_T}\int_0^\infty g(X_s)\,ds = E^x\int_0^\infty g(X_{s+T})\,ds = E^x\int_T^\infty g(X_s)\,ds \tag{23.2}$$
by the strong Markov property and a change of variables. The same argument shows that
$$P_t h(x) = E^x h(X_t) = E^x E^{X_t}\int_T^\infty g(X_s)\,ds = E^x\int_{T+t}^\infty g(X_s)\,ds.$$
This is less than $E^x\int_T^\infty g(X_s)\,ds = h(x)$ and increases up to $h(x)$ as $t \downarrow 0$.

Now let $f$ be excessive but not necessarily of the form $Ug$. Let $g_n$ be the functions defined in Lemma 23.1, and set $h_n(x) = E^x Ug_n(X_T)$. By the paragraph above, each $h_n$ is excessive. By monotone convergence, $h_n$ increases to $h$, so
$$P_t h(x) = \lim_{n\to\infty} P_t h_n(x) \le \lim_{n\to\infty} h_n(x) = h(x).$$
That $P_t h$ increases up to $h$ is proved similarly. There is no difficulty interchanging the limit as $n$ tends to infinity and the limit as $t$ tends to 0, since $P_t h_n$ increases both as $n$ increases and as $t$ decreases.
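Lemma 23.1 is the approximation used in the proof of (1), and it can be checked in a toy example. The sketch below assumes a process that is not in the text: deterministic uniform motion to the right, $X_t = x + t$, whose semigroup is the translation $P_t f(x) = f(x+t)$. The function names are ours. For $f(x) = e^{-x}$ one gets $Ug_n(x) = n(1 - e^{-1/n})e^{-x}$ in closed form, and this increases to $f(x)$.

```python
import math

# Illustrative assumption (not from the text): the deterministic Markov process
# X_t = x + t, whose semigroup is translation, P_t f(x) = f(x + t).


def P(t, f):
    """Translation semigroup: (P_t f)(x) = f(x + t)."""
    return lambda x: f(x + t)


def U(g, x, horizon=40.0, n=40000):
    """Trapezoid-rule approximation of Ug(x) = int_0^infinity P_s g(x) ds,
    truncated at `horizon` (harmless for the exponentially decaying g used here)."""
    h = horizon / n
    total = 0.5 * (g(x) + g(x + horizon))
    total += sum(g(x + k * h) for k in range(1, n))
    return total * h


def f(x):
    # Non-increasing and continuous, so P_t f <= f and P_t f -> f as t -> 0:
    # f is excessive for the translation semigroup.
    return math.exp(-x)


def g_n(n):
    """The functions of Lemma 23.1: g_n = n (f - P_{1/n} f) >= 0."""
    shifted = P(1.0 / n, f)
    return lambda x: n * (f(x) - shifted(x))


x = 0.3
for n in (1, 2, 4, 8, 16):
    approx = U(g_n(n), x)
    exact = n * (1.0 - math.exp(-1.0 / n)) * f(x)   # = n * int_0^{1/n} P_s f(x) ds
    print(n, round(approx, 6), round(exact, 6), "f(x) =", round(f(x), 6))
```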
(2) As in the proof of (1), it suffices to consider the case where $f = Ug$ and then take limits. By (23.2),
$$E^x Ug(X_T) = E^x\int_T^\infty g(X_s)\,ds \le E^x\int_0^\infty g(X_s)\,ds = Ug(x).$$
(3) By the Markov property, for $s < t$,
$$E^x[f(X_t) \mid \mathcal F_s] = E^{X_s} f(X_{t-s}) = P_{t-s} f(X_s) \le f(X_s).$$
The proof is complete.

By Proposition 23.2, $f(X_t)$ is a supermartingale and therefore has left and right limits along the dyadic rationals. We could take a version of $f(X_t)$ that is right continuous, but doing so risks producing a version of $X$ that is not right continuous with left limits. We want to keep $X$ fixed and then conclude that $f(X_t)$ is right continuous with left limits without needing to take a version.

Proposition 23.3 Let $(X_t, P^x)$ be a strong Markov process. If $f$ is excessive, then for each $x$, $f(X_t)$ is right continuous with left limits, $P^x$-almost surely.

For a proof, we refer the reader to Blumenthal and Getoor (1968), Theorem II.2.12, or to Exercise 23.8.

Given a function $g$, the function $G$ is an excessive majorant for $g$ if $G$ is excessive and $G \ge g$ pointwise. $G$ is the least excessive majorant for $g$ if (1) $G$ is an excessive majorant, and (2) if $G'$ is any other excessive majorant, then $G \le G'$ pointwise.

It turns out, as we will prove below, that an optimal stopping time is to stop the first time $X_t$ leaves the set where $g(x) < G(x)$. Therefore it is important to be able to calculate the least excessive majorant of a function. Here is one method of constructing it. We say a function $f : S \to \mathbb R$ is lower semicontinuous if $\{x : f(x) > a\}$ is an open set for every real number $a$. See Exercise 23.9 for information about lower semicontinuous functions.

Proposition 23.4 Suppose that $g$ is non-negative, bounded, and continuous and that Assumption 20.1 holds. Let $g_0 = g$, let $\mathcal T_n = \{k/2^n : 0 \le k \le n2^n\}$, and define
$$g_n(x) = \max_{t\in\mathcal T_n} P_t g_{n-1}(x)$$
for $n = 1, 2, \ldots$. Then $g_n(x)$ increases pointwise to $G(x)$, the least excessive majorant of $g$.

Proof Since $g_n(x) \ge P_0 g_{n-1}(x) = E^x g_{n-1}(X_0) = g_{n-1}(x)$, the sequence $g_n(x)$ is increasing. Call the limit $H(x)$. We first show $H$ is lower semicontinuous. If $g_{n-1}$ is bounded and continuous, then $P_t g_{n-1}$ is bounded and continuous for each $t$ by Assumption 20.1. Since the maximum of a finite number of continuous functions is continuous, $g_n$ is bounded and continuous. By induction, each $g_n$ is continuous, and by Exercise 23.9, $H$ is lower semicontinuous.

We next show that $H$ is excessive. If $t \in \mathcal T_m$ and $n \ge m$, then $t \in \mathcal T_n$, and
$$H(x) \ge g_n(x) \ge P_t g_{n-1}(x) = E^x g_{n-1}(X_t).$$
Letting $n$ tend to infinity, $H(x) \ge E^x H(X_t)$ if $t \in \mathcal T_m$ for some $m$. Now take $t_k \in \cup_m \mathcal T_m$ with $t_k \downarrow t$. Since $H$ is lower semicontinuous and the paths of $X$ are right continuous, Exercise 23.9 and Fatou's lemma give
$$H(x) \ge \liminf_{k\to\infty} E^x H(X_{t_k}) \ge E^x\Big[\liminf_{k\to\infty} H(X_{t_k})\Big] \ge E^x H(X_t).$$
If $a \in \mathbb R$, let $E_a = \{y : H(y) > a\}$, which is open. If $a < H(x)$, then
$$P_t H(x) = E^x H(X_t) \ge aP^x(X_t \in E_a) \to a$$
as $t \to 0$. Therefore $\liminf_{t\to0} P_t H(x) \ge a$ for all $a < H(x)$, hence
$$\liminf_{t\to0} P_t H(x) \ge H(x),$$
and we conclude $P_t H(x) \to H(x)$ as $t \to 0$. Thus $H$ is excessive.

Suppose now that $F$ is excessive and $F \ge g$ pointwise. If $F \ge g_{n-1}$, then $F(x) \ge P_t F(x) \ge P_t g_{n-1}(x)$ for every $t \in \mathcal T_n$, hence $F(x) \ge g_n(x)$. By induction, $F(x) \ge g_n(x)$ for all $n$, hence $F(x) \ge H(x)$. Therefore $H$ is the least excessive majorant of $g$.

In one case, at least, finding the least excessive majorant is easy. Suppose we have a one-dimensional Brownian motion killed on leaving an interval $[a,b]$ and a non-negative function $g$ defined on $[a,b]$. Then the least excessive majorant is the smallest concave function $G$ that is larger than or equal to $g$ everywhere. To see this, let $G$ be the smallest such concave function. By Jensen's inequality,
$$P_t G(x) = E^x G(X_t) \le G(E^x X_t) \le G(x).$$
Because $G$ is concave, it is continuous, and so $P_t G(x) = E^x G(X_t) \to G(x)$ as $t \to 0$. Therefore $G$ is excessive. Now suppose $G'$ is another excessive function larger than $g$ and $a \le c < x < d \le b$. By Proposition 23.2(2), we have $G'(x) \ge E^x G'(X_S)$, where $S$ is the first time the process leaves $[c,d]$. Since $X$ is equal to a Brownian motion up to time $S$, we know the exact distribution of $X_S$; see Proposition 3.16. Therefore
$$G'(x) \ge E^x G'(X_S) = \frac{d-x}{d-c}G'(c) + \frac{x-c}{d-c}G'(d).$$
Rearranging this inequality shows that $G'$ is concave. Recall that the minimum of two concave functions is concave. So $G \wedge G'$ is a concave function that is larger than or equal to $g$ and less than or equal to $G$. But $G$ is the smallest concave function larger than or equal to $g$, hence $G = G \wedge G'$, that is, $G \le G'$. Thus $G$ is the least excessive majorant of $g$.
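A discrete-time analogue of Proposition 23.4 makes both the iteration and the concave-majorant example concrete. The sketch rests on our own assumption, not the text's setting: a simple symmetric random walk on $\{0, 1, \ldots, N\}$, stopped at both endpoints. For this walk the one-step operator averages the two neighbours, and the iteration $G \leftarrow \max(g, PG)$ plays the role of $g_n = \max_t P_t g_{n-1}$. The function name is ours.

```python
def least_excessive_majorant(g, tol=1e-12, max_iter=100_000):
    """Discrete-time analogue of Proposition 23.4 (illustrative assumption):
    simple symmetric random walk on {0, 1, ..., N}, stopped at 0 and N.

    One-step operator: (P h)(x) = (h(x - 1) + h(x + 1)) / 2 for 0 < x < N.
    The iteration G <- max(g, P G) increases to the least G >= g with
    P G <= G, i.e. the least excessive majorant of g for this walk."""
    N = len(g) - 1
    G = [float(v) for v in g]
    for _ in range(max_iter):
        new = G[:]                      # endpoints stay equal to g
        for x in range(1, N):
            new[x] = max(g[x], 0.5 * (G[x - 1] + G[x + 1]))
        if max(abs(a - b) for a, b in zip(new, G)) < tol:
            return new
        G = new
    raise RuntimeError("value iteration did not converge")


g = [0, 0, 0, 0, 4, 0, 0]
G = least_excessive_majorant(g)
print([round(v, 6) for v in G])   # the smallest concave majorant of g on the grid
```

On the grid, "excessive for the walk" means midpoint-concave. The limit is therefore the discrete counterpart of the smallest concave majorant described above.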
23.2 Solving the optimal stopping problem

Now let us turn to proving that an optimal stopping time can be given in terms of the least excessive majorant. For simplicity we will suppose that $g$ is non-negative, continuous, and bounded. We will assume that our Markov process and $g$ are such that a least excessive majorant $G$ exists. Let $g^*$ be the optimal reward:
$$g^*(x) = \sup\{E^x g(X_T) : T \text{ a stopping time}\}.$$
Let $D = \{x : g(x) < G(x)\}$, the continuation region, and let $\tau_D = \inf\{t : X_t \notin D\}$.

Theorem 23.5 Let $(X_t, P^x)$ be a strong Markov process and $g$, $g^*$, $G$, and $D$ as above. If $\tau_D < \infty$, $P^x$-a.s., then
$$g^*(x) = G(x) = E^x g(X_{\tau_D}).$$
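In other words, an optimal stopping time is to stop the first time the process hits $\{x : G(x) = g(x)\}$.

Proof Let $D_\varepsilon = \{x : g(x) < G(x) - \varepsilon\}$, and write $\tau_\varepsilon$ for $\tau_{D_\varepsilon}$. Let $H_\varepsilon(x) = E^x[G(X_{\tau_\varepsilon})]$, which is excessive by Proposition 23.2(1). The proof has four steps: first we prove (23.3) below; second, we prove $G(x) \le g^*(x)$; third, we prove $G(x) = g^*(x)$; and fourth, we prove $g^*(x) = E^x g(X_{\tau_D})$.

Step 1. Let $\varepsilon > 0$. We claim
$$g(x) \le H_\varepsilon(x) + \varepsilon, \qquad x \in D. \tag{23.3}$$
(If $x \notin D$, then $x \notin D_\varepsilon$, so $\tau_\varepsilon = 0$ and $H_\varepsilon(x) = G(x) \ge g(x)$. Thus the inequality in (23.3) in fact holds for all $x$.)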
To prove this, we suppose not, that is, we let
$$b = \sup_{x\in D}\big(g(x) - H_\varepsilon(x)\big)$$
and suppose $b > \varepsilon$. Choose $0 < \eta < \varepsilon$, and then choose $x_0$ such that
$$g(x_0) - H_\varepsilon(x_0) \ge b - \eta. \tag{23.4}$$
By the definition of $b$, $H_\varepsilon + b$ is an excessive majorant of $g$. Since $G$ is the least excessive majorant,
$$G(x_0) \le H_\varepsilon(x_0) + b. \tag{23.5}$$
From (23.4) and (23.5) we conclude
$$G(x_0) \le g(x_0) + \eta. \tag{23.6}$$
By the Blumenthal 0–1 law (Proposition 20.8), either $\tau_\varepsilon$ is strictly positive with $P^{x_0}$-probability one or else zero with $P^{x_0}$-probability one. In the first case, for each $t > 0$,
$$g(x_0) + \eta \ge G(x_0) \ge E^{x_0}[G(X_{t\wedge\tau_\varepsilon})] \ge E^{x_0}[g(X_t) + \varepsilon;\ \tau_\varepsilon > t].$$
The first inequality is (23.6), the second is due to $G$ being excessive, and the third holds because $G > g + \varepsilon$ up until the time $\tau_\varepsilon$. If we let $t \to 0$ and use the fact that $g$ is continuous, we get $g(x_0) + \eta \ge g(x_0) + \varepsilon$, which contradicts our choice of $\eta$. In the second case, where $\tau_\varepsilon = 0$ with $P^{x_0}$-probability one, we have
$$H_\varepsilon(x_0) = E^{x_0} G(X_{\tau_\varepsilon}) = E^{x_0} G(X_0) = G(x_0) \ge g(x_0) \ge H_\varepsilon(x_0) + b - \eta,$$
a contradiction since we chose $\eta < \varepsilon < b$. In either case we reach a contradiction, so (23.3) must hold.

Step 2. It follows from (23.3) that $H_\varepsilon + \varepsilon$ is an excessive majorant of $g$. Therefore
$$G(x) \le H_\varepsilon(x) + \varepsilon = E^x[G(X_{\tau_\varepsilon})] + \varepsilon \le E^x[g(X_{\tau_\varepsilon}) + \varepsilon] + \varepsilon \le g^*(x) + 2\varepsilon. \tag{23.7}$$
The first inequality holds because $G$ is the least excessive majorant, the second because $G(X_{\tau_\varepsilon}) \le g(X_{\tau_\varepsilon}) + \varepsilon$ by the definition of $\tau_\varepsilon$, and the third by the definition of $g^*$. Since $\varepsilon$ is arbitrary, we see that $G(x) \le g^*(x)$.

Step 3. For any stopping time $T$, because $G$ is excessive and majorizes $g$,
$$G(x) \ge E^x G(X_T) \ge E^x g(X_T).$$
Taking the supremum over all stopping times $T$ gives $G(x) \ge g^*(x)$, and therefore $G(x) = g^*(x)$.

Step 4. Because $\tau_D$ is finite almost surely, the continuity of $g$ tells us that $E^x g(X_{\tau_\varepsilon}) \to E^x g(X_{\tau_D})$ as $\varepsilon \to 0$. By the definition of $g^*$, we know that $E^x g(X_{\tau_\varepsilon}) \le g^*(x)$. On the other hand, by the definitions of $\tau_\varepsilon$ and $H_\varepsilon$,
$$E^x g(X_{\tau_\varepsilon}) \ge E^x G(X_{\tau_\varepsilon}) - \varepsilon = H_\varepsilon(x) - \varepsilon.$$
By the first inequality in (23.7), the right-hand side is greater than or equal to $G(x) - 2\varepsilon = g^*(x) - 2\varepsilon$. Letting $\varepsilon \to 0$ we obtain
$$E^x g(X_{\tau_D}) \ge g^*(x),$$
as desired.

The following two corollaries are useful in applications.

Corollary 23.6 Suppose there exists a Borel set $A$ such that $h$ is an excessive majorant of $g$, where $h(x) = E^x g(X_{\tau_A})$ and $\tau_A = \inf\{t : X_t \notin A\}$. Then $g^*(x) = h(x)$.

Proof
Let $G$ be the least excessive majorant of $g$. Then $h(x) \ge G(x)$. However,
$$h(x) = E^x g(X_{\tau_A}) \le \sup_T E^x g(X_T) = g^*(x) = G(x)$$
by Theorem 23.5. Hence $h(x) = G(x) = g^*(x)$.

Corollary 23.7 Suppose $g$ is continuous and $G$, the least excessive majorant of $g$, is lower semicontinuous. Let $D$ be the continuation region, suppose $\tau_D < \infty$, a.s., and let $h(x) = E^x g(X_{\tau_D})$. If $h \ge g$, then $h = g^*$.

Proof Note
$$D = \{x : g(x) < G(x)\} = \bigcup_{a<b}\big(\{x : g(x) < a\} \cap \{x : G(x) > b\}\big),$$
where the union is over all pairs of real numbers $a < b$. Since $G$ is lower semicontinuous and $g$ is continuous, $D$ is open. This implies $X_{\tau_D} \notin D$, and so $g(X_{\tau_D}) \ge G(X_{\tau_D})$, a.s. Since $g \le G$, we see that
$$h(x) = E^x g(X_{\tau_D}) = E^x G(X_{\tau_D}).$$
Since $G$ is excessive, $h$ is also excessive by Proposition 23.2(1). Therefore $h$ is an excessive majorant of $g$, and we can apply Corollary 23.6.
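Theorem 23.5 reduces the problem to a recipe: compute $G$, then stop on first entry into $\{G = g\}$. Our grid sketch below illustrates that recipe. It assumes a simple random walk on $\{0, \ldots, N\}$ stopped at the endpoints, which is our discrete stand-in for the killed Brownian motion of Section 23.1, and the helper names are ours. It builds $G$ as the smallest concave majorant via an upper-hull construction and reads off the stopping set. It then evaluates the stopping rule with the exit probabilities $(d-x)/(d-c)$ and $(x-c)/(d-c)$ used in Section 23.1, and checks that the value equals $G$.

```python
def concave_majorant(g):
    """Smallest concave function on the grid {0, ..., N} dominating g
    (upper hull of the points (x, g[x]), interpolated linearly)."""
    hull = []
    for x in range(len(g)):
        # Drop the last vertex while it lies on or below the chord to (x, g[x]).
        while len(hull) >= 2:
            x1, x2 = hull[-2], hull[-1]
            if (x2 - x1) * (g[x] - g[x1]) - (g[x2] - g[x1]) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(x)
    G = [float(v) for v in g]
    for a, b in zip(hull, hull[1:]):
        for x in range(a + 1, b):
            G[x] = g[a] + (g[b] - g[a]) * (x - a) / (b - a)
    return G


def value_of_stopping_at(g, stop):
    """h(x) = E^x g(X_tau) for the walk stopped on first entering `stop`
    (which must contain 0 and N).  Between consecutive stopping points
    c < x < d the walk exits at d with probability (x - c) / (d - c)."""
    pts = sorted(stop)
    h = [0.0] * len(g)
    for c, d in zip(pts, pts[1:]):
        h[c], h[d] = float(g[c]), float(g[d])
        for x in range(c + 1, d):
            h[x] = ((d - x) * g[c] + (x - c) * g[d]) / (d - c)
    return h


g = [0, 1, 0, 3, 0, 0, 2, 0]
G = concave_majorant(g)
stop = {x for x in range(len(g)) if abs(G[x] - g[x]) < 1e-12}   # the set {G = g}
h = value_of_stopping_at(g, stop)
print("G    =", G)
print("stop =", sorted(stop))
print("h    =", h)
```

The point $x = 1$ lies on a straight piece of the hull. It still belongs to $\{G = g\}$, so the rule stops there as well.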
Exercises

23.1 Show that if $f$ is excessive, then $1 - e^{-f}$ is excessive. Thus, for some purposes it is enough to look at bounded excessive functions.

23.2 Show that if $f$ and $g$ are excessive, then $f \wedge g$ is excessive.
23.3 Let $A_t$ be an additive functional (defined in (22.4)) and let $f(x) = E^x A_\infty$. Show that $f$ is excessive.

23.4 Let $f$ be an excessive function for a strong Markov process $(X_t, P^x)$. Let $\varepsilon > 0$ and $S_1 = \inf\{t : |f(X_t) - f(X_0)| \ge \varepsilon\}$. Let $S_{i+1} = S_i + S_1 \circ \theta_{S_i}$. Prove that $f(X_{S_i})$ is a supermartingale with respect to the $\sigma$-fields $\mathcal F_{S_i}$ and with respect to $P^x$ for each $x$.

23.5 For each $n$, let $A^n_t$ be an additive functional with continuous paths and suppose that $f_n(x) = E^x A^n_\infty$ is finite for every $x$. Suppose $A_t$ is a continuous additive functional with $f(x) = E^x A_\infty$ also finite for each $x$. Suppose $f_n$ converges to $f$ uniformly. Prove that for each $x$, with $P^x$-probability one, $A^n_t$ converges to $A_t$, uniformly over $t \ge 0$. Hint: Use Proposition 9.11.

23.6 Suppose $f$ is bounded and excessive, $\lambda \ge 0$, and $A = \{y : f(y) \le \lambda\}$. Prove that if $x \in A^r$ (i.e., $x$ is regular for $A$), then $f(x) \le \lambda$. Hint: Use the optional section theorem (Theorem 16.12) to find stopping times $T_m$ whose graphs are contained in $\{(t,\omega) : t \le 1/m,\ f(X_t) \le \lambda\}$ with $P^x$-probability at least $1 - (1/m)$. If the $g_n$ are as in Lemma 23.1, write
$$\begin{aligned} Ug_n(x) &= E^x\int_0^{T_m} g_n(X_s)\,ds + E^x Ug_n(X_{T_m}) \\ &\le E^x\int_0^{T_m} g_n(X_s)\,ds + E^x f(X_{T_m}) \\ &\le E^x\int_0^{T_m} g_n(X_s)\,ds + \lambda + \|f\|_\infty/m. \end{aligned}$$
Let $m \to \infty$, then $n \to \infty$.

23.7 Suppose $f$ is bounded and excessive, $\lambda \ge 0$, and $B = \{y : f(y) \ge \lambda\}$. Prove that if $x \in B^r$, then $f(x) \ge \lambda$. Hint: Use the optional section theorem as in Exercise 23.6 to find stopping times $R_m$ analogous to the $T_m$. Write $f(x) \ge E^x f(X_{R_m}) \ge \lambda - \|f\|_\infty/m$, and then let $m \to \infty$.

23.8 (1) Suppose $f$ is bounded and excessive, $x \in S$, $\varepsilon > 0$, and $C = \{y : |f(y) - f(x)| \ge \varepsilon\}$. Use Exercises 23.6 and 23.7 to show that if $z \in C^r$, then $|f(z) - f(x)| \ge \varepsilon$. (2) Let $f$, $\varepsilon$, and $x$ be as in (1) and set $S = \inf\{t > 0 : |f(X_t) - f(x)| \ge \varepsilon\}$. Use Exercise 20.9 to show that $|f(X_S) - f(x)| \ge \varepsilon$ with $P^x$-probability one. (3) Let $f$, $\varepsilon$, $x$, and $S$ be as in (2). Define $S_0 = 0$ and $S_{i+1} = S_i + S \circ \theta_{S_i}$. By Exercise 23.4, $f(X_{S_i})$ is a positive supermartingale. Use Corollary A.36 to show $S_i \to \infty$, $P^x$-a.s. Deduce that with $P^x$-probability one, $f(X_t)$ has paths that are right continuous with left limits. (4) Use Exercise 23.1 to show that if $f$ is excessive but not necessarily bounded, then $f(X_t)$ has paths that are right continuous with left limits.

23.9 (1) Show that every continuous function is lower semicontinuous. (2) Show that if $f$ is lower semicontinuous and $x \in S$, then $\liminf_{y\to x} f(y) \ge f(x)$.
(3) Show that if fn is a sequence of continuous functions increasing to f , then f is lower semicontinuous.
23.10 Suppose $g$ is non-negative, bounded, and continuous, and Assumption 20.1 holds. Let $g_0 = g$ and define $g_n(x) = \sup_{t\ge0} P_t g_{n-1}(x)$ for $n \ge 1$. Prove that $g_n$ increases to the least excessive majorant of $g$.
Notes

See Øksendal (2003) for further information on optimal stopping.

Exercise 23.3 shows that $E^x A_\infty$ is an excessive function if $A$ is an additive functional. To a large extent the converse is true: given an excessive function $f$ and some mild conditions, there exists an additive functional $A$ such that $f(x) = E^x A_\infty$ for all $x$. The proof is a modification of the Doob–Meyer decomposition of $f(X_t)$ that takes into account the fact that there is a family of probabilities instead of just one; see Blumenthal and Getoor (1968).

The optimal stopping problem involving American puts has a theoretical solution: look at the least excessive majorant for a certain reward function. The reward function is not just $(K - s)^+$, because the interest earned on the money obtained after the sale of a share of stock needs to be taken into account. Moreover, the excessive functions here are relative to the space-time process $(t, S_t)$, not those relative to $S_t$. Finding a satisfactory solution to this optimal stopping problem remains an important open problem.
24 Stochastic differential equations
Stochastic differential equations are used in modeling a wide variety of physical and economic situations, and are one of the main reasons for the interest in stochastic integrals. We consider stochastic differential equations (SDEs) of the form
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt,$$
where $\sigma$ and $b$ are real-valued functions and $W$ is a one-dimensional Brownian motion. We also consider multidimensional analogs of this equation. If $X$ represents the position of a particle, the $\sigma(X_t)\,dW_t$ term says that the particle $X$ diffuses like a multiple of Brownian motion, but how strong the diffusivity is depends on the location of the particle. The $b(X_t)\,dt$ term represents a push in one direction or another, the size of the push depending on the location of the particle.
24.1 Pathwise solutions of SDEs

Let $W_t$ be a one-dimensional Brownian motion with respect to a filtration $\{\mathcal F_t\}$ satisfying the usual conditions; see Chapter 1. We want to consider SDEs of the form
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x_0. \tag{24.1}$$
This means that $X_t$ satisfies the equation
$$X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds, \qquad t \ge 0. \tag{24.2}$$
Here $\sigma$ and $b$ are Borel measurable functions, and the first integral in (24.2) is a stochastic integral with respect to the Brownian motion $W_t$. Equation (24.2) holds almost surely, that is, we can find versions of $\int_0^t \sigma(X_s)\,dW_s$ such that for almost all $\omega$, (24.2) holds for all $t$. In order to be able to define the stochastic integral, we require that any solution $X_t$ to (24.2) be adapted to the filtration $\{\mathcal F_t\}$. If $X$ satisfies (24.2), then $X$ will automatically have continuous paths. We want to consider existence and uniqueness of solutions to the equation (24.2).

Definition 24.1 A stochastic process $X$ will be a pathwise solution to (24.1) if $X$ is adapted to the filtration $\{\mathcal F_t\}$ and (24.2) holds almost surely, where the null set does not depend on $t$. We say the solution to (24.1) is pathwise unique if whenever $X'_t$ is another solution, then
$$P(X_t \ne X'_t \text{ for some } t \ge 0) = 0. \tag{24.3}$$
Sometimes pathwise uniqueness is used for a slightly stronger concept: one can let $W$ be a Brownian motion with respect to each of two possibly different filtrations $\{\mathcal F_t\}$ and $\{\mathcal F'_t\}$, and one can let $X'_t$ be adapted to $\{\mathcal F'_t\}$. One then requires (24.3) to hold. We won't need to use this modification of the definition, and in any case our proof of uniqueness will be equally valid in this situation.

The function $\sigma$ in (24.1) is called the diffusion coefficient and the function $b$ is called the drift coefficient. $\sigma$ tells us the intensity of the noise at a point, and $b$ tells us if there is a push in any direction at a given point. We will suppose that $\sigma$ and $b$ are Lipschitz functions: there exists a constant $c$ such that
$$|\sigma(x) - \sigma(y)| \le c|x-y|, \qquad |b(x) - b(y)| \le c|x-y|. \tag{24.4}$$
We also suppose for now that $\sigma$ and $b$ are bounded.

Theorem 24.2 Suppose $\sigma$ and $b$ are bounded Lipschitz functions. Then there exists a pathwise solution to (24.1) and this solution is pathwise unique.

Proof
Existence. Let $X_0(t) = x_0$ for all $t$ and define $X_i(t)$ recursively by
$$X_{i+1}(t) = x_0 + \int_0^t \sigma(X_i(s))\,dW_s + \int_0^t b(X_i(s))\,ds. \tag{24.5}$$
Note that $X_0(t)$ is trivially adapted to $\{\mathcal F_t\}$, and an induction argument shows that $X_i$ is adapted to $\{\mathcal F_t\}$ for each $i$. Fix $t_0$. We will show existence (and uniqueness) up to time $t_0$; since $t_0$ is arbitrary, this will prove the theorem. Since $(x+y)^2 \le 2x^2 + 2y^2$,
$$\begin{aligned} E\sup_{r\le t}|X_{i+1}(r) - X_i(r)|^2 &= E\sup_{r\le t}\Big|\int_0^r [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]\,dW_s + \int_0^r [b(X_i(s)) - b(X_{i-1}(s))]\,ds\Big|^2 \\ &\le 2E\sup_{r\le t}\Big|\int_0^r [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]\,dW_s\Big|^2 + 2E\sup_{r\le t}\Big|\int_0^r [b(X_i(s)) - b(X_{i-1}(s))]\,ds\Big|^2. \end{aligned}$$
By Doob's inequalities (Theorem 3.6) and the fact that $\sigma$ is a Lipschitz function, the first term after the inequality is bounded by
$$8E\Big|\int_0^t [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]\,dW_s\Big|^2 = 8E\int_0^t [\sigma(X_i(s)) - \sigma(X_{i-1}(s))]^2\,ds \le cE\int_0^t |X_i(s) - X_{i-1}(s)|^2\,ds.$$
By the Cauchy–Schwarz inequality, the fact that $t \le t_0$, and the fact that $b$ is a Lipschitz function, the second term is bounded by
$$2E\Big(\int_0^t |b(X_i(s)) - b(X_{i-1}(s))|\,ds\Big)^2 \le 2t_0E\int_0^t |b(X_i(s)) - b(X_{i-1}(s))|^2\,ds \le cE\int_0^t |X_i(s) - X_{i-1}(s)|^2\,ds.$$
Therefore
$$E\sup_{r\le t}|X_{i+1}(r) - X_i(r)|^2 \le cE\int_0^t |X_i(s) - X_{i-1}(s)|^2\,ds. \tag{24.6}$$
Let $g_i(t) = E\sup_{r\le t}|X_i(r) - X_{i-1}(r)|^2$. Then, provided we choose $A$ big enough, $g_1(t) \le A$ for $t \le t_0$ and
$$g_{i+1}(t) \le A\int_0^t g_i(s)\,ds, \qquad t \le t_0.$$
(Clearly $|X_{i+1}(t) - X_i(t)|^2 \le \sup_{r\le t}|X_{i+1}(r) - X_i(r)|^2$.) Thus
$$g_2(t) \le A\int_0^t g_1(s)\,ds \le A\int_0^t A\,ds = A^2t, \qquad g_3(t) \le A\int_0^t g_2(s)\,ds \le A\int_0^t A^2s\,ds = A^3t^2/2,$$
and continuing by induction, $g_i(t) \le A^it^{i-1}/(i-1)!$. Exercise 24.1 asks you to show the following: if we define
$$\|Y\|_t = \Big(E\sup_{r\le t}|Y_r|^2\Big)^{1/2} \tag{24.7}$$
for a stochastic process $Y$, then $\|Y\|_t$ is a norm and the corresponding metric is complete. Hence
$$\Big(E\sup_{r\le t_0}|X_n(r) - X_m(r)|^2\Big)^{1/2} = \|X_n - X_m\|_{t_0} \le \sum_{i=m}^{n-1}\|X_{i+1} - X_i\|_{t_0} = \sum_{i=m}^{n-1}(g_{i+1}(t_0))^{1/2}$$
can be made small by taking $m, n$ large. (We use the ratio test to show that the sum $\sum_i (A^it_0^{i-1}/(i-1)!)^{1/2}$ converges.) We have $EX_0(t)^2 < \infty$. By the completeness of $\|\cdot\|_{t_0}$ there exists $X_t$ such that $E\sup_{s\le t_0}|X_n(s) - X_s|^2 \to 0$ as $n \to \infty$. This implies there exists a subsequence $\{n_j\}$ such that $\sup_{s\le t_0}|X_{n_j}(s) - X_s|^2 \to 0$ almost surely. Since each $X_{n_j}$ is continuous in $t$, so is $X_t$. Taking a limit in (24.5) as $n \to \infty$ shows $X_t$ satisfies (24.2).

Uniqueness. Suppose $X_t$ and $X'_t$ are two solutions to (24.2). Let
$$g(t) = E\sup_{r\le t}|X_r - X'_r|^2.$$
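Very similarly to the existence proof, $E\sup_{r\le t}|X_r|^2 < \infty$, the same holds with $X$ replaced by $X'$, and
$$\begin{aligned} E\sup_{r\le t}|X_r - X'_r|^2 &\le 2E\sup_{r\le t}\Big|\int_0^r [\sigma(X_s) - \sigma(X'_s)]\,dW_s\Big|^2 + 2E\sup_{r\le t}\Big|\int_0^r [b(X_s) - b(X'_s)]\,ds\Big|^2 \\ &\le cE\int_0^t |X_s - X'_s|^2\,ds. \end{aligned}$$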
Therefore there exists $A > 0$ such that $g(t)$ is bounded by $A$ and $g(t) \le A\int_0^t g(s)\,ds$. Then $g(t) \le A\int_0^t A\,ds = A^2t$, $g(t) \le A\int_0^t A^2s\,ds = A^3t^2/2$, etc. Thus $g(t) \le A^it^{i-1}/(i-1)!$ for all $i$, which is only possible if $g(t) = 0$. This implies that $X_t = X'_t$ for all $t \le t_0$, except for a null set.

We also want to consider the SDE (24.1) when $\sigma$ and $b$ are Lipschitz functions, but not necessarily bounded. Note $|\sigma(x)| \le |\sigma(0)| + c|x|$, so that $|\sigma(x)| \le c(1 + |x|)$ for a suitable constant $c$, and the same holds for $b$.

Theorem 24.3 Suppose $\sigma$ and $b$ are Lipschitz functions, but not necessarily bounded. Then there exists a pathwise solution to (24.1) and this solution is pathwise unique.

Proof Let $\sigma_n$ and $b_n$ be bounded Lipschitz functions that agree with $\sigma$ and $b$, respectively, on $[-n, n]$. Let $X_n$ be the unique pathwise solution to (24.1) with $\sigma$ and $b$ replaced by $\sigma_n$ and $b_n$, respectively. Let $T_n = \inf\{t : |X_n(t)| \ge n\}$. We claim $X_n(t) = X_m(t)$ if $t \le T_n \wedge T_m$. To prove this, let $g(t) = E\sup_{s\le t\wedge T_n\wedge T_m}|X_n(s) - X_m(s)|^2$, and proceed as in the uniqueness part of the proof of Theorem 24.2. We then have existence and uniqueness of the SDE for $t \le T_n$ for each $n$. To complete the proof, it suffices to show $T_n \to \infty$. Let
$$h_n(t) = E\sup_{s\le t\wedge T_n}|X_n(s)|^2.$$
Then
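$$\begin{aligned} h_n(t) &\le c|x_0|^2 + cE\sup_{s\le t\wedge T_n}\Big|\int_0^s \sigma_n(X_n(r))\,dW_r\Big|^2 + cE\Big(\int_0^{t\wedge T_n}|b_n(X_n(s))|\,ds\Big)^2 \\ &\le c|x_0|^2 + cE\int_0^{t\wedge T_n}\sigma_n(X_n(s))^2\,ds + ct_0E\int_0^{t\wedge T_n}b_n(X_n(s))^2\,ds \\ &\le c|x_0|^2 + c + cE\int_0^{t\wedge T_n}|X_n(s)|^2\,ds \\ &\le c + c\int_0^t h_n(s)\,ds, \end{aligned}$$
using estimates very similar to those of the proof of Theorem 24.2. By Exercise 24.2, $h_n(t) \le ce^{ct}$ if $t \le t_0$. Note the constant $c$ can be chosen to be independent of $n$, because $\sigma_n = \sigma$ and $b_n = b$ on $[-n, n]$, where $X_n(s)$ lies for $s \le T_n$. On $\{T_n < t_0\}$ we have $|X_n(T_n)| = n$ by continuity of paths, so
$$P(T_n < t_0) \le P\Big(\sup_{s\le t_0\wedge T_n}|X_n(s)| \ge n\Big) \le \frac{h_n(t_0)}{n^2} \to 0$$
as $n \to \infty$. Since $t_0$ is arbitrary, $T_n \to \infty$, a.s.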
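The bound $h_n(t) \le ce^{ct}$ comes from Exercise 24.2, a Gronwall-type inequality. The numerical sketch below, with our own function name and arbitrary constants, shows where the exponential comes from. It repeatedly applies the map $g \mapsto A + B\int_0^t g(s)\,ds$, starting from a constant bound $M$. For $B \ge 0$ the map is monotone, so a function satisfying (24.21) and bounded by $M$ stays below every iterate. The iterates converge to $Ae^{Bt}$. With $A = 0$ they reduce to the factorial bound $M(Bt)^k/k!$ used in the uniqueness argument above.

```python
import math


def gronwall_iterate(A, B, M, t, k):
    """k-fold iterate of the map g -> A + B * int_0^t g(s) ds, started from the
    constant function M and evaluated at time t.  Repeated integration gives
    A * sum_{j<k} (B t)^j / j!  +  M (B t)^k / k!."""
    head = sum(A * (B * t) ** j / math.factorial(j) for j in range(k))
    return head + M * (B * t) ** k / math.factorial(k)


A, B, M, t = 1.0, 2.0, 100.0, 1.5
for k in (1, 5, 10, 20, 40):
    print(k, gronwall_iterate(A, B, M, t, k), "limit A*exp(Bt) =", A * math.exp(B * t))
```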
Although we considered one-dimensional SDEs for simplicity, the same arguments apply when we have higher-dimensional SDEs. Let $W = (W^1, \ldots, W^d)$ be a $d$-dimensional Brownian motion, let $\sigma_{ij}(x)$ be bounded Lipschitz functions for $i = 1, \ldots, n$ and $j = 1, \ldots, d$, and let $b_i(x)$ be bounded Lipschitz functions for $i = 1, \ldots, n$. Consider the system of equations
$$dX^i_t = \sum_{j=1}^d \sigma_{ij}(X_t)\,dW^j_t + b_i(X_t)\,dt, \qquad i = 1, \ldots, n. \tag{24.8}$$
This is frequently written in matrix form
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \tag{24.9}$$
where we view $X = (X^1, \ldots, X^n)$ as an $n \times 1$ matrix, $b = (b_1, \ldots, b_n)$ as an $n \times 1$ matrix-valued function, $W$ as a $d \times 1$ matrix, and $\sigma$ as an $n \times d$ matrix-valued function. We have existence and uniqueness for the system (24.8). Exercise 24.5 asks you to prove this in the case when $n = d$, although there is nothing at all special about requiring $n = d$.
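The existence proof of Theorem 24.2 obtains the solution as the limit of the iterates (24.5). The sketch below is a discretized illustration only. It fixes one simulated Brownian path on a grid and replaces the integrals by left-endpoint (Itô) sums. It then runs the recursion (24.5) with the arbitrary bounded Lipschitz choices $\sigma = \cos$ and $b = \sin$. The function names are ours. For comparison it also runs the Euler–Maruyama scheme, a standard discretization that we are swapping in here.

```python
import math
import random


def brownian_increments(n_steps, T, seed=0):
    """Increments of one simulated Brownian path on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)], dt


def euler_maruyama(x0, sigma, b, dW, dt):
    """Euler-Maruyama scheme X_{k+1} = X_k + sigma(X_k) dW_k + b(X_k) dt."""
    xs = [x0]
    for dw in dW:
        xs.append(xs[-1] + sigma(xs[-1]) * dw + b(xs[-1]) * dt)
    return xs


def picard_iterates(x0, sigma, b, dW, dt):
    """Discretized recursion (24.5) with left-endpoint sums.

    Returns the last iterate and the sup-distances between successive iterates.
    Iterate i agrees with the Euler scheme at the first i + 1 grid points, so
    the loop stops after at most len(dW) + 1 rounds."""
    prev = [x0] * (len(dW) + 1)          # X_0(t) = x0 for all t
    gaps = []
    for _ in range(len(dW) + 1):
        new = [x0]
        for k, dw in enumerate(dW):
            new.append(new[-1] + sigma(prev[k]) * dw + b(prev[k]) * dt)
        gaps.append(max(abs(u - v) for u, v in zip(new, prev)))
        prev = new
        if gaps[-1] == 0.0:
            break
    return prev, gaps


dW, dt = brownian_increments(n_steps=200, T=1.0, seed=42)
X, gaps = picard_iterates(1.0, math.cos, math.sin, dW, dt)
print([f"{g:.2e}" for g in gaps[:12]])
print(X == euler_maruyama(1.0, math.cos, math.sin, dW, dt))
```

On the grid the discretized recursion is a triangular system, so its limit is exactly the Euler–Maruyama path. The printed gaps typically shrink rapidly well before that point, in the spirit of the factorial bound $g_i(t) \le A^it^{i-1}/(i-1)!$.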
24.2 One-dimensional SDEs

Although our proof of pathwise existence and uniqueness was for SDEs in one dimension, as is pointed out in Exercise 24.5, almost the same proof works in higher dimensions. In this section we look at a pathwise uniqueness result that is valid only for SDEs on $\mathbb R$. The equation we look at is the same as the one in the last section, namely,
$$X_t = x_0 + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds. \tag{24.10}$$
Theorem 24.4 Suppose $b$ is bounded and Lipschitz. Suppose there exists a continuous function $\rho : [0,\infty) \to [0,\infty)$ such that $\rho(0) = 0$,
$$\int_0^\varepsilon \rho^{-2}(u)\,du = \infty \tag{24.11}$$
for all $\varepsilon > 0$, and $\sigma$ is bounded and satisfies $|\sigma(x) - \sigma(y)| \le \rho(|x-y|)$ for all $x$ and $y$. Then the solution to (24.10) is pathwise unique.

For an example, let $b(x) = 0$ for all $x$, and let $\sigma$ be Hölder continuous of order $\alpha$, that is, there exists $c$ such that $|\sigma(x) - \sigma(y)| \le c|x-y|^\alpha$. Then we take $\rho(x) = cx^\alpha$, and the integral condition in the theorem is satisfied if and only if $\alpha \ge 1/2$. If (24.11) holds for all $\varepsilon > 0$, we say the Yamada–Watanabe condition holds.

We will not prove this theorem directly, since that would mean essentially repeating the proof to obtain a comparison theorem. Instead we state and prove a comparison theorem (Theorem 24.5) and then obtain Theorem 24.4 as a corollary. We only prove the uniqueness of the solution to (24.10) here. The existence is a consequence of some measure-theoretic magic; see Revuz and Yor (1999), Theorem IX.1.7.
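For the Hölder example, the truncated integral in (24.11) has a closed form. This makes the threshold $\alpha = 1/2$ easy to see. The sketch takes $c = 1$ for simplicity, which is an assumption; the constant does not affect divergence. The function name is ours. It evaluates $\int_r^\varepsilon u^{-2\alpha}\,du$ as $r \to 0$. The integral stays bounded for $\alpha < 1/2$ and grows without bound for $\alpha \ge 1/2$.

```python
import math


def yw_truncated_integral(alpha, r, eps=1.0):
    """int_r^eps rho(u)^(-2) du for rho(u) = u**alpha (Holder example, c = 1),
    computed in closed form."""
    if alpha == 0.5:
        return math.log(eps / r)
    p = 1.0 - 2.0 * alpha
    return (eps ** p - r ** p) / p


for alpha in (0.4, 0.5, 0.6):
    print(alpha, [round(yw_truncated_integral(alpha, 10.0 ** -k), 3) for k in (2, 6, 12)])
```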
Theorem 24.5 Suppose $\sigma$ satisfies the conditions in Theorem 24.4. Suppose $X_t$ satisfies (24.10) with $b$ a Lipschitz function. Suppose $Y_t$ is a continuous semimartingale satisfying
$$Y_t \ge Y_0 + \int_0^t \sigma(Y_s)\,dW_s + \int_0^t B(Y_s)\,ds,$$
where $B$ is a Borel measurable function and $B(z) \ge b(z)$ for all $z$. If $Y_0 \ge x_0$, a.s., then $Y_t \ge X_t$ almost surely for all $t$.

Proof
Let $a_n \downarrow 0$ be selected so that
$$\int_{a_n}^{a_{n-1}} (\rho(u))^{-2}\,du = n.$$
This can be done inductively. Choose $a_0$ arbitrarily. Since $\int_r^{a_0}\rho(x)^{-2}\,dx$ increases to infinity as $r \to 0$, we can choose $a_1$ such that $\int_{a_1}^{a_0}\rho(x)^{-2}\,dx = 1$; in a similar manner we choose $a_2, a_3, \ldots$. Let $h_n$ be continuous, supported in $(a_n, a_{n-1})$, with $0 \le h_n(u) \le 2/(n\rho^2(u))$ and $\int_{a_n}^{a_{n-1}} h_n(u)\,du = 1$ for each $n$. The idea is to start with the function $(1+\varepsilon)1_{(a_n,a_{n-1})}(u)/(n\rho(u)^2)$ for some small $\varepsilon$. One then modifies it near $a_n$ and $a_{n-1}$ so that it is continuous, is supported in $(a_n, a_{n-1})$, and integrates to 1. Let $f_n$ be such that $f_n(0) = f_n'(0) = 0$ and $f_n'' = h_n$. Note
$$f_n'(u) = \int_0^u f_n''(s)\,ds = \int_0^u h_n(s)\,ds \le 1$$
and $f_n''(u) \ge 0$, so $0 \le f_n'(u) \le 1$ and $f_n'(u) = 1$ if $u \ge a_{n-1}$. Hence $f_n(u) \uparrow u$ as $n \to \infty$ for each $u \ge 0$, while $f_n(u) = 0$ for $u \le 0$. Since $X_0 = x_0 \le Y_0$, we have $f_n(X_0 - Y_0) = 0$, and by Itô's formula
$$f_n(X_t - Y_t) = \text{martingale} + \int_0^t f_n'(X_s - Y_s)[b(X_s) - B(Y_s)]\,ds + \tfrac12\int_0^t f_n''(X_s - Y_s)[\sigma(X_s) - \sigma(Y_s)]^2\,ds. \tag{24.12}$$
We take expectations of both sides. The martingale term has 0 expectation. The final term on the right-hand side is bounded in expectation by
$$\frac12E\int_0^t \frac{2}{n\,\rho(|X_s - Y_s|)^2}\,\rho(|X_s - Y_s|)^2\,ds \le \frac tn$$
by the assumptions on $\sigma$ and the bound on $f_n'' = h_n$, and so goes to 0 as $n \to \infty$. The expectation of the second term on the right of (24.12) is bounded above by
$$E\int_0^t f_n'(X_s - Y_s)[b(X_s) - b(Y_s)]\,ds + E\int_0^t f_n'(X_s - Y_s)[b(Y_s) - B(Y_s)]\,ds \le cE\int_0^t 1_{[0,\infty)}(X_s - Y_s)\,|X_s - Y_s|\,ds = cE\int_0^t (X_s - Y_s)^+\,ds.$$
Here $f_n' \ge 0$ and $b \le B$ make the second integral non-positive, while $0 \le f_n' \le 1_{(0,\infty)}$ and $b$ is Lipschitz.
Letting $n \to \infty$,
$$E(X_t - Y_t)^+ \le c\int_0^t E(X_s - Y_s)^+\,ds.$$
If we set $g(t) = E(X_t - Y_t)^+$, we have
$$g(t) \le c\int_0^t g(s)\,ds,$$
and by Exercise 24.2 we conclude $g(t) = 0$ for each $t$. Using the continuity of the paths of $X_t$ and $Y_t$ completes the proof.

We now prove Theorem 24.4.

Proof of Theorem 24.4 Suppose $X$ and $X'$ are two solutions to (24.10). Then by Theorem 24.5 with $Y = X'$ and $B = b$, we have $X_t \le X'_t$ for all $t$. Applying this argument with $X$ and $X'$ reversed yields $X'_t \le X_t$ for all $t$, which completes the proof.
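The proof of Theorem 24.5 needs a sequence $a_n \downarrow 0$ with $\int_{a_n}^{a_{n-1}}\rho^{-2} = n$. For the Hölder choice $\rho(u) = u^\alpha$ with $\alpha \ge 1/2$ (taking $c = 1$, an illustrative assumption), these points have a closed form. When $\alpha = 1/2$, $a_n = a_{n-1}e^{-n}$, so $a_n = a_0e^{-n(n+1)/2}$. When $\alpha > 1/2$ and $p = 1 - 2\alpha < 0$, $a_n^p = a_{n-1}^p - np$. The sketch below computes the sequence and checks the defining integrals numerically. Both function names are ours.

```python
import math


def yw_sequence(alpha, n_max, a0=1.0):
    """a_0 > a_1 > ... with int_{a_n}^{a_{n-1}} rho(u)^(-2) du = n, as in the
    proof of Theorem 24.5, for the hypothetical choice rho(u) = u**alpha."""
    if alpha < 0.5:
        raise ValueError("(24.11) fails for alpha < 1/2, so no such sequence exists")
    a = [a0]
    for n in range(1, n_max + 1):
        if alpha == 0.5:
            a.append(a[-1] * math.exp(-n))          # log(a_{n-1} / a_n) = n
        else:
            p = 1.0 - 2.0 * alpha                   # p < 0
            # int_{a_n}^{a_{n-1}} u^(-2 alpha) du = (a_n^p - a_{n-1}^p) / (-p) = n
            a.append((a[-1] ** p - n * p) ** (1.0 / p))
    return a


def integral_rho_minus2(alpha, lo, hi, m=20000):
    """Midpoint rule for int_lo^hi u^(-2 alpha) du after substituting u = e^v."""
    vlo, vhi = math.log(lo), math.log(hi)
    h = (vhi - vlo) / m
    return h * sum(math.exp((1.0 - 2.0 * alpha) * (vlo + (k + 0.5) * h)) for k in range(m))


a = yw_sequence(0.75, 4)
print(a)
print([round(integral_rho_minus2(0.75, a[n], a[n - 1]), 6) for n in range(1, 5)])
```

The points shrink very fast; for $\alpha = 1/2$, $a_n$ underflows double precision once $n(n+1)/2$ exceeds roughly 700.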
24.3 Examples of SDEs

Ornstein–Uhlenbeck process

The Ornstein–Uhlenbeck process is the solution to the SDE
$$dX_t = dW_t - \frac{X_t}{2}\,dt, \qquad X_0 = x. \tag{24.13}$$
The existence and uniqueness follow by Theorem 24.3. Note that the drift coefficient is not bounded, so Theorem 24.2 is not sufficient. The process behaves like a Brownian motion, with a drift that pushes the process towards the origin; the farther the process gets from the origin, the stronger the push.

The equation (24.13) can be solved explicitly. Rearranging, multiplying by $e^{t/2}$, and using the product rule,
$$d[e^{t/2}X_t] = e^{t/2}\,dX_t + e^{t/2}\frac{X_t}{2}\,dt = e^{t/2}\,dW_t,$$
so
$$e^{t/2}X_t = X_0 + \int_0^t e^{s/2}\,dW_s,$$
or
$$X_t = e^{-t/2}x + e^{-t/2}\int_0^t e^{s/2}\,dW_s. \tag{24.14}$$
We used here the fact that the martingale part of the semimartingale $Z_t = e^{t/2}$ is zero, and therefore $\langle Z, W\rangle_t = 0$. By Exercise 24.6, $X_t$ is a Gaussian process. The distribution of $X_t$ is normal with mean $e^{-t/2}x$ and variance $e^{-t}\int_0^t (e^{s/2})^2\,ds = 1 - e^{-t}$.

If we let $Y_t = \int_0^t e^{s/2}\,dW_s$ and $V_t = Y_{\log(t+1)}$, then $Y_t$ is a mean-zero continuous Gaussian process with independent increments, and hence so is $V_t$. Since
$$\operatorname{Var}(V_u - V_t) = \int_{\log(t+1)}^{\log(u+1)} e^s\,ds = u - t,$$
then $V_t$ is a Brownian motion. Hence
$$X_t = e^{-t/2}x + e^{-t/2}V(e^t - 1).$$
This representation of an Ornstein–Uhlenbeck process in terms of a Brownian motion is useful for, among other things, calculating the exit probabilities of a square root boundary.

Linear equations

We consider the linear equation
$$dX_t = AX_t\,dW_t + BX_t\,dt, \qquad X_0 = x_0, \tag{24.15}$$
where $A$ and $B$ are constants. One place this comes up is in models of stock prices in financial mathematics; see Chapter 28. We have pathwise existence and uniqueness by Theorem 24.3; here both the diffusion and drift coefficients are unbounded. We will give a candidate for the solution, and verify that it solves (24.15). By the pathwise uniqueness, this will then be the only solution. Our candidate is
$$X_t = x_0e^{AW_t + (B - A^2/2)t}.$$
To verify that this is a solution, we use Itô's formula with the process $AW_t + (B - A^2/2)t$ and the function $e^x$:
$$\begin{aligned} X_t &= x_0 + \int_0^t x_0e^{AW_s + (B-A^2/2)s}A\,dW_s + \int_0^t x_0e^{AW_s + (B-A^2/2)s}(B - A^2/2)\,ds + \tfrac12\int_0^t x_0e^{AW_s + (B-A^2/2)s}A^2\,ds \\ &= x_0 + \int_0^t x_0e^{AW_s + (B-A^2/2)s}A\,dW_s + \int_0^t x_0e^{AW_s + (B-A^2/2)s}B\,ds \\ &= x_0 + \int_0^t AX_s\,dW_s + \int_0^t BX_s\,ds. \end{aligned}$$
Let us summarize our discussion.

Proposition 24.6 The unique pathwise solution to $dX_t = AX_t\,dW_t + BX_t\,dt$ is $X_t = X_0e^{AW_t + (B - A^2/2)t}$.
If we write $Z_t = AW_t + Bt$, then (24.15) becomes
$$dX_t = X_t\,dZ_t, \qquad X_0 = x_0. \tag{24.16}$$
The equation (24.16) makes sense for arbitrary continuous semimartingales $Z$. Using Itô's formula as above, one can see that a solution is $X_t = x_0e^{Z_t - \langle Z\rangle_t/2}$ when $Z_0 = 0$, as is the case for $Z_t = AW_t + Bt$.
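Both explicit solutions can be sanity-checked numerically. The sketch below is our illustration only; the parameters `n_steps`, `A`, `B`, and `seed` are arbitrary choices. It drives Euler–Maruyama discretizations of (24.13) and (24.15) with one simulated Brownian path, using a standard scheme that we are swapping in. It then compares them with (24.14) and Proposition 24.6 evaluated on the same path. The stochastic integral in (24.14) is itself approximated by a left-endpoint sum.

```python
import math
import random


def compare_with_closed_forms(n_steps=20000, T=1.0, x0=1.0, A=0.2, B=0.05, seed=1):
    """Drive Euler-Maruyama schemes for (24.13) and (24.15) with one simulated
    Brownian path and compare them with the closed forms (24.14) and
    Proposition 24.6 evaluated on the same path."""
    rng = random.Random(seed)
    dt = T / n_steps
    W = 0.0               # W_t along the path
    gbm = x0              # Euler scheme for dX = A X dW + B X dt
    ou = x0               # Euler scheme for dX = dW - (X / 2) dt
    ou_integral = 0.0     # left-endpoint sum approximating int_0^t e^{s/2} dW_s
    for k in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        gbm += A * gbm * dw + B * gbm * dt
        ou += dw - 0.5 * ou * dt
        ou_integral += math.exp(0.5 * k * dt) * dw
        W += dw
    gbm_exact = x0 * math.exp(A * W + (B - A * A / 2.0) * T)    # Proposition 24.6
    ou_exact = math.exp(-T / 2.0) * (x0 + ou_integral)           # (24.14)
    return gbm, gbm_exact, ou, ou_exact


gbm, gbm_exact, ou, ou_exact = compare_with_closed_forms()
print(f"linear: Euler {gbm:.6f} vs closed form {gbm_exact:.6f}")
print(f"OU    : Euler {ou:.6f} vs closed form {ou_exact:.6f}")
```

For small time steps the two columns should agree closely. They are not expected to match exactly, since the Euler scheme carries a discretization error.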
Bessel processes

We consider Bessel processes and the squares of Bessel processes. The name comes from the fact that these processes turn out to be Markov processes whose semigroup has an infinitesimal generator (see Chapter 37) related to Bessel's equation, a type of differential equation. A Bessel process of order $\nu \ge 2$ is defined to be a solution of the SDE
$$dX_t = dW_t + \frac{\nu - 1}{2X_t}\,dt, \qquad X_0 = x. \tag{24.17}$$
Bessel processes of order 0 ≤ ν < 2 can also be defined using (24.17), but only up until the first time the process X reaches 0. Some extra information then has to be given about what the process does at 0. The square of a Bessel process of order ν ≥ 0 is defined to be the solution to the SDE

$$dY_t = 2\sqrt{|Y_t|}\,dW_t + \nu\,dt, \qquad Y_0 = y. \tag{24.18}$$

There is no difficulty defining the square of a Bessel process for 0 ≤ ν < 2. By Theorem 24.4 we have pathwise uniqueness for the solution to (24.18), because $\bigl||y|^{1/2} - |x|^{1/2}\bigr| \le |y-x|^{1/2}$, so we can take $\rho(u) = 2u^{1/2}$ in Theorem 24.4. The solution to (24.18) when ν = 0 and y = 0 is clearly $Y_t = 0$ for all t. By Theorem 24.5 with b(x) = ν and B(x) = 0, the solution to (24.18) is greater than or equal to 0 for all t. We may therefore omit the absolute value in (24.18) and rewrite it as

$$dY_t = 2\sqrt{Y_t}\,dW_t + \nu\,dt, \qquad Y_0 = y. \tag{24.19}$$

If we apply Itô's formula to the solution $Y_t$ of (24.19) with the function $\sqrt{x}$, we see that $X_t = \sqrt{Y_t}$ solves (24.17) for t up until the first time Y reaches 0. This works because $\sqrt{x}$ is twice continuously differentiable as long as we stay away from 0. We will see shortly that the square of a Bessel process started away from 0 never hits 0 if and only if ν ≥ 2. Using Itô's formula with a d-dimensional Brownian motion $W_t$ and the function $|x|^2$ shows that the square of the modulus of a d-dimensional Brownian motion is the square of a Bessel process of order d; this is Exercise 24.7.

Bessel processes have the same scaling properties as Brownian motion. That is, if $X_t$ is a Bessel process of order ν started at x, then $aX_{a^{-2}t}$ is a Bessel process of order ν started at ax. In fact, from (24.17),

$$d(aX_{a^{-2}t}) = a\,dW_{a^{-2}t} + a^2\,\frac{\nu-1}{2aX_{a^{-2}t}}\,d(a^{-2}t),$$
and the assertion follows from three facts: the solution to (24.17) is unique; $a^2\,d(a^{-2}t) = dt$, so the drift term equals $\frac{\nu-1}{2(aX_{a^{-2}t})}\,dt$; and $aW_{a^{-2}t}$ is again a Brownian motion.

Bessel processes are useful for comparison purposes, so the following result is worthwhile.

Proposition 24.7 Suppose $Y_t$ is the square of a Bessel process of order ν with $Y_0 = y$. The following hold with probability one.
(1) If ν > 2 and y > 0, $Y_t$ never hits 0.
(2) If ν = 2 and y > 0, $Y_t$ hits every neighborhood of 0, but never hits the point 0.
(3) If 0 < ν < 2, $Y_t$ hits 0.
(4) If ν = 0, then $Y_t$ hits 0. If started at 0, then $Y_t$ remains at 0 forever.
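An Euler scheme gives a quick numerical look at (24.18) and Proposition 24.7. This is a minimal sketch under stated assumptions. The scheme keeps the absolute value from (24.18), so it stays defined if a discretized path dips slightly below 0. The threshold 0.01·y used for "near 0" is an arbitrary illustrative choice, and so are the step and path counts. The noise increment has mean zero given the current value, so the scheme reproduces $E Y_t = y + \nu t$ exactly in expectation, which gives a check. The fraction of paths that come close to 0 only indicates the change of behavior around ν = 2 qualitatively; a discretization cannot detect exact hitting of 0.

```python
import math
import random


def simulate_besq(y0, nu, t_end, n_steps, rng):
    """One Euler path of dY = 2*sqrt(|Y|) dW + nu dt (the form (24.18)).
    Returns the terminal value and the smallest value seen on the grid."""
    dt = t_end / n_steps
    y = y0
    min_y = y0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        y += 2.0 * math.sqrt(abs(y)) * dw + nu * dt
        min_y = min(min_y, y)
    return y, min_y


def besq_summary(y0, nu, t_end=1.0, n_steps=100, n_paths=4000, seed=1):
    """Sample mean of Y_t and fraction of paths whose grid minimum fell below 0.01*y0."""
    rng = random.Random(seed)
    total = 0.0
    near_zero = 0
    for _ in range(n_paths):
        y, m = simulate_besq(y0, nu, t_end, n_steps, rng)
        total += y
        if m < 0.01 * y0:
            near_zero += 1
    return total / n_paths, near_zero / n_paths


for nu in (0.0, 1.0, 2.0, 3.0):
    mean, frac = besq_summary(1.0, nu)
    # E[Y_1] = y + nu for this scheme; the sample mean carries Monte Carlo error only.
    print(f"nu={nu}: mean Y_1 = {mean:.3f} (target {1.0 + nu:.1f}), near-zero fraction = {frac:.3f}")
```

For ν = 0 a large fraction of paths ends up near 0, consistent with part (4). For ν = 3 very few do, consistent with part (1).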
When we say that $Y_t$ hits 0, we consider only times t > 0. We define $T_0 = \inf\{t > 0 : Y_t = 0\}$ and say that $Y_t$ hits 0 if $T_0 < \infty$.

Proof We prove (2). Apply Itô's formula to the square of a Bessel process of order 2 with the function log x. This shows that $\log Y_t$ is a martingale up until the first hitting time of 0; cf. Exercise 21.1. Indeed, $d(\log Y_t) = 2Y_t^{-1/2}\,dW_t$, so the quadratic variation of $\log Y_t$ is $\int_0^t 4Y_s^{-1}\,ds$ for t less than the hitting time of 0. Suppose 0 < a < y < b. We claim that $Y_t$ leaves the interval [a, b], a.s. If not, then $\langle \log Y\rangle_t \ge 4t/b \to \infty$ as t → ∞. Since $\log Y_t$ is a martingale, it is a time change of Brownian motion. Brownian motion leaves [log a, log b] with probability one, which gives a contradiction. Then by Corollary 3.17,

$$P(Y_t \text{ hits } a \text{ before } b) = \frac{\log b - \log y}{\log b - \log a}. \tag{24.20}$$
Letting b → ∞, we see that $P(Y_t \text{ hits } a) = 1$. Since a is arbitrary, $Y_t$ hits every neighborhood of 0. If in (24.20) we instead hold b fixed and let a → 0, we see that $P(Y_t \text{ hits } 0 \text{ before } b) = 0$. Since b is arbitrary, this proves that $Y_t$ never hits the point 0. Parts (1), (3), and (4) are similar, but instead of log |x| we use $|x|^{(2-\nu)/2}$. The details are left as Exercise 24.8.
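The hitting probabilities can be written down for every order once the function that replaces log x is in hand. The sketch below encodes $P(Y \text{ hits } a \text{ before } b) = (s(b) - s(y))/(s(b) - s(a))$, with s(y) = log y when ν = 2 and $s(y) = y^{(2-\nu)/2}$ otherwise. It rests on the assumption that the argument behind (24.20) carries over with this s. It illustrates how parts (1), (3), and (4) come out and is not a substitute for Exercise 24.8.

```python
import math


def scale(y, nu):
    """The function used in place of log x in the proof: log y if nu == 2,
    y ** ((2 - nu) / 2) otherwise (s(0) is finite only when nu < 2)."""
    if nu == 2:
        return math.log(y)
    return y ** ((2.0 - nu) / 2.0)


def prob_hit_a_before_b(y, a, b, nu):
    """P(Y hits a before b) for 0 <= a < y < b, assuming s(Y) is a martingale.
    For nu >= 2 the lower level a must be positive."""
    sa, sb, sy = scale(a, nu), scale(b, nu), scale(y, nu)
    return (sb - sy) / (sb - sa)


# nu = 2 reproduces (24.20): here (2 - 1) / (2 - 0) = 1/2.
print(prob_hit_a_before_b(math.e, 1.0, math.e ** 2, 2))
# nu = 1 < 2: P(hit 0 before b) = 1 - sqrt(y / b), which tends to 1 as b grows.
print(prob_hit_a_before_b(1.0, 0.0, 100.0, 1))
# nu = 3 > 2: P(hit a before b) is tiny for small a and vanishes as a -> 0.
print(prob_hit_a_before_b(1.0, 1e-8, 10.0, 3))
```

When ν > 2, s is decreasing and s(a) → ∞ as a → 0, which is the mechanism behind part (1). When ν < 2, s(0) is finite, and letting b → ∞ drives the probability of hitting 0 to 1.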
Exercises

24.1 Show that $\|\cdot\|_t$ defined by (24.7) gives rise to a complete normed linear space.

24.2 Suppose g(t) is non-negative and bounded on each finite subinterval of [0, ∞). Suppose there exist constants A, B ≥ 0 such that

$$g(t) \le A + B\int_0^t g(s)\,ds \tag{24.21}$$

for each t ≥ 0. Prove that $g(t) \le Ae^{Bt}$ for all t ≥ 0. This result is known as Gronwall's lemma. Hint: Write

$$g(t) \le A + B\int_0^t \Bigl(A + B\int_0^s g(r)\,dr\Bigr)ds,$$

use (24.21) to substitute for g(r), and iterate.

24.3 The starting point in (24.1) can be random. Suppose Y is a random variable that is measurable with respect to $\mathcal F_0$, Y is square integrable, and σ and b are bounded and Lipschitz. Prove pathwise existence and uniqueness for the equation

$$X_t = Y + \int_0^t \sigma(X_s)\,dW_s + \int_0^t b(X_s)\,ds.$$
24.4 The functions σ and b in (24.1) can depend on time as well as space. Suppose σ : [0, ∞) × R → R and b : [0, ∞) × R → R are bounded and uniformly Lipschitz in the second variable. That is, there exists c independent of s such that |σ(s, x) − σ(s, y)| ≤ c|x − y|, and similarly for b. Prove pathwise existence and uniqueness for the equation

$$X_t = x_0 + \int_0^t \sigma(s, X_s)\,dW_s + \int_0^t b(s, X_s)\,ds.$$
24.5 Here is a multidimensional analog of (24.1). Make the following assumptions:
- the functions $\sigma_{ij} : \mathbb R^d \to \mathbb R$, 1 ≤ i, j ≤ d, are bounded and Lipschitz;
- the functions $b_i : \mathbb R^d \to \mathbb R$, i = 1, ..., d, are bounded and Lipschitz;
- the $W^j$ are independent one-dimensional Brownian motions;
- $x_0 = (x_0^{(1)}, \ldots, x_0^{(d)})$;
- $X_t = (X_t^{(1)}, \ldots, X_t^{(d)})$ satisfies

$$X_t^{(i)} = x_0^{(i)} + \int_0^t \sum_{j=1}^d \sigma_{ij}(X_s)\,dW_s^j + \int_0^t b_i(X_s)\,ds \tag{24.22}$$
for i = 1, ..., d. Prove pathwise existence and uniqueness for this system of equations.

24.6 Suppose f and g map [0, ∞) → R with $\int_0^\infty f(t)^2\,dt < \infty$ and $\int_0^\infty g(t)^2\,dt < \infty$. Show three things: $\int_0^\infty f(t)\,dW_t$ is a mean zero Gaussian random variable; the same holds with f replaced by g; and

$$\operatorname{Cov}\Bigl(\int_0^\infty f(t)\,dW_t,\ \int_0^\infty g(t)\,dW_t\Bigr) = \int_0^\infty f(t)g(t)\,dt.$$
Hint: Approximate f and g by piecewise constant deterministic functions.

24.7 Show that if $W_t$ is a d-dimensional Brownian motion, then $|W_t|^2$ is the square of a Bessel process of order d.

24.8 Prove (1), (3), and (4) of Proposition 24.7.

24.9 Let X be the solution to $dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$, $X_0 = x_0$. Here W is a one-dimensional Brownian motion, σ and b are Lipschitz continuous real-valued functions, and |σ(x)| ≤ c(1 + |x|) and |b(x)| ≤ c(1 + |x|). Let $t_0 > 0$. Prove that if p ≥ 2, then

$$E\Bigl[\sup_{s\le t_0}|X_s|^p\Bigr] \le c\,(1 + |x_0|^p).$$
24.10 Let W be a one-dimensional Brownian motion and let $X_t^x$ be the solution to

$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x.$$

Suppose σ and b are $C^\infty$ functions and that σ, b, and all their derivatives are bounded. Show that for each t the map $x \mapsto X_t^x$ is continuous in x with probability one. Show that the map is differentiable in x.

24.11 Suppose A(t) and B(t) are deterministic functions of t. Find an explicit solution to the one-dimensional SDE

$$dX_t = A(t)\,dW_t + B(t)\,dt, \qquad X_0 = x.$$
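The hint to Exercise 24.6 can be tried numerically. This is a minimal sketch under two assumptions. First, f and g vanish outside [0, 1], so the integrals over [0, ∞) reduce to [0, 1]. Second, f and g are replaced by step functions equal to their left-endpoint values on a uniform grid. The grid size, sample size, and seed are arbitrary. With f ≡ 1 and g(t) = t on [0, 1], the target covariance is $\int_0^1 t\,dt = 1/2$. On a 50-step grid the covariance of the discretized integrals is exactly $\sum_k t_k\,\Delta t = 0.49$, and the Monte Carlo error is of order 0.005.

```python
import math
import random


def ito_integral_pair(f, g, t_end, n_steps, rng):
    """Approximate (int f dW, int g dW) over [0, t_end] using the same Brownian
    increments, with f and g frozen at the left endpoint of each grid cell."""
    dt = t_end / n_steps
    i_f = i_g = 0.0
    for k in range(n_steps):
        t = k * dt
        dw = rng.gauss(0.0, math.sqrt(dt))
        i_f += f(t) * dw
        i_g += g(t) * dw
    return i_f, i_g


def sample_covariance(f, g, t_end=1.0, n_steps=50, n_samples=20_000, seed=2):
    """Unbiased sample covariance of the two discretized stochastic integrals."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_samples):
        x, y = ito_integral_pair(f, g, t_end, n_steps, rng)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n_samples, sum(ys) / n_samples
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n_samples - 1)


cov = sample_covariance(lambda t: 1.0, lambda t: t)
print(cov)  # close to 0.5, up to discretization and Monte Carlo error
```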
Notes If one wants a stochastic differential equation with jumps, one integrates with respect to a Poisson point process in addition to a Brownian motion; Poisson point processes are defined in Chapter 18. Using the notation of that chapter, one considers the stochastic differential equation

$$dX_t = \sigma(X_{t-})\,dW_t + b(X_{t-})\,dt + \int_S F(X_{t-}, z)\,\bigl(\mu(dt, dz) - \nu(dt, dz)\bigr), \qquad X_0 = x_0,$$
which means that we want a solution to

$$X_t = x_0 + \int_0^t \sigma(X_{s-})\,dW_s + \int_0^t b(X_{s-})\,ds + \int_0^t\!\!\int_S F(X_{s-}, z)\,\bigl(\mu(ds, dz) - \nu(ds, dz)\bigr).$$
There is pathwise existence and uniqueness to this SDE provided F satisfies a suitable Lipschitz-like condition; see Skorokhod (1965).
25 Weak solutions of SDEs
In Chapter 24 we considered SDEs of the form

$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \tag{25.1}$$
where W is a Brownian motion and σ and b are Lipschitz functions or, in one dimension, σ has a modulus of continuity satisfying an integral condition. When the coefficients σ and b are not sufficiently smooth, (25.1) may have no pathwise solution at all, or the pathwise solution may not be unique. We therefore define another useful notion of existence and uniqueness.

Definition 25.1 A weak solution (X, W, P) to (25.1) exists if there exist a probability measure P and a pair of processes $(X_t, W_t)$ such that $W_t$ is a Brownian motion under P and (25.1) holds. Weak uniqueness holds for (25.1) if, whenever (X, W, P) and (X', W', P') are two weak solutions, the joint law of (X, W) under P equals the joint law of (X', W') under P'. When this happens, we also say that the solution to (25.1) is unique in law.

We now discuss the relationship between weak solutions and pathwise solutions. If the solution to (25.1) is pathwise unique, then weak uniqueness holds. For a proof of this result under very general hypotheses, see Revuz and Yor (1999), theorem IX.1.7. When σ and b are Lipschitz functions, the proof is much simpler.

Proposition 25.2 Suppose σ and b are bounded Lipschitz functions and $x_0 \in \mathbb R^d$. Then weak uniqueness holds for (25.1).

Proof For notational simplicity we consider the case of dimension one. Suppose (X, W, P) and (X', W', P') are two weak solutions to (25.1). Let $X_0(t) = x_0$ and define $X_{i+1}(t)$ by

$$X_{i+1}(t) = x_0 + \int_0^t \sigma(X_i(s))\,dW_s + \int_0^t b(X_i(s))\,ds. \tag{25.2}$$
We saw in the proof of Theorem 24.2 that the limit of the $X_i$ exists uniformly over finite time intervals, that it solves (25.1), and that the solution is pathwise unique. Since X also solves (25.1), $X_i$ converges to X uniformly over finite time intervals, a.s. with respect to P. Similarly, let $X'_0(t) = x_0$ and define $X'_{i+1}(t)$ by

$$X'_{i+1}(t) = x_0 + \int_0^t \sigma(X'_i(s))\,dW'_s + \int_0^t b(X'_i(s))\,ds. \tag{25.3}$$
Then $X'_i$ converges to X', uniformly over finite time intervals.
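The iteration (25.2) can be sketched on a single discretized Brownian path. This is a minimal sketch that discretizes both integrals with left-endpoint (Euler) sums. The coefficients σ(x) = 1 + ½ sin x and b(x) = −½ tanh x are hypothetical choices; both are bounded and Lipschitz. Each iterate is computed from $x_0$ and the Brownian increments by the same deterministic recipe, and that is the feature the proof exploits. The iterates settle down to the Euler solution driven by the same increments.

```python
import math
import random


def sigma(x):
    return 1.0 + 0.5 * math.sin(x)   # bounded and Lipschitz (illustrative choice)


def b(x):
    return -0.5 * math.tanh(x)       # bounded and Lipschitz (illustrative choice)


def picard_iterates(x0, dw, dt, n_iter):
    """Discrete (25.2): X_{i+1}(t_k) = x0 + sum_{j<k} [sigma(X_i(t_j)) dW_j + b(X_i(t_j)) dt],
    started from the constant path X_0 = x0. Returns the last iterate and the
    sup-distances between successive iterates."""
    x = [x0] * (len(dw) + 1)
    sup_diffs = []
    for _ in range(n_iter):
        new, acc = [x0], x0
        for j, d in enumerate(dw):
            acc += sigma(x[j]) * d + b(x[j]) * dt
            new.append(acc)
        sup_diffs.append(max(abs(u - v) for u, v in zip(new, x)))
        x = new
    return x, sup_diffs


def euler(x0, dw, dt):
    """Euler scheme for (25.1) driven by the same Brownian increments."""
    x = [x0]
    for d in dw:
        x.append(x[-1] + sigma(x[-1]) * d + b(x[-1]) * dt)
    return x


rng = random.Random(3)
n = 200
dt = 1.0 / n
dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
x_picard, diffs = picard_iterates(0.0, dw, dt, n_iter=40)
x_euler = euler(0.0, dw, dt)
print(diffs[:3], diffs[-1])
print(max(abs(u - v) for u, v in zip(x_picard, x_euler)))
```

Nothing in `picard_iterates` refers to the probability space; it only consumes $x_0$ and the increments. This is the discrete counterpart of the statement that the law of $(X_i, W)$ is determined by the law of W.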
Now W is a Brownian motion under P and W' is a Brownian motion under P', so the law of $(X_0, W)$ under P equals the law of $(X'_0, W')$ under P'. By (25.2) and (25.3), the law of $(X_1, W)$ under P equals the law of $(X'_1, W')$ under P'. Iterating, the law of $(X_i, W)$ under P equals the law of $(X'_i, W')$ under P' for every i. Passing to the limit, the law of (X, W) under P equals the law of (X', W') under P'.

We now give an example showing that weak uniqueness can hold even when pathwise uniqueness fails. Let σ(x) be equal to 1 if x ≥ 0 and −1 otherwise, and take b to be identically 0. We consider solutions to

$$X_t = \int_0^t \sigma(X_s)\,dW_s. \tag{25.4}$$
Weak uniqueness holds for the following reason. If W is a Brownian motion under P, then $X_t$ must be a martingale, and its quadratic variation satisfies $d\langle X\rangle_t = \sigma(X_t)^2\,dt = dt$. By Lévy's theorem (Theorem 12.1), $X_t$ is then a Brownian motion. Weak solutions also exist. Given a Brownian motion $X_t$, let $W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dX_s$. By Lévy's theorem again, $W_t$ is a Brownian motion, and $X_t = \int_0^t \sigma(X_s)\,dW_s$.

On the other hand, pathwise uniqueness does not hold. To see this, let $Y_t = -X_t$. Since $-\sigma(x) = \sigma(-x) - 2\cdot 1_{\{0\}}(x)$ for every x, we have

$$Y_t = \int_0^t \sigma(Y_s)\,dW_s - 2\int_0^t 1_{\{0\}}(X_s)\,dW_s. \tag{25.5}$$
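The algebra behind (25.5) can be checked on a discretized path. The sketch below uses an arbitrary step size, horizon, and seed. It builds a discrete Brownian path X and defines increments of W by dW = dX/σ(X). It then confirms that Y = −X equals the sum of σ(Y) dW plus the correction −2∑1_{0}(X) dW. On the grid the correction picks up only the grid times where X is exactly 0, which here is just the starting point. Its quadratic variation is therefore of order 4·dt, the discrete counterpart of the Lebesgue-null time Brownian motion spends at 0.

```python
import math
import random


def sgn(x):
    """sigma in (25.4): 1 if x >= 0 and -1 otherwise (so sgn(-0.0) == 1.0)."""
    return 1.0 if x >= 0 else -1.0


def tanaka_check(n=1000, dt=1e-3, seed=4):
    rng = random.Random(seed)
    # Discretized Brownian path X with X_0 = 0.
    x = [0.0]
    for _ in range(n):
        x.append(x[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Discrete analogue of W_t = int_0^t (1 / sigma(X_s)) dX_s.
    dw = [(x[k + 1] - x[k]) / sgn(x[k]) for k in range(n)]
    y = [-v for v in x]            # Y = -X
    stoch = [0.0]                  # running sum of sigma(Y_j) dW_j
    corr = [0.0]                   # running sum of -2 * 1_{0}(X_j) dW_j
    for k in range(n):
        stoch.append(stoch[-1] + sgn(y[k]) * dw[k])
        corr.append(corr[-1] - 2.0 * (x[k] == 0.0) * dw[k])
    return x, y, dw, stoch, corr


x, y, dw, stoch, corr = tanaka_check()
print(max(abs(yk - (sk + ck)) for yk, sk, ck in zip(y, stoch, corr)))
print(corr[-1], -2.0 * dw[0])      # the correction is a single term
```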
The second term on the right has quadratic variation $4\int_0^t 1_{\{0\}}(X_s)\,ds$. This is 0 almost surely, because we showed in Exercise 11.1 that the set of times Brownian motion spends at 0 has Lebesgue measure 0. Therefore Y is another pathwise solution to (25.4).

This example is not satisfying, because one would like σ to be positive and, if possible, continuous. Such examples exist, however. For each β < 1/2, Barlow (1982) has constructed functions σ that are Hölder continuous of order β and bounded above and below by positive constants, and for which

$$dX_t = \sigma(X_t)\,dW_t, \qquad X_0 = x_0, \tag{25.6}$$
has a unique weak solution but no pathwise solution.

We now show how the technique of time change can be used to study weak uniqueness. We consider the SDE (25.6).

Proposition 25.3 If σ is Borel measurable and there exist $c_2 > c_1 > 0$ such that $c_1 \le \sigma(x) \le c_2$ for all x, then weak existence and weak uniqueness hold for (25.6).

Proof We consider only uniqueness, leaving existence as Exercise 25.1. Suppose (X, W, P) and (X', W', P') are two weak solutions. Then $X_t$ is a martingale. As in Section 12.2, set

$$A_t = \int_0^t \sigma(X_s)^2\,ds, \qquad \tau_t = \inf\{s : A_s \ge t\}.$$
Then $M_t = X_{\tau_t}$ is a Brownian motion under P. Define A', τ', and M' analogously. The law of M under P is that of a Brownian motion, and so is the law of M' under P'.
Now let

$$B_t = \int_0^t \frac{1}{\sigma(M_s)^2}\,ds, \qquad \rho_t = \inf\{s : B_s \ge t\}. \tag{25.7}$$
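The objects in (25.7) can be sketched on a grid. This is a minimal sketch under stated assumptions. The coefficient σ(x) = 1.5 + 0.5 sin x is hypothetical and satisfies 1 ≤ σ ≤ 2. B is computed by Riemann sums, and its inverse ρ by bisection. Grid sizes and the seed are arbitrary. The change of variables that follows shows $B_t = \tau_t$, so ρ is the inverse of τ, namely $\rho_t = A_t = \int_0^t \sigma(X_u)^2\,du$. The code checks this relation for the reconstructed path $X_t = M_{\rho_t}$. The same relation says that X has quadratic variation $\int \sigma(X)^2\,dt$, as a solution of (25.6) must.

```python
import bisect
import math
import random


def sigma(x):
    """Illustrative coefficient with 1 <= sigma(x) <= 2 (a hypothetical choice)."""
    return 1.5 + 0.5 * math.sin(x)


def time_changed_path(t_end=1.0, h=1e-4, s_max=5.0, n_out=1000, seed=5):
    """Build X_t = M_{rho_t} on a grid of [0, t_end], where M is a discretized
    Brownian motion, B_s = int_0^s sigma(M_u)^(-2) du, rho_t = inf{s : B_s >= t}."""
    rng = random.Random(seed)
    n = round(s_max / h)
    m = [0.0]
    for _ in range(n):
        m.append(m[-1] + rng.gauss(0.0, math.sqrt(h)))
    big_b = [0.0]                                  # left Riemann sums for B (increasing)
    for k in range(n):
        big_b.append(big_b[-1] + h / sigma(m[k]) ** 2)
    # sigma <= 2 gives B_s >= s / 4, so s_max = 5 is enough for t_end = 1.
    assert big_b[-1] >= t_end, "increase s_max"
    dt = t_end / n_out
    rho, x = [], []
    for i in range(n_out + 1):
        idx = bisect.bisect_left(big_b, i * dt)    # first grid index with B >= t
        rho.append(idx * h)
        x.append(m[idx])
    return rho, x, dt


rho, x, dt = time_changed_path()
# rho_t should agree with int_0^t sigma(X_u)^2 du up to discretization error.
a_t = sum(sigma(v) ** 2 * dt for v in x[:-1])
print(rho[-1], a_t)
```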
Since $M_t$ is a Brownian motion and σ is bounded above and below by positive constants, $B_t$ is continuous and strictly increasing, and it increases to infinity as t → ∞. The same is therefore true of $\rho_t$. By a change of variables,

$$B_t = \int_0^t \frac{1}{\sigma(X_{\tau_s})^2}\,ds = \int_0^{\tau_t} \frac{1}{\sigma(X_u)^2}\,dA_u = \int_0^{\tau_t} \frac{\sigma(X_u)^2}{\sigma(X_u)^2}\,du = \tau_t.$$

Since ρ is the inverse of B = τ, and τ is the inverse of A, we have $\rho_t = A_t$. Therefore $M_{\rho_t} = X_{\tau(A_t)} = X_t$. The analogous formulas hold with primes.

The law of M under P equals the law of M' under P', since both are Brownian motions. By (25.7), the law of (M, B) under P therefore equals the law of (M', B') under P'. Consequently the law of (M, ρ) under P equals the law of (M', ρ') under P'. Since $X_t = M_{\rho_t}$ and similarly for X', the law of X under P equals the law of X' under P'. Finally, since $W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dX_s$ and similarly for W', the joint law of (X, W) under P equals the joint law of (X', W') under P'.

We point out that in the above proof it is essential that one can reconstruct X from M in a measurable way.

We now use the Girsanov theorem to prove weak uniqueness for (25.1).

Proposition 25.4 Suppose σ and b are measurable and bounded above, and σ is bounded below by a positive constant. Then weak existence and uniqueness hold for (25.1).

Proof We prove weak uniqueness, leaving existence as Exercise 25.2. Define $\{\mathcal F_t\}$ to be the minimal augmented filtration generated by X,
$$M_t = \exp\Bigl(-\int_0^t \frac{b}{\sigma}(X_s)\,dW_s - \frac12\int_0^t \Bigl(\frac{b}{\sigma}\Bigr)^2(X_s)\,ds\Bigr),$$

and Q the probability measure defined by $Q(A) = E_P[M_t; A]$ if $A \in \mathcal F_t$. By Theorem 13.3, $\widetilde W_t = W_t + \int_0^t \frac{b}{\sigma}(X_s)\,ds$ is a Brownian motion under Q, and under Q,

$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt = \sigma(X_t)\,d\widetilde W_t.$$

Define M', Q', and $\widetilde W'$ analogously. By Proposition 25.3, the law of $(X, \widetilde W)$ under Q equals the law of $(X', \widetilde W')$ under Q'. Let n ≥ 1 and $t_1 < \cdots < t_n$, and let $A_1, \ldots, A_n$ be Borel subsets of R. Set $B = \{X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n\}$ and define B' analogously. For $t \ge t_n$ we have
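$$P(B) = \int_B \frac{dP}{dQ}\,dQ = \int_B \exp\Bigl(\int_0^t \frac{b}{\sigma}(X_s)\,dW_s + \frac12\int_0^t \Bigl(\frac{b}{\sigma}\Bigr)^2(X_s)\,ds\Bigr)\,dQ$$

$$\qquad = \int_B \exp\Bigl(\int_0^t \frac{b}{\sigma}(X_s)\,d\widetilde W_s - \frac12\int_0^t \Bigl(\frac{b}{\sigma}\Bigr)^2(X_s)\,ds\Bigr)\,dQ.$$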
Now use the analogous formula for P'(B') and the fact that the law of $(X, \widetilde W)$ under Q is the same as that of $(X', \widetilde W')$ under Q'. This gives P(B) = P'(B'), so the finite-dimensional distributions of X under P and of X' under P' are the same. Since both X and X' are continuous processes, Theorem 2.6 shows that the law of X under P equals the law of X' under P'. Define $Y_t = X_t - \int_0^t b(X_s)\,ds$ and Y' similarly. Then the joint law of (X, Y) under P equals the joint law of (X', Y') under P'. Finally, $W_t = \int_0^t \frac{1}{\sigma(X_s)}\,dY_s$ and similarly for W', so we obtain our conclusion.

The procedure of using the Girsanov theorem to get rid of the drift also works in higher dimensions. However, the time change procedure of Proposition 25.3 is much less useful in higher dimensions than in one dimension. The question of weak uniqueness for the system of equations in Exercise 24.5 is an interesting one; see Bass (1997) and Stroock and Varadhan (1977).
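The change-of-measure formula for P(B) in this proof also suggests a simulation recipe: simulate the driftless equation, which gives the law under Q, and reweight each path by dP/dQ. This is a minimal sketch that assumes an Euler discretization and uses hypothetical parameter choices. With σ ≡ 1 and constant b ≡ μ, the weight reduces to $\exp(\mu X_t - \mu^2 t/2)$. The estimate can therefore be compared with the exact Gaussian probability that μ + W_1 exceeds 1.

```python
import math
import random


def girsanov_estimate(event, sigma, b, x0, t_end, n_steps, n_paths, seed=6):
    """Estimate P(event(X)) for dX = sigma(X) dW + b(X) dt. Each path is drawn
    from the driftless equation dX = sigma(X) dW~ (the law under Q) and weighted
    by exp(int (b/sigma)(X) dW~ - 1/2 int (b/sigma)^2(X) ds), an Euler version of dP/dQ."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, log_w = x0, 0.0
        path = [x]
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            theta = b(x) / sigma(x)
            log_w += theta * dw - 0.5 * theta * theta * dt
            x += sigma(x) * dw
            path.append(x)
        if event(path):
            total += math.exp(log_w)
    return total / n_paths


mu = 0.5
est = girsanov_estimate(lambda p: p[-1] > 1.0, lambda x: 1.0, lambda x: mu,
                        x0=0.0, t_end=1.0, n_steps=10, n_paths=20_000)
exact = 0.5 * math.erfc((1.0 - mu) / math.sqrt(2.0))   # P(N(mu, 1) > 1)
print(est, exact)
```

The event here depends only on the endpoint. However, `event` receives the whole simulated path, matching the finite-dimensional sets B used in the proof.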
Exercises

25.1 Show that weak existence holds under the hypotheses of Proposition 25.3.

25.2 Show that weak existence holds under the hypotheses of Proposition 25.4.

25.3 Here is an example of an SDE for which weak uniqueness does not hold. Suppose W is a one-dimensional Brownian motion and α ∈ (0, 1/2). Let $\sigma(x) = 1 \wedge |x|^\alpha$. Find two solutions to $dX_t = \sigma(X_t)\,dW_t$ that are not equal in law. Hint: One is the solution that is identically zero. The other can be constructed by time changing a Brownian motion by the inverse of the increasing process $\int_0^t (1 \wedge |X_s|^{2\alpha})^{-1}\,ds$.
25.4 (1) Suppose $a_s$ and $b_s$ are bounded predictable processes with $a_s$ bounded below by a positive constant. Let W be a one-dimensional Brownian motion. Suppose Y is a one-dimensional semimartingale such that

$$dY_t = a_t Y_t\,dW_t + b_t\,dt, \qquad Y_0 = 0.$$

Prove that if $t_0 > 0$ and ε > 0, there exists a constant c > 0, depending only on $t_0$, ε, and the bounds on $a_s$ and $b_s$, such that

$$P\Bigl(\sup_{s\le t_0}|Y_s| < \varepsilon\Bigr) > c.$$
(2) Now let W be a d-dimensional Brownian motion, let $x \in \mathbb R^d$, and let σ be a d × d matrix-valued function that is bounded and such that $\sigma\sigma^T(x)$ is positive definite, uniformly in x. That is, there exists Λ > 0 such that for all x,

$$\sum_{i,j=1}^d y_i y_j\,\bigl(\sigma(x)\sigma^T(x)\bigr)_{ij} \ge \Lambda\sum_{i=1}^d y_i^2, \qquad (y_1, \ldots, y_d) \in \mathbb R^d.$$

Let b be a d × 1 matrix-valued function that is bounded. Let X be the solution to

$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x.$$
Use Itô's formula to find an equivalent expression for $|X_t - x|^2$. Then use (1) to prove that if $t_0 > 0$ and ε > 0, there exists a constant c > 0 not depending on x such that

$$P^x\Bigl(\sup_{s\le t_0}|X_s - x| < \varepsilon\Bigr) > c.$$
25.5 This is the support theorem for solutions to SDEs. Let X, x, ε, and $t_0$ be as in (2) of Exercise 25.4. Suppose $\psi : [0, t_0] \to \mathbb R^d$ is a continuous function with ψ(0) = x. Use the Girsanov theorem to prove that there exists c > 0 such that

$$P^x\Bigl(\sup_{s\le t_0}|X_s - \psi(s)| < \varepsilon\Bigr) > c.$$
25.6 Suppose weak uniqueness holds for the one-dimensional stochastic differential equation

$$dX_t = \sigma(X_t)\,dW_t, \qquad X_0 = x, \tag{25.8}$$

where W is a one-dimensional Brownian motion. Suppose also that there exists a process X, adapted to the minimal augmented filtration of W, with $X_0 = x$ and $dX_t = \sigma(X_t)\,dW_t$. Prove that pathwise uniqueness holds for (25.8). Hint: Show there exists a measurable map F from C[0, ∞) to C[0, ∞) such that X = F(W). If X' is another solution to (25.8), then weak uniqueness shows that the laws of (X, W) and (X', W) are equal, hence X' = F(W) = X.
26 The Ray–Knight theorems
The local time of Brownian motion, $L_t^x$, is parameterized by space and time: x and t. Ray and Knight independently discovered that at certain stopping times T, the process $x \mapsto L_T^x$ is a Markov process. Times that work include:
(1) the first time local time at 0 reaches a level r;
(2) an exponential random variable T that is independent of the Brownian motion;
(3) the first time T that Brownian motion reaches the level one.
We will prove the version of the Ray–Knight theorems in the last case. We will show that if W is a Brownian motion with local times $L_t^x$ and $T = \inf\{t > 0 : W_t = 1\}$, then the process $L_T^{1-x}$, indexed by x, has the same law as the square of a Bessel process of order 2. We will see in Chapter 39 that the square of a Bessel process is a Markov process. We will use the following lemma.

Lemma 26.1 Suppose $X_t^{(j)}$, j = 1, 2, are two continuous processes such that

$$E\exp\Bigl(-\int_0^1 f(s)X_s^{(1)}\,ds\Bigr) = E\exp\Bigl(-\int_0^1 f(s)X_s^{(2)}\,ds\Bigr)$$
whenever f is a non-negative continuous function with support in (0, 1). Then the laws of $\{X_t^{(j)}; 0 \le t \le 1\}$, j = 1, 2, are equal.

Proof Let φ be a non-negative continuous function with support in [0, 1] such that $\int_0^1 \varphi(x)\,dx = 1$, and let $\varphi_\varepsilon(x) = \varepsilon^{-1}\varphi(x/\varepsilon)$. Then the family $\{\varphi_\varepsilon\}$ is an approximation to the identity: if g is a continuous function, then $\int g(s)\varphi_\varepsilon(s - t)\,ds \to g(t)$ as ε → 0. Now let $t_1, \ldots, t_n \in (0, 1)$ and $a_1, \ldots, a_n > 0$, and set $f_\varepsilon(x) = \sum_{i=1}^n a_i\varphi_\varepsilon(x - t_i)$; for small ε this function is supported in (0, 1). Using the hypothesis with $f = f_\varepsilon$ and letting ε → 0, we obtain

$$E\exp\Bigl(-\sum_{i=1}^n a_i X_{t_i}^{(1)}\Bigr) = E\exp\Bigl(-\sum_{i=1}^n a_i X_{t_i}^{(2)}\Bigr).$$

The left-hand side is the joint Laplace transform of $(X_{t_1}^{(1)}, \ldots, X_{t_n}^{(1)})$, and the right-hand side is the same for $X^{(2)}$. By the uniqueness of the Laplace transform, the finite-dimensional distributions of $X^{(1)}$ and $X^{(2)}$ are equal. Both processes have continuous paths, so the conclusion follows from Theorem 2.6.
Let $B_t$ be a Brownian motion, not necessarily the same as $W_t$, and let $Z_t$ be the non-negative solution to

$$dZ_t = 2\sqrt{Z_t}\,dB_t + 2\,dt, \qquad Z_0 = 0, \quad 0 \le t \le 1. \tag{26.1}$$

The solution to this equation is unique by Theorem 24.4, and $Z_t$ is the square of a Bessel process of order 2.

Theorem 26.2 The processes $\{L_T^{1-x}; 0 \le x \le 1\}$ and $\{Z_x; 0 \le x \le 1\}$ have the same law.

Proof Let f ≥ 0 be a continuous function whose support [a, b] is a subset of (0, 1). Let F be the solution to

$$F''(x) = 2F(x)f(x), \qquad F(1) = 1, \qquad F'(1) = 0;$$

see Exercise 26.1. Define g(x) = f(1 − x) and G(x) = F(1 − x), so that G'' = 2Gg, G'(0) = 0, and G(0) = 1. We will show
$$E\exp\Bigl(-\int_0^1 f(x)L_T^{1-x}\,dx\Bigr) = E\exp\Bigl(-\int_0^1 f(t)Z_t\,dt\Bigr), \tag{26.2}$$
and then apply Lemma 26.1. The left-hand side of (26.2) is equal to

$$E\exp\Bigl(-\int_0^1 f(1-x)L_T^x\,dx\Bigr) = E\exp\Bigl(-\int_0^1 g(x)L_T^x\,dx\Bigr) = E\exp\Bigl(-\int_0^T g(W_s)\,ds\Bigr).$$

The last equality follows from the occupation time formula (Theorem 14.4) together with the fact that g vanishes outside (0, 1). Let

$$M_t = G(W_t)\,e^{-\int_0^t g(W_s)\,ds}.$$
By Itô's formula and the product formula,

$$dM_t = -G(W_t)g(W_t)e^{-\int_0^t g(W_s)\,ds}\,dt + G'(W_t)e^{-\int_0^t g(W_s)\,ds}\,dW_t + \tfrac12 G''(W_t)e^{-\int_0^t g(W_s)\,ds}\,dt = G'(W_t)e^{-\int_0^t g(W_s)\,ds}\,dW_t,$$

since ½G'' − Gg = 0. Therefore $M_t$ is a martingale. Since G is bounded on (−∞, 1], $M_{t\wedge T}$ is bounded, and we have

$$1 = G(0) = E M_0 = E M_T = E\Bigl[G(1)\,e^{-\int_0^T g(W_s)\,ds}\Bigr],$$

so

$$E\exp\Bigl(-\int_0^T g(W_s)\,ds\Bigr) = \frac{1}{G(1)}. \tag{26.3}$$

Now look at the right-hand side of (26.2). Let

$$N_t = \frac{1}{F(t)}\exp\Bigl(Z_t\frac{F'(t)}{2F(t)} - \int_0^t f(s)Z_s\,ds\Bigr).$$
Let

$$Y_t = Z_t\frac{F'(t)}{2F(t)},$$

so using (26.1),

$$dY_t = Z_t\,\frac{2F(t)F''(t) - 2F'(t)^2}{4F(t)^2}\,dt + \frac{F'(t)}{2F(t)}\bigl(2\sqrt{Z_t}\,dB_t + 2\,dt\bigr).$$
If

$$X_t = Y_t - \int_0^t f(s)Z_s\,ds,$$

then the martingale part of X is

$$\int_0^t \frac{F'(s)}{F(s)}\sqrt{Z_s}\,dB_s,$$

and hence

$$d\langle X\rangle_t = \Bigl(\frac{F'(t)}{F(t)}\Bigr)^2 Z_t\,dt.$$
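Note that $N_t = e^{X_t}/F(t)$. Using F'' = 2Ff, we have $Z_t\frac{2FF'' - 2F'^2}{4F^2} = f(t)Z_t - \frac{F'(t)^2}{2F(t)^2}Z_t$. By Itô's formula and the product formula,

$$\begin{aligned}
dN_t &= -\frac{F'(t)}{F(t)^2}e^{X_t}\,dt + \frac{e^{X_t}}{F(t)}\Bigl[f(t)Z_t\,dt - \frac{F'(t)^2}{2F(t)^2}Z_t\,dt + \frac{F'(t)}{F(t)}\sqrt{Z_t}\,dB_t + \frac{F'(t)}{F(t)}\,dt - f(t)Z_t\,dt\Bigr] \\
&\qquad + \frac12\,\frac{e^{X_t}}{F(t)}\Bigl(\frac{F'(t)}{F(t)}\Bigr)^2 Z_t\,dt \\
&= \frac{F'(t)}{F(t)^2}\,e^{X_t}\sqrt{Z_t}\,dB_t.
\end{aligned}$$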
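Observe that F is continuous and positive on [0, 1], hence bounded below on [0, 1] by a positive constant. Moreover, F'' = 2Ff ≥ 0 and F'(1) = 0, so F' ≤ 0 on [0, 1]. Since Z ≥ 0 and f ≥ 0, the exponent in the definition of $N_t$ is non-positive, and so $0 \le N_t \le 1/F(t)$ is bounded on [0, 1]. Thus $N_{t\wedge 1}$ is a bounded local martingale, hence a martingale. Then $E N_0 = 1/F(0) = 1/G(1)$, while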
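$$E N_1 = E\Bigl[\frac{1}{F(1)}\exp\Bigl(Z_1\frac{F'(1)}{2F(1)} - \int_0^1 f(s)Z_s\,ds\Bigr)\Bigr] = E\,e^{-\int_0^1 f(s)Z_s\,ds}.$$

Therefore

$$E\exp\Bigl(-\int_0^1 f(s)Z_s\,ds\Bigr) = E N_1 = E N_0 = \frac{1}{G(1)}.$$

Combining this with (26.3), the two sides of (26.2) are equal, and Lemma 26.1 completes the proof.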
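The identity at the heart of the right-hand side computation, $E\exp(-\int_0^1 f(s)Z_s\,ds) = E N_0 = 1/F(0)$, can be checked numerically. This is a minimal sketch that solves the ODE of Exercise 26.1 with a classical Runge–Kutta method and simulates (26.1) with an Euler scheme that uses |Z| under the square root. Step sizes and path counts are arbitrary. A constant f ≡ λ is not supported in (0, 1), but the argument for $N_t$ still goes through for it. In that case the ODE gives $F(x) = \cosh(\sqrt{2\lambda}\,(1-x))$, which is positive with F' ≤ 0 on [0, 1], so $1/F(0) = 1/\cosh(\sqrt{2\lambda})$ is an exact target.

```python
import math
import random


def solve_F0(f, n=2000):
    """Integrate F'' = 2 F f backwards from F(1) = 1, F'(1) = 0 to x = 0 with
    classical RK4 on the first-order system (F, F'). Returns F(0)."""
    h = -1.0 / n
    x, F, dF = 1.0, 1.0, 0.0

    def rhs(x, F, dF):
        return dF, 2.0 * F * f(x)

    for _ in range(n):
        k1 = rhs(x, F, dF)
        k2 = rhs(x + h / 2, F + h / 2 * k1[0], dF + h / 2 * k1[1])
        k3 = rhs(x + h / 2, F + h / 2 * k2[0], dF + h / 2 * k2[1])
        k4 = rhs(x + h, F + h * k3[0], dF + h * k3[1])
        F += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dF += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return F


def mc_laplace(f, n_steps=200, n_paths=2000, seed=7):
    """Monte Carlo estimate of E exp(-int_0^1 f(t) Z_t dt) for (26.1), Z_0 = 0,
    using an Euler scheme with sqrt(|Z|) and a left Riemann sum for the integral."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    total = 0.0
    for _ in range(n_paths):
        z, integral = 0.0, 0.0
        for k in range(n_steps):
            integral += f(k * dt) * z * dt
            z += 2.0 * math.sqrt(abs(z)) * rng.gauss(0.0, math.sqrt(dt)) + 2.0 * dt
        total += math.exp(-integral)
    return total / n_paths


lam = 1.0


def const(t):
    return lam


exact = 1.0 / math.cosh(math.sqrt(2.0 * lam))
ode = 1.0 / solve_F0(const)
mc_const = mc_laplace(const)
print(ode, exact, mc_const)
```

For an f with support in (0, 1), such as a triangular bump on [0.25, 0.75] (a hypothetical choice), `1.0 / solve_F0(bump)` and `mc_laplace(bump)` can be compared in the same way. No closed form is available in that case.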
You may wonder how the function F was arrived at. Exercises 26.2 and 26.3 may shed some light on this.
Exercises

26.1 Suppose f is a non-negative continuous function whose support [a, b] is a subset of (0, 1). Show that there is a unique solution to the ordinary differential equation

$$F''(x) = 2F(x)f(x), \qquad F(1) = 1, \qquad F'(1) = 0,$$

that F is everywhere positive, and that F is bounded on [0, ∞). Hint: Since f is zero on (b, ∞), F'' is zero there, and hence F(x) = Ax + B for some A and B when x ≥ b. Since F'(1) = 0, conclude that A = 0.
26.2 Suppose $X_t$ is a solution to the one-dimensional SDE $dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$. Suppose σ and b are bounded and continuous and f is a bounded and continuous function. What ordinary differential equation must H(x) satisfy (in terms of σ, b, and f) in order that

$$M_t = H(X_t)\,e^{\int_0^t f(X_s)\,ds}$$

be a martingale?

26.3 Suppose $X_t$ is a solution to the one-dimensional SDE $dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$. Suppose σ and b are bounded and continuous and f is a bounded and continuous function. What partial differential equation must K(x, t) satisfy (in terms of σ, b, and f) in order that

$$N_t = K(X_t, t)\,e^{\int_0^t f(s)X_s\,ds}$$

be a martingale?

26.4 Let W be a Brownian motion and $L_t^y$ the local time at level y. Prove that the local times at a fixed time t do not form a Markov process. That is, fix t > 0 and show that $(L_t^y, y \ge 0)$ is not a Markov process in the variable y.

26.5 Let S be the first time two-dimensional Brownian motion exits the unit ball, and let $\psi(\lambda) = P^0(S > \lambda)$. If W is a one-dimensional Brownian motion with local times $L_t^x$ and $T = \inf\{t > 0 : W_t = 1\}$, find the distribution of $Y = \sup_{0\le x\le 1} L_T^x$ in terms of ψ; that is, write P(Y ≤ λ) in terms of the function ψ.

26.6 Suppose x ∈ (0, 1). With W and T as in Exercise 26.5, find the distribution of $L_T^x$.

26.7 Let W be a one-dimensional Brownian motion with local times $L_t^x$. Let $T_r = \inf\{t > 0 : L_t^0 = r\}$. The law of the process $x \mapsto L_{T_r}^x$ can be described as follows.
(1) The law of $\{L_{T_r}^x, x \ge 0\}$ is the same as the law of $\{X_x, x \ge 0\}$ started at r, where X is the square of a Bessel process of order 0.
(2) The law of $\{L_{T_r}^{-x}, x \ge 0\}$ is also the same as the law of $\{X_x, x \ge 0\}$ started at r, where X is the square of a Bessel process of order 0.
(3) The processes $\{L_{T_r}^x, x \ge 0\}$ and $\{L_{T_r}^{-x}, x \ge 0\}$ are independent of each other.
This is proved in Revuz and Yor (1999), Section XI.2. For a challenge, try to prove (1) yourself using the techniques of this chapter. Using this description of $L_{T_r}^x$, find the distribution of $L_{T_r}^* = \sup_x L_{T_r}^x$.
Notes There are several other proofs of the Ray–Knight theorems. One by Walsh (Rogers and Williams, 2000b; Walsh, 1978) uses excursion theory. In the next chapter we will indicate some ideas used in that proof.
27 Brownian excursions
The paths of a Brownian motion $W_t$ are continuous, so the zero set $Z(\omega) = \{t : W_t(\omega) = 0\}$ is a closed set. The complement of Z(ω) is an open subset of the reals, and hence is a countable union of disjoint open intervals. If (a, b) is one of those intervals (depending on ω, of course), then $\{W_t(\omega) : a \le t \le b\}$ is a continuous function of t that is zero at t = a and t = b but is never 0 for t ∈ (a, b). We call this piece of the path of $W_t(\omega)$ an excursion.

To be more formal, let $\mathcal E$ be the collection of continuous functions f with domain [0, ∞) for which there exists a positive real $\sigma_f$ such that:
- f(0) = 0 and $f(\sigma_f) = 0$;
- f(t) ≠ 0 if $t \in (0, \sigma_f)$;
- f(t) = 0 if $t > \sigma_f$.

We make $\mathcal E$ into a metric space by furnishing it with the supremum norm. Given a Borel subset A of $\mathcal E$, we say that the Brownian motion W has had an excursion in A by time t if there exist a time u and a function f ∈ A such that $u + \sigma_f \le t$ and $W_{u+s}(\omega) = f(s)$ for all $s \le \sigma_f$. Let $K_t(A)$ be the number of excursions of W in A by time t. Let $L_t^0$ be Brownian local time at 0, and let

$$T_r = \inf\{t > 0 : L_t^0 \ge r\} \tag{27.1}$$
be the inverse of Brownian local time at zero. Set $N_r(A) = K_{T_r}(A)$. Although $N_t(A)$ might be identically infinite for some sets A, it is finite for others. For example, let δ > 0 and suppose that every function in A has a supremum greater than δ. The continuity of the paths of W then implies that $N_t(A)$ is finite for every t. The main result of this chapter is the following.

Theorem 27.1 $N_t(\cdot)$ is a Poisson point process.

Proof If $N_t(B)$ is not infinite, then it has right-continuous paths that increase by at most 1 at any given time. The main step will be to show that $N_t(B)$ has stationary increments and that $N_t(B) - N_s(B)$ is independent of the σ-field generated by the random variables $\{N_r(A) : r \le s,\ A \text{ a Borel subset of } \mathcal E\}$.
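Suppose $r_1 \le \cdots \le r_n \le s < t$, k ≥ 0, and $j_1, \ldots, j_n \ge 0$, and let B and $A_1, \ldots, A_n$ be Borel subsets of $\mathcal E$. Then

$$\begin{aligned}
&P\bigl(N_t(B) - N_s(B) = k;\ N_{r_1}(A_1) = j_1, \ldots, N_{r_n}(A_n) = j_n\bigr) \\
&\quad= P\bigl(K_{T_t}(B) - K_{T_s}(B) = k;\ K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n\bigr) \\
&\quad= E\Bigl[P^{W_{T_s}}\bigl(K_{T_{t-s}}(B) - K_{T_0}(B) = k\bigr);\ K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n\Bigr],
\end{aligned} \tag{27.2}$$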
where we used the strong Markov property at time $T_s$. The time $T_s$ is the first time that local time of Brownian motion at 0 reaches s, and $L_t^0$ increases only when W is at 0. Hence W is at 0 at time $T_s$, that is, $W_{T_s} = 0$. Therefore the last expression in (27.2) equals

$$P^0\bigl(K_{T_{t-s}}(B) - K_0(B) = k\bigr)\,P\bigl(K_{T_{r_1}}(A_1) = j_1, \ldots, K_{T_{r_n}}(A_n) = j_n\bigr),$$

which can be rewritten as

$$P^0\bigl(N_{t-s}(B) - N_0(B) = k\bigr)\,P\bigl(N_{r_1}(A_1) = j_1, \ldots, N_{r_n}(A_n) = j_n\bigr).$$

This shows that $N_t(B) - N_s(B)$ has the same law as $N_{t-s}(B) - N_0(B)$ and is independent of $\sigma(N_r(A) : r \le s,\ A \subset \mathcal E)$, which is what we wanted. Observe that $N_t(B)$ is constant except for jumps of size one. By Proposition 5.4, $N_t(B)$ is a Poisson process. Clearly $N_t(B)$ is a measure in B, which completes the proof.

Let $m(A) = E^0 N_1(A)$. The measure m is called the excursion measure. We can say a few things about m.

Proposition 27.2 If

$$A = \{f \in \mathcal E : \sup_t |f(t)| > a\},$$
then m(A) = 1/a.

Proof Let $U = \inf\{t : |W_t| = a\}$ and $V = \inf\{t > U : W_t = 0\}$. Since $|W_t| - L_t^0$ is a martingale by Theorem 14.1, $E^0|W_{t\wedge U}| = E^0 L_{t\wedge U}^0$. Let t → ∞, using dominated convergence on the left and monotone convergence on the right. This gives

$$E^0 L_U^0 = E^0|W_U| = a.$$

Set $R = \inf\{r : N_r(A) = 1\}$. Since $N_r(A)$ is a Poisson process, R is an exponential random variable with parameter $E N_1(A) = m(A)$. It therefore suffices to show $E^0 R = a$; see (A.9). We have $R = \inf\{r : K_{T_r}(A) = 1\}$. Because K can increase only at times when $W_t = 0$, $R = \inf\{L_t^0 : K_t(A) = 1\}$. Now $K_t(A)$ first equals one when t = V. Local time at 0 does not increase while W is away from 0, so $L_V^0 = L_U^0$. Therefore

$$E^0 R = E^0 L_V^0 = E^0 L_U^0 = a.$$

We conclude that m(A) = 1/a.
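A random-walk analogue shows the two ingredients of this proof, $E^0 L_U^0 = a$ and the exponential law of R. This is a heuristic sketch, not the argument above. For a simple random walk S with steps ±h, $|S_n| - h\cdot\#\{j < n : S_j = 0\}$ is a martingale, the discrete counterpart of $|W_t| - L_t^0$. So h times the number of visits to 0 plays the role of local time at 0. Start the walk at 0 and take a = kh. After each visit to 0, the walk escapes to ±a before returning with probability 1/k. The number of visits is therefore geometric with mean k. Hence h times the count has mean a and is approximately exponential, with $P(\text{count} > k) = (1 - 1/k)^k \approx e^{-1}$. The values a = 1 and k = 20 are arbitrary.

```python
import random


def visits_before_exit(k, rng):
    """Number of visits to 0 (counting time 0) of a simple random walk
    started at 0 before it first reaches +k or -k."""
    pos, visits = 0, 0
    while abs(pos) < k:
        if pos == 0:
            visits += 1
        pos += 1 if rng.random() < 0.5 else -1
    return visits


def local_time_at_exit(a=1.0, k=20, n_trials=4000, seed=8):
    """Samples of h * (visits to 0) with h = a / k: a random-walk stand-in for
    L_U^0, the local time at 0 when |W| first reaches a."""
    rng = random.Random(seed)
    h = a / k
    return [h * visits_before_exit(k, rng) for _ in range(n_trials)]


samples = local_time_at_exit()
mean = sum(samples) / len(samples)
tail = sum(s > 1.0 for s in samples) / len(samples)
print(mean, tail)   # mean near a = 1; tail near (1 - 1/20)**20, about 0.358
```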
By symmetry, if $B = \{f \in \mathcal E : \sup_t f(t) > a\}$, then m(B) = 1/(2a). One can say more about m. Consider those excursions whose maximum is some fixed value b. Starting at any point other than 0, such an excursion can be viewed as a Brownian motion killed at 0 and conditioned to have maximum b. The path can be decomposed into two parts:
- the part before the maximum, which is a Brownian motion conditioned to hit b before 0, and which can be shown to have the same law as a three-dimensional Bessel process up until it hits the level b (see the example in Section 22.2);
- the part after the maximum, which is a Brownian motion conditioned to hit 0 before b, and which has the same law as $b - X_t$, where $X_t$ is also a three-dimensional Bessel process up until it hits the level b.

Moreover, the part of the path before the maximum can be taken to be independent of the part after the maximum. See Rogers and Williams (2000b) for details.

We briefly revisit the Ray–Knight theorems and indicate how Brownian excursions can be used to obtain information about local times at different levels. Fix r and let $T_r = \inf\{t > 0 : L_t^0 \ge r\}$. Suppose x > 0 and $y_1, \ldots, y_n < 0$. The local time at x is a function of the excursions from 0 that hit x, and the local times at $y_1, \ldots, y_n$ are functions of the excursions that go below zero. The excursions that take positive values are independent of those that take negative values, so $L_{T_r}^x$ should be independent of $(L_{T_r}^{y_1}, \ldots, L_{T_r}^{y_n})$. To find the distribution of $L_{T_r}^x$, note that a Poisson number of excursions reach the level x. Each excursion that reaches x contributes an amount to the local time at x that is an exponential random variable; see Exercise 27.1.
After proving some additional independence, namely, that the amount each excursion contributes to local time at x is independent of the amount any other excursion contributes and that the amount contributed by an excursion is independent of the number of excursions reaching x, we see that LxTr should have the same distribution as a Poisson number of independent exponential random variables.
Exercises

27.1 Let W be a Brownian motion, x > 0, and $T = \inf\{t > 0 : W_t = x\}$. If $L_t^x$ is the local time at x, show that $L_T^x$ is an exponential random variable. Determine the parameter of this exponential random variable.

27.2 Let W be a one-dimensional Brownian motion. This exercise asks you to prove that the normalized number of downcrossings by time t converges to local time at 0. If a > 0, let $S_0 = 0$, $T_0 = \inf\{t : W_t = a\}$, and for i ≥ 1,

$$S_i = \inf\{t > T_{i-1} : W_t = 0\}, \qquad T_i = \inf\{t > S_i : W_t = a\}.$$

The number of downcrossings up to time t, $D_t(a)$, is defined to be $\sup\{k : S_k \le t\}$. Prove that there exists a constant c such that

$$\lim_{a\to 0} aD_t(a) = cL_t^0, \qquad \text{a.s.},$$

where $L_t^0$ is local time at 0 of W. Determine c. Hint: Use Exercise 18.5.

27.3 Let $(X_t, P^x)$ be a Brownian motion. (1) Use the reflection principle to find $P^0(X_s > -a \text{ for all } s \le r)$.
This is the same as $P^a(T_0 > r)$, where $T_0$ is the first time the Brownian motion hits 0. (2) Let

$$A(a, r) = \{f \in \mathcal E : \sup f(t) > a,\ \sigma_f > r\}, \quad B(r) = \{f \in \mathcal E : \sigma_f > r,\ \sup f(t) > 0\}, \quad C(a) = \{f \in \mathcal E : \sup f(t) > a\}.$$

Prove that

$$m(B(r)) = \lim_{a\to 0} m(A(a, r)) = \lim_{a\to 0}\bigl[m(C(a)) \times P^a(T_0 > r)\bigr],$$

and use this and (1) to compute m(B(r)). By symmetry, $m(\{f \in \mathcal E : \sigma_f > r\})$ is twice m(B(r)).

27.4 Let W be a Brownian motion. Let $E_t(r)$ be the number of excursions of length larger than r that have been completed by time t; an excursion has length larger than r if $\sigma_f > r$. Show that there exists a constant c such that

$$\lim_{r\to 0}\sqrt{r}\,E_t(r) = cL_t^0, \qquad \text{a.s.}$$
Determine c. One interesting consequence is that $L_t^0$ is determined entirely by the zero set $Z(\omega) = \{t : W_t(\omega) = 0\}$.

27.5 Let δ > 0 and $A_\delta = \{f \in \mathcal E : \sup_t |f(t)| > \delta\}$. Let $S_1 = \inf\{t : K_t(A_\delta) = 1\}$ and $S_2 = \inf\{t > S_1 : K_t(A_\delta) = 2\}$. Thus $S_1$ and $S_2$ are the times at which the first and second excursions in $A_\delta$ have been completed. Let $Y_1(t)$ be the excursion completed at time $S_1$ and define $Y_2(t)$ similarly. More precisely, let $R_1 = \sup\{t < S_1 : W_t = 0\}$. Then $Y_1(s) = W_{R_1+s}$ if $s \le S_1 - R_1$, and $Y_1(s) = 0$ for all $s \ge S_1 - R_1$. Prove that $Y_1$ and $Y_2$ are independent. Hint: Use the strong Markov property at time $S_1$.
Notes Besides its use in the Ray–Knight theorems (Rogers and Williams, 2000b), excursion theory is useful in many other contexts. See Rogers and Williams (2000b) for applications to Skorokhod embedding and to the arc sine law.
28 Financial mathematics
A European call option is the option to buy a share of stock at a given price at some particular time in the future. For example, I might buy a call option to purchase one share of Company X for $40 three months from today. When the three months is up, I check the price of Company X. If, say, it is $35, then my option is worthless, because why would I buy a share for $40 using the option when I could buy it on the open market for $35? But if three months from now, the share price is, say, $45, then I can exercise my option, which means I buy a share for $40, and I can then turn around immediately and sell that share for $45 and make a profit of $5. Thus, today, there is a potential for a profit if I have a call option, and so I should pay something to purchase that option. A significant part of financial mathematics is devoted to the question of what is the fair price I should pay for a call option. Options originated in the commodities market, where farmers wanted to hedge their risks. Since then many types of options have been developed (options are also known as derivatives), and the amount of money invested in options has for the past several years exceeded the amount of money invested in stocks. In 1973 Black and Scholes, using the reasonable principle that you can’t get something for nothing, came up with a convincing formula for the price of an option. This chapter gives two derivations of the Black–Scholes formula, proves the fundamental theorem of finance, and finishes by considering a stochastic control problem. The Black–Scholes formula is a beautiful example of applied stochastic processes.
28.1 Finance models

Let \(W_t\) be a Brownian motion. We assume that \(S_t\) is the price of a stock or other risky security. If we have $2,000 and we buy 100 shares in a stock that sells for $20 per share and it goes up $2, or if we buy 10 shares in a stock selling for $200 per share that goes up $20, we are equally happy; it is the percentage increase that matters. With this in mind, we assume that \(S_t\) satisfies
\[ dS_t = \mu S_t\,dt + \sigma S_t\,dW_t. \tag{28.1} \]
This is plausible, since then \(dS_t/S_t = \mu\,dt + \sigma\,dW_t\), that is, we are assuming the relative change in price is a multiple of Brownian motion with drift. The quantity \(\mu\) is known as the mean rate of return and \(\sigma\) is called the volatility. By Proposition 24.6, the solution to this SDE is
\[ S_t = S_0\, e^{\sigma W_t + (\mu - \sigma^2/2)t}. \tag{28.2} \]
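As a quick numerical check of (28.2), the sketch below drives an Euler–Maruyama discretization of (28.1) and the closed form (28.2) with the same Brownian increments; for a small step size the two terminal values should nearly agree. It is written in Python with only the standard library, and the function name `gbm_terminal` and all parameter values are our own illustrative choices, not anything from the text.

```python
# Illustrative sketch: gbm_terminal and the parameter values are our own
# choices, not taken from the text.
import math
import random


def gbm_terminal(s0, mu, sigma, t_end, n_steps, seed=0):
    """Return (euler, exact) approximations of S_{t_end} for
    dS = mu*S dt + sigma*S dW, both driven by the same Brownian path."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    s_euler = s0
    w = 0.0  # running value of the Brownian motion W_t
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Euler-Maruyama step for (28.1)
        s_euler += mu * s_euler * dt + sigma * s_euler * dw
        w += dw
    # Closed-form solution (28.2) evaluated at the same value of W_{t_end}
    s_exact = s0 * math.exp(sigma * w + (mu - 0.5 * sigma ** 2) * t_end)
    return s_euler, s_exact


euler, exact = gbm_terminal(100.0, 0.05, 0.2, 1.0, 20_000, seed=3)
print(euler, exact)
```

Shrinking the step size shrinks the gap between the two numbers, since Euler–Maruyama converges pathwise; the closed form is always positive, whatever the Brownian path.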
We also assume the existence of a bond with price \(B_t\), which is assumed to be riskless; the equation for \(B_t\) is \(dB_t = rB_t\,dt\), which implies \(B_t = B_0 e^{rt}\).

Suppose at time \(t\) one buys \(A\) shares of stock. The cost is \(AS_t\). If one sells the shares at time \(t + h\), one receives \(AS_{t+h}\), and the net gain is \(A(S_{t+h} - S_t)\). One can also sell short, i.e., let \(A\) be negative. The formula for the gain is the same. Suppose at time \(t_i\) one holds \(A_i\) shares, up until time \(t_{i+1}\). The total net gain over the whole period \(t_0\) to \(t_n\) is \(\sum_{i=0}^{n-1} A_i(S_{t_{i+1}} - S_{t_i})\). This is the same as the stochastic integral \(\int_{t_0}^{t_n} a_s\,dS_s\) if \(a_s = A_i\) when \(t_i \le s < t_{i+1}\). One should allow \(A_i\) to depend on the entire past \(\mathcal{F}_{t_i}\). Idealizing, one allows continuous trading, and if \(a_s\) is the number of shares held at time \(s\), the net gain through trading the stock is \(\int_0^t a_s\,dS_s\). One has a similar net gain of \(\int_0^t b_s\,dB_s\) when trading bonds if \(b_s\) is the number of bonds held at time \(s\). Although \(a_t\) can depend on the entire past \(\mathcal{F}_t\), one does not want to let \(a_t\) depend on the future. This helps explain why the class of predictable integrands is the appropriate one to use.

The pair \((a, b)\) is called a trading strategy. Set
\[ V_t = a_t S_t + b_t B_t, \tag{28.3} \]
the amount of wealth one has at time \(t\). The strategy is self-financing if
\[ V_t = V_0 + \int_0^t a_s\,dS_s + \int_0^t b_s\,dB_s \tag{28.4} \]
for all \(t\). The first integral represents the net gain from trading in the stock, the second integral the net gain from trading in the bond, and (28.4) says that one's wealth at time \(t\) is equal to what one starts with plus what one has realized through trading in the stock and bond. We assume throughout that there are no transaction costs (i.e., no brokerage fees).

A European call gives the buyer the option of buying a share of the stock at a fixed time \(t_E\) at price \(K\). The time \(t_E\) is called the exercise time. After time \(t_E\), the option has expired and is worthless. What is the option worth? At time \(t_E\), if \(S_{t_E} \le K\), the option is worth nothing, for who would pay \(K\) dollars for a share of stock when it sells for \(S_{t_E}\) dollars? If \(S_{t_E} > K\), one can use the option to buy a share of the stock at price \(K\) and immediately sell it at price \(S_{t_E}\), to make a profit of \(S_{t_E} - K\). Thus the value of the option at time \(t_E\) is \((S_{t_E} - K)^+\). An important question is: how much should the option sell for? What is a fair price for the option at time 0?

There is a myriad of types of options. The American call is almost the same as the European call, except that one is allowed to buy a share of the stock at price \(K\) at any time in the interval \([0, t_E]\). The European put gives the buyer the option to sell a share of the stock at price \(K\) at time \(t_E\), while the American put gives the buyer the option to sell a share at price \(K\) at any time up to time \(t_E\).
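In discrete time the self-financing condition (28.4) says exactly that every rebalancing is paid for out of the current portfolio, with no money added or withdrawn. The sketch below illustrates this: it simulates a stock on a time grid using (28.2) and a bond \(B_t = e^{rt}\), changes the stock holding at random (but never using future prices), chooses the bond holding so that each rebalance costs nothing, and compares final wealth with \(V_0\) plus the accumulated gains \(\sum a_i\,\Delta S + \sum b_i\,\Delta B\). The rebalancing rule, the function name `self_financing_demo`, and the parameters are hypothetical, chosen only for illustration.

```python
# Illustrative sketch: self_financing_demo and its random rebalancing rule are
# hypothetical; they only illustrate the identity (28.4) on a time grid.
import math
import random


def self_financing_demo(n_steps=50, s0=100.0, mu=0.1, sigma=0.2, r=0.05,
                        t_end=1.0, seed=0):
    """Return (final wealth, initial wealth + trading gains) for a
    randomly rebalanced but self-financing stock/bond strategy."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    stock, bond = [s0], [1.0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        stock.append(stock[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt
                                          + sigma * math.sqrt(dt) * z))
        bond.append(bond[-1] * math.exp(r * dt))

    a, b = 2.0, 50.0                      # initial shares of stock and bonds
    v0 = a * stock[0] + b * bond[0]       # V_0 as in (28.3)
    gains = 0.0
    for k in range(n_steps):
        # holdings (a, b) are kept on [t_k, t_{k+1})
        gains += a * (stock[k + 1] - stock[k]) + b * (bond[k + 1] - bond[k])
        wealth = a * stock[k + 1] + b * bond[k + 1]
        # new stock position, chosen without looking at future prices
        a = rng.uniform(-1.0, 3.0)
        # bond position fixed so that the rebalance costs nothing
        b = (wealth - a * stock[k + 1]) / bond[k + 1]
    v_final = a * stock[-1] + b * bond[-1]
    return v_final, v0 + gains


print(self_financing_demo())
```

The two printed numbers agree up to rounding error. If the bond line were dropped (so money leaked in or out at each rebalance), they would generally differ.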
28.2 Black–Scholes formula

In 1973 Black and Scholes came up with their formula for the price of a European call. We will give two derivations of this formula.

Derivation 1. First of all, the interest rate \(r\) on the bond may be considered to be the same as the rate of inflation. Thus the value of the option \((S_{t_E} - K)^+\) in today's dollars is
\[ C = e^{-rt_E}(S_{t_E} - K)^+. \tag{28.5} \]
In this first derivation we work in today's dollars. Therefore the present-day value of the stock is \(P_t = e^{-rt}S_t\). Note \(P_0 = S_0\), and the present-day value of our option at time \(t_E\) is then
\[ C = e^{-rt_E}(S_{t_E} - K)^+ = (P_{t_E} - e^{-rt_E}K)^+. \tag{28.6} \]
By the product formula,
\[ dP_t = e^{-rt}\,dS_t - re^{-rt}S_t\,dt = e^{-rt}\sigma S_t\,dW_t + e^{-rt}\mu S_t\,dt - re^{-rt}S_t\,dt = \sigma P_t\,dW_t + (\mu - r)P_t\,dt. \]
The solution to this stochastic differential equation (see Proposition 24.6) is
\[ P_t = P_0\, e^{\sigma W_t + (\mu - r - \sigma^2/2)t}. \]
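Also, the net gain or loss in present-day dollars when holding \(a_s\) shares of stock at time \(s\) is \(\int_0^t a_s\,dP_s\).

Define \(Q\) on \(\mathcal{F}_{t_E}\) by
\[ \frac{dQ}{dP} = M_{t_E} = \exp\Big(-\frac{\mu - r}{\sigma}W_{t_E} - \frac{(\mu - r)^2}{2\sigma^2}t_E\Big). \]
Under \(Q\), \(\widetilde W_t = W_t + \frac{\mu - r}{\sigma}t\) is a Brownian motion by the Girsanov theorem. Now
\[ dP_t = \sigma P_t\,dW_t + (\mu - r)P_t\,dt = \sigma P_t\Big(dW_t + \frac{\mu - r}{\sigma}\,dt\Big) = \sigma P_t\,d\widetilde W_t. \]
Therefore under \(Q\), \(P_t\) is a stochastic integral with respect to the Brownian motion \(\widetilde W\), hence a local martingale; the explicit solution (28.7) shows that it is in fact a martingale. The solution to the SDE \(dP_t = \sigma P_t\,d\widetilde W_t\) is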
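\[ P_t = P_0\, e^{\sigma\widetilde W_t - (\sigma^2/2)t}, \tag{28.7} \]
so \(P_t\) and \(\widetilde W_t\) have the same filtration. The random variable \(C\) is \(\mathcal{F}_{t_E}\) measurable. By the martingale representation theorem (Theorem 12.3), there exists an adapted process \(A_s\) such that
\[ C = E_Q C + \int_0^{t_E} A_s\,d\widetilde W_s = E_Q C + \int_0^{t_E} D_s\,dP_s, \]
where \(D_s = A_s/(\sigma P_s)\).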
Therefore, if one follows the trading strategy of buying and selling the stock \(S_t\), where one holds \(D_s\) shares of stock at time \(s\), one can obtain \(C - E_Q C\) dollars at time \(t_E\). Or, starting with \(E_Q C\) dollars and buying and selling stock, one can get the identical output as \(C\), almost surely.

A standard assumption in finance is that of no arbitrage, which means you cannot make a profit without taking some risk. To avoid riskless profits, the call must sell for \(E_Q C\). To explain this in more detail, suppose you could sell the European call for \(\widetilde C\) dollars. If \(\widetilde C > E_Q C\), you could sell a call for \(\widetilde C\) dollars, invest \(E_Q C\) of that money in the trading strategy of holding \(D_s\) shares of stock at time \(s\), and at time \(t_E\) have \(C + \widetilde C - E_Q C\) worth of stocks and options. The buyer of the option decides whether to exercise the option, and it costs you \(C\) dollars to meet that obligation. With probability one, you have gained \(\widetilde C - E_Q C\) dollars, a riskless profit. If \(\widetilde C < E_Q C\), simply reverse the roles of buying and selling. The only way to avoid making a riskless profit is if \(\widetilde C = E_Q C\).

To find \(E_Q C\), using (28.6) and (28.7) we write
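\[ E_Q C = E_Q\big[(S_0 e^{\sigma\widetilde W_{t_E} - \sigma^2 t_E/2} - e^{-rt_E}K)^+\big] = \frac{1}{\sqrt{2\pi t_E}}\int_{-\infty}^{\infty}(S_0 e^{\sigma y - \sigma^2 t_E/2} - e^{-rt_E}K)^+\, e^{-y^2/(2t_E)}\,dy, \tag{28.8} \]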
which is the Black–Scholes formula. One can, if one wishes, perform some calculations to find alternate expressions for the right-hand side.

It is noteworthy that \(\mu\) does not appear in (28.8)! You and I might have different opinions as to what \(\mu\), the mean rate of return, is equal to, but we should agree on the price of the call. This was a shock to economists when it was first discovered. The value of \(\sigma\), the volatility, does enter into the formula.

Until we evaluated \(E_Q C\) in (28.8), the actual form of \(C\) was unimportant. For any type of option expiring at time \(t_E\), Derivation 1 tells us that its price at time zero should be the expectation under \(Q\) of its payoff in present-day dollars.

Derivation 2. In this approach, which is the one used by Black and Scholes, we use the actual values of the securities, not the present-day values. Let \(V_t\) be the value of the option at time \(t\) and assume
\[ V_t = f(S_t, t_E - t) \tag{28.9} \]
for all \(t\), where \(f\) is some sufficiently smooth function. We also want \(V_{t_E} = (S_{t_E} - K)^+\). Recall the multivariate version of Itô's formula (Theorem 11.2). We apply this with \(d = 2\) and \(X_t = (S_t, t_E - t)\). From (28.1), \(d\langle S\rangle_t = \sigma^2 S_t^2\,dt\); \(\langle t_E - t\rangle_t = 0\) since \(t_E - t\) is of bounded variation and hence has no martingale part; and \(\langle S, t_E - t\rangle_t = 0\). Also, \(d(t_E - t) = -dt\). Then
\[ V_t - V_0 = f(S_t, t_E - t) - f(S_0, t_E) = \int_0^t f_x(S_u, t_E - u)\,dS_u - \int_0^t f_t(S_u, t_E - u)\,du + \frac12\int_0^t \sigma^2 S_u^2 f_{xx}(S_u, t_E - u)\,du. \tag{28.10} \]
Here \(f_x\) is the partial derivative with respect to \(x\), the first variable, \(f_{xx}\) is the second partial derivative with respect to \(x\), and \(f_t\) is the partial derivative with respect to \(t\), the second variable. On the other hand,
\[ V_t - V_0 = \int_0^t a_u\,dS_u + \int_0^t b_u\,dB_u. \tag{28.11} \]
By (28.3) and (28.9),
\[ b_t = \frac{V_t - a_t S_t}{B_t} = \frac{f(S_t, t_E - t) - a_t S_t}{B_t}. \tag{28.12} \]
Also, recall \(B_t = B_0 e^{rt}\). Comparing (28.10) with (28.11), we must therefore have
\[ a_t = f_x(S_t, t_E - t) \tag{28.13} \]
and
\[ -f_t(S_t, t_E - t) + \tfrac12\sigma^2 S_t^2 f_{xx}(S_t, t_E - t) = b_t B_0 r e^{rt}. \tag{28.14} \]
Substituting for \(b_t\) using (28.12),
\[ r\big[f(S_t, t_E - t) - S_t f_x(S_t, t_E - t)\big] = -f_t(S_t, t_E - t) + \tfrac12\sigma^2 S_t^2 f_{xx}(S_t, t_E - t) \tag{28.15} \]
for almost all \(t\) and all \(S_t\). Since \(S_t\) is a continuous process, (28.15) leads to the parabolic partial differential equation (PDE)
\[ f_t = \tfrac12\sigma^2 x^2 f_{xx} + rxf_x - rf, \qquad (x, s) \in (0, \infty) \times (0, t_E], \]
where \(s\) denotes the second variable of \(f\) (the time to expiry),
and \(f(x, 0) = (x - K)^+\). Solving this equation for \(f\), \(f(x, t_E)\) tells us what \(V_0\) should be, i.e., the cost of setting up the equivalent portfolio. This partial differential equation can be solved and the solution is the Black–Scholes formula. Equation (28.13) shows what the trading strategy should be.

Let us now briefly discuss American calls. Recall that these are ones where the holder can buy the security at price \(K\) at any time up to time \(t_E\). Since the holder of an American call can always wait until time \(t_E\), which is equivalent to having a European call, the value of an American call should always be at least as large as the value of the corresponding European call. Suppose one exercises an American call early, at some time \(t < t_E\). If \(S_{t_E} > K\), then at time \(t_E\) one has one share of stock, for which one paid \(K\), and one has a profit of \(S_{t_E} - K\). However, because one purchased the stock before time \(t_E\), one lost the interest \(K(e^{r(t_E - t)} - 1)\) that would have accrued by waiting to exercise the option. (We are supposing \(r \ge 0\).) Thus in this case it would have been better to wait until time \(t_E\) to exercise the option. On the other hand, if \(S_{t_E} < K\), exercising the option early would mean that one has lost \(K - S_{t_E}\), whereas with the European option one would not have exercised at all, and lost nothing (other than the price of the option). In either case, exercising early gains nothing, hence the price of an American call should be the same as that of a European call.
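The PDE can be checked numerically against the standard closed form of the Black–Scholes call price, \(f(x, s) = x\Phi(d_1) - Ke^{-rs}\Phi(d_2)\) with \(d_{1,2} = (\log(x/K) + (r \pm \sigma^2/2)s)/(\sigma\sqrt{s})\), where \(\Phi\) is the normal distribution function. This closed form is not derived in this section; we take it as given. The sketch below uses central finite differences to confirm that \(f_t - (\tfrac12\sigma^2x^2f_{xx} + rxf_x - rf)\) is close to 0, and that \(f(x, s) \to (x - K)^+\) as the time to expiry \(s \to 0\). The function names, step sizes, and parameter values are our own assumptions.

```python
# Illustrative sketch: norm_cdf, bs_call, pde_residual and the step sizes
# are our own choices; bs_call is the standard closed form, taken as given.
import math


def norm_cdf(x):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(x, s, strike, r, sigma):
    """Closed-form call value f(x, s); s > 0 is the time to expiry."""
    d1 = (math.log(x / strike) + (r + 0.5 * sigma ** 2) * s) / (sigma * math.sqrt(s))
    d2 = d1 - sigma * math.sqrt(s)
    return x * norm_cdf(d1) - strike * math.exp(-r * s) * norm_cdf(d2)


def pde_residual(x, s, strike, r, sigma, hx=1e-2, hs=1e-5):
    """f_s - (sigma^2 x^2 f_xx / 2 + r x f_x - r f), by central differences."""
    def f(xx, ss):
        return bs_call(xx, ss, strike, r, sigma)
    f_s = (f(x, s + hs) - f(x, s - hs)) / (2 * hs)
    f_x = (f(x + hx, s) - f(x - hx, s)) / (2 * hx)
    f_xx = (f(x + hx, s) - 2 * f(x, s) + f(x - hx, s)) / hx ** 2
    return f_s - (0.5 * sigma ** 2 * x ** 2 * f_xx + r * x * f_x - r * f(x, s))


print(pde_residual(100.0, 0.5, 100.0, 0.05, 0.2))   # should be close to 0
print(bs_call(120.0, 1e-8, 100.0, 0.05, 0.2))       # close to (120 - 100)^+
```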
One can equally well price the European put, the option to sell a share of stock at price \(K\) at time \(t_E\), by either Derivation 1 or Derivation 2 of the Black–Scholes formula. However, this analysis breaks down for American puts (the option to sell a share of stock at any time up to time \(t_E\)), because in this case one can gain by selling early: one can earn interest on the money received.
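Since (28.8) is a one-dimensional integral, it can also be evaluated directly. The sketch below computes (28.8) with the trapezoid rule and compares it with a Monte Carlo average of the present-day payoff under \(Q\), using the fact that \(\widetilde W_{t_E}\) is normal with mean 0 and variance \(t_E\) under \(Q\). Notice that neither function takes \(\mu\) as an input. The quadrature rule, the truncation of the integral at eight standard deviations, the sample size, and all names are our own assumptions.

```python
# Illustrative sketch: bs_integral, bs_monte_carlo, the trapezoid rule and
# the truncation width are our own choices for evaluating (28.8).
import math
import random


def bs_integral(s0, strike, r, sigma, t_e, n=4000, width=8.0):
    """Trapezoid-rule value of the integral in (28.8), truncated to
    |y| <= width * sqrt(t_e)."""
    half = width * math.sqrt(t_e)
    h = 2.0 * half / n
    disc_strike = math.exp(-r * t_e) * strike
    total = 0.0
    for i in range(n + 1):
        y = -half + i * h
        payoff = max(s0 * math.exp(sigma * y - sigma ** 2 * t_e / 2) - disc_strike, 0.0)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * payoff * math.exp(-y * y / (2.0 * t_e))
    return total * h / math.sqrt(2.0 * math.pi * t_e)


def bs_monte_carlo(s0, strike, r, sigma, t_e, n_samples=200_000, seed=0):
    """Average of (P_{t_E} - e^{-r t_E} K)^+ with W~_{t_E} ~ N(0, t_E) under Q."""
    rng = random.Random(seed)
    disc_strike = math.exp(-r * t_e) * strike
    total = 0.0
    for _ in range(n_samples):
        w = rng.gauss(0.0, math.sqrt(t_e))
        total += max(s0 * math.exp(sigma * w - sigma ** 2 * t_e / 2) - disc_strike, 0.0)
    return total / n_samples


print(bs_integral(100.0, 100.0, 0.05, 0.2, 1.0))
print(bs_monte_carlo(100.0, 100.0, 0.05, 0.2, 1.0))
```

For \(S_0 = K = 100\), \(r = 0.05\), \(\sigma = 0.2\), \(t_E = 1\), the integral comes out near 10.45, and the Monte Carlo estimate agrees up to sampling error.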
28.3 The fundamental theorem of finance

In the preceding section, we showed there was a probability measure \(Q\) under which \(P_t\) was a martingale. This is true very generally. Let \(S_t\) be the price of a security in present-day dollars. We will suppose \(S_t\) is a continuous semimartingale, and can be written \(S_t = M_t + A_t\). The NFLVR condition ("no free lunch with vanishing risk") is that one cannot find fixed positive real numbers \(t_0, \varepsilon, b > 0\) and predictable processes \(H_n\) with \(\int_0^{t_0} |H_n(s)|\,|dA_s| + \int_0^{t_0} H_n(s)^2\,d\langle M\rangle_s < \infty\), a.s., for each \(n\) such that
\[ \int_0^{t_0} H_n(s)\,dS_s > -\frac1n, \quad \text{a.s.}, \]
for all \(n\) and
\[ P\Big(\int_0^{t_0} H_n(s)\,dS_s > b\Big) > \varepsilon. \]
Here \(t_0\), \(b\), \(\varepsilon\) do not depend on \(n\). A violation of the condition means that one can, with probability greater than \(\varepsilon\), make a profit greater than \(b\) while risking a loss no larger than \(1/n\).

\(Q\) is an equivalent martingale measure if \(Q\) is a probability measure, \(Q\) is equivalent to \(P\) (which means they have the same null sets), and \(S_t\) is a local martingale under \(Q\).

Theorem 28.1 If \(S_t\) is a continuous semimartingale and the NFLVR condition holds, then there exists an equivalent martingale measure \(Q\).

Proof Let us prove first of all that \(dA_t\) is absolutely continuous with respect to \(d\langle M\rangle_t\). We suppose not and obtain a contradiction. Consider the measures \(\mu_A\) and \(\mu_M\) on the predictable \(\sigma\)-field defined by
\[ \mu_A(D) = E\int_0^\infty 1_D\,dA_t, \qquad \mu_M(D) = E\int_0^\infty 1_D\,d\langle M\rangle_t. \tag{28.16} \]
Since \(A\) is of bounded variation and continuous, it is a predictable process, and we can write \(A_t = B_t - C_t\), where \(B\) and \(C\) are continuous increasing processes and where \(\mu_B\) and \(\mu_C\) are mutually singular measures on the predictable \(\sigma\)-field; we define \(\mu_B\) and \(\mu_C\) analogously to (28.16). To give a few more details on how to do this, we write \(A_t = B'_t - C'_t\), where \(B'\) and \(C'\) are continuous increasing processes, we find non-negative predictable processes \(b_t\) and \(c_t\) such that \(B'_t = \int_0^t b_s\,d(B'_s + C'_s)\) and \(C'_t = \int_0^t c_s\,d(B'_s + C'_s)\), and then let \(B_t = \int_0^t (b_s - (b_s \wedge c_s))\,d(B'_s + C'_s)\) and \(C_t = \int_0^t (c_s - (b_s \wedge c_s))\,d(B'_s + C'_s)\). We leave it to the reader to check that \(B\) and \(C\) are the desired processes.

Since \(\mu_B\) and \(\mu_C\) are mutually singular, there exists a set \(E\) in the predictable \(\sigma\)-field such that \(\mu_B(D) = \mu_B(D \cap E)\) and \(\mu_C(D) = \mu_C(D \cap E^c)\) for all sets \(D\) in the predictable \(\sigma\)-field. If \(\mu_A\) is not absolutely continuous with respect to \(\mu_M\), then at least one of \(\mu_B\) and \(\mu_C\) is not absolutely continuous with respect to \(\mu_M\). We assume that \(\mu_B\) is not, for otherwise we can look at \(-S_t\) instead of \(S_t\). Therefore there exists a predictable set \(F\) and a fixed time \(t_0\) such that \(\int_0^{t_0} 1_F\,dB_s\) is almost surely non-negative, is strictly positive with positive probability, and \(\int_0^{t_0} 1_F\,d\langle M\rangle_s = 0\). We can replace \(F\) by \(F \cap E\) and so assume that \(F \subset E\), and hence \(\mu_C(F) = \mu_C(F \cap E^c) = 0\). Then
\[ \int_0^{t_0} 1_F\,dS_s = \int_0^{t_0} 1_F\,dM_s + \int_0^{t_0} 1_F\,dB_s - \int_0^{t_0} 1_F\,dC_s. \]
The stochastic integral term is 0 because \(\int_0^{t_0} (1_F)^2\,d\langle M\rangle_s = 0\). The integral with respect to \(C_s\) is zero because \(\mu_C(F) = 0\). We then have the NFLVR condition violated with \(H_n = 1_F\) for all \(n\). Hence absolute continuity is established, and by the Radon–Nikodym theorem, \(A_t = \int_0^t h_s\,d\langle M\rangle_s\) for some predictable process \(h_s\).

Our next goal is to show \(\int_0^t h_s^2\,d\langle M\rangle_s < \infty\) for all \(t\). Let
\[ U = \inf\Big\{t : \int_0^t h_s^2\,d\langle M\rangle_s = \infty\Big\}. \]
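On \((U < \infty)\) there are two possibilities:
(1) \(\int_0^t h_s^2\,d\langle M\rangle_s < \infty\) if \(t < U\) but \(\int_0^U h_s^2\,d\langle M\rangle_s = \infty\), and
(2) \(\int_0^U h_s^2\,d\langle M\rangle_s < \infty\) but \(\int_U^{U+\varepsilon} h_s^2\,d\langle M\rangle_s = \infty\) for all \(\varepsilon > 0\).
(For a real variable analog, consider the two functions \(f_1(t) = \int_{-1}^t \frac{1}{|x|}\,dx\) and \(f_2(t) = \int_{-1}^t 1_{(x>0)}\frac{1}{x}\,dx\) at \(t = 0\).)

Let us investigate case (1) and show that it cannot happen. Choose a fixed time \(t_0\) such that \(P(U < t_0) > 0\). Let
\[ R_1 = R_1(n) = \inf\Big\{t : \int_0^t h_s^2\,d\langle M\rangle_s \ge n^4\Big\} \wedge U \wedge t_0. \]
We suppose
\[ \inf_n P(R_1(n) < U \wedge t_0) > 0 \tag{28.17} \]
and obtain a contradiction. Let \(H_t = h_t 1_{[0,R_1]}(t)/n^4\). Then
\[ \int_0^{t_0} H_s\,dA_s = \int_0^{R_1} \frac{h_s^2}{n^4}\,d\langle M\rangle_s \ge 1 \]
on \((R_1 < U < t_0)\). On the other hand,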
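\[ E\sup_{t \le t_0}\Big(\int_0^t H_s\,dM_s\Big)^2 \le 4E\int_0^{t_0} H_s^2\,d\langle M\rangle_s \le \frac{4}{n^8}\,n^4 = \frac{4}{n^4} \]
by Doob's inequalities. Therefore
\[ P\Big(\sup_{t \le t_0}\Big|\int_0^t H_s\,dM_s\Big| > \frac1n\Big) \le \frac{E\sup_{t \le t_0}\big(\int_0^t H_s\,dM_s\big)^2}{n^{-2}} \le \frac{4/n^4}{n^{-2}} = \frac{4}{n^2}. \]
Let
\[ R_2 = R_2(n) = \inf\Big\{t : \Big|\int_0^t H_s\,dM_s\Big| \ge 1/n\Big\} \]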
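and let \(\widetilde H_t = H_t 1_{[0,R_2]}(t)\). We then have
\[ P(R_2 < R_1) \le P(R_2 \le t_0) \le 4/n^2, \]
\[ \int_0^{t_0} \widetilde H_s\,dS_s = \int_0^{R_2 \wedge t_0} \widetilde H_s\,dM_s + \int_0^{t_0} \widetilde H_s\,dA_s \ge -\frac1n + \int_0^{R_1 \wedge R_2} \frac{h_s^2}{n^4}\,d\langle M\rangle_s \ge -\frac1n \]
almost surely, and
\[ P\Big(\int_0^{t_0} \widetilde H_s\,dS_s \ge \frac12\Big) \ge P(R_1 < U < t_0) - P(R_2 < R_1) \ge P(R_1 < U < t_0) - \frac{4}{n^2}. \]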
We do this for each \(n\), and thus obtain a contradiction to the NFLVR condition, so (28.17) cannot hold.

Case (2) is similar: choose \(\delta_n\) such that \(\int_{U+\delta_n}^{U+1} h_s^2\,d\langle M\rangle_s \ge n^4\) with positive probability, let \(H_t = h_t 1_{[U+\delta_n, U+1]}(t)/n^4\), and proceed as above. We leave the details as Exercise 28.3.

We thus have \(\int_0^t h_s^2\,d\langle M\rangle_s < \infty\), a.s., for each \(t\). Consequently the quantity \(\sup_{s \le t}\big|\int_0^s h_r\,dM_r\big|\) is also finite. Let
\[ V_n = \inf\Big\{t : \Big|\int_0^t h_s\,dM_s\Big| \ge n \text{ or } \int_0^t h_s^2\,d\langle M\rangle_s \ge n\Big\}. \]
We conclude \(V_n \uparrow \infty\). Define \(Q\) on \(\mathcal{F}_{V_n}\) by
\[ \frac{dQ}{dP} = \exp\Big(-\int_0^{V_n} h_s\,dM_s - \frac12\int_0^{V_n} h_s^2\,d\langle M\rangle_s\Big). \]
The exponent is bounded, so \(Q\) is well defined. Under \(Q\), if \(t \le V_n\), then
\[ M_t - \Big\langle -\int_0^{\cdot} h_s\,dM_s,\, M\Big\rangle_t = M_t + \int_0^t h_s\,d\langle M\rangle_s = M_t + A_t \]
is a martingale by the Girsanov theorem (Exercise 13.5). Therefore \(S_t = M_t + A_t\) is a local martingale. Finally, \(e^{-\int_0^t h_s\,dM_s - \frac12\int_0^t h_s^2\,d\langle M\rangle_s}\) is never zero or infinite, so \(Q\) is equivalent to \(P\).

Let us give two examples to clarify the proof. Let \(C\) be the standard Cantor set and let \(g(t)\) be the Cantor function. Suppose \(S_t = W_t + g(t)\), where \(W\) is a Brownian motion. We then let \(H_t = 1_C(t)\). Since the Cantor function increases only on the Cantor set, \(\int_0^1 H_s\,dg(s) = 1\). Since the Cantor set has Lebesgue measure 0, \(\int_0^1 H_s^2\,ds = 0\). But this is the quadratic variation of \(\int_0^1 H_s\,dW_s\), so this stochastic integral is also 0. It follows that
\[ \int_0^1 H_s\,dS_s = \int_0^1 H_s\,dW_s + \int_0^1 H_s\,dg(s) = 1, \]
which says that with the trading strategy \(H\) we make a profit of 1 almost surely, that is, without any risk. Therefore the NFLVR condition is violated. This example indicates why we must have \(dA_t\) absolutely continuous with respect to \(d\langle M\rangle_t\).

Suppose now that \(W\) is a Brownian motion and \(S_t = W_t + \int_0^t H_s\,ds\) with \(H_s\) bounded. Let
\[ M_t = e^{-\int_0^t H_s\,dW_s - \frac12\int_0^t H_s^2\,ds}, \]
and define \(Q\) on \(\mathcal{F}_1\) by \(dQ/dP = M_1\). By the Girsanov theorem, \(S_t = W_t + \int_0^t H_s\,ds\) is a martingale under \(Q\). This example shows that if the Radon–Nikodym derivative of \(dA_t\) with respect to \(d\langle M\rangle_t\) is not too bad, we can apply the Girsanov theorem.
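The Cantor-function example can be made concrete with exact rational arithmetic. In the sketch below, the Cantor set \(C\) is replaced by its \(n\)th-stage approximation \(C_n\), a union of \(2^n\) closed intervals of length \(3^{-n}\). For the strategy \(H = 1_{C_n}\), the gain from the drift is the total increase of the Cantor function \(g\) over \(C_n\), which equals 1 for every \(n\), while the quadratic variation \(\int_0^1 1_{C_n}(s)\,ds = (2/3)^n\) of the Brownian part tends to 0. The helper names are hypothetical, introduced only for this illustration.

```python
# Illustrative sketch: cantor_function, cantor_stage and
# drift_gain_and_quadratic_variation are hypothetical helper names.
from fractions import Fraction


def cantor_function(x, depth=64):
    """Cantor function g(x) for x in [0, 1], computed from ternary digits.
    Exact for points such as the interval endpoints used below."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        if x < Fraction(1, 3):
            x = 3 * x                      # ternary digit 0
        elif x > Fraction(2, 3):
            value += weight                # ternary digit 2
            x = 3 * x - 2
        else:
            return value + weight          # g is constant on [1/3, 2/3]
        weight /= 2
    return value


def cantor_stage(n):
    """The 2**n closed intervals remaining after n middle-third removals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals


def drift_gain_and_quadratic_variation(n):
    """For H = 1_{C_n}: (integral of H dg, integral of H^2 ds) over [0, 1]."""
    intervals = cantor_stage(n)
    gain = sum(cantor_function(b) - cantor_function(a) for a, b in intervals)
    qv = sum(b - a for a, b in intervals)
    return gain, qv


for n in (1, 4, 8):
    print(n, drift_gain_and_quadratic_variation(n))
```

As \(n\) grows, the drift gain stays at exactly 1 while the quadratic variation of the Brownian part shrinks geometrically; in the limit this gives the riskless profit described above.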
28.4 Stochastic control

The theory of stochastic control, which includes a study of the Hamilton–Jacobi–Bellman (HJB) equation and requires some knowledge of partial differential equations, is beyond the scope of this book. However, we can consider one simple useful example.

Suppose we have available to us a stock which satisfies the SDE \(dS_t = \sigma S_t\,dW_t + \mu S_t\,dt\), where \(W_t\) is a Brownian motion, and a risk-free asset which satisfies the equation \(dB_t = rB_t\,dt\). We want to put a proportion \(u\) of our wealth \(Z_t\) into the stock and the remainder into the risk-free asset. We will restrict \(0 \le u \le 1\), so that we neither borrow nor sell short. Also, we take \(\mu > r\), for if the mean rate of return on the stock is less than the risk-free rate, we simply put all our money in the risk-free asset. How do we choose \(u\) in order to maximize our return?

First of all, what do we mean by maximizing our return? Typically one chooses ahead of time a deterministic function \(U\), called the utility function, and one wants to maximize \(E\,U(Z_{t_0})\) at some fixed time \(t_0\). Usually utility functions are taken to be increasing and concave. The function is chosen to be increasing because more money is considered better. It is chosen concave because one assumes that twice the amount of money will give increased pleasure, but not twice as much pleasure.

Let us work out the optimal control problem when \(U(x) = x^p\) for some \(p \in (0, 1)\). If \(Z_t\) (depending on \(u\)) is our wealth, we have \(Z_t = S_t + B_t\) and \(S_t = uZ_t\), \(B_t = (1 - u)Z_t\). We will allow \(u\) to depend on \(t\) and \(\omega\), but our answer will turn out to be deterministic and independent of \(t\), i.e., \(u\) is a constant. We have seen (Proposition 24.6) that
\[ S_t = S_0 e^{\sigma W_t - \sigma^2 t/2 + \mu t} \]
and \(d\langle S\rangle_t = \sigma^2 S_t^2\,dt\), and that the equation for \(B_t\) has the solution \(B_t = B_0 e^{rt}\).
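Therefore neither \(S_t\) nor \(B_t\) can ever be 0 or negative, and so \(Z_t > 0\) for all \(t\). Applying Itô's formula to \(Z_t^p\) and noting that \(d\langle Z\rangle_t = d\langle S\rangle_t\), we have
\[ \begin{aligned} dZ_t^p &= pZ_t^{p-1}\,dZ_t + \tfrac12 p(p-1)Z_t^{p-2}\,d\langle Z\rangle_t \\ &= pZ_t^{p-1}\sigma S_t\,dW_t + pZ_t^{p-1}\mu S_t\,dt + pZ_t^{p-1}rB_t\,dt + \tfrac12 p(p-1)Z_t^{p-2}\sigma^2 S_t^2\,dt \\ &= puZ_t^p\sigma\,dW_t + puZ_t^p\mu\,dt + p(1-u)rZ_t^p\,dt + \tfrac12 p(p-1)Z_t^p\sigma^2u^2\,dt. \end{aligned} \]
Therefore
\[ E Z_{t_0}^p = E Z_0^p + pE\int_0^{t_0} Z_t^p\big[u\mu + (1-u)r + \tfrac12(p-1)\sigma^2u^2\big]\,dt. \]
This will be largest if the expression \(F(u) = u\mu + (1-u)r + \frac12(p-1)\sigma^2u^2\) is largest, which by elementary calculus is largest when
\[ u = \frac{\mu - r}{(1-p)\sigma^2}, \]
provided this value lies in \([0, 1]\); since \(F\) is concave, if the value exceeds 1 the best admissible choice is \(u = 1\).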
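A quick check of this calculus step: the sketch below maximizes \(F(u)\) by brute force over a fine grid in \([0, 1]\) and compares the result with \((\mu - r)/((1-p)\sigma^2)\), clipped to \([0, 1]\) to respect the constraint \(0 \le u \le 1\). The function names, grid size, and parameter values are illustrative assumptions.

```python
# Illustrative sketch: merton_objective, merton_fraction and grid_argmax are
# our own names; parameter values in the example are arbitrary.


def merton_objective(u, mu, r, sigma, p):
    """F(u) = u*mu + (1 - u)*r + (p - 1)*sigma**2*u**2/2 from Section 28.4."""
    return u * mu + (1 - u) * r + 0.5 * (p - 1) * sigma ** 2 * u ** 2


def merton_fraction(mu, r, sigma, p):
    """Critical point of the concave F, clipped to the admissible range [0, 1]."""
    u_star = (mu - r) / ((1 - p) * sigma ** 2)
    return min(max(u_star, 0.0), 1.0)


def grid_argmax(mu, r, sigma, p, n=10_000):
    """Brute-force maximizer of F over the grid {0, 1/n, ..., 1}."""
    grid = [i / n for i in range(n + 1)]
    return max(grid, key=lambda u: merton_objective(u, mu, r, sigma, p))


print(merton_fraction(0.10, 0.03, 0.4, 0.5), grid_argmax(0.10, 0.03, 0.4, 0.5))
```

With \(\mu = 0.1\), \(r = 0.03\), \(\sigma = 0.4\), \(p = 1/2\), both approaches give \(u = 0.875\). Lowering \(\sigma\) to 0.3 pushes the unconstrained critical point above 1, and both then return \(u = 1\).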
Exercises

28.1 Let
\[ \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-y^2/2}\,dy, \]
the cumulative normal distribution function. Rewrite the Black–Scholes formula for the value of a European call in terms of \(\Phi\). This is the way the Black–Scholes formula is written in finance books.

28.2 A European put that gives one the option to sell a share of stock at price \(K\) at time \(t_E\) has value \((K - S_{t_E})^+\) at time \(t_E\). Find the present-day value of the European put at time 0.

28.3 Carry out the details of the proof of Theorem 28.1 for Case 2.

28.4 If the utility function in Section 28.4 is \(U(x) = \log x\) instead of \(U(x) = x^p\), what is the optimal choice for \(u\)?

28.5 Let \(a, b > 0\), let \(Y_i\) be i.i.d. random variables that take only the values \(b\) and \(-a\), and let \(S_n = \sum_{i=1}^n Y_i\). Show that if \(P(Y_1 = b) > 0\) and \(P(Y_1 = -a) > 0\), there exists a probability measure \(Q\) equivalent to \(P\) under which \(S_n\) is a martingale. Describe the Radon–Nikodym derivative of \(Q\) with respect to \(P\).

28.6 Suppose the interest rate \(r\) is equal to 0 and an option \(V\) has payoff
\[ \sup_{s \le t_e} S_s \]
at time \(t_e\). What is the price of \(V\) at time 0?
28.7 Suppose the interest rate \(r\) is equal to 0. Let \(U\) be the option that pays off \(-\inf_{s \le t_e} S_s\) at time \(t_e\). What is the price of \(U\) at time 0? If \(V\) is as in Exercise 28.6, then \(U + V\) is the option that pays off the maximum of the stock price minus the minimum of the stock price, in other words, "buy low, sell high." Naturally such an option would be expensive. It is remarkable that there exists a trading strategy that can duplicate this payoff, even though the times when the maximum and minimum occur are not stopping times.
29 Filtering
Stochastic filtering is a nice example of nontrivial interesting mathematics that is extremely useful. For example, it has been used extensively in NASA's space program. The method we use is called the innovations approach to filtering, and uses Lévy's theorem, the martingale representation theorem, and other results from stochastic calculus. We will start with a fairly general model, except that for simplicity we will assume our observation process is one-dimensional. The extension to the \(d\)-dimensional case is mostly routine. Later on we will look at a specific model, the linear model, where one can give fairly explicit solutions to the filtering equation for real-life problems.
29.1 The basic model

We start with a probability space \((\Omega, \mathcal{F}, P)\), together with a filtration \(\{\mathcal{F}_t\}\) satisfying the usual conditions. In filtering theory, there are a number of filtrations present, and we will need to be careful about which ones are which. We have a signal process \(X_t\) taking values in a complete separable metric space \(S\) and we let \(\{\mathcal{F}_t^X\}\) be the minimal augmented filtration generated by \(X\). We have a function \(f\) mapping \(S\) to the reals, we suppose \(E|f(X_t)|^2 < \infty\) for all \(t\), and we suppose that there exists a process \(A_s\) adapted to the filtration \(\{\mathcal{F}_t^X\}\) such that
\[ M_t = f(X_t) - f(X_0) - \int_0^t A_s\,ds \]
is a martingale with respect to the filtration \(\{\mathcal{F}_t^X\}\).

Next we discuss the observation process. Let \(W_t\) be a one-dimensional Brownian motion with respect to the filtration \(\{\mathcal{F}_t^X\}\), let \(h_t\) be a real-valued process adapted to \(\{\mathcal{F}_t^X\}\), and suppose
\[ Z_t = W_t + \int_0^t h_s\,ds. \tag{29.1} \]
The process \(Z_t\) is called the observation process and is what we observe. Let \(\{\mathcal{F}_t^Z\}\) be the filtration generated by the process \(Z\). In practice one does not necessarily want to assume that \(\{\mathcal{F}_t^Z\}\) is right continuous, but let us assume that it is for simplicity. Requiring the filtration to be complete is not a serious issue. For an example, suppose that \(dX_t = \sigma(X_t)\,d\overline W_t + b(X_t)\,dt\) as in Chapter 24, where \(\overline W_t\) is a \(d\)-dimensional Brownian motion, \(\sigma\) is matrix valued and \(b\) is vector valued, and suppose \(f \in C^2(\mathbb{R}^d)\) is bounded or has linear growth. Then Itô's formula shows that such an \(f\) will satisfy our
assumptions. In this case \(h_s\) in (29.1) is of the form \(g(X_s)\) for a particular function \(g\); see Section 39.3.

The goal of filtering is to get the best estimate of \(f(X_t)\) from the observations \(\{Z_t\}\). We want to find the best estimate for \(f(X_t)\) in the following sense. We want to minimize the mean square error \(E|f(X_t) - Y|^2\) over all random variables \(Y\) that are \(\mathcal{F}_t^Z\) measurable, i.e., over all random variables that can be determined by the observations up to time \(t\). The rationale is that since \(\mathcal{F}_t^Z\) is the information we have observed up to time \(t\), we want our estimate to be \(\mathcal{F}_t^Z\) measurable, and among all random variables that are \(\mathcal{F}_t^Z\) measurable, we want the one closest to \(f(X_t)\) in \(L^2\) norm, which means we minimize the mean square error.

Lemma 29.1 The best mean square error estimate of \(f(X_t)\) over the class of \(\mathcal{F}_t^Z\) measurable random variables is \(Y = E[f(X_t) \mid \mathcal{F}_t^Z]\).

Proof By our assumptions on \(f\), the random variable \(V = f(X_t)\) is in \(L^2(P)\). Let \(Y\) be the best mean square estimator. The collection \(\mathcal{M}\) of \(L^2\) random variables which are \(\mathcal{F}_t^Z\) measurable is a closed linear subspace of \(L^2\), and the element of \(\mathcal{M}\) that minimizes the distance from \(V\) to \(\mathcal{M}\) is the orthogonal projection of \(V\) onto \(\mathcal{M}\). Therefore \(Y\) is the projection of \(V\) onto \(\mathcal{M}\). Hence \(V - Y\) is orthogonal (in the \(L^2\) sense) to every element of \(\mathcal{M}\). In particular, if \(E \in \mathcal{F}_t^Z\),
\[ E[(V - Y)1_E] = 0, \]
which implies \(E[V; E] = E[Y; E]\). This holds for every \(E \in \mathcal{F}_t^Z\) and \(Y\) is \(\mathcal{F}_t^Z\) measurable, hence \(Y = E[V \mid \mathcal{F}_t^Z]\).

Given any process \(H_t\) that is \(\{\mathcal{F}_t\}\) adapted, we use the notation \(\widehat H_t = E[H_t \mid \mathcal{F}_t^Z]\). We will look at expressions like \(\int_0^t \widehat H_s\,ds\), and you might wonder about the joint measurability of \(\widehat H\) in \(\omega\) and \(t\), since \(\widehat H_t\) is only defined almost surely for each \(t\). The way to deal with this is to let \(\widehat H\) be the optional projection of \(H\) with respect to the optional \(\sigma\)-field generated by \(\{\mathcal{F}_t^Z\}\); see (16.8) in Chapter 16.
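Lemma 29.1 can be seen in a toy Gaussian example of our own (not from the text): if \(X\) and a noise term are independent standard normal random variables and we observe \(Y = X + \text{noise}\), then \(E[X \mid Y] = Y/2\). The Monte Carlo sketch below estimates \(E|X - aY|^2\) for a few values of \(a\), reusing the same samples for each \(a\); the exact value is \((1-a)^2 + a^2\), smallest at \(a = 1/2\). It only compares estimators of the form \(aY\), so it illustrates rather than proves the lemma. The function name and sample size are illustrative assumptions.

```python
# Illustrative sketch: a toy Gaussian model (our own, not from the text) in
# which the conditional expectation E[X | Y] equals Y / 2.
import random


def mse_of_scaled_estimator(a, n=100_000, seed=0):
    """Monte Carlo estimate of E|X - a*Y|^2 with X, noise ~ N(0, 1)
    independent and Y = X + noise.  The fixed seed reuses the same
    samples for every a (common random numbers)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        total += (x - a * y) ** 2
    return total / n


for a in (0.4, 0.5, 0.6):
    print(a, mse_of_scaled_estimator(a))
```

The estimate at \(a = 0.5\) (the conditional expectation) should be close to \(1/2\) and below the neighboring values.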
29.2 The innovation process

We next define the innovation process
\[ N_t = Z_t - \int_0^t \widehat h_s\,ds. \tag{29.2} \]
(Following our convention on notation, \(\widehat h_s = E[h_s \mid \mathcal{F}_s^Z]\).) Note that although \(N_t\) is \(\mathcal{F}_t^Z\) measurable, we cannot determine it from (29.2) because it contains the unknown \(\widehat h_s\) on the right-hand side.

Proposition 29.2 \(N_t\) is a Brownian motion with respect to the filtration \(\{\mathcal{F}_t^Z\}\).

Proof We will show that \(N_t\) is a continuous martingale with respect to the filtration \(\{\mathcal{F}_t^Z\}\) whose quadratic variation is \(t\), and then our result follows from Lévy's theorem (Theorem 12.1). That \(N_t\) is continuous is obvious, and \(\langle N\rangle_t = \langle Z\rangle_t = \langle W\rangle_t = t\) from the definitions of \(Z\) and \(W\). Thus we need to show that \(N\) is a martingale with respect to \(\{\mathcal{F}_t^Z\}\).
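If \(r \ge s\), we have
\[ E[\widehat h_r \mid \mathcal{F}_s^Z] = E\big[E[h_r \mid \mathcal{F}_r^Z] \mid \mathcal{F}_s^Z\big] = E[h_r \mid \mathcal{F}_s^Z]. \tag{29.3} \]
Then using Exercise 29.1,
\[ \begin{aligned} E[N_t - N_s \mid \mathcal{F}_s^Z] &= E[Z_t - Z_s \mid \mathcal{F}_s^Z] - \int_s^t E[\widehat h_r \mid \mathcal{F}_s^Z]\,dr \\ &= E[W_t - W_s \mid \mathcal{F}_s^Z] + \int_s^t E[h_r - \widehat h_r \mid \mathcal{F}_s^Z]\,dr \\ &= E\big[E[W_t - W_s \mid \mathcal{F}_s^X] \mid \mathcal{F}_s^Z\big] = 0, \end{aligned} \tag{29.4} \]
since \(\mathcal{F}_s^Z \subset \mathcal{F}_s^X\).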
29.3 Representation of \(\mathcal{F}^Z\)-martingales

In this section we prove that if \(Y_t\) is a martingale with respect to \(\{\mathcal{F}_t^Z\}\), then \(Y\) can be represented as a stochastic integral with respect to \(N\). This is not an immediate consequence of Theorem 12.3 because we do not know that \(N_t\) generates \(\{\mathcal{F}_t^Z\}\); the filtration generated by \(N\) could conceivably be strictly smaller than the one generated by \(Z\).

Theorem 29.3 Suppose \(Y_t\) is a square integrable martingale with respect to \(\{\mathcal{F}_t^Z\}\). Let \(\mathcal{P}^Z\) be the predictable \(\sigma\)-field defined on \([0, \infty) \times \Omega\) in terms of \(\{\mathcal{F}_t^Z\}\). Then there exists \(H_s\) which is \(\mathcal{P}^Z\) measurable and with \(E\int_0^\infty H_s^2\,ds < \infty\) such that
\[ Y_t = Y_0 + \int_0^t H_s\,dN_s \tag{29.5} \]
for all \(t\). To clarify, \(\mathcal{P}^Z\) is the \(\sigma\)-field generated by all bounded left-continuous processes that are adapted to \(\{\mathcal{F}_t^Z\}\).

Proof First let us treat the case where \(\int_0^t \widehat h_s\,dN_s\), \(\int_0^t |\widehat h_s|^2\,ds\), and \(Y_t\) are each bounded. Define \(Q\) on \(\mathcal{F}_t^Z\) by \(dQ/dP|_{\mathcal{F}_t^Z} = M_t\), where
\[ M_t = \exp\Big(-\int_0^t \widehat h_s\,dN_s - \frac12\int_0^t |\widehat h_s|^2\,ds\Big). \]
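Then by the Girsanov theorem (Theorem 13.3)
\[ Z_t = N_t + \int_0^t \widehat h_s\,ds \]
is a martingale under \(Q\) with respect to \(\{\mathcal{F}_t^Z\}\). Since \(\langle Z\rangle_t = \langle N\rangle_t = \langle W\rangle_t = t\), then \(Z\) is a Brownian motion under \(Q\) with respect to \(\{\mathcal{F}_t^Z\}\).

Let \(\widetilde Y_t = M_t^{-1}Y_t\). If \(s \le t\) and \(A \in \mathcal{F}_s^Z\), then \(A \in \mathcal{F}_t^Z\) and
\[ E_Q[\widetilde Y_t; A] = E_P[M_t(M_t^{-1}Y_t); A] = E_P[Y_t; A] = E_P[Y_s; A] = E_P[M_s(M_s^{-1}Y_s); A] = E_Q[\widetilde Y_s; A]. \]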
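Therefore \(\widetilde Y_t\) is a martingale under \(Q\) with respect to \(\{\mathcal{F}_t^Z\}\). By the martingale representation theorem (Theorem 12.3) there exists \(K_s \in \mathcal{P}^Z\) such that
\[ \widetilde Y_t = \widetilde Y_0 + \int_0^t K_s\,dZ_s = \widetilde Y_0 + \int_0^t K_s\,dN_s + \int_0^t K_s\widehat h_s\,ds. \]
On the other hand, \(dM_t = -M_t\widehat h_t\,dN_t\) and \(Y_t = M_t\widetilde Y_t\). We have \(d\langle M, \widetilde Y\rangle_t = -M_t\widehat h_tK_t\,dt\). By the product formula,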
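\[ \begin{aligned} Y_t &= M_0\widetilde Y_0 + \int_0^t M_s\,d\widetilde Y_s + \int_0^t \widetilde Y_s\,dM_s + \langle M, \widetilde Y\rangle_t \\ &= Y_0 + \int_0^t K_sM_s\,dN_s + \int_0^t K_sM_s\widehat h_s\,ds - \int_0^t \widetilde Y_sM_s\widehat h_s\,dN_s - \int_0^t M_s\widehat h_sK_s\,ds, \end{aligned} \]
which is of the desired form if we set \(H_s = K_sM_s - \widetilde Y_sM_s\widehat h_s\).

In the general case, let
\[ T_K = \inf\Big\{t : \Big|\int_0^t \widehat h_s\,dN_s\Big| + \int_0^t |\widehat h_s|^2\,ds + |Y_t| \ge K\Big\}. \]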
We apply the above argument to \(Y_{t \wedge T_K}\) and use Exercise 29.3 to get
\[ Y_{t \wedge T_K} = Y_0 + \int_0^t H_s^K\,dN_s, \]
where \(H_s^K\) is predictable with respect to the \(\sigma\)-fields \(\{\mathcal{F}^Z_{t \wedge T_K}\}\) and is 0 from time \(T_K\) on. Since \(Y_t\) is square integrable, \(Y_{T_K} \to Y_\infty\) almost surely and in \(L^2(P)\) as \(K \to \infty\), and
\[ E\int_0^\infty |H_s^K - H_s^L|^2\,ds = E\big[|Y_{T_K} - Y_{T_L}|^2\big] \to 0 \]
as \(K, L \to \infty\). Using the completeness of \(L^2\), there exists \(H_s\) such that \(E\int_0^\infty H_s^2\,ds < \infty\) and \(E\int_0^\infty |H_s - H_s^K|^2\,ds \to 0\) as \(K \to \infty\). It is routine to check that \(H_s\) is \(\mathcal{P}^Z\) measurable and that (29.5) holds.
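29.4 The filtering equation

We now derive the general filtering equation. First we need a lemma.

Lemma 29.4 If \(Y_t - \int_0^t H_s\,ds\) is a martingale with respect to \(\{\mathcal{F}_t^X\}\), then \(\widehat Y_t - \int_0^t \widehat H_s\,ds\) is a martingale with respect to \(\{\mathcal{F}_t^Z\}\).

Proof Since \(\mathcal{F}_s^Z \subset \mathcal{F}_s^X\),
\[ \begin{aligned} E\Big[\widehat Y_t - \widehat Y_s - \int_s^t \widehat H_r\,dr \,\Big|\, \mathcal{F}_s^Z\Big] &= E\Big[E[Y_t \mid \mathcal{F}_t^Z] - E[Y_s \mid \mathcal{F}_s^Z] - \int_s^t E[H_r \mid \mathcal{F}_r^Z]\,dr \,\Big|\, \mathcal{F}_s^Z\Big] \\ &= E\Big[Y_t - Y_s - \int_s^t H_r\,dr \,\Big|\, \mathcal{F}_s^Z\Big] \\ &= E\Big[E\Big[Y_t - Y_s - \int_s^t H_r\,dr \,\Big|\, \mathcal{F}_s^X\Big] \,\Big|\, \mathcal{F}_s^Z\Big] = 0. \end{aligned} \]
The first equality is proved in a fashion similar to the one you were asked to prove in Exercise 29.1.

Here is the filtering equation.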
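Theorem 29.5 Let \(M_t = f(X_t) - f(X_0) - \int_0^t A_s\,ds\) be a martingale with respect to \(\{\mathcal{F}_t^X\}\) and write \(F_s\) for \(f(X_s)\). Suppose \(\langle M, W\rangle_t = \int_0^t D_s\,ds\). Then
\[ \widehat F_t = \widehat F_0 + \int_0^t \widehat A_s\,ds + \int_0^t \big(\widehat{F_sh_s} - \widehat F_s\widehat h_s + \widehat D_s\big)\,dN_s. \tag{29.6} \]

Proof By Lemma 29.4,
\[ L_t = \widehat F_t - \widehat F_0 - \int_0^t \widehat A_s\,ds \tag{29.7} \]
is a martingale with respect to \(\{\mathcal{F}_t^Z\}\) and by Theorem 29.3, there exists \(H_s\) such that
\[ L_t = \int_0^t H_s\,dN_s. \tag{29.8} \]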
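By the product formula,
\[ \begin{aligned} F_tZ_t &= \int_0^t F_s\,dZ_s + \int_0^t Z_s\,dF_s + \int_0^t D_s\,ds \\ &= \int_0^t F_s\,dW_s + \int_0^t F_sh_s\,ds + \int_0^t Z_s\,dM_s + \int_0^t Z_sA_s\,ds + \int_0^t D_s\,ds \\ &= \mathcal{F}^X\text{-martingale} + \int_0^t [F_sh_s + Z_sA_s + D_s]\,ds. \end{aligned} \]
By Lemma 29.4 and the obvious fact that \(Z\) is adapted to \(\{\mathcal{F}_t^Z\}\),
\[ \widehat{F_tZ_t} = \widehat F_tZ_t = \mathcal{F}^Z\text{-martingale} + \int_0^t \big(\widehat{F_sh_s} + Z_s\widehat A_s + \widehat D_s\big)\,ds. \]
Again using the product formula,
\[ \begin{aligned} \widehat F_tZ_t &= \int_0^t \widehat F_s\,dZ_s + \int_0^t Z_s\,d\widehat F_s + \int_0^t H_s\,ds \\ &= \mathcal{F}^Z\text{-martingale} + \int_0^t [\widehat F_s\widehat h_s + Z_s\widehat A_s + H_s]\,ds. \end{aligned} \]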
Therefore
\[ \int_0^t \big(\widehat{F_sh_s} + Z_s\widehat A_s + \widehat D_s - \widehat F_s\widehat h_s - Z_s\widehat A_s - H_s\big)\,ds \]
is a continuous \(\mathcal{F}^Z\)-martingale that has paths that are locally of bounded variation and which is zero at time zero, hence is identically zero by Theorem 9.7. Hence with probability one,
\[ H_s = \widehat{F_sh_s} - \widehat F_s\widehat h_s + \widehat D_s \]
for almost every \(s\). Substituting this in (29.8) and combining with (29.7) gives our result.
29.5 Linear models

The filtering equation (29.6) is difficult to apply in most cases. However, in the linear model, we can get a much simpler representation. To define the linear model in \(d\) dimensions, let \(X_t\) solve
\[ dX_t = A(t)\,d\overline W_t + B(t)X_t\,dt, \tag{29.9} \]
where \(\overline W_t\) is a \(d\)-dimensional Brownian motion and \(A(t)\) and \(B(t)\) are deterministic \(d \times d\) matrices that are continuous in \(t\). Let
\[ dZ_t = dW_t + C(t)X_t\,dt, \tag{29.10} \]
where \(C\) is a deterministic \(d \times d\) matrix-valued function that is continuous in \(t\) and \(W_t\) is a \(d\)-dimensional Brownian motion independent of \(\overline W\) and \(X\).

Why is this model useful? Suppose \(X_t\) is two-dimensional with \(X_t(1)\) being the position of a particle and \(X_t(2)\) its velocity. Suppose the position and the velocity have some randomness and that our observations of the position and velocity are noisy. This fits into the model (29.9)–(29.10) if we take
\[ A(t) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad C(t) = \begin{pmatrix} c_1 & 0 \\ 0 & c_2 \end{pmatrix}. \]
For another example, suppose a particle has a fixed unknown velocity and the position is observed, but obscured by noise. Let \(X_t(1)\) and \(X_t(2)\) be the position and velocity and let \(A(t)\) be the zero matrix,
\[ B(t) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad C(t) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \]
The solution of the filtering problem modeled by (29.9)–(29.10) is known as the Kalman–Bucy filter. For simplicity we will consider the special case where the dimension \(d\) is 1 and \(A\), \(B\), \(C\) are constant in \(t\); the general case is done in exactly the same way, but the notation becomes much more complicated (see Kallianpur, 1980). We will further assume \(E X_0\) and \(\operatorname{Var} X_0\) are known.
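29.6 Kalman–Bucy filter

Let
\[ V_t = \widehat{X_t^2} - (\widehat X_t)^2, \]
the conditional variance of \(X_t\) given \(\mathcal{F}_t^Z\).

Theorem 29.6 \(V_t\) solves the deterministic ordinary differential equation
\[ \frac{dV_t}{dt} = 1 + 2BV_t - C^2V_t^2, \qquad V_0 = \operatorname{Var} X_0. \tag{29.11} \]
In particular, \(V_t\) is deterministic. \(\widehat X_t\) solves
\[ d\widehat X_t = CV_t\,dZ_t + (B - C^2V_t)\widehat X_t\,dt, \qquad \widehat X_0 = E X_0. \tag{29.12} \]
(Here, as in the proof below, \(A = 1\); for a general constant \(A\), the term 1 in (29.11) becomes \(A^2\).)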
The equation (29.11) is an example of what is known as a Riccati equation. We get a similar equation when $d > 1$ or when $A$, $B$, and $C$ depend on $t$, but in general one cannot solve the Riccati equation explicitly. However, when $d = 1$ and $A$, $B$, $C$ do not depend on $t$, one can solve (29.11) by separation of variables. Write
$$\frac{dV}{1 + 2BV - C^2V^2} = dt,$$
and integrate both sides. When $d = 1$ (and even if $A$, $B$, and $C$ depend on time), we can solve (29.12). Let $G_t = B - C^2V_t$, so that we have
$$d\widehat{X}_t = CV_t\,dZ_t + G_t\widehat{X}_t\,dt,$$
or, by the product formula,
$$d\Big(e^{-\int_0^t G_r\,dr}\,\widehat{X}_t\Big) = e^{-\int_0^t G_r\,dr}\,CV_t\,dZ_t,$$
and hence
$$\widehat{X}_t = e^{\int_0^t G_r\,dr}\,E X_0 + \int_0^t e^{\int_s^t G_r\,dr}\,CV_s\,dZ_s.$$
(Cf. the solution of (24.15).)
Proof of Theorem 29.6 By Itô's formula, if $f \in C^2$,
$$f(X_t) - f(X_0) = \mathcal{F}^X\text{-martingale} + \int_0^t \big[\tfrac12 f''(X_s) + BX_sf'(X_s)\big]\,ds.$$
By the filtering equation applied with $f(x) = x$,
$$\widehat{X}_t = E X_0 + B\int_0^t \widehat{X}_s\,ds + C\int_0^t V_s\,dN_s. \qquad (29.13)$$
By Exercises 29.4(2) and 29.5(3),
$$\widehat{X_t^3} - \widehat{X_t^2}\,\widehat{X}_t = 2\widehat{X}_tV_t. \qquad (29.14)$$
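With the filtering equation applied with $f(x) = x^2$ and (29.14),
$$\widehat{X_t^2} = E X_0^2 + \int_0^t \big(1 + 2B\widehat{X_s^2}\big)\,ds + C\int_0^t \big(\widehat{X_s^3} - \widehat{X_s^2}\,\widehat{X}_s\big)\,dN_s = E X_0^2 + \int_0^t \big(1 + 2B\widehat{X_s^2}\big)\,ds + 2C\int_0^t V_s\widehat{X}_s\,dN_s.$$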
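Therefore
$$dV_t = d\big(\widehat{X_t^2} - (\widehat{X}_t)^2\big) = \big(1 + 2B\widehat{X_t^2}\big)\,dt + 2CV_t\widehat{X}_t\,dN_t - 2\widehat{X}_t\big(CV_t\,dN_t + B\widehat{X}_t\,dt\big) - C^2V_t^2\,dt = \big(1 + 2BV_t - C^2V_t^2\big)\,dt. \qquad (29.15)$$
(The term $-C^2V_t^2\,dt$ is $-d\langle\widehat{X}\rangle_t$, which by (29.13) equals $-C^2V_t^2\,dt$ because the innovation process $N$ is a Brownian motion.) This shows that $V_t$ solves the deterministic ordinary differential equation (29.15). This equation has a unique solution (cf. Theorem 15.1), so $V_t$ is deterministic. We obtain (29.12) from (29.2), (29.10), and (29.13).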
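To see Theorem 29.6 at work numerically, here is a rough Euler discretization in Python (NumPy) of the scalar signal, the Riccati equation (29.11), and the filter (29.12), under the same simplifications as the text ($d = 1$, constant $B$ and $C$, and $A = 1$) plus the extra assumption that $X_0$ is Gaussian. The names `riccati_path` and `kalman_bucy` and all numerical parameters are hypothetical choices. Setting the right side of (29.11) to zero gives the stationary value $V_\infty = (B + \sqrt{B^2 + C^2})/C^2$, which the Euler iterates approach.

```python
import numpy as np

def riccati_path(B, C, V0, T, n_steps):
    """Euler scheme for dV/dt = 1 + 2 B V - C^2 V^2, V(0) = V0 (equation (29.11), A = 1)."""
    dt = T / n_steps
    V = np.empty(n_steps + 1)
    V[0] = V0
    for k in range(n_steps):
        V[k + 1] = V[k] + (1.0 + 2.0 * B * V[k] - C**2 * V[k] ** 2) * dt
    return V

def kalman_bucy(B, C, mean0, var0, T=10.0, n_steps=10_000, seed=0):
    """Simulate the signal dX = dWbar + B X dt, the observation dZ = dW + C X dt,
    and the filter (29.12): d Xhat = C V dZ + (B - C^2 V) Xhat dt."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    V = riccati_path(B, C, var0, T, n_steps)          # deterministic, computed up front
    x = mean0 + np.sqrt(var0) * rng.standard_normal()  # hidden X_0, assumed Gaussian
    xs = np.empty(n_steps + 1)
    xhat = np.empty(n_steps + 1)
    xs[0], xhat[0] = x, mean0
    for k in range(n_steps):
        dZ = rng.normal(scale=np.sqrt(dt)) + C * x * dt     # observation increment
        xhat[k + 1] = xhat[k] + C * V[k] * dZ + (B - C**2 * V[k]) * xhat[k] * dt
        x = x + rng.normal(scale=np.sqrt(dt)) + B * x * dt  # signal step with A = 1
        xs[k + 1] = x
    return xs, xhat, V

B, C = -1.0, 2.0
xs, xhat, V = kalman_bucy(B, C, mean0=0.0, var0=1.0)
V_inf = (B + np.sqrt(B**2 + C**2)) / C**2  # positive root of 1 + 2BV - C^2 V^2 = 0
print(V[-1], V_inf)
```

Because $V_t$ is deterministic, it can be precomputed before any observations arrive; only the update of $\widehat{X}_t$ uses the data.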
Exercises
29.1 Justify the first equality in (29.4).
29.2 Show that if $M_t$ is a martingale with respect to $\{\mathcal{F}_t^X\}$, then $\widehat{M}_t$ is a martingale with respect to $\{\mathcal{F}_t^Z\}$.
29.3 Suppose $W$ is a Brownian motion and $\{\mathcal{F}_t\}$ is its minimal augmented filtration. Let $T$ be a bounded stopping time with respect to $\{\mathcal{F}_t\}$. Suppose $Y$ is an $\mathcal{F}_T$-measurable random variable with $E Y^2 < \infty$. Show that there exists a predictable process $H_s$ with $E\int_0^T H_s^2\,ds < \infty$ such that $Y = E Y + \int_0^T H_s\,dW_s$, a.s.
29.4 (1) Show that the solution to (29.9) is a Gaussian process. (2) Show that the solutions $(X_t, Z_t)$ to (29.9)–(29.10) form a Gaussian process.
29.5 (1) Show that if $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, then $E X^3 = \mu(\mu^2 + 3\sigma^2)$.
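(2) Show that if $X, Y_1, \ldots, Y_n$ are jointly normal random variables, then
$$E[X^3 \mid Y_1, \ldots, Y_n] = E[X \mid Y_1, \ldots, Y_n]\big(E[X \mid Y_1, \ldots, Y_n]^2 + 3\operatorname{Var}[X \mid Y_1, \ldots, Y_n]\big),$$
where $\operatorname{Var}[X \mid Y_1, \ldots, Y_n] = E[(X - E[X \mid Y_1, \ldots, Y_n])^2 \mid Y_1, \ldots, Y_n]$.
(3) Show that
$$\widehat{X_t^3} = \widehat{X}_t\big((\widehat{X}_t)^2 + 3\operatorname{Var}(X_t \mid \mathcal{F}_t^Z)\big),$$
where
$$\operatorname{Var}(X_t \mid \mathcal{F}_t^Z) = E[(X_t - \widehat{X}_t)^2 \mid \mathcal{F}_t^Z] = \widehat{X_t^2} - (\widehat{X}_t)^2.$$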
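The Gaussian moment identities in Exercise 29.5(1) and (29.14) can be sanity-checked numerically. The sketch below (Python, NumPy) uses probabilists' Gauss–Hermite quadrature via `numpy.polynomial.hermite_e.hermegauss`, which is exact for polynomials of low degree; the helper name `gaussian_expectation` and the values of $\mu$, $\sigma$ are illustrative. It is a numerical check, not a proof.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(f, mu, sigma, deg=10):
    """E f(X) for X ~ N(mu, sigma^2) by Gauss-Hermite quadrature with weight exp(-x^2/2).
    Exact when f is a polynomial of degree at most 2*deg - 1."""
    nodes, weights = hermegauss(deg)
    return np.sum(weights * f(mu + sigma * nodes)) / np.sqrt(2.0 * np.pi)

mu, sigma = 0.7, 1.3
m1 = gaussian_expectation(lambda x: x, mu, sigma)
m2 = gaussian_expectation(lambda x: x**2, mu, sigma)
m3 = gaussian_expectation(lambda x: x**3, mu, sigma)
print(m3, mu * (mu**2 + 3 * sigma**2))      # Exercise 29.5(1)
print(m3 - m2 * m1, 2 * m1 * (m2 - m1**2))  # the identity behind (29.14)
```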
Notes For more on filtering, see Kallianpur (1980) and Øksendal (2003).
30 Convergence of probability measures
Suppose we have a sequence of probabilities on a metric space $S$ and we want to define what it means for the sequence to converge weakly. Alternatively, we may have a sequence of random variables and want to say what it means for the random variables to converge weakly. We will apply the results we obtain here in later chapters to the case where $S$ is a function space such as $C[0, 1]$ and obtain theorems on the convergence of stochastic processes. For now our state space is assumed to be an arbitrary metric space, although we will soon add additional assumptions on $S$. We use the Borel σ-field on $S$, which is the σ-field generated by the open sets in $S$. We write $A^o$, $\overline{A}$, and $\partial A$ for the interior, closure, and boundary of $A$, respectively.
30.1 The portmanteau theorem
The definition of weak convergence of real-valued random variables in terms of distribution functions (see Section A.12) has no obvious analog on a general metric space. The appropriate generalization is the following; cf. Proposition A.41.
Definition 30.1 A sequence of probabilities $\{P_n\}$ on a metric space $S$ furnished with the Borel σ-field is said to converge weakly to $P$ if $\int f\,dP_n \to \int f\,dP$ for every bounded and continuous function $f$ on $S$. A sequence of random variables $\{X_n\}$ taking values in $S$ converges weakly to a random variable $X$ taking values in $S$ if $E f(X_n) \to E f(X)$ whenever $f$ is a bounded and continuous function.
Saying $X_n$ converges weakly to $X$ is the same as saying that the laws of $X_n$ converge weakly to the law of $X$. To see this, if $P_n$ is the law of $X_n$, that is, $P_n(A) = P(X_n \in A)$ for each Borel subset $A$ of $S$, then $E f(X_n) = \int f\,dP_n$, and similarly $E f(X) = \int f\,dP$ when $P$ is the law of $X$. (This holds when $f$ is an indicator by the definition of the law of $X_n$ and $X$, then for simple functions by linearity, then for non-negative measurable functions by monotone convergence, and then for arbitrary bounded and Borel measurable $f$ by linearity.)
What might cause a bit of confusion is that weak convergence in probability theory is not the same as weak convergence in functional analysis, but rather is equivalent to what is known as weak-∗ convergence in functional analysis. Feel free to skip the remainder of this paragraph where we explain this. Recall that if $B$ is a Banach space and $B^*$ is its dual, then $x_n \in B$ converges weakly to $x \in B$ if $f(x_n) \to f(x)$ for all $f \in B^*$, while $f_n \in B^*$ converges with respect to the weak-∗ topology to $f \in B^*$ if $f_n(x) \to f(x)$ for all $x \in B$. By the Riesz representation theorem, there is a one-to-one correspondence between positive bounded linear functionals on $B = C(X)$, the continuous functions on $X$, where $X$ is compact, and the set $M$ of finite
measures on $X$. When $B = C(X)$, $B^*$ can be identified with $M$, and measures $P_n$ with mass 1 in $M$ converge to $P \in M$ with respect to the weak-∗ topology if $P_n(g) \to P(g)$ for every $g \in B = C(X)$. Interpreting $P_n(g)$ as $\int g\,dP_n$ shows the connection.
Returning to weak convergence in the probability sense, the following theorem, known as the portmanteau theorem, gives some other characterizations. For this chapter we let
$$F_\delta = \{x : d(x, F) < \delta\} \qquad (30.1)$$
for closed sets $F$, the set of points within $\delta$ of $F$, where $d(x, F) = \inf\{d(x, y) : y \in F\}$.
Theorem 30.2 Suppose $\{P_n, n = 1, 2, \ldots\}$ and $P$ are probabilities on a metric space. The following are equivalent.
(1) $P_n$ converges weakly to $P$.
(2) $\limsup_n P_n(F) \le P(F)$ for all closed sets $F$.
(3) $\liminf_n P_n(G) \ge P(G)$ for all open sets $G$.
(4) $\lim_n P_n(A) = P(A)$ for all Borel sets $A$ such that $P(\partial A) = 0$.
Proof The equivalence of (2) and (3) is easy because if $F$ is closed, then $G = F^c$ is open and $P_n(G) = 1 - P_n(F)$.
To see that (2) and (3) imply (4), suppose $P(\partial A) = 0$. Then $P(\overline{A}) = P(A^o)$, and
$$\limsup_n P_n(A) \le \limsup_n P_n(\overline{A}) \le P(\overline{A}) = P(A^o) \le \liminf_n P_n(A^o) \le \liminf_n P_n(A).$$
Next, let us show (4) implies (2). Let $F$ be closed. If $y \in \partial F_\delta$, then $d(y, F) = \delta$, so the sets $\partial F_\delta$ are disjoint for different $\delta$. At most countably many of them can have positive $P$-measure, hence there exists a sequence $\delta_k \downarrow 0$ such that $P(\partial F_{\delta_k}) = 0$ for each $k$. Then
$$\limsup_n P_n(F) \le \lim_n P_n(F_{\delta_k}) = P(F_{\delta_k})$$
for each $k$. Since $P(F_{\delta_k}) \downarrow P(F)$ as $\delta_k \to 0$, this gives (2).
We show now that (1) implies (2). Suppose $F$ is closed. Let $\varepsilon > 0$. Take $\delta > 0$ small enough so that $P(F_\delta) - P(F) < \varepsilon$. Then take $f$ continuous, equal to 1 on $F$, equal to 0 outside $F_\delta$, and bounded between 0 and 1. For example, $f(x) = 1 - (1 \wedge \delta^{-1}d(x, F))$ would do. Then
$$\limsup_n P_n(F) \le \limsup_n \int f\,dP_n = \int f\,dP \le P(F_\delta) \le P(F) + \varepsilon.$$
Since this is true for all $\varepsilon$, (2) follows.
Finally, let us show (2) implies (1). Let $f$ be bounded and continuous. If we show
$$\limsup_n \int f\,dP_n \le \int f\,dP \qquad (30.2)$$
for every such $f$, then applying this inequality to both $f$ and $-f$ will give (1). By adding a sufficiently large positive constant to $f$ and then multiplying by a suitable constant, without loss of generality we may assume $f$ is bounded and takes values in $(0, 1)$. Fix $k \ge 1$ and define $F_i = \{x : f(x) \ge i/k\}$, which is closed. Note that $F_0 = S$ and $F_k = \emptyset$. Then
$$\int f\,dP_n \le \sum_{i=1}^k \frac{i}{k}\,P_n\Big(\frac{i-1}{k} \le f(x) < \frac{i}{k}\Big) = \sum_{i=1}^k \frac{i}{k}\,\big[P_n(F_{i-1}) - P_n(F_i)\big] = \sum_{i=0}^{k-1} \frac{i+1}{k}\,P_n(F_i) - \sum_{i=1}^k \frac{i}{k}\,P_n(F_i) \le \frac1k + \frac1k\sum_{i=1}^k P_n(F_i).$$
Similarly,
$$\int f\,dP \ge \frac1k\sum_{i=1}^k P(F_i).$$
Then
$$\limsup_n \int f\,dP_n \le \frac1k + \frac1k\sum_{i=1}^k \limsup_n P_n(F_i) \le \frac1k + \frac1k\sum_{i=1}^k P(F_i) \le \frac1k + \int f\,dP.$$
Since $k$ is arbitrary, this gives (30.2).
If $x_n \to x$ with $x_n \ne x$ for all $n$, $P_n = \delta_{x_n}$, and $P = \delta_x$, it is easy to see $P_n$ converges weakly to $P$. Taking $F = \{x\}$, for which $P_n(F) = 0$ while $P(F) = 1$, shows that one cannot, in general, have $\lim_n P_n(F) = P(F)$ for all closed sets $F$.
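The discretization in the step (2) ⇒ (1) rests on the pointwise bounds $\lfloor kf\rfloor/k \le f \le \lfloor kf\rfloor/k + 1/k$, since $\sum_{i=1}^k 1_{F_i} = \lfloor kf\rfloor$ when $0 < f < 1$. The following Python sketch checks the resulting inequalities $\frac1k\sum_i P(F_i) \le \int f\,dP \le \frac1k + \frac1k\sum_i P(F_i)$ for a randomly generated discrete law; the helper `layer_bounds` and the random data are assumptions made only for illustration.

```python
import numpy as np

def layer_bounds(values, probs, k):
    """For a discrete law whose atoms carry f-values in (0, 1) and masses probs,
    return (lower, integral, upper) where F_i = {f >= i/k},
    lower = (1/k) * sum_{i=1}^k P(F_i) and upper = 1/k + lower."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    integral = float(np.sum(values * probs))
    lower = sum(probs[values >= i / k].sum() for i in range(1, k + 1)) / k
    return lower, integral, lower + 1.0 / k

rng = np.random.default_rng(1)
values = rng.uniform(0.01, 0.99, size=50)  # f evaluated at 50 atoms
probs = rng.random(50)
probs /= probs.sum()
for k in (2, 10, 100):
    lo, mid, hi = layer_bounds(values, probs, k)
    print(k, lo <= mid <= hi)
```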
30.2 The Prohorov theorem
It turns out there is a simple condition that ensures that a sequence of probability measures has a weakly convergent subsequence.
Definition 30.3 A sequence of probabilities $P_n$ on a metric space $S$ is tight if for every $\varepsilon > 0$ there exists a compact set $K$ (depending on $\varepsilon$) such that $\sup_n P_n(K^c) \le \varepsilon$.
The important result here is Prohorov's theorem.
Theorem 30.4 If a sequence of probability measures on a metric space $S$ is tight, there is a subsequence that converges weakly to a probability measure on $S$.
Proof Suppose first that the metric space $S$ is compact. Then $C(S)$, the collection of continuous functions on $S$, is a separable metric space when furnished with the supremum norm; this is Exercise 30.1. Let $\{f_i\}$ be a countable collection of non-negative elements of $C(S)$ whose linear span is dense in $C(S)$. For each $i$, $\int f_i\,dP_n$ is a bounded sequence, so we
have a convergent subsequence. By a diagonalization procedure, we can find a subsequence $n'$ such that $\int f_i\,dP_{n'}$ converges for all $i$. By the term "diagonalization procedure" we are referring to the well-known method of proof of the Ascoli–Arzelà theorem; see any book on real analysis for a detailed explanation. Call the limit $Lf_i$. Clearly $0 \le Lf_i \le \|f_i\|_\infty$ and $L$ is linear, so we can extend $L$ to a bounded linear functional on $C(S)$. By the Riesz representation theorem (Rudin, 1987), there exists a measure $P$ such that $Lf = \int f\,dP$. Since $\int f_i\,dP_{n'} \to \int f_i\,dP$ for all $f_i$, it is not hard to see, since each $P_{n'}$ has total mass 1, that $\int f\,dP_{n'} \to \int f\,dP$ for all $f \in C(S)$. Therefore $P_{n'}$ converges weakly to $P$. Since $Lf \ge 0$ if $f \ge 0$, $P$ is a positive measure. The function that is identically equal to 1 is bounded and continuous, so $1 = P_{n'}(S) = \int 1\,dP_{n'} \to \int 1\,dP$, or $P(S) = 1$.
Next suppose that $S$ is a Borel subset of a compact metric space $S'$. Extend each $P_n$, initially defined on $S$, to $S'$ by setting $P_n(S' \setminus S) = 0$. By the first paragraph of the proof, there is a subsequence $P_{n'}$ that converges weakly to a probability $P$ on $S'$ (the definition of weak convergence here is relative to the topology on $S'$). Since the $P_n$ are tight, there exist compact subsets $K_m$ of $S$ such that $P_n(K_m) \ge 1 - 1/m$ for all $n$. The $K_m$ will also be compact relative to the topology on $S'$, so by Theorem 30.2,
$$P(K_m) \ge \limsup_{n'} P_{n'}(K_m) \ge 1 - 1/m.$$
Since $\cup_m K_m \subset S$, we conclude $P(S) = 1$. If $G$ is open in $S$, then $G = H \cap S$ for some $H$ open in $S'$. Then
$$\liminf_{n'} P_{n'}(G) = \liminf_{n'} P_{n'}(H) \ge P(H) = P(H \cap S) = P(G).$$
Thus by Theorem 30.2, $P_{n'}$ converges weakly to $P$ relative to the topology on $S$.
Now let $S$ be an arbitrary metric space. Since all the $P_n$'s are supported on $\cup_m K_m$, we can replace $S$ by $\cup_m K_m$, or we may as well assume that $S$ is σ-compact, and hence separable. It remains to embed the separable metric space $S$ into a compact metric space $S'$. If $d$ is the metric on $S$, $d \wedge 1$ will also be an equivalent metric, that is, one that generates the same collection of open sets, so we may assume $d$ is bounded by 1. Now $S$ can be embedded in $S' = [0, 1]^{\mathbb{N}}$ as follows. We define a metric on $S'$ by
$$d'(a, b) = \sum_{i=1}^\infty 2^{-i}\big(|a_i - b_i| \wedge 1\big), \qquad a = (a_1, a_2, \ldots),\ b = (b_1, b_2, \ldots). \qquad (30.3)$$
Being the product of compact spaces, $S'$ is itself compact. If $\{z_j\}$ is a countable dense subset of $S$, let $I : S \to [0, 1]^{\mathbb{N}}$ be defined by
$$I(x) = (d(x, z_1), d(x, z_2), \ldots).$$
We leave it to the reader to check that $I$ is a one-to-one continuous open map of $S$ to a subset of $S'$. Since $S$ is σ-compact, and the continuous image of compact sets is compact, $I(S)$ is a Borel set.
Clearly, Prohorov's theorem is easily modified to handle the case of finite measures on $S$.
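For the embedding at the end of the proof, continuity of $I$ follows from the bound $d'(I(x), I(y)) \le d(x, y) \wedge 1$: each coordinate satisfies $|d(x, z_j) \wedge 1 - d(y, z_j) \wedge 1| \le d(x, y) \wedge 1$ and the weights $2^{-i}$ sum to 1. The Python sketch below illustrates this on $S = \mathbb{R}$ with finitely many centers; truncating $I$ and (30.3) to finitely many coordinates, and the names `embed` and `d_prime`, are assumptions of the sketch.

```python
import numpy as np

def embed(x, centers, dist):
    """Truncation of I(x) = (d(x, z_1), d(x, z_2), ...) to finitely many centers z_j,
    using the bounded metric d ∧ 1 as in the text."""
    return np.array([min(dist(x, z), 1.0) for z in centers])

def d_prime(a, b):
    """Truncation of the metric (30.3) on [0, 1]^N to the first len(a) coordinates."""
    weights = 2.0 ** -np.arange(1, len(a) + 1)
    return float(np.sum(weights * np.minimum(np.abs(a - b), 1.0)))

# S = R with d(x, y) = |x - y|; the centers are a finite piece of a dense set.
centers = np.linspace(-3, 3, 61)
dist = lambda x, y: abs(x - y)
x, y = 0.25, 0.4
print(d_prime(embed(x, centers, dist), embed(y, centers, dist)) <= min(abs(x - y), 1.0))
```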
30.3 Metrics for weak convergence
Since we have defined a notion of convergence of probability measures, one might wonder if one can make the set $M$ of probability measures on $S$ into a metric space so that weak convergence is equivalent to convergence in that metric. This is indeed possible and in fact there are a number of metrics on the space of probability measures that work. We will focus on the Prohorov metric.
Definition 30.5 If $P$ and $Q$ are probability measures on a separable metric space $S$, define
$$d_M(P, Q) = \inf\{\varepsilon > 0 : P(F) \le Q(F_\varepsilon) + \varepsilon \text{ for all } F \text{ closed}\}. \qquad (30.4)$$
It is not immediately obvious that $d_M$ is even a metric, so the first task is to show that it is.
Proposition 30.6 $d_M$ is a metric on $M$.
Proof We start with symmetry, that is, that $d_M(Q, P) = d_M(P, Q)$. Let $\alpha$ be any real number larger than $d_M(P, Q)$. If $H$ is closed, then $H_\alpha = \{x : d(x, H) < \alpha\}$ is open and $K = S \setminus H_\alpha$ is closed. Note that $H \subset S \setminus K_\alpha$, where $K_\alpha = \{x : d(x, K) < \alpha\}$, because if $x \in H$, then $d(x, K) \ge \alpha$, so $x \notin K_\alpha$ and hence $x \in S \setminus K_\alpha$. Since $K$ is closed, by the definition of $d_M(P, Q)$,
$$P(H_\alpha) = 1 - P(K) \ge 1 - Q(K_\alpha) - \alpha = Q(S \setminus K_\alpha) - \alpha \ge Q(H) - \alpha,$$
or $Q(H) \le P(H_\alpha) + \alpha$. Since $H$ was an arbitrary closed set, $d_M(Q, P) \le \alpha$, and it follows that $d_M(Q, P) \le d_M(P, Q)$. Reversing the roles of $P$ and $Q$ shows symmetry.
Clearly $d_M(P, Q) \ge 0$. If $d_M(P, Q) = 0$, then $P(F) \le Q(F_\varepsilon) + \varepsilon$ for every closed set $F$ and every $\varepsilon > 0$; letting $\varepsilon \downarrow 0$ and using $F_\varepsilon \downarrow F$ gives $P(F) \le Q(F)$, and by symmetry $P(F) = Q(F)$ for all closed sets $F$. Since the collection of closed sets is closed under finite intersections and generates the Borel σ-field, it is not hard to see that $P(A) = Q(A)$ for all Borel subsets $A$, and hence $P = Q$.
Finally we prove the triangle inequality. Suppose $P, Q, R \in M$. If $\alpha$ is any real larger than $d_M(P, Q)$ and $\beta$ any real larger than $d_M(Q, R)$, then for any $\varepsilon > 0$ and any closed set $F$
$$P(F) \le Q(F_\alpha) + \alpha \le Q(\overline{F_\alpha}) + \alpha \le R\big((\overline{F_\alpha})_\beta\big) + \alpha + \beta \le R(F_{\alpha+\beta+\varepsilon}) + (\alpha + \beta + \varepsilon).$$
Therefore $d_M(P, R) \le \alpha + \beta + \varepsilon$, and since $\varepsilon$ is arbitrary, the triangle inequality follows.
Now we show that weak convergence is equivalent to convergence in the topology generated by $d_M$, at least if $S$ is separable. ($L^\infty[0, 1]$ is an example of a nonseparable metric space.)
Proposition 30.7 Suppose $S$ is a separable metric space. A sequence of probability measures $P_n$ on $S$ converges weakly to a probability $P$ if and only if $d_M(P_n, P) \to 0$.
Proof We first suppose $d_M(P_n, P) \to 0$ and show that $P_n$ converges weakly to $P$. Separability is not used in this part of the proof. Suppose $F$ is closed and set $\varepsilon_n = d_M(P_n, P) + 1/n$. Since $P_n(F) \le P(F_{\varepsilon_n}) + \varepsilon_n$, then
$$\limsup_n P_n(F) \le \limsup_n P(F_{\varepsilon_n}) = P(F),$$
and we now apply Theorem 30.2(2).
We now suppose $P_n$ converges weakly to $P$. Let $\varepsilon > 0$. Cover $S$ with countably many balls $\{B_i\}$ of diameter less than $\varepsilon/2$ (separability is used here) and let $A_1 = B_1$, $A_2 = B_2 \setminus B_1$, $A_3 = B_3 \setminus (B_1 \cup B_2)$, $A_4 = B_4 \setminus (B_1 \cup B_2 \cup B_3)$, and so on. Hence the $A_i$ form a collection of disjoint sets which cover $S$, and each $A_i$ has diameter less than $\varepsilon/2$. Choose $N$ large enough so that $P(\cup_{i=1}^N A_i) > 1 - \varepsilon/2$. Let $\mathcal{G}$ be the collection of open sets of the form $(A_{i_1} \cup \cdots \cup A_{i_j})_{\varepsilon/2}$ such that $i_1, \ldots, i_j \le N$. That is, we look at all finite unions of $A_1, \ldots, A_N$, and then take the $(\varepsilon/2)$-enlargements. The collection $\mathcal{G}$ is finite. This fact and Theorem 30.2(3) imply that we can find $n_0$ such that $P(G) \le P_n(G) + \varepsilon/2$ if $n \ge n_0$ and $G \in \mathcal{G}$.
Suppose $F$ is closed. Let $G = \big(\cup\{A_i : i \le N, A_i \cap F \ne \emptyset\}\big)_{\varepsilon/2}$. Then $G \in \mathcal{G}$ and if $n \ge n_0$,
$$P(F) \le P(G) + P\big(\cup_{i=N+1}^\infty A_i\big) \le P(G) + \varepsilon/2 \le P_n(G) + \varepsilon \le P_n(F_\varepsilon) + \varepsilon.$$
In the last inequality we used the definition of $G$ and the fact that the $A_i$ have diameters less than $\varepsilon/2$, so that $G \subset F_\varepsilon$. This shows $d_M(P, P_n) \le \varepsilon$ if $n \ge n_0$, which in turn implies $d_M(P, P_n) \to 0$.
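For measures with finitely many atoms on the real line, (30.4) can be evaluated by brute force. If $P$ is supported on a finite set $A_0$, it suffices to check closed sets $A \subset A_0$, because $P(F) = P(F \cap A_0)$ while $Q(F_\varepsilon)$ can only shrink when $F$ is replaced by $F \cap A_0$; and since the condition in (30.4) only gets easier as $\varepsilon$ grows (and always holds at $\varepsilon = 1$), the infimum can be located by bisection on $[0, 1]$. This Python sketch is exponential in the number of atoms, so it is only meant for tiny examples; the helper names are hypothetical.

```python
from itertools import combinations

def _condition_holds(P, Q, eps):
    """Check P(A) <= Q(A_eps) + eps for every nonempty A in the support of P.
    P and Q are dicts {point: mass} on the real line; A_eps = {y : dist(y, A) < eps}."""
    atoms = list(P)
    for r in range(1, len(atoms) + 1):
        for A in combinations(atoms, r):
            p_A = sum(P[a] for a in A)
            q_A_eps = sum(m for y, m in Q.items() if min(abs(y - a) for a in A) < eps)
            if p_A > q_A_eps + eps:
                return False
    return True

def prohorov_distance(P, Q, iters=60):
    """Bisection for d_M(P, Q) in (30.4); eps = 1 always satisfies the condition."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _condition_holds(P, Q, mid):
            hi = mid
        else:
            lo = mid
    return hi

print(prohorov_distance({0.0: 1.0}, {0.3: 1.0}))  # close to 0.3
print(prohorov_distance({0.0: 1.0}, {2.0: 1.0}))  # 1.0
```

For example, $d_M(\delta_{1/n}, \delta_0) = 1/n$, so $\delta_{1/n} \to \delta_0$ in this metric, consistent with Proposition 30.7.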
Exercises
30.1 If $S$ is a metric space, then it is well known that $C(S)$, the collection of continuous functions with the metric
$$d(f, g) = \sup_{x \in S} |f(x) - g(x)|,$$
is a metric space. Show that if $S$ is compact, then $C(S)$ is separable.
30.2 Suppose $X_n$ converges weakly to $X$ and the random variables $Z_n$ are such that $d(X_n, Z_n)$ converges to 0 in probability. Prove that $Z_n$ converges weakly to $X$. This is known as Slutsky's theorem. Hint: Start with $P(Z_n \in F) \le P(X_n \in F_\delta) + P(d(X_n, Z_n) \ge \delta)$.
30.3 Suppose $X_n$ take values in a normed linear space and converge weakly to $X$. Suppose $c_n$ are scalars converging to $c$. Show $c_nX_n$ converges weakly to $cX$.
30.4 Give an example of a sequence $P_n$ converging weakly to $P$ and a function $f$ that is continuous but not bounded such that $\int f\,dP_n$ does not converge to $\int f\,dP$.
30.5 Give an example of a sequence $P_n$ converging weakly to $P$ and a function $f$ that is bounded but not continuous such that $\int f\,dP_n$ does not converge to $\int f\,dP$.
30.6 Show that if $X_n$ converges weakly to $X$ and $Y_n$ converges in probability to 0, then $X_nY_n$ converges in probability to 0.
30.7 This exercise considers a sequence of probability measures that have densities. Suppose $S$ is furnished with the Borel σ-field and $\mu$ is a measure on $S$. Suppose that $f_n : S \to [0, \infty)$ and $f : S \to [0, \infty)$ are measurable functions, each of whose integral over $S$ is one, and define $P_n(A) = \int_A f_n(x)\,\mu(dx)$ for each $n$ and $P(A) = \int_A f(x)\,\mu(dx)$. (1) Show that if $f_n \to f$, $\mu$-a.e., then $P_n$ converges weakly to $P$. (2) Give an example where $P_n$ and $P$ are as above, $P_n$ converges weakly to $P$, but $f_n$ does not converge almost everywhere to $f$.
30.8 Give an example of continuous processes $X_n$ and $X$ such that all the finite-dimensional distributions of $X_n$ converge weakly to the corresponding finite-dimensional distributions of $X$, but where $X_n$ does not converge weakly to $X$ with respect to the topology of $C[0, 1]$.
30.9 Suppose $X$ is a random variable taking values in a complete separable metric space. If $\varepsilon > 0$, show there exists a compact set $K$ such that $P(X \notin K) < \varepsilon$. Hint: For each $n$ choose closed balls $\{B_{ni}, i = 1, \ldots, N_n\}$ of radius $1/n$ such that
$$P\Big(X \notin \bigcup_{i=1}^{N_n} B_{ni}\Big) < \varepsilon/2^{n+1}.$$
Then $K = \bigcap_{n=1}^\infty \bigcup_{i=1}^{N_n} B_{ni}$ is closed and totally bounded, hence compact.
30.10 Suppose $X_n$ converges weakly to $X$ and the metric space $S$ is complete and separable. Prove that the sequence $\{X_n\}$ is tight.
30.11 Let $\mathcal{L}$ be the collection of continuous functions $f$ on $S$ such that (1) $\sup_{x\in S} |f(x)| \le 1$; (2) $|f(x) - f(y)| \le d(x, y)$ for all $x, y \in S$. Define
$$d_L(P, Q) = \sup_{f \in \mathcal{L}} \Big|\int f\,dP - \int f\,dQ\Big|.$$
Show that $d_L$ is a metric on the collection of probability measures on the Borel σ-field of $S$. Prove that a sequence of probability measures $P_n$ converges weakly to $P$ if and only if $d_L(P_n, P) \to 0$.
30.12 Suppose $S$ is a separable metric space. Show that $M$ is separable.
Notes For more information, see Billingsley (1968) and Ethier and Kurtz (1986).
31 Skorokhod representation
Suppose $S$ is a complete separable metric space furnished with the Borel σ-field. We are going to show that if $X_n$ are random variables taking values in $S$ converging weakly to a random variable $X$, then we can find another probability space and other random variables $X_n'$, $X'$ such that the law of $X_n'$ equals the law of $X_n$ for each $n$, the law of $X'$ equals the law of $X$, and $X_n'$ converges to $X'$ almost surely.
Let $\Omega' = [0, 1]$, $\mathcal{F}'$ the Borel σ-field on $[0, 1]$, and $P'$ Lebesgue measure. We first prove
Theorem 31.1 Let $P$ be a probability measure on $S$. Then there exists a random variable $X$ mapping $\Omega'$ to $S$ such that the law of $X$ under $P'$ is equal to $P$.
Proof For each $k \ge 1$, let $\{A_{ki}\}$ be a countable disjoint covering of $S$ by Borel sets of diameter less than $1/k$, such that $P(\partial A_{ki}) = 0$, and $\{A_{ki}\}$ is a refinement of $\{A_{k-1,i}\}$. We can construct these families inductively. To start, cover $S$ with countably many balls of radius less than $1/2$ (this is possible because $S$ is separable). Since for each $x_0$, $P(\{x : d(x, x_0) = r\})$ can be nonzero for at most countably many values of $r$, we can arrange matters so that the $P$-measure of the boundary of each of these balls is 0. We order the balls $B_1, B_2, \ldots$, and then let $A_{11} = B_1$, $A_{12} = B_2 \setminus B_1$, $A_{13} = B_3 \setminus (B_1 \cup B_2)$, and so on. To construct $\{A_{2i}\}$, we first find a similar covering of $S$ by sets $\{A_{2i}'\}$ of diameter less than $1/2$, and then take all intersections of sets in $\{A_{2i}'\}$ with sets in $\{A_{1j}\}$.
We inductively define closed subintervals of $[0, 1]$ by choosing $I_{11}$ to have left endpoint at 0 and length equal to $P(A_{11})$, then $I_{12}$ to have left endpoint equal to the right endpoint of $I_{11}$ and length equal to $P(A_{12})$, and so forth. We then decompose $I_{11}$ into subintervals $\{I_{2i}\}$ in an analogous way so that the lengths of the subintervals match the probabilities of the $A_{2i}$'s contained in $A_{11}$. We then subdivide $I_{12}$, and so on. We observe that $\{I_{ki}\}$ is a refinement of $\{I_{k-1,i}\}$ for all $k \ge 2$ and $P'(I_{ki}) = P(A_{ki})$ for all $k$ and $i$. Pick a point $x_{ki} \in A_{ki}$ for each $k$ and $i$.
We define $X^k$ by setting $X^k(\omega)$ equal to $x_{ki}$ if $\omega \in I_{ki}$. (The set of endpoints of the $I_{ki}$ is countable, hence has Lebesgue measure 0, and it doesn't matter how we define $X^k$ at those points.) For each $\omega$ except those that are endpoints of some $I_{ki}$, if $n \ge m$, then $X^n(\omega)$ and $X^m(\omega)$ are in the same $A_{mi}$ for some $i$. Since the diameter of $A_{mi}$ is less than $1/m$, we see that $d(X^n(\omega), X^m(\omega)) \le 1/m$. That is, $X^n(\omega)$ is a Cauchy sequence. The space $S$ is complete, so we can define $X(\omega)$ to be the limit of the $X^n(\omega)$. The collection of endpoints of the $I_{mi}$ is countable, so the limit exists for almost every $\omega$.
It remains to show that the law of $X$ under $P'$ is $P$. Let $F$ be a closed set, let $F_k = \{x : d(x, F) < 1/k\}$, and let $J_k = \{i : A_{ki} \cap F \ne \emptyset\}$. We have
$$P'(X^k \in F) \le P'\Big(X^k \in \bigcup_{i\in J_k} A_{ki}\Big) \le \sum_{i\in J_k} P'(X^k \in A_{ki}) = \sum_{i\in J_k} P'(I_{ki}) = \sum_{i\in J_k} P(A_{ki}) \le P(F_k).$$
We used the fact that each $A_{ki}$ has diameter less than $1/k$. Hence
$$\limsup_k P'(X^k \in F) \le P(F).$$
Therefore the laws of $X^k$ under $P'$ converge weakly to $P$. But we know $d(X^k(\omega), X(\omega)) \le 1/k$, so $X^k$ converges to $X$, a.s., with respect to $P'$. If $f$ is continuous and bounded, $E f(X^k) \to E f(X)$ by dominated convergence, so $X^k \to X$ weakly. Since weak limits are unique, the law of $X$ under $P'$ is equal to $P$.
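When $S$ has finitely many points, a single level of the construction in the proof of Theorem 31.1 already produces the required random variable: cut $[0, 1]$ into consecutive intervals whose lengths are the point masses. The Python sketch below does this with `numpy.searchsorted`; using a fine deterministic grid in place of Lebesgue measure, and the name `interval_map`, are assumptions of the illustration.

```python
import numpy as np

def interval_map(points, probs):
    """One level of the construction: [0, 1] is cut into consecutive intervals
    I_1, I_2, ... of lengths probs[i], and X(omega) = points[i] on I_i."""
    points = np.asarray(points)
    right = np.cumsum(probs)  # right endpoints of I_1, I_2, ...
    def X(omega):
        idx = np.searchsorted(right, omega, side="right")
        return points[np.minimum(idx, len(points) - 1)]  # endpoints have measure zero
    return X

X = interval_map(["a", "b", "c"], [0.2, 0.5, 0.3])
omega = (np.arange(100_000) + 0.5) / 100_000  # a fine grid standing in for Lebesgue measure
values = X(omega)
print({s: float(np.mean(values == s)) for s in ["a", "b", "c"]})
```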
We did not need the fact that the $A_{ki}$ were continuity sets, i.e., that the probability of the boundary of $A_{ki}$ is zero, but this will be used in the next theorem, which is known as the Skorokhod representation.
Theorem 31.2 Suppose $P_n$ are probability measures on $S$ converging weakly to $P$. Then there exist random variables $X_n$ mapping $\Omega'$ to $S$ with laws $P_n$ and a random variable $X$ mapping $\Omega'$ to $S$ with law $P$ such that $X_n \to X$, a.s. Equivalently, if $X_n$ converges to $X$ weakly, there exist random variables $X_n'$ and $X'$ mapping $\Omega'$ to $S$ with laws equal to those of $X_n$ and $X$, respectively, such that $X_n' \to X'$, a.s.
Proof Let the $A_{ki}$ be as in the proof of the previous theorem, and for each $P_n$ define intervals $I_{ki}^n$ and random variables $X_n^k$ as was done above, and let $X_n$ be the limit of the $X_n^k$'s. Let $K_{kn} = \{i : P(A_{ki}) > P_n(A_{ki})\}$ and $K_{kn}^c = \{i : P(A_{ki}) \le P_n(A_{ki})\}$. Since
$$\sum_i \big[P(A_{ki}) - P_n(A_{ki})\big] = 1 - 1 = 0,$$
we have
$$\sum_{i \in K_{kn}} \big[P(A_{ki}) - P_n(A_{ki})\big] = -\sum_{i \in K_{kn}^c} \big[P(A_{ki}) - P_n(A_{ki})\big].$$
Hence
$$\sum_i |P'(I_{ki}) - P'(I_{ki}^n)| = \sum_i |P(A_{ki}) - P_n(A_{ki})| = \sum_{i\in K_{kn}} \big[P(A_{ki}) - P_n(A_{ki})\big] - \sum_{i\in K_{kn}^c} \big[P(A_{ki}) - P_n(A_{ki})\big] = 2\sum_{i\in K_{kn}} \big[P(A_{ki}) - P_n(A_{ki})\big] = 2\sum_i \big[P(A_{ki}) - P_n(A_{ki})\big]^+. \qquad (31.1)$$
Each term in the sum on the last line goes to 0 as n → ∞ by Theorem 30.2 because the Aki are P-continuity sets, that is, P(∂Aki ) = 0; also each term is dominated by P(Aki ), and
$\sum_i P(A_{ki}) = 1$. Therefore by dominated convergence the sum on the last line of (31.1) goes to 0.
Fix $k$ and $j$ and let $\alpha$, $\alpha_n$ be the left-hand endpoints of $I_{kj}$, $I_{kj}^n$, respectively. Then (31.1) allows us to use dominated convergence to conclude that
$$\alpha = \sum_{i\in J} P'(I_{ki}) = \lim_{n\to\infty} \sum_{i\in J} P'(I_{ki}^n) = \lim_{n\to\infty} \alpha_n,$$
where $J$ consists of those $i$ such that $I_{ki}$ is to the left of $I_{kj}$; note that for $i \in J$ we have that $I_{ki}^n$ is to the left of $I_{kj}^n$ and conversely, if $I_{ki}^n$ is to the left of $I_{kj}^n$, then $i \in J$. Similarly the right-hand endpoint of $I_{kj}^n$ converges to the right-hand endpoint of $I_{kj}$. If $\omega$ is in the interior of $I_{kj}$, then it will be in the interior of $I_{kj}^n$ for all sufficiently large $n$, so that $X^k(\omega) = X_n^k(\omega) = x_{kj}$. Since $d(X^k, X) \le 1/k$ and $d(X_n^k, X_n) \le 1/k$, this means that for $n$ sufficiently large, $d(X(\omega), X_n(\omega)) \le 2/k$. As $k$ is arbitrary and, for each $k$, almost every $\omega$ lies in the interior of some $I_{kj}$, this implies our result.
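On the real line there is a shortcut to Theorem 31.2 that is not the construction used in the proof: take $X_n(\omega) = \inf\{x : F_n(x) \ge \omega\}$ and $X(\omega) = \inf\{x : F(x) \ge \omega\}$ on $(0, 1)$, where $F_n$ and $F$ are the distribution functions. This standard quantile coupling is swapped in here only because it is easy to code. In the Python sketch below (helper name hypothetical), $P_n$ puts mass $1/n$ at each of $1/n, 2/n, \ldots, 1$, $P$ is Lebesgue measure on $[0, 1]$ so that $X(\omega) = \omega$, and $|X_n(\omega) - \omega| < 1/n$.

```python
import numpy as np

def quantile_coupling(atoms, probs):
    """Return omega -> inf{x : F(x) >= omega} for a discrete law on the real line
    (atoms sorted increasingly). Under Lebesgue measure on (0, 1) it has law probs."""
    atoms = np.asarray(atoms, dtype=float)
    cdf = np.cumsum(probs)
    def X(omega):
        idx = np.searchsorted(cdf, omega, side="left")  # first i with F(atom_i) >= omega
        return atoms[np.minimum(idx, len(atoms) - 1)]
    return X

omega = (np.arange(10_000) + 0.5) / 10_000
for n in (10, 100, 1000):
    Xn = quantile_coupling(np.arange(1, n + 1) / n, np.full(n, 1.0 / n))  # P_n: uniform on {1/n, ..., 1}
    print(n, np.max(np.abs(Xn(omega) - omega)))  # X(omega) = omega has the Lebesgue law
```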
Exercises
31.1 Suppose $f$ is bounded, $X_n$ converges to $X$ weakly, and also that $P(X \in D_f) = 0$, where $D_f = \{x : f \text{ is not continuous at } x\}$. Show that $f(X_n)$ converges weakly to $f(X)$.
31.2 Suppose a sequence $\{X_n\}$ is uniformly integrable and $X_n$ converges to $X$ weakly. Show $E X_n \to E X$.
31.3 Give an example of a sequence of random variables $X_n$ converging weakly to $X$ and where each $X_n$ is integrable, but $X$ is not integrable.
31.4 Suppose $X_n$ converges weakly to $X$ and each $X_n$ is non-negative. Prove that $E X \le \liminf_{n\to\infty} E X_n$.
31.5 Suppose $X_n$ converges weakly to $X$ and each $X_n$ has the property that with probability one,
$$|X_n(t) - X_n(s)| \le |t - s|, \qquad s, t \le 1.$$
(This might arise, for example, if each $X_n$ is of the form $X_n(t) = \int_0^t Y_n(s)\,ds$ and each $Y_n$ is bounded by 1.) Prove that $X$ has this same property, that is, with probability one,
$$|X(t) - X(s)| \le |t - s|, \qquad s, t \le 1.$$
31.6 Here is a way to prove one direction of Lebesgue's theorem on Riemann integrable functions. (1) For each $n \ge 1$ and each $i \le n$, let $x_{in}$ be a point in $[(i-1)/n, i/n)$. Let $P_n$ be the probability measure that assigns mass $1/n$ to each point $x_{in}$, $i = 1, 2, \ldots, n$. Show that $P_n$ converges weakly to $P$, where $P$ is Lebesgue measure on $[0, 1]$. (2) Suppose $f$ is a bounded function which is continuous at almost every point of $[0, 1]$. Show that $\int f\,dP_n \to \int f\,dP$. Note that $\int f\,dP_n$ is a Riemann sum approximation to $\int_0^1 f(x)\,dx$.
32 The space C[0, 1]
We examine weak convergence for the space C[0, 1], the set of continuous real-valued functions on [0, 1]. We give a criterion for the laws of a sequence of continuous stochastic processes to be tight. We apply these results to show that a simple symmetric random walk converges weakly to a Brownian motion, which in particular gives another construction of Brownian motion.
32.1 Tightness
Let $C[0, 1]$ be the collection of continuous real-valued functions from $[0, 1]$ into $\mathbb{R}$. We make $C[0, 1]$ into a metric space by defining
$$d(f, g) = \sup_{t\in[0,1]} |f(t) - g(t)|,$$
and it is well known that $C[0, 1]$ is separable and complete. We recall the Ascoli–Arzelà theorem: if a family $\mathcal{F}$ of functions on $[0, 1]$ is equicontinuous and uniformly bounded at one point, then every sequence in $\mathcal{F}$ has a subsequence that converges uniformly. Rephrased another way, if the family $\mathcal{F}$ is equicontinuous and uniformly bounded at one point, then the closure of $\mathcal{F}$ is compact. We furnish $C[0, 1]$ with the Borel σ-field. Given a continuous function $f$ on $[0, 1]$, we define $\omega_f$, the modulus of continuity of $f$, by
$$\omega_f(\delta) = \sup_{s,t\in[0,1],\ |t-s|<\delta} |f(t) - f(s)|.$$
We have the following criterion for a sequence of continuous processes to be tight.
Theorem 32.1 Suppose the $X_n$ are continuous real-valued processes. Suppose for each $\varepsilon > 0$ and $\eta > 0$ there exist $n_0$, $A$, and $\delta$ (depending on $\varepsilon$ and $\eta$) such that if $n \ge n_0$, then
$$P(\omega_{X_n}(\delta) \ge \varepsilon) \le \eta \qquad (32.1)$$
and
$$P(|X_n(0)| \ge A) \le \eta. \qquad (32.2)$$
Then the $X_n$ are tight.
Proof Since each $X_i$ is a continuous process, then for each $i$, $P(\omega_{X_i}(\delta) \ge \varepsilon) \to 0$ as $\delta \to 0$ by dominated convergence, and $P(|X_i(0)| \ge A) \to 0$ as $A \to \infty$. Hence, given $\varepsilon$ and $\eta$ we can, by taking $\delta$ smaller and $A$ larger if necessary, assume that (32.1) and (32.2) hold for all $n$.
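Choose $\varepsilon_m = \eta_m = 2^{-m}$ and choose $\delta_m$ and $A_m$ so that
$$\sup_n P(\omega_{X_n}(\delta_m) \ge 2^{-m}) \le 2^{-m} \quad\text{and}\quad \sup_n P(|X_n(0)| \ge A_m) \le 2^{-m}.$$
Let
$$K_{m_0} = \Big\{f \in C[0, 1] : \sup_{s,t\in[0,1],\ |t-s|\le\delta_m} |f(t) - f(s)| \le 2^{-m} \text{ for all } m \ge m_0,\ |f(0)| \le A_{m_0}\Big\}.$$
Each $K_{m_0}$ is a closed equicontinuous family that is uniformly bounded at 0, so by the Ascoli–Arzelà theorem, each $K_{m_0}$ is a compact subset of $C[0, 1]$. We have
$$P(X_n \notin K_{m_0}) \le P(|X_n(0)| \ge A_{m_0}) + \sum_{m=m_0}^\infty P(\omega_{X_n}(\delta_m) \ge \varepsilon_m) \le 2^{-m_0} + \sum_{m=m_0}^\infty 2^{-m} = 3 \cdot 2^{-m_0}.$$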
This proves tightness. We have given one criterion for a process to have continuous paths, namely, Theorem 8.1. In the case of Markov processes, we have given another: Theorem 21.5.
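The modulus of continuity $\omega_f(\delta)$ that appears in Theorem 32.1 can be approximated for a path sampled on a grid. The Python sketch below restricts $s$ and $t$ to the sample times, so for a continuous $f$ it returns a lower bound for $\omega_f(\delta)$; the function name `modulus_on_grid` is a hypothetical choice.

```python
import numpy as np

def modulus_on_grid(times, values, delta):
    """Grid version of omega_f(delta) = sup{|f(t) - f(s)| : |t - s| < delta},
    with s and t restricted to the (sorted) sample times."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    best = 0.0
    for i in range(len(times)):
        # times[i:j] are exactly the sample times t with times[i] <= t < times[i] + delta
        j = np.searchsorted(times, times[i] + delta, side="left")
        best = max(best, float(np.max(np.abs(values[i:j] - values[i]))))
    return best

t = np.linspace(0.0, 1.0, 101)
print(modulus_on_grid(t, t, 0.105))  # f(t) = t on a grid of step 0.01
```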
32.2 A construction of Brownian motion
We will now use the results of Section 32.1 to give a construction of Brownian motion, quite different from that of Chapter 6. Let $Y_i$ be i.i.d. random variables with $P(Y_i = 1) = P(Y_i = -1) = \frac12$. Then $S_n = \sum_{i=1}^n Y_i$ is a simple symmetric random walk. Let $Z_n(t) = S_{nt}/\sqrt{n}$ for $t$ a multiple of $1/n$ and define $Z_n(t)$ by linear interpolation for other $t$. That is, if $k/n \le t \le (k+1)/n$, then
$$Z_n(t) = \frac{(k+1) - nt}{\sqrt{n}}\,S_k + \frac{nt - k}{\sqrt{n}}\,S_{k+1}. \qquad (32.3)$$
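Formula (32.3) is ordinary piecewise linear interpolation of the rescaled walk at the points $k/n$, so `numpy.interp` reproduces it directly. In this Python sketch the function name `donsker_path` and the seed are illustrative assumptions.

```python
import numpy as np

def donsker_path(n, rng):
    """Return a function t -> Z_n(t) from (32.3), plus the walk S_0, ..., S_n."""
    steps = rng.choice([-1, 1], size=n)              # the Y_i
    S = np.concatenate(([0], np.cumsum(steps)))      # S_0 = 0, S_1, ..., S_n
    grid = np.arange(n + 1) / n                      # the points k/n
    Z = lambda t: np.interp(t, grid, S / np.sqrt(n)) # linear interpolation
    return Z, S

rng = np.random.default_rng(0)
n = 400
Z, S = donsker_path(n, rng)
t = np.linspace(0.0, 1.0, 2001)
print(Z(0.0), np.isclose(Z(1.0), S[-1] / np.sqrt(n)))
```

Between grid points the slope is $\pm\sqrt{n}$, which is the bound $|Z_n(t) - Z_n(s)| \le \sqrt{n}\,|t - s|$ used for (32.6).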
The $Z_n$ are continuous processes. Let $P_n$ be the law of $Z_n$, which will be a probability measure on $C[0, 1]$.
Theorem 32.2 The sequence $P_n$ converges weakly to a probability measure $P_\infty$ on $C[0, 1]$, and $P_\infty$ is the law of a Brownian motion.
Proof The main step is to prove that the $P_n$ are tight. We then show that any subsequential limit point is a Wiener measure, that is, the law of a Brownian motion. Since the sequence is tight and every subsequential limit is the same measure, the whole sequence converges to it. We can then appeal to Theorem 31.1 to obtain the process $X$, which will be a Brownian motion.
A computation shows that
$$E S_n^4 = \sum_{i=1}^n E Y_i^4 + 3\sum_{i \ne j} (E Y_i^2)(E Y_j^2) \le cn^2, \qquad (32.4)$$
since $E Y_i$ and $E Y_i^3$ are both 0, the $Y_i$'s are independent, and the second sum has $n(n-1) \le n^2$ terms. If $s$ and $t$ are multiples of $1/n$, then
$$E|Z_n(t) - Z_n(s)|^4 = \frac{1}{n^2}\,E\Big|\sum_{i=ns+1}^{nt} Y_i\Big|^4 = \frac{1}{n^2}\,E\Big|\sum_{i=1}^{nt-ns} Y_i\Big|^4 \le \frac{c}{n^2}\,n^2|t - s|^2 \le c|t - s|^2. \qquad (32.5)$$
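The constant in (32.4) can be made explicit: expanding $(\sum_i Y_i)^4$, only the terms $Y_i^4$ and $Y_i^2Y_j^2$ have nonzero expectation, and each unordered pair $\{i, j\}$ appears $\binom{4}{2} = 6$ times, so $E S_n^4 = n + 3n(n-1) = 3n^2 - 2n \le 3n^2$. The Python sketch below confirms this for small $n$ by enumerating all $2^n$ equally likely sign patterns; the helper name is hypothetical.

```python
from itertools import product

def fourth_moment(n):
    """E S_n^4 for the simple symmetric random walk, by averaging over all 2^n sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total // 2 ** n  # exact: the average is an integer

for n in range(1, 11):
    assert fourth_moment(n) == 3 * n**2 - 2 * n  # = n + 3n(n - 1)
```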
If we tried to get by with only the second moment, we would only end up with $c|t - s|$, which is not good enough for Theorem 8.1. At this point we would like to apply Theorem 32.1, but we have the technical nuisance that $s$ and $t$ might not be multiples of $1/n$. If $|t - s| \le 2/n$, then by the construction of $Z_n$ using linear interpolation and the fact that the $Y_i$'s are bounded by one in absolute value, we have $|Z_n(t) - Z_n(s)| \le c|t - s|\sqrt{n}$ and then
$$E|Z_n(t) - Z_n(s)|^4 \le c|t - s|^4n^2 \le c|t - s|^2. \qquad (32.6)$$
Suppose $|t - s| > 2/n$. Let $s'$ be the largest multiple of $1/n$ less than or equal to $s$ and $t'$ the smallest multiple of $1/n$ greater than or equal to $t$. Using (32.5) and (32.6),
$$E|Z_n(t) - Z_n(s)|^4 \le cE|Z_n(t) - Z_n(t')|^4 + cE|Z_n(t') - Z_n(s')|^4 + cE|Z_n(s') - Z_n(s)|^4 \le c|t - t'|^2 + c|t' - s'|^2 + c|s' - s|^2 \le c|t - s|^2,$$
since $|t - t'|$, $|t' - s'|$, and $|s' - s|$ are all less than $c|t - s|$. Note $Z_n(0) = 0$ for all $n$. We now apply Theorems 8.1 and 32.1 to obtain the tightness.
Any subsequential limit point is a probability measure on $C[0, 1]$, so to show that the limit is a Brownian motion, it is enough by Theorem 2.6 to show that the finite-dimensional distributions under the limit law $P_\infty$ agree with those of Brownian motion. Fix $t > 0$. Then $Z_n(t)$ differs from $S_{[nt]}/\sqrt{n}$ by at most $1/\sqrt{n}$, where $[nt]$ is the largest integer less than or equal to $nt$. By the central limit theorem (Theorem A.51), $S_{[nt]}/\sqrt{[nt]}$ converges weakly (with respect to the topology of $\mathbb{R}$) to a mean zero normal random variable with variance one. Since $\sqrt{[nt]/n} \to \sqrt{t}$, Exercise 30.3 shows that $S_{[nt]}/\sqrt{n}$ converges weakly to a mean zero normal random variable with variance $t$, and by Exercise 30.2, $Z_n(t)$ converges weakly to a mean zero normal random variable with variance $t$. This shows that the one-dimensional distributions of $Z_n$ converge weakly to the one-dimensional distributions of a Brownian motion. We leave the analogous argument for the higher-dimensional distributions to the reader.
One can also use Doob's inequalities to obtain the necessary tightness estimate. If $s$ and $t$ are multiples of $1/n$, we have
$$P\Big(\max_{ns \le k \le nt} |S_k - S_{ns}| > \lambda\sqrt{n}\Big) \le c\,\frac{E|S_{nt} - S_{ns}|^4}{\lambda^4n^2} \le c\,\frac{|t - s|^2}{\lambda^4}. \qquad (32.7)$$
Exercises
32.1 The support of a measure $\lambda$ is the smallest closed set $F$ such that $\lambda(F^c) = 0$. Let $P$ be a Wiener measure on $C[0, 1]$, i.e., the law of a Brownian motion on $[0, 1]$. Use Exercise 13.4 to prove that the support of $P$ is all of $C[0, 1]$.
32.2 Let $(S, d)$ be a complete separable metric space and let $R$ be a subset of $S$. Then $(R, d)$ is also a metric space. If $X_n$ converges weakly to $X$ with respect to the topology of $(S, d)$ and each $X_n$ and $X$ take values in $R$, does $X_n$ converge weakly to $X$ with respect to the topology of $(R, d)$? Does the answer change if $R$ is a closed subset of $S$? If $X_n$ and $X$ take values in $R$ and $X_n$ converges weakly to $X$ with respect to the topology of $(R, d)$, does $X_n$ converge weakly to $X$ with respect to the topology of $(S, d)$? What if $R$ is a closed subset of $S$?
32.3 Give a proof of Theorem 32.2 using (32.7) in place of Theorem 8.1.
32.4 Suppose $(X, W, P)$ is a weak solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad X_0 = x, \qquad (32.8)$$
where $W$ is a one-dimensional Brownian motion and $\sigma$ and $b$ are bounded and continuous, but we do not assume that $\sigma$ is bounded below by a positive constant. Suppose the solution to (32.8) is unique in law. Suppose $\sigma_n$ and $b_n$ are Lipschitz functions which are uniformly bounded and which converge uniformly to $\sigma$ and $b$, respectively. Let $X_t(n)$ be the unique pathwise solution to
$$dY_t = \sigma_n(Y_t)\,dW_t + b_n(Y_t)\,dt, \qquad Y_0 = x;$$
the probability measure here is $P$. Prove that $X(n)$ converges weakly to $X$ with respect to $C[0, 1]$.
32.5 Let $W$ be a $d$-dimensional Brownian motion and let $\{X_t, t \in [0, 1]\}$ be the solution to (24.22). If $x \in \mathbb{R}^d$, prove that the support of $P^x$ is all of $C[0, 1]$.
33 Gaussian processes
A Gaussian process is a stochastic process each of whose finite-dimensional distributions is jointly normal. We will primarily, but not exclusively, be concerned with Gaussian processes that have continuous paths. For much of what we consider, the index set of times need not be $[0,\infty)$; it can in fact be almost any set. We will thus consider $\{X_t : t\in T\}$ for some index set $T$, where for every finite subset $S$ of $T$ the collection $\{X_s : s\in S\}$ is jointly normal.
33.1 Reproducing kernel Hilbert spaces

We define the covariance function by
$$\Gamma(s,t) = E[(X_s - EX_s)(X_t - EX_t)], \qquad s,t\in T. \tag{33.1}$$
For our purposes, a non-zero mean just complicates formulas without adding anything interesting. So in this chapter we will assume $EX_t = 0$ for all $t\in T$, and (33.1) becomes
$$\Gamma(s,t) = E[X_sX_t], \qquad s,t\in T. \tag{33.2}$$
We first show how $\Gamma$ can be used to construct a Hilbert space called the reproducing kernel Hilbert space (RKHS). When we write $\Gamma(s,\cdot)$, we fix an element $s\in T$ and consider the function $g: T\to\mathbb R$ defined by $g(t)=\Gamma(s,t)$ for $t\in T$. Let $K$ be the collection of finite linear combinations of the functions $\Gamma(s,\cdot)$, $s\in T$. Thus each element of $K$ has the form
$$\sum_{j=1}^m a_j\Gamma(s_j,\cdot),$$
where $m\ge1$, the $a_j$'s are real, and each $s_j$, $j=1,\dots,m$, is an element of $T$. If $f=\sum_{j=1}^m a_j\Gamma(s_j,\cdot)$ and $g=\sum_{k=1}^n b_k\Gamma(t_k,\cdot)$, define
$$\langle f,g\rangle_{RKHS} = \sum_{j=1}^m\sum_{k=1}^n a_jb_k\Gamma(s_j,t_k).$$
We define $H$ to be the closure of $K$ with respect to the norm induced by the inner product $\langle\cdot,\cdot\rangle_{RKHS}$. We need to show three things: that this bilinear form is indeed an inner product, that what is known as the reproducing property holds, and that $H$ is a Hilbert space.
We start with the reproducing property. If $f=\sum_{j=1}^m a_j\Gamma(s_j,\cdot)$, then the reproducing property applied to $f$ is the formula
$$\langle f,\Gamma(t,\cdot)\rangle_{RKHS} = f(t). \tag{33.3}$$
This follows from
$$\langle f,\Gamma(t,\cdot)\rangle_{RKHS} = \sum_{j=1}^m a_j\Gamma(s_j,t) = f(t).$$
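By taking limits, (33.3) holds for all $f\in H$. To show that $\langle\cdot,\cdot\rangle_{RKHS}$ is an inner product, notice that when $f=\sum_{j=1}^m a_j\Gamma(s_j,\cdot)\in K$,
$$\langle f,f\rangle_{RKHS} = \sum_{j=1}^m\sum_{k=1}^m a_ja_k\Gamma(s_j,s_k) = \sum_{j,k=1}^m a_ja_kE[X_{s_j}X_{s_k}] = E\Big[\Big(\sum_{j=1}^m a_jX_{s_j}\Big)^2\Big] \ge 0.$$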
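The reproducing property (33.3) and the non-negativity computation can be checked on finite combinations $f=\sum_j a_j\Gamma(s_j,\cdot)$. This is a small Python sketch under our own assumptions. We take the Brownian motion kernel $\Gamma(s,t)=s\wedge t$ as the example. The helper names `bm_kernel`, `rkhs_inner`, and `evaluate`, and the coefficients and points, are hypothetical choices made for illustration.

```python
from typing import Callable, Sequence

Kernel = Callable[[float, float], float]


def bm_kernel(s: float, t: float) -> float:
    """Covariance kernel of Brownian motion, Gamma(s, t) = min(s, t)."""
    return min(s, t)


def rkhs_inner(kernel: Kernel, a: Sequence[float], s: Sequence[float],
               b: Sequence[float], t: Sequence[float]) -> float:
    """<f, g>_RKHS for f = sum_j a_j Gamma(s_j, .) and g = sum_k b_k Gamma(t_k, .)."""
    return sum(aj * bk * kernel(sj, tk)
               for aj, sj in zip(a, s)
               for bk, tk in zip(b, t))


def evaluate(kernel: Kernel, a: Sequence[float], s: Sequence[float], x: float) -> float:
    """Pointwise value f(x) = sum_j a_j Gamma(s_j, x)."""
    return sum(aj * kernel(sj, x) for aj, sj in zip(a, s))


a, s = [1.0, -2.0, 0.5], [0.2, 0.5, 0.9]

# Reproducing property (33.3): pairing f with Gamma(x, .) returns f(x).
for x in [0.0, 0.3, 0.7, 1.0]:
    lhs = rkhs_inner(bm_kernel, a, s, [1.0], [x])
    assert abs(lhs - evaluate(bm_kernel, a, s, x)) < 1e-12

# Non-negativity: <f, f> = E[(sum_j a_j X_{s_j})^2] >= 0.
assert rkhs_inner(bm_kernel, a, s, a, s) >= 0.0
```

Any other symmetric non-negative definite kernel can be passed in place of `bm_kernel`; only finite combinations (elements of $K$) are handled, not limits in $H$.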
The Cauchy–Schwarz inequality holds for $\langle\cdot,\cdot\rangle_{RKHS}$ (the standard proof of the Cauchy–Schwarz inequality applies). So if $\langle f,f\rangle_{RKHS}=0$, then
$$|f(t)|^2 = \langle f,\Gamma(t,\cdot)\rangle_{RKHS}^2 \le \langle f,f\rangle_{RKHS}\,\langle\Gamma(t,\cdot),\Gamma(t,\cdot)\rangle_{RKHS} = 0,$$
and thus $f$ is zero. If $f_n$ is a Cauchy sequence with respect to the norm $\|g\|_{RKHS}=\langle g,g\rangle_{RKHS}^{1/2}$, then
$$|f_n(t)-f_m(t)|^2 = \langle f_n-f_m,\Gamma(t,\cdot)\rangle_{RKHS}^2 \le \langle f_n-f_m,f_n-f_m\rangle_{RKHS}\,\langle\Gamma(t,\cdot),\Gamma(t,\cdot)\rangle_{RKHS},$$
which tends to 0 as $n,m\to\infty$. Thus $f_n$ converges pointwise. This is enough to prove that $H$ is complete; this is Exercise 33.1. We summarize.

Proposition 33.1 $H$ with the inner product $\langle\cdot,\cdot\rangle_{RKHS}$ is a Hilbert space. Moreover, if $f\in H$ and $t\in T$, then $\langle f,\Gamma(t,\cdot)\rangle_{RKHS}=f(t)$.

We consider another Hilbert space $M$, the closure of the linear span of $\{X_t : t\in T\}$ with respect to $L^2(P)$. We define $\langle Y,Z\rangle_M = E[YZ]$
if $Y$ and $Z$ are both finite linear combinations of the $X_t$'s. Thus if $m,n\ge1$ and $a_j,b_k\in\mathbb R$, we set
$$\Big\langle\sum_{j=1}^m a_jX_{s_j},\ \sum_{k=1}^n b_kX_{t_k}\Big\rangle_M = \sum_{j=1}^m\sum_{k=1}^n a_jb_kE[X_{s_j}X_{t_k}], \tag{33.4}$$
and we let $M$ be the closure of the collection of random variables of the form $\sum_{j=1}^m a_jX_{s_j}$ with respect to $\langle\cdot,\cdot\rangle_M$. Since $\Gamma(s_j,t_k)=E[X_{s_j}X_{t_k}]$, (33.4) shows that $H$ and $M$ are isomorphic. The isomorphism is the one-to-one correspondence between $\sum_{j=1}^m a_j\Gamma(s_j,\cdot)$ and $\sum_{j=1}^m a_jX_{s_j}$. Let $\{e_n\}$ be a complete orthonormal system for $H$. Let $Y_n$ be the element of $M$ corresponding to $e_n$. Then
$$E[Y_nY_m] = \langle Y_n,Y_m\rangle_M = \langle e_n,e_m\rangle_{RKHS} = \delta_{nm},$$
where $\delta_{nm}$ is 0 if $n\ne m$ and 1 if $n=m$. This implies that the $Y_n$ are independent normal random variables with mean zero and variance one; see Proposition A.55. (Recall that we are assuming that all the $X_t$'s have mean zero.) Since $\Gamma(s,\cdot)$ is an element of $H$, we can write
$$\Gamma(s,\cdot) = \sum_{n=1}^\infty\langle\Gamma(s,\cdot),e_n\rangle_{RKHS}\,e_n(\cdot) = \sum_{n=1}^\infty e_n(s)e_n(\cdot).$$
Using the correspondence between $H$ and $M$, we have
$$X_s = \sum_{n=1}^\infty e_n(s)Y_n,$$
where the $Y_n$ are i.i.d. standard normal random variables. This is known as the Karhunen–Loève expansion of a Gaussian process.

Example 33.2 Let's see what this expansion is in the case of Brownian motion. Define
$$\langle f,g\rangle_{CM} = \int_0^1 f'(r)g'(r)\,dr \tag{33.5}$$
for $f$ and $g$ whose first derivatives are in $L^2([0,1])$ and such that $f(0)=g(0)=0$. Because $\Gamma(s,t)=s\wedge t$,
$$\langle\Gamma(s,\cdot),\Gamma(t,\cdot)\rangle_{CM} = \int_0^1 1_{[0,s)}(r)1_{[0,t)}(r)\,dr = s\wedge t = \Gamma(s,t),$$
and we see that we have identified the reproducing kernel Hilbert space for Brownian motion on $[0,1]$. The notation $\langle\cdot,\cdot\rangle_{CM}$ is used because the Hilbert space with this inner product is called the Cameron–Martin space, a space that has many connections with Brownian motion. Let $e_0(s)=s$ and $e_n(s)=\sqrt2\sin(n\pi s)/(n\pi)$ for $n\ge1$. Then $\{e_n : n\ge0\}$ is a complete orthonormal sequence for the Cameron–Martin space. The function $e_0$ is needed, since $\langle e_0,e_n\rangle_{CM}=\int_0^1\sqrt2\cos(n\pi r)\,dr=0$ for every $n\ge1$. The Karhunen–Loève expansion is equivalent to the formula (6.2) that we used in our first construction of Brownian motion.
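As a numerical sanity check of Example 33.2, the Python sketch below sums $\sum_{n=0}^N e_n(s)e_n(t)$ and compares the result with $s\wedge t$. The Cameron–Martin basis is the one given above, with $e_0(s)=s$. The tail beyond $N$ is at most $\sum_{n>N}2/(n\pi)^2\le2/(\pi^2N)$, and that is the tolerance we assert. The function names and the truncation level are our own hypothetical choices. `kl_path` draws one truncated Karhunen–Loève path $\sum_{n\le N}e_n(s)Y_n$.

```python
import math
import random


def cm_basis(n: int, s: float) -> float:
    """Orthonormal basis of the Cameron-Martin space:
    e_0(s) = s and e_n(s) = sqrt(2) sin(n pi s) / (n pi) for n >= 1."""
    if n == 0:
        return s
    return math.sqrt(2.0) * math.sin(n * math.pi * s) / (n * math.pi)


def kl_covariance(s: float, t: float, terms: int) -> float:
    """Partial sum of Gamma(s, t) = sum_n e_n(s) e_n(t), for n = 0, ..., terms."""
    return sum(cm_basis(n, s) * cm_basis(n, t) for n in range(terms + 1))


def kl_path(times, terms: int, rng: random.Random):
    """Truncated Karhunen-Loeve sum X_s = sum_n e_n(s) Y_n with i.i.d. N(0, 1) Y_n."""
    ys = [rng.gauss(0.0, 1.0) for _ in range(terms + 1)]
    return [sum(cm_basis(n, u) * ys[n] for n in range(terms + 1)) for u in times]


terms = 2000
for s, t in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]:
    # The omitted tail is at most sum_{n > terms} 2 / (n pi)^2 <= 2 / (pi^2 terms).
    error = abs(kl_covariance(s, t, terms) - min(s, t))
    assert error <= 2.0 / (math.pi ** 2 * terms) + 1e-12

path = kl_path([0.0, 0.25, 0.5, 0.75, 1.0], terms=500, rng=random.Random(1))
```

Dropping the $n=0$ term makes the partial sums converge to $s\wedge t-st$ instead, the Brownian bridge covariance of Exercise 33.4.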
33.2 Continuous Gaussian processes

We now turn to the construction of Gaussian processes with continuous paths. Suppose we have an index set $T$ and a non-negative definite kernel $\Gamma(\cdot,\cdot)$. Saying $\Gamma$ is non-negative definite means the following: for each $n$ and each $t_1,\dots,t_n\in T$, the matrix whose $(i,j)$ entry is $\Gamma(t_i,t_j)$ is a non-negative definite matrix. We define a metric on $T$ by
$$d(s,t) = \big(\Gamma(t,t)-2\Gamma(s,t)+\Gamma(s,s)\big)^{1/2},$$
which equals $(\mathrm{Var}(X_t-X_s))^{1/2}$ for any mean zero process $X$ with covariance $\Gamma$. Strictly speaking, $d$ is a pseudo-metric, because $d(s,t)=0$ does not necessarily imply $t=s$. An $\varepsilon$-ball is a set of the form $\{t\in T : d(t,t_0)<\varepsilon\}$ for some $t_0$. Let $N(\varepsilon)$ be the minimum number of $\varepsilon$-balls needed to cover $T$.

Theorem 33.3 Let $\Gamma: T\times T\to\mathbb R$ be continuous with respect to the pseudo-metric $d$, symmetric, and non-negative definite. Suppose that for some $\beta<1$ and some constant $c$ we have
$$\log N(\varepsilon) \le c\varepsilon^{-\beta}, \qquad \varepsilon\in(0,1). \tag{33.6}$$
Then there exists a continuous Gaussian process $\{X_t : t\in T\}$ with covariance kernel $\Gamma$.

One can in fact be more precise than (33.6) and give an integral condition that $N(x)$ must satisfy for $x$ small. Before proving Theorem 33.3, let us look at a number of examples.

Example 33.4 In the case of Brownian motion, $\mathrm{Var}(X_t-X_s)=|t-s|$, so that $d(s,t)=|s-t|^{1/2}$. Let $T$ be the interval $[0,1]$. The intervals of length $\varepsilon^2$ with centers $k\varepsilon^2/4$, $k=0,1,\dots,4/\varepsilon^2$, cover $[0,1]$, and each lies inside an $\varepsilon$-ball. Therefore $N(\varepsilon)\le c/\varepsilon^2$, implying $\log N(\varepsilon)\le c\log(1/\varepsilon)$, which satisfies (33.6). This and Theorem 2.4 give a construction of Brownian motion.

Example 33.5 We look at fractional Brownian motion. Let $H\in(0,2)$. $H$ is known as the Hurst index, and $H=1$ corresponds to Brownian motion (up to a constant multiple, since then $\Gamma(s,t)=2(s\wedge t)$ for $s,t\ge0$). Define
$$\Gamma(s,t) = |s|^H + |t|^H - |s-t|^H.$$
This leads to $d(s,t)=c|t-s|^{H/2}$. Open intervals of length $c\varepsilon^{2/H}$ are $\varepsilon$-balls, and it takes $c\varepsilon^{-2/H}$ of them to cover $[0,1]$. Therefore $N(\varepsilon)\le c\varepsilon^{-2/H}$, so again $\log N(\varepsilon)\le c\log(1/\varepsilon)$ and (33.6) applies. One use of fractional Brownian motion is to model stock prices that have more or less memory of the past than a Brownian motion has.

Example 33.6 Here is our first example of a Gaussian process where $T$ is not a subset of $[0,\infty)$. We construct a Brownian sheet, $X(t_1,t_2)$, where the points $(t_1,t_2)\in[0,1]^2$. More generally we can consider $X(t)$, where $t\in[0,1]^d$. This is no harder, but for simplicity of notation we consider only the case $d=2$. If $s=(s_1,s_2)$ and $t=(t_1,t_2)$, define
$$\Gamma(s,t) = (s_1\wedge t_1)(s_2\wedge t_2).$$
One motivation for this formula is to identify the point $(t_1,t_2)$ with the rectangle $R_t$ whose lower left corner is at the origin and whose upper right corner is at $(t_1,t_2)$. Then the covariance of $X_s$ and $X_t$ is the area of $R_s\cap R_t$.
Some simple geometry shows that we cover $T$ by putting $\varepsilon$-balls centered at the points $(c_1j\varepsilon^2,c_1k\varepsilon^2)$, for an appropriate $c_1$ and with $j,k\le c_2\varepsilon^{-2}$. Therefore $N(\varepsilon)\le c\varepsilon^{-4}$, and so $\log N(\varepsilon)\le c\log(1/\varepsilon)$.

Example 33.7 We can generalize the last example. For every Borel subset $A$ of $[0,1]^d$, let $X_A$ be a Gaussian random variable. We want the covariance of $X_A$ and $X_B$ to be the Lebesgue measure of $A\cap B$. This is known as a set-indexed process. If we let $T$ be the collection of all Borel subsets of $[0,1]^d$, one cannot get a continuous Gaussian process. To get a continuous process $X$, one must restrict $T$ to a subcollection of sets whose boundaries are sufficiently smooth; see Dudley (1973).

Example 33.8 Our last example has a more complicated index set. Let $W$ be a one-dimensional Brownian motion. If $f\in L^2[0,1]$, define
$$X_f = \int_0^1 f(s)\,dW_s.$$
By Exercise 24.6, $X_f$ is a Gaussian random variable with mean 0 and variance $\int_0^1 f(s)^2\,ds$, and the covariance of $X_f$ and $X_g$ is $\int_0^1 f(s)g(s)\,ds$. It follows that
$$d(f,g)^2 = \int_0^1 (f(s)-g(s))^2\,ds.$$
The process $X_f$ is known as a Gaussian field. For what subsets $T$ of $L^2([0,1])$ can one define a process $X_f$ that has continuous paths with respect to $d$? This means that the map $f\mapsto X_f(\omega)$ is continuous for almost all $\omega$, where we use the pseudo-metric $d$ to define open sets in $T$. It turns out that $T=\{f\in L^2([0,1]) : \|f\|_2\le1\}$ is too large to obtain a continuous Gaussian process. By contrast, $T=\{f\in C^2([0,1]) : \|f\|_\infty\le1,\ \|f'\|_\infty\le1,\ \|f''\|_\infty\le1\}$, for example, is small enough to apply Theorem 33.3. We now proceed to the proof of Theorem 33.3.

Proof of Theorem 33.3 Since $T$ can be covered by finitely many $\varepsilon$-balls for each $\varepsilon$, we can let $A(\varepsilon)$ be the collection of centers for the cover by $\varepsilon$-balls. Then $A=\bigcup_{n=1}^\infty A(2^{-n})$ is a countable dense subset of $T$. We first label the elements of $A$ by $t_1,t_2,\dots$. For each $n$, we construct the law of $(X_{t_1},\dots,X_{t_n})$. We then use the Kolmogorov extension theorem to construct the law of $\{X_t : t\in A\}$. Next we prove that, almost surely, $t\mapsto X_t$ is uniformly continuous on $A$. Finally we define $X_t$ for all $t\in T$ by continuity.

Step 1. We construct the law of $(X_{t_1},\dots,X_{t_n})$. Let $n$ be fixed, and let $B$ be the $n\times n$ matrix whose $(i,j)$ entry is $\Gamma(t_i,t_j)$. The matrix $B$ is symmetric, and non-negative definite by hypothesis. Let $Y_1,\dots,Y_n$ be independent normal random variables with mean zero and variance one. Let $C$ be the non-negative definite square root of $B$ and set $X=CY$ (viewed as vectors), or equivalently,
$$X_{t_i} = \sum_{j=1}^n C_{ij}Y_j.$$
Since $C$ is symmetric, a simple calculation shows that $E[X_{t_k}X_{t_m}]=(CC^{T})_{km}=(C^2)_{km}=B_{km}=\Gamma(t_k,t_m)$. The $X_{t_j}$'s are jointly normal, and this gives the first step of the construction.

Step 2. We apply the Kolmogorov extension theorem. Let $P_n$ be the law of $(X_{t_1},\dots,X_{t_n})$. It is easy to see that the consistency property holds for the $P_n$. So by the Kolmogorov extension theorem, there exists a probability $P$ on $\mathbb R^{\mathbb N}$ such that if we define $X_t(\omega)$ by $\omega(t)$ for $t\in A$, the law of $(X_{t_1},\dots,X_{t_n})$ is $P_n$ for each $n$.

Step 3. We show that, except for a null set, the map $t\mapsto X_t(\omega)$ is uniformly continuous on $A$. To prove the uniform continuity, we proceed similarly to Theorem 8.1. For each point $t\in A$, let $t_j$ be the element of $A(2^{-j})$ closest to $t$, with some convention for breaking ties. We will fix $J$ in a moment, and write
$$X_t = X_{t_J} + (X_{t_{J+1}}-X_{t_J}) + (X_{t_{J+2}}-X_{t_{J+1}}) + \cdots,$$
where the sum is finite because $t\in A$. Let $\lambda>0$. If $|X_t-X_s|>\lambda$ for some $s,t\in A$ with $d(s,t)<2^{-J}$, then $\omega$ is in one or more of the following events:

(a) the event $E_J=\{|X_{t_J}-X_{s_J}|>\lambda/2$ for some $s_J,t_J\in A(2^{-J})$ with $d(s_J,t_J)\le3\cdot2^{-J}\}$;

(b) the event $F_j=\{|X_{t_{j+1}}-X_{t_j}|>\lambda/(8j^2)$ for some $t_j\in A(2^{-j})$, $t_{j+1}\in A(2^{-(j+1)})$ with $d(t_j,t_{j+1})<3\cdot2^{-j+1}\}$ for some $j\ge J$;

(c) the event $G_j=\{|X_{s_{j+1}}-X_{s_j}|>\lambda/(8j^2)$ for some $s_j\in A(2^{-j})$, $s_{j+1}\in A(2^{-(j+1)})$ with $d(s_j,s_{j+1})<3\cdot2^{-j+1}\}$ for some $j\ge J$.

(These thresholds suffice because $\sum_{j\ge1}1/(8j^2)=\pi^2/48<1/4$. So outside these events $|X_t-X_s|<\lambda/2+\lambda/4+\lambda/4=\lambda$.)

First we bound the probability of $E_J$. There are $N(2^{-J})$ elements of $A(2^{-J})$, so there are at most $\exp(2c(2^J)^\beta)$ pairs $(s_J,t_J)$. If $d(t_J,s_J)<3\cdot2^{-J}$, then
$$P(|X_{s_J}-X_{t_J}|>\lambda/2) \le 2\exp\Big(-\frac{(\lambda/2)^2}{2\cdot3\cdot2^{-J}}\Big).$$
Therefore the probability of $E_J$ is bounded by
$$P(E_J) \le e^{c2^{\beta J}}e^{-c\lambda^22^J}.$$
Since $\beta<1$, this can be made as small as we like by taking $J$ large enough. For any $t_j$ and $t_{j+1}$ with $d(t_j,t_{j+1})<3\cdot2^{-j+1}$,
$$P(|X_{t_j}-X_{t_{j+1}}|>\lambda/(8j^2)) \le 2\exp\Big(-\frac{\lambda^2/(64j^4)}{6\cdot2^{-j+1}}\Big).$$
There are fewer than $e^{c2^{\beta j}}$ points in $A(2^{-j})$ and fewer than $e^{c2^{\beta(j+1)}}$ points in $A(2^{-(j+1)})$. So there are fewer than $e^{c2^{\beta j}}$ pairs (with a larger constant $c$). Thus the probability of $F_j$ is bounded by
$$P(F_j) \le ce^{c2^{\beta j}}e^{-c\lambda^22^j/j^4}.$$
Since $\beta<1$, this is summable in $j$, and $\sum_{j=J}^\infty P(F_j)$ can be made as small as we like by taking $J$ large enough. We handle the bound for $G_j$ similarly. Thus, given $\varepsilon$, we have
$$P\Big(\sup_{s,t\in A,\ d(s,t)<2^{-J}}|X_t-X_s|>\lambda\Big)\le\varepsilon$$
if we take $J$ large enough, where $J$ depends on $\varepsilon$ and $\lambda$. This suffices to prove the uniform continuity.

Step 4. We use continuity to complete the proof. Define $X_t=\lim_{s\in A,\,s\to t}X_s$. The limit exists and is a continuous function of $t$ by virtue of the uniform continuity. By Remark A.56, $X_t$ will have the desired covariance function.

We have been considering Gaussian processes taking values in $\mathbb R$, but it is also of interest to look at Brownian motion taking values in a Hilbert space or a Banach space. There are three steps to constructing such a process: (1) constructing Gaussian measures on Banach (or Hilbert) spaces; (2) getting a suitable estimate on $\|X_t-X_s\|$; (3) constructing a Brownian motion. Of these three steps, the third follows along the lines we used for real-valued processes. Steps (1) and (2) require considerable work, and we refer the reader to Bogachev (1998) or Kuo (1975). A measure $\mu$ on a Banach space is called Gaussian if $\mu\circ L^{-1}$ is a Gaussian measure on $\mathbb R$ for every continuous linear functional $L$ on the Banach space.
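Step 1 of the proof of Theorem 33.3 builds $(X_{t_1},\dots,X_{t_n})=CY$ from the non-negative definite square root $C$ of $B=(\Gamma(t_i,t_j))$. The minimal sketch below assumes NumPy is available, which is not part of the text. It computes $C$ through an eigendecomposition of $B$; this is our choice of method, one of several ways to obtain the square root. For the example it uses the fractional Brownian motion kernel of Example 33.5 with $H=0.6$, an arbitrary value. The names `fbm_kernel`, `sqrt_psd`, and `sample_gaussian_vector` are hypothetical.

```python
import numpy as np


def fbm_kernel(s: float, t: float, hurst: float) -> float:
    """Kernel of Example 33.5: |s|^H + |t|^H - |s - t|^H, for H in (0, 2)."""
    return abs(s) ** hurst + abs(t) ** hurst - abs(s - t) ** hurst


def sqrt_psd(b: np.ndarray) -> np.ndarray:
    """Non-negative definite square root C of a symmetric non-negative definite B."""
    eigvals, eigvecs = np.linalg.eigh(b)
    eigvals = np.clip(eigvals, 0.0, None)  # discard tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def sample_gaussian_vector(times, kernel, n_samples: int, rng: np.random.Generator):
    """Return samples of (X_{t_1}, ..., X_{t_n}) = C Y (one sample per row), with B, C."""
    b = np.array([[kernel(ti, tj) for tj in times] for ti in times])
    c = sqrt_psd(b)
    y = rng.standard_normal((n_samples, len(times)))
    return y @ c.T, b, c


def kernel(s: float, t: float) -> float:
    return fbm_kernel(s, t, hurst=0.6)


times = [0.1, 0.4, 0.7, 1.0]
x, b, c = sample_gaussian_vector(times, kernel, 200_000, np.random.default_rng(0))
# C is symmetric with C @ C = B, so E[X X^T] = C C^T = B.
assert np.allclose(c, c.T) and np.allclose(c @ c, b)
# The sample covariance of the rows should be close to B.
max_err = np.abs(np.cov(x, rowvar=False) - b).max()
```

Clipping negative eigenvalues to zero matters only when $B$ is singular, which the theorem allows because $\Gamma$ is merely non-negative definite. A Cholesky factor would also satisfy $CC^T=B$ but can fail in the singular case.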
Exercises

33.1 Finish the proof that $H$ as defined in Section 33.1 is complete.

33.2 Suppose that in Example 33.8 we let $T=\{f\in C^1([0,1]) : \|f\|_\infty\le1,\ \|f'\|_\infty\le1\}$. Show that $\log N(\varepsilon)$ is bounded above by $c_1\varepsilon^{-1}$ and bounded below by $c_2\varepsilon^{-1}$.

33.3 Suppose $X^i$ and $Y^i$ are two sequences of Brownian motions with all of the Brownian motions independent of each other. Let
$$Z^n_{(s,t)} = \frac1{\sqrt n}\sum_{i=1}^n X^i_sY^i_t.$$
Prove that $Z^n$ converges weakly with respect to the topology of $C([0,1]^2)$ as $n\to\infty$ to a Brownian sheet.
33.4 Let $X$ be a Brownian bridge. (This will be studied further in Section 35.2.) This means that $X$ is a mean zero Gaussian process with
$$\mathrm{Cov}(X_s,X_t) = s\wedge t - st, \qquad 0\le s,t\le1.$$
Identify the reproducing kernel Hilbert space for $X$.

33.5 Let $X$ be the Ornstein–Uhlenbeck process started at 0. This was defined in Exercise 19.5. Identify the reproducing kernel Hilbert space for $X$.
34 The space D[0, 1]
We define the space D[0, 1] to be the collection of real-valued functions on [0, 1] which are right continuous with left limits. We will introduce a topology on D = D[0, 1], the Skorokhod topology, which makes D into a complete separable metric space. We will give a criterion for a subset of D to be compact, which will lead to some criteria for a family of probability measures on D to be tight.
34.1 Metrics for D[0, 1]

We write $f(t-)$ for $\lim_{s<t,\,s\to t}f(s)$. For $\varepsilon>0$, let $t_0=0$, and for $i\ge0$ let $t_{i+1}=\inf\{t>t_i : |f(t)-f(t_i)|>\varepsilon\}\wedge1$. Because $f$ is right continuous with left limits, $t_i$ must be equal to 1 from some $i$ on.

Our first try at a metric, $\rho$, makes $D$ into a separable metric space, but one that is not complete. Let's start with $\rho$ anyway, since we need it on the way to the metric $d$ we end up with. Let $\Lambda$ be the set of functions $\lambda$ from $[0,1]$ to $[0,1]$ that are continuous, strictly increasing, and such that $\lambda(0)=0$, $\lambda(1)=1$. Define
$$\rho(f,g) = \inf\Big\{\varepsilon>0 : \exists\lambda\in\Lambda\text{ such that }\sup_{t\in[0,1]}|\lambda(t)-t|<\varepsilon,\ \sup_{t\in[0,1]}|f(t)-g(\lambda(t))|<\varepsilon\Big\}.$$
Since the function $\lambda(t)=t$ is in $\Lambda$, $\rho(f,g)$ is finite if $f,g\in D$. Clearly $\rho(f,g)\ge0$. If $\rho(f,g)=0$, then for each $t$ either $f(t)=g(t)$ or $f(t)=g(t-)$. Since elements of $D$ are right continuous with left limits, it follows that $f=g$. If $\lambda\in\Lambda$, then so is $\lambda^{-1}$. Setting $s=\lambda^{-1}(t)$ and noting that both $s$ and $t$ range over $[0,1]$, we have
$$\sup_{t\in[0,1]}|\lambda^{-1}(t)-t| = \sup_{s\in[0,1]}|s-\lambda(s)|$$
and
$$\sup_{t\in[0,1]}|f(\lambda^{-1}(t))-g(t)| = \sup_{s\in[0,1]}|f(s)-g(\lambda(s))|,$$
and we conclude $\rho(f,g)=\rho(g,f)$. The triangle inequality follows from
$$\sup_{t\in[0,1]}|\lambda_2\circ\lambda_1(t)-t| \le \sup_{t\in[0,1]}|\lambda_1(t)-t| + \sup_{s\in[0,1]}|\lambda_2(s)-s|$$
and
$$\sup_{t\in[0,1]}|f(t)-h(\lambda_2\circ\lambda_1(t))| \le \sup_{t\in[0,1]}|f(t)-g(\lambda_1(t))| + \sup_{s\in[0,1]}|g(s)-h(\lambda_2(s))|.$$
Consider the set of $f$ in $D$ for which there exists an integer $k$ such that $f$ is constant and equal to a rational on each interval $[(i-1)/k,i/k)$. It is not hard to check (Exercise 34.1) that the collection of such $f$'s is dense in $D$ with respect to $\rho$, which shows $(D,\rho)$ is separable. The space $D$ with the metric $\rho$ is not, however, complete; see Exercise 34.2. We therefore introduce a slightly different metric $d$. Define
$$\|\lambda\| = \sup_{s\ne t,\ s,t\in[0,1]}\Big|\log\frac{\lambda(t)-\lambda(s)}{t-s}\Big|$$
and let
$$d(f,g) = \inf\Big\{\varepsilon>0 : \exists\lambda\in\Lambda\text{ such that }\|\lambda\|\le\varepsilon,\ \sup_{t\in[0,1]}|f(t)-g(\lambda(t))|\le\varepsilon\Big\}.$$
Note $\|\lambda^{-1}\|=\|\lambda\|$ and $\|\lambda_2\circ\lambda_1\|\le\|\lambda_1\|+\|\lambda_2\|$. The symmetry of $d$ and the triangle inequality follow easily from this, and we conclude $d$ is a metric.

Lemma 34.1 There exists $\varepsilon_0$ such that $\rho(f,g)\le2d(f,g)$ if $d(f,g)<\varepsilon_0$. (It turns out $\varepsilon_0=1/4$ will do.)

Proof
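Since $\log(1+2x)/(2x)\to1$ as $x\to0$, we have
$$\log(1-2\varepsilon) < -\varepsilon < \varepsilon < \log(1+2\varepsilon)$$
if $\varepsilon$ is small enough. Suppose $d(f,g)<\varepsilon$, and let $\lambda\in\Lambda$ be such that $\|\lambda\|<\varepsilon$ and $\sup_{t\in[0,1]}|f(t)-g(\lambda(t))|<\varepsilon$. Since $\lambda(0)=0$, we have
$$\log(1-2\varepsilon) < -\varepsilon < \log\frac{\lambda(t)}{t} < \varepsilon < \log(1+2\varepsilon), \tag{34.1}$$
or
$$1-2\varepsilon < \frac{\lambda(t)}{t} < 1+2\varepsilon. \tag{34.2}$$
This implies $|\lambda(t)-t|<2\varepsilon$, and hence $\rho(f,g)\le2d(f,g)$.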
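For a piecewise-linear $\lambda\in\Lambda$ both quantities in Lemma 34.1 are easy to compute. The quotient $(\lambda(t)-\lambda(s))/(t-s)$ is a length-weighted average of the slopes on $[s,t]$, so $\|\lambda\|=\max_i|\log(\text{slope}_i)|$. Also, $\lambda(t)-t$ is piecewise linear, so its supremum is attained at a knot. The Python sketch below uses these two observations (our own, not from the text) to check $\sup_t|\lambda(t)-t|\le2\|\lambda\|$ on one example. The knot values and the helper names `pl_norm` and `sup_deviation` are hypothetical.

```python
import math


def pl_norm(knots_x, knots_y) -> float:
    """||lambda|| for the strictly increasing piecewise-linear lambda through the knots.

    A difference quotient (lambda(t) - lambda(s)) / (t - s) is a length-weighted
    average of the slopes on [s, t], so the sup of |log(...)| is attained on a
    single linear piece: ||lambda|| = max_i |log(slope_i)|.
    """
    slopes = [(y1 - y0) / (x1 - x0)
              for x0, x1, y0, y1 in zip(knots_x, knots_x[1:], knots_y, knots_y[1:])]
    return max(abs(math.log(m)) for m in slopes)


def sup_deviation(knots_x, knots_y) -> float:
    """sup_t |lambda(t) - t|; lambda(t) - t is piecewise linear, so check the knots."""
    return max(abs(y - x) for x, y in zip(knots_x, knots_y))


xs = [0.0, 0.3, 0.6, 1.0]
ys = [0.0, 0.32, 0.61, 1.0]  # lambda(0) = 0, lambda(1) = 1, increasing
norm = pl_norm(xs, ys)
assert norm < 0.25
# Conclusion of Lemma 34.1: |lambda(t) - t| <= 2 ||lambda|| when ||lambda|| is small.
assert sup_deviation(xs, ys) <= 2 * norm
```

The same helpers can be applied to the piecewise-linear time changes built in the proofs of Lemma 34.2 and Theorem 34.6.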
We define the analog $\xi_f$ of the modulus of continuity for a function in $D$ as follows. Define $\theta_f[a,b)=\sup_{s,t\in[a,b)}|f(t)-f(s)|$ and
$$\xi_f(\delta) = \inf\Big\{\max_{1\le i\le n}\theta_f[t_{i-1},t_i) : \exists n\ge1,\ 0=t_0<t_1<\cdots<t_n=1\text{ such that }t_i-t_{i-1}>\delta\text{ for all }i\le n\Big\}.$$
Observe that if $f\in D$, then $\xi_f(\delta)\downarrow0$ as $\delta\downarrow0$.
Lemma 34.2 Suppose $\delta<1/4$. Let $f\in D$. If $\rho(f,g)\le\delta^2$, then $d(f,g)\le4\delta+\xi_f(\delta)$.

Proof Choose $t_i$'s such that $t_i-t_{i-1}>\delta$ and $\theta_f[t_{i-1},t_i)<\xi_f(\delta)+\delta$ for each $i$. Pick $\mu\in\Lambda$ such that $\sup_t|f(t)-g(\mu(t))|<\delta^2$ and $\sup_t|\mu(t)-t|<\delta^2$. Then $\sup_t|f(\mu^{-1}(t))-g(t)|<\delta^2$. Set $\lambda(t_i)=\mu(t_i)$ and let $\lambda$ be linear in between. Since $\mu^{-1}(\lambda(t_i))=t_i$ for all $i$, $t$ and $\mu^{-1}\circ\lambda(t)$ always lie in the same subinterval $[t_{i-1},t_i)$. Consequently
$$|f(t)-g(\lambda(t))| \le |f(t)-f(\mu^{-1}(\lambda(t)))| + |f(\mu^{-1}(\lambda(t)))-g(\lambda(t))| \le \max_i\theta_f[t_{i-1},t_i)+\delta^2 \le \xi_f(\delta)+\delta+\delta^2 < \xi_f(\delta)+4\delta.$$
We have
$$|\lambda(t_i)-\lambda(t_{i-1})-(t_i-t_{i-1})| = |\mu(t_i)-\mu(t_{i-1})-(t_i-t_{i-1})| \le 2\delta^2 < 2\delta(t_i-t_{i-1}).$$
Since $\lambda$ is defined by linear interpolation,
$$|(\lambda(t)-\lambda(s))-(t-s)| \le 2\delta|t-s|, \qquad s,t\in[0,1],$$
which leads to
$$\Big|\frac{\lambda(t)-\lambda(s)}{t-s}-1\Big| \le 2\delta,$$
or
$$\log(1-2\delta) \le \log\frac{\lambda(t)-\lambda(s)}{t-s} \le \log(1+2\delta).$$
Since $\delta<1/4$, we have $\|\lambda\|\le4\delta$.

Proposition 34.3 The metrics $d$ and $\rho$ are equivalent, i.e., they generate the same topology. In particular, $(D,d)$ is separable.

Proof Let $B_\rho(f,r)$ denote the ball with center $f$ and radius $r$ with respect to the metric $\rho$, and define $B_d(f,r)$ analogously. Let $\varepsilon>0$ and let $f\in D$. If $d(f,g)<\varepsilon/2$ and $\varepsilon$ is small enough, then $\rho(f,g)\le2d(f,g)<\varepsilon$, and so $B_d(f,\varepsilon/2)\subset B_\rho(f,\varepsilon)$. For the other direction, we must show that given $f$ and $\varepsilon$, there exists $\delta$ such that $B_\rho(f,\delta)\subset B_d(f,\varepsilon)$. Here $\delta$ may depend on $f$. In fact, it has to in general; otherwise a Cauchy sequence with respect to $d$ would be a Cauchy sequence with respect to $\rho$, and vice versa. Choose $\delta$ small enough that $\delta^{1/2}<1/4$ and $4\delta^{1/2}+\xi_f(\delta^{1/2})<\varepsilon$. By Lemma 34.2, if $\rho(f,g)<\delta$, then $d(f,g)<\varepsilon$, which is what we want.

Finally, suppose $G$ is open with respect to the topology generated by $\rho$. For each $f\in G$, choose $r_f$ so that $B_\rho(f,r_f)\subset G$. Hence $G=\bigcup_{f\in G}B_\rho(f,r_f)$. Choose $s_f$ so that $B_d(f,s_f)\subset B_\rho(f,r_f)$. Then $\bigcup_{f\in G}B_d(f,s_f)\subset G$. In fact the two sets are equal, because each $f\in G$ lies in $B_d(f,s_f)$. Since $G$ can be written as a union of balls which are open with respect to $d$, $G$ is open with respect to $d$. The same argument with $d$ and $\rho$ interchanged shows that a set that is open with respect to $d$ is open with respect to $\rho$.
34.2 Compactness and completeness

We now show completeness for $(D,d)$.

Theorem 34.4 The space $D$ with the metric $d$ is complete.

Proof Let $f_n$ be a Cauchy sequence with respect to the metric $d$. If we can find a subsequence $n_j$ such that $f_{n_j}$ converges, say to $f$, then it is standard that the whole sequence converges to $f$. Choose $n_j$ such that $d(f_{n_j},f_{n_{j+1}})<2^{-j}$. For each $j$ there exists $\lambda_j$ such that
$$\sup_t|f_{n_j}(t)-f_{n_{j+1}}(\lambda_j(t))|\le2^{-j}, \qquad \|\lambda_j\|\le2^{-j}.$$
As in (34.1) and (34.2), $|\lambda_j(t)-t|\le2^{-j+1}$. Then
$$\sup_t|\lambda_{n+m+1}\circ\lambda_{n+m}\circ\cdots\circ\lambda_n(t)-\lambda_{n+m}\circ\cdots\circ\lambda_n(t)| = \sup_s|\lambda_{n+m+1}(s)-s| \le 2^{-(n+m)}$$
for each $n$. Hence for each $n$, the sequence $\lambda_{n+m}\circ\cdots\circ\lambda_n$ (indexed by $m$) is a Cauchy sequence of functions on $[0,1]$ with respect to the supremum norm. Let $\nu_n$ be the limit. Clearly $\nu_n(0)=0$, $\nu_n(1)=1$, and $\nu_n$ is continuous and nondecreasing. We also have
$$\Big|\log\frac{\lambda_{n+m}\circ\cdots\circ\lambda_n(t)-\lambda_{n+m}\circ\cdots\circ\lambda_n(s)}{t-s}\Big| \le \|\lambda_{n+m}\circ\cdots\circ\lambda_n\| \le \|\lambda_{n+m}\|+\cdots+\|\lambda_n\| \le \frac1{2^{n-1}}.$$
If we then let $m\to\infty$, we obtain
$$\Big|\log\frac{\nu_n(t)-\nu_n(s)}{t-s}\Big| \le \frac1{2^{n-1}},$$
which implies $\nu_n\in\Lambda$ with $\|\nu_n\|\le2^{1-n}$. We see that $\nu_n=\nu_{n+1}\circ\lambda_n$. Consequently
$$\sup_t|f_{n_j}(\nu_j^{-1}(t))-f_{n_{j+1}}(\nu_{j+1}^{-1}(t))| = \sup_s|f_{n_j}(s)-f_{n_{j+1}}(\lambda_j(s))| \le 2^{-j}.$$
Therefore $f_{n_j}\circ\nu_j^{-1}$ is a Cauchy sequence on $[0,1]$ with respect to the supremum norm. Let $f$ be the limit. Since
$$\sup_t|f_{n_j}(\nu_j^{-1}(t))-f(t)|\to0$$
and $\|\nu_j\|\to0$ as $j\to\infty$, we conclude that $d(f_{n_j},f)\to0$.
We next show that if $f_n\to f$ with respect to $d$ and $f\in C[0,1]$, the convergence is in fact uniform.

Proposition 34.5 Suppose $f_n\to f$ in the topology of $D[0,1]$ with respect to $d$ and $f\in C[0,1]$. Then $\sup_{t\in[0,1]}|f_n(t)-f(t)|\to0$.

Proof Let $\varepsilon>0$. Since $f$ is uniformly continuous on $[0,1]$, there exists $\delta$ such that $|f(t)-f(s)|<\varepsilon/2$ if $|t-s|<\delta$. For $n$ sufficiently large there exists $\lambda_n\in\Lambda$ such that $\sup_t|f_n(t)-f(\lambda_n(t))|<\varepsilon/2$ and $\sup_t|\lambda_n(t)-t|<\delta$. Therefore $|f(\lambda_n(t))-f(t)|<\varepsilon/2$, and so $|f_n(t)-f(t)|<\varepsilon$.

We turn to compactness.

Theorem 34.6 A set $A$ has compact closure in $D[0,1]$ if
$$\sup_{f\in A}\sup_t|f(t)|<\infty$$
and
$$\lim_{\delta\to0}\sup_{f\in A}\xi_f(\delta)=0.$$
The converse of this theorem is also true, but we won't need it. See Billingsley (1968) or Exercise 34.9.

Proof A complete and totally bounded set in a metric space is compact, and $D[0,1]$ is a complete metric space. Hence it suffices to show that $A$ is totally bounded: for each $\varepsilon>0$ there exist finitely many balls of radius $\varepsilon$ that cover $A$. Let $\eta>0$ and choose $k$ large such that $1/k<\eta$ and $\xi_f(1/k)<\eta$ for each $f\in A$. Let $M=\sup_{f\in A}\sup_t|f(t)|$ and let $H=\{-M+j/k : 0\le j\le2kM\}$, so that $H$ is an $\eta$-net for $[-M,M]$. Let $B$ be the set of functions $f\in D[0,1]$ that are constant on each interval $[(i-1)/k,i/k)$ and that take values only in the set $H$. In particular, $f(1)\in H$.

We first prove that $B$ is a $2\eta$-net for $A$ with respect to $\rho$. If $f\in A$, there exist $t_0,\dots,t_n$ such that $t_0=0$, $t_n=1$, $t_i-t_{i-1}>1/k$ for each $i$, and $\theta_f[t_{i-1},t_i)<\eta$ for each $i$. Note we must have $n\le k$. For each $i$ choose integers $j_i$ such that $j_i/k\le t_i<(j_i+1)/k$. The $j_i$ are distinct since the $t_i$ are at least $1/k$ apart. Define $\lambda$ so that $\lambda(j_i/k)=t_i$ and $\lambda$ is linear on each interval $[j_i/k,j_{i+1}/k]$. Choose $g\in B$ such that $|g(m/k)-f(\lambda(m/k))|<\eta$ for each $m\le k$. Observe that each $[m/k,(m+1)/k)$ lies inside some interval of the form $[j_i/k,j_{i+1}/k)$. Since $\lambda$ is increasing, $[\lambda(m/k),\lambda((m+1)/k))$ is contained in $[\lambda(j_i/k),\lambda(j_{i+1}/k))=[t_i,t_{i+1})$. The function $f$ varies by at most $\eta$ over each interval $[t_i,t_{i+1})$. So $f(\lambda(t))$ varies by at most $\eta$ over each interval $[m/k,(m+1)/k)$. Since $g$ is constant on each such interval,
$$\sup_t|g(t)-f(\lambda(t))|<2\eta.$$
We have $|\lambda(j_i/k)-j_i/k|=|t_i-j_i/k|<1/k<\eta$ for each $i$. By the piecewise linearity of $\lambda$, $\sup_t|\lambda(t)-t|<\eta$. Thus $\rho(f,g)<2\eta$. We have proved that given $f\in A$, there exists $g\in B$ such that $\rho(f,g)<2\eta$; that is, $B$ is a $2\eta$-net for $A$ with respect to $\rho$.
Now let $\varepsilon>0$ and choose $\delta\in(0,1/4)$ small so that $4\delta+\xi_f(\delta)<\varepsilon$ for each $f\in A$. Set $\eta=\delta^2/4$. Choose $B$ as above to be a $2\eta$-net for $A$ with respect to $\rho$. By Lemma 34.2, if $\rho(f,g)<2\eta<\delta^2$, then $d(f,g)\le4\delta+\xi_f(\delta)<\varepsilon$. Therefore $B$ is an $\varepsilon$-net for $A$ with respect to $d$.

The following corollary is proved in exactly the same way as Theorem 32.1.

Corollary 34.7 Suppose $X_n$ are processes whose paths are right continuous with left limits. Suppose for each $\varepsilon$ and $\eta$ there exist $n_0$, $R$, and $\delta$ such that
$$P(\xi_{X_n}(\delta)\ge\varepsilon)\le\eta \quad\text{and}\quad P\Big(\sup_{t\in[0,1]}|X_n(t)|\ge R\Big)\le\eta$$
for all $n\ge n_0$. Then the $X_n$ are tight with respect to the topology of $D[0,1]$.
34.3 The Aldous criterion

A very useful criterion for tightness is the following one, due to Aldous (1978).

Theorem 34.8 Let $\{X_n\}$ be a sequence of processes with paths in $D[0,1]$. Suppose
$$\lim_{R\to\infty}\sup_nP(|X_n(t)|\ge R)=0 \tag{34.3}$$
for each $t\in[0,1]$. Suppose also that whenever $\tau_n$ are stopping times for $X_n$ and $\delta_n\to0$ are reals,
$$|X_n(\tau_n+\delta_n)-X_n(\tau_n)| \tag{34.4}$$
converges to 0 in probability as $n\to\infty$. Then the $X_n$ are tight with respect to the topology of $D[0,1]$.

Proof We will set $X_n(t)=X_n(1)$ for $t\in[1,2]$ to simplify notation. The proof of this theorem comprises four steps.

Step 1. We claim that (34.4) implies the following: given $\varepsilon$ there exist $n_0$ and $\delta$ such that
$$P(|X_n(\tau_n+s)-X_n(\tau_n)|\ge\varepsilon)\le\varepsilon \tag{34.5}$$
for each $n\ge n_0$, $s\le2\delta$, and $\tau_n$ a stopping time for $X_n$. For if not, we choose an increasing subsequence $n_k$, stopping times $\tau_{n_k}$, and $s_{n_k}\le1/k$ for which (34.5) does not hold. Taking $\delta_{n_k}=s_{n_k}$ gives a contradiction to (34.4).

Step 2. Let $\varepsilon>0$, fix $n\ge n_0$, and let $T\le U\le1$ be two stopping times for $X_n$. We will prove
$$P(U\le T+\delta,\ |X_n(U)-X_n(T)|\ge2\varepsilon)\le16\varepsilon. \tag{34.6}$$
To prove this, we start by letting $\lambda$ be Lebesgue measure. Let
$$A_T=\{(\omega,s)\in\Omega\times[0,2\delta] : |X_n(T+s)-X_n(T)|\ge\varepsilon\}.$$
Then for each $s\le2\delta$ we have $P(\omega:(\omega,s)\in A_T)\le\varepsilon$ by (34.5) with $\tau_n$ replaced by $T$. Writing $P\times\lambda$ for the product measure, we then have
$$P\times\lambda(A_T)\le2\delta\varepsilon. \tag{34.7}$$
Set $B_T(\omega)=\{s:(\omega,s)\in A_T\}$ and $C_T=\{\omega:\lambda(B_T(\omega))\ge\delta/4\}$. From (34.7) and the Fubini theorem, $\int\lambda(B_T(\omega))\,P(d\omega)\le2\delta\varepsilon$, so
$$P(C_T)\le8\varepsilon.$$
We similarly define $B_U$ and $C_U$, and obtain $P(C_T\cup C_U)\le16\varepsilon$. If $\omega\notin C_T\cup C_U$, then $\lambda(B_T(\omega))<\delta/4$ and $\lambda(B_U(\omega))<\delta/4$. Suppose $U\le T+\delta$. Then
$$\lambda\{t\in[T,T+2\delta] : |X_n(t)-X_n(T)|\ge\varepsilon\}<\delta/4$$
and
$$\lambda\{t\in[U,U+\delta] : |X_n(t)-X_n(U)|\ge\varepsilon\}<\delta/4.$$
Since $[U,U+\delta]\subset[T,T+2\delta]$ has length $\delta$, there exists $t\in[T,T+2\delta]\cap[U,U+\delta]$ such that $|X_n(t)-X_n(T)|<\varepsilon$ and $|X_n(t)-X_n(U)|<\varepsilon$. This implies $|X_n(U)-X_n(T)|<2\varepsilon$, which proves (34.6).

Step 3. We obtain a bound on $\xi_{X_n}$. Let $T_{n0}=0$ and
$$T_{n,i+1}=\inf\{t>T_{ni} : |X_n(t)-X_n(T_{ni})|\ge2\varepsilon\}\wedge2.$$
Note that $|X_n(T_{n,i+1})-X_n(T_{ni})|\ge2\varepsilon$ if $T_{n,i+1}<2$. We choose $n_0$ and $\delta$ as in Step 1. By Step 2 with $T=T_{ni}$ and $U=T_{n,i+1}$,
$$P(T_{n,i+1}-T_{ni}<\delta,\ T_{ni}<2)\le16\varepsilon. \tag{34.8}$$
Let $K=[2/\delta]+1$ and apply (34.5) with $\varepsilon$ replaced by $\varepsilon/K$. We see that there exist $n_1\ge n_0$ and $\zeta\le\delta\wedge\varepsilon$ such that if $n\ge n_1$, $s\le2\zeta$, and $\tau_n$ is a stopping time, then
$$P(|X_n(\tau_n+s)-X_n(\tau_n)|>\varepsilon/K)\le\varepsilon/K. \tag{34.9}$$
By (34.6) with $T=T_{ni}$, $U=T_{n,i+1}$, and $\delta$ replaced by $\zeta$,
$$P(T_{n,i+1}\le T_{ni}+\zeta)\le16\varepsilon/K \tag{34.10}$$
for each $i$, and hence
$$P(\exists i\le K : T_{n,i+1}\le T_{ni}+\zeta)\le16\varepsilon. \tag{34.11}$$
We have
$$E[T_{ni}-T_{n,i-1};\ T_{nK}<1] \ge \delta P(T_{ni}-T_{n,i-1}\ge\delta,\ T_{nK}<1) \ge \delta[P(T_{nK}<1)-P(T_{ni}-T_{n,i-1}<\delta,\ T_{nK}<1)] \ge \delta[P(T_{nK}<1)-16\varepsilon],$$
where we used (34.8) in the last step. Summing over $i$ from 1 to $K$,
$$P(T_{nK}<1) \ge E[T_{nK};\ T_{nK}<1] = \sum_{i=1}^KE[T_{ni}-T_{n,i-1};\ T_{nK}<1] \ge K\delta[P(T_{nK}<1)-16\varepsilon] \ge 2[P(T_{nK}<1)-16\varepsilon],$$
or $P(T_{nK}<1)\le32\varepsilon$. Hence, using (34.11) as well, we have $\xi_{X_n}(\zeta)\le4\varepsilon$ except for an event of probability at most $48\varepsilon$.

Step 4. The last step is to obtain a bound on $\sup_t|X_n(t)|$. Let $\varepsilon>0$ and choose $\delta$ and $n_0$ as in Step 1. Define
$$D_{Rn}=\{(\omega,s)\in\Omega\times[0,1] : |X_n(s)(\omega)|>R\}$$
for $R>0$. Let $\mathcal B[0,1]$ be the Borel $\sigma$-field on $[0,1]$. The measurability of $D_{Rn}$ with respect to the product $\sigma$-field $\mathcal F\times\mathcal B[0,1]$ follows from the fact that $X_n$ is right continuous with left limits. Let
$$G(R,s)=\sup_nP(|X_n(s)|>R).$$
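By (34.3), $G(R,s)\to0$ as $R\to\infty$ for each $s$. Pick $R$ large so that $\lambda(\{s : G(R,s)>\varepsilon\delta\})<\varepsilon\delta$. Then
$$\int1_{D_{Rn}}(\omega,s)\,P(d\omega) = P(|X_n(s)|>R) \le \begin{cases}1, & G(R,s)>\varepsilon\delta,\\ \varepsilon\delta, & \text{otherwise}.\end{cases}$$
Integrating over $s\in[0,1]$,
$$P\times\lambda(D_{Rn})<2\varepsilon\delta.$$
Let $E_{Rn}(\omega)=\{s:(\omega,s)\in D_{Rn}\}$ and $F_{Rn}=\{\omega:\lambda(E_{Rn}(\omega))>\delta/4\}$. Then
$$\frac\delta4P(F_{Rn}) = \int_{F_{Rn}}\frac\delta4\,P(d\omega) \le \int\int_0^11_{D_{Rn}}(\omega,s)\,\lambda(ds)\,P(d\omega) \le 2\varepsilon\delta,$$
so $P(F_{Rn})\le8\varepsilon$. Define $T=\inf\{t:|X_n(t)|\ge R+2\varepsilon\}\wedge2$ and define $A_T$, $B_T$, and $C_T$ as in Step 2. We have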
$$P(C_T\cup F_{Rn})\le16\varepsilon.$$
Suppose $\omega\notin C_T\cup F_{Rn}$ and $T<2$. Then $\lambda(E_{Rn}(\omega))\le\delta/4$ and $\lambda(B_T(\omega))<\delta/4$. Hence there exists $t\in[T,T+2\delta]$ such that $|X_n(t)|\le R$ and $|X_n(t)-X_n(T)|<\varepsilon$. Therefore $|X_n(T)|<R+\varepsilon$. This contradicts the definition of $T$, since right continuity gives $|X_n(T)|\ge R+2\varepsilon$ when $T<2$. We conclude that $T$ must equal 2 on the complement of $C_T\cup F_{Rn}$. In other words, provided $n\ge n_0$, we have $\sup_t|X_n(t)|\le R+2\varepsilon$ except for an event of probability at most $16\varepsilon$. An application of Corollary 34.7 completes the proof.

Aldous's criterion is particularly well suited for strong Markov processes.

Proposition 34.9 Suppose $X_n$ is a sequence of real-valued strong Markov processes and there exist $c$, $p$, and $\gamma>0$ such that
$$E^x|X_n(t)-X_n(0)|^p\le ct^\gamma, \qquad x\in\mathbb R,\ t\in[0,1]. \tag{34.12}$$
Then for each $x\in\mathbb R$, the sequence of $P^x$-laws of $\{X_n\}$ is tight with respect to the space $D[0,1]$.
Unlike the Kolmogorov continuity criterion, this does not require $\gamma>1$.

Proof Fix $x$. For each $t$,
$$P^x(|X_n(t)|\ge R+|x|) \le P^x(|X_n(t)-X_n(0)|\ge R) \le \frac{E^x|X_n(t)-X_n(0)|^p}{R^p} \le \frac{ct^\gamma}{R^p},$$
which tends to 0 as $R\to\infty$. We used Chebyshev's inequality here. Suppose $\tau_n$ are stopping times for $X_n$ and $\delta_n\to0$. By the strong Markov property, for each $\varepsilon>0$,
$$P^x(|X_n(\tau_n+\delta_n)-X_n(\tau_n)|>\varepsilon) \le \frac{E^x|X_n(\tau_n+\delta_n)-X_n(\tau_n)|^p}{\varepsilon^p} = \varepsilon^{-p}E^x\Big[E^{X_n(\tau_n)}|X_n(\delta_n)-X_n(0)|^p\Big] \le c\varepsilon^{-p}\delta_n^\gamma,$$
which tends to 0 as $n\to\infty$. Now apply Theorem 34.8.
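The remark that $\gamma>1$ is not needed can be seen on a Poisson process, a strong Markov process with jumps that Exercise 34.7 also uses. Using it here is our illustrative choice; the text does not state this example. For $N_t$ Poisson with rate $r$ and $\mu=rt$, the central moments are $E(N_t-\mu)^2=\mu$ and $E(N_t-\mu)^4=\mu+3\mu^2$. Both are of order $t$ as $t\to0$, so (34.12) holds with $\gamma=1$ for $p=2$ or $p=4$. The Kolmogorov criterion would instead ask for $\gamma>1$, which jumps rule out. The Python sketch below checks these moment formulas by summing the probability mass function. The rate value, the truncation `cutoff`, and the function name are hypothetical.

```python
import math


def poisson_central_moment(mu: float, p: int, cutoff: int = 400) -> float:
    """E|N - mu|^p for N ~ Poisson(mu), by summing the probability mass function.

    The pmf is built recursively, P(N = k + 1) = P(N = k) * mu / (k + 1), and the
    sum is truncated at `cutoff`, which is ample for the small mu used here.
    """
    prob = math.exp(-mu)
    total = 0.0
    for k in range(cutoff + 1):
        total += prob * abs(k - mu) ** p
        prob *= mu / (k + 1)
    return total


rate = 3.0  # hypothetical jump rate of the Poisson process
ratios = []
for t in [0.5, 0.1, 0.01, 0.001]:
    mu = rate * t
    m2 = poisson_central_moment(mu, 2)
    m4 = poisson_central_moment(mu, 4)
    assert abs(m2 - mu) < 1e-9                   # E(N_t - rate t)^2 = rate t
    assert abs(m4 - (mu + 3 * mu ** 2)) < 1e-9   # E(N_t - rate t)^4 = rate t + 3 (rate t)^2
    ratios.append(m4 / t)
# m4 / t tends to `rate`, not 0: the fourth moment is of order t, so gamma = 1.
```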
Exercises

34.1 Show that the space $D$ with the metric $\rho$ is separable.

34.2 Let $f_n=1_{[1/2,1/2+1/n)}$. Show that this is a Cauchy sequence with respect to $\rho$, but does not converge to an element of $D$. Show $\{f_n\}$ is not a Cauchy sequence with respect to $d$.

34.3 Show that (with respect to the topology on $D$) the subset $C[0,1]$ of $D$ is nowhere dense.

34.4 Consider $D$ with the metric $d_{\sup}(f,g)=\sup_{t\in[0,1]}|f(t)-g(t)|$. Show that $D$ is not separable with respect to the metric $d_{\sup}$.

34.5 Suppose $P$ and $P'$ are measures supported on $D[0,1]$ that agree on all cylindrical subsets of $D[0,1]$. In other words, all the finite-dimensional distributions agree. Prove that $P=P'$ on $D[0,1]$.

34.6 Show that the following are continuous functions on the space $D[0,1]$.
(1) $f(x)=\sup_{t\le1}x(t)$.
(2) $f(x)=\int_0^1x(t)\,dt$.
(3) $f(x)=\sup_{t\le1}(x(t)-x(t-))$.

34.7 Let $P$ be a Poisson process with parameter $\lambda$. Prove that
$$\frac{P_{nt}-n\lambda t}{\sqrt{n\lambda}}$$
converges weakly with respect to the topology of $D[0,1]$ as $n\to\infty$ to a Brownian motion.

34.8 Suppose $X_n$ converges weakly to $X$ with respect to the topology of $C[0,1]$. Prove that $X_n$ converges weakly to $X$ with respect to the topology of $D[0,1]$.
34.9 This is the converse to Theorem 34.6. Let $A$ be an index set, and suppose the collection of functions $\{f_\alpha, \alpha \in A\}$ is precompact in $D[0,1]$, i.e., its closure is compact. (1) Prove $\sup_{\alpha \in A} \sup_{0 \le t \le 1} |f_\alpha(t)| < \infty$. (2) Prove $\lim_{\delta \to 0} \sup_{\alpha \in A} \xi_{f_\alpha}(\delta) = 0$.
Notes See Billingsley (1968) for more information.
35 Applications of weak convergence
In Chapter 32 we showed how weak convergence of stochastic processes could be used to give another construction of Brownian motion by showing that a simple symmetric random walk converges to a Brownian motion. In the first section of this chapter, we show that the sum of independent, identically distributed mean zero random variables with variance one also converges to a Brownian motion, which is known as the Donsker invariance principle. We then consider a Brownian bridge, which is a Brownian motion conditioned to return to zero at time one. We prove in Section 35.3 that a Brownian bridge is the limit process for a sequence of normalized empirical processes.
35.1 Donsker invariance principle

Suppose the $Y_i$ are i.i.d. real-valued random variables with mean zero and variance one, $S_n = \sum_{i=1}^n Y_i$, and $Z_n(t)$ is defined to be equal to $S_{nt}/\sqrt{n}$ if $nt$ is an integer and defined by linear interpolation for other values of $t$. The Donsker invariance principle says that the $Z_n$ converge weakly with respect to the space $C[0,1]$ to a Brownian motion. This is a bit more delicate than in Section 32.2 because here our $Y_i$ only have second moments. The statement of the Donsker invariance principle is the following.

Theorem 35.1 Let the $Y_i$ and $Z_n$ be as above. Then $Z_n$ converges weakly to the law of Brownian motion on $[0,1]$ with respect to the metric of $C[0,1]$.

Before we prove this, we give an application and explain the name “invariance principle.”

Corollary 35.2 Let $M = \sup_{s \le 1} W_s$ and $M_n = \sup_{s \le 1} Z_n(s)$, where $W$ is a Brownian motion. Then $M_n$ converges weakly to $M$.

Proof Let $g$ be a bounded and continuous function on the reals and define a function $F$ on $C[0,1]$ by $F(f) = g\big(\sup_{s \le 1} f(s)\big)$.
Notice $\big|\sup_{s \le 1} f_2(s) - \sup_{s \le 1} f_1(s)\big| \le \sup_{s \le 1} |f_2(s) - f_1(s)|$, and therefore $F: C[0,1] \to \mathbb{R}$ is bounded and continuous. Since $Z_n$ converges weakly to $W$ with respect to the topology on $C[0,1]$, we have $\mathbb{E} F(Z_n) \to \mathbb{E} F(W)$. This is equivalent to $\mathbb{E} g(M_n) \to \mathbb{E} g(M)$. Because $g$ is an arbitrary bounded continuous function on the reals, we conclude $M_n \to M$ weakly.
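As a numerical illustration of Corollary 35.2 and of the reflection-principle route described in the next paragraph, here is a small Python sketch (standard library only; all function names are ours). It computes $\mathbb{P}(\max_{i \le n} S_i \ge a)$ exactly for a simple symmetric random walk using the walk's reflection principle, $\mathbb{P}(\max_{i\le n} S_i \ge a) = \mathbb{P}(S_n \ge a) + \mathbb{P}(S_n > a)$, and compares it with the Brownian value $\mathbb{P}(\sup_{s \le 1} W_s \ge b) = 2\,\mathbb{P}(W_1 \ge b)$ at $a \approx b\sqrt{n}$. The Brownian formula is the reflection principle for Brownian motion, taken here as known.

```python
from math import erfc, exp, lgamma, log, sqrt


def srw_prob_sn_equals(n: int, k: int) -> float:
    """P(S_n = k) for a simple symmetric random walk started at 0."""
    if abs(k) > n or (n + k) % 2:
        return 0.0
    j = (n + k) // 2  # number of +1 steps
    return exp(lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1) - n * log(2))


def srw_max_tail(n: int, a: int) -> float:
    """P(max_{i<=n} S_i >= a) for an integer level a >= 1.

    Reflection principle for the walk: paths that reach a and end below a
    are in bijection with paths that end above a, so
    P(max >= a) = P(S_n >= a) + P(S_n > a).
    """
    p_ge = sum(srw_prob_sn_equals(n, k) for k in range(a, n + 1))
    p_gt = p_ge - srw_prob_sn_equals(n, a)
    return p_ge + p_gt


def bm_max_tail(b: float) -> float:
    """P(sup_{s<=1} W_s >= b) = 2 P(W_1 >= b) = erfc(b / sqrt(2)), b >= 0."""
    return erfc(b / sqrt(2))


# Corollary 35.2: max_i S_i / sqrt(n) should be close in law to sup_{s<=1} W_s.
n = 10_000
for b in (0.5, 1.0, 2.0):
    a = round(b * sqrt(n))
    print(b, srw_max_tail(n, a), bm_max_tail(b))
```

Working with `lgamma` rather than exact binomial coefficients keeps the sums fast for large $n$ at the cost of ordinary floating-point rounding.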
This corollary says that the distribution of $\max_{i \le n} S_i/\sqrt{n}$ converges to that of the supremum of a Brownian motion over $[0,1]$. We can actually use this to derive the distribution of the maximum of a Brownian motion: first determine the distribution of the maximum of $S_n$ when the $Y_i$'s are particularly simple, such as when $S_n$ is a simple symmetric random walk (that is, $\mathbb{P}(Y_i = 1) = \mathbb{P}(Y_i = -1) = \frac12$). Then take the limit as $n \to \infty$. In the case of a simple symmetric random walk, we can find the distribution of the maximum using the reflection principle, and there are no technical difficulties with the proof, unlike using the reflection principle with Brownian motion.

Another useful example is where $I_n = \int_0^1 |Z_n(t)|^2\,dt$ and $I = \int_0^1 |W_t|^2\,dt$. Here the distribution of $I$ can be found by an eigenvalue argument (Kuo, 1975), and it then serves as an approximation to the distribution of $I_n$.

If $f$ is a continuous function from $C[0,1]$ to $\mathbb{R}$, an argument similar to the proof of Corollary 35.2 shows that $f(Z_n)$ converges weakly to $f(W)$. We get the same limit process regardless of the distribution of the $Y_i$'s, provided only that they are i.i.d. with mean zero and variance one. This is where the name “invariance principle” comes from: the limit is invariant with respect to changing the distribution of the $Y_i$'s.

Lemma 35.3 Suppose we have a sequence $Y_i$ of i.i.d. random variables with mean zero and variance one and $S_n = \sum_{i=1}^n Y_i$. Suppose $\lambda > 4$. Then
$$\mathbb{P}\big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\big) \le \tfrac43\, \mathbb{P}\big(|S_n| \ge \lambda\sqrt{n}/2\big).$$
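Proof Let $N = \min\{i : |S_i| \ge \lambda\sqrt{n}\}$, the first time $|S_i|$ is at least $\lambda\sqrt{n}$. $N$ is a stopping time and $(N = i)$ is in the $\sigma$-field generated by $Y_1, \ldots, Y_i$. We have
$$\begin{aligned}
\mathbb{P}\big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\big) &\le \mathbb{P}(|S_n| \ge \lambda\sqrt{n}/2) + \mathbb{P}(N < n,\ |S_n| < \lambda\sqrt{n}/2) \\
&\le \mathbb{P}(|S_n| \ge \lambda\sqrt{n}/2) + \sum_{i=1}^{n-1} \mathbb{P}(N = i,\ |S_n| < \lambda\sqrt{n}/2).
\end{aligned} \tag{35.1}$$
If $N = i$ with $i < n$ and $|S_n| < \lambda\sqrt{n}/2$, then $|S_n - S_i| \ge \lambda\sqrt{n}/2$. Moreover, the event $\{|S_n - S_i| \ge \lambda\sqrt{n}/2\}$ is in the $\sigma$-field generated by $Y_{i+1}, \ldots, Y_n$, and hence is independent of the event $\{N = i\}$. Using Chebyshev's inequality, the sum on the last line of (35.1) is bounded by
$$\begin{aligned}
\sum_{i=1}^{n-1} \mathbb{P}(N = i)\, \mathbb{P}(|S_n - S_i| \ge \lambda\sqrt{n}/2) &\le \sum_{i=1}^{n-1} \mathbb{P}(N = i)\, \frac{\mathbb{E}|S_n - S_i|^2}{\lambda^2 n/4} = \sum_{i=1}^{n-1} \mathbb{P}(N = i)\, \frac{n - i}{\lambda^2 n/4} \\
&\le \tfrac14\, \mathbb{P}(N < n) \le \tfrac14\, \mathbb{P}\big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\big),
\end{aligned}$$
since $\lambda > 4$ implies $(n - i)/(\lambda^2 n/4) \le 4/\lambda^2 < 1/4$. Therefore
$$\mathbb{P}\big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\big) \le \mathbb{P}(|S_n| \ge \lambda\sqrt{n}/2) + \tfrac14\, \mathbb{P}\big(\max_{i \le n} |S_i| \ge \lambda\sqrt{n}\big).$$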
Subtracting the second term on the right from both sides and multiplying by 4/3 proves the lemma.

Note that the central limit theorem tells us that for any $\beta > 0$,
$$\mathbb{P}(|S_n| \ge \beta\sqrt{n}) \to \mathbb{P}(|Z| \ge \beta) \le e^{-\beta^2/2},$$
where $Z$ is a mean zero normal random variable with variance one, and hence for $n$ large (depending on $\beta$),
$$\mathbb{P}(|S_n| \ge \beta\sqrt{n}) \le 2e^{-\beta^2/2}. \tag{35.2}$$

Lemma 35.4 For each $\varepsilon, \eta > 0$, there exist $n_0$ and $\delta$ such that if $n \ge n_0$ and $s \in [0, 1 - \delta]$, then
$$\mathbb{P}\Big(\sup_{s \le t \le s + \delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) \le \eta\delta.$$
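Proof Let $\varepsilon, \eta > 0$, and choose $\delta$ small enough that $2e^{-\varepsilon^2/128\delta} \le \delta\eta/2$ and $\delta < \varepsilon^2/256$ (the latter guarantees $\varepsilon/(4\sqrt{\delta}) > 4$, as needed when we apply Lemma 35.3 below). Then choose $j_0$ large enough so that, using (35.2),
$$\mathbb{P}\Big(|S_j| > \frac{\varepsilon\sqrt{j}}{8\sqrt{\delta}}\Big) \le 2e^{-\varepsilon^2/128\delta} \le \delta\eta/2$$
if $j \ge j_0$. Finally, choose $n_0 \ge j_0/\delta + 2$, so that if $n \ge n_0$, then $[n\delta] + 2 \ge j_0$ and $n\delta \ge ([n\delta] + 2)/2$, where $[x]$ is the largest integer less than or equal to $x$.

Let $n \ge n_0$ and set $J = [n\delta] + 2$. Suppose that for some $t \in [s, s + \delta]$ we have $|Z_n(t) - Z_n(s)| > \varepsilon$. Then there exists $j \le n$ such that for some $i$ between $j$ and $j + J$ we have $|S_i - S_j| \ge \varepsilon\sqrt{n}/2$. Moreover $n \ge J/2\delta$, so by Lemma 35.3
$$\begin{aligned}
\mathbb{P}\Big(\sup_{s \le t \le s + \delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) &\le \mathbb{P}\Big(\max_{j \le i \le j + J} |S_i - S_j| > \sqrt{n}\,\varepsilon/2\Big) \\
&\le \mathbb{P}\Big(\max_{j \le i \le j + J} |S_i - S_j| > \frac{\sqrt{J}\,\varepsilon}{4\sqrt{\delta}}\Big) \\
&\le \tfrac43\, \mathbb{P}\Big(|S_{j+J} - S_j| > \frac{\sqrt{J}\,\varepsilon}{8\sqrt{\delta}}\Big) \\
&\le \tfrac43\, \mathbb{P}\Big(|S_J| > \frac{\sqrt{J}\,\varepsilon}{8\sqrt{\delta}}\Big) \le \delta\eta.
\end{aligned}$$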
The proof is complete.

Lemma 35.5 For each $\varepsilon, \eta > 0$ there exist $n_0$ and $\delta$ such that if $n \ge n_0$,
$$\mathbb{P}(\omega_{Z_n}(\delta) \ge \varepsilon) \le 2\eta.$$

Proof We will take $\delta = 1/K$ for some large $K$. If $|t - s| \le 1/K$, then either both $s, t$ are in the same interval $[(i-1)/K, i/K]$ or they are in adjoining intervals. Thus they both lie in some interval of the form $[(i-2)/K, i/K]$. Since
$$|Z_n(t) - Z_n(s)| \le |Z_n(t) - Z_n((i-2)/K)| + |Z_n(s) - Z_n((i-2)/K)|,$$
then using Lemma 35.4 with $\delta = 2/K$,
$$\begin{aligned}
\mathbb{P}(\exists\, s, t \in [0,1] : |Z_n(t) - Z_n(s)| \ge \varepsilon,\ |t - s| < \delta) &\le \mathbb{P}\Big(\exists\, i \le K : \sup_{(i-2)/K \le s \le i/K} |Z_n(s) - Z_n((i-2)/K)| \ge \varepsilon/2\Big) \\
&\le K \sup_i \mathbb{P}\Big(\sup_{(i-2)/K \le s \le i/K} |Z_n(s) - Z_n((i-2)/K)| \ge \varepsilon/2\Big) \\
&\le K\eta(2/K) = 2\eta,
\end{aligned}$$
which proves the lemma.

We can now prove the Donsker invariance principle.

Proof of Theorem 35.1 By Lemma 35.5, Theorem 32.1, and the fact that $Z_n(0) = 0$ for all $n$, the laws of the $Z_n$ are tight. Therefore by Prohorov's theorem (Theorem 30.4), every subsequence has a further subsequence which converges weakly with respect to the topology on $C[0,1]$. We therefore only need to show that every subsequential limit point of the $Z_n$ with respect to weak convergence is a Brownian motion. Since our processes lie in $C[0,1]$, the paths of any subsequential limit point are continuous, so it suffices by Theorem 2.6 to show that the finite-dimensional distributions of $Z_n$ converge weakly to the corresponding finite-dimensional distributions of a Brownian motion $W$. We will show the one-dimensional distributions converge, and leave the analogous argument for the higher-dimensional distributions to the reader. We have
$$\mathbb{P}\big(\max_{i \le n} |Y_i|/\sqrt{n} \ge \varepsilon\big) \le n\,\mathbb{P}(|Y_1| \ge \sqrt{n}\,\varepsilon) = n\,\mathbb{P}(|Y_1|^2/\varepsilon^2 \ge n). \tag{35.3}$$
For any integrable non-negative random variable $X$, $n\,\mathbb{P}(X \ge n) = \mathbb{E}[n; X \ge n] \le \mathbb{E}[X; X \ge n]$, which tends to zero by dominated convergence. Applying this with $X = |Y_1|^2/\varepsilon^2$, which is integrable because $Y_1$ has finite variance, we obtain
$$\mathbb{P}\big(\max_{i \le n} |Y_i|/\sqrt{n} \ge \varepsilon\big) \to 0. \tag{35.4}$$
Fix $t \in (0,1]$. By the central limit theorem, $S_{[nt]}/\sqrt{[nt]}$ converges weakly on $\mathbb{R}$ to a mean zero normal random variable with variance one, and by Exercise 30.3, we see that $S_{[nt]}/\sqrt{n}$ converges weakly to a mean zero normal random variable with variance $t$. Since $|Z_n(t) - S_{[nt]}/\sqrt{n}| \le \max_{i \le n} |Y_i|/\sqrt{n}$, (35.4) shows that for each $t$, $|Z_n(t) - S_{[nt]}/\sqrt{n}|$ converges to zero in probability. By Exercise 30.2, $Z_n(t)$ has the same weak limit as $S_{[nt]}/\sqrt{n}$, namely a mean zero normal random variable with variance $t$, which is the distribution of $W_t$. (For $t = 0$ there is nothing to prove, since $Z_n(0) = W_0 = 0$.)

There is an elegant proof of the Donsker invariance principle using Skorokhod embedding. Unlike the proof above, however, this second proof does not extend to random variables taking values in $\mathbb{R}^d$. By Theorem 15.6 we can find a Brownian motion $W$ and a random walk $S_n$ such that
$$\sup_{i \le n} \frac{|S_i - W_i|}{\sqrt{n}} \to 0$$
in probability. By the continuity of paths of $W$,
$$\mathbb{P}\Big(\sup_{|t-s| \le 1/n,\ s,t \le 1} |W_t - W_s| > \varepsilon\Big) \to 0.$$
If we let $W_n(t) = W_{nt}/\sqrt{n}$, we then have that $\sup_{i \le n} |Z_n(i/n) - W_n(i/n)|$ tends to zero in probability as $n \to \infty$, and also, because $W_n$ is again a Brownian motion,
$$\mathbb{P}\Big(\sup_{|t-s| \le 1/n,\ s,t \le 1} |W_n(t) - W_n(s)| > \varepsilon\Big) \to 0.$$
Since $Z_n$ is linear between the points $i/n$, combining these two facts we conclude that
$$\sup_{t \le 1} |Z_n(t) - W_n(t)| \to 0$$
in probability. The law of $W_n$ is that of a Brownian motion and does not depend on $n$. By Exercise 30.2 we obtain that $Z_n$ converges weakly to the law of a Brownian motion. If the above proof seems too simple, remember that we used Theorem 15.6, which in turn relies on Skorokhod embedding.

One might ask about the weak convergence of $\widetilde Z_n(t) = S_{[nt]}/\sqrt{n}$; these are the normalized partial sums without the linear interpolation. Rather than being continuous and piecewise linear like the $Z_n(t)$, the $\widetilde Z_n(t)$ are piecewise constant and have jumps.

Proposition 35.6 Suppose the $Y_i$ are i.i.d. with mean zero and variance one. Then the $\widetilde Z_n$ converge weakly with respect to the topology of $D[0,1]$ to a Brownian motion.

Proof The $Z_n$ converge weakly with respect to the topology of $C[0,1]$ to a Brownian motion. By the Skorokhod representation (Theorem 31.2), we can find a probability space and random variables $Z_n'$ having the same law as $Z_n$ that converge almost surely with respect to the supremum norm. Therefore the $Z_n'$ converge almost surely with respect to the metric of $D[0,1]$, and hence the $Z_n$ converge weakly to a Brownian motion with respect to the topology of $D[0,1]$. If we show that $\sup_{t \le 1} |Z_n(t) - \widetilde Z_n(t)|$ converges to zero in probability, then our result will follow by Exercise 30.2. Now $Z_n(t)$ and $\widetilde Z_n(t)$ can differ by more than $\varepsilon$ for some $t$ only if some $Y_i$ is larger than $\sqrt{n}\,\varepsilon$ in absolute value. But by (35.4), the probability of this tends to zero as $n \to \infty$.
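The key estimate in the proof of Proposition 35.6 is $\sup_t |Z_n(t) - \widetilde Z_n(t)| \le \max_{i \le n} |Y_i|/\sqrt{n}$: on each interval $[k/n, (k+1)/n)$ the two paths differ by a fraction of $Y_{k+1}/\sqrt{n}$. The Python sketch below (standard library only; Gaussian $Y_i$ and the grid size are purely illustrative choices, and the function names are ours) builds both normalized partial-sum paths and checks this bound on a fine grid.

```python
import math
import random


def partial_sums(ys):
    """Return [S_0, S_1, ..., S_n] with S_0 = 0."""
    s = [0.0]
    for y in ys:
        s.append(s[-1] + y)
    return s


def z_interp(s, t):
    """Z_n(t): S_k / sqrt(n) at t = k/n, linearly interpolated in between."""
    n = len(s) - 1
    x = n * t
    k = min(int(math.floor(x)), n - 1)  # left grid point; t = 1 uses the last segment
    return (s[k] + (x - k) * (s[k + 1] - s[k])) / math.sqrt(n)


def z_step(s, t):
    """Piecewise constant version S_[nt] / sqrt(n), with no interpolation."""
    n = len(s) - 1
    return s[int(math.floor(n * t))] / math.sqrt(n)


rng = random.Random(2024)
n = 400
ys = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d., mean 0, variance 1
s = partial_sums(ys)
grid = [j / 4000 for j in range(4001)]
gap = max(abs(z_interp(s, t) - z_step(s, t)) for t in grid)
bound = max(abs(y) for y in ys) / math.sqrt(n)
print(gap, bound)  # the gap never exceeds max_i |Y_i| / sqrt(n)
```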
35.2 Brownian bridge

A Brownian bridge $W_t^0$ is the process defined by
$$W_t^0 = W_t - tW_1, \qquad 0 \le t \le 1,$$
where $W$ is a Brownian motion. $W^0$ has continuous paths, is jointly normal, is zero at time 0 and at time 1, has mean zero, and we calculate its covariance by
$$\operatorname{Cov}(W_s^0, W_t^0) = \operatorname{Cov}(W_s, W_t) - s\operatorname{Cov}(W_1, W_t) - t\operatorname{Cov}(W_s, W_1) + st\operatorname{Var}(W_1) = s \wedge t - st,$$
recalling (2.1).
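The definition $W_t^0 = W_t - tW_1$ translates directly into a sampler. The Python sketch below (standard library only; the function names, grid size, sample size, and seed are illustrative choices) builds $W$ from independent Gaussian increments on the grid $j/m$, forms the bridge, and compares Monte Carlo estimates with the covariance $s \wedge t - st$ computed above.

```python
import math
import random


def brownian_bridge(m, rng):
    """Sample W^0 at t_j = j/m via W^0_t = W_t - t W_1.

    W is built from independent N(0, 1/m) increments, so the sample is exact
    in law at the grid points (no time-discretization error there).
    """
    w = [0.0]
    for _ in range(m):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(1.0 / m)))
    w1 = w[-1]
    return [w[j] - (j / m) * w1 for j in range(m + 1)]


def bridge_cov(s, t):
    """Cov(W^0_s, W^0_t) = min(s, t) - s t."""
    return min(s, t) - s * t


rng = random.Random(123)
paths = [brownian_bridge(4, rng) for _ in range(20_000)]
# W^0 has mean zero, so averages of products estimate covariances.
emp_var = sum(p[2] ** 2 for p in paths) / len(paths)     # t = 1/2
emp_cov = sum(p[1] * p[3] for p in paths) / len(paths)   # s = 1/4, t = 3/4
print(emp_var, bridge_cov(0.5, 0.5))
print(emp_cov, bridge_cov(0.25, 0.75))
```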
A Brownian bridge can be characterized as a Brownian motion conditioned to be zero at time 1. To make this precise, let $W$ be a Brownian motion started at zero under $\mathbb{P}$, and for $A$ a Borel subset of $C[0,1]$, define
$$\mathbb{P}_\varepsilon(A) = \mathbb{P}(W \in A \mid |W_1| \le \varepsilon);$$
cf. (A.13). Set $\mathbb{P}_0(A) = \mathbb{P}(W^0 \in A)$, the law of $W^0$.

Proposition 35.7 $\mathbb{P}_\varepsilon$ converges weakly to $\mathbb{P}_0$ with respect to the topology of $C[0,1]$ as $\varepsilon \to 0$.

Proof Since $W$ is a jointly normal process and
$$\operatorname{Cov}(W_t - tW_1, W_1) = \operatorname{Cov}(W_t, W_1) - t\operatorname{Var}(W_1) = 0,$$
the process $W_t^0 = W_t - tW_1$ and the random variable $W_1$ are independent by Proposition A.55. Let $F$ be any closed subset of $C[0,1]$ and let $F_\delta = \{g \in C[0,1] : d(g, F) < \delta\}$, where $d(g, F) = \inf\{d(g, f) : f \in F\}$ and $d$ here is the metric given by the supremum norm. Note $\sup_{t \le 1} |W_t - W_t^0| \le \varepsilon$ on the event $\{|W_1| \le \varepsilon\}$. If $\delta > \varepsilon$,
$$\mathbb{P}_\varepsilon(F) = \mathbb{P}(W \in F \mid |W_1| \le \varepsilon) \le \mathbb{P}(W^0 \in F_\delta \mid |W_1| \le \varepsilon) = \mathbb{P}(W^0 \in F_\delta) = \mathbb{P}_0(F_\delta).$$
Thus $\limsup_{\varepsilon \to 0} \mathbb{P}_\varepsilon(F) \le \mathbb{P}_0(F_\delta)$. Since $F$ is closed, $\mathbb{P}_0(F_\delta) \to \mathbb{P}_0(F)$ as $\delta \to 0$, so
$$\limsup_{\varepsilon \to 0} \mathbb{P}_\varepsilon(F) \le \mathbb{P}_0(F).$$
An application of Theorem 30.2 completes the proof.

We show that a Brownian bridge can also be represented as the solution $X$ of the stochastic differential equation
$$dX_t = dW_t - \frac{X_t}{1-t}\,dt, \qquad X_0 = 0, \tag{35.5}$$
where $W$ is a Brownian motion. This is plausible: $X$ behaves much like a Brownian motion until $t$ is close to 1, when there is a strong push toward the origin. The existence and uniqueness theory of Chapter 24 shows uniqueness and existence for the solution of (35.5) for $s \le t$ for any $t < 1$; see Exercise 24.4. We can solve (35.5) explicitly. We have
$$dW_t = dX_t + \frac{X_t}{1-t}\,dt = (1-t)\,d\Big(\frac{X_t}{1-t}\Big),$$
or
$$X_t = (1-t)\int_0^t \frac{dW_s}{1-s}.$$
Thus $X_t$ is a continuous Gaussian process with mean zero. The variance of $X_t$ is
$$(1-t)^2 \int_0^t (1-s)^{-2}\,ds = t - t^2,$$
the same as the variance of a Brownian bridge. A similar calculation shows that the covariance of Xt and Xs is the same as the covariance of Wt − tW1 and Ws − sW1 ; see Exercise 24.6. Hence the finite-dimensional distributions of Xt and a Brownian bridge are the same. We now appeal to Theorem 2.6.
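One can also check numerically that the SDE (35.5) produces the Brownian bridge variance $t - t^2$. The sketch below is a plain Euler–Maruyama discretization (a standard scheme that the text does not discuss), run only up to a time $t < 1$ so that the drift $-X_t/(1-t)$ stays bounded; the step count, sample size, and seed are arbitrary choices, and the function name is ours.

```python
import math
import random


def bridge_sde_euler(t_end, steps, rng):
    """Euler-Maruyama approximation of X_{t_end} for
    dX_t = dW_t - X_t / (1 - t) dt,  X_0 = 0,  with t_end < 1."""
    dt = t_end / steps
    x = 0.0
    for i in range(steps):
        t = i * dt
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x += dw - x / (1.0 - t) * dt
    return x


rng = random.Random(7)
t_end = 0.5
samples = [bridge_sde_euler(t_end, 100, rng) for _ in range(4000)]
emp_var = sum(x * x for x in samples) / len(samples)  # the mean is zero
print(emp_var, t_end - t_end ** 2)                     # compare with t - t^2
```

The estimate carries both Monte Carlo error and an O(dt) discretization bias, so only rough agreement with $t - t^2 = 0.25$ should be expected.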
35.3 Empirical processes

In this section we will consider empirical processes, which are useful in statistics in estimating distribution functions. Let $X_i$, $i = 1, \ldots, n$, be i.i.d. random variables that are uniformly distributed on the interval $[0,1]$. Define the empirical process
$$F_n(t) = \frac1n \sum_{i=1}^n 1_{[0,t]}(X_i). \tag{35.6}$$
The Glivenko–Cantelli theorem (Theorem A.40) says that
$$\sup_{t \in [0,1]} |F_n(t) - t| \to 0, \qquad \text{a.s.}$$
Our goal in this section is to obtain the corresponding weak limit theorem. Let
$$Z_n(t) = \sqrt{n}\,(F_n(t) - t) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \big(1_{[0,t]}(X_i) - t\big). \tag{35.7}$$
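Definitions (35.6) and (35.7) can be evaluated directly. The Python sketch below (standard library only; all names are ours) computes $F_n$, $Z_n$, and the Glivenko–Cantelli quantity $\sup_t |F_n(t) - t|$ for samples in $[0,1]$. For the supremum it uses the observation that $F_n$ is a step function and $t \mapsto t$ is increasing, so the supremum is attained at, or approached just before, an order statistic. The small example also exhibits the jump of $Z_n$ at a sample point.

```python
import math
import random


def empirical_cdf(xs, t):
    """F_n(t) = (1/n) #{i : X_i <= t}, as in (35.6)."""
    return sum(1 for x in xs if x <= t) / len(xs)


def empirical_process(xs, t):
    """Z_n(t) = sqrt(n) (F_n(t) - t), as in (35.7), for samples in [0, 1]."""
    return math.sqrt(len(xs)) * (empirical_cdf(xs, t) - t)


def sup_deviation(xs):
    """sup_t |F_n(t) - t|, evaluated at (or just before) the order statistics."""
    ys = sorted(xs)
    n = len(ys)
    return max(max((i + 1) / n - y, y - i / n) for i, y in enumerate(ys))


xs = [0.1, 0.5, 0.9]
print(empirical_process(xs, 0.5))  # sqrt(3) * (2/3 - 1/2)
print(sup_deviation(xs))

rng = random.Random(0)
big = [rng.random() for _ in range(10_000)]
print(sup_deviation(big))  # small, as the Glivenko-Cantelli theorem predicts
```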
We will show that $Z_n$ converges weakly with respect to $D[0,1]$ to a Brownian bridge. Let
$$\omega_{Z_n}(\delta) = \sup_{s,t \in [0,1],\ |t-s| < \delta} |Z_n(t) - Z_n(s)|.$$
The paths of $Z_n$ are not continuous: they have a jump of size $1/\sqrt{n}$ at every time $X_i$. Thus, for fixed $n$, $\omega_{Z_n}(\delta)$ does not tend to zero as $\delta \to 0$. Nevertheless we can get reasonable estimates on $\omega_{Z_n}(\delta)$. We need an elementary lemma on binomial random variables, the proof of which is Exercise 35.1.

Lemma 35.8 Suppose $S_n$ is a binomial random variable with parameters $n$ and $p$. Then there exists a constant $c$ not depending on $n$ or $p$ such that
$$\mathbb{E}|S_n - \mathbb{E} S_n|^4 \le cnp + cn^2p^2 \tag{35.8}$$
and
$$\mathbb{E}|S_n|^4 \le cnp + cn^4p^4. \tag{35.9}$$
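For (35.8) one can be explicit: a standard closed form for the fourth central moment of a binomial is $npq\,(1 + 3(n-2)pq)$ with $q = 1 - p$, which equals $npq + 3n(n-2)p^2q^2 \le np + 3n^2p^2$, so $c = 3$ works in (35.8). The Python sketch below (standard library only; names are ours) checks the closed form against direct summation and checks the bound for a few values of $(n, p)$. It is a sanity check, not a substitute for Exercise 35.1.

```python
from math import comb


def binomial_central_fourth_moment(n, p):
    """E|S_n - E S_n|^4 for S_n ~ Binomial(n, p), by direct summation."""
    mean = n * p
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * (k - mean) ** 4
               for k in range(n + 1))


def closed_form(n, p):
    """Closed form n p q (1 + 3 (n - 2) p q), with q = 1 - p."""
    q = 1 - p
    return n * p * q * (1 + 3 * (n - 2) * p * q)


for n in (1, 5, 50):
    for p in (0.01, 0.3, 0.9):
        exact = binomial_central_fourth_moment(n, p)
        bound = n * p + 3 * n**2 * p**2  # (35.8) with c = 3
        print(n, p, exact, closed_form(n, p), exact <= bound)
```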
Proposition 35.9 Let $\varepsilon, \eta > 0$. There exist $\delta$ and $n_0$ such that if $n \ge n_0$, then
$$\mathbb{P}(\omega_{Z_n}(\delta) > \varepsilon) \le \eta.$$
The idea of the proof is to use Corollary 8.4 to estimate $Z_n(t) - Z_n(s)$ for dyadic $s, t$ at distance $2^{-k} \approx 1/n$ or more, and to use estimates on binomials for the oscillation within a single dyadic interval of length $2^{-k}$.

Proof Let $\varepsilon, \eta > 0$. We will choose $n_0, \delta$ later. Assuming that they have been chosen, suppose $n \ge n_0$ and choose $k$ such that $n \le 2^k < 2n$. If $t \in [0,1]$, let $t(k)$ be the largest multiple of $2^{-k}$ less than or equal to $t$ and similarly define $s(k)$. Let $D_k = \{i/2^k : 0 \le i \le 2^k\}$. We will show there exists $\delta > 0$ such that
$$\mathbb{P}\Big(\sup_{s,t \in D_k,\ |t-s| < 2\delta} |Z_n(t) - Z_n(s)| > \varepsilon/3\Big) < \eta/3 \tag{35.10}$$
and
$$\mathbb{P}\Big(\sup_{s \in [0,1]} |Z_n(s) - Z_n(s(k))| > \varepsilon/3\Big) < \eta/3. \tag{35.11}$$
Step 1. We first prove (35.10) by using Corollary 8.4. Suppose $s, t \in D_k$ with $|t - s| < 2\delta$. Then either $s = t$, in which case $Z_n(t) - Z_n(s) = 0$, or else $|t - s| \ge 2^{-k} \ge 1/(2n)$. In the latter case, say $s < t$, take $p = t - s$ and note that $1_{(s,t]}(X_i)$ is a Bernoulli random variable with parameter $p$. Using (35.7) and Lemma 35.8,
$$\mathbb{E}|Z_n(t) - Z_n(s)|^4 \le \frac{c}{n^2}(np + n^2p^2) = c\Big(\frac{p}{n} + p^2\Big) \le c|t - s|^2,$$
where in the last step we used $1/n \le 2|t - s|$. By Corollary 8.4,
$$\begin{aligned}
\mathbb{P}\Big(\sup_{s,t \in D_k,\ |t-s| < 2\delta} |Z_n(t) - Z_n(s)| > \varepsilon/3\Big) &\le \mathbb{P}\Big(\sup_{s,t \in D_k,\ |t-s| < 2\delta} \frac{|Z_n(t) - Z_n(s)|}{|t - s|^{1/8}} > \frac{\varepsilon}{c\,\delta^{1/8}}\Big) \\
&\le c\,(\varepsilon/\delta^{1/8})^{-4} = c\,\delta^{1/2}/\varepsilon^4.
\end{aligned}$$
We choose $\delta$ small enough so that the last term is less than $\eta/3$.

Step 2. We now prove (35.11). Let
$$T_n(t) = \sum_{i=1}^n 1_{[0,t]}(X_i).$$
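Observe that $T_n(t)$ is nondecreasing in $t$. If there exists $s \in [0,1]$ such that $T_n(s) - T_n(s(k)) > \varepsilon\sqrt{n}/3$, then there exists $j \le 2^k - 1$ such that $T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3$. Each increment $T_n((j+1)/2^k) - T_n(j/2^k)$ is binomial with parameters $n$ and $2^{-k}$. Therefore, using (35.9),
$$\begin{aligned}
\mathbb{P}\Big(\sup_{s \in [0,1]} \frac{T_n(s) - T_n(s(k))}{\sqrt{n}} > \varepsilon/3\Big) &\le \mathbb{P}\big(\exists\, j \le 2^k - 1 : T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3\big) \\
&\le 2^k \sup_{j \le 2^k - 1} \mathbb{P}\big(T_n((j+1)/2^k) - T_n(j/2^k) > \varepsilon\sqrt{n}/3\big) \\
&\le c\,2^k \sup_j \frac{\mathbb{E}|T_n((j+1)/2^k) - T_n(j/2^k)|^4}{\varepsilon^4 n^2} \\
&\le c\,2^k\, \frac{n2^{-k} + (n2^{-k})^4}{\varepsilon^4 n^2}.
\end{aligned}$$
Since $n2^{-k} \le 2$, the last line is less than or equal to $c\,2^k n2^{-k}/(\varepsilon^4 n^2) = c_1/(\varepsilon^4 n)$. We choose $n_0 > 1/\delta$ large enough so that if $n \ge n_0$, then $c_1/(\varepsilon^4 n)$ is less than $\eta/3$. Also,
$$\mathbb{E}[T_n(s) - T_n(s(k))] = n(s - s(k)) \le n2^{-k} \le 2,$$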
which will be less than $\varepsilon\sqrt{n}/3$ if $n \ge 36/\varepsilon^2$, and we choose $n_0$ larger if necessary so that $n_0 > 36/\varepsilon^2$. Since
$$Z_n(t) - Z_n(s) = \frac{T_n(t) - T_n(s)}{\sqrt{n}} - \frac{\mathbb{E}[T_n(t) - T_n(s)]}{\sqrt{n}},$$
(35.11) follows.

Step 3. Now that we have (35.10) and (35.11), we write
$$|Z_n(t) - Z_n(s)| \le |Z_n(t) - Z_n(t(k))| + |Z_n(t(k)) - Z_n(s(k))| + |Z_n(s(k)) - Z_n(s)|.$$
If $|t - s| < \delta$, then $|t(k) - s(k)| \le \delta + 2^{-k} \le \delta + 1/n$. Provided $n \ge n_0 > 1/\delta$, we have $\delta + 1/n < 2\delta$, and combining (35.10) and (35.11) gives
$$\mathbb{P}\Big(\sup_{s,t \in [0,1],\ |t-s| < \delta} |Z_n(t) - Z_n(s)| > \varepsilon\Big) < \eta,$$
as required.

Theorem 35.10 The $Z_n$ converge weakly to a Brownian bridge with respect to the topology of $D[0,1]$.

Proof We smooth $Z_n$ to get a continuous process $V_n$. Set $Z_n(t) = Z_n(1)$ for $t \in [1,2]$ and set
$$V_n(t) = n \int_0^{n^{-1}} Z_n(u + t)\,du.$$
We have
$$|V_n(t_2) - V_n(t_1)| \le n \int_0^{n^{-1}} |Z_n(t_2 + u) - Z_n(t_1 + u)|\,du \le n \int_0^{n^{-1}} \omega_{Z_n}(|t_2 - t_1|)\,du = \omega_{Z_n}(|t_2 - t_1|).$$
Note also that by (35.8) with $p = u$ and using Jensen's inequality with the measure $n1_{[0,n^{-1}]}(u)\,du$,
$$\mathbb{E}|V_n(0)|^4 \le n \int_0^{n^{-1}} \mathbb{E}|Z_n(u)|^4\,du \le c.$$
Hence
$$\mathbb{P}(|V_n(0)| \ge A) \le \frac{\mathbb{E}|V_n(0)|^4}{A^4} \le \frac{c}{A^4}.$$
Combining this with the bound on $|V_n(t_2) - V_n(t_1)|$ and Proposition 35.9, by Theorem 8.1 the $V_n$ are tight with respect to weak convergence on $C[0,1]$. If the $V_{n_j}$ converge weakly (with respect to $C[0,1]$), by the Skorokhod representation we may find $V_{n_j}'$ with the same law as $V_{n_j}$ that converge almost surely. Then the $V_{n_j}'$ will also converge almost surely in the space $D[0,1]$. This proves that the $V_n$ are tight in $D[0,1]$ by Exercise 30.10. Given $\varepsilon$ and $\eta$, choose $\delta$ and $n_0$ such that $\mathbb{P}(\omega_{Z_n}(\delta) > \varepsilon) < \eta$ if $n \ge n_0$. We have
$$|V_n(t) - Z_n(t)| \le n \int_0^{n^{-1}} |Z_n(u + t) - Z_n(t)|\,du \le \omega_{Z_n}(n^{-1}).$$
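If $n$ is large enough so that $n^{-1} < \delta$, then
$$\mathbb{P}\big(\sup_t |V_n(t) - Z_n(t)| > \varepsilon\big) \le \mathbb{P}(\omega_{Z_n}(n^{-1}) > \varepsilon) \le \mathbb{P}(\omega_{Z_n}(\delta) > \varepsilon) < \eta.$$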
Therefore Vn − Zn converges to 0 in probability, and by Exercise 30.2 the subsequential limit points of Vn are the same as those of Zn . It remains to show that any subsequential limit point of the Zn is a Brownian bridge. This follows from the multidimensional central limit theorem for multinomials (see Remark A.57) and is left as Exercise 35.2.
Exercises
35.1 Prove Lemma 35.8.
35.2 Prove that the finite-dimensional distributions of $Z_n$ in Theorem 35.10 converge to those of a Brownian bridge.
35.3 If $W_t^0$ is a Brownian bridge, prove that $Y_t = W_{1-t}^0$ is also a Brownian bridge.
35.4 Let $t_0 < 1$. The SDE (35.5) has a unique solution when $X_0 = 0$ is replaced by $X_0 = x$. Let $\mathbb{P}^x$ be the law of the solution when $X_0 = x$ and let $Z_t$ be the canonical process. Show that $(Z_t, \mathbb{P}^x)$ is not a Markov process.
35.5 Let $N_t(A)$ be a Poisson point process with respect to the measure space $(S, m)$ and let $A_s$, $s > 0$, be an increasing sequence of subsets of $S$ with $m(A_s) \to \infty$ as $s \to \infty$. Does
$$\frac{N_t(A_s) - m(A_s)}{\sqrt{m(A_s)}}$$
converge weakly with respect to $D[0,1]$ as $s \to \infty$? What is the limit? This can be applied to get central limit theorems for the number of downcrossings of a Brownian motion, for example.
35.6 This exercise asks you to prove that the Poisson process conditioned to be equal to $n$ at time 1 has the same law as $n$ times the empirical process. Here is the precise statement. Suppose $P_t$ is a Poisson process with parameter $\lambda > 0$. Let $Q$ be the law of $\{P_t, t \in [0,1]\}$ conditioned so that $P_1 = n$. Thus $Q$ is a probability on $D[0,1]$ with
$$Q(P \in A) = \mathbb{P}(P \in A \mid P_1 = n).$$
Since (P1 = n) is an event with positive probability, there is no difficulty defining these conditional probabilities. Prove that Q is also the law of {nFn (t ), t ∈ [0, 1]}, where Fn is defined in Section 35.3.
36 Semigroups
In this chapter we suppose we have a semigroup of positive contraction operators $\{P_t\}$, and we show how to construct a Markov process $X$ corresponding to this semigroup. In Chapters 37 and 38, we will show how such semigroups might arise. We suppose that the state space $S$ is a separable locally compact metric space. Let $C_0$ be the set of continuous functions on $S$ that vanish at infinity. Recall that $f \in C_0$ if $f$ is continuous and, given $\varepsilon > 0$, there exists a compact set $K$ depending on $\varepsilon$ and $f$ such that
$$|f(x)| < \varepsilon, \qquad x \notin K.$$
We use the usual supremum norm on $C_0$. We assume we have a semigroup $\{P_t\}$ of positive contractions mapping $C_0$ to $C_0$. More precisely, we assume the following.

Assumption 36.1 There exists a family $\{P_t\}$, $t \ge 0$, of operators on $C_0$ such that
(1) If $f \in C_0$, then $P_t(P_s f)(x) = P_{t+s} f(x)$ for $x \in S$ and $s, t \ge 0$.
(2) If $f(x) \ge 0$ for all $x$ and if $t \ge 0$, then $P_t f(x) \ge 0$ for all $x$.
(3) For all $t$, $\|P_t f\| \le \|f\|$.
(4) If $f \in C_0$, then $P_t f \to f$ uniformly as $t \to 0$.

Our goal in this section is to construct a process $X$ corresponding to the semigroup $P_t$. The steps we use are the following.
(1) We temporarily assume each $P_t$ maps the function 1 into itself. We define $X_t$ for $t$ in the dyadic rationals and define $\mathbb{P}^x$ using the Kolmogorov extension theorem.
(2) We verify a preliminary version of the Markov property.
(3) We use the regularity theorem for supermartingales to show that $X$ has left and right limits along the dyadic rationals, and then define $X_t$ for all $t$.
(4) We verify that our process $(X_t, \mathbb{P}^x)$ corresponds to the semigroup $P_t$.
(5) We remove the assumption that $P_t 1 = 1$.
36.1 Constructing the process

Let us assume the following for now. We will remove this assumption at the end of this section.

Assumption 36.2 $P_t 1(x) = 1$ for all $x$ and all $t \ge 0$.
We now begin the construction of $(X_t, \mathbb{P}^x)$.

Step 1. Let $D_n = \{k/2^n : k \ge 0\}$ and let $D = \bigcup_n D_n$, the dyadic rationals. Let $\Omega$ be the set of functions from $D$ to $S$. Define
$$X_t(\omega) = \omega(t), \qquad t \in D,\ \omega \in \Omega.$$
We let $\mathcal{F}$ be the $\sigma$-field on $\Omega$ generated by the collection of cylindrical subsets of $\Omega$. By the Riesz representation theorem (see Rudin (1987)), for each $t > 0$ there exists a measure $P_t(x, dy)$ such that
$$P_t f(x) = \int f(y)\, P_t(x, dy), \qquad f \in C_0. \tag{36.1}$$
(The Riesz representation theorem is most often phrased for continuous functions on compact spaces; since we are working with $C_0$, we can let the state space satisfy slightly weaker hypotheses; see Folland (1999), p. 223.) We can use (36.1) to define $P_t f$ for all bounded Borel measurable functions $f$. Since $P_t$ maps $C_0$ to $C_0$, and continuous functions are Borel measurable, a limit argument shows that $P_t f$ is Borel measurable whenever $f$ is bounded and Borel measurable.

Our main task in this step is to define $\mathbb{P}^x$. $D$ is countable and we fix a labeling $D = \{t_1, t_2, \ldots\}$. Let $E_n = \{t_1, \ldots, t_n\}$. Let $s_1 \le \cdots \le s_n$ be the ordering of $E_n$ according to the usual ordering of the reals, so that $s_1$ is the smallest element of the set $\{t_1, \ldots, t_n\}$, $s_2$ is the next smallest, and so on. Define
$$\mathbb{P}_n^x(X_{s_1} \in A_1, \ldots, X_{s_n} \in A_n) = \int_{A_1} \cdots \int_{A_n} P_{s_1}(x, dx_1)\, P_{s_2 - s_1}(x_1, dx_2) \cdots P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \tag{36.2}$$
for $A_1, \ldots, A_n$ Borel subsets of $S$. The $\mathbb{P}_n^x$ are consistent in the sense of Appendix D. The key to checking this is to observe that if $s_1, \ldots, s_n$ is the ordering of $E_n$ and we temporarily write $s_1, \ldots, s_i, s, s_{i+1}, \ldots, s_n$ for the ordering of $E_{n+1}$, then
$$\int_S P_{s - s_i}(x_i, dy)\, P_{s_{i+1} - s}(y, dx_{i+1}) = P_{s_{i+1} - s_i}(x_i, dx_{i+1})$$
by the semigroup property; cf. (19.10). By the Kolmogorov extension theorem (Theorem D.1), for each $x$ there exists a probability $\mathbb{P}^x$ such that
$$\mathbb{P}^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = \mathbb{P}_n^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n)$$
for each $n$ whenever $A_1, \ldots, A_n$ are Borel subsets of $S$. If $\mathbb{E}^x$ is the expectation corresponding to $\mathbb{P}^x$, (36.2) can be rewritten as
$$\mathbb{E}^x[f_1(X_{s_1}) \cdots f_n(X_{s_n})] = \int \cdots \int f_1(x_1) \cdots f_n(x_n)\, P_{s_1}(x, dx_1)\, P_{s_2 - s_1}(x_1, dx_2) \cdots P_{s_n - s_{n-1}}(x_{n-1}, dx_n) \tag{36.3}$$
when $f_i = 1_{A_i}$ for each $i$. More generally, by linearity we have (36.3) when the functions $f_i$ are simple functions; by a limit argument we have (36.3) when the $f_i$ are Borel measurable and non-negative; and by linearity, (36.3) holds when the $f_i$ are bounded and Borel measurable. By (36.2) we have
$$\mathbb{P}^x(X_t \in A) = \mathbb{E}^x 1_A(X_t) = \int_A P_t(x, dy) = P_t 1_A(x).$$
Using linearity and a limit argument, we have $\mathbb{E}^x f(X_t) = P_t f(x)$ when $f$ is bounded and Borel measurable.

Proposition 36.3 If $f$ is bounded and Borel measurable, $s, t > 0$, and $x \in S$, then
$$\mathbb{E}^x\, \mathbb{E}^{X_t} f(X_s) = \mathbb{E}^x f(X_{s+t}). \tag{36.4}$$
Proof The proof of (36.4) is mainly a matter of sorting out notation. Let $\varphi(y) = \mathbb{E}^y f(X_s) = P_s f(y)$. Hence $\mathbb{E}^{X_t} f(X_s) = \varphi(X_t) = P_s f(X_t)$. Then the left-hand side of (36.4) is $\mathbb{E}^x (P_s f)(X_t) = P_t(P_s f)(x)$. The right-hand side of (36.4) is $P_{s+t} f(x)$, and so the two sides agree by the semigroup property.

Step 2. We so far only have $X_t$ constructed for $t \in D$. To extend the definition to all $t$, we want to let $X_t = \lim_{u > t,\, u \in D,\, u \to t} X_u$. But before we can make that definition, we need to know that the limits exist. We will use the regularity of supermartingales to show this, so we need to look at conditional expectations. Let
$$\mathcal{F}_s = \sigma(X_r;\ r \le s,\ r \in D).$$

Proposition 36.4 If $s < t$ with $s, t \in D$ and $f$ is bounded and Borel measurable, then
$$\mathbb{E}^x[f(X_t) \mid \mathcal{F}_s] = \mathbb{E}^{X_s} f(X_{t-s}), \qquad \mathbb{P}^x\text{-a.s.} \tag{36.5}$$
Proof Take $n \ge 1$, $r_1 \le r_2 \le \cdots \le r_n \le s$ with each $r_j$ in $D$, and $A_1, \ldots, A_n$ Borel subsets of $S$. It suffices to show that
$$\mathbb{E}^x[f(X_t) 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})] = \mathbb{E}^x[(\mathbb{E}^{X_s} f(X_{t-s})) 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})], \tag{36.6}$$
since the events $(X_{r_1} \in A_1, \ldots, X_{r_n} \in A_n)$ generate $\mathcal{F}_s$. The right-hand side of (36.6) is equal to
$$\mathbb{E}^x[P_{t-s} f(X_s) 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})]. \tag{36.7}$$
From (36.3),
$$\mathbb{E}^x[P_{t-s} f(X_s) 1_{A_1}(X_{r_1}) \cdots 1_{A_n}(X_{r_n})] = \int \cdots \int P_{t-s} f(y)\, 1_{A_1}(x_1) \cdots 1_{A_n}(x_n)\, P_{r_1}(x, dx_1) \cdots P_{r_n - r_{n-1}}(x_{n-1}, dx_n)\, P_{s - r_n}(x_n, dy). \tag{36.8}$$
But $P_{t-s} f(y) = \int f(z)\, P_{t-s}(y, dz)$. Substituting this in (36.8) and using (36.3) again, we obtain the left-hand side of (36.6).

Step 3. We define $R_\lambda$, the resolvent or $\lambda$-resolvent of $P_t$, by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t} P_t f(x)\,dt. \tag{36.9}$$
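To make the resolvent concrete, take the Poisson transition probabilities of Example 36.7 below, writing the jump rate as $\mu$ to avoid a clash with the resolvent parameter $\lambda$. Integrating term by term, using $\int_0^\infty e^{-(\lambda+\mu)t} t^k\,dt = k!/(\lambda+\mu)^{k+1}$, gives $R_\lambda f(x) = \sum_k f(x+k)\,\mu^k/(\lambda+\mu)^{k+1}$. The Python sketch below (truncated sums; all names and the test function are ours) uses this to illustrate that $\lambda R_\lambda f \to f$ as $\lambda \to \infty$, and the inequality $P_h R_\lambda f \le e^{\lambda h} R_\lambda f$ for $f \ge 0$ that appears in the proof of Lemma 36.5.

```python
import math


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


def semigroup(f, x, t, mu, terms=60):
    """P_t f(x) = sum_k f(x + k) e^{-mu t} (mu t)^k / k!, truncated after `terms` terms."""
    return sum(f(x + k) * poisson_pmf(k, mu * t) for k in range(terms))


def resolvent(f, x, lam, mu, terms=200):
    """R_lam f(x) = int_0^inf e^{-lam t} P_t f(x) dt
                  = sum_k f(x + k) mu^k / (lam + mu)^(k + 1), truncated."""
    r = mu / (lam + mu)
    return sum(f(x + k) * r**k for k in range(terms)) / (lam + mu)


def f(y):
    """A non-negative test function that vanishes at infinity."""
    return 1.0 / (1.0 + y * y)


mu = 1.0
for lam in (1.0, 10.0, 1000.0):
    print(lam, lam * resolvent(f, 0, lam, mu), f(0))  # lam R_lam f(0) -> f(0)

# Inequality (36.10) with h = t - s: P_h R_lam f <= e^{lam h} R_lam f for f >= 0.
lam, h = 2.0, 0.3
lhs = semigroup(lambda y: resolvent(f, y, lam, mu), 0, h, mu)
print(lhs, math.exp(lam * h) * resolvent(f, 0, lam, mu))
```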
Lemma 36.5 If $f \ge 0$ is bounded and Borel measurable and $x \in S$, then $M_t = e^{-\lambda t} R_\lambda f(X_t)$, $t \in D$, is a supermartingale with respect to the filtration $\{\mathcal{F}_t;\ t \in D\}$ and the probability measure $\mathbb{P}^x$.

Proof What we need to show is that if $s < t$ with $s, t \in D$, then
$$\mathbb{E}^x[e^{-\lambda t} R_\lambda f(X_t) \mid \mathcal{F}_s] \le e^{-\lambda s} R_\lambda f(X_s), \qquad \mathbb{P}^x\text{-a.s.}$$
By Proposition 36.4 the left-hand side is $e^{-\lambda t}\, \mathbb{E}^{X_s} R_\lambda f(X_{t-s})$, so what we need to show is that
$$\mathbb{E}^y R_\lambda f(X_{t-s}) \le e^{\lambda(t-s)} R_\lambda f(y) \tag{36.10}$$
for all $y$. The left-hand side of (36.10) is
$$\begin{aligned}
P_{t-s} R_\lambda f(y) &= \int_0^\infty e^{-\lambda r} P_{t-s} P_r f(y)\,dr = \int_0^\infty e^{-\lambda r} P_{r+t-s} f(y)\,dr \\
&= e^{\lambda(t-s)} \int_{t-s}^\infty e^{-\lambda r} P_r f(y)\,dr \le e^{\lambda(t-s)} \int_0^\infty e^{-\lambda r} P_r f(y)\,dr = e^{\lambda(t-s)} R_\lambda f(y).
\end{aligned}$$
The first equality is the Fubini theorem, the second the semigroup property, and the third a change of variables; the inequality holds because $P_r f \ge 0$ when $f \ge 0$.

Next, if $f$ is non-negative and bounded, by Theorem 3.12 with $\mathbb{P}$ replaced by $\mathbb{P}^x$, we see that $e^{-\lambda t} R_\lambda f(X_t)$ has left and right limits along $t \in D$. Therefore the same is true for $R_\lambda f(X_t)$. By Assumption 36.1 and dominated convergence, if $f \in C_0$, then
$$\lambda R_\lambda f(x) - f(x) = \int_0^\infty \lambda e^{-\lambda t} (P_t f(x) - f(x))\,dt = \int_0^\infty e^{-t} (P_{t/\lambda} f(x) - f(x))\,dt$$
tends to zero uniformly in $x$ as $\lambda \to \infty$. Take a countable dense subset $\{f_i\}$ of $C_0$ and look at $jR_j f_i(X_t)$ for all positive integers $j$. Since $jR_j f_i(X_t)$ has left and right limits along $D$, a.s., letting $j \to \infty$, we see that $f_i(X_t)$ does also. We conclude that $X_t$ has left and right limits along $D$. Now define $X_t = \lim_{u > t,\, u \in D,\, u \to t} X_u$. Then $X_t$ is right continuous with left limits. We check that
$$\mathbb{P}^x(X_{t_1} \in A_1, \ldots, X_{t_n} \in A_n) = \int_{A_1} \cdots \int_{A_n} P_{t_1}(x, dx_1) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n).$$
To see this, we know this holds when the $t_i$ are in $D$. By linearity and a limit argument, we conclude
$$\mathbb{E}^x[f_1(X_{t_1}) \cdots f_n(X_{t_n})] = \int \cdots \int f_1(x_1) \cdots f_n(x_n)\, P_{t_1}(x, dx_1) \cdots P_{t_n - t_{n-1}}(x_{n-1}, dx_n) \tag{36.11}$$
when the $f_i$ are bounded and continuous and the $t_i$ are in $D$. Using a limit argument, we know (36.11) holds when the $t_i$ are arbitrary non-negative real numbers. Using a limit argument again, (36.11) holds for all bounded and measurable $f_i$, in particular, when $f_i = 1_{A_i}$.

Step 4. It remains to show that $(X_t, \mathbb{P}^x)$ satisfies Definition 19.1 and that $P_t$ is the semigroup of this process. Let $\mathcal{F}_t^{00} = \sigma(X_s;\ s \le t)$. Then we have already shown that $(X_t, \mathbb{P}^x)$ is a Markov process with respect to the filtration $\{\mathcal{F}_t^{00}\}$, except for showing that
$$\mathbb{P}^x(X_{s+t} \in A \mid \mathcal{F}_s^{00}) = \mathbb{P}^{X_s}(X_t \in A).$$
However, this can be proved almost identically to the way we proved Proposition 36.4.

Step 5. Sometimes the semigroup is a contraction semigroup and satisfies Assumption 36.1 but not Assumption 36.2. In this case the $P_t(x, A)$ are called sub-Markov transition probability kernels. The missing probability is due to the process being killed, and we can handle this situation as follows. Let $S_\Delta = S \cup \{\Delta\}$, where we introduce an isolated point $\Delta$. The topology on $S_\Delta$ is the one generated by the open sets of $S$ together with the set $\{\Delta\}$. Given a function $f$ on $S$, we extend it to $S_\Delta$ by setting $f(\Delta) = 0$. We replace $P_t(x, A)$ by $P_t^\Delta(x, A)$, where
$$\begin{cases}
P_t^\Delta(x, A) = P_t(x, A), & x \in S,\ A \subset S, \\
P_t^\Delta(x, \{\Delta\}) = 1 - P_t(x, S), & x \in S, \\
P_t^\Delta(\Delta, \{\Delta\}) = 1.
\end{cases} \tag{36.12}$$
One can go through the above construction with $P_t^\Delta$ and obtain a strong Markov process $X_t$ whose state space is $S_\Delta$. It is not hard to show that starting at $\Delta$, the process stays at $\Delta$ forever; see Exercise 36.1.

We remark that by the results of Chapter 20 and also Exercise 20.1, we can expand the filtration from $\{\mathcal{F}_t^{00}\}$ to $\{\mathcal{F}_t\}$, where $\{\mathcal{F}_t\}$ is right continuous and each $\mathcal{F}_t$ contains all the sets that are null with respect to each $\mathbb{P}^x$. In addition, the strong Markov property will hold for $(X_t, \mathbb{P}^x)$.
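On a finite state space a sub-Markov kernel at a fixed time is just a matrix with non-negative entries and row sums at most 1, and (36.12) becomes a matrix augmentation. The Python sketch below is a finite, single-time-step analogue of that step (our simplification, not the general construction; the function names and the example matrix are hypothetical). It adds the cemetery point as an extra index, then checks that the cemetery is absorbing and that, restricted to $S$, the augmented chain evolves by the original kernel.

```python
def add_cemetery(kernel):
    """Finite-state analogue of (36.12).

    `kernel[x][y]` is a sub-Markov matrix on states 0..n-1 (non-negative rows
    with sums <= 1). Returns an (n+1) x (n+1) Markov matrix whose last index
    plays the role of the cemetery point Delta.
    """
    n = len(kernel)
    out = []
    for row in kernel:
        if any(v < 0 for v in row) or sum(row) > 1 + 1e-12:
            raise ValueError("kernel rows must be non-negative with sum <= 1")
        out.append(list(row) + [1.0 - sum(row)])  # the missing mass goes to Delta
    out.append([0.0] * n + [1.0])                 # Delta is absorbing
    return out


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


K = [[0.5, 0.3],
     [0.2, 0.2]]
M = add_cemetery(K)
M2, K2 = matmul(M, M), matmul(K, K)
print(M2[-1])            # [0.0, 0.0, 1.0]: once killed, the chain stays at Delta
print(M2[0][:2], K2[0])  # restricted to S, the augmented chain evolves by K
```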
36.2 Examples

Example 36.6 Our first example is a Brownian motion. Let
$$p(t, x, y) = (2\pi t)^{-d/2} e^{-|x-y|^2/2t},$$
and set
$$P_t(x, A) = \int_A p(t, x, y)\,dy.$$
We know
$$\int p(t, x, z)\, p(s, z, y)\,dz = p(t + s, x, y)$$
by Proposition 19.5, and so $P_t$ satisfies the semigroup property. We showed in Section 19.4 that Assumption 36.1 is satisfied, except for the fact that $P_t$ maps $C_0$ to $C_0$; this is Exercise 36.2. Therefore we have a strong Markov process associated with $P_t$. By Proposition 21.5, the paths of the strong Markov process can be taken to be continuous. This gives yet another construction of a Brownian motion.

Example 36.7 We now use the machinery we have developed in this chapter to construct the Poisson process. Define transition probabilities by
$$P_t(x, A) = e^{-\lambda t} \sum_{k=0}^\infty \frac{(\lambda t)^k}{k!} 1_A(x + k),$$
where $\lambda$ is some fixed parameter. If $p(t, k) = e^{-\lambda t} (\lambda t)^k/k!$, then
$$P_t f(x) = \sum_{k=0}^\infty f(x + k)\, p(t, k). \tag{36.13}$$
Thus
$$P_s(P_t f)(x) = \sum_{j=0}^\infty P_t f(x + j)\, p(s, j) = \sum_{j=0}^\infty \sum_{k=0}^\infty f(x + j + k)\, p(t, k)\, p(s, j).$$
This is equal to
$$\sum_{m=0}^\infty f(x + m) \sum_{k=0}^m p(t, m - k)\, p(s, k), \tag{36.14}$$
which by Exercise 36.3 is equal to
$$\sum_{m=0}^\infty f(x + m)\, p(s + t, m) = P_{s+t} f(x). \tag{36.15}$$
Therefore the semigroup property holds. We therefore have a strong Markov process $X$ whose paths are right continuous with left limits. We want to show that the process $X_t$ under the probability measure $\mathbb{P}^0$ is a Poisson process. That $\mathbb{P}^0(X_0 = 0) = 1$ is obvious. We need to show that Definition 5.1(3) and (4) hold. For the former,
$$\mathbb{P}^0(X_t - X_s = k) = \sum_{j=0}^\infty \mathbb{P}^0(X_t = j + k,\ X_s = j) \tag{36.16}$$
$$= \sum_{j=0}^\infty p(s, j)\, p(t - s, k) = p(t - s, k), \tag{36.17}$$
as desired. For Definition 5.1(4), suppose $r_1 \le r_2 \le \cdots \le r_n \le s < t$, $a_1, \ldots, a_n$ are integers, and let $A = (X_{r_1} = a_1, \ldots, X_{r_n} = a_n)$. We will be done if we show
$$\mathbb{P}^0(X_t - X_s = k, A) = \mathbb{P}^0(X_t - X_s = k)\, \mathbb{P}^0(A). \tag{36.18}$$
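The left-hand side of (36.18) is equal to
$$\begin{aligned}
\sum_{j=0}^\infty \mathbb{P}^0(X_t = j + k, X_s = j, A) &= \sum_{j=0}^\infty \mathbb{E}^0[\mathbb{P}^0(X_t = j + k \mid \mathcal{F}_s);\ X_s = j, A] \\
&= \sum_{j=0}^\infty \mathbb{E}^0[\mathbb{P}^{X_s}(X_{t-s} = j + k);\ X_s = j, A] \\
&= \sum_{j=0}^\infty \mathbb{E}^0[\mathbb{P}^j(X_{t-s} = j + k);\ X_s = j, A] \\
&= \sum_{j=0}^\infty \mathbb{E}^0[p(t - s, k);\ X_s = j, A] = p(t - s, k)\, \mathbb{P}^0(A).
\end{aligned}$$
Together with (36.16) and (36.17), this proves (36.18).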
Exercises

36.1 Suppose $P_t$ is a family of sub-Markov transition probabilities and we define $\widetilde P_t$ by (36.12). Show that $\widetilde P_t$ is a family of Markov transition probabilities. Show that $\mathbb P^\Delta(X_t\ne\Delta$ for some $t>0) = 0$, i.e., starting at $\Delta$, the process stays there forever.

36.2 Show that if $P_t(x,A)$ is defined by (19.17), and $P_tf(x) = \int f(y)\,P_t(x,dy)$, then $P_t$ maps $C_0$ into $C_0$.

36.3 Show that (36.14) equals (36.15).

36.4 Show that $P_t$ defined by (36.13) satisfies all the parts of Assumption 36.1.

36.5 Suppose $\{\mu_t, t\ge 0\}$ is a tight family of probability measures on the real line. Suppose there exists a function $\psi:\mathbb R\to\mathbb C$ such that the Fourier transforms of the $\mu_t$ have the following form:
$$\int e^{iux}\,\mu_t(dx) = e^{t\psi(u)}, \qquad t\ge 0,\ u\in\mathbb R.$$
(1) Prove that $\mu_t$ converges weakly to $\mu_0$ as $t\to 0$. Note that $\mu_0$ is the same as point mass at 0.
(2) Define the operators $P_t$ by $P_tf(x) = \int f(x-y)\,\mu_t(dy)$. Prove that the $P_t$ form a strongly continuous semigroup of contraction operators mapping $C_0$ into $C_0$. Conclude that there exists a strong Markov process whose semigroup is given by the $P_t$. This semigroup is called a convolution semigroup because $\mu_{t+s} = \mu_t*\mu_s$, in the sense of convolution of measures. We will see later that these are associated with Lévy processes.
Notes See Blumenthal and Getoor (1968) for further information.
37 Infinitesimal generators
Often a Markov process is specified in terms of its behavior at each point, and one wants to form a global picture of the process. This means one is given the infinitesimal generator, a linear operator that is in general unbounded, and one wants to come up with the semigroup for the Markov process. We will begin by looking further at semigroups and resolvents, and then define the infinitesimal generator of a semigroup. We will prove the Hille–Yosida theorem, which is the primary tool for constructing semigroups from infinitesimal generators. Then we will look at two important examples: elliptic operators in nondivergence form and Lévy processes.
37.1 Semigroup properties

Let $S$ be a locally compact separable metric space. We will take $\mathcal B$ to be a separable Banach space of real-valued functions on $S$. For the most part, we will take $\mathcal B$ to be the continuous functions on $S$ that vanish at infinity (with the supremum norm), although another common example is to let $\mathcal B$ be the set of functions on $S$ that are in $L^2$ with respect to some measure. We use $\|\cdot\|$ for the norm on $\mathcal B$. For the duration of this chapter we will make the following assumption.

Assumption 37.1 Suppose that $P_t$, $t\ge 0$, are operators acting on $\mathcal B$ such that
(1) the $P_t$ are contractions: $\|P_tf\|\le\|f\|$ for all $t\ge 0$ and all $f\in\mathcal B$,
(2) the $P_t$ form a semigroup: $P_sP_t = P_{t+s}$ for all $s,t\ge 0$, and
(3) the $P_t$ are strongly continuous: if $f\in\mathcal B$, then $\|P_tf - f\|\to 0$ as $t\to 0$.

Note that the semigroup property implies in particular that $P_s$ and $P_t$ commute. For a bounded operator $A$ on $\mathcal B$, $\|A\| = \sup\{\|Af\| : \|f\|\le 1\}$, so saying $P_t$ is a contraction is the same as saying $\|P_t\|\le 1$. Define the resolvent or $\lambda$-resolvent operator of a semigroup $P_t$ by
$$R_\lambda f(x) = \int_0^\infty e^{-\lambda t}P_tf(x)\,dt. \tag{37.1}$$
The resolvent equation is
$$R_\lambda - R_\mu = (\mu-\lambda)R_\lambda R_\mu. \tag{37.2}$$
We show that the semigroup property implies the resolvent equation.
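To see the resolvent equation (37.2) concretely, one can take a special case we chose purely for illustration: a Markov chain on three states whose bounded generator is a hypothetical $Q$-matrix, so that $P_t = e^{tQ}$ and (37.1) evaluates to $R_\lambda = (\lambda I - Q)^{-1}$. The sketch below, assuming NumPy, checks (37.2) and also that $\lambda R_\lambda$ maps the constant function 1 to itself.

```python
import numpy as np

# Hypothetical 3-state Q-matrix: off-diagonal rates >= 0, rows sum to 0.
Q = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.8, 0.5],
              [0.2, 0.7, -0.9]])

def resolvent(lam: float) -> np.ndarray:
    """R_lam = (lam I - Q)^{-1}, the Laplace transform of P_t = exp(tQ)."""
    return np.linalg.inv(lam * np.eye(len(Q)) - Q)

lam, mu = 0.7, 2.3
lhs = resolvent(lam) - resolvent(mu)
rhs = (mu - lam) * resolvent(lam) @ resolvent(mu)
print(np.max(np.abs(lhs - rhs)))          # of the order of rounding error

# Since Q 1 = 0, (lam I - Q) 1 = lam 1, so the rows of lam * R_lam sum to 1.
print(lam * resolvent(lam).sum(axis=1))
```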
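Proposition 37.2 The resolvent equation (37.2) holds.

Proof We write
$$\begin{aligned}
R_\lambda(R_\mu f)(x) &= \int_0^\infty e^{-\lambda t}P_t(R_\mu f)(x)\,dt\\
&= \int_0^\infty e^{-\lambda t}\int_0^\infty e^{-\mu s}P_t(P_sf)(x)\,ds\,dt\\
&= \int_0^\infty e^{-\lambda t}\int_0^\infty e^{-\mu s}P_{t+s}f(x)\,ds\,dt\\
&= \int_0^\infty e^{-\lambda t}e^{\mu t}\int_t^\infty e^{-\mu s}P_sf(x)\,ds\,dt\\
&= \int_0^\infty\int_0^s e^{-(\lambda-\mu)t}e^{-\mu s}P_sf(x)\,dt\,ds\\
&= \int_0^\infty \frac{1-e^{-(\lambda-\mu)s}}{\lambda-\mu}\,e^{-\mu s}P_sf(x)\,ds\\
&= \frac{1}{\mu-\lambda}\Big[\int_0^\infty e^{-\lambda s}P_sf(x)\,ds - \int_0^\infty e^{-\mu s}P_sf(x)\,ds\Big]\\
&= \frac{1}{\mu-\lambda}\big[R_\lambda f(x) - R_\mu f(x)\big].
\end{aligned}$$
The second equality uses Exercise 37.2, the fourth a change of variables, and the fifth the Fubini theorem.

We have the following corollary to Proposition 37.2.

Corollary 37.3 If $\mu,\lambda>0$ and $|\mu-\lambda|<\lambda$, then
$$R_\mu f = R_\lambda f + \sum_{i=1}^\infty(\lambda-\mu)^iR_\lambda^{i+1}f. \tag{37.3}$$
Here $R_\lambda^2f = R_\lambda(R_\lambda f)$, and similarly for $R_\lambda^if$.

Proof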
By Proposition 37.2, we have
$$R_\mu f = R_\lambda f + (\lambda-\mu)R_\lambda R_\mu f. \tag{37.4}$$
If we substitute for $R_\mu f$ in the last term on the right-hand side of (37.4), we have
$$R_\mu f = R_\lambda f + (\lambda-\mu)R_\lambda R_\lambda f + (\lambda-\mu)^2R_\lambda R_\lambda R_\mu f.$$
We again substitute for $R_\mu f$, and repeat. Since, by Exercise 37.3,
$$\|(\lambda-\mu)R_\lambda\| \le \frac{|\lambda-\mu|}{\lambda},$$
which is less than one, the remainder term $(\lambda-\mu)^iR_\lambda^iR_\mu f$ converges to zero as $i\to\infty$ and the series converges.
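With the same kind of hypothetical three-state generator as an illustration, the partial sums of (37.3) can be compared with $R_\mu f$ computed directly. This sketch assumes NumPy; the particular $\lambda$, $\mu$, and $f$ are arbitrary choices satisfying $|\mu-\lambda|<\lambda$, so the terms shrink geometrically.

```python
import numpy as np

# Hypothetical 3-state Q-matrix, used only to make the operators concrete.
Q = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.8, 0.5],
              [0.2, 0.7, -0.9]])
I = np.eye(3)

def resolvent(lam: float) -> np.ndarray:
    return np.linalg.inv(lam * I - Q)

lam, mu = 2.0, 1.2                # |mu - lam| = 0.8 < lam, as Corollary 37.3 requires
f = np.array([1.0, -2.0, 0.5])
R_lam = resolvent(lam)

term = R_lam @ f                  # leading term R_lam f
series = term.copy()
for i in range(1, 80):
    term = (lam - mu) * (R_lam @ term)   # now equals (lam - mu)^i R_lam^{i+1} f
    series += term

print(series, resolvent(mu) @ f)  # the two vectors agree
```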
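Remark 37.4 In particular, if $R_\lambda$ and $S_\lambda$ are two resolvents that agree at one value of $\lambda$, say $\lambda_0$, then Corollary 37.3 applied once with $R_\lambda$ and once with $S_\lambda$ implies that if $\lambda<2\lambda_0$, then
$$R_\lambda f = R_{\lambda_0}f + \sum_{i=1}^\infty(\lambda_0-\lambda)^i(R_{\lambda_0})^{i+1}f = S_{\lambda_0}f + \sum_{i=1}^\infty(\lambda_0-\lambda)^i(S_{\lambda_0})^{i+1}f = S_\lambda f,$$
so $R_\lambda$ and $S_\lambda$ agree for $\lambda<2\lambda_0$. Applying this observation again with $\lambda_0$ replaced by $3\lambda_0/2$, we see that $R_\lambda$ and $S_\lambda$ agree for $\lambda<3\lambda_0$. Continuing this argument, we see that $R_\lambda$ and $S_\lambda$ must agree for each positive value of $\lambda$.

If for some $f\in\mathcal B$
$$\Big\|\frac{P_hf - f}{h} - g\Big\|\to 0$$
as $h\to 0$, we say that $f$ is in the domain of the infinitesimal generator of the semigroup, we write $g = \mathcal Lf$, and we write $f\in\mathcal D = \mathcal D(\mathcal L)$. Generally $\mathcal D(\mathcal L)$ is a proper subset of $\mathcal B$. If $f\in\mathcal D$ and $t>0$, then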
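$$\frac{P_h(P_tf) - P_tf}{h} = \frac{P_tP_hf - P_tf}{h} = P_t\Big(\frac{P_hf - f}{h}\Big)\to P_t\mathcal Lf, \tag{37.5}$$
since $P_t$ is a contraction. Therefore $P_tf\in\mathcal D$ when $f\in\mathcal D$, and $\mathcal L(P_tf) = P_t(\mathcal Lf)$.

Proposition 37.5 Fix $\lambda>0$ and let $\mathcal C = \{R_\lambda f : f\in\mathcal B\}$. Then $\mathcal C = \mathcal D(\mathcal L)$ and for $f\in\mathcal B$,
$$\mathcal LR_\lambda f = \lambda R_\lambda f - f.$$

Proof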
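Suppose that $g\in\mathcal C$, so that $g = R_\lambda f$ for some $f\in\mathcal B$. Then
$$P_hR_\lambda f = \int_0^\infty e^{-\lambda t}P_{h+t}f\,dt = e^{\lambda h}\int_h^\infty e^{-\lambda t}P_tf\,dt, \tag{37.6}$$
and so
$$P_hg - g = P_hR_\lambda f - R_\lambda f = (e^{\lambda h}-1)\int_h^\infty e^{-\lambda t}P_tf\,dt - \int_0^h e^{-\lambda t}P_tf\,dt. \tag{37.7}$$
Dividing by $h$ and letting $h\to 0$, the first term on the right of (37.7) converges (use Exercise 37.2) to
$$\lambda\int_0^\infty e^{-\lambda t}P_tf\,dt = \lambda R_\lambda f.$$
Since $f\in\mathcal B$, then $P_tf\to f$ as $t\to 0$. After dividing by $h$, the second term on the right-hand side of (37.7) converges to $f$. Thus
$$\mathcal L(R_\lambda f) = \lambda R_\lambda f - f, \tag{37.8}$$
as required.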
We have shown that $\mathcal C\subset\mathcal D(\mathcal L)$, and we now show the opposite inclusion. Suppose $f\in\mathcal D(\mathcal L)$. Let $g = \lambda f - \mathcal Lf$, which is in $\mathcal B$. Since $P_t$ and $\mathcal L$ commute, then $R_\lambda$ and $\mathcal L$ commute, and by (37.8),
$$f = \lambda R_\lambda f - (\lambda R_\lambda f - f) = \lambda R_\lambda f - R_\lambda\mathcal Lf = R_\lambda g,$$
which is in $\mathcal C$.

Example 37.6 Let us compute the infinitesimal generator when $(X_t,\mathbb P^x)$ is a one-dimensional Brownian motion. For our space $\mathcal B$ we take the continuous functions on $\mathbb R$ that vanish at infinity. Suppose $f\in C^2$ with compact support. By a Taylor series expansion,
$$P_hf(x) = \mathbb E^xf(X_h) = f(x) + f'(x)\,\mathbb E^x(X_h-x) + \tfrac12f''(x)\,\mathbb E^x(X_h-x)^2 + R_h,$$
where $R_h$ is the remainder term. We know $|R_h|$ is bounded by $\|f''\|_\infty\,\mathbb E^x[\varphi(X_h-x)]$, where $\varphi$ is bounded and $|\varphi(y)/y^2|\to 0$ as $y\to 0$. Since $X_h$ started at $x$ has mean $x$ and variance $h$, we have
$$P_hf(x) = f(x) + \tfrac12f''(x)\,h + R_h,$$
where $|R_h/h|$ tends to zero as $h\to 0$. Therefore
$$\frac{P_hf - f}{h}\to\tfrac12f'',$$
the convergence being with respect to the supremum norm. Exactly the same argument holds in higher dimensions to show that $\mathcal Lf = \frac12\Delta f$. We have shown that $\mathcal D(\mathcal L)$ contains the $C^2$ functions with compact support, but have not actually identified the domain of the infinitesimal generator. We refer the reader to Knight (1981) for a detailed discussion.

The domain of an infinitesimal generator is nearly as important as the operator itself. We will briefly discuss aspects of the domains of the infinitesimal generator for absorbing Brownian motion and for reflecting Brownian motion on $[0,\infty)$. Both have the same operator $\mathcal Lf = \frac12f''$ but different domains.

Absorbing Brownian motion on $[0,\infty)$ is Brownian motion killed on first hitting $(-\infty,0)$. Let $W_t$ be standard Brownian motion on $\mathbb R$ and let $X_t$ be $W_t$ killed on first hitting $(-\infty,0)$. If $f\in C^2[0,\infty)$ with $f$ and its first and second derivatives being bounded and uniformly continuous and $x\ne 0$, then $(\mathbb E^xf(X_t) - f(x))/t$ differs from $(\mathbb E^xf(W_t) - f(x))/t$ by at most $\|f\|_\infty\,\mathbb P^x(T_0<t)/t$, where $T_0$ is the first time $W_t$ hits $(-\infty,0)$. For such $x$,
$$\frac{\mathbb P^x(T_0<t)}{t}\le\frac{\mathbb P^x(\sup_{s\le t}|W_s-W_0|\ge x)}{t}\le\frac2t\,e^{-x^2/2t}\to 0$$
as $t\to 0$. Therefore for $x\ne 0$, the infinitesimal generator of absorbing Brownian motion is the same as the infinitesimal generator of standard Brownian motion, namely, $\frac12f''(x)$.
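The limit in Example 37.6 can be watched numerically. The sketch below uses the test function $f(x) = e^{-x^2}$, our choice: it is not compactly supported, but it makes $P_hf$ explicit, since for $X_h\sim N(x,h)$ one has $\mathbb E^xf(X_h) = (1+2h)^{-1/2}e^{-x^2/(1+2h)}$. It then measures the distance between $(P_hf-f)/h$ and $\frac12f''$ on a finite grid, which stands in for the supremum over $\mathbb R$.

```python
import numpy as np

def f(x):
    return np.exp(-x**2)

def half_f_second(x):
    # (1/2) f''(x), using f''(x) = (4x^2 - 2) exp(-x^2)
    return 0.5 * (4.0 * x**2 - 2.0) * np.exp(-x**2)

def P_h_f(h, x):
    # E^x f(X_h) for X_h ~ N(x, h): a closed-form Gaussian integral
    return np.exp(-x**2 / (1.0 + 2.0 * h)) / np.sqrt(1.0 + 2.0 * h)

x = np.linspace(-6.0, 6.0, 2001)     # finite grid standing in for sup over R
errors = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    diff_quotient = (P_h_f(h, x) - f(x)) / h
    errors.append(np.max(np.abs(diff_quotient - half_f_second(x))))
print(errors)   # shrinks roughly in proportion to h
```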
If $f = R_\lambda g$ for $g$ bounded and continuous, we have
$$f(0) = R_\lambda g(0) = \mathbb E^0\int_0^{T_0}e^{-\lambda t}g(X_t)\,dt = 0.$$
We use the fact that starting at 0, $T_0 = 0$, a.s., by Theorem 7.2. Using Proposition 37.5, every function in the domain of the infinitesimal generator of absorbing Brownian motion must satisfy $f(0) = 0$.

We can define reflecting Brownian motion on $[0,\infty)$ by $X_t = |W_t|$, where $W$ is a one-dimensional Brownian motion on $\mathbb R$. As in the preceding paragraph, the infinitesimal generator for $X$ agrees with $\frac12f''(x)$ if $x\ne 0$. For $x = 0$, an application of Taylor's theorem gives
$$\mathbb E^0f(|W_t|) = f(0) + f'(0)\,\mathbb E^0|W_t| + \tfrac12f''(0)\,\mathbb E^0|W_t|^2 + \mathbb E^0R_t,$$
where $R_t$ is a remainder term. Subtracting $f(0)$ from both sides and dividing by $t$, and noting that $\mathbb E^0|W_t| = c_1\sqrt t$ with $c_1 = \sqrt{2/\pi}$, so that $\mathbb E^0|W_t|/t\to\infty$ as $t\to 0$, the only way we can get convergence is if $f'(0) = 0$. Thus every function in the domain of the infinitesimal generator of reflecting Brownian motion must satisfy $f'(0) = 0$.

In higher dimensions, the analogous restriction for reflecting Brownian motion is that the normal derivative $\partial f/\partial n$ must equal zero on the boundary of the domain, where $n$ is the inward-pointing unit normal vector. In the partial differential equations literature, this is known as the Neumann boundary condition, and it models situations where there is no heat flow across the boundary. For absorbing Brownian motion the analogous restriction is that $f = 0$ on the boundary of the domain, and this is called the Dirichlet boundary condition.

Example 37.7 Next we compute the generator for a Poisson process with parameter $\lambda$. We can let $\mathcal B$ be as in Example 37.6. We have
$$P_hf(x) = \sum_{i=0}^\infty e^{-\lambda h}\frac{(\lambda h)^i}{i!}f(x+i) = e^{-\lambda h}f(x) + e^{-\lambda h}\lambda h\,f(x+1) + \sum_{i=2}^\infty e^{-\lambda h}\frac{(\lambda h)^i}{i!}f(x+i).$$
Subtracting $f(x)$ from both sides, dividing by $h$, and letting $h\to 0$, we obtain
$$\mathcal Lf(x) = -\lambda f(x) + \lambda f(x+1) = \lambda[f(x+1) - f(x)].$$
In this case the domain of $\mathcal L$ is all of $\mathcal B$.

A very useful result is Dynkin's formula.

Theorem 37.8 Suppose $P_t$ operating on the space $\mathcal B$ of continuous functions vanishing at infinity is the semigroup of a Markov process $(X_t,\mathbb P^x)$, $f\in\mathcal D(\mathcal L)$, and $f$ and $\mathcal Lf$ are bounded. If $x\in S$ and $T$ is a stopping time with $\mathbb E^xT<\infty$, then
$$\mathbb E^xf(X_T) - f(x) = \mathbb E^x\int_0^T\mathcal Lf(X_r)\,dr.$$
Proof If $f\in\mathcal D(\mathcal L)$, then $\mathcal Lf\in\mathcal B$, and so $P_t\mathcal Lf$ is continuous in $t$. Moreover, as we saw in (37.5),
$$\frac{\partial}{\partial t}P_tf(y) = P_t\mathcal Lf(y).$$
By the fundamental theorem of calculus,
$$P_tf(y) - f(y) = \int_0^tP_r\mathcal Lf(y)\,dr,$$
which can be rewritten as
$$\mathbb E^yf(X_t) - f(y) = \mathbb E^y\int_0^t\mathcal Lf(X_r)\,dr; \tag{37.9}$$
we used the Fubini theorem here as well. This holds for each $y\in S$ and each $t>0$. Set
$$M_t = f(X_t) - f(X_0) - \int_0^t\mathcal Lf(X_r)\,dr.$$
What (37.9) says is that $\mathbb E^yM_t = 0$ for all $y$ and all $t$. By the Markov property,
$$\begin{aligned}
\mathbb E^x[M_t - M_s\mid\mathcal F_s] &= \mathbb E^x\Big[f(X_t) - f(X_s) - \int_s^t\mathcal Lf(X_r)\,dr\Bigm|\mathcal F_s\Big]\\
&= \mathbb E^x\Big[\Big(f(X_{t-s}) - f(X_0) - \int_0^{t-s}\mathcal Lf(X_r)\,dr\Big)\circ\theta_s\Bigm|\mathcal F_s\Big]\\
&= \mathbb E^{X_s}M_{t-s} = 0.
\end{aligned}$$
Therefore $M_t$ is a martingale with respect to $\mathbb P^x$ for each $x$. If $T$ is a bounded stopping time, then by optional stopping, $\mathbb E^xM_T = 0$. If $T$ is instead only integrable with respect to $\mathbb P^x$, we have $\mathbb E^xM_{T\wedge n} = 0$ for each $n$. We then let $n\to\infty$ and use the fact that $f$ and $\mathcal Lf$ are bounded (so that $|M_{T\wedge n}|\le 2\|f\|_\infty + \|\mathcal Lf\|_\infty T$ and dominated convergence applies) to conclude $\mathbb E^xM_T = 0$, which is what we want.

We say a few words about the Kolmogorov backward and forward equations. Suppose the semigroup $P_t$ can be written
$$P_tf(x) = \int f(y)\,p(t,x,y)\,dy$$
for functions $p(t,x,y)$, which are called transition densities. Provided there are no difficulties interchanging integration and differentiation, the equation
$$\frac{\partial}{\partial t}P_tf(x) = \mathcal LP_tf(x)$$
can be rewritten as
$$\int f(y)\frac{\partial}{\partial t}p(t,x,y)\,dy = \int f(y)\,\mathcal Lp(t,x,y)\,dy,$$
which leads to the Kolmogorov backward equation
$$\frac{\partial}{\partial t}p(t,x,y) = \mathcal Lp(t,x,y),$$
where $\mathcal L$ operates on the $x$ variable and $y$ is held fixed.
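If $\mathcal L$ has an adjoint operator $\mathcal L^*$, which means
$$\int f(\mathcal Lg) = \int(\mathcal L^*f)g$$
for $f$ and $g$ in the domains of $\mathcal L^*$ and $\mathcal L$, respectively, the equation
$$\frac{\partial}{\partial t}P_tf(x) = P_t\mathcal Lf(x)$$
can be rewritten as
$$\int f(y)\frac{\partial}{\partial t}p(t,x,y)\,dy = \int\mathcal Lf(y)\,p(t,x,y)\,dy = \int f(y)\,\mathcal L^*p(t,x,y)\,dy,$$
which leads to the Kolmogorov forward equation
$$\frac{\partial}{\partial t}p(t,x,y) = \mathcal L^*p(t,x,y),$$
where $\mathcal L^*$ operates on the $y$ variable and $x$ is held fixed.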
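For the Poisson process of Example 37.7 the transition function is $p(t,x,y) = p(t,y-x)$, taken with respect to counting measure on the integers rather than $dy$ (an adaptation we make here). The generator acting on $x$ is $\lambda[g(x+1)-g(x)]$, and its adjoint with respect to counting measure, acting on $y$, is $\lambda[g(y-1)-g(y)]$. The Python sketch below checks by central finite differences that $\partial p/\partial t$ satisfies both the backward and the forward equation; for this translation-invariant example the two right-hand sides happen to coincide. The rate and the grid of $(t,x,y)$ values are illustrative.

```python
import math

lam = 1.3   # illustrative jump rate

def p(t: float, k: int) -> float:
    """Poisson transition function p(t, x, y) = p(t, y - x); zero when y - x < 0."""
    if k < 0:
        return 0.0
    return math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)

def dp_dt(t: float, k: int, eps: float = 1e-6) -> float:
    return (p(t + eps, k) - p(t - eps, k)) / (2 * eps)   # central difference in t

def backward_rhs(t, x, y):
    # L acts on x:  lam [p(t, x+1, y) - p(t, x, y)]
    return lam * (p(t, y - (x + 1)) - p(t, y - x))

def forward_rhs(t, x, y):
    # L* acts on y: lam [p(t, x, y-1) - p(t, x, y)]  (adjoint w.r.t. counting measure)
    return lam * (p(t, (y - 1) - x) - p(t, y - x))

worst = 0.0
for t in (0.3, 1.0, 2.5):
    for x in range(0, 4):
        for y in range(x - 2, x + 8):
            lhs = dp_dt(t, y - x)
            worst = max(worst, abs(lhs - backward_rhs(t, x, y)),
                        abs(lhs - forward_rhs(t, x, y)))
print(worst)   # only finite-difference error remains
```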
37.2 The Hille–Yosida theorem

We now show how to construct a semigroup given the infinitesimal generator. We start with a few preliminary observations. If $A$ is a bounded operator, we can define
$$e^A = I + A + A^2/2! + \cdots = \sum_{i=0}^\infty A^i/i!.$$
To see that the series converges, note that
$$\Big\|\sum_{i=m}^nA^i/i!\Big\|\le\sum_{i=m}^n\|A\|^i/i!\le\sum_{i=m}^\infty\|A\|^i/i!,$$
which will be small if $m$ is large since $\|A\|$ is a finite number. Similarly,
$$\|e^A\|\le\sum_{i=0}^\infty\|A\|^i/i! = e^{\|A\|}.$$
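The series definition of $e^A$ and the bound $\|e^A\|\le e^{\|A\|}$ can be illustrated with matrices. In the sketch below (NumPy assumed), a random symmetric matrix is used so that an independent value of $e^A$ is available from its eigendecomposition; the operator norm is the one induced by the Euclidean norm, which is our choice of Banach space here.

```python
import numpy as np

def expm_series(A: np.ndarray, terms: int = 60) -> np.ndarray:
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i          # term now equals A^i / i!
        result = result + term
    return result

def op_norm(M: np.ndarray) -> float:
    return np.linalg.norm(M, 2)      # operator norm induced by the Euclidean norm

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                    # symmetric, so e^A also follows from eigh
w, V = np.linalg.eigh(A)
exact = V @ np.diag(np.exp(w)) @ V.T
approx = expm_series(A)

print(np.max(np.abs(approx - exact)))
print(op_norm(approx), "<=", np.exp(op_norm(A)))
```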
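Proposition 37.9 Suppose $\{R_\lambda\}$ is a family of bounded operators defined on $\mathcal B$ such that
(1) the resolvent equation holds,
(2) $\|R_\lambda\|\le 1/\lambda$ for each $\lambda>0$, and
(3) $\|\lambda R_\lambda f - f\|\to 0$ as $\lambda\to\infty$ for each $f\in\mathcal B$.
Then there exists a strongly continuous semigroup $P_t$ whose resolvent is $R_\lambda$.

Proof Let $D_\lambda = \lambda(\lambda R_\lambda - I)$ and $Q_t^\lambda = e^{tD_\lambda}$. Note that the resolvent equation implies that $D_\lambda$ and $D_\mu$ commute and therefore all the operators $D_\lambda$, $Q_t^\lambda$, $D_\mu$, and $Q_t^\mu$ commute. Since $\|\lambda R_\lambda\|\le 1$, then
$$\|Q_t^\lambda\| = \|e^{-\lambda t}e^{t\lambda^2R_\lambda}\|\le e^{-\lambda t}e^{t\lambda\|\lambda R_\lambda\|}\le e^{-\lambda t}e^{\lambda t} = 1.$$
We first show that the set of $f$ such that $D_\lambda f$ converges as $\lambda\to\infty$ is a dense subset of $\mathcal B$. If $f = R_ag$ for some $a>0$ and some $g\in\mathcal B$, then by the resolvent equation
$$D_\lambda f = \lambda(\lambda R_\lambda - I)(R_ag) = \lambda^2R_\lambda R_ag - \lambda R_ag = \frac{\lambda^2}{\lambda-a}(R_ag - R_\lambda g) - \lambda R_ag.$$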
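We have
$$\frac{\lambda^2}{\lambda-a}R_\lambda g = \frac{\lambda}{\lambda-a}\,\lambda R_\lambda g\to g$$
as $\lambda\to\infty$ by hypothesis (3), and
$$\frac{\lambda^2}{\lambda-a}R_ag - \lambda R_ag = \frac{a\lambda}{\lambda-a}R_ag\to aR_ag$$
as $\lambda\to\infty$. Therefore
$$D_\lambda R_ag\to aR_ag - g. \tag{37.10}$$
Thus $D_\lambda$ converges on $E = \bigcup_{a>0}\{R_af : f\in\mathcal B\}$. But for any $f\in\mathcal B$, $aR_af\to f$ as $a\to\infty$ and $aR_af = R_a(af)\in E$, which proves that $E$ is a dense subset of $\mathcal B$.

Next we show that if $D_\lambda f$ converges, then $Q_t^\lambda f$ converges. Suppose $D_\lambda f$ converges and $\varepsilon>0$. Choose $M$ such that if $\lambda,\mu\ge M$, then $\|D_\lambda f - D_\mu f\|<\varepsilon$. Since $dQ_t^\lambda f/dt = D_\lambda Q_t^\lambda f$ and $Q_0^\lambda$, $Q_0^\mu$ are both the identity operator, we have
$$\begin{aligned}
Q_t^\lambda f - Q_t^\mu f &= \int_0^t\frac{\partial}{\partial s}\big(Q_s^\lambda Q_{t-s}^\mu f\big)\,ds\\
&= \int_0^t\big[Q_s^\lambda D_\lambda Q_{t-s}^\mu f - Q_s^\lambda D_\mu Q_{t-s}^\mu f\big]\,ds\\
&= \int_0^tQ_s^\lambda Q_{t-s}^\mu(D_\lambda f - D_\mu f)\,ds,
\end{aligned}$$
so
$$\|Q_t^\lambda f - Q_t^\mu f\|\le t\|D_\lambda f - D_\mu f\|<\varepsilon t,$$
using that $Q_s^\lambda$ and $Q_{t-s}^\mu$ are contractions. Since $\varepsilon$ is arbitrary, this proves that $Q_t^\lambda f$ is a Cauchy sequence in $\mathcal B$ and hence converges. Call the limit $P_tf$. We can easily check that $Q_t^\lambda$ is a semigroup for each $\lambda>0$, and we saw that $Q_t^\lambda$ is a contraction for each $t$ and $\lambda$. It follows that $P_t$ is a semigroup and that the norm of each $P_t$ is bounded by 1. Each $Q_t^\lambda$ is strongly continuous, and since the convergence is uniform for $t$ in bounded intervals, it follows that $P_tf\to f$ as $t\to 0$ for $f\in E$. Since each $P_t$ is a contraction and $E$ is dense in $\mathcal B$, we can extend each $P_t$ so as to have domain $\mathcal B$ and so that the $P_t$ will be a strongly continuous semigroup on $\mathcal B$.

Let $S_\lambda$ be the resolvent for $P_t$. It remains to prove that $S_\lambda = R_\lambda$. Fix $a$ and let $f = R_ag$. We saw in (37.10) that $D_\lambda R_ag\to aR_ag - g$. Now $Q_t^\lambda$ is a semigroup for each $\lambda$ and by Exercise 37.4, the infinitesimal generator of $Q_t^\lambda$ is $D_\lambda$. By the fundamental theorem of calculus,
$$Q_t^\lambda(R_ag) - R_ag = \int_0^t\frac{\partial}{\partial s}(Q_s^\lambda R_ag)\,ds = \int_0^tQ_s^\lambda(D_\lambda R_ag)\,ds.$$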
Letting $\lambda\to\infty$,
$$P_t(R_ag) - R_ag = \int_0^tP_s(aR_ag - g)\,ds.$$
Let $b<a$. Multiply the above equation by $e^{-bt}$ and integrate over $t$ from 0 to $\infty$. Then
$$\begin{aligned}
S_b(R_ag) - \frac1bR_ag &= \int_0^\infty e^{-bt}\int_0^tP_s(aR_ag - g)\,ds\,dt\\
&= \int_0^\infty\int_s^\infty e^{-bt}P_s(aR_ag - g)\,dt\,ds\\
&= \frac1b\int_0^\infty e^{-bs}P_s(aR_ag - g)\,ds\\
&= \frac1bS_b(aR_ag - g).
\end{aligned}$$
Therefore $S_bg = R_ag + (a-b)S_bR_ag$. Applying this with $g$ replaced by $R_ag$, iterating, and using Corollary 37.3, we obtain
$$S_bg = R_ag + (a-b)R_a^2g + (a-b)^2R_a^3g + \cdots = R_bg.$$
By Remark 37.4, this proves $S_b = R_b$ for all $b$.

We now show that under appropriate hypotheses on $\mathcal L$, there exists a semigroup whose infinitesimal generator is $\mathcal L$. This is known as the Hille–Yosida theorem. We say that an operator $\mathcal L$ is dissipative if
$$\|(\lambda-\mathcal L)f\|\ge\lambda\|f\|, \qquad f\in\mathcal D(\mathcal L),\ \lambda>0. \tag{37.11}$$
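Two ingredients of this subsection can be seen in finite dimensions, again with a hypothetical three-state $Q$-matrix as the generator and $\mathbb R^3$ with the sup norm playing the role of $\mathcal B$. The operators $D_\lambda = \lambda(\lambda R_\lambda - I)$ from the proof of Proposition 37.9 converge to $Q$ (in this case $D_\lambda - Q = Q^2R_\lambda$), and $Q$ satisfies the dissipativity inequality (37.11). The sketch below, assuming NumPy, checks both numerically.

```python
import numpy as np

# Hypothetical 3-state Q-matrix (a bounded generator on R^3 with the sup norm).
Q = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.8, 0.5],
              [0.2, 0.7, -0.9]])
I = np.eye(3)

def resolvent(lam):
    return np.linalg.inv(lam * I - Q)

def yosida(lam):
    """D_lam = lam (lam R_lam - I), as in the proof of Proposition 37.9."""
    return lam * (lam * resolvent(lam) - I)

gaps = [np.max(np.abs(yosida(lam) - Q)) for lam in (1.0, 10.0, 100.0, 1000.0)]
print(gaps)   # D_lam - Q = Q^2 R_lam, so the gap shrinks roughly like 1/lam

def sup(v):
    return np.max(np.abs(v))

# Dissipativity (37.11) in the sup norm, tested on random vectors and random lam.
rng = np.random.default_rng(1)
ratios = []
for _ in range(1000):
    f = rng.standard_normal(3)
    lam = rng.uniform(0.1, 5.0)
    ratios.append(sup((lam * I - Q) @ f) / (lam * sup(f)))
print(min(ratios))   # never below 1
```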
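Theorem 37.10 Suppose $\mathcal L$ is an operator such that
(1) the domain of $\mathcal L$ is a dense subset of $\mathcal B$,
(2) the range of $\lambda-\mathcal L$ is $\mathcal B$ for each $\lambda>0$, and
(3) $\mathcal L$ is dissipative.
Then there exists a semigroup on $\mathcal B$ which has $\mathcal L$ as its infinitesimal generator.

Proof If $(\lambda-\mathcal L)f = (\lambda-\mathcal L)g$, then
$$\lambda\|f-g\|\le\|(\lambda-\mathcal L)(f-g)\| = 0,$$
or $f = g$. Thus $\lambda-\mathcal L$ is a one-to-one map, hence is invertible because the range of $\lambda-\mathcal L$ is $\mathcal B$. We let $R_\lambda$ be the inverse, and thus the domain of $R_\lambda$ is all of $\mathcal B$.

We first show that the resolvent equation holds. We observe
$$(\mu-\mathcal L)\frac{1}{\lambda-\mu}R_\mu f = \frac{1}{\lambda-\mu}f$$
and
$$(\mu-\mathcal L)\frac{1}{\lambda-\mu}R_\lambda f = (\mu-\lambda)\frac{1}{\lambda-\mu}R_\lambda f + (\lambda-\mathcal L)\frac{1}{\lambda-\mu}R_\lambda f = -R_\lambda f + \frac{1}{\lambda-\mu}f.$$
Combining,
$$(\mu-\mathcal L)R_\mu R_\lambda f = R_\lambda f = (\mu-\mathcal L)\frac{1}{\lambda-\mu}(R_\mu - R_\lambda)f.$$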
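Applying $R_\mu$ to both sides yields the resolvent equation. The hypothesis that $\|(\lambda-\mathcal L)f\|\ge\lambda\|f\|$ immediately implies $\|R_\lambda f\|\le\|f\|/\lambda$.

We next show $\lambda R_\lambda f\to f$ as $\lambda\to\infty$. If $f\in\mathcal D$, then $R_\lambda\mathcal Lf = \mathcal LR_\lambda f = \lambda R_\lambda f - f$, and so
$$\|\lambda R_\lambda f - f\|\le\frac1\lambda\|\mathcal Lf\|\to 0$$
as $\lambda\to\infty$. Since $\|\lambda R_\lambda\|\le 1$ and the domain of $\mathcal L$ is dense in $\mathcal B$, we conclude $\lambda R_\lambda f\to f$ for all $f\in\mathcal B$.

We use Proposition 37.9 to construct $P_t$. By Proposition 37.9, $R_\lambda$ is the resolvent for $P_t$. If $\mathcal M$ is the infinitesimal generator for $P_t$, then by Proposition 37.5, the domain of $\mathcal M$ is $\{R_\lambda f : f\in\mathcal B\}$. Since we know $\mathcal L(R_\lambda f) = \lambda R_\lambda f - f\in\mathcal B$, the domain of $\mathcal L$ contains $\{R_\lambda f : f\in\mathcal B\}$. Since $\mathcal M$ is the infinitesimal generator of $P_t$, by Proposition 37.5, $\mathcal M(R_\lambda f) = \lambda R_\lambda f - f$. Therefore $\mathcal L$ is an extension of $\mathcal M$. If $f\in\mathcal D(\mathcal L)$, then $g = (\lambda-\mathcal L)f\in\mathcal B$, and thus $(\lambda-\mathcal M)^{-1}g\in\mathcal D(\mathcal M)\subset\mathcal D(\mathcal L)$. Hence
$$(\lambda-\mathcal L)f = g = (\lambda-\mathcal M)(\lambda-\mathcal M)^{-1}g = (\lambda-\mathcal L)(\lambda-\mathcal M)^{-1}g.$$
Since $\lambda-\mathcal L$ is one-to-one, then $f = (\lambda-\mathcal M)^{-1}g$, which implies $f\in\mathcal D(\mathcal M)$. Therefore $\mathcal M = \mathcal L$ and so $\mathcal L$ is the generator of $P_t$.

When applying the Hille–Yosida theorem, it is quite often the case that it is easier to show that the range of $\lambda-\mathcal L$ is only dense in $\mathcal B$, rather than being all of $\mathcal B$. When that occurs, one needs to look at a closed extension $\overline{\mathcal L}$ of $\mathcal L$. An operator $\mathcal L$ is closed if whenever $f_n\to f$ and $\mathcal Lf_n\to g$, then $f\in\mathcal D(\mathcal L)$ and $\mathcal Lf = g$. To construct the closed extension of $\mathcal L$, where we assume that $\mathcal L$ is dissipative (defined by (37.11)), let $R_\lambda g = f$ if $(\lambda-\mathcal L)f = g$. That $\mathcal L$ is dissipative is equivalent to the norm of $R_\lambda$ being bounded by $1/\lambda$ on the range of $\lambda-\mathcal L$, and so we can extend the domain of $R_\lambda$ uniquely to all of $\mathcal B$. Now define $\mathcal D(\overline{\mathcal L})$ to be the range of $R_\lambda$ and set
$$\overline{\mathcal L}R_\lambda g = \lambda R_\lambda g - g. \tag{37.12}$$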
We will soon give two examples where infinitesimal generators can be used to construct very useful processes. The first is where the infinitesimal generator is an elliptic operator of second order in nondivergence form. The second case studies the infinitesimal generators of Lévy processes.

We should mention that there is another important example where infinitesimal generators are useful in constructing a process, that of infinite particle systems. The name “infinite particle systems” refers to a class of models with discrete space and continuous time that are useful in mathematical biology and in statistical mechanics. One of the simplest examples is the voter model. Suppose at every point in $\mathbb Z^2$, the integer lattice in the plane, there is a voter, who is leaning either toward the Democratic candidate or the Republican candidate. At each point, the voter waits a length of time that is exponential with parameter one, chooses
one of his four nearest neighbors at random, and then changes his view to agree with that neighbor. Other infinite particle systems include the contact process (modeling the spread of infection), Ising model (modeling ferromagnetism), and the exclusion model (used in solid state physics). See Liggett (2010) for how to construct these processes using infinitesimal generators, and for much more.
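The voter model can be simulated directly from its description. The sketch below is an illustration under our own assumptions, not the generator-based construction referred to above. It replaces $\mathbb Z^2$ by a finite $n\times n$ torus so that the simulation is finite, and it encodes the two candidates as 0 and 1. It also uses the fact that $n^2$ independent rate-one exponential clocks ring, in total, at rate $n^2$, with the ringing site uniformly distributed. The function name `voter_model` and its parameters are hypothetical.

```python
import random

NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def voter_model(n: int = 10, t_max: float = 50.0, seed: int = 0) -> list[list[int]]:
    """Simulate the voter model on an n x n torus up to time t_max.

    Each site holds opinion 0 or 1.  Every site has an independent rate-1
    exponential clock; when it rings, the site copies the opinion of one of its
    four nearest neighbours, chosen uniformly at random.
    """
    rng = random.Random(seed)
    grid = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
    t = 0.0
    while True:
        # n*n independent rate-1 clocks: the next ring is Exp(n*n) and the
        # site whose clock rang is uniform over the torus.
        t += rng.expovariate(n * n)
        if t > t_max:
            return grid
        i, j = rng.randrange(n), rng.randrange(n)
        di, dj = rng.choice(NEIGHBOURS)
        grid[i][j] = grid[(i + di) % n][(j + dj) % n]

final = voter_model()
print("fraction holding opinion 1:", sum(map(sum, final)) / len(final) ** 2)
```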
37.3 Nondivergence form elliptic operators

Let us consider the operator $\mathcal L$ defined on $C^2$ functions on $\mathbb R^d$ by
$$\mathcal Lf(x) = \sum_{i,j=1}^da_{ij}(x)\frac{\partial^2f}{\partial x_i\partial x_j}(x) + \sum_{i=1}^db_i(x)\frac{\partial f}{\partial x_i}(x).$$
We suppose $a_{ij}(x) = a_{ji}(x)$ for all $x$. We assume the $a_{ij}$ and $b_i$ are bounded and Hölder continuous of order $\alpha\in(0,1)$: there exists $c$ such that
$$|a_{ij}(x) - a_{ij}(y)|\le c|x-y|^\alpha, \qquad |b_i(x) - b_i(y)|\le c|x-y|^\alpha,$$
for $i,j = 1,\dots,d$. We also assume a uniform ellipticity condition on the $a_{ij}$: there exists $\Lambda>0$ such that
$$\sum_{i,j=1}^da_{ij}(x)y_iy_j\ge\Lambda\sum_{i=1}^dy_i^2, \qquad (y_1,\dots,y_d)\in\mathbb R^d.$$
Uniform ellipticity says that the matrix whose $(i,j)$th element is $a_{ij}(x)$ is positive definite, uniformly in $x$. If the $a_{ij}$ and $b_i$ are Lipschitz continuous, we can construct the Markov process with infinitesimal generator $\mathcal L$ using stochastic differential equations (see Chapter 39), which is a more probabilistic way of doing it. Even when the $a_{ij}$ are continuous and the $b_i$ only measurable, it is possible to construct the Markov process via SDEs, although this is much harder. Here we illustrate how the Hille–Yosida theorem can be used in constructing these processes.

Let $\mathcal B$ be the space of continuous functions that vanish at infinity. We will want the domain of $\mathcal L$ to include the class $\mathcal C$ of functions $f$ such that $f$ and its first and second partial derivatives are continuous and vanish at infinity. Then $\mathcal C$ is dense in $\mathcal B$ and $\mathcal L$ maps $\mathcal C$ into $\mathcal B$.

We show that $\mathcal L$ is dissipative. Let $f\in\mathcal C$. There is nothing to prove if $f$ is identically zero. Since $f$ is continuous and vanishes at infinity, there is a point $x_0$ where $|f(x_0)| = \|f\|$. If $f(x_0)<0$, we can look at $-f$, so let us suppose $f(x_0)>0$. It suffices to show that $\mathcal Lf(x_0)\le 0$, since then
$$\lambda\|f\| = \lambda f(x_0)\le(\lambda-\mathcal L)f(x_0)\le\|(\lambda-\mathcal L)f\|.$$
Let $A$ be the matrix whose $(i,j)$ element is $a_{ij}(x_0)$ and let $H$ be the Hessian at $x_0$, so that
$$H_{ij} = \frac{\partial^2f}{\partial x_i\partial x_j}(x_0).$$
Let $y\in\mathbb R^d$ and consider the function $t\mapsto f(x_0+ty)$, $t\in\mathbb R$. Since $t = 0$ is the location of a local maximum for this function, its second derivative at $t = 0$, which is $\sum_{i,j=1}^dy_iy_jH_{ij}$, will be less than or equal to 0. The first derivative of this function will be zero at $t = 0$. Since $A$ is positive definite, there exists an orthogonal matrix $P$ and a diagonal matrix $D$ with positive entries such that $A = P^TDP$. Recall the trace of a square matrix is defined by $\operatorname{Trace}(C) = \sum_{i=1}^dC_{ii}$ and $\operatorname{Trace}(AB) = \operatorname{Trace}(BA)$. Note
$$\sum_{i,j=1}^da_{ij}(x_0)\frac{\partial^2f}{\partial x_i\partial x_j}(x_0) = \operatorname{Trace}(AH).$$
We have
$$\operatorname{Trace}(AH) = \operatorname{Trace}(P^TDPH) = \operatorname{Trace}(PHP^TD) = \sum_{i=1}^d(PHP^T)_{ii}D_{ii},$$
since $D$ is a diagonal matrix. Thus to show that $\operatorname{Trace}(AH)\le 0$, it suffices to show that $(PHP^T)_{ii}\le 0$ for each $i$. If we let $e_i$ be the unit vector in the $x_i$ direction and $y = P^Te_i$, we have
$$(PHP^T)_{ii} = e_i^TPHP^Te_i = y^THy = \sum_{k,l=1}^dy_ky_lH_{kl}\le 0.$$
Since $x_0$ is the location of a local maximum, then $\frac{\partial f}{\partial x_i}(x_0) = 0$, and we conclude $\mathcal Lf(x_0)\le 0$.

Since $\mathcal L1 = 0$, then $P_t1 = 1$ for all $t$. This and Exercise 37.1 imply that the $P_t$ are non-negative operators. To apply the Hille–Yosida theorem, it remains to show that the range of $\lambda-\mathcal L$ is dense in $\mathcal B$. For this we refer the reader to the PDE literature, e.g., Bass (1997), Chapter 3 or Gilbarg and Trudinger (1983), Chapters 5 and 6.
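The linear-algebra step in the dissipativity argument, that $\operatorname{Trace}(AH)\le 0$ when $A$ is positive definite and $H$ is negative semidefinite, can be spot-checked with random matrices. In the sketch below (NumPy assumed) the random matrices merely stand in for $(a_{ij}(x_0))$ and for the Hessian at a maximum; `numpy.linalg.eigh` returns $A = V\,\mathrm{diag}(w)\,V^T$, so $P = V^T$ and $D = \mathrm{diag}(w)$ in the notation of the text.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
worst_diag, worst_trace, worst_identity = -np.inf, -np.inf, 0.0
for _ in range(1000):
    B = rng.standard_normal((d, d))
    A = B @ B.T + 0.1 * np.eye(d)     # positive definite, plays the role of a_ij(x0)
    C = rng.standard_normal((d, d))
    H = -(C @ C.T)                    # negative semidefinite, like a Hessian at a maximum
    w, V = np.linalg.eigh(A)          # A = V diag(w) V^T, i.e. A = P^T D P with P = V^T
    P = V.T
    M = P @ H @ P.T
    worst_diag = max(worst_diag, np.max(np.diag(M)))    # each (P H P^T)_ii <= 0
    worst_trace = max(worst_trace, np.trace(A @ H))     # Trace(AH) <= 0
    # Trace(AH) = sum_i (P H P^T)_ii D_ii
    worst_identity = max(worst_identity, abs(np.trace(A @ H) - np.sum(np.diag(M) * w)))
print(worst_diag, worst_trace, worst_identity)
```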
37.4 Generators of Lévy processes

Let $n$ be a measure on $\mathbb R\setminus\{0\}$ satisfying $\int(h^2\wedge 1)\,n(dh)<\infty$. Consider the operator $\mathcal L$ defined on $C^2$ functions by
$$\mathcal Lf(x) = \int\big[f(x+h) - f(x) - 1_{(|h|\le 1)}f'(x)h\big]\,n(dh).$$
We will show that $\mathcal L$ is the infinitesimal generator of a Markov semigroup. We construct these processes, the Lévy processes, probabilistically in Chapter 42. We confine ourselves to the one-dimensional case, although the argument for higher dimensions is completely analogous. We let $\mathcal B$ be the continuous functions vanishing at infinity. We let $\mathcal C$ be the class of Schwartz functions, which is the class of $C^\infty$ functions, all of whose $k$th derivatives go to zero faster than $|x|^{-m}$ as $|x|\to\infty$ for every $k = 0,1,\dots$ and every $m = 1,2,\dots$; see Section B.2.
First we show that $\mathcal L$ maps $\mathcal C$ into $\mathcal B$, so that the domain of $\mathcal L$ contains $\mathcal C$, and hence is dense in $\mathcal B$. Given $M>1$ and $f\in\mathcal C$, by Taylor's theorem
$$\begin{aligned}
|\mathcal Lf(x)| &\le \int\big|f(x+h) - f(x) - 1_{(|h|\le 1)}f'(x)h\big|\,n(dh)\\
&\le \Big(\sup_{|y-x|\le 1}|f''(y)|\Big)\int_{0<|h|\le 1}h^2\,n(dh) + 2\Big(\sup_{|y-x|\le M}|f(y)|\Big)\int_{1<|h|\le M}n(dh)\\
&\quad + 2\|f\|_\infty\int_{|h|>M}n(dh).
\end{aligned}\tag{37.13}$$
This shows $|\mathcal Lf(x)|$ is finite. Given $\varepsilon>0$ and $f\in\mathcal C$, choose $M$ large so that
$$\int_{|h|>M}n(dh)<\varepsilon/\|f\|_\infty.$$
Since the first two terms on the right-hand side of (37.13) tend to zero as $|x|\to\infty$, we get $\limsup_{|x|\to\infty}|\mathcal Lf(x)|\le 2\varepsilon$; as $\varepsilon$ is arbitrary, $\mathcal Lf$ vanishes at infinity, and so $\mathcal L:\mathcal C\to\mathcal B$.

To show $\mathcal L$ is dissipative, let $f\in\mathcal C$ and choose $x_0$ such that $|f(x_0)| = \|f\|$. There is nothing to prove if $f = 0$, so assume $\|f\|>0$. Because $f$ is in the Schwartz class, it takes on its maximum and its minimum. By looking at $-f$ if necessary, we may suppose $f(x_0)>0$. Since $x_0$ is the location of a maximum, $f'(x_0) = 0$ and $f(x_0+h) - f(x_0)\le 0$ for each $h$, hence $\mathcal Lf(x_0)\le 0$. Then
$$\lambda\|f\| = \lambda f(x_0)\le(\lambda-\mathcal L)f(x_0)\le\|(\lambda-\mathcal L)f\|.$$
Taking limits, this holds for every $f$ in the domain of $\mathcal L$.

Finally we need to show that the range of $\lambda-\mathcal L$ is dense in $\mathcal B$. This is the most complicated part and we break the argument into steps.

Step 1. We start by computing the Fourier transform of $\mathcal Lf$ if $f\in\mathcal C$. Let $n_\delta(dh) = 1_{(|h|\ge\delta)}\,n(dh)$ and let
$$\mathcal L_\delta f(x) = \int\big[f(x+h) - f(x) - 1_{(|h|\le 1)}f'(x)h\big]\,n_\delta(dh).$$
Then $n_\delta$ is a finite measure. Using the Fubini theorem and the fact that the Fourier transform of the function $x\mapsto f(x+h)$ is $e^{-iuh}\widehat f(u)$ and the Fourier transform of $f'(x)$ is $-iu\widehat f(u)$,
$$\begin{aligned}
\widehat{\mathcal L_\delta f}(u) &= \iint\big[e^{iux}f(x+h) - e^{iux}f(x) - 1_{(|h|\le 1)}e^{iux}f'(x)h\big]\,dx\,n_\delta(dh)\\
&= \widehat f(u)\int\big[e^{-iuh} - 1 + 1_{(|h|\le 1)}iuh\big]\,n_\delta(dh)\\
&= \widehat f(u)\int\big[e^{-iuh} - 1 + 1_{(|h|\le 1)}iuh\big]1_{(|h|\ge\delta)}\,n(dh).
\end{aligned}\tag{37.14}$$
The expression in brackets on the last line is bounded by $c(h^2\wedge 1)$, with $c$ depending on $u$, and by dominated convergence the last line converges to $\widehat f(u)\psi(u)$ as $\delta\to 0$, where
$$\psi(u) = \int\big[e^{-iuh} - 1 + 1_{(|h|\le 1)}iuh\big]\,n(dh). \tag{37.15}$$
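Since
$$\big|\widehat{\mathcal Lf}(u) - \widehat{\mathcal L_\delta f}(u)\big| = \Big|\int e^{iux}\int_{|h|<\delta}\big[f(x+h) - f(x) - 1_{(|h|\le 1)}f'(x)h\big]\,n(dh)\,dx\Big|\le\int\Big(\sup_{|y-x|<\delta}|f''(y)|\Big)\int_{|h|<\delta}h^2\,n(dh)\,dx,$$
which tends to zero as $\delta\to 0$ because $f\in\mathcal C$, we conclude
$$\widehat{\mathcal Lf}(u) = \widehat f(u)\psi(u). \tag{37.16}$$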
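The identity (37.16) can be checked numerically for a simple Lévy measure. The sketch below assumes the two-atom measure $n = c_1\delta_{h_1} + c_2\delta_{h_2}$ with $|h_1|\le 1<|h_2|$, so both branches of the indicator in $\mathcal L$ occur. It uses $f(x) = e^{-x^2/2}$, for which $\widehat f(u) = \int e^{iux}f(x)\,dx = \sqrt{2\pi}\,e^{-u^2/2}$ (the transform convention implicit in (37.14)), and it computes $\widehat{\mathcal Lf}$ by a Riemann sum on a truncated grid. All parameter values and function names are illustrative.

```python
import numpy as np

# Hypothetical Levy measure n = c1*delta_{h1} + c2*delta_{h2}:
# h1 has |h1| <= 1 (compensated), h2 has |h2| > 1 (not compensated).
c1, h1 = 0.8, 0.5
c2, h2 = 0.3, 2.0

def f(x):
    return np.exp(-x**2 / 2)

def f_prime(x):
    return -x * np.exp(-x**2 / 2)

def Lf(x):
    """L f(x) = int [f(x+h) - f(x) - 1(|h|<=1) f'(x) h] n(dh) for the two-atom measure."""
    return c1 * (f(x + h1) - f(x) - f_prime(x) * h1) + c2 * (f(x + h2) - f(x))

def psi(u):
    # (37.15) for the same measure
    return c1 * (np.exp(-1j * u * h1) - 1 + 1j * u * h1) + c2 * (np.exp(-1j * u * h2) - 1)

def f_hat(u):
    # int e^{iux} e^{-x^2/2} dx = sqrt(2 pi) e^{-u^2/2}
    return np.sqrt(2 * np.pi) * np.exp(-u**2 / 2)

x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]
us = np.linspace(-4.0, 4.0, 9)
numeric = np.array([np.sum(np.exp(1j * u * x) * Lf(x)) * dx for u in us])  # transform of Lf
predicted = f_hat(us) * psi(us)                                           # right side of (37.16)
print(np.max(np.abs(numeric - predicted)))
```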
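Step 2. Now let $g\in\mathcal C$, let $\varepsilon>0$, choose $K>1$ such that $\int_{|h|\ge K}n(dh)<\varepsilon$, let $m_K(dh) = 1_{(|h|\le K)}\,n(dh)$, and define $\mathcal L_K$ and $\psi_K$ in terms of $m_K$. We show there exists $f\in\mathcal C$ such that $(\lambda-\mathcal L_K)f = g$. We have
$$\psi_K(u) = \int_{|h|\le K}\big[e^{-iuh} - 1 + iuh1_{(|h|\le 1)}\big]\,n(dh),$$
so using dominated convergence,
$$\psi_K'(u) = \int_{|h|\le K}\big[-ihe^{-iuh} + ih1_{(|h|\le 1)}\big]\,n(dh), \qquad \psi_K''(u) = \int_{|h|\le K}\big[-h^2e^{-iuh}\big]\,n(dh),$$
with similar formulas for the higher derivatives. Thus the derivatives of $\psi_K$ of order two and higher are bounded, while $\psi_K$ and $\psi_K'$ grow at most quadratically and linearly, respectively. Moreover the real part of $\psi_K(u)$ is $\int_{|h|\le K}[\cos(uh) - 1]\,n(dh)$, which is less than or equal to 0, so $|\lambda-\psi_K(u)|\ge\lambda$. Since $g\in\mathcal C$, by Section B.2, $\widehat g\in\mathcal C$. If we define $f$ by
$$\widehat f(u) = \frac{1}{\lambda-\psi_K(u)}\,\widehat g(u), \tag{37.17}$$
we see that $\widehat f$ and all its derivatives are continuous and tend to zero faster than $|u|^{-m}$ for every $m$. Hence $\widehat f\in\mathcal C$, which implies $f\in\mathcal C$ by Section B.2. Notice $(\lambda-\mathcal L_K)f = g$ because
$$\lambda\widehat f(u) - \widehat{\mathcal L_Kf}(u) = \frac{\lambda-\psi_K(u)}{\lambda-\psi_K(u)}\,\widehat g(u) = \widehat g(u).$$

Step 3. We prove that $\|\mathcal Lf - \mathcal L_Kf\|\le c\varepsilon\|\widehat g\|_{L^1}$. Since $g\in\mathcal C$, then $\widehat g\in L^1$. From (37.17) we have $|\widehat f(u)|\le|\widehat g(u)|/\lambda$. Then
$$\|f\|_\infty\le c\|\widehat f\|_{L^1}\le c\|\widehat g\|_{L^1}$$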
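and
$$|\mathcal Lf(x) - \mathcal L_Kf(x)|\le\int_{|h|\ge K}|f(x+h) - f(x)|\,n(dh)\le 2\|f\|_\infty\int_{|h|\ge K}n(dh)\le c\varepsilon\|\widehat g\|_{L^1}.$$

Step 4. We complete the proof that the range of $\lambda-\mathcal L$ is dense in $\mathcal B$. Since $\|\mathcal Lf - \mathcal L_Kf\|\le c\varepsilon\|\widehat g\|_{L^1}$ by Step 3 and $(\lambda-\mathcal L_K)f = g$, then
$$\|(\lambda-\mathcal L)f - g\|\le c\varepsilon\|\widehat g\|_{L^1}.$$
Because $f\in\mathcal C\subset\mathcal D(\mathcal L)$ and $\varepsilon$ is arbitrary, every $g\in\mathcal C$ lies in the closure of the range of $\lambda-\mathcal L$; since $\mathcal C$ is dense in $\mathcal B$, the range of $\lambda-\mathcal L$ is dense in $\mathcal B$.

We thus have $\mathcal L$ satisfying all the hypotheses of the Hille–Yosida theorem, and hence there exists a semigroup $P_t$ mapping $\mathcal B$ into $\mathcal B$. We again note that $\mathcal L1 = 0$, hence $P_t1 = 1$ for all $t$, and so by Exercise 37.1, the $P_t$ are non-negative operators.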
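The Fourier-multiplier construction of Step 2 has a direct discrete analogue. The sketch below is a simplification we chose, not the text's setting. It works on a periodic grid, takes $n$ to be a single atom of mass $c$ at a jump $h>1$ (so there is no compensating term and $\mathcal Lf(x) = c[f(x+h)-f(x)]$), and solves $(\lambda-\mathcal L)f = g$ by dividing by $\lambda$ minus the symbol, as in (37.17). NumPy's FFT uses the kernel $e^{-2\pi ijk/N}$, the opposite sign to the text's $e^{iux}$, which is why the discrete symbol contains $e^{+2\pi ik\cdot\mathrm{shift}/N}$. Its real part is non-positive, so the division is safe, and the solution obeys $\|f\|_\infty\le\|g\|_\infty/\lambda$, the discrete counterpart of dissipativity.

```python
import numpy as np

N = 512
x = np.linspace(-np.pi, np.pi, N, endpoint=False)   # periodic grid
shift = 128                  # jump h = shift * (2 pi / N) = pi/2 > 1, so no compensator
c, lam = 3.0, 0.5            # jump rate and resolvent parameter (illustrative)

def L(f):
    """Discrete analogue of L f(x) = c [f(x + h) - f(x)] for the single jump h."""
    return c * (np.roll(f, -shift) - f)

g = np.exp(np.cos(x)) * np.sin(2 * x)                # a smooth periodic right-hand side
k = np.arange(N)
# numpy's fft uses exp(-2 pi i j k / N), so the shift f(x + h) becomes
# multiplication by exp(+2 pi i k shift / N).
symbol = c * (np.exp(2j * np.pi * k * shift / N) - 1)
f = np.fft.ifft(np.fft.fft(g) / (lam - symbol)).real # analogue of (37.17)

residual = np.max(np.abs(lam * f - L(f) - g))
print(residual, np.max(np.abs(f)), np.max(np.abs(g)) / lam)
```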
Exercises

37.1 Let $\mathcal B$ be either the space $L^2$ with respect to a finite measure or else the continuous functions vanishing at infinity for some locally compact separable metric space $S$. In the former case, we say $f\ge 0$ if $f(x)\ge 0$ for almost every $x$, in the latter case if $f(x)\ge 0$ for all $x$. A semigroup is non-negative if $f\ge 0$ implies $P_tf\ge 0$ for all $t\ge 0$. Suppose that $P_t$ is a semigroup, the space $\mathcal B$ contains the constant functions, and $P_t1 = 1$ for all $t$. Show that $P_t$ is a contraction if and only if $P_t$ is non-negative.
37.2 Show that $P_t$ and $R_\lambda$ commute and that
$$P_tR_\lambda f = \int_0^\infty e^{-\lambda s}P_{s+t}f\,ds.$$
Show that for any $a<b$ we have
$$\Big\|\int_a^be^{-\lambda t}P_tf\,dt\Big\|\le\int_a^be^{-\lambda t}\|P_tf\|\,dt.$$
Hint: Approximate $R_\lambda f$ by a Riemann sum.

37.3 Show that if $P_t$ is a contraction semigroup and $R_\lambda$ is the resolvent, then
$$\|R_\lambda\|\le 1/\lambda. \tag{37.18}$$

37.4 Show that if $A$ is a bounded operator and $T_t = e^{tA}$, then $T_t$ is a strongly continuous semigroup of operators with infinitesimal generator $A$. (We cannot assert that the $T_t$ are contractions.)

37.5 Prove that if $\mathcal L$ is dissipative, the domain of $\mathcal L$ is dense in $\mathcal B$, and the range of $\lambda-\mathcal L$ is dense in $\mathcal B$, then $\overline{\mathcal L}$ defined in (37.12) is a closed extension of $\mathcal L$ that is dissipative and the range of $\lambda-\overline{\mathcal L}$ is equal to $\mathcal B$. Show there is only one such closed extension of $\mathcal L$.

37.6 If the range of $\lambda-\mathcal L$ equals $\mathcal B$ for a single value of $\lambda$, then the range of $\lambda-\mathcal L$ equals $\mathcal B$ for every value of $\lambda$. Hint: Define $R_\lambda$ as the inverse of $\lambda-\mathcal L$, then use (37.3) to define $R_a$ for other values of $a$.
37.7 Let $(X_t,\mathbb P^x)$ be a Markov process with transition probabilities given by $P_tf(x) = f(x+t)$. Determine $\mathcal L$ and $\mathcal D(\mathcal L)$.

37.8 Let $P_t$ be a strongly continuous semigroup of contraction operators and let $\mathcal L$ be the infinitesimal generator. Show that $\mathcal D(\mathcal L^n)$ is dense in $\mathcal B$ for every positive integer $n$.

37.9 This is a continuation of Exercise 36.5. Prove that if $f\in C^2$ with compact support, $P_t$ is the semigroup given in Exercise 36.5, and $\mathcal L$ is the infinitesimal generator, then the Fourier transform of $\mathcal Lf$ is $\widehat f(u)\psi(u)$.

37.10 Suppose that $P_t$ is a strongly continuous semigroup, but not necessarily of contractions. Thus $P_{t+s} = P_tP_s$ and $P_tf\to f$ in norm if $f\in\mathcal B$, but we do not assume $\|P_t\|\le 1$. Prove that there exist constants $K,b>0$ such that $\|P_t\|\le Ke^{bt}$ for all $t\ge 0$. Hint: Use the uniform boundedness principle from functional analysis to prove there exist $c,t_0$ such that $\|P_t\|\le c$ if $t\le t_0$. Then use the semigroup property.
38 Dirichlet forms
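When constructing semigroups, it is sometimes easier to start with a bilinear form, called the Dirichlet form, than to work with the infinitesimal generator, and to construct the semigroup from the form. For example, let $\Delta$ be the Laplacian. If $f,g\in C^2$ with compact support, then integration by parts shows
$$\int_{\mathbb R^d}f(x)(\tfrac12\Delta g)(x)\,dx = \tfrac12\int_{\mathbb R^d}f(x)\sum_{i=1}^d\frac{\partial^2g}{\partial x_i^2}(x)\,dx = -\tfrac12\int_{\mathbb R^d}\sum_{i=1}^d\frac{\partial f}{\partial x_i}(x)\frac{\partial g}{\partial x_i}(x)\,dx.$$
If we write
$$\mathcal E(f,g) = \tfrac12\int_{\mathbb R^d}\sum_{i=1}^d\frac{\partial f}{\partial x_i}(x)\frac{\partial g}{\partial x_i}(x)\,dx,$$
we thus have
$$\int f(\tfrac12\Delta g) = -\mathcal E(f,g).$$
Clearly $\mathcal E(f,g)$ is symmetric in $f$ and $g$, so
$$\int_{\mathbb R^d}f(\tfrac12\Delta g)\,dx = -\mathcal E(f,g) = -\mathcal E(g,f) = \int_{\mathbb R^d}g(\tfrac12\Delta f)\,dx. \tag{38.1}$$
If $R_\lambda$ is the resolvent for Brownian motion, (38.1) and the fact that $\frac12\Delta R_\lambda f = \lambda R_\lambda f - f$ tell us that
$$\begin{aligned}
\mathcal E(R_\lambda f,g) + \lambda\int(R_\lambda f)g &= -\int(\tfrac12\Delta R_\lambda f)g + \lambda\int(R_\lambda f)g\\
&= -\int(\lambda R_\lambda f - f)g + \lambda\int(R_\lambda f)g = \int fg.
\end{aligned}\tag{38.2}$$
The bilinear form $\mathcal E(f,g)$ makes sense even if $f,g$ are only in $C^1$ with compact support, which is one major advantage of the Dirichlet form. Since $\mathcal E$ is clearly linear in each variable, we have
$$\mathcal E(f,g) = \tfrac12\big[\mathcal E(f+g,f+g) - \mathcal E(f,f) - \mathcal E(g,g)\big],$$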
so to specify the Dirichlet form, it is only necessary to know $\mathcal E(f,f)$, a number, rather than $\mathcal Lf$, a function. One disadvantage of Dirichlet forms is that one needs a self-adjoint operator, and not every infinitesimal generator is self-adjoint. Another disadvantage is that when working with Dirichlet forms, $L^2$ is the natural space to work with, which means there are null sets one has to worry about. In particular, the construction of Chapter 36 is not directly applicable, because there we required our Banach space to be the set of continuous functions vanishing at infinity. (Modifications of the methods in Chapter 36 do work, however.)
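A finite-difference version of (38.1), (38.2), and the polarization identity can be checked exactly up to rounding. The sketch below is our own discretization, not the text's: it uses $n$ interior points of $(0,1)$ with zero boundary values (so it mimics Brownian motion absorbed at the endpoints rather than on all of $\mathbb R^d$), the inner product $\langle f,g\rangle = h\sum f_ig_i$, the discrete energy $\mathcal E(f,g) = \frac{1}{2h}\sum(f_{i+1}-f_i)(g_{i+1}-g_i)$, and $R_\lambda = (\lambda-\frac12\Delta_h)^{-1}$. Summation by parts plays the role of integration by parts.

```python
import numpy as np

n, lam = 50, 2.0
h = 1.0 / (n + 1)                                  # grid spacing on (0, 1)
lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
       + np.diag(np.ones(n - 1), -1)) / h**2       # Delta_h with zero boundary values
half_lap = 0.5 * lap

def inner(f, g):
    """Discrete <f, g> = h * sum f_i g_i."""
    return h * np.dot(f, g)

def energy(f, g):
    """Discrete E(f, g) = (1/2h) sum (f_{i+1} - f_i)(g_{i+1} - g_i), with f_0 = f_{n+1} = 0."""
    df = np.diff(np.concatenate(([0.0], f, [0.0])))
    dg = np.diff(np.concatenate(([0.0], g, [0.0])))
    return 0.5 * np.dot(df, dg) / h

rng = np.random.default_rng(3)
f, g = rng.standard_normal(n), rng.standard_normal(n)
R_f = np.linalg.solve(lam * np.eye(n) - half_lap, f)          # R_lam f

print(inner(f, half_lap @ g), -energy(f, g))                  # (38.1)
print(energy(R_f, g) + lam * inner(R_f, g), inner(f, g))      # (38.2)
print(energy(f, g), 0.5 * (energy(f + g, f + g) - energy(f, f) - energy(g, g)))  # polarization
```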
38.1 Framework
Let us now suppose $S$ is a locally compact separable metric space together with a $\sigma$-finite measure $m$ defined on the Borel subsets of $S$. We want to give a definition of the Dirichlet form in this more general context. We suppose there exists a dense subset $\mathcal{D} = \mathcal{D}(\mathcal{E})$ of $L^2(S, m)$ and a non-negative bilinear symmetric form $\mathcal{E}$ defined on $\mathcal{D} \times \mathcal{D}$, which means
$$\mathcal{E}(f,g) = \mathcal{E}(g,f), \quad \mathcal{E}(f+g,h) = \mathcal{E}(f,h) + \mathcal{E}(g,h), \quad \mathcal{E}(af,g) = a\,\mathcal{E}(f,g), \quad \mathcal{E}(f,f) \ge 0$$
for $f, g, h \in \mathcal{D}$, $a \in \mathbb{R}$. We will frequently write $\langle f, g\rangle$ for $\int f(x)g(x)\,m(dx)$. For $a > 0$ define
$$\mathcal{E}_a(f,g) = \mathcal{E}(f,g) + a\langle f, g\rangle.$$
We can define a norm on $\mathcal{D}$ using the inner product $\mathcal{E}_a$: the norm of $f$ equals $(\mathcal{E}_a(f,f))^{1/2}$; we call this the norm induced by $\mathcal{E}_a$. Since $a\langle f, f\rangle \le \mathcal{E}_a(f,f)$, then
$$\mathcal{E}_a(f,f) \le \mathcal{E}_b(f,f) = \mathcal{E}_a(f,f) + (b-a)\langle f, f\rangle \le \Big(1 + \frac{b-a}{a}\Big)\mathcal{E}_a(f,f)$$
if $a < b$, so the norms induced by different $a$'s are all equivalent. We say $\mathcal{E}$ is closed if $\mathcal{D}$ is complete with respect to the norm induced by $\mathcal{E}_a$ for some $a$. Equivalently, $\mathcal{E}$ is closed if whenever $u_n \in \mathcal{D}$ satisfies $\mathcal{E}_1(u_n - u_m, u_n - u_m) \to 0$ as $n, m \to \infty$, then there exists $u \in \mathcal{D}$ such that $\mathcal{E}_1(u_n - u, u_n - u) \to 0$ as $n \to \infty$. We say $\mathcal{E}$ is Markovian if whenever $u \in \mathcal{D}$, then $v = 0 \vee (u \wedge 1) \in \mathcal{D}$ and $\mathcal{E}(v,v) \le \mathcal{E}(u,u)$. (A slightly weaker definition of Markovian is sometimes used.) A Dirichlet form is a non-negative bilinear symmetric form that is closed and Markovian.

Absorbing Brownian motion on $[0,\infty)$ is a symmetric process. The corresponding Dirichlet form is
$$\mathcal{E}(f,f) = \frac12\int_0^\infty |f'(x)|^2\,dx,$$
and the appropriate domain turns out to be the completion of the set of $C^1$ functions with compact support contained in $(0,\infty)$ with respect to the norm induced by $\mathcal{E}_1$. In particular, any function with compact support contained in $(0,\infty)$ will be zero in a neighborhood of 0. In a domain $D$ in higher dimensions, the Dirichlet form for absorbing Brownian motion becomes
$$\mathcal{E}(f,f) = \frac12\int_D |\nabla f(x)|^2\,dx, \tag{38.3}$$
with the domain of $\mathcal{E}$ being the completion with respect to $\mathcal{E}_1$ of the $C^1$ functions with compact support contained in $D$. Reflecting Brownian motion is also a symmetric process. For a domain $D$, the Dirichlet form is given by (38.3) and the domain $\mathcal{D}(\mathcal{E})$ of the form is given by the completion with respect to the norm induced by $\mathcal{E}_1$ of the $C^1$ functions on $\overline{D}$ with compact support, where $\overline{D}$ is the closure of $D$. One might expect there to be some restriction on the normal derivative $\partial f/\partial n$ on the boundary of $D$, but in fact there is no such restriction. To examine this further, consider the case of $D = (0,\infty)$. If one takes the class of functions $f$ which are $C^1$ with compact support and with $f'(0) = 0$ and takes the closure with respect to the norm induced by $\mathcal{E}_1$, one gets the same class as $\mathcal{D}(\mathcal{E})$; this is Exercise 38.1. One nice consequence of the fact that we don't need to impose a restriction on the normal derivative in the domain of $\mathcal{E}$ for reflecting Brownian motion is that this allows us to define reflecting Brownian motion in any domain, even when the boundary is not smooth enough for the notion of a normal derivative to be defined.
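A finite state space makes the definitions of this section concrete. The sketch below uses a hypothetical example that is not in the text:

- $S = \{0, \dots, n-1\}$ with counting measure $m$;
- symmetric edge weights $W[x, y] \ge 0$ and a killing rate $k[x] \ge 0$;
- the form $\mathcal{E}(u, v) = \frac12\sum_{x,y} W[x,y](u(x)-u(y))(v(x)-v(y)) + \sum_x k[x]u(x)v(x)$, a discrete analog of (38.3).

On a finite set such a form is automatically closed. The truncation $v = 0 \vee (u \wedge 1)$ is 1-Lipschitz and fixes 0, so no term can increase, and the form is Markovian. The code checks this on random vectors. It also checks the polarization identity that recovers $\mathcal{E}(f,g)$ from the quadratic form.

```python
import numpy as np

def dirichlet_form(u, v, W, k):
    """E(u, v) = 1/2 sum_{x,y} W[x,y](u(x)-u(y))(v(x)-v(y)) + sum_x k[x] u(x) v(x)."""
    du = u[:, None] - u[None, :]
    dv = v[:, None] - v[None, :]
    return 0.5 * float(np.sum(W * du * dv)) + float(np.sum(k * u * v))

def unit_contraction(u):
    # v = 0 v (u ^ 1): the truncation in the definition of "Markovian"
    return np.clip(u, 0.0, 1.0)

rng = np.random.default_rng(0)
n = 6
W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), 1)
W = W + W.T                          # symmetric weights, zero diagonal
k = rng.uniform(0.0, 0.5, size=n)    # non-negative killing rates

# Markovian property: E(v, v) - E(u, u) should never be positive.
worst = max(
    dirichlet_form(unit_contraction(u), unit_contraction(u), W, k) - dirichlet_form(u, u, W, k)
    for u in rng.normal(0.0, 2.0, size=(1000, n))
)

# Polarization: E(f, g) is determined by the quadratic form h -> E(h, h).
f = rng.normal(size=n)
g = rng.normal(size=n)
polar = 0.5 * (dirichlet_form(f + g, f + g, W, k)
               - dirichlet_form(f, f, W, k)
               - dirichlet_form(g, g, W, k))
print(worst, polar, dirichlet_form(f, g, W, k))
```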
38.2 Construction of the semigroup
We now want to construct the resolvent corresponding to a Dirichlet form. The motivation given in (38.2) shows we should expect
$$\mathcal{E}_a(R_a f, g) = \langle f, g\rangle \tag{38.4}$$
for all $a > 0$ and all $f, g$ such that $R_a f, g \in \mathcal{D}$. Our Banach space $\mathcal{B}$ will be $L^2(S, m)$.

Theorem 38.1 If $\mathcal{E}$ is a Dirichlet form, there exists a family of resolvent operators $\{R_\lambda\}$ such that (1) the $R_\lambda$ satisfy the resolvent equation, (2) $\|\lambda R_\lambda\| \le 1$ for all $\lambda > 0$, (3) $\lambda R_\lambda f \to f$ as $\lambda \to \infty$ for $f \in \mathcal{B}$, and (4) $\mathcal{E}_a(R_a f, g) = \langle f, g\rangle$ if $a > 0$, $R_a f, g \in \mathcal{D}$. Moreover, if $f \in \mathcal{B}$ satisfies $0 \le f(x) \le 1$, $m$-a.e., then for all $a > 0$
$$0 \le aR_a f \le 1, \quad m\text{-a.e.} \tag{38.5}$$
Proof Fix $f \in \mathcal{B}$ and define a linear functional on $\mathcal{B}$ by $I(g) = \langle f, g\rangle$. This functional is also a bounded linear functional on $\mathcal{D}$ with respect to the norm induced by $\mathcal{E}_a$, that is, there exists $c$ such that $|I(g)| \le c\,\mathcal{E}_a(g,g)^{1/2}$. This follows because
$$|I(g)| = \Big|\int f g\Big| \le \langle f, f\rangle^{1/2}\langle g, g\rangle^{1/2} \le \langle f, f\rangle^{1/2}\Big(\frac1a\,\mathcal{E}_a(g,g)\Big)^{1/2}$$
by the Cauchy–Schwarz inequality. Since $\mathcal{E}$ is closed, $\mathcal{D}$ is a Hilbert space with respect to the norm induced by $\mathcal{E}_a$. By the Riesz representation theorem for Hilbert spaces (see, e.g., Folland (1999), Theorem 5.25), there exists a unique element $u \in \mathcal{D}$ such that $I(g) = \mathcal{E}_a(u, g)$ for all $g \in \mathcal{D}$. We set $R_a f = u$. In particular, (38.4) holds, and $R_a f \in \mathcal{D}$.
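We show the resolvent equation holds. If $g \in \mathcal{D}$,
$$\begin{aligned}
\mathcal{E}_a(R_a f - R_b f, g) &= \mathcal{E}_a(R_a f, g) - \mathcal{E}(R_b f, g) - a\langle R_b f, g\rangle\\
&= \langle f, g\rangle - \mathcal{E}(R_b f, g) - b\langle R_b f, g\rangle + (b-a)\langle R_b f, g\rangle\\
&= \langle f, g\rangle - \mathcal{E}_b(R_b f, g) + (b-a)\langle R_b f, g\rangle\\
&= (b-a)\langle R_b f, g\rangle = \mathcal{E}_a((b-a)R_a R_b f, g).
\end{aligned}$$
Since this holds for all $g \in \mathcal{D}$ and $\mathcal{D}$ is dense in $\mathcal{B}$, then $R_a f - R_b f = (b-a)R_a R_b f$. Next we show that $\|aR_a f\| \le \|f\|$, or equivalently,
$$\langle aR_a f, aR_a f\rangle \le \langle f, f\rangle. \tag{38.6}$$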
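If $\langle R_a f, R_a f\rangle$ is zero, then (38.6) trivially holds, so suppose it is positive. We have
$$a\langle R_a f, R_a f\rangle \le \mathcal{E}_a(R_a f, R_a f) = \langle f, R_a f\rangle \le \langle f, f\rangle^{1/2}\langle R_a f, R_a f\rangle^{1/2}$$
by (38.4) and the Cauchy–Schwarz inequality. If we now divide both sides by $\langle R_a f, R_a f\rangle^{1/2}$ and then square both sides, we obtain (38.6).

We show that $bR_b f \to f$ as $b \to \infty$ when $f \in \mathcal{B}$. If $f \in \mathcal{D}$, then by the Cauchy–Schwarz inequality and (38.6)
$$\langle bR_b f, f\rangle \le \langle bR_b f, bR_b f\rangle^{1/2}\langle f, f\rangle^{1/2} \le \langle f, f\rangle.$$
Using this,
$$\begin{aligned}
b\langle bR_b f - f, bR_b f - f\rangle &\le \mathcal{E}_b(bR_b f - f, bR_b f - f)\\
&= b^2\mathcal{E}_b(R_b f, R_b f) - 2b\,\mathcal{E}_b(R_b f, f) + \mathcal{E}_b(f,f)\\
&= b^2\langle R_b f, f\rangle - 2b\langle f, f\rangle + \mathcal{E}(f,f) + b\langle f, f\rangle \le \mathcal{E}(f,f),
\end{aligned}$$
where the last inequality holds because $b^2\langle R_b f, f\rangle = b\langle bR_b f, f\rangle \le b\langle f, f\rangle$. Now divide both sides by $b$ to get $\|bR_b f - f\|^2 \le \mathcal{E}(f,f)/b \to 0$ as $b \to \infty$. Since $\mathcal{D}$ is dense in $\mathcal{B}$ and $\|bR_b\| \le 1$ for all $b$, we conclude $bR_b f \to f$ for all $f \in \mathcal{B}$.

It remains to show $0 \le bR_b f \le 1$, $m$-a.e., if $0 \le f \le 1$, $m$-a.e. Fix $f \in \mathcal{B}$ with $0 \le f \le 1$, $m$-a.e., and let $a > 0$. Define a functional $\psi$ on $\mathcal{D}$ by
$$\psi(v) = \mathcal{E}(v,v) + a\Big\langle v - \frac fa, v - \frac fa\Big\rangle.$$
We claim
$$\psi(R_a f) + \mathcal{E}_a(R_a f - v, R_a f - v) = \psi(v), \qquad v \in \mathcal{D}. \tag{38.7}$$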
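To see this, start with the left-hand side, which is equal to
$$\begin{aligned}
&\mathcal{E}(R_a f, R_a f) + a\Big\langle R_a f - \frac1a f, R_a f - \frac1a f\Big\rangle + \mathcal{E}_a(R_a f - v, R_a f - v)\\
&\quad= \mathcal{E}_a(R_a f, R_a f) - 2\langle R_a f, f\rangle + \frac1a\langle f, f\rangle + \mathcal{E}_a(R_a f, R_a f) - 2\mathcal{E}_a(R_a f, v) + \mathcal{E}_a(v,v)\\
&\quad= \frac1a\langle f, f\rangle - 2\langle f, v\rangle + \mathcal{E}(v,v) + a\langle v, v\rangle\\
&\quad= \psi(v).
\end{aligned}$$
It follows from (38.7) and the fact that $\mathcal{E}_a(g,g)$ is non-negative for any $g \in \mathcal{D}$ that $R_a f$ is the function that minimizes $\psi$. Set $\phi(x) = 0 \vee (x \wedge (1/a))$ and let $w = \phi(R_a f)$. Observe that $|\phi(t) - s| \le |t - s|$ for $t \in \mathbb{R}$ and $s \in [0, 1/a]$, so
$$\Big|w(x) - \frac{f(x)}a\Big| \le \Big|R_a f(x) - \frac{f(x)}a\Big|,$$
and therefore
$$\Big\langle w - \frac fa, w - \frac fa\Big\rangle \le \Big\langle R_a f - \frac fa, R_a f - \frac fa\Big\rangle. \tag{38.8}$$
Since $aw = 0 \vee ((aR_a f) \wedge 1)$ and $\mathcal{E}$ is Markovian, we have $w \in \mathcal{D}$ and
$$\mathcal{E}(w,w) \le \frac1{a^2}\,\mathcal{E}(aR_a f, aR_a f) = \mathcal{E}(R_a f, R_a f). \tag{38.9}$$
Adding $a$ times (38.8) to (38.9), we conclude $\psi(w) \le \psi(R_a f)$. Since $R_a f$ is the minimizer for $\psi$, (38.7) with $v = w$ forces $\mathcal{E}_a(R_a f - w, R_a f - w) = 0$, so $w = R_a f$, $m$-a.e. But $0 \le w \le 1/a$, and hence $aR_a f$ takes values in $[0,1]$, $m$-a.e.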
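In a finite-state setting, the Riesz-representation step in the proof can be written down explicitly. This is a hypothetical example, not from the text. Let $m$ be counting measure and $\mathcal{E}(u,v) = u^T A v$, where $A = \operatorname{diag}(\sum_y W[\cdot,y]) - W + \operatorname{diag}(k)$. Then $\mathcal{E}_a(u,v) = u^T(A + aI)v$ and $R_a = (A + aI)^{-1}$.

The sketch below checks:

- (38.4);
- the resolvent equation;
- the bound $\|aR_a\| \le 1$;
- the bound (38.5);
- that $R_a f$ minimizes the functional $\psi$ of (38.7).

```python
import numpy as np

# Hypothetical finite-state Dirichlet form E(u, v) = u @ A @ v (m = counting measure).
rng = np.random.default_rng(1)
n = 6
W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), 1)
W = W + W.T
k = rng.uniform(0.0, 0.5, size=n)
A = np.diag(W.sum(axis=1)) - W + np.diag(k)   # u @ A @ u = 1/2 sum W (du)^2 + sum k u^2
I = np.eye(n)

def resolvent(a):
    # Riesz representation in (R^n, E_a): E_a(R_a f, g) = <f, g> for all g
    return np.linalg.inv(A + a * I)

def E(u, v):
    return float(u @ A @ v)

a, b = 0.7, 2.5
Ra, Rb = resolvent(a), resolvent(b)
f = rng.uniform(0.0, 1.0, size=n)   # 0 <= f <= 1
g = rng.normal(size=n)

gap_38_4 = E(Ra @ f, g) + a * float((Ra @ f) @ g) - float(f @ g)   # (38.4)
gap_resolvent = float(np.max(np.abs(Ra - Rb - (b - a) * Ra @ Rb)))  # resolvent equation
norm_aRa = float(np.linalg.norm(a * Ra, 2))                         # should be <= 1
aRf = a * Ra @ f                                                    # should lie in [0, 1]

def psi(v):
    # the functional of (38.7): E(v, v) + a <v - f/a, v - f/a>
    w = v - f / a
    return E(v, v) + a * float(w @ w)

perturbed = [psi(Ra @ f + 0.1 * rng.normal(size=n)) for _ in range(200)]
print(gap_38_4, gap_resolvent, norm_aRa, aRf.min(), aRf.max(), psi(Ra @ f) <= min(perturbed))
```

Here $A + aI$ has non-positive off-diagonal entries and is strictly diagonally dominant, so its inverse has non-negative entries. That is the finite-dimensional reason for $0 \le aR_a f$.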
If we combine Proposition 37.9 and Theorem 38.1, we obtain a semigroup $P_t$ whose resolvent satisfies (38.4). We would like to know that the analog of (38.5) holds for $P_t$.

Corollary 38.2 If $0 \le f \le 1$, $m$-a.e., then $0 \le P_t f \le 1$, $m$-a.e.

Proof If $0 \le f \le 1$, $m$-a.e., then $0 \le bR_b f \le 1$, $m$-a.e., by Theorem 38.1, and iterating, $0 \le (bR_b)^i f \le 1$, $m$-a.e., for every $i$. Using the notation of the proof of Proposition 37.9,
$$Q_t^b f(x) = e^{-bt}\sum_{i=0}^\infty \frac{(bt)^i}{i!}\,(bR_b)^i f(x),$$
which will be non-negative, $m$-a.e., and bounded by $e^{-bt}\sum_{i=0}^\infty (bt)^i/i! = 1$, $m$-a.e. Passing to the limit as $b \to \infty$, we see that $P_t f$ takes values in $[0,1]$, $m$-a.e.

When it comes to using the semigroup $P_t$ derived from a Dirichlet form to construct a Markov process $X$, there is a difficulty that we did not have before. Since $P_t$ is constructed using an $L^2$ procedure, $P_t f$ is defined only up to almost everywhere equivalence. Without some continuity properties of $P_t f$ for enough $f$'s, we must neglect some null sets. If the only null sets we could work with were sets of $m$-measure 0, we would be in trouble. For example, when $S$ is the plane and $m$ is two-dimensional Lebesgue measure, the $x$ axis has measure zero, but a continuous process will (in general) hit the $x$ axis. Fortunately there is a notion of sets of capacity zero, which form a smaller class of null sets than the sets of measure zero. It is possible to construct a process $X$ starting from all points $x$ in $S$ except for those in a set $N$ of capacity zero and to show that starting from any point not in $N$, the process never hits $N$.

There is another difficulty when working with Dirichlet forms. In general, one must work with a certain compactification of $S$, which is a compact set containing $S$. Even when our state space is a domain in $\mathbb{R}^d$, this compactification is not necessarily equal to $\overline{S}$, the Euclidean closure of $S$, and one must work with it instead of $\overline{S}$. It can be shown that this problem will not occur if the Dirichlet form is regular. Let $C_K$ be the set of continuous functions with compact support. A Dirichlet form $\mathcal{E}$ is regular if $\mathcal{D} \cap C_K$ is dense in $\mathcal{D}$ with respect to the norm induced by $\mathcal{E}_1$ and $\mathcal{D} \cap C_K$ also is dense in $C_K$ with respect to the supremum norm.
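The approximation $Q_t^b$ from the proof of Corollary 38.2 can be tried out on the same kind of hypothetical finite-state form (not from the text). There, $P_t = e^{-tA}$.

The sketch below sums the series $e^{-bt}\sum_i (bt)^i (bR_b)^i f/i!$ directly. It computes the Poisson weights in log space so that $e^{-bt}$ does not underflow for large $b$. It then compares the result with $e^{-tA}f$, computed from an eigendecomposition. As in the proof, every term is non-negative and bounded by 1 when $0 \le f \le 1$.

```python
import math
import numpy as np

# Hypothetical finite-state form E(u, v) = u @ A @ v; then P_t = exp(-tA).
rng = np.random.default_rng(2)
n = 5
W = np.triu(rng.uniform(0.0, 1.0, size=(n, n)), 1)
W = W + W.T
k = rng.uniform(0.0, 0.5, size=n)
A = np.diag(W.sum(axis=1)) - W + np.diag(k)

def exact_semigroup(t, f):
    # A is symmetric, so exp(-tA) = V diag(exp(-t * lam)) V^T
    lam, V = np.linalg.eigh(A)
    return V @ (np.exp(-t * lam) * (V.T @ f))

def q_tb(t, b, f):
    """Q_t^b f = e^{-bt} sum_i (bt)^i (b R_b)^i f / i!, truncated far in the Poisson tail."""
    bRb = b * np.linalg.inv(A + b * np.eye(n))
    mean = b * t
    n_terms = int(mean + 10.0 * math.sqrt(mean) + 20)
    total = np.zeros(n)
    term = f.copy()                    # (b R_b)^i f, starting with i = 0
    for i in range(n_terms):
        weight = math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        total += weight * term
        term = bRb @ term
    return total

t = 0.5
f = rng.uniform(0.0, 1.0, size=n)      # 0 <= f <= 1
exact = exact_semigroup(t, f)
errors = {b: float(np.max(np.abs(q_tb(t, b, f) - exact))) for b in (10.0, 100.0, 1000.0)}
approx = q_tb(t, 1000.0, f)
print(errors, approx.min(), approx.max())
```

The error shrinks roughly like $1/b$. In $Q_t^b = \exp(bt(bR_b - I))$, each eigenvalue $-\lambda$ of $-A$ is replaced by $-\lambda b/(\lambda + b)$.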
38.3 Divergence form elliptic operators
We want to show how to construct the Markov process corresponding to the operator
$$Lf(x) = \sum_{i,j=1}^d \frac{\partial}{\partial x_i}\Big(a_{ij}(\cdot)\,\frac{\partial f}{\partial x_j}(\cdot)\Big)(x). \tag{38.10}$$
If the $a_{ij}$'s are smooth in $x$, this can be interpreted as first calculating the partial derivative of $f$ with respect to $x_j$, multiplying the result by $a_{ij}(x)$, taking the partial derivative of the product with respect to $x_i$, and then summing over $i$ and $j$. If, however, the $a_{ij}$'s are only bounded and measurable, one cannot even in general give any nontrivial examples of functions in the domain of $L$. Here is where Dirichlet forms are the perfect tool. Operators of the form (38.10) are known as elliptic operators in divergence form or in variational form, and the study of their properties has a long history in PDE. We assume $a_{ij}(x) = a_{ji}(x)$ for each $i$ and $j$ and each $x$. We suppose the $a_{ij}(x)$ are measurable functions and are uniformly bounded in $x$ for each $i$ and $j$. We also require uniform ellipticity: there exists $\Lambda > 0$ such that
$$\sum_{i,j=1}^d a_{ij}(x)\,y_iy_j \ge \Lambda\sum_{i=1}^d y_i^2, \qquad (y_1,\dots,y_d) \in \mathbb{R}^d,\ x \in \mathbb{R}^d.$$
Just as in the nondivergence elliptic operator case, the matrix whose $(i,j)$th element is $a_{ij}(x)$ is positive definite, uniformly in $x$. We will shortly define a Dirichlet form, but let us first specify a domain. Let $C_K^1$ be the collection of $C^1$ functions with compact support, and define $H^1$ to be the completion of $C_K^1$ with respect to the norm
$$\|f\|_{H^1} = \Big(\int (|f(x)|^2 + |\nabla f(x)|^2)\,dx\Big)^{1/2}. \tag{38.11}$$
One can show that $H^1$ with this norm is a Hilbert space; this is Exercise 38.2. Now for $f \in C_K^1$ define
$$\mathcal{E}(f,f) = \int_{\mathbb{R}^d}\sum_{i,j=1}^d a_{ij}(x)\,\frac{\partial f}{\partial x_i}(x)\,\frac{\partial f}{\partial x_j}(x)\,dx. \tag{38.12}$$
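We can use the fact that $C_K^1$ is dense in $H^1$ to extend the definition of $\mathcal{E}$ to all of $H^1 \times H^1$. The connection with the operator $L$ is that when the $a_{ij}$ are smooth, integration by parts yields $\int (Lf)\,g\,dx = -\mathcal{E}(f,g)$ if $g$ is $C^1$ with compact support; cf. (38.1). Because of the boundedness and uniform ellipticity, there exist positive constants $c_1$ and $c_2$ not depending on $f$ such that
$$c_1\int |\nabla f(x)|^2\,dx \le \mathcal{E}(f,f) \le c_2\int |\nabla f(x)|^2\,dx.$$
Therefore the norm induced by $\mathcal{E}_1$ and the norm in $H^1$ are equivalent. This implies $\mathcal{E}$ is closed. By the definition of $H^1$, $\mathcal{E}$ is regular, and clearly $\mathcal{E}$ is symmetric. Thus we need only to show that $\mathcal{E}$ is Markovian.

Let $\phi(x) = (0 \vee x) \wedge 1$. For each $\varepsilon > 0$ let $\phi_\varepsilon$ be $C^\infty$, bounded, agreeing with $\phi$ on $[0,1]$, with $\|\phi_\varepsilon'\|_\infty \le 1$, and such that $\phi_\varepsilon(x) \to \phi(x)$ uniformly in $x$ as $\varepsilon \to 0$ and $\phi_\varepsilon'(x) \to 1_{[0,1]}(x)$ pointwise as $\varepsilon \to 0$. Note $\nabla\phi_\varepsilon(f) = \phi_\varepsilon'(f)\nabla f$, so if $f \in C_K^1$,
$$\mathcal{E}(\phi_\varepsilon(f), \phi_\varepsilon(f)) = \int \sum_{i,j=1}^d \big(\phi_\varepsilon'(f(x))\big)^2 a_{ij}(x)\,\frac{\partial f}{\partial x_i}(x)\,\frac{\partial f}{\partial x_j}(x)\,dx. \tag{38.13}$$
Since, by uniform ellipticity with constant $\Lambda$,
$$\sum_{i,j=1}^d a_{ij}(x)\,\frac{\partial f}{\partial x_i}(x)\,\frac{\partial f}{\partial x_j}(x) \ge \Lambda|\nabla f(x)|^2 \ge 0$$
and $|\phi_\varepsilon'(f(x))| \le 1$, we see that
$$\mathcal{E}(\phi_\varepsilon(f), \phi_\varepsilon(f)) \le \mathcal{E}(f,f).$$
Taking the limit as $\varepsilon \to 0$ in (38.13) we obtain
$$\mathcal{E}(\phi(f), \phi(f)) \le \mathcal{E}(f,f) < \infty. \tag{38.14}$$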
In particular, $\phi(f) \in H^1 = \mathcal{D}(\mathcal{E})$. Approximating a general element of $H^1$ by functions in $C_K^1$ and passing to the limit, we see that (38.14) holds for all $f \in H^1$, which says that $\mathcal{E}$ is Markovian. We can therefore apply Theorem 38.1 to obtain a semigroup corresponding to the Dirichlet form $\mathcal{E}$. As mentioned earlier, there is potentially a problem in that the semigroup is only defined for points not in a certain null set. However, a famous result of Nash and of De Giorgi shows that the semigroup $P_t$ can be written as $P_t f(x) = \int f(y)\,p(t,x,y)\,dy$ with $p(t,x,y)$ Hölder continuous in $x$ and $y$; see Bass (1997), Chapter VII for a presentation of this result. This allows us to take the null set to be empty and to see that our semigroup satisfies the assumptions of Chapter 36. Therefore there exists a strong Markov process having $P_t$ as its semigroup.
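The point of the Dirichlet-form approach is that (38.12) only needs the $a_{ij}$ to be bounded and measurable. A one-dimensional finite-difference sketch makes this concrete. This discretization is our own, not from the text. It assumes:

- $d = 1$ and zero boundary values on $[0,1]$;
- a coefficient $a$ that jumps from 1 to 5 at $x = 1/2$.

The discrete form $\sum_i a(x_{i+1/2})(f_{i+1}-f_i)^2/h$ is assembled as a symmetric matrix. The code then checks two things:

- the two-sided bound $c_1\int|f'|^2 \le \mathcal{E}(f,f) \le c_2\int|f'|^2$, with $c_1 = \min a$ and $c_2 = \max a$;
- the Markovian inequality (38.14) for $\phi(f) = (0 \vee f)\wedge 1$.

```python
import numpy as np

# Hypothetical 1-D discretization: E(f, f) = int_0^1 a(x) f'(x)^2 dx with f(0) = f(1) = 0,
# approximated by sum_i a(x_{i+1/2}) (f_{i+1} - f_i)^2 / h on a uniform grid.
n = 400
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
midpoints = 0.5 * (x[:-1] + x[1:])
a = np.where(midpoints < 0.5, 1.0, 5.0)   # bounded, measurable, discontinuous
c1, c2 = a.min(), a.max()

# Stiffness matrix on the n - 1 interior nodes, so that f @ K @ f is the discrete form.
K = (np.diag(a[:-1] + a[1:]) - np.diag(a[1:-1], 1) - np.diag(a[1:-1], -1)) / h

def with_boundary(f_interior):
    return np.concatenate(([0.0], f_interior, [0.0]))

def form(f_interior):
    return float(np.sum(a * np.diff(with_boundary(f_interior)) ** 2) / h)

def gradient_energy(f_interior):
    # discrete version of int |f'|^2 dx
    return float(np.sum(np.diff(with_boundary(f_interior)) ** 2) / h)

f = 1.5 * np.sin(3.0 * np.pi * x[1:-1])   # takes values outside [0, 1]
phi_f = np.clip(f, 0.0, 1.0)               # phi(f) = (0 v f) ^ 1
print(form(f), float(f @ K @ f), form(phi_f))
```

The matrix `K` is what a discretized version of (38.10) would use for $-L$. It is built from the form alone, without ever differentiating $a$.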
Exercises
38.1 Let $F_1 = \{f \in C^1[0,\infty) : f \text{ has compact support}\}$ and $F_2 = F_1 \cap \{f \in C^1[0,\infty) : f \text{ has compact support}, f'(0) = 0\}$. Show that the closures of $F_1$ and $F_2$ with respect to the norm $\big(\int (|f(x)|^2 + |f'(x)|^2)\,dx\big)^{1/2}$ are the same.
38.2 If $H^1$ is the completion of $C_K^1$, the $C^1$ functions on $\mathbb{R}^d$ with compact support, relative to the norm given by (38.11), show $H^1$ is a Hilbert space.
38.3 Show that the resolvent operator $R_\lambda$ defined in Theorem 38.1 is a symmetric operator, that is, if $f, g \in \mathcal{B}$, then $\langle R_\lambda f, g\rangle = \langle f, R_\lambda g\rangle$.
38.4 Show that if the resolvent operator $R_\lambda$ is a symmetric operator, then the transition operators $P_t$ are also symmetric: if $f, g \in \mathcal{B}$, then $\langle P_t f, g\rangle = \langle f, P_t g\rangle$.
38.5 To do the next few exercises, you will have to know some functional analysis, specifically, the spectral theorem for self-adjoint operators. See Lax (2002). Let $\mathcal{E}$ be a Dirichlet form with domain $\mathcal{D}(\mathcal{E})$ and let $L$ be the infinitesimal generator of the semigroup $P_t$ that corresponds to $\mathcal{E}$. Let $E(d\lambda)$ be a spectral resolution of the identity for $-L$. (The operator $L$ is a negative operator, so $-L$ is a positive one.) Then a consequence of the spectral theorem is that
$$P_t f = \int_0^\infty e^{-\lambda t}\,E(d\lambda)f \quad\text{and}\quad R_a f = \int_0^\infty \frac1{a+\lambda}\,E(d\lambda)f.$$
Also
$$\langle f, g\rangle = \int_0^\infty \langle E(d\lambda)f, g\rangle.$$
Show that if $f, g \in \mathcal{D}$, then
$$\mathcal{E}(f,g) = \int_0^\infty \lambda\,\langle E(d\lambda)f, g\rangle.$$
Hint: First prove it for $f = R_a h$. Write
$$\mathcal{E}(R_a h, g) = \langle h, g\rangle - a\langle R_a h, g\rangle = \int_0^\infty \Big(1 - \frac{a}{a+\lambda}\Big)\langle E(d\lambda)h, g\rangle = \int_0^\infty \frac{\lambda}{a+\lambda}\,\langle E(d\lambda)h, g\rangle = \int_0^\infty \lambda\,\langle E(d\lambda)(R_a h), g\rangle.$$
To extend this to all $f$ in the domain of $\mathcal{E}$, use the fact that $\mathcal{E}$ is closed.
38.6 If $L$ is the infinitesimal generator of the semigroup associated with the Dirichlet form $\mathcal{E}$, show that $\mathcal{D}(\sqrt{-L}) = \mathcal{D}(\mathcal{E})$.
38.7 Show that if $f \in \mathcal{D}(\mathcal{E})$, then $aR_a f$ converges to $f$ with respect to the norm induced by $\mathcal{E}_1$.
38.8 Show that if $b > 0$, then $\{R_b f : f \in L^2\}$ is a dense subset of $\mathcal{D}(\mathcal{E})$ with respect to the norm induced by $\mathcal{E}_1$.
38.9 Show that $\{P_t f : f \in L^2, t > 0\}$ is a dense subset of $\mathcal{D}(\mathcal{E})$ with respect to the norm induced by $\mathcal{E}_1$.
38.10 This exercise shows how to approximate $\mathcal{E}$ by forms whose domain is all of $\mathcal{B}$. Let
$$\mathcal{E}^{(t)}(f,g) = \frac1t\,\langle f - P_t f, g\rangle.$$
Show that if $f \in \mathcal{D}(\mathcal{E})$, then $\mathcal{E}^{(t)}(f,f)$ increases to $\mathcal{E}(f,f)$ as $t \downarrow 0$. Show that if $f, g \in \mathcal{D}(\mathcal{E})$, then $\mathcal{E}^{(t)}(f,g)$ converges to $\mathcal{E}(f,g)$.
38.11 Show that if $u \in \mathcal{D}(\mathcal{E})$, then $|u| \in \mathcal{D}(\mathcal{E})$ and $\mathcal{E}(|u|,|u|) \le \mathcal{E}(u,u)$. Hint: Use Exercise 38.10.
38.12 Use Exercise 38.11 to show that if $u \in \mathcal{D}(\mathcal{E})$, then $\mathcal{E}(u^+, u^-) \le 0$.
38.13 Suppose $\{P_t\}$ are the transition probabilities corresponding to a Dirichlet form $\mathcal{E}$. Suppose there exist functions $p_t(x,y)$ such that for each $t$, $P_t f(x) = \int p_t(x,y)f(y)\,m(dy)$ for almost every $x$. Prove that for almost every pair $(x,y)$ with respect to the product measure $m \times m$, $p_t(x,y) = p_t(y,x)$.
38.14 Let $f \in L^2(m)$ and define the functional $\psi(u) = \mathcal{E}(u,u) + \lambda\langle u, u\rangle - 2\langle f, u\rangle$ for $u$ in the domain of $\mathcal{E}$. Prove that $\psi$ is minimized by $u = R_\lambda f$, and that this function is the unique minimizer.
38.15 Let $P_t$ be the semigroup associated with a Dirichlet form and define $J(dx,dy) = P_t(x,dy)\,m(dx)$. (1) Prove that if $f, g$ are continuous with compact support, then $\iint f(x)g(y)\,J(dx,dy) = \iint g(x)f(y)\,J(dx,dy)$. (2) With $f$ and $g$ continuous with compact support, prove that $\iint f(x)g(y)\,J(dx,dy) = \langle f, P_t g\rangle$ and $\iint f(x)g(x)\,J(dx,dy) = \langle fg, P_t 1\rangle$. (3) Let $k(x) = 1 - P_t 1(x)$. Prove that if $\mathcal{E}^{(t)}$ is defined as in Exercise 38.10, then
$$2t\,\mathcal{E}^{(t)}(f,g) = \iint (f(x) - f(y))(g(x) - g(y))\,J(dx,dy) + 2\int f(x)g(x)k(x)\,m(dx).$$
(4) Is $\mathcal{E}^{(t)}$ a Dirichlet form? A regular Dirichlet form?
38.16 This is a continuation of the previous exercise. If $f$ is a function on the state space, we say that $g$ is a normal contraction of $f$ if $|g(x)| \le |f(x)|$ for all $x$ and $|g(x) - g(y)| \le |f(x) - f(y)|$ for all $x$ and $y$. As an example, note that if $g(x) = -1 \vee (f(x) \wedge 1)$, then $g$ is a normal contraction of $f$. Prove that if $f \in \mathcal{D}(\mathcal{E})$, where $\mathcal{E}$ is a Dirichlet form and $g$ is a normal contraction of $f$, then for each $t > 0$, $\mathcal{E}^{(t)}(g,g) \le \mathcal{E}^{(t)}(f,f) \le \mathcal{E}(f,f)$.
Notes
See Fukushima et al. (1994) for further information.
39 Markov processes and SDEs
One common way of constructing Markov processes is via stochastic differential equations. Roughly speaking, if there is uniqueness for every starting point, then one can create a strong Markov process. After proving this, we establish a connection between stochastic differential equations and partial differential equations, and then we describe what is known as the martingale problem.
39.1 Markov properties
Let $P$ be a probability and suppose $W$ is a $d$-dimensional Brownian motion with respect to $P$. Consider the SDE
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt. \tag{39.1}$$
Here $\sigma$ is a $d \times d$ matrix-valued function and $b$ is a vector-valued function, both Borel measurable and bounded. This can be written in terms of components as
$$dX_t^i = \sum_{j=1}^d \sigma_{ij}(X_t)\,dW_t^j + b_i(X_t)\,dt, \qquad i = 1,\dots,d,$$
where $W = (W^1,\dots,W^d)$. Let $X_t^x$ be the solution to (39.1) when $X_0 = x$. Let $P^x$ be the law of $X^x$. Let $\Omega = C[0,\infty)$, let $\mathcal{F}$ be the $\sigma$-field generated by the cylindrical subsets of $\Omega$, and define $Z_t(\omega) = \omega(t)$. The main result of this section is that if weak existence and weak uniqueness hold for (39.1) for every starting point $x$, then the solutions $(Z_t, P^x)$ form a strong Markov process. We begin by considering regular conditional probabilities.

Definition 39.1 Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $\mathcal{E}$ be a $\sigma$-field contained in $\mathcal{F}$. A regular conditional probability for $E[\,\cdot \mid \mathcal{E}]$ is a kernel $Q(\omega, d\omega')$ such that (1) $Q(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{F})$ for each $\omega$; (2) for each $A \in \mathcal{F}$, $Q(\cdot, A)$ is a random variable that is measurable with respect to $\mathcal{E}$; (3) for each $A \in \mathcal{F}$ and each $B \in \mathcal{E}$,
$$\int_B Q(\omega, A)\,P(d\omega) = P(A \cap B).$$
Regular conditional probabilities need not always exist, but if the probability space has sufficient structure, then they do. We provide a proof in the appendix; see Theorem C.1. $Q(\omega, A)$ can be thought of as $P(A \mid \mathcal{E})(\omega)$, regularized so as to have some joint measurability.
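On a finite probability space, a regular conditional probability can be written down directly, which may help in reading Definition 39.1. The sketch below is a hypothetical example:

- $\Omega = \{0,\dots,5\}$ with rational probabilities;
- $\mathcal{F}$ is all subsets;
- $\mathcal{E}$ is generated by a partition into atoms.

Then $Q(\omega, A) = P(A \cap B_\omega)/P(B_\omega)$, where $B_\omega$ is the atom containing $\omega$. The code verifies properties (1) to (3) exactly using Python's `fractions` module.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite example: Omega = {0,...,5}, F = all subsets, and the
# sub-sigma-field (called E in Definition 39.1) generated by three atoms.
P = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(1, 10),
     3: Fraction(3, 10), 4: Fraction(1, 10), 5: Fraction(2, 10)}
atoms = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]
omega = frozenset(P)

def prob(A):
    return sum((P[w] for w in A), Fraction(0))

def atom_of(w):
    return next(B for B in atoms if w in B)

def Q(w, A):
    # Q(w, A) = P(A & B_w) / P(B_w), where B_w is the atom containing w
    B = atom_of(w)
    return prob(A & B) / prob(B)

def subsets(s):
    s = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

F = subsets(omega)
sub_field = [frozenset().union(*c) for r in range(len(atoms) + 1) for c in combinations(atoms, r)]

# (1) each Q(w, .) is a probability measure on (Omega, F)
prop1 = all(Q(w, omega) == 1
            and all(Q(w, A) == sum((Q(w, frozenset({x})) for x in A), Fraction(0)) for A in F)
            for w in omega)
# (2) each Q(., A) is measurable w.r.t. the sub-sigma-field, i.e. constant on atoms
prop2 = all(len({Q(w, A) for w in B}) == 1 for A in F for B in atoms)
# (3) sum_{w in B} Q(w, A) P({w}) = P(A & B) for A in F and B in the sub-sigma-field
prop3 = all(sum((Q(w, A) * P[w] for w in B), Fraction(0)) == prob(A & B)
            for A in F for B in sub_field)
print(prop1, prop2, prop3)
```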
Recall that the definition of minimal augmented filtration for a Markov process was given in Section 20.1.

Theorem 39.2 Suppose weak existence and weak uniqueness hold for the SDE (39.1) whenever $X_0$ is a random variable that is in $L^2$ and is measurable with respect to $\mathcal{F}_0$. Suppose the matrix $\sigma(y)$ is invertible for each $y$. Let $(\Omega, \mathcal{F}, P)$ be defined as above. Let $P^x$ be the law of the weak solution when $X_0$ is identically equal to $x$. Let $\{\mathcal{F}_t\}$ be the minimal augmented filtration generated by $Z$. Then $(P^x, Z_t)$ is a strong Markov process.

Proof We will prove that if $T$ is a bounded stopping time and $f$ is a bounded and Borel measurable function on $\mathbb{R}^d$, then
$$E^x[f(Z_{T+t}) \mid \mathcal{F}_T] = E^{Z_T} f(Z_t), \quad \text{a.s.} \tag{39.2}$$
As in Section 20.3, this is sufficient to get the strong Markov property. Fix $x$. Let
$$Y_t = Z_t - \int_0^t b(Z_r)\,dr \tag{39.3}$$
and
$$W_t = \int_0^t \sigma^{-1}(Z_r)\,dY_r. \tag{39.4}$$
Since the $P^x$ law of $Z_t$ is the same as the $P$ law of $X_t^x$, the $P^x$ law of the process $W$ defined in (39.4) is the same as the $P$ law of the Brownian motion $W$ in (39.1); in other words, $W$ is a Brownian motion under $P^x$. Rearranging (39.3) and (39.4), we have the equation
$$Z_t = Z_0 + \int_0^t \sigma(Z_r)\,dW_r + \int_0^t b(Z_r)\,dr. \tag{39.5}$$
Let $Q$ be a regular conditional probability for $E^x[\,\cdot \mid \mathcal{F}_T]$. Let $\widetilde Z_t = Z_{T+t}$ and $\widetilde W_t = W_{T+t} - W_T$. Using (39.5) with $t$ replaced by $T+t$ and then with $t$ replaced by $T$, and taking the difference, we obtain
$$Z_{T+t} - Z_T = \int_T^{T+t} \sigma(Z_r)\,dW_r + \int_T^{T+t} b(Z_r)\,dr,$$
and hence
$$\widetilde Z_t = \widetilde Z_0 + \int_0^t \sigma(\widetilde Z_r)\,d\widetilde W_r + \int_0^t b(\widetilde Z_r)\,dr. \tag{39.6}$$
We will show in a moment that $\widetilde W$ is a Brownian motion with respect to $Q(\omega, \cdot)$ for $P^x$-almost all $\omega$. Thus except for $\omega$ in a $P^x$-null set, (39.6) implies that under $Q(\omega, \cdot)$, $\widetilde Z$ is a solution to (39.1) with starting point $\widetilde Z_0 = Z_T(\omega)$. If $E_Q$ denotes the expectation with respect to $Q$, the weak uniqueness tells us that
$$E_Q f(\widetilde Z_t) = E^{Z_T} f(Z_t), \quad P^x(d\omega)\text{-a.s.} \tag{39.7}$$
On the other hand,
$$E_Q f(\widetilde Z_t) = E_Q f(Z_{T+t}) = E^x[f(Z_{T+t}) \mid \mathcal{F}_T], \quad P^x(d\omega)\text{-a.s.} \tag{39.8}$$
Combining (39.7) and (39.8) proves (39.2).
It remains to prove that under $Q$ the process $\widetilde W$ is a Brownian motion. $Q(\omega, \cdot)$ is a probability measure on $\Omega$, so $t \mapsto \widetilde W_t$ is continuous for every $\omega'$. Let $t_1 < \cdots < t_n$ and
$$N(u_2,\dots,u_n,t_1,\dots,t_n) = \Big\{\omega : E_Q \exp\Big(i\sum_{j=2}^n u_j\cdot(W_{T+t_j} - W_{T+t_{j-1}})\Big) \ne \exp\Big(-\sum_{j=2}^n |u_j|^2(t_j - t_{j-1})/2\Big)\Big\}.$$
By the strong Markov property of the Brownian motion $W$ and the definition of $Q$,
$$\begin{aligned}
E_Q \exp\Big(i\sum_{j=2}^n u_j\cdot(W_{T+t_j} - W_{T+t_{j-1}})\Big) &= E^x\Big[\exp\Big(i\sum_{j=2}^n u_j\cdot(W_{T+t_j} - W_{T+t_{j-1}})\Big) \,\Big|\, \mathcal{F}_T\Big]\\
&= E^{W_T}\exp\Big(i\sum_{j=2}^n u_j\cdot(W_{t_j} - W_{t_{j-1}})\Big)\\
&= \exp\Big(-\sum_{j=2}^n |u_j|^2(t_j - t_{j-1})/2\Big),
\end{aligned}$$
where the second equality holds almost surely, that is, except for a $P^x$-null set of $\omega$'s. This shows that $N(u_2,\dots,u_n,t_1,\dots,t_n)$ is a null set with respect to $P^x$. Let $N$ be the union of all such $N(u_2,\dots,u_n,t_1,\dots,t_n)$ for $n \ge 2$, $u_2,\dots,u_n$ with rational coordinates, and $t_1 < \cdots < t_n$ rational. Since this is a countable union, $N$ is a $P^x$-null set. Suppose $\omega \notin N$. By the continuity of the paths of $\widetilde W$ (and dominated convergence),
$$E_Q \exp\Big(i\sum_{j=2}^n u_j\cdot(W_{T+t_j} - W_{T+t_{j-1}})\Big) = \exp\Big(-\sum_{j=2}^n |u_j|^2(t_j - t_{j-1})/2\Big)$$
for all $t_1 < \cdots < t_n$ in $[0,\infty)$ and $u_2,\dots,u_n \in \mathbb{R}^d$. Thus the finite-dimensional distributions of $\widetilde W$ under $Q(\omega, \cdot)$ are those of a Brownian motion. By the continuity of $\widetilde W$ and Theorem 2.6, under $Q(\omega, \cdot)$ the process $\widetilde W$ is a Brownian motion, except for a null set of $\omega$'s.

By a slight abuse of notation, we will say $(X_t, P^x)$ is a strong Markov family when $(Z_t, P^x)$ is a strong Markov family.
39.2 SDEs and PDEs
The connection between stochastic differential equations and partial differential equations comes about through the following theorem, which is simply an application of Itô's formula. Let $L$ be the operator on functions in $C^2$ defined by
$$Lf(x) = \frac12\sum_{i,j=1}^d a_{ij}(x)\,\frac{\partial^2 f}{\partial x_i\partial x_j}(x) + \sum_{i=1}^d b_i(x)\,\frac{\partial f}{\partial x_i}(x). \tag{39.9}$$
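Theorem 39.3 Suppose $X_t$ is a solution to (39.1), $\sigma$ and $b$ are bounded and Borel measurable, and $a = \sigma\sigma^T$. Suppose $f \in C^2$. Then
$$f(X_t) = f(X_0) + M_t + \int_0^t Lf(X_s)\,ds, \tag{39.10}$$
where
$$M_t = \int_0^t \sum_{i,j=1}^d \frac{\partial f}{\partial x_i}(X_s)\,\sigma_{ij}(X_s)\,dW_s^j \tag{39.11}$$
is a local martingale.

Proof Since the components of the Brownian motion $W_t$ are independent, we have $d\langle W^k, W^\ell\rangle_t = 0$ if $k \ne \ell$; see Exercise 9.4. Therefore
$$d\langle X^i, X^j\rangle_t = \sum_{k,\ell}\sigma_{ik}(X_t)\,\sigma_{j\ell}(X_t)\,d\langle W^k, W^\ell\rangle_t = \sum_k \sigma_{ik}(X_t)\,\sigma^T_{kj}(X_t)\,dt = a_{ij}(X_t)\,dt.$$
We now apply Itô's formula:
$$\begin{aligned}
f(X_t) &= f(X_0) + \sum_i \int_0^t \frac{\partial f}{\partial x_i}(X_s)\,dX_s^i + \frac12\sum_{i,j}\int_0^t \frac{\partial^2 f}{\partial x_i\partial x_j}(X_s)\,d\langle X^i, X^j\rangle_s\\
&= f(X_0) + M_t + \sum_i \int_0^t \frac{\partial f}{\partial x_i}(X_s)\,b_i(X_s)\,ds + \frac12\sum_{i,j}\int_0^t \frac{\partial^2 f}{\partial x_i\partial x_j}(X_s)\,a_{ij}(X_s)\,ds\\
&= f(X_0) + M_t + \int_0^t Lf(X_s)\,ds,
\end{aligned}$$
and we are finished.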
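Theorem 39.3 can be checked by simulation. The sketch below makes assumptions that are not in the text:

- $d = 1$, $\sigma(x) = 1 + \frac12\sin x$ and $b(x) = \cos x$, both bounded, with $\sigma \ge \frac12$;
- $f(x) = x^2$;
- an Euler–Maruyama discretization of (39.1).

With these choices $Lf(x) = \sigma(x)^2 + 2x\,b(x)$, and $M_t$ is a true martingale. So $E f(X_t) - f(x)$ and $E\int_0^t Lf(X_s)\,ds$ should agree up to Monte Carlo and discretization error.

```python
import numpy as np

# Illustrative choices (not from the text): d = 1, sigma(x) = 1 + sin(x)/2,
# b(x) = cos(x), f(x) = x^2, so Lf(x) = sigma(x)^2 + 2 x b(x).
def sigma(x):
    return 1.0 + 0.5 * np.sin(x)

def drift(x):
    return np.cos(x)

def f(x):
    return x ** 2

def Lf(x):
    # (1/2) a f'' + b f' with a = sigma^2, f'' = 2, f' = 2x
    return sigma(x) ** 2 + 2.0 * x * drift(x)

def simulate(x0, t, n_steps, n_paths, rng):
    """Euler-Maruyama for dX = sigma(X) dW + b(X) dt; also accumulates int_0^t Lf(X_s) ds."""
    dt = t / n_steps
    x = np.full(n_paths, float(x0))
    integral_Lf = np.zeros(n_paths)
    for _ in range(n_steps):
        integral_Lf += Lf(x) * dt                 # left-endpoint rule
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + sigma(x) * dW + drift(x) * dt
    return x, integral_Lf

rng = np.random.default_rng(4)
x0, t = 0.3, 1.0
xt, integral_Lf = simulate(x0, t, n_steps=200, n_paths=40_000, rng=rng)
lhs = float(np.mean(f(xt)) - f(x0))   # estimate of E f(X_t) - f(x0)
rhs = float(np.mean(integral_Lf))     # estimate of E int_0^t Lf(X_s) ds
print(lhs, rhs)
```

Both estimates use the same simulated paths. Their difference is therefore an estimate of $E M_t$ plus an $O(\Delta t)$ discretization term, both of which should be small.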
39.3 Martingale problems
In this section we consider operators in nondivergence form, that is, operators of the form given by (39.9). We assume throughout this section that the coefficients $a_{ij}$ and $b_i$ are bounded and measurable and that $a_{ij}(x) = a_{ji}(x)$ for all $i, j = 1,\dots,d$ and all $x \in \mathbb{R}^d$. The coefficients $a_{ij}$ are called the diffusion coefficients and the $b_i$ are called the drift coefficients. We also assume that the operator $L$ is uniformly elliptic, which means that there exists $\Lambda > 0$ such that
$$\sum_{i,j=1}^d y_i\,a_{ij}(x)\,y_j \ge \Lambda|y|^2, \qquad y \in \mathbb{R}^d,\ x \in \mathbb{R}^d. \tag{39.12}$$
This says that the matrix $a_{ij}(x)$ is positive definite, uniformly in $x$. We saw in the previous section that if $X_t$ is the solution to (39.1), $a = \sigma\sigma^T$, and $f \in C^2$, then
$$f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \tag{39.13}$$
is a local martingale under $P$. A very fruitful idea of Stroock and Varadhan is to phrase the association of $X_t$ with $L$ in terms which use (39.13) as a key element. Let $\Omega$ consist of all continuous functions $\omega$ mapping $[0,\infty)$ to $\mathbb{R}^d$. Let $X_t(\omega) = \omega(t)$ and, given a probability $P$, let $\{\mathcal{F}_t\}$ be the minimal augmented filtration generated by $X$. A probability measure $P$ is a solution to the martingale problem for $L$ started at $x_0$ if
$$P(X_0 = x_0) = 1 \tag{39.14}$$
and
$$f(X_t) - f(X_0) - \int_0^t Lf(X_s)\,ds \tag{39.15}$$
is a local martingale under $P$ whenever $f \in C^2(\mathbb{R}^d)$. The martingale problem is well posed if there exists a solution $P$ and this solution is unique.

Uniqueness of the martingale problem for $L$ is closely connected to weak uniqueness or, equivalently, uniqueness in law of (39.1).

Theorem 39.4 Suppose $a = \sigma\sigma^T$ and suppose the matrix $\sigma(x)$ is invertible for each $x$. Weak uniqueness for (39.1) holds if and only if the solution for the martingale problem for $L$ started at $x$ is unique. Weak existence for (39.1) holds if and only if there exists a solution to the martingale problem for $L$ started at $x$.

Proof We prove the uniqueness assertion. Let $\Omega$ be the continuous functions on $[0,\infty)$ and $Z_t$ the coordinate process: $Z_t(\omega) = \omega(t)$. First suppose the solution to the martingale problem is unique. If $(X_t^1, W_t^1, P_1)$ and $(X_t^2, W_t^2, P_2)$ are two weak solutions to (39.1), define $P_x^i$ on $\Omega$ to be the law of $X^i$ under $P_i$, $i = 1, 2$. Clearly $P_x^i(Z_0 = x) = P_i(X_0^i = x) = 1$. The expression in (39.13), with $Z$ in place of $X$, is a local martingale under $P_x^i$ for each $i$ and each $f \in C^2$. By the uniqueness for the solution of the martingale problem, $P_x^1 = P_x^2$. This implies that the laws of $X_t^1$ and $X_t^2$ are the same, or weak uniqueness holds.

Now suppose weak uniqueness holds for (39.1). Let
$$Y_t = Z_t - \int_0^t b(Z_s)\,ds.$$
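Let $P_x^1$ and $P_x^2$ be solutions to the martingale problem. If $f(x) = x_k$, the $k$th coordinate of $x$, then $\partial f/\partial x_i(x) = \delta_{ik}$ and $\partial^2 f/\partial x_i\partial x_j(x) = 0$, where $\delta_{ik}$ is 1 if $i = k$ and 0 otherwise, and so $Lf(Z_s) = b_k(Z_s)$. We see from (39.13) that the $k$th coordinate of $Y_t$ is a local martingale under $P_x^i$. Now let $f(x) = x_k x_m$. A simple computation shows that $Lf(x) = a_{km}(x) + x_k b_m(x) + x_m b_k(x)$; combining this with the product formula and the fact that $Y^k$ and $Y^m$ are local martingales, we see that $Y_t^k Y_t^m - \int_0^t a_{km}(Z_s)\,ds$ is a local martingale. We set
$$W_t = \int_0^t \sigma^{-1}(Z_s)\,dY_s.$$
The stochastic integral is finite since, for each $i$,
$$E\int_0^t \sum_{j,k=1}^d (\sigma^{-1})_{ij}(Z_s)\,(\sigma^{-1})_{ik}(Z_s)\,d\langle Y^j, Y^k\rangle_s = E\int_0^t \big(\sigma^{-1}a\,(\sigma^{-1})^T\big)_{ii}(Z_s)\,ds = t < \infty. \tag{39.16}$$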
Since $Y_t$ is a local martingale, it follows that $W_t$ is a local martingale, and a calculation similar to (39.16) shows that $W_t^k W_t^m - \delta_{km}t$ is also a local martingale under $P_x^i$. By Lévy's theorem (Exercise 12.4), $W_t$ is a Brownian motion under both $P_x^1$ and $P_x^2$, and $(Z_t, W_t, P_x^i)$ is a weak solution to (39.1). By the weak uniqueness hypothesis, the laws of $Z_t$ under $P_x^1$ and $P_x^2$ agree, which is what we wanted to prove.

Exercise 39.1 asks you to prove that the existence of a weak solution to (39.1) is equivalent to the existence of a solution to the martingale problem. If the $\sigma_{ij}$ and $b_i$ are Lipschitz functions, the solution to (39.1) is pathwise unique; see Exercise 24.5. By Proposition 25.2, weak existence and uniqueness hold, and then the martingale problem for $L$ is well posed for every starting point.

A process that can be described in terms of a martingale problem (as well as in other ways) is super-Brownian motion. Super-Brownian motion, also known as a measure-valued branching diffusion process, is a process whose state space is the set $\mathcal{M}$ of finite positive measures on $\mathbb{R}^d$. The intuitive picture is as follows. Given an initial finite measure $\mu$ as a starting point, let $X_t^n$ be the process that starts with $[n\mu(\mathbb{R}^d)]$ particles, each with mass $1/n$, each distributed according to $\mu(dx)/\mu(\mathbb{R}^d)$, where $[\cdot]$ denotes the integer part. Each particle moves as an independent Brownian motion for a time $1/n$, at which time each particle splits into two or dies, independently of the other particles. The particles that are now alive move as independent Brownian motions for time $1/n$, at which time each particle splits into two or dies, and so on. $X_t^n$ is the measure that assigns mass $1/n$ to each point at which there is a particle alive at time $t$. We take the right-continuous version of $X_t^n$.
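The branching particle picture can be simulated directly. The sketch below follows the description above for $X^n$, with two assumptions that the text leaves open:

- "splits into two or dies" is taken to mean probability 1/2 each (critical binary branching);
- $\mu$ is supplied as its total mass together with a sampler for $\mu(dx)/\mu(\mathbb{R}^d)$.

The function and argument names are hypothetical.

```python
import numpy as np

def branching_particle_system(total_mass, sample_initial, n, t_max, rng, d=1):
    """Hypothetical helper: simulate X^n at times k/n; returns a list of (time, positions).

    X_t^n puts mass 1/n at each row of `positions`.  Assumes binary branching
    with probability 1/2 of splitting and 1/2 of dying.
    """
    count = int(np.floor(n * total_mass))          # [n mu(R^d)] initial particles
    positions = np.asarray(sample_initial(count, rng), dtype=float).reshape(count, d)
    history = [(0.0, positions.copy())]
    for step in range(1, int(round(n * t_max)) + 1):
        # each particle moves as an independent Brownian motion for time 1/n
        positions = positions + np.sqrt(1.0 / n) * rng.standard_normal(positions.shape)
        # then each particle independently splits into two or dies
        survivors = positions[rng.random(len(positions)) < 0.5]
        positions = np.repeat(survivors, 2, axis=0)
        history.append((step / n, positions.copy()))
    return history

def total_mass(positions, n):
    return len(positions) / n                      # X_t^n(R^d)

def uniform_unit_interval(count, rng):
    return rng.random(count)                       # mu = Lebesgue measure on [0, 1]

rng = np.random.default_rng(5)
n = 50
runs = [branching_particle_system(1.0, uniform_unit_interval, n, 1.0, rng) for _ in range(400)]
final_masses = np.array([total_mass(h[-1][1], n) for h in runs])
print(len(runs[0][0][1]), final_masses.mean())
```

Under the probability-1/2 assumption, the expected number of particles is preserved at each branching time. The average terminal mass over many runs should therefore be close to $\mu(\mathbb{R}^d)$, here 1, even though individual runs often die out.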
It turns out that the sequence $X^n$ converges weakly with respect to the topology of $D[0,1]$, where now the paths are right-continuous functions with left limits taking values in $\mathcal{M}$ (rather than real-valued functions), and the limit law can be characterized as the unique solution to a martingale problem. A solution to this martingale problem started at $\mu \in \mathcal{M}$ is a probability measure $P$ on the space of continuous processes taking values in $\mathcal{M}$ such that (1) $P(X_0 = \mu) = 1$; (2) if $f \in C^\infty$ has compact support and we write $\nu(f)$ for $\int f\,d\nu$, then
$$M_t^f = X_t(f) - \int_0^t X_r(\tfrac12\Delta f)\,dr$$
is a continuous martingale with quadratic variation process given by
$$\langle M^f\rangle_t = \int_0^t X_r(f^2)\,dr.$$
See Dawson (1993) and Perkins (2002) for more on these processes.
Exercises
39.1 Show that the existence of a weak solution to (39.1) is equivalent to the existence of a solution to the martingale problem for $L$.
39.2 Suppose the $a_{ij}$ are Lipschitz functions in $x$ and the matrices $a(x)$ are positive definite, uniformly in $x$; see Exercise 25.4. Show that we can find matrices $\sigma(x)$ so that each $\sigma_{ij}$ is a Lipschitz function of $x$ and $a(x) = \sigma(x)\sigma^T(x)$ for each $x$.
39.3 If $X$ is a solution to (39.1), give formulas for $A_t$ and $M_t$ in terms of $\sigma$ and $b$, where $M_t$ is a local martingale, $A_t$ is a process whose paths are locally of bounded variation, and $|X_t| = M_t + A_t$.
39.4 Let $A \in (-1,\infty)$ and let $X$ be a solution to (39.1), where all the $b_i$'s are equal to 0, $a = \sigma\sigma^T$, and
$$a_{ij}(x) = \frac{\delta_{ij} + Ax_ix_j/|x|^2}{1+A}$$
for $x \ne 0$, where $\delta_{ij}$ is equal to 1 if $i = j$ and 0 otherwise. Let $a(0)$ be the identity matrix. (1) Prove that the matrices $a(x)$ are uniformly elliptic. (2) Show that $|X_t|$ has the same law as a Bessel process of order $(d+A)/(1+A)$. Conclude that if $A$ is sufficiently close to $-1$, then $X$ is transient, i.e., $\lim_{t\to\infty}|X_t| = \infty$, a.s., while if $A$ is sufficiently large, there exist arbitrarily large times $t$ such that $X_t = 0$.
39.5 Suppose for each $n \ge 1$, $a^n_{ij}(x)$ is symmetric in $i$ and $j$, is continuous in $x$, and the matrix whose $(i,j)$th entry is $a^n_{ij}(x)$ is positive definite, uniformly in $x$ and $n$. Let
$$L_n f(x) = \sum_{i,j=1}^d a^n_{ij}(x)\,\frac{\partial^2 f}{\partial x_i\partial x_j}(x) \tag{39.17}$$
for $f \in C^2$. Suppose $a^n_{ij}(x)$ converges to $a_{ij}(x)$ uniformly in $x$ as $n \to \infty$, and define $L$ analogously to (39.17). Fix $x_0$ and let $P_n$ be a solution to the martingale problem for $L_n$ started at $x_0$. (1) Prove that $P_n$ converges weakly to a solution $P$ to the martingale problem for $L$ started at $x_0$. (2) Prove that if the $a_{ij}$ are continuously differentiable functions of $x$ whose first partial derivatives are bounded, then there exists a solution to the martingale problem for $L$ started at $x_0$. (3) Prove that if the $a_{ij}$ are continuous functions of $x$, then there exists a solution to the martingale problem for $L$ started at $x_0$.
39.6 Suppose $X$ is a solution to $dX_t = \sigma(X_t)\,dW_t$, where $W$ is a $d$-dimensional Brownian motion, $\sigma(x)$ is a $d \times d$ matrix-valued function that is bounded, and $\sigma^T\sigma$ is positive definite, uniformly in $x$. Prove the following estimate for the time to leave a ball: there exist constants $c_1$ and $c_2$ not depending on $x_0$ such that
$$c_1 r^2 \le E^{x_0}\tau_{B(x_0,r)} \le c_2 r^2, \qquad r > 0,$$
where $\tau_{B(x_0,r)} = \inf\{t > 0 : X_t \notin B(x_0,r)\}$.

Notes
See Bass (1997) for more information.
40 Solving partial differential equations
We will be concerned with giving probabilistic representations of the solutions to certain PDEs. Throughout we will assume that the given PDE has a solution, that the solution is unique, and that it is sufficiently smooth. We will consider Poisson’s equation, the Dirichlet problem, the Cauchy problem (with an application to Brownian passage times), and Schrödinger’s equation.

We let $X_t$ be the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt. \qquad (40.1)$$
Here W is a d-dimensional Brownian motion, $\sigma$ is a bounded Lipschitz continuous $d \times d$ matrix-valued function, b is a bounded Lipschitz continuous $d \times 1$ matrix-valued function, and X takes values in $\mathbb{R}^d$. We let $a = \sigma\sigma^T$ and we consider the operator on $C^2$ functions given by
$$Lf(x) = \frac12\sum_{i,j=1}^d a_{ij}(x)\,\frac{\partial^2 f}{\partial x_i\,\partial x_j}(x) + \sum_{i=1}^d b_i(x)\,\frac{\partial f}{\partial x_i}(x). \qquad (40.2)$$
We suppose the operator L is uniformly elliptic: there exists $\Lambda > 0$ such that
$$\sum_{i,j=1}^d a_{ij}(x)\,y_i y_j \ge \Lambda\sum_{i=1}^d y_i^2, \qquad x \in \mathbb{R}^d,\ (y_1, \ldots, y_d) \in \mathbb{R}^d.$$
In fact, the uniform ellipticity of L will be used only to guarantee that the exit times of bounded domains are finite, a.s.; see Exercise 40.1. For many operators that are not uniformly elliptic, the finiteness of the exit times is known for other reasons, and the results then apply to equations involving these operators. Let $X^x_t$ be the solution to (40.1) when $X_0 = x$ and let $P^x$ be the law of $X^x$. As in Chapter 39, we slightly abuse notation and say that $(X_t, P^x)$ is a strong Markov process.
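To experiment numerically with the representations below one needs approximate sample paths of (40.1). A common choice, which is not taken from the text, is the Euler–Maruyama scheme $X_{k+1} = X_k + \sigma(X_k)\,\Delta W_k + b(X_k)\,\Delta t$, where the coordinates of $\Delta W_k$ are independent normal variables with variance $\Delta t$. The following is a minimal pure-Python sketch; the name `euler_maruyama` and its signature are illustrative assumptions, and the scheme only approximates the law of X.

```python
import math
import random

def euler_maruyama(sigma, b, x0, t_end, n_steps, rng):
    """Approximate one path of dX = sigma(X) dW + b(X) dt on [0, t_end].

    sigma(x) returns a d x d matrix (list of rows) and b(x) a length-d list.
    Returns the states at the grid times k * t_end / n_steps, k = 0..n_steps.
    """
    d = len(x0)
    dt = t_end / n_steps
    sqrt_dt = math.sqrt(dt)
    x = list(x0)
    path = [list(x)]
    for _ in range(n_steps):
        dw = [rng.gauss(0.0, sqrt_dt) for _ in range(d)]  # Brownian increments
        s = sigma(x)
        drift = b(x)
        # x_{k+1} = x_k + sigma(x_k) dW + b(x_k) dt, coordinate by coordinate
        x = [x[i] + sum(s[i][j] * dw[j] for j in range(d)) + drift[i] * dt
             for i in range(d)]
        path.append(list(x))
    return path
```

With `sigma` returning the identity matrix and `b` returning zeros, this produces approximate Brownian paths; with `sigma` identically zero it reduces to Euler's method for $dx/dt = b(x)$.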
40.1 Poisson’s equation

We consider first Poisson’s equation in $\mathbb{R}^d$. Suppose $\lambda > 0$ and f is a $C^1$ function with compact support. Poisson’s equation is
$$Lu(x) - \lambda u(x) = -f(x), \qquad x \in \mathbb{R}^d. \qquad (40.3)$$
Theorem 40.1 Suppose u is a $C^2$ solution to (40.3) such that u and its first and second partial derivatives are bounded. Then
$$u(x) = E^x\int_0^\infty e^{-\lambda t} f(X_t)\,dt.$$

Proof Let u be the solution to (40.3). By Theorem 39.3,
$$u(X_t) - u(X_0) = M_t + \int_0^t Lu(X_s)\,ds,$$
where $M_t$ is a martingale. By the product formula,
$$e^{-\lambda t}u(X_t) - u(X_0) = \int_0^t e^{-\lambda s}\,dM_s + \int_0^t e^{-\lambda s}Lu(X_s)\,ds - \lambda\int_0^t e^{-\lambda s}u(X_s)\,ds.$$
Taking the expectation with respect to $P^x$ and letting $t \to \infty$ (the stochastic integral has mean zero, and $E^x e^{-\lambda t}u(X_t) \to 0$ because u is bounded and $\lambda > 0$),
$$-u(x) = E^x\int_0^\infty e^{-\lambda s}(Lu - \lambda u)(X_s)\,ds.$$
Since $Lu - \lambda u = -f$, the result follows.

Now let D be a nice bounded domain, e.g., a ball. Poisson’s equation in D requires one to find a function u such that $Lu - \lambda u = -f$ in D and $u = 0$ on $\partial D$, where $f \in C^2(D)$ and $\lambda \ge 0$. Here we can allow $\lambda$ to be equal to 0.

Theorem 40.2 Suppose u is a solution to Poisson’s equation in a bounded domain D that is $C^2$ in D and continuous on $\overline D$. Then
$$u(x) = E^x\int_0^{\tau_D} e^{-\lambda s} f(X_s)\,ds.$$
Proof The proof is nearly identical to that of the previous theorem. We already mentioned that $\tau_D < \infty$, a.s.; see Exercise 40.1. Let $S_n = \inf\{t : \mathrm{dist}(X_t, \partial D) < 1/n\}$. By Theorem 39.3,
$$u(X_{t\wedge S_n}) - u(X_0) = \text{martingale} + \int_0^{t\wedge S_n} Lu(X_s)\,ds.$$
By the product formula,
$$E^x e^{-\lambda(t\wedge S_n)}u(X_{t\wedge S_n}) - u(x) = E^x\int_0^{t\wedge S_n} e^{-\lambda s}Lu(X_s)\,ds - \lambda E^x\int_0^{t\wedge S_n} e^{-\lambda s}u(X_s)\,ds = -E^x\int_0^{t\wedge S_n} e^{-\lambda s}f(X_s)\,ds.$$
Now let $n \to \infty$ and then $t \to \infty$ and use the fact that u is zero on $\partial D$.
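Theorem 40.2 suggests a Monte Carlo method: simulate X until it leaves D and average the discounted integral of f along the path. The sketch below assumes the simplest setting, d = 1, $\sigma \equiv 1$ and $b \equiv 0$ (so $L = \frac12\,d^2/dx^2$ and X is Brownian motion), with D = (`lo`, `hi`) and a left-point Riemann sum on a time grid of step `dt`. For $f \equiv 1$ and $\lambda = 0$ on $D = (-1, 1)$, the solution of $\frac12 u'' = -1$, $u(\pm 1) = 0$, is $u(x) = 1 - x^2$, which is $E^x\tau_D$, so the estimate can be compared with an exact value. Exits are detected only at grid times, which biases the estimate slightly upward; all names and parameter values are illustrative.

```python
import math
import random

def poisson_mc(f, lam, x, lo, hi, dt, n_paths, seed=0):
    """Monte Carlo estimate of E^x int_0^{tau_D} e^{-lam s} f(X_s) ds, where X is a
    one-dimensional Brownian motion and D = (lo, hi).  Exit is only checked at the
    grid times k * dt, so the estimate carries a small upward discretization bias."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        pos, t, acc = x, 0.0, 0.0
        while lo < pos < hi:
            acc += math.exp(-lam * t) * f(pos) * dt   # left-point Riemann sum
            pos += rng.gauss(0.0, sqrt_dt)
            t += dt
        total += acc
    return total / n_paths

# f = 1, lam = 0: u solves (1/2) u'' = -1 on (-1, 1) with u(-1) = u(1) = 0, so u(x) = 1 - x**2.
estimate = poisson_mc(lambda y: 1.0, 0.0, 0.0, -1.0, 1.0, 2e-3, 1000)
exact = 1.0 - 0.0 ** 2
```

Halving `dt` reduces the bias; increasing `n_paths` reduces the statistical error.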
40.2 Dirichlet problem

Let D be a ball (or other nice bounded domain) and let us consider the solution to the Dirichlet problem: given a continuous function f on $\partial D$, find $u \in C(\overline D)$ such that u is $C^2$ in D and
$$Lu = 0 \text{ in } D, \qquad u = f \text{ on } \partial D. \qquad (40.4)$$
We considered the Dirichlet problem in the special case where L is the Laplacian in Section 21.4.

Theorem 40.3 Suppose u is a solution to the Dirichlet problem specified by (40.4). Then u satisfies
$$u(x) = E^x f(X_{\tau_D}).$$

Proof As we mentioned above, $\tau_D < \infty$, a.s. Let $S_n = \inf\{t : \mathrm{dist}(X_t, \partial D) < 1/n\}$. By Theorem 39.3,
$$u(X_{t\wedge S_n}) = u(X_0) + \text{martingale} + \int_0^{t\wedge S_n} Lu(X_s)\,ds.$$
Since $Lu = 0$ inside D, taking expectations shows $u(x) = E^x u(X_{t\wedge S_n})$. We let $t \to \infty$ and then $n \to \infty$. Since u is continuous on the compact set $\overline D$, it is bounded, so by dominated convergence we obtain $u(x) = E^x u(X_{\tau_D})$. This is what we want since $u = f$ on $\partial D$.

If $v \in C^2$ and $Lv = 0$ in D, we say v is L-harmonic in D.
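Theorem 40.3 can be checked the same way. The sketch below assumes d = 2, $L = \frac12\Delta$ ($\sigma$ the identity, $b \equiv 0$) and D the open unit disk. The function $x^2 - y^2$ is harmonic, so if f is its restriction to the unit circle, the solution of (40.4) is $u(x, y) = x^2 - y^2$. The code runs crude Euler paths until they leave D and projects the exit point radially onto the circle before evaluating f; that projection is a convenience of this sketch, not part of the theorem, and the names and parameters are illustrative.

```python
import math
import random

def dirichlet_mc(f, x, y, dt, n_paths, seed=1):
    """Estimate u(x, y) = E^{(x, y)} f(X_{tau_D}) for two-dimensional Brownian motion
    (generator (1/2) Laplacian) and D the open unit disk."""
    rng = random.Random(seed)
    s = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        px, py = x, y
        while px * px + py * py < 1.0:
            px += rng.gauss(0.0, s)
            py += rng.gauss(0.0, s)
        r = math.hypot(px, py)
        total += f(px / r, py / r)   # project the overshoot back onto the circle
    return total / n_paths

boundary = lambda cx, cy: cx * cx - cy * cy   # restriction of x^2 - y^2 to the circle
estimate = dirichlet_mc(boundary, 0.5, 0.2, 2e-3, 2000)
exact = 0.5 ** 2 - 0.2 ** 2                    # u(0.5, 0.2) = 0.21
```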
40.3 Cauchy problem

The related parabolic partial differential equation
$$\frac{\partial u}{\partial t} = Lu$$
is often of interest. Here u is a function of $x \in \mathbb{R}^d$ and $t \in [0, \infty)$. When we write Lu, we mean
$$Lu(x,t) = \frac12\sum_{i,j=1}^d a_{ij}(x)\,\frac{\partial^2 u}{\partial x_i\,\partial x_j}(x,t) + \sum_{i=1}^d b_i(x)\,\frac{\partial u}{\partial x_i}(x,t).$$
We will sometimes write $u_t$ for $\partial u/\partial t$. Suppose for simplicity that f is a continuous function with compact support. The Cauchy problem is to find u such that u is bounded, u is $C^2$ with bounded first and second partial derivatives in x, u is $C^1$ in t for $t > 0$, and
$$u_t(x,t) = Lu(x,t), \quad t > 0,\ x \in \mathbb{R}^d; \qquad u(x,0) = f(x), \quad x \in \mathbb{R}^d. \qquad (40.5)$$
Theorem 40.4 Suppose there exists a solution to (40.5) that is $C^2$ in x and $C^1$ in t for $t > 0$. Then u satisfies
$$u(x,t) = E^x f(X_t).$$

Proof Fix $t_0$ and let $M_t = u(X_t, t_0 - t)$. Note
$$\frac{\partial}{\partial t}u(x, t_0 - t) = -u_t(x, t_0 - t).$$
Similarly to the proof of Theorem 39.3 (see Exercise 40.2) but using now the multivariate version of Itô’s formula,
$$u(X_t, t_0 - t) - u(X_0, t_0) = \text{martingale} + \int_0^t Lu(X_s, t_0 - s)\,ds - \int_0^t u_t(X_s, t_0 - s)\,ds. \qquad (40.6)$$
Since $u_t = Lu$, $M_t$ is a martingale, and $E^x M_0 = E^x M_{t_0}$. On the one hand,
$$E^x M_{t_0} = E^x u(X_{t_0}, 0) = E^x f(X_{t_0}),$$
while on the other hand,
$$E^x M_0 = E^x u(X_0, t_0) = u(x, t_0).$$
Since $t_0$ is arbitrary, the result follows.

A very similar proof allows one to represent the solution to the Cauchy problem in a bounded domain. Suppose $u(x,t)$ is $C^2$ in the x variable, $C^1$ in the t variable, and satisfies
$$\frac{\partial u}{\partial t}(x,t) = Lu(x,t)$$
for $(x,t) \in D \times (0, t_1]$, where D is a bounded domain in $\mathbb{R}^d$ and $t_1 > 0$. Suppose $u(x,0) = f(x)$ and $u(x,t) = 0$ for all $x \in \partial D$. Exercise 40.3 asks you to show that in this case $u(x,t) = E^x f(X_{t\wedge\tau_D})$, where again $\tau_D$ is the first exit time of X from the domain D.

The Cauchy problem has an application to the passage times of Brownian motion. Suppose we look at the equation
$$u_t(x,t) = \tfrac12 u_{xx}(x,t), \qquad 0 < x < b,\ t > 0,$$
with $u(x,0) = f(x)$ for all x and $u(0,t) = u(b,t) = 0$ for all t, where f is a bounded function on $[0,b]$. This is a partial differential equation (the heat equation) that is sometimes solved in undergraduate classes; see, e.g., Boyce and DiPrima (2009), Section 10.5. Using a combination of the technique of separation of variables and Fourier series expansions, the solution can then be shown to be
$$u(x,t) = \int_0^b f(y)\,p^0(t,x,y)\,dy, \qquad\text{where}\qquad p^0(t,x,y) = \frac{2}{b}\sum_{n=1}^\infty e^{-n^2\pi^2 t/(2b^2)}\sin(n\pi x/b)\sin(n\pi y/b).$$
See also Knight (1981), p. 62. Since $u(x,t)$ is also equal to $E^x f(X_{t\wedge\tau_D})$, where D is the interval $(0,b)$, the $p^0(t,x,y)$ are the transition densities for Brownian motion killed on exiting $(0,b)$. In particular, if we take f identically equal to 1 on $(0,b)$, we see that starting at x inside $(0,b)$, $P^x(t < \tau_D)$ is asymptotically equal to $ce^{-\pi^2 t/(2b^2)}$. If b is 2, this becomes $ce^{-\pi^2 t/8}$.
Since the time for a Brownian motion started at 0 to leave $(-1, 1)$ has the same distribution as the time for a Brownian motion started at 1 to leave $(0, 2)$, we obtain the estimate that was used in Exercise 7.2.
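Integrating the series for $p^0$ against $f \equiv 1$ makes the constant c explicit. Since $\int_0^b \sin(n\pi y/b)\,dy$ equals $2b/(n\pi)$ for odd n and 0 for even n,
$$P^x(\tau_D > t) = \sum_{n \text{ odd}} \frac{4}{n\pi}\,e^{-n^2\pi^2 t/(2b^2)}\sin(n\pi x/b),$$
so $c = (4/\pi)\sin(\pi x/b)$, which is $4/\pi$ when $b = 2$ and $x = 1$. The short sketch below evaluates the truncated series; the truncation level `n_terms` is an arbitrary choice.

```python
import math

def survival_prob(x, t, b, n_terms=201):
    """P^x(tau_D > t) for Brownian motion on D = (0, b): the series for p^0(t, x, y)
    integrated over y in (0, b), truncated after n = n_terms."""
    total = 0.0
    for n in range(1, n_terms + 1, 2):   # even n integrate to zero
        decay = math.exp(-n * n * math.pi ** 2 * t / (2.0 * b * b))
        total += 4.0 / (n * math.pi) * decay * math.sin(n * math.pi * x / b)
    return total

# Started at the midpoint of (0, 2), the leading term is (4 / pi) * exp(-pi^2 t / 8).
ratio = survival_prob(1.0, 5.0, 2.0) / (4.0 / math.pi * math.exp(-math.pi ** 2 * 5.0 / 8.0))
```

For moderate t the ratio is already 1 to machine precision, because the $n = 3$ term is smaller by a factor of order $e^{-\pi^2 t}$.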
40.4 Schrödinger operators

Finally we look at what happens when one adds a potential term, that is, when one considers the operator
$$Lu(x) + q(x)u(x). \qquad (40.7)$$
This is known as the Schrödinger operator, and $q(x)$ is known as the potential. Equations involving the operator in (40.7) are considerably simpler than the Schrödinger equation of quantum mechanics because here all terms are real-valued. If $X_t$ is the diffusion corresponding to L, then solutions to PDEs involving the operator in (40.7) can be expressed in terms of $X_t$ by means of the Feynman–Kac formula. To illustrate, let D be a nice bounded domain, e.g., a ball, q a $C^2$ function on $\overline D$, and f a continuous function on $\partial D$; $q^+$ denotes the positive part of q.

Theorem 40.5 Let D, q, f be as above. Let u be a $C^2$ function on $\overline D$ that agrees with f on $\partial D$ and satisfies $Lu + qu = 0$ in D. If
$$E^x\exp\Big(\int_0^{\tau_D} q^+(X_s)\,ds\Big) < \infty,$$
then
$$u(x) = E^x\Big[f(X_{\tau_D})\,e^{\int_0^{\tau_D} q(X_s)\,ds}\Big]. \qquad (40.8)$$

Proof Let $B_t = \int_0^{t\wedge\tau_D} q(X_s)\,ds$. By Itô’s formula and the product formula,
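$$e^{B_{t\wedge\tau_D}}u(X_{t\wedge\tau_D}) = u(X_0) + \text{martingale} + \int_0^{t\wedge\tau_D} u(X_r)e^{B_r}\,dB_r + \int_0^{t\wedge\tau_D} e^{B_r}Lu(X_r)\,dr.$$
Taking the expectation with respect to $P^x$ and using Proposition 39.3,
$$E^x e^{B_{t\wedge\tau_D}}u(X_{t\wedge\tau_D}) = u(x) + E^x\int_0^{t\wedge\tau_D} e^{B_r}u(X_r)q(X_r)\,dr + E^x\int_0^{t\wedge\tau_D} e^{B_r}Lu(X_r)\,dr.$$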
Since $Lu + qu = 0$,
$$E^x e^{B_{t\wedge\tau_D}}u(X_{t\wedge\tau_D}) = u(x).$$
If we let $t \to \infty$ and use the exponential integrability of $q^+$, the result follows: the random variables $e^{B_{t\wedge\tau_D}}u(X_{t\wedge\tau_D})$ are bounded by $\|u\|_\infty\exp(\int_0^{\tau_D} q^+(X_s)\,ds)$, which is integrable, so dominated convergence applies.

The existence of a solution to $Lu + qu = 0$ in D depends on the finiteness of $E^x\exp(\int_0^{\tau_D} q^+(X_s)\,ds)$, an expression that is sometimes known as the gauge. Even in one dimension with $D = (0,1)$ and q a constant function, the gauge need not be finite. With $x = 1/2$, $P^x(\tau_D > t)$ is asymptotically equal to $ce^{-\pi^2 t/2}$ as $t \to \infty$ by Section 40.3. Hence, for a constant $q > 0$,
$$E^x\exp\Big(\int_0^{\tau_D} q\,ds\Big) = E^x e^{q\tau_D} = 1 + \int_0^\infty qe^{qt}\,P^x(\tau_D > t)\,dt;$$
this is infinite if $q \ge \pi^2/2$.
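For Brownian motion ($L = \frac12\,d^2/dx^2$), $D = (0, 1)$ and a constant potential $0 < q < \pi^2/2$, the gauge is finite by the computation above, and the function $v(x) = \cos(k(x - \frac12))/\cos(k/2)$ with $k = \sqrt{2q}$ satisfies $\frac12 v'' + qv = 0$ in D and $v = 1$ on $\partial D$. Theorem 40.5 with $f \equiv 1$ therefore identifies $v(x)$ with $E^x e^{q\tau_D}$, and $v(1/2) = 1/\cos(k/2)$ blows up exactly as $q \uparrow \pi^2/2$, in agreement with the threshold above. The sketch compares this closed form with a crude Monte Carlo estimate; the step size, path count and function names are illustrative assumptions, and the $q \le 0$ branch (cosh instead of cos) is included only for completeness.

```python
import math
import random

def gauge_exact(q, x=0.5):
    """E^x exp(q * tau_D) for Brownian motion (generator (1/2) d^2/dx^2), D = (0, 1),
    and a constant potential q; returns math.inf when q >= pi^2 / 2."""
    if q >= math.pi ** 2 / 2:
        return math.inf
    if q > 0:
        k = math.sqrt(2.0 * q)
        return math.cos(k * (x - 0.5)) / math.cos(k / 2.0)
    if q < 0:
        k = math.sqrt(-2.0 * q)
        return math.cosh(k * (x - 0.5)) / math.cosh(k / 2.0)
    return 1.0

def gauge_mc(q, x, dt, n_paths, seed=2):
    """Crude Monte Carlo estimate of E^x exp(q * tau_D) from Euler paths with step dt."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        pos, t = x, 0.0
        while 0.0 < pos < 1.0:
            pos += rng.gauss(0.0, sqrt_dt)
            t += dt
        total += math.exp(q * t)
    return total / n_paths

estimate = gauge_mc(1.0, 0.5, 1e-3, 2000)
exact = gauge_exact(1.0)          # equals 1 / cos(sqrt(2) / 2)
```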
Exercises

40.1 This (lengthy) exercise is designed to guide you through a proof that solutions to (40.1) exit bounded sets in finite time, a.s. (1) Suppose
$$X_t = W_t + \int_0^t a_s\,ds,$$
where W is a one-dimensional Brownian motion, and $a_s$ is an adapted process bounded by K. Let $L > K > 0$ and $t_0 > 0$. Show that there exists $\varepsilon > 0$, depending only on L, K, and $t_0$, such that $P(|X_{t_0}| > 3L) > \varepsilon$. (2) Suppose $X_t = M_t + \int_0^t a_s\,ds$, where $a_s$ is as in (1) and M is a continuous martingale with $K^{-1} \le d\langle M\rangle_t/dt \le K$, a.s. Use a time change argument to show that there exist $L, \varepsilon > 0$ such that
$$P\Big(\sup_{s\le 1}|X_s| \le L\Big) \le 1 - \varepsilon.$$
(3) If now X is a solution to (40.1), $a = \sigma\sigma^T$, and L given by (40.2) is uniformly elliptic, show by looking at the first coordinate of X that there exist $L, \varepsilon$ such that
$$P^x\Big(\sup_{s\le 1}|X_s| \le L\Big) \le 1 - \varepsilon, \qquad x \in B(0, L).$$
(4) What you have proved in (3) can be rephrased as saying that if $(X_t, P^x)$ is a strong Markov process that solves (40.1) for every starting point and $\tau = \inf\{t : X_t \notin B(0, L)\}$, then $P^x(\tau > 1) \le 1 - \varepsilon$, where $\varepsilon$ does not depend on x. Now use the strong Markov property (cf. the proof of Proposition 21.2) to show $P^x(\tau > k) \le (1 - \varepsilon)^k$. Conclude that $\tau < \infty$, $P^x$-a.s., for each starting point x.

40.2 Prove (40.6).

40.3 Let D be a ball in $\mathbb{R}^d$ and suppose u is the solution to the Cauchy problem in the domain $D \times [0, t_1]$ as described in Section 40.3. Show that $u(x,t) = E^x f(X_{t\wedge\tau_D})$.

40.4 Suppose f is such that the solution u to
$$u_t(x,t) = Lu(x,t) + q(x)u(x,t), \qquad u(x,0) = f(x),$$
is $C^2$ in x and t, and X is the diffusion associated with L. Prove that
$$u(x,t) = E^x\Big[f(X_t)\,e^{\int_0^t q(X_s)\,ds}\Big].$$
40.5 Suppose $(X_t, P^x)$ is a Brownian motion on $[0, b]$ with reflection at 0 and b. Find a series expansion for $p(t,x,y)$, the transition densities for X. Hint: Imitate the argument for absorbing Brownian motion in Section 40.3, but now use the boundary conditions $u_x(0,t) = u_x(b,t) = 0$.
Notes
See Bass (1997) for more on the connection between probability and PDEs.
41 One-dimensional diffusions
Under very mild regularity conditions, every one-dimensional diffusion arises from first time-changing a one-dimensional Brownian motion and then making a transformation of the state space. We will prove this fact in this chapter.
41.1 Regularity

Throughout this chapter we suppose that we have a continuous process $(X_t, P^x)$ defined on an interval I contained in $\mathbb{R}$. For almost all of the chapter, we suppose for simplicity that the interval is in fact all of $\mathbb{R}$. We further suppose that $(X_t, P^x)$ is a strong Markov process with respect to a right-continuous filtration $\{\mathcal F_t\}$ such that each $\mathcal F_t$ contains all the sets that are $P^x$-null for every x. We call such a process a one-dimensional diffusion. Write
$$T_y = \inf\{t : X_t = y\}, \qquad (41.1)$$
the first time the process X hits the point y. We will also assume that every point can be hit from every other point: for all x, y,
$$P^x(T_y < \infty) = 1. \qquad (41.2)$$
When (41.2) holds, we say the diffusion is regular. For any interval J, define $\tau_J = \inf\{t : X_t \notin J\}$, the first time the process leaves J. When $X_t$ is a Brownian motion, we know (Proposition 3.16) that the distribution of $X_t$ upon exiting $[a,b]$ is
$$P^x(X_{\tau_{[a,b]}} = a) = \frac{b-x}{b-a}, \qquad P^x(X_{\tau_{[a,b]}} = b) = \frac{x-a}{b-a}. \qquad (41.3)$$
We say that a regular diffusion $X_t$ is on natural scale if (41.3) holds for every interval $[a,b]$ and every $x \in (a,b)$. We also say a regular diffusion X defined on an interval I properly contained in $\mathbb{R}$ is on natural scale if (41.3) holds whenever $[a,b] \subset I$ and $x \in (a,b)$.

If $X_t$ is regular, then the process started at x must leave x immediately. That is, if $S = \inf\{t > 0 : X_t \ne x\}$, then $P^x(S = 0) = 1$. To see this, let $\varepsilon > 0$ and $U = \inf\{t : |X_t - x| \ge \varepsilon\}$. By the regularity of X, $E^x e^{-U} > 0$. Observe that $U = S + U\circ\theta_S$, where $\theta_t$ is the shift operator. By the strong Markov property at time S,
$$E^x e^{-U} = E^x\big[e^{-S}E^x[e^{-U}\circ\theta_S \mid \mathcal F_S]\big] = E^x\big[e^{-S}E^{X_S}e^{-U}\big] = E^x[e^{-S}]\,E^x e^{-U},$$
since $X_S = x$ by the continuity of the paths of X. Because $E^x e^{-U} > 0$, the only way this can happen is if $E^x e^{-S} = 1$, which implies $S = 0$, $P^x$-a.s.
41.2 Scale functions

We will show that given a regular diffusion, there exists a scale function that is continuous, strictly increasing, and such that $s(X_t)$ is on natural scale. We first look at a special case, when the diffusion is given as the solution to an SDE. Suppose $X_t$ is given as the solution to
$$dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt, \qquad (41.4)$$
where we assume $\sigma$ and b are real-valued, continuous, and bounded, and $\sigma$ is bounded below by a positive constant. Let $a(x) = \sigma^2(x)$. In this case we can give a formula for the scale function.

Theorem 41.1 The scale function $s(x)$ is the solution to
$$\tfrac12 a(x)s''(x) + b(x)s'(x) = 0,$$
and for some constants $c_1$, $c_2$, and $x_0$ is given by
$$s(x) = c_1 + c_2\int_{x_0}^x \exp\Big(-\int_{x_0}^y \frac{2b(w)}{a(w)}\,dw\Big)\,dy. \qquad (41.5)$$
Proof To solve the differential equation, we write
$$\frac{s''(x)}{s'(x)} = -\frac{2b(x)}{a(x)},$$
or $(\log s'(x))' = -2b(x)/a(x)$, from which (41.5) follows. Since we assumed that $\sigma$ and b are continuous, $s(x)$ given by (41.5) is $C^2$. Since $\sigma$ is bounded below by a positive constant and b and $\sigma$ are bounded, s given by (41.5) with $c_2 > 0$ is strictly increasing. Applying Itô’s formula,
$$s(X_t) - s(X_0) = \int_0^t s'(X_r)\sigma(X_r)\,dW_r \qquad (41.6)$$
because
$$\int_0^t \big[\tfrac12 s''(X_r)\sigma(X_r)^2 + s'(X_r)b(X_r)\big]\,dr = 0.$$
This implies that $s(X_t) - s(X_0)$ is a continuous local martingale, hence a time change of Brownian motion. Therefore the exit probabilities of $s(X_t)$ for an interval $[a,b]$ are the same as those of a Brownian motion, namely, those given by (41.3). From (41.6), if $Y_t = s(X_t)$, then
$$dY_t = (s'\sigma)(s^{-1}(Y_t))\,dW_t. \qquad (41.7)$$
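Formula (41.5) is easy to evaluate numerically. The sketch below takes $c_1 = 0$ and $c_2 = 1$ and applies the trapezoid rule to both integrals; the function name, the default $x_0 = 0$ and the grid size are illustrative choices. For Brownian motion with constant drift $\beta$ ($\sigma \equiv 1$, $b \equiv \beta$) it should reproduce $s(x) = (1 - e^{-2\beta x})/(2\beta)$.

```python
import math

def scale_function(b_fn, a_fn, x, x0=0.0, n=2000):
    """Evaluate (41.5) with c1 = 0 and c2 = 1:
    s(x) = int_{x0}^x exp(-int_{x0}^y 2 b(w) / a(w) dw) dy,
    using the composite trapezoid rule on n subintervals (x < x0 is allowed)."""
    h = (x - x0) / n
    inner = 0.0                      # running value of int_{x0}^y 2b/a
    outer = 0.0                      # running value of s(y)
    prev_ratio = 2.0 * b_fn(x0) / a_fn(x0)
    prev_sprime = 1.0                # s'(x0) = exp(0)
    for k in range(1, n + 1):
        y = x0 + k * h
        ratio = 2.0 * b_fn(y) / a_fn(y)
        inner += 0.5 * h * (prev_ratio + ratio)
        sprime = math.exp(-inner)
        outer += 0.5 * h * (prev_sprime + sprime)
        prev_ratio, prev_sprime = ratio, sprime
    return outer

beta = 0.7                                    # constant drift; sigma = 1 so a = 1
numeric = scale_function(lambda w: beta, lambda w: 1.0, 1.5)
closed_form = (1.0 - math.exp(-2.0 * beta * 1.5)) / (2.0 * beta)
```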
Now we show there exists a scale function for general regular diffusions on $\mathbb{R}$. Let J be an interval $[a,b]$. We define
$$p(x) = p_J(x) = P^x(X_{\tau_J} = b). \qquad (41.8)$$

Proposition 41.2 Let $J = [a,b]$ be a finite interval. Then $p(X_{t\wedge\tau_J})$ is a regular diffusion on $[0,1]$ on natural scale.
Proof First we show that p is increasing. To get to the point b starting from x, the process must first hit every point between x and b because X has continuous paths. If $a < x < y < b$, by the strong Markov property at time $T_y$, $p(x) \le p(y)$. We claim there is a positive probability that the process starting from x hits a before y, that is,
$$P^x(T_a < T_y) > 0. \qquad (41.9)$$
If (41.9) did not hold, then the process started at x must hit y before hitting a, then by the continuity of paths must hit x before hitting a, and once the process is again at x, it again hits y with probability one before a, and so on. Therefore the process never hits a, a contradiction to the regularity; Exercise 41.2 asks you to make this argument precise. Therefore (41.9) does hold, and by the strong Markov property at $T_y$, $p(x) = P^x(T_y < T_a)\,p(y)$. Since $P^x(T_y < T_a) = 1 - P^x(T_a < T_y)$ is strictly less than 1, p is strictly increasing.

Next we show that p is continuous. We show continuity from the right; the proof of continuity from the left is similar. Suppose $x_n \downarrow x$. The process $X_t$ has continuous paths, so given $\varepsilon$ we can find t small enough so that $P^x(T_a < t) < \varepsilon$. By the Blumenthal 0–1 law (Proposition 20.8), $P^x(T_{(x,b]} = 0)$ is zero or one, where $T_{(x,b]}$ is the first time the process hits the interval $(x,b]$. If it is zero, the process immediately moves to the left from x, a.s., and by the strong Markov property at $T_x$, it never hits b, a contradiction. The probability must therefore be one. Thus by the continuity of paths, for n large enough, $P^x(T_{x_n} < t) \ge 1 - \varepsilon$. Hence with probability at least $1 - 2\varepsilon$, $X_t$ hits $x_n$ before a. Since
$$p(x) = P^x(T_{x_n} < T_a)\,p(x_n) \ge (1 - 2\varepsilon)\,p(x_n)$$
and $\varepsilon$ is arbitrary, we see that $p(x) \ge \liminf_{n\to\infty} p(x_n)$. Since p is strictly increasing, $p(x_n)$ decreases to a limit that is at least $p(x)$, and therefore $p(x) = \lim p(x_n)$.

Finally, we show $p(X_t)$ is on natural scale. Let $[e,f] \subset (0,1)$ and let
$$r(y) = P^y\big(X_t \text{ hits } p^{-1}(f) \text{ before hitting } p^{-1}(e)\big).$$
Note that
$$P^x\big(p(X_t) \text{ hits } f \text{ before } e\big) = P^{p^{-1}(x)}\big(X_t \text{ hits } p^{-1}(f) \text{ before } p^{-1}(e)\big) = r(p^{-1}(x)). \qquad (41.10)$$
For $y \in [p^{-1}(e), p^{-1}(f)]$, the strong Markov property tells us that
$$p(y) = P^y\big(X_t \text{ hits } p^{-1}(f) \text{ before } p^{-1}(e)\big)\,p(p^{-1}(f)) + P^y\big(X_t \text{ hits } p^{-1}(e) \text{ before } p^{-1}(f)\big)\,p(p^{-1}(e)) = r(y)f + (1 - r(y))e. \qquad (41.11)$$
Solving for $r(y)$, we obtain $r(y) = (p(y) - e)/(f - e)$. Substituting in (41.10),
$$P^x\big(p(X_t) \text{ hits } f \text{ before } e\big) = r(p^{-1}(x)) = \frac{p(p^{-1}(x)) - e}{f - e} = \frac{x - e}{f - e},$$
which is the formula we wanted.
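Proposition 41.2 can be illustrated for Brownian motion with drift $\beta$ ($\sigma \equiv 1$, $b \equiv \beta$ in (41.4)). There the scale function of Theorem 41.1 gives $p_J(x) = (s(x) - s(a))/(s(b) - s(a))$ with $s(x) = (1 - e^{-2\beta x})/(2\beta)$, a continuous, strictly increasing function of x, as the proof shows in general. The sketch below compares this with a Monte Carlo estimate of $P^x(X_{\tau_J} = b)$ from crude Euler paths; the step size, path count and function names are illustrative.

```python
import math
import random

def hit_b_first_mc(beta, x, a, b, dt, n_paths, seed=3):
    """Estimate p_J(x) = P^x(X_{tau_J} = b), J = [a, b], for dX = dW + beta dt,
    using Euler paths with step dt."""
    rng = random.Random(seed)
    s = math.sqrt(dt)
    hits = 0
    for _ in range(n_paths):
        pos = x
        while a < pos < b:
            pos += rng.gauss(0.0, s) + beta * dt
        hits += pos >= b                 # counts 1 if the path left through b
    return hits / n_paths

def p_exact(beta, x, a, b):
    """The same probability from the scale function s(x) = (1 - e^{-2 beta x}) / (2 beta)."""
    s = lambda y: (1.0 - math.exp(-2.0 * beta * y)) / (2.0 * beta)
    return (s(x) - s(a)) / (s(b) - s(a))

estimate = hit_b_first_mc(1.0, 0.5, 0.0, 1.0, 1e-3, 2000)
exact = p_exact(1.0, 0.5, 0.0, 1.0)      # (1 - e^{-1}) / (1 - e^{-2}), about 0.731
```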
Note that if $X_t$ is on natural scale, then so is $c_1X_t + c_2$ for any constants $c_1 > 0$, $c_2 \in \mathbb{R}$.

Theorem 41.3 There exists a continuous strictly increasing function s such that $s(X_t)$ is on natural scale on $s(\mathbb{R})$.

Proof Let $J_n$ be closed intervals increasing up to $\mathbb{R}$. Pick two points in $J_1$; label them a and b with $a < b$. Choose $A_n$ and $B_n$ so that if $s_n(x) = A_n p_{J_n}(x) + B_n$, then $s_n(a) = 0$ and $s_n(b) = 1$. We will show that if $n \ge m$, then $s_n = s_m$ on $J_m$. Once we have that, we can set $s(x) = s_n(x)$ on $J_n$, and the theorem will be proved.

Suppose $J_m = [e,f]$. By Proposition 41.2, both $s_m(X_t)$ and $s_n(X_t)$ are on natural scale. For all $x \in J_m$,
$$\frac{s_m(x) - s_m(e)}{s_m(f) - s_m(e)} = P^{s_m(x)}\big(s_m(X_t) \text{ hits } s_m(f) \text{ before } s_m(e)\big) = P^x(X_t \text{ hits } f \text{ before } e).$$
We have a similar equation with $s_m$ replaced everywhere by $s_n$. It follows that
$$\frac{s_m(x) - s_m(e)}{s_m(f) - s_m(e)} = \frac{s_n(x) - s_n(e)}{s_n(f) - s_n(e)}$$
for all x, which implies that $s_n(x) = Cs_m(x) + D$ for some constants C and D. Since $s_n$ and $s_m$ are equal at both $x = a$ and $x = b$, C must be 1 and D must be 0.
41.3 Speed measures

Suppose that $(X_t, P^x)$ is a regular diffusion on $\mathbb{R}$ on natural scale. For each finite interval $(a,b)$, define
$$G_{ab}(x,y) = \begin{cases}\dfrac{2(x-a)(b-y)}{b-a}, & a < x \le y < b,\\[1ex] \dfrac{2(y-a)(b-x)}{b-a}, & a < y \le x < b,\end{cases} \qquad (41.12)$$
and set $G_{ab}(x,y) = 0$ if x or y is not in $(a,b)$. A measure $m(dx)$ is the speed measure for the diffusion $(X_t, P^x)$ if
$$E^x\tau_{(a,b)} = \int G_{ab}(x,y)\,m(dy) \qquad (41.13)$$
for each finite interval $(a,b)$ and each $x \in (a,b)$. As (41.13) indicates, the speed measure governs how quickly the diffusion moves through intervals.

As an example, let us argue that the speed measure for Brownian motion is Lebesgue measure. By Proposition 3.16, if $(X_t, P^x)$ is a Brownian motion,
$$E^x\tau_{(a,b)} = (x-a)(b-x).$$
On the other hand, a calculation shows that $\int G_{ab}(x,y)\,dy = (x-a)(b-x)$.
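The calculation splits the integral at $y = x$ and uses the two branches of (41.12):

```latex
\int_a^b G_{ab}(x,y)\,dy
  = \frac{2(b-x)}{b-a}\int_a^x (y-a)\,dy + \frac{2(x-a)}{b-a}\int_x^b (b-y)\,dy
  = \frac{(b-x)(x-a)^2 + (x-a)(b-x)^2}{b-a}
  = (x-a)(b-x).
```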
Since $E^x\tau_{(a,b)} = \int G_{ab}(x,y)\,dy$ and Brownian motion is on natural scale, we see that the speed measure $m(dy)$ of Brownian motion is equal to Lebesgue measure.

We will show that (1) a regular diffusion on natural scale has one and only one speed measure, (2) the law of the diffusion is determined by the speed measure, and (3) there exists a diffusion with a given speed measure. We first want to show that any speed measure must satisfy $0 < m(a,b) < \infty$ for any finite interval $[a,b]$ with $a < b$. To start we have the following lemma.

Lemma 41.4 If $[a,b]$ is a finite interval, then $\sup_x E^x\tau_{(a,b)}^k < \infty$ for each positive integer k.
Proof Pick $y \in (a,b)$. Since $X_t$ is a regular diffusion, $P^y(T_a < \infty) = 1$, and hence there exists $t_0$ such that $P^y(T_a > t_0) \le 1/2$. Similarly, taking $t_0$ larger if necessary, $P^y(T_b > t_0) \le 1/2$. If $a < x \le y$, then
$$P^x(\tau_{(a,b)} > t_0) \le P^x(T_a > t_0) \le P^y(T_a > t_0) \le 1/2,$$
and similarly, $P^x(\tau_{(a,b)} > t_0) \le 1/2$ if $y \le x < b$. By the Markov property,
$$P^x(\tau_{(a,b)} > (n+1)t_0) = E^x\big[P^{X_{nt_0}}(\tau_{(a,b)} > t_0);\ \tau_{(a,b)} > nt_0\big] \le \tfrac12 P^x(\tau_{(a,b)} > nt_0),$$
and by induction, $P^x(\tau_{(a,b)} > nt_0) \le 2^{-n}$. The lemma is now immediate, since $E^x\tau_{(a,b)}^k \le \sum_{n=0}^\infty ((n+1)t_0)^k\,2^{-n}$ for every x.

Lemma 41.5 If $(X_t, P^x)$ has a speed measure m and $[a,b]$ is a non-empty finite interval, then $0 < m(a,b) < \infty$.

Proof If $m(a,b) = 0$, then for $x \in (a,b)$, we have
$$E^x\tau_{(a,b)} = \int G_{ab}(x,y)\,m(dy) = 0,$$
which implies $\tau_{(a,b)} = 0$, $P^x$-a.s., a contradiction to the continuity of the paths of $X_t$. Next we show the finiteness of $m(a,b)$. Pick $(e,f)$ such that $[a,b] \subset (e,f)$ and fix $x \in (a,b)$. There exists a constant $c > 0$ such that for $x, y \in (a,b)$, $G_{ef}(x,y)$ is bounded below by c, so
$$m(a,b) \le c^{-1}\int_e^f G_{ef}(x,y)\,m(dy) = c^{-1}E^x\tau_{(e,f)} < \infty.$$
This completes the proof.

Theorem 41.6 A regular diffusion on natural scale on $\mathbb{R}$ has one and only one speed measure.

Proof First let $I = (e,f)$ be a finite open interval. For $n > 1$ let $x_i = e + i(f-e)/2^n$, $i = 0, 1, 2, \ldots, 2^n$. Let $D_n = \{x_i : 0 \le i \le 2^n\}$. Let
$$m_n(dx) = 2^n\sum_{i=1}^{2^n-1} B(x_i)\,\delta_{x_i}(dx), \qquad (41.14)$$
where $B(x_i) = E^{x_i}\tau_{(x_{i-1},x_{i+1})}$. We first want to show that if $[a,b]$ is a subinterval of I with a, b each in $D_n$ and x is also in $D_n$, then
$$E^x\tau_{(a,b)} = \int G_{ab}(x,y)\,m_n(dy). \qquad (41.15)$$
To see this, let $S_0 = 0$ and $S_{j+1} = \inf\{t > S_j : |X_t - X_{S_j}| = 2^{-n}\} \wedge \tau_{(a,b)}$. The $S_j$'s are the successive times that X moves $2^{-n}$, up until the time of leaving $(a,b)$. Because X is on natural scale, $X_{S_{j+1}}$ is equal to $X_{S_j} + 2^{-n}$ with probability $\frac12$ and equal to $X_{S_j} - 2^{-n}$ with probability $\frac12$, until leaving $(a,b)$. Therefore $X_{S_j}$ is a simple symmetric random walk on the lattice with step size $2^{-n}$, stopped on leaving $(a,b)$. Let $J(x_i) = (x_i - 2^{-n}, x_i + 2^{-n})$ for $x_i \ne a, b$, and let $J(a) = J(b) = \emptyset$. By repeated use of the strong Markov property,
$$E^x\tau_{(a,b)} = \sum_{j=0}^\infty E^x(S_{j+1} - S_j) = E^x\sum_{j=0}^\infty E^{X_{S_j}}\big[\tau_{J(X_0)}\big] = E^x\sum_{j=0}^\infty B(X_{S_j})1_{(a,b)}(X_{S_j}).$$
Let $N_i = \sum_{j=0}^\infty 1_{\{x_i\}}(X_{S_j})$, the number of visits to $x_i$ before exiting $(a,b)$. Then
$$E^x\tau_{(a,b)} = E^x\sum_{j=0}^\infty B(X_{S_j})1_{(a,b)}(X_{S_j}) = E^x\sum_{j=0}^\infty\sum_{i=1}^{2^n-1} B(X_{S_j})1_{(a,b)}(X_{S_j})1_{\{x_i\}}(X_{S_j}) = E^x\sum_{i=1}^{2^n-1} B(x_i)1_{(a,b)}(x_i)N_i. \qquad (41.16)$$
For $x_i \in (a,b)$, the function $x \mapsto E^xN_i$ must equal 0 when $x = a$ or $x = b$ and satisfies the equation
$$E^{x_j}N_i = \delta_{ij} + \tfrac12\big(E^{x_{j+1}}N_i + E^{x_{j-1}}N_i\big), \qquad (41.17)$$
where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise. This holds because for $j \ne i$, the process goes left or right, each with probability 1/2, while if $j = i$, we add one to $N_i$ before going left or right. The function $x \mapsto E^xN_i$ is hence piecewise linear on $(a, x_i)$ and on $(x_i, b)$. Some algebra shows that we must have
$$E^xN_i = 2^nG_{ab}(x, x_i). \qquad (41.18)$$
Combining (41.16) and (41.18), and using that $G_{ab}(x, x_i) = 0$ when $x_i \notin (a,b)$,
$$E^x\tau_{(a,b)} = \sum_{i=1}^{2^n-1} B(x_i)\,2^nG_{ab}(x, x_i) = \int G_{ab}(x,y)\,m_n(dy),$$
which is (41.15).
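The "some algebra" behind (41.18) can be checked exactly for small n. The sketch below is an illustration under the simplifying assumption $(a, b) = (0, 1)$ with lattice $x_j = j/2^n$: it solves the linear system (41.17), with boundary values 0 at a and b, in exact rational arithmetic by tridiagonal elimination (the Thomas algorithm) and compares the solution with $2^n G_{ab}(x_j, x_i)$. The function names are hypothetical helpers, not from the text.

```python
from fractions import Fraction

def expected_visits(n, i):
    """Solve (41.17) exactly: v[j] = E^{x_j} N_i for the simple symmetric random walk
    on x_j = j / 2**n, j = 0..2**n, killed at x_0 = 0 and x_{2**n} = 1."""
    N = 2 ** n
    m = N - 1                                    # unknowns v[1], ..., v[N-1]
    # Tridiagonal system: -1/2 v[j-1] + v[j] - 1/2 v[j+1] = delta_{ij}
    lower = [Fraction(-1, 2)] * m
    diag = [Fraction(1)] * m
    upper = [Fraction(-1, 2)] * m
    rhs = [Fraction(1) if j + 1 == i else Fraction(0) for j in range(m)]
    for k in range(1, m):                        # forward elimination
        w = lower[k] / diag[k - 1]
        diag[k] -= w * upper[k - 1]
        rhs[k] -= w * rhs[k - 1]
    v = [Fraction(0)] * m
    v[m - 1] = rhs[m - 1] / diag[m - 1]
    for k in range(m - 2, -1, -1):               # back substitution
        v[k] = (rhs[k] - upper[k] * v[k + 1]) / diag[k]
    return [Fraction(0)] + v + [Fraction(0)]     # add the killed endpoints

def green(a, b, x, y):
    """G_ab(x, y) from (41.12)."""
    if not (a < x < b and a < y < b):
        return Fraction(0)
    if x <= y:
        return 2 * (x - a) * (b - y) / (b - a)
    return 2 * (y - a) * (b - x) / (b - a)

n = 3
for i in range(1, 2 ** n):
    v = expected_visits(n, i)
    for j in range(2 ** n + 1):
        x, y = Fraction(j, 2 ** n), Fraction(i, 2 ** n)
        assert v[j] == 2 ** n * green(Fraction(0), Fraction(1), x, y)   # (41.18)
```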
Using (41.15) and the same proof as that of Lemma 41.5, $m_n(a,b)$ is bounded above by a constant independent of n. By a diagonalization procedure, there exists a subsequence $n_k$ such that $m_{n_k}$ converges weakly to m, where m is a measure that is finite on every subinterval $(a,b)$ such that $[a,b] \subset I$. By the continuity of $G_{ab}$,
$$E^x\tau_{(a,b)} = \int G_{ab}(x,y)\,m(dy) \qquad (41.19)$$
whenever a, b, and x are in $D_n$ for some n. We now remove this last restriction. If a, b are not of this form, take $a_r$, $b_r$ in $\cup_n D_n$ such that $(a_r, b_r) \uparrow (a,b)$. Then $\tau_{(a_r,b_r)} \uparrow \tau_{(a,b)}$, and by the continuity of $G_{ab}$ in a, b, x, and y, we have (41.19) for all a and b. Take $y_r \uparrow x$, $z_r \downarrow x$ such that $y_r$ and $z_r$ are in $D_n$ for some n. By the strong Markov property,
$$E^x\tau_{(a,b)} = E^x\tau_{(y_r,z_r)} + E^{y_r}\tau_{(a,b)}\,P^x\big(X_{\tau_{(y_r,z_r)}} = y_r\big) + E^{z_r}\tau_{(a,b)}\,P^x\big(X_{\tau_{(y_r,z_r)}} = z_r\big).$$
By the continuity of $G_{ab}$ in x, and the fact that $E^x\tau_{(y_r,z_r)} \to 0$ as $r \to \infty$, we obtain (41.19) for all x. We leave the uniqueness as Exercise 41.3.

Finally, let $I_k$ be finite subintervals increasing up to $\mathbb{R}$. Let $m_k$ be the speed measure for $X_t$ on the interval $I_k$. By the uniqueness result, $m_k$ agrees with $m_j$ on $I_j$ whenever $I_j \subset I_k$. Setting m to be the measure whose restriction to $I_k$ is $m_k$ gives us the speed measure.

The speed measure completely characterizes occupation times.

Corollary 41.7 Suppose $X_t$ is a diffusion on natural scale on $\mathbb{R}$. If f is bounded and measurable, then for each $a < b$,
$$E^x\int_0^{\tau_{(a,b)}} f(X_s)\,ds = \int G_{ab}(x,y)f(y)\,m(dy). \qquad (41.20)$$
Proof Suppose that f is continuous and bounded on $[a,b]$. Let $x_i$, $S_j$, $B(x_i)$, $N_i$, and $m_n$ be as in the proof of Theorem 41.6. Let $\varepsilon_n = \sup\{|f(x) - f(y)| : |x - y| \le 2^{-n}\}$. Note that if $(x-a)/(b-a)$ is a multiple of $2^{-n}$,
$$E^x\int_0^{\tau_{(a,b)}} f(X_s)\,ds = E^x\sum_{j=0}^\infty\int_{S_j}^{S_{j+1}} f(X_s)\,ds \qquad (41.21)$$
and
$$E^x\sum_{j=0}^\infty f(X_{S_j})(S_{j+1} - S_j) = E^x\sum_{j=0}^\infty f(X_{S_j})1_{(a,b)}(X_{S_j})\,E^{X_{S_j}}S_1 = \sum_{i=1}^{2^n-1} f(x_i)B(x_i)1_{(a,b)}(x_i)\,E^xN_i. \qquad (41.22)$$
Moreover, the right-hand side of (41.21) differs from the left-hand side of (41.22) by at most $\varepsilon_n E^x\tau_{(a,b)}$. By (41.18) the right-hand side of (41.22) is equal to
$$\sum_{i=1}^{2^n-1} 2^n f(x_i)B(x_i)1_{(a,b)}(x_i)\,G_{ab}(x, x_i) = \int G_{ab}(x,y)f(y)\,m_n(dy).$$
By weak convergence along an appropriate subsequence, the left-hand side and the right-hand side of (41.20) differ by at most $\limsup_n \varepsilon_n E^x\tau_{(a,b)}$, which is zero because f is uniformly continuous on $[a,b]$. A limit argument then shows that (41.20) holds for all $x \in [a,b]$, and another limit argument shows that (41.20) holds for all bounded measurable f.
41.4 The uniqueness theorem

We next turn to showing that the speed measure characterizes the law of a diffusion.

Theorem 41.8 If $(X_t, P^x_i)$, $i = 1, 2$, are two diffusions on natural scale with the same speed measure m, then $P^x_1 = P^x_2$.

Proof We start by letting $(a,b) \subset \mathbb{R}$ be a finite interval and considering the operator
$$R^\lambda_i f(x) = E^x_i\int_0^{\tau_{(a,b)}} e^{-\lambda t}f(X_t)\,dt, \qquad \lambda \ge 0, \qquad (41.23)$$
for $i = 1, 2$. We show first that $R^0_1 = R^0_2$, that is, that
$$E^x_1\int_0^{\tau_{(a,b)}} f(X_t)\,dt = E^x_2\int_0^{\tau_{(a,b)}} f(X_t)\,dt$$
if f is bounded and Borel measurable. This is easy, because by Corollary 41.7, both sides are equal to $\int_a^b G_{ab}(x,y)f(y)\,m(dy)$.

Since $(\widehat X_t, P^x_i)$ is a Markov process, where $\widehat X$ is the process X killed on exiting $(a,b)$, the resolvent equation (37.2) holds. We have
$$\|R^0_i f\|_\infty \le \|f\|_\infty\sup_x E^x_i\tau_{(a,b)} = \|f\|_\infty\sup_x\int G_{ab}(x,y)\,m(dy) \le c\,\|f\|_\infty\,m(a,b) < \infty.$$
Since $\|R^0_i\| < \infty$, we can let $\mu$ go to zero in (37.2). We can repeat the proof of Corollary 37.3 with $\lambda = 0$ to see that
$$R^\mu_i f = R^0_i f + \sum_{j=1}^\infty (-\mu)^j (R^0_i)^{j+1} f$$
provided $\mu < 1/\|R^0_i\|$. We can then use Remark 37.4 to obtain that $R^\lambda_1 = R^\lambda_2$ for all $\lambda > 0$. We now take open intervals $I_n$ increasing up to $\mathbb{R}$. Applying the above to $I_n$ and letting $n \to \infty$,
we have
$$E^x_1\int_0^\infty e^{-\lambda t}f(X_t)\,dt = E^x_2\int_0^\infty e^{-\lambda t}f(X_t)\,dt$$
whenever f is bounded and Borel measurable and $x \in \mathbb{R}$. Suppose f is continuous as well. By the uniqueness of the Laplace transform, we see that $E^x_1 f(X_t) = E^x_2 f(X_t)$ for almost every t, and since both terms are continuous in t (by the continuity of paths and bounded convergence), this equality holds for all t. By a limit argument, this equality holds for all bounded and Borel measurable f. Therefore the one-dimensional distributions of X under $P^x_1$ and $P^x_2$ agree. If $s < t$ and f and g are bounded and Borel measurable,
$$\begin{aligned} E^x_1[f(X_s)g(X_t)] &= E^x_1[f(X_s)P^1_{t-s}g(X_s)] = E^x_1[f(X_s)P^2_{t-s}g(X_s)]\\ &= E^x_1[(fP^2_{t-s}g)(X_s)] = E^x_2[(fP^2_{t-s}g)(X_s)]\\ &= E^x_2[f(X_s)P^2_{t-s}g(X_s)] = E^x_2[f(X_s)g(X_t)].\end{aligned}$$
Here $P^i_{t-s}$ is the semigroup for $(X_t, P^x_i)$; since the one-dimensional distributions agree, $P^1_{t-s} = P^2_{t-s}$. We have thus shown that the two-dimensional distributions of X under $P^x_1$ and $P^x_2$ agree. Continuing, we see that all the finite-dimensional distributions under $P^x_1$ and $P^x_2$ agree. By the continuity of the paths of X and Theorem 2.6, that is enough to show equality of $P^x_1$ and $P^x_2$.
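The series step in the proof is the Neumann expansion of $R^\mu = (I + \mu R^0)^{-1}R^0$. It can be seen concretely in a finite-state toy model, which is only an analogy for the killed diffusion $\widehat X$: a continuous-time simple random walk on $\{1, \ldots, m\}$ that jumps at rate 1 and is killed when it steps to 0 or $m + 1$, with generator Q and resolvent $R^\lambda = (\lambda I - Q)^{-1}$. Because $R^0$ has nonnegative entries, $\|R^0\|$ (the maximum row sum) is the largest expected exit time, $\max_j j(m + 1 - j)$. The sketch checks that the truncated series agrees with a direct solve when $\mu\|R^0\| < 1$; the helpers `solve`, `killed_walk_generator`, `resolvent` and `neumann_resolvent` are hypothetical names for this illustration.

```python
def solve(A, y):
    """Solve A v = y by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v

def killed_walk_generator(m):
    """Generator Q of a rate-1 simple random walk on {1, ..., m}, killed on leaving."""
    Q = [[0.0] * m for _ in range(m)]
    for i in range(m):
        Q[i][i] = -1.0
        if i > 0:
            Q[i][i - 1] = 0.5
        if i < m - 1:
            Q[i][i + 1] = 0.5
    return Q

def resolvent(Q, lam, f):
    """R^lam f = (lam I - Q)^{-1} f."""
    m = len(Q)
    A = [[(lam if i == j else 0.0) - Q[i][j] for j in range(m)] for i in range(m)]
    return solve(A, f)

def neumann_resolvent(Q, mu, f, n_terms):
    """Truncation of R^mu f = sum_{j >= 0} (-mu)^j (R^0)^{j+1} f."""
    term = resolvent(Q, 0.0, f)                  # (R^0) f
    total = list(term)
    for _ in range(n_terms):
        term = [-mu * t for t in resolvent(Q, 0.0, term)]
        total = [s + t for s, t in zip(total, term)]
    return total

Q = killed_walk_generator(5)
exit_times = resolvent(Q, 0.0, [1.0] * 5)        # E^j tau = j (6 - j): 5, 8, 9, 8, 5
f = [1.0, -2.0, 0.5, 3.0, 1.0]
mu = 0.05                                        # mu * ||R^0|| = 0.05 * 9 = 0.45 < 1
direct = resolvent(Q, mu, f)
series = neumann_resolvent(Q, mu, f, 60)
```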
41.5 Time change

We now want to show that if m is a measure such that $0 < m(a,b) < \infty$ for all finite intervals $(a,b)$ with $a < b$, then there exists a regular diffusion on natural scale on $\mathbb{R}$ having m as a speed measure. If $m(dx)$ had a density, say $m(dx) = r(x)\,dx$, we would proceed as follows. Let $W_t$ be a one-dimensional Brownian motion and let
$$A_t = \int_0^t r(W_s)\,ds, \qquad B_t = \inf\{u : A_u > t\}, \qquad X_t = W_{B_t}.$$
In other words, we let $X_t$ be a certain time change of Brownian motion. In general, where $m(dx)$ does not have a density, we make use of the local times $L^x_t$ of Brownian motion; see Chapter 14. Let
$$A_t = \int L^x_t\,m(dx), \qquad B_t = \inf\{u : A_u > t\}, \qquad X_t = W_{B_t}. \qquad (41.24)$$
Theorem 41.9 Let $(W_t, P^x)$ be a Brownian motion and m a measure on $\mathbb{R}$ such that $0 < m(a,b) < \infty$ for every finite interval $(a,b)$. Then, under $P^x$, $X_t$ as defined by (41.24) is a regular diffusion on natural scale with speed measure m.

Proof First we show that $X_t$ is a continuous process. Fix $\omega$. If we choose $a < \inf_{s\le t}W_s$ and $b > \sup_{s\le t}W_s$, then
$$A_t = \int L^x_t\,m(dx) = \int L^x_t\,1_{[a,b]}(x)\,m(dx),$$
since $L^x_t$ increases only for those times s when $W_s = x$. By the continuity of $L^x_t$ and dominated convergence, we conclude that $A_t(\omega)$ is continuous at time t. Next we show that $A_t$ is strictly increasing. Fix $\omega$. If $s < u$, pick $t \in (s,u)$. Set $x = W_t$. Because the support of the measure
$dL^x_t$ is the set $\{r : W_r = x\}$, we have $L^x_u - L^x_s > 0$. By the continuity of local times, $L^y_u - L^y_s > 0$ for all y in a neighborhood of x, say $(x - \delta, x + \delta)$. Since $m(x - \delta, x + \delta) > 0$, then $A_u - A_s > 0$. Hence $A_t$ is strictly increasing. This and the continuity of $A_t$ imply that $B_t$ is continuous, and therefore $X_t$ is continuous.

Next we show that $X_t$ is a regular diffusion on natural scale. By monotone convergence and the fact that $L^x_t \to \infty$, a.s., for each x, $A_t \uparrow \infty$, hence $B_t \uparrow \infty$, so $\tau^X_{(a,b)} < \infty$, $P^x$-a.s., where $\tau^X_{(a,b)}$ denotes the exit time of $(a,b)$ by $X_t$ and $\tau^W_{(a,b)}$ denotes the corresponding exit time of $W_t$. Moreover,
$$P^x\big(X(\tau^X_{(a,b)}) = b\big) = P^x\big(W(\tau^W_{(a,b)}) = b\big) = \frac{x-a}{b-a},$$
since $X_t$ is a time change of $W_t$. To verify the strong Markov property, we repeat the argument of Section 22.3. Let $\widetilde{\mathcal F}_t = \mathcal F_{B_t}$. Then if T is a stopping time for $\widetilde{\mathcal F}_t$, we have
$$E^x[f(X_{T+t}) \mid \widetilde{\mathcal F}_T] = E^x[f(W(B_{T+t})) \mid \mathcal F_{B_T}].$$
E x [ f (XT +t ) | FT ] = E x [ f (W (BT +t )) | FBT ]. BT can be seen to be a stopping time for Ft and BT +t = Bt ◦ θBT where θt are the shift operators, so this is
E x E W (BT ) f (WBt ) = E x E XT f (Xt ). As in Section 20.3, this suffices to show that Xt is a strong Markov process. X It remains to determine the speed measure of Xt . Fix (a, b) and write τX for τ(a,b) and τW W for τ(a,b) . We have ∞ E x τX = E x 1(a,b) (Xs∧τX ) ds 0 ∞ 1(a,b) (WBs∧τX ) ds = Ex 0 ∞ 1(a,b) (Wt∧τW ) dAt = Ex 0 ∞ = Ex 1(a,b) (Wt∧τW ) Lty m(dy) 0 τW x =E Lty m(dy) = E x LyτW m(dy). 0
We also have
E x LyτW = E x |WτW − y| − |x − y| by (14.5). This is equal to |a − y|Px (WτW = a) + |b − y|Px (WτW = b) − |x − y| b−x x−a = |a − y| + |b − y| − |x − y| = Gab (x, y). b−a b−a We thus have
E x τX = as required.
Gab (x, y) m(dy),
As a corollary to the proof, we see that a regular diffusion on natural scale is a local martingale, since it is a time change of Brownian motion.
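In the density case $m(dx) = r(x)\,dx$, the construction can be checked by simulation. By construction, the exit time of X from $(a,b)$ is $A_{\tau^W} = \int_0^{\tau^W} r(W_s)\,ds$, so its mean can be estimated from Euler paths of W alone and compared with $\int G_{ab}(x,y)\,r(y)\,dy$, which Theorem 41.9 identifies as $E^x\tau^X$. The sketch below uses the hypothetical density $r(y) = 1 + y^2$, $(a,b) = (0,1)$ and $x = 1/2$, where the exact value is $31/96$; the step size, path count and function names are illustrative.

```python
import math
import random

def green(a, b, x, y):
    """G_ab(x, y) from (41.12)."""
    if not (a < x < b and a < y < b):
        return 0.0
    if x <= y:
        return 2.0 * (x - a) * (b - y) / (b - a)
    return 2.0 * (y - a) * (b - x) / (b - a)

def green_integral(r, a, b, x, n=4000):
    """Trapezoid approximation of int_a^b G_ab(x, y) r(y) dy."""
    h = (b - a) / n
    vals = [green(a, b, x, a + k * h) * r(a + k * h) for k in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def exit_time_mc(r, a, b, x, dt, n_paths, seed=4):
    """Estimate E^x tau^X for X_t = W_{B_t}: the exit time of X from (a, b) equals
    A_{tau^W} = int_0^{tau^W} r(W_s) ds, accumulated along Euler paths of W."""
    rng = random.Random(seed)
    s = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        w, acc = x, 0.0
        while a < w < b:
            acc += r(w) * dt
            w += rng.gauss(0.0, s)
        total += acc
    return total / n_paths

r = lambda y: 1.0 + y * y              # hypothetical speed density r(y)
exact = green_integral(r, 0.0, 1.0, 0.5)
estimate = exit_time_mc(r, 0.0, 1.0, 0.5, 1e-3, 2000)
```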
41.6 Examples

Let us calculate the scale function and the speed measure for some examples of diffusions. First we need to connect the speed measure with the coefficients of an SDE. Let us look at the solutions to the SDE (41.4), but now suppose b is identically zero, so that $dX_t = \sigma(X_t)\,dW_t$. We again set $a(x) = \sigma(x)^2$.

Theorem 41.10 Suppose $0 < c_1 < \sigma(x) < c_2$ for all x and $\sigma$ is continuous. The speed measure of $X_t$ is given by
$$m(dx) = \frac{1}{a(x)}\,dx.$$

Proof Since $dX_t = \sigma(X_t)\,dW_t$, we have $\langle X\rangle_t = \int_0^t a(X_s)\,ds$. To obtain a Brownian motion $W_t$ by time-changing the martingale $X_t$, we must time-change by the inverse of $\langle X\rangle_t$. On the other hand, from Theorem 41.9, $X_t$ is the time change of a Brownian motion by $B_t$, where $B_t$ is given by (41.24). Hence
$$B_t = \langle X\rangle_t = \int_0^t a(X_s)\,ds.$$
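The inverse of $B_t$, namely, $A_t$, must then satisfy
$$\frac{dA_t}{dt} = \frac{1}{a(X_{A_t})} = \frac{1}{a(W_t)},$$
or
$$A_t = \int_0^t\frac{1}{a(W_s)}\,ds = \int L^y_t\,\frac{1}{a(y)}\,dy$$
for all t, using Theorem 14.4. However, $A_t = \int L^y_t\,m(dy)$ by (41.24). Hence
$$\int L^y_t\,\frac{1}{a(y)}\,dy = \int L^y_t\,m(dy).$$
We know $E^xL^y_{\tau_{(c,d)}} = G_{cd}(x,y)$. Therefore
$$\int G_{cd}(x,y)\,m(dy) = \int E^xL^y_{\tau_{(c,d)}}\,m(dy) = \int E^xL^y_{\tau_{(c,d)}}\,\frac{1}{a(y)}\,dy = \int G_{cd}(x,y)\,\frac{1}{a(y)}\,dy$$
for all c, d, and x, which implies $m(dy) = (1/a(y))\,dy$.

Now we can look at some examples and do calculations.

Brownian motion with constant drift. This process is the solution to the SDE $dX_t = dW_t + b\,dt$. From Theorem 41.1, $s(x) = \exp(-2bx)$ is the scale function (for $b > 0$ this solution is decreasing; $-\exp(-2bx)$ is then a strictly increasing scale function, and the formulas below are the same for either choice). If $Y_t = s(X_t)$, then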
(s σ )(s−1 (y)) = −2by, or Yt corresponds to the operator 2b2 y2 f , and the speed measure is (4b2 y2 )−1 dx. Bessel processes. The process is only defined on the state space [0, ∞) instead of all of R and there is a boundary condition at 0. We ignore this here and consider a Bessel process of order ν up until the first hit of 0. Then X solves the SDE ν−1 dt. 2Xt
dXt = dWt +
If ν = 2, a calculation using Theorem 41.1 shows that s(x) = x2−ν . Then Yt = s(Xt ) satisfies dYt = (2 − ν )Yt (1−ν)/(2−ν) dWt , and the speed measure is m(dx) = (2 − ν )−2 x(2ν−2)/(2−ν) dx,
x > 0.
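The scale functions and coefficients in these two examples can be checked numerically. The sketch below is an illustration only, not part of the text: it assumes the generator of $dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$ is $\frac12\sigma^2 f'' + b f'$, approximates derivatives by central differences (the step `h` and all parameter values are hypothetical choices), and checks that $s$ is annihilated by the generator, that $(s'\sigma)(s^{-1}(y)) = -2by$ in the drift example, and that $1/a_Y$ with $a_Y(y) = (s'(s^{-1}(y)))^2$ matches the Bessel speed-measure density stated above.

```python
import math

def generator(f, x, sigma, drift, h=1e-4):
    """Central-difference approximation of (1/2) sigma(x)^2 f''(x) + drift(x) f'(x)."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    return 0.5 * sigma(x) ** 2 * d2 + drift(x) * d1

one = lambda x: 1.0  # sigma = 1 in both examples

# Brownian motion with constant drift b (hypothetical value), s(x) = exp(-2bx).
b = 0.7
s_drift = lambda x: math.exp(-2 * b * x)
residual_drift = max(abs(generator(s_drift, x, one, lambda _x: b)) for x in (-1.0, 0.0, 0.5, 1.3))

# Y = s(X): the coefficient (s' sigma)(s^{-1}(y)) should equal -2 b y.
s_inv = lambda y: -math.log(y) / (2 * b)
s_prime = lambda x: -2 * b * math.exp(-2 * b * x)
coeff_error = max(abs(s_prime(s_inv(y)) + 2 * b * y) for y in (0.2, 1.0, 3.0))

def bessel_checks(nu):
    """Bessel process of order nu != 2 before hitting 0: drift (nu - 1)/(2x), s(x) = x^(2 - nu)."""
    s = lambda x: x ** (2 - nu)
    res = max(abs(generator(s, x, one, lambda z: (nu - 1) / (2 * z))) for x in (0.5, 1.0, 2.0))
    s_prime_b = lambda x: (2 - nu) * x ** (1 - nu)
    s_inv_b = lambda y: y ** (1 / (2 - nu))
    # Speed density 1 / a_Y(y) compared with (2 - nu)^{-2} y^{(2 nu - 2)/(2 - nu)}.
    dens_err = max(
        abs(1 / s_prime_b(s_inv_b(y)) ** 2 - (2 - nu) ** -2 * y ** ((2 * nu - 2) / (2 - nu)))
        for y in (0.5, 1.0, 2.0)
    )
    return res, dens_err

print(residual_drift, coeff_error, bessel_checks(3.0), bessel_checks(0.5))
```

The residuals are finite-difference errors, so they are small but not exactly zero; the density comparison agrees to rounding error.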
Exercises

41.1 In the proof of Proposition 41.2 we used the strong Markov property numerous times. Write out carefully in terms of shift operators and conditional expectations how the strong Markov property is applied in each case.

41.2 Give a rigorous proof of (41.9).

41.3 Show that if
$$\int G_{ab}(x, y)\, m_1(dy) = \int G_{ab}(x, y)\, m_2(dy)$$
for all $x$, $a$, and $b$, then $m_1 = m_2$.

41.4 Show that if $X$ is a Bessel process of order 2, then the scale function is given by $s(x) = \log x$, $Y_t = s(X_t)$ satisfies $dY_t = e^{-Y_t}\, dW_t$, and the speed measure is $m(dx) = e^{2x}\, dx$.

41.5 Suppose $X$ is a regular diffusion whose state space is $\mathbb{R}$. Prove that $X$ is on natural scale if and only if
$$P^{(a+b)/2}(T_a < T_b) = \tfrac{1}{2}$$
whenever $a < b$.

41.6 Let $a > 0$ and let $m(dx) = dx + a\,\delta_0(dx)$, where $\delta_0$ is the point mass at 0. Let $(X_t, P^x)$ be the diffusion on the line on natural scale whose speed measure is given by $m$. Show that under $P^0$,
$$\int_0^t 1_{\{0\}}(X_s)\, ds > 0$$
with probability one for each $t > 0$. Prove that for each $t > 0$, $Z_t = \{s \le t : X_s = 0\}$ contains no intervals. Thus the process $X$ spends an amount of time at 0 that has positive Lebesgue measure, but its zero set contains no intervals.

41.7 Define
$$m_a(dx) = \begin{cases} dx, & x \ge 0, \\ a\, dx, & x < 0. \end{cases}$$
Let $(X_t, P^x_a)$ be the diffusion on natural scale on the line whose speed measure is given by $m_a$. Suppose $x > 0$. Prove that if $a \to \infty$, then $P^x_a$ converges weakly to the law of Brownian motion absorbed (i.e., killed) at 0, started at $x$. What do you think happens when $a \to 0$?
Notes

We have considered diffusions on $\mathbb{R}$, but most of what we discussed goes through for diffusions whose state space is an interval properly contained in $\mathbb{R}$. In this case, one must specify what the process does when it hits the boundary. Being absorbed (i.e., killed) or reflected are two options, but much more complicated behavior is possible. See Itô and McKean (1965) and Knight (1981) for the complete story.
42 Lévy processes
A Lévy process is a process with stationary and independent increments whose paths are right continuous with left limits. Having stationary increments means that the law of $X_t - X_s$ is the same as the law of $X_{t-s} - X_0$ whenever $s < t$. Saying that $X$ has independent increments means that $X_t - X_s$ is independent of $\sigma(X_r; r \le s)$ whenever $s < t$.

We want to examine the structure of Lévy processes. We have three examples already: the Poisson process, Brownian motion, and the deterministic process $X_t = t$. It turns out that all Lévy processes can be built up out of these building blocks. We will show how to construct Lévy processes and give a representation of an arbitrary Lévy process.

Recall that we use $X_{t-} = \lim_{s < t,\, s \to t} X_s$ and $\Delta X_t = X_t - X_{t-}$.
42.1 Examples

Let us begin by looking at some simple Lévy processes. Let $P^j_t$, $j = 1, \ldots, J$, be a sequence of independent Poisson processes with parameters $\lambda_j$, respectively. Each $P^j_t$ is a Lévy process and the formula for the characteristic function of a Poisson random variable (see Section A.13) shows that the characteristic function of $P^j_t$ is
$$E e^{iuP^j_t} = \exp(t\lambda_j(e^{iu} - 1)).$$
Therefore the characteristic function of $a_j P^j_t$ is
$$E e^{iua_jP^j_t} = \exp(t\lambda_j(e^{iua_j} - 1))$$
and the characteristic function of $a_j P^j_t - a_j\lambda_j t$ is
$$E e^{iu(a_jP^j_t - a_j\lambda_j t)} = \exp(t\lambda_j(e^{iua_j} - 1 - iua_j)).$$
If we let $m_j$ be the measure on $\mathbb{R}$ defined by $m_j(dx) = \lambda_j\,\delta_{a_j}(dx)$, where $\delta_{a_j}(dx)$ is point mass at $a_j$, then the characteristic function for $a_j P^j_t$ can be written as
$$\exp\Big(t\int_{\mathbb{R}} [e^{iux} - 1]\, m_j(dx)\Big) \tag{42.1}$$
and the one for $a_jP^j_t - a_j\lambda_j t$ as
$$\exp\Big(t\int_{\mathbb{R}} [e^{iux} - 1 - iux]\, m_j(dx)\Big). \tag{42.2}$$
Now let
$$X_t = \sum_{j=1}^J a_j P^j_t.$$
It is clear that the paths of $X_t$ are right continuous with left limits, and the fact that $X$ has stationary and independent increments follows from the corresponding property of the $P^j$'s. Moreover, the characteristic function of a sum of independent random variables is the product of the characteristic functions, so the characteristic function of $X_t$ is given by
$$E e^{iuX_t} = \exp\Big(t\int_{\mathbb{R}} [e^{iux} - 1]\, m(dx)\Big) \tag{42.3}$$
with $m(dx) = \sum_{j=1}^J \lambda_j\,\delta_{a_j}(dx)$. The process $Y_t = X_t - t\sum_{j=1}^J a_j\lambda_j$ is also a Lévy process and its characteristic function is
$$E e^{iuY_t} = \exp\Big(t\int_{\mathbb{R}} [e^{iux} - 1 - iux]\, m(dx)\Big), \tag{42.4}$$
again with $m(dx) = \sum_{j=1}^J \lambda_j\,\delta_{a_j}(dx)$.
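Formula (42.3) can be checked by simulation. The following is a hedged sketch, not taken from the text: it samples $X_t = \sum_j a_j P^j_t$ directly (using Knuth's multiplication method for Poisson variates, adequate for small means), averages $e^{iuX_t}$ over many samples, and compares the result with $\exp(t\int[e^{iux}-1]\,m(dx))$ for $m = \sum_j \lambda_j\delta_{a_j}$. The jump sizes, rates, sample size, and seed are hypothetical choices, and the agreement is only up to Monte Carlo error of order $1/\sqrt{n}$.

```python
import cmath
import math
import random

def sample_poisson(mean, rng):
    """Knuth's method: count uniforms multiplied before the product drops below exp(-mean)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def char_fn_formula(u, t, jumps):
    """Right side of (42.3) for m = sum_j lam_j * delta_{a_j}; jumps = [(a_j, lam_j), ...]."""
    return cmath.exp(t * sum(lam * (cmath.exp(1j * u * a) - 1) for a, lam in jumps))

def char_fn_monte_carlo(u, t, jumps, n_samples=20000, seed=1):
    """Empirical mean of exp(iu X_t), X_t = sum_j a_j P^j_t with independent P^j_t ~ Poisson(lam_j t)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_samples):
        x = sum(a * sample_poisson(lam * t, rng) for a, lam in jumps)
        total += cmath.exp(1j * u * x)
    return total / n_samples

jumps = [(1.0, 2.0), (-0.5, 1.0)]  # hypothetical (a_j, lambda_j) pairs
t, u = 0.8, 1.3
print(char_fn_formula(u, t, jumps), char_fn_monte_carlo(u, t, jumps))
```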
Remark 42.1 Recall from Proposition A.50 that if $\varphi$ is the characteristic function of a random variable $Z$, then $\varphi'(0) = iE Z$ and $\varphi''(0) = -E Z^2$. If $Y_t$ is as in the paragraph above, then clearly $E Y_t = 0$, and calculating the second derivative of $E e^{iuY_t}$ at 0, we obtain
$$E Y_t^2 = t\int x^2\, m(dx).$$
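The computation in Remark 42.1 can be mirrored numerically. As a sketch (the atoms $(a_j, \lambda_j)$ below are hypothetical), the code evaluates the characteristic function (42.4) for a discrete $m$, approximates $\varphi''(0)$ by a central second difference, and compares $-\varphi''(0)$ with $t\int x^2\, m(dx) = t\sum_j \lambda_j a_j^2$.

```python
import cmath

def char_fn_centered(u, t, jumps):
    """E exp(iu Y_t) from (42.4) for m = sum_j lam_j * delta_{a_j}; jumps = [(a_j, lam_j), ...]."""
    exponent = sum(lam * (cmath.exp(1j * u * a) - 1 - 1j * u * a) for a, lam in jumps)
    return cmath.exp(t * exponent)

def second_moment_from_char_fn(t, jumps, h=1e-3):
    """E Y_t^2 = -phi''(0), with phi'' approximated by a central second difference."""
    phi = lambda u: char_fn_centered(u, t, jumps)
    second_derivative = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2
    return -second_derivative.real

jumps = [(1.0, 2.0), (-0.5, 3.0), (2.0, 0.25)]  # hypothetical (a_j, lambda_j) pairs
t = 1.5
exact = t * sum(lam * a**2 for a, lam in jumps)  # t * integral of x^2 m(dx)
approx = second_moment_from_char_fn(t, jumps)
print(exact, approx)
```

The finite-difference error is of order $h^2$, so the two numbers agree to several decimal places rather than exactly.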
The following lemma is a restatement of Corollary 4.3.

Lemma 42.2 If $X_t$ is a Lévy process and $T$ is a finite stopping time, then $X_{T+t} - X_T$ is a Lévy process with the same law as $X_t - X_0$ and independent of $\mathcal{F}_T$.
42.2 Construction of Lévy processes

A process $X$ has bounded jumps if there exists a real number $K > 0$ such that $\sup_t |\Delta X_t| \le K$, a.s.

Lemma 42.3 If $X_t$ is a Lévy process with bounded jumps and with $X_0 = 0$, then $X_t$ has moments of all orders, that is, $E|X_t|^p < \infty$ for all positive integers $p$.

Proof Suppose the jumps of $X_t$ are bounded in absolute value by $K$. Since $X_t$ is right continuous with left limits, there exists $M > K$ such that $P(\sup_{s \le t} |X_s| \ge M) \le 1/2$. Let $T_1 = \inf\{t : |X_t| \ge M\}$ and $T_{i+1} = \inf\{t > T_i : |X_t - X_{T_i}| > M\}$. For $s < T_1$, $|X_s| \le M$, and then $|X_{T_1}| \le |X_{T_1-}| + |\Delta X_{T_1}| \le M + K \le 2M$; similarly each $|X_{T_{i+1}} - X_{T_i}| \le 2M$. We have
$$\begin{aligned}
P\big(\sup_{s\le t} |X_s| \ge 2(i+1)M\big) &\le P(T_{i+1} \le t) \le P(T_i \le t,\ T_{i+1} - T_i \le t)\\
&\le P\big(\sup_{s\le t} |X_{T_i+s} - X_{T_i}| \ge M,\ T_i \le t\big)\\
&= P\big(\sup_{s\le t} |X_s| \ge M\big)\, P(T_i \le t)\\
&\le \tfrac12\, P(T_i \le t),
\end{aligned}$$
using Lemma 42.2 in the last equality. By induction, $P(\sup_{s\le t}|X_s| \ge 2iM) \le 2^{-i}$, and the lemma now follows immediately.

A key lemma is the following.

Lemma 42.4 Suppose $I$ is a finite interval of the form $(a,b)$, $[a,b)$, $(a,b]$, or $[a,b]$ with $a > 0$ and $m$ is a finite measure on $\mathbb{R}$ giving no mass to $I^c$. Then there exists a Lévy process $X_t$ satisfying (42.3).

Proof First let us consider the case where $I = [a,b)$. We approximate $m$ by a discrete measure. If $n \ge 1$, let $z_j = a + j(b-a)/n$, $j = 0, \ldots, n$, and let
$$m_n(dx) = \sum_{j=0}^{n-1} m([z_j, z_{j+1}))\,\delta_{z_j}(dx),$$
where $\delta_{z_j}$ is the point mass at $z_j$.
where δz j is the point mass at z j . The measures mn converge weakly to m as n → ∞ in the sense that f (x) mn (dx) →
f (x) dx
whenever f is a bounded continuous function on R. For each n, let Ptn, j , j = 0, . . . , n − 1, be independent Poisson processes with parameters m([z j , z j+1 )) and let Xt n =
n−1
z j Ptn, j .
j=0
Then $X^n$ is a Lévy process with jumps bounded by $b$. By Lemma 42.2, if $T_n$ is a stopping time for $X^n$, $\varepsilon > 0$, and $\delta > 0$, then
$$\begin{aligned}
P(|X^n_{T_n+\delta} - X^n_{T_n}| > \varepsilon) &= P(|X^n_\delta| > \varepsilon) \le P(X^n_\delta \ne 0)\\
&\le P\Big(\sum_{j=0}^{n-1} P^{n,j}_\delta \ne 0\Big).
\end{aligned} \tag{42.5}$$
Since the sum of independent Poisson processes is a Poisson process, $\sum_{j=0}^{n-1} P^{n,j}_t$ is a Poisson process with parameter
$$\sum_{j=0}^{n-1} m([z_j, z_{j+1})) = m(I).$$
The last line of (42.5) is then bounded by $1 - e^{-\delta m(I)} \le \delta m(I)$, which tends to zero uniformly in $n$ as $\delta \to 0$. Note $X^n_0 = 0$, a.s. We can therefore apply the Aldous criterion (Theorem 34.8) to see that the $X^n$ are tight with respect to weak convergence on the space $D[0, t_0)$ for any $t_0$. Any subsequential weak limit $X$ will have paths that are right continuous with left limits. For any continuous bounded function $f$ on $\mathbb{R}$,
$$E f(X^n_t - X^n_s) = E f(X^n_{t-s} - X^n_0).$$
Passing to the limit along an appropriate subsequence,
$$E f(X_t - X_s) = E f(X_{t-s} - X_0).$$
Since $f$ is an arbitrary bounded continuous function, we see that the laws of $X_t - X_s$ and $X_{t-s} - X_0$ are the same. Similarly we prove the increments are independent. Since $x \mapsto e^{iux}$ is a bounded continuous function and $m_n$ converges weakly to $m$, starting with
$$E\exp(iuX^n_t) = \exp\Big(t\int [e^{iux} - 1]\, m_n(dx)\Big)$$
and passing to the limit, we obtain that the characteristic function of $X$ under $P$ is given by (42.3).

If now the interval $I$ contains the point $b$, we follow the above proof, except we let $P^{n,n-1}_t$ be a Poisson process with parameter $m([z_{n-1}, b])$. Similarly, if $I$ does not contain the point $a$, we change $P^{n,0}_t$ to be a Poisson process with parameter $m((a, z_1))$. With these changes, the proof works for intervals $I$ whether or not they contain either of their endpoints.

Remark 42.5 If $X$ is the Lévy process constructed in Lemma 42.4, then $Y_t = X_t - E X_t$ will be a Lévy process satisfying (42.4).

Here is the main theorem of this section.

Theorem 42.6 Suppose $m$ is a measure on $\mathbb{R}$ with $m(\{0\}) = 0$ and $\int (1 \wedge x^2)\, m(dx) < \infty$. Suppose $b \in \mathbb{R}$ and $\sigma \ge 0$. There exists a Lévy process $X_t$ such that
$$E e^{iuX_t} = \exp\Big(t\Big[iub - \sigma^2u^2/2 + \int_{\mathbb{R}} \big[e^{iux} - 1 - iux\,1_{(|x|\le 1)}\big]\, m(dx)\Big]\Big). \tag{42.6}$$
The above equation is called the Lévy–Khintchine formula. The measure $m$ is called the Lévy measure. If we let
$$m(dx) = \frac{1 + x^2}{x^2}\, m'(dx)$$
and
$$b = b' + \int_{(|x|\le 1)} \frac{x^3}{1+x^2}\, m(dx) - \int_{(|x|>1)} \frac{x}{1+x^2}\, m(dx),$$
then we can also write
$$E e^{iuX_t} = \exp\Big(t\Big[iub' - \sigma^2u^2/2 + \int_{\mathbb{R}} \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\frac{1+x^2}{x^2}\, m'(dx)\Big]\Big).$$
Both expressions for the Lévy–Khintchine formula are in common use.

Proof Let $m(dx)$ be a measure supported on $(0,1]$ with $\int x^2\, m(dx) < \infty$. Let $m_n(dx)$ be the measure $m$ restricted to $(2^{-n}, 2^{-n+1}]$. Let $Y^n_t$ be independent Lévy processes whose characteristic functions are given by (42.4) with $m$ replaced by $m_n$; see Remark 42.5. Note $E Y^n_t = 0$ for all $n$ by Remark 42.1. By the independence of the $Y^n$'s, if $M < N$,
$$E\Big(\sum_{n=M}^N Y^n_t\Big)^2 = \sum_{n=M}^N E(Y^n_t)^2 = t\sum_{n=M}^N \int x^2\, m_n(dx) = t\int_{2^{-N}}^{2^{-M+1}} x^2\, m(dx).$$
By our assumption on $m$, this goes to zero as $M, N \to \infty$, and we conclude that $\sum_{n=1}^N Y^n_t$ converges in $L^2$ for each $t$. Call the limit $Y_t$. It is routine to check that $Y_t$ has independent and stationary increments. Each $Y^n_t$ has independent increments and is mean zero, so
$$E[Y^n_t - Y^n_s \mid \mathcal{F}_s] = E[Y^n_t - Y^n_s] = 0,$$
or $Y^n$ is a martingale. By Doob's inequalities and the $L^2$ convergence,
$$E\sup_{s\le t}\Big(\sum_{n=M}^N Y^n_s\Big)^2 \to 0$$
as $M, N \to \infty$, and hence there exists a subsequence $M_k$ such that $\sum_{n=1}^{M_k} Y^n_s$ converges uniformly over $s \le t$, a.s. Therefore the limit $Y_t$ will have paths that are right continuous with left limits.

If $m$ is a measure supported in $(1, \infty)$ with $m(\mathbb{R}) < \infty$, we do a similar procedure starting with Lévy processes whose characteristic functions are of the form (42.3). We let $m_n(dx)$ be the restriction of $m$ to $(2^n, 2^{n+1}]$, let $X^n_t$ be independent Lévy processes corresponding to $m_n$, and form $X_t = \sum_{n=0}^\infty X^n_t$. Since $m(\mathbb{R}) < \infty$, for each $t_0$, the number of times $t$ less than $t_0$ at which any one of the $X^n_t$ jumps is finite. This shows $X_t$ has paths that are right continuous with left limits, and it is then easy to see that $X_t$ is a Lévy process.

Finally, suppose $\int (x^2 \wedge 1)\, m(dx) < \infty$. Let $X^1_t$, $X^2_t$ be Lévy processes with characteristic functions given by (42.3) with $m$ replaced by the restriction of $m$ to $(1,\infty)$ and $(-\infty,-1)$, respectively, let $X^3_t$, $X^4_t$ be Lévy processes with characteristic functions given by (42.4) with $m$ replaced by the restriction of $m$ to $(0,1]$ and $[-1,0)$, respectively, let $X^5_t = bt$, and let $X^6_t$ be $\sigma$ times a Brownian motion. Suppose the $X^i$'s are all independent. Then their sum will be a Lévy process whose characteristic function is given by (42.6).

A key step in the construction was the centering of the Poisson processes to get Lévy processes with characteristic functions given by (42.4). Without the centering one is forced to work only with characteristic functions given by (42.3).
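The relation between the two forms of the Lévy–Khintchine formula is bookkeeping in the drift term. As a sketch, assuming a Lévy measure with finitely many atoms (hypothetical locations and weights in `ATOMS`), the code below evaluates the exponent $\psi(u)$ in $E e^{iuX_t} = e^{t\psi(u)}$ from (42.6), converts $b'$ to $b$ by the formula above, and checks that the second form gives the same exponent. Integrating $(e^{iux} - 1 - iux/(1+x^2))\frac{1+x^2}{x^2}\,m'(dx)$ is the same as integrating $e^{iux} - 1 - iux/(1+x^2)$ against $m(dx)$, which is what `psi_smooth` does.

```python
import cmath

# Hypothetical Levy measure m = sum of w * delta_x over these (x, w) atoms (no atom at 0).
ATOMS = [(0.3, 5.0), (-0.8, 2.0), (1.5, 0.7), (-3.0, 0.2)]

def psi_truncated(u, b, sigma, atoms):
    """Exponent in (42.6): compensator iux only on |x| <= 1."""
    jump_part = sum(
        w * (cmath.exp(1j * u * x) - 1 - (1j * u * x if abs(x) <= 1 else 0))
        for x, w in atoms
    )
    return 1j * u * b - sigma**2 * u**2 / 2 + jump_part

def psi_smooth(u, b_prime, sigma, atoms):
    """Second form: compensator iux / (1 + x^2), integrated against m."""
    jump_part = sum(w * (cmath.exp(1j * u * x) - 1 - 1j * u * x / (1 + x**2)) for x, w in atoms)
    return 1j * u * b_prime - sigma**2 * u**2 / 2 + jump_part

def b_from_b_prime(b_prime, atoms):
    """b = b' + int_{|x|<=1} x^3/(1+x^2) m(dx) - int_{|x|>1} x/(1+x^2) m(dx)."""
    small = sum(w * x**3 / (1 + x**2) for x, w in atoms if abs(x) <= 1)
    large = sum(w * x / (1 + x**2) for x, w in atoms if abs(x) > 1)
    return b_prime + small - large

b_prime, sigma = 0.4, 1.2
b = b_from_b_prime(b_prime, ATOMS)
for u in (-2.0, 0.5, 3.0):
    print(u, psi_truncated(u, b, sigma, ATOMS), psi_smooth(u, b_prime, sigma, ATOMS))
```

The identity $x - x/(1+x^2) = x^3/(1+x^2)$ is exactly why the small jumps contribute $x^3/(1+x^2)$ to the drift correction.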
42.3 Representation of Lévy processes

We now work toward showing that every Lévy process has a characteristic function of the form given by (42.6).

Lemma 42.7 If $X_t$ is a Lévy process and $A$ is a Borel subset of $\mathbb{R}$ that is a positive distance from 0, then
$$N_t(A) = \sum_{s\le t} 1_A(\Delta X_s)$$
is a Poisson process.

Saying that $A$ is a positive distance from 0 means that $\inf\{|x| : x \in A\} > 0$.

Proof Since $X_t$ has paths that are right continuous with left limits and $A$ is a positive distance from 0, there can only be finitely many jumps of $X$ that lie in $A$ in any finite time interval, and so $N_t(A)$ is finite and has paths that are right continuous with left limits. It follows from the fact that $X_t$ has stationary and independent increments that $N_t(A)$ also has stationary and independent increments. We now apply Proposition 5.4.

Theorem 42.8 Let $X_t$ be a Lévy process with $X_0 = 0$ and let $A_1, \ldots, A_n$ be disjoint bounded Borel subsets of $(0,\infty)$, each a positive distance from 0. Set
$$N_t(A_k) = \sum_{s\le t} 1_{A_k}(\Delta X_s)$$
and
$$Y_t = X_t - \sum_{k=1}^n \sum_{s\le t} \Delta X_s\, 1_{A_k}(\Delta X_s).$$
Then the processes $N_t(A_1), \ldots, N_t(A_n)$, and $Y_t$ are mutually independent.

Proof Define $\lambda(A) = E N_1(A)$. The previous lemma shows that if $\lambda(A) < \infty$, then $N_t(A)$ is a Poisson process, and clearly its parameter is $\lambda(A)$. The result now follows from Theorem 18.3.

Here is the representation theorem for Lévy processes.

Theorem 42.9 Suppose $X_t$ is a Lévy process with $X_0 = 0$. Then there exists a measure $m$ on $\mathbb{R} - \{0\}$ with $\int (1 \wedge x^2)\, m(dx) < \infty$ and real numbers $b$ and $\sigma$ such that the characteristic function of $X_t$ is given by (42.6).

Proof Define $m(A) = E N_1(A)$ if $A$ is a bounded Borel subset of $(0,\infty)$ that is a positive distance from 0. Since $N_1(\cup_{k=1}^\infty A_k) = \sum_{k=1}^\infty N_1(A_k)$ if the $A_k$ are pairwise disjoint and each is a positive distance from 0, we see that $m$ is a measure on $[a,b]$ for each $0 < a < b < \infty$, and $m$ extends uniquely to a measure on $(0,\infty)$.
First we want to show that $\sum_{s\le t} \Delta X_s\, 1_{(\Delta X_s > 1)}$ is a Lévy process with characteristic function
$$\exp\Big(t\int_1^\infty [e^{iux} - 1]\, m(dx)\Big).$$
Since the characteristic function of the sum of independent random variables is equal to the product of the characteristic functions, it suffices to suppose $0 < a < b$ and to show that
$$E e^{iuZ_t} = \exp\Big(t\int_{(a,b]} [e^{iux} - 1]\, m(dx)\Big),$$
where
$$Z_t = \sum_{s\le t} \Delta X_s\, 1_{(a,b]}(\Delta X_s).$$
Let $n > 1$ and $z_j = a + j(b-a)/n$. By Lemma 42.7, $N_t((z_j, z_{j+1}])$ is a Poisson process with parameter
$$\ell_j = E N_1((z_j, z_{j+1}]) = m((z_j, z_{j+1}]).$$
These Poisson processes are independent by Theorem 42.8. Thus $Z^n_t = \sum_{j=0}^{n-1} z_j N_t((z_j, z_{j+1}])$ has characteristic function
$$\prod_{j=0}^{n-1} \exp\big(t\ell_j(e^{iuz_j} - 1)\big) = \exp\Big(t\sum_{j=0}^{n-1}(e^{iuz_j} - 1)\ell_j\Big),$$
which is equal to
$$\exp\Big(t\int (e^{iux} - 1)\, m_n(dx)\Big), \tag{42.7}$$
where $m_n(dx) = \sum_{j=0}^{n-1} \ell_j\,\delta_{z_j}(dx)$. Since $Z^n_t$ converges to $Z_t$ as $n \to \infty$, passing to the limit shows that $Z_t$ has a characteristic function of the form (42.6).

Next we show that $m(1,\infty) < \infty$. (We write $m(1,\infty)$ instead of $m((1,\infty))$ for esthetic reasons.) If not, $m(1,K) \to \infty$ as $K \to \infty$. Then for each fixed $L$ and each fixed $t$,
$$\limsup_{K\to\infty} P(N_t(1,K) \le L) = \limsup_{K\to\infty} \sum_{j=0}^L e^{-tm(1,K)}\,\frac{(t\,m(1,K))^j}{j!} = 0.$$
This implies that $N_t(1,\infty) = \infty$ for each $t$. However, this contradicts the fact that $X_t$ has paths that are right continuous with left limits. We define $m$ on $(-\infty, 0)$ similarly.

We now look at
$$Y_t = X_t - \sum_{s\le t} \Delta X_s\, 1_{(|\Delta X_s| > 1)}.$$
This is again a Lévy process, and we need to examine its structure. This process has bounded jumps, hence has moments of all orders. By subtracting $c_1 t$ for an appropriate constant $c_1$, we may suppose $Y_t$ has mean 0. Let $I_1, I_2, \ldots$ be an ordering of the intervals $\{[2^{-(m+1)}, 2^{-m}),\ (-2^{-m}, -2^{-(m+1)}] : m \ge 0\}$. Let
$$\widetilde X^k_t = \sum_{s\le t} \Delta X_s\, 1_{(\Delta X_s \in I_k)}$$
and let $X^k_t = \widetilde X^k_t - E\widetilde X^k_t$. By Corollary 18.3 and the fact that all the $X^k$ have mean zero,
$$\sum_{k=1}^\infty E(X^k_t)^2 \le E\Big(Y_t - \sum_{k=1}^\infty X^k_t\Big)^2 + E\Big(\sum_{k=1}^\infty X^k_t\Big)^2 = E(Y_t)^2 < \infty.$$
Hence
$$E\Big(\sum_{k=M}^N X^k_t\Big)^2 = \sum_{k=M}^N E(X^k_t)^2$$
tends to zero as $M, N \to \infty$, and thus $Y_t - \sum_{k=1}^N X^k_t$ converges in $L^2$. The limit, $X^c_t$, say, will be a Lévy process independent of all the $X^k_t$. Moreover, $X^c$ has no jumps, i.e., it is continuous. Since $Y_t$ and all the $X^k$ have mean zero, then $E X^c_t = 0$. By the independence of the increments,
$$E[X^c_t - X^c_s \mid \mathcal{F}_s] = E[X^c_t - X^c_s] = 0,$$
and we see $X^c$ is a continuous martingale. Using the stationarity and independence of the increments,
$$\begin{aligned}
E[(X^c_{s+t})^2] &= E[(X^c_s)^2] + 2E[X^c_s(X^c_{s+t} - X^c_s)] + E[(X^c_{s+t} - X^c_s)^2]\\
&= E[(X^c_s)^2] + E[(X^c_t)^2],
\end{aligned}$$
which implies that there exists a constant $c_2$ such that $E(X^c_t)^2 = c_2 t$. We then have
$$\begin{aligned}
E[(X^c_t)^2 - c_2t \mid \mathcal{F}_s] &= (X^c_s)^2 - c_2 s + E[(X^c_t - X^c_s)^2 \mid \mathcal{F}_s] - c_2(t-s)\\
&= (X^c_s)^2 - c_2 s + E[(X^c_t - X^c_s)^2] - c_2(t-s)\\
&= (X^c_s)^2 - c_2 s.
\end{aligned}$$
The quadratic variation process of $X^c$ is therefore $c_2 t$, and by Lévy's theorem (Theorem 12.1), if $c_2 > 0$ then $X^c_t/\sqrt{c_2}$ is a Brownian motion, so $X^c$ is a constant multiple of Brownian motion.

To complete the proof, it remains to show that $\int_{-1}^1 x^2\, m(dx) < \infty$. But by Remark 42.1,
$$\int_{I_k} x^2\, m(dx) = E(X^k_1)^2,$$
and we have seen that
$$\sum_k E(X^k_1)^2 \le E Y_1^2 < \infty.$$
Combining gives the finiteness of $\int_{-1}^1 x^2\, m(dx)$.
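The proof defines the Lévy measure through $m(A) = E N_1(A)$, and Lemma 42.7 says $N_t(A)$ is a Poisson process. A hedged simulation sketch: assume (hypothetically) that $X$ is a compound Poisson process with jump rate 3 and standard normal jump sizes, so its Lévy measure is $m(dx) = 3\,\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$. Counting jumps with $|\Delta X_s| > 1$ on $[0,1]$ over many simulated paths, the sample mean and sample variance of $N_1(A)$ should both be close to $m(A) = 3 \cdot 2(1 - \Phi(1))$, as they must be for a Poisson random variable. Rate, threshold, path count, and seed are illustrative choices.

```python
import math
import random

def sample_poisson(mean, rng):
    """Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def count_big_jumps(rate, threshold, t, rng):
    """N_t(A), A = {|x| > threshold}: only the number of jumps on [0, t] and their sizes matter."""
    n_jumps = sample_poisson(rate * t, rng)
    return sum(1 for _ in range(n_jumps) if abs(rng.gauss(0.0, 1.0)) > threshold)

rate, threshold, t, n_paths = 3.0, 1.0, 1.0, 20000
rng = random.Random(7)
counts = [count_big_jumps(rate, threshold, t, rng) for _ in range(n_paths)]
mean = sum(counts) / n_paths
var = sum((c - mean) ** 2 for c in counts) / (n_paths - 1)
m_of_A = rate * math.erfc(threshold / math.sqrt(2))  # = 3 * 2 * (1 - Phi(1))
print(m_of_A, mean, var)
```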
Exercises

42.1 Let $\alpha \in (0,2)$ and let $X$ be a Lévy process where $b = \sigma = 0$ in the Lévy–Khintchine formula and the Lévy measure is $m(dx) = c|x|^{-1-\alpha}\, dx$. Show that if $a > 0$ and $Y_t = a^{-1/\alpha} X_{at}$, then $Y$ has the same law as $X$. The process $X$ is known as a symmetric stable process of index $\alpha$.

42.2 Suppose $W_t = (W^1_t, W^2_t)$ is a two-dimensional Brownian motion started at 0. Let $\tau_s = \inf\{t > 0 : W^1_t > s\}$. Prove that $W^2_{\tau_t}$ is a Lévy process and determine the Lévy measure. Hint: Use scaling to make a guess.

42.3 Let $W$ be a one-dimensional Brownian motion and let $L^0$ be the local time at 0. Let $T_t$ be the inverse of $L^0$, that is, $T_t = \inf\{s : L^0_s \ge t\}$. Show $T_t$ is a Lévy process and determine the Lévy measure. Hint: Use scaling to get started.

42.4 Let $W_t$ be a one-dimensional Brownian motion, $L^y_t$ the local time at level $y$, and $T_t$ the inverse local time at 0, that is, $T_t = \inf\{s : L^0_s \ge t\}$. Let $x > 0$ be fixed. Prove that $L^x_{T_t}$ is a Lévy process.

42.5 Let $X$ be a Lévy process with Lévy measure $m$. Prove that if $A$ and $B$ are disjoint closed sets, then
$$E^x \sum_{s\le t} 1_A(X_{s-})\,1_B(X_s) = E^x \int_0^t 1_A(X_s)\, m(B - X_s)\, ds$$
for each $x$, where $B - y = \{z - y : z \in B\}$. This is the Lévy system formula in the case of Lévy processes. There is an analogous formula for Hunt processes.

42.6 A stable subordinator $X$ of order $\alpha \in (0,1)$ is a Lévy process whose characteristic function is given by (42.6), where $b = \sigma^2 = 0$ and $m(dx) = c\,1_{(x>0)}|x|^{-\alpha-1}\, dx$. Suppose $X$ is a stable subordinator of index $\alpha$ and $W$ is a Brownian motion. Show that, up to a deterministic time change, the process $Z_t = W_{X_t}$ is a symmetric stable process of index $2\alpha$. Hint: Start by using scaling.

42.7 Let $Z_t$ be a symmetric stable process of order $\alpha \in (0,2)$. Show that if $\varepsilon > 0$, then
$$\lim_{t\to\infty} \frac{|Z_t|}{t^{1/\alpha+\varepsilon}} = 0, \quad \text{a.s.}$$
Appendix A Basic probability
This appendix covers the facts from basic probability that we will need. The presentation here is not precisely what I use when I teach such a course. For example, in a course I prove the strong law of large numbers without using martingales, I present the inversion theorem for characteristic functions, I make use of Lévy's continuity theorem, and so on. Nevertheless, proofs of all the facts from probability needed in the main part of the text are given.
A.1 First notions

A probability or probability measure is a measure whose total mass is one. Instead of denoting a measure space by $(X, \mathcal{A}, \mu)$, probabilists use $(\Omega, \mathcal{F}, P)$. Here $\Omega$ is a set, $\mathcal{F}$ is called a $\sigma$-field (which is the same thing as a $\sigma$-algebra), and $P$ is a measure with $P(\Omega) = 1$. Elements of $\mathcal{F}$ are called events. A typical element of $\Omega$ is denoted $\omega$.

Instead of saying a property occurs almost everywhere, we talk about properties occurring almost surely, written a.s. Real-valued measurable functions from $\Omega$ to $\mathbb{R}$ are called random variables and are usually denoted by $X$ or $Y$ or other capital letters. Integration (in the sense of Lebesgue) with respect to $P$ is called expectation or expected value, and we write $E X$ for $\int X\, dP$. The notation $E[X; A]$ is often used for $\int_A X\, dP$.

The random variable $1_A$ is the function that is one if $\omega \in A$ and zero otherwise. It is called the indicator of $A$ (the name “characteristic function” in probability refers to the Fourier transform). Events such as $\{\omega : X(\omega) > a\}$ are almost always abbreviated by $(X > a)$ or $\{X > a\}$.

Given a random variable $X$, we can define a probability on the Borel $\sigma$-field of $\mathbb{R}$ by
$$P_X(A) = P(X \in A), \qquad A \subset \mathbb{R}. \tag{A.1}$$
The probability $P_X$ is called the law of $X$ or the distribution of $X$. We define $F_X : \mathbb{R} \to [0,1]$ by
$$F_X(x) = P_X((-\infty, x]) = P(X \le x). \tag{A.2}$$
The function $F_X$ is called the distribution function of $X$.
Proposition A.1 The distribution function $F_X$ of a random variable $X$ satisfies:
(1) $F_X$ is increasing;
(2) $F_X$ is right continuous with left limits;
(3) $\lim_{x\to\infty} F_X(x) = 1$ and $\lim_{x\to-\infty} F_X(x) = 0$.

Proof We prove the right continuity of $F_X$ and leave the rest of the proof to the reader. If $x_n \downarrow x$, then $(X \le x_n) \downarrow (X \le x)$, and so $P(X \le x_n) \downarrow P(X \le x)$ since $P$ is a finite measure.

Note that if $x_n \uparrow x$, then $(X \le x_n) \uparrow (X < x)$, and so $F_X(x_n) \uparrow P(X < x)$.

Any function $F : \mathbb{R} \to [0,1]$ satisfying (1)–(3) of Proposition A.1 is called a distribution function, whether or not it comes from a random variable.

Proposition A.2 Suppose $F$ is a distribution function. There exists a random variable $X$ such that $F = F_X$.

Proof Let $\Omega = [0,1]$, $\mathcal{F}$ the Borel $\sigma$-field, and $P$ Lebesgue measure. Define $X(\omega) = \sup\{x : F(x) < \omega\}$. It is routine to check that $F_X = F$.

In the above proof, essentially $X = F^{-1}$. However $F$ may have jumps or be constant over some intervals, so some care is needed in defining $X$.

Certain distributions or laws are very common. We list some of them.

(1) Bernoulli. A random variable is Bernoulli if $P(X = 1) = p$, $P(X = 0) = 1 - p$ for some $p \in [0,1]$.
(2) Binomial. This is defined by $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $n$ is a positive integer, $0 \le k \le n$, $p \in [0,1]$, and $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
(3) Point mass at $a$. Here $P(X = a) = 1$.
(4) Poisson. For $\lambda > 0$ we set $P(X = k) = e^{-\lambda}\lambda^k/k!$. Here $k$ is a non-negative integer.

If $F$ is absolutely continuous, we call $f = F'$ the density of $F$. If such an $F$ is the distribution function of a random variable $X$, then
$$P(X \in A) = \int_A f(x)\, dx.$$
Some examples of distributions characterized by densities are the following.

(5) Uniform on $[a,b]$. Define $f(x) = (b-a)^{-1} 1_{[a,b]}(x)$.
(6) Exponential. For $x \ge 0$ let $f(x) = \lambda e^{-\lambda x}$ and set $f(x) = 0$ for $x < 0$.
(7) Standard normal. Define $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ for $x \in \mathbb{R}$. Let us verify that the integral of $f$ is one. To do that, let
$$I = \int_0^\infty e^{-x^2/2}\, dx,$$
and it suffices to show $I = \sqrt{\pi/2}$. Using the Fubini theorem, the monotone convergence theorem, and a change of variables to polar coordinates, we write
$$\begin{aligned}
I^2 &= \int_0^\infty e^{-x^2/2}\, dx \int_0^\infty e^{-y^2/2}\, dy = \int_0^\infty\!\!\int_0^\infty e^{-(x^2+y^2)/2}\, dx\, dy\\
&= \lim_{R\to\infty} \iint_{x,y\ge 0,\ x^2+y^2\le R^2} e^{-(x^2+y^2)/2}\, dx\, dy\\
&= \lim_{R\to\infty} \int_0^{\pi/2}\!\!\int_0^R e^{-r^2/2}\, r\, dr\, d\theta\\
&= \lim_{R\to\infty} \frac{\pi}{2}\big(1 - e^{-R^2/2}\big) = \frac{\pi}{2},
\end{aligned}$$
as desired. We shall see later ((A.4) and (A.5)) that a standard normal random variable $Z$ has mean zero and variance one, which means that $E Z = 0$ and $E Z^2 = 1$.
(8) Normal random variables with mean $\mu$ and variance $\sigma^2$. If $Z$ is a standard normal random variable, then a normal random variable $X$ with mean $\mu$ and variance $\sigma^2$ has the same distribution as $\mu + \sigma Z$. It is an exercise in calculus to check that such a random variable has density
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/2\sigma^2}. \tag{A.3}$$
(9) Gamma. A random variable $X$ has a gamma distribution with parameters $r$ and $\lambda$ (both $r$ and $\lambda$ must be positive) if it has density
$$f(x) = \lambda e^{-\lambda x}(\lambda x)^{r-1}/\Gamma(r)$$
for $x \ge 0$ and $f(x) = 0$ if $x < 0$, where $\Gamma(r) = \int_0^\infty e^{-y} y^{r-1}\, dy$ is the Gamma function. Recall $\Gamma(k) = (k-1)!$ for $k$ a positive integer.

We can use the law of a random variable to calculate expectations.

Proposition A.3 Let $X$ be a random variable. If $g$ is bounded or non-negative, then
$$E g(X) = \int g(x)\, P_X(dx).$$

Proof If $g$ is the indicator of a Borel set $A$, this is just the definition of $P_X$. By linearity, the result holds for simple functions. By the monotone convergence theorem, the result holds for non-negative functions, and by linearity again, it holds for bounded $g$.

If $F_X$ has a density $f$, then $P_X(dx) = f(x)\, dx$. In this case $E X = \int x f(x)\, dx$ and $E X^2 = \int x^2 f(x)\, dx$. (We need $E|X|$ finite to justify the first equality if $X$ is not necessarily non-negative.) We define the mean of a random variable to be its expectation, and the variance of a random variable is defined by
$$\operatorname{Var} X = E(X - E X)^2.$$
The $p$th moment of $X$ is $E X^p$ if $p$ is a positive integer.
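The construction in Proposition A.2 is the familiar inverse-transform (quantile) method. The sketch below assumes a distribution with finitely many atoms (a binomial law, chosen only for illustration) and implements $X(\omega) = \sup\{x : F(x) < \omega\}$: for a step function $F$ the supremum is the smallest atom $x_i$ with $F(x_i) \ge \omega$, which takes care of both the jumps and the flat stretches of $F$ mentioned above. Evaluating $X$ at the midpoints of a fine grid on $[0,1]$ stands in for Lebesgue measure $P$, so the frequencies should reproduce the binomial probabilities from item (2).

```python
import bisect
import math

def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def quantile_transform(values, probs):
    """Return X(omega) = sup{x : F(x) < omega} for the law sum_i probs[i] * delta_{values[i]}."""
    cumulative, running = [], 0.0
    for q in probs:
        running += q
        cumulative.append(running)

    def X(omega):
        # First index i with F(values[i]) >= omega; clamp in case the last partial sum rounds below 1.
        i = bisect.bisect_left(cumulative, omega)
        return values[min(i, len(values) - 1)]

    return X

n, p = 4, 0.3
probs = binomial_pmf(n, p)
X = quantile_transform(list(range(n + 1)), probs)

# Midpoints of a uniform grid on (0, 1) play the role of Lebesgue measure on Omega = [0, 1].
grid = 200000
freq = [0] * (n + 1)
for j in range(grid):
    freq[X((j + 0.5) / grid)] += 1
print([f / grid for f in freq], probs)
```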
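Note
$$\operatorname{Var} X = E[X^2 - 2X(E X) + (E X)^2] = E X^2 - (E X)^2.$$
Let us calculate a few examples. Since $x e^{-x^2/2}$ is an odd function, if $Z$ is a standard normal random variable, then
$$E Z = \int x\, \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = 0. \tag{A.4}$$
Using integration by parts,
$$\begin{aligned}
E Z^2 &= \int x^2 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx = \lim_{N\to\infty} \int_{-N}^N x^2 \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx\\
&= \lim_{N\to\infty}\Big(-\frac{2N}{\sqrt{2\pi}} e^{-N^2/2} + \int_{-N}^N \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx\Big)\\
&= \frac{1}{\sqrt{2\pi}} \int e^{-x^2/2}\, dx = 1,
\end{aligned} \tag{A.5}$$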
and so $\operatorname{Var} Z = 1$. By completing the square and a change of variables, we calculate
$$E e^{aZ} = \frac{1}{\sqrt{2\pi}} \int e^{ax} e^{-x^2/2}\, dx = e^{a^2/2}\, \frac{1}{\sqrt{2\pi}} \int e^{-(x-a)^2/2}\, dx = e^{a^2/2}.$$
If $X$ is a normal random variable with mean $\mu$ and variance $\sigma^2$, we can write $X = \mu + \sigma Z$ for $Z$ a standard normal random variable, and obtain
$$E e^{aX} = e^{a\mu} E e^{a\sigma Z} = e^{a\mu + a^2\sigma^2/2}. \tag{A.6}$$
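If $X$ is a Poisson random variable with parameter $\lambda$, then
$$E X = \sum_{k=0}^\infty k e^{-\lambda}\frac{\lambda^k}{k!} = \sum_{k=1}^\infty k e^{-\lambda}\frac{\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda. \tag{A.7}$$
A similar calculation shows that $E[X(X-1)] = \lambda^2$, so
$$\operatorname{Var} X = E[X(X-1)] + E X - (E X)^2 = \lambda. \tag{A.8}$$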
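The series in (A.7) and the "similar calculation" behind (A.8) can be summed numerically. A minimal sketch, with a hypothetical parameter $\lambda = 3.7$ and the series truncated at a cutoff where the remaining terms are negligible; the probabilities are computed recursively via $P(X = k) = P(X = k-1)\,\lambda/k$ to avoid large factorials.

```python
import math

def poisson_moments(lam, k_max=200):
    """Return (E X, E[X(X-1)]) for X ~ Poisson(lam), summing the series up to k_max."""
    term = math.exp(-lam)  # P(X = 0); the k = 0 term contributes nothing to either sum
    mean, factorial_moment = 0.0, 0.0
    for k in range(1, k_max + 1):
        term *= lam / k  # P(X = k) = e^{-lam} lam^k / k!
        mean += k * term
        factorial_moment += k * (k - 1) * term
    return mean, factorial_moment

lam = 3.7
mean, fm = poisson_moments(lam)
variance = fm + mean - mean**2  # (A.8)
print(mean, fm, variance)
```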
A straightforward application of integration by parts shows that if $X$ is an exponential random variable with parameter $\lambda$, then
$$E X = \int_0^\infty \lambda x e^{-\lambda x}\, dx = \frac{1}{\lambda}. \tag{A.9}$$
Another equality that is useful is the following.

Proposition A.4 If $X \ge 0$, a.s., and $p > 0$, then
$$E X^p = \int_0^\infty p\lambda^{p-1} P(X > \lambda)\, d\lambda.$$
The proof will show that this equality is also valid if we replace $P(X > \lambda)$ by $P(X \ge \lambda)$.

Proof Using the Fubini theorem and writing
$$\int_0^\infty p\lambda^{p-1} P(X > \lambda)\, d\lambda = E\int_0^\infty p\lambda^{p-1} 1_{(\lambda,\infty)}(X)\, d\lambda = E\int_0^X p\lambda^{p-1}\, d\lambda = E X^p$$
gives the proof.

We need two elementary inequalities. The first is known as Chebyshev's inequality.

Proposition A.5 If $X \ge 0$ and $a > 0$, then
$$P(X \ge a) \le \frac{E X}{a}.$$

Proof We write
$$P(X \ge a) = E\, 1_{[a,\infty)}(X) \le E\Big[\frac{X}{a}\, 1_{[a,\infty)}(X)\Big] \le E X/a,$$
since $X/a$ is bigger than or equal to 1 when $X \in [a,\infty)$.

If we apply this to $X = (Y - E Y)^2$, we obtain
$$P(|Y - E Y| \ge a) = P((Y - E Y)^2 \ge a^2) \le \operatorname{Var} Y/a^2. \tag{A.10}$$
This special case of Chebyshev’s inequality is sometimes itself referred to as Chebyshev’s inequality, while Proposition A.5 is sometimes called the Markov inequality. The second inequality we need is Jensen’s inequality, not to be confused with Jensen’s formula of complex analysis. Proposition A.6 Suppose g is convex and X and g(X ) are both integrable. Then g(E X ) ≤ E g(X ). Proof One property of convex functions is that they lie above their tangent lines, and more generally, their support lines. Thus if x0 ∈ R, we have g(x) ≥ g(x0 ) + c(x − x0 ) for some constant c. Letting x = X (ω) and taking expectations, we obtain
$$E g(X) \ge g(x_0) + c(E X - x_0).$$
Now set $x_0$ equal to $E X$.

If $A_n$ is a sequence of sets, define $(A_n \text{ i.o.})$, read “$A_n$ infinitely often,” by
$$(A_n \text{ i.o.}) = \cap_{n=1}^\infty \cup_{i=n}^\infty A_i.$$
This set consists of those ω that are in infinitely many of the An .
A simple but very important proposition is the Borel–Cantelli lemma. It has two parts, and we prove the first part here, leaving the second part to the next section.

Proposition A.7 Let $A_1, A_2, \ldots$ be a sequence of events. If $\sum_n P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.

Proof We write
$$P(A_n \text{ i.o.}) = \lim_{n\to\infty} P(\cup_{i=n}^\infty A_i) \le \limsup_{n\to\infty} \sum_{i=n}^\infty P(A_i) = 0,$$
and we are done.
A.2 Independence

We say two events $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$. The events $A_1, \ldots, A_n$ are independent if
$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}) = P(A_{i_1})P(A_{i_2})\cdots P(A_{i_j})$$
for each subset $\{i_1, \ldots, i_j\}$ of $\{1, \ldots, n\}$ with $1 \le i_1 < \cdots < i_j \le n$.

Proposition A.8 If $A$ and $B$ are independent, then $A^c$ and $B$ are independent.

Proof We write
$$P(A^c \cap B) = P(B) - P(A \cap B) = P(B) - P(A)P(B) = P(B)(1 - P(A)) = P(B)P(A^c).$$
This is all there is to the proof.

We say two $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$ are independent if $A$ and $B$ are independent whenever $A \in \mathcal{F}$ and $B \in \mathcal{G}$. Two random variables $X$ and $Y$ are independent if $\sigma(X)$, the $\sigma$-field generated by $X$, and $\sigma(Y)$, the $\sigma$-field generated by $Y$, are independent. (Recall that the $\sigma$-field generated by a random variable $X$ is given by $\{(X \in A) : A \text{ a Borel subset of } \mathbb{R}\}$.) We define the independence of $n$ $\sigma$-fields or $n$ random variables in a similar way.

Remark A.9 If $f$ and $g$ are Borel functions and $X$ and $Y$ are independent, then $f(X)$ and $g(Y)$ are independent. This follows because the $\sigma$-field generated by $f(X)$ is a sub-$\sigma$-field of the one generated by $X$, and similarly for $g(Y)$.

To construct independent random variables, we can use the following.

Proposition A.10 If $F_1, \ldots, F_n$ are distribution functions, there exist independent random variables $X_1, \ldots, X_n$ such that $F_{X_i} = F_i$, $i = 1, \ldots, n$.

Proof Let $\Omega = [0,1]^n$, $\mathcal{F}$ the Borel $\sigma$-field on $\Omega$, and $P$ $n$-dimensional Lebesgue measure on $\Omega$. If $\omega = (\omega_1, \ldots, \omega_n)$, define $X_i(\omega) = \sup\{x : F_i(x) < \omega_i\}$. As in Proposition A.2, $F_{X_i} = F_i$. We deduce the independence from the fact that $P$ is a product measure, in fact, the $n$-fold product of one-dimensional Lebesgue measure on $[0,1]$.
Let $F_{X,Y}(x,y) = P(X \le x, Y \le y)$ denote the joint distribution function of two random variables $X$ and $Y$. (The comma inside the set means “and”; this is a standard convention in probability.)

Proposition A.11 $F_{X,Y}(x,y) = F_X(x)F_Y(y)$ for all $x$ and $y$ if and only if $X$ and $Y$ are independent.

Proof If $X$ and $Y$ are independent, then
$$F_{X,Y}(x,y) = P(X \le x, Y \le y) = P(X \le x)P(Y \le y) = F_X(x)F_Y(y).$$
Conversely, if the equality holds, fix $y$ and let $\mathcal{M}_y$ denote the collection of sets $A$ for which $P(X \in A, Y \le y) = P(X \in A)P(Y \le y)$. $\mathcal{M}_y$ contains all sets of the form $(-\infty, x]$. It follows by linearity that $\mathcal{M}_y$ contains all sets of the form $(x, z]$, and then by linearity again, all sets that are the finite union of such half-open, half-closed intervals. Note that the collection of finite unions of such intervals, $\mathcal{A}$, is an algebra generating the Borel $\sigma$-field. It is clear that $\mathcal{M}_y$ is a monotone class, so by the monotone class theorem (Theorem B.2), $\mathcal{M}_y$ contains the Borel $\sigma$-field.

For a fixed set $A$, let $\mathcal{M}_A$ denote the collection of sets $B$ for which $P(X \in A, Y \in B) = P(X \in A)P(Y \in B)$. Again, $\mathcal{M}_A$ is a monotone class and by the preceding paragraph contains the $\sigma$-field generated by the collection of finite unions of intervals of the form $(x, z]$, and hence contains the Borel sets. Therefore $X$ and $Y$ are independent.

The following is known as the multiplication theorem.

Proposition A.12 If $X$, $Y$, and $XY$ are integrable and $X$ and $Y$ are independent, then $E[XY] = (E X)(E Y)$.

Proof Consider the pairs $(Z_X, Z_Y)$ with $Z_X$ being $\sigma(X)$ measurable and $Z_Y$ being $\sigma(Y)$ measurable for which the multiplication theorem is true. It holds for $Z_X = 1_A(X)$ and $Z_Y = 1_B(Y)$ with $A$ and $B$ Borel subsets of $\mathbb{R}$ by the definition of $X$ and $Y$ being independent. It holds for simple random variables $(Z_X, Z_Y)$, that is, linear combinations of indicators, by the linearity of both sides. It holds for non-negative random variables by monotone convergence. And it holds for integrable random variables by linearity again.

If $X_1, \ldots, X_n$ are independent, then so are $X_1 - E X_1, \ldots, X_n - E X_n$. Assuming everything is square integrable,
$$E[(X_1 - E X_1) + \cdots + (X_n - E X_n)]^2 = E(X_1 - E X_1)^2 + \cdots + E(X_n - E X_n)^2,$$
using the multiplication theorem to show that the expectations of the cross-product terms are zero. We have thus shown
$$\operatorname{Var}(X_1 + \cdots + X_n) = \operatorname{Var} X_1 + \cdots + \operatorname{Var} X_n.$$

We finish up this section by proving the second half of the Borel–Cantelli lemma.

Proposition A.13 Suppose $A_n$ is a sequence of independent events. If
$$\sum_{n=1}^\infty P(A_n) = \infty, \tag{A.11}$$
then $P(A_n \text{ i.o.}) = 1$.
Note that here the $A_n$ are independent, while in the first half of the Borel–Cantelli lemma no such assumption was necessary.

Proof Note
$$P(\cup_{i=n}^N A_i) = 1 - P(\cap_{i=n}^N A_i^c) = 1 - \prod_{i=n}^N P(A_i^c) = 1 - \prod_{i=n}^N (1 - P(A_i)) \ge 1 - \exp\Big(-\sum_{i=n}^N P(A_i)\Big),$$
using the inequality $1 - x \le e^{-x}$ for $x > 0$. As $N \to \infty$, the right-hand side tends to one, so $P(\cup_{i=n}^\infty A_i) = 1$. This holds for all $n$, which proves the result.
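A numerical illustration of the bound in this proof, under the hypothetical choice $P(A_i) = 1/i$ (so (A.11) holds) and $n = 2$. Here the product telescopes, $\prod_{i=n}^N (1 - 1/i) = (n-1)/N$, so $P(\cup_{i=n}^N A_i) = 1 - (n-1)/N \to 1$. The code also evaluates the lower bound $1 - \exp(-\sum_{i=n}^N P(A_i))$ used above, and for contrast the summable choice $P(A_i) = 1/i^2$, where $\prod_{i=2}^N (1 - 1/i^2) = (N+1)/(2N)$ and the union probability stays near $1/2$.

```python
import math

def union_probability(probs):
    """P(A_n u ... u A_N) for independent events with the given probabilities (proof of A.13)."""
    prod = 1.0
    for q in probs:
        prod *= 1.0 - q
    return 1.0 - prod

def lower_bound(probs):
    """1 - exp(-sum P(A_i)), from the inequality 1 - x <= e^{-x}."""
    return 1.0 - math.exp(-sum(probs))

n = 2
for N in (10, 1000, 100000):
    harmonic = [1.0 / i for i in range(n, N + 1)]    # sum diverges: (A.11) holds
    squares = [1.0 / i**2 for i in range(n, N + 1)]  # summable: first half of Borel-Cantelli applies
    print(N, union_probability(harmonic), lower_bound(harmonic), union_probability(squares))
```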
A.3 Convergence In this section we consider three ways a sequence of random variables X_n can converge. We say X_n converges to X almost surely if the event (X_n → X) has probability one. X_n converges to X in probability if for each ε > 0, P(|X_n − X| > ε) → 0 as n → ∞. For p ≥ 1, X_n converges to X in L^p if E|X_n − X|^p → 0 as n → ∞. The following proposition shows some relationships among the types of convergence. Proposition A.14 (1) If X_n → X almost surely, then X_n → X in probability. (2) If X_n → X in L^p, then X_n → X in probability. (3) If X_n → X in probability, there exists a subsequence n_j such that X_{n_j} converges to X almost surely. Proof To prove (1), note X_n − X tends to zero almost surely, so 1_{(−ε,ε)^c}(X_n − X) also converges to zero almost surely. Now apply the dominated convergence theorem. (2) comes from Chebyshev's inequality:
$$P(|X_n - X| > \varepsilon) = P(|X_n - X|^p > \varepsilon^p) \le E|X_n - X|^p/\varepsilon^p \to 0$$
as n → ∞. To prove (3), choose n_j larger than n_{j−1} such that P(|X_n − X| > 2^{−j}) < 2^{−j} whenever n ≥ n_j. Thus if we let A_i = (|X_{n_j} − X| > 2^{−i} for some j ≥ i), then P(A_i) ≤ 2^{−i+1}. By the Borel–Cantelli lemma, P(A_i i.o.) = 0. This implies X_{n_j} → X almost surely on the complement of (A_i i.o.). Let us give some examples to show there need not be any other implications among the three types of convergence. Let Ω = [0, 1], F the Borel σ-field, and P Lebesgue measure. Let X_n = n^2 1_{(0,1/n)}. Then clearly X_n converges to zero almost surely and in probability, but E X_n^p = n^{2p}/n → ∞ for any p ≥ 1. Let Ω be the unit circle, and let P be Lebesgue measure on the circle normalized to have total mass 1. We use θ to denote the angle that the ray from 0 through a point on the circle makes with the x axis. Let t_n = Σ_{i=1}^n i^{−1} (with t_0 = 0), and let A_n = {e^{iθ} : t_{n−1} ≤ θ < t_n}. Let X_n = 1_{A_n}.
Basic probability
Any point on the unit circle will be in infinitely many A_n, because the arcs are laid end to end with lengths 1/n and Σ 1/n = ∞, so X_n does not converge almost surely to zero. But P(A_n) = 1/(2πn) → 0, so X_n → 0 in probability and in L^p.
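The circle example can be checked by direct computation. The sketch uses an arbitrarily chosen point at angle θ_0 = 1 and the first N = 10^5 arcs, and lists which arcs A_n contain that point. There is one such n for each time the partial sums t_n pass the point again, so the list keeps growing as N → ∞, slowly, since t_n grows like log n. Meanwhile P(A_n) = 1/(2πn) shrinks.

```python
import math

def arcs_containing(theta0, N):
    """Return the indices n <= N whose arc A_n = {e^{i theta}: t_{n-1} <= theta < t_n}
    contains the point e^{i theta0}, where t_n = 1 + 1/2 + ... + 1/n and t_0 = 0."""
    hits = []
    t_prev = 0.0
    for n in range(1, N + 1):
        t_n = t_prev + 1.0 / n
        # The arc has length 1/n < 2*pi, so at most one lift theta0 + 2*pi*k lies in it:
        # take the smallest lift that is >= t_{n-1} and test whether it is < t_n.
        k = math.ceil((t_prev - theta0) / (2 * math.pi))
        if theta0 + 2 * math.pi * k < t_n:
            hits.append(n)
        t_prev = t_n
    return hits

N = 10 ** 5
hits = arcs_containing(1.0, N)
print(hits)                      # one index for each trip of t_n around the circle
print(1 / (2 * math.pi * N))     # P(A_N) = 1/(2 pi N)
```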
A.4 Uniform integrability A sequence {X_i} of random variables is uniformly integrable if
$$\sup_i \int_{(|X_i| > M)} |X_i|\, dP \to 0$$
as M → ∞. This can be rephrased by saying: given ε > 0 there exists M > 0 such that E[|X_i|; |X_i| > M] < ε for all i. Here M can be chosen independently of i. Lemma A.15 If {X_i} is a uniformly integrable sequence of random variables, then sup_i E|X_i| < ∞. Proof
There exists M such that E [ |Xi |; |Xi | > M] ≤ 1. Then
$$E|X_i| \le E[|X_i|; |X_i| \le M] + E[|X_i|; |X_i| > M] \le M + 1,$$
and we are done. We say a sequence of random variables {X_i} is uniformly absolutely continuous if given ε > 0 there exists δ > 0 such that sup_i E[|X_i|; A] ≤ ε whenever P(A) < δ. Proposition A.16 The following are equivalent. (1) The sequence {X_i} is uniformly integrable. (2) The sequence {X_i} is uniformly absolutely continuous and sup_i E|X_i| < ∞. Proof If (1) holds, we showed in Lemma A.15 that the expectations are uniformly bounded. Let ε > 0 and choose M such that sup_i E[|X_i|; |X_i| > M] < ε/2. Then if δ = ε/(2M) and P(A) < δ, we have
$$E[|X_i|; A] \le E[|X_i|; |X_i| > M] + E[|X_i|; |X_i| \le M, A] < \frac{\varepsilon}{2} + M P(A) \le \varepsilon.$$
Now suppose (2) holds. Let ε > 0 and choose δ such that E[|X_i|; A] < ε for all i if P(A) ≤ δ. Let M = sup_i E|X_i|/δ. Then by the Chebyshev inequality
$$P(|X_i| > M) \le \frac{E|X_i|}{M} \le \delta,$$
so E [ |Xi |; |Xi | > M] < ε. Proposition A.17 Suppose {Xi } and {Yi } are each uniformly integrable sequences of random variables. Then {Xi + Yi } is also a uniformly integrable sequence. Proof
By Proposition A.16,
$$\sup_i E|X_i + Y_i| \le \sup_i E|X_i| + \sup_i E|Y_i| < \infty.$$
Using Proposition A.16 again, given ε there exists δ such that E [ |Xi |; A] < ε/2 and E [ |Yi |; A] < ε/2 if P(A) < δ. But then E [ |Xi + Yi |; A] < ε and a third use of Proposition A.16 yields our result.
Proposition A.18 Suppose there exists ϕ : [0, ∞) → [0, ∞) such that ϕ is increasing, ϕ(x)/x → ∞ as x → ∞, and supi E ϕ(|Xi |) < ∞. Then the sequence {Xi } is uniformly integrable. Proof
Let ε > 0 and choose x_0 such that x/ϕ(x) < ε if x ≥ x_0. If M ≥ x_0,
$$\int_{(|X_i| > M)} |X_i| = \int \frac{|X_i|}{\varphi(|X_i|)}\,\varphi(|X_i|)\,1_{(|X_i| > M)} \le \varepsilon \int \varphi(|X_i|) \le \varepsilon \sup_i E\varphi(|X_i|).$$
Since ε is arbitrary, we are done. The main result we need in this section is the Vitali convergence theorem. Theorem A.19 If X_n → X almost surely and the sequence {X_n} is uniformly integrable, then E|X_n − X| → 0. Proof By Fatou's lemma and Lemma A.15, E|X| ≤ lim inf_n E|X_n| ≤ sup_n E|X_n| < ∞. So X is integrable and the constant sequence Y_i = −X is uniformly integrable. By Proposition A.17, the sequence X_i − X is uniformly integrable. Let ε > 0 and choose M such that
$$\sup_i \int_{(|X_i - X| > M)} |X_i - X|\, dP < \varepsilon.$$
By dominated convergence,
$$\limsup_{i\to\infty} E|X_i - X| \le \limsup_{i\to\infty} E[|X_i - X|; |X_i - X| \le M] + \varepsilon = \varepsilon.$$
Since ε is arbitrary, E|X_i − X| → 0.
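Uniform integrability cannot be dropped from Theorem A.19. Take Ω = [0, 1] with Lebesgue measure. Then X_n = n 1_{(0,1/n)} converges to 0 almost surely, but E|X_n − 0| = 1 for all n, while Y_n = √n 1_{(0,1/n)} has E|Y_n| = n^{−1/2} → 0. The sketch below computes sup_n E[|X_n|; |X_n| > M] for both sequences. The two sequences are illustrative choices, and the supremum over n is truncated at n ≤ 10^5, an assumption of the sketch.

```python
import math

def tail_expectation(height, n, M):
    """E[|X|; |X| > M] for X = height * 1_(0,1/n) under Lebesgue measure on [0, 1]."""
    return height / n if height > M else 0.0

def sup_tail(height_of, M, n_max=10 ** 5):
    """Approximate sup_n E[|X_n|; |X_n| > M] by a maximum over n <= n_max."""
    return max(tail_expectation(height_of(n), n, M) for n in range(1, n_max + 1))

for M in (10, 100):
    # X_n = n 1_(0,1/n) versus Y_n = sqrt(n) 1_(0,1/n)
    print(M, sup_tail(lambda n: n, M), sup_tail(math.sqrt, M))
```

For X_n the tail stays at 1 for every M, so the sequence is not uniformly integrable. For Y_n the worst tail, attained at the smallest n with √n > M, is about 1/M.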
A.5 Conditional expectation If F ⊂ G are two σ-fields and X is an integrable G measurable random variable, the conditional expectation of X given F, written E[X | F] and read as “the expectation (or expected value) of X given F,” is any F measurable random variable Y such that E[Y; A] = E[X; A] for every A ∈ F. The conditional probability of A ∈ G given F is defined by P(A | F) = E[1_A | F]. If Y_1, Y_2 are two F measurable random variables with E[Y_1; A] = E[Y_2; A] for all A ∈ F, then Y_1 = Y_2, a.s., and so conditional expectation is unique up to almost sure equivalence. In the case X is already F measurable, E[X | F] = X. If X is independent of F, E[X | F] = EX. Both of these facts follow immediately from the definition. For another example, if {A_i} is a finite collection of pairwise disjoint sets whose union is Ω, P(A_i) > 0 for all i, and F is the σ-field generated by the A_i's, then
$$P(A \mid \mathcal{F}) = \sum_i \frac{P(A \cap A_i)}{P(A_i)}\, 1_{A_i}. \qquad (A.12)$$
This follows since the right-hand side is F measurable and its expectation over any set A_i is P(A ∩ A_i). Equation (A.12) provides the link with the definition of conditional probability from elementary probability: if P(B) > 0, then
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}. \qquad (A.13)$$
We have
$$E\big[E[X \mid \mathcal{F}]\big] = EX \qquad (A.14)$$
because E[E[X | F]] = E[E[X | F]; Ω] = E[X; Ω] = EX, the middle equality holding since Ω ∈ F. The following is easy to establish. Proposition A.20 (1) If X ≥ Y are both integrable, then
$$E[X \mid \mathcal{F}] \ge E[Y \mid \mathcal{F}], \quad \text{a.s.}$$
(2) If X and Y are integrable and a ∈ R, then
E [aX + Y | F ] = aE [X | F ] + E [Y | F ]. It is easy to check that limit theorems such as monotone convergence and dominated convergence have conditional expectation versions, as do inequalities like Jensen’s and Chebyshev’s inequalities. Thus, for example, we have Jensen’s inequality for conditional expectations. Proposition A.21 If g is convex and X and g(X ) are integrable,
$$E[g(X) \mid \mathcal{F}] \ge g(E[X \mid \mathcal{F}]), \quad \text{a.s.}$$
A key fact is the following. Proposition A.22 If X and XY are integrable and Y is measurable with respect to F, then
$$E[XY \mid \mathcal{F}] = Y\,E[X \mid \mathcal{F}]. \qquad (A.15)$$
Proof If A ∈ F, then for any B ∈ F,
$$E\big[1_A E[X \mid \mathcal{F}]; B\big] = E\big[E[X \mid \mathcal{F}]; A \cap B\big] = E[X; A \cap B] = E[1_A X; B].$$
Since 1_A E[X | F] is F measurable, this shows that (A.15) holds when Y = 1_A and A ∈ F. Using linearity shows that (A.15) holds whenever Y is a simple F measurable random variable. Taking limits, (A.15) holds whenever Y ≥ 0 is F measurable and X and XY are integrable. Using linearity again completes the proof. Two other equalities are contained in the following. Proposition A.23 If E ⊂ F ⊂ G are σ-fields, then
$$E\big[E[X \mid \mathcal{F}] \mid \mathcal{E}\big] = E[X \mid \mathcal{E}] = E\big[E[X \mid \mathcal{E}] \mid \mathcal{F}\big].$$
Proof The right equality holds because E[X | E] is E measurable, hence F measurable. We then use the fact that if Y is F measurable, E[Y | F] = Y. To show the left equality, let A ∈ E. Then since A is also in F,
$$E\Big[E\big[E[X \mid \mathcal{F}] \mid \mathcal{E}\big]; A\Big] = E\big[E[X \mid \mathcal{F}]; A\big] = E[X; A] = E\big[E[X \mid \mathcal{E}]; A\big].$$
Since both sides are E measurable, the equality follows. To show the existence of E[X | F], we proceed as follows. Proposition A.24 If X is integrable, then E[X | F] exists.
Proof Using linearity, we need only consider X ≥ 0. Define a finite measure Q on F by Q(A) = E [X ; A] for A ∈ F . This is trivially absolutely continuous with respect to P|F , the restriction of P to F . Let E [X | F ] be the Radon–Nikodym derivative of Q with respect to P|F . Since Q and P|F are measures on F , the Radon–Nikodym derivative is F measurable, and so provides the desired random variable. When F = σ (Y ), one usually writes E [X | Y ] for E [X | F ]. Notation that is commonly used is E [X | Y = y]. The definition is as follows. If A ∈ σ (Y ), then A = (Y ∈ B) for some Borel set B by the definition of σ (Y ), or 1A = 1B (Y ). By linearity and taking limits, it follows that if Z is σ (Y ) measurable, then Z = f (Y ) for some Borel measurable function f . Set Z = E [X | Y ] and choose f Borel measurable so that Z = f (Y ). Then E [X | Y = y] is defined to be f (y). If X ∈ L2 and M = {Y ∈ L2 : Y is F measurable}, one can show that E [X | F ] is equal to the projection of X onto the subspace M.
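On a finite probability space, (A.12) turns conditional expectation into averaging over the cells of a partition, which makes the properties above easy to test exactly. The sketch assumes a hypothetical space of two fair dice, with F generated by the first die and a smaller σ-field E generated by the parity of the first die, so E ⊂ F. The names omega, cond_exp, and so on are illustrative only. Using exact rational arithmetic, it checks (A.14), Proposition A.23, Proposition A.22, and the projection property E[(X − E[X | F])Z] = 0 for F measurable Z.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two fair dice, each outcome with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}

def expect(f):
    """E f = sum over outcomes w of f(w) P({w})."""
    return sum(prob[w] * f(w) for w in omega)

def cond_exp(f, cell_of):
    """E[f | F] for the sigma-field F generated by the partition w -> cell_of(w).
    On each cell A_i the value is E[f; A_i] / P(A_i), as in (A.12)."""
    num, den = {}, {}
    for w in omega:
        c = cell_of(w)
        num[c] = num.get(c, 0) + prob[w] * f(w)
        den[c] = den.get(c, 0) + prob[w]
    return lambda w: num[cell_of(w)] / den[cell_of(w)]

X = lambda w: w[0] + w[1]          # total of the two dice
first = lambda w: w[0]             # partition generating F = sigma(first die)
parity = lambda w: w[0] % 2        # partition generating the smaller sigma-field E

Y = cond_exp(X, first)
print(Y((2, 5)))                   # 2 + 7/2
```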
A.6 Stopping times We next want to talk about stopping times. Suppose we have a sequence of σ-fields F_i such that F_i ⊂ F_{i+1} for each i. An example would be if F_i = σ(X_1, ..., X_i). A random mapping N from Ω to {0, 1, 2, ...} is called a stopping time if for each n, (N ≤ n) ∈ F_n. The proof of the following is immediate from the definitions. Proposition A.25 (1) Fixed times n are stopping times. (2) If N_1 and N_2 are stopping times, then so are N_1 ∧ N_2 and N_1 ∨ N_2. (3) If N_n is an increasing sequence of stopping times, then so is N = sup_n N_n. (4) If N_n is a decreasing sequence of stopping times, then so is N = inf_n N_n. (5) If N is a stopping time, then so is N + n. We define
$$\mathcal{F}_N = \{A : A \cap (N \le n) \in \mathcal{F}_n \text{ for all } n\}. \qquad (A.16)$$
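The condition (N ≤ n) ∈ F_n says that whether N has occurred by time n can be decided from the first n observations. The sketch below checks this by brute force in a hypothetical setting: ±1 coin-flip paths of length T = 6, with F_n generated by the first n flips. The first time the partial sums reach level 2, capped at T, passes the test. The last time the partial sums are at 0 fails, because deciding it requires looking into the future.

```python
from itertools import product

T = 6  # hypothetical horizon: paths are sequences of T fair +/-1 steps

def partial_sums(steps):
    """Return [S_0, S_1, ..., S_T] with S_0 = 0."""
    sums = [0]
    for x in steps:
        sums.append(sums[-1] + x)
    return sums

def first_hit(steps, level=2):
    """First time the partial sums reach `level`, capped at T."""
    return next((j for j, s in enumerate(partial_sums(steps)) if s == level), T)

def last_zero(steps):
    """Last time j <= T with S_j = 0 (not a stopping time)."""
    return max(j for j, s in enumerate(partial_sums(steps)) if s == 0)

def is_stopping_time(N):
    """Check that, for every n, the event (N <= n) depends only on the first n steps."""
    paths = list(product((1, -1), repeat=T))
    for n in range(T + 1):
        seen = {}
        for path in paths:
            event = N(path) <= n
            if seen.setdefault(path[:n], event) != event:
                return False
    return True

print(is_stopping_time(first_hit))   # True
print(is_stopping_time(last_zero))   # False
```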
A.7 Martingales In this section we consider martingales. Let Fn be an increasing sequence of σ -fields. A sequence of random variables Mn is adapted to Fn if for each n, Mn is Fn measurable. Mn is a martingale if Mn is adapted to Fn , Mn is integrable for all n, and
$$E[M_n \mid \mathcal{F}_{n-1}] = M_{n-1}, \quad \text{a.s.}, \quad n = 2, 3, \ldots \qquad (A.17)$$
If we have E[M_n | F_{n−1}] ≥ M_{n−1}, a.s., for every n, then M_n is a submartingale. If we have E[M_n | F_{n−1}] ≤ M_{n−1}, a.s., for every n, we have a supermartingale. Let us look at some examples. If X_i is a sequence of mean zero independent random variables, F_n = σ(X_1, ..., X_n), and S_n = Σ_{i=1}^n X_i, then M_n = S_n is a martingale, since
E [Mn | Fn−1 ] = Mn−1 + E [Mn − Mn−1 | Fn−1 ] = Mn−1 + E [Mn − Mn−1 ] = Mn−1 , using independence.
Another example is the following. If the X_i's are independent and have mean zero and variance one, S_n is as in the previous example, and M_n = S_n^2 − n, then
$$E[S_n^2 \mid \mathcal{F}_{n-1}] = E[(S_n - S_{n-1})^2 \mid \mathcal{F}_{n-1}] + 2S_{n-1}E[S_n \mid \mathcal{F}_{n-1}] - S_{n-1}^2 = 1 + S_{n-1}^2,$$
using independence. It follows that Mn is a martingale. A third example is the following: if X ∈ L1 and Mn = E [X | Fn ], then Mn is a martingale. The proof of this is simple:
E [Mn+1 | Fn ] = E [E [X | Fn+1 ] | Fn ] = E [X | Fn ] = Mn . If Mn is a martingale, g is convex, and g(Mn ) is integrable for each n, then by Jensen’s inequality for conditional expectations,
$$E[g(M_{n+1}) \mid \mathcal{F}_n] \ge g(E[M_{n+1} \mid \mathcal{F}_n]) = g(M_n), \qquad (A.18)$$
or g(Mn ) is a submartingale. Similarly if g is convex and increasing on [0, ∞) and Mn is a positive submartingale, then g(Mn ) is a submartingale because
$$E[g(M_{n+1}) \mid \mathcal{F}_n] \ge g(E[M_{n+1} \mid \mathcal{F}_n]) \ge g(M_n).$$
A.8 Optional stopping Note that if one takes expectations in (A.17), one has EM_n = EM_{n−1}, and by induction EM_n = EM_0. The theorem about martingales that lies at the basis of all other results is Doob's optional stopping theorem, which says that the same is true if we replace n by a stopping time N. There are various versions, depending on what conditions one puts on the stopping times. Theorem A.26 If N is a stopping time with respect to F_n that is bounded by a positive integer K and M_n is a martingale, then EM_N = EM_0. Proof
We write
$$EM_N = \sum_{k=0}^K E[M_N; N = k] = \sum_{k=0}^K E[M_k; N = k].$$
Note (N = k) is F_j measurable if j ≥ k, so
$$E[M_k; N = k] = E[M_{k+1}; N = k] = E[M_{k+2}; N = k] = \cdots = E[M_K; N = k].$$
Hence
$$EM_N = \sum_{k=0}^K E[M_K; N = k] = EM_K = EM_0.$$
This completes the proof. The same proof as that in Theorem A.26 gives the following corollary.
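As a sanity check on Theorem A.26 before the corollaries, here is an exact computation for a small example. It assumes M_n is the simple symmetric random walk (partial sums of fair ±1 steps, M_0 = 0) and takes N = τ ∧ K, where τ is the first time the walk reaches 1. The sketch enumerates all 2^K paths and computes EM_N and P(τ ≤ K) with rational arithmetic. EM_N = 0 for every K, even though P(τ ≤ K) grows toward 1. The unbounded time τ itself has M_τ = 1 whenever τ < ∞, so the boundedness hypothesis cannot simply be dropped.

```python
from fractions import Fraction
from itertools import product

def optional_stopping_check(K, level=1):
    """Exact E[M_N] and P(tau <= K) for the simple symmetric random walk M, where
    tau is the first time M hits `level` and N = tau ^ K (a bounded stopping time)."""
    weight = Fraction(1, 2 ** K)      # every +/-1 path of length K is equally likely
    e_stopped, p_hit = Fraction(0), Fraction(0)
    for steps in product((1, -1), repeat=K):
        s, hit = 0, False
        for x in steps:
            s += x
            if s == level:            # tau <= K: stop here, so M_N = level
                hit = True
                break
        if hit:
            p_hit += weight
        e_stopped += weight * s       # s is M_tau if the level was hit, otherwise M_K
    return e_stopped, p_hit

for K in (1, 2, 3, 6, 10):
    print(K, optional_stopping_check(K))
```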
Corollary A.27 If N is a stopping time bounded by K and Mn is a submartingale, then E MN ≤ E MK . The same proof also gives Corollary A.28 If N is a stopping time bounded by K, A ∈ FN , and Mn is a submartingale, then E [MN ; A] ≤ E [MK ; A]. Proposition A.29 If N1 ≤ N2 are stopping times bounded by K and M is a martingale, then E [MN2 | FN1 ] = MN1 , a.s. Proof Suppose A ∈ FN1 . We need to show E [MN1 ; A] = E [MN2 ; A]. Define a new stopping time N3 by
$$N_3(\omega) = \begin{cases} N_1(\omega), & \omega \in A, \\ N_2(\omega), & \omega \notin A. \end{cases}$$
It is easy to check that N_3 is a stopping time, so EM_{N_3} = EM_K = EM_{N_2} implies
$$E[M_{N_1}; A] + E[M_{N_2}; A^c] = E[M_{N_2}].$$
Subtracting E[M_{N_2}; A^c] from each side completes the proof. The following is known as the Doob decomposition for discrete time martingales. Proposition A.30 Suppose X_k is a submartingale with respect to an increasing sequence of σ-fields F_k. Then we can write X_k = M_k + A_k such that M_k is a martingale adapted to the F_k and A_k is a sequence of random variables with A_k being F_{k−1} measurable and A_0 ≤ A_1 ≤ ···. Proof Let a_k = E[X_k | F_{k−1}] − X_{k−1} for k = 1, 2, .... Since X_k is a submartingale, each a_k ≥ 0. Let A_k = Σ_{i=1}^k a_i. The fact that the A_k are increasing and measurable with respect to F_{k−1} is clear. Set M_k = X_k − A_k. Then
E [Mk+1 − Mk | Fk ] = E [Xk+1 − Xk | Fk ] − ak+1 = 0, or Mk is a martingale. Combining Propositions A.29 and A.30 we have Corollary A.31 Suppose Xk is a submartingale, and N1 ≤ N2 are bounded stopping times. Then
$$E[X_{N_2} \mid \mathcal{F}_{N_1}] \ge X_{N_1}.$$
A.9 Doob's inequalities The first interesting consequences of the optional stopping theorems are Doob's inequalities. If M_n is a martingale, set M_n^* = max_{i≤n} |M_i|. Theorem A.32 If M_n is a martingale or a positive submartingale,
$$P(M_n^* \ge a) \le \frac{1}{a} E[|M_n|; M_n^* \ge a] \le \frac{1}{a} E|M_n|.$$
Proof Fix n. Set M_{n+1} = M_n and F_{n+1} = F_n. Let N = min{j : |M_j| ≥ a} ∧ (n + 1). If M is a martingale, |M_n| is a submartingale by (A.18), since the function f(x) = |x| is convex; if M is a positive submartingale, |M_n| = M_n. If A = (M_n^* ≥ a), then A ∈ F_N and we have
$$aP(M_n^* \ge a) \le E[|M_N|; A] \le E[|M_n|; A] \le E|M_n|,$$
the first inequality by the definition of N, the second by Corollary A.28. For p > 1, we have the following inequality. Theorem A.33 If p > 1, M is a martingale or positive submartingale, and E|M_i|^p < ∞ for i ≤ n, then
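$$E(M_n^*)^p \le \Big(\frac{p}{p-1}\Big)^p E|M_n|^p.$$
Proof Note M_n^* ≤ Σ_{i=1}^n |M_i|, hence M_n^* ∈ L^p. We write, using Theorem A.32,
$$\begin{aligned} E(M_n^*)^p &= \int_0^\infty p a^{p-1} P(M_n^* > a)\,da \le \int_0^\infty p a^{p-1} E\big[|M_n| 1_{(M_n^* \ge a)}/a\big]\,da \\ &= E\int_0^{M_n^*} p a^{p-2} |M_n|\,da = \frac{p}{p-1} E\big[(M_n^*)^{p-1}|M_n|\big] \\ &\le \frac{p}{p-1}\big(E(M_n^*)^p\big)^{(p-1)/p}\big(E|M_n|^p\big)^{1/p}. \end{aligned}$$
The last inequality follows by Hölder's inequality. Now divide both sides by the quantity (E(M_n^*)^p)^{(p−1)/p}; if it is zero there is nothing to prove.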
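Both of Doob's inequalities can be checked exactly by enumerating all 2^n paths of the simple symmetric random walk, an illustrative martingale not used in the text. The sketch computes P(M_n^* ≥ a), E[|M_n|; M_n^* ≥ a] and E|M_n| for Theorem A.32. It also compares E(M_n^*)^2 with 4EM_n^2 = 4n, which is Theorem A.33 with p = 2.

```python
from fractions import Fraction
from itertools import product

def doob_check(n, a):
    """Exact quantities in Theorems A.32 and A.33 (p = 2) for the simple symmetric
    random walk M_k = S_k, with M_n^* = max_{i <= n} |M_i|."""
    weight = Fraction(1, 2 ** n)               # each +/-1 path of length n is equally likely
    p_max = e_on_event = e_abs = e_max_sq = e_sq = Fraction(0)
    for steps in product((1, -1), repeat=n):
        s, running_max = 0, 0
        for x in steps:
            s += x
            running_max = max(running_max, abs(s))
        if running_max >= a:
            p_max += weight                    # P(M_n^* >= a)
            e_on_event += weight * abs(s)      # E[|M_n|; M_n^* >= a]
        e_abs += weight * abs(s)               # E|M_n|
        e_max_sq += weight * running_max ** 2  # E (M_n^*)^2
        e_sq += weight * s ** 2                # E M_n^2, which equals n
    return p_max, e_on_event, e_abs, e_max_sq, e_sq

for n in (4, 8, 12):
    p_max, e_on_event, e_abs, e_max_sq, e_sq = doob_check(n, a=2)
    print(n, 2 * p_max <= e_on_event <= e_abs, e_max_sq <= 4 * e_sq)
```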
A.10 Martingale convergence theorem The martingale convergence theorem is another important consequence of optional stopping. The main step is the upcrossing lemma. The number of upcrossings of an interval [a, b] is the number of times a process M crosses from below a to above b. To be more exact, let
$$S_1 = \min\{k : M_k \le a\}, \qquad T_1 = \min\{k > S_1 : M_k \ge b\},$$
and
$$S_{i+1} = \min\{k > T_i : M_k \le a\}, \qquad T_{i+1} = \min\{k > S_{i+1} : M_k \ge b\}.$$
The number of upcrossings U_n before time n is U_n = max{j : T_j ≤ n}. Theorem A.34 (Upcrossing lemma) If M_k is a submartingale,
$$EU_n \le \frac{1}{b-a} E[(M_n - a)^+].$$
Proof The number of upcrossings of [a, b] by M_k is the same as the number of upcrossings of [0, b − a] by Y_k = (M_k − a)^+, where x^+ = x ∨ 0. Moreover Y_k is still a submartingale. If we obtain the inequality for the number of upcrossings of the interval [0, b − a] by the process Y_k, we will have the desired inequality for upcrossings of M. Thus we may assume a = 0. Fix n and define Y_{n+1} = Y_n. This will still be a submartingale. Define S_i, T_i as above, and let S'_i = S_i ∧ (n + 1), T'_i = T_i ∧ (n + 1). Since T_{i+1} > S_{i+1} > T_i, then T'_{n+1} = n + 1.
We write
$$EY_{n+1} = EY_{S'_1} + \sum_{i=1}^{n+1} E[Y_{T'_i} - Y_{S'_i}] + \sum_{i=1}^{n+1} E[Y_{S'_{i+1}} - Y_{T'_i}].$$
All the summands in the third term on the right are non-negative since Y_k is a submartingale and T'_i ≤ S'_{i+1} are bounded stopping times (Corollary A.31). The first term on the right will be non-negative since Y is non-negative. For the jth upcrossing, Y_{T'_j} − Y_{S'_j} ≥ b − a, while Y_{T'_j} − Y_{S'_j} is always greater than or equal to 0. Thus
$$\sum_{i=1}^{n+1} (Y_{T'_i} - Y_{S'_i}) \ge (b - a)U_n.$$
Hence
$$EU_n \le \frac{1}{b-a} EY_{n+1}. \qquad (A.19)$$
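The bookkeeping in the definition of S_i and T_i is easy to get wrong, so here is a sketch of it, together with an exact check of the upcrossing lemma. The check uses the simple symmetric random walk with M_0 = 0, an assumed example; any submartingale would do. count_upcrossings is a hypothetical helper that scans a path once, alternating between waiting for a visit to (−∞, a], a time S_i, and waiting for a visit to [b, ∞), the matching T_i.

```python
from fractions import Fraction
from itertools import product

def count_upcrossings(path, a, b):
    """Completed upcrossings of [a, b] by the sequence `path` (indices 0, 1, ..., n)."""
    count, below = 0, False
    for m in path:
        if not below and m <= a:
            below = True              # an S_i has occurred
        elif below and m >= b:
            count += 1                # the matching T_i has occurred
            below = False
    return count

def upcrossing_check(n, a, b):
    """Exact E U_n and the bound E[(M_n - a)^+]/(b - a) for the simple symmetric
    random walk M_k = S_k (a martingale, hence a submartingale), with M_0 = 0."""
    weight = Fraction(1, 2 ** n)
    e_up, e_plus = Fraction(0), Fraction(0)
    for steps in product((1, -1), repeat=n):
        path = [0]
        for x in steps:
            path.append(path[-1] + x)
        e_up += weight * count_upcrossings(path, a, b)
        e_plus += weight * max(path[-1] - a, 0)
    return e_up, e_plus / (b - a)

print(count_upcrossings([0, -1, 1, -1, 2], 0, 1))   # 2
for n in (6, 10, 14):
    print(n, upcrossing_check(n, -1, 1))
```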
This leads to the martingale convergence theorem. Theorem A.35 If M_n is a submartingale such that sup_n EM_n^+ < ∞, then M_n converges almost surely as n → ∞. Proof For each a < b, let U_n(a, b) be the number of upcrossings of [a, b] by M up to time n, and let U(a, b) = lim_{n→∞} U_n(a, b). For each pair a < b of rational numbers, by monotone convergence,
$$EU(a, b) \le \frac{1}{b-a} \sup_n E(M_n - a)^+ \le \frac{1}{b-a}\Big(\sup_n EM_n^+ + |a|\Big) < \infty.$$
Thus U(a, b) < ∞, a.s. Let N_{a,b} be the set of ω's where U(a, b) = ∞, and let N be the union of the N_{a,b} over all rational a < b; then P(N) = 0. If ω ∉ N and lim sup_{n→∞} M_n(ω) > lim inf_{n→∞} M_n(ω), we could choose rationals a < b strictly between lim inf_{n→∞} M_n(ω) and lim sup_{n→∞} M_n(ω), and then U(a, b)(ω) = ∞, a contradiction. Therefore M_n converges almost surely, although we still have to rule out the possibility of the limit being infinite. Since M_n is a submartingale, EM_n ≥ EM_0, and thus
$$E|M_n| = EM_n^+ + EM_n^- = 2EM_n^+ - EM_n \le 2EM_n^+ - EM_0.$$
By Fatou's lemma,
$$E\lim_n |M_n| \le \sup_n E|M_n| \le 2\sup_n EM_n^+ - EM_0 < \infty,$$
or Mn converges almost surely to a finite limit. Corollary A.36 If Xn is a positive supermartingale or a martingale bounded above or below, Xn converges almost surely. Proof If Xn is a positive supermartingale, −Xn is a submartingale bounded above by 0. Now apply Theorem A.35. If Xn is a martingale bounded above, by considering −Xn , we may assume Xn is bounded below. Looking at Xn + M for fixed M will not affect the convergence, so we may assume Xn is bounded below by 0. Now apply the first assertion of the corollary.
M_n is a uniformly integrable martingale if the collection of random variables {M_n} is uniformly integrable. Proposition A.37 (1) If M_n is a martingale with sup_n E|M_n|^p < ∞ for some p > 1, then the convergence is in L^p as well as almost surely. This is also true when M_n is a submartingale. (2) If M_n is a uniformly integrable martingale, then the convergence is in L^1. (3) If M_n → M_∞ in L^1, then M_n = E[M_∞ | F_n]. Proof (1) If sup_n E|M_n|^p < ∞, then sup_n EM_n^+ < ∞ and M_n converges almost surely. Let M_∞ be the limit. Then |M_n − M_∞| → 0, a.s., and
$$E\sup_n |M_n - M_\infty|^p \le cE\sup_n |M_n|^p + cE|M_\infty|^p \le cE\sup_n |M_n|^p \le c\sup_n E|M_n|^p < \infty.$$
The second inequality is by Fatou's lemma and the last by Doob's inequalities, Theorem A.33. The L^p convergence assertion now follows by dominated convergence. (2) The L^1 convergence assertion follows since almost sure convergence together with uniform integrability implies L^1 convergence by the Vitali convergence theorem, Theorem A.19. (3) Finally, if j < n, we have M_j = E[M_n | F_j]. If A ∈ F_j,
$$E[M_j; A] = E[M_n; A] \to E[M_\infty; A]$$
by the L^1 convergence of M_n to M_∞. Since this is true for all A ∈ F_j, M_j = E[M_∞ | F_j].
A.11 Strong law of large numbers Suppose we have a sequence X_1, X_2, ... of independent and identically distributed random variables. This means that the X_i are independent and each has the same law as X_1. This situation is very common, and we abbreviate this by saying the X_i are i.i.d. Define
$$S_n = \sum_{i=1}^n X_i.$$
The S_n are called partial sums. In this section we suppose E|X_1| < ∞. The strong law of large numbers is the precise version of the law of averages. Theorem A.38 If X_i is an i.i.d. sequence and E|X_1| < ∞, then
$$\frac{S_n}{n} \to EX_1, \quad \text{a.s.}$$
The proof we give is a mixture of the standard one and some martingale techniques. The standard proof (see, e.g., Chung (2001)) uses no martingale methods, while there is a proof (see Durrett (1996)) that is entirely martingale based.
Proof We may assume EX_i = 0, for otherwise we replace X_i by X_i − EX_i. Let Y_n = X_n 1_{(|X_n| ≤ n)}, Z_n = Y_n − EY_n, and
$$M_n = \sum_{i=1}^n \frac{Z_i}{i}.$$
Let F_n = σ(X_1, ..., X_n). Note that the Z_i are independent but not identically distributed. Using the independence, M_n is a martingale:
$$E[M_{n+1} \mid \mathcal{F}_n] = M_n + \frac{1}{n+1} E[Z_{n+1} \mid \mathcal{F}_n] = M_n + \frac{1}{n+1} E[Z_{n+1}] = M_n.$$
We will need the estimate
$$\sum_{i=1}^\infty P(|X_1| \ge i) = \sum_{i=1}^\infty \int_{i-1}^i P(|X_1| \ge i)\,dx \le \int_0^\infty P(|X_1| \ge x)\,dx = E|X_1| < \infty, \qquad (A.20)$$
using Proposition A.4. We show that E |Mn | is bounded by a constant not depending on n. In fact, again using Proposition A.4,
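$$\begin{aligned} EM_n^2 &= \operatorname{Var} M_n = \sum_{i=1}^n \frac{\operatorname{Var} Z_i}{i^2} = \sum_{i=1}^n \frac{\operatorname{Var} Y_i}{i^2} \le \sum_{i=1}^n \frac{EY_i^2}{i^2} \le \sum_{i=1}^\infty \frac{1}{i^2}\int_0^i 2yP(|X_i| \ge y)\,dy \\ &= 2\sum_{i=1}^\infty \frac{1}{i^2}\int_0^\infty 1_{(y \le i)}\, yP(|X_1| \ge y)\,dy = 2\int_0^\infty \sum_{i=1}^\infty \frac{1}{i^2} 1_{(y \le i)}\, yP(|X_1| \ge y)\,dy \\ &\le c\int_0^\infty \frac{1}{y}\cdot yP(|X_1| \ge y)\,dy = c\int_0^\infty P(|X_1| \ge y)\,dy = cE|X_1| < \infty. \end{aligned}$$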
The uniform bound on E|M_n| follows by Jensen's inequality. By the martingale convergence theorem, M_n converges almost surely; let M_∞ be the limit. Some elementary calculus shows that (1/n) Σ_{i=1}^n M_i also converges to M_∞, a.s. We now use summation by parts as follows. Since i(M_i − M_{i−1}) = Z_i and M_0 = 0, then
$$\frac{1}{n}\sum_{i=1}^n Z_i = \frac{1}{n}\sum_{i=1}^n (iM_i - iM_{i-1}) = \frac{1}{n}\Big(\sum_{i=1}^n iM_i - \sum_{i=1}^{n-1} (i+1)M_i\Big) = M_n - \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n-1} M_i \to M_\infty - M_\infty = 0.$$
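The summation by parts step is pure algebra, so it can be confirmed exactly. The sketch below takes arbitrary integer values for Z_1, ..., Z_n. These are random test data, not the random variables of the proof. It builds M_i = Σ_{k≤i} Z_k/k and compares (1/n) Σ Z_i with M_n − (1/n) Σ_{i=1}^{n−1} M_i using rational arithmetic.

```python
import random
from fractions import Fraction

def summation_by_parts(z):
    """Return both sides of (1/n) sum_i Z_i = M_n - (1/n) sum_{i<n} M_i,
    where M_i = Z_1/1 + ... + Z_i/i and M_0 = 0."""
    n = len(z)
    M = [Fraction(0)]
    for i, zi in enumerate(z, start=1):
        M.append(M[-1] + Fraction(zi, i))      # M_i = M_{i-1} + Z_i / i
    lhs = sum(Fraction(zi) for zi in z) / n
    rhs = M[n] - sum(M[1:n], Fraction(0)) / n
    return lhs, rhs

rng = random.Random(7)
z = [rng.randint(-5, 5) for _ in range(25)]    # arbitrary integer test data
print(summation_by_parts(z))                   # the two sides agree exactly
```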
By dominated convergence and the fact that the Xi are identically distributed,
$$EY_n = E[X_n 1_{(|X_n| \le n)}] = E[X_1 1_{(|X_1| \le n)}] \to EX_1 = 0$$
as n → ∞, and this implies (1/n) Σ_{i=1}^n EY_i → 0. Since Y_i = Z_i + EY_i, we conclude
$$\frac{1}{n}\sum_{i=1}^n Y_i \to 0, \quad \text{a.s.}$$
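Finally,
$$\sum_{i=1}^\infty P(X_i \ne Y_i) \le \sum_{i=1}^\infty P(|X_i| \ge i) = \sum_{i=1}^\infty P(|X_1| \ge i) < \infty$$
by (A.20), so by the Borel–Cantelli lemma, except for a set of probability zero, X_i = Y_i for all i greater than some positive integer I (I depends on ω). Hence
$$\Big|\frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n Y_i\Big| \le \frac{1}{n}\sum_{i=1}^I |X_i - Y_i| \to 0, \quad \text{a.s.}$$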
This completes the proof. The following extension of the strong law will be needed when comparing a random walk and a Brownian motion. Proposition A.39 Suppose X_i is an i.i.d. sequence and E|X_1| < ∞. Then
$$\frac{\max_{k \le n} |S_k - ES_k|}{n} \to 0, \quad \text{a.s.}$$
Proof By looking at X_i − EX_i, we may assume EX_i = 0. Let j(n) be (one of) the value(s) of j ≤ n such that |S_j| = max_{k≤n} |S_k|. Suppose S_n(ω)/n → 0, which holds for almost every ω by Theorem A.38. It suffices to show |S_{j(n)}(ω)|/n → 0 for every such ω. If not, for this ω, either (1) there is a subsequence n_k → ∞ and ε > 0 such that j(n_k) → ∞ and |S_{j(n_k)}|/n_k ≥ ε for all k; or (2) there exists a subsequence n_k → ∞, ε > 0, and N > 1 such that j(n_k) ≤ N and |S_{j(n_k)}|/n_k ≥ ε for all k. In case (1), since j(n_k) → ∞,
$$\frac{|S_{j(n_k)}|}{n_k} = \frac{|S_{j(n_k)}|}{j(n_k)}\cdot\frac{j(n_k)}{n_k} \le \frac{|S_{j(n_k)}|}{j(n_k)} \to 0,$$
a contradiction. In case (2),
$$\frac{|S_{j(n_k)}|}{n_k} \le \frac{\max_{m \le N} |S_m|}{n_k} \to 0,$$
also a contradiction. Another application of the strong law of large numbers is the Glivenko–Cantelli theorem. Let X_i be i.i.d. random variables which have a uniform distribution on [0, 1],
that is, P(X_1 ≤ t) = t if 0 ≤ t ≤ 1. Let
$$F_n(t) = \frac{1}{n}\sum_{i=1}^n 1_{[0,t]}(X_i), \qquad 0 \le t \le 1.$$
By the strong law, F_n(t) → t, a.s., for each t. The Glivenko–Cantelli theorem says that the convergence is uniform over t. Theorem A.40 With F_n as above,
$$\sup_{0 \le t \le 1} |F_n(t) - t| \to 0, \quad \text{a.s.}$$
Proof For each t ∈ [0, 1], 1_{[0,t]}(X_i) is a sequence of i.i.d. random variables with expectation P(X_i ≤ t) = t. By the strong law of large numbers, for each t, F_n(t) → t, a.s. Let N_t be the set of ω such that F_n(t)(ω) does not converge to t, and let N = ∪_{t ∈ Q ∩ [0,1]} N_t. Then P(N) = 0. Let ε > 0 and take ω ∉ N. Take m > 2/ε and choose n_0 large enough (depending on ω) such that
$$|F_n(k/m)(\omega) - k/m| < \varepsilon/2, \qquad k = 0, 1, 2, \ldots, m,$$
if n ≥ n_0. Then if n ≥ n_0 and k/m ≤ t < (k + 1)/m,
$$F_n(t) - t \le F_n((k+1)/m) - k/m \le F_n((k+1)/m) - (k+1)/m + \varepsilon/2 < \varepsilon,$$
and similarly F_n(t) − t > −ε. Hence for n ≥ n_0,
$$\sup_{t \in [0,1]} |F_n(t) - t| \le \varepsilon.$$
Since ε is arbitrary, this proves the uniform convergence.
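For the empirical distribution function, the supremum in Theorem A.40 can be computed exactly from the sorted sample, because F_n only jumps at the sample points and the deviation is extreme at a jump. The sketch below relies on that observation, a standard one not taken from the text, and on Python's pseudo-random uniforms with ties ignored, since they have probability zero. It prints the sample mean, which should approach 1/2 by the strong law, and sup_t |F_n(t) − t|, which should approach 0.

```python
import random

def sup_deviation(sample):
    """sup over 0 <= t <= 1 of |F_n(t) - t|, where F_n is the empirical distribution
    function of `sample`. F_n jumps from i/n to (i+1)/n at the (i+1)-th order
    statistic and is flat in between, so only the jump points need checking."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(12345)   # fixed seed so that runs are reproducible
for n in (100, 1000, 10000, 100000):
    sample = [rng.random() for _ in range(n)]
    # sample mean (strong law, limit 1/2) and sup-distance (Glivenko-Cantelli, limit 0)
    print(n, sum(sample) / n, sup_deviation(sample))
```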
A.12 Weak convergence
We will see soon that if the X_i are i.i.d. with mean zero and variance one, then S_n/√n converges in the sense that
$$P(S_n/\sqrt{n} \in [a, b]) \to P(Z \in [a, b]),$$
where Z is a standard normal. We want to generalize the above type of convergence. We say F_n converges weakly to F if F_n(x) → F(x) for all x at which F is continuous. Here F_n and F are distribution functions. We say X_n converges weakly to X if F_{X_n} converges weakly to F_X. We also say X_n converges in distribution or converges in law to X. Probabilities μ_n converge weakly if their corresponding distribution functions converge, that is, if F_{μ_n}(x) = μ_n(−∞, x] converges weakly. An example that illustrates why we restrict the convergence to continuity points of F is the following. Let X_n = 1/n with probability one, and X = 0 with probability one. F_{X_n}(x) is 0 if x < 1/n and 1 otherwise. Note F_{X_n}(x) converges to F_X(x) for all x except x = 0. Proposition A.41 X_n converges weakly to X if and only if E g(X_n) → E g(X) for all g bounded and continuous.
Proof Suppose E g(X_n) → E g(X) whenever g is bounded and continuous. Let ε > 0 and suppose x is a continuity point of F_X. Choose δ such that F_X(x) − ε < F_X(x − δ) ≤ F_X(x + δ) < F_X(x) + ε. Let g be a continuous function taking values in [0, 1] such that g equals 1 on (−∞, x] and equals 0 on [x + δ, ∞). Then
$$\limsup_{n\to\infty} F_{X_n}(x) \le \limsup_{n\to\infty} E g(X_n) = E g(X) \le F_X(x + \delta) < F_X(x) + \varepsilon.$$
A similar argument shows that lim inf_{n→∞} F_{X_n}(x) ≥ F_X(x) − ε. Since ε is arbitrary, lim_{n→∞} F_{X_n}(x) = F_X(x). Now suppose X_n → X weakly. Let ε > 0 and choose M > 0 such that M and −M are continuity points for F_X and also continuity points for each of the F_{X_n}, F_X(−M) < ε, and F_X(M) > 1 − ε. Suppose g is bounded and continuous on R and without loss of generality suppose g is bounded by 1. Then
$$\limsup_{n\to\infty} |E[g(X_n); X_n \notin [-M, M)]| \le \limsup_{n\to\infty} P(|X_n| \ge M) = \limsup_{n\to\infty} F_{X_n}(-M) + \limsup_{n\to\infty} (1 - F_{X_n}(M)) \le 2\varepsilon. \qquad (A.21)$$
Similarly,
$$|E[g(X); X \notin [-M, M)]| \le 2\varepsilon. \qquad (A.22)$$
Take f to be a step function of the form Σ_{i=1}^m c_i 1_{(a_i, b_i]} such that |f(x) − g(x)| < ε for x ∈ [−M, M) and each a_i and b_i is a continuity point for F_X and also a continuity point for each of the F_{X_n}. Then
$$E f(X_n) = \sum_{i=1}^m c_i (F_{X_n}(b_i) - F_{X_n}(a_i)) \to \sum_{i=1}^m c_i (F_X(b_i) - F_X(a_i)) = E f(X). \qquad (A.23)$$
Finally, since f differs from g by at most ε on [−M, M), then
$$|E f(X_n) - E[g(X_n); X_n \in [-M, M)]| \le \varepsilon \qquad (A.24)$$
and similarly when Xn is replaced by X . Combining (A.21), (A.22), (A.23), and (A.24) and using the fact that ε is arbitrary shows that E g(Xn ) → E g(X ). Let us examine the relationship between weak convergence and convergence in probability. If Xi is an i.i.d. sequence, then Xi converges weakly, in fact, to X1 , since all the Xi ’s have the same distribution. But from the independence it is not hard to see that the sequence Xi does not converge in probability unless the Xi ’s are identically constant. Therefore one can have weak convergence without convergence in probability.
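The example X_n = 1/n, X = 0, used above to motivate the restriction to continuity points, can be tabulated directly. The sketch shows that F_{X_n}(0) = 0 for every n although F_X(0) = 1, so convergence fails exactly at the discontinuity point of F_X. At every other x, F_{X_n}(x) → F_X(x), and E g(X_n) = g(1/n) → g(0) = E g(X) for a bounded continuous g (arctan here, an arbitrary choice), as Proposition A.41 requires.

```python
import math

def F_Xn(x, n):
    """Distribution function of X_n = 1/n (a point mass)."""
    return 1.0 if x >= 1.0 / n else 0.0

def F_X(x):
    """Distribution function of X = 0 (a point mass)."""
    return 1.0 if x >= 0.0 else 0.0

# At the discontinuity x = 0 of F_X the distribution functions do not converge ...
print([F_Xn(0.0, n) for n in (1, 10, 100)], F_X(0.0))
# ... but at a continuity point they do, eventually,
print([F_Xn(0.05, n) for n in (1, 10, 100)], F_X(0.05))
# and E g(X_n) - E g(X) = arctan(1/n) - arctan(0) tends to 0.
print([abs(math.atan(1.0 / n) - math.atan(0.0)) for n in (1, 10, 100)])
```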
Proposition A.42 (1) If X_n converges to X in probability, then it converges weakly. (2) If X_n converges weakly to a constant, it converges in probability. (3) (Slutsky's theorem) If X_n converges weakly to X and Y_n converges weakly to a constant b, then X_n + Y_n converges weakly to X + b and X_nY_n converges weakly to bX. Proof To prove (1), let g be a bounded and continuous function. If n_j is any subsequence, then by Proposition A.14(3) there exists a further subsequence n_{j_k} such that X_{n_{j_k}} converges almost surely to X. Then by dominated convergence, E g(X_{n_{j_k}}) → E g(X). Since every subsequence of the real numbers E g(X_n) has a further subsequence converging to E g(X), the whole sequence E g(X_n) converges to E g(X). For (2), if X_n converges weakly to b,
P(Xn − b > ε) = P(Xn > b + ε) = 1 − P(Xn ≤ b + ε) → 1 − P(b ≤ b + ε) = 0. We use the fact that if Y is identically equal to b, then b + ε is a point of continuity for FY . A similar equation shows P(Xn − b ≤ −ε) → 0, so P(|Xn − b| > ε) → 0. We now prove the first part of (3), leaving the second part for the reader. Let x be a point such that x − b is a continuity point of FX . Choose ε so that x − b + ε is again a continuity point. Then
$$P(X_n + Y_n \le x) \le P(X_n + b \le x + \varepsilon) + P(|Y_n - b| > \varepsilon) \to P(X \le x - b + \varepsilon).$$
Hence lim sup P(X_n + Y_n ≤ x) ≤ P(X + b ≤ x + ε). Since ε can be arbitrarily small and x − b is a continuity point of F_X, then lim sup P(X_n + Y_n ≤ x) ≤ P(X + b ≤ x). The lim inf is done similarly. We say a sequence of distribution functions {F_n} is tight if for each ε > 0 there exists M such that F_n(M) ≥ 1 − ε and F_n(−M) ≤ ε for all n. A sequence of random variables {X_n} is tight if the corresponding distribution functions are tight; this is equivalent to requiring that for each ε > 0 there exists M with P(|X_n| ≥ M) ≤ ε for all n. Theorem A.43 (Helly's theorem) Let F_n be a sequence of distribution functions that is tight. There exists a subsequence n_j and a distribution function F such that F_{n_j} converges weakly to F. What could conceivably happen is that X_n is identically equal to n, so that F_{X_n} → 0, but the function F that is identically equal to 0 is not a distribution function; the tightness precludes this. Proof Let q_k be an enumeration of the rationals. Since F_n(q_k) ∈ [0, 1], any subsequence has a further subsequence that converges. Use a diagonalization argument (as in the proof of the Ascoli–Arzelà theorem; see Rudin (1976)) so that F_{n_j}(q_k) converges for each q_k and call the limit F(q_k). F is increasing, and define F(x) = inf_{q_k ≥ x} F(q_k). Hence F is right continuous and increasing. If x is a point of continuity of F and ε > 0, then there exist r and s rational such that r < x < s and F(s) − ε < F(x) < F(r) + ε. Then F_{n_j}(x) ≥ F_{n_j}(r) → F(r) > F(x) − ε and F_{n_j}(x) ≤ F_{n_j}(s) → F(s) < F(x) + ε. Since ε is arbitrary, F_{n_j}(x) → F(x).
Since the Fn are tight, there exists M such that Fn (−M ) < ε. Then F (−M ) ≤ ε, which implies limx→−∞ F (x) = 0. Showing limx→∞ F (x) = 1 is similar. Therefore F is in fact a distribution function. We conclude by giving an easily checked criterion for tightness. Proposition A.44 Suppose there exists ϕ : [0, ∞) → [0, ∞) that is increasing and ϕ(x) → ∞ as x → ∞. If a = supn E ϕ(|Xn |) < ∞, then the sequence {Xn } is tight. Proof
Let ε > 0. Choose M such that ϕ(x) ≥ a/ε if x > M. Then
$$P(|X_n| > M) \le \int \frac{\varphi(|X_n|)}{a/\varepsilon}\, 1_{(|X_n| > M)}\,dP \le \frac{\varepsilon}{a} E\varphi(|X_n|) \le \varepsilon.$$
The conclusion follows. In particular, if supn E |Xn |2 < ∞, the sequence {Xn } is tight.
A.13 Characteristic functions We define the characteristic function of a random variable X by ϕ_X(t) = E e^{itX} for t ∈ R. Note that ϕ_X(t) = ∫ e^{itx} P_X(dx). Thus if X and Y have the same law, they have the same characteristic function. Also, if the law of X has a density, that is, P_X(dx) = f_X(x) dx, then ϕ_X(t) = ∫ e^{itx} f_X(x) dx, so in this case the characteristic function is the same as the definition of the Fourier transform of f_X.
Proposition A.45 ϕ(0) = 1, |ϕ(t)| ≤ 1, ϕ(−t) is the complex conjugate of ϕ(t), and ϕ is uniformly continuous. Proof Since |e^{itx}| = 1, everything follows immediately from the definitions except the uniform continuity. For that we write
$$|\varphi(t+h) - \varphi(t)| = |Ee^{i(t+h)X} - Ee^{itX}| \le E|e^{itX}(e^{ihX} - 1)| = E|e^{ihX} - 1|.$$
Since |e^{ihX} − 1| is bounded by 2 and tends to zero almost surely as h → 0, the right-hand side tends to zero by dominated convergence. Note that the right-hand side is independent of t. Proposition A.46 ϕ_{aX}(t) = ϕ_X(at) and ϕ_{X+b}(t) = e^{itb}ϕ_X(t). Proof
The first follows from E e^{it(aX)} = E e^{i(at)X}, and the second is similar.
Proposition A.47 If X and Y are independent, then ϕX +Y (t ) = ϕX (t )ϕY (t ). Proof
From the multiplication theorem,
$$Ee^{it(X+Y)} = E[e^{itX}e^{itY}] = Ee^{itX}\,Ee^{itY},$$
and we are done. Let us look at some examples of characteristic functions. (1) Bernoulli: By direct computation, ϕ_X(t) = pe^{it} + (1 − p) = 1 − p(1 − e^{it}).
(2) Binomial: Write X as the sum of n independent Bernoulli random variables B_i with parameter p. Thus
$$\varphi_X(t) = \prod_{i=1}^n \varphi_{B_i}(t) = [\varphi_{B_i}(t)]^n = [1 - p(1 - e^{it})]^n.$$
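(3) Point mass at a: E e^{itX} = e^{ita}. Note that when a = 0, then ϕ is identically equal to 1. (4) Poisson:
$$Ee^{itX} = \sum_{k=0}^\infty e^{itk} e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^{it})^k}{k!} = e^{-\lambda}e^{\lambda e^{it}} = e^{\lambda(e^{it}-1)}.$$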
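(5) Uniform on $[a, b]$:
$$\varphi(t) = \frac{1}{b-a}\int_a^b e^{itx}\,dx = \frac{e^{itb} - e^{ita}}{(b-a)it}.$$
Note that when $a = -b$ this reduces to $\sin(bt)/(bt)$.

(6) Exponential:
$$\varphi(t) = \int_0^\infty \lambda e^{itx}e^{-\lambda x}\,dx = \lambda\int_0^\infty e^{(it-\lambda)x}\,dx = \frac{\lambda}{\lambda - it}.$$

(7) Standard normal:
$$\varphi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{itx}e^{-x^2/2}\,dx.$$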
This can be done by completing the square and then doing a contour integration. Alternatively, $\varphi'(t) = (1/\sqrt{2\pi})\int_{-\infty}^\infty ix e^{itx}e^{-x^2/2}\,dx$. (Do the real and imaginary parts separately, and use the dominated convergence theorem to justify taking the derivative inside.) Integrating by parts (again doing the real and imaginary parts separately), $\varphi'(t) = -t\varphi(t)$. The only solution to this differential equation with $\varphi(0) = 1$ is $\varphi(t) = e^{-t^2/2}$.

(8) Normal with mean $\mu$ and variance $\sigma^2$: Writing $X = \sigma Z + \mu$, where $Z$ is a standard normal, then
$$\varphi_X(t) = e^{i\mu t}\varphi_Z(\sigma t) = e^{i\mu t - \sigma^2t^2/2}. \tag{A.25}$$
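The closed forms in examples (5) through (8) can be compared against direct numerical integration. The sketch below uses a hand-written composite Simpson rule (our choice; any accurate quadrature would do) and truncates the infinite ranges where the integrands are negligible; the cut-offs and grid size are assumptions, not part of the text.

```python
import cmath
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals; f may be complex-valued."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi_uniform_numeric(t, a, b):
    return simpson(lambda x: cmath.exp(1j * t * x), a, b) / (b - a)


def phi_uniform_closed(t, a, b):
    return (cmath.exp(1j * t * b) - cmath.exp(1j * t * a)) / ((b - a) * 1j * t)


def phi_exponential_numeric(t, lam, upper=60.0):
    # The integral over (0, infinity) is truncated at `upper`; the tail is below exp(-lam * upper).
    return simpson(lambda x: lam * cmath.exp((1j * t - lam) * x), 0.0, upper)


def phi_normal_numeric(t, mu=0.0, sigma=1.0, width=12.0):
    # Integrate exp(itx) against the N(mu, sigma^2) density over mu +/- width * sigma.
    def density(x):
        return math.exp(-((x - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))
    return simpson(lambda x: cmath.exp(1j * t * x) * density(x),
                   mu - width * sigma, mu + width * sigma)
```

For example, `phi_normal_numeric(t, 0.5, 2.0)` should agree with $e^{0.5it - 2t^2}$, the case $\mu = 0.5$, $\sigma = 2$ of (A.25).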
(9) Gamma. If $X$ has a gamma distribution with parameters $\lambda$ and $r$, then its characteristic function is
$$\mathbb{E}\, e^{itX} = \Big(\frac{\lambda}{\lambda - it}\Big)^r.$$
Formally, this comes from writing
$$\varphi(t) = \int_0^\infty e^{itx}\,\frac{\lambda e^{-\lambda x}(\lambda x)^{r-1}}{\Gamma(r)}\,dx = \frac{\lambda^r}{\Gamma(r)}\int_0^\infty e^{-(\lambda - it)x}x^{r-1}\,dx$$
and performing a change of variables. To do it properly requires a contour integration around the boundary of the region in the complex plane that is bounded by the positive $x$ axis, the ray $\{(\lambda - it)s : s > 0\}$, $\partial B(0, \varepsilon)$, and $\partial B(0, R)$, and then letting $\varepsilon \to 0$ and $R \to \infty$.
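A numerical check of the gamma formula is a useful complement to the formal change of variables. The sketch below integrates $e^{itx}$ against the gamma density with a composite Simpson rule (truncation point and grid size are our assumptions) and compares the result with $(\lambda/(\lambda - it))^r$ computed with Python's principal-branch complex power, which is the continuous choice here because $\lambda/(\lambda - it)$ has positive real part.

```python
import cmath
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi_gamma_numeric(t, lam, r, upper=80.0, n=40000):
    """(lam^r / Gamma(r)) * integral over [0, upper] of exp(-(lam - it)x) x^(r-1) dx."""
    const = lam**r / math.gamma(r)
    return const * simpson(lambda x: cmath.exp((1j * t - lam) * x) * x ** (r - 1),
                           0.0, upper, n)


def phi_gamma_closed(t, lam, r):
    # lam / (lam - it) lies in the right half-plane, so the principal branch of the
    # power is continuous in t and equals 1 at t = 0.
    return (lam / (lam - 1j * t)) ** r
```

The check uses $r \ge 3$ so the integrand is smooth enough at 0 for Simpson's rule to be accurate; smaller $r$ would call for a different quadrature.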
A.14 Uniqueness and characteristic functions

Theorem A.48 If $\varphi_X = \varphi_Y$, then $\mathbb{P}_X = \mathbb{P}_Y$.

Proof If $f$ is in the Schwartz class, then so is its Fourier transform $\widehat f$; see Section B.2. We use the Fubini theorem and the Fourier inversion theorem to write
$$\mathbb{E}\, f(X) = (2\pi)^{-1}\,\mathbb{E}\int \widehat f(u)e^{-iuX}\,du = (2\pi)^{-1}\int \widehat f(u)\varphi_X(-u)\,du,$$
and similarly for $\mathbb{E}\, f(Y)$. Since $\varphi_X = \varphi_Y$, we conclude $\mathbb{E}\, f(X) = \mathbb{E}\, f(Y)$. By a limit procedure, we have this equality for all bounded and measurable $f$, in particular, when $f$ is the indicator of a set.

The same proof works in higher dimensions: if
$$\mathbb{E}\, e^{i\sum_{j=1}^n u_jX_j} = \mathbb{E}\, e^{i\sum_{j=1}^n u_jY_j}$$
for all $(u_1, \ldots, u_n) \in \mathbb{R}^n$, then the joint laws of $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ are equal. The expression $\mathbb{E}\, e^{i\sum_{j=1}^n u_jX_j}$ is called the joint characteristic function of $(X_1, \ldots, X_n)$.

The following proposition can be proved directly, but the proof using characteristic functions is much easier.

Proposition A.49 (1) If $X$ and $Y$ are independent, $X$ is a normal random variable with mean $a$ and variance $b^2$, and $Y$ is a normal random variable with mean $c$ and variance $d^2$, then $X + Y$ is a normal random variable with mean $a + c$ and variance $b^2 + d^2$. (2) If $X$ and $Y$ are independent, $X$ is a Poisson random variable with parameter $\lambda_1$, and $Y$ is a Poisson random variable with parameter $\lambda_2$, then $X + Y$ is a Poisson random variable with parameter $\lambda_1 + \lambda_2$. (3) If $X$ and $Y$ are independent random variables, where $X$ has a gamma distribution with parameters $\lambda$ and $r_1$ and $Y$ has a gamma distribution with parameters $\lambda$ and $r_2$, then $X + Y$ has a gamma distribution with parameters $\lambda$ and $r_1 + r_2$.

Proof
For (1),
$$\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t) = e^{iat - b^2t^2/2}\,e^{ict - d^2t^2/2} = e^{i(a+c)t - (b^2+d^2)t^2/2}.$$
Now use the uniqueness theorem. Parts (2) and (3) are proved similarly.
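The identity $\mathbb{E}\,f(X) = (2\pi)^{-1}\int \widehat f(u)\,\varphi_X(-u)\,du$ used in the proof of Theorem A.48 can be tested for a concrete pair: $f(x) = e^{-x^2/2}$, whose transform under the convention $\widehat f(u) = \int e^{iux}f(x)\,dx$ is $\sqrt{2\pi}\,e^{-u^2/2}$, and $X$ Poisson. The same sketch also checks part (2) of Proposition A.49 by convolving two Poisson mass functions. The quadrature rule, cut-off, and parameter values are illustrative assumptions.

```python
import cmath
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi_poisson(t, lam):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def expectation_via_inversion(lam, cutoff=12.0):
    """(2 pi)^{-1} * integral of fhat(u) phi_X(-u) du, f(x) = exp(-x^2/2), X ~ Poisson(lam)."""
    def integrand(u):
        fhat = math.sqrt(2 * math.pi) * math.exp(-u * u / 2)
        return fhat * phi_poisson(-u, lam)
    return (simpson(integrand, -cutoff, cutoff) / (2 * math.pi)).real


def expectation_direct(lam, terms=100):
    """E exp(-X^2/2) summed against the Poisson mass function."""
    return sum(math.exp(-k * k / 2) * poisson_pmf(k, lam) for k in range(terms))


def convolve_poisson(k, lam1, lam2):
    """P(X + Y = k) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))
```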
A.15 The central limit theorem

We need the following estimate on moments.

Proposition A.50 If $\mathbb{E}\,|X|^k < \infty$ for an integer $k$, then $\varphi_X$ has a continuous derivative of order $k$ and
$$\varphi_X^{(k)}(t) = \int (ix)^k e^{itx}\,\mathbb{P}_X(dx).$$
In particular, $\varphi_X^{(k)}(0) = i^k\,\mathbb{E}\, X^k$.
Proof
Write
$$\frac{\varphi_X(t+h) - \varphi_X(t)}{h} = \int \frac{e^{i(t+h)x} - e^{itx}}{h}\,\mathbb{P}_X(dx).$$
Since $|e^{ihx} - 1| \le |h|\,|x|$, the integrand is bounded by $|x|$. Thus if $\int |x|\,\mathbb{P}_X(dx) < \infty$, we can use dominated convergence to obtain the desired formula for $\varphi_X'(t)$. As in the proof of Proposition A.45, we see $\varphi_X'(t)$ is continuous. We do the case of general $k$ by induction. Evaluating $\varphi_X^{(k)}$ at 0 shows $\varphi_X^{(k)}(0) = i^k\,\mathbb{E}\, X^k$. By the above,
$$\mathbb{E}\, X^2 = -\varphi_X''(0). \tag{A.26}$$
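Proposition A.50 and (A.26) say the moments of $X$ can be read off from the derivatives of $\varphi_X$ at 0. The sketch below approximates those derivatives by central differences for a Poisson law, where $\mathbb{E}\,X = \lambda$ and $\mathbb{E}\,X^2 = \lambda + \lambda^2$, and for a standard normal; the step size $h$ is an assumption chosen to balance truncation against rounding error.

```python
import cmath


def phi_poisson(t, lam):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


def moments_from_cf(phi, h=1e-3):
    """Estimate E X and E X^2 from central differences of phi at 0.

    Uses phi'(0) = i E X and phi''(0) = -E X^2 (Proposition A.50 and (A.26)).
    """
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / (h * h)
    return (d1 / 1j).real, (-d2).real
```

For $\lambda = 2$ the estimates should be close to $\mathbb{E}\,X = 2$ and $\mathbb{E}\,X^2 = 6$.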
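The simplest case of the central limit theorem (CLT) is the case when the $X_i$'s are i.i.d. with mean zero and variance one, $S_n = \sum_{i=1}^n X_i$, and then the CLT says that $S_n/\sqrt n$ converges weakly to a standard normal. This is the case we prove.

We need the fact that if $w_n$ are complex numbers converging to $w$, then $(1 + (w_n/n))^n \to e^w$. We leave the proof of this to the reader, with the warning that any proof using logarithms needs to be done with some care, since $\log z$ is a multivalued function when $z$ is complex.

Theorem A.51 Suppose the $X_i$'s are i.i.d. random variables with mean zero and variance one. Then $S_n/\sqrt n$ converges weakly to a standard normal.

Proof Since $X_1$ has finite second moment, $\varphi_{X_1}$ has a continuous second derivative by Proposition A.50. By Taylor's theorem,
$$\varphi_{X_1}(t) = \varphi_{X_1}(0) + \varphi_{X_1}'(0)t + \varphi_{X_1}''(0)t^2/2 + R(t),$$
where $|R(t)|/t^2 \to 0$ as $|t| \to 0$. Since $\varphi_{X_1}'(0) = i\,\mathbb{E}\, X_1 = 0$ and $\varphi_{X_1}''(0) = -\mathbb{E}\, X_1^2 = -1$ by Proposition A.50, we have $\varphi_{X_1}(t) = 1 - t^2/2 + R(t)$. Then, by Propositions A.46 and A.47,
$$\varphi_{S_n/\sqrt n}(t) = \varphi_{S_n}(t/\sqrt n) = \big(\varphi_{X_1}(t/\sqrt n)\big)^n = \Big[1 - \frac{t^2}{2n} + R(t/\sqrt n)\Big]^n.$$
Since $t/\sqrt n \to 0$ as $n \to \infty$, for $t \neq 0$ we have $nR(t/\sqrt n) = t^2\,R(t/\sqrt n)/(t/\sqrt n)^2 \to 0$, so applying the fact above with $w_n = -t^2/2 + nR(t/\sqrt n) \to -t^2/2$ gives $\varphi_{S_n/\sqrt n}(t) \to e^{-t^2/2}$. (For $t = 0$ both sides equal 1.)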
Since $\mathbb{E}\, S_n^2/n = 1$ for all $n$, Proposition A.44 tells us that the random variables $S_n/\sqrt n$ are tight, and from Theorem A.43, subsequential weak limit points exist. Since weak convergence implies pointwise convergence of characteristic functions (each $x \mapsto e^{itx}$ is bounded and continuous), the preceding paragraph and Theorem A.48 show that any weak limit of a subsequence is a normal random variable with mean zero and variance one. Therefore the entire sequence converges weakly to a normal random variable with mean zero and variance one.
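The mechanism of the proof of Theorem A.51 is easy to watch numerically when $X_i = \pm 1$ with probability $1/2$ each (our choice of example), since then $\varphi_{X_1}(t) = \cos t$ and $\varphi_{S_n/\sqrt n}(t) = \cos(t/\sqrt n)^n$. The sketch below checks that formula against the exact law of $S_n$ and measures the distance to $e^{-t^2/2}$.

```python
import cmath
import math


def cf_normalized_sum(t, n):
    """(phi_{X_1}(t / sqrt(n)))^n for X_1 = +/-1 with probability 1/2, where phi_{X_1} = cos."""
    return math.cos(t / math.sqrt(n)) ** n


def cf_normalized_sum_exact(t, n):
    """E exp(it S_n / sqrt(n)) from the law of S_n = 2K - n, K ~ Binomial(n, 1/2)."""
    return sum(math.comb(n, k) / 2**n * cmath.exp(1j * t * (2 * k - n) / math.sqrt(n))
               for k in range(n + 1))


def clt_error(t, n):
    """Distance between the characteristic function of S_n / sqrt(n) and exp(-t^2/2)."""
    return abs(cf_normalized_sum(t, n) - math.exp(-t * t / 2))
```

A second-order expansion of $\log\cos$ suggests the error decays roughly like $t^4e^{-t^2/2}/(12n)$, consistent with the $1/n$ rate one sees when running `clt_error` for increasing $n$.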
A.16 Gaussian random variables

A normal random variable is also known as a Gaussian random variable.

Proposition A.52 If $Z$ is a mean zero normal random variable with variance one and $x \ge 1$, then
$$\frac{1}{2\sqrt{2\pi}\,x}\,e^{-x^2/2} \le \mathbb{P}(Z \ge x) \le e^{-x^2/2}.$$
In particular, if $\varepsilon > 0$, there exists $x_0$ such that
$$\mathbb{P}(Z \ge x) \ge e^{-(1+\varepsilon)x^2/2}$$
if $x \ge x_0$.

Proof
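For the right-hand inequality, using $y/x \ge 1$ for $y \ge x$ and $1/\sqrt{2\pi} < 1$,
$$\mathbb{P}(Z \ge x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}\,dy \le \int_x^\infty \frac{y}{x}\,e^{-y^2/2}\,dy = \frac{1}{x}\,e^{-x^2/2} \le e^{-x^2/2},$$
the last step because $x \ge 1$.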
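The tail estimates can be compared with the exact tail $\mathbb{P}(Z \ge x) = \tfrac12\operatorname{erfc}(x/\sqrt 2)$, available as `math.erfc` in Python. The sketch below checks the chain of inequalities in the proof above; for comparison from below it uses the standard Mills-ratio bound $\mathbb{P}(Z \ge x) \ge \frac{x}{1+x^2}\,\frac{e^{-x^2/2}}{\sqrt{2\pi}}$, which is brought in from outside the text, and it spot-checks the "in particular" statement for $\varepsilon = 0.1$ at a few large $x$.

```python
import math


def normal_tail(x):
    """P(Z >= x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def tail_bounds(x):
    """Return (mills_lower, tail, proof_upper, stated_upper) for x >= 1.

    proof_upper = exp(-x^2/2) / x is the intermediate bound in the proof;
    mills_lower = x / (1 + x^2) * exp(-x^2/2) / sqrt(2 pi) is a standard Mills-ratio
    lower bound used here only for comparison (it is not taken from the text).
    """
    g = math.exp(-x * x / 2)
    mills_lower = x / (1 + x * x) * g / math.sqrt(2 * math.pi)
    return mills_lower, normal_tail(x), g / x, g
```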
The left-hand inequality is left as an exercise.

Proposition A.53 If $X_n$ is a normal random variable with mean $a_n$ and variance $b_n^2$, $X_n$ converges to $X$ weakly, $a_n \to a$, and $b_n \to b \neq 0$, then $X$ is a normal random variable with mean $a$ and variance $b^2$.

Proof
Since
$$\mathbb{E}\, X_n^2 = \operatorname{Var} X_n + (\mathbb{E}\, X_n)^2 = b_n^2 + a_n^2,$$
then $\sup_n \mathbb{E}\, X_n^2 < \infty$, and the $X_n$ are tight. For each $t$, the characteristic functions converge:
$$\varphi_X(t) = \lim_{n\to\infty}\varphi_{X_n}(t) = \lim_{n\to\infty} e^{ita_n - t^2b_n^2/2} = e^{ita - t^2b^2/2},$$
and the last term is the characteristic function of a normal random variable with mean $a$ and variance $b^2$. Therefore any weak subsequential limit point of the sequence $X_n$ is a normal random variable with mean $a$ and variance $b^2$.

We next prove

Proposition A.54 If
$$\mathbb{E}\, e^{i(uX + vY)} = \mathbb{E}\, e^{iuX}\;\mathbb{E}\, e^{ivY} \tag{A.27}$$
for all $u$ and $v$, then $X$ and $Y$ are independent random variables.

Proof Let $X'$ be a random variable with the same law as $X$, $Y'$ one with the same law as $Y$, and such that $X'$ is independent of $Y'$. (We let $\Omega = [0,1]^2$, $\mathbb{P}$ Lebesgue measure, $X'$ a function of the first variable, and $Y'$ a function of the second variable defined as in Proposition A.2.) Then since $e^{iuX'}$ and $e^{ivY'}$ are independent,
$$\mathbb{E}\, e^{i(uX' + vY')} = \mathbb{E}\, e^{iuX'}\;\mathbb{E}\, e^{ivY'}. \tag{A.28}$$
Since $X$ and $X'$ have the same law, $\mathbb{E}\, e^{iuX} = \mathbb{E}\, e^{iuX'}$, and similarly for $Y$, $Y'$. Therefore, using (A.27) and (A.28), $(X, Y)$ has the same joint characteristic function as $(X', Y')$. By the
uniqueness theorem for characteristic functions, $(X, Y)$ has the same joint law as $(X', Y')$, which implies that $X$ and $Y$ are independent.

A sequence of random variables $X_1, \ldots, X_n$ is said to be jointly normal if there exists a sequence of i.i.d. normal random variables $Z_1, \ldots, Z_m$ with mean zero and variance one and constants $b_{ij}$ and $a_i$ such that
$$X_i = \sum_{j=1}^m b_{ij}Z_j + a_i, \qquad i = 1, \ldots, n. \tag{A.29}$$
In matrix notation, $X = BZ + A$. For simplicity, in what follows let us take $A = 0$; the modifications for the general case are easy. The covariance of two random variables $X$ and $Y$ is defined to be $\mathbb{E}\,[(X - \mathbb{E}\, X)(Y - \mathbb{E}\, Y)]$. Since we are assuming our normal random variables are mean zero, we can omit the centering at expectations. Given a sequence of mean zero random variables, we can talk about the covariance matrix, which is $\operatorname{Cov}(X) = \mathbb{E}\, XX^T$, where $X^T$ denotes the transpose of the vector $X$. In the above case, we see
$$\operatorname{Cov}(X) = \mathbb{E}\,[(BZ)(BZ)^T] = \mathbb{E}\,[BZZ^TB^T] = BB^T,$$
since $\mathbb{E}\, ZZ^T = I$, the identity.

Let us compute the joint characteristic function $\mathbb{E}\, e^{iu^TX}$ of the vector $X$, where $u$ is an $n$-dimensional vector. First, if $v$ is an $m$-dimensional vector,
$$\mathbb{E}\, e^{iv^TZ} = \mathbb{E}\prod_{j=1}^m e^{iv_jZ_j} = \prod_{j=1}^m \mathbb{E}\, e^{iv_jZ_j} = \prod_{j=1}^m e^{-v_j^2/2} = e^{-v^Tv/2},$$
using the independence of the $Z_j$'s. Thus, taking $v = B^Tu$,
$$\mathbb{E}\, e^{iu^TX} = \mathbb{E}\, e^{iu^TBZ} = e^{-u^TBB^Tu/2}.$$
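The formula $\operatorname{Cov}(X) = BB^T$ and the joint characteristic function $e^{-u^TBB^Tu/2}$ can be checked by simulation. The Monte Carlo sketch below uses a hypothetical $3 \times 2$ matrix $B$, takes $A = 0$, and relies on `random.Random.gauss` from the standard library; the sample size, seed, test vector $u$, and tolerances are arbitrary choices.

```python
import cmath
import math
import random

# Hypothetical 3 x 2 matrix B: X = B Z with Z = (Z_1, Z_2) i.i.d. standard normal, A = 0.
B = [[1.0, 0.0],
     [0.5, 1.0],
     [1.0, -1.0]]


def covariance_from_B(B):
    """B B^T, the covariance matrix of X = B Z."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in B] for row_i in B]


def gaussian_cf(u, cov):
    """exp(-u^T C u / 2), the joint characteristic function of a mean zero normal vector."""
    q = sum(u[i] * cov[i][j] * u[j] for i in range(len(u)) for j in range(len(u)))
    return math.exp(-q / 2)


def monte_carlo(B, u, n_samples=100_000, seed=2024):
    """Empirical E[X X^T] and empirical E exp(i u^T X) from simulated X = B Z."""
    rng = random.Random(seed)
    n, m = len(B), len(B[0])
    second_moments = [[0.0] * n for _ in range(n)]
    cf = 0j
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        x = [sum(b * zj for b, zj in zip(row, z)) for row in B]
        for i in range(n):
            for j in range(n):
                second_moments[i][j] += x[i] * x[j]
        cf += cmath.exp(1j * sum(ui * xi for ui, xi in zip(u, x)))
    emp_cov = [[s / n_samples for s in row] for row in second_moments]
    return emp_cov, cf / n_samples
```

Because $\mathbb{E}\,X = 0$ is known, the empirical second-moment matrix is used directly as the covariance estimate, without re-centering.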
By taking $u = (0, \ldots, 0, a, 0, \ldots, 0)$ to be a constant times the unit vector in the $j$th coordinate direction, we deduce that $X_j$ is indeed normal, and this is true for each $j$. Note that the joint characteristic function of a jointly normal collection of random variables $X = (X_1, \ldots, X_n)$ is completely determined by $BB^T$, which is the covariance matrix of $X$. In the case when the $X_i$'s are not mean zero, we can readily check that the joint characteristic function is determined by the covariance matrix together with the vector of means $\mathbb{E}\, X$. Therefore the joint distribution of a jointly normal collection of random variables is determined by the covariance matrix and the means.

Proposition A.55 If the $X_i$ are jointly normal and $\operatorname{Cov}(X_i, X_j) = 0$ for $i \neq j$, then the $X_i$ are independent.

Proof If $\operatorname{Cov}(X) = BB^T$ is a diagonal matrix, then the joint characteristic function of the $X_i$'s factors into the product of the characteristic functions of the $X_i$'s, and so by the $n$-variable analog of Proposition A.54, the $X_i$'s will in this case be independent.

Remark A.56 We note that the analog of Proposition A.53 holds for jointly normal random vectors. That is, if $(X_j^1, \ldots, X_j^n)$ is a jointly normal collection of random variables for each $j$, each $X_j^i$ converges in probability to $X^i$, and each $X^i$ is nonconstant, then $(X^1, \ldots, X^n)$
is a jointly normal collection of random variables. This follows by looking at the joint characteristic functions as in the proof of Proposition A.53.

We present the multidimensional central limit theorem.

Theorem A.57 Let $X_j = (X_j^1, \ldots, X_j^d)$ be random vectors taking values in $\mathbb{R}^d$ and suppose that $X_1, X_2, \ldots$ are independent and identically distributed. Suppose $\mathbb{E}\, X_1^k = 0$ and $\mathbb{E}\,(X_1^k)^2 < \infty$ for $k = 1, \ldots, d$, and let $C_{k\ell} = \mathbb{E}\,[X_1^kX_1^\ell]$. If $S_n = \sum_{j=1}^n X_j$, then $S_n/\sqrt n$ converges weakly to a jointly normal random vector $Z = (Z^1, \ldots, Z^d)$ where each $Z^k$ has mean zero and the covariance of $Z^k$ and $Z^\ell$ is $C_{k\ell}$.

Proof
Since
$$\mathbb{E}\,|S_n|^2/n = \sum_{j=1}^n\sum_{k=1}^d \mathbb{E}\,|X_j^k|^2/n$$
is bounded independently of $n$, the random vectors $S_n/\sqrt n$ are tight, and therefore weak subsequential limit points exist. We need to show that any subsequential limit point is a jointly normal random vector with mean zero and covariance matrix $C$. If $u_1, \ldots, u_d \in \mathbb{R}$, then $\sum_{k=1}^d u_kX_j^k$, $j = 1, 2, \ldots$, will be a sequence of i.i.d. random variables with mean zero and variance $\sum_{k,\ell=1}^d u_ku_\ell C_{k\ell}$. By Theorem A.51,
$$\sum_{j=1}^n\sum_{k=1}^d \frac{u_kX_j^k}{\sqrt n}$$
converges weakly to a mean zero normal random variable with variance equal to $\sum_{k,\ell=1}^d u_ku_\ell C_{k\ell}$. If we write $S_n = (S_n^1, \ldots, S_n^d)$, then
$$\mathbb{E}\exp\Big(i\sum_{k=1}^d u_kS_n^k/\sqrt n\Big) \to \exp\Big(-\sum_{k,\ell=1}^d u_ku_\ell C_{k\ell}/2\Big).$$
This shows that any subsequential limit point of the sequence $S_n/\sqrt n$ has the required law.

If $(X, Y_1, \ldots, Y_n)$ are jointly normal random variables, then the law of $X$ given $Y_1, \ldots, Y_n$ is also Gaussian.

Proposition A.58 Suppose $X, Y_1, \ldots, Y_n$ are jointly normal random variables with mean zero. Let $A$ be the $n \times 1$ matrix whose $i$th entry is $\operatorname{Cov}(X, Y_i)$, $B$ the $n \times n$ matrix whose $(i,j)$th entry is $\operatorname{Cov}(Y_i, Y_j)$, and $Y$ the $n \times 1$ matrix whose $i$th entry is $Y_i$. Suppose $B$ is invertible and let $D = B^{-1}A$. Then for $u \in \mathbb{R}$,
$$\mathbb{E}\,[e^{iuX} \mid Y_1, \ldots, Y_n] = e^{iuD^TY}e^{-u^2(\operatorname{Var} X - A^TB^{-1}A)/2}.$$
In particular, the law of $X$ given $Y_1, \ldots, Y_n$ is that of a normal random variable with mean $D^TY$ and variance equal to $\operatorname{Var} X - A^TB^{-1}A$.
Proof
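Note
$$\operatorname{Cov}(X - D^TY, Y_j) = \operatorname{Cov}(X, Y_j) - \sum_{i=1}^n D_i\operatorname{Cov}(Y_i, Y_j) = A_j - \sum_{i=1}^n D_iB_{ij} = 0,$$
so, since $(X - D^TY, Y_1, \ldots, Y_n)$ is jointly normal with a block-diagonal covariance matrix, its joint characteristic function factors and (arguing as in Proposition A.54) $X - D^TY$ is independent of $(Y_1, \ldots, Y_n)$. Then
$$\mathbb{E}\,[e^{iuX} \mid Y_1, \ldots, Y_n] = e^{iuD^TY}\,\mathbb{E}\,[e^{iu(X - D^TY)} \mid Y_1, \ldots, Y_n] = e^{iuD^TY}\,\mathbb{E}\,[e^{iu(X - D^TY)}] = e^{iuD^TY}e^{-u^2\operatorname{Var}(X - D^TY)/2}.$$
To complete the proof, we calculate
$$\operatorname{Var}(X - D^TY) = \operatorname{Var} X - 2\sum_i D_iA_i + \sum_{i,j} D_iB_{ij}D_j = \operatorname{Var} X - A^TB^{-1}A,$$
and we are done.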
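Proposition A.58 reduces Gaussian conditioning to linear algebra. The sketch below represents each of $X, Y_1, \ldots, Y_n$ by its coefficient vector with respect to i.i.d. standard normals (so covariances are dot products), solves $BD = A$ by Gaussian elimination, and confirms that $X - D^TY$ is uncorrelated with each $Y_j$ and has variance $\operatorname{Var} X - A^TB^{-1}A$. The example coefficients are hypothetical.

```python
def cov(r1, r2):
    """Cov of sum_k r1[k] Z_k and sum_k r2[k] Z_k for i.i.d. standard normal Z_k."""
    return sum(a * b for a, b in zip(r1, r2))


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def gaussian_conditioning(x_row, y_rows):
    """Return D = B^{-1} A and the conditional variance Var X - A^T B^{-1} A."""
    A = [cov(x_row, y) for y in y_rows]
    Bmat = [[cov(yi, yj) for yj in y_rows] for yi in y_rows]
    D = solve(Bmat, A)
    cond_var = cov(x_row, x_row) - sum(a * d for a, d in zip(A, D))
    return D, cond_var


# Hypothetical example: X = Z1 + Z2 + Z3, Y1 = Z1 + Z3/2, Y2 = Z2 - Z3.
X_ROW = [1.0, 1.0, 1.0]
Y_ROWS = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0]]
```

Here $A = (1.5, 0)$ and $B$ has rows $(1.25, -0.5)$ and $(-0.5, 2)$, so $D = (4/3, 1/3)$ and the conditional variance is $3 - 2 = 1$.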
Appendix B Some results from analysis
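B.1 The monotone class theorem

The monotone class theorem is a result from measure theory used in the proof of the Fubini theorem.

Definition B.1 $\mathcal M$ is a monotone class if $\mathcal M$ is a collection of subsets of $X$ such that (1) if $A_1 \subset A_2 \subset \cdots$, $A = \cup_i A_i$, and each $A_i \in \mathcal M$, then $A \in \mathcal M$; (2) if $A_1 \supset A_2 \supset \cdots$, $A = \cap_i A_i$, and each $A_i \in \mathcal M$, then $A \in \mathcal M$.

Recall that an algebra of sets is a collection $\mathcal A$ of sets such that if $A_1, \ldots, A_n \in \mathcal A$, then $A_1 \cup \cdots \cup A_n$ and $A_1 \cap \cdots \cap A_n$ are also in $\mathcal A$, and if $A \in \mathcal A$, then $A^c \in \mathcal A$. The intersection of monotone classes is a monotone class, and the intersection of all monotone classes containing a given collection of sets is the smallest monotone class containing that collection.

Theorem B.2 Suppose $\mathcal A_0$ is an algebra of sets, $\mathcal A$ is the smallest σ-field containing $\mathcal A_0$, and $\mathcal M$ is the smallest monotone class containing $\mathcal A_0$. Then $\mathcal M = \mathcal A$.

Proof A σ-algebra is clearly a monotone class, so $\mathcal M \subset \mathcal A$. We must show $\mathcal A \subset \mathcal M$.

Let $\mathcal N_1 = \{A \in \mathcal M : A^c \in \mathcal M\}$. Note $\mathcal N_1$ is contained in $\mathcal M$, contains $\mathcal A_0$, and is a monotone class. Since $\mathcal M$ is the smallest monotone class containing $\mathcal A_0$, then $\mathcal N_1 = \mathcal M$, and therefore $\mathcal M$ is closed under the operation of taking complements.

Let $\mathcal N_2 = \{A \in \mathcal M : A \cap B \in \mathcal M \text{ for all } B \in \mathcal A_0\}$. $\mathcal N_2$ is contained in $\mathcal M$; $\mathcal N_2$ contains $\mathcal A_0$ because $\mathcal A_0$ is an algebra; $\mathcal N_2$ is a monotone class because $(\cup_{i=1}^\infty A_i) \cap B = \cup_{i=1}^\infty (A_i \cap B)$, and similarly for intersections. Therefore $\mathcal N_2 = \mathcal M$; in other words, if $B \in \mathcal A_0$ and $A \in \mathcal M$, then $A \cap B \in \mathcal M$.

Let $\mathcal N_3 = \{A \in \mathcal M : A \cap B \in \mathcal M \text{ for all } B \in \mathcal M\}$. As in the preceding paragraph, $\mathcal N_3$ is a monotone class contained in $\mathcal M$. By the last sentence of the preceding paragraph, $\mathcal N_3$ contains $\mathcal A_0$. Hence $\mathcal N_3 = \mathcal M$.

We thus have that $\mathcal M$ is a monotone class closed under the operations of taking complements and taking finite intersections, so $\mathcal M$ is an algebra. Since a countable union is the increasing limit of finite unions, $\mathcal M$ is closed under countable unions. This shows $\mathcal M$ is a σ-algebra, and so $\mathcal A \subset \mathcal M$.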
B.2 The Schwartz class

A function $f : \mathbb{R}^d \to \mathbb{R}$ is in the Schwartz class if $f$ is $C^\infty$ and for each $m, k \ge 0$ and each $i_1, i_2, \ldots, i_k \in \{1, 2, \ldots, d\}$,
$$|x|^m\,\frac{\partial^k f}{\partial x_{i_1}\cdots\partial x_{i_k}}(x) \to 0$$
as $|x| \to \infty$. (Here $i_1, \ldots, i_k$ need not be distinct.)

Suppose that $f$ is in the Schwartz class. Suppose $m, k \ge 0$, $i_1, \ldots, i_k$ and $j_1, \ldots, j_n$ are each integers between 1 and $d$ inclusive, and $m_1, \ldots, m_k$ are even positive integers. Let $\widehat f$ be the Fourier transform of $f$:
$$\widehat f(u) = \int_{\mathbb{R}^d} e^{iu\cdot x}f(x)\,dx.$$
Then
$$u_{i_1}^{m_1}\cdots u_{i_k}^{m_k}\,\frac{\partial^n \widehat f}{\partial u_{j_1}\cdots\partial u_{j_n}}(u)$$
is bounded as a function of $u$ because it is a constant times the Fourier transform of
$$\frac{\partial^{m_1+\cdots+m_k}}{\partial x_{i_1}^{m_1}\cdots\partial x_{i_k}^{m_k}}\big(x_{j_1}\cdots x_{j_n}f(x)\big),$$
which is in $L^1(\mathbb{R}^d)$ since $f$ is in the Schwartz class. We conclude that $\widehat f$ is also in the Schwartz class.
Appendix C Regular conditional probabilities
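Let $\mathcal E \subset \mathcal F$ be σ-fields, where $(\Omega, \mathcal F, \mathbb{P})$ is a probability space. A regular conditional probability for $\mathbb{P}(\cdot \mid \mathcal E)$ is a map $Q : \Omega \times \mathcal F \to [0,1]$ such that (1) $Q(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal F)$ for each $\omega$; (2) for each $A \in \mathcal F$, $Q(\cdot, A)$ is an $\mathcal E$ measurable random variable; (3) for each $A \in \mathcal F$ and each $B \in \mathcal E$,
$$\int_B Q(\omega, A)\,\mathbb{P}(d\omega) = \mathbb{P}(A \cap B).$$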
$Q(\omega, A)$ can be thought of as $\mathbb{P}(A \mid \mathcal E)$.

Theorem C.1 Suppose $(\Omega, \mathcal F, \mathbb{P})$ is a probability space, $\mathcal E \subset \mathcal F$, and $\Omega$ is in addition a complete and separable metric space. Then a regular conditional probability for $\mathbb{P}(\cdot \mid \mathcal E)$ exists.

Proof Since $\Omega$ is a complete and separable metric space, we can embed $\Omega$ as a subset of the compact set $I = [0,1]^{\mathbb N}$, where we furnish $I$ with the product topology. Let $\{f_j\}$ be a countable collection of uniformly continuous functions on $\Omega$ such that every finite subset of distinct elements is linearly independent and such that $L_0$, the set of finite linear combinations of the $f_j$'s, is dense in the class of uniformly continuous functions on $\Omega$; let us assume $f_1$ is identically equal to 1. For each $j$, let $g_j = \mathbb{E}\,[f_j \mid \mathcal E]$. (The random variables $g_j$ are only defined up to almost sure equivalence. For each $j$ we select an element $g_j$ from the equivalence class and keep it fixed.)

If $r_1, \ldots, r_n$ are rationals with $r_1f_1(\omega) + \cdots + r_nf_n(\omega) \ge 0$ for all $\omega$, let
$$N(r_1, \ldots, r_n) = \{\omega : r_1g_1(\omega) + \cdots + r_ng_n(\omega) < 0\}.$$
By the definition of $g_j$, $\mathbb{P}(N(r_1, \ldots, r_n)) = 0$. Let $N_1$ be the union of all such $N(r_1, \ldots, r_n)$ with $n \ge 1$ and the $r_j$ rational. Then $N_1 \in \mathcal E$ and $\mathbb{P}(N_1) = 0$.

Fix $\omega \in \Omega \setminus N_1$. Define a functional $L_\omega$ on $L_0$ by $L_\omega(f) = t_1g_1(\omega) + \cdots + t_ng_n(\omega)$ if $f = t_1f_1 + \cdots + t_nf_n$.
We claim $L_\omega$ is a positive linear functional. If $f = t_1f_1 + \cdots + t_nf_n \ge 0$ and $\varepsilon > 0$ is rational, then there exist rationals $r_1, \ldots, r_n$ such that $r_1f_1 + \cdots + r_nf_n \ge -\varepsilon$ and $|t_i - r_i| \le \varepsilon$, $i = 1, \ldots, n$, or $(r_1 + \varepsilon)f_1 + r_2f_2 + \cdots + r_nf_n \ge 0$. Since $\omega \notin N_1$, then $(r_1 + \varepsilon)g_1(\omega) + r_2g_2(\omega) + \cdots + r_ng_n(\omega) \ge 0$. Letting $\varepsilon \to 0$, it follows that $t_1g_1(\omega) + \cdots + t_ng_n(\omega) \ge 0$. This proves that $L_\omega$ is positive.

Since $L_\omega(f_1) = 1$, this implies that $L_\omega$ is a bounded linear functional, and by the Hahn–Banach theorem $L_\omega$ can be extended to a positive linear functional on the closure of $L_0$. Any uniformly continuous function on $\Omega$ can be extended uniquely to $\overline\Omega$, the closure of $\Omega$ in $I$, so $L_\omega$ can be considered as a positive linear functional on $C(\overline\Omega)$. By the Riesz representation theorem, there exists a probability measure $Q(\omega, \cdot)$ on $\overline\Omega$ such that
$$L_\omega(f) = \int f(\omega')\,Q(\omega, d\omega').$$
The mapping $\omega \mapsto L_\omega(f)$ is measurable with respect to $\mathcal E$ for each $f \in L_0$, hence for all uniformly continuous functions on $\Omega$ by a limit argument. If $B \in \mathcal E$ and $f = t_1f_1 + \cdots + t_nf_n$,
$$\begin{aligned}
\int_B\int f(\omega')\,Q(\omega, d\omega')\,\mathbb{P}(d\omega) &= \int_B L_\omega f\,\mathbb{P}(d\omega) = \int_B (t_1g_1 + \cdots + t_ng_n)(\omega)\,\mathbb{P}(d\omega)\\
&= \int_B \mathbb{E}\,[t_1f_1 + \cdots + t_nf_n \mid \mathcal E](\omega)\,\mathbb{P}(d\omega) = \int_B f(\omega)\,\mathbb{P}(d\omega),
\end{aligned}$$
or $\int f(\omega')\,Q(\omega, d\omega')$ is a version of $\mathbb{E}\,[f \mid \mathcal E]$ if $f \in L_0$. By a limit argument, the same is true for all $f$ that are of the form $f = 1_A$ with $A \in \mathcal F$.

Let $G_{ni}$ be a sequence of balls of radius $1/n$ (with respect to the metric on $\Omega$) contained in $\Omega$ and covering $\Omega$. Choose $i_n$ such that $\mathbb{P}(\cup_{i \le i_n} G_{ni}) > 1 - 1/(n2^n)$. The set $H_n = \cap_{m \ge n}\cup_{i \le i_m} G_{mi}$ is totally bounded; let $K_n$ be the closure of $H_n$ in $\Omega$. Since $\Omega$ is complete, $K_n$ is complete and totally bounded, and hence compact, and $\mathbb{P}(K_n) \ge 1 - \sum_{m \ge n} 1/(m2^m) \ge 1 - 1/n$. Hence
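$$\mathbb{E}\,[Q(\cdot, \cup_{i=1}^\infty K_i); \Omega \setminus N_1] \ge \mathbb{E}\,[Q(\cdot, K_n); \Omega \setminus N_1] = \mathbb{P}(K_n) \ge 1 - (1/n)$$
for each $n$, or $Q(\omega, \cup_{i=1}^\infty K_i) = 1$, a.s. Let $N_2$ be the null set for which this fails. Thus for $\omega \in \Omega \setminus (N_1 \cup N_2)$, we see that $Q(\omega, d\omega')$ is a probability measure on $\Omega$. For $\omega \in N_1 \cup N_2$, set $Q(\omega, \cdot) = \mathbb{P}(\cdot)$. This $Q$ is the desired regular conditional probability.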
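Theorem C.1 is needed because conditional probabilities are a priori defined only up to null sets. When $\mathcal E$ is generated by a finite partition of a finite $\Omega$ (a special case chosen for illustration, not the setting of the theorem), a regular conditional probability can be written down directly as $Q(\omega, A) = \mathbb{P}(A \cap C)/\mathbb{P}(C)$, where $C$ is the cell containing $\omega$. The sketch below verifies properties (1) to (3) of the definition exactly, using rational arithmetic; the probabilities and partition are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite probability space Omega = {0, ..., 5} with rational weights.
OMEGA = range(6)
P = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(1, 10),
     3: Fraction(3, 10), 4: Fraction(1, 10), 5: Fraction(2, 10)}
# Hypothetical partition generating the sub-sigma-field E; every cell has positive probability.
PARTITION = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]


def prob(A):
    """P(A) for a subset A of Omega."""
    return sum((P[w] for w in A), Fraction(0))


def cell_of(w):
    """The partition cell containing w."""
    return next(c for c in PARTITION if w in c)


def Q(w, A):
    """Q(w, A) = P(A & C) / P(C), where C is the cell containing w."""
    c = cell_of(w)
    return prob(set(A) & c) / prob(c)


def subsets(s):
    """All subsets of s, as tuples."""
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))


def E_sets():
    """The sets in E: all unions of partition cells."""
    for cells in subsets(PARTITION):
        yield frozenset().union(*cells)
```

Constancy of $Q(\cdot, A)$ on each cell is exactly $\mathcal E$-measurability in this finite setting.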
Appendix D Kolmogorov extension theorem
Suppose $S$ is a metric space. We use $S^{\mathbb N}$ for the product space $S \times S \times \cdots$ furnished with the product topology. We may view $S^{\mathbb N}$ as the set of sequences $(x_1, x_2, \ldots)$ of elements of $S$. We use the σ-field on $S^{\mathbb N}$ generated by the cylindrical sets. Given an element $x = (x_1, x_2, \ldots)$ of $S^{\mathbb N}$, we define $\pi_n(x) = (x_1, \ldots, x_n) \in S^n$.

We suppose we have a Radon probability measure $\mu_n$ defined on $S^n$ for each $n$. (Being a Radon measure means that we can approximate $\mu_n(A)$ from below by compact sets; see Folland (1999) for details.) The $\mu_n$ are consistent if $\mu_{n+1}(A \times S) = \mu_n(A)$ whenever $A$ is a Borel subset of $S^n$. The Kolmogorov extension theorem is the following.

Theorem D.1 Suppose for each $n$ we have a Radon probability measure $\mu_n$ on $S^n$. Suppose the $\mu_n$'s are consistent. Then there exists a probability measure $\mu$ on $S^{\mathbb N}$ such that $\mu(A \times S^{\mathbb N}) = \mu_n(A)$ for all Borel $A \subset S^n$.

Proof Define $\mu$ on cylindrical sets by $\mu(A \times S^{\mathbb N}) = \mu_n(A)$ if $A \subset S^n$ is Borel. By the consistency assumption, $\mu$ is well defined. By the Carathéodory extension theorem, we can extend $\mu$ to the σ-field generated by the cylindrical sets provided we show that whenever $A_n$ are cylindrical sets decreasing to $\emptyset$, then $\mu(A_n) \to 0$.

Suppose $A_n$ are cylindrical sets decreasing to $\emptyset$ but $\mu(A_n)$ does not tend to 0; by taking a subsequence we may assume without loss of generality that there exists $\varepsilon > 0$ such that $\mu(A_n) \ge \varepsilon$ for all $n$. We will obtain a contradiction.

We first want to arrange things so that each $A_n = \pi_n(A_n) \times S^{\mathbb N}$. Suppose $A_n$ is of the form $A_n = \{(x_1, x_2, \ldots) : (x_1, \ldots, x_{j_n}) \in B_n\}$, where $B_n$ is a Borel subset of $S^{j_n}$. We choose $m_n = n + \max(j_1, \ldots, j_n)$. Let $A_0 = S^{\mathbb N}$. We then replace our original sequence $A_1, A_2, \ldots$ by the sequence
$$A_0, \ldots, A_0, A_1, \ldots, A_1, A_2, \ldots, A_2, A_3, \ldots,$$
where we have $m_1$ occurrences of $A_0$, $m_2 - m_1$ occurrences of $A_1$, $m_3 - m_2$ occurrences of $A_2$, and so on. Therefore we may without loss of generality suppose $j_n \le n$.
We then have $A_n = \{(x_1, x_2, \ldots) : (x_1, \ldots, x_n) \in B_n \times S^{n - j_n}\}$. Replacing $B_n$ by $B_n \times S^{n - j_n}$, we may without loss of generality suppose $A_n = \pi_n(A_n) \times S^{\mathbb N}$.
We set $\tilde A_n = \pi_n(A_n)$. For each $n$, choose $\tilde B_n \subset \tilde A_n$ so that $\tilde B_n$ is compact and $\mu_n(\tilde A_n \setminus \tilde B_n) \le \varepsilon/2^{n+1}$. Let $B_n = \tilde B_n \times S^{\mathbb N}$ and let $C_n = B_1 \cap \cdots \cap B_n$. Hence $C_n \subset B_n \subset A_n$, and $C_n \downarrow \emptyset$, but
$$\mu(C_n) \ge \mu(A_n) - \sum_{i=1}^n \mu(A_i \setminus B_i) \ge \varepsilon/2,$$
and $\tilde C_n = \pi_n(C_n)$, the projection of $C_n$ onto $S^n$, is compact.

We will find $x = (x_1, \ldots, x_n, \ldots) \in \cap_n C_n$ and obtain our contradiction. For each $n$ choose a point $y(n) \in C_n$. The first coordinates of $\{y(n)\}$, namely $\{y_1(n)\}$, form a sequence contained in $\tilde C_1$, which is compact, hence there is a convergent subsequence $\{y_1(n_k)\}$. Let $x_1$ be the limit point. The first and second coordinates of $\{y(n_k)\}$ form a sequence contained in the compact set $\tilde C_2$, so a further subsequence $\{(y_1(n_{k_j}), y_2(n_{k_j}))\}$ converges to a point in $\tilde C_2$. Since $\{n_{k_j}\}$ is a subsequence of $\{n_k\}$, the first coordinate of the limit is $x_1$. Therefore the limit point of $\{(y_1(n_{k_j}), y_2(n_{k_j}))\}$ is of the form $(x_1, x_2)$, and this point is in $\tilde C_2$. We continue this procedure to obtain $x = (x_1, x_2, \ldots, x_n, \ldots)$. By our construction, $(x_1, \ldots, x_n) \in \tilde C_n$ for each $n$, hence $x \in C_n$ for each $n$, or $x \in \cap_n C_n$, a contradiction.

A typical application of this theorem is to construct a countable sequence of independent random variables. We construct $X_1, \ldots, X_n$ as in Proposition A.10. Here $S = [0,1]$. Let $\mu_n$ be the law of $(X_1, \ldots, X_n)$; it is easy to check that the $\mu_n$ form a consistent family. We use Theorem D.1 to obtain a probability measure $\mu$ on $[0,1]^{\mathbb N}$. To get random variables out of this, we let $X_i(\omega) = \omega_i$ if $\omega = (\omega_1, \omega_2, \ldots)$.
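The consistency condition $\mu_{n+1}(A \times S) = \mu_n(A)$ can be verified mechanically for product laws. In the sketch below the state space $S = [0,1]$ of the application is replaced by the two-point space $\{0, 1\}$ (our simplification) so that everything is finite and exact; since both sides are measures on a finite set, checking singletons $A = \{x\}$ suffices.

```python
from fractions import Fraction
from itertools import product


def mu(n, p):
    """Law of (X_1, ..., X_n) for i.i.d. Bernoulli(p) coordinates, as a dict on {0, 1}^n."""
    law = {}
    for x in product((0, 1), repeat=n):
        weight = Fraction(1)
        for xi in x:
            weight *= p if xi == 1 else 1 - p
        law[x] = weight
    return law


def drop_last_coordinate(law):
    """Push a law on {0,1}^(n+1) forward under (x_1, ..., x_{n+1}) -> (x_1, ..., x_n)."""
    marginal = {}
    for x, w in law.items():
        marginal[x[:-1]] = marginal.get(x[:-1], Fraction(0)) + w
    return marginal
```

Here `drop_last_coordinate(mu(n + 1, p))` computes $A \mapsto \mu_{n+1}(A \times S)$ on singletons, so equality with `mu(n, p)` is the consistency condition.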
References
Aldous, D. 1978. Stopping times and tightness. Ann. Probab. 6, 335–40.
Barlow, M. T. 1982. One-dimensional stochastic differential equations with no strong solution. J. London Math. Soc. 26, 335–47.
Bass, R. F. 1983. Skorokhod imbedding via stochastic integrals. Séminaire de Probabilités XVII. New York: Springer-Verlag; 221–4.
Bass, R. F. 1995. Probabilistic Techniques in Analysis. New York: Springer-Verlag.
Bass, R. F. 1996. The Doob–Meyer decomposition revisited. Can. Math. Bull. 39, 138–50.
Bass, R. F. 1997. Diffusions and Elliptic Operators. New York: Springer-Verlag.
Billingsley, P. 1968. Convergence of Probability Measures. New York: John Wiley & Sons, Ltd.
Billingsley, P. 1971. Weak Convergence of Measures: Applications in Probability. Philadelphia: SIAM.
Blumenthal, R. M. and Getoor, R. K. 1968. Markov Processes and Potential Theory. New York: Academic Press.
Bogachev, V. I. 1998. Gaussian Measures. Providence, RI: American Mathematical Society.
Boyce, W. E. and DiPrima, R. C. 2009. Elementary Differential Equations and Boundary Value Problems, 9th edn. New York: John Wiley & Sons, Ltd.
Chung, K. L. 2001. A Course in Probability Theory, 3rd edn. San Diego: Academic Press.
Chung, K. L. and Walsh, J. B. 1969. To reverse a Markov process. Acta Math. 123, 225–51.
Dawson, D. A. 1993. Measure-valued Markov processes. École d'Été de Probabilités de Saint-Flour XXI–1991. Berlin: Springer-Verlag.
Dellacherie, C. and Meyer, P.-A. 1978. Probability and Potential. Amsterdam: North-Holland.
Dudley, R. M. 1973. Sample functions of the Gaussian process. Ann. Probab. 1, 66–103.
Durrett, R. 1996. Probability: Theory and Examples. Belmont, CA: Duxbury Press.
Ethier, S. N. and Kurtz, T. G. 1986. Markov Processes: Characterization and Convergence. New York: John Wiley & Sons, Ltd.
Feller, W. 1971. An Introduction to Probability Theory and its Applications, 2nd edn. New York: John Wiley & Sons, Ltd.
Folland, G. B. 1999. Real Analysis: Modern Techniques and their Applications, 2nd edn. New York: John Wiley & Sons, Ltd.
Fukushima, M., Oshima, Y. and Takeda, M. 1994. Dirichlet Forms and Symmetric Markov Processes. Berlin: de Gruyter.
Gilbarg, D. and Trudinger, N. S. 1983. Elliptic Partial Differential Equations of Second Order, 2nd edn. New York: Springer-Verlag.
Itô, K. and McKean, Jr, H. P. 1965. Diffusion Processes and their Sample Paths. Berlin: Springer-Verlag.
Kallianpur, G. 1980. Stochastic Filtering Theory. Berlin: Springer-Verlag.
Karatzas, I. and Shreve, S. E. 1991. Brownian Motion and Stochastic Calculus, 2nd edn. New York: Springer-Verlag.
Knight, F. B. 1981. Essentials of Brownian Motion and Diffusion. Providence, RI: American Mathematical Society.
Kuo, H. H. 1975. Gaussian Measures in Banach Spaces. New York: Springer-Verlag.
Lax, P. 2002. Functional Analysis. New York: John Wiley & Sons, Ltd.
Liggett, T. M. 2010. Continuous Time Markov Processes: An Introduction. Providence, RI: American Mathematical Society.
Meyer, P.-A., Smythe, R. T. and Walsh, J. B. 1972. Birth and death of Markov processes. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. III. Berkeley, CA: University of California Press; 295–305.
Obłój, J. 2004. The Skorokhod embedding problem and its offspring. Probab. Surv. 1, 321–90.
Øksendal, B. 2003. Stochastic Differential Equations: An Introduction with Applications, 6th edn. Berlin: Springer-Verlag.
Perkins, E. A. 2002. Dawson–Watanabe superprocesses and measure-valued diffusions. Lectures on Probability Theory and Statistics (Saint-Flour, 1999). Berlin: Springer-Verlag; 125–324.
Revuz, D. and Yor, M. 1999. Continuous Martingales and Brownian Motion, 3rd edn. Berlin: Springer-Verlag.
Rogers, L. C. G. and Williams, D. 2000a. Diffusions, Markov Processes, and Martingales, Vol. 1. Cambridge: Cambridge University Press.
Rogers, L. C. G. and Williams, D. 2000b. Diffusions, Markov Processes, and Martingales, Vol. 2. Cambridge: Cambridge University Press.
Rudin, W. 1976. Principles of Mathematical Analysis, 3rd edn. New York: McGraw-Hill.
Rudin, W. 1987. Real and Complex Analysis, 3rd edn. New York: McGraw-Hill.
Skorokhod, A. V. 1965. Studies in the Theory of Random Processes. Reading, MA: Addison-Wesley.
Stroock, D. W. 2003. Markov Processes from K. Itô's Perspective. Princeton, NJ: Princeton University Press.
Stroock, D. W. and Varadhan, S. R. S. 1977. Multidimensional Diffusion Processes. Berlin: Springer-Verlag.
Walsh, J. B. 1978. Excursions and local time. Astérisque 52–53, 159–92.
Index
adapted, 1, 359 additive functional, 169, 180 classical, 180 Aldous criterion, 264 almost surely, 348 announce, 112 Bessel processes, 200 binomial, 349, 371 Black–Scholes formula, 220 Blumenthal 0–1 law, 164 BMO, 129 Borel–Cantelli lemma, 353, 354 Brownian bridge, 273 Brownian motion, 6, 153 covariance, 8 fractional, 254 integrated, 41 maximum, 27 standard, 6 with drift, 24 zero set, 30, 48, 99, 214, 217 Brownian sheet, 254 Burkholder–Davis–Gundy inequalities, 82 cadlag, 2 Cameron–Martin space, 253 canonical process, 158 Cauchy problem, 321 cemetery, 156, 177 central limit theorem, 373 chaining, 51 change of variables formula, 71 Chapman–Kolmogorov equations, 155 characteristic function, 370 Chebyshev’s inequality, 352 Chung’s law of the iterated logarithm, 47 class D, 57, 124 class DL, 126 closed form, 303 closed operator, 295 compensator, 124, 130
complete filtration, 1 conditional expectation, 357 conditional probability, 357 conditioned processes, 178 consistent, 382 construction of Brownian motion, 36, 248, 254, 284 continuation region, 187 continuous process, 2 convergence almost surely, 355 in $L^p$, 355 in distribution, 367 in law, 367 in probability, 355 weak, 367 convolution semigroup, 285 covariance, 375 covariance matrix, 375 covariation, 58 cumulative normal distribution function, 227 cylindrical set, 3 D[0, 1] compactness, 263 completeness, 262 metrics, 259 debut, 117 debut theorem, 117 density, 349 diffusion coefficient, 193, 315 Dirichlet boundary condition, 290 Dirichlet form, 303 Dirichlet problem, 320 dissipative, 294 distribution, 348 distribution function, 348 divergence form elliptic operators, 307 Donsker invariance principle, 269 Doob decomposition, 361 Doob’s h-path transform, 178 Doob’s inequalities, 14, 361 Doob–Meyer decomposition, 60, 124 drift coefficient, 193, 315 dual optional projection, 124
dual predictable projection, 124 dyadic rationals, 49 empirical process, 275 entry time, 115 equivalent martingale measure, 223 events, 348 excessive, 184 excessive majorant, 186 exercise time, 219 expected value, 348 exponential, 349, 371 martingale, 89 semimartingale, 144 exponential random variables, 33 Feller process, 161 Feynman–Kac formula, 323 filtration, 1, 2 finite-dimensional distributions, 3 Fourier series, 36 gamma, 350, 371 gauge, 323 Gaussian, 7, 374 Gaussian field, 255 Girsanov theorem, 89, 93, 144 Glivenko–Cantelli theorem, 366 good-λ inequality, 86 Green’s function, 175 Gronwall’s lemma, 201 Hölder continuous, 43, 47 harmonic, 173, 321 Hausdorff dimension, 48, 99 Hausdorff measure, 48 heat equation, 322 Helly’s theorem, 369 Hille–Yosida theorem, 292 hitting time, 115 Hunt process, 165 Hurst index, 254 i.i.d., 364 increasing process, 54, 121 independent, 353 independent increments, 6, 339 indicator, 348 indistinguishable, 2 infinite particle systems, 295 infinitesimal generator, 288 innovation process, 230 innovations approach, 229 integration by parts formula, 74 invariance principle, 108 invariant, 178 Itô’s formula, 71 multivariate, 74
Jensen’s inequality, 352, 358 John–Nirenberg inequality, 129 joint characteristic function, 372 jointly normal, 7, 375 Kalman–Bucy filter, 234 Karhunen–Loève expansion, 253 kernel, 154 killed process, 177 Kolmogorov backward equation, 291 Kolmogorov continuity criterion, 49 Kolmogorov extension theorem, 382 Kolmogorov forward equation, 292 Kunita–Watanabe inequality, 70 Lévy measure, 342 Lévy process, 32, 297, 339 Lévy system formula, 347 Lévy’s theorem, 77 Lévy–Khintchine formula, 342 last exit, 181 law, 3, 10, 348 law of the iterated logarithm, 44 least excessive majorant, 186 left continuous process, 2 lifetime, 156, 177 LIL, 44 linear equations, 199 linear model, 234 Lipschitz function, 100, 193 local time, 94, 209 joint continuity, 96 locally bounded, 141 lower semicontinuous, 186 Markov property, 25 Markov transition probabilities, 154 Markovian, 303 martingale, 13, 359 continuous, 54 convergence theorem, 363 local, 54, 139 locally square integrable, 139 problem, 316 representation theorem, 80, 81 uniformly integrable, 54, 364 maximum principle, 176 mean, 350 mean rate of return, 218 measure-valued branching diffusion process, 317 metric entropy, 51 minimal augmented filtration, 2, 160 modulus of continuity, 247, 260 moment, 350 monotone class theorem, 378 multiplication theorem, 354
natural scale, 326 Neumann boundary condition, 290 Newtonian potential density, 175 NFLVR condition, 223 no free lunch, 223 nondivergence form, 296, 315 non-negative definite, 254 normal, 349, 371 nowhere differentiable, 46 null set, 1, 111 observation process, 229 occupation time density, 175 occupation times, 97 one-dimensional diffusion, 326 optimal reward, 187 optimal stopping problem, 184 optional σ-field, 111 optional projection, 119 optional stopping theorem, 17, 360 optional time, 15 Ornstein–Uhlenbeck process, 159, 198 orthogonality lemma, 131 outer probability, 111 p-variation, 30, 48 partial sums, 364 paths, 2 paths locally of bounded variation, 54 Picard iteration, 101 Poincaré cone condition, 174 Poisson, 349, 371 point process, 147 process, 32, 171 Poisson’s equation, 319 portmanteau theorem, 237 potential, 155, 323 prévisible, 111 predict, 112 predictable, 64, 130 predictable σ-field, 64, 111 predictable projection, 120 probability, 348 process, 1 product formula, 74, 85 progressively measurable, 4 Prohorov metric, 241 Prohorov theorem, 239 purely discontinuous, 143 quadratic variation, 57, 79 quasi-left continuous, 165 random variables, 348 Ray–Knight theorem, 209 recurrence, 167 reduce, 139 reflection principle, 27
regular, 173, 326 regular conditional probability, 312, 380 regular Dirichlet form, 307 reproducing property, 252 resolvent, 155, 286 reward function, 184 right continuous filtration, 1 right continuous process, 2 right continuous with left limits, 2 scale function, 327 scaling, 7 Schrödinger operator, 323 Schwartz class, 379 section theorem optional, 117 predictable, 117 self-financing, 219 semigroup, 155 semigroup property, 155 semimartingale, 54, 141 set-indexed process, 255 shift operators, 158 signal process, 229 simple symmetric random walk, 109, 248 Skorokhod embedding, 100 Skorokhod representation, 245 Slutsky’s theorem, 242, 369 space-time process, 182 spectral theorem, 309 speed measure, 329 square integrable martingale, 55 stable subordinator, 347 stationary increments, 6, 339 stochastic integral, 64, 134, 150 local martingales, 69 multiple, 88 semimartingales, 69 stochastic process, 1 stopping time, 15, 359 Stratonovich integral, 84 strong Feller process, 161 strong law of large numbers, 364 strong Markov process, 165 strong Markov property, 25 strongly reduce, 139 sub-Markov transition probability kernels, 283 submartingale, 359 super-Brownian motion, 317 supermartingale, 359 support theorem, 93, 208 symmetric difference, 12 symmetric stable process, 346 Tanaka formula, 94, 95 terminal time, 177 tight, 369
time change, 78, 105, 180 time inversion, 11 totally inaccessible, 112, 130 trading strategy, 219 trajectories, 2 transience, 167 transition densities, 291 transition probabilities, 154 uniform ellipticity, 296, 307, 315 uniformly absolutely continuous, 356 uniformly integrable, 356 unique in law, 204 upcrossings, 18, 362 usual conditions, 1
variance, 350 versions, 2 Vitali convergence theorem, 357 volatility, 218 weak convergence, 367 weak Feller process, 161 weak solution, 204 weak uniqueness, 204 well posed, 316 well measurable, 111 Wiener measure, 6 Yamada–Watanabe condition, 196