A}. E[Xn; T < n] _ Fn j=l E[Xn; T = j], and since {T = Because j} E .3j for all j > 1, the submartingale property implies that E[X,n ] > F,I E[Xj; T = j] > AP{T < n}. This proves the first Doob inequality. For the second portion of (8.26) define (8.28)
7-:= inf{1 < j:5 n : Xj < -A},
where inf 0 := oo. By the optional stopping theorem, EXI < EXTnn Since
X. < -A on Jr < oo}, (8.29)
EXTnn = E[XT; r < n] + E[Xn; r > n] < -AP{r < n} + E[X,+].
This completes the proof.
0
The following convergence theorem of Doob (1940) is a consequence of the Doob inequalities.
The Martingale Convergence Theorem. Let X be a submartingale. Suppose either: (i) X is bounded in LI(P); or (ii) X is non-positive a.s.
Then, limn, Xn exists and is finite a.s. Proof. We follow the general outline of the proof of the strong law of large numbers (p. 73): We first prove things in the L2-case; then truncate down to LI(P). This is achieved in four easy steps.
5. Inequalities and Convergence
135
Step 1. The Non-negative L2-Bounded Case. If X is non-negative and bounded in L2(P), then for all n, k > 1, IIXn+k - XnII2 = IIXn+kII2 + IIXnII2 - 2E[Xn+kXnj (8.30)
= IIXn+kII2 +IIXnII2 - 2E [E(Xn+k I -rn)Xn] IIXn+kII2 - IIXnII2
According to Lemma 8.18, X2 is a submartingale since Xn > 0 for all n. Therefore, IIXnII2 / SUPm IIXmII2 as n / oo. It follows that {Xn}n 1 is a Cauchy sequence in L2(P), and so it converges in L2(P). Let X00 be the L2(P)-limit of Xn, and find nk T oo such that IIXoo - Xnk1I2 < 2-k. By Chebyshev's inequality, 00
00
EP{IXoo- XnkI? }
(8.31)
k=1
4-k
vE>0.
k=1
Thus, by the Borel-Cantelli lemma (p. 73), limk.oo Xnk = X00 a.s. On the other hand, IIXnk+1- XnkII2
IIXnk+l - Xnk III
< IIXoo - XnkII2 + IIXoo - Xnk+l 112 (8.32)
< 2-k +2 -k+l
= 3.2-k Therefore, Corollary 8.35 shows us that for all e > 0, 00
EP (8.33)
k=1
max
jnk:5j:5nk+1
IXj - Xnk I
2
00
>- e< E IIXnk+l - Xnk III E k=1 6
0-0
< E 2-k < oo. k=1
We have used the fact that {Xn+j -
is a submartingale for each fixed n with respect to the filtration {9j+n}j9_o, and that this submartingale starts at 0. It follows from the Borel-Cantelli lemma that (8.34)
lim
max
k-.oo nk<j
IXj - Xnk I = 0
a.s.
Because Xnk -1 X00 a.s., we have proved that lim,',o X,n = X., a.s. As X00 E L2(P) it follows that X00 is a.s. finite. Step 2. The Non-positive Case. If Xn < 0 is a submartingale, then exp(X,,) is a bounded non-negative submartingale (Lemma 8.18). Thanks to Step 1, limn..00 exp(Xn) exists and is finite a.s. Step 3. The Non-negative L'-Bounded Case. If Xn is a non-negative
submartingale that is bounded in L1(P), then thanks to the Krickeberg
136
8. Martingales
decomposition (p. 128), we can write X. = Y is a non-negative martingale and Z is a non-negative supermartingale; see also Remark 8.21. By Step 2, limn Yn and Z exist and are finite a.s. Consequently, Xn exists and is finite a.s. Step 4. The L1-Bounded Case. If X is an L1-bounded submartingale,
then we can write X = Y - Yn - Zn where Y is an L'-bounded martingale and Z is an L1-bounded non-negative supermartingale. Because Y+ and Y- are non-negative submartingales (Lemma 8.18), Step 3 implies that limn_ Yn and limn.w Yn exist a.s. Similarly, Z is a non-negative supermartingale and limn_m Zn exists almost surely; see Step 2. This completes the proof. 0
6. Further Applications Martingale theory provides us with a powerful set of analytical tools and, as such, it is not surprising that it has made an impact on a large number of diverse mathematical problems. This section contains a few applications of this theory. More examples can be found in the exercises.
6.1. Kolmogorov's Zero-One Law. Let {Xt}i=1 denote independent random variables, and define T to be the corresponding tail a-algebra; see Definition 6.14. Recall that the Kolmogorov zero-one law (p. 69) asserts that J is trivial. Here is a martingale proof: Let Fn := a({Xt}= 1), and consider any event A E J. Because A is independent of .fin, we have the a.s. identity
P(A I Fn) = P(A). But P(A ] 9n) defines a Doob martingale. Therefore, in accord with the martingale convergence theorem, L = limn. P(A 1 9n) exists a.s. and in L'(P). We claim that L = 1A a.s. This would then prove that P(A) = 1A a.s., which is Kolmogorov's zero-one law. Fix an integer k _> 1 and note that E[L; B] = limn. E[P(A J r,,); B] = P(A fl B) for all B E Fk. By the monotone class theorem (p. 30), E[L; B] = P(A fl B) for all B E 9, , where 9,,. denotes the smallest a-algebra that contains U°n°__1 n. Because L is 9,,,,-measurable and A E A"., this ensures that L = P(A 19 ...) = 1A a.s. The result follows.
6.2. Levy's Borel-Cantelli Lemma. In this section we describe an optimal improvement to the Borel-Cantelli lemma (p. 73; see also Problem 6.20, p. 86). This improvement is due to Levy (1937, Corollary 68, p. 249). First, we need a definition.
Definition 8.36. If E and F are events, then we say that E = F almost surely when 1E(w) = 1F(w) for almost every w.
6. Further Applications
137
Theorem 8.37. If {. n}n is a filtration and E1, E2, ... are events such 1
that En E .fin for all n > 1, then ( (8.35)
{En occurs infinitely often}
00
E P(Ef 19n_1) = 00
l a. S.
n=2
Consequently, the two events, F 1 := {E 2P(EnI,Fn-1) = oo} and F2 :_ {En occurs infinitely often} have the same probability. The proof of Levy's Borel-Cantelli lemma rests on a general result about martingales with bounded increments.
Theorem 8.38. Suppose {Xn}n is a martingale such that 1 Xn - Xn_1 < a a.s. for all n > 1, where a is a positive non-random constant. Consider the events L1 := {supra Xn < oc}, L2 := {infra Xn > -oo}, and L3 {limn-o. Xn exists and is finite}. Then L1 = L2 = L3 a.s. 1
Proof. For any A > 0 define Ta := inf{n > 1 : Xn > .1}, (8.36) where inf 0 := oo. By the optional stopping theorem (Corollary 8.27), {XnATa }°°_1 is a martingale. Moreover, the fact that the increments of X are at most a implies that XTA < a + A. Therefore, a + A - XnATA defines a non-negative martingale. This must converge a.s. Consequently, for any A > 0 there exists a null set off which limn_.oo XnAT,, exists and is finite. Take the union of these null sets, as A ranges over all positive rationals, to deduce that outside one null-set N, limn-c. X nATA exists and is finite for all rational A > 0. If w E L1 then Ta(w) is infinite for all rational A > supra Xn(w). There-
fore, L1 n N` C L3. By considering the martingale -X we find also that L2 n NC C L3. This proves that (L1 U L2) n N` C L3, whence 1L,uL2 5 1L3 a.s. Since L3 C (L1 n L2), the result follows. 2{1E, - P(E; 1 .!'i_ 1) } Proof of Theorem 8.37. The variables Xn = (n > 1) define a martingale with bounded increments. In the notation of Theorem 8.38, L1 = {Ei 1E, < oo} and L2 = {Ei P(EE j .9'i_1) < oc}.
[This merits a moment's thought.] The proof follows.
6.3. Khintchine's LIL. Suppose {Xi}°_1 are i.i.d. random variables taking the values ±1 with probability 1/2 each, and define Sn := X1 +- +Xn. By the strong law of large numbers (p. 73), Sn/n -i 0 a.s. This particular form of the strong law first appeared in the context of the normal number theorem of Borel (1909). See Problem 6.22 on page 86. One would like to know how fast Sn/n converges to zero. The central
limit theorem suggests that Sn/n cannot tend to zero much faster than
8. Martingales
138
n-1/2. The correct asymptotic size of Sn/n was found in a series of successive improvements by Hausdorff in 1913 (see 1949, pp. 420-421), Hardy and Littlewood (1914), Steinhaus (1922), and Khintchine (1923). The definitive result, along these lines, is the law of the iterated logarithm (LIL) of Khintchine (1924): S" S. 1 a.s. lim sup = - lim inf (8.37) n-oo (2n In In n)1/2 = n-oo (2n In In n)1/2
Khintchine's LIL has a remarkable extension that is valid for sums of general i.i.d. random variables with finite variance.
The Law of the Iterated Logarithm. If {Xi}i=1 are i.i.d. random vari+ Xn, then ables in L2(P) and Sn := XI + S. - nEXI lim sup
(8.38)
n-.oo (2n In In n)1/2
= SD(XI) a.s.
When the Xi's are bounded this was proved by Kolmogorov (1929). Cantelli (1933a) improved Kolmogorov's theorem to the case that X1 E L2+6(P) for some d > 0. Then the LIL for general mean-zero finite-variance increments remained elusive for nearly a decade, until Hartman and Wintner
(1941) devised an ingenious truncation method which reduced the general LIL to that of Kolmogorov. We will derive the LIL only in the case that the X's are normal. The theorem, in its full generality, is much more difficult to prove.
Proof of the LIL for Normal Increments. Without loss of generality, we may assume that the Xi's are standard normal random variables. Define
S.
A:= hm sup
(8.39)
m-oo (2m In In M) 1/z
According to the Kolmogorov 0-1 law (p. 69), A is almost surely a constant. Our task is to prove that A = 1. We do this in three steps.
Step 1. A Large-Deviations Estimate. Fix a t > 0 and define Mn :_ exp (tSn - t2n). Let An := o ({Xi}s 1), and verify that M is a non-negative 2 mean-one martingale. Moreover,
Imax Sl > nt) C I Imax Ml > exp(nt2/2)
(8.40) I.
i_
j_
JJJ
y. J
According to Doob's maximal inequality (p. 134), for all integers n _> 1 and all real numbers t > 0, (8.41)
P
max S. > nt y < exp (- nt2/2). JJJ
6. Further Applications
139
Step 2. The Upper Bound. Choose and fix c > 0 > 1, and define 9k oo, LOkJ (k =11,2.... ). We apply (8.41) to deduce that as k (8.42)
P { max Si > (2cOk_l lnln0k_1)l/2l < exp
J
11<j<ek
cOk-I In InOk_1
`
ek
=
k-(c1e)+°(1).
Thus, the left-hand side is summable in k. By the Borel-Cantelli lemma (p. 73), with probability one there exists a random variable ko such that for all k > k0, maxl<j<ek Sj < (2cOk_1 In ln0k_I )1/2. For all m > Bko we can find k > ko such that 0k_1 < m < °k, and hence, (8.43)
Sm < maxSj j!ek < (2cOk_1lnInOk_l)1/2 < (2cmInInm)1/2.
This proves that A < c1/2. Because the latter holds for all c > 0 > 1, it follows that A < 1. This is fully one-half of the LIL in the case of standard normal increments. We can also apply the preceding to the process -S to obtain the following:
(8.44)
ISmI
limsup
m-oo (2m In In m.)1/2
<1
-
a.s.
Step 3. A Lower Estimate. There exists a constant A > 0 such that (8.45)
P {X1 > Al >
Ae _ 2/2
d'\ > 1.
See Problem 1.16, page 14 for a hint. Choose and fix 0 > 1, and define Ok := [ek J . Consider the events (8.46)
Ek := {Sek+1 - Sok >_ (2ati+1 ln+ In+ Bk)1`21
where ln+ x := ln(x V e), and (8.47)
ak+1
rr
Var (Sek+ 1
- Sek) = 0k+1 - Bk.
The Ek's are independent events, and because of (8.45), for all k large, (8.48)
A
P(Ek) = P {N(0,1) > (21nln0k)112l > JJ
In Ok (2 In In Ok)
112
Consequently, Ek P(Ek) = oo, and hence by the independence part of the Borel-Cantelli Lemma, (8.49)
lim sup k-.oo
Sell+1 - Sek 1/2 > 1 (2(0k+1 Ok) In In 0k)
-
as.
8. Martingales
140
Since 9k+1 - 9k ^' Ok+1(1 - 0-1) as k -4 oc,
S Sek+l - SOk
lim sup
(8.50)
1/2
>
C1 - B)
k-.oo (29k+11n In 9k ) 1/2
a.s.
Thanks to this, (8.44), and the fact that 9k+1 - 9.9k as k
oo,
Sek+1 A > li msup k-oo (29k+11nln9k)1/2
-
Sek+1 Sek > lim sup k-oo (20k+1 In In 0k)1/2
(8.51)
>
C1 -
1/2
0
-
91/2/2
- limk---Sup(29k+11n In 9k) /2 I Sek I
1
a.s.
Let 0 T oc to find that A > 1.
O
6.4. Lebesgue's Differentiation Theorem. The fundamental theorem of calculus asserts that if f : R -I R is continuous then F(w) := fo f (x) dx is differentiable and F' = f. In fact, we have the stronger result that w+6
1
1io
(8.52)
b
f
f(y)dy = f(w),
uniformly for all w in a given compact set. Here is why: For all w E R and b > 0, I
(8.53)
1
fw+6
rw+6 Jw
If (y)
f (y) dy - f (w) l < b
Jw
- f(w) l dy.
Therefore, (8.52) follows from the uniform continuity of f on compact sets. There is a surprising extension of this, due to H. Lebesgue, that holds for all integrable functions f. The following is the celebrated differentiation theorem of Lebesgue.
Theorem 8.39. If fo If (x) I dx < oc, then (8.52) holds for almost every w E [0,1]. Consider the Steinhaus probability space ([0, 1] , 66([0, 11), P). In probabilistic language, Theorem 8.39 states that (8.52) holds almost surely provided that f E L' (P). In order to derive this formulation we need a maximal inequality for the following function M f that is known as the HardyLittlewood maximal function (1930, Theorem 17). First we extend f to a function on R by setting f (w) := f (0) if w < 0 and f (w) := f (1) if w > 1. Then, we define: 1
(8.54)
(Mf)(w) = M(f)(w) =
sup
f
w+6
6E(0,1-w) 8 w
where 0/0 := 0 to ensure that (M f) (1) = 0.
If(y)I dy
"w E [0, 11,
6. Further Applications
141
Theorem 8.40. For all A > 0, p > 1, and f E L'(P), P{Mf > A} < 8PIlflip
(8.55)
Let us first prove Theorem 8.39 assuming the preceding maximal-function inequality. Theorem 8.40 is proved subsequently.
Proof of Theorem 8.39. For notational convenience, define the "averaging operators" A6 as follows: 1
(8.56) A6(f)(w) := (A6f)(w) :=
b
f
w+6
f(y)dy dw E [0,1], f E L1(P)
Thus, we have the pointwise equality, (M f)(w) = suP6E[o,1-w] A6(I f I)(w)
Throughout this proof we tacitly extend the domain of all continuous functions g : [0, 11 - R to R by setting g(w) := g(0) for w < 0 and g(w) := g(1) for w > 1. Because continuous functions are dense in L1(P) (Problem 4.18, p. 50), for every n > 1 we can find a continuous function gn such that II9n - f II i n-1. Let 2 := lim sup6lo I A6f - f I to find that
.2
(8.57)
610
= lim sup IA69n - A6f I + I9,, - f I 610
M(I9n-f1)+I9n-fIIf 2 > A, then by the triangle inequality one of the two terms on the right-most side must be at least A/2. Therefore, we can write (8.58)
P {2 > Al < T1 +T2,
where (8.59)
T1:=P{M(I9n-fl)> 2}
and
T2:=P{I9n-fI>
2}.
We estimate T1 and T2 separately. On one hand, we can apply Theorem 8.40, with p = 1, to deduce that (8.60)
T15
16 A
16 II9n-fII1<--
An
On the other hand, we appeal to Chebyshev's inequality (p. 43) to find that (8.61)
T25
2 -2 II9n-fl11Sn
Consequently, P{2 _> Al < 18/(An) for all n > 1. Let n -' oo and A 10, in this order, to deduce that 2 = 0 a.s. This proves the theorem.
8. Martingales
142
Proof of Theorem 8.40. Because f can be replaced with if 1, we can assume without any loss in generality that f > 0. Also, we will extend the domain of the definition of f by setting f (w) := 0 for all w E R \ (0, 1]. Define .9'n to be the collection of all dyadic intervals in (0, 1]. That is, 1 E . !FnO if and only if I = (j2-n, (j + 1)2-n] where j E {0 , ... , 2n - 11 and n > 0. Define .$n to be the o-algebra generated by .fin. Since every element of FnO is a union of two of the elements of .fin+1+ it follows that gn C JFn+1
(8.62)
do > 0.
That is, {.$n}n°_o is a filtration; it is known as the dyadic filtration. We can view the function f as a random variable, and compute Mn E[f I 9n] using Corollary 8.8: (8.63)
r Mn(w) _ E 1Q(w)2n f f (y) dy
for almost all w E [0, 11.
QE.
It should be recognized that the preceding sum consists of one term only. Next define i to be the collection of all shifted dyadic intervals of the
form J = (j2-n +
2_n-1
,
(j + 1)2-n +
2-n_1),
where j E Z and n > 0.
Let Wn denote the o-algebra generated by the intervals in Wno, and define Nn := E[ f ] 8on]. Because f vanishes outside [0, 11, (8.64)
Nn(w) _
1Q(w)2n
JQ
for almost all w E [0, 11.
f (y) dy
Consider w E (0,1) and b E (0,1 - w). There exists n = n(w) > 0 such that 2-n-1 < 5 < 2-n. We can find 1(w) E JrnO and J(w) E fl-both containing w-such that (w, w + 6) C I (w) U J(w). Because f > 0, f =- 0 off [0,1], and b > 2-n-1, this implies that 1
(8.65)
d
f+' f (y) dy <- 2n+1 j f (y) dy +
Jd(W)
f (y) dy
= 2 (Mn(w) + Nn(w))
Optimize over all 8 to find that M f < 2 supra Mn + 2 supra Nn. Therefore,
for all y>0, (8.66)
P{Mf >A} n>0
4
A}+P(supNn> n>O
Al 4
Note that Mn + N,, is not a martingale because M and N are adapted to different filtrations. However, M and N are martingales in their respective filtrations. We apply the first maximal inequality of Doob (p. 134) to the
6. Further Applications
143
submartingale defined by IMnII" to find that (8.67)
P j suP Mn > l n>0
41
<
sup E (IMnIp) ap n>0
P. Ap IIfIIp
[The last inequality follows from the conditional form of Jensen's inequality.]
A similar inequality holds for N. We can combine our bounds to obtain, 2,4P (8.68)
P{Mf > A} <
IIf IIp
0
The theorem follows because 2.4P < 8p.
The following corollary of Theorem 8.40, essentially due to Hardy and Littlewood (1930, Theorem 17), is noteworthy as it has a number of interesting consequences in real and harmonic analysis.
Corollary 8.41. If p > 1 and f E LP(P) then (8.69)
f l I(Mf)(t)I" dt < (p8p1)p f 1 If(t)Ip dt.
6.5. Option-Pricing in Discrete Time. We now take a look at an application of martingale theory to the mathematics of securities in finance. In this example we consider a simplified case where there is only one
type of stock whose value changes at times n = 1,2,3, ... , N. We start with yo dollars at time 0. During the time period (n, n + 1) we look at the performance of this stock up to time n. Based on this information, we may decide to buy An+i-many shares. Negative investments are also allowed in the marketplace: If An(w) < 0 for some n and w, then we are selling short for that w. This means that we sell An(w) stocks that we do not own, hoping that when the clock strikes N, we will earn enough to pay our debts. Let Sn denote the value of the stock at time n. We simplify the model further by assuming that ISn+i -SnI = 1. That is, the stock value fluctuates b y exactly one unit at each time step, and the stock value is updated precisely at time n f o r every n = 1, 2, .... The only unexplained variable is the ending time N; this is the so-called time to maturity and will be explained later. Now we can place things in a more precise framework. Let St denote the collection of all possible w = (w1, ... , wN) where every
wj takes the values ±1. Intuitively, wj = 1 if and only if the value of our stock went up by 1 dollar at time j. Thus, wj = -1 means that the stock went down by a dollar, and Sl is the collection of all stock movements that are theoretically possible. Define the functions Si, ... , SN by So(w) := 0, and (8.70)
Sn(wi,...,wn)
wi +...+wn
dn= 1,...,N.
8. Alartingales
144
We may abuse the notation slightly and write &(w) in place of S,,(wi ...... a ).
In this way, S,,(w) represents the value of the stock at time n, and corresponds to the stock movements w1, ... , w,,. During the time interval (n, n + 1), we may look at w1,... , w,,, choose a number A,,+1(w) =A.+, (w, .... , wn ), and buy which might depend on shares. If our starting fortune at time 0 is yo, then our fortune at time n depends on { A; (w) };_11, and is given by n (8.71)
YY(w) = Yn(wl,...,wn) = yo + E A.i(w)[Sj(w) - Sy-1(w)], J=1
as n ranges from 1 to N. The sequence {A=(w)}( 1 is our investment strategy.
Recall that it depends on the stock movements {w;}N I in a "previsible manner": i.e., for each n > 2, A,,(w) depends only on w,.... ,Wn-l- JAI does not depend on w.]
A European call option is a gamble wherein we purchase the right to buy the stock at a given price C-the strike or exercise price-at time N. Suppose we have the option to call at C dollars. If it happens that SN(w) > C, then we have gained (SN(w) - C) dollars. This is because we can buy the stock at C dollars and then instantaneously sell the stock at SN(W). On the other hand, if SN(w) < C then it is not wise to buy at C. Therefore, no matter what happens, the value of our option at time N is (SN(w) - C)+. An important question that needs to be settled is this: (8.72)
What is the fair price for a call at C?
This was answered by Black and Scholes (1973) and Merton (1973) for a related, but slightly different, model. The connections to probability were discovered later by Harrison and Kreps (1979) and Harrison and Pliska (1981). The present model, the so-called "binomial pricing model," is due to Cox, Ross, and Rubenstein (1979). In order to explain their solution to (8.72) we need a brief definition from finance.
Definition 8.42. A strategy A is a hedging strategy if-
(i) Using A does not lead us to bankruptcy; i.e., Yn(w) > 0 for all
n = 1,...,N. (ii) Y attains the value of the stock at time N; i.e., YN(W) = (SN(w) - C)+. Of course any strategy A is also previsible.
Let us posit that there are no "arbitrage opportunities," where arbitrage is synonymous to "free lunch." That is, we assume there are no risk-free investments. Then, in terms of our model, yo is the "fair price of a given
6. Further Applications
145
option" if, starting with yo dollars, we can find a hedging/investment strategy that yields the value of the said option at time N, no matter how the stock values behave.
The solution of Black and Scholes (1973), transcribed to the present simplified setting, depends on first making (Q, .1'(l)) into a probability space. Here, Y(Q) denotes the power set of Q. Define the probability measure P so that Xj (w) = wj are i.i.d. taking the values ±1 with probability each. In words, under the measure P, the stock values fluctuate at random
but in a fair manner. Another, yet equivalent, way to define P is as the product measure: (8.73)
P(dw) = Q(dwl) ... Q(dwN)
dw E 1,
where Q({1}) = Q({-1}) = 1/2. Using this probability space (1, .9(l), P), {A;}°°1, {S1}°O1, and {Yt}°_1 are stochastic processes, and we can present the so-called Black-Scholes formula for the fair price yo of a European option.
The Black-Scholes Formula. A hedging strategy exists if (8.74)
yo = E [(SN - C)+] .
Proof (Necessity). We first prove Theorem 6.5 assuming that a hedging strategy A exists. If so then the process Yn defined in (8.71) is a martingale; see Example 8.15. Moreover, by the definition of a hedging strategy, Yn > 0
for all n, and YN = (SN - C)+ a.s. (in fact for all w). On the other hand, martingales have a constant mean; i.e., EYN = EY1 = yo, thanks to (8.71). Therefore, we have shown that yo = E[(SN - C)+] as desired. (] In order to prove the second half we need the following.
The Martingale Representation Theorem. In (Q, .9(1k), P), the process S is a mean-zero martingale. Any other martingale M is a martingale transform of S; i.e., there exists a previsible process H such that n
(8.75)
Mn = EMI + > H3 (Si - Sj-1)
do = 1, ... , N.
j=1
Proof. Because Lemma 8.29 proves that S is a mean-zero martingale, we can concentrate on proving that M is a martingale transform. Since M is adapted, Mn is a function of w1, ... , wn only. We abuse the notation slightly, and write (8.76)
Mn(w) = Mn(w1, ... , Wn)
Vw E I1.
The martingale property states that E[Mn+1 1,9n] = Mn a.s. Now suppose 01, ... , On are bounded and Oj is a function of wj only. Then, thanks to the
8. Martingales
146
independence of the wj's,
E
OLi.Mn+h] =1
(8.77)
[Jctj(wj)Mn+l(wl,...,wn,-1)Q(dwl) ... Q(dwn)
-2 1
+2
J j=1
j(wj)Mn+1(w1,...,wn,1)
G1
1) ... Q(dwn)-
That is, we can write n
n
E f Oj M-+1 = E H Oj . Nn
(8.78)
j=1
,
j=1
where (8.79)
1
Nn(w)
1
=2Mn+1(wl,...,wn, 1)+2Mn+1(wl,...,wn,
Note that Nn is Fn measurable for every n = 1, . . . , N. Therefore, (8.77) and the martingale property of M together show that M = N a.s. This leads us to the formula (8.80)
1
1
Mn(w) = 2Mn+l(wl,... ,wn, l) + 2Mn+1(wl,...,wn, -l),
valid for almost all w E 52.1 In fact, because fl is finite and P assigns positive measure to each wj, the stated equality must hold for all w. Moreover, since
go = 10, 52}, the preceding discussion continues to hold for n = 0 if we define Mo = EMI. Since Mn(w) = ZMn(w) + Z1Lin(w), the following holds for all 0 < n <
N-1andallwE0: (8.81)
Mn+l(wl,...,wn,l)-Mn(w)=Mn(w)-Mn+l(wl,...,wn,-1).
'While this calculation is intuitively clear, you should prove its validity by first checking it of the form fjlZi hj(wj), and then appealing to a monotone class argument. for
6. Further Applications
147
Let dj := Mj+1 - Mj so that Mn+l(w) - Mo = E"=odj(w), and apply the preceding as follows:
[dj(w)1{1}(wj+1) +dj(w)1{-1}(wj+l)]
Mn+1(w) - 11ro =
j=o n (8.82)
_E
(Mj+1(wi,... ,wj, l) - Mj(w)) [1{1}(wj+1) - 1{-1}(wj+1)]
j=o n
E (Mj+1(w1,...,wj, l) - Mj(w)) [Sj+1(w) - Sj(w)] j=o
This proves (8.75) with Hj(w) := Mj(wl,...,wj_1i 1)-Mj-I(w) and Mo O EMI. [Note that H is previsible.] We are ready to prove the second half of the Black-Scholes formula.
Proof of the Black-Scholes Formula (Sufficiency). The process Yn = E[(SN - C)+ I fn] (0 < n < N) is a non-negative Doob martingale. Also it has the property that YN = (SN - C)+ almost surely, and hence for all w (why?). Thanks to the martingale representation theorem, we can find a previsible process A such that n-1
(8.83)
Yn = EYi + > Aj(Sj - Sj_1). j=1
It follows that A is a hedging strategy with yo = EYI. By the martingale property, EYi = EY2 = . . . = EYN. This implies that yo = E[(SN - C)+J, which proves the theorem. 0
6.6. Rademacher's Theorem. A function f : (0, 1) -; R is Lipschitz continuous if there exists a constant A > 0 such that for all x, y E (0, 11, (8.84)
If (x)
- f(y)I < AIx - yl.
The optimal choice of A is called the Lipschitz constant of f . If f' exists and is continuous, then one can perform a one-term Taylor expansion to note that f is Lipschitz continuous. The following theorem of Rademacher (1919) asserts a remarkable converse.
Theorem 8.43. If f : (0, 1] -+ R is Lipschitz-continuous then it is differentiable almost everywhere.
Proof. Let ((0,1] , 4((0,1]) , P) denote the Steinhaus probability space, so that P is Lebesgue measure, and "a.e." is the same thing as "a.s." Also let {.`fi'n}°° 1 and {.9n}n°_1 respectively denote the dyadic intervals and filtration
(p. 142).
8. Martingales
148
two numbers, f(Q) and We associate to all dyadic intervals Q E r(Q): t(Q) is the left end-point of Q and r(Q), the right one. For instance, if Q = (k2--, (k + 1)2-n], then f(Q) = k2-n and r(Q) = (k + 1)2-n. Define (8.85)
Xn(w) :_ QE.9"n
f(r(Q)) - f(e(Q))1Q(w) r(Q) - e(Q)
dw E (0, 1).
This is a difference quotient because the sum consists of exactly one term and the Lipschitz continuity off ensures that sup,, I Xn I is a bounded random variable. From here on, the proof splits into two steps. Step 1. X is a martingale with respect to .9. To prove this, write (8.86)
Xn(w) = E E
f (r(Q))2- f (E(Q))1Q(w)
JE.9 _1 QE.$,o,:
QCJ
By Corollary 8.8, if w E J E .fin-1 and Q E gn is a subset of J, then P(Q I gn_1)(w) is the classical probability P(Q I J), which is z. If U) ¢ J then P(Q I.n_1)(w) = 0. Thus, E[Xn 19n-1] =
E
2
f (t(Q))1 J
f (r(Q)
QE.Fn:
2-
n
QCJ (8.87)
f (r(J)) - f (t(J))1J
_
2-'
JE.3n_1
= Xn_1.
According to the martingale convergence theorem, all bounded martingales
converge a.s. and in L1(P) (p. 134). Therefore, we can find X. such that Xn
X,,,, a.s. [P] and in L1(P).
Step 2. The Conclusion. Suppose I, J E .fin for the same n > 1, and I lies to the left of J; that is, every u E I is less than every v E J. Then we denote this by I < J. For all Q E .5n, fQ Xn(w) du) = f (r(Q)) - f (f(Q)). Therefore, for all
(8.88)
r f (r(J)) - f (0) = E / Xn(w) dw = IEYfO:
I<J
I
j
"M
Xn(w) dw.
6. Further Applications
149
Given any x E (0, 11 and n > 1, we can find a unique J E .'ro such that w E .`ro . If A denotes the Lipschitz constant of f, then
- f (r(J))I < Aix - r(J) I < 2 Therefore, If (x) - f (0) - fo ") Xn(w) dwi < A2-n. Also, (8.89)
If (x)
jr(j) Xn(w)
(8.90)
rx+2-^
dw - fox Xn(w) dw
<_ J
IXn(w)Idw.
By the dominated convergence theorem, the right-hand side goes to zero as n -+ oo. Therefore, by the monotone convergence theorem,
f (x) - f (0) = J X,,, (w) dw
(8.91)
ex E (0, 11.
0
Rademacher's theorem follows from Lebesgue's differentiation theorem (Theorem 8.39), and f' = X, almost everywhere.
6.7. Random Patterns. Suppose X1, X2, ... are i.i.d. random variables with P{X1 = 1} = p and P{X1 = 0} = q, where q:= 1-p and 0 < p < 1. It is possible to use the Kolmogorov zero-one law and deduce that the infinite sequence X1, X2,... will a.s. contain a zero, say. In fact, with probability one, any predescribed finite pattern of zeroes and ones will appear infinitely often in the sequence {X1, X2,. ..}. Let N denote the first k such that the sequence {X1, ... , Xk} contains a predescribed, non-random pattern. Then, we wish to know EN.
The simplest patterns are "0" and "1." So let N denote the smallest k such that {X1, . . . , Xk} contains a "0." It is not hard to convince yourself
that EN = 1/q because P{N = j} = p)-1q for j = 1,2,.... But this calculation uses too much of the structure of the pattern "0." Next is a more robust argument, due to Li (1980): Consider the process Yn := 91{X1=0} +
(8.92)
Define
91{X2=0} +
+q
to be the o-algebra defined by {X;} 1 for every n > 1. Then,
for alln>1, (8.93)
E[Yn+1 I fn] = Y. + 1 P(Xn+1 = 01.13.) = Yn + 1.
Therefore, {Yn - n}n 1 is a mean-zero martingale (check!). By the optional stopping theorem, E[N A n] = EYNnn for all n > 1. But N < oc a.s., and both IN A n}n and {YNnn}n 1 are increasing. Therefore, we can apply the monotone convergence theorem to deduce that EYN = EN. Because YN = (1/q) almost surely, EN = 1/q, as we know already. 1
8. Martingales
150
The advantage of the second proof is that it can be applied to other patterns. Suppose, for instance, the pattern is a sequence of f ones, where f > 1 is an integer. Consider
1
Zn,t :=
1{X,=1,...,X1=1} + 71{X2=1,....Xt+1=1} 1
+ ... +
(8.94)
1{xn_t+,=l,...,X =1} + pe11{Xn-t+2=1,....Xn=1} 1
+ p1-221{Xn-1+3=1,...,Xn=1} + ... + P1{Xn=1}
Then, you should check that {Zn,t - n}'_1 is a martingale. As before, we
have EZN,t = EN, and now we note that ZN,t = (1/pt)+(1/pt-1)+ +(1/p) a.s. Therefore, (8.95)
Therefore, set e = 2 to find that
EN =
(8.96)
1
+
p
for the pattern "11."
Our next result is another example of this kind.
Lemma 8.44. If the pattern is "01" then EN = 1/(pq). Define (8.97)
Wn
p1{X1=0,X2=1} +
+
1 1{Xn-,=O,xn=1} + 91{x.=o}
One can prove that {Wn - n}°O=1 is a martingale. Lemma 8.44 is proved by using the preceding martingale methods. Suppose we wished to know which of the two patterns, "01" and "11," is more likely to come first. To answer this, we first note that { Wn - Zn,2 } °,° 1 is a martingale, since it is the difference of two martingales.
Define T to be the smallest integer k > 1 such that the sequence {X1, ... , Xk} contains either "01" or "11." Then, we argue as before and
find that E[WT - ZT,2) = 0. But WT - ZT,2 = q-1 on ("01" comes up first}, and WT - ZT,2 = -(1/p) - (1/p2) = -(p + 1)/p2 on {"11" comes up first}. Therefore,
0=E[WT-ZT,21 (8.98)
= P { "01" comes up first} -
p p+21
P { "11" comes up first}.
Solve to find that (8.99)
P ("01" comes up before "11" } = 1 - p2.
Problems
151
6.8. Random Quadratic Forms. Let {Xi}i=1 be a sequence of i.i.d. random variables. For a given double array {ai,,, } 2,7 =1 of real numbers, we wish
to consider the "quadratic form" process, do > 1.
Q,,:= >2 > ai,jXiXj
(8.100)
1
Define a,3 :_ (aij + aj,i)/2. A little thought shows that we do not alter the value of Qn if we replace aij by of .. Therefore, we can assume that ai,j = aj,i, and suffer no loss in generality. The quadratic form process {Qn}n° I arises in many disciplines. For instance, in mathematical statistics, {Qn}' 1 belongs to an important family of processes called "U-statistics."
Theorem 8.45 (Varberg, 1966). Suppose EX1 = 0, E[X2] = 1, and E[Xf ] < oo. If E 1 1 a,3 < co, then limn. (Qn - E I
E'
Proof. Let An := E1
Qn-An=2Un+Vn,
(8.101)
where (8.102)
Un :_ EE ai,,X;Xj
and
Vn := E ai,i [X?
1
I
Because (x + y)2 < 2(x2 + y2) for all x, y E R, it follows that (Qn - An)2 <
8Un + 2. Therefore, (8.103)
E [(Qn - An)2] < 8E[Un] + 2E[V,2].
By independence, E[VI] = El
Problems 8.1. We say that X E LI(P) converges to X E LI(P) weakly in LI(P) if for all bounded random E[XZJ. Show that X - X weakly in Ll(P) if for any subvariables Z,
a-algebra I C 9, E[X I y] converges to E(XIT) weakly in LI(P). Conversely, prove that if X - X in LI (P), then for any sub-o-algebra I C .9', E[X" (`.>'J - E(X 159) in V (P). 8.2. Construct three random variables U, V, W such that EIE(U I V) I WJ 0 E(E(U I W) I V] with positive probability.
8.3. Suppose X and Y are independent real (say) random variables, and / Rz - R is bounded and measurable. If g(x) := Etf(x,Y)J for all x E R, then prove that g(X) = E[j(X,Y) I XI as.
8. Martingales
152
8.4. Suppose X, Y E Ll (P) satisfy E[X I YJ = Y and E[Y I X] = X as. Prove then that X = Y as. (HINT: Prove first that E[X - Y; X < q < YJ = 0 for all q E Q; see also Doob (1953, p. 314).) 8.5. Let ((0, 1],-4((0, 11),P) denote the Steinhaus probability space, and consider X (w) = w for 0 < w < 1. Compute and compare E[X 19'1) and E[X 19'2] where 9'1 is the o-algebra generated by o((0, 1/2]), and .92 is the sub-o-algebra of 9((0, 11) that is generated by .9((0,1/21). This is due to J. A. Turner. 8.6 (Conditional Variance). Suppose cf C .IF is a o-algebra. If X E L2(P), then define Var(X 144) _
E{(X - E[X I y])21 y). Prove that Var(X [1) < E{(X - C)2 1 y} as, for every X E L2(f2 ,.S, P) and E L2(f1,4,P). Derive (8.4) as a consequence. 8.7 (Conditional Chebyshev). Prove that if I is a sub-o-algebra of.Jr and X is a non-negative random variable on (Q, .'F, P), then AP(X > a [ y) < E[X I yJ a.s. for all a > 0. 8.8 (Corollary 8.8, p. 125, Continued). Let (X, Y) be an absolutely continuous random variable with piecewise-continuous density f (x, y). Prove that for every non-negative measurable h : R -+ R,
(8.104)
E[h(X)] YJ =
fff'_ (x)f(x,Y)dr f(x,Y)dx
as.
Even though P{Y = y} = 0 for all y E R, use the preceding to justify the classical definition,
P(X
8.9 (Censoring). Let 9 denote a filtration, and consider 9-stopping times S < T as. For any fixed event A E fs define r = S1A + T1AC. Prove that r is an .r-stopping time. 8.10. Verify Proposition 8.9.
8.11. Carefully prove Lemmas 8.25 and 8.26. In addition, construct an example that shows that the difference of two stopping times, even if non-negative, need not be a stopping time. 8.12. The optional stopping theorem (p. 130) assumes that T is almost surely bounded. Construct an example to show that this assumption cannot be dropped altogether.
8.13. Prove Lemma 8.29.
8.14 (Gambler's Ruin). If {X,} 1 are i.i.d. random variables with P{X1 = 11 = 1 - P{X1 = -l} = p 54 1, then verify that, in the proof of Theorem 8.34, {(Sn)n 1 is indeed a bounded mean-one martingale. Also compute ET in the case p = 1/2. 8.15. Let {Xn}n 1 be independent random variables such that for all k > 1, hk(t) = Ee1Xk exists and is finite for all t E (-to, to) for a fixed to > 0. Prove that whenever III < to, Mn(t) _ e1Sn / {Zk=1 hk(t) defines a mean-one martingale. [As usual, S. denotes E
1
X,.]
8.16 (Likelihood Ratios). Suppose f and g are two strictly positive probability density functions on R. Prove that if {X,};__1 are i.i.d. random variables with probability density f, then Fl,'= I [g(X, )/ f (Xj)J defines a mean-one martingale. When does it converge. and in what sense(s)?
8.17 (Ptilya's Urns). An urn initially contains R red and B black balls. Except for their colors, the balls are identical. A ball is chosen at random. If it comes up red (reap. black), then it is replaced with two red (reap. black) balls. Let X denote the number of red balls in the urn after n draws. Prove that the fraction fn = Xn/(n+ R + B) of red balls has an almost-sure limit. 8.18. Prove that Definition 8.12(iii) is equivalent to E[Xn+k I ,fnJ >- Xn as, for all k, n > 1. 8.19. Prove that Doob's decomposition (p. 128) is a.s.-unique, as long as we insist that {Zn}n°__1 is previsible and Z1 = 0.
8.20. Let {Xn}n 1 be a martingale. Prove that X is bounded in L1 (P) if sup E[X,+, J < oo.
8.21. Prove that if X is a martingale, then (8.105)
r PtlmaxJXi[>a <E(]X.[P) 111
j_
-
"p > 1, n > 1, \ > 0.
Also, prove that Doob'a inequalities imply Kolmogorov's maximal inequality (p. 74).
Problems
153
8.22 (Doob's LP Inequality). Suppose l; and ( are a.s. non-negative random variables such that va > 0.
P{(> a} < 1 E[(;(> a]
(8.106)
a
Prove that for all p > 1, (8.107)
IIfIIP 5 (p p 1) IKIIP
Use this show the strong LP-inequality of Doob: If X is a non-negative submartingale and Xn E LP(P) for all n > 1 and some p > 1, then r
11
E Ilmax X?J <
(8.108)
(p)
P
E (XP].
1
Use this to prove Corollary 8.41.
8.23 (Pitman's L2 Inequality; Problem 8.22, Continued). Suppose {X,}n 1 and {Mk}k=l are processes that satisfy: (i) Mk = Xk whenever Mk iE Mk-1; and (ii) E[Ek=z Mk-s(Xk - Xk-1)J is non-negative. Prove that E[A4n] < 4E[X2]. Use this to conclude (8.108) in the case that p = 2 (Pitman, 1981).
8.24 (Problem 8.22, Continued; Square Functions). Let {3,}O 1 denote a filtration, and suppose {X,}°_1 is a martingale such that X. E L2(P) for all i > 1.
(1) Prove that Xn - An defines a mean-zero martingale where A. = E; 1 E[d? d, = X, - X,_1, Xo = 0, and .9ro is the trivial o-algebra. The process A is called the square function of X. (2) Prove that E[sup, 1. (3) Conclude that limn X,(w) exists for almost all w E {A,,, < oo}. (4) Explore the case where the d,'s are independent.
8.25 (Theorem 8.30, Continued). Suppose Sn = X1 + + Xn defines a random walk with EX1 = is and VarX1 = oz < no. Let .fin = ((X,}°° 1) (n > 1), and consider an f-stopping time T that has a finite mean. Prove Wald's second identity, VarSr = VarXi ET. (HINT: Problem 8.22.)
8.26. Consider two random variables X and Y, both of which are defined on a common probability space (11, Jr, P). Define (8.109)
(-) 1{xeL,2
Xn =
2n
7=-a
" (7+1)2 ^))
vn
1.
Prove that for any Y E LI(P), limn_ ELY I Xn] = E[Y I X) a.s. and in Lt (P).
8.27. Suppose that Y E L1(P) is real-valued, and that X is a random variable that takes values in Rn. Prove that there exists a Borel measurable function f such that E[Y I X] = f(X) almost surely.
8.28. Suppose that {Xi}1,=1 are independent mean-zero random variables in L2(P) that are bounded; that is, that there exists a constant B such that almost surely, IXnI < B for all n. Prove that for allA>Oandn>1, (8.110)
(
P(ma!cn lS,I-A} 1925).
(B
+
VarSn
(Khintchine and Kolmogorov,
8.29 (Martingale Convergence in LP). Refine the martingale convergence theorem by showing that limn-- Xn exists in LP(P) whenever X is bounded in LP for some p > 1. In addition, prove
that if X = lim-a, Xn, then Xn = E[X I9n] as. for all n > 1. (HINT: Use Problem 8.22.)
8. Martingales
154
8.30 (Double-or-Nothing). Let {y,) 1 denote a sequence of i.i.d. random variables with P{11 = 0) = P{71 = 1) = 1. Consider the stochastic process X, where X1 := 1, and Xn := 2Xn_i7n for all n > 2. Prove that X is an L1-bounded martingale that does not converge in L'(P). Consequently, Problem 8.29 can fail for p = 1. Compute the almost-sure limit of X.. 8.31. Let {Xn}n 1 be independent standard normal random variables and define S.
X,
(n > 1). Prove that Mn = (n + 1)-1/2exp{Sn/(2n + 2)} defines a mean-one martingale (Woodroofe, 1975, Problem 12.10, p. 344).
8.32 (Problem 8.31, Continued). Define Mn as in Problem 8.31. Use only the martingale convergence theorem (p. 134) and the CLT (p. 100) to prove that limn-,c Mn = 0 a.s. Derive the following precursory formulation of the LIL (Steinhaus, 1922):
Sn=o((ninn) 1/2)
(8.111)
a.s.asn-+oo.
8.33. Suppose X is a submartingale with bounded increments; i.e., there exists a non-random finite constant B such that almost surely, ]X,, - Xn_ 1 I < B for all n > 2. Then prove that limn Xn exists as. on the set {sup,, IXm] < oo}.
8.34. Suppose {.fin}- 1 is a filtration of a-algebras, and Y E L1(P) is fixed. Define M,, _ E[Y ]5n] to be the corresponding Doob martingale. Prove that for all finite stopping times T, MT = ELY I.5T1 as. (Dubins and Freedman, 1966).
8.35. Prove that X is a martingale if and only if EMT = EMI for all bounded stopping times T. Characterize super- and submartingales similarly. 8.36. Let {Xn}°,,°-e be a non-negative supermartingale that attains the value zero at some a.s.finite time. Prove that limk_,o Xk = 0 as. 8.37. Follow the proof of Theorem 8.40 and prove that c1 A(f) < M f < c2A(f) where cl and c2 are positive and finite constants that do not depend on f, and A(f) := sup,, Afn + sup,, Nn. 8.38. The following is a variant of Problem 8.17. First choose and fix A E (0, 1). Then consider random variables X,, E (0, 11, adapted to .in, such that a.s. for all n > 1,
P{Xn+1=A+(1-A)XnIAn}=Xn (8.112)
P(Xn+l =0-X)Xnl9n}=1-Xn.
Prove that X := limn-,o Xn exists as. and in LP(P) for all p > 1, and that X_ is zero or one almost surely. Compute P{X = *-
8.39. Suppose that {X,},'=1 are i.i.d. with P{X1 = 1} = P{Xl = -1} = 1/2. As before, let
S,, :=X1+...+Xn.
(1) Prove that Eexp(tS,) <- exp(nt2/2) for all n > 1 and t E R(2) Prove that (8.41) continues to hold in the present setting.
(3) Prove the following half of the LIL for (±1) random variables: limsup
Sn
n-,o (2nlnlnn)1/2 -
a.s.
(4) Suppose {Y,}i= 1 are i.i.d. with P{Y1 = 0} = P{Yj = 1} = 1/2 and Tn := Y1 + +Yn. Prove that lim sup
T,, - 121
1
n-- (2nlnInn)21/2 < -2
as.
Check that this is one-half of the LIL (p. 138) for Tn. 8.40 (Uniform Integrability). Consider a martingale {X,,)n- 1 for which T is an a.s: finite stopping time. Suppose, in addition, that {XTnn} =1 is uniformly integrable; see Problem 4.28 on page
51. Then prove that EXT = EX1. 8.41 (Uniform Integrability). Let {Xnbe a uniformly integrable martingale with respect to some filtration .9; see Problem 4.28 on page 51. Prove that X. = limn-.,o X,, exists and is finite as. Prove also that outside a null set Xn = E[X ]5n] for all n > 1.
Problems
155
8.42 (Theorem 8.39, Continued). Prove that if f : (0, 1)k -» R is integrable (k > 2), then for almost all x E (0, 1)k 1
(8.113)
:k+a
610 Sk I lim k
...
S,+e I1
j(u) duI ... duk =
f (X).
8.43. Let XI be uniformly distributed on (0,1). Conditionally on XI, define X2 to be uniformly distributed on (0, XI); i.e., P{X2 E A I XI) = m(A n [0, X I ])/X 1 , where m denotes the Lebesgue measure. Iteratively define (8.114)
P{Xn E AI X1,..., Xn-1} =
m(An(0, Xn-1]) Xn-1
Explore the structure of {Xn}n I, and the behavior of Xn for large n. 8.44 (Patterns). Verify Lemma 8.44. Also, find the probability that we see f consecutive ones before k consecutive zeros.
8.45 (U-Statistics). Prove that that (8.115)
0 in the proof of Theorem 8.45. From this conclude
E [(Qn - An)2] =4 F
alt + Var(X?)
a2 ;.
1
1<.<7
8.46 (Runs in Random Permutations; Hard). Let H,. denote a random permutation of {1 , ... , n}, all permutations being equally likely. A block of ascending elements of [In is a run if it is not a sub-block of a longer block of ascending elements. For example, if 113 = (7,6,8,3,1 , 2 , 5 , 4 },
then it has five runs: {7}; {6,8}; (3); {1,2,5}; and (4). Prove that if Rn denotes the number of runs of fln, then nRn - (na 1) defines a mean-zero martingale, and VarR, = 0(n) as n -. oo. Conclude that lim
(8.116)
n
R= n
1
2
as.
8.47 (Reversed Doob Martingales; Hard). Let f1 ? Y2 D ... be a decreasing family of sub-aalgebras off. The family {.i n}°n°__1 is called a reversed (or "backward") filtration. Prove that if
Y E L'(P) then lim.-- E[Y (.rn] exists as. and in LI(P). 8.48 (Exchangeability; Problem 8.47, Continued; Hard). Random variables XI, X2.... are said to be exchangeable if the distribution of (XI , ... , Xn) is the same as that of (X,'111, ... , X,,(n)) for every permutation rr of { 1, ... , n) and all n > 1. Define Sn = X i + + Xn for all n > 1 and let en denote the a-algebra generated by {Sk}k n. If EIX1I < oo then: (1) Compute E(X, (<Sn] for all 1 < i < n.
(2) Prove that Z := limn-»w(Sn/n) exists a.s. and in LI(P). 8.49 (Levy's Equivalence Theorem; Hard). Let {Xn}n=1 be independent. Prove that Sn = E° 1 X; converges almost surely if and only if Sn converges in probability (Levy, 1937, Theorbme 44, p. 139). (HINT: Begin by proving that exp(itSn)/Eexp(itSf) defines a mean-one "complex" martingale for any t E R.)
8.50 (Azuma-Hoeffding Inequality; Hard). Let {X;}° 0 be a mean-zero martingale. Suppose there exist non-random constants {ci} 1 such that IX, - X,_1( < c, as. (i = 1,...,n). Prove that (8.117)
P { max (X,(> zlJ < 2exp (_ 0<2
z n
\ 1
C2/
vZ>0.
(HINT: Consult Problems 4.30 (p. 51) and 6.33 (p. 87).)
8.51 (Problem 8.40, Continued; Hard). Find an example of a mean-zero martingale {X,)andan a.s: finite stopping time T such that EXT 0 0.
8. Martingales
156
8.52 (Problem 8.22. Continued; Hard). Suppose f and ( are as. non-negative random variables such that for all a > 0,
P{f>a):5 1 EI(;{>a".
(8.118)
a
Prove that f E LI(P) as long as EI(In+(I < oo. Here, In+y = In(y n e). Use this to prove the strong LI-inequality of Doob: IfX is a non-negative submartingale, then sups E(IXnI In+ IXnI) < oo implies Esupn IXnI < oo. (HINT: Prove first that if 0 < x < y, then x in y< x In+ x + (y/e).)
8.53 (Hard). Prove that if f : R -. R is convex, then f' exists almost everywhere. [In fact, f" exists a.e., but this is a little more difcult.I 8.54 (de Finetti's Theorem; Problem 8.48, Continued; Hard). Suppose is an exchangeable sequence of zeros and ones, S. := XI + + Xn, and 4 = o(Sn , Sn+I , ...) for all n > 1.
(1) Prove that P(X1= =Xk=IIgn)=(n
s)/(s,)a.s.
(2) Use Stirling's formula to conclude the theorem of de Finetti (1937):
p(X1=.._=Xk=1,Xk+I=...=Xn=01 Z)=Zk(1-Z)"-k
a.s.
8.55 (Problem 8.52, Continued; Harder). Problem 8.52 cannot be improved. Let {Xn}n I denote i.i.d. mean-zero random variables, and define N. = E,n=, X, for all n > 1. Prove that Esupn ISn/ni and E{IXII In+ IXII) converge and diverge together (Burkholder. 1962). 8.58 (Problem 8.39, Continued; Harder). Prove the other half of the LIL (p. 138) for (±1) random variables: lim supn_,o Sn/an > 1 a.s., where an := (2n In In n)1 /2. You may use the following argument (de Acosta, 1983, Lemma 2.4):
(1) Prove that it suffices to show that for all c > 0,
InP {Sn > cI/2an} > -c. 1 n-. lnlnn (2) To establish (1) choose pn -. oo such that n divides p,,, and then prove that liminf
(
n
P {Sn > cI/Zan} > \P {Spy > eI/ZanPn/n}) (3) Use the central limit theorem and the preceding with pn - an/(Inlnn) to prove (1). Conclude the proof of the LIL for (±1) random variables.
(HINT: For part (2) first write S. = 5,,,, +(S2p - S,,,)+ +(Sn -S(n_I)p,,/n). Next observe that if each of these n/p terms is greater than pn A/n, then S. > A. Finally choose A judiciously. For part (3) optimize over the choice of a.) 8.57 (Problem 8.24, Continued; Harder). Suppose {Xn }n= is a martingale with respect to some filtration {.9rn},a 1. Suppose also that do = X. - Xn_I satisfies Idnl S a for all n > 1, where Xo = 0, Sto = (0,f1), and a is a non-random positive constant.
(1) Prove that for all x E R, e'' < I + x + x2el=I. Use this to prove that for all I E R
and all i=1,2,..., ed,
I
._
<1+
2
2°ItIE[d?_1I 2'
< exp
2'
a.s.
(2) Let { An },O°_ 1 denote the square function of X. Then conclude that given a non-random
t E R the following defines a non-negative supermartingale:
Mn = exp tXn -
t2&10A n 2
vn>1.
Moreover, verify that EMn < 1 for all n > 1. (3) Prove that if X. > 0, then limn-.oo X. IA. exists and is finite as. Prove also that
lim Xn(w) = 0
for almost all w E {A, = oo}.
Notes
157
Notes (1) Martingales were first introduced and studied by Ville (1939). The current powerful theory was formed by Doob (1940, 1949) shortly thereafter. (2) Our proof of the martingale convergence theorem (p. 134) is due to Isaac (1965). Aside from this and the original proof of Doob (1940) there are other nice proofs of Doob's martingale convergence theorem. For example, see Chatterji (1968), Helms and Loeb (1982), Lamb (1973), and C. lonescu Tulcea and A. lonescu Tulcea (1963). (3) An enormous literature is devoted to the study of the law of the iterated logarithm and its variants. An excellent starting point is the theorem of Strassen (1967). It implies that if X1, X2,... are i.i.d., EXi = 0, and VarXj = 1, then on some suitable probability space there exist i.i.d. N(0, 1) random variables {G;},°, such that
i_t Xi - E" _tGi n
nlim(n log log n) 1/2
l
-0
a.s.
In particular, this shows that the general LIL follows from the one proved here. More-
over, if the X;'s have higher moments than two, then the rate of approximation can be improved upon. This is the starting point of a theory of "strong approximations." Csorg6 and Rhvbsz (1981) is an excellent treatment. Two scholarly reviews of the LIL are Feller (1945), for the classical theory, and Bingham (1986), for the more modern advances.
(4) Equation (8.45) has the following improvement, due to Laplace (1805, pp. 490-493):
P{X1>a)=
A2/2
kr
A2
1+
A2
1+2 1+3
A2
1+4
(5) Theorem 8.39 is also known as the Lebesgue density theorem. It states that the antiderivative of every f E L'(dx) is f a.e. On the other hand, it is the case that "most" continuous functions are nowhere differentiable (Banach, 1931; Mazurkiewicz, 1931; Paley, Wiener, and Zygmund, 1933; Kahane, 1997, 2000, 2001). (6) The material on option-pricing (§6.5) is based in part on the discussions of Baxter and Rennie (1996, Chapter 2) and Williams (1991, Section 15.2). There you will learn, among other things, that there are in fact hedging strategies that never sell short. This demonstration requires only a little more effort than the proof described here, and is worth looking at. (7) The notion of Lipschitz continuity is due to the work of Lipschitz (1876) on differential equations. (8) One can streamline the method of §6.7; see Li (1980) and Gerber and Li (1981). (9) Problem 8.46 is, in essence, borrowed from the exciting book of Mahmoud (2000, pp. 48-51) on sorting. It can be shown that
f
R. - (n/2)
N(0, 1/12).
(ibid., Proposition 1.10, p. 51).
(10) Problem 8.48 is due to de Finetti (1937), but the proof outlined here is borrowed from Doob (1949). Aldous (1985) presents a masterly modern-day account of exchangeability and related topics. (11) When the increments of X are independent, Problem 8.50 is due to Hoeffding (1963). The general case is due to Azuma (1967), and is proved by the same argument.
158
8. Martin gales
(12) Problem 8.54 states that all exchangeable sequences of zeros and ones are "conditionally i.i.d." The proof outlined here is motivated by Exercise 6.3 of Durrett (1996, p. 271). For a detailed historical account see Cifarelli and Regazzini (1996). Remarkably enough, de Finetti's theorem has consequences in diverse subjects such as the philosophy of statistics (de Finetti, 1937; Kyhurg and Smokier, 1980), statistical mechanics (Georgii. 1988), and geometry of Hilbert spaces (Bretagnolle and Dacunha-Castelle. 1969). (13) Problem 8.57 generalizes the 'law of large numbers" of Dubins and Freedman (1965). The central ideas used here come from a paper of de Acosta (1983).
Chapter 9
Brownian Motion
The theory of random functions always makes the impression of a much greater degree of artificiality than corresponds to the facts.
-Raymond E. A. C. Paley and Norbert Wiener
On March 29, 1900, a doctoral student of J. H. Poincare by the name of Louis Jean Baptiste Alphonse Bachelier presented his thesis to the Faculty of Sciences of the Academy of Paris. Louis Bachelier's work was chiefly concerned with finding "a formula which expresses the likelihood of a market fluctuation" (Bachelier, 1964, p. 17). Bachelier's solution to this problem required the introduction of a number of novel ideas, one of which was today's "Brownian motion." See also the English translation in the volume edited by Cootner (Bachelier, 1964). In 1828, the botanist Robert Brown noted empirically that the grains of pollen in water undergo erratic motion. Brown himself admitted to not having a scientific explanation for this phenomenon. And it was years later, in 1905, that an explanation was found by Albert Einstein. The key idea in Einstein's solution was the introduction of a stochastic process that Einstein called "the Brownian motion." Unaware of the earlier work of Bachelier in economics, Einstein had rediscovered that Brownian motion is related concretely to the diffusion of particles. As a main application of his theory, Einstein found a very good estimate for Avogadro's constant. Einstein's theory was tacitly based on the assumption that the Brownian motion process exists. Nearly two decades later, Wiener (1923a) proved the validity of Einstein's assumption. In the present context, the contributions of von Smoluchowski (1918) and Perrin (1913) are also particularly noteworthy. 159
9. Brownian Motion
160
From a mathematical point of view, Bachelier's work went further than Einstein's. However, we introduce the latter's work because it is easier to describe. Thus, we begin with a modern statement of Einstein's postulates: Brownian motion .( W(t) }t>o is a random function oft (= "time") such that:
(P-a) W(0) = 0, and for any given time t > 0, the distribution of W(t) is normal with mean zero and variance t.
(P-b) For any 0 < s < t, W(t) - W(s) is independent of Think of s as the current time. Then, this condition is saying that "given the value of W at the present time, the future is independent of the past." This is called the Markov property.
(P-c) The random variable W(t) - W(s) has the same distribution as W(t - s). That is, Brownian motion has stationary increments. (P-d) The random path t' -+ W(t) is continuous with probability one.
Remark 9.1. One can also have a Brownian motion B that starts at an arbitrary point x E R by defining (9.1)
B(t) := x + W(t),
where W is a Brownian motion started at the origin. One can check directly that B has all the properties of W, except that B(t) has mean x for all t > 0, and B(0) = x. Unless stated to the contrary, our Brownian motions always
start at the origin.
So why are (P-a) through (P-d) postulates and not facts? The sticky point is the a.s.-continuity (P-d). In fact, Levy (1937, Theoreme 54.2, p. 181) has proven that if in (P-a) we replace the normal by any other distribution,
then either there is no process that satisfies (P-a)-(P-c), or else (P-d) fails to hold. In summary, while the predictions of theoretical physics were correct, a more solid understanding of Brownian motion required a rather in-depth undertaking such as that of N. Wiener. Since Wiener's work Brownian motion has been studied by multitudes of mathematicians. This and the next chapter aim to whet your appetite to learn more about this elegant theory.
1. Gaussian Processes Let us temporarily leave aside the question of the existence of Brownian motion, and first study normal distributions, Gaussian random variables, and Gaussian processes. Before proceeding further, you may wish to recall §5.2 (p. 11), as well as Examples 3.18 (p. 28) and 7.12 (p. 97), where normal random variables and their characteristic functions have been introduced.
1. Gaussian Processes
161
1.1. Normal Random Variables. Definition 9.2. An R-valued random variable Y is said to be centered if Y E L' (P) and EY = 0. An R"-valued random variable Y = (1',.. . , Yn)' is said to be centered if each Y is. If, in addition, E{IYj2} < 00 for all i = 1,. .. , n, then the covariance matrix Q = (Q,,j) of Y is the matrix whose (i,j)th entry is the covariance of Y and Yj; i.e., Qjj = E[YYj]. Suppose that X = (X1,. .. , Xn)' is a centered n-dimensional random variable in L2(P). Let a E R" denote a constant (column) vector, and note that a'X = a X = En 1 a;X1 is a centered R-valued random variable in L2(P) with variance n
n
Var(cz X) _ E E aiE[X;Xj[aj = a'Qa,
(9.2)
i=1 j=1
where Q = (Q;,j) is the covariance matrix of X. Since the variance of any random variable is non-negative, we have the following. Lemma 9.3. If Q denotes the covariance matrix of a centered L2 (p) -valued
random variable X = (X1, ... , Xn)' in R", then Q is a symmetric nonnegative definite matrix. Moreover, the diagonal terms of Q are given by Qj,j = VarXj. Definition 9.4. An R"-valued random variable X = (X1,.. . , Xn)' is centered normal (or centered Gaussian) if for all a E R",
= e-za'Qa where Q is a symmetric non-negative definite real matrix. The matrix A is called the covariance matrix of X. (9.3)
Ee'a.X
We have seen in Lemma 9.3 that covariance matrices are symmetric and non-negative definite. The following implies that the converse is true also.
Theorem 9.5. Let Q be a symmetric non-negative definite (n x n) matrix of real numbers. Then there exists a centered normal random variable X = (X1, . . , Xn) whose covariance matrix is Q. If Q is non-singular, then the distribution of X is absolutely continuous with respect to the n-dimensional .
Lebesgue measure and has the density o f Example 3.18 (p. 28) with Q replac-
ing E there. Finally, an R"-valued random variable X = (X1, ... , Xn) is centered Gaussian if and only if a'X is a mean-zero normal random variable
for all a E R. denote the n eigenvalues of Q. The )j's 1 are real and non-negative. Let {v;};'_1 denote the respective orthonormal eigenvectors, and view them as column vectors. Then the (n x n) matrix
Proof (Sketch). Let {a;}
9. Brownian Motion
162
P = (vi, ... , v,) is orthogonal. Moreover, we can write Q = P'AP, where A is the diagonal matrix of the eigenvalues al, ... , .\n. Next, let {Z;} 1 denote n independent standard normal random variables. It is not difficult to see that Z = (Z1,. .. , Zn)' is a centered Rnvalued random variable whose covariance is the identity matrix. Define
X := P'A1/2Z, where A'/2 denotes the diagonal matrix whose jth diagonal entry is x 2. Since X = (X1, ... , Xn)' is a linear combination of centered random variables, it too is a centered Rn-valued random variable. Define n
(9.4)
ll at (P'A1/2)1,k
Ak =
vk = 1.... , n.
1=1
Then a. X = Ek=1 ZkAk for all a E Rn. By independence, n
Eet°'X = [J Ee`4Ak = e-2 Ek=, Ak. k=1
One can check readily that Ek=1 Ak = a'Qa. Therefore, we have constructed a centered Gaussian process X that has covariance matrix Q. To check that Q is indeed the matrix of the covariances of X, make another round of computations to see that E[X;XJJ = Q. Next we suppose that Q is non-singular. Let j2 denote the distribution
of X. We have shown that µ(t) = exp(-Zt'Qt). Because µ is absolutely integrable on R1, the inversion theorem (see Problem 7.12, p. 112) implies that the probability density f = du/dx exists and is given by the formula (9.6)
AX) =
1
e-it"_2t'Qt
dt
vx E R.
Write Q:= P'A1/2A1/2P, and change variables [s = A2 Pt]. This transforms the preceding n-dimensional integral into a product of n one-dimensional integrals. Each of the latter integrals can be computed painlessly by completing the square. Therefrom follows the form of f. To complete this proof, we derive the assertion about linear combinations. First suppose a'X is a centered Gaussian variable in R. We have seen already that its variance is a'Qa, where Q denotes the covariance matrix of
X. In particular, thanks to Example 7.12 (p. 97), Ee'a'X = exp(-Za'Qa). Thus, if a'X is a mean-zero normal variable in R for all a E Rn, then X is centered Gaussian. The converse is proved similarly.
Remark 9.6. According to Theorem 9.5, the covariance matrix of a centered normal random variable determines its distribution. However, it can happen that Xl and X2 are normally distributed even though (X1, X2) is not; see Problem 9.4 below. This demonstrates that in general the normality
1. Gaussian Processes
of (X1, ... ,
163
is a stronger property than the normality of the individual
XD's.
The following important corollary asserts that for normal random vectors independence and uncorrelatedness are one and the same.
Corollary 9.7. Let (X 1, ... , X,,, Y1,. .. , Ym) be a centered normal random variable such that Cov(X;, YY) = 0 for all i = 1, ... , n and j = 1, ... , m. Then (XI, ... , and (Y1,. .. , Ym) are independent.
1.2. Brownian Motion as a Gaussian Process. Definition 9.8. We say that a real-valued process X is centered Gaussian if (X (tl ), ... , X (tk)) is a centered normal random variable in Rk for all 0 < t1, t2, t3, ... , tk. The function Q(s, t) := E[X (s)X (t)] is called the covariance function of the process X.
For the time being, we assume that Brownian motion exists, and derive some of its elementary properties. We establish the existence later on.
Theorem 9.9. If W := {W(t)}t>o denotes a Brownian motion, then it is a centered Gaussian process with covariance function Q(s, t) := s A t. Conversely, any a.s.-continuous centered Gaussian process that has covariance function Q and starts at 0 is a Brownian motion. Furthermore:
(1) (Quadratic Variation) For each t > 0, as n V n-I
jr
oo,
- W \ \n) t)J2
t.
(2) (The Markov Property) For any T > 0, the process
{W(t + T) - W(T)}t>o is a Brownian motion that is independent of of{W(r)}o
Remark 9.10.
(1) Any function with bounded variation has zero qua-
dratic variation. Therefore, Part 1 implies that Brownian motion has unbounded variation a.s.
(2) Because W(t + T) = [W(t + T) - W(T)] + W(T), the Markov property tells us that given the values of W before time T, the "post-T" process t ' W (t + T) is a "conditionally independent" Brownian motion that starts at W(T). The dependence of the post-T process on the past is "local" since it depends only on the last value W(T). Theorem 9.9 has the following useful consequence.
9. Brownian Motion
164
Then, Corollary 9.11. Let 39 be the o-algebra generated by Brownian motion is a continuous-time ..I -martingale. That is, E[W (t) I.1r9]
=W(s) a.s. for allt>s>0. Proof. We know that W(t) - W(s) has mean zero and is independent of J r,. Therefore, E[W(t) - W(s) I.T9] = 0 a.s. The result follows from the obvious fact that W (s) is p3-measurable.
Proof of Theorem 9.9 (Sketch). First, let us find the covariance function of W, assuming that W is indeed a centered Gaussian process: If t > s > 0, then because W(t) - W(s) is a mean-zero random variable that is independent of W(s), Q(s,t) = E [W(s)W(t)] = E [(W(t) - W(s) + W(s)) W(s)] (9.7)
= E [IW(s)I2] = s.
In other words, Q(s, t) = s A t for all s, t > 0. Next we will prove that W is a centered Gaussian process. By the independence of the increments of W, for all 0 = to < t1 < t2 <
n
(9.8)
Ee Ek=1 ak(W(tk)-W(tk-1)) = [1
Ee1Rk(W(tO-W(tk-1n)
k=1
For each k = 1,... , n, W(tk) - W (tk_ 1) is a mean-zero normal random variable with variance tk - tk_1i its characteristic function is computed in Example 7.12, p. 97. This leads to (9.9)
Ee'Ek=1 ak(W(tk)-W(tk-11) = e-Za'tita,
where the matrix M is described by M1,j = 0 if i # j, and Mj,2 = tj - tj_1.
In other words, the vector (W(tj) - W(tj_1); 1 < j < n) is a centered normal random variable in Rn. Now for any 3 = (,01 i n
(9.10)
.,On) E Rn,
n
E,8kW(tk) = Eak[W(tk) -W(tk-1)], k=1
k=1
+ an. Therefore, Ek=1 /3, W(tk) is a centered normal where Qk := ak + random variable in R. That is, W is a centered Gaussian process.
Next we prove that if G = {G(t)}t>o is an a.s.-continuous centered Gaussian process with covariance function Q, and if G(0) = 0, then G is a Brownian motion. Because the remaining conditions of Brownian motion are easily verified for G, it suffices to show that whenever t > s, G(t) - G(s) is independent of We fix 0 < u1 < . . . < uk < s and prove that G(t) - G(s) is independent of (G(u1),... , G(un)). However, the distribution of the (n + 1)-dimensional random vector (G(t) - G(s),G(u1),...,G(u))
2. Brownian Motion on [0, 1)
165
is the same as that of (W (t) - W (s), W (ul ), ... , W This is because everything reduces to the same calculations involving the function Q. This proves that G is a Brownian motion. Problem 9.6 contains ample hints for proving that V,a(t) t in probability as n -* oo. Note that t H W (t + T) - W (T) is a continuous centered Gaussian process. We verify that: (a) This is Brownian motion; and (b) it is independent of a({W(r)}o
E [(W(s + T) - W(T)) (W(t + T) - W(T))] = E[W(s + T)W(t +T)] - E[W(T)W(t + T)]
(9.11)
- E[W(T)W(s +T)] + E [(W(T))2]
=(s+T)-T-T+(T)=s=s At. In particular, t,--+ W (t + T) - W (T) is a Brownian motion. The assertion about independence is proved similarly: Ifs < T, then E(W(s) (W(T + t) - W(T))] (912)
E[W(s)W(T+t)]-E[W(s)W(T)]=s-s=0.
Corollary 9.7 proves the independence of t --+ W(t + T) - W(T) from 0'({W(r)}0
2. Wiener's Construction: Brownian Motion on [0, 1) It remains to prove that Brownian motion exists. We begin by reducing the scope of our ultimate goal: If we can show the existence of Brownian motion indexed by [0, 1), then we have the following more general existence result. Lemma 9.12. Suppose {B;}°_°0 are independent Brownian motions indexed by [0, 1). Then, the following recursive definition describes a Brownian motion W indexed by [0,00):
(9.13)
BO(t)
if t E [0,1),
BO(1) + Bl(t - 1)
if t E [1, 2),
W(t) :_
Ej-oBk(1)+Bj(t- j) if t E [j,j+1), Proof. The defined process W is a continuous centered Gaussian process because it is a finite sum of continuous centered Gaussian processes. It remains to compute the covariance of W: If s < t, then either we can find
9. Brownian Motion
166
j>0such that j<s j. In the first case, E[W (s)W (t)] is equal to
-1
E
-1
[(Bk(1)+B(t_i)) (Bk(1) + B(s - j) k-0
/
]
(9.14)
(Bk(1))2J + E [(BB (s - j)BB(t - j))] =EI k=o
=j+ (a - j)=s=sAt.
In the second case, one obtains the same final answer. In any event, W has the correct covariance function, and is therefore a Brownian motion. 0 The simplest construction of Brownian motion indexed by [0, I)-in fact, [0,1]-is the following minor variant of the original construction of Norbert Wiener. Throughout, let {X=}°_°0 denote a sequence of i.i.d. normal random variables with mean zero and variance one; the existence of this sequence is guaranteed by Theorem 6.17 on page 70. Then, formally speaking, the Brownian motion {W(t)}oct<1 is the limit of the following sequence: 9.15
Wn t =tX
+7Esin(jzrt)X
VO
j=1
In order to prove the existence properly, we first need to complete the probability space. Informally, this means that we declare all subsets of null sets measurable and null; this can always be done at no cost. See Theorem 3.20 on page 29. Now we can prove Wiener's theorem (1923a).
Wiener's Theorem. If the underlying probability space is complete, then W(t) = limn-o. W2n(t) exists a.s., and the convergence is uniform for all t E [0, 1]. The process W is a Brovmian motion indexed by [0, 1].
Proof. We split the proof into three steps. Step 1. Uniform Convergence. For n = 1, 2.... and t > 0 define (9.16)
sin(kk7rt)
Sn(t)
Xk.
k=1
Thus, Wn(t) = tXo + (f /ir)Sn(t). Stated more carefully, we have two processes Sn(t , w) and Wn(t , w), and as always we do not show the dependence on w.
We will prove that Stn forms a Cauchy sequence a.s. and in L2(P), uniformly in t E [0, 1]. Define (9.17)
On(t)
S2n}1(t) - S2-(t).
2. Brownian Motion on [0, 1)
167
Note that 2
2n+1
(9.18)
sin(jirt)
I&n(t)12 =
2n+1
X
j=2n+1
2
eijlrt
E
<
j=2n+1
i
X.,
Therefore, 2n+1
2n+1
I&n(t)12
ci(j-k)vrt
<
XjXk
k
j=2n+1 k=2n+1 2n+1
x2
_
(9.19)
k=21+1
2
2"-1 2"+'-1
k(l + k) XkXl+k
1=1 k=2n+1 2n-1 2n+1-1
X2
2n+1
eillrt
+2 E E
E 2 +2E E
k=2n+1
1=1
k=2n+1
XkXl+k k(k + 1)
The right-hand side is independent of t > 0, so we can take expectations and apply Minkowski's inequality to obtain rr
E suP IA(t)1 2 L t>0
(9.20)
l J
2n+1
<E
2
k=24+1 1 2n}1
_E
k=2n+l
2n-1
2n+1 -1
1=1
k=2n+l
+2
+
2n-1 2 t=1
V.
XkXl+k k(k + 1)
2'+1 -1
E
k=2n+1
2
XkXl+k 2
k(k+l)
1
12
(Why?) The final squared-L2(P)-norm is equal to k-2(k + l)-2. On the other hand, by monotonicity, .k=2n+1 k-2 < 2_n and .1=1 1(1+2n)-1 < 1. Therefore, (9.21)
E [SUP f''n(t)12]
< 2-n +2.2
-n/2.
It follows that En Supt>0 IS2n+1 (t) - S2n (t)I < oo a.s. Thus, as n -+ oo, S2n (t) converges uniformly in t > 0 to the limiting random process (9.22)
sin(jnt)Xi
Si(t) j=1
7
In particular, W (t) = limn-,,,, W2n (t) exists uniformly in t > 0 almost surely. Step 2. Continuity and Distributional Properties. The random map t -+ W2n (t) defined in (9.15) is obviously continuous. Because W is an a.s.uniform limit of continuous functions, it is a.s. continuous; see Step 3 below
for a technical note on this issue. Moreover, since W2n is a mean-zero
9. Brownian Motion
168
Gaussian process, so is W (Problem 9.2). Since W(0) = 0, it remains to prove that (9.23)
E [IW(t) - W(s)12] = t - s
dO < s < t.
But the independence of the X's, together with Lemma 6.8 (p. 67), yields
E [IW(t) - W(s)12] = (t - s)2 + (9.24)
_ (t - s)2 +
T2 E
[(S.(t) - Soo(s))2]
J=1J 2
(sin(Jlrt)
"0
sin(as)) 2
2
Define f(x) = 1[,ra,nti(x)+1[-nt,-as](x) (x E [-7r, -7r)) and On(x) = ei"z/ 2ir (x E [-7r, 7r]; n = 0, ±1, ±2,...). Then, 00
E [IW(t) - W(8)12] = (t - s)2 + (9.25)
1
-
2ar
00
f
2
f (x)O, (x) dx 2
IIn
f(x) b (x) dxl =-oo
a
By the Riesz-Fischer theorem (Theorem A.5, p. 205), the right-hand side is equal to (21r)-' f "'r lf (x) I2 dx = (t - s). This yields (9.23). Step 3. Technical Wrap-up. Now we tie up a subtle loose end: The uniform limit W (t) = limn-W W2.. (t) is known to exist, but only with probability one. This is insufficient because we need to define W (t, w) for all W. Thus, we define
W(t) := limsupW2n(t). n-oo The process W is well defined and continuous a.s. The remainder of the calculations of Step 2 goes through, since by redefining a random variable on a set of measure zero we do not change its distribution (unscramble this!). Finally, the completion is needed to ensure that the event C that W is continuous is measurable; in Step 1, we showed that C° is a subset of a null set. Because the underlying probability is complete, C` is null. (9.26)
3. Nowhere-Differentiability We are in a position to prove the following striking theorem of Paley, Wiener, and Zygmund (1933). Throughout, W denotes a Brownian motion.
Theorem 9.13. Suppose the underlying probability space is complete. Then, Brownian motion is nowhere differentiable almost surely.
3. Nowhere-Differentiability
169
Proof. For any A > 0 and n > 1, consider the event (9.27)
Ea = { 3s E [0,1] :
IW(s) - W(t)I < sup tE[s-2-°,s+2 ^]
A2-n JJ
(Why is this measurable?) We intend to show that En P(Ea) < oo for all
A>0. Indeed, suppose there exists s E [0, 11 such that for all t within 2-n of s, IW(s) - W(t)l < A2-1. Then there must exist a possibly random j = 0, ... , 2' - 1 such that s E D(j; n), where D(j; n) := [j2-n, (j +1)2 -nj . Thus, IW(s) -W(t)I < A2-n for all t E D(j;n). By the triangle inequality, 2A2-n = A2-n+1 for all u, v E D(j; n). we can deduce that I W (u) - W (v) I < Subdivide D(j; n) into four smaller dyadic intervals, and note that the successive differences in the values of W (at the endpoints of the subdivided intervals) are at most A2-n+1 This leads us to the following: (9.28)
2n-1 3
En g U n {Io,tI <
(9.29)
A2-n+')
,
i=o t=0
where (9.30)
A't = W
(j2-n
+ (f + 1)2 -(n+2)) - W
(j2-n + t2-(n+2))
Thanks to the independent-increments property of Brownian motion (P-b), 2n-1 3
P (En) <' ft P { IOj,II <
(9.31)
A2-n+1 }
,
i=o t=o
On the other hand, by the stationary-increments property of W, All has a normal distribution with mean zero and variance 2-(n+2) ((P-a) and (P-c)). Thus, for all Q > 0, 112("+2)/2 a-x2/2
P { IOj,tI
(9.32)
Q}
=
f_f32(,+2)/2
dx
/32
+2)/2.
Apply this with,3 = A2-n+1 to deduce that P(Ea) < 256x42-n. In particular, En P(Ea) < oo, as was asserted earlier. By the Borel-Cantelli lemma, for any A > 0, the following holds with probability one: For all but a finite number of n's, (9.33)
inf
sup
o<s
IW(s) -W(t)I > inf Is - tI
sup
0<s<1 It-s1<2
IW(s) -W(t)I > A 2- n
Thus, if W'(s) existed for some s E [0, 1], then IW'(s)l > A a.s. Because A > 0 is arbitrary, this proves that IW'(s)I = oo a.s. This contradicts the differentiability of W at some s E [0, 1]. Thanks to scaling (Theorem 9.9),
9. Brownian Motion
170
W is a.s. nowhere differentiable in [0 , c] for any c > 0. Therefore W is a.s. nowhere differentiable. Technical Aside in the Proof. In fact, we have proven that there exists a null set N such that D C N, where D is the collection of all w's for which t u--. W (t , w) is differentiable somewhere. The collection D need not be measurable. But this is immaterial to us since we can complete the underlying probability space at no cost (Theorem 3.20, p. 29). In the completed probability space D is a null set, and our task is complete. O
4. The Brownian Filtration and Stopping Times Recall the Markov property of Brownian motion (Theorem 9.9): Given any
fixed T > 0, the "post-T" process t i--, W(T + t) - W(T) is a Brownian motion that is independent of a({W (u)}o
given the value of W(T), the process after time T is independent of the process before time T. The strong Markov property states that the Markov property holds for
a large class of random times T that are called stopping times. We have encountered such times when studying martingales in discrete time, and their continuous-time definition is formally the same.
Throughout, W is a Brownian motion with some prescribed starting point, say W(0) = x. Definition 9.14. A filtration W = {. }t>o is a collection of sub-a-algebras of .9 such that a(. C Wt if s < t. If W is a filtration, then a measurable function T : SZ -+ [0, oo] is a stopping time (or sad-stopping time) if {T < t} is a /t-measurable for all t > 0. Given a stopping time T, we can define OT by (9.34)
v/T={AE.,F: An{T0}.
T is called a simple stopping time if there exist 0 < To, 7-1.... < oo such that T(w) E {ro,T1i...} for all w E Q. Define (9.35)
.3° = o ({W(u)}o
In light of our development of martingale theory this definition is quite natural. Here are some of the properties of FT when T is a stopping time.
Lemma 9.15. If T is a finite .90-stopping time, then 9T is a a-algebra, and T is 9To.-measurable. Furthermore, if S < T is another stopping time, then 3s 9 .70..
4. The Brownian Filtration and Stopping Times
171
Proof. For each t > 0, (9.36)
A`n{T
Consequently, 9T is closed under complementation. Because 9,0 is a monotone class, it is a a-algebra.
For all a E (0,oo) andt>0, T-1([0, aJ) n IT < t} _ IT < a n t} E .°t C go. (9.37) This proves that T is JrT-measurable. Finally, we suppose A E F so, and note that for any t > 0, An IT < t} _ An IS < t} n IT < t}. Since An IS < t} and IT < t} are both in 9O, this proves that A n IT < t} E 9to, whence we have A E F. This proves the remaining assertion that . C F. 0
For technical reasons, we need to modify the filtration 90. For every t > 0 let Ft denote the completion of . I. It can be checked that {`fi't}t>o is a filtration of a-algebras. Note that any .`tee- or .0-stopping time is also an .F-stopping time. We alter the filtration Or as well to obtain the Brownian filtration.
Definition 9.16. A filtration {. }t>o is right-continuous if for all t > 0, +,. The Brownian filtration {Ft}t>o is defined as the smallest right-continuous filtration that contains {.fit}t>o. That is, F't := n,>t.`3', s1t = nE>o
for all t>0. Next we construct interesting stopping times.
Proposition 9.17. If A C R is either open or closed, then the first hitting time TA := inf{t > 0 : W(t) E A}, where inf 0 := oo, is a stopping time with respect to the Brownian filtration F. Remark 9.18. If you know what Fo- and G6-sets are, then convince yourself that when A is of either variety, then TA is a stopping time. You may ask further, "What about TA when A is measurable but is neither Fo nor G6?" The answer is given by a quite deep theorem of Hunt (1957): TA is a stopping time for all Borel sets A.
Proof of Proposition 9.17. We prove this proposition in two steps. Step 1. TA is a stopping time when A is open. Suppose A is open. We wish to prove that {TA < t} E .fit for all t > 0. It suffices to prove that (TA < t} E .fit for all t > 0, because the right-continuity of Jr ensures that {TA < t} = n,>o{TA < t + e} E F. But (TA < t} is the event that there exists a time s before t at which W (s) E A. Let C denote the collection of all w such that t H W (t , w) is continuous. We know that P(C) = 1. And
9. Brownian Motion
172
since A is open, {TA < t} n C is the event that there exists a rational s < t at which W(s) E A. That is,
{TA
= U {W(s)EA}\ s
U {W(s)EA}nC`
.
s
Because At is complete, so is .frt. Therefore, all subsets of null sets are measurable null sets. It follows from this that Us
x and the set A is < n-1. It is clear that An is open, and {TA < t} n C = nn{TA, < t} n C, where C was defined in Step 1. By Step 1, {TA < t} n C is in .frt. Because Ft is complete, it follows that {TA < t} E .5rt also.
O
For all random variables T : SZ -+ [0, oo) we define the random variable W (T) as follows:
W(T)(w) := W(T(w),w).
(9.39)
Proposition 9.19. If T is a finite stopping time, then W (T) is measurable with respect to 5T. The proof of this proposition relies on a simple though important approximation scheme.
Lemma 9.20. Given any finite s-stopping time T, one can construct a non-increasing sequence of simple stopping times T1 > T2 > . limn Tn(w) = T(w) for all w E St. In addition, .$T = nn,!FT,,.
.
.
such that
Proof. Here is a receipe for the Tn's: (9.40)
7'n (w) =
(__) 1[k2-n,(k+1)2-n)(T(w))
Since every interval of the form [k2-n, (k + 1)2-n) is obtained by splitting into two an interval of the form [j2-n+1 (j+1)2-n+1), we have Tn > Tn+1 > T.
To check that Tn is a stopping time, note that {Tn < (k + 1)2-n} _ {T < (k + 1)2-n} E 9(k+1)2-n, since T is a stopping time. Now given any
5. The Strong ltiarkov Property
173
t > 0, we can find k and n such that t E [k2-n, (k + 1)2-n). Therefore, {T < t} = {Tn < k2-n} = IT < k2-n} E 5k2-n C St. This proves that the Tn's are non-increasing simple stopping times. Moreover, Tn converges to T since 0 < Tn - T < 2-n. It remains to prove that 5T nn5T,,; see Proposition 9.17 but replace 9° by 9 everywhere.
IfAEnk
An{Tn _ landt>_0.
Therefore, 00
(9.41)
00
An{TO m=1 n=m
e>0
The lemma follows from the right-continuity of the filtration 5.
Proof of Proposition 9.19. First suppose that T is a simple 3-stopping time which takes values in {T° , -r1 , ... }. In this case, given any Bore] set A
and any t > 0, (9.42)
{W(T) E Al n IT < t} = U {W(Tn) E Al n IT = Tn} E S. n>O:
-r.
For a general finite stopping time T, we can find simple stopping times Tn I T (Lemma 9.20) with .`'T = n,, ST.. Let C denote the collection of w's for which t i--+ W (t , w) is continuous and recall that P(C) = 1. Then, for any open set A C R,
{W(T) E A}nCn IT < t} (9.43)
00
00
= n u n {W(Tn)EA}nCn{Tn
Since Tn is a finite simple stopping time, {W(T,,) E Al n IT,, < t} E S. In particular, the completeness of Ft shows that the above, and hence, {W(T) E A} n IT < t} are also in S. The collection of all A E .q(R) such that {W(T) E A} n IT < t} E fit is a monotone class that contains all open sets. It follows from the monotone class theorem that for all A E -V(R) and t > 0, (9.44)
{W(T) E A} n IT < t} E .fit.
This proves the proposition.
5. The Strong Markov Property We are finally in a position to state and prove the strong Markov property of Brownian motion.
9. Brownian Motion
174
Theorem 9.21. If T is a finite .-stopping time, where 9 denotes the Brownian filtration, then {W (T + t) - W (T) }t>o is a Brownian motion that is independent of 9T.
Proof. We prove this first for simple stopping times, and then approximate, using Lemma 9.20, a general stopping time with simple ones. Step 1. Simple Stopping Times. If T is a simple stopping time, then there exist To < rl < such that T E {To, 7-1i ...} a.s. Now for any A E .?T, and for all B1,... , Bm E R(R),
P(An{'i<m: W(T+ti)-W(ti)EBi}) (9.45)
P(An{t'i<m: W(7k+ti)-W(Tk)EBi, T=Tk}). k=0
But A n IT = Tk} = A n IT < Tk} n IT < Tk_1}' is in .irk since A E 9T. Therefore, by the Markov property (Theorem 9.9),
P(An{'i<m: W(T+ti)-W(ti)EBi}) (9.46)
=EP{di<m: W(Tk+ti)-W(Tk)EBi}P({T=Tk}nA) k=0
= P {di < m : W(ti) E Bi} P(A). This proves the theorem in the case that T is a simple stopping time. Indeed,
to deduce that t H W(t + T) - W(T) is a Brownian motion, simply set A = R. The asserted independence also follows since A E 9T is arbitrary. Step 2. The General Case. In the general case, we approximate T by simple stopping times as in Lemma 9.20. Namely, we find Tn I T-all simple stopping times-such that nn,!FTn = 9T. Now for any A E 9T, and for all open B1i... , B,n C R,
P(An{di<m: W(T+ti)-W(ti)EBi}) (9.47)
= n-oo limp (An { 'i < m : W(Tn + ti) - W(ti) E Bi}) = lim P {Vi < n-oo
m : W(ti) E Bi} P(A).
In the first equation we used the fact that the B's are open and W is continuous, while in the second equation we used the fact that A E .99'T for all n, together with the result of Step 1 applied to T. This proves the theorem.
6. The ReBection Principle
175
6. The Reflection Principle The following "reflection principle" is a prime example of how the strong Markov property (Theorem 9.21) can be applied to make nontrivial computations for the Brownian motion.
Theorem 9.22. If t is non-random and positive, then supo<., 0 and t > 0, (9.48)
I
P ( sup W(s) > a = lo<s
tt J
exp I
a
-z2 I dz.
\\\
2t/
Proof. Define Ta = inf{s > 0 : W(s) > a} where inf 0 = oo. Thanks to Proposition 9.17, Ta is an f- and hence an f-stopping time. Step 1. T. is finite a.s. By scaling (Theorem 9.9), for any t > 0, the event {W(t) > v'} has probability (21r) -1/2 f1 ° e-x2/2 dx = c > 0. Consequently,
P{t.oo limsup
(9.49)
()>1}>c>0.
-
JJJ -
(Why?). Among other things, this and the zero-one law (Problem 9.11) together imply that lim supt_ao W (t) = oo a.s. Since W is continuous a.s., it must then hit a at some finite time a.s. Therefore, with probability one, T. is finite and W(Ta) = a. Step 2. Reflection. Note that {supo<s a} = {Ta < t} E .fit. Moreover, P{Ta < t} is equal to P {Ta < t
,
W(t) > a} + P {Ta < t, W(t) < a}
=P{W(t)>a}+P{Ta
= P {W(t) > a}
+E[P{W(Ta+(t-Ta))-W(Ta) <0IfT,};Ta
of fTa is below zero at time t - Ta, given the value of Ta. Independence and symmetry (Theorem 9.9) together imply that the said probability is
a.s. equal to P{W(TO + (t - Ta)) - W(Ta) > 0IfT,} (why a.s.?).
[In
other words, we have reflected the post-Ta process to get another Brownian motion, whence the term "reflection principle."] Therefore, we make this change and backtrack in the preceding display to deduce that P{Ta < t} is
9. Brownian Motion
176
equal to P {W(t) > a}
+E[P{W(Ta+(t-Ta))-W(Ta)>OI9T};Ta
= p{W(t) > a}+P{Ta < t, W(Ta + (t - Ta)) - W(Ta) > 0}
=P{W(t)>a}+P{Taal = 2P {W (t) > a),
because P{W(t) = a} = 0. The latter is manifestly equal to the integral in the statement of the theorem. In addition, thanks to symmetry (Theorem 9.9),
2P{W(t) > a} = P{W(t) > a} +P{-W(t) > a} (9.52)
= P{IW(t)I > a}.
This completes our proof. The reflection principle has the following peculiar consequences: While we expect Brownian motion to reach a level a at some finite time, this time has infinite expectation. That is,
Corollary 9.23. Let Ta = inf{s > 0 : W(s) = a} denote the first time Brownian motion reaches the level a. Then for all a
0, Ta < oo a.s. but
ETa=oo. Proof. (Sketch) We have seen already that Ta is a.s. finite; let us sketch a proof that Ta has infinite expectation. Without loss of generality, we can assume that a > 0 (why?). Then, thanks to Theorem 9.22, = a/f e-v2/2 P {Ta > t} dy. (9.53)
f.lVt
27r
See Problem 9.13 for more details. The preceding formula demonstrates that
g)1/2
(9.54)
P{Ta>t}Haast-goo.
Therefore, E°_1 P{Ta
n} = oo, and Lemma 6.8 (p. 67) finishes the
proof.
Problems Throughout, W denotes a Brownian motion. 9.1. Prove the following: If X and Y are respectively R"- and R'"-valued random variables, then X and Y are independent if and only if (9.55)
Ee"'X+wY=&wX ivY
Use this to prove Corollary 9.7.
vuER",vER"`.
Problems
177
9.2. Suppose for every n = 1, 2, ... , C" = (G',..., GA) is an Rk-valued centered normal random variable. Suppose further that Q;j = lim_ E[G°G'`,'] exists and is finite. Then prove that Q is a symmetric nonnegative definite matrix, and that G" converges weakly to a centered normal random variable C = (G 1, ... , Gk) whose covariance matrix is Q.
9.3 (Linear Regression). Suppose C = (G1,...,G") is an R"-valued centered normal random variable, and let f denote the a-algebra generated by (G1,...,Gm), where m < n. Prove that, conditionally on 1, (G,,,+i,...,G") is a centered normal random variable. Find the conditional mean, as well as the covariance matrix.
9.4. Construct an example of two random variables X1 and X2 such that each of them is standard normal, although (XI, X2) is not a Gaussian random variable. (HINT: If X = ±1 with probability 1/2, and Z is an independent N(0,1), then X[Z[ is an N(0, 1) also.)
9.5. Prove that if W denotes Brownian motion, then { fp W(s) ds}t>o is a continuously differentiable Gaussian process. Use this as a guide to construct a k-times continuously differentiable Gaussian process. 9.6. Let t > 0 be fixed and define V"(t) as in Theorem 9.9. Prove: (1) The first two moments of V"(t) are respectively t and (2t2/n) + t2. Use this to verify that V"(t) converges to tin probability. (2) There exists a constant A > 0 such that for all it > 1, [[V,(t) - t1I4 <
7n=.
Use this to prove that V"(t) converges to t almost surely.
9.7. Prove that {-W(t)}t>o, {tW(1/t)}t>o. and {c 1/2W(ct)}t>o are Brownian motions for any fixed, non-random c > 0.
9.8 (Heat equation). Let pt denote the density function of W(t). Compute pt(x), and verify that it solves the partial differential equation, z
vt>0, x E R. &° (x) = 2 ax t (x) Also, prove that pt solves pt +,(x) = f a pt(y-x)p,(y) dy (Bachelier, 1900, p. 29 and pp. 39-40). (9.56)
9.9 (Khintchine's LIL). If W denotes a Brownian motion, then prove the following LILs of Khintchine (1933): With probability one, limsup
(9.57)
t-oo
W(t) (2 t In In t)1/2
=limsup
W(t)
t-e (2tlnln(1/t))t/2
=1.
9.10 (Brownian Bridge). Given a Brownian motion {W(t)}t>o define the Brownian bridge to
be the process B(t) = W(t) - tW(1). Prove that for all 0 < ft < t2 <
ul,...,um ER,
lim E [e'Ej_t vi%V(t,) I W(1), < eJ = Ee'E7 1 ,o
(9.58)
< t,,, < I and
jB(tj).
This justifies the assertion that Brownian bridge is Brownian motion conditioned to be zero at time one. (HINT: B is independent of W(1).) 9.11 (Blumenthal's Zero-One Law). If W denotes a Brownian motion, define the tail o-algebra .Y as follows: First, for any t > 0, define 9 to be the P-completion of o{W(u); u > t}. Then, define ,T = nt>o_17t
(1) Prove that T is trivial; i.e., P(A) E {0,1} for all A E T. (2) Let of {W define At to be the P-completion of Jr,0, and let .fit be the right-continuous extension. That is, .fit = n.9, where the intersection is taken over all rational s > t. Prove Blumenthal's zero-one law: 5o is trivial; i.e., P(A) E {0,1} for all A E S.
9. Brownian Motion
178
9.12. Follow the next three steps in order to refine Wiener's Theorem (p. 166).
(1) Check that (W,n(t) - W2- (0)m=2, is a martingale for all fixed n > 1 and t E [0,1]. (2) Conclude that m - supo
2n+1
(3) Prove that as., W(t) = limn Wn(t) uniformly over all t E [0, 1].
9.13. Given a # 0, define T. = inf{s > 0: W(s) = a). (1) Prove that the density function f , of To is given by (9.59)
]ale a2/(2s) J T. (x) =
d a E R, x > 0.
x3/2 2 a
For a bigger challenge compute the characteristic function of Ta. The distribution of IT1 is called the stable distribution with index 1/2. (2) Show that the stochastic process {Ta}a>o has i.i.d. increments. 9.14. Choose and fix a, b > 0. Find the probability that Brownian motion does not take the value zero at any time in the interval (a, a + b). Use this to find the distribution function of (9.60)
L := sup(t < 1
:
W(t) = 0).
(HINT: L is not a stopping time; condition on W(a).) 9.15. Let (W(t))tE[o,l) denote a Brownian motion on (0,1). We wish to prove that the maximum of W is achieved at a unique place a.s.
(1) Prove that P{supfE[ablW(t)=x}=0forallxERand0
dk = 1,2,... . Tk+1 := inf{s > Tk : IW(s) - W(Tk)l = 1) (1) Prove that the Ti's are stopping times. (2) Prove that the vectors (W(Tk+1) - W(Tk),Tk+1 - Tk) (k > 0) are i.i.d. Therefore, the process {W(Tk)}k 0 is an embedding of a simple random walk inside Brownian motion (Knight, 1962). In fact, every mean-zero finite-variance random walk can be embedded inside Brownian motion via stopping times (Skorohod, 1961), but this is considerably more difficult to prove.
9.17 (The Forgery Theorem). Let W denote a Brownian motion on [0, 1), and consider a nonrandom continuous function f : [0, 1) -. R. Then prove that for all e > 0, (9.61)
P { sup IW(t) - f(t)] <_ e) > 0. tEI0,1)
1
Think of the graph of f as describing a "signature." Then, this shows that Brownian motion can forge this signature to within a with positive probability (Levy, 1951).
9.18 (Heat Semigroup). Let W denote Brownian motion. Suppose f : R -. R is measurable, and EI f(x+W(t))I < oo for all t > 0 and x E R. Define the heat operator, (Htf)(x) := Ef (x+W(t)), where t > 0 and x E R. Prove that {Ht}too has the following "aemigroup" property: dt, s > 0. Ht(Haf) = Hs+tf 9.19 (White Noise; Hard). Let f : R -. R be non-random, differentiable, and zero outside [a,b].
Then we can define f f dW to be - f W(s)f'(s) ds. Prove then that II f f dWll; = where m denotes the Lebesgue measure on R Use this "L2-isometry" to construct f f dW for all f E L2(m).
Problems
179
(1) Prove that for all f,9 E L2(m),
E [ f fdW f 9dWl = f f(x)9(x)dx. J
(2) Let G(f) = f f dW (f E L2(m)), and prove that C
{G(f)}JEL3(m) is a centered Gaussian process. The process C is the so-called white noise, as well as the Wiener integral.
(3) Prove that if {4,}a=1 is an orthonormal basis for L2(m), then {G(m,)}i=1 are i.i.d. standard normal random variables (Wiener, 1923a,b). R is infinitely differentiable and 9.20 (Hard). Suppose W is a Brownian motion and f : R Elf (W(t))l Goo for all t > 0. Prove that {f(W(t))}t>o is a martingale if and only if there exist
a,b E R such that f(x)=a+bxforallxER. 9.21 (Hard). Our proof of Theorem 9.13 can be refined to produce a stronger statement. Indeed, suppose a > z is fixed, and then prove that for all s E [0,1], (9.62)
lim sup I W (s) - W(t)I = oo
t-s
I9 - tIa
as.
9.22 (Hard). Choose and fix some a > z, and define W to be a Brownian motion. Prove that there exist finite random times of > a2 > .... decreasing to zero, such that W (oj) = o for all j > 1. Is there an a < z for which this property holds? (HINT: Problem 9.21.) 9.23 (Reflection Principle, Hard). Let {Xn }1 1 be i.i.d. taking the values ±1 with probability each. Define Sn = E;`_1 X, (n > 1). Prove that max,<,
9.24 (Hard). Prove that the zero-set Z = {t > 0 : W(t) = 0) of Brownian motion W is a.s. uncountable, closed, and has no isolated points. Prove also that Z has zero Lebesgue measure.
9.25 (Problem 9.13, continued; Hard). Define S(t) = sup,E[o,q W(s) and X(t) = S(t) - W(t). Then, compute the density function of X(t) for fixed t > 0. (HINT: Start by computing P{S(t) >
s,W(t) < w).) 9.26 (Harder). Suppose {W,}:°o is a collection of independent Brownian motions. Define the "two-parameter processes" Z1, Z2, ... , where (9.63)
Zn(s,t)=sWo(t)+
y-c
sln(jira)
n j=1
Wj(t)
v0<8,t<1.
Prove that almost surely, Z2" (a, t) converges to a limiting two-parameter process {Z(a, t) }s,tE[o,t[ . uniformly for all a, t E [0, 1). Prove that Z is an a.s.-continuous centered "Gaussian process" with
covariance function E[Z(a, t)Z(u, v)] = (s n u)(t n v). Use this to prove that for any fixed a > 0, t - s-I/2Z(s,t) is a Brownian motion on (0,1]. The process {Z(s,t)}a,tE1o,11 is the so-called Brownian sheet on [0,1]2.
9.27 (Problem 9.16, continued; Harder). Prove that the embedding scheme of Problem 9.16 satisfies ET, = 1 and E[TT ] < oo. Conclude that
IT,, - nI = O ((n in in n)1/2) < oo as n - oo as. Use this to prove that for all p > 141 (9.64)
nlimo
IW(T
W(n)I = 0
as.
no
Conclude that, on a suitable probability space, we can construct a simple walk {S,}1-1 and a Brownian motion {W(t)}t>o such that
max ISk - W(k)] = o(n") as n - oo a.s.
1
(HINT: W2(t) - t describes a mean-zero martingale.)
9. Brownian Motion
180
Notes (1) Brownian motion is known also as the "Wiener process," or even sometimes as the "Bachelier-Wiener process." (2) Although Bachelier's ideas are now regarded as revolutionary, they went largely unnoticed for nearly 60 years. Courtault, Kabanov, Bru, Crepel, Lebon, and Le Marchand (2000) contains an enthusiastic discussion of this issue. The said reference contains a number of other interesting facts about Bachelier. (3) The book of Nelson (1967) contains a detailed account of the development of the
physical theory of Brownian motion.
(4) The term "Avogadro's constant" is due to J. B. Perrin. (5) The a.s.-continuity assumption of Theorem 9.9 is redundant; it is a consequence of the other assumptions there. (6) The strong Markov property was introduced and utilized by Kinney (1953), Hunt (1956), Dynkin and Jushkevich (1956), and Blumenthal (1957). The phrase "strong Markov property" was coined by Dynkin and Jushkevich (1956). (7) Parts of our modification to the Brownian filtration are unnecessary because Brownian motion is as. continuous. However, this requires a more advanced development of the Brownian motion.
(8) The reflection principle (p. 175) is due to Bachelier (1964, pp. 61-65). The central idea uses the method of Andr6 (1887) developed for the simple walk (Problem 9.23, p. 179).
Chapter 10
Terminus: Stochastic Integration
Reason's last step is the recognition that there are an infinite number of things which are beyond it.
-Blaise Pascal
1. The Indefinite Ito Integral Given a "nice" stochastic process H = {H(s)}B>o, Ito (1944) constructed
a natural integral f H dW = fo H(s) W(ds) despite the fact that W is nowhere differentiable a.s. (Theorem 9.13). In order to identify what "nice"
means, it is best to go back and redefine what we mean by a stochastic process in continuous time.
Definition 10.1. A measurable stochastic process (also, process or stochastic process) X = {X(t)}t>o is a product-measurable function X : [0, oo) x S1 --i R.
We often write X (t) in place of X (t , w); this is similar to what we did in discrete time.
This is a natural place to verify that Brownian motion is a stochastic process.
If H is nicely behaved, then it stands to reason that we should define
f HdW as limn,, Tn(H), where (10.1)
Zn(H)
= off (2n /
(k+1) LW
-W
(i)] 181
10. Terminus: Stochastic Integration
182
and the nature of the limit must be made precise. Clearly, Zn(H) is a well-defined random variable if, for instance, H has compact support. The following performs some of the requisite book-keeping about {Zn(H)} 1.
Lemma 10.2. If there exists T > 0 such that H(s) = 0 for all s > T a.s., then Tn(H) is a.s. a finite sum and Zn+1(H) - Zn(H) 00
- j=o LHC2+11)-H\2n/J
[W
2n
1/-W\2+11/J
Proof. The sum is obviously finite. We derive the stated identity for Zn+1(H) - Tn(H). Throughout we write Hk,n in place of H(k2-n), and
Oj n W (.72-") - W(k2-m). Consider the identity 27,+,(H) = >o Hk,n+l0k+l,n+1, and split the sum according to whether k = 2j or k = 2j + 1: 00
00
Zn+1(H) = E Hj,n02j+1,n+1 + E H2j+1,n+10 j+ 1 nn+l j=o
j=0 00
_
(10.2)
j,n
2j+1,n+1
Hj,n A2j+1,n+1 + j+1,n j=o 00
2j+1,n+1
(H2j+l,n+1 - Hj,n) Oj+1,n j=0
Because 02 +1,n+1 + Aj+l nn+l = Oj+1,nl the first term is equal to Zn(H), whence follows the lemma.
Definition 10.3. A process H = {H(s)}t>o is adapted to the Brownian filtration 9 if H(s) is .T,-measurable for each s > 0. We say that H is a compact-support process if there exists a non-random T > 0 such that with probability one, H(s) = 0 for all s > T. We also need the following technical definition.
Definition 10.4. Choose and fix p > 1. We say that H is Dini-continuous in LP(P) if H(s) E 11(P) for all s > 0 and (10.3)
J o
1
p(r) dr < oo,
r
where
Op(r) :=
sup s,t: Is-tI
IIH(s) - H(t)IIp
The function Op is called the modulus of continuity of H in LP(P). If H is compact-support, continuous, and a.s.-bounded by a non-random
quantity, then it is a.s. uniformly continuous in 11(P) for any p > 1; i.e.,
1. The Indefinite Ito Integral
183
limt-.oVip(t) = 0. Dini-continuity in LP(P) ensures that ?Pp converges to zero at some minimum rate. Here are a few examples:
Example 10.5.
(a) Suppose H is a.s. differentiable with a derivative that satisfies K := sups IIH'(t)IIp < oo. [Since (s,w) H H(s,w) is product measurable, f I H'(r) I P dr is a random variable, and hence IIH'(t)IIp are well defined.) By the fundamental theorem of calculus,
ift>s>0then
t
IIH(s) - H(s)IIp S f IIH'(r)llpdr < Kit - sl. s
Therefore, Op(r) < Kr, and H is Dini-continuous in L1(P).
(b) Suppose H(s) = f (W (s)), where f is a non-random Lipschitzcontinuous function; i.e., there exists L such that
If(x)-f(y)I SLly - xI It follows then that ilip(r) = O(r1/2) as r -i 0, and this yields the Dini-continuity of H in L1(P) for any p > 1. (c) Consider H(s) = f (W (s) , s), where f (x, t) is a non-random function, twice continuously differentiable in each variable with bounded
derivatives. Suppose, in addition, that there exists a non-random T > 0 such that f (x, s) = 0 for all s > T. Because I H(s) - H(t)I is bounded above by
If (1't'(s), a) - f (u'(t), a)1 + If MO, a) - f MO, t)I , by the fundamental theorem of calculus we can find a constant M such that for all s, t > 0,
IH(s)-H(t)I <M(IW(s)-W(t)I +It-sl) By Minkowski's inequality for all p > 1,
IIH(s) - H(s)IIp S M(IIW(s) - W(t)IIp+ It - sl)
=M(c it-611/2+It-sl)
,
where cp := IIN(0,1)IIp. Therefore, i/ip(r) = O(r1/2) as r - 0, whence follows the Dini-continuity of H in any L1(P) (p > 1).
Remark 10.6 (Cauchy Summability Test). Dini-continuity in L1(P) is equivalent to the summability of ikp(2-nf). Indeed, we can write (10.4)
n=0
10. Terminus: Stochastic Integration
184
Because iPp is nondecreasing, 00
00
lEOp(21 t)dt<EOp(2-").
(10.5)
1
0
n=0
n=0
We can now define f H dW for adapted compact-support processes that are Dini-continuous in L2(P). We will improve the construction later on.
Theorem 10.7. Suppose H is an adapted compact-support stochastic process that is Dini-continuous in L2(P). Then limn-,,2n(H) exists in L2(P).
If we write[fHdw] fH dW for this limit, the[(JHdw)2] n
(10.6) E
=0
= E [f°°H2(s)ds].
and E
If a, b E R, and V is another adapted compact-support stochastic process that is Dini-continuous in L2(P), then with probability one,
= aJ HdW +bJ V dW.
(10.7)
Definition 10.8. The second identity in (10.6) is called the Ito isometry (Ito, 1944).
Proof (Sketch). For t > s, W(t) - W(s) is independent of Jr. (Theorem 9.21, p. 174), and H(u) is %'. measurable for u < s. Therefore, Lemma 10.2 implies the following. We use the notation introduced in the proof of Lemma 10.2 to simplify the type setting: IITn+1(H) -Tn(H)II2 IIAI+ 11H {22n+11)
(10.8)
0<j<2^T-1
-H
112
nntl
2nj /II22 2
E
= 2-`-10<j<2^T-1 IIH \22 +11)
- H \2n II2
< V'2(2'), by Dini continuity. Consequently, N+M-1
IITN+M(H) -IN(H)112 5 > IITn+1(H) -2n(H)II2 (10.9)
n=N+1
N+M-1
VT
`
bit (2-n-1)
'N, M > 1.
n=NN+1
It follows from Remark 10.6 that {2n(H)}n°_1 is a Cauchy sequence in L2(P). This proves the asserted L2-convergence.
1. The Indefinite Ito Integral
185
The basic properties of Brownian motion ensure that EZ"(H) = 0. By L2-convergence we also have E[f H dW] = 0. Similarly, we can prove (10.6):
E [(I Hdw)]
=HymnIlZn(H)II2 00
= ine' E [H2 (k2-")] 2-"
(10.10)
n
k=0 1
= E [1000 H2(s) ds]
.
The exchanges of limits and integrals are all justified by the compact-support
assumption on H together with the continuity of the function t' IIH(t)112. Finally, (10.7) follows from the linearity of H '-41,,(H) and the existence D of L2(P)-limits. We now drop many of the technical assumptions in Theorem 10.7.
Theorem 10.9 (1t6,1944). Suppose His adapted and E[ fo H2 (s) ds] < 00. Then one can define a stochastic integral f H dW that has mean zero and variance E[ f' H2(s) ds]. Moreover, if V is another such integrand process, then for all a, b E R
f(aH + bV)dW = E
[J
H dW
af HdW+bJ VdW a.s.,
J V dWl =E [ rCo H(s)V(s) ds1
.
The proof is function theoretic, and takes up the remainder of this section. If you wish to avoid such technical details, then you can do so without missing any of the central ideas: Simply skip to the next section. Throughout, let m denote the Lebesgue measure on (R, .(R) ), and let L2(m x P) denote the corresponding product L2-space. We may note that JJ
(10.12)
E [J"O H2(s) ds] = 11H11L2(mxe),
and E[f H(s)V(s)d.9) is the L2(m X P) inner product between H and V. The following is the key step in our construction of stochastic integrals.
Proposition 10.10. Given any stochastic process H E L2(m X P) we can find processes {H"}°O_1, all compact-support and Dini-continuous in L2(P),
such that limn-00 H = H in L2(m X P). Theorem 10.9 follows immediately from this.
10. Terminus: Stochastic Integration
186
Proof of Theorem 10.9. Proposition 10.10 asserts that there exist adapted compact-support processes {Hn}n_-1, that are Dini-continuous in L2(P) and converge to H in L2 (m x P). The Ito isometry (10.6) ensures that if Hn dW }n 1
is a Cauchy sequence in L2(P), since {Hn}n is a Cauchy sequence in L2 (M X P). Consequently, f H dW = limn f Hn dW exists in L2(P). The 1
properties of f H dW follow readily from those of f Hn dW and the L2(P)convergence that we have proved earlier. Let us conclude this section by proving the one remaining proposition.
Proof of Proposition 10.10. We proceed in three steps, each reducing the problem to a more restrictive class of processes H. Step 1. Reduction to the Compact-Support Case. For each n > 1, let
HH(t) := H(t)110,n)(t) and note that Hn is an adapted compact-support stochastic process. Moreover,
E [ j°°H2(s)ds] = 0. l Therefore, we can, and will, assume without loss of generality that H is also compact-support. Step 2. Reduction to the L2-Bounded Case. Let us first extend the definition of H to all of R by assigning H(t) := 0 if t < 0. Next, define (10.13)
Wimp IIH - HnII ,2(mXP) =
(10.14)
rt Hn(t) = n (
H(s) ds
Vt > 0, n > 1.
Check that H is an adapted process. Moreover, Hn(t) = 0 for all t > T + 1, so that Hn is also compact-support. Next, we claim that Hn is bounded in L2(P). The Cauchy-Bunyakovsky-Schwarz inequality and the FubiniTonelli theorem together imply the following: (10.15)
sup IIHn(t)II2 < n f IIH(s)II2 ds = nhIHIIi2(mxp) t>o
0
It remains to prove that Hn converges in L2(m X P) to H. Since fo H2(s) ds < oo a.s., the Lebesgue differentiation theorem (p. H(t) for almost every t > 0. Therefore, 140) implies that a.s., Hn(t) limn Hn = H (m x P)-almost surely by Fubini-Tonelli. According to the dominated convergence theorem, Step 2 follows if we prove that (10.16)
sup IHnI E L2(m X P). n>1
Note that supn IHnI < 4'H, where the latter is the "maximal function," !t Vt > 0. (10.17) (.4'H)(t) = sup n / IH(s)I ds n>1
t-(1/n)
2. Continuous Martingales in L2(P)
187
For each w, (4'H)(t + n-1) is the Hardy-Littlewood maximal function of H. Also, . WH is a n adapted process, whence 0
H
I (-*fH)(s)I2 ds <
(10.18) 1000
Confer with Corollary 8.41 on page 143. But H(t) = 0 for all t > T and suet IIH(t)112 < oo. Therefore, f °O H2(s) ds < oo for almost all w. Moreover, we take first expectations (Fubini-Tonelli), and then square roots, to deduce
that (10.19)
II'HIIL2(mxP) <_ 161IHIILz(mxP) < 00-
This reduces our problem to the one about the H's that are bounded in L2(P) and compact-support. Step 3. The Conclusion. Finally, if H is bounded in L2(P) and compactsupport, then we define Hn by (10.14) and note that Hn is differentiable,
and Hn(t) = n{H(t) - H(t - n-1)}. Therefore, (10.20)
sup IIHn(t)II2 <- 2nsuP IIH(t)II2 < 00I
t
Part (b) of Example 10.5 implies the asserted Dini-continuity of H,,. On the other hand, the argument developed in Step 2 proves that Hn -+ H in L2(m X P), whence follows the theorem.
2. Continuous Martingales in L2(P) The theories of continuous-time martingale and stochastic integration are intimately connected. Thus, before proceeding further, we take a side-step, and have a quick look at martingale theory in continuous time. To avoid unnecessary abstraction, .9T will denote the Brownian filtration throughout.
Definition 10.11. A process M = {M(t)}t>o is a (continuous-time) martingale if:
(1) M(t) E L'(P) for all t > 0. (2) If t > s > 0 then E[M(t) I9(s)] = M(s) a.s. The process M is a continuous L2-martingale if t H M(t) is almost-surely continuous, and M(t) E L2(P) for all t > 0. Much of the theory of discrete-time martingales transfers to continuous L2(P)-martingales. Here is a first sampler.
The Optional Stopping Theorem. If M is a continuous L2-martingale and S < T are bounded s-stopping times, then (10.21)
E[M(T) I.J"rs] = M(S)
a.s.
10. Terminus: Stochastic Integration
188
Proof. Throughout, choose and fix some non-random K > 0 such that T < K almost surely. If S and T are simple stopping times, then the optional stopping theorem is a consequence of its discrete-time namesake (p. 130). In general, let S 1 S and T I, T be the simple stopping times of Lemma 9.20 (p. 172). Note that
the condition S < T imposes 5,,, < T, + 2-' for all n, m > 1. Because T + 2-n` is a simple stopping time, it follows that (10.22)
= M(S,n)
E [M(Tn + 2-m) I
a.s.
Moreover, this very argument implies that {M(TL + 2-m)}°O_, is a discretetime martingale. Since Tn < T + 2-1 < K + 2-n, Problem 8.22 on page 153 implies that 1
(10.23)
E[sup jM(Tn+2-m)I2] <4E[IM(K+1)12]
-
n>1
By the almost-sure continuity of M, limn_,. M(Tn + 2-1) = M(T + 2-m) a.s. Convergence holds also in L2(P), thanks to the dominated convergence theorem. Therefore, E[M(T + 2-m) I sm]
IIE[M(Tn + 2-m) (1024)
2
= IIE [M(T,+2-m) - M(T +2-r) I222 < E (E [(M(Tn + 2-m) - M(T + 2-m))2 -40 asn -goo.
5,,,] I
1
We have appealed to the conditional Jensen inequality (p. 120) for the first inequality, and the towering property of conditional expectations (Theorem 8.5, page 123) for the second identity. By (10.22), M(S,n) = E[M(T + 2-m)1 gs,n] a.s. Because rs c `rs,,,, this and the towering property of conditional expectations together imply that E[M(S,n) I 9s] = E[M(T + 2-m) 1.5's] a.s. Let m oc and appeal to the argument driving (10.24) to deduce that E[M(T) 19s] = xlimo E [M(T + 2-m I .a sm] (10.25)
=
mo E[M(Sm) 19S]
= E[M(S) I -01s]
= M(S), where the limits all hold in L2(P). This implies that E[M(T) 1 ,Fs] = M(S) almost surely, as desired. 0 The following is a related result whose proof is relegated to the exercises.
3. The Definite Integral
189
Doob's Maximal Inequalities. Let Al denote a continuous L2-martingale. Then for all A, t (10.26)
0.
> AP
I sup JAI(s)I > $ o<s
r
E I Ihl(t)I; sup IM(s)I > AJ 0<,
L
In particular, for all p > 1, E
(10.27)
o Ai(s)IPJ [sun
<
(_-!_)E(aAI(t)It). p
1
3. The Definite Ito Integral It is a natural time to mention the definite Ito integral. The latter is defined simply as fo H dW = f H110,t) dW for all adapted processes H such that E[ fo H2(s) ds] < oo for all t > 0. This defines a collection of random variables fo H dW, one for each t > U. The following is of paramount importance to us, since it says something about the properties of the random function t F-. fo H dW .
Theorem 10.12. If H is an adapted process such that E[ f0 H2 (s) ds] < oc then we can construct the process { fo H dW }t>o such that it is a continuous L 2 -martingale.
Proof. According to Theorem 10.9, fo H dW exists. Thus, we can proceed by verifying the assertions of the theorem. We do so in three steps. Step 1. Reduction to H that is Dini-continuous in L2(P). Suppose we have proved the theorem for all processes H that are adapted and Dini-
continuous in L2(P). In this first step we prove that this implies the remaining assertions of the theorem. Let H be an adapted process such that E(fo H2(s) ds] < oc for all t > 0. We can find adapted processes H that are Dini-continuous in L2(P) and
lim E [f'
(10.28)
1
H(s))2 ds] = 0.
Indeed, we can apply Proposition 10.10 to H110.1.], and use the recipe of the said proposition for H,,. Then apply the proposition to t as well. By (10.28) and the Ito isometry (10.6),
H
lim E
[(f1Hdw r(10.29)
- J Hn, dW
=
0
Because f o Hn dW - fo Hn+, dW = f f (H - Hn+x) dW defines a continuous L2-inartingale, by Doob's maximal inequality (p. 189), for all non-random
10. Terminus: Stochastic Integration
190
but fixed T>0, (10.30)
r lim E I sup
n-oo
L0
r
(10
t
dW\) 2=
HdW - J t Hn+k
/
0
0.
In particular, for each T > 0 there exists a process X = {X(t)}t>o and a subsequence n' - oo such that ( 10.31)
lim
sup
.o0 0
J Hn' dW - X(t)= 0
a.s.
Moreover, the same uniform convergence holds in L2(P), and along the original subsequence n - oo. Consequently, (10.29) implies that X is a particular construction of t " fo H dW that is a.s.-continuous and adapted. In other words, X is adapted and a.s.-continuous, and also satisfies (10.32)
P{x(t) =JtHdW}=1 o
et>0.
JJJ
Finally, X(t) E L2(P) for all t > 0, so it remains to prove that X is a martingale. But remember that we are assuming that {fo Hn dW }t>o is a martingale. By the conditional Jensen inequality (p. 120), and by L2(P)-convergence,
IIE
IX(t+s) _
11X(t + s) -
-+0
ft+8
ft+8
HndWB]112
HndWII2
asn -goo.
Consequently, limn. fo Hn dW = E[X(t + s) I tee] in L2(P). But we have seen already that fo Hn dW -+ X (t) in L2(P). Therefore, (10.34)
E[X(t + s) I e] = X (s)
a.s.
That is, X is a martingale, as was claimed. Step 2. A Continuous Martingale in the Dini-Continuous Case. Now we suppose that H is in addition Dini-continuous in L2(P), and prove the theorem in this special case. Together with Step 1 this completes the proof.
3. The Definite Integral
191
The argument is based on a trick. Define
.7ntH)(t) _ 0
(10.35)
H C2/
/
+Hl
L2n2n
-W ()] 1] ) [w(t)_w(L2hi)]. LW
('2n+1)J
This is a minor variant of T(H11o 1). Indeed, you should check
In (H1io,q) (10.36)
=H(L2: Ii
[W(t)-W([2nt -1J+n
2n
JL JJ whose L2(P)-norm goes to zero as n - oc. But n(H) is also a stochastic process that is: (a) Adapted; and (b) continuous-in fact, piecewise linearin t. It is also a martingale. Here is why: Suppose t > s > 0. Then there
exist integers 0 < k < K < 2ns - 1 such that s E D(k; n) := [k2-n, (k + 1)2-n) and t E D(K;n). Then,
n(H)(t) -.7n(H)(s) (10.37)
k
+H(K) [w(t) - W
(2n
(a-H( 2) [IV(s) -
W (a].
where Ek<j
E [Jrn(H)(t) - 7n(H)(s) I .f'A] = 0.
This proves the martingale property. Step 3. The Conclusion. To finish the proof, suppose H is an adapted process that is Dini-continuous in L2(P). A calculation similar to that of Lemma 10.2 reveals that for any non-random T > 0, (10.39)
lim sup E [(.7n+1(H)(t) n-oo 0
-.n(H)(t))2]
= 0.
Therefore, by Doob's maximal inequality (p. 189), (10.40)
lim E I sup (.I.+, (H)(t) - .7n(H)(t))2I = 0. n-'O°
o
J
This implies that a subsequence of .7n(H) converges a.s. and uniformly for all t E [0, T] to some process X. Since ,7n (H) is a continuous process, X is necessarily continuous a.s. Furthermore, the argument applied in Step 1
10. Terminus: Stochastic Integration
192
shows that here too X is a martingale. We have seen already that for any fixed t > 0,
.n(H)(t) - Zn(Hlio,,1) -; 0
(10.41)
in L2(P).
Since Zn(Hlio,11) -+ J. H dW in L2(P), this proves that
P{X(t)=fo HdW}=1
(10.42)
o
111
dt>0,
JJJ
and whence the result.
4. Quadratic Variation We now elaborate a little on quadratic variation (Theorem 9.9, p. 163). Quadratic variation is a central theme in continuous-time martingale theory, but its study requires more time than we have. Therefore, we will develop only the portions for which we have immediate use. Throughout, we define the second-order analogue of Tn (10. 1), (10.43)
H
Q,,(H)
I
[W (1) - W ()]2.
k=O
Theorem 10.13. Suppose H is adapted, compactly supported, and uniformly continuous in L2(P); i.e., lim,..o2(r) = U. Then,
lim Qn(H) =
(10.44)
JH(s)ds
n-oo
in L2(P).
Proof. To simplify the notation, we write for all integers k > 0 and n > 1, (10.45)
H(k2-") and dk,n := W((k + 1)2-n) - W(k2-n).
Hk,n
Recall next that we can find a non-random T > 0 such that for all s > T, H(s) = 0 a.s. Throughout, we keep such a T fixed. Step 1. Approximating the Lebesgue Integral. We begin by proving that 00
E Hk,n2-" -b
(10.46)
k=0
J0 "O H(s) ds
in L2(P) as n - oo.
Note that 00
o0
E Hk,n2-" - f H(s) ds (10.47)
k=0
0
(ki'l)`2-n
< n 0
IHk,n- H(s)j ds.
5. Ito's Formula
193
Therefore, by Minkowski's inequality,
(10.48)
IIE Hk,n2-n - J
H(s) dsll <
k=0
2-n '2(2-n) 0
2
T'02 (2-n)
,
which converges to zero as n -+ oo; Step 1 follows. Step 2. Conclusion. Choose and fix t > 0, and for all n > 1 define 00
Dn
(10.49)
Qn(t) - E Hk,n2-n k=0
Note that Hk,n is independent of dk,n, and the latter has mean zero and variance 2-n. Because Dn = E00 o Hk,n[dk,n -2 -n ], we first square and then take expectations to obtain
=
IIDnII22
E [Hk2,n] E [(d2k, n
-2 -n)2]
0
(10.50)
+2
E [Hk nHj n (dk,n -2 -n) (di,n -2 -n)]
.
O<j
If j < k, then dk,n is independent of Hk,nHj,n(dj2,n - 2-n), and has mean zero. Therefore, (10.51)
E [HR,n] E [(d2k, n
IIDnII22 =
-2 -_)2]
.
O
Next we observe that [dk n - 2-n] has the same distribution as 2-n(Z2 - 1), where Z is standard normal. Because E[(Z2 -1)2] = E[Z4] -1 = 2, it follows
that (10.52)
IIDnII2 = 4n
IIHk,nhI2 O
Dini-continuity ensures that t " IIH(t)112 is continuous, and hence bounded on [0, T] by some constant KT. Thus, (10.53)
2 IIDnII2 <
2(2nT - 1)KT 4n
-0
as n
oo.
This and Step 1 together imply the result.
5. Ito's Formula and Two Applications Thus far, we have constructed the Ito integral, and studied some of its properties. In order to study the Ito integral further, we need to develop an operational calculus that mimics the calculus of the Lebesgue and/or
10. Terminus: Stochastic Integration
194
Riemann integral. To understand this better, we recall the chain rule of elementary calculus. Namely, (10.54)
(f o 9)'(x) = f'(9(x)) g ,(X),
valid for all continuously differentiable functions f and g. In its integrated form-this is integration by parts-the chain rule states that for all t > s > 0,
(10.55)
f(9(t)) - f(9(s)) = f tf'(9(u))9 (u) du. e
For example, let f (x) = x2 to find that (10.56)
92(t)-92(0)= fgdg t
where dg(s) = g'(s) ds. What if g were replaced by Brownian motion? As a consequence of our next result we have t
(10.57)
W2(t) - W2(0) = f W dW + 2
a.s.
0
Compared with (10.56), this has an extra factor (t/2).
Ito's Formula 1. If f : R R has two continuous derivatives, then for all t > s > 0, the following holds a.s.: (10.58)
f(W(t)) - f(W(s)) =
f
t
f'(W(r)) W(dr) + 1 f t f"(W(r))dr. 2
a
s
Ito's formula is different from the chain rule for ordinary integrals because the nowhere differentiability of W forces us to replace the right-hand side of (10.55) with a stochastic integral plus a second-derivative term. Remark 10.14. Ito's formula continues to hold even if we assume only that f" exists almost everywhere and f t (f'(W (r))2 dr < oo a.s. Of course then we have to make sense of the stochastic integral, etc.
Proof in the Case that f"' is Bounded and Continuous. We assume, without loss of too much generality, that s = 0.
The proof of Ito's formula starts out in the same manner as that of (10.55). Namely, by telescoping the sum we first write
f(W(2-"L2"t-1])) (10.59)
-f(0)
E
0
[f(w('2+n'))
AW(2'nM
5. Ito's Formula
195
To this we apply Taylor's expansion with remainder, and write (2-n [2"t
f (W (10.60)
- 1J)) - f(0)
E f (W (2 ))
dk.n
0
+2 E f""(W()) 4, + 0
where dk,n := W((k+1)2-") -W(k2-n), and IRk,nI
Rkndkn, 0
M<
supx
oo, uniformly for all k, n.
According to the proof of Theorem 10.7, the first term of the righthand side of (10.60) converges in L2(P) to fo f'(W(s)) W(ds); see also Example 10.5. The second term, on the other hand, converges in L2(P) to 2 fo f"(W(s)) ds; consult with Theorem 10.13. In addition, continuity and the dominated convergence theorem together imply the following: (10.61)
nimo f (W(2-n[2nt - 1])) - f(W(t))
a.s. and in L2(P).
It, therefore, suffices to prove that ?n --+ 0 in Ll (P) as n -* oc, where (10.62)
Rk,ndk
n
0
But as n -+ oo, (10.63)
Ildk,nll3 -
EI.-nI < M
MtIIN(0,1)II32-ni2.
0
The proof follows.
O
Next is an interesting refinement; it is proved by similar arguments involving Taylor series expansions that were used to derive Ito's formula 1.
Ito's Formula 2. Let W denote Brownian motion with W (O) = x0. If f (x , t) is twice continuously differentiable in x and continuously differentiable in t, and if E[ f0 I a= f (W (s) , s) I2 ds] < oo for all t > 0, then a.s., f (W (t) , t) = f (xo , 0) + (10.64)
ft
ax f (W (s) , s) W(ds)
f [cf
(W(s) , s) +
+ 0 This remains valid if f takes on complex values.
f (W(s) , s)} ds.
Of course, f H dW := f Re(H) dW + i f Im(H) dW whenever possible. I will not prove this refinement. Instead, let us close this book with two fascinating consequences of Ito's formula 2.
10. Terminus: Stochastic Integration
196
5.1. Levy's Theorem: A First Look at Exit Distributions. Choose and fix some a > 0, and let W denote a Brownian motion started somewhere
in (-a, a). We wish to know where W leaves the interval (-a, a). The following remarkable answer is due to Levy (1951):
Theorem 10.15. Choose and fix some a > 0, and define (10.65)
Ta := in£ is > 0: W (s) = a or - a},
where inf 0 := oo. If W (O) := xo E (-a, a), then for all real numbers A # 0, (10.66)
Eexp (i.AT0) =
cos (xo
2i))
cos (a 2iA)
Proof. We apply Ito's formula 2 with f (x, t) := V)(x)e
(10.67)
where A 0 0 is fixed, and the function
satisfies the following boundary-
value problem: Min(x) = 2iA i(x)
dx E (-a, a),
(10.68)
O(a) = V1 V (-a) = 1.
By actually taking derivatives, etc., we find that the solution is cos (x 2i.L)
(10.69)
cos (a 2iA
It is possible to check that E[fo l8 f (W (s) , s)/8x2 ds) is finite for all t > 0. implies that f solves the partial Moreover, the eigenvalue problem for differential equation, 2
(10.70)
2-5x2(x,t)+(x,t)=0.
As a result, Ito's formula 2 tells us that f (W (t) , t) - f (xo, 0) is a mean-zero (complex) martingale. By the optional stopping theorem (p. 187), (10.71)
E[f (W (Ta At),Ta A t)] = f(xo,0) ='O(xo)
Thanks to the dominated convergence theorem and the a.s.-continuity of W, we can let t -+ oc to deduce that (10.72)
E [f (W (Ta) ,Ta)] = TI'(xo)
Because W(Ta) = ±a and 0(±a) = 1, (10.73)
This proves the theorem.
f (W (Ta) ,Ta) = eaalo.
0
5. Ito's Formula
197
5.2. Chung's Formula: A Second Look at Exit Distributions. Let us have a second look at Theorem 10.15 in the simplest setting where x0 := 0 and a := 1. Define T := Tl to find that the formula (10.66) simplifies to the following elegant form: (10.74)
Eexp(i)T) =
1
cos
2ia
VA E R\ {0}.
In principle, the uniqueness theorem for characteristic functions tells us that the preceding formula determines the distribution of T. However, it is not always so easy to extract the right piece of information from (10.74). For instance, if it were not for (10.74), then we could not prove too easily that 1/ cos 2i5 is a characteristic function of a probability measure. Or for that matter, can you see from (10.74) that T has finite moments of all orders? (It does!) The following theorem of Chung (1947) contains a different representation of the distribution of T that answers the previous question about the existence of the moments of T.
Theorem 10.16. For all t > 0, (10.75)
P {T > t} = 74r
(2n +81)27r2t1
2-+ l eXp n=0
J
Consequently, P{T > t} - (4/7r) exp(-7r 2t/8) as t -b oc.
The preceding implies that P{T > t} < 2exp(-7r2t/8) for large values of t. In lieu of Lemma 6.8 (p.' 67), (10.76)
Vp>0.
E[7'I']=pJ00 tP-'P{T>t}dt,
Therefore, T has moments of all orders, as was asserted earlier. In fact, you
might wish to carry out the computation a little further and produce the following neat formula for the pth moment of T: (10.77)
E [7'P] =
23p+2r(p + 1),3(2p + 1)
y
p > 0'
al+2p
where r(p) = fo sP-le-sds denotes Euler's gamma function and 3 is the Dirichlet beta function, viz., (10.78) n=O
(2n +)1)t
Vt >
1.
Theorem 10.16 implies also the following formula (Chung, 1947).
10. Terminus: Stochastic Integration
198
Corollary 10.17. For all x > 0, (10.79)
P ( sup IW(s)I S x } = to<s<1
4 71
111
(-1)"
00
exp (- (2n + 1)2lr2)
.2n+ 1
J
8x2
n=
In particular, P{supo<s<1 IW(s)I < x} - (4/7r)exp(-7r2/(8x2)) as x - 0.
Proof of Theorem 10.16 (Sketch). The proof follows three steps. Step 1. A Cosine Formula. Choose and fix an integer n > 1, and define
282t)
(10.80)
dxER,t>0.
f(x,t)=cos(n-2x7r)exp(n
The function f solves the partial differential equation, 102f (10.81)
+
=0 subject to f(±l,t)=0 't>0.
This is a kind of heat equation on [-1, 1] with Dirichlet boundary conditions.
Barring technical conditions, Ito's formula 2 tells us that f (W (t) , t) - 1 defines a mean-zero martingale. [This uses the fact that f (0, 0) = 1.] By the optional stopping theorem (p. 187), E[f (W (T At) , T At)] = 1. Equivalently, (10.82)
E[f(W(T),T); Tt]=1.
282t)S
Because W(T) = ±1 a.s., the first term in (10.82) vanishes a.s. Whence we obtain the following cosine formula: /
(10.83)
\
1
E[cosl mr2(t) I; T>tJ
=exp(-n
2. A Fourier Series. Let L2(-2,2) denote the collection of all measurable functions g : [-2,2] - R such that f 22 g2 (x) dx < oo. Theorem A.5 (p. 205), after a little fidgeting with the variables, shows that 2, 2-1/2 sin(n7rx/2), 2-1/2 cos(m7rx/2) (n, m = 1,2,...) form an orthonormal basis for L2(-2,2). In particular, any 0 E L2(-2, 2) has the representation,
(10.84)
¢(x) = 20 +
[An cos n=1
(n7rx) + B. sin (n7rx 11 2
where:
- The infinite sums converge in L2(-2,2); - Ao = 2-1 f22 0(x) dx;
- An = 2-1/2 f 22 O(x) cos(n7rx/2) dx for n > 1; and
- Bn = 2-1/2 f22 22 0(x) sin(n7rx/2) dx for n > 1.
2
/J
Problems
199
Step 3. Putting it Together. We can apply the result of Step 2 to the function O(x) := 1(_1.1)(x) to obtain 1(-1,1)(x) -
(10.85)
1
_ 2 °O (-1)n
2
cos -E 2n + 1 n=0
((2n +1)7rx) 2
J
We "plug" in x := W (t , w), multiply by 1{T(w)>t}, and then apply expectations to find that
P{W(t)E(-1,1),T>t}-2PIT >t} (10.86)
=22n'
1E[cos((211+
)irW(t));T>tJ )
\
n=0
Since the left-hand side is equal to ZP{T > t}, the cosine formula of Step 1 completes our proof. The preceding proof is a sketch only because: (i) we casually treat the L2identity in (10.85) as a pointwise identity; and (ii) we exchange expectations with an infinite sum without actually justifying the exchange. With a little effort, these gaps can be filled.
Problems 10.1. In this exercise we construct a Dini-continuous process in LP(P) that is not as, continuous.
(1) Prove that if0<s
s
t
(2) Use this to prove that H(s) := 1(o is not as. continuous.
is Dini-continuous in L2(P), but H
10.2. In this exercise you are asked to construct a rather general abstract integral that is due to Young (1970). See also McShane (1969).
A function f : (0,11 - R is said to be Holder continuous of order a > 0 if there exists a finite constant K such that (10.87)
If(s) - f(t)l < Kit - sl°
es, t E (0, 1].
Let B° denote the collection of all such functions.
(1) Prove that B° contains only constants when a > 1, whereas `BI includes but is not limited to all continuously differentiable functions. (2) Prove that if 0 < a < 1, then if" is a complete normed linear space that is normed by
Illlly°
sup s,tE10,11
If(s) - f(t)I + sup If(t)I Is - t1°
tE iO,1]
.?it
(3) Given two functions f and g, define for all n > 1,
Ifbn9= klf (2) [9(k2 1)-9(2 )1'
10. Terminus: Stochastic Integration
200
Suppose f E 'a and g E `6'O for some 0 < a, 0 < 1. Prove fo f 6g := limn fo f bng exists whenever a + 0 > 1. Note that when we let g(x) = x, we recover the Riemann integral of f; i.e., that fo I6g = foI f(.) dx. (4) Prove that fo gbf is well defined, and
IIf69=f(1)9(l)-f(0)9(0)-I gbf 0 0 The integral f fbg is called a Young integral. (HINT: Lemma 10.2.) 10.3. In this problem you are asked to derive Doob's maximal inequality (p. 189) and its variants. We say that M is a submartingale if it is defined as a martingale, except
EIM(t) J F.] > M(s) a.s. whenever t > s > 0. M is a supermartingale if -M is a submartingale. A process M is said to be a continuous L2(10.88)
submartingale (respectively, continuous L2-supermartingale) if it is a submartingale (respectively
supermartingale), {M(t)}t>o is as, continuous, and M(t) E L2(P) for all t > 0. Prove: (1) If Y is in L2(P), then M(t) = EIY ),fit) is a martingale. This is a Doob martingale in continuous time. (2) If M is a martingale and t' is convex, then >'(M) is a submartingale provided that tl'(M(t)) E L'(P) for each t > 0. (3) If M is a submartingale, t' is a nondecreasing convex function, and tj'(M(t)) E L' (P) for all t > 0, then O(M) is a submartingale. (4) The first Doob inequality on page 189 holds if )MI is replaced by any a.s.-continuous submartingale. (HINT: Prove first that supo<.
10.4. For all integers n > 1 define pn := E{(W(1))"}, where W is Brownian motion. Use Ito's formula to prove that An+2 = (n + 1)µ". Compute An for all integers n > 1.
10.5 (Gambler's Ruin). If W denotes a Brownian motion, then for any a E R, define T. inf{s > 0 : W(s) = a} where inf 0 := oo. Recall that T. is an .ir-stopping time (Proposition 9.17, p. 171). If a, b > 0 then prove that b
P{T
(10.89)
and compute ET,. 10.6. Prove Corollary 10.17.
10.7. Recall the Dirichlet beta function 0 from (10.78). Prove that 0(1) = w/4 and 0(3) = w3/32. Few other evaluations of 0 are known explicitly. Even 0(2), the so-called "Catalan constant," does not have a simpler description.
10.8. Let W be a Brownian motion with W(O) = 0, and define T to be the first time W exits the interval (-1, 1). Prove that EITPI < oo whenever -oo < p < oo.
10.9. Let T := inf{s > 0 : IW(s)I = 1}, where inf 0 (10.90)
Eexp(-AT) =
0. Prove that for all A > 0, 1
cosh v'-2-,\'
(Levy, 1951).
10.10. Define 2t2)
(10.91)
p(t;x)_ (2wt)1/2eXP(
1(/
"xER,t>0.
Prove that if W is Brownian motion, then M(t) := p(t; W(t)) defines a martingale.
10.11 (Hard). Let W be a Brownian motion with W(0) = 0 and define T,,b to be the first time W exits the interval (-a, b), where a, b > 0 are fixed constants. Compute E exp(iAT,,b) for all
A>0.
Notes
201
10.12 (Hard). Let W denote a Brownian motion, and (3 > 0 a fixed positive number. Define Wa(t) = W(t) + j3t to be the so-called Brownian motion with drift 0. For any a,b > 0 define
r°,_t := inf {s > 0: Wa(s) = a or - b} .
(10.92) Prove that
1 - e-zaa P {Wa (.,-b) = -b} = e1db -e-2a°
(10.93)
From this deduce the distribution of -inft>oW8(t). (HINT: For all a E R find a non-random function h. : R+ -a R such that t -. exp(aW(t) - h°(t)) defines a mean-one martingale.)
Notes (1) Instead of presenting a general theory of stochastic integration, we have discussed a
special case that is: (i) broad enough to be applicable for our needs; and (ii) concrete enough so as to make the main ideas clear. Dellacherie and Meyer (1982) have written
a definitive account of the general theory of processes. Their treatment includes a detailed description of the general theory of stochastic integration with respect to (semi-) martingales. (2) Ito's theory of stochastic integrals uses the "left-hand rule," as can be seen clearly in (10.1). This "left-hand rule" is the hallmark of Ito's theory of stochastic integration. In general, it cannot be replaced by other rules-such as the midpoint- or the right-hand rule-without changing the resulting stochastic integral. (3) For Theorem 10.15, and much more, see Knight (1981, Chapter 4). (4) Theorem 10.16 can be extended to several dimensions Ciesielaki and Taylor (1962). One can use Corollary 10.17 in conjunction with the Poisson summation formula (Feller, 1966, p. 630) to deduce that
P
(10.94)
rm
(4n+1)s
exp(-u2/2) du. sup 1W(s)1:5X = V < J 0a<1 ' n___ (4n-1)x
Whereas (10.79) is useful for small values of x, the preceding is accurate when x is large. The fact that the right-hand sides of (10.79) and (10.94) agree is one of the
celebrated theta-function identities of analytic number theory. (5) Now that we have reached the end of the book, let me close by suggesting that this
is a natural place to start learning about W. K. Feller's theory of one-dimensional diffusions (1955a; 1955b; 1956). Modern and more pedagogic accounts include Bass (1998), Knight (1981, Chapter 4), and Revuz and Yor (1999, Chapters 3 and 7).
Appendix
The moving power of mathematical invention is not reasoning but imagination. -Augustus de Morgan
1. Hilbert Spaces Throughout, let H be a set, and recall that it is a (real) Hilbert space if it is linear and if there exists an inner product ( , ) on H x H such that f '-' (f , f) = 11f 112 norms H into a complete space. We recall that inner product means that (a f + ag, h) = (h, a f + 13g) = a(f , h) +,3(g, h) for all f,g,h E H and all a, 0 E R. Hilbert spaces come naturally equipped with a notion of angles: If (f, g) = 0 then f and g are orthogonal. Definition A.1. Given any S C H, we let S1 denote the collection of all elements of H that are orthogonal to all the elements of S. That is, (A.1)
S1={fEH: (f,g)=0"gES}.
It is easy to see that S' is itself a subspace of H, and that S n S1 = {0}. We now show that in fact S and S1 have a sort of complementary property.
Theorem A.2 (Orthogonal Decomposition). If S is a closed subspace of a Hilbert space H, then H = S + S1 := if + g : f E S , g E S'} In order to prove this, we need a lemma. Lemma A.3. If X is a closed and convex subset of a complete Hilbert space H, then there exists a unique f E X such that 11f II = infgEX 119 11 203
Appendix
204
Proof. By definition we can find f, E X such that Recall the "parallelogram law" : (A.2)
IIh + 9112 + IIh - 9112 = 2 (11h112 + 119112)
IIffII2 = infhEx IIh112.
vh, g E H.
We apply this with h := f, and g := fn, to find that for all n, m > 1, Ilfn 4- 4 f-111
= 2 (lfI2 +11fm112-211 fn
2fm112)
(A.3)
' (ilfn112+Ilfmll2-2hnf
11h112/J
The final inequality follows because (fn + fm)/2 E X, thanks to convexity. Let n,m - oo to deduce that {fn}°__1 is a Cauchy sequence in X. Because
X is closed, it follows that fn - f for some f E X, and hence Ilf 11 = infhEx llhll This verifies the existence of f. For the uniqueness portion suppose there were two norm-minimizing functions f, g E X. By the parallelogram law,
Ill-9112 = inf 11h112-
(A.4)
4
hEX
11:L+
2 g112 <0.
(Why?). Thus, f = g.
D
Proof of Theorem A.2. For all given f E H, the set f + S is closed and
convex, where f+ S := if + s
:
s E S}. In particular, f+ S has a
unique element .1(f) of minimal norm (Lemma A.3); also define .9(f) _
f - .1(f)
Because .1(f) E f + S, it follows that .9(f) E S for all f E H. Since .9 (f) + .91(f) = f, it suffices to demonstrate that .9-L (f) E S1 for all f E H. But by the definition of :91, 11.1(f)11 < If - 9Il for all g E S. Instead of g write G = ag + -60(f), where 1190 = 1 and a E R. Because G E S, we can deduce that for all g E S with 1190 = 1 and all a E R, (A.5)
11.91(1)112 < 11.1(f)
- a9I12
= 11.91(1)112
-2a(.91(1),9) +a2.
Let a = (91(f) , g) to deduce that (.-L(f), g) = 0 for all g E S. This is the desired result.
0
Theorem A.4. To every bounded linear functional 2 on a Hilbert space H there corresponds a unique 7r E H such that .`P(f) = (f , 7r) for all f E H.
Proof. If 2(f) = 0 for all f E H, then we define 7r = 0, and we are done. If not, then S = { f E H : 2(f) = 0} is a closed subspace of H that does not span all of H; i.e, there exists g E S1 with II9II = 1 and 2(g) > 0;
2. Fourier Series
205
this follows from the decomposition theorem for H (Theorem A.2). We will establish that 7r := g2(g) is the function that we seek, all the time remembering that 2(g) E R. For all f E H consider the function It = P(g) f - 2(f )g, and note that h E S since 2(h) = 0. Because 7r E Sl, this means that (7r, h) = 0. On the other hand, (7r, h) = .fi(g) (7r , f) - 2(g)2(f ). Since .2(g) > 0, we
have 2(f) = (71,1) for all f E H. It remains to prove uniqueness. but this too is easy for if there were two of these functions, say 7r1 and 7r2. then
for all f E H, (f, a, - 7r2) = 0. In particular. let f = 7r1 - r2 to see that D
ire = 712.
2. Fourier Series Throughout this section, we let T = [-7r, 7r] denote the torus of length 27r, and consider some elementary facts about the trigonometric Fourier series on T that are based on the following functions: (A.6)
t¢n(x) =
einx
ex E T, n = 0, ±1, ±2,...
.
27r
Let L2(T) denote the Hilbert space of all measurable functions f : T -. C such that (A.7)
IIf 11T,
JT
If(x)I2dx < oo.
As usual, L2(T) is equipped with the (semi-)norm IIf IIT and inner product (A.8)
(f, g) := IT
Our goal is to prove the following theorem.
Theorem A.S. The collection {0 }FEZ is a complete orthonormal system in L2(T). Consequently, every f E L2(T) can be written as
` 00
(A.9)
f=
(f, 0n)4n
n=-ac
where the convergence takes place in L2(T). Furthermore. 00
(A.10)
11f 11T
=
n=-oo
(f
I
The proof is not difficult, but requires some preliminary developments.
Definition A.6. A trigonometric polynomial is a finite linear combination of the fn's. An approximation to the identity is a sequence of integrable functions wro, wi , ...: T R+ such that:
Appendix
206
(i) fT ipn(x) dx = 1 for all n. Ef (ii) There exists co > 0 such that limn-,,,, f O (x) dx = 1 for all
e E 10,C01-
Note that (a) all the 1bn's are nonnegative; and (b) the preceding display shows that all of the area under 4pn is concentrated near the origin when n is large. In other words, as n -' oo, 7pn looks more and more like a point mass.
For n = 0,1,2.... and x E T consider (A.11)
cnx) = (1+cosx)" an
where an = f (1+cos(x))ndx. T
Lemma A.7. {}°_°o is an approximation to the identity. xe E (0, it/21. Then, Proof. Choose and fij(i +cosx)ndx < ir(1 +cose)n.
(A.12)
By symmetry, this estimates the integral away from the origin. To estimate the integral near the origin, we use a method of P.-S. Laplace and write (A.13)
j(i + cosx)n dx =
E
J
e engixl dx,
where g(x) := ln(1 + cos x). Apply Taylor's theorem with remainder to deduce that for any x E [0, e] there exists [; E [0, x] such that g(x) = In 2 - x2/(1 + cos (). But cos (> 0 because 0 < (< e < it/2. Thus, for all n > 1, (A.14)
j(i + cos x)dx >_ 2n J e2 dx > 0
V/n
J e-z2 dz. o
It follows from this and (A.12) that ff
Proposition A.8. If f E L2(T) and e > 0, then there is a trigonometric polynomial T such that JET- A IT < e. [Trigonometric polynomials are dense in L2(T).]
Proof. Since continuous functions (endowed with uniform topology) are dense in L2(T), it suffices to prove that trigonometric polynomials are dense in the space of all continuous function on T (why?). We first of all observe that the functions Kn are trigonometric polynomials. Indeed, by the binomial
theorem and the Euler formula cosx = 1(e" + e-z), Icnx) is a linear
2. Fourier Series
207
combination of {Oj(x)}? _n. Next we note that the convolution rcn * f of rcn and f is also a trigonometric polynomial, where (A.15)
(an * .f)(x) =
T IT
f(y)icn(x - y) dy.
Note that (rc,, * f)(x) - f (x) = fT{ f (y) - f (x)}rcn(y - x) dy. We choose and fix e E (0, 7r) and split the last integral according to whether or not Iy - xI < e. It follows that I(rc * f)(x) - f (x) I is at most (A.16)
sup If() - f(u)I +2 sup If(w)I
y,uET:
wET
.J
Iy-ul<e
By Lemma A.7 the last term vanishes as n (A.17)
oo. Thus, for all e > 0,
limsupsup 1(rc * f)(x) - f(x)I < sup If(y) - f(u)I xET
V,uET:
Iy-ul<e
Let c (A.18)
0 to see that the left-hand side is zero. Because II (Kn * ,f) - .f IIT < 27r sup I(rcn * f)(x) - f
(x)12,
zET
the proposition follows.
We are ready to prove Theorem A.5.
Proof of Theorem A.5. It is easy to see that {¢n}nEZ is an orthonormal sequence in L2(T); that is, (A.19)
1
ifn=m,
10 ifn34 m.
To establish completeness suppose that f E L2(T) is orthogonal to all On's; i.e., (f, ¢n) = 0 for all n E Z. If a and T are as in the preceding proposition, then: (i) (f ,T) = 0; and (ii) 11f - TIIT < E. This last part, (ii), implies that
E ? IIf - TIIT (A.20)
= IITIIT + IITIIT - 2(f,T) = Ill IIT + IITIIT
Since c is arbitrary, Ill IIT = 0, from which we deduce that f = 0 almost everywhere. This proves completeness. The remainder is easy to prove, but requires the material from Chapter 4 [§6].
Let Jan and 9 respectively denote the projections onto Sn := the _n and S. If f E L2(T), then -,Vnf is the a.e.-unique function g E S,, that minimizes Ill - 9IIT We can write g = j=_n CA
linear span of {0j}
Appendix
208
and expand the said L2-norm to obtain the following optimization problem: Minimize over all {cj} _n, 2
n
(A.21)
f - E Cj 0j
n
= IIfIIT + J.=-n
j=-n
n C.j
-2
Cj(f+l6j) j=-n
T
This is a calculus exercise and yields the optimal value of r_j = (f, 0j). It follows readily from this that: (1)
>_(f,Oj)Oj;
(ii) fn f = f - +9-f; (iii)
E
=-n I(f, Oj)I2; and (ice) IIyn fIIT = IIfIIT - Ej=-n I (f+Oj)I2 The last inequality yields Bessel's inequality: 00
(A.22)
F I(f,0j)I2 <_ IIfIIT
Our goal is to show that this is an equality. If not, then Fatou's lemma implies that II lim infn-,,,, .9n f IIT > 0, whence g := lim infn_.00.9 f 34 0 on a set of positive Lebesgue measure. But note that g E .9n for all n. Fix e > 0 and find a trigonometric polynomial T E Sn for some large n such that fig - TIIT 5 e. Now expand: E2
> IIg - TIIT =119112 + IITII. - 2(g, T)
(A.23)
= II911T + IITIIT II91IT
Thus, g = 0 almost everywhere, whence follows a contradiction. In fact, this
argument shows that any subsequential limit of .9 f must be zero almost everywhere, and hence .9n f -+ 0 in L2(T). It follows that 00
(A.24)
f = Eli
.9nf = > (f, Oj)Oj j=-00
as desired.
in L2(T),
Bibliography
Adams, W. J. (1974). The Life and Tomes of the Central Limit Theorem. New York: Kaedmon Publishing Co.
Aldous, D. J. (1985). Exchangeability and Related Topics. In Ecole d'E°te de Probabiht8s de SaintFlour, X111-1983, Volume 1117 of Lecture Notes in Math., pp. 1-198. Berlin: Springer. Alon, N. and J. H. Spencer (1991). The Probabilistic Method (First ed.). New York: Wiley. Andre, D. (1887). Solution directe du problbmc resolu par M. Bertrand. C. R. Acad. Sci. Paris 105, 436-437.
Azuma, K. (1967). Weighted sums of certain dependent random variables. Tdhoku Math. J. (2) 19, 357-367.
Bachelier, L. (1900). Theorie de la speculation. Ann. Sci. Ecole Norm. Sup. 17, 21-86. See also the 1995 reprint. Sceaux: Gauthier-Villars. Bachelier, L. (1964). Theory of speculation. In P. H. Cootner (Ed.), The Random Character of Stock Market Prices, pp. 17 78. MIT Press. Translated from French by A. James Bones". Banach, S. (1931). Uber die Baire'sche kategorie gewisser Funkionenmengen. Studia. Math. 111, 174-179.
Bass, R. F. (1998). Diffusions and Elliptic Operators. New York: Springer-Verlag. Baxter, M. and A. Rennie (1996). Financial Calculus: An Introduction to Derivative Pricing. Cambridge: Cambridge University Press. Second (1998) reprint. Berkes, 1. (1998). Results and problems related to the pointwise central limit theorem. In Asymptotic Methods in Probability and Statistics (Ottawa, ON, 1997), pp. 59-96. Amsterdam: North-Holland. Bernoulli, J. (1713). Ars Conjectandi (The Art of Conjecture). Basel: Basilem: Impensis Thurnisioruin Fraetrum. Bernstein, S. N. (1912/1913).
Demonstration du theorbme de Weierstrass fondee sur le calcul des
probabilites. Comm. Soc. Math. Kharkow 13, 1-2. Bernstein, S. N. (1964). On the property characteristic of the normal law. In Sobranie sochinenii. Tom IV: Teoriya veroyatnostei. Matematicheskaya Statistoka. 1911-1946. "Nauka". Moscow. Billingsley, P. (1995). Probability and Measure (Third ed.). Now York: John Wiley & Sons Inc. Binghamn, N. H. (1986). Variants on the law of the iterated logarithm. Bull. London Math. Soc. 18(5), 433-467.
Birkholf, G. D. (1931). Proof of the ergodic theorem. Proc. Nat. Acad. Sci. 17, 656-660. Black, F. and M. Scholea (1973). Pricing of options and corporate liabilities. J. Political Econ. 81, 637-654.
Blumenthal, R. M. (1957). An extended Markov property. Trans. Amer. Math. Soc. 85, 52-72. Borel, E. (1909). Les probabilites denombrables et leurs applications arithmetique. Rend. Cire. Mat. Palermo 27, 247-271. Bore], E. (1925). Mecanique Slatistique Classique (Third ed.). Paris: Gauthier-Villars. 209
210
Bibliography
Bourke, C., J. M. Hitchcock, and N. V. Vinodchandran (2005). Entropy rates and finite-state dimension. Theoret. Comput. Sci. 349(3), 392-406. Bovier, A. and P. Picco (1996). Limit theorems for Bernoulli convolutions. In Disordered Systems (Temuco, 1991/1992), Volume 53, pp. 135-158. Paris: Hermann. Breiman, L. (1992). Probability. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). See also the corrected reprint of the original (1988). Bretagnolle, J. and D. Dacunha-Castelle (1969). Applications radonifiantes dans les espaces de type p. C. R. Acad. Sci. Paris Ser. A-B 269, A1132-A1134. Broadbent, S. R. and J. M. Hammersley (1957). Percolation processes. I. Crystals and mazes. Proc. Cambridge Philos. Soc. 53, 629-641. Buczolich, Z. and R. D. Mauldin (1999). On the convergence of E::=1 f(nx) for measurable functions. Mathematika 46(2), 337-341. Burkholder, D. L. (1962). Successive conditional expectations of an integrable function. Ann. Math. Statist. 33, 887-893. Burkholder, D. L., B. J. Davis, and R. F. Gundy (1972). Integral inequalities for convex functions of operators on martingales. In Proc. Sixth Berkeley Symp. Math. Statist. Probab., Vol. II, Berkeley, Calif., pp. 223-240. Univ. California Press. Burkholder, D. L. and R. F. Gundy (1970). Extrapolation and interpolation of quasi-linear operators on martingales. Acta Math. 124, 249-304. Cantelil, F. P. (1917a). Su due applicazioni d'un teorema dl G. Boole ails statistics, matematica. Atti delta Reale Acaademia Nationale dei Lincei, Serie V, Rendicotti 26, 295-302. Cantelli, F. P. (1917b). Sulla probabilith come limits della frequenze. Atti delta Reale Accademia
Nationale dei Lincei, Serie V, Rendicotti 26,39-45. Cantelli, F. P. (1933a). Conaiderazioni sulla legge uniforme dei grandi numeri e sulla generalizzazione di un fondamentale teorema del signor L6vy. Giornale d. Istituto Italians Attuari 4, 327-350. Cantelli, F. P. (1933b). Sulla determinazione empirica delle leggi di probabilita. Giornale d. Istituto Itoliano Attuari 4, 421-424. Caratheodory, C. (1948). Vorlesungen fiber recite Funktionen. New York: Chelsea Publishing Company. Champernowne, D. G. (1933). The construction of decimals normal in the scale of ten. J. London Math. Soc. 8, 254-260. Chatterji, S. D. (1968). Martingale convergence and the Radon-Nikodym theorem in Banach spaces. Math. Scand. 22, 21-41. Chebyshev, P. L. (1846). Demonstration flementaire dune proposition generale de la th6orie des probabilit6s. Crelle J. Math. 33(2), 259-267. Chebyshev, P. L. (1867). Des valeurs moyennes. J. Math. Puns Appl. 12(2), 177-184. Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23, 493-507. Chow, Y. S. and H. Teicher (1997). Probability Theory: Independence, Interchangeability, Martingales (Third ed.). New York: Springer-Verlag. Chung, K. L. (1947). On the maximum partial sum of independent random variables. Proc. Nat. Acad. Sci. U.S.A. 33, 132-136.
Chung, K. L. (1974). A Course in Probability Theory (Seconded.). New York-London: Academic Press.
Chung, K.-L. and P. Erd66 (1947). On the lower limit of sums of independent random variables. Ann. of Math. (2) 48, 1003-1013. Chung, K. L. and P. ErdSSs (1952). On the application of the Borel-Cantelli lemma. Trans. Amer. Math. Soc. 72, 179-186. Ciesielski, Z. and S. J. Taylor (1962). First passage times and sojourn times for Brownian motion in space and the exact Hausdorff measure of the sample path. Trans. Amer. Math. Soc. 103, 434-450. Cifarelli, D. M. and E. Regazzini (1996). De Finetti's contribution to probability and statistics. Statist. Sri. 11(4), 253-282. Coifman, R. R. (1972). Distribution function inequalities for singular integrals. Proc. Not. Acad. Set. U.S.A. 69,2938-2939. Copeland, A. H. and P. Erd8s (1946). Note on normal numbers. Bull. Amer. Math. Soc. 52, 857-860. Courtault, J: M., Y. Kabanov, B. Bru, P. Crepel, I. Lebon, and A. Le Marchand (2000). Louis Bachelier. On the Centenary of Theorie de la Speculation. Math. Finance 10(3), 341-353. Cover, T. M. and J. A. Thomas (1991). Elements of Information Theory. New York: John Wiley & Sons Inc.
Bibliography
211
Cox, J. C., S. A. Ross, and M. Rubenstein (1979). Option pricing: a simplified approach. J. Financial Econ. 7, 229-263. Crambr, H. (1936). Uber eine Eigenschaft der normalen Verteilungsfunktion. Math. Z. 41, 405-415. Cs6rg8. M. and P. Rdvdsz (1981). Strong Approximations in Probability and Statistics. New York: Academic Press Inc. [Harcourt Brace Jovanovich Publishers]. de Acosta, A. (1983). A new proof of the Hartman-Wintner law of the iterated logarithm. Ann. Probab. 11(2), 270-276. de Finetti, B. (1937). La pr6vision: see loin logiques, ses sources aubjectives. Ann. Inst. H. PoincarC 7, 1-68.
de Moivre, A. (1718). The Doctrine of Chances; or a Method of Calculating the Probabilities of Events in Play (First ed.). London: W. Pearson.
de Moivre, A. (1733). Approximatio ad Summam terminorum Binomii (a t b)" in Serium expansi. Privately Printed. de Moivro, A. (1738). The Doctrine of Chances; or a Method of Calculating the Probabilities of Events in Play (Second ed.). London: H. Woodfall. Dellacherie, C. and P.-A. Meyer (1982). Probabilities and Potential. B. Amsterdam: North-Holland Publishing Co. Theory of martingales, Translated from French by J. P. Wilson. Devaney, R. L. (2003). An Introduction to Chaotic Dynamical Systems. Boulder, CO: Westview Press. Reprint of the second (1989) edition. Diaconis, P. and D. Freedman (1987). A dozen de Finetti-style results in search of a theory. Ann. Inst. H. Poincar8 Probab. Statist. 23(2, suppl.), 397-423. Diaconis, P. and J. B. Keller (1989). Fair dice. Amer. Math. Monthly 96(4), 337-339. Donoho, D. L. and P. B. Stark (1989). Uncertainty principles and signal recovery. SIAM J. Appl. Math. 49(3). 906-931. Doob, J. L. (1940). Regularity properties of certain families of chance variables. Trans. Amer. Math. Soc. 47, 455-486.
Doob, J. L. (1949). Application of the theory of martingales. In Le Calcul des ProbabilitCs et ses Applications, pp. 23-27. Paris: Centre National de is Recherche Scientifique. Doob, J. L. (1953). Stochastic Processes. New York: John Wiley & Sons Inc. Doob, J. L. (1971). What is a martingale? Amer. Math. Monthly 78, 451-463. Dubins, L. E. and D. A. Freedman (1965). A sharper form of the Borel-Cantelli lemma and the strong law. Ann. Math. Statist. 36, 800-807. Dubins, L. E. and D. A. Freedman (1966). On the expected value of a stopped martingale. Ann. Math. Statist 37, 1505-1509. Dudley, R. M. (1967). On prediction theory for nonstationary sequences. In Proc. Fifth Berkeley Syrup. Math. Statist. Probab., Vol. II, pp. 223-234. Berkeley, Calif.: Univ. California Press. Dudley, R. M. (2002). Real Analysis and Probability. Cambridge: Cambridge University Press. Revised reprint of the 1989 original. Durrett, R. (1996). Probability: Theory and Examples (Seconded.). Belmont, CA: Duxbury Press. Dynkin, E. and A. Jushkevich (1956). Strong Markov processes. Teor. Veroyatnost. i Primenen. 1, 149-155.
ErdSs, P. (1948). Some remarks on the theory of graphs. Bull. Amer. Math. Soc. 53, 292-294. ErdSa, P. (1949). On the strong law of large numbers. Trans. Amer. Math. Soc. 67, 51-56. Erdbs, P. and G. Szekeres (1935). A combinatorial problem in geometry. Composito. Math. 2, 463-470. Erd6s, P. and A. R.enyi (1959). On Cantor's series with convergent E 1/q". Ann. Univ. Sci. Budapest. EStvds. Sect. Math. 2, 93-109. Erd6s, P. and A. Rbnyi (1970). On a new law of large numbers. J. Analyse Math. 23, 103-111. Etemadi, N. (1981). An elementary proof of the strong law of large numbers. Z. Wahrsch. Verso. Ceb. 55, 119-122. Falconer, K. J. (1986). The Geometry of Fractal Sets, Volume 85. Cambridge: Cambridge University Press.
Fatou, P. J. L. (1906). S6ries trigonombtriques et series de Taylor. Aeta Math. 69, 372-433. Feller, W. (1945). The fundamental limit theorems in probability. Bulletin A.M.S. 51, 800-832. Feller, W. (1955a). On differential operators and boundary conditions. Comm. Pure Appl. Math. 8, 203-216.
Feller, W. (1955b). On second order differential operators. Ann. of Math. (2) 61. 90-105.
212
Feller, W. (1956). On generalized Sturm-Liouville operators.
Bibliography
In Proceedings of the Conference on
Differential Equations (dedicated to A. Weinstein), pp. 251-270. University of Maryland Book Store, College Park. Aid.
Feller, W. (1957). An Introduction to Probability Theory and Its Applications. Vol. I (Seconded.). New York: John Wiley & Sons Inc.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications. Vol. 11. New York: John Wiley & Sons Inc.
Fortuin, C. M.. P. W. Kasteleyn, and J. Ginibre (1971). Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22, 536 -564. Freshet, M. R. (1930). Stir In convergence en probability. Metron 8, 1-48. Freiling, C. (1986). Axioms of symmetry: Throwing darts at the real number line. J. Symbolic Logic 51(1),190-200. Fristedt, B. and L. Gray (1997). A Modern Approach to Probability Theory. Boston. MA: Birkhhuser Boston Inc.
Garsis, A. M. (1965). A simple proof of E. Hopf's maximal ergodic theorem. J. Math. Mech. 14, 381 382.
Georgii, H.-O. (1988). Gibbs Measures and Phase Transitions. Berlin: Walter de Gruyter & Co. Gerber. H. U. and S: Y. R. Li (1981). The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stochastic Process. Appl. 71(1), 101-108. Glivenko, V. (1933). Sulla determinazione empirica dells leggi di probability. Giornale d. Istituto Italiano Attuan 4, 92 -99. Glivenko, V. (1936). Sit[ teorema limits delta teoria delle funzioni caratteristiche. Giornale d. Instituto Italiano attuart 7, 160-167. Gnedenko, B. V. (1967). The Theory of Probability. New York: Chelsea Publishing Co. Translated from the fourth Russian edition by B. D. Seckler. Gnedenko, B. V. (1969). On Hilbert's Sixth Problem (Russian). In Hilbert's Problems (Russian), pp. 116-120. "Nauka", Moscow. Gnedenko. B. V. and A. N. Kolmogorov (1968). Limit Distributions for Sums of Independent Random Variables. Reading, Mass.: Addison-Wesley Publishing Co. Translated from the original Russian, annotated, and revised by Kai Lai Chung. With appendices by J. L. Doob and P. L. 11su. Revised edition. Grimmett, G. (1999). Percolation (Second ed.). Berlin: Springer-Verlag.
Hamedani, G. G. and G. C. Walter (1984). A fixed point theorem and its application to the central limit theorem. Arch. Math. (Basel) 43(3), 258-264. Hammersley, J. M. (1963). A Monte Carlo solution of percolation in a cubic lattice. In S. F. B. Alder and M. Rotenberg (Eds.), Methods in Computational Physics, Volume I. London: Academic Press. Hardy, G. H. and J. E. Littlowood (1914). Some problems of diophantine approximation. Acta Math. 37, 155-239.
Hardy. G. H. and J. E. Littlowood (1930). A maximal theorem with function-theoretic applications. Acta Math. 54, 81 166. Harris, T. E. (1960). A lower bound for the critical probability in a certain percolation process. Proc. Cambridge Philos. Soc. 56, 13-20. Harrison, J. M. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. J. Econ. Theory 20,381-408. Harrison, J. M. and S. R. Pliska (1981). Martingales and stochastic integrals in the theory of continuous trading. Stoch. Proc. Their Appl. 11, 215-260. Hartman, P. and A. Wintner (1941). On the law of the iterated logarithm. Amer. J. Math. 63,169-176. Hausdori, F. (1927). Mengenlehre. Berlin: Walter De Gruyter & Co. Hausdorff, F. (1949). Grundzuge der Mengenlehre. New York: Chelsea publishing Co. Helms, L. L. and P. A. Loeb (1982). A nonstandard proof of the martingale convergence theorem. Rocky Mountain J. Math. 12(1), 165-170. Hoeffding. W. (1963). Probability inequalities for sums of bounded random variables. Amer. Statist. Assoc. 58, 13-30. Hoetfding, W. (1971). The Lt norm of the approximation error for Beinstein-type polynomials. J. Approximation Theory 4, 347-356. Houdr6, C.. V. PErez-Abreu, and D. Surgailis (1998). Interpolation, correlation identities, and inequalities for infinitely divisible variables. J. Fourier Anal. Appl. 4(6), 651-668. Hunt, G. A. (1956). Some theorems concerning Brownian motion. Tram. Amer. Math. Soc. 81, 294 319. Hunt, G. A. (1957). Markoff processes and potentials. I, 11. Illinois J. Math. 1, 44-93, 316-369.
Bibliography
213
Hunt, G. A. (1966). Martingates et processus its Mark-o% Paris: Dunod. lonescu Tulceo, A. and C. lonescu Tulcea (1963). Abstract ergodic theorems. Trans. Amer. Math. Soc. 107, 107-124. Isaac. R. (1965). A proof of the martingale convergence theorem. Proc. Amer. Math. Soc. 16, 842-844. Ito, K. (1944). Stochastic integral. Proc. Imp. Acad. Tokyo 20, 519-524. Jones, R. L. (1997/1998). Ergodic theory and connections with analysis and probability. New York J. Math. 3A(Proceedings of the New York Journal of Mathematics Conference, June 9 13, 1997), 31-67 (electronic). Kac, M. (1937). Une remarque sur lee polynomes de M.S. Bernstein. Studio Math. 7, 49-51. Kac, M. (1939). On a characterization of the normal distribution. Airier. J. Math. 61, 726-728. Kac, M. (1949). On deviations between theoretical and empirical distributions. Proc. Nat. Acad. Sri. U.S.A. 35, 252-257. Kac, M. (1956). Foundations of kinetic choery. In J. Neyman (Ed.), Proc. Third Berkeley Symp. on Math. Statist. Probab., Volume 3, pp. 171--197. Univ. of Calif. Kahane, J: P. (1997). A few generic properties of Fourier and Taylor series. In Trends in Probability and Related Analysis (Taipei, 1996), pp. 187-196. River Edge, NJ: World Sci. Publishing. Kahane. J.-P. (2000). Baire's category theorem and trigonometric series. J. Anal. Math. 80, 143-182. Kahane, J.-P. (2001). Probabilities and Baire's theory in harmonic analysis. In Twentieth Century
Harmonic Analysis-A Celebration (11 Ctocco, 2000), Volume 33 of NATO Sci. Ser. II Math. Phys. Chem., pp. 57-72. Dordrecht: Kluwer Acad. Publ. Karlin, S. and H. M. Taylor (1975). A First Course in Stochastic Processes (Seconded.). Academic Press [Harcourt Brace Jovanovich Publishers], New York-London.
Karlin, S. and H. M. Taylor (1981). A Second Course in Stochastic Processes. New York: Academic Press Inc. (Harcourt Brace Jovanovlch Publishers]. Keller, J. B. (1986). The probability of heads. Amer. Math. Monthly 9.9(3), 191-197. Kesten, H. (1980). The critical probability of bond percolation on the square lattice equals . Comm. Math. Phys. 74(l), 41-59. Khintchine, A. and A. Kolmogorov (1925). Uber Konvergenz von Reihen deren Glieder durch den Zufall bestimmt werden. Rec. Math. Moscow 32, 668-677. Khintchine, A. Y. (1923). Uber dyadische Briiche. Math. Z. 18. 109-116. Khintchine, A. Y. (1924). Ein Satz der Wahrscheinlichkeitsrechnung. Fond. Math. 6. 9-10.
Khintchine, A. Y. (1929). Stir la loi des grands nombres. C. R. Acad. Set. Paris 188, 477-479. Khintchine, A. Y. (1933). Asymptotische Gesetz der Wahrscheinlichkeitsrechnung. Springer. Kinney, J. R. (1953). Continuity properties of sample functions of Markov processes. Trans. Amer. Math. Soc. 74, 280-302. Knight, F. B. (1962). On the random walk and Brownian motion. Trans. Airier. Math. Soc. 103, 218 228.
Knight, F. B. (1981). Essentials of Broumian Motion and Diffusion. Providence, R.I.: American Mathematical Society.
Knuth, D. E. (1981). The Art of Computer Programming. Vol. 2 (Second ed.). Reading, Mass.: Addison-Wesley Publishing Co. Seminumerical Algorithms.
Kochen, S. and C. J. Stone (1964). A note on the Borel-Cantelli lemma. Illinois J. Math. 8, 248-251. Kolmogorov, A. (1930). Sur la loi forte des grandee nombres. C. R. Acad. Sci. Paris 191, 910-911. Kolmogorov, A. N. (1929). Uber das Gesetz des iterierten Logarithmus. Math. Ann. 101, 126-136. Kolmogorov, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin: Springer. Kolmogorov, A. N. (1950). Foundations of Probability. New York: Chelsea Publishig Company. Translation edited by Nathan Morrison. Krickeberg, K. (1963). Wahracheinlichkeitstheorie. Stuttgart: Teubner. Krickeherg, K. (1965). Probability Theory. Reading, Massachusetts: Addison-Wesley. Kyburg, Jr., H. E. and H. E. Smokier (1980). Studies in Subjective Probability (Second ed.). Huntington, N.Y.: Robert E. Krieger Publishing Co. Inc. Lacey, M. T. and W. Philipp (1990). A note on the almost sure central limit theorem. Statist. Probab. Lett. 9(3), 201-205. Lamb, C. W. (1973). A short proof of the martingale convergence theorem. Proc. Amer. Math. Soc. 38, 215-217. Lange, K. (2003). Applied Probability. New York: Springer-Verlag.
214
Bibliography
Laplace, P. S. (1782). MCmoire sur les approximations des formulas qui sent fonctions de tres-grands nombres. Technical report, Histoire do I'AcadCmie Royale des Sciences de Paris. Laplace, P.-S. (1805). Traitd de Mdcanique Cdleate, Volume 4. Chez J. B. M. Duprat, an 7 (Crapelet). Reprinted by the Chelsea Publishing Co. (1967). Translated by N. Bowdltch. Laplace, P: S. (1812). Thdori.e Analytique des ProbabilitEs, Vol. 1 and 11. V[iemej Courtier. Reprinted in Oeuvres compldtes de Laplace, Volume VII (1886), Paris: Gauthier-Villars. Lebesgue, H. (1910). 361-450.
Sur l'integration des fonctions discontinues.
Ann. Ecole Norm. Sup. 27(3),
Levi, B. (1906). Sopra l'integrazione dells serie. Rend. !nstituto Lombardino di Sci. e Lett. 39(2), 775-780.
Levy, P. (1925). Calcul des Probabilitds. Paris: Gauthier-Villars. Levy, P. (1937). Theorie de I'Addition des Variables Aldatoires. Paris: Gauthier-Villars. Levy, P. (1951). La mesure de Hausdorff de la courbe du mouvement brownien h n dimensions. C. R. Acad. Sci. Paris 233, 600-602. Li, S.-Y. R. (1980). A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Ann. Probab. 8(6), 1171-1176. Liapounov, A. M. (1900). Sur one proposition de Is theorie des probabilitCs. Bulletin de l'Acaddmie Impdriale des Sciences de St. PetCrsbourg 13(4), 359-386. Liapounov, A. M. (1922). Nouvelle forma du theorbme sur Ia limits do probability. MEmoires de I'Acaddmie Impdriale des Sciences de St. Petdraboury 12(5), 1-24. Lindeberg, J. W. (1922). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrech-
nung. Math. Z. 15, 211-225. Lindvall, T. (1982). Bernstein polynomials and the law of large numbers. Math. Sci. 7(2), 127-139. Lipschitz, R. (1876). Sur Is possibilite d'integrer complement un systbme donna d'equations diffCrentielles. Bull. Sci. Math. 10, 149-159. Mahmoud, H. M. (2000). Sorting: A distribution theory. Wiley-Interscience, New York. Markov, A. A. (1910). Recherches our on cas remarquable d'epreuves dependantes. Acta Math. 33, 87-104.
Mattner, L. (1999). Product measurability, parameter integrals, and a Fubini-Tonelli counterexample. Enseign. Math. (2) 45(3-4), 271-279. Mazurkiewicz, S. (1931). Sur lea functions non dtrivablea. Studio. Math. 111, 92-94. McShane, E. J. (1969). A Riemann-type integral that includes Lebesgue-Stieltjjes, Bochner and stochastic integrals. Memoirs of the American Mathematical Society, No. 88. Providence, R.I.: American Mathematical Society. Merton, R. C. (1973). Theory of rational option pricing. Bell J. of Econ. and Management Sci. 4(1), 141-183.
Mukherjea, A. (1972). A remark on Tonelli's theorem on integration in product spaces. Pacific J. Math. 42, 177-185. Mukherjea, A. (1973/1974). Remark on Tonelli's theorem on integration in product spaces. II. Indiana Univ. Math. J. 23, 679-684. Nash, J. (1958). Continuity of solutions of parabolic and elliptic equations. Amer. J. Math. 80, 931-954. Nelson, E. (1967). Dynamical Theories of Brownian Motion. Princeton, N.J.: Princeton University Press. Norris, J. R. (1998). Markov Chains. Cambridge: Cambridge University Press. Reprint of 1997 original. Okamoto, M. (1958). Some Inequalities relating to the partial sum of binomial probabilities. Ann. Inst. Statist. Math. 10, 29-35.
Paley. R. E. A. C., N. Wiener, and A. Zygmund (1933). Notes on random functions. Math. Z. 37, 647-668.
Paley, R. E. A. C. and A. Zygmund (1932). A note on analytic functions in the unit circle. Proc. Camb. Phil. Soc. 28, 366-372. Perrin, J. B. (1913). Lea Atomes. Paris: Llbrairie F. Alcan. See also Atoms. The revised reprint of the second (1923) English edition. Van Nostrand, New York. Translated by Dalziel Llewellyn Hammick.
Pitman, J. (198]). A note on L2 maximal inequalities. In Seminar on Probability, XV (Univ. Strasbourg, Strasbourg, 1979/1980) (French), Volume 850 of Lecture Notes in Math., pp. 251 -258. Berlin: Springer. Plancherel, M. (1910). Contribution a 1'Ctude de Is representation dune fonction arbitraire par des
intCgrales defines. Rend. Circ. Mat. Palermo 30, 289-335. Plancherel, M. (1933). Sur lee formules de rdclprocit4 du type de Fourier. J. London Math. Soc. 8, 220-226.
Bibliography
215
Plancherel, M. and G. P61ya (1931). Sur les valeurs moyennes doe fonctions rtelles dbfinies pour toutes lea valuers do Ia variables. Comment. Math. Hely. 3, 114-121. Poincar6, H. (1912). Calcul des Probabititds. Paris: Gauthier-Villars. Pollard, D. (2002). A User's Guide to Measure Theoretic Probability. Cambridge: Cambridge University Press. P61ya, G. (1920). Uber den zentralen Grenzwertsatz der Wahrscheinlichkeitstheorie and des Momentenproblem. Math. Zeit. 8, 171-181. Rademacher, H. (1919). Uber partielle and totale Differenzierbarkeit I. Math. Ann. 89, 340-359. Rademacher, H. (1922). Einige Shtze Uber Reihen von allgemeinen Orthogonalfunktionen. Math. Ann. 87, 112-138. Raikov, D. (1936). On some arithmetical properties of summable functions. Math. Sb. 1(43:3), 377-383. Ramsey, F. P. (1930). On a problem of formal logic. Proc. London Math. Soc. 30(2), 264-286. R6nyi, A. (1962). Wahrscheinlichkeitsrechnung. Mit einem Anhang fiber Informationstheorie. Berlin: VEB Deutscher Verlag der Wissenachaften. Resnick, S. 1. (1999). A Probability Path. Boston, MA: Birkhfiuser Boston Inc.
Revuz, D. and M. Yor (1999). Continuous Martingales and Brownian Motion (Third ad.). Berlin: Springer-Verlag.
Rosenlicht, M. (1972). Integration in finite terms. Amer. Math. Monthly 79, 963-972. Schnorr, C: P. and H. Stimm (1971/1972). Endliche Automaten and Zufallafolgen. Acta Informal. 1(4), 345-359.
Schrhdingor, E. (1946). Statistical Thermodynamics. Cambridge: Cambridge University Press. Shannon, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27, 379-423, 623-656.
Shannon, C. E. and W. Weaver (1949). The Mathematical Theory of Communication. Urbana, Ill.:
Univ. of Illinois Press. Shultz, H. S. and B. Leonard (1989). Unexpected occurences of the number e. Math. Magazine 62(4), 269-271.
Sierpiiski, W. (1920). Sur les rapport entre ('existence des intograles fo f(x,y)dx, fo f(x,y)dy et fo dx fo f(x,y)dy. Fund. Math. 1, 142-147. Skolem, T. (1933). Ein kombinatorischer Satz mit anwenduag auf sin logisches Entacheidungsproblem. Fund. Math. 20, 254-261. Skorohod, A. V. (1961). Iseledovaniya po teorii sluchainykh protsessov. Kiev. Univ., Kiev. (Studies in the Theory of Random Processes. Translated from Russian by Scripts Technica, Inc., AddisonWesley Publishing Co., Inc., Reading, Mass. (1965). See also the second edition (1985), Dover, Now York.).
Skorohod, A. V. (1965). Studies in the theory of random processes. Addison-Wesley Publishing Co., Inc., Reading, Mass. Slutsky, E. (1925). Uber etochastische Asymptoten and Grenzwerte. Metron 5, 3-89. Solovay, R. M. (1970). A model of set theory in which every set of reels is Lebesgue measurable. Ann. Math. 92, 1-56. Steinhaus, H. (1922). Les probabilitds dlnombrables et leur rapport A la th6orie de mesure. Fund. Math. 4, 286-310. Steinhaus, H. (1930). Sur Ia probability de la convergence de eerie. Studia Math. 2, 21-39. Stigler, S. M. (1986). The History of Statistics. Cambridge, MA: The Belknap Press of Harvard University Press. Stirling, J. (1730). Methodue Difjerentiatis. London: Whiston & White. Strassen, V. (1967). Almost sure behavior of sums of independent random variables and martingales. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), pp. Vol. II, Part 1, pp. 315-343. Berkeley, Calif.: Univ. California Press. Stroock, D. W. (1993). Probability Theory, An Analytic View. Cambridge: Cambridge University
Press. Stroock, D. W. and O. Zeitounl (1991). Microcanonical distributions, Gibbs states, and the equivalence of ensembles. In Random Walks, Brownian Motion, and Interacting Particle Systems, pp. 399-424. Boston, MA: Birkhauser Boston.
Tendon, K. (1983). The Life and Works of Lip6t Fej#r. In Functions, series, operators, Vol. 1, II (Budapest, 1980), Volume 35 of Colloq. Math. Soc. Jdnos Bolyai, pp. 77-85. Amsterdam: NorthHolland.
Trotter, H. F. (1959). An elementary proof of the central limit theorem. Arch. Math. 10, 226-234.
216
Bibliography
Turing, A. M. (1934). On the Gaussian error function. Technical report, Unpublished Fellowship Dissertation, King's College Library, Cambridge. Varadhan, S. R. S. (2001). Probability Theory. New York: New York University Courant Institute of Mathematical Sciences. Varberg, D. E. (1966). Convergence of quadratic forms in independent random variables. Ann. Math. Statist. 37, 567-576. Veech, W. A. (1967). A Second Course in Complex Analysis. New York, Amsterdam: W. A. Benjamin, Inc.
Ville, J. (1939). Etude Critique de to Notion de Collectif. Paris: Gauthier-Villars. Ville, J. (1943). Sur 1'application, a un critere d'inddpendance, du d4nombrement des inversions pr8eent6es par one permutation. C. R. Acad. Sci. Paris 217, 41-42. von Neumann, J. (1940). On rings of operators, Ill. Ann. Math. 41, 94-161. von Smoluchowski, M. (1918). Uber den Begriff des Zufalls and den Ursprung der Wahrscheinlichkeit. Die Naturwissenschaften 6(17), 253-263. Wagon, S. (1985). Is x normal? Math. Intelligencer 7(3), 65-67. Wiener, N. (1923a). Differential space. J. Math. Phys. 2, 131-174. Wiener, N. (1923b). The homogeneous chaos. Amer. J. Math. 60, 879-036. Williams, D. (1991). Probability with Martingales. Cambridge: Cambridge University Press. Wong, C. S. (1977). Classroom Notes: A Note on the Central Limit Theorem. Amer. Moth. Monthly 84(6), 472. Woodroofe, M. (1975). Probability with Applications. New York: McGraw-Hill Book Co. Young, L. C. (1970). Some new stochastic integrals and Stioltjes integrals. I. Analogues of HardyLittlewood classes. In Advances in Probability and Related Topics, Vol. 2, pp. 161-240. New York: Dekker.
Zabell, S. L. (1995). Alan Turing and the central limit theorem. Amer. Math. Monthly 102(6), 483-494.
Index Symbols .8(O), Borel sigma-algebra on fl ........ 24 Bin(n,p), binomial distribution ..........8
independent identically distributed 68 mc(6d), monotone class generated by 0 30
C, the complex numbers ............... xv
IUIILP(µ) = (f IIIPdµ)'/P .............. 39 IIXIIP = (E[IXIP])'/P ................... 39 v K µ, absolute continuity ............. 47
Cb(X), bounded continuous functions from
0.(X ), 0.(.d), ... sigma-algebra generated by
CC(X), compactly supported continuous func-
A, V, the min and max operators .......xvi
C(X), continuous functions from X to R 49
X to R ............................94
X, of, etc ................. 24. 49, 120.
tions from X to R ................. 94 C°°(Rk), infinitely differentiable functions of compact support from Rk to R 111 Cov(X, Y), covariance between X and Y 62
A
EX, expectation of X .................. 39
Adapted process ..................126, 126,182
EIX; Al = E[X1AI = .fA X dP .......... 32 Geom(p), geometric distribution .........8 LP, random variables with p finite absolute
moments .......................... 39
ZP, completion of LP .................. 43 N(µ,0.2), normal distribution
.......... U
N, the natural numbers ................ xv 9(fl), the power set of 11 .............. 24 Poiss(. ), Poisson distribution ............9
Q, the rationals ........................ xv R, the real numbers .................... xv
Absolute continuity .. 11, see also Measure Adams, William J . Aldous, David J .
..................... 22
......................157
Almost everywhere convergence ........ 43 Almost sure
central limit theorem ................. 90
convergence ..........................43 Alon, Noga .............................89 Andr6, Desire .................... 119 180
Approximation to the identity ......... 205 Avogadro, Lorenzo Romano Amedeo Carlo 159
Azuma, Kazuoki ...................... 151
9
the unit sphere in R" .......... 102 SD(X), standard deviation of X ........67 Unif(a, b), uniform distribution .........11
Azuma-Hoeffding inequality ..155, see also Heoffding's inequality
VarX, variance of X ................... 62
B
X+, the positive elements of X ......... xv
Z, the integers ......................... xv
a.e., almost everywhere .................43
a.s., almost surely ...................... 43
Bachelier, Louis Jean Baptiste Alphonse 159,
117180 Backward (or reversed) martingale ..... 155 Banach, Stefan ........................152
dv/dµ, Radon-Nikodym derivative ..... 47
Bass, Richard Franklin ................ 201
J+ = max(f, ,0) ........................ 38
Baxter, Martin ........................15Z
f- = max(-f, 0) ...................... 38
Berk@s, Istvan ..........................90
217
Index
218
Bernoulli trials ..................... 8
Bernoulli, Jacob ............ 18, 22, 71
Bernstein
  polynomials ....................... 77
  proof of the Weierstrass theorem .. 77
Bernstein (Bernshtein), Sergei Natanovich 77, 117
Bessel's inequality ................ 208
Bessel, Friedrich Wilhelm .......... 208
Bingham, Nicholas H. ............... 157
Binomial distribution .... 8, see also Distribution
Birkhoff, George David .............. 90
Black, Fischer ................ 144, 145
Black-Scholes formula .............. 145
Blumenthal's zero-one law .......... 177
Blumenthal, Robert M. ......... 177, 180
Borel set and/or sigma-algebra ...... 24
Borel, Émile ... 52, 73, 86, 89, 109, 117, 137
Borel-Cantelli lemma .... 73, 109, see also Paley-Zygmund inequality
  Dubins-Freedman .................. 156
  Lévy's ........................... 136
Bounded convergence theorem ......... 45
  conditional ...................... 121
Bourke, Chris ....................... 90
Bovier, Anton ...................... 113
Bretagnolle, Jean .................. 158
Broadbent, S. R. .................... 83
Brown, Robert ...................... 159
Brownian bridge .................... 177
Brownian motion
  and the heat equation .. 117, 118, 196, 198
  as a Gaussian process ............ 153
  Einstein's predicates ............ 160
  exit distribution ... 196, 197, 200, see also Chung's formula
  filtration ....... 171, see also Filtration
  gambler's ruin formula ........... 200
  nowhere differentiability of ..... 168
  quadratic variation .............. 103
  Wiener's construction ........ 166-168
  with drift ....................... 201
Bru, Bernard ....................... 180
Buczolich, Zoltán ................... 90
Bunyakovsky, Viktor Yakovlevich ..... 40
Burkholder, Donald L. .......... 52, 156

C
Call options ....................... 144
Cantelli, Francesco Paolo ... 73, 80, 85, 89, 138
Cantor set .......................... 88
Cantor, Georg ....................... 34
Cantor-Lebesgue function ............ 88
Carathéodory Extension Theorem ...... 27
Carathéodory, Constantine ...... 27, 109
Cauchy
  sequence .......................... 46
  summability test ................. 183
Cauchy, Augustin Louis ... 40, 108, 113, 183
Cauchy-Bunyakovsky-Schwarz inequality 40
Central limit theorem ... 19, 22, 89, 100, 102
  Bovier-Picco ..................... 113
  de Moivre-Laplace ................. 19
  Liapounov's ...................... 114
  Lindeberg's ...................... 115
  projective ....................... 102
  via Liapounov's method ........... 115
  Ville's .......................... 116
  with error estimates ........ 105, 116
Champernowne, David Gawen ........... 90
Characteristic function ......... 96-117
  convergence theorem .......... 99, 102
  inversion theorem ................ 112
  uniqueness theorem ................ 99
Chatterji, Srishti Dhav ............ 152
Chebyshev's inequality .......... 18, 43
  conditional ...................... 152
  for sums .......................... 63
Chebyshev, Pafnutii Lvovich .. 18, 43, 63
Chernoff's inequality ............... 51
Chernoff, Herman ............ 51, 52, 87
Chung's formula ............... 197, 200
Chung, Kai Lai ............ xii, 89, 197
Ciesielski, Zbigniew ............... 201
Cifarelli, Donato Michele .......... 158
Coifman, Ronald R. .................. 52
Compact support (or compactly supported) function 94
Compact-support process ............ 182
Complete
  measure space ..................... 28
  topological space ................. 42
Completion .......................... 29
Conditional expectation ........ 120-125
  and prediction ................... 122
  classical ........................ 124
  properties ............. 120-121, 123
  towering property ................ 123
Conditional probability ......... 4, 125
Consistent measures ................. 59
Convergence
  almost everywhere ................. 43
  almost sure ....................... 43
  in L^p ............................ 43
  in measure ........................ 43
  in probability .................... 43
  weak .............................. 91
Convergence theorem .... 102, see also Characteristic function
Convex function ............ 40, 50, 156
Convolution ................ 15, 98, 113
Cootner, Paul H. ................... 159
Copeland, Arthur H. ................. 90
Correlation ......................... 67
Countable (sub-) additivity ......... 24
Courtault, Jean-Michel ............. 180
Covariance .......................... 67
  matrix ........................... 161
Cover, Thomas M. .................... 89
Cox, John C. ....................... 144
Cramér's theorem ................... 107
Cramér, Harald ........... 102, 107, 117
Cramér-Wold device ................. 102
Crépel, Pierre ..................... 180
Csörgő, Miklós ..................... 157
Cumulative distribution function ... 66, see also Distribution function
Cylinder set ........................ 58

D
Dacunha-Castelle, Didier ........... 158
Davis, Burgess J. ................... 52
de Acosta, Alejandro .......... 156, 158
de Finetti's theorem ............... 156
de Finetti, Bruno .............. 156-158
de Moivre, Abraham .......... 15, 19, 22
de Moivre-Laplace central limit theorem 19, see also Central limit theorem
de Moivre's formula ... 19, see also Stirling's formula
Degenerate normal ..... 11, see also Distribution
Dellacherie, Claude ................ 201
Density function ............ 10, 14, 48
Devaney, Robert L. .................. 90
Diaconis, Persi ................ 22, 117
Dimension ........................... 59
Dini, Ulisse ....................... 112
Dini-continuous process in L^2(P) .. 182
Distribution .................... 49, 50
  binomial ........................... 8
    characteristic function ......... 97
    connection to Poisson ....... 10, 19
    mean and variance ............... 12
  Cauchy ....................... 14, 113
    non-existence of the mean ....... 14
  discrete ........................... 7
  discrete uniform ................... 8
  exponential ....................... 13
    characteristic function ......... 97
    connection to uniform ........... 15
    mean and variance ............... 13
  function .................. 13, 33, 60
  gamma ............................. 13
    characteristic function ........ 112
    mean and variance ............... 13
  geometric .......................... 8
    mean and variance ............... 12
  hypergeometric .................... 13
    mean and variance ............... 13
  infinitely divisible ............. 113
  negative binomial ................. 13
    mean and variance ............... 13
  normal ........................ 11, 28
    characteristic function .... 97, 161
    degenerate ...................... 11
    mean and variance ............... 12
    multi-dimensional .......... 28, 161
    standard ........................ 11
  Poisson ........................ 9, 10
    characteristic function ......... 97
    connection to binomial ...... 10, 19
    mean and variance ............... 13
  uniform ....................... 11, 28
    characteristic function ......... 97
    connection to discrete uniform .. 15
    connection to exponential ....... 15
    mean and variance ............... 12
Dominated convergence theorem ....... 46
  conditional ...................... 121
Donoho, David L. ................... 115
Doob's
  decomposition ............... 128, 152
  martingale convergence theorem .. 134
  martingales ................. 127, 200
  maximal inequality ............... 134
    continuous-time ................ 189
  optional stopping theorem ........ 130
  strong (p, p)-inequality
    continuous-time ................ 189
  strong L^1-inequality ............ 156
  strong L^p-inequality ............ 153
    Pitman's improvement ........... 153
Doob, Joseph Leo ... xii, 127, 130, 134, 152, 153, 156, 157, 189, 200
Dubins, Lester E. ............. 154, 157
Dudley, Richard Mansfield ....... 22, 51
Durrett, Richard ............... 90, 131
Dyadic filtration ............. 142, 148
Dyadic interval ............... 142, 147
Dynkin, Eugene B. .................. 180

E
Einstein, Albert ................... 159
Elementary function ................. 37
Entropy ............................. 78
Erdős, Paul [Pál] ............ 81, 88-90
Etemadi, Nasrollah .................. 89
European options ................... 144
Event ........ 35, see also Measurable set
Exchangeable .................. 155, 156
Expectation ..................... 12, 32
F
Falconer, Kenneth K. ................ 34
Fatou's lemma ................... 45, 50
  conditional ...................... 121
Fatou, Pierre Joseph Louis ...... 45, 52
Fejér, Leopold ..................... 117
Feller, William ...... 16, 117, 157, 201
Fermi, Enrico ....................... 84
Filtration ......................... 126
  Brownian ......................... 171
  right-continuous ................. 171
Fischer, Ernst Sigismund ........... 163
FKG inequality ...................... 63
Fortuin, C. M. ...................... 64
Fourier series ................. 205-208
Fourier transform ..... 96, see also Characteristic function
Fourier, Jean Baptiste Joseph ....... 96
Fréchet, Maurice René ............... 52
Freedman, David A. ....... 117, 154, 158
Freiling, Chris ..................... 23
Fubini, Guido ....................... 55
Fubini-Tonelli theorem .............. 55
  inapplicability of ...... 56-58, 62, 63

G
Gambler's ruin formula ... 133, see also Random walk, see also Brownian motion
Garsia, Adriano M. .................. 90
Gaussian process ................... 163
Geometric distribution ... 8, see also Distribution
Georgii, Hans-Otto ................. 158
Gerber, Hans U. .................... 152
Ginibre, Jean ....................... 64
Glivenko, Valerii Ivanovich .... 80, 117
Glivenko-Cantelli theorem ........... 80
Gnedenko, Boris Vladimirovich .. 52, 117
Grimmett, Geoffrey R. ............... 83
Gundy, Richard F. ................... 52

H
Hadamard's inequality ............... 51
Hadamard, Jacques Salomon ........... 51
Hamedani, Gholamhossein Gharagoz ... 117
Hammersley, John M. ............. 83, 89
Hardy, Godfrey Harold ... 138, 140, 143, 187
Hardy-Littlewood maximal function ... 140, 143, 187
Harris, Theodore E. ................. 83
Harrison, J. Michael ............... 144
Hartman, Philip .................... 138
Hausdorff measure ................... 34
Hausdorff, Felix ............... 34, 138
Helms, Lester L. ................... 152
Hilbert space ...................... 203
Hilbert, David ...................... 52
Hitchcock, John M. .................. 90
Hoeffding's inequality ... 51, 87, see also Azuma-Hoeffding
Hoeffding, Wassily ...... 51, 52, 87, 89, 117
Hölder
  continuous function .......... 77, 199
  inequality ........................ 39
    conditional .................... 121
    generalized ..................... 51
Hölder, Otto Ludwig ......... 39, 51, 199
Houdré, Christian .................. 117
Hunt, Gilbert A. ......... 131, 180, 191

I
Independence ................ 14, 62, 68
Indicator function .................. 36
Infinitely divisible ............... 113
Information inequality .............. 87
Inner product ...................... 203
Integrable function ................. 39
Integral ............................ 39
Inverse image ...................... xvi
Inversion theorem ..... 112, see also Characteristic function
Ionescu Tulcea, Alexandra .......... 152
Ionescu Tulcea, Cassius ............ 152
Isaac, Richard ..................... 152
Itô
  formula ..................... 194, 195
  integral .................... 181-183
    indefinite ..................... 186
    under Dini-continuity .......... 184
  isometry ......................... 184
  lemma ........... see also Itô formula
Itô, Kiyosi ...... 181, 184, 185, 194, 195

J
Jensen's inequality ................. 40
  conditional ...................... 121
Jensen, Johann Ludwig Wilhelm Waldemar 40
Jones, Roger L. ..................... 52
Jushkevich [Yushkevich], Alexander A. 180

K
Kabanov, Yuri ...................... 180
Kac, Mark ........... 78, 115-117, 151
Kahane, Jean-Pierre ................ 157
Kasteleyn, Pieter Willem ............ 64
Keller, Joseph B. ................... 22
Kesten, Harry ....................... 83
Khintchine [Khinchin], Aleksandr Yakovlevich xii, 71, 72, 88, 89, 138, 153, 177
Khintchine's
  inequality ........................ 88
  weak law of large numbers ... 72, see also Law of large numbers
Kinney, John R. .................... 180
Knight, Frank B. .............. 118, 201
Knuth, Donald E. .................... 84
Kochen, Simon ....................... 89
Kolmogorov
  consistency theorem ... 60, see also Kolmogorov extension theorem
  extension theorem ................. 60
  maximal inequality ................ 74
  one-series theorem ................ 85
  strong law of large numbers ... 73, see also Law of large numbers
  zero-one law ................. 69, 136
Kolmogorov, Andrei Nikolaevich ... xii, 52, 60, 69, 73, 74, 85, 89, 103, 117, 138, 153
Kreps, David M. .................... 144
Krickeberg's decomposition ......... 128
Krickeberg, Klaus .................. 128
Kyburg, Henry E., Jr. .............. 158

L
Lacey, Michael T. ................... 90
Lamb, Charles W. ................... 152
Laplace, Pierre-Simon ... 16, 19, 21, 157, 206
Law of large numbers ............ 71-88
  and Monte Carlo simulation ........ 83
  and Shannon's theorem ............. 79
  Erdős-Rényi ....................... 88
  strong ........................ 73, 85
    and the Glivenko-Cantelli theorem 80
  weak .............................. 72
Law of rare events .................. 13
Law of the iterated logarithm ... 138, 154, 156
  for Brownian motion .............. 177
Law of total probability ............. 5
Lebesgue
  differentiation theorem ..... 140, 155
  measurable ...... 30, see also Measurable
  measure .......... 25, see also Measure
Lebesgue, Henri Léon ....... 52, 113, 140
Lebon, Isabelle .................... 180
Le Marchand, Arnaud ................ 180
Leonard, Bill ....................... 86
Levi's monotone convergence theorem . 46
Levi, Beppo ......................... 52
Lévy's
  Borel-Cantelli lemma ............. 136
  concentration inequality ......... 112
  equivalence theorem .............. 155
  forgery theorem .................. 178
Lévy, Paul ... 90, 92, 99, 112, 117, 136, 155, 160, 178, 180, 200
Li, Robert Shun-Yen ........... 149, 157
Liapounov [Lyapunov], Aleksandr Mikhailovich 114
Lindeberg, Jarl Waldemar ........... 115
Lindvall, Torgny .................... 89
Liouville, Joseph ........... 11, 16, 108
Lipschitz continuous .......... 147, 193
Lipschitz, Rudolf Otto Sigismund ... 152
Littlewood, John Edensor ... 138, 140, 143, 187
Loeb, Peter A. ..................... 152

M
Mahmoud, Hosam M. .................. 151
Markov
  inequality ........................ 43
  property ...... 104, see also Strong Markov
    of Brownian motion ........ 160, 163
Markov, Andrei Andreyevich ...... 43, 89
Martingale ......................... 126
  and likelihood ratios ............ 152
  continuous-time .................. 187
  convergence theorem ... 134, see also Doob's
  representations .................. 195
  reversed ......................... 155
  transforms ....................... 127
Mass function .................... 7, 14
Mattner, Lutz ....................... 64
Mauldin, R. Daniel .................. 90
Maximal inequality ...... 74-76, 87, 189
Maxwell, James Clerk ............... 117
Mazurkiewicz, Stefan ............... 157
McShane, Edward James .............. 199
Mean .......... 39, see also Expectation
Measurable
  function .......................... 35
  Lebesgue .......................... 30
  set ............................... 24
  space ....... 25, see also Measure space
Measure ............................. 24
  absolutely continuous ............. 47
  counting .......................... 33
  finite product .................... 54
  Lebesgue .......................... 25
    invariance properties ........... 33
  probability ....................... 25
  space ............................. 25
  support of ...... 33, see also Support
Merton, Robert C. .................. 144
Meyer, Paul-André .................. 201
Minkowski's inequality .............. 40
  conditional ...................... 121
Minkowski, Hermann .................. 40
Mixture ............................. 51
Modulus of continuity .......... 77, 182
Monotone class ...................... 30
  theorem ........................... 30
Monotone convergence theorem ... 46, see also Levi
  conditional ...................... 121
Monte Carlo
  integration ....................... 84
  simulation ... 83, 84, see also Law of large numbers
Mukherjea, Arunava .................. 69

N
Nash's Poincaré inequality ......... 116
Nash, John .................... 116, 117
Negative part of a function ......... 38
Nelson, Edward ..................... 180
Newton's method ..................... 87
Newton, Isaac ....................... 87
Nikodym, Otton Marcin ............... 47
Normal distribution ... 11, see also Distribution
Normal numbers .................. 86, 90
Norris, James R. .................... 90

O
Okamoto, Masashi .................... 87
Optional stopping
  continuous-time .................. 187
Optional stopping theorem .......... 130
Options ............................ 144
Orthogonal ......................... 203
Orthogonal decomposition theorem ... 203

P
Paley, Raymond Edward Alan Christopher 86, 157
Paley-Zygmund inequality ............ 86
Parseval des Chênes, Marc-Antoine .. 117
Parseval's identity ................ 117
Percolation ......................... 82
Pérez-Abreu, Victor ................ 117
Perrin, Jean Baptiste .............. 180
Philipp, Walter ..................... 90
Picco, Pierre ...................... 113
Piecewise continuous function ....... 11
Pitman's L^2 inequality ... 153, see also Doob's strong L^p-inequality
Pitman, James W. ................... 153
Plancherel theorem ............. 99, 115
Plancherel, Michel ......... 34, 97, 115
Pliska, Stanley R. ................. 144
Poincaré inequality ................ 116
Poincaré, Jules Henri ...... 16, 22, 159
Point-mass .......................... 25
Poisson distribution ... 9, see also Distribution
Poisson, Siméon Denis ............ 9, 19
Poissonization ...................... 10
Pólya's urns ....................... 152
Pólya, George ............. 34, 117, 152
Positive part of a function ......... 38
Positive-type function ............. 113
Power set ........................... 24
Previsible process ................. 127
Probability space ... 25, see also Measure space
Product
  measure ........... 54, see also Measure
  topology .......................... 58

Q
Quadratic variation ... 113, see also Brownian motion

R
Rademacher's theorem ............... 147
Rademacher, Hans ............... 89, 147
Radon, Johann ....................... 47
Radon-Nikodym theorem ............... 47
Raikov's ergodic theorem ........ 88, 90
Raikov, Dmitrii Abramovich ...... 88, 90
Ramsey number ....................... 81
Ramsey, Frank Plumpton .............. 81
Random
  permutation ... 3, 9, 22, 116, 155, see also Central limit theorem, Ville's
  set ............................... 64
  variable .......................... 35
    absolutely continuous ....... 10, 14
    discrete ..................... 7, 14
  vector ............................ 14
Random walk ........................ 131
  gambler's ruin formula ...... 133, 152
  nearest neighborhood ............. 132
  simple ........................... 132
Reflection principle .......... 175, 179
Regazzini, Eugenio ................. 158
Rennie, Andrew ..................... 157
Rényi, Alfréd ................... 88, 89
Reversed martingale ................ 155
Révész, Pál ........................ 152
Revuz, Daniel ...................... 201
Riemann, Georg Friedrich Bernhard .. 113
Riemann-Lebesgue lemma ............. 113
Riesz representation theorem ........ 49
Riesz, Frigyes ................. 49, 163
Rosenlicht, Maxwell ................. 16
Ross, Stephen A. ................... 144
Rubinstein, Mark ................... 144

S
Schnorr, Claus-Peter ................ 90
Scholes, Myron ................ 144, 145
Schrödinger, Erwin .................. 86
Schwarz's lemma .................... 108
Schwarz, Hermann Amandus ....... 40, 108
Semimartingale ..................... 126
Shannon's theorem ............... 78, 79
Shannon, Claude Elwood .......... 78, 79
Shultz, Harris S. ................... 86
Sierpiński, Wacław .............. 51, 64
Sigma-algebra ....................... 23
  Borel ............................. 24
  generated by a random variable 49, 121
Simple function ..................... 37
Simple walk ... 132, 164, see also Random walk
Simpson, Thomas ..................... 22
Simulation ... 83, see also Law of large numbers
Skolem, Thoralf Albert .............. 81
Skorohod [Skorokhod], Anatoli Vladimirovich 116, 178
Skorohod embedding ................. 118
Skorohod's theorem ................. 116
Slutsky, Evgeny ................. 50, 52
Slutsky's theorem ................... 50
Smokler, Howard E. ................. 155
Solovay, Robert M. .............. 23, 34
Spencer, Joel ....................... 89
Stable distribution ................ 178
Standard deviation .............. 12, 67
Standard normal ... 11, see also Distribution
Stark, Philip B. ................... 115
Steinhaus probability space ......... 44
  and weak convergence ............. 115
Steinhaus, Hugo .......... 44, 89, 138, 159
Stigler, Steven Mack ................ 89
Stimm, H. ........................... 90
Stirling's formula ... 21, 22, 82, 156, see also de Moivre's formula
Stirling, James ..................... 21
Stochastic integral ... 179, see also Itô integral
Stochastic process ................. 126
  continuous-time .................. 181
Stone, Charles J. ................... 89
Stopping time ................. 129, 187
  simple ........................... 130
Strassen, Volker ................... 152
Strong law of large numbers ... 73, see also Law of large numbers
  Cantelli's ........................ 85
  for dependent variables ........... 85
  for exchangeable variables ....... 155
Strong Markov property ............. 124
Stroock, Daniel W. ................. 117
Submartingale ...................... 126
  continuous-time .................. 200
Supermartingale .................... 126
  continuous-time .................. 200
Support ............................. 33
Surgailis, Donatas ................. 117
Szekeres, Gábor [György] ............ 81

T
Tandori, Károly .................... 117
Taylor, Samuel James ............... 201
Thomas, Joy A. ...................... 89
Tonelli, Leonida .................... 55
towering property of conditional expectations 123
Triangle inequality ............. 38, 41
Trigonometric polynomial ........... 205
Trotter, Hale Freeman .............. 117
Turing, Alan Mathison .............. 117
Turner, James A. ................... 152

U
U-statistics .................. 151, 155
Ulam, Stanisław ..................... 84
Uncertainty principle ...... 51, 52, 115
Uncorrelated random variables ....... 67
Uniform distribution ... 11, see also Distribution
  on S^{n-1} ....................... 103
Uniform integrability .......... 51, 154
Uniqueness theorem ... 99, see also Characteristic function

V
Varberg, Dale E. ................... 151
Variance ........................ 11, 62
  conditional ...................... 152
Veech, William A. .................. 117
Viète, François .................... 117
Ville, Jean ................... 117, 152
Vinodchandran, N. V. ................ 90
von Neumann, John ............... 48, 84
von Smoluchowski, Marian ........... 159

W
Wagon, Stanley ...................... 90
Wald's identity ............... 131, 153
Wald, Abraham ................. 131, 153
Wallis, John ....................... 114
Walter, Gilbert G. ................. 117
Weak convergence .................... 91
Weak law of large numbers ... 72, see also Law of large numbers
Weaver, Warren ...................... 78
Weierstrass approximation theorem ... 77, see also Bernstein
  Hoeffding's refinement ............ 89
  Kac's refinement .................. 78
Weierstrass, Karl Theodor Wilhelm ... 77
Weyl, Hermann ....................... 52
White noise ................... 178, 179
Wiener process ... 180, see also Brownian motion
Wiener, Norbert ... 157, 159, 166, 168, 179
Williams, David .................... 152
Wintner, Aurel ..................... 138
Wold, Herman O. A. ................. 102
Wong, Chi Song ........................22
Woodroofe, Michael ................. 154

Y
Yor, Marc .......................... 201
Young integral ................ 199, 200
Young's inequality .................. 51
Young, Laurence Chisholm ........... 199
Young, William Henry ................ 51

Z
Zabell, Sandy L. ................... 117
Zeitouni, Ofer ..................... 117
Zero-one law
  Blumenthal's ..................... 177
  Kolmogorov's ................. 69, 136
Zygmund, Antoni ........... 86, 157, 158
This is a textbook for a one-semester graduate course in measure-theoretic probability theory, but with ample material to cover an ordinary year-long course at a more leisurely pace. Khoshnevisan's approach is to develop the ideas that are absolutely central to modern probability theory, and to showcase them by presenting their various applications. As a result, a few of the familiar topics are replaced by interesting non-standard ones.
The topics range from undergraduate probability and classical limit theorems to Brownian motion and elements of stochastic calculus. Throughout, the reader will find many exciting applications of probability theory and probabilistic reasoning. There are numerous exercises, ranging from the routine to the very difficult. Each chapter concludes with historical notes.

ISBN 0-8218-4215-3

For additional information and updates on this book, visit
www.ams.org/bookpages/gsm-80

www.ams.org