This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
n, then what is the probability that A will maintain a lead throughout the process of sequentially counting all m + n votes cast? [Hint: P (out of m + n votes cast, A scores m votes, B scores n votes, and A maintains a lead throughout) = P(Tö n = m + n).] - 0. Since F is closed, the events {{B*} e F } decrease to {{B*} e F) as e - 0, and tightness follows from the continuity of the probability measure P and Alexandrov's Theorem T.8.1 (ii). E 0 where T is kT temperature and k is a universal constant called Boltzmann s constant, an external field parameter H, and an interaction parameter (coupling constant) J. The spin variables X", n = 0, + 1, ±2, +3, ... , are distributed according to a stochastic process on { —1,1 } indexed by Z with the Markov property and having stationary transition law given by exp{ßJai + ßHn} p(X" + ' =nI X" =Q)= 2cosh(ßH+ßJa) 0 for all x>0,g 00 =0. Then by Proposition 6.1, the process is conservative and, therefore, nonexplosive. Example 3. (Chemical Reaction Kinetics). The study of the time evolution of molecular concentrations that undergo chemical reactions is known as chemical reaction kinetics. The interest in reaction kinetics is not limited to chemistry. Ecologists and biologists are interested in the development of populations of various organic species that are involved in "reactions" through predation, births, deaths, immigration, emigration, etc. The simplest view of chemical reactions is as an evolution that takes place through a succession of stages called elementary reactions. For example, 2A + B -+ C is the notation used by chemists to denote an elementary reaction in which two molecules of A react with a molecule of B to produce a molecule of C. The notation A -* B + C represents the dissociation of a molecule of A into molecules of types B and C. The reaction A -s B represents a transformation , = e` and, for the general case (centered and scaled), t) --+0
7. Let {S„} be the simple random walk starting at 0 and let MN =max{S„:n=0, 1,2,...,N}, m N =min{S„:n=0, 1, 2,...,N}. (i) Calculate the distribution of MN . (ii) Calculate the distribution of m N . (iii) Calculate the joint distribution of MN and S N . [Hint: Let a > 0, b be integers, a -> b. Then N
P(Tn =n,SN ^b)
P(MN >,a,S N ^b)=P(Ta -
_ Z P(Ta =n,SN —S„,>b—a) N
_
P(Tn =n)P(SN _„>-b—a) n=1
8. What percentage of the particles at y at time N are there for the first time in a dilute system of many noninteracting (i.e., independent) particles each undergoing a simple random walk starting at the origin? *9. Suppose that the points of the state space S = 1 are painted blue with probability
EXERCISES
71
or green with probability 1 - p, 0 -< p < 1, independently of each other and of a simple random walk {Sn } starting at 0. Let B denote the random set of states (integer sites) colored blue and let N(p) denote the amount of time (occupation time) that the random walk spends in the set B prior to time n, i.e., p
1B(Sk)•
Nn(P) _ k=0
(i) Show that EN(p) = (n + 1)p. [Hint: EI B (Sk ) = E{E[I B (Sk ) I Sk ]}.] (ii) Verify that
lim Var{ Nn(p) l - cap P(I - P) l n ) 1p
-
for p = Z for p
z.
9^
[Hint: Use Exercise 13.15.] (iii) For p ^ Z, use (ii), to show Nn (p)/n -+ p in probability as n -+ Go. [Hint: Use Chebyshev's Inequality.] (iv) For p = 1, show NN (p)/n -^ p in probability. [Hint: Show Var(N n (p)/n) -* 0 as n—. co.]
10. Apply Stirling's Formula, (k! _ (27rk)'/ Z k k e -k (1 + 0(l)) as k -. 00), to show for the simple symmetric random walks starting at 0 that (i) P(T N) '
(2rz)1'^z N-3/2
as N -^ oo.
(ii) ETy = co.
Exercises for Section 1.5
I. (i) Complete the proof of P61ya'.s Theorem fork > 3. (See Exercise 5 below.) (ii) Give an alternative proof of transience for k > 3 by an application of the Borel-Cantelli Lemma Part 1 (Chapter 0, (6.1)). Why cannot Part 2 of the lemma be directly applied to prove recurrence for k = 1, 2? 2. Show that
kO \k/
Z
=() /
.
[Hint: Consider the number of ways in which n balls can be selected from a box of n black and n white balls.] 3. (i) Show that for the 2-dimensional simple symmetric random walk, the probability of a return to (0, 0) at time 2n is the same as that for two independent walkers, one along the horizontal and the other along the vertical, to be at (0, 0) at time 2n. Also verify this by a geometric argument based on two independent walkers with step size 1/ f and viewed along the axes rotated by 450
72
RANDOM WALK AND BROWNIAN MOTION
(ii) Show that relations (5.5) hold for a general random walk on the integer lattice in any dimension. Use these to compute, for the simple symmetric random walk in dimension two, the probabilities fj that the random walk returns to the origin at time j for the first time for j = 1, ... , 8. Similarly compute fj in dimension three for I < j < 4. 4. (i) Show that the method of Exercise 3(i) above does not hold in k = 3 dimensions. (ii) Show that the motion of three independent simple symmetric random walkers starting at (0, 0, 0) in 71 3 is transient. 5. Show that the trinomial coefficient
n (
j, k, n
—
j — k
n! j!k!(n—j—k)!
is largest for j, k, n j — k, closest to n/3. [Hint: Suppose a maximum is attained for j = J, k = K. Consider the inequalities of the form —
n (
^
n
j, k,n—j—k)_ < (J,K,n—J—K)
when j, k and/or n j — k differ from J, K, n — J — K, respectively, by ± 1. Use this to show In — J — 2KI < 1, in — K — 2JI < 1.] —
6. Give a probabilistic interpretation to the relation (see Eq. 5.9) ß = 1/(1 — y). [Hint: Argue that the number of returns to 0 is geometrically distributed with parameter y.] 7. Show that a multidimensional random walk is transient when the (one-step) mean displacement is nonzero. 8. Calculate the probability that the simple symmetric k-dimensional random walk will return i.o. to a previously occupied site. [Hint: The conditional probability, given S o , ... , S, that S„ + 1 ^ {S o , ... , S„} is at most (2k — 1)/2k. Check that P(S+1,...,S,+mE{So,...,S„}`5
2k-1 '" 2k
)
for each m >, 1.] 9. (i) Estimate (numerically) the expected number (ß — 1) of returns to the origin. [Hint: Estimate a bound for C in (5.15) and bound (5.16) with a Riemann integral.] (ii) Give a numerical estimate of the probability y that the simple random walk in k = 3 dimensions will return to the origin. [Hint: Use (i).] *10. Calculate the probability that a simple symmetric random walk ink = 3 dimensions will eventually hit a given line {(ra, rb, rc): r e Z}, where (a, b, c) & (0, 0, 0) is a lattice point. *11. (A Finite Switching Network) Let F = (x 1 , X21... , x k } be a Finite set of k sites that can be either "on" (1) or "off" (0). At each instant of time a site is randomly selected and switched from its current state e to 1 — e. Let S = {0,1 } F = {(e l , ... , e k ): e ; = 0 or 1}. Let X1 , X2 , ... be i.i.d. S-valued random variables with
EXERCISES
73
P(X" = e ; ) = p ; , i = i,... , k, where e ; e S is defined by e.(j) = b ; , j , and p, _> 0, I p i = 1. Define a random walk on S, regarded as a group under coordinatewise addition mod 2, by
S0=(0,0,...,0).
Show that the configuration in which all switches are off is recurrent in the cases k = 1, 2. The general case will follow from the methods and theory of Chapter II when k < oo. The problem when k = cc has an interesting history: see F. Spitzer (1976), Principles of Random Walk, Springer Verlag, New York, and references
therein. *12. Use Exercise 11 above and the examples of random walks on Z to arrive at a general formulation of the notion of a random walk on a group. Describe a random walk on the unit circle in the complex plane as an illustration of your ideas. 13. Let {X"} denote a recurrent random walk on the 1-dimensional integer lattice. Show that E,(number of visits to x before hitting 0) = + cc,
x ^ 0.
[Hint: Translate the problem by x and consider that starting from 0, the number of visits to 0 before hitting —x is bounded below by the number of visits to 0 before leaving the (open) interval centered at 0 of length lxi. Use monotonicity to pass to the limit.]
Exercises for Section I.6
1. Let X be an (n + 1)-dimensional Gaussian random vector and A an arbitrary linear transformation from 11" + ' to I. Show that AX is a Gaussian random vector in 11'. 2. (i) Determine the consistent specification of finite-dimensional distributions for the canonical construction of the simple random walk. (ii) Verify Kolmogorov consistency for Example 3. *3. (A Kolmogorov Extension Theorem: Special Case) This exercise shows the role of topology in proving what otherwise seems a (purely) measure-theoretical assertion
in the simplest case of Kolmogorov's theorem. Let fl = {0, 1 } be the product space consisting of sequences of the form w = (cw„ w 2 ,...) with w i e {0, 1}. Give S2 the product topology for the discrete topology on {0, 1}. By Tychonoff's Theorem from topology, this makes S compact. Let X" be the nth coordinate projection mapping on S2. Let .y be the Borel sigmafield for ). (i) Show that F coincides with the sigmafield for S2 generated by events of the form F(E 1 ,.. ,E")={we
c2:w 1 =E i for i= 1,2,.. ,n},
for an arbitrarily prescribed sequence E i ..... E" of l's and 0's. [Hint: p(w, n) = J^ ^ Iw" — 7"I/ 2 " metrizes the product topology on S2. Consider the open balls of radii of the form r = 2 -^' centered at sequences which are 0 from some n onward and use separability.]
RANDOM WALK AND BROWNIAN MOTION
74
(ii) Let {P"} be a consistent family of probability measures, with P" defined on (Ii",."), and such that P. is concentrated on Q, = {0,1 }". Define a set function for events of the form F = F(E 1 , ... , e") in (i), by P(F)=P"({w ; =c,i=1,2,.. ,n}).
Show that there is a unique probability measure P on the sigmafield .^" of cylinder sets of the form C={a eft(w,..• .w n )EB},
where B c {0, 1 }", which agrees with this formula for Fe .f", n >_ 1. (iii) Show that F ° := Un 1 .f" is a field of subsets of Q but not a sigmafield. (iv) Show that P is a countably additive measure on F°. [Hint: f) is compact and the cylinder sets are both open and closed for the product topology on D.] (v) Show that P has a unique extension to a probability measure on F. [Hint: Invoke the Caratheodory Extension Theorem (Chapter 0, Section 1) under (iii), (iv)•]
(vi) Show that the above arguments also apply to any finite-state discrete-parameter stochastic process. 4. Let (S2, F, P) and (S, 2) represent measurable spaces. A function X defined on S2 and taking values in S is called measurable if X - `(B) E F for all Be 2', where X -' (B) = {X E B} _ {w E D: X(o) e B}. This is the meaning of an S-valued random variable. The distribution of X is the induced probability measure Q on ,P defined by Q(B)=P(X - '(B))=P(XEB),
BEY.
Let (S2, .F, P) be the canonical model for nonterminating repeated tosses of a coin and X(a) = w", WE n. Show that {X"„ X.1, . ..}, m an arbitrary positive integer, is a measurable function on (S2, F, P) taking values in (S2, F) with the distribution P; i.e., {X,", Xm+ ,, ... , } is a noncanonical model for an infinite sequence of coin tossings. 5. Suppose that Di" and D'2 are covariance matrices. (i) Verify that aD" + ßD 21 , a, ß _> 0, is a covariance matrix. (ii) Let {D ( " ) = ((a;;'))} be a sequence of covariance matrices (k x k) such that lim a v;j ) = a i; exists. Show that D = ((a u )) is a covariance matrix. *6. Let µ(t) = f ° e"'p(dx) be the Fourier transform of a positive finite measure p. (Chapter 0, (8.46)). (i) Show that (µ(t, — t.)) is a nonnegative definite matrix for any t, < • • • < t k . (ii) Show that µ(t) = e - I` 1 if p is the Cauchy distribution. *7. (Pölya Criterion for Characteristic Functions) Suppose that 4 is a real-valued nonnegative function on (— cc, oo) with 4(—t) = 4(t) and 0(0) = 1. Show that if 0 is continuous and convex on [0, cc), then 0 is the Fourier transform (characteristic function) of a probability distribution (in particular, for any t, < t 2 < • • • < t k , k >_ 1, ((4(t ; — ti ))) is nonnegative definite by Exercise 6), via the following steps. (i) Check that
1 — Its, Y(t) = 0,
Its _< 1, t E 68', Itl > 1
75
EXERCISES
(ii)
(iii)
(iv)
(v)
is the characteristic function of the (probability) measure with p.d.f. y(x) = (1 - cos x)/itx 2 , - oo <x < oc. Use the Fourier inversion formula [(8.43), Chapter 0]. Check that the characteristic function of a convex combination (mixture) of probability distributions is the corresponding convex combination of characteristic functions. Let 0 < a, < • • • - 0, Y-"•_, p ; = 1. Show that D(t) = p, y(t/a 1 ) + • • • + p, (t/a^) is a characteristic function. Draw a graph of 0(t) and check that the slope of the segment between a k and ak+I is -[(p k /a k ) + • • • + (p„/a„)], k = 1, 2, ... , n - 1. Interpret the numbers Pi , p, + p 2 , ... , p, + • • • + p„ = I along the vertical axis with reference to the polygonal graph of ß(t). Show that a function 0(t) satisfying Pölya's criterion can be approximated by a function of the form (iii) to arbitrary accuracy. [Hint: Approximate 4(t) by a polygonal path consisting of n segments of decreasing slopes.] Show that the (pointwise) limit of characteristic functions is a characteristic function.
*8. Show that the following define covariance functions for a Gaussian process. (i)
Q.j
=e
i,J = 0, 1, 2, .. .
(ii) Q = min(i, j) = (iii + I il - Ii -JI)/2• ;;
[Hint: Use Exercises 6 and 7.]
Exercises for Section I.7 1. Verify that (B,,, ... , B,,,,) has an m-dimensional Gaussian distribution and calculate the mean and the variance-covariance matrix using the fact that the Brownian motion has independent Gaussian increments. 2. Let {X,} be a process with stationary and independent increments starting at 0 with EXs < cc for s> 0. Assume EX„ EX, are continuous functions of t. (i) Show that EX, = mt for some constant m. (ii) Show that Var X, = at for some constant a 2 > 0. (iii) Calculate the limiting distribution of Y( . )
(X, - mnt) -
t
asn-> cc, for t>0,fixed. 3. (i) (Diffusion Limit Scalings) Let X, be a random variable with mean x + t f (p - q)0 and variance t f A 2 pq. Give a direct calculation of p and q = I - p in terms off and A using only the requirements that the mean and variance of X, - x should stabilize to some limiting values proportional to t as f -. oc and A -+0. (ii) Verify convergence to the Gaussian distribution for the distribution of (Q//)S1 „, j as n -* co, where {S„} is the simple random walk with p„ = (p/2 f) + Z, by application of Liapunov's CLT (Chapter 0, Corollary 7.3).
76
RANDOM WALK AND BROWNIAN MOTION
4. Let {X} be a Brownian motion starting at 0 with diffusion coefficient a 2 > 0 and zero drift. (ii) Show that the process has the following scaling property. For each A > 0 the process {Y} defined by Y, = A - ' t Z Xx, is distributed exactly as the process Al. (ii) How does (i) extend to k-dimensional Brownian motion? 5. Let {X} be a stochastic process which has stationary and independent increments. (i) Show that the distribution of the increments must be infinitely divisible; i.e., for each integer n, the distribution of X, - X, (s < t) can be expressed as an n-fold convolution of a probability measure p". (ii) Suppose that the increment 24 - X s has the Cauchy distribution with p.d.f. (t - s)/n[(t - s) 2 + x 2 ] for s < t, x e I8'. Show that the Cauchy process so described is invariant under the rescaling { Y} where Y = A -' Xi, for A > 0; i.e., {Y,} has the same distribution as {X,}. (This process can be constructed by methods of theoretical complements 1, 2 to Section IV.1.) 6. Let {X} be a Brownian motion starting at 0 with zero drift and diffusion coefficient a 2 > 0. Define Y,= JXJ,t ?0. (i) Calculate EY, Var Y,. (ii) Is { Y} a process with independent increments? 7. Let R, = X, where {Xj is a Brownian motion starting at 0 with zero drift and diffusion coefficient a 2 > 0. Calculate the distribution of R. 8. Let {B,} be a standard Brownian motion starting at 0. Define - Bo- nie "I
V" _
(i) Verify that EV" = 2"' 2 EIB 1 I. (ii) Show that Var V. = VarIB 1 j. t 1. (iii) Show that with probability one, {B,} is not of bounded variation on 0 V", n 1, and, using Chebyshev's Inequality, [Hint: Show that I' + l', P(V">M)->lasn- ooforanyM>0.]
>,
>-
-< -<
9. The quadratic variation over [0, t) of a function f on [0, oo) is defined by V(f)= lim. .v"(t,f),where 2^
v"(t,f)
[f(kt/ 2") - f((k - 1 )t/ 2 ")] z
_ k=1
provided the limit exists. (i) Show if f is continuous and of bounded variation then V (f) = 0. (ii) Show that Ev"(t, {XX - a 2 t as n -• cc for Brownian motion {X} with diffusion coefficient a 2 and drift p. (iii) Verify that v"(t, {X,}) -• v et in probability as n -• co. (*iv) Show that the limit in (iii) holds almost surely. [Hint: Use Borel-Cantelli Lemma and Chebyshev Inequality.] })
10. Let {X} be the k-dimensional Brownian motion with drift lt and diffusion coefficient matrix D. Calculate the mean of X, and the variance-covariance matrix of X.
EXERCISES
77
11. Let {X} be any mean zero Gaussian process. Let t, < t 2 < • • • < t". (i) Show that the characteristic function of (X r ,... .. X,^) is of the form e -44 « ) for some quadratic form Q(E,) =
E{X,,X, 2 ...X,} =
if n is odd
* E{X,X, }. • E{X,X, k }
if n is even,
where Y* denotes the sum taken over all possible decompositions into all possible disjoint pairs {t ; , t 3 }, ... , {t,,,, t,} obtained from {t,.. ... t"}. [Hint: Use induction on derivatives of the (multivariate) characteristic function at (0, 0, ... , 0) by first observing that c?e -1 Q 14) /c7 i = —a,e t'2 and c?x ; /a5 t = a ;j , where x i = Y J a ;j ^ t and A = ((a i; )).] Exercises for Section I.8
1. Construct a matrix representation of the linear transformation on U8'` of the increments (x,,, x, 2 — x .
,x
X, k ,) to (x,,, x, z , . -
, x tk ).
*2. (i) Show that the functions f: CEO, oo) —• E defined by f(w) = max w(t),
g(co) = min w(t)
a
are continuous for the topology of uniform convergence on bounded intervals. (ii) Show that the set { f({X;"°}) _< x} is a Borel subset of CEO, oo) if f is continuous on C[0, oo).
(iii) Let f: C[0, oo) --* R'` be continuous. Explain how it follows from the definition of weak convergence on C[0, oo) given after the statement of the FCLT (pages 23-24) that the random vectors f({X;"°}) must converge in distribution to f({X }) too.
(iv) Show that convergence in distribution on CEO, cc) implies convergence of the finite-dimensional distributions. Exercise 3 below shows that the converse is not true in general. *3. Suppose that for each n = 1, 2, ... , {x"(t), 0 _< t _< 1}, is the deterministic process whose sample path is the continuous function whose graph is given by Figure Ex.I.8. (i) Show that the finite-dimensional distributions converges to those of the a.s. identically zero process {z(t)}, i.e., z(t) - 0, 0 S t _< 1. (ii) Check that max,,,,, x(t) does not converge to max o ,,,, z(t) in distribution.
2n
Figure Ex.I.8
n
78
RANDOM WALK AND BROWNIAN MOTION
*4• Give an example to demonstrate that it is not the case that the FCLT gives convergence of probabilities of all infinite-dimensional events in C[0, x). [Hint: The polygonal process has finite total variation over 0 <_ t z 1 with probability 1. Compare with Exercise 7.8.] 5. Verify that the probability density function p(t; x, y) of the position at time t of the Brownian motion starting at x with drift p and diffusion coefficient a 2 solves the so-called Fokker-Planck equation (for fixed x) given by
ap _ 1
at -
Zo
ap
2 02 p
OY Z -
p
aY
(i) Check that for fixed y, p also satisfies the adjoint equation
ap =, 22 a l p a 2 + p c7 —
at
ax
"^ ax
(ii) Show that p is a symmetric function of x and y if and only if p = 0. (iii) Let c(t, y) = $ c o (x)p(t, x, y) dx, where c o is a positive bounded initial concentration smoothly distributed over a finite interval. Verify that c(t, y) solves the Fokker-Planck equation with initial condition c o , i.e.,
ac 8t
=
za g
a2C 2
Y
ac
- p—,
Y
c(O , Y) = co(y).
6. (Collective Risk in Actuary Science) Suppose that an insurance company has an initial reserve (total assets) of X0 > 0 units. Policy holders are charged a (gross) risk premium rate a per unit time and claims are made at an average rate A. The average claim amount is it with variance a 2 . Discuss modeling the risk reserve process {X,} as a Brownian motion starting at x with drift coefficient of the form a - p l and diffusion coefficient 2a 2 , on some scale. 7. (Law of Proportionate Effect) A material (e.g., pavement) is subject to a succession of random impacts or loads in the form of positive random variables L,, L 2 , .. (e.g., traffic). It is assumed that the (measure of) material strength T k after the kth impact is proportional to the strength Tk _, at the preceding stage through the applied load L k , k = 1, 2, ... , i.e., Tk = L,,T,k _,. Assume an initial strength To - 1 as normalization, and that E(log L 1 ) 2 < co. Describe conditions under which it is appropriate to consider the geometric Brownian motion defined by {exp(pt + a 2 B,)}, where {Bj is standard Brownian motion, as a model for the strength process. 8. Let X 1 , X2 ,,.. be i.i.d. random variables with EX„ = 0, Var X. = a 2 > 0. Let S. = X, + • . • + X,,, n >, 1, So = 0. Express the limiting distribution of each of the random variables defined below in terms of the distribution of the appropriate random variable associated with Brownian motion having drift 0 and diffusion coefficient a 2 > 0. (i) Fix 0>0, Y. = n -012 max{ISj ° : 1 _< k < n}. (ii) Yn = n - 'I'S.. (iii) Y. = n 312 >I Sk . [Hint: Consider the integral of t -- S(n , l , 0 5 t - 1.] 9. (i) Write R n (x) = 1(1 + xfn)" - esj. Show that
EXERCISES
79
R(x)
1 — 1 —
sup R(x) —*0
n
1 _ r = 1_ ^x r n Jj r! as n —' oo
+ I x en+
t e^xl (n + 1)!
(for every c > 0).
JxJ Sc
(ii) Use (i) to prove (8.6). [Hint: Use Taylor's theorem for the inequality, and Lebesgue's Dominated Convergence Theorem (Chapter 0, Section 0.3).]
Exercises for Section I.9
1. (i) Use the SLLN to show that the Brownian motion with nonzero drift is transient. (ii) Extend (i) to the k-dimensional Brownian motion with drift. 2. Let X, = X 0 + vt, t >, 0, where v is a nonrandom constant-rate parameter and X 0 is a random variable. (i) Calculate the conditional distribution of X,, given XS = x, for s < t. (ii) Show that all states are transient if v 0. (iii) Calculate the distribution of X, if the initial state is normally distributed with mean µ and variance a 2 . 3. Let {X,} be a Brownian motion starting at 0 with diffusion coefficient a 2 > 0 and zero drift. (i) Define { }} by Y, = tX,,, for t > 0 and Y0 = 0. Show that { Y} is distributed as Brownian motion starting at 0. [Hint: Use the law of large numbers to prove sample path continuity at t = 0.] (ii) Show that {X,} has infinitely many zeros in every neighborhood of t = 0 with probability 1. (iii) Show that the probability that t - X, has a right-hand derivative at t = 0 is zero. (iv) Use (iii) to provide another example to Exercise 8.4. 4. Show that the distribution of min,, 0 X° is exponential if {X,° } is Brownian motion starting at 0 with drift p > 0. Likewise, calculate the distribution of max,,, X,° when p<0. *5. Let {Sn } denote the simple symmetric random walk starting at 0, and let m= min S k , OSk
M= max Sk ,
n= 1, 2, ... .
05k-'n
Let {B,} denote a standard Brownian motion and let m = min,,,,, B,, M = max, < ,,, B,. Then, by the FCLT, n" t j 2 (m n , Mn , Sn ) converges in distribution to (m, M, B 1 ); for rigorous justification use theoretical complements 1.8, 1.9 noting that the functional w —+ (min,,,,, co,, max,,,,, w,, w,) is a continuous map of the metric space C[0, 1] into R 3 . For notational convenience, let Pn(J)=P(Sn=1),
Pn (u,v,y)=P(u
for integers u, v, y such that u _< 0 _< v, u < v and u _< y _< v. Also let 1(a, b) = P(a
80
RANDOM WALK AND BROWNIAN MOTION
Convergence of Probability Measures, Wiley, New York, p. 86. These results for Brownian motion are also obtained by other methods in Chapter V. (i) P(u, v, Y) = P,(Y) — n(v, Y) — it(u, Y) + n(v, u, y) + n(u, v, Y) — t(v, u, v, Y) — n(u, v, u, y) + • • • , where for any fixed sequence of nonnegative integers Y1, Y2, • • • , Yk, y, k, n, zc(y 1 , Y2, ... , Yk, y) denotes the probability that an n-step random walk meets y, (at least once), then meets —Y 2 , then meets y 3 , ... , then meets (-1) k- 'y k , and ends at y. ( 11 ) R(Y 1, Y2, ... , Yk, Y) = pn( 2 Y 1 + 2Y 2 + ... + 2y k - 1 + (-1)k + 1 y) if (-1)k + 'Y > Yk, n(Y1, Y2, ... , Yk, Y) = Pn( 2 Y1 + 2Y 2 + ... + 2y k _ (_ 1)k+ l y) if (1)'y -< Yk •
[Hint: Use Exercise 4.4(ii), the reflection principle, and induction on k. Reflect through (-1)k y k _, the part of the path to the right of the first passage through that point following successive passages through y1, — Y2, ... , ( -1 ) k-1 Yk-2.] (iii) p„(u, v, y) _
p„(y + 2k(v — u)) —
p„(2v — y + 2k(v — u)).
(iv) For integers, u -< 0 -< v, u <- y 1 < Y 2 P(u <m„
= j P(y, + 2k(v — u) < S. < Y 2 + 2k(v — u))) — Y P(2v—y 2 +2k(v—u)<S„<2v—y,+2k(v—u)). k=—w
[Hint: Sum over y in (iii).] (v) For real numbers u < 0 -< v, u < y 1 < Y 2 P(u<m-<M
_ k=—m
—
(D(2v — Y 2 + 2k(v — u), 2v — y, + 2k(v — u)). k=—ac
[Hint: Respectively substitute the integers [u / ], —[—u/], [y,^], — [ --- Y2\/] into (iv) ([ ] denoting the greatest integer function). Use Scheffe's Theorem (Chapter 0) to justify the interchange of limit with summation over k.]
(vi)
P(M
< v, y 1 < B, < ky2) = 1 (YI,Y2) — '( 2 v — y 2 , 2v — YI)•
[Hint: Take u = —n — I in (iv) and then pass to the limit.] (vii) P(u <m -< M < v) _
(-1)kD(u + 2k(v — u), v + 2k(v — u)). k=—m
[Hint: Take y, = U, Y 2 = v in (v).]
EXERCISES
81
(viii) P(sup^B,j < u) _
(— l ) k (D((2k — 1)v, (2k + 1)v). k=—a
[Hint: Take u = —v in (vii).] Exercises for Section I.10
* 1. (i) Show that (10.2) holds at each point z _< 0 (> 0) of continuity of the distribution , X. (max o <,,, X,). [Hint: These latter functionals are function for min o continuous.] (ii) Use (i) and (10.9) to assert (10.2) for all ^. 2. Calculate the probability that a Brownian motion with drift it and diffusion coefficient a 2 > 0 starting at x will reach y : x in time t or less. 3. Suppose that solute particles are undergoing Brownian motion in the horizontal direction in a semi-infinite tube whose left end acts as an absorbing boundary in the sense that when a particle reaches the left end it is taken out of the flow. Assume that initially a proportion i(x) dx of the particles are present in the element of volume between x and x + dx from the left end, so that f ö b(x) dx = 1. For a given drift it away from the left end and diffusion coefficient a 2 > 0, calculate the fraction of particles eventually absorbed. What if p = 0? 4. Two independent Brownian motions with drift p ; and diffusion coefficient a?, i = 1, 2, are found at time t = 0 at positions x i , i = 1, 2, with x, <x 2 . (i) Calculate the probability that the two particles will never meet. (ii) Calculate the probability that the particles will meet before time s > 0. 5. (i) Calculate the distribution of the maximum value of the Brownian motion starting at 0 with drift p and diffusion coefficient a 2 over the time period [0, t]. *(ii) For the case p = 0 give a geometric "reflection" argument that P(max o , s <, Xs >_ y) = 2P(X, _> y). Use (i) to verify this. 6. Calculate the distribution of the minimum value of a Brownian motion starting at 0 with drift I and diffusion coefficient a 2 over the time period [0, t]. 7. Let {B 1 } be standard Brownian motion starting at 0 and let a, b > 0. (i) Calculate the probability that —at < B, < bt for all sufficiently large t. (ii) Calculate the probability that {B,} last touches the line v = —at instead of y = bt. [Hint: Consider the process {Z,} defined by Zo = 0, Z, = tB, I , for t > 0. and Exercise 9.3(i).] 8. Let {(B,(' ) , B 2 )} be a two-dimensional standard Brownian motion starting at (0, 0) (see Section 7). Let r y = inf{t _> 0: B; 2 = y}, y > 0. Calculate the distribution of Bty ) . [Hint: {B} and {B 2 } are independent one-dimensional Brownian motions. Condition on r. Evaluate the integral by substituting u = (x 2 + y 2 )/t.] 9. Let {Br } be a standard Brownian motion starting at 0. Describe the geometric structure of sample paths for each of the following stochastic processes and calculate
EY,. (i) (Absorption)
fY = B 1 if max o<s ,, Bs < a 1 Y = a
where a > 0 is a constant.
if max o , s <, B s >_ a,
82
RANDOM WALK AND BROWNIAN MOTION
J
(*ii) (Reflection)
Y=B,
ifB,
Y,=2a—B,
if B,>a,
where a > 0 is a constant. (*iii) (Periodic) Y, = B,
—
[B, ],
where [x] denotes the greatest integer less than or equal to x. 10. Let, for a > 0, a e .f8(t) = (2 )"2 t 3/2 ir
22
'
(t > 0).
(i) Verify the convolution property fa * ff (t) = L +ß (t)
for any a, ß > 0.
(ii) Verify that the distribution of; is a stable law with exponent 0 = 2 (index z) in the sense that if Ti , T2 , ... , T. are i.i.d. and distributed as T. then n -8 (T1 + • • • + T.) is distributed as T. (see Eq. 10.2). (iii) (Scaling property) ; is distributed as z 2 z1. 11. Let T. be the first passage time to z for a standard Brownian motion starting at 0 with zero drift. (i) Verify that Ez= is not finite. A > 0. [Hint: Tedious integration will = e i ✓ 2 ' ^"', (ii) Show that Ee work.] (iii) Use Laplace transforms to check that (1/n)z (,J J converges in distribution to t z as n —> oo.
12. Let {B,} be standard Brownian motion starting at 0. Let s < t. Show that the probability that {B,} has at least one zero in (s, t) is given by (2/it) cos - '(s/t)`/ 2 . [Hint: Let p(x) = P({B,} has at least one zero in (s, t) ^ B, = x).
Then for x > 0, p(x)=P(min B,-
s,
=P(max BxIB s =0)=P(t x -
Likewise for x < 0, p(x) = P(r _ x -< t — s). So the desired probability can be obtained by calculating EP(IB,I) =
f
o,
2 P(x)(-
ns
e zs ^ 2 dx.
83
EXERCISES
Exercises for Section 1.11
Throughout this set of exercises {S"} denotes the simple symmetric random walk starting at 0. 1. Show the following for r 0. 2n
(i) P(S, ^0,S 2 ^0,...,S 2n -
(ii) P(r' 2 n ) = 2k, S Zn
t
Irl
-2n # 0 ,S2 = 2r) = 2
fl + rJ n
= 2r) _ (2k)( 2n — 2k)
Iri
2
-zn
k/J n—k+r /Jn—k
(*iii) Calculate the joint distribution of (y, B 1 ). 2. Let U. = #{k < n: Sk ( or Sk > 0} denote the amount of time spent (by the polygonal path) on the positive side of the state space during time 0 to n. (i) Show that P(UZn = 2k) = P(r,( zn ) = 2k) rz(k(n — k))'
Check k = 0, k = n first. Use mathematical induction on n and consider the conditional distribution of U 2n given the time of the first return to 0. Derive the last equality of (11.3) for U 2 using the induction hypothesis and (11.3).] (ii) Prove Corollary 11.4 by calculating the distribution of the proportion of time spent above 0 in the limit as n -• oo for the random walk. (iii) How does one reconcile the facts that U2 "/2n++Z as n --* co and P(S" > 0) -+ as n -+ ^o? [Hint: Consider the average length of time between returns to zero.] *3. Let r (" 1 = min{k -< n: Sk = max o , ; , n S; } denote the location of the first absolute maximum in time 0 to n. Then {T (n) =k} _ {Sk> 0, Sk> S(,..., S,> Sk_1,Sk>-Sk+1,•. •,Sk>- S n }
fork>- t,{T " =0}={Sk <-0,1 -
(
(i) Show that p(r(zn^ = 0) =
P( z (z" >> = 0) _ ( 2 n
_2n
\n J2
[Hint: Use (10.1) and induction.] (ii) Show that P(T(z") = 2n) = P( ,r (2n+1) = 2n) _ 1(2n\2 2- n
[Hint: Consider the dual paths to {S k : k = 0, 1, ... , n} obtained by reversing the order of the displacements, S, = S" — S"_,, Sz = (S n — Sn _ 1 ) + (S"_, — Sn_2),
84
RANDOM WALK AND BROWNIAN MOTION
... , S;, = S". This transformation corresponds to a rotation through 180 degrees. Use (11.2).]
(iii) Show that P(i a") = 2k) = P(t (2 "' = 2k + 1) = 1 (2k)2-2k(2(n - k))2-2i"-k)
2\k/
n-k
for k = 1, ... , n in the first case and k = 0, ... , n - 1 in the second. [Hint: A path of length 2n with a maximum at 2k can be considered in two sections. Apply (i) and (ii) to each section.]
C
")
1
2
(iv) lim P— t /I1 -sin - ^, n it
0
*4. Let F, UU be as defined in Exercise 2. Define V. = #{k < V" ) : Sk _ , >, 0, Sk _> 0} = UT "). Show that P(VZ " = 2r ( S 2 . = 0) = 1/(n + 1), r = 0, 1, ... , n. [Hint: Use induction and Exercise 3(i) to show that P(V2 . = 2r, S 2 " = 0) does not depend on
r,0_
1. Show that the finite-dimensional distributions of the Brownian bridge are Gaussian. 2. Suppose that F is an arbitrary distribution function (not necessarily continuous). Define an inverse to F as F - '(y) = inf{x: F(x) > y}. Show that if Y is uniform on [0, 1] then X = F '(Y) has distribution function F. -
3. Let {B r } be standard Brownian motion starting at 0 and let B* = B, - tB,, 0 < t <_ 1. (i) Show that {B*} is independent of B,. (ii) (The Inverse Simulation) Give a construction of standard Brownian motion from the Brownian bridge. [Hint: Use (i).] *4. Let {B,} be a standard Brownian motion starting at 0 and let {B*} be the Brownian bridge. (i) Show that for time points 0 < t,
Likewise, for conditioning on B, e DE = [0, e) or DE = [-s, 0) the limit is unchanged; for the existence of {B*} as the limit distribution (tightness) as e -* 0, see theoretical complement 3. (ii) Show that for m* = info ,,,, B*, M* = sup 0 ,,, B*, u <0< v, exp{-2k2(v - u) 2 }
P(u < m* _< M* _< v) _ k=-m
-
exp{-2[v+k(v-u)]2}.
[Hint: Express as a limit of the ratio of probabilities as in (i) and use Exercise 9.5(v). Also, 4(x, x + e) = e/(2n) " 2 exp( - x 2 /2) + o(1) as e -+ 0.]
85
EXERCISES
(iii) Prove sup IB*l
P^ OSt6l
/
y>0.
k=1
[Hint: Take u = —v in (ii).] 22 , v > 0. [Hint: Use Exercise 9.5(vi) for the ratio of probabilities described in (i).]
(iv) P(M* < v) = 1 — e
5. (Random Walk Bridge) Let {S.} denote the simple symmetric random walk starting at 0.
(i) Calculate P(S,„ = y I S 2 „ = 0), 0 -< m < 2n. (*ii) Let U 2 . = # {k 2n: Sk _, 0, Sk > 0}. Calculate P(U2 , = r S Z „ = 0). [Hint: See Exercise 11.4*.]
*6. (Brownian Meander) The Brownian meander {B+ } is defined as the limiting distribution of the standard Brownian motion {B,} starting at 0, conditional on {m = min,,, I B, > —e} as E —> 0 (see theoretical complement 4 for existence). Let ^ B,+ , M + = max o ,,, , B+ . Prove the following: m + = min a
(i)
P(M
-< x, B -< y) = ^ [e-l2kx)2/2 -
L'-(2kz+y)2'2]
0 < y
x.
k=-m
[Hint: Express as a limit of ratios of probabilities and use Exercise 9.5(v). Also P(m > —e)=(2/tt) 2 +o(1);see Exercise 10.5(ii)notingmin(A)= —max(—A) and symmetry. Justify interchange of limits with the Dominated Convergence Theorem (Chapter 0).] (ii) P(M + < x) = I + 2 Yk I (— 1) k exp{ —(kx) 2 /2}. [Hint: Consider (i) with y = x.]
(iii) EM + = (2tt)'t 2 log 2 = 1.7374.... [Hint: Compute f ö P(M + > x) dx from (ii).]
(iv) (Rayleigh Distribution) P(B,
< x) = I
—e
22
, x > 0. [Hint: Consider (i) in
the limit as x —• oo.]
*7. (Brownian Excursion) The Brownian excursion {B* + } is defined by the limiting distribution of {B} conditioned on {m* > —s} as d0 (see theoretical complement 4 for existence). Let M* = max 0 , 1 B* + . Prove the following: M
(i) P(M* + -< x) = 1 + 2 1 [l — (2kx) 2 ] exp{—(2kx) 2 /2},
x > 0.
k-1
[Hint: Write P(M* + < x) as the limit of ratios as 0. Multiply numerator and denominator by s -Z , apply Exercise 4, and use ]'Hospital's rule twice to evaluate the limit term by term. Check that the Dominated Convergence Theorem (Chapter 0) can be used to justify limit interchange.] (ii) EM* + = (n/2)" 22 . [Hint: Note that interchange of (limits) integral with sum in (i) to compute EM* + = 10 P(M* + > x) dx leads to an absurdity, since the values of the termwise integrals are zero. Express EM* + as Q, q)
2
lim
e
-
0
Y [(2kx) 2 — l]exp{ —' (2kx) 2 } dx k=1
86
_
RANDOM WALK AND BROWNIAN MOTION
and note that for k >- A the integrand is nonnegative on [A, oc ). So Lebesgue's monotone convergence can be applied to interchange integral with sum over k > 1/(20) to get zero for this. Thus, EM* + is the limit as 0 - 0 of a finite sum over k
(iii)
(2(2)li2)-.4( n )lt2 r
Ir
Ii
r
if r = 1
1 (r),
if r = 2, 3 , ... ,
\ 2 /
where C(r) _ ^k , k ' is the Riemann Zeta function (r >, 2). [Hint: The case r = 1 is given in (ii) above.] For the case r >- 2, we have p*+(r)=r^xr-1(1
-
Fö(x))dx=2r
^^^t' '{(2t) 2 1}e ^ z n 22 dt, k=l k 0
0
where the interchange of limits is justified for r >- 2 since 1
i -I tr k=1 k f,^
-
'I(2t) 2 _ lle - an = rz dt < oo,
r = 2, 3, ... .
'
In particular, letting Z denote a standard normal random variable, u*+(r) = 2 - 'r^(r)(2)''
2 (n)l/ 2 [EIZI .
+I — EIZI'-1]
Consider the two cases whether r is odd or even. 8. Prove the Glivenko-Cantelli Lemma by justifying the following steps. (i) For each t, the event {F (t) -• F(t)} has probability 1. (ii) For each t, the event {F(t) -• F(t )} has probability 1. (iii) Let r(y) = inf{t: F(t) -> y}, 0
= max {IF„(z(k/m)) - F(r(k/m))I, IF„(t(k/m) - ) - F(i(k/m) - )I}. I km
Then, by considering the cases TI k m 1 )^i^T^m^,
t<1(m)andt>t(1),
check that sup IF(t) - F(t)I < D, + 1 m m
m
(v) C= U U m=1k=1
{{F„(t(k/m))+-*F(i(k/m))}
u {F„(z(k/m) - )+-* F(z(k/m) - )}
87
EXERCISES
has probability zero, and for w e C and each m D;n , n (w)->0
(vi) sup IF (t, w) - F(t)J -> 0
as n -+ co
F
asn- co. for we C.
I
9. (The Gnedenko-Koroljuk Formula) Let (X 1 ..... X) and (Y1 .....) be two independent i.i.d. random samples with continuous distribution functions F and G, respectively. To test the null hypothesis that both samples are from the same population (i.e., F = G), let Fn and G. be the respective empirical distribution functions and consider the statistic Dn , n = sup x jFn (x) - G(x)I. Under the null hypothesis, X 1 ..... X, Y,..... Y„ are 2n i.i.d. random variables with the common distribution F. Verify that under the null hypothesis:
(i) the distribution of Dn. „ does not depend on F and can be explicitly calculated according to the formula P Dn.n < r = P max (S kc2nl* I n O
Sk2nl* — Sk2n`* = + 1
if X(k) E {X,, ... , Xn }
Skan)* - Skzn),* = - 1
if X(k) E { Y1 ,
—
... ,
Yn }. ]
(ii) Find the analytic expression for the probability in (i). [Hint: Consider the event that the simple random walk with absorbing boundaries at ±r returns to 0 at time 2n. First condition on the initial displacement.] (iii) Calculate the large-sample-theory (i.e., asymptotic as n - x) limit distribution of fn D., n . See Exercise 4(iii). (iv) Show C2nl Pl
sup
n
r'\
(F.(x)-
Gn(x))<=
n-r/f
1 ---- () 2n
r= 1,...,n.
(11J
[Hint: Only one absorbing barrier occurs in the random walk approach.]
Exercises for Section 1.13 1. Prove that
t
defined by (13.5) (r = 1, 2, ...) are stopping times.
2. If r is a stopping time and m > 0, prove that r A m := min{r, m} is a stopping time. 3. Prove E(S l,, > .,) -*0 as m -* oo under assumptions (2) and (3) of Theorem 13 1. i
88
RANDOM WALK AND BROWNIAN MOTION
4. For the simple symmetric random walk, starting at x, show that E{S
yP(r
y
= r) for
r _< m.
, ,1(, ,)} _
5. Prove that EZn is independent of n (i.e., constant) for a martingale {Zn ). Show also that E(Zn I {Z o , ... , Zk )) = Zk for any n> k. 6. Write out a proof of Theorem 13.3 along the lines of that of Theorem 13.1. 7. Let {Sn } be a simple symmetric random walk with p a (2, 1). (i) Prove that {(q/p)s ': n = 0, 1, 2, ...} is a martingale. (ii) Let c <x
for n>k, where A k :={Z 0 .i}.] (iii) Extend the result of Exercise 9 to nonnegative submartingales. 11. Let {Zn } be a martingale. If EIZ,,I < oo then prove that IZ,,I° is a submartingale, p >_ 1. [Hint: Use Jensen's or Hölder's Inequality, Chapter 0, (2.7), (2.12).] 12. (An Exponential Martingale) Let {X,: j _> 0} be a sequence of independent random variables having finite moment-generating functions 4,(^):= E exp{^XX } for some 96 0. Define Sn := X, + • • • + XX , Zn = exp{^Sn }/fl7 = , q(). (i) Prove that {Zn } is a martingale. (ii) Write M„ = max{S...... Sn }. If > 0, prove that n
11 O ; (Z) P(MM - A) -< exp{ — ZA}
(Ä >0).
(iii) Write mit = min{S l , ... , Sn }. If < 0, prove
P(mn -<
—A) -< exp{ZA} 11 4;(Z)
(A > 0).
i =1
13. Let {Xn : n >_ 1) be i.i.d. Gaussian with mean zero and variance a 2 > 0. Let Sn = X, + • • + Xn , MM = max{S 1 , ... , Sn }. Prove the following for A > 0. (i) P(MM _> 2) < exp{ —2 2 /(2a 2 n)). [Hint: Use Exercise 12(ii) and an appropriate choice of .]
(ii) P(max {ISST: 1 < j < n} >_ Aaln) _< 2 exp{ —A 2 /2}. 14. Let r ' r 2 be stopping times. Show the following assertions (i) —(v) hold. (i) z l v r 2 '= max(r l , tr z ) is a stopping time.
89
EXERCISES (ii)
il
A r Z := min(r l , 2 2 ) is a stopping time.
(iii) t l + 1 2 is a stopping time. (iv) at,, where a is a positive integer, is a stopping time. (v) If i l < t Z a.s. then it need not be the case that 1 2 — r, is a stopping time. (vi) If r is an even integer-valued stopping time, must zr be a stopping time? 15. (A Doob-Meyer Decomposition)
(i) Let { } } be an arbitrary submartingale (see Exercise 10) with respect to
sigmafields , o c ,yl c ,F2 c • • • . Show that there is a unique sequence { V„} such that: (a) 0= VQ
_< V <_ _< . . I
VZ
a.s.
(b) V. is .f„ _ I -measurable. (c) {Z„} := { }„ — V„} is a martingale with respect to .. [Hint: Define V„=V„_
1
+E{Y„IF
1
}—Y,n> 1.]
(ii) Calculate the {V„}, {Z„} decomposition for Y = S„, where {S„} is the simple symmetric random walk starting at 0. [Note: A sequence { V„} satisfying (b) is called a predictable sequence with respect to {.y}.] 16. Let {S„} be the simple random walk starting at 0. Let {G„} be a predictable sequence of nonnegative random variables with respect to .f„ = a{X l , ... , X„} = a So , ... , S„}, where X. = SS — S,_ 1 (n = 1, 2, ...); i.e., each G. is 9„_ 1 -measurable. Assume each G. to have finite first moment. Such a sequence {G„} will be called a strategy. Define ,
Wn =Wo+
Y Gk(Sk
—
Sk-I),
nil,
k=1
where Wo is an integrable nonnegative random variable independent of {S„} (representing initial capital). Show that regardless of the strategy {G„} we have the following. (i) If p = Z then { W„} is a martingale. (ii) If p > Z then { W„} is a submartingale. (iii) If p < Z then {W} is a supermartingale (i.e., EIW„I < cc, E(Wn+1 I f) W„, n=1,2,...).
(iv) Calculate EW„, n > 1, in the case of the so-called double-or-nothing strategy defined by G„ = 2 S ^ - '1 (S=i _ 1j , n >_ 1.
17. Let {S„} be the simple symmetric random walk starting at 0. Let r = inf{n ? 0: S„=2—n}. (i) Calculate Er from the distribution of t. (ii) Use the martingale stopping theorem to calculate Et. (*iii) How does this generalize to the cases r = inf{n ? 0: S„ = b — n}, where b is a positive integer? [Hint: Check that n + S. is even for n = 0, 1, 2, ....] 18. (i) Show that if X is a random variable such that g(z) = Ee^ x is finite in a neighborhood of z = 0, then EX” < oo for all k = 1, 2..... (ii) For a Brownian motion {X,} with drift p and diffusion coefficient a 2 , prove that exp{AX, — Atp — A 2 a 2 t/2} (t _> 0) is a martingale. 19. Consider an arbitrary Brownian motion with drift µ and diffusion coefficient a 2 > 0. (i) Let m(x) = ET", where Ts is the time to reach the boundary {c, d} starting at x e [c, d]. Show that m(x) solves the boundary-value problem
90
RANDOM WALK AND BROWNIAN MOTION d e m
dm
zag d2 +µ dz —1,
(ii) Let r(x) = Px (r d problem
Z
<
;)
m(c)=m(d)=0.
for x e [c, d]. Verify that r(x) solves the boundary value
x
2
z
dx
+µ
dr = 0, dx
r(c) = 0, r(d) = 1.
Exercises for Section I.14 1. In the case ß
= Z, consider the process {Z,N (s)} defined by ZN Y (n) ,v S„ —n
1 \NJ —f
n=1 2 ...‚N
i D,
and linearly interpolate between n/N and (n + 1)/N. (i) Show that {Z N } converges in distribution to BS + 2cs 2 (1 — s 2 ), where {B'} = {B s — sB 1 : 0 < s < 1} is the Brownian Bridge. (ii) Show that R N /(J/ DN ) converges in distribution to maxo, S,1 B; — min o , s , , B'. (iii) Show that the asymptotic distribution in (ii) is nondegenerate.
THEORETICAL COMPLEMENTS Theoretical Complements to Section I.1 1. Events that can be specified by the values of X, X,, 1 , ... for each value of n (i.e., events that depend only on the long-run values of the sequences) are called tail events.
Theorem T.1.1. (Kolmogorov Zero-One Law). A tail event for a sequence of 0 independent random variables has probability either zero or one. Proof. To see this one uses the general measure-theoretic fact that the probability of any event A belonging to the sigmafield F = 6 {X,, X2, ... , Xn , ...} generated by X 1 , X2 ,... , X, ...) can be approximated by events A 1 ..... A n , .. belonging to the field of events ,`fo = Un 1 ar {X i ..... X„} in the sense that A. e a{X„ ... , X„} for each n and P(AAA) -+ 0 as n -+ oo, where A denotes the symmetric difference AAA„ = (A n A n v (A` n A„). Applying this approximation to a tail event A, one obtains that since Ac u{X„ +1, X„ +2, ...} for each n, A is independent of each event A. Thus, 0 = lim... P(AAA„) = 2P(A)P(AC) = 2P(A)(1 — P(A)). The only solutions to the equation x(1 — x) = 0 are 0 and 1. n (
.
2. Let S. = X l + • • • + X, n 1. Events that depend on the tail of the sums are trivial (i.e., have probability I or 0) whenever the summands X 1 , X 2 , ... are i.i.d. This is a
THEORETICAL COMPLEMENTS
91
consequence of the following more general zero—one law for events that symmetrically depend on the terms X 1 , X 2 , .. . of an i.i.d. sequence of random variables (or vectors). Let ' denote the sigmafield of subsets of I8x' _ {(x,, x 2 , ...): x ; E W} generated by events depending on finitely many coordinates. Theorem T.1.2. (Hewitt—Savage Zero—One Law). Let X,, X 2 ,. . . bean i.i.d. sequence of random variables. If an event A = {(X,, X 2 , ...) e B}, where Be , is invariant under finite permutations (Xi ,, X.....) of terms of the sequence (X,, X 2 , ...), that is, A = {(X, X......) e B} for any finite permutation (i,, i 2 , ...) of (1, 2, ...), then 0. ❑ P(A) = I or As noted above, the symmetric dependence with respect to {X n } applies, for example, to tail events for the sums {S n }. Proof. To prove the Hewitt—Savage 0-1 law, proceed as in the Kolmogorov 0-1 law by selecting finite-dimensional approximants to A of the form A n = {(X,, ... , Xn ) e B}, B. e 4n, such that P(AAA,) - 0 as n —* oo. For each fixed n, let (i,, i Z , ...) be the B, permutation (2n, 2n — 1, ... , 1, 2n + I, ...) and define A. = {(X; , ... , X; ,) e B}. Then A„ and A n are independent with P(A, n A n ) = P(A,)P(A,) = (P(A n )) 2 - (P(A)) z as n —^ co. On the other hand, P(AAÄ,) = P(ALA) —* 0, so that P(A,OÄ n ) — 0 and, in particular, therefore P(A n r Ä n ) —* P(A) as n —4 co. Thus x = P(A) satisfies x = x 2 . n
Theoretical Complements to Section 1.3 1. Theorem T.3.1. If {Xn } is an i.i.d. sequence of integer-valued random variables for a general random walk S. = X, + • + X,,, n >_ 1, So = 0, on Z with 1 - EX, = 0, then {Sn } is recurrent. ❑ Proof. To prove this, first observe that P(S, = 0 i.o.) is 1 or 0 by the Hewitt—Savage zero—one law (theoretical complement 1.2). If X° , P(S n = 0) < oc, then P(S, = 0 i.o.) = 0 by the Borel—Cantelli Lemma. If Y_„ , P(S, = 0) is divergent (i.e., the expected number of visits to 0 is infinite), then we can show that P(Sn = 0 i.o.) = 1 as follows. Using independence and the property that the shifted sequence Xk , Xk ,,, ... has the same distribution as X,, X 2 , ... , one has 1 >, P(S, = 0 finitely often) >
P(S, = 0, S,„ 0, m> n)
=Z P(Sn =0)P(Sm —S, 540,m>n) = Y_ P(S„ = 0)P(Sm, 0,m i 1).
Thus, if Z„ P(S, = 0) diverges, then P(S,„ : 0, m >_ 1) = 0 or equivalently P(S m = 0 for some m >_ 1) = 1. This may now be extended by induction to get that at least r visits to 0 is certain for each r = 1, 2, .... One may also use the strong Markov property of Chapter II, Section 4, with the time of the rth visit to 0 as the stopping time. From here the proof rests on showing E n P(S, = 0) is divergent when p = 0. Consider the generating/unction of the sequence P(S, = 0), n = 0, 1, 2, ... , namely,
92
RANDOM WALK AND BROWNIAN MOTION
g(x):=
P(S" = 0)x",
Ix) < 1.
"=o
The problem is to investigate the divergence of g(x) as x -+ 1 . Note that P(S" = 0) is the 0th-term Fourier coefficient of the characteristic function (Fourier series) -
Ee tts" _
P(S" = k) e ttk
Thus,
1
1
"
P(S " = 0) = — (p"(t) dt Ee"s^ dt = — _„ 2n _"
lac
where cp(t) = Ee. It follows that for lxl < 1,
N dt
1 9(x) = 2n
" 1 - xtp(t)
Thus, recurrence or transience depends on the divergence or convergence, respectively, of this integral. Now, with p = 0, we have q(t) = 1 - o(Iti) as t -• 0. Thus, for any e > 0 there is a 6 > 0 such that I1 - cp 1 (t)I ltl. I(P2(t)I lti, for Itl _< S, where q(t) = q 1 (t) + i , 2 (t) has real and imaginary parts q, qi 2 . Now, for 0 < x < 1, noting that g(x) is real valued,
f
f
f ( l
1 - x(p (t) dt a
dt = " Re 1 J n -- x0(t) - .L \1 - xtP(t)/ J -" I 1 n dt
=
J-
1 -x(p 1 (t) a (1
-
x(V1(t))2 + x2(t)
> a
1-x 2(1
2
=—tan 3e
- x) 2 + 3x 2 r 2 t 2
- 1^
ES
1
n
/J-^— 1 - x 3E
dt v
x^P(t)Iz j
f
b -6 (1
1 - xq,(t)
dt
I1 - x^V(t)I Z
1 -x
- x + xslti) 2 + x2e2t2
dt
dt> a 1-x dt 3[(1 -x) 2 + e 2 t 2 ]
asx-' 1.
Since e is arbitrary, this completes the argument.
n
The above argument is a special case of the so-called Chung-Fuchs recurrence criterion developed by K. L. Chung and W. H. J. Fuchs (1951), "On the Distribution of Values of Sums of Random Variables," Mem. Amer. Math. Soc., No. 6.
Theoretical Complements to Section I.6 1. The Kolmogorov Extension Theorem holds for more general spaces S than the case S = R' presented. In general it requires that S be homeomorphic to a complete and separable metric space. So, for example, it applies whenever S is R k or a rectangle,
THEORETICAL COMPLEMENTS
93
or when S is a finite or countable set. Assuming some background in analysis, a simple proof of Kolmogorov's theorem (due to Edward Nelson (1959), "Regular Probability Measures on Function Space," Annals of Math., 69, pp. 630-643) in the case of compact S can be made as follows. Define a linear functional Ion the subspace of C(S') consisting of continuous functions on S' that depend on finitely many coordinates, by
l(f)' L
f(x1 .,...,
—
x !k)pI1....f Pik(dxi^ . . .dx^k)s
'I "ki
where f((xl)IE/) = J (xi i , . . . , xik).
By consistency, l is a well-defined linear functional on a subspace of C(S') that, by the Stone—Weierstrass Theorem of functional analysis, is dense in C(S'); note that S' is compact for the product topology by Tychonoff's Theorem from topology (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 174, 166). In particular, I has a natural extension to C(S') that is linear and continuous. Now apply the Riesz—Representation Theorem to get a (probability) measure p on (S', .y ) such that for any Je C(S'), l(f) = f s t f dp (see Royden, loc. cit., p. 310). To make the proof in the noncompact but separable and complete case, one can use a fundamental (homeomorphic) embedding of S into IJ (see P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York, p. 219); this is a special case of Urysohn's Theorem in topology (see H. L. Royden, loc. cit., p. 149). Then, by making a two-point compactification of R', S can be further embedded in the Hilbert cube [0,1], where the measure p can be obtained as above. Consistency allows one to restrict p back to S. 0 Theoretical Complements to Section I.7 1. (Brownian Motion and the Inadequacy of Kolmogorov's Extension Theorem) Let S = C[0,1] denote the space of continuous real-valued functions on [0, 1] equipped with the uniform metric p(x, y) = max Jx(t) — y(t)I,
x, ye C[0, 1].
osr<_i
The Borel sigmafield .4 of C[0, 1] for the metric p is the smallest sigmafield of subsets of C[0, 1] that contains all finite-dimensional events of the form {xeC{0,1]:a.<x(t )_
;
k
1, 0
<... < t, I, a 1 ... .a k' b.,.. ,b k C_R"
The problem of constructing standard Brownian motion (starting at 0) on the time interval 0 _< t _< 1 corresponds to constructing a process {X,: 0 <_ t _< 1} having Gaussian finite-dimensional distributions with mean zero and variance—covariance matrix y ;; = min(t i , t.) for the time points 0 _< t o < . • • < t k _< 1, and having a.s. continuous sample paths. One can easily construct a process on the product space
94
RANDOM WALK AND BROWNIAN MOTION
S2 = t O . 11 (or R 10, ' ) ) equipped with the sigmafeld F generated by finite-dimensional events of the form {x E 118 10 •' ) : a, < x(t 1 ) _< b i , i = 1, ... , k} having the prescribed finite-dimensional distributions; in particular, consistency in the Kolmogorov extension theorem can be checked by noting that for any time points t 1 < t 2 < • . • < t k , the matrix y ;j = min(t,, t j ), 1 i, j _< k, is symmetric and nonnegative definite, since for any real numbers x l , ... , x k , t o = 0, min(i,j)
/
^yijXiXj=>Xi Xj ^ (Lr i,j
i,j
r=1
—
tr—t)
k
= ttr—tr-1) xi r
\2 J1 iO.
i=r
However, the problem with this construction of a probability space (i2, F, P) for {X,} is that events in F can only depend on specifications of values at countably many time points. Thus, the subset C[0, 1] of S2 is not measurable; i.e., C[O, 1] 0 F. This dilemma is resolved in the theoretical complement to Section I.13 by showing that there is a modification of the process {X} that yields a process {B,} with sample paths in C[O, 1] and having the same finite-dimensional distributions as {X}; i.e., {B,} is the desired Brownian motion process. The basic idea for this modification is to show that almost all paths q —+ Xq , q e D, where D is a countable dense set of time points, are uniformly continuous. With this, one can then define {B,} by the continuous extension of these paths given by
B, _
XQ ift=qeD
lim Xq if t D. q1t
It is then a simple matter to check that the finite-dimensional distributions of {Xr } carry over to {B,} under this extension. The distribution of {B r } defines the Wiener measure on C[O, 1]. So, from the point of view of Kolmogorov's (canonical) construction, the main problem remains that of showing a.s. uniform continuity on a countable dense set. A solution is given in theoretical complement 13.1. 2. In general, if {X,: t e i} is a stochastic process with an uncountable index set, for example I = [0, 1], then the measurability of events that depend on sample paths at uncountably many points t e I can be at issue. This issue goes beyond the finite-dimensional distributions, and is typically solved by exploiting properties of the sample paths of the model. For example, if I is discrete, say I = {0, 1, 2, ...}, then the event {sup , X. > x} is the countable union {X„ > x} of measurable sets, which is, therefore, measurable. But, on the other hand, if I is uncountable, say 1 = [0, 1], then {sup,., X, > x} is an uncountable union of measurable sets. This of course does not make {sup,,, X,> x} measurable. If, for example, it is known that {XX } has continuous sample paths, however, then, letting T denote the rationals in [0, 1], we have {sup,,, X, > x} = {sup, ET X, > x} = U tET {X> x}. Thus {sup,,, X, > x} is seen to be measurable under these circumstances. While it is not always possible to construct a stochastic process {X,: t _> 0} having continuous sample paths for a given consistent specification of finite-dimensional distributions, it is often possible to construct a model (S2, F, P), {Xr : t 0} with the following property, called separability. There is a countable dense subset T of I = [0, oo] and a set D e F with P(D) = 0 such that for each we S1 - D, and each t e i there is a sequence t 1 , t 2 ,... in T (which may depend on w) such that t„ -> t and X(w) -+ X(w) (P. Billingsley (1986),
THEORETICAL COMPLEMENTS
95
Probability and Measure, 2nd ed., Wiley, New York, P. 558). In theory, this is enough sample path regularity to make manageable most such measurability issues connected with processes at uncountably many time points. In practice though, one seeks to explicitly construct models with sufficient sample path regularity that such considerations are often avoidable. The latter is the approach of this text.
Theoretical Complements to Section I.8 1. Let {X;" }, n = 1, 2, ... and {X,} be stochastic processes whose sample paths belong to a metric space S with metric p; for example, S = C[O, 1] with p(aw, ri) = sup o , t ,, 1co, — n,j, w, ry E S. Let .4 denote the Borel sigmafield for S. The distributions P of {X,} and P. of {X," }, respectively, are probability measures on (S, 9). Assume {X;" } and {X} are defined on a probability space (S2, .F , Q). Then, )
)
)
(i)
P(B) = Q({w E ): {X,(uo)} E B})
(ii)
P„
^n) (m)} E B}),
= Q({ (U aft {
(T.8.1)
n >- 1.
Convergence in distribution of {X} to {X} has been defined in the text to mean that the sequence of real-valued random variables Y:= f({X;" }) converges in distribution to Y:= f({X})for each continuous (for the metric p) function f: S - W. However, an equivalent condition is that for each bounded and continuous real-valued function f: S one has, )
lim Ef({X!" }) = Ef({X,}).
(T.8.2)
)
To see the equivalence, first observe that for any continuous f: S —• R', the functions cos(rf) and sin(rf) are, for each r E R', continuous and bounded functions on S. Therefore, assuming that condition (T.8.2) gives the convergence of the characteristic functions of the Y. to that of Y for each continuous f on S. In particular, the Y" must converge in distribution to Y. To go the other way, suppose that f: S —• ff8' is continuous and bounded. Assume without loss of generality that 0
, N
P
N
<
k) (
f< - /J
J
,
N
N k (k-1
kl
—P l----< <— /J (T.8.3) fdP^ s N N k=1 N
Equivalently, by rearranging terms, N k Ej P(f> k N ) — -S jsfdP_<- ky' P^f>k N l ^.
(T.8.4)
Likewise, these inequalities hold with P. in place of P. Therefore, by (T.8.4) applied
to P, x
_
1
fdP^N P^f> k N l l S
k=t
96
RANDOM WALK AND BROWNIAN MOTION
1N kj=, liminf P„ f> k N- I N
-
t
liminf f dP + N , „
(T.8.5)
s
by (T.8.4) applied to P, and the fact that lim Prob(} „ > x) = Prob(Y > x) for all points x of continuity of the d.f. of Y implies liminf Prob( Y„ > y) >- Prob(Y > y) for ally. Letting N -' gives
J f dP -< liminf J fdP..
S
n
s
The same argument applied to —f gives that
J f dP >- limsup f dP„.
s
„
is
Thus, in general, limsup „
J
f dP„ < I f dP -< liminf I f dP„ s
i s
',
(T.8.6)
s
which implies limsup J f dP„ = liminf J f dP„ = J f dP. s
„
s
(T.8.7)
s
This is the desired condition (T.8.2).
n
With the above equivalence in mind we make the following general definition. Definition. A sequence {P„} of probability measures on (S, .4) converges weakly (or in distribution) to a probability measure P on (S, 9) provided that lim„ f s f dP„ = f s f dP
for all bounded and continuous functions f: S -+ U8'. Weak convergence is sometimes denoted by P„ P as n -+ oo. Other equivalent notions of convergence in distribution are as follows. The proofs are along the lines of the arguments above. (P. Billingsley (1968), Convergence of Probability Measures, Theorem 9.1, pp. 11-14.) Theorem T.8.1. (Alexandrov). P, = P as n -. oo if and only if (i) lim„.^ P„(A) = P(A) for all A e -4 such that P(ÖA) = 0, where öA denotes the boundary of A for the metric p. (ii) limsup, P„(F) P(F) for all closed sets F c S. (iii) liminf, P,(G) P(G) for all open sets G c S. ❑ 2. The convergence of a sequence of probability measures is frequently established by an application of the following theorem due to Prohorov.
THEORETICAL COMPLEMENTS
97
Theorem T.8.2. (Prohorov). Let {PP } be a sequence of probability measures on the metric space S with Bore! sigmafield . If for each r > 0 there is a compact set K E c S such that P(K) > 1 — a
for all n = 1, 2, ... ,
(T.8.8)
then {P.} has a subsequence weakly convergent to a probability measure Q on (S, °t7). Moreover, if S is complete and separable then the condition (T.8.8) is also necessary. ❑ The condition (T.8.8) is referred to as tightness of the sequence of probability measures {PP }. A proof of sufficiency of the tightness condition in the special case S = 11' is given in Chapter 0, Theorem 5.2. For the general result, consult Billingsley, loc. cit., pp. 37-40. A version of (T.8.8) for processes with continuous paths is computed below in Theorem T.8.4. 3. In the case of probability measures {P P } on S = C[0, 1], if the finite-dimensional distributions of P. converge to those of P and if the sequence {P n } is tight, then it will follow from Prohorov's theorem that {P P } converges weakly to P. To check tightness it is useful to have the following characterization of (relatively) compact subsets of C[0, 1] from real analysis (A. N. Kolmogorov and S. V. Fomin (1975), Introductory Real Analysis, Dover, New York, p. 102). Theorem T.8.3. (Arzela-Ascoli). A subset A of functions in C[0, 1] has compact closure if and only if
(i) sup Iwol < c, ..A
(ii) lim sup v w (S) = 0, 5-•o w.A
where v(ö) is the oscillation in we C[0,1 ] defined by v w (S) = sup
<5Iws — co . D
The condition (ii) refers to the equicontinuity of the functions in A in the sense that given any r > 0 there is a common S > 0 such that for all functions w e A we have Iw, — cw,I < e if It — sI < b. Conditions (i) and (ii) together imply that A is uniformly bounded in the sense that there is a number B for which IIwII := sup ow,l _< B
for all w e A.
ose
This is because for N sufficiently large we have sup WEA v w (1/N) < I and, therefore, for each 0
wi,1N — wr-,u Nl _< sup Iwol + N sup v^,(— = B.
Iw1(< Iw01 + i
=1
weA
w.A
\N
4. Combining the Prohorov theorem (T.8.2) with the Arzela-Ascoli theorem (T.8.3) gives the following criterion for tightness of probability measures { P„} on S = C[0, 1]. Theorem T.8.4. Let {P„} be a sequence of probability measures on C[0, 1]. Then {P„} is tight if and only if the following two conditions hold.
98
RANDOM WALK AND BROWNIAN MOTION
(i) For each ry > 0 there is a number B such that P"({weC[0,1]:1W o l> B})
n = 1,2,....
(ii) For each e> 0, ry > 0, there is a 0 <(5 < 1 such that
P"({weC[0,1]:v",(b)>_a})_
n>_1.
❑
Proof. If {P"} is tight, then given ry > 0 there is a compact K such that P(K) > 1 — ry for all n. By the Arzela—Ascoli theorem, if B> supKIWol then P"({w e C[O, 1]: I(0 o l >, B}) 5 P"(K c ) _< 1 — (1 — n) = n. Also given e > 0 select b > 0 such that sup. EK v.((5) < E. Then P"({w e C[0,1]: v W (S) >_ ej) < P(KC) < ry
for all n _> 1.
The converse goes as follows. Given ry > 0, first select B using (i) such that P"({w: Iw o l < B}) _> 1 — Zry, for n >, 1. Select S, using (ii) such that P({w: v w ((5,) < 1/r}) 1— for n >, 1. Now take K to be the closure of
t
1 —1r co: v w (8,) < — {w: Iw o l < B} n n
Then P"(K) > 1 — ry for n > 1, and K is compact by the Arzela—Ascoli theorem. • The above theorem taken with Prohorov's theorem is a cornerstone of weak convergence theory in C[O, 1]. If one has proved convergence of the finite-dimensional distributions of {X,('°} to {X,} then the distributions of {Xo'} must be tight as probability measures on III' so that condition (i) is implied. In view of this and the Prohorov theorem we have the following necessary and sufficient condition for weak convergence in C[0,1] based on convergence of the finite-dimensional distributions and tightness. Theorem T.8.5. Let {X: 0 < t < 1) and {XX :0 _< t <, 1} be stochastic processes on (S2, .f, P) which have a.s. continuous sample paths and suppose that the finite-dimensional distributions of {X} converge to those of {X}. Then {X;"°} converges weakly to {X,} if and only if for each F > 0 lim sup
sup IX;"^ — Xs" I > e^ = 0. )
P(
❑
6-0 n Is-11<'6
Corollary. For the last limit to hold it is sufficient that there be positive numbers a, ß, M such that
EIX;" — X" M -< MIt it — sl' +0 for all s, t, n. )
❑
To prove the corollary, let D be the set of all dyadic rationals in [0, 1], i.e., numbers in [0, 1] of the form j/2" for integers j and m. By sample path continuity, the oscillation
THEORETICAL COMPLEMENTS
99
of the process over It - sI <- ö is given by v„(D, (5) = sup; X;"'l a.s. It - si-
Take 6 = 2 -k+ '. Then, taking suprema over positive integers i, j, n, v„(D, (5)
sup
2 j2
-
k
-
IXiz)-m - Xj2 -k1. )
^<(j+l)2
-
k
Now, for j2 -k < i2 - m < (j + 1)2 -k , writing r i2 - m=j2 -k + 1] 2 - m'
wherek<m 1 <m 2 <
<m,^m,
we have, writing a(p) = Jv=, 2 ”, -
(n)
— X
l„I -
k —
^`
X.j2-k+al/+) — Xj2
-
k+alµ
-
J)j
N=1
Therefore,
v„(D,
(5)
<
2
sup
Y
IX( +l)2 (
m - Xnz--I•
m=k+1 0-
Let e > 0 and take ö = 2 -k+ so small (i.e., k so large) that P(v„(D, S) > s)
P^
1/m 2 < E/2. Then
IXin+n2-m - Xhz--I >
12^
< 0-
k+1
V
sup
^=k+1
2'^-1
E
Y- P IXI h
m=k+1
— X >
1
(„)
1
J
.
m
h=0
Now apply Chebyshev's Inequality to get /
ao
2'” - 1
EIX(h+1)2-m — Xh2 - "l
m2a
P(vn(D, (5) > E) m=k+l
m 2a2m M2 m(l+p) = M mk+1
a
h=0
x m 2„ — mß
mk+12
This bound does not depend on n and goes to zero as S -+ 0 (i.e., k = log 2 (5 -' + 1 -• + oo) since it is the tail of a convergent series. This proves the corollary.
n
5. (FCLT and a Brownian Motion Construction) As an application of the above corollary one can obtain a proof of Donsker's Invariance Principle for i.i.d. summands having finite fourth moments. Moreover, since the simple random walk has moments
100
RANDOM WALK AND BROWNIAN MOTION
of all orders, this approach can also be used to give an alternative rigorous construction of the Wiener measure based on Prohorov's theorem as the limiting distribution of random walks. (Compare theoretical complement 13.1 for another construction). Let Z 1 , Z 2 ,... be i.i.d. random variables on a probability space (S2, y , P) having mean zero, variance one, and finite fourth moment m, = EZ. Define So = 0, S"=Z 1 +• •+Z",n>,1,and
= n-
Xtn]
IIZ
Stnr] + n - ' ]Z (nt — [ nt])Zt,, i +l,
0 <- I -< 1.
We will show that there are positive numbers a, ß and M such that
E^X;" — XS" ] ^a ]
-
for 0 < s, t <- 1, n= 1, 2, .... (T.5.1)
Mit —sa l+ ß
By our corollary this will prove tightness of the distributions of the process n = 1, 2, .... This together with the finite-dimensional CLT proves the FCLT under the assumption of finite fourth moments. One needs to calculate the probabilities of fluctuations described in the Arzela—Ascoli theorem more carefully to get the proof under finite second moments alone. To establish (T.5.1), take a = 4. First consider the case s = (j/n) < (k/n) = t are at the grid points. Then
E{X;" — 1snl} 4 = n -2 E{Z^ +1 + ... + Z} 4 )
k
=n
k
k
k
2 E{Zi,Z„Z,,Z;,} ij+1 i2=j+1 i3=j+1 is=j+1
= fl-2 (k — j)EZ1 + (2)( k {
1 )(EZ;) 2
1.
Thus, in this case, E{X;'] — XS"]}4 = n-2{(k
—
j)m4 + 3(k — j)(k —j — 1)}
k z n n
n -2 {(k—j)m 4 + 3(k —j) 2 } -<(m,+3) — 1 (m 4 + 3)It — s1 2 = c l (t — st 2 ,
where c 1 = m 4 + 3.
Next, consider the more general case 0 < s, t -< 1, but for which It — si -> 1/n. Then, for s < t,
E{X;'» — X;" ] } 4 = n Z E
t
4
[ntl
ZJ + (nt — [nt])Z[",1+1 — ([ns] — ns)Z]"sl+l
j=(ns]+ 1
[n^]
n - 2 3° E( ' Z^
\a )
+ (nt — [nt]) 4 EZ1", 1+ 1
=[ns]+I
+ (ns — [ns]) 4 EZ[ns]+1 n
-
2
3 4 {c 1 ([nt]
— [ns]) 2 + (nt — [nt]) 2 m 4 + (ns — [ns])2m4}
J
101
THEORETICAL COMPLEMENTS
n -2 3ac 1 {([nt] — [ns]) z + (nt — [nt])z + (ns — [ns])Z}
n -2 3 4 c i {([nt] — [ns]) + (nt — [nt]) + (ns — [ns])} 2 = n -2 3 4 c 1 {nt — ns + 2(ns — [ns])} 2 n -2 3 a c 1 {nt — ns + 2(nt — ns)} 2 = n -2 3 6 c1(nt — ns) 2 = 3 6 c1(t — s) 2 .
In the above, we used the fact that (a + b + c) 4 < 3 4 (a 4 + b a + c a ) to get the first inequality. The analysis of the first (gridpoint) case was then used to get the second inequality. Finally, if it — sI < 1/n, then either (a) k_<s
n
n
k — (b)- s< n n
l forsome0_
k+2 k+ 1 and\t< forsome0_
In (a), S1 ,, = S, so that, since Int — nsl
1,
E{X;") — Xs"'}a = n -2 E{n(t — s)4+ }4 = m4n 2 (t — s)4 = ma(nt — ns) 2 (t — s) 2 -< m a (t
In (b), S —
= Zk+ ,, so that
E{X^") _ XS"^}a = n Z E Zk
+i
+ nl t — \ (
( (
=t1 n -Z E n \ t—
k+l ^
n
k
\ Zk+i — n(s — -^Zk+I n )
Zk+z—n s—
nt
k a+(k+ am <24n2(t —
+
= 2 a n 2 m a 1(t — k
1-^ a
la
k
n
+
1
k+11 —
n
la
JZk+i Jj
a — s) m a
I k n--s )a } )J)
a
+ I + k + 1 — sl = 2 n2m 2an2ma 4 (t — s)a t—k 4
n
=
24 m4(nt
n
J
— ns) 2 (t — s) 2 24m4(t — s) 2 .
Take ß = 1, M = max{2 4 m 4 , 3 6 (m 4 + 3)} = 3 6 (m 4 + 3) for a = 4.
n
The FCLT (Theorem 8.1) is stated in the text for convergence in S = C[0, 00), when S has the topology of uniform convergence on compacts. One may take the metric to be p(w, co') _ Zk 1 2 -k dk /(1 + dk ), where d k = max{Iw(t) — cu'(t)J: 0 t k}. Since the above arguments apply to [0, k] in place of [0, 1], the assertion of Theorem 8.1 follows (under the moment condition m a < oo).
RANDOM WALK AND BROWNIAN MOTION
102
6. (Measure Determining Classes) Let (S, p) be a metric space, .a(S) its Borel sigmafield. A class l c a(S) is measure-determining if, for any two finite measures u, v, p(C) = v(C) VC e' implies p = v. An example is the class -9 of all closed sets. To see this, consider the lambda class .sad of all sets A for which p(A) = v(A). If this class contains -9 then by the Pi-Lambda Theorem (Chapter 0, Theorem 4.1) a a(s) = .a(S). Similarly, the class (0 of all open sets is measure-determining. A class 9 of real-valued bounded Borel measurable functions on S is measure-determining if ff dp = f f dv Vg e ✓ I implies p = v. The class C b (S) of real-valued bounded continuous functions on S is measure-determining. To prove this, it is enough to show that for each Fe -9 there exists a sequence {f} c Cb (S) such that f" j I F as n I oo. For this, let h a (r) = 1 — nr for 0 _< r < 1/n, h a (r) = 0 for r > 1/n. Then take f(x) = h"(p(x, F)).
Theoretical Complements to Section I.9 1. If f: C[0, oo) -+ 08' is continuous, then the FCLT provides that the real-valued random variables X. = f({X;" 1 }), n > 1, converge in distribution to X = f({X' }). Thus, if one checks by direct computation that the limit F(a) = lim a P(X" _< a), or F(a ) = lim a P(X" < a), exists for all a, then F is the d.f. of X. Applying this to f ({Xs }) := max{X5 :0 ,< s ,< t} __ M, we get (10.10), for P(T. > t) = P(M< < z), if z > 0. The case z < 0 is similar. Joint distributions of several functionals may be similarly obtained by looking at linear combinations of the functionals. Here is the precise statement. -
Theorem T.9.1. If f: C[0, oo) -- l8'` is continuous, say f = (fl , ... , f,,) where f• : C[0, oo) - III', then the random vectors X. = f({X;" 1 }), n >, 1, converge in
distribution to X = f({X,}). Proof. For any r i , ... , rk e R , ^j_ rr f : C[0, co) --• W is continuous so that ^j= 1 r; f ({ X}'°}) converges in distribution to jj =1 rt f ({X}) by the FCLT. Therefore, its characteristic function converges to Ee'<'•« 12 '"> as n -• oo for each r e W'. This means f({X^"^}) converges in distribution to f({X,}) as asserted. n 2. (Mann-Wald) It is sufficient that f: C[0, cc) -+ t8 be only a.s. continuous with respect to the limiting distribution for the FCLT to apply, i.e., for the convergence of f ({X,( '°}) in distribution to f({X,}). That is, Theorem T.9.2. If {X;">} converges in distribution to {Xf } and if P({X,} e Df ) = 0, where Df = {x e C[0, oo): f is discontinuous at x}, then f({X}) converges in distribution to f({XX }).
❑
Proof. This can be proved using Alexandrov's Theorem (T.8.1(ii)), since for any closed set F, f '(F) c f - `(F) = Df v f - `(F), where the overbar denotes the closure of the set. n -
In applications it is the requirement that the limit distribution assign probability
THEORETICAL COMPLEMENTS
103
zero to the event DJ that is sometimes nontrivial to check. As an example, consider (9.5) and (9.6). Let F = {rc < z. }. Recall that Q - C[0, oo) is given the topology of uniform convergence on bounded subintervals of [0, x). Consider an element w e S2 that reaches a number less than c before reaching d. It is simple to check that cu belongs to the interior of F. On the other hand, if w belongs to the closure of F, then either (i) we F, or (ii) w neither reaches c nor d. Thus, if cu e iF, then either (ii) occurs, the probability of which is zero for a Brownian motion since JX,I -i x in probability, as t -+ oo, or (iii) in the interval [r., z'), w never goes below c. By the strong Markov property (see Chapter V, Theorem 11.1), the latter event (iii) has the same probability as that of a Brownian motion, starting at c, never reaching below c before reaching d. In turn, the last probability is the same as that of a Brownian motion, starting at 0, never reaching below 0 before reaching d — c. This probability is zero, since r = r' converges to zero in probability as z 10 (see Eq. 10.10). By combining such arguments with those given in theoretical complement I above, one may also arrive at (10.15) for Brownian motions with nonzero drifts. Theoretical Complements to Section 1.12
A proof of Proposition 12.1 for the special case of infinite-dimensional events that depend on the empirical process through the functional (w -• supo,,,, Ico,l) used to define the Kolmogorov-Smirnov statistic (12.11) is given below. This proof is based on a trick of M. D. Donsker (1952), "Justification and Extension of Doob's Heuristic Approach to the Kolmogorov-Smirnov Theorems," Annals Math. Statist., 23, pp. 277-281, which allows one to apply the FCLT as given in Section 8 (and proved in theoretical complements to Section 1.8 under the assumption of finite fourth moments). The key to Donsker's proof is the simple observation that the distribution of the order statistic (Y i) , ... , Y) of n i.i.d. random variables Y, Y2 , ... from the uniform distribution on [0, 1] can also be obtained as the distribution of the ratios S, (
S Z S„
S., S„+
I
where Sk = T + • • • + T k > 1, and T1 , T2 , exponentially distributed random variables. k,
S„ +I ...
is an i.i.d. sequence of (mean 1) ❑
Intuitively, if the T are regarded as the successive times between occurrence of some phenomena, then S, is the time to the (n + 1)st occurrence and, in units of S„ + 1 , the occurrence times should be randomly distributed because of lack of memory and iridependence properties. A version of this simple fact is given in Chapter IV (Proposition 5.6) for the Poisson process. The calculations are essentially the same, so this is left as an exercise here. The precise result that we will prove here is as follows. The symbol °= below denotes equality in distribution. Proposition T.12.1. Let Y1 , Y2 be i.i.d. uniform on [0, 1] and let, for each n _> 1, {F(t)} be the corresponding empirical process based on Y1 Y„. Then D„ = sup o ,,, i JF(t) — tI converges in distribution to sup o ,,, i IB*l as n - oc. ,
...
,
...
,
104
RANDOM WALK AND BROWNIAN MOTION
Proof. In the notation introduced above, we have D= fn sup iFF(t) - tI = f max I y k) - k O^t51
d
='nmax
I
k5n
k
Sk
-- =
k5n S^11 d
=
'I
n
n
n
Sk—k kSn+1
max I— --
—
n
Sn+1 kn n
sup JX;' — tX; lI + O(n -1 / 2 ),
(T.12.1)
Sn+1
where Xcnl = S[ ,u1 - [nt] t
and, by the SLLN, n/(S n+1 ) -^ I a.s. as n -• ao. The result follows from the FCLT, (8.6), and the definition of Brownian bridge. n This proposition is sufficient for the particular application to the large-sample distribution given in the text. A precise definition of weak convergence of the scaled empirical distribution function as well as a proof of Proposition 12.1 can be found in P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York. 2. Tabulations of the distribution of the Kolmogorov-Smirnov statistic can be found in L. H. Miller (1956), "Table of Percentage Points of Kolmogorov Statistics," J. Amer. Statist. Assoc., 51, pp. 111-121. 3. Let (X I A) denote the conditional distribution of a random variable or stochastic process X given an event A. As suggested by Exercise 12.4*, from the convergence of the finite-dimensional distributions it is possible to obtain the Brownian bridge as the limiting distribution of ({B,} I 0 < B, _< e) as e -* 0, where {B,} denotes standard Brownian motion on 0 _< t < 1. To make this observation firm, one needs to check tightness of the family {({B,} 0 < B, _< e): e > 0). The following argument is used by P. Billingsley (1968), Convergence of Probability Measures, Wiley, New York, p. 84. Let F be any closed subset of C[0, 1]. Then, since sup,,,,, IB* - B 1 1 = IB,^, P({B t }eFl0<-B, -< e)-
E
4. A check of tightness is also required for the Brownian meander and Brownian excursion as described in Exercises 12.6 and 12.7, respectively. For this, consult R. T. Durrett, D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 117-129. The
THEORETICAL COMPLEMENTS
105
distribution of the extremal functionals outlined in the exercises can also be found in R. T. Durrett and D. L. Iglehart (1977), "Functionals of Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 130-135; K. L. Chung (1976), "Excursions in Brownian Motion," Ark. Mat., pp. 155-177; D. P. Kennedy (1976), "Maximum Brownian Excursion," J. App!. Probability, 13, 371-376. Durrett, Iglehart and Miller (1977) also show that the * and + commute in the sense that the Brownian excursion can be obtained either by a meander of the Brownian bridge (as done in Exercise 12.7) or as a bridge of the meander (i.e., conditioning the meander in the sense of theoretical complement 3 above). Brownian meander and Brownian excursion have been defined in a variety of other ways in work originating in the late 1940's with Paul Levy; see P. Levy (1965), Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris. The theory was extended and terminology introduced in K. Ito and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths, Springer Verlag, New York. The general theory is introduced in D. Williams (1979), Diffusions, Markov Processes, and Martingales, Vol. 1, Wiley, New York. A much fuller theory is then given in L. C. G. Rogers and D. Williams (1987), Diffusions, Markov Processes, Martingales, Vol. II, Wiley, New York. Approaches from the point of view of Markov processes (see theoretical complement 11.2, Chapter V) having nonstationary transition law are possible. Another very useful approach is from the point of view of FCLTs for random walks conditioned on a late return to zero; see W. D. Kaigh (1976), "An Invariance Principle for Random Walk Conditioned by a Late Return to Zero," Ann. Probab., 4(1), pp. 115 -121, and references therein. A connection with extreme values of branching processes is described in theoretical complement 11.2, Chapter V.
Theoretical Complements to Section I.13 (Construction of Wiener Measure) Let {X,: t _> 0} be the process having the finite-dimensional distributions of the standard Brownian motion starting at 0 as constructed by the Kolmogorov Extension Theorem on the product probability space (D, .y , P). As discussed in theoretical complement 7.1, events pertaining to the behavior of the process at uncountably many time points (e.g., path continuity) cannot be represented in this framework. However, we will now see how to modify the model in such a way as to get around this difficulty. Let D be the set of nonnegative dyadic rational numbers and let J", k = [ k/2", (k + 1)/2"]. We will use the maximal inequality to check that °° j P max n=1
sup Xq - XXI2 "i >
n1 < oo .
(T.13.1)
(0_
By the Borel-Cantelli lemma we will get from this that with probability 1, for all n sufficiently large, max OEk
geJ,,.kn
1 sup IXq - Xkf2 .,I -< -. n D
_>
(T.13.2)
In particular, it will follow that with probability 1, for every t > 0, q - Xq is uniformly continuous on D n [0, t]. Thus, almost all sample paths of {Xq : q e D} have a unique extension to continuous functions {B,: t 0}. That is, letting C = {wu e Q: for each
106
RANDOM WALK AND BROWNIAN MOTION
t > 0, q - Xq (w) is uniformly continuous on D n [0, t]}, define for w e C, B,(co)
if t = q e D,
= Xq (W),
lim Xq (co),
(T.13.3)
if t 0 D,
q—t
where the limit is over dyadic rational q decreasing to t. By construction, {B,: t _> 0} has continuous paths with probability 1. Moreover, for 0 < t, < • • • < t, with probability one, (B,,, .. . , B1 ) = lim" . (Xqc ., ... , Xq ,) for dyadic rational q;" , ... , qp" decreasing to t,.. ... t k . Also, the random vector (Xgti .... .. Xq ^ ,) has the multivariate normal distribution with mean vector 0 and variance—covariance matrix = min(t ; , t i ), I < i, j < k as a limiting distribution. lt follows from these two facts that this must be the distribution of (B 1 , B, k ). Thus, {B,} is a standard Brownian motion process. To verify the condition (T.13.1) for the Borel—Cantelli lemma, just note that by the maximal inequality (see Exercises 4.3, 13.11),
)
)
P(
max IX,+,a2-- —
a) 2P(IX,+a — X11 % a)
(s2"ß 2
4 E(X1+, — X,) 4
a
66'
(T.13.4)
aä
since the increments of {X,} are independent and Gaussian with mean 0. Now since a} increase with m, we have, letting m —* oo, the events {max 142 .„IX, +1a2 -, — X, P(
sup
(T.13.5)
iXr+qa — X1I > a \ 6S . J
o$q$l.q.D
a
Thus, Pl max
sup Xq — Xk12 .4 >
O^kan2"qeJ".knD
/
Pf sup
- n
k=0
6^ri2 ^
)4
1 n
\qeJ",k,D
( n)2
n2"
(Xq — Xk,2"I > 5
=—‚ 6
(T.13.6)
which is summable.
Theoretical Complements to Section 1.14 1. The treatment given in this section follows that in R. N. Bhattacharya, V. K. Gupta, and E. Waymire (1983), "The Hurst Effect Under Trends," J. App!. Probability, 20, pp. 649-662. The research on the Hurst effect is rather extensive. For the other related results mentioned in the text consult the following references:
THEORETICAL COMPLEMENTS
107
W. Feller (1951), "The Asymptotic Distribution of the Range of Sums of Independent Random Variables," Ann. Math. Statist., 22, pp. 427-432. P. A. P. Moran (1964), "On the Range of Cumulative Sums," Ann. Inst. Statist. Math., 16, pp. 109-112. B. B. Mandelbrot and J. W. Van Ness (1968), "Fractional Brownian Motions, Fractional Noises and Applications," SIAM Rev., 10, pp. 422-437. Processes of the type considered by Mandelbrot and Van Ness are briefly described in theoretical complement 1.3 to Chapter IV of this book.
CHAPTER II
Discrete-Parameter Markov Chains
1 MARKOV DEPENDENCE Consider a discrete-parameter stochastic process {X„}. Think of X0 , X,, ... , X,_, as "the past," X. as "the present," and X +1 X +z as "the future" of the process relative to time n. The law of evolution of a stochastic process is often thought of in terms of the conditional distribution of the future given the present and past states of the process. In the case of a sequence of independent random variables or of a simple random walk, for example, this conditional distribution does not depend on the past. This important property is expressed by Definition 1.1. ,
i
, • • .
Definition 1.1. A stochastic process {X0 , X 1 ..... X„, ...} has the Markov property if, for each n and m, the conditional distribution of X„ + 1 .. • , Xn +m given X0 , X 1 ..... X„ is the same as its conditional distribution given X. alone. A process having the Markov property is called a Markov process. If, in addition, the state space of the process is countable, then a Markov process is called a Markov chain. ,
In view of the next proposition, it is actually enough to take m = I in the above definition. Proposition 1.1. A stochastic process X0 , X,, X 2 , .. . has the Markov property if and only if for each n the conditional distribution of X„ + , given X 0 X,..... X is a function only of X. ,
X
Proof. For simplicity, take the state space S to be countable. The necessity of the condition is obvious. For sufficiency, observe that 109
110 P(Xn+1 =ill . .. ,
DISCRETE-PARAMETER MARKOV CHAINS
Xn +m
—
.lm
I Xo =
= P(Xn+t
... ,
Xn = in)
= f1 I Xo = io, .. . , Xn = in)'
P(Xn+z = J2jXo= io,...,Xn=In,Xn+t
= J1) ...
P(Xn +m=lmIXo = io,...,Xn +m-1 =1m-1)
= P(Xn +1 =J1
I Xn = in)P(Xn +2 =12 I Xn +1 =1 ).
P(Xn +m =1m
Xn +m-1 =1m-1).
(1.1)
The last equality follows from the hypothesis of the proposition. Thus the conditional distribution of the future as a function of the past and present states i 0 , i 1 , ... , i n depends only on the present state i n . This is, therefore, the conditional distribution given X n = i n (Exercise 1). n A Markov chain {X0 , X1,. . .} is said to have a homogeneous or stationary transition law if the distribution of Xn +l, ... , Xn +m given Xn = y depends on the state at time n, namely y, but not on the time n. Otherwise, the transition law is called nonhomogeneous. An i.i.d. sequence {X n } and its associated random walk possess time-homogeneous transition laws, while an independent nonidentically distributed sequence {X} and its associated random walk have nonhomogeneous transitions. Unless otherwise specified, by a Markov process (chain) we shall mean a Markov process (chain) with a homogeneous transition law. The Markov property as defined above refers to a special type of statistical dependence among families of random variables indexed by a linearly ordered parameter set. In the case of a continuous parameter process, we have the following analogous definition. Definition 1.2. A continuous-parameter stochastic process {X,} has the Markov property if for each s < t, the conditional distribution of X, given {X, u < s} is the same as the conditional distribution of X, given Xs . Such a process is called a continuous-parameter Markov process. If, in addition, the state space is countable, then the process is called a continuous-parameter Markov chain.
2 TRANSITION PROBABILITIES AND THE PROBABILITY SPACE An i.i.d. sequence and a random walk are merely two examples of Markov chains. To define a general Markov chain, it is convenient to introduce a matrix p to describe the probabilities of transition between successive states in the evolution of the process. Definition 2.1. A transition probability matrix or a stochastic matrix is a square
TRANSITION PROBABILITIES AND THE PROBABILITY SPACE
III
matrix p = ((p i3 )), where i and] vary over a finite or denumerable set S, satisfying (i) p i; >, 0 for all i and j, (ii) YJES pit = I for all i.
The set S is called the state space and its elements are states. Think of a particle that moves from point to point in the state space according to the following scheme. At time n = 0 the particle is set in motion either by starting it at a fixed state i o , called the initial state, or by randomly locating it in the state space according to a probability distribution it on S, called the initial distribution. In the former case, it is the distribution concentrated at the state i 0 , i.e., n ; = 1 if j = i o , ire = 0 if j : i 0 . In the latter case, the probability is 7r ; that at time zero the particle will be found in state i, where 0 < n i < 1 and y i tc i = 1. Given that the, particle is in state i 0 at time n = 0, a random trial is performed, assigning probability p ;0 to the respective states j' E S. If the outcome of the trial is the state i l , then the particle moves to state i 1 at time n = 1. A second trial is performed with probabilities p i , j - of states j' E S. If the outcome of the second trial is i 2 , then the particle moves to state i 2 at time n = 2, and so on. A typical sample point of this experiment is a sequence of states, say (i o , i 1 , i z , ... , i n , ...), representing a sample path. The set of all such sample paths is the sample space S2. The position Xn at time iI is a random variable whose value is given by X n = i n if the sample path is (i 0 , i l , ... , i n , ...). The precise specification of the probability P,, on Q for the above experiment is given by Pn( XO =
l 0, X1 = ii, •
, Xn = in) =
M io Pioi1 Pi l i 2
.. •p
(2.1)
More generally, for finite-dimensional events of the form A={(X0 ,X 1 ....,Xn )EB},
(2.2)
where B is an arbitrary set of (n + 1)-tuples of elements of S, the probability of A is specified by P,1(A) _
y-
iioPioi,...p1
^.
(2.3)
(io.i1 ..... ln)EB
By Kolmogorov's existence theorem, P,, extends uniquely as a probability measure on the smallest sigmafield F containing the class of all events of the form (2.2); see Example 6.3 of Chapter I. This probability space (i), F, Pn ) with Xn (CU) = w,, co E S2, is a canonical model for the Markov chain with transition probabilities ((p ij )) and initial distribution it (the Markov property is established below at (2.7)—(2.10)). In the case of a Markov chain starting in state i, that is, 1t ; = 1, we write Pi in place of P11.
112
DISCRETE-PARAMETER MARKOV CHAINS
To specify various joint distributions and conditional distributions associated with this Markov chain, it is convenient to use the notation of matrix multiplication. By definition the (i, j) element of the matrix p 2 is given by nj
)
_ P
(2.4)
ik Pkj
kcS
The elements of the matrix p" are defined recursively by p" = p" 'p so that the (i, j) element of p" is given by -
cn-1)
cn) =
cn-')
_
P ik Pkj PikPkj
n = 2,3 ....
>
(2.5)
kcS
kES
It is easily checked by induction on n that the expression for p;n is given directly in terms of the elements of p according to )
Pi; =
Y_
)
(2.6)
Pü1Pi1i2•..Pin-zln-IPin-li.
11.....In - l E S
Now let us check the Markov property of this probability model. Using (2.1) and summing over unrestricted coordinates, the joint distribution of Xo , Xn ,, X" 2 , . .. , Xnk , with 0 = n o < n l < n2 < • • • < nk, is given by Pa(X0 = i, Xn l =j, Xn 2
—
_ Z E . . . I (
1 2
k
12' ... , Xn k =ik) 7r l Pii, Pi, . . . . pin 1 - lj l)(Pi l in i + I Pin, + l in, + Z . . I7 Y /n2 - 1 J2 )
X ( Pik-link-,+1
. .
(2.7)
Pink-,+l ink_I+2...Pink-lik)>
where >, is the sum over the rth block of indices ii + n ,_ ', .+ .), . ‚ 1,, , in , (r = 1, 2, . .. , k). The sum Y_ k , keeping indices in all other blocks fixed, yields the factor p^kk Ok -') using (2.6) for the last group of terms. Next sum successively over the (k — 1)st, ... , second, and first blocks of factors to get P (X0 =
i,
Xn1 =j1, X,2 =J2'•
^nJ 2 nl).. , Xn k =1k) = 7riP) p . plkk Ijk 1). (2.8)
Now sum over i e S to get PE(Xni
j1, X2 —121 ,
Xnk
—Jk) = ( ^I Pin1))Pjll.i2 " / ( ics
1) .
. .pjkk lik 1). (2.9)
Using (2.8) and the elementary definition of conditional probabilities, it now follows that the conditional distribution of X„ +m given X o , X,,. . . , Xn is given by Pt(Xn+m = J I Xo = i0, Xl = il, .. , Xn— 1 = in — 1, Xn = i) =p^) =P"(Xn+m=j1 Xn =i) )
=
PP(Xm
= .1 1 XO = 1),
m
i
1 , j E S. (2.10)
Although by Proposition 1.1 the case m = 1 would have been sufficient to prove
113
SOME EXAMPLES
the Markov property, (2.10) justifies the terminology that p'° :_ ((p;; >)) is the m-step transition probability matrix. Note that p'° is a stochastic matrix for all m>'1. The calculation of the distribution of X,„ follows from (2.10). We have, Pn(Xm=j)=ZPn(Xm=j• X0 =i) =ZPP(X0=i)Pi(Xm=j1 i
X0
=')
i
_
7
Z.pi;' = (n'pm).,
(2.11)
where n' is the transpose of the column vector n, and (n'pt) j is the jth element of the row vector n'pm.
3 SOME EXAMPLES The transition probabilities for some familiar Markov chains are given in the examples of this section. Although they are excluded from the general development of this chapter, examples of a non-Markov process and a Markov process having a nonhomogeneous transition law are both supplied under Example 8 below. Example 1. (Completely Deterministic Motion). Let the only elements of p, which may be either a finite or denumerable square matrix, be 0's and l's. That is, for each state i there exists a state h(i) such that p ih(i) = 1,
pi; = 0
for j 0 h(i)
(i n S).
(3.1)
This means that if the process is now in state i it must be in state h(i) at the next instant. In this case, if the initial state X o is known then one knows the entire future. Thus, if X 0 = i, then X l = h(i), X z = h(h(i)) := h (z) (i), ... , X" = h(h (
" - 1)
(i)) = h ( " (i), ... . )
Hence p;;^ = 1 if j = h ( " (i) and p;j" = 0 if j ^ h ( " (i). Pseudorandom number generators are of this type (see Exercise 7). )
)
)
Example 2. (Completely Random Motion or Independence). Let all the rows of p be identical, i.e., suppose p i , is the same for all i. Write for the common row,
p si =p
(ieS,jeS).
(3.2)
114
DISCRETE-PARAMETER MARKOV CHAINS
In this case, X0 , X 1 , X2 . . forms a sequence of independent random variables. The distribution of X0 is it while X 1 ,. . . , X, ... have a common distribution given by the probability vector (p j )jcs . If we let it = (p; ), ES , then Xo , X 1 , .. . form a sequence of independent and identically distributed (i.i.d.) random variables. The coin-tossing example is of this kind. There, S = {0,1 } (or {H, T } ) i and po=2'Pj =za1 ,.
Example 3. (Unrestricted Simple Random Walk). Here, S = {0, + 1, ±2, ...}
and p = ((p ;j )) is given by Pi; =P =q =0
ifj =i +1 ifj =i -1 ifjj—iI> 1.
(3.3)
where 0 < p < l and q = 1 — p. Example 4. (Simple Random Walk with Two Reflecting Boundaries). Here S = {c, c + 1, ... , d}, where c and d are integers, c < d. Let =p
ifj =i +1 andc
=q
ifj =i —I andc
p
;j
Pc,c
+i
= I,
Pa,e -i
= 1.
(3.4)
In this case, if at any point of time the particle finds itself in state c, then at the next instant of time it moves with probability I to c + 1. Similarly, if it is at d at any point of time it will move to d — I at the next instant. Otherwise (i.e., in the interior of [c, d]), its motion is like that of a simple random walk. Example S. (Simple Random Walk with Two Absorbing Boundaries). Here S = {c, c + 1, ... , d}, where c and d are integers, c < d. Let PCC = 1,
Pad = 1.
(3.5)
For c < i < d, p i; is as defined by (3.3) or (3.4). In this case, once the particle reaches c (or d) it stays there forever. Example 6. (Unrestricted General Random Walk). Take S = {0, + 1, ± 2, ...}.
Let Q be an arbitrary probability distribution on S, i.e., (i) Q(i) > 0 for all i E S, (ii)
li€S
Q(i) = I.
Define the transition matrix p by
SOME EXAMPLES
115
= Q(j — i),
(3.6)
i, j E S.
One may think of this Markov chain as a partial-sum process as follows. Let X0 have a distribution n. Let Z 1 , Z 2 , ... be a sequence of i.i.d. random variables with common distribution Q and independent of X o . Then, forn>,l,
X^=X0 +Z 1 +-•-+Z,,,
(3.7)
is a Markov chain with the transition probability (3.6). Also note that Example 3 is a special case of Example 6, with Q(— l) = q, Q(l) = p, and Q(i) = 0 for i: +1.
Example 7. (Bienayme—Galton—Watson Simple Branching Processes). Particles such as neutrons or organisms such as bacteria can generate new particles or organisms of the same type. The number of particles generated by a single particle is a random variable with a probability function f; that is, f (j) is the probability that a single particle generates j particles, j = 0, 1, 2, .... Suppose that at time n = 0 there are Xo = i particles present. Let Z 1 , Z 2 , ... , Z i denote the numbers of particles generated by the first, second, ... , ith particle, respectively, in the initial set. Then each of Z 1 , ... , Z ; has the probability function f and it is assumed that Z 1 , ... , Z; are independent random variables. The size of the first generation is X 1 = Z 1 + • • • + Z., the total number generated by the initial set. The X, particles in the first generation will in turn generate a total of X 2 particles comprising the second generation in the same manner; that is, the X, particles generate new particles independently of each other and the number generated by each has probability function f. This goes on so long as offspring occur. Let X. denote the size of the nth generation. Then, using the convolution notation for distributions of sums of independent random variables, pti=P(Z1+ ... +Z.=1)_f *i (j), Poo= 1 ,
p oi =0
i> 1, j^0,
(3.8)
ifj00.
The last row says that "zero" is an absorbing state, i.e., if at any point of time X„ = 0, then X. = 0 for all m > n, and extinction occurs.
Example 8. (Pölya Urn Scheme and a Non-Markovian Example). A box contains r red balls and b black balls. A ball is randomly selected from the box and its color is noted. The ball selected together with c 0 balls of the same color are then placed back into the box. This process is repeated in successive trials numbered n = 1, 2,-. . . . Indicate the event that a red ball occurs at the nth trial by XX = 1 and that a black ball occurs at the nth trial by X„ = 0. A straightforward induction calculation gives for 0 < Yk= 1 e k < n,
116
DISCRETE-PARAMETER MARKOV CHAINS
P(X1 =
E1,•• .Xn
=
En)
[r+(s„-1)c][r+(s„-2)c].• r[b+(T„-1)c]••.b
[r+b+(n— 1)c][r+b+(n-2)c]...[r+b]
(3.9)
where „ s„= Y_ s k ,
r„ =n—s„.
(3.10)
k=1
In the cases„=n(i.e.,c 1 =...=E„=1), P(X I = 1, ... X„ = 1)
[ r + (n — 1) c] • r
[r+b+(n-1)c]••.[r+b]
(3.11)
and if s„ = 0 (i.e., e 1 = • • • = E„ = 0) then P(XI=0,...,Xn =0)=
[b + (n — 1)c]•.•b
[r+b+(n— 1)c]•••[r+b]
(3.12)
In particular, P(Xn
= E„IXt = ei,...,Xn-1 =E P (X1 = E1, .. . ,
P (X 1 = _
X.
-1)
= £ n)
c1,. ..,X—t = En — t )
[r + (s„ — 1)c]•..r[b + (r„ — 1)c]•••b [r + (s n _ 1 — 1)c].• r[b + (t n [r+s„_Ic]
1
[r + b + (n — 2)c]..•[r + b]
— 1)c] •••b [r + b + (n — 1)c]. .[r + b]
ifE= 1
— r+b+(n— 1)c [b+r„-IC]
(3.13)
ife„=0.
r+b+ (ii — 1)c
It follows that {X„} is non-Markov unless c = 0 (in which case {X„} is i.i.d.). Note, however, that {X„} does have a distinctive symmetry property reflected in (3.9). Namely, the joint distribution is a function of s„ = Yk =1 e k only, and is therefore invariant under permutations of e l • • . i,,. Such a stochastic process is called exchangeable (or symmetrically dependent). The Pölya urn model was originally introduced to illustrate a notion of "contagious disease” or "accident proneness' for actuarial mathematics. Although {X„} is non-Markov for c ^ 0, it is interesting to note that the partial-sum process {S„}, representing the evolution of accumulated numbers of red balls sampled, does have the Markov property. From (3.13) one can also get that
117
STOPPING TIMES AND THE STRONG MARKOV PROPERTY
r P(sn=
sIS,
+ CS n _ , +(n-1)c
=s1,•• ,Sn_,
if s = l +s„_,
=s,)= r+b
+(n— S„- 1 ) c r + b + (n — 1)c b
lfs=s _ i . (3.14)
Observe that the transition law (3.14) depends explicitly on the time point n. In other words, the partial-sum process {S„} is a Markov process with a nonhomogeneous transition law. A related continuous-time version of this Markov process, again usually called the Pö1ya process is described in Exercise 1 of Chapter IV, Section 4.1. An alternative model for contagion is also given in Example 1 of Chapter IV, Section 4, and that one has a homogeneous transition law.
4 STOPPING TIMES AND THE STRONG MARKOV PROPERTY One of the most useful general properties of a Markov chain is that the Markov property holds even when the "past" is given up to certain types of random times. Indeed, we have tacitly used it in proving that the simple symmetric random walk reaches every state infinitely often with probability 1 (see Eq. 3.18 of Chapter 1). These special random times are called stopping times or (less appropriately) Markov times. Definition 4.1. Let {},,: n = 0, 1, 2, ...} be a stochastic process having a countable state space and defined on some probability space (S2, 3, P). A random variable r defined on this space is said to be a stopping time if
(i) It assumes only nonnegative integer values (including, possibly, +co), and (ii) For every nonnegative integer m the event {w: r(w) < m} is determined by Yo , Y1 ,..., Y.. Intuitively, if r is a stopping time, then whether or not to stop by time m can be decided by observing the stochastic process up to time m. For an example, consider the first time T y the process { Y„} reaches the state y, defined by t(co) = inf{n > 0: Y„(co) = y} . (4.1) If co is such that Y„(w) y whatever be n (i.e., if the process never reaches y), then take r y,(w) = oo. Observe that {w: r(w) < m} = U {co: Y„(w) = y} . n
=O
(4.2)
DISCRETE-PARAMETER MARKOV CHAINS
118
Hence zr Y is a stopping time. The rth return times r;' ofy are defined recursively by )
r" (w) = inf{n )
1: Y(w) = y},
r(w) = inf{n > iy '(w): Y (w) = y},
for r = 2, 3, ....
(4.3)
Once again, the infimum over an empty set is to be taken as oo. Now whether or not the process has reached (or hit) the state y at least r times by the time m depends entirely on the values of Y1 , ... , Y.. Indeed, {rI m} is precisely the event that at least r of the variables Y1 , ... , Y,n equal y. Hence ry" is a stopping time. On the other hand, if n y denotes the last time the process reaches the state y, then ?J is not a stopping time; for whether or not i < m cannot in general be determined without observing the entire process {Y n }. Let S be a countable state space and p a transition probability matrix on S, and let P,, denote the distribution of the Markov process with transition probability p and initial distribution n. It will be useful to identify the events that depend on the process up to time n. For this, let S2 denote the set of all sequences w = (i 0 , i 1 , i 2 , ...) of states, and let Y(w) be the nth coordinate of w (if w = (i o , i,, ... , i, ...), then Yn (cw) = in ). Let .fin denote the class of all events that depend only on Yo , YI , ... , Yn . Then the ‚ n form an increasing sequence of sigmafields of finite-dimensional events. The Markov property says that given the "past" Yo , Yi , ... , Y. up to time m, or given .gym , the conditional distribution of the "after-m"stochastic process Y,„ = {(Ym )„ } := {Ym+n . n = 0, 1, ...} is P. In other words, if the process is re-indexed after time m with m + n being regarded as time n, then this stochastic process is conditionally distributed as a Markov chain having transition probability p and initial state Y.. A WORD ON NOTATION. Many of the conditional distributions do not depend on the initial distribution. So the subscripts on P,^, Pi , etc., are suppressed as a matter of convenience in some calculations. Suppose now that r is the stopping time. "Given the past up to time t" means given the values oft and Yo , Y1 , ... , YY . By the "after-r"process we now mean the stochastic process Yi = {Yt+n :n=0, 1,2,...},
which is well defined only on the set IT < co}. Theorem 4.1. Every Markov chain { Yn : n = 0, 1, 2, ...} has the strong Markov property; that is, for every stopping time i, the conditional distribution of the after-r process Yt = { Yt+n : n = 0, 1, 2, ...}, given the past up to time i is P.. on the set
{i
< co}.
Proof. Choose and fix a nonnegative integer m and a positive integer k along with k time points 0 _< m, < m 2 < • • • < m k , and states i o , i l , ... 'im'
119
STOPPING TIMES AND THE STRONG MARKOV PROPERTY
J1,J2, • • • ,Jk•
Then,
= Jk I T = m, Yo = i o , ... , Ym = Im) = P(Ym+mi = J1 , Ym+mz = J2, .. , Ym+mk — Jk I T = m, Yo = io... , Ym = Im)•
P(Yt +m l
J1
Yt +m 2
=J2,
... , Yt +m k
(4.4) Now if the event IT = m} (which is determined by the values of Y,,... , Ym ) is not consistent with the event "Yo = i o , ... , Ym = im " then {T = m, Yo = i o , • • • , Ym = i m } = 0 and the conditioning event is impossible. In that case the conditional probability may be defined arbitrarily (or left undefined). However, if IT = m} is consistent with (i.e., implied by) "Yo = i o , • • • , Ym = then{ T= m,Yo =i o .....Ym = i m }= {Yo =i o ,•••,Ym =i m },and the right side of (4.4) becomes P(Ym+mi
—
11> Ym+mz =i2'
..
,
Ym+m k =Jk
IY
Q
= 1 p , ... , Ym = im) •
(4.5)
But by the Markov property, (4.5) equals Pi m (Ym i = J1, Ymz =J2,
• • • , Ymk = Jk)
= Py,(Ym i =
on the set
{T
= m}.
11, Ym z
= J1, Ym z = J2,
... ,
Ym k =Jk),
(4.6)
n
We have considered just "future" events depending on only finitely many (namely k) time points. The general case (applying to infinitely many time points) may be obtained by passage to a limit. Note that the equality of (4.4) and (4.5) holds (in case {r = m} {Y0 = i o , ... , Ym = i m }) for all stochastic processes and all events whose (conditional) probabilities may be sought. The equality between (4.5) and (4.6) is a consequence of the Markov property. Since the latter property holds for all future events, so does the corresponding result for stopping times T. Events determined by the past up to time T comprise a sigmafield . , called the pre -T sigmafleld. The strong Markov property is often expressed as: the conditional distribution of Y7 given . is PY (on {T < co}). Note that . is the smallest sigmafield containing all events of the form IT = m, ,Ym =im }.
Example 1. In this example let us reconsider the validity of relation (3.18) of Chapter I in light of the strong Markov property. Let d and y be two integers. For the simple symmetric random walk {S: n = 0, 1, 2, ...} starting at x, is an almost surely finite stopping time, for the probability p x ,, that the random walk ever reaches y is I (see Chapter 1, Eqs. 3.16-3.17). Denoting by E, the expectation with respect to P, we have
120
DISCRETE-PARAMETER MARKOV CHAINS
P.(TY < oo) = P (S Y + „ = y for some n > 1) X
= EX[P(Sv+„ = y for some n , 1 (4 , So , ... ,SY)] = E X [PY (S„ = y for some n >, 1)] (since S t,,, Y = y) (Strong Markov Property) = P(S„ = y for some n > 1)
=Py (S, =y— 1,S„=y forsome n>, 1) +PP (S 1 = y + 1, S„=y for some n>, 1) =PY (S 1
= y-1)Py (S„=y forsomen> 1IS =y-1) 1
+PP (S, =y— 1)Py (S„=yforsomen>, 1 IS 1 =y+ 1) =zPy (S, + ,„=y for some m>,01S, = y — 1) +ZPy (S l+ ,„=y for some m >,QIS 1 =y+1)
(m=n-1)
=ZPy _ 1 ( S.=y for some m>,0) +ZPY+ ,( S,„=y for some m,0) (Markov property) 2P r 1,r +2Py +1, —2 +z= 1.
(4.7)
Now all the steps in (4.7) remain valid if one replaces Ty l) by i;,' -1) and T;,2) by zy'r and assumes that < oo almost surely. Hence, by induction, P ('”) < oo) = 1 for all positive integers r. This is equivalent to asserting 1 = P.,(T ( ' )
<
oo for all positive integers r)
= PX (S„ = y for infinitely many n).
(4.8)
The importance of the strong Markov property will be amply demonstrated in Sections 9-11.
5 A CLASSIFICATION OF STATES OF A MARKOV CHAIN
The unrestricted simple random walk {S„} is an example in which any state i E S can be reached from every state j in a finite number of steps with positive probability. If p denotes its transition probability matrix, then p 2 is the transition probability matrix of { Y„} 2_ {SZ „: n = 0, 1, 2, ...}. However, for the Markov chain { Y„}, transitions in a finite number of steps are possible from odd to odd integers and from even to even, but not otherwise. For {S„} one says that there is one class of "essential” states and for { Y„} that there are two classes of essential states. A different situation occurs when the random walk has two absorbing boundaries on S = {c, c + 1, ... , d — 1, d}. The states c, d can be reached (with positive probability) from c + 1, ... , d — 1. However, c + 1, ... , d — 1 cannot
121
A CLASSIFICATION OF STATES OF A MARKOV CHAIN
be reached from c or d. In this case c + 1, ... , d - I are called "inessential" states while {c }, {d} form two classes of essential states. The term "inessential" refers to states that will not play a role in the long-run behavior of the process. If a chain has several essential classes, the process restricted to each class can be analyzed separately.
>,
Definition S.I. Write i --' f and read it as either ` j is accessible from i" or "the process can go from i to j" if p;l > 0 for some n 1.
Y_
)
Since Pi;)=
Pi
(5.1)
1I.i2....,ln - 1 E.S
i -• j if and only if there exists one chain Pul'
pi,i2 , .
, p,,,_ 1 j
(i, i,, i 2 ,
... , i n _ 1 , j)
such that
are strictly positive.
Definition 5.2. Write i H j and read "i and j communicate" if i -‚ j and j -• i. Say "i is essential" if i -• j implies j -+ i ( i.e., if any state j is accessible from i, then i is accessible from that state). We shall let & denote the set of all essential states. States that are not essential are called inessential.
Proposition 5.1
(a) For every i there exists (at least one) j such that i ' f. (b) i -*j,j- •k imply i-• k. (c) "i essential" implies i <-+ i. (d) i essential, i --'f imply "f is essential" and i•-']. (e) On (' the relation "H" is an equivalence relation (i.e., reflexive, symmetric, and transitive). --
Y-
Proof. (a) For each i, jEs pit = 1. Hence there exists at least one j for which p i; >0; for this] one has i--• j. I such that p;f > 0, (b) i -* j, j -• k means that there exist m? 1, n pik > 0. Hence,
>,
)
)
(m) (n)
(m+n)_
Pu Pik
Pik
iEs (m) (n)
(m) (n)
(m) (n)
= Vif P.ik + L. Pa P)k % Pij Pik > 0.
(5.2)
+96i
Hence, i - k. Note that the first equality is a consequence of the relation p m+n
=
pm pn
(c) Suppose i is essential. By (a) there exists j such that p ;j > 0. Since i is essential, this implies] --' i, i.e., there exists m I such that pj7' 1 > 0. But then
>,
122
DISCRETE-PARAMETER MARKOV CHAINS m+1)
pi l
(m)
i = li- Pu
= pif Pi (m) +
Ics
pilp it ) > 0.
(5.3)
)3E.%
Hence i -• i and, therefore, i 4--* i. (d) Suppose i is essential, i -*j. Then there exist m > 1, n >, 1 such that p, J > 0 and p > 0. Hence i j. Now suppose k is any state such that j -> k, i.e., there exists m' >, 1 such that p1') > 0. Then, by (b), i -• k. Since i is essential, one must have k -> i. Together with i --+ j this implies (again by (b)) k -• j. Thus, if any state k is accessible from j, then j is accessible from that state k, proving that j is essential. 1 = 1, (e) If 60 is empty (which is possible, as for example in the case p i = 0, 1, 2, ...), then, there is nothing to prove. Suppose ' is nonempty. Then: (i) On f the relation "+-•" is reflexive by (c). (ii) If i is essential and i.-.j, then (by (d)) j is essential and, of course, i •-^ j and j H i are equivalent properties. Thus "«->" is symmetric (on ' as well as on S). (iii) If i H j and j *-+ k, then i j and j -• k. Hence i -4 k (by (b)). Also, k -• j and j -> i imply k --> i (again by (b)). Hence i H k. This shows that "•-+" is transitive (on 9 as well as on S). • )
)
-
From the proof of (e) the relation "^-•" is seen to be symmetric and transitive on all of S (and not merely 9). However, it is not generally true that i i (or, i -• i) for all i e S. In other words, reflexivity may break down on S. Example 1. (Simple (Unrestricted) Random Walk). S = {0, ± 1, ±2, ...}. Assume, as usual, 0
p=
5
1 5
1 5
1
0 3 0 3 0 O 0 4 0 ä 0 3 0 3 0 1
0
3 [ ii
1 3
0
(5.4)
1 3
Note that 9 = {2, 4}. In Examples I and 3 above, there is one essential class and there are no inessential states.
A CLASSIFICATION OF STATES OF A MARKOV CHAIN
123
Definition 5.3. A transition probability matrix p having one essential class and no inessential states is called irreducible. Now fix attention on S. Distinct subsets of essential states can be identified according to the following considerations. Let i e.1. Consider the set 6'(i) = { j e 6": i --> j}. Then, by (d), i +-] for all j E 6(i). Indeed, if], k e 6"(i), then j H k (for j --> i, i -• k imply j -+ k; similarly, k -• j). Thus, all members of 6'(i) communicate with each other. Let r e 6, r 6"(i). Then r is not accessible from a state in 6'(i) (for, if j e e'(i) and j -+ r, then i --> j, j -a r will imply i -> r so that r e 6"(i), a contradiction). Define S(r) = { j E 6',r -+ j}. Then, as before, all states in c0(r) communicate with each other. Also, no state in 6"(r) is accessible from any state in 6'(i) (for if ! e ó"(r), and j e 6'(i) and j -+ 1, then i -• 1; but r F + 1, so that i -• 1, 1 --> r implying i - r, a contradiction). In this manner, one decomposes 6" into a number of disjoint classes, each class being a maximal set of communicating states. No member of one class is accessible from any member of a different class. Also note that if k e 6"(i), then 6"(i) = 6'(k). For if j e 6'(i), then j , i, i -* k imply j --• k; and since j is essential one has k --> j. Hence j e 6"(k). The classes into which of decomposes are called equivalence classes. In the case of the unrestricted simple random walk {S„}, we have 6" S = {0, + 1, ±2,. . .}' and all states in 6" communicate with each other; only one equivalence class. While for {X„} = {S 2n }, = S consists of two disjoint equivalence classes, the odd integers and the even integers. Our last item of bookkeeping concerns the role of possible cyclic motions within an essential class. In the unrestricted simple random walk example, note that p i ,=0 for all i=0,±1,±2,...,butp;2 ) =2pq>0. In fact p;7 1 =0for all odd n, and p;" > 0 for all even n. In this case, we say that the period of i is 2. More generally, if i --• i, then the period of i is the greatest common divisor of the integers in the set A = In >, 1: p}. If d = d, is the period of i, then p;" ) = 0 whenever n is not a multiple of d and d is the largest integer with this property. )
Proposition 5.2 (a) If i H j then i and j possess the same period. In particular "period" is constant on each equivalence class. (b) Let i e 9' have a period d = d ; . For each j e 6'(i) there exists a unique integer r 1 , 0 < rj d - 1, such that p;j ) > 0 implies n = rj (mod d) (i.e., either n = rj or n = sd + rj for some integer s >, I).
Proof. (a) Clearly, (a+m+b)
P«
(a) (m) (b)
(5.5)
>P);Pi; P;,
for all positive integers a, m, b. Choose a and b such that p;, > 0 and pj(b > 0. If pj7 1 > 0, then ps_m ) - pj^" ) p^•^ ) > 0, and )
(a+2m+b) >
(a ) ^^m)
)
^b) > 0.
p.. Pu P^l P ( ) P (b.. P (• p.• p•. (a+m+b) >
) > 0
(
)
( 5.6 )
124
DISCRETE-PARAMETER MARKOV CHAINS
Therefore, d (the period of i) divides a + m + b and a + 2m + b, so that it divides the difference m = (a + 2m + b) — (a + m + b). Hence, the period of i does not exceed the period of j. By the same argument (since i 4—J is the same as j •-+ i), the period of j does not exceed the period of i. Hence the period of i equals the period of j. (b) Choose a such that !°> > 0. If ;'") > 0, ;") > 0, then "' 3p 1) p)> 0 and p;' ? pp;°) > 0. Hence d, the period of i, divides m + a, n + a and, therefore, m — n = m + a — (n + a). Since this is true for all m, n such that p 1) > 0, p;j> > 0, it means that the difference between any two integers in the set A = {n: p;jn> > 0} is divisible by d. This implies that there exists a unique integer r,, 0 < rj < d — 1, such that n = rj (mod d) for all n e A (i.e., n = sd + r, for some integer s >, 0 where s depends on n). n It is generally not true that the period of an essential state i is min{n >, 1: p;; > 0}. To see this consider the chain with state space { 1, 2, 3, 4} and transition matrix )
0 1 0 0 0 0 1 0 0
2 1
0 '2
1 0 0 0
Schematically, only the following one-step transitions are possible. 4—.1 T
I-->2—^3 2
Thus p;i' = 0, pi 1 > 0 , P11 > 0, etc., and pi"i = 0 for all odd n. The states communicate with each other and their common period is 2, although min{n: p;"1 > 0} = 4. Note that min{n > 1: p;° ) > 0} is a multiple of d, since d, divides all n for which p;" ) > 0. Thus, d. <, min{n >, 1: p°> 0}. Proposition 5.3. Let i E e have period d> 1. Let Cr be the set of j e .9(i) such that rj = r, where rr is the remainder term as defined in Proposition 5.2(b). Then (a) Co , C,, ... , Cd _, are disjoint, U r° =ö C, = (b) If je C„ then pik >0 implies k e C, + ,, where we take r + 1 = 0 if r = d — 1.
Proof. (a) Follows from Proposition 5.2(b). (b) Suppose j e C, and p;j > 0. Then n = sd + r for some s >, 0. Hence, if p;k > 0 then )
A CLASSIFICATION OF STATES OF A MARKOV CHAIN
125
Pik +>> Pi; )Pjk >0,
(5.7)
which implies k e Cr+l (since n + I = sd + r + I = r + 1 (mod d)), by n Proposition 5.2(b). Here is what Proposition 5.3 means. Suppose i is an essential state and has a period d > 1. In one step (i.e., one time unit) the process can go from i E Ca only to some state in C, (i.e., p 1 > 0 only if j e C 1 ). From states in C,, in one step the process can go only to states in C 2 . This means that in two steps the process can go from i only to states in C 2 (i.e., p;» > 0 only if je C 2 ), and so on. In d steps the process can go from i only to states in Cd + , = CO3 completing one cycle (of d steps). Again in d + 1 steps the process can go from i only to states in C 1 , and so on. In general, in sd + r steps the process can go from i only to states in Cr. Schematically, one has the picture in Figure 5.1 for the case d = 4 and a fixed state i e Co of period 4.
Example S.S. In the case of the unrestricted simple random walk, the period is 2 and all states are essential and communicate with each other. Fix i = 0. Then C o = {0, ±2, ±4, ...}, C l = (±1, ±3, ±5, ...}. If we take i to be any even integer, then C O3 C l are as above. If, however, we start with i odd, then C o = {± 1, ±3, ±5, ...}, C, = {0, ±2, ±4, ...}.
C, iEG,--.jEC j —^kEC,—./EC,—^mEC„—^ •••
Figure 5.1
126
DISCRETE-PARAMETER MARKOV CHAINS
6 CONVERGENCE TO STEADY STATE FOR IRREDUCIBLE AND APERIODIC MARKOV PROCESSES ON FINITE SPACES As will be demonstrated in this section, if the state space is finite, a complete analysis of the limiting behavior of p", as n —• oo, may be carried out by elementary methods that also provide sharp rates of convergence to the so-called steady-state or invariant distributions. Although later, in Section 9, the asymptotic behavior of general Markov chains is analyzed in detail, including the law of large numbers and the central limit theorem for Markov chains that admit unique steady-state distributions, the methods of the present section are also suited for applications to certain more general (nonfinite) state spaces (e.g., closed and bounded intervals). These latter extensions are outlined in the exercises. First we consider what happens to the n-step transition law if all states are accessible from each other in one time step.
Proposition 6.1. Suppose S is finite and p,J > 0 for all i, j. Then there exists a unique probability distribution it = {m 1 : j e S} such that
(6.1)
yrc ; p ;J =n J for alljeS i
and ^p;j^—itJ1
(1—Nb)"
for all i,jeS, n>,1,
(6.2)
where 6 = min{p ;J : i, je S} and N is the number of elements in S. Also, it J >, 6 for alljeS and 6<1/N. Proof. Let M;" ) , m;" ) denote the maximum and the minimum, respectively, of
the elements {p: i e S} of the jth column of p'. Since p ;J >, 6 and p ;J = 1 — Y- k0 J P ik <- 1 — (N — 1)b for all i, one has M}' ) —mj 1)
(6.3)
Fix two states i, i' arbitrarily. Let J = { j e S: p ;J > p ; .J }, J' _ { j e S: p ;J ^ p ; • J }. Then, 0 = 1 — 1 = E (p — Pi J) = Y_ (P — Pij) + Y_ (Pij — PrJ), ,
J
JE)'
JE)
so that I (Pij — PrJ) = — Y_ (p — Pi,j), je)'
JE
(6.4)
CONVERGENCE TO STEADS` STATE
127
and I (Pij — P) _ y- Pij — jE ✓
jEJ
jEJ
= 1 — y Pij — y pi'j 1 — (#J')6 — (#J)6 = I — N. (6.5) jE ✓ '
jE ✓
Therefore, (n + 1)(n + 1) nj — Pi'j
(n) Pik Pkj —
= k
(n) (Pik — Pi'k)Pj k
(n)
Pi'k Pj k = k
k
(n)
/
(n)
\Pik — Pi'k)Pkj + kcJ
(Pik — Pi'k)Pkj kEJ'
(Pik
— Pi'k)Mj
")
+ Y- (Pik — Pi'k)mj
")
k€J'
k€J
_ (Mj — mj )` ")
")
y (Pik
— min)). (6.6) — Pik)) (1 — NS)(M
`keJ
Letting i, i' be such that p (n+l) _ Mj(n+l) p1j = m(n+l) one gets from (6.6), Min+l) — m 5v+l) < (1 — N6)(Mj(" — min)).
(6.7)
)
Iteration now yields, using (6.3) as well as (6.7), M
)
(1 — N(5)"
—m
for n >, 1.
(6.8)
Now M in+ 1) = max (n+ 1) = max I P
i
i
m
\ P'k Pkj
= Mcn) , )) < max (Y- p ik M(n)1 j J
k
i
r 1) =min p^^ +i) = min Y_ P ik Pkj) i
i
\ k
k
min ) (^ Pikm;" ^ = m;" ) ,
J
i
k
nondecreasing in n. Since M;" , mj" are i.e., Mj(" is nonincreasing and m bounded above by 1, (6.7) now implies that both sequences have the same limit, say n j . Also, 6<mj(' <m
)
)
)
()
m ,")—M(n)
\p i1)_ 7El\
M(n)— m^n)
which, together with (6.8), implies the desired inequality (6.2). Finally, taking limits on both sides of the identity Pij +l) _ > P ik Pkj )
k
(6.9)
128
DISCRETE-PARAMETER MARKOV CHAINS
one gets it j = E k Trk Pkj, proving (6.1). Since E j p;^ = 1, taking limits, as n --• co, it follows that Y j nj = 1. To prove uniqueness of the probability distribution it satisfying (6.1), let it = f nj : j e S} be a probability distribution satisfying ic'p = it'. Then by iteration it follows that )
frj =
n;P;j = (it'p)j = (n'pp)j = (n'p 2 )j = ... _ (n'p")j = Y n;P. (6.10)
n
Taking limits as n —* oo, one gets nj = Y ; n ; ij = n j . Thus, n = n.
For an irreducible aperiodic Markov chain on a finite state space S there is a positive integer v such that 6':=minp;j ) >0.
(6.11)
^.j
Applying Proposition 6.1 with p replaced by p" one gets a probability it = {rc j : j e S} such that max
— nj l < (1 — Na')", n = 1, 2 ....
(6.12)
;.j
Now use the relations IP(jv+r") — 1Cjl =
Pik ) (Pkj - v) k
x j ) ) 1<
L, Pik )(1 — N(5 ' ) " = (1 — Nb
'
) , "
in
i
1,
k
(6.13)
to obtain p8j) — it! < (1 — N8')[ "iv) n = 1, 2, ... ,
(6.14)
where [x] is the integer part of x. From here one obtains the following corollary to Proposition 6.1. Corollary 6.2. Let p be an irreducible and aperiodic transition law on a state space S having N states. Then there is a unique probability distribution it on S such that Y_n;p;j =nj ;
foralljeS.
(6.15)
Also, Ip!) — ijl < (1 — N6')r"J" ] for all i, je S, for some 6' > 0 and some positive integer v.
(6.16)
CONVERGENCE TO STEADY STATE
129
The property (6.15) is important enough to justify a definition.
Definition 6.1. A probability measure it satisfying (6.15) is said to be an invariant or steady-state distribution for p. Suppose that it is an invariant distribution for p and let {X n } be the Markov chain with transition law p starting with initial distribution n. It follows from the Markov property, using (2.9), that 1 Pn(Xm = lo, Xm+1 = i l , ... , Xm+n = in) _ (TC ' P m )i0Pt0t Pi 1 i2
_ lt lo PioiI Pi,(z.
..
.
Pi r
11.
. . D)n Iin
= P (X0 = i0, X1 = i1, • P
. Xn = in),
(6.17) for any given positive integer n and arbitrary states i 0 , i 1 , ... , i n e S. In other words, the distribution of the process is invariant under time translation; i.e., {Xn } is a stationary Markov process according to the following definition.
Definition 6.2. A stochastic process { Yn } is called a stationary stochastic process if for all n, m P(YO = i0, Y1 = i1, ... , Y. =
in)
= P(Ym
=
lo, Y1 +m = i1 , •
. Yn+m = in).
(6.18) Proposition 6.1 or its corollary 6.2 establish the existence and uniqueness of an invariant initial distribution that makes the process stationary. Moreover, the asymptotic convergence rate (relaxation time) may also be expressed in the form IPi;) — ir;I < ce '
(6.19)
-
for some c, A > 0. Suppose that {Xn } is a stochastic process with state space J = [c, d] and having the Markov property. Suppose also that the conditional distribution of Xn+ 1 given X. = x has a density p(x, y) that is jointly continuous in (x, y) and does not depend on n >, 1. Given X0 = x 0 , the (conditional on X 0 ) joint density of X 1 , ... , Xn is given by J(Xi.....X..IXo=xo)(xl, ...
, xn) = P(x0, X1)P(x1, X2)
If X0 has a density u(x), then the joint density of X 0 , . Jxo,....X,)(xo, .. . , xn) = µ( x0)P(xo, x1)
...
...
.. ,
P(Xn— 1, xn).
(6.20)
Xn is
P(xn-1, xn).
(6.21)
130
DISCRETE-PARAMETER MARKOV CHAINS
Now let 6 = min p(x, y).
(6.22)
x.y c [c,dj
In a manner analogous to the proof of Proposition 6.1, one can also obtain the following result (Exercise 9). Proposition 6.3. If 6 = minx, Y E[C,dlp(x, y) > 0 then there is a continuous probability density function n(x) such that
f
d
n(x)p(x, y) dx = n(y)
for all y e (c, d)
(6.23)
and
p( )(x, y) — n(y)I '< [1 — 6(d — c)]' '0
for all x, ye (c, d)
(6.24)
where 0 = max { p(x, y) — p(z, y)} .
(6.25)
x,y, z e [c,dl
Here p "°(x, y) is the n -step transition probability density function of X" given [
Xo = x. Markov processes on general state spaces are discussed further in theoretical complements to this chapter. Observe that if A = (( a s)) is an N x N matrix with strictly positive entries, then one may readily deduce from Proposition 6.1 that if furthermore , a, = 1. for each i, then the spectral radius of A, i.e., the magnitude of the largest eigenvalue of A, must be 1. Moreover, 2 = I must be a simple eigenvalue (i.e., multiplicity 1) of A. To see this let z be a (left) eigenvector of A corresponding to A = 1. Then for t sufficiently large, z + to is also a positive eigenvector (and normalizable), where it is the invariant distribution (normalized positive eigenvector for A = 1) of A. Thus, uniqueness makes z a scalar multiple of n. The following theorem provides an extension of these results to the case of arbitrary positive matrices A = ((a u )), i.e., a 1 > 0 for 1 < i, j < N. Use is not made of this result until Sections 11 and 12, so that it may be skipped on first reading. At this stage it may be regarded as an application of probability to analysis.
yN
Theorem 6.4. [Perron—Frobenius]. Let A = ((a ij )) be a positive N x N matrix. (a) There is a unique eigenvalue A o of A that has largest magnitude. Moreover, A is positive and has a corresponding positive eigenvector.
CONVERGENCE TO STEADY STATE
131
(b) Let x be any nonnegative nonzero vector. Then v = tim A nAnx
(6.26)
new
Proof.
exists and is an eigenvector of A corresponding to A,, unique up to a scalar multiple determined by x, but otherwise independent of x.
Define A + = {A > 0: Ax >, tix for some nonnegative nonzero vector x}; here inequalities are to be interpreted componentwise. Observe that the set A + is nonempty and bounded above by IIAII :_ JN , J" , a ij . Let A o be the least upper bound of A. There is a sequence {2 n : n >, l} in A + with limit A o as n -+ oc. Let {x n : n >, l} be corresponding nonnegative vectors, normalized so that hIxIl xi = 1, n >, 1, for which Ax n >, .? n x n . Then, since Ilxnll = 1 , n = 1, 2, ... , {x n } must have a convergent subsequence, with limit denoted x 0 , say. Therefore Ax 0 >, 2 0 x 0 and hence A o e A + . In fact, it follows from the least upper bound property of 2 c that Ax o = 2 0 x 0 . For otherwise there must be a 1 component with strict inequality, say l j j — 2 0 x 1 = ö > 0, where x 0 = (x 1 ..... x N )', and Ij , a kj x j — 2 o x k >, 0, k = 2, ... , N. But then taking y = (x 1 + ((5/2), x 2 , ... , x N )' we get Ay > pl o y with strict inequality in each component. This contradicts the maximality of 2. To prove that if A is any J <, A., let z be an eigenvector corresponding to A and other eigenvalue then Al define Izl = (Iz1l, • • • , Iz n l). Then Az = Az implies AIzl %JAIIzl. Therefore, by definition of 2, we have 1 2 1 <, 20. To prove part (ii) of the Theorem we can apply Proposition 6.1 to the transition probability matrix
_j a x
C
^ P`, 10xi
where x o = (x l , ... , x r,) is a positive eigenvector corresponding to A c,. In particular, noting that (2) — N —
Puj — k1
Pik Pkj
N
a ik x k a kj xj — a
(2
)X j
— k = 1 20x1 AOxk ^—O X Xi
and inductively ^n> — ai] xj
Pij —
the result follows.
^f n A
>
xi
n
Corollary 6.5. Let A = ((a ij )) be an N x N matrix with positive entries and let B be the (N — 1) x (N — 1) matrix obtained from A by striking out an ith row and jth column. Then the spectral radius of B is strictly smaller than that of A.
132
DISCRETE-PARAMETER MARKOV CHAINS
Proof. Let Z oo be the largest positive eigenvalue of B and without loss of generality take B = ((a 1 : i, j = 1, ... , N — 1)). Since, for some positive x, N
Y
a,j x j = 2 o x 1 ,
i = 1, .. , N, x i > 0,
j=1
we have N-1
a,jxj
(Bx)i =
= 2oxi — a,NXN < 2OX;,
1= 1,. ..,N— 1.
j=1
Thus, by the property (6.26) applied to Z oo we must have
.loo
<2 g .
n
Corollary 6.6. Let A = ((a ;j )) be a matrix of strictly positive elements and let 2 0 be the positive eigenvalue of maximum magnitude (spectral radius). Then A 0 is a simple eigenvalue. Proof. Consider the characteristic polynomial p(A) = det(A — Al). Differentiation with respect to A gives p'(2) = —yk=, det(A k — ) I), where A k is the
matrix obtained from A by striking out the kth row and column. By Corollary 6.5 we have det(A k — 1 o I) 0, since each A k has smaller spectral radius than ). o . Moreover, since each polynomial det(A k — AI) has the same leading term, they all have the same sign at A = A. Thus p'(2 o ) 0 0. n Another formula for the spectral radius is given in Section 14.
7 STEADY-STATE DISTRIBUTIONS FOR GENERAL FINITE-STATE MARKOV PROCESSES In general, the transition law p may admit several essential classes, periodicities, or inessential states. In this section, we consider the asymptotics of p" for such cases. First suppose that S is finite and is, under p, a single class of periodic essential states of period d> 1. Then the matrix p ° , regarded as a (one-step) transition probability matrix of the process viewed every d time steps, admits d (equivalence) classes C o , C l , ... , C_ 1 , each of which is aperiodic. Applying (6.16) to Cr and p d (instead of S and p) one gets, writing N, for the number of elements in Cr ,
°
)In/vr ^pi; ^ — ic j 1 '< (1 — Nr S r
for i, j a Cr,
(7.1)
where n j = lim n . p;^°l, v, is the smallest positive integer such that p!y•d > > 0
133
STEADY-STATE DISTRIBUTIONS
for all i, j e C„ and b r = min{ pij, d ) : i, j e C,}. Let 6 = min{b,: r = 0, 1, ... , d — l}. Then one has, writing L = min{ N,: 0 < r < d — 1 }, v = max{v,: 0 < r < d — 1 }, piid)—ßj1 <(1 —Lc5)I'I° for i, je C, (r=0,1,. ..,d— 1). (7.2) )
If i E C, and j c C S with s = r + m (mod d), then one gets, using the facts that pkjd) = 0 if k is not in Cs , and that Y-kEC. pik ) = 1 ,
^ Pij
d+m)
71j )I — 7I j l = I I Pik)(Pkjd) —
for i e Cr,
(I — L5)'"
j
E Cs .
keC s
(7.3)
Of course, d+m') = 0
if i e C„ j E Cs , and m' s — r (mod d).
(7.4)
Note that {nj : j e Cr } is the unique invariant initial distribution on C r for the restriction of p d to C,. Now, in view of (7.4), nd
=
n—I
pit'd+m)
if je C„ j E Cs , and m = s — r (mod d). (7.5)
—
t=1
no
If r = s then m = 0 and the index of the second sum in (7.5) ranges from 1 to n. By (7.3) one then has I nd
lim
-
pii = 7r J.
Y
)
(i, j ES).
(7.6)
(i, j e S).
(7.7)
n—ac n t=1
This implies, on multiplying (7.6) by 1/d, n
lim -
n•
Pü _ — )
n—m nt=1
d
Now observe that it = {n j /d: j e S} is the unique invariant initial distribution of the periodic chain defined by p since k keS
d
>i
, lim kes n-oo
1 -
n
n [=I
P ik
)
n
= lim - Y Pi;+ I) =
n-xnI
Pkj
7C
(j e S).
d
Moreover, if it = {i c j : j e S} is another invariant distribution, then -
(7.8)
134
DISCRETE-PARAMETER MARKOV CHAINS
Frj =
it]
nkPkj — kES
(7.9)
(t = 1, 2, ...),
keS
so that, on averaging over t = 1, ... , n — 1, n,
fj =
1
na
Y_ Y ^k Pkj = =1 kES
nk ^ keS
The right side converges to Y- k n k (rc j /d) =
1
n=1
(7.10)
Pk( i
n j /d, as n —* oc by (7.7). Hence
n = /d. ;
If the finite state space S consists of b equivalence classes of essential states (b > 1), say 61,, &'z , ... .9b , then the results of the preceding paragraphs may be applied to each ß, for the restriction of p to ef, separately. Let it denote the unique invariant initial distribution for the restriction of p to f;. Write n ` for the probability distribution on S that assigns probability 71j(` to states j e 9i and zero probabilities to states not in 4 Then it is easy to check that ^6 = I a ; t" is an invariant initial distribution for p, for every b-tuple of nonnegative numbers ... , a b satisfying = 1. The probabilistic interpretation of this is as follows. Suppose by a random mechanism the equivalence class .6 ; is chosen with probability a i (1 < i < b). If tf, is the class chosen in this manner, then start the process with initial distribution n ' on S. The resulting Markov chain {X„} is a stationary process. It remains to consider the case in which S has some inessential states. In the next section it will be shown that if S is finite, then, no matter how the Markov chain starts, after a finite (possibly random) time the process will move only among essential states, so that its asymptotic behavior depends entirely on the restriction of p to the class of essential states. The main results of this section may be summarized as follows. (
)
)
)
(
)
Theorem 7.1. Suppose S is finite. (i) If p is a transition probability matrix such that there is only one essential class ' of (communicating) states and these states are aperiodic, then there exists a unique invariant distribution it such that y j , S n ; = 1 and (6.16) holds for all i, j e if. (ii) If p is such that there is only one essential class of states o' and it is periodic with period d, then again there exists a unique invariant distribution it such that ;E , n j = 1 and (7.3), (7.4), (7.7) hold for i, j e and i; = (iii) If there are b equivalence classes of essential states, then (i), (ii), as the case may be, apply to each class separately, and every convex combination of the invariant distributions of these classes (regarded as state spaces in their own right) is an invariant distribution for p.
135
MARKOV CHAINS: TRANSIENCE AND RECURRENCE PROPERTIES
8 MARKOV CHAINS: TRANSIENCE AND RECURRENCE PROPERTIES Let {X„} be a Markov chain with countable state space S and transition probability law p = ((p ij )). As in the case of random walks, the frequency of returns to a state is an important feature of the evolution of the process. Definition 8.1. A state j is said to be recurrent if PJ (X„ =j i.o.) = 1,
(8.1)
PJ(X„ =j i.o.) = 0.
(8.2)
and transient if
Introduce the successive return times to the state j as t^° =0,
'}=min{n>0:X„=j},
}
(8.3)
r=min{n> T:X„=j}
(r= 1,2,...),
with the convention that Tj(' is infinite if there is no n > Write
for which X„ =j.
}
pj1=Pj(X„=i for some n? 1)=Pj (r;
(8.4)
Using the strong Markov property (Theorem 4.1) we get Pj(T!
')
< 00) = Pj (ii'
-
' < oo and Xt,,-„ + = i for some n i 1) }
= Ej( 1 {ty- < = E j (1,-„ <
}
PXt , r - u (X„ = i R,(X„ = i
for some n > 1))
for some n? 1))
= Ej( 1 {t^.- „ <x)Pij) = Pj(T,'-n < 00 )P«.
(8.5)
Therefore, by iteration, Pj(T ! ' ) < cc) = Pj(i{ l } < cc)P' - ' = Pji P - `
(r = 2, 3, ...).
(8.6)
In particular, with i =j, )
(r=1,2,3,...).
Pj(T('
Now Pj (X„ =j for infinitely many n) = Pj (rj(' < cc for all r) }
(8.7)
136
DISCRETE-PARAMETER MARKOV CHAINS
1 i = ^Pii=1 = Jim P i (T;r ^ < oo) f 0
..
r
if pii < 1.
(8.8)
Further, write N(j) for the number of visits to the state j by the Markov chain {X„}, and denote its expected value by G(i,j) = E.N(j).
(8.9)
Now by (8.6) and summation by parts (Exercise 1), if i j then cc
E1N(j) _
Pr+t <
Pj(N(j) > r) _ r=0
)
op) =
r=0
Pji
(8.10)
r=0
so that, if i 0 j, 0 G(i, j) = Pii/(l - pü)
00
if i +* j, i.e., if p ii = 0, if i --, j and p
1, (8.11) if i -+ f and pii = 1.
This calculation provides two useful characterizations of recurrence; one is in terms of the long-run expected number of returns and the other in terms of the probability of eventual return.
Proposition 8.1 (a) Every state is either recurrent or transient. A state j is recurrent iff p ii = 1 iff G(j, j) = oc, and transient if pii < 1 if G(j, j) < oo. (b) If j is recurrent, j --* i, then i is recurrent, and pi; = p ii = 1. Thus, recurrence (or transience) is a class property. In particular, if all states communicate with each other, then either they are all recurrent, or they are all transient. (c) Let j be recurrent, and S(j) _ {i e S: j -- i) be the class of states which communicate with j. Let n be a probability distribution on S(j). Then P,r (X„ visits every state in S(j) i.o.) = 1.
(8.12)
Proof. Part (a) follows from (8.8), (8.11). For part (b), suppose j is recurrent and j -► i (i j). Let A r denote the event that the Markov chain visits i between the rth and (r + 1)st visits to state). Then under Pi , A, (r >, 0) are independent events and have the same probability 6, say. Now 0 > 0. For if 0 = 0, then PJ (X„ = i for some n >, 1) = Pj ((J r>0 A,) = 0, contradicting j -• i. It now follows from the second half of the Borel-Cantelli Lemma (Chapter 0, Lemma 6.1) that PJ (A. i.o.) = 1. This implies G(j, i) = oo. Interchanging i and j in (8.11) one then obtains p,, = 1. Hence i is recurrent. Also, pi; >_ Pj (A r i.o.) = 1. By the same argument, p ii = 1.
MARKOV CHAINS: TRANSIENCE AND RECURRENCE PROPERTIES
137
To prove part (c) use part (b) to get
P* (X„ visits i i.o.) = 1] n k P,k (X„ visits i i.o.) = I n,^ = 1. (8.13) k c S(j) k
E
S(j)
Hence
Pn( n {XX visits i i.o.}) = 1.
(8.14)
iES(j)
n Note that G(i, i) = 1/(1 — p a ), i.e., replace p, by 1 in (8.10). For the simple random walk with p> Z, one has (see (3.9), (3.10) of Chapter I), j (1 — 2p)
)'-
G(i, j) _ (qp
1/(1-2p)
for i > j, (8.5)
for i<j.
Proposition (8.1) shows that the difference between recurrence and transience is quite dramatic. If) is recurrent, then P P (N(j) = cc) = 1. If] is transient, not only is it true that P3 (N(j) < oo) = 1, but also E1 (N(j)) < oo. Note also that every inessential state is transient (Exercise 7).
Example 1. (Random Rounding). A real number x >, 0 is truncated to its (greatest) integer part by the function [x] := max{n E 7L: n < x}. On the other hand, [x + 0.5] is the value of x rounded-off to the nearest whole number. While both operations are quite common in digital filters, electrical engineers have also found it useful to use random switchings between rounding and truncation in the hardware design of certain recursive digital multipliers. In random rounding, one applies the function [x + u], where u is randomly selected from [0, 1), to digitize. The objective in such designs is to remove spurious fixed points and limit cycles (theoretical complement 8.1). For the underlying mathematical ideas, first consider the simple onedimensional digital recursion (with deterministic round-off), x„ + i = [ax„ + 0.5], n = 0, 1, 2, ... , where dal < 1 is a fixed parameter. Note that x o = 0 is always a fixed point; however, there can be others in 71 (depending on a). For example, if a = Z, then x o = 1 is another. Let U 1 , U 2 ,... be an i.i.d. sequence of random variables uniformly distributed on [0, 1) and consider X, = [aX„ + U, + ,], n >, 0 on the state space 7L. Again X0 = 0 is a fixed point (absorbing state). However, for this case, as will now be shown, X. --> 0 as n —• oo with probability 1. To see this, first note that 0 is an essential state and is accessible from any other state x. That 0 is accessible in the case 0 < a < I goes as follows. For x > 0, [ax + u] _ [ax] < ax <x if u E [0, 1 — (ax — [ax] )). If x < 0, ax 0 7L, ^[ax + u]I = [ax] + II < alx) if ue(1 + ([ax] — ax), 1). In either case, these
DISCRETE-PARAMETER MARKOV CHAINS
138
events have positive probability. If ax c 11, then I[ax + u]I = laxl. Since 0 is absorbing, all states x ; 0 must, therefore, be inessential, and thus transient. To obtain convergence, simply note that the state space of the process started at x is finite since, with probability 1, the process remains in the interval [ — Ixl, Ixl]. The finiteness is critical for this argument, as one can readily see by considering for contrast the case of the simple asymmetric (p > 2) random walk on the nonnegative integers with absorbing boundary at 0. Consider now the k-dimensional problem X n+ 1 = [AX n + U n+ 1 ], n = 0, 1, 2, , where A is a k x k real matrix and {U n } is an i.i.d. sequence of random vectors uniformly distributed over the k-dimensional cube, [0, 1) k and [ ] is defined componentwise, i.e., [x] = ([x 1 ], ... , [x k ]), x e 11 k It is convenient to use the norm II II defined by Ilx lie := max{Ix,l, • • • , x,j}. The ball B,(0):= {x: Ilxllo < r} (of radius r centered at 0) for this norm is the square of side length 2r centered at 0. Assume II Ax II o < II x II o, for all x 0 (i.e., IIAII := sup, , 0 I1AxiI 0 /Ilx11 0 < 1) as our stability condition. Once again we wish to show that X n —• 0 as n —> oo with probability 1. As in the one-dimensional case, 0 is an (absorbing) essential state that is accessible from every other state x because there is a subset N(x) of [0, 1) k having positive volume such that for each u c N(x), one has II[Ax + u]110 < IIAxllo < Ilxll o . So each state x # 0 is inessential and thus transient. The result follows since the process starting at x e 71k does not escape B 11x1 ^ 0 (0), since II[Ax + u]110 < I1Axllo + I < 11x1, + 1 and, since II[Ax + u]Il o and Ilx110 are both integers, (I[Ax + u](lo < 11x10 Linear models X n+ , = AX n + E n+ ,, with {E n } a sequence of i.i.d. random vectors, are systematically analyzed in Section 13.
9 THE LAW OF LARGE NUMBERS AND INVARIANT DISTRIBUTIONS FOR MARKOV CHAINS The existence of an invariant distribution is intimately connected with the limiting frequency of returns to recurrent states. For a process in steady state one expects the equilibrium probability of a state j to coincide with the fraction of time spent on the average by the process in state j. To this effect a major goal of this section is to obtain the invariant distribution as a consequence of a (strong) law of large numbers. Assume from now on, unless otherwise specified, that S comprises a single class of (communicating) recurrent states under pfor the Markov chain {X}. Let f be a real-valued function on S, and define the cumulative sums Let
Sn =
f(Xm)
(n = 1, 2, ...).
(9.1)
m=0
For example, if f (i) = 1 for i = j and f (i) = 0 for i j, then S/(n + 1) is
139
THE LAW OF LARGE NUMBERS
the average number of visits to j in time 0 to n. As in Section 8, let t.r denote the time of the rth visit to state j. Write the contribution to the sum S, from the rth block of'time (tr, zr + "] as, Zj r I I I
Z r =
(r=0,1,2,...).
f^(X,,,)
I
(9.2)
in =+ I
By the strong Markov property, the conditional distribution of the process {Xt l ,, Xt ..,, + ,, .. , X r ir +n , ...}, given the past up to time r, is PP , which is the distribution of {X 0 , X,, ... , Xn , ...} under X0 =1. Hence, the conditional distribution of Zr given the process up to time Tcr' is that of Z o = [(X 1 ) + + f (X T „), given X 0 =1. This conditional distribution does not change with the values of X 0 , X 1 .... , X(= j), rfr'. Hence, Zr is independent of all events that are determined by the process up to time rfr . In particular, Zr is independent of Z,, ... , Z r _ . Thus, we have the following result. )
Proposition 9.1. The sequence of random variables {Z 1 , Z 2 , ...} is i.i.d., no matter what the initial distribution of {X,,: n = 0, 1, 2, ...} may be. This decomposition will be referred to as the renewal decomposition. The strong law of large numbers now provides that, with probability 1, r
lim Z Z= EZ 1 , r-+x r
(9.3)
1
provided that EIZ 1 I < oc. In what follows we will make the stronger assumption provided that T 2,
I f(Xm)I < oo .
E m=
(9.4)
tJ! I ,
Write Nn for the number of visits to state j by time n. Now, NN =max{r>0:tjr'
(9.5)
Then (see Figure 9.1), r n
1 J( m) m=0
N„ r=1
tih „*
1^
n+I
For each sample path there are a finite number, t 1, of summands in the first sum on the right side, except for a set of sample paths having probability zero. Therefore,
140
DISCRETE-PARAMETER MARKOV CHAINS
S
Al)
T f(i)
f(Xn)
T
T
x,=r
f(j)
f(j)
f(j)
f(j)
f(j)
---- T-------I— Y-------------T--------------z--f(k)
T
T
k
0
1
•..
C (') + 1 10)...
CO tI... i (')
T oi„
ii +l ... T in
L N. +I...
n n+ I ... T iN.+u...
f(xn«1)
7... ZIA
U Z,
Zi
I
t ...
ZN
I Z.N,
"-I
Sn
Figure 9.1 1
lim
T 11)
^ f(Xm ) = 0,
with probability 1.
(9.7)
n-.m n m=0
The last sum on the right side of (9.6) has at most r;'"' ) — r( N ^ ) summands, this number being the time between the last visit to j by time n and the next visit to j. Although this sum depends on n, under the condition (9.4) we still have that (Exercise 1) 1T
'" a l l
1
Y_ .f(X.)I'
n m=n
T'N-' 1 1
If(Xm)^ +0 a.s.
I
as n— co.
(9.8)
n m-r"+1
Therefore, N 1 N^
S _ 1 N^
n
2] Z,+R n = " — E Z,+R n , Nn ,= 1 n, 1n
(9.9)
where R n --* 0 as n --> oo with probability 1 under (9.4). Also, for each sample path outside a set of probability 0, N N —• oo as n —• oo and therefore by (9.3) (taking limit over a subsequence) 1 N^
lim — Y Z r = EZ i (9.10)
n_^
if (9.4) holds. Now, replacing f by the constant function f - 1 in (9.10), we have
THE LAW OF LARGE NUMBERS
141
lim L_ — n—x
-
_' = E(tj 2) — T i
(9.11)
Nn
assuming that the right side is finite. Since n—T
J
T (N,J (N " ) < T(N"+ 1) — J J
which is negligible compared to N n as n — oo, one gets (Exercise 2)
tim n = E( T ^. 2 ) — r'')
(9.12)
, Nn
Note that the right side, E(r 2) — t'), is the average recurrence time of j(= E j tj(' ) ), and the left side is the reciprocal of the asymptotic proportion of time spent at]. Definition 9.1. A state j is positive recurrent if E T;' < oo. ;
)
(9.13)
Combining (9.9)—(9.11) gives the following result. Note that positive recurrence is a class property (see Theorem 9.4 and Exercise 4). Theorem 9.2. Suppose j is a positive recurrent state under p and that f is a real-valued function on S such that
E {If(X1)I + ... + If(X)I} < oo. ;
(9.14)
Then the following are true. (a) With F3 -probability 1, n
lim -- Z f(Xm) = E;(.f (X,) + ... + .i(X , ))/E^T; 1) .
(9.15)
n-aao n m=O
(b) If S comprises a single class of essential states (irreducible) that are all positive recurrent, then (9.15) holds with probability I regardless of the initial distribution. Corollary 9.3. If S consists of a single positive recurrent class under p, then for all i, j E S, lim
1
- n-
n
n
P;;) _ m=i Et
(9.16)
142
DISCRETE-PARAMETER MARKOV CHAINS
A WORD ON PROOFS. The calculations of limits in the following proofs require the use of certain standard results from analysis such as Lebesgue's Dominated Convergence, Fatou's Lemma, and Fubini's theorem. These results are given in Chapter 0; however, the reader unfamiliar with these theorems may proceed formally through the calculations to gain acquaintance with the statements.
Proof Take f to be the indicator function of the state j, f(j)
(9.17)
I for all k ^ j.
f(k)=0
Then Zr - 1 for all r = 0, 1, 2, ... since there is only one visit to state j in (T cr) , i(r+1)] Hence, taking expectation on both sides of (9.15) under the initial state i, one gets (9.16) after interchanging the order of the expectation and the limit. This is permissible by Lebesgue's Dominated Convergence Theorem since (n + I)—' YOf(Xm)I < 1. It will now be shown that the quantities defined by i;
=(Eji; n ) - '
(9.18)
(jES)
constitute an invariant distribution if the states are all positive recurrent and communicate with each other. Let us first show that Y- t it = I in this case. For this, introduce the random variables
Ti(" ) = # {m e (r (.' ) , i (.' +i)]: Xm =
i}
(r = 0, 1, 2, ...),
(9.19)
i.e., T i ' is the amount of time the Markov chain spends in the state i between the rth and (r + 1)th passages to j. The sequence {T: r = 1, 2, ...} is i.i.d. by the strong Markov property. Write (
)
(i) = E J T;' .
O
(9.20)
)
Then,
EJ T; 1) = Ej ( T}1)) = E j ( t f2 _ r ) = ±-.
03(i) _ ics
1
l
l
(9.21)
7f
Also, taking f to be the indicator function of {i} and replacing j by i in Theorem 9.2, one obtains
#{m
m = = Ir! n:X lim h w probability it 1. n-+ao
(9.22)
n
On the other hand, by the strong law of large numbers applied to
THE LAW OF LARGE NUMBERS
143
r = 1, 2, ...}, and by (9.12) and (9.18), the limit on the left side also equals N„
V T(r) nJ IN, T1'
lim -h---- = lim " n-• oc
N
n- x n \r1
= n 1 Bj (i).
(9.23)
n
Hence, ni = n j Oj (i).
(9.24)
Adding over i, one has using (9.21),
71
(9.25)
f
By Scheffe's Theorem (Theorem 3.7 of Chapter 0) and (9.16), In -
— 71i
—+0
as n —^ oo .
(9.26)
iES nm =1
Hence, 1 — iES
iES
1 TCi — - Y- pj,•
Pam) Pij <
" ')
n m= 1
--> 0
as n
n m= 1
iES
(9.27)
Therefore, n
ni Pi; = lim E - iES
Y Pjl Pi; ")
n-^ iESn m=1
in
= hm - Y
E p^.m)p . `
n-^ao n m= 1 iES
1
= lim
n
(m+l) _
Pi;
— it; •
(9.28)
n-'x n m=1
In other words, it is invariant. Finally, let us show that for a positive recurrent chain the probability distribution it above is the only invariant distribution. For this let it be another invariant distribution. Then it = z niPij (m = 1, 2, ...). ;
)
iES
(9.29)
144
DISCRETE-PARAMETER MARKOV CHAINS
Averaging over m = 1, 2, ... , n, one gets
1 n; = Y_ ni — ieS
i
(9.30)
Pi;
n m=1
The right side converges to Y_ i n i n; = nj by Lebesgue's Dominated Convergence Theorem. Hence n; = n; . We have proved above that if all states of S are positive recurrent and communicate with each other, then there exists a unique invariant distribution given by (9.18). The next problem is to determine what happens if S comprises a single class of recurrent states that are not positive recurrent. Definition 9.2. A recurrent state j is said to be null recurrent if Ej t,' = cc.
(9.31)
If j is a null recurrent state and i.—.j, then for the Markov chain having initial state i the sequence {Z,: r = 1, 2, ...} defined by (9.2) with f - 1 is still an i.i.d. sequence of random variables, but the common mean is infinity. It follows (Exercise 3) that, with Pi -probability 1, j (N")
llm ' = cc. n^oo Nn
Since n
we have lim
n+l
= cc,
n- w Nn
and, therefore, lim N = 0 ". te n+l
with P- probability 1.
"
(9.32)
Since 0 < N" /(n + 1) < 1 for all n, Lebesgue's Dominated Convergence Theorem applied to (9.32) yields N lim E " ;
n
-^
n+1
= 0.
(9.33)
But EiNn = Ei(t. 1 (x.., =J)) = Z pi) , m=1
m=1
(9.34)
THE LAW OF LARGE NUMBERS
145
and (9.33), (9.34) lead to p„) = 0
lim
(9.35)
n-.o Il + 1 m=1
if j is null-recurrent and j -+ i (or, equivalently, j H i). In other words, (9.16) holds if S comprises either a single class of positive recurrent states, or a single class of null recurrent states. In the case that j is transient, (8.11) implies that G(i, j) < co, i.e., 00 p;; )
In particular, therefore, lmp;; = 0
(in S),
)
(9.36)
n -.
if j is a transient state. The main results of this section may be summarized as follows. Theorem 9.4. Assume that all states communicate with each other. Then one has the following results. (a) Either all states are recurrent, or all states are transient. (b) If all states are recurrent, then they are either all positive recurrent, or all null recurrent. (c) There exists an invariant distribution if and only if all states are positive recurrent. Moreover, in the positive recurrent case, the invariant distribution it is unique and is given by n= (E;i ')) 1
(j E S).
(9.37)
(d) In case the states are positive recurrent, no matter what the initial distribution p, if E,, I f (X 1 )I < cc, then n
him ! - Z f(Xm) = Z nif(i) = Enf(X,) n— c
m=1
(9.38)
ieS
with Pµ -probability 1.
Proof. Part (a) follows from Proposition 8.1(a). To prove part (b), assume that j is positive recurrent and let i 0 j. Since , one has ET;' < co. Hence (9.23) holds. Also, (9.22) holds with iv i > 0 if i is positive recurrent, and it = 0 if i is null recurrent (use (9.32) T`r) < r+1) _ T^'
)
)
146
DISCRETE-PARAMETER MARKOV CHAINS
in the latter case). Thus (9.24) holds. Now O (i) > 0; for otherwise T = 0 with probability 1 for all r > 0, implying p , = 0, which is ruled out by the assumption. The relation (9.24) implies it > 0, since it > 0 and O (i) > 0. Therefore, all states are positive recurrent. For part (c), it has been proved above that there exists a unique invariant probability distribution it given by (9.35) if all states are positive recurrent. Conversely, suppose it is an invariant probability distribution. We need to show that all states are positive recurrent. This is done by elimination of other possibilities. If the states are transient, then (9.36) holds; using this in (9.29) (or (9.30)) one would get it = 0 for all j, which is a contradiction. Similarly, null recurrence implies, by (9.30) and (9.35), that ii = 0 for all j. Therefore, the states are all positive recurrent. Part (d) will follow from Theorem 9.2 if: ;
(i) The hypothesis (9.14) holds whenever
(9.39)
Y iil.f(i)I < oo, ics
and n (ü)
E;Zo = E;(.i(X1) + ... + .i(XT;1)) = —1
i
f(i).
(9.40)
tj ics
To verify (i) and (ii), first note that by the definition of
=)
T!
I
(9.41)
If(i)IT;'
If(Xm)I = m
iES
Since ET;' = O (i) = n i /rr (see Eq. 9.24), taking expectations in (9.41) yields Ta
E(
If(i)I"`.
If(i)IT')) =
If(Xm)I = E m=T^ ) +I
ies
iES
(9.42)
7rj
The last equality follows upon interchanging the orders of summation and expectation, which by Fubini's theorem is always permissible if the summands are nonnegative. Therefore (9.14) follows from (9.39). Now as in (9.42), T cn
.i(Xm) = E( , i(i)T;' )
E;Z0 = EZi = E m
_ ics
+i
lES
f(i)ET' ) = 1 f(i) 1 ,
(9.43)
I)
where this time the interchange of the orders of summation and expectation is justified again using Fubini's theorem by finiteness of the double "integral."
n
THE LAW OF LARGE NUMBERS
147
If the assumption that "all states communicate with each other" in Theorem 9.4 is dropped, then S can be decomposed into a set . t of inessential states and (disjoint) classes S l , S 2 , ... , St of essential states. The transition probability matrix p may be restricted to each one of the classes S..... , S, and the conclusions of Theorem 9.2 will hold individually for each class. If more than one of these classes is positive recurrent, then more than one invariant distribution exist, and they are supported on disjoint sets. Since any convex combination of invariant distributions is again invariant, an infinity of invariant distributions exist in this case. The following result takes care of the set , of inessential states in this connection (also see Exercise 4).
Proposition 9.5 (a) If j is inessential then it is transient. (b) Every invariant distribution assigns zero probability to inessential, transient, and null recurrent states. Proof. (a) If j is inessential then there exist i E S and m > I such that P;M > 0
and
)
p;; = 0 )
for all n >, 1.
(9.44)
Hence PJ(N(j)
(9.45)
By Proposition 8.1, j is transient, since (9.45) says j is not recurrent. (b) Next use (9.36), (9.35) and (9.30) and argue as in the proof of part (c) of Theorem 9.2 to conclude that nj = 0 if j is either transient or null recurrent.
n Corollary 9.6. If S is finite, then there exists at least one positive recurrent state, and therefore at least one invariant distribution n. This invariant distribution is unique if and only if all positive recurrent states communicate. Proof. Suppose it possible that all states are either transient or null recurrent. Then lim n -.^
— n+1
p7° = 0 m
for all i, j e S.
(9.46)
-p
Since (n + 1) ' I, _ , p;T < I for all], and there are only finitely many states j, by Lebesgue's Dominated Convergence Theorem, )
148
DISCRETE-PARAMETER MARKOV CHAINS
lim jes n
=
1
1
Pi; ^) = lim
n+ 1 m=1
" Pi;
nix jes n+ 1 m=o
1
lim
(m) %^ij
— n -. x
n + 1 m=0 jcs n
=tim
1 — i 1 =lim n--1 =1. (9.47)
(n + 1 m=o -
n-x,n+1
But the first term in (9.47) is zero by (9.46). We have reached a contradiction. Thus, there exists at least one positive recurrent state. The rest follows from Theorem 9.2 and the remark following its proof. •
10 THE CENTRAL LIMIT THEOREM FOR MARKOV CHAINS The same method as used in Section 9 to obtain the law of large numbers may be used to derive a central limit theorem for S. = Ym =o f(Xm ), where f is a real-valued function on the state space S. Write = Ef(X0) = 1] it f(i),
(10.1)
ieS
and assume that, for Z, =
L
, r fr, + 1 f (Xm ), r = 0, 1, 2, ... Ej (Z0
- E ; Zo ) 2 < co.
(10.2)
Now replace f by f = f - µ, and write _
n _
Sn = Y- .J(X.) = Z (f(Xm) - U), m=0
Zr =
m=0
^ J(X)
(10.3)
(r = 0, 1, 2, ...).
m =,(i +1
Then by (9.40), Ej Zr = (Ejt)En f(Xo) = 0
(r = 0, 1, 2, ...).
(10.4)
Thus {Z,: r = 1, 2, ...} is an i.i.d. sequence with mean zero and finite variance or = EJ
. (10.5)
Now apply the classical central limit theorem to this sequence. As r -• cc, (1/^) jk =1 Zk converges in distribution to the Gaussian law with mean zero and variance U 2 . Now express Sn as in (9.6) with f replaced by f, S" by S,, and
149
THE CENTRAL LIMIT THEOREM FOR MARKOV CHAINS
Zr by Zr, to see that the limiting distribution of (1/^)S n is the same as that of (Exercise 1) 1 N,,
7
r
(
i
Na )112
1
N"
(10.6)
JNn r=1
n
We shall need an extension of the central limit theorem that applies to sums of random numbers of i.i.d. random variables. We can get such a result as an extension of Corollary 7.2 in Chapter 0 as follows. Proposition 10.1. Let {XX : j >, 1} be i.i.d., EX^ = 0, 0 < a 2 := EX J2 < oc. Let {v n : n >, l} be a sequence of nonnegative integer-valued random variables with v
(10.7)
in probability
lim " = a n-•,, n
XX/✓v, converges in distribution to
for some constant a > 0. Then N(0, a 2 ).
Without loss of generality, let a = 1. Write S n := X, F > 0 arbitrarily. Then, Proof.
P(IS,•" — Small > c([na])
1/2
) < P(Ivn — [na]I
+
P(
+ • •
+ Xn . Choose
s 3 [na])
max
ISm — (n]1 % c([na]) 1J2 )
(mI — (na1I
(10.8)
The first term on the right goes to zero as n —> x, by (10.7). The second term is estimated by Kolmogorov's Maximal Inequality (Chapter 1, Corollary 13.7), as being no more than {s([na]) 1 J z ) -z (8 3 [na]) = r.
(10.9)
This shows that S—" — Sfna1 —+ 0 1[na])
1r2
in probability.
(10.10)
Since S1na1 /([na]) l " 2 converges in distribution to N(0, 1), it follows from (10.10) that so does S"/([na])'/ z . The desired convergence now follows from (10.7).
n
By Proposition 10.1, N»'2 1 Zr is asymptotically Gaussian with mean zero and variance o z . Since N„/n converges to (Er') ', it follows that the expression in (10.6) is asymptotically Gaussian with mean zero and variance
150
DISCRETE-PARAMETER MARKOV CHAINS
(E j i5 l) ) -1 0 2 . This is then the asymptotic distribution of defining, as in Chapter I, Wn(t) =
n - '/ 2 5,,.
Moreover,
Sin(]
n+l
(10.11)
I
l' (t) = W(t) +
(nt — [nt])X 1+ 1 (t '> 0),
n+l
all the finite-dimensional distributions of {W„(t)}, as well as {W(t)}, converge in distribution to those of Brownian motion with zero drift and diffusion coefficient D = (Ejij1>)-'Q2
(10.12)
(Exercise 2). In fact, convergence of the full distribution may also be obtained by consideration of the above renewal argument. The precise form of the functional central limit theorem (FCLT) for Markov chains goes as follows (see theoretical complement 1). Theorem 10.2. (FCLT). If S is a positive recurrent class of states and if (10.2) holds then, as n -+ oo, l4(I) = (n + 1) -112 S„ converges in distribution to a Gaussian law with mean zero and variance D given by (10.12) and (10.5). Moreover, the stochastic process {W(t)} (or {W(t)}) converges in distribution to Brownian motion with zero drift and diffusion coefficient D. Rather than using (10.12), a more convenient way to compute D is sometimes the following. Write E,^, Var, f , Cov,, for expectation, variance, and covariance, respectively, when the initial distribution is it (the unique invariant distribution). The following computation is straightforward. First write, for any two functions h, g that are square summable with respect to it, = Z h(i)g(i)r1 = E,,[h(X0)g(X0)].
(10.13)
ics
Then,
f(Xm)
Var,,((n + 1)-1/2) = E,d (
M=O
JJ
Z
'(n + 1)
= E, MZ0 1 2(Xm) +2 mm 1 M^ O J (Xm')J (Xm)]'(n + 1) ,
/
n + 1 m=0
E,J 2 (Xm)
ABSORPTION PROBABILITIES
151
+ —?
Y E,,[J (Xm , )Erz(f (Xm) I Xr')]
I
n + 1 m=1 m0 2
= EJ 2(X0) +
— Y Y Erz[f 1
(Xm , )(P
m-m
J )(XYr')]
n + 1 m=1 m=0 rn-
I
= EJ 2 (X0 ) + —^
Erz
% (X0)(P m
'.j)(X0)]
n + 1 m=1 m'=0 2
n
m-1
=ErzfZ(X0)+ ---- Z _Z <J,Pm-mJ>rz ,
n + 1 m=1 m'=0
=Erzf 2 (X0)+
f,
2
1 Y-
k Y-Pi
I n
n+ 1 m = 1 k= 1 I
( k=m—m ' )
(10.14)
Now assume that the limit m
y=
lim
I
exists and is finite. Then it follows from (10.14) that D = lim Var,,((n + l)_' 2 ) = E,, f 2 (Xo ) + 2y = n^ o
irj f 2 (j) + 2y. (10.16) is
Note that
(10.17)
and m Y <j P k j >rz = k=1
m COV{f(X0),J (Xk)l. k=1
The condition (10.15) that y exists and is finite is the condition that the correlation decays to zero at a sufficiently rapid rate for time points k units apart as k , oo.
11 ABSORPTION PROBABILITIES Suppose that p is a transition probability matrix for a Markov chain {Xn } starting in state i e S. Suppose that j e S is an absorbing state of p, i.e., p 1, Pik = 0 for k # j. Let r j denote the time required to reach j,
152
DISCRETE-PARAMETER MARKOV CHAINS
= inf{n: X. =j}. (11.1)
rj
To calculate the distribution of i j , consider that P1(r > m) = Y-* Per, PiIi2... Pi-- ;m ,
(11.2)
where Y* denotes summation over all m-tuples (i,, i 2 , ... , i„) of elements from S — { j}. Now let p denote the matrix obtained by deleting the jth row and jth column from p,
°
p°
=
((p ik : i, k E S — { j})).
(11.3)
Then, by definition of matrix multiplication, the calculation (11.2) may be expressed as (11.4)
Pi(t j > m) _ L P kkm1, k
and, therefore, Pi(tj=m)=ZA<M-11_LPklml k
7 Ai.
(11.5)
k
Observe that the above idea can be applied to the calculation of the first passage time to any state j e S or, for that matter, to any nonempty set A of states such that i 0 A. Moreover, it follows from Theorem 6.4 and its corollaries that the rate of absorption is, therefore, furnished by the spectral radius of p ° . This will be amply demonstrated in examples of this section. Proposition 11.1. Let p be a transition probability matrix for a Markov chain {X„} starting in state i. Let A be a nonempty subset of S, i 0 A. Let T A = inf{n >, 0: X„ E A}.
(11.6)
Then, R(TA-m)=1—Y_Pk^m),
m=1,2,...,
(11.7)
k
°
where p is the matrix obtained by deleting the rows and columns of p corresponding to the states in A.
°
In general, the matrix p is not a proper transition probability matrix since the row sums may be strictly less than 1 upon the removal of certain columns from p. However, if each of the rows in p corresponding to states j e A is replaced by ej having 1 in the jth place and 0 elsewhere, then the resulting matrix , say, is a transition probability matrix and
153
ABSORPTION PROBABILITIES
Pi(TA
< m) = 1 —
Pik )
(11.8)
k¢A
The reason (11.8) holds is that up to the first passage time t o the distribution of Markov chains having transition probability matrices p and (starting at i) are the same. In particular, Pik ) =Pk(m) fori,k0A.
(11.9)
In the case that there is more than one state in A, an important problem is to determine the distribution of X,,, starting from i A. Of course, if P; ('r A < co) < 1 then X, is a defective random variable under Pi, being defined on the set {r A < co} of P; -probability less than 1. Write aJ(i):= Pi ({T A < co, XXA =j}) (j E A, i E S).
(11.10)
By the Markov property, (conditional on X, in (11.10)), a j (i)_>P ik a i (k)
(jEA,ieS).
(11.11)
k
Denoting by a j the vector (a(i): i E S), viewed as a column vector, one may express (11.10) as (je A).
aj=paj
(11.12)
Alternatively, (11.11) or (11.12) may be replaced by
a j (i) _ E p k a J (k)
for i A,
k
(11.13) _ 1 a ' (j) 0
ifi=j if i E A, but i A j
(^ E A).
A function (or, vector) a = (a(i): i ES) is said to be p-harmonic on B(c S) if a(i) = (pa)(i)
for i e B.
(11.14)
Hence a j is p-harmonic on A` and has the (boundary-)values on A, 1 a ' (i)
f o if
ifi=j i e A, i^ j.
We have thus proved part (a) of the following proposition.
(11.15)
154
DISCRETE-PARAMETER MARKOV CHAINS
Proposition 11.2. Let p be a transition probability matrix and A a nonempty subset of S. (a) Then the distribution of Xtq , as defined by (11.10), satisfies (11.13). (b) This is the unique bounded solution if and only if forallieS.
P; (T A
(11.16)
Proof. (b) Let i e A. Then Pil o = P1 (T A > n)1 P,(T A = cc)
as n T oo.
(11.17)
kEAl
Hence, if (11.16) holds, then um p;V = 0
for all i, k e A. (11.18)
n-. .0
On the other hand, Pil o =P(TA
Now let a be another solution of (11.13), besides a j . Then a satisfies (11.12), which on iteration yields a = p"a. Taking the limit as n j oo, and using (11.18), (11.19), one gets a(i) = tim a(k) _ n—w k
a k (i)a(k) = a(i)
(11.20)
keA
for all je A`, since a(k) = 0 for k E A\{ j} and a(j) = 1. Hence a ; is the unique solution of (11.13). Conversely, if P; (T A < cc) < 1 for some in A`, then the function h = (h(i): i e S) defined by h(i):= I — P; (T A < cc) = Pi(TA = 00)
(i n S),
(11.21)
may be shown to be p-harmonic in A with (boundary-)value zero on A. The harmonic property is a consequence of the Markov property (Exercise 5), h(i) = Pi(TA = 00 ) = Y_ PikPk(TA = 00 ) k
_ Y_ Pik') _ Y_ P ik h(k) k
k
(i E A`).
(11.22)
ABSORPTION PROBABILITIES
155
Since P;(t A = 0) = 1 for je A, h(i) = 0 for je A. It follows that both a j and a J + h satisfy (11.13). Since h r 0, the solution of (11.13) is not unique. •
Example 1. (A Random Replication Model). The simple Wright-Fisher model originated in genetics as a model for the evolution of gene frequencies. Here the mathematical model will be described in a somewhat different physical context. Consider a collection of 2N individuals, each one of which is either in favor of or against some issue. Let X„ denote the number of individuals in favor of the issue at time n = 0, 1, 2, .... In the evolution, each one of the individuals will randomly re-decide his or her position under the influence of the current overall opinion as follows. Let B„ = X /2N denote the proportion in favor of the issue at time n. Then given X 0 , X...... X, each of the 2N individuals, independently of the choices of the others, elects to be in favor with probability 0,, or against the issue with probability I — 0„. That is, X
(2N) 8n(1 — B„)zN k
P(-Y„+1 = k X o , X,, ... , X„)
-
(11.23)
for k = 0, 1, ... , 2N. So {X„} is a Markov chain with state space S = {0, 1, ... , 2N} and one-step transition matrix p = ((p, j )), where Pik _ (2N^^2N)j(I
)2N-i —
i,
j = 0, 1, ... , 2N.
(11.24)
Notice that {X„} is an aperiodic Markov chain. The "boundary” states {0} and {2N} form closed classes of essential states. The set of states { 1, 2, ... , 2N — I } constitute an inessential class. The model has a special conservation property of the form of the following martingale property, E(X„+1 1 X o , X 1 ..... X„) = E(X„+1 1 X„) = (2N)2N = X„, (11.25)
for n = 0, 1, 2, .... In particular, therefore, EX,,, = E{E(X„+, 1 X o , X t , ... , X„)} = EX„,
(11.26)
for n = 0, 1, 2..... However, since S is finite we know that in the long run {X„} is certain to be absorbed in state 0 or 2N, i.e., the population is certain to eventually come to a unanimous opinion, be it pro or con. It is of interest to calculate the absorption probabilities as well as the rate of absorption. Here, with A = {0, 2N}, one has p = p. Let a(i) denote the probability of ultimate absorption at j = 0 or at j = 2N starting from state i e S. Then,
156
DISCRETE-PARAMETER MARKOV CHAINS
for 0< i < 2N, a2N( 1 ) _ > hika2N(k) k / a2N( 2 N) = 1, a2N(0) = 0,
(11.27)
and a o (i) = 1 — a 2N (i). In view of (11.26) and (11.19) we have i = E ; X0
2N = E . X,, _ kp ;k) —. 0a 0 (i) + 2 Na2N(i) = 2 Na2N(i) k=0
for i = 1, ... ,
2N — 1. Therefore, a 2N() i = ao(i)
2N
,
i
= 2N —i 2N '
=0, 1, 2,...,2N,
(11.28)
i = 0, 1, 2, ... , 2N.
(11.29)
Check also that (11.29) satisfies (11.27). In order to estimate the rate at which fixation of opinion occurs, we shall calculate the eigenvalues of p(= p here). Let v = (v o v 2N )' and consider the eigenvalue problem ,
...
,
2N
E' p j vj =2v i , ;
j=o
i
=0,1,...,2N.
(11.30)
The rth factorial moment of the binomial distribution (p ij : 0 < j < 2N) is
(2N — rl
(2N — r + 1) Z j(j — 1)... (j — r + l)pij =( 1r(2N)... 2N) j =r j — r j=o (jJ_r i 2N -j x 2N) 1 — 2N
_ (_ N I r (2N)(2N — 1)...(2N — r + 1), jj (11.31) for r = 1, 2, ... , 2N. Equation (11.31) contains a transformation between "factorial powers" and "ordinary powers" that deserves to be examined for connections with (11.30). The "factorial powers" (j) r := j(j — 1)... (j — r + 1) are simply polynomials in the "ordinary powers" and can be expressed as
j(1 — 1)•••(j — r + 1) _ Likewise, "ordinary powers"
r k =1
skjk.
(11.32)
jr can be expressed as polynomials in the "factorial
ABSORPTION PROBABILITIES
157
powers" as r
(11.33)
Sk(J)k,
J r = k=0
with the convention (j) 0 = 1. Note that S r' = 1 for all r > 0. The coefficients Sr), {Sk} are commonly referred to as Stirling coefficients of the first and second kinds, respectively. Now every vector v = (v 0 , ... , v 2N )' may be represented as the successive values of a unique (factorial) polynomial of degree 2N evaluated at 0, 1, . .. , 2N (Exercise 7), i.e., 2N
a r (j)r forj=0,1,...,2N.
(11.34)
r=0
According to (11.24), (11.32), 2N
2N
j =0 r=0
2N
2N
!2N\2N
/2N) r
S( r jr = Y- ar r Y- a r Y- pij(J)r —r=0 2N n=0 j=0 (2N) r=0 (2N)
pij vj = L-
ar
t J
\
2N /2N
=
Y- ar (2N (2N)r )r S»)(1)n.
(11.35)
n=0 r=n
It is now clear that (11.30) holds if and only if 2N
(2N)'
a, r
(2N)r S »
(11.36)
7
n
In particular, taking n = 2N and noting S' = 1, we see that (11.36) holds if
_ (2 N )2N
_ ^`'2N
(2N)2N +
(2N)
a2N
I, —
\ 1 1.37 ) a2n
and a r = & 2N) , r = 2N — 1,... ‚0, are solved recursively from (11.36). Next takea2N 1) — 0 , a2N-1 1) = 1 and solve for (2N)2N_
1
(2N)2N-1'
etc.
Then, 2r (2N)' = ( 1 2N)...(1 — r — 1),
0 < r < 2N.
(11.38)
158
DISCRETE-PARAMETER MARKOV CHAINS
Notice that .l o = A l = 1. The next largest eigenvalue is A 2 = 1 — (1/2N). Let V = ((v ij )) be the matrix whose columns are the eigenvectors obtained above. Writing D = diag(a. 0 , A.1, ... , 2 2 ,), this means pV = VD,
(11.39)
p = VDV -1 ,
(11.40)
p'° = VDmV -1 .
(11.41)
or
so that
Therefore, writing V' = ((v i ')), we have from (11.8) 2N-1 2N
2N-1
2N 2N-1
Y_ Y_ Ak vikU k ' _ Y
Pi(T(o.2N) > m) = Y p,, j=1
Y_ V ik V k ' tik . (11.42)
k=O j=1
j=1 k=O
Since the left side of (11.42) must go to zero as m —• oo, the coefficients of and A; - 1 must be zero. Thus, 2N 2N-1
Pi(T(0.2N) > m) _ Y, Y, Vikv k ^^k k=O j=1
_ ^2
Lt
, E1 1
vi2v 2i ) + kY 1
— (const.).12
, Y1 1
for large m.
Vik Uk,/ \^ 2/ m J
(11.43)
Example 2. (Bienayme—Galton—Watson Simple Branching Process). A simple branching process was introduced as Example 3.7. The state X,, of the process at time n represents the total number of offspring in the nth generation of a population that evolves by i.i.d. random replications of parent individuals. If the offspring distribution is denoted by f then, as explained in Section 3, the one-step transition probabilities are given by .f * `(j),
ifi>'1,j>'0,
p ij = 1
ifi=0,j=0,
0
ifi=0,j 0.
(11.44)
According to the values of p i j at the boundary, the state i = 0 is absorbing (permanent extinction). Write p io for the probability that eventually extinction occurs given X0 = i. Also write p = p lo . Then p io = p i , since each of the i sequences of generations arising from the i initial particles has the same chance
ABSORPTION PROBABILITIES
159
p of extinction, and the i sequences evolving independently must all be extinct in order that there may be eventual extinction, given X0 = i. If f(0) = 0, then p = p, o = 0 and extinction is impossible. If f(0) = 1, then pro % pi = I and extinction is certain (no matter what X o is). To avoid these and other trivialities, we assume (unless otherwise specified) -
0
f(0)+f(1)<1.
(11.45)
Introduce the probability generating function of f: .
0(z) _
f(j)z' = f( 0 ) + i =o
i= 1
f(j)z i ( IzI '< 1).
(11.46)
Since a power series can be differentiated term by term within its radius of convergence, one has ( Iz1 < 1).
4 (z) = — 4)(z) _ Y_ jf(j)z i ' ,
dz
(11.47)
j=1
Y_
If the mean y of the number of particles generated by a single particle is finite, i.e., if µ=
(11.48)
ifU) < oo,
i =1
then (11.47) holds even for the left-hand derivative at z = 1, i.e., (11.49)
Since ç&'(z) > 0 for 0 < z < 1, 0 is strictly increasing. Also, since 4) "(z) (which exists and is finite for 0 < z < l) satisfies dz zz
O(z) _
j(j — 1)f(j)z'
2
>0
for 0 < z < 1, (11.50)
i =2
the function 0 is strictly convex on [0, 1]. In other words, the line segment joining any two points on the curve y = 4)(z) lies strictly above the curve (except at the two points joined). Because 0(0) = f(0) >0 and 4)(l) _ f(j) = 1, the graph of ¢ looks like that of Figure 11.1 (curve a or b). The maximum of ¢'(z) is y, which is attained at z = 1. Hence, in the case µ > 1, the graph of y = 4)(z) must lie below that of y = z near z = 1 and, because 4)(0) = f(0) > 0, must cross the line y = z at a point z o , 0 < z o < 1. Since the slope of the curve y = 4)(z) continuously increases as z increases in (0, 1), z o is the unique solution of the equation z = 4)(z) that is smaller than 1.
160
DISCRETE-PARAMETER MARKOV CHAINS
4(0) = f(0)
U
Zo
I
z
Figure 11.1 In case u <, 1, y = cb(z) must lie strictly above the line y = z, except at z = 1. For if it meets the line y = z at a point z o < 1, then it must go under the line in the immediate vicinity to the right of z o , since its slope falls below that of the line (i.e., unity). In order to reach the height 4(l) = 1 (also reached by the line at the same value z = 1) its slope then must exceed 1 somewhere in (z o , 1]; this is impossible since 0'(z) < 0'(1) = p < I for all z in [0,1]. Thus, the only solution of the equation z = 4(z) is z = 1. Now observe P=Pio=
P(X1=»X0= 1 )Pjo= ^.f(1)P'=q5 (P),
(11.51)
thus if p <, 1, then p = 1 and extinction is certain. On the other hand, suppose p> 1. Then p is either z o or 1. We shall now show that p = z o (< 1). For this, consider the quantities q„'=P(X„=0I Xo = 1)=Piö (n=1,2,...).
(11.52)
That is, q„ is the probability that the sequence of generations originating from a single particle is extinct at time n. As n increases, q. j p; for clearly, {X„=0}c{X,,,=0} for all m>,n,so that q„<,qifn<m.Also lim X„ = 0 = U {X„ = 0} = {extinction occurs). Now, by independence of the generations originating from different particles,
ABSORPTION PROBABILITIES
161
P(X„=O1 Xo=j)=q i (J=O, q,,+1 =P(X,+1 =01X 0 =
1)=P(X I
1,2,...),
=01X0 = 1)
cIJ
+ Y- P(X 1 =J,X,+1 =O^Xo= I) j=1 co
=f( 0 )+ Z P(X 1 = j I Xo = 1 )P(X„+1 = 0 1X0 = 1, X 1 =J) j=1
_ f(0) +
(11.53)
(n = 1, 2, ...).
f(J)q,' = O(qn) i =1
Since q 1 = f(0) = 4ß(0) < 4(z 0 ) = z o (recall that çb(z) is strictly increasing in z for 0 < z < 1), one has using (11.53) with n = 1, q 2 = o(q 1 ) < O(z o ) = z o , and so on. Hence, q„
If f(0) + f(1) = 1 and 0
Z kPik+l^ _
k(Z PliPik) k=1 \j=1
k=1
_^ ^k P ^n1c11—^ ^i(k^111 1j Pjk — P1j Pjk l `
J.
k=1 j=1
ro
O
p E(XI I Xo =j)=
_
pi jE(XI 1X0 = 1)
)
j=1
j=1 a.
ao
_
k=1
Pi;Ju = P
1Pij = µE(X„ I X o =
1).
(11.54)
j= 1 j=1
Continuing in this manner, one obtains E(X,,+ 1 I Xo = 1) = uE(X,, j X0 = 1) = p 2 E(XX- j Xo = 1 ) =...=p"E(X 1 IXo =1)=µn+ 1 .
(11.55)
It follows that
E(X,, I X0 =1) =.IP ° .
(11.56)
Thus, in the case µ < 1, the expected size of the population at time n decreases to zero exponentially fast as n -+ co. If µ = 1, then the expected size at time n does not depend on n (i.e., it is the same as the initial size). If µ > 1, then the expected size of the population increases to infinity exponentially fast.
162
DISCRETE-PARAMETER MARKOV CHAINS
12 ONE-DIMENSIONAL NEAREST-NEIGHBOR GIBBS STATE The notion of a Gibbs distribution has its origins in statistical physics as a probability distribution with respect to which bulk thermodynamic properties of materials in equilibrium can be expressed as expected values (called phase averages). The thrust of Gibbs' idea is that a theoretically convenient way in which to view materials at the microscopic scale is as a large system composed of randomly distributed but interacting components, such as the positions and momenta of molecules comprising the material. To arrive at an appropriate probability distribution for computing large-scale averages, Gibbs argued that the probability of a given configuration of component values in equilibrium should be inversely proportional to an exponential function of the total energy of the configuration. In this way the lowest-energy configurations (ground states) are the most likely (modes) to occur. Moreover, the additivity of the combined total energy of two noninteracting systems is reflected in the multiplication rule for (exponential) probabilities of independent values for such a specification. It is a tribute to the genius of Gibbs that this approach leads to the correct thermodynamics at the bulk material scale. Perhaps because of the great success these ideas have enjoyed in physics, Gibbs' probability distributions have also been introduced to represent systems having a large number of interacting components in a wide variety of other contexts, ranging from both genetic and automata codes to economic and sociological systems. For purposes of orientation one may think of random values {X„: n e A} (states) from a finite set S distributed over a finite set of sites. The set of possible configurations is then represented by the cartesian product S2:_ {w = ( 6„: n e A) n --> fi n e S. n e A, is a function on Al. For A and S finite, S2 is also a finite set and a (free-boundary) Gibbs distribution on S2 can be described by a probability mass function (p.m.f.) of the form
I
P(Xn = o, n e A):= Z ` exp{ —ßU(cw)}, -
w = ( 6,,) e S
(12.1)
where U(w) is a real-valued function on S2, referred to as total potential energy of configurations as e S2, ß is a positive parameter called inverse temperature, and
Z:=
Z exp{—ßU(w)},
(12.2)
WE
is the normalization constant, referred to as the partition function. Observe that if P is any probability distribution on a finite set S2 that assigns strictly positive probability p(w) to each co e S2, then P can trivially be expressed in the form (12.1) with U(w) = — ß ' log{p(w)}. In physics, however, one regards the total potential energy as a sum of energies at individual sites plus the sum of interaction energies between pairs of sites plus the sum of energies between triples, etc., and the probability distribution is specified for various types of such interactions. For example, if U(w) = > fEA q1(6 n ) for -
ONE-DIMENSIONAL NEAREST-NEIGHBOR GIBBS STATE
163
co = (o„: n E A) is a sum of single-site energies, higher-order interactions being zero, then the distribution (12.1) is that of independent components; similarly for the case of infinite temperature ((i = 0). It is both natural and quite important to consider the problem of extending the above formulation to the so-called infinite volume setting in which A is a countably infinite set of sites. We will consider this problem here for one-dimensional systems on A = Z consisting of interacting components taking values in a finite set S and for which the nonzero interaction energies contributing to U are at most between pairs of nearest-neighbor (i.e., adjacent) integer sites. Let Q = S 1 for a finite set S and let .F be the sigmafield generated by finite-dimensional cylinder sets of the form C={w=(a„)efto k =s k forkeA}, (12.3) where A is a finite subset of Z and Sk e S is specified for each k e A. Although it would be enough to consistently prescribe the probabilities for such events C, except for the independent case it is not quite obvious how to do this starting with energy considerations. First consider then the independent case. For single-site energies q 1 (s, n), se S. at the respective sites ne Z, the probabilities of cylinder events can be specified according to the formula P(C) =
Zn' exp{ —ß l
Y_
cp, (s k , k)}, (12.4) J
keA
where ZA appropriately normalizes P for each set of sites A. In the homogeneous (or translation-invariant) case we have cp,(s, n) = cp l (s), s e S, depending on the site n only through the component values s at n. Now suppose that we have in addition to single-site energies cp, (s), pairwise nearest-neighbor interactions represented by (p 2 (s, n; t, m) for In — ml = 1, s, t e S. For translation invariance of interactions, take cp 2 (s, n; t, m) = q 2 (s, t) if In — ml = 1 and 0 otherwise. Now because values inside of A should be dependent on values across boundaries of A it is less straightforward to consistently write down expressions for P(C) than in (12.4). In fact, in this connection it is somewhat more natural to consider a (local) specification of conditional probabilities of finite-dimensional events of the form C, given information about a configuration outside of A. Take A = {n} and consider a specification of the conditional probability of C = {X„ = s} given an event of the form {Xk = S k for k e D„ \{n}}, where D„ is a finite set of sites that includes n and the two neighboring sites n — 1 and n + 1, as follows P(X„= sIXk =s k ,keD\{n}) = Z,,.''„ exp{
—
ß[cpi(s) + (P2(s, s„ -1) + (P2(s, sn +i)] }>
(12.5)
where Zn.n„=
Y exp{
scs
—
ß [9,(s)+92(s>s,- 1)+42(s,S,, +i)] }.
(12.6)
164
DISCRETE—PARAMETER MARKOV CHAINS
That is, the state at n depends on the given states at sites in D„ \ {n} only through the neighboring values at n — 1, n + 1. One would like to know that there is a probability distribution having these conditional probabilities. For the one-dimensional case at hand, we have the following basic result. be the Theorem 12.1. Let S be an arbitrary finite set, S2 = S^, and let sigmafield of subsets of ) generated by finite-dimensional cylinder sets. Let {Xn } be the coordinate projections on S2. Suppose that P is a probability measure on S with the following properties. (i) P(C) > 0 for every finite-dimensional cylinder set C e S. (ii) For arbitrary n c- Z let 3(n) = {n — 1, n + 1 } denote the boundary of {n} in 7L. If f is any S-valued function defined on a finite subset D,, of 7/ that contains {n} u ä(n) and if a e S is arbitrary, then the conditional probability P(X„=a Xm = f(m),meD„\{n}) depends only on the values of f on a(n). (iii) The value of the conditional probability in (ii) is invariant under translation by any amount 0 e 7L. Then P is the distribution of a (unique) stationary (i.e., translation-invariant) Markov chain having strictly positive transition probabilities and, conversely, every stationary Markov chain with strictly positive transition probabilities satisfies (i), (ii), and (iii). Proof. Let gb,C(a)=P(X„=1 X.- =b,X„+„ =c).
(12.7)
The family of conditional probabilities {g b (a)} is referred to as the local structure of the probability distribution. We will prove the converse statement first and along the way we will calculate the local structure of P in terms of the transition probabilities. So assume that P is a stationary Markov chain with strictly positive transition matrix ((p(a, b): a, b e S)) and marginal distribution (n(a): a E S). Consider cylinder set probabilities given by ... p(ab_ 1, ab). (12.8) P(XX = a o , X„+ = a l , ... , X,,+n = ab) = ir(ao)p(ao, a, )
So, in particular, the condition (i) is satisfied; also see (2.9), (2.6), Prop. 6.1. For m and n > I it is a straightforward computation to verify P(Xo=al X-m=b_„,,...,X_, =b-,,X1 =b1,...,XX =b^)
=P(X0 =ajX- 1
=b-1,X1 =b1)
= p(b-1, a)p(a, bi) /p
j2)
(b- i, bi
)
= gb— e. b,(a).
(12.9)
ONE-DIMENSIONAL NEAREST-NEIGHBOR GIBBS STATE
165
Therefore, the condition (ii) holds for P. since condition (iii) also holds because P is the distribution of a stationary Markov chain. Next suppose that P is a probability distribution satisfying (i), (ii), and (iii). We must show that P is the distribution of a stationary Markov chain. Fix an arbitrary element of S, denoted as 0, say. Let the local structure of P be as defined in (12.7). Observe that for each b, c E S. g( ) is a probability measure on S. Outlined in Exercise 1 are the steps required to show that g(a)
q(b, a)q(a, c)
=q^z^(b ^)
(12.10)
0 , (b) g g 9 (0)
(12.11)
where q(b,
c)
=
and q("+
'>
(b, c) = I q(b, a)q(a, c).
(12.12)
a€S
An induction argument, as outlined in Exercise 2, can be used to check that P(`ik+h = ak , 1 < k < r I Xh+ l -" = a, Xh+r+n = b)
=
q " (a, a1)q( ai, a2 . .. q (")( ar b) (
)
)
q (r+ 2n - 1)(Q b)
b)
>
(12.13)
l ,
for h E 7L, a , a, b E S. n, r > 1. We would like to conclude that P coincides with the (uniquely determined) stationary Markov chain having transition probabilities ((q(b, c))) defined by normalizing ((q(b, c))) to a transition probability matrix according to the special formula ;
4(b,c)=
q( b, c)µ(c)
--- _ Y q(b,a),u(a)
9(b, c)µ(c)
— Au(b)
(12.14)
aES
where p = (µ(a)) is a positive eigenvector of q corresponding to the largest (in magnitude) eigenvalue ), = Amax of q. Note that q is a strictly positive transition probability matrix with unique invariant distribution n, say, and such that (Exercise 3) 9(b, c) = q^"
)
(b, c) µ(c)
(12.15)
2 "u(b)
Let Q be the probability distribution of this Markov chain. It is enough to
166
DISCRETE-PARAMETER MARKOV CHAINS
show that P and Q agree on the cylinder events. Using (12.13) we have for any n
1,
P(X0 =a o ,...,Xr = ar) P(X-n= a,Xn+r=b)P(X0=a0,...,Xr=arI X - n = a,Xn+r=b)
_
aES bES
P(X-n = a, Xn+r = b) q
_
(n) (a ,
ao)9(ao, a,) ... q(ar-,, ar)q
(n)(a,, b)
q(r+2n)(a, b)
aES bES
Y_ Y_ P ( X
-
n = a, Xn+r = b) 4
(n) (a ,
a0)4(a0, a1). . • (a r _ 1, ar)4 (n) (ar , b ) q(r
aeS beS
+2n)(a, b)
(12.16) Now let n - oo to get by the fundamental convergence result of Proposition 6.1 (Exercise 4), P(X0 = a 0 ,
... ,
Xr = a,) = ir(ao)4(ao, a1)...4(ar -i, ar)
=Q(X0
(12.17)
=a o ,...,Xr =a r ).
Probability distributions on 7L that satisfy (i)-(iii) of Theorem 12.1 are also referred to as one-dimensional Markov random fields (MRF). This definition has a natural extension to probability distributions on 7L called the d-dimensional Markov random field. For that matter, the important element of the definition of a MRF on a countable set A is that A have a graph structure to accommodate the notion of nearest-neighbor (or adjacent) sites. The result here shows that one-dimensional Markov random fields can be locally specified and are in fact stationary Markov chains. While existence of a probability distribution satisfying (i)-(iii) can be proved for any dimension d (for finite S), the probability distribution need not be unique. This interesting phenomenon is known as a phase transition. Existence may fail in the case that S is countably infinite too (see theoretical complement 1).
°
13 A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS The canonical construction of Markov chains on the space of trajectories has been explained in Chapter I, Section 6, using Kolmogorov's Existence Theorem. In the present section and the next, another widely used general method of construction of Markov processes on arbitrary state spaces is illustrated. Markovian models in this form arise naturally in many fields, and they are often easier to analyze in this noncanonical representation. Example 1. (The Linear Autoregressive Model of Order One, or the AR(1) Let b be a real number and {E n : n > 1} an i.i.d. sequence of real-valued
Model).
A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS
167
random variables defined on some probability space (f), .F, P). Given an initial random variable X 0 independent of {E„}, define recursively the sequence of random variables {X": n > 0} as follows: X,, X 1 := bX o + c ,
Xn+ 1 := bXn +„
(n 0).
(13.1)
As X0 , X 1 ... . , Xn are determined by {X 0 , s„ .... E n }, and E" + , is independent of the latter, one has, for all Bore! sets C, P(Xn+t E
Cl {X0 , X 1 ...
.. Xn}) = [P(bx + c,,1 E C)]x=x„
= [P(En+ ^ E C — bx)] =x„ = Q(C — bXn), (13.2)
where Q is the common distribution of the random variables t.• n . Thus {Xn : n >, 0} is a Markov process on the state space S = ER', having the transition probability (of going from x to C in one step) p(x, C):= Q(C — bx),
(13.3)
and initial distribution given by the distribution of X 0 . The analysis of this Markov process is, however, facilitated more by its representation (13.1) than by an analytical study of the asymptotics of n-step transition probabilities. Note that successive iteration in (13.1) yields X,=bX0 +E,,
X 2 =bX,+f 2 =b 2 X 0 +he,+c 2 ••
XX = bnX0 + bn-' E1 + bn-2 c2 + ... + be„-, + e„
(13.4)
(n % I).
The distribution of Xn is, therefore, the same as that of Y„ := b"X0 + e, + bs 2 + b 2 e 3 + • • . + b""'F
(n >, 1).
(13.5)
Assume now that IbI < 1
(13.6)
and le n s < c with probability I for some constant c. Then it follows from (13.5) that b"En+^
Yn —'
a.s.,
(13.7)
n=0
regardless of X0 . Let it denote the distribution of the random variable on the right side in (13.7). Then Y. converges in distribution to it as n -, oo (Exercise 1). Because the distribution of X. is the same as that of Yn , it follows that X. converges in distribution to n. Therefore, is is the unique invariant distribution for the Markov process {Xn }, i.e., for p(x, dy) (Exercise 1).
168
DISCRETE-PARAMETER MARKOV CHAINS
The assumption that the random variable s, is bounded can be relaxed. Indeed, it suffices to assume
"
E P(IE I > cS") < oo 1
for some 6 <---‚ and some c > 0.
I
(13.8)
I
For (13.8) is equivalent to assuming P(IE" + 1 I > c8") < co so that, by the Borel—Cantelli Lemma (see Chapter 0, Section 6), P(IE"+ lI < cS" for all but finitely many n) = 1. This implies that, with probability 1, Ib "E" +l l < c(Ibj6)" for all but finitely many n. Since IbI b < 1, the series on the right side of (13.7) is convergent and is the limit of Y". It is simple to check that (13.8) holds if I bi < 1 and (Exercise 3) Eie 1 i'
for some r > 0.
(13.9)
The conditions (13.6) and (13.8) (or (13.9)) are therefore sufficient for the existence of a unique invariant probability it and for the convergence of X. in distribution to it.
_>
Next, Example 1 is extended to multidimensional state space. Example 2. (General Linear Time Series Model). Let {E": n 1} be a sequence of i.i.d. random vectors with values in Rm and common distribution Q, and let B be an m x m matrix with real entries b . Suppose X o is an m-dimensional random vector independent of {E" }. Define recursively the sequence of random vectors ;;
X0,X
1'= BXn+En +1
(n =0, 1,2,...).
(13.10)
As in (13.2), (13.3), {X"} is a Markov process with state space 68'" and transition probability
p(x, C):=Q(C — Bx)
(for all Borel sets C
c Qm). (13.11)
Assume that
IIB"°II < 1
for some positive integer n o .
( 13.12)
Recall that the norm of a matrix H is defined by IIHII:= sup IHxi, IXI =1
(13.13)
169
A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS
where Ixl denotes the Euclidean length of x in LRm. For a positive integer n > n o write n = jn o + j', where 0 < j' < n o . Then using the fact IIB1B2II IIB1II IIB2II for arbitrary m x m matrices B,, B 2 (Exercise 2), one gets
II WI = IIB'"°B' II < IIB"°IVIiB' II < cIIB"°II'>
c
max {IIB•II:0 < r < n o }. (13.14)
From (13.12) and (13.14) it follows, as in Example 1, that the series Z B"E, converges a.s. in Euclidean norm if (Exercise 3), for some c > 0,
" °^
1 1 P(i 1 I > cb") < oo for some 6 <
II
Bn0Il11n0
.
(13.15)
Write, in this case, Y:= E
B"sn+l (13.16)
n=O
It also follows, as in Example 1 (see Exercise 1), that no matter what the initial distribution (i.e., the distribution of X 0 ) is, X n converges in distribution to the distribution it of Y. Therefore, it is the unique invariant distribution for p(x, dy). For purposes of application it is useful to know that the assumption (13.12) holds if the maximum modulus of eigenvalues of B, also known as the spectral radius r(B) of B, is less than 1. This fact is implied by the following result from linear algebra.
Lemma. Let B be an m x m matrix. Then the spectral radius r(B) satisfies r(B) >, Iii
(13.17)
JIB"Il li ".
n—ai
Proof. Let A,, ... , A m be the eigenvalues of B. This means det(B — Al) = (^ 1 — A)(A2 — A) • * (Am — A), where det is shorthand for determinant and I is the identity matrix. Let A m have the maximum modulus among the A ; , i.e., J > Iiml then B — Al is invertible, since det(B — AI) 0. Indeed, I'ml = r(B). If Al by the definition of the inverse, each element of the inverse of B — Al is a polynomial in A (of degree m — 1 or m — 2) divided by det(B — AI). Therefore, one may write (B—AI) -1 =(A, —A) -1 ...(A m —A) -] (B o +ß.B 2 +...+,lm -1 B m - 1 ) (IAI > kml),
(13.18)
where B,(0 < j < m — 1) are m x m matrices that do not involve A. Writing z = 1/A, one may express (13.18) as
170
DISCRETE-PARAMETER MARKOV CHAINS m-1
(B — AI)-1 = (_)_m(1 — /^1,)-1.. .(1 — )!mR)-12m-1
j=0
m-1
= (-1)mz(l —.l,z) -1 ...(I —.mZ) 1 I z m 1-, Bj j=0 Z Y a,, Z n) ^
n=0
-1
Zm-1-jBj
(IZI < I2ml-1 ).
(13.19)
/-0
On the other hand, (B — ):I)-1 = —z(I — zB)' = —z
IZI < 1zkB
k0
=
. (13.20)
IIBII
To see this, first note that the series on the right is convergent in norm for Izl < 1 /1111, and then check that term-by-term multiplication of the series I z k B k by I — zB yields the identity I after all cancellations. In particular, writing b;j'^ for the (i, j) element of B k , the series X
—zY_z k b;
(13.21)
k=0
converges absolutely for Izl < 1/ B. Since (13.21) is the same as the (i, j) element of the series (13.19), at least for Izl < I/IIBII, their coefficients coincide (Exercise 4) and, therefore, the series in (13.21) is absolutely convergent for Izl < IAm) -' (as (13.19) is). This implies that, for each e > 0, I14 I < (I.l m l + E) k for all sufficiently large k.
(13.22)
For if (13.22) is violated, one may choose Izl sufficiently close to (but less than) 1/I ; m l such that Iz^ k ' ) b'>l -- co for a subsequence {k'}, contradicting the requirement that the terms of the convergent series (13.21) must go to zero for IZI < 1/IAmI•
Now IIB k ll < m' 12 max{Ib>I: 1 < i, j < m} (Exercise 2). Since m l/21 —^ 1 as k --* co, (13.22) implies (13.17). n Two well-known time series models will now be treated as special cases of Example 2. These are the pth order autoregressive (or AR(p)) model, and the autoregressive moving-average model ARMA(p, q). Example 2(a). (AR(p) Model). Let p > 1 be an integer, ß o , ß,, . .. , ß P -, real constants. Given a sequence of i.i.d. real-valued random variables {ri n : n > p}, and p other random variables U0 , U,..... UP _, independent of {ri n }, define recursively
A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS
171
p-1
ßU+1
°— 1 Un+p iln+p
>, 0).
(n
+
(13.23)
=0
i
The sequence { Un } is not in general a Markov process, but the sequence of p-dimensional random vectors Xn'=(Un , Un+1 , ... , U„+p-1)'
(n
0)
(13.24)
is Markovian. Here the prime (') denotes transposition, so X. is to be regarded as a column vector in matrix operations. To prove the Markov property, consider the sequence of p-dimensional i.i.d. random vectors 0
(13.25)
(n>, 1),
,ij +p-1Y
and note that X„+, = BX n + r n+1 (13.26) where B is the p x p matrix 0
1
0
0
••
0
0
0
0
1
0
•..
0
0
.
B:=
(13.27)
0
0
0
0
ß0
F'1
N2
1'3
... ' •
Np
Hence, arguing as in (13.2), (13.3), or (13.11) state space R. Write
1
0 -
2 ßp-1
X,; is a Markov process on the
—).
1
0
0
•••
0
0
0
—)
1
0
•••
0
0
...
.
•••
—^
1
' * *
/'p-2
1 1 p-1 — a
B—A.(= 0
0
YO
I'1
0
0
/32/33
Expanding det(B — Al) by its last row, and using the fact that the determinant of a matrix in triangular form (i.e., with all zero off-diagonal elements on one side of the diagonal) is the product of its diagonal elements (Exercise 5), one gets
det(B
—
AI) = (
-
)
1 p+1
(ß0 + ß1A + ... + ßp ,; ,p
-
-
Therefore, the eigenvalues of B are the roots of the equation
1 —
A").
(13.28)
172
DISCRETE-PARAMETER MARKOV CHAINS
ß0+ß1A+
...
+ßp-I )p-I
—
Ap =
(13.29)
0 .
Finally, in view of (13.17), the following proposition holds (see (13.15) and Exercise 3).
Proposition 13.1. Suppose that the roots of the polynomial equation (13.29) are all strictly inside the unit circle in the complex plane, and that the common distribution G of {tl"} satisfies l G({x e AB I : lxi > cS"}) < oo
for some S < ^— A mt
n
(13.30)
where JA m t is the maximum modulus of the roots of (13.29). Then (i) there exists a unique invariant distribution 71 for the Markov process {X"}, and (ii) no matter what the initial distribution, X n converges in distribution to it. Once again it is simple to check that (13.30) holds if G has a finite absolute moment of some order r > 0 (Exercise 3). An immediate consequence of Proposition 13.1 is that the time series { Un : n >, 0} converges in distribution to a steady state n U given, for all Bore! sets C R', by 1t (C):= ir({x E W: x ' e C}). (
(13.31)
)
To see this, simply note that U. is the first coordinate of X, so that X,, converges to it in distribution implies U. converges to it in distribution. Example 2(b). (ARMA(p, q) Model). The autoregressive moving-average model of order (p, q), in short ARMA(p, q), is defined by p — I
q
Un+p := Z ßi i =0
Un +i + 1 Sj^1 n+p—j + rin+p
(n 0),
(13.32)
j=1
where p, q are positive integers, ß i (0 < i < p — 1) and bj (1 < j < q) are real constants, {ri n : n >, p — q} is an i.i.d. sequence of real-valued random variables, and U. (0 < i <, p — 1) are arbitrary initial random variables independent of {rj"}. Consider the sequence {X"}, {s"} of (p + q)-dimensional vectors X. _ (Un , $ n := (0, 0, .
, Un+p—I. Iln+p—q+ . 0, 1%n + p — 1 , 0, . . ,
e In+p-1) , e
(n ? 0),
0 , j,,+,,_ 1 )/
where il n+p-1 occurs as the pth and (p + q)th elements of E.
(13.33)
A MARKOVIAN APPROACH TO LINEAR TIME SERIES MODELS
X,,+1
(n -> 0 ),
= HXn + En+ 1
173
(13.34)
where H is the (p + q) x (p + q) matrix
b ll .
...
h nl H :=
b 1 0 h
.
d9-1
...
0
...
b 2 d1
0
0
00
1
0 ..
00
0
00
0
1
•••
00
0
0
...
0
0•••
01
0
0
.••
0
0
00
...
the first p rows and p columns of H being the matrix B in (13.27). Note that U., ... , U P _ 1 , rl p _ q , ... , q p _, determine X 0 so that X o is independent of rl p and, therefore, of E,. It follows by induction that X. and E n+ are independent. Hence {X n } is a Markov process on the state space I8° + Q. In order to apply the Lemma above, expand det(H —;.I) in terms of the elements of its pth row to get (Exercise 5) ,
,
det(H — Al) = det(B — Al)(_2) 9
.
( 13.35)
Therefore, the eigenvalues of H are q zeros and the roots of (13.29). Thus, one has the following proposition.
Proposition 13.2. Under the hypothesis of Proposition 13.1, the ARMA(p, q) process {X n } has a unique invariant distribution n, and X. converges in distribution to it no matter what the initial distribution is. As a corollary, the time series { Un } converges in distribution to m given for all Borel sets C c R' by
ii
u
(C):= rz({x e RP + q: x
(
' E )
C}),
(13.36)
no matter what the distribution of (U0 , U„ ... , U p _,) is, provided the hypothesis of Proposition 13.2 is satisfied. In the case that E n is Gaussian, it is simple to check that under the hypothesis (13.12) in Example 2 the random vector V in (13.16) is Gaussian. Therefore, it is Gaussian, so that the stationary vector-valued process {X n } with initial distribution it is Gaussian (Exercise 6). In particular, if q n are Gaussian in Example 2(a), and the roots of the polynomial equation (13.29) lie inside the unit circle in the complex plane, then the stationary process {Un }, obtained
174
DISCRETE-PARAMETER MARKOV CHAINS
when (U0 , U I ... , U,,,_ 1 ) have distribution similar assertion holds for Example 2(b). ,
it
in Example 2(a), is Gaussian. A
14 MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS The method of construction of Markov processes illustrated for linear time series models in Section 13 extends to more general Markov processes. The present section is devoted to the construction and analysis of some nonlinear models. Before turning to these models, note that one may regard the process {X„} in Example 13.1 (see (13.1)) to be generated by successive iterations of an i.i.d. sequence of random maps a l , a...... a„, ... defined by x-->a„x=bx
+E„
(n> 1),
{s„: n >, l} being a sequence of i.i.d. real-valued random variables. Each a„ is random (affine linear) map on the state space R' into itself. The Markov sequence {X} is defined by X„ = a„ • . a,X (n > 1), (14.1) ()
where the initial X 0 is a real-valued random variable independent of the sequence of random maps {a„: n >, 1 }. A similar interpretation holds for the other examples of Section 13. Indeed it may be shown, under a very mild condition on the state space, that every Markov process in discrete time may be represented as (14.1) (see theoretical complement 1). Thus the method of the last section and the present one is truly a general device for constructing and analyzing Markov processes on general state spaces. Example 1. (Iterations of I.I.D. Increasing Maps). Let the state space be an interval J, finite or infinite. On some probability space (Q, .F , P) is given a sequence of i.i.d. continuous and increasing random maps {a„: n >, l} on J into itself. This means first of all that for each w E S2, a„(w) is a continuous and increasing (i.e., nondecreasing) function on J into J. Second, there exists a set I' of continuous increasing functions on J into J such that P(a„ e i') = I for all n; I' has a sigmafield .4(I') generated by sets of the form {y e I': a < yx < b} where a < b and x are arbitrary elements of J, and yx denotes the value of y at x. The maps a„ on Q into I' are measurable, i.e., F„ := {co e S2: a„(w) e D} E F for every D e .(r). Also, P(F) is the same for all n. Finally, {a„: n >, 1 } are independent, i.e., events {a„ e D„} is an independent sequence for every given sequence {D„} For any finite set of functions y,, Y 2 , ... , y k in I', one defines the composition Y1Y2' • •Yk in the usual manner. For example, y 1 y 2 x = y 1 (y 2 x), the value of y 1 at the point y2x.
175
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS
For each x e J define the sequence of random variables
X0(x):=x,
X,(x):=a„X„ - 1(x) = anan- 1
...
(n > 1). (14.2)
a,x
In view of the independence of {a„: n > 1}, {X„: n 0} is a Markov process (Exercise 1) on J, starting at x and having the (one-step) transition probability
p(y, C):= P(a„
y E C) = p({y e F: yy e Cl)
where y is the common distribution of
(C Borel subset of J), (14.3)
an .
lt will be shown now that the following condition guarantees the existence of a unique invariant probability it as well as stability, i.e., convergence of X„(x) in distribution to it for every initial state x. Assume
6, := P(X„ o (x) '< z 0 Vx) >0
and
82 :=
P(X„ o (x) z 0 Vx) >0 (14.4)
for some zo E J and some integer n o . Define A,:= sup I P(X,(x)
'< z) — P(X,(y) '< z).
(14.5)
x,y,zcJ
For the existence of a unique invariant probability it and for stability it is enough to show that A,, —> 0 as n —> co. For this implies P(X„(x) < z) converges, uniformly in z e J, to a distribution function (of a probability measure on J). To see this last fact, observe that X„ +m (x) _— a„ + m •a,x has the same distribution as a„• • a1„ +m • • a„ + ,x, so that
IP(Xn +m(x) < Z) — P(Xn(x)
<, z)1 = IP(Xn(Ln +m
...
an+lx)
<, Z) —
P(X„(x)
<, z)I < A„,
by comparing the conditional probabilities given a„ +m, ... , an+ 1 first. Thus, if A„ —+ 0, then the sequence of distribution functions {P(X„(x) < z)} is a Cauchy sequence with respect to uniform convergence for z E J. It is simple to check that the limiting function of this sequence is a distribution function of a probability measure is on J (Exercise 2). Further, A. --+ 0 implies that this limit it does not depend on the initial state x, showing that P is the unique invariant probability (Exercise 13.1). In order to prove A. --> 0, the first step is to establish, under the assumption (14.4), the inequality A„ o < 6:=max{1 — S„ 1 — ö z }. For this fix x, y E J and first take z < z o . On the set F2 :_ {X„ 0 (x)
(14.6)
>, z
0
Vx} the
176
DISCRETE-PARAMETER MARKOV CHAINS
events {X 0 (x) <, z}, {X 0 (y) < z} are both empty. Hence, by the second condition in (14.4), IP(Xno (x) < z) — P(Xn o (y) < z)I = IE(1{x^o(x)sz) — 1{x,o(Y),z))I
P(FZ) = 1 — 6 2 , (14.7)
since the difference between the two indicator functions in (14.7) is zero on F2 . Similarly, if z > z o then on the set F, :_ {X 0 (x) <, z 0 Vx} the two indicator functions both equal 1, so that their difference vanishes and one gets IP(X^ o (x) < z) — P(Xn o (y) < z)] < P(F) = 1 — 6 1 .
(14.8)
Combining (14.7) and (14.8) one gets IP(Xno (x) < z) — P(Xn o (y) <, z)I < 8
(14.9)
for all z z o . But the function on the left is right-continuous in z. Therefore, letting z I z o , (14.9) holds also for z = z o . In other words, (14.6) holds. Next note that O n is monotonically decreasing, On+i < On.
(14.10)
For, IP(XX+i(x) 5 z) — P(Xn+i(y) 5 z)1 = IP(«n+l...a2a1X' Z) — Man+l ... a2 0tlY ^ Z)I ^ An,
by comparing the conditional probabilities given a,. The final step in proving O n -^ 0 is to show A n < 8[n/no)
(14.11)
where [r] is the integer part of r. In view of (14.10), it is enough to prove (j=1,2,...).
O jno Sö'
(14.12)
Suppose that this is true for some j >, 1. Then, IP(X(j+l)no(x) < Z) — P(X(j+l)no(.y) < z)I = I E ( 1 {ai . +un o ... ain o +IXjn o (x) z) — = I E(l {xjno(x) 2 ( ( Itj+1mo ... n 0+1) - '(
1 (au+IW o ...ai no + Xjn o (Y)Sz) ) I
— aO.zJ) — 1 {Xjno(Y)e(a(j+ llno ... a^no+1) - '( — ao.zJ) )1 ' (14.13)
Let F3 1 = {a(j + 1) no ...aj no+ lx < zOVx},
F4:= {a(j+1) no ...aj np +1X s Zpdx}.
177
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS
Take z < z o first. Then the inverse image of (—x,:] in (14.13) is empty on F4 , so that the difference between the two indicator functions vanishes on F 4 . On the complement of F,, the inverse image of (— ao, z] under the continuous increasing map a (j+1)n0 • ' *anno + , is an interval (—oo, Z'] n J, where Z' is a random variable. Therefore, (14.13) leads to JE 1 F ( I lX,lx)SZ'1 — 1 X(v)$l'^)I-
IP(X(J+I)no(x) < z) — P(X(J+1)no(y) < z)I =
(14.14)
As Fä and Z' are determined by a+,)n o , ... , a;n.+, and the latter are independent of Xj„„(x), X 0 (y) one gets, by taking conditional expectation given 1aU+1)no, ... , ajno+1
IP(X(J+1)no(X) l< z) — P(X(J+1)no(Y) S z)1
IE1 Fg i jn0 I _ (1 — 5 2 )A jn0 < b\ ino < hJ+ 1 . (14.15) Similarly, if z > z 0 , the inverse image in (14.13) is J on F3 . Therefore, the difference between the two indicator functions in (14.13) vanishes on F3 , and one has (14.14), (14.15) with F4 , b 2 replaced by F 3 and J,. As (14.12) holds for j = 1 (see (14.6)), the induction is complete and (14.12) holds for allj >, 1. Since 6 < 1, it follows that under the hypothesis (14.4), A n —* 0 exponentially fast as n —+ oo and, therefore, there exists a unique invariant probability. If J is a closed bounded interval [a, b], then the condition (14.4) is essentially necessary for stability. To see this, define Y0(x) = x,
}(x):=12.
(n (n
1).
(14.16)
Then Yn (x) and X(x) have the same distribution. Also, Y1(a)
a, Y2(a) = Y1(a2a) >, Y,(a), .. . Yn+,(a) = Yn(an+la)
i Yn(a),
i.e., the sequence of random variables { Yn (a): n > 0} is increasing. Similarly, n > 0} is decreasing. Let the limits of these two sequences be Y, Y, respectively. As Yn (a) < Yn (b) for all n, Y < Y If P( Y < Y) > 0, then Y and Y cannot have the same distribution. In other words, Yn (a) (and ; therefore, Xn (a)) and Yn (b) (and, therefore, XX (b)) converge in distribution to different limits it 1 , rr z say. On the other hand, if Y = Y a.s., then these limiting distributions are the same. Also, Yn (a) < Yn (x) ` Yn (b) for all x, so that Yn (x) converges in distribution to the same limit it, whatever x. Therefore, it is the unique invariant probability. Assume that is does not assign all its mass at a single point. That is, rule out the case that with probability I ally's in I, have a common fixed point. Then there exist m < M such that P(Y < m) > 0 and P(Y > M) > 0. There exists n o such that P(Yno (b) < m) > 0 and P(Yno (a) > M) > 0. Now any z o e [m, M] satisfies (14.4).
178
DISCRETE-PARAMETER MARKOV CHAINS
As an application of Example 1, consider the following example from economics.
Example 1(a). (A Descriptive Model of Capital Accumulation). Consider an economy that has a single producible good. The economy starts with an initial stock X0 = x > 0 of this good which is used to produce an output Y, in period 1. The output Y, is not a deterministic function of the input x. In view of the randomness of the state of nature, Y, takes one of the values fr (X) with probability p, > 0 (1 < r <, N). Here f, are production functions having the following properties: (i) f, is twice continuously differentiable, f(x) > 0 and f' (x) <0 for all x>0.
(ii) limxlo fr(X) = 0, limXlof;(x) > 1, limit f(x) = 0. (iii) If r > r', then fr (X) > fr (X) for all x > 0. The strict concavity of f, in (i) reflects a law of diminishing returns, while (iii) assumes an ordering of the technologies or production functions f„ from the least productive fi to the most productive fN . A fraction ß (0 < ß < 1) of the output Yl is consumed, while the rest (1 — ß)Y, is invested for the production in the next period. The total stock X, at hand for investment in period 1 is OXo + (1 — ß)Y1 . Here 0 < I is the rate of depreciation of capital used in production. This process continues indefinitely, each time with an independent choice of the production function (f, with probability p r , 1 <, r < N). Thus, the capital X„ + , at hand in period n + 1 satisfies
Xt = 0X„ + ( 1 — ß)q 1(X.)
(n 0),
(14.17)
where cp„ is the random production function in period n, P(q'n= fr) =pr
(I
and the cp„ (n >, 1) are independent. Thus the Markov process {X„(x): n >, 0} on the state space (0, cc) may be represented as X.(X) = a( ...p(IX,
where, writing g,(x):= 0x + (1— ß)f,(x),
1 < r < N,
(14.18)
one has P(an=9^)=Pr
(I
(14.19)
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS
179
Suppose, in addition to the assumptions already made, that 0+(1—ß)limf;(x)>I
(1
(14.20)
xtO
i.e., lim x10 g,(x) > I for all r. As lim x —^ g(x) = 0 + (1 — /3) lim x . f(x) = 0 < 1, it follows from the strictly increasing and strict concavity properties of g r that each g r has a unique fixed point a r (see Figure 14.1) g,(a r )=a r (I
(14.21)
Note that by property (iii) of fr , a, < a z < • • • < a N . If y >, a,, then 9r(Y) % g(a 1 ) % g 1 (a 1 ) = a,, so that X,,(x) >, a, for all n > 0 if x >, a,. Similarly, if y < a N then gr(Y) 9 r(aN ) ' 9N(aN) = a N , so that X,,(x) < a N for all n >, 0 if x < a N . As a consequence, if the initial state x is in [a,, a N ], then the process n >, 0} remains in [a,, a N ] forever. In this case, one may take J = [a,, a N ] to be the effective state space. Also, if x > a, then the nth iterate of g i , namely gin ) (x), decreases as n increases. For if x a,, then g,(x) < x, 8121 (x) = g, (g, (x)) < g, (x), etc. The limit of this decreasing sequence is a fixed point of g l (Exercise 3) and, therefore, must be a,. Similarly, if x < a N then g(x) increases, as n increases, to a N . In particular, lim g(a N )
= a,,
n-x
lim g(a 1 ) = a N . n-•y,
Thus, there exists an integer n 0 such that no)
91 (aN) < 9 N< 0) (a,).
a,
u
Figure 14.1
a,
(14.22)
180
DISCRETE-PARAMETER MARKOV CHAINS
This means that if z o e [gi" ° '(aN), 9 °) (a1)], then
P(X" O (x)
(k^l),
because t, > n E + k implies that the last k among the first n e + k function a" are g 1 . Since p; goes to zero as k —+ oo, it follows from this that i l is a.s. finite. Also XTI (x) < a N as g(y) < gr(aN) gN(aN) = aN (t < r < N) for y <, a 1 , so that
in a single step it is not possible to go from a state less than a 1 to a state larger than a N . By the strong Markov property, and the result in the preceding paragraph on the existence of a unique invariant distribution and stability on [a l , a N ], it follows that XL , + ,(x) converges in distribution to it, as m o0 (Exercise 5). From this, one may show that pl"'(x, dy) converges weakly to it(dy) for all x, as n —+ oo, so that it is the unique invariant probability on (0, oo) (Exercise 5). In the same manner it may be checked that X(x) converges in distribution to it if x > a N . Thus, no matter what the initial state x is, X(x) converges in distribution to n. Therefore, on the state space (0, cc) there exists a unique invariant distribution it (assigning probability 1 to [a,, a N ]), and stability holds. In analogy with the case of Markov chains, one may call the set of states {x; 0 < x a N } inessential. The study of the existence of unique invariant probabilities and stability is relatively simpler for those cases in which the transition probabilities p(x, dy) have a density p(x, y), say, with respect to some reference measure µ(dy) on the state space. In the case of Markov chains this measure may be taken to be the counting measure, assigning mass I to each singleton in the state space. For a class of simple examples with an uncountable state space, let S = Il' and f a bounded measurable function on I8', a < f (x) < b. Let {E"} be an i.i.d. sequence of real-valued random variables whose common distribution has a strictly positive continuous density cp with respect to Lebesgue measure on 1. Consider the Markov process X" +1 := f(X") + E" + ,
(n > 0),
(14.23)
with X0 arbitrary (independent of {E"}). Then the transition probability p(x, dy)
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS
181
has the density
p (x, y):= (P(y — .i (x)).
(14.24)
Note that tp(y — f (x)) >, ii(y)
for all
XE
68,
(14.25)
where i(y):= min{cp(y — z): a <, z < b} > 0. Then (see theoretical complement 6.1) it follows that this Markov process has a unique invariant probability with a density n(y) and that the distribution of X. converges to it(y) dy, whatever the initial state. The following example illustrates the dramatic difference between the cases when a density exists and when it does not. Example 2. Let S = [-2,2] and consider the Markov process X„ + , = .f(X) + r„ + , (n > 0), X0 independent of {^„}, where ie„} is an i.i.d. sequence
with values in [-1,1], and x+l
if-2x0,
Lx-1
if 0<x^2.
(14.26)
First let E„ be Bernoulli, P(E„ = I) = '' = P(f„ _ — I). Then, with X. - x e (0, 2],
I
X 1 (x) =
x — 2
with probability ' ,
x
with probability
i,
and X, (x — 2) has the same distribution as X 1 (x). It follows that P(X 2 (x)=x-2IX,(x)=x)=?=P(X 2 (x)=x— 21X,(x)=x-2), P(X 2 (x) = xI X 1 (x) = x) = z = P(X 2 (x) = x X 1 (x) = x — 2).
In other words, X 1 (x) and X 2 (x) are independent and have the same two-point distribution It s . It follows that { X„(x): n >, I } is i.i.d. with common distribution m. In particular, n x is an invariant initial distribution. If x e [-2,0], then {X„(x): n > l} is i.i.d. with common distribution nx+2, assigning probabilities Z and Z to {x + 2} and {x}. Thus, there is an uncountable family of invariant initial distributions {n r : 0 < x < 1) v {n x+ ,: — I _< x 0}. On the other hand, suppose s„ is uniform on [— 1, 1], i.e., has the density z on [— 1, 1] and zero outside. Check that (Exercise 6) {X 2 (x): n > l} is an i.i.d.
DISCRETE-PARAMETER MARKOV CHAINS
182
sequence whose common distribution does not depend on x and has a density —2
(14.27)
Thus, i(y) dy is the unique invariant probability, and stability holds. The final example deals with absorption probabilities. Example 3. (Survival Probability of an Economic Agent). Suppose that in a
one-good economy the agent starts with an initial stock x. A fixed amount c > 0 is consumed (x > c) and the remainder x — c is invested for production in the next period. The stock produced in period 1 is X l = t 1 (X0 — c) = E 1 (x — c), where a l is a nonnegative random variable. Again, after consumption, X l — c is invested in production of a stock of E 2 (X 1 — c), provided X l > c. If X l < c, the agent is ruined. In general, Xo = x,
Xn+l = En+1 (Xn
— c)
(n>0).
(14.28)
where {En : fl ? l} is an i.i.d. sequence of nonnegative random variables. The state space may be taken to be [0, oo) with absorption at 0. The probability of survival of the economic agent, starting with an initial stock x > c, is (14.29)
p(x):= P(Xn > c for all n > 01 X 0 = x).
If S = P(c 1 = 0) > 0, then it is simple to check that P(r n = 0 for some 1 (Exercise 8), so that p(x) = 0 for all x. Therefore, assume
n > 0) =
(14.30)
P(e 1 > 0) = 1.
From (14.28) one gets, by successive iteration, c c+ -En+l
Xn+l > C lff Xn > C + C iff Xn - 1 > C + E n+1
=C+c + En
En
C
CC
iffX,=—x>c+ c +-- -+•..+ EI
E1E2
C
2n n+1
-
E1E2...sn+1
Hence, on the set {E n > 0 for all n}, {Xn >cforalln}={x>c+ + _ - +•••+ l E1 6162
` °D =jxiC+cF l
1
1 foralln}
E1gZ...En
ll—j^ ( 1
n=1 6 I E Z ...E n )
0D
In=1 E182...En
) x
,<--1 C
}
MARKOV PROCESSES GENERATED BY ITERATIONS OF I.I.D. MAPS
In other words,
183
Ix
p(x)=P1 — x — 1 (n=j E 1 g 2 C,,
(14.31)
C
This formula will be used to determine conditions on the common distribution of E n under which (1) p(x) = 0, (2) p(x) = 1, (3) p(x) < I (x > c). Suppose first that E log E, exists and E log E, < 0. Then, by the Strong Law of Large Numbers,
1
n
n
r= ,
— Y log E n gElogE l <0, a.s.
so that log E 1 c 2 • E„ --p — oo a.s., or e,e 2 • E n --, 0 a.s. This implies that the infinite series in (14.31) diverges a.s., that is, p(x) = 0
for all x, if E log E, < 0.
(14.32)
Now by Jensen's Inequality (Chapter 0, Section 2), E log E, < log EE,, with strict inequality unless E, is degenerate. Therefore, if EE, < 1, or EE, = 1 and r, is nondegenerate, then E log e, < 0. If E, is degenerate and EE 1 = 1, then P(E 1 = 1) = 1, and the infinite series in (14.31) diverges. Therefore, (14.32) implies p(x) = 0
for all x, if EE 1 < 1.
(14.33)
It is not true, however, that E log E, > 0 implies p(x) = 1 for large x. To see this and for some different criteria, define m:=inf{z>0:P(e,
(14.34)
p(x)<1
(14.35)
Let us show that for allx,ifm
For this fix A > 0, however large. Find n o such that n o >A rfl (I +).
(14.36)
r
This is possible, as 11 (1 + 1/r 2 ) < exp{j] l/r 2 } < oo. If m < I then P(E, < I + 1/r 2 ) > 0 for all r >, 1. Hence, 0< P(E r z I + l/r 2 for I n°
n0I
r=1 E1...E
r n0 )
I
r1 (l+)
nU
In
r, E,...gr
\
API
\ r =I E,...0
>A^^P
>A. r=1 E1...Er
1
+
1
/
184
DISCRETE-PARAMETER MARKOV CHAINS
Because A is arbitrary, (14.31) is less than 1 for all x, proving (14.35). One may also show that, if m> 1, then ifx
<1
P(x) _
m m— 1
(14.37)
(m > 1).
m
ifx>c m
=1 To prove this, observe that
1
< °° 1 1 n =1 E1...E,, m— 1 n =1 m
with probability 1 (if m > 1). Therefore, (14.31) implies the second relation in (14.37). In order to prove the first relation in (14.37), let x < cm/(m — 1) — cS Choose such that for some 8 > 0. Then x/c — 1 < 1/(m — 1) —
b.
1 n(b) m
<
r n(b) +b )...(
and then choose 8 r > 0 (1 <,
<,
a
r -1 (m
i
(14.38)
2
— 1) such that
1
n(S) — 1
n(b)
a -2. mr
n(S) — 1
m+8 r ) > = 1
1
(14.39)
Then 0
Y
I
Y_
(a)-1
1)
"r=1
1
n(a)-1
>Y E1...Er
r.l m
Y
5P1 1 >> 1 —a1 =P 1 > 1 —SI. ) ) \ r =1 ElE2...Er r =1 m r (r11r m— If (5 >0 is small enough, the last probability is smaller than P(I 1/(E, • • • E r ) > x/c — 1), provided x/c — 1 < 1/(m — 1), i.e., if x < cm/(m — 1). Thus for such x one has 1 — p(x) > 0, proving the first relation in (14.37).
15 CHAPTER APPLICATION: DATA COMPRESSION AND ENTROPY The mathematical theory of information storage and retrieval rests largely on foundations established by Claude Shannon. In the present section we will consider one aspect of the general theory. We will suppose that "text" is
CHAPTER APPLICATION: DATA COMPRESSION AND ENTROPY
185
constructed from symbols in a finite alphabet S = {a,, a 2 , ... , a M }. The term "text" may be interpreted rather broadly and need not be restricted to the text of ordinary human language; other usages occur in genetic sequences of DNA, computer data storage, music, etc. However, applications to linguistics have played a central role in the historical development as will be discussed below. An encoding algorithm is a transformation in which finite sequences of text are replaced by sequences of code symbols in such a way that it must be possible to uniquely reconstruct (decode) the original text from the encoded text. For simplicity we shall consider compression codes in which the same alphabet is available for code symbols. Any sequence of symbols is referred to as a word and the number of symbols as its length. A word of length t will be encoded into a code-word of length s by the encoding algorithm. To compress the text one would use a short code for frequently occurring words and reserve the longer code-words for the more rarely occurring sequences. It is in this connection that the statistical structure of text (i.e., word frequencies) will play an important role. We consider a case in which the symbols occur in sequence according to a Markov chain having a stationary transition law ((p,: i j = 1, 2, ... , M)). Shannon has suggested the following scheme for generating a Markov approximation to English text. Open a book and select a letter at random, say T. Next skip a few lines and read until a T is encountered, and select the letter that follows this T (we observed an H in our trial). Next skip a few more lines, and read until an H is encountered and take the next letter (we observed an A), and so on to generate a sample of text that should be more closely Markovian than text composed according to the usual rules of English grammar. Moreover, one expects this to resemble more closely the structure of the English language than independent samples of randomly selected single letters. Accordingly, one may also consider higher-order Markov approximations to the structure of English text, for example, by selecting letters according to the two preceding letters. Likewise, as was also done by Shannon and others, one may generate text by using "linguistic words" as the basic alphabetic symbols (see theoretical complement 1 for references). It is of significant historical notice that, in spite of a modern-day widespread utility in diverse physical sciences, Markov himself developed his ideas on dependence with linguistic applications in mind. Let X0 , X,, ... denote the Markov chain with state space S having the stationary transition law ((p ;j )) and a unique invariant initial distribution n = (n ). Then {X„} is a stationary process and the word a = ( a, a ;Z ..... a ) of length t has probability 7c i , p ; ,. ; z • • • p ;, _ .;, of occurring. Suppose that under the coding transformation the word a = ( a ,, .... a ; ,) is encoded to a word of length s = c(a ...... a ,). Let ;
;
;
;
;
u, = E[c(X,..... X,)]/t.
(15.1)
The quantity p, is referred to as the average compression for words of length t. The optimal extent to which a given (statistical) type of text can be compressed by a code is measured by the so-called compression coefßcient defined by
186
DISCRETE-PARAMETER MARKOV CHAINS
It = limsup µ,.
(15.2)
t-.x
The problem for our consideration here is to calculate the compression coefficient in terms of the parameters of the given Markov structure of the text and to construct an optimum compression code. We will show that the optimal compression coefficient is given by H 17r; H; 7c, ^ p ^ log Pt1J
It
log M log M
L' = ------ log
;
^
--,
( 15.3 )
in the sense that the compression coefficient of a code is never smaller than this, although there are codes whose coefficient is arbitrarily close to it. The parameter H = p, log p ;j is referred to as the entropy of the transition distribution from state i and is a measure of the information obtained when the Markov chain moves one step ahead out of state i (Exercises 1 and 2). The quantity H = Y_ zt H is called the entropy of the Markov chain. Observe that, given the transition law of the Markov chain, the optimal compression coefficient may easily be computed from (15.3) once the invariant initial distribution is determined. For a word a of length t let p,(a) = P((Xo , ... , X, _ ,) = a). Then, ;
;
;
;
—log P,((Xo, ... , X,-1)) = —log Tt(Xo ) + Y,
( —
log Px;.x1. ) = Yo + Z Y (15.4)
where Y,, Y2 , . .. is a stationary sequence of bounded random variables. By the law of large numbers applied to the stationary sequence Y1 , Y2 ,... , we have by Theorems 9.2 and 9.3 (see Exercise 10.5), log P,((Xo _- • , X,- 1 )) —*
EY, = H
(15.5)
as t --+ oo with probability 1 (Exercise 4); i.e., for almost all sample realizations, for large t the probability of the sequence X0 , X,, ... , X,_, is approximately exp{ —tH}. The result (15.5) is quite remarkable. It has a natural generalization that applies to a large class of stationary processes so long as a law of large numbers applies (Exercise 4). An important consequence of (15.5) that will be used below is obtained by considering for each t the M` words of length t arranged as x ( , » x ( ,., ... in order of decreasing probability. For any positive number a < 1, let Ne (e) = min{N: Y_ p,(a^,^)
s}.
(15.6)
CHAPTER APPLICATION: DATA COMPRESSION AND ENTROPY
187
Proposition 15.1. For any 0 < g < 1, log N,(c)
lim - -
(15.7)
— = H.
Proof. Since almost sure convergence implies convergence in probability, it follows from (15.5) that for arbitrarily small positive numbers y and S, b < max{e, I — s}, for all sufficiently large t we have
C
Pt(Xo, ... , X' -' ) ^ log
t
H < v )>
In particular, for all sufficiently large t, say t >, T, exp{—t(H + y)}
< p,(X 0 , ... , X,_,) < exp{—t(H — y)}
with probability at least I — ö. Let R, denote the set consisting of all words x of length t such that e - ` ( " + }' ) < p 1 (a) < e - ` ( " - ' ) . Fix t larger than T. Let Sr = {a„ ) , a (2 ... , ; N , (e» }. The sum of the probabilities of the M,(E), say, words a of length t in R, that are counted among the N,(e) words in S, equals Yr €s, u R, p,(a) > r —5 by definition of N,(e). Therefore, M, ( s ) e -r(H-vi >,
I
p,(a) > t — b.
(15.8)
aeS,' R,
Also, none of the elements of S, has probability less than exp{ —t(H + y)}, since the set of all a with p,(a) > exp{ — t(H + y)} contains R, and has total probability larger than I — ö > E. Therefore, N,(c)c- pur +y) < 1.
(15.9)
Taking logarithms, we have log N,(e) t
On the other hand, by (15.8), N,(r)e-^ur-Y) > e — 6.
(15.10)
Again taking logarithms and now combining this with (15.9), we get log N,(E) t
Since y and 6 are arbitrarily small the proof is complete.
n
188
DISCRETE-PARAMETER MARKOV CHAINS
Returning to the problem of calculating the compression coefficient, first let us show that p >, H/log M. Let 6 > 0 and let H' = H — 2b < H. For an arbitrary given code, let J, = {a a is a word of length t and c(a) < tH'/log M}.
(15.11)
Then #J < M + M 2 + • • • + MIIH'IIogM] < M(1H'jiogM) { l + 1/M + • • _
e`H'M M-1
,
(15.12)
since the number of code-words of length k is M k . Now observe that
tH'
tµ,=Ec(X1,...,X^)%lo MP{(X„...,X,)eJi} g _ ^H—[1 —P{(X,,...,X,)eJ,}].
log M
(15.13)
Therefore,
p = limsup p,
> log M
limsup [1 — P{(X,, ... , X,) e J}].
(15.14)
Now observe that for any positive number e < 1, for the probability P({(X,, ... , Xr ) e J}) to exceed E requires that N,(E) be smaller than #J,. In
view of (15.12) this means that N,(E) <—M--exp{t(H — 2b)} M-1
(15.15)
or log N1(c) < t
0(1) + H — 26.
(15.16)
Now by Proposition 15.1 for any given t this can hold for at most finitely many values of t. In other words, we must have that the probability P({ (X,, ... , X,) e Jr }) tends to 0 in the limit as t grows without bound. Therefore, (15.14) becomes
H-26 H' log M log M and since 6 > 0 is arbitrary we get p >, H/log M as desired.
(15.17)
189
EXERCISES
To prove the reverse inequality, and therefore (15.3), again let b be an arbitrarily small positive number. We shall construct a code whose compression coefficient u does not exceed (H + S)/log M. For arbitrary positive numbers y and e, we have N1(1
— e) < e ^(H+y) = M t(n+y)(Io9m
(15.18)
for all sufficiently large t. That is, the number of (relatively) high-probability words of length t, the sum of whose probabilities exceeds I — e, is no greater than the number M` (H +y ) / I og At of words of length t(H + y)/log M. Therefore, there are enough distinct sequences of length t(H + y)/log M to code the Nß (1 — e) words "most likely to occur." For the lower-probability words, the sum of whose probabilities does not exceed 1 — (I — e) = t, just code each one as itself. To ensure uniqueness for decoding, one may put one of the previously unused sequences of length t(H + y)/log M in front of each of the self-coded terms. The length c(X0 , X,.... , X,_,) of code-words for such a code is then either t(H + y)/log M or t + t(H + y)/log M, the latter occurring with probability at most e. Therefore, t(H + y)l _ t(H + y) [ Ec(X0 , X 1 .... , X1 _ t ) ^ ^ l + t + lo g M F g
t(H + 8)
lo g
(15.19)
M
where b = eH + e log M + ey + y. The desired inequality now follows.
•
EXERCISES Exercises for Section I1.1 1. Verify that the conditional distribution of X n+ , given Xo , X i , .... Xn is the conditional distribution of X +1 given X. if and only if the conditional distribution of X + given given Xo , X 1 , ... , Xn is a (measurable) function of X. alone. [Hint: Use properties of conditional expectations. Section 0.4.] 2. Show that the simple random walk has the Markov property. 3. Show that every discrete-parameter stochastic process with independent increments is a Markov process. 4. Let A, B, C be events with C, B n C having positive probabilities. Verify that the following are equivalent versions of the conditional independence of A and B given C: P(A n B C) = P(A C)P(B I C) if and only if P(A I B n C) = P(A I C). 5. (i) Let {Xn } be a sequence of random variables with denumerable state space S. Call {Xn } rth order Markov-dependent if P(Xn +i =JI Xo=io,..., = P(Xn+ 1
=j
Xn
= i. '
I Xn-r+1 = ( n-r+ l+ • • , Xn = in
)
for i o , .. ,
i n , j e S, n
i r.
190
DISCRETE-PARAMETER MARKOV CHAINS
Show that Y. = (XX , X„ + I , ... , X„ + ,._, ), n = 0, 1, 2, ... is a (first-order) Markov chain under these circumstances. (ii) Let V. = X„ +1 — X„, n = 0, 1, 2, .... Show that if {X„} is a Markov chain, then so is {(X„, V„)}. [Hint: Consider first {(X X„ + ,)} and then apply a one-to-one transformation.] 6. Show that a necessary and sufficient condition on the correlations of a discrete-parameter stationary Gaussian stochastic process {X n } to have a Markov property is Cov(X,,, Xn+m ) = a 2 pi for some a' > 0, 1p,ß < 1, m, n = 0, 1, 2, .... [A stochastic process {X„: n >, 0} is said to be stationary if, for all n, m, the distribution
of (Xo , ..
, X„)
is the same as that of (Xm, Xm+1, • • , Xm+n)•]
7. Let {„} be an i.i.d. sequence of ±1-valued Bernoulli random variables with parameter 0
of + 1-valued random variables indexed by the set of integers 1. Let {S„} be the simple symmetric random walk on the state space 7l starting at S o = 0. The random walk {S„} is assumed to be independent of the random scenery {Y„}. Define a new process {X„} by noting down the scenery at each integer site upon arrival in the course of the walk. That is, X. = Ys„, n = 0, 1, 2, ... . (i) Calculate EX„. [Hint: X. _ ^m=_„ YmI(s.=mj.] (*ii) Show that {X„} is stationary. [See Exercise 6 for a definition of stationarity.] (iii) Is {X„} Markovian? (iv) Show that Cov(X„, X„ +m ) — (2m)112m 112 for large even m, and zero for odd m. (For an analysis of the "long-range dependence” in this example, see H. Kesten and F. Spitzer (1979), "A Limit Theorem Related to a New Class of Self-Similar Processes,” Z. Wahr. Verw. Geb., 50, 5-25.) 1, 2, ...} be i.i.d. + I-valued with P(Z„ = 1) = p ^ 2. Define n=0,1,2,.... Show that for k_
10. Let
{Z„: n = 0,
X„=Z Zn +1 ,
Exercises for Section II.2 1. (i) Show that the transition matrix for a sequence of independent integer-valued
random variables is characterized by the property that its rows are identical; i.e., p ;f =p ;for all i,jeS. (ii) Under what further condition is the Markov chain an i.i.d. sequence? 2. (i) Let {„} be a Markov chain with a one-step transition matrix p. Suppose that
the process {Y„} is viewed only at every mth time step (m fixed) and let X. = Y„ m , for n = 0, 1, 2, .... Show that {X} is a Markov chain with one-step transition law given by pm. (ii) Suppose {X„} is a Markov chain with transition probability matrix p. Let n l < n 2 < • • • < n k . Prove that P(X„k _J I X „ 1
=
i ^, . . . , Xnk - = `k — O =
(nk—„k-l) Pik - IJ
EXERCISES
191
3. (Random Walks on a Group) Let G be a finite group with group operation denoted by. That is, G is a nonempty set and is a well-defined binary operation for G such that (i) if x, y a G then x p+ y e G; (ii) if x, y, z e G then x (D(v z) = (v J y) ®z; (iii) there is an e E G such that x (D e = e p+ x = x for all x e G; (iv) for each XE G there is an element in G, denoted —x, such that x $(—x) = (—_r) +Q x = e. If (1 is commutative, i.e., x Q+ y = y O + x for all x, y E G, then G is called abelian. Let X 1 , X 2 , ... be i.i.d. random variables taking values in G and having the common probability distribution Q(g) = P(X„ = g), g e G. O+ X„, ri _> 0, is (i) Show that the random walk on G defined by S„ = X 0 Q+ X i Q+ a Markov chain and calculate its transition probability matrix. Note that it is not necessary for G to be abelian for {S„} to be Markov. (ii) (Top-In Card Shuffles) Construct a model for card shuffling as a Markov chain on a (nonabelian) permutation group on N symbols in which the top card of the deck is inserted at a randomly selected location in the deck at each shuffle. (iii) Calculate the transition probability matrix for N = 3. [Hint: Shuffles are of the form (c1, c2, c• 3) —s (c2. c1, c • 3) or (c2, c3, C1) only.] Also see Exercise 4.5. An individual with a highly contagious disease enters a population. During each subsequent period, either the carrier will infect a new person or be discovered and removed by public health officials. A carrier is discovered and removed with probability q = I — p at each unit of time. An unremoved infected individual is sure to infect someone in each time unit. The time evolution of the number of infected individuals in the population is assumed to be a Markov chain {X: n = 0, 1, 2, ...}. What are its transition probabilities? 5. The price of a certain commodity varies over the values 1, 2, 3, 4, 5 units depending on supply and demand. The price X„ at time n determines the demand D. at time n through the relation D. = N — X„, where N is a constant larger than 5. The supply C. at time n is given by CC = N — 3 + E. where {F„} is an i.i.d. sequence of equally likely ± 1-valued Bernoulli random variables. Price changes are made according to the following policy: X,, +1 —X=+1
if D„—C„>0,
X„ + , —X„= —1
if D„—C„<0,
X„, 1 —X„= 0
if D„—C„=0.
(i) Fix X0 = i o . Show that {X„} is a Markov chain with state space S = { 1, 2, 3, 4, 5}. (ii) Compute the transition probability matrix of {X„}. (iii) Calculate the two-step transition probabilities. 6. A reservoir has finite capacity of h units, where h is a positive integer. The daily inputs are i.i.d. integer-valued random variables {J: n = 1, 2, ...} with the common p.m.f. {g j = P(J„ = j), j = 0, 1, 2, ...}. One unit of water is released through the dam at the end of each day provided that the reservoir is not empty or does not exceed its capacity. If it is empty, there is no release. If it exceeds capacity, then the excess water is released. Let X. denote the amount of water left in the reservoir on the nth day after release of water. Compute the transition matrix for {X„}. 7. Suppose that at each unit of time each particle located in a fixed region of space has probability p, independently of the other particles present, of leaving the region. Also,
192
DISCRETE-PARAMETER MARKOV CHAINS
at each unit of time a random number of new particles having Poisson distribution with parameter ) enter the region independently of the number of particles already present at time n. Let X. denote the number of particles in the region at time n. Calculate the transition matrix of the Markov chain {X"}. 8. We are given two boxes A and B containing a total of N labeled balls. A ball is selected at random (all selections being equally likely) at time n from among the N balls and then a box is selected at random. Box A is selected with probability p and B with probability q = I — p independently of the ball selected. The selected ball is moved to the selected box, unless the ball is already in it. Consider the Markov evolution of the number X. of balls in box A. Calculate its transition matrix. 9. Each cell of a certain organism contains N particles, some of which are of type A and the others type B. The cell is said to be in state j if it contains exactly j particles of type A. Daughter cells are formed by cell division as follows: Each particle replicates itself and a daughter cell inherits N particles chosen at random from the 2j particles of type A and the 2N — 2j particles of type B present in the parental cell. Calculate the transition matrix of this Markov chain.
Exercises for Section II.3 1. Let p be the transition matrix for a completely random motion of Example 2. Show that p" = p for all n. 2. Calculate p;; for the unrestricted simple random walk. )
3. Let p = ((p i; )) denote the transition matrix for the unrestricted general random walk of Example 6. (i) Calculate p;t» interms of the increment distribution Q. (ii) Show that p° = Q*"(j — i), where the n-fold convolution is defined recursively by Q *( " - I) (k)Q(j — k),
Q * "(j) _
Q * "' = Q.
k
4. Verify each of the following for the Pölya urn model in Example 8. (i) P(X" = 1) = r/(r + b) for each n = 1, 2, 3, ... . (ii) P(X1 = e1,...,Xn =En)= P(Xi+n=x ,...,X. +h= ),foranyh=0,1,2,... (*iii) {X is a martingale (see Definition 13.2, Chapter I). X}
5. Describe the motion represented by a Markov chain having transition matrix of the following forms: (i)
(ii)
10 = 0 1 ], 01 =[1 0J' I
(iii) p =
5 S
ss
193
EXERCISES
(iv) Use the probabilistic description to write down p" without algebraically performing the matrix multiplications. Generalize these to m-state Markov chains. (Length of a Queue) Suppose that items arrive at a shop for repair on a daily basis but that it takes one day to repair each item. New arrivals are put on a waiting list for repair. Let A. denote the number of arrivals during the nth day. Let X. be the length of the waiting list at the end of the nth day. Assume that A,, A,, .. . is an i.i.d. nonnegative integer-valued sequence of random variables with a(x) = P(A„ = x), X„ (n _> 0). Calculate x = 0, 1, 2, .... Assume that A„ + , is independent of X o the transition probabilities for {X„}. , ... ,
_<
7. (Pseudo Random Number Generator) The linear congruential method of generating integer values in the range 0 to N — I is to calculate h(x) = (ax + c) mod(N) for some choice of integer coefficients 0 a, c < N and an initial seed value of x. More generally, polynomials with integer coefficients can be used in place of ax + c. Note that these methods cycle after N iterations. (i) Show that the iterations may be represented by a Markov chain on a circle. (ii) Calculate the transition probabilities in the case N = 5, a = 1, c = 2. (iii) Calculate the transition probabilities in the case h(x) _ (x 2 + 2) mod(5). [See D. Knuth (1981), The Art of Computer Programming, Vol. lt, 2nd ed., Addison-Wesley, Menlo Park, for extensive treatments.] (A Renewal Process) A system requires a certain device for its operation that is subject to failure. Inspections for failure are made at regular points in time, so that an item that fails during the nth period of time between n — 1 and n is replaced at time n by a device of the same type having an independent service life. Let p„ denote the probability that a device will fail during the nth period of its use. Let X. be the age (in number of periods) of the item in use at time n. A new item is started at time n = 0, and X. = 0 if an item has just been replaced at time n. Calculate the transition matrix of the Markov chain {X„}.
Exercises for Section 11.4
A balanced six-sided die is rolled repeatedly. Let Z denote the smallest number of rolls for the occurrence of all six possible faces. Let Z, = 1, Z ; = smallest number of tosses to obtain the jth new face after j — 1 distinct faces have occurred. Then Z=Z 1 +•••+Z 6 (i) Give a direct proof that Z,, ... , Z 6 are independent random variables. (ii) Give a proof of (i) using the strong Markov property. [Hint: Define stopping times t, denoting the first time after r j _, that X. is not among X,, .... X where X,, X2 are the respective outcomes on the successive tosses.] (iii) Calculate the distributions of Z21 ... , Z 6 (iv) Calculate EZ and Var Z. .
T
,
...
.
2. Let {S„} denote the two-dimensional simple symmetric random walk on the integer lattice starting at the origin. Define r, = inf{n: IIS„I1 = r}, r = 1, 2, ... , where II(a, b)JI = (ai + Ibl. Describe the distribution of the process {S,,: n = 0, 1, 2, ...} in the two cases r = 1 and r = 2.
194
DISCRETE-PARAMETER MARKOV CHAINS
3. (Coupon Collector's Problem) A box contains N balls labeled 0, 1, 2, ... , N — 1.
Let T - TN be the number of selections (with replacement) required until each ball is sampled at least once. Let T; be the number of selections required to sample j distinct balls. Show that
(i) T=(T"
—
T"-a)+(T"-I
—
T"-2)+
...
+(T2
—
T1)+T1,
where T, = 1, T2 — T,..... T + , — T, ... , T" — T"_, are independent geometrically distributed with parameters (N — j)/N, respectively. (ii) Let r j be the number of selections to get ball j. Then r ; is geometrically distributed. (iii) P(T > m)
[Hint: P(T > m) < I "_, P(r > m)]. ;
(iv) P(T > m) = k ^^ (— I)k+l(
N
)(
;
Ik —
[Hint: Use inclusion—exclusion on {T> m} = U=L {i ; > m}.] (v) Let X,, X2 , ... be the respective numbers on the balls selected. Is T a stopping time for {X„}?
4. Let {Xn } and { }.} be independent Markov chains with common transition probability matrix p and starting in states i and j respectively. (i) Show that {(X„, Y„)} is a Markov chain on the state space S x S. (ii) Calculate the transition law of {(X n , Y,)}. (iii) Let T = inf{n: X. = Y.J. Show that T is a stopping time for the process {(X n , Yc,)}. (iv) Let {Z„} be the process obtained by watching {X n } up until time T and then switching to { } n } after time T; i.e., Zn = X„, n < T, and Zn = Y., for n > T Show that {Z„} is a Markov chain and calculate its transition law. 5. (Top-In Card Shuffling) Suppose that a deck of N cards is shuffled by repeatedly
taking the top card and inserting it into the deck at a random location. Let G” be the (nonabelian) group of permutations on N symbols and let X,, X 2 , ... be i.i.d. G”-valued random variables with P(Xk =)=1/N
fori= 1,2,...,N,
where (i, i — 1, ... , I> is the permutation in which the ith value moves to i — 1, i — I to i — 2, ... 2 to 1, and 1 to i. Let S o be the identity permutation and let S„ = X,. . . X„, where the group operation is being expressed multiplicatively. Let T denote the first time the original bottom card arrives at the top and is inserted back into the deck (cf. Exercise 2.3). Then (i) T is a stopping time. (ii) T has the additional property that P(T = k, Sk = g) does not depend on g e G”. [Hint: Show by induction on N that at time T — I the (N — 1)! arrangements of the cards beneath the top card are equally likely; see Exercise 2.3(iii).] (iii) Property (ii) is equivalent to P(Sk = g I T = k) = 1/1G 5 I; i.e., the deck is mixed at time T. This property is referred to as the strong uniform time property by D. Aldous and P. Diaconis (1986), "Shuffling Cards and Stopping Times,” Amer. Math. Monthly, 93, pp. 333-348, who introduced this example and approach.
EXERCISES
195
(iv) Show that maxlP(S„ E A) — Al A
I GNI
_< _< P(T > n)
Ne
[Hint: Write P(S„cA)=P(S„EA,T_
0
_< _< r
A
I
= I I P(T n) + P(S„ e A T > n)P(T > n) = Al B + rP(T > n), IGNI IGNJ
>_
1. For the rightmost upper bound, compare Exercise 4.3.]
6. Suppose that for a Markov chain {Xn p,.,,:= Pr (XX = y for some n that PX (X, = y for infinitely many n) = 1. },
1) = 1. Prove
7. (Record Times) Let X 1 , X2 , .. . be an i.i.d. sequence of nonnegative random variables having a continuous distribution (so that the probability of a tie is zero). Define R, = 1, R k = inf {n_> R k _ I + 1:XX >,max(X,,...,XX _ 1 )}, for k=2,3,.... (i) Show that {R„} is a Markov chain and calculate its transition probabilities. [Hint: All ik ! rankings of (X,, X2 , ... , X;k are equally likely. Consider the event {R 1 = 1, R 2 = '2, ... , R k = ik } and count the number of rankings of (X l , X2 , ... , X, that correspond to its occurrence.] (ii) Let T„ = R +1 — R„. Is {T„} a Markov chain? [Hint: Compute P(T3 = 1 T2 = 1, T1 = 1) and P(T3 =I T2 = 1).] )
k)
I
8. (Record Values) Let X,, X2 ,.. . be an i.i.d. sequence of nonnegative random variables having a discrete distribution function. Define the record times R 1 = 1, R 2 , R 3 , ... as in Exercise 7. Define the record values by Vk = XRk k = 1, 2, ... . ,
(i) Show that each R k is a stopping time for {X. (ii) Show that {1'} is a Markov process and calculate its transition probabilities. (iii) Extend (ii) to the case when the distribution function of Xk is continuous. }.
Exercises for Section II.5 1. Construct a finite-state Markov chain such that (i) There is only one inessential state. (ii) The set .' of essential states decomposes into two equivalence classes with periods d = I and d = 3. 2. (i) Give an example of a transition matrix for which all states are inessential. (ii) Show that if S is finite then there is at least one essential state. 3. Classify all states for p given below into essential and inessential subsets. Decompose the set of all essential states into equivalence classes of communicating states.
196
DISCRETE-PARAMETER MARKOV CHAINS
j 0 0 0 j 0 0 0 0 0; 0 0
2
I 1 1 1 1 1 6 6 6 6 6 6
0
1
5
0 0 0
0 0 0
0 0 0
6
3
1
00
0 0
0 * 0 0 0
0
3
6`
0
4. Suppose that S comprises a single essential class of aperiodic states. Show that there is an integer v such that p$ 1 > 0 for all i, j e S by filling in the details of the following steps. (i) For a fixed (i, )), let B i; _ {v > 1: p;J) > 0}. Then for each state j, B is closed under addition. (ii) (Basic Number Theory Lemma) If B is a set of positive integers having greatest common divisor 1 and if B is closed under addition, then there is an integer b such that ne B for all n >, b. [Hints: (a) Let G be the smallest additive subgroup of Z that contains B. Then argue that G = Z since if d is the smallest positive integer in G it will follow that if n E B, then, since n = qd + r, 0 <, r < d, one obtains r = n — qd E G and hence r = 0, i.e., d divides each n e B and thus d = 1. (b) If leB,theneachn=l+1+•••+Ie B. If10B,thenby(a), 1 =a—ß for a, ß E B. Check b = ( a + ß) Z + 1 suffices; for if n > (a + ß) 2 , then, writing
>_
n= q(a +ß)+r, 0
;;
;;
;;
5. Classify the states in Exercises 2.4, 2.5, 2.6, 2.8 as essential and inessential states. Decompose the essential states into their respective equivalence classes. 6. Let p be the transition matrix on S = {0, 1, 2, 3} defined below.
2
0 ' 0 '2
2
20 Z 0 0 1 0 '2 2
0
2
0
Show that S is a single class of essential states of period 2 and calculate p" for all n. 7. Use the strong Markov property to prove that if j is inessential then P,,(X,, =j for infinitely many n) = 0. 8. Show by induction on N that all states communicate in the Top-In Card Shuffling example of Exercises 2.3(ii) and 4.5.
EXERCISES
197
Exercises for Section II.6 1. (i) Check that "deterministic motion going one step to the right" on S = {0, 1, 2, ...} provides a simple example of a homogeneous Markov chain for which there is no invariant distribution. (ii) Check that the "static evolution" corresponding to the identity matrix provides a simple example of a 2-state Markov chain with more than one invariant distribution (see Exercise 3.5(i)). (iii) Check that "deterministic cyclic motion" of oscillations between states provides a simple example of a 2-state Markov chain having a unique invariant distribution it that is not the limiting distribution lim,,. P(X„ =1) for any other initial distribution It : it (see Exercise 3.5(ii)). (iv) Show by direct calculation for the case of a 2-state Markov chain having strictly positive transition probabilities that there is a unique invariant distribution that is the limiting distribution for any initial distribution. Calculate the precise exponential rate of convergence. 2. Calculate the invariant distribution for Exercise 2.5. Calculate the so-called equilibrium price of the commodity, 3. Calculate the invariant distribution for Exercise 2.8. 4. Calculate the invariant distribution for Exercise 2.3(ii) (see Exercise 5.8). 5. Suppose that n states a,, a 2 .... , a„ are arranged counterclockwise in a circle. A particle jumps one unit in the clockwise direction with probability p, 0 _< p _< 1, or one unit in the counterclockwise direction with probability q = I — p. Calculate the invariant distribution. 6. (i) (Coupling Bound) If X and Y are arbitrary real-valued random variables and J an arbitrary interval then show that JP(X e J) — P(Y e J)I < P(X ^ Y). (ii) (Doeblin's Coupling Method) Let p be an aperiodic irreducible finite-state transition matrix with invariant distribution n. Let {X„} and { Y„} be independent Markov chains with transition law p and having respective initial distributions it and S ; ,. Let T denote the first time n that X. = (a) Show P(T < oo) = 1. [Hint: Let v = min{n: p >0 for all i, j e S}. Argue that P(T> kv)5 P(Xkv^ YkvIX(k-i)i# (k-1)v''Xv# Yv) ... P(X 2 ,, ^ Y'2V I X„ ^ YV )P(X V 0 YY ) < (I — Nb 2 ) k ,
where 6 = min ; p;j> > 0, N = ISI.] (b) Show that {(X„, )„)} is a Markov chain on S x S and T is a stopping time for this Markov chain. (c) Define {Z„} to be the process obtained by observing { Y„} until it meets {X.} and then watching {X„} from then on. Show that {Z„} is a Markov chain with transition law p and invariant initial distribution H. (d) Show IP(Z„ = j) — P(XX =1)1 _< P(T _> n).
(e) Show Ip
—
P(T >_ n) _< ( 1
Nö2)o.iv^
—
-
1.
Let A = ((a ;J )) be an N x N matrix. Suppose that A is a transition probability matrix with strictly positive entries a ;j . (i) Show that the spectral radius, i.e., the magnitude of the largest eigenvalue of A,
198
DISCRETE-PARAMETER MARKOV CHAINS
must be 1. [Hint: Check first that A is an eigenvalue of A (in the sense Ax = Ax has a nonzero solution x = (x 1 x")') if and only if zA = atz has a nonzero solution z = (z, • • z"); recall that det(B) = det(B') for any N x N matrix B.] (ii) Show that A = 1 must be a simple eigenvalue of A (i.e., geometric multiplicity 1). [Hint: Suppose z is any (left) eigenvector corresponding to A = 1. By the results of this section there must be an invariant distribution (positive eigenvector) n. For t sufficiently large z + to is also positive (and normalizable).] . •
*8. Let A = ((a ;j )) be a N x N matrix with positive entries. Show that the spectral radius is also given by min{.l > 0: Ax < Ax for some positive x}. [Hint: A and its transpose A' have the same eigenvalues (why?) and therefore the same spectral radius. A' is adjoint to A with respect to the usual (dot) inner product in the sense (Ax, y) = (x, A'y) for all x, y, where (u, v) = Z" 1 u 1 v.. Apply the maximal property to the spectral radius of A'.] 9. Let p(x, y) be a continuous function on [c, d] x [c, d] with c < d. Assume that p(x, y) > 0 and p(x, y) dy = 1. Let S2 denote the space of all sequences w = (x 0 x 1 ,...) of numbers x ; e [c, d]. Let .yo denote the class of all finite-dimensional sets A of the form A = {co = (x 0 x,, . ..) e S2: a, <x < b ; , i = 0,1, ... , n}, where c < a ; < b, < d for each i. Define P (A) for such a set A by
J
,
,
P
Px(A)
=
J
"...
fa
P(x,Y1)p(Y1,Y2) ... p(Y "-1,Y ")dY" ... dy1
forxe[ao,bo].
bl ,
Define P (A) = 0 if x < a o or x > b 0 . The Kolmogorov Extension Theorem assures us that PX has a unique extension to a probability measure defined for all events in the smallest sigmafield .y of subsets of Q that contains FO . For any nonnegative integrable function y with integral 1, define P
P(A)
= J P (A)y(X) dx, x
A e F.
Let X. denote the nth coordinate projection mapping on 52. Then {X"} is said to be a Markov chain on the state space S = [c, d] with transition density p(x, y) and initial density y under P. Under P. the process is said to have initial state x. (i) Prove the Markov property for {X"}; i.e., the conditional distribution of X"+ 1 given X0 X" is p(X", y) dy. (ii) Compute the distribution of X. under P. (iii) Show that under PY the conditional distribution of X. given X o = x o is p 1 " ) ( x o , y) dy, where ,...
,
p(x,Y)=
f
p ( n - 1)
(x, z)p(z, y) dz
and
p ( ' ) =p.
(iv) Show that if S = inf{p(x, y): x, ye [c, d]} > 0, then
J I P(x, Y) — P(z, Y)I dy < 2[ 1 — b(d — c)]
199
EXERCISES
by breaking the integral into two terms involving y such that p(x, y) > p(z, y) and those y such that p(x, y) -< p(z, y). (v) Show that there is a continuous strictly positive function n(y) such that max {Ip " (x, y) — m(y)I: c -< x, y -< d} < [1 — b(d — c)]" 'p (
)
where p = max {Ip(x, y) — p(z, y)I: c -< x, y, z -< d} < oo. (vi) Prove that it is an invariant distribution and moreover that under the present conditions it is unique. (vii) Show that P(X" e (a, b) i.o.) = 1, for any c m).] -
10. (Random Walk on the Circle) Let {X"} be an i.i.d. sequence of random variables taking values in [0, 1] and having a continuous p.d.f. _f(x). Let {S} be the process on [0, 1) defined by mod1,
S"=x+X,+•••+X"
where x E [0, 1). (i) Show that {S"} is a Markov chain and calculate its transition density. (ii) Describe the time asymptotic behavior of p "»(x, y). (iii) Describe the invariant distribution. (
11. (The Umbrella Problem) A person who owns r umbrellas distributes them between home and office according to the following routine. If it is raining upon departure from either place, an event that has probability p, say, then an umbrella is carried to the other location if available at the location of departure. If it is not raining, then an umbrella is not carried. Let X. denote the number of available umbrellas at whatever place the person happens to be departing on the nth trip. (i) Determine the transition probability matrix and the invariant distribution (equilibrium). (ii) Let 0 < a < 1. How many umbrellas should the person own so that the probability of getting wet under the equilibrium distribution is at most a against a climate (p)? What number works against all possible climates for the probability a?
Exercises for Section I1.7 1. Calculate the invariant distributions for Exercise 2.6. 2. Calculate the invariant distribution for Exercise 5.6. 3. A transition matrix is called doubly-stochastic if its transpose p' is also a transition matrix; i.e., if the elements of each column add to 1. (i) Show that the vector consisting entirely of l's is invariant under p and can be normalized to a probability distribution if S is finite. (ii) Under what additional conditions is this distribution the unique invariant distribution? 4.
(i) Suppose that
it = ( a ; )
is an invariant distribution for p. The distribution P,, of
200
DISCRETE-PARAMETER MARKOV CHAINS
the process is called time-reversible if it p ij = 7r j pj; for all i, j e S [it is often said to be time-reversible (with respect to p) as well]. Show that if S is finite and p is doubly stochastic, then the (discrete) uniform distribution makes the process time-reversible if and only if p is symmetric. (*ii) Suppose that {X^} is a Markov chain with invariant distribution it and started in x. Then {X„} is a stationary process and therefore has an extension backward in time ton = 0, ± 1, ± 2..... [Use Kolmogorov's Extension Theorem.] Define the time-reversed process by Y„ = X_,. Show that the reversed process {}„} is a Markov chain with 1-step transition probabilities q ;j = nj pji /n,. (iii) Show that under the time-reversibility condition (i), the processes in (ii), { Y„} and {X„}, have the same distribution; i.e., in equilibrium a movie of the evolution looks the same statistically whether run forward or backward in time.
(iv) Show that an irreducible Markov chain on a state space S with an invariant initial distribution it is time-reversible if and only if (Kolmogorov Condition): Pr, Pi,i 2 •
Piki = PukP ikik- 1 * • •P i,i
for all i, i 1 , .... i k E S, k >_ 1.
(v) If there is a j e S such that p ij > 0 for all i 0 j in (iv), then for time-reversibility it is both necessary and sufficient that p ij Pik Pki = Pik Pkj Pji for all i, j, k. 5. (Random Walk on a Tree) A tree graph on n vertices v 1 , v 2 , ... , v, is a connected graph that contains no cycles. [That is, there is given a collection of unordered pairs of distinct vertices (called edges) with the following property: Any two distinct vertices
u, v e S are uniquely connected in the sense that there is a unique sequence e l , e 2 .... , e„ of edges e ; _ {v k; , v,„,} such that u e e, v e e,,, e i n e i ,, ^ 0, i = 1, ... , n — 1.] The degree v i of the vertex v ; represents the number
of vertices adjacent to v i , where u, v E S are called adjacent if there is an edge {u, v}. By a tree random walk on a given tree graph we mean a Markov chain on the state space S = {v1, v2.... , v r } that at each time step n changes its state v ; to one of its v i randomly selected adjacent states, with equal probabilities and independently of its states prior to time n. (i) Explain that such a Markov chain must have a unique invariant distribution. (ii) Calculate the invariant distribution in terms of the vertex degrees v i , i = 1, ... , r. (iii) Show that the invariant distribution makes the tree random walk time-reversible. n = 0, 1, 2, ... . 6. Let {X„} be a Markov chain on S and define Y. = (X, X (i) Show that {Y„} is a Markov chain on S' = {(i, j) E S x S: p i , > 0). (ii) Show that if {X„} is irreducible and aperiodic then so is {Y„}. (iii) Show that if {X„} has invariant distribution it = (rz ; ) then {}„} has invariant distribution (n ; p ; ,).
7. Let {X„} be an irreducible Markov chain on a finite state space S. Define a graph G having states of S as vertices with edges joining i and j if and only if either p, j > 0 or pj; > 0. (i) Show that G is connected; i.e., for any two sites i and j there is a path of edges from i to j. (ii) Show that if {X„} has an invariant distribution it then for any A c S, Y. E 7 ri Pij = /_ Y_ Irj Pjl leA jES\A ieA jeS\A
EXERCISES
201
(i.e., the net probability flux across a cut of S into complementary subsets A, S\A is in balance). (iii) Show that if G contains no cycles (i.e., is a tree graph in the sense of Exercise 5). then the process is time-reversible started with n.
Exercises for Section 1I.8 1. Verify (8.10) using summation by parts as indicated. [Hint: Let Z be nonnegative integer-valued. Then P(Z>r)=
P(Z=n). r=on=r+t
.=o
Now interchange the order of summation.] 2. Classify the states in Examples 3.1-3.7 as transient or recurrent. 3. Show that if j is transient and i —*j then - o p ^ < co and, in particular, p —*0 as n —• oc. [Hint: Represent N(j) as a sum of indicator variables and use (8.11).] (
)
4. Classify the states for the models in Exercises 2.6, 2.7, 2.8, 2.9 as transient or recurrent. 5. Classify the states for {R„} = {IS,,}, where S. is the simple symmetric random walk starting at 0 (see Exercise 1.8). 6. Show that inessential states are transient. 7. (A Birth or Collapse Model) Let
1
-- ' Pi.^+I =i+1
i
Pi.o
=i+
i
1'
=0, 1,2,....
Determine whether p is transient or recurrent. 8. Solve Exercise 7 when I
i
>_ I ' Po.i
1.
9. Let p,, +1 = p, p; , o = q, i = 0, 1, 2, .... Classify the states of transient or recurrent (0 < p < 1, q = 1 — p).
S=
A.o
=
i+
l '
Pi.;+I
= i + 1'
i
{0, 1, 2, ...} as
10. Fix i,.j E S. Write r„=P;(X„=j)
= p;; ' (n%1).
%=P i (X,,,^jform
r0= 1 . (n%1)•
(i) Prove (see (5.5) of Chapter I) that r„ = J]ni=1 f,,r„_ m (n _> 1). (ii) Sum (i) over n to give an alternative proof of (8.11). (iii) Use (i) to indicate how one may compute the distribution of the first visit to state j (after time zero), starting in state i, in terms of p;PP (n _> 1).
202
DISCRETE-PARAMETER MARKOV CHAINS
11. (i) Show that if II • II and II • 110 are any two norms on t, then there are positive
constants c,, c 2 such that
c1 IIxll -< Ilxllo <- c211x11
for all x e
A norm on t8 k is a nonnegative function x —p II x II on 08 k with the properties that (a) Ilcx11 = Icl Ilxll for c e R , x e ask, (b) IlxII =0,xel8k,iffx=0,
(c) Ix + y11 _< Ilxll + IIYII for all x, ye R'. [Hint: Use compactness of the unit ball in the case of the Euclidean norm, IIXII=(xi+
...
+x)' 12 x=(x 1 ,...,x k )•] ,
(ii) Show that the stability condition given in Example 1 implies that X. —. 0 in every norm.
Exercises for Section II.9 1. Let Y1 , Y2 , ... be i.i.d. random variables. (i) Show that if EIY,•I < oo then {max(Y1 , .... Yn )}/n --* 0 a.s. as n —> cc. (ii) Verify (9.8) under the assumption (9.4). [Hint: Show that P(IY^I > e i.o.) = 0 for every e> 0.] 2. Verify for (9.12) that n lim — = E(T; 2) — t) = provided E j (tj(' ) ) < co.
3.Prove Jim ' = o0 n w Nn
Pi—a.s. for a null-recurrent state j such that i#-j. j. 4. Show that positive and null recurrence are class properties. 5. Let {X„} be an irreducible aperiodic positive recurrent Markov chain having transition matrix p on a denumerable state space S. Define a transition matrix q on S x S by (i, j), (k, 1) e S x S.
= Pik P;i,
Let {Z„} be a Markov chain on S x S with transition law q. Define Tp = inf{n: Zn e D}, where D = {(i, i): je S} c S x S. Show that if P(i.j) (TD < cc) = I for all i, j then for all i, je S, lim,,.,. p;; = 7t 1 , where {n t } is the unique invariant distribution. [Hint: Use the coupling method described in Exercise 6.6 for finite state space.] )
EXERCISES
203
6. (General Birth—Collapse) Let p be a transition probability matrix on S={0,1,2,...} of the form e ;. , + ,=p i ,p,. o =I— p j , i=0,1,2,...,O
i >_ 1, Po = 1. Show:
(i) All states are recurrent k
x
iff lim F1 p j = 0
iff I (1 — p j ) = oo .
(ii) If all states are recurrent, then positive recurrence holds
pj < b .
iff k=1j=1
(iii) Calculate the invariant distribution in the case p j = 1/(j + 2). 7. Let {S"} be the simple symmetric random walk starting at the origin. Define f: Z -^ l8' by f (n) = I if n _> 0 and f (n) = 0 if n < 0. Describe the behavior of [ (So) + ... + f(Sn- 1 )]/n as n
-i
oo.
8. Calculate the invariant distribution for the Renewal Model of Exercise 3.8, in the case that p"=pn -1 (1 —p),n=0, 1,2,. . . where 0
for a, >a e { + 1, — 1}, n = 0, + 1, ±2.... ; by the Markov property is meant that the conditional distribution of X" + , given {Xk : k _< n} does not depend on {Xk ,k_
(i) Calculate the unique invariant distribution n for p. (ii) Calculate the large-scale magnetization (i.e., ability to pick up nails), defined by MN =[X_ N + ... +XN ]/(2N+1)
in the so-called bulk (thermodynamic) limit as N —+ oo. (iii) Calculate and plot the graph (i.e., magnetic isotherm) of EX0 as a function of H for fixed temperature. Show that in the limit as H —• 0 + or H — 0 - the bulk magnetization EX, tends to 0; i.e., there is no (zero) residual magnetization remaining when H is turned off at any temperature.
DISCRETE-PARAMETER MARKOV CHAINS
204
(iv) Determine when the process (in equilibrium) is reversible for the invariant
distribution; modify Exercise 7.4 accordingly. 10. Show that if {X„} is an irreducible positive-recurrent Markov chain then the condition (iv) of Exercise 7.4 is necessary and sufficient for time-reversibility of the stationary process started with distribution it. *11. An invariant measure for a transition matrix ((p ;j )) is a sequence of nonnegative numbers (m ; ) such that m ; p ;J = m j for all j ES. An invariant measure may or may not be normalizable to a probability distribution on S. (i) Let p ; _ ;+ , = p ; and p ; , o = I — p ; for i = 0, 1, 2, .... Show that there is a unique invariant measure (up to multiples) if and only if tim,,. fjk=, Pk = 0; i.e., if and only if the chain is recurrent, since the product is the probability of no
return to the origin. (ii) Show that invariant measures exist for the unrestricted simple random walk but are not unique in the transient case, and is unique (up to multiples) in the (null) recurrent case. (iii) Let Poo = Poi = i and p i , ; _, = p i , ; = 2 - ' -Z , and p ;.;+ , = I — 2 i = 1, 2, 3, .... Show that the probability of not returning to 0 is positive (i.e., transience), but that there is a unique invariant measure. 12. Let { Y„} be any sequence of random variables having finite second moments and let y„,„ = Cov(Y„, Y,„), µ„ = EY„, o„ = Var(Y„) = y„„, and p (i) Verify that —1 <_ p _< 1 for all n and m. [Hint: Use the Schwarz Inequality.] (ii) Show that if p„ . ,„ _< f(In — ml), where f is a nonnegative function such that n_ 2 Yk=öf(k)Y n =1 Qk -*0 as n -* oo, then the WLLN holds for {}„}. (iii) Verify that if = p o ^„_^ = p(ln — ml) > 0, then it is sufficient that p(k) -*0 as n -+ oc for the WLLN. (iv) Show that in the case of nonpositive correlations it is sufficient that n -1 Ik= 1 ak -+0 as n , oo for the WLLN. 13. Let p be the transition probability matrix for the asymmetric random walk on S = {0, 1, 2, ...} with 0 absorbing and p i , ;+ , = p > Z for i _> 1. Explain why for fixed i > 0,
1 ^ j e S, n,„-,
does not converge to the invariant distribution S o ({j}) (as n -- co). How can this be modified to get convergence? 14. (Iterated Averaging)
(i) Let a,, a 2 , a 3 be three numbers. Define a 4 = (a, + a 2 + a 3 )/3, a 5 = (a 2 + a 3 + a 4 )/3,.... Show that lim,, a„ = (a, + 2a 2 + 3a 3 )/6. (ii) Let p be an irreducible positive recurrent transition law and let a,, a 2 , ... be any bounded sequence of numbers. Show that lim
Pi! aj _ )
ajnt,
where (it) is the invariant distribution of p. Show that the result of (i) is a special case.
EXERCISES
205
Exercises for Section 11.10
1. (i) Let Y1 , Y,, ... be i.i.d. with EY < co. Show that max(Y 1 , ... , Y")/,/n —• 0 a.s. as n —* . [Hint: Show that P(Y.2 > ne i.o.) = 0 for every e > 0.] (ii) Verify that n '/ZS" has the same limiting distribution as (10.6). -
2. Let {W"(t): t >_ 0} be the path process defined in (10.7). Let t I < t 2 < • • • < t k , k >_ 1, be an arbitrary finite set of time points. Show that (W"(t 1 ), ... , W"(t k )) converges in distribution as n —^ x to the multivariate Gaussian distribution with mean zero and variance—covariance matrix ((D min{t i , t i })), where D is defined by (10.12). 3. Suppose that {X"} is Markov chain with state space S = { 1, 2, . .. , r} having unique invariant distribution (it ; ). Let N(i)= #{k_
ieS.
Show that `(N"(1) n
Na(r) — — 7C 1
...
n
n
is asymptotically Gaussian with mean 0 and variance—covariance matrix F = ((y, J )) where = b it — n ; n t +
(pj — n t 7Z I ),
(p) — njmi) + k=1
for 1 -<
i, j -< r.
k=1
4. For the one-dimensional nearest-neighbor Ising model of Exercise 9.9 calculate the following: (i) The pair correlations p,,,, = Cov(X", X,"). (ii) The large-scale variance (magnetic susceptibility) parameter Var(X 0 ). (iii) Describe the distribution of the fluctuations in the (bulk limit) magnetization (cf. Exercise 9.9(i)).
5. Let {X"} be a Markov chain on S and define Y = (X", Xri+ 1 ), n = 0, 1, 2..... Let p = ((p, j )) be the transition matrix for {X"}. (i) Show that { Y"} is a Markov chain on the state space defined by S'_ {(i,j)eS x S:p ij >0}.
(ii) Show that if {XX } is irreducible and aperiodic then so is { }ç"}. (iii) Suppose that {X"} has invariant distribution it = (n i ). Calculate the invariant distribution of { },}. (iv) Let (i, j) e S' and let T be the number of one-step transitions from i to j by X0 , X"... , X" started with the invariant distribution of (iii). Calculate lim". x (T"/n) and describe the fluctuations about the limit for large n. 6. (Large-Sample Consistency in Statistical Parameter Estimation) Let XX = I or 0 according to whether the nth day at a specified location is wet (rain) or dry. Assume {X"} is a two-state Markov chain with parameters ß = P(X" +1 = II X" = 0) and
S=P(X 1 =0^X"=1),n=0,1,2,...,0
206
DISCRETE-PARAMETER MARKOV CHAINS
and ß^" = T"/n, where S. = X0 + • • • + X. is the number of wet days and )
_ >.
k=O
l ((Xk.Xk l)=(0.I)) is the number of dry-to-wet transitions. Calculate
n(" and lim"—. ß^" )
)
7. Use the result of Exercise 1.5 to describe an extension of the SLLN and the CLT to certain rth-order dependent Markov chains. Exercises for Section II.11 1. Let {X"} be a two-state Markov chain on S = {O, 1 } and let T o be the first time {X"} reaches 0. Calculate P l (z o = n), n _> 1, in terms of the parameters p, o and Poi 2. Let {X"} be a three-state Markov chain on S = {0, 1, 2} where 0, 1, 2 are arranged counterclockwise on a circle, and at each time a transition occurs one unit clockwise with probability p or one unit counterclockwise with probability 1 — p. Let t o denote the time of the first return to 0. Calculate P(-r o > n), n > 1. 3. Let T o denote the first time starting in state 2 that the Markov chain in Exercise 5.6 reaches state 0. Calculate P 2 (r o > n).
4. Verify that the Markov chains starting at i having transition probabilities p and p, and viewed up to time T A have the same distribution by calculating the probabilities of the event {X0 = i, X, = i ...... Xm = ► m , T A =m} under each of p and p. 5. Write out a detailed explanation of (11.22). 6. Explain the calculation of (11.28) and (11.29) as given in the text using earlier results on the long-term behavior of transition probabilities. 7. (Collocation) Show that there is a unique polynomial p(x) of degree k that takes
prescribed (given) values v o , v 1 ..... v k at any prescribed (given) distinct points x 0 , x,, ... , x k , respectively; such a polynomial is called a collocation polynomial. [Hint: Write down a linear system with the coefficients a o , a,, ... , a k of p(x) as the unknowns. To show the system is nonsingular, view the determinant as a polynomial and identify all of its zeros.] *8. (Absorption Rates and the Spectral Radius) Let p be a transition probability matrix
for a finite-state Markov chain and let r ; be the time of the first visit to j. Use (11.4) and the results of the Perron—Frobenius Theorem 6.4 and its corollary to show that exponential rates of convergence (as obtained in (11.43)) can be anticipated more generally.
9. Let p be the transition probability matrix on S = {0, ± 1, ±2, ...} defined by ifi>0,j=0,1,2,...,i-1 ifi<0,j=0,-1,-2,..., —i +I 1
ifi=0,j=0
0
ifi=0,j960.
(i) Calculate the absorption rate.
EXERCISES
207
(ii) Show that the mean time to absorption starting at i > 0 is given by
=, (i /k).
10. Let {X„} be the simple branching process on S = {0, 1, 2, ...} with offspring distribution { fj }, f j a jfj 1. (i) Show that all nonzero states in S are transient and that lim„^ P 1 (X„ = k) = 0, k=1,2,....
(ii) Describe the unique invariant probability distribution for {X„}. II. (i) Suppose that in a certain society each parent has exactly two children, and both males and females are equally likely to occur. Show that passage of the family surname to descendants of males eventually stops. (ii) Calculate the extinction probability for the male lineage as in (i) if each parent has exactly three children. (iii) Prompted by an interest in the survival of family surnames, A. J. Lotka (1939), "Theorie Analytique des Associations Biologiques II,” Actualites Scientifiques et Industrielles, (N.780), Hermann et Cie, Paris, used data for white males in the United States in 1920 to estimate the probability function f for the number of male children of a white male. He estimated f(0) = 0.4825, f(j)= (0.2126)(0.5893)' ' (j = 1,2,...). -
(a) Calculate the mean number of male offspring. (b) Calculate the probability of survival of the family surname if there is only one male with the given name. (c) What is the survival probability forj males with the given name under this model. (iv) (Maximal Branching) Consider the following modification of the simple branching process in which if there are k individuals in the nth generation, and if X,, X 2 , ... , X„,, are independent random variables representing their respective numbers of offspring, then the (n + 1)st generation will contain Zn + , = max{X„ . ,, X z , ... , X„.k } individuals. In terms of the survival of family names one may assume that only the son providing the largest number of grandsons is entitled to inherit the family title in this model (due to J. Lamperti (1970), "Maximal Branching Processes and Long-Range Percolation,” J. Appt. Probability, 7, pp. 89-98). (a) Calculate the transition law for the successive size of the generations when the offspring distribution function is F(x), x = 0, 1, 2, ... . (b) Consider the case F(x) = 1 — (1/x), x = 1, 2, 3..... Show that lim P(Z„ + , -< kx I Z„ = k) = e x ', -
-
x>0.
12. Let f be the offspring distribution function for a simple branching process having finite second moment. Let p = > k kf(k), v 2 = E k (k — p) 2 f(k). Show that, given Xo = 1,
Var X. = g
— 1 )/(u — 1)
if
if it # I p = 1.
13. Each of the following distributions below depends on a single parameter. Construct graphs of the nonextinction probability and the expected sizes of the successive generations as a function of the parameter.
208
DISCRETE-PARAMETER MARKOV CHAINS
p (i) f(j)= q 0 (ii) f(j)=qp',
ifj= 2 ifj=0 otherwise; j=0,1,2,...;
(iii) f(j)=^ i j -
j•
14. (Electron Multiplier) A weak current of electrons may be amplified by a device consisting of a series of plates. Each electron, as it strikes a plate, gives rise to a random number of electrons, which go on to strike the next plate to produce more electrons, and so on. Use the simple branching process with a Poisson offspring distribution for the numbers of electrons produced at successive plates. (i) Calculate the mean and variance in the amplification of a single electron at the
nth plate (see Exercise 12). (ii) Calculate the survival probability of a single electron in an infinite succession of plates if y = 1.01. 15. (A Generalized Gambler's Ruin) Gamblers 1 and 2 have respective initial capitals i > 0 and c — i > 0 (whole numbers) of dollars. The gamblers engage in a series of fair bets (in whole numbers of dollars) that stops when and only when one of the gamblers goes broke. Let X. denote gambler l's capital at the nth play (n >_ 1), Xo = I. Gambler 1 is allowed to select a different game (to bet) at each play subject to the condition that it be fair in the sense that E(X„ X^_,) = X,_,, n = 1, 2, ... , and that the amounts wagered be covered by the current capitals of the respective gamblers. Assume that gambler l's selection of a game for the (n + 1)st play (n >_ 0) depends only on the current capital X. so that {XX } is a Markov chain with stationary transition probabilities. Also assume that P(X„ + I = i XX = i) < 1, 0 < i < c, although it may be possible to break even in a play. Calculate the probability that gambler 1 will eventually go broke. How does this compare to classical gamblers ruin (win or lose $1 bets placed on outcomes of fair coin tosses)? [Hint: Check that a c (i) = i/c, 0 < i < c, a 0 (i) _ (c — i)/c, 0 < i _< c (uniquely) solves (11.13) using the equation E(X„ X,) = X„_ 1 .] Exercises for Section II.12 1. (i) Show that for (12.10) it is sufficient to check that 9,(a) _ q(b, a)q(a, c) _
9,(0)
q(b, O)q(O, c)'
a, b, cc S.
[Hint: Y_ g(a) = 1, äb, c e S.] a
(ii) Use (12.11), (12.12) to show that this condition can be expressed as 9(a) 9n,ß( 8 )
9e.o(b) 9e.e( 0 ) 9e.^(a) g9(0)
9e,e(b) 9e.,,(0)
a, b, cc S.
EXERCISES
209
(iii) Consider four adjacent sites on Z in states a, ß, a, b, respectively. For notational convenience, let [a, ß, a, b] = P(X0 = a, X, = ß, X 2 = a, X 3 = b). Use the
condition (ii) of Theorem 12.1 to show [a, ß, a, b] = P(X0 = a, X 2 = a, X3 = b)g".a(ß)
and, therefore, [a, ß , a, b] _ g(ß) [a, /3', a, b]
g,,,(ß')
(iv) Along the lines of (iii) show that also [a, ß, a, b]
g ß (a)
[a, ß, a', b]
gß.b(a')
Use the "substitution scheme" of (iii) and (iv) to verify (12.10) by checking (ii). 2. (i) Verify (12.13) for the case n = 1, r = 2. [Hints: Without loss of generality, take h = 0, and note, P(X 1 = ß,X 2
=
yIXo = a,X3 =h)=
[x, ß, y b] ,
--
[a, u, v, b]
and using Exercise 1(iii)—(iv) and (12.10), [a, u, v, b] _ g(u)g(v) _ 9(a, u)q(u, v)q(v, b) [a, u', v', b]
ga."(u')gr,b(v')
q(a, u')q(u', ß)9(v', b)
Sum over u, v E S and then take u' = ß, v' = y.] (ii) Complete the proof of (12.13) for n = 1, r 2 by induction and then n
>- 2.
3. Verify (12.15). 4. Justify the limiting result in (12.17) as a consequence of Proposition 6.1. [Hint: Use Scheffe's Theorem, Chapter 0.] *5. Take S = {0, 1}, g,,,(1) = µ, g, 0 (1) = v. Find the transition matrix q and the invariant initial distribution for the Markov random field viewed as a Markov chain.
Exercises for Section 11.13 (i) Let {Y": n -> 0} be a sequence of random vectors with values in Il k which converge a.s. to a random vector Y. Show that the distribution Q. of Y. converges weakly to the distribution Q of Y. (ii) Let p(x, dy) be a transition probability on the state space S = R k (i.e., (a) for each x e S. p(x, dy) is a probability measure on (R", B') and (b) for each B E x —. p(x, B) is a Borel-measurable function on R k ). The n-step transition probability p " (x, dy) is defined recursively by (
)
210
DISCRETE-PARAMETER MARKOV CHAINS
p cl)
(x, dy) = p(x, dy),
pcn+1)(x,
B)
= Js
p(y, B)p' n) (x, dy).
Show that, if p " (x, dy) converges weakly to the same probability measure it(dy) for all x, and x —* p(x, dy) is weakly continuous (i.e., f s f (y) p(x, dy) is a continuous function of x for every bounded continuous function f on S), then n is the unique invariant probability for p(x, dy), i.e., j 13 p(x, B)rc(dx) = n(B) for all Be .V V. [Hint: Let f be bounded and continuous. Then )
)
J
dY) ->
f
.%(Y)7r(dy),
1lr(dz). ]
J
(iii) Extend (i) and (ii) to arbitrary metric space S, and note that it suffices to require convergence of n - ' Jm-ö If (Y)pl'")(x, dy) to If (y)ir(y) dy for all bounded continuous f on S. 2. (i) Let B 1 , B 2 be m x m matrices (with real or complex coefficients). Define IIBII as in (13.13), with the supremum over unit vectors in Il'" or C'°. Show that
IIBIB2II < IIB,II IIB211(ii) Prove that if B is an m x m matrix then
IIBII < m' 12 max {Ib 1 l: 1 < i,j <, m}. (iii) If B is an m x m matrix and IIBII is defined to be the supremum over unit vectors in C'", show that IIB"II >, r"(B). Use this together with (13.17) to prove that limllB"II" exists and equals r(B). [Hint: Let A. be an eigenvalue such that 12,„1 = r(B). Then there exists x e C'°, (1x11 = 1, such that Bx = Ax.]
_<
3. Suppose E I is a random vector with values in 1. (i) Prove that if b > 1 and c > 0, then P(Ie 1 I > cb")
EEIZI — 1,
where Z = logI ,I —
" =1 log
_<
nP(n < IZI < n + 1). ]
P(IZI > n) _
[Hint: n =1
n=1
(ii) Show that if (13.15) holds then (13.16) converges. [Hint:
IIS"B"II
dfl("° B""II 1 "f"°)
log c
(5
where d = max{8'IIBII': 0
_< _< r
n o }. ]
(iii) Show that (13.15) holds, if it holds for some S < 1 /r(B). [Hint: Use the Lemma.] 4. Suppose Y a"z" and 1] b "z" are absolutely convergent and are equal for Izl < r, where r is some positive number. Show that a n = bn for all n. [Hint: Within its radius of
EXERCISES
211
convergence a power series is infinitely differentiable and may be repeatedly differentiated term by term.] 5. (i) Prove that the determinant of an m x m matrix in triangular form equals the product of its diagonal elements. (ii) Check (13.28) and (13.35). 6. (i) Prove that under (13.15), Y in (13.16) is Gaussian if c" are Gaussian. Calculate the mean vector and the dispersion matrix of Y in terms of those of E". (ii) Apply (i) to Examples 2(a) and 2(b). 7. (i) In Example I show that Jhi < 1 is necessary for the existence of a unique invariant probability. (ii) Show by example that ibI < I is not sufficient for the existence of a unique invariant probability. [Hint: Find a distribution Q of the noise E" with an appropriately heavy tail.] 8. In Example 1, assume EE„ < oo, and write a = EE", X" + , = a + bX" + 0" + „ where = a" — a (n -> 1). The least squares estimates of a, b are ä N , b N , which minimize Yn-ö (XX+ , — a — bX") z with respect to a, b. (XX — X ) 2 , where (i) Show that a N = Y — 6 N X, b N = I ö ' X" + , (X" — X= N 1 X", Y= N 1 I i X. [Hint: Reparametrize to write a + bX" _ a, + b(X" — X).]
(ii) In the case Ibi < 1, prove that a N —+ a and b N - b a.s. as N —• oo. 9. (i) In Example 2 let m = 2, b„ _ —4, b 12 = 5, b 2 , = —10, b 22 = 3. Assume e, has a finite absolute second moment. Does there exists a unique invariant probability? (ii) For the AR(2) or ARMA(2, q) models find a sufficient condition in terms of ß o and (i, for the existence of a unique invariant probability, assuming that q" has a finite rth moment for some r > 0.
Exercises for Section II.14 1. Prove that the process {X"(x): n >- 0} defined by (14.2) is a Markov process having the transition probability (14.3). Show that this remains true if the initial state x is replaced by a random variable X. independent of {a"}. 2. Let F"(z):= P(X, -< z), n >- 0, be a sequence of distribution functions of random variables X. taking values in an interval J. Prove that if F + ,„(z) — F(z) converges to zero uniformly for z e J, as n and m go to co, then F(z) converges uniformly (for all z e J) to the distribution function F(z) of a probability measure on J. 3. Let g be a continuous function on a metric space S (with metric p) into itself. If, for some x e S, the iterates g ( "^(x) - g(gt" (x)), n >- 1, converge to a point x* E S as n —* cc, then show that x* is a fixed point of g, i.e., g(x*) = x*. 4. Extend the strong Markov property (Theorem 4.1) to Markov processes {X"} on an interval J (or, on 5. Let r be an a.s. finite stopping time for the Markov process {X"(x): n > 0} defined by (14.2). Assume that X,(x) belongs to an interval J with probability 1 and p ( " ) (y, dz) converges weakly to a probability measure n(dz) on J for all ye J. Assume also that p(y, J) = I for all y e J.
212
DISCRETE-PARAMETER MARKOV CHAINS (i) Prove that p " (x, dz) converges weakly to n(dz). [Hint: p '` (x, J) —. I as k --i c0, (
f f(y)p
(k+r)
)
(
.)
(x dy) = f ($f(z)p^ (y, dz))p
(k)
)
(x dy).]
(ii) Assume the hypothesis above for all x e J (with J not depending on x). Prove
that it is the unique invariant probability. 6. In Example 2, ifs,, are i.i.d. uniform on [— 1,1], prove that {X 2n (x): n > 1} is i.i.d. with common p.d.f. given by (14.27) if XE [-2, 2]. 7. In Example 2, modify f as follows. Let 0 < ö <. Define fo (x): _f(x) for —2 _< x < —6, and 6 _< x _< 2, and linearly interpolate between (—ö,), so that f, is continuous. (i) Show that, for x e [6, 1] (or, x c [-1, —b]) {X"(x): n >, 1} is i.i.d. with common distribution it (or, nx+2). (ii) For x e (1, 2] (or, [-2, —1)) {X"(x): n >_ 2} is i.i.d. with common distribution —
ix (or, ix +2).
(iii) For x e (-8, 6) {X"(x): n >_ 1} is i.i.d. with common distribution it_ X+ ,. 8. In Example 3, assume P(e 1 = 0) > 0 and prove that P(e,, = 0 for some n >_ 0) = 1. 9. In Example 3, suppose E log s > 0. (i) Prove that E,„ , {1/(E 1 ...e" )} converges a.s. to a (finite) nonnegative random variable Z. (ii) Let d, := inf{z > 0: P(Z < z) > 0}, d 2 == sup{z >- 0: P(Z >- z) > 0}. Show that ;
=0
if x < c(d, + l ),
p(x) e (0, 1)
if c(d, + l) < x < c(d 2 + 1),
=1
if x > c(d 2 + 1).
10. In Example 3, define M := sup{z >_ 0: P(e, > z) > 0}. (i) Suppose I <M < oo. Then show that p(x) = 0 if x < cM/(M — 1). (ii) If Al _< 1, then show that p(x) = 0 for all x. (iii) Let m be as in (14.34) and M as above. Let d,, d 2 be as defined in Exercise 9(ii). Show that (a) d, = > m- "
_
n =,
(b)d 2 =
M " -
1
if m > 1, = oo if m 5 1 ,
and
m—I
(=_ifM>1=coifM1).
m—I
Exercises for Section I1.15 1. Let 0 < p < I and suppose h(p) represents a measure of uncertainty regarding the
occurrence of an event having probability p. Assume (a) h(i)=0. (b) h(p) is strictly decreasing for 0 < p _< 1. (c) h is continuous on (0, 1]. (d)h(pr) = h(p) + h(r), 0
1,0< r < I.
EXERCISES
213
Intuitively, condition (d) says that the total uncertainty in the joint occurrence of two independent events is the cumulative uncertainty for each of the events. Verify that h must be of the form h(p) = —c log 2 p where c = h(21) > h(1) = 0 is a positive constant. Standardizing, one may take
(f)
h(p) = — log2 p.
be a probability distribution on S = { 1, 2, . .. , M}. Define the entropy 2. Let f= in f by the "average uncertainty," i.e., H(f)
=—Zf
(0 log 0:= 0).
loge fi
a=1
—Y_
(i) Show that H(f) is maximized by the uniform distribution on S. (ii) If g = (g ; ) is another probability distribution on S then H(f)
Ji 1og 2 gi
i
with equality if and only if f
= g for all i E S. ;
<_
3. Suppose that X is a random variable taking values a 1 ..... a M with respective probabilities p(a, ), ... , p(a M ). Consider an arbitrary binary coding of the respective symbols a,, I < i z M, by a string cb(a ; ) = ( rI'°, ... , sl,'t) of 0's and l's, such that no string (eilt, ... , F „11 ) can be obtained from a shorter code (c,°, ... , e „ 1 n ; n j , by adding more terms; such codes will be called admissible. The number n ; of bits is called the length of the code-word 4)(a 1 ). (i) Show that an admissible binary code 0 having respective lengths n ; exists if and only if M
I. i =1
[Hint: Let n 1 .... , n M be positive integers and let k = 1,2,....
Pk = #{ i:n i =k},
Then it is necessary and sufficient for an admissible code that p, S 2, P2522-2P1.....Pk52k—pl2k 1 Pk -1 2 ,k>, 1 .] (ii) (Noiseless Coding Theorem) For any admissible binary code 0 of a,..... a M having respective lengths n 1 ..... M' the average length of code-words cannot be made smaller than the entropy of the distribution p of a,, ... , a,, i.e., —...—
M
i
—Z M
n.p(a,)
%
p(a) log2 p(ai)•
i =1
=l
Y
[Hint: Use Exercise 2(ü) with f = p(a ; ), g ; = 2 - " to show M
H(p)
M
n1p(ai) + log 2 (
i =1
Then apply Exercise 3(i) to get the result.]
2 - ""). k =1
214
DISCRETE-PARAMETER MARKOV CHAINS
(iii) Show that it is always possible to construct an admissible binary code of a l , ... , a, such that M
n1 P(ai) 5 H(p) + 1.
H(p)
_< <,
[Hint: Select n, such that
—log t p(a 1 )
n;
— log 2 p(a) + 1
for 1
_< _< i
M,
and apply 3(i).] (iv) Verify that there is not a more efficient (admissible) encoding (i.e. minimal average number of bits) of the symbols a,, a 2 , a 3 for the distribution p(al) = z, P(a2) = p(a 3 ) = ä, than the code 0(a 1 ) = ( 0), 0(a 2 ) = ( 1, 0), ¢(a 3 ) = ( 1, 1)• 4. (i) Show that Y1 , Y2 in (15.4) satisfies the law of large numbers. (ii) Show that for (15.5) to hold it is sufficient that Yl , YZ satisfy the law of large numbers. ,
...
,
...
THEORETICAL COMPLEMENTS Theoretical Complements to Section II.6 1. Let S be an arbitrary state space equipped with a sigmafield .5° of events. Let y(dx) be a sigmafinite measure on (S, 9) and let p(x, y) be a transition probability density with respect to p; i.e., p is a nonnegative measurable function on (S x S, .9' x ,9') such that for each fixed x, p(x, y) is a p-integrable function of y with total mass 1. Let S2 denote the space of all sequences w = (x 0 x,, ...) of states x i e S. Let .F, denote the class of all finite-dimensional sets A of the form ,
A={w=(x 0 ,x l ,...)eftx 1 eB 1 ,i = 0,1,.. ,n}, where B ; e 9 for each i. Define PP (A) for such a set A, x e B 0 , by Px(A)
=L
...
9,
P(x,Y1)P(Y1,Y2)...P(yn_1,Y(dyn)...u(dy1)
(T.6.1)
for x e S. Define P(A) = 0 if x 0 B 0 . The Kolmogorov Extension Theorem assures us that Px has a unique extension to a probability measure defined for all events in the smallest sigmafield .F of subsets of S2 that contain S. For any probability measure y on (S, .9"), define P1(A) =
J PX(A)y(dx),
A e F.
(T.6.2)
Let X. denote the nth coordinate projection mapping on 12. Then {X„} is said to be a Markov chain on the state space S with transition density p(x, y) with respect to p and initial distribution, y under P. The results of Exercise 6.9 can be extended to this
THEORETICAL COMPLEMENTS
215
setting as follows. Suppose that there is a positive integer r and a p-integrable function p on S such that fs p(x)p(dx) > 0 and p ' (x, y) > p(y) for all x, y in S. Then there is a unique invariant distribution it such that (
sup B
)
p ( " » (x, y)p(dy) — it(B)
(1 —
where a = f s p(x) u(dx), n' = [n/r], and the sup is over B e .Y .
❑
Proof. To see this define sup M(B) :=
p " (u, y)p(dy) (
)
uES ^ JB
inf if p " (u, y)p(dy),
m"(B):=
(
)
(T.6.3)
s "
sup {M(B) — m"(B)}. B
Then = sup
Ip ( " )
"
1.
(x, y) — pc (z, y)I u(dy) )
2 , s
pick+ n.> (ii)
Ifil
(x, y)p(dy) — p «k+ '(z, y)p(dy)1
( 1 — e)[Mk.(B) — mk.(B)]
(iii) The probability measure it given by n(B)
J p " (x, y)p(dy)
=
lim
su ( " ) (x, B p Ip
y)p(dy) — it(B)
(
)
(T.6.4)
is well defined and (1 — a)"'.
(T.6.5) n
Also, the following facts are simple to check. (iv) it is absolutely continuous with respect to p and therefore by the Radon—Nykodym Theorem (Chapter 0) has a density rz(y) with respect to p. (v) For Be .9' with 7r(B) > 0, one has Py,(X" e B i.o.) = 1. 2. (Doeblin Condition) A well-known condition ensuring the existence of an invariant distribution is the following. Suppose there is a finite measure cp and an integer r >_ 1, and > 0, such that p ' (x, B) < I — r whenever (p(B) _< e. Under this condition there is a decomposition of the state space as S = U;"_ j S i such that (i) Each Si is closed in the sense that p(x, S ; ) = I for x e S. (1 i m). (ii) p restricted to Si has a unique invariant distribution rti. (
)
216
DISCRETE-PARAMETER MARKOV CHAINS
(iii) -
j p ' (x, B) -+ zr (B) (
)
;
as n -. cc for x e Si .
n r =^
Moreover, the convergence is exponentially fast and uniform in x and B. It is easy to check that the condition of theoretical complement 1 above implies the Doeblin condition with Qp(dx) = p(x)p(dx). The more general connections between the Doeblin condition with the gi(dx) = p(x) p(dx) condition in theoretical complement 1 can be found in detail in J. L. Doob (1953), Stochastic Processes, Wiley, New York, pp. 190. 3. (Lengths of Increasing Runs) Here is an example where the more general theory applies. Let X,, X2 , ... be i.i.d. uniform on [0, 1]. Successive increasing runs among the values of the sequence X 1 , X2 ,. . . are defined by placing a marker at 0 and then between X; and X +1 whenever X; exceeds X +1 , e.g., X0.20, 0.24, 0.6010.5010.30, 0.7010.20... . Let Y. denote the initial (smallest) value in the nth run, and let L. denote the length of the nth run, n = 1, 2.... . (i) { Y,} is a Markov chain on the state space S = [0, 1] with transition density e' - x
ify<x
P(x, Y) _ (ii) Applying theoretical complement 1, { Y„} has a unique invariant distribution n. Moreover, it has a density given by n(y) = 2(1 — y), 0 < y < 1. (iii) The limit in distribution of the length L. as n --+ oo may also be calculated from that of { }„}, since ,r,(1_y)m m I Y„ = Y)P(Y„ E dY) = P(L ( m — 1)! 0
J
fo
P(L„ m) =
P(Y„ E dY),
and therefore, ri
—y )1 P(L >, m):= lim P(L„ >_ m) = J ^ 1 0
2(1 — y) dy.
o (m — 1)!
„--W
Note from this that P(L. _> m) = 2.
EL. _ m=1
Theoretical Complement to Section 11.8 1. We learned about the random rounding problem from Andreas Weisshaar (1987), "Statistisches Runden in rekursiven, digitalen Systemen 1 und 2,” Diplomarbeit erstellt am Institut für Netzwerk- und Systemtheorie, Universität Stuttgart. The stability condition of Example 8.1 is new, however computer simulations by Weisshaar suggest that the result is more generally true if the spectral radius is less than 1. However, this is not known rigorously. For additional background on the applications of this
THEORETICAL COMPLEMENTS
217
technique to the design of digital filters, see R. B. Kieburtz, V. B. Lawrance, and K. V. Mina (1977), "Control of Limit Cycles in Recursive Digital Filters by Randomized Quantization," IEEE Trans. Circuits and Systems, CAS-24(6), pp. 291-299, and references therein. Theoretical Complements to Section I1.9 1. The lattice case of Blackwell's Renewal Theorem (theoretical complement 2 below)
may be used to replace the large-scale
Caesaro type convergence
obtained for the
transition probabilities in (9.16) by the stronger elementwise convergence described as follows. (i) If j e S is recurrent with period 1, then for any i e S, lim pi; ) = pj;(E;i; t) ) - ` nom
where p 0 is the probability P; (i^' ) < cc) (see Eq. 8.4). (ii) If j e S is recurrent with period d> 1, then by regarding p ° as a one-step transition law, lim p(Jd) = pijd(E;i^ii)-
i
n-^m
To obtain these from the general renewal theory described below, take as the delay Zo the time to reach j for the first time starting at i. The durations of the subsequent replacements Z 1 , Z 2 , ... represent the lengths of times between returns to I.
2. (Renewal Theorems) Let Z,, 4,... be independent nonnegative random variables such that Z,, 4,.. . are i.i.d. with common (nondegenerate) distribution function F, and Z o has distribution function G. In the customary framework of renewal theory, components subject to failure (e.g. lightbulbs) are instantly replaced upon failure, and Z,, Z 2 , ... represent the random durations of the successive replacements. The delay random variable Zo represents the length of time remaining in the life of the initial component with respect to some specified time origin. For example, if the initial component has age a relative to the placement of the time origin, then one may take G(x) _
F(x
+ a) — F(a)
1 — F(a)
Let S. = Zo + Z, + ••• + Zn ,
n ^> 0,
(T.9.1)
and let N,=inf{n>-O:Sn>t},
t>-0.
(T.9.2)
We will use the notation S.0, N° sometimes to identify cases when Z o = 0 a.s. Then Sn is the time of the (n + 1)st renewal and N, counts the number of renewals up to
218
DISCRETE-PARAMETER MARKOV CHAINS
and including time t. In the case that Z o = 0 a.s., the stochastic (counting) process {N} - {N°} is called the (ordinary) renewal process. Otherwise {N,} is called a delayed renewal process. For simplicity, first restrict attention to the case of the ordinary renewal process. Let u = EZ l < oo. Then 1/µ is called the renewal rate. The interpretation as an average rate of renewals is reasonable since Sx
-i
\ t \ —
N,
(T.9.3)
N, N,
and N, -- oc as t --* oo, so that by the strong law of large numbers
N, t
-.
1 a.s. p
>_
as t - oc.
Since N, is a stopping time for {Sn } ( for fixed t (Chapter I, Corollary 13.2) that
(T.9.4)
0), it follows from Wald's Equation
ESN , = pEN,.
(T.9.5)
In fact, the so-called Elementary Renewal Theorem asserts that
EN,1 - t
as t --* Cc.
µ
(T.9.6)
To deduce this from the above, simply observe that pEN, = ESN , > t and therefore liminf
‚- .
EN,1 / -
t
p
>,
On the other hand, assuming first that Z. < C a.s. for each n 1, where C is a positive constant, gives pEN, < t + C and therefore for this case, limsup,.,(EN,/t) I/p. More generally, since truncations of the Z. at the level C would at most decrease the S„, and therefore at most increase N, and EN„ this last inequality applied to the truncated process yields
EN,
<- 1
limsup —
t CP(Z 1
>, C) +
fc
as C -4 co.
(T.9.7)
xF(dx) p
The above limits (T.9.4) and (T.9.6) also hold in the case that p = co, under the convention that 1 /0o is 0, by the SLLN and (T.9.7). Moreover, these asymptotics can now be applied to the delayed renewal process to get precisely the same conclusions for any given (initial) distribution G of delay. With the special choice of G = F. defined by x
F(x)
=1 I u j0
P(Z, > u)du,
x 0,
(T.9.8)
THEORETICAL COMPLEMENTS
219
the corresponding delayed renewal process N, called the equilibrium renewal process, has the property that for any h, t l 0,
EN(t, t + h] = h/p,
(T.9.9)
where NI(t,t+h]=N+h—N, N.
(T.9.10)
To prove this, define the renewal function m(t) = EN„ t >- 0, for {N1 }. Then for the
general (delayed) process we have P(N, n) = G(t) +
m(t) = EN, =
= G(t) +
P(N, n+l)
P(N,_„ -> n)F(du) = G(t) + J m(t — u)F(du) f0 0 '
= G(t) + m * F(t),
(T.9.11)
where * denotes the convolution defined by
m
s
F(t)
J
=
m(t
—
(T.9.12)
s)F(ds).
u Observe that g(t) = t/p, t -> 0, solves the renewal equation (T.9.11) with G = Fx,; i.e., t
1`
JJ
t
—= (1 — F(u)) du + — F(u) du = F(t) + -F(ds) du µ u o µ o N o 0 = F^(t) + 1
F(ds) = F (t) + t s F(ds). J J r duo s f EP x
—
u '
(T.9.13)
.'
To finish the proof of (T.9.9), observe that g(t) = m`(t):= EN;, t >- 0, uniquely solves (T.9.11), with G = F., among functions that are bounded on finite intervals. For if r(t) is another such function, then by iterating we have r(t) = F(t) +^^ r(t — u)F(du) 0
= F(t) + f { F^(t — u) + .'
(
J
t
u
r(t — u — s)F(ds)jF(du)
o
= P(Nj' >- 1) + P(N' -> 2)
+ = P(Nf' ^ 1) + P(N, 3 2) +
f
r(t
—
v)P(S° e dv)
+ P(Nf' ? n) + J r(t — v)P(S° e dv). (T.9.14)
220
DISCRETE-PARAMETER MARKOV CHAINS
Thus, r(t) _
P(Nj°
3 n) = m W (t)
n =i
since r J
Ir(t - v)IP(S E dv) _< sup jr(s)jP(S° <_ t) -. 0 .'
as n oo;
ssr
o
i.e.,
°°
P(S, -< t) = P(N° >_ n) -+ 0
as n -- oo
since
P(N° >, n) = EN° < cc. n =i
Let d be a positive real number and let L d = {0, d, 2d, 3d, ...}. The common distribution F of the durations Z 1 , Z 2 , ... is said to be a lattice distribution if there is a number d > 0 such that P(Z 1 e L a ) = 1. The largest such d is called the period of F. Theorem T.9.1. (Blackwell's Renewal Theorem) (i) If F is not lattice, then for any h > 0,
P(t<Sk_
EN(t,t+h]= k=1
ast -+oo,
(T.9.15)
µ
where N(t, t + h] := N, +k - NN is the number of renewals in time t to t + h. (ii) If F is lattice with period d, then
EN " = (
P(Sk = nd) - -
)
k=^
as n -• oo,
(T.9.16)
u
where Nc") _
(T.9.17)
lsk=nd) k=0
is the number of renewals at the time nd. In particular, if P(Z, = 0) = 0, then ❑ EN " is simply the probability of a renewal at time nd. (
)
Note that assuming that the limit exists for each h > 0, the value of the limit in (i), likewise (ii), of Blackwell's theorem can easily be identified from the elementary
renewal theorem (T.9.6) by noting that cp(h):= lim^.. m EN(t, t + h] must then be linear in h, p(0) = 0, and q(1)= lim {EN" +, - EN"}
= 1im
[{EN, -ENo }+{EN 2 -EN,}+•••+{ENn -EN
EN" 1 _ (im — = -. n -.. n
µ
I
}]
THEORETICAL COMPLEMENTS
221
Recently, coupling constructions have been successfully used to prove Blackwell's renewal theorem. The basic idea, analogous to that outlined in Exercise 6.6 for another convergence problem, is to watch both the given delayed renewal and an equilibrium renewal with the same renewal distribution F until their renewal times begin (approximately) to agree. After sufficient agreement is established, then one expects the statistics of the given delayed renewal process to resemble more and more those of the equilibrium process from that time onwards. The details for this argument follow. A somewhat stronger equilibrium property than (T.9.9) will be invoked, but not verified until Chapter IV. Proof. To make the coupling idea precise for the case of Blackwell's theorem with µ < oc, let {Z8 : n >- 1} and {Z„: n >- l} denote two independent sequences of renewal lifetime random variables with common distribution F, and let Z o and Z o be independent delays for the two sequences having distributions G and G = F es , respectively. The tilde () will be used in reference to quantities associated with the latter (equilibrium) process. Let a > 0 and define, for some h},
(T.9.18)
v"(s)=inf{n>,0:IS„—§,jj e. for some n}.
(T.9.19)
v(r)=inf{n>0:IS„—
Suppose we have established that (e-recurrence) P(v(e) < oo) = I (i.e., the coupling will occur). Since the event {v(e) = n, v(e) = n} is determined by 4, Z 1 , ... , Z„ and Z;,, the sequence of lifetimes {Z' +k : k >- l} may be replaced by the sequence {Z, +k : k >- 1} without changing the distributions of {S„}, {N}, etc. Then, after such a modification for < h/2, observe with the aid of a simple figure that N(t+r,t+h—e]1 1S,w—t) -
(T.9.20) Therefore, N(t+e,t+h—s]—N(t,t+h]1 {s , ( , > , t
=N(t+e,t+h — e]l^s
+(N(t+E,t+h—E]—N(t,t+h])l (s „
>g
N(t + s, t + h — E] 1 ^ s „ ,, E N(t, t + h]I (s , (,) ,, I N(t, t
+ h]1 Is ,
II
+ N(t, t + h]l {s „)(= N(t, t + h])
N(t — E, t + h + E]1 (s
, } + N(t, t + h]l {s ,,,,,,,
N(t—e,t+h+e]+N(t,t+h]l 5 )>,}.
(T.9.21)
Taking expected values and noting the first, fifth, and seventh lines, we have the following coupling inequality, EN(t+r,t+h—e]—E(N(t,t +h]l is ,. ( ,,„,) EN(t, t + h] < EN(t — c, t + h + e] + E(N(t, t + h]1 (5 , > ,). (T.9.22)
222
DISCRETE - PARAMETER MARKOV CHAINS
Using (T.9.9), EN(t+s, t+h — e]=
h
-
2s and
EN(t — E, t+h+e]=
µ
h+2e p
Therefore,
EN(t, t + h] — hl < ?E + E(N(t, t + h]l {s ,}).
u
(T.9.23)
Since A = I Is ,()>$} is independent of {ZN, +k: k >, 1}, we have E(N(t, t + h]1 (s ,^ >,^) < E o Nh P(SVO > t) = m(h)P(S V(C) > t),
(T.9.24)
where E o denotes expected value for the process N h under zero-delay. More precisely, because (t, t + h] c (t, SN , + h] and there are no renewals in (t, SN ,), we have N(t, t + h] < inf{k 3 0: ZN, +k > h). In particular, noting (T.9.2), this upper bound by an ordinary (zero-delay) renewal process with renewal distribution F, is independent of the event A, and furnishes the desired estimate (T.9.24). Now from (T.9.23) and (T.9.24) we have the estimate EN(t, t + h] — hl < m(h)P(SV(C) > t) + ?E ,
(T.9.25)
which is enough, since e> 0 is arbitrary, provided that the initial e-recurrence assumption, P(v(e) < oo) = 1, can be established. So, the bulk of the proof rests on showing that the coupling will eventually occur. The probability P(v(c) < cc) can be analyzed separately for each of the two cases (i) and (ii) of Theorem T.9.1. First take the lattice case (ii) with lattice spacing (period) d. Note that for e < d, v(e) = v(0) = inf{n > 0: S" — = 0 for some n}.
(T.9.26)
Also, by recurrence of the mean-zero random walk on the integers (theoretical complement 3.1 of Chapter I), we have P(v(0) < co) = 1. Moreover, S, (0) is a.s. a finite integral multiple of d. Taking t = nd and h = d with e = 0, we have EN ( " ) = EN(nd, nd + h] -+ h = d P p
as n - cc.
(T.9.27)
For case (i), observe by the Hewitt—Savage zero—one law (theoretical complement 1.2 of Chapter I) applied to the i.i.d. sequence (Z 1 , Z 1 ), (Z 2 , Z 2 ), (Z 3 , Z 3 ), ... , that P(R " < c i.o. I Z o = z) = 0 or 1, where R"= min{ S,f—S":5;,—S">0,n_>0}=SS,^S —
"
Now, the distribution of {SR, +; — t} ; does not depend on t (Exercise 7.5, Chapter
THEORETICAL COMPLEMENTS
223
->
IV). This, independence of {Z' j } and {S„}, and the fact that {Sk + „ — Sk : n 0} does not depend on k, make {R,, +k } also have distribution independent of k. Therefore, the probability P(R, < e for some n > k), does not depend on k, and thus {R„ < e i.o.} =
n
{R. < e for some n
-> k}
(T.9.28)
I,=0
implies P(R„ < r i.o.) = P(R„ < e for some n
J P(R„ < s i.o.
>, 0) -< P(v(e) < oo). Now,
= z)P(2 0 e dz) = P(R„ < e i.o.) = P(R„ < r for some n)
=
J P(R. <e for some n 12 = z)P(2 e dz). (T.9.29) 0
0
The proof that P(R„ <e for some n z) > 0 (and therefore is 1) in (T.9.29) follows from a final technical lemma given below on "points of increase" of distribution functions of sums of i.i.d. nonlattice positive random variables; a point x is called a point of increase of a distribution function F if F(b) — F(a) > 0 whenever a < x < b. Lemma. Let F be a nonlattice distribution function on (0, co). The set E of points of increase of the functions F, F* z , F* 3 , . . is "asymptotically dense at co" in the sense that for any t > 0 and x sufficiently large, E n (x, x + e) 96 0, i.e., the interval (x, x + e) meets E for x sufficiently large. 0
The following proof follows that in W. Feller (1971), An Introduction to Probability Theory and Its Applications, 2nd ed., Wiley, New York, p. 147. Proof. Let a, b E E, 0 a 2 /(b — a) belongs to some I., n >, 1. Since E is easily checked to be closed under addition, the n + 1 points na + k(b — a), k = 0,1, ... , n, belong to E and partition I. into n subintervals of length b — a < r. Thus each x > a 2 /(b — a) is at a distance (b — a)/2 2 of E. If for some r > 0, b — a -> e for all a, b e E then F must be a lattice distribution. To see this say, without loss of generality, E -< b — a < 2a for somea,beE.ThenEnl c {na +k(b—a):k= 0,1,...,n}.Since(n+l)aeEnl„ for a < n(b — a), E n I must consist of multiples of (b — a). Thus, if c e E then c + k(b — a) e /„ n E for n sufficiently large. Thus c is a multiple of (b — a). n Coupling approaches to the renewal theorem on which the preceding is based can be found in the papers of H. Thorisson (1987), "A Complete Coupling Proof of Blackwell's Renewal Theorem," Stoch. Proc. App!., 26, pp. 87-97; K. Athreya, D. McDonald, P. Ney (1978), "Coupling and the Renewal Theorem," Amer. Math. Monthly, 851, pp. 809-814; T. Lindvall (1977), "A Probabilistic Proof of Blackwell's Renewal Theorem," Ann. Probab., 5, pp. 482-485. 3. (Birkhoff's Ergodic Theorem) Suppose {X: n -> 0} is a stochastic process on (S2, .F, P) with values in (S, ,V). The process {X„} is (strictly) stationary if for every pair of integers m >- 0, r >- 1, the distribution of (X 0 , X 1 , ... , X,,) is the same as that
224
DISCRETE-PARAMETER MARKOV CHAINS
>,
of (X„ X l+ „ ... , X. + ,). An equivalent definition is: {X„} is stationary if the distribution µ, say, of X :_ (X0 , X 1 , X2 , ...) is the same as that of T'X :_ (X,, X l +„ X2 +„ ...) for all r 0. Recall that the distribution of (X„ X, + „ ...) is the probability measure induced on (St, .9 ®x) by the map co -+ (X,(w), X 1 + ,( w), X 2+r (w)....). Here St is the space of all sequences x = (x 0 , x 1 , x 2 , ...) with x i E S for all i, and .5'® is the smallest sigmafield containing the class of all sets of the form C = {x e St: x i E B, for 0 < i n} where n 0 and B E . ' (0 i n) are arbitrary. The shift transformation T is defined by Tx:= (x i , x 2 , ...) on St, so that T'x = (x„ XI+„ x2+„ ...). Denote by I the sigmafield generated by {X: n 0 }. That is, ✓! is the class of all sets of the form G = X -1 C = {co E Q: X(co) E C}, C E .9'®x• For a set G of this form, write T '6= {w e Q: TX(w) E C) = {(X,, X 2 ,. . .) E C) _ {X E T 'Cl. Such a set G is said to be invariant if P(G AT 'G) = 0, where A denotes the symmetric difference. By iteration it follows that if G = {X e C} is invariant then P(G AT 'G) = 0 for all r > 0, where T 'G = {(X„ X l +„ X2,,. . .) e C}. Let f be a real-valued measurable function on (5 50®) Then cp(w):= f(X(w)) is W- measurable and, conversely, all 1-measurable functions are of this form. Such a function cp = f(X) is invariant if f(X) = f(TX) a.s. Note that G = {X E C) is invariant if and only if 1 G = l c (X) is invariant. Again, by iteration, if f(X) is invariant then f(X) = f (T'X) a.s. for all r ? 1. Given any p ✓ -measurable real-valued function cp = f(X), the functions (extended real-valued)
_<
_< _< _>
;
-
-
-
t,
f(X): = lim n -1 (f(X)+f(TX)+•-•+f(T'X)) n -m
and f(X):= nyt lim n '( f(X)+ ... + f(Tn -1 X)) -
are invariant, and the set {7(X) = f(X)} is invariant. The class 5 of all invariant sets (in 5 ✓ ) is easily seen to be a sigmafield, which is called the invariant sigmafield. The invariant sigmafield .1 is said to be trivial if P(G) = 0 or 1 for every G E J. Definition T.9.1. The process {X„: n > 0} and the shift transformation T are said to be ergodic if S is trivial.
_>
The next result is an important generalization of the classical strong law of large numbers (Chapter 0, Theorem 6.1).
Theorem T.9.2. (Birkhoff's Ergodic Theorem). Let {X: n
0) be a stationary sequence on the state space S (having sigmafield .9'). Let f(X) be a real-valued 1-measurable function such that Elf(X)( < oc. Then f(TX) converges a.s. and in L' to an invariant random variable g(X), (a) n and ❑ (b) g(X) = Ef(X) a.s. if S is trivial. -
' Y_;=ö
We first need an inequality whose derivation below follows A. M. Garcia (1965), "A Simple Proof of E. Hopf's Maximal Ergodic Theorem,” J. Math. Mech., 14, pp. 381-382. Write
THEORETICAL COMPLEMENTS
225
M„(f):= max{0, f(X),f(X)+ f(TX),...,f(X)+ • • • + f(T" - 'X)}, M(f-T)= max{0, f(TX),f(TX)+f(T 2 X),...,f(TX)+•••+f(T"X)}, M(f):= lim M„(f)= sup M„(f). (T.9.30)
Proposition T.9.3. (Maxima! Ergodic Theorem). Under the hypothesis of Theorem T.9.2, f(X)dP>-0
VGEJ.
(T.9.31)
fuW 1 01IG
Proof. Note that f(X)+M„(foT)=M^ +1 (f) on the set {MM+ ,(f)>0}. Since M, + , (f) >, M(f) and {M(f) > 0} c {M„ + , (f) > 0}, it follows that f (X) M(f) — M„(f o T) on {M(f) > 0}. Also, M (f) -> 0, M^(f o T) -> 0. Therefore, M
f(X)dP>
(M^(f)—M^(f°T))dP
-
J( M„(f)>OV G
{M„(f)>OInG
=
M„(f) dP — J J
M„(f o T) dP
O
J
J
M^(f) dP — M„(f o T) dP G
G
= 0,
where the last equality follows from the invariance of G and the stationarity of Thus, (T.9.31) holds with {M„(f) > 0} in place of {M(f) > 0}. Now let n j cc. • Now consider the quantities 1 n- 1
A,(f):= max{.f(X), -1(.f(X) + f(TX)), ... ,
n,_
o
f(T`X)
A(f):= tim A„(f)= sup A„(f). new
n,(
The following is a consequence of Proposition T.9.3.
Corollary T.9.4. (Ergodic Maximal Inequality). Under the hypothesis of Theorem T.9.1 one has, for every cc If8',
f
(A(f»c)n G
f(X) dP -> cP({A(f ) > c} n G)
VG E 5.
(T.9.32)
226
DISCRETE-PARAMETER MARKOV CHAINS
Proof. Apply Proposition T.9.3 to the function f - c to get
f
f(X) dP -> cP({M(f - c) > 0} n G).
(M(f-^»0)nG
But {M„(f - c) > 0} = {A„(f - c) > 0} = {A„(f) > c}, and {M(f - c) > 0} _
n
{A(f)>c}. We are now ready to prove Theorem T.9.2, using (T.9.31). Proof of Theorem T.9.2. Write I
„-
'
I^-'
7(X):= i-- > f(T'X),
f(X):= lim - Y f(T'X),
p_Qn.=o
n—,n.=o
Gc,,(f )'= {f(X) > c, f(X)
(T.9.33)
(c, dc R' )
;Since GG , d ( f) e 5 and Gc , d (f) c {A(f) > c}, (T.9.32) leads to
f(X) dP -> cP(GC , d ( f )).
(T.9.34)
Ld(f)
-f in place of f and note that (- f) _ G_ d , _ C (- f) = GC d (f) to get from (T.9.34) the inequality
Now take
—
f
-
f, ( f ) _ -
-
f
f(X)dP _> —dP(GC , d (f )),
G"d(f)
i.e.,
f(X)dP < dP(Gc , d (f
)).
(T.9.35)
Now if c > d, then (T.9.34) and (T.9.35) cannot both be true unless P(G,, d (f )) = 0. Thus, if c > d, then P(GC , d (f )) = 0. Apply this to all pairs of rationals c > d to get P(f (X) > f(X)) = 0. In other words, (1/n) y;=ö f (T'X) converges a.s. to g(X) := f(X).
To complete the proof of part (a), it is enough to assume f >, 0, since n - ' jö -1 f + (T'X) -- 7(X) a.s. and n - ' 2]o - 'f(TX) - r(X) a.s., where f + = max{ f, 0}, - f - = min{ f, 0}. Assume then f > 0. First, by Fatou's Lemma and stationarity of {X„},
E7(X) < lim E
”-' 1 -
f(T'(X)) = Ef(X) < oo.
n-ao n r=o
To prove the L'-convergence, it is enough to prove the uniform integrability of the sequence {(1/n)S„(f):n_>1}, where S„(f):=Zö 'f(T'X). Now since f(X) is nonnegative and integrable, given s > 0 there exists a constant N E such that -
227
THEORETICAL COMPLEMENTS
II f (X) — fE(X)II l
J
1 S"(f) dP 5 J 1 S"(f — fE) dP +
t.S^(f>>a1 nn
S"(.ff) dP Ins.,ti)>,1 n
+ N"P({n S"(f) > }) S e + N,Ef(X)/.l.
(T.9.36)
It follows that the left side of (T.9.36) goes to zero as .l --p oo, uniformly for all n. Part (b) is an immediate consequence of part (a). • Notice that part (a) of Theorem T.9.2 also implies that g(X) = E(f (X) J). Theorem T.9.2 is generally stated for any transformation T on a probability space (S2, ^, p) satisfying p(T 'G) = p(G) for all G e ^. Such a transformation is called measure-preserving. If in this case we take X to be the identity map: X(cu) = w, then parts (a) and (b) hold without any essential change in the proof. -
Theoretical Complements to Section 11.10
1. To prove the FCLT for Markov-dependent summands as asserted in Theorem 10.2, first consider X=
+ +Z Q^
Since Z i , . are i.i.d. with finite second moment, the FCLT of Chapter I provides that {X;"°} converges in distribution to standard Brownian motion. The corresponding result for {W"(t)} follows by an application of the Maximal Inequality to show sup I X;" — W " E . iJ (t)I - 0
in probability as n -+ cc,
(T.10.1)
osfsI
where t is the first return time to j. Theoretical Complements to Section 11.12
There are specifications of local structure that are defined in a natural manner but for which there are no Gibbs states having the given structure when, for example, A = Z, but S is not finite. As an example, one can take q to be the transition matrix of a (general) random walk on S = Z such that q = q11-1 > 0 for all i, j. In this case no probability distribution on S^ exists having the local structure furnished by (12.10). For proofs, refer to the papers of F. Spitzer (1974), "Phase Transition in OneDimensional Nearest-Neighbor Systems," J. Functional Analysis, 20, pp. 240-254; H. Kesten (1975), "Existence and Uniqueness of Countable One-Dimensional Markov Random Fields," Ann. Probab., 4, pp. 557-569. The treatment here follows F. Spitzer (1971), "Random Fields and Interacting Particle Systems," MAA Lecture Notes, Washington, D.C. ;;
228
DISCRETE-PARAMETER MARKOV CHAINS
Theoretical Complements to Section II.14 (Markov Processes and Iterations of I.I.D. Maps) Let p(x; dy) denote a transition probability on a state space (S, 91); that is, (1) for each x e S. p(x; dy) is a probability measure on (S, 9), and (2) for each B e.9', x --+ p(x; B) is .9'-measurable. We will assume that S is a Borel subset of a complete separable metric space, and .9' its Borel sigmafield M(S). It may be shown that S may be "relabeled" as a Borel subset C of [0, 1], with M(C) as the relabeling of .^(S). (See H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 326-327). Therefore, without any essential loss of generality, we take S to be a Borel subset of [0, 1]. For each x e S, let F(.) denote the distribution function of p(x; dy): FX (y)1= p(x; S o (—oo, y]). Define Fx ' (t) := inf {y e 11': FF (y) > t}. Let U be a random variable defined on some probability space (f2, , , P), whose distribution is uniform on (0, 1). Then it is simple to check that P(Fx'(U) _< y) > P(F x (y) > U) = P(U < FX (y)) = Fx (y), and P(F»'(U) _< y) _< P(FF (y) 3 U) = P(U S FX(y)) = Fx(y). Therefore, P(FX'(U) _< y) _ F,(y), that is, the distribution of Fx'(U) is p(x; dy). Now let U,, U2 ,. . . be a sequence of i.i.d. random variables on (S2, , P), each having the uniform distribution on (0, 1). Let X0 be a random variable with values in S. independent of {U„}. Define X„ + 1 = f(X, U,, ,) (n > 0), where f(x, u) := Fx '(u). It then follows from the above that {X: n >_ 0} is a Markov process having transition probability p(x; dy), and initial distribution that of X0 . Of course, this type of representation of a Markov process having a given transition probability and a given initial distribution is not unique. For additional information, see R. M. Blumenthal and H. K. Corson (1972), "On Continuous Collections of Measures,” Proc. 6th Berkeley Symposium on Math. Stat. and Prob., Vol. 2, pp. 33-40.
2. Example I is essentially due to L. E. Dubins and D. A. Freedman (1966), "Invariant Probabilities for Certain Markov Processes,” Ann. Math. Statist., 37, pp. 837-847. The assumption of continuity of the maps is not needed, as shown in J. A. Yahav (1975), "On a Fixed Point Theorem and Its Stochastic Equivalent," J. App!. Probability, 12, pp. 605-611. An extension to multidimensional state space with an application to time series models may be found in R. N. Bhattacharya and O. Lee (1988), "Asymptotics of a Class of Markov Processes Which Are Not in General Irreducible," Ann. Probab., 16, pp. 1333-1347. Example l(a) may be found in L. J. Mirman (1980), "One Sector Economic Growth and Uncertainty: A Survey," Stochastic Programming (M. A. H. Dempster, ed.), Academic Press, New York. It is shown in theoretical complement 3 below that the existence of a unique invariant probability implies ergodicity of a stationary Markov process. The SLLN then follows from Birkhoff's Ergodic Theorem (see Theorem T.9.2). Central limit theorems for normalized partial sums may be derived for appropriate functions on the state space, by Theorem T.13.3 in the theoretical complements of Chapter V. Also see, Bhattacharya and Lee, loc. cit. Example 3 is due to M. Majumdar and R. Radner, unpublished manuscript. K. S. Chan and H. Tong (1985), "On the Use of Deterministic Lyapunov Function for the Ergodicity of Stochastic Difference Equations," Advances in App!. Probability, 17, pp. 666-678, consider iterations of i.i.d. piecewise linear maps. 3. (Irreducible Markov Processes) A transition probability p(x; dy) on the state space (S,5°) is said to be co-irreducible with respect to a sigmafinite nonzero measure q if, for each x e S and Be .9' with q(B) > 0, there exists an integer n = n(x, B) such that
THEORETICAL COMPLEMENTS
229
p "^(x, B) > 0. There is an extensive literature on the asymptotics of cp-irreducible (
Markov processes. We mention in particular, N. Jain and B. Jamison (1967), "Contributions to Doeblin's Theorem of Markov Processes," Z. Wahrscheinlichkeitstheorie und Venn Gebiete, 8, pp. 19-40; S. Orey (1971), Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand, New York; R. L. Tweedie (1975), "Sufficient Conditions for Ergodicity and Recurrence of Markov Chains on a General State Space," Stochastic Process App!., 3, pp. 385-403. Irreducible Markov chains on countable state spaces S are the simplest examples of cp-irreducible processes; here cp is the counting measure, ep(B)'= number of points in B. Some other examples are given in theoretical complements to Section 11.6. There is no general theory that applies if p(x; dy) is not cp-irreducible, for any sigmafinite q. The method of iterated maps provides one approach, when the Markov process arises naturally in this manner. A simple example of a nonirreducible p is given by Example 2. Another example, in which p admits a unique invariant probability, is provided by the simple linear model: X"„ = ZX,, + E" + ,, where F" are i.i.d., P(e" = i) = P(F" = i) = z. —
4. (Ergodicity, SLLN, and the Uniqueness of Invariant Probabilities) Suppose {XX : it -> 0} is a stationary Markov process on a state space IS, 9'), having a transition probability p(x; dy) and an invariant initial distribution it. We will prove the following result: The process {X"} is ergodic if and only if there does not exist an invariant probability n' that is absolutely continuous with respect to it and different from it. The crucial step in the proof is to show that every (shift) invariant bounded measurable function h(X) is a.s. equal to a random variable g(Xo ) where g is a measurable function on (S, .9'). Here X= (X0 , X 1 , X 2 , . ..), and we let T denote the shift transformation and 5 the (shift) invariant sigmafield (see theoretical complement 9.3). Now if h(X) is invariant, h(X) = h(T"X) a.s. for all n > 1. Then, by the Markov property, E(h(X) I cr(Xo , ... , XX }) = E(h(T"X) I a{X 0 , ... , X,,}) = E(h(T"X) 1 6{X"}) = g(X"), where g(x) = E(h(X0 , X„ . ..) I X0 = x). By the Martingale Convergence Theorem (see theoretical complement 5.1 to Chapter IV, Theorem T.5.2), applied to the martingale g(X) = E(h(X) a{Xo , ... , X"}), g(X) converges a.s., and in L', to E(h(X) I a{Xo , X„ ...}) = h(X). But g(X) — h(X) = g(X) — h(T"X) has the same distribution as g(X0 ) — h(X) for all n > 1. Therefore, g(Xo ) — h(X) = 0 a.s., since the limit of g(X) — h(X) is zero a.s. In particular, if G e .S then there exists B E . ' such that {X 0 e B} = G a.s. This implies it(B) = P(X0 e B) = P(G). If {X"} is not ergodic, then there exists G ei such that 0 < P(G) < I and, therefore, 0 < n(B) < 1 for a corresponding set BE .y as above. But the probability measure r1 B defined by: n e (A) = tr(A n B)/n(B), A e.9", is invariant. To see this observe that $ p(x; A)i B (dx) = f B p(x; A)rz(dx)/ic(B) = P(X0 e B, X, e A)/ir(B) = P(X, E B, X, E A)/it(B) (since {X o e B} is invariant) = P(X0 E A n B)/n(B) (by stationarity) = n(A r B)/tc(B) = tt 8 (A). Since i,(B) = 1 > rz(B), and tt B is absolutely continuous with respect to n, one half of the italicized statement is proved. To prove the other half, suppose {X"} is ergodic and n' is also invariant and absolutely continuous with respect to n. Fix A e Y. By Birkhoff's Ergodic Theorem, and conditioning on X 0 , (I/n) Z;=ä p ' (x; A) converges to it(A) for all x outside a set of zero it-measure. Now the invariance of n' implies f (1/n) p(')(x; A)zr'(dx) = rz'(A) for all n. Therefore, n'(A) = rz(A). Thus it' = it, completing the proof. As a very special case, the following strong law of large numbers (SLLN) for Markov processes on general state spaces is obtained: If p(x; dy) admits a unique invariant probability rr, and {X": n -> 0} is a Markov process with transition probability (
)
230
DISCRETE-PARAMETER MARKOV CHAINS
p and initial distribution n, then (1/n) jö ' f (X,) converges to f f (x)n(dx) a.s. provided that If (x)I i(dx) < co. This also implies, by conditioning on X 0 , that this almost sure convergence holds under all initial states x outside a set of zero it-measure. -
f
5. (Ergodic Decomposition of a Compact State Space) Suppose S is a compact metric space and S = .l(S) its Borel sigmafield. Let p(x; dy) be a transition probability on (S, s(S)) having the Feller property: x -* p(x; dy) is weakly continuous on S into p(S)—the set of all probability measures on (S,.R(S)). Let T* denote the map on 9(S) into a(S) defined by: (T*µ)(B) = $ p(x; B)p(dx) (Be f(S)). Then T* is weakly continuous. For if probability measures µ^ converge weakly to p then, for every real-valued bounded continuous f on S, J f d(T*p") = f (f f (y)p(x; dy))p ^ (dx) converges to ($ f(y)p(x; dy))p(dx) =If d(T*p), since x -a f f(y)p(x;dy) is continuous by the Feller property of p. Let us show that under the above hypothesis there exists at least one invariant probability for p. Fix p e P1(S). Consider the sequence of probability measures
f
1^ ' -
µ^'=
n r=o
T *',u
(n
_>
1),
where T *0 p = u,
T *Iµ = T*p,
and
T*('"y = T*(T
*rp)
(r 1).
Since S is compact, by Prohorov's Theorem (see theoretical complement 8.2 of Chapter I), there exists a subsequence {p.} of {p"} such that p". converges weakly to a probability measure n, say. Then T *p , converges weakly to T *n. On the other hand,
J f dµ^
.
-
J f d(T *u
)I =4 f nJ
f dy -
f
f d(T*" p)) -< (sup{f(x)I: x e S })(2/n') -+ 0,
as n' -• oo. Therefore, {p„} and {T*p".} converge to the same limit. In other words, it = T*n, or it is invariant. This also shows that on a compact metric space, and with p having the Feller property, if there exists a unique invariant probability it then (1/n) T*'p converges weakly to n, no matter what (the initial T*p distribution) p is. Next, consider the set .f = .alp of all invariant probabilities for p. This is a convex and (weakly) compact subset of P1(S). Convexity is obvious. Weak compactness follows from the facts (i) q(S) is weakly compact (by Prohorov's Theorem), and (ii) T* is continuous for the weak topology on 9(S). For, if u ^ e .elf and µ^ converges weakly to p, then µ^ = T*µ" converges weakly to T*p. Therefore, T*p = p. Also, P1(S) is a metric space (see, e.g., K. R. Parthasarathy (1967), Probability Measures on Metric Spaces, Academic Press, New York, p. 43). It now follows from the Krein-Milman Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, p. 207) that di is the closed convex hull of its extreme points. Now if {X ^ } is not ergodic under an invariant initial distribution n, then, by the construction given in theoretical complement 4 above, there exists B e P(S) such that 0 < n(B) < I and it = it(B)it B + n(B`)i B .,, with n B and iB ., mutually singular invariant probabilities. In other words, the set K, say, of extreme points of d# comprises those it such that {X"} with initial distribution it is ergodic. Every it e .i is a (weak) limit of convex combinations of the form .1;'°p;" ( n -+ cc), where 0 < A;^ < 1, .l;'° = 1, µ;'° e K.
1_
)
THEORETICAL COMPLEMENTS
231
Therefore, the limit it may be expressed uniquely as it = f K pm(dp), where m is a probability measure on (K, :. (K)). This means, for every real-valued bounded continuous f, 1, f drt = Sic (f s f dp)m(dµ).
Theoretical Complements to Section II.15 1. For some of Claude Shannon's applications of information theory to language structure, see C. E. Shannon (1951), "Prediction and Entropy of Printed English," Bell System Tech. J., 30(1), pp. 50-64. The basic ideas originated in C. E. Shannon (1948), "A Mathematical Theory of Communication," Bell System Tech. J., 27, pp. 379-423, 623-656. There are a number of excellent textbooks and references devoted to this and other problems of information theory. A few standard references are: C. E. Shannon and W. Weaver (1949), The Mathematical Theory of Communications, University of Illinois Press, Urbana; and N. Abramson (1963), Information Theory and Coding. McGraw-Hill, New York.
CHAPTER III
Birth—Death Markov Chains
1 INTRODUCTION TO BIRTH—DEATH CHAINS Each of the simple random walk examples described in Section 1.3 has the special property that it does not skip states in its evolution. In this vein, we shall study time-homogeneous Markov chains called birth—death chains whose transition law takes the form ß;
ifj =i +1
S ; ifj =i -1 a i ifj =i 0
(1.1)
otherwise,
where a + ß, + b = 1. In particular, the displacement probabilities may depend on the state in which the process is located. ;
;
Example 1. (The Bernoulli—Laplace Model). A simple model to describe the mixing of two incompressible liquids in possibly different proportions can be obtained by the following considerations. Consider two containers labeled box I and box II, respectively, each having N balls. Among the total of 2N balls, there are 2r red and 2w white balls, I < r < w. At each instant of time, a ball is randomly selected from each of the boxes, and moved to the other box. The state at each instant is the number of red balls in box I. In this example, the state space is S = {O, 1, ... , 2r} and the evolution is a Markov chain on S with transition probabilities given by for I
i<,2r-1,
233
234
BIRTH—DEATH MARKOV CHAINS r P1,i+ 1 =
(w+4—i)(2r —i) (w + r) 2 CC.-V-^:)
P"
_ i(2r mal) + .(2r^r(w + r — i) (2 + r) 2 (w + r) 2
(1.2)
i(w — r + i) Pi,i-1 =
(w + r) 2
and w —r Poo =P2r,2r= w+r
(1.3) 2r Pol
= P2r,2r - 1 = w + r
Just as the simple random walk is the discrete analogue of Brownian motion, the birth—death chains are the discrete analogues of the diffusions studied in Chapter V. Most of this chapter may be read independently of Chapter II.
2 TRANSIENCE AND RECURRENCE PROPERTIES The long-run behavior of a birth—death chain depends on the nature of its (local) transition probabilities pi,i+1 = ß., p;,i-1 = S i at interior states i as well as on its transitions at boundaries, when present. In this section a case-by-case computation of recurrence properties will be made according to the presence and types of boundaries. CASE I. Let {X„} bean unrestricted birth—death chain on S = {O, ±1, ±2, ...} = 7L. The transition probabilities are 1
=
ßi,
Pi,e-1 = ai,
Pi,i =
1 — ßi — Si
(2.1)
with 0<ßi<1,
0<ö,<1,
ß.+bl.
(2.2)
Let c, d e S, c < d, and write i(i) = P({X„} reaches c before d I Xo = i) = Pi (T < Td )
(c < i < d), (2.3)
where Tr denotes the first time the chain reaches r. Now, ^(i) = ( 1 — ßi — ó 1 )i(i) + ßrçli(i + 1) + S1'(i — 1),
235
TRANSIENCE AND RECURRENCE PROPERTIES
or equivalently, ß,(i/i(i + 1) — iji(i)) = b.(iJi(i) — t'(i — 1))
(c + I 5 i <, d — 1).
(2.4)
The boundary conditions for Ii are ci(c)= 1,
çli(d)=0.
(2.5)
Rewrite (2.4) as i/i(i + 1) — i(i) = — (ii(i) —
— 1)),
(2.6)
ßi
for (c + I < i < d — 1). Iteration now yields /x (x+ 1) — ^i(x) = S #X
#X
—
bx-l
...
b^+l
(^(c + 1) — ^i(c))
(2.7)
#c + l
1
for c + l < x < d — 1. Summing (2.7) over x = y, y + 1, ... , d — 1, one gets xbx-1
0(d) — 0(y) = d - 1 S x=yßxl'x-1
. :. 61 ...
+1 (0(c + 1) — 0(c)).
(2.8)
Nc+l
Let y = c + 1 and use (2.5) to get d-1 Sxax-1..'Sc+1 0(c + 1) = X=+ 1
NxNx— 1 ' ' ' /'c+ 1 -------
--
Sx6..Sc+
1 +
x=c+1 Nxßx— 1'
..
.
(2.9)
1
#C+1
Using this in (2.8) (and using fr(d) = 0, i/i(c) = 1) one gets d-1 5x5x-1
(y)--
1+
x=y = ßxßx-
E
d-1 S
l
"'Sc +l ß^+ l
(c + l <, y < d — 1).
(2.10)
xSx-1c + 1 /^(^ /^
x=c+1 ßxßx-1' ' .Yc+l
Let p y, denote the probability that starting at y the process eventually reaches c after time 0, i.e., py, = PP (X„ = c for some n? 1).
Then (Exercise 1),
(2.11)
236
BIRTH—DEATH MARKOV CHAINS
^
di x
<
dx x _1_ß+l = 0
if
p = lim ^i(y) = 1 y
x=c+l !'xl'x— I " ' f'c+ l
I
axSx-1...6C < x
if
x=c+1#X#X—l'
.
(c < y). (2.12)
'I'c+l
Since, for c + 1 < 0, 0 ac+1 c+2" 'ax
Sxax-1'_c+1 = x=c+1
/'xYx-1
...
ßc+l
x=[c+1 lac+11'c+2
...
1'x
+ ac+ISc+2 ...bo x M2...gx Nc+lßc+2
•
(
2.13)
*'ßOx 1 F1ß2"'ßx
and a similar equality holds for c + 1 > 0, (2.12) may be stated as 2 ax = cc R IS' i=l for ally > c iff Y a—
x=1 ß 1ß2
< I
for all y> c if Y
•
'ß
x < 00.
(2.14)
x=1/'1N2"'Nz
By relabeling the states i as — i (i = 0, + 1, ± 2, ...), one gets (Exercise 2) 0 p yd = 1
for all
y < d
iff F
=
ßx ßx+
0
x=—oo axax+l"'SO 0
<1
for all
y < d iff Y ßxßx+l' ß- < oo x=—m Sxsx+1"
(2.15)
60
By the Markov property, conditioning on X I (Exercise 3), pYY
6 ypY 1.Y + ßp -
+ 1'y
+ (1
—
Sy
—
/3 y ).
(2.16)
If both sums in (2.14) and (2.15) diverge, then p Y _ l , y = 1, p y+l , y = 1, so that (2.16) implies pyy = 1,
for all y.
(2.17)
In other words, all states are recurrent. If one of the sums (2.14) or (2.15) is convergent, say (2.14), then by (2.16) we get pyy < 1,
y e S.
(2.18)
237
TRANSIENCE AND RECURRENCE PROPERTIES
A state y e S satisfying (2.18) is called a transient state; since (2.18) holds for all y e S, the birth—death chain is transient. Just as in the case of a simple asymmetric random walk, the strong Markov property may be applied to see that with probability 1 each state occurs at most finitely often in a transient birth—death Markov chain. CASE II. The next case is that of two reflecting boundaries. For this take S = {0, 1, 2, .. , N } , P00 = 1 — ß0 , POI = ß0' PN.N-1 = 6 N' PN.N = 1 — 6N, and Pi.j+ 1 = fli, Pi.r-1 = b1, pi,; = 1 — ß. — d ; for I <, i < N -- 1. If one takes c = 0, d = N in (2.3), then fr (y) gives the probability that the process starting at y reaches 0 before reaching N. The probability 4(y), for the process to reach N before 0 starting at y, may be obtained in the same fashion by changing the boundary conditions (2.5) to cß(0) = 0, (N) = I to get that q (y) = I — Ji(y). Alternatively, check that çb(y) - I — ^i(y) satisfies the equation (2.6) (with 0 replacing 0) and the boundary conditions (P(0) = 0, 4(N) = 1, and then argue that such a solution is necessarily unique (Exercise 4). All states are recurrent, by Corollary 9.6 (see Exercise 5 for an alternative proof).
CASE III. For the case of one absorbing boundary, say at 0, take S = j0, 1, 2, ...1, Poo = 1 , Pi.i+1 = #i, Pi.^-1 = b;, Pi.; = 1 — ß ; — S ; for i > 0; ß ; , 6 i > 0 for i > 0, fl + ö 1 < 1. For c, d e S, the probability Ji(y) is given by (2.10) and the probability p, which is also interpreted as the probability of eventual absorption starting at y> 0, is given by d-1
ax ax l
...
-
bl
Y ß.ß.- 1 . . Ij1
p Ya =hm
d-1
dtv 1 +
—
Sxbx-1_..51
x=1 ßßx-1
= 1
iff 2
a lb /j
J^
... Y1
a = oo
(for y > 0).
(2.19)
x=1 ßIß2'''Yx
Whether or not the last series diverges, ...6,
>
0
,
for all y> 0
(2.20)
and P Yd
1 — 6 y 6 y _ 1 * (6 1 < l ,
ford > y > 0,
Pod=O foralld>0.
(2.21)
,
By (2.16) it follows that pYY < 1
Thus, all nonzero states y are transient.
(y > 0).
(2.22)
BIRTH-DEATH MARKOV CHAINS
238
CASE IV. As a final illustration of transience—recurrence conditions, take the case of one reflecting boundary at 0 with S = {0, 1, 2, 3, ...} and Poo = I — ßo, Poi =ßo, pi.r+ =ß' =‚ p ;.; = 1 — ß ; — b ; for i > 0; ß 1 >O for all i, b ; > 0 for i > 1, ß i + 6 i < 1. Let us now see that all states are recurrent if and only if the infinite series (2.19) diverges, i.e., if and only if p yo = 1. First assume that the infinite series in (2.19) diverges, i.e., p yo = I for all y > 0. Then condition on X, to get Poo = ( 1 — ßo) + ßopio,
(2.23)
Poo = 1.
(2.24)
so that
Next look at (see Eq. 2.16)
Pu = 6 1Poi
+ ß1P21
+ (1 — 6 1 — ßl).
(2.25)
Since P 20 = 1 and the process does not skip states, P 22 = 1. Also, p ol = 1 (Exercise 6). Thus, p ll = I and, proceeding by induction, p= 1,
for each y > 0.
(2.26)
On the other hand, if the series in (2.19) converges then p^, o < 1 for all y > 0. In particular, from (2.23), we see Poo < 1. Convergence of the series in (2.19) also gives p r,, < 1 for all c < y by (2.12). Now apply (2.16) to get p<1,
for ally
(2.27)
whenever the series in (2.19) converges. That is, the birth—death chain is transient. The various remaining cases, for example, two absorbing, or one absorbing and one reflecting boundary, are left to the Exercises.
3 INVARIANT DISTRIBUTIONS FOR BIRTH—DEATH CHAINS Suppose that there is a probability distribution it on S such that n'p = n'.
(3.1)
Then n'p" = it' for each time n = 1, 2, .... That is, it is invariant under the transition law p. Note that if {X"} is started with an invariant initial distribution it then X" has distribution it at each successive time point n. Moreover, {X"} is stationary in the sense that the P,,-distribution of (X0 , X,, ... , X.) is for each m > I invariant under all time shifts, i.e., for all
239
INVARIANT DISTRIBUTIONS FOR BIRTH DEATH CHAINS
k>,1, Pn(xo =
= Pn (Xk = 1p, ... > Xm+k =
io, ... , Xm — Im)
I m ).
(3.2)
For a birth-death process on S = {0, 1, 2, ... , N} with two reflecting boundaries, the invariant distribution it is easily obtained by solving n'p' = n', i.e.,
no(1 — ßo) + ir rbi = ir o 7i
ifli
i
(3.3) (j = 1,2,...,N— 1),
+(l — ßi — ai) + mi+Ibi+1 = ni
or (3.4)
7[i-ßßi-^ — ni(ßi + S i ) + ni+^Sj+1 = 0.
The solution, subject to ni 0 for all j and J] i rc i = 1, is given by n—
i
n
(l<j
(3.5) ...ßi—I) 1 N ßoßt 7Cp = 1 + Y_ CS1(52...(SJ i =1 (
For a birth-death process on S = {0, 1, 2, ...} with 0 as a reflecting boundary, the system of equations n'p = n' are 7ro(I —ßo)+n151=ito,
(3.6)
71 i -Ißi -^+it(I—ß i —S i )+7r i +la i +^=j
(j>11).
The solution in terms of n o is 7C
= ßoß i ...ßi
a 1 a 2 ...Si
1
n o
(j%
1).
(3.7)
In order that this may be a probability distribution one must have ß0ß1 .ßJ- 1 .
< co.
(3.8)
i =1 6162..•81
In this case one must take / n o =l +
ßß1 .. i =1
16 2
.
ß1
...6 1
1 )
.
(3.9)
BIRTH—DEATH MARKOV CHAINS
240
For an unrestricted birth-death process on S = {0, ± 1, ±2,. . .} the equations n' p' = n' are 7 rj-1Ni-1
+ 1Cj(I — /'j — Uj) +
7C j+1 b j+1
(j = 0, + 1, ±2,...) (3.10)
= 7Cj
which are solved (in terms of n o ) by ßoßl 7t,
...
ßj-1
(J% 1 ),
6162...gj Ro
—
(3.11)
• 07r0
aj+16j+2
(j1< —1).
ßß+r * * F'-1
This is a probability distribution if and only if This
Y_ . E 6 j<-1 Pifli+1 • .ß-1 j>-1 6 1 2 ...(a j bj+1
j+2
• .
ß0ß1...ßj-1
< 00,
(3.12)
.
in which case 7< o=
1+
aj+lbj+2
j_' 1 Yjl'j+l
...S 0 + ßoßl
/^/^
...
-
...
ßi
1
ß—I j>-1 6162...(ai
(3.13)
Notice that the convergence of the series in (3.12) implies the divergence of the series in (2.14), (2.15). In other words, the existence of an equilibrium distribution for the chain implies its recurrence. The same remark applies to the birth-death chain with one or two reflecting boundaries.
Example 1. (Equilibrium for the Bernoulli-Laplace Model). For the Bernoulli-
Laplace model described in Section 1, the invariant distribution it = (n : i = 0, 1, ... , 2r) is the hypergeometric distribution calculated from (3.5) as ;
2r 2w ßj_ 1 2r(w + r) i (w + r — i)(2r — i) (j X W + — j _ ßo 2w+2r i(w—r +i) ^j S 1 •••b j "o j(2—r+j) j ...
-1
r
(w+r) (3.14)
The assertions concerning positive recurrence contained in Theorem 3.1 below rely on the material in Section 2.9 and may be omitted on first reading. Recall from Theorem 9.2(c) of Chapter I that in the case that all states communicate with each other, existence of an invariant distribution is equivalent to positive recurrence of all states.
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS
241
Theorem 3.1
(a) For a birth-death chain on S = {O, 1, ... , N} with both boundaries reflecting, the states are all positive recurrent and the unique invariant distribution is given by (3.5). (b) For a birth-death chain on S = {0, 1, 2, ...} with 0 a reflecting boundary, all states are recurrent or transient according as the series c` 66 12
1 ß 1
[J
Nx
diverges or converges. All states are positive recurrent if and only if the series (3.8) converges. In the case that (3.8) converges, the unique invariant distribution is given by (3.7), (3.9). (c) For an unrestricted birth-death chain on S = {0, ± 1, ± 2, ...} all states are transient if and only if at least one of the series in (2.14) and (2.15) is convergent. All states are positive recurrent if and only if (3.12) holds; if (3.12) holds, then the unique invariant distribution is given by (3.11), (3.13).
4 CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS We will apply the spectral theorem to calculate p", for n = 1, 2, ... , in the case that p is the transition law for a birth-death chain. First consider the case of a birth-death chain on S = {0, 1, ... , N} with reflecting boundaries at 0 and N. Then the invariant distribution it is given by (3.5) as 1
71 t = '—^ '-1 71 0 (2 7z 1 = — n o , b l51...51
< j < N).
(4.1)
It is straightforward to check that i
m.i-1 =m1b1=ni-1ßi-1 = 7 Ei-1Pi-1.;,
i= 1,2.....N,
(4.2)
from which it follows that 7T ; p i; = n j p j „
for all i, j.
(4.3)
In the applied sciences the symmetry property (4.3) is often referred to as detailed balance or time reversibility. Introduce the following inner product (• . )" in the
vector space R"+'
242
Y_
BIRTH—DEATH MARKOV CHAINS
N
(x, Y)n = i
=o
x = (x 0 , x i , ... , xN)',
xiYi 7ry
Y = (Yo, Yi, ... , YNY, (4.4)
so that the "length" IlxiI,, of a vector x is given by
i xI (iO
( 4.5)
1JZ .
With respect to this inner product the linear transformation x —► px is symmetric since by (4.3) we have N
(px, y),, =
N
N N
i =0 j =o
Y,
UO
PjixjYiitj
Pijxj Yiii =
i=O j=O
PjiYiJxj7rj
i=O
= i=O Z (=O Y- PijYj)xi 71 i = ( PY, x). = (x, PY)n.
Y_
(4.6)
Therefore, by the spectral theorem, p has N + 1 real eigenvalues a o , a l , ... , a N (not necessarily distinct) and corresponding eigenvectors (0o, 4' , dN, which are of unit length and mutually orthogonal with respect to (• , • ). Therefore, the linear transformation x --> px has the spectral representation N
P= N
x=
akEk ,
k=0
E
(4.7)
ak(4>k, x)aek ,
k=O
where Ek denotes orthogonal projection with respect to (• , • ) onto the one-dimensional subspace spanned by 4: E k x = ( 4 k , x),1 4 k . It follows that N
p= Z
akEk ,
k=O
N
Pn x =
(4.8)
Y- ak(4)k, x)nek•
k=0
Letting x = e j denote the vector with 1 in the jth coordinate and zeros elsewhere, one gets
p;7) = ith element of p"ej N
N _
akOki4kjnj.
ak( ok, ej),Oki = k=0
k=0
(4.9)
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS
243
Without loss of generality, one may take a o = I and 0 o = 1 throughout. We now consider two special birth-death chains as examples. Example 1. (Simple Symmetric Random Walk with Two Reflecting Boundaries)
S={0,1,.. ,N},
p;,;+a=Pj,^-1
=i,
for1_
Pol = 1 =PN ,N-1•
In this case the invariant initial distribution is given by 1
7Z j =N
I
(1 <j<,N-1),
tr o =zZ N =2 N .
(4.10)
If a is an eigenvalue of p, then a corresponding eigenvector x = (x o , x 1 , ... , x N )' satisfies the equation z(x j _ 1 +x j+l )=ax j (1<,j
(4.11)
along with "boundary conditions" x l = aX0,
XN_ 1 = axN.
(4.12)
As a trial solution of (4.11) consider x j = 0 1 for some nonzero 0. Since all vectors of C N+ 1 may be expressed as unique linear combinations of functions j - exp{ (2rn/N + 1) ji} = 0', with i = (-1) 1 / 2 , one expects to arrive at the right combination in this manner. Then (4.11) yields '-z (B' -1 + B' +1 ) = cth , i.e., 0 2 - 2at + 1 = 0,
(4.13)
whose two roots are B 1 =a+i,,/T-c ,
0 2 =a-ijl-a 2 .
(4.14)
The equation (4.11) is linear in x, i.e., if x and y are both solutions of (4.11) then so is ax + by for arbitrary numbers a and b. Therefore, every linear combination xj=A(cc)0 +B(a)0z
(0<j
(4.15)
satisfies (4.11). We now apply the boundary conditions (4.12) to fix A(a), B(a), up to a constant multiplier. Since every scalar multiple of a solution of (4.11) and (4.12) is also a solution, let us fix x o = 1. Note that x o = 0 implies x 3 = 0 for all j. Letting j = 0 in (4.15), one has
A(a) + B(a) = 1,
B(a) = I - A(a).
(4.16)
244
BIRTH-DEATH MARKOV CHAINS
The first boundary condition, x, = ax, = a, then becomes
A(a)(0 1 — 0 2 ) + 0 2
=
(4.17)
a,
or, 2A(a)(l — a 2 i.e., at least for a
1' 2i
)
= ( 1 — a 2 )'' 2 i,
1,
A(a) = i,
B(a) ='-z .
(4.18)
The second boundary condition x N - 1 = ax N may then be expressed as 1
(
+ 0)
= 2 (0;
+ 0z). (4.19)
Now write 0 1 = e`o, 0 2 = e - ' O, where 0 is the unique angle in [0, it] such that cos 4) = a. Note that cosine is strictly decreasing in [0, n] and assumes its entire range of values [-1, 1] on [0, 7c]. Note also that this is consistent with the requirement sin 4) = 1 — a 2 0. Then (4.19) becomes cos(N — 1)4) = a cos No = cos 0 cos N4),
(4.20)
sin No sin 0 = 0,
(4.21)
i.e.,
whose only solutions in [0,
it]
are
4)=cos
(k= 0,1,2,...,N).
(4.22)
Thus, there are N + 1 distinct (and, therefore, simple) eigenvalues a k = cos
(k = 0, 1, 2, ... , N),
(4.23)
and corresponding eigenvectors x '` ( k = 0, 1, ... , N): (
)
x5 = z(0i+9z)= cos k (j=
N
Now,
0,1,...,N).
(4.24)
245
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL, METHODS N
k1 = J n j cosz IIx(Iz
kn
l
N j=0
N j _ o N
—
t
il
cos2(krrj) = 1
= 1 N1
N -i
1
2N+Nlyl
cosz
(k7rj)
1
N + 2N
I + cos(2kirj/N)
2
c1
ifk=0orN
J-
ifk=1,2,...,N-1.
(4.25)
Thus, the normalized eigenvectors are ON=( 1 , -1 ,+1, -- 1,..
=( 1 , 1 ,•• ,l)', 4)k—/2-x
(4.26)
^
(k1
(1
0kj=^/Lcos
Now use (4.9), (4.23), and (4.26) to get, for 0 < i, j < N,
(n)— Pij —
N E «k n4
ki kj 7 lj
k=0 N-1
= rr j + 2rz j y- cos ( k^ cos kni cos( k=1\ Nj N
k7rj
N
+ (-1)rc j . (4.27)
Thus, for 1 <j,
2 p j 1 — + — N N k=1 ^l
1
COS"(
)—COS
COS
N (N I ) ) .. ) N
!
} _ 1 n+j —i _
N
(
For 0
1
Po = — + —
2N N
(—
(— )
^—
COS"t— \ COS —J / + 1 "— I
k = t N
N
1
2N
For 0
1 N -'
/ k\
P,N — + — Z COs — 2N N k = 1 N
ki
cos (--- I COS — + (— 1)n+N
N
N
i
1 2N
(4.28)
Note that when n and j — i have the same parity, say n = 2m and j — i is
even, then
C
r) p;;"' 1 —
I=4
L
cos( —I I
cos(
cos(
I[] + o(1)]
(4.29)
246
BIRTH—DEATH MARKOV CHAINS
as m - co. This establishes the precise rate of exponential convergence to the steady state. One may express this as u
m (p;]'" ) — ^^e z n' A _
cosh cos ^,
(4.30)
where A = log a, (see (4.23)).
Example 2. (Simple Symmetric Random Walk with One Reflecting Boundary) S = 0, 1, 2, ...
Poi = 1
and
i_
Pi,k -1 = 2 = Pij+ i
for all i > 1. Note that p;! is the same as in the case of a random walk with two reflecting boundaries 0 and N, provided N > n + i, since the random walk cannot reach N in n steps (or fewer) starting from i if N > n + i. Hence for all i, j, n, p is obtained by taking the limit in (4.28) as N —. oc, i.e., )
p=2
f
, cos"(nO) cos(iirO) cos( jn6) dO
o
2 "
_ —I cos"(0) cos(iO) cos(
JO) dO
(j 1> 1, i >1 0),
(4.31)
l o R
p p = 1c os(0) )
cos(iO) dO
(i i
0).
ir o
An alternative calculation of (4.31) can be made by first noting that the condition (4.3) is valid for the sequence of weights {rr ; } given in (4.1) with it o = 1. This provides an inner product (sequence) space on which p is bounded self-adjoint linear transformation. The spectral theory extends to such settings as well. An example in which the birth—death parameters are state-space dependent (i.e., nonconstant) is given in the chapter application. 5 CHAPTER APPLICATION: THE EHRENFEST MODEL OF
HEAT EXCHANGE The Ehrenfest model illustrates the process of heat exchange between two bodies that are in contact and insulated from the outside. The temperatures are assumed to change in steps of one unit and are represented by the numbers of balls in two boxes. The two boxes are marked I and II and there are 2d balls labeled 1, 2, ... , 2d. Initially some of these balls are in box I and the remainder in box II. At each step a ball is chosen at random (i.e., with equal probabilities among ball numbers 1, 2, ... , 2d) and moved from its box to the other box. If there
CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE
247
are i balls in box I, then there are 2d — i balls in box II. Thus there is no overall heat loss or gain. Let X. denote the number of balls in box I after the nth trial. Then {X„: n = 0, 1, ...} is a Markov chain with state space S = {0, 1, 2, ... , 2d} and transition probabilities Pi,c+1 =1 — d- ,
Pu.,- = 2d '
for i = 1,2,... , 2d — 1,
(5.1) Poi-
P
ij
1 ,
P2d,24-1=
= 0,
1
,
otherwise.
This is a birth-death chain with two reflecting boundaries at 0 and 2d. The transition probabilities are such that the mean change in temperature, in box I, say, at each step is propostional to the negative of the existing temperature gradient, or temperature difference, between the two bodies. We will first see that the model yields Newton's law of cooling at the level of the evolution of the averages. Assume that initially there are i balls in box I. Let Y„ = X„ — d, the excess of the number of balls in box I over d. Writing e„ = E j (Y„), the expected value of Y„ given X 0 = i, one has e„=E,(X„—d)=E,[X„- —d+(X„—X„-,)]
=E,(X„_1
—
d)+E;(X„
= e„- 1 + Ei
/ 2d —
X„ 1)=e„-1+Er
—x_, X _ i
2d
1
2d )
1\ e„ -i = e„-, — d = 1 — ^ e„ 1.
d
-
)
Note that in evaluating E i (X„ — X„ _ 1 ) we first calculated the conditional expectation of X„ — X„-, given Xn _ 1 and then took the expectation of this conditional mean. Now, by successive applications of the relation e„ = (1 — e„=(1 —!) e o = „
(
1 —!) E i (X0 —d) =(i —d)(1 —! „. „
)
(5.2)
Suppose in the physical model the frequency of transitions is r per second. Then in time t there are n = tT transitions. Write v = —log[(l — (1/d))]T. Then e„ = (i — d)e - °`,
(5.3)
which is Newton's law of cooling. The equilibrium distribution for the Ehrenfest model is easily seen, using (3.5),
to be
248
BIRTH-DEATH MARKOV CHAINS
( 2 d) 2 _ a y
_
(5.4)
j=0, 1,...,2d.
That is, it = (ij : j e S) is binomial with parameters 2d, z. Note that d = is the (constant) mean temperature under equilibrium in (5.3). The physicists P. and T. Ehrenfest in 1907, and later Smoluchowski in 1916, used this model in order to explain an apparent paradox that at the turn of the century threatened to wreck Boltzmann's kinetic theory of matter. In the kinetic theory, heat exchange is a random process, while in thermodynamics it is an orderly irreversible progression toward equilibrium. In the present context, thermodynamic equilibrium would be achieved when the temperatures of the two bodies became equal, or at least approximately or macroscopically equal. But if one uses a kinetic model such as the one described above, from the state i = d of thermodynamical equilibrium the system will eventually pass to a state of extreme disequilibrium (e.g., i = 0) owing to recurrence. This would contradict irreversibility of thermodynamics. However, one of the main objectives of kinetic theory was to explain thermodynamics, a largely phenomenological macroscopic-scale theory, starting from the molecular theory of matter. Historically it was Poincare who first showed that statistical-mechanical systems have the recurrence property (theoretical complement 2). A scientiest named Zermelo then forcefully argued that recurrence contradicted irreversibility. Although Boltzmann rightly maintained that the time required by the random process to pass from the equilibrium state to a state of macroscopic nonequilibrium would be so large as to be of no physical significance, his reasoning did not convince other physicists. The Ehrenfests and Smoluchowski finally resolved the dispute by demonstrating how large the passage time may be from i = d to i = 0 in the present model. Let us now present in detail a method of calculating the mean first passage time m = E 1 To , where To = inf {n > 0: X„ = 0} . Since the method is applicable to general birth—death chains, consider a state space S = {0, 1, 2, ... , N} and a reflecting chain with parameters ß ; , S ; = 1 — ß, such that 0 < ß, < 1 for 1
m ; =1+ ßm
1
+5 1 m i _ 1 (1
m0=0,
(5.5)
mN = 1 + mN_1.
Relabel the states by i —► u, so that u ; is increasing with i, u
0
=0,
(5.6)
u l = 1,
and, for all x e S,
4'(x) __ Px
({
X„} reaches 0 before N) = u
" —
u
"
UN - UO
(5.7)
249
CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE
In other words, in this new scale the probability of reaching the relabeled boundary u = 0, before U N , starting from u x (inside), is proportional to the distance from the boundary U N . This scale is called the natural scale. The difference equations (5.5) when written in this scale assume a simple form, as will presently be shown. First let us determine u x from (5.6) and (5.7) and the difference equation '(x)=ß x '(x+1)+ 5 x (i(x-1),
0<x< N,
(5.8) tf(x) = 1,
i(N)
=
0,
which may also be expressed as ^i(x+l)—^i(x)=
6X[^i(x)—^i(x-1)],
0<x
(5.9)
/i(N)=0.
ß/i(0) = 1,
Equations (5.6), (5.7), and (5.9) yield ux+1 — ux= --- (ux — ux -i) x
a 1 S Z .. _ a a a
S,a2...
(ui
x
—uo)= a
(1<x
(5.10)
(1
(5.11)
ß or x
. (5 i 61o2"
ux+1 = 1 +i^ ßiß2 ...ßi
x
Now write (5.12)
m(u) - m x .
Then (5.5) becomes [m(ui +i) — m(ur)]ßr —
[m(ui) — m(ui-
)]5 =
—1
(1 <, i < N — 1), (5.13)
m(u N )—m(u N - 1 )= 1.
m(u 0 )=m(0)=0,
One may rewrite this, using (5.10), as m(u1 + 1 ) — m(ui) — m(ui) — m(u1 ui +l — Ui
ui — Ui -1
-
1
)
_
ßoß1
.
6162.. •4j
1
(1 < i < N — 1)
250
BIRTH-DEATH MARKOV CHAINS
or, summing over i = x, x + 1, ... , N — 1 and using the last boundary condition in (5.13), one has 1
m(u) — m(u- 1 )
UN — UN-1
1 ßß _—
_
i =x
Ux — Ux-1
6162...(Si
(1 x N — 1). (5.14)
Relations (5.10) and (5.14) lead to
Y_
l'N-1 +
m \U x ) — m(U x -1) = Yx/'x+l
Y
1 F'x
Ni -1i
(1 x < N — 1). l (5.15) The factor ß ; /ß i is introduced in the last summands to take care of the summand corresponding to i = x (this summand is actually 1/S x ). Sum (5.15) over x = 1, 2, ... , y to finally get, using m(u 0 ) = 0, m(UO _
...6
Sxax+l
ßxfßx+1
x=1
Sx 6 x+I
...%jN * *
'
i =x
N-1
ßx
-1 +
SN-1
Ll
x=1 i =x
(Sx...Sißi
...
ßi 1ßt
6X... 6 ßi
(1 < y < N — 1).
(5.16)
In particular, for the Ehrenfest model one gets (2d—x)••.2.1
m =m(ud)= °
x=
+
1x(x+1)...(2d-1)
= (2d — x)!(x — 1)! (2d — 1)! x=1
= 2d 22d(1
+Q)).
d
U -1
(2d—x)•••(2d —i)
x=1 i =x
x(x+1)...i(2d —i)
2d1 (2d — x)!(x — 1)!
x=1 i =x
(2d — i)!i!
(5.17)
Next let us calculate
m =_ Ei T (0
4
(5.18)
where Td=
inf{n >0:Xd}.
(5.19)
Writing m(u i ) = Ph i , one obtains the same equations as (5.13) for 1 < i < d — 1, and boundary conditions
in(u o ) = 1 + ►n(u 1 ),
m(u) = 0.
As before, summing the equations over i = 1, 2, ... , x,
(5.20)
CHAPTER APPLICATION: THE EHRENFEST MODEL OF HEAT EXCHANGE
m(ux+l) - m(ux)
tn(ul) - tn(uo)
...
ß_-I
i =1 6 1 62...^i
ux+I — U x U1 — UQ
_ -1- Y
ßoß1
x
251
...—'
o1
_ 6 6 ...6. 1
1
(5.21)
2
where ß 0 = 1. Therefore, x x - Y
m(u x+, ) - m(u) _x
' z ß,ßz
ßx
r+ I x x+ I
1 ß,
i
(5 22)
ßx6x+I '
which, on summing over x = 1, 3, ... , d - 1 and using (5.20), leads to
-m(u 0 ) _ -1
-
x=1
.g x - d
d-a 6152
PlP2
.
d-1
m°
l+
X= I
x!
x 5;+ , ...g xbx+l ...
.
(5.23)
ßxbx+l
d-I x ((x + 1)x• •(i + 2)(i + 1)
(2d-1)•••(2d-x)
d-1 <1
=
i
x=1 i =1 Pf
• .Px
For the present example this gives
-
+
x
i; (2d_i)...(2d_x)(x+1) )
+ d-I
x!
2d
x
x
x -i
+
x=1 (2d- 1)•••(2d-x) x = 1 2d-x 1= 2d-x d-1
,1+Y-
x= I
x!
(2d -
d-I
-+I
1)...(2d - x)
1+ dIl xl
2d
x= I
2(d - x)
+ d(log d + 1).
x= I (2d - 1) • • • (2d - x)
(5.24)
Since the sum in the last expression goes to zero as d -• oo, m o
asd -* oo.
(5.25)
For d = 10 000 balls and rate of transition one ball per second, it follows that m o < 102 215 seconds < 29 hours, 10 6000 years. a n d = 10
(5.26)
It takes only about a day on the average for the system to reach equilibrium from a state farthest from equilibrium, but takes an average time inconceivably large, even compared to cosmological scales, for the system to go back to that state from equilibrium.
BIRTH—DEATH MARKOV CHAINS
252
The original calculations by the Ehrenfests concerned the mean recurrence times. Using Theorem 9.2(c) of Chapter II it is possible to get these results quite simply as follows. Let t;' = min{n ? 1: X„ = i }. Then, the mean recurrence time of the state i is
Et'
1 = i!(2d it
—
i)! 22d
(5.27)
2d!
For d = 10 000 one gets, using Stirling's approximation for the second estimate, E o zoi » — 220000
Edtä'^ ^_ l00,Jr. .
(5.28)
Thus, within time scales over which applications of thermodynamics make sense, one would not observe a passage from equilibrium to a (macroscopic) nonequilibrium state. Although Boltzmann did not live to see it, this vindication of his theory ended a rather spirited debate on its validity and contributed in no small measure to its eventual acceptance by physicists. The spectral representation for p can be obtained by precisely the same steps as those outlined in Exercise 4.3. The 2d eigenvalues that one obtains are given by a j =j/d, j = ±1,±2,..., ±d.
EXERCISES Exercises for Section III.1 (Artificial Intelligence) The amplitudes of pure noise in a signal- detection device has p.d.f. fo (x), while the amplitude is distributed as f l (x) when a signal is present with the noise. A detection procedure is designed as follows. Select a threshold value 0 0 = r8 for .integer r. If the first amplitude observed, X 1 , is larger than the initial threshold value 0, then decide that the signal is present. Otherwise decide that the signal is absent. Suppose that a signal is being sent with probability p = Z and that upon making a decision the observer learns whether or not the decision was correct. The observer keeps the same threshold value if the decision is correct, but if the decision was incorrect the threshold is increased or decreased by an amount S depending on the type of error committed. The rule governing the learning process of the observer is then 0.11 = ©n + U{1(O,.)(X.„ —0)—S]
where S. is 1 or 0 depending on whether a signal is sent or not. The signal transmission processes {S„} and {X„} are i.i.d. (i) Show that the threshold adjustment process {0„} is a birth—death Markov chain and identify the state space. (ii) Calculate the transition probabilities.
253
EXERCISES
2. Suppose that balls labeled 1, ... , N are initially distributed between two boxes labeled I and II. The state of the system represents the number of balls in box I. Determine the one-step transition probabilities for each of the following rules of motion in the state space. (i) At each time step a ball is randomly (uniformly) selected from the numbers 1, 2, ... , N. Independently of the ball selected, box I or II is selected with respective probabilities p, and P2 = t — p,. The ball selected is placed in the box selected. (ii) At each time step a ball is randomly (uniformly) selected from the numbers in box I with probability p, or from those in II with probability P2 = 1 — p,. A box is then selected with respective probabilities in proportion to current box sizes. The ball selected is placed in the box selected. (iii) At each time step a ball is randomly (uniformly) selected from the numbers in box I with probability proportional to the current size of I or from those in II with the complementary probability. A box is also selected with probabilities in proportion to current box size. The ball selected is placed in the box selected.
Exercises for Section 111.2
>_ > >_
1. Let A d be the set {w: X 0 (co) = y, {Xn (w): n 0} reaches c before d}, where y > c. Show that A d j A = {cw: X0 (cu) = y, {X (cu): n O} ever reaches c}. X
2. Prove (2.15) by using (2.14) and looking at {—Xn : n
_< _<
0}.
3. Prove (2.4), (2.16), and (2.23) by conditioning on X, and using the Markov property. 4. Suppose that cp(i)(c
>_ _<
i d) satisfy the equations (2.4) and the boundary conditions q(c) = 0, cp(d) = 1. Prove that such a cp is unique.
5. Consider a birth—death chain on S = {0, 1, ... , N } with both boundaries reflecting. (i) Prove that P(T mN) (I — S N 5 N _ I ...6, )m if i > j, and < (1 — ßo/i t ..ß N _, )m if i < f. Here T = inf {n 1: Xn =j}. (ii) Use (i) to prove that p ; , = P; (Tt < x) = I for all i, j. 6. Consider a birth—death chain on S = {0, 1, ...} with 0 reflecting. Argue as in Exercise 5 to show that p 1 = I for all y. 7. Consider a birth—death chain on S = {0, I, ... , N} with 0, N absorbing. Calculate
p;T',
lim n ' nix
for all i, j.
m=1
8. Let 0 be a reflecting boundary for a birth—death chain on S = {... , —3, —2, —1, 0}. Derive the necessary and sufficient condition for recurrence.
N
9. If 0 is absorbing, and reflecting, for a birth—death chain on S = {0, 1, ... , N}, then show that 0 is recurrent and all other states are transient. 10. Let p be the transition probability matrix of a birth—death chain on S = {0, 1, 2, ...} with ß.=
2 (j +21 )
Si
2 (j+ 1),
j= 0.1,2,....
254
BIRTH-DEATH MARKOV CHAINS
(i) Are the states transient or recurrent? (ii) Compute the probability of reaching c before d, c < d, starting from state i, c_
Exercises for Section III.3 1. Let {X„} be the asymmetric simple random walk on S = {0, 1, 2, ...} with ß j = p < 2, 1, 2,... and (partial) reflection at 0 with (i) Calculate the invariant initial distribution it. (ii) Calculate EX„ as a function of p < Z. 2. (A Birth—Death Queue) During each unit of time either 0 or I customer arrives for service and joins a single line. The probability of one customer arriving is 2, and no customer arrives with probability 1 — 2. Also during each unit of time, independently of new arrivals, a single service is completed with probability p or continues into the next period with probability 1 — p. Let X. be the total number of customers (waiting in line or being serviced) at the nth unit of time. (i) Show that {X„} is a birth—death chain on S = {0, 1, 2, ...}. (ii) Discuss transience, recurrence, positive-recurrence. (iii) Calculate the invariant initial distribution when A < p. (iv) Calculate EX„ when A < p, where it is the invariant initial distribution. 3. Calculate the invariant distribution for Exercise 1.2(i) where N —i N p l ,
ifj =i +1,
—i) Ni (P 1N + N p 2 ,
ifj=i,i=0,1 ' ..., N,
Pij=
Ni P 2' ifj =i -1, 0,
otherwise.
Discuss the situation for Exercise 2(ii) and (iii). 4. (Time Reversal) Let {X„} be a (stationary) irreducible birth—death chain with in-
variant initial distribution n. Show that P„(X„ = j I X„ + i = i) = Pn(X„+ 1 = j I X. = i).
Exercises for Section III.4 1. Calculate p for the birth—death queue of Exercise 3.2. 2. Let T be a self-adjoint linear transformation on a finite-dimensional inner product space V. Show (i) All eigenvalues of T must be real. (ii) Eigenvectors of T associated with distinct eigenvalues are orthogonal.
EXERCISES
255
3. Calculate the transition probabilities p" for n >, 1 by the spectral method in the case of Exercise 1.2(i) and p, = p 2 = Z according to the following steps. (i) Consider the eigenvalue problem for the transpose p'. Write out difference equations for p'x = ax. (ii) Replace the system of equations in (i) by the infinite system 1 -
2
x0+
1 N —i 1 i +2 X I = ax0, 2N Xi + - xi+) + x^+2 = ax;+„
---
2
2N
2N
i = 0, 1, 2, .... Show that if for some a there is a nonzero solution x = (x 0 , x,, x 2' ...)' of this infinite system with X N+l = 0, then a must be an eigenvalue of p' with corresponding eigenvector (x o , x,, ... , x N )'. [Hint: Show that X N+ , = 0 implies x i = 0 for all i -> N + 1.] (iii) Introduce the generating function q(z) = Z^° ° x ; z` for the infinite system in (ii). Note that for the desired solutions satisfying X N+ , = 0, q(z) will be a polynomial of degree _< N. Show that N(2a — 1 — z)
q (Z) = '
q(0) = xo.
q(Z),
1 — ZZ
[Hint: Multiply both sides of the second equation in (ii) by z' and sum over i >-0.] (iv) Show that (iii) has the unique solution (P(Z) = X0( 1 — z)
N I'
-a)(I +
Z
)Na
(v) Show that for aj = j/N, j = 0, 1, ... , N, cp(z) is a polynomial of degree N and therefore, by (ii) and (iii), a j = j/N, j = 0, 1, ... , N, are the eigenvalues of p' and,
therefore, of p. (vi) Show that the eigenvector x (' ) = (xö ) , ... , xN ) )' corresponding to aj = j/N is given with xo' ) = 1, by xk ) = coefficient of z' in (1 — z)" - '(1 + z) . (vii) Write B for the matrix with columns x^ 0) , .. , x (N) . Then, (B') - ' diag(aö, • . , a N)B
,
where (B') ' — B diag
no
(IX' 0) lirz
n °...
' IIX IN) IIa2
)
and the (invariant) distribution it is binomial with p = 2, N; see Exercise 1.2(i). [Hint: Use the definitions of eigenvectors and matrix multiplication to write (p')"B = B diag(cc .... , aN). Multiply both sides by B' and take the transpose to get the spectral representation of p". Note that the columns of (B') - ' are the eigenvectors of p since p(B') - ' = (B') - ` diag(a o , ... , a N ). To compute (B') - ' use orthogonality of the eigenvectors with respect to (• , • )" to first write B' diag(a ° , ... ,. N )B = diag(llx (0) jjn, • • • , IIx (N) 11n). The formula for (B') - ' follows.]
256
BIRTH—DEATH MARKOV CHAINS
4. (Relaxation and Correlation Length) Let p be the transition matrix for a finite state stationary birth—death chain {X n } on S = {0, 1, ... , N} with reflecting boundaries at 0 and N. Show that sup {Corr,,( f(Xn ), g(X 0 ))} = e z i°. 1.9
where Corrn(.i(X.),
— Ef(XX)][g(X0) — Eg(Xo)]} 9(X0)) = En{[f(Xn) (Varnf(Xn)) 2 (Var,^ g(X0)) 1J2
A l is the largest nontrivial (i.e., ^ 1) eigenvalue of p, and the supremum is over real-valued functions f and g on S. The parameter r = 1/x. 1 is called the relaxation time or correlation length in applications. [Hint: Use the self-adjointness of p (i.e., time-reversibility) with respect to (• , • ),, to obtain an orthonormal basis {cp"} of eigenvectors of p. Check equality in the case f = g = cp l is the eigenvector corresponding to A 1 . Restrict attention to f and g such that E n f = E n g = 0 and
ii IIn = 119 11,, = 1, and expand as
f
=Y_(f
M. P.,
(
(
g=Y_(g ,qin)np
n
Use the inequality ab 5.
.
n
_< (a' + b 2 )/2 to show (Corr, ( f (X.), g(X ))I < e -z ".] o
(i) (Simple Random Walk with Periodic Boundary) States 0, 1, 2, ... , N — 1 are arranged clockwise in a circle. A transition occurs either one unit clockwise or one unit counterclockwise with respective probabilities p and q = 1 — p. Show that 1 N-1
N
r_o1
where 0 = e(znptN is an Nth root of unity (all Nth roots of unity being
1,0,0 ,...,0 ). 2
N-1
(*ii) (General Random Walk with Periodic Boundary) Suppose that for the arrangement in (i), a transition k units clockwise (equivalently, N — k units counterclockwise) occurs with probability p k , k = 0,1, ... , N — 1. Show that N—I (n) = 1 Pjk — —
N—I
0rU'k) I
N
r=o
P,
Br5 "
s=o
where 0 = e (2 "' is an Nth root of unity.
THEORETICAL COMPLEMENTS Theoretical Complements to Section III.5 1. Conservative dynamical systems consisting of one or many degrees of freedom (components, particles) are often represented by one-parameter families of
THEORETICAL COMPLEMENTS
257
transformations {T,} acting on points x = (x,, ... , x") of [I8" (position—momentum phase space). The transformations are obtained from the differential equations (Newton's Law) that govern the evolution started at arbitrary states x e (18"; i.e., Tx is the state at time t when initially the state is T o x = x, x e ti". The mathematical theory described below applies to phenomena where the physics provide a law of evolution of the form aT, x
— f(T,x),
t > 0,
at
(T.5.1)
To x=x, such that f = (f,, ... , f") . I^" — R" uniquely determines the solution at all timest >0 for each initial state x by (T.5.1).
Example 1. Consider a mass m on a spring initially displaced to a location x, (relative to rest position at x, = 0) and with initial momentum x 2 . Then Hooke's law provides the force (acting along the gradient of the potential curve U(x 1 ) = Zkx, ), according to which f(x) - (f,(x), f2 (x)) _ ((1/m)x 2 , —kx,), where k > 0 is the spring constant. In particular, it follows that T r x = xA(t), where
A(t)=
cos(yt)
—my sin(yt)
--sin(yt)
cos(yt)
1 t_>0,
my
where y=
k
> 0.
Notice that areas (2-dimensional phase-space volume) are preserved under T, since det A(t) = 1. The motion is obviously periodic in this case.
Example 2. A standard model in statistical mechanics is that of a system having k (generalized) position coordinates q l , ... , q, and k corresponding (generalized) momentum coordinates p l , ... , P k . The law of evolution is usually cast in Hamiltonian form:
OH dq;_aH dp;_ dt
ap ' ;
aq; '
dt
i
=1,...,k,
(T.5.2)
where H - ll(q,, . .. , qk, p,, ... , Pk) is the Hamiltonian function representing the total energy (kinetic energy plus potential energy) of the system. Example I is of this form with k = 1, H(q, p) = p 2 /2m + kg 2 . Writing n = 2k, x, = q,, ... , X k = qk> Xk+ 1 = Pi, • • • , X2k = Pk, this is also of the form (T.5.1) with p
f(x) _ (fl (x), ..... 2k(x)) _
/
OH OH
OH —
GXk+ 1 aX2,
ax,
aH
, ... , --
(T.5.3)
OXk
Observe that for H sufficiently smooth, the flow in phase space is generally incompressible. That is,
258
BIRTH—DEATH MARKOV CHAINS
div f(x) - trace^af 1
ax;/1
= 0
r a / a Fl 1 + a
of —
—
i =1
ax;
(_ OH )]
I
ax1
i 1 LOx1\aXk+t/ axk +i
(T.5.4)
for all x.
Liouville first noticed the important fact that incompressibility gives the volume preserving property of the flow in phase space. Lionville Theorem T.5.1. Suppose that f(x) in (T.5.1) is such that div f(x) = 0 for all x. Then for each bounded (measurable) set D c R', IT DI = IDI for all t > 0, where I • I denotes n-dimensional volume (Lebesgue measure). ❑ Proof. By the uniqueness condition stated at the outset we have T,
+h
= T,Th for all
t, h > 0. So, by the change of variable formula, d e t(
ITs+hDI = f
-T,,x
l dx.
11,D \ ax )
To calculate the Jacobian, first note from (T.5.1) that aThx
I+ af h+O(h2)
ax =
ax
as h—•0.
But, expanding the determinant and collecting terms, one sees for any matrix M that det(I + hM) = 1 + h trace(M) + 0(h 2 )
as h —• 0.
Thus, since trace(af/ax) = div f(x) = 0,
det( Ox ) = 1 + O(h 2 )
as h —p 0.
ax
It follows that for each t >_ 0 IT,+h DI = IT,DI + O(h 2 )
as h —+ 0,
or dt
IT,DI = 0
and
ITODI = IDI, n
i.e., t —* ITDI is constant with constant value L.
2. Liouville's theorem becomes especially interesting when considered along side the following theorem of Poincare. Poincare's Recurrence Theorem T.5.2. Let T be any volume preserving continuous one-to-one mapping of a bounded (measurable) region D c 1' onto itself. Then for each neighborhood A of any point x in D and every n, however large there is a subset B of A having positive volume such that for all ye B T'y E A for some
r
_>
n.
❑
THEORETICAL COMPLEMENTS
259
Proof: Consider A, T", T - 2 "A, .... Then there are distinct times i, j such that IT -. "A n T'01 ^ 0; for otherwise
Ipl >- Ü T i
-
i "a,1
=I
I
-i
"AI = Z JAI = +oo.
i =a
=o
i =o
It follows that IO n T - "li - 'IAI ^ 0. Take B=AnT - "'i - 'IA,r= nil
—
ii.
n
3. S. Chandrasekhar (1943), "Stochastic Problems in Physics and Astronomy", Reviews in Modern Physics, 15, 1-89, contains a discussion of Boltzmann and Zermello's classical analysis together with other applications of Markov chains to physics. More complete references on alternative derivations as well as the computation of the mean recurrence time of a state can be found in M. Kac (1947), "Random Walk and the Theory of Brownian Motion", American Mathematical Monthly, 54, 369-391; also see E. Waymire (1982), "Mixing and Cooling from a Probabilistic Point of View", SIAM Review, 24, 73-75.
CHAPTER IV
Continuous-Parameter Markov Chains I INTRODUCTION TO CONTINUOUS-TIME MARKOV CHAINS Suppose that {X,: t O} is a continuous-parameter stochastic process with a finite or denumerably infinite state space S. Just as in the discrete-parameter case, the Markov property here also refers to the property that the conditional distribution of the future, given past and present states of the process, does not depend on the past. In terms of finite-dimensional events, the Markov property requires that for arbitrary time points 0 < s o < s, < ... < s <S <1< t, < . .
P(X,=j,X„ l = jt,...,X,.=jfl Xso =i o ,...,X„=i k ,Xs =i) =P(X,=J X, i =11 ...,X1„=J.IXs =i). (1.1) ,
,
<,
In other words, for any sequence of time points 0 t o < t, < ... , the discrete parameter process Yo := X, 0 , Y, := Xr ..... is a Markov chain as described in Chapter II. The conditional probabilities p, J (s, t) = P(X1 = j Xs = i), 0 < s < t, are collectively referred to as the transition probability law for the process. In the case p ; j (s, t) is a function of t — s, the transition law is called time-homogeneous, and we write p, 1 (s, t) = p, j (t — s). Simple examples of continuous-parameter Markov chains are the continuous-time random walks, or processes with independent increments on countable state space. Some others are described in the examples below.
I
Example 1. (The Poisson Process). The Poisson process with intensity function p is a process with state space S = {0, 1, 2, ...} having independent increments distributed as 261
CONTINUOUS-PARAMETER MARKOV CHAINS
262
(f:P(u)du
(
)^
P(X,—XX =j)=
exp—
p(u)duI,
(1.2)
for j = 0, 1, 2, ... , s < t, where p(u), u > 0, is a continuous nonnegative function. Just as with the simple random walk in Chapter II, the Markov property for the Poisson process is a consequence of the independent increments property (Exercise 2). Moreover, P.1(s,t)=P(Xr=j1 X, = i)
P(X' = j, Xs = i) = P (X5=I)
=
P(Xt
—
Xs=j i)P(X5 =i) P(XS=I) —
(J — i )!
J
l expj — p(u) du) \
for j i
(1.3) ifj
0
In the case that p(u) _ A (constant) for each u >, 0, [2(t — s))' ' e 2(i s) -
Pi;(s,t)= 0,
(j
—
i)!
. 3>
(1.4)
,
is a function p ;; (t — s) of s and t through t — s; i.e., the transition law is time-homogeneous. In this case the process is referred to as the Poisson process with parameter 2. Example 2. (The Compound Poisson Process). Let {N} be a Poisson process
with parameter A > 0 starting at 0, and let Y,, Y2 ,... be i.i.d. integer-valued random variables, independent of the process {N}, having a common probability mass function f. The process {X,} is defined by Nt
x= E Y„,
(1.5)
n=o
where Yo is independent of the process {NN } and of Y1 , Y2 .... . The stochastic process {XX } is called a Compound Poisson Process. The process has independent increments and is therefore Markovian (Exercise 4). As a consequence of the independence of increments,
263
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS
p i,(s,t)=P(X,=jI X5 =i)= P(X,—XX =j—i XX =i)
=P(X,—X =j —i). S
(1.6)
Therefore, p,J(s,t)=E{P(X,—XS=j— i N,—N )} S
2(t — S)]k
*k _ Y i f) (j k^ e ^
—
( 1 .7)
=
k0
where f * k is the k-fold convolution of f with f * 0 (0) = 1. In particular, {X,} has a time-homogeneous transition law. The continuous-time simple random walk is defined as the special case with f( + 1) = P(Yn = + 1) = p, f(—l)=P(Y,,= — l)=q,O
2 KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS We continue to denote the finite or denumerable state space by S. To construct a Markov process in discrete time (i.e., to find all possible joint distributions) it was enough to specify a one-step transition matrix p along with an initial distribution n. In the continuous-parameter case, on the other hand, the specification of a single transition matrix p(t o ) = ((p i; (t 0 ))), where p 11 (t o ) gives the probability that the process will be in state j at time t o if it is initially at state i, together with an initial distribution n, is not adequate. For a single time point t o , p(t o ) together with it will merely specify joint distributions of X0 , X, o , X 2 , 0 , . .. , XX , o , ... ; for example, Pn (X0 = i0, X, 0 = i1, ... , Xn,0 = in) = iti0pioi,(t0)pi1iz(t0)...pi^,-1in(t0).
(2.1)
Here p(t o ) takes the place of p, and t o is treated as the unit of time. Events that depend on the process at time points that are not multiples of t o are excluded. Likewise, specifying transition matrices p(t o ), p(t, ), ... , p(t) for an arbitrary finite set of time points t o , t,, ... , t, will not be enough. On the other hand, if one specifies all transition matrices p(t) of a time-homogeneous Markov chain for values of t in a time interval 0 < t <, t o for some t o > 0, then, regardless of how small t o > 0 may be, all other transition probabilities may be constructed from these. To understand this basic fact, first
264
CONTINUOUS-PARAMETER MARKOV CHAINS
assume transition matrices p(t) to be given for all t > 0, together with an initial distribution n. Then for any finite set of time points 0 < t l < t 2 < • • • < t,,, the joint distribution of X 0 , X,,, ... , X given by Pn( XO = i0+ X1 i
= i1 ,
Xr2 = i2. •
,
Xt
-i
=
in - 1 ,
X,„ = in)
_ t1) ... Pi,,-( t,, — t,, –I). = 1tioPi0i,(t1)Pi1i2(t2
Specializing to n i = 1,
t = t, t 2 = t +
s, it follows that
Pi(X1 =j, X1
k) =
(2.2)
Pi;(t)P;k(s),
and (2.3)
Pi(XI +s = k) = p(t + s). But {Xt +s = k} is the countable union disjoint events. Therefore, Pi(XI +s = k)
=Y
U
JEs
{X, =1'
Xt +s =
k} of pairwise
P;(XX =j, Xt +s = k).
(2.4)
JEs
The relations (2.3) and (2.4) provide the Chapman—Kolmogorov equations, namely, pik(t + s)
_Ep
i; (t)p Jk (s)
(i, k E S; s > 0, t > 0),
(2.5)
,jEs
which may also be expressed in matrix notation by the following so-called semigroup property p(t + s) = p(t)p(s)
(s
> 0, t > 0).
(2.6)
Therefore, the transition matrices p(t) cannot be chosen arbitrarily. They must be so chosen as to satisfy the Chapman—Kolmogorov equations. It turns out that (2.5) is the only restriction required for consistency in the sense of prescribing finite-dimensional distributions as in Section 6, Chapter I. To see this, take an arbitrary initial distribution it and time points 0 < t t < t 2 < t 3 . For arbitrary states i 0 , i l , i 2 , i 3 , one has from (2.2) that P-(X0 =
i0, Xti = i1 ,
X,2 = 12, X3 = 1 3) _ "ioPioi1(t1)P2(t2 — t1)Pi2t3(t3 — t2),
(2.7)
as well as
Pn(XO = i0 X,, = ,
i1
,
Xt3 = i3) _
it OPiOiJ
(t1)Pi1i3(t3 — t1).
(2.8)
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS
265
But consistency requires that (2.8) be obtained from (2.7) by summing over i 2 . This sum is
Y—ii 7C top10 (tl)Pt,i2(t2 — tl)P1213(t3 — t2) = ni o p0 (t1)Y-Pi i2 1
1
i2
(t2 — t1)Pi 2 i 3 (t3 — t 2 ). (2.9)
By the Chapman—Kolmogorov equations (2.5), with t = t 2 — t,, s = t 3 — t Z , one has
Z PiIi2(t2 — t1)Pi2i3(t3 — t2) = Pi ,3(t3 — t1 ), 1
iZ
showing that the right sides of (2.8) and (2.9) are indeed equal. Thus, if (2.5) holds, then (2.2) defines joint distributions consistently, i.e., the joint distribution at any finite set of points as specified by (2.2) equals the probability obtained by summing successive probabilities of a joint distribution (like (2.2)) involving a larger set of time points, over states belonging to the additional time points. Suppose now that p(t) is given for 0 < t < t o , for some t o > 0, and the transition probability matrices satisfy (2.6). Since any t > t o may be expressed uniquely as t = rt o + s, where r is a positive integer and 0 < s < t o , by (2.6) we have p(t) = p(rt o + s) = p(t o )p((r — 1)t 0 + s) = p 2 (to)p((r — 2)t o + s) = ... = pr(t)p( s ) • Thus, it is enough to specify p(t) on any interval 0 < t < t o , however small t o > 0 may be. In fact, we will see that under certain further conditions p(t) is determined by its values for infinitesimal times; i.e., in the limit as t o —• 0. From now on we shall assume that lim p(t) = 5 ;j , rlo
(2.10)
where Kronecker's delta is given by 6 ij = 1 if i = j, 5 = 0 if i : j. This condition is very reasonable in most circumstances. Namely, it requires that with probability 1, the process spends a positive (but variable) amount of time in the initial state i before moving to a different state j. The relations (2.10) are also expressed as lim p(t) = I, tlo
(2.11)
where I is the identity matrix, with 1's along the diagonal and 0's elsewhere.
266
CONTINUOUS-PARAMETER MARKOV CHAINS
We shall also write,
(2.12)
p(o)=I.
Then (2.11) expresses the fact that p(t), 0 < t < oc, is (componentwise) continuous at t = 0 as a function of t. It may actually be shown that owing to the rich additional structure reflected in (2.6), continuity implies that p(t) is in fact differentiable in t, i.e., p(t) = d(p ;; (t))/dt exist for all pairs (i, j) of states, and alit > 0. At t = 0, of course, "derivative" refers to the right-hand derivative. In particular, the parameters q ;j given by
q^; = lim
Pij(t) — p(0) =
tlo
t
lim
p(t) —
elo
S `' ,
(2.13)
t
are well defined. Instead of proving differentiability from continuity for transition probabilities, which is nontrivial, we shall assume from now on that p ;; (t) has a finite derivativefor all (i, j) as part of the required structure. Also, we shall write Q
=
(2.14)
((qi;)),
for q i; defined in (2.13). The quantities q ;; are referred to as the infinitesimal transition rates and Q the (formal) infinitesimal generator. Note that (2.13) may be expressed equivalently as p
1
(At) = S i; + q, At + o(At)
as At —. 0.
(2.15)
Suppose for the time being that S is finite. Since the derivative of a finite sum equals the sum of the derivatives, it follows by differentiating both sides of (2.5) with respect to t and setting t = 0 that
Pik(s) =
jes
qij Pik(s),
p' (0 )Pfk(s) _
i, k e S,
(2.16)
JE$
or, in matrix notation after relabeling s as t in (2.16) for notational convenience, p'(t)
=
QP(t)
(t
0).
(2.17)
The system of equations (2.16) or (2.17) is called Kolmogorov's backward equations. One may also differentiate both sides of (2.5) with respect to s and then set s = 0 to get Kolmogorov's forward equations for a finite state space S, Pik(t) = E p (t)q^ k , JES
i, k E S,
(2.18)
SOLUTIONS TO KOLMOGOROV'S EQUATIONS IN EXPONENTIAL FORM
267
or, in matrix notation, P'(t) = p(t)Q.
(2.19)
Since p ij (t) are transition probabilities, for all i e S.
p 13 (t) = I
(2.20)
,%ES
Differentiating (2.20) term by term and setting t = 0, Y 9i; = 0.
(2.21)
jES
Note that qij:= pi ß (0) > 0
for i ^ j, (2.22)
9ii'= p(0)
0,
in view of the fact p iJ (t) 0 = p ij (0) for i 0 j, and p ii (t) < 1 = p i; (0). In the general case of a countable state space S, the term-by-term differentiation used to derive Kolmogorov's equations may not always be justified. Conditions are given in the next two sections for the validity of these equations for transition probabilities on denumerable state spaces. However, regardless of whether or not the differential equations are valid for given transition probabilities p(t), we shall refer to the equations in general as Kolmogorov's backward and forward equations, respectively. Example 1. (Compound Poisson). From (1.7), co
p 1 (t) = I f*k(j — i) k=o
^ktk 1 e -at
t j0,
k•
= s ;; e + f(j — i)a.te ^' + o (t),
as t j 0.
(2.23)
Therefore, and
= pii( 0 ) = 1 f(j — i),
(2.24)
4ü = pii( 0 ) = —2(1 — .f(0 )).
3 SOLUTIONS TO KOLMOGOROV'S EQUATIONS IN EXPONENTIAL FORM We saw in Section 2 that if p(t) is a transition probability law on a finite state space S with Q = p'(0), then p satisfies Kolmogorov's backward equation
268
CONTINUOUS-PARAMETER MARKOV CHAINS
Y_
p(t) _ > gikPkj(t), k
i, j e S, t
0,
(3.1)
j ES,
0,
(3.2)
and Kolmogorov's forward equation p(t)
_
Pik(t)gkj , i,
t
k
where Q = (( q ij )) satisfies the conditions in (2.21) and (2.22). The important problem is, however, to construct transition probabilities p(t) having prescribed infinitesimal transition rates Q = ((q1)) satisfying qij
> 0
for i : j, = L.
q 11
0, (3.3)
qij•
j
In the case that S is finite it is known from the theory of ordinary differential equations that, subject to the initial condition p(0) = I, the unique solution to (3.1) is given by p(t)
where the matrix
e`Q is
= e,
t
? 0,
(3.4)
defined by e=
I (tQ)n = I + Y t Q". n=1 n. n.
(3.5)
n =o
Example 1. Consider the case S = {0, 1} for a general two-state Markov chain
with rates — qoo = q01 =
ß,
Then, observing that Q 2 = — ( ß + Q"
=
(-
—q11 = q10
5)Q
(3.6)
and iterating, we have
1)" '(ß + S)" ' Q, -
= 6.
-
for n = 1, 2, ....
(3.7)
Therefore, p(t)=
e`Q=I
—R I
(e-
r(ß+b)—
1)Q
_ 1 S + ße-(B+a)' ß — ß e cß+air ß+b S — Se -
ß+Se-
It is also simple, however, to solve the (forward) equations directly in this case (Exercise 3).
269
SOLUTIONS TO KOLMOGOROV'S EQUATIONS IN EXPONENTIAL FORM
In the case that S is countably infinite, results analogous to those for the finite case can be obtained under the following fairly restrictive condition, A
:=
sup
Iql
<
oo
(3.9)
.
In view of (3.3) the condition (3.9) is equivalent to sup
Igjjl
<
(3.10)
c.
i
The solution p(t) is still given by (3.4), i.e., for each j e S, z
t t"
p1j(t) = ; + tq ;j + t2 q (2) + ... + t n q (n) + ...
(3.11)
for all i e S. For, by (3.9) and induction (Exercise 1), we have Iq n i < A(2A)" ', (
)
-
for i, j e S, n >, 1,
(3.12)
so that the series on the right in (3.11) converges to a function r;j (t), say, absolutely for all t. By term-by-term differentiation of this series for r(t), which is an analytic function of t, one verifies that r. (t) satisfies the Kolmogorov backward equation and the correct initial condition. Uniqueness under (3.9) follows by the same estimates typically used in the finite case (Exercise 2). To verify the Chapman-Kolmogorov equations (2.6), note that m
2
p (t +s)=e ( z+s'Q=I+(t+s)Q+^t2^s^ Q 2 + ... .^^^
=
m t
_}. 2! m! L I+tQ+t?Q2+...
2
Qm
+...l J
n
x I+sQ+s—Qz+... +S_Qn+..
2!
n!
= e`QesQ = p(t)p(s).
(3.13)
The third equality is a consequence of the identity Y
t m Qm S "
m+n=r m!
Qn —
n!
tmS"Qr . (t + S)r `fi (
m+n=r
m!n!^r!
r •
Also, the functions H1 (t) := Y JES p. (t) satisfy Hi(t) = Z p(t) = 1 jES
j€S kES
gi,Pki(t) =
gi,HH(t) kES
(t 0),
(
3.14 )
270
CONTINUOUS-PARAMETER MARKOV CHAINS
with initial conditions H 1 (0) = E IES b i; = 1 (i e S). Since HH (t):= 1 (for all t > 0, all i e S) clearly satisfy these equations, one has H,(t) = 1 for all t by uniqueness of such solutions. Thus, the solutions (3.11) have been shown to satisfy all conditions for being transition probabilities except for nonnegativity (Exercise 5). Nonnegativity will also follow as a consequence of a more general method of construction of solutions given in the next section. When it applies, the exponential form (3.4) (equivalently, (3.11)) is especially suitable for calculations of transition probabilities by spectral methods as will be seen in Section 9.
Example 2. (Poisson Process). The Poisson process with parameter 2 > 0 was introduced in Example 1.1. Alternatively, the process may be regarded as a Markov process on the state space S = {0, 1, 2, ...} with prescribed infinitesimal transition rates of the form q ij =2,
forj =i +1, i= 0,1,2,...
q=—A,
i =0,1,2,...
q 1 = 0,
otherwise.
(3.15)
By induction it will follow that
nn+j n (-1) —i^l, (n) _
if0<j—inn
—
0,
(3.16)
otherwise.
Therefore, the exponential formula (3.4) [(3.11)] gives for j i, t > 0, /
!)ijlt) =
(S ij
t2 +
tqi; +
Cji^ + .. .
t ;—i+k
oo
k= (j — i +k)! q = (2t }'
1 (—At)k
(j — 1 )!k=o
k!
tj—i+k
i + k
k=O(j —i +k)!\ j —i = (2ty-` e-a:
l
(3.17)
(j —i)!
Likewise, pij(t) = 0
for j < i.
(3.18)
Thus, the transition probabilities coincide with those given in Example 1.
SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATION 271
4 SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATIONS For the general problem of constructing transition probabilities ((p ;; (t))) having a prescribed set of infinitesimal transition rates given by Q = ((q1)), where
—q i1 = y q ;j ,
q ;j >0
for i j,
(4.1)
1^I
the method of successive approximations will be used in this section. The main result provides a solution to the backward equations p(t) _ Z gikPk;(t),
i, j E S
(4.2)
k
under the initial condition p 11 (0)=5 1j ,
(4.3)
i,jeS.
The precise statement of this result is the following. Theorem 4.1. Given any Q satisfying (4.1) there exists a smallest nonnegative solution p(t) of the backward equations (4.2) satisfying (4.3). This solution satisfies X p 1 (t) < 1
for all i ES, all t >, 0.
(4.4)
jES
In case equality holds in (4.4) for all i e S and t > 0, there does not exist any other nonnegative solution of (4.2) that satisfies (4.3) and (4.4). Proof. Write n, = — q (i e S). Multiplying both sides of the backward equations (4.2) by the "integrating factor" e z is one obtains ;
;;
e A " s PLk(s) = e Ais y gI,Ptk(s), jCs
or d li" ds (ePik(s)) = A;e PIk(S) + e r ' s
ga;P;k(s) = y elisq„PJk(s)• JE$
j :i
On integration between 0 and t one has, remembering that p,(0) = S ;k , t
e Air Pik(t) = Sjk +
e gIJPjk(s) ds , jo o
272
CONTINUOUS-PARAMETER MARKOV CHAINS
or
Pik(t) = Sike a,' +
f` e -aiu-s)
(t ^ 0; i,
gijpjk(s) ds
k E S).
(4.5)
j #i o
Reversing the steps shows that (4.2) together with (4.3) follow from (4.5). Thus (4.2) and (4.3) are equivalent to the system of integral equations (4.5). To solve the system (4.5) start with the first approximation
°
(4.6)
(i, k e S, t ) 0)
Pik = Sike A'`
and compute successive approximations, recursively, by p(t)
_ 6 ike -Ail + Y_
e -x<«-s^ gij
p(k s
i
)
( ) ds
(n 3 1).
(4.7)
j0i Jo
Since q ;j 0 for i j, it is clear that p;k ^(t) > p;k ^(t). It then follows from (4.7) by induction that p;k + '(t) p(t) for all n , 0. Thus, p ;k (t) = lim n -. p(t) exists. Taking limits on both sides of (4.7) yields Pik(t) = 6ike-Ail +
e-a'(t-S)gijPjk(s) ds.
(4.8)
j9 6 i JO
1 for Hence, p ik (t) satisfy (4.5). Also, P k (t) - p;° (t) >, 0. Further, >kes Pik »(t) all t > 0 and all i. Assuming, as induction hypothesis, that >kES Pik '^(t) 1 for all t >, 0 and all i, it follows from (4.7) that p(t) =
e
-xis + Y
kES
e-zju-s>qij( ^
keS
j #iJ0 '
e -x;' +
pk -1)(S))ds l
ds
joi o
= et + tie-A;' fells ds = e-x;' + A i e -
'
= 1.
o
Hence, Ekcs
p(t) <
1 for all n, all t >, 0, and the same must be true for
)
I Pikt) = lim X pik (t)• kES
nj ro kES
We now show that p(t) is the smallest nonnegative solution of (4.5). Suppose p(t) is any other nonnegative solution. Then obviously p ;k (t) ^ 6 ik e = p»(t) for all i, k, t. Assuming, as induction hypothesis, p ik (t) > p;k 1 (t) for all i, k e S. t > 0, it follows from the fact that p ;k (t) satisfies (4.5) that
SOLUTIONS TO KOLMOGOROV'S EQUATIONS BY SUCCESSIVE APPROXIMATION 273 r
>
e x;a s^
e—x;t + S pik (t) > ik
cn-1) sd s /t ) . ( ) = p ikt
qjk ij p
joi Jo
Hence, p ik (t) p(t) for all n 0 and, therefore, p ikt) 15 k (t) for all i, k e S and all t >, 0. The last assertion of the theorem is almost obvious. For if equality holds in (4.4) for p(t), for all i and all t >, 0, and p(t) is another transition probability matrix, then, by the above p ik (t) ^ p ;k (t) for all i, k, and t >, 0. If strict inequality holds for some t = t o and i = i o then summing over k one gets Y- Pik( O) > E Pik(0) = I, kE$
kES
contradicting the hypothesis that p(t o ) is a transition probability matrix.
•
Note that we have not proved that p(t) satisfies the Chapman-Kolmogorov equation (2.6). This may be proved by using Laplace transforms (Exercise 6). It is also the case that the forward equations (2.18) (or (2.19)) always hold for the minimal solution p(t) (Exercise 5). In the case that (3.9) holds, i.e., the bounded rates condition, there is only one solution satisfying the backward equations and the initial conditions and, therefore, p;k(t) is given by exponential representation on the right side of (3.11). Of course, the solution may be unique even otherwise. We will come back to this question and the probabilistic implications of nonuniqueness in the next section. Finally, the upshot of all this is that the Markov process is under certain circumstances specified by an initial distribution it and a matrix Q, satisfying (4.1). In any case, the minimal solution always exists, although the total mass may be less than 1. Example 1. Another simple model of "contagion" or "accident proneness" for actuarial mathematics, this one having a homogeneous transition law, is obtained as follows. Suppose that the probability of an accident in t to t + At is (v + Ar) At + o(At) given that r accidents have occurred previous to time t, for v,). > 0. We have a pure birth process on S = {0, 1, 2, ...} having infinitesimal parameters q = —(v + i),), q i , i+ , = v + i2, q ;k = 0 if k < i or if k> i + 1. The forward equations yield ;;
p(t) _ —(v + nA)po„(t) + (v + (n — 1 )A)po,-1(t), p
0 (t)
=
—
vpoo(t)
(n
(4.9)
1).
Clearly, poo(t) = p01
(t) _ —(v + A)poI(t) + vpoo(t) _ —(v + A)poI(t) + ve "`
(4.10)
274
CONTINUOUS-PARAMETER MARKOV CHAINS
This equation can be solved with the aid of an integrating factor as follows. Let g(t) = e ( v + z tp ol (t). Then (4.10) may be expressed as dg(t) = vez,
dt
'
or g(t) =g(0)+
Jve z u
du=
0
J
vez"du=V(ezt
-
1).
0
Therefore, Poi(t) = e-(V+zng(t) = V ( e -^^ _ e -cV+z)t) _ V
— e - z').
(4.11)
Next p2(t) = —(v + 22)P02(t) + (v + A)Poi(t) _ —(v + 22)p 02 (t) + (v + 2) -v e - "`(1 — e -zt ).
(4.12)
Therefore, as in (4.10) and (4.11), (v p 02 (t) = e cv+zztt[J r (v+zz)" e 0 = e-(v+2)t [1'(%'
2 )v e"(1
[''("_+ A) = e-cv+2t z) ,1
z") du ]
+ ^1) J` (ezz"
^
—e
— ezu) du]
o
e 2zr — 1 — e zt 1 221
2
— v(v + 2) [ e - vt — v+z ) t 2e -( + e - w+zz)t]
22 2
— v(v + 2) e_vt[1 — e - zt]z
22 2
Assume, as induction hypothesis, that Po"(t)=
v(v + 2) .. .( v + (n — 1)^) e_vr[1 —
n! A"
Then, Pö. "+i(t) = —(v + (n + 1 ) 2 )p0,"+1(t) + (v + n2)P0"(t),
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
275
yields (t) = e—cv+("+1)z)r Po,"+ 1
J
e(v+("+1)z)u(v
+ n.1 )po"(u) du
0
= v(v + 2) ••(v+n2) i)z),
`
n!^"
ec"+i)z"[1 — e z"]"du.
(4.13)
o
Now, setting x = e za,
r ` 1
e " — e-z°]" du = - J
Jo 1
(x
1
x"( i
i
—
\
--)
1)"+1
x/ e ,.^
1 e"
dx = -
f
(x — 1)" dx
2 l 1 ( e z^ — 1)"+1
L n+1 ] 1 n+l
=
Hence, v(v + A)... (v + n2) po+i(t) _
(n + l)!2'
z^ "+ i e vt (1 — e)
(4.14)
5 SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY Let Q = ((q ij )) be transition rates satisfying (4.1) and such that the corresponding Kolmogorov backward equation admits a unique (transition probability semigroup) solution p(t) = ((p ;j (t))). Given an initial distribution it on S there is a Markov process {X1 } with transition probabilities p(t), t 0, and initial distribution it having right-continuous sample paths. Indeed, the process {X1 } may be constructed as coordinate projections on the space S2 of right-continuous step functions on [0, oo) with values in S (theoretical complement 5.3). Our purpose in the present section is to analyze the probabilistic nature of the process {X,}. First we consider the distribution of the time spent in the initial state. Proposition 5.1. Let the Markov chain {X,: 0 < t < cc} have the initial state i and let To = inf{t > 0: X, 0- i }. Then To has an exponential distribution with
parameter —q . In the case q = 0, the degeneracy of the exponential distribution can be interpreted to mean that i is an absorbing state, i.e., P1 (TO =oo)=1. ;;
;;
Proof. Choose and fix t > 0. For each integer n > 1 define the finite-dimensional
event
276
CONTINUOUS-PARAMETER MARKOV CHAINS
A„={X(m/2 n )t =iform=0,l,...,2n}. The events A. are decreasing as n increases and
A =
linl A:= fl A„ n-00
n=1
= {X„ = i for all u in [0, t] which is a binary rational multiple of t}
={To >t}. To see why the last equality holds, first note that{ To > t} = {X„ = i for all u in [0, t]} c A. On the other hand, since the sample paths are step functions, if a sample path is not in {To > t} then there occurs a jump to state j, different from i, at some time t o (0 < t o < t). The case t o = t may be excluded, since it is not in A. Because each sample path is a right-continuous step function, there is a time point t 1 > t o such that X. =j for t o < u < t 1 . Since there is some u of the form u = (m/2n)t < t in every nondegenerate interval, it follows that X„ = j for some u of the form u = (m/2n)t < t; this implies that this sample path is not in A. and, hence, not in A. Therefore, {To > t} A. Now note by (2.2)
r
Z°
P1 (To > t) = Pi (A) = lim F(A) = 11n p„(2„ nt x
n
t
1 o0
L
1 2” ]
= l im 1 + 2„ qii + o^2 „) = e`q ,
(5.1)
nt o0
proving that To is exponentially distributed with parameter —q ii . If q 1 , = 0, the above calculation shows that P; (To > t) = e ° = 1 for all t > 0. Hence P(T0 =oo)=1. n The following random times are basic to the description of the evolution of continuous time Markov chains. T o =O,
r 1 =inf{t>0:X, Xo }, To=t1,
r„=inf{t>t n _ 1 :X1 0-X,, 1 },
Tn=Tn+1—Tn
(5.2) forn^ 1.
Thus, To is the holding time in the initial state, T1 is the holding time in the state to which the process jumps first time, and so on. Generally, T. is the holding time in the state to which the process jumps at its nth transition. As usual, P; denotes the distribution of the process {X1 } under Xo = i. As might be guessed, given the past up to and including time To , the process evolves from time To onwards as the original Markov process would with initial state X.0 . More generally, given the sample path of the process up to time = To + T1 + • • • + Tn _ 1 , the conditional distribution of {Xz , + ,: t > 0} is Ps , , , depends only on the (present) state X, and on nothing else in the past.
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
277
Although this seems intuitively clear from the Markov property, the time To is not a constant, as in the Markov property, but a random variable. The italicized statement above is a case of an extension of the Markov property known as the strong Markov property. To state this property we introduce a class of random times called stopping times or Markov times. A stopping time r is a random time, i.e., a random variable with values in [0, 00], with the property that for every fixed time s, the occurrence or nonoccurrence of the event {i <, s} can be determined by a knowledge of {X: 0 < u <, s}. For example, if one cannot decide whether or not the event IT < 10} has happened by observing only {X,,: 0 < u <, 10}, then r is not a stopping time. The random variables i l = To , r 2 = To + T1 ..... t„ are stopping times, but T1 , To + TZ are not (Exercise 1).
Proposition 5.2. On the set {co e Si: r(w) < 00}, the conditional distribution of {X^ + ,: t >, 0} given the past up to time r is P, if r is a stopping time. Stated another way, Proposition 5.2 says that on IT < co}, given the past of the process {X,} up to time r, the future is (conditionally) distributed as the Markov chain starting at X, and having the same transition probabilities as those of {X1 }. This is the strong Markov property. The proof of Proposition 5.3 in the case that r is discrete is similar to that already given in Section 4, Chapter II. The proof in the general case follows by approximating t by discrete stopping times. For a detailed proof see Theorem 11.1 in Chapter V. The strong Markov property will now be used to obtain a vivid probabilistic description of the Markov chain {X,}. For this, let us write k..=0
^r = — qj„if = q "
=1
and
q1j
:
0,
for i J
k;;=0
for i j
(5.3)
if q ;; =0.
Proposition 5.3. If the initial state is i, and q ;; 0 then To and X T ,, are independent and the distribution of XT0 is given by P^(XT O =j) = kr;,
j e S.
(5.4)
Proof. For s, A > 0, and j i observe that P1 (s
(5.5)
278
CONTINUOUS-PARAMETER MARKOV CHAINS
Dividing the first and last expressions of (5.5) by i and letting 0. 0 one gets the joint density-mass function of To and XTp at s and j, respectively (Exercise 2), iTo.xTO (s , j) = e v1s pij(0 ) = e v " s gij = 21e-xs
qij
(5.6)
,
—
where 2 i = — q ;i . Now use Propositions 5.1 and 5.3 and the strong Markov property to check the following computation. Pio(TO - SO , XTo = il T1 ' S17 ,
XtI = 12, ... , Tn - Sn, X, = in+l)
so
Pio(TI ' S1, Xti = l2, ... , Tn ' Sng
=
Xt.^ = in+l I TO = S, XTo = i 1)^io
s=0
x e liosk ;oi , ds -
so
1 (T <1 S1 i XT o = i2, ... , Tn—I < sn , Xs. = in+1)2ioe—^ioskioi, ds =
s=0
\
so
_Aioe—xios
ds)kio`,
:=o
Rie—itsdS)ki,i2 (
X Pie(TO'S2,
fs"=0
13,...,Tn-2'Sn,Xtn_2 =in+J
XTo=
_ ... = f f ,l ife zt ,s ds)( f k, ji . + , I . j=0
(5.7)
j=0 /
Note that n
fl
= Pi Ø (Y1 = il, Y2 = i2, ... , Yn+1 = in+1) ,
j=0
where {Y.: m = 0, 1, ...} is a discrete-parameter Markov chain with initial state i o and transition matrix K = ((k ij )), while n
sj
ft
1i,e-xi;s ds
j=0 s=0
is the joint distribution function of n + 1 independent exponential random variables with parameters ^. io , A i ,.... , 2 i .. We have, therefore, proved the following. Theorem 5.4 (a) Let {X1 } be a Markov chain having the infinitesimal generator Q and initial state i o . Then {Yn := XT ,: n = 0, 1, 2, ...} is a discrete parameter
279
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
Markov chain having the transition probability matrix K = (( k 1 )) given by (5.3). Also, conditionally given {Y„}, the holding times To , T1 , T2 are independent exponentially distributed random variables with parameters A ;0 il ,'l iV ... (b) Conversely, suppose { Y„} is a discrete-parameter Markov chain having transition probability matrix K and initial state i 0 , and for each sequence of states i 0 , i1, '2, ... , let To , T1 , T2 , be a sequence of independent exponentially distributed random variables with parameters respectively. Assume also that the sequences {To , T,, T2 .}, corresponding to all distinct sequences of states, comprise an independent family. Define ,..
.
,.?
,
...
, ..
X,
=Yo ,
0
(5.8) X,
=Yk+1
for To +Tl +•••+Tk
.
Then the process {XX : t >, 0} so constructed is Markovian with initial state i o and infinitesimal generator Q. Part (b) of Theorem 5.4 provides a noncanonical construction of a Markov chain with the transition rates given by Q = ((q ;i )). By randomizing over the initial state, one easily extends the theorem to an arbitrary initial distribution n. The content of the system of integral equations (4.5) is now easy to understand. Let us write (4.5) as p<<(t) = 8‚e-t + Y_
Ai e
-
xisk ü pü (t — s) ds.
(5.9)
i0i o
If j 0 i, then k ;i pi,(t — s) is the conditional probability, given To = s, that XTo = and in an additional time t — s the process will be in state 1. Adding over all j 0 i, and then multiplying by the density of To and integrating, one gets for the case l 0 i,
f' _Yf
p 1 (t) = P; (X, = 1) =
Z
j#t
Pi(XT o
=
I
-x j, X, = l Tp = s)A e :s ds
,
ioi o,
ie—
lisk
;i p i,( t — s) ds.
(5.10)
l
If 1 = i, an extra term, namely, e -z i` must be added as the probability that no jump occurs in the time interval (0, t].
Example 1. (Homogeneous Poisson Process). As an immediate consequence of Theorem 5.4 we get the following corollary.
CONTINUOUS-PARAMETER MARKOV CHAINS
280
Corollary 5.5. For a homogeneous Poisson process with parameter A the successive holding times are i.i.d. exponential with parameter A. An interesting and useful property of the holding times of a Poisson process concerns the conditional distribution of the successive arrival times in the period from 0 to t given the number of arrivals in [0, t]. Proposition 5.6. Let To , To + T1 , ... , To + T1 + + i,... ‚denote successive arrival times of the Poisson process { Y} with parameter A. Then the conditional distribution of (To , To + T1 , ... , To + T1 + • • • + T,_ 1 ) given { X1 — Xo = k} is the same as that of k increasingly ordered independent random variables each having the uniform distribution on (0, t]. Proof. Let Uo , U,,... , Uk _ 1 be k i.i.d. random variables each uniformly
distributed on (0, t]. Let U (o) be the smallest of U o , U 1 , ... , U1 the next smallest, etc., so that with probability one U (o) < U(1) < ... < UUk _ 1 ^. Since each realization of (U(0) , U(1) , ... , Usk _ 1 ^) corresponds to k! permutations of (Uo , U 1 ,. . . , Uk _ 1 ), each with the same density 1/t 1', the joint density of (U(o), U41), ... , U(k-n) is given by
g(so> ... > sk -1) _
k!k!
if 0 < so <S 1 < •
0
otherwise.
< S_ 1 <_ t (5.11)
Now, by Corollary 5.5, To , T1 , ... , Tk are i.i.d. with joint density given by k
2k+1 exp —A f (xo, x1, ...
‚x)=
0
x ; if x ; >0
for all i,
otherwise.(5.12)
`-o
Since the Jacobian of the transformation (xo, xl, ... , xk) --► (x0, xo + xl, ... , xo + xl + • • • + xk)
is 1, the joint density of To , To + T1 , ... , To + T1 + • • • + Tk is given by f1(t0, t1, ... , t k
) = 2k+le—Irk
for 0 < t o < t 1 < t 2 < ... < t k . (5.13)
The density of the conditional distribution of To , To + T1 ..... given {To +TI +•••+Tk - l ,
To + T1 + • • • + Tk
2k+l e —tsk
P(To +TI +•••+Tk -
l
*t
if0<s o <s l <••• <s k ands k _ 1
otherwise.
(5.14)
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
281
But the denominator in (5.14) equals P(Y, = k), where Y, = X, — X0 . Therefore, ^k+le-dsk (
ASk Ak!e Th
At) k e xttk
e
.fz(so, s / , ... , SO =
(5.15)
k! if 0 <so< s 1 < • • • < sk and sk _ 1 1< t < S,
0
otherwise.
Integrating this over S k we get the conditional density of To , To + T1 , ... T0 +TI + ... +Tk - 1 given {Y=k}as (2) k!
t k e - xt J3(s0+S1..
. Sk-1) =
e -ASk ds _
-
if0<s0<S 1 <
0
k! e -zt = k'
k e xt t k
tk
(5.16) . <sk_1 Wit,
otherwise.
Thus, as asserted, f3 is the same as g.
n
Example 2. (Continuous-Parameter Markov Branching Process). This is the continuous-parameter version of the Bienayme-Galton-Watson branching process discussed in Chapter II. It is another basic random replication scheme that is used to describe processes ranging from the growth of populations to the branching of river networks. Consider a population of X0 = i o >, 1 particles at time 0. Each particle, independently of the others, waits an exponentially distributed length of time with parameter A > 0 and then splits into a random number k of particles with probability fk (independently of the waiting time), where fk > 0, fo >0,0 < fo + f1 < 1, 1k ofk = 1. The progeny continue to split according to this same process. Let X t denote the number of particles present at time t. An equivalent description of the process {X t } is as follows. Since the minimum of i o i.i.d. exponential random variables with parameter 2 > 0 is exponential with parameter i o 2, the holding time To in state i o is exponential with parameter i o 2. At time To the initial population of X 0 = i o particles becomes a population of X To = j i o — 1 particles with probability fj - ;o+ 1 . If i o = 0, then Xt = 0 for all t >, 0. In view of the lack of memory property of the i.i.d. exponentially distributed splitting times of the particles and their progeny, the holding time T 1 to the next transition given {XTo =1' To } (j >, 1) is exponential with parameter 2 j = jA (Exercise 13). Thus, {Xt } may be described as a continuous-parameter Markov chain with state space S = {0, 1, 2, ...} with absorbing state at 0 and having infinitesimal transition rates that can be calculated as follows. First note that the probability of k of i particles splitting within time h > 0 is (
)( e _y_k(l — e -z'')k = O(h k ) = o(h)
for i 3 k >, 2.
282
CONTINUOUS-PARAMETER MARKOV CHAINS
Now, p(h) =
e''xn + .fi \ /[J n 2e-ase-zln-SI ds](e -.tn ) i-1 + 0 (h 2 ) 1 0
= 1 - iA,(1 - fl )h + o(h)
as h --> 0.
(5.17)
Likewise, for j > i + 1, pii(h) _ ()ii-r+i
e -ze e -u- +i^acn s^ ds (e xn)i-1 + o (h)
LJ h fi 0
]
=if; - i + ,2h+o(h)
ash->0.
(5.18)
For] = i - 1, we have pt.j-i(h) = (i)fo[
f
Jh
a:lxn)
= ilfo h + o(h)
+ o (h)
ash-•0.
(5.19)
Therefore, the transition rates are given by jai-1, i^1, j0i,
iifj -i +II q i; = 0,
i =0, j>,0 or j
-12(1-f1 ),
i =j'> 1.
The branching evolution is characteristically multiplicative and as such clearly does not possess (additive) independent increments. The branching mechanism is distinctively displayed in terms of the probability-generating functions of the (minimal) transition probabilities by the following considerations. Let Go
9((r)
_
E
p 1 (t)r,
i = 0, 1, 2, ....
(5.21)
j=o
Since each of the i initial particles, independently of the others, has produced a random number of progeny at time t with distribution (p lj (t): j = 0, 1, ...), the distribution of the total progeny of the i initial particles at time t is the i-fold convolution (p(t): j = 0, 1, 2, ...). Therefore, 9,(` ß (r) = (9,(' ) (r)),
i = 0, 1, 2, ....
(5.22)
The Chapman-Kolmogorov equations for the branching evolution take the (transformed) form 9^+s(r) = g^ 1) (9s"(r)),
(5.23)
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
283
since pij(t + s)r' = Y, Y, P ik(t)pkj(s)r j=0
l
j=0 k
P^k(t)(gs"(r)) k = 9i`'(gs"(r)).
_ k
Fix i > 0 and consider the discrete parameter stochastic process {X.} defined
by n=0,1,2,....
(5.24)
The process {X„} is clearly a discrete-parameter Markov chain. In fact, from (5.24) one can see that {X^} has stationary transition probabilities j3 p ij (r). This makes {X„} a discrete parameter Bienayme—Galton—Watson branching process with offspring distribution f, = p lj = p(r). Observe that the event that {X,} eventually becomes extinct is equivalent to the event that {XX } eventually becomes extinct. It follows from the result obtained in Example 11.2 of Chapter II for the discrete parameter case that the probability p of eventual extinction of an initial single particle is the smallest nonnegative (fixed-point) r of the equation 9i' > (r) = r.
(5.25)
In particular, observe that this root cannot depend on t > 0. One may therefore expect to be able to compute p from the generating function of the infinitesimal transition rates. Define (5.26)
h" (r) _ Y, gijri = —fir + )
j=0
j=0
Proposition 5.7. The extinction probability p for the continuous-parameter branching process is the smallest nonnegative root of the equation h«(r) = 0.
Moreover, if
0 j fi <, I then p = 1.
Proof. Observe that the Kolmogorov backward equation for the (minimal) branching evolution transforms as follows. age (r)
8t.
_
aPij(t)
j
8t
r _
j
k
91kPkj(t)ri
_ I glkgi k1 (r) = Y glk(gi i) (r)) k . k
k
284
CONTINUOUS-PARAMETER MARKOV CHAINS
Thus,
ag, 1 ) (r) at
=h
n (9 1
(5.27)
(r)),
00
pij( 0 )r j = r.
g0 (r) _
(5.28)
j=0
In particular, since g, l) (p) = p for all t > 0, it follows that p must satisfy h
(ll
(p) =
Since h (l ^(0)=2f0 >0,h" 1 (l)=Zj
ag,(' (p) = 0. at )
o
(5.29)
q 1j =0,and
d2V l( r )
dr2 =A I j(j-1)fjrj i 0,
0
j=2
it follows that 0 (r) can have at most one zero in the open interval 0 < r < 1. If^^ o jf
d h" (r) _— J+ Y 2jrj fr'i dr )
is nonpositive for all 0 < r < 1. Thus p = 1 in this case.
n
As a corollary it follows that since A appears as a simple positive factor in (5.26), the extinction probability is the smallest nonnegative fixed point of the probability-generating function of the offspring distribution {f3}. In particular, therefore, extinction is completely determined by the embedded discrete parameter Markov chain {Y„}. In particular, if > j jfj > 1, then p < 1 (see Example 11.2 of Chapter II). Example 3. (Continuous-Parameter Binary Branching or Birth—Death). The
Bienayme—Galton—Walton binary branching Markov process {X} is a simple birth—death process on {0, 1, 2, ...} with X 0 = 1, absorption at 0, having linear birth—death rates q 11 + 1 = q ; , ; _ 1 = i2/2, i >, 1, and having right continuous sample paths a.s. Define L:= inf{t > 0:X=0},
(5.30)
M,==#{O<s
(5.31)
Imagine a binary tree graph emanating from a single root vertex (see Figure 5.1) such that starting from a single root particle (vertex), a particle lives for
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
285
Figure 5.1 an exponentially distributed duration with mean A -1 and then either splits into two particles or terminates with equal probabilities and independently of the holding times. Each new particle is again independently subjected to the same rules of evolution as its predecessors. Then L represents the time to extinction of this process, and M, is the total number of deaths by time t. According to Proposition 5.7, L is finite with probability 1. In particular, M L, represents the number of degree-one vertices in the tree (excluding the root). In the context of random flow networks, such vertices represent sources, and the extreme value statistic L is the main channel length. We will consider the problem of computing the (conditional) mean of L given ML = n. The process {(X1 , M,)} is a Markov process on Z + x Z + having rates A = ((a, j )), where j = i + (1, 0) = (il + 1, i2)
Zile,
j= i +(-1,1)=(i1-1, i 2 +1) 'i i i_, _(5.32) —i.?, j=i
0,
otherwise.
Justification of the use of the forward equations below is deferred to Section 8 (Example 8.2, Exercise 8.14). {u X tv M }, 0 < u, v < 1, t >, 0. Then,
Lemma. Let g(t, u, v) = E
C i (u — C 2 ) + C 2 (C 1 — u)eI'^ (c,-C2)1
g(t, u, v) = — u — C
(
2 z)) — (( u — Ci)e=z(c,=czi,_ _,
where C i = C 1 (v) = 1 — (1 — v)" 2 , C 2 = C 2 (v) = 1 + (1 — v)i"2
(5.33)
CONTINUOUS-PARAMETER MARKOV CHAINS
286
Proof. It follows from Kolmogorov's forward equation that for i o = (1, 0),
a at
i iz E so {ux^vM^} ; = E ;o u v a(xM,),(il.iz)
th'i2)
= E1 0 {2AX,u x,+ '^ + ZAX1 u x,-1 V M ^ +1 — AX,u x `v M `} =
Z2u2
au
E; o {u xt v Mt } + ZAv 3u
— Au
au
E; o {u x `v M `} (5.34)
Therefore, one has
a
g(t, u, v)
(5.35)
= A (u 2 + v — 2u) ^u g(t, u, v)
with the initial condition, (5.36)
g(0, u, v) = U. Letting y(t) be the characteristic curve defined by Y'(t) _ —A (y 2 (t) — 2y(t) + v) _ —A (y(t) — CO)(Y(t) — C2),
with C, - C,(v) and C 2 - C2 (v) as defined in the statement of the lemma, (5.35) and (5.36) take the form dt g(t, y(t), v) = 0,
g( 0 , Y( 0 ), v) = Y(0 )•
n
These are easily integrated to get (5.33). From the lemma it follows that (z)(_ 1)n+'vn
vnP(ML = n) = lim g(t, 1, v) = 1 — (1 — v) l " 2 =
n=1 \n
n=1
0 < v < 1. (5.37)
In particular, the classic formula of Cayley for the distribution of ML follows as (see also Exercise 11) P(ML=n)=(2(-1)n-1=2
1 2 1 2-zn+1 n n-1
n=1,2.....
(5.38)
Thus, by the Stirling formula one has,
SAMPLE PATH ANALYSIS AND THE STRONG MARKOV PROPERTY
P(ML =n)— 1—n -3 / 2 asn-->oo. 2f
287
(5.39)
The lemma provides a way to compute the conditional moments as follows:
EL I ML = n} = r J
"
tr 'P(L>tJML =n)dt -
(5.40)
0
and P(L> t,ML =n) — P(ML =n)—P(L
P(ML =n) Now, letting
hn=P(ML=n)E(Ln ML =n),
n> 1,
(5.42)
we have for 0 < v < 1, using (5.33), (5.37), (5.40) and (5.41),
J
h n vn = r tr -1 {(1 —(1 — v) 112 ) — g(t, 0, v)} dt n1
0
= rr°°
C C z
tr-1 C _ 1
o
f
—
1A(c,-c C zC 1e 2)r
dt -C2)u C 1a(c, C2z— 1 e
^
1
C C _r-1J r t d t. Cl)e#M(cl-CZ)`^ 1 — ( 2 C 22 — C ie
(5.43)
Ix(c1-C2)1
Expanding the integrand as a geometric series, one obtains 0on
(v) =
r CI(C2 — C2) t r-1
C2
i
n
Jo
e(n+1)z(c1 cz)t dt (
2
C 21 C
(C — C l r+1 y (C = irr \nn—r r l n=1 ) 2
= 2 ^r (1 — v)_
1)/2
)
"
f"
S r—
le—s ds
(Clinn-r. n=1
(5.44)
C2J
In the case r = 1, h(v) _ — 2 log A
1—
C1(v)
C2(v)
_—
2a.
log
2(1 — v) 1/2
1 + (1 — V)112
(5.45)
288
CONTINUOUS-PARAMETER MARKOV CHAINS
To invert h(v) in the case r = 1, consider that vh'(v) is the generating function of {nh"}. Thus, differentiating one gets that vh'(v) =.i 1 (1 — v) -1 [1 —(I — v) i
2] _). {( 1 — v) i
i
—(1 — v) -i'2} (5.46) and, therefore, after expanding (1 -- v) - ' and (1 — v) - `/ z in a Taylor series about v = 0 and equating coefficients in the left and rightmost expansions in (5.46), it follows that -
(_ 2 nP(ML =n)E(LIML =n)=2 1+ ' )(_1)"
t
.
( 5.47)
In particular, using (5.39) and Stirling's formula, it follows that E(n - '' z )LI ML =n)' 2,
as n—•oc.
(5.48)
}' I
ML = n), r >, 2, may The asymptotic behavior of higher moments E({n - " 2 ):L also be determined from (5.44) (theoretical complement 4). Moreover, it may be shown that as n —> oo, the conditional distribution of n -112 AL given ML = n has a limiting distribution that coincides with the distribution of the maximum of a (suitably scaled) Brownian excursion process as described in Exercise 12.7, Chapter I (see theoretical complement 4).
6 THE MINIMAL PROCESS AND EXPLOSION The Markov process with infinitesimal generator Q as constructed in Theorem 5.4 is called the minimal process with infinitesimal generator Q because it gives the process only up to the time T0 +T1 +T2
+•••=
n =o
.
( 6.1)
The time (is called the explosion time. What we have shown in general is that every Markov chain with infinitesimal generator Q and initial distribution is is given, up to the explosion time, by the process {X1 } described in Theorem 5.4. If ( = oo with probability 1, then the minimal process is the only one with infinitesimal generator Q and we have uniqueness. In the case P;(( = co) < 1, i.e., if explosion does take place with positive probability, then there are various ways of continuing the minimal process for t ( so as to preserve the Markov property and the backward equations (2.16) for the continued process {X1 }. One such way is to fix an arbitrary probability distribution 4 on S and let X, = j with probability i/i i (j e S). For t > ( the continued process then evolves as a new (minimal) process
THE MINIMAL PROCESS AND EXPLOSION
289
with initial j and infinitesimal generator Q until a second explosion occurs, at which time {X,} is assigned a state according to 4r independently of the assignment at the time of the first explosion C, and so on. The transition probabilities p ;k (t), say, of {X,} clearly satisfy (5.9), the integral version of the backward equations (2.16), since the derivation (5.10) is based on conditioning with respect to To , which is surely less than S, and on the Markov property. Unfortunately, the forward equations (2.19) do not hold for p, since the conditioning in the corresponding integral equations is with respect to the final jump before time t and one or more explosions may have already occurred by time t. We have already shown in Section 3 that if {q ;; : i c S} is bounded, the backward equations have a unique solution. This solution p. (t) gives rise to a unique Markov process. Therefore we have the following elementary criterion for nonexplosion. Proposition 6.1. If sup, Es ) (= sup,jq ;i l = sup ; , j Jq ;^^) is finite, then P = cc) = 1 for all i e S, and the minimal process having infinitesimal generator Q and an arbitrary initial distribution is the only Markov process with infinitesimal
generator Q. Definition 6.1. A Markov process {X} for which P; (C = oo) = I for all je S is called conservative or nonexplosive. Compound Poisson processes are conservative as may be checked from ll ( A t) /' / ^J*„v1—I)_
e jes
n=O JES
n•
i
^t e-i,` (^)„
n=O
e-x^(^.t)„=
n•
1.
n=O n!
*n (j _ i)
jES J
(6 . 2 )
Example 1. (Pure Birth Process). A pure birth process has state space S = {0, 1, 2, ...}, and its generator Q has elements q ;i = — A i , q ; , ;+ I = ^.,, q,; = 0 for j > i + I or j < i, i e S. Note that this has the same degenerate (i.e., completely deterministic) embedded spatial transition matrix as the Poisson process. However, the parameters Al i of the exponential holding times are not assumed to be equal (spatially constant) here. Fix an initial state i. Then the embedded chain { }„: n = 0, 1, ...} has only one possible realization, namely, Y„ = n + i (n > 0). Therefore, the holding times To , T1 ,... , are (unconditionally) independent and exponentially distributed with parameters 2, A i+ ,, ... , Consider the explosion time =To +T1 +•••=
Tm. m=0
CONTINUOUS-PARAMETER MARKOV CHAINS
290
First assume °° 1 Y, —=00.
(6.3)
n=0 ^n
For s > 0, consider 4(s)=E1e-
=Ee -'16r m= Eie-ST„, m=0
= ri
m +i = H
1
mj=O'm +i + S „=O 1 + S/Am +i
Choose and fix s> 0. Then,
log 4(s) _ — E 1og 1 + S m=0
^m +i
1
°° _ —s
— E s/'l m MW =0 1 + S/Am+l
= —oo,
(6.4)
m=0 1m+i + S
since log(1 + x) >, x/(1 + x) for x 0 (Exercise 1). Hence log 0(s) = — oc, i.e., (k(s) = 0. This is possible only if P i (C = oc) = 1 since e > 0 whenever ( < oo. Thus if (6.3) holds, then explosion is impossible. Next suppose that
1 <0.
Z
(6.5)
n=0 ^n
Then E(=
1 m=0 1 m +i
m1
= —<0o. n =i in
This implies P;(( = oo) = 0, i.e., Pi (C < oo) = 1. In other words, if (6.5) holds, then explosion is certain. In the case of explosion, the minimal process {X,: 0 < t <} can be continued to a conservative Markov process on the time interval 0 <, t < oc in a variety of ways. For illustration consider the continuation {X 1 } of an explosive pure birth process {X } obtained by an instantaneous jump to state 0 at time t = C, after which it evolves as a minimal process starting at 0 until the time of the second explosion, and so on. Let Pij (t) denote the transition probabilities for {X}. Likewise, Pi denotes the distribution of {X} given Xo = I. Consider, for
THE MINIMAL PROCESS AND EXPLOSION
291
simplicity, p oo (t). One has 00
Poo(t)=Po(X,= 0 )_ 1 P0 (X,=0,N,=n)
(6.6)
n=0
where N, is the number of explosions by time t. Since the event {X, = 0, N, = 0} is the event that the minimal process holds at zero until time t, Po(^': = 0, N, = 0 ) = p00(t) = e-x°'.
(6.7)
More generally, the event {X, = 0, N, = n} occurs if and only if the nth explosion occurs at some time prior to t and the minimal process remains at 0 from this time until t. Let fo denote the probability density function of the first explosion time ( for the process {X,} starting at 0. Since the times between successive explosions are i.i.d. with p.d.f. fo , the p.d.f. of the nth explosion time is given by the n-fold convolution ft". Therefore, Po(X, = 0, N, = n) =
J 0 poo(t — s)f t
*n (s)
ds.
(6.8)
Using (6.7) and (6.8) in (6.6) we get
Je
Poo(t)= e "` 1 +
f 0 (s) ds] .
(6.9)
n=1 0
Differentiating (6.9) gives Pöo(t) =
—)
OPoo(t) + Y f"(t).
(6.10)
n=1
In particular, (6.10) shows that the forward equation does not hold for ß(t). Since the time of the first explosion starting at 0 is the holding time at 0 plus the remaining time for explosion starting at state 1, we have by the strong Markov property, (6.11)
fo=e^*.fi
where f1 is the p.d.f. of the first explosion time starting in state 1 and e Ao is the exponential p.d.f. with parameter A0 . Therefore, W
.f ö"(t) _ n=1
e * .fi * f " n=1
(t).
(6.12)
292
CONTINUOUS-PARAMETER MARKOV CHAINS
Since, exo(t) = Aoe-z°' = 2opo0(t),
(6.13)
we can write (6.12) as L f Ö n (t) = A0 L n=1
n =1
Poo * fl * f
o*
Z)(t)
e
_ 2 0
oo
r
.fi(s)poo(t — s) ds + A 0. fi*.fön(s)poo(t — s) ds n =i o o
(6.14)
= A0P10(t)•
Now (6.10) takes the form P'00(t) = —A 000 (t) + A0ß10(t).
(6.15)
Equation (6.15) is the (0, 0) term in the backward equation for (t). As remarked earlier (Exercise 2) the backward equations hold for all terms Another method of constructing a conservative process from an explosive minimal process is to adjoin a new state A to S and define X, = A for t > l;. Then A is an absorbing state and the transition probabilities are for i, je S
p ;1 (t)
1 — > P Ik (t) for i E S, j = A o(t) — A(6.16) keS I 1
fori=j=A
0
fori= A,jES.
It is simple to check that (6.16) defines a transition probability (Exercise 3) and satisfies the backward as well as the forward equations for i, je S (Exercise 3).
7 SOME EXAMPLES Example 1. (The Compound Poisson Process). The state space for {Xr
is S = {0, ± 1, ±2, ...}. Let f be a given probability mass function on S with f(0) < 1, and let A > 0. Define the Q-matrix by A.= —9 11 =A.(l
k _ i
—
2ii
' A
}
—f(0)),• .1) (7
f(j—i) ifj i. 1 — f(0)
293
SOME EXAMPLES
Since the A, do not depend on i, it follows from Theorem 5.1 that the embedded chain {1',}, corresponding to the transition matrix K and some initial distribution, is independent of the i.i.d. exponentially distributed sequence of holding times { To , T,, .. .}. Also, { Yn } is a random walk that may be represented as Yo =Xo ,Y1 =X0 +Z 1 ,Y2 =X0 +Z 1 +Z 2 ,...,Yn =X o +Z 1 +Z 2 +•.• + Z .... , where {Z 1 , Z 2 ..... Zn , ...} is a sequence of i.i.d. random variables with common probability function f, independent of X0 . It follows from this description that the transition probabilities are given as follows. Let A = 2(1 — f (0)). Then, p 1 (t) _
Pi ( n transitions occur in (0, t], and
X, = j)
n=0
ro
+
Y- P,(T
O
+T1 +... +T- ,1
n=1
R(To+Ti +... +Tn- i
=S;je -x`+ n=1
x PA. =bi,e -kt+
_
e -xt n =1
(A t)n n = o
n.
=j)
-- f *n(j —i)
n!
(7.2)
f *n (j — i),
with the convention that f * o (k) = ó kø The rates given in (7.1) are the same as those that are obtained in (2.24). .
Example 2. (The Yule Linear Growth Process). This process is sometimes used as a model for the growth of bacteria, the fission of neutrons, and other random replication processes. It is a special case of Example 2 in Section 5 with fo = 0,f2 = 1. However, to keep this example self-contained, consider a situation in which there are initially i members of a population, each splitting into two identical particles after an exponentially distributed amount of time having parameter A and independently of the others. Let X X denote the size of the population at time t. Then {Xj is a Markov process on S = { 1, 2, ...} (Exercise 1), whose transition rates can be calculated explicitly as follows. p,1(h) = ( e
-xn)1 = e -stn = I — i1h + o(h).
(7.3)
Acisz— e ds] [e ] i
(7.4)
Similarly, (i)[Jh p1±
1(h) =
o
1—
= i2h + o(h)
294
CONTINUOUS-PARAMETER MARKOV CHAINS
and forj>i+2, -»-1 = o(h). 0 <, p 1 (h) < (1 — e
(7.5)
In particular, therefore, the infinitesimal rates are given by =—iA,
q i , i+1 =IA,
A i; =0
ifj>i+Iorifj
Note that this makes {X} a pure birth process. Since 1 c1 n 1 An
1
°1
2n =1 n
n1 nA
it follows from Example 6.1 that the process is conservative. We will use the backward equations to compute p k (t) for all i, k. We have ;
for k < i (indeed, p ik (t) = 0),
p k (t) = 0
(7.7)
p1(t) = —iApei(t), pik(t) = — iAp1k(t) + iApi +l,k(t)
(k
>, i + 1).
Clearly, p11(t) = e -Ail = e -'A'
(i = 1, 2, ...).
Substituting this into the last equation of (7.7) (with k = i + 1), one gets pi.j +1(t) _ —i2p +1(t) + iAe " + nx'.
(7.8)
Multiplying both sides by e` ' one gets
d
t
(e"'pi.i +I(t)) = iAe' A`p1,1+1(t) + e pi.1+1(t) = i2ee
At = iAe21
or t
epi,r +l(t) =
fo
iAe ds + e` Ap1+l( 0 ) = 1(1 -
—
et)
Therefore, pi,i +1(t) = ie
-
` » ( 1 — e -At ).
(7.9)
SOME EXAMPLES
295
Substituting this into (7.7) with k = i + 2, one gets
p+2(t) _ — i2 R,;+z(t) + i?,[(i + 1)e-(r+1 t(1 — e which as above, on multiplying both sides by eU`, yields t.
t
p; +2(t) = e ` ` f i^.(i + 1)e '(1 — e) ds -
-
-
o
= i(i + 1)e`i t[(1 — e -z') + z (e -22' — 1)] = i(i + 1l ) e i.Afo — e -A T. 2
(7.10)
By induction (a basis for which is laid out in (7.9)-(7.10)), one may prove (Exercise 2)
Fik(t)=
i(i+ 1). ..(k— (k —i)!
e'(1 —e - l`)'` -; fork>i.
(7.11)
Example 3. (N-Server Queues). Think of arrivals of customers as a Poisson process with parameter a. There are N servers. Let X, denote the number of customers being served or waiting at time t. A customer arriving at time t is immediately served if XX is less than N. On the other hand, if X, exceeds or equals N, then the customer must stand in a queue and await a turn. Assume that the service time for each customer, counted from the moment of arrival at a free counter until the moment of departure from the counter, is exponentially distributed with parameter ß, and assume that the service times for the different customers are independent. The process {X,} is Markovian (Exercise 4). The infinitesimal conditions may be found as follows. The state space is S = {0, 1, 2, ...} and the infinitesimal probabilities are given by
p 1 (h)
e-ahe-iph + O(h 2 )
p1,ß+3(h) =
1(h)
if 0 < i < N,
= l e -ahe-N(1h + O(h 2 )
if i > N,
e - a h ahe - `a h + O(h 2 )
if 0 <' i '< N,
ehe - N a h + 0(h 2 )
if I> N,
^e - ah i(1 — e -ah) e -u-'inh + O(h 2 ) =
e
ah
p(h) = O(h 2 )
N(1 — e - u h )e -(N -
' ) ah + O(h 2 )
if I '< i '< N, if
i> N,
if ji — jj > 1.
296
CONTINUOUS-PARAMETER MARKOV CHAINS
Therefore, q=-(a+iß)
if0
q =-(a+Nß) ;;
if i >, 0;
= a
=iß
ifi>N;
if1<,i N,
q _,=Nß ;
(7.12)
ifi>N.
It is difficult to calculate explicitly the transition probabilities by either of the methods of the two preceding examples. In the next example, we consider the case of infinitely many servers and zero waiting time for mathematical convenience. The N-server queue is a special case of the continuous-parameter birth-death Markov process to be analyzed in Example 1 of Section 8.
Example 4. (Queues with Infinitely Many Servers). In the physical model, X, denotes the number of customers being served at time t. Suppose that the number Y of arrivals of customers is a Poisson process with parameter a. Each customer is upon arrival immediately attended to by a server, the service time being an exponential random variable with parameter ß. The service times of different customers are independent. Suppose that initially there are X o = i customers who are being served. Because of the "lack of memory" property of the exponential distribution, the remaining service times for these i customers are still independent exponential random variables with parameter ß. Of the i customers served at time zero, let Z, be the number of those who remain at the counters (being served) at time t. Then Z, is a binomial random variable with parameters i and e ß`. During the time interval (0, t], however, there will be Y new arrivals. Suppose W of them remain at the counters at time t. Then X, = Z, + W. Of course, Z, and W are independent random variables. To determine the distribution of X, given X 0 = i, we therefore need to compute the distribution of W, given Y, = k. Given that there are k arrivals in a Poisson process in (0, t], the k arrival times, say r 1 , r l + T 2i , ... , i, + z 2 + • • • + T k , may be thought of as k independent observations U 1 , U2 ,. . . , U the uniform distribution on (0, t] arranged in increasing order of magnitude (see Proposition 5.6). Since the probability that an individual who arrives at time u will not leave the counter by time t is e ß ` , the probability that a random arrival in (0, t] will not leave by time t is -
-
1 e etr u^ d u
=
(
-
1 - e - ß'
(7.13)
t o ß`
Therefore, the conditional distribution of W given Y = k is binomial with
SOME EXAMPLES
parameters
k
297
and (1
— eß t )/ßt.
Unconditionally, then,
P(W = j) = > P(Y = k)P(W = i 1 Y = k) k=j
= kJ e
1 _ ßt
)J^
at (k^k \%/\
1 !'t R`1
Qtk \ J ‚
e' 1 — C ot J °° (at)k
j!
ßt
k=J (k —J)!
ßt
_ j _ ( t) - e )lat) e
t
1—e ß` J
R
a ea([I - ( 1 ! \ ßt I
= e'>"
- e -ß ) (
Q j t k-
(
[('x/ß)( 1 — e ß `)] J( j >, 0). -
(7.14)
)
j!
In other words, W, is Poisson with parameter (a/ß)(1 — e ß`). Since Z, is binomial, it follows from (7.14) that -
p ik (t)=P(X,=kI X o = i) =P(Z,+W
=kIXo =i)
min(i,k) i =
!e-ßt )r(1 — e - Rt)i
- re -
^a^Q)u -e-B')
[( c /ß/1 )(l [(a/ )(1
—e-ß
r )] k-r
r t( k—r)! =
e -cawR1-e 01 (1
r
mik ,) _ — e ß`)`C(a/ß)(1 — e-a`)]ki!
x [eß` + e - ß` — 2]
-
1a
r
= o r!(i — r)!(k — r)!
r
for k = 0,1, .... (7.15)
From (7.15) one can directly check the infinitesimal conditions for i l 0,q 111 = a and q i. , I = iß for i 1, q o , l = a. However, it is simpler to use the probabilistic description to arrive at these. For example, p i (h) is the sum of the probabilities of the events that there are no arrivals in (0, h] and none of the Xo = i customers leaves during (0, h], or one customer arrives in (0, h] and one of i + 1 customers leaves during (0, h], etc. Therefore, q, i = —( + iß)
p
r
(h) = e-ah(e-hh)i + 0(h 2 ).
(7.16)
More generally, since the chance of two or more occurrences in a small interval of length h is 0(h 2 ), one has p(h) = e-(a+`ßu' + 0(h 2 ) = 1 — (a + iß)h + h
p+1(h) =
\\ o
ae-aye-P(h-5) ds)e - 'ß h + 0 (h 2 )
0 (h 2 ),
CONTINUOUS-PARAMETER MARKOV CHAINS
298
=
e -Bh-ißh
h
ae-c'-v)s ds + O(h2) = e-u+i)ühah + O(h 2 ) 0
= ah — a(i + 1)ßh 2 + O(h 2 ) = ah + O(h2), pj,j_1(h) = (e
-
(7.17)
« h )i(1 — e - 9h)e - c" - ')ßh
= i(1 — ah)ßh(1 — (i — 1)ßh) + 0(h 2 ) = ißh + 0(h 2 )
Therefore, = 1im P<<(h) — I _ — (a + iß), hlo
qi,i+i(h) = lim
h
(7.18)
P^,^+i(h) —
hlo
0 = a
qj,1-1 = iß.
h
For completeness one needs to argue that {X,} is a Markov process. For this note that X 4 is is a function of (i) Xs and the additional service times up to time s + t required by the X., customers present at time s, and (ii) the numbers and times of arrivals of those customers arriving during (s, s + t] and their service times. But the latter (i.e., (ii)) are stochastically independent of {X: 0 < u < s} since Ys+, — Ys is independent of Y. for 0 <, u < s, and the service times of all customers are independent of each other as well as of the process {Y,,: u > 0}. Also, the conditional distribution of the additional service times of the Xs customers who are being served at time s, given {X, 0 < u < s} (or, for that matter, given { } u , 0 < u < s}, as well as the service times of all these arrivals in [0, s] already spent in [0, s]) is still that of XS i.i.d. exponential random variables each having parameter ß ("lack of memory" property). Hence P(Xs+ , = k ( X.: 0 <, u < s) _ P(Xs+, = k I X.J.
Example 5. (Monte Carlo Approaches to the Telegrapher's Equation). Monte Carlo methods broadly refer to numerical approximations based on averages of simulations of suitably designed random processes. Modern-day computing speed and precision make the random-number generator an important tool for numerical calculations. The results and methods of Chapter I indicate how discrete time and space Monte Carlo numerical solutions to initial and boundary-value problems associated with the heat equation (or diffusion equation)
au
a2u
au
=D+cax2 ax at
299
SOME EXAMPLES
might be obtained from density profiles computed from simulated random walks. A connection between general linear parabolic (diffusion) partial differential equations and discrete space and/or time Markov birth—death chains is described in Section 4 of Chapter V. In the present example we will consider a probabilistic approach to a special hyperbolic (wave) equation known as the telegrapher's equation. Namely,
z
ßa 2 t
i
2a-=0, (7.19) —v2+
with the initial conditions of the form u(x, 0) = (p(x) and (0u/at)J, =0 = 0 for a suitable initial profile cp(x). The telegrapher's equation arises in connection with the spatiotemporal evolution of both the voltage and the charge density in one-dimensional electrical transmission lines. In this framework the coefficients ß, a, and v 2 are all nonnegative real numbers; in terms of the physical parameters ß/v 2 = LC and 2a/v 2 = RC, where R is the electrical resistance, L is the inductance, and C is the capacitance (see Exercise 9). However, these quantities play no particular role in the probabilistic derivation. After dividing through by ß > 0 and adjusting the other parameters accordingly, we may take ß = 1. Partition the state space as a one-dimensional lattice of sites spaced A > 0 units apart and partition time into units of 0, r, 2r, 3r, ... , with A = vr. A particle is started at x = 0 and in the first unit of time r moves with speed v > 0, with some probability ic + in the positive direction to A(= vT), or moves with probability zr - = 1 — ir + in the negative direction at speed v to —A. Subsequently, at each successive time step the particle will continue to travel at the rate v but at each time step, independently of the previous choice(s), may choose to reverse its previous direction with probability p = ar. Notice that, given the initial velocity, the time step until the particle makes a reversal in direction is geometrically distributed with mean 1/p = 1/ar and variance q/p 2 = (1 — ar)/(ar) 2 . The reversal mechanism provides a kind of stochastic oscillation in the motion. The position process {S„} is represented in terms of successive displacements
by S0=0,
Sn =X 1 +X 2 +••.+Xn ,
nil,
(7.20)
where {X„} is a two-state Markov chain with initial distribution P(X, = A) = n + = I — P(X I = —A) and homogeneous one-step transition law determined by
f'(X„+^=A^X„_A)=P(Xn +^_
—
A^X„_ —A)=1—p,
n?1. (7.21)
Let us now consider the average profile after n time steps and viewed from the position of the particles initially moving to the right, starting from the initial
300
CONTINUOUS-PARAMETER MARKOV CHAINS
position m0. Let u + (na, m0) = E(cp(m0 + S„) I X l = 0),
(7.22)
u-(nT, m0) = E(çp(m0 + S„) 1 X l = -A).
(7.23)
and let
Note that the conditional distribution of {S„} given X, = -A coincides with the conditional distribution of { - S„} given X l = A. Therefore, u (nt, m0) = E(cp(mA - S^) I X l = 0).
(7.24)
-
Conditioning on X 2 in (7.22) we can write u + (ni, mEt) = E(E(cp(m 0 + S„) XZ) j Xl = 0) =E(gq(m 0- S, -i
+A)^X, =
(mO+S„-1 + 0 )I Xt = 0 )( 1 - p)
= u - ((n - 1)i, (m + 1)0)t + u + ((n - 1)t, (m + 1)4)(1 - ar). (7.25)
Conditioning likewise in (7.23) gives
u (nr, mA) = u + ((n - 1)r, (m - 1)A)ar + u ((n - 1)i, (m - 1)A)(1 - OCT). -
-
(7.26)
Note that taking n + = n - = Z, u(nT, mA):= Ecp(m0 + S„) = Z(u + + u - ). We now have the first-order equations for u + and u - given by
u + (nT, x) - u + ((n - l)z, x) u + ((n - 1)T, x + vr) - u + ((n - 1)r, x) T
i
- au+((n - 1)T, x + vr) + au - ((n - 1)i, x + vT), (7.27)
u (nr, x) -
-
u ((n r -
-
1)i, x) u ((n -
-
1)i, x
-
vT)
-
u ((n -
-
1)i, x)
z - au ((n - 1)i, x -
vT) +
au + ((n - 1)r, x - v c), -
(7.28)
where x = mA, A = vi. In the limit as i -> 0, A = vt -- 0, n, m -+ oo such that x = mA, t = nT, these equations go over to
au
= v
a
— au + au ,
(7.29)
-
(7.30)
-
a u = — v ax - au - + au + .
301
SOME EXAMPLES
Let, for the solutions u + , u to these equations, u + +u - u+-uw= and u=—-
(7.31)
By combining (7.29) and (7.30) we get the so-called transmission line equations
for u and w, au
aw
(7.32)
^t = v -- , aw = V au - 2aw.
at ax
(7.33)
Now w can be eliminated by differentiating (7.32) with respect to t and (7.33) with respect to x and combining the equations. This results in the telegrapher's equation for u. The initial conditions can also be checked by passage to the limit. The natural Monte Carlo simulation for solving the telegrapher's equation suggested by the above analysis is to start a large number of noninteracting particles in motion according to the above scheme, say half of them starting to the right and the other half going to the left initially. One then approximates u(t, x), t = nt, x = m0, by the arithmetic mean N -1 Yk= , cp(x + Sr), where N is the number of particles. However, there is a practical issue that makes the approach unfeasible. Namely, for a small time-space grid, the probability p = ca of a reversal will also be very small; the smaller the grid size the larger the mean time to reversal and its variance. This means that an extremely large number of particle evolutions will have to be simulated in order to keep the fluctuations in the average down. Mark Kac first suggested that for this problem it is possible, and more practical, to use continuous-time simulations. The idea is to consider directly the time to velocity reversal. In the above scheme this time is TN2 , where N, the number of time steps until reversal, is geometrically distributed with parameter p = ar. In the limit as T -+ 0 the time therefore converges in distribution to the exponential distribution with parameter a. So it at least seems reasonable to consider the continuous-parameter position process { Y} defined by the motion of a particle that, starting from the origin, travels at a rate v for an exponentially distributed length of time, reverses its direction, and then travels again for an (independent) exponentially distributed length of time at the speed v before again reversing its direction, and so on. The Poisson process of reversal times can be accurately simulated. To make the above ideas firm, let {N,} be the continuous-parameter Poisson process with parameter a. Let be a + 1-valued random variable independent of {N} with P(e = 1) = Z. The velocity process is the two-state Markov process { V,} defined by V,,=us(-1)v,
t>,0.
(7.34)
302
CONTINUOUS-PARAMETER MARKOV CHAINS
The position process, starting from x, is therefore given by Y,=x+
J
0
t
Vds=x+vs
E
(7.35)
(—l)' ds. ,
0
Although {Y} is not a Markov process, the joint evolution {(Y„ V)} of position and velocity is a Markov process, as is the velocity process { V} alone. This is seen as follows.
P(a
J
\ o
=P(a YSb,V=w Yo=y,Vo=^=Yw=v.
(7.36)
One expects from the earlier considerations that the solution to the telegrapher's equation with the prescribed initial conditions is given by u(t, x) = Ecp(x + Y) =
1E
(x+v E(-1) N °ds)+cp^x—v J^(-1)'-ds)]. (7.37) 0
[
To verify that (7.37) indeed solves the problem for a reasonably large class of initial profiles, let f(x, w) = cp(x), for w = v, — v. Define g(t,x,w)=E{f(Y,V) V0 =w,Y0 =x}=Tf(x, w).
(7.38)
Let g(t, x) = g(t, x, v), and g (t, x) = g(t, x, —v). Since {(Y, V)} is a Markov process, g satisfies the backward equation -
a
9 = Ag(t, x, w) = w ag + a[g(t, x, — w) — g(t, x, w)].
(7.39)
In order to check that the infinitesimal generator A is of this form, let x , f(x, w) be, for w = v and w = — v, a bounded twice-differentiable function with bounded derivatives. Then, as s ,^ 0, 9(s,x,x)=E(i(I:,V)1 I'o=x, Vo=w)
= E f(x, Vs) + (Y5 — x)
af(X, ys) Yo = x, Vo = w] o(s)
[
_ (f(x, w)e "^ + ase-"f(x, —w) + o(s)) -
J
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS
303
f ( x , w) I +E((-x)- - ------ Yo=x, vo=wl+o(s) Gx \\ = f(x, w) - as(.f (x, w) - f(x, - w))
öf( x, w)
S
o _ f(x, w) + as(.f (x, — w) —1(x, w)) Öf (x w)
J (e
'
+
Ox
w
S -„,
+ o(i)) dr + o(s)
0
= f(x, w) + as(f (x, - w) - f(x, w)) + w
i?f ( x, w) 3x
s + o(s). (7.40)
Taking w = v and w = - v in (7.39), we get the two first-order equations (7.29) and (7.30) satisfied by g + and g - , from which the corresponding transmission line equations and then the telegrapher's equation follow from (7.39) in the same way as before.
8 ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS Let {X} be a Markov chain on a countable state space S having a transition probability matrix p(t), t > 0. As in the case of discrete time, write i -• j if p 13 (t) > 0 for some t > 0. States i and j communicate, denoted i -* j, if i -• j and j -* i. Say "i is essential" if i -• j implies j --- i for all j, otherwise say "i is inessential." The following analog of Proposition 5.1 of Chapter II is proved in exactly the same manner, replacing p7 , p, etc., by p j (t), p ( s), etc. Let denote the set of all essential states.
°
;
;;
Proposition 8.1 (a) For every i there exists at least one j such that i -• j. (b) Ifi-->j,j- .k then i-•k.
(c) If i is essential then i H I. (d) If i is essential and i - j then "j is essential" and i .-+ j. (e) On 41 the relation "+-." is an equivalence relation, i.e., reflexive, symmetric, and transitive. By (e) of Proposition 8.1, & decomposes into disjoint equivalence classes of states such that members of a class communicate with each other. If i and j belong to different classes then i +-* j, j -i- I. A significant departure from the discrete-parameter case is that states in
304
CONTINUOUS-PARAMETER MARKOV CHAINS
continuous parameter chains are not periodic. More precisely, one has the following proposition.
Proposition 8.2 (a) For each state i, p, (t) > 0 for all t > 0. (b) For each pair i, j of distinct states, either p 1 (t) = 0 for all t > 0 or p ij (t)>0 for all t>0.
Proof. (a) If there is a positive t o > 0 such that p 1 (t 0 ) = 0, then since
C
P14
(» n
—
n
t ... P«^ n t ^ < P«(to), = Puu( n t Pi^( n )
)
P« to = 0 \ n) for all positive integers n. Taking the limit as n --' oo, one gets by continuity that lim„ p ;; (t o /n) = 0, which is a contradiction to continuity at t = 0 and the requirement p(0) = 1. (b) Let i, j be two distinct states. Suppose t o is a positive number such that p, (t o ) = 0. Since p i,(t o ) > p ;; (t o — s)p ; ,(s) for all s, 0 < s < t o , and since p ;i (t o — s) > 0 it follows that p. s (s) = 0 for all s <, t o . Then q i; = p(0) = 0, and = 0. It follows from (5.4) that P; (X To = j) = 0. This implies that k ; • ; = 0 for every j' such that k uj ' > 0. For, if k ;j . > 0 and k j . j > 0, then no matter how small t is (t > 0), one has, denoting by {} n : n = 0, 1, 2, ...} the embedded discrete parameter Markov chain, P.;(t) P(XT o =j', XTo+T1 = j To+Ti _
=Pi(Y1=j',Y2=j,To+T1
krj fff
dt ; dt ; . dt ; > 0
,,
+r; 5«r,+tj +r;)
ifA >0. (8.1) If ), j = 0 then, given Y2 = j, T2 = oo with probability 1. Thus, the triple integral may be replaced by
JJ
2 te - xiti+ ,t;tr
dt ; dt v > 0.
This contradicts "p 1 (s) = 0 for all s < t o .” Thus, k;» = 0. In this manner one n may prove that k;j = 0 for all n. This implies p 1 (t) = 0 for all t. )
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS
305
A state i is recurrent if P;(sup{t>0:X,= i }=oo)=1.
(8.2)
P,(sup{t>,0:X, =i}
(8.3)
A state i is transient if
Define the time of first return to i by rj i = inf{t >, TO : X, = i }.
(8.4)
Also denote by p i; the probability pü = P(^1 i < cc).
(8.5)
Thus, if j 0 i, then p ;j is the probability that the particle starting at i will ever visit j. Note that p ;; is the probability that the particle will ever return to i after having visited another state. Theorem 8.3 (a) A state i is recurrent or transient for {X} according as i is recurrent or transient for the embedded discrete parameter chain { }"} having transition matrix K. (b) i is recurrent if and only if p i; = 1. (c) Recurrence (or transience) is a class property. Proof. (a) Suppose i is recurrent. This clearly implies that P1 (Y" = i for infinitely many integers n) = 1. Conversely, suppose i is a recurrent state of the discrete-parameter embedded chain { }"}. Let
>7 ; =z=inf{t>, To : X, = i}, T;"=inf{t>
X^ =i },
(8.6)
for r = 2, 3, ... , denote the times of successive returns to state I. Under Pi the random variables r; 11 , r; 2 > — . , T;'^ — ( ' -1) are i.i.d. and positive. Hence, the nth partial sum t;" ) converges to oo with probability 1 as n --^ cc (Exercise 1). Therefore (8.2) holds. If i is transient for {X,}, then i cannot be recurrent for { }"} since this would imply i is recurrent for {X}. This implies that i is transient for { }"}, since, for a discrete-parameter chain a state is either recurrent or transient. (b) Since transitions occur only at times To , To + T,, To + T, + T2 , ... , it follows that p;j=P;(Y"=j for some n
1).
(8.7)
306
CONTINUOUS-PARAMETER MARKOV CHAINS
In other words, the p ;; defined by (8.51 for {X) are the same as those defined for the embedded discrete-parameter chain. Hence, (b) follows from (a) and Proposition 8.1 of Chapter II. (c) i is recurrent and i i--> j for {XX } implies "i is recurrent and i -J for { Y„}," which implies that j is recurrent for { Y„}, which, in turn, implies thatj is recurrent n for {X}. A state i is positive recurrent if .:=E< oo .
(8.8)
Note that if (8.8) holds then p ;; = 1. Therefore, positive recurrence implies recurrence. If a recurrent state i is such that E ri = oo, it is called null recurrent. Suppose that S is a single essential class of positive recurrent states, under p(t). By the strong law of large numbers one has with probability 1, ;
r
T (r) r'
Y
,,fr' +1) —
;
= (i) i
t
= 1
= E(r; Z) - r; l) ) = E ; T; 1) .
lim — = lira r-+ao r
(8.9)
r
Similarly, if f is a real-valued function on S such that
(E; J
'
TS 1 )
f (XS ) ds < oo , (8.10) 0
then applying the strong law of large numbers to the i.i.d. sequence •I,
f(XS)ds
Zr =
(r = 1,2,...),
(8.11)
it follows that, with probability 1, 1 r
urn- r.00 rr'=1
Zr , = E ; J
(8.12)
f(XS ) ds. 0
Writing v, = number of visits to state i during (0, t], one has
J f(Xs) ds = -1JT" .f (Xs) ds + 1 -
1 ` ('
-
t 0
('
t 0
°^
t r l=1
Zr +
1 t
`
.f (Xs) ds.
(8.13)
The first and third terms on the right-hand side of (8.13) go to zero as t -• 00 with probability 1 (Exercise 2). By (8.9), (v, /t) -+ (E 1) ) -1 with probability 1 as t -+ oo. Therefore we have the following theorem.
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS
307
Theorem 8.4. Suppose S comprises a single essential class of positive recurrent states. If f satisfies (8.10) for some i, then with probability 1,
1
(
L(11
t
lim ds=(E;T;'1) `E; —. t
JO
f(X,)ds,
(8.14)
regardless of the initial distribution. The following analog of Theorem 9.2 of Chapter II may be proved in exactly the same manner using Theorem 8.4. Corollary 8.5. Suppose S consists of a single positive recurrent class under p(t). Then: (a) For all i, j e S
1
lim p,.(s) ds = (Et''.
(8.15)
t_.0t 0
(b) The transition probability p(t) admits a unique invariant initial distribution it given by 7r, = (E; 25 1 ^) -1 ,
j E S.
(c) If (8.10) holds, then the limit in (8.14) also equals ER f (Xo ) = Y j n j f (j). To state the analog of Theorem 10.2 of Chapter II, write f(j) = f(j) — E n f (Xo ) and WT (t) =
1
"T
T CTI
fo
f (Xs ) ds
(t >, 0).
(8.16)
Theorem 8.6. Assume, in addition to the hypothesis of Theorem 8.4, that t^ 1z
Q 2 .= E ; f (XS ) ds) < oo .
(8.17)
0
Then, as T -+ oo, the stochastic process {WT (t): t >, 0} converges in distribution to Brownian motion with drift zero and diffusion coefficient 'Q Z . b 2 = (E j rj' ) ) -
(8.18)
A complete analog of Theorem 9.4 of Chapter II follows from Theorems 8.3, 8.4, Corollary 8.5, if one observes that in the case S comprises a single
308
CONTINUOUS-PARAMETER MARKOV CHAINS
communicating class, the Markov chain {X,} is positive recurrent if and only if it admits an invariant probability distribution. The "only if" part of the last statement follows from Corollary 8.5(b). For the sufficiency see Exercise 12. In order to find an invariant probability n, one may try to differentiate the right side of the equation n'p(t) = n' to get (8.19)
n'Q=0.
A sufficient condition for the validity of term-by-term differentiation is given in Exercise 11. Example 1. (Birth-Death Process with One Reflecting Boundary) S = {0, 1, 2, ...},
= 8 iß, ,,
9 01 = 2,
q00 = —2 0;
qt1 = —2 k (i > 1).
ql,1+1 = ß1 2 i,
Here ß ; , 8 i > 0 and ß ; + b l = 1. By Theorem 8.3 and Section 2 of Chapter III, we know that this Markov chain is recurrent if and only if
x=
big2...bx
= co.
(8.20)
l#1N2* Yx
Also, the equations (8.19) become —no20 + it1A1S1 = 0,
— ni Ai + ni+it5i+1Ai+1 = 0
for j >, 1.
(8.21)
Solving these successively in terms of n o one gets, with ß o = 1, __ -o ir _ (AO^ ß0ßiß2' 1 b 1 1
o '
'—
l
S 1S 2'
For the chain to be positive recurrent requires
f 2. J o or j2.
S
n i < co, i.e.,
( AO) ß0ßiß2 ßi-1 < oo. S1S2...Si
i
( 8.22 )
(8.23)
If this holds, take ..ß
no
= (
1
i -i 1
3162... . ai »=1 ( 20 ) ßoß'ß2
(8.24)
in (8.22) to get the unique invariant initial distribution. Thus, the necessary and
309
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS
sufficient conditions for positive recurrence (or for the existence of a steady state) are (8.20) and (8.23). The N-server queue described in Example 7.3 is a continuous-parameter birth-death chain on S = {0, 1, 2, ...} with infinitesimal rate parameters given by (7.12). In particular, the process has a reflecting boundary at zero so that the criterion for recurrence and positive recurrence apply. The series (8.20) diverges or converges according to the divergence or convergence of 6 1 6 2 ' * Y=N
ßlß2...
ß( 2 ß) ... [(N — 1 )ß](Nß) Y -N+1
= -
ß y y=N
aY
N! - l y I ( aß I y (8.25) =
Thus, the process is recurrent if and only if ß > a/N. Similarly, the series (8.23) converges or diverges according as the following series converges or diverges, 4 ßiß2 Y=N A y
ßv
...
i
=
ay -i
ß y ! y N+ 1 a + N ß Y=N N.N
6162
i NN
k))(a+Nß)-I.
(8.26)
Thus, the process is positive recurrent if and only ifß> a/N. It is null recurrent if and only (fß = a/N. In the case ß> a/N, the invariant initial distribution is
given by 1 _1 a' ( 1_-_ it = o (for 2 j N), j! ß (a +jß) it ^ 1 ß (a + ß) ito .
_
a
'N N 1 7r '
1+
N 0 —
Nß N! (a + Nß)
1
710 (j
(
(8.27)
1_ a 'NN
+
1 ;^j! ßII)i (a +Jß) ;_
> N),
-I
(a + Nß) Nßj N(
Example 2. (Birth-Death Process with One Absorbing Boundary). Here
S={0,1,2,...}, q1.j + 1 = ß1 2 ;
—q1;=A.;>0
fori_>l,
and
q1.1-1 = b i A 1 for
and
ß;+b;=1
q00=0, i
> 1,
where 0<ßi<1
fori>,l.
310
CONTINUOUS-PARAMETER MARKOV CHAINS
Thus, k oo = 1, k i; = 0 if i >, 1, k ;.;+ , = ß ; , k_, = 6, for i >, 1. The embedded discrete-parameter chain is a birth-death chain with absorption at 0. Therefore, "0" is an absorbing state for {X,}, and it is the only absorbing state. Given two states c and d with c < d, let /i(x) denote the probability that {X} reaches c before it reaches d starting from x. Note that this probability is the same for {X,} as it is for the embedded chain {}„}. Therefore by Eq. 2.10 of Chapter III, d_1 6c ^(x) = a-1
+lac 2 ...
-x ß^+ißß+z
(c + 1 < x < d — 1).
b
ac+1&+z
( Y=C+l Nc+lße+z
5y
...ßy
(8.28)
y+ .
ley)
Letting d --s oo, one gets d-1
p xc = Px ({X,}
ever reaches c) = lim
a.^
v=X ^+^ß^+z
1 d-1
for 6c+iSc+i
v=c +i ßc+ißc+z
. ß y
..
x
... 8v ...
>
c.
1+1
ßy )
(8.29)
Thus, for x > 0, p x , = I and, more generally, p x, = 1 for x > c, if and only if
v=
6 1 6 2 ... by
= cc. (8.30) i
ßißz
...
ßv
But if p xo = 1 for all x > 0 then Px (( = cc) = 1 for all x > 0, since the holding time at 0 is infinite. So there is no explosion if (8.30) holds. If the series in (8.30) is convergent, then p so < 1 for x> 0, and p x, < I for x > c. However, this does not necessarily mean explosion. For example, let ßx = p, 6 x = 1 — p for x>0 with 2
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS
311
of a molecule of A into a molecule of B (for example, as a result of radioactive emissions). The term molecularity is used to refer to the number of reactant molecules involved in a given elementary reaction. So, A — ► B is called uni-molecular while A + B --* C is bi-molecular. Double arrows are used to denote reversible reactions. For example A . B signifies that either the reaction A --. B or the reverse reaction B -- ► A may occur at an elementary reaction stage. As an illustration, consider a chemical solution that undergoes elementary reactions of the form C + A . C + B where C denotes a catalyst whose concentration is held constant. For example, if the concentration of C is large relative to the concentrations of A and B then, although C does react with A and B, the relative variation in its concentration is so small that it may be viewed as constant. In describing the evolution of such a process, it is convenient to disregard the concentrations of substances that are constant throughout the evolution. So we are left with the reversible unimolecular reaction A # B. Let X, denote the number of molecules (concentration) of A at time t in a volume of substance held constant. The total number N of molecules A and B is constant throughout the reaction process. The state space of the process {X1 } is S={O,1,2,...,N}.
The holding time in state i represents the time required for an elementary reaction (either A --• B or B —^ A) to occur when there are i molecules of species A (and N — i of species B). The transition (jump) from state i to state i — 1 corresponds to the reaction A --* B by one of the A molecules with the catalyst and the transition from i to i + 1 corresponds to the reverse reaction. Let us assume that {X} is a time-homogeneous Markov process such that each molecule of A has probability rA At + o(At) of reacting with the catalyst in time At to produce a molecule of B (as At —• 0). Suppose also that the reverse reaction has probability rB At + o(At) as At —^ 0. In addition, suppose that the probability of occurrence of more than one elementary reaction in time At has order o(At). This makes {X} a continuous parameter (Ehrenfest type) birth—death chain on S = {0, 1, 2, ... , N} with two reflecting boundaries at i = 0 and i = N and having the infinitesimal transition rates q ;; given by ß2=(N—i)rß 52 =irA
forj= i +1,0<,i^N-1
forj =i -1, 1 <,i^N
9.;
(8.31)
—)_ —[irA +(N—i)r 8]
0
forj=i 3 O
otherwise.
Observe that uni-molecularity is reflected in the birth—death property p.1 (At) = o(At) as At — 0 for I j — i! > 1. In case of higher-order molecularities, for example, 2A + C —+ B, the concentration of A would skip states. Also notice that the conservation of mass is obtained in the boundary conditions. It follows from an application of Corollary 8.5 that there is a unique invariant initial distribution it that is also the limiting distribution regardless of the initial state. In fact, one may obtain it algebraically from the equation n'Q = d.
312
CONTINUOUS-PARAMETER MARKOV CHAINS
However, it is difficult to explicitly solve the Kolmogorov equations (2.17) and (2.19) for the distribution at finite t > 0. By considerations of the evolution of the generating functions of the distribution under (2.17) or (2.19), one may obtain moments and other useful information at times t < oo, when the coefficients are, as in the present case and in the branching case of Example 3 in Section 5, (affine) linear in the state variable j. The generating function of X X given X. = i, is N
(8.32)
W;(z, t) = E;(z ") _ Y_ z'pri(t) j=o
so that, differentiating both sides with respect to t, using (2.19), and regarding X --^ zx as a function of X for fixed z, E1(zx`) = E,( qx,,;z')
U at t) '
+i = Ej( 1 ti,x<
+ +i — rAEj(Xtz x `) = rAEi(Xtz xt-1 ) + NrB E,(z x t ^) — rB E ; (X,z x1 )
— NrB E(z x `) + rB E(X z x °) = rA z ` (z, t) + Nrezcp,(z, t) — z2r8 Viz (z, t) — rAz 0z ' (z, t) — NrB 9 i (z, t) + rz
U4,1
az
(8.33)
(z, t).
Collecting similar terms in the last expression, one gets
`P^
a at t) _ (1 — z)(rB Z + rA a )
^az'
t) — NrB (1 — z)^p;(z, t).
(8.34)
This is aJrst-order partial differential equation subject to the "initial condition" ^p (z, 0) = E.z x0 = z`. ;
(8.35)
A standard method of solving this equation is by Lagrange's method of characteristics. This method simply recognizes that (8.34) in essence says that the directional derivative
C
-t — (1 — z)(rB Z + rA )
äZ )wl(z, t) _ —NrB (1 - z)4,1(z, t).
313
ASYMPTOTIC BEHAVIOR OF CONTINUOUS-TIME MARKOV CHAINS
Introduce a real parameter 0 and in the (z, t) plane define the (family of) parametric curve(s) 0 —+ (z(0), t(0)), by d z(9)
dO
--(1 — z)(rB Z + rA),
dt(B) = 1.
dO
(8.36)
Now note that along such a curve the above directional derivative is none other than d[cp (z(0), t(6))]/d8, so that one has ;
d^P^(z( 8 ), t( 8 )) = —Nr B
dO
(8.37)
(1 — z)gqr(z( 0 ), t(0)),
whose general solution is
cp ; (z(0), t(0)) = exp{ —NrB
J (1
(8.38)
— z(0)) d9 + d} ,
where d is a constant of integration. Now the solution of (8.36) is z(0) _ —r" +
t( 8 ) = 0,
rA + rB
rB
(1 + cee)
i .
(8.39)
rB
Different values of c give rise to different curves of this parametric family. Each point (z, t) in the planar region {(z, t): —1 < z < 1, t > 0} lies on one and only one curve of this family. Thus, if we fix (z, t), then this point corresponds to c obtained by solving for c and 0 with t(0) = t, z(0) = z in (8.39). That is, — 1
0 = t'
c=e_(rA+rH)l rA
arB/\z+ra/
J .
(8.40)
Next note that, by (8.36), —((1—z)d9=^
J
dz =
l log(z+ A r ),
rB z + rA re
\
(8.41)
rB
so that q, (z(0), t(0)) = d'(z + r'') N ,
(8.42)
rB/
where d' is to be obtained from the initial condition cp,(z(0), t(0)) = cp,(z(0), 0) = z`(0).
(8.43)
314
CONTINUOUS-PARAMETER MARKOV CHAINS
Specifically, d' = zi(0) ( z(0) + rA^
N =
r " + rB z(0) ^
rB
_1_rA+rA+ rB
)-
N(l + c) -N
rB
rB
(1 +c)-J( rA +rB)-N(l+c)-N(8.44) rB rB
c being given by (8.40). Thus, writing
c(t, z)
for c in (8.40), NN
t) =[ (1 + c(t, z))' rA B
B rß) (1 + c(t, Z))Z + rB/ rB rBJ ^ rA
(8.45)
From (8.45) and (8.40), by appropriate differentiation, one may calculate E.XX and Var X. Moreover, as t —► oo, c(t, z) -+ 0. Hence, ;
rß lye qi(z, t) =
)N(Z
ra rB (_
+ )N,
(8.46)
which shows that, no matter what the initial state i may be, as t - cc, the limiting distribution of XX is binomial with parameters N and p = rB /(r,, + rß ).
9 CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS We are interested here in the computation of e`Q, where Q is an (m + 1) x (m + 1) matrix with eigenvalues a ° , a l , ... , a m , possibly with repetitions, and corresponding eigenvectors X00 X0 =
XOm ,
...,
X.
xm0
Xmm
which are linearly independent. Let X00 X01
... XOm
.X11
.. Xlm
B = ( XOXI ... xm ) = X
XmO Xml ... xmm
(9.1)
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS
315
Then QB = (a0X0a1x1
...
... «mx.) _ (000X^IX1Xl ...«mXm), ... ,
Q Z B = Q(«oXO«1X1
Q
B = («px0a1X1 ... pt
Hence
0o tQ e B=
n=o
«mXm) ,
m X. )
ao t o
t o ' —Q) B =
n n n —(OXOGL'1X1...oc"X.)
n.
n=o n.
= (et^wxoetatx1...eta' X,) = B diag(et^o, . .. , et"').
(9.2)
Therefore e tao etas
0
0 B-1.
e`Q=B
(9.3)
Example 1. (Birth-Death Process with Reflecting Boundaries). Let S = {0,1,2,...,N}, =
—
'N'
900=
—)
o,
qol =
9i,i+ 1 = NiAi , q1,_1 = ( 1 — ß)A, = SiAi
qN,N— 1 = AN
with 0 <ß, < 1 for 1 < i < N — 1. The invariant distribution n, which exists and is unique, satisfies n'p(t) = n'. Differentiation at t = 0 yields n'Q=0,
(9.4)
or, in detail, —n o A o + rz 1 A 1 5 1 = 0, ni-lßj-itij-1
— "jAj +
nj +1Sj+1^j+1 =0,
for
1 < j < N — 1,
nN-1YN-1AN-1 — nNAN = O.
The solution is unique and given by
Ao
zt 1 = ial 1c o ,
n2 —
aZ^z
it s =
(ir1A1 — no2o)
=
1 — Aolno = ( 0
1 (n2A2 — n1ß1 2 1) = __ 63A3
J
622 61 (
'ß oP! — A 0 ß 1
3'3 (5 1 62
b1
'
ß no, ^O ) 1 ^2 6162
—
no = Ao
ß1ß2
^.3 61/52(53
no
(9.5)
316 7Tj
CONTINUOUS-PARAMETER MARKOV CHAINS
_ Ao
ki
ß1ß2"'ß;-1
12"
S;
n o for 2<,j
ßN—IAN-1 A0 ß1A2 7 rN = 'rN-1 = (— AN AN 1 2
.
ßN-1
S 6 ...
Since Y
it,
( 7r0. 9.6)
(SN-1
= 1, one gets 7C o
=[i
o
+
+
A1S1
Ao Nl j=2 Aj
...ßj _ 1 1.
ö1...
(9.7)
5J
In (9.5) we have used the convention S N = 1. Now observe that Q is a symmetric linear transformation with respect to the inner product defined by N
That is,
(x, y)n = Z xiyi 1r;. i =o
(9.8)
(Qx, y), = (x, QY), .
(9.9)
To show this, note that b j Aj nj = ßj - l Aj _ i nj - 1 , so that
Y_
N-1
(Qx, Y)n = ( - 2 0X0 + 2 0X1)Y0it0 + Y ;=
+
(
ANXN -1 —
(bjAjX;-1 —
1
A NXN)YNltN
2;X; + ß;A;Xj+l)Yjlrj
N
—1 OX0Y0it0 — ANXNYNiN +
j=1 N-1
N-1
—
6j Aj x j —
lY; "j
1jxjyjij + N-1
_
—
A :XoY0it0 — ANXNYNiN — I
j=1
j=
Y; 7r ;
N
N
+
Ajx j
+
(9.10)
1 j=1
which is symmetric in x and y. Hence Q has N + 1 real eigenvalues a N possibly with repetitions, and corresponding mutually orthogonal eigenvectors x 0 , x 1 , ... , X N of unit length. By orthogonality of the eigenvectors with respect to (•,•)n , N (xis xk), = [r XjiXjkitj = 6ik. j=0
(9.11)
317
CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS
Equivalently,
B'
diag(n o , ... ,
7z N )B
= I.
(9.12)
'1 N ).
(9.13)
Therefore, B -1 = B' diag(n o , ... Using (9.12) in (9.3) we get P(t)
= etQ = B diag(etao ... , et )B' diag(n o , ... ,
it N ).
(9.14)
From (9.14) we have N
pij(t)
y- xi
=
k
e tak x jk 7C j .
( 9.15)
k=0
As a special case consider the continuous-parameter simple symmetric random
walk given by
2 0 = 2 l=
=AN
= 1 and
ßi = z=8 ; forl
N-1.
If a is an eigenvalue of Q, then a corresponding eigenvector y satisfies the equations —
Yj+2Yj +i =ayj (1
— Yo + Yi =
aYo'
<j(N— 1),
YN-1 — YN
=
(9.16)
aYN•
One may check that the eigenvalues a o , ... , a N are given by
ak
kn
=cos N
1
(k=0, 1,2,...,N),
and the corresponding eigenvectors x 0 , x o =(1,1,...,1)', k1L11
x jk = f cos( \NJ
... ,
(9.17)
x N are
xN=
(1,-1,1,...)',
1
(9.18)
A systematic method for calculating these eigenvalues and eigenvectors is given in Example 4.1 of Chapter III. The eigenvectors here are the same as there, but the eigenvalues differ by 1 because the matrices differ by an identity
318
CONTINUOUS-PARAMETER MARKOV CHAINS
matrix. Also from (9.6), itO—itN
2N,
for I <,j<,N— 1.
7rj=N
Hence, by (9.15), pi(t) _
N Y e(cos(krz/N)-1)tx.kxjki, k=0
_
7Tj 1
+
N-1
k ril cos(k^r^l + e('os("l')-1)'2 cos( ? N \NJ k=1
e-
zt0(i,j)
(9.19)
where 0(i, j) = X IN XjN = +1 or —1 according as I j — i( is even or odd. Note that p,j (t) converges to it, exponentially fast as t — oo.
10 ABSORPTION PROBABILITIES Suppose that PR is the distribution of a continuous-parameter Markov chain {X1 } starting at i c- S and having homogeneous transition probabilities p(t) = ((p ;j (t))) with corresponding infinitesimal rates furnished by Q = p'(0). For a nonempty set A of states, let Q denote the matrix obtained from Q by replacing all rows corresponding to states in A by zeros. Let P, denote the distribution of the process {Xj so transformed. For an initial state i A, all states belonging to A are absorbing states under Pi . Moreover, in view of Theorem 5.4, the distributions under Pi and P, of the process {Xt : 0 - t < i A }, where 'c
A
=inf{t>0:X,EA}
(10.1)
is the first passage time to A, must coincide. Therefore, absorption probabilities starting from i 0 A can be calculated directly in terms of the transition probabilities under the transformed distribution P, as follows. Proposition 10.1. Let p(t) and $(t) denote the transition probabilities for the distributions PI and P; generated by Q and Q, respectively. Then, for i 0 A, PP(rA > t) = Y_ Pij(t).
(10.2)
jO A
Proof. Let { Y„} be the discrete-parameter Markov chain starting in state i under P, obtained in accordance with Theorem 5.4 and let To , Tl , T2 ,... denote the
corresponding exponentially distributed holding times. Writing v A =inf{n>,0: Y„eA},
(10.3)
ABSORPTION PROBABILITIES
319
we have P;(t G TA <
00) _
Z Z * P;(Y° = 1, Yl
=
il,
,
Ym_1
- i m_ 1,
Ym - j, VA - m,
m=1
T0+ ..+Tm - 1 >t) =
L
^*
kü
m=1
,G(t; 2, ; , ... > i
k.
^),
(10.4)
where Z* denotes summation over all m-tuples (i 1 , ... , i.-,,j) of states is the probability that i l , i2, .. . 'L- 1 E S\A , je A, and G(t; .?;, 2, ... .l ; a sum of m independent exponentially distributed variables with parameters ).„ A i , ... , ). i exceeds the value t; recall that A,, = —g,,. The corresponding probability under P, is obtained on replacing k ii ,k i ,i k ; , ;z • • by iii k; 6 • • . L ;m _ ,; and G by G. Now the transition probabilities for the discrete-parameter Markov chains corresponding to Q and Q are the same for all sequences of states in*. Therefore, for such sequences G(t;1 i ,, ... , ;m _,) = G(t; 2, .. , J ; ), and Pi(TA > t) = Pi(TA > t) _ Y_ ß 1 (t). j0 A
Observe that since p,
for all lEA,keS\A,
k (t)=0
(10.5)
the backward equations for p ;; (t), with i, je S\A, take the form
°
° °
p '(t) = Q P (t),
°
(10.6)
°
where p and Q are obtained from p and Q by deleting all rows and columns corresponding to states in the set A. Example 1. (Biomolecular Fixation). The surface of a bacterium is assumed to consist of molecular sites that are susceptible to invasion by foreign molecules. Molecules of a particular composition, termed acceptable, will be permanently affixed to a site upon arrival, to the exclusion of further attachments by other molecules. Molecules that do not possess the acceptable composition remain at the site for some positive length of time but are eventually rejected. The problem here is to analyze the rate at which fixation occurs. Suppose that foreign molecules arrive at a fixed site according to the occurrences of a Poisson process with parameter A > 0. A proportion a > 0 of the molecules that arrive at the site are acceptable. Other "unacceptable" molecules remain at the site for an exponentially distributed length of time with parameter ß > 0. The calculation of the distribution of the length of time until fixation occurs is an absorption probability problem.
320
CONTINUOUS-PARAMETER MARKOV CHAINS
At any particular time t, the site is in one of three possible states a, b, or c, say, depending on whether the site is (a) occupied by an acceptable molecule, (b) occupied by an unacceptable molecule, or (c) unoccupied. The evolution of states at the given site is, according to the above assumptions, a continuous-time Markov chain {X} with state space S = {a, b, c} starting at c and having infinitesimal rates given by a
a Q =
0 b 0
b
c
0 —ß
0 ß .
(10.7)
c 2 (1 — a)A —A According to Proposition 10.1, the distribution of the first passage time z a to fixation (a) is given by Pc(T.,
(10.8)
> t) = P° (t) + P ä(t),
where by (10.6), p ° (t) is determined by p ° '(t) = Q ° P ° (t),
P ° ( 0 ) = I,
(10.9)
and b
c .
(10.10)
Q° c[(1 — a)A ßA]
Q ° has distinct eigenvalues r,, r 2 given by the zeros of the characteristic polynomial (10.11)
det(Q° — rI) = r 2 + ( A + ß)r + aßA. In particular, rl =
—! (A
+ ß) + z(A 2
+
2(1 — 2)ßA + ßZ)ß/ 2
rz = —(A + ß) — i( 22 + 2(1 — 2a)ßi +
,
ß2)"2.
(
10.12)
(10.13)
Two corresponding linearly independent eigenvectors are x 1
=
1
x [r, +ßi ß ri+ß 2
(10.14)
ABSORPTION PROBABILITIES
321
Linear independence is easily checked since
ß 1 = ß(r 2 — r 1 ) 0.
detB=det [
(10.15)
r, ß+ß r 2 +ß
The solution to the system (10.9) is given as in Section 9 by p ° (t) = eQ ° ` = B diag(etr e ,r,)B- i /22{er2t — ß2{e
_ I ß{(r2 + ß)e r — (r1 + ß)er2(}
ß(r2 — r1) (rl + ß)(r2 + ß){e " — e r2l }
ß{(r2 +
e rlt}
— (rl + ß)e r'I }J (10.16)
Therefore, from (10.16) and (10.8) we obtain Pc (z a >
t) = ^ r I
^r 2 (1
+ r^er'` — r i (1 + 2 ) r 2 `].
(10.17)
The average rate of molecular fixation is given by
Erna =
, + ^D rlr2 +ß(rr2) Pc(Ta > t) dt = -- -- —
o
ßr1 r2
_
ß+ A(1 - a) -. (10.18)
xßA
Example 2. (Continuous-Parameter Markov Branching). Let {X} be the continuous-parameter Markov branching process defined in Example 5.2. The generator Q = ((q 1 )) is given by iAfi -i +1 j0i,j =i— 1,i+1,...,i'> 1, ,
q^i= — i).(1 — f1), 0,
=J
1,
(10.19)
i =0,j,>0ori,> l,j
The probability p of eventual extinction for an initial single particle was calculated in Section 5. Let h(s) = h" (s) =gis .
(10.20)
Assume that the offspring distribution {f} has finite second moment with mean µ < 1. Then h'(1) <0 and h"(1) < oo. In particular it follows from Proposition 5.7 that p = 1. We will exploit the special Markov branching structure to compute the tail probability P,(t {0} > t) for large values of t under these assumptions. Let K i =h'(1)=2(Et-1)<0,
K2=h„(1)=2 1 j(j-1)fj
CONTINUOUS-PARAMETER MARKOV CHAINS
322
We have, Pl(Tto) > t} =p(t) = I — P1.0(t) = 1 — gi"( 0 ),
(10.22)
i =1
where g
1 (r) _ ° , p l , j (t)rJ. As in the proof of Proposition 5.7 at (5.27), the Kolmogorov backward equation for branching transforms according to
a cis r
9r O
at
(10.23)
= h(g,' (r)),
with g(r) = r. Thus, for each t > 0, 0 < r < 1, (rg'' ds )
J
h(s) = t.
(10.24)
Y
Now, h(1) = 0
and
h(s) = h'(1)(s — 1) + (s)(s — 1) 2 for s <, 1,
where (1 - ) = h"(1) < cc. Thus, under the assumptions for (10.21),
1
_ 1 _ 1 h(s) K,(s — 1) + z^(S)(s — 1) 2 (s)(s — 1) K 1 (S-1) 1+ 2K 1 ^(s)(s — l )
=
1 1
—
K(S — 1)
2K1 =
1 + ^(s)(s — 1) 2K 1
where
c(s) ((s)= 2K,
^( s)(s 2K 1
-
I+
1
)
In particular, note that
11 h(S)
K I (s — 1)
1 + qO(s),
K,(s — 1)
—(s)
(10.25)
ABSORPTION PROBABILITIES
323
is bounded for 0 < s < 1. Define for 0 < x < 1,
H(x)_ — 1 — 1
h(s)
K 1 (s — 1)
ds +
log(1 — x).
K1
(10.26)
Then H'(x) = 1/h(x) > 0 for 0 < x < I and (10.24) may be expressed as
H(g,( 1 (r))—H(r)=t,
0,
(10.27)
Since H is strictly increasing on [0, 1), this can be solved to get
g, l ^(r)=H - '(t+H(r)), Now, from (10.26), we have for x —• 1
H(x) =
-
0
(10.28)
,
_J t cp(s) ds + 1 log(1 — x) x
K1
1
1 1 x^p(s)
_ x
ds (1 — x) +
= —(1 )(1
log(1 — x) 1
— x) + 0(1 — x) + - log(1 — x) K1
= K2 (1 — x) + 1 log(1 — x) + o(1 — x). 2K1
(10.29)
K1
Equivalently, solving this for log(1 — x) we have log(1
— x) = K 1 H(x) — K 2
(1 — x) + o(1 — x).
(10.30)
2K 1
Therefore, for x — 1 , -
1 — x = e"iH(x)e-O"2I2"jxl-x)(1 + o(1 — x)) = e1 1 —
(1 — x) + 0(1 _x)).
(10.31)
2K1
Now, by (10.22), (10.28), and (10.31) we have for
t —•
oo, y,:=H - 1 (t + H(0)),
Pl (T (o ^ > t) = 1 — g'(0) = 1 — H - '(t + H(0)) = 1 — y, = e"Jecv^) 1 — K
t
(1 — YI) + o(1 — Yj)^ , (1
(10.32)
324
CONTINUOUS-PARAMETER MARKOV CHAINS
where y, = H '(t + H(0)) - 1 as t --^ oo is used in applying (10.31). Thus, -
Pl(T{o} > t) = e
Klcr+ 11 t 0 »(1 —2k (1 — Yt) + o(1 — Y)) i
= e Kit e K1 (°)(1 —2K1 (1 — g; 1) (0)) + 0(1 — gß 1) (0))) — ce 1 ' - µ ) `
(10.33)
as t -+ oo,
where —K1 = 2(1 —µ)>0,
c=e'Ieto>>0.
By consideration of the p.g.f. k(z, t) of the conditional distribution of X, given {X, > 0, X0 = 1}, one may obtain the existence of a nondegenerate limit distribution in the limit as t --^ oo having p.g.f. of the form (Exercise 4*), Z ds
k(z, co) := tim k(z, t) = 1 — exp K l (10.34) roo
o h(s)
11 CHAPTER APPLICATION: AN INTERACTING SYSTEM THE SIMPLE SYMMETRIC VOTER MODEL Although their interpretations vary, the voter model was independently introduced by P. Clifford and A. Sudbury and by R. Holley and T. Liggett. In either case, one considers a distribution of + l's at the point of the d-dimensional integer lattice Z d at time t = 0. In the demographic interpretation one imagines the sites of Z' as the locations of a species of one of the two types, + 1 or —1. In the course of time, the type of species occupying a particular site can change owing to invasion by the opposition. The invasion is by occupants of neighboring sites and occurs at a rate proportional to the number of neighboring sites maintained by the opposing species. In the sociopolitical version, the values + 1 represent opposing opinions (pro or con) held by occupants of locations indexed by 7L'4 . As time evolves, a voter may change opinion on the issue. The rate at which the voters' position on the issue changes is proportional to the number of neighbors who hold the opposing opinion. This model is also related to a tumor-growth model introduced by T. Williams and R. Bjerknes, which, in fact, is now often referred to as the biased voter model. If one assumes the mechanism for cell division to be the same for abnormal as for normal cells in the Williams-Bjerknes model (i.e., no "carcinogenic advantage" in their model), then one obtains the voter model discussed here (see theoretical complements 4 for reference). While mathematical methods and theories for this and the more general models of this type are relatively recent, quite a bit can be learned about the voter model by applying some of the basic results of this chapter.
325
CHAPTER APPLICATION
A sample configuration of ± 1-values is represented by points a = ( a n : n E 7L ° ) in the (uncountable) product space S = (1, — I )". The evolution of configurations for the voter model will be defined by a Markov process in S. The desired transition rates are such that in time t to t + At, the configuration may change from a, by a flip at some site m, to the configuration 6 ( m , where Q;,m ) = —a n if n = m and 4m ) = a n otherwise. This occurs, with probability C m (a) At + o(At), where c m (a) is the number of neighbors n of m such that a n 54 a,,; two sites that differ by one unit in one (and only one) of the coordinate directions are defined to be neighbors. More complicated changes involving a flip at two or more sites are to occur with probability o(At) as At -- 0. It is instructive to look closely at the flip-rates for the one-dimensional case. For a configuration a E S a flip at site to occurs at unit rate c m (a) = I if the neighboring sites m — I and m + I have opposite values (opinions); i.e., u._ lam+ 1 = — 1. If, on the other hand, the values at m — 1 and m + I agree mutually, but disagree with the value at m, then the flip-rate (probability) at in is proportionately increased to c m (a) = 2. If both neighboring values agree mutually as well as with the value at ni, then the flip-rate mechanism is turned off, c m (a) = 0. Thus there is a local tendency toward consensus within the system. Observe that c m (a) may be expressed as a local averaging via )
cm( 6 )
Y_
= I — 2am( a m-I + a m+1 ).
(11.1)
More generally, in d dimensions the rates c m (a) may be expressed likewise as c m (a) = d(I —
=2d
am hmn"n J nt/ d /
I
Pmn'
(11.2)
In: o. # amt
where p = (( p m ,,)) is the transition probability matrix on 7L ° of the simple
°
symmetric random walk on Z , i.e.,
1 2d' Pm,, =
0,
if m and
n
are neighbors,
otherwise.
This helps explain the terminology simple symmetric voter model. The first issue one must contend with is the existence of a Markov evolution {a(t): t >, 0} on the uncountable state space S having the prescribed infinitesimal transition rates. In general this itself can be a mathematically nontrivial matter when it comes to describing infinite interacting systems. However, for the special flip rates desired for the voter model, a relatively simple graphical construction of the process is possible; called the percolation construction because of the "fluid flow" interpretation described below.
326
CONTINUOUS-PARAMETER MARKOV CHAINS Time
...
(+)
Space
n
m (-)
(+)
(-)
(-)
... ( ao(o))
Figure 11.1
Consider a space-time diagram of V x [0, oo) in which a "vertical time axis" is located at each site m n 7L , and imagine the points of Z" as being laid out along a "horizontal axis"; see Figure 11.1. The basic idea is this. Imagine each vertical time axis as a wire that transports negative charge (opinion) upward from the (initially) negatively charged sites m such that a m (0) = -1. For a certain random length of time (wire), the voter opinion at m will coincide with that of Q m (0), but then this influence will be blocked, and the voter will receive the opinion of a randomly selected neighbor. Place a S at the blockage time, and draw an arrow from the wire at the randomly selected site to the wire at m at this location of the blockage. S stands for "death of influence from (directly) below." If the neighbor is conducting negative flow from some point below to this time, then it will be transferred across the arrow and up. So, while the source of influence at m might have changed, the opinion has not. However, if the selected neighbor is not conducting negative flow by this time, then a portion of the wire at m above this time will be given positive charge (opinion). Likewise, at certain times positive (plus) portions of the wire at a site can become negatively charged by the occurrence of an arrow from a randomly selected neighbor conducting negative flow across the arrow. This "space-time" flow of influence being qualitatively correct, it is now a matter of selecting the distribution of occurrence times of 8's and arrows with the Markov property of the evolution in mind. The specification of the average density of occurrences of 6's and arrows will then furnish the rate-parameter values. Let {Nm (t): t >, 0}, m E 71 d , be independent Poisson processes with intensity 2d. The construction of a Poisson process having right-continuous unit jump sample functions is equivalent to the construction of a sequence {7,,(k): k > 1} of i.i.d. exponential inter-arrival times (see Section 5). The construction of countably many independent versions is made possible by Kolmogorov's existence theorem (theoretical complement 6.1 of Chapter!). Let {U m (k): k m e Z', be independent i.i.d. sequences and independent of the processes
°
327
CHAPTER APPLICATION
{Nm (t) }, m E Z, where P(U m (k) = n) = p mn . At the kth occurrence time r m (k):= Tm (1) + • • • + Tm (k) of the Poisson process at m e Z d a neighboring site of m, represented by Um (k), is randomly selected. One may also consider independent Poisson processes {Nmn (t): t >, 0}, m, n e Z d , with respectively ,
intensity parameters 2dp mn representing those (Poisson) times at m when the neighboring site n is picked. Notice that 2dp mn is either 1 or 0 according to whether m and n are neighbors or not. At the kth event time i = r m (k) of the process {Nm (t)}, let n = Um (k) be the corresponding neighbor selected. Place an arrow from (n, t) to (m, r), and attach the symbol S to the point (m, r) at the arrowhead. For a given initial configuration a(0) _ (° m (0)), define a flow to be initiated at the sites m such that a m (0) = —1. The flow passes vertically until blocked by the occurrence of a S above, and also passes horizontally across arrows in the direction of the arrows and is stopped at the occurrence of a b from moving upward. The configuration at time t is defined by 6 n (t)
= —1 if the (negative) flow reaches (n, t) from some initial point (m, 0)
= +1 otherwise. (11.3) More precisely, the flow is defined to reach (n, t) from (m, 0) if there are times 0= t o
vertical intervals (excluding lower end points) joining (n„ t i ) to (n„ i ), i = 1, ... , k. In this case (n, t) and (m, 0) are said to be connected by a continuous-time nearest-neighbor percolation. The transition from a m (t—) = — Ito a m (t) = +1 occurs if and only if t = is an event time of the process {Nmn (t)} for some n such that a n (t—) _ +1.
Y_
Therefore the flipping rate from —1 to + I is
1 2 dPmn( 1 + a
n )/2
= d( 1 —
am
=
Pmnan
c m (a) for am= —1.
)
Y_
The transition from a m (t —) = + 1 to Q m (t) = — 1 occurs if and only if t = T is an event time of the process {Nmn (t)} for some n such that a n (t —) _ —1. Therefore the flipping rate from + 1 to —1 is 2 dpmn( 1 — Qn)/ 2
= d(l —
am
Pmnan)
, for a m = + 1.
n
This (noncanonical) construction provides the existence of a Markov process with the desired infinitesimal rates. To check the Markov property, it is enough to consider the distribution of a n (t + s) for some finitely many sites n e Z", and fixed s, t >, 0, conditionally given the (entire) configurations a(u), u < t. Working backwards through the space—time diagram from a., (t + s), ... , a n , ( t + s), note
328
CONTINUOUS-PARAMETER MARKOV CHAINS
that the value Q,(t + s) must coincide with Qm ,(t) for some m i e Z", where m ; depends only on a(u), u >, t, but not on a(u), u < t. In the case of Figure 11.1 the history of v„(t + s) given a(u), u < t, shows that the voter at n at time t + s is copying the voter at m at time t (see theoretical complement 1). The two configurations a + and a - representing total consensus, a = + 1 and a. _ -1 for all n, are absorbing. In particular this makes each probability distribution of the form pö . + ( 1 - p)S a -, for 0 < p < 1, an invariant equilibrium distribution for the system. To understand conditions under which it is also possible to have other equilibrium distributions of persistent disagreement requires some further analysis. Probability distribution on S are defined for events belonging to the sigmafield F generated by events in S that depend on the values of configurations at finitely many sites. The state space S has a natural metric space structure for which F coincides with its Bore! sigmafield (theoretical complement 2). Moreover, S is a compact metric space with this metric (theoretical complement 2). Consequently, any collection of probability measures on (S, .) is necessarily tight (theoretical complement 8.2 of Chapter I). This is quite significant since it implies that convergence in distribution (weak convergence) coincides with convergence offinite-dimensional distributions in this setting. Special advantage is taken of this fact in the use of a method of finite-dimensional (multivariate) moments to study the long-run behavior of the distribution of the system. Let f be a real-valued function on S that depends on a fixed set of finitely many coordinates of configurations a e S. Then f must be bounded. Define linear operators T , ( t >, 0), for such functions f by
o
Q
1
T,f(a)=E.f(a(t)),
aeS.
(11.4)
Then,
`_
T,f(a) — .f (a) = Ea{.f (d'(t)) — .%(a)}
_Y_
{f()
—
f(a)}a n (a)l +O(t),
(11.5)
m
uniformly as t -• 0 + . Accordingly, for functions f on S depending on finitely many coordinates we have for t > 0, a E S,
at
where, for such functions
(a) = T,(Af)(a) = E Q {Af(a(t))},
f,
Af(a) = lim t - a +
T f(a) — f(a) `
t
(11.6)
CHAPTER APPLICATION
329
= lim r-0
= L.
m
E { f (a(t Q
)) —
f (a)}
1
{ (a 1m, )
—J
(a)}C m (J).
(11.7)
Let tt be an arbitrary initial probability distribution on (S, S) and let µ, denote the corresponding distribution of a(t) with p o = µ. The (spatial) block correlations of the distribution p, over the finite set of sites D = {n 1 , ... , n,}, say, are the multivariate moments of a (t), ... , .r (t) defined by (pµ
(t, D) = E µ { Qn (t)
x
... X a(t)}.
(11.8)
Using standard inclusion—exclusion calculations, one can verify that a probability distribution on (S, .y) is uniquely determined by its block correlations. Let cp 0 (t, D) = cp sa (t, D). Applying equation (11.6) to the function f (a) := Q n , x x a, a e S, we get an equation describing the evolution of the block correlations of the distributions as follows:
Öt
j^
tp e (t, D)
=
Ea
kn Q( ) (t) — kfl
_ —2E6{
[ii ak(t)
Cm(a(t))S
k€D
mED
_ 2d Z E,, j meD
]
l 6 k(t) Cm(a(t))
l fl a (t)] L 1 k
t keD
—
am(t) I Pmn an( t ) n
J
}
1 p mn q (t, (D\{m})A{n}), (11.9)
_ —2dIDI cp e (t, D) + 2d
meD n
where A is symmetric difference, A AB := A n BC u AC n B, A, B c Z', and IDS is cardinality of D. Taking expected values, we get from (11.9) that acp µ (t, D)
a
— —2dIDjcp µ (t, D) + 2d
t
['
/
t, / D m 0 J n
mED n
_ Y E {cp µ (t, (D\{m})A{n}) — (p µ (t, D)}2p mn .
(11.10)
m€D n
As a warm-up to the equations (11.10) take D = {m} and suppose that p is an invariant (equilibrium) distribution. Then Ep C m = cp,,(0, {m}) is a harmonic . function for the random walk; i.e., the left-hand side of (11.10) is zero so that the equations show that q, has the averaging property (harmonicity) ßp µ (0, {m}) _ Y_ q,(0, {n})p mn . n
(11.11)
330
CONTINUOUS-PARAMETER MARKOV CHAINS
°
Now, for the simple symmetric random walk on 7L , one can show that the only bounded solutions to (11.11) are constants (theoretical complement 3). Therefore, the distribution of a m under the invariant equilibrium distribution is independent of m in this case. From here out we will restrict our consideration to the long-time behavior of various translation invariant initial distributions, with say, EN7m=2p-1,
0
(11.12)
In particular, we will consider what happens in the long run when voters initially are, independently, + 1 or —1 with probabilities p, 1 — p, respectively. The equation (11.9) or (11.10) suggests consideration of the following finite-particle system. A particle is initially placed at each site of the finite set Do = D. Each of these particles then undergoes a continuous-time random walk according to p = ((p m )) with exponential holding times with parameter 2d, independently of the others until two particles meet. If two of the particles meet,
then they are mutually annihilated. Let D, denote the collection of sites occupied by particles at time t > 0. The evolution of the stochastic process {D,} takes place in the (denumerable) state space S = q(d) consisting of all finite subsets of Z'. The empty set is absorbing for the process. If D, : 0 then in time t to t + At a change to (D,\{m })i {n}, for some me D, and ne Z', occurs with probability 2dp mn At + o(At) as At --* 0+. Other types of changes in the configuration have probability o(At) as At --^ 0+. Now consider that for fixed a e S we have by (11.9) that 0 9.(t, D) = p * ^VQ(t, )(D), at
(11.13)
(p.(0, D) = [1 a m ,
(11.14)
together with
mED
where for D E _V (d) ,
A*f(D) = Y I { f((D\{m})A{n}) — f(D)}2dp mn ,
(11.15)
meD n
for bounded real-valued functions f on £ (d) . Since A* is the infinitesimal generator for T f (D) = E D f (D,), t >, 0, D e 2 (d) , the annihilating random walk, the solution to (11.13), (11.14) is given by
(PQ(t, D) = ED(a(0, D1) = E D I H m e D,
Qm
} . )))
(11.16)
331
CHAPTER APPLICATION
The representation (11.16) is known as the duality equation and is the basis for the proof of the following major result. Theorem 11.1. Let p = ((p m „)) be the transition probability matrix of the simple symmetric random walk on 7L ° associated with the simple symmetric voter model on 7L ° . For an initial probability distribution µ satisfying (11.12) we have the following: (a) If d < 2, then p, converges in distribution to pö Q+ + (1 - p)S,, _ as t -* oo. (b) If d >, 3, then for each p e (0, 1) there is a distinct translation-invariant equilibrium distribution v°° ) on (S, .F ), which is not a mixture of 6 Q+ and S a _, such that E,,,p,v m = 2p - 1. Moreover, if the initial distribution It is that of independent ± 1-valued Bernoulli random variables with probability parameter determined by (11.12), then µ, converges in distribution to v° 0) as t -+ co. Proof. To prove (a) first consider that
E p m (t) = cp µ (t, {m}) =
f
^p.(t, {m })µ(da)
S
_J [
ESml fl T k
]
1t(da)
Y_ Pmk(t)E j,Uk = 2p - I
for all t >, 0, (11.17)
k
where p mk (t) is the transition law for the continuous-time random walk. Also for distinct sites n and m in 7L" we have q (t, {n, m}) = Pp(0m(t) = ø(t)) — Pj (Om(t) # cr (t))
= I - 2Pµ (a m (t) : a, (t)).
(11.18)
Therefore, it is enough to show that cp(t, {n, m}) -• 1 as t -> oo to prove (a). Now since the difference of the two independent simple symmetric random walks starting at • m and n evolves as a (symmetrized) continuous-parameter simple symmetric random walk, it follows from the recurrence when d < 2, that 0 a.s. as t --> oo; i.e., the particles will eventually meet. Using the duality relation and the Lebesgue Dominated Convergence Theorem we have, therefore,
lim cp µ (t, {n, m}) = lim J E{f,m) s
L fl
(Tk}µ(dß) = 1.
(11.19)
keD^
To prove (b), on the other hand, take for It the Bernoulli product distribution
332
CONTINUOUS-PARAMETER MARKOV CHAINS
of independent values subject to (11.12) and consider the duality relation yl,(t, D) = E 1 , ED H
l
6 m}
= ED Et fl amt = ED fl EN am
meD,
{
meD1
J
meDt
= ED( 2 p — 1)ID I, (11.20) ,
where 1D1 1 denotes the cardinality of D1 . Now since JD I is a.s. nonincreasing, the limit, denoted FDJ, exists a.s. and is positive with nonzero probability by transience. The limit distribution v P is defined through its block correlation accordingly. That is, (
f
)
s{ [ am}vc")(da) = E D (2p — 1)ID. 1 . meD
Certainly it follows that v p» is translation-invariant and, being a time-asymptotic distribution of the evolution, one expects it to be invariant under the (further) evolution. More precisely, for any (bounded) continuous real-valued function f on S we have by the Chapman—Kolmogorov equations (semigroup property) that (
f(i1)v(di) = fS J f(rl)p(t; a, dil)vcP ) (da)
s
fS
f
(
= lim
s-.
f
11)p(t;
a, dq)g,(da),
(11.22)
s s
since for fixed (continuous) f and t > 0, the (bounded) function a --• $ s f (rl)p(t; a, dq) is continuous (theoretical complement) and, as noted earlier, by compactness of S µ s converges to v(da) in distribution (i.e., weak convergence) as s --* oo. Now, from (11.22) and the Chapman—Kolmogorov equations again, one has
J
f(t)v(di) = lim J .f(il)µ,+s(drl) = J f(rl)v (a) (dii). (11.23) s
s_^
s
s
Thus, since the integrals of continuous functions on S with respect to the distributions v(drl) and v (1»(dq) coincide, the two measures must be the same (see theoretical complement 8.6 of Chapter I). n The property that, for each bounded continuous function f on S. the function a —* E f (a(t)) = f s f (rl) p(t; a, dq) is also a bounded continuous function, is called the Feller property. As illustrated in the above proof (Eq. 11.22), this property is essential for confirming one's intuition about invariance properties of long-time limits under weak convergence. The other important role played Q
333
EXERCISES
by topology in the above was in the use of compactness to, in fact, get weak convergence from finite-dimensional calculations.
EXERCISES Exercises for Section IV.1 1. Show that in order to establish the Markov property it is enough to check Eq. 1.1 for only one "future" time point t, for arbitrary t > s. 2. Show that a continuous-parameter process with independent increments has the Markov property. Show also that such a process has a homogeneous transition law if and only if, for every h > 0, the distribution of X, +h — X, is the same for all t _> 0. 3. Show that, given a Poisson process {X} as in Example 1 with f ö p(u) du = cc, there exists an increasing transformation cp: [0, oo) —+ [0, cc) such that the process X,,,,} is homogeneous with parameter). = 1. 4. Prove that the compound Poisson process (Example 2) has independent increments. 5. Consider a compound Poisson process {X,} with state space U8' and an arbitrary jump distribution p(dx) on (f8'. (ii) Show that {X} is a Markov process and compute its transition probability p(t; x, B)'= P(X, E B I X0 = x).
(ii) Compute the characteristic function of X,. 6. (Doubly Stochastic Poisson or Cox Process) Suppose that the parameter A (mean rate) of a homogeneous Poisson process {X} is random with distribution p(dA) = f(A) d). on (0, ro). In other words, conditionally given A = ) o , {X} is a Poisson process with parameter A 0 . (i) Show that {X} is not a process with independent increments. (ii) Show that {X,} is (generally) not a Markov process. (iii) Compute the distribution of X, for arbitrary but fixed t > 0. (iv) Compute Cov(X ,, X,). .
7. Generalize Exercise 6 to compound Poisson processes. 8. The lifetimes of elements of a certain type are independent and exponentially distributed with parameter a > 0. At time t = 0 there are X0 = n living elements present. Let X, denote the number alive at time t. Show that {X,} is a Markov process and calculate its transition probabilities. 9. Let {X,: t _> 0} be a process starting at x with (stationary) independent increments, and EX, < cc. Assuming EX, and EX are continuous, prove the following. (i) EX, = mt + x, Var X, = a 2 t for some constants m and a 2 . (ii) (X, — mt)/../ converges in distribution to the Gaussian distribution with mean 0 and variance a', as t —^ co. 10.
(i) Show that the sum of a finite number of independent real-valued stochastic
processes, each having independent increments is also a process with independent increments. (ii) Let {N,(' ) } (i = 1, 2, ... , k) be independent Poisson processes with mean
334
CONTINUOUS-PARAMETER MARKOV CHAINS
parameter A ; (i = 1, 2, ... , k). If c„ C 2 ..... C, are arbitrary distinct positive constants, show that { c N} is a compound Poisson process and compute its jump distribution and the (Poisson) mean rate of occurrences. (*iii) Prove that every compound Poisson process is a limit in distribution of superpositions of independent Poisson processes as described in (ii). [Hint: Compute the characteristic function of respective increments.] ;
Exercises for Section IV.2 1. Check the Kolmogorov consistency condition for P,, defined by Eq. 2.2, assuming the Chapman—Kolmogorov condition (2.5). 2. There are n identical components in a system that operate independently. When a component fails, it undergoes repair, and after repair is placed back into the system. Assume that for a component the operating times between successive failures are i.i.d. exponential with mean 1/A, and that these are independent of the successive repair times, which are i.i.d. exponential with mean 1/µ. The state of the system is the number of components in operation. Determine the infinitesimal generator of this Markov process. 3. For the process {X,} in Exercise 1.8, give the corresponding infinitesimal generator and Kolmogorov's backward and forward equations. 4. Let {X,} be a birth—death process on S = {0, 1, 2, ...} with q i >,0, where ß,b>0,q ;; =0iflj—it>1,q o ,=0.Let
m(t) = EX,,
s (t) = ;
+
, = iß, q , _, = i5, ; ;
E.X..
(i) Use the foward equation to show m(t) _ (ß — S)m (t), m ; (0) = i. ;
(ii) Show m (t) = ;
(iii) Show s(t) = 2(ß — b)s (t) + (ß + S)m (t). (iv) Show that ;
;
ß + (1 — e ^e_ 1)]
+ sr(t)=
i(i + 2ßt),
if ß 8 if ß = b.
(v) Calculate Var X,. ;
*5. Consider a compound Poisson process {X,} on III' with an arbitrary jump distribution p(dx).
(i) For a given bounded continuous function f on II' compute u(t, x)'= E(f(X,) 1 Xo = x),
and show that at u(t, x) = —Au(t, x) +
J
.i J u(t, x + y)µ(dy) = A (u(t, x + y) — u(t, x))µ(dy).
335
EXERCISES
(ii) Show that the limit of the last expression is lim d u(t x) = (Qf)(x), '
, lo at
where Q is the integral operator (Qf)(x) _ ) f (f (x + y) — f (x)) p(d y). (iii) Write the (backward) equation (i) above for u(t, x) in terms of Q. 6. Let { Y} be a nonhomogeneous Markov chain with transition probabilities p ü (s, t) = P(Y = j ( Y = i), continuous for 0 _< s _< t, with p ;i (s, s) = b, i , i, j e S, and such that S
lim P. i(s' t ) — a +i = 91;(s) NS
t—s
exists and is finite for each s. (i) Show that the Chapman—Kolmogorov equations take the form, Pk(, t) _
p^ ; (s, r)P;k (r, t),
s < r < t.
(ii) For finite S show that the backward and forward equations, respectively, take the forms below: (backward)
ap`as' t) = —Z 9ii(s)Pik(s, t), i
(forward)
a
P,k(S, t) — dt _
P,;(s, t)9;k(t)• i
7. Consider a collection of particles that act independently in giving rise to succeeding generations of particles. Suppose that each particle, from the time it appears, waits a random length of time having an exponential distribution with parameter ), and then either splits into two particles with probability p or disappears with probability q = I — p. Find the generator of this Markov process on the state space S = {0, 1, 2, 3, ...}, the state being the number of particles present. 8. In Exercise 7 above, suppose that new particles immigrate into the system (independently of particles present) at random times that form a Poisson process with parameter y, and then give rise to succeeding generations as described in Exercise 7. Compute the generator of this Markov process. *9. Let {XJ be a Markov process with state space an arbitrary measurable space (S, .9'). Let B(S) be the space of (Sorel-measurable) bounded real-valued functions on S with the uniform norm, 111 11 = supxeslf(x)I, JE B(S). Then (B(S), II • II) is a Banach space. Define T, f(x) = E x f (X,), t >_ 0, x e S, for Je B(S). Also, for Je B(S) such that lim, . o . {(T, f — f)/t} exists in (B(S), II • II ), say that f belongs to the domain of Q and define Qf = lim,. o ,{(T,f — f)/t}. Show that if the domain of Q is all of B(S), then Q must be a bounded linear operator on (B(S), II • II); i.e., Q is continuous on (B(S), II • II). [Hint: T, f(x) = f(x) + J T,Qf (x) ds, t > 0, XE S, Je B(S). Apply the closed graph theorem from functional analysis.]
336
CONTINUOUS-PARAMETER MARKOV CHAINS
10. (Continuous-Parameter Pölya Process) Fix r> 0 and consider a box containing r = r(r) red balls and b = b(T) black balls. Every r units of time a ball is randomly selected, its color is noted, and together with c = c(z) balls of the same color it is placed in the box. Let S„ T denote the number of red balls sampled by the time nr, n = 0, 1, 2, ... , S o = 0. As in Eq. 3.14 of Chapter II, {S„,} is a discrete-parameter nonhomogeneous Markov chain with one-step transition probabilities pj.;(nr, (n + 1)T) =
I - pj.^+i(nr, (n + 1)r),
and p,.;+ (nt,(n+1)a)=P(S^, +nz =i +1 S,, =i)=
r
+ ci
i= 0,1,...
r + b + nc '
Let p = p(r) = r/(r + b), y = y(r) = c/(r + b), t = nz. Then the probability of a transition from i to i + 1 in time t to + t is given by
p+yi
(t,t+T)=
1+yt i
Note that p = r/(r + b) is the probability of selecting a red ball at the nth trial and np = (p/z)t is the expected number of red balls sampled by time t = nr, i.e., p/i is the mean sampling rate of red balls. Suppose that p/T = p(r)/r -+ 1 and = Y(i)/t -. y o > 0 as t - 0.
(i) For fixed r > 0, use a combinatorial argument to show that the distribution of S,, is given by n) b(b+c)•••(b+(n (
-
j)c
-
c) r( r+c). (r +jc
-
c)
j (b+r)(b+r+c) •(b+r+nc-c)
(ii) Show that in the limit as r -+ 0, the distribution in (i) converges to f(t) = t'(1 + yot)-t-'
((j - I)Yo + 1 )
..
. (2Yo + 1 )(Yo + 1)
,
= 0,1, 2, ... ,
j•
which is a negative binomial p.m.f. (iii) Show that jf (t) = t 3=o
and
(j - t) Z Tf(t)
= Yot 2 + t.
3=o
(iv) The continuous-parameter Pölya process is often defined as a nonhomogeneous (pure birth) Markov chain { Y,} starting at 0 on S = {0, 1, 2, ...} with transition probabilities denoted by P(Y =j I Y = i) = p ;; (s, t), j, i = 0, 1, 2, ... , 0 _< s <_ t, and satisfying p, ; (t, t + At) =
+ q 1 (t) At + o(At)
as At - 0,
EXERCISES
337
where qi,i+,(t) = l
Check that P(S( „ + , )
L
+ iy o l + ty o
=
j IS
1
=
, =
—
q«(t),
q 1 (t) = 0,
j : i, i + 1 .
i) = p;t (t, t + t) + o(T)as r —. 0, n -+ oo, t = nt.
Exercises for Section IV.3 1. Prove the inequality (3.12). 2. Prove that the solution p(t) = exp {tQ} is the unique bounded solution of Kolmogorov's backward (as well as forward) equations, under the hypothesis (3.9). 3. Solve the forward equations directly in Example 1. 4. Solve the Kolmogorov equations for a Markov process with three states and infinitesimal generator —A
a
0
Q= 0 —µ µ
0
00
where .?, p are positive. 5. Let Q = ((q ij )) such that sup 1 Ig 1 I < co. Show that all elements of eQ` are nonnegative for all t >- 0 if q, -> 0 for i 0 j. [Hint: Consider z
r;j (t)= S, j +q ;; t +q;j 1 t + • • •
2!
for small t > 0, and use the fact that the product of matrices with all entries nonnegative has itself all entries nonnegative together with the property e to = (e(>Q)'.] 6. Each organism of a system, independently of the others, lives for an exponentially distributed time with parameter A and is then replaced by two replicas that independently undergo the same replication process. Let X, denote the total number of organisms at time t. Show that the bounded rates condition is not satisfied for the generator Q of {X,}. 7. Calculate the transition probabilities for Exercise 2.2 in the case n = 2 8. In radioactive transformations an unstable atom randomly disintegrates to an unstable atom of a new chemical and physical structure. Successive atomic states in the chain are labeled 0, 1, 2, ... , N. Atoms in state i are transformed to state i + 1,0 -< i < N at the rate q i , ; + i = .1 1 , where A N = 0. Calculate the distribution of the state at time t of an atom initially in state 0.
Exercises for Section IV.4 1. Calculate the successive approximations p(t) in the case of the (homogeneous) Poisson process with parameter A > 0.
338
CONTINUOUS-PARAMETER MARKOV CHAINS
2. Write out the third iterate (of Eq. 4.7) p;» for the case S = {l, 2, 3}, q,, i = —A, q1,2 = A q 22 = —µ‚ qz.3 = µ, q3.3 = — y, q 32 = y, q•; = 0 otherwise for i, j e S. 3. (A Pure Death Process) Let S = {0, 1, 2, ...} and let q_ = 6 > 0, q ;; _ —S, i > 1, = 0, otherwise. (i) Calculate p (t), t 0 using successive approximations. (ii) Calculate E.X, and Var ; X. ;;
4. Calculate approximations p 2 (t) for Exercises 2.7 and 2.8. 5. Show that the minimal solution p(t) also satisfies the forward equation. [Hint: Consider the integral version.] 6. Show that the minimal solution p(t), t > 0, is a semigroup (i.e., that it satisfies the Chapman—Kolmogorov equations) by the following procedure. (i) Let f be a continuous bounded function on [0, oo) with Laplace transform f(v) =
f"
e °f(t) dt. -
'
Define F(s, t) = f(s + t), s, t _> 0, and let F(v, µ) =
J
$ e`°F(s, t) ds dt
(µ > 0, v > 0)
0
be the bivariate Laplace transform of F. Show that P satisfies the resolvent equation: F(v, µ) _ —
i(v) — ^(µ)
v ^ .
V—/1
[Hint: Write u = s + t, v = —s + t, 0 <_ u < oo, —u < v < u, in the integral defining F.] (ii) Let p(v) = ((‚(v))) denote the matrix of transformed entries of the minimal solution. Show that p(s + t) = p(s)p(t), s, t >, 0 if and only if
_ p(v) — p(µ) = p(µ)p(v),
vu.
v—p
(iii) Show that the backward and forward equations (see also Exercise 5) transform, respectively, as [B]: vp(v) = I + Qp(v), [F]: v(v) = I + P(v)Q•
(iv) Use the backward equation to show µp(v) = A + Q(v) for A = I + (p — v)p(v). Use nonnegativity of p(v) and A for p> v, to show by
339
EXERCISES
induction, using (4.7), that p(v) > ß( " ) (p)A for n = 0, 1, 2, ... , and therefore P(v) > P(,u)A = p(µ) + (p - v)P(µ)P(v),
p> v.
(v) Use [F] to prove the reverse inequality and hence (ii). [Hint: Check r(v) = p(p) + (u - v)p(p)p(v) solves [F] and use minimality.] 7. Compute p ;k (t) for all i, k in Example 1.
Exercises for Section IV.5 1. Show that the holding time T" (n > 1) is not a stopping time. 2. Let A(A), B(A), B(0) denote the events Is < To <_ s + A), {XX+o =j}, {XT" -J}, respectively.
(i) Prove that P; (A(A) n B(A) n B`(0)) = o(A) and P (A(A) n BC(A) n B(0)) = o(A) as A 10 (for i 0 j). [Hint: Apply the strong Markov property with respect to To in order to show P; (s < To < To + T, _< s + A) = o(A).] (ii) Use (i) to derive (5.6) from (5.5). 3. Let U 1 , U21... , U" be i.i.d. uniform on [0, t], and order them as U (l) < U(2) <
(ii) Show that Dk has p.d.f. given by n(1 - x)" - ', 0 < x < 1, k = 1, ... , n + 1. 4. Derive an extension of Proposition 5.6 for a nonhomogeneous Poisson process with intensity function p(t) (see Example 1.1). 5. Suppose that a pure birth process on S = {0, 1, 2, ...} has infinitesimal parameters qkk =
-
^`k•
(i) If Zö Ak ' = oo and 1ö ‚-2 < co, then show that the variance of = To + T, + • • • + T _, (the time to reach n, starting from 0) goes to a finite limit as n - oo. Use Kolmogorov's zero-one law (see theoretical complements 1.1, Chapter I) to show that as n -+ oc, z" - .ik ' converges (for all sample paths, outside a set of probability zero) to a finite random variable ry. (ii) If ak 1 = cc = ^ö ak Z , but _p 1 k 3 < oo, show that
C
t"
I
)/40 Ak 2 ) '
is asymptotically (as n -* co) Gaussian with mean 0 and variance 1. [Hint: Use Liapounov's central limit theorem, Chapter 0.] 6. Messages arrive at a telegraph office according to a Poisson process with mean rate of occurrence of 4 messages per hour. (i) What is the probability that no message will have arrived during the morning hours (8 to 12)? (ii) What is the distribution of the time at which the second afternoon message arrives?
340
CONTINUOUS-PARAMETER MARKOV CHAINS
7. Let {X,} and {Y} be independent Poisson processes with parameters 2 and p respectively. Show that the probability of n occurrences of {Y,} within the time interval from the first to the (r + 1)st occurrence of {X,} is given by
C
A + pJ \.A + p)• n = n r-1 1)(—p
0, 1,2
, ... .
'
8. Let {X} be a Poisson process with parameter A > 0. Let To denote the time of the first arrival. (i) Let N = X ZTo — X T ,, and calculate Cov(N, To ). (ii) Calculate the (conditional) expected value of the time To + • • • + T,_, of the rth arrival given X, = n > r.
>_
9. Suppose that two colonies, I and 2, start as single units and independently undergo growth by pure birth processes with rates ß„ ß 2 , respectively. Calculate the expected size of colony 1 at the time when the first offspring is produced in colony 2. 10. Consider a pure—death process with rates —q ;; = q 1 _ 1 = 2>0, i 1, q• ; = 0 otherwise. (i) Calculate p(t). (ii) If initially there are n, particles of type 1 and n 2 particles of type 2 that independently undergo pure death at rates S,, 5, respectively, calculate the expected number of type 1 particles at the time of extinction of the type 2 particles. 11. Let {N,} be a homogeneous Poisson process with parameter 2>0. Define X, _ (-1) ^ `, t 0. Show that {X,} is a Markov process and compute its transition probabilities. 12. Consider all rooted binary tree graphs having n sources (i.e., degree-one vertices excluding the root). Call any edge incident to a source vertex an external edge and call the others internal (see Fig. Ex.IV.5). [By a rooted tree is meant a tree graph in which a vertex is singled out as the root. Other graph-theory terminology is described in Exercise 7.5 of Chapter II.]
n=4
-source Internal
Root Figure Ex. IV.5
EXERCISES
_>
341
(i) Show that a rooted binary tree with n sources has n external edges and n — t internal edges, n 1. In particular, the total number of edges is 2n — 1, which is also the total number of vertices excluding the root. (ii) Show that the following code establishes a one-to-one correspondence between the collection of all rooted binary tree graphs and the collection of all simple polygonal paths from (0, 0) to (2n — 1, — 1) in steps of + I that do not touch or cross the line y = —1 prior to x = 2n — 1. Starting with the edge incident to the root, traverse the tree along the leftmost path until reaching the leftmost source. Then follow back until reaching a junction leading to the next leftmost source, and so on, recording, on the first (and only on the first) traverse of an edge, + if it is internal and — if it is external. The path 2n — I long of (+ )s and (—)s furnishes the coding of the tree in the form of a "random walk excursion."
(iii) Use (ii) and the reflection principle (Section 4, Chapter I) to calculate the distribution of ML in Example 3. (iv) Let g(x) = Ex M I denote the probability-generating function of ML . Use the recursive structure of the tree to establish the quadratic equation g(x) = Zx + ? g 2 (x). (v) Use (iv) to give another derivation of the distribution of ML .
13. Show that the holding time in state j for Example 2 is exponentially distributed with parameter i t = jA (I? 1).
Exercises for Section IV.6 I. Show that (I + x) log(1 + x)
_> >_ x for all x
0.
2. Show that (i) the backward equation holds for p 1 (t), and (ii) {‚(t): t > 0} are transition probabilities.
>_
3. Show that {p(t): t > 0} are transition probabilities on S v {A} satisfying the backward and forward equations. 4. (i) Consider an initial mass of size x o that grows (deterministically) to a size x, at time t at a rate that is dependent upon size; say, x; = f (x,), t 0. Give an example of a growth-rate function f (x), x 0, such that the mass will grow without bound within a finite time > 0, i.e., lim,_, x, = co. [Hint: Consider a case in which each "element" of mass grows at a rate proportional to the total mass, so that the total mass itself grows at a rate proportional to the mass squared.] (ii) Let X, denote the number of reactions that have occurred on or before time t in a chain reaction process. Suppose that {X} is a pure birth process starting at 0. Show that the expected number of reactions that occur in a finite time interval [0, t] need not be finite. (A Bus Stop Problem) A passenger regularly travels from home to office either by bus or by walking. The travel time by walking is a constant t w , whereas the travel
time by bus from the stop to work is a random variable, independent of the bus arrival time, with a continuous distribution having mean t b < t w . Buses arrive randomly at the home stop according to a Poisson process with intensity parameter 2. The passenger uses the following strategy to decide whether to walk or ride. If the bus arrives within c time units, then ride the bus; if the wait reaches c, then walk. Determine (optimality) conditions on c that minimize the average travel time in terms
342
CONTINUOUS-PARAMETER MARKOV CHAINS
of t b , t„„ A. [Note: The solutions c = 0 (always walk), c = cc (always ride) are permitted.] 6. Consider a single telephone line that is either free (state 0) or busy (state 1). Suppose incoming calls form a Poisson process with a mean rate (intensity) A per minute. The successive durations of calls are i.i.d. exponential random variables with parameter p (mean duration of a call being p ' minutes), independent of the Poisson process of incoming calls. If a call arrives at a time when the line is busy, the call is simply lost. (i) Give the corresponding prescription of Kolmogorov's backward and forward equations for the state of the line. (ii) Solve the forward equation. -
Exercises for Section IV.7 1. Show that the process {X} described noncanonically in Example 2 is a Markov process. 2. Prove Eq. 7.11. 3. Let {X,} be the Yule linear growth process of Example 2. Show that (i) EXX = ie". (ii) Var, X, = ie(el` - 1) = 2ie 3 II 2 sinh(lAt). 4. Show that the N-server queue process {X} is a Markov process. *5. (Renewal Age Process) Let Ti , T2 ,... be an i.i.d. sequence of positive random
variables with a continuous (lifetime) distribution function F. Let S o = 0, S„=T1 +T2 +•••+T,,,n> 1,andforA 0 =O,definefort>0,theageattimet by A,:= t - max{Sk : Sk _< t, k >_ 0}. Then {a + A,} is the process starting at a >_ 0. (i) Show that {A,} has the Markov property: i.e., for arbitrary time points 0 -< s o <S 1 < • • • <S <s < t < t, < • • • < t n , the conditional distribution of A so , .. , A sk A S does not depend on A so , .. , A sk A, A,,, .. , A (ii) The failure rate (also called hazard rate or force of mortality) for the objects being renewed is defined by h(t) = f(t)/[l - F(t)], where f is the p.d.f. (assumed to exist) of F. For continuously differentiable functions g having bounded derivatives show, for a _> 0, .
,
Qg(a): lim Ep
g(Ar) - g(a) dg = h(a){g(0) - g(a)} + —>
,- o t
da
a -> 0.
(iii) Show that µ(a) = Z -1 exp{ - J$ h(u) du}, a >_ 0, where Z is the normalization constant, solves f o u(a)Qg(a) da = 0. In particular, show that p is the density of an invariant probability for {A,}. (iv) Show (i) holds also for the residual lifetime { SN , - t}, where N, = inf {n >_ 0: S> t}. For ET, < oc, show that ff (t) = (1 - F(t))/ET„ t >, 0, is the density of an invariant probability. [The above has an interesting generalization by
EXERCISES
343
F. Spitzer (1986), "A Multidimensional Renewal Theorem," in Probability, Statistical Mechanics, and Number Theory, Advances in Mathematics Supplementary Studies, Vol. 9 (G. C. Rota, ed.), pp. 147-155.] 6. (Thinned Poisson) Let {t n } be the sequence of occurrence times for a Poisson process {N} with parameter 2. Let {r„} be an i.i.d. sequence, independent of the Poisson process, of Bernoulli 0-1-valued random variables such that P(E, = 0) = p, 0
It `_
n =o
t > 0,
li0.1)(e tn),
Vo = 0,
where {N, = max {n: i n < t}} is the Poisson counting process. Show that {}} is a Poisson process with intensity parameter (1 - p)2. 7. (i) (Vibrating String) For Example 5, check that in the (deterministic) case
p = ar = 0, u(x, t) = 2 {cp(x + vt) + cp(x - vt)}. In particular, that u(x, t) solves —=v2 [Z
x2 ,
u(x, 0) = cP(x),
nt
au t) =0
at t = 0.
(ii) Check that there is a randomization of time represented by a nonnegative nondecreasing stochastic process { T,} such that u(x, t) = z{Ecp(x + vT,) + Ecp(x - vT) }. [Hint: Simply consider (7.37).] 8. (Diffusion limit) Check that in the limit a -. co, v - oo, 2a /V 2 = D ' > 0, the diffusion equation -
äu
au
at = D 3x2' is consistent with the telegrapher's equation. 9. The flow of electricity through a coaxial cable is typically described by the telegrapher's equation (7.19) or (7.32), (7.33), where u(x, t) represents the (instantaneous) voltage and w(x, t) the current at a distance x from the sending end of the cable. The parameters a, /3, and v 2 can be interpreted in terms of the electrical properties of the cable as outlined in this exercise. If one ignores leakage (conductance due to inadequate insulation), then the circuit diagram for the segment of cable from x to x + Ax may be depicted as in Figure Ex.IV.7 below. Here the parameters R, L, and C are the resistance, inductance, and capacitance per unit length. These parameters are defined in accordance with certain physical principles. For example, Ohm's law says that the ratio of the voltage drop across a resistor to the current through the resistor is a constant (called the resistance) given here by R'Ax. Thus, the voltage drop across the resistor is R Ax w(x, t). Likewi.>e,
344
CONTINUOUS-PARAMETER MARKOV CHAINS x+Ax R4x
Lex
I
❑ (x. t)
J
-L
cex
u(x +/X. t)
Figure Ex. IV.7
when there is a change in the flow of current aw/at in an inductor then there is a corresponding voltage drop of L Ax Ow/Ot. According to Kirchhoff's second law, the sum of potential drops (as measured by a voltmeter) around a closed loop in an electric network is zero. Thus, for Ax small, one has u(x + Ax, t) - u(x, t) + R Ax w(x, t) + L Ax
a`
= 0.
The nature of a capacitor is such that the ratio of the charge stored to the voltage drop is the capacitance C Ax. Thus, the capacitor current, being the time rate of change of charge, is given by C Ax au/at. According to Kirchhoff's first law, the sum of the currents flowing toward any point in an electrical network is zero (i.e., charge is conserved). Thus, for Ax small, one has au w(x+ Ax, t)—w(x,t) +CAx—=0. at (i) Use the above to complete the derivation of the transmission line equations and the telegrapher's equation (for twice-continuously differentiable functions) and give the corresponding electrical interpretation of the parameters a, ß, and v 2 (ii) In the case when leakage is present there is an additional parameter (loss factor) G, called the conductance per unit length, such that the leakage current is proportional to the voltage G Ax u(x, t). In this case the term G Ax u(x, t) is added to the left-hand side in Kirchhoff's first law and one sees that the voltage and current satisfy transmission line equations of exactly the same general forms. Show that precisely the same equation is satisfied by both voltage and current. .
10. In Example 4, suppose that the service time distribution is arbitrary, with distribution function F. Determine the distribution of W,.
Exercises for Section IV.8 1. Let Z (i >, 1) be a sequence of i.i.d. nonnegative random variables, P(Z, > 0) > 0. Prove that Y "_, Z -* oo almost surely. ;
;
2. Prove that the first and third terms on the right side of Eq. 8.13 go to zero (almost surely) as t -+ oo, provided (8.10) holds.
345
EXERCISES
3. When is a compound Poisson process (i) a pure birth process, (ii) a birth—death process? Write down q1, k ;j in these cases. 4. Let {X,} be a birth—death chain, and let c <x
bi).
PX ({X,} ever reaches c) in the case {X} is nonexplosive. Briefly (ii) Calculate p comment on what may happen in the case that explosion may occur with positive Ps - probability. 5. Given a birth—death chain {X}: (i) Derive a difference equation for m(x):= E X (a) where i is the first time {X } reaches c or d (c 5 x d). (ii) For the special case qzs = — A., ß, = hx = i (c <x < d), compute m(x). (*iii) Compute m(x) for general birth—death chains. 6. Let p(t) = ((p, (t))) be transition probabilities for a Markov chain on a finite state space S. Adapt Proposition 6.1 of Chapter II to show that lim,— m p. (t) exists for all i, Je S, and the convergence is exponentially fast. [Hint: Fix t > 0 and consider the discrete-parameter one-step transition probability matrix q = ((p ;,(t))). Then q" = ((p ; ,(nt)), n = 0, 1, 2, .... Use the semigroup property to show that lim"^ q;jl' (exists) does not depend on t > 0.] 7. Suppose that {X,} is irreducible and positive-recurrent. Let {}"} be the embedded discrete-parameter Markov chain with transition matrix K. Let a be the invariant distribution for {X,} such that n' Q = 0. Calculate the invariant distribution for { Y"}. 8. Let {X} be a positive-recurrent birth—death process on S = {0, 1, 2, ...} with birth—death rates ß ; a ; and b ; A ; respectively. Show that =
where it is the invariant initial distribution. 9. Consider the N-server queue under the invariant initial distribution (8.27). (i) Show that the expected number of customers waiting to be served (excluding those being served) is given by (IP( ^) ) N' rc o , i
where p = Nß (< 1)
is referred to as the traffic intensity parameter; i.e., the mean number of arrivals within the average service time ß '/N. (ii) Show that the average length of time a customer must wait for service is -
(NP)"
It o .
N(1 — p) 2 ßN!
(iii) A particular hospital ward receives patients according to a Poisson process at an average rate of a = 2 per day. The average length of stay is 5 days. Assuming the length of stay to be exponentially distributed, how many beds are necessary
346
CONTINUOUS-PARAMETER MARKOV CHAINS
in order to achieve an equilibrium distribution for the total number of patients admitted and waiting admission? What will be the average waiting time for admission? 10. Let {X= } be the continuous-parameter Markov (binary) branching process with offspring distribution f(0) = (5,f(2) = ß, ß + 8 = 1, and parameter A. (i) Show that {X,} is a birth—death process with linear rates. (ii) Let gt =_ g;' (r):=Z ; P; (X, = j)rt denote the p.g.f. of the P ; -distribution of X,. Show that the forward equations transform as )
°
age _ at =[ß(r
z
ag^'> _ r , r)+b(1—r)] ^
Q=Q1ß, S=2S.
(iii) Solve the equation for g. 11. Suppose that {X,} has transition probabilities p 1 (t) and infinitesimal transition rates given by Q = ((q)). Suppose that it is a probability distribution satisfying zt'Q = 0. If E ; A ; n i < oo, where 2 ; = —q ;; , then show that it is an invariant initial distribution. [Hint: Differentiation of >' nip(t) term by term is allowed.] 12. If S comprises a single communicating class and it is an invariant probability, then all states are positive recurrent. [Hint: Consider the discrete time chain X,, n >_ 0. Use Theorem 9.4 of Chapter II. Note that q defined by (8.4) is smaller than the second passage time to state i for the discrete parameter chain.] 13. Show that positive recurrence is a class property. [Hint: Use arguments analogous to those used in the discrete-time case.] 14. Verify the forward equations for Example 5.3. [Hint: Use Exercise 4.5 and Example
8.2.] Exercises for Section IV.9 1. Calculate the spectral representation of the transition probabilities p 1 (t) for the continuous-parameter simple symmetric random walk on S = {0, 1, 2, ...} with one reflecting boundary at 0. [Hint: Use (9.19).] 2. (Time Reversibility) Let {X,} be an irreducible positive recurrent Markov chain with
invariant initial distribution it. (i) Show that {X} is a stationary process; i.e., for any h > 0, 0 < t i < t 2 < • • < t k (X,,..... X, k ) and (X,,,,, . .. , X, k „,) have the same distribution. (ii) Show that {X,} is time-reversible, in the sense that for any 0 _< t i < t 2 < • • • < t j, < T (X,,, . .. , X, k ) and (XT _ tk , ... , X T _,,) have the same distribution, if and only if n ; p ;j (t) = rr t pt; (t)
for all i, j e S, t _> 0.
This last property is sometimes referred to as time-reversibility of n. (iii) Show that {X} is time-reversible if and only if n ; q i; = rc ; q ;; for all i, j e S. (iv) Show that {X,} is time-reversible if and only if the discrete-parameter embedded chain is time-reversible; see Exercise 7.4 of Chapter II. (v) Show that the chemical reaction kinetics example is time-reversible under the invariant initial distribution.
347
EXERCISES
3. (Relaxation and Maximal Correlation) Let {X,} bean irreducible finite-state Markov chain on S with transition probabilities p(t) = ((p ;j (t))), t >, 0, and infinitesimal rates Q = ((q i3 )). Let it be the unique invariant distribution and assume the distribution to be time-reversible as defined in Exercise 2(ii) above. Define the maximal correlation by p(t) = sup f , 9 Corr( f (XX ), g(X o )), where the supremum is over all real-valued functions f, g on S and Corrn(.f(X,), g(Xo)) =
E{[J(X,) — E,.f(X^)][g(XO) — Eg(XO)]}
{ Var r f (Xi )} "2 {Var,, g(Xo)} It 2
Show that p(t)=e2,ß
t>_0
where A, < 0 is the largest nontrivial eigenvalue of Q. The parameter y = —1/),, is called the relaxation time or correlation length parameter in this context. [Hint: For f, g such that E rt f = E R g = 0, 1II f II,, = IIII R = i Corr rt ( f (X,), g(X o )) = (p(t)f, g),. So there is an orthonormal basis {(p„} of eigenvectors of p(t). Since k ! p(t) = e Qr _ k ^ _ Qktk
the eigenvalues of p(t) are of the form e`^' where A. are eigenvalues of Q. Take f = g = cp, to get
expand f and g in terms of {q,}, i.e., f = J„ (f, cp„) n cp,,, g = E„ (g, to show ICorr n ( f (X,), g(X0 ))l _< e. Use the simple inequality ab _< (a 2 + b 2 )/2.] 4. Let {X,} be an irreducible positive recurrent finite-state Markov chain with initial distribution p. Let lt j (t) = PI (X, = j), je S, and let it denote the invariant initial distribution. Also let Q = ((q, j )) be the matrix of infinitesimal rates for {X}. Show that (i)
dµ (t) = '
dt
{µ(t)q 1 — p (t)q 1 },
z(0)=p,
(ii) Suppose that it is a
t >' 0, j e S,
tes
time-reversible
j e S.
invariant distribution and define
Yjj =
n1g1j
_ njgj,
(possibly infinite). Show that dpj(t) dt
_
1
keS Yjk
ilk\t) — 14j (t) j
Ttk
e S.
Itj )'
Consider an electrical network in which the states of S are the nodes, and y jk is the resistance in a wire connectingj and k. Suppose also that each nodej carries a capacitance rrj . Then these equations are Kirchhoffs equations for the spread
348
CONTINUOUS-PARAMETER MARKOV CHAINS
of an initial "electrical charge" µ ; (0) = p,, j ES, with time (see also Exercise 7.9). The potential energy stored in the capacitors at the nodes at time t, when the initial distribution is µ(0) = t, is given by U(t) = U(t(t)) = — - Y
2
(ul(t))^
z
ieS n%
One expects that as time progresses energy will be dissipated as heat in the wires. (iii) Show that if p 96 it, then U is strictly decreasing as a function of time. (iv) Calculate U(p(t)) for the two-state (flip-flop) Example 3.1 in the cases t(0) = 6 ( ,^, i =0,1.
5. As in Exercise 4 above, let {X1 } be an irreducible positive recurrent finite-state Markov chain with initial distribution it. Let µ; (t) = PI (XZ = j), je S, and let it denote the invariant initial distribution. Let h be a strictly concave function on [0, x) and define the h-entropy of the distribution at time t by h
H(t) = H(µ(t)) ,%e$
t >- 0. '\
1j
(i) Show for µ 0 n, H(t) is strictly increasing as a function of t and lim 1 ^^ H(p(t)) = H(n). (ii) Statistical entropy is defined by taking h(x) _ —x log x in (i). Calculate H(p(t)) for the two-state (flip-flop) Example 3.1 when t(0) = S (;r , i = 0, 1. Exercises for Section IV.10 ) appearing in Eq. 10.4. [Hint: Apply 1. Calculate the distribution G(t; 2, A ; , ... , A i the partial fractions decomposition to the characteristic function of the sum.] 2. Calculate the distribution of the time until absorption at j = 0 for the pure death process on S = {0, 1, 2, ...} starting at i, with infinitesimal rates q 1 = ib if j = i — 1, i >, 1, q i1 = —i8, q 11 = 0 otherwise. What is the average time to absorption? 3. A zero-seeking particle in state i > 0 waits for an exponentially distributed time with parameter a. and then jumps to a position uniformly distributed over 0, 1, 2, ... , i — 1. If in state i < 0, it holds for an exponential time with parameter 2 and then jumps to a position uniformly distributed over i + 1, i + 2, ... , —1, 0. Once in state 0 it stays there. Calculate the average time to reach zero, starting in
state
i.
*4. (i) Under the conditions stated in Example 2, show that conditioned on nonextinction at time t, X, has a limit distribution as t -- oc with p.g.f. given by Eq. 10.34. [Hint: Check that Zj o P,(X, =j I r (o) > t)r may be expressed as {g,(' ) (r) — g,(' ß ( 0 )}/( 1 — g"(Ø)). Apply (10.28) to get H(t + H(r)) — H '(t + H(0)) -
1 — H"'(t + H(0)) which by (10.31) is, asymptotically as t -+ oc, given by I — exp{x l (H(r) — H(0))} x (1 + 0(1)). Finally, apply (10.26).]
THEORETICAL COMPLEMENTS
349
(ii) Determine the precise form of the distribution of the limit in the binary case .r2=p<9=.%,p+9= 1 .
THEORETICAL COMPLEMENTS Theoretical Complements to Section IV.I (Independent Increments and Infinite Divisibility) A probability measure Q on (R', ') is called infinitely divisible if for each n >_ 1, Q can be factored as an n-fold convolution of a probability distribution Q. on (R' , . iJ), i.e., Q = Q, * • • • * Q. (n-fold ), n >_ 1. Familiar examples are the Normal, Gamma, Poisson and Cauchy distributions, as well as the first passage time distribution of the Brownian motion (Chapter 1). A basic property of infinitely divisible distributions is that the characteristic function Q() = f eQ(dx), 5 e R', is never zero. To see this, observe that by considering cP(5) = IQ 2 (^)I if necessary, one may assume to be given a real nonnegative characteristic function which, for any n _> 1, factors into an n-fold product of real nonnegative characteristic functions q,, (of some probability distributions µ n , say). Since q(0) = 1 and cp is continuous, there is an interval [ —i, r] on which cp(5) is strictly positive. Now, taking the unique version of log cp(^) which makes - log cp(5) continuous and log cp(0) = 0, log
I
lb,(dx) = n J (I — e 2 u'
J
J [I
n[1 — cPn( 2 5)] = n I — J e 2
J
= n [I — cos(2^x)]p„(dx) = 2n
),u,(dx)
—
cos 2 (^x)](dx)
a
[I — cos(,x)]µ„(dx) = 4n[1 — cpJ )].
4nJ
^' So, the sequence n[ 1 — cp„(2' )] k is bounded on [—r, r]. This means that each cp„(s ), and therefore, cp(^) must be nonzero on [—Zr, Zr]. In particular, there could be no largest such r and this proves p has no real zeros. If {X,} is a stochastic process with state space S c R' having stationary independent increments, then for any s < t, the distribution of X, — X, is clearly infinitely divisible since for each integer n _> 1, X,
—
XS= Z (X,—X, 1 )•
k=1
t k =s+(t—s)^
k=0,1,.. ,n.
n
Conversely, given an infinitely divisible distribution Q there is a family of probability measures Q„ t > 0, such that Q, = Q and Q, * Q s = Q, + ,,, obtained, for example, though their respective characteristic functions by taking cp,(^) = (cp(^))':_ exp{t log cp, ( )}, since cp, (S) ^ 0, for H' Thus one obtains a consistent specification of the finite-dimensional distributions of a stochastic process {X,} having stationary independent increments starting at 0 such that X, — X, has distribution Q, _„ 0 _< s< t.
CONTINUOUS-PARAMETER MARKOV CHAINS
350
The process {X,} with stationary independent increments is a Markov process on R' starting at 0 having stationary transition probabilities given by p(t;x,B)=P(X,+sEB1 Xs=x)=Q,(B — x),
t>0, BE.c',
where B — x :_ {y — x: ye B} is a translate of B. 2. (Levy—Khinchine Representation) The Brownian motion process and the compound Poisson process are simple examples of processes having stationary independent increments. While the Brownian motion is the only one of these that is a.s. continuous, both are continuous in probability (called stochastic continuity). That is, for any t o >' 0, X, —• X 0 in probability as t —+ t o since (i) for the Brownian motion, a.s. convergence implies convergence in probability and (ii) for the compound Poisson process, P(IX, — X,I > e) < I — e -z^` = 0Qt — sI) for e > 0. Although the sample paths of the Poisson and compound Poisson processes have jump discontinuities, stochastic continuity means that there are no fixed discontinuities. Stochastic processes {X} and {Y} defined on the same probability space and having the same index set are said to be stochastically equivalent if P(X, = Y) = 1 for each t. Stochastically equivalent processes must have the same finite-dimensional distributions. As an application of the fundamental theorem on the sample path regularity of stochastically continuous submartingales given in theoretical complement 5.2, it will follow that a stochastically continuous process {X,} with independent increments is equivalent to a stochastic process {Y} having the property that almost all of its sample paths are right-continuous and have left-hand limits at each t, i.e., have at most jump discontinuities of the first kind. Moreover, the process {Y,} is unique in the sense that if { Y;} is any other such process, then P(Y, = Y, for every t) = 1. Without loss of generality we may assume the given process {X} with stationary independent increments to have jump discontinuities of the first kind. Such a process is called a (homogeneous) Levy process. The prefix homogeneous refers only to stationarity of the increments. Theorem T.1.1. If {X,} is a (nondegenerate) homogeneous Levy process having a.s. continuous sample paths, then {XX } must be Brownian motion. ❑ Proof. Let s < t. By sample path continuity, for any e> 0 there is a number 6 = S(c)> 0 such that P(IX„ — X
I <e whenever lu — VI <5, s < u, v _< t) > 1 — e.
Let {E"} be a sequence of positive numbers decreasing to zero and partition (s, t] into subintervals s = tö / < t;" < ... < tk", = t of lengths less than b" = b(e,,). Then )
)
k„
k„
X,—X9 = E (XX ,1—X1 )__ E X;" I
where X'> . .. , Xk^ are independent (triangular array). Observe that the truncated random variables )
351
THEORETICAL COMPLEMENTS
are also independent and
.,
—+ X, — X, in probability as n —* oo since k„
> I — e". P(X, — XS =R)>
The result now follows by an application of the Lindeberg CLT (Chapter 0, Theorem 7.1). n Within this same context, the other extreme is represented by the following. Theorem T.1.2. Let {XX } be a homogeneous Levy process almost all of whose sample paths are step functions with unit jumps. Then {X,} is a Poisson process. 0 The proof of Theorem T.1.2 will be based on the following basic coupling lemma. Lemma. (Coupling Bound). Let X and Y be arbitrary random variables. Then for any (Borel) set B, P(X e B) — P(Ye B)I -< P(X ^ Y). Proof. First consider the case P(X e B) > P(Y e B). Then 0-
The argument applies symmetrically to the other case.
The coupling lemma can be used to get the following Poisson approximations which, while important in their own right, will be used in the proof of Theorem 1.1.2. Lemma. (Poisson Approximations). Let Y1 , ... , Y" be independent Bernoulli 0-1-valued random variables with p ; = P(Y• = 1), i = 1, ... , n. Then for any ). > 0 and J^{0,1,2....}, (i) P(1 Y E J) — Fo „(J)H 1 p; ;
(ii)
P^ ^ Y E J) — Fx (J) S ^ —
P,^ + (max P,)
P;,
where e-^
FF(J) _ m E./ m•
p..
p" _ i
=1
Proof. Let Q be an arbitrary (discrete) probability distribution on {0, 1, 2, ...} with p.m.f. q = Q( {i}), i = 0, 1, 2, ... . A simulation random variable having distribution Q based on the value of a random variable U uniformly distributed over (0, 1] (i.e., ;
352
CONTINUOUS-PARAMETER MARKOV CHAINS
a function of U) can be defined by (T.1.1)
X = Q(U),
where Q(x) e {O, 1, 2, ...} is uniquely defined for each 0 < x < I by the condition that (an empty sum being zero) Qt:) -1
Q(x)
j=0
j=0
q ; .
(T.1.2)
Clearly, X = Q(U) so constructed has distribution Q since P(Q(U) =i) =P
qj
9; =q;•
Now let U 1 , U2 ,..., UU be i.i.d. uniform over (0, 1] and let G, - G F; = F,, be the Bernoulli and Poisson distributions with parameter p ; , respectively, i = 1, ... , n. That is, F; ({k})=p` e P',
G,({1})=p = 1 —G ({0}), ;
-
;
k=0,1,2,....
k!
indicate the
Then, by the coupling lemma, letting p„ _ ^i= 1 p ; and letting corresponding "simulation" function as defined above, we have P(Y Ö1 (U1 ) 6 1 F,(U,))
P(i Y €) — Fp.(J)
(T.1.3)
since the distribution of the sum of independent Poisson random variables with parameters p,,... , p„ is Poisson with parameter E" =1 p ; . Now from (T.1.3) we get — FP (J) ^ P^ U {G,(U ; )
P(t
P((U; )
(T.1.4)
F; (U1 )).
Now, for fixed i, observe from Figure TC.IV.1 that P(G;(U;) ^ F;(U,)) =
(e-v1 — (I — p;)) + (1 — e-a'(1 + p;)) 5 p?
(T.1.5)
.- {G (U ) = 0} -• 1 «-- {G (U ) = I} ;
0
I
;
;
'
Figure TC.IV.1
1 — pi {‚(U,) = 0)
;
1 (1 + p;)e-P' --• { F (U,) > 1) ) = 1) I f-(U I - {F
e-n,
;
;
;
THEORETICAL COMPLEMENTS
353
(i).
since 1 — e °i -< p,. Thus, (T.1.4) and (T.1.5) prove To derive the estimate (ii), let > 0 be arbitrary and use the triangle inequality to write -
i
PI ` ^t Y Ei) — FA (J)I 5
IP
1
(^
Y Ei) — FP„(J) + IF,,(J) — Fz(J)I
n
<
p? + IF(J) — FA(J)I max pi 15i^n
(T.1.6)
pi + IFF .(J) — FA (J)I. i =1
To complete the proof, we simply need to show 1—
IF(J) — FA(J)I
For this, first suppose p" = _, p ; > land let Z„ Z Z be independent Poisson random variables with parameters a and Y"_ , p i — A. Again use the coupling bound lemma to bound the last term in (T.1.6) by P(Z1 ^ Z 1 + Zz) = P(Z2 ^ 0) = 1 — exp
[—
(iZ pi — i)] -< .l — t
pi. (T.1.7) i =1
n
The symmetrical argument works for Y" =1 p ; < A also.
Proof of Theorem T.1.2. It is enough to show that for each t > 0, X, has a Poisson distribution with EX, = At for some A > 0. Partition (0, t] into n intervals of the form (t i _,, t i ], i = 1, ... , n having equal lengths A = t/n. Let A> = {X,, — X1 l} and SS _ Y"-_, IÄ"? (the number of time intervals with at least one jump occurrence). Let D denote the shortest distance between jumps in the path Xs , 0 < s -< t. Then P(X, 0 S") -< P(0 < D < t/n) —+ 0 as n --, oo, since {X, 0 Sn } implies that there is at least one interval containing two or more jumps. Now, by the Poisson approximation lemma (ii), taking J = {m}, we have P(S" = m) — — e -x' \ Inp" — Al + (npn)z
m!
n
where pn = pin) _ P(A^nl) ,
i = 1, ... , n, and A > 0 is arbitrary. Thus, using the triangle inequality and the coupling bound lemma, P(XX
= m )-- e xS -
IP(X, =m )—P(Sn=m )I
+
m!
P(Sn=m )
--- e -zl m!
P(X, 0 Sn) + Inpn — AI + npn
=o( 1 )+Inpn — Al+np^.
(T.1.8)
The proof will be completed by determining A such that np" — Al —• 0, at least along
354
CONTINUOUS-PARAMETER MARKOV CHAINS
a subsequence of {np"}. For then we also have np„ = np"P(A) — 0 as n —> oo. But, since P(X, = 0) = P(n"=, A;'°') = (1 — p")", it follows that P(X, = 0) > 0; for otherwise P(A;" ) ) = 1 for each i and therefore X, > n for all n, which is not possible under the assumed sample path regularity. Therefore np, _< —n log(1 — p") = —log P(X, = 0) for all n. Since {np"} is a bounded sequence of positive numbers, there is at least one limit point A > 0. This provides the desired A. • The remarkable fact which we wish to record here (with a sketch of the proof and examples) is that every (homogeneous) Levy process may be represented as sum of a (possibly degenerate) Brownian motion and a limit of independent superpositions of compound Poisson processes with varying jump sizes. Observe that if the sample paths of {X,} are step function of a fixed jump size y 0 0, then {y 1 X1 } is a Poisson process by Theorem 1.1.2. The idea is that by removing (subtracting) the jumps of various sizes from {XX } one arrives at an independent homogeneous Levy process with continuous sample paths. According to Theorem T.1.1, therefore, this is a Brownian motion process. More precisely, let {X,} be a homogeneous Levy process and let S = [0, T) x !J' for fixed T> 0. Let a(S) be the Borel sigmafield of S, and let R + (S) = {A e g(S): p(A, [0, T) x {0}) > 0}, where p(A, B) = inf{Jx — yl: x e A, ye B}. The space—time jump set of {X,} is defined by J=J(w)={(t,y)eS:y=X,(ow)—X,-(cw) 0,0
for w e S2. Define a random counting measure v on a(S) by v(A) = v(uw, A) = # A n J(w),
WED, S2, A e l(S).
For fixed we S2, v is a measure on the sigmafield a(S), and for each A e P4(S), v(A) is a random variable (i.e., w --+ v(cw, A) is measurable). Moreover, for fixed A a P4(S), the process v(A n [0, t] x U8'), t _> 0, is a Poisson process. To prove this, take A an open subset of S and show that the process is a Levy process with unit jump step function sample paths and apply Theorem 1.1.2. Define µ(A) = Ev(A) for A e .4 + (S). Then visa measure on P4(S); countable additivity can be proved from countable additivity of v by Lebesgue's Monotone convergence theorem. Now observe that for each fixed B E .(OR' \ {0} ), the set function µ(C) = v(C x B) is translation invariant on .4([0, T)). Therefore, v(C x B) is a multiple (dependent on B) of the Lebesgue measure ICJ of C, i.e., v(C x B) = K(B)ICI, where K(B) is the multiplying factor. Notice now that since p is a measure, K must also be a measure on .P(R' \ {0} ). Define v,(B) = v([0, t] x B), B e P4(18' \ {0}).
Since there are at most a finite number of jumps in [0, t] of size 1/n or larger, {uv^> Ii "i yv,(dy) is well-defined. However, in general, its limit may not exist pathwise. By appropriate centering, a limit may be shown to exist in L 2 . Finally, one may prove that this process built up from jumps is independent of the Brownian motion component. In this way one obtains the following. Theorem T.1.3. Let {X,} be a homogeneous Levy process. Then there is a standard Brownian motion {B,} independent of {v,(B): Be .4(IV \ {0}), 0 < t < T} such that
THEORETICAL COMPLEMENTS
355
X, = mt + /CT B 1 +
ff
Yv1(dY) —
'
^
1 +t y z K(dY) }
in distribution, where me R', Q 2 -> 0, and for fixed Be .ß(U8' \{0}), v 1 (B) is a Poisson process in t. Corollary. A probability distribution Q on I' is infinitely divisible if and only if the characteristic function Q() = f u eQ(dx), e W, can be represented as Q() =exp{ imt — 2tz + t
J
[e
`Y _ l_ +y vZ K(dY) j
J
for some me I8', a 2 >, 0, where K is a nonnegative measure on I8'. Example 1. K = 0. Then {X,} is Brownian motion with drift m and diffusion
coefficient a Z . Example 2 r = lim
y z K(dy)
e10 IYI>s 1 + y
exists. Then, X, = (m + r)t + 6 2 B, +
5
yv,(dy).
In this case
Ee' 4x, =exp{it(m+r)^—t
l
= lim
f
e10
+t
J -. [e'^Y - 1]K(dy)}.J J
Ifm+r= 0, a 2 = 0, then
Example 3.
X,
Z
yv,(dy)
and
Ee°
= exp{lim t J
(IYI>e)
(((sly
(e'4Y — 1)K(dy) (IYI>el )))
Example4. IfA=K(t')
J
Yv,(dy),
Ee`4x , = exp{2.t J
l
(e'SY — 1 ) 2-1 K(dY)}.
An important special class of (nondegenerate) homogeneous Levy processes {X1 } have the following basic scaling property: For each A > 0, there is a positive constant c x > 0 such that {c AX,u } and {X,} have the same distribution. These are the Levy stable processes. In view of the Levy—Khinchine formula we have the following theorem.
CONTINUOUS-PARAMETER MARKOV CHAINS
356
Theorem T.1.4. If {X,} is a homogeneous Levy stable process, then or
Ee'^ x ' = e"t'°
e - "`^ 2 or
exp{(—at + i jßlt )1 del
for some a>0,ßeW,0<0<2. Observe that in the first case X, = m, t >_ 0, is a.s. constant and c A = ‚.° = 1. In the second case {X1 } is Brownian motion with zero drift and c z = .1 -1 . In the third case we have c z = A `t e and examples are the Cauchy process (0 = 1) and the first-passage-time process for Brownian motion (0 = Z). Thorough treatments of processes with independent increments may be found in K. Ito (1984), Lectures on Stochastic Processes, Tata Institute of Fundamental Research, distributed by Springer-Verlag, New York, and in I. I. Gikhman and A. V. Skorokhod (1969), Introduction to the Theory of Random Processes, W. B. Saunders, Philadelphia. The coupling arguments follow T. C. Brown (1984), "Poisson Approximations and the Definition of the Poisson Process," Amer. Math. Monthly, 91, pp. 116-123. -
3. (Fractional Brownian Motion and Self-Similarity) The scaling property (or self-similarity) of Levy processes was extended to more general processes (including diffusions) in a classic paper by John Lamperti (1962), "Semistable Stochastic Processes," Trans. Amer. Math. Soc., 104, pp. 62-78. Lamperti used the term semistable to describe the class of processes {X,} with the property that for each 1 > 0 there is a positive scale coefficient c z and a location parameter b z such that {c A XA, + b x } and {X} have the same distribution. The defining property is often referred to as simple scaling or self-similarity. Stochastic processes with this property are of substantial current interest in many branches of mathematics and the physical sciences (see references below as well as references therein). One family of such processes of some notoriety is the class of fractional Brownian motions defined as follows. Let 0 <ß < I be a fixed parameter and define, for each s, t >_ 0, Pß(s, t) _
{IsI 2 + ItI Z Q — Is — tI 2 R}/2.
(T.1.9)
Then p ß is a positive-definite symmetric function. To see this, one need only check that there is a family of square-integrable functions y ß (t, •) on the real-number line such that for each s and t, p p (s, t) may be represented as an L 2 inner product
x) =
It — xlQ - 't 2 sign(t — x) + IxIß - 't z sign(x). (T.1.10)
One may then check the positive-definiteness using the bilinearity of the inner product; see, for example, M. Ossiander and E. Waymire (1989), "On Certain Positive Definite Kernels," Proc. Amer. Math. Soc., 107, 487-492. The fractional Brownian motion {Zr } with exponent ß may now be most simply defined as a mean-zero Gaussian process starting at 0 such that Z S and Z, have covariance given by pß (s, t). This class of processes was studied by Benoit Mandelbrot and others in connection with the Hurst effect problem of Section 1.14; see B. Mandelbrot (1982), The Fractal Geometry of Nature, Freeman, San Francisco (and references therein) and J. Feder (1988), Fractals, Plenum Press, New York. Additional reference: M. Taqqu (1986), "A Bibliographical Guide to Self-Similar Processes and Long Range Dependence," in Dependence in Statistics and Probability (E. Eberlein, M. Taqqu, eds.), Birkhäuser, Boston.
THEORETICAL COMPLEMENTS
357
Theoretical Complements to Section IV.5 (The Uperossing Inequality and (Sub) Martin gale Convergence) A fundamental property of monotone sequences of real numbers is that boundedness of the sequence implies the existence of a limit. An analogous property of submartingales also holds as will be developed below. Consider a sequence {Z.} of real-valued random variables and sigmafields .F1 c .F2 c • • . , such that Z. is . -measurable. Let a < b be an arbitrary pair of real numbers. An uperossing of the interval (a, b) by {Z„} is a passage to a value equal to or exceeding b from an earlier value equal to or below a. It is convenient to look at the corresponding process {X:= (Z^ — a) + }, where (Z„ — a) + = max{(Z„ — a), 0}. The uperossings of (a, b) by {Z„} are the uperossings of (0, b — a) by {X„}. The successive uperossing times >a 2k (k = 1, 2, ...) of {X} are defined by q l := inf{n _> I: X„ = 0}, 11 2 := inf{n >_ q: X„ _> b — a}, ?1 2k+1 := inf{n q2k: Xn = 0 } , rl2k+2'= inf{n 3 12k+1: X„ > b — a}. Then the rl k are {5„}-stopping times. Fix a positive integer N and define N = min{ry k , N},•k = 1, 2..... Then the z k are stopping times. Also, T k - N for k > [N/2] (the integer part of N/2) so that X X N for k > [N/2]. Let UN - U N (a, b) denote the number of uperossings of (a, b) of {Zn } by time N. That is, UN(a, b) := sup{k > 1: q 2k _< N} , (T.5.1) T k `= rlk A
with the convention that the supremum over an empty set is 0. Thus UN is also the number of uperossings of (0, b — a) by {X} in time N. Since X X N fork > [N/2], one may write (setting t 0 = 1) IN121 XN
— X1 = Y-
[N^/'2` 1
(XT2k-1 — X T2k-2 )
+ L^
k=1
(XT2k — Xr2k - 1 )•
(T.5.2)
k=1
To relate (T.5.2) to the number UN , let v denote the largest k such that ry k _< N, i.e., v is the last time _< N for an uperossing or a downcrossing. Notice that UN = [v/2]. Ifvis even, then XT2k — XT2k-, >b a if 2 k
N/21
(Xr2k
—
X T2k
-
1
)
>- [v/2](b
—
a) = UN(b
—
a).
(T.5.3)
k= 1
As a consequence, [N/21
XN
—
X1 >-
Y-
k=1
(XT2k
-
1
—
XT2k
-
2)
+ (b — a)UN .
(T.5.4)
Observe that (T.5.4) is true for an arbitrary sequence of random variables (or real numbers) {Z}. Assume now that {Z„} is a {.f„}-submartingale. Then {X„} is a {.F„}-submartingale by convexity. One may show, using the method of proof of Theorem 13.3 of Chapter I, that {X Tk : k >_ l} is a submartingale. Hence EX increasing in k, so that
E[
[N121 Y-
(XT2k-I — XT2k-2 ) ]
k=1
Applying this to (T.5.2) one obtains the following.
_> . 0
358
CONTINUOUS-PARAMETER MARKOV CHAINS
Proposition T.5.1. (Uperossing Inequality). Let {Z„} be a {S}-submartingale. For each pair a
E(Z N — a) + — E(Z, — a) + EIZNI + Ial
(T.5.5)
b—a
b—a
As an important consequence of this result we get the following theorem. Theorem T.5.2. (Submartingale Convergence Theorem). Let {Z„} be a submartingale such that EIZ„I is bounded. Then {Z„} converges a.s. to an integrable random variable Z. ❑ Proof. Let U(a, b) denote the total number of uperossings of (a, b) by {Zn }. Then UN (a, b) j U(a, b) as N I oo. Therefore, by the Monotone Convergence Theorem
(Chapter 0) EU(a,
b)
=
lim EUN (a, b) _< sup N1
N
EIZNI + lal
< co .
(T.5.6)
b — a
In particular, U(a, b) < cc almost surely, so that P(liminf Z n
(T.5.7)
Since this holds for every pair a, b = a + 1/m with a rational and m a positive integer, and the set of all such pairs is countable, one must have liminf Z. = limsup Z. almost surely. If lim Z. is not finite with probability 1, IZ,I —* oo with positive probability so that E limlZ„I = co. Then, by Fatou's Lemma (Chapter 0), lim EIZ„I = 00, which contradicts the hypothesis of the theorem. The integrability of IZJ follows from Fatou's Lemma. n Corollary. A nonnegative martingale {Z„} converges almost surely to a finite limit Z. Also, EZ, < EZ,. Proof. For a nonnegative martingale {Zn }, IZn l = Z. and therefore, sup EIZ„I = sup EZ„ = EZ, < oo. By Fatou's Lemma (Chapter 0), EZ ^ = E(lim Z„) S lim EZ, = n EZ,.
Convergence properties of supermartingales {X„} are obtained from the submartingale results applied to { — X.J. 2. (Martingales and Sample Path Regularity) The control over fluctuations available
through Doob's Inequality (Section 1.13) and the Uperossing Inequality (theoretical complement 1) for (sub/super) martingales make it possible to find versions of such processes with very regular sample path behavior. In this connection the martingale property is of fundamental importance for the construction of models of stochastic processes having manageable sample path properties (cf. theoretical complements 1.1 and 5.3). The basic result is the following. Recall the meaning of continuity in probability from theoretical complement 1.2.
359
THEORETICAL COMPLEMENTS
Theorem T.5.3. Let {X} be a submartingale or supermartingale with respect to an increasing family of sigmafields {.}. Suppose that {X,} is continuous in probability at each t _> 0. Then there is a process {X,} such that: (i) (Stochastic Equivalence) {X,} is equivalent to {X} in the sense that P(X, = = l for each t _> 0. (ii) (Sample Path Regularity) With probability I the sample paths of {Xt } are bounded on compact intervals a _< t < b, (a, b >_ 0), and are right-continuous and have left-hand limits at each t > 0. ❑ Proof: Fix T> 0 and let Q T denote the set of rational numbers in [0, T]. Write
Q T = U ^ , R", where each R. is a finite subset of [0, T] and T a R, c R 2 c • • • . By Doob's Maximal Inequality (Section 1.13) we have IS rl
P( max (X,I > AlE tE
n= 1,2 ... .
j
R,
Therefore,
P(
suP JX,I > n^ tEQT
lim P(max JX t j > A) <_ EIX TI 7
"-•IX:
/i.
(.Rn
In particular, the paths of {X,: t a Q T } are bounded with probability 1. Let (c, d) be any interval in R and let U (T) (c, d) denote the number of uperossings of (c, d) by the process {X,: t e Q T }. Then U (T) (c, d) is the limit of the number U ( " ) (c, d) of uperossings of (c, d) by { X,: t E R"} as n -* co. By the uperossing inequality (theoretical complement 1), one has EU 1 " ) (c, d)
EIXTI
+ j cj
d—c
Since, the Ul" ) (c, d) are nondecreasing with n, it follows that U (T) (c, d) is a.s. finite. Taking unions over all (c, d), with c, d rational, it follows that with probability one {X,: t a Q 7_} has only finitely many uperossings of any interval. In particular, therefore, left- and right-hand limits must exist at each t < T a.s. To construct a right-continuous version of {X,}, define X, = lim5.., ;.SEQT X, fort < T. That {X,} is in fact stochastically equivalent to {X1 } now follows from continuity in probability; i.e., X, = lim s -,. X, = X, since a.s. limits and limits in probability must a.s. coincide. Since T is arbitrary, the proof is complete. n 3. (Markov Processes and Sample Path Regularity) The martingale regularity theorem
of theoretical complement 2 can be applied to the problem of constructing Markov processes with regular sample path properties. Theorem T.5.4. Let {X,: t _> 0} be a Markov process with a compact metric state space S and transition probability function p(t;X,B)=P(Xt+s eBIXs =x),
xES, s,t->0, BE.^(S),
360
CONTINUOUS-PARAMETER MARKOV CHAINS
such that (i) (Uniform Stochastic Continuity) For each e 0, p(t; x, BE(x)) = o(1) as t -* 0 + uniformly for x e S, where B E (x) is the ball centered at x of radius e > 0. (ii) (Feller Property) For each (bounded) continuous function f, the function x -+ f s f(y)p(t; x, dy) = E x f (Xr ) is continuous. Then there is a version {X,} of {X,} a.s. having right continuous sample paths with left-hand limits at each t. 0
Proof. It is enough to prove that {X,} a.s. has left- and right-hand limits at each t, for then {X,} can be modified as X, = lim s .,+ XS which by stochastic continuity will provide a version of {X,}. It is not hard to check stochastic continuity from (i) and (ii). Let f E C(S) be an arbitrary continuous function on S. Consider the semigroup { T} acting on C(S) by T, f (x) = E x f (X,), x e S, t _> 0. For A > 0, write
R x f (x) =
f'
e 'Tf (x) ds, -
X E S;
.
R x (A > 0) is called the resolvent (Laplace transform) of the semigroup {T}. A basic property of the resolvent is that AR, behaves like the identity operator on C(S) for A large in the sense that
Ml— 1Rf II := sup 11(x) — ARJ.f(x)I = sup IJe ' 2 (f(x) — T.f(x)) ds xeS xes 0 l
f
^ e -» 2IIf — Tf II ds = f e - `Ilf — Tx-lTII dt.
o
(T.5.8)
0
For each z > 0, by the properties (i) and (ii), Ill - Tx- lTII - 0 as A - oo, and the integrand is bounded by 2e - TII f II since II 7;f II _< Ill II for any t >_ 0. By Lebesgue's Dominated Convergence Theorem (Chapter 0) it follows that III f - 2R x f II - 0 as A -+ oo; i.e., ARJ f - f uniformly on S as d --> oo. With this property of the resolvent in mind, consider the process { Y} defined by 1', = e
-,
"Rj(X,),
(T.5.9)
t >, 0,
where f e C(S) is fixed but arbitrary nonnegative function on S. Then {Y,} is a supermartingale with respect to ^, = a{X,: s 5 t}, t > 0, since EI YI < 00, Y, is .-measurable, one has T f (x) >, 0 (x e S. t >_ 0), and - zn +e >E{R f(X )^.3z}=e-xetF ThR f X) E{Y t+h 1.^}:=e r ^ r+h h r ,t ( r
= e -at f m e -xis+nt
J
m - zr T,+hf(X$) ds = e.
0
e-.0 f
e
-,
"T.f(X,) ds
n
e'Tf(X,)ds=Y.
(T.5.10)
^
.
Applying the martingale regularity result of Theorem T.5.3, we obtain a version { Y} of { Y} whose sample paths are a.s. right continuous with left-hand limits at each t.
THEORETICAL COMPLEMENTS
361
Thus, the same is true for {)e^'Y} _ {AR x f(X,)}. Since J.R A f -+ f uniformly as ;. -+ x, the process { f(X,)} must, therefore, a.s. have left and right-hand limits at each t. The same will be true for any f E C(S) since one can write f = f ' - f - with f + and f - continuous nonnegative functions on S. So we have that for each f e C(S), { f (X,)} is a process whose left- and right-hand limits exist at each t (with probability 1). As remarked at the outset, it will be enough to argue that this means that the process {X,} will a.s. have left- and right-hand limits. This is where compactness of S enters the argument. Since S is a compact metric space, it has a countable dense subset {x„}. The functions f„: S --• 68' defined by f(x) = p(x, x), XE S, are continuous for the metric p, and separate points of S in the sense that if x y, then for some n, f(X) ^ f,(y). In view of the above, for each n, { f,(X,)} is a process whose left- and right-hand limits exist at each t with probability 1. Thus, the countable union of events of probability 0 having probability 0, it follows that, with probability 1, for all n the left- and right-hand limits exist at each t for { f„(X,)}. But this means that with probability one the left- and right-hand limits exist at each t for {X} since the fn 's separate points; i.e., if either limit, say left, fails to exist at some t', then, by compactness of S, the sample path t -+ X, must have at least two distinct limit points as t --> t' , contradicting the corresponding property for all the processes { f„(X,)}, -
n=1,2,....
n
In the case that S is locally compact, one may adjoin a point at infinity, denoted A( S), to S. The topology of the one point compactification on S = S v {A} defines a neighborhood system for A by complements of compact subsets of S. Let . !(S) be the sigmafield generated by {, {A}}. The transition probability function p(t; x, B), (t >, 0, XE S. Be s(S)) is extended to p"(t; x, B) (t >, 0, XE S. B e 4(S)) by making A an absorbing state; i.e., p(t; A, B) = I if and only if Ac B, Be 4, otherwise p(t; A, B) = 0. If the conditions of Theorem T.5.4 are fulfilled for p, then one obtains a regular process {X,} with state space S. Defining r o =inf{t>0;X,=A},
the basic Theorem T.5.4 provides a process {X,: t < r o } with state space S whose sample paths are right continuous with left-hand limits at each t < r e . A detailed treatment of this case can be found in K. L. Chung (1982), Lectures from Markov Processes to Brownian Motion, Springer- Verlag, New York. While the one-point compactification is natural on analytic grounds, it is not always probabilistically natural, since in general a given process may escape the state space in a variety of manners. For example, in the case of birth-death processes with state space S = Z one may want to consider escapes through the positive integers as distinct from escapes through the negative integers. These matters are beyond the present scope. 4. (Tauberian Theorem) According to (5.44), in the case r >- 2 the generating function
h(v) can be expressed in the form h(v) - (1 - v) K((1 - v) ') as v -* 1 , (T.5.12) -
-
where p >- 0 and K(x) is a slowly varying function as x -+ oo; that is, K(tx)/K(t) --* 1 as t -• oo for each x > 0. Examples of slowly varying functions at infinity are constants, various powers of Ilog xl, and the coefficient appearing in (5.44) as a function of x = (1 - v) '. In the case r = 1, the generating function vh'(v) for {nh„} is also of -
362
CONTINUOUS-PARAMETER MARKOV CHAINS
this form asymptotically with p = 1. Likewise, in the case of (5.37) one can differentiate to get that the generating function of {(n + I)P(ML = n + 1)} is asymptotically of the form Z(1 — v) '' 2 as v -• 1. Let d(v) be the generating function for a sequence {a„} of nonnegative real numbers and suppose -
X
=I
ä(v)
a k v k ,
( T.5.13)
k=0
converges for 0 < v < 1. The Tauberian theorem provides the asymptotic growth of the sums as n
Y a k — (1/pl'(p))nP
(T.5.14)
as n -+ co,
k=0
from that of ä(v) of the form ä(v)
—
— v) oK((l — v) ')
(1
-
-
as v -. 1 (T.5.15) -
as in (T.5.12). Under additional regularity (e.g., monotonicity) of the terms {a„} it is often possible to deduce the asymptotic behavior of the terms (i.e., differenced sums) from this as a. — (1/I'(p))n° - '
as n -• oo.
(T.5.16)
An especially simple proof can be found in W. Feller (1971), An Introduction to Probability Theory and Its Applications, Vol. II, Wiley, New York, pp. 442-447. Using the Tauberian theorem one can compute the asymptotic form of the rth moments (r > 1) of An ' 12 L given ML = n as n -. oo. In fact, one obtains that for r > 1, -
I
E{»n - 'I 2 L ML = n} — (2/)'p* + (r)
as n - oo,
(T.5.17)
where y* + (r) is the rth moment of the maximum of the Brownian excursion process given in Exercise 12.8(iii) of Chapter I. Moreover, in view of the last equation in the hint following that exercise, one can check that the moments µ*+(r) uniquely determine the distribution function by checking that the moment-generating function with these coefficients has an infinite radius of convergence. From this one obtains convergence in distribution to that of the maximum of a (appropriately scaled) Brownian excursion. This result, which was motivated by considerations of the main channel length as an extreme value of a river network, can be found in V. K. Gupta, O. Mesa and E. Waymire (1990), "Tree Dependent Extreme Values: The Exponential Case,” J. Appl. Probability, in press. A more comprehensive treatment of this problem is given by R. Durrett, H. Kesten and E. Waymire (1989), "Random Heights of Weighted Trees," MSI Report, Cornell University, Ithaca. Problems of this type also occur in the analysis of tree search algorithms in computer science (see P. Flajolet and A. M. Odlyzko (1982), "The Average Height of Binary Trees and Other Simple Trees," J. Comput. System Sci., 25, pp. 171-213).
363
THEORETICAL COMPLEMENTS Theoretical Complements to Section IV.1I
1. A (noncanonical) probability space for the voter model is furnished by the graphical percolation construction. This approach was created by T. E. Harris (1978), "Additive Set-Valued Markov Processes and Percolation Methods," Ann. Probab., 6, pp. 355-378. To construct Q, first, for each m e Z', n belonging to the boundary set 0(m) of nearest neighbors to m, let f2 m , n denote the collection of right-continuous nondecreasing unit jump stepfunctions cn m.n (t), t >, 0, such that Wm(t) --* co as t -+ x. By the term "step function," it is implied that there are at most finitely many jumps in any bounded interval (nonexplosive). Let Pm ,, be the (canonical) Poisson probability distribution on (S 2 m.n• m.n) with intensity 1, where S m.n is the sigmafield generated by events of the form {w E Q m : co(t) s k}, t > 0, k = 0, 1,2,.... Define (1 ',Y', P') as the product probability space S2' = 11 ' 11 m.n, ' = H where the products fj are over m E 1d , n e 3(m). To get 52, remove from Q' any = ((Wm,n (t): t % 0): m E Z', n E 3(m)) such that two or more w m , n (t) have a jump at the same time t. Then f(= S2 n #') and P are obtained by the corresponding restriction to S2 (and measure-theoretical completion). The percolation flow structure can now be defined on (S2, , P) as in the text, but sample pointwise for each w = ((wm.n(t): t 0): m E 1 d ,
ne 3(n)) e 0. For a given initial configuration 1I E S let
D = D(q) :_ {m E 7L ° : 'im = -1 }. A sample path of the process started at '1. denoted a D( " ) (t, w) _ (a^°" ) (t, co): n E 71"), t > 0, co e S2, is given in terms of the percolation flow
on w by if the flow on as reaches (n, t) from some (m, 0), m E D,
I 1 +1
otherwise.
(T.11.1)
The Markov property follows from the following basic property of the construction: QD(^) (t + s, w) =
D^Q UI9I^S.CU %i
(t, Us w),
(T.11.2)
where, for co E 0 as above, U,: S2 --+ Q, is given by U,(co) = ((W m , n (t + s): t >- 0): me 7L ° , ne 3(n)).
(T.11.3)
The Markov property follows from (T.11.2) as a consequence of the invariance of P under the map U,: S2 -• 0 for each s; this is easily checked for the Poisson distribution with constant intensity and, because of independence, it is enough. 2. The distribution of the process {a(t)} started at it E S = { - 1, 1}'" is the induced probability measure P" on (S, .) defined by P(B) = P({6 D(1) (t)} E B),
Be..
(T.11.4)
A metric for S, which gives it the so-called product space topology, is defined by
--kJ.21ni—_._...
=
(T.11.5)
nEzd
where uni _ I(n l , ... , n d )I := max(1n 1 1, ... , In d l). Note that this metric is possible largely because of the denumerability of 7L ° . In any case, this makes the fact that, (i) ' is
364
CONTINUOUS-PARAMETER MARKOV CHAINS
the Bore! sigmafield and (ii) S is compact, rather straightforward exercises for this metric (topology). Compactness is in fact true for arbitrary products of compact spaces under the product topology, but this is a much deeper result (called Tychonoffs Theorem).
To prove the Feller property, it is sufficient to consider continuity of the mappings of the form a —' PQ (v„(t) = —1, ne F) at tt e S, for, fixed t >0 and finite sets F c V. Inclusion—exclusion principles can then be used to get the continuity of a -+ P0 (B) for all finite-dimensional sets B e F. The rest will follow from compactness (tightness). For simplicity, first consider the map a —* PP (a (t) _ —1), for F = {n}. Open neighborhoods of t) e S are provided by sets of the form, R A (1l) = {a e S: Qm = h m for all me A},
(T.11.6)
where A is a finite subset of 7L", say A={m=(m i ,...,m d )e7L ° :Im ; j
r>0.
(T.11.7)
Now, in view of the simple percolation structure for the voter model, regardless of the initial configuration il, one has Qn^^ ) (t) = Qm(n,t)( 0 )
a.s.,
(T.11.8)
where m(n, t) is some (random) site, which does not depend on the initial configuration, obtained by following backwards through the percolation diagram down and against the direction of arrows. Thus, if {r k } is a sequence of configurations in fl that converges to il in the metric p as k — oo, then o (t) —• vä,,,,, t) (t) a.s. as k —+ oo. Thus, by Lebesgue's Dominated Convergence Theorem (Chapter 0), one has 1 — 2 Pnk (Qfl(t) _ —1) = Ea 11k .")(t) = 1 — 2 Pn (Qn(t) _ —1).
Thus, P,) , (a„(t) = —1) —+ Pq (v„(t) = —1) as k —• co. 3. A real-valued function cp(m), m e ZL", such that for each m e Z",
2d
I
/
^ Ym)
(T.11.9)
4 (n) = q (m),
E
where 0(m) denotes the set of nearest neighbors of m, is said to be harmonic (with respect to the discrete Laplacian on 7Ld ). Note that (T.11.9) may be expressed as (p(Z O ) = E m cp(Z,) = E m rp(Zk ),
k>_ 1,
(T.11.10)
where {Z k } is the simple symmetric random walk starting at Z o = m. Theorem T.11.1. (Boundedness Principle). Let (p be a real-valued bounded harmonic ❑ function on 7L ° . Then cp is a constant function. Proof. Suppose that {(Zk, ZZ)} is a coupling of two copies of {Z k }; i.e., a Markov chain on S x S such that the marginal processes {Zk} and {Z'} are each Markov
THEORETICAL COMPLEMENTS
>,
365
chains on S with the same transition law as {Z}. Then, if r = inf{k 1: Zk = Zk} < o0 a.s. one may define Z; = Zk for all k > r without disrupting the property that the process {(Z;, Zk)} is a coupling of {Z k }. With this as the case, by the boundedness of cp and (T.11.10), one obtains
-
Ig(n) — 9(m)I = IEnp(Zk) — Emco(Zk)I = I E (n.m){q (Zk) — p(Zk) }I E(n,m)Iq(Zk) — w(Zk)I , 2BP(n.m)(Zk 0 Zk) = 2 BP(n.ml(z > k),
(T.l 1.11)
where B = sup.J(p(x)I. Letting k —+ oc it would then follow that q(n) = p(m). The success of this approach rests on the construction of a coupling {(Z, Zk)} with r < 00 a.s.; such a coupling is called a successful coupling. The independent coupling is the simplest to try. To see how it works, take d = 1 and let {Z} and {Z} be independent simple symmetric random walks on Z starting at n, m, n — m even. Then {(Zk, Zk)} is a successful coupling since {Zk — Zk} is easily checked to be a recurrent random walk using the results of Chapters II and IV or theoretical complement to Section 3 of Chapter 1. This would also work for d = 2, but it fails for d 3 owing to transience. In any case, here is another coupling that is easily checked to be successful for any d. Let {(Z, Zk)} be the Markov chain on 7L ° x Z d starting at (n, m) and having the stationary one-step transition probabilities furnished by the following rules of motion. At each time, first select a (common) coordinate axis at random (each having probability 1 /d). From (n, m) such that the coordinates of n, m differ along the selected axis, independently select ± directions (each having probability i) along the selected axis for displacements of the components n, m. If, on the other hand, the coordinates of n, m agree along the selected axis, then randomly select a common ± direction along the axis for displacement of both components n, m. Then the process {(Zk, Zk)} with this transition law is a coupling. That it is successful for all (n, m), whose coordinates all have the same parity, follows from the recurrence of the simple symmetric random walk on 1 1 , since it guarantees with probability 1 that each of the d coordinates will eventually line up. Thus, one obtains cp(n) = cp(m) for all n, m whose coordinates are each of the same parity. This is enough by (T.11.9) and the maximum principle for harmonic functions described below. •
_>
The following two results provide fundamental properties of harmonic functions given on a suitable domain D. Their proofs are obtained by iterating the averaging property out to the boundary, just as in the one-dimensional case of Exercise 3.16, Chapter 1. First some preliminaries about the domain. We require that D = D ° u 33D, where D ° , CD are finite, disjoint subsets of Z d such that (i) 3(n) c D for each n e D , where 3(n) denotes the set of nearest neighbors of n; (ii) ä(m) n D ° 0 Q for each me OD; (iii) for each n, me D there is a path of respective nearest neighbors n„ n 2 , ... , n k in D such that n, e 0(n), n k e 0(m).
°
°
Theorem T:11.2. (Maximum Principle). A real-valued function cp on a domain D, as
366
CONTINUOUS-PARAMETER MARKOV CHAINS
°
defined above, that is harmonic on D , i.e., p(m)
_—m
2d neB(ml
(P(n),
takes its maximum and minimum values on D.
m e D°, ❑
In particular, it follows from the maximum principle that if cp also takes an extreme value on D , then it must be constant on D ° . This can be used to complete the proof of Theorem T. 11.1 by suitably constructing a domain D with any of the 2 d coordinate parities on D and öD desired.
°
°
Theorem T.11.3. (Uniqueness Principle). If cp, and Q 2 are harmonic functions on D, as defined in Theorem T.11.2, and if q, 1 = cp 2 on OD, then gyp, = c0 2 on D. ❑ Proof. To prove the uniqueness principle, simply note that q - 42 is harmonic with zero boundary values and hence, by the Maximum Principle, zero extreme values.
n
The coupling described in Theorem T.11.1 can be found in T. Liggett (1985), Interacting Particle Systems, Springer-Verlag, New York, pp. 67-69, together with another coupling to prove the boundedness principle (Choquet-Deny theorem) for the more general case of an irreducible random walk on V.
4. Additional references: The voter model was first considered in the papers by P. Clifford and A. Sudbury (1973), "A Model for Spatial Conflict," Biometrika, 60, pp. 581-588, and independently by R. Holley and T. M. Liggett (1975), "Ergodic Theorems for Weakly Interacting Infinite Systems and the Voter Model," Ann. Probab., 3, pp. 643-663. The approach in section 11 essentially follows F. Spitzer (1981), "Infinite Systems with Locally Interacting Components," Ann. Probab., 9, 349-364. The so-called biased voter (tumor-growth) model originated in T. Williams and R. Bjerknes (1972), "Stochastic Model for Abnormal Clone Spread Through Epithelial Basal Layer," Nature, 236, pp. 19-21. Much of the modern interest in the mathematical theory of infinite systems of interacting components from the point of view of continuous-time Markov evolutions was inspired by the fundamental papers of Frank Spitzer (1970), "Interaction of Markov Processes," Advances in Math., 5, pp. 246-290, and by R. L. Dobrushin (1971), "Markov Processes With a Large Number of Locally Interacting Components," Problems Inform. Transmission, 7, pp. 149-164, 235-241. Since then several books and monographs have been written on the subject, the most comprehensive being that of T. Liggett (1985), loc. cit. Other modern books and monographs on the subject are those of F. Spitzer (1971), Random Fields and Interacting Particle Systems, MAA, Summer Seminar Notes; D. Griffeath (1979), Additive and Cancellative Interacting Particle Systems, Lecture Notes in Math., No. 724, Springer-Verlag, New York; R. Kindermann and J. L. Snell (1980), Markov Random Fields and Their Applications, Contemporary Mathematics Series, Vol. 1, AMS, Providence, RI; R. Durrett (1988), Lecture Notes on Particle Systems and Percolation, Wadsworth, Brooks/Cole, San Francisco.
CHAPTER V
Brownian Motion and Diffusions
I INTRODUCTION AND DEFINITION One-dimensional unrestricted diffusions are Markov processes in continuous time with state space S = (a, b), — oo < a 0 introduced in Chapter I as a limit of simple random walks. In this case, S = (—cc, cc). The transition probability distribution p(t; x, dy) of Ys+ , given Ys = x has a density given by p(i; x, y) _
1
(2tto 2 t) i / 2
I
e -2Q cv- x -gt)2.
(1.1)
Because { Y} has independent increments, the Markov property follows directly. Notice that (1.1) does not depend on s; i.e., {Y} is a time-homogeneous Markov process. As before, by a Markov process we mean one having a time-homogeneous tranition law unless stated otherwise. One may imagine a Markov process {X,} that has continuous sample paths but that is not a proc,ss with independent increments. Suppose that, given Xs = x, for (infnitesimJl) small times t, the displacement X s +, — Xs = Xs+, — x has mean and variance approximately tp(x) and ta 2 (x), respectively. Here p(x) and a 2 (x) are function;; of the state of x, and not constants as in the case of {Y1 }. The distinction between { Y} and {X,} is analogous to that between a simple random walk and a birth-death chain. More precisely, suppose E;X + , — Xs X., = x) = tp(x) + o(t), 3
E((X X+, — XS) 2 I X = x) = ta 2 (x) + o(t), X
E(I);s +1— Xsl' I XS = x) = o(t), hold, as t j 0, for every x e S.
(1.2)
367
368
BROWNIAN MOTION AND DIFFUSIONS
Note that (1.2) holds for Brownian motions (Exercise 1). A more general formulation of the existence of infinitesimal mean and variance parameters, which does not require the existence of finite moments, is the following. For every c > 0 assume that E((X5+, — XS) 1 {Ixa.,-xsl_<E} j XS = x) = tµ(x) + o(t), E((X.+: — )S)21IIX.,,-xslsE} X = x) = ta 2 (x) + o(t),
(1.2),
P(IXs„,—XI>E1Xs=x)=o(t), hold as t j 0. It is a simple exercise to show that (1.2) implies (1.2)' (Exercise 2). However, there are many Markov processes with continuous sample paths for which (1.2)' hold, but not (1.2). Example 1. (One-to-One Functions of Brownian Motion). Let {B 1 } be a standard one-dimensional Brownian motion and cp a twice continuously differentiable strictly increasing function of (—cc, oo) onto (a, b). Then {Xj :_ {cp(B,)} is a time-homogeneous Markov process with continuous sample paths. Take p(x) = e' 3 . Now check that EIX^I = cc for all t > 0, so that (1.2) does not hold. On the other hand, (1.2)' does hold (as explained more generally in Section 3). Definition 1.1. A Markov process {X} on the state space S = (a, b) is said to be a diffusion with drift coefficient µ(x) and diffusion coefficient v 2 (x) > 0, if (i) it has continuous sample paths, and (ii) relations (1.2)' hold for all x. If the transition probability distribution p(t; x, dy) has a density p(t; x, y), then, for (Borel) subsets B of S, p(t; x, B) = J p(t; x, y) dy.
(1.3)
e
It is known that a strictly positive and continuous density exists under the Condition (1.1) below, in the case S = (— co, oo) (see theoretical complement 1). Since any open interval (a, b) can be transformed into (— oo, oo) by a strictly increasing smooth map, Condition (1.1) may be applied to S = (a, b) after transformation (see Section 3). Below, a(x) denotes either the positive square root of a 2 (x) for all x or the negative square root for all x. Condition (1.1). The functions µ(x), a(x) are continuously differentiable, with bounded derivatives on S = (— oo, oo). Also, a” exists and is continuous, and
369
INTRODUCTION AND DEFINITION
a 2 (x) > 0 for all x. If S = (a, b), then assume the above conditions for the relabeled process under some smooth and strictly increasing transformation onto (— oo, co ).
Although the results presented in this chapter are true under (1.2)', in order to make the calculations less technical we will assume (1.2). It turns out that Condition (1.1) guarantees (1.2). From the Markov property the joint density of X,,, X, 2 , ... , X, for 0 < t l < t 2 < . • • < t„ is given by the product p(tl;x,Yi)p(t2 — t1;Yi,Y2)"'p(tn — tn-1;Y,ß-1,Yn)• Therefore, for an initial distribution n, P,,(XX,EB,,...,X,EB.)
f
s e,
..
p(tl;x,Y)
...
p(t„ — t^-1;Yn-1,Yn)dy ...dy e n(dx). (1.4)
JB,,
As usual P,, denotes the distribution of the process {X} for the initial distribution rt. In the case it = S.. we write PX in place of P6 . Likewise, E• E X are the corresponding expectations. Example 2. (Ornstein-Uhlenbeck Process). Let V, denote the random velocity of a large solute molecule immersed in a liquid at rest. For simplicity, let V denote the vertical component of the velocity. The solute molecule is subject to two forces: gravity and friction. It turns out that the gravitational force is negligible compared to the frictional force exerted on the solute molecule by the liquid. The frictional force is directed oppositely to the direction of motion and is proportional to the velocity in magnitude. In the absence of statistical fluctuations, one would therefore have m(dV,/dt) = —ßV„ where m is the mass of the solute molecule and ß is the constant of proportionality known as the coefficient of friction. However, this frictional force —ßV may be thought of as the mean of a large number of random molecular collisions. Assuming that the central limit theorem applies to the superposition of displacements due to a large number of such collisions, the change in momentum m dV over a time interval (t, t + dt) is approximately Gaussian, provided dt is such that the mean number of collisions during (t, t + dt) is large. The mean and variance of this Gaussian distribution are both proportional to the number of collisions, i.e., to dt. Therefore, one may take the (local) mean of V to be —(ß/m) V, dt and the (local) variance to be a 2 dt. E(V
+d ,
— V I V = V) = — (ß/m)V dt + o(dt),
(1.5) E((V
+d ,
— V) 2 V = V) = a 2 dt + o(dt).
Therefore, a reasonable model for the velocity process is a diffusion with drift (—ß/m)V and diffusion coefficient a 2 > 0, called the Ornstein-Uhlenbeck process.
370
BROWNIAN MOTION AND DIFFUSIONS
An important problem is to determine the transition probabilities for diffusions having given drift and diffusion coefficients. Various approaches to this problem will be developed in this chapter, but in the meantime for the Ornstein-Uhlenbeck process we can check that the transition function given by
1(y 2 — x _ßm^ 't)2
p(t; x, y) = [nw 2 ß -1 m(1 — exp{-2ßm"'t})] - 'I 2 exp — ßm a (1 —e
)
(t > 0, — oo < x, y < oo), (1.6)
furnishes the solution. In other words, given V = x, L' Gaussian with mean xe - ß'" ` and variance ZQ 2 (1 — e_ 2 ß'°``)ß -1 m. From this one may check (1.5) directly.
Example 3. (A Nonhomogeneous Diffusion). Suppose {X} is a diffusion with drift µ(x) and diffusion coefficient a 2 (x). Let f be a continuous function on [0, co). Define
f
Z, =X,+
f(u)du,
(1.7)
t30.
o
Then, since f is a deterministic function, the process {Z 1 } inherits the Markov property from {XX }. Moreover, {Z} has continuous sample paths. Now, E{Z5+ ,—Z3 1Z ,=z}=E{Xs+,—XXIX„=z— o
(
=µ^z—
J
£
f(u)du}+
f
"+ t f (u) du
s
f(u)du)t+f(s)t+o(t).
(1.8)
0
Also, E{(Z+t—ZS) 2 lZ3=z}=Ej(Xs+1—X.,) 2 X.,=z—
l
+
Jf (u)du} o
(f
.'
f(u) du ) /I /
s
= a 2 (z —
+o(t)
J
S
f(u) du)t + o(t).
(1.9)
0
Similarly, E{IZs+, — Zs 1 3 Z., = z} = o(t).
(1.10)
In this case {Z} is referred to as a diffusion with (nonhomogeneous) drift and
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
371
diffusion coefficients, respectively, given by µ(s, z) =p z —
\
J
du) + f(s),
Js f (u) du I .
(1.11)
v 2 (s, z) = a 2 (z \ o —
Note that if {X,} is a Brownian motion with drift µ and diffusion coefficient a 2 , and if f (t) := v is constant, then {Z,} is Brownian motion with drift µ + v
and diffusion coefficient o 2 . A rigorous construction of unrestricted diffusions by the method of stochastic differential equations is given in Chapter VII. An alternative method, similar to that of Chapter IV, Sections 3 and 4, is to solve Kolmogorov's backward or forward equations for the transition probability density. The latter is then used to construct probability measures P,, Px on the space of all continuous functions on [0, oo). The Kolmogorov equations, derived in the next section, may be solved by the methods of partial differential equations (PDE) (see theoretical complement 1). Although the general PDE solution is not derived in this book, the method is illustrated by two examples in Section 5. A third method of construction of diffusions is based on approximation by birth—death chains. This method is outlined in Section 4. Diffusions with boundary conditions are constructed from unrestricted ones in Sections 6 and 7 by a probabilistic method. The PDE method, based on eigenfunction expansions, is described and illustrated in Section 8. The aim of the present chapter is to provide a systematic development of some of the most important aspects of diffusions. Computational expressions and examples are emphasized. The development proceeds in a manner analogous to that pursued in previous chapters, and, as before, there is a focus on large-time behavior. 2 KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
In Section 2 of Chapter IV, Kolmogorov's equations were derived for continuous-parameter Markov chains. The backward equations for diffusions are derived in Subsection 2.1 below, together with an important connection between Markov processes and martingales. The forward, or Fokker—Planck, equation is obtained in Subsection 2.2. Although the latter plays a less important mathematical role in the study of diffusions, it arises more naturally than the backward equation in physical applications. 2.1 The Backward Equation and Martingales
Suppose {X} is a Markov process on a metric space S, having right-continuous sample paths and a transition probability p(t; x, dy). Continuous-parameter
372
BROWNIAN MOTION AND DIFFUSIONS
Markov chains on countable state spaces and diffusions on S = (a, b) are examples of such processes. On the set B(S) of all real-valued, bounded, Borel measurable function f on S define the transition operator (Ttf)(x):= E(f(X) ( Xo = x) =
f
f (y)p(t; x, dy),
(t > 0).
(2.1)
Then T, is a bounded linear operator on B(S) when the latter is given the sup norm defined by
il f II := sup{II(Y)I : Y e S} .
(2.2)
Indeed T, is a contraction, i.e., IITrf II < IIfit for all f E B(S). For,
I(T,.f)(x)I
5 J I.f (Y)I P(t;, x, dy) < II f 1I .
(2.3)
The family of transition operators {T,: t > 0} has the semigroup property, Ts+, = TSTt,
(2.4)
where the right side denotes the composition of two maps. This relation follows from (Ts+,.f )(x) = E(f(Xs+,)1 Xo = x) = E[E(.f(X5+^) J XS) 1 Xo = x] = E[(T,f)(X.,) Xo = x] = T.,(T1.f)(x). (2.5)
The relation (2.4) also implies that the transition operators commute, TT, = T A T,.
(2.6)
As discussed in Section 2 of Chapter IV for the case of continuous-parameter Markov chains, the semigroup relation (2.4) implies that the behavior of T, near t = 0 completely determines the semigroup. Observe also that if f e Cs(S), the set of all real bounded continuous functions on S, then (T j)(x) - E(f (XX ) I X0 = x) —+ f (x) as t j 0, by the right continuity of the sample paths. It then turns out, as in Chapter IV, that the derivative (operator) of the function t --+ T, at t = 0 determines {T,: t > 0} (see theoretical complement 3). Definition 2.1. The infinitesimal generator A of {T 1 : t > 0}, or of the Markov process {X1 }, is the linear operator A defined by
(A.f)(x) = lim (Ts i)(x) — i(x) > s i o S
(2.7)
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
373
for all f e B(S) such that the right side converges to some function uniformly in x. The class of all such f comprises the domain ^A c?f A. In order to determine A explicitly in the case of a diffusion {X,}, let f be a bounded twice continuously differentiable function. Fix x e S, and a ä > 0, however small. Find r > 0 such that I f"(x) — f"(y)I < b for all y such that ly — xl < e. Write (T,f)(x) = E(.f(X,)I(lx,-xp,E) I Xo = x) + E(f(XX)Itlx,-xl>E} I Xo = x). (2.8) Now by a Taylor expansion of f(X1 ) around x, E(.f (XX) 1 nx, -x1
Xo = x) = E[{ f(x) + (X — x).f'(x) + z(X, — x) 2
f „(x)
+ 2(XI — x) 2 (f " (^,) — f ,, (x))} 1 11x xl Vie} I Xo = x], (2.9)
where ^, lies between x and X. By the first two relations in (1.2)', the expectations of the first three terms on the right add up to f(x) + tj,(x)f , (x) + tza 2 (x)f"(x) + 0(t),
(t 0).
(2.10)
The expectation of the remainder term in (2.9) is less than — x) 2 lilx^ x^sEt 1 Xo = x) = 8Q 2 (x)t + o(t),
(t j0). (2.11)
The relations (2.8)-(2.11) lead to
um
(Tj)(x) - f(x) _
( l o t
{u(x)f'(x) + z6 2 (x)f "(x)} -562 1 x) 2 (
+ liro (lo
Ilfll E(1 t
(fix,—xl>el
X o
x). (2.12)
The Firn on the right is zero by the last relation in (1.2)'. As >0 is arbitrary it now follows that Firn
rlo
(T^.1)(x)-
.l (x) = µ(x)1 '(x) + ia 2 (x)f "(x).
(2.13)
t
Does the above computation prove that a bounded twice continuously differentiable f belongs to -9A ? Not quite, for the limit (2.13) has not been shown to be uniform in x. There are three sources of nonuniformity. One is that the o(t) terms in (1.2)' need not be uniform in x. The second is that the o(t) term in (2.10) may not be uniform in x, even if those in (1.2)' are; for µ(x)f'(x), Q 2 (x)f"(x) may not be bounded. The third source of nonuniformity arises from
374
BROWNIAN MOTION AND DIFFUSIONS
the fact that given b > 0 there may not exist an e independent of x such that I f() — f (x)I <5 for all x, y satisfying Ix — yI < e. The third source is removed by requiring that f" be uniformly continuous. Assume p(x)f'(x), r 2 (x)f "(x) are bounded to take care of the second. The first source is intrinsic. One example where the errors in (1.2)' are o(t) uniformly in x, is a Brownian motion (Exercise 4). In the case the Brownian motion has a zero drift, the second source is removed if f" is bounded. Thus, for a Brownian motion with zero drift, every bounded f having a uniformly continuous and bounded f" is in 9A . Indeed, it may be shown that 27A comprises precisely this class of functions. Similarly, if the Brownian motion has a nonzero drift, then f e .9A if f, f ', f " are all bounded and f" is uniformly continuous (Exercise 4). In general one may ensure uniformity of o(t) in (1.2)' by restricting to a compact subset of S. Thus, all twice continuously differentiable f, vanishing outside a closed and bounded subinterval of S, belong to 9A . We have sketched a proof of the following result (see theoretical complements 1, 3 and Exercise 3.12 of Chapter VII).
Proposition 2.1. Let {X} be a diffusion on S = (a, b). Then, all twice
continuously differentiable f, vanishing outside a closed bounded subinterval of S, belong to -9A , and for such f (Af )(x) = u(x).f'(x) + ?v 2 (x)f"(x). (2.14)
In what follows, the symbol A will stand, in the case of a diffusion {X}, for the second-order differential operator (2.14), and it will be applied to all twice-differentiable f, whether or not such an f is in _9A • Turning to the derivation of the backward equation, note that the arguments leading to (2.13) hold for all bounded f that are twice continuously differentiable. If the function x -+ (T,f )(x) is bounded and twice continuously differentiable, then applying (2.13) to this function one gets
^t (T, f)(x) = A(T, f)(x), (t > 0, x e S). (2.15)
This is Kolmogorov's backward equation for the function (t, x) —• (T, f)(x) (t > 0, x e S). It will be shown below that if f E.9,, then T, f e Q. The following proposition establishes this fact for general Markov processes.
Proposition 2.2. Let {T,} be the family of transition operators for a Markov process on a metric space S. If f E 22 A then the backward equation (2.15) holds.
375
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
Proof. If f e -9A then using (2.6) and (2.3),
TS(T,f) —T^f
—
T(A.f) = T,( T
.i —Af
s is
/
Ts f — f
— Af
0
ass j0, (2.16)
which shows that T, f e _9A and, therefore, by the definition of A, (2.15) holds. n
The relation (2.16) also shows that A(T,f) = T 1 (Af),
(2.17)
i.e., T, and A commute on .9A • This fact is made use of in the proof of the following important result. Theorem 2.3. Let {X} be a right-continuous Markov process on a metric space S, and f E -9A . Assume that s —> (A f)(X ) is right-continuous for all sample paths. Then the process {Z1 }, where S
Z,'=
f(X,)
—
J
r (Af)(X„) ds 0
(2.18)
(t > 0 ),
is a {.yt }-martingale, with ^ == Q{X: 0 < u <, t}. Proof. The assumption of right continuity of s —• (A f)(X.,) ensures the integrability and .F,-measurability of the integral in (2.18) (theoretical complement 2). Now E(Zs+, I Js) = E(.f(Xs+,) 1 `ys ) — Jo (Af)(XX) du — E fS+t (A.f )(X.) du 1 5;
= (T,.f)(X,) —
Jo
(Af )(Xu) du — E \Jo (A.f )(XS )„ du I .ys ) , (2.19)
where XS is the after-s or shifted process (X) = XS+ „ (u > 0). By the Markov property, £ (A.f)(XS )„ du 1 5s J = E
Ex [
£
(A.f)(X^.) du]X=x,.
LJ o (EXA.f )(X.) du].=x.
=
LJ .
o
P T (Af)(x) dul
s X-x
(2.20)
376
BROWNIAN MOTION AND DIFFUSIONS
interchanging the order of integration with respect to Lebesgue measure and taking expectation for the second equality (Fubini's Theorem, Section 4 of Chapter 0). Now replace the last integrand T„(Af) by A(T„ f) (see (2.17)) to get E\Jo(Af)(Xs )udul ysl —JoA(Tuf)(XS) du.
(2.21)
Substituting (2.21) in (2.19) obtain E(Zs+: 1 5s) = (T^f)(XS) —
J
A(Tf)(X) du —
f
(A.f )(X„) du. (2.22)
o
0
By the backward equation (2.15), which holds by Proposition 2.2, and the fundamental theorem of calculus,
(T,f)(XS) = f(X ) + J r a (T,f)(X.,) du = f(X) + 5
o
au
f,
A(T,.f)(XS)du. (2.23)
o
Use this in (2.22) to derive the desired result,
E(Zs+J .gis) = .f(XS) — J s (A.f )(X.) du = Z.
n
0
Combining Theorem 2.3 and Proposition 2.1, the following corollary is immediate.
Corollary 2.4. If Condition (1.1) holds for a diffusion {X,}, then for every twice continuously differentiable f vanishing outside a compact subset of S, the process {Z} in (2.18) is a {.F,}-martingale. It will be shown in Section 3 of Chapter VII by direct probabilistic arguments that {Z} is a {.F}-martingale for a much wider class of twice-differentiable functions f than prescribed by the corollary. As a simple example, take {X,} to be a Brownian motion with drift p and diffusion coefficient 6 Z . Then
A=2o.2 aZ2+p
dx
d
.
dx
If one takes f(x) = x in (2.18), then Zt = XX — tp (t >, 0), which is a martingale, even though f is unbounded. In the case p = 0, one may take f (x) = x 2 to get Z, = Xt — ta 2 (t >, 0), which is also a martingale. Still, one can do a lot under the assumption of Corollary 2.4. One application is the following.
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
377
Proposition 2.5. Let {X,} be a diffusion on S = (a, b), whose coefficients satisfy Condition (1.1). Let [c, d] c (a, b). Then
f f
d
^
Px ({X,}
reaches c before d) =
J^
exp —
J - 2µ(y ('
'
`
)
d> dz
o ? y -(
)
)
,
c < x <, d.
2u( y) dy1 dz d exp^— f Z )) a c 2 (y) (2.24)
Proof. Define a twice continuously differentiable function f that equals the right side of (2.24) for c < x < d, and vanishes outside [c — s, d + c] c (a, b) for some s > 0. This is always possible as f" is continuous at c, d (Exercise 3). By Corollary 2.4, Z,:= f(X,) —
f
(A.f)(X,) ds
(t 3 0 ),
(2.25)
o
is a {.F,}-martingale. Define the {}-stopping time (see Chapter I (13.68)), z
=
> 0: X, = c or d}.
inf{t
(2.26)
Let x E (c, d). By the optional stopping result Proposition 13.9 of Chapter I
E x Zt = E x Z o .
(2.27)
But Z o = f(X0) - f(x) under PX , so that E.,Z 0 is the right side of (2.24). Now check that Af(x) = 0 for c <x < d, so that (Af)(XS ) = 0 for 0 < s
so that EZL is simply the left side of (2.24).
on {XT = c}, on {X, = d},
(2.28)
n
As in Section 9 of Chapter I and Section 2 of Chapter III, one may derive criteria for transience and recurrence based on Corollary 2.4. This will be pursued later in Sections 4, 9, and 14. The martingale method will be more fully explored in Chapter VII.
2.2 The Fokker—Planck Equation Consider a diffusion {X,} on S = (a, b) whose coefficients satisfy Condition (1.1). Let p(t; x, y) be the transition probability density of {X,}. Letting f = l 8 in
378
BROWNIAN MOTION AND DIFFUSIONS
(2.5) leads to
J
=J I p(t;zydy)ps;xz) dz ss \
+ t;x,y)dy SB
=
J(J p( a
t; z, y)p(s; x, z) dz) dy,
s
(2.29)
//
for all Borel sets B c S. This implies the so-called Chapman—Kolmogorov equation
p(s + t; x, y) = J p(t; z, y)p(s; x, z) dz = (T3.f )(x),
(2.30)
s
where f (z) := p(t; z, y). A somewhat informal derivation of the backward equation for p may be based on (2.13) and (2.30) as follows. Suppose one may apply the computation (2.13) to the function f in (2.30). Then (Af)(x)
=
lim slo
(Tsf)(x) — i(x) . s
( 2.31)
But the right side equals äp(t; x, y)/öt, and the left side is Ap(t; x, y). Therefore,
ap(t; x, y) at
_
aP(t; x, y ) 1 p(x)
ax
+ i s
2
2
P(t; x, y)
(x) a
axe
(2.32)
which is the desired backward equation for p. In applications to physical sciences the equation of greater interest is Kolmogorov's forward equation or the Fokker—Planck equation governing the probability density function of X, when X0 has an arbitrary initial distribution it. Suppose for simplicity that it has a density g. Then the density of X, is given by g(x)p(t;
f
b
x, y) (T*g)(y):=
dx.
(2.33)
a
The operator T* transforms a probability density g into another probability density. More generally, it transforms any integrable g into an integrable function T*g. It is adjoint (transpose) to T, in the sense that
J s(T7g)(y)f(y) dy = fs g(x)(T:f)(x) dx =
Here = f u(x)v(x) dx. If f is twice continuously differentiable and vanishes outside a compact subset of S, then one may differentiate with respect to t in
KOLMOGOROV'S BACKWARD AND FORWARD EQUATIONS, MARTINGALES
379
(2.34) and interchange the orders of integration and differentiation to get, using (2.17),
`a T*9,.i) _ (T f)=
=
f
(T *9)(Y)(Af)(Y) dy.
(2.35)
Now, assuming that f, h are both twice continuously differentiable and that f vanishes outside a finite interval, integration by parts yields
=f
s
h(Y)[p(Y).f'(Y) + 1 2 (Y).f"(Y)] dY 2
— Js L
Y
(.u(Y)h(Y)) + dy2 (2a2(Y)h(Y)) f(Y) dy = , (2.36)
where A* is the formal adjoint of A defined by i d (A*h)( y ) _ — (,u(Y)h(Y)) + d yZ (ia 2 (Y)h(Y)). dy
(2.37)
Applying (2.36) in (2.35), with T*g in place of h, one gets
a at T
*g,.i = .
(2.38)
Since (2.38) holds for sufficiently many functions f, all infinitely differentiable functions vanishing outside some closed, bounded interval contained in S for instance, we get (Exercise 1),
öt (T*9)(Y) = A * (T*9)(Y)•
(2.39)
That is,
Js
at
g(x) dx
— Js L —
I ä2 ay (µ(Y)P(t; x, Y)) 2 aye ( 62 (Y)P(t; x, Y)) 9(x) dx. (2.40)
Since (2.40) holds for sufficiently many functions g, all twice continuously differentiable functions vanishing outside a closed bounded interval contained in S for instance, we get Kolmogorov's forward equation for the transition
380
BROWNIAN MOTION AND DIFFUSIONS
probability density p ( Exercise 1), ap(t; x, y) —
ät
ö ä2 —ôy (h(y)p(t; x, y)) + a y2 (icr 2 (y)p(t; x, y)),
(t > 0). (2.41)
For a physical interpretation of the forward equation (2.41), consider a dilute concentration of solute molecules diffusing along one direction in a possibly nonhomogeneous fluid. An individual molecule's position, say in the x-direction, is a Markov process with drift u(x) and diffusion coefficient a 2 (x). Given an initial concentration c o (x), the concentration c(t, x) at x at time t is given by c(t, x) =
f
c o (z)p(t; z, x) dz,
(2.42)
s
where p(t; z, x) is the transition probability density of the position process of an individual solute molecule. Therefore, c(t, x) satisfies Kolmogorov's forward equation ac(t, x)
A*c(t, x)
ax J(t, x),
(2.43)
with J given by J(t, x) _ —
ä
[1 2 (x)c(t,
x)) + µ(x)c(t, x)].
(2.44)
The Kolmogorov forward equation is also referred to as the Fokker—Planck equation in this context. Now the increase in the amount of solute in a small region [x, x + Ax] during a small time interval [t, t + At] is approximately öc(t, x)
öt
At Ax.
(2.45)
On the other hand, if v(t, x) denotes the velocity of the solute at x at time t, moving as a continuum, then a fluid column of length approximately v(t, x) At flows into the region at x during [t, t + At]. Hence the amount of solute that flows into the region at x during [t, t + At] is approximately v(t, x)c(t, x) At, while the amount passing out at x + Ax during [t, t + At] is approximately v(t, x + Ax)c(t, x + Ax) At. Therefore, the increase in the amount of solute in [x, x + Ax] during [t, t + At] is approximately [v(t, x)c(t, x) — v(t, x + Ax)c(t, x + Ax)] At.
(2.46)
TRANSFORMATION OF THE GENERATOR
381
Equating (2.45) and (2.46) and dividing by At Ax, one gets öc( öt x)
äx (v(t,
x)c(t, x)).
(2.47)
Equation (2.47) is generally referred to as the equation of continuity or the equation of mass conservation. The quantity v(t, x)c(t, x) is called the flux of the solute, which is seen to be the rate per unit time at which the solute passes out at x (at time t) in the positive x-direction. In the present case, therefore, the flux is given by (2.44).
3 TRANSFORMATION OF THE GENERATOR UNDER RELABELING OF THE STATE SPACE In many applications the state space of the diffusion of interest is a finite or a semi-infinite open interval. For example, in an economic model the quantity of interest, say price, demand, supply, etc., is typically nonnegative; in a genetic model the gene frequency is a proportion in the interval (0, 1). It is often the case that the end points or boundaries of the state space in such models cannot be reached from the interior. In other words, owing to some built-in mechanism the boundaries are inaccessible. A simple way to understand these processes is to think of them as strictly monotone functions of diffusions on S = R 1 . Let d 2 d +µ(x)x
(3.1)
A=162(x)dx2
be the generator of a diffusion on S = (—co, oo). That is to say, we consider a diffusion with coefficients µ(x), a 2 (x). Let 0 be a continuous one-to-one map of S onto S = (a, b), where — oo < a 0, then for every r> 2 one has E(IX , —
XX = x) = o(t)
as t10,
(3.2)
for all e'>0. Proof. Let r> 2, E' > 0 be given. Fix 0 > 0. We will show that the left side of
(3.2) is less than 6t for all sufficiently small t. For this, write
382
BROWNIAN MOTION AND DIFFUSIONS
(5= (0/(2 r 2 (x))) 1 " ( r -2) . Then the left side of (3.2) is no more than E(IXS+1 — Xsl'lnx.+t—xsl,a) ( XS = x)
r
+ E(IXs+r — X 1 (Ixs+t — xsls^xs.t—xsl>S) Xs = x)
S' 2 (1 X., — X521(1
xsi,ts) 1 XX = x) + E 'r P(IX., — XJ > (5 XS = x)
(o2(x)t + o(t)) + (e')ro(t) = 0 t + o(t). (3.3)
l 2 o' 2 (x)
The last inequality uses the fact that (1.2)' holds for all e > 0. Now the term o(t) is smaller than Ot/2 for all sufficiently small t. n Proposition 3.1. Let {X} be a diffusion on S = (c, d) having drift and diffusion coefficients µ(x), Q 2 (x), respectively. If 0 is a three times continuously differentiable function on (c, d) onto (a, b) such that 0' is either strictly positive or strictly negative, then {Z, :_ ¢(XX )} is a diffusion on (a, b) whose drift i(•) and diffusion coefficient Q 2 (.) are given by A(z) = 4) (4
+ i4) (4) 1(z))a2(4,-1(z)),
'
& 2 (Z) _ (4' 1 (0 -1 (Z))) 2
z E (a, b). (3.4)
2 (4 -1 (Z))'
Proof. By a Taylor expansion, Za+, — Z., = 0(Xs+r) — 0(X.,)
= (X,+' — X.)^'(XS) + 1 (Xs +, — X:) 2 4)"(X5) + 3 (Xs +, — XS) 3 4"(Z), {
2!
(3.5) where lies between Xs and XS+ ,. Fix s > 0, z e (a, b). There exist positive constants (5 1 (),5 2 (c) such that 0 - ' maps the interval [z — e, z + s] onto [4 -1 (z) — (5 1 (e), ¢ - '(z) + 6 2 (s)]• Write x = 4 - '(z), and let S m = min{8 1 (s), 6 2 (8)}, SM = max{81(c), 62(8)}. Then 1 ({Zs+t-zlbE) = 1(x-61(E)sX,+t,x+bz(E))1 (3.6)
1 (Ixa+t-xlsam)
1 {IZ.+t-zI e}
1 {IX,+t-xlsöM)'
Therefore, E((Xs — X., ) 1 {Iza+t-zl,E) 1 Zs = z) = E((X„, — x)ltl+t-IE) I X, = x) = E((X5+, —
+ E((Xs
x ) l (Ixa+t — xl) —
1
XX = x )
x ) l (Ix.+t—xl>b,,.,iz,+t—zl5e) I Xs =
(3.7) In view of the last inequality in (3.6), the last expectation is bounded in
383
TRANSFORMATION OF THE GENERATOR
magnitude by SMP(IXs+a — xI > CS m I XX = x), which is of the order o(t) as t j 0, by the last relation in (1.2)'. Also, by the first relation in (1.2)', E((Xs+t — Xs)ltIxs+t-xl,am) I Xs = x) = µ(x)t + o(t).
Therefore, E((XX+r — XS)l(Iz. ZIsE) 1 Z = z) = u(x)t + o(t). ,
(3.8)
In the same manner, one has E((X5+1 — Xs) 21 (1z,+t—zsl,E) 1 Zs = z) = E((Xs+r — Xs) 21 (Ixs-,—xaI o,,,} I Xs = x)
+ O(SMP(I Xs+r — xl > b m I Xs = x) = 0 2 (x)t + o(t).
(3.9)
Also, by the Lemma above, ,,,
E(IXs+r — Xs1 3 1o (^)Ilnzs.,—z,I,E) ( ZS = z) cE(IXX+r — Xsl 31 {Ix,,,-x,I_
Using (3.8)-(3.10), and (3.5), one gets E((Zs+r — Zs)ltlzs,, zsI E) Zs = z) = O'(x),U(x)t + ir 2 (x)O ,, (x)t + 0(t), (3.11) so that the drift coefficient of {Z r } is (^
(z))(ß
1 (z))
+ 1a 2 (4 1 (z))ß (4 1 (z))•
In order to compute the diffusion coefficient of {Z}, square both sides of
(3.5) to get E((Z5+r — Zs) 21 {Izs.t_ZsI e) 1 Zs = z) = (t'(x)) 2 E((X5+r — Xs) 21 uz,.,-zal,E) I XS = x) + R r , (3.12)
where each summand in R, is bounded by a term of the form cE(IX3+, — Xsl'lnz,+-z,IsE)1 Xs = x) 5 cE(IXs+r — XSI l(lxa.-xa1^aM) XS = x),
BROWNIAN MOTION AND DIFFUSIONS
384
where r> 2. Hence by the Lemma above R, = o(t) and we get, using (3.9) in (3.12),
E((Z„, — ZS) 21 {IZ ,,-zal-<E) 1 ZS = z) = (4'(x)) 2 o 2 (x)t + o(t).
(3.13)
Finally, P(IZs+r
—
Zj>cIZs=z)
—
Asl> 6 .1X.,=x)=
0 (t),
(3.14)
by the last condition in (1.2)'. n Example 1. (Geometric Brownian Motion). Let S = (—cc, cc), S = (0, cc), 4(x) = e". If A is given by (3.1), then the coefficients of A are µ(z) = zµ(log z) + Zza 2 (log z), (0
v 2 (z) = z 2 Q 2 (log z),
(3.15)
In particular, a Brownian motion with drift µ and diffusion coefficient 0 2 becomes, under the transformation x --> ex, a diffusion with state space S = (0, oo) and generator Ä
d
Zß Z Z Z d z2 + ( + 2 f z z .
(3.16)
In other words, the mean rate of growth as well as the mean rate of fluctuation in growth at z is proportional to the size z. Note that one has Z, = e x'.
(3.17)
But {X} may be represented as XX = X0 + tp + cB^, t >, 0, where {B1 } is a standard one-dimensional Brownian motion starting at zero and independent of X0 . Then (3.17) becomes Z, = Z o e`µ
°',
Zo = e x o.
(3.18)
The process {Z} is sometimes referred to as the geometric Brownian motion. Example 2. (Geometric Ornstein-Uhlenbeck Process). The Ornstein-Uhlenbeck process with generator (see Example 1.2) i
A Zoe
dx z — Yx
dx'
(—cc < x < co),
(3.19)
is transformed by the transformation x -+ e" into a diffusion on (0, oo) with
TRANSFORMATION OF THE GENERATOR
385
generator
d
z d2 - f \yz log z - 2 Ä = ? Q Z Z Z dz2 °z /%i dz ,
( 0
(3.20)
Example 3. S = (- co, oo), S = (0, 1), 4(x) = ex/(ex + 1). For this case, first transform by x - ex, then by y - y/(1 + y). Thus, one needs to apply the transformation y(y): y -• y/(1 + y) to the operator with coefficients (3.15). Since the inverse of y is y '(z) = z/(1 - z), one may use (3.15) to obtain the transformed operator (Exercise 1) A = Zz 2 (1 -
dz —Z— 1 -z dz'
z) 2 7 2 log (
z z(1 - z) z + z(1 - z)µ log-----) +-- — a 2 log --1-z 1-z 2 - z 2 (1 - z)Q2 (los
r = Zz(1 - z) z(1 - z)a 2 log
L
_--)]
z d 2 --1 - z j dzZ
+ { 2j( log z I + (1 - 2z)ß 2 1 log z I} ] , jjjz dz 1 — z/ 1— l
0< z <1. (3.21)
`
In particular, a Brownian motion is transformed into a diffusion on (0, 1) with generator
r Ä
=
d
zz(1 - z) z(1 - z)a 2 z2 + ( 2µ + (1- 2z)Q 2
L
)
d ,
0< z <1, (3.22)
and an Ornstein-Uhlenbeck process is transformed into a diffusion on (0, 1) with generator Ä = Zz(1 - z) z(1 - z)a 2
dz -
d
2y log z - - (1 - 2z) a2 - z dz
dz 2 1
0
Example 4. Let S = (-oo, oo), S = (0, oo), 4(x) = e"'. If dz A —_iQZ_ z dx2
386
BROWNIAN MOTION AND DIFFUSIONS
then z Ä = Z6 2 (9z 2 (log z)4"3) dz
z
+ Za
2 z{6(log
z)" + 9(log z)413}
dz
. (3.24)
In this example the transformed process Z,=e',
t>,0
(3.25)
(where {X,} is a Brownian motion) does not have a finite first moment (Exercise 2).
4 DIFFUSIONS AS LIMITS OF BIRTH—DEATH CHAINS We now show how one may arrive at diffusions as limits of discrete-parameter birth—death chains with decreasing step size and increasing frequencies. Because the underlying Markov chains do not in general have independent increments, the limiting diffusions will not have this property either. Indeed, the Brownian motions are the only Markov processes with continuous sample paths that have independent increments (see Theorem T.1.1 of Chapter IV). Suppose we are given two real-valued functions µ(x), a 2 (x) on 6R' _ (— oo, 00) that satisfy Condition (1.1). Throughout this section, also assume that µ(x), v 2 (x) are bounded and write Qö = sup Q 2 (x).
(4.1)
x
Consider a discrete-parameter birth—death chain on S = {0, ±A, ±2A, ...} with step size A > 0, having transition probabilities p ;; of going from iA to JA in one step given by, 1z(iA)E U2(ie)8 = a;o^ ;=Q Z (ie)E^t(ie)E _ 2e ^o> := 2e2 + 2A 2A2
(4.2)
a z (i0)e = 1
Pu
cep — a^e^ 1 — ez ß< < = —
with the parameter given by A2 Az .
(4.3)
Uo
Note that under Condition (1.1) and boundedness of p(x), v z (x), the quantities ß, 5, and I — ß;°) — a;° ) are nonnegative for sufficiently small A. We shall let e be the actual time in between two successive transitions. Note that, given
DIFFUSIONS AS LIMITS OF BIRTH—DEATH CHAINS
387
that the process is at x = iA, the mean displacement in a single step in time e is A /3i(°) - A bi(°) = p(iA)e = fi(x)e.
(4.4)
Hence, the instantaneous rate of mean displacement per unit time, when the process is at x, is u(x). Also, the mean squared displacement in a single step is A 2 ß + ( - A) 2 8; °) = r 2 (iA)c = Q Z (x)E.
(4.5)
Therefore, a 2 (x) is the instantaneous rate of mean squared displacement per unit time. Conversely, in order that (4.4), (4.5) may hold one must have the birth-death parameters (4.2) (Exercise 1). The choice (4.3) of c guarantees the nonnegativity of the transition probabilities p ;; , p ;.; _,, pi,;+l. Just as the simple random walk approximates Brownian motion under proper scaling of states and time, the above birth-death chain approximates the diffusion with coefficients µ(x) and u'(x). Indeed, the approximation of Brownian motion by simple random walk described in Section 8 of Chapter I follows as a special case of the following result, which we state without proof (see theoretical complement 1). Below [r] is the integer part of r. Theorem 4.1. Let { Y: n = 0, 1, 2, ...} be a discrete-parameter birth-death with one-step transition probabilities (4.2) and chain on S = {0, ±A, ±2&. with Yo) = [x o /0]A where x 0 is a fixed number. Define the stochastic process {X} := { Yt°e 1 }
(t >, 0).
(4.6)
Then, as A j 0, {X} converges in distribution to a diffusion {X} with drift µ(x) and diffusion coefficient 6 2 (x), starting at x 0 . As a consequence of Theorem 4.1 it follows that for any bounded continuous function f we have
°)
)
lim E{.f(X( ) I Xö = ^]Aj = Ez.f(X,). A-^ [
(4.7)
By (2.15), in the case of a smooth initial function f, the function u(t, x):= E x f (XX ) solves the initial value problem
au
_ ac
^
Z 2(x)
ai Z + p(x) au ,
ax
ax
lim u(t, x) _ f (x). ,1 o
(4.8)
The expectation on the left side of (4.7) may be expressed as u (A) (n, i)'=
>f(jA)P
(4.9)
388
BROWNIAN MOTION AND DIFFUSIONS
where n = [ t/s], i = [x/A], and p;; are the n -step transition probabilities. It is illuminating to check that the function (n, i) u ° (n, i) satisfies a difference equation that is a discretized version of the differential equation (4.8). To see this, note that )
(
(
P;
+
i)
)
= PiiPci ) +P).i +i pi+i.;+pa,i - iPi"- i,; )
(n) _ ce) c") (e) (") (o) (") = ( 1 — ß) at )P); + ßt Pt+i,i + bI PI -i,; (n)
= P);
+
(A) (") i (Pt +i.;
(n)
(D) (n)
— P.;) — bt (Pt;
—
(n)
Pi-
i,;),
(4.10)
or, P(
— P ( __ µ(iA) (") .+ 20 {(P i,;
1
+
Q2
2
—
(") + (P,;(") —p)(") Pi;) -i,;)}
(iA) (") (") + PI (") -i, ;) (P( +i,; — 2pi(;
A2
( 4.11 )
Summing over j one gets a corresponding equation for u (°) ( n, i). As Ab 0 the state space S = {0, ±A, ±2A,. . .} approximates l = (— oo, oo), provided we think of the state jA as representing an interval of width A around jA. Accordingly, think of spreading the probability p;j ) over this interval. Thus, one introduces the approximate density y -- p (°) ( t; x, y) at time t = ne for states x = iA, y =jA, by p(e)(ns; iA, jA) := h-.
(4.12)
By dividing both sides of (4.11) by A, one then arrives at the difference equation (P (e) ((n + 1)s; iA, jA) — p ( e)(ne; iA,jA)) /E e = µ(iA)(P (n) (ns; (i + 1)A, jA) — p ( ) (ne; (i — 1)A, jA))/2A A + ia 2 (iA)(P (e) (ns; (i + 1)A, jA) — 2p ( ) ( nc; iA,jA) + p ( e) (ne; (i — 1)A, jA))/AZ. (4.13)
This is a difference-equation version of the partial differential equation y) ap(t;, a t
= k(x)
op(; x, y) + Za2(x) a 2 P(t; x, y) ax
i
for t > 0, — oo < x, y < oo, (4.14) at grid points (t, x, y) = (ne, iA, jA). Thus, the transition probability density p(t; x, y) of the diffusion {X} may
TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES
389
be approximately computed by computing p. The latter computation only involves raising a matrix to the nth power, a fairly standard computational task. In addition, Theorem 4.1 may be used to derive the probability i/i(x) that {X,} reaches c before d, starting at x E (c, d), by taking the limit, as A J 0, of the corresponding probability for the approximating birth—death chain {Y;,e }. In other words (see Section 2 of Chapter III, relation (2.10)), )
^e>
j-1 Sce)S(e) r r -1
0(x) = leim
5 1+1
ß(s)
r=to
(4.15)
+1
1 L=ißie^ r
1
g^e>
^e^ +1 r
Pi+1
where i' =
i= i =
x], [
The limit in (4.15) is (Exercise 2) J
1L/(X) =
Y1 exp —
f
rip
hm
(4.16)
( 2 µ(Y)/a 2 (Y)) dY j -
eio I + ^Y1 exp —
f I
].
rA
c
( J .
r =i +t
[
]
[
re ( 2 li(Y)/a 2 (Y)) dY c
d
exp
^
2 µ(Y)/Q 2 (Y)) d y}
Z
J
exp{ — c
dz
_JZ (
(4.17) )
J ( µ(Y)/a (Y)) dy} dz 2
c
2
) ))
which confirms the computation (2.24) given in Section 2. This leads to necessary and sufficient conditions for transience and recurrence of diffusions (Exercise 3). This is analogous to the derivation of the corresponding probabilities for Brownian motion given in Section 9 of Chapter I. Alternative derivations of (4.17) are given in Section 9 (see Eq. 9.23 and Exercise 9.2), in addition to Section 2 (Eq. 2.24).
5 TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES Under Condition (1.1) the Kolmogorov equations uniquely determine the transition probabilities. However, solutions are generally not obtainable in closed form, although numerical methods based on the scheme described in
390
BROWNIAN MOTION AND DIFFUSIONS
Section 4 sometimes provide practical approximations. Two examples for which the solutions can be obtained explicitly from the Kolmogorov equations alone are given here.
Example 1. (Brownian Motion). Brownian motion is a diffusion with constant drift and diffusion coefficients. First assume that the drift is zero. Let the diffusion coefficient be a 2 . Then the forward, or Fokker-Planck, equation for p(t; x, y) is
ap(t at a , Y) = 2a2 a Z P(; 2 , Y) (
t > 0, — co < x < oc, — oo < y < co ). (5.1)
Y
Let the Fourier transform of p(t; x, y) as a function of y be denoted by P(t; x, )
=
J
e 14y p(t;
x, y) dy.
(5.2)
Then (5.1) becomes (5.3)
2 Q2ß'
öt or, 2
öt (e
)
0, (5.4)
whose general solution is p(t; x, ) = c(x, l;)exp{—?a 2 2 t}.
(5.5)
Now p is the characteristic function of the distribution of X, given X 0 = x, and as t 10 this distribution converges to the distribution of X0 that is degenerate at x. Therefore,
c(x, Z) = lim p(t; x, Z) = E(e`
X °)
= e`x,
(5.6)
tjo
and we obtain ^(t; x, l;) = exp {il x — a 2 2 t}.
(5.7)
But the right side is the characteristic function of the normal distribution with
TRANSITION PROBABILITIES FROM THE KOLMOGOROV EQUATIONS: EXAMPLES
391
mean x and variance tv 2 . Therefore, P(t; x,Y) _ (2na2t)"2 exp{-
s
(y _
(t > 0, -cc < x, y < oo). (5.8) X)
The transition probability density of a Brownian motion with nonzero drift may be obtained in the same manner as above (Exercise 1).
Example 2. (The Ornstein-Uhlenbeck Process). S = (- oo, cc), u(x) = - yx, a 2 (x) := v 2 > 0. Here y is a (positive) constant. Fix an initial state x. As a function of t and y, p(t; x, y) satisfies the forward equation
P
z
at
ZP
2 öy 2
(5.9)
+ y - (YP(t; X, Y))-
Let p be the Fourier transform of p as a function of y, ß(t; Z) =
J
(5.10)
e'4YP(t; x, y) dy.
Then, upon integration by parts,
G
Y/ (t; ) = J -^ e aY (t; x, y) dy
i^e'4vp(t;
^ I (t; Z) = GY /
_ m
e1 ap(t ; x, Y) dY = - iZ aY
_ (_ iZ) 2
C
Y ^y ) ^(t; ) = J
x, y) dy = -iß,
J
J e`' apaydy
ep(t; x, Y) dY =
ey aP(t;
, Y) =
1 (ap ö
1 d
ay^
iö
i
y a J p e ^ y ! dY
( -1 P) _
-
P -
öp(t; )
a
(5.11)
Here we have assumed that y(ap/ay), a l p/ay e are integrable and go to zero as jyj - cc. Taking Fourier transforms on both sides of (5.9) one has aß(t; ) =-2
^ 2 P+yP+y( - P
-
aß)=
392
BROWNIAN MOTION AND DIFFUSIONS
Therefore, öp(t;
3ß 2 2 ^ Z p. ^ öt
(5.12)
Thus, we have reduced the second-order partial differential equation (5.9) to the first-order equation (5.12). The left side of (5.12) is the directional derivative of p along the vector (1, y^) in the (t, )-plane. Let a(t) = d exp{yt}. Then (5.12) yields
d
(5.13)
t p(t; a(t)) = --- a 2 (t)p(t; a(t))
or
-f
p(t; a(t)) = c(x, d) exp - -
o, a
l
2 (s)
ds }
))
= c( x , d) e x p ^ — a 4 dz (e 2 Yt _ l)}.
Y
For arbitrary t and one may choose d = ^e - Y`, so that a(t) = ^, and get ß(t; ) = c(x, ^e - '")exp{ —42 X 2 (1 — e -2 ji)}
(5.14)
l 4y
t j 0, one then has (5.15)
,im ß(t; Z) = c(x, Z). :10
On the other hand, as t 10, p(t; c) converges to the Fourier transform of 5, which is exp{i^x}. Thus, c(x, ) = exp{i^x} for every real , so that (5.16)
(t; ) = exp{ ixe - y` — 4 (1 — e -Z Y`) Z },
Y
which is the Fourier transform of a Gaussian density with mean xe - y` and variance (o 2 /2y)(1 — e -2 yt ). Therefore, p(t; x, y) = (2 )^ii (2y (1 — e -Z Y`)) I/2 exp{
a 2 (1 — e e 2y1)2}
(5.17)
Note that the above derivation does not require that y be a positive parameter. However, observe that if y = ß/m > 0 then letting t --+ oo in (5.16) we obtain
393
DIFFUSIONS WITH REFLECTING BOUNDARIES
the characteristic function of the Gaussian distribution with mean 0 and variance Q 2 /2y = mc 2 /2ß (the Maxwell—Boltzmann velocity distribution). It follows that it(v) (__Zy-1)1/2 exp{ — yv z/6 z },
—cc
< v < oo,
is the p.d.f. of the invariant initial distribution of the process. With it as the initial distribution, { V} is a stationary Gaussian process. The stationarity may be viewed as an "equilibrium" status in which energy exchanges between the particle and the fluid by thermal agitation and viscous dissipation have reached a balance (on the average). Observe that the average kinetic energy is given by E, ( mV,) = (m 2 /4ß)Q 2 •
6 DIFFUSIONS WITH REFLECTING BOUNDARIES So far we have considered unrestricted diffusions on S = (— oo, oo), or on open intervals. End points of the state space of a diffusion that cannot be reached from the interior are said to be inaccessible. In this section and the next we look at diffusions restricted to subintervals of S with one or more end points accessible from the interior. In order to continue the process after it reaches a boundary point, one must specify some boundary behavior, or boundary condition, consistent with the requirement that the process be Markovian. One such boundary condition, known as the reflecting boundary condition or the Neumann boundary condition, is discussed in this section. Subsection 6.2 may be read independently of Subsection 6.1.
6.1 Reflecting Diffusions as Limits of Birth—Death Chains As an aid to intuition let us first see, in the spirit of Section 4, how a reflecting diffusion on S = [0, oo) may be viewed as a limit of reflecting birth—death chains. Let it(x) and a 2 (x) satisfy Condition (1.1) on [0, oo) and let µ(x) be bounded. For sufficiently small A > 0 one may consider a discrete-parameter birth—death chain on S = {0, A, 2A,. . .} with one-step transition probabilities =
,e) = u'(i0)e p(iA)a ß^ 2AZ + 2A
= ö Stns
— a 2 (iA) e u(iA)E 2A2 — 2A (i CrZAE
P^^ = 1 — (ßi + b) = 1 — (i
)
1),
(6.1)
394
BROWNIAN MOTION AND DIFFUSIONS
and Poi = ßo (Al
—
62(0)E 0 )E a2(0)E ii(0)E (0) ß o 1—— 2ez — µ( 2e .
202 + 2e Poo = 1 —
(6.2)
Exactly as in Section 4, the backward difference equations (4.11) or (4.13) are obtained for i >, 1, j > 0. These are the discretized difference equations for
aP(t;
,
Y)
= u(x) aP(
ä )x Y + ZU2(x)
z
a 2 P(t; x, Y)
t>0,
x>0, Y>0.
(6.3) The backward boundary condition for the diffusion is obtained in the limit as A J, 0 from the corresponding equations for the chain for i = 0, j >, 0, Po.i+ 11 — Poi Pi] + Poo Po ( a'(0)f: µ( 0 )s )(") — a 2 0 )E - 11 0 )E l( n 2A 2 + 2A p '' + 1\ 1 2A z 2A )Poi)
_
(
or, + Pö; i)E Pö;
(
(J 0),
_ 2Q0) (Pi; — Pö;) + 2D) (P (' — Pö;) (j > 0). (6.4)
In the notation of (4.12) this leads to,
2e)
p((n + 1)E; 0, jA) — p (°) (m; 0, j0) A =
(
P (°) (nw; A, JA) — p (°) (w; 0, jA))
+ µ( 0) (p (°) (nc; A, j0) — p (°)
(
ns; 0, JA))A.
(6.5) Fix y >, 0, t > 0. For n = [ t/E], j = [y/0], the left side of (6.5) is approximately 0 ( aP(t; x, Y) at
)
.= O'
while the right side is approximately 1az(0) öP(t; x, Y) 01p(0) aP(t; x, Y)
z
ax
x_o
z
ax
).=0
Letting A j 0, one has ap(t; x, Y) =
ox
x=0
0,
(t
> 0, y > 0).
(6.6)
395
DIFFUSIONS WITH REFLECTING BOUNDARIES
The equations (6.3) and (6.6), together with an initial condition p(0; x,.) = 6 x determine a transition probability density of a Markov process on [0, co) having continuous sample paths and satisfying the infinitesimal conditions (1.2) on the interior (0, oc). This Markov process is called the diffusion on [0, co) with reflecting boundary at 0 and drift and diffusion coefficients µ(x), a 2 (x). The equations (6.3) and (6.6) are called the Kolmogorov backward equation and backward boundary condition, respectively, for this diffusion. This particular boundary condition (6.6) is also known as a Neumann boundary condition. The precise nature of the approximation of the reflecting diffusion by the corresponding reflecting birth—death chain is the same as described in Theorem 4.1. 6.2 Sample Path Construction of Reflecting Diffusions Independently of the above heuristic considerations, we shall give in the rest of this section complete probabilistic descriptions of diffusions reflecting at one or two boundary points, with a treatment of the periodic boundary along the way. The general method described here is sometimes called the method of images. ONE-POINT BOUNDARY CASE. (Diffusion on S = [0, c0) with "0" as a Reflecting Boundary). First we consider a special case for which the probabilistic description of "reflection" is simple. Let µ(x), v 2 (x) be defined on S = [0, cc) and satisfy Condition (1.1) on S. Assume also that µ(0) = 0. Now extend the coefficients p(.), a 2 µ(
—
x) =
—
u(x),
(.)
(6.7)
on R' by setting
Q 2 (— x)
= Q 2 (x),
(x > 0).
(6.8)
Although /1(.), a'(.) so obtained may no longer be twice-differentiable on R', they are Lipschitzian, and this suffices for the construction of a diffusion with these coefficients (Chapter VII). Theorem 6.1. Let {X} denote a diffusion on FR' with the extended coefficients ,u(.), a 2 (.) defined above. Then {1X1 1} is a Markov process on the state space S = [0, cc), whose transition probability density q(t; x, y) is given by q(t;
x, y) = p(t; x, y) + p(t; x, — y)
(x, ye [0, cc)),
(6.9)
where p(t; x, y) is the transition probability density of {X}. Further, q satisfies the backward equation aq(t;
x, y)
äq
12 a 2 q
+ p(x)
ax (t
> 0; x > 0, y % 0 ),
(6.10)
396
BROWNIAN MOTION AND DIFFUSIONS
and the backward boundary condition aq(t; x, y) I ax
=0
(6.11)
(t>0;y>,0).
x=0
Proof. First note that the two Markov processes {X} and {—X} on R' have
the same drift and diffusion coefficients (use Proposition 3.1, or see Exercise 1). Therefore, they have the same transition probability density function p, so that the conditional density p(t; x, y) of XX at y given X0 = x is the same as the conditional density of —X, at y given —X 0 = x; but the latter is the conditional density of X, at —y, given X0 = —x. Hence, p(t; x, y) = p(t; —x, —y).
(6.12)
In order to show that { Y := I X^I } is a Markov process on S = [0, cc), consider an arbitrary real-valued bounded (Borel measurable or continuous) function g on [0, oo), and write h(x) = g(Ixl). Then, as usual, writing Px for the distribution of {X1 } starting at x and Ex for the corresponding expectations, one has for Ex(g()s+,) I {Y„ 0 < u < s}), Ex(g(IX5+,I) I {IXuI: 0 s u
s})
= Ex[E(g(IXs+,I) I {X„: 0 u s}) I {IX I: 0 u 5 s}] X
= E x [E(h(XS+ ,) {X: 0 u s}) {IX„I: 0 u s}]
Ex[(J
m
=
EX[J
{IX„I: 0 < u <, s}
00 g(z)(p(t; x', z) + p(t; —x', z)) dz J o
8
{IXXI: 0 ‹ u < s}1
/ x—x
g(z)q(t; IXXI, 0
]
x' =X s
0
Ex [\J
x'=xs
g(z)(p(t; x', z) + p(t; x', —z)) dz
= Ex
=
I {IXXI: 0 ,< u ,< s}]
m h(y)p(t; x', y) dy)
z) dz I {IX„I: 0 < u ‹ s}]
g(z)q(t; I X31, z) dz,
=J 0
(6.13)
where q(t; x, y) = p(t; x, y) + p(t; —x, y).
(6.14)
This proves {IX,I} is a time-homogeneous Markov process with transition probability density q. By differentiating both sides of (6.9) with respect to t,
397
DIFFUSIONS WITH REFLECTING BOUNDARIES
and making use of the backward equation for p, one arrives at (6.10). Since the right side of (6.14) is an even differentiable function of x, (6.11) follows. n
Definition 6.1. A Markov process on S = [a, oo) that has continuous sample paths and whose transition probability density satisfies equations (6.10), (6.11), with 0 replaced by a, is called a reflecting diffusion on [a, oo) having drift and diffusion coefficients u(x), a 2 (x). The point a is then called a reflecting boundary point. Note that in this definition µ(0) is not required to be zero. It is possible to give a description of such a general diffusion with a reflecting boundary, similar to that given in Theorem 6.1 for the case µ(0) = 0 (see theoretical complement 1). Example 1. Consider a reflecting diffusion on S = [0, oo) with u(-):= 0, v 2 (x):= a 2 > 0. This is called the reflecting Brownian motion on [0, x). Its transition probability density is, by (6.9) or (6.14), _ x 2 q(t; x, Y) _ (2n02t) - ^i z
exp — (y
2
x) + exp — (y
2Q t )
(t>0;x,y>,0). (6.15)
Example 2. Consider the Ornstein-Uhlenbeck process (Example 5.2) {X 1 } on W with µ(x) _ —yx, a 2 (•) := a 2 > 0. By Theorem 6.1, {IX t I} is the reflecting diffusion on S = [0, co ), with transition probability density (see (5.17) and (6.9)) q(t; x, y) _
yt) ? (1 — e -2
-viz[
yl z
exp a z (1
ç_
— e -2 y t )
y( y_+_xe) 2
+ exp a
z (1
— e-2yi)
1
>
(t>0;x,y>,0). (6.16)
By arguments similar to those given in Section 2, we shall now derive the forward equation for a reflecting diffusion on [a, cc). By considering twice continuously differentiable functions f, g vanishing outside a finite interval, and g vanishing in a neighborhood of a as well, one derives the forward equation exactly as in (2.33)-(2.41) (Exercise 6), l p(t; x, Y)
ä2
a (ia (Y)P(t; x, Y)) — - (µ(Y)P(t; x, Y)), Y Y (t>0;x>,a,y>a). (6.17)
BROWNIAN MOTION AND DIFFUSIONS
398
To derive the forward boundary condition, differentiate both sides of the following equation with respect to t, 1
=J
(6.18)
p(t; x, y) dy,
to-I
to obtain, using (6.17),
0
ap(t; x, y) r
('
=J
tQ 00
^
dY =
at
J
a
a
— — f^(Y)P + a (ia 2 (Y)P) dY ay
t.'.) ay \
= µ(a)p(t; x, a) — \ Y (zo 2 (Y)P(t; x, Y))
)y = a
Hence the forward boundary condition is (6.19)
O y (io 2 (Y)P(t; x, Y)) y =a — µ(a)p(t; x, a) = 0.
CASE. (Diffusions on a Circle: Periodic Boundary Conditions). Let µ(• ), a'(.) be defined on R' and satisfy Condition (1.1). Assume that both are periodic functions of period d.
PERIODIC BOUNDARY
Theorem 6.2. Let {X} be a diffusion on l' with periodic coefficients µ(• ), 2(.) as above. Define (6.20)
{Z,} := {XX (mod d)} .
Then {Z} is a Markov process on [0, d) whose transition probability density function q(t; z, z') is given by (6.21)
(t > 0; z, z' e [0, d))
q(t; z, z') _ Z p(t; z, z' + md) m=—m
where p(t; x, y) is the transition probability density function of {X}. Proof. Let f be a real-valued bounded continuous function on [0, d] with Proof f(0) = f (d ). Let g be the periodic extension of f on Q^' defined by
g(x + md) = f (x) for x e [0, d] and m = 0, ± 1, +2, .... We need to prove [
E[f(Z3+)1 {Z: 0 < u < s}] =
J
df(zr)q(t; z, z')
0
dz']
. 2 -Z
(6.22)
399
DIFFUSIONS WITH REFLECTING BOUNDARIES
Now, since f(Z) = g(X,), and {X} is Markovian,
1
1
E[f (ZS +,) {X: 0 < u < s}] = E[g(X5+,) {X: 0 < u < s}]
[II
=g(y)p(t; x, y) dY]
z— x
s
m +l}d
L
9(Y)P(t;
m=—m LLL
x, Y) dY(6 . 2 3)
d
x
=x,,
But the periodicity of z(•) and a'(•) implies that the Markov processes Al and {X, — md} have the same drift and diffusion coefficients (Proposition 3.1 or Exercise 2) and, therefore, the same transition probability densities. This means p(t;x+md,y+md)=p(t;x,y)
(t>O;x,ye[0,d);m=0,+1,±2,...). (6.24) Using (6.24) in (6.23), and using the fact that g(md + z') = f(z') for z E [0, d), one gets E[ f (ZS+
,)
J
I {X: 0 u < s}] = =
m
d
g(md + z') p(t; Xs , and + z')
—ao 0
d
=
p(t; X , and + z') dz'.(6.25)
f(z')
5
m
0
=
—ao
Since Zs = X (mod d), there exists a unique integer m o = m 0 (X ) such that Xs = m o d + Z. Then 5
3
p(t; X , and + z') = p(t; m o d + Z , and + z') = p(t; Z S
5
S,
(
m — m 0 )d + z'),
by (6.24). Hence (6.25) reduces to (taking m' = m — m 0 ) E[f(Z ±,)
I {X: 0 - u < s}] =
f
=
J
od .
d
p(t; Z , m'd + z') dz
f (z')
f (z')q(t; Zs
S
,
z') dz'.
(6.26)
0
The desired relation (6.22) is obtained by taking conditional expectations of extreme left and right sides of (6.26) with respect to {Z: 0 < u < s}, noting that {Z: 0 < u < s} is determined by {X: 0 < u < s} (i.e., a{Z: 0 < u < s} c
Q{X:O'
n
Since 0(mod d) = 0 = d(mod d), the state space of {Z} may be regarded as
400
BROWNIAN MOTION AND DIFFUSIONS
the compact set [0, d] with 0 and d identified. Actually, the state space is best thought of as a circle of circumference d, with z as the arc length measured counterclockwise along the circle starting from a fixed point on the circle. It may also be identified with the unit circle by zE-.exp{i2irz/d}. Definition 6.5. Let {XX } be a diffusion on O' with periodic drift and diffusion coefficients with period d. Then the Markov process {Z} := {X,(mod d)}, is called a diffusion on [0, d) with periodic boundary or a diffusion on the circle. Example 3. Let {Xj be a diffusion on 0V with drift and diffusion coefficients µ and a 2 > 0. Then {Z, := X1 (mod 2it)} is the circular Brownian motion, so named since Z, can be identified with e` Z '. The transition probability density of this diffusion is
q(t; z, z')
_
1
(2ja2t)-ßi2 expl (2mn + z' — z ( —6.27) tµ)2 2a2t m=m
Underlying the proofs of Theorems 6.1 and 6.2 is an important principle that may be stated as follows (Exercise 5 and theoretical complement 2). Proposition 6.3. (A General Principle). Suppose {X,} is a Markov process on S having transition operators T, t > 0. Let q be an arbitrary (measurable) function on S. Then {q(XX is a Markov process on S':= q(S) if T ( f o 4,) is a function of (p, for every bounded (measurable) f on S'. )}
'
TWO-POINT BOUNDARY CASE. (Diffusions on S = [0, 1] with "0" and "1" Reflecting). To illustrate the ideas for the general case, let us first see how to obtain a probabilistic construction of a Brownian motion on [0, 1] with zero drift, and both boundary points 0,1 reflecting. This leads to another application of the above general principle 6.3 in the following example. Example 4. Let {X} be an unrestricted Brownian motion with zero drift, starting at x e [0, 1]. Define Z,(' ) := X,(mod 2). By Theorem 6.2, {Z} is a diffusion on [0, 2] (with "0" and "2" identified). Hence {Z, 2 ':= Z,( " — 1} is a diffusion on [-1, 1] whose transition probability density is (see (6.27))
q(2)(t; z, z')
_Y ,,,_ _
( (2m+z'—z) 2 1 (2nQ 2 t) 1/2 expt— 2a2t Jj
for —1 < z', z
,< 1.
(6.28) In particular,
q(2)(t; z, z') = q (2) (t; —z, —z').
(6.29)
It now follows, exactly as in the proof of Theorem 6.1, that {Z r } = { cp(Zt( z) )} :_ {jZt( 2) I} is a Markov process on [0, 1]. For, if f is a continuous function on
401
DIFFUSIONS WITH REFLECTING BOUNDARIES
[0, 1] then E[f(Z,) I Zö' = z] = E[f(IZ; Z ^I) Z0 = z] f (I z'I)q(t; z, z') dz'
= I
=
1
f
f(Iz'I)q(2(t;z
z') dz'
0
= f0 =
f(Iz'I)q ( ^ 1 (t;z,z') dz' + J
0
f
f(Iz'I)(q` 2 (t; z, —z') + q (2 (t; z, z')) dz'
i
f(lz'I)(q (2> (t; — z, z') + q (2 (t;z, z')) dz' (by (6. 29 ))
0
=f(Iz'I)q(t; Izl, z') dz', say,
(6.30)
fo
which is a function of q(z) = IzI. Hence, {Z,} is a Markov process on [0, 1] whose transition probability density is q(t; x, y) = q (2 (t;
m=
— x,
y) + q(2 (t; x, y)
_
(2na2t)
—
co
+ y — x)Zl + exp{ — (2m + y + x)?^J 2QZt li2[exp —(2m
l
j
{
2a2t
forx,ye[0,l]. (6.31) Note that öq(t; x, y) aq( 2 )( t; — x, Y) a9 (2) (t; x, y) at
—
at
m= —ao
+ at (p(t;
at
—x, 2m + y) + p(t; x, 2m + y)), (6.32)
where p(t; x, y) is the transition probability density of a Brownian motion with zero drift and diffusion coefficient 6 2 . Hence _ aq(t; x, at
)
Z 1- a9t>Y) ax2
y
,
(xE(0 , 1 )>YE[0 , 1 ]).
(6.33)
(yE[0,1]).
(6.34)
Also, the first equality in (6.31) shows that anti- x_ v1
=0, UA
1x=0
402
BROWNIAN MOTION AND DIFFUSIONS
Since q (2) (t; —x, y) + q (2) (t; x, y) is symmetric about x = 1 (see (6.21)) one has aq(t; x, y) = 0 3x x=1
(y E [0 , 1 ]).
(6.35)
Thus, q satisfies Kolmogorov's backward equation (6.33), and backward boundary conditions (6.34), (6.35). In precisely the same manner as in Example 4, we may arrive at a more general result. In order to state it, consider µ(• ), Q 2 (•) satisfying Condition (1.1) on [0, 1]. Assume µ(0) = 0 = µ(1).
(6.36)
Extend p(.), a 2 (.) to (— oo, oo) as follows. First set p(—x) = —µ(x),
for x e [0, 1]
u 2 (—x) = r 2 (x)
(6.37)
and then set p(x+2m)=µ(x),
a 2 (x+2m)=v 2 (x)
forxe[-1,1],
m=0,±1,±2.....(6.38)
Theorem 6.4. Let «•)' a 2 (.) be extended as above, and let {XX } be a diffusion on S = (—oo, oo) having these coefficients. Define {Z'} __ {X,(mod 2)}, and {ZZ } _= {IZ,( 1 " — 1I}. Then {Z^} is a diffusion with coefficients p(.) and Q 2 (.) on [0, 1], and reflecting boundary points 0, 1. The condition (6.36) guarantees continuity of the extended coefficients. However, if the given t(.) on [0, 1] does not satisfy this condition, one may modify y(x) at x = 0 and x = 1 as in (6.36). Although this makes z(.) discontinuous, Theorem 6.4 may be shown to go through (see theoretical complements 1 and 2). 7 DIFFUSIONS WITH ABSORBING BOUNDARIES Diffusions with absorbing boundaries are rather simple to describe. Upon arrival at a boundary point (state) the process is to remain in that state for all times thereafter. In particular this entails jumps in the transition probability distribution at absorbing states. 7.1 One-Point Boundary Case (Diffusions on S = [a, oo) with Absorption at a) Let p(• ), a'(.) be defined on S and satisfy Condition (1.1). Extend µ(• ), a'(• ) on all of W in some (arbitrary) manner such that Condition (1.1) holds on II'
403
DIFFUSIONS WITH ABSORBING BOUNDARIES
for this extension. Let {X} denote a diffusion on ER' having these coefficients, and starting at x e [a, oo). Define a new stochastic process {X = } by
^
X,
if t < Ta
a=X^.
ift>,T a ,
(7.1)
where 'c a is the first passage time to a defined by
ra := inf{t >, 0: X`
= a}.
(7.2)
Note that 'c a - i a ({X,}) is a function of the process {X}. Theorem 7.1. The process {X,} is a time-homogeneous Markov process. Proof. Let B be a Borel subset of (a, oo). Then P(XS+ ,EB, and T a 's {X:0-
(7.3)
On the other hand, introducing the shifted process {(X) := X5+ ,: t > 0} and
using the Markov property of
{X,}
one gets, on the set {T a > s},
P(Xs+ ,EB,t a >s1 {X:0‹u<s})
= P(Xs+1 e B, T a > s, and T a > S + t {X: 0 1 u <1 s})
= 1 {ta , $} P((XS ), e B, and 'r a (XS ) > t I {Xu : 0 <- u -< s}) = 1 {^a, ‚ )P(X, e B, and Ta > t I {Xo = Y})I ^ =xs = 1 {ra>S}P(X, e B {X0 = Y})I=x,. ,
(7.4)
For the second equality in (7.4), we have used the fact that if T a > s then T a = s + T a (X, ), being the first passage time to a for X. Also, (T a > s} is determined by {X: 0 < u < s}, so that 1{ßa>s} may be taken outside the conditional probability. Combining (7.3) and (7.4) one gets, for B c (a, oo), P(Xs+ ,eBI {XX :0s) = P(X, e B {X o = y})I v=2= l ira>s ^ = P(XI e B I {Xo = Y})I v=g,• (7.5)
The second equality holds, since r a > s implies Xs = Xs (and, always, X o = Xo ). The last equality holds, since T. < s implies X3 = a, so that for B contained in (a, co), P(X e B I {Xo = y})I y=x , = 0. Now the sigmafield Q{X: 0 < u < s} is contained in v{X: 0 < u < s}, since X„ is determined by {X: 0 < v < u}. Hence P(Xs+ e B {X: 0 < u < s}) may be obtained by taking the conditional expectation of the left side of (7.5), given {X:0 < u < s}. But the last expression
404
BROWNIAN MOTION AND DIFFUSIONS
in (7.5) is already a function of Xs . Hence P(X,,, e B I {X,,: 0< u 5 s}) = P(Ä', E B I Xo = y)I v =g..
(7.6)
For B = {a}, (7.6) may be checked by taking B = (a, oo), and by complementation. This establishes the Markov property of {X,}. • Definition 7.1. The Markov process {XJ is called a diffusion on [a, cc) having drift and diffusion coefficients u(• ), a 2 (• ), and an absorbing boundary at a. The transition probability p(t; x, dy) of {Xj is given by p(t; x, B) := P(X^ e B I {X° = x}) (P(XX e B, and t Q > t I {X° = x})
S(P(TQ '
if B c (a, oo) ifB={a}.
(7 7)
In particular, the transition probability distribution will have a jump (positive probability) at the boundary point {a} if a is accessible. Let p(t; x, y) denote the transition probability density of {X}. Since, for Bc(a, cc) and x>a, p(t;x,B)=P(X1 EB,and to >t1 {X° =x}) P(XX e B I {X° = x}) = p(t; x, B),
(B c (a, oo)),
(7.8)
°
°
P(t; x, dy) is given by a density p (t; x, y), say, on (a, co) with p (t; x, y) < p(t; x, y) (for x > a, y > a) (Exercise 1). One may then rewrite (7.7) as
IffB
p°(t;x,y)dy
P(t; x B) = P(TQ t)
ifBc(a, cc) and x>a, ifB={a},
(7.9)
where P. denotes the distribution of {X} starting at x. Thus for the analytical determination of p(t; x, dy) one needs to find the density p ° (t; x, y) and the function (t, x) -^ Px (ra < t) for x > a. It is shown in Section 15 that p° (t; x, y) satisfies the same backward equation as does p(t; x, y) (also see Exercise 2).
°
ap (t; x, y) = öt
I 2
0Zp° ap0
(t>0;x>a,y>a), (7.10)
Za (x) axz + µ(x) Ox
and the Dirichlet boundary condition
°
lim p (t; x, y) = 0. xlo
(7.11)
405
DIFFUSIONS WITH ABSORBING BOUNDARIES
Indeed, (7.11) may be derived from the relation (see (7.9)) P ° (t; x, Y) dY = Px (T a > t),
(x > a, t > 0),
(7.12)
noting that as x j a the probability on the right side goes to zero. If one assumes that p ° (t; x, y) has a limit as x j a, then this limit must be zero. By the same method as used in the derivation of (6.17) it will follow (Exercise 3) that p° satisfies the forward equation
aP ° (t;x,Y) = aZ (i 6Z (Y)P ° ) — öt
ay2
a (µ(Y)P ° ),
(t > 0; x > a,Y > a), (7.13)
aY
and the forward boundary condition lim p ° (t; x, y) = 0,
(t > 0; x > a).
(7.14)
y1a
° °
For a Brownian motion on [0, co) with an absorbing boundary zero, p is calculated in Section 8 analytically. A purely probabilistic derivation of p is sketched in Exercises 11.5, 11.6, and 11.11.
7.2 Two-Point Boundary Case (Diffusions on S = [a, b] with Two Absorbing Boundary Points a, b) Let U(• ), i 2 (•) be defined on [a, b]. Extend these to J' in any manner such that Condition (1.1) is satisfied. Let {X,} be a diffusion on R' having these extended coefficients, starting at a point in [a, b]. Define the stopped process {X,} by,
tx, X,
ift>T,(7.15) ift ^i,
where T is the first passage time to the boundary, r:=
inf{t>,0:X,=aor
(7.16)
X,=b}.
Virtually the same proof as given for Theorem 7.1 applies to show that {X t } is a Markov process on [a, b]. Definition 7.2. The process {X,} in (7.15) is called a diffusion coefficients i(.), a'(•) and with two absorbing boundaries a, b.
Once again, the transition probability
p(t; x, dy) of {X}
on [a, b] with
is given by a density
406
BROWNIAN MOTION AND DIFFUSIONS
p ° (t; x, y) when restricted to the interior (a, b),
°
p(t; x, B) = J p (t; x, y) dy a
(t > 0; a <x < b, B c (a, b)). (7.17)
This density has total mass less than 1,
J
°
P (t; x, y) dy = PX (r > t),
(t > 0; x e (a, b)),
(7.18)
(a,b)
where Ps is the distribution of {X}. Also, by the same argument as in the one-point boundary case, lim p ° (t; x, y) = 0,
(7.19)
Y1a
lim p ° (t; x, y) = 0;
(t > 0; x e (a, b)).
(7.20)
YIb
Unlike the one-point boundary case, however, p ° does not completely determine p(t; x, dy) in the present case. For this, one also needs to calculate the probabilities P(t; x, {b}) := Px (i ,< t, XL = b). (7.21)
P(t; x, {a}) °= Px (r < t, Xi = a),
In order to calculate these probabilities, let A denote the event that the diffusion {X,}, starting at x, reaches a before reaching b, and let DD be the event that by time t the process does not reach either a or b but eventually reaches a before b. Then DD c A and P(t; x, {a}) = Px(A\D,) = P(A) — Px(DD) = ‚ 1i(x) — P(D),
(7.22)
where (/i(x) is the probability that starting at x the diffusion {X,} reaches a before b. Conditioning on XX , one gets for t > 0, a <x < b,
PP(DD) =
f
°
PX(D, 1 X, = y)P (t; x, y) dy =
(a,b)
f
i(y)P°(t; x, y) dy. (7.23)
(a.b)
Substituting (7.22) in (7.23) yields p(t; x, {a}) = ^i(x)
—
J
°
^i(y)p (t; x, y) dy, (a,b)
Therefore we have the following result.
(t > 0, a < x < b). (7.24)
DIFFUSIONS WITH ABSORBING BOUNDARIES
407
Proposition 7.2. Let µ(x) and σ²(x) satisfy Condition (1.1) on (−∞, ∞). Let S = [a, b] for some a < b. Then the probability p(t; x, {a}) is given by

\[
p(t;x,\{a\}) = \psi(x) - \int_a^b \psi(y)\,p^0(t;x,y)\,dy \qquad (t>0,\; a<x<b), \tag{7.25}
\]

where ψ(y) = P_y(X_τ = a), P_y being the distribution of the unrestricted process {X_t} starting at y.

The function ψ(x) in Proposition 7.2 is given by (2.24) as well as (4.17). It is also calculated in Section 8, where it is shown that ψ(x) can be obtained as the solution to a boundary-value problem that is the continuous (or, differential equation) analog of the discrete (or difference equation) boundary-value problem (Chapter III, Eqs. 2.4–2.5) for the birth–death process. It is given by

\[
\psi(x) = \frac{\int_x^b \exp\Bigl\{-\int_a^z \frac{2\mu(y)}{\sigma^2(y)}\,dy\Bigr\}\,dz}{\int_a^b \exp\Bigl\{-\int_a^z \frac{2\mu(y)}{\sigma^2(y)}\,dy\Bigr\}\,dz} \qquad (a \le x \le b). \tag{7.26}
\]

Finally,

\[
p(t;x,\{b\}) = P_x(\tau \le t) - p(t;x,\{a\}) = 1 - \int_{(a,b)} p^0(t;x,y)\,dy - p(t;x,\{a\}). \tag{7.27}
\]

For the case of a Brownian motion on [a, b] with both boundary points a, b absorbing, calculations of p°(t; x, y) and ψ(x) based on eigenfunction expansions and (7.26) are given in Section 8. A purely probabilistic calculation is sketched in Exercises 11.5, 11.6, and 11.10.
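As a sanity check on (7.25) and (7.27), the absorption probabilities can also be estimated by direct simulation. The following minimal Python sketch (not from the text; the step size, horizon, and sample count are illustrative choices) simulates a standard Brownian motion on [a, b] = [0, 1] by an Euler scheme and estimates P(t; x, {a}); for large t this should approach ψ(x) = (b − x)/(b − a), the zero-drift case of (7.26).

```python
import numpy as np

def absorption_prob_left(x0, t, a=0.0, b=1.0, dt=1e-3, n_paths=10000, seed=0):
    """Monte Carlo estimate of P(t; x0, {a}) for standard Brownian motion
    on [a, b] with both endpoints absorbing (Euler scheme; illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    hit_left = np.zeros(n_paths, dtype=bool)
    alive = np.ones(n_paths, dtype=bool)
    for _ in range(int(t / dt)):
        x[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
        la = alive & (x <= a)          # paths absorbed at a this step
        lb = alive & (x >= b)          # paths absorbed at b this step
        hit_left |= la
        alive &= ~(la | lb)
    return hit_left.mean()

# for large t this approaches psi(0.3) = (1 - 0.3)/1 = 0.7
print(absorption_prob_left(0.3, t=2.0))
```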
7.3 Mixed Two-Point Boundary Case (One Absorbing Boundary Point and One Reflecting)

Let {X_t} be a diffusion on [a, ∞) with a reflecting boundary a, having drift and diffusion coefficients µ(·), σ²(·). For b > a, define

\[
\hat X_t = \begin{cases} X_t & \text{if } t \le \tau_b,\\ b & \text{if } t > \tau_b. \end{cases} \tag{7.28}
\]

If {X_t} starts at x ∈ [a, b], then {X̂_t} is a Markov process on [a, b], starting at x, which is called a diffusion on [a, b] having coefficients µ(·), σ²(·), a reflecting boundary point a, and an absorbing boundary point b.
8 CALCULATION OF TRANSITION PROBABILITIES BY SPECTRAL METHODS

Consider an arbitrary diffusion on an interval S with drift µ(x) and diffusion coefficient σ²(x). If an end point is accessible, then choose either a Neumann (reflecting) or a Dirichlet (absorbing) boundary condition. Let S° denote the interior of S. Define the function

\[
\pi(x) = \frac{2a}{\sigma^2(x)}\,\exp\{I(x_0, x)\}, \qquad x \in S, \tag{8.1}
\]

where a is an arbitrary positive constant, x₀ is an arbitrarily chosen state, and

\[
I(x_0, x) := \int_{x_0}^x \frac{2\mu(z)}{\sigma^2(z)}\,dz. \tag{8.2}
\]

Notice that π(x) is proportional to the limiting measure obtained from Eq. 4.1 of Chapter III, in the diffusion limit of birth–death chains with the parameters given by (4.2) of the present chapter. Consider the space L²(S, π) of real-valued functions on S that are square integrable with respect to the density π. Let f, g be twice continuously differentiable functions that satisfy the backward Neumann or Dirichlet boundary condition(s) imposed at the boundary point(s), and that vanish outside a finite interval. Then, upon integration by parts, one obtains the following symmetry property for A (Exercise 1),

\[
\langle Af, g\rangle_\pi = \langle f, Ag\rangle_\pi, \tag{8.3}
\]

where

\[
Af(x) = \tfrac{1}{2}\sigma^2(x)f''(x) + \mu(x)f'(x), \qquad x \in S^\circ, \tag{8.4}
\]

and the inner product ⟨·, ·⟩_π is defined by

\[
\langle f, h\rangle_\pi = \int_S f(x)\,h(x)\,\pi(x)\,dx. \tag{8.5}
\]
In view of the symmetry of A reflected in (8.3), one may expect that there is a spectral representation of the transition probability analogous to that for discrete state spaces (see Section 4 of Chapter III and Section 9 of Chapter IV). That this is indeed the case will be illustrated in forthcoming examples of this section (also see theoretical complement 2).

Consider the case in which S is a closed and bounded interval. The idea behind this method is that if ψ is an eigenfunction of A (including boundary conditions) corresponding to an eigenvalue α, i.e.,

\[
A\psi = \alpha\psi, \tag{8.6}
\]

then u(t, x) = e^{αt}ψ(x) solves the backward equation

\[
\frac{\partial u}{\partial t} = \alpha e^{\alpha t}\psi(x) = e^{\alpha t}A\psi(x) = Au(t,x), \tag{8.7}
\]

and u satisfies the boundary conditions. Likewise, if u(t, x) is a superposition (linear combination) of such functions, then the same will be true. Suppose that the set of eigenvalues (counting multiplicities) is countable, say α₀, α₁, α₂, ..., with corresponding eigenfunctions ψ₀, ψ₁, ... of unit length, i.e., ‖ψ_n‖ = ⟨ψ_n, ψ_n⟩_π^{1/2} = 1. It is simple to check that eigenfunctions corresponding to distinct eigenvalues are orthogonal with respect to the inner product (8.5) (Exercise 2). Also, if there is more than one linearly independent eigenfunction for a single eigenvalue, then these can be orthogonalized by the Gram–Schmidt procedure (Exercise 2). So ψ₀, ψ₁, ... can be taken to be orthonormal. If the set of finite linear combinations of the eigenfunctions is dense in L²(S, π), then the eigenfunctions are said to be complete. In this case each f ∈ L²(S, π) has a Fourier expansion of the form

\[
f = \sum_{n=0}^{\infty} \langle f, \psi_n\rangle_\pi\,\psi_n. \tag{8.8}
\]

Consider the superposition defined by

\[
u(t, x) = \sum_{n=0}^{\infty} e^{\alpha_n t}\,\langle f, \psi_n\rangle_\pi\,\psi_n(x). \tag{8.9}
\]

Then u satisfies the backward equation and the boundary condition(s), together with the initial condition

\[
u(0, x) = f(x). \tag{8.10}
\]

However, the function

\[
T_t f(x) = E_x f(X_t) = \int_S f(y)\,p(t;x,dy) \tag{8.11}
\]

also satisfies the same backward equation, boundary condition, and initial condition. So if there is uniqueness for a sufficiently large class of initial functions f (see theoretical complement 2), then we get

\[
T_t f(x) = u(t, x), \tag{8.12}
\]

which means

\[
\int_S f(y)\,p(t;x,dy) = \int_S f(y)\Bigl(\sum_{n=0}^{\infty} e^{\alpha_n t}\,\psi_n(x)\,\psi_n(y)\Bigr)\pi(y)\,dy. \tag{8.13}
\]

In such cases, therefore, p(t; x, dy) has a density p(t; x, y), and it is given by

\[
p(t;x,y) = \sum_{n=0}^{\infty} e^{\alpha_n t}\,\psi_n(x)\,\psi_n(y)\,\pi(y). \tag{8.14}
\]
In the present section this method of computation of transition probabilities is illustrated in the case that µ(x) and σ²(x) are constants. In Examples 1 and 2 below, µ = 0 and σ²(x) ≡ σ² > 0, so that π(x) is a constant that may be taken to be 1. Explicit computations of eigenvalues and eigenfunctions are possible only in very special cases. There are, however, effective numerical procedures for their approximate computation.

Example 1. (Brownian Motion on [0, d] with Two Reflecting Boundaries). In this case S = [0, d], and Kolmogorov's backward equation for the transition probability density function p(t; x, y) is

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 p}{\partial x^2} \qquad (t>0,\; 0<x<d;\; y \in S), \tag{8.15}
\]

with the backward boundary condition

\[
\frac{\partial p}{\partial x}\Big|_{x=0,d} = 0 \qquad (t>0;\; y \in S). \tag{8.16}
\]

We seek eigenvalues α of A and corresponding eigenfunctions ψ. That is, consider solutions of

\[
\tfrac{1}{2}\sigma^2\psi''(x) = \alpha\psi(x), \tag{8.17}
\]
\[
\psi'(0) = \psi'(d) = 0. \tag{8.18}
\]

Check that the functions

\[
\psi_m(x) = b_m\cos\Bigl(\frac{m\pi x}{d}\Bigr) \qquad (m = 0, 1, 2, \ldots) \tag{8.19}
\]

satisfy (8.17), (8.18), with α given by

\[
\alpha_m = -\frac{m^2\pi^2\sigma^2}{2d^2} \qquad (m = 0, 1, 2, \ldots). \tag{8.20}
\]

Now the function (8.1) is here π(x) ≡ 1. Since

\[
\int_0^d 1^2\,dx = d, \qquad \int_0^d \cos^2\Bigl(\frac{m\pi x}{d}\Bigr)\,dx = \frac{d}{2}, \tag{8.21}
\]

the normalizing constant b_m is given by

\[
b_m = \sqrt{\frac{2}{d}} \quad (m \ge 1); \qquad b_0 = \sqrt{\frac{1}{d}}. \tag{8.22}
\]

To see whether (8.19) and (8.20) provide all the eigenvalues and eigenfunctions, one needs to check the completeness of {ψ_m: m = 0, 1, 2, ...} in L²([0, d], π). This may be done by using Fourier series (Exercise 10). Thus,

\[
p(t;x,y) = \sum_{m=0}^{\infty} e^{\alpha_m t}\,\psi_m(x)\,\psi_m(y) = \frac{1}{d} + \frac{2}{d}\sum_{m=1}^{\infty}\exp\Bigl\{-\frac{m^2\pi^2\sigma^2}{2d^2}\,t\Bigr\}\cos\Bigl(\frac{m\pi x}{d}\Bigr)\cos\Bigl(\frac{m\pi y}{d}\Bigr) \quad (t>0,\; 0 \le x, y \le d). \tag{8.23}
\]

Notice that

\[
\lim_{t\to\infty} p(t;x,y) = \frac{1}{d}, \tag{8.24}
\]

the convergence being exponentially fast, uniformly with respect to x, y. Thus, as t → ∞, the distribution of the process at time t converges to the uniform distribution on [0, d]. In particular, this limiting distribution provides the invariant initial distribution for the process.
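The eigenfunction series (8.23) is easy to evaluate numerically, since the terms decay like exp{−m²π²σ²t/2d²}. The following Python sketch (the truncation level and parameters are illustrative choices, not from the text) sums the series and checks that the density integrates to 1 over [0, d] and approaches the uniform value 1/d for large t.

```python
import numpy as np

def p_reflected(t, x, y, d=1.0, sigma=1.0, n_terms=200):
    """Transition density (8.23) for Brownian motion on [0, d] with two
    reflecting boundaries, truncating the eigenfunction series."""
    m = np.arange(1, n_terms + 1)
    rates = np.exp(-(m * np.pi * sigma) ** 2 * t / (2 * d ** 2))
    series = np.sum(rates * np.cos(m * np.pi * x / d) * np.cos(m * np.pi * y / d))
    return 1.0 / d + (2.0 / d) * series

# total-mass check: integrate over y for fixed t, x
ys = np.linspace(0, 1, 2001)
vals = np.array([p_reflected(0.1, 0.3, y) for y in ys])
print(vals.sum() * (ys[1] - ys[0]))   # ~ 1.0
print(p_reflected(5.0, 0.3, 0.8))     # ~ 1.0 = 1/d, the uniform limit (8.24)
```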
Example 2. (Brownian Motion with One Reflecting Boundary). Let S = [0, ∞), µ(x) ≡ 0, σ²(x) ≡ σ² > 0. We seek the transition density p that satisfies

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 p}{\partial x^2} \quad (t>0,\; x>0,\; y \ge 0); \qquad \frac{\partial p}{\partial x}\Big|_{x=0} = 0 \quad (t>0,\; y \ge 0). \tag{8.25}
\]

This diffusion is the limiting form of that in Example 1 as d ↑ ∞. Letting d ↑ ∞ in (8.23) one has

\[
p(t;x,y) = \lim_{d\uparrow\infty}\frac{1}{d}\sum_{m=-\infty}^{\infty}\exp\Bigl\{-\frac{\sigma^2\pi^2(m/d)^2}{2}\,t\Bigr\}\cos\Bigl(\frac{m\pi x}{d}\Bigr)\cos\Bigl(\frac{m\pi y}{d}\Bigr)
= \int_{-\infty}^{\infty}\exp\Bigl\{-\frac{t\sigma^2\pi^2 u^2}{2}\Bigr\}\cos(\pi x u)\cos(\pi y u)\,du
\]
\[
= \frac{1}{2}\int_{-\infty}^{\infty}\exp\Bigl\{-\frac{t\sigma^2\pi^2 u^2}{2}\Bigr\}\bigl[\cos(\pi(x+y)u) + \cos(\pi(x-y)u)\bigr]\,du
\]
\[
= \frac{1}{(2\pi t\sigma^2)^{1/2}}\Bigl(\exp\Bigl\{-\frac{(x+y)^2}{2t\sigma^2}\Bigr\} + \exp\Bigl\{-\frac{(x-y)^2}{2t\sigma^2}\Bigr\}\Bigr) \qquad (t>0,\; 0 \le x, y < \infty). \tag{8.26}
\]

The last equality in (8.26) follows from Fourier inversion and the fact that exp{−tσ²ξ²/2} is the characteristic function of the normal distribution with mean zero and variance tσ².
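Formula (8.26) is the method-of-images density: for fixed t, the reflecting Brownian motion started at x ≥ 0 has the same distribution as |x + σB_t|. A short Python sketch (parameters illustrative, not from the text) compares (8.26) with a histogram of simulated values of |x + σB_t|.

```python
import numpy as np

def p_reflecting(t, x, y, sigma=1.0):
    """Density (8.26): Brownian motion on [0, inf) reflected at 0."""
    c = 1.0 / np.sqrt(2 * np.pi * sigma**2 * t)
    return c * (np.exp(-(x - y)**2 / (2 * sigma**2 * t))
                + np.exp(-(x + y)**2 / (2 * sigma**2 * t)))

rng = np.random.default_rng(1)
t, x, sigma = 0.5, 0.4, 1.0
samples = np.abs(x + sigma * np.sqrt(t) * rng.standard_normal(200000))
hist, edges = np.histogram(samples, bins=60, range=(0, 3), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - p_reflecting(t, x, mids))))  # small discrepancy
```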
Example 3. (Diffusion on S = [0, d] with Constant Coefficients and Two Absorbing Boundaries). Here S = [0, d], and the transition probability has a density p(t; x, y) on 0 < x, y < d that satisfies the backward equation

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2\frac{\partial^2 p}{\partial x^2} + \mu\frac{\partial p}{\partial x} \qquad (t>0,\; 0<x<d), \tag{8.27}
\]

and the backward boundary conditions

\[
\lim_{x\downarrow 0} p(t;x,y) = 0, \qquad \lim_{x\uparrow d} p(t;x,y) = 0, \tag{8.28}
\]

for t > 0, 0 < y < d. The functions

\[
\psi_m(x) = b_m\exp\Bigl\{-\frac{\mu x}{\sigma^2}\Bigr\}\sin\Bigl(\frac{m\pi x}{d}\Bigr) \qquad (m = 1, 2, \ldots) \tag{8.29}
\]

are eigenfunctions corresponding to eigenvalues

\[
\alpha_m = -\frac{m^2\pi^2\sigma^2}{2d^2} - \frac{\mu^2}{2\sigma^2} \qquad (m = 1, 2, \ldots). \tag{8.30}
\]

Since the function in (8.1) is in this case given by

\[
\pi(x) = \exp\Bigl\{\frac{2\mu x}{\sigma^2}\Bigr\} \qquad (0 \le x \le d), \tag{8.31}
\]

the normalizing constants are

\[
b_m = \sqrt{\frac{2}{d}} \qquad (m = 1, 2, \ldots). \tag{8.32}
\]

The eigenfunction expansion of p is therefore

\[
p(t;x,y) = \sum_{m=1}^{\infty} e^{\alpha_m t}\,\psi_m(x)\,\psi_m(y)\,\pi(y)
= \frac{2}{d}\exp\Bigl\{\frac{\mu(y-x)}{\sigma^2}\Bigr\}\exp\Bigl\{-\frac{\mu^2 t}{2\sigma^2}\Bigr\}\sum_{m=1}^{\infty}\exp\Bigl\{-\frac{m^2\pi^2\sigma^2}{2d^2}\,t\Bigr\}\sin\Bigl(\frac{m\pi x}{d}\Bigr)\sin\Bigl(\frac{m\pi y}{d}\Bigr) \quad (t>0,\; 0<x,y<d). \tag{8.33}
\]
It remains to calculate

\[
p(t;x,\{0\}) = P_x(\hat X_t = 0), \qquad p(t;x,\{d\}) = P_x(\hat X_t = d), \qquad 0 < x < d. \tag{8.34}
\]

Note that

\[
p(t;x,\{0\}) + p(t;x,\{d\}) = 1 - \int_{(0,d)} p(t;x,y)\,dy
= 1 - \frac{2}{d}\exp\Bigl\{-\frac{\mu x}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\Bigr\}\sum_{m=1}^{\infty} a_m\exp\Bigl\{-\frac{m^2\pi^2\sigma^2}{2d^2}\,t\Bigr\}\sin\Bigl(\frac{m\pi x}{d}\Bigr), \tag{8.35}
\]

where

\[
a_m = \int_0^d \exp\Bigl\{\frac{\mu y}{\sigma^2}\Bigr\}\sin\Bigl(\frac{m\pi y}{d}\Bigr)\,dy. \tag{8.36}
\]

Thus, it is enough to determine p(t; x, {0}); the function p(t; x, {d}) is then determined from (8.35). From (7.25) we get

\[
p(t;x,\{0\}) = \psi(x) - \int_{(0,d)} \psi(y)\,p(t;x,y)\,dy, \tag{8.37}
\]
where ψ(x) = P_x({X_t} reaches 0 before d). From the calculations of (9.6) in Chapter I we have

\[
\psi(x) = \begin{cases} 1 - \dfrac{x}{d} & \text{if } \mu = 0,\\[2ex] \dfrac{1 - \exp\{2\mu(d-x)/\sigma^2\}}{1 - \exp\{2d\mu/\sigma^2\}} & \text{if } \mu \ne 0. \end{cases} \tag{8.38}
\]

Substituting (8.33) and (8.38) in (8.37) yields p(t; x, {0}).

Example 4. (Diffusion on S = [0, ∞) with Constant Coefficients µ, σ² > 0; Zero an Absorbing Boundary). The transition density function p(t; x, y) for
x, y ∈ (0, ∞) is obtained as a limit of (8.33) as d ↑ ∞. That is,

\[
p(t;x,y) = \exp\Bigl\{\frac{\mu(y-x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\Bigr\}\cdot 2\int_0^\infty \exp\Bigl\{-\frac{\pi^2\sigma^2 t u^2}{2}\Bigr\}\sin(\pi x u)\sin(\pi y u)\,du
\]
\[
= \exp\Bigl\{\frac{\mu(y-x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\Bigr\}\int_0^\infty \exp\Bigl\{-\frac{\pi^2\sigma^2 t u^2}{2}\Bigr\}\bigl[\cos(\pi(x-y)u) - \cos(\pi(x+y)u)\bigr]\,du
\]
\[
= \frac{1}{(2\pi\sigma^2 t)^{1/2}}\exp\Bigl\{\frac{\mu(y-x)}{\sigma^2} - \frac{\mu^2 t}{2\sigma^2}\Bigr\}\Bigl[\exp\Bigl\{-\frac{(x-y)^2}{2\sigma^2 t}\Bigr\} - \exp\Bigl\{-\frac{(x+y)^2}{2\sigma^2 t}\Bigr\}\Bigr]. \tag{8.39}
\]

Integration of (8.39) with respect to y over (0, ∞) yields p(t; x, {0}ᶜ), which is the probability that, starting from x, the first passage time to zero is greater than t. Examples 3 and 4 yield the distributions of the maximum and the minimum, and the joint distribution of the maximum, minimum, and the state at time t, of a diffusion with constant coefficients over the time period [0, t] (Exercises 4–6).
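Formula (8.39) lends itself to a direct Monte Carlo check. The sketch below (discretization and parameters are illustrative choices, not from the text; note that discrete monitoring slightly overestimates survival, since crossings between grid points are missed) simulates a drifted Brownian motion killed at 0 and compares the fraction of surviving paths with the integral of (8.39) over y ∈ (0, ∞).

```python
import numpy as np

def p0_density(t, x, y, mu, sigma):
    """Absorbed-at-zero transition density (8.39)."""
    pref = np.exp(mu * (y - x) / sigma**2 - mu**2 * t / (2 * sigma**2))
    g = lambda u: np.exp(-u**2 / (2 * sigma**2 * t))
    return pref * (g(x - y) - g(x + y)) / np.sqrt(2 * np.pi * sigma**2 * t)

t, x, mu, sigma = 1.0, 0.5, -0.2, 1.0
ys = np.linspace(1e-6, 10, 4000)
survival_exact = np.sum(p0_density(t, x, ys, mu, sigma)) * (ys[1] - ys[0])

rng = np.random.default_rng(2)
n, steps = 20000, 400
dt = t / steps
incs = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal((steps, n))
paths = x + np.cumsum(incs, axis=0)
survival_mc = np.mean(paths.min(axis=0) > 0)   # never hit 0 up to time t
print(survival_exact, survival_mc)             # should roughly agree
```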
9 TRANSIENCE AND RECURRENCE OF DIFFUSIONS

Let {X_t: t ≥ 0} be a diffusion on an interval S, with drift and diffusion coefficients µ(x) and σ²(x), starting at x. Let [c, d] ⊂ S, c < d, and let

\[
\psi(x) = P_x(\{X_t\} \text{ reaches } c \text{ before } d), \qquad c \le x \le d. \tag{9.1}
\]

Assume, as always, that Condition (1.1) holds. Criteria for transience and recurrence of diffusions may be derived from the computation of ψ(x) given in Section 4 (see Eq. 4.17). A different method of computation, based on solving a differential equation governing ψ, is given in this section; recall Chapter III, Eqs. 2.4, 2.5 in the case of the birth–death chain for the analogous discrete problem. The present method is similar to that of Proposition 2.5, but does not make use of martingales. It does, however, use the fact derived in Exercise 3.5, namely,

\[
P_x\Bigl(\max_{0\le s\le h}|X_s - x| > \varepsilon\Bigr) = o(h) \quad \text{as } h \downarrow 0, \text{ for every } \varepsilon > 0.
\]

Assume that this last fact holds.
Proposition 9.1. ψ is the solution to the two-point boundary-value problem:

\[
\tfrac{1}{2}\sigma^2(x)\psi''(x) + \mu(x)\psi'(x) = 0 \quad \text{for } c<x<d; \qquad \psi(c) = 1, \quad \psi(d) = 0. \tag{9.2}
\]

Proof. Let τ denote the first passage time to the two-element set {c, d}. The event

\[
\{\tau > h\} = \bigl\{c < \min\{X_u: 0 \le u \le h\} \le \max\{X_u: 0 \le u \le h\} < d\bigr\}
\]

is determined by {X_u: 0 ≤ u ≤ h}. Therefore, by the Markov property,

\[
P_x(\tau > h,\ X_\tau = c) = E_x\bigl[P(\tau > h,\ X_\tau = c \mid \{X_u: 0 \le u \le h\})\bigr]
= E_x\bigl[1_{\{\tau>h\}}\,P((X_h^+)_{\tau'} = c \mid \{X_u: 0 \le u \le h\})\bigr]
= E_x\bigl(1_{\{\tau>h\}}\,\psi(X_h)\bigr). \tag{9.3}
\]

Here X_h^+ is the shifted process {(X_h^+)_t} := {X_{h+t}: t ≥ 0} and τ' = inf{t: (X_h^+)_t ∈ {c, d}} (= τ − h on {τ > h}). Now extend the function ψ(x) to all of S as a bounded (measurable) function. Then

\[
P_x(\tau > h,\ X_\tau = c) = E_x\psi(X_h) - E_x\bigl(1_{\{\tau\le h\}}\psi(X_h)\bigr) = \int_S \psi(y)\,p(h;x,y)\,dy + o(h), \quad \text{as } h \downarrow 0. \tag{9.4}
\]

The remainder term in (9.4) is o(h) by the third condition in (1.2)'. Actually, one needs the stronger property that P_x(max_{0≤s≤h}|X_s − x| > ε) = o(h) (see Exercise 3.5). For the same reason,

\[
P_x(\tau \le h,\ X_\tau = c) = o(h) \quad \text{as } h \downarrow 0. \tag{9.5}
\]

By (9.4) and (9.5),

\[
\psi(x) = P_x(\tau > h,\ X_\tau = c) + P_x(\tau \le h,\ X_\tau = c) = (T_h\psi)(x) + o(h), \quad \text{as } h \downarrow 0. \tag{9.6}
\]

Hence

\[
\lim_{h\downarrow 0}\frac{(T_h\psi)(x) - \psi(x)}{h} = 0. \tag{9.7}
\]

But the limit on the left is (Aψ)(x) = ½σ²(x)ψ″(x) + µ(x)ψ′(x) (see Section 2). ∎
In order to solve the two-point boundary-value problem (9.2), write

\[
I(x, z) := \int_x^z \frac{2\mu(y)}{\sigma^2(y)}\,dy \qquad (x, z \in S), \tag{9.8}
\]

with the usual convention that I(z, x) = −I(x, z). Fix an arbitrary x₀ ∈ S, and define the scale function

\[
s(x) \equiv s(x_0; x) := \int_{x_0}^x \exp\{-I(x_0, z)\}\,dz \qquad (x \in S), \tag{9.9}
\]

and the speed function

\[
m(x) \equiv m(x_0; x) := \int_{x_0}^x \frac{2}{\sigma^2(z)}\exp\{I(x_0, z)\}\,dz \qquad (x \in S). \tag{9.10}
\]

In terms of these two functions, the differential operator

\[
A = \tfrac{1}{2}\sigma^2(x)\frac{d^2}{dx^2} + \mu(x)\frac{d}{dx} \tag{9.11}
\]

takes the simple form

\[
A = \frac{d}{dm(x)}\frac{d}{ds(x)}. \tag{9.12}
\]

In other words, for every twice-differentiable function f on S (Exercise 1),

\[
(Af)(x) = \frac{d}{dm(x)}\Bigl(\frac{df(x)}{ds(x)}\Bigr), \tag{9.13}
\]

where, by the chain rule,

\[
\frac{df(x)}{ds(x)} = \frac{df(x)}{dx}\frac{dx}{ds(x)} = \frac{1}{s'(x)}f'(x), \qquad \frac{df(x)}{dm(x)} = \frac{1}{m'(x)}f'(x). \tag{9.14}
\]
The differential equation in (9.2) may now be expressed as

\[
\frac{d}{dm(x)}\Bigl(\frac{d\psi(x)}{ds(x)}\Bigr) = 0, \tag{9.15}
\]

which yields

\[
\frac{d\psi(x)}{ds(x)} = c_1, \qquad \psi(x) = c_1 s(x) + c_2. \tag{9.16}
\]

The boundary conditions in (9.2) now yield

\[
c_1 = -\frac{1}{s(d)-s(c)}, \qquad c_2 = -c_1 s(d) = \frac{s(d)}{s(d)-s(c)}. \tag{9.17}
\]

Use these constants in (9.16) to get

\[
\psi(x) = \frac{s(d)-s(x)}{s(d)-s(c)} = \frac{s(x_0; d)-s(x_0; x)}{s(x_0; d)-s(x_0; c)}. \tag{9.18}
\]

Taking x₀ = c and noting that s(c; c) = 0, one has

\[
\psi(x) = \frac{\int_x^d \exp\{-I(c,z)\}\,dz}{\int_c^d \exp\{-I(c,z)\}\,dz}. \tag{9.19}
\]
The relation (9.18) justifies the scale function nomenclature for the function s. When distance is measured in this scale, i.e., when s(y) − s(x) is regarded as the distance between x and y (x < y), the probability of reaching c before d starting at x is proportional to the distance s(d) − s(x) between x and d. This is a property of Brownian motion, for which s(0; x) = x (also see Eq. 9.6, Chapter I). In particular, starting from the middle of an interval in this scale, the probability of reaching the left end point before the right end point is ½, the same as that of reaching the right end point first. It follows from (9.18) (Exercise 2) that

\[
\varphi(x) := P_x(\{X_t\} \text{ reaches } d \text{ before } c) = \frac{s(x)-s(c)}{s(d)-s(c)} = \frac{\int_c^x e^{-I(c,z)}\,dz}{\int_c^d e^{-I(c,z)}\,dz}. \tag{9.20}
\]
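Equations (9.19) and (9.20) reduce to simple quadratures once the coefficients are given. The following Python sketch (a minimal illustration; the Ornstein–Uhlenbeck-type drift µ(x) = −γx and constant σ² are choices made here, not in the text) computes the scale function and ψ numerically.

```python
import numpy as np

def psi_reach_c_before_d(x, c, d, mu, sigma2, n=4000):
    """psi(x) of (9.19): P_x(reach c before d), via the scale function.
    mu and sigma2 are callables; plain trapezoidal quadrature."""
    z = np.linspace(c, d, n)
    dz = z[1] - z[0]
    f = 2 * mu(z) / sigma2(z)
    I = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dz)])   # I(c, z)
    e = np.exp(-I)                                                        # exp{-I(c, z)}
    s = np.concatenate([[0.0], np.cumsum(0.5 * (e[1:] + e[:-1]) * dz)])   # s(z), s(c) = 0
    return np.interp(x, z, (s[-1] - s) / s[-1])

gamma = 1.0
mu = lambda x: -gamma * x
sigma2 = lambda x: np.ones_like(x)
# start at 0: the mean-reverting drift makes the nearer boundary c = -1
# far more likely than d = 2, so psi(0) is well above 1/2
print(psi_reach_c_before_d(0.0, -1.0, 2.0, mu, sigma2))
```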
Of course, (9.20) may be proved directly in the same way as (9.18). Next write

\[
\rho_{xy} := P_x(\{X_t\} \text{ ever reaches } y) \qquad (x, y \in S). \tag{9.21}
\]

Then the following results hold.

Proposition 9.2. Let S = (a, b). Fix x₀ ∈ S arbitrarily.
(a) If

\[
s(a) \equiv s(x_0; a) = -\infty, \tag{9.22}
\]

then ρ_{xy} = 1 for all y > x. If s(x₀; a) is finite, then

\[
\rho_{xy} = \frac{s(x)-s(a)}{s(y)-s(a)} = \frac{\int_a^x \exp\{-I(x_0,z)\}\,dz}{\int_a^y \exp\{-I(x_0,z)\}\,dz} \qquad (y > x). \tag{9.23}
\]

(b) If

\[
s(x_0; b) = \infty, \tag{9.24}
\]

then ρ_{xy} = 1 for all y < x. If s(x₀; b) is finite, then

\[
\rho_{xy} = \frac{s(b)-s(x)}{s(b)-s(y)} = \frac{\int_x^b \exp\{-I(x_0,z)\}\,dz}{\int_y^b \exp\{-I(x_0,z)\}\,dz} \qquad (y < x). \tag{9.25}
\]

Proof. (a) If y > x, then ρ_{xy} is obtained from (9.20) by letting d = y and c ↓ a. (b) If y < x, then use (9.18) with c = y, and let d ↑ b. ∎
Definition 9.1. A state y is recurrent if ρ_{xy} = 1 for all x ∈ S such that ρ_{yx} > 0, and is transient otherwise. If all states in S are recurrent, then the diffusion is said to be recurrent.

The following corollary is a useful consequence of Proposition 9.2.

Corollary 9.3. A diffusion on S = (a, b) with coefficients µ(x), σ²(x) is recurrent if and only if s(a) = −∞ and s(b) = ∞.

If S has one or two boundary points, then ρ_{xy} is given by the following proposition. The modifications of the above needed to get these results are left to the exercises.

Proposition 9.4
(a) Suppose S = [a, b) and a is reflecting. Then the diffusion is recurrent if and only if s(b) = ∞. If, on the other hand, s(b) < ∞, then ρ_{xy} = 1 for y > x and

\[
\rho_{xy} = \frac{s(b)-s(x)}{s(b)-s(y)} = \frac{\int_x^b \exp\{-I(x_0,z)\}\,dz}{\int_y^b \exp\{-I(x_0,z)\}\,dz}, \qquad y < x. \tag{9.26}
\]

(b) Suppose S = [a, b) and a is absorbing. Then the only recurrent state is a, and

\[
\rho_{xy} = \begin{cases} \dfrac{s(x)-s(a)}{s(y)-s(a)} & \text{if } a \le x \le y,\\[1.5ex] 1 & \text{if } s(b) = \infty \text{ and } a \le y \le x,\\[1ex] \dfrac{s(b)-s(x)}{s(b)-s(y)} & \text{if } s(b) < \infty \text{ and } a \le y \le x. \end{cases} \tag{9.27}
\]

(c) Suppose S = [a, b] with both boundaries reflecting. Then ρ_{xy} = 1 for all x, y ∈ S, and the diffusion is recurrent.
(d) Suppose S = [a, b] with a absorbing and b reflecting. Then the only recurrent state is a, ρ_{xy} = 1 for y < x, and

\[
\rho_{xy} = \frac{s(x)-s(a)}{s(y)-s(a)}, \qquad y > x. \tag{9.28}
\]

(e) Suppose S = [a, b] with both boundaries absorbing. Then all states other than a and b are transient, and one has

\[
\rho_{xy} = \begin{cases} \dfrac{s(b)-s(x)}{s(b)-s(y)} & \text{if } a \le y \le x,\\[1.5ex] \dfrac{s(x)-s(a)}{s(y)-s(a)} & \text{if } a \le x \le y. \end{cases} \tag{9.29}
\]
10 NULL AND POSITIVE RECURRENCE OF DIFFUSIONS

As in the case of Markov chains, the existence of an invariant initial distribution for a diffusion is equivalent to its positive recurrence. The present section is devoted to the derivation of a criterion for positive recurrence. For a diffusion {X_t}, let τ_y denote the first passage time to a state y, defined by

\[
\tau_y = \inf\{t \ge 0: X_t = y\}, \tag{10.1}
\]

with the usual convention that τ_y(ω) = ∞ for those sample points ω for which X_t(ω) does not equal y for any t (i.e., if the set on the right side of (10.1) is empty). If c < d are two states, then write

\[
\tau := \inf\{t \ge 0: X_t \notin (c,d)\} = \tau_c \wedge \tau_d, \tag{10.2}
\]

where ∧ denotes the minimum, i.e., s ∧ t = min(s, t). Then τ is the first time the process is at c or d, provided X₀ ∈ [c, d]. Let us now calculate the mean escape time M(x) of a diffusion from an interval (c, d),

\[
M(x) := E_x\tau. \tag{10.3}
\]

Note that M(x) = 0 if x ∉ (c, d). Here [c, d] ⊂ S. Now denoting by X_h^+ the shifted process {(X_h^+)_t := X_{h+t}: t ≥ 0}, one has

\[
\tau = h + \tau(X_h^+) \quad \text{on the set } \{\tau > h\}, \tag{10.4}
\]

since τ(X_h^+) = τ({(X_h^+)_t}) is precisely the additional time needed by the process {X_t}, after time h, to escape from (c, d). Because 1_{\{τ>h\}} is determined by (i.e., measurable with respect to) {X_u: 0 ≤ u ≤ h}, it follows by the Markov property that

\[
E_x\bigl(1_{\{\tau>h\}}\tau(X_h^+)\bigr) = E_x\bigl[1_{\{\tau>h\}}E(\tau(X_h^+) \mid \{X_u: 0 \le u \le h\})\bigr] = E_x\bigl(1_{\{\tau>h\}}(E_y\tau)_{y=X_h}\bigr)
= E_x\bigl(1_{\{\tau>h\}}M(X_h)\bigr) = E_x(M(X_h)) - E_x\bigl(1_{\{\tau\le h\}}M(X_h)\bigr). \tag{10.5}
\]

By (10.4) and (10.5), for x ∈ (c, d),

\[
M(x) = E_x\tau = hP_x(\tau>h) + E_x(M(X_h)) - E_x\bigl(1_{\{\tau\le h\}}M(X_h)\bigr) + E_x\bigl(\tau 1_{\{\tau\le h\}}\bigr)
= h + (T_h M)(x) + o(h), \quad \text{as } h \downarrow 0, \tag{10.6}
\]

since P_x(τ ≤ h) = o(h) and the terms involving 1_{\{τ≤h\}} are bounded by a constant times P_x(τ ≤ h). Hence

\[
\lim_{h\downarrow 0}\frac{(T_h M)(x) - M(x)}{h} = -1, \tag{10.7}
\]

i.e., (AM)(x) = −1 for x ∈ (c, d). Therefore the following result is proved.

Proposition 10.1. The mean escape time M(x) from (c, d) satisfies

\[
\tfrac{1}{2}\sigma^2(x)M''(x) + \mu(x)M'(x) = -1, \quad c<x<d; \qquad M(c) = M(d) = 0. \tag{10.8}
\]

In order to solve the two-point boundary-value problem (10.8), express the differential equation as (see Section 9)

\[
\frac{d}{dm(x)}\Bigl(\frac{dM(x)}{ds(x)}\Bigr) = -1, \tag{10.9}
\]

whose general solution is given by

\[
\frac{dM(x)}{ds(x)} = -m(x) + c_1, \qquad M(x) = c_1 s(x) - \int_c^x m(y)\,ds(y) + c_2. \tag{10.10}
\]

From the boundary conditions in (10.8) it follows that

\[
c_1 = \frac{\int_c^d m(y)\,ds(y)}{s(d)-s(c)}, \qquad c_2 = -c_1 s(c).
\]
Substituting this in (10.10), one gets

\[
M(x) = \frac{s(c;x)}{s(c;d)}\int_c^d m(c;y)\,s'(c;y)\,dy - \int_c^x m(c;y)\,s'(c;y)\,dy \tag{10.11}
\]
\[
= \frac{\int_c^x e^{-I(c,z)}\,dz}{\int_c^d e^{-I(c,z)}\,dz}\int_c^d\Bigl[\int_c^y\frac{2}{\sigma^2(z)}e^{I(c,z)}\,dz\Bigr]e^{-I(c,y)}\,dy
- \int_c^x\Bigl[\int_c^y\frac{2}{\sigma^2(z)}e^{I(c,z)}\,dz\Bigr]e^{-I(c,y)}\,dy \qquad (c<x<d). \tag{10.12}
\]
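As with ψ, the escape-time formula (10.11) reduces to nested quadratures. The Python sketch below (again with illustrative coefficients: standard Brownian motion, µ = 0 and σ² = 1, for which M(x) = (x − c)(d − x) is known in closed form) evaluates (10.11) numerically.

```python
import numpy as np

def mean_escape_time(x, c, d, mu, sigma2, n=4001):
    """M(x) of (10.11): expected exit time from (c, d) via the scale and
    speed functions (trapezoidal quadrature; coefficients are callables)."""
    z = np.linspace(c, d, n)
    dz = z[1] - z[0]
    f = 2 * mu(z) / sigma2(z)
    I = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dz)])   # I(c, z)
    sp = np.exp(-I)                                   # s'(c; z)
    g = (2 / sigma2(z)) * np.exp(I)                   # m'(c; z)
    m = np.concatenate([[0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * dz)])   # m(c; z)
    s = np.concatenate([[0.0], np.cumsum(0.5 * (sp[1:] + sp[:-1]) * dz)]) # s(c; z)
    h = m * sp                                        # integrand m(c; y) s'(c; y)
    H = np.concatenate([[0.0], np.cumsum(0.5 * (h[1:] + h[:-1]) * dz)])
    M = (s / s[-1]) * H[-1] - H
    return np.interp(x, z, M)

# zero drift, unit diffusion coefficient: M(x) = (x - c)(d - x)
mu = lambda x: np.zeros_like(x)
sigma2 = lambda x: np.ones_like(x)
print(mean_escape_time(0.3, 0.0, 1.0, mu, sigma2))  # ~ 0.21 = 0.3 * 0.7
```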
Equation (10.9) justifies the nomenclature speed measure given to m. For suppose the diffusion is in natural scale, i.e., it has been relabeled so that s(x) = x. Then (10.9) may be expressed as

\[
M''(x) = -m'(x). \tag{10.13}
\]

If m₁, m₂ are two speed measures and M₁, M₂ the corresponding expected times to escape from (c, d), then m₁′(x) ≥ m₂′(x) for all x implies M₁(x) ≥ M₂(x) for all x (Exercise 8). Thus, the larger m′ is, the slower is the speed of escape. For a Brownian motion with zero drift and diffusion coefficient σ², the speed measure is m(x) = (2/σ²)x (with x₀ = 0) and m′(x) = 2/σ².

Suppose the state space S has an inaccessible right end point b. Write τ_c ∧ τ_d for min{τ_c, τ_d}, the first passage time to {c, d}, so that

\[
M(x) = E_x(\tau_c \wedge \tau_d), \qquad c < x < d. \tag{10.14}
\]
By letting d ↑ b in (10.14), one obtains E_xτ_c provided

\[
\lim_{d\uparrow b}\tau_d = \infty \quad \text{with probability 1}, \tag{10.15}
\]

which is what inaccessibility of b means. It may be shown that (10.15) holds, in particular, if s(b) = ∞ (Exercise 2). In this case, E_xτ_c is finite if and only if (Exercise 3)

\[
m(b) := m(x_0; b) = \int_{x_0}^b \frac{2}{\sigma^2(z)}\,e^{I(x_0,z)}\,dz < \infty. \tag{10.16}
\]

It now follows from the first equality in (10.12) that (Exercise 3)

\[
E_x\tau_c = \int_c^x (m(b) - m(y))\,ds(y), \qquad x > c. \tag{10.17}
\]
Proceeding in the same manner, it follows (Exercise 4) that if S has an inaccessible lower end point a and s(a) = −∞, then E_xτ_d < ∞ (x < d) if and only if

\[
m(a) := m(x_0; a) = -\int_a^{x_0}\frac{2}{\sigma^2(z)}\,e^{I(x_0,z)}\,dz = -\int_a^{x_0}\frac{2}{\sigma^2(z)}\,e^{-I(z,x_0)}\,dz > -\infty. \tag{10.18}
\]

Then (Exercise 4)

\[
E_x\tau_d = \int_x^d (m(y) - m(a))\,ds(y), \qquad x < d. \tag{10.19}
\]
Finiteness (with probability 1) of the passage times τ_y under every initial state is the meaning of recurrence of a diffusion. A stronger property is the finiteness of their expectations.

Definition 10.1. A diffusion on S is positive recurrent if E_xτ_y < ∞ for all x, y ∈ S. A recurrent diffusion that is not positive recurrent is null recurrent.

The positive recurrence of a diffusion implies its recurrence, although the converse is not true. The following result has been proved above.

Proposition 10.2. Suppose S = (a, b). Then the diffusion is positive recurrent if and only if

\[
s(a) = -\infty, \qquad m(a) > -\infty, \tag{10.20}
\]
and
\[
s(b) = +\infty, \qquad m(b) < \infty. \tag{10.21}
\]

For intervals S having boundary points one may prove the following result. The modifications that give this result are left to exercises.

Proposition 10.3
(a) Suppose S = [a, b) with a a reflecting boundary. Then the diffusion is positive recurrent if and only if s(b) = ∞ and m(b) < ∞.
(b) Suppose S = [a, b] with a and b reflecting boundaries. Then the diffusion is positive recurrent.
11 STOPPING TIMES AND THE STRONG MARKOV PROPERTY

Take Ω to be the set of all possible paths, i.e., the set of all continuous functions ω on the time interval [0, ∞) into the interval S (state space). In this case X_t is the value of the trajectory (function) at time t, X_t(ω) = ω_t. The sigmafield ℱ is the smallest sigmafield that includes all finite-dimensional events of the form {X_{t_i} ∈ B_i for 1 ≤ i ≤ n}, where 0 ≤ t₁ < ⋯ < t_n and B₁, ..., B_n are Borel subsets of S. A stopping time τ is a random variable with values in [0, ∞] such that

\[
\{\tau \le t\} \in \mathscr{F}_t := \sigma\{X_u: 0 \le u \le t\} \quad \text{for all } t \ge 0.
\]

This says that, for a stopping time τ, whether or not to stop by time t is determined by the sample path or trajectory up to time t. Check that constant times τ ≡ t₀ are stopping times. The first passage times τ_y, τ_c ∧ τ_d are important examples of stopping times (Exercise 1). By the "past up to time τ" we mean the pre-τ sigmafield of events ℱ_τ defined by

\[
\mathscr{F}_\tau = \sigma\{X_{\tau\wedge t}: t \ge 0\}. \tag{11.1}
\]

The stochastic process {X̃_t} := {X_{τ∧t}} is referred to as the process stopped at τ. Events in ℱ_τ depend only on the process stopped at τ. The stopped process contains no further information about the process {X_t} beyond the time τ. An alternative description of the pre-τ sigmafield, which is often useful in checking whether a particular event belongs to it, is as follows,

\[
\mathscr{F}_\tau = \{F \in \mathscr{F}: F \cap \{\tau \le t\} \in \mathscr{F}_t \text{ for all } t \ge 0\}. \tag{11.2}
\]

For example, using this it is simple to check that

\[
\{\tau \le s\} \in \mathscr{F}_\tau \quad \text{for all } s \ge 0. \tag{11.3}
\]

The proof of the equivalence of (11.1) and (11.2) is a little long and is omitted (see theoretical complement 1). For our purposes in this section, we take (11.2) as the definition of ℱ_τ. It follows from (11.1) or (11.2) that if τ is the constant time τ ≡ t₀, then ℱ_τ = ℱ_{t₀}.

As intuition would suggest, if τ₁ and τ₂ are two stopping times such that τ₁ ≤ τ₂, then

\[
\mathscr{F}_{\tau_1} \subset \mathscr{F}_{\tau_2}. \tag{11.4}
\]
For, if A ∈ ℱ_{τ₁}, then

\[
A \cap \{\tau_2 \le t\} = \bigl(A \cap \{\tau_1 \le t\}\bigr) \cap \{\tau_2 \le t\}. \tag{11.5}
\]

Since A ∩ {τ₁ ≤ t} ∈ ℱ_t and {τ₂ ≤ t} ∈ ℱ_t, the set on the right side of (11.5) is in ℱ_t. Hence A ∈ ℱ_{τ₂}. Another important property of stopping times is that, if τ₁ and τ₂ are stopping times, then τ₁ ∧ τ₂ := min{τ₁, τ₂} is also a stopping time. This is intuitively clear and simple to prove (Exercise 2). It follows from (11.4) that

\[
\mathscr{F}_{\tau_1\wedge\tau_2} \subset \mathscr{F}_{\tau_1} \cap \mathscr{F}_{\tau_2}.
\]

Finite-dimensional events in ℱ_τ are those that depend only on finitely many coordinates of the stopped process, i.e., events of the form

\[
G = \{(X_{t_1\wedge\tau}, \ldots, X_{t_m\wedge\tau}) \in B\}, \tag{11.6}
\]

where B is a Borel subset of Sᵐ := S × S × ⋯ × S (m-fold). Similarly, a finite-dimensional ℱ_τ-measurable function is of the form

\[
f(X_{t_1\wedge\tau}, \ldots, X_{t_m\wedge\tau}), \tag{11.7}
\]

where f is a Borel-measurable function on Sᵐ. For example, one may take f to be a continuous function on Sᵐ. More generally, one may consider an arbitrary measurable set F of paths and let G be the event that the stopped process lies in F,

\[
G = [\{X_{t\wedge\tau}\} \in F] \qquad (F \in \mathscr{F}). \tag{11.8}
\]

The sigmafield ℱ_τ is precisely this class of sets G.

The Markov property holds if the "past" and "future" are defined relative to stopping times, and not merely constant times. For this, the past relative to a stopping time τ is taken to be the pre-τ sigmafield ℱ_τ, the information contained in the stopped process {X_{t∧τ}}. The future relative to τ is the after-τ process X_τ^+ = {(X_τ^+)_t}, obtained by viewing {X_t} from time t = τ onwards, for τ < ∞. That is,

\[
(X_\tau^+)_t(\omega) = X_{\tau(\omega)+t}(\omega), \qquad t \ge 0, \quad \text{on } \{\omega: \tau(\omega) < \infty\}. \tag{11.9}
\]

Since the after-τ process is defined only on the set {τ < ∞}, events based on it are subsets of {τ < ∞}. The after-τ sigmafield on {τ < ∞} comprises sets of the form

\[
H = \{X_\tau^+ \in F\} \cap \{\tau < \infty\} \qquad (F \in \mathscr{F}). \tag{11.10}
\]

Finite-dimensional events and functions measurable with respect to this sigmafield of subsets of {τ < ∞} may be defined as in (11.6) and (11.7) with X_{t∧τ} replaced by (X_τ^+)_t = X_{τ+t}.
Theorem 11.1. (Strong Markov Property). Let τ be a stopping time. On {τ < ∞}, the conditional distribution of the after-τ process X_τ^+, given the past up to the time τ, is the same as the distribution of the diffusion {X_t} starting at X_τ. In other words, the conditional distribution is P_{X_τ} on {τ < ∞}.

Proof. First assume that τ has countably many values ordered as 0 ≤ s₁ < s₂ < ⋯. Consider a finite-dimensional function of the after-τ process of the form

\[
h(X_{\tau+t_1}, X_{\tau+t_2}, \ldots, X_{\tau+t_r}) \quad \text{on } \{\tau<\infty\}, \tag{11.11}
\]

where h is a bounded continuous real-valued function on Sʳ and 0 ≤ t₁ < t₂ < ⋯ < t_r. It suffices to prove

\[
E\bigl[h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})1_{\{\tau<\infty\}} \mid \mathscr{F}_\tau\bigr] = \bigl(E_y h(X_{t_1}, \ldots, X_{t_r})\bigr)_{y=X_\tau}\,1_{\{\tau<\infty\}}. \tag{11.12}
\]

If (11.12) holds, then for every bounded ℱ_τ-measurable function Z we have

\[
E\bigl(Z\,h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})1_{\{\tau<\infty\}}\bigr) = E\bigl(E[Z\,h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})1_{\{\tau<\infty\}} \mid \mathscr{F}_\tau]\bigr)
= E\bigl(Z\,E[h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})1_{\{\tau<\infty\}} \mid \mathscr{F}_\tau]\bigr)
= E\bigl(Z\,[E_y h(X_{t_1}, \ldots, X_{t_r})]_{y=X_\tau}\,1_{\{\tau<\infty\}}\bigr). \tag{11.13}
\]

Conversely, if the equality between the first and last terms in (11.13) holds for all bounded ℱ_τ-measurable functions Z, then (11.12) holds. Indeed, to verify (11.12) it is enough to check (11.13) for finite-dimensional functions Z of the form (11.7), where f is bounded and continuous on Sᵐ (Exercise 3). Now, with 0 ≤ u₁ < ⋯ < u_m,

\[
E\bigl(f(X_{\tau\wedge u_1}, \ldots, X_{\tau\wedge u_m})\,h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})\,1_{\{\tau=s_j\}}\bigr)
= E\bigl(f(X_{s_j\wedge u_1}, \ldots, X_{s_j\wedge u_m})\,h(X_{s_j+t_1}, \ldots, X_{s_j+t_r})\,1_{\{\tau=s_j\}}\bigr)
= E\bigl(f(X_{s_j\wedge u_1}, \ldots, X_{s_j\wedge u_m})\,1_{\{\tau=s_j\}}\,E[h(X_{s_j+t_1}, \ldots, X_{s_j+t_r}) \mid \mathscr{F}_{s_j}]\bigr), \tag{11.14}
\]

since X_{s_j∧u_1}, ..., X_{s_j∧u_m} are determined by {X_s: 0 ≤ s ≤ s_j}, i.e., are measurable with respect to ℱ_{s_j}; also, {τ = s_j} = {τ ≤ s_j} ∩ {τ ≤ s_{j-1}}ᶜ ∈ ℱ_{s_j}. By the Markov property, the last expression in (11.14) equals

\[
E\bigl(f(X_{s_j\wedge u_1}, \ldots, X_{s_j\wedge u_m})\,1_{\{\tau=s_j\}}\,[E_y h(X_{t_1}, \ldots, X_{t_r})]_{y=X_{s_j}}\bigr)
= E\bigl(f(X_{\tau\wedge u_1}, \ldots, X_{\tau\wedge u_m})\,1_{\{\tau=s_j\}}\,[E_y h(X_{t_1}, \ldots, X_{t_r})]_{y=X_\tau}\bigr). \tag{11.15}
\]

Summing (11.15) over j one gets

\[
E\bigl(f(X_{\tau\wedge u_1}, \ldots, X_{\tau\wedge u_m})\,h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})\,1_{\{\tau<\infty\}}\bigr)
= E\bigl(f(X_{\tau\wedge u_1}, \ldots, X_{\tau\wedge u_m})\,[E_y h(X_{t_1}, \ldots, X_{t_r})]_{y=X_\tau}\,1_{\{\tau<\infty\}}\bigr). \tag{11.16}
\]

This completes the proof in the case τ has countably many values 0 ≤ s₁ < s₂ < ⋯.

The case of more general τ may be dealt with by approximating τ by stopping times assuming countably many values. Specifically, for each positive integer n define

\[
\tau_n = \begin{cases} \dfrac{j+1}{2^n} & \text{if } \dfrac{j}{2^n} \le \tau < \dfrac{j+1}{2^n}, \quad j = 0, 1, 2, \ldots,\\[1.5ex] \infty & \text{if } \tau = \infty. \end{cases} \tag{11.17}
\]

Since

\[
\{\tau_n \le t\} = \bigcup_{j:\,(j+1)/2^n \le t}\Bigl\{\frac{j}{2^n} \le \tau < \frac{j+1}{2^n}\Bigr\} \in \mathscr{F}_t \quad \text{for all } t \ge 0,
\]

τ_n is a stopping time for each n, and τ_n(ω) ↓ τ(ω) as n → ∞ for each ω ∈ Ω. Also, ℱ_τ ⊂ ℱ_{τ_n} (see (11.4)). Let f, h be bounded continuous functions on Sᵐ, Sʳ, respectively. Define

\[
\varphi(y) = E_y h(X_{t_1}, \ldots, X_{t_r}). \tag{11.18}
\]

Then

\[
\varphi(y) = \int_S g(y_1)\,p(t_1; y, y_1)\,dy_1, \tag{11.19}
\]

where g(y₁) = E[h(X_{t_1}, ..., X_{t_r}) | X_{t_1} = y₁], so that g is bounded. Since y → p(t₁; y, y₁) is continuous, φ is continuous (see Exercise 4 and theoretical complement 1). Applying (11.16) to τ = τ_n one has

\[
E\bigl(f(X_{\tau_n\wedge u_1}, \ldots, X_{\tau_n\wedge u_m})\,h(X_{\tau_n+t_1}, \ldots, X_{\tau_n+t_r})\,1_{\{\tau_n<\infty\}}\bigr)
= E\bigl(f(X_{\tau_n\wedge u_1}, \ldots, X_{\tau_n\wedge u_m})\,\varphi(X_{\tau_n})\,1_{\{\tau_n<\infty\}}\bigr). \tag{11.20}
\]

Since f, h, φ are continuous, {X_t} has continuous sample paths, and τ_n ↓ τ as n → ∞, Lebesgue's dominated convergence theorem may be used on both sides of (11.20) to get

\[
E\bigl(f(X_{\tau\wedge u_1}, \ldots, X_{\tau\wedge u_m})\,h(X_{\tau+t_1}, \ldots, X_{\tau+t_r})\,1_{\{\tau<\infty\}}\bigr)
= E\bigl(f(X_{\tau\wedge u_1}, \ldots, X_{\tau\wedge u_m})\,\varphi(X_\tau)\,1_{\{\tau<\infty\}}\bigr). \tag{11.21}
\]

This establishes (11.13), and therefore (11.12). ∎
Examples 1, 2, 3 below illustrate the importance of Theorem 11.1 in typical computations. In all these examples {X_t} is a one-dimensional Brownian motion with zero drift.

Example 1. (Probability of Reaching One Boundary Point Before Another). Let {X_t} be a one-dimensional Brownian motion with zero drift and diffusion coefficient σ² > 0. Let c < d and write

\[
\psi(x) = P_x(\{X_t\} \text{ reaches } c \text{ before } d) \qquad (c \le x \le d). \tag{11.22}
\]

Fix x ∈ (c, d) and h > 0 such that [x − h, x + h] ⊂ (c, d). Write τ = τ_{x−h} ∧ τ_{x+h}, i.e., τ is the first time {X_t} reaches x − h or x + h. Then P_x(τ < ∞) = 1, since

\[
P_x(\tau > t) \le P_x(x-h \le X_t \le x+h) = \int_{x-h}^{x+h}\frac{1}{(2\pi\sigma^2 t)^{1/2}}\exp\Bigl\{-\frac{(y-x)^2}{2\sigma^2 t}\Bigr\}\,dy \to 0 \quad \text{as } t \to \infty. \tag{11.23}
\]

Now,

\[
\psi(x) = P_x(\{X_t\} \text{ reaches } c \text{ before } d) = P_x(\{(X_\tau^+)_t\} \text{ reaches } c \text{ before } d)
= E_x\bigl(P(\{(X_\tau^+)_t\} \text{ reaches } c \text{ before } d \mid \{X_{t\wedge\tau}: t \ge 0\})\bigr). \tag{11.24}
\]

The strong Markov property now gives

\[
\psi(x) = E_x(\psi(X_\tau)), \tag{11.25}
\]

so that by symmetry of the normal distribution,

\[
\psi(x) = \psi(x-h)P_x(X_\tau = x-h) + \psi(x+h)P_x(X_\tau = x+h) = \tfrac{1}{2}\psi(x-h) + \tfrac{1}{2}\psi(x+h). \tag{11.26}
\]
Therefore,

\[
\bigl(\psi(x+h) - \psi(x)\bigr) - \bigl(\psi(x) - \psi(x-h)\bigr) = 0. \tag{11.27}
\]

Dividing the second-order difference equation (11.27) by h² and letting h ↓ 0, we get

\[
\psi'' = 0, \qquad \psi(c) = 1, \qquad \psi(d) = 0. \tag{11.28}
\]

Therefore,

\[
\psi(x) = \frac{d-x}{d-c}, \tag{11.29}
\]

which is a special case of (2.24), or (4.17), or (9.18); see also Eq. 9.6 of Chapter I. Now, by (11.23) and (11.29),

\[
P_x(\{X_t\} \text{ reaches } d \text{ before } c) = 1 - \psi(x) = \frac{x-c}{d-c} \tag{11.30}
\]

for c ≤ x ≤ d. It follows, on letting d ↑ ∞ in (11.29) and c ↓ −∞ in (11.30), that

\[
P_x(\tau_y < \infty) = 1 \qquad \text{for all } x, y. \tag{11.31}
\]
Example 2. (Independent Increments of the First Passage Time Process). Let {X_t} be as in Example 1, with X₀ = 0 with probability 1. Fix 0 < a < b. Then

\[
\tau_b = \tau_a + \inf\{t \ge 0: (X_{\tau_a}^+)_t = b\} = \tau_a + \tau_b(X_{\tau_a}^+), \tag{11.32}
\]

where τ_b(X_{τ_a}^+) is the first hitting time of b by the after-τ_a process X_{τ_a}^+. Since this last hitting time depends only on the after-τ_a process, it is independent of τ_a, which is measurable with respect to ℱ_{τ_a}. Hence τ_b − τ_a = τ_b(X_{τ_a}^+) and τ_a are independent, and the distribution of τ_b − τ_a under P₀ is the distribution of τ_b under P_a. This last distribution is the same as the distribution of τ_{b−a} under P₀. For, if {X_t} is a Brownian motion starting at a, then {X_t − a} is a Brownian motion starting at zero; and if τ_b is the first passage time of {X_t} to b, then it is also the first passage time of {X_t − a} to b − a. If 0 < a₁ < a₂ < ⋯ < a_n, then applying the same argument to τ_{a_{n-1}} shows that τ_{a_n} − τ_{a_{n-1}} is independent of {τ_{a_i}: 1 ≤ i ≤ n−1}, and its distribution (under P₀) is the same as the distribution of τ_{a_n - a_{n-1}} under P₀. We have arrived at the following proposition.

[Figure 11.1: A Brownian path {X_t} and its reflection {Y_t} about the level a after the first passage time τ_a.]

Proposition 11.2. Under P₀, the stochastic process {τ_a: 0 ≤ a < ∞} is a process with independent increments. Moreover, the increments are homogeneous, i.e., the distribution of τ_b − τ_a is the same as that of τ_d − τ_c if d − c = b − a, both distributions being the same as the distribution of τ_{b−a} under P₀.

Example 3. (Distribution of the First Passage Time). Let {X_t} be a
one-dimensional Brownian motion with zero drift and diffusion coefficient σ² > 0, X₀ = 0 with probability 1. Fix a > 0. Let {Y_t} be the same as {X_t} up to time τ_a and then the reflection of {X_t} about the level a after time τ_a (see Figure 11.1 above). That is,

\[
Y_t = \begin{cases} X_t & \text{for } t \le \tau_a,\\ 2X_{\tau_a} - X_t = 2a - X_t & \text{for } t > \tau_a. \end{cases} \tag{11.33}
\]
Thus, Y_{τ_a}^+ := {Y_{τ_a+t}: t ≥ 0} = a + (a − X_{τ_a}^+). Now the process a − X_{τ_a}^+ is independent of the past up to time τ_a, and has the same distribution as X_{τ_a}^+ − a, namely, that of a Brownian motion starting at zero. Therefore, Y_{τ_a}^+ = a + (a − X_{τ_a}^+) is independent of the past up to time τ_a and has the same distribution as a + (X_{τ_a}^+ − a) = X_{τ_a}^+. We have arrived at the following result.

Proposition 11.3. (The Reflection Principle). The distribution of {Y_t} is the same as that of {X_t}.

Now for x ≥ a one has (see Figure 11.1 above)

\[
P_0(X_t \ge x) = P_0(X_t \ge x,\ \tau_a \le t) = P_0(Y_t \le 2a - x,\ \tau_a(Y) \le t), \tag{11.34}
\]
where τ_a(Y) is the first passage time to a of {Y_t}. Note that τ_a(Y) = τ_a. By Proposition 11.3, the extreme right side of (11.34) equals P₀(X_t ≤ 2a − x, τ_a ≤ t). Hence,

\[
P_0(X_t \ge x) = P_0(X_t \le 2a - x,\ \tau_a \le t), \qquad x \ge a. \tag{11.35}
\]

Taking x = a, and writing M_t := max{X_s: 0 ≤ s ≤ t} (so that {τ_a ≤ t} = {M_t ≥ a}),

\[
P_0(X_t > a) = P_0(X_t < a,\ M_t \ge a) = P_0(M_t \ge a) - P_0(M_t \ge a,\ X_t > a) = P_0(M_t \ge a) - P_0(X_t > a), \tag{11.36}
\]

since {X_t > a} ⊂ {M_t ≥ a}. Thus,

\[
P_0(M_t \ge a) = 2P_0(X_t > a). \tag{11.37}
\]

Hence,

\[
P_0(\tau_a \le t) = P_0(M_t \ge a) = \frac{2}{(2\pi\sigma^2 t)^{1/2}}\int_a^\infty e^{-y^2/(2\sigma^2 t)}\,dy = \frac{2}{(2\pi)^{1/2}}\int_{a/(\sigma\sqrt t)}^\infty e^{-z^2/2}\,dz. \tag{11.38}
\]

Thus, the probability density function (p.d.f.) of τ_a is given by

\[
f_a(t) = \frac{a}{\sigma(2\pi)^{1/2}}\,t^{-3/2}\,e^{-a^2/(2\sigma^2 t)}, \qquad 0 < t < \infty. \tag{11.39}
\]

The p.d.f. of M_t is obtained by differentiating the first integral in (11.38) with respect to a, namely,

\[
g_t(a) = \frac{2}{(2\pi\sigma^2 t)^{1/2}}\,e^{-a^2/(2\sigma^2 t)} = \frac{2}{\sigma\sqrt t}\,\varphi\Bigl(\frac{a}{\sigma\sqrt t}\Bigr), \qquad a > 0, \tag{11.40}
\]

where φ is the standard normal density, φ(z) = (2π)^{−1/2} exp{−z²/2}. Changing variables x → y = 2a − x in (11.35) yields the joint distribution of (X_t, M_t),

\[
P_0(X_t \le y,\ M_t \ge a) = P_0(X_t \ge 2a - y) = \int_{-\infty}^{(y-2a)/(\sigma\sqrt t)}\frac{1}{(2\pi)^{1/2}}e^{-z^2/2}\,dz, \qquad y \le a,\ a \ge 0. \tag{11.41}
\]

Differentiating successively with respect to y and a, one gets the joint p.d.f. of
(X_t, M_t) as

\[
h_t(y, a) = \frac{2(2a-y)}{\sigma^3 t^{3/2}(2\pi)^{1/2}}\exp\Bigl\{-\frac{(2a-y)^2}{2\sigma^2 t}\Bigr\} = \frac{2(2a-y)}{\sigma^3 t^{3/2}}\,\varphi\Bigl(\frac{2a-y}{\sigma\sqrt t}\Bigr), \qquad y \le a,\ a \ge 0. \tag{11.42}
\]
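The reflection-principle identities (11.37)–(11.38) can be checked by simulation. The sketch below (discretization parameters are illustrative; the discrete-time maximum slightly underestimates M_t) compares the simulated P₀(M_t ≥ a) with 2P₀(X_t > a).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
t, a, sigma = 1.0, 1.0, 1.0
n, steps = 20000, 500
dt = t / steps
incs = sigma * np.sqrt(dt) * rng.standard_normal((steps, n))
paths = np.cumsum(incs, axis=0)
p_max = np.mean(paths.max(axis=0) >= a)                   # simulated P0(M_t >= a)
p_tail = 1 - 0.5 * (1 + erf(a / (sigma * sqrt(2 * t))))   # exact P0(X_t > a)
print(p_max, 2 * p_tail)   # reflection principle (11.37): these should be close
```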
12 INVARIANT DISTRIBUTIONS AND THE STRONG LAW OF LARGE NUMBERS

In this section it is shown that the existence of an invariant probability is equivalent to positive recurrence. In addition, a computation of the invariant density is provided. First, an important consequence of the strong Markov property (Theorem 11.1) is the following result.

Proposition 12.1. If the diffusion is recurrent, x, y ∈ S (x < y), and f is a real-valued (measurable) function on S, then the sequence of random variables

\[
Z_r = \int_{\eta_{2r}}^{\eta_{2r+2}} f(X_s)\,ds \qquad (r = 1, 2, \ldots) \tag{12.1}
\]

is i.i.d., where

\[
\eta_1 = \inf\{t \ge 0: X_t = x\}, \quad \eta_{2r} = \inf\{t \ge \eta_{2r-1}: X_t = y\}, \quad \eta_{2r+1} = \inf\{t \ge \eta_{2r}: X_t = x\} \quad (r = 1, 2, \ldots). \tag{12.2}
\]

Proof. By the strong Markov property, the conditional distribution of Z_r given the past up to time η_{2r} is the same as the distribution of ∫₀^{η₂} f(X_s) ds with initial state y. This last distribution does not change with the sample point ω, and therefore Z_r is independent of the past up to time η_{2r}. In particular, Z_r is independent of Z₁, Z₂, ..., Z_{r−1}. ∎

The main result may now be derived.

Theorem 12.2. Suppose that the diffusion is positive recurrent on S = (a, b).
(a) Then there exists a unique invariant distribution π(dx).
(b) For every real-valued f such that

\[
\int_S |f(x)|\,\pi(dx) < \infty, \tag{12.3}
\]
the strong law of large numbers holds, i.e., with probability 1,

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t f(X_s)\,ds = \int_S f(x)\,\pi(dx), \tag{12.4}
\]

no matter what the initial distribution may be.
(c) If the end points a, b of S are inaccessible or reflecting, then the invariant probability has a density π(x), which is the unique normalized integrable solution of A*π(x) = 0, i.e.,

\[
\frac{1}{2}\frac{d^2}{dx^2}\bigl(\sigma^2(x)\pi(x)\bigr) - \frac{d}{dx}\bigl(\mu(x)\pi(x)\bigr) = 0 \quad \text{for } x \in S, \tag{12.5}
\]

or simply of

\[
\frac{1}{2}\frac{d}{dx}\bigl(\sigma^2(x)\pi(x)\bigr) - \mu(x)\pi(x) = 0. \tag{12.6}
\]

Indeed, the invariant density is the normalized speed density,

\[
\pi(x) = \frac{m'(x)}{m(b) - m(a)}. \tag{12.7}
\]
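By (12.7) (equivalently (12.6)), the invariant density is proportional to the speed density m′(x) = (2/σ²(x)) e^{I(x₀,x)}. As a concrete check (an illustrative choice made here, not an example from the text), for the Ornstein–Uhlenbeck drift µ(x) = −γx with constant σ², this gives the Gaussian density with variance σ²/(2γ); the Python sketch below normalizes m′ numerically and compares.

```python
import numpy as np

def invariant_density(xs, mu, sigma2):
    """pi(x) of (12.7): normalized speed density on a grid (x0 = xs[0])."""
    dx = xs[1] - xs[0]
    f = 2 * mu(xs) / sigma2(xs)
    I = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dx)])  # I(x0, x)
    mprime = (2 / sigma2(xs)) * np.exp(I)
    return mprime / (np.sum(mprime) * dx)

gamma, s2 = 1.0, 1.0
xs = np.linspace(-5, 5, 4001)
pi_num = invariant_density(xs, lambda x: -gamma * x, lambda x: s2 * np.ones_like(x))
var = s2 / (2 * gamma)
pi_exact = np.exp(-xs**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.max(np.abs(pi_num - pi_exact)))  # small
```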
Proof. Positive recurrence implies E_y(η_{2r} − η_{2r−2}) = E_yη₂ < ∞. Hence, by the classical strong law of large numbers,

\[
\frac{\eta_{2r}}{r} = \frac{1}{r}\sum_{r'=1}^{r}\bigl(\eta_{2r'} - \eta_{2(r'-1)}\bigr) \to E_y\eta_2 \tag{12.8}
\]

with P_y-probability 1, as r → ∞ (here η₀ := 0). Let f be a bounded real-valued (measurable) function on S. Then applying the strong law to the sequence Z_r (r = 0, 1, 2, ...; η₀ = 0) in (12.1), one gets

\[
\lim_{r\to\infty}\frac{1}{r}\sum_{r'=1}^{r} Z_{r'-1} = E_y Z_0 = E_y\int_0^{\eta_2} f(X_s)\,ds. \tag{12.9}
\]

As in Section 9 of Chapter II (or Section 8 of Chapter IV), one has (Exercise 1)

\[
\lim_{r\to\infty}\frac{1}{\eta_{2r}}\int_0^{\eta_{2r}} f(X_s)\,ds = \lim_{t\to\infty}\frac{1}{t}\int_0^t f(X_s)\,ds, \tag{12.10}
\]

for every f such that

\[
E_y\int_0^{\eta_2}|f(X_s)|\,ds < \infty. \tag{12.11}
\]

Combining (12.8)–(12.10), one gets

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t f(X_s)\,ds = \frac{E_y\int_0^{\eta_2} f(X_s)\,ds}{E_y\eta_2} \tag{12.12}
\]

for all f satisfying (12.11). In the special case f = 1_B, with B a Borel subset of S, (12.12) becomes

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t 1_B(X_s)\,ds = \pi(B), \tag{12.13}
\]

where

\[
\pi(B) := \frac{1}{E_y\eta_2}\,E_y\int_0^{\eta_2} 1_B(X_s)\,ds. \tag{12.14}
\]

Thus, (12.13) says that the limiting proportion of time the process spends in a set B equals the expected amount of time it spends in B during a single cycle, relative to (i.e., divided by) the mean length of the cycle. It is simple to check from (12.14) that π is a probability measure on S (Exercise 2). Also, if f = Σ a_i 1_{B_i}, where a₁, a₂, ..., a_n are real numbers and B₁, B₂, ..., B_n are pairwise disjoint (Borel) subsets of S, then

\[
\frac{1}{E_y\eta_2}\,E_y\int_0^{\eta_2} f(X_s)\,ds = \sum_{i=1}^n a_i\,\frac{1}{E_y\eta_2}\,E_y\int_0^{\eta_2} 1_{B_i}(X_s)\,ds = \sum_{i=1}^n a_i\,\pi(B_i) = \int_S f(x)\,\pi(dx).
\]

The equality

\[
\frac{E_y\int_0^{\eta_2} f(X_s)\,ds}{E_y\eta_2} = \int_S f(x)\,\pi(dx) \tag{12.15}
\]

may now be extended to all f satisfying (12.11) (Exercise 2). Combining (12.12) and (12.15), one has

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t f(X_s)\,ds = \int_S f(x)\,\pi(dx) \tag{12.16}
\]

with P_y-probability 1, for all f satisfying (12.11). Taking expectations in (12.16) for bounded f one gets, by arguments used in Section 9 of Chapter II,

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t E_y f(X_s)\,ds = \int_S f(x)\,\pi(dx) \tag{12.17}
\]
for all y. Writing

\[
(T_s f)(y) = E_y f(X_s) = \int_S f(z)\,p(s; y, z)\,dz, \tag{12.18}
\]

one may express (12.17) as

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t (T_s f)(y)\,ds = \int_S f(x)\,\pi(dx) \tag{12.19}
\]

for all y ∈ S. But the left side also equals, for any given h > 0,

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t (T_{s+h}f)(y)\,ds = \lim_{t\to\infty}\frac{1}{t}\int_0^t (T_s(T_h f))(y)\,ds = \int_S (T_h f)(x)\,\pi(dx). \tag{12.20}
\]

The last equality follows from (12.19) applied to the function T_h f. It follows from (12.19) and (12.20) that, at least for all bounded (measurable) f, one has

\[
\int_S (T_h f)(x)\,\pi(dx) = \int_S f(x)\,\pi(dx) \qquad (h > 0). \tag{12.21}
\]

Specializing this to f = 1_B, one has (T_h f)(x) = p(h; x, B), so that (12.21) yields

\[
P_\pi(X_h \in B) = E_\pi P(X_h \in B \mid X_0) = \int_S p(h;x,B)\,\pi(dx) = \int_S 1_B(x)\,\pi(dx) = \pi(B) = P_\pi(X_0 \in B), \tag{12.22}
\]

proving that π is an invariant probability. The proof of parts (a), (b) of Theorem 12.2 is now complete, except for uniqueness. Uniqueness may be proved in the same manner as in Section 6 of Chapter II (Exercise 3).

In order to prove (c), first note that π(x) given by (12.7) is a probability density function (p.d.f.) which satisfies (12.6) and, therefore, (12.5). To prove
its invariance one needs to check (see (12.14)), for π given by (12.7),

\[
\frac{1}{E_y\eta_2}\,E_y\int_0^{\eta_2} f(X_s)\,ds = \int_S f(z)\,\pi(z)\,dz = \frac{\int_S f(z)\,dm(z)}{m(b)-m(a)}, \tag{12.23}
\]

where f is an arbitrary bounded measurable function on S, and a, b are the end points of S. But, as in the proofs of (10.17) and (10.19) (Exercise 4),

\[
E_x\int_0^{\tau_y} f(X_s)\,ds = \int_x^y\Bigl(\int_a^z f(u)\,dm(u)\Bigr)\,ds(z) \quad (x<y), \qquad
E_x\int_0^{\tau_y} f(X_s)\,ds = \int_y^x\Bigl(\int_z^b f(u)\,dm(u)\Bigr)\,ds(z) \quad (x>y). \tag{12.24}
\]

Hence, by the strong Markov property (with x < y),

\[
E_y\int_0^{\eta_2} f(X_s)\,ds = E_y\int_0^{\tau_x} f(X_s)\,ds + E_x\int_0^{\tau_y} f(X_s)\,ds
= \int_x^y\Bigl(\int_a^b f(u)\,dm(u)\Bigr)\,ds(z) = s(x; y)\int_a^b f(u)\,dm(u), \tag{12.25}
\]

where s(x; y) = s(y) − s(x). In particular, taking f ≡ 1,

\[
E_y\eta_2 = s(x; y)\bigl(m(b) - m(a)\bigr). \tag{12.26}
\]

Dividing (12.25) by (12.26), one arrives at (12.23). ∎

The following alternative argument is instructive, and may be justified under appropriate assumptions on the transition probability density p(t; x, y) (Exercise 5). By the backward equation (2.15) and integration by parts one has, for the function π(x) given by (12.7),

\[
\frac{\partial}{\partial t}\int_S p(t;x,y)\,\pi(x)\,dx = \int_S\frac{\partial p(t;x,y)}{\partial t}\,\pi(x)\,dx = \int_S (A_x p(t;x,y))\,\pi(x)\,dx = \int_S p(t;x,y)\,(A^*\pi)(x)\,dx = 0 \qquad (t>0). \tag{12.27}
\]

Since ∫_S p(t; x, y)π(x) dx is the p.d.f. of X_t when the initial p.d.f. is π(x), it follows from (12.27) that the distribution of X_t does not change for t > 0; but X_t → X₀ a.s. as t ↓ 0, so that X_t converges in distribution to π(x) dx. Therefore, X_t has p.d.f. π(x) for every t > 0.

Example 1. (Price Adjustment in a Two-Commodity Model). The excess
demand function z(x) = (z₁(x), z₂(x)) of two commodities is a function of prices x = (x₁, x₂) ∈ Δ := {(x₁, x₂): x₁ > 0, x₂ > 0, x₁ + x₂ = 1} such that Walras' law holds, namely,

\[
x_1 z_1(x) + x_2 z_2(x) = 0, \qquad x \in \Delta. \tag{12.28}
\]

In view of (12.28) one may concentrate on the behavior of just one excess demand, say z₁(x). Price adjustments in a market usually depend on excess demands. Since x₂ = 1 − x₁, one may consider the price X₁(t) of the first commodity as a diffusion on (0, 1) with a drift function determined by the excess demand z₁ at this price. For example, take the generator of such a diffusion to be of the form

\[
\bar A = \frac{1}{2}\sigma^2(x)\frac{d^2}{dx^2} + z_1(x)\frac{d}{dx} \qquad (0<x<1), \tag{12.29}
\]

so that the mean rate of change in price is proportional to the excess demand. The drift function z₁(x) satisfies

\[
\lim_{x\downarrow 0} z_1(x) = \infty, \qquad \lim_{x\uparrow 1} z_1(x) = -\infty. \tag{12.30}
\]

This says that if the price of commodity 1 is very low (high), its demand is very high (low), so the price will tend to go up (down) fast at the mean rate z₁. Also, assume that there is a constant σ₀ > 0 such that

\[
\sigma^2(x) \ge \sigma_0^2. \tag{12.31}
\]

Then the boundaries 0 and 1 become inaccessible. Indeed, it is easy to check in this case that s(0) = −∞, s(1) = ∞, and that the diffusion on S = (0, 1) is positive recurrent and, therefore, 0 and 1 cannot be reached from S (Exercise 6). There is, therefore, a unique steady-state or equilibrium distribution π(dx) = π(x) dx, and, no matter where the system starts, the distribution of X₁(t) will approach this steady-state distribution as t → ∞ (see theoretical complement 1).

Example 2. (Stochastic Changes in the Size of an Industry). By the "size" of a competitive industry is meant its productive capacity. The profit level f(y) for a given size y is the rate of additional return (profit) per unit of increase in the industry size from its present size y. The law of diminishing returns requires that f′(y) < 0 for all y. There is a "normal" profit level p. When the current level f falls below p, firms tend to drop out or reduce their individual capacities. If the current profit level f is higher than p, then the productive capacity of
the industry is increased, owing either to the appearance of new firms or to expansions in existing firms. In a deterministic dynamic model one might consider the industry size y_t to be governed by an equation of the form

\[
\frac{dy_t}{dt} = g\bigl(f(y_t) - p\bigr), \tag{12.32}
\]

where g is assumed to be sign-preserving. That is,

\[
g(u) \begin{cases} > 0 & \text{if } u > 0,\\ < 0 & \text{if } u < 0. \end{cases} \tag{12.33}
\]

The deterministic equilibrium is the value y* such that f(y*) = p. We assume f(0) > p > f(∞). In a stochastic dynamic model one may take the industry size Y_t at time t to be a diffusion on S = (0, ∞) with drift

\[
\mu(y) = g\bigl(f(y) - p\bigr) \tag{12.34}
\]

and a diffusion coefficient σ²(y) that represents the (time) rate of local fluctuation (variance) when the process is at y. If one assumes

\[
\sigma^2(y) \ge \sigma_0^2 > 0, \qquad \lim_{y\to\infty} g(f(y)-p) < 0, \qquad \lim_{y\downarrow 0} g(f(y)-p) > 0, \tag{12.35}
\]

then the process {Y_t} is positive recurrent (and, in particular, 0 and ∞ are inaccessible) (Exercise 7). The strict sign-preserving property (12.33) is not needed for this. The stochastic steady-state distribution is approached as time increases, no matter how the process starts. This distribution is sometimes referred to as the stochastic equilibrium.
13 THE CENTRAL LIMIT THEOREM FOR DIFFUSIONS

Consider a positive recurrent diffusion on an interval S. The following theorem may be proved in much the same way as described in Section 10 of Chapter II (Exercise 1). Let y ∈ S, and define η₂ as in (12.2).

Theorem 13.1. Let f be a real-valued (measurable) function on the state space S of a positive recurrent diffusion, satisfying

\[
\int_S f^2(x)\,\pi(x)\,dx < \infty, \qquad \delta^2 := E_y\Bigl(\int_0^{\eta_2}\bigl(f(X_s) - E_\pi f\bigr)\,ds\Bigr)^2 < \infty, \tag{13.1}
\]

where π(dx) = π(x) dx is the invariant probability. Then, as t → ∞,

\[
\frac{1}{\sqrt t}\int_0^t\bigl(f(X_s) - E_\pi f\bigr)\,ds \tag{13.2}
\]

converges in distribution to a Gaussian law with mean zero and variance δ²/(E_yη₂), whatever the initial distribution.

In order to use Theorem 13.1, one must be able to express the variance parameter in terms of the drift and diffusion coefficients. A derivation of such an expression will be given now. Assume E_πf = 0 (i.e., replace f by f − E_πf). Under the initial distribution π the variance of (13.2) is given by
\[
\mathrm{Var}_\pi\Bigl(\frac{1}{\sqrt t}\int_0^t f(X_s)\,ds\Bigr) = \frac{1}{t}\,E_\pi\Bigl(\int_0^t f(X_s)\,ds\Bigr)^2 = \frac{1}{t}\int_0^t\!\!\int_0^t E_\pi\bigl(f(X_{s'})f(X_s)\bigr)\,ds'\,ds
\]
\[
= \frac{2}{t}\int_0^t\!\!\int_0^s E_\pi\bigl[f(X_{s'})\,E(f(X_s) \mid X_{s'})\bigr]\,ds'\,ds
= \frac{2}{t}\int_0^t\!\!\int_0^s E_\pi\bigl(f(X_{s'})\,(T_{s-s'}f)(X_{s'})\bigr)\,ds'\,ds. \tag{13.3}
\]

Now

\[
E_\pi\bigl(f(X_{s'})(T_{s-s'}f)(X_{s'})\bigr) = \int_S f(x)\,(T_{s-s'}f)(x)\,\pi(x)\,dx. \tag{13.4}
\]
Therefore, (13.3) may be expressed as

\[
\mathrm{Var}_\pi\Bigl(\frac{1}{\sqrt t}\int_0^t f(X_s)\,ds\Bigr) = \int_S\Bigl[\frac{2}{t}\int_0^t\Bigl(\int_0^s (T_u f)(x)\,du\Bigr)\,ds\Bigr]f(x)\,\pi(x)\,dx. \tag{13.5}
\]

Assume that the limit

\[
h(x) = \lim_{s\to\infty}\int_0^s (T_u f)(x)\,du \tag{13.6}
\]

exists in L²(S, π) (see the remark following the proof of Theorem T.13.4, page 515). Then

\[
\lim_{t\to\infty}\frac{1}{t}\int_0^t\Bigl(\int_0^s (T_u f)(x)\,du\Bigr)\,ds = h(x), \tag{13.7}
\]

and one has, using (13.5) and (13.7),

\[
\lim_{t\to\infty}\mathrm{Var}_\pi\Bigl(\frac{1}{\sqrt t}\int_0^t f(X_s)\,ds\Bigr) = 2\int_S h(x)\,f(x)\,\pi(x)\,dx. \tag{13.8}
\]
Let us show that the function h satisfies the equation

\[
Ah(x) = -f(x), \tag{13.9}
\]

where A is the infinitesimal generator of the diffusion, i.e.,

\[
A = \tfrac{1}{2}\sigma^2(x)\frac{d^2}{dx^2} + \mu(x)\frac{d}{dx}, \tag{13.10}
\]

together with boundary conditions if S has a boundary. For this, note that, by (13.6), for ε > 0,

\[
T_\varepsilon h(x) = \lim_{s\to\infty}\int_0^s (T_\varepsilon T_u f)(x)\,du = \lim_{s\to\infty}\int_0^s (T_{\varepsilon+u}f)(x)\,du = \lim_{s\to\infty}\int_\varepsilon^{s+\varepsilon}(T_v f)(x)\,dv
\]
\[
= \lim_{s\to\infty}\Bigl(\int_0^{s+\varepsilon}(T_v f)(x)\,dv - \int_0^\varepsilon (T_v f)(x)\,dv\Bigr) = h(x) - \int_0^\varepsilon (T_v f)(x)\,dv. \tag{13.11}
\]

Hence,

\[
Ah(x) = \lim_{\varepsilon\downarrow 0}\frac{(T_\varepsilon h)(x) - h(x)}{\varepsilon} = -\lim_{\varepsilon\downarrow 0}\frac{1}{\varepsilon}\int_0^\varepsilon (T_v f)(x)\,dv = -(T_0 f)(x) = -f(x). \tag{13.12}
\]

These calculations provide the following result.

Proposition 13.2. The variance of the limiting Gaussian distribution in Theorem 13.1 is given by

\[
2\int_S f(x)\,h(x)\,\pi(x)\,dx, \tag{13.13}
\]

where h is a solution to (13.9) in L²(S, π).
It may be shown that if there exists a function h in L²(S, π) satisfying (13.9) for a given f in L²(S, π) with E_πf = 0, then such an h is unique up to the addition of a constant (see theoretical complement 4). The condition E_πf = 0 ensures that in this case the expression (13.13) does not change if a constant is added to h. Also note that, in order that (13.9) may admit a solution h, one must have E_πf = 0. This is seen by integrating both sides of (13.9) with respect to π(x) dx and reducing the left integral by integration by parts. That is,

\[
-\int_S f(x)\,\pi(x)\,dx = \int_S Ah(x)\,\pi(x)\,dx = \int_S h(x)\,(A^*\pi)(x)\,dx = \int_S h(x)\bigl[\tfrac{1}{2}(\sigma^2(x)\pi(x))'' - (\mu(x)\pi(x))'\bigr]\,dx = 0, \tag{13.14}
\]

using (12.5).
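For a concrete instance of (13.9) and (13.13) (an illustrative Ornstein–Uhlenbeck example chosen here, not worked in the text): with µ(x) = −γx, constant σ², and f(x) = x (so E_πf = 0), the Poisson equation Ah = −f has the solution h(x) = x/γ, and (13.13) gives the asymptotic variance 2∫ f h dπ = (2/γ)·Var_π = σ²/γ². The sketch below checks this against simulated time averages.

```python
import numpy as np

# CLT check for the OU diffusion dX = -gamma*X dt + sigma dB, with f(x) = x.
# Solving Ah = -f gives h(x) = x/gamma, so the limit variance (13.13)
# is 2 * E_pi[f * h] = sigma**2 / gamma**2.
gamma, sigma, T, dt, n_paths = 1.0, 1.0, 200.0, 0.01, 400
rng = np.random.default_rng(4)
x = np.zeros(n_paths)
integral = np.zeros(n_paths)
for _ in range(int(T / dt)):
    integral += x * dt
    x += -gamma * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
print(np.var(integral / np.sqrt(T)))   # simulated variance of (13.2)
print(sigma**2 / gamma**2)             # theoretical limit
```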
14 INTRODUCTION TO MULTIDIMENSIONAL BROWNIAN MOTION AND DIFFUSIONS

A k-dimensional standard Brownian motion with initial state x = (x⁽¹⁾, x⁽²⁾, ..., x⁽ᵏ⁾) is the process {B_t = (B_t⁽¹⁾, ..., B_t⁽ᵏ⁾): t ≥ 0}, where {B_t⁽ⁱ⁾}, 1 ≤ i ≤ k, are k independent one-dimensional standard Brownian motions starting at x⁽ⁱ⁾ (1 ≤ i ≤ k). It is easily seen to be Markovian, and the conditional distribution of B_{s+t} given {B_u: 0 ≤ u ≤ s} is Gaussian (k-dimensional) with mean vector B_s and variance–covariance matrix tI, where I is the (k × k) identity matrix. The transition density function is

\[
p(t;x,y) = \frac{1}{(2\pi t)^{k/2}}\exp\Bigl\{-\frac{1}{2t}\sum_{i=1}^k (y^{(i)} - x^{(i)})^2\Bigr\} = \prod_{i=1}^k \frac{1}{(2\pi t)^{1/2}}\exp\Bigl\{-\frac{(y^{(i)} - x^{(i)})^2}{2t}\Bigr\}, \tag{14.1}
\]

where x = (x⁽¹⁾, ..., x⁽ᵏ⁾), y = (y⁽¹⁾, ..., y⁽ᵏ⁾). As in the one-dimensional case, the Markov property also follows from the fact that {B_t: t ≥ 0} is a (vector-valued) process with independent increments. It is straightforward to check that p satisfies the backward equation

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k \frac{\partial^2 p}{(\partial x^{(i)})^2} \tag{14.2}
\]
as well as the forward equation

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k \frac{\partial^2 p}{(\partial y^{(i)})^2}. \tag{14.3}
\]

A Brownian motion {X_t = (X_t⁽¹⁾, ..., X_t⁽ᵏ⁾)} with drift vector µ = (µ⁽¹⁾, ..., µ⁽ᵏ⁾) and diffusion matrix D = ((d_{ij})) is defined by

\[
X_t = x_0 + t\mu + \sigma B_t, \tag{14.4}
\]

where X₀ = x₀ = (x₀⁽¹⁾, ..., x₀⁽ᵏ⁾) is the initial state, σ is a k × k matrix satisfying σσ′ = D, and B_t = (B_t⁽¹⁾, ..., B_t⁽ᵏ⁾) is a standard Brownian motion with initial state zero (vector). For each t > 0, σB_t is a nonsingular linear transformation of a Gaussian vector B_t whose mean (vector) is zero and whose variance–covariance matrix is tI. Therefore, σB_t is a Gaussian random vector with mean vector zero and variance–covariance matrix σ(tI)σ′ = tσσ′ = tD = ((t d_{ij})). Therefore, X_t is Gaussian with mean x₀ + tµ and variance–covariance matrix tD. Since X_{t+s} − X_t = sµ + σ(B_{t+s} − B_t), {X_t} is a process with independent increments (and is, therefore, Markovian), having the transition density function

\[
p(t;x,y) = \frac{1}{(2\pi t)^{k/2}(\det D)^{1/2}}\exp\Bigl\{-\frac{1}{2t}\sum_{i=1}^k\sum_{j=1}^k d^{ij}\bigl(y^{(i)} - x^{(i)} - t\mu^{(i)}\bigr)\bigl(y^{(j)} - x^{(j)} - t\mu^{(j)}\bigr)\Bigr\} \qquad (x, y \in \mathbb{R}^k;\ t>0). \tag{14.5}
\]

Here ((d^{ij})) = D⁻¹. This transition probability density satisfies the backward and forward equations (Exercise 3)

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k\sum_{j=1}^k d_{ij}\frac{\partial^2 p}{\partial x^{(i)}\partial x^{(j)}} + \sum_{i=1}^k \mu^{(i)}\frac{\partial p}{\partial x^{(i)}},
\qquad
\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i=1}^k\sum_{j=1}^k d_{ij}\frac{\partial^2 p}{\partial y^{(i)}\partial y^{(j)}} - \sum_{i=1}^k \mu^{(i)}\frac{\partial p}{\partial y^{(i)}}. \tag{14.6}
\]

More generally, we have the following.
Definition 14.1. A k-dimensional diffusion with (nonconstant) drift vector µ(x) = (µ⁽¹⁾(x), ..., µ⁽ᵏ⁾(x)) and (nonconstant) diffusion matrix D(x) = ((d_{ij}(x))) is a Markov process whose transition density function p(t; x, y) satisfies the Kolmogorov equations

\[
\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i,j=1}^k d_{ij}(x)\frac{\partial^2 p}{\partial x^{(i)}\partial x^{(j)}} + \sum_{i=1}^k \mu^{(i)}(x)\frac{\partial p}{\partial x^{(i)}},
\qquad
\frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i,j=1}^k \frac{\partial^2\bigl(d_{ij}(y)p\bigr)}{\partial y^{(i)}\partial y^{(j)}} - \sum_{i=1}^k \frac{\partial\bigl(\mu^{(i)}(y)p\bigr)}{\partial y^{(i)}}. \tag{14.7}
\]

Such a p may be shown to exist and be unique under the assumptions:
1. ((d_{ij}(y))) is, for each y, a positive definite matrix.
2. d_{ij}(y) is, for each pair (i, j), twice continuously differentiable.
3. µ⁽ⁱ⁾(y) is, for each i, continuously differentiable.
4. |µ⁽ⁱ⁾(y)| does not go to infinity faster than |y|, and |d_{ij}(y)| does not go to infinity faster than |y|².

An alternative definition of a multidimensional diffusion may be given by requiring that the appropriate analog of (1.2)′ holds (Exercise 2). The state space may be taken to be a suitably "regular" open subset of ℝᵏ, e.g., a rectangle, a ball, ℝᵏ \ {0}, etc. Once p is given, the joint distribution of the stochastic process at any finite set of time points is assigned in the usual manner; e.g., given X₀ = x₀,

\[
p(t_1; x_0, x_1)\,p(t_2 - t_1; x_1, x_2)\cdots p(t_n - t_{n-1}; x_{n-1}, x_n)
\]

is the joint density of (X_{t₁}, X_{t₂}, ..., X_{t_n}), where 0 < t₁ < t₂ < ⋯ < t_n. The sample paths may also be taken to be continuous (vector-valued functions of t). An alternative method of constructing the diffusion is to solve Itô's stochastic differential equations:

\[
dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t, \qquad X_0 = x, \tag{14.8}
\]

where {B_t: t ≥ 0} is a standard Brownian motion. The equations (14.8) (there are k equations corresponding to the k components of the vector X_t) may be roughly interpreted as follows: given {X_u: 0 ≤ u ≤ t}, the conditional distribution of the increment dX_t = X_{t+dt} − X_t is approximately Gaussian with mean µ(X_t) dt and variance–covariance matrix D(X_t) dt. We will study the Itô calculus in detail in Chapter VII.
The operator

\[
A := \frac{1}{2}\sum_{i,j=1}^k d_{ij}(x)\frac{\partial^2}{\partial x^{(i)}\partial x^{(j)}} + \sum_{i=1}^k \mu^{(i)}(x)\frac{\partial}{\partial x^{(i)}} \tag{14.9}
\]

is called the (infinitesimal) generator of the diffusion on ℝᵏ with drift µ(·) and diffusion matrix D(·).

It is sometimes possible to reduce the state space by a transformation from multidimensional to one-dimensional while still preserving the Markov property. Such reductions often facilitate the computation of certain probabilities. Of course, if φ is a one-to-one measurable transformation then φ(x) may be thought of as only a relabeling of x, and the Markov property is preserved, since knowing φ(X_u), 0 ≤ u ≤ s, is equivalent to knowing X_u, 0 ≤ u ≤ s. In particular, if φ is a one-to-one twice continuously differentiable map of S onto an open subset of ℝᵏ, then the Markov process {Y_t := φ(X_t)} is also a diffusion, whose drift and diffusion coefficients may be expressed in terms of those of {X_t} in a manner analogous to that in Proposition 3.2 (Exercise 2). The Itô calculus of Chapter VII provides another method for calculating the coefficients of the transformed diffusion (Exercise 3). We are here interested, however, in transformations that are not one-to-one and, in fact, reduce the dimension of the state space.
Example 1. (The Bessel Process or the Radial Brownian Motion). Let {B_t} be a k-dimensional standard Brownian motion, k > 1. Let

\[
\varphi(x) = |x| = \Bigl(\sum_{i=1}^k (x^{(i)})^2\Bigr)^{1/2}, \qquad R_t := \varphi(B_t) = |B_t|.
\]

Let us show that {R_t} is a Markov process on the state space S̄ = [0, ∞). Let f be an arbitrary real-valued, bounded measurable function on S̄. Write g(x) = (f ∘ φ)(x) = f(|x|). Using the Markov property of {B_t}, one has

\[
E[f(R_{t+s}) \mid \{R_u: 0 \le u \le s\}] = E\bigl[E(f(R_{t+s}) \mid \{B_u: 0 \le u \le s\}) \mid \{R_u: 0 \le u \le s\}\bigr]
\]
\[
= E\bigl[E(g(B_{t+s}) \mid \{B_u: 0 \le u \le s\}) \mid \{R_u: 0 \le u \le s\}\bigr] = E\bigl[(T_t g)(B_s) \mid \{R_u: 0 \le u \le s\}\bigr], \tag{14.10}
\]

where {T_t} is the semigroup of transition operators associated with {B_t}, i.e.,

\[
(T_t g)(x) = \int_{\mathbb{R}^k} g(y)\,p(t;x,y)\,dy = \int_{\mathbb{R}^k} f(|y|)\,\frac{1}{(2\pi t)^{k/2}}\exp\Bigl\{-\frac{|y-x|^2}{2t}\Bigr\}\,dy. \tag{14.11}
\]
To reduce the last integral, change to polar coordinates. That is, first integrate over the surface of the sphere {|y| = r} with respect to the normalized surface area measure dσ₁, and then integrate out r after multiplying the integrand by the surface area ω_k r^{k−1} of {|y| = r}. Since |y − x|² = |y|² + |x|² − 2Σᵢ x⁽ⁱ⁾y⁽ⁱ⁾,

\[
\int_{\{|y|=r\}}\exp\Bigl\{-\frac{|y-x|^2}{2t}\Bigr\}\,d\sigma_1 = \exp\Bigl\{-\frac{|x|^2+r^2}{2t}\Bigr\}\int_{\{|z|=1\}}\exp\Bigl\{\frac{r|x|}{t}\sum_{i=1}^k w^{(i)}z^{(i)}\Bigr\}\,d\sigma_1(z), \tag{14.12}
\]

where w = x/|x|, z = y/|y|. Note that Σᵢ w⁽ⁱ⁾z⁽ⁱ⁾ is the cosine of the angle between the unit vectors w and z, and its distribution under dσ₁ does not depend on the particular unit vector w. In particular, one may replace w by (1, 0, ..., 0). Hence,

\[
\int_{\{|y|=r\}}\exp\Bigl\{-\frac{|y-x|^2}{2t}\Bigr\}\,d\sigma_1 = \exp\Bigl\{-\frac{|x|^2+r^2}{2t}\Bigr\}\,h\Bigl(\frac{r|x|}{t}\Bigr), \tag{14.13}
\]

where (Exercise 9)

\[
h(v) := \int_{\{|z|=1\}} e^{v z^{(1)}}\,d\sigma_1 = \frac{\int_{-1}^1 e^{vu}\,(1-u^2)^{(k-3)/2}\,du}{\int_{-1}^1 (1-u^2)^{(k-3)/2}\,du}. \tag{14.14}
\]

Thus,

\[
(T_t g)(x) = (2\pi t)^{-k/2}\,\omega_k\,e^{-|x|^2/2t}\int_0^\infty f(r)\,e^{-r^2/2t}\,h\Bigl(\frac{r|x|}{t}\Bigr)\,r^{k-1}\,dr =: \psi(t, |x|), \tag{14.15}
\]

say. Using (14.15) in (14.10), one gets the desired Markov property of {R_t}:

\[
E[f(R_{t+s}) \mid \{R_u: 0 \le u \le s\}] = E[\psi(t, |B_s|) \mid \{R_u: 0 \le u \le s\}] = \psi(t, R_s). \tag{14.16}
\]

Further, it follows from (14.15) that the transition probability density function of {R_t} is

\[
q(t; r, r') = (2\pi t)^{-k/2}\,\omega_k\,\exp\Bigl\{-\frac{r^2+r'^2}{2t}\Bigr\}\,h\Bigl(\frac{r r'}{t}\Bigr)\,(r')^{k-1}. \tag{14.17}
\]
Write the semigroup of transition operators for {R_t} as {T̃_t}. Then (T̃_t f)(r) = (T_t g)(x)|_{|x|=r} = ψ(t, r), and

\[
\frac{\partial}{\partial t}(\tilde T_t f)(r) = \frac{\partial}{\partial t}\psi(t,r) = \frac{\partial}{\partial t}(T_t g)(x)\Big|_{|x|=r} = \frac{1}{2}\sum_{i=1}^k \frac{\partial^2\psi(t,|x|)}{(\partial x^{(i)})^2}\Big|_{|x|=r}.
\]

Since ∂ψ(t,|x|)/∂x⁽ⁱ⁾ = (∂ψ/∂r)(t,|x|)·x⁽ⁱ⁾/|x|, a second differentiation and summation over i gives

\[
\frac{1}{2}\sum_{i=1}^k \frac{\partial^2\psi(t,|x|)}{(\partial x^{(i)})^2}\Big|_{|x|=r} = \frac{1}{2}\frac{\partial^2\psi(t,r)}{\partial r^2} + \frac{k-1}{2r}\frac{\partial\psi(t,r)}{\partial r} = \frac{1}{2}\frac{\partial^2(\tilde T_t f)(r)}{\partial r^2} + \frac{k-1}{2r}\frac{\partial(\tilde T_t f)(r)}{\partial r}. \tag{14.18}
\]

Thus, the transformed Markov process has as its infinitesimal generator the Bessel operator

\[
\bar A := \frac{1}{2}\frac{d^2}{dr^2} + \frac{k-1}{2r}\frac{d}{dr}. \tag{14.19}
\]

It will be shown later that the state space of {R_t} may be restricted to (0, ∞), so that (k − 1)/2r in (14.19) is defined. It may be noted that not only is {R_t} Markovian, but its distribution under P_x is Q_{φ(x)}, where P_x is the distribution of the Brownian motion {X_t} when X₀ = x, and Q_r that of {R_t} when R₀ = r. This is easily checked by looking at the joint distribution, under P_x, of (R_{t₁}, ..., R_{t_n}) by successive conditioning with respect to {X_u: 0 ≤ u ≤ t_i}, i = n−1, n−2, ..., 1 (0 < t₁ < ⋯ < t_n).
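The Bessel generator (14.19) can be checked by simulation: the radial part of a k-dimensional Brownian motion should match a one-dimensional diffusion with drift (k − 1)/(2r) and unit diffusion coefficient. The Python sketch below (parameters and the Euler discretization are illustrative choices, not from the text) compares the two at a fixed time.

```python
import numpy as np

rng = np.random.default_rng(5)
k, t, steps, n = 3, 1.0, 1000, 20000
dt = t / steps

# radial part of a k-dimensional Brownian motion started at x = (1, 0, ..., 0)
B = np.zeros((n, k)); B[:, 0] = 1.0
for _ in range(steps):
    B += np.sqrt(dt) * rng.standard_normal((n, k))
R_true = np.linalg.norm(B, axis=1)

# one-dimensional diffusion with the Bessel generator (14.19), started at 1
R = np.ones(n)
for _ in range(steps):
    R += (k - 1) / (2 * R) * dt + np.sqrt(dt) * rng.standard_normal(n)
    R = np.abs(R)   # crude guard near the (inaccessible) boundary 0
print(R_true.mean(), R.mean())   # the two means should be close
```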
Let us make use of the above reduction to calculate the probability (for given 0 < c < d)

\[
\psi(|x-a|) := P_x(\{X_t\} \text{ reaches } \partial B(a{:}c) \text{ before } \partial B(a{:}d)) \tag{14.20}
\]

for c < |x − a| < d. Here B(a:r) = {z ∈ ℝᵏ: |z − a| < r} is the ball with center a and radius r. By translation invariance, (14.20) equals

\[
P_{x-a}(\{X_t\} \text{ reaches } \partial B(0{:}c) \text{ before } \partial B(0{:}d)). \tag{14.21}
\]

In turn, (14.21) equals

\[
P_{x-a}(\{R_t\} \text{ reaches } c \text{ before } d) = Q_{|x-a|}(\{Y_t\} \text{ reaches } c \text{ before } d), \tag{14.22}
\]

where {Y_t} is the canonical one-dimensional diffusion (i.e., the coordinate process) on S = [0, ∞) having the same distribution as {R_t}. From (2.24) (or (4.17), or (9.19)), the last probability is given by

\[
\psi(|x-a|) = \frac{\int_{|x-a|}^d e^{-I(c,r)}\,dr}{\int_c^d e^{-I(c,r)}\,dr}, \tag{14.23}
\]

where, with µ(y) = (k − 1)/2y, σ²(y) = 1 (see (14.19)),

\[
I(c,r) = \int_c^r \frac{2\mu(y)}{\sigma^2(y)}\,dy = \int_c^r \frac{k-1}{y}\,dy = \log\Bigl(\frac{r}{c}\Bigr)^{k-1}, \tag{14.24}
\]

so that

\[
\psi(|x-a|) = \begin{cases} \dfrac{\log d - \log|x-a|}{\log d - \log c} & \text{if } k = 2,\\[2ex] \dfrac{\bigl(c/|x-a|\bigr)^{k-2} - (c/d)^{k-2}}{1 - (c/d)^{k-2}} & \text{if } k > 2, \end{cases} \tag{14.25}
\]

for c ≤ |x − a| ≤ d. Letting d ↑ ∞ in (14.25), one obtains

\[
P_x(\{X_t\} \text{ ever reaches } \bar B(a{:}c)) = \begin{cases} 1 & \text{if } k = 2,\\[1ex] \Bigl(\dfrac{c}{|x-a|}\Bigr)^{k-2} & \text{if } k > 2, \end{cases} \qquad (c < |x-a|). \tag{14.26}
\]

In other words, a two-dimensional Brownian motion is recurrent, while higher-dimensional Brownian motions are transient (Exercises 5 and 6). For k = 2, given any ball, {X_t} will reach it (with probability 1) no matter where it starts. For k > 2, there is a positive probability that {X_t} will never reach a ball if it starts from a point outside. Recall that an analogous result is true for multidimensional simple symmetric random walks (see Section 5 of Chapter I). In one respect, however, multidimensional random walks on integer lattices differ from multidimensional Brownian motions. In the former case, in view of
448
BROWNIAN MOTION AND DIFFUSIONS
the countability of the state space, the random walk reaches every state with positive probability. This is not true for multidimensional Brownian motions. Indeed, by letting c 10 in (14.25) one gets for 0< jx — aj < d, P,,({X,} reaches a before 3B(a:d)) = 0.
(14.27)
Letting d t co in (14.27) it follows that PX ({X,} ever reaches a) = 0
(0 < x — al).
(14.28)
(0
(14.29)
In particular, taking a = 0, one gets Q,({Y} ever reaches 0) = 0,
In view of (14.29), zero is said to be an inaccessible boundary for the Bessel process. The state space of the Bessel process may, therefore, be restricted to S = (0, 00).
Observe that the Markov property of {cp(X,):= JX^^ = Rj in Example 1 follows from the fact that T, transforms (bounded measurable) functions of q(x) into function of cp(x). This is a general property, as may be seen from the steps (14.10), (14.11), (14.15), and (14.16). In most cases, however, it is difficult, if not impossible, to calculate T, g, since the transition probability cannot be computed explicitly. A simple check for the process {q (X,)} to be Markov is furnished by the following. Proposition 14.1
(a) If A transforms functions of q(x) into functions of cp(x) then {cp(X,)} is a Markov process. (b) If P. denotes the distribution of {X} when X 0 = x and Q„ that of {cp(X,)} when gp(X o ) = u, then the P. distribution of {cp(X,)} is Q 4, . This statement needs some qualification, since not all bounded measurable functions are in the domain (of definition) of A (see theoretical complement 1).
15 MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS AND CRITERIA FOR TRANSIENCE
AND RECURRENCE Consider a diffusion {X,} on U8'` with drift coefficients y ( ' ) (•) and diffusion coefficients d ; j (.) as described in Section 13. Let G be a proper open subset of Il k with a "smooth" boundary 0G (see theoretical complement 1), e.g., G = H":= {x e U8' k : x° ) > 0), G = B(O:a)== {x e Il k : lxi
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS 449
{x E ll : x^' = 0 }, aB(O:a) = {x E R: jxl = a}. One may "restrict" the diffusion to G - G u r7G in various ways, by prescribing what the diffusion must do on reaching the boundary öG starting from the interior of G, consistent with the Markov property and the continuity of sample paths. The two most common prescriptions are (i) absorption and (ii) reflection. This section is devoted to absorption. In the case of absorption the new process {X,} is defined in terms of the original unrestricted process {X,} (which is assumed to start in S = G u c G) by )
`
=
if t < i o , (
X
ift>r
15.1)
where t a, G is the first passage time to äG, also called the first hitting time of G, defined by T,G = inf {t >- 0: X, e 2G} .
(15.2)
As always, T ay = oo if {X,} never hits G.
Definition 15.1. The process {X} defined by (15.1) is called a diffusion on G = G u OG with absorption on G. The Markov property of {JZ,} follows from calculations that are almost the same as carried out for Theorem 7.1 (Exercise 1). Its transition probability is given by p(t;x,B) = P(X t eBIX ° =x) P(X ( EB,T 2G >tjX ° =x)
ifBcG,
P(i oc <, t,X, c-BIX ° =x)
ifBcöG.
(15.3)
Now {Xj has a transition probability density, say, p(t; x, y), and
P (X,EB,t ac >t1X ° =x)<,P(X,EB1X ° =x)= J p(t;x,y)dy =0, (15.4) s
if B c G has Lebesgue measure zero. Hence, the measure B -+ P(t; x, B) on the Borel sigmafield of G has a density, say p (t; x, y) (which vanishes outside G), so that (15.3) may be expressed as
°
° J ( 15.5) B c G, x e G,
p (t; x, y) dy,
p(t; x, B) =
B
PX(zaG
450
BROWNIAN MOTION AND DIFFUSIONS
where P. denotes the distribution of the unrestricted process starting at x. If x e OG, then of course p(t; x, B) = 1 B (x) for all t > 0. Example 1. (Brownian Motion on a Half-Space with Absorption). Let p ( (•) = p i , d (.) = Q?8 i; , where the µ;, r > 0 are constants. Let {X, = (X;'", .. , X)} be ;;
an unrestricted Brownian motion with these coefficients, starting at some x E H k . First consider the case k = 1. For this case the state space is S = [0, x), whose boundary is {0}. Example 8.4 provides the density p = p°, say, given by
°
p°(t; u, v) = exp{
u^(V— u) — p It }(2 ma^t)-viz
x exp
z
2
1 (v — u) z 2Q, t
)))
— exp _
(v — u) 2 11(15.6) 2v; t
for t>0,u> 0,v>0. As a special case for µ l = 0 one has P°(t; u, v) = P1(t; u, v) — P,(t; u, —v)
(15.7)
for t > 0, u > 0, v > 0, where p l is the transition probability density of an unrestricted one-dimensional Brownian motion with drift zero and diffusion coefficient ai. Also, for the case k = 1, one has öG = {0}, so that X 0 P„(iaG < t and Xt , G = 0) = P„(T,G 1< t),
(15.8)
which is the distribution of the first passage time to zero of a one-dimensional Brownian motion starting at u. By relation (10.15) in Section 10 of Chapter I, we have PP(Tac '< t) =
fo, fq,u,(s;
(15.9)
u) ds
where iiz exp _ (u —sµß .%1,µ1(s; u) = u(2^ols3)
z )
(15.10)
2s Q;
for u > 0. For k> 1, the (k — 1)-dimensional Brownian motion {(X independent of {X1 1) } and, therefore, of the first passage time iac = inf{t > 0: X(' = 0}.
2
,. . . , X; k) )} is
(15.11)
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS 451
Therefore, for every interval I, c (0, cc) and every Borel subset C of l " ' it follows that p(t; x, 1i x C) = Px( {X;' C 1 1 , (X} z) , ... , X{ k) ) E C} n {z aG > t }) )
= P( {X{ i) E 1 i } n {T OG > t} Xo' ) = x^' ) ) x P((X; Z) , .. , Xt(k)) E C Xo) = xci) for 2 <, j <, k)
= f p',(t'
x
(') v)
flp'(t;x°i)
dv/ J
v a)) d v 1z)...d v (k),
x ' > 0, (15.12) (
)
where p° is given by (15.6) and pj is the transition probability density of an unrestricted one-dimensional Brownian motion with drift p and diffusion coefficient o; namely,
p,(t; u, v) = (2iroj t) 'i 2 I _ (v
=
u — tµ ) z } , 2Q;t ;
(u, v c l ').
(15.13)
Hence, for x, y e H k , k
°
( )) fl p;(t; x p (t; x, Y) = p°(t; x(') y
,
(15.14)
i=z Rk
Consider a Borel set B c öG. Then B = {0} x C for some Borel subset C of - I and, for x e G,
P(t; x, B) = Px ( {r c -< t} n {X
{0} x C })
=Px ({(X,...,Xtk,)EC }n {T OG 1
, Xs k) ) E C} I Tc'G = s)Jai.N,(s; x11)) ds
Px((Xs2),
J
= fol
(f `J
...
c
f
pJ(s; x^i)vci)) dv' z ' .
J ^
..
dv^ k) ).ral, µ1 (s;x UU) )ds.
(15.15)
The last equality holds owing to the independence of ' OG and the (k — 1) dimensional Brownian motion {(Xt( z) , ... , X;k) )}. Letting t -+ oo in (15.15) one arrives at the Pr -distribution ifr (x, dy) of X reG * Let y' = (0, y (2) , . .. , y (k) ) C ÖG. It follows from (15.15) that the distribution /i(x, dy) has a density '(x', y') with respect to Lebesgue measure on 3G = {0} x k -1 , given by
V/(x, y') =
fo
"o
()
f6 2 ' ,,(s; x ' ) fl p (s, x ti) yti)) ds i=z
(y' E OG; x e H k ). (15.16)
452
BROWNIAN MOTION AND DIFFUSIONS
For the special case y j = 0, a? = a 2 for I < i < k, we have exp{— J x(')(2i x J
i(x, Y) =
2Q2s L
2kt2
(1)
(2n)k/2
I ( x (1))2 + [' (y (l) — x (i))2
L —
^(x(r))2 + I (y(i) — xci))2]} ds
a2)-kizs-(k+z)iz
0
r(k/2)
[2,
x(1)
1
J
k/2
z
J
I
t (krz)- 1e-^ dt
°
(15.17)
ir k/2 Ix — ( 0 , Y ' )I k
The second equality in (15.17) is the result of the change of variables s — t = [( x (1))2
L
+ Y_ (y — x j ) 1'2a s. ( ) 2
J
z
2
It is interesting to note that, for k = 2, (15.17) is the Cauchy distribution. The explicit calculations carried out for the example above are not possible for general diffusions, or for more general domains G. One may, however, derive the linear second-order equations whose solutions are p ° (t; x, dy). The function p ° (t; x, y) satisfies Kolmogorov's backward equation (see theoretical complement 2) x,
y) _ at
°
x, y) + '` (^) x aP (t; x, Y) __ A o d.. x a2p°(t; ax( , ) P , F^ O 2 j "O ax (O ax(,) k
(t>0,x,yEG), (15.18)
and the backward boundary condition lim x-a 3G.x G
°
p (t; x, y) = 0
(t > 0, ye G).
(15.19)
Relations (15.18)—(15.19) may be checked directly for Example 1. Finally, for B c öG using the same argument as used in Section 7, (see (7.22)—(7.24)) one gets, writing i for TOG, P(t; x, B) = PX (X, e B) = P,^(T t, X t e B) =PX (i<00,X,eB)—Px (i>t,X,aB)
= PX (r < oo and X = e B) —E X [PX ({T>t}n{X t aB} {X,,:0
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS 453
=Px (t< cc and X t cB) — Ex[Ex( 1 (t>)
1 (xtX,^EB} 1 {X:0
-<
u
<
t })]
= Px (r < oo, X^ E B) — E x [1 ( ^ > , } Py (i < oc, X , E B) y = x ,] = ^i(x;
B)
—
°
(t > 0, x c G), (15.20)
B)p (t; x, y) dy
where i4'(x, B):= Px (r <
°
oc, X e B),
(B c 1G).
(15.21)
Since p (t; x, y) is determined by (15.18) and (15.19) (along with the initial condition lim, 10 p (t; x, y) dy = S x for x E G), it remains to determine fir. Now, for fixed x, the hitting distribution O(x; dy) on äG is determined by the collection of functions
°
p 1 (x) '=
f
ac
f(y)Ü^(x; dy) = Ex(f(X^) 1 (^<^})
(x E G), (15.22)
for arbitrary bounded continuous f on 13 G. Let us show that Al f (x) =0
(xEG),
lim tG f (x) = f(a)
(a c 1G).
(15.23)
x —a
Drop the subscript f and assume for simplicity that O(x) can be extended to a twice-differentiable function i on Q that belongs to the domain of A, the infinitesimal generator of the unrestricted diffusion {X 1 } (theoretical complement 4). Then, for x E G, assuming PP (r ac < t) = o(t) as t j0, (see Exercise 14.10), proceed as in the proof of Proposition 9.1 to get, writing r for ' OG T1c(x) = Ex(^(X1)) = Ex( 1 (t> /(X,)) + Ex( 1 (Tsri/(XO) = Ex( 1 V>t ^4(XJ) + o(t) = Ex( 1 (t>,)Ex,(f(X^)l(t<)) + o(t) (
}
= Ex( 1 (c>t}E(J (X (X,'))
{Xu: 0 < u < t })) + O(t)
(by the Markov property) = Ex(1(r>,)f(Xt(X,-))1(t(Xn+)<w)) + O(t)
Ex( 1 {L >ti./ (XTlit<mi) + °(t)
x [X = X V(x ,- ) and {r < oo} = {c{X = Ex(f(X^) 1 1,
= '(x) + o(t) = i(x) + o(t)
) < o0 on {i
> t }]
[PP(r < t) = o(t)] (x E G, t > 0).
(15.24)
454
BROWNIAN MOTION AND DIFFUSIONS
Hence, A(x) = At (x) = lim 1
T^i(x) — >y(x) = 0
(x e G),
(15.25)
1 0t
establishing the first relation in (15.23). The second relation in (15.23), namely, continuity at the boundary, may also be established under broad assumptions (theoretical complement 2). If G is bounded and aG smooth then the so-called Dirichlet problem (15.23), for a given continuous f specified on ÖG, can be shown to have a unique solution by the so-called maximum principle (see theoretical complement 3). A proof of this principle is sketched in Exercises 10-12 for the case A = A is the Laplacian A =
I'
02
(15.26)
ca> 2 '
The Maximum Principle. Let u be a continuous function on G = G u 3G, where G is a bounded, connected, open set, such that Au = 0 on G. Then u attains its maximum and minimum on äG, and not on G, unless it is a constant. To prove uniqueness of the solution of (15.23), suppose i2 are two solutions. Let u = rG 1 — ßi 2 , whose value on öG is zero. Now apply the maximum principle to u. We briefly illustrate these ideas by an example. Example 2. (Brownian Motion on a Ball with Absorption at the Boundary). Take p (.)=0, d, (.)=a 2 8 for 1 0. The case k = 1 is dealt with in Example 8.3, in detail. Let us consider k> 1. The hitting distribution i/i(x; dy), i.e., the Ps -distribution of X L , c is called the harmonic measure on öG. Its density fi(x, y) with respect to the surface area measure s(dy) on öG (or the arc length measure in the case k = 2) is the Poisson kernel, ;
;;
a 2 — Ix1 2 i(x; Y) =(IYI = a), cokalx — yl k
(15.27)
where w k is the surface area of the unit sphere (Exercise 7). The solution to (15.23) is a2 _ 2 Of(x)
o IX1
(Y)
s(dy).
t^y^ =o}
(15.28) IX
It is left as an exercise (Exercise 11) to check that x --► t4 (x, y) satisfies Ai/i = 0 in G for each x e OG, where A is the Laplacian (15.26). One may then differentiate under the integral sign to prove Abi f = 0 in G. The checks that
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS
455
SOG i/i(x; y)s(dy) = 1 for each x e G, and that i(x, y) dy converges weakly to ö 0 (dy) as x —4 y o E öG are left to Exercise 11.
Let us now turn to transience and recurrence of unrestricted multidimensional diffusions {X 1 }. Now, according to the probabilistic representation (15.22) of the solution to the Dirichlet problem (15.23), the probability (0 < c < lxl < d),
O(x):= PX ( {X,} reaches OB(0; c) before aB(0; d)),
(15.29)
is the solution to for 0 < c < ixi < d,
A(i(x) = 0
lim (x)=
x — y ,«^x^
10 1
(15.30)
if lyi = c, if lYl = d.
To see this take G in (15.22) and (15.23) to be the annulus G = {c < lxl
(Ixl > 0).
(15.31)
Straightforward differentiations yield (also see (14.18)) x(')
ö
IXI F'(ixl),
öx(^)f(x) =
(x(`))2
öz (ax(i)2
f(x) _
3(2)
^
xI2
x(i)xU)
ax(i) 2x(,) =
„
1 - (x ( ' ) ) 2 \
F (IXI) + IXI
IXI3
F
(IXI),
(15.32)
x(i)x(1)
1x1 2 F °(Ixi) — 1x13 F'(Ixl),
(1
i).
456
BROWNIAN MOTION AND DIFFUSIONS
Also write, ' x° Y d..(x)x ixt2 (
d(x):=
)
)
''
B(x):= trace of D(x) _
d ;; (x),
C(x):=2 x ° p °(x), I
(15.33)
B(x) + C(x) d(x) — 1, ß(r) := max Ixl =r
ßO — r
B(x) + C(x) min— —1, d(x)
Ixl_r
a(r):= max d(x),
a(r):= min d(x). Ixl =r
Ixl =r
Finally define, for some c > 0,
1 (r):=
f Jc
r
ß(u) du, I(r):= u
f r c
O du.
(15.34)
u
Using (15.33) it is simple to check that 2Af
= d(x)F "(Ixl)
+ B(x) — d(x) + C(x) F'(Ixl). x
(15.35)
Proposition 15.1. Under conditions (1)-(4) in Section 14, the function 0 in (15.29) satisfies d
d
exp{ — I(u)} du fX1
f
exp{ - 1(u)} du
exp{—I(u)} du
('ä J exp{ — I(u)} du
(c <, jxI < d)(15.36)
d
Proof. Let
J
r
exp{ -1(u)} du
F(r):= ('`d exp{ —1(u)} du
J
.f(x):= F(Ixi)
(c < Ix! < d). (15.37)
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS
457
Then F'(r) >0 and F"(r) + F'(r)ß(r)/r = 0, for r > c. Hence by (15.35) and the definition of ß, writing r = jxl, A x
A f <, max i(Y)
d(x)
c
F"(r) +
F'(r)ß(r)
-- = 0,
(15.38)
r
I,1=• d(y)
so that Af(x) < 0
for all c < jxi <, d.
(15.39)
Now extend f to a twice continuously differentiable function on all of l that vanishes outside a compact set (Exercise 15). Denote this extension also by f. By the extension of Corollary 2.4 to S = 11" mentioned above, Z, :=f(X,)—
J
Af(X S )ds
(t '>0)
(15.40)
0
is a martingale. By optional stopping (see Proposition 13.9 of Chapter I and Exercise 21), EXZ, = E.Z o = f(x),
(15.41)
i = TOG(o ;d) A TäG(o ;d).
(15.42)
with
In other words, using (15.39), E,,.f(X^) =f(x) + Ex J Af(X.,) ds < f(x)
(c < Ixl < d). (15.43)
0
On the other hand, f (X t ) is 1 if {X} reaches äB(0; d) before OB(0; c), and is 0 otherwise. Therefore, (15.43) becomes Px ( {X,} reaches OB(0; d) before öB(0; c)) < f (x).
(15.44)
Subtracting both sides from 1, the first inequality in (15.36) is obtained. For the second inequality in (15.36), replace ß by ß in the definition of f in (15.37) (x). n to get A f (x) >, 0, and E x f (X r ) >, f The following corollary is immediate on letting d T o0 in (15.36). Write p c (x):=Px ( {X,} eventually reaches äB(0; c)).
(15.45)
458
BROWNIAN MOTION AND DIFFUSIONS
Corollary 15.2. If the conditions (1)-(4) of Section 14 hold, then for all x,
Ixt > C, p(x) = 1
if j^ exp{-1(u)} du = co,
(15.46)
p(x) < 1
if
exp{-1(u)} du < oo.
(15.47)
fc ,0
Note that the convergence or divergence of the integrals in (15.46), (15.47) does not depend on the value of c (Exercise 16). Theorem 15.3 below says that the divergence of the integral in (15.46) implies recurrence, while the convergence of the integral in (15.47) implies transience. First we must define more precisely transience and recurrence for multidimensional diffusions. Since "point recurrence" cannot be expected to hold in multidimensions, as is illustrated for the case of Brownian motion in (14.28), the definition of recurrence on l8 k has to be modified (in the same way as for Brownian motion).
Definition 15.2. A diffusion on R' (k > 1) is said to be recurrent if for every pair x0y PX(IX, - yl <, e for some t) = 1
(15.48)
for all e > 0. A diffusion is transient if Px (IX^I- ► coast- ► co) =1
(15.49)
for all x. Theorem 15.3. Assume that the conditions (1)-(4) in Section 14 hold.
= oo for some c > 0, then the diffusion is recurrent. (a) If J° e_ 1 (b) If J e `- " du < oo for some c> 0, then the diffusion is transient. -
(
)
Proof. (a) Fix x 0 , y (x o y) and c > 0. Let d > max{1x 0 1, lyl + s}. Choose c, 0 < c < d, such that {Ixl = c} is disjoint from B(y; e). Define the stopping times
T 1 := inf{t ? 0: IX^I = d}, i2n+1 = inf{t - r 2n : IX 1 I = d},
T2:= inf{t >, r 1 : IXI = c}, T2 , :=
(15.50)
inf{t ,> zzn 1: IX:I = c}. -
Now the function x -• PX (IX,I = d for some t > 0), (x) < d, is the solution to the Dirichlet problem (15.23) with G = B(0; d), and boundary function f - 1 on {Ixl = d}. But i(x) - 1 is a solution to this Dirichlet problem. By the
MULTIDIMENSIONAL DIFFUSIONS UNDER ABSORBING BOUNDARY CONDITIONS 459
uniqueness of the solution, Px (IX,I = d for some t >, 0) = I for Ixl < d.
(15.51)
This means PX(TI < cc) = 1,
(Ixl < d).
(15.52)
As r 2 is the first hitting time of {Ixl = c} by the after-z process X , it follows from (15.52), the strong Markov property, and (15.46), that 00 )
Px(r2 <
(Ixl < d).
(15.53)
forlxl
(15.54)
= 1,
It now follows by induction that PX (t„
A,,:= { {X, } does not reach äB(y; s) during (ten, r 21]}
(n > 1). (15.55)
Then = EX 0 (i(X^ 2 ^)),
Pxo(A,, I
(15.56)
where O(x) = P.(A t )•
(15.57)
But iIi(x) is the solution to the Dirichlet problem Ai(x) = 0
for x e G := B(y; d) — B(y; e),
lim i(x)= x—Z
if Izl = d, if z e OB(y; e).
1 0
(15.58)
By the maximum principle, 0 < O(x) < 1 for x e G; in particular, 1 — 6 0 : =max >/i(x) < 1.
(15.59)
IxI =c
Therefore, from (15.56), P..(A, I ^sz.)'< (I — 6 0)
(n
>, 1).
(15.60)
460
BROWNIAN MOTION AND DIFFUSIONS
Hence, (Exercise 17) P 0 X,} does not reach ÖB(y; E)) < P, 0 (A I n • . • n A„) < (1 — 8 0 )” for all n, (15.61) which implies (15.48), with x = x 0 . (b) The proof is analogous to the proof of the transience of Brownian motion for k >, 3, as sketched in Exercise 14.5 and Exercise 18. n ({
16 REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS The probabilistic construction of reflecting diffusions is not quite as simple as that of absorbing diffusions. Let us begin with an example. Example 1. (Reflecting Brownian Motion on a Half-Space). Take S = Hk uäHk = R,where Hk = {X E
Rk
;
X (1) >O},
3H k = {X E P k ; x (1) = 0}.
First consider the case k = 1. This is dealt with in detail in Subsection 6.2, Example 1. In this case a reflecting Brownian motion {X,} with zero drift and diffusion coefficient a 2 > 0 is given by (16.1)
Y = All
where {X} is a one-dimensional (unrestricted) Brownian motion with zero drift and diffusion coefficient a 2 > 0. The transition probability density of {X1 is given by (16.2) g1(t;x,y)= p1(t;x,y)+p1(t;x, — y) (x,y% 0 ), }
where p l (t; x, y) is the transition probability density of {B,}, p, (t; x, y) = (2na2t)_ 1/2 exp
_(y
—
x
2
2Q t )
Z
( X, y e W).
(16.3)
Fork > 1, consider k independent Brownian motions {Xli) }, I < j k, each having drift zero and diffusion coefficient a 2 . Then {Y t ;= (IX'(' ) I, X e ) , ... , X; k) )} is a Brownian motion in Hk := {x e Il k ; x ( ' ) > 0} = Hk u 3Hk with normal reflection at the boundary {x e ff; x (1) = 0} = öHk . Its transition probability density function q is given by k
q(t; x, y) = qi(t; x('), y ( ' ) ) (X = (x(1),
... , x (k) ),
ri pi(t; x(i), y e) ), j =2
y = (.y(1)ß ... , y (k) ) E Hk).
(16.4)
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS
461
This transition density satisfies Kolmogorov's backward equation
a
(t > 0; XE
t = i 62 A.q(t; x, y), k
where =j
Hk , ye Hk), (16.5)
a2
ax
and the backward boundary condition
a q(t; x, y) =0,
(t >0;xEaHk,yEHk).
--
(16.6)
_ The following extension of Theorem 6.1 provides a class of diffusions on
Hk with normal reflection at the boundary.
Suppose that drift coefficients p (x), I < i < k, and diffusion coefficients d(x) are prescribed on '1k the assumptions (1)—(4) in Section 14. Assume also that p 1l) (x) =0,
d 1 (x) =0
(2<j
(16.7)
For x = (xU'>, .. , x (k) ) write
x = ( x (1 1 x 121 x ).
(16.8)
—
Now extend the coefficients
/1(1)(.),
JU (^)( x ) = _p(l)(X),
on all of 11" by setting, on (Hk )`,
U(j)(x) = /1(i)(X)
d„(x) = d, ^(x),
d.;(x)
d ;; (•)
(2 <j
d,;x = —d 1 (z)
< k),
(2 < j < k),
;
= d 1 ()
(16.9)
(2 < i, j < k).
Proposition 16.1. Let {X,} be a diffusion on Rk with coefficients µ^l^( ) d. (. ) defined on 11” and satisfying (16.7), (16.9). Then the process {Y := (IX ' j, X(k))} ,(
)
is a Markov process on H k , whose transition probability density function q is given by q(t; x, y) = p(t; x, y) + p(t; x, y) =
p(t;
x, y) + p(t; z, y),
where p is the transition probability density of {X, }.
(x, y e Rk ),
(16.10)
462
BROWNIAN MOTION AND DIFFUSIONS
Proof. Bya change of variables, the Markov process {X,: (—
.. , X;k) )}
has the transition probability density p given by x, y) = p(t; z, y).
p(t;
(16.11)
Therefore p satisfies the backward equation ap(t; X, Y)
aP(t; x, y) at
=
at 1
_
k
k
µ ( ` ) (X)(a^p)(t; X, y), (16.12)
I dj;(X)(a1ajp(t; X, y) +
2 i.i=1
!=1
where a ; p denotes differentiation of p with respect to the ith backward coordinate. Now (a1 p)(t; X, Y) = — (a1 p)(t, x, y), (2 < j 5 k),
(a;p)(t; X, Y) = (a; P)(t; x, y),
(aip)(t; X> Y) _ (a1P)(t; x, y),
(16.13)
(ataip)(t; X, Y) _ — ( 2 ,a;P)(t; x, y),
(2 j <, k),
(ei a;p)(t; X, Y) _ (ei a;p)(t; x, y),
(2 5 i, j < k).
It follows from (16.7), (16.9), (16.12), and (16.13) that {X} and {X t } have the same drift and diffusion coefficients and, therefore, the same transition probability density, i.e., x, y) = p(t; x, y).
p(t;
(16.14)
It now follows, as in the proof of Theorem 6.1, that (1) J , X(2) {Yt':= , - (IX t l
,
X(k))} t
is a Markov process whose transition probability density is given by p(t; x, y) + p(t; x, y). The second equality in (16.10) is obtained using (16.14). n The backward equation for p leads, by (16.10), to the backward equation for q, k
a^ =
2
k
1 d(x) ax(^ ax (;) + ^ µ(ß)(x)
(x E Hk ; t > 0). (16.15)
The second equation in (16.10), together with the assumption that p(t; x, y) is
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS
463
differentiable in x, implies the backward boundary condition aq(t;x,
y)
(t > 0; xEaHk ,yEHk ).
=0
(16.16)
Definition 16.1. A Markov process on the state space Hk that has continuous sample paths and whose transition density q satisfies (16.15), (16.16) is called a reflecting diffusion on Hk with drift coefficients µ ( ` 1 (x) and diffusion coefficients d(x). It may be shown that the first requirement in (16.7), namely, µ ' (x)=0 (
(16.17)
forxeäHk ,
)
is really not needed for the validity of Proposition 16.1. The condition (16.17) ensures that the extended drift coefficients in (16.9) are continuous. If it does not hold, extend the coefficients as in (16.9) and modify µ(')(•) on OHk by setting it zero there. Although the extended µ 'I(•) is no longer continuous, it is still possible to define a diffusion having these coefficients and Proposition 16.1 goes through (theoretical complement 1). The second requirement in (16.7), i.e., the condition (
d 1 (x)=0
for x e aHk , 2 < j < k,
(16.18)
is of a different nature. The next proposition shows that the failure of (16.18) means that the direction of reflection is no longer normal to the boundary. First consider a one-dimensional reflecting Brownian motion on [0, co) with mean drift it 'I and diffusion coefficient 1. It may be constructed by the method described in the proof of Proposition 16.1, after setting the drift —p"I on (— oo, 0) and modifying the drift at 0 to have the value zero as described above (also see Section 6). Let q l denote the transition density of this reflecting Brownian motion. Let p j denote the transition probability density of a one-dimensional Brownian motion (on R) with drift and diffusion coefficient 1,2 < j < k, i.e., p;(t; x,y):_(2nt)
" 2 exp —(y
x— t ^2 µ ) . 2t
(16.19)
Then k
q(t;
x, y) := qi(t; x «I ye>) []
pj ( c; x ° ye>),
(x, y E Hk),
(16.20)
I= 2
is the transition probability density of a Markov process on Hk satisfying the
464
BROWNIAN MOTION AND DIFFUSIONS
backward equation k
aq(t; x, y) 1 a 2 9 µ^;, aq at = 2 ; _ , (ax<<>)2 +
(t>0; xEHk ,yEHk ), (16.21)
and the backward boundary condition aq(t;x,y) ax" ^
=
forxeaHk , (t>O;yeHk ).
(16.22)
This Markov process will be referred to as a reflecting Brownian motion on Hk having drift µ := (µ('), ... , µ (k) ) and diffusion matrix I (or, diffusion coefficients b ;, (1 i,j
(
=
3f(x) ax(1)
, ... ,
of (x) (16.24) ax(k)
Also write D" 2 for the positive definite matrix such that D 1 / 2 D 1 / 2 = D, and D 1/2 for its inverse. Proposition 16.2. On the half-space H = fly there exists a Markov process {Z,} with continuous sample paths whose transition probability density r satisfies the Kolmogorov backward equation ar(t; z, w)
at
1 k 02r k ^^^ ar d,+ b` az(;) + 1 =1 = 2 ;, 1 (t > 0; zeH,wef), (16.25)
and the backward boundary condition (Dy)•(grad r)(t; z, w) = 0
(t > 0; z e OH, w E H).
(16.26)
Further, {Z,} has the representation Z, := D' 12 0Y,
(16.27)
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS
465
where 0 is an orthogonal transformation such that 0' maps D" 2 y/ID" 2 yJ into e:= (1, 0, ... ‚0), and {Y} is a reflecting Brownian motion on Hk having drift vector v:= (D 112 O) - 't and diffusion matrix I.
Proof. Let {Y,} denote a reflecting Brownian motion on the half-space Hk and 0 an orthogonal matrix as described. Then {Z, }, defined by (16.27), is a Markov process on the state space H (Exercise 3). Let q, r denote the transition probability densities of {Y, }, {Z, }, respectively. Then, writing T:= D 112 0, r(t; z, w) = (det D 112 ) 'q(t; T - 'z T - 'w). (16.28) Since whenever {B,} is a standard Brownian motion, {TB,} is a Brownian motion with dispersion matrix TT' = D, one may easily guess that r satisfies the backward equation (16.25). To verify (16.25) by direct computation, use the fact that, k q 16.29 =2e + ;^1 v ( ` a ( ) xq at — ax(`)
aq(t; x, y)
)
and that, for every real-valued twice-differentiable function f on W' the following standard rules on the differentiation of composite functions apply (Exercise 4): T' grad(f o T - ') = (grad f) o T - ', (16.30)
TT'(((aja;f) ° T - I )) = (((ai a;)(f ° T -' ))) This is used with f as the function x —• q(t; x, T -1 w), for fixed t and T - 'w, to arrive at (16.25) using (16.28) and (16.29). The boundary condition aq/ax^`> - (e•grad q) = 0 on OHk becomes, using the first relation in (16.30), e•(T' grad r)(t; z, w) = 0
(z E aH - {y•z = 0 }).
(16.31)
Recall that O' D 1 / 2 y = IDYI e, and T = D 112 0, to express (16.31) as (D). (grad r)(t; z, w) = 0
if z e OH.
(16.32)
n Since a diffusion on H k with spatially varying drift coefficients and constant diffusion matrix I may be constructed by the method of Proposition 16.1, the preceding result may be proved with {Y,} taken as such a diffusion. This leads to a Markov process (diffusion) on H whose transition probability density r
466
BROWNIAN MOTION AND DIFFUSIONS
satisfies the backward equation ar(t; z, w)
at
1 k 32r
k
( i)
ar
µ (z) az^i)
2 ,, ^ d " as<<^ a z (ü +
(16.33)
and the backward boundary condition (16.26). To extend the preceding construction to a spatially dependent dispersion matrix D(x) that does not satisfy (16.18), involves some difficulty, which can in general be resolved only locally; constructions on local pieces must be put together to come up with the desired diffusion on H (theoretical complement 2). In order to define reflecting diffusions on more general domains, we consider domains of the form G:= {x e Ul k ; cp(x) >, 0},
(16.34)
where cp is a real-valued continuously differentiable function on R such that grad q is bounded away from zero on the boundary, i.e., for some c > 0, (gradq(x)i>c>0
forxe aG:_{x:Q(x)=0}.
(16.35)
Write G:= {p(x) > 0}. Let D(x):= ((d ; (x))), and u ' (x) (1 < i < k) satisfy the assumptions (1)—(4) of Section 14, on G. ( )
Definition 16.2. Let {X,} be a Markov process on the state space G in (16.34) that has continuous sample paths and a transition probability density q satisfying the Kolmogorov backward equation ag(t; x, y) at
k
2
k
= Aq := 1 Z d,j ( x ) a g + 1 u^i^(x) qa , 2 i . j = 1 ax(`) ax j> (
i = 1 ax(f)
(t>0;xeG,ye G), (16.36)
and the backward boundary condition D(x)(grad (p)(x)•grad q = 0
(t > 0;
x e aG, y e G).
(16.37)
Then {X,} is called a reflecting diffusion on G having drift coefficients p O(• ) and diffusion coefficients d(.) (1 < i,j < k). The vector D(x)(grad cp)(x) at x e aG (or any nonzero multiple of it), is said to be conormal to the boundary at x. The reflection of {X,} is said to be in the direction of the conormal to the boundary. (
Note that, according to standard terminology, (grad p)(x) is normal to the boundary of G at x a aG.
So far we have considered domains (16.37) with q(x) = x^l ) . Now let p(x) = 1 — 1x1 2 , so that G =_ {x e IRk: lxi < 1} is the closed unit ball.
REFLECTING BOUNDARY CONDITIONS FOR MULTIDIMENSIONAL DIFFUSIONS
467
Example 2. The reflecting standard Brownian motion {X,} on the unit ball B(0:1) = {x e R': Ixl < 1} has a transition probability density p satisfying the backward equation ap(t; x, Y) = i
(t > 0; IXI < 1 , IY1 , 1),
!A p(t; x, Y)
at
(16.38)
and the boundary condition
y-
—0)
ap(t; x, Y) =
ax '
0
(t > 0;
Ixl = 1, IYI < 1).
(16.39)
( )
It is useful to note that the radial motion {R,:= IX,I} is a Markov process on [0, 1]. This follows from the fact that the Laplacian A. transforms radial functions into radial functions (Proposition 14.1) and the boundary condition is radial in nature. Indeed, if f is a twice continuously differentiable function on [0, 1] and g(x) = f(Ixl), then g is a twice continuously differentiable radial function on 9(0:1) and the function 7 g(x)°= Exg(X,) = Exf(IXI),
(16.40)
is the solution to the initial-value problem au(t, x) —
at k
x(i'
au(t,
ZAXu(t, x),
x) = 0,
lim u(t, x) = f(Ixl), to
(t > 0; Ixl < 1),
(Ixl = 1),
(16.41)
(Ixl < 1 ).
Let v(t, r) be the unique solution to av(t,
at
r)
1 az v k — 1 av = ----- + ------ tare 2r ar'
av(t, r) -
ar
- 0,
limv(t,r)= f(r), tlo
(t>0;0 r<1), (t > 0; r = 1),
(16.42)
(0
It is not difficult to check that the function v(t, Ixl) satisfies (16.41) (Exercise 5). By uniqueness of the solution to (16.41) it follows that u(t, x) _— v(t, Ixl).
468
BROWNIAN MOTION AND DIFFUSIONS
Thus, the transition operators T, transform radial functions to radial functions. Hence, by Proposition 14.1, {R,} is a Markov process on [0, 1] whose infinitesimal generator is 1d2 k-1 d
A ,:=--+ —, 2 dr z 2r dr
with boundary condition d/dr = 0 at r = 1. Recall (see Eq. 14.29, Section 14) that 0 is an inaccessible boundary for {R,}. Hence, a reflecting boundary condition attached to the accessible boundary r = 1 suffices to specify the Markov process {R,}.
17 CHAPTER APPLICATION: G. I. TAYLOR'S THEORY OF SOLUTE TRANSPORT IN A CAPILLARY In a classic study (see theoretical complement 1), G. I. Taylor showed that when a solute in dilute concentration is injected into a liquid flowing with a steady slow velocity through an infinite straight capillary of uniform circular cross section, the concentration along the capillary becomes Gaussian as time increases. Such results have diverse applications. Taylor himself used his theory, along with some meticulous experiments, to determine molecular diffusion coefficients for various substances. Another application is to determine the rate at which a chemical injected into the bloodstream propagates. Let a denote the radius of the circular cross-section of the capillary whose interior G and boundary öG are given by G = {y = (y (l) , y'): — oc < y c^, < z, ly'l
(17.1)
(y' = (Y (2) , y (3) )).
Let c(t; y) denote the solute concentration at a point y at time t. The velocity of the liquid is along the capillary length (i.e., in the yU ) direction) and is given, as the solution of a linearized Navier—Stokes equation governing a steady nonturbulent flow, by
--i ^z
F(y')
=
u0 ( i —
a
.
(17.2)
The parameter U0 is the maximum velocity, attained at the center of the capillary. As in Einstein's theory of Brownian motion, a solute particle injected at a point x = (x', x (2) , x (3) ) E G is locally like a three-dimensional Brownian motion with drift (F(x'), 0, 0) and diffusion matrix D O I, where Do is the molecular diffusion coefficient of the solute (relative to the liquid in the capillary), and I
469
CHAPTER APPLICATION
is the 3 x 3 identity matrix. When this solute particle reaches the boundary clG, it is reflected. In other words, the location {X, = (X; ", X;^^, X^ 3 ^) = (X^' ) , X,)} of the solute particle is a reflecting diffusion on G = G u öG having drift µ(x) = (F(x'), 0, 0) and diffusion matrix D.I. Therefore, its transition probability density p(t; x, y) satisfies Kolmogorov's backward equation, ^ p = Ap'= zDoAXp + F(x') Öt
ap (x G'x(1)
= (x" , x'), x' = (x^2> x^ 3 ^)), (17.3) )
where A is the Laplacian Z ö 2 /(äx) 2 . The backward boundary condition is that of normal reflection (see (16.37) with cp(x) = a 2 — Ix'1 2 ), ;
x« ) ü
p
öx
p =n(x)•gradp =0
(xcäG,t>0;ye6). (17.4)
+x ( ' ) b i
öx
Here, n(x) = (0, x') is the normal (of length a) at the boundary point x = (x", x').
The Fokker—Planck or forward equation may now be shown, by arguments entirely analogous to those given in Sections 2 and 6, to be 2D0A rp — --- (F(y')p) (ye G, t > 0).
=
(17.5)
The forward boundary condition is the no-flux condition (see Section 2) y (2) 2)
+y=n(y)•grad p
=0
(ycäG,t>0).
(17.6)
By using the divergence theorem, which is a multidimensional analog of integration by parts, it is simple to check, in the manner of Sections 2 and 6, that (17.5) and (17.6) are indeed the forward (or, adjoint) conditions. Given an initial solute concentration distribution c o (dx), the solute concentration c(t; y) at y at time t is then given by c(t;
y) =
J
p(t; x, y)co(dx)
(17.7)
G
and it satisfies the Fokker—Planck equations (17.5) and (17.6). Since (a) the diffusion matrix is D O I, (b) the drift velocity is along the x' -axis and depends only on x^ 2 ^, x^ 3 ^, and (c) the boundary condition only involves x (2) , x (3) , it should be at least intuitively clear that {X,} has the following representation: 1. {X := (X
2
,X
3
)} is a two-dimensional, reflecting Brownian motion on
470
BROWNIAN MOTION AND DIFFUSIONS
the disc Ba := {ly'I <, a} with diffusion matrix D 0 I', where I' is the 2 x 2 identity matrix. 2. X,(' = Xo' + f ä F(X) ds + Do B„ where {B 1 } is a one-dimensional standard Brownian motion independent of {X;} and Xo l ". (17.8) )
A proof of this representation is given toward the end of this section. Now the transition probability density p' of the process {X,'} is related top by p'(t;
p(t; x, y) dyt i) .
x', Y') =
(17.9)
By the Markov property of {XI}, the integral on the right does not involve xi ". Now the disc Ba = {ly'I <, a} is compact and p'(t; x', y') is positive and continuous in x', y' e BQ for each t > 0. It follows, as in Proposition 6.3 of Chapter II, that for some positive constants c 1 , c 2 , one has max Ip'(t; x', y') — y(Y')I < c l e - ` 2 `
(t > 0),
(17.10)
X , ,y'EBa
where y(y') is the unique invariant probability for p'. Let us check that y(y') is the uniform density,
y(Y') = a n g (IY'I < a).
(17.11)
It is enough to show that p'(t;
atfjjy'j'_^a)
x', y')y(x') dx' = 0.
(17.12)
But p' satisfies the backward equation
atp = iDOAX• p'
(Ix'! < a),
(17.13)
along with the boundary condition x t2) a P + xts) a p = 0
ax
ax
(Ix') = a).
(17.14)
Interchanging the order of differentiation and integration on the left side of (17.12), and using (17.13) and (17.14) and the divergence theorem, (17.12) is established. In other words, the adjoint operator, which in this symmetric case is the same as the infinitesimal generator, annihilates constants.
471
CHAPTER APPLICATION
As a consequence of (17.10), the concentration c(t; y) gets uniformized, or becomes constant, in the y'-plane. That is, (t; Y,) :=
J
^ c(t; y) dy ' —' cö'= (
)
ao
(fa co(dx))'7za2
(17.15)
exponentially fast as t —> oo. Of much greater interest, however, is the asymptotic behavior of the concentration in the y ' -direction, i.e., of (
)) := e(t; y
y) dy' =
fj c(t; Iy'l a)
)
J _co(dx)I G
IJy'^
a)
7 dy'I. p(t; x, y)
(1 .16)
The study of the asymptotics of c(t; y ") is further simplified by the fact that the radial process {R,:= IX,'1} is Markovian. This is a consequence of the facts that (1) the backward operator ZD.A.. of {X;} transforms all smooth radial functions on the disc B o into radial functions and (2) the boundary condition (17.12) is radial, asserting that the derivative in the radial direction at the boundary vanishes (see Proposition 14.1). Hence {R,'} is a diffusion on [0, a] whose transition probability density is (
q(t; r, r') := p'(t; X', Y')sr(dY) .
(Ix'l = r),
(17.17)
iI9'I_r)
where s,•(dy') is the arc length measure on the circle {Iy'I = r' }. The infinitesimal generator of {R;} is given by the backward operator _ AR
(d 2 1 dl A dr 2 + r drJ'
0 < r < a
(17.18)
and the backward reflecting boundary condition
dl
dr r = a
= 0.
(17.19)
One may arrive at (17.18) and (17.19) from (17.17), (17.13), and (17.14), as in Example 2 of Section 16 (see Eq. 16.42 for k = 2). It follows from (17.10), (17.11), and (17.17) that {R,'} has the unique invariant density ir(r)
=
2r
z ,
(17.20)
0 <, r <, a,
a
and that max o,r.rsa
q(t;
r, r')
2r
—
r a
27cac 1 e - ` 2t (t
>
0).
(17.2)
472
BROWNIAN MOTION AND DIFFUSIONS
For convenience, write
f(r) = 1 — r2 (O < r < a).
(17.22)
Then F(y') = Uo f (ly'l)• Now by the central limit theorem (Theorem 13.1, Exercise 13.2. Also see theoretical complement 13.3) it follows that, as n —• oo, Z=
n
'12 fo,",
(f(R;) — En.f) ds —' N( 0 , 2 t),
(17.23)
where the convergence in (17.23) is in distribution, and
f0a Ej =
f(r)ir(r) dr =
J
(1 o
—? r (r2 dr = 1 .
(17.24)
aZ/ \a 2 2
The variance parameter u 2 of the limiting Gaussian is given by (Proposition 13.2, Exercise 13.2) Q Z =2
f
'a (f (r) — E n f)h(r)n(r) dr 0
(17.25)
where h is a function in LZ ([O, a], n) satisfying ARh(r) _ — (f (r) — E,rf) _ ^2 — 1 .
(17.26)
Such an h is unique up to the addition of a constant, and is given by a
— 2r h(r) 8D az 0
2 + c 3 (17.27)
where c 3 is an arbitrary constant, which may be taken to be zero in carrying out the integration in (17.25). One then gets 2
48 0
Finally, since {B} and {X,'} in (17.8) are independent it follows that, as n —► 00, Y:= n - ' 1' Z (X — ZUo nt) --* N(0, Dt)
(17.29)
in distribution. Here a2U2
D 48D 0+ D
o ,
(17.30)
473
CHAPTER APPLICATION
is the large scale or effective dispersion along the capillary axis. Equations (17.29) and (17.30) represent Taylor's main result as completed by R. Aris (theoretical complement 1). The effective dispersion D is larger than the molecular diffusion coefficient Do and grows quadratically with velocity. Actually, the functional central limit theorem holds, i.e., { Y} converges in distribution to a Brownian motion with zero drift and variance parameter D. Also, for each t > 0, the convergence in (17.29) is much stronger than in distribution. Indeed, since the distribution of Y,(" is the convolution of those of Z,(" ) and Brownian motion B, (see (17.8(ii))), Y,(" has a density that converges to the density of N(0, Dt) as n --> oo. Hence, by Scheffe's theorem (Section 3 of Chapter 0), the density of Y," converges to that of N(0, Dt) in the L' -norm. Since the distribution of X,(' ) has the density c(t;•)/Co , where C o := $,3 c o (dx) is the total amount of solute present, the density of Y^' ° is given by )
)
(17.31)
--> n" 2 c(nt; n'/ Z Z + 2Uo nt)/C0 .
z
One therefore has, writing E =
I
E
-l^( E -z t; E -i z + zU0 E -2 t) —
(2nCt)ii2 exp{ - 2
dz-+0
as v J, 0. (17.32)
Another way of expressing (17.32) is by changing variables z' _ E - 'z, t' = E -z t. Then (17.32) becomes ^c(t ; z + zU°t)
exp^ (2^Dt')^^z
ll
--2Dt'}
dz' -+0
ast'I oo.
(17.33)
From a practical point of view, (17.33) says that at time t' much larger than the relaxation time over which the error in (17.10) becomes negligible, the concentration along the capillary axis becomes Gaussian. The center of mass of the solute moves with a velocity ZU 0 along the capillary axis, with a dispersion D per unit of time. The relaxation time in (17.10) is 1/c 2 , and —c 2 is estimated by the first (i.e., closest to zero) nonzero eigenvalue of ZD o i X . on the disc Ba with the no-flux, or Neumann, boundary condition. It remains to give a proof of the representation (17.8). First, let us show that {X, = (X; 1 ", X;)} as defined by (17.8) is a time-homogeneous Markov process. Fix s, t positive, and let the initial state be x o = (xö'I, xo'). Write X
;+)S = x
+
1 5 F((X; + )„) du + Do (B,+s — B,),
(17.34)
0
where X; + is the after -t process {(X )":= X, + : u >, 0 }. Let g be a bounded measurable function on G. Since {B,,: 0 <, u < t }, {X;,: 0 < u ` t} determine
474 {X.: 0
BROWNIAN MOTION AND DIFFUSIONS
< u < tl,
E[g(X1+s) I
{X v : 0 < u ,
t}]
= E[E(g(X, +S ) {B:0 u,< t}, {X: 0,< u<, t}) ^ {X: 0,< u,< t}]. (17.35) Now, g(X1+5) = 9(X}' , +
foS
F((X; + )u)du+
(B1+s—B1),
X;+s). (17.36)
Since B,+s — B t is independent of the conditioning variables and of X;'> and X; +, the inner conditional expectation on the right in (17.35) may be expressed as
f
E(h(s, (X, + F o ((X;+)„))du, X )
+s
{B: 0 < u < t}, {X: 0 < u < t} (17.37)
where h(s, v, z') = Eg(v + Ji (B, — B,), z').
(17.38)
To evaluate (17.37) use the facts that (1) X,(' ) is determined by the conditioning variables, (2) {X;} is independent of {B,}, so that the conditional distribution of X given {B,,: 0 < u < t}, {X: 0 < u < t}, is the same as that given {XÜ: 0 < u < t}. Then (17.37) becomes (E(h(s, y + Jo F((X)u) du, X;+S) I {X: 0 < u < t})).. (17.39)
But {X;} is a time-homogeneous Markov process. Therefore, (17.39) becomes
(h1(s, y, X;)) Y =xil , = hl(s, Xi l) , X;),
(17.40)
where h l is some function on [0, ao) x G. The expression (17.40) is already determined by {X 0 : 0 < u < t}. Therefore, the outer conditional expectation in (17.35) is the same as (17.40), completing the proof that {X 1 } is a time-homogeneous Markov process. Observe next that by Proposition 14.1, if {Z, :_ (Z,'', Z,')} is a Markov process on G whose transition probability density satisfies the Kolmogorov equations (17.3) and (17.4), then {Z,'} is a Markov process on the disc B o whose transition probability density satisfies the Kolmogorov equations (17.13) and (17.14). Since the latter are also satisfied by {X;}, (17.8(i)) is established. It is enough then to identify the conditional probability density of X;'^, given X o = x, with that of Z; 1) , given Z o = x. Let g be a smooth bounded function on R', and let g denote the function on G defined by g(x) = g(xU )). Then, denoting by EX
EXERCISES
475
expectation given X 0 = x, (Tt9)(X)'= E.O(Xr) = E,,g(X).
(17.41)
By (17.8), as t j 0 one has by a Taylor expansion, (Tr9)(x) = g(xci)) + g'(xU))Ex(J F(X) ds + Do B `) 0 r
+ Zg"(x ( ' ) )E.
2
F(X) ds + Do B, + o(t) 0
= g(x1) + g '( x U))F( x ') + zg"(x" ) )Dot + o(t)
(17.42)
where g', g" are the first and second derivatives of g. It follows that
a
( at
T,9(x) I
= F(x')g'(x") + ZDo g"(x" i ^).
(17.43)
/t_o
But the right side is Ag, where A is as in (17.3). Therefore, the infinitesimal generator of {X,} agrees with that of {Z I } for functions depending only on the first coordinate. In particular, the backward equation for T, g becomes
a T,g(x) = AT,g(x), at
(17.44)
so that, by the uniqueness of the solution to the initial-value problem, E.g(X) _ (T,g)(x) = E(g(Z{' ) ) I Z o = x). Thus, the conditional distribution of Xf ", given X 0 = x, is the same as that of Z,1 1) , given Z o = x. This completes the proof of (17.8). It may be noted that, although kinetic theoretic arguments given earlier provide a justification for the validity of the Fokker—Planck equations (17.5) and (17.6) (with c(t; y) replacing p(t; x, y)), they are not needed for the analysis of the asymptotics of c(t; y) carried out above. We have simply given a probabilistic proof, using the central limit theorem for Markov processes, of an important analytical result.
EXERCISES Exercises for Section V.1
Check (1.2) for (i) a Brownian motion with drift p and diffusion coefficient a Z > 0, and (ii) the Ornstein—Uhlenbeck process.
476
BROWNIAN MOTION AND DIFFUSIONS
2. Show that if (1.2) holds then so does (1.2)' for every e > 0. 3. Suppose that {X,} is a Markov process on S = (a, b), and (p is a continuous strictly monotone function on (a, b) onto (c, d). (i) Prove that {ç(XX )} is a Markov process. (ii) Compute the transition probability density for the process {X,:=exp{B,}} in Example 1. 4. Consider the Ornstein-Uhlenbeck velocity process { Y,} starting at V o = v. Let y = ß/m (Example 2). (i) Show that az Cov(V, I;+) = — (1 - e - ' 7 )e - Y5 , Y
s > 0.
(ii) Calculate the limit of the transition probability density p(t; x, y) in (1.6), as t--. oo,if(a)y<0,or(b)y>0. (iii) Define the Ornstein-Uhlenbeck position process by X, = x + f o V. ds, t -> 0. (a) Show that {X} is a Gaussian process. (b) Show EX, = x + vy '(1 - e Y`). (c) Show
z
z
z
VarX,=a t-3v +v
Y
z
2 Y'
4e-rr-1e-zyr
2Y z Y
Y
(d) Show z COV(Xs , Xs+t - Xs) =
(1 - e)(1 - e) 2 . Y
(iv) Explain why {X,} is not a Markov process. (v) Write y = ßm 'and let a z = ya for some aö > 0. Use (iii) to show that as y -+ co the process {XX } converges to a Brownian motion with zero drift and diffusion coefficient a. 5. In Example 3, take {X,} to be an Ornstein-Uhlenbeck process and f(t) = ct (c ^ 0). Compute the transition probability density of {Z, = X, + ct}. 6. Let v e R', v 0 0, be an arbitrary (nonrandom) constant. Define a deterministic motion by XX = X0 + vt, t > 0, i.e., randomness only in the initial distribution of X 0 . Show that {X,} is a Markov process having continuous sample paths and satisfying (1.2) but with o z = 0. Does the transition probability distribution have a density?
Exercises for Section V.2 1. If fa f(x)h l (x) dx = J f(x)h z (x) dx for all twice continuously differentiable functions f vanishing outside some closed, bounded subinterval of (a, b), then prove that h l (x) = h 2 (x) outside a set of Lebesgue measure zero.
2. Show that the derivations of (2.13) and of the backward equation (2.15) do not require the assumption of the existence of a density of the transition probability.
EXERCISES
477
3. Let f be a twice continuously differentiable function on [c, d]. Show that given &> 0 there exists a twice-differentiable g on l' such that g = f on [c, d] and g vanishes outside [c - e, d + J.
4. (i) Show that (1.2) in Section 1 holds uniformly for all x in the following cases: (a)
U(x)
=0,a 2 (x) =a 2 .
(b) u(x) = µ, a 2 (x) = a 2
(ii) Show that, for the Ornstein-Uhlenbeck process, PP (IX, - xI > r) = o(t) does not hold uniformly in x.
5. Let p(t; x, y) be the transition probability density of a diffusion on V8' and A a real number. Write q(t; x, y) exp{ - At} p(t; x, y). (i) Show that q satisfies the Chapman-Kolmogorov equation (2.2), and the backward and forward equations at = µ(x)
ax —— + Za2(x) ax
2q,
- ay a (u(y)q) + äya (za (y)q) - Aq. z
aq
at
2
2
(ii) (Initial-Value Problem) Let f be a bounded continuous function and define T, f := f f(y)q(t; x, y) dy. Show that the function u(t, x):= (T, f)(x) solves the initial-value problem z au au —=p(x)tx +za2(x)axz-Au at
(xEf8',t>0),
lim u(t, x) = f(x). rlo (iii) (Killing at a Constant Rate) Let {X,} be a diffusion with transition probability density p, and an exponentially distributed random variable independent of {X,} and having parameter A. Then q may be interpreted as the (defective) transition probability density of the process {X,} killed at time . More specifically, show that the function u in (ii) may be represented as E(f(X,) 1 (Z>l) 1 Xo = x). (iv) (Killing) Let ).(x) be a continuous, nonnegative function on R'. Define the operators T, acting on bounded continuous functions f, by (T,f)(x)'= E(.i(X,) expS
l
-
J o A(X,) dsl t
I
Xo = x)
where {X} is a diffusion on W. Show that the operators T, have the semigroup property, and that u(t, x):= (T, f)(x) solves the initial-value problem in (ii) with
A= (v) Note that (T, f)(x) is finite (indeed, T, f is bounded if f is bounded), if A(x) >_ 0. One may also express T, f as
(T1./ )(x) = E(.f(X,) 1 (4,X >> n),
478
BROWNIAN MOTION AND DIFFUSIONS
where conditionally given the sample path {X s : s >_ 0}, the killing time ^ (X , E has the (nonhomogeneous) exponential distribution P(i;^ x , } > t {Xs : s > 0}) = exp
t— £
2(XS ) ds } . )
The killing times ^ fx . ) may take the value + co with positive probability. If {X,} is defined on a probability space of trajectories (52,,.F 1 , P,), then show that by enlarging the space appropriately one may define both {X,} and on a common probability space (52,.x, P). [Hint: First construct the product space (Q 2 , .Fz , Pz ) for an independent family of random variables { } indexed by the set S2, of all trajectories co,. That is, 0 2 = X w En , Icu,, where lo = [0, oo] for all w l E Q. Then take the product space (S2 = X S 2 Z, = 1 OO Wiz, P = P 1 X P2).]
(vi) (Feynman—Kac Formula) If A(x) is bounded below and continuous, and not necessarily nonnegative, show that T, f given in (iv) is well defined, defines a semigroup, and solves the initial-value problem (ii) with A = A(x). (Adjoint Diffusions) Let p(t; x, y) denote the transition probability density of a diffusion on F1' with coefficients satisfying Condition (1.1). In addition, assume that
z 2dxz
(o2(x))—d µ(x)=0,
(xeR').
(i) Show that P(t; x, y):= p(t; y, x) is a transition probability density of a diffusion on F1' and compute its drift and diffusion coefficients. What is the forward equation for p? (ii) (Time Reversibility) Let {X,} and {Y} be diffusions with transition probability densities p and p" as above, with X o =_ x, Yo = y. Prove that for arbitrary time points 0 < t, < • • • < t,, < t the conditional p.d.f. of (X,,, X,,, ... , X,„) given X, = y, evaluated at (x,, x 2 ..... x„), is the same as the conditional p.d.f. of (Y,„, Y,, .. , Y) given Y, = x, evaluated at (x,,, x„_,, .. , x,). Thus, the sample path of one process over any finite time period cannot be distinguished probabilistically from that of the other with the direction of time reversed. 7. (Sources and Sinks) Consider the Fokker—Planck equation with a source (or sink) term h(t, y), öc(t, at
ö a2 (u(Y)c(t, Y)) + i (ia z (Y)c(t, y)). + h(t, y), —
Y
where h(t, y) is a bounded continuous function of t and y, and f Ih(t, x)Ip(t; x, y) dx < oo. One may interpret h(t, y) (in case h _> 0) as the rate at which new solute particles are created at y at time t, providing an additional source for the change in concentration with time other than the flux. The contribution of the initial concentration c o to the concentration at y at time t is c, (t, Y)'=
co(x)p(t; x, Y) dx. J
EXERCISES
479
The contribution due to the h(s, x) Ax As new particles created in the region [x, x + ix] during the time [s, s + As] is h(s, x)p(t — s; x, y) As Ax (0 _< s < t). Integrating, the overall contribution from this to the concentration at y at time t is c 2 (t, y) =
J
f. gal
h(s, x) p(t — s; x, y) dx ds.
'
(i) (Duhamel's Principle) Show that c, satisfies the homogeneous Fokker—Planck equation with h = 0 and initial concentration c o , while c 2 satisfies the nonhomogeneous Fokker—Planck equation above with initial concentration zero. Hence c := c, + c 2 solves the nonhomogeneous Fokker—Planck equation above with initial concentration c o . Assume here that c o (x) is integrable, and lim
J h(t —
s, x)p(s; x, y) ds = h(t, x).
ajo PU
Show that this last condition is satisfied, for example, under the hypothesis of Exercise 6. *Can you make use of Exercise 5 to verify it more generally? (ii) Apply (i) to the case µ(x), a z (x), h(t, x) are constants. (iii) Solve the corresponding initial-value problem for the backward equation.
Exercises for Section
V.3
1. Derive (3.21) using (3.15) and Proposition 3.1. 2. Show that the process {Z,} of Example 4 does not have a finite first moment. 3. By a change of variables directly derive the backward equation satisfied by the transition probability density p of {Z,} := {4p(X,)}, using the corresponding equation for p. From this read off the drift and diffusion coefficients of {Z, }. 4. (Natural Scale for Diffusions) For given drift and diffusion coefficients p(.), a'(.) >0 on S = (a, 6), define (Y 1(x z):= ^ 2, u ) d Y
with the usual convention !(z, x) define the scale function
_
x 62 (y)
—1(x, z) for z < x. Fix an arbitrary x o e S, and
J
s(x)= exp{ —1(x 0 , z)} dz, x°
again following the convention fx° = —J° for x < x o . If {X,} is a diffusion on S with coefficients p(•) a 2 (.), find the drift and diffusion coefficients of {s(X,)} (see Section 9 for a justification of the nomenclature).
5. Use Corollary 2.4 to prove that the following stronger version of the last relation
480
BROWNIAN MOTION AND DIFFUSIONS
in (1.2)' holds: for each x e 11' and each c> 0, P.( max IX,-xl>E^=o(h)
ashj0.
o
[Hint: Let f be twice continuously differentiable, such that (a) f (y) = 2 for Ix — yj > 1, (b) 1 <, f (y) <- 2 for Iy — xl > e, (c) f (y) = Iy — xl'/E 3 for ly — xl e. Then
P.( max JX,—xl>s)^P, max f(X,)>-1^ Px ^ max
tf
(X,) —
ost-
J
t
Af(X,) ds}
0
2
if h is chosen so small that max{IAf (y)I: ye 11'} -< 1/2h. By the maximal inequality (see Chapter I, Theorem 13.6), the last probability is less than
EXI f(Xh) —
f
Af (XS ) ds) <- 2E x f 2 (Xh ) + O(h 2 ).
.'
Now show that
I3
EXf 2 (Xh) 4 PX(IXh — xl > e) + E x IX'^ — x I(lXh_XI F) = o(h),
by (1.2)' and the lemma in Section 3. 6. Show that the geometric Brownian motion {X,} is a process with independent ratios (X,,/X, a ), ... , (X,.jX,), for 0 = t o < t l < • • • < t,,, k >, 2. In particular, at any time scale, t k — kA, k = 0, 1, 2, ... , the values X,,,, X,, ... satisfy the law of proportionate effect: X,,, = L k (A)X„k , k = 0, 1, 2, ... , where L 0 (0), L 1 (i), ... are i.i.d. (load factors) positive random variables. 7. Let {B(t)} be a standard Brownian motion starting at 0. Let V, = e - 'aB(e 21 ), t > 0. Show that { Y,} is an Ornstein-Uhlenbeck process. Note continuity of sample paths from {B(t)}.
Exercises for Section V.4 1. Show that (4.4), (4.5) are solved by (4.2). 2. Derive (4.17). 3. (i) Using (4.17) derive the probability p , that, starting at x, {X,} ever reaches c. X
(ii) Find a necessary and sufficient condition for {X 1 } to be recurrent. 4. Apply the criterion in Exercise 3(ii) to the Ornstein-Uhlenbeck process. What if y—ß/m> 0 ? 5. (i) Using the results of Section 3 of Chapter III, give an informal derivation of a necessary and sufficient condition for the existence of a unique invariant probability distribution for {X,} (i.e., of p(t; x, y) dy).
481
EXERCISES
(ii) Compute this invariant density. (iii) Show that the Ornstein-Uhlenbeck process { V} has a unique invariant distribution and compute this distribution (called the Maxwell-Boltzmann: velocity distribution). (iv) Show that under the invariant (Maxwell-Boltzmann) distribution, az Cov,(V, , V +,) =
2- e
-
'S,
s > 0.
Y
(v) Calculate the average kinetic energy E(mV )of an Ornstein-Uhlenbeck particle under the invariant initial distribution. (vi) According to Bochner's Theorem (Chapter 0, Section 8), one can check that p(s) = Cov,,(V„ V + ,) in (iv) is the Fourier transform of a measure It (spectral distribution). Calculate p. 6. Briefly indicate how the Ornstein-Uhlenbeck process may be viewed as a limit of the Ehrenfest model (see Section 5 of Chapter I11) as the number of balls d goes to infinity. 7. Let s (x)
J
E
= pex ^-I 2ti(1)dy^dz .
xo a 2 (Y)
be the so-called scale function on S = (a, b), where x o is some point in S. If {X} is a diffusion on S with coefficients p(• ), a 2 (• ), then show that the diffusion { Y, := s(X,)} on S = (s(a), s(b)) has the property P({Y,} reaches c before dl Yo = y) =a —y c ,
y_
8. Derive the forward equation ap(t x, y) ;
at
ä ^2 = -^ ^(µ(Y)P(t; x, y)) + (ia2(Y)P(t; X, y)), yyz
(t >0, -cc < x, y < oo), along the lines of the derivation of the backward equation (4.14) given in this section. [Hint: p " +t = pap ] .
9. (Maxwell-Boltzmann Velocities) A monatomic gas consists of a large number N (- 10 23 ) molecules of identical masses m each. Label the velocity of the ith molecule by v, _ (v 31 _ 2 , v 3; 1 , v 3; ), for i = 1, 2, ... , N. To say that the gas is an ideal gas means that molecular interactions are ignored. In particular, the only energy represented in an ideal monatomic gas is the translational kinetic energy of individual molecular motions. The total energy is therefore 3N
'zmv; = zm IIvIIi,
T = T(v1, v2, ... , U3N) _
for v = (v1.... , V3N) E R 3N ,
j=1
where IlvlI2:=^; =Ni vj. Define equilibrium for a gas in a closed system of energy E
482
BROWNIAN MOTION AND DIFFUSIONS
to mean that the velocities are (purely random) uniformly distributed over the energy surface S given by the surface of the 3N-dimensional ball of radius R = (2E/m)" 2 , i.e.,
f
3N
S = ( VI, v2, •
2E
, V3N)• Y_vJ _ m i =1
One may also define temperature in proportion to E. (i) Show that the distribution of the jth component of velocity is given by P(a
b
Ja
[l — (x/R)2](3N- 3)/2 dx/^ R [1 — (x/R)2](3N- 3)12 dx. R
(ii) Calculate the limiting distribution as N —+ oo and E —+ cc such that the average energy per particle EIN (density) stabilizes to > 0. (iii) How do the above calculations change when the 1 2 -norm 1 2 defined above is replaced by the 1p norm defined by jIvIIP'= v, where p >_ 1 is fixed? (This interesting generalization was brought to our attention by Professor Svetlozar T. Rachev.)
Exercises for Section V.5 1. Extend the argument given in Example 1 to compute the transition probability density of a Brownian motion with drift p and diffusion coefficient a 2 > 0. 2. Use the method of Example 2 to compute the transition probability density of a diffusion with drift u(x) = —yx — g and diffusion coefficient a 2 > 0. 3. (i) Use Kolmogorov's backward equation and the Fourier transform to solve the initial-value problem
öu ^= x (
)
Z
=za axe+µöx
(t>0,—oo<x
lim u(t, x) = f (x), rlo
where f is a bounded and continuous function on (—oo, oo). (ii) Use (i) to compute the transition probability density of a Brownian motion. 4. In Exercise 3 take the initial-value problem to be the one corresponding to the Ornstein—Uhlenbeck process. Exercises for Section V.6 1. Use (1.2)' to show that the diffusions { Y, := — X,} and {X} have the same drift and diffusion coefficients, if (6.7), (6.8) hold. 2. Use (1.2)' to show that the diffusions {Y:= X, — md} (m = 0, ± 1, ...) have the same drift and diffusion coefficients, provided u(•) and v 2 (•) are periodic with period d. 3. Express the transition probability density of the Markov process {Z 1 } in Theorem 6.4 in terms of that of {X,}.
483
EXERCISES
4. Using the construction of reflecting diffusions on [0, oo), [0, 1], construct reflecting diffusions on [a, oo), (— co, a], [a, b] for arbitrary a < b. What are the analogs of (6.7), (6.8) and (6.36)- (6.38) in these cases? Express the transition probability densities for these reflecting diffusions in terms of appropriate unrestricted diffusions. 5. Suppose that cp is a measurable map on an interval S onto an interval S'. Suppose that {X} is a Markov process having a homogeneous transition probability p. If T,(f o ^p)(x) := E[ f ((p(X,)) I X 0 = x] is a function v 1 (t, p(x)) of qp(x), for every real-valued bounded measurable function f on S. then prove that { Y, := cp(X,)} is a homogeneous Markov process on S'. 6. Give a derivation of the forward equation (6.17). 7. Prove that the transition probability density q(t; x, y) of {Z} in Theorem 6.4 satisfies (aq/ax)x -1 = 0.
8. (i) Assume that p(t; x, y) >0 for all t > 0, and x, y e (— oo, oo) for every diffusion whose drift and diffusion coefficients satisfy Condition (1.1). Prove that the Markov process {Z,} on the circle in Theorem 6.2 has a unique invariant probability m(y) dy, and that the transition probability density q(t; x, y) of {Z,} converges to m(y) in the sense that
J
jq(t;x,y)
—
m(y)ldy - + 0 ast-+oo,
s
the convergence being exponentially fast and uniform in x. (ii) Compute m(y). [Hint: Differentiate $ T, f (x)m(x) dx = Sf(x)m(x) dx with respect to t, to get (A f)(x)m(x) dx = 0 for all twice-differentiable f on S with compact support, so that A *m = 0.]
J),
9. Derive the forward boundary conditions for a reflecting diffusion on [0, 1]. 10. Suppose (a) {X} and { —X,} have the same transition probability density, and (b) { 1 + X} and { l — X,} have the same transition probability density. Show that the drift and diffusion coefficients of {X} must be periodic with period 2.
Exercises for Section V.7
1. Use the Radon-Nikodym Theorem (Section 4 of Chapter 0) to prove the existence of a nonnegative function y -+ p (t; x, y) such that f, p (t; x, y) dy = p(t; x, B) for every Borel subset B of (a, oo). Show that p (t; x, y) _< p(t; x, y) where p is the transition probability density of the unrestricted process {X, }.
°
°
°
2. Let T, (t > 0) be the semigroup of transition operators for the Markov process {Xj in Theorem 7.1, i.e.,
f(y)P(t; x, dy)
(T,f)(x) = Exf(Xr) = fla,
for all bounded continuous f on [a, oo).
484
BROWNIAN MOTION AND DIFFUSIONS
(i) If f(a) = 0, show that (a) (T^f)(x) =
f(
°
(x > a),
f(Y)P (t; x, Y) dY a.m)
(b) (T, f)(x) -+ 0 as x j a. [Hint: Use (7.11).] (ii) Use (i) to show that T° (t > 0), defined by
(T°.f )(x);=
f(
f(Y)P°(t; x, y) dY, a,
)
is a semigroup of operators on the class C of all bounded continuous functions on [a, oo) vanishing at a. Thus T° is the restriction of T, to C (and T°C c C). (iii) Let A ° denote the infinitesimal generator of T°, i.e., the restriction of the infinitesimal generator of T, to C. Give at least a rough argument to show that (7.10) holds (along with (7.11)). 3. Derive (7.13) and (7.14). 4. (Method of Images) Consider the drift and diffusion coefficients p(.), a 2 (•) defined on [0, oo). Assume u(0) = 0. Extend p(.), a 2 (•) to 01' as in (6.8) by setting ,u(—x) = —µ(x), a'(—x) = a 2 (x) for x > 0. Let p ° denote the (defective) transition probability density on [0, oo) (extended by continuity to 0) defined by (7.9). Let p denote the transition probability density for a diffusion on R' having the extended coefficients above. Show that P ° (t; x, y) = p(t; x, y) — p(t; x, — y) = p(t; x, y) — p(t; —x, y), (t > 0;
x, y >- 0).
[Hint: Show that p ° satisfies the Chapman-Kolmogorov equation (2.30) on S = [0, oo), the backward equation (7.10), the boundary condition (7.11), and the initial condition p ° (t; x, y) dy -* 5(dy) as t j 0 (i.e., (T° f)(x) -• f (x) as t j 0 for every bounded continuous function f on [0, oo) vanishing at 0). Then appeal to the uniqueness of such a solution (see theoretical complements 2.3, 8.1). An alternative probabilistic derivation is given in Exercise 11.11.] 5. (Method of Images) Let u(.), a 2 (.) be drift and diffusion coefficients defined on [0, d] with p(0) = µ(d) = 0, as in (6.36). Extend p(.), a 2 (.) to R' as in (6.37) and (6.38) (with d in place of 1). Let p denote the transition probability density of a diffusion on W having these extended coefficients. Define q(t;
x,
y):=
I p(t; x, y + 2md),
—d < x, y < d,
m=—m
P
° (t; x, Y)'= q(t; x, Y) — q(t; x, — y),
0 5 x, y < d.
(i) Prove that p ° is the density component of the transition probability of the diffusion on [0, d] absorbed at the boundary {0, d}. [Hint: Check (2.30). (7.10), and (7.11) on [0, d], as well as the initial condition: (T° f)(x) -+ f (x) as t 10 for every continuous f on [0, d] with f(0) = f(d) = 0.] (ii) For the special case j(.) = 0, a 2 (.) = a 2 compute p°.
485
EXERCISES 6. (Duhamel's Principle) (i) (Nonhomogeneous Equation) Solve the initial-boundary-value problem a at
u(t, x) = u(x)
öu -x +
za2(x)
tazu x), c3x2 + h(t,
lim u(t, x) = 0,
lim u(t, x) = 0,
xja
zTb
(t > 0, a <x < b); (t > 0, a -< x -< b);
(a -<x-
tim u(t,x)= f(x), (40
where h is a bounded continuous function on [0, cc) x [a, b], and f is a bounded continuous function on [a, b] vanishing at a, b. [Hint: Let p ° be as in (7.17). Define u0(t, x)'=
f. J '
h(s, y)p ° (t — s; x, y) dy,
ui(t; x) =
[a,b]
ft
f(y)p ° (t; x,
v) dy.
a,b]
°
Check that u satisfies the nonhomogeneous equation and the given boundary conditions, but has zero initial value. Then check that u, satisfies the homogeneous differential equation (i.e., with h = 0), the given boundary and initial conditions.] (ii) (Nonzero Constant Boundary Values) Solve the initial-boundary-value problem z
at
u(t, x) =
axz
(t > 0; a <x < b);
u(t, x),
lim u(t, x) = a,
lim u(t, x) = ß;
:la
xTb
lim u(t, x) = f(x), t;
o
where f is a bounded continuous function on [a, b] with 1(a) = a, f (b) = ß. [Hint: Let u (x) satisfy the differential equation and the boundary conditions. Find u,(t, x) which satisfies the differential equation, zero boundary conditions, and the initial condition: lim a ° u, (t, x) = f (x) — u ° (x). Then consider u ° (x) + u,(t, x).] (iii) (Time Dependent Boundary Values) In (ii) take a = a(t), ß = ß(t) where a(t), ß(t) are continuous differentiable functions on [0, x). [Hint: Let u ( t, x):= a(t) + ((ß(t) — a(t))/(b — a))(x — a). Find u,(t, x) solving the nonhomogeneous equation:
°
°
au, = _a u 1 _ a u0 1Q2
(3t
_
z öx z at
with zero boundary conditions and initial condition lim, ° u, (t, x) = .f(x) — uo(0, x)•] ]
7. (Maximum Principle) (i) Let u(t, x) be twice-differentiable with respect to x and once with respect to t on
486
BROWNIAN MOTION AND DIFFUSIONS
(0, T] x (a, b), continuous on [0, T] x [a, b], and satisfy (au/at) — Au <0 on (0, T] x (a, b), where A = o 2 (x) d 2 /dx 2 + p(x) d/dx, with a 2 (x) > 0. Show that the maximum value of u is attained either (a) initially, i.e., at a point (0, x o ), or (b) at a point (t, a) or (t, b) for some t e (0, T]. [Hint: Suppose not. Let the maximum value be attained at a point (t o , x 0 ) with 0 < t o T, and x o e (a, b). Then (Au)(t o , x o ) 0, so that (öu/öt)(t o , x o ) < 0. But this means u(t, x o ) > u(t o , x o ) for some t < t o ]. (ii) Extend (i) to the case au/at — Au < 0. [Hint: Let 0 be a continuous function on [a, b] satisfying AB = 1. For each r > 0, consider u,(t, u) = u(t, x) + eO(x).] (iii) Let u(x) be twice-differentiable on (a, b), continuous on [a, b] and satisfy (Au)(x) > 0 for a <x < b. Show that the maximum value of u is attained at x = a or x = b. [Hint: First assume Au > 0; if the maximum value is attained at x o e (a, b), then Au(x o ) < 0.] (iv) Prove that the solutions to the initial-boundary-value problems in Exercise 6 are unique.
_<
_<
Exercises for Section V.8 1. Prove that (8.3) holds under Dirichlet or Neumann boundary conditions at the end points of a finite interval S.
_<
2. Prove that (i) if a is an eigenvalue of A (with Dirichlet or Neumann boundary conditions at the end points of S), then a 0; (ii) if a,, a 2 are two distinct eigenvalues and ei l 0 2 corresponding eigenfunctions, then Ali, and i' 2 are orthogonal, i.e., = 0; (iii) if 0,, 0 2 are linearly independent, then 4,, ¢ 2 — («,, 02> /II¢1Il 2 )O, are orthogonal; if 0 1 , 0 2 , ... , 0, are linearly independent, and ¢,, ... , O k _, are orthogonal to each other, then 0„ ... , ¢ k_ 1, ¢k — ( Z,-i «;l 4,k) /11c6;• II 2 )4 . are orthogonal. ,
,
3. Assume that the nonzero eigenvalues of A are bounded away from zero (see theoretical complement 2). (i) In the case of two reflecting boundaries prove that ,im p(t; x, Y) = m(Y)
(x, Y e [ 0 ,
d])
where m(y) is the normalization of n(y) in (8.1). Show that this convergence is exponentially fast, uniformly in x, y. (ii) Prove that m(y) dy is the unique invariant probability for p. (iii) Prove (ii) without the assumption concerning eigenvalues. 4. By a change of scale in Example 3 compute the transition probability of a Brownian motion on [a, b] with both boundaries absorbing. 5. Use Example 4, and a change of scale, to find (i) the distribution of the minimum m, of a Brownian motion {X,} over the time interval [0, t]; (ii) the distribution of the maximum M, of {X s } over [0, t]; (iii) the joint distribution of (X„ m,).
EXERCISES
487
6. Use Example 3, and change of scale, to compute the joint distribution of (X„ m, M,), using the notation of Exercise 5 above. 7. For a Brownian motion with drift p and diffusion coefficient a 2 > 0, compute the p.d.f. of T„ A r b when X0 = x and a <x < b. 8. Establish the identities
L [
(na 2t) z ' Z m
p (
-
(2md +y -x) Z l
ex jl 3 = _ x 2azt
1 2 - + - ex
= (2na 2 t) - '/ 2
L
m=1
+ ex
p(
(2md +Y +x)' ll
Sl
z
-
cos
- -- -->
p^-
)7J
2621
— cos
m x / \i
\-
exp{-(2md + y —Y^ - exp^-(2md + y + x) 2a2t 2a 2 1 ) -^1 x,
2
_-- Z exp -
a 2 rr 2 m 2 1
m rzy
sin ( rnnx - si n -
(0 x, y d).
)
d m = 1 2d2
d d
Use these to derive Jacobi's identity for the theta function, .
©(z):=
°^
exp{
-nm z} _—1 2
m - m _
[Hint:
exp{
(il 1 -nmz /z} _ --O` f \z /
(z > 0).
^Z
^z m s,
Compare (8.23) with (6.31), and (8.33) with Exercise 7.5(ii).]
9. (Hermite Polynomials and the Ornstein-Uhlenbeck Process) Consider the generator A = ' d 2 /dx 2 - x d/dx on the state space W. (i) Prove that A is symmetric on an appropriate dense subspace of L2 (!8', e - x 2 dx). (ii) Check that A has eigenfunctions H"(x)'= ( -1)" exp {x 2 }(d "/dx") exp{ -x 2 }, the so-called Hermite polynomials, with corresponding eigenvalues n = 0, 1, 2, ... . (iii) Give some justification for the expansion x
p(t; x, y) = e
2
Z c"e
t
"'H"(x)H"(y),
c "'=
10. According to the theory of Fourier series, the functions cos nx (n = 0, 1, 2, ...), sin nx (n = 1, 2, ...) form a complete orthogonal system in L2 [-n, 7r]. Use this to prove the following: (i) The functions cos nx (n = 0, 1, 2, ...) form a complete orthogonal sequence in L2 [0, it]. [Hint: Let f E L2 [0, n]. Make an even extension of f to [ -n, n], and show that this f may be expanded in L2 [-n, it] in terms ofcos nx (n = 0, 1, ...).] (ii) The functions sin x (n = 1, 2, ...) form a complete orthogonal sequence in L2 [0, it]. [Hint: Extend f to [-it, n] bysetting,f( -x) = - f(x)forx e [ -n, 0).]
Exercises for Section V.9 1. Check (9.13). 2. Derive (9.20) by solving the appropriate two-point boundary-value problem.
488
BROWNIAN MOTION AND DIFFUSIONS
3. Suppose S = [a, b) and a is reflecting. Show that the diffusion is recurrent if and only if s(b) = cc. If, on the other hand, s(b) < oo then one has p s,. = 1 for y> x and
s(b) — s(x) _
Px r
=
Sib) — s(Y)
J
n exp{ — I(x o , z)} dz
x
----,
s
y < x.
exp{ — I(x o , z)} dz
[Hint: For y > x, assume the fact that the transition probability density p(t; x, y) is strictly positive and continuous in x, y for each t> 0, to show that b := min{P= (X o y): a -< z -< y} > 0 for t o > 0.]
4. Suppose S = [a, b) and a is absorbing. Then show that no state, other than a, is recurrent and that s(x) —s(a)
ifa<x^y,
s(y) — s(a)
Px y =
ifs(b)=oo,anda<-y<x,
I s(b) — s(x)
ifs(b)< cc, and a
s(b) — s(y)
5. Suppose S = [a, b] with both boundaries reflecting. Then show that p x,, = 1 for all x, y E S. and the diffusion is recurrent. 6. Apply Corollary 9.3 and Exercise 3 to decide which of the following diffusions are recurrent and which are transient: (i) S=(—ac,00),µ(x)=p960,a z (x)=a 2 >0; (ii) S=(— cc, cc),p(x)=0,a 2 (x)=a z >0;
(iii) S = (—cc, cc), µ(x) = —ßx, a 2 (x) = a 2 >0 (consider separately, ß>0, ß < 0); (iv) S = [0, cc), p(x) - p, a 2 (x) = a 2 > 0, 0 reflecting (consider separately the cases P< 0 ,P= 0 ,u>0); (v) S=(0,1),u(x)--* oo as x10,p(x)-+ —oo asxji,a 2 (x)>a 2 >0 for all x. 7. Suppose S = [a, b] with a absorbing and are transient, p xy = 1 for y < x, and
b
reflecting. Show that all states but a
s(x) — s(a)
Px v
= s(y)
— s(a),
Y > x.
8. Suppose S = [a, b] with both boundaries absorbing. Show that all states, other than a and b, are transient and s(b)—s(x)
s(b) — s(y) Pz y =
s(x) — s(a) s(y)—s(a)
ayx
EXERCISES
489
9. Let [c, d] c S. Solve the two-point boundary-value problem: A f (x) = 0 for c <x < d, lim a 1 , f (x) = a, lim X1 d f (x) = ß, where a, ß are given numbers. Show that the solution equals aP (r < < r d ) + ßPx (r a < t a ). P
Exercises for Section V.10 1. (i) Assuming that the transition probability density p(t; x, y) is positive and continuous in x, y for each t > 0, show that M(x), defined by (10.3), is bounded on [c, d]. [Hint: Use Proposition 13.5 of Chapter I.] (ii) Let T be as in (10.2), x e (c, d). Assume that
Px (max (XX —xj>a =o(h) 0
ash10,
/
for every e > 0 (this is proved in Exercise 3.5). Show that E (1,,, h } M(Xh )) = o(h) as h l 0. Check the assumption in the case of constant coefficients. X
2. Show that (10.15) holds if s(b) = oo. 3. Assume s(b) = oc. (i) Show that (10.16) is necessary and sufficient for finiteness of E x r, (c < x). (ii) Prove (10.17), under the assumption (10.16). 4. Suppose s(a) = —cc. (i) Prove that, for x < d, E,t d (ii) Derive (10.19).
<
cc if and only if (10.18) holds.
5. (i) Suppose S = [a, b) with a a reflecting boundary. Show that the diffusion is positive recurrent if and only if s(b) = oo and m(b) < co. [Hint: Assume that the transition probability density p(t; x, y) is positive and continuous in x, y for each t > 0, and proceed as in Exercise 9.3, or use Proposition 13.5 of Chapter I.] Show that (10.19) holds in case of positive recurrence. (ii) Suppose S = [a, b] with a and b reflecting boundaries. Show that the diffusion is positive recurrent. Show that (10.17) and (10.19) hold. 6. Apply Proposition 10.2 and Exercise 5 above (as well as Exercise 9.6) to classify the diffusions in Exercise 9.6 into transient, null recurrent and positive recurrent ones. 7. Let [c, d] c S. Solve the two-point boundary-value problem Af (x) = —g(x) for c <x < d, lim, 1 c f (x) = a, lim x 1 d f (x) = ß, where g is a given bounded measurable (or, continuous) function on [c, d] and a, ß are given constants. Show that the solution represents
f
OT, n za
Ex
g(X,) ds + aPa(t< < t,.) +
>_
ßPx(Td < t,)•
8. Show that, under natural scale (i.e., s(x) = x), m'1 (x) M I (x) -> M 2 (x) for all x.
m'2 (x) for all x implies
490
BROWNIAN MOTION AND DIFFUSIONS
Exercises for Section V.11 1. Prove that t, r, A r d are stopping times, by showing that they are measurable with respect to . O ' t^ A Zd' respectively. 2. Prove that T 1 A 1 2 is a stopping time, if z and t Z are. 3. (i) Suppose that E(Zh(XX+t,, .
.. ' Xi+,,)lfr<^)) =
E(Z[E, h(X ;, .... X,;%=x.11=<^))
for all Z of the form (11.7), m being arbitrary and f bounded continuous. Prove that (11.12) holds. [Hint: Use (a) the fact that bounded continuous functions on S"' are dense in O(S', y) where u is any probability measure, (b) Dynkin's Pi-Lambda Theorem (Section 4 of Chapter 0).] (ii) Prove that if (11.12) holds for arbitrary finite sets t, < t2 < ... < t; and bounded continuous h, then the conditional distribution of X, given .F, is Px , on the set {r < co}.
4. Prove that q(y) in (11.18) is continuous for every bounded continuous h on S', if x —• p(t; x, dy) is weakly continuous (see Section 5 of Chapter 0, for the definition of weak convergence). This weak continuity, also called the (weak) Feller property, is proved in Section 3 of Chapter VII. 5. Let {XX } be a Brownian motion with drift p and diffusion coefficient a 2 > 0, starting at x.
(i) Prove that the conditional distribution of {X; 0 _< u <_ t}, given X, does not depend on p. [Hint: Look at finite-dimensional conditional distributions.] (ii) Use (i) and (11.42) to compute the joint p.d.f. of (X„ M,), where MM = max{X,,: 0 < u < t}. (iii) Use (ii) to compute the distributions of (a) M, and (b) r a (a > x). (iv) Check that the conditional distribution in (i), given X, = y, is the same as the distribution of the process {_o_ (y—x))+x:Ou<'t1
1
where { }.: u >, 0} is a Brownian motion with zero drift and diffusion coefficient a 2 , starting at zero. 6. Assume the result of Example 8.3, for the case p = 0. Show how you can use Exercise 5(i) to derive the joint distribution of (Xe , m, M,) where {X,), M, are as in Exercise 5(i), (ii), and m, = min{X: 0 _< u _< t}. 7. Let {X} be a Brownian motion with drift zero and diffusion coefficient az > 0, starting at x. Let t =; A r d , where c <x < d. Show that the after-r process X, is not independent of the pre-t sigmafield. 8. Let {X} be a Brownian motion with drift p > 0 and diffusion coefficient a 2 > 0, starting at zero. (i) Prove that {z Q : a - 0} is a process with independent and homogeneous increments. (ii) Find the distribution of r q . (iii) What can you say concerning (i), (ii), if p < 0?
EXERCISES
491
9. Show that the proof of the strong Markov property (Theorem 11.1) essentially applies (indeed, more simply) to the countable state space case (see Section 5 of Chapter IV). 10. (i) Use Exercise 5(ii) to derive the transition probability density (component) of a Brownian motion on (— oo, 0] with absorption at 0. (ii) Derive the corresponding transition density on [0, co) with absorption at 0. (iii) Use Exercise 6 to derive the transition probability density (component) of a Brownian motion on [0, 1] with absorption at both ends. 11. (Method of Images) Consider the drift and diffusion coefficient g(.), c 2 (•) defined on (— oo, 0]. Assume p(0) = 0. Extend p(.)‚o 2 (.) to R' as in (6.8) by setting µ( —x) = —g(x), Q 2 (— x) = a 2 (x) for x < 0. Let {X,} be a diffusion on R' with these coefficients, starting at x < 0, and let z o denote the first passage time to 0. Let { Y,} be the process defined by _ X, for t < T o
Y —
X, for
>, t o .
(i) Show that {Y,} has the same distribution as (ii) Show that Px (M, -> 0) = 2P,,(X, > 0), where M, := max{Xs : 0 -< s -< t }. [Hint: Show that (11.34)— (11.36) hold with 0 replaced by x and a replaced by 0.] (iii) Show that Px (XX -< y, t o > t) = PX (X, >, —y) for ally < 0. [Hint: Use the analog of (11.35).] (iv) Deduce from (iii) that
P.(X,
forallx<0,y-<0.
(v) Derive the analog of (iv) for a diffusion on [0, co).
Exercises for Section V.12 1. Use the strong Markov property to prove (12.10) assuming that the diffusion is recurrent and that (12.11) holds. 2. Assume that the diffusion is positive recurrent. (i) Prove that (12.14) defines a probability measure n(dx), i.e., prove countable additivity. [Hint: Use Monotone Convergence Theorem.] (ii) Prove (12.15) for all f satisfying (12.11). 3. Prove the uniqueness of the invariant probability in Theorem 12.2. 4. (i) Prove (12.24). (ii) Use the strong Markov property to prove (12.25). 5. Under what assumptions on p (and op/at) can you justify the equalities in (12.27)? 6. (i) Prove that the diffusion in Example I is positive recurrent. (ii) Let — oo
492
BROWNIAN MOTION AND DIFFUSIONS
7. Prove that the diffusion in Example 2 is positive recurrent under the assumptions (12.35).
Find the invariant distributions of the following diffusions: (i) S = (—cc, oo), µ(x) = —ßx (ß < 0), o 2 (x) = a 2 •
(ii) S = (0, 1], µ(x) = (k — 1)/x (k > 1), a 2 (x) = a 2 "1" a reflecting boundary. (iii) S = [0, CO), µ(x) _ —ßx (ß > 0), a 2 (x) = a 2 , "0" is a reflecting boundary. Exercises for Section V.13 1. Give a proof of Theorem 13.1 similar to the proof of Proposition .10.1 in Chapter II. (Also, see the proof of Proposition 12.1.) 2. Consider a diffusion {X1 } on S = (0, 1], with "1" a reflecting boundary and coefficients µ(x) = (k — 1)/x (k > 1), v 2 (x) - a 2 . Apply Theorem 13.1 to compute the asymptotic distribution of t-l/2 J t (f(XS )—Ej)ds
as tj oo,
0
where f (x) = 1 — x 2 . What is E n f ? [Hint: Remember the boundary condition for A.]
Exercises for Section V.14 Let {Xt(' 1 } (i = 1, 2, ... , k) be k independent diffusions on R 1 with drift coefficients u ( ' ) (•) and diffusion coefficients a().). (i) Show that {X 1 := (X,( ", .. , X; k) )} is a Markov process on 18 k , and express its transition probability density in terms of those of {X}' ) }. [Hint: Look at P({X s+ , e R} n F), where R is a k-dimensional rectangle (a,, b,) x • • • x (a k , b k ), and F = F, n F2 n • . n Fk with F; determined by {X'°: 0 < u < s} (1 < i < k).] (ii) Show that the transition probability density p(t; x, y) of {X,} satisfies Kolmogorov's backward and forward equations
1 Zk a Q , (x) a 2p + Yk- µu ^ (x'°) äp
äp
ax
at = ap
I Z
ax(i,'
" Z (U?(y Ii ')p) — e ^ ay^,^ (f^(Y' ° )p)•
at =2 i =, (ay )
(iii) Let p(x) = —yx, D(x) = a 2 I, with y > 0, a 2 > 0. Write down its transition probability density p(t; x, y), and check that it satisfies the Kolmogorov equations. Show p(t; x, y) converges to a Gaussian density as t —• cc, and deduce that the diffusion has a unique invariant distribution. 2. Write down the analogs of (1.2), (1.2)' for a multidimensional diffusion. 3. (i) Show that a k-dimensional Brownian motion with drift p = (p". .. , p" k ") and diffusion matrix D = ((d ;; )) is a process with independent increments, and that its transition probability density p(t; x, y) satisfies Kolmogorov's equations (14.6).
EXERCISES
493
(ii) More generally, let cp be a one-to-one twice continuously differentiable map on R' onto an open subset of R k . If {X} is a diffusion on R k with generator A given by (14.9), compute the generator of {Y,:= cp(X,)}. 4. Check that the proof of Theorem 11.1 (Strong Markov property) remains unchanged if the state space is taken to be It" or a subset of I". 5. Let {B,} be a k - dimensional standard Brownian motion, k >_ 3, starting at some Rk . (i) Let d> jxl. Prove that P(B,I > d for all sufficiently large t) = 1. [Hint: Fix d, > d. (a) P({B,} ever reaches {z: IzI = d, }) = 1, by the recurrence of one-dimensional
XE
Brownian motions. (b) Write r, := inf {t > 0: IB,l = d,}, r, :=inf{t> r,: B,l = d}, t 2n +1 ' =inf {t >'r 2n :IB,l = d, } z 2n := inf {t > T 2i _,: IB,l = d }.
By the strong Markov property and (14.26), P(r 2 " < oo) = (d/d,) (k-2 >", and P(T2 1 < oo I TZ" < (Xo) = 1. (c) By (b), P(T Z " = oo for some n) = 1.] (ii) From (i) conclude that P(IB,I-+coast-+ cc) =1. 6. Let {B,} be a two-dimensional standard Brownian motion. Let B = {z: Iz - z 0 1 with z o and E > 0 arbitrary. Prove that P(sup {t >_0:B,EB} = oo) = 1. 7. Let {X,} be a k - dimensional diffusion with periodic drift and diffusion coefficients, the period being d > 0 in each coordinate. Write Z;' := X;'°(mod d), Z, (Z t (I) , Z't1k) ) (i) Use Proposition 14.1 to show that {Z,} is a Markov process on the torus T = [0, d)" —which may be regarded as a Cartesian product of k circles. (ii) Assuming that the transition probability density p(t; x, y) of {X,} is continuous and positive for all t > 0, x, y, prove that the transition probability q(t; z, z') dz' of {Z,} admits a unique invariant probability n(z') dz' and that )
max jq(t; z, z') - ir(z')l z.z'e T
_<
[I -
bdk]ttt -1 0
where [t] is the integer part of t, b'= min{q(1;z, z'): z, z' e T} and
>_
0= max{q(l;y,z')—q(l;z,z'):y,z,z'c-T}. [Hint: Recall Exercise 6.9 of Chapter 11.] 8. Let {X,} be a k - dimensional diffusion, k 2. (i) Use Proposition 14.1 and computations such as (14.18) to find a necessary and
494
BROWNIAN MOTION AND DIFFUSIONS
sufficient condition (in terms of the coefficients of {X,}) for {R,:= IX,I} to be a Markov process. Compute the generator of {R,} if this condition is met. (ii) Assume that the sufficient condition in (i) is satisfied. Find necessary and sufficient conditions for {X} to be (a) recurrent, (b) transient. 9. Derive (14.14). 10. (i) Sketch an argument, similar to that in Section 2, to show that Corollary 2.4
holds for diffusions on 11" (under conditions (1)—(4) following Definition 14.1). (ii) Use (i) to prove Px (max IX,—xl>e^=o(h)
as h10.
O_
[Hint: See Exercise 3.5.] Exercises for Section V.15 1. Prove that the process {X,} defined by (15.1) is Markovian and has the transition probability (15.3). [Hint: Mimic the proof of Theorem 7.1.] 2. Specialize (15.23) to the case k = 1 to compute Px ({X} reaches c before d), where {X} is a one-dimensional diffusion and c 5 x <_ d. Check that this agrees with (9.19). 3. Apply (15.23) to compute Px ({B} reaches 3B(O:c) before aB(O:d)) where {B,} is a k-dimensional standard Brownian motion, k > 1, OB(O:r) = {y E l: IYI = r}, and c < Ixl _< d. Check that this computation agrees with (14.23). 4. Use Exercise 14.8(i) and (15.23) to compute Px ({X,} reaches äB(O:c) before öB(O:d)),
c 5 Ixl _< d,
for a k-dimensional diffusion (k > 1) {X,} such that the radial process {R,:= IX,I} is Markovian. 5. Consider a standard k-dimensional Brownian motion (k > 1) on G = {x E R k : 0 for 1 <, i 5 k} with absorption at the boundary. (i) Compute p ° (t; x, y) for t> 0; x, ye G). (ii) Compute the distribution of r ac , if the process starts at x e G. *(iii) Compute the hitting (or, harmonic) measure i/i(x, dy) on äG, for x e G. (1'(x, dy) is the Pr -distribution of X t G ). G. *(iv) Compute the transition probability P(t; x, B) for x e G, B 6. Do the analog of Exercise 5 for G = {x e I: 0 _< x '° _< 1 for 1 < i _< k}. (
7. Prove (15.27).
8. For G in Example I and Exercises 5 and 6, show that P P (T OG > t) = o(t) as t 10 if x e G.
9. Show that the function Ji f in (15.28) satisfies (15.23) with A = Z A ( A :=
k
02
i =1
(ax )
„^ z
EXERCISES
495
10. Let G be an open subset of Oa k . A Borel measurable function u on G, which is bounded on compacts, is said to have the mean value property in G if u(x) = f sk-, u(x + rO) do, for all x and r > 0 such that the closure of the ball B(x:r) with center x and radius r is contained in G. Here S" - ' is the unit sphere {lyj = l} and dO denotes integration with respect to the uniform (i.e., rotation invariant) distribution on S k- '. (i) Let {B,} be a standard Brownian motion on V (k > 1) and cp a bounded Borel measurable function on 3G. Suppose Px (i ?c < cc) = 1 for all x c G, where P. denotes the distribution of {B,} starting at x. Show that u(x):= E x cp(B,,, G ) has the mean value property in G. [Hint: u(x) = E x u(B,,, A ) if {y: ly - xl -< r} c G, by the Strong Markov property. Next note that the Po -distribution of {B,} is the same as that of {OB,} for every orthogonal transformation 0.] (ii) Show that if u has the mean value property in G, then u is infinitely differentiable. [Hint: Fix x c- G. Let B(x:2r) c G, and let 0 be an infinitely differentiable radially symmetric p.d.f. vanishing outside B(O:r). Let ü.= u on B(x:2r) and zero outside. Show that u = ü * i/i on B(x:r)]. (iii) Show that if u has the mean value property in G, then it is harmonic in G, i.e., Au = 0 in G. [Hint:
u(x) = u(y) + Y- (xW`W - y (0 )
i
au (Y) + 2 Z (xu) - y 'n)( x u> - yÜn) öxO öxu> (Y)
+ O(p 3 ), where p = Ix - yl. Integrate with respect to the uniform distribution on {x: Ix - yj = p }. Show that the integral of the first sum is zero, that of the second sum is (p 2 /2k)Au(y) and that of the remainder is 0(p 3 ), so that (p 2 /2k)Au(y) + O(p 3 ) is zero for small p.] (iv) Show that a harmonic function in G has the mean value property. [Hint: Use the divergence theorem.] 11. Check that (i) the function i/i f in (15.28) is harmonic,
O(x; y)s(dy) = I for x E 3G, and (ii) f (iii) i(i(x, y)s(dy) converges weakly to b .(dy) as x -^ y o E 3G. IG
y
12. (Maximum Principle) Let u be harmonic in a connected open set G c 1& k . Show that u cannot attain its infimum or supremum in G, unless u is constant in G. [Hint: Use the mean value property.] 13. (Dirichlet Problem) Let G be a bounded, connected, open subset of R. Given a continuous function q' on OG, show, using Exercises 10 and 12, that (i) u(x):= E, e q (B) is a solution of the Dirichlet problem Au(x) = 0
for x e G,
u(x) = 43(x)
for x e 3G,
and (ii) if u is continuous at the boundary, i.e., u(y) = lim u(x) as x(e G) - y e 0G, then this solution is unique in the class of solutions which are continuous on G.
496
BROWNIAN MOTION AND DIFFUSIONS
14. (Poisson's Equation) Let G be as in Exercise 13. Give at least an informal proof, akin to that of (15.25), that Tr'G
u(x):= E x g(X,) ds + E x cp(X T G ,
)
0
satisfies Poisson's equation 2Au(x) = —g(x)
for x e G,
u(x) = 47(x)
for x e OG.
Here g, cp are given continuous functions on G and OG, respectively. Assuming u is continuous at the boundary, show that this solution of Poisson's equation is unique in the class of all continuous solutions on G. 15. Extend the function F in (15.37) to a twice continuously differentiable function on [0, co) vanishing outside [c 1 , d 1 ], where 0 < c, < c < d
(i)fc°
17. Prove (15.61) using (15.60). 18. Write out the details of the proof of Theorem 15.3(b). [Hint: Follow the steps of Exercise 14.5(i), replacing step (a) by (15.52) with d replaced by d i .] 19. Consider the translation x —* x + z (for a given z), and the diffusion {Y, := X, + z}. (i) Write down the drift and diffusion coefficients of {Y,}. (ii) Show that {X,} is recurrent (or transient) if and only if {Y,} is recurrent (transient). (iii) Write down (15.46) and (15.47) for {Y}, in terms of the coefficientsµ")(• ), d. (. ). 20. Let {X,} be a diffusion on l. Assume that (a) ((d (x))) __ a 2 I, where a 2 > 0 and I is the k x k identity matrix, and I x ' Ip''°(x + z) -< —S for Ixt >, M, where z e l8 k , 5>0, M >0 are given. Apply Theorem 15.3 (and Exercise 19) to decide whether the diffusion is recurrent or transient.
(b)E;`_ ;;
(
21. For a diffusion having a positive and continuous (in x, y) transition density p(t; x, y) (t > 0), prove that the P,-distribution of T OB(O . d) has a finite m.g.f. in a neighborhood of zero, if xl < d.
Exercises for Section V.16 1. Compute the transition probability density of a standard Brownian motion on G = {x e 08 k x '° >, 0 for 1 < i < k} with (normal) reflection at the boundary (k > 1). :
(
2. (i) Compute the transition probability density q of a standard Brownian motion on G = {x e ll : 0 < x ` -< 1 for 1 -< i -< k} with (normal) reflection at the boundary. ( )
497
THEORETICAL COMPLEMENTS
(ii) Prove that max q((t; x, y) — lI -< c i e ` z', -
(t > 0),
x,yEri
for some positive constants c,, c 2
.
3. Prove that {Z,} defined by (16.27) is a Markov process on H = {x e 11": y•x > 0}. 4. Check (16.30). 5. Check that the function v(t, jxj) given by the solution of (16.42) (with r = jxl) satisfies (16.41).
THEORETICAL COMPLEMENTS Theoretical Complements to Section V.1 1. A construction of diffusions on S = R' by the method of stochastic differential equations is given in Chapter VII (Theorem 2.2), under the assumption that g(• ), r(.) are Lipschitzian. If it is also assumed that a(x) is nonzero for all x, then it follows from K. Ito and H. P. McKean, Jr. (1965), Diffusion Processes and Their Sample Paths, Springer- Verlag, New York, pp. 149-158, that the transition probability p(t; x, dy) has a positive density p(t; x, y) for t > 0, which is once continuously differentiable in t and twice in x. Most of the main results of Chapter V have been proved without assuming existence of a smooth transition probability density. Ito and McKean, loc. cit., contains a comprehensive account of one-dimensional diffusions.
Theoretical Complements to Section V.2 1. For the Markov semigroup {T,} defined by (2.1) in the case p(t; x, dy) is the transition probability of a diffusion {X }, it may be shown that every twice continuously differentiable function f vanishing outside a compact subset of the state space belongs to 9,,. A proof of this is sketched in Exercise 3.12 of Chapter VII. As a consequence, by Theorem 2.3, f (X,) — $ö Af (Xs ds is a martingale. More generally, it is proved in Chapter VII, Corollary 3.2, that this martingale property holds for every twice continuously differentiable f such that f, f', f" are bounded. Many important properties of diffusions may be deduced from this martingale property (see Sections 3, 4 of Chapter VII). )
2. (Progressive Measurability) The assumption of right continuity of t —• X,(w) for every w e Q in Theorem 2.3 ensures that {X,} is progressively measurable with respect to {.y, }; that is, for each t > 0 the map (s, w) —+ X(w) on [0, t] x S2 into S is measurable with respect to the product sigmafield _4([0, t]) © SF, (on [0, t] x S2) and the Borel sigmafield .a(S) on S. Before turning to the proof, note that as a consequence of the progressive measurability of {X} the integral f p Af (Xj ds is well defined, is .y,-measurable and has a finite expectation, by Fubini's Theorem.
498
BROWNIAN MOTION AND DIFFUSIONS
Lemma 1. Let (S2, .y) be a measurable space. Suppose S is a metric space and s --+ X(s, w) is right continuous on [0, oo) into S, for every aw e Q. Let {.y,} be an increasing family of sub-sigmafields of .f such that X, is .F, measurable for every
t _> 0. Then {X,} is progressively measurable with respect to {. ~r }.
❑
Proof. Fix t > 0. Let f be a real-valued bounded continuous function on S. Consider the function (s, cw) -* f (XX (w)) on [0, t] x fI. Define 2^
gn(s, w) _ f(X2-^r(w)) 1 [0.2--r](s) +
f(X12-nr(w))lu+-1)2 r,i2-"r](S). (T.2.1)
As each summand in (T.2.1) is the product of an S measurable function of w and a ([0, t])-measurable function of s, g„ is _4([0, t]) ®.-measurable. Also, for each (s, co) e [0, t] x f, g„(s, w) -+ f (X$ (co)). Therefore, (s, w) , f (X3 (o))) is R([0, t]) Qx .measurable. Since the indicator function of a closed set F c S may be expressed as a pointwise limit of continuous functions, it follows that (s, w) -• 1 F (Xs (w)) is _4([0, t]) ©.0;-measurable. Finally, c' == {B e p1(S): (s, co) -• 1 8 (XS (w)) is ‚([0, t]) Q measurable on [0, t] x fl} equals s(S), by Dynkin's Pi-Lambda Theorem. n Another significant consequence of progressive measurability is the following result, which makes the statement of the strong Markov property of continuous-parameter Markov processes complete (see Chapter IV, Proposition 5.2; Chapter V, Theorem 11.1; and Exercise 11.9). Lemma 2. Under the hypothesis of Lemma 1, (a) every {.^,}-stopping time r is .-measurable, where 5= == {A e F: A n {i _< t} e .F, Vt > 0}, and (b) X,1 (T< . ) is measurable. SFB ❑ Proof. The first part is obvious. For part (b), fix t > 0. On the set 52, := {r < t}, XT is the composition of the maps (i) a -• (tr(w), w) (on f2, into [0, t] x S2,) and (ii) (s, w) •--> X (w) (on [0, t] x Q r into S). Let n __ {A n St,: Ac . Nr } = (Ac 3: A c i2 r }, be the trace of the sigmafleld .Fr on 52,. If r e [0, t] and Ac .y, ( n , one has 5
{west,: (r(w),o)e[0,
r] x A} =
r} n
Ac
Thus the map (i) is measurable with respect to In (on the domain St,) and ,1([0, t] ® (on the range [0, t] x S2 r ). Next, the map (s, w) -+ XS (w) is measurable with respect to .q([0, t]) O .yr (on [0, t] x f2) and .a(S) (on S), in view of the progressive measurability of {X 5 }. Since [0, t] x Q, e ([0, t]) ® F,, and the restriction of a measurable function to a measurable set is measurable with respect to the trace sigmafield, the map (ii) is e([0, t]) © SFJ,,- measurable. Hence w -. XT(m[ (W) is . I n ,-measurable on R. Therefore, for every B e a(S), (we S2: X
X(W)
(cw) e B} n Q, - {co E f',: X, [m) (w) e B) e S.
n
3. (Semigroup Theory and Feller's Construction of One-Dimensional Diffusions) In a series of articles in the 1950s, beginning with "The Parabolic Differential Equation
499
THEORETICAL COMPLEMENTS
and the Associated Semigroups of Transformations," Ann. Math., 55 (1952), pp. 468-519, W. Feller constructed all nonsingular Markov processes on an interval S having continuous sample paths. Nonsingularity here means that, starting from any point in the interior of S, the process can move to the right, as well as to the left with positive probability. If S = (a, b), and a, b cannot be reached from the interior, then such a process is completely characterized by a strictly increasing continuous scale function s(x) and a strictly increasing right-continuous speed function m(x). The role of s(x) is the determination of the probability 4(x) of reaching d before c, starting from x, where a
(
d
Mx 1, c<x
The infinitesimal generator of this process is A = (d/dm(x))(d/ds(x)). It may be noted that, given the functions 0, M, the scale function s(.) is determined up to an additive constant, and the speed function m(•) is determined up to the addition of an affine linear function of s(. ). One may, therefore, fix x 0 e (a, b) and let s(x 0 ) = 0, m(x 0 ) = 0. A necessary and sufficient condition that the boundary point a cannot be reached, that is for the inaccessibility of a, is
f.
(T.2.2)
X0 m(x)ds(x)= —co,
and that for the inaccessibility of
J
b is b
(T.2.3)
m(x) ds(x) = co.
xo
For the class of diffusions considered in this book (see (9.9), (9.10), (9.13)),
s(x)=
Jx
exp{—I(x o , z)} dz, 1 (xo,
z^ z) exp{I(x o , z)} dz,
m(x) = f
,
X0
xp
z) = ,Jr
2µ(Y) dY
xo U 2 (Y)
If any boundary point is accessible, a boundary condition has to be specified at that point, in order to indicate what the Markov process does on reaching this point. This is discussed in theoretical complement 7.1. A very readable account of this characterization of nonsingular Markov processes on S having continuous sample paths is given in D. Freedman (1971), Brownian Motion and Diffusion, Holden-Day, San Francisco, pp. 102-138. See Theorem T.3.1 of Chapter VII for (T.2.2), (T.2.3). The second, and more important, part of Feller's theory is the construction of Markov processes on S. given a pair of scale and speed functions s and m. This is carried out by the method of semigroups. The rest of this subsection is devoted to a description of this method.
500
BROWNIAN MOTION AND DIFFUSIONS
In general semigroup theory one considers a family of bounded linear operators {T,: t > 0} on a Banach space B with norm II - II, satisfying T, +s = T,T,. It will be assumed that IIT, f — f II —0 as t j 0, for every fe ll. Since the set B o of all f e B for which this convergence holds is itself a Banach space, i.e., a closed linear subset of B, this assumption simply means the restrict of T, to B 0 . It follows from the semigroup property that {T,: t > 0} is determined by {T,: 0 < t <e} for every s > 0. Indeed, the semigroup is determined by its infinitesimal generator A:= (d/dt)T o . To be precise, let -9 = _9 x denote the set of all f e B such that (T, f — f )/t converges in norm to some g e B, as t j 0. For f e -9, define Af to be lim(TI f — f )/t as t 10. We will also assume that IIT,II _< 1, i.e., {T} is a contraction semigroup. This is not a serious restriction since, given an arbitrary semigroup {T,}, the semigroup exp{ —ct}T, (t > 0) defines a contraction semigroup if c > logjjT, II. For f e 9A ,
II (T, +sf
—
Tf)/s—T,A f II _ II(T3f—f)/s—A f II --+0
as s j 0. Therefore, T, f e 3x for all t > 0 if f e -q,^, and in this case AT, f = T, A f. That is, T, and A commute on Q. By an argument entirely analogous to that used for proving the fundamental theorem of calculus,
E
ATJds= J ' T.Afds. T,f—f= J ( T,f)ds= , ds o 0 '
In particular, the function t —. T, f on (0, oo) solves the initial-value problem: for f e 1,A , solve
t
u(t) = Au(t)
(t > 0), tim u(t) = f. Flo
The solution t --+ T, f is also unique. To see this, let u(t) be a solution. Fix t> 0 and consider the function s --+ T,_ S u(s) (0 _< s 5 t). Now,
ds
Tu(s) = lim {(T, h
h u(s
+ h) — T 1 _ s u(s + h)) + (T 1 _ g u(s + h) — T,_.u(s))}
1 t-s
du s (
)
=lim 1 (— T S,Au(s+h)ds' +T,_ 5 h l o h , _s_a ds = —T,_ s Au(s) + T,_,Au(s) = 0.
Hence T,_ s u(s) is independent of s so that setting in turn s = 0, t, we get T,u(0) = u(t), that is, T, f = u(t). When is an operator A defined on a linear subspace Q of B an infinitesimal generator of a contraction semigroup {T,}? The answer is contained in the important Hille— Yosida Theorem: Suppose 1 x is dense in B. Then A is the infinitesimal generator of a contraction semigroup on B if and only if, for each . > 0, A — A is one-to-one (on 9A ) and onto B with II(A — A) - ' II < 1/2. A simple proof of this theorem for the case B = C[a, b], the set of all continuous functions on (a, b) with finite limits at a and b, and for closed linear subspaces of C[a, b], may be found in P. Mandl (1968),
THEORETICAL COMPLEMENTS
501
Analytical Treatment o/' One-Dimensional Markov Processes, Springer- Verlag,
New York, pp. 2-5. It may be noted that if A is a bounded operator on B, then T, = exp{tA} (see Section 3 of Chapter IV). But the differential operator A = (d /dm(x))(d/ds(x)) is unbounded on C[a, b]. Consider the case B = C 0 (a, b), the set of all continuous functions on (a, b) having limit zero at a and b. C0 (a, b) is given the supnorm. Let A = (d/dm(x))(d/ds(x)) with ^A comprising the set of all f e B = C0 (a, h) such that Af e B. Suppose A, i7 A satisfy the hypothesis of the Hille-Yosida Theorem, and {T,} the corresponding contraction semigroup. It is simple to check that (i: - A) 'f equals g= J/ e '°T, f dt, for each 2 > 0; that is, . - A)g = f. Suppose that the resolvent operator (;. - A) ' is positive; if f >_ 0 then (1 - A) 'f >_ 0. Then, by the uniqueness theorem for Laplace transforms, T, f > 0 if f >_ 0. In this case f -^ T, f(x) is a positive bounded linear functional on C 0 (a, b). Therefore, by the Riesz Representation Theorem (see H. L. Royden (1968), Real Analysis, 2nd ed., Macmillan, New York, pp. 310-11), there exists a finite measure p(t; x, dy) on S = (a, b) such that T, f(x) = $ f(y)p(t; x, dy) for every f e C 0 (a, b). In view of the contraction property, p(t; x, S) < 1. In order to verify the hypothesis of the Hille-Yosida Theorem in the case a, b are inaccessible, construct for each A. > 0 two positive solutions u„ u 2 of (A - A)u = 0, u, increasing and u 2 decreasing. Then define the symmetric Green's function G,A (x, y) = W 'u,(x)u z (y) for a <x _< y < b, extended to x > y by symmetry. Here the Wronskian W given by W 1= u 2 (x) du, (x)/ds(x) - u l (x) du 2 (x)/ds(x) is independent of x. For Ja C0 (a, b) the function g(x) = G x f(x):= J G,(x, y)f(y) dm(y) is the unique solution in C0 (a, b) of (A - A)g = f. In other words, G. = (A - A) '. The positivity of (A - A) ' follows from that of G A (x, y). Also, one may directly check that G.1 - 1/2. This implies p(t; x, (a, b)) = 1. For details of this, and for the proof that a Markov process on S = (a, b) with this transition probability may be constructed on the space C([0, oo ): S) of continuous trajectories, see Mandl (1968), loc. cit., pp. 14-17,21-38. -
-
-
-
-
-
4. For a diffusion on S = (a, b), consider the semigroup {T} in (2.1). Under Condition (1.1), it is easy to check on integration by parts that the infinitesimal generator A is self-adjoint on LZ (S, n(y) dy), where ir(y) dy is the speed measure whose distribution function is the speed function (see Eq. 8.1). That is,
J
Af(Y)g(Y)n(Y) dy = ff(y)Ag(y)7r(y)dy
for all twice continuously differentiable f, g vanishing outside compacts. It follows that T, is self-adjoint and, therefore, the transition probability density q(t; x, y):= p(t; x, y) /n(y) with respect to n(y) dy is symmetric in x and y. That is, q(t; x, y) = q(t; y, x). Since p satisfies the backward equation cap/c?t = A ' p, it follows that so does q: cq/ät = A x q. By symmetry of q it now follows that 3q /ät = A y q. Here Aq, A y q denote the application of A to x -. q(t; x, y) and y -^ q(t: x, y) respectively. The equation tq/at = A y q easily reduces to cep/lit = A *p, as given in (2.41). For details see Ito and McKean (1965), loc. cit., pp. 149-158. For more general treatments of the relations between Markov processes and semigroup theory, see E. B. Dynkin (1965), Markov Processes, Vol. 1, Springer- Verlag, New York, and S. N. Ethier and T. G. Kurtz (1986), Markov Processes: Characterization and Convergence, Wiley, New York.
502
BROWNIAN MOTION AND DIFFUSIONS
Theoretical Complements to Section V.4 1. Theorem 4.1 is a special case of a more general result on the approximation of diffusions by discrete-parameter Markov chains, as may be found in D. W. Stroock and S. R. S. Varadhan (1979), Multidimensional Diffusions, Springer-Verlag, New York, Theorem 11.2.3. From the point of view of numerical analysis, (4.13) is the discretized, or difference-equation, version of the backward equation (4.14), and is usually solved by matrix methods.
Theoretical Complements to Section V.5 1. The general PDE (partial differential equations) method for solving the Kolmogorov equations, or second-order parabolic equations, may be found in A. Friedman (1964), Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs. In particular, the existence of a smooth positive fundamental solution, or transition density, is proved there.
Theoretical Complements to Section V.6 1. Suppose two Borel-measurable function p(• ), a'(•) > 0 are given on S = (a, b) such that (i) j(.) is bounded on compact subsets of S, (ii) a 2 (.) is bounded away from zero and infinity on compact subsets of S, and (iii) a and b are inaccessible (see theoretical complement 2.1 above). Then Feller's construction provides a Markov process on S having continuous sample paths, with a scale function s(.) and speed function m(.) expressed as functions of p(• ), a'(.) as described in theoretical complement 2.1. Hence continuity of p(•) is not needed for this construction. 2. The general principle Proposition 6.3 is very useful in proving that certain functions {Q(XX )} of Markov processes {X} are also Markov. But how does one find such functions? It turns out that in all the cases considered in this book the function q0 is a maximal invariant of a group of transformations G under which the transition probability is invariant. Here invariance of the transition probability means p(t; gx, g(B)) = p(t; x, B) for all t > 0, x e S. g e G, B Borel subset of the metric space S. A function cp is said to be a maximal invariant if (i) cp(gx) = cp(x) for all g e G, x e S, and (ii) every measurable invariant function is a (measurable) function of (p. For each x e S the orbit of x (under G) is the set o(x):= {gx: g e G}. Each invariant function is constant on orbits. Let S' be a metric space and cp a measurable function on S onto S' such that (i)' pp is constant on orbits, and (ii)' cp(x) # q(y) if o(x) 96 o(y). In other words
Proof. Taking the conditional expectation first with respect to a{Xu : u <, s} and then with respect to a{cp(X„): u < s),
I
I
P(
s}).
503
THEORETICAL COMPLEMENTS
But, by property (i) above, p(t; x, tV '(B')) = p(t; g 'x, g '(r) '(B'))) = p(t; g - 'x, (W ° g) '(B')) -
-
-
= p(t; g
-
'x, qP '(B')), -
since (p ° g = (p by invariance of (p. In other words, the function x -, p(t; x, (p '(B')) is invariant. By (ii) it is therefore a function q(t; cp(x), B'), say, of 'p. Thus E(p(t; X, q '(B')) a{w(X„): u -< s }) = E(q(t; p(XX), B') a{ p(XX): u < s }) -
= q(t; go(xs), B').
n
It has been pointed out to us by J. K. Ghosh that the idea underlying the proposition also occurs in the context of sequential analysis in statistics. See W. J. Hall, R. A. Wijsman, and J. K. Ghosh (1965), "The Relationship Between Sufficiency and Invariance with Applications in Sequential Analysis, Ann. Math. Statist., 36, pp. 575-614. Also see theoretical complement 16.3. A maximal invariant of the reflection group G = {e, —e} (ex := x, —ex := —x) is (p(x) = Jx ( . Thus, {IX,l} is Markov if {X} and { —X,} have the same transition probability, i.e., if p(t; —x, —B) = p(t; x, B). The group G of transformations on 68' generated by reflections around 0 and 1 (i.e., by g o x = —x, g l x = g 1 (1 + x — 1) _ 1 — x + I = 2 — x) is infinite, and o ( x )={ x + 2m : m = 0 , ±1,±2,...}u {— x +2m:m= 0,±1,... }.
Thus, if p(t; —x, —B) = p(t; x, B) and p(t; x, B) = p(t; x + 2, B + 2), then {Z,} in Theorem 6.4 is Markov, since a maximal invariant is cp(x) = jx(mod 2) — 11. Theoretical Complements to Section V.7
1. It is shown in Example 14.1, that the so-called radial Brownian motion {IB,^} is a diffusion on S = [0, co), such that "0” is inaccessible (from the interior of S). On the other hand, if the process starts at "0 ", it instantaneously enters the interior (0, cc)
and stays there forever. Thus, although the diffusion may be restricted to the state space (0, oo), "0" may also be included in the state space. The only other way of including "0" in the state space is to make "0" an absorbing boundary, if the process is to have the Markov property and continuous sample paths. The nonabsorbing boundary point "0" in this case is called an entrance boundary. In general, an inaccessible lower boundary a is an entrance boundary if
v= J
xp
s(x)dm(x)> —cc.
a
Similarly, an upper inaccessible boundary b is entrance if
v ° :=J 6 s(x)dm(x) < cc. xo
p.
See Ito and McKean (1965), loc. cit., 108. Suppose a boundary point is accessible, but the process starting at this boundary
504
BROWNIAN MOTION AND DIFFUSIONS
point cannot enter the interior; that is, no boundary condition other than absorption is consistent with the requirement that the process be Markovian and have continuous sample paths. Then the boundary is said to be an exit boundary. In general, an accessible lower boundary a is exit if v a = — oc, and an accessible upper boundary h is exit if u b = co. For details see Ito and McKean, loc. cit., p. 108. An example of a lower exit boundary is S = [0, oc ), µ(x) = ax, a 2 (x) = ßx with ß > 0, which occurs as a model of growth of an isolated population (see S. Karlin and H. M. Taylor (1981), A Second Course in Stochastic Processes, Academic Press, New York, p. 239). If one allows jump discontinuities at a boundary point, then the diffusion on reaching the boundary (or, starting from it) may stay at this point for a random holding time before jumping to another state. The holding time distribution is necessarily exponential (see Chapter IV, Proposition 5.1), with a parameter 6 > 0, while the jump distribution y(dy) is arbitrary. The successive holding times at this boundary are i.i.d. and are independent of the successive jump positions, the latter being also i.i.d. See Mandl (1968), loc. cit., pp. 39, 47, 66-69. Theoretical Complements to Section V.8
(Semigroups under Boundary Conditions) Let S = [a, b], p(. ) and a(. ) differentiable, and a 2 (x) > 0 for all x e S. If Dirichlet or Neumann boundary conditions are prescribed at a, b, then the Hille-Yosida theorem may be used to construct a contraction semigroup {T,} generated by A on B:= C[a, b], or C 0 (a, b). As indicated in theoretical complement 2.3 above, there exists a transition probability, with total mass _< 1, such that Tj(x) = f f (y)p(t; x, dy) for all f c B. To give an indication how the hypothesis of the Hille-Yosida Theorem is verified, consider Dirichlet boundary conditions. That is, let B = C 0 (a, b), -9A the set of all Je B such that Af := (d/dm(x))(d/ds(x))f(x) is in B. Fix A > 0, and let u„ u 2 be the solutions of (it — A)u = 0 with u i (a) = 0, ui(a) = 1, u 2 (b) = 0, ui(b) = — 1. By the existence and uniqueness theorem for linear ordinary differential equations, these solutions exist and are unique. Define the Green's function G A (x, y) = W - `u 1 (x)u 2 (y) for x <, y, and extend it symmetrically to x > y, with W as the Wronskian (see theoretical complement 2.3). It is not difficult to check that for every f e B the function G A f (x):= f G A (x, y)f (y) dm(y) is the unique element in B satisfying (2 — A)G A f = f. Since G A is positive and (G x 1)(x) may be shown directly to be less than I for all x, the verification is complete, and we have a transition probability p(t; x, dy) with p(t; x, S) < 1.
In the case of Neumann, or reflecting, boundary conditions at a, b, take B = C[a, b] and 9,^ = the set of all f in B such that Af e B and f'(a) = 0 = f'(b). Then (.? — A) may be expressed as (A — A) 'f = G z f + I I (f)u, + ! 2 (f )u 2 , where u I , u 2 , GA are as above, and 1 1 (f),1 2 (f) are bounded linear functionals of f determined so that g== (2 — A) 'f satisfies the boundary conditions g'(a) = 0 = g'(b). The hypothesis of the Hille-Yosida Theorem may be directly verified now. Since constant functions belong to Ö,^ and (A — A)(1/2) = 1, it follows that -
(A—A) -t 1 = 1/2,
and
p(t;x,[a,b])= I.
We have given a brief outline above of Feller's construction of diffusions under Dirichlet and Neumann boundary conditions. Complete details may be found in
505
THEORETICAL COMPLEMENTS
Mandl (1968), loc. cit., Chapter II. The direct probabilistic constructions given in the text (see Sections 6 and 7) do not make use of this theory. 2. (Eigenfunction Expansions) For a justification of the eigenfunction expansion described in Section 8, consider first {T,} on C0 (a, b) under Dirichlet boundary conditions. Extend T, to L2 ([a, b], a(y) dy = dm(y)). Now n(y) dy is an invariant measure. For
dt
f
T1f(Y)n(Y) dy =
J ATf(Y)h(Y) dy = J Tf(Y)A
*7 z(Y) dy = 0,
so that
J T,f(Y)n(Y) dy =
f
f(y)ir(y)dy for f e C.(a, b).
From this it is easily shown that {T,} is a contraction semigroup on L2 , whose infinitesimal generator is the extension of A to the set of all f e C0 (a, b) such that Af e LZ([a, b], m(y) dy). We denote this extension also by A and note that A is self adjoint. Fix A > 0. Then (2 — A) ' is compact and self-adjoint on L2 . Indeed, (.? — A) ' is the integral operator G., and it follows from a well-known theorem of Riesz (see F. Riesz (1955), Functional Analysis, F. Ungar Publishing Co., New York, Chapter VI) that (A — A) ' is a compact self-adjoint operator whose eigenvalues p„(2) are positive and converge to zero as n —^ oo, and that the corresponding normalized eigenfunctions 0. comprise a complete orthonormal sequence in L2 . As a consequence, the eigenvalues of A are ;:= = A — '(2) < 0, with the corresponding eigenfunctions i/i,,. Under Neumann boundary conditions, it follows from the representation (A — A) 'f = G x f + 1,(f )u 1 + 1 2 (f )u 2 that (A — A) ' is again a compact self-adjoint operator on L2 . The eigenvalues a„ of A are nonnegative, with 0 as one of them since Al = 0. Again the normalized eigenfunctions comprise a complete orthonormal set. The uniqueness of the solution to the initial-value problem, under any of these boundary conditions, follows from the uniqueness result proved in theoretical complement 2.1 above. Another proof may be based on the maximum principle for parabolic equations (see Exercise 7.7). -
-
-
-
-
Theoretical Complements to Section V.11 1. A proof that the pre -i sigmafield as defined by (11.2) is the same as the one generated by the stopped process {X t ,,,: t _> 0} is given in Stroock and Varadhan (1979), loc. cit., p. 33. The continuity of the function cp(y) in (11.18) depends only on the fact that x — p(t; x, dy) is weakly continuous. This fact, referred to as the Feller property, is proved in Section 3 of Chapter VII, for diffusions on H'. More generally, as described in theoretical complements to Sections 2, 7, 8, Feller's construction implies that T, f is bounded and continuous if f is. 2. The Brownian meander and Brownian excursion processes considered in Exercises 12.6, 12.7 of Chapter I and theoretical complements to Section 12 of Chapter I and Section 5 of Chapter IV may also be defined as continuous-parameter Markov
506
BROWNIAN MOTION AND DIFFUSIONS
processes having continuous sample paths but nonhomogeneous transition probabilities. In fact, let {B,} denote a standard Brownian motion, and let a. = sup{t _< 1: B, = 0}, p = inf{t >_ 1: B, = 0}. Consider the stochastic processes defined by B; :_ X8 (1 _
(meander):
x+ ,^(1 — A)_1/ 2 ,
B*+ :_ JB (1 _, )x+ , P I(p — A) 1 / 2 ,
(excursion):
(T.11.1)
0 _< t _< 1,
(T.11.2)
0 < t < 1.
Then in B. Belkin (1972), "An Invariance Principle for Conditioned Random Walk Attracted to a Stable Law," Z. Wahrscheinlichkeitstheorie und Verw. Geb., 21, pp. 45-64, it is shown that the meander process defined by (T.11.1) is a continuous-parameter Markov process starting at 0 with transition probabilities given by the following densities (
p(00 t, Y) = 2t 3 / 2 y exp{ — 2t }^ i -t( 0 , Y),
0 S t — 1, y > 0,
-
p + (s, x; t, Y) = {(P1-s(Y — x) — cp,-s(Y + x)}
D 1 ^(0, y) 0 < s < t ca l _,(0, x)'
1, x, y > 0,
where p (z) = (2zru) iiz exp{ —2u}' (
and
^.(a, b) =
Ja q(z)dz. b
Likewise, for the case of the excursion process defined by (T.11.2), one obtains a continuous-parameter Markov process with nonhomogeneous transition law with the following density (see Ito and McKean (1965), loc. cit., p. 76): p* + (0, 0; t, y) = 2y 2 (21rt 3 (1 — t) 3 ) - '/ 2 exp{—y 2 /2t(1 — t)},
0 < t -< 1, y > 0,
(1 — s) 3 / 2 y exp{ —y 2 /2(1 — t) }
P *+ (s, x; t, y) = {q,-(y — x) — (P'-s(Y + x)}
(1 — t) 312 x exp{ —x 2 /2(1 — s)}' 0<s
The equivalence of these processes with those of the same name defined in theoretical complements to Section I.12 is the subject of the paper by R. T. Durrett, D. L. Iglehart, and D. R. Miller (1977), "Weak Convergence to Brownian Meander and Brownian Excursion," Ann. Probab., 5, pp. 117-129.
Theoretical Complements to Section V.12 1. Consider a positive recurrent diffusion on an interval S, having a transition probability p(t; x, dy) and an invariant initial distribution n. We will use a coupling argument to show that fl p(t; x, dy) — rc(dy)jI := sup{ p(t; x, B) — n(B)I: B e _4(S)} -. 0 as t -• oc. For this, let {X1 }, {Y} be two independent diffusions having transition probability p and with X0 - x, and Yo having distribution n. Let i= inf{t >, 0: X, = Y}. If one
507
THEORETICAL COMPLEMENTS
can show that r < oo a.s., then the process {Z} defined by
Z,:=
(X, Y,
_>
for t
(T.12.1)
has, by the strong Markov property for the two-dimensional Markov process {(X, ) ,) }, the same distribution as {X, }. In this case,
Ip(t; x, B) — rt(B)i = IP(X, E B) — P(} e B)I = IP(Z, E B) — P(Y E B)I _
ast--*oc.
In order to prove r < oo a.s., it is enough to prove that the process {(X„ Y,)} reaches every rectangle [a, b] x [c, d] (a < b, c < d) almost surely. For the process must then go across the diagonal {(u, v): u = v} a.s. Let cp(u, v) denote the probability that a pair of such independent diffusions starting at (u, v) ever reach the rectangle [a, b] x [c, d]. Note that it x it is an invariant initial distribution for the two-dimensional diffusion, and let {(U, V)} denote such a diffusion having the initial distribution it x it. Let a denote the probability that the latter ever reaches [a, b] x [c, d]. Also write D„ := {(U„ V) e [a, b] x [c, d] for some t > n }, and F.:= {(U„ V,) e [a, b] x [c, d] for some t _< n }. Then a = P(D„) + P(F„ n D.c). But, by stationarity, P(D„) = a. Therefore, P(F, n D „) = 0. By the Markov property, P(F„ n Dc) = E'[IF.,( 1 — ro(U,,, V'))] = E(P(F„ I (U,, V'))(l — qp(U,,, l' )). Now we may use the positivity of the transition probability density to show easily that P(F, U„ = u, V. = v) > 0 for almost all pairs (u, v) (with respect to Lebesgue measure on S x S). Therefore, I — (p(u, v) = 0 for almost every pair (u, v). By using the continuity of x —• p(t; x, y) it is now shown that (u, v) —* c,(u, v) is continuous, so that cp(u, v) = I for all (u, v). This proves that r < oo a.s.
Theoretical Complements to Section V.13 1. (Martingale Central Limit Theorem) Although a proof of the central limit theorem (Theorem 13.1) for positive recurrent one-dimensional diffusions may be patterned after the corresponding proof for Markov chains without any essential change, the proof does not work for Markov processes on general state spaces and multidimensional diffusions. The reason for this failure is that point recurrence may not hold. That is, even for Markov processes having a unique invariant distribution, there may not be any point in the state space to which the process returns (infinitely often) with probability 1. A more general approach that applies to all ergodic Markov processes is via martingales. First, let us prove a general martingale central limit theorem. Central limit theorems for Markov processes will then be derived from it. Let {Xk ,,,: 1 < k < k„} be, for each n >, 1, a square integrable martingale difference
508
BROWNIAN MOTION AND DIFFUSIONS
sequence, with respect to an increasing family of sigmafields { y k : 0 -< k <- kn }. Write 22
I
k p-- '^k—I,n)>
E(X k,n
2 = Sk.n
2 62.,
i =1 I'k( i)'—
E(Xl,n1(IXj.nI>c) I ` fJ — l.n ) >
(T.13.1)
Mn =max{ak n ;1 <, k kn Sn,kn' —
YL Xin.
j=1
The following result is due to B. M. Brown (1977), "Martingale Central Limit Theorems," Ann. Math. Statist., 42, pp. 59-66. Theorem T.13.1. (Martingale CLT). Assume that, as n -+ oo, (i) skn n -• 1 in probability, and (ii) L k ,,() -, 0 in probability, for every E > 0. Then S n kn converges in distribution to N(0, 1). ❑ Proof. Consider the conditional characteristic functions
(Pk,n( )'=E(exp { i
(fielt').
Xk,n} I'^k—1,n ),
(T.13.2)
It would be enough to show that
(a) E (exp {i Sn.kn}'1 j 4k.)) = 1, provided If in
(b)
p k , n () -+ exp{ — X 2 /2} in probability.
Indeed, if J([
E ex 2 ==ex P{ — 2/2 P{ i S n.kn } — ex p{ —Z/ }I
Sexp{—Z2/2
E exp{iZSf , kn } } exp{ — Z2/2}
E
exp{i_Sn , k „} 1 1 4'k.,,()
Z 1 I .0. — /2} f (Pk.n(Z)
}Elexp{ — 1
(T.13.3) Now part (a) follows by taking successive conditional expectations given .yk-1,n (k = kn , kn — 1, ... ,1), if (JJ ())-' is integrable. Note that the martingale difference property is not needed for this. It turns out, however, that in general (f cannot be bounded away from zero. Our first task is then to replace Xk,n by new martingale differences Yk , n for which this integrability does hold, and whose sum has the same asymptotic distribution as S., kn . To construct 1, first use
THEORETICAL COMPLEMENTS
509
assumption (ii) to check that M. -+ 0 in probability. Therefore, there exists a
nonrandom sequence 8„ j 0 such that
asn-pco.
P(M„>_S „)-+0
(T.13.4)
Similarly, there exists, for each e > 0, a nonrandom sequence O„(E) j 0 such that
P(L
>_ O„(E)) -+ 0
(e)
k
as n - co.
(T.13.5)
Consider the events A k „(s):= {Qk,„ < L k. „ < O „(E) , se.,, < 2 },
(1 _< k _< Q. (T.13.6)
Then A k , „(e) is yk _ 1 -measurable. Therefore, Yk. „ defined by Y
(T.13.7)
Xk.n 1 Ak ^(c)
has zero conditional expectation, given Although Yk ,„ depends on e, we will suppress this dependence for notational convenience. Note also that k„
P(Yk.n = Xk.n for 1 -< k < kn) >- Pn Ak,n(E) k=1
= P(M, < b n> L k n(e) < ®(e)
, S,.k ^ < 2) - 1. (T.13.8)
We will use the notation (T.13.1 -2) with a prime (') to denote the corresponding quantities for {Yk.n }. For example, using the fact E(Yk,„ I Fk.-I.n ) = 0 and a Taylor expansion, z
^ kn() — ^ 1 2 \\\
(^ \l QkZn i E exP(i Yk.n } — )1
L
=E
L
— 2
<E II3 2
Yk•„
f
ol
1+
(
t(;
Yk,n +
\
(1 2 ) Yk.n
1
k-1•
"J^
— u)(exp{iu^Yk } — 1) du I ^k_ 1•
(l
/ 6k ' n + SzZ E(Yk.n ll^Yt.^^>e}
„JI
c^-ln)
3
II akZ , + 2 E(Xk.n 1 Urk."I>E1 j Ak- l.n)• Fix E W. Since M; < b „, 0 Therefore, using (T.13.6, 9), j^
11
1 — ( 2 /2)
2.
^k.n(S) — ^ — - Qk n^ _ Y_k SZ
x
f
1
k
(T.13.9)
- 1 (I _< k _< k „) for all large n.
—
f//
11
ISI 3 E +2 e),
"2
1
2 J 2 ak,n
(T.13.10)
510
BROWNIAN MOTION AND DIFFUSIONS
and
(l — çc) — exp{—
(
_
22 Sk
1 — ^2 vk2„ — exp{— ^
J
8
2 ak?„ }
4
8
(T.13.11)
Therefore, ( ^2 1 j (Pk.n (^)
Sa
1 1 3 E +X 2 0„(E) + 4 8 n .
— exp1 — 2 Sn.k„
(T.13.12)
Moreover, (T.13.12) implies
Ifl wk.n( )1 >- exp {— 22 S k „} — I I'E +X 2 0„(E) + 4a ö„ a
exp{ —^ 2
} — I^I E — 0 2 0„(ß) + 4 S (T.13.13) 3
n
By choosing E sufficiently small, one has for all sufficiently large n (depending on E), Ifl gqk.n(^)I bounded away from zero (uniformly in n). Therefore, (a) holds for {Yk , n }, for all sufficiently small E (and all sufficiently large n, depending on E). By using relations as in (T.13.3) and the inequalities (T.13.12, 13) and the fact that s„ k. - 1 in probability, we get 2
lira E exp{i^S k „} — exp -n^m
2
exp{ — ^2 } li m E (exp{— Z }) — (11 (Pk.n() I )I 2
5 ex p{ — } exp12^( exp{ — ^Z } — I I'E
l
exP (
( ((
(
^2
1
2
—
—
I Inn EIf ^Pk.n(^) — e 212 1 )
n -^ x
(T.13.14)
ii')
Finally, z2 l
_
urn E exp {i S„, k ,} — exp — — S } 2 )
n- W
(
Jim E exp{iS„ k ^} — E exp{iS k „}I + lim IE exp{iS} — exp{ -n-+ao
l
.—. (
2
2
2
= 0 + li m I E exp{iS;, k „} — exp{ — } < (expf— 2 } — II'E J
1(T. 13.15)
THEORETICAL COMPLEMENTS
511
The extreme right side of (T.13.15) goes to zero as e j 0, while the extreme left does not depend on e. n Condition (ii) of the theorem is called the conditional Lindeberg condition. For additional information on martingale central limit theorems, see P. G. Hall and C. C. Heyde (1980), Martingale Limit Theory and Its Applications, Academic Press, New York. One may deduce from this theorem a result of P. Billingsley (1961), "The Lindeberg-Levy Theorem for Martingales," Proc. Amer. Math. Soc., 12, pp. 788-792, and I. A. Ibragimov (1963), "A Central Limit Theorem for a Class of Dependent Random Variables," Theor. Probability Appl., 8, pp. 83-89, stated below. Theorem T.13.2. Let {X„: n _> 0} be a stationary ergodic sequence of square integrable martingale differences. Then Z„/^:= (X 1 + • • + X„) /\ converges in ❑ distribution to N(0, a 2 ), where a 2 = EX. Proof. It is convenient to construct a doubly stationary sequence
X_,, XO, X, ... .X„,... such that {X„: n _> 0} has the same distribution as {X„: n >_ 0} has. This can be accomplished by setting for each m, the finite-dimensional distribution of any m consecutive terms of the sequence {X„: n e Z) to be the same as the distribution of (X0 ..... X„, _ , ). Since this specification satisfies Kolmogorov's consistency requirements, the construction is complete on the canonical space 08 1 . To simplify notation we shall write X„, instead of X„, for this process. If a 2 = 0, then X„ = 0 almost surely, and the desired conclusion holds trivially. Assume a 2 > 0.
First, let us show that {X„} so extended is a {3} -martingale difference sequence, where ^„ := a {X; : —co
I `;n) = 0, i.e., `
E(1 X„ ,) = 0 A
+
for all A e .fF„.
(T.13.16)
Suppose A is a finite-dimensional event in , say A e a{ X„ _ j : 0 _< j _< m }. Note that the (joint) distribution of (X„_„„ ... , X„, X ,) is the same as that of any in + 2 consecutive terms of the sequence {X„: n >_ 0 }, e.g., (X0 , X,, ... , X„, + ,). Therefore, (T.13.16) becomes, for such a choice of A, E(I A X„, + ,) = 0, for all A E a{X o , ... , X„,}, which is clearly true by the martingale difference property of {X„: n >, 0 }. Now consider the class 'K„ of sets A in f„ such that E(1,, X„. +1 ) = 0. It is simple to check that '6'„ is a sigmafield and since''„ = a {X„ - t : 0 s j _< m} for all m ? 0 one has 16„ _ „. Next define a := E(X, I -,) (na 1). Then {a: n _> 0} is a stationary ergodic sequence, and EQ,., = a' > 0. In particular, sn /n = cr ,/n -+ a 2 a.s., where s„ = am. Observe that s,, - co almost surely. Write Xk,„:=Xk/fn, k„ = n. Then at,, = at/n, and s.2 = s /n - a 2 a.s., so that condition (i) of Theorem T.13.1 is checked, assuming a 2 = I without loss of generality. It remains to check the conditional Lindeberg condition (ii). But L „, „(e) is nonnegative, and EL „ „(F) = E(XI l (1X ,^ >E ;_n I ) -+ 0. Therefore, L „, „(e) --* 0 in probability. n .
512
BROWNIAN MOTION AND DIFFUSIONS
2. We now apply Theorem 1.13.2 to Markov processes. Let {X n : n _> 0} be a Markov process on a state space S (with sigmafield .'), having a transition probability p(x; dy). Assume that p(x; dy) admits an invariant probability it and that under this initial invariant distribution the stationary process {Xn : n >, 0} is ergodic. Assume X0 has distribution n. Consider a real-valued function f on S such that Ef Z (Xo ) < oo. Write f'(x) = f (x) -1 where f = f f dit. Write T for the transition operator, (Tg)(x):= I g(y)p(x; dy). Then (Tmf')(x) _ (Tmf )(x) -7 for all m >, 0. Also, Ef'(X0 ) = 0. Suppose the series h n (x) ^_ Iö (Tmf')(x) converges in L2 (S, n) to h, i.e.
f
asn-•co.
(h. - h)' d7z -* 0
(T.13.17)
This is true, e.g., for all f in case S is finite. In the case (T.13.17) holds, h satisfies h(x) - (Th)(x) = f'(x),
or
(I - T)h = f',
(T.13.18)
i.e., f' belongs to the range of I - T regarded as an operator on LZ (S,., iv). Now it is simple to check that h(Xn ) - (Th)(Xn _ 1 )
(n _> I)
(T.13.19)
is a martingale difference sequence. It is also stationary and ergodic. Write n
Zn
(h(X,) - (Th)(Xm-, )).
°=
(T.13.20)
m=1
Then, by Theorem T.13.2, Z,/ f converges in distribution to N(0, a 2 ), where a 2 = E(h(XI) - (Th)(X0)) 2 = Eh 2 (X,) + E(Th) 2 (X0 ) - 2E[(Th)(X0)h(X1)]. (T.13.21) Since E[h(X 1 ) I {X0 }] = (Th)(X0 ), we have E[(Th)(X o )h(X I )] = E(Th) 2 (X0 ), so that (T.13.21) reduces to
a 2 = Eh 2 (XI) - E(Th) 2 (Xo) = J h 2 do - J (Th) 2 dn.
(T.13.22)
Also, by (T.13.18), n-1
n
Zn
(h(X,) - (Th)(Xm)) + h(X) - h(X0)
(h(Xm) - (Th)(Xm-1)) _
=
m=1
n—I
_
r'
f'(Xm ) + h(X) - h(Xo ). m=0
Since
E[h(Xn) - h(X0))/ n]Z ?(Eh2(X0) + Eh 2 (X0)) = 4 $h 2 dir -. 0,
n
n
(T.13.23)
THEORETICAL COMPLEMENTS
513
and
1 " 11
1
1=o f'(Xm) = Z In
—
(T.13.24)
— — (h(X") — h(X0)),
f
m
it follows that the left side converges in distribution to N(0, a 2 ) as n —* Co. We have arrived at a result of M. 1. Gordin and B. A. Lifsic (1978), "The Central Limit Theorem for Stationary Ergodic Markov Processes," Dokl. Akad. Nauk SSSR, 19, pp. 392-393.
Theorem T.13.3. (CLT for Discrete-Parameter Markov Processes). Assume p(x, dy) admits an invariant probability n and, under the initial distribution n, {X"} is ergodic. Assume also that f' := f — f is in the range of I — T. Then
-- Z jn
(f (Xm
) —
r)
N(0,0. 2
)
as n —• oo ,
(T.13.25)
m=0
where the convergence is in distribution, and
is given by (T.13.22).
Q2
❑
Some applications of this theorem in the context of processes such as considered in Sections 13, 14 of Chapter II may be found in R. N. Bhattacharya and O. Lee (1988), "Asymptotics of a Class of Markov Processes Which Are Not In General Irreducible," Ann. Probab., 16, pp. 1333-1347, and "Ergodicity and Central Limit Theorems for a Class of Markov Processes," J. Multivariate Analysis, 27, pp. 80-90.
3. For continuous-parameter Markov processes such as diffusions, the following theorem applies (see R. N. Bhattacharya (1982), "On the Functional Central Limit Theorem and the Law of the Iterated Logarithm for Markov Processes," Z. Wahrscheinlichkeitstheorie und Verw. Geb., 60, pp. 185-201).
Theorem T.13.4. (CLT for Continuous-Parameter Markov Processes). Let {X} be a stationary ergodic Markov process on a metric space S, having right-continuous sample paths. Let n denote the (stationary) distribution of X„ and A the infinitesimal generator of the process on LZ (S,it). If f belongs to the range of A, then t -1 / 2 f (X,) ds converges in distribution to N(0, a 2 ), with a Z = 2
fo
—
Proof. It follows from Theorem 2.3 that Z" = g(X)
—
J
Ag(XX) ds = g(X) 0
—
J
(n = 1, 2,...)
f(X) ds o
>,
is a square integrable martingale. By hypothesis, Z" — Z" _ 1 (n 1), is a stationary ergodic sequence of martingale differences. Therefore, Theorem T.13.2 applies and converges in distribution to N(0, a'), where a 2 = E(Z 1 — Z0 2 i.e., )
a z = EIg(X,) — g(X 0 ) —
J
1 0
Ag(XS) ds
j
2 .
,
(
T.13.26)
514
BROWNIAN MOTION AND DIFFUSIONS
But E(n- ` 12 g(Xn)) 2 = n - 'Eg 2 (X,) -p 0
as n- oc.
Therefore, +n-1/2 fo" f (XS)ds -^ N(0, a 2 ). Also, for each positive integer n, {Zk/n - Z(k-1)/n• 1 < k < n} are stationary martingale differences, so that n
Q 2 = E(ZI — 4) 2 = E E(Zk/ n — Z(k— 1)/n) 2 = nE(Z11n — 4)2 k=1
f
/n
= nE g(XI1n) - g(Xo) -
2
Ag(X,) ds
o, I/n
=
nE(g(X I/n ) - g(Xo )) 2 + nE^
z
Äg(Xs) ds 0
- 2nE (g(XI/n) - g(X0))
f
/n
(T.13.27)
Ag(XX) ds .
o
,
Now, nE(g(X11n) - g(Xo)) 2 = n[Eg 2 (XI/n) + Eg 2 (Xo ) - 2 Eg(X11n)g(Xo)] = n 2 Eg 2 (Xo) - 2 J g(x)TI/ng(x)it(dx)] [
1/n
(^('
= n 2Eg 2 (Xo ) - 2 J g(x) g(x) +
TSÄg(x) ds n(dx) 0
[
f J
= - 2n g(x)T,Ag(x) ds ir(dx) --o - 2
g(x)Ag(x)rr(dx),
0, /n (T.13.28)
as n - oo, since T,Äg -. Äg in L2 (S, n) as s j 0. Also, lJn (
E
=
o
1
2
1
Äg(XS) ds I-< E(! / n
fo
J('
I/n
l/"
1 E(Ag) n 2
2 (Xo) (Äg(Xs))2ds) _ - d E(Äg)s n o
(T.13.29)
2 (X0).
Applying (T.13.28, 29), the product term on the extreme right side of (T.13.27) is seen to go to zero. Therefore, a 2 = - 2
ll
J
k I k l
f(X,)I ds: 1 5 k-< n}, )
THEORETICAL COMPLEMENTS
515
then P(M„ >
Y P(n -1j2
E)
k=1
as n —p
fk
If(XS)I ds >
E
t
1= nP(f 1 If(X5)I ds > E f)
/
-
0
, 0
/
since
G0' If(X)Ids) /
n
.
In order to check that f — f belongs to the range of A it is enough to show that —$0 Ts (f — f)(x) ds converges in LZ (S, it) as t —• oo. For if the convergence is to a function g in L2 , then n
— (Tag — g)(x) = h in L2 as h 10. In other words, Äg =
J
Ta(f — f)(x) ds = 0
TS(.%
—
.%)( x) ds —' (f —
f —7 Now,
I
f
J
y f( )(p(s; x, dy) — n(dy)) ds.
(T.13.26)
os
Therefore, one simple sufficient condition that f — f belongs to the range of Ä for every f e LZ(S, n) is sup{Ip(s; x, B) — n(B)I: Be a(S), x E S} —• 0
_<
exponentially fast as s --• oo. This is, for example, the case with the reflecting a 2 ), k> 1. r -dimensional Brownian motion on the disc {1x1 2 An alternative renewal approach may be found in R. N. Bhattacharya and S. Ramasubramanian (1982), "Recurrence and Ergodicity of Diffusions," J. Multivariate Analysis, 12, pp. 95-122. 4. It is shown in Bhattacharya (1982), loc. cit., that ergodicity of the Markov process {X,} under the stationary initial distribution n is equivalent to the null space of A being the space of constants.
Theoretical Complements to Section V.14 1. To state Proposition 14.1 more precisely, suppose {X,} is a Markov process on a metric space S having an infinitesimal generator A on a domain -, and transition operators T, on the Banach space Cb (S) of all real-valued bounded continuous functions f on S, with the sup norm II.f I). Let cp be a continuous function on S onto a metric space S'. Let be a class of real-valued, bounded and continuous functions g on S' having the following two properties: (i) If g e ✓ g then g o ( A' and A(g o q) = h o cp for some bounded continuous hon S'; (ii) 9 is a determining class for probability measures on (S', M(S')), that is, if µ, v are
two probability measures such that $ g dp = J g dv for all g e then p = v. Then {tp(X,)} is a Markov process on S'.
516
BROWNIAN MOTION AND DIFFUSIONS
To see this consider the subspace B of C,(S) comprising all functions g o gyp, g in the closure of the linear span of 2(f) of W✓ . Then B is a Banach space, and A is the generator of a semigroup on B by the Hille -Yosida Theorem and property (i). This implies in particular that T, maps lB into itself, so that T,(g o 4,) is of the form h, o cp for some bounded continuous h, on S'. Since 5 ✓ is a determining class, it follows that T,(g o cp) is of the form h, o cp for every bounded measurable g on S'. Therefore, Proposition 6.3 applies, showing that {q (X,)} is a Markov process on S'. Theoretical Complements to Section V.15 1. The Markov property of {X,} as defined by (15.1) does not require any assumption on (G. But for the continuity at the boundary of x - T, g(x) := E. g(X) for bounded continuous g on G, or of the solution /i f of the Dirichlet problem (15.23) for continuous bounded f on (G, some "smoothness" of aG is needed. The minimal such requirement is that every point b of äG be regular, that is, for every t > 0, Px (t d0 > t) -+ 0 as x -• b, x e G. Here T ai is as in (15.2). A simple sufficient condition for the regularity of b is that it be Poincare point, that is, there exists a truncated cone contained in 68'`\G with vertex at b. For the case of a standard Brownian motion on R', simple proofs of these facts may be found in E. B. Dynkin, and A. A. Yushkevich (1969), Markov Processes: Theorems and Problems, Plenum Press, New York, pp. 51-62. Analytical proofs for general diffusions may be found in D. Gilbarg and N. S. Trudinger (1977), Elliptic Partial Differential Equations of Second Order, Springer- Verlag, New York, p. 196, for the Dirichlet problem, and A. Friedman (1964), Partial Differential Equations of Parabolic Type, Prentice-Hall, Englewood Cliffs. 2. Let f be a bounded continuous function that vanishes outside G. Define T°f(x):= E(f(X ( )1 wc „ ) ( X ° = x). If b is a regular boundary point then as x -+ b (x e G), T° f (x) -+ 0. In other words, T°C ° c C° , where Co is the class of all bounded continuous functions on G vanishing at the boundary. If f is a twice continuously differentiable function in C o with compact support then one may show that f e ^ö,^o, the domain of the infinitesimal generator A ° of the semigroup {T°}, and that (A°f )(x) _ (A f)(x) for all x e G. For this one needs the estimate P,(T eG <- t) -• 0 as t --+ 0, uniformly for all x in the support of f (see Exercise 3.11 of Chapter VII). Also see (15.24), (15.25), as well as Exercise 3.12 of Chapter VII. As indicated in Section 7 following (7.12), if aG is assumed regular and p ° (t; x, y) has a limit as x approaches äG, then this limit must be zero. For the existence of a smooth p° continuous on G, and vanishing on 3G, see A. Friedman, loc. cit., p. 82. 3. (Maximum Principle for Elliptic Equations) Suppose first that Au(x) > 0 for all x e G,
where G is a bounded open set. Let us show that the maximum of u cannot be attained in the interior. For if it did at x 0 e G, then ((3u/öx''') x0 = 0, 1 < i _< k, and B:= ((ö 2 u/(3x (') dx (j) )),ro is negative semidefinite. This implies that the eigenvalues of CB are all real nonpositive, where C:= ((d(x ° )). For if ) is an eigenvalue of CB, then CBz = Az for some nonzero z e C k and, using the inner product < , > on C”, 0 _<
THEORETICAL COMPLEMENTS
517
Choose y sufficiently large that the expression within square brackets is strictly positive for all x e G. Then Au(x) > 0 for x e G, for all t > 0. Applying the conclusion of the preceding paragraph conclude that the maximum of u t is attained on l G, say at x^. Since öG is compact, choosing a convergent subsequence of x L as E j 0 through a sequence, it follows that the maximum of u is attained on EG. Finally, if Au(x) = 0 for all x e G, then applying the above result to both u and —u one gets: u attains its maximum and minimum on c?G. Note that for the above proof it is enough to require that µ '°(x), d ;j (x), I _< i, j < k, are continuous on G, and d 4 (x) > 0 on G for some i. Under the additional hypothesis that ((d ;; (x))) is positive definite on G, one can prove the strong maximum principle: Suppose G is a (
bounded and connected open set. 1f Au(x) = 0 for all x e G, and u is continuous on G, then u cannot attain its maximum or minimum in G , unless u is constant.
The probabilistic argument for the strong maximum principle is illuminating. Under the conditions (1)—(4) of Section 14 it may be shown that the support of the probability measure P. is the set of all continuous functions on [0, ou) into R', starting at x (see theoretical complements to Section Vll.3). Therefore, for any ball B x, the Pr -distribution i(x; dy) of X i 8 has support i)B. Now suppose u is continuous on G and Au(x) = 0 for all x e G. Suppose u(x 0 ) = maxju(x): x e G} for some x o e G. Then for every closed ball B e = ;x: Ix — x 0 ) _< e} c G, the strong Markov property yields: u(x o ) = E..u(X,,,,,,), in view of the representation (15.22). (Also see Exercise 3.14 of Chapter VII ). That is, u(xo) =
J
u(y)^e(xo; dy), (T.15,1)
where 0,(x o ; dy) is the PX -distribution of X. Since u is continuous on iB and the probability measure 0,(x 0 ; dy) has the full support (,B,, the right side is strictly smaller than the maximum value of u on B E , namely u(x o ), unless u(y) = u(x 0 ) for all y e i B^. By letting E vary, this constancy extends to the maximal closed ball with center x 0 contained in G. By connectedness of G, the proof is completed. 4. Let G be a bounded open set such that there exists a twice continuously differentiable real-valued function cp on R'` such that G = {x: ep(x) < e}, (G = ix: q(x) = c;, for some c e 18. Assume also that grad cpi is bounded away from zero on iG. If u is a twice continuously differentiable function in G such that u, i u/<,x 4 ' ) , i 2 u/i)xW i^x^' 4 (I _< i, j < k) have continuous extensions to G, then there exists a twice continuously differentiable function q on 9 with compact support such that q = u on G (see Gilbarg and Trudinger, loc. cit., p. 130).
Theoretical Complements to Section V.16 1. D. W. Stroock and S. R. S. Varadhan (1979), Multidimensional Diffusions, Springer- Verlag, New York, Chapter 10, have constructed diffusions on R'` under the assumptions (i) ((d(x))) is continuous and positive definite, (ii) jl ( '°(x) (I s i _< k) are
Borel-measurable and bounded on compacts, and (iii) an appropriate condition for nonexplosion holds. These conditions apply under our assumptions in Section 16. 2. The present method of constructing reflecting diffusions in one and multidimensions may be found in M. Friedlin (1985), Functional Integration and Partial Differential
518
BROWNIAN MOTION AND DIFFUSIONS
Equations, Princeton University Press, Princeton, pp. 86, 87, and in R. N.
Bhattacharya and E. C. Waymire (1989), "An Extension of the Classical Method of Images for the Construction of Reflecting Diffusions," Proceedings of the R. C. Bose Symposium on Probability, Statistics and Design of Experiments (ed. R. R. Bahadur, J. K. Ghosh, K. Sen), Eka Press, Calcutta. 3. The transformation g(x) = x (see (16.9)) generates a group of transformations G = {g o , g} with g o as the identity. A maximal invariant under G is q (x) = (Ix" > I, x (2) , .. , x (k) ). By the proposition in theoretical complement 6.2, {(X,)} is Markov if p(t; z, y) = p(t; x, y). The transition density of the unrestricted diffusion {X i } in Proposition 16.1 has this invariance property. As further applications of the proposition, consider the following examples. Example 1. (Radial Diffusions). Let {X} be a diffusion on ll ' with drift s(.), and diffusion matrix ((d1 (. ))). If I d;j (x)xl' x'j and 2 E x ( ' ) µ (i) (x) + E d 1 (x) are radial functions (i.e., functions of Ix»), then {IX,I} is a Markov process. This follows from the fact that if f is a smooth radial function then Af is also radial (see (15.35), and theoretical complement 14.1). The underlying group G in this case is the group of all orthogonal matrices. )
)
Example 2. (Diffusions on the Torus). Let {X 1 } be a diffusion on Ft" such that µ 1 ` ) (x) and d ij (x) are invariant under the group G generated by translations by a set of k linearly independent vectors ^,,, 4 2 , ... , 4k . In other words, µ(i)(X + m r r^ _ 1
p (x),
dij(x + I mr4r = dij(x) 1 )
(1 < i..% <, k),
for all x and all k-tuples of integers (m 1 , M21. .. , m k ). Now express x (uniquely) as S,(x) r , and let 8,(x) = 5 r (x)(mod 1) e [0, 1), 1 -< r -< k. Then the maximal invariant under G is rp(x):= ($ 1 (x), ... , &(x)), and {cp(X,)} is a Markov process on the torus [0,1) k .
Theoretical Complements to Section V.17 1. Taylor's article appeared in G. I. Taylor (1953), "Dispersion of Soluble Matter in a Solvent Flowing Through a Tube," Proc. Roy. Soc. Ser. A, 219, pp. 186-203. It was shown there that asymptotically for large U0 , D — a 2 Uö/48D0 . The explicit computation (17.30) was given by R. Aris (1956), "On the Dispersion of a Solute in a Fluid Flowing Through a Tube," Proc. Roy. Soc. Ser. A, 235, pp. 67-77, using the method of moments and eigenfunction expansions. The treatment of the Taylor—Aris theory given in this section follows R. N. Bhattacharya and V. K. Gupta (1984), "On the Taylor—Aris Theory of Solute Transport in a Capillary," SIAM J. App!. Math., 44, 33-39.
CHAPTER VI
Dynamic Programming and Stochastic Optimization 1 FINITE-HORIZON OPTIMIZATION Consider a finite set S, called the state space, a finite set A, called the action space, and, for each pair (x, a) e S x A, a probability function p(y; x, a) on S: p(y; x, a)
p(y; x, a) = 1.
0,
(1.1)
yes
The function p(y; x, a) denotes the probability that the state in the next period will be y, given that the present state is x and an action a has been taken. A policy (or, a feasible policy) is a sequence of functions (fo , f,, .. .) defined as follows. fo is a function on S into A. If the state in period k = 0 is x 0 then an action f0 (x 0 ) = a o is taken. Given the state x 0 and the action a o = fo(xo), the state in period k = I is x, with probability p(x,; x 0 ,f0 (x 0 Now fl is a function on S x A x S into A. Given the triple x o , fo (x o ), x,, an action .f1(x0,f0(x0), x 1 ) = a l is taken. Given x 0 , a o and the state x i and action a l , the probability that the state in period k = 2 is x 2 is p(x 2 ; x,, a,). Similarly, f2 is a function on S x A x S x A x S into A; given x o , a o , x i , a l , x 2 an action a 2 = f2 (x o , a o , x 1 , a 1 , x 2 is taken. Given all the states and actions up to period k = 2 (namely x0, ao = .fo(xo), xi, ai = fi(xo, ao, x1), x2, a 2 = .f2(x0, ao, xi, ai x2)), the probability that a state x 3 occurs during period k = 3 is p(x 3 ; x 2 , a 2 ), and so on. In general, a policy is a sequence of functions { fk : k = 0, 1, 2, ...} such that fk is a function on (S x A)' x S into A (k = 1, 2, ...), with fo a function on S into A. The sequence may be finite { fk : k = 0, 1, ... , N }, called the finite-horizon case, or it, may be infinite, the infinite-horizon case. Consider first the case of a finite horizon. Given a policy f = ( fo, f1, .. • N + 1 random variables X0 , X,,. , XN are defined, Xk being the state at time )).
)
,
..
519
520
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
k. The initial state X0 may be either fixed or randomly chosen according to a probability in o = {n 0 (x): x e S}. Given Xo = x 0 the conditional probability distribution of X l is given by ,
P(X1
= xl
X0
=
x0)
= P(x1; x0,.fo(xo)).
(1.2)
Given X0 = x 0 , X l = x 1 , the conditional probability distribution of X2 is given by
I
=
P(X2
=
x2 X0
x0 , X1
In general, given X. = of Xk is given by P(Xk = Xk
= fk(xo a0
x1)
Xl =
=
x1,
P(x2; x1, al),
•••‚
al
= f1(xo,f0(x0)
,
x1).
(1.3) the conditional distribution
Xk -1 = X k _ 1'
I XO = X0, Xl = X1, ... , Xk- I = xk- 1) = P(xk; Xk- 1, ak- 1), (1.4)
where the actions ak
x0,
=
ak
,
are defined recursively by
a0, x1, al,
...
(k = 1, 2, ... ‚ N),
, Xk-1r ak-1 , xk)
(1.5)
= f0(xo)•
The joint distribution of X0 , X 1 , 10.k(XO, X1,
...
, Xk):=
... ,
Xk is then given by
P(X0 = X0, X1 = xl, ... ,
_ iro( x0) Pt x1>x0
,
Xk = Xk)
ao)P(X2=x1 , a,)
.
..
P(Xk;Xk-1>ak-1) ,
(k = 1,2,...,N) (1.6) with the states a 0 , a 1 , ... defined by (1.5). Expectations with respect to nö,N will be denoted by o The kth period reward function is a real-valued function g k on S x A such that if the state at time k is X k and the action at time k is a k , then a reward g k (x k , a k ) accrues. The total expected return for a policy f is given by
En .
N Jao
—
Eaak=0 Z gk(Xk, ak),
(1.7)
where a k denotes the (random) action at time k determined by the states X0 , X 1 X,,_ 1 according to (1.5). In case n o = S write Jx for J Let F denote the set of all policies. The object is to find an optimal policy, i.e., a policy f* = ( f ö, ... , f N) such that ,
...
X
‚
J
for all initial distributions
110
J` o for all f e F,
no.
0.
(1.8)
521
FINITE-HORIZON OPTIMIZATION
In view of the fact that the probability of transition to a state x k+ , in period k + I depends only on the state X k and action a k in period k, it is plausible that the search for an optimal policy may be confined to the class of policies f = ( fo fl , . .. 'f) such that the action a k depends only on the state x k in period k. In other words, fk is a function on S into A (k = 0, 1, ... , N). Such policies are called Markovian. The reason for this nomenclature is the following proposition. ,
Proposition 1.1. If f = ( f,,
fl , . .. , fN ) is a Markovian policy, then X0 , X,, ... , XN is a Markov chain (which is, in general, time-inhomogeneous).
Proof. By (1.6) one has
P(XO = x0, X, = x,, ... , XN = xN) = lr0(x0)P(x,; xo,fO(xO))P(x2; x1,Jl(x,))
= 71 o(xo)P1(xl; xo)P2(x2; x1)
...
..
.
P(xN; xN—
I
,
fN-1(xN -1))
(1.9)
PN(xN; xN -1),
where Pk (xk; xk
1)
= P (xk; xk l , fk — 1(xk — 1)) • (
1.10)
Hence X0 , X 1 , ... , XN forms a finite Markov chain with successive one-step transition probabilities P,, P2, ... , PN. n Let us show that a Markovian optimal policy exists. For each element f N(y) (perhaps not unique) of A such that max 9 (Y, a) = 9N(Y,fN(Y)). ,
yE
S pick an
(1.11)
aEA
Write hN(Y) = 9N(Y9fN(Y))•
(1.12)
Next, for each y E S find an element f N - 1 (y) of A such that max [9N -1(Y> a) + aEA
I
hN(z)P(z; Y, a)
2E$
= 9N -1(Y,f -1(Y)) +
J
X hN(z)P(z;Y,.IN -1(Y)) = hN -1(Y), (1.13) ZES
522
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
say. In general, let f k be a function (on S into A) such that max 9k(Y, a) + I hk+ l(z)P(z; Y, a) = 9k(Y, f aeA
L
J
ZES
(Y)) + Y hk+ 1 (z)p(z;
Y, fk (Y))
ZES
=hk(Y),
(k= N— 1,N-2,...,0) (1.14)
say, obtained successively (i.e., by backward recursion) starting from (1.11) and (1.12).
Theorem 1.2. The Markovian policy f* (f 0* f
i . .. , f N) is optimal.
Proof. Fix an initial distribution n o . Let f = (fo , f1 , . .. , fN ) be any given policy. Define the policies k f^ = (fo'f . .. {'
.. . >fN) (k = 1,... ,N),
{'
f(0) = f*
(1.15)
We will show that (k = 0, 1, 2, ... , N).
Jn o < J f 'ö'
(1.16)
First note that the joint distribution of Xo , X 1 , ... , Xk is the same under f and under f (k) (i.e., nö, k = it). In particular, k-1
k-1
9k'(Xk,, a k .) = E o ' Z g k •(Xk ., a k .).
ERO
(1.17)
k'=0
k'=0
Now, Erto{9N(XN , aN) I XO
= X0, X1 = X 1 , . .. , XN = XN] = 9N(XN, aN) ' 9N(XNr f *I \\= hN(XN) = Eao ' 19N(XNr aN) I X O = x0, ... , XN = XN].
Since nö.N = nö N, averaging over x o , x 1 , .
.. , X N ,
(1.18)
one gets
En09N(XN aN) <, Erto'9N(XN, aN). ,
(1.19)
Together with (1.17) with k = N, this proves (1.16) for k = N. To prove (1.16) for k = N — 1, note that Es,C9N-1(XN-1, aN-1) + 9N(XN, aN) I Xo = xo, . .. , XN-1 = XN-1]
=
9N-1(XN-1 , aN-1)+ Y- 9N(z,aN(Z))P(Z;XN-1,aN-1) (1.20) zes
FINITE-HORIZON OPTIMIZATION
523
where a N _ 1 , a N (z) are defined by (1.5) with x N replaced by z. By (1.12) and (1.13), the right side of (1.20) does not exceed hN-1(xN -1)
=
En0 'I [ gN-1(XN - 1'aN -1) +gN(XN,aN)
XO =
Since nö.N 1 =
1,
nö N
Eno(9N_,( XN -1 aN -1) ,
x0 > • - ' , XN 1 = xN - 1] • ( 1.21)
it follows by averaging over x 0 , x 1 , ... ,
+ 9N(XN, aN)) <, Erzo
11
(9N-1( XN -1 aN -1) ,
X_ 1
that
+ 9N(XN, aN))• (1.22)
Together with (1.17) for k = N — 1, (1.22) yields (1.16) for k = N — 1. To complete the proof we use backward induction. Assume that r
E no
N .
9k'(Xk„ ak')
I XO = XO, • • • , Xk
hk+11/
+l = Xk+I
ll•23) (1.23)
k'=k+1
for all x 0 ,x 1 ,...,x k + , eS and N
.) IX I (X.
Enä`"
9k' k>a k
k'=k+1
k+l
for all X k+ 1 e S. Then N
Eno
E
gk,(Xk., a k ')
X0 = x 0
, ...
,
Y_
= x k+l
N
gk(xk , ak)
+ Ea o Eao
9k'(Xk', a k
,) I X
0 ,..
. , Xk +1
k'=k+1
XO = X 0 , X i = x 1 , ... , Xk = xk1
+ Eno[hk+ 1(Xk+ 1) I XO =
= 9k(xk , ak) +
(1.24)
Xk = xk
k'=k
= gk(xk, ak)
—h k+l(x k+l ) —
hk+1(z)p(z; Xk, ak)
X0,
<
.
..
, Xk = Xk] (1.25)
hk(xk)•
=ES
The first inequality in (1.25) follows from (1.23), while the last inequality follows from (1.14). Hence (1.23) holds for k. Also, hk(xk) = gk(xk, f (xk))
+
Y_
1] hk+ I (z)p(z; xk, f (xk)) =ES
= Eno ' [gk(Xk , ak) + hk+ 1(Xk +1) I Xk = Xk] N
r1kl
En o [gk(Xk, ak) +
k'=k +I
I
gk'(Xk', ak') Xk = Xk
(1.26)
524
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
where the last equality follows from the facts that: (i) conditional distributions of (Xk+ 1, under nö N and nä N ", and (ii) one has
, XN )
given Xk+ 1 are the same
N
E
E
gk'lXk'^ / ak') I Xk+l
7[p 1 Rp
Xk
k'=k+1 N —
E 110 1nol E
E gk'(Xk'^ ak') ( Xk^ Xk+l Xk [
k'=k+l
N
=t o gk'(Xk,, a k •) Xk .
(1.27)
k'=k+1
Hence, (1.24) holds with k + 1 replaced by k, and the induction argument is complete, so that (1.23), (1.24) hold for all k = 0,1, ... , N — 1. Thus, one has N %no =
E ao
y- gk'( Xk' , ak) k'=0
k-1
E>
l
N
gk'(Xk,, ak') + E
0 gk'(Xk', ak') I
k'=0
k'=k
r
k-1
N
= E>
L
/
k'=k
Xk])
k-1
E tt0
Y
gk'(Xk'^ ak') k'=0
+ EROhk(Xk)
k-1 (Ikl
=
E no
r
(Ikl
: gk'(Xk', ak') + Eao hk(Xk) k'=0 N
(Ikl = En o gk'(Xk.,
(Ikl
Q k .) = J np .
(1.28)
k'=0
Letting k = 0, the desired result is obtained.
n
Exactly the same proof works for the more general result below (Exercise 3). Theorem 1.3. Assume S is countable, A compact metric, and the reward functions a —p g k (x, a) are bounded and continuous for each x a S. Then f* =(f ,f .•••..f*)isoptimal. There are several useful extensions of the problem of optimization over a finite horizon considered in this section. First, the set of all possible actions when in state x may depend on x. Denote
525
THE INFINITE-HORIZON PROBLEM
this set by A r (x e S). In this case one simply maximizes over a e A y in (1.11)—(1.14) (Exercises 1, 2). Secondly, instead of maximizing (rewards), one could minimize (costs). In all the results above, max is then replaced by min, and inequality signs get reversed (Exercise I ). One may often derive existence of optimal policies when S is a continuum, g k are unbounded, and A is noncompact, if certain special structures are guaranteed. For example, if S, A are intervals (finite or infinite) and the functions a —' gk(x, a), gk(x, a)
+
hk+ 1(z)p(dz; x, a)
J
(0 < k < N — 1)
have unique maxima in A, then all the results carry over. Here (i) p(dz; x, a) is, for each pair (x, a), a probability measure on (the Borel sigmafield of) S, (ii) hk +1(z) is integrable with respect to p(dz; x, a), and its integral is a continuous function of (x, a). See Section 5 for an interesting example. Finally, the rewards g k (0 <, k < N) may depend on the state and action in period k, and the state in period k + 1. This may be reduced to the standard case considered in this section. Suppose gk(x, a, x') is the reward in period k if the state and action in period k are x, a, and the state in period k + I is x'. Then define gk(x,
a) := I
gk(x,
Then for each policy f, N
Y_
(1.29)
N
g(X , ak, Xk + 1) = Erzo
E rzo
a, z)p(z; x, a).
k=0
I
E(gk(xk, ak, Xk+ 1) X0,
• • • , Xk)
k=0 N
=
Eno
gk(Xk, a k ).
(1.30)
k=0
Thus, the maximization of the first expression in (1.30) is equivalent to that of the last (see Exercise 2).
2 THE INFINITE-HORIZON PROBLEM As in Section 1, consider finite state and action spaces S, A, unless otherwise specified. Assume that sup Ig k (x, a)I) < oo . k=0 xOS.aEA
(2.1)
526
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
An important example in which (2.1) is satisfied is the case of discounted rewards g k (x, a) = 8 k g o (x, a) for some discount factor S, 0 < b < 1. Those who are interested primarily in this special case may go directly to the dynamic programming principle (appearing after (2.13)) and Theorem 2.2. In view of (2.1), given any E > 0 there exists an integer N E such that I gk(xk, ak) — j, gk(Xk, k k
ak) < 2 (
2.2
)
for all sequences {x k } in S and {a k } in A. Let (f o*`, f ; `, ... , f) be an optimal Markovian policy over the finite horizon {0, 1, 2, ... , NE }, and f*` = (f", f iE, ...) with f = f NE for all k> N. Then f*` is a Markovian policy that is E-optimal over the infinite horizon, i.e., given any policy f one has 'ino °= Eno L gk(Xk, ak) k=0 r
= E aoL ( 1 gk(Xk , ak)) +
E
0
k=0
N^
E
i En o k ^ gk(Xk, ak)J — 2
( (
Z gk(Xk, ak)) k=N,+ 1
ao
=gk(Xk, a k ))
_c=J
o
_c. (2.3)
i E^^ ( = O
By Cantor's diagonal procedure, one may find a sequence S„ j.0 and a sequence of functions { f,*a"} from S into A (k = 0, 1, 2, ...) such that (Exercise 1) lim f "(x) = f(x)
for all k = 0, 1, 2, ... , and for all x e S, (2.4)
n + ao
for some sequence of functions f,*. Since S and A are finite, (2.4) implies that f,*, b^(x) = f k (x) for all sufficiently large n. From this it follows that given any
integer N there exists n(N) such that IN
Eao^
gk(Xkr ak)) =
EIto
\k=0
gk(Xk, ak))
for n > n(N).
(2.5)
\k=0
In particular, for fixed E > 0, (2.5) holds for N = N E . In view of this and (2.2), one has for all n > n(N). •
s N`
1
E
E•a ,
„
JEäo k^ gk(Xk, ak) ) — 2 = E no
^
gk(Xk, ak)) — 2 i J n o — E. (2.6)
(k N'
Now apply (2.3) to this to get J`,ö > Jn 0 — S„ — E for all n > n(N), and then let n — cc to obtain JRO ? J,` 0 — E.
527
THE INFINITE-HORIZON PROBLEM
Since this is true for all e, it follows that Jrzö
?
J,` o for all f e F.
(2.7)
In other words, f* is an optimal Markovian policy. We have thus proved the following result.
Theorem 2.1. Under the hypothesis (2.1), there exists an optimal Markovian policy. We now consider a special case of Theorem 2.1, the case of discounted dynamic programming. Assume that
(2.8)
(k = 0, 1, 2, ...),
gk(x, a) = b k go(x, a)
where g o is a given real-valued function on S x A, and S is a constant satisfying 0 < 5
(2.9)
A Markovian policy f = (f0 , fl , ...) is said to be stationary if fk = fo for all k. It will be shown now that for the reward (2.8) there exists a stationary optimal policy. Let f* _ (f o*, f i , ...) be an optimal Markovian policy in this case (which exists by Theorem 2.1). Write Ez for E 0 and J,, for J 0 in the case n({x }) = 1. Then, bkgo(Xk, ak)
JI = Ex k=0
= 90(x, iö(x)) + SEzE
(5kg0(xk
^
+1,
ak+l) 1
k=0
= 9o(x, iö(x)) + 8
J^'P(z; x,.fö(x)),
X'J)
(2.10)
where f«" = (f ,*, f z, ...). Since f* is optimal, one has 1
Jf'" < J `
for all z e S.
(2.11)
JZ^P(z; x, iö(x)) = Jx`
(2.12)
Therefore, J
go(x,fö(x)) + a zes
where f* (1) = (f *, f *, fi , f *....)• But f* is optimal. Therefore, one must have
528
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
equality in (2.12). Similarly, one shows successively that f
in)
= 1 01
-11
(n = 1, 2, ...),
f *(0) = f*
(2.13)
where f *(n-1):_
(fo,JO, ...
IJo, flI f2l
...),
with fo appearing in the first n positions. By (2.2) it follows that the stationary policy f* = ( f * ,f *, f *' • • • 'f, ...) is optimal. We shall denote it by f*. This proves the first part of Theorem 2.2 below. The dynamic programming equation (2.14) below may be derived by letting k T oo in (1.14) with g k (x, a) = S k (x, a). However, an alternative derivation of Theorem 2.2 is given below. Let us describe first the so-called dynamic programming principle. Suppose there exists a stationary optimal policy f* = ( f *, f *, ...) in the case of discounted rewards (2.8) with the discount factor S satisfying (2.9). Let J, := Jz' denote the optimal reward, starting at state x. Fix x e S and a e A. Consider a Markovian policy f' = ( fo , f *, f *, . ..) for which f0 (x) = a, while the optimal policy is used from the next period onward. Since the conditional distribution of {Xk : k > 1}, given X 1 under f' is the same as that under f*,
J' go(x, a) + EX Eö (j b k go(Xk,.f *(xk)) X,
L
= go(x, a) + b
k
k=1
/]
= go(x, a) + SEX'JxI
1] J2p(z; x, a). Z
E.S
The optimal reward Jx must equal the maximum of Jz over all choices of f0 (x) = a E A. Thus, one arrives at the dynamic programming equation (2.14).
Theorem 2.2. Let g o be bounded and 0 < 6 < 1. Then for the discounted rewards (2.8) there exists a stationary optimal policy f* = ( f *, f *, ...). This policy satisfies the dynamic programming equation, or the Bellman equation, for each x e S, namely
JX + = max { 9 o (x, a) + b aES (
J f* p(z; x, a)} .
( 2.14)
zcs
Conversely, if a stationary Markovian policy f* satisfies (2.14) for all x e S, then it is optimal.
Proof. Denote by B(S) the set of all real-valued functions on S. For each
THE INFINITE-HORIZON PROBLEM
529
function f on S into A, define the maps T, Tf : B(S) --- B(S) by (TfJ)(x) = 90(x,.f(x)) + 6
1 J(z)p(z; x,f(x)), ZES
(2.15) (TJ)(x) = max { 9 o (x, a) + 6 aEA
For all J, J'
E
1 J(z)p(z; x, a) } ZES
((
(x ES, J e B(S)).
))
B(S) one has
II T} J — Tf J'll = max ITT J(x) — TJ J'(x)I = 6 max XES
XES
1<SIIJ
—
IY
(J(z)
— J'(z))p(z; x,.f(x))
ZES
(2.16)
J'll,
where II I denotes "sup norm", IIgII := max {lg(x)I: x E S }. Turning to T, fix Je B(S). Let the maximum in (2.15) be attained at a point f (x) (x es). Then, (TJ)(x)
= (T1 J)(x).
(2.17)
Also, note that for all J' E B(S) one has (TJ')(x)
> (Tf J')(x)
(x e S).
(2.18)
Combining (2.17) and (2.18), one gets, for all J', (TJ)(x)
— (TJ')(x) = (Tf J)(x) — (TJ')(x) <, (Tf J)(x) — T1 J'(x).
(2.19)
By (2.16) and (2.19), sup [(TJ)(x) xeS
— (TJ')(x)] < sup [(TJJ)(x) — (TfJ')(x)] < 611 J —
J'.
(2.20)
XES
This inequality being true for all J, J' e B(S), one gets (interchanging the roles of J and J'), IITJ
— TJ'jI <6IIJ—J'II.
(2.21)
Thus, T is a (uniformly strict) contraction on the space B(S). Therefore, it has a unique fixed point J *; i.e., TJ* = J* (Exercise 2), J*(x)
= max { 9 o (x, a) + b 1 aE`i
l
J *(z)p(z; x, a) } .
ZES
)
(2.22)
Let a = f *(x) maximize the right side of (2.22) (x e S). Then, J * (x)
= 9o(x,f * (x)) + b Z ZES
J * (z)p(z; x,.f * (x))
= Tf .J *(x).
(2.23)
530
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Now let f be an arbitrary policy. In view of (2.22) one has, for all k = 0, 1, 2.... (and all x), J*(Xk)
i g0(Xk, ak) + SE ' (J * (Xk+1) 1 {Xk, ak}).
(2.24)
Multiplying both sides by 8 k and taking expectations one gets S
k E f J * (Xk)
i BkExgo(Xk, ak) + 6k+IExJ*(Xk+l).
(2.25)
Summing (2.24) over k = 0, 1, 2,... and canceling common terms from both sides it follows that W (2.26)
h k ß f g0(Xk, ak) = J.
J * (x) i k=0
Now note that one has equality in (2.24)-(2.26) if f = f = (f *, f *, f *, ...) (see (2.23)). Hence f* is optimal. This proves that there exists a stationary optimal policy f* satisfying (2.14). Conversely, if (2.14) holds (i.e., (2.22) and (2.23) hold) then (2.24)-(2.26) (with equalities in case f = f*) show that f* is optimal. n For the map T defined in (2.15), (T N O)(x) is the N-stage optimal reward function for the discounted case, as may be seen from (1.11)-(1.14) and Theorem 1.2. Hence J*(x) = lim N -.(T N 0)(x) is the optimal reward function in the infinite-horizon case. The contraction property of T shows that T"'J(x) converges to the optimal reward function, no matter what J is. The proof of Theorem 2.2 extends word for word to the following more general case.
Theorem 2.3. Let S be countable, A a compact metric, g o bounded, and a -+ g o (x, a) continuous for each x. Then the conclusions of Theorem 2.2 hold. Further extensions are indicated in the theoretical complements. Example. Let S = 7L, the set of all integers, A = [a, 1 — a] with 0 < a <‚ and p(x + 1;x,a) = a, p(0;0,a)= 1 —a,
p(x — 1; x,a) = 1 — a a p( 1 ; 0 ,a)=2,
for x 0 0,
a p(-1;0,a)= 2 .
(2.27)
Take gk (x, a) = b k ^p(x) (0 < b < 1), where co is bounded and even (i.e., cp(x) = gyp(—x)), and cp(jxl) decreases as lxi increases. Then the hypothesis of Theorem 2.3 is satisfied. Under a stationary policy f = (f, f, . ..) the stochastic
531
THE INFINITE-HORIZON PROBLEM
process {Xk } is a birth-death chain on 7 with transition probabilities ß x ' = P(Xk+1 =
X
+ 1 I Xk = X) = f(x) ,
5 x := P(Xk+ 1 =x— 1 I Xk = X) = 1 —f(x)
(X 0),
YO' = P(Xk +1 = 1 I Xk = 0) _ ÖO = P(Xk +l = — l I Xk = O) = 1 — ßo — S O = P(Xk +l = O I Xk = O) = I — f(0).
f(0) , 2 (2.28)
Since the maximum of cp is at zero, and q(ix ) decreases as lxi increases, one expects the maximum (discounted) reward to be attained by a policy f* = (f* f*,.. .) which assigns the highest possible probability to move in the direction of zero. In other words, an optimal policy is perhaps given by ,
1 —a
f *(x) = a a
ifx<0,
if x > 0,
(2.29)
ifx =0.
Let us check that f* is indeed optimal. To simplify notation, write Ex for E. Assume, as an induction hypothesis, that the following order relations hold. (i) (ii)
(iii)
E x (p(Xk ) > Ex+ 1 p(Xk) E x 9(Xk )
E x+ ,cp(Xk )
E x cp(Xk ) = E_ x cp(XX )
for all x > 0, for all x
—1,
(2.30)
for all x.
Clearly, (2.30) holds for k = 0 in view of the assumptions on q. Suppose it holds for some k. Then for x > I one has, by (i), Exgp(Xk +l) = (1 — a)Ex-1(P(Xk) + aEx+1p(Xk) (1 — a)Ex(P(Xk) + aEx+2(P(Xk) = Ex+14(Xk +l),
(x > 1).
(2.31)
Similarly, using (ii), Exq (Xk +1) < Ex+1gq(Xk), ,
(x'< —1).
(2.32)
Also, (iii) is trivially true for all k if x = 0. For x > 0 one has, by (iii), EX^P(Xk +l) = (1 — a)Ex-1gq(Xk) + aEx+1(P(Xk)
= (1 — a)E- x +1gq(Xk) + aE_x-1ç(Xk)
= E -x(P(Xk +l). (2.33)
532
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Finally, using (2.31)-(2.33), / a a / Eogo(Xk+l) = ( 1 — a)EO(p(Xk) + 2 E-Igq(Xk) + 2 E1(P( Xk)
_ (1 — a)Eo^P(Xk) + aEI(P(Xk) (1 — a)Eo^P(Xk) + aE2(P(Xk) = EIp(Xk+l).
(2.34)
By (2.31) and (2.34), the relation (2.30(i)) holds with k replaced by k + 1. By (2.32), (2.30(ii)) holds for k + 1; in view of (2.33), (2.30(iii)) holds for k + 1. Hence the relations (2.30) hold for all k >, 0. To complete the proof that f* is optimal, let us now show that the expected discounted reward J x under f' given by 00
Jx :=
I,
(2.35)
6k E.(V(Xk)
k=0
satisfies the dynamic programming equation (2.14). That is, one needs to show ix =cp(x)+6max[(1 —a)JJ _ I +aJx+t ]
(x00),
aEA
(2.36)
J0 =rp(0)+6max (1 —a)Jo
+2J_ I +aJ,
]
.
[
Now, by (2.30(i)), Jx_I'>Jx+I
forx>,l,
so that, assigning the largest possible weight to the larger quantity J._ I , max [(1 — a)Jx _ 1 + aJx+l] = (1 — a)Jx _ 1 + aJ+l,
(x >, 1). (2.37)
aeA
But Jx satisfies the equation ao Jx = q (x) + b
bk-'Exp(Xk) k=1
ak-1 [(1 — a)E x-l^(Xk-1) + aEx+l9(Xk-1)]
(P(x) + (5 k=I
= q(x) + S[(1 — a)Js _ I + aJx+1],
(x ? 1).
(2.38)
Hence, Jx satisfies (2.36) for x >, 1. In exactly the same way one shows, using (2.30(ii)), that J. satisfies (2.36) for x < —1. Finally, since J-, = J 1 by (2.30(iii)),
OPTIMAL CONTROL OF DIFFUSIONS
533
and .J0 ?J 1 (= J_ 1),
_, + --J,, (2.39) max (1 —a)Jo +2J_, +-J 1 ]= (1 —a)Jo +2J acA
2
[
assigning the largest possible weight to J0 for the last equality. The proof of (2.36), and therefore of the optimality of f *, is now complete. In general, even for the infinite-horizon discounted case, it is difficult to compute explicitly optimal policies and optimal rewards. The approximations T"0, and the corresponding N -period optimal policy given by backward recursion (1.11)—(1.14), are therefore useful.
3 OPTIMAL CONTROL OF DIFFUSIONS
Let G be an open set c l, and consider the problem of minimizing the expected penalty E X f (X.. (; ) on reaching the boundary 0G, starting from a point x outside of G= G v 1G. Here {X,} is a diffusion on Fk" \ G, with absorption at 1G. The function f is a continuous bounded function on 1G. If f = I on 1G, the problem is to minimize the probability of ever reaching 13G. As usual TG denotes the first time to reach the boundary 1G. The drift vector µ(x, c) and diffusion matrix D(x, c) of the diffusion are specified. The control variable c, which may depend on the present state x, takes values in some compact set C c 18m • The function c(.) (on W'\G into C) so chosen is called a control, or a feedback control. The class of allowable, or feasible, controls may vary. For example, it may be the class of all measurable functions, the class of all continuous functions, the class of all continuously differentiable functions, etc. Let us give a heuristic derivation of the dynamic programming equation, by means of the so-called dynamic programming principle. Write Ex for the expectation when the diffusion has coefficients i(., c), D(., c) with c constant, and xis the initial state. If c(. ) is a control (function), this expectation is denoted EX^ , the diffusion having drift and diffusion coefficients µ(y, c(y)), D(y, c(y)). Write -)
Je(x) '= Ex i(X r ^), J` (x) °= EX ' f (Xr.•G), J(x)t= inf J(x), (3.1) (•)
( )
c(-
where the infimum is over the class of all feasible controls. Suppose there exists an optimal control c*(.). Consider the following small perturbation c l (•) from the optimal policy. c l (.) takes the constant value c for an initial period [0, t], after which it switches to the optimal control c *(. ). This control is of course time-dependent and not feasible under our description above. But one may actually allow time-dependent controls in the present development (theoretical
534
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
complement 1). One has, by the Markov property, J" ' (x) = EXJ(X,) + o(t), (
)
as t .j 0.
(3.2)
Note that if after time t the state is X, e Il k \ G, then the conditional expectation of f (Xt G ) is J(X,), as the optimal control c*(.) is now in effect. Also, the probability that t -> 'r OG is o(t) for x e Il k \ G (see Exercise 1 for an indication of this). As J" ' (x) > J(x), one gets ( )
EJ(X 1 ) — J(x) > o(t),
as t j 0.
(3.3)
But the left side is TIJ(x) — J(x), where {T',} is the semigroup of transition operators generated by 2
A` :
2
d " (x ' c)
a öx^'^axu^ + Y pl'>(x, c) a
(3.4)
with absorbing boundary condition on 8G. Hence, dividing both sides of (3.3) by t and letting t f, 0, one gets A,J(x) >, 0. (3.5) By minimizing the left side of (3.3) with respect to c, one expects an equality in (3.3) and, therefore, in (3.5); i.e., the dynamic programming equation is inf A,J(x) = 0
for XE R'\G,
J(x) = f(x)
for x E 8G. (3.6)
cec
We will illustrate the use of this rigorously in Example 1 below. Before this is undertaken, a simple proposition in the case k = 1, f - 1, directly shows how to minimize the expected penalty when the diffusion coefficient is fixed. Let p° ) (•) (i = 1, 2) be two drift functions, Q 2 (.) > 0 a diffusion coefficient specified on (0, oo), satisfying appropriate smoothness conditions. Let i/i°(z) denote the probability that a diffusion with coefficients p ( O(• ), 6 2 (.) reaches 0 before reaching d, starting at z. Here d > 0 is a fixed number. It is intuitively clear that, for the same diffusion coefficient a 2(.), larger the drift tc(.) better the chance of staying away from zero. Here is a proof. Proposition 3.1. Consider two diffusions on [0, oo ), with absorption at 0, having a common diffusion coefficient a'(•) but different drifts p(•) (i = 1, 2) 1(z) for all z > 0. satisfying p ( '>(z)'< j 2 ^(z) for every z > 0. Then /i 2 (z) Proof. By relation (9.19) in Chapter V, = ^i^`^(z)
2 11()(y) a (— fu 2µct_ (y) dy du. (3.7) du exp xp{ ( e } 40, —fo" a2(y) fZ o a 2 (y)
535
OPTIMAL CONTROL OF DIFFUSIONS
Write `I( z ) — pti^(z), ^(z):= y
i (z) ° =µt' (z) + srl(z),
and let F(r; z) denote the probability of reaching 0 before d starting at z, for a diffusion with drift coefficient µ E ( • ) and diffusion coefficient o2(.). It is straightforward to check (Exercise 2) that F(a; z) = /it i ^(z)(1 — ey(z)) + O(s 2 )
as e —• 0,
(3.8)
where
f
^ µ ti ^
a
y(z):=
2
exp
J
o
(Y) rl(Y)
2 dy 6 (Y)
22
0
6
dy du
(Y)
I
°
exp —
0
Y) 2 pta^( z dy du. 6
(Y}
z(3.9)
Since rl(z) >, 0 for all z, it follows that y(z) >0
unless µt' (.) = )
de
µt 2) (• ).
(3.10)
Therefore, barring the case of identical drifts, one has
F(e;
_ —^itn(z)Y(z) < 0.
z)
(3.11)
e=0
Hence, F(c; z) is strictly decreasing in e in a neighborhood of a = 0, say on [0, e o ). Since one can continue beyond s o , by replacing µt l) (•) by j(.), the supremum of all such e o is oo. In particular, one can take e = 1. n Example 1. (Survival Under Uncertainty). Consider first the following discrete-time model, in which h denotes the length of time between successive periods. Let Z, denote an agent's capital at time t. His capital at time t + h is given by Zo =z >0, Z,
=exp {W +n }(Z,—ah),
(t = nh; n=0,1,...)
(3.12)
where a is a positive constant and { W h : n = 1, 2, ...} is a sequence of i.i.d. Gaussian random variables with mean mh and variance vh. The constant a denotes a fixed consumption rate per unit of time, so that Z, — ah is the amount available for investment in the next period. The random exponential term represents the uncertain rate of return. For the Markov process {Z n h }, it is easy
536
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
to check (Exercise 3) that E(Z, +h ( Z,) = exp{mh + Zvh}(Z, — ah), E(Zt+ ^, I Z,) = exp{2mh + 2vh}(Z1 — ah)', E(Z, + „—Z,IZ,)=[(m+2v)Z,—a]h+o(h),
(3.13)
E((Zj+h — Z,) 2 ( Z,) = vZI h + o(h), E(IZ,+n — Z t 1 3 Zr )=o(h),
as h j0.
Therefore, one may, in the limit as h j 0, model Z, by a diffusion with drift t(.) and diffusion coefficient Q 2 (.) given by p(z) = (m + -v)z — a,
a2(z) = vz 2 .
(3.14)
The state space of this diffusion should be taken to be [0, cc), since negative capital is not allowed. One takes "0" as an absorbing state, which when reached indicates the agent's economic ruin. In the discrete-time model, suppose that the agent is free to choose (m, v) for the next period depending on the capital in the current period. This choice is restricted to a set C x (0, cc). In the diffusion approximation, this amounts to a choice of drift and diffusion coefficients, µ(z) = (m(z) + iv(z))z — a,
o2(z) = v(z)z 2 ,
(3.15)
such that (m(z), v(z)) E C
for every z e (0, co).
(3.16)
The object is to choose m(z), v(z) in such a way as to minimize the probability of ruin. The following assumptions on C are needed: (i) 0 < v * := inf{v: (m, v)€ C} < v* := sup{v: (m, v) E C} < oo, (ii) f (v) _= sup{m: (m, v) e C} < co, and (f (v), v) E C for each v e [v * , v*] , (iii) v -+ f(v) is twice continuously differentiable and concave on (v * , v*), (iv) lim l ^,, f'(v) = cc, Iim
t v.
f'(v) = — cc.
(3.17)
First fix a d > 0. Suppose that the agent wants to quit while ahead, i.e., when a capital d is reached. The goal is to maximize the probability of reaching d before 0. This is equivalent to minimizing the probability O(z) of reaching 0 before d, starting at z. It follows from Proposition 3.1 that an optimal choice of (m(z), v(z)) should be of the form (m(z) = f(v(z)), v(z))-
(3.18)
537
OPTIMAL CONTROL OF DIFFUSIONS
The problem of optimization has now been reduced to a choice of v(z). The dynamic programming equation for this choice is contained in the following proposition. To state it define, for each constant v E [v * , v*], the infinitesimal generator A„ by (A,g)(z):= zvz 2 g"(z) + {(f(v) + zv)z — a}g'(z).
(3.19)
For a given measurable function v(.) on (0, cc) into [v * , v *] define the infinitesimal generator A v( . ) by (A V( . ) g)(z)-= zv(z)z 2 g"(z) + {(f(v(z)) + 1 v(z))z — a}g'(z).
(3.20)
Fix d > 0. Let /i,, ( . ) (z) denote the probability that a diffusion having generator A, ( . ) , starting at z e [0, d], reaches 0 before d. For simplicity consider v(.) to be differentiable, although the arguments are valid for all measurable v(.) (see theoretical complement 2). Write ^ (z):= inf i , ( . ) (z).
(3.21)
Proposition 3.2. Assume the hypothesis (3.17) on C. Then the dynamic programming equation min A,,O(z) =0 VE
I V,R .
for0
lim^i(z) =1,
lim^i(z) =0,
Z10
V * 1
_
td
(3.22)
has a solution that is the minimal probability of ruin . For each z E (0, d) there exists a unique 6(z) E [v * , v*] such that the minimum is attained at v = v(z). The function v is strictly decreasing on (0, cc) and is optimal. Proof. In the case v * = v*, Proposition 3.1 already provides the optimum choice. We therefore assume v * < v *. Let çli(z) be a twice continuously differentiable function satisfying (3.22), and v(•) a (measurable) function such that the minimum in (3.22) is attained at v = v(z). Then, for 0 < z < d;
A f (z) = 0
lim s(z) = 1,
lim ii(z) = 0. (3.23)
Z10
zta
Then (see (3.20)), z2
r
(z) _ — 17(z) {(f(v(z)) + iv(z))z — a}'(z).
(3.24)
Using this in (3.22), one gets min VE IV..V ;j
[ —v {(f(v(z)) + zv(z))z — a}/v(z) + (f(v) + zv)z — a]^i'(z) = 0. (3.25)
538
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
It follows from (3.23), since its solution is unique (Exercise 7.7(iii), (iv) of Chapter V), that i(z) is the probability of reaching zero before d, starting at z, for the diffusion generated by A, ( . ) . In particular, i'(z) < 0. One may actually explicitly calculate i'(z) from (9.19) in Chapter V. Therefore, to determine v(z) we find the critical point(s) of the expression within the square brackets in (3.25) by setting its derivative equal to zero. This leads to
(i(v(z)) + Zv(z))z — a = (i v(z)
(3.26)
(v) + z)z,
or, .f(v(z)) — v(z)f'(v) = a
z
Since this must hold for v = v(z), the equation to solve is f(v)
— vf'(v) = a.
(3.27)
z
As the derivative of the left side is — v f "(v) > 0, the left side increases from — oo to + co as v increases from v * to v*. Therefore, for each z > 0 there exists a unique solution of (3.27), say v = v(z) e (v * , v*). Since the right side of (3.27) decreases as z increases, it follows that the function v(•) is strictly decreasing. It remains to show that cui is minimal. For this, consider an arbitrary (measurable) choice v(.). It follows from (3.22) that (A V( . ) I4')(z) >, 0
for all z e (0, d).
(3.28)
Since (A V( . ) ct' V( . ) )(z) = 0 and >/i and ^i v( .^ satisfy the same boundary conditions, letting u(z):= ii(z) — 0,,. ß (z), one gets (A, ( . ) u)(z) > 0
for all z e (0, d),
lim u(z) = 0,
lim u(z) = 0. (3.29)
zj0
zld
By the maximum principle (see Exercise 7.7(iii) of Chapter V) it follows that u(z) <, 0 for all z, i.e., ^i(z)
< 0 . (z) '( )
for all z E [0, d].
(3.30)
n It is important to note that the optimal choice µ(Y)'= (f( ►'(Y)) + z'(y))y — a, i(y),
(3.31)
539
OPTIMAL CONTROL OF DIFFUSIONS
does not depend on d. Therefore, with this choice one also minimizes the probability p Zo of ever reaching 0, starting from z. For this, simply let d j oo in the inequality (3.32) (.(.)(z) <' ^1l ((z). An example of a C satisfying (3.17) is the following set bounded by the lines v = v o ± ß, and a semi-elliptical cap (bounding m away from + co), C ={(m,v):vo—ß^v'
—(v v0)2 1 / 2
(3.33)
Here a, ß,v o are positive constants, v o > ß, and m o is an arbitrary constant. Next consider the problem of maximizing the expected discounted reward J(x) ;= E c( ' )
0
e
ßs r(XS,
c(XS)) ds,
(3.34)
where the state space of the diffusion {X 1 } is an open set G c W. The diffusion has drift µ(y, c(y)) and diffusion matrix D(y, c(y)); r(y, c) is the reward rate when in state y and an action c belonging to a compact set C c 11" is taken, and ß> 0 is the discount rate. The given functions ph i) (y, c) (1 < i < k) are appropriately smooth on G x C, as are d(y, c) (1 < i, j < k), and D(y, c) is positive definite. Once again, the set of feasible controls c(.) may vary. For example, it may be the class of all continuously differentiable functions on G into C. The function r(y, c) is continuous and bounded on G x C. If, under some feasible control c(. ), the diffusion {X,} can reach öG with positive probability, then in (3.34) the upper limit of integration should be replaced by rO G • Let J(x):= sup J`O(x). a.)
(3.35)
In order to derive the dynamic programming equation informally, let c *(.) be an optimal control. Consider a control c 1 (.) which, starting at x, takes the action c initially over the period [0, t] and from then on uses the optimal control c *(. ). Then, as in the derivation of (3.2), (3.3), J(x) = E`X o e -a sr(X S , c) ds + EXr )
=
E`
fo,
= E` J
5
^" f
e -Qs r(Xs,
e ßsr(X,, c) ds + EX^'Ie ` -
o e - ßsr(X s , c)
e 'o
ds + e - Q`ExJ(X,)
= t(r(x, c)) + o(t) + e - ß`T,J(x) < J(x),
-s
c(XS)) ds
ri r(X^ +s•, c(Xr +s•)) ds'
540
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
or, (1 —en')
t
J(x) > r(x, c) + e-ß'
T,J(x) — J(x)
t
+ o(1).
(3.36)
Letting t l 0 in (3.36) one arrives at ßJ(x) > r(x, c) + A c J(x).
(3.37)
One expects equality if the right side of (3.37) (or (3.36)) is maximized with respect to c, leading to the dynamic programming equation ßJ(x) = sup {r(x, c) + A,J(x)).
(3.38)
Cec
In the example below, k = 1. Example 2. The state space is G = (0, oo). Let c denote the consumption rate of an economic agent, i.e., the "fraction" of stock consumed per unit time. The utility, or reward rate, is c) = (cx)1 r(x, (3.39) y
where y is a constant, 0 < y < 1. The stock X, corresponding to a constant control c is a diffusion with drift and diffusion coefficients µ(x, c) = Sx — cx,
o 2 (x, c) = v 2 x 2 ,
( 3.40)
where 6 > 1 is a given constant representing growth rate of capital, while c is the depletion rate due to consumption. For any given feasible control c(•) one replaces c by c(x) in (3.40), giving rise to a corresponding diffusion {X1 }. It is simple to check that such a diffusion never reaches the boundary (Exercise 4). The dynamic programming equation (3.38) becomes ßJ(x) = max — x' + zQ 2 x 2 J"(x) + (Sx — cx)J'(x) ? . CE[0,1]
(3.41)
y
Let us for the moment assume that the maximum in (3.41) is attained in the interior so that, differentiating the expression within braces with respect to c one gets cY-'xY
= xJ'(x),
or, c*(x) =
1 (J'(x))' icy
x
-1)
.
( 3.42)
541
OPTIMAL CONTROL OF DIFFUSIONS
Since the function within braces in (3.41) is strictly concave it follows that (3.42) is the unique maximum, provided it lies in (0, 1). Substituting (3.42) in (3.41) one gets 262x2)„(.x)
+ SxJ'(x) — \
— ßJ(x) = 0.
I—
(3.43)
Y/
Try Try the "trial solution” J(x) = dxv.
(3.44)
Then (3.43) becomes, as x 1 is a factor in (3.43), -1) — ßd = 0, a 2 dy(v — I) + döy — (1 — I I (dv)rrw
vl
(3.45)
i.e., d =
1
v ß
1 —y
+ .a 2 v(l —
1-v (3.46)
-- -
y) —
ay
Hence, from (3.42), (3.44), and (3.46), 72
ii^ v^ = ß +± Y( 1 =r — SY . c * := c * (x) = (yd) 1 )
I —y
(3.47)
Thus c*(x) is independent of x. Of course one needs to assume here that this constant is positive, since a zero value leads to the minimum expected discounted reward zero. Hence, ß + a 2 y(1 — y) — by > 0.
(3.48)
For feasibility, one must also require that c* < 1; i.e., if the expression on the extreme right in (3.47) is greater than 1, then the maximum in (3.41) is attained always at c* = 1 (Exercise 5). Therefore,
c* = max {1, d' },
(3.49)
where d' is the extreme right side of (3.47). We have arrived at the following. Proposition 3.3. Assume 6 > land (3.48). Then the control c*(•) = min{ 1, d'} is optimal in the class of all continuously differentiable controls. Proof. It has been shown above that c*(. ) is the unique solution to the dynamic programming equation (3.38) with J = J` * . Let c(•) be any continuously differentiable control. Then it is simple to check (see Exercise 6) that ßJ` ( ) (x) = r(x, c(x)) + A c( - ) jc '(x). (.
(3.50)
542
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Using (3.38) with J = J one then has for the function (3.51)
h(x) := J(x) — J` ' (x), (
)
the inequality
ßh(x) = r(x, c*) + A,.J(x) — {r(x, c(x)) + A C( . ) J` ' (x)} (
)
r(x, c(x)) + A C( . ) J(x) — {r(x, c(x)) + A C( . ) J` ' (x)} = A C( . ) h(x), (3.52) ( )
or, g(x) := (ß — A C( . ) )h(x) >, 0.
(3.53)
Since the nonnegative function hi(x):= EX )(J e 0
ßs
g(XS)ds),
(3.54)
satisfies the equation (Exercise 6) (ß — A«.^)h1(x) = g(x),
(3.55)
one has, assuming uniqueness of the solution to (3.55) (see theoretical complement 3), h(x) = h l (x). Therefore, h(x) >, 0. n Under the optimal policy c* the growth of the capital (or stock) X, is that of a diffusion on (0, co) with drift and diffusion coefficients
µ(x):=
(ö
—
c*)x,
u'(x):= u 2 x 2 .
(3.56)
By a change of variables (see Example 3.1 of Chapter V), the process { Y := log X} is a Brownian motion on O' with drift 6 — c* — ZQ Z and diffusion coefficient a'. Thus, {X,} is a geometric Brownian motion.
4 OPTIMAL STOPPING AND THE SECRETARY PROBLEM On a probability space (S2, .`, P) an increasing sequence {.: n >, 0} of subsigmafields of are given. For example, one may have 3„ = a{Xo , X1 , ... , X„}, where {X„} is a sequence of random variables. Also given are real-valued integrable random variables {Y,,: n > 0}, Y„ being .-measurable. The objective is to find a {}-stopping time r,*„ that minimizes EYT in the class ✓ , of all {.y„}-stopping times T < m. One may think of Y„ as the loss incurred by stopping at time n, and m as the maximum number of observations allowed.
543
OPTIMAL STOPPING AND THE SECRETARY PROBLEM
This problem is solved by backward recursion in much the same way as the finite-horizon dynamic programming problem was solved in Section 1. We first give a somewhat heuristic derivation of rm. If S = . . . .‚, X„} (n >, 0), and X0 , X 1 ,. . . , Xm _ 1 have been observed, then by stopping at time m — 1 the loss incurred would be Ym _ 1 . On the other hand, if one decided to continue sampling then the loss would be Ym . But Ym is not known yet, since Xm has not been observed at the time the decision is made to stop or not to stop sampling. The (conditional) expected value of Ym , given X0, ... , Xm_1, must then be compared to Ym _ 1 . In other words, rm is given, on the set {im >,m— l}, by if Ym_1 < E(Ym I °Fm-1 ),
_Im — 1
2* m
if Ym -1
m
(4.1 ) > E(Ym I `gym - 1)•
As a consequence of such a stopping rule, one's expected loss, given {X0 ,...,Xm — I },is
Vm _, :=min{Ym _ 1 , E(Ym I `gym -1)}
on
{t,
>, m — 1 }.
(4.2)
Similarly, suppose one has already observed Xo , X 1 , ... , Xm _ 2 (so that r m — 2). Then one should continue sampling only if Ym _ 2 is greater than the conditional expectation (given {X0 , . .. , Xm _ 2 }) of the loss that would result from continued sampling. That is, — 2
T m
m
>m
—
if Ym -2 < E(Vm -1 I .m-2), if Ym -2 > E( Vm -t I'Fm -2) on {Tm i m
I
—
(4.3)
2
}.
The conditional expected loss, given {Xo , ... , Xm _ 2 }, is then Vm - 2 :=min{Ym _ 2 , E(Vm -1 I'm -2)}
on
{t„, i m
— 2 }.
(4.4)
Proceeding backward in this manner one finally arrives at 0 m
>1
if Yo ` E(V1 I .Fo), ifYo>E(Vil^o)
on{t0} =52.
(4.5)
The conditional expectation of the loss, given .moo , is then Vo := min{ Yo , E(V1 )} .
(4.6)
More precisely, V, are defined by backward recursion, Vm :=Ym ,
Vj:=min{Y^,E(Vj
1
I.3j )}
(j = m-1,m-2,...,0), (4.7)
544
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
and the stopping time Tm is defined by r:= min{ j: 0 < j < m, }, = Vj }.
(4.8)
Although the optimality of im is intuitively clear, a formal proof is worthwhile. First we need the following extension of the Optional Stopping Theorem (Chapter I, Theorem 13.3). A sequence {S S : j >, 0} is an {.9j }-submartingale if E(SS I .Fj _,) > SS _ 1 a.s. for all j >, 1. Theorem 4.1. (Optional Stopping Theorem for Submartingales). Let {S; : j > 0} be an {.}-submartingale, r a stopping time such that (i) P(r < cc) = 1, (ii) E^SZ C < oc, and (iii) lim n . E(S,„l {t>m) ) = 0. Then EST >, ESo , with equality in the case {SS } is a {3}-martingale. Proof. The proof is almost the same as that of Theorem 13.1 of Chapter I if we write XX := SS — SJ _ 1 (j >, 1), X0 = Sp, and note that E(XX I .9j _ 1 ) 0 and accordingly change (13.8) of Chapter Ito E(Xjlj)) = E[ 1 {r j)E(Xj ( . _ 1)]
% 0,
and replace the equalities in (13.9) and (13.11) by the corresponding inequalities.
n The main theorem of this section may be proved now. Theorem 4.1 is needed for the proof of part (c), and we use it only for r <, m. Theorem 4.2. (a) The sequence {l': 0 < j < m} is an {$ ; }-submartingale. (b) The sequence { Vrm „ j : 0 <, j < m} is an {S%}-martingale. (c) One has E(Y,) > E(V0
VT E $„
)
E(Y.) = E(VO ).
(4.9)
That is, r is optimal in the class ✓,,,.
Proof (a) By (4.7), V < E(
+1
(b) Let Z be an arbitrary F;-measurable bounded real-valued random variable. We need to prove (see Chapter 0, relation (4.14)) E(ZVV.Ai)=E(ZV.A(J +i))
(0<j'<m— 1).
(4.10)
For this, write E(ZV,.
,i)
= E(ZV,. , j 1 (,,J;) + E(ZVL,AJl1, >J))
= E(Zl'
A j
I (rm , j) ) + E(ZVV l st ,, > J) ).
(4.11)
545
OPTIMAL STOPPING AND THE SECRETARY PROBLEM
But, on {tm >j}, Vj = E(VJ+ , I S). Also, {rm >j} E Ft . Therefore, E(ZVjl
{t.,
>i}) = E(Zllr.,>J}E(V;
+i
I ,;))
= E(ZV, + , i{T ,>i)) = E(ZV^n,A(J+1)l1t,,>i}).
(4.12)
j }. A (j + 1) on {r Using (4.12) in (4.11) one gets (4.10), since zm A j = (c) Let r E .9,,. Since YY >, V^ for all j (see (4.7)), one has Yt , V. By (4.7), Theorem 4.1, and the submartingale property of {Vj } it now follows that
(4.13)
E(Y,) >, E(VT ) > E(V0 ).
This gives the first relation in (4.9). The second relation in (4.9) follows by the n martingale property of { Vtm „ J } (and Theorem 4.1). Note YL ^, = V^.,.
Y„
Y„
In the minimization of EYt over .9 , need not be .-measurable. In such cases one may replace by E( Y„ .) = U„, say, and note that, for every t C-
j)] Z_ m
m
m
E(Y) _ I E(Y.iItT=1)) = I E[E(Yi 1 J^=jJ I J= 0 l=0
=
J=0
E[ 1 1t=r)UJ) = EUt .
Hence the minimization of EYE reduces to the minimization of EUr over .%,,,. Also, instead of minimization one could as easily maximize EYT over .9,„. Simply replace min by max in (4.7), and replace ">," by "<," in (4.9). This is the case in the following example, in which the indexing of the random variables and sigmafields starts at j = I (instead of j = 0). Example 1. (The Secretary Problem or the Search for the Best). Let m> 1 labels carrying m distinct numbers be in a box. A person takes out one label at a time at random, observes the number on it and sets it aside before the next draw. This continues until he stops after the rth draw. The objective is to maximize the probability that this rth number is the maximum M in the box. Thus, if Wj = 1 or 0 according as the jth number X; drawn is the maximum or not, one would like to maximize EW over the class ✓ of all stopping times r(<, m) relative to the sigmafields .Fj := a{X I , ... , X3 }, I <, j '< m. Define the .F -measurable random variables ;
Y` =E(WiI.Fi)=P(XX=MI ).
(4.14)
If XX is the maximum, say Ml , among X 1 ,. . . , XX , then the condition probability (given X,, ... , XX ) that it is the maximum in the whole box is j/m; if X3 < MM then of course this conditional probability is zero. Therefore, j
Y = m l(x;=Mj1'
(4.15)
546
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Also, for every z e .9 , as explained above, m
m
EYT =
E(Y;lt
i }) _
=1
i =1
m
E[E(Wi I X1) 1 1 , =i}] _
E[E( 1 tt=i} 14 I $5)]
i =1 m
E(I t== Wj ) = E.
_
1=1
Hence the maximum of EWt over 9-m is also the maximum of EYt over ✓m . In order to use Theorem 4.2 with min replaced by max in (4.7) and ">" replaced by "<," in (4.9), we need to calculate Vj (1 < j < m). Now Vm = Ym and (see Eq. 4.15) E(YmI' m-1 )=
mm P(Xm=M
mI^m 1
1
(Mm =M).
)= m
Since (m — 1)/m > 1/m, it then follows that J 1
Vm -i °=max {Ym-1 E(Ym I ^m-1) } ,
=
m-1 1 1 {X + m 1 {X--i<M.,,- } Mm -i)
Then, E(
m-1 , I ^m 2) =
m—
m
1
_
m
P(Xm-1 = Mm-1 (gym-z) + 1— P(Xm - 1 < Mm-1 m-z) m
1 lm-2 m-2 1
+—
m-1
mm-1
_
I
+
1
m m-2 m-1 (
(4.16)
To evaluate Vm-2'=max {Ym-2,E(Vm-1 m-2)} note that if (m _2)_1 + (m — 1)' < 1 then (see Eqs. 4.14 and 4.16) m-2
1
m-2(
V m z = m 1txm-2M.-z}
m m-2
1 + m — 1 ltxm-2
so that a calculation akin to (4.16) yields m-3
m Iz m 3)= E( V
1
1
1
m m-3+m-2+m-1
547
OPTIMAL STOPPING AND THE SECRETARY PROBLEM
Assume, as a backward induction hypothesis, that
11 m —I m j j+l
1
(4.18 )
J
for some j such that
a:= - + 1 + ... + j j+1
(4.19)
1 1. m-1
Then,
V; •= max{ Y, E(V;+ 1 F; )} = max { m 1(x; =M; ) ' m ai} =
I
1
{x;=M;} + m aj1jx;<M;},
and, since P(X; = M; 35- 1 ) = 1/j, it follows that
E(Vj I. j -^)=
m
j-1
j-1
1
11 +-+...+ j
m-1
=
J-1 m
a;-^•
The induction is complete, i.e., (4.18) holds for all j >j', where j *:=max{j:l <j<m,a > 1 }.
(4.20)
;
In other words, one gets
E(V 1
)=1-a for j* <, j < m .
(4.21)
L a*,
(4.22)
m
Also, V;. = max{, E(1'.
1 ;.)} =
m
since a . > 1. In particular, V; . is nonrandom, which implies ;
* m
which in turn leads to
j* V;• -^ = max {Y;.- 1 , E(Vi. I-)} = a .. m ;
548
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Continuing in this manner one gets
V ; =
L a .
for 1 < j < j*.
;
m
(4.23)
The optimal stopping rule is then given by (see Eq. 4.8) i
* _ min{ j:j>,j*+ 1,X; =M;},
if X; = M; for some j>,j*+1, (4.24) if X; 9kM; forallj>,j*+1.
m,
For, if j <j" then Yt < E(V;+ 1 Iy ) = ( j*/m)a;.. On the other hand, if j >j' then aj < 1 so that (i) Y > E(Vj+1 ;) = ( j/m)a j on {XX = MM } and (ii) 0= Y < E(V 1 .yj ) on {X, < M}. Simply Simply stated, the optimal stopping rule is to draw j* observations and then continue sampling until an observation larger than all the preceding shows up (and if this does not happen, stop after the last observation has been drawn). The maximal probability of stopping at the maximum value is then ;
I
E(V1 ) Vi
1
11 -+ j*+ 1 +...+
-
m
m-1
(4.25)
Finally, note that, as m —► oo, 1 1 1 a*=—+ z1, +•••+ m — 1 ' j* j* + 1
(4.26)
where the difference between the two sides of the relation "z" goes to zero. This follows since j* must go to infinity (as the series (1/j) diverges and j* is defined by (4.20)) and a;. > 1, a;. + l < 1. Now, l
1
a,. = m* iI J/m ,*/m .
m
1
1 —
dx = — log(j*/m).
(4.27)
- « e ', m
(4.28)
Combining (4.26) and (4.27) one gets j*
—logt
m
.: 1,
where the ratio of the two sides of "—" goes to one, as m --+ oo. Thus, lim —=e, m
m-
J
lim E(V1 )=e. m-+co
(4.29)
549
CHAPTER APPLICATION
5 CHAPTER APPLICATION: OPTIMALITY OF (S, s) POLICIES IN INVENTORY PROBLEMS Suppose a company has an amount x of a commodity on stock at the beginning of a period. The problem is to determine the amount a of additional stock that should be ordered to meet a random demand W at the end of this period. The cost of ordering a units is
K +ca
if a = 0, ifa>0,
(5.1)
where K >, 0 is the reorder cost and c > 0 is the cost per unit. There is a holding cost h > 0 per unit of stock left unsold, and a depletion cost d > 0 per unit of unmet demand. Thus the expected total cost is
I(x, a):='(a) + L(x + a),
(5.2)
L(y):= E(h max {0, y — W} + d max {0, W — y }).
( 5.3)
where
In this model x may be negative, but a > 0. The objective is to find, for each x, the value a = f *(x) that minimizes (5.2). Assume
EW
d >c>0,
h >0.
(5.4)
Consider the function G on R ,
G(y):= cy + L(y).
(5.5)
Then,
G(x)—cx 1 ( x, a) = JG(x + a) + K — cx
ifa =0, ifa>0.
(5.6)
Minimizing I(x, a) over a >, 0 is equivalent to minimizing I(x, a) + cx over a>,O,oroverx +a>,x. But G(x)
a) + cx = +a) JG(x +K I(x,
if a = 0, ifa>0.
(5.7)
Thus, the optimum a = f *(x) equals y*(x) — x, where y = y*(x) minimizes the
550
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
function (on [x, oo))
G(Y; x) {G(x) G(y) + K
if y = x, if y> x,
(5.8)
over y > x. Now the function y - L(y) is convex, since the functions y -+ max{0, y - w}, max{0, w - y} are convex for each w. Therefore, G(y) is convex. Clearly, G(y) goes to 0o as y -> oo. Also, for y < 0, G(y) > cy + djyl. Hence G(y) -+ oo as y -. - oo, as d > c > 0. Thus, G(y) has a minimum at y = S. say. Let s < S be such that
G(s)
=
K
+
G(S).
(5.9)
If K = 0, one may take s = S. If K > 0 then such ans < S exists, since G(y) --> o0 as y -- - oo and G is continuous. Observe also that G decreases on (- cc, S] and increases on (S, cc), by convexity. Therefore, for XE (-cc, s], G(x) > G(s) = K + G(S), and the minimum of G in (5.8) is attained at y = y*(x) = S. On the other hand, for x E (s, S], G(x)<,G(s)=K+G(S)
for all y>x.
Therefore, y*(x) = x for XE (s, S]. Finally, if x e (S, oo) then G(x) < G(y) for all y > x, since G is increasing on (S, oc). Hence y*(x) = x for x E (S, oo). Since f *(x) = y*(x) - x, we have proved the following. Proposition 5.1. Assume (5.4). There exist two numbers s <, S such that the optimal policy is to order f*(x) _ S-x
to
ifx<s,
(5.10)
ifx>s.
The minimum cost function is
I(x) := I(x, f *(x)) = (g(f *(x)) + L(x + f *(x)) K+G(S)-cx G(x)-cx
ifx.s, ifx>s.
(5.11)
Instead of fixed costs per unit for holding and depletion, one may assume that L(y) is given by L(y):=
E(H(max{ 0, y - W}) + D(max{0, W - y})).
(5.12)
Here H and D are convex and increasing on [0, co), H(0) = 0 = D(0). The
CHAPTER APPLICATION
551
assumption (5.4) may now be relaxed to L(y) < oo
for all y,
lim (cx + L(x)) > K + cS + L(S), (5.13) X - - -
where S is a point where G is minimum. Next consider an (N + 1)-period dynamic inventory problem. If the initial stock is x = X0 (state) and a = a 0 units (action) are ordered, then the expected total cost in period 0 is g 0 (x, a):= E(4'(a) + h max{0, x + a — W1 } + d max {0, W1 — x — a}) = I(x, a).
(5.14)
We denote by W1 , W2 ,.. . , W^ i.i.d. random demands arising at the end of periods 0, 1, . .. , N — 1. Assumption (5.4) is still in force with W a generic random variable having the same distribution as the W. The state X 1 in period I is given by X 1 = x + a — W1 = X0 + a o — W1 and in general Xk = Xk-
l
+ ak-, — Wk (5.15)
where Xk - 1 is the state in period k — 1, and a k _ is the action taken in that period. Thus, the transition probability law p(dz; x, a) is the distribution of x + z — W. The conditional expectation of the cost for period k, given Xk _, = x, ak -1 =a, is
(k = 0, 1, ... , N),
gk(x, a)'= 5 k l(x, a)
(5.16)
where ö is the discount factor, 0 < 6 < oo. The objective is to minimize the total (N + 1)- period expected discounted cost N
JX °= EX
ak I (Xk, ak),
(5.17)
k=0
over all policies f = (f, f', ... 'f). By Theorem 1.2 (changing max to min), an optimal policy is f = (f ö, f *, ... , f N), which is Markovian and is given by backward recursion (see Eqs. 1.11 - 1.14). In other words, f N is given by (5.10), fN( )= S—x X
0
ifX<S, ifx>S,
(5.18)
and f (0 < k < N — 1) are given recursively by (1.14). Let us determine f N _i. For this one minimizes gN_ 1(x, a) + Eh N (x + a — W),
(5.19)
552
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
where 8 N,- 1 is given by (5.16) and h N,(x) is the minimum of g,(x, a) = S"I(x, a), i.e., (see Eq. 5.11) hN(x) =
Ö N I(x).
Thus, (5.19) becomes 6' -1 [I(x,a)+6EI(x+a—W)],
(5.20)
and its minimization is equivalent to that of I N _ l (x,a):=I(x,a)+6E1(x+a— W).
(5.21)
GN -1(Y) °= cy + L(y) + 6E1(y•— W).
(5.22)
Write
Then (see the analogous relations (5.5)-(5.8)), GN -i(x) I N - i (x, a) + cx = f +a)+K G,-,(x
if a = 0, ifa>0.
(5.23)
Hence, the minimization of IN _ 1 (x, a) over a >, 0 is equivalent to the minimization of the function (on [x, oo)), GN -l(X) G N -,(Y; x) _
tGN-1(Y) + K
ify=x, ify>x,
( 5.24)
over y >, x. The corresponding points f _ 1 (x), yN _ 1 (x), where these minima are achieved, are related by J
N -1 \x) = YN-I(X) - X.
(5.25)
The main difficulty in extending the one-period argument here is that the function GN - 1 is not in general convex, since I is not in general convex. Indeed (see (5.11)), to the left of the point s it is linear with a slope —c, while the right-hand derivative at s is G'(s) — c < —c since G'(s) <0 if K > 0. One can easily check now that I is not convex, if K > 0. It may be shown, however, that GN _ I is K- convex in the following sense. Definition 5.1. Let K >, 0. A function g on an interval.! is said to be K- convex if, for all y l < Y 2 <J'3 in a — Y2
K + 9(Y3) 9(Y2) + Y (9(Y2) — 9(Y1)). Y2 — Yi
(5.26)
553
CHAPTER APPLICATION
Thus 0-convexity is the same as convexity, and a convex function is K-convex for all K > 0. We will show a little later that (i) G N _ 1 is K-convex on V^ 1 , (ii) GN -1(Y) -* cc as IYI --> oo.
Assuming (i), (ii), we now prove the existence of two numbers S N _ I SN _ 1 such that YN
ifx,SN -1
SN -1
1(X)'- x t
(5.27)
1f X > S N-1+
minimizes GN _ 1 (y; x) over the interval [x, cc). For this let SN _ 1 be a point where G N _ , attains its minimum value, and let S N _ 1 be the smallest number SN _ 1 such that GN-I(SN -1) =
K + GN-1(SN -1)-
(5.28)
Such a number S N _, exists, since G N _ 1 is continuous and G N _ I (x) -• co as x-* -cc. Now G N _ 1 is decreasing on (-cc ; S N - I ]. To see this, let y, < Y 2 < S N _ 1 and apply (5.26) with y, = K + GN- I (SN- 1) > G_1(y2) +
SN -1
—
Y2
/
/
-- ( GN- I (Y2) — GN- 1 (YI )). (5.29)
Y2 — YI
Also, GN _ I (y 2 ) > K + G N _ 1 (SN _ 1 ), since Y 2 <5N- , and (5.28) holds for the smallest possible 5 N _ 1 • Using this in (5.29), one gets
K + GN-1 (SN-1 ) > K + GN-I(SN -1) +
/
/
----- -- (GN-I(Y2) — GN-1(Y1))
,
Y2 — Y1
so that GN _ I (y2) < GN _ I (y l ). Therefore, if x -< sN _ 1, then GN _ , (x) > GN _ I (SN) _ GN_ 1 (SN_ 1 ) + K and,forally> x , GN_1(y;x)=GN_1(y)+K^GN_1(SN_I) +K. Hence, the minimum of 6N -,(-; x) on [x, oo) is attained at y = 5N-1' proving the first half of (5.27). In order to prove the second half of (5.27) it is enough to show that GN-I(x)
If S N _ I
<x
ifS N - I ^x
(5.30)
1< S N _ I then by (5.28) and K-convexity,
GN-I(sN-1) = K + GN-1(SN -1) i GN-1(x) + SN--1 x
X — SN -1
(GN -,(x) — GN-1(sN -1)) ,
554
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
or, SN -1
—
SN-1
SN-1 —S N-1 /
GN-1(SN-1) >
X — SN- 1 X
—
GN -l(x),
SN _ 1
i.e., GN_l(x) <- GN-I(SN-1) = GN-1(SN-1) + K < GN_1(y) + K for all y. Since (5.30) clearly holds for x = s N _ 1 , it remains to prove it in the case S N _ 1 <x < y. By K- convexity one has x / / /
/
K + GN -1(Y) i GN -1(x) +
( GN-1( x) — GN-,(SN-1))• X — SN-1
Thus (5.30) is proved, establishing the second half of (5.27). Finally, let us show that GN _ 1 is K-convex. Recall (see Eqs. 5.22, 5.5) that G_
1
(y) = G(y) + SE1(y — W)
(5.31)
Since the function G is convex (i.e., 0-convex), it is enough to prove that I is K- convex. For it is easy to check from the definition, on taking expectations, that the K- convexity of I implies that of y --+ EI (y — W). It is also easy to show that if J 1 is K 1 -convex, J2 is K 2 -convex for some K 1 >, 0, K 2 > 0, then S 1 J1 + S 2 J2 is (a 1 K 1 + 5 2 K 2 )-convex for all S 0, 6 2 >, 0. For positive S it now follows from (5.31) that GN _ 1 is K- convex if I is K- convex. Recall the definition of I (see Eq. 5.11), 1 ( x) __
ifx<s, ifx>s.
K+G(S)—cx
(5.32) ( )
We want to show K + 1(x 3 ) > 1(x 2 ) + X3
—
X2
x2 — x1
(1(x 2 ) — 1(x 1 ))
for all x 1 <x 2 <x 3 . ( 5.33)
If x 3 -< s, then linearity of I in (— oo, s] implies convexity and, therefore, K- convexity in (— co, s]. Similarly, if x 1 > s, (5.33) is trivially true since G(x) — cx is convex, and, therefore, K- convex. Consider then x 1 <s <x 3 . Distinguish two cases. CASE I.
x 1 <s <x 2 . In this case (5.33) is equivalent to
K+G(x 3 )— cx 3 >G(x 2 )— cx 2
+
X3—X2
x2
—
X1
(G(x 2 )— cx 2
—
K—G(S)+cx 1 ).
(5.34) Canceling cx (i = 1, 2, 3) from both sides and recalling that K + G(S) = G(s), ;
CHAPTER APPLICATION
555
(5.34) becomes equivalent to K + G(x 3 ) >, G(x 2 ) + X3 x2
—
X2
—
x1
(G(x 2 ) — G(s)).
(5.35)
If G(x 2 ) >, G(s), then the right side of (5.35) is no more than G(x2) +
X3
x2
— X2 —
s
(G(x2) — G(s)),
which is no more than G(x 3 ) by convexity of G. Hence (5.35) holds. Suppose G(x 2 ) < G(s). The left side of (5.35) is no less than K + G(S) = G(s), and (5.35) will be proved if it holds with the left side replaced by G(s), i.e., if (5.36)
G(s) > G(x 2 ) + x3 x2 (G(x 2 ) — G(s)). x2 — x 1 —
By simple algebra, (5.36) is equivalent to G(s) >, G(x 2 ), which has been assumed to be the case.
CASE 1I. x 1 < x 2 - s < x 3 . In this case (5.33) is equivalent to
K + G(x3) — cx3 > K + G(S) — cx2 + x3 — x2
—
X2
x1
(
—
cx2 + cx l )
= K + G(S) — cx 3 ,
which is obviously true. Thus, I is K-convex, so that G N _, is K-convex, and the proof of (5.27) is complete.
The minimum value of (5.19), or (5.20), is (5.37)
where (see Eqs. 5.23, 5.25, 5.27) K +GN _,(SN _,)—cx Ix -,(x) = min I N -,(x, a) _ G,^_,(x)—cx a>- 0
ifx s N _ 1 , ifx>sN_1.
(5.38)
The equation for the determination of f_2 is then (see Eq. 1.14) min [6 N-2 1(x, a) + 6 N- 'EI N _ ,(x + a — W)] a
?o
= 6 N-2 min [I(x, a) + 6EI N _ I (x + a — W)]. (5.39) n3o
556
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
This problem is mathematically equivalent to the one just considered since (5.32) and (5.38) are of the same structure, except for the fact that G is convex and G, -1 is K-convex. But only K-convexity of G was used in the proof of K-convexity of I(x). Proceeding iteratively, we arrive at the following. Proposition 5.2. Assume (5.4). Then there exists an optimal Markovian policy
f* _ (f ö, ... , f *) of the (S, s) type for the (N + 1)-period dynamic inventory model, i.e., there exist Sk < S k (k = 0, 1, . .. , N) such that during period k it is optimal to order Sk — x if the stock x is < s k , and to order nothing if x > Sk. Also s k <Sk for all kifK>0, and sk =Sk ifK=0. Finally, consider the infinite-horizon problem of minimizing, for a given b, 0<8<1,
X b k l (Xk, ak),
Jx := Ex
(5.40)
k=0
over the class of all (measurable) policies f = (fo , f1 , ...). Write this infimum as
Jx := inf JX. (5.41) To show that Jx is finite, consider the policy f° := (0, 0, ...), i.e., the stationary policy that orders zero in every period. It is straightforward to check that EX ° 1(Xk , a k ) < (h + d)Ex°(IXk-l) + Wk ) = (h + d)(EX°IXk-11 + EW), (5.42) where { Wk } is an i.i.d. sequence, Wk having the same distribution as W considered before. Also, for each k, Wk is independent of X 0 ,. .. , Xk _ , . Now,
EX ° IXkI = Ex ° IXk_ I — Wkl < EW + Ez ° IXk _ 11, and iterating one gets Es°I XkI <, kEW + Ixl.
(5.43)
Substituting (5.43) in (5.42), (5.40) one gets JX
(h + d)EW (h + d)Ixl
(1 — S) 2 1
S
(5.44)
Hence Jx < oo. The dynamic programming equation (2.14) becomes Jx = min {I(x, a) + 6E(Jx+a _ w )}. a^0
(5.45)
EXERCISES
557
We will assume K = 0. Now the (N + 1 )-period optimal costs (T"0)(x) are increasing with N. Since these costs are convex, the limit J. is also easily seen to be convex and, in particular, continuous. It follows that the convergence of I(x, a) + SE(T "0)(x + a — W)
(5.46)
to the expression within braces in (5.45) with J = J. is uniform on compact subsets of x and a, at least if W is bounded, P(0. W'< M) = I
for some M < oo.
(5.47)
The dynamic programming equation, therefore, holds for J, x . This also implies that the convex function
G(y)L=G(y) + BEJY _ W = cy + L(y) + 5EJJ _ W (5.48) goes to infinity as yl —• oo. Thus, by the same argument as above, an optimal policy exists with S = s. We have the following. Proposition 5.3. Assume (5.4) and (5.47). Let K = 0, 0 < S < 1. (a) Then the dynamic programming equation (5.45) holds for the minimum cost J. in the infinite-horizon discounted dynamic programming problem. (b) There exists a stationary optimal policy f* = ( f *, f *,_...) such that a = f *(x) minimizes the right-hand side of (5.45) with Jx = J, and f* is given by
_ S—x f *(x) 0
ifx'<S, if x > S.
Here S is the point where the function (5.48) attains its minimum. EXERCISES Exercises for Section VIA Consider the following inventory control problem. Let the possible amounts of a commodity that a business can stock be 0, 1, 2 (states). In each period k = 0, 1, 2 (N = 2) the stock is replenished by ordering 0, 1, or 2 units (actions), but not more than 2 units can be stored. There is a random demand that arises at the end of each period, demands over different periods being independent of each other and identically distributed. The possible values of demand are 0, 1, 2, which occur with probabilities 0.2, 0.5, and 0.3, respectively. Thus the transition probability p(z; x, a) is the distribution of max {0, z + a — W} A 2, where W is the random demand. Find an optimal policy to minimize the total expected cost if in each period the cost is the sum of (a) the cost of ordering a units at the rate of $1 per unit, (b) the cost of storing
558
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
the excess supply max{0, x + a — W} at the rate of $1 per unit, and (c) a penalty of $1 per unit for excess demand max{0, W — x — a}. 2. (Taxicab Operation) The area of operation of a cab driver comprises three towns, Ti, T2 , T3 (states). To get a fare, the driver may follow one of three courses (actions): pull over and wait for a radio call (a 1 ), go to the nearest taxi stand and wait in line (a 2 ), or go on cruising until hailed by a passenger (a 3 ). The action a, is not available if the driver is in town T,, as there is no radio service in this town. There is a known probability p(T; ; Ti , a,) that the next trip will be to town T; , if the cab is in town T• and follows the course of action a,. The corresponding reward is g(T, a k , T). (i) Write down the backward recursion relations for maximizing the total expected reward over a finite horizon. (ii) Find the optimal policy for the case N = 2, if the transition probabilities and rewards are as follows: State
Ti
Action
a2 a3
T2
al a2 a3
T3
a, az a3
Probability of transition to state
Reward if trip is to state
T1 T2 T3 TI
T2 T3
2
0 2 2
1 4 i 4 1 4 i 4 t I
2 I 4
0
2
1
1 4 1 4 1
2 1 2 1 4 1
2 41 i 4 I 4
2
1 4 21 1 4 1 2
2 4 2 2 2
4
2
4
4
2 2
4
2
4 4 4
2 2 2 2 2
3. Prove Theorem 1.3 along the lines of the proof of Theorem 1.2.
Exercises for Section VI.2 1, Suppose S = {x,, x z , ...} is countable and A is a compact metric space. For each E e (0, b) let {f: k = 0,1, ...} be a sequence of functions on S into A. (i) Show that there exists a sequence E, j 0 such that f 0-(x l ) converges to some point fö(x l ), say, in A. [Hint: A is a compact metric space.] (ii) Show that there exists a subsequence {E,,, 2 : n = 1, 2, ...} of {E„ : n = 1, 2, ...} such that f ö 2 (x z ) converges to some point f(x2) in A. (iii) Having constructed {E 1 : n = 1, 2, ...} in this manner find a subsequence n = 1, 2, ...} of {: n = 1, 2, ...} such that f " ( x ;+ ,) converges to some point f ö(x ;+ , ), say. (iv) Define S;p = E.,,, and show that f ö^•°(xj ) - f ö(x j ) for allj = 1, 2, ... , as n — co. (v) Use (i)—(iv) to find a subsequence {8,.,: n >, 1} of {8 0 } such that ^•'(x) converges to f (x), say, for every x e S, as n —+ oo. (vi) Proceeding in this manner find a subsequence { ° . k + 1 : n l} c {8,,, k : n > l} such that f k^ k(x) -- f k (x), say, for every x e S.
fa
THEORETICAL COMPLEMENTS
559
(vii) Let 6, := 8 . Show that f k "(x) - f (x) for every k >_ 0 and every x e S, as n -* oo.
2. Let the elements of a finite set S be labeled 1, 2, ... , m.
(i) Show that the set B(S) of all real-valued functions on S may be identified with 1m, and the "sup norm" on B(S) corresponds to the norm Ixl := max {Ix°' I: I _< i <_ m} on p'". (ii) Let T be a (strict) contraction on S, i.e., IITf — Tg <6111 gll for some 6 e [0, 1). Show that this may be viewed as a contraction, also denoted T, on 68",i.e.,ITx—TyI Ix — yl. (iii) Show that T"O converges to a point, say x *, in S. [Hint: {TO: n >_ l} is a Cauchy sequence in l8'".] (iv) Show that Tx* = x *. (v) Show that T"y x* whatever be ye p'". [Hint: If 7'x* = x *, Ty* = y *, then x* = y *•] )
—
3. Write out the dynamic programming equation for the optimal infinite-horizon discounted reward version of part (i) of the taxicab operation problem described in Example 1.2. 4. (i) Prove Theorem 2.3.
(ii) Under the hypothesis of Theorem 2.3 show that (T"O)(x) is the N -stage optimal reward, where T is defined by (2.15).
Exercises for Section VI.3
_<
1. Suppose {X} is a diffusion on Oa k \G with absorption at OG, starting at x äG. Give an argument analogous to that given in Exercise 3.5 of Chapter V to prove that P:(taG t) = o(t), as t f, 0. 2. Check (3.8). 3. Check (3.13). 4. Show that the probability that the diffusion with coefficients (3.40), with c replaced by c(x) (continuously differentiable) ever reaches 0, starting at x > 0, is zero. 5. Suppose that the expression on the extreme right in (3.47) is greater than or equal to 1, then the maximum in (3.41) is attained at c* = 1. [Hint: c = 0 gives a minimum, and the function is strictly concave on [0, 1].] 6. Suppose {X,} is a diffusion on (a, b) with infinitesimal generator A = 26 2 (x) d 2 /dx 2 + µ(x) d /dx. Let f be in the domain of A, i.e., (T, f , /t -* Af as t j 0. Prove that the —
r)
function g(x):= E x f ö e ßs(T s f )(x) ds satisfies the resolvent equation: (/3 — A)g(x) _ f(x). [Hint: (T,g)(x) = e°`9(x) — efl' e -vs(Tsf)(x) ds.] -
fo
THEORETICAL COMPLEMENTS Theoretical Complements to Section VI.I
_< _<
1. (Extensions to More General State Spaces: Finite-Horizon Case) Let S be a complete separable metric space, A a compact metric space, g,,, (0 k N) continuous
560
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
real-valued functions on S x A, p(dz; z, a) weakly continuous on S x A into f(S), where e(S) is the set of all probability measures on the Borel sigmafield of S. Then the maxima in (1.11), (1.13), and (1.14) are attained, and the maximal functions h k are continuous on S. Since there may not be a unique point in A where (1.14) is maximized, one needs to use a measurable selection theorem to obtain measurable functions f N, f N_ 1 , ... , f ö on S into A in order to define a Markovian policy f* _ (f ö, f *, ... , f N). A proof of the existence of measurable selections may be found in A. Maitra (1968), "Discounted Dynamic Programming on Compact Metric Spaces," Sankhyä, Ser. A, 30, pp. 211-216. The proof of optimality remains unchanged. The above assumptions can be further relaxed. Assume (A): 1. S is a nonempty Borel subset of a complete separable metric space. 2. A is a nonempty compact metric space. 3. g k (k = 0, 1, .. . , N) are bounded upper semicontinuous real-valued functions on S x A. 4. p(dz; x, a) is weakly continuous on S x A into f(S).
Under (A) the following hold: (i) h N (x) 2= SUP QEA g s,(x, a) is attained (i.e., h N (x) = g N (x, aN(x)) for some a N (x) e A) and is upper semicontinuous on S. (ii) Given that hk +l is upper semicontinuous on S: (a) The function (x, a) —. h k+ 1(z)p(dz; x, a) is upper semicontinuous on S x A. (b) h k (x) = suq, EA {g k (x, a) + $ h k+ ,( z)p(dz; x, a)} is upper semicontinuous on S (and the supremum is attained). (c) There is a Borel-measurable function f k on S into A such that
f
h&(x) = gk(x,fk(x)) +
f hk+ 1(z)p(dz;x,f k(x)).
See, Maitra, loc. cit., for this generalization. Therefore, Proposition 1.1 and Theorem 1.2 go over under (A).
Theoretical Complements to Section VI.2 1. (Infinite-Horizon Discounted Problem) Consider the assumption (A'): Conditions (A) above, with (3) replaced by (3)': g o is bounded and upper semicontinuous on S x A. Theorem 2.2 holds under (A'). The proof goes over word for word if one takes B(S) to be the set of all real-valued bounded Borel measurable functions on S. 2. (Semi-Markov Models) All the results of Sections 1 and 2 have extensions to semi-Markov models, which include, as special cases, the discrete-time models of Sections 1 and 2, as well as continuous-time jump Markov-type models. In order to describe this extension let (1) the state space S be a Borel subset of a complete separable metric space, (2) the action space A be compact metric; (3) a reward rate r(x, a) be given, which is bounded and upper semicontinuous on S x A. (4) In addition, one is given the holding time distribution y(du; x, a) in state x when an action a is taken, and the probability distribution of transition p(dz; x, a) to a new state z from
THEORETICAL COMPLEMENTS
561
the present state x when an action a is taken; (x, a) -* y(du; x, a) and (x, a) - p(dz; x, a) are assumed to be weakly continuous. where fo is a measurable A policy f is a sequence of functions f = ( fo , f„ f2 function on S into A, f, is a measurable function on S x A x S into A, ... , fk is a measurable function on (S x A) k x S into A. Given such a policy and an initial state x, an action f0 (x) is taken. Then the process remains in state x for a random time To having distribution y(du; x, f0 (x)). At the end of this time, the state changes to X, with distribution p(dz; x, fo (x)). Then an action a, = f1 (x, fo (x), X,) is taken. The process remains in state X, for a random time T, having distribution y(du; X 1 , a s ), conditionally given X,. At the end of this period the state changes to X 2 having distribution p(dz; X,, a, ), conditionally given X 1 . This goes on indefinitely. The expected discounted reward under the policy is , ...),
Jx := EX
I
e - ß`r(Y, a,) dt
0
where ß > 0 is the discount rate, Y is the state at time t, and a, is the action at time t. A policy f is (semi-)Markov if, for all k >- 1, f, depends only on the last coordinate among its arguments, i.e., if fk is a function on S into A. Under such a policy, the stochastic process { },: t >- 0} is semi-Markov. In other words, although the embedded process {Xk : k >- 0} is (nonhomogeneous) Markov having transition probability p(dz; x, fk _,(x)) at the kth step, and the holding times To , T,, ... are conditionally independent given {X k : k >- 0}, the latter (conditional) distributions y(du; x, fk (x)) are not in general exponential. Hence, { Y: t >- 0} is not Markov in general. A policy f =_ f °° is said to be stationary if it is (semi-)Markov and if fk = f for all k, where f is a fixed Borel-measurable function on S into A. Under a mild additional assumption that (5) 6,(x, a):= exp{ —ßu}y(du; x, a} is bounded away from 1, it may be shown that the optimal discounted reward J, is the unique upper semicontinuous solution to the dynamic programming equation (
)
f
J. = max fr(x, a)T B (x, a) + S Q (x, a) a
J
Jz P(dz; x, a) }
Here T Q (x, a):= (1 — S ß (X, a))/ß. In addition, there exists a Borel-measurable function f * on S into A such that a = f *(x) maximizes the right side above. For each such f* the stationary policy f* is optimal. For details and references to the literature, see R. N. Bhattacharya and M. Majumdar (1989), "Controlled Semi-Markov Models: The Discounted Case," J. Statist. Plan. Inference, 21, pp. 365-381. The following lists only a few of the significant earlier works on the subject. R. Howard (1960), Dynamic Programming and Markov Processes, MIT Press, Cambridge, Mass.; D. Blackwell (1965), "Discounted Dynamic Programming," Ann. Math. Statist., 36, pp. 226-235; A. Maitra (1968), loc. cit.; S. M. Ross (1970), "Average Cost Semi-Markov Decision Processes," J. Appl. Probability, 7, pp. 656-659. The dynamic programming equations of this chapter (for example, (2.14)) originated in Bellman's pioneering work. See R. Bellman (1957), Dynamic Programming, Princeton University Press, Princeton.
562
DYNAMIC PROGRAMMING AND STOCHASTIC OPTIMIZATION
Theoretical Complements to Section VI.3 1. The policies that are shown in this section to be optimal in the class of all stationary feasible policies are actually optimal in a much larger class of policies, namely, the class of all nonanticipative feasible policies, which may depend on the entire past. For details, see W. H. Fleming and R. W. Rishel (1975), Deterministic and Stochastic Optimal Control, Springer-Verlag, New York, Chapters V and VI. Apart from some technical measurability problems, the ideas involved in this extension of the class of feasible policies are similar to those already considered in Sections 1 and 2. 2. Feller's construction of one-dimensional diffusions (see theoretical complement 2.3 to Chapter V) allows discontinuous drift and diffusion coefficients. Hence, in Example 1 one may allow all measurable functions v with values in [v s , v*]. This example is due to M. Majumdar and R. Radner (1990), "Linear Models of Economic Survival under Production Uncertainty," Working Paper 427, Dept. Econ., Cornell University. Example 2 is a one-dimensional specialization of a result of R. C. Merton (1971), "Optimal Consumption and Portfolio Rules in a Continuous-Time Model," J. Economic Theory, 3, pp. 373-413. 3. The uniqueness of the solution to (3.55) is a little difficult to check, since r(x, c) is unbounded as a function of x. One way to circumvent this problem is to fix T> 0 and consider the problem of maximizing the discounted reward over the finite horizon [0, T]. This is carried out in Fleming and Rishel, loc. cit., pp. 160, 161. One may then let T —* oo.
Theoretical Complement to Section VIA
1. A readable account of the general theory of optimal stopping rules in discrete time may be found in Y. S. Chow, H. Robbins, and D. Siegmund (1971), Great Expectations: The Theory of Optimal Stopping, Houghton Mifflin, Boston.
Theoretical Complement to Section YES The chapter application is based on H. Scarf (1960), "The Optimality of (S, s) Policies in the Dynamic Inventory Problem," Mathematical Methods in the Social Sciences, Stanford University Press, Stanford, pp. 96-202.
CHAPTER VII
An Introduction to Stochastic Differential Equations 1 THE STOCHASTIC INTEGRAL A diffusion {X,} on R' may be thought of as a Markov process that is locally like a Brownian motion. That is, in some sense the following relation holds, dX, = µ(X,) dt + a(X,) dB„
(1.1)
where /1(.), a(.) are given functions on ll ', and {B,} is a standard one-dimensional Brownian motion. In other words, conditionally given {X : 0 < s < t}, in a small time interval (t, t + dt] the displacement dx, := X, +d , — X, is approximately the Gaussian random variable u(X,) dt + — B,), having mean µ(X,) dt and variance 6 2 (X,) dt. Observe, however, that (1.1) cannot be regarded as an ordinary differential equation such as dX,/dt = µ(X,) + o (X,) dB,/dt. For, outside a set of probability 0, a Brownian path is nowhere differentiable (Exercise 1; or see Exercises 7.8, 9.3 of Chapter I). The precise sense in which (1.1) is true, and may be solved to construct a diffusion with coefficients p(•), a 2 (.), is described in this section and the next. The present section is devoted to the definition of the integral version of (1.1): 5
X,=x+£m(Xx )ds 0
+5 a(X)dB.
(1.2)
0
The first integral is an ordinary Riemann integral, which is defined if t(.) is continuous and s —• Xs is continuous. But the second integral cannot be defined as a Riemann—Stieltjes integral, as the Brownian paths are of unbounded variation on [0, t] for every t > 0 (Exercise 1). On the other hand, for a constant 563
564
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
function o(x) - a, the second integral has the obvious meaning as a(B, — B o ), so that (1.2) becomes X,=x +£u(X,)ds+a(B,—B o ).
(1.3)
0
It turns out that (1.3) may be solved, more or less by Picard's well-known method of iteration for ordinary differential equations, if µ(x) is Lipschitzian (Exercise 2). This definition of the (stochastic) integral with respect to the Brownian increments dB, easily extends to the case of an integrand that is a step function. In order that {X} may have the Markov property, it is necessary to restrict attention to step functions that do not anticipate the futute. This motivates the following development. Let (S2, F, P) be a probability space on which a standard one-dimensional Brownian motion {B,: t > 0} is defined. Suppose { : t >, 0} is an increasing family of sub-sigmafields of F such that (i) B. is 3- measurable (s >, 0), (ii) {B, — B s : t >, s} is independent of (events in) .;s (s > 0).
(1.4)
For example, one may take .y, = a {B.,: 0 < s < t }. This is the smallest .01, one may take in view of (i). Often it is important to take larger A. As an example, let SZ = a[{B s : 0 < s < t }, {Z z : A E A}], where {Z 2 : A E A} is a family of random variables independent of {B,: t >, 0 }. For technical convenience, also assume that .F„ F are P-complete, i.e., if N e .y, and P(N) = 0, then all subsets of N are in Wit : this can easily be ensured (theoretical complement 1). Next fix two time points 0 <, a < ß < oc. A real-valued stochastic process { f (t): a < t < ß} is said to be a nonanticipative step functional on [a, ß] if there exists a finite set of time points t o = a < t, < • • • < t,„ = ß and random variables f (0 < i < m) such that
(i) f is .F,,- measurable (0 < i < m), f for t, < t < t + , (O <, i < m — 1), f (ß) = f,.
(ii) f (t) =
(1.5)
;
Definition 1.1. The stochastic integral, or the Ito integral, of the nonanticipative step functional f = { f(t): a < t < ß} in (1.5) is the stochastic process
I f(s) dB `
s
:=
f(to)(B, — B1 0 ) ° . f(a)(B, — Ba
)
for t E [a, t,],
i
f(ti- 1 )(B,. — B, ) + i(t1)(B, — B1,) ;
for t e (ti, ti+ ],
=1
(a
Observe that the Riemann-type sum (1.6) is obtained by taking the value of the integrand at the left endpoint of a time interval (t,_,, ti]. As a consequence,
565
THE STOCHASTIC INTEGRAL
for each t E [a, ß] the Ito integral ca f(s) dB., is .9i measurable, i.e., it is nonanticipative. Some other important properties of this integral are contained in the proposition below. Proposition 1.1 (a) If f is a nonanticipative step functional on [a, ß], then t —^ f f(s) dB, is continuous for every w n S2; it is also additive, i.e., 'f(u)dB„+
j f(u)dB„= J " f(u)dB., (a<'s
s
f^"
a
(b) If f, g are nonanticipative step functionals on [a, ß], then
J
(f(s) + g(s)) dB5 = a
J f(s) dB,, + a
g(s) dB,,.
(1.8)
a
(c) Suppose f is a nonanticipative step functional on [a, ß] such that
E
J
R
f 2 (t) dB, =
J
a
ß
(Ef 2 (t)) dt < oo.
(1.9)
a
Then {$f(s) dB 5 : a < t ß} is a square integrable {S,}-martingale, i.e.,
E(J
rI
f(u)dB
)=
f
, f (u) dB.
(a'<s
(1.10)
2(s)dsl.Fa)
(1.11)
a
a
Also,
E (f f(s)dB,)2I J' E (Ja f
and 2
c
f
Ef (s) dB 5 = 0, «'
E
r
f (s) dB s = E f
a
2 (s)
ds,
(a < t < ß)
a
(1.12)
Proof. (a), (b) follow from Definition 1.1. (c) Let f be given by (1.5). As Jä f(u) dB„ is . -measurable, it follows from (1.7) that s
E
f(u)dB,,I a
5
=
f(u)dB.+E a
f(u)dB„I. ). S
s
(1.13)
566
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Now, by (1.6), if t1 _ 1 <S ` t1 and t, < t < t ;+ 1 , then
J
f(u) dB „ = f(s)(B1, — Bs) +.f(t;)(B, ; ., — Bt ; ) +
...
+f(ti- 1 )(B,,
—
Bt,-,)+.f(t,)(B,
—
(1.14)
B,,).
Observe that, for s' < t', B,. — B5 . is independent of.. (property (1.4(ii)), so that E(B,.—B 5 .l.ys .) =0
(1.15)
(s'
Applying this to (1.14),
E(f(s)(B,, — B3) 2s) = f(s)E(B,, — B:I gis) = 0 , ) = E[E(f(t;)(B,,. — B,,)1 .
E(f(tj)(Br j ^, — B,) I
J
)1
2s]
(116)
=E[f(t ; )E(B t ,,, — B,IJ,,)I.gis] = 0,..., E(.f (tt)(B, — Bra) j . ) = E[.f (tt)E(B, — B, 1 gis)] = 0.
Therefore, E (
J
t
f(u)dB„I.F,)
=0
(s
(1.17)
/
s
From this and (1.7), the martingale property (1.10) follows. In order to prove (1.11), first let t e [a, t 1 ]. Then
E
(f
a
f(s) dBs J z I ya = E(.f Z (a)(B, — Ba) 2 I ^`a) = .% 2 (a)E((B , /
J
—
Ba) 2 I Via)
=f2(a)(t—a)=E(Jaf 2(s) ds ),
(1.18)
by independence of .tea and B, — Ba . If t e (t ; , t i ,,] then, by (1.6),
E
((J f (S) dB5) 2
= E[{
1
(tJ-1)(Bt^ — Bti-t) + f (ti)(Bt — B11 )1 2 I Via] .
l (1.19) Now the contribution of the product terms in (1.19) is zero. For, if j < k, then E[f(ti-i)(B, — Br,-1 )f(tk-1)(Btk =E[E(...
-
Btk-1 ) I Via]
tk- ,)1$Z 27
E[f(t1-1)(B,i — Bri-1)f(tk-I)E(Btk - B lk
= 0.
-
Si;tk
-1 )
19Q]
(1.20)
THE STOCHASTIC INTEGRAL
567
Therefore, (1.19) reduces to
E[(J
«
f(s) dBs) z I'^« /
J =
E(.%z(tj-1)(Eti — Btj 1 )2 Vi a) j=1
+ E(f 2 (ti)(B, — By) 2 1 Via) E[J Z(tj- 1)E((13ti — B 1 )2
=
ti_t) `tea]
j=1
+ E[f 2(ti)E((B, — B.) 2 I `^'ti) I Via] i
E(J 2 ( t j-1)( tj — tj-1)
=
) + E(f 2(ti)(t — ti) Via)
j=1
= E (t — tj_
=E(
J
t
f 2 (s)dsI F.
t_ + (t — t
/'
tj
(1.21)
The relations (1.12) are immediate consequences of (1.17) and (1.21).
n
The next task is to extend the definition of the stochastic integral to a larger class of functionals, by approximating these by step functionals. Definition 1.2. A right-continuous stochastic process f = { f(t): a < t < ß}, such that f (t) is .-measurable, is said to be a nonanticipative functional on [a, ß]. Such an f is said to belong to .#[a, ß] if R
f 2 (t)dt < co.
E
(1.22)
a
If f e .#[a, ß] for all ß > a, then f is said to belong to .#[‚ cc).
Proposition 1.2. Let f E
ß]. Then there exists a sequence { f,} of nonanticipative step functionals belonging to i!'[a, ß] such that t3
E
(f„(t)—f(t)) 2 dt-^0
asn --goo.
(1.23)
a
Proof. Extend f(t) to (— oo, oo) by setting it zero outside [a, ß]. For > 0 write g(t) := f (t — s). Let i be a symmetric, continuous, nonnegative (nonrandom) function that vanishes outside (-1, 1) and satisfies f 1_ 1 i1i(x) dx = I. Write Ii e (x) %= ‚1i(x/s)/E. Then /0, vanishes outside (—s, E) and satisries
568
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
f`-E he(x) dx = 1. Now define gE'= g E * k// s , i.e.,
g(t) =
=
E
9E(t — x)TE(x) dx = Jf(t —
J E
J^
f (y)^E(y — t + E) dy
i. — x)'YE(x) dx
(y = t — e — x).
(1.24)
I 2E
Note that, for each w e S2, the Fourier transform of gE is 9E
(
) = 7E(S) YE( ) =
e f( '
)Y (
),
so that, by the Plancherel identity (Chapter 0, Eq. 8.45), °
IgE(t) — f(t)I Z
1 dt = 2>z
19 0) — f(^)I z d^ 1
1 = 2^
°°
I.f(^)I2Ie1E(^) — lj 2 )d^.
(1.25)
The last integrand is bounded by 21 f (^)I 2 . Since E
J
d
f J I.f ( )I n
2
d dP = 2n
^
Jn J
f 2 (t) dt dP
f 2 (t) dt < co,
= 2nE I
(1.26)
J [ a.ß1 it follows, by Lebesgue's Dominated Convergence Theorem, that
J
E I gE(t) — f (t)I 2 dt --> 0
as s—• 0.
(1.27)
This proves that there exists a sequence {g„} of continuous nonanticipative functionals in t[a, ß] such that
e E
(g„(s) — f (s)) 2 ds --> 0.
(1.28)
It is now enough to prove that if g is a continuous element of ß], then there exists a sequence {h „} of nonanticipative step functionals in .A'[a, ß] such that
v
E (h„(s) —g(s)) 2 ds --+0.
f
(1.29)
569
THE STOCHASTIC INTEGRAL
For this, first assume that $g$ is also bounded, $|g| \le M$. Define

$h_n(t) = g\Big(\alpha + \frac{k}{n}(\beta - \alpha)\Big) \quad \text{if } \alpha + \frac{k}{n}(\beta - \alpha) \le t < \alpha + \frac{k+1}{n}(\beta - \alpha) \ (0 \le k \le n - 1), \qquad h_n(\beta) = g(\beta).$

Use Lebesgue's Dominated Convergence Theorem to prove (1.29). If $g$ is unbounded, then apply (1.29) with $g$ replaced by its truncation $g_M$, which agrees with $g$ on $\{t \in [\alpha, \beta]: |g(t)| \le M\}$ and equals $-M$ on the set $\{g(t) < -M\}$ and $M$ on $\{g(t) > M\}$. Note that $g_M = (g \wedge M) \vee (-M)$ is a continuous and bounded element of $\mathcal{M}[\alpha, \beta]$, and apply Lebesgue's Dominated Convergence Theorem to get

$E\int_\alpha^\beta (g_M(s) - g(s))^2\,ds \to 0 \quad \text{as } M \to \infty. \qquad ∎$
We are now ready to define the stochastic integral of an arbitrary element $f$ in $\mathcal{M}[\alpha, \beta]$. Given such an $f$, let $\{f_n\}$ be a sequence of step functionals satisfying (1.23). For each pair of positive integers $n, m$, the stochastic integral $\int_\alpha^t (f_n - f_m)(s)\,dB_s$ $(\alpha \le t \le \beta)$ is a square integrable martingale, by Proposition 1.1(c). Therefore, by the Maximal Inequality (see Chapter I, Eq. 13.56) and the second relation in (1.12), for all $\varepsilon > 0$,

$P(M_{n,m} > \varepsilon) \le E\int_\alpha^\beta (f_n(s) - f_m(s))^2\,ds\big/\varepsilon^2,$   (1.30)

where

$M_{n,m} := \max\Big\{\Big|\int_\alpha^t (f_n(s) - f_m(s))\,dB_s\Big|: \alpha \le t \le \beta\Big\}.$   (1.31)

Choose an increasing sequence $\{n_k\}$ of positive integers such that the right side of (1.30) is less than $1/k^2$ for $\varepsilon = 1/k^2$, if $n, m \ge n_k$. Denote by $A$ the set

$A := \{\omega \in \Omega: M_{n_k, n_{k+1}}(\omega) \le 1/k^2 \text{ for all sufficiently large } k\}.$   (1.32)

By the Borel–Cantelli Lemma (Section 6 of Chapter 0), $P(A) = 1$. Now on $A$ the series $\sum_k M_{n_k, n_{k+1}} < \infty$. Thus, given any $\delta > 0$ there exists a positive integer $k(\delta)$ (depending on $\omega \in A$) such that

$M_{n_j, n_l} < \delta \quad \forall j, l \ge k(\delta).$

In other words, on the set $A$ the sequence $\{\int_\alpha^t f_{n_k}(s)\,dB_s: \alpha \le t \le \beta\}$ is Cauchy in the supremum distance (1.31). Therefore, $\int_\alpha^t f_{n_k}(s)\,dB_s$ $(\alpha \le t \le \beta)$ converges uniformly to a continuous function, outside a set of probability zero.

Definition 1.3. For $f \in \mathcal{M}[\alpha, \beta]$, the uniform limit of $\int_\alpha^t f_{n_k}(s)\,dB_s$ $(\alpha \le t \le \beta)$ on $A$ as constructed above is called the stochastic integral, or the Itô integral, of $f$, and is denoted by $\int_\alpha^t f(s)\,dB_s$ $(\alpha \le t \le \beta)$. It will be assumed that $t \to \int_\alpha^t f(s)\,dB_s$ is continuous for all $\omega \in \Omega$, with an arbitrary specification on $A^c$. If $f \in \mathcal{M}[\alpha, \infty)$, then such a continuous version exists for $\int_\alpha^t f(s)\,dB_s$ on the infinite interval $[\alpha, \infty)$.
It should be noted that the Itô integral of $f$ is well defined up to a null set. For if $\{f_n\}$, $\{g_n\}$ are two sequences of nonanticipative step functionals both satisfying (1.23), then

$E\int_\alpha^\beta (f_n(s) - g_n(s))^2\,ds \to 0.$   (1.33)

If $\{f_{n_k}\}$, $\{g_{m_k}\}$ are subsequences of $\{f_n\}$, $\{g_n\}$ such that $\int_\alpha^t f_{n_k}(s)\,dB_s$ and $\int_\alpha^t g_{m_k}(s)\,dB_s$ both converge uniformly to some processes $\{Y_t\}$, $\{Z_t\}$, respectively, outside a set of zero probability, then

$E\int_\alpha^\beta (Y_s - Z_s)^2\,ds \le \lim_{k\to\infty} E\int_\alpha^\beta (f_{n_k}(s) - g_{m_k}(s))^2\,ds = 0,$

by (1.33) and Fatou's Lemma (Section 3 of Chapter 0). As $\{Y_s\}$, $\{Z_s\}$ are a.s. continuous, one must have $Y_s = Z_s$ for $\alpha \le s \le \beta$, outside a $P$-null set. Note also that $\int_\alpha^t f(s)\,dB_s$ is $\mathcal{F}_t$-measurable for $\alpha \le t \le \beta$, $f \in \mathcal{M}[\alpha, \beta]$. The following generalization of Proposition 1.1 is almost immediate.

Theorem 1.3. Properties (a)–(c) of Proposition 1.1 hold if $f$ (and $g$) $\in \mathcal{M}[\alpha, \beta]$.

Proof. This follows from the corresponding properties of the approximating step functionals $\{f_{n_k}\}$, on taking limits. ∎
Example. Let $f(s) = B_s$. Then $f \in \mathcal{M}[0, \infty)$. Fix $t > 0$. Define $f_n(t) = B_t$ and

$f_n(s) := B_{r2^{-n}t} \quad \text{if } r2^{-n}t \le s < (r+1)2^{-n}t \ (0 \le r \le 2^n - 1).$

Then, writing $B_{r,n} := B_{r2^{-n}t}$,

$\int_0^t f_n(s)\,dB_s = \sum_{r=0}^{2^n-1} B_{r,n}(B_{r+1,n} - B_{r,n}) = -\frac{1}{2}\sum_{r=0}^{2^n-1}(B_{r+1,n} - B_{r,n})^2 + \frac{1}{2}(B_t^2 - B_0^2) \to -\frac{1}{2}t - \frac{1}{2}B_0^2 + \frac{1}{2}B_t^2 \quad \text{a.s. (as } n \to \infty).$   (1.34)

The last convergence follows from an application of the Borel–Cantelli Lemma (Exercise 4). Hence,

$\int_0^t B_s\,dB_s = \frac{1}{2}(B_t^2 - B_0^2) - \frac{1}{2}t.$   (1.35)

Notice that a formal application of ordinary calculus would yield $\int_0^t B_s\,dB_s = \frac{1}{2}(B_t^2 - B_0^2)$. Also, if one replaced the values of $f_n$ at the left end points of the subintervals by those at the right end points, then one would get, in place of the first sum in (1.34),

$\sum_{r=0}^{2^n-1} B_{r+1,n}(B_{r+1,n} - B_{r,n}) = \frac{1}{2}\sum_{r=0}^{2^n-1}(B_{r+1,n} - B_{r,n})^2 + \frac{1}{2}(B_t^2 - B_0^2),$

which converges a.s. to $\frac{1}{2}t + \frac{1}{2}(B_t^2 - B_0^2)$. Observe finally that the stochastic integral in (1.2) is well defined if $\{\sigma(X_s)\}$ belongs to $\mathcal{M}[0, \infty)$. In the next section, it is shown that there exists a unique continuous nonanticipative solution of (1.2) if $\mu(\cdot)$ and $\sigma(\cdot)$ are Lipschitzian and, in particular, $\{\sigma(X_t)\} \in \mathcal{M}[0, \infty)$.
5
2 CONSTRUCTION OF DIFFUSIONS AS SOLUTIONS OF STOCHASTIC DIFFERENTIAL EQUATIONS In the last section, a precise meaning was given to the stochastic differential equation (1.1) in terms of its integral version (1.2). The present section is devoted to the solution of (1.1) (or (1.2)). 2.1 Construction of One-Dimensional Diffusions Let u(x) and v(x) be two real-valued functions on R' that are Lipschitzian, i.e., there exists a constant M > 0 such that Iµ(x) — i (Y)I <' Mix — YI,
1Q(x) — Q(Y)I '< Mix — YI ,
Vx, y. (2.1)
The first order of business is to show that equation (1.1) is valid as a stochastic integral equation in the sense of Ito, i.e., Xt = X« +
Jt µ(Xs) ds + J 0
t
a(XS) dB5,
(t ^ a).
(2.2)
a
Theorem 2.2. Suppose p(•), a(.) satisfy (2.1). Then, for each 3-measurable square integrable random variable X, there exists a unique (except on a set of zero probability) continuous nonanticipative functional {Xt : t >, a} belonging to .‚i [a, oo) that satisfies (2.2).
Proof. We prove "existence" by the method of iterations. Fix $T > \alpha$. Let

$X_t^{(0)} := X_\alpha, \quad \alpha \le t \le T.$   (2.3)

Define, recursively,

$X_t^{(n+1)} := X_\alpha + \int_\alpha^t \mu(X_s^{(n)})\,ds + \int_\alpha^t \sigma(X_s^{(n)})\,dB_s, \quad \alpha \le t \le T.$   (2.4)

For example,

$X_t^{(1)} = X_\alpha + \mu(X_\alpha)(t - \alpha) + \sigma(X_\alpha)(B_t - B_\alpha),$
$X_t^{(2)} = X_\alpha + \int_\alpha^t \mu([X_\alpha + \mu(X_\alpha)(s - \alpha) + \sigma(X_\alpha)(B_s - B_\alpha)])\,ds + \int_\alpha^t \sigma([X_\alpha + \mu(X_\alpha)(s - \alpha) + \sigma(X_\alpha)(B_s - B_\alpha)])\,dB_s, \quad \alpha \le t \le T.$   (2.5)

Note that, for each $n$, $\{X_t^{(n)}: \alpha \le t \le T\}$ is a continuous nonanticipative functional on $[\alpha, T]$. Also,

$(X_t^{(n+1)} - X_t^{(n)})^2 = \left[\int_\alpha^t (\mu(X_s^{(n)}) - \mu(X_s^{(n-1)}))\,ds + \int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)}))\,dB_s\right]^2$
$\le 2\left(\int_\alpha^t (\mu(X_s^{(n)}) - \mu(X_s^{(n-1)}))\,ds\right)^2 + 2\left(\int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)}))\,dB_s\right)^2, \quad (n \ge 1).$   (2.6)

Write

$D_t^{(n)} := E\Big(\max_{\alpha \le s \le t}(X_s^{(n)} - X_s^{(n-1)})^2\Big).$   (2.7)
Taking expectations of the maximum in (2.6), over $\alpha \le t \le T$, and using (2.1) we get

$D_T^{(n+1)} \le 2M^2 E\left(\int_\alpha^T |X_s^{(n)} - X_s^{(n-1)}|\,ds\right)^2 + 2E\max_{\alpha \le t \le T}\left(\int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)}))\,dB_s\right)^2.$   (2.8)

Now,

$E\left(\int_\alpha^T |X_s^{(n)} - X_s^{(n-1)}|\,ds\right)^2 \le (T - \alpha)E\int_\alpha^T (X_s^{(n)} - X_s^{(n-1)})^2\,ds \le (T - \alpha)\int_\alpha^T D_s^{(n)}\,ds.$   (2.9)

Also, by the Maximal Inequality (Chapter I, Theorem 13.6) and Theorem 1.3 (see the second relation in (1.12)),

$E\max_{\alpha \le t \le T}\left(\int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)}))\,dB_s\right)^2 \le 4E\left(\int_\alpha^T (\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)}))\,dB_s\right)^2 = 4\int_\alpha^T E(\sigma(X_s^{(n)}) - \sigma(X_s^{(n-1)}))^2\,ds \le 4M^2\int_\alpha^T D_s^{(n)}\,ds.$   (2.10)

Using (2.9) and (2.10) in (2.8), obtain

$D_T^{(n+1)} \le (2M^2(T - \alpha) + 8M^2)\int_\alpha^T D_s^{(n)}\,ds =: c_1\int_\alpha^T D_s^{(n)}\,ds \quad (n \ge 1),$   (2.11)

say. An analogous, but simpler, calculation gives

$D_T^{(1)} \le 2(T - \alpha)^2 E\mu^2(X_\alpha) + 8(T - \alpha)E\sigma^2(X_\alpha) =: c_2,$   (2.12)

where $c_2$ is a finite positive number (Exercise 1). It follows from (2.11), (2.12), and induction that

$D_T^{(n+1)} \le \frac{c_2(T - \alpha)^n c_1^n}{n!} \quad (n \ge 0).$   (2.13)

By Chebyshev's Inequality,

$P\Big(\max_{\alpha \le t \le T}|X_t^{(n+1)} - X_t^{(n)}| > 2^{-n}\Big) \le 2^{2n}\,\frac{c_2(T - \alpha)^n c_1^n}{n!}.$   (2.14)

Since the right side is summable in $n$, it follows from the Borel–Cantelli Lemma that

$P\Big(\max_{\alpha \le t \le T}|X_t^{(n+1)} - X_t^{(n)}| > 2^{-n} \text{ for infinitely many } n\Big) = 0.$   (2.15)
Let $N$ denote the set within parentheses in (2.15). Outside $N$, the series

$X_t^{(0)} + \sum_{n=0}^{\infty}(X_t^{(n+1)} - X_t^{(n)}) = \lim_{n\to\infty} X_t^{(n)}$

converges absolutely, uniformly for $\alpha \le t \le T$, to a nonanticipative functional $\{X_t: \alpha \le t \le T\}$, say. Since each $X_t^{(n)}$ is continuous and the convergence is uniform, the limit is continuous. Also, by Fatou's Lemma,

$\int_\alpha^T E(X_t^{(n)} - X_t)^2\,dt \le \liminf_{m\to\infty}\int_\alpha^T E(X_t^{(n)} - X_t^{(m)})^2\,dt.$   (2.16)

By the triangle inequality for $L^2$-norms, and (2.13), if $m > n$ then

$\left(\int_\alpha^T E(X_t^{(n)} - X_t^{(m)})^2\,dt\right)^{1/2} \le \sum_{r=n+1}^{m}\left(\int_\alpha^T E(X_t^{(r-1)} - X_t^{(r)})^2\,dt\right)^{1/2} \le \sum_{r=n+1}^{\infty}((T - \alpha)D_T^{(r)})^{1/2} \to 0,$   (2.17)

as $n \to \infty$ (Exercise 2). Therefore, $\{X_t: \alpha \le t \le T\}$ is in $\mathcal{M}[\alpha, T]$, and

$\int_\alpha^T E(X_t^{(n)} - X_t)^2\,dt \to 0, \quad \text{as } n \to \infty.$   (2.18)
To prove that $X_t$ satisfies (2.2), note that

$\max_{\alpha \le t \le T}\left|\int_\alpha^t \mu(X_s^{(n)})\,ds + \int_\alpha^t \sigma(X_s^{(n)})\,dB_s - \int_\alpha^t \mu(X_s)\,ds - \int_\alpha^t \sigma(X_s)\,dB_s\right| \le M\int_\alpha^T |X_s^{(n)} - X_s|\,ds + \max_{\alpha \le t \le T}\left|\int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s))\,dB_s\right|.$   (2.19)

The first term on the right side goes to zero, as proved above. For the second term, use the Maximal Inequality (Chapter I, Theorem 13.6) to get

$P\left(\max_{\alpha \le t \le T}\left|\int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s))\,dB_s\right| > \frac{1}{k}\right) \le k^2 E\int_\alpha^T (\sigma(X_s^{(n)}) - \sigma(X_s))^2\,ds \le k^2 M^2\Big(\sum_{r=n+1}^{\infty}((T - \alpha)D_T^{(r)})^{1/2}\Big)^2.$   (2.20)

Since the last expression in (2.20) is summable in $n$ (Exercise 2), it follows from the Borel–Cantelli Lemma that $\max\{|\int_\alpha^t (\sigma(X_s^{(n)}) - \sigma(X_s))\,dB_s|: \alpha \le t \le T\} \le 1/k$ for all sufficiently large $n$, outside a set $N_k$ of zero probability. Let $N_0 := \bigcup\{N_k: 1 \le k < \infty\}$. Then $N_0$ has zero probability and, outside $N_0$, the second term on the right side of (2.19) goes to zero. Thus, the right side of (2.4) converges uniformly on $[\alpha, T]$ to the right side of (2.2) as $n \to \infty$, outside a set of zero probability. Since $X_t^{(n)}$ converges to $X_t$ uniformly on $[\alpha, T]$ outside a set of zero probability, (2.2) holds on $[\alpha, T]$ outside a set of zero probability.

It remains to prove the uniqueness of the solution to (2.2). Let $\{Y_t: \alpha \le t \le T\}$ be another solution. Then write $\varphi_t := E(\max\{|X_s - Y_s|: \alpha \le s \le t\})^2$ and use (2.7)–(2.11), with $\varphi_t$ in place of $D_t^{(n+1)}$, $X_t$ in place of $X_t^{(n)}$, and $Y_t$ in place of $X_t^{(n-1)}$, to get

$\varphi_T \le c_1\int_\alpha^T \varphi_s\,ds.$   (2.21)

Since $t \to \varphi_t$ is nondecreasing, iteration of (2.21) leads, just as in (2.12), (2.13), to

$\varphi_T \le \frac{c_2(T - \alpha)^n c_1^n}{n!} \to 0 \quad \text{as } n \to \infty.$   (2.22)

Hence, $\varphi_T = 0$, i.e., $X_t = Y_t$ on $[\alpha, T]$ outside a set of zero probability. ∎
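The convergence of the successive approximations (2.3)–(2.4) can be watched directly on a discretized path. The sketch below is illustrative only (the coefficients $\mu$, $\sigma$, the grid, and all names are assumptions, not from the text): it iterates the map (2.4) with the integrals replaced by Riemann–Itô sums on a fixed grid, driven by one fixed sequence of Brownian increments. The printed sup-distances between consecutive iterates decay rapidly, as (2.13)–(2.15) predict.

import numpy as np

rng = np.random.default_rng(1)
T, n, x0 = 1.0, 4000, 0.5
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)   # one fixed Brownian path

mu = lambda x: -x                  # illustrative Lipschitz drift
sigma = lambda x: 0.4 + 0.0 * x    # illustrative (constant) diffusion coefficient

X = np.full(n + 1, x0)             # X^{(0)} identically X_alpha, as in (2.3)
for k in range(6):
    # discretized (2.4): X^{(n+1)}_t = x0 + int mu(X^{(n)}) ds + int sigma(X^{(n)}) dB
    X_new = x0 + np.concatenate(
        ([0.0], np.cumsum(mu(X[:-1]) * dt + sigma(X[:-1]) * dB)))
    print(k, np.max(np.abs(X_new - X)))   # sup-distance between successive iterates
    X = X_new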
The next result identifies the stochastic process $\{X_t: t \ge 0\}$ solving (2.2), in the case $\alpha = 0$, as a diffusion with drift $\mu(\cdot)$, diffusion coefficient $\sigma^2(\cdot)$, and initial distribution the distribution of the given random variable $X_0$.

Theorem 2.2. Assume (2.1). For each $x \in \mathbb{R}^1$ let $\{X_t^x: t \ge 0\}$ denote the unique continuous solution in $\mathcal{M}[0, \infty)$ of Itô's integral equation

$X_t^x = x + \int_0^t \mu(X_s^x)\,ds + \int_0^t \sigma(X_s^x)\,dB_s \quad (t \ge 0).$   (2.23)

Then $\{X_t^x\}$ is a diffusion on $\mathbb{R}^1$ with drift $\mu(\cdot)$ and diffusion coefficient $\sigma^2(\cdot)$, starting at $x$.

Proof. By the additivity of the Riemann and stochastic integrals,

$X_t^x = X_s^x + \int_s^t \mu(X_u^x)\,du + \int_s^t \sigma(X_u^x)\,dB_u \quad (t \ge s).$   (2.24)

Consider also the equation, for $z \in \mathbb{R}^1$,

$X_t = z + \int_s^t \mu(X_u)\,du + \int_s^t \sigma(X_u)\,dB_u \quad (t \ge s).$   (2.25)

Let us write the solution to (2.25) (in $\mathcal{M}[s, \infty)$) as $\theta(s, t; z, B_s^+)$, where $B_s^+ := \{B_u - B_s: s \le u \le t\}$. It may be seen from the successive approximation scheme (2.3)–(2.5) that $\theta(s, t; z, B_s^+)$ is measurable in $(z, B_s^+)$ (theoretical complement 1). As $\{X_u^x: u \ge s\}$ is a continuous stochastic process in $\mathcal{M}[s, \infty)$ and is, by (2.24), a solution to (2.25) with $z = X_s^x$, it follows from the uniqueness of this solution that

$X_t^x = \theta(s, t; X_s^x, B_s^+) \quad (t \ge s).$   (2.26)

Since $X_s^x$ is $\mathcal{F}_s$-measurable and $\mathcal{F}_s$ and $B_s^+$ are independent, (2.26) implies (see Section 4 of Chapter 0, part (b) of the theorem on Independence and Conditional Expectation) that the conditional distribution of $X_t^x$ given $\mathcal{F}_s$ is the distribution of $\theta(s, t; z, B_s^+)$, say $q(s, t; z, dy)$, evaluated at $z = X_s^x$. Since $\sigma\{X_u^x: 0 \le u \le s\} \subset \mathcal{F}_s$, the Markov property is proved. To prove homogeneity of this Markov process, notice that for every $h > 0$, the solution $\theta(s + h, t + h; z, B_{s+h}^+)$ of

$X_{t+h} = z + \int_{s+h}^{t+h} \mu(X_u)\,du + \int_{s+h}^{t+h} \sigma(X_u)\,dB_u \quad (t \ge s)$   (2.27)
has the same distribution as that of (2.25). This fact is verified by noting that the successive approximations (2.3)–(2.5) yield the same functions in the two cases except that, in the case of (2.27), $B_s^+$ is replaced by $B_{s+h}^+$. But $B_s^+$ and $B_{s+h}^+$ have the same distribution, so that $q(s + h, t + h; z, dy) = q(s, t; z, dy)$. This proves homogeneity.

To prove that $\{X_t^x\}$ is a diffusion in the sense of (1.2) or (1.2)' of Chapter V, assume for the sake of simplicity that $\mu(\cdot)$ and $\sigma^2(\cdot)$ are bounded (see Exercise 7 for the general case). Then

$E(X_t^x - x) = E\int_0^t \mu(X_s^x)\,ds + E\int_0^t \sigma(X_s^x)\,dB_s = E\int_0^t \mu(X_s^x)\,ds = t\mu(x) + \int_0^t E(\mu(X_s^x) - \mu(x))\,ds = t\mu(x) + o(t),$   (2.28)

since $E\mu(X_s^x) - \mu(x) \to 0$ as $s \downarrow 0$. Next,

$E(X_t^x - x)^2 = E\left(\int_0^t \mu(X_s^x)\,ds\right)^2 + 2E\left[\left(\int_0^t \mu(X_s^x)\,ds\right)\left(\int_0^t \sigma(X_s^x)\,dB_s\right)\right] + E\left(\int_0^t \sigma(X_s^x)\,dB_s\right)^2 = O(t^2) + o(t) + \int_0^t E\sigma^2(X_s^x)\,ds = O(t) \quad \text{as } t \downarrow 0.$   (2.29)

For, by the Schwarz Inequality,

$\left|E\left[\left(\int_0^t \mu(X_s^x)\,ds\right)\left(\int_0^t \sigma(X_s^x)\,dB_s\right)\right]\right| \le \left[E\left(\int_0^t \mu(X_s^x)\,ds\right)^2\right]^{1/2}\left[E\left(\int_0^t \sigma(X_s^x)\,dB_s\right)^2\right]^{1/2} = O(t)\left[\int_0^t E\sigma^2(X_s^x)\,ds\right]^{1/2} = O(t)\,O(t^{1/2}) = o(t) \quad \text{as } t \downarrow 0,$

leading to

$E(X_t^x - x)^2 = \int_0^t E\sigma^2(X_s^x)\,ds + o(t) \quad \text{as } t \downarrow 0.$   (2.30)

Now, as in (2.28),

$\int_0^t E\sigma^2(X_s^x)\,ds = t\sigma^2(x) + \int_0^t E(\sigma^2(X_s^x) - \sigma^2(x))\,ds = t\sigma^2(x) + o(t) \quad \text{as } t \downarrow 0.$   (2.31)

The last condition (1.2)' of Chapter V is checked in Section 3, Corollary 3.5. ∎
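Relations (2.28)–(2.31) say that the drift and the diffusion coefficient can be read off from small-time moments of the increments. A hedged numerical sketch (the coefficients, starting point, and step sizes below are arbitrary illustrations, not from the text): simulate many paths of (2.23) by the Euler scheme over a short horizon $t$ and compare the empirical moment ratios with $\mu(x)$ and $\sigma^2(x)$.

import numpy as np

rng = np.random.default_rng(2)
mu = lambda x: 1.0 - x                     # illustrative Lipschitz coefficients
sigma = lambda x: 0.5 + 0.1 * np.sin(x)

x, t, n_steps, n_paths = 0.3, 0.01, 100, 200000
dt = t / n_steps
X = np.full(n_paths, x)
for _ in range(n_steps):                   # Euler discretization of (2.23)
    X += mu(X) * dt + sigma(X) * rng.normal(0.0, np.sqrt(dt), n_paths)

print(np.mean(X - x) / t, mu(x))              # (2.28): close to mu(x)
print(np.mean((X - x)**2) / t, sigma(x)**2)   # (2.30)-(2.31): close to sigma^2(x)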
2.2 Construction of Multidimensional Diffusions

In order to construct multidimensional diffusions by the method of Itô it is necessary to define stochastic integrals for vector-valued integrands with respect to the increments of a multidimensional Brownian motion. This turns out to be rather straightforward. Let $\{B_t = (B_t^{(1)}, \ldots, B_t^{(k)})\}$ be a standard $k$-dimensional Brownian motion. Assume (1.4) for $\{B_t\}$. Define a vector-valued stochastic process

$f = \{f(t) = (f^{(1)}(t), \ldots, f^{(k)}(t)): \alpha \le t \le \beta\}$

to be a nonanticipative step functional on $[\alpha, \beta]$ if (1.5) holds for some finite set of time points $t_0 = \alpha < t_1 < \cdots < t_m = \beta$.

Definition 2.1. The stochastic integral, or the Itô integral, of a nonanticipative step functional $f$ on $[\alpha, \beta]$ is defined by

$\int_\alpha^t f(s)\cdot dB_s := f(t_0)\cdot(B_t - B_{t_0}) \quad \text{for } t \in [\alpha, t_1],$
$\int_\alpha^t f(s)\cdot dB_s := \sum_{j=1}^{i} f(t_{j-1})\cdot(B_{t_j} - B_{t_{j-1}}) + f(t_i)\cdot(B_t - B_{t_i}) \quad \text{for } t \in (t_i, t_{i+1}].$   (2.32)

Here $\cdot$ (dot) denotes Euclidean inner product,

$f(s)\cdot(B_t - B_s) = \sum_{i=1}^{k} f^{(i)}(s)(B_t^{(i)} - B_s^{(i)}).$   (2.33)

It follows from this definition that the stochastic integral (2.32) is the sum of $k$ one-dimensional stochastic integrals,

$\int_\alpha^t f(s)\cdot dB_s = \sum_{i=1}^{k}\int_\alpha^t f^{(i)}(s)\,dB_s^{(i)}.$   (2.34)
Parts (a), (b) of Proposition 1.1 extend immediately. In order to extend part (c) assume, as in (1.9),

$\int_\alpha^\beta E|f(u)|^2\,du = \sum_{i=1}^{k}\int_\alpha^\beta E(f^{(i)}(u))^2\,du < \infty.$   (2.35)

The square integrability and the martingale property of the stochastic integral follow from those of each of its $k$ summands in (2.34). Similarly, one has

$E\left(\int_\alpha^\beta f(s)\cdot dB_s\right) = 0.$   (2.36)

It remains to prove the analog of (1.11), namely,

$E\left(\left(\int_\alpha^\beta f(u)\cdot dB_u\right)^2 \Big|\ \mathcal{F}_\alpha\right) = E\left(\int_\alpha^\beta |f(u)|^2\,du \Big|\ \mathcal{F}_\alpha\right).$   (2.37)

In view of (1.11) applied to $\int_\alpha^\beta f^{(i)}(u)\,dB_u^{(i)}$ $(1 \le i \le k)$, it is enough to prove that the product terms vanish, i.e.,

$E\left[\left(\int_\alpha^\beta f^{(i)}(u)\,dB_u^{(i)}\right)\left(\int_\alpha^\beta f^{(j)}(u)\,dB_u^{(j)}\right) \Big|\ \mathcal{F}_\alpha\right] = 0 \quad \text{for } i \ne j.$   (2.38)

To see this, note that, for $s < s' \le u < u'$,

$E[f^{(i)}(s)(B_{s'}^{(i)} - B_s^{(i)})\,f^{(j)}(u)(B_{u'}^{(j)} - B_u^{(j)}) \mid \mathcal{F}_\alpha] = E[E(\cdots \mid \mathcal{F}_u) \mid \mathcal{F}_\alpha] = E[f^{(i)}(s)(B_{s'}^{(i)} - B_s^{(i)})\,f^{(j)}(u)\,E(B_{u'}^{(j)} - B_u^{(j)} \mid \mathcal{F}_u) \mid \mathcal{F}_\alpha] = 0.$   (2.39)

Also, by the independence of $\mathcal{F}_s$ and $\{B_{s'} - B_s: s' \ge s\}$, and by the independence of $\{B^{(i)}\}$ and $\{B^{(j)}\}$ for $i \ne j$,

$E[f^{(i)}(s)(B_{s'}^{(i)} - B_s^{(i)})\,f^{(j)}(s)(B_{s'}^{(j)} - B_s^{(j)}) \mid \mathcal{F}_\alpha] = E[E(\cdots \mid \mathcal{F}_s) \mid \mathcal{F}_\alpha] = E[f^{(i)}(s)f^{(j)}(s)\{E((B_{s'}^{(i)} - B_s^{(i)})(B_{s'}^{(j)} - B_s^{(j)}) \mid \mathcal{F}_s)\} \mid \mathcal{F}_\alpha] = 0.$

Thus, Proposition 1.1 is fully extended to $k$ dimensions. For the extension to more general nonanticipative functionals, consider the following analog of Definition 1.2.

Definition 2.2. If $f = \{f(t) = (f^{(1)}(t), \ldots, f^{(m)}(t)): \alpha \le t \le \beta\}$ is a right-continuous stochastic process with values in $\mathbb{R}^m$ (for some $m$) such that $f(t)$ is $\mathcal{F}_t$-measurable for all $t \in [\alpha, \beta]$, then $f$ is called a nonanticipative functional on $[\alpha, \beta]$ with values in $\mathbb{R}^m$. If, in addition,

$E\int_\alpha^\beta |f(t)|^2\,dt < \infty,$   (2.40)

then $f$ is said to belong to $\mathcal{M}[\alpha, \beta]$. If $f \in \mathcal{M}[\alpha, \beta]$ for all $\beta > \alpha$, then $f$ belongs to $\mathcal{M}[\alpha, \infty)$.

For $f \in \mathcal{M}[\alpha, \beta]$ with values in $\mathbb{R}^k$ one may apply the results in Section 1 to each of the $k$ components $\int_\alpha^t f^{(i)}(u)\,dB_u^{(i)}$ $(1 \le i \le k)$ to extend Proposition 1.2 to $k$ dimensions and to define the stochastic integral $\int_\alpha^t f(u)\cdot dB_u$ for a $k$-dimensional $f \in \mathcal{M}[\alpha, \beta]$. The following extension of Theorem 1.3 is now immediate.

Theorem 2.3. Suppose $f \in \mathcal{M}[\alpha, \beta]$ with values in $\mathbb{R}^k$. Then
(a) $t \to \int_\alpha^t f(u)\cdot dB_u$ is continuous and additive; also
(b) $\{\int_\alpha^t f(u)\cdot dB_u: \alpha \le t \le \beta\}$ is a square integrable $\{\mathcal{F}_t\}$-martingale and

$E\left(\left(\int_\alpha^\beta f(u)\cdot dB_u\right)^2 \Big|\ \mathcal{F}_\alpha\right) = E\left(\int_\alpha^\beta |f(u)|^2\,du \Big|\ \mathcal{F}_\alpha\right).$   (2.41)
The final task is to construct multidimensional diffusions. For this let $\mu^{(i)}(x)$, $\sigma_{ij}(x)$ $(1 \le i, j \le k)$ be real-valued Lipschitzian functions on $\mathbb{R}^k$. Then, writing $\sigma(x)$ for the $k \times k$ matrix $((\sigma_{ij}(x)))$,

$|\mu(x) - \mu(y)| \le M|x - y|, \quad \|\sigma(x) - \sigma(y)\| \le M|x - y| \quad (x, y \in \mathbb{R}^k)$   (2.42)

for some positive constant $M$. Here $\|\cdot\|$ denotes the matrix norm, $\|D\| := \sup\{|Dz|: z \in \mathbb{R}^k, |z| \le 1\}$. Consider the vector stochastic integral equation

$X_t = X_\alpha + \int_\alpha^t \mu(X_s)\,ds + \int_\alpha^t \sigma(X_s)\,dB_s \quad (t \ge \alpha),$   (2.43)

which is shorthand for the system of $k$ equations

$X_t^{(i)} = X_\alpha^{(i)} + \int_\alpha^t \mu^{(i)}(X_s)\,ds + \int_\alpha^t \sigma_i(X_s)\cdot dB_s \quad (1 \le i \le k).$   (2.44)

Here $\sigma_i(x)$ is the $k$-dimensional vector $(\sigma_{i1}(x), \ldots, \sigma_{ik}(x))$. The proofs of the following theorems are entirely analogous to those of Theorems 2.1 and 2.2 (Exercises 4 and 5). Basically, one only needs to write $|X_{n+1}(t) - X_n(t)|^2$ instead of $(X_{n+1}(t) - X_n(t))^2$, $|\int_\alpha^t (\mu(X_n(s)) - \mu(X_{n-1}(s)))\,ds|^2$ in place of $(\int_\alpha^t (\mu(X_n(s)) - \mu(X_{n-1}(s)))\,ds)^2$, $\|\sigma(X_n(s)) - \sigma(X_{n-1}(s))\|^2$ for $(\sigma(X_n(s)) - \sigma(X_{n-1}(s)))^2$, etc.

Theorem 2.4. Suppose $\mu(\cdot)$, $\sigma(\cdot)$ satisfy (2.42). Then, for each $\mathcal{F}_\alpha$-measurable square integrable random vector $X_\alpha$, there exists a unique (up to a $P$-null set) continuous element $\{X_t: t \ge \alpha\}$ of $\mathcal{M}[\alpha, \infty)$ such that (2.43) holds.

Theorem 2.5. Suppose $\mu(\cdot)$, $\sigma(\cdot)$ satisfy (2.42), and let $\{X_t^x: t \ge 0\}$ denote the unique (up to a $P$-null set) continuous nonanticipative functional in $\mathcal{M}[0, \infty)$ satisfying Itô's integral equation

$X_t^x = x + \int_0^t \mu(X_s^x)\,ds + \int_0^t \sigma(X_s^x)\,dB_s \quad (t \ge 0).$   (2.45)

Then $\{X_t^x\}$ is a diffusion on $\mathbb{R}^k$ with drift $\mu(\cdot)$ and diffusion matrix $\sigma(\cdot)\sigma'(\cdot)$, $\sigma'(\cdot)$ being the transpose of $\sigma(\cdot)$.

It may be noted that in Theorems 2.4 and 2.5 it is not assumed that $\sigma(x)$ is positive definite. The positive definiteness guarantees the existence of a density for the transition probability and its smoothness (see theoretical complement 5.1 of Chapter V), but is not needed for the Markov property.
Example. Let $k = 1$, $\mu(x) = -\gamma x$, $\sigma(x) = \sigma$. Then the successive approximations $X_t^{(n)}$ (see Eqs. 2.3–2.5) are given by (assuming $B_0 = 0$, with $X_t^{(0)} \equiv X_0$):

$X_t^{(1)} = X_0 + \int_0^t (-\gamma X_0)\,ds + \sigma B_t = X_0(1 - t\gamma) + \sigma B_t,$

$X_t^{(2)} = X_0 + \int_0^t -\gamma\{X_0(1 - s\gamma) + \sigma B_s\}\,ds + \sigma B_t = \Big(1 - t\gamma + \frac{t^2\gamma^2}{2!}\Big)X_0 - \gamma\sigma\int_0^t B_s\,ds + \sigma B_t,$

$X_t^{(3)} = X_0 + \int_0^t -\gamma\Big\{\Big(1 - s\gamma + \frac{s^2\gamma^2}{2!}\Big)X_0 - \gamma\sigma\int_0^s B_u\,du + \sigma B_s\Big\}\,ds + \sigma B_t$
$= \Big(1 - t\gamma + \frac{t^2\gamma^2}{2!} - \frac{t^3\gamma^3}{3!}\Big)X_0 + \gamma^2\sigma\int_0^t\int_0^s B_u\,du\,ds - \gamma\sigma\int_0^t B_s\,ds + \sigma B_t$
$= \Big(1 - t\gamma + \frac{t^2\gamma^2}{2!} - \frac{t^3\gamma^3}{3!}\Big)X_0 + \gamma^2\sigma\int_0^t (t - u)B_u\,du - \gamma\sigma\int_0^t B_s\,ds + \sigma B_t.$   (2.46)

Assume, as an induction hypothesis,

$X_t^{(n)} = \Big(\sum_{m=0}^{n}\frac{(-t\gamma)^m}{m!}\Big)X_0 + \sigma\sum_{m=1}^{n-1}(-\gamma)^m\int_0^t \frac{(t - s)^{m-1}}{(m - 1)!}B_s\,ds + \sigma B_t.$   (2.47)

Now use (2.4) to check that (2.47) holds with $n + 1$ replacing $n$. Therefore, (2.47) holds for all $n$. But as $n \to \infty$, the right side converges (for every $\omega \in \Omega$) to

$e^{-t\gamma}X_0 - \gamma\sigma\int_0^t e^{-\gamma(t-s)}B_s\,ds + \sigma B_t.$   (2.48)

Hence, $X_t$ equals (2.48). In particular, with $X_0 = x$,

$X_t = e^{-t\gamma}x - \gamma\sigma\int_0^t e^{-\gamma(t-s)}B_s\,ds + \sigma B_t.$   (2.49)

As a special case, for $\gamma > 0$, $\sigma \ne 0$, (2.49) gives a representation of an Ornstein–Uhlenbeck process as a functional of a Brownian motion.
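The closed form (2.49) can be checked against a direct Euler discretization of $dX_t = -\gamma X_t\,dt + \sigma\,dB_t$ driven by the same Brownian path. A minimal sketch (parameter values and names are arbitrary illustrations, not from the text):

import numpy as np

rng = np.random.default_rng(3)
gamma, sig, x0 = 2.0, 0.7, 1.0
T, n = 1.0, 20000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(dB)))
s = np.linspace(0.0, T, n + 1)

# representation (2.49): X_T = e^{-gamma T} x0 - gamma sig int_0^T e^{-gamma(T-s)} B_s ds + sig B_T
riemann = np.sum(np.exp(-gamma * (T - s[:-1])) * B[:-1]) * dt
x_repr = np.exp(-gamma * T) * x0 - gamma * sig * riemann + sig * B[-1]

x = x0
for k in range(n):        # Euler scheme for dX = -gamma X dt + sig dB, same path
    x += -gamma * x * dt + sig * dB[k]

print(x_repr, x)          # the two values agree up to discretization error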
3 ITÔ'S LEMMA

Brownian paths $s \to B_s$ have finite quadratic variation on every finite time interval $[0, t]$ in the sense that

$\max_{1 \le N \le 2^n}\Big|\sum_{m=0}^{N-1}(B_{(m+1)2^{-n}t} - B_{m2^{-n}t})^2 - N2^{-n}t\Big| \to 0 \quad \text{a.s. as } n \to \infty.$   (3.1)

This is easily checked by recognizing that the expression $Z_{N,n}$, say, within the absolute value signs is, for each $n$, a martingale $(1 \le N \le 2^n)$, so that the Maximal Inequality (Chapter I, Eq. 13.56) may be used to prove (3.1) (Exercise 1). Notice that the quadratic variation of $\{B_s\}$ over an interval equals the length of the interval. A consequence of (3.1) is a curious and extremely important "chain rule" for the stochastic calculus, called Itô's Lemma. The present section is devoted to a derivation of this chain rule and some applications.

For an intuitive understanding of this chain rule, consider a nonanticipative functional of the form
$Y(t) = Y(0) + \int_0^t f(s)\,ds + \int_0^t g(s)\,dB_s,$   (3.2)

where $f, g \in \mathcal{M}[0, T]$, and $Y(0)$ is $\mathcal{F}_0$-measurable and square integrable. One may express (3.2) in the differential form

$dY(t) = f(t)\,dt + g(t)\,dB_t.$   (3.3)

Suppose that $\varphi$ is a real-valued twice continuously differentiable function on $\mathbb{R}^1$, say with bounded derivatives $\varphi'$, $\varphi''$. Then Itô's Lemma says

$d\varphi(Y(t)) = \varphi'(Y(t))\,dY(t) + \tfrac{1}{2}\varphi''(Y(t))g^2(t)\,dt = \{\varphi'(Y(t))f(t) + \tfrac{1}{2}\varphi''(Y(t))g^2(t)\}\,dt + \varphi'(Y(t))g(t)\,dB_t.$   (3.4)

In other words,

$\varphi(Y(t)) = \varphi(Y(0)) + \int_0^t \{\varphi'(Y(s))f(s) + \tfrac{1}{2}\varphi''(Y(s))g^2(s)\}\,ds + \int_0^t \varphi'(Y(s))g(s)\,dB_s.$   (3.5)

Observe that a formal application of ordinary calculus would give

$d\varphi(Y(t)) = \varphi'(Y(t))\,dY(t) = \varphi'(Y(t))f(t)\,dt + \varphi'(Y(t))g(t)\,dB_t.$

The extra term $\tfrac{1}{2}\varphi''(Y(t))g^2(t)\,dt$ appearing in (3.4) arises because

$(dY(t))^2 = g^2(t)(dB_t)^2 + o(dt) = g^2(t)\,dt + o(dt).$   (3.6)
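The second-order correction in (3.4)–(3.6) is visible numerically. Taking $Y = B$ (so $f = 0$, $g = 1$) and $\varphi(y) = y^3$, Itô's Lemma gives $B_t^3 = 3\int_0^t B_s^2\,dB_s + 3\int_0^t B_s\,ds$; the classical chain rule would omit the last integral. A small sketch along one simulated path (illustrative only, not from the text):

import numpy as np

rng = np.random.default_rng(4)
t, n = 1.0, 2**18
dt = t / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate(([0.0], np.cumsum(dB)))

# Ito's Lemma with phi(y) = y^3: B_t^3 = 3 int B^2 dB + 3 int B ds
ito_rhs = 3.0 * np.sum(B[:-1]**2 * dB) + 3.0 * np.sum(B[:-1]) * dt
print(B[-1]**3, ito_rhs)                       # approximately equal
print(B[-1]**3, 3.0 * np.sum(B[:-1]**2 * dB))  # without the correction: visibly off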
Since the term $g^2(t)\,dt$ cannot be neglected in computing the differential $d\varphi(Y(t))$, one must expand $\varphi(Y(t + dt))$ around $Y(t)$ in a Taylor expansion including the second derivative of $\varphi$. The same argument as above applied to a function $\varphi(t, y)$ on $[0, T] \times \mathbb{R}^1$, such that $\varphi_0 := \partial\varphi/\partial t$, $\varphi'$, $\varphi''$ are continuous and bounded, leads to

$d\varphi(t, Y(t)) = \{\varphi_0(t, Y(t)) + \varphi'(t, Y(t))f(t) + \tfrac{1}{2}\varphi''(t, Y(t))g^2(t)\}\,dt + \varphi'(t, Y(t))g(t)\,dB_t.$   (3.7)

To state an extension to multidimensions, let $\varphi(t, y)$ be a function on $[0, T] \times \mathbb{R}^m$ that is once continuously differentiable in $t$ and twice in $y$. Write

$\partial_0\varphi(t, y) := \frac{\partial\varphi(t, y)}{\partial t}, \quad \partial_r\varphi(t, y) := \frac{\partial\varphi(t, y)}{\partial y^{(r)}} \quad (1 \le r \le m).$   (3.8)

Let $\{B_t\}$ be a $k$-dimensional standard Brownian motion satisfying conditions (1.4(i), (ii)). Suppose $Y(t)$ is a vector of $m$ processes of the form

$Y(t) = (Y^{(1)}(t), \ldots, Y^{(m)}(t)), \quad Y^{(r)}(t) = Y^{(r)}(0) + \int_0^t f_r(s)\,ds + \int_0^t g_r(s)\cdot dB_s \quad (1 \le r \le m).$   (3.9)

Here $f_1, \ldots, f_m$ are real-valued and $g_1, \ldots, g_m$ vector-valued (with values in $\mathbb{R}^k$) nonanticipative functionals belonging to $\mathcal{M}[0, T]$. Also the $Y^{(r)}(0)$ are $\mathcal{F}_0$-measurable square integrable random variables. One may express (3.9) in the differential form

$dY^{(r)}(t) = f_r(t)\,dt + g_r(t)\cdot dB_t \quad (1 \le r \le m).$   (3.10)

Itô's Lemma says

$d\varphi(t, Y(t)) = \Big\{\partial_0\varphi(t, Y(t)) + \sum_{r=1}^{m}\partial_r\varphi(t, Y(t))f_r(t) + \frac{1}{2!}\sum_{1 \le r, r' \le m}\partial_r\partial_{r'}\varphi(t, Y(t))(g_r(t)\cdot g_{r'}(t))\Big\}\,dt + \sum_{r=1}^{m}\partial_r\varphi(t, Y(t))g_r(t)\cdot dB_t.$   (3.11)

In order to arrive at this, write

$d\varphi(t, Y(t)) = \varphi(t + dt, Y(t + dt)) - \varphi(t, Y(t)) = [\varphi(t + dt, Y(t + dt)) - \varphi(t, Y(t + dt))] + [\varphi(t, Y(t + dt)) - \varphi(t, Y(t))]$
$= \partial_0\varphi(t, Y(t))\,dt + \sum_{r=1}^{m}\partial_r\varphi(t, Y(t))\,dY^{(r)}(t) + \frac{1}{2!}\sum_{1 \le r, r' \le m}\partial_r\partial_{r'}\varphi(t, Y(t))\,dY^{(r)}(t)\,dY^{(r')}(t).$   (3.12)

In Newtonian calculus, of course, the contribution of the last sum to the differential would be zero. But there is one term in the product $dY^{(r)}(t)\,dY^{(r')}(t)$ that is of the order $dt$ and, therefore, must be retained in computing the stochastic differential. This term is (see Eq. 3.10)

$(g_r(t)\cdot dB_t)(g_{r'}(t)\cdot dB_t) = \Big(\sum_{i=1}^{k}g_r^{(i)}(t)\,dB_t^{(i)}\Big)\Big(\sum_{j=1}^{k}g_{r'}^{(j)}(t)\,dB_t^{(j)}\Big) = \sum_{i=1}^{k}g_r^{(i)}(t)g_{r'}^{(i)}(t)(dB_t^{(i)})^2 + \sum_{i \ne j}g_r^{(i)}(t)g_{r'}^{(j)}(t)\,dB_t^{(i)}\,dB_t^{(j)}.$   (3.13)

Now as seen above (see Eq. 3.1) the first sum in (3.13) equals

$\Big(\sum_{i=1}^{k}g_r^{(i)}(t)g_{r'}^{(i)}(t)\Big)\,dt = g_r(t)\cdot g_{r'}(t)\,dt.$   (3.14)

To show that the contribution of the second term in (3.13) to $\varphi(t, Y(t)) - \varphi(0, Y(0))$ over any interval $[0, t]$ is zero, note that, for $i \ne j$, the sums

$Z_{N,n} := \sum_{m=0}^{N-1}g_r^{(i)}(m2^{-n}t)g_{r'}^{(j)}(m2^{-n}t)(B_{(m+1)2^{-n}t}^{(i)} - B_{m2^{-n}t}^{(i)})(B_{(m+1)2^{-n}t}^{(j)} - B_{m2^{-n}t}^{(j)})$   (3.15)

satisfy

$\max_{1 \le N \le 2^n}|Z_{N,n}| \to 0 \quad \text{a.s.}$   (3.16)

Thus,

$g_r^{(i)}(t)g_{r'}^{(j)}(t)\,dB_t^{(i)}\,dB_t^{(j)} = 0 \quad (i \ne j).$   (3.17)

Using (3.14), (3.17) in (3.12), Itô's lemma (3.11) is obtained. A more elaborate argument is given in theoretical complement 3.1. For ease of reference, here is a statement of Itô's Lemma.

Theorem 3.1. (Itô's Lemma). Assume $f_1, \ldots, f_m$, $g_1, \ldots, g_m$ belong to $\mathcal{M}[0, T]$, and that the $Y^{(r)}(0)$ $(1 \le r \le m)$ are $\mathcal{F}_0$-measurable and square integrable. Let $\varphi(t, y)$ be a real-valued function on $[0, T] \times \mathbb{R}^m$ which is once continuously differentiable in $t$ and twice in $y$. Assume that

$E\int_0^T (\partial_r\varphi(s, Y(s)))^2|g_r(s)|^2\,ds < \infty \quad (1 \le r \le m),$   (3.18)

i.e., $\partial_r\varphi\,g_r \in \mathcal{M}[0, T]$ $(1 \le r \le m)$. Then (3.11) holds, i.e.,

$\varphi(t, Y(t)) - \varphi(s, Y(s)) = \int_s^t \Big\{\partial_0\varphi(u, Y(u)) + \sum_{r=1}^{m}\partial_r\varphi(u, Y(u))f_r(u) + \frac{1}{2!}\sum_{1 \le r, r' \le m}\partial_r\partial_{r'}\varphi(u, Y(u))g_r(u)\cdot g_{r'}(u)\Big\}\,du + \sum_{r=1}^{m}\int_s^t \partial_r\varphi(u, Y(u))g_r(u)\cdot dB_u \quad (0 \le s \le t \le T).$   (3.19)
Applying Itô's Lemma to the diffusion $Y(t) = X_t$ $(t \ge 0)$ constructed in Section 2, the following corollary is obtained immediately.

Corollary 3.2. Let $\{X_t\}$ be a diffusion given by the solution to the stochastic differential equation (2.43), with $\alpha = 0$ and $\mu(\cdot)$, $\sigma(\cdot)$ Lipschitzian. Assume $\varphi(t, y)$ satisfies the hypothesis of Theorem 3.1 with $m = k$, and (3.18) replaced by

$E\int_0^T (\partial_r\varphi)^2(u, X_u)|\sigma_r(X_u)|^2\,du < \infty \quad (1 \le r \le k).$

(a) Then one has the relation

$\varphi(t, X_t) = \varphi(s, X_s) + \int_s^t \{\partial_0\varphi(u, X_u) + (A\varphi)(u, X_u)\}\,du + \sum_{r=1}^{k}\int_s^t \partial_r\varphi(u, X_u)\sigma_r(X_u)\cdot dB_u \quad (s \le t \le T),$   (3.20)

where $\sigma_r(x)$ is the $r$th row vector of $\sigma(x)$, and $A$ is the differential operator (14.9) of Chapter V with $D(x) = \sigma(x)\sigma'(x)$,

$(A\varphi)(u, x) := \frac{1}{2}\sum_{1 \le r, r' \le k}d_{rr'}(x)\partial_r\partial_{r'}\varphi(u, x) + \sum_{r=1}^{k}\mu^{(r)}(x)\partial_r\varphi(u, x), \qquad ((d_{rr'}(x))) := \sigma(x)\sigma'(x).$   (3.21)

(b) In particular, if $\partial_0\varphi$, $\partial_r\varphi$, $\partial_r\partial_{r'}\varphi$ are bounded then

$Z_t := \varphi(t, X_t) - \int_0^t \{\partial_0\varphi(u, X_u) + (A\varphi)(u, X_u)\}\,du \quad (0 \le t \le T)$   (3.22)

is a $\{\mathcal{F}_t\}$-martingale.

Note that part (b) is an immediate consequence of part (a), since $Z_t - Z_s$ equals the stochastic integral in (3.20), whose conditional expectation, given $\mathcal{F}_s$, is zero.

Corollary 3.2 generalizes Corollary 2.4 of Chapter V to a wider class of functions and to multidimension. Therefore, one may provide derivations of Propositions 2.5 and 15.1 of Chapter V, as well as criteria for transience and recurrence based on them, using Itô's Lemma instead of Corollary 2.4 of Chapter V. The next result similarly provides a criterion for positive recurrence. Recall the notation (see Eqs. 15.33 and 15.34 of Chapter V)

$D(x) = ((d_{ij}(x))) := \sigma(x)\sigma'(x), \quad d(x) := \sum_{i,j}d_{ij}(x)x^{(i)}x^{(j)}/|x|^2, \quad B(x) := \sum_i d_{ii}(x), \quad C(x) := 2\sum_i x^{(i)}\mu^{(i)}(x),$
$\bar{\beta}(r) := \max_{|x|=r}\frac{B(x) + C(x)}{d(x)} - 1, \quad \beta(r) := \min_{|x|=r}\frac{B(x) + C(x)}{d(x)} - 1,$   (3.23)
$\bar{\alpha}(r) := \max_{|x|=r}d(x), \quad \alpha(r) := \min_{|x|=r}d(x), \quad \bar{I}(r) := \int_c^r \frac{\bar{\beta}(u)}{u}\,du, \quad I(r) := \int_c^r \frac{\beta(u)}{u}\,du,$

where $c > 0$ is a given constant. Also note that (see Eq. 15.35 of Chapter V) for every $F$ that is twice differentiable on $(0, \infty)$, $A\varphi$ for $\varphi(x) := F(|x|)$ is given by

$2(A\varphi)(x) = d(x)F''(|x|) + \frac{B(x) + C(x) - d(x)}{|x|}F'(|x|) \quad (|x| > 0).$   (3.24)
Proposition 3.3. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian, and $\sigma(x)$ nonsingular for all $x$. Suppose that, for some $c > 0$,

$\int_c^\infty e^{-\bar{I}(u)}\,du = \infty, \qquad \int_c^\infty \frac{1}{\alpha(u)}e^{\bar{I}(u)}\,du < \infty.$   (3.25)

Then

$E(\tau_{\partial B(0:r_0)} \mid X_0 = x) < \infty \quad (|x| > r_0 > 0),$   (3.26)

where

$\tau_{\partial B(0:r)} := \inf\{t \ge 0: |X_t| = r\}.$   (3.27)

Proof. First note that if (3.25) holds for some $c > 0$ then it holds for all $c > 0$ (Exercise 2). Let $c = r_0 > 0$. Define

$F(r) := \int_{r_0}^r e^{-\bar{I}(u)}\Big(\int_u^\infty \frac{1}{\alpha(v)}e^{\bar{I}(v)}\,dv\Big)\,du \quad (r \ge r_0)$   (3.28)

and

$\varphi(x) := F(|x|), \quad |x| \ge r_0.$   (3.29)

Note that $F'(r) > 0$ and $F''(r) + (\bar{\beta}(r)/r)F'(r) = -1/\alpha(r)$ for $r > r_0$. Hence, by (3.24),

$2(A\varphi)(x) \le -d(x)/\alpha(|x|) \le -1 \quad \text{for } |x| \ge r_0.$   (3.30)

Fix $x$ such that $|x| > r_0$. By Corollary 3.2(b), and optional stopping (Proposition 13.9 of Chapter I; also see Exercises 3, 4),

$EZ_{\eta_N} = EZ_0 = \varphi(x) = F(|x|),$   (3.31)

where $Z_t$ is as in (3.22) with $\varphi(t, y) = \varphi(y)$ and $\partial_0\varphi(t, y) = 0$, and $\eta_N$ is the $\{\mathcal{F}_t\}$-stopping time

$\eta_N := \inf\{t \ge 0: |X_t| = r_0 \text{ or } N\} \quad (r_0 < |x| < N).$   (3.32)

Using (3.30) in (3.31) the following inequality is obtained,

$2EF(|X_{\eta_N}|) - 2F(|x|) = E\int_0^{\eta_N} 2(A\varphi)(X_s)\,ds \le -E(\eta_N).$   (3.33)

Now the first relation in (3.25) implies that $\tau_{\partial B(0:r_0)} < \infty$ a.s. (see Corollary 15.2 of Chapter V, or Exercise 5). Therefore, $\eta_N \to \tau_{\partial B(0:r_0)}$ a.s. as $N \to \infty$, so that (3.33) yields

$E(\tau_{\partial B(0:r_0)} \mid X_0 = x) \le 2F(|x|).$   (3.34)  ∎
As in the one-dimensional case (see Section 12 of Chapter V), it may be shown that (3.26) implies the existence of a unique invariant distribution of the diffusion (theoretical complement 5). As a second application of Itô's Lemma, let us obtain an estimate of the fourth moment of a stochastic integral.

Proposition 3.4. Let $f = \{(f_1(t), \ldots, f_k(t)): 0 \le t \le T\}$ be a nonanticipative functional on $[0, T]$ satisfying

$E\int_0^T f_i^4(t)\,dt < \infty \quad (i = 1, \ldots, k).$   (3.35)

Then

$E\left(\int_0^T f(t)\cdot dB_t\right)^4 \le 9k^3 T\sum_{i=1}^{k}\int_0^T \{Ef_i^4(t)\}\,dt.$   (3.36)

Proof. Since $\int_0^T f(t)\cdot dB_t = \sum_{i=1}^{k}\int_0^T f_i(t)\,dB_t^{(i)}$,

$E\left(\int_0^T f(t)\cdot dB_t\right)^4 = k^4 E\left(\frac{1}{k}\sum_{i=1}^{k}\int_0^T f_i(t)\,dB_t^{(i)}\right)^4 \le k^4\cdot\frac{1}{k}\sum_{i=1}^{k}E\left(\int_0^T f_i(t)\,dB_t^{(i)}\right)^4 = k^3\sum_{i=1}^{k}E\left(\int_0^T f_i(t)\,dB_t^{(i)}\right)^4.$   (3.37)

Hence, it is enough to prove

$E\left(\int_0^T g(t)\,dB_t\right)^4 \le 9T\int_0^T \{Eg^4(t)\}\,dt$   (3.38)

for a real-valued nonanticipative functional $g$ satisfying

$E\int_0^T g^4(t)\,dt < \infty.$   (3.39)
First suppose $g$ is a bounded nonanticipative step functional. Then by an explicit computation (Exercise 6) the nonanticipative functional $s \to (\int_0^s g(u)\,dB_u)^3 g(s)$ belongs to $\mathcal{M}[0, T]$. One may then apply Itô's Lemma (see Eq. 3.5, or Eq. 3.20 with $k = 1$) to $\varphi(y) = y^4$ and $Y(t) = \int_0^t g(s)\,dB_s$ to get

$\left(\int_0^t g(s)\,dB_s\right)^4 = 6\int_0^t \left(\int_0^s g(u)\,dB_u\right)^2 g^2(s)\,ds + 4\int_0^t \left(\int_0^s g(u)\,dB_u\right)^3 g(s)\,dB_s.$   (3.40)

Taking expectations,

$J_t := E\left(\int_0^t g(s)\,dB_s\right)^4 = 6\int_0^t E\Big\{\Big(\int_0^s g(u)\,dB_u\Big)^2 g^2(s)\Big\}\,ds.$   (3.41)

This shows, in particular, that $J_t$ is absolutely continuous with a density satisfying

$\frac{dJ_t}{dt} = 6E\Big\{\Big(\int_0^t g(u)\,dB_u\Big)^2 g^2(t)\Big\} \le 6\Big[E\Big(\int_0^t g(u)\,dB_u\Big)^4\Big]^{1/2}\{Eg^4(t)\}^{1/2} = 6J_t^{1/2}\{Eg^4(t)\}^{1/2}.$   (3.42)

Therefore, unless $J_t = 0$ (which implies $J_s = 0$ for $0 \le s \le t$),

$\frac{d}{dt}J_t^{1/2} = \frac{1}{2}J_t^{-1/2}\frac{dJ_t}{dt} \le 3\{Eg^4(t)\}^{1/2},$

or,

$J_t^{1/2} \le 3\int_0^t \{Eg^4(s)\}^{1/2}\,ds, \qquad J_t \le 9t\int_0^t Eg^4(s)\,ds.$

This proves (3.38) for bounded nonanticipative step functionals. For the case of an arbitrary nonanticipative functional $g$ satisfying (3.39), observe that such a $g$ belongs to $\mathcal{M}[0, T]$. Following the proof of Proposition 1.2, there exists a sequence of bounded step functionals $g_n$ that converge almost surely to $g$ and for which

$\int_0^T g_n(s)\,dB_s \to \int_0^T g(s)\,dB_s \quad \text{a.s.}, \qquad \int_0^T Eg_n^4(s)\,ds \to \int_0^T Eg^4(s)\,ds.$

By Fatou's Lemma,

$E\left(\int_0^T g(t)\,dB_t\right)^4 \le \liminf_{n\to\infty}E\left(\int_0^T g_n(t)\,dB_t\right)^4 \le 9T\lim_{n\to\infty}\int_0^T Eg_n^4(t)\,dt = 9T\,E\int_0^T g^4(t)\,dt. \qquad ∎$
As a corollary, the property (1.2) of $\{X_t\}$ in Section 1 of Chapter V follows.

Corollary 3.5. Let $\{X_t\}$ be a diffusion with drift coefficient $\mu(\cdot)$ and diffusion coefficient $\sigma^2(\cdot)$, both Lipschitzian and bounded. Then, for every $\varepsilon > 0$,

$P(|X_t^x - x| > \varepsilon) = o(t) \quad \text{as } t \downarrow 0.$   (3.43)

Proof. By Chebyshev's Inequality,

$P(|X_t^x - x| > \varepsilon) \le \varepsilon^{-4}E(X_t^x - x)^4.$

But if $\mu(\cdot)$ and $\sigma(\cdot)$ are both bounded by $c$, then

$E(X_t^x - x)^4 = E\left(\int_0^t \mu(X_s^x)\,ds + \int_0^t \sigma(X_s^x)\,dB_s\right)^4 \le 2^3\Big\{E\Big(\int_0^t \mu(X_s^x)\,ds\Big)^4 + E\Big(\int_0^t \sigma(X_s^x)\,dB_s\Big)^4\Big\} \le 8\{(ct)^4 + 9t^2c^4\} = O(t^2) = o(t) \quad \text{as } t \downarrow 0. \qquad ∎$

For another application of Itô's Lemma (see Eq. 3.19), let

$Y(t) := X_t^x - X_t^y = x - y + \int_0^t (\mu(X_s^x) - \mu(X_s^y))\,ds + \int_0^t (\sigma(X_s^x) - \sigma(X_s^y))\,dB_s,$   (3.44)

and $\varphi(z) = |z|^2$, to get

$|X_t^x - X_t^y|^2 = |x - y|^2 + \int_0^t \Big\{2(X_s^x - X_s^y)\cdot(\mu(X_s^x) - \mu(X_s^y)) + \sum_{r=1}^{k}|\sigma_r(X_s^x) - \sigma_r(X_s^y)|^2\Big\}\,ds + \int_0^t 2(X_s^x - X_s^y)\cdot(\sigma(X_s^x) - \sigma(X_s^y))\,dB_s.$   (3.45)

Note that, since $X_t^x, X_t^y \in \mathcal{M}[0, \infty)$ and $\mu(\cdot)$ and $\sigma(\cdot)$ are Lipschitzian, the expectation of the stochastic integral is zero, so that

$E|X_t^x - X_t^y|^2 = |x - y|^2 + \int_0^t \Big\{2E(X_s^x - X_s^y)\cdot(\mu(X_s^x) - \mu(X_s^y)) + \sum_{r=1}^{k}E|\sigma_r(X_s^x) - \sigma_r(X_s^y)|^2\Big\}\,ds.$   (3.46)

In particular, the left side is an absolutely continuous function of $t$ with a density

$\frac{d}{dt}E|X_t^x - X_t^y|^2 \le 2ME|X_t^x - X_t^y|^2 + kM^2E|X_t^x - X_t^y|^2.$   (3.47)

Integrate (3.47) to obtain

$E|X_t^x - X_t^y|^2 \le |x - y|^2 e^{(2M + kM^2)t}.$   (3.48)

As a consequence, the diffusion has the Feller property: if $y \to x$, then $p(t; y, dz)$ converges weakly to $p(t; x, dz)$ for every $t \ge 0$. To deduce this property, note that (3.48) implies that, for every bounded Lipschitzian function $h$ on $\mathbb{R}^k$ with $|h(z) - h(z')| \le c|z - z'|$,

$|Eh(X_t^x) - Eh(X_t^y)| = |E(h(X_t^x) - h(X_t^y))| \le cE|X_t^x - X_t^y| \le c(E|X_t^x - X_t^y|^2)^{1/2} \le c|x - y|e^{(M + kM^2/2)t} \to 0 \quad \text{as } y \to x.$   (3.49)

Now apply Theorem 5.1 of Chapter 0. To state the Feller property another way, write $T_t$ for the transition operator

$(T_t f)(x) = Ef(X_t^x) = \int f(y)\,p(t; x, dy).$   (3.50)

The Feller property says that if $f$ is bounded and continuous, so is $T_t f$. A number of other applications of Itô's Lemma are sketched in the Exercises.
4 CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS

A significant advantage of the theory of stochastic differential equations over other methods of construction of diffusions lies in its ability to construct and analyze with relative ease those diffusions whose diffusion matrices $D(x) := \sigma(x)\sigma'(x)$ are singular. Such diffusions are known as singular, or degenerate, diffusions. Observe that in Section 2 the only assumption made on the coefficients is that they are Lipschitzian. As we shall see in this section, the stochastic integral representation (2.45) and Itô's Lemma are effective tools in analyzing the asymptotic behavior of these diffusions. Notice, on the other hand, that the method of Section 15 of Chapter V (also see Section 3 of the present chapter) does not work for analyzing transience, recurrence, etc., as quantities such as $\bar{\beta}(r)$ may not be finite.

Singular diffusions arise in many different contexts. Suppose that the velocity $V_t$ of a particle satisfies the stochastic differential equation

$dV_t = \mu_0(V_t)\,dt + \sigma_0(V_t)\,dB_t^{(1)},$   (4.1)

where $\{B_t^{(1)}\}$ is a standard three-dimensional Brownian motion. The position $Y_t$ of the particle satisfies

$dY_t = V_t\,dt.$   (4.2)

The process $\{X_t := (V_t, Y_t)\}$ is then a six-dimensional singular diffusion governed by the stochastic differential equation

$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t,$   (4.3)

where

$\mu(x) = (\mu_0(v), v)', \qquad \sigma(x) = \begin{pmatrix} \sigma_0(v) & 0 \\ 0 & 0 \end{pmatrix},$

writing $x = (v, y)$ and $0$ for a $3 \times 3$ null matrix. Here $\{B_t\}$ is a six-dimensional standard Brownian motion whose first three coordinates comprise $\{B_t^{(1)}\}$. Often, as in the case of the Ornstein–Uhlenbeck process (see Example 1.2 in Chapter V, and Exercise 14.1(iii) of Chapter V), the velocity process has a unique invariant distribution. The position process, being an integral of the velocity, is usually asymptotically Gaussian by an application of the central limit theorem. Thus, the asymptotic properties of $X_t = (V_t, Y_t)$ may in this case be deduced from those of the nonsingular diffusion $\{V_t\}$.
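As a one-dimensional caricature of (4.1)–(4.3) (not from the text; the coefficients are illustrative assumptions), take $dV_t = -V_t\,dt + dB_t$, $dY_t = V_t\,dt$. The pair $(V_t, Y_t)$ is a singular diffusion: only the velocity receives noise directly. The sketch below shows the velocity settling into its invariant distribution while the position spreads diffusively, its variance growing roughly linearly in $t$, as described above.

import numpy as np

rng = np.random.default_rng(5)
T, n, n_paths = 50.0, 5000, 2000
dt = T / n
V = np.zeros(n_paths)     # velocities: dV = -V dt + dB
Y = np.zeros(n_paths)     # positions:  dY = V dt  (no direct noise)
for _ in range(n):
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    Y += V * dt
    V += -V * dt + dB

print(np.var(V))          # ~ 1/2, the invariant variance of the velocity
print(np.var(Y) / T)      # ~ 1 for large T: the position is diffusive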
On the other hand, there are many problems arising in applications in which the analysis is not as straightforward. One may think of a deterministic process $U_t = (U_t^{(1)}, U_t^{(2)})$ governed by a system of ordinary differential equations $dU_t = \mu(U_t)\,dt$, having a unique fixed point $u^*$ such that $U_t \to u^*$ as $t \to \infty$, no matter what the initial value $U_0$ may be. Suppose now that a noise is superimposed that affects only the second component $U_t^{(2)}$ directly. The perturbed system $X_t = (X_t^{(1)}, X_t^{(2)})$ may then be governed by an equation of the form (4.3) with

$\sigma(x) = \begin{pmatrix} 0 & 0 \\ 0 & \tilde{\sigma}(x) \end{pmatrix},$

where $\tilde{\sigma}(x)$ is nonsingular. The following result may be thought to deal with this kind of phenomenon, although $\sigma(x)$ need not be singular. For its statement, use the notation

$J(x) := ((J_{ij}(x))), \qquad J_{ij}(x) := \frac{\partial\mu^{(i)}(x)}{\partial x^{(j)}}.$   (4.4)

Theorem 4.1. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian on $\mathbb{R}^k$,

$\|\sigma(x) - \sigma(y)\| \le \lambda_0|x - y| \quad \text{for all } x, y \in \mathbb{R}^k.$   (4.5)

Assume that, for all $x$, the eigenvalues of the symmetric matrix $\frac{1}{2}(J(x) + J'(x))$ are all less than or equal to $-\lambda_1 < 0$, where $k\lambda_0^2 < 2\lambda_1$. Then there exists a unique invariant distribution $\pi(dy)$ for the diffusion, and $p(t; x, dy)$ converges weakly to $\pi(dy)$ as $t \to \infty$, for every $x \in \mathbb{R}^k$.
Assume that, for all x, the eigenvalues of the symmetric matrix 2(J(x) + J'(x)) are all less than or equal to —A 1 < 0, where k2ö < 22 1 . Then there exists a unique invariant distribution ir(dz) for the diffusion, and p(t; x, dz) converges weakly to n(dy) as t , cc, for every x e W. Proof. The first step is to show that, for some x, the family { p(t; x, dy): t >, 0} is tight, i.e., given E > 0 there exists C E < oo such that p(t; x, {y: IYI > c E }) <, e
(t >, 0).
(4.6)
It will then follow (see Chapter 0, Theorem 5.1) that there exist t„ —^ cc and a probability measure it, perhaps depending on x, such that weakly
p(t.; x, dy)
i
as n --+ oo.
rc(dy)
(4.7)
We will actually prove that sup EIXfI Z < oo,
(4.8)
t>'0
where {X'} is the solution to (4.3) with X 0 = x. Clearly, (4.6) follows from (4.7) by means of Chebyshev's Inequality,
p(t; x, {lyl > c}) = P(I X I > c)
i c
EIX^ I 2 .
In order to prove (4.8), apply It6's Lemma (see Eq. 3.20) to the function ^(y) = Iy1 2 to get
5
t
IXxI 2 = IXI2 + (Acp)(XS)ds + 0
('
t
. ö,(p(X7)a,(Xs)•dB5. r= 1 0
(4.9)
594
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Now check that k
(Agq)(Y) = 2y it(Y) + X I(y)I 2 = 2 Y'µ(Y) + tr(a(Y)a'
(Y)),
(4.10)
where tr D denotes the trace of D, i.e., the sum of the diagonal elements of D. Now, by a one-term Taylor expansion, i y • grad j(By) dB,
µ (r) (Y) — µ r (0) = (
)
0
µ(y)
—
µ(0 ) = J 1 y'J(Oy)dO, 0
(4.11)
y •µ(Y) = Y - (µ(Y) — µ(0 )) + Y'µ( 0 ) =
f
1 o
(Y'J(Oy)Y) dO + y•µ(0).
Now, Y'J(OY)Y = Y'J'(OY)Y = zY'(J(OY) + J'(OY))Y 5 — A,jy 2 .
(4.12)
Using this in (4.11), 2 Y'µ(Y) < —2
1
y
2
+ 2 (I t( 0 )I)IYI.
(4.13)
Also, tr(a(Y) 6 '(Y)) = tr{(ß(Y) — a( 0 ))(a(Y) — 6 (0 ))' + a(0 )(u(Y) — ß(0))' +
(a(y)
—
6(0))6'(0) + a(0)ä (0)} .
(4.14)
Since every element of a matrix is bounded by its norm, the trace of a k x k matrix is no more than k times its norm. Therefore, (4.14) leads to tr(ß(Y)a'(Y)) < kAöjy( 2 + 2kIG(0)IA0IYl + kIa(0)I 2 .
(4.15)
Substituting (4.13) and (4.15) into (4.10), and using the fact that Il < 6 1Y1 2 + 1/5 for all 6 > 0, there exist 6, > 0, 6 2 > 0, such that 2), 1 — kt — 6 1 > 0 and (AqP)(Y)
—(2A — k^0 — b^)IYI 2 + b 2 •
(4.16)
Now take expectations in (4.9) to get EIX^ I 2 = IXI Z + EE(Agp(XS)) ds.
(4.17)
CHAPTER APPLICATION: ASYMPTOTICS OF SINGULAR DIFFUSIONS
595
In arriving at (4.17), use the facts (i) IAq (y)I < cjy1 2 + c' for some c, c' positive, so that f o EIA(p(Xs)I ds < oo, and (ii) the integrand of the stochastic integral in (4.9) is in .t[0, cc). Now (4.17) implies that t --> EIX, I Z is absolutely continuous with a density satisfying, by (4.16), dt
— kA 2 — 6 1)EjX x 1 2 + 6 2 .
EIXx 1 2 = EAW(X')
(4.18)
In other words, writing 6' .= 22 1 — k2ö — 6 1 > 0, 8(t) = EIX9 2 ,
dt (e
a ` 0 (t)) - 6 2 e b `,
0 (t) -< {0(0) + (ö 2 /8')(e a ` — 1)}e - a`
(4.19)
<, Ix1 2 e - a' 1 + 6 2 /S'.
This proves (4.8) and therefore (4.6) for all x. The next task is to prove that the limit in (4.7) does not depend on x. For this it is enough to show lim EIX, — XT I 2 = 0
`dx, ye IF8'`.
(4.20)
For, if (4.7) holds for some x, and (4.20) holds for this x and all y then for every Lipschitzian and bounded f, writing t for t,,, Ef(XT) — ff (z) n(dz) < IEf(Xfl — Ef(Xx )I + Ef(X,) —
,
f
f
f(z)7r(dz)
f(z)n(dz)
c(EIXt — X11 2 ) 1 / 2 + ^Ef(X;) — Jf(z)(dz) ^ . 0 ,
as t o — oo. It follows (Chapter 0, Theorem 5.1) that if (4.7) holds, then p(tn;
y, dz) —' n(dz)
Vy C R k .
(4.21)
To prove (4.20), proceed as in the first step, replacing X; by Z,:= X, — X; =x — y+ 0
J
t
f(s) ds + J ry(s) dB s ,
(4.22)
0
where
f(s) °= µ(Xs) — µ(Xs) = J 1 (XS — X5)'J(XS + 0(Xs — Xs )) d 0 , o Y(s)'= aND — a(XS )
(4.23)
596
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Letting Y, = Z, and p(z) = Iz1 2 in It6's Lemma (see Eqs. 3.44-3.46), obtain EIZ t I 2 = Ix - y1 2 + E J {2Z . f(s) + tr y(s)y'(s)} ds. 5
(4.24)
But, by (4.23) and the fact that z'J(w)z < —A l lz1 2 for all w, z (see Eq. 4.12), Z S •fs =
J
1
Z;J(X., + OZ5 )Zs dO
—A I IZs12. (4.25)
0
Also, IIY(s)Y'(s)II
II7(s)II IIY'(s)II A 2 '7 2 ,
so that tr
y(s)y'(s)
S k 2 0IZsj 2 .
(4.26)
Using (4.25) and (4.26) in (4.24), obtain
dt
EIZ` I 2 = E {2Z,•f(t) + tr y(t)y'(t)} —(2A 1 — k 2 ö)EIZ,I 2 ,
(EIZOI 2 = Ix — y1 2 ).
(4.28)
Hence (4.20) holds and, therefore, the limit in (4.21) is independent of $y \in \mathbb{R}^k$ (i.e., the limit in (4.7) is independent of $x$).

The final step is to show that (4.21) implies that $\pi$ is the unique invariant probability. Let $T_t$ denote the transition operator for the diffusion (see Eq. 3.50). Then for every bounded continuous function $f$, (4.21) implies

$(T_{t_n}f)(y) \to \bar{f} := \int f(z)\,\pi(dz).$   (4.29)

By Lebesgue's Dominated Convergence Theorem, applied to the sequence $\{T_{t_n}f\}$ and the measure $p(t; x, dy)$,

$T_t(T_{t_n}f)(x) = \int (T_{t_n}f)(y)\,p(t; x, dy) \to T_t\bar{f} = \bar{f}.$   (4.30)

But $T_t f$ is a bounded continuous function, by the Feller property (see end of Section 3). Therefore, applying (4.29) to $T_t f$,

$T_{t_n}(T_t f)(x) \to \int (T_t f)(z)\,\pi(dz).$   (4.31)

As $T_t(T_{t_n}f) = T_{t_n}(T_t f) = T_{t+t_n}f$, the limits in (4.30) and (4.31) coincide,

$\int (T_t f)(z)\,\pi(dz) = \int f(z)\,\pi(dz),$   (4.32)

i.e., if $X_0$ has distribution $\pi$, then $Ef(X_t) = Ef(X_0)$ for all $t \ge 0$. In other words, $\pi$ is an invariant (initial) distribution. To prove uniqueness, let $\pi'$ be any invariant probability. Then for all bounded continuous $f$,

$\int (T_{t_n}f)(z)\,\pi'(dz) = \int f(z)\,\pi'(dz) \quad \forall n.$   (4.33)

But, by (4.29), the left side converges to $\bar{f}$. Therefore,

$\bar{f} = \int f(z)\,\pi(dz) = \int f(z)\,\pi'(dz),$   (4.34)

implying $\pi' = \pi$. ∎
(4.35)
where C and a are constant k x k matrices. This is a continuous-time analog of the difference equation (13.10) of Chapter II for linear time series models. It is simple to check that (Exercise 1) t
XX:= e``x +
e ( `"s )c v dB Q s
(4.36)
is a solution to (4.35) with initial state x. By uniqueness, it is the solution with X o = x. Observe that, being a limit of linear combinations of Gaussians, X, is Gaussian with mean vector exp{tC}x and dispersion matrix (Exercise 2) L(t) = E eta-s)co dB se (r-s)'6 dB s = £ e('-s)c66^e('-s)' ds 0
0
0
=
5
eae du.
(4.37)
0
Suppose that the real parts of the eigenvalues A.....2, , say, of C are negative. This does not necessarily imply that the eigenvalues of C + C are all negative (Exercise 3). Therefore, Theorem 4.1 does not quite apply. But, writing ß
598
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
[t] for the integer part of t and B = exp {C },
Ile` c I < Ile t " c Ii Ile" -( ""II < IIB 1 t'II max{ Ilesc ii: 0 < s < 1} -+ 0
(4.38)
exponentially fast, as t -+ oo, by the Lemma in Section 13 of Chapter II. For the eigenvalues of B are exp{A ; } (1 < i < k), all smaller than I in modulus. It follows from the exponential convergence (4.38) that exp{tC }x -► 0 as t -• oo, and the integral in (4.37) converges to E :=
J
e°c aae c du. "
(4.39)
0
The following result has, therefore, been proved. Proposition 4.2. The diffusion governed by (4.35) has a unique invariant distribution if all the eigenvalues of C have negative real parts. The invariant distribution is Gaussian with zero mean vector and dispersion matrix E given by (4.39).
Under the hypothesis of Proposition 4.2, the diffusion governed by (4.35) may be thought of as the multivariate Ornstein-Uhlenbeck process.
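The stationary dispersion matrix (4.39) can be computed numerically, either by truncating the integral or by noting that integrating the derivative of $e^{uC}\sigma\sigma' e^{uC'}$ over $[0, \infty)$ gives the Lyapunov equation $C\Sigma + \Sigma C' = -\sigma\sigma'$. A hedged sketch (the matrices below are arbitrary illustrations, not from the text; this $C$ also has eigenvalues with negative real parts while $C + C'$ fails to be negative definite, as in Exercise 3(ii)):

import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

C = np.array([[-1.0, 2.0], [0.0, -1.0]])   # eigenvalues -1, -1; C + C' not negative definite
sig = np.array([[0.5, 0.0], [0.2, 0.3]])
S = sig @ sig.T

# truncated left Riemann quadrature for (4.39); the integrand decays exponentially
du, Sigma = 0.01, np.zeros((2, 2))
for j in range(2000):                      # u over [0, 20]
    E = expm(j * du * C)
    Sigma += E @ S @ E.T * du

print(Sigma)
print(solve_continuous_lyapunov(C, -S))    # Sigma also solves C Sigma + Sigma C' = -sigma sigma'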
EXERCISES

Exercises for Section VII.1

1. Let $s = t_{0,n} < t_{1,n} < \cdots < t_{k_n,n} = t$ be, for each $n$, a partition of the interval $[s, t]$, such that $\delta_n := \max\{t_{i+1,n} - t_{i,n}: 0 \le i \le k_n - 1\} \to 0$ as $n \to \infty$.
(i) Prove that

$s_n := \sum_{i=0}^{k_n - 1}(B_{t_{i+1,n}} - B_{t_{i,n}})^2$

converges to $t - s$ in probability as $n \to \infty$. [Hint: $E(s_n - (t - s))^2 \to 0$.]
(ii) Prove that

$v_n := \sum_{i=0}^{k_n - 1}|B_{t_{i+1,n}} - B_{t_{i,n}}| \to \infty$

in probability as $n \to \infty$. [Hint: $v_n \ge s_n/\max_i|B_{t_{i+1,n}} - B_{t_{i,n}}|$.]
(iii) Let $\pi$ denote an arbitrary subdivision $s = t_0 < t_1 < \cdots < t_k = t$ of $[s, t]$. Using (i) and (ii), prove that, almost surely, $\sup_\pi \sum_i|B_{t_{i+1}} - B_{t_i}| = \infty$.
(iv) Prove that, outside a set of zero probability, the Brownian paths are of unbounded variation over every finite interval $[a, b]$, $a < b$. [Hint: Use (iii) for every interval $[a, b]$ with rational end points.]
(v) Use (iv) to prove that, outside a set of zero probability, the Brownian paths are nowhere differentiable on $(0, \infty)$. [Hint: If $f$ is differentiable at $x$, then there exists an interval $[x - h, x + h]$ such that $|f(y) - f(z)| \le (2|f'(x)| + 1)|y - z|$ $\forall y, z \in [x - h, x + h]$, so that $f$ is of variation less than $(2|f'(x)| + 1)2h$ on $[x - h, x + h]$.]

2. Consider the stochastic integral equation (1.3) with $\mu(\cdot)$ Lipschitzian, $|\mu(x) - \mu(y)| \le M|x - y|$. Solve this equation by the method of successive approximations, with the $n$th approximation $X_t^{(n)}$ given by

$X_t^{(n)} = x + \int_0^t \mu(X_s^{(n-1)})\,ds + \sigma(B_t - B_0) \quad (n \ge 1), \qquad X_t^{(0)} \equiv x.$

[Hint: For each $t > 0$ write $\delta_n(t) := \max\{|X_s^{(n)} - X_s^{(n-1)}|: 0 \le s \le t\}$. Fix $T > 0$. Then

$\delta_n(T) \le M\int_0^T \delta_{n-1}(s)\,ds \le M^2\int_0^T\int_0^{t_{n-1}}\delta_{n-2}(s)\,ds\,dt_{n-1} \le \cdots \le M^n\int_0^T\int_0^{t_{n-1}}\cdots\int_0^{t_2}\delta_1(s)\,ds\,dt_2\cdots dt_{n-1} \le M^n\delta_1(T)\frac{T^{n-1}}{(n-1)!}.$

Hence $\sum_n \delta_n(T)$ converges to a finite limit, which implies the uniform convergence of $\{X_s^{(n)}: 0 \le s \le T\}$ to a finite and continuous limit $\{X_s: 0 \le s \le T\}$.]

3. Suppose $f_n, f \in \mathcal{M}[\alpha, \beta]$ and $E\int_\alpha^\beta (f_n(s) - f(s))^2\,ds \to 0$ as $n \to \infty$.
(i) Prove that

$U_n := \max_{\alpha \le t \le \beta}\left|\int_\alpha^t f_n(s)\,dB_s - \int_\alpha^t f(s)\,dB_s\right| \to 0$

in probability.
(ii) If $\{Y(t): \alpha \le t \le \beta\}$ is a stochastic process with continuous sample paths and $\int_\alpha^t f_n(s)\,dB_s \to Y(t)$ in probability for all $t \in [\alpha, \beta]$, then prove that

$P\Big(Y(t) = \int_\alpha^t f(s)\,dB_s\ \forall t \in [\alpha, \beta]\Big) = 1.$

4. Prove the convergence in (1.34).

5. Let $f(t)$ $(t_0 \le t \le t_1)$ be a nonrandom continuously differentiable function. Prove that

$\int_{t_0}^{t_1} f(s)\,dB_s = f(t_1)B_{t_1} - f(t_0)B_{t_0} - \int_{t_0}^{t_1} B_s f'(s)\,ds.$
Exercises for Section VII.2

1. If $h$ is a Lipschitzian function on $\mathbb{R}^1$ and $X$ is a square integrable random variable, then prove that $Eh^2(X) < \infty$. [Hint: Look at $h(X) - h(0)$.]

2. Let $D_T^{(r)}$ be as in the proof of Theorem 2.1. Prove that the sum $\sum_{r=n+1}^{\infty}(D_T^{(r)})^{1/2}$ occurring in (2.17) goes to zero as $n \to \infty$. [Hint: Use Stirling's approximation (10.3) in Chapter I.]

3. Assume the hypothesis of Theorem 2.1, with $\alpha = 0$.
(i) Prove that

$E(X_t - X_0)^2 \le 4M^2(t + 1)\int_0^t E(X_s - X_0)^2\,ds + 4t^2E\mu^2(X_0) + 4tE\sigma^2(X_0).$

(ii) Write $D_t := E(\max\{(X_s - X_0)^2: 0 \le s \le t\})$, $0 \le t \le T$. Deduce from (i) that
(a) $D_t \le d_1\int_0^t D_s\,ds + d_2$, where $d_1 = 4M^2(T + 1)$, $d_2 = 4T(TE\mu^2(X_0) + E\sigma^2(X_0))$;
(b) $D_T \le d_2 e^{d_1 T}$.

4. Write out a proof of Theorem 2.4 by extending step by step the proof of Theorem 2.1.

5. Write out a proof of Theorem 2.5 along the lines of that of Theorem 2.2.

6. Consider the Gaussian diffusion $\{X_t\}$ given by (2.49).
(i) Show that $EX_t = e^{-t\gamma}x$.
(ii) Compute $\operatorname{Cov}(X_t, X_{t+h})$.
(iii) For the case $\gamma > 0$, $\sigma \ne 0$, prove that there exists a unique invariant probability distribution, and specify this distribution.

7. (i) Prove (2.28), (2.30), and (2.31) under the assumption (2.1).
(ii) Write out a corresponding proof for the multidimensional case.

Exercises for Section VII.3

1. (i) Prove (3.1). [Hint: For each $n$, the finite sequence $(B_{(m+1)2^{-n}t} - B_{m2^{-n}t})^2 - 2^{-n}t$ $(m = 0, 1, \ldots, 2^n - 1)$ is i.i.d. with mean zero and variance $2(4^{-n})t^2$. Use (13.56) of Chapter I to estimate $P(\max_{1 \le N \le 2^n}|Z_{N,n}| > \varepsilon)$ and then apply the Borel–Cantelli Lemma.]
(ii) Prove (3.16).
2. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian and $\sigma(x)$ nonsingular for all $x$.
(i) Show that if

$\int_c^\infty e^{-\bar{I}(u)}\,du = \infty \quad \text{for some } c > 0,$

then the same holds for all $c > 0$.
(ii) Show that if

$\int_c^\infty \alpha^{-1}(u)e^{\bar{I}(u)}\,du < \infty \quad \text{for some } c > 0,$

then the same holds for all $c > 0$.

3. Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian and $d_{11}(x) = |\sigma_1(x)|^2 > 0$ for all $x$. Prove that $E\tau < \infty$, where $\tau := \inf\{t \ge 0: |X_t^{(1)}| = d\}$ and $|x^{(1)}| < d$. [Hint: For a sufficiently large $c$, the function $\varphi(x) := -\exp\{cx^{(1)}\}$ in $\{|x^{(1)}| \le d\}$ (extended suitably) satisfies $A\varphi(x) < 0$ in $\{|x^{(1)}| \le d\}$. Let $-\delta := \max\{A\varphi(x): |x^{(1)}| \le d\}$. Apply Itô's Lemma to $\varphi$ and optional stopping (Proposition 13.9 of Chapter I) with stopping time $\tau \wedge n$ to get $E(\tau \wedge n) \le (2/\delta)\max\{|\varphi(x)|: |x^{(1)}| \le d\}$. Now let $n \uparrow \infty$.]

4. Let $F$ be the twice continuously differentiable function on $[r_0, N]$, $0 < r_0 < N < \infty$, defined by (3.28). Find a twice continuously differentiable extension of $F$ on $[0, \infty)$ that vanishes outside $[r_0 - \varepsilon, N + \varepsilon]$, where $\varepsilon > 0$ is chosen so that $r_0 - \varepsilon > 0$. Show that $\varphi(x) := F(|x|)$ is twice continuously differentiable on $\mathbb{R}^k$, vanishing outside a compact set. Apply Itô's Lemma to $\varphi$ to derive (3.31).

5. Let

$F(r) := \int_c^r e^{-\bar{I}(u)}\,du, \quad c \le r \le d,$

where $\bar{I}(u)$ is defined by (3.23).
(i) Define $\varphi(x) := F(|x|)$ for $c \le |x| \le d$ and obtain a twice continuously differentiable extension of $\varphi$ vanishing outside a compact set.
(ii) For the function $\varphi$ in (i) check that $A\varphi(x) \le 0$ for $c \le |x| \le d$, and use Itô's Lemma to derive a lower bound for $P(\tau_{\partial B(0:c)} < \tau_{\partial B(0:d)})$, where $\tau_{\partial B(0:r)} := \inf\{t \ge 0: |X_t^x| = r\}$.
(iii) Use (ii) to prove that

$P(\tau_{\partial B(0:r_0)} < \infty) = 1 \quad \text{for } |x| > r_0,$

provided

$\int_c^\infty e^{-\bar{I}(u)}\,du = \infty \quad \text{for some } c > 0.$

(iv) Use an argument similar to that outlined in (i)–(iii) to prove that

$P(\tau_{\partial B(0:r_0)} < \infty) < 1 \quad \text{if } \int_c^\infty e^{-\bar{I}(u)}\,du < \infty.$

6. Suppose that $g(\cdot)$ is a real-valued, bounded, nonanticipative step functional on $[0, T]$. Prove that

$t \to \Big(\int_0^t g(u)\,dB_u\Big)^3 g(t) \quad \text{belongs to } \mathcal{M}[0, T].$

[Hint: Use (1.6), with $g$ in place of $f$, to get $E\{(\int_0^t g(u)\,dB_u)^6 g^2(t)\} \le ct^3$ for an appropriate constant $c$.]

7. Let $\{X_t^x\}$ be a diffusion on $\mathbb{R}^1$ having Lipschitzian coefficients $\mu(\cdot)$, $\sigma(\cdot)$, with $\sigma^2(x) > 0$ for all $x$.
(i) Use Itô's Lemma to compute (a) $\psi(x) := P(\{X_t^x\}$ reaches $c$ before $d)$, $c < x < d$; (b) $\rho_{xc} := P(\{X_t^x\}$ ever reaches $c)$, $x > c$; (c) $\rho_{xd} := P(\{X_t^x\}$ ever reaches $d)$, $x < d$. [Hint: Consider $\varphi(y) := \int_c^y \exp\{-I(c, r)\}\,dr$ for $c \le y \le d$, where $I(c, r) := \int_c^r (2\mu(z)/\sigma^2(z))\,dz$.]
(ii) Compute (a) $E(\tau_c \wedge \tau_d)$, where $\tau_r := \inf\{t \ge 0: X_t^x = r\}$, and $c < x < d$; (b) $E\tau_c$ $(x > c)$; (c) $E\tau_d$ $(x < d)$.
8. (i) Let $m$ be a positive integer, and $g$ a nonanticipative functional on $[0, T]$ such that $E\int_0^T g^{2m}(t)\,dt < \infty$. Prove that

$E\left(\int_0^T g(t)\,dB_t\right)^{2m} \le (2m - 1)^m T^{m-1}\int_0^T Eg^{2m}(t)\,dt.$

[Hint: For bounded nonanticipative step functionals $g$, use Itô's Lemma to get

$J_t := E\left(\int_0^t g(s)\,dB_s\right)^{2m} = m(2m - 1)\int_0^t E\Big\{\Big(\int_0^s g(u)\,dB_u\Big)^{2m-2}g^2(s)\Big\}\,ds,$

so that

$\frac{dJ_t}{dt} = m(2m - 1)E\Big\{\Big(\int_0^t g(u)\,dB_u\Big)^{2m-2}g^2(t)\Big\} \le m(2m - 1)\Big(E\Big(\int_0^t g(u)\,dB_u\Big)^{2m}\Big)^{(2m-2)/2m}(Eg^{2m}(t))^{1/m}$

(by Hölder's Inequality).]
(ii) Extend (i) to multidimension, i.e., for nonanticipative functionals $f = \{(f_1(t), \ldots, f_k(t)): 0 \le t \le T\}$ satisfying $E\int_0^T f_i^{2m}(t)\,dt < \infty$ $(1 \le i \le k)$, prove that

$E\left(\int_0^T f(t)\cdot dB_t\right)^{2m} \le k^{2m-1}(2m - 1)^m T^{m-1}\sum_{i=1}^{k}E\int_0^T f_i^{2m}(t)\,dt.$
9. ($L^p$-Maximal Inequalities)
(i) Let $\{Z_n: n = 0, 1, \ldots\}$ be an $\{\mathcal{F}_n\}$-martingale such that $E|Z_n|^p < \infty$ for all $n$ and for some $p \ge 1$. Prove that

$P\Big(\max_{0 \le m \le n}|Z_m| \ge \lambda\Big) \le \lambda^{-p}E|Z_n|^p \quad \forall\lambda > 0.$

[Hint: Note that $\{|Z_n|^p\}$ is a submartingale, and see Exercise 13.10 of Chapter I.]
(ii) Assume the hypothesis of (i) for some $p > 1$. Prove that

$E\Big(\max_{0 \le m \le n}|Z_m|^p\Big) \le (p/(p - 1))^p E|Z_n|^p.$

[Hint: Write $M_n := \max\{|Z_m|: 0 \le m \le n\}$. Then $\lambda P(M_n \ge \lambda) \le \int_{\{M_n \ge \lambda\}}|Z_n|\,dP$, by (13.55) of Chapter I. Therefore,

$EM_n^p = E\int_0^{M_n}p\lambda^{p-1}\,d\lambda = p\int_0^\infty \lambda^{p-1}P(M_n \ge \lambda)\,d\lambda \quad \text{(by Fubini)}$
$\le p\int_0^\infty \lambda^{p-2}\Big(\int|Z_n|\mathbf{1}_{\{M_n \ge \lambda\}}\,dP\Big)\,d\lambda = pE\Big(|Z_n|\int_0^{M_n}\lambda^{p-2}\,d\lambda\Big) = (p/(p - 1))E(|Z_n|M_n^{p-1})$
$\le (p/(p - 1))(EM_n^p)^{(p-1)/p}(E|Z_n|^p)^{1/p} \quad \text{(by Hölder's Inequality)}.]$

(iii) Let $\{Z_t: t \ge 0\}$ be a continuous-parameter $\{\mathcal{F}_t\}$-martingale, having a.s. continuous sample paths. If $E|Z_t|^p < \infty$ for all $t \ge 0$, and for some $p \ge 1$, then show that

$P\Big(\max_{0 \le t \le T}|Z_t| \ge \lambda\Big) \le \lambda^{-p}E|Z_T|^p \quad \forall\lambda > 0,$

and, if $p > 1$, show that

$E\Big(\max_{0 \le t \le T}|Z_t|^p\Big) \le (p/(p - 1))^p E|Z_T|^p.$
10. (i) In addition to the hypothesis of Theorem 2.1 assume that $EX_\alpha^{2m} < \infty$, where $m$ is a positive integer. Prove that for $0 \le t - \alpha \le 1$,

$\|X_t - X_\alpha\|_{2m} \le c_1(m, M)\{(t - \alpha)\|\mu(X_\alpha)\|_{2m} + (t - \alpha)^{1/2}\|\sigma(X_\alpha)\|_{2m}\} =: c_1(m, M)\varphi(t - \alpha),$

say. Here $\|\cdot\|_{2m}$ is the $L^{2m}$-norm for random variables, and $c_1(m, M)$ depends only on $m$ and $M$. [Hint: Write $\delta_t^{(n)} := \sup\{E(X_s^{(n)} - X_s^{(n-1)})^{2m}: \alpha \le s \le t\}$. Use the triangle inequality for $\|\cdot\|_{2m}$, and Exercise 8(i), to show that
(a) $\|X_t^{(1)} - X_\alpha\|_{2m} \le \varphi(t - \alpha)$, $((\delta_t^{(1)})^{1/2m} \le \varphi(t - \alpha))$;
(b) $\delta_t^{(n)} \le 2^{2m-1}M^{2m}\{(t - \alpha)^{2m-1} + (2m - 1)^m(t - \alpha)^{m-1}\}\int_\alpha^t \delta_s^{(n-1)}\,ds$ for $n \ge 2$.
Observe that $\|X_t - X_\alpha\|_{2m} \le \sum_{n=1}^{\infty}(\delta_t^{(n)})^{1/2m}$.]
(ii) Deduce from (i) that $\sup_{\alpha \le s \le t}EX_s^{2m} < \infty$ for all $t > \alpha$.
(iii) Extend (i), (ii) to multidimension.

11. (i) Use the result in 10(i) to prove, for every $\varepsilon > 0$, that $P(|X_t^x - x| \ge \varepsilon) = o(t)$, as $t \downarrow 0$, (a) uniformly for all $x$ in a bounded interval, if $\mu(\cdot)$, $\sigma(\cdot)$ are Lipschitzian, and (b) uniformly for all $x$ in $\mathbb{R}^1$ if $\mu(\cdot)$, $\sigma(\cdot)$ are bounded as well as Lipschitzian.
(ii) Extend (i) to multidimension.
(iii) Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian. Prove that

$P\Big(\max_{0 \le s \le t}|X_s^x - x| \ge \varepsilon\Big) = o(t) \quad \text{as } t \downarrow 0,$

uniformly for every bounded set of $x$'s. Show that the convergence is uniform for all $x \in \mathbb{R}^k$ if $\mu(\cdot)$, $\sigma(\cdot)$ are bounded as well as Lipschitzian. [Hint:

$P\Big(\max_{0 \le s \le t}|X_s^x - x| \ge \varepsilon\Big) \le P\Big(\int_0^t |\mu(X_s^x)|\,ds \ge \frac{\varepsilon}{2}\Big) + P\Big(\max_{0 \le s \le t}\Big|\int_0^s \sigma(X_u^x)\,dB_u\Big| \ge \frac{\varepsilon}{2}\Big) = I_1 + I_2,$

say. To estimate $I_1$ note that $|\mu(X_s^x)| \le |\mu(x)| + M|X_s^x - x|$, and use Chebyshev's Inequality for the fourth moment and Exercise 10(iii). To estimate $I_2$ use Exercise 9(iii) with $p = 4$, Proposition 3.4 (or Exercise 8(ii) with $m = 2$) and Exercise 10(iii).]
say. To estimate I l note that Ip(XS)I < 1µ(z)1 + MIX' — xl, and use Chebyshev's Inequality for the fourth moment and Exercise 10(iii). To estimate / 2 use Exercise 9(iii) with p = 4, Proposition 3.4 (or Exercise 8(ii) with m = 2) and Exercise 10(iii).] 12. (The Semigroup {T, }) Let p(-), a(•) be Lipschitzian. (i) Show that IITtf — f II -+ 0 as t j 0, for every real-valued Lipschitzian f on 08 k vanishing outside a bounded set. Here II • II denotes the "sup norm." [Hint: Use
Exercise 10.] (ii) Let f be a real-valued twice continuously differentiable function on 08'`. (a) If Af and gradf are polynomially bounded (i.e., IAf(x)I < c l (1 + Ixlm'), (grad f (x)l c 2 (1 + Ixlm'), show that T, f (x) -+ f(x) as t j 0, uniformly on every bounded set of x's. (b) If Af is bounded and grad f polynomially bounded, show that IITtf — f II -+ 0. [Hint: Use Itö's Lemma.] (iii) (Infinitesimal Generator) Let -9 denote the set of all real-valued bounded continuous functions f on R k such that Ilt - '(T,f — f) — gII -^ 0 as t 10, for some bounded continuous g (i.e., g = ((d/dt)T r f), = o , where the convergence of the difference quotient to the derivative is uniform on R"). Write g = Af for such f, and call A the infinitesimal generator of {T, }, and -9 the domain of A. Show that every twice continuously differentiable f, which vanishes outside some bounded set, belongs to -9 and that for such f, Af = Af where A is given
605
EXERCISES
by (3.21). [Hint: Suppose f(x) = 0 for Ixl > N. For each E > 0 and x E ITJ(x) — f(x) — tAf(x)i -< tO(Af:c) + 2tIIAf IIPI max IXs — xl \o<s<<
where ©(h:) = sup{th(y) — h(z)I: y, z E R", ly ... zl < a}. As r j0. O(Af :s) --p 0, and for each e > 0, as t j 0
0
max IXS — xi _> e P( osssr
/
uniformly for all x in Ne r_ {y: II _< N + F} (see Exercise I I (iii)). For x E NE, IT,f(x) — f(x) — tA.f(x)I = E
f
o
tIIAfIIP(
Af(X,)ds
where z':= inf{t >_ 0: IX, I = N}. But P(r" <, t) < sup{P(r'' <_ t): lyl = N + F}, by the strong Markov property applied to the stopping time t:= inf{t >_ 0: IX, I = N + f}. Now t)
max IXs—yl>,e^=o(t) o,s,l
astj0,
uniformly for all y satisfying lyl = N + e.] (iv) Let f be a real-valued, bounded, twice continuously differentiable function on If8'k such that gradf is polynomially bounded, Af is uniformly continuous, and Af(x) —+ 0 as Ixl —• oo . Show that f E -9. [Hint: Extend the argument in (iii).] (v) (Local Property of A) Suppose f, g E _ are such that f(y) = g(y) in a — XI
13. (Initial-Value Problem) Let $\mu(\cdot)$, $\sigma(\cdot)$ be Lipschitzian. Adopt the notation of Exercise 12.
(i) If $f \in \mathcal{D}$, then $T_t f \in \mathcal{D}$ for all $t \ge 0$, and $\bar{A}T_t f = T_t\bar{A}f$. [Hint:

$h^{-1}(T_h(T_t f) - T_t f) = T_t(h^{-1}(T_h f - f)) \to T_t\bar{A}f$

in "sup norm," since $\|T_t g\| \le \|g\|$ for all bounded measurable $g$.]
(ii) (Existence of a Solution) Let $f$ satisfy the hypothesis of Exercise 12(iv). Show that the function $u(t, x) := T_t f(x)$ satisfies Kolmogorov's backward equation

$\frac{\partial u(t, x)}{\partial t} = \bar{A}u(t, x)$

and the initial condition $u(t, x) \to f(x)$ as $t \downarrow 0$, uniformly on $\mathbb{R}^k$. If, in addition, $u(t, x)$ is twice continuously differentiable in $x$, for $t > 0$, then $\bar{A}u(t, x) = Au(t, x)$. [Hint: The first part is a restatement of (i). For the second part use Exercise 12(iii), (v).]
(iii) (Uniqueness) Suppose $v(t, x)$ is continuous on $\{t \ge 0, x \in \mathbb{R}^k\}$, once continuously differentiable in $t > 0$ and twice continuously differentiable in $x \in \mathbb{R}^k$, and satisfies

$\frac{\partial v(t, x)}{\partial t} = Av(t, x) \quad (t > 0, x \in \mathbb{R}^k), \qquad \lim_{t \downarrow 0}v(t, x) = f(x) \quad (x \in \mathbb{R}^k),$

where $f$ is continuous and polynomially bounded. If $Av(t, x)$ and $|\operatorname{grad} v(t, x)| = |(\partial/\partial x^{(1)}, \ldots, \partial/\partial x^{(k)})v(t, x)|$ are polynomially bounded uniformly for every compact set of time points in $(0, \infty)$, then $v(t, x) = u(t, x) := Ef(X_t^x)$. [Hint: For each $t > 0$, apply Itô's Lemma to the function $w(s, x) = v(t - s, x)$ to get

$Ev(\varepsilon, X_{t-\varepsilon}^x) - v(t, x) = E\int_0^{t-\varepsilon}\{-v_0(t - r, X_r^x) + Av(t - r, X_r^x)\}\,dr = 0,$

where $v_0(t, y) := (\partial v/\partial t)(t, y)$. Let $\varepsilon \downarrow 0$.]

14. (Dirichlet Problem) Let $G$ be a bounded open subset of $\mathbb{R}^k$. Assume that $\mu(\cdot)$, $\sigma(\cdot)$ are Lipschitzian and that, for some $i$, $d_{ii}(x) := |\sigma_i(x)|^2 > 0$ for $x \in \bar{G}$. Suppose $v$ is a twice continuously differentiable function on $G$, continuous on $\bar{G}$, satisfying

$Av(x) = -g(x) \quad (x \in G), \qquad v(x) = f(x) \quad (x \in \partial G),$

where $f$ and $g$ are (given) continuous functions on $\partial G$ and $\bar{G}$, respectively. Assume that $v$ can be extended to a twice continuously differentiable function on $\mathbb{R}^k$. Then

$v(x) = Ef(X_\tau^x) + E\int_0^\tau g(X_s^x)\,ds \quad (x \in G),$

where $\tau := \inf\{t \ge 0: X_t^x \in \partial G\}$. [Hint: $v$ can be taken to be twice continuously differentiable on $\mathbb{R}^k$ with compact support. Apply Itô's Lemma to $\{v(X_t^x)\}$, and then use the Optional Stopping Theorem (Proposition 13.9, Chapter I). Also see Exercise 3.]
15. (Feynman—Kac Formula) Let p(.), 6(•) be Lipschitzian. Suppose u(t, x) is a continuous function on [0, oo) x H", once continuously differentiable in t for t > 0, and twice continuously differentiable in x on H", satisfying au(t, x) at
= Au(t, x) +
V(x)u(t, x)
(t > 0, x E H"),
u(0, x) = f(x),
where f is a polynomially bounded continuous function on H k , and V isa continuous function on H k that is bounded above. If grad u(t, x) and Au(t, x) are polynomially
607
THEORETICAL COMPLEMENTS
bounded in x uniformly for every compact set oft in (0, oo), then
u(t,x)= E(f(X,)exp {
f
.'
V(XS)ds}) J
(t>,0,xcR'`).
[Hint: For each t > 0, apply Itö's Lemma to
X,.,
1'(s) = (Y1(s), Y2(s)) = (
J V(X) ds')
and
w(s, y) = u(t — s, Y1),
0
where y = ( y 1 , y 2 ).]
Exercises for Section VII.4 1. (i) Use the method of successive approximation to show that (4.36) is the solution to (4.35). (ii) Use Itö's Lemma to check that (4.36) solves (4.35). 2. Check that (4.37) is the dispersion matrix of the solution {X'} of the linear stochastic differential equation (4.35). 3. (i) Let C be a k x k matrix such that C + C' is negative definite (i.e., the eigenvalues of C + C' are all negative). Prove that the real parts of all eigenvalues of C are negative. (ii) Give an example of a 2 x 2 matrix C whose eigenvalues have negative real parts, but C + C' is not negative definite. 4. In (4.35) let C = —cI where c > 0 and I is the k x k identity matrix. Compute the dispersion matrices E(t), E explicitly in this case, and show that they are singular if and only if a is singular.
THEORETICAL COMPLEMENTS Theoretical Complements to Section VII.1
_>
1. (P- Completion of Sigmafields) Let (0, .F , P) be a probability space, a subsigmafield of .. The P-completion ' of is the sigmafield {G u N: G e N c M e such that P(M) = 0}. The proof of Lemma I of theoretical complement V.1.1 shows that if .3, is P-complete for all t 0 then the stochastic process {f(t): t 0} is progressively measurable provided (1) f(t) is ,y,-measurable for all t, and (2) t —+ f(t) is almost surely right-continuous. Several times in Section 1 we have constructed processes { f(t)} that have continuous sample paths almost surely, and which have the property that f(t) is .F- measurable. These are nonanticipative in view of the fact that, is P-complete.
>_
_< _<
2. (Extension of the Ito Integral to the Class .[a, /f]) Notice that the Itö integral (1.6) is well defined for all nonanticipative step functionals, and not merely those that are
in #[a, ß]. It turns out that the stochastic integral of a right-continuous nonanticipative functional { f(t): a t fl} exists as an almost surely continuous
608
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
limit of those of an appropriate sequence of step functionals, provided
P
f 2 (s) ds < oo a.s.
(T.1.1)
Ja
The class of all nonanticipative right-continuous functionals { f (t): a < t < ß} satisfying (T.1.1) is denoted by 2[a, ß]. Given an f E .^[a, ß] define, for each positive integer n,
:=: cw J A(t; n) { l
f 2 (s)ds
;
a
Then f" E ✓#[a, ß]. Write )
A":= w: j f 2 (t, w) dt
a
If m > n then on A. we have fm (t, co) = f"(t, w) for a <, t _< ß. It follows that
J
"
fm (s) dB, =
a
J f (s) dB, n
for
a _< t _< ß, for almost all w E A".
a
g(t,
w) = h(t, w) for a _< t _< ß, To see this, consider g, h E ✓H[a, ß] such that we A e F. If g, h are step functionals, then it follows from the definition (1.6) of the stochastic integral that
g(s) dB = (s) ^ h dB J r r
s
s fora
_< t < ß, for all
wE
A.
a
a
In the general case, let g ", h" be nonanticipative step functionals such that g"(s) dB, --• j g(s) j a
dB,,
j h"(s)
dB, —>
,J a
,l a
h(s)
dB,
for a _< t _< ß, a.s.
a
Without loss of generality one may take, for each n, the same partition {t ; (n): 0 < i < N of [a, ß] for both g h Modify h if necessary, so that h"(tf"°, co) = g"(t; ", w) for we {h(s) = g(s) for a _< s _< t," } (0 _< i <_ N"). Then
", ".
"}
",
)
J
h"(s)
dB, =
c
g"(s)
dB,
for a _< t _< ß, if w E A,
a
so that, in the limit,
J
g(s) dB,,
h(s) dB5 =
a _< t < ß, almost everywhere on A.
a
a
Thus, we get
J
r fm(s)
a
dB. =
Jf t
a
"(s) dB,
a _< t < ß,
almost everywhere on A".
THEORETICAL COMPLEMENTS
609
Therefore, lim
m-t f^
fm(s)
dB, =
'
Js
f,(s) dB,
a < t -< ß, almost everywhere on A.
A stochastic process that equals 1'. fn (s) dB, a.s. on A. (n >- 1) will be called the Ito integral off and denoted by f (s) dB,. Since P(A) —. I in view of (T.1.1), on the complement of U a A„ the process may be defined arbitrarily. The proof of Proposition 1.2 may be adapted for f e Y[a, (f] to show that (see (1.25) and the definition of h. after (1.29)) there exists a sequence of nonanticipative step functionals h„ such that
Ja
E
(h„(s) — f(s)) 2 ds —* 0
in probability.
From this one may show that f Q h a (s) dB, converges in probability to S f(s) dB,. For this we first derive the inequality
J
('
P^
ß
f (s) dB, > e^ P(
j
ß
c.
f2(5)dS -> c +
a
a
z (T.1.3) r,
for all c > 0, e > 0. To see this define g(t) = f(t)
Since g e . #{, ß], and f(t) = g(t) for that
(I Jaß.i(s) dB
P(
E)
P
J
1 A(t; )-
-< t /3 on the set {
ß
Ja f
f 2 (s) ds > c) +
2
(s) ds
g(s) dB, > r
P(I JQß 2(s) ds c^ + E\Jaß g(s) dB )/ F
2
:
P \J ßf
J
J
ß ds) PI ß f 2 (s) ds c) + (E E g2(s) ß c
P a
Applying (T.1.3) to h
f 2(s) ds > c + EZ
/2 (T.1.4)
—f one gets, taking c = e 3 ,
lim
(J
a
(S) dB, — J ß f(s) dB, >
:h
which is the desired result. Properties (a), (b) of Proposition 1.1 hold for f E ^P[a, ß].
610
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Theoretical Complements to Section VII.2 1. (The Markov Property of {X1 }) Consider the equation (2.25). Let 5s , denote the P -completion of the sigmafield a {B" — Bs : s < u -< t }. The successive approximations (2.3) — (2.5), with a = s, X,, = z, may be shown to be measurable with respect to (U8 1 ) Qx X5 . To see this, note that (z, u, w) --+ X." (w) is measurable on (R' x [s, t] x f2,.(l 1 ) Q.1([s, t]) Qx 9,,,) for every t > s. Also, if (z, u, u0) --* X »(w) is measurable with respect to this product sigmafield, then so is the right side of (2.4). This is easily checked by expressing the time integral as a limit of Riemann sums, and the stochastic integral as a limit of stochastic integrals of nonanticipative (with respect to { 9 ✓ : u -> s }) step functionals. It then follows that the almost sure limit X, of X; " (as n - oo) is measurable as a function of (z, w) (with respect to the sigmafield f(H 1 ) px (g R' x i)). Write the solution of (2.25) as cp(s, t; z). Then X; = (p(s, t; XS ), by (2.24) and the uniqueness of the solution of (2.25) with z = X. Since X. is S measurable, and is independent of 15 ,, it follows that for every bounded Borel measurable function g on R', E(g(X;) I ) _ [Ey(rp(s, t; z))] z=Xt (see Section 4 of Chapter 0, Theorem on Independence and Conditional Expectation). This proves the Markov property of {X, }. The multidimensional case is entirely analogous. )
)
2. (Nonhomogeneous Diffusions) Suppose that we are given functions p '°(t, x), 6(t, x) (
on [0, oo) x H k into H'. Write µ(t, x) for the vector whose ith component is µ^' (t, x), a(t, x) for the k x k matrix whose (i, j) element is Q(t, x). Given s > 0, we would like to solve the stochastic differential equation )
dX, = µ(X„ t) dt + a(X„ t) dB, (t >, s), XS = Z,
(T.2.1)
where Z is `ks measurable, EIZI 2 < oc. It is not difficult to extend the proofs of Theorems 2,1, 2.2 (also see Theorems 2.4, 2.5) to prove the following result. Theorem T.2.1. Assume 1µ(t, x) — µ(s, Y)I 5 M(Jx — yl + It — sI),
Ila(t, x) — a(s, Y)II
(T.2.2)
M(Ix — yI + It — sl),
for some constant M. (a) Then for every . -measurable k -dimensional square integrable Z there exists a unique nonanticipative continuous solution {X,: t ^ s} in .&[s, oo) of
X,=Z+
J
u)du+ S
J
f
a(X,u)dB"
(t s). (T.2.3)
5
(b) Let {X;•": t ? s} denote the solution of (T.2.3) with Z = x E H". Then it is a (possibly nonhomogeneous) Markov process with (initial) state x at time s and transition probability p(s', t; z, B) =
P(X,'•z E B)
(s -< s'
t).
(T.2.4)
THEORETICAL COMPLEMENTS
611
Theoretical Complements for Section VII.3 1. (Itb's Lemma) To prove Theorem 3.1, assume first that f„ g, are nonanticipative step functionals. The general case follows by approximating f,, g, by step functionals and taking limits in probability. Fix s < t (0 -< s < t -< T). In view of the additivity of the (Riemann and stochastic) integrals, it is enough to consider the case f,(s') = f,(s), g,(s') = g,(s) for s -< s' t. Let tö ) = s < t(" ) < • • • < tN,), = t be a sequence of partitions such that as n oc.
S":=max{t+1—t;"):=0-
Write Nn
(t, Y(t)) — P(s, Y(s))
-
1
=[(t1, i
Y(t
)) — (P(ti"', Y(tt)fl
=o Nn - 1
[^(t;"), Y(t'? l )) — q(ti" ) Y(t1" ) ))] .
+ i
(Ti 1)
=o
The first sum on the right may be expressed as Nn
i
-
"1 (tl — t l ){öoq(t
,
(ti+l))+R';}
(T.3.2)
i
where IRn .'I 5 max{Iaoq(u, Y(t i+ )) — aoq(u', Y(t i+ l ))I: t
u
u'
ti+, } -a 0
uniformly in i (for each w), as n , because c o op is uniformly continuous on [s, t] x { Y(u, w): s -< u -< t}. Hence, (T.3.2) converges, for all w, to
Jo'=
J
l
d0 (u, Y(u)) du.
(T.3.3)
By a Taylor expansion, the second sum on the right in (T.3.1) may be expressed as Nn
-
1 m (Ylrl(ti+l) — Y (ry
(tlnl))
0,
q(tin)
Y(tlnl))
i =0 r=1
Nn -I
I
+
2 ! t^r,r'Sm
) — Y l.l (tl" ) ))(Y`"'(t'+) — y r.l (t^"'))
> ( i
=o
X l a r ar' q ( t n) Y(t'n))) + Rrr).n.i}, (T.3.4) where R, — 0 uniformly in i (for each w), because (u, y) ---r ö,. 3, cp(u, y) and t -a Y(t) are continuous. Using (3.9) the first sum in (T.3.4) is expressed as m Nn-1 J {' nl Y- Y- l( 1 i ") — ti )f,(s) + g,(s)' (Bt(
r=1
i
=0
—'
Jt 1
`_ .=t ^J sl
,
— Bilnl)}urq(tin) Y(t)) l
f.(u)a.^P(u,
J
1'(u)) du + 1 a,(P(u, Y(u))g.(u)•dB"], in probability, (T.3.5)
612
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
since ä,cp is continuous and (see Exercise 1.3) N„-1 1=0
f
4"
(0
(p(t(
('
I
"1 Y(ti"^)) — 3 p(u, Y(u)))g,(s)•dB„ = J h"(u)•dB,,, s
"
say, where $ Jh,.,(u)j du -+ 0 a.s. It remains to find the limiting value of the second sum in (T.3.4) (excluding the remainder). For this, write N„-1 (Y'(
t l + l) — Y
.
(t"))(Y'(t
1=o N„-1
_
'
+1) — Y (. ) (tl n) ))a.a. w(ti"' y(t;n '))
{ (t + l — t )f (s) + g.(s) • (B,j" , — B 11 0} ,
i0
X
l(t( +l — t1n))✓,
(s) + g; (s)• (B 1 1,. — Btlnl)}ä,är•(p(ti'° Y(t'"1)).
(T.3.6)
By (3.1), (3.15) and (3.16), (T.3.6) converges a.s. (for a suitable sequence of partitions) to N -1
um j
k
k
9(s)(B^f;, — B.)}{ g (s)(B) — B
) a.ö: ^P(t}
1
" ,
Y(ti''))
=1i=1 Jim =
9:1(s)9r1(s)
(Bri'.
k
— B^i^ ) Z a.a; W (ti " 1 Y(ti" 1 ))
1=0
j=1
N„-1
= L^ 9r11(S)gr^)(s) llm
Z
nl (ti +l — ti )a,a,. (ti ") , Y(ti "1 ))
i=0
j=1
(T.3.7) = J12'= Y f,' g,'j) (U)gr(j '(U)O,ar'(P(U,Y(u))du. j =1
n
The proof is completed by adding J0 , J 11 , and J 12 .
It may be noted that the proof goes through for f„ g, e £[0, T], I -< r -< k, cp(t, y) twice continuously differentiable in y and once in t. 2. (Explosion) Assume that µ(• ), 6(•) are locally Lipschitzian, that is, (2.42) holds for ^xl < n, jyj < n, with M = Mn , where Mn may go to infinity as n —• co. One may still construct a diffusion on R k satisfying (2.43) up to an explosion time C. For this, let {X': t >- 0} be the solution of (2.43) with globally Lipschitzian coefficients i,,(•), c()
satisfying µn(Y) = µ(Y)
and
a(y) = 6 (Y)
for jyj -< n.
(T.3.8)
We may, for example, let µ "(y) = p(ny/Iyl) and v"(y) = a(ny/Iyj) for lyl > n, so that (2.42) holds for all x, y with M = M. Let us show that, if (xl -< n, X' m (w) = X;. ,,(aw)
for 0 -< t < C n (co) (m >- n)
(T.3.9)
THEORETICAL COMPLEMENTS
613
outside a set of zero probability, where inf{t > 0: IX, „I = n } .
(T.3.10)
To prove (T.3.9), first note that if m >- n then µ(y) = µ„(y) and a,,,(y) = a„(y) for II -< n, so that and
µ m (X,,,,) = µ„(X,.,,)
«„(X;,,,) =
a„(Xf „)
for 0 -< t < S„(w). (T.3.1 I
It follows from (T.3.1 1) (see the argument in theoretical complement I.2) that
f p.(X,.,,) ds +
Jo
a,X) dB = Jo
µ„(X) ds + Jo
ß„(X;. ,,) dB
for 0 -< t < S„
JO (T.3.12)
almost surely. Therefore, a.s.,
Xi m — X^ n =
f'
o [I►m(X.;m) — P^,(Xs,)] ds +
[ 6 m(X.;.m) — am(X,.^)] dBu
Jo
for 0 -< t < S„. (T.3.13)
This implies cP(t):=
E(IXi — Xi„J 2 I
>
,) 2tMm J
(s) ds + 2M'
w(s) ds 0
= 2Mm(t + 1) q(s) ds.
Jo
(T.3.14)
By iteration (see (2.21) and (2.22)), q(t) = 0. In other words, on {^„ > t}, X; „, = a.s., for each t > 0. Since t --, X; ,,,, X,.,, are continuous, it follows that, outside a set of probability 0, X,.„, = Xs,, for 0 -< s < S„, m -> n. In particular, C. j a.s. to (the explosion time) 1;, say, as n j co, and X; (w) = lim X” „(w),
t < (w),
(T.13.15)
nom
exists a.s., and has a.s. continuous sample paths on [0, S(cw)). If P(C < co) > 0, then we say that explosion occurs. In this case one may continue {X'} fort -> S(w) by setting X; (co) _ "co”, t
(co),
(T.13.16)
where "co" is a new state, usually taken to be the point at infinity in the one-point compactification of 11" (see H. L. Royden (1963), Real Analysis, 2nd ed., Macmillan, New York, p. 168). Using the Markov property of {X} for each n, it is not difficult to show that {X': t >- 0} defined by (T.13.15) and (T.13.16) is a Markov process on 08'` u {"cc"}, called the minimal dillusion generated by A =
1
2
0 2 r; u^ ) d ,( x 3x^^i fix' + µ (X) (izri'
614
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
If k = 1 one often takes, instead of (T.13.16), —00
for t >-C(w)on(lim
+ oo
for t >- ^(w) on (um X; = + oo f .
X,
=—oo^,
\
Xt _ T -'
(T.3.17)
J
Write C = r_. on {lim,^. X, = —co}, C = r + on {lim,^. T,:= inf{t >, 0: X; = z }.
X,
= +oo}. Write
Theorem T.3.1. (Feller's Test for Explosion). Let k = 1 and I(z) := f o (2µ(z')/a 2 (z')) dz'. Then if
(a) P(r + ^ < oo) = 0
J
exp{- 1(y) } ^ y (exp {1(z) }/v 2 (z)) dz oo, dy =
L
0
f (b) P(r-. < oo) = 0i
J o exp' — 1(y)}
I
0
I
o (exp{I(z)}/a 2 (z)) dz] dy = oo.
fy
Proof. (a) Assuming that the integral on the right in (a) diverges we will construct a nonnegative, increasing and twice continuously differentiable function (p on [0, cc) such that (i) Agq(y) < p(y) for all y and (ii) 4p(y) -+ oo as y -^ oo. Granting the existence of such a function cp for the moment (and extending it to a twice continuously differentiable function on (-0o,0]), apply Itö's Lemma to the function cp(t, y) := exp{ — t }(p(y) to get, for all x > 0, n TO A I
Eq (t„ A T o n t, X '^
Tp
,,,) — q (x) = E
(—exp{ — s })[cp(Xs) — AQp(Xs )] ds 0
0.
This means q(x) >, Ecp(;r n A T o A t, X A
To „
j, so that
Q(x) >, e `^p(n)P(r < r o A t). -
Letting n -+ o0 one has, for all t >, 0, P(r + ,
< r o A t) < e`cp(x) )im cp
-
'(n) = 0.
(T.3.18)
n-•m
Thus, P(T +« < t, T o = oo) = 0, for all t >- 0, so that P(t + ^ < oo, -r o = oo) = 0. But on the set {tt o < co}, r + > i o by definition of r +m . Therefore, P(r +W t, r o < oo) = 0. This implies P(r + . < oo) = 0. It remains to construct cp. Let p 0 (z) = I (z ? 0) and define, recursively, q(z) = 2
^exp{- 1(y)
f.
exp{ 1 (z')}con-i(z }Ifoy
62 (z')
)
dz,ldy
J
(z ^ 0; n >- 1).
(T.3.19)
615
THEORETICAL COMPLEMENTS
Then (p, >- 0, q(z) j as z j, and cp„(z)
(£
[ex f ex6{((zl z
)}
dz' d3 , (n -> 0). (T.3.20)
2 p{-!(Y)}
n'
0
To prove (T.3.20), note that it holds for n = 0 and that „(z) -< cp„_,(z)^p,(z) (n -> 1). Now use induction on n. It follows from (T.3.20) that the series q (z):= „= o y^„(z) converges uniformly on compacts. Also, it is simple to check that A(P„(z) = cP„_,(z) for all n >_ 1, so that A may be applied term by term to yield AQQ(z) = (P(z) (z -> 0). By assumption, p 1 (z) --' -) as z -+ rc,. Therefore, cp(z) -* x as z - o. Next assume that the series on the right in (a) converges. Again, applying Itö's Lemma to rp(t, y):=exp{-t}cp(y), we get X q(x) = EcP(r„ A T o A t , TnA ToA
cp(n)P(i„
< t o A t) + q (0)P(r o -< t„ A t) + e
-
`cp(n)P(t
< z„ A x)
= cp(n){P(t„ -< T o A t) + e `P(i„ A T o > t)}.
(T.3.21)
-
Letting t -* oc we get 9(x) -< p(n){P(r„ < To, T o < cc) + P(i„ < r x , T o = cc)} < w(n)P(i„ < cc),
so that P(z, < cc) -> 9 (x)/q (oo) > 0.
The proof of (b) is essentially analogous.
n
REMARK. It should be noted that P(T + , < oo) > 0 means +co is accessible. For diffusions on S = (a, b) with a and/or b finite, the criterion of accessibility mentioned in theoretical complement V.2.3 may be derived in exactly the same manner. In multidimension (k > 1) the following criteria may be derived. We use the notation (3.23) for the following statement.
Theorem T.3.2.
(Has'tninskii's Test for Explosion).
(a) P(S < oo) = 0
if J
(b) P(? < oo) > 0
if
J
x
e i^•^(
f exp{/(u)}
x(u)
` J e-t(r)(I
,
exp{I(u)}
a(u)
du) dr = cc,
du dr < co.
[I]
Proof. (a) Assume that the right side of (a) diverges. Fix r o e [0, xl). Define the following functions on (r o , cc),
^Ao(v) = 1,
^P„(v) =
J„
e ^t.^ I I exp{ (u)}r0=i(u) du) dr.
„
x(u)
(T.3.22)
616
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Then the radial functions q(y)on (r0 _< IyI < oo) satisfy: A(y) _< ^p„_ I(IYI)• Now,
as in the proof of Theorem T.3.1, apply Itö's Lemma to the function
extended to all of R'` as a twice continuously differentiable function. Then, writing t R := inf {t >_ 0: IXXI = R }, E(p(t, o /\ tR A t, X,'
TRn,) <- 9(Ixl),
or
e-'w(R)P(tR ^ t ro A t) (IX),
P(ZR
t, a A t)
e'
w(IXI) cp(R)
Letting R -• co, we get P(^ _< t, o A t) = 0 for all t _> 0. It follows that P(S < co) = 0. (b) This follows by replacing upper bars by lower bars in the definition (T.3.22) and noting that for the resulting functions, Acp„(IYI) % (V „- i(IYI)• The rest of the proof follows the proof of the second half of part (a) of Theorem T.3. 1. n Theorem T.3.1 is due to W. Feller (1952), "The Parabolic Differential Equations and the Associated Semigroups of Transformations,” Ann. of Math., 55, pp. 468-519. Theorem T.3.2 is due to R. Z. Has'minskii (1960), "Ergodic Properties of Recurrent Diffusion Processes and Stabilization of the Solution of the Cauchy Problem for Parabolic Equations,” Theor. Probability Appl., 5, pp. 196-214. 3.
), a(.) be Lipschitzian, a(•) nonsingular and a - ' (•) Lipschitzian. Let {X'} be the solution of (Cameron-Martin-Girsanov Theorem) Let
X; = x + j a(X)dB (t
-> 0).
(T.3.23)
Jo
Define
exp{
3
`
J
6-
1 ` '(X II)R(XS )•dB 3 . - 2 la -t (Xs:)P(X -, )I Z ds' J5 (0`s
By Itö's Lemma, dZ^.: = Z,..6 -
'(X')p(X^ )•dB„
or,
Z"= I + J Z0 • a '(X S)g(Xs)•dB s . —
0
(T.3.25)
THEORETICAL COMPLEMENTS
617
One may show, using Exercises 3.8, 3.10, that the integrand in (T.3.25) is in ✓1I[0, ox). Hence {Z°•'} is a nonnegative {3}-martingale and EZ' = 1. Define the set function Q on U,,o ^F, by Q(A)= E(1 A Z") =
f
Z°•" dP,
A e .f,.
(T.3.26)
A
Note that if A e ,y, then A e .,, for all t' > t, and E(I A Z) = E [ 1AE(Z o.x I
^)] = E(l A Z°.x)
Hence Q is a well-defined nonnegative and countably additive set function on the field U,, o F,. It therefore has a unique extension to the smallest sigmafield 9 containing U,, o .y,. Also, Q(S1) = E(Z°') = 1. Thus, Q is a probability measure on (f2, F). On (Q .y,), Q is absolutely continuous with respect to P. with a Radon—Nikodym derivative Zr'. We will show that, under Q, {X} is a diffusion with drift µ(•) and diffusion matrix a( )'( ), starting at x. Denote by E Q and E, expectations under Q and P, respectively. Let g be an arbitrary real-valued bounded Borel measurable function on Il k . Then E(9(X'„)Z°+,1 gis) = Z
o.x
'(t, X., ),
(T.3.27)
where i(t, Y):= E(g(XY )Z
o .r).
(T.3.28)
Therefore, for every real bounded F,-measurable h, .x
EQ(hg(Xs+^)) = E(hZ° i(t, XS)) = EQ(h+i(t, Xs)),
(T.3.29)
which proves the desired Markov property. Finally, for all twice continuously differentiable g with compart support, It6's Lemma yields, with f = a`tt, o d(g(X, )Zo,.) = (Aog)(X' )Z ” dt + Z°.' (grad 9)(X: ). ø(X; ) dBt
+ g(X,)Z,o •'"f(X, )•dB, + Z°•"(grad g)(Xµ(X, ) dt = (Ag)(X, )Z°" dt + Z°•"(grad g)(X,) + g(X, )f (X, )) • dB„ (T.3.30) where, writing d(y)t= (a(y)a'(y)), 1 Aog(Y) = 2 1]
82g( Y) oy^^i ayu^'
^;^ ag Ag(Y)'= Aog(Y) + µ (Y) _ .) . (T.3.31)
Taking expectations in (T.3.30) we get E Q g(X;) — g(x) = f E Q Ag(Xs) ds.
(T.3.32)
Jo Dividing by t and letting t j 0, it follows that the infinitesimal generator of the Q-distribution of {X;) is A. We have thus proved the following result.
Gib
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
Theorem T.3.3. (Cameron-Martin-Girsanov Theorem). Suppose i(•), a(.), a ' ( )
are Lipschitzian, {X'} the nonanticipative solution of (T.3.23). (a) Then for every t ? 0, the probability measure Q defined by (T.3.26) is absolutely continuous with respect to P on (fl, . ), with the Radon-Nikodym derivative Z o •" (b) Under Q, {X} is a diffusion with drift it(•) and diffusion matrix a(.)«'(•). ❑ An immediate corollary is the following result. Let .4, denote the sigmafield on S2':= C([0, oo): P) generated by the coordinate projections co' -• a, 0 <_ s _< t. Let 9 = a(U,, o ,4,). Denote by P" the distribution (on (0', ga')) of a diffusion with drift µ;(.) and diffusion matrix ((d; j (• ))) = a(•)a(• )', starting at x (i = 1, 2). )
Corollary T.3.4. Suppose µ;(•) (i = 1, 2), a(. ), a ' (•) are Lipschitzian. Then, for every t > 0, P;;2 ^ is absolutely continuous with respect to P» on (Q', V,). ❑ The preceding results extend easily to nonhomogeneous diffusions on 1. Recall the notation of Theorem T.2.1, and define
exp{Jt0 a
'(s, Xo. =)µ(s, Xo. :),d% - 1 r^ ^a-'(s Xo.X)µ(s, Xo,x)^2 ds}. 2 o jjj
J
(T.3.33) Theorem T.3.5 (a) Suppose (t, y) -. µ(t, y), a(t, y), a - '(t, y) are Lipschitzian. Let {X°•'} denote the nonanticipative solution of X o •" = x +
fo,
a(s, X o •')
dB s ,
(T.3.34)
and Q a probability measure on (S2, a(U,, o S)) defined by Q(A) = E(1 g Z°•"),
A e.. (T.3.35)
Then Q is absolutely continuous with respect to P on (S2, y,), with a Radon-Nikodym derivative Zo" (b) Suppose (t, (y)) -+ µ (t, y) (i = 1, 2), a(t, y), a" 1 (t, y) are Lipschitzian. Let Pö! x (i = 1, 2) denote the distribution (on (C([0, oo): R'), .4)) of a diffusion with drift µ;() and diffusion matrix a( ‚. )a'( ). Then Pö ; is absolutely continuous with respect to Po'; on .4. ❑ ;
The assumption that µ, a, a ' be Lipschitzian may be relaxed to the assumption that these may be locally Lipschitzian and that the diffusions, with and without the drift p, be nonexplosive. This may be established by approximating µ, a by µ„, a„ as in (T.3.8), and then letting n -. oo. R. H. Cameron and W. T. Martin (1944), "Transformation of Wiener Integrals Under Translations,” Ann. of Math., 45, pp. 386-396, proved the above results for the case a(•) = I. These were generalized by I. V. Girsanov (1960), "On Transforming a Class of Stochastic Processes by Absolutely Continuous Substitution of Measures," Theor. Probability Appl., 5, pp. 314-330. -
THEORETICAL COMPLEMENTS
619
4. (The Support Theorem and the Maximum Principle) We want to establish the following result. The notation is the same as in the statement of Corollary T.3.4. Theorem T.3.6. (The Support Theorem). Let p.), a( ), a 1 ( ) be Lipschitzian. Then the support of the distribution P. (of a diffusion with drift It(.) and diffusion matrix ( )a'( )) is C {w' e C([0, cc): 01"): w'(0) = x}. ❑
Proof. It is enough to show that all sufficiently smooth functions in C. are in the support of P. Let t —* u(t) = (u,(t), ... , u k (t)) be a twice continuously differentiable function in C. Let > 0, T> 0 be given. Define a continuously differentiable (in t) function c(t) _ (c 1 (t), ... , c k (t)) that vanishes for t > 2T and such that c.(t) = (d/dt)u ; (t), 1 _< i <_ k, t T. Consider the diffusion {X°•'} that is the solution of X°•x = x + J
c(s) ds +
J a(X
') dB s (t -> 0).
(T.3.36)
0
0
In view of Theorem T.3.5, it is enough to show P^IX° — x — Jc(s) ds < e for all t e [0, T]) > 0.
(T.3.37)
In turn, (T.3.37) follows if we can find a Lipschitzian vector field b(•) such that the solution {Y,} of r
Y, = x +
c(s) ds +
f
.'
r
b(Y,) ds +
6(Y s ) dB,
(t -> 0)
(T.3.38)
0
satisfies
P(IYr — x —
J c(s) ds < e for all
t E [0, T]) > 0.
(T.3.39)
0
But (T.3.39) will follow if one may find b(•) such that E(•r OB(x: ,) ) > T,
where T öB( , ) := inf { t > 0: I Y, — x — f ' c(s) dsI-> e } . .
)
The following lemma shows that such a b(•) exists.
n
Lemma. Let a(•) be nonsingular and Lipschitzian. Then, for every e > 0, sup Er fl( , E) = co, where the supremum is over the class of all Lipschitzian vector fields b(.).
(T.3.40) ❑
Proof. Without loss of generality, take x = 0. (This amounts to a translation of the coefficients by x.) Write Z, = Y, — f ö c(s) ds. For any given pair t() - b(. ), y(.)
620
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
define
f
Za)ZW
('
d(t, z):= d ;i (z + c(s) ds) Z IZI .
B(t, z) _ Y d ii (z +
J c(s) ds) o
,
t. t
C(t,
z) = 2 +
c(s) ds o
B(t, z) + C(t, z)
3=' sup
f30.Izl =r d(t,
a(r):= sup d(t, z).
— 1,
z) t
>-o.Izl =r
Similarly ß(r), a(r) are defined by replacing "sup" by "inf" in the last two expressions. Finally, let
I(r):=
f rß(u) du
!(r):_ I r ß(u) du
J
U
o
a U
where a is an arbitrary positive constant. Note that {Z,} is a (nonhomogeneous) diffusion on 11" whose drift coefficient is b (t , z) = b(z + ,(o c(s) ds), and whose diffusion matrix is D(t, z) := a(t, z)ß'(t, z) where a(t, z) = a(z + f o c(s) ds). The infinitesimal generator of this diffusion is denoted A. Now check that (T.3.41) limß(r)>k -1. •10
Define
F(r)
=
f, f, 8(
"a
1v )
(
exp{ —
J
uß(,
v
dll } dv) du. )))
«v
(T.3.42)
Using (T.3.41) it is simple to check that F(r) is finite for 0 < r s. Also, for r > 0,
F'(r) =
—
Er 1 exp^—
a(v) F"(r)
J
r
^
ß(U) dv' } dv < 0,
J)
v'
(1.3.43 )
_ —«-r) — Y ß r)
Define q(z) = F(IzI). By (3.24) and (T.3.43) it follows that for 0< IzI < s.
2A t rp(z) >- —1
Therefore, by Itb's Lemma applied to cp(Z,) (and using the fact that cp = 0 on öB(O:e)), 0
— 4?(0) 3
—
ZEiaBto:E),
so that
ETaB(o:,) -> 2q(0) = 2F(0),
0 -< IzI -< a.
(T.3.44)
Choose b(z) _ —Mz (M > 0). Then it is simple to check that as
M —r oo, F(0) — oo. n
621
THEORETICAL COMPLEMENTS
Corollary T.3.7. (Maximum Principle for Elliptic Equations). Suppose It(.) and a(• ) are locally Lipschitzian and r(.) nonsingular. Let A be as in (T.3.31). Let G be a bounded, connected, open set, and u(•) a continuous function on G, twice continuously differentiable in G, such that Au(x) = 0 for all x e G. Then u(.) cannot attain its maximum or minimum in G, unless u is constant on G. 0 Proof. If possible, let x o e G be such that u(x) _< u(x o ) for all x e G. Let 6 > 0 be such that B(x 0 :2 5) c G. Let cp be a bounded, twice continuously differentiable function on H" with bounded first and second derivatives, satisfying q0(x) = u(x) on B(x o :b). Let {X,} be a diffusion with coefficients µ(• ), a(• ), starting at x 0 . Apply It6's Lemma to get Eg7(X8,0 ai) =
or, denoting by it the distribution of X,ß, b .,
u(y)rt(dy)
f
=
u(x o ).
(T.3.45)
eB(x o :j)
But u(y) _< u(x o ) for all ye aB(x 0 :6). If u(y o ) < u(x o ) for some y o E 3B(x 0 :S) then, by the continuity of u, there is a neighborhood of y o on 3B(x 0 :S) in which u(y) < u(x o ). But Theorem T.3.6 implies that iv assigns positive probability to this neighborhood, which would imply that the left side of (T.3.45) is smaller than its right side. Thus u(y) = u(x o ) on 3B(x 0 :8). By letting 6 vary, it follows that u is constant on every open ball B contained in G centered at x o . Let D = {y e G: u(y) = u(x o )}. For each y e D there exists, by the same argument as above, an open ball centered at y on which u equals u(x o ). Thus D is open. But D is also closed (in G), since u is continuous. By connectedness of G, D = G. n D. W. Stroock and S. R. S. Varadhan (1970), "On the Support of Diffusion Processes with Applications to the Strong Maximum Principle," Proc. Sixth Berkeley Symp. on Math. Statist. and Probability, Vol. III, pp. 333-360, contains much more than the results described above. 5. (Positive Recurrence and the Existence of a Unique Invariant Probability) Consider a positive recurrent diffusion {X} on Rk . Fix two positive numbers r, < r z . Define ry e :=inf{t
0: X') = r 1 },
z; _ '
riz;:=inf{t
: IX,) = r z }, (T.3.46)
x 7zr +i =inf{ t->ri zi : ^X,^ =r 1 }
(i
1).
The random variables q ; are a.s. finite. By the strong Markov property {Xn z ,: i > 1} is a Markov chain on the state space 0B(O:r 1 ) = {)yl = r l }, having a (one-step) transition probability ni(y, B):= P(X"7:,.•, E B {iii
nz:-. )xt=r =
J
=.zi
()yj = r 1 , B E .1(ÖB(O: r, )),
H,(z, B)H2(y, dz), (T.3.47)
622
AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
where .yn , is the pre-q sigmafield, and H (y, dz) is the distribution of X ß(0 ., t the random point where {X;} hits OB(0:r; ) at the time of its first passage to aB(O:r; ). Now for all z,, z 2 e OB(O:r, ), H2 (z,, dy), H2 (z 2 ,dy) are absolutely continuous with respect to each other and there exists a positive constant c (independent of z,, z 2 ) such that ;
dH2(z1, dy) >_
dH2(z2, dy)
(jz; = r l for i = 1, 2).
c
(T.3.48)
This is essentially Harnack's inequality (see D. Gilbarg and N. S. Trudinger (1977), Elliptic Partial Differential Equations of Second Order, Springer- Verlag, New York, p. 189). It follows from (T.3.47) and (T.3.48) that dn1(Y, dz)
dn1(Y1, dz) >c
(Y,
Yi e öB(O:r l )).
(T.3.49)
This implies (see theoretical complements II.6) that there exists a unique invariant probability µ l for the Markov chain {X11 2 , : i > l} and the n-step transition probability n(y, B) converges exponentially fast to µ 1 (B) as n -^ oo, uniformly for all y e 3B(O: r,) and all BE .(0B(0: r, )). Now let {X 1 } be the above diffusion with initial distribution µ,, so that {X n2 : i > 1} is a stationary process, as is the process nz
Z1 (f)= fJ(XS)ds (i n2i-
1),
1
where f is a real-valued bounded Borel-measurable function on 118'`. ❑
Lemma. The tail sigmafield of {Z ; (f ): i > 1} is trivial.
Proof. Let A e ✓ := n,,,, o {Z; (f ): i >, n} —the tail sigmafield, and Bev{Z i (f ): I _< i _< n} c ,Fn2 „ , (the pre-n Z „ + , sigmafield of {X, }). For every positive integer m, A belongs to the after- ,j 2( ,,, )+i process X,1z and, therefore, may be expressed 1
m
as {Xnzi m „ e A n +m a.s., for some Borel set A„ +m of C([0, oo): 11"). By the strong Markov property, }
P(A j
na^.m^. ) = E(In„,m(X
+
.I) I ^ ?2I.'+m,.,)
_ (Px (A n+m )) x = xa21,,.^1.I — ^n+m( X nzi^«)s
say. Hence,
P(A I . n2^*
I
) =
E( P„+m (Xmt (
„.m1
+^ )
^ ^nTh,,)
= (
J
+zmx,
dz)/ x
= X,2,,
i
But n,mt(x, B) -+ µ 1 (B) as m -+ oo, uniformly for all x and B. Therefore
P(A I m* ,) -
J
cp„ +m (z)µ,(dz) --. 0
a.s.
as m -• cc.
(T.3.50)
623
THEORETICAL COMPLEMENTS
But ^^n+mlZ)t^l(dZ) = E(n+m(X,1,_
= P(A).
Therefore, (T.3.50) implies, for every Be P(B nA)—P(B)P(A)—^0
asm --•oo.
(T.3.51)
But the left side of (T.3.51) does not depend on m. Therefore, P(B n A) = P(B)P(A). < i _< n} and the tail sigmafield % are independent for every I-fence 1 }. In particular, ✓ is independent i This implies that .i is independent of of itself, so that P(A) = P(A n A) = P(A)P(A). Thus, P(A) = 0 or 1. n
a{Z (f ): I ;
a{Z (f ): _>
n _> 1.
;
Using the above lemma and the Ergodic Theorem (theoretical complements II.9), instead of the classical strong law of large numbers, we may show, as in Section 9 of Chapter 1I, or Section 12 of Chapter V, that
J
1(
as. .
1
lim -- .i(X ) ds --' --- ---- E f(X s ) ds = .i(X)m(dX) > (T.3.52) r
-.
>
S
t 0 E(3 — tie) i,, 1
say, where in is the probability measure on (H', k ) with m(B) equal to the average amount of time the process {X } spends in the set B during a cycle [72+ 92n+3]. It follows from (T.3.52) that in is the unique invariant probability for the diffusion. The criterion (3.25) (Proposition 3.3) for positive recurrence is due to R. Z. Has'minskii (1960), "Ergodic Properties of Recurrent Diffusion Processes and Stabilization of the Solution to the Cauchy Problem for Parabolic Equations," Theor. Probability App!., 5, pp. 196-214. A proof and some extensions may be found in R. N. Bhattacharya (1978), "Criteria for Recurrence and Existence of Invariant Measures for Multidimensional Diffusions," Ann. Probab., 6, pp. 541-553; Correction Note (1980), Ann. Probab., 8. The proof in the last article does not require Harnack's inequality. S
Theoretical Complements to Section VII.4 1. Theorem 4.1 and Proposition 4.2 are special cases of more general results contained in G. Basak (1989), "Stability and Functional Central Limit Theorems for Degenerate Diffusions,)) Ph.D. Dissertation, Indiana University. Two comprehensive accounts of the theory of stochastic differential equations are N. Ikeda and S. Watanabe (1981), Stochastic Differentia! Equations and Diffusion Processes, North-Holland, Amsterdam, and I. Karatzas and S. E. Shreve (1988), Brownian Motion and Stochastic Calculus, Springer- Verlag, New York.
CHAPTER 0
A Probability and Measure Theory Overview
1 PROBABILITY SPACES Underlying the mathematical description of random variables and events is the notion of a probability space ((2, F, P). The sample space 1 is a nonempty set that represents the collection of all possible outcomes of an experiment. The elements of 12 are called sample points. The sigmafield F is a collection of subsets of Q that includes the empty set 0 (the "impossible event') as well as the set n (the "sure event") and is closed under the set operations of complements and finite or denumerable unions and intersections. The elements of F are called measurable events, or simply events. The probability measure P is an assignment of probabilities to events (sets) in F that is subject to the conditions that (i) 0 < P(F) _< 1, for each Fe F, (ii) P(0) = 0, P(12) = 1, and (iii) "(Ui F; ) = Z ; P(F; ) for any finite or denumerable sequence of mutually exclusive (pairwise disjoint) events F; , i = 1, 2, ... , belonging to F. The closure properties of F ensure that the usual applications of set operations in representing events do not lead to nonmeasurable events for which no (consistent) assignment of probability is possible. The required countable additivity property (iii) gives probabilities a sufficiently rich structure for doing calculations and approximations involving limits. Two immediate consequences of (iii) are the following so-called continuity properties: if A, c A 2 c is a nondecreasing sequence of events in .y then, thinking of Ute° , A n as the "limiting event" for such sequences, P^ U
lim P(An).
n-I An)=
^
To prove this, disjointify {A n } by B n = A. — A_ 1 , n >_ 1, A 0 = 0, and apply (iii) to U = , B. = U^ , A n . By considering complements, one gets for decreasing measurable
625
626
A PROBABILITY AND MEASURE THEORY OVERVIEW
events A,
A 2 • • • that Pj (l A \n—,
nt =
tim P(A,).
(1.2)
n
While (1.1) holds for all countably additive set functions u (in place of P) on F, finite or not, (1.2) does not in general hold if p(A n is not finite for at least some n
>_ _<
)
(onwards). If S2 is a finite or denumerable set, then probabilities are defined for all subsets F of (I once they are specified for singletons, so F is the collection of all subsets of S2. Thus, if f is a probability mass function (p.m.f.) for singletons, i.e., f (w) 0 for all con S2 and y. f(co) = 1, then one may define P(F) = y.EF f(cw). The function P so defined on the class of all subsets of Q is countably additive, i.e., P satisfies (iii). So (Q, F, P) is easily seen to be a probability space. In this case the probability measure P is determined by the probabilities of singletons {co}. In the case f) is not finite or denumerable, e.g., when Q is the real line or the space of all infinite sequences of 0's and l's, then the above formulation is no longer possible in general. Instead, for example in the case S2 = I8', one is often given a piecewise continuous probability density function (p.d.f.) f, i.e., f is nonnegative, integrable, and f f (x) dx = 1. For an interval I = (a, b) or (b, cc), —cc < a
;
;
>_
p(F)
;
=
inf E p(C,)
(F e F),
where the summation is over a finite collection C,, C 2 , contains F and the infimum is over all such collections.
...
(1.3) of sets in' whose union
627
RANDOM VARIABLES AND INTEGRATION
As suggested by the construction of measures on it outlined above, starting from their specifications on a class of rectangles, if two measures µ, and U 2 on a sigmafield agree on a subclass .9t c ,F closed under finite intersections and ) E .sd, then they agree on the smallest sigmafield, denoted a(d), that contains .4. The sigmafield a(d) is called the sigmafield generated by d. On a metric space S the sigmafield M = a(S) generated by the class of all open sets is called the Borel sigmafield. In case a probability measure P is given on M k , by specifying a p.d.f. f on I, for example, one defines the (cumulative) distribution function (c.d.f.) F associated with P as F(x)=P({(Y1,.. .,Yk)ER k :y;<xifor I
x=(x 1 ,...,xjEI8 k . (1.4)
It is simple to show, using the continuity (additivity) properties of P that F is right-continuous and coordinatewise nondecreasing. By the usual inclusion—exclusion procedure one may compute probabilities of rectangles (a,, b,] x • • • x (a h , bk ] in terms of the distribution function.
2 RANDOM VARIABLES AND INTEGRATION A real-valued function X defined on Q is a random variable provided that events of the form {X E I} := {co E S: X(cu) E I} are in F for all intervals l; i.e., X is a measurable function on S with respect to .f . This definition makes it possible to assign probabilities to events {X E I}. A similar definition can be made for k-dimensional vector-valued random variables by taking I to be an arbitrary open rectangle in ll. From this definition it follows that if X is a random variable then the event {X E B} E.y for every Borel set B. The distribution of X is the probability measure Q on (lt, -4 k ) given by Q(B)
=
P(X
E
B),
B E .A k .
(2.1)
Often one writes Px for Q. We sometimes write Q(dx) = f(x) dx to signify that the distribution of X has p.d.f. f with respect to Lebesgue measure. Suppose that (S2, ,y , p) is an arbitrary measure space and g is a real-valued function on S2 that is measurable with respect to .^. For the simplest case, suppose that g takes on only finitely many distinct nonzero values y t , Y 2 ..... y k on the respective sets A,, A 21 ... , A k ; such a function g is called a simple function. In the case is a probability measure, simple functions are discrete random variables. If µ(A ; ) is finite for each i _> 1 then the Lebesgue integral of g is defined as the sum of the values of g weighted by the p-measures of the sets on which it takes its values. That is,
Jn
g dp'= Z y, µ(Ar),
(2.2)
where A ; = {w E Q: g(w) = y;}, i = 1, 2, ... , k. If g is a bounded nonnegative measurable function on S2, then it is possible to approximate g by a sequence of simple functions n = 1, 2, ... , where g" has values i/2" on A(n) = {w: i /2" _< g(w) < (i + 1)/2"), respectively, for i = 1, 2, ... , k" = N 2" with N so large that g(w) < N for all w. If each A 1 (n) has finite measure, then fn g" dp is a nondecreasing sequence of numbers whose limit therefore exists but may be infinite. In the case that the limit exists and is finite, g is said to be integrable and its Lebesgue integral is the limiting value. In the case that
628
A PROBABILITY AND MEASURE THEORY OVERVIEW
g is an unbounded nonnegative measurable function, first approximate g by g(w) = min{g(w), N — 1 }, N = 1, 2, .... Then each g N is a bounded nonnegative measurable function. If each g r, is integrable, then the sequence f n g,, dp is a nondecreasing sequence of numbers. If the limit of this sequence is finite, then g is integrable and its integral is the limiting value. Finally, if g is an arbitrary measurable function then write g = g — g as a difference of nonnegative measurable functions, where g + (w) = g(w) on A + _ {weft g(w) > O} and g + (w) = 0 on A - = {w e n: g(w)
J
xQ(dx). u^
More generally, if^p is a measurable function on R' and q (X) integrable, then Erp(X) =
J
cp(X) dP =
n
J
rp(x)Q(dx).
(2.3)
^a^
To verify the change-of-variable formula (2.3) one begins with simple functions (p and proceeds by limits as described after the definition (2.2). Estimates and bounds on expected values and other integrals are often obtained by applications of the basic inequalities for integrals summarized below. Let (S2, °^ , p) be an arbitrary measure space and let f and g be integrable functions defined on S2. A property is said to hold almost everywhere (a.e.) or almost surely (a.s.) if the set of points where it fails has p-measure zero. Then, it is simple to check from the definition of the integral that we have the following. Proposition 2.1. (Domination Inequality). If f _< g p-a.e. then
f dµ Jn
< in
g dl^
(2.4)
A real-valued differentiable function (p defined on an interval I whose graph is supported below by its tangent lines is called convex function, e.g., qi(x) = ex, x e 18', (p(x) = x°, x > 0 (p > 1). In other words, a differentiable cp is convex on I if 4,(y) -> 4,(x) + m(x)(y — x) (x, y e 1), (2.5)
where m(x) = 4,'(x). The function cy is strictly convex if the inequality in (2.5) is strict
except when x = y. For twice continuously differentiable functions this means a positive second derivative. Observe that if rp is convex on I, and X and 4(X) have finite expec-
629
RANDOM VARIABLES AND INTEGRATION
tations, then letting y = X, x = EX in (2.5) we have q(X) ? m(EX)(X — EX) + (p(EX). Therefore, using the above domination inequality (2.6) and linearity of the integral, we have Ecp(X)
>- m(EX)E(X — EX) + Ecp(EX) = cp(EX ).
(2.6)
This inequality is strict if (p is strictly convex and X is not degenerate. More generally, convexity means that for any x 0 ye I, (p(tx + (1 — t)y) _< up(x) + (1 — t)q (y), 0 < t < 1; likewise strict convexity means strict inequality here. One may show that (2.5) still holds if m(x) is taken to be the right-hand derivative of tp at x. The general inequality (2.6) may then be stated as follows.
Proposition 2.2. (Jensen's Inequality). If cp is a convex function on the interval I and if X is a random variable taking its values in 1, then (2.7)
(p(EX) _< Etp(X)
provided that the indicated expected values exist. Moreover, if q is strictly convex, then equality holds if and only if the distribution of X is concentrated at a point (degenerate). From Jensen's inequality we see that if X is a random variable with finite pth absolute moment then for 0 < r < p, writing IXI = (IXI')'/", we get EIXI° >_ (EIXI')°l'. That is, taking the pth root, EIXI")' ' (EIXI')' 1 -<
(2.8)
(0 < r < p).
(
'
In particular, all moments of lower order than p also exist. The inequality (2.8) is known as Liapounov's Inequality. The next inequality is the Hölder Inequality, which can also be viewed as a convexity type inequality as follows. Let p > 1, q > I and let f and g be functions such that If I ° and Igl are both p-integrable. If p and q are conjugate in the sense I/p + 1/q = 1, then using convexity of the function ex we obtain
IfI - I9I = e nrvuoglllP+tt/vuogl919 <
1 e l-91f1" + I e l- 919IQ = I IfI' + 1 191 P
q
P
9
.
q
Thus,
IfIIgI
1
- If1 1
P
1 +-
( 2.9)
I9I 9 .
q
Now define the L° -norm by
IlflI '=(
Ifl
d1)
( 2.10)
.
Replacing f and g by f/IlfII,, and g /Ilgll q , respectively, we have upon integration that
f Ifgj
dy
1
— 5 II.fIIpII9II q P
IJ1919du t 1 J Ifl ° dµ +--_—+—= 1.
(IifII P )'
Thus, we have proved the following.
q
(11911 9 )4 P
q
(2.11)
630
A PROBABILITY AND MEASURE THEORY OVERVIEW
Proposition 2.3. (Hölder Inequality). If I f" and Ig1 4 are integrable, where 1
J fIgIq dyJ
J
(2.12)
The case p = 2 (and therefore q = 2) is called the Cauchy—Schwarz Inequality, or the Schwarz Inequality. The first inequality in (2.12) is a consequence of the Domination Inequality, since fg _< I fgl and —fg <, IfgI. The next convexity-type inequality furnishes a triangle inequality for the If -norm (P31). Proposition 2.4. (Minkowski Inequality). If I f Ip and Igle are µ-integrable where 1 < p < co, then
if
l/p
If+gl p dµ}
_<
'ip
+{ IgI "dµ }
.
(
2.13)
J
To prove (2.13) first notice that since,
If + gl p
'ip
( IJ I + IgI) p < (2 max(If I, Igl)) p
_<
2p
(I f I p + IgI"),
(2.14)
the integrability of If + glp follows from that of If Ip and Igle. Let q = p/(p — 1) be the conjugate to p, i.e., 1/q + 1/p = 1, barring the trivial case p = 1. Then,
f If+glpdu
f
_< (IfI +IgI)pdp.
(2.15)
The trick is to write
J (I.f I + IgI)
p
dµ
=f
J
I.f I(If I + Igl) p- ' du + IgI(If I + II) p- ' du
Now we can apply Hölder's inequality to get, since qp — q = p, /q
J
(IfI + IgI) p dµ < IIflip
[(IfI + II) p-I]9dµ)1 + II8II \J
_ (Ilf Ilp + II IIp)
_
(J
Elle [(Ifl + II)p- I]4dp
(III + IgI) p n
Dividing by (f (If I + Igl)p )' Iq and again using conjugacy 1 — 1/q = 1/p gives the desired inequality by taking (1 /p)th roots in (2.15), i.e., Ilf + gllp < IlfII + IlgII . Finally, if X is a random variable on a probability space (f2, .., P) then one has Chebyshev's Inequality
P(IXI 3 e) which follows from EIXI"
>,
p
EIXIP
E
E(lIIxI,E}IXI P )
>_
(s > 0, p> 0),
e"P(IXI > e).
(2.16)
631
LIMITS AND INTEGRATION
3 LIMITS AND INTEGRATION A sequence of measurable functions { f : n >- I} on a measure space (S2, Y, p) is said to converge in measure (or in probability in case p is a probability) to f if, for every e > 0, f
p({Ifn —fi>E})-+0
asn -+oo.
(3.1)
A sequence { fn } is said to converge almost everywhere (abbreviated a.e.) to f, if u({fn— i })=0.
(3.2)
If g is a probability measure, and (3.2) holds, one says that (f} converges almost sure/i' (a.s.) to f. A sequence { f,} is said to be Cauchy in measure if, for every E > 0,
Alf.—LI >E })-0
asn, in--^ co.
(3.3)
Given such a sequence, one may find an increasing sequence of positive integers {n k : k >- 1} such that
u({I/n k — .in k . (> ('z) k }) < (z) k (k = 1, 2, ...).
(3.4)
The sets B,:= U°= {If — fnk , l > 2 -k } form a decreasing sequence converging to B:= {If — fnk „I > 2 for infinitely many integers k}. Now p(B,) -< 2"+'. Therefore, u(B) = 0. Since B = {1 fn , — f,,, s 2 for all sufficiently large k}, on B` the sequence { f,,,: k >- l} converges to a function f. Therefore, { fnk } converges to f a.e. Further, for any r>0 and all m, .
-
u({Ifn—fI>E})Vu({Ifn—fnml>2})+11({Ifn —.fl>2}). m
(3.5)
The first term on the right goes to zero as n and n m go to infinity. Also,
Bm if
1 1 Jn.^ — J I > 21 C
2—m+1 < 2
As p(B m ) -+ 0 as m -+ oc, so does the second term on the right in (3.5). Therefore, { fn : n -> l} converges to fin measure. Conversely, if {f,,} converges to f in measure then, for every E > 0,
— f > E}) < µ({l f
—
.fl > 2}) + p({IJJ — f > 2}) --> 0
as n, m --• x. That is, {f,,} is Cauchy in measure. We have proved parts (a), (b) of the following result. Proposition 3.1. Let (1 , .y, p) be a measure space on which is defined a sequence { !„} of measurable functions.
632
A PROBABILITY AND MEASURE THEORY OVERVIEW
(a) If { f„} converges in measure to f then
If.) is Cauchy in measure and there exists
a subsequence { f,k : k >- l} which converges a.e. to f (b) If {f} is Cauchy in measure then there exists f such that {f} converges to f in measure. (c) If µ is finite and { f,} converges to f a.e., then {f} converges to f in measure. Part (c) follows from the relations -+ 0 asn -goo.
u( {Ifm—fI>E})<µ(U {ifm—fI>E }
(3.6)
m=n
Note that the sets A, say, within parentheses on the right side are decreasing to A := {I f, — f I > s for infinitely many m }. As remarked in Section 1, following (1.2), the convergence of µ(A m ) to u(A) holds because µ(A m ) is finite for all m. The first important result on interchanging the order of limit and integration is the following. Theorem 3.2. (The Monotone Convergence Theorem). If { f} is an increasing sequence of nonnegative measurable functions on a measure space (S2, .y , p), then I^m
J f.du= ffdl^,
(3.7)
where f = lim„ f,. Proof. If {f) is a sequence of simple functions then (3.7) is simply the definition of $ f dp. In the general case, for each n let { f k : k >- l} be an increasing sequence of nonnegative simple functions converging a.e. to f, (ask --* x). Then g k := max{ fn k : 1 -< n -< k} (k >, 1) is an increasing sequence of simple functions and, for k >- n, (3.8)
Jk
as f„, k -< f, < f, for I -< n < k. By the Domination Inequality
f
f,,
dp ,
J
g k dp <- ffk dµ.
(3.9)
By taking limits in (3.8) as k -. oo one gets (3.10)
where g = lim k g k • By taking limits in (3.10) as n - oo, one gets f < g , f, that is, g = Now, by the definition of the integral, on taking limits in (3.9) as k -^ oo one gets
f f„du<
J
gdµ=
J fdµSlim J
Taking limits, as n -. co, in (3.11) one gets (3.7).
fkdµ.
f.
(3.11)
k
n
633
LIMITS AND INTEGRATION
A useful consequence of the Monotone Convergence Theorem is the following theorem. Theorem 3.3. (Fatou's Lemma). If { f,} is a sequence of nonnegative integrable functions,
then
J (liminf f,) dit -< liminf J f„ dµ.
(3.12)
Proof. Write g„:= inf{ fk : k -> n}, g:= liminf f,. Then 0 g„ 1g. Therefore, by the Monotone Convergence Theorem, f g„ dp --+ ( g dµ. But g„ f„ so that $ g„ dp $ ff dp for all n. Therefore,
J g dµ = lim J g„ dµ = liminf J g„ dp -< liminf J f„ dp.
n
The final result on the interchange of limits and integration is as follows. be a sequence of Theorem 3.4. (Lebesgue's Dominated Convergence Theorem). Let { be measurable functions on a measure space (S2, F, p) such that
(i) f„ -^ f a.e. or in measure, and (ii) ILl -< g for all n, where $ g dp < oo. Then
f
If. - f I dp ^ 0
asn--•oc.
(3.13)
In particular, $ f„ dµ -• $ f dp. Proof. Assume f, -+ f a.e. By Fatou's Lemma applied to the sequences {g + f,}, {g - f„} one gets $ f dµ < liminf $ f„ du and $ f dµ -> limsup $ f„ dµ. Therefore, lim $ f„ dp = $ f dµ. To derive (3.13), apply the above result to the sequence {If. - f I}, noting that
If,,—fI <2g. Now assume { f,} converges to fin measure. By Proposition 3.1, for every subsequence
{ f,.} of {f,} there exists a further subsequence { f,,,: k >- 1) such that { f, k } converges a.e. to f. By the above, $ f k dp -• $ f dp as k -* oo. Since the limit is the same for all subsequences, the proof is complete. n We now turn to the L"-spaces. Consider the set of all measurable functions f on a measure space (S2, .F, u) such that I I fl dp < co, where p is a given number in [1, oo). For each such f, consider the class of all g such that f = g a.e. Since the relation "f = g a.e." is an equivalence relation (i.e., it is reflexive, symmetric, and transitive), the set of all such f splits into disjoint equivalence classes. The set of all these equivalence classes is denoted L"(S), F, p) or L°(S2, p) or simply L° if the underlying measure space is obvious. It is customary and less cumbersome to write f e L°, rather than {equivalence class of f} e U. For Je L°, the L°-norm II f lI is defined by (2.10). By Minkowskii's inequality (2.13), If is a linear space. That is, f, g e L° implies cf + dg e If for all reals c, d. Also, under the L'-norm, If is a normed linear space. This means (i) III f ii,, 0 for all f e L° , with equality if and only if f = 0 (a.e.); (ii) Ilcf II D = I cl II f II P for every real c, every f e L°;
634
A PROBABILITY AND MEASURE THEORY OVERVIEW
and (iii) the triangle inequality II f + gll, < II f II P + g11, holds for all f, g e L° (by (2.13)).
The space L" is also complete, i.e., if fn e L' (n > 1) is a Cauchy sequence (i.e., II.% — .fmll P -* 0 as n, m -. oo), then there exists f in L° such that II fn — f II P - 0. For this last important fact, note first that Ilfn — fmllP - 0 easily implies that { fn } is Cauchy in measure and therefore, by Proposition 3.1, converges to some f in measure. As a consequence, {h f"} converges to I f I° in measure. It then follows by Fatou's Lemma applied to {I ff} that f I f I ° dp < oo, i.e., f e U. Applying Fatou's Lemma to the sequence n} one gets — p:
{If fml in >_
I.in — f I ° dp <_ liminf
J
m
J I.fn — .lml ° dp.
(3.14)
But S Ifn — fmi' dµ = (11L — fmllp)". Therefore, the right side goes to zero as n -^ oo, proving that II f — f II,, -• 0. A complete normed linear space is called a Banach space. Thus, the LP-spaces are Banach spaces (1 < p < oo). Ifµ is a probability measure (more generally, a finite measure), then p < p' implies L" c La', in view of Liapounov's inequality (2.8). This is not true if µ(Sl) = cc. When does a sequence {f} in L° converge to some Je LP in L" -norm? Clearly, {f,,} must converge to f in measure. A useful sufficient condition is provided by Lebesgue's Dominated Convergence Theorem if one assumes that the dominating function g is in R. For then IL — f I° -* 0 in measure, and IL — f IP <, 2"(/c,"+ I f(" ) (see Eq. 2.14) 2° + 'Igl" = h, say, so that the conditions of the theorem apply to {If — f I°}. If p is a probability measure, then one may obtain a necessary and sufficient condition for L°-convergence. For this purpose, define a sequence of random variables {X n } on a probability space (S2, F, P) to be uniformly integrable if
J
sup IXnI dP -. 0
(3.15)
as A -• co.
n
One then has the following. Theorem 3.5. (L'- Convergence Criterion). Let (S2, F , P) be a probability space, and {Xn } a sequence of integrable random variables. Then X. converges in L' to some random variable X if and only if (i) XX -, X in probability, and (ii) {Xn } is uniformly integrable.
Proof. (Sufficiency). Under the hypotheses (i), (ii), given e > 0 one has for all n,
f IX.I dP <_ J IXnldP+A_<E
+1(e)
(3.16)
if A(e) is sufficiently large. In particular, { f IX,,I dP} is a bounded sequence. Then, by Fatou's Lemma, f IXI dP < co. Now
f11X.-Xj>_.Z)
IXn—XIdP<
f
IXX—XIdP+
flX.j_>Al2)
^ f I XnI dP + jX^J>_Al21
fI
f
IXn
—
XIdP
flX,j
f
I
X I dP
jX.J^_.1/2)
+ X,—XIdP. jjX.J ^Aj2.jX.-Xj1_A)
(3.17)
635
LIMITS AND INTEGRATION
Now, given e > 0, choose 2(E) > 0 such that the first and second terms of the last sum are less than e for A = 2(e) (use hypothesis (ii)). With this value of A, the third term goes to zero, as n -.• eo, by Lebesgue's Dominated Convergence Theorem. Hence JX„ — X 1 dP -< 2e.
limsup J n-
W
(Ix„-XPA( )1
But, again by the Dominated Convergence Theorem, limsup „-
f
'm
IX„ — XI dP = 0. IX.-XI<,( t )^
(Necessity). If X. —+ X in L', clearly X. —• X in probability. Also,
J ilx.,l x)
IX„I dP - ix„ — X^ dP + JXj dP J {Ix^l%zt
f
IXX — X^dP +
f
JXidP +
f
JX^dP.
jjXj
fjXj>-k2)
(3.18)
The first term of the last sum goes to zero, as n - co, by hypothesis. For each A > 0 the third term goes to zero by the Dominated Convergence Theorem, as n —+ oo. The same convergence theorem implies that the second term goes to zero as A —. cc. Therefore, given e > 0 there exists 2(E), n(e), such that sup f JX„j dP -< e "'>"(E)lx„I%x1
if 1
2(a).
(3.19)
Since a finite sequence {X: 1 < n < n(e)} of integrable random variables is always n uniformly integrable, it follows that {XX } is uniformly integrable. A corollary is the following version. As always, p >, 1. Theorem 3.6. (L°-Convergence Criterion). Let (S2,.y, P) be a probability space, and XX e if (n 1). Then X. — X in L° if and only if (i) XX —• X in probability, and (ii) {IX„i} is uniformly integrable. Proof. Apply the above result to the sequence {IXX — XI°}. For sufficiency, note, as in (3.16), that (i), (ii) imply Xe L°, and then argue as in (3.17) that the uniform integrability of {jXI°} implies that of {IX„ — XI"}. The proof of necessity is analogous to (3.18) and (3.19), using (2.14). n It is simple to check by a Chebyshev-type inequality (see Eq. 2.16) that if {EIX„V } is a bounded sequence for some p' > p, then {IX„I"} is uniformly integrable. There is one important case where convergence a.e. implies C-convergence. This is as follows. Theorem 3.7. (Scheffe's Theorem). Let (S), .°^ , p) be a measure space. Let f,(n >- 1),f be p.d.f.'s with respect to p, i.e., f,,, f are nonnegative, and Sf dp = 1 for all n, If dp = 1. If f„ converges a.e. to f, then f„ —. fin L'.
636
A PROBABILITY AND MEASURE THEORY OVERVIEW
Proof: Recall that for every real-valued function g on SI, one has g = g + — g - ,
where g + = max {g, 0 }, g - = —min {g, 0 }.
II = g + + g - ,
One has
f
(f — fn) dl^ =
0 = $(f — fn) dµ — f (f — fn) du, }
so that
J (f — f„) dP = J (f _<
— fn)+
J If — f^l dµ = 2 f (.1 — .%)+ dµ. (3.20)
dµ,
—
Now 0 _< (f — f,) + f, and (f f,) + -+ 0 a.e. as n —* oo. Therefore, by (3.20) and Lebesgue's Dominated Convergence Theorem,
f
If_f.1 dµ = 2 f (f — .ff) + du —•0
asn—>oo.
n
Among the L"- spaces the space Lz - Lz (O, .F , p) has a particularly rich structure. It is a Hilbert space. That is, it is a Banach space with an inner product < , > defined by
(f,geL ). 2
(3.21)
The inner product is bilinear, i.e., linear in each argument, and the L z -norm is given by
If II = <.f,f>' rz (3.22) and the Schwarz Inequality may be expressed as
I< 1, 9 >I
11f 112119112.
(3.23)
4 PRODUCT MEASURES AND INDEPENDENCE, RADON—NIKODYM THEOREM AND CONDITIONAL PROBABILITY If (S,, .,, µ, ), (S2,'2 , P2) are two measure spaces, then the product space (S, .9', p) is a measure space where (i) S is the Cartesian product S 1 x S 2 ; (ii) .50 = .9' g .'z is the smallest sigmafeld containing the class R of all measurable rectangles, x B 2 : B, e .So,, B z e Soz ); and (iii) p is the product measure P, x µ z on .9' determined by the requirement p(B, x B.) = p1(B1)pz(B2)
(B 1 E5",, B 2 e.9 ).
(4.1)
As the intersection of two measurable rectangles is a measurable rectangle, 9 is closed
637
PRODUCT MEASURES AND INDEPENDENCE
under finite intersections. Then the class' of all finite disjoint unions of sets in .4 is a field. By finite additivity and (4.1), p extends to' as a countably additive set function. Finally, Caratheodory's Extension Theorem extends p uniquely to a measure on 9 = e('), the smallest sigmafield containing '. For each Ba 5°, every x-section B(x ,., := {y: (x, y) E B} is in $o,. This is clearly true for measurable rectangles F = B, x B 2 . The class .01 of all sets in So for which the assertion is true is a 2-class (or, a lambda class). That is, (i) S e 2, (ii) A, Be 2, and A c B imply B\A E 2, and (iii) A„(n l) e 2, A„ IA imply Ac Y. We state without proof the following useful result from which the measurable-sections property asserted above for all Be .9' follows. Theorem 4.1. (Dynkin's Pi-Lambda Theorem).' Suppose a class .4 is closed under finite intersections, a class d is a lambda class, and R c .4. Then a(s) c .W, where a(s) is the smallest sigmafield containing -4. In view of the measurable-section property, for each B e 9 one may define the functions x —" p 2 (B(X ,.,), y -a p,(B). These are measurable functions on (S,,.',µ,) and (SZ, 52, P2), respectively, and one has µ(B) =
J A2(B(X .,)p, (dx) = f1
p,(Bc..vi)P2(dy)
(B E 2).
(4.2)
2
This last assertion holds for B = B, x B 2 e , by (4.2) and the relations µ2(B1,)) = P 2 (B 2 )l B ,(x), ‚(B () ) = p,(B,)1 8 ,(y). The proof is now completed for all B e .9° by the Pi-Lambda Theorem. It follows that if f(x, y) is a simple function on (S, .9', p), then for each x e S, the function y —. f(x, y) on (S 2 . Sot , µ 2 ) is measurable, and for each y e S 2 the function x —* f(x, y) on (S,,50,,µ,) is measurable. Further for all such f
J
/d
.f(x,Y)P2(dY) IPi(dx) = = J
\Js=
Js ( f, ^ f '
(x,Y)p,(dx) F^2(dy). (4.3)
J
By the usual approximation of measurable functions by simple functions one arrives at the following theorem. Theorem 4.2. (Fubini's Theorem). (a) If f is integrable on the product space (Si x S2, `° OO`°2 9, x P2) = (S, 5, p), then ,
(i) x - fsz f(x, y)p 2 (dy) is measurable and integrable on (S,, So,, p,). (ii) y - f s , f(x, y)p,(dx) is measurable and integrable on (S2, 9'Z, P2).
(iii) One has the equalities (4.3). (b) If f is nonnegative measurable on (S, .9', p) then (4.3) holds, whether the integrals are finite or not. The concept of a product space extends to an arbitrary but finite number of components (S ; ,., µ;) (1 < i _< k). In this case S = S, x S 2 x • .. x Sk , .° = 9, Ox .9 OO . • • (9 is the smallest sigmafield containing the class . of all measurable rectangles ' P. Billingsley (1986), Probability and Measure, 2nd ed., Wiley, New York. p. 36.
A PROBABILITY AND MEASURE THEORY OVERVIEW
638
B = B 1 x B 2 x • • • x B k (B ; e 1 -< i -< k). The sigmafield .9' is called the product sigmafield, while p = u, x • .. x Pk is the product measure determined by
µ(B 1 x Bz x
•..
x
Bk) = p1(B1)µ2(B2)
...
uk(Bk)
(4.4)
for elements in A. Fubini's theorem extends in a straightforward manner, integrating
f first with respect to one coordinate keeping the k — 1 others fixed, then integrating the resulting function of the other variables with respect to a second coordinate keeping the remaining k — 2 fixed, and so on until the function of a single variable is integrated with respect to the last and remaining coordinate. The order in which the variables are integrated is immaterial, the final result equals J s f dµ. Product probabilities arise as joint distributions of independent random variables. Let (S2, F, P) be a probability space on which are defined measurable functions X; (1 -< i < k), with X, taking values in S; , which is endowed with a sigmafield So (1 < i < k). The measurable functions (or random variables) X,, X2 .... , X„ are said to be independent if P(X, EB 1i X2 EB 2 ,...,Xk EBk )= P(X1 eB1)
...
P(XkEBk)
for all B I eYl ,...,Bk e4k . (4.5) In other words the (joint) distribution of (X 1 ,. . . , Xk ) is a product measure. If f• is an integrable function on (S; , .97, p i ) (1 < i -< k), then the function
f: (xt, x2.... , Xk) -- .r1(x 1)f2(x2)
.
..
fk(xk)
is integrable on (S, .9', p), and one has
J
fd (4.6)
or
E(fl f(X1)) = fl (Efi(X1)).
(4.7)
A sequence {X„} of random variables is said to be independent if every finite subcollection is. Two sigmaftelds FI , .y2 are independent if P(F1 n F2 ) = P(F 1 )P(F2 ) for all Fl E .fi , F2 e .F2 . Two families of random variables {X x : A E A 1 ) and { Yx : A E A 2 } are independent of each other if o{X5 : A E A t ) and r{YY : A E A Z } are independent. Here 6 {Xd: A E Al is the smallest sigmafield with respect to which all the X x are measurable. Events A 1 , A 2 , ... , A k are independent if the corresponding indicator functions 1 4 1 A 2 , ... ,1,,, are independent. This is equivalent to requiring P(B l n B 2 n • • • n Bk ) _ P(B 1 )P(B 2 )• • •P(Bk ) for all choices B l , ... , B k with B ; = A ; or A. Before turning to Kolmogorov's definition of conditional probabilities, it is necessary to state an important result from measure theory. To motivate it, let (S2, F, p) be a measure space and let f be an integrable function on it. Then the set function v defined by v(F) :=
fF f dµ
(F e .^),
(4.8)
639
PRODUCT MEASURES AND INDEPENDENCE
is a countable additive set function, or a signed measure, on .F with the property if µ(F) = 0.
v(F) = 0
(4.9)
A signed measure v on F is said to be absolutely continuous with respect to µ, denoted v «,u, if (4.9) holds. The theorem below says that, conversely, v « p implies the existence of an f such that (4.8) holds, if p, v are sigmajinite. A countably additive set function v is sigmafinite if there exist B. (n > 1) such that (i) U,,, 1 B„ = S2, (ii) Jv(B„)I < oo for all n. All measures in this book are assumed to be sigmaflnite. Theorem 4.3. (Radon—Nikodym Theorem). 2 Let (S2, F, p) be a measure space. If a finite signed measure v on F is absolutely continuous with respect to p then there exists an a.e. unique integrable function f, called the Radon—Nikodym derivative of v with respect to µ, such that v has the representation (4.8). Next, the conditional probability P(A B), of an event A given another event B, is defined in classical probability as n B) P(A I B)._ P(A (4.10)
P(B)
provided P(B) > 0. To introduce Kolmogorov's extension of this classical notion, let (S2, ,F, P) be a probability space and {B„} a countable partition of S2 by sets B. in F. Let -9 denote the sigmafield generated by this partition, Q = a{B„}. That is, -9 comprises all countable disjoint unions of sets in {B„}. Given an event A e .F, one defines P(A the conditional probability of A given Q, by the s-measurable random variable P(A 19)(oß)._
P(A n B.)
for w e B,,,
P(B„)
(4.11)
if P(B„) > 0, and an arbitrary constant c,,, say, if P(B„) = 0. Check that
J
P(A I-9) dP = P(A n D)
for all Dc -9.
(4.12)
D
If X is a random variable, E!XJ < oo, then one defines E(X (.9), the conditional expectation of X given -9, as the 9i-measurable random variable E(X I f)(w):= I X dP P(B,,) JB,
for we B,,, if P(B„) > 0,
=c^
forcoeB,,,ifP(B„)=0,
(4.13)
where c„ are arbitrarily chosen constants (e.g., c„ = 0 for all n). From (4.13) one easily verifies the equality E(X 9) dP = D
J
XdP for all D e ^.
D
Note that P(A I -9) = E(1 A I D), so that (4.12) is a special case of (4.14). 2 P. Billingsley (1986), loc. cit., p. 443.
(4.14)
640
A PROBABILITY AND MEASURE THEORY OVERVIEW
One may express (4.14) by saying that E(X -9) is a 9- measurable random variable whose integral over each D E -9 equals the integral of X over D. By taking D = B. in (4.14), on the other hand, one derives (4.13). Thus, the italicized statement above may be taken to be the definition of E(X -9). This is Kolmogorov's definition of E(X -9), which, however, holds for any subsigmafield -9 of F, whether generated by a countable partition or not. To see that a 9-measurable function E(X -9) satisfying (4.14) exists and is unique, no matter what sigmafield .9 c F is given, consider the set function v defined on 9 by
I
v(D)_
X dP
(D
E
9).
(4.15)
D
Then v(D) = 0 if P(D) = 0. Consider the restriction of P to .9. Then one has v « P on 9. By the Radon-Nikodym Theorem, there exists a unique (up to a P-null set) 9-measurable function, denoted E(X 9), such that (4.14) holds. The simple interpretation of E(X -9) in (4.13) as the average of X on each member of the partition generating .9 is lost in this abstract definition for more general sigmafields 9. But the italicized definition of E(X -9) above still retains the intuitive idea that given the information embodied in -9, the reassessed (or conditional) probability of A, or expectation of X, must depend only on this information, i.e., must be 9-measurable, and must give the correct probabilities, or expectations, when integrated over events in -9. Here is a list of some of the commonly used properties of conditional expectations.
1
Theorem 4.4. (Basic Properties of Conditional Expectations). Let (S2, F, P) be a probability space, -9 a subsigmafield of F. (a) If X is 9-measurable and integrable then E(X -9) = X. (b) (Linearity) If X, Y are integrable and c, d constants, then E(cX + dYI -9) cE(X -9) + dE(YI -9). (c) (Order) If X, Y are integrable and X <_ Y a.s., then E(X -9) _< E(YI 9) a.s. (d) If Y and X Y are integrable, and X is 9-measurable then E(X Y -9) = XE(Y 9). (e) (Successive Smoothing) If 9 is a subsigmafield ofF, 5 c -9, and X is integrable, then E(X ) = E[E(X 1 -9) ] = E[E(X 19)1 -9]. (f) (Convergence) Let {X„} be a sequence of random variables such that, for all n, Z where Z is integrable. If X - X a.s., then E(X, I -9) -• E(X19) a.s. and X
in V. All the properties (a) -(f) are fairly straightforward consequences of the definition of conditional expectations, and are therefore not proved here. The interplay between conditional expectations and independence is described by the following result. Theorem 4.5. (Independence and Conditional Expectation). Let (S2, F, P) be a probability space and 9 a subsigmafield of F. Then the following hold. (a) If X is integrable, and a {X} and -9 are independent, then E(X -9) = EX. (b) Suppose X and Y are measurable maps on (Q, .F) into (S,, .9,) and (S2, Y2), respectively. Let rp be a real-valued measurable function on (S, x S2, . ©.9? ), and cp(X, Y) integrable. Assume X and Y are independent, i.e., v{X} and o { Y} are independent. Then, -
E(q (X, Y) I a {Y }) = [ Etp(X, y)]r =r.
(4.16)
PRODUCT MEASURES AND INDEPENDENCE
641
(c) Let X, Y, Z be measurable maps on (0, .Y-, P) into (S,, S/). (S Z , .9 ), and (S 3 , ./ ), respectively. Let co be a real-valued measurable function on (S,, .9, ). Assume cp(X) is integrable, and a{X} and a{ Y} both independent of a{Z}. Then, writing Q{Y, Z} for the smallest sigmafield c .} with respect to which both Y and Z are measurable, one has E(tp(X))I at Y,Z})=E(cp(X)I a{ Y}).
(4.17)
Proof. (a) EX is a constant and, therefore, -measurable. Also, relation (4.14) holds with EX in place of E(X I -9) by independence. (b) Let Px , P,. denote the distributions of X and Y, respectively. Then the product probability Px x P. on (S, x S 2 , ®.VZ ) is the (joint) distribution of (X, Y). Let D E cr{ Y}. This means there exists BE ./ such that D = {w eft Y(co) E B} By the change-of-variables formula and Fubini's Theorem,
J
J la(Y)(fl(X, Y) dP = f
11
n
=
L
=J =
IB(Y)(Js
la(Y)(P(x, Y)^'x(da)Pr(dY) x sz
(P(x, v)Px(dx) I Pr(dY) =
J
Sz
1 s(Y)(E(➢ (X, Y))P1(dv)
Ja
le(Y(w))(E(P(X, y)) y =Y(,) dP(w) = 1 D(W)(Ew(X, y)) y =Y () dP(w) n
JI (E
Also, [Ecp(X, y)],,=,. is o{Y}-measurable. (c) Let D, E a{Y}, D 2 E a{Z}. Then there exist B, e'9 , B 2 e V, such that D, = Y '(B,), D2 = Z '(B 2 ). With D = D, n D 2 one has -
-
fD rP(X)dP= J 1 D,ID rP(X)dP= J z
S2
1 e,(Y) 1 ,,(Z)p(X)dP
o
= E(I B,(Y)W(X))E( 1 ,,(Z)) = E(la j (Y)ç(X))P(Z E B 2 ) = P(Z e B 2 )E[E(1 B ,(Y)gp(X) I a{Y})] = P(Z e B 2 )E(1(Y)E(cp(X) rr{Y})] = E(1 B2 (Z)1 e (Y)E(W(X)I o{ Y})) = E(1 0 ,1 D ,E(gp(X)I o{ Y})) =
E(tp(X)I a{Y})dP. Din Di
Thus, the desired relation (of the type) (4.14) holds for sets D = D, n D 2 e at Y, Z}. The class :/,1 of all such sets is closed under finite intersection. Also, the class d of all sets D e at Y, Z} for which f D q(X) dP = f D E(cp(X) ( at Y}) dP holds is a lambda class. Therefore, by Dynkin's Pi-Lambda Theorem, a(4) c d. But = at Y, Z}. n There is an extension of Jensen's inequality (2.7) for conditional expectations that is useful.
642
A PROBABILITY AND MEASURE THEORY OVERVIEW
Proposition 4.6. (Conditional Jensen 's Inequality). Let cp be a convex function on an interval J. If X is an integrable random variable such that P(X e J) = 1, and cp(X) is integrable, then E[gp(X)
19]
>, Qp(E(X I
.9)).
(4.18)
Proof. In (2.5) take y = X, x = E(X 1 -9), to get rp(X) >, cp(E(X
1 .9)) + m(E(X I .9))(X — E(X 1 .9)). n
Now take conditional expectations, given -9, on both sides.
As an immediate corollary to (4.18) it follows that the operation of taking conditional expectation is a contraction on LF(i2, .F, P). That is, if X e L° for some p >, 1, then
(4.19)
IIE(X 1 .9)IIp s 11X11 p .
If Xe L2 , then in fact E(X I .9) is the orthogonal projection of X onto LZ (fl, -9, P) c L2 (S2, F, P). For if Y is an arbitrary element of L2 (,.9, P), then E(X — Y) 2 = E(X — E(X .9) + E(X I -9) — Y) 2 = E(X — E(X + E(E(X I -9) — Y) 2 + 2E[(E(X .9) — Y)(X — E(X I .9))].
The last term on the right side vanishes, on first taking conditional expectation given (see Basic Property (e)). Hence, for all Ye L2 (12, .9, P), E(X — Y) 2 = E(X — E(X I .9)) 2 + E(E(X 11) — Y) 2 E(X — E(X ( .9 )) Z . (4.20)
Note that X has the orthogonal decomposition: X = E(X (9) + (X — E(X .9)). Finally, the classical notion of conditional probability as a reassessed probability measure may be recovered under fairly general conditions. The technical difficulty lies in the fact that for every given pairwise disjoint sequence of events {A n } one may assert the equality P(U A„ .9)(cu) _ E P(A„ I .9)(m) for all w outside a P-null set (Basic Properties (b), (f)). Since in general there are uncountably many such sequences, there may not exist any choice of versions of P(A ^) for all A e F such that for each w, outside a set of zero probability, A -+ P(A 1 .9)(cw) is a probability measure on F. When such a choice is possible, the corresponding family {P(A .9): A e f} is called a regular conditional probability. The problem becomes somewhat simpler if one does not ask for a conditional probability measure on F, but on a smaller sigmafield. For example, one may consider the sigmafield I = {Y - '(B): B e.9'} where Y is a measurable function on ((1, F, P) into (S, .9'). A function (w, B) -+ Q w(B -9) on S2 x .9' into [0, 1] is said to be a conditional distribution of Ygiven -9 if (i) for each Be .50 , Q W (B 1 -9) = P( {Ye B) .9)(cw) _E(I B (Y) .9)(w), for all w outside a P-null set, and (ii) for each to e S2, B —4 Q^,(B I -9) is a probability measure on (S, .9'). Note that (i), (ii) say that there is a regular conditional probability on 9 ✓. If there exists a conditional distribution Q,,(B -9) of Y given .9, then it is simple to check that E(W(Y) 19)(w) _
js
Q(y)Q(dy
19)
a.s.
(4.21)
643
CONVERGENCE IN DISTRIBUTION IN FINITE DIMENSIONS
for every measurable cp on (S, 9) such that co(Y) is integrable. Conditional distributions do exist if S is a (Borel subset of a) complete separable metric space and 5' its Borel sigmafeld. In this book we often write E(Z {X x : Ac A}) in place of E(Z 6{X,: A c A}) for simplicity.
5 CONVERGENCE IN DISTRIBUTION IN FINITE DIMENSIONS A sequence of probability measures {Pn : n = 1, 2, ...} on (W, I') is said to converge weakly to a probability measure P (on R') if lim 41(x) dP (x) = P
n^. oar
J 41(x) dP(x)
(5.1)
^aI
holds for all bounded continuous functions 0 on W. It is actually sufficient to verify (5.1) for those continuous functions 0 that vanish outside some finite interval. For suppose (5.1) holds for all such functions. Let 0 be an arbitrary bounded continuous function, l41(x)i -< c for all x. For notational convenience write {x e W: lxi >- N} = {jxI >, N}, etc. Given s > 0 there exists N such that P({ixi >- N}) < e/2c. Let 6 N,, O be as in Figure 5.1(a), (b). Then, lim Pn ( {Ixt ` N + 1}) >, lim J O x (x) dP(x) =
J © (x) dP(x) N
P({ixi <, N}) > 1---, 2c
so that limPP(ixi>N+1})_-1— lim Pn ({ixi-
(5.2)
n-• m
Hence, writing 0, = 416''N and noting that 0 = Q^ s, on {ixi -< N + 1} and that on {IxI > N + 1} one has f r(x)I < c, we have
lim I^ ¢ dP,, — n-. I$
0 dPI <- lim I pi
JI 0
x dP„ —
n-m ßäi
I
dP
,t
+ lim (cPP ({jxj > N + 1}) + cP({IxI > N + 1})) n-.m
= lim cP ({IxI > N + 1}) + cP({Ixi > N + 1})) n-^m
E 8
2c
2c
Since e> 0 is arbitrary, J, 0 dPn —. }° a , ¢ dP, and the proof of the italicized statement above is complete. Let us now show that it is enough to verify (5.1) for every infinitely
644
A PROBABILITY AND MEASURE THEORY OVERVIEW
(a)
(b)
Figure 5.1
differentiable function that vanishes outside a finite interval. For each e > 0 define the
function PE(x) = d(E) expj —(1
= 0
for ^xj < E,
— x2/e2)}
for IxI >- e,
l
(5.3)
where d(e) is so chosen as to make f p,(x) dx = 1. One may check that p,(x) is infinitely differentiable in x. Now let 0 be a continuous function that vanishes outside a finite interval. Then ¢ is uniformly continuous and, therefore, b(s) = sup{1cß(x) — ¢(y)j: x — y^ -< e) -+ 0 as e j 0. Define (x)
= 0 * P(X)J
(5.4)
O(x — y)PE(y) dy, e
and note that, since 4(x) is an average over values of 4 within the interval (x — e, x + s), fr/(x) — ¢(x)I <- S(e) for all E. Hence, ^
J 0 dP„ — J OE dP„I '< ö(e)
for all n,
IJ
J
0 dP — 4 dPI -< b(E),
645
CONVERGENCE IN DISTRIBUTION IN FINITE DIMENSIONS
J
dP_
f 0dPH J n
t8
0dPn— u
+I L
J ¢`dP„ + J 4 dP J O`dP P
o8
O'dP-
2ö(r) + I J
J
—
„
çbdP
O` dP. —
u
J ulO` dP • 26(E)
as n--. cc .
Since S(e) -+ 0 as e --* 0 it follows that f R , ¢ dPn -* f a ^ 0 dP, as claimed. Next let Fn , F be the distribution functions of P, P. respectively (n = 1, 2, ...). We want to show that if {P : n = 1, 2, ...} converges weakly to P. then F (x) -. F(x) as n -+ oo for all points of continuity x of F. For this, fix a point of continuity x 0 of F. Given e > 0 there exists 'j(e) > 0 such that IF(x) - F(x o )I < r for Ix - x 0 1 _< ry(c). Let ii. (x) = I for x _< x 0 , = 0 for x > x 0 + rt(e), and /i. (x) be linearly interpolated for x o <x <x 0 + ry(e). Similarly, define /i, (x) = I for x < x 0 - q(e), ' (x) = 0 for x and linearly interpolated in the interval (x 0 - ry(e), x o ). Then, using (5.1), F
P
lim F(x 0 ) <- lim J c+(X)dPn (X) =
J
(x)dP(x) <- F(xo + ?I(e)) < F(x 0 ) + F,
n-+m
n-+m
lim F(x 0 ) n-+m
lim J
(5.5) (x) dPP(x) = J ^E dP(x) >- F(xo - q(F)) > F(x o ) - e.
n-^w
Since r > 0 is arbitrary, lim n ^ FF (x o ) <_ F(x o ) _< lim n ^ F(x o ), showing that F(x 0 ) -+ F(x o ) as n - co. One may show that the converse is also true. That is, if FF (x) --* F(x) at all points of continuity of a distribution function (d.f.) F, then {Pn : n = 1, 2, ...} converges weakly to P. For this take a continuous function f that vanishes outside [a, b] where a, h are points of continuity of F. One may divide [a, b] into a finite number of small subintervals whose end points are all points of continuity of F, and approximate f by a step function constant over each subinterval. The integral of this step function with respect to P. converges to that with respect to P. All the above results easily extend to multidimensions and we have the following theorem. Theorem 5.1. Let P 1 , P2 .... , P be probability measures on ff8". The following are equivalent statements.
(a) {Pn : n = 1, 2, ...} converge weakly to P. (b) Eq. 5.1 holds for all infinitely differentiable functions vanishing outside a bounded
set. (c) F(x) -* F(x) as n -* oo, for every point of continuity x of F. Often the probability measures P. (n _> 1) arise as distributions of given random variables X. (n _> 1). In this situation, if {Pn } converges weakly to P. one says {X n } converges in distribution, or in law, to P. It is simple to check that if X} converges in probability to a random variable X, then {Xn } converges in distribution to the distribution of X. In general, if {Fn } is a sequence of distribution functions of probability measures P. that converges to a right-continuous, nondecreasing function F at all points of continuity
646
A PROBABILITY AND MEASURE THEORY OVERVIEW
F(x), it need not be the case that F(x) is the distribution function of a probability measure. While there will be a positive measure p such that F(x) = p(— oo, x], x e R', it can happen that u(R') < 1; for example, if P. = b „, i.e., P„({n }) = 1,F„(x) = 0, x
_>
>_
—
Theorem 5.2. Suppose a sequence of probability measures {P„} on R' is tight. Then it has a subsequence converging weakly to a probability measure P. Proof. Let F„ be the d.f. of P.. Let {r 1 , r 2 , be an enumeration of the rationals. Since {F„(r l )} is bounded, there exists a subsequence {n,} (i.e., {1 f ,2,.....n,,...}) of the ...}
positive integers such that F„,(r,) is convergent. Since {F,,(r 2 is bounded, there exists a subsequence {n 2 } of {n 1 } such that F„ 2 (r2 is convergent. Then {F,,(r; )} ( i = 1, 2) converges. Continue in this manner. Consider the "diagonal” sequence 1 1 , 2 2 , kk Then {F„,(r; )} converges, as n —• oo, for every i, as {n,,: n >_ i} is a subsequence of {n : n > 1 }. Let us write n„ = n'. Define )}
)
... ,
, ... .
;
G(r): =lim F,, (r )
(i = 1, 2, ...),
F(x):= inf {G(r; ): r> x}
(x
;
(5.6)
Then F is a nondecreasing and right-continuous function on R', 0 < F < 1. Let x be a point of continuity of F. Let x' < y' <x < y” < x" with y', y" rational. Then
F(x') _< G(y') = lim F„.(y')
<,
lim Ff .(x) < lim F,,(x) < lim F.,(y") = G(y") _< F(x").
Now let x' x, x" j x, and use continuity of F at x to deduce lim F..(x) = F(x).
(5.7)
It remains to prove that F is the d.f. of a probability measure. By tightness of {P„} (and, therefore, of {P,,.}), given e > 0, there exist x F , yE such that F,,(x,) <‚ F„(y,) > I — s. Let xE, y£ be points of continuity of F such that xE <x and y> yE . Then
F(x,) = lim F,,(x) < e, F(y) = lim F,,.(y') I This shows lim, I _. F(x) = 0, lim, 1 , F(x) = 1.
—
E.
n
A similar proof applies to P,,, P on R k .
6 CLASSICAL LAWS OF LARGE NUMBERS Theorem 6.1. (Strong Law of Large Numbers). Let {X„} be a sequence of pairwise independent and identically distributed random variables defined on a probability space
CLASSICAL LAWS OF LARGE NUMBERS
647
(SZ, y, P). If EIX,j < oo then with probability I, lim n
X' + _+ Xn = EX,.
(6.1)
wn
The proof we present here is due to Etemadi. 3 It is based on Part I of the following
Borel-Cantelli Lemmas. Part 2 is also important and it is cited here for completeness. However, it is not used in Etemadi's proof of the SLLN. Lemma 6.1. (Bore/-Cantelli). Let {A n } be any sequence of events in 3. Part 1. If J t P(A,) < co then
P(A, i.o.):= P n U A k ) = 0. =I k=n
( '
Part 2. If A 1 , A 2 , ... are independent events and if
P(A,) diverges then
P(A, i.o.) = 1.
Proof. For Part I observe that the sequence of events B. = U= A k , n = 1, 2, ... , is a decreasing sequence. Therefore, we have by the continuity property (1.2), m
m
\
x
m
P n U A k t = lim P^ U A k ) -< lim Z P(A k ) = 0. nk=n
n-m
k=n
n^z k=n
For Part 2 note that
P({A n i.o.}`) = P^ U n Ak) = lim P( ñ Ak j = lim n=1 k=n
n-^ac
k=n
J
P(Ak).
n—x k=n
But m
w
P(Ak) = lim U (1 — P(Ak)) k =n
lim exp{— m-^m
P(A k )} = 0.
n
k=n
Without loss of generality we may assume for the proof of the SLLN that the random variables X. are nonnegative, since otherwise we can write X. = X„ — X,, where X„ = max(X,, 0) and X, = —min(X,, 0) are both nonnegative random variables, and then the result in the nonnegative case yields that S n
n z Xk = n1 k=, ark _ n1_k=,
converges to EX, — EX 1 = EX, with probability 1. -
' N. Etemadi (1983), `On the Laws of Large Numbers for Nonnegative Random Variables," J. Multivariate Analysis, 13, pp. 187-193.
648
A PROBABILITY AND MEASURE THEORY OVERVIEW
. Then Y. has moments of all orders. Let Truncate the variables X. by Y„ = X„1 T„ = Zk =1 Yk and consider the sequence {T„} on the "fast” time scale T. = [a"], for a e > 0. Then by Chebyshev's
fixed a > 1, where brackets [ ] denote the integer part. Let Inequality and pairwise independence, ET P ^I T^^— `^ >
Var(T) 1 T^ z z z=^ = z E T„ k= 1
T„
E T„
VarYk ^
E
1 T^ Z
= 22 S E {Xk2 l (xk 5k1} =' 2 Z E T„ k=l
1
E
t^
Zz E{X
EYk
rn k=1
E {X12 1(X, sk)}
i n k=1
1
1
ltx „t ^^} = 2 T„E{X i lix, s^^(}. (6.2)
c T„ k=1
e T„
Therefore, T” — ET
r^I > e^
P \I
Z E {Xi l(x, s^ "1} = 2
+2,
.3) 1(x„^^(I. (6
En=1 Tn
n=1 E Tn
Tn
Let x > 0 and let N = minn >- 1: T. -> x }. Then a" >- x and, since y < 2[y] for any y >- 1, Z n=l
1 l(:st^l= Z 1 -<2 Tn
in>-x Tn
« "= 2 a " =k« -
a—!
n3N
k, x
where k = 2 /(a — 1). Therefore, k
°° 1 n=1
in
forX,>0.
Xl
So ^n
ET= I>e)
2
(6.4)
nE1P\IT
By the Borel—Cantelli Lemma (Part 1), taking a union over positive rational values of e, with probability 1, (TL^ — ET ^ )/T n —^ 0 as n -4 oo. Therefore, T
EX,
(6.5)
Tn
since
lim n_ao Tn
ET
lim EY " = EX,. E
n^ro
Since
P(X1 > u) du = EX, <
P(XX Y") = Z P(X 1 > n) -< n=1
n=1
O
o0
CLASSICAL CENTRAL LIMIT THEOREMS
649
we get by another application of the Borel-Cantelli Lemma that, with probability 1, T. -. 0
S.
as n -+ oo.
(6.6)
n
Therefore, the previous results about {T} give for {Sn } that as n-*
S `"-SEX,
(6.7)
rn
with probability 1. If v n
< k < t,,
1
then, since X. >_ 0, Sty
Sk
Zn +l Sr—I
Tn+1 in
k
to Tn+l
Zn
(6.8)
But rn+l/in -* a, so that now we get with probability 1, 1
Sk
Sk
- EX, -< liminf - -< limsup - -< aEX, . a
k
k
k
k
(6.9)
Take the intersection of all such events for rational a > 1 to get lim k - d, Sk /k = EX, with probability 1. This is the Strong Law of Large Numbers (SLLN). n The above proof of the SLLN is really quite remarkable, as the following observations show. First, pairwise independence is only used to make sure that the positive and negative parts of X, and their truncations, remain pairwise uncorrelated for the calculation of the variance of Tk as the sum of the variances. Observe that if the random variables are all of the same sign, say nonnegative, and bounded, then it suffices to require that they merely be uncorrelated for the same proof to go through. However, this means that, if the random variables are bounded, then it suffices that they be uncorrelated to get the SLLN; for one may simply add a sufficiently large constant to make them all positive. Thus, we have the following. Corollary 6.2. If X 1 , X2,... , is a sequence of mean zero uncorrelated random variables that are uniformly bounded, then with probability 1, X, +.••+Xn —
n
*0
asn -*oo.
7 CLASSICAL CENTRAL LIMIT THEOREMS In view of the great importance of the central limit theorem (CLT) we shall give a general but self-contained version due to Lindeberg. This version is applicable to non-identically distributed summands. Theorem 7.1. (Lindeberg's CLT). For each n, let X,,,,. .. , Xk^ n be independent random
650
A PROBABILITY AND MEASURE THEORY OVERVIEW
variables satisfying k"
(
EXj.n = 0 ,
al.n'— (EX^
n)1/Z
CO,
an = 1
,
(7.1)
j=1
and, for each E > 0, k"
(Lindeberg condition)
lim Z E(X; jjjx "I,E}) = 0 .
(7.2)
j
n—W j=1
Then ^J^ 1 X
in distribution to the standard normal law N(0, 1).
Proof. Let {Z : j _> 1} be a sequence of i.i.d. N(0, 1) random variables, independent of j
{X: I < j Q. Write Z
so that EZ
0
,,
(7.3)
(1 <j < kn),
;=a j.n Zj
= EX EZj = a j = EX J . Define n
n
n
m
(I < _< —
k"
Um,n := Y_ X,
,,
m
+ I Zj,n j=m+1
j=1
kn
1), (7.4)
j=1
j=1
Vm.n `= Um.n — Xm n
(1 < m < k n ).
Let f be a real-valued function on I8' such that ff', f ", f " are bounded. Recall the following version of the Taylor expansion, which is easy to check by integration by parts,
h2 f(x
+ h) = f(x)+hf'(x) + h f"(x)+h 2 (1 — 6){f"(x + Oh) — f"(x) }dO
Taking x = V,,,, h
2!
0
(x,hE R'). (7.5)
= X,,, in (7.5), one gets
EJ (Umn) = E' f( V,. +
Xm.n) =
EJ ( Vmn) + E ( Xm.nf ' ( vmn)) + I'E(Xm.nf " ( V, )) + E(R mn), (7.6)
where Xm.n
J
1
(1
—
6){f " ( Vm n + ©x,) — f"( V)} d8.
(7.7)
As Xm , n and Vm , n are independent, and EX,,,,, = 0, EX,,, = of,,,, (7.6) reduces to .n
Ef(Um,n) = E.f(I'm.n) + ^2 Ef"(ym.n) + E(R m , n ).
(7.8)
Also Um_ l.n = Vm,n + Z., n , and Vm.n and Zm.n are independent. Therefore, exactly as
CLASSICAL CENTRAL LIMIT THEOREMS
651
above one gets, using EZ,,,, = 0, EZ m,n = Qm,n (7.9)
Ef(Um- t.n) = Ef(Vm,n) + -^.n Ef'(Vm,n) + E(R^n.n),
where R
= Zm
J 1 (1 — O){ f
„
(V. „ + ©Zm ) — .f"(Vm,„)} dB.
(7.10)
Hence, IEf(Umn) — Ef(U,-i.n)I -< EIRm,nI + EIR;n.nl
(1 -< m -< Q.
(7.11)
Now, given an arbitrary e > 0, EIRm,nI = E(IRm.nl l{Ix,,.,.l>el) + E(IRm.nI1Ux.,,.„ISc))
E xm.n1llXm.^1>El f '(1 — 0 ) 2 11f"Il^o do [
]
.
+ E X 2 ,.IllX.^..,I5c1 J ' (1 — O)IXm,nl IIJ "Il do J 0
[
Ilf"IIwE(Xr2n,. 1 tlx .
I>s} ) + 2EQm,n
11 f " II .
(7.12)
We have used the notation 11911 := sup{Ig(x)I: x e W}. By (7.1), (7.2), and (7.12),
lim
EIRm,„I
28IIf'"II .
m=1
As > 0 is arbitrary, k„
(7.13)
tim Z EIR m.„I = 0. m=1
Also, ,
EIRl s E[zm.„ J(1 — 9)Ilf"'1 Zm,nl de] _ ?II f"'II^EIzm.„1 3 = ?Ilf" IIWam.nEIZ1i 3 cam„c^
max a m , n l am
1 <m-k„
(7.14)
/
where c = Z IIf"ILEIZ1I3. Now, for each S > 0, a m.n = E(xm.n 1 11x., ,d>al) + E(Xm.n 1 lIX,,, ,I ös) -< E ( x m.n 1 llx.^..,1 >ö)) + 5 2 ,
which implies that k„
max am,„ K
E(Xm.„ltlxm.>aI) + Sz.
652
A PROBABILITY AND MEASURE THEORY OVERVIEW
Therefore, by (7.2),
max a,,,,, -- 0 i
as n -- oo.
(7.15)
5mbk
From (7.14) and (7.15) one gets k„
I EIR; ,j < c max a m n -^ 0
as n — cc.
(7.16)
m=1
Combining (7.13) and (7.16), one finally gets k„
/
/
Z (EIRm.nl + EIRin.nl) -a 0
IEf(Uk,,,n) — Ef(Uo.n)I
as n
c0.
(7.17)
m=1
But Uo , n is a standard normal random variable. Hence,
J
k„
Ef
Y- x)
f(y)(2n)-lie
j= 1
exp{—Zy 2 } dy -+ 0
as n -+ oo. n
By Theorem 5.1, the proof is complete.
It has been shown by Feller 4 that in the presence of the uniform asymptotic negligibility condition (7.15), the Lindeberg condition is also necessary for the CLT to hold. Corollary 7.2. (The Classical CLT). Let {Xj : j -> 1} be i.i.d. EXj = p, 0 < a 2 := Var Xj < oo. Then Y 7 =1 (Xj — )/(i/) converges in distribution to N(0, 1). Proof.
Let Xi ,, = (Xj — µ)/(a^), k n = n, and apply Theorem 7.1.
n
Corollary 7.3. (Liapounov's CLT). For each n let X1,n, X2,n, . .. , Xn k„ be k n independent random variables such that k„
k
I EXj,n = u,
I Var Xi ,, = Q Z > 0,
j= 1j=l k„
(Liapounov condition)
lim Z EIXj , n — EXj , n 1 2+ ' = 0
(7.18)
n-. j=l
for some 6 > 0. Then
Xi,,
converges in distribution to the Gaussian law with mean
p and variance a Z . Proof. By normalizing one may assume, without loss of generality, that k.,
EXj , n =0,
Z EX; n = 1. j=1
4
P. Billingsley (1986), loc. cit., p. 373.
653
FOURIER SERIES AND THE FOURIER TRANSFORM
It then remains to show that the hypothesis of the corollary implies the Lindeberg condition (7.2). This is true since, for every E > 0, k„
k„
i
(
Z E X;.^ltlx;..,l>tl) J=1
E 1
IX. Iz+a ,.n Ea 0,
as n - oo, by (7.18).
(7.19) n
Observe that the most crucial property of the normal distribution used in the proof of Theorem 7.1 is that the sum of independent normal random variables is normal. In other words, the normal distribution is infinitely divisible. In fact, the normal distribution N(0, 1) may be realized as the distribution of the sum of independent normal random variables having zero means and variances Q? for any arbitrarily specified set of nonnegative numbers or? adding up to 1. Another well-known infinitely divisible distribution is the Poisson distribution. The following multidimensional version of Corollary 7.2 may be proved along the lines of the proof of Theorem 7.1. Theorem 7.4. (Multivariate Classical CLT). Let {X n : n = 1, 2, ...} be a sequence of i.i.d. random vectors with values in l. Let EX I = p and assume that the dispersion matrix (i.e., variance-covariance matrix) D of X I is nonsingular. Then as n -+ 00, n - '^ 2 (X I + • • • + X. — nit) converges in distribution to the Gaussian probability measure with mean zero and dispersion matrix D.
8 FOURIER SERIES AND THE FOURIER TRANSFORM Consider a real- or complex-valued periodic function on the real line. By changing the scale, if necessary, one may take the period to be 2n. Is it possible to represent f as a superposition of the periodic functions ("waves") cos nx, sin nx of frequency n (n = 0, 1, 2, ...)? The Weierstrass approximation theorem (Theorem 8.1) says that every continuous periodic function f of period 2n is the limit (in the sense of uniform convergence of functions) of a sequence of trigonometric polynomials, i.e., functions of the form T Y c n e" s = n=-T
T
c0 +
(an cos nx + b n sin nx). n=1
The theory of Fourier series says, among other things, that with the weaker notion of L2 -convergence the approximation holds for a wider class of functions, namely for all square integrable functions f on [—it, n]; here square integrability means that I f I z is measurable and that J"_,, f (x)1 2 dx < oc. This class of functions is denoted by L2 [ — rr, n]. The successive coefficients c n for this approximation are the so-called Fourier coefficients: 1 c
n
= --
"
f(x)edx n
(n=0,±1,±2,...).
(8.1)
654
A PROBABILITY AND MEASURE THEORY OVERVIEW
The functions exp{inx} (n = 0, + 1, ±2, ...) form an orthonormal set: 1
e1nxe-"""dx =0
forn0m,
=1
for n = m,
2it
_n
(8.2)
so that the Fourier series off written formally, without regard to convergence for the time being, as X Y
cneinx
(8.3)
is a representation of f as a superposition of orthogonal components. To make matters precise we first prove the following theorem. Theorem 8.1. Let f be a continuous periodic function of period 2n. Then, given ö > 0, there exists a trigonometric polynomial Y„"= - N do exp{inx} such that N
sup f (x) —
do exp{inx} <5. n=—N
XE[ä,
Proof. For each positive integer N, introduce the Fejer kernel
k N (x) :=
j (i
— 1nj ) exp{inx}.
n = _ N N
(8.4)
+1
This may also be expressed as (N + 1)k N (x) _
exp{ijx}I z =
exp{i( j — k)x} = I
lexp^i(N + 1)x} — l
i=o
o,j.ksN
_ 2{l — (cos(N + 1)x} _ sm {Z(N + 1)x} 2(1 — cos x)
exp{ix} — 1
2
)
sin Zx
2
(8.5)
The first equality in (8.5) follows from th^fact that there are N + 1 — (ni pairs (j, k) such thatj — k = n. It follows from (8.5) that k N is a positive continuous periodic function with period 2n. Also, k N is a p.d.f. on [ —n, iv], as follows from (8.4) on integration. For every e> 0 it follows from (8.5) that k N (x) goes to zero uniformly on [ —n, —e] u [E, n] so that kN(x)
dx
-.
0
as N -^ co .
(8.6)
[ - a. - e]v [c.n]
In other words, k N (x) dx converges weakly to 5 0 (dx), the point mass at 0, as N -* oo. Consider now the approximation fN of f defined by
fN(x)'=
f
rz -
.f(y)kN(x — y)dy = y 1 1
. n= _N \\\
— N
Inl+llJc„ exp{inx},
(8.7)
655
FOURIER SERIES AND THE FOURIER TRANSFORM
where c" is the nth Fourier coefficient of f. By changing variables and using the periodicity of f and k N , one may express fN as .fN(x) =
f
n f(x — y)kN(y) dy. rz
Therefore, writing M = sup{I f (x)I: Xe l}, and S E = sup{If (y) — f(y')I: I y — y'I
$
J
kN(y) dy +
S. E
[-R.-E]V [E.f[]
R
(8.8)
It now follows from (8.6) that f — fN converges to zero uniformly as N -+ oo. Now n write do = (1 — Inj/(N + 1))c n . The next task is to establish the convergence of the Fourier series (8.3) to f in L2 . For this note that for every square integrable f and all positive integers N, rz
1
i
C" e'nx ( f(x)_ 27[ - n N
e
(m = 0, ±1.... .±N). (8.9)
dx = c, — C m = 0
Therefore, if one defines the norm (or "length") of a function g in L2 [—iv, rz] by rz
i2
I lg(x)j' dx Ilg11 = (—
(8.10)
= JIgll2,
rz
then, writing 2 for the complex conjugate of z,
N 0 - I f(x) —
=
2
_N
c"einxZ I = 1 f rz (f(x) — ^ c"e•nx )( f( x ) — [^ C "e -inx 1 dx 21 _ rz \ -N J -N
1
^ /'
rz
J -n
IJ (x)12 dx — L
(
einx f(x) dx + 2^
-N \ 27C f rz -n
J
rz
-a
e"' f (x) dx — cn ,, J 7
N
[ N^
N
p
(0.11.)
ICnI2.
= IIJ 11 2 — [^ Cn Cn = IIJ I1 2 — -N
This shows that II f (x) — > N_ N c" exp{inx} 11 2 decreases as N increases and that 2
N
lim
f(x) — Y- cne`"x
N- .
= 11
f
112 — Y- Ic n I 2 .
(8.12)
-N
To prove that the right side of (8.12) vanishes, first assume that f is continuous and f(—it) = f(n). Given e> 0 there exists, by Theorem 8.1, a trigonometric polynomial ZNOy o d n e i "x such that No
max f(x) — Yx
-No
de"' <e. < E.
656
A PROBABILITY AND MEASURE THEORY OVERVIEW
This implies n
1
2 de"" dx < F 2 .
No
f(x) — 2r j,,
(8.13)
-No
But, by (8.9), f(x) — J] N-N o c„ exp{inx} is orthogonal to exp{imx} (m = 0, ± 1, ... , ±N o ) so that 1 7C
rz /'( x ) —
No
2
d rze "x
fNo
No
.J
27t
l dx
- No
cneinx^ dx
f(x) —
27r .,
nx
2
No
=
_No rz
1
(cn — d n )e
_ No
n
n
1
2
No
d x =1 f(x) — So crzeinx +
2
No
— d rz )e inx dx.
+ (c„ 27r
(8.14)
,, -No
Hence, by (8.13) and (8.14), o
— No c n e 27[ f rz I f (x) -rz
inx
l2 dx < e 2 ,
lim
f(x) —
c„etnx
2 \ e 2 . (8.15)
-N
Nam
-No
Since f > 0 is arbitrary, it follows that N
lim f(x) — N--
crze inx
= 0,
(8.16)
-N
and, by (8.12),
II.f ll 2 =
Y_
(8.17)
Ic„1 2 .
This completes the proof of convergence for continuous periodic f. Now it may be shown that given a square integrable f and e > 0, there exists a continuous periodic g such that II f — gll < E/2. Also, letting Z d. exp{inx}, X c„ exp{inx} be the Fourier series of ,g, f, respectively, there exists N t such that N,
d„ exp{inx} II <
g(x) —
-
-N,
Hence (see (8.14)) f(x) —
Y c,einx \ -N,
f(x) —
dneirzx -N,
<
2E + 2E =E.
If — gII + I g(x) —
d u e" -N,
(8.18)
657
FOURIER SERIES AND THE FOURIER TRANSFORM
Since F >0 is arbitrary and Il f(x) — I N-N cne' n `II Z decreases to II f NI oo (see Eq. 8.12), one has N
llm N-^ x^
CnPinx =0,
f(x) —
Ilf 11 2 =
2
— ^ x x. Ic n lz as
x
c12
(8.19)
-x
- N
Thus, we have proved the first part of the following theorem.
Theorem 8.2 (a) For every f in L2 [—n, n], the Fourier series of f converges to f in L2 -norm, and IcnI2)"2 holds for its Fourier coefficients c n . the identity li f II = (b) If (i) f is differentiable, (ii) f(—n) = f(n), and (iii) f' is square integrable, then the Fourier series of f also converges uniformly to f on [—n, n]. Proof. To prove part (b), let f be as specified. Let Y_ c n exp{inx} be the Fourier series of f, and Y c;,' ) exp{inx} that of f'. Then
cnl) = 2?[ Jna f' (l
inxln
x)e-"' dx = 2n f
J
(x)e
+
rz
Zn Jn /'(x)e-inx dx = 0 + inc
n
= inc n .
rzJ
(8.20) Since f' is square integrable,
(8.21)
S Incnl' = Z Ic;,"I Z < cc. Therefore, by the Cauchy—Schwarz Inequality,
1
1
Icnl = Icol + n^o Inl Inc,I <_ Icol +
\h/2/
Incnl2
z
n#on/
< oo. (8.22)
\n#O
But this means that Y c n exp{inx} is uniformly absolutely convergent, since max I E c n exp{inx} I _< E Ic n l —*0 x
InI>N
as N —• cc.
I,,I>N
Since the continuous functions Y N_ N c n exp{inx} converge uniformly (as N —> co) to c n exp{inx}, the latter must be a continuous function, say h. Uniform convergence to h also implies convergence in norm to h. Since Y"' N c n exp{inx} also converges in norm to f, f(x) = h(x) for all x. For if the two continuous functions f and h are not identically equal, then
J
f(x)—g(x)l 2 dx>0.
n
For a finite measure (or a finite signed measure) p on the circle [—n, n) (identifying
658
A PROBABILITY AND MEASURE THEORY OVERVIEW
-n and n), the nth Fourier coefficient of p is defined by 1 c„ = — 2n
exp{ -inx}p(dx)
(n = 0, ± 1, ...).
(8.23)
[n)
If p has a density f, then (8.23) is the same as the nth Fourier coefficient of f given by (8.1). Proposition 8.3. A finite measure µ on the circle is determined by its Fourier coefficients. Proof. Approximate the measure p(dx) by g(x) dx, where rz N (x):=
g
_ rz
n N N+
kN(x - y)p(dy) = ^ (1 - l
^c„ exp{inx}, 1
(8.24)
with c„ defined by (8.23). For every continuous periodic function h (i.e., for every continuous function on the circle),
f
h(x)gN(x) dx t
=
5
h(x)k,(x - y) dx)P(dy). ( ft - x,.)
(8.25)
As N -+ oo, the probability measure k N (x - y) dx = k N (y - x) dx on the circle converges weakly to b y (dx). Hence, the inner integral on the right side of (8.25) converges to h(y). Since the inner integral is bounded by sup{Ih(y)l: ye W}, Lebesgue's Dominated Convergence Theorem implies that
lim N
-
f
h(x)g N (x) dx = J
,-x,n)
h(y)p(dy).
(8.26)
[-rz.rz)
This means that p is determined by {g N : N > 1} The latter in turn are determined by {c),
n
We We are now ready to answer an important question: When is a given sequence
{c: n = 0, + 1, ...}, the sequence of Fourier coefficients of a finite measure on the circle? A sequence of complex numbers {c: n = 0, + 1, ±2,.. .) is said to be positive definite if for any finite sequence of complex numbers {zj : 1 < j < N), one has Y-
c j - k zj z k i 0.
(8.27)
1 j.k5N
Theorem 8.4. (Herglotz's Theorem). {c: n = 0, ± 1, ...} is the sequence of Fourier coefficients of a probability measure on the circle if and only if it is positive definite, and co = 1. Proof. (Necessity).
If p is a probability measure on the circle, and
{z j : 1 -< j -< N} a
659
FOURIER SERIES AND THE FOURIER TRANSFORM
given finite sequence of complex numbers, then Ci_kZJZk =
1 E zjik 21t 1-<j.kSN
1j,kN
J
exp{—i( j — k)x}µ(dx) [-a.n)
J exp{ —ijk} j (dx) >- 0. =1J z N
N
('
_ Irz ^E zj exp{ — iix})(Z z k exp{ikx})µ(dx) 1
1
Z
N
('
j
(8.28)
Also,
J
c0= p(dx) = 1.
(Sufficiency). Take z j = exp{i(j — 1)x}, j = 1, 2, . .. , N + 1 in (8.27) to get gN(x):= 1
c j _ k exp{i(j — k)x} >, 0.
(8.29)
N + 1 0-<j,k
Again, as there are N + 1 — Inj pairs (j, k) such that j — k = n (—N _< n _< N), (8.29) becomes ' 0
-<
9 N (X) _
1
—
j exp{inx}c.
N +1 /
(8.30)
In particular, 1
gN(x)dx = c o = 1.
(8.31)
27[ t-n,n)
Hence (l/21t)g N is a p.d.f. on [—it, ir]. By Theorem 5.2, there exists a subsequence {g N -} such that (1/27t)g N .(x) dx converges weakly to a probability measure u(dx) as N' -+ oo. Also, integration yields
i2rcf -,,. [
exp{ —inx}g N (x) dx = ( l — Inj ^c„
n[
N
+
1
For each fixed n, take N = N' in (8.32) and let C„ = lim^1 — N '-
n
j --c„ =
N' + 1 ;
N' -•
(n = 0, + 1, ... , ±N). (8.32) oo. Then
exp{—inx}p(dx)
(n = 0, ±1,...). (8.33)
In other words, c„ is the nth Fourier coefficient of p. If u({rz}) > 0, one may change p to p' where p'({—it}) = p({—R}) + p({rr}), and p' = p on (—it, n) to get a probability measure p' on the circle whose Fourier coefficients are c. Note that (8.33) holds with p replaced by p', because exp{—inn} = exp{inrc} = 1. n
660
A PROBABILITY AND MEASURE THEORY OVERVIEW
Corollary 83. A sequence {c„} of complex numbers is the sequence of Fourier coefficients of a finite measure on the circle [ -n, it) if and only if {c„} is positive definite. Proof. Since the measure p = 0 has Fourier coefficients c„ = 0 for all n, and the latter trivially comprise a positive definite sequence, it is enough to prove the correspondence between nonzero positive definite sequences and nonzero finite measures. It follows from Theorem 8.4, by normalization, that this correspondence is 1-1 between positive definite sequences {c„} with c o = c > 0, and measures on the circle having total mass c. n The Fourier transform of an integrable (real- or complex-valued) function f on (-oo, oo) is the function f on (-oo, oo) defined by e`^yf(y)dy,
-oo<
(8.34)
As a special case take f = I(c,dJ• Then,
=
exp{ i^ d}
- exp{i^ c }
(8.35)
i
f()
so that -. 0 as ICI -+ oo. This convergence to zero as - ± oo is clearly valid for arbitrary step functions, i.e., finite linear combinations of indicator functions of finite intervals. Now let f be an arbitrary integrable function. Given e > 0 there exists a step function f, such that
I14 — f II I °= f
W
If(y) - f(y)I dy <. (8.36)
Now it follows from (8.34) that (J ) - f( if ff - f II 1 for all . Since J) --* 0 as ^ -. ± oo, one has lim 1 _,. I f(^)I < E. Since E >0 is arbitrary, 0
as {^j -+ co.
(8.37)
The property (8.37) is generally referred to as the Riemann-Lebesgue Lemma. If f is continuously differentiable and f, f' are both integrable, then integration by parts yields .^^( ) =
-
). if.
(8.38)
The boundary terms in deriving (8.38) vanish, for if f' is integrable (as well as f) then f (x) -. 0 as x - ± oo. More generally, if f is r-times continuously differentiable and f e' , 0 < j < r, are all integrable, then one may repeat the relation (8.38) to get )
f (^) = (-if' (). (r)
(8.39)
In particular, (8.39) implies that if f, f', f" are integrable then f is integrable. It is instructive to consider the Fourier transform as a limiting version of a Fourier series. Consider for this purpose that f is differentiable and vanishes outside a finite interval, and that f' is square integrable. Then, for all sufficiently large integers N, the
661
FOURIER SERIES AND THE FOURIER TRANSFORM
function gN(x):=
f(Nx)
vanishes outside (—n, n). Let
Cn Ne
°x
, c e
°S
(8.40)
be the Fourier series of y ti. and its
derivative g N , respectively. Then
f
Cn.N — gN(x)e-;nx dx =
n
'—
2R
f (Nx)e
dx = 2
inl/N
1 - ^r. f(y)e
dy
n N
1 2Nrz
=— f
(8.41)
Now writing A = (2>2j, n-2)'/2, 11 kn,NI = ICo.NI + (
n>o InI
1/z
1/2
> Z >2 Inen.Nl 2
Co.NI +
(I ncn.NI)
.,
on
n#0
'/2
Ico,NI + A( >2 Ic^ 1 NI 2 )
f =—
ou
j
x
n
1/2
.. g(x)dx + A ( rz f Ig (x)I 2 dx < 2 / -R
Therefore, for all sufficiently large N, the following convergence is uniform:
f(z) = yN
(
-
N -
n e ;n 1N. N
= n1 _.. f x, 2Nn
n Y>.
(8.42)
Letting N - co in (8.42), if f E C(If', dx), one gets the Fourier inversion formula, f(z)
=
--
`°
('
.f(—y)ey dy = - J
_^ .i(^)e_
;
^^ ds.
(8.43)
One may show that this formula holds for all f such that both f and f are integrable. Next, any f that vanishes outside a finite interval and is square integrable is automatically integrable, and for such an f one has, for all sufficiently large N, 1
1-
2n f
IgN(x)I2 dx
Ig(x)I 2 dx =
I
=2I f
If(y)I 2 dy,
Y ICn.NI 2 = 4NZrz2 Y, I f( — N^ 2 ,
so that
N n=^^ I f \ N n/I2 2n J
If(y)1 2 dy.
(8.44)
662
A PROBABILITY AND MEASURE THEORY OVERVIEW
Therefore, letting NI oc, one has the Plancherel identity . If(^)I Z dl; = 2n I
1
If(y)1 2 dy.
JJ
JJJ ^
(8.45)
Now every square integrable f on (— oo, oo) may be approximated in norm arbitrarily closely by a continuous function that vanishes outside a finite interval. Since (8.45) holds for the latter, it also holds for all f that is integrable as well as square integrable. We have, therefore, the following. Theorem 8.6 (a) If f and f are both integrable, then the Fourier inversion formula (8.43) holds. (b) If f is integrable as well as square integrable, then the Plancherel identity (8.45) holds. Next define the Fourier transform µ of a finite measure p by setting
µ(ff) _ f
e du(x).
(8.46)
j
If p is a finite signed measure, i.e., p = µ 1 — P 2 where p, P 2 are finite measures, then also one defines fi by (8.46) directly, or by setting µ = µ l — µ 2 . In particular, if p(dx) = f (x) dx, where f is real-valued and integrable, then µ = f If p is a probability measure, then A is also called the characteristic function of p (or of any random variable whose distribution is p). We next consider the convolution of two integrable functions f, g:
f
*g(x) =
.f(x—y)g(y)dy J
(—oo<x
(8.47)
Since If * g(x)I dx = =
If(x — y)IIg(y)I dy dx
JJ
f
If (x)l dx m
J
(8.48)
Ig(y)I dy,
f * g is integrable. Its Fourier transform is
(f * g) (Z) = ^
=
Jm
e'4X
\J -
JJ
f (x — y)g(y) dy) dx
e'Z(x-rie''yf(x — y)g(y) dy dx e.42e14y
-m
-t
f(z)g(y) dy dz =
i(
( ),
(8.49)
663
FOURIER SERIES AND THE FOURIER TRANSFORM
a result of importance in probability and analysis. By iteration, one defines the n-fold convolution f, * • • • *f,, of n integrable functions f,, ... , f„ and it follows from (8.49) that (f, * • • • * f) = f, f. f.. Note also that if f, g are real-valued integrable functions and one defines the measures u, v by µ(dx) = f (x) dx, v(dx) = g(x) dx, and µ * v by (f * g)(x) dx, then (µ * v)(B) = I (f * g)(x) dx = 1' \J f(x — y) dx g(y) dy
JJJe =
s
J
p(B — y)g(y) dy =
J
T
)
µ(B — y) dv(y),
(8.50)
for every interval (or, more generally, for every Bore! set) B. Here B — y is the translate of B by —y, obtained by subtracting from each point in B the number y. Also, (µ * v) = (f * g) = fo = µß. In general (i.e., whether or not p and/or v have densities), the last expression in (8.50) defines the convolution p * v of finite signed measures p and v. The Fourier transform of this finite signed measure is still given by (p * v) = µv". Recall that if X,, X 2 are independent random variables on some probability space ((2, d, P) and have distributions Q1, Q 2 , respectively, then the distribution of X, + X Z is Q, * Q 2 whose characteristic function (i.e., Fourier transform) may also be computed from = Ee'z x, Ee`t x2 = Qt( )2( ).
(Q, * Q2) ^(Z)
(8.51)
This argument extends to finite signed measures, and is an alternative way of thinking about (or deriving) the result (p * v) = µv". Theorem 8.4 and Corollary 8.5 may be extended to a correspondence between finite (probability) measures on H' and positive definite functions on R' defined as follows. A complex-valued function cp on R' is said to be positive definite if for every positive integer N and finite sequences {^„ , ... , N } c H' and {z,, Z21... , z„} c C (the set of complex numbers), one has Z zjzk(P(Zi — Zk) i 0.
(8.52)
I fij.k<-N
Theorem 8.7. (Bochner's Theorem). A function cp on R' is the Fourier transform of a finite measure on H' if and only if it is positive definite and continuous. The proof of necessity is entirely analogous to (8.28). The proof of sufficiency may be viewed as a limiting version of that for Fourier series, as the period increases to infinity. We omit this proof. The above results and notions may be extended to higher dimensions H k . This extension is straightforward. The Fourier series of a square integrable function f on [—m, i) x [—n, n) x • • • x [—n, n) = [—rz, n) k is defined by E„ c,, exp{iv• x} where the summation is over all integral vectors (or multi-indices) v = (v', V' 21 , .. , v (k '), each vf° being an integer. Also v • x = v'''x—the usual euclidean inner product on U& k between two vectors v = (v"'. .. , v°' ), and x = (x°', x (2 ', .. , x" k '). The Fourier coefficients are given by I
n
C^ —
— (
"
...
k
2n
)
,,
'v f(x)e . dx.
(8.53)
664
A PROBABILITY AND MEASURE THEORY OVERVIEW
The extensions of Theorems 8.1-8.4 are fairly obvious. Similarly, the Fourier transform of an integrable function (with respect to Lebesgue measure on Il) f is defined by
i()
=
J
...
J
e
f(y) dy
(4 E R k ),
(8.54)
d4,
(8.55)
and the Fourier inversion formula becomes f(z)
=
.f(4)e
...
(2 I f
which holds when f(x) and f() are integrable. The Plancherel identity
J
.. J
If(4)I Z d4 = ( 2 n) k
J
(8.45) becomes
... If(y)1 2 dy, J
(8.56)
which holds whenever f is integrable and square integrable. Theorem 8.6 now extends in an obvious manner. The definitions of the Fourier transform and convolution of finite signed measures on 08'` are as in (8.46) and (8.50) with integrals over (— 00, cc) being replaced by integrals over R". The proof of the property (it s * 12) = µl P2 is unchanged. Our final result says that the correspondence P -+ P. on the set of probability measures onto the set of characteristic functions, is continuous. Theorem 8.8. (Cramer-Levy Continuity Theorem). Let P„(n >, 1), P be probability
measures on (R', ,1k)
(a) If P. converges weakly to P, then P„() converges to P(4) for every e Il k . (b) If P,, () - P() for every , then P. converges weakly to P. Proof. (a) Since P„(, P() are the integrals of the bounded continuous function exp {i4 • x} with respect to P„ and P, it follows from the definition of weak convergence that PP (^) -> P(^). (b) Let f be an infinitely differentiable function on l having compact support. Define f(x) = f(—x). Then f is also infinitely differentiable and has compact support. Now,
f
f(x)P^(dx) _ (f * P„)( 0 ),
f
f(x)P(dx) = (f * P)(0).
(8.57)
As f is integrable, (f* P„) = / P., and (f * P) = f P are integrable. By Fourier inversion and Lebesgue's Dominated Convergence Theorem, (f * P^)(0 ) =
(2rr) k ^uk exp{
—i0 • x }f(4)PP(4) d --- (f* P)(0).
Hence, f f dP„ - f f dP for all such f. The proof is complete by Theorem 5.1.
(8.58)
n
Author Index
Aldous, D., 194 Aris, R., 518 Athreya, K., 223
Feder. J., 356 Feller, W., 67, 107, 223, 362, 498, 499. 616. 652 Flajolet, P., 362 Fleming, W. H., 562 Fomin, S. V., 97 Freedman, D. A., 228, 499 Friedlin, M., 517 Friedman, A., 502, 516 Fuchs, W. H. J., 92
Bahadur, R. R.. 518 Basak, G., 623 Belkin, B., 506 Bellman. R., 561 Bhattacharya, R. N., 106, 228.513, 515. 518, 561, 623 Billingsley. P., 79, 93, 94, 96. 104, 511, 637, 639, 652 Bjerknes, R.. 366 Blackwell. D., 561 Blumenthal, R. M., 228 Brown, B. M., 508 Brown, T. C., 356
Garcia, A. M., 224 Ghosh. J. K., 503, 518 GiIbarg, D., 516, 517, 622 Girsanov, I. V., 618 Gordin, M. 1., 513 Griffeath. D.. 366 Gupta, V. K., 106, 362, 518
Cameron. R. H., 618 Chan. K. S., 228 Chandrasekhar, S., 259 Chow, Y. S., 562 Chung, K. L., 92, 105.361 Clifford, P., 366 Corson, H. K.. 228
Hall, P. G., 511 Hall, W. J., 503 Harris, T. E., 363 Has'minskii, R. Z., 616, 623 Heyde, C. C.. 511 Holley, R., 366 Howard, R., 561
Diaconis, P., 194 Dobrushin, R. L., 366 Dorisker, M. D., 103 Doob, J. L.. 216 Dorfman, R., 64 Dubins. L. E., 228 Durrett, R. T., 105. 362. 366. 506 Dynkin, E. B., 501, 516
Ibragimov, A., 511 lglehart, D. L., 105, 506 Ikeda, N., 623 It6. K., 105, 356, 497. 501. 503, 504. 506
Eberlein, E., 356 Etemadi, N., 647 Ethier, S. N., 501
Jain, N., 229 Jamison, B., 229 Kac, M., 259 Kaigh, W. D., 105 Karatzas, I., 623 Karlin, S., 504
665
666 Kennedy, D. P., 105 Kesten, H., 190, 227, 362 Kieburtz, R. B., 217 Kindermann, R.. 366 Kolmogorov, A. N., 97 Kunth. D., 193 Kurtz, T. G., 501 Lamperti, J., 207.356 Lawrance, V. B., 217 Lee, O., 228, 513 Levy, P., 105 Lifsic, B. A.. 513 Liggett, T. M., 366 Lindvall, T., 223 Lotka, A. J., 207 McDonald, D., 223 McKean, H. P. Jr., 105, 497, 501, 503, 504, 506 Maitra. A., 560, 561 Majumdar. M., 228, 561, 562 Mandelbrot, B. B., 107, 356 Mandl, P., 500, 501, 504, 505 Martin, W. T.. 618 Merton, R. C., 562 Mesa, 0., 362 Miller, D. R., 105, 506 Miller. L. H., 104 Mina, K. V., 217 Mirman, L. J., 228 Moran, P. A. P., 107 Nelson, E., 93 Ney, P., 223 Odlyzko. A. M., 362 Orey, S., 229 Ossiander, M., 356
AUTHOR INDEX
Riesz, F., 505 Rishel, R. W., 562 Robbins, H., 562 Rogers. L. C. G., 105 Ross, S. M., 561 Rota, G. C.. 343 Royden, H. L.. 93, 228, 230, 501, 613 Scarf, H., 562 Sen, K., 518 Shannon, C. E., 231 Shreve. S. E., 623 Siegmund, D.. 562 Skorokhod, A. V., 356 Snell. J. L.. 366 Spitzer, F.. 68. 73, 190, 227, 343, 366 Stroock, D. W., 502. 505, 517, 621 Sudbury, A., 366 Taqqu, M., 356 Taylor, G. 1., 518 Taylor, H. M.. 504 Thorisson. H., 223 Tidmore, F. E., 65 Tong, H., 228 Trudinger, N. S., 516. 517, 622 Turner, D. W., 65 Tweedie, R. L.. 229 Van Ness, J. W.. 107 Varadhan, S. R. S., 502, 505, 517.621 Watanabe, S., 623 Waymire, E., 106, 259, 362, 356, 518 Weaver, W., 231 Weisshaar, A., 216 Wijsman, R. A.. 503 Williams, D., 105 Williams, T., 366
Parthasarathy, K. R., 230 Rachev, S. T., 482 Radner. R.. 228, 562 Ramasubramanian, S., 516
Yahay. J. A., 228 Young. D. M., 65 Yosida, K., 500 Yushkevich, A. A., 516
Subject Index
Absolutely continuous, 639 Absorbing boundary: birth-death chain, 237, 309 branching, 158, 160, 321 Brownian motion, 412, 414, 451, 454 diffusion, 402-407, 448-460 Markov chain. 151. 318 random walk, 114, 122 Accessible boundary, 393, 499, 615. See also Explosion Actuary science: collective risk, 78 contagion, accident proneness, 116, 273 After-tprocess, 118.425 After-ssigmafield, 425 Alexandrov theorem, 96 Annihilating random walk. 330 Arcsine law, 34, 35, 84 AR(p), ARMA (p,q), 170, 172 Artificial intelligence, 252 Arzela-Ascoli theorem, 97 Asymptotic equality (-), 55 Autoregressive model, 166. 170 Backward equation. See also Boundary conditions branching, 283.322 Brownian motion, 78, 410. 412, 452, 467 diffusion, 374. 395, 442, 462, 463, 464 Markov chain, 266, 335 Backward induction, 523 Backward recursion, 522. 543 Bellman equation, see Dynamic programming equation Bernoulli-Laplace model, 233, 240 Be rt rands ballot problem, 70 Bessel process, see Brownian motion Biomolecular fixation, 319, 321
Birkhoffs ergodic theorem, 224 Birth-collapse model, 203 Blackwell's renewal theorem, 220 Bochner s theorem, 663 Borel-Cantelli lemma, 647 Borel sigmafield: of CI0.cI, CI0.l . 23.93 of metric space, 627 of l8', 68' .626 Boundary conditions, see also Absorbing boundary, Reflecting boundary backward, 396, 401.402, 452 forward,398 mixed, 407 nonzero values, 485, 489 time dependent, 485 Branching process: Bienayme-Galton-Watson branching, 115, 158.281 extinction probability, 160.283 Lamperti's maximal branching, 207 time to extinction, 285 Brownian bridge. 36, 103 Brownian excursion, 85. 104, 288, 362, 505 Brownian meander, 85, 104, 505 Brownian motion. 18, 350, 390,441 constructions, 93, 99, 105 max/min, 79.414.431, 486, 487, 490 scaling property, 76 standard, 18.441 tied-down, see Brownian bridge Cameron-Martin-Girsanov theorem, 616, 618 Canonical construction, 15. 166 Canonical sample space, 2 Cantor set. 64 Cantor ternary function, 64
Capital accumulation model, 178
Carathéodory extension theorem, 626
Card shuffles, 191, 194
Cauchy-Schwarz inequality, 630
Central limit theorem (CLT):
  classical, 652
  continuous parameter Markov process, 307, 513
  diffusion, 438-441
  discrete parameter Markov process, 150, 227, 513
  functional (FCLT), 23, 99
  Liapounov, 652
  Lindeberg, 649
  Martingale, 503-511
  multivariate classical, 653
Chapman-Kolmogorov equations, see Semigroup property
Characteristic function, 662
Chebyshev inequality, 630
Chemical reaction kinetics, 310
Choquet-Deny theorem, 366
Chung-Fuchs recurrence criterion, 92
Circular Brownian motion, 400
Collocation polynomial, 206
Communicate, 121, 123, 303
Completely deterministic motion, 113
Completely random motion, 113
Compound Poisson process, 262, 267, 292
Compression coefficient, 185
Conditional expectation, 639
Conditional probability, 639
Co-normal reflection, 466
Conservative process:
  diffusion, see Explosion
  Markov chain, 289
Continuity equation, 381
Continuous in probability, see Stochastic continuity
Control:
  diffusion, 533-542
  feasible, 533
  feedback, 533
  optimal, 533
Convergence:
  almost everywhere (a.e.), 631
  almost surely (a.s.), 631
  in distribution (in law), see Weak convergence
  in measure, 631
Convex function, 159, 628
Convolution, 115, 263, 291, 662, 663
Correlation length, see Relaxation time
Coupling, 197, 221, 351, 364, 506
Coupon collector problem, 194
Cox process, 333
Cramér-Lévy continuity theorem, 664
Detailed balance principle, see Time-reversible Markov process
Diffusion, 368. See also Boundary conditions
  multidimensional, 442
  nonhomogeneous, 370, 610
  singular, 591
Diffusion coefficient, 18, 19, 442
Dirichlet boundary condition, 404
Dirichlet problem, 454, 495, 606
Dispersion matrix, see Diffusion coefficient
Distribution function (c.d.f. or d.f.), 627
Doeblin condition, 215
Doeblin coupling, see Coupling
Dominated convergence theorem, 633
Domination inequality, 628
Donsker's invariance principle, see Central limit theorem
Doob's maximal inequality, 49
Doob-Meyer decomposition, 89
Dorfman's blood test problem, 64
Doubly stochastic, 199
Doubly stochastic Poisson, see Cox process
Drift coefficient, 18, 19, 442
Duhamel principle, 479, 485
Dynamic inventory problem, 551
Dynamic programming equation, 527, 528
Dynamic programming principle, 526
  discounted, 527
Dynkin's Pi-Lambda theorem, 637
Eigenfunction expansion, 409, 505
Empirical process, 36, 38
Entrance boundary, 503
Entropy, 186, 213, 348
Ergodic decomposition, 230
Ergodic maximal inequality, 225
Ergodic theorem, see Birkhoff's ergodic theorem
Escape probability:
  birth-death, 235
  Brownian motion, 25, 27
  diffusion, 377, 428
  random walk, 6
Essential, 121, 303
Event, 625
  finite dimensional, 2, 62
  infinite dimensional, 2, 3, 23-24, 62
Exchangeable process, 116
Exit boundary, 504
Expected value, 628
Explosion:
  diffusion, 612
  Feller's test, 614
  Has'minskii's test, 615
  Markov chain, 288
  pure birth, 289
Exponential martingale, 88
Factorial moments, 156
Fatou's lemma, 633
Feller property, 332, 360, 505, 591
Feynman-Kac formula, 478, 606
Field, 626
First passage time:
  Brownian motion, 32, 429, 430
  diffusion, 491
  distribution, 430
  random walk, 11
First passage time process, 429
Flux, 381
Fokker-Planck equation, 78. See also Forward equation
Forward equation, see also Boundary conditions
  branching, 285
  Brownian motion, 78, 380, 390
  diffusion, 379, 405, 442
  Markov chain, 266, 335
Fourier coefficients, 653, 658, 663
Fourier inversion formula, 661, 664
Fourier series, 654
Fourier transform, 660, 662, 664
Fractional Brownian motion, 356
Fubini's theorem, 637
Functional central limit theorem (FCLT), see Central limit theorem
Gambler's ruin, 67, 70, 208
Gaussian pair correlation formula, 77
Gaussian process, 16
Geometric Brownian motion, 384
Geometric Ornstein-Uhlenbeck, 384
Gibbs distribution, 162
Glivenko-Cantelli lemma, 39, 86
Gnedenko-Koroljuk formula, 87
Gradient (grad), 464, 466
Green's function, 501
Harmonic function, 68, 329, 364, 366, 495
Harmonic measure, 454. See also Hitting distributions
Harmonic oscillator, 257
Harnack's inequality, 622
Herglotz's theorem, 658
Hermite polynomials, 487
Hewitt-Savage zero-one law, 91
Hille-Yosida theorem, 500
Hitting distributions, 454
Hölder inequality, 630
Horizon, finite or infinite, 519
Hurst exponent, 56
i.i.d., 3
Inaccessible, 393, 422, 503
Independent:
  events, 638
  random variables, 638
  sigmafields, 638
Independent increments, 18, 262, 349, 429
Inessential, 121, 303
Infinitely divisible distribution, 349
Infinitesimal generator, 372
  compound Poisson, 334
  diffusion, 373, 500
  Markov chain, 266
  multidimensional diffusion, 443, 604
  radial Brownian motion, 446
Initial value problem, 477, 604
  with boundary conditions, 467, 485, 500
Invariance principle, 24. See also Central limit theorem
Invariant distribution:
  birth-death chain, 238
  diffusion, 432, 433, 483, 593, 621, 623
  Markov chain, 129, 307, 308
Invariant measure, 204
Inventory control problem, 557
Irreducible transition probability, 123
Irreversible process, 248
Ising model, 203
Iterated averaging, 204
Iterated random maps, 174, 228
Itô integral, 564, 570, 577, 607
Itô's lemma, 585, 611
Itô's stochastic integral equation, 571, 575
  multidimensional, 580
Jacobi identity, 487
Jensen's inequality, 629
  conditional, 642
K-convex, 552
Killing, 477
Kirchhoff circuit laws, 344, 347
Kolmogorov's backward equation, see Backward equation
Kolmogorov's consistency theorem, see Kolmogorov's existence theorem
Kolmogorov's existence theorem, 16, 73, 92, 93
Kolmogorov's extension theorem, see Kolmogorov's existence theorem
Kolmogorov's forward equation, see Forward equation
Kolmogorov's inequality, 51
Kolmogorov-Smirnov statistic, 38, 103
Kolmogorov zero-one law, 90
Kronecker's delta, 265
L^p convergence criterion, 634, 635
L^p maximal inequality, 602, 603
L^p space, 633
Lagrange's method of characteristics, 312
Laplacian, 454
Law of diminishing returns, 178
Law of large numbers:
  classical, 646
  diffusion, 432
  Markov chain, 145, 307
Law of proportionate effect, 78, 480
Learning model, see Artificial intelligence
Lebesgue dominated convergence theorem, see Dominated convergence theorem
Lebesgue integral, 627
Lebesgue measure, 626
Lévy-Khinchine representation, 350, 354
Lévy process, 350
Lévy stable process, 355
Liapounov's inequality, 629
Linear space, normed, 633
Liouville theorem, 258
Lipschitzian, local, global, 612
m-step transition probability, 113
Main channel length, 285
Mann-Wald theorem, 102
Markov branching, see Branching process, Bienaymé-Galton-Watson branching
Markov chain, 17, 109, 110, 111, 214
Markov property, 109, 110
  for function of Markov process, 400, 448, 483, 502, 518
Markov random field, 166
Markov time, see Stopping time
Martingale, Martingale difference, 43, 45
Mass conservation, 311, 381
Maximal inequality:
  Doob, 49
  Kolmogorov, 51
  L^p, 602, 603
Maximal invariant, 502
  diffusion on torus, 518
  radial diffusion, 518
Maximum principle, 365, 454, 485, 495, 516, 621
Maxwell-Boltzmann distribution, 63, 393, 481
Measure, 626
Measure determining class, 101
Measure preserving transformation, 227
Method of images, 394, 484, 491. See also Reflecting boundary, diffusion
Minimal process:
  diffusion, 613
  Markov chain, 288
Minkowski inequality, 630
Monotone convergence theorem, 632
Mutually singular, 63
Natural scale:
  birth-death chain, 249
  diffusion, 416, 479
Negative definite, 607
Neumann boundary condition, 404, 408
Newton's law in Hamiltonian form, 257
Newton's law of cooling, 247
Noiseless coding theorem, 213
Nonanticipative functional, 564, 567, 577, 579
Nonhomogeneous:
  diffusion, 370, 610
  Markov chain, 263, 335, 336
  Markov process, 506
Nonnegative definite, 17, 74, 658
Normal reflection, 460
Null recurrent:
  diffusion, 423
  Markov chain, 144, 306
Occupation time, 35
Optional stopping, 52, 544
Orbit, 502
Ornstein-Uhlenbeck process, 369, 391, 476, 580
Past up to time τ, 118, 424
Percolation construction, 325, 363
Periodic boundary, 256, 398, 400, 493, 518
Period of state, 123
Perron-Frobenius theorem, 130
p-harmonic, 153
Plancherel identity, 662
Poincaré recurrence, 248, 258
Poisson kernel, 454
Poisson process, 103, 261, 270, 279, 280, 351
Poisson's equation, 496
Policy:
  ε-optimal, 526
  feasible, 519
  Markov, 521
  optimal, 520
  semi-Markov, 561
  stationary, 527
  (s, S), 550
Pólya characteristic function criterion, 74
Pólya recurrence criterion, 12
Pólya urn scheme, 115, 336
Positive definite, 17, 74, 356, 658. See also Nonnegative definite
Positive recurrent:
  birth-death, 240
  diffusion, 423, 586, 621
  Markov chain, 141, 306
Pre-τ sigmafield, 119, 424
Price adjustment model, 436
Probability density function (p.d.f.), 626
Probability generating function (p.g.f.), 159
Probability mass function (p.m.f.), 626
Probability measure, 625
Probability space, 625
Process stopped at τ, 424
Product measure space, 636
Products of random variables, 68
Progressively measurable, 497
Prohorov theorem, 97
Pseudo random number generator, 193
Pure birth process, 289
Pure death process, 338, 348
Quadratic variation, 76
Queue, 193, 295, 296, 309, 345
r-th order Markov dependence, 189
r-th passage time, 39, 40, 45
Radon-Nikodym theorem, 639
Random field, 1
Random replication model, 155
Random rounding model, 137, 216
Random variable, 627
Random walk:
  continuous parameter, 263, 317
  general, 114
  max/min, 6, 66, 70, 83
  multidimensional, 11
  simple, 4, 114, 122
Random walk bridge, 85
Random walk on group, 72, 191, 199
Random walk in random scenery, 190
Random walk on tree, 200
Range of random walk, 67
Record value, time, 195
Recurrence:
  birth-death chain, 236
  Brownian motion, 25, 447
  diffusion, 419, 458
  Markov chain, 305
  random walk, 8, 12
Reflecting boundary:
  birth-death chain, 237, 238, 241, 308, 315
  Brownian motion, 410, 411, 412, 460, 467
  diffusion, 395-402, 463, 466
  random walk, 114, 122, 243, 246
Reflection principle:
  Brownian motion, 430
  diffusion, see Method of images
  random walk, 10, 70
Regular conditional probability, 642
Relaxation time, 129, 256, 347
Renewal age process, 342
Renewal process, 193, 217
Renewal theorem, see Blackwell's renewal theorem
Residual life time process, 342
Resolvent:
  equation, 338, 559
  operator, 501
  semigroup, 360
Reward function, 520
  discounted, 526
Reward rate, 539
Riemann-Lebesgue lemma, 660
Sample path, 2, 111, 354, 358
Scale function, 416, 499
Scaling, 76, 82, 355
Scheffé's theorem, 635
Schwarz inequality, see Cauchy-Schwarz inequality
Secretary problem, 545
Selfsimilar, 356
Semigroup property, 264, 282, 372, 504, 604
Semistable, 356
Separable process, 94
Shannon's entropy, see Entropy
Shifted process, 403
Sigmafield, 625, 627
Sigmafinite, 626
Signed measure, 639
Simple function, 627
Singular diffusion, see Diffusion
Size of industry model, 437
Slowly varying, 361
Source/sinks, 478
Spectral radius, 130, 197
Spectral representation. See also Eigenfunction expansion
  birth-death chain, 242
  diffusion, 410
Speed measure, 416, 499
Stable process, 355
Stationary process, 129, 223, 346
Statistical hypothesis test, 39
Stirling coefficient, 157
Stirling formula, 11, 67
Stochastic continuity, 350, 360
Stochastic differential equation, 563, 583, 592
Stochastic matrix, 110
Stopping time, 39, 51, 117, 277, 424
Strong law of large numbers (S.L.L.N.), see Law of large numbers
Strong Markov property, 118, 277, 426, 491
Strong uniform time, 194
Submartingale convergence theorem, 358
Sub/supermartingale, 88, 89
Support theorem, 619
Survival probability of economic agent, 182
Survival under uncertainty, 535
Symmetric dependence, see Exchangeable process
Tauberian theorem, 361
Taxicab problem, 558
Telegrapher equation, 299, 343
Theta function, see Jacobi identity
Thinned Poisson process, 343
Tightness, 23, 646
Time dependent boundary, 485
Time-reversible Markov process, 200, 241, 254, 346, 478
Time series, linear, 168
  AR(p), 166, 170
  ARMA(p,q), 172
Traffic intensity of queue, 345
Transience:
  birth-death, 237
  Brownian motion, 27, 447
  diffusion, 419, 458
  Markov chain, 305
  random walk, 7
Transition probability, density, law, 110, 123, 368
Tree graph, 200
Umbrella problem, 199
Uniform asymptotic negligibility, 652
Uniform integrability, 634
Upcrossing inequality, 357, 358
Wald's equation, 42
Weak convergence, 24, 96, 97, 643
Weierstrass approximation theorem, 653
Wiener measure, 19, 105
Yule process, 293
Zero seeking particles, 348
Errata
• p. 12, (5.3): Replace n by k on right side of (5.3).
• p. 30, line 8 from top: f_σ(t) should be f_{σ²}(t).
• p. 40, line 15 from bottom: ES_T = ES_0.
• p. 53, line 8 from top: T_r should be T_{(r)}.
• p. 69, Exc. 1(iv): q − p⁻¹ should be (q − p)⁻¹.
• p. 70, Exc. 4: In 4(i) replace the equality by the inequality (Lévy's inequality) P(max_{n≤N} S_n ≥ y) ≤ 2P(S_N ≥ y); the displayed formula should read P(T_y = N) = (|y|/N) C(N, (N+|y|)/2) 2^{−N} for y ≠ 0 and N = |y| + 2m, m = 0, 1, … .
• p. 88, Exc. 7: "symmetric" should be "asymmetric".
• p. 88, Exc. 12(ii), (iii): Assume EX_n = 0 for all n.
• p. 91, proof of Theorem T.1.2: P(A_m ∩ A_n) should be P(A_1 ∩ A_n).
• p. 93, line 6 from top: Delete p in µ. Should read µ(dx_2 ⋯ dx_i).
• p. 123, line 14 from bottom: Insert "> 0" (A = {n ≥ 1 : p_{ii}^{(n)} > 0}).
• p. 127, (6.7): Superscript N + 1 on the left side should be n + 1.
• p. 130, line 14 from bottom, should read: spectral radius is the largest magnitude of eigenvalues of A.
• p. 135, line below (8.2): "return" times should be "passage" times.
• p. 137, line 13 from top: The reference should be to Exercise 6 (not Exercise 7).
• p. 137, (8.15): Should be (2p − 1) in place of (1 − 2p).
• p. 140, (9.8): First sum should start with m = n + 1.
• p. 150, (10.11): Replace X_{[nt]+1} by f(X_{[nt]+1}).
• p. 173, line before (13.35): Replace "p-th row" by "last row".
• p. 194, Exc. 4(iv), should read: Z_n = X…, n … .
• p. 202, Exc. 1(ii): Insert n in hint: P(|Y_n| > nε i.o.).
• p. 210, line 7 from bottom: εE|Z_1|⁻¹ should be E|Z_1| (delete ε and −1).
• p. 225, line 2 from bottom: Replace T.9.1 by T.9.2.
• p. 234, 1st line of (1.2), should read: p_{i,i+1} = (w + r − i)(2r − i)/… .
• p. 234, 2nd line of (1.2), should read: … i(2r − i) … (w − r + i)(w + r − i) … .
• p. 236, (2.16): Last term should be (1 − δ_y − β_y)p_{yy} (i.e., missing factor p_{yy}).
• p. 239, (3.6): Right side of 2nd equation in (3.6) should be π_k in place of j.
• p. 240, 6 lines from bottom: Replace first denominator (2 − r + j) in (3.14) by (w − r + j).
• p. 256, Exc. 4, should read: α_1 = e^{−λ_1} is the largest nontrivial (i.e., ≠ 1) eigenvalue of p; φ_1 is the eigenvector corresponding to α_1.
• p. 262, 2nd line of (1.4), should read: j < i.
• p. 278, lines 7, 8 from top: X_{T_1} should be X_{T_2}, and X_{T_n} should be X_{T_{n+1}}.
• p. 279, line 4 from top: Insert at the end: on the set {Y_k = i_k : k > 0}.
• p. 280, 2nd line of Proposition 5.6: Replace {Y_t} by {X_t}.
• p. 285, line 4 from bottom: The reference should be to Example 8.3 (not Example 8.2).
• p. 290, line 7 from top: A_{n+1} should be A_n + 1.
• p. 293, 3rd line of (7.2): Capital T in center of inequality should be lowercase t.
• p. 302, line 6 from top: {V_t : 0 < u < t} should be {V_u : 0 < u < t}.
• p. 303, line 5 from top: o(T) should be o(1).
• p. 307, (8.15): The right side should be divided by λ².
• p. 307, 3rd line after (8.15): The right side of the display should be divided by λ.
• p. 309, 12 lines from top: The second and third sums in display (8.26) are … .
• p. 309, (8.27): Delete (α + jβ)/(α + Nβ) wherever it appears in display (8.27).
• p. 325, (11.2): First sum is over n ∈ ℤ^d.
• p. 326, Figure 11.1: (σ_0(0)) should be (σ_n(0)).
• p. 329, (11.10): Insert d: should be 2dp_{mn} in second line of display (11.10).
• p. 333, Exc. 3: p(u) > 0.
• p. 334, line 1 from top: Replace "positive" by "nonzero".
• p. 357, lines 3, 4 above (T.5.1): T_k = N should be T_{2k} = N and X_{T_k} = N should be X_{2k} = N.
• p. 357, (T.5.3) and 2nd line from bottom: Replace the upper limit of the sum by [N/2] + 1 (not [N/2]).
• p. 363, line 6 from bottom: Replace (S, 𝒮) by (E, 𝒢).
• p. 363, line 5 from bottom: Replace ℱ by 𝒢.
• p. 364, line 8 from top: Replace ℱ by 𝒢.
• p. 364, line 14 from bottom, should read: to y in the metric ρ as k → ∞, then σ^{(n_k)}_{m(n,t)}(t) = … = σ^{(n)}_{m(n,t)}(t) a.s. as k → ∞.
• p. 364, line 12 from bottom, display should read: 1 − 2P_{μ_n}(σ^{(n)}(t) = −1) = Eσ^{(n)}(t) → Eσ(t) = 1 − 2P_μ(σ(t) = −1).
1 • p. 590, line 9 from top: P(IXt - xJ > e). • p. 597, (4.37): Insert one prime on the right side: aa'. • p. 598, (4.39): Insert one prime on the right side: aa'. • p. 602, Exc. 8(i): In the last inequality of the Hint, transpose E and the first bracket {. • p. 607, line 4: Insert factor eye in cp(s, y) = u(t - s, yl)eh/ 2 . • p. 617, (T.3.30): Insert a'(Xt) after Z°'' in the dB t term on the right side of last line of (T.3.30) and close parenthesis before dBt. Last line of (T.3.30) should thus read: A9(Xi )Z°'"dt + Z°'^ (a (Xi) grad g(Xe) + g(Xe ).f (X t )) • dBt.
• p. 618, lines 14, 15 from the bottom, should read: Then P and Q are absolutely continuous with respect to each other on (Ω, ℱ_t), with dQ/dP = Z_t^{α,g}.
• p. 619, (T.3.36): c(s) should be replaced by µ(X_s).
• p. 620, line 4 from top: Insert (r): should read β(r) := .
• p. 626, line 17 from top: I = [a, b).
• p. 640, Theorem 4.4(f): Delete the last "a.s. and".
• p. 648, lines 11, 12 from bottom: k = 2α/(α − 1) in line 11, and a should be α in line 12.
• p. 651, 4 lines from bottom, should read: where c = 2‖f″‖ E|Z_1|³ in (8.4) on the right; missing factor 2π on the left in (8.5).
• p. 654: Missing factor … .
• p. 668, line 15 from top: Martingale pages are 507-515 (not 503-511).
• p. 670, line 5 from top, should read: Kolmogorov forward equation, 78, 266, 335, 379, 380, 390, 405, 442, 477, 478, 481, 492. See also Forward equation.
• p. 670, line 15 from bottom: Ornstein-Uhlenbeck pages are 369, 391, 476, 480, 581, 598 (not 580).