Stochastic processes without measure theory
Byron Schmuland
I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, neither yet bread to the wise, nor yet riches to men of understanding, nor yet favour to men of skill; but time and chance happeneth to them all. Ecclesiastes 9:11.
Contents
1. Finite Markov chains
   A. Basic definitions
   B. Calculating probabilities
   C. Invariant probabilities
   D. Classification of states
   E. Hitting times
2. Countable Markov chains
   A. Recurrence and transience
   B. Difference equations and random walks on Z
   C. The general result for random walks
   D. Two types of recurrence
   E. Branching processes
3. Optimal stopping
   A. Strategies for winning
   B. Examples
   C. Algorithms to find optimal strategies
   D. Binomial pricing model
4. Martingales
   A. Conditional expectation
   B. Martingales
   C. Optional sampling theorem
   D. Martingale convergence theorem
5. Brownian motion
   A. Basic properties
   B. Reflection principle
   C. Dirichlet problem
6. Stochastic integration
   A. Integration with respect to random walk
   B. Integration with respect to Brownian motion
   C. Ito's formula
1. Finite Markov Chains
A. Basic definitions
Let (Xn)n≥0 be a stochastic process taking values in a state space S that has N states. To understand the behaviour of this process we will need to calculate probabilities like

P(X0 = i0, X1 = i1, . . . , Xn = in).   (1.1)

This can be computed by multiplying conditional probabilities as follows:

P(X0 = i0) P(X1 = i1 | X0 = i0) P(X2 = i2 | X1 = i1, X0 = i0) × · · · × P(Xn = in | Xn−1 = in−1, . . . , X1 = i1, X0 = i0).
Example A-1. We randomly select playing cards from an ordinary deck. The state space is S = {Red, Black}. Let's calculate the chance of observing the sequence RRB using two different sampling methods.

(a) Without replacement:

P(X0 = R, X1 = R, X2 = B) = P(X0 = R) P(X1 = R | X0 = R) P(X2 = B | X1 = R, X0 = R)
                          = (26/52)(25/51)(26/50) ≈ .12745.

(b) With replacement:

P(X0 = R, X1 = R, X2 = B) = P(X0 = R) P(X1 = R | X0 = R) P(X2 = B | X1 = R, X0 = R)
                          = (26/52)(26/52)(26/52) = .12500.
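These products are easy to check by machine. Here is a small Python sketch (my addition, not part of the original notes) that multiplies the conditional probabilities exactly:

```python
from fractions import Fraction as F

# (a) Without replacement: the deck changes after each draw.
no_repl = F(26, 52) * F(25, 51) * F(26, 50)

# (b) With replacement: the three draws are independent.
with_repl = F(26, 52) ** 3

print(float(no_repl))    # 0.12745...
print(float(with_repl))  # 0.125
```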
Definition. The process (Xn)n≥0 is called a Markov chain if for any n and any collection of states i0, i1, . . . , in+1 we have

P(Xn+1 = in+1 | Xn = in, . . . , X1 = i1, X0 = i0) = P(Xn+1 = in+1 | Xn = in).

For a Markov chain the future depends only on the current state and not on past history.
Exercise. In Example A-1, calculate P(X2 = B | X1 = R) and confirm that only "with replacement" do we get a Markov chain.

Definition. A Markov chain (Xn)n≥0 is called time homogeneous if, for any i, j ∈ S, we have P(Xn+1 = j | Xn = i) = p(i, j) for some function p : S × S → [0, 1]. From now on, we will assume that all our Markov chains are time homogeneous.

The probabilities (1.1) for a Markov chain are computed using the initial probabilities φ(i) = P(X0 = i) and the transition probabilities p(i, j):

P(X0 = i0, X1 = i1, . . . , Xn = in) = φ(i0) p(i0, i1) p(i1, i2) · · · p(in−1, in).

We will often be interested in probabilities conditional on the starting position, and will write Pi(A) instead of P(A | X0 = i). In a similar vein, we will write conditional expected values as Ei(X) instead of E(X | X0 = i).

Definition. The transition matrix for the Markov chain (Xn)n≥0 is the N × N matrix P whose (i, j)th entry is p(i, j).
Example. Card color with replacement.

          B    R
P = B [ 1/2  1/2 ]
    R [ 1/2  1/2 ]
Every transition matrix satisfies 0 ≤ p(i, j) ≤ 1 and Σ_{j∈S} p(i, j) = 1. Such matrices are also called stochastic.
Example A-2. Two state Markov chain. Imagine an office where there is only one telephone, so that at any time the phone is either free (0) or busy (1). During each time period p is the chance that a call comes in, while q is the chance that a call is completed. A sample history:

0 0 0 1 1 1 0 · · ·

In the original picture, two arrows above the sequence mark incoming calls and an arrow below marks the completed call. Note that the second incoming call was lost since the phone was already busy when it came in.

         0     1
P = 0 [ 1−p    p  ]
    1 [  q    1−q ]
Example A-3. A simple queueing system. Now imagine that the office got a second telephone, so that the number of busy phones can be 0, 1, or 2. During each time period p is the chance that a call comes in, while q is the chance that a call is completed.
A sample history: 0 0 0 1 1 2 1 · · ·. Again, arrows mark the incoming calls and the completed call; this time the second incoming call is answered on the second phone.

         0            1                    2
P = 0 [ 1−p           p                    0     ]
    1 [ (1−p)q    (1−p)(1−q) + pq       p(1−q)   ]
    2 [  0            q                   1−q    ]
Example A-4. Random walk with reflecting boundaries. The state space is S = {0, 1, . . . , N}, and at each time the walker jumps to the right with probability p, or to the left with probability 1 − p.
[Figure: the states 0, 1, . . . , N in a row; from each interior state j the walker moves to j + 1 with probability p and to j − 1 with probability 1 − p, while the boundary states 0 and N reflect with probability 1.]
That is, p(j, j + 1) = p and p(j, j − 1) = 1 − p, for j = 1, . . . , N − 1. The boundary conditions are p(0, 1) = 1 and p(N, N − 1) = 1, and p(i, j) = 0 otherwise. In the case p = 1/2, we call this a symmetric random walk.
Example A-5. Random walk with absorbing boundaries. As above except with boundary conditions p(0, 0) = 1 and p(N, N ) = 1.
Example A-6. Symmetric vs. random walk with drift. The following pictures show some sample paths of the first 10000 steps of a gambler playing red-black on a roulette wheel. The symmetric walk (p = 1/2) is the idealized situation of a fair game, while the walk with drift (p = 9/19) shows the real situation where the casino takes advantage of the green spots on the wheel. The central limit theorem can help us understand the result after 10000 plays of the game. In the symmetric case, it is a tossup whether you are ahead or behind. In the real situation, however, the laws of probability have practically guaranteed a nice profit for the casino.

[Figure: "1d Random Walk" — 10000 steps, 3 sample paths, fluctuating around 0.]
[Figure: "Biased 1d Random Walk" — 10000 steps, 3 sample paths, drifting down toward the expected loss of −10000/19 ≈ −526.3.]
Example A-7. A genetics model. Imagine a population of fixed size N that has two types of individuals, blue and red. At each time t, every individual gives birth to one offspring whose genetic type (blue or red) is randomly chosen according to the empirical distribution of the current generation. The old generation dies, and the new generation takes its place, keeping the population size fixed at N. This process repeats itself in every generation. The state space of the process is the set of all probability measures on the type space {blue, red}. The picture below shows a typical sample path for the process. Here N = 400 and X0 = (1/2, 1/2). Notice that as time goes on, one of the genetic types starts to dominate and will eventually become fixed. Once the process hits one of the absorbing states, it is stuck and the population stops evolving.
[Figure: six snapshots of the genetics model population at times 0, 1, 2, 100, 200, 300.]
B. Calculating probabilities
Consider a two state Markov chain with p = 1/4 and q = 1/6, so that

         0     1
P = 0 [ 3/4   1/4 ]
    1 [ 1/6   5/6 ]
To find the probability that the process follows a certain path, you multiply the initial probability with conditional probabilities. For example, what is the chance that the process begins with 01010?
P (X0 = 0, X1 = 1, X2 = 0, X3 = 1, X4 = 0) = φ(0) p(0, 1) p(1, 0) p(0, 1) p(1, 0) = φ(0) × 1/576.
As a second example, let's find the chance that the process begins with 00000.

P(X0 = 0, X1 = 0, X2 = 0, X3 = 0, X4 = 0) = φ(0) p(0, 0) p(0, 0) p(0, 0) p(0, 0) = φ(0) × 81/256.

If (as in many situations) we were interested in conditional probabilities, given that X0 = 0, we simply drop φ(0), that is,

P(X1 = 1, X2 = 0, X3 = 1, X4 = 0 | X0 = 0) = 1/576 = .00174.   (1.2)

P(X1 = 0, X2 = 0, X3 = 0, X4 = 0 | X0 = 0) = 81/256 = .31641.   (1.3)

Here's a harder problem: suppose we want to find P(X4 = 0 | X0 = 0). Instead of one path, this includes all possible paths that start in state 0 at time 0, and find themselves in state 0 again at time 4. Then P(X4 = 0 | X0 = 0) is the sum of (1.2) and (1.3) above, plus six others. (Why six?) Luckily there is an easier way to calculate such probabilities.
Theorem. The conditional probability P(Xn = j | X0 = i) is the (i, j)th entry in the matrix Pⁿ.
In the problem above, matrix multiplication gives

          0            1
P⁴ = 0 [ 3245/6912   3667/6912  ]
     1 [ 3667/10368  6701/10368 ]

so that P(X4 = 0 | X0 = 0) = 3245/6912 = .46947.
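If you would rather let software do the multiplication, a minimal numpy sketch (my addition) reproduces this entry:

```python
import numpy as np

P = np.array([[3/4, 1/4],
              [1/6, 5/6]])

P4 = np.linalg.matrix_power(P, 4)  # 4-step transition probabilities
print(P4[0, 0])                    # 0.46947... = 3245/6912
```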
Probability vectors. Let φ0 = (φ0(1), . . . , φ0(i), . . . , φ0(N)) be the 1 × N row vector whose ith entry is P(X0 = i). This vector gives the distribution
of the random variable X0. If we multiply the N × N matrix Pⁿ on the left by φ0, then we get a new row vector:

φ0 Pⁿ = (φ0(1), . . . , φ0(N)) [ pn(1, 1) · · · pn(1, N) ]
                               [    ⋮       ⋱      ⋮    ]
                               [ pn(N, 1) · · · pn(N, N) ]

      = ( Σ_{i∈S} φ0(i) pn(i, 1), . . . , Σ_{i∈S} φ0(i) pn(i, j), . . . , Σ_{i∈S} φ0(i) pn(i, N) ).

But for each j ∈ S,

Σ_{i∈S} φ0(i) pn(i, j) = Σ_{i∈S} P(X0 = i) P(Xn = j | X0 = i) = Σ_{i∈S} P(X0 = i, Xn = j) = P(Xn = j).

In other words, the vector φn = φ0 Pⁿ gives the distribution of Xn.
Example. If we start in state zero, then φ0 = (1, 0) and

φ4 = φ0 P⁴ = (1, 0) [ 3245/6912   3667/6912  ] = (3245/6912, 3667/6912) = (.46947, .53053).
                    [ 3667/10368  6701/10368 ]

On the other hand, if we flip a coin to choose the starting position, then φ0 = (1/2, 1/2) and φ4 = (17069/41472, 24403/41472) = (.41158, .58842).
C. Invariant Probabilities
Definition. A probability vector π is called invariant for the Markov chain if π = πP .
Example. Let’s try to find invariant probability vectors for some Markov chains. 1.
Suppose that P =
²
0 1
³ 1 . An invariant probability vector π = 0
(π1 , π2 ) must satisfy (π1 , π2 ) = (π1 , π2 )
²
0 1
1 0
³
,
or, multiplying the right hand side, (π1 , π2 ) = (π2 , π1 ). This equation gives us π1 = π2 , and since π1 + π2 = 1, we conclude that π1 = 1/2 and π2 = 1/2. The unique invariant probability vector for P is π = (1/2, 1/2). ²
1 0 2. If P = 0 1 satisfies π = πP ! 3.
³
is the identity matrix, then any probability vector
Consider a two state Markov chain with ² ³ 3/4 1/4 P = . 1/6 5/6
The first entry of π = πP gives π1 = 3π1 /4 + π2 /6, which implies π1 = 2π2 /3. Using π1 + π2 = 1, we conclude that π = (2/5, 3/5).
Theorem. A probability vector π is invariant if and only if there is a probability vector v such that π = lim_n vPⁿ.

Proof: (⇒) Suppose that π is invariant, and choose φ0 = π. Then φ1 = φ0 P = πP = π. Repeating this argument shows that φn = π for all n ≥ 1. Therefore, π = lim_n πPⁿ. (In this case, we say that the Markov chain is in equilibrium.)

(⇐) Suppose that π = lim_n vPⁿ. Multiply both sides on the right by P to obtain

πP = (lim_n vPⁿ)P = lim_n vPⁿ⁺¹ = π.

This shows that π is invariant. □
Let's investigate the general 2 × 2 matrix

P = [ 1−p   p  ]
    [  q   1−q ]

It has eigenvalues 1 and 1 − (p + q). If p + q > 0, then P can be diagonalized as P = QDQ⁻¹, where

Q = [ 1  −p ],   D = [ 1       0      ],   Q⁻¹ = [  q/(p+q)   p/(p+q) ]
    [ 1   q ]        [ 0   1 − (p+q)  ]          [ −1/(p+q)   1/(p+q) ]

Using these matrices, it is easy to find powers of the matrix P. For example, P² = (QDQ⁻¹)(QDQ⁻¹) = QD²Q⁻¹. In the same way, for every n ≥ 1 we have

Pⁿ = QDⁿQ⁻¹ = Q [ 1       0        ] Q⁻¹
                [ 0  (1 − (p+q))ⁿ  ]

   = [ q/(p+q)  p/(p+q) ] + (1 − (p+q))ⁿ [  p/(p+q)  −p/(p+q) ]
     [ q/(p+q)  p/(p+q) ]                [ −q/(p+q)   q/(p+q) ]

Now, if 0 < p + q < 2 then (1 − (p + q))ⁿ → 0 as n → ∞ and

Pⁿ → [ q/(p+q)  p/(p+q) ] = [ −π− ]
     [ q/(p+q)  p/(p+q) ]   [ −π− ]

where π = (q/(p + q), p/(p + q)). For any probability vector v we have

lim_n vPⁿ = v lim_n Pⁿ = v [ −π− ] = π.
                           [ −π− ]

This means that π is the unique limiting vector for Pⁿ, and hence the unique invariant probability vector.
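A quick numerical sanity check (my addition, reusing p = 1/4 and q = 1/6 from the earlier example): raising P to a large power should produce two identical rows equal to π = (q/(p+q), p/(p+q)).

```python
import numpy as np

p, q = 1/4, 1/6
P = np.array([[1 - p, p],
              [q, 1 - q]])

print(np.linalg.matrix_power(P, 50))  # both rows approach (0.4, 0.6)
print(q / (p + q), p / (p + q))       # 0.4 0.6
```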
Definition. A Markov chain is called ergodic if Pⁿ converges to a matrix with identical rows π as n → ∞. In that case, π is the unique invariant probability vector.

Theorem. If P is a stochastic matrix with Pⁿ > 0 for some n ≥ 1, then the Markov chain is ergodic.
The next result is valid for any Markov chain, ergodic or not.

Theorem. If P is a stochastic matrix, then (1/n) Σ_{k=1}^n P^k → M. The set of invariant probability vectors is the set of all convex combinations of rows of M.

Proof: We assume the convergence result and prove the second statement. Note that any vector π that is a convex combination of rows of M can be written π = vM for some probability vector v.

(⇒) If π is an invariant probability vector, then π = πP, and hence π = πP^k for every k. Therefore

π = π ( (1/n) Σ_{k=1}^n P^k ) → πM,

which shows that π = πM is a convex combination of the rows of M.

(⇐) Suppose that π = vM for some probability vector v. Then

πP = vMP = lim_n (1/n) v(P² + · · · + Pⁿ⁺¹)
         = lim_n (1/n) v(P + · · · + Pⁿ) + (1/n) v(Pⁿ⁺¹ − P)
         = vM + 0 = π. □
We sketch the argument for the convergence result (1/n) Σ_{k=1}^n P^k → M. The (i, j) entry of the approximating matrix (1/n) Σ_{k=1}^n P^k can be expressed in terms of probability as

( (1/n) Σ_{k=1}^n P^k )_{ij} = (1/n) Σ_{k=1}^n Pi(Xk = j) = (1/n) Σ_{k=1}^n Ei(1(Xk=j)) = Ei( (1/n) Σ_{k=1}^n 1(Xk=j) ).

This is the expected value of the random variable representing the average number of visits the Markov chain makes to state j during the first n time periods. A law of large numbers type result will be used to show why this average converges.

Define the return time of the state j as Tj := inf{k ≥ 1 | Xk = j}. We use the convention that the infimum of the empty set is ∞. There are two possibilities for the sequence 1(Xk=j): if Tj = ∞, then it is just a sequence of zeros, and (1/n) Σ_{k=1}^n 1(Xk=j) = 0. On the other hand, if Tj < ∞, then the history of the process up to Tj is irrelevant and we may just as well start counting visits to j from time Tj. This leads to the equation

mij = Pi(Tj < ∞) mjj.

Putting i = j above, we discover that if Pj(Tj < ∞) < 1, then mjj = 0. Thus mij = 0 for all i ∈ S and hence πj = 0 for any invariant probability vector π.

If Pj(Tj < ∞) = 1, then in fact Ej(Tj) < ∞ (for a finite state space!). Define the random variable Tj^s as the time between the (s − 1)th and sth visit to j. These are independent, identically distributed random variables with the same distribution as Tj; the sequence 1(Xk=j) consists of a block of zeros ending in a 1 for each of Tj^1, Tj^2, . . . . Suppose the (ℓ + 1)th visit to state j occurs at time n. The average number of visits to state j up to time n can be represented as the inverse of the average amount of time between visits. The law of large numbers says that, Pj-almost surely,

(1/n) Σ_{k=1}^n 1(Xk=j) = ℓ / Σ_{k=1}^ℓ Tj^k → 1/Ej(Tj) > 0.

We conclude that (1/n) Σ_{k=1}^n P^k → M, where mij = Pi(Tj < ∞)/Ej(Tj).
Examples of invariant probabilities.

1. The rat. Suppose that a rat wanders aimlessly through the maze pictured below. If the rat always chooses one of the available doors at random, regardless of what's happened in the past, then Xn = the rat's position at time n defines a Markov chain.

[Figure: a four-room maze; rooms 1 and 4 each open onto the other three rooms, while rooms 2 and 3 each open onto rooms 1 and 4 only.]

The transition matrix for this chain is

         1    2    3    4
P = 1 [  0   1/3  1/3  1/3 ]
    2 [ 1/2   0    0   1/2 ]
    3 [ 1/2   0    0   1/2 ]
    4 [ 1/3  1/3  1/3   0  ]

To find the invariant probability vector, we rewrite the equation π = πP as (I − Pᵗ)π = 0, where Pᵗ is the transpose of the matrix P and I is the identity matrix. The usual procedure of row reduction will lead to the answer:

[  1   −1/2  −1/2  −1/3 ]    [ 1  −1/2  −1/2  −1/3 ]    [ 1  0  −3/5  −3/5  ]    [ 1  0  0  −1   ]
[ −1/3   1     0   −1/3 ] →  [ 0   5/6  −1/6  −4/9 ] →  [ 0  1  −1/5  −8/15 ] →  [ 0  1  0  −2/3 ]
[ −1/3   0     1   −1/3 ]    [ 0  −1/6   5/6  −4/9 ]    [ 0  0   4/5  −8/15 ]    [ 0  0  1  −2/3 ]
[ −1/3  −1/2  −1/2   1  ]    [ 0  −2/3  −2/3   8/9 ]    [ 0  0  −4/5   8/15 ]    [ 0  0  0   0   ]

This last matrix tells us that π1 − π4 = 0, π2 − 2π4/3 = 0, and π3 − 2π4/3 = 0; in other words the invariant vector is (π4, 2π4/3, 2π4/3, π4). Because π1 + π2 + π3 + π4 = 1, we need π4 = 3/10, so the unique invariant probability vector is π = (3/10, 2/10, 2/10, 3/10).
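Row reduction is exactly what a linear-algebra package does for us. A short numpy sketch (my addition) that solves π(I − P) = 0 together with the normalization Σ πi = 1 as a least-squares system:

```python
import numpy as np

P = np.array([[0, 1/3, 1/3, 1/3],
              [1/2, 0, 0, 1/2],
              [1/2, 0, 0, 1/2],
              [1/3, 1/3, 1/3, 0]])

n = P.shape[0]
A = np.vstack([(np.eye(n) - P).T,  # the equations pi (I - P) = 0
               np.ones(n)])        # plus the normalization sum(pi) = 1
b = np.zeros(n + 1); b[-1] = 1
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # [0.3 0.2 0.2 0.3]
```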
Here is an example from the final exam in 2004.
[Figure: the states 0, 1, . . . , N in a row; from each interior state the particle jumps left or right with probability 1/2 each, and from each boundary state it jumps to its neighbour with probability α.]
A particle performs a random walk on {0, 1, . . . , N} as drawn above. On the interior, it is a symmetric random walk. From either of the boundary points 0 or N, the particle either jumps to the neighboring state (with probability α) or sits still (with probability 1 − α). Here, α can take any value in [0, 1], so this includes both reflecting and absorbing boundaries, as well as so-called "sticky" boundaries. Find all invariant probability measures for every 0 ≤ α ≤ 1.

Solution: The vector equation π = πP gives

π0 = (1 − α)π0 + π1/2
π1 = απ0 + π2/2
πj = πj−1/2 + πj+1/2,  for j = 2, . . . , N − 2
πN−1 = απN + πN−2/2
πN = (1 − α)πN + πN−1/2,
which can be re-written as

2απ0 = π1
π1 − 2απ0 = π2 − π1
πj − πj−1 = πj+1 − πj,  for j = 2, . . . , N − 2
πN−1 − 2απN = πN−2 − πN−1
2απN = πN−1.
Combining the first two equations shows that π2 = π1, and then the middle set of equations implies that π1 = π2 = π3 = · · · = πN−1. If α > 0, then both π0 and πN equal π1/2α. From Σ_{j=0}^N πj = 1, we get the unique solution

π0 = πN = 1/(2((N − 1)α + 1)),    πj = α/((N − 1)α + 1),  j = 1, . . . , N − 1.
If α = 0, then we find that π1 = π2 = · · · = πN−1 = 0, and π0, πN are any two non-negative numbers that add to 1.

3. Google search engine. One of the primary reasons why Google is such an effective search engine is the PageRank algorithm developed by Google's founders, Larry Page and Sergey Brin, when they were graduate students at Stanford. A brief explanation of PageRank is available at http://www.google.com/technology/index.html

Imagine surfing the Web, randomly choosing an outgoing link from one page to get to the next. This can lead to dead ends at pages with no outgoing links, or cycles around cliques of interconnected pages. So, a certain fraction of the time, simply choose a random page. This theoretical random walk of the Web is a Markov chain. The limiting probability that a random surfer visits any particular page is its PageRank. A page has high rank if it has links to and from other pages with high rank.
To illustrate, imagine a miniature Web consisting of six pages labelled A to F, connected as below.

[Figure: six pages with links A→B, A→C, B→C, B→D, C→D, C→E, C→F, D→A, E→F, F→A.]

The connectivity matrix of this miniweb is:

        A  B  C  D  E  F
    A [ 0  1  1  0  0  0 ]
    B [ 0  0  1  1  0  0 ]
G = C [ 0  0  0  1  1  1 ]
    D [ 1  0  0  0  0  0 ]
    E [ 0  0  0  0  0  1 ]
    F [ 1  0  0  0  0  0 ]
The transition probabilities are pij = p gij / Σ_k gik + (1 − p)/n. With n = 6 and p = .85, we get

P = (1/40) × [  1  18  18    1     1     1  ]
             [  1   1  18   18     1     1  ]
             [  1   1   1  37/3  37/3  37/3 ]
             [ 35   1   1    1     1     1  ]
             [  1   1   1    1     1    35  ]
             [ 35   1   1    1     1     1  ]

Using software to solve the matrix equation π = πP, we get

π = (.2763, .1424, .2030, .1431, .0825, .1526),

so the pages ranked according to their PageRank are A, C, F, D, B, E.
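Here is one way the "using software" step might look in Python (a sketch of my own, not the original computation): build P from the connectivity matrix and power-iterate, since πPⁿ converges to the invariant vector.

```python
import numpy as np

# Connectivity matrix G: rows are "from", columns are "to".
G = np.array([[0,1,1,0,0,0],
              [0,0,1,1,0,0],
              [0,0,0,1,1,1],
              [1,0,0,0,0,0],
              [0,0,0,0,0,1],
              [1,0,0,0,0,0]], dtype=float)

n, p = 6, 0.85
P = p * G / G.sum(axis=1, keepdims=True) + (1 - p) / n

pi = np.full(n, 1 / n)   # start from the uniform distribution
for _ in range(200):     # power iteration: pi <- pi P
    pi = pi @ P
print(np.round(pi, 4))   # [0.2763 0.1424 0.203  0.1431 0.0825 0.1526]
```

(Every page in this miniweb has an outgoing link, so dividing by the row sums of G is safe.)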
D. Classification of states
Definition. A state j ∈ S is called transient if Pj(Tj < ∞) < 1 and called recurrent if Pj(Tj < ∞) = 1.

Note. Let Rj^s be the time of the sth return to state j, so Rj^1 = Tj^1 = Tj. We have

Pj(Rj^s < ∞) = Pj(Tj < ∞) Pj(Rj^{s−1} < ∞),

and by induction we prove that Pj(Rj^s < ∞) = [Pj(Tj < ∞)]^s. Taking the intersection over s, we obtain

Pj( ∩s {Rj^s < ∞} ) = 0 if j is transient, and = 1 if j is recurrent.

The probability of infinitely many visits to state j is either zero or one, according as the state j is transient or recurrent.

Definition. Two states i, j ∈ S are said to communicate if there exist m, n ≥ 0 so that pm(i, j) > 0 and pn(j, i) > 0.

By this definition, every state communicates with itself (reflexive). Also, if i communicates with j, then j communicates with i (symmetric). Finally, if i and j communicate, and j and k communicate, then i and k communicate (transitive). Therefore "communication" is an equivalence relation and we can divide the state space into disjoint sets called communication classes.

If i is transient, then mii = 0 and the equation mji = Pj(Ti < ∞) mii shows that mji = 0 for all j ∈ S. The jth row of the matrix M is invariant for P, and hence for any power of P, so that

0 = mji = Σ_{k∈S} mjk pn(k, i) ≥ mjj pn(j, i).

If i and j communicate, we can choose n so pn(j, i) > 0, and conclude that mjj = 0. This shows that communicating states i and j are either both recurrent, or both transient. All states within each communicating class are of the same type, so we will speak of recurrent or transient classes.
Lemma. If j is recurrent and Pj(Ti < ∞) > 0, then Pi(Tj < ∞) = 1.
In particular, j communicates with i, which means you cannot escape a recurrent class.

Proof. If j is recurrent, the chain makes infinitely many visits to j. Define T = inf{n > Ti | Xn = j} to be the first time the process hits state j after hitting state i. Then

Pj(Ti < ∞) = Pj(Ti < ∞, T < ∞) = Pj(Ti < ∞) Pi(Tj < ∞).

The first equation is true because, starting at j, the process hits j at arbitrarily large times. The second equation comes from applying the Markov property at time Ti. If Pj(Ti < ∞) > 0, then we can divide it out to obtain Pi(Tj < ∞) = 1. □

A general stochastic matrix with recurrent classes R1, . . . , Rr, after a possible reordering of the states, looks like

    [ P1  0  · · ·  0  | 0 ]
    [ 0   P2 · · ·  0  | 0 ]
P = [ ⋮         ⋱      | ⋮ ]
    [ 0   0  · · ·  Pr | 0 ]
    [ ——— S ———        | Q ]

Each recurrent class Rℓ forms a little Markov chain with transition matrix Pℓ. When you take powers you get

     [ P1ⁿ  0   · · ·  0   | 0  ]
     [ 0    P2ⁿ · · ·  0   | 0  ]
Pⁿ = [ ⋮          ⋱        | ⋮  ]
     [ 0    0   · · ·  Prⁿ | 0  ]
     [ ——— Sn ———          | Qⁿ ]
Averaging these from 1 to n, and letting n → ∞ reveals the structure of
the matrix M:

    [ Π1  0  · · ·  0  | 0 ]
    [ 0   Π2 · · ·  0  | 0 ]
M = [ ⋮         ⋱      | ⋮ ]
    [ 0   0  · · ·  Πr | 0 ]
    [ ——— S∞ ———       | 0 ]
If i and j are in the same recurrent class Rℓ, the argument in the Lemma above shows that Pi(Tj < ∞) = 1 and so mij = mjj. That is, the rows of Πℓ are identical and give the unique invariant probability vector for Pℓ.
Examples of classification of states.

1. If

P = [ 0  1 ]
    [ 1  0 ]

then there is only the one recurrent class R1 = {0, 1}. The invariant probability must be unique and have strictly positive entries.

2. If

P = [ 1  0 ]
    [ 0  1 ]

then there are two recurrent classes R1 = {0} and R2 = {1}. The invariant measures are π = a(1, 0) + (1 − a)(0, 1) for 0 ≤ a ≤ 1. That is, all probability vectors!

3. Suppose we have

P = [ 1/2  1/2   0    0    0  ]
    [ 1/6  5/6   0    0    0  ]
    [  0    0   3/4  1/4   0  ]
    [  0    0   1/6  5/6   0  ]
    [ 1/5  1/5  1/5  1/5  1/5 ]

The classes are R1 = {0, 1}, R2 = {2, 3}, and T1 = {4}. The invariant measures are π = a(1/4, 3/4, 0, 0, 0) + (1 − a)(0, 0, 2/5, 3/5, 0) for 0 ≤ a ≤ 1. None of these puts mass on the transient state.
4. Take a random walk with absorbing boundaries at 0 and N. We can reach 0 from any state in the interior, but we can't get back. The interior states must therefore be transient. Each boundary point is recurrent, so R1 = {0}, R2 = {N}, and T1 = {1, 2, . . . , N − 1}, and the invariant probability vectors are π = a(1, 0, 0, . . . , 0, 0) + (1 − a)(0, 0, . . . , 0, 1) = (a, 0, 0, . . . , 0, 1 − a) for 0 ≤ a ≤ 1.
E. Hitting times
Partition the state space S into two pieces D and E. We suppose that for every starting point in D it is possible to reach the set E. We are interested in the transition from D to E.
[Figure: the state space S partitioned into two regions D and E; a path starts at i ∈ D and wanders until it enters E.]

Let Q be the matrix of transition probabilities from the set D into itself, and S the matrix of transition probabilities from D into E. The row sums of (I − Q)⁻¹ give the expected amount of time spent until the chain hits E. The matrix (I − Q)⁻¹S gives the probability distribution of the first state visited in E.
Examples.

1. The rat. Recall the rat in the maze.

[Figure: the four-room maze from before, rooms 1–4.]

         1    2    3    4
P = 1 [  0   1/3  1/3  1/3 ]
    2 [ 1/2   0    0   1/2 ]
    3 [ 1/2   0    0   1/2 ]
    4 [ 1/3  1/3  1/3   0  ]
Here’s a question we can answer using methods from a previous section: Starting in room 1, how long on average does the rat take to return to room 1? We are looking for E1 (T1 ), which is 1/π1 for the unique invariant probability vector. Since π1 = 3/10, the answer is 10/3 = 3.333. Here’s a similar sounding question: Starting in room 1, how long on average does the rat take to enter room 4? For this we need the new results from this section. Divide the state space into two pieces D = f1, 2, 3g and E = f4g. The corresponding matrices are 1 1 0 Q = 2 1/2 3 1/2
2 1/3 0 0
3 1/3 0 0
and
4 1 1/3 S = 2 1/2 . 3 1/2
We calculate 1 2 3 1 1 −1/3 −1/3 1 0 I−Q = 2 −1/2 3 −1/2 0 1
and
(I−Q)−1
1 2 3 1 3/2 1/2 1/2 = 2 3/4 5/4 1/4 . 3 3/4 1/4 5/4
The row sums of (I − Q)−1 are 5/2 E1 (T4 ) E2 (T4 ) = 8/4 , 8/4 E3 (T4 ) and the first entry answers our question: E1 (T4 ) = 5/2.
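The matrix inverse is another easy job for software; a minimal numpy sketch (my addition):

```python
import numpy as np

# Transient block Q over D = {1, 2, 3}; E = {4}.
Q = np.array([[0, 1/3, 1/3],
              [1/2, 0, 0],
              [1/2, 0, 0]])

N = np.linalg.inv(np.eye(3) - Q)  # fundamental matrix (I - Q)^-1
print(N.sum(axis=1))              # expected hitting times: [2.5 2.25 2.25]
```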
2. $100 or bust. Consider a random walk on the graph pictured below. You keep moving until you hit either $100 or ruin. What is the probability that you end up ruined?

[Figure: a five-node graph; the corners include "start" at the bottom left, "$100" at the top right, and "ruin" at the bottom right, with a node in the center.]

E consists of the states $100 and ruin, so the Q and S matrices look like:

Q = [  0   1/3  1/3 ]      S = [ 1/3   0  ]
    [ 1/4   0   1/4 ]          [ 1/4  1/4 ]
    [ 1/3  1/3   0  ]          [  0   1/3 ]

A bit of linear algebra gives

(I − Q)⁻¹ S = [ 11/8  2/3   5/8 ] [ 1/3   0  ]   [ 5/8  3/8 ]
              [ 1/2   4/3   1/2 ] [ 1/4  1/4 ] = [ 1/2  1/2 ]
              [ 5/8   2/3  11/8 ] [  0   1/3 ]   [ 3/8  5/8 ]

Starting from the bottom left hand corner there is a 5/8 chance of being ruined before hitting the money. Hey! Did you notice that if we start in the center, then getting ruined is a 50-50 proposition? Why doesn't this surprise me?
3. The spider and the fly. A spider performs a random walk on the corners of a cube and eventually will catch and eat the stationary (and rather stupid!) fly. How long on average does the hunt last?

[Figure: a cube with the fly at one corner and the spider at the diagonally opposite corner.]

To begin with, it helps to squash the cube flat and label the corners to see what is going on.

[Figure: the flattened cube; the fly's corner F, its three neighbours 1, 2, 3, the spider's three neighbours 4, 5, 6, and the spider's corner S.]

Here is the transition matrix for the chain.

        F    1    2    3    4    5    6    S
P = F [ 0   1/3  1/3  1/3   0    0    0    0  ]
    1 [ 1/3  0    0    0   1/3  1/3   0    0  ]
    2 [ 1/3  0    0    0   1/3   0   1/3   0  ]
    3 [ 1/3  0    0    0    0   1/3  1/3   0  ]
    4 [  0  1/3  1/3   0    0    0    0   1/3 ]
    5 [  0  1/3   0   1/3   0    0    0   1/3 ]
    6 [  0   0   1/3  1/3   0    0    0   1/3 ]
    S [  0   0    0    0   1/3  1/3  1/3   0  ]

Using software to handle the linear algebra on the 7 × 7 matrix Q, we find that the row sums of (I − Q)⁻¹ are (10, 9, 9, 9, 7, 7, 7), listing the spider's corner first. The answer to our question is the first one: ES(TF) = 10.
4. Random walk. Take the random walk on S = {0, 1, 2, 3, 4} with absorbing boundaries.

[Figure: states 0–4 in a row; interior jumps go right with probability p and left with probability 1 − p; the boundary states 0 and 4 are absorbing.]

Now put D = {1, 2, 3} and E = {0, 4}, so that

         1    2    3                0    4
Q = 1 [  0    p    0 ]    S = 1 [ 1−p   0 ]
    2 [ 1−p   0    p ]        2 [  0    0 ]
    3 [  0   1−p   0 ]        3 [  0    p ]
(I − Q)⁻¹ = 1/((1−p)² + p²) × [ (1−p)² + p     p        p²        ]
                              [  1−p           1        p         ]
                              [ (1−p)²        1−p    p² + (1−p)   ]

The row sums give

( E1(TE), E2(TE), E3(TE) ) = ( 1 + 2p², 2, 1 + 2(1−p)² ) / ((1−p)² + p²).

Matrix multiplication gives

                                         0                 4
(I − Q)⁻¹ S = 1/((1−p)² + p²) × 1 [ (1−p+p²)(1−p)        p³       ]
                                2 [ (1−p)²               p²       ]
                                3 [ (1−p)³            (1−p+p²)p   ]

Starting in the middle, at state 2, we find that

E(length of game) = 2/((1−p)² + p²),

[Figure: graph of the expected game length as a function of p ∈ [0, 1], peaking at 4 when p = 1/2.]

P(ruin) = (1−p)²/((1−p)² + p²).

[Figure: graph of the ruin probability as a function of p ∈ [0, 1], falling from 1 at p = 0 through 1/2 at p = 1/2 to 0 at p = 1.]
For instance, with an American roulette wheel we have p = 9/19 so that the expected length of the game is 3.9889 and the chance of ruin .5524.
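The roulette numbers are easy to reproduce with the fundamental matrix; a short numpy sketch (my addition):

```python
import numpy as np

p = 9/19
Q = np.array([[0, p, 0],
              [1 - p, 0, p],
              [0, 1 - p, 0]])
S = np.array([[1 - p, 0],
              [0, 0],
              [0, p]])

N = np.linalg.inv(np.eye(3) - Q)
print(N.sum(axis=1)[1])  # expected length from the middle: 3.9889...
print((N @ S)[1, 0])     # chance of ruin from the middle: 0.55248...
```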
5. Waiting for patterns. Suppose you start tossing a fair coin and that you will stop when the pattern HHH appears. How long on average does this take?

We define a Markov chain where Xn means the number of steps needed to complete the pattern. The state space is S = {0, 1, 2, 3}; we start in state 3 and the target is state 0. Define D = {1, 2, 3} and E = {0}. The state of the chain is determined by the number of Hs at the end of the sequence. Here · · · T represents any initial sequence of tosses, including the empty sequence, provided it doesn't have 3 heads in a row.

· · · T        State 3
· · · TH       State 2
· · · THH      State 1
· · · THHH     State 0

The corresponding Q matrix and (I − Q)⁻¹ are given by

         1    2    3                       1  2  3
Q = 1 [  0    0   1/2 ]    (I − Q)⁻¹ = 1 [ 2  2  4 ]
    2 [ 1/2   0   1/2 ]                2 [ 2  4  6 ]
    3 [  0   1/2  1/2 ]                3 [ 2  4  8 ]

The third row sum answers our question: E(T0) = 2 + 4 + 8 = 14.
We can apply the same idea to other patterns; let's take THT. Now the states are given as follows:

· · · HH       State 3
· · · HHT      State 2
· · · TT       State 2
· · · TH       State 1
· · · THT      State 0

The corresponding Q matrix and (I − Q)⁻¹ are a little different:

         1    2    3                       1  2  3
Q = 1 [  0    0   1/2 ]    (I − Q)⁻¹ = 1 [ 2  2  2 ]
    2 [ 1/2  1/2   0  ]                2 [ 2  4  2 ]
    3 [  0   1/2  1/2 ]                3 [ 2  4  4 ]

The third row sum E(T0) = 2 + 4 + 4 = 10 shows that we need on average 10 coin tosses to see THT.
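Both answers are easy to confirm by simulation. A small Monte Carlo sketch (my addition):

```python
import random

def mean_wait(pattern, trials=100_000):
    # Estimate the expected number of fair-coin tosses until
    # `pattern` first appears.
    k, total = len(pattern), 0
    for _ in range(trials):
        recent, n = "", 0
        while recent != pattern:
            recent = (recent + random.choice("HT"))[-k:]  # last k tosses
            n += 1
        total += n
    return total / trials

print(mean_wait("HHH"))  # ~ 14
print(mean_wait("THT"))  # ~ 10
```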
6. A Markov chain model of algorithmic efficiency. Certain algorithms in operations research and computing science act in the following way. The objective is to find the best of a set of N elements. The algorithm starts with one of the elements, and then successively moves to a better element until it reaches the best. In the worst case, this algorithm will require N − 1 steps. What about the average case? Let Xn stand for the rank of the element we have at time n. If the algorithm chooses a better element at random, then Xn is a Markov chain with transition matrix
         1         2         3         4        · · ·   N
    1 [  1         0         0         0        · · ·   0 ]
    2 [  1         0         0         0        · · ·   0 ]
P = 3 [ 1/2       1/2        0         0        · · ·   0 ]
    4 [ 1/3       1/3       1/3        0        · · ·   0 ]
    ⋮ [  ⋮         ⋮         ⋮         ⋱                ⋮ ]
    N [ 1/(N−1)  1/(N−1)  1/(N−1)   1/(N−1)     · · ·   0 ]

We are trying to hit E = {1} and so

         2         3         4       · · ·         N
    2 [  0         0         0       · · ·         0 ]
    3 [ 1/2        0         0       · · ·         0 ]
Q = 4 [ 1/3       1/3        0       · · ·         0 ]
    ⋮ [  ⋮         ⋮         ⋱                     ⋮ ]
    N [ 1/(N−1)  1/(N−1)   · · ·    1/(N−1)        0 ]

A bit of experimentation with Maple will convince you that

              2 [ 1    0    0    · · ·  0 ]
              3 [ 1/2  1    0    · · ·  0 ]
(I − Q)⁻¹ =   4 [ 1/2  1/3  1    · · ·  0 ]
              ⋮ [ ⋮    ⋮    ⋮    ⋱      ⋮ ]
              N [ 1/2  1/3  1/4  · · ·  1 ]

Taking row totals shows that E(Tj) = 1 + (1/2) + (1/3) + · · · + (1/(j − 1)). Even if we begin with the worst element, we have

E(TN) = 1 + (1/2) + (1/3) + · · · + (1/(N − 1)) ≈ log(N).

It takes an average of log(N) steps to get the best element. The average case is much faster than the worst case analysis might lead you to believe.
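A quick simulation (my addition) confirms the harmonic-number answer: the rank chain below jumps to a uniformly chosen better element until it reaches the best.

```python
import random

def steps_to_best(N, trials=100_000):
    # Average number of improvement steps, starting from the worst
    # element (rank N) and stopping at the best (rank 1).
    total = 0
    for _ in range(trials):
        rank, n = N, 0
        while rank > 1:
            rank = random.randint(1, rank - 1)  # uniformly better element
            n += 1
        total += n
    return total / trials

N = 100
print(steps_to_best(N))               # ~ 5.18
print(sum(1/k for k in range(1, N)))  # harmonic number H_{N-1} = 5.177...
```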
2. Countable Markov Chains
We now consider a state space that is countably infinite, say S = {0, 1, . . .} or S = Z^d = {(i1, . . . , id) : ij ∈ Z, j = 1, . . . , d}. We no longer use linear algebra, but some of the rules carry over:

Σ_{y∈S} p(x, y) = 1,  for all x ∈ S,

and

pm+n(x, y) = Σ_{z∈S} pm(x, z) pn(z, y).
A. Recurrence and transience
Suppose that Xn is an irreducible Markov chain on a countable state space S. We call the chain recurrent if for each x ∈ S, Px(Xn = x infinitely often) = 1, and transient if Px(Xn = x infinitely often) = 0.

Fix a state x and assume X0 = x. Define the random variable R = Σ_{n=0}^∞ 1(Xn=x). From the Markov property,

E(R | R > 1) = 1 + E((R − 1) | R > 1) = 1 + E(R).

By definition,

E(R) = E(R | R > 1)P(R > 1) + E(R | R = 1)P(R = 1)
     = (1 + E(R))P(R > 1) + P(R = 1)
     = 1 + E(R)P(R > 1),

whence we conclude that

1 = E(R)P(R = 1).   (2.4)
Suppose that P (R = 1) = 0, that is, a return to x is certain. From the Markov property, a second return is also certain, and a third, etc. In fact P (R = ∞) = 1, so that x is recurrent and E(R) = ∞.
Now suppose that P(R = 1) > 0, so a return to the initial state is not certain. Then from (2.4) we have E(R) = 1/P(R = 1) < ∞, so P(R < ∞) = 1, i.e. x is transient. Since

E(R) = E( Σ_{n=0}^∞ 1(Xn=x) ) = Σ_{n=0}^∞ P(Xn = x) = Σ_{n=0}^∞ pn(x, x),

the following theorem holds.

Theorem. The state x is recurrent if and only if Σ_{n=0}^∞ pn(x, x) = ∞.
Example. One dimensional random walk.

[Figure: the integer line with states . . . , x − 2, x − 1, x, x + 1, x + 2, . . .; steps go right with probability p and left with probability 1 − p.]

Take x = 0 and assume that X0 = 0. Let's find p2n(0, 0). In order that X2n = 0, there must have been n steps to the left and n steps to the right. The number of such paths is (2n)!/(n! n!) and the probability of each path is pⁿ(1 − p)ⁿ, so

p2n(0, 0) = ((2n)!/(n! n!)) pⁿ(1 − p)ⁿ.

Stirling's formula says n! ∼ √(2πn) (n/e)ⁿ, so replacing the factorials gives us an approximate formula

p2n(0, 0) ∼ ( √(2π(2n)) (2n/e)^{2n} / (2πn (n/e)^{2n}) ) pⁿ(1 − p)ⁿ = [4p(1 − p)]ⁿ / √(πn).

If p = 1/2, then Σn p2n(0, 0) ∼ Σn 1/√(πn) = ∞ and the walk is recurrent. But if p ≠ 1/2, then p2n(0, 0) → 0 exponentially fast and Σn p2n(0, 0) < ∞, so the walk is transient.
Example. Symmetric random walk in Z^d. At each time, the walk moves to one of its 2d neighbors with equal probability.

[Figure: a piece of the two-dimensional integer lattice with the possible moves from a point.]

Exact calculations are a bit difficult, so we just sketch a rough calculation that gives the right result. Suppose the walker takes 2n steps. Roughly, we expect he's taken 2n/d steps in each direction. In order to return to zero, we need the number of steps in each direction to be even: the chance of that is (1/2)^{d−1}. In each direction, the chance that a coordinate ends up back at 0 in 2n/d steps is roughly 1/√(π(n/d)). Therefore

p2n(0, 0) ∼ (1/2)^{d−1} (d/(πn))^{d/2}.

Since Σ n^{−a} < ∞ if and only if a > 1, we conclude that the symmetric random walk in Z^d is recurrent if d ≤ 2 and transient if d ≥ 3.
B. Difference equations and Markov chains on Z
Let Xn be a random walk on the integers, where p(y, y − 1) = qy, p(y, y + 1) = py, and p(y, y) = 1 − (qy + py), with py + qy ≤ 1. For x ≤ y ≤ z, define

a(y) = Py(Xn will hit state x before z).
Note that a(x) = 1 and a(z) = 0. Conditioning on X1 we find that the function a is harmonic for x < y < z, that is,

a(y) = a(y − 1)qy + a(y)(1 − (qy + py)) + a(y + 1)py,

which can be rewritten as a(y)(qy + py) = a(y − 1)qy + a(y + 1)py, or as

py[a(y) − a(y + 1)] = qy[a(y − 1) − a(y)].

Provided py > 0 for x < y < z, we can divide to get

a(y) − a(y + 1) = (qy/py)[a(y − 1) − a(y)],

and iterating gives us

a(y) − a(y + 1) = ((q_{x+1} · · · qy)/(p_{x+1} · · · py)) [a(x) − a(x + 1)].

For convenience, let's define ry = qy/py, so the above equation becomes

a(y) − a(y + 1) = r_{x+1} · · · ry [a(x) − a(x + 1)].

This is even true for y = x, if we interpret the "empty" product r_{x+1} · · · rx as 1. For any x ≤ w ≤ z we have

a(x) − a(w) = Σ_{y=x}^{w−1} [a(y) − a(y + 1)] = Σ_{y=x}^{w−1} r_{x+1} · · · ry [a(x) − a(x + 1)].   (1)

In particular, putting w = z gives

1 = 1 − 0 = a(x) − a(z) = Σ_{y=x}^{z−1} r_{x+1} · · · ry [a(x) − a(x + 1)],

and plugging this back into (1) and solving for a(w) gives

a(w) = ( Σ_{y=w}^{z−1} r_{x+1} · · · ry ) / ( Σ_{y=x}^{z−1} r_{x+1} · · · ry ).   (2)
Consequences.

1. Let's define the function b by b(y) = Py(Xn will hit state z before x). This function is also harmonic, but satisfies the opposite boundary conditions b(x) = 0 and b(z) = 1. Equation (1) is valid for any harmonic function, so let's plug in b and multiply by −1 to get

b(w) = b(w) − b(x) = Σ_{y=x}^{w−1} r_{x+1} · · · ry [b(x + 1) − b(x)].   (3)

Plugging in w = z allows us to solve for b(x + 1) − b(x), and plugging this back into (3) gives

b(w) = ( Σ_{y=x}^{w−1} r_{x+1} · · · ry ) / ( Σ_{y=x}^{z−1} r_{x+1} · · · ry ).   (4)

In particular we see that a(w) + b(w) = 1 for all x ≤ w ≤ z. That is, the chain must eventually hit one of the boundary points {x, z}, provided all the py's are non-zero.

2. For w ≥ x, define

α(w) = Pw(ever hit state x) = lim_{z→∞} a(w).

If the denominator of (4) diverges, i.e., Σ_{y=x}^∞ r_{x+1} · · · ry = ∞, then lim_{z→∞} b(w) = 0, so lim_{z→∞} a(w) = α(w) = 1 for all w. On the other hand, if Σ_{y=x}^∞ r_{x+1} · · · ry < ∞, then

α(w) = ( Σ_{y=w}^∞ r_{x+1} · · · ry ) / ( Σ_{y=x}^∞ r_{x+1} · · · ry ).   (5)

This shows that α(w) decreases to zero as w → ∞.

3. In the case where py = p and qy = q don't depend on y,

a(w) = (r^z − r^w)/(r^z − r^x)   if r ≠ 1,
a(w) = (z − w)/(z − x)           if r = 1.

Letting z → ∞ gives the probability we ever hit x from the right: α(w) = 1 ∧ r^{w−x}.

4. Notice that α(x) = 1. The process is guaranteed to hit state x when you start there, for the simple reason that we count the visit at time zero! Let's work out the chance of a return to state x. Conditioning on the position X1 at time 1, we have

Px(Tx < ∞) = q Px−1(hit x) + (1 − (p + q)) Px(hit x) + p Px+1(hit x)
           = q (1 ∧ 1/r) + (1 − (p + q)) · 1 + p (1 ∧ r)
           = (q ∧ p) + (1 − (p + q)) + (p ∧ q)
           = 1 − |p − q|.

This shows that the chain is recurrent if and only if p = q.
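Formula (2) is pleasant to use numerically. Here is a small Python sketch (my addition; the state range and jump probabilities are made-up inputs) that evaluates a(w) for a walk with given py and qy:

```python
def hit_x_before_z(x, z, p, q):
    # Formula (2): a(w) for w = x..z, where p[y] and q[y] are the
    # right/left jump probabilities.  Uses r_y = q_y/p_y and the
    # convention that the empty product equals 1.
    prods = [1.0]                      # r_{x+1} ... r_y for y = x..z-1
    for y in range(x + 1, z):
        prods.append(prods[-1] * q[y] / p[y])
    total = sum(prods)
    return {w: sum(prods[w - x:]) / total for w in range(x, z + 1)}

# Sanity check: the symmetric walk gives a(w) = (z - w)/(z - x).
p = {y: 0.5 for y in range(11)}
a = hit_x_before_z(0, 10, p, p)
print(a[3])  # 0.7 = (10 - 3)/(10 - 0)
```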
C. The general result for random walks

[Figure: a piece of the two-dimensional integer lattice with a walk path.]

The following general result is proved using harmonic analysis.
Theorem. Suppose Xn is a genuine d-dimensional random walk with Σ |x| p(0, x) < ∞. The walk is recurrent if d = 1, 2, and Σ x p(0, x) = 0. Otherwise it is transient.
D. Two types of recurrence
A Markov chain only makes finitely many visits to a transient state j, so the asymptotic density of its visits, mjj, must be zero. This is true for both finite and countable Markov chains. A recurrent state j may have zero density or positive density, though the first is impossible in a finite state space S.

Definition. A recurrent state j is called null recurrent if mjj = 0, i.e., Ej(Tj) = ∞. It is called positive recurrent if mjj > 0, i.e., Ej(Tj) < ∞.

The argument in section 1.D shows that if states i and j communicate, and if mii = 0, then mjj = 0 also. The following lemma shows that if i and j communicate, then they are either both recurrent or both transient. Putting it all together, communicating states are of the same type: either transient, null recurrent, or positive recurrent.
Lemma. If i and j communicate, Σ_k pk(i, i) < ∞ if and only if Σ_k pk(j, j) < ∞.

Proof. Choose n and m so pm(i, j) > 0 and pn(j, i) > 0. Then for every k ≥ 0 we have p_{n+k+m}(j, j) ≥ pn(j, i) pk(i, i) pm(i, j), so that

Σ_k pk(j, j) ≥ Σ_k p_{n+k+m}(j, j) ≥ pm(i, j) pn(j, i) Σ_k pk(i, i).

Therefore Σ_k pk(j, j) < ∞ implies Σ_k pk(i, i) < ∞. Reversing the roles of i and j gives the result. □
Example. For the symmetric random walks in d = 1, 2, we have

m00 = lim_n (1/2n) Σ_{k=1}^{2n} p2k(0, 0) ∼ lim_n (1/2n) Σ_{k=1}^{2n} 1/√(πk) = 0.

These random walks are null recurrent.
E. Branching processes
This is a random model for the evolution of a population over several generations. It has its origins in an 1874 paper by Francis Galton and Reverend Henry William Watson called "On the probability of extinction of families" (see http://www.mugu.com/galton/index.html). To illustrate, here is a piece of the Schmuland family tree:

[Figure: a family tree drawn as a branching diagram over several generations.]
In this model, Xn is the number of individuals alive at time n. Each individual produces offspring distributed like the random variable Y:

  y    |  0    1    2    3   · · ·
 p(y)  |  p0   p1   p2   p3  · · ·
2
COUNTABLE MARKOV CHAINS
36
Let’s find the average size of generation n. Let µ = E(Y ) = Then EXn+1 j Xn = k) = E(Y1 + · · · + Yk ) = kµ,
P∞
j=0
jpj .
so that E(Xn+1 ) =
∞ X
E(Xn+1 j Xn = k)P (Xn = k)
=
∞ X
kµP (Xn = k)
k=0
k=0
= µE(Xn ). By induction, we discover that E(Xn ) = µn E(X0 ). If µ < 1, then E(Xn ) → 0 as n → ∞. The estimate E(Xn ) =
∞ X
kP (Xn = k) ≥
k=0
∞ X
P (Xn = k) = P (Xn ≥ 1)
k=1
shows that limn→∞ P (Xn = 0) = 1. Now, for a branching process, the state 0 is absorbing so we can draw the stronger conclusion that P (limn Xn = 0) = 1. In other words, the branching process is guaranteed to become extinct if µ < 1. Extinction. Let’s define a sequence an = P (Xn = 0 j X0 = 1), that is, the probability that the population is extinct at the nth generation, starting with a single individual. Conditioning on X1 and using the Markov property we get P (Xn+1 = 0 j X0 = 1) = = =
∞ X
k=0 ∞ X
k=0 ∞ X
P (Xn+1 = 0 j X1 = k)P (X1 = k j X0 = 1) P (Xn = 0 j X0 = k)P (X1 = k j X0 = 1) P (Xn = 0 j X1 = 1)k pk .
k=0
P∞
k If we define φ(s) = k=0 pk s , then the equation above can be written as an+1 = φ(an ). Note that φ(0) = P (Y = 0), and φ(1) = 1. Also
2
COUNTABLE MARKOV CHAINS
37
P k−1 0 ≥ 0, and φ0 (1) = E(Y ). Finally, note that φ00 (s) = φ = ∞ k=0 pk ks P(s) ∞ k−2 ≥ 0, and if p0 + p1 < 1, then φ00 (s) > 0 for s > 0. k=0 pk k(k − 1)s The sequence (an ) is defined through the equations a0 = 0 and an+1 = φ(an ) for n ≥ 1. Since a0 = 0, we trivially have a0 ≤ a1 . Apply φ to both sides of the inequality to obtain a1 ≤ a2 . Continuing in this way we find that the sequence (an ) is non-decreasing. Since (an ) is bounded above by 1, we conclude that an ↑ a for some constant a. The value a gives the probability that the population will eventually become extinct. It is the smallest solution to the equation a = φ(a). The following pictures sketch the proof that a = 1 (extinction is certain) if and only if E(Y ) ≤ 1.
[Figure: two cobweb plots of the iteration an+1 = φ(an) against the diagonal. Case µ ≤ 1 and a = 1: the iterates a0 ≤ a1 ≤ a2 ≤ · · · climb all the way to the fixed point 1. Case µ > 1 and a < 1: the curve crosses the diagonal before 1, and the iterates climb only to the smaller fixed point a.]
Examples.

1. p0 = 1/4, p1 = 1/4, p2 = 1/2. This gives us µ = 5/4 and φ(s) = 1/4 + s/4 + s²/2. Solving φ(s) = s gives two solutions {1/2, 1}. Therefore a = 1/2.

2. p0 = 1/2, p1 = 1/4, p2 = 1/4. In this case, µ = 3/4 so that a = 1.

3. Schmuland family tree. p0 = 7/15, p1 = 3/15, p2 = 1/15, p3 = 4/15. This gives µ = 1.1333 and a = (√137 − 5)/8 = .83808. There is a 16% chance that our surname will survive into the future!
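The fixed-point iteration an+1 = φ(an) is also the easiest way to compute a numerically. A short Python sketch (my addition):

```python
import numpy as np

def extinction_prob(p, tol=1e-12):
    # Iterate a_{n+1} = phi(a_n) from a_0 = 0; the limit is the smallest
    # solution of a = phi(a), i.e. the extinction probability.
    phi = np.polynomial.Polynomial(p)  # offspring pgf, p = [p0, p1, ...]
    a = 0.0
    while True:
        a_next = phi(a)
        if abs(a_next - a) < tol:
            return a_next
        a = a_next

print(extinction_prob([1/4, 1/4, 1/2]))           # 0.5
print(extinction_prob([7/15, 3/15, 1/15, 4/15]))  # 0.83808...
```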
3. Optimal Stopping
A. Strategies for winning
Think of a Markov chain as a gambling game. For example, a random walk on {0, 1, . . . , N} could represent your winnings at the roulette wheel.

Definition. A payoff function f : S → [0, ∞) assigns a "payoff" to each state of the Markov chain. Think of it as the amount you would collect if you stop playing with the chain in that state.
Example. In the usual situation, f(x) = x, while in the situation where you owe the mob N dollars the payoff is f(x) = 1N(x).

[Figure: two payoff graphs on {0, 1, . . . , N}: f(x) = x climbs linearly, while f(x) = 1N(x) is 0 everywhere except for the single point N.]
Note. Any function g : S → R can also be considered as the column vector whose coordinates are the numbers g(x) for x ∈ S. Multiplying this vector on the right of the transition matrix P gives a new column vector (= function) Pg with coordinates (Pg)(x) = Σ_{y∈S} g(y) p(x, y).

Definition. A stopping time (strategy) is a rule that tells you when to stop playing. Mathematically, it is a random variable with values in {0, 1, 2, . . .} ∪ {∞} so that {T ≤ n} ∈ σ(X0, . . . , Xn) for all n ≥ 0.
Examples. 1. T ≡ 0, i.e., don't gamble. 2. T ≡ 1, i.e., play once, then stop. 3. T = "wait for 3 reds in a row, then bet on black; repeat until you hit 0 or N."

We will always assume that P(T < ∞) = 1 in this section.

Definition. The value function v : S → [0, ∞) gives the largest expected payoff available from each starting point:

    v(x) = sup_T E(f(XT) | X0 = x).
There is an optimal strategy Topt so that v(x) = E(f(X_{Topt}) | X0 = x).

Facts about v

1. Consider the strategy T0 ≡ 0 (don't gamble). Then

    v(x) = sup_T E(f(XT) | X0 = x) ≥ E(f(X_{T0}) | X0 = x) = f(x).

That is, v(x) ≥ f(x) for all x ∈ S.

2. Define the strategy T*: play once, then follow the optimal strategy. Then

    v(x) ≥ E(f(X_{T*}) | X0 = x)
         = Σ_{y∈S} E(f(X_{T*}) | X1 = y) p(x, y)
         = Σ_{y∈S} v(y) p(x, y)
         = (P v)(x).

Therefore v(x) ≥ (P v)(x) for all x ∈ S. Such a function is called superharmonic.
3. By definition, a superharmonic function u satisfies u(x) ≥ (P u)(x), or E(u(X0) | X0 = x) ≥ E(u(X1) | X0 = x). It turns out that for any two stopping times 0 ≤ S ≤ T < ∞ we have

    E(u(XS) | X0 = x) ≥ E(u(XT) | X0 = x).

4. Suppose that u is a superharmonic function and u(x) ≥ f(x) for all x ∈ S. Then

    u(x) = E(u(X0) | X0 = x)
         ≥ E(u(X_{Topt}) | X0 = x)
         ≥ E(f(X_{Topt}) | X0 = x)
         = v(x).
Putting all this together gives the following. Lemma. The value function v is the smallest superharmonic function ≥ f.
Here is the main theorem of this section.

Theorem. The optimal strategy is given by T_E := inf{n ≥ 0 : Xn ∈ E}, where E = {x ∈ S : f(x) = v(x)}.

Sketch of proof. First you must show that P(T_E < ∞ | X0 = x) = 1 for all x ∈ S. Assume this has been done. Define u(x) = E(f(X_{T_E}) | X0 = x), and note that, since v is superharmonic, u(x) = E(v(X_{T_E}) | X0 = x) ≤ v(x). Define another strategy T_E′ = inf{n ≥ 1 : Xn ∈ E}, so that T_E′ ≥ T_E and

    (P u)(x) = E(v(X_{T_E′}) | X0 = x) ≤ E(v(X_{T_E}) | X0 = x) = u(x),
showing that u is superharmonic. The last thing is to show that u(x) ≥ f(x). Fix a state y so that f(y) − u(y) = sup_x (f(x) − u(x)). Then u(x) + f(y) − u(y) ≥ f(x) for all x ∈ S. Since the left hand side is superharmonic, we get u(x) + f(y) − u(y) ≥ v(x) for all x ∈ S. In particular, f(y) ≥ v(y), so that y ∈ E. Then u(y) = E(f(X_{T_E}) | X0 = y) = f(y), so that f(y) − u(y) = 0 and hence u(x) ≥ f(x) for all x ∈ S. □
Corollary. v(x) = max{f(x), (P v)(x)}.
B Examples
1. Take the random walk on {0, 1, . . . , N} with q ≥ p, and f(x) = x. Then

    (P f)(x) = q f(x−1) + (1 − (q + p)) f(x) + p f(x+1) = x − (q − p) ≤ x = f(x).

This shows that f is superharmonic. Therefore v = f everywhere and E = S. The optimal strategy is T0 ≡ 0, i.e., don't gamble!

2. Take the random walk on {0, 1, . . . , N} with absorbing boundaries and f(x) = 1N(x). Let's show that the optimal strategy is to continue until you hit {0, N}. Certainly {0, N} ⊆ E, since absorbing states x always satisfy v(x) = f(x). For 1 ≤ x ≤ N − 1, there is a non-zero probability that the chain will hit N before 0, and so

    f(x) = 0 < E(f(X_{T_{0,N}}) | X0 = x) ≤ v(x).

Thus v(x) > f(x), so x does not belong to E.
Note that the value function gives the probability of ending up at N:

    v(x) = E(f(X_{T_E}) | X0 = x) = P(X_{T_E} = N | X0 = x).

The function v is harmonic (v(x) = (P v)(x)) on {1, 2, . . . , N − 1} so, as for the example on page 33, we calculate directly

    v(x) = (1 − (q/p)^x)/(1 − (q/p)^N)   if p ≠ q,
    v(x) = x/N                           if p = q.

3. Zarin case. The following excerpt is taken from What is the Worth of Free Casino Credit? by Michael Orkin and Richard Kakigi, published in the January 1995 issue of the American Mathematical Monthly.

    In 1980, a compulsive gambler named David Zarin used a generous credit line to run up a huge debt playing craps in an Atlantic City casino. When the casino finally cut off Zarin's credit, he owed over $3 million. Due in part to New Jersey's laws protecting compulsive gamblers, the debt was deemed unenforceable by the courts, leading the casino to settle with Zarin for a small fraction of the amount he owed. Later, the Internal Revenue Service tried to collect taxes on the approximately $3 million Zarin didn't repay, claiming that cancellation of the debt made it taxable income. Since Zarin had never actually received any cash (he was always given chips, which he promptly lost at the craps table), an appellate court finally ruled that Zarin had no tax obligation. The courts never asked what Zarin's credit line was actually worth.

Mathematically, the payoff function is the positive part of x − k, where k is the number of units of free credit: f(x) = (x − k)⁺.

[Figure: graph of the payoff f(x) = (x − k)⁺ on {0, 1, 2, . . .}: zero up to state k, rising by one unit per state thereafter.]
Since the state zero is absorbing, we have v(0) = 0. On the other hand, v(x) > 0 = f(x) for x = 1, . . . , k, so that 1, . . . , k ∉ E. Starting at k, the optimal strategy is to keep playing until you hit 0 or N, for some N > k which is to be determined. In fact, N is the smallest element of E greater than k. We have to eliminate the possibility that N = ∞, that is, E = {0}. But the strategy T_{0} gives a value function that is identically zero. As this is impossible, we know N < ∞. The optimal strategy is T_{0,N} for some N. Using the previous example we can calculate directly that

    E(f(X_{T_{0,N}}) | X0 = k) = (N − k) (1 − (q/p)^k)/(1 − (q/p)^N).

For any choice of p and q, we choose N to maximize the right hand side. In the Zarin case, we may assume he played the "pass line" bet, which gives the best odds of p = 244/495 and q = 251/495, so that q/p = 251/244. We also assume that he bets boldly, making the maximum bet of $15,000 each time. Then three million dollars equals k = 200 free units, and trial and error gives N = 235 and v(200) = 12.977 units = $194,655.

    N    Expected profit (units)   Expected profit ($)
    232  12.9169                   193754.12
    233  12.9486                   194228.91
    234  12.9684                   194526.29
    235  12.9771                   194655.80
    236  12.9751                   194626.58
    237  12.9632                   194447.42
    238  12.9418                   194126.71

In general, we have the approximate formula N ≈ k + 1/ln(q/p), and the probability of reaching N is approximately 1/e = .36788. Therefore the approximate value of k free units of credit is

    v(k) ≈ 1/(e ln(q/p)),

which is independent of k!
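The trial and error is painless on a computer. A minimal Python sketch (ours, not part of the original notes), which reproduces N = 235 and the approximations above:

    from math import exp, log

    p, q, k = 244/495, 251/495, 200   # pass-line odds; $3 million = 200 units of $15,000
    r = q / p

    def expected_profit(N):
        return (N - k) * (1 - r**k) / (1 - r**N)

    best = max(range(k + 1, k + 200), key=expected_profit)
    print(best, expected_profit(best))          # -> 235 12.977...
    print(k + 1/log(r), 1/(exp(1) * log(r)))    # approximations: 235.35..., 13.00...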
C Algorithm to find optimal strategy
The value function v is the smallest superharmonic function that exceeds f, but there is no direct formula to find it. Here is an algorithm that gives approximations to v.
Algorithm. Define

    u1(x) = f(x)    if x is absorbing,
    u1(x) = sup f   otherwise.

Then let u2 = max(P u1, f), u3 = max(P u2, f), etc. The sequence (un) decreases to the function v.
Example. How much would you pay for the following financial opportunity? I assume that you follow a random walk on the graph below. There is no payoff except $100 at state 4, and state 5 is absorbing.

[Figure: a graph on the vertices 1, 2, 3, 4, 5 with edges 1–2, 1–3, 1–4, 2–3, 2–5, 3–4, 3–5, 4–5; state 4 carries the $100 bill and state 5 is absorbing.]
In vector form, the payoff function is f = (0, 0, 0, 100, 0). (For ease of typesetting, we will render these column vectors as row vectors, OK?) The "P" operation takes a vector u and gives

    P u = ( (u(2) + u(3) + u(4))/3, (u(1) + u(3) + u(5))/3, (u(1) + u(2) + u(4) + u(5))/4, (u(1) + u(3) + u(5))/3, u(5) ).

The initial vector is u1 = (100, 100, 100, 100, 0) and P u1 = (100, 200/3, 75, 200/3, 0). Taking the maximum of this with f puts the fourth coordinate back up to 100, giving u2 = (100, 200/3, 75, 100, 0). Applying this procedure to u2 gives u3 = (725/9, 175/3, 200/3, 100, 0). Putting this on a computer, and repeating 15 times, yields (in decimal
format) u15 = (62.503, 37.503, 50.002, 100.00, 0.00). These give the value, or fair price, of the different starting positions on the graph. We may guess that if the algorithm were continued, the values would converge to u∞ = v = (62.5, 37.5, 50.0, 100.0, 0.0). We can confirm this guess by checking the equation v = max(P v, f).
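Here is a minimal Python sketch of this computation (ours; the transition matrix is read off from the P operator above, and everything else is our own notation):

    import numpy as np

    P = np.array([[0, 1/3, 1/3, 1/3, 0],      # state 1 -> 2, 3, 4
                  [1/3, 0, 1/3, 0, 1/3],      # state 2 -> 1, 3, 5
                  [1/4, 1/4, 0, 1/4, 1/4],    # state 3 -> 1, 2, 4, 5
                  [1/3, 0, 1/3, 0, 1/3],      # state 4 -> 1, 3, 5
                  [0, 0, 0, 0, 1]])           # state 5 absorbing
    f = np.array([0, 0, 0, 100, 0])           # $100 payoff at state 4

    u = np.where(np.diag(P) == 1, f, f.max()) # u1: f at absorbing states, sup f elsewhere
    for _ in range(50):
        u = np.maximum(P @ u, f)              # u_{n+1} = max(P u_n, f)
    print(u)                                  # -> [62.5 37.5 50. 100. 0.]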
D The binomial pricing model
In this section, we will look at an example from mathematical finance. It's a little more complicated than our previous models, but it uses the same principles. We suppose a market with 3 financial instruments: a bond (riskless asset), a stock (risky asset), and an option (to be explained). In one time period, the value of the stock goes up or down, by being multiplied by the constant u or d. In contrast, the bond grows at the fixed rate r, independent of the market. Here r = 1 + interest rate, and we assume that d < r < u to preclude the possibility of arbitrage (risk-free profit).

[Figure: one-period tree. At time 0 the option, bond, and stock are worth C, B, S; at time 1 they are worth (uS, rB, Cu) if the stock goes up and (dS, rB, Cd) if it goes down.]
How to not make money. Now imagine a portfolio that consists of x units of stock, y units of options, and z units of bonds. The numbers can be positive, zero, or negative; for example, they may represent debt. Suppose that we can arrange it so that this portfolio is absolutely worthless
at time 1, whether the stock goes up or down:

    x(uS) + y Cu + z rB = 0
    x(dS) + y Cd + z rB = 0.

The fair time-zero price for the worthless portfolio is also zero, so that x S + y C + z B = 0. Solving these 3 equations for the unknown C gives

    C = (1/r) { ((r − d)/(u − d)) Cu + ((u − r)/(u − d)) Cd }.

To ease the notation let p = (r − d)/(u − d), so that the price can be written C = (p Cu + (1 − p) Cd)/r. The "worthless portfolio" device is the same as the usual "replicating portfolio" device.
Call Option. A call option gives the holder the right (but not the obligation) to buy stock at a later time for K dollars. The value K is called the strike price. The value of the option at time 1 is given by

    Cu = (uS − K)⁺ and Cd = (dS − K)⁺.
Example. Suppose S = 100, r = 1.05, u = 2, d = 1/2. This gives a p-value of p = 11/30. If the strike price is K = 150, then Cu = 50 and Cd = 0. Therefore the option price is

    C = (1/r) { (11/30)(50) + (19/30)(0) } = (1/r)(55/3) = 17.46.

Just to doublecheck: the replicating portfolio consists of 1/3 share of stock and a loan of (1/r)(50/3) dollars. If the stock goes down, it is worth 50/3 dollars, which you use to pay off the loan, leaving a profit of zero. On the other hand, if the stock goes up it is worth 200/3; you pay off the loan and retain 150/3 = 50 dollars profit. The time zero value of this portfolio is

    (1/3)(100) − (1/r)(50/3) = (1/r)(55/3).
Clearly the price of the call option is non-negative: C ≥ 0. Also

    C = (1/r) { p(uS − K)⁺ + (1 − p)(dS − K)⁺ }
      ≥ (1/r) { p(uS − K) + (1 − p)(dS − K) }
      = S − K/r
      ≥ S − K.

Combining these two inequalities shows that C ≥ (S − K)⁺.

Definition. An American option can be exercised at any time; a European option can only be exercised at the terminal time. For American options, the price is the maximum of the current payoff and the formula calculated earlier. For a call option,

    C = max( (S − K)⁺, (1/r){p Cu + (1 − p) Cd} ) = (1/r){p Cu + (1 − p) Cd}.

A call option is never exercised early.
A put option gives the buyer the right (but not the obligation) to sell stock for K dollars. That is, Pu = (K − uS)⁺ and Pd = (K − dS)⁺, and

    P = max( (K − S)⁺, (1/r){p Pu + (1 − p) Pd} ).
Example. Suppose again that S = 100, r = 1.05, u = 2, d = 1/2. This gives a p-value of p = 11/30. If the strike price is K = 150, then Pu = 0 and Pd = 100. Here the current payoff is (K − S)⁺ = 50, so the option price is

    P = max( 50, (1/1.05){ (11/30)(0) + (19/30)(100) } ) = 60.32.
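The one-period prices are easy to check by machine. A minimal sketch (ours; the function name and defaults are our own choices, using the parameters of the running example):

    def one_period_price(up, down, immediate, u=2, d=0.5, r=1.05):
        # discounted risk-neutral expectation, floored at immediate exercise
        p = (r - d) / (u - d)
        return max(immediate, (p * up + (1 - p) * down) / r)

    S, K = 100, 150
    print(one_period_price(max(2*S - K, 0), max(S/2 - K, 0), max(S - K, 0)))  # call: 17.46
    print(one_period_price(max(K - 2*S, 0), max(K - S/2, 0), max(K - S, 0)))  # put: 60.32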
Multiple time periods.

Call Option

[Tree: the recombining binomial tree for the call with terminal time n = 3. Each node shows the stock price and, underneath, the value of the call option:

    time 0: 100 → 38.71
    time 1: 200 → 100.33;  50 → 6.10
    time 2: 400 → 257.14;  100 → 17.46;  25 → 0
    time 3: 800 → 650;  200 → 50;  50 → 0;  12.50 → 0]

This tree explains the price of a call option with terminal time n = 3. The first number at each node is the stock value and the second is the current value of the call option. These are calculated by starting at the right hand side and working left, using our formula. The end result is relatively simple, since a call option is never exercised early:

    C = (1/r^n) Σ_{j=0}^{n} (n choose j) p^j (1 − p)^{n−j} (u^j d^{n−j} S − K)⁺.
Put Option

[Tree: the same stock tree with the put values, terminal time n = 3; boxes mark the nodes where the option is exercised:

    time 0: 100 → 73.02
    time 1: 200 → 36.38;  50 → 100 [exercised early]
    time 2: 400 → 0;  100 → 60.32;  25 → 125 [exercised early]
    time 3: 800 → 0;  200 → 0;  50 → 100 [exercised];  12.50 → 137.50 [exercised]]

This tree explains the price of a put option with terminal time n = 3. The first number at each node is the stock value and the second is the current value of the put option. These are calculated by starting at the right hand side and working left, using our formula, but always taking the maximum with the result of immediate exercise. There are boxes around the nodes where the option would be exercised; note that two of them are early exercise.
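This backward induction is easily mechanized. A minimal Python sketch (ours), using the parameters of the running example; it reproduces the time-0 prices 38.71 for the call and 73.02 for the put:

    def binomial_price(S0, u, d, r, n, payoff, american=True):
        p = (r - d) / (u - d)                 # risk-neutral up-probability
        vals = [payoff(S0 * u**j * d**(n - j)) for j in range(n + 1)]
        for step in range(n - 1, -1, -1):     # start at the right, work left
            for j in range(step + 1):
                cont = (p * vals[j + 1] + (1 - p) * vals[j]) / r
                here = payoff(S0 * u**j * d**(step - j))
                vals[j] = max(here, cont) if american else cont
        return vals[0]

    print(binomial_price(100, 2, 0.5, 1.05, 3, lambda s: max(s - 150, 0)))  # 38.71
    print(binomial_price(100, 2, 0.5, 1.05, 3, lambda s: max(150 - s, 0)))  # 73.02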
What’s the connection with optimal stopping for Markov chains? Define a Markov chain on the tree. At each time, the chain moves forward. It goes upward (north-east) with probability p and downwards (south-east) with probability 1 − p. The ends are absorbing. Let f(x) be the value of the option is exercised immediately at state x. Note that for a state x at level n, we have u1 (x) = f(x), u2 (x) = max(P u1 (x), f(x)) = f(x), u3 (x) = max(P u2 (x), f(x)) = f(x). Similarly uk (x) = f(x) for all k, so letting k → ∞ we find v(x) = f(x) for such x.
In other words, u1(x) = v(x) for x at level n. Next consider a state x at level n − 1. We have u2(x) = max(P u1(x), f(x)). Now u1 ≡ v at level n, so P u1 ≡ P v at level n − 1. So u2(x) = max(P v(x), f(x)) = v(x) for such x.

[Figure: a state x at level n − 1 with upward neighbour y and downward neighbour z at level n; here (P g)(x) = p g(y) + (1 − p) g(z).]
Continuing this way, we can prove that u_{j+1}(x) = v(x) for x at level n − j. In particular, uk(x) = v(x) for all x when k ≥ n + 1. The algorithm of starting at the far right hand side and working backwards gives us the value function v, which gives the correct price of the option at each state. We exercise the option at any state where v(x) = f(x).
4 Martingales
A Conditional Expectation
What’s the best way to estimate the value of a random variable X? If we want to minimize the squared error E[(X − e)2 ] = E[X 2 − 2eX + e2 ] = E(X 2 ) − 2eE(X) + e2 , differentiate to obtain 2e − 2E(X), which is zero at e = E(X). Example. Your friend throws a die and you have to estimate its value X. According to the analysis above, your best bet is to guess E(X) = 3.5. What happens if you have additional information? Suppose that your friend will tell you the parity of the die value, that is, whether it is odd or even? How should we modify our guess to takenthis new information into 0 if X is even account? Let’s define the random variable P = . 1 if X is odd Then E(X j X is even) =
6 X
xP (X = x j X is even)
x=1
³ ² ³ ² ³ ² 1 1 1 + (3 × 0) + 4 × + (5 × 0) + 6 × = (1 × 0) + 2 × 3 3 3 = 4. Similar calculations show that E(X j X is odd) = 3. We can combine these results as follows. Define a function φ(p) =
º
4 3
if p = 0 . if p = 1
Then our best estimate is the random variable φ(P ). In an even more extreme case, your friend may tell you the exact result X. In that case your estimate will be X itself.
    Information | Best estimate of X    | where
    none        | E(X | no info) = φ    | φ ≡ E(X)
    partial     | E(X | P) = φ(P)       | φ(p) = 4 if p = 0, 3 if p = 1
    complete    | E(X | X) = φ(X)       | φ(x) = x
Example. Suppose you roll two fair dice and let X be the number on the first die, and Y be the total on both dice. Calculate (a) E(Y | X) and (b) E(X | Y).

(a)

    E(Y | X)(x) = Σ_y y P(Y = y | X = x) = Σ_{w=1}^{6} (x + w)(1/6) = x + 3.5,

so that E(Y | X) = X + 3.5. The variable w in the sum above stands for the value on the second die.

(b)

    E(X | Y)(y) = Σ_x x P(X = x | Y = y)
                = Σ_x x P(X = x, Y = y)/P(Y = y)
                = Σ_x x P(X = x, Y − X = y − x)/P(Y = y)
                = Σ_x x P(X = x) P(Y − X = y − x)/P(Y = y).
Now

    P(Y = y) = (y − 1)/36  for y = 2, 3, 4, 5, 6, 7,
    P(Y = y) = (13 − y)/36 for y = 8, 9, 10, 11, 12,

and

    P(Y − X = y − x) = 1/6 for y − 6 ≤ x ≤ y − 1.
For 2 ≤ y ≤ 7 we get

    E(X | Y)(y) = Σ_{x=1}^{y−1} x (1/36)/((y − 1)/36) = (1/(y − 1)) Σ_{x=1}^{y−1} x = (1/(y − 1)) · (y − 1)y/2 = y/2.

For 7 ≤ y ≤ 12 we get

    E(X | Y)(y) = Σ_{x=y−6}^{6} x (1/36)/((13 − y)/36) = (1/(13 − y)) Σ_{x=y−6}^{6} x = y/2.

Therefore our best estimate is E(X | Y) = Y/2.
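A quick simulation (ours, not from the notes) makes the answer believable:

    import random
    from collections import defaultdict

    sums = defaultdict(lambda: [0, 0])        # y -> [sum of first-die values, count]
    for _ in range(200_000):
        x, w = random.randint(1, 6), random.randint(1, 6)
        rec = sums[x + w]
        rec[0] += x
        rec[1] += 1

    for y in range(2, 13):
        sx, n = sums[y]
        print(y, round(sx / n, 2), y / 2)     # empirical E(X | Y = y) vs y/2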
If X1, X2, . . . is a sequence of random variables we will use Fn to denote "the information contained in X1, . . . , Xn" and we will write E(Y | Fn) for E(Y | X1, . . . , Xn).

Definition. E(Y | Fn) is the unique random variable satisfying the following two conditions:

1. E(Y | Fn) depends only on the information in Fn. That is, there is some function φ so that E(Y | Fn) = φ(X1, X2, . . . , Xn).

2. If Z is a random variable that depends only on Fn, then E(E(Y | Fn)Z) = E(Y Z).
Properties:

1. E(E(Y | Fn)) = E(Y)
2. E(aY1 + bY2 | Fn) = aE(Y1 | Fn) + bE(Y2 | Fn)
3. If Y is a function of Fn, then E(Y | Fn) = Y
4. For m < n, we have E(E(Y | Fn) | Fm) = E(Y | Fm)
5. If Y is independent of Fn, then E(Y | Fn) = E(Y)
Example 1. Let X1, X2, . . . be independent random variables with mean µ and set Sn = X1 + X2 + · · · + Xn. Let Fn = σ(X1, . . . , Xn) and m < n. Then

    E(Sn | Fm) = E(X1 + · · · + Xm | Fm) + E(Xm+1 + · · · + Xn | Fm)
               = X1 + · · · + Xm + E(Xm+1 + · · · + Xn)
               = Sm + (n − m)µ.
Example 2. Let X1, X2, . . . be independent random variables with mean µ = 0 and variance σ². Set Sn = X1 + X2 + · · · + Xn. Let Fn = σ(X1, . . . , Xn) and m < n. Then

    E(Sn² | Fm) = E((Sm + (Sn − Sm))² | Fm)
                = E(Sm² + 2Sm(Sn − Sm) + (Sn − Sm)² | Fm)
                = Sm² + 2Sm E(Sn − Sm | Fm) + E((Sn − Sm)²)
                = Sm² + 0 + Var(Sn − Sm)
                = Sm² + (n − m)σ².
Example 3.
B Martingales
Let X0 , X1 , · · · be a sequence of random variables and define Fn = σ(X0 , X1 , . . . , Xn ) to be the “information in X0 , X1 , . . . , Xn ”. The family (Fn )∞ n=0 is called the filtration generated by X0 , X1 , . . .. A sequence Y0 , Y1 , . . . is said to be adapted to the filtration if Yn ∈ Fn for every n ≥ 0, i.e., Yn = φn (X0 , . . . , Xn ) for some function φn .
Definition. The sequence M0, M1, . . . of random variables is called a martingale (with respect to (Fn)) if

(a) E(|Mn|) < ∞ for n ≥ 0.
(b) (Mn) is adapted to (Fn).
(c) E(Mn+1 | Fn) = Mn for n ≥ 0.

Note that if (Mn) is an (Fn) martingale, then E(Mn+1 − Mn | Fn) = 0 for all n. Therefore if m < n,

    E(Mn − Mm | Fm) = E( Σ_{j=m}^{n−1} (Mj+1 − Mj) | Fm )
                    = E( Σ_{j=m}^{n−1} E(Mj+1 − Mj | Fj) | Fm )
                    = 0,

so that E(Mn | Fm) = Mm.

Another note: Suppose (Mn) is an (Fn) martingale, and define FnM = σ(M0, M1, . . . , Mn). Then Mn ∈ FnM for all n, and FnM ⊆ Fn. Therefore

    E(Mn+1 | FnM) = E(E(Mn+1 | Fn) | FnM) = E(Mn | FnM) = Mn,

so (Mn) is an (FnM)-martingale.

Example 1. Let X1, X2, . . . be independent random variables with mean µ. Put S0 = 0 and Sn = X1 + · · · + Xn for n ≥ 1. Then Mn := Sn − nµ is an (Fn) martingale.

Proof.

    E(Mn+1 − Mn | Fn) = E(Xn+1 − µ | Fn)
                      = E(Xn+1 | Fn) − µ
                      = µ − µ
                      = 0.
Example 2. Martingale betting strategy. Let X1, X2, . . . be independent random variables with P(X = 1) = P(X = −1) = 1/2. These represent the outcomes of a fair game that we will bet on. We start with a one dollar bet, and keep doubling our bet until we win once, then stop. Let W0 = 0 and, for n ≥ 1, let Wn be our winnings after n bets: this is either equal to 1 or to −(1 + 2 + · · · + 2^{n−1}) = 1 − 2^n. If Wn = 1, then Wn+1 = 1 also, since we've stopped betting. That is, E(Wn+1 | Wn = 1) = 1 = Wn. On the other hand, if Wn = 1 − 2^n, then we bet 2^n dollars, so that

    P(Wn+1 = 1 | Wn = 1 − 2^n) = 1/2,  P(Wn+1 = 1 − 2^{n+1} | Wn = 1 − 2^n) = 1/2.

Putting this together we get

    E(Wn+1 | Wn = 1 − 2^n) = (1/2)(1) + (1/2)(1 − 2^{n+1}) = 1 − 2^n = Wn.

Thus E(Wn+1 | Wn) = Wn in either case, so (Wn) is a martingale.
E(Bn+1 Xn+1 j Fn ) Bn+1 E(Xn+1 j Fn ) Bn+1 E(Xn+1 ) 0,
so (Wn ) is again a martingale. This is a discrete version of stochastic integration with respect to a martingale. Note that example 2 is the case where we set B1 = 1 and º j−1 if X1 , X2 , . . . , Xj−1 = −1 . Bj = 2 0 otherwise
Example 4. Polya's urn. Begin with an urn that holds two balls: one red and the other green. Draw a ball at random, then return it with another of the same color. Define Xn to be the number of red balls in the urn after n draws. Then Xn is a time inhomogeneous Markov chain with

    P(Xn+1 = k + 1 | Xn = k) = k/(n + 2),  P(Xn+1 = k | Xn = k) = 1 − k/(n + 2).

This gives

    E(Xn+1 | Xn = k) = (k + 1) · k/(n + 2) + k · (1 − k/(n + 2)) = k (n + 3)/(n + 2),

so that E(Xn+1 | Xn) = Xn (n + 3)/(n + 2). From the Markov property we get

    E(Xn+1 | Fn) = Xn (n + 3)/(n + 2),

and dividing we obtain

    E( Xn+1/((n + 1) + 2) | Fn ) = Xn/(n + 2).

If we define Mn = Xn/(n + 2), then (Mn) is a martingale. Here Mn stands for the proportion of red balls in the urn after the nth draw.
C Optional sampling theorem
Definition. A process (Xn) is called a supermartingale if it is adapted and E(Xn+1 | Fn) ≤ Xn for n ≥ 0. A process (Xn) is called a submartingale if it is adapted and E(Xn+1 | Fn) ≥ Xn for n ≥ 0.

Definition. A random variable τ with values in {0, 1, . . .} ∪ {∞} is called a stopping time if {ω : τ(ω) ≤ n} ∈ Fn for n ≥ 0.
Proposition. If (Xn) is a supermartingale and 0 ≤ σ ≤ τ are bounded stopping times, then E(Xτ) ≤ E(Xσ).

Proof. Let k be the bound, i.e., 0 ≤ σ ≤ τ ≤ k. We prove the result by induction on k. If k = 0, then obviously the result is true. Now suppose the result is true for k − 1. Write

    E(Xσ − Xτ) = E(X_{σ∧(k−1)} − X_{τ∧(k−1)}) − E((Xk − Xk−1) 1_{σ≤k−1, τ=k}).

The first term on the right is non-negative by the induction hypothesis. As for the second term, note that {σ ≤ k − 1, τ = k} = {σ ≤ k − 1} ∩ {τ ≤ k − 1}^c ∈ Fk−1. Therefore

    E((Xk − Xk−1) 1_{σ≤k−1, τ=k}) = E(E(Xk − Xk−1 | Fk−1) 1_{σ≤k−1, τ=k}) ≤ 0,

which gives the result.
Optional sampling theorem. If (Mn) is a martingale and T a finite stopping time, then under suitable conditions E(M0) = E(MT).

Proof. For each k, Tk := T ∧ k is a bounded stopping time, so that E(M0) = E(M_{Tk}). But E(MT) = E(M_{Tk}) + E((MT − Mk) 1_{T>k}), so to prove the theorem you need to argue that E((MT − Mk) 1_{T>k}) → 0 as k → ∞.
Warning. The simple symmetric random walk Sn is a martingale, and T := inf{n : Sn = 1} is a stopping time with P(T < ∞) = 1. However E(S0) = 0 ≠ E(ST) = 1, so the optional sampling theorem fails.
Analysis of random walk using martingales. Let X1, X2, . . . be independent with P(X = −1) = q, P(X = 1) = p, and P(X = 0) = 1 − (p + q). Note that µ = p − q and σ² = p + q − (p − q)². Let S0 = j and Sn = S0 + X1 + · · · + Xn, and define the stopping time T := inf{n ≥ 0 : Sn = 0 or Sn = N}, where we assume that 0 ≤ j ≤ N.

1. (Case p = q) Since (Sn) is a martingale, we have

    E(S0) = E(ST)
    j = 0 × P(ST = 0) + N × P(ST = N),

which implies that P(ST = N) = j/N.

2. (Case p ≠ q) Now (Sn − nµ) is a martingale, so we have

    E(S0 − 0µ) = E(ST − Tµ)
    j = N × P(ST = N) − E(T)µ,

which unfortunately leaves us with two unknowns. To overcome this problem, we introduce another martingale: Mn = (q/p)^{Sn} (check that this really is a martingale!). By optional stopping,

    E((q/p)^{S0}) = E((q/p)^{ST})
    (q/p)^j = (q/p)^0 P(ST = 0) + (q/p)^N P(ST = N)
            = (1 − P(ST = N)) + (q/p)^N P(ST = N).

A little algebra now shows that

    P(ST = N) = (1 − (q/p)^j)/(1 − (q/p)^N) and E(T) = (p − q)^{−1} ( N (1 − (q/p)^j)/(1 − (q/p)^N) − j ).
3. (Case p = q) Now (Sn² − nσ²) is a martingale, so we have

    E(S0² − 0σ²) = E(ST² − Tσ²)
    j² = N² × P(ST = N) − E(T)σ².

Substitute P(ST = N) = j/N and solve to obtain

    E(T) = j(N − j)/(p + q).
[Figure: probability of ruin as a function of the starting point j ∈ {0, . . . , 20}, for the fair game p = q = 1/2 and for p = 9/19, q = 10/19.]

[Figure: average length of the game as a function of the starting point, for the same two cases.]
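The two plots can be regenerated from the formulas. A minimal sketch (ours; the function name is our own):

    def ruin_stats(j, N, p, q):
        # P(hit N before 0) and E(duration), from the optional stopping formulas
        if p == q:
            win, length = j / N, j * (N - j) / (p + q)
        else:
            r = q / p
            win = (1 - r**j) / (1 - r**N)
            length = (N * win - j) / (p - q)
        return win, length

    for j in (5, 10, 15):
        print(j, ruin_stats(j, 20, 1/2, 1/2), ruin_stats(j, 20, 9/19, 10/19))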
Waiting for patterns: In tossing a fair coin, how long on average until you see the pattern HTH? Imagine a gambler who wants to see HTH and follows the "play until you lose" strategy: at time 1 he bets one dollar on H; if it is T he loses and quits, otherwise he wins one dollar. Now he has two dollars to bet on T; if it is H he loses and quits, otherwise he wins two more dollars. In that case, he bets his four dollars on H; if it is T he loses and quits, otherwise he wins four dollars and stops. His winnings Wn^1 form a martingale with W0^1 = 0. Now imagine that at each time j ≥ 1 another gambler begins and bets on the same coin tosses using the same strategy. These gamblers' winnings are labelled Wn^2, Wn^3, . . . Note that Wn^j = 0 for n < j. Define Wn = Σ_{j=1}^{n} Wn^j, the total winnings, and let T be the first time the pattern is completed. By optional stopping, E(WT) = E(W0) = 0. From the casino's point of view this means that the average income equals the average payout.
    Income:       $1  $1  $1  $1  · · ·  $1  $1  $1  $1
    Coin tosses:   H   T   T   H  · · ·   H   H   T   H
    Payout:       $0  $0  $0  $0  · · ·  $0  $8  $0  $2
Examining this diagram, we see that the total income is T dollars, while the total payout is 8 + 2 = 10 dollars, and conclude that E(T) = 10. Fortunately, you don't need to go through the whole analysis every time you solve one of these problems; just figure out how much the casino has to pay out. For instance, if the desired pattern is HHH, then the casino pays out the final three bettors a total of 8 + 4 + 2 = 14 dollars, thus E(T) = 14.

Example. If a monkey types on a keyboard, randomly choosing letters, how long on average before we see the word MONKEY? Answer: 26^6 = 308915776.
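A simulation (ours, not in the notes) agrees with the casino bookkeeping:

    import random

    def mean_wait(pattern, trials=50_000):
        total = 0
        for _ in range(trials):
            window, n = "", 0
            while not window.endswith(pattern):
                window = (window + random.choice("HT"))[-len(pattern):]
                n += 1
            total += n
        return total / trials

    print(mean_wait("HTH"))   # about 10
    print(mean_wait("HHH"))   # about 14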
Guessing Red: A friend turns over the cards of a well shuffled deck one at a time. You can stop anytime you choose and bet that the next card is red. What is the best strategy?

Solution: Let Rn be the number of red cards left after n cards have been turned over. Then

    Rn+1 = Rn with probability 1 − p, and Rn+1 = Rn − 1 with probability p,

where p = Rn/(52 − n), the proportion of reds left. Taking expectations we get

    E(Rn+1 | Rn) = Rn − Rn/(52 − n) = Rn (52 − (n + 1))/(52 − n),

so that

    E( Rn+1/(52 − (n + 1)) | Rn ) = Rn/(52 − n).

This means that Mn := Rn/(52 − n) is a martingale. Now let T represent your stopping strategy. By the optional stopping theorem,

    P(T is successful) = E(MT) = E(M0) = 1/2.

Every strategy has a 50% chance of success!
An application to linear algebra: First note that if (Xn) is a Markov chain and v a superharmonic function, then the process v(Xn) is a supermartingale:

    E(v(Xn+1) | Fn) = E(v(Xn+1) | Xn) = Σ_{y∈S} v(y) p(Xn, y) ≤ v(Xn).
Theorem. Let P be an n × n matrix with p_{ij} > 0 and Σ_j p_{ij} = 1 for all i. Then the eigenvalue 1 is simple.

Proof. Let (Xn) be the Markov chain with transition matrix P on state space S = {1, 2, . . . , n}. A function u : S → R is harmonic if and only if the
vector u = (u(1), u(2), . . . , u(n))^T satisfies P u = u, i.e., u is a right eigenvector for the eigenvalue 1. Clearly the constant functions are harmonic; we want to show that they are the only ones. Suppose u is a right eigenvector, so that u : S → R is harmonic and u(Xn) is a (bounded!) martingale. Let x, y ∈ S and Ty := inf{n ≥ 0 : Xn = y} be the first time the chain hits state y. Since the chain is communicating, we have P(Ty < ∞ | X0 = x) = 1, and so

    u(y) = Ex(u(X_{Ty})) = Ex(u(X0)) = u(x).
D Martingale convergence theorem
Theorem. If (Mn) is a martingale with sup_n E(|Mn|) < ∞, then there is a random variable M∞ so that Mn → M∞.

Proof. It suffices to show that for any −∞ < a < b < ∞, the probability that (Mn) fluctuates infinitely often between a and b is zero. To see this, we define a new martingale (Wn), which is the total "winnings" for a particular betting strategy. The strategy is to wait until the process goes below a, then keep betting until the process goes above b, and repeat. The winnings on the jth bet is Mj − Mj−1, so that

    W0 = 0,  Wn = Σ_{j=1}^{n} Bj (Mj − Mj−1), n ≥ 1.
The following diagram explains the relationship between the two martingales.
[Figure: a sample path of the M process crossing the levels a < b, with the corresponding W process below. Bets are on while M is working its way from below a up above b, and off otherwise; the final shortfall of W is at most |Mn − a|.]
Notice that Wn ≥ (b − a)Un − |Mn − a|, where Un is the number of times that (Mn) "upcrosses" the interval [a, b]. Therefore

    0 = E(W0) = E(Wn) ≥ (b − a)E(Un) − E(|Mn − a|),

so that E(Un) ≤ E(|Mn − a|)/(b − a). Taking the supremum in n on both sides gives E(U∞) < ∞, which shows that P(U∞ = ∞) = 0. So Mn → M∞, but M∞ may take the values −∞ or ∞. Luckily Fatou's lemma comes to the rescue, showing us that E(|M∞|) ≤ lim inf_n E(|Mn|) < ∞. Thus P(|M∞| < ∞) = 1 and E(M∞) exists. Although E(M0) = E(Mn) for all n, we cannot always let n → ∞ to conclude that E(M0) = E(M∞).
Examples. 1. Polya's urn. Let Mn be the proportion of red balls in Polya's urn at time n. Then (Mn) is a martingale and 0 ≤ Mn ≤ 1, so sup_n E(|Mn|) ≤ 1. Therefore Mn → M∞ for some random variable M∞. It turns out that M∞ has a uniform distribution on (0, 1).
[Figure: three sample paths of the proportion of red balls in Polya's urn over the first 100 draws.]
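A simulation sketch (ours) supporting the uniform-limit claim: sorting many simulated limiting proportions, the deciles come out near 0.1, 0.2, . . . , 0.9.

    import random

    def polya_proportion(draws=2000):
        red, total = 1, 2
        for _ in range(draws):
            if random.random() < red / total:   # draw a red ball
                red += 1                        # return it with another red
            total += 1
        return red / total

    samples = sorted(polya_proportion() for _ in range(10_000))
    print([round(samples[i * 1000], 2) for i in range(1, 10)])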
2. Branching process. Let Xn be a branching process, and put Mn := Xn/µ^n, so that (Mn) is a martingale. If µ ≤ 1, then Mn → M∞ = 0 (extinction). If µ > 1, then (Mn) is uniformly integrable and Mn → M∞ where E(M∞) = 1.

3. Random harmonic series. The harmonic series diverges, but not the alternating harmonic series:

    1 + 1/2 + 1/3 + 1/4 + · · · + 1/j + · · · = ∞,
    1 − 1/2 + 1/3 − 1/4 + · · · + (−1)^{j+1}/j + · · · = ln 2.

Here the positive and negative terms partly cancel, allowing the series to converge.
Let's choose plus and minus signs at random, by tossing a fair coin. Formally, let (εj) be independent random variables with common distribution P(εj = 1) = P(εj = −1) = 1/2. Then the martingale convergence theorem shows that the sequence Mn = Σ_{j=1}^{n} εj/j converges almost surely. The limit M∞ := Σ_{j=1}^{∞} εj/j has a smooth density.

[Figure: the density of Σ_{j=1}^{∞} εj/j, a smooth bump supported on roughly (−3, 3).]
4. Recurrent Markov chain. Let (Xn ) be an irreducible, recurrent Markov chain on a countable state space S. Suppose that u is a bounded harmonic function. Then Mn := u(Xn ) is a bounded martingale and so Mn → M∞ as n → ∞. But Xn is recurrent and visits every state infinitely often, so u(Xn ) can only be convergent if u is constant.
5 Brownian motion

A Basic properties
Brownian motion is our standard model for continuous random movement. We get a Brownian motion (Xt) by assuming

(1) Independent increments: for s1 < t1 < s2 < t2 < · · · < sn < tn the random variables Xt1 − Xs1, . . . , Xtn − Xsn are independent.
(2) Stationarity: the distribution of Xt − Xs depends only on t − s.
(3) Continuous paths: the sample path t ↦ Xt(ω) is continuous with probability 1.

Here the random variables Xt take values in the state space R^d for d ≥ 1, the starting point X0 = x ∈ R^d can be anywhere, and E(Xt) = µt for some fixed µ ∈ R^d. When µ ≠ 0 we have Brownian motion with drift, while if d > 1 we call (Xt) multi-dimensional Brownian motion. Conditions (1)–(3) imply that Xt has a (multivariate) normal distribution for t > 0.

Definition. The process (Xt) is called standard d-dimensional Brownian motion when µ = 0 and the covariance matrix of Xt satisfies E[(Xt − X0)(Xt − X0)′] = tI. In this case, the coordinates of the vector Xt = (Xt^1, Xt^2, . . . , Xt^d) are independent 1-dimensional Brownian motions. Brownian motion is a Markov process with transition kernel

    P(Xt = y | X0 = x) = pt(x, y) = (2πt)^{−d/2} e^{−‖y−x‖²/2t},  y ∈ R^d.

This kernel satisfies the Chapman-Kolmogorov equation

    p_{s+t}(x, y) = ∫ ps(x, z) pt(z, y) dz.
For standard d-dimensional Brownian motion, we have

    E(‖Xt − X0‖²) = Σ_{j=1}^{d} E((Xt^j − X0^j)²) = dt,

so that, on average, d-dimensional Brownian motion is about √(dt) units from its starting position at time t.
In fact, the average speed of Brownian motion over [0, t] is E(‖Xt − X0‖)/t ∼ √(d/t). For small t this is near ∞, while for large t it is near zero.

Proposition. Xt is not differentiable at t = 0, i.e., P(ω : Xt′(ω) exists at t = 0) = 0.

Proof:

    {ω : Xt′(ω) exists at t = 0} ⊆ {ω : sup_{0<t≤1} ‖Xt(ω) − X0(ω)‖/t < ∞}
        ⊆ ∪_{k=1}^{∞} {ω : sup_n 2^{n−1} ‖X_{2^{−n}}(ω) − X_{2^{−(n−1)}}(ω)‖ < k}.

Define Ak = {ω : sup_n 2^{n−1} ‖X_{2^{−n}}(ω) − X_{2^{−(n−1)}}(ω)‖ < k}. The random variables Zn := 2^{n−1}(X_{2^{−n}} − X_{2^{−(n−1)}}) are independent multivariate normal, so

    P(Ak) = Π_n P(‖Zn‖ < k) = 0, and P(ω : Xt′(ω) exists at t = 0) ≤ Σ_k P(Ak) = 0. □

A more complicated argument gives

    P(ω : t ↦ Xt(ω) is not differentiable at any t ≥ 0) = 1.
[Figure: Brownian motion sample paths at a large time scale (0 ≤ t ≤ 100) and at a small time scale (0 ≤ t ≤ 0.01).]
B The reflection principle
By the three ingredients (1)–(3) that define Brownian motion, we see that for any fixed s ≥ 0, the process (X_{t+s} − Xs) is a Brownian motion, independent of Fs, that starts at the origin. In other words, (X_{t+s}) is a Brownian motion, independent of Fs, with random starting point Xs. An important generalization says that if T is a finite stopping time, then (X_{T+t}) is independent of FT, with random starting point XT.

Suppose Xt is a standard 1-dimensional Brownian motion starting at x, and let x < b. We will prove that

    P(Xs ≥ b for some 0 ≤ s ≤ t) = 2P(Xt ≥ b).

This follows from stopping the process at Tb, the first time (Xt) hits the point {b}, then using symmetry. The picture below will help you to understand the calculation:

    P(Xt ≥ b) = P(Xt ≥ b | Tb ≤ t) P(Tb ≤ t) = (1/2) P(Tb ≤ t),

which gives the result.
[Figure: the reflection principle. A path starting at x hits the level b at time T; reflecting the path after time T pairs each path ending above b with an equally likely path ending below b.]
If we now fix a < x < b, we may write explicitly

    Px(Tb ≤ t) = 2Px(Xt ≥ b) = 2P( Z ≥ (b − x)/√t ),

where Z is a standard normal random variable. Letting t → ∞ we find Px(Xt ever hits b) = 2P(Z ≥ 0) = 1.
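A crude numerical check (ours) by discretizing the path; the simulated path slightly underestimates the hitting probability because it can cross b between grid points:

    import math, random

    def tail(z):                                  # P(Z >= z), Z standard normal
        return 0.5 * math.erfc(z / math.sqrt(2))

    b, x, t, dt, trials = 1.0, 0.0, 1.0, 1e-4, 20_000
    hits = 0
    for _ in range(trials):
        w, s = x, 0.0
        while s < t and w < b:
            w += random.gauss(0, math.sqrt(dt))
            s += dt
        hits += (w >= b)
    print(hits / trials, 2 * tail((b - x) / math.sqrt(t)))   # both near 0.317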
This shows that 1-dimensional Brownian motion will eventually hit every value greater than the starting position. Since −Xt is also a Brownian motion, we argue that it will also hit every value less than the starting point:

    Px(Xt hits a) = P_{−x}(−Xt hits −a) = 1.

Now we use the strong Markov property again to show that

    Px(Xt hits b, then hits a) = Px(Xt hits b) Pb(Xt hits a) = 1.

In particular it must return to its starting point. You can extend this argument to prove Px(Xt hits all points infinitely often) = 1.
Now let T be the hitting time of the set {a, b}. Since (Xt) is a martingale, we have

    x = Ex(X0) = Ex(XT) = a Px(XT = a) + b Px(XT = b).

Using the fact that Px(XT = a) + Px(XT = b) = 1, we can conclude that Px(XT = b) = (x − a)/(b − a). Just like for the symmetric random walk, (Xt² − t) is a martingale, so

    Ex(X0² − 0) = Ex(XT² − T)
    x² = a² Px(XT = a) + b² Px(XT = b) − Ex(T).

The previous result plus a little algebra shows that Ex(T) = (b − x)(x − a). If we let a → −∞, we find that, although Px(Tb < ∞) = 1, we have Ex(Tb) = ∞.
C The Dirichlet problem

Let (Xt) be a d-dimensional Brownian motion and f : R^d → R. We want to study the function u(t, x) := Ex(f(Xt)).
The Taylor series of f about the point z is

    f(y) = f(z) + ⟨∇f(z), y − z⟩ + (1/2)⟨y − z, D²f(z)(y − z)⟩ + o(|y − z|²).

Setting y = Xt, z = Xs, and taking expectations we get

    Ex(f(Xt)) = Ex(f(Xs)) + Σ_i Ex(∂_i f(Xs)) Ex(Xt^i − Xs^i)
                + (1/2) Σ_{i,j} Ex(∂²_{ij} f(Xs)) Ex[(Xt^i − Xs^i)(Xt^j − Xs^j)] + o(E(‖Xt − Xs‖²))
              = Ex(f(Xs)) + (1/2) Ex( Σ_i ∂²_{ii} f(Xs) )(t − s) + o(|t − s|).

Therefore we see that u satisfies

    (∂/∂t) u(t, x) = (1/2) Ex(Δf(Xt)).

To find the spatial derivatives of u we use the translation invariance of Brownian motion:

    Ey(f(Xt)) = Ex(f(Xt + (y − x)))
              = Ex( f(Xt) + ⟨(∇f)(Xt), y − x⟩ + (1/2)⟨y − x, D²f(Xt)(y − x)⟩ + o(‖y − x‖²) )
              = Ex(f(Xt)) + ⟨Ex((∇f)(Xt)), y − x⟩ + (1/2)⟨y − x, Ex(D²f(Xt))(y − x)⟩ + o(‖y − x‖²).

In particular, we have D²u(t, x) = Ex(D²f(Xt)) and hence Δu(t, x) = Ex((Δf)(Xt)). In other words, u satisfies the "heat equation"

    (∂/∂t) u(t, x) = (1/2)(Δu)(t, x).

Let us explore the connection with the heat equation on a bounded region D of R^d. We fix a temperature distribution g on ∂D (the boundary) for all time, and begin with an initial temperature distribution f in D at time 0. The latter distribution will flow and eventually dissipate completely.
The solution to the heat equation for x ∈ D can be expressed as

    u(t, x) = Ex( f(Xt) 1_{t<T} + g(XT) 1_{T≤t} ),

where T is the first time the Brownian motion hits the boundary ∂D. Letting t → ∞, the influence of f dissipates and we are left with the steady-state temperature u(x) = Ex(g(XT)), which is harmonic in D and equals g on ∂D; finding such a function is called the Dirichlet problem. For example, for one-dimensional Brownian motion on D = (a, b) with g(a) = 0 and g(b) = 1, this gives

    u(x) = (x − a)/(b − a).

[Figure: the linear function u(x) = (x − a)/(b − a) on the interval [a, b].]
That's very nice, but let me plant the seeds of doubt by asking three questions and looking at a couple of counterexamples.

Questions

1. Do we know that Px(T < ∞) = 1?
2. Is u continuous at the boundary?
3. Is there more than one solution to the Dirichlet problem?

Example 1. In R², let D = {(x1, x2) : x1 > 0}, the open right half plane. The functions u1(x) ≡ 0 and u2(x) = x1 are both harmonic and equal zero on ∂D.
Theorem 1. If D is a bounded open set, u1, u2 are harmonic on D, continuous on the closure D̄, and u1 = u2 on ∂D, then u1 = u2 on D̄.

Example 2. Let D = {(x1, x2) : 0 < x1² + x2² < 1} be the punctured disc, and put g(x) = 1 − |x| on ∂D. That is, zero when |x| = 1 and 1 at x = 0. For x ∈ D, we have u(x) = Ex(g(XT)) = Px(XT = 0). It turns out that u(x) = 0 for all x ∈ D.

Theorem 2. If ∂D is smooth, and g continuous on ∂D, then u(x) → g(y) as x ∈ D → y ∈ ∂D.

Example 3. Let D be bounded with a smooth boundary ∂D. Define u(x) = Px(T_{∂D} < ∞). Then u is harmonic on D and u ≡ 1 on ∂D. Therefore, theorems 1 and 2 tell us that u(x) = 1 for all x ∈ D, that is, Px(T_{∂D} < ∞) = 1 for all x ∈ D.

Example 4. Let D1 be any bounded set in R^d. Let D2 be an open sphere so large that D1 ⊆ D2. Since D2 is a bounded open set with a smooth boundary, we have Px(T_{∂D1} < ∞) ≥ Px(T_{∂D2} < ∞) = 1 for all x ∈ D1.

Now that we've cleared those points up, let's return to the problem of finding probabilities by solving the Dirichlet problem.

[Figure: an annulus centered at the origin, with inner radius R1, outer radius R2, and a point x in between.]
The probability that Brownian motion reaches the outer boundary first is given by v(x) = Ex(g(XT)), where g(x) = 1 if |x| = R2 and g(x) = 0 if |x| = R1. The function v will be harmonic in between. Now, the symmetry
of Brownian motion implies that the probability is the same for all x with a common radius. So we can write v(x) = φ(r), where r = (Σ_{i=1}^{d} x_i²)^{1/2}. Taking derivatives of the function r, we find

    ∂_i r = (1/2)(Σ_{i=1}^{d} x_i²)^{−1/2} · 2x_i = x_i/r,
    ∂_i[φ(r)] = φ′(r) ∂_i r = φ′(r) x_i/r,

so that

    ∂_{ii}[φ(r)] = φ″(r)(x_i/r)(x_i/r) + φ′(r) ∂_i[x_i/r]
                 = φ″(r) x_i²/r² + φ′(r)( 1/r + x_i (−1/r²)(x_i/r) )
                 = φ″(r) x_i²/r² + φ′(r)/r − φ′(r) x_i²/r³.

Adding over i gives

    Δ[φ(r)] = Σ_{i=1}^{d} ∂_{ii}[φ(r)] = φ″(r) r²/r² + φ′(r) d/r − φ′(r) r²/r³ = φ″(r) + φ′(r)(d − 1)/r.
Solving the one variable equation Δ[φ(r)] = 0 we get the solution

    v(x) = (ln|x| − ln R1)/(ln R2 − ln R1)                  if d = 2,
    v(x) = (R1^{2−d} − |x|^{2−d})/(R1^{2−d} − R2^{2−d})     if d ≥ 3.

We learn something interesting by taking limits as R2 → ∞. For d = 2,

    Px(Xt ever hits B(0, R1)) = lim_{R2→∞} (1 − v(x)) = 1.
Two dimensional Brownian motion will hit any ball, no matter how small, from any starting point. If we pursue this argument, we can divide R²
using a fine grid, and find that 2-d Brownian motion will visit every section infinitely often. On the other hand, if we leave R2 alone and let R1 → 0, we get

    Px(Xt hits 0 before |Xt| = R2) = lim_{R1→0} (1 − v(x)) = 0,

and if we now let R2 → ∞ we discover Px(Xt ever hits 0) = 0. Two dimensional Brownian motion will never hit any particular point. The process is neighborhood recurrent but not point recurrent.

For d ≥ 3, if we let R2 → ∞ we get

    Px(Xt ever hits B(0, R1)) = lim_{R2→∞} (1 − v(x)) = (|x|/R1)^{2−d}.
Since this is less than one, we see that Brownian motion is transient when d ≥ 3. It turns out that whether or not d-dimensional Brownian motion will hit a set depends on its fractional dimension. The process can hit sets of dimension greater than d − 2, but cannot hit sets of dimension less than d − 2. In the d − 2 case, it depends on the particular set.
6 Stochastic integration

A Integration with respect to random walk
Let X1, X2, . . . be independent random variables with P(Xi = 1) = P(Xi = −1) = 1/2. The symmetric random walk can be expressed as Sn = X1 + · · · + Xn, so that Xi = Si − Si−1 = ΔSi. Let Fn denote the information in X1, . . . , Xn, and let Bn be the amount "bet" on the nth game. We require that Bn ∈ Fn−1, i.e., the B-process is predictable. The winnings up to time n can be written

    Zn = Σ_{i=1}^{n} Bi Xi = Σ_{i=1}^{n} Bi ΔSi,
so we can call Z the integral of B with respect to S. Recall that Z is a martingale:

    E(Zn+1 − Zn | Fn) = E(Bn+1 Xn+1 | Fn) = Bn+1 E(Xn+1) = 0.

In particular, E(Zn) = 0. What about the variance Var(Zn) = E(Zn²)? Squaring the sum gives

    Zn² = Σ_i Bi² Xi² + 2 Σ_{i<j} Bi Bj Xi Xj = Σ_i Bi² + 2 Σ_{i<j} Bi Bj Xi Xj.

For i < j we have

    E(Bi Bj Xi Xj) = E(E(Bi Bj Xi Xj | Fj−1)) = E(Bi Bj Xi E(Xj)) = 0,

so E(Zn²) = Σ_i E(Bi²).
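A simulation sketch (ours), using the predictable bet Bi = |Si−1| + 1, illustrates both facts:

    import random

    def one_run(n=20):
        s, z, sum_b2 = 0, 0.0, 0.0
        for _ in range(n):
            b = abs(s) + 1                 # B_i in F_{i-1}: depends only on the past
            x = random.choice((-1, 1))
            z += b * x
            sum_b2 += b * b
            s += x
        return z, sum_b2

    runs = [one_run() for _ in range(50_000)]
    print(sum(z for z, _ in runs) / len(runs))       # E(Z_n), near 0
    print(sum(z * z for z, _ in runs) / len(runs))   # E(Z_n^2)
    print(sum(sb for _, sb in runs) / len(runs))     # E(sum B_i^2), nearly equal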
B Integration with respect to Brownian motion
Many models of random behaviour suppose that a process X satisfies a stochastic differential equation

    dXt = a(Xt) dt + b(Xt) dWt.

Here the function a is called the drift coefficient and b the diffusion coefficient. This equation is understood in the integrated sense:

    Xt = X0 + ∫_0^t a(Xs) ds + ∫_0^t b(Xs) dWs.
Let Wt be a standard 1-dimensional Brownian motion, and Yt the amount "bet" at time t. We want to define the integrated process Zt = ∫_0^t Ys dWs. We assume that ∫_0^t E(Ys²) ds < ∞ and that Yt is Ft measurable.

Simple integrands. Suppose there are a finite number of times 0 = t0 < t1 < t2 < · · · < tn and that the process (Yt) can be written

    Yt = Y0 for 0 ≤ t < t1,  Yt = Y1 for t1 ≤ t < t2,  . . . ,  Yt = Yn for tn ≤ t < ∞.

We assume E(Yi²) < ∞ and Yi ∈ F_{ti} for all i. Then it makes sense to define, for tj < t ≤ tj+1,

    Zt = ∫_0^t Ys dWs = Σ_{i=1}^{j} Y_{i−1}[W_{ti} − W_{ti−1}] + Yj[Wt − W_{tj}].

[Figure: the timeline 0 = t0 < t1 < t2 < t3 < · · · < tj < t < tj+1.]

Here are some facts about the integral we've defined.
1. Linearity: If X and Y are two simple integrands, then so is aX + bY, and

    ∫_0^t (aXs + bYs) dWs = a ( ∫_0^t Xs dWs ) + b ( ∫_0^t Ys dWs ).
2. Martingale property: Clearly Zt ∈ Ft and E(Zt²) < ∞. Now if tj ≤ s ≤ t ≤ tj+1 for some j, then Zt − Zs = Yj[Wt − Ws], so

    E(Zt − Zs | Fs) = E(Yj[Wt − Ws] | Fs) = Yj E(Wt − Ws | Fs) = 0,

which is the martingale equation. Now if s ≤ tj < · · · < tk ≤ t, then Zt − Zs = (Z_{tj} − Zs) + Σ_{i=j}^{k−1} (Z_{ti+1} − Z_{ti}) + (Zt − Z_{tk}), so that

    E(Zt − Zs | Fs) = E(Z_{tj} − Zs | Fs) + Σ_{i=j}^{k−1} E(Z_{ti+1} − Z_{ti} | Fs) + E(Zt − Z_{tk} | Fs)
                    = E(Z_{tj} − Zs | Fs) + Σ_{i=j}^{k−1} E(E(Z_{ti+1} − Z_{ti} | F_{ti}) | Fs) + E(E(Zt − Z_{tk} | F_{tk}) | Fs)
                    = 0.

3. Variance formula: E(Zt²) = ∫_0^t E(Ys²) ds. This follows exactly as for integration with respect to random walk.

For integrands Yt that are not simple, we define a simple approximation as follows:

    Yt^(n) = Y_{i/n} for i/n ≤ t < (i + 1)/n.

The stochastic integral Zt = ∫_0^t Ys dWs is defined as the limit

    Zt = lim_{n→∞} ∫_0^t Ys^(n) dWs.

The linearity, martingale property, and variance formula carry over to (Zt).

An example. Let f be a differentiable non-random function. Then

    Zt = ∫_0^t f(s) dWs = (Wt f(t) − W0 f(0)) − ∫_0^t Ws df(s).
Then Zt is a normal random variable with mean zero and variance ∫_0^t f²(s) ds. We can show that Zt has independent increments as well, so that Z is just a time-changed Brownian motion:

    Zt = B( ∫_0^t f²(s) ds ).
C Ito's formula
Let f be a differentiable function and write f(t) as the telescoping sum

    f(t) = f(0) + Σ_{j=0}^{n−1} [f((j+1)t/n) − f(jt/n)]
         ∼ f(0) + Σ_{j=0}^{n−1} f′(jt/n)(t/n) + Σ_{j=0}^{n−1} o(t/n)
         = f(0) + ∫_0^t f′(s) ds + 0.
In a similar vein, let Wt be a Brownian motion and write f(Wt) as a telescoping sum:

    f(Wt) = f(W0) + Σ_{j=0}^{n−1} [f(W_{(j+1)t/n}) − f(W_{jt/n})]
          = f(W0) + Σ_{j=0}^{n−1} f′(W_{jt/n}) (W_{(j+1)t/n} − W_{jt/n})
            + (1/2) Σ_{j=0}^{n−1} f″(W_{jt/n}) (W_{(j+1)t/n} − W_{jt/n})²
            + Σ_{j=0}^{n−1} o( (W_{(j+1)t/n} − W_{jt/n})² ).

The intuition behind Ito's formula is that you can replace (W_{(j+1)t/n} − W_{jt/n})² by t/n with only a small amount of error. Therefore

    f(Wt) = f(W0) + Σ_{j=0}^{n−1} f′(W_{jt/n}) (W_{(j+1)t/n} − W_{jt/n})
            + (1/2) Σ_{j=0}^{n−1} f″(W_{jt/n}) (t/n) + Σ_{j=0}^{n−1} o(t/n) + error.

Letting n → ∞ gives Ito's formula:

    f(Wt) = f(W0) + ∫_0^t f′(Ws) dWs + (1/2) ∫_0^t f″(Ws) ds.
Example. Suppose we want to calculate ∫_0^t Ws dWs. The definition gets us nowhere, so we try to apply the usual rules of calculus:

    ∫_0^t Ws dWs = Wt² − W0² − ∫_0^t Ws dWs,

which implies ∫_0^t Ws dWs = [Wt² − W0²]/2. The only problem is that this formula is false! Since W0 = 0, we can see that it is fishy by taking expectations on both sides: the left hand side gives zero but the right hand side is strictly positive.

The moral of this example is that the usual rules of calculus do not apply to stochastic integrals. So how do we calculate ∫_0^t Ws dWs correctly? Let f(t) = t², so f′(t) = 2t and f″(t) = 2. From Ito's formula we find

    Wt² = W0² + ∫_0^t 2Ws dWs + (1/2) ∫_0^t 2 ds = 2 ∫_0^t Ws dWs + t,

and therefore

    ∫_0^t Ws dWs = (1/2)(Wt² − t).
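The identity can be watched happening path by path. A sketch (ours): the left-endpoint sums Σ W_{tj}(W_{tj+1} − W_{tj}) track (Wt² − t)/2 along each simulated path.

    import math, random

    n, t = 100_000, 1.0
    dt = t / n
    w, integral = 0.0, 0.0
    for _ in range(n):
        dw = random.gauss(0, math.sqrt(dt))
        integral += w * dw                 # Ito sum uses the left endpoint W_{t_j}
        w += dw
    print(integral, 0.5 * (w**2 - t))      # agree up to discretization error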
A more advanced version of Ito's formula can handle functions that depend on t as well as x:

    f(t, Wt) = f(0, W0) + ∫_0^t ∂_s f(s, Ws) ds + ∫_0^t ∂_x f(s, Ws) dWs + (1/2) ∫_0^t ∂_{xx} f(s, Ws) ds.
A peek at math finance. Imagine an economy with two assets: a bond whose value grows at a fixed rate r and a stock whose price per unit (St) is a random variable. If Mt is the cost of one unit of bond at time t, we have dMt = rMt dt, which implies Mt = M0 e^{rt}. For the stock we have

    dSt = St (µ dt + σ dWt).   (∗)

How do we solve this equation? Guess! Let Xt = exp(at + bWt). From Ito's formula with f(t, x) = exp(at + bx) we get

    Xt = X0 + ∫_0^t aXs ds + ∫_0^t bXs dWs + (1/2) ∫_0^t b²Xs ds
       = X0 + (a + (1/2)b²) ∫_0^t Xs ds + b ∫_0^t Xs dWs.

To solve (∗) we set b = σ and a = µ − σ²/2. The solution to our problem is a geometric Brownian motion:

    St = S0 exp( σWt + (µ − σ²/2) t ).
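A final sketch (ours): simulating St from the explicit solution and checking that E(St) = S0 e^{µt}, which follows from the standard fact E(e^{σWt}) = e^{σ²t/2} (not proved in these notes). The parameter values below are arbitrary choices for illustration.

    import math, random

    def gbm_endpoint(S0, mu, sigma, t):
        w = random.gauss(0, math.sqrt(t))                        # W_t ~ N(0, t)
        return S0 * math.exp(sigma * w + (mu - sigma**2 / 2) * t)

    S0, mu, sigma, t = 100, 0.05, 0.2, 1.0
    mean = sum(gbm_endpoint(S0, mu, sigma, t) for _ in range(200_000)) / 200_000
    print(mean, S0 * math.exp(mu * t))                           # both near 105.127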