2.3 Discrete time martingales

P({ω : lim inf_n X_n(ω) < a < b < lim sup_n X_n(ω)}) > 0.      (2.3.5)
This means that {X_n} oscillates about, or up-crosses, the interval [a, b] infinitely many times. However, using Theorem 2.3.6 and the fact that sup_n E[X_n] = E[X_1] < ∞, we have

    lim_n E[C_n[a, b]] ≤ lim_n E[(X_n − a)^+]/(b − a) ≤ (E[X_1] + |a|)/(b − a) < ∞,

which contradicts (2.3.5); that is, P({ω : lim inf X_n(ω) < lim sup X_n(ω)}) = 0. Hence lim_n X_n = X a.s. To finish the proof we must show that E[|X|] < ∞. This follows from Fatou's Lemma 1.3.16.

Theorem 2.3.8  Let (Ω, F, P) be a probability space equipped with a filtration {F_n}. Write F_∞ = ∨_n F_n ⊂ F. Let P̄ be another probability measure on (Ω, F) which is absolutely continuous with respect to P when both are restricted to F_n, for each n (i.e. if P(F) = 0 then P̄(F) = 0, for all F ∈ F_n). Suppose Λ_n are the corresponding Radon–Nikodym derivatives. Then Λ_n converges with probability 1 to an integrable random variable Λ. Moreover, if P̄ is absolutely continuous with respect to P on F_∞, then Λ is the corresponding Radon–Nikodym derivative.

Proof  The first statement of the theorem follows from Theorem 2.3.7; the second statement follows from Theorem 3, page 478 of Shiryayev [36]. See also Example 2.3.4.

Returning to Example 1.3.29:

Example 2.3.9  Suppose (Ω, F, P) is a probability space on which is defined a sequence of random variables Y_1, Y_2, ... and F_n = σ{Y_1, Y_2, ..., Y_n}. Let P̄ be another probability measure on F. Suppose that under P and P̄ the random vector (Y_1, Y_2, ..., Y_n) has densities f_n(.) and f̄_n(.) respectively with respect to n-dimensional Lebesgue measure. Then by Theorem 2.3.8 the Radon–Nikodym derivatives

    Λ_n = (dP̄/dP)|_{F_n} = f̄_n(Y_1, Y_2, ..., Y_n)/f_n(Y_1, Y_2, ..., Y_n)

converge to an integrable and F_∞-measurable random variable Λ.
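A numerical sketch of Example 2.3.9 under illustrative assumptions: suppose that under P the Y_i are i.i.d. N(0, 1), while under P̄ they are i.i.d. N(θ, 1) (hypothetical choices of f and f̄). Then Λ_n has the closed form exp(θS_n − nθ²/2) with S_n = Y_1 + ... + Y_n, and both the martingale property and the a.s. convergence of Theorem 2.3.8 can be observed:

```python
import numpy as np

# Likelihood-ratio martingale Lambda_n = prod_i fbar(Y_i)/f(Y_i)
# = exp(theta*S_n - n*theta**2/2) for the hypothetical densities above.
rng = np.random.default_rng(42)

# (a) Martingale property: E[Lambda_n] = 1 under P for every n.
theta_a, n_short, n_paths = 0.1, 20, 5000
S = np.cumsum(rng.standard_normal((n_paths, n_short)), axis=1)
Lam = np.exp(theta_a * S - np.arange(1, n_short + 1) * theta_a**2 / 2)
mean_final = Lam[:, -1].mean()            # approximately 1

# (b) A.s. convergence along one long path: here P-bar and P are singular
# on F_infinity (the law of large numbers separates them), so Lambda = 0.
theta_b, n_long = 0.5, 2000
S1 = np.cumsum(rng.standard_normal(n_long))
Lam_long = np.exp(theta_b * S1 - np.arange(1, n_long + 1) * theta_b**2 / 2)
print(mean_final, Lam_long[-1])
```

Note the contrast: every Λ_n has mean 1, yet the a.s. limit here is 0, which is consistent with Theorem 2.3.8 because P̄ is not absolutely continuous with respect to P on F_∞ in this case.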
Example 2.3.10  If {X_n} is an integrable, real valued process with independent increments having mean 0, then it is a martingale with respect to the filtration it generates. If, in addition, X_n^2 is integrable, then X_n^2 − E[X_n^2] is a martingale with respect to the same filtration. The proof is left as an exercise.

Theorem 2.3.11  If {X_n, F_n} is a martingale and α is a stopping time with respect to the filtration F_n, then {X_min(n,α), F_n} is a martingale.

Proof  First we have to show that X_min(n,α) is integrable. But

    X_min(n,α) = Σ_{k=0}^{n−1} X_k I_{α=k} + X_n I_{α≥n},

and by assumption the variables X_0, ..., X_n are integrable. Hence X_min(n,α) is integrable. Moreover, X_min(n,α) is F_n-measurable. It remains to show that E[X_min(n+1,α) | F_n] = X_min(n,α). This follows from

    E[X_min(n+1,α) − X_min(n,α) | F_n] = E[I_{α>n}(X_{n+1} − X_n) | F_n] = I_{α>n} E[X_{n+1} − X_n | F_n] = 0,

since {α > n} ∈ F_n. We also have that stopping at an optional time preserves the martingale property.

Theorem 2.3.12 (Doob Optional Sampling Theorem)  Suppose {X_n, F_n} is a martingale. Let α ≤ β (a.s.) be stopping times such that X_α and X_β are integrable. Also suppose that

    lim inf_n ∫_{α≥n} |X_n| dP = 0,      (2.3.6)

and

    lim inf_n ∫_{β≥n} |X_n| dP = 0.      (2.3.7)

Then

    E[X_β | F_α] = X_α.      (2.3.8)
In particular, E[X_β] = E[X_α].

Proof  Using the definition of conditional expectation, we have to show that for every A ∈ F_α,

    ∫_A I_{α≤β} E[X_β | F_α] dP = ∫_A I_{α≤β} X_β dP = ∫_A I_{α≤β} X_α dP.

However, {α ≤ β} = ∪_{n≥0} ({α = n} ∩ {β ≥ n}). Hence it suffices to show that, for all n ≥ 0:

    ∫_A I_{α=n, β≥n} X_β dP = ∫_A I_{α=n, β≥n} X_α dP = ∫_A I_{α=n, β≥n} X_n dP.      (2.3.9)

Now {ω : β(ω) ≥ n} = {ω : β(ω) = n} ∪ {ω : β(ω) ≥ n + 1} and, in view of (2.3.1), the last integral in (2.3.9) is equal to

    ∫_{A∩{α=n}∩{β=n}} X_n dP + ∫_{A∩{α=n}∩{β≥n+1}} X_{n+1} dP
    = ∫_{A∩{α=n}∩{β=n}} X_β dP + ∫_{A∩{α=n}∩{β≥n+1}} X_{n+1} dP.      (2.3.10)

Also, {ω : β(ω) ≥ n} = {ω : n ≤ β(ω) ≤ n + 1} ∪ {ω : β(ω) ≥ n + 2} and, using (2.3.1) again, (2.3.10) equals

    ∫_{A∩{α=n}∩{n≤β≤n+1}} X_β dP + ∫_{A∩{α=n}∩{β≥n+2}} X_{n+2} dP.

Repeating this step k times,

    ∫_A I_{α=n, β≥n} X_n dP = ∫_{A∩{α=n}∩{n≤β≤n+k}} X_β dP + ∫_{A∩{α=n}∩{β≥n+k+1}} X_{n+k+1} dP,

that is,

    ∫_{A∩{α=n}∩{n≤β≤n+k}} X_β dP = ∫_{A∩{α=n}∩{β≥n}} X_n dP − ∫_{A∩{α=n}∩{β≥n+k+1}} X_{n+k+1} dP.

Now,

    X_{n+k+1} = X^+_{n+k+1} − X^−_{n+k+1} = 2X^+_{n+k+1} − (X^+_{n+k+1} + X^−_{n+k+1}) = 2X^+_{n+k+1} − |X_{n+k+1}|,

so that

    ∫_{A∩{α=n}∩{n≤β≤n+k}} X_β dP = ∫_{A∩{α=n}∩{β≥n}} X_n dP − 2∫_{A∩{α=n}∩{β≥n+k+1}} X^+_{n+k+1} dP + ∫_{A∩{α=n}∩{β≥n+k+1}} |X_{n+k+1}| dP.      (2.3.11)

Taking the limit as k → ∞ on both sides of (2.3.11) and using (2.3.7), we obtain

    ∫_{A∩{α=n}∩{β≥n}} X_β dP = ∫_{A∩{α=n}∩{β≥n}} X_n dP,

which establishes (2.3.9) and finishes the proof.
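A numerical sketch of the theorem under illustrative assumptions: take X_n a symmetric ±1 random walk (a martingale), α = 0, and β the first exit time of the interval (−5, 5), capped at a finite horizon so that it is bounded. Optional sampling then gives E[X_β] = E[X_α] = 0:

```python
import numpy as np

# Stopped symmetric random walk: E[X_beta] should be 0 by optional sampling.
rng = np.random.default_rng(7)
n_paths, n_max, level = 20000, 400, 5

steps = rng.choice([-1, 1], size=(n_paths, n_max))
X = np.cumsum(steps, axis=1)
hit = np.abs(X) >= level                       # exit of the interval (-5, 5)
# first exit index, or the capped horizon if the walk never exits
beta = np.where(hit.any(axis=1), hit.argmax(axis=1), n_max - 1)
X_beta = X[np.arange(n_paths), beta]

print(X_beta.mean())                           # approximately 0
```

Since β is bounded here, the integrability conditions (2.3.6)-(2.3.7) hold trivially; the Monte Carlo mean deviates from 0 only by sampling error.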
Definition 2.3.13  The stochastic process {X_n, F_n} is a local martingale if there is a sequence of stopping times {α_k} increasing to ∞ with probability 1 and such that {X_{n∧α_k}, F_n} is a martingale.

Remark 2.3.14  The interesting fact about local martingales is that they can be obtained rather naturally through a martingale transform (a stochastic integral, in the continuous time case), which is defined as follows. Suppose {Y_n, F_n} is a martingale and {A_n, F_n} is a predictable process. Then the sequence

    X_n = A_0 Y_0 + Σ_{k=1}^n A_k (Y_k − Y_{k−1})

is called a martingale transform and is a local martingale.

Proof  To show that {X_n, F_n} is a local martingale we have to find a sequence of stopping times {α_k}, k ≥ 1, increasing to infinity (P-a.s.) and such that the "stopped" process {X_min(n,α_k), F_n} is a martingale. Let α_k = inf{n ≥ 0 : |A_{n+1}| > k}. Since A is predictable the α_k are stopping times, and clearly α_k ↑ ∞ (P-a.s.). Since Y is a martingale and |A_min(n,α_k) I_{α_k>n}| ≤ k, then for all n ≥ 1, E[|X_min(n,α_k) I_{α_k>n}|] < ∞. Moreover, from Theorem 2.3.11,

    E[(X_min(n+1,α_k) − X_min(n,α_k)) I_{α_k>n} | F_n] = I_{α_k>n} A_min(n+1,α_k) E[Y_min(n+1,α_k) − Y_min(n,α_k) | F_n] = 0.

This finishes the proof.

Example 2.3.15  Suppose that you are playing a game using the following "strategy". At each time n your stake is A_n. Write X_n for your total gain through the n-th game, with X_0 = 0 for simplicity, and write F_n = σ{X_k : 0 ≤ k ≤ n}. We suppose that, for each n, A_n is F_{n−1}-measurable; that is, A = {A_n} is predictable with respect to the filtration F_n. This means that A_n = A_n(X_0, X_1, ..., X_{n−1}) is a function of X_0, X_1, ..., X_{n−1}. If we assume that you win (or lose) at time n if a Bernoulli random variable b_n is equal to 1 (or −1), then

    X_n = Σ_{k=1}^n A_k b_k = Σ_{k=1}^n A_k ΔC_k.

Here ΔC_k = C_k − C_{k−1} and C_k = Σ_{i=1}^k b_i. If C is a martingale with respect to the filtration F_n (in this case we say that the game is "fair"), then the same holds for X, because

    E[X_n | F_{n−1}] = X_{n−1} + A_n E[C_n − C_{n−1} | F_{n−1}]
    = X_{n−1} + A_n (E[C_n | F_{n−1}] − C_{n−1})
    = X_{n−1} + A_n (C_{n−1} − C_{n−1}) = X_{n−1}.

2.4 Doob decomposition

A submartingale is a process which "on average" is nondecreasing. Unlike a martingale, which has a constant mean over time, a submartingale has a trend, or an increasing predictable part, perturbed by a martingale component which is not predictable. This is made more precise by the following theorem due to J. L. Doob.
Theorem 2.4.1 (Doob Decomposition)  Any submartingale {X_n} can be written (P-a.s. uniquely) as

    X_n = Y_n + Z_n   a.s.,      (2.4.1)

where {Y_n} is a martingale and {Z_n} is a predictable, increasing process, i.e. E[Z_n] < ∞, Z_1 = 0 and Z_n ≤ Z_{n+1} a.s. for all n.

Proof  Write Δ_n = X_n − X_{n−1}, y_i = Δ_i − E[Δ_i | F_{i−1}] and z_i = E[Δ_i | F_{i−1}], z_0 = 0. Then:

    X_n = (Δ_1 − E[Δ_1 | F_0]) + (Δ_2 − E[Δ_2 | F_1]) + ... + (Δ_n − E[Δ_n | F_{n−1}]) + Σ_{i=1}^n E[Δ_i | F_{i−1}]
    = Σ_{i=1}^n y_i + Σ_{i=1}^n z_i = Y_n + Z_n.

To prove uniqueness, suppose that there is another decomposition X_n = Y'_n + Z'_n = Σ_{i=1}^n y'_i + Σ_{i=1}^n z'_i. Then y_n + z_n = Δ_n = y'_n + z'_n; taking conditional expectations with respect to F_{n−1} gives z_n = z'_n, because y_n and y'_n are martingale increments and z_n, z'_n are predictable. This implies y_n = y'_n, and hence the uniqueness of the decomposition.

Remarks 2.4.2
1. In Theorem 2.4.1, if {X_n} is just an F_n-adapted and integrable process, the decomposition remains valid but we lose the "increasing" property of the process {Z_n}.
2. The process X − Z is a martingale; as a result Z is called the compensator of the submartingale X.
3. A process which is the sum of a predictable process and a martingale is called a semimartingale.
4. Uniqueness of the decomposition is ensured by the predictability of the process {Z_n}.

Definition 2.4.3  A discrete-time stochastic process {X_n}, with finite state space S = {s_1, s_2, ..., s_N}, defined on a probability space (Ω, F, P), is a Markov chain if

    P(X_{n+1} = s_{i_{n+1}} | X_0 = s_{i_0}, ..., X_n = s_{i_n}) = P(X_{n+1} = s_{i_{n+1}} | X_n = s_{i_n}),

for all n ≥ 0 and all states s_{i_0}, ..., s_{i_n}, s_{i_{n+1}} ∈ S. This is termed the Markov property. {X_n} is a homogeneous Markov chain if

    P(X_{n+1} = s_j | X_n = s_i) = π_{ji}

is independent of n.
The matrix Π = {π_{ji}} is called the probability transition matrix of the homogeneous Markov chain, and it satisfies Σ_{j=1}^N π_{ji} = 1. Note that our transition matrix is the transpose of the traditional transition matrix defined elsewhere. The convenience of this choice will be apparent later. The following properties of a homogeneous Markov chain are easy to check.

1. Let π^0 = (π^0_1, π^0_2, ..., π^0_N)' be the distribution of X_0. Then

    P(X_0 = s_{i_0}, X_1 = s_{i_1}, ..., X_n = s_{i_n}) = π^0_{i_0} π_{i_1 i_0} ... π_{i_n i_{n−1}}.

2. Let π^n = (π^n_1, π^n_2, ..., π^n_N)' be the distribution of X_n. Then π^n = Π^n π^0 = Π π^{n−1}.

Example 2.4.4  Let {η_n} be a discrete-time Markov chain as in Definition 2.4.3. Consider the filtration F_n = σ{η_0, η_1, ..., η_n}. Write X_n = (I_{η_n=s_1}, I_{η_n=s_2}, ..., I_{η_n=s_N})'. Then X_n is a discrete-time Markov chain whose state space is the set of unit vectors e_1 = (1, 0, ..., 0)', ..., e_N = (0, ..., 1)' of IR^N; moreover, the probability transition matrix of X is again Π. We can write:

    E[X_n | F_{n−1}] = E[X_n | X_{n−1}] = Π X_{n−1},      (2.4.2)

from which we conclude that Π X_{n−1} is the predictable part of X_n, given the history of X up to time n − 1, and the nonpredictable part of X_n must be M_n = X_n − Π X_{n−1}. In fact it can easily be shown that M_n ∈ IR^N is a mean-0, F_n-vector martingale, and we have the semimartingale (or Doob decomposition) representation of the Markov chain {X_n}:

    X_n = Π X_{n−1} + M_n.      (2.4.3)
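A small simulation of (2.4.2)-(2.4.3), with an illustrative 3-state transition matrix in the transposed convention used here (each column of Π sums to 1):

```python
import numpy as np

# Pi[j, i] = P(next state = e_j | current state = e_i), columns sum to 1.
rng = np.random.default_rng(1)
Pi = np.array([[0.5, 0.2, 0.3],
               [0.3, 0.6, 0.1],
               [0.2, 0.2, 0.6]])
N, n_paths = 3, 50000
I3 = np.eye(N)

# One transition out of state e_1 (index 0) on many independent paths.
u = rng.random(n_paths)
cdf = np.cumsum(Pi[:, 0])                 # distribution of the next state
next_state = np.searchsorted(cdf, u)

emp = I3[next_state].mean(axis=0)         # empirical E[X_n | X_{n-1} = e_1]
pred = Pi @ I3[:, 0]                      # predictable part Pi X_{n-1}, (2.4.2)
M1 = I3[next_state] - pred                # martingale increments M_n, (2.4.3)
print(emp, pred, M1.mean(axis=0))         # emp ~ pred, mean increment ~ 0
```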
Definition 2.4.5  Given two (column) vectors X and Y, the tensor or Kronecker product X ⊗ Y is the (column) vector obtained by stacking the rows of the matrix X Y', where ' denotes transpose; its entries are obtained by multiplying the i-th entry of X by the j-th entry of Y.

Example 2.4.6  Let {X_n} be an order-2 Markov chain (see (2.4.4) below) with state space the standard basis {e_1, e_2} of IR^2, on a filtered probability space (Ω, F, F_n, P), F_n = σ{X_0, X_1, ..., X_n}, such that

    P(X_n = e_k | F_{n−1}) = P(X_n = e_k | X_{n−2}, X_{n−1}),

and probability transition matrix Π = {π_{k,ji}},

    Σ_k π_{k,ji} = 1,   i, j, k = 1, 2,      (2.4.4)

or

    Π = ( π_{1,11}  π_{1,12}  π_{1,21}  π_{1,22}
          π_{2,11}  π_{2,12}  π_{2,21}  π_{2,22} ).

Lemma 2.4.7  A semimartingale representation (or Doob decomposition) of the order-2 Markov chain X is:

    X_n = Π (X_{n−2} ⊗ X_{n−1}) + M_n,      (2.4.5)

that is, M_n = X_n − Π (X_{n−2} ⊗ X_{n−1}) is an F_n-martingale. Here X_{n−2} ⊗ X_{n−1} is the tensor, or Kronecker, product of the vectors X_{n−2}, X_{n−1}. This can be identified with one of the standard unit vectors {e_1, e_2, e_3, e_4} of IR^4, that is,

    e_1 ⊗ e_1 = (1, 0, 0, 0)',   e_1 ⊗ e_2 = (0, 1, 0, 0)',
    e_2 ⊗ e_1 = (0, 0, 1, 0)',   e_2 ⊗ e_2 = (0, 0, 0, 1)'.
Proof

    E[X_n | F_{n−1}] = E[X_n | X_{n−2}, X_{n−1}]
    = Σ_{ij} E[X_n | X_{n−2} = e_i, X_{n−1} = e_j] I_{X_{n−2}=e_i, X_{n−1}=e_j}
    = Σ_{ij} Σ_k e_k π_{k,ji} I_{X_{n−2}=e_i, X_{n−1}=e_j}
    = Σ_{ij} (π_{1,ji}, π_{2,ji})' I_{X_{n−2}=e_i, X_{n−1}=e_j}
    = Σ_{ij} Π (e_i ⊗ e_j) I_{X_{n−2}=e_i, X_{n−1}=e_j} = Π (X_{n−2} ⊗ X_{n−1}).
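The identification of e_i ⊗ e_j with the standard basis of IR^4 can be checked directly; `np.kron` stacks entries exactly as in Definition 2.4.5, and multiplying Π by e_i ⊗ e_j selects the column of Π indexed by the pair (i, j). The matrix below is an arbitrary illustrative one:

```python
import numpy as np

# e_i (x) e_j for the standard basis of R^2 gives the standard basis of R^4.
e = np.eye(2)
pairs = [np.kron(e[i], e[j]) for i in (0, 1) for j in (0, 1)]
stacked = np.array(pairs)                 # rows: e1(x)e1, e1(x)e2, e2(x)e1, e2(x)e2

# Pi @ (e_i (x) e_j) picks out one column, as in the proof of Lemma 2.4.7.
Pi = np.arange(8.0).reshape(2, 4)         # hypothetical 2x4 matrix
col = Pi @ np.kron(e[1], e[0])            # pair (i, j) = (2, 1), third column
print(stacked, col)
```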
2.5 Continuous time martingales

The stochastic process X is a submartingale (supermartingale) with respect to the filtration {F_t} if

1. it is F_t-adapted and E[|X_t|] < ∞ for all t, and
2. E[X_t | F_s] ≥ X_s (respectively E[X_t | F_s] ≤ X_s) for all s ≤ t.

The stochastic process X is a martingale if it is both a submartingale and a supermartingale. Since for a martingale E[X_t | F_s] = X_s, it follows that E[E[X_t | F_s]] = E[X_s], and E[X_t] = E[X_s] for all 0 ≤ s ≤ t, so that E[X_t] = E[X_0] for all t ≥ 0.
Example 2.5.1  If X is an integrable random variable on a filtered probability space, then X_t = E[X | F_t] is a martingale, since for s ≤ t,

    E[X_t | F_s] = E[E[X | F_t] | F_s] = E[X | F_s] = X_s.

An important application of Example 2.5.1 is:

Example 2.5.2  Let (Ω, F, P, P̄) be a probability space with a filtration {F_t, t ≥ 0} and two probability measures such that P̄ ≪ P. Then the Radon–Nikodym Theorem asserts the existence of a nonnegative random variable Λ such that, for all F ∈ F,

    P̄(F) = ∫_F Λ(ω) dP(ω).

Then Λ_t = E[Λ | F_t] is a nonnegative martingale with mean

    E[Λ_t] = ∫_Ω Λ_t(ω) dP(ω) = ∫_Ω Λ(ω) dP(ω) = 1.

Example 2.5.3  Let {X_t} be a stochastic process adapted to the filtration {F_t} with independent increments; that is, for s ≤ t, X_t − X_s is independent of the σ-field F_s. Then the process {X_t − E[X_t]} is an F_t-martingale, since

    E[X_t − E[X_t] | F_s] = E[X_t − E[X_t] − (X_s − E[X_s]) + (X_s − E[X_s]) | F_s]
    = X_s − E[X_s] + E[X_t − X_s] − E[X_t − X_s] = X_s − E[X_s].

The following martingale convergence result is proved in, for instance, [6] page 16.

Theorem 2.5.4 (Martingale Convergence Theorem)  Let {X_t, F_t}, t ≥ 0, be a martingale with right-continuous sample paths. If sup_t E[|X_t|] < ∞ then there is a random variable X_∞ ∈ L^1 such that lim_{t→∞} X_t = X_∞ a.s. Furthermore, if {X_t, F_t}, t ≥ 0, is uniformly integrable then X_t → X_∞ in L^1 and E[|X_t|] increases to E[|X_∞|] as t → ∞.

Theorem 2.5.5 (Stopped Martingales are Martingales)  Let {X_t, F_t} be a martingale with right-continuous sample paths and α a stopping time. The stopped process {X_{t∧α}, t ≥ 0} is also a martingale.

Proof  See [34] page 189.

Theorem 2.5.6 (Optional Stopping)  Let {X_t, F_t, t ≥ 0} be a right-continuous martingale with a last element X_∞, and let α ≤ β be two stopping times. Then

    E[X_β | F_α] = X_α   a.s.

In particular, we have E[X_β] = E[X_0].

Proof  See [21] page 19.
Now we give a characterization of a uniformly integrable martingale. We need this result to prove Theorem 3.5.3.

Theorem 2.5.7  Suppose {X_t}, 0 ≤ t ≤ ∞, is an adapted right-continuous process such that for every stopping time α, E[|X_α|] < ∞ and E[X_α] = 0. Then {X_t} is a uniformly integrable martingale.

Proof  Consider any time t ∈ [0, ∞] and F ∈ F_t. Let α(ω) = t I_{ω∈F} + ∞ I_{ω∉F}. Then α is a stopping time, and by assumption

    0 = E[X_α] = E[X_t I_{ω∈F}] + E[X_∞ I_{ω∉F}],
    0 = E[X_∞] = E[X_∞ I_{ω∈F}] + E[X_∞ I_{ω∉F}].

Hence E[X_t I_{ω∈F}] = E[X_∞ I_{ω∈F}] for all F ∈ F_t, so X_t = E[X_∞ | F_t] a.s.

Recall that the definition of a martingale involves the integrability of X_t for all t, which is in fact a sufficient condition for the existence of E[X_t | F_s], s ≤ t. However, E[X_t | F_s], s ≤ t, may exist even though E[|X_t|] = ∞, in which case {X_t, F_t} is called a local martingale.

First recall the concept of local properties of deterministic functions. The (deterministic) function X_t = e^t/(t − 1) is locally bounded, i.e. it is bounded on compact sets not containing 1 (closed bounded intervals in IR − {1}). In fact we can define, for each n ∈ IN:

    Y_t^n = X_t I_{|X_t|≤n} + n I_{X_t>n} − n I_{X_t<−n}.

Clearly Y_t^n is bounded everywhere and equals X_t on closed bounded intervals where |X_t| ≤ n. However, for Y_t^n to converge to X_t we must allow n to increase to infinity. The same idea is used when X_t(ω) is a random function, or a stochastic process; however, the localizing sequence is then a sequence of random variables, in fact stopping times. For example, consider

    α_n(ω) = inf{t : |X_t(ω)| > n},

which is the first time the sample path X_t(ω) leaves the interval [−n, +n]. Then define Y_t^n(ω) = X_{t∧α_n(ω)}(ω), so that for different ωs there are, for each n, different times t when X_t(ω) leaves the bounded set [−n, n]. As in the deterministic case, the sequence of stopping times α_n(ω) must increase to infinity for almost all ω. Here x ∧ y stands for the smaller of x and y.

Definition 2.5.8  The stochastic process X = {X_t}, t ≥ 0, is said to be square integrable if sup_t E[X_t^2] < ∞.

Definition 2.5.9  The stochastic process {X_t, F_t} is a local martingale if there is a sequence of stopping times {α_n} increasing to ∞ with probability 1 and such that, for each n, {X_{t∧α_n}, F_t} is a martingale.
Definition 2.5.10  The stochastic process {X_t, F_t} is a locally square integrable martingale (i.e. locally in L^2) if there is a sequence of stopping times {α_n} increasing to ∞ with probability 1 and such that, for each n, {X_{t∧α_n}, F_t} is a square integrable martingale.
The following two theorems, whose proofs can be found in [11], are needed in the proof of Theorem 3.5.6.

Theorem 2.5.11  Let {X_t, F_t} be a local martingale which is zero at time t = 0. Then there exists a sequence of stopping times {α_n} increasing to ∞ with probability 1 and such that, for each n, {X_{t∧α_n}, F_t} is a uniformly integrable martingale and E[X_{t∧α_n} | F_t] is bounded on the stochastic interval {(t, ω) ∈ [0, ∞[ × Ω : 0 ≤ t < α_n(ω)} (denoted [[0, α_n[[).

Theorem 2.5.12  Let {X_t, F_t} be a local martingale. Then there exists a sequence of stopping times {α_n} increasing to ∞ with probability 1 such that, for each n,

    X_{α_n∧t} = U_{α_n∧t} + V_{α_n∧t},

where U_0 = 0, U_{α_n∧t} is square integrable and V_{α_n∧t} is a martingale of integrable variation which is zero at t = 0.
2.6 Doob–Meyer decomposition

The following definitions are needed in the sequel.

Definition 2.6.1  Let f be a real valued function on an interval [a, b]. The variation of f on the interval [a, b] is given by

    lim_{n→∞} Σ_{k=1}^n |f(t_k^n) − f(t_{k−1}^n)| = ∫_a^b |df|,

where a = t_0^n < t_1^n < ... < t_n^n = b denotes a sequence of partitions of the interval [a, b] such that δ_n = max_k (t_k^n − t_{k−1}^n) → 0 as n → ∞. If ∫_a^b |df| < ∞, then we say that f has finite variation on the interval [a, b]. If ∫_a^b |df| = ∞, then we say that f has infinite variation on the interval [a, b].

Definition 2.6.2  A stochastic process X is of integrable variation if

    E[ ∫_0^∞ |dX_s| ] < ∞.
Example 2.6.3  A typical example of a continuous function of infinite variation is the following:

    f(x) = 0 for x = 0,   f(x) = x sin(π/(2x)) for 0 < x ≤ 1.
Consider the sequence of partitions of the interval [0, 1]:

    π_1 = {0, 1},
    π_2 = {0, 1/2, 1},
    π_3 = {0, 1/3, 1/2, 1},
    π_4 = {0, 1/4, 1/3, 1/2, 1},
    ...
    π_n = {0, 1/(n−1), 1/(n−2), ..., 1/2, 1}.

Then it can be verified that

    ∫_0^1 |df| = lim_{n→∞} Σ_{k=1}^n |f(t_k^n) − f(t_{k−1}^n)| = ∞.
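The divergence of these variation sums can be checked numerically over the partitions π_n above; since f(1/k) = ±1/k for odd k and 0 for even k, the sums grow without bound, much like a harmonic series:

```python
import numpy as np

def f(x):
    # the function of Example 2.6.3
    return 0.0 if x == 0 else x * np.sin(np.pi / (2 * x))

def variation(n):
    # variation sum of f over the partition pi_n = {0, 1/(n-1), ..., 1/2, 1}
    pts = [0.0] + [1.0 / k for k in range(n - 1, 0, -1)]
    vals = [f(p) for p in pts]
    return sum(abs(b - a) for a, b in zip(vals, vals[1:]))

v = [variation(n) for n in (10, 100, 1000)]
print(v)    # strictly increasing, unbounded as n grows
```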
Another example of a function of infinite variation on any interval containing 0 is

    f(x) = (−1)^{[1/x]} / (1 + x),

where [1/x] stands for the integral part of 1/x.

Definition 2.6.4  An adapted process {X_t, F_t} is called a semimartingale if it can be written in the form

    X_t = X_0 + M_t + V_t.

Here {M_t}, t ≥ 0, is a local martingale with M_0 = 0, and {V_t} is an adapted process with paths of finite variation (see Definition 2.6.1) and V_0 = 0; {V_t} is not necessarily predictable. Roughly speaking, {V_t} is a slowly changing component (trend) and {M_t} is a quickly changing component.

Definition 2.6.5  An adapted process {X_t, F_t} is called a special semimartingale if it can be written in the form X_t = X_0 + M_t + V_t, where {M_t}, t ≥ 0, is a local martingale with M_0 = 0, and {V_t} is a predictable process with paths of finite variation and V_0 = 0.

Theorem 2.6.6  X is a (special) semimartingale if and only if the stopped process X_{t∧τ_n} is a (special) semimartingale for each n, where {τ_n} is a sequence of stopping times such that lim_n τ_n = ∞.

Proof  (Elliott [11]). Clearly, if X is a (special) semimartingale then the stopped process X_{t∧τ_n} is a (special) semimartingale for each n.
If S and T are stopping times and X_{t∧S} and X_{t∧T} are (special) semimartingales, then the same is true of X_{t∧(S∨T)} = X_{t∧S} + X_{t∧T} − X_{t∧(S∧T)}. Therefore we can assume that {τ_n} is an increasing sequence of stopping times with the stated properties.

If X_{t∧τ_n} is a special semimartingale for each n, it has a unique decomposition X_{t∧τ_n} = X_0 + M_t^n + A_t^n. However, (X_{t∧τ_{n+1}})_{t∧τ_n} = X_{t∧τ_n}, so (M^{n+1})_{t∧τ_n} = M^n and (A^{n+1})_{t∧τ_n} = A^n. The processes {M^n} and {A^n} can, therefore, be "pasted" together to give a local martingale M and a predictable process A of locally finite variation, so the process X in this case is a special semimartingale.

In the general case we know that X_{t∧τ_n} is a semimartingale for each n. However, X is certainly a right-continuous process with left limits, so the process V_t = Σ_{0<s≤t} ΔX_s I_{|ΔX_s|≥1} is of finite variation, as is Y = X − V − X_0. For each n, Y_{t∧τ_n} = X_{t∧τ_n} − V_{t∧τ_n} − X_0 is a semimartingale whose jumps are all bounded by 1. Therefore, by Corollary 12.40(c), page 150 in [11], Y_{t∧τ_n} is a special semimartingale. By the first part of this proof Y is then a special semimartingale, and X_t = X_0 + Y_t + V_t is a semimartingale.

Definition 2.6.7  A right-continuous stochastic process {X_t} on the stochastic basis (Ω, F, F_t, P) is said to be of class D if the family {X_τ}, for all a.s. finite stopping times τ, is uniformly integrable. It is of class DL if it is of class D on each interval [0, a], a < ∞.

Definition 2.6.8  A right-continuous uniformly integrable supermartingale {X_t} is said to be of class D if the set of random variables {X_τ}, for τ any stopping time, is uniformly integrable.

Note that any uniformly integrable martingale is of class D. This follows from Doob's Optional Stopping Theorem 2.5.6, because X_τ = E[X_∞ | F_τ] a.s. The proof of the following important theorem can be found, for instance, in [11].

Theorem 2.6.9 (Doob–Meyer Decomposition)
Any class D supermartingale {X_t, F_t} can be written (P-a.s. uniquely) as

    M_t = X_t + A_t,      (2.6.1)

where {M_t, F_t} is a uniformly integrable martingale and {A_t, F_t} is a predictable, increasing process.

Remarks 2.6.10
1. If we replace class D by class DL in the theorem, {M_t, F_t} is no longer a uniformly integrable martingale. (See Theorem 4.10 in [21].)
2. The Doob–Meyer decomposition of a process is the special semimartingale representation of that process, because of the predictability of the process {A_t, F_t}.
3. Recall that in the Doob decomposition for discrete time submartingales the increasing predictable process is given by

    Z_n = Σ_{i=1}^n E[X_i − X_{i−1} | F_{i−1}],   or   ΔZ_n = E[ΔX_n | F_{n−1}].

By analogy, A_t is obtained if we replace summation with integration in the following manner:

    A_t = lim_{h→0} ∫_0^t E[ (X_{s+h} − X_s)/h | F_s ] ds,   or   dA_t = E[dX_t | F_t].

4. An interesting consequence of the Doob–Meyer Theorem is that any continuous martingale has unbounded variation. To see this, suppose that {M_t, F_t} is a continuous martingale with bounded variation, so that it can be written as a difference of two continuous increasing processes X_t and Z_t:

    M_t = X_t − Z_t,   or   X_t = M_t + Z_t,

which is a Doob–Meyer decomposition of the submartingale X_t. But X_t = 0 + X_t is another Doob–Meyer decomposition of X_t. By uniqueness, M_t = 0.

Example 2.6.11  Suppose τ_1 ≤ τ_2 ≤ ... is a sequence of stopping times such that lim_n τ_n = +∞ a.s. Then the counting process

    N_t = Σ_{n≥1} I_{τ_n≤t}

is an F_t-submartingale, and it admits a Doob–Meyer decomposition N_t = Y_t + Z_t. Here the predictable, increasing process {Z_t, F_t} is called the compensator of N_t.
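In the standard special case of a rate-λ Poisson process (jump times τ_n with i.i.d. exponential gaps, an assumption beyond the general example above), the compensator is the deterministic process Z_t = λt, so Y_t = N_t − λt is a mean-zero martingale, which can be checked in the mean:

```python
import numpy as np

# Compensated Poisson process at a fixed time t: Y_t = N_t - lambda*t.
rng = np.random.default_rng(11)
lam, t, n_paths = 2.0, 5.0, 100000

N_t = rng.poisson(lam * t, size=n_paths)   # N_t is Poisson(lambda*t)
Y_t = N_t - lam * t                        # compensated value at time t
print(Y_t.mean())                          # approximately 0
```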
Example 2.6.12  Let X be the single jump process introduced in Example 2.1.4. For t ≥ 0 define the process

    μ(t, A) = I_{T≤t} I_{Z∈A}.      (2.6.2)

Note that the sample paths of μ(t, A) are identically zero until the jump time T; they then have a unit jump at T if Z ∈ A. We now show that the predictable compensator of μ is given by

    μ^p(t, A) = − ∫_{]0, T∧t]} dF_u^A / F_{u−},      (2.6.3)

where F_s^A = P(T > s, Z ∈ A) and F_s = P(T > s, Z ∈ E). Write F_t for the completed σ-field generated by {X_s}, s ≤ t, so that F_t is generated by B([0, t]) × E. Note that ]t, ∞] × E is an atom of F_t. We have the following result, which will be used later (see Lemma 3.8.9).

Lemma 2.6.13  Suppose τ is an F_t stopping time with P(τ < T) > 0. Then there is a t_0 ∈ [0, ∞[ such that τ ∧ T = t_0 ∧ T a.s.

Proof  Suppose τ takes two values t_1 < t_2 on {ω ∈ Ω = [0, ∞] × E : τ(ω) ≤ T(ω)} with positive probability. Then for t_1 < t < t_2, the set {ω ∈ Ω : τ(ω) ≤ t} ∩ (]t, ∞] × E) would be a nonempty proper subset of the atom ]t, ∞] × E, so {τ ≤ t} ∉ F_t, contradicting the fact that τ is a stopping time. Therefore for some t_0 ∈ [0, ∞[, {τ ≤ T} ⊂ {t_0 ≤ T}. A similar argument gives the reverse inclusion, and the result follows.
Theorem 2.6.14  q(t, A) = μ(t, A) − μ^p(t, A) is an F_t-martingale.

Proof  ([11]) For t > s,

    E[q(t, A) − q(s, A) | F_s] = E[μ(t, A) − μ^p(t, A) − (μ(s, A) − μ^p(s, A)) | F_s]
    = E[μ(t, A) − μ(s, A) − (μ^p(t, A) − μ^p(s, A)) | F_s].

So we must show that

    E[μ(t, A) − μ(s, A) | F_s] = E[μ^p(t, A) − μ^p(s, A) | F_s].      (2.6.4)

First note that, in view of (2.6.2), if T ≤ s both sides of (2.6.4) are zero. Now recall that ]s, ∞] × E is an atom of F_s, so

    E[μ(t, A) − μ(s, A) | F_s] = E[I_{Z∈A} I_{s<T≤t} | F_s]
    = P(T > s, Z ∈ A | T > s, Z ∈ E) I_{T>s, Z∈E} − P(T > t, Z ∈ A | T > s, Z ∈ E) I_{T>s, Z∈E}
    = ((F_s^A − F_t^A)/F_s) I_{T>s, Z∈E}.
On the other hand, μ^p(t, A) is a function of T only, and F(t) = P(T > t). Therefore, using (2.6.3),

    E[μ^p(t, A) − μ^p(s, A) | F_s]
    = −E[ ∫_{]0,T∧t]} dF_u^A/F_{u−} − ∫_{]0,T∧s]} dF_u^A/F_{u−} | T > s, Z ∈ E ] I_{T>s, Z∈E}
    = −(I_{T>s, Z∈E}/P(T > s, Z ∈ E)) E[ (I_{T>t} + I_{s<T≤t}) ∫_{]s,T∧t]} dF_u^A/F_{u−} · I_{T>s, Z∈E} ]
    = −(I_{T>s, Z∈E}/F_s) [ F_t ∫_{]s,t]} dF_u^A/F_{u−} − ∫_{]s,t]} ( ∫_{]s,r]} dF_u^A/F_{u−} ) dF_r ].

Interchanging the order of integration, the double integral is

    ∫_{]s,t]} ( ∫_{]s,r]} dF_u^A/F_{u−} ) dF_r = ∫_{]s,t]} (1/F_{u−}) ( ∫_{[u,t]} dF_r ) dF_u^A
    = ∫_{]s,t]} ((F_t − F_{u−})/F_{u−}) dF_u^A = F_t ∫_{]s,t]} dF_u^A/F_{u−} − (F_t^A − F_s^A).

Therefore

    E[μ^p(t, A) − μ^p(s, A) | F_s] = −(I_{T>s, Z∈E}/F_s)(F_t^A − F_s^A) = ((F_s^A − F_t^A)/F_s) I_{T>s, Z∈E},

so (2.6.4) holds and the result follows.

A continuous-time, discrete-state stochastic process of great importance in stochastic modeling is the following.

Definition 2.6.15  A continuous-time stochastic process {X_t}, t ≥ 0, with finite state space S = {s_1, s_2, ..., s_N}, defined on a probability space (Ω, F, P), is a Markov chain if, for all t, u ≥ 0 and 0 ≤ r ≤ u,

    P(X_{t+u} = s_j | X_u = s_i, X_r = s_k) = P(X_{t+u} = s_j | X_u = s_i),

for all states s_i, s_j, s_k ∈ S. {X_t}, t ≥ 0, is a homogeneous Markov chain if

    P(X_{t+u} = s_j | X_u = s_i) = p_{ji}(t)

is independent of u. The family P_t = {p_{ji}(t)} is called the transition semigroup of the homogeneous Markov chain, and it satisfies Σ_{j=1}^N p_{ji}(t) = 1.
The following properties are similar to the discrete-time case: P_{t+u} = P_t P_u and P_0 = I, where I is the identity matrix. Let p_0 = (p_0^1, p_0^2, ..., p_0^N)' be the distribution of X_0 and p_t = (p_t^1, p_t^2, ..., p_t^N)' be the distribution of X_t. Then p_t = P_t p_0.

Theorem 2.6.16  Let {P_t}, t ≥ 0, be a continuous transition semigroup. Then the limits

    q_i = lim_{h↓0} (1 − p_{ii}(h))/h ∈ [0, ∞]

and, for j ≠ i,

    q_{ji} = lim_{h↓0} p_{ji}(h)/h ∈ [0, ∞)

exist.

Proof  See, for instance, [5] page 334.
The matrix A = {q_{ji}} is called the infinitesimal generator of the continuous-time homogeneous Markov chain. Note that since Σ_{j=1}^N p_{ji}(h) = 1, it follows immediately that

    q_i = Σ_{j≠i, j=1}^N q_{ji};

equivalently, writing q_{ii} = −q_i for the diagonal entries of A, each column of A sums to zero.
The differential system

    dP_t/dt = lim_{h↓0} (P_{t+h} − P_t)/h = P_t lim_{h↓0} (P_h − I)/h = P_t A

is called Kolmogorov's forward differential system. Similarly, the system dP_t/dt = A P_t is called Kolmogorov's backward differential system. In this finite-state case, a solution of both systems, with initial condition P_0 = I, is e^{tA}.

Example 2.6.17 (Semimartingale representation of a continuous-time Markov chain)  Let {Z_t}, t ≥ 0, be a continuous-time Markov chain with state space {s_1, ..., s_N}, defined on a probability space (Ω, F, P). S will denote the (column) vector (s_1, ..., s_N)'. Suppose 1 ≤ i ≤ N, and write

    π_i(x) = Π_{j≠i, j=1}^N (x − s_j)   and   φ_i(x) = π_i(x)/π_i(s_i);

then φ_i(s_j) = δ_{ij}, and φ = (φ_1, ..., φ_N)' is a bijection of the set {s_1, ..., s_N} with the set S̄ = {e_1, ..., e_N}. Here, for 1 ≤ i ≤ N, e_i = (0, ..., 1, ..., 0)' is the i-th unit (column) vector in IR^N. Consequently, without loss of generality, we shall consider a Markov chain on S̄. If X_t ∈ S̄ denotes the state of this Markov chain at time t ≥ 0, then the corresponding value of Z_t is ⟨X_t, S⟩, where ⟨., .⟩ denotes the inner product in IR^N.
Write p_t^i = P(X_t = e_i), 1 ≤ i ≤ N. We shall suppose that, for some family of matrices A_t, p_t = (p_t^1, ..., p_t^N)' satisfies the forward Kolmogorov equation

    dp_t/dt = A_t p_t,

with p_0 known and A_t = (a_{ij}(t)), t ≥ 0. The fundamental transition matrix associated with A will be denoted by Φ(t, s), so with I the N × N identity matrix,

    dΦ(t, s)/dt = A_t Φ(t, s),   Φ(s, s) = I,      (2.6.5)
    dΦ(t, s)/ds = −Φ(t, s) A_s,   Φ(t, t) = I.

(If A_t ≡ A is constant, Φ(t, s) = exp((t − s)A).) Consider the process in state x ∈ S̄ at time s, and write X_{s,t}(x) for its state at the later time t ≥ s. Then

    E_{s,x}[X_t | F_s] = E_{s,x}[X_t | X_s] = E_{s,x}[X_{s,t}(x)] = Φ(t, s)x.

Write F_t^s for the right-continuous, complete filtration generated by σ{X_r : s ≤ r ≤ t}, and F_t = F_t^0. We have the following representation result.

Lemma 2.6.18

    M_t = X_t − X_0 − ∫_0^t A_r X_r dr

is an {F_t} martingale.

Proof  Suppose 0 ≤ s ≤ t. Then

    E[M_t − M_s | F_s] = E[ X_t − X_s − ∫_s^t A_r X_r dr | F_s ]
    = E[ X_t − X_s − ∫_s^t A_r X_r dr | X_s ]
    = E_{s,X_s}[X_t] − X_s − ∫_s^t A_r E_{s,X_s}[X_r] dr
    = Φ(t, s)X_s − X_s − ∫_s^t A_r Φ(r, s)X_s dr = 0

by (2.6.5). Therefore the (special) semimartingale representation of the Markov chain X is

    X_t = X_0 + ∫_0^t A_r X_r dr + M_t.
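A numerical sketch with a hypothetical constant generator A (columns summing to zero, matching the transposed convention used here): P_t = e^{tA} solves the forward system, and is approximated below by a truncated power series:

```python
import numpy as np

# Hypothetical 2-state generator; each column sums to 0.
A = np.array([[-1.0, 0.5],
              [ 1.0, -0.5]])

def expm(M, terms=40):
    # truncated power series for the matrix exponential; adequate here
    # because the norm of t*A is small
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

t = 0.7
P_t = expm(t * A)                  # transition matrix e^{tA}
p0 = np.array([1.0, 0.0])
p_t = P_t @ p0                     # distribution at time t, sums to 1

# forward system check by finite differences: dP_t/dt ~ A P_t (= P_t A here)
h = 1e-6
dP = (expm((t + h) * A) - P_t) / h
print(p_t.sum(), np.abs(dP - A @ P_t).max())
```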
2.7 Brownian motion

Let X be a real valued random variable with E[X^2] < ∞, E[X] = μ and E[(X − μ)^2] = σ^2 ≠ 0. Recall that X is Gaussian if its probability density function is given by

    f(x) = (1/√(2πσ^2)) exp(−(x − μ)^2/(2σ^2)),   x ∈ IR.

If X = (X_1, ..., X_n)' is a vector valued random variable with positive definite covariance matrix C = {Cov(X_i, X_j)}, i, j = 1, ..., n, and E[X] = μ = (μ_1, ..., μ_n)', then X = (X_1, ..., X_n)' is Gaussian if its density function is

    f(x_1, ..., x_n) = (1/((2π)^{n/2} (det C)^{1/2})) exp(−(x − μ)' C^{−1} (x − μ)/2),   (x_1, ..., x_n) ∈ IR^n.

Notice that the first two moments completely characterize a Gaussian random variable, and uncorrelatedness implies independence between jointly Gaussian random variables.

A continuous-time, continuous-state space stochastic process {B_t} is said to be a standard one-dimensional Brownian motion process if B_0 = 0 a.s., it has stationary independent increments, and for every t > 0, B_t is normally distributed with mean 0 and variance t. These features make {B_t} perhaps the most well-known and extensively studied continuous-time stochastic process. The joint distribution of any finite number of the random variables B_{t_1}, B_{t_2}, ..., B_{t_n}, t_1 ≤ t_2 ≤ ... ≤ t_n, of the process is normal with density

    f(x_1, x_2, ..., x_n) = (1/√(2πt_1)) exp(−x_1^2/(2t_1)) Π_{i=1}^{n−1} (1/√(2π(t_{i+1} − t_i))) exp(−(x_{i+1} − x_i)^2/(2(t_{i+1} − t_i))).

The form of the density function f(x_1, x_2, ..., x_n) shows that indeed the random variables B_{t_1}, B_{t_2} − B_{t_1}, ..., B_{t_n} − B_{t_{n−1}} are independent. By the independent increment property,

    P(B_t ≤ x | B_{t_0} = x_0) = P(B_t − B_{t_0} ≤ x − x_0) = (1/√(2π(t − t_0))) ∫_{−∞}^{x−x_0} exp(−u^2/(2(t − t_0))) du.

If B_t = (B_t^1, ..., B_t^n)' is a vector valued Brownian motion process and x, y ∈ IR^n, then

    f_B(t, x, y) = (1/(2πt)^{n/2}) exp(−|y − x|^2/(2t)) = Π_{i=1}^n (1/√(2πt)) exp(−(y_i − x_i)^2/(2t)),
so that the n components of B_t are themselves independent one-dimensional Brownian motion processes.

Some properties of the Brownian motion process

The proofs of the following properties are left as exercises. If {B_t} is a Brownian motion process then:

1. the process {−B_t} is a Brownian motion,
2. for any a ≥ 0, the process {B_{t+a} − B_a} is a Brownian motion, and the same result holds if a is replaced with a finite valued stopping time a(ω),
3. for any a ≠ 0, the process {a B_{t/a^2}} is a Brownian motion,
4. the process {t B_{1/t}}, for t > 0, is a Brownian motion,
5. almost all the paths of (one-dimensional) Brownian motion visit any real number infinitely often.

Theorem 2.7.1  Let {B_t} be a standard Brownian motion process and F_t = σ{B_s : s ≤ t}. Then

1. {B_t} is an F_t-martingale,
2. {B_t^2 − t} is an F_t-martingale, and
3. for any real number σ, {exp(σ B_t − (σ^2/2) t)} is an F_t-martingale.
Proof
1. Let s ≤ t. Then E[B_t − B_s | F_s] = E[B_t − B_s] = 0, because {B_t} has independent increments and E[B_t] = E[B_s] = 0 by hypothesis.
2. E[(B_t − B_s)² | F_s] = E[(B_t − B_s)²] = t − s, so that E[B_t² | F_s] = E[(B_t − B_s)² | F_s] + 2B_s E[B_t − B_s | F_s] + B_s² = (t − s) + B_s². Therefore E[B_t² − t | F_s] = B_s² − s.
3. If Z is a standard normal random variable, with density (1/√(2π)) e^{−x²/2}, and λ ∈ IR, then

E[e^{λZ}] = (1/√(2π)) ∫_{−∞}^{∞} e^{λx} e^{−x²/2} dx = e^{λ²/2}.

Using the independence of increments and stationarity we have, for s < t,

E[e^{σB_t − (σ²/2)t} | F_s] = e^{σB_s − (σ²/2)t} E[e^{σ(B_t − B_s)} | F_s] = e^{σB_s − (σ²/2)t} E[e^{σ(B_t − B_s)}] = e^{σB_s − (σ²/2)t} E[e^{σB_{t−s}}].

Now σB_{t−s} is N(0, σ²(t − s)); that is, if Z is N(0, 1) as previously, σB_{t−s} has the same law as σ√(t − s) Z and

E[e^{σB_{t−s}}] = E[e^{σ√(t−s) Z}] = e^{σ²(t−s)/2}.

Therefore E[e^{σB_t − (σ²/2)t} | F_s] = e^{σB_s − (σ²/2)s} and the result follows.
It turns out that Theorem 2.7.1 (2) characterizes a Brownian motion (see Theorem 3.7.3).
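The three martingale properties of Theorem 2.7.1 can be illustrated numerically. The following is a quick Monte Carlo sanity check (not part of the original development; the time, parameter σ, sample size and tolerances are arbitrary choices): each of the quantities E[B_t], E[B_t² − t] and E[exp(σB_t − σ²t/2)] − 1 should be near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
t, sigma, n = 2.0, 0.5, 200_000

# Sample B_t ~ N(0, t) directly from the definition of Brownian motion.
B = rng.normal(0.0, np.sqrt(t), size=n)

mean_B = B.mean()                                          # E[B_t] = 0
mean_sq = (B**2 - t).mean()                                # E[B_t^2 - t] = 0
mean_exp = np.exp(sigma * B - 0.5 * sigma**2 * t).mean()   # E[exp(σB_t - σ²t/2)] = 1

print(mean_B, mean_sq, mean_exp)
```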
Theorem 2.7.2 (The Strong Markov Property for Brownian Motion) Let {B_t} be a Brownian motion process on a filtered probability space (Ω, F, {F_t}), and let τ be a finite valued stopping time with respect to the filtration {F_t}. Then the process B_{τ+t} − B_τ, t ≥ 0, is a Brownian motion independent of F_τ.

Proof See [34] page 22.

Theorem 2.7.3 (Existence of Brownian Motion) There exists a probability space on which it is possible to define a process {B_t}, 0 ≤ t ≤ 1, which has all the properties of a Brownian motion process.

Proof See [34] page 10.
2.8 Brownian motion process with drift

An important stochastic process in applications is the one-dimensional Brownian motion with drift

X_t = µt + σB_t,

where µ is a constant, called the drift parameter, and B_t is a standard Brownian motion. Then it is easily seen that X_t has independent increments and that X_{t+h} − X_t is normally distributed with mean µh and variance σ²h. By the independent increment property we have

P(X_t ≤ x | X_{t_0} = x_0) = P(X_t − X_{t_0} ≤ x − x_0) = (1/(√(2π(t − t_0)) σ)) ∫_{−∞}^{x−x_0} exp(−(u − µ(t − t_0))²/(2(t − t_0)σ²)) du.

2.9 Brownian paths

The sample paths of a Brownian motion process are highly irregular. In fact they model the motion of a microscopic particle suspended in a fluid and subjected to the impacts of the fluid molecules, a phenomenon first reported by the Scottish botanist Robert Brown in 1828. The sample paths of a Brownian motion process are nowhere differentiable with probability 1. To see this consider the quantity

Z_h = (B_{t+h} − B_t)/h,

which is normally distributed with variance 1/h → ∞ as h → 0. Hence for every bounded Borel set B,

P(Z_h ∈ B) → 0 (h → 0),
that is, Z_h does not converge with positive probability to a finite random variable. Using Kolmogorov's Continuity Theorem, which we now state, one can show that almost all sample paths of a Brownian motion process are continuous.

Theorem 2.9.1 (Kolmogorov–Čentsov Continuity Theorem) Suppose that the stochastic process {X_t} satisfies the following condition: for all T > 0 there exist constants α > 0, β > 0, D > 0 such that

E[|X_t − X_s|^α] ≤ D|t − s|^{1+β}, 0 ≤ s, t ≤ T;  (2.9.1)

then almost every sample path is uniformly continuous on the interval [0, T].

For the proof see [15] page 57. Recall that for a Brownian motion,

P(B_t − B_s ≤ x) = (1/√(2π|t − s|)) ∫_{−∞}^x exp(−u²/(2|t − s|)) du.

Hence

E|B_t − B_s|^4 = (1/√(2π|t − s|)) ∫_{−∞}^{+∞} u^4 exp(−u²/(2|t − s|)) du = 3(t − s)²,
which verifies the Kolmogorov condition with α = 4, D = 3, β = 1 and establishes the almost sure continuity of the Brownian motion process.

We now show that each portion of almost every sample path of the Brownian motion process B_t has infinite length, i.e. almost all sample paths are of unbounded variation, so that terms in a Taylor series expansion which would ordinarily be of second order get promoted to first order. This is one of the most remarkable properties of a Brownian motion process.

Lemma 2.9.2 Let B_t be a Brownian motion process and let a = t_0^n < t_1^n < ··· < t_n^n = b denote a sequence of partitions of the interval [a, b] such that δ_n = max_k(t_k^n − t_{k−1}^n) = max_k Δt_k^n → 0 as n → ∞. Write (B_{t_k^n} − B_{t_{k−1}^n})² = (ΔB_{t_k^n})² and

S_n(B) = Σ_{k=1}^n (ΔB_{t_k^n})².

Then:
1. E[S_n(B) − (b − a)]² → 0 (δ_n → 0).
2. If δ_n → 0 so fast that

Σ_{n=1}^∞ δ_n < ∞,  (2.9.2)

then S_n(B) → b − a (a.s.).
Proof
1.

E[S_n(B) − (b − a)]² = E[Σ_{k=1}^n ((ΔB_{t_k^n})² − Δt_k^n)]²
= Σ_{k=1}^n (E[(ΔB_{t_k^n})^4] − 2Δt_k^n E[(ΔB_{t_k^n})²] + (Δt_k^n)²)
= Σ_{k=1}^n (3(Δt_k^n)² − 2(Δt_k^n)² + (Δt_k^n)²)
= Σ_{k=1}^n 2(Δt_k^n)² ≤ 2δ_n Σ_{k=1}^n Δt_k^n = 2δ_n(b − a),

which goes to zero as δ_n → 0, and E[S_n − (b − a)]² → 0.
2. By Chebyshev's inequality (1.3.33),

P(|S_n(B) − (b − a)| ≥ ε) ≤ Var(S_n(B) − (b − a))/ε² ≤ 2δ_n(b − a)/ε².  (2.9.3)

In view of (2.9.2) we can sum both sides of (2.9.3) and use the Borel–Cantelli Lemma (1.2.7) to get P(lim sup_n {|S_n(B) − (b − a)| ≥ ε}) = 0; that is, the event {ω : |S_n(B(ω)) − (b − a)| ≥ ε} occurs only a finite number of times with probability 1 as n increases to infinity. Therefore we have almost sure convergence.
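The convergence of the quadratic variation sums S_n(B) toward b − a is easy to see in simulation. A minimal sketch (illustrative only; the interval, seed and tolerance are our own choices): sample one Brownian path on a fine grid and compute S_n(B) over successively finer sub-partitions of [a, b] = [0, 1].

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.0, 1.0

# One Brownian path sampled on a fine grid; coarser partitions are sub-grids.
n_fine = 2**16
dt = (b - a) / n_fine
path = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_fine))])

def S_n(step):
    # Quadratic variation sum over the partition with mesh step*dt
    incr = np.diff(path[::step])
    return np.sum(incr**2)

sums = {2**j: S_n(2**j) for j in (8, 4, 1)}   # shrinking mesh: 256*dt, 16*dt, 2*dt
print(sums)   # values settle near b - a = 1 as the mesh shrinks
```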
The above argument shows that B_t(ω) is, a.s., of infinite variation on [a, b]. To see this note that

b − a ≤ lim sup_n (max_k |B_{t_k^n}(ω) − B_{t_{k−1}^n}(ω)|) Σ_{k=1}^n |B_{t_k^n}(ω) − B_{t_{k−1}^n}(ω)|.

From the sample-path continuity of Brownian motion, max_k |B_{t_k^n}(ω) − B_{t_{k−1}^n}(ω)| can be made arbitrarily small for almost all ω, which implies that Σ_{k=1}^n |B_{t_k^n} − B_{t_{k−1}^n}| → ∞ for almost all ω as n → ∞.

There is a simple construction for Brownian motion. Take a sequence X_1, X_2, ... of i.i.d. N(0, 1) random variables and an orthonormal basis {φ_n} for L²[0, 1]. That is,

⟨φ_n, φ_n⟩_{L²} = ∫_0^1 φ_n²(s) ds = 1,

and

⟨φ_m, φ_n⟩_{L²} = ∫_0^1 φ_m(s) φ_n(s) ds = 0,
if m ≠ n. For t ∈ [0, 1] define

B_t^n = Σ_{k=1}^n X_k ∫_0^t φ_k(s) ds.

Using the Parseval equality it is seen that

E[B_t^n − B_t^m]² → 0 (n, m → ∞).
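The Parseval computation behind this bound can be made concrete. A hedged sketch (the basis choice is ours, since the text does not fix one): with the cosine orthonormal basis of L²[0, 1], the variance of B_t^n, namely Σ_{k≤n} (∫_0^t φ_k(s) ds)², converges to ||I_{[0,t]}||² = t.

```python
import math

def int_phi(k, t):
    # φ_1 ≡ 1, φ_k(s) = √2 cos((k-1)πs) for k ≥ 2: an orthonormal basis of L²[0,1]
    if k == 1:
        return t
    m = k - 1
    return math.sqrt(2) * math.sin(m * math.pi * t) / (m * math.pi)

t = 0.3
partial = [sum(int_phi(k, t)**2 for k in range(1, n + 1)) for n in (10, 100, 10_000)]
print(partial)   # partial sums increase toward t = 0.3
```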
The completeness of L²[0, 1] implies the existence of a limit process B_t with the same covariance function as a Brownian motion. It can also be shown that B_t^n converges uniformly in t ∈ [0, 1] to B_t with probability 1 (a.s.), that is, {B_t} has continuous sample paths a.s.

2.10 Poisson process

A continuous-time, discrete-state space stochastic process {N_t}_{t≥0} which keeps the count of the occurrences of some specific event (or events) is called a counting process. The Poisson process is a counting process which, like the Brownian motion, has independent increments, but its sample paths are not continuous. They are increasing step functions with each step having height 1 and a random waiting time between two consecutive jumps. The times between successive jumps are independent and exponentially distributed with parameter λ > 0. The joint probability distribution of any finite number of values N_{t_1}, N_{t_2}, ..., N_{t_n} of the process is

P[N_{t_1} = k_1, ..., N_{t_n} = k_n] = ((λt_1)^{k_1}/k_1!) exp(−λt_1) Π_{i=1}^{n−1} ([λ(t_{i+1} − t_i)]^{k_{i+1}−k_i}/(k_{i+1} − k_i)!) exp(−λ(t_{i+1} − t_i)),

provided that t_1 ≤ t_2 ≤ ··· ≤ t_n and k_1 ≤ k_2 ≤ ··· ≤ k_n. The Poisson process is a.s. continuous at any fixed point, as shown by

P(ω : lim_{ε→0} |N_{t+ε}(ω) − N_t(ω)| = 0) = lim_{ε→0} e^{−λε} = 1.
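The description above translates directly into simulation. A minimal sketch (parameter values and sample size are arbitrary): draw i.i.d. Exp(λ) waiting times, count arrivals before t, and check that both the mean and the variance of N_t are near λt.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, t, n_paths = 3.0, 2.0, 100_000

# Interarrival times are i.i.d. Exp(lam); N_t counts arrivals up to time t.
gaps = rng.exponential(1.0 / lam, size=(n_paths, 40))   # 40 >> lam*t jumps suffice here
arrival_times = np.cumsum(gaps, axis=1)
N_t = (arrival_times <= t).sum(axis=1)

print(N_t.mean(), N_t.var())   # both should be near lam*t = 6
```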
However, the probability of continuity at all points in any interval is less than 1, so the Poisson process is not (a.s.) sample path continuous. Like any process with independent increments, the Poisson process is Markovian (see 2.2.4). However, the independent increment assumption is stronger than the Markov property.

2.11 Problems

1. Show that the Borel σ-field B(IR^∞) coincides with the smallest σ-field containing the open sets in IR^∞ in the metric ρ_∞(x¹, x²) = Σ_k 2^{−k} |x_k¹ − x_k²|/(1 + |x_k¹ − x_k²|).
2. Suppose that at time 0 you have $a and your opponent has $b. At times 1, 2, ... you bet a dollar and the game ends when somebody has $0. Let S_n be a random walk on the integers {..., −2, −1, 0, +1, +2, ...} with P(X = −1) = q, P(X = +1) = p. Let α = inf{n ≥ 1 : S_n = −a or S_n = +b}, i.e. the first time you or your opponent is ruined; then {S_{n∧α}}_{n=0}^∞ is the running total of your profit. Show that if p = q = 1/2, {S_{n∧α}} is a bounded martingale with mean 0 and that the probability of your ruin is b/(a + b). Show that if the game is not fair (p ≠ q) then S_n is not a martingale but Y_n = (q/p)^{S_n} is a martingale. Find the probability of your ruin and check that if a = b = 500, p = .499 and q = .501 then P(ruin) = .8806, and that it is almost 1 if p = 1/3.
3. Show that if {X_n} is an integrable, real valued process, with independent increments and mean 0, then it is a martingale with respect to the filtration it generates; and that if in addition X_n² is integrable, X_n² − E(X_n²) is a martingale with respect to the same filtration.
4. Let {X_n} be a sequence of i.i.d. random variables with E[X_n] = 0 and E[X_n²] = 1. Show that S_n² − n is an F_n = σ{X_1, ..., X_n}-martingale, where S_n = Σ_{i=1}^n X_i.
5. Let {y_n} be a sequence of independent random variables with E[y_n] = 1. Show that the sequence X_n = Π_{k=0}^n y_k is a martingale with respect to the filtration F_n = σ{y_0, ..., y_n}.
6. Let {X_n} and {Y_n} be two sequences of i.i.d. random variables with E[X_n] = E[Y_n] = 0, E[X_n²] < ∞, E[Y_n²] < ∞ and Cov(X_n, Y_n) = 0. Show that

{S_n^X S_n^Y − Σ_{i=1}^n Cov(X_i, Y_i)}

is an F_n = σ{X_1, ..., X_n, Y_1, ..., Y_n}-martingale, where S_n^X = Σ_{i=1}^n X_i and S_n^Y = Σ_{i=1}^n Y_i.
7. Show that two square integrable martingales X and Y are orthogonal if and only if X_0 Y_0 = 0 and the process {X_n Y_n} is a martingale.
8. Show that the square integrable martingales X and Y are orthogonal if and only if, for every 0 ≤ m ≤ n, E[X_n Y_n | F_m] = E[X_n | F_m] E[Y_n | F_m].
9. Let {B_t} be a standard Brownian motion process (B_0 = 0 a.s., σ² = 1). Show that the conditional density of {B_t} for t_1 < t < t_2, P(B_t ∈ dx | B_{t_1} = x_1, B_{t_2} = x_2), is a normal density with mean and variance

µ = x_1 + ((x_2 − x_1)/(t_2 − t_1))(t − t_1),  σ² = (t_2 − t)(t − t_1)/(t_2 − t_1).

10. Let {B_t} be a standard Brownian motion process. Show that the density of α = inf{t : B_t = b}, the first time the process B_t hits level b ∈ IR (see Example 2.2.5), is given by

f_α(t) = (|b|/√(2πt³)) e^{−b²/(2t)}, t > 0.
11. Let {B_t} be a Brownian motion process with drift µ and diffusion coefficient σ². Let x_t = e^{B_t}, t ≥ 0. Show that

E[x_t | x_0 = x] = x e^{t(µ + σ²/2)},

and

var[x_t | x_0 = x] = x² e^{2t(µ + σ²/2)} (e^{tσ²} − 1).
12. Let N_t be a standard Poisson process and Z_1, Z_2, ... a sequence of i.i.d. random variables such that P(Z_i = 1) = P(Z_i = −1) = 1/2. Show that the process

X_t = Σ_{i=1}^{N_t} Z_i

is a martingale with respect to the filtration F_t = σ{X_s, s ≤ t}.
13. Show that the process {B_t² − t, F_t^B} is a martingale, where B is the standard Brownian motion process and {F_t^B} its natural filtration.
14. Show that the process {(N_t − λt)² − λt} is a martingale, where N_t is a Poisson process with parameter λ.
15. Show that the process

I_t = ∫_0^t f(ω, s) dM_s

is a martingale. Here f(.) is an adapted, bounded process with continuous sample paths and M_t = N_t − λt is the Poisson martingale.
16. Referring to Example 2.4.4, define the processes

N_n^{sr} = Σ_{k=1}^n I_{(η_{k−1}=s, η_k=r)} = Σ_{k=1}^n ⟨X_{k−1}, e_s⟩⟨X_k, e_r⟩,  (2.11.1)

and

O_n^r = Σ_{k=1}^n I_{(η_k=r)} = Σ_{k=1}^n ⟨X_k, e_r⟩.  (2.11.2)

Show that (2.11.1) and (2.11.2) are increasing processes and give their Doob decompositions.
17. Let {X_k, F_k}, for 0 ≤ k ≤ n, be a martingale and α a stopping time. Show that E[X_α] = E[X_0].
18. Let α be a stopping time with respect to the filtration {X_n, F_n}. Show that

F_α = {A ∈ F_∞ : A ∩ {ω : α(ω) ≤ n} ∈ F_n  ∀ n ≥ 0}

is a σ-field and that α is F_α-measurable.
19. Let {X_n} be a stochastic process adapted to the filtration {F_n} and B a Borel set. Show that

α_B = inf{n ≥ 0 : X_n ∈ B}

is a stopping time with respect to {F_n}.
20. Show that if α_1, α_2 are two stopping times such that α_1 ≤ α_2 (a.s.) then F_{α_1} ⊂ F_{α_2}.
21. Show that if α is a stopping time and a is a positive constant, then α + a is a stopping time.
22. Show that if {α_n} is a sequence of stopping times and the filtration {F_t} is right-continuous, then inf α_n, lim inf α_n and lim sup α_n are stopping times.
3
Stochastic calculus
3.1 Introduction

It is known that if a function f is continuous and a function g is right continuous with left limits and of bounded variation (see Definition 2.6.1), then the Riemann–Stieltjes integral of f with respect to g on [0, t] is well defined and equals

∫_0^t f(s) dg(s) = lim_{δ_n→0} Σ_{k=1}^n f(τ_k^n)(g(t_k^n) − g(t_{k−1}^n)),

where 0 = t_0^n < t_1^n < ··· < t_n^n = t denotes a sequence of partitions of the interval [0, t] such that δ_n = max_k(t_k^n − t_{k−1}^n) → 0 as n → ∞ and t_{k−1}^n ≤ τ_k^n ≤ t_k^n. The Lebesgue–Stieltjes integral with respect to g can be defined by constructing a measure µ_g on the Borel field B([0, ∞)), starting with the definition µ_g((a, b]) = g(b) − g(a), and then starting with the integral of simple functions f with respect to µ_g, as in Chapter 1. For right continuous, left limited stochastic processes with bounded variation sample paths, path-by-path integration is defined for each sample path by fixing ω and performing Lebesgue–Stieltjes integration with respect to the variable t. If a continuous (local) martingale X has bounded variation, its quadratic variation is zero (see Remark 2.6.10(4)). However, continuous (local) martingales have unbounded variation, so that the Stieltjes definition cannot be used in stochastic integration to define path-by-path integrals. We assume that the dependence of f on ω is constant in time.
3.2 Quadratic variations

Discrete-time processes

Definition 3.2.1 The stochastic process X = {X_n}, n ≥ 0, is said to be square integrable if sup_n E[X_n²] < ∞.
Definition 3.2.2 Let {X_n} be a discrete time, square integrable stochastic process on a filtered probability space (Ω, F, {F_n}, P).
1. The nonnegative, increasing process defined by

[X, X]_n = X_0² + Σ_{k=1}^n (X_k − X_{k−1})²

is called the optional quadratic variation of {X_n}. The predictable quadratic variation of {X_n} relative to the filtration {F_n} and probability measure P is defined by

⟨X, X⟩_n = E(X_0²) + Σ_{k=1}^n E[(X_k − X_{k−1})² | F_{k−1}].

2. Given two square integrable processes {X_n} and {Y_n} the optional covariation process is defined by

[X, Y]_n = X_0 Y_0 + Σ_{i=1}^n (X_i − X_{i−1})(Y_i − Y_{i−1}),

and the predictable covariation process is defined by

⟨X, Y⟩_n = E[X_0 Y_0] + Σ_{i=1}^n E[(X_i − X_{i−1})(Y_i − Y_{i−1}) | F_{i−1}].
Example 3.2.3 Let X_1, X_2, ... be a sequence of i.i.d. normal random variables with mean 0 and variance 1, and consider the process Z_0 = 0 and Z_n = Σ_{k=1}^n X_k. Then it is left as an exercise to show that

[Z, Z]_n = Σ_{k=1}^n X_k²,  ⟨Z, Z⟩_n = n,  E([Z, Z]_n) = E(Σ_{k=1}^n X_k²) = n.

Here ⟨Z, Z⟩_n is not random and is equal to the variance of Z_n.
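This example is easy to render numerically (a sketch; the sample size is an arbitrary choice): for the Gaussian random walk, [Z, Z]_n is genuinely random while ⟨Z, Z⟩_n = n is a constant, and the mean of [Z, Z]_n matches n.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_paths = 50, 20_000

X = rng.normal(size=(n_paths, n))     # i.i.d. N(0,1) increments, Z_0 = 0
optional = (X**2).sum(axis=1)         # [Z,Z]_n = sum of X_k^2, one value per path
predictable = n                       # <Z,Z>_n = sum of E[X_k^2 | F_{k-1}] = n

print(optional.std(), optional.mean(), predictable)
```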
Example 3.2.4 Let Ω = {ω_i, 1 ≤ i ≤ 8} and the time index be n = 0, 1, 2, 3. Suppose we are given a probability measure P(ω_i) = 1/8, i = 1, ..., 8, a filtration

F_0 = {Ω, ∅},
F_1 = σ{{ω_1, ω_2, ω_3, ω_4}, {ω_5, ω_6, ω_7, ω_8}},
F_2 = σ{{ω_1, ω_2}, {ω_3, ω_4}, {ω_5, ω_6}, {ω_7, ω_8}},
F_3 = σ{{ω_1}, {ω_2}, {ω_3}, {ω_4}, {ω_5}, {ω_6}, {ω_7}, {ω_8}},
and a stochastic process X = (X_n(ω_i)), n = 0, 1, 2, 3, i = 1, ..., 8, which is adapted to the filtration {F_i, i = 0, 1, 2, 3}; that is,

X_0 = x_0 on all of Ω,
X_1 = x_{1,1} on {ω_1, ω_2, ω_3, ω_4} and x_{1,2} on {ω_5, ω_6, ω_7, ω_8},
X_2 = x_{2,1} on {ω_1, ω_2}, x_{2,2} on {ω_3, ω_4}, x_{2,3} on {ω_5, ω_6}, x_{2,4} on {ω_7, ω_8},
X_3 = x_{3,i} on {ω_i}, i = 1, ..., 8.

In this simple example the stochastic process

⟨X, X⟩_n = E(X_0²) + Σ_{k=1}^n E[(X_k − X_{k−1})² | F_{k−1}]

can be explicitly calculated:

⟨X, X⟩_0 = E(X_0²) = x_0²,
⟨X, X⟩_1 = E(X_0²) + E[(X_1 − X_0)² | F_0] = x_0² + E[(X_1 − X_0)²] = x_0² + (4/8)(x_{1,1} − x_0)² + (4/8)(x_{1,2} − x_0)².

Note that ⟨X, X⟩_0 and ⟨X, X⟩_1 are both F_0-measurable, that is, they are constants.

⟨X, X⟩_2(ω) = E(X_0²) + E[(X_1 − X_0)²] + E[(X_2 − X_1)² | F_1](ω)
= ⟨X, X⟩_1 + E[(X_2 − X_1)² | {ω_1, ω_2, ω_3, ω_4}] I_{{ω_1,ω_2,ω_3,ω_4}} + E[(X_2 − X_1)² | {ω_5, ω_6, ω_7, ω_8}] I_{{ω_5,ω_6,ω_7,ω_8}}
= ⟨X, X⟩_1 + (((x_{2,1} − x_{1,1})²(2/8) + (x_{2,2} − x_{1,1})²(2/8)) / P{ω_1, ω_2, ω_3, ω_4}) I_{{ω_1,ω_2,ω_3,ω_4}} + (((x_{2,3} − x_{1,2})²(2/8) + (x_{2,4} − x_{1,2})²(2/8)) / P{ω_5, ω_6, ω_7, ω_8}) I_{{ω_5,ω_6,ω_7,ω_8}}
= x_0² + (4/8)(x_{1,1} − x_0)² + (4/8)(x_{1,2} − x_0)² + (((x_{2,1} − x_{1,1})² + (x_{2,2} − x_{1,1})²)/2) I_{{ω_1,ω_2,ω_3,ω_4}} + (((x_{2,3} − x_{1,2})² + (x_{2,4} − x_{1,2})²)/2) I_{{ω_5,ω_6,ω_7,ω_8}}.
Note that ⟨X, X⟩_2 is F_1-measurable.

⟨X, X⟩_3(ω) = E(X_0²) + E[(X_1 − X_0)²] + E[(X_2 − X_1)² | F_1](ω) + E[(X_3 − X_2)² | F_2](ω)
= ⟨X, X⟩_2 + E[(X_3 − X_2)² | F_2](ω)
= ⟨X, X⟩_2 + (((x_{3,1} − x_{2,1})² + (x_{3,2} − x_{2,1})²)/2) I_{{ω_1,ω_2}} + (((x_{3,3} − x_{2,2})² + (x_{3,4} − x_{2,2})²)/2) I_{{ω_3,ω_4}} + (((x_{3,5} − x_{2,3})² + (x_{3,6} − x_{2,3})²)/2) I_{{ω_5,ω_6}} + (((x_{3,7} − x_{2,4})² + (x_{3,8} − x_{2,4})²)/2) I_{{ω_7,ω_8}}.

Note that ⟨X, X⟩_3 is F_2-measurable.
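The conditional expectations in this example reduce to averages over the cells of each partition, so ⟨X, X⟩_2 can be computed mechanically. A small sketch (the numeric values for x_0 and the x_{i,j}, which the text leaves symbolic, are invented here):

```python
import numpy as np

# Hypothetical numeric values for the symbolic entries x0, x_{1,j}, x_{2,j}
x0 = 1.0
X = np.array([
    [x0] * 8,                                  # X_0
    [2.0, 2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5],  # X_1, constant on F_1 cells
    [3.0, 3.0, 1.5, 1.5, 1.0, 1.0, 0.2, 0.2],  # X_2, constant on F_2 cells
])
P = np.full(8, 1 / 8)
F = {0: [list(range(8))], 1: [[0, 1, 2, 3], [4, 5, 6, 7]]}   # partitions for F_0, F_1

def cond_exp(values, cells):
    # E[values | partition]: weighted average on each cell, constant on the cell
    out = np.empty(8)
    for cell in cells:
        out[cell] = np.average(values[cell], weights=P[cell])
    return out

# <X,X>_2(ω) = E[X_0^2] + E[(X_1-X_0)^2 | F_0] + E[(X_2-X_1)^2 | F_1]
qv2 = x0**2 + cond_exp((X[1] - X[0])**2, F[0]) + cond_exp((X[2] - X[1])**2, F[1])
print(qv2)   # F_1-measurable: constant on {ω1..ω4} and on {ω5..ω8}
```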
Theorem 3.2.5 If {X_n} is a square integrable martingale then X² is a submartingale and X² − ⟨X, X⟩ is a martingale, i.e. ⟨X, X⟩ is the unique predictable, increasing process in the Doob decomposition of X².

Proof From Jensen's inequality 2.3.3,

E[X_n² | F_{n−1}] ≥ (E[X_n | F_{n−1}])² = X_{n−1}².

Hence X² is a submartingale. The rest of the proof is left as an exercise.

Theorem 3.2.6 If X and Y are (square integrable) martingales, then XY − [X, Y] and XY − ⟨X, Y⟩ are martingales.

Proof

E(X_n Y_n − [X, Y]_n | F_{n−1}) = −[X, Y]_{n−1} + E(X_n Y_n − (X_n − X_{n−1})(Y_n − Y_{n−1}) | F_{n−1})
= −[X, Y]_{n−1} − X_{n−1} Y_{n−1} + E(X_n Y_{n−1} + X_{n−1} Y_n | F_{n−1})
= −[X, Y]_{n−1} − X_{n−1} Y_{n−1} + 2X_{n−1} Y_{n−1}
= X_{n−1} Y_{n−1} − [X, Y]_{n−1}.

The proof for XY − ⟨X, Y⟩ is similar. Two martingales X and Y are orthogonal if and only if ⟨X, Y⟩_n = 0 for all n.
Example 3.2.7 Returning to Example 2.3.15, we call the stochastic process X_n = Σ_{k=1}^n A_k b_k = Σ_{k=1}^n A_k ΔC_k a stochastic integral with predictable integrand A and integrator the martingale C. Note that the predictability of the integrand is a rather natural requirement. In discrete time the stochastic integral is usually called the martingale transform and it is usually written

(A • C)_n = Σ_{k=1}^n A_k ΔC_k.

Stochastic integrals can be defined for more general integrands and integrators.

Theorem 3.2.8 For any discrete time process X = {X_n} we have:

Σ_{k=1}^n X_{k−1} ΔX_k = ½(X_n² − [X, X]_n).

Proof

2 Σ_{k=1}^n X_{k−1} ΔX_k + [X, X]_n = X_0² + Σ_{k=1}^n [2X_{k−1}(X_k − X_{k−1}) + (X_k − X_{k−1})²] = X_0² + (X_n² − X_0²) = X_n².

In order to recover the analog of the familiar form of the integral ∫ X_s dX_s = ½(X_t² − X_0²) we should replace the integrand X_{k−1} by a non-predictable one, (X_{k−1} + X_k)/2. This is a discrete-time Stratonovich integral and:

Σ_{k=1}^n ((X_{k−1} + X_k)/2) ΔX_k = ½(X_n² − X_0²).

However, we then lose the martingale property of the stochastic integral. The following result, which is proved using the identity

[X, Y]_n = ½([X + Y, X + Y]_n − [X, X]_n − [Y, Y]_n),

is the integration (or summation) by parts formula.

Theorem 3.2.9

X_n Y_n = Σ_{k=1}^n X_{k−1} ΔY_k + Σ_{k=1}^n Y_{k−1} ΔX_k + [X, Y]_n.

We now state the rather trivial discrete-time version of the so-called Itô formula of stochastic calculus.

Theorem 3.2.10 For a real valued differentiable function f and a stochastic process X we have

f(X_n) = f(X_0) + Σ_{k=1}^n f′(X_{k−1}) ΔX_k + Σ_{k=1}^n [f(X_k) − f(X_{k−1}) − f′(X_{k−1}) ΔX_k].
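Theorems 3.2.8 and 3.2.9 (and the Stratonovich variant) are exact pathwise identities, so they can be checked to machine precision on any sample path. A quick sketch (the coin-toss path is our own choice):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.concatenate([[0.0], np.cumsum(rng.choice([-1.0, 1.0], size=100))])
dX = np.diff(X)

opt_qv = X[0]**2 + np.cumsum(dX**2)            # [X,X]_n for n = 1..100
lhs = np.cumsum(X[:-1] * dX)                   # sum of X_{k-1} ΔX_k
rhs = 0.5 * (X[1:]**2 - opt_qv)                # (1/2)(X_n^2 - [X,X]_n)

strat = np.cumsum(0.5 * (X[:-1] + X[1:]) * dX)  # Stratonovich-type sum
err1 = np.max(np.abs(lhs - rhs))
err2 = np.max(np.abs(strat - 0.5 * (X[1:]**2 - X[0]**2)))
print(err1, err2)   # both identities hold exactly, up to rounding
```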
Continuous-time processes

We begin by recalling a few definitions and results regarding deterministic functions.

Definition 3.2.11 The quadratic variation S_n(f) of a function f on an interval [a, b] is

S_n(f) = Σ_{k=1}^n (f(t_k^n) − f(t_{k−1}^n))²,

where a = t_0^n < t_1^n < ··· < t_n^n = b denotes a sequence of partitions of the interval [a, b] such that δ_n = max_k(t_k^n − t_{k−1}^n) → 0 as n → ∞.

Lemma 3.2.12 If f is a continuous real valued function of bounded variation (see Definition 2.6.1) then its quadratic variation on any interval [a, b] is 0, that is

lim_{n→∞} S_n(f) = lim_{n→∞} Σ_{k=1}^n (f(t_k^n) − f(t_{k−1}^n))² = 0,

for partitions as above.

Proof Since f is of bounded variation there exists M > 0 bounding its total variation on [a, b], and since f is (uniformly) continuous, for ε > 0 we can choose a partition so fine that max_k |f(t_k^n) − f(t_{k−1}^n)| < ε/M. Then

S_n(f) ≤ max_k |f(t_k^n) − f(t_{k−1}^n)| Σ_{k=1}^n |f(t_k^n) − f(t_{k−1}^n)| < (ε/M) M = ε,

and the result follows.

Let {X_t, F_t} be a square integrable martingale. Then {X_t², F_t} is a nonnegative submartingale, hence of class DL, and from the Doob–Meyer decomposition there exists a unique predictable increasing process {⟨X, X⟩_t, F_t} such that X_t² = M_t + ⟨X, X⟩_t, where {M_t, F_t} is a right-continuous martingale and ⟨X, X⟩_0 = X_0².

Lemma 3.2.13 Suppose X = {X_t, F_t} is a square integrable martingale. Then:
1. X = X^c + X^d, where X^c is the continuous martingale part of X and X^d is the purely discontinuous martingale part of X. This decomposition is unique.
2. E[Σ_s (ΔX_s)²] ≤ E[X_∞²], where X_∞ = lim_{t→∞} X_t.
3. For any t, Σ_{s≤t} (ΔX_s)² < ∞ a.s.

Proof See [11] page 97.
The following result is analogous to Lemma 3.2.13.
Lemma 3.2.14 Suppose X = {X_t, F_t} is a local martingale. Then:
1. X = X^c + X^d, where X^c is the continuous local martingale part of X and X^d is the purely discontinuous local martingale part of X. This decomposition is unique.
2. For any t, Σ_{s≤t} (ΔX_s)² < ∞ a.s.

Proof See [11] page 119.

Definition 3.2.15 Let X = {X_t, F_t} be a square integrable martingale.
1. ⟨X, X⟩ is called the predictable quadratic variation of X.
2. The optional increasing process

[X, X]_t = ⟨X^c, X^c⟩_t + Σ_{s≤t} (ΔX_s)²

is called the optional quadratic variation of X. Here X = X^c + X^d is the unique decomposition given by Lemma 3.2.13.

Example 3.2.16 If {N_t} is a Poisson process with parameter λ, ΔN_s = 0 or 1 for all s ≥ 0 and ⟨N^c, N^c⟩_t = 0. Therefore

[N, N]_t = Σ_{0≤s≤t} (ΔN_s)² = N_t.

Since {N_t − λt} is a martingale that is 0 at 0, we have ⟨N, N⟩_t = λt.

Theorem 3.2.17 If X = {X_t, F_t} is a continuous local martingale, there exists a unique increasing process ⟨X, X⟩, vanishing at zero, such that X² − ⟨X, X⟩ is a continuous local martingale.

Proof See [32] page 124.
Definition 3.2.18 Suppose X = X_0 + M + V is a semimartingale (see Definition 2.6.4). Then the optional quadratic variation of X is the process

[X, X]_t = ⟨X^c, X^c⟩_t + Σ_{s≤t} (ΔX_s)².

By definition V has finite variation on [0, t], so

Σ_{s≤t} (ΔV_s)² ≤ K Σ_{s≤t} |ΔV_s| < ∞

for some K (depending on ω and t). Also, from Lemma 3.2.14, Σ_{s≤t} (ΔM_s)² < ∞. Therefore Σ_{s≤t} (ΔX_s)² is a.s. finite, because (ΔX_s)² ≤ 2(ΔM_s)² + 2(ΔV_s)².
Lemma 3.2.19 Almost every sample path of [X, X ] is right-continuous with left limits and of finite variation on each compact subset of IR. Further, [X, X ]t < ∞ a.s. for each t ∈ [0, ∞). Proof
See [11].
Definition 3.2.20 Suppose {X_t, F_t} and {Y_t, F_t} are two square integrable martingales. Then

⟨X, Y⟩ = ½(⟨X + Y, X + Y⟩ − ⟨X, X⟩ − ⟨Y, Y⟩).

⟨X, Y⟩ is the unique predictable process of integrable variation (see Definition 2.6.2) such that XY − ⟨X, Y⟩ is a martingale and X_0 Y_0 = ⟨X, Y⟩_0. Two square integrable martingales X and Y are called orthogonal martingales if ⟨X, Y⟩_t = 0, a.s., holds for every t ≥ 0.

Remark 3.2.21 From the definition, the orthogonality of two square integrable martingales X and Y implies that XY is a martingale. Conversely, from the identity

E[(X_t − X_s)(Y_t − Y_s) | F_s] = E[X_t Y_t − X_t Y_s − X_s Y_t + X_s Y_s | F_s] = E[X_t Y_t − X_s Y_s | F_s] = E[⟨X, Y⟩_t − ⟨X, Y⟩_s | F_s],

if XY is a martingale then the two square integrable martingales X and Y are orthogonal.
Definition 3.2.22 Suppose {X_t, F_t} and {Y_t, F_t} are two square integrable martingales. Define

[X, Y] = ½([X + Y, X + Y] − [X, X] − [Y, Y]).

Then [X, Y] is of integrable variation (see Definition 2.6.2), XY − [X, Y] is a martingale and [X, Y]_0 = X_0 Y_0.

Remark 3.2.23 From the definition,

[X, Y]_t = ⟨X^c, Y^c⟩_t + Σ_{s≤t} ΔX_s ΔY_s.

Definition 3.2.24 Suppose X = {X_t, F_t} is a local martingale and let X = X^c + X^d be its unique decomposition into a continuous local martingale and a totally discontinuous local martingale. Then the optional quadratic variation of X is the increasing process

[X, X]_t = ⟨X^c, X^c⟩_t + Σ_{s≤t} (ΔX_s)².
If X, Y are local martingales,

[X, Y]_t = ½([X + Y, X + Y]_t − [X, X]_t − [Y, Y]_t) = ⟨X^c, Y^c⟩_t + Σ_{s≤t} ΔX_s ΔY_s.

We end this section with the following useful inequalities. Write

H² = {uniformly integrable (see Definition 1.3.34) martingales {M_t} such that sup_t |M_t| ∈ L²}.  (3.2.1)
Theorem 3.2.25 Suppose X, Y ∈ H² and f, g are measurable processes (see Definition 2.1.9). If 1 < p < ∞ and 1/p + 1/q = 1 then

E ∫_0^∞ |f_s||g_s| |d⟨X, Y⟩_s| ≤ ||(∫_0^∞ f_s² d⟨X, X⟩_s)^{1/2}||_p ||(∫_0^∞ g_s² d⟨Y, Y⟩_s)^{1/2}||_q,

and

E ∫_0^∞ |f_s||g_s| |d[X, Y]_s| ≤ ||(∫_0^∞ f_s² d[X, X]_s)^{1/2}||_p ||(∫_0^∞ g_s² d[Y, Y]_s)^{1/2}||_q.

Proof See [11] page 102.
Theorem 3.2.26 (Time-Change for Martingales) If M is an F_t-continuous local martingale vanishing at 0 and such that ⟨M, M⟩_∞ = ∞, and if we set T_t = inf{s : ⟨M, M⟩_s > t}, then B_t = M_{T_t} is an F_{T_t}-Brownian motion and M_t = B_{⟨M,M⟩_t}.

Proof
See [32] page 181.
3.3 Simple examples of stochastic integrals

Example 3.3.1 Suppose {X_t}, t ≥ 0, is a stochastic process representing the random price of some asset. Consider a partition 0 = t_0^n < t_1^n < ··· < t_n^n = t of the interval [0, t]. Suppose ξ_{t_i}, i = 0, 1, ..., n − 1, is the amount of the asset which is bought at time t_i for the price X_{t_i}. This amount ξ_{t_i} is held until time t_{i+1}, when it is sold for price X_{t_{i+1}}; the amount gained (or lost) is therefore ξ_{t_i}(X_{t_{i+1}} − X_{t_i}). Then ξ_{t_{i+1}} is bought at time t_{i+1}. Clearly ξ_{t_i} should be predictable with respect to the filtration {F_t^X} generated by X. Then

Σ_{i=0}^{n−1} ξ_{t_i}(X_{t_{i+1}} − X_{t_i}) = ∫_0^t ξ_s dX_s

is the total increase (or loss) in the trader's wealth from holding these amounts of the asset.
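A minimal numerical sketch of this "gains from trade" sum (all numbers invented): when X is a martingale, here a fair coin-toss walk, and ξ is predictable, the expected total gain is zero regardless of the strategy.

```python
import numpy as np

rng = np.random.default_rng(5)
n_steps, n_paths = 50, 100_000

dX = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))   # fair price moves
X = np.cumsum(dX, axis=1)

# Predictable position: decided from prices strictly before each move
xi = np.concatenate([np.ones((n_paths, 1)),
                     (X[:, :-1] > 0).astype(float)], axis=1)  # "hold 1 share iff price > 0"
gains = (xi * dX).sum(axis=1)
print(gains.mean())   # ≈ 0: a predictable rule cannot beat a fair game
```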
Example 3.3.2 Since the sample paths of a Poisson process N_t are increasing and of finite variation we can write

∫_0^t X_s(ω) dN_s(ω) = Σ_{k=1}^∞ X_{α_k(ω)}(ω) I_{(α_k ≤ t)}(ω),

where α_k is the time of the k-th jump. Recall that the number of jumps in any finite interval [0, t] is finite with probability 1. Hence the infinite series has only finitely many nonzero terms for almost all ω.

Example 3.3.3 Stochastic integration with respect to the family of martingales q(t, A) related to the single jump process (see Examples 2.1.4 and 2.6.12) is simply ordinary (Stieltjes) integration with respect to the measures µ and µ_p applied to suitable integrands. Recall that µ picks out the jump time T and the location Z of the stochastic process X; that is, µ(ds, dz) is nonzero only when T ∈ ds and Z ∈ dz. Therefore, we may write for any suitable real valued function g defined on Ω = [0, ∞] × E:

∫ g(s, z) q(ds, dz) = ∫ g(s, z) µ(ds, dz) − ∫ g(s, z) µ_p(ds, dz),

where

∫ g(s, z) µ(ds, dz) = g(T, Z),

since the random measure µ picks out the jump time T and the location Z only. We say that g ∈ L¹(µ) if

||g||_{L¹(µ)} = E ∫ |g| dµ = E[|g(T, Z)|] < ∞.

We say that g ∈ L¹_loc(µ) if g I_{t<τ_n} ∈ L¹(µ) for some sequence of stopping times τ_n ↑ ∞ a.s. Using (2.1.1) and (2.1.2) we have

µ_p(t, A) = −∫_{]0,T∧t]} dF_s^A/F_{s−} = ∫_{]0,T∧t]} λ(A, s) dΛ(s).  (3.3.1)

Hence

∫ g(s, z) µ_p(ds, dz) = ∫_{]0,T]} ∫_E g(s, z) λ(dz, s) dΛ(s).

We also have

∫ g(s, z) µ_p(ds, dz) = ∫_{]0,T]} ∫_E g(s, z) P(ds, dz)/F_{s−}.

Define

M_t^g = ∫ I_{s≤t} g(s, z) q(ds, dz) = ∫ I_{s≤t} g(s, z) µ(ds, dz) − ∫ I_{s≤t} g(s, z) µ_p(ds, dz),
or, from the definition,

M_t^g = g(T, Z) I_{T≤t} − ∫_{]0,T∧t]} ∫_E g(s, z) P(ds, dz)/F_{s−}.

Theorem 3.3.4 M_t^g is an F_t-martingale for g ∈ L¹(µ).

Proof For t > s,

E[M_t^g − M_s^g | F_s] = E[g(T, Z)(I_{T≤t} − I_{T≤s}) − (∫_{]0,T∧t]} ∫_E g(u, z) P(du, dz)/F_{u−} − ∫_{]0,T∧s]} ∫_E g(u, z) P(du, dz)/F_{u−}) | F_s].

So we must show that

E[g(T, Z)(I_{T≤t} − I_{T≤s}) | F_s] = E[∫_{]0,T∧t]} ∫_E g(u, z) P(du, dz)/F_{u−} − ∫_{]0,T∧s]} ∫_E g(u, z) P(du, dz)/F_{u−} | F_s].  (3.3.2)

First note that if T ≤ s both sides of (3.3.2) are zero. Now

E[g(T, Z)(I_{T≤t} − I_{T≤s}) | F_s] = E[g(T, Z) I_{s<T≤t} | T > s] I_{T>s} = (I_{T>s}/F_s) ∫_{]s,t]} ∫_E g(u, z) P(du, dz),

and, splitting the right side of (3.3.2) according to {T > t} and {s < T ≤ t},

E[∫_{]0,T∧t]} ∫_E g(u, z) P(du, dz)/F_{u−} − ∫_{]0,T∧s]} ∫_E g(u, z) P(du, dz)/F_{u−} | F_s]
= (I_{T>s}/F_s) [F_t ∫_{]s,t]} ∫_E g(u, z) P(du, dz)/F_{u−} − ∫_{]s,t]} (∫_{]s,r]} ∫_E g(u, z) P(du, dz)/F_{u−}) dF_r].
Interchanging the order of integration, the triple integral is

∫_{]s,t]} (∫_{]s,r]} ∫_E g(u, z) P(du, dz)/F_{u−}) dF_r = ∫_{]s,t]} ∫_E (g(u, z)/F_{u−}) (∫_{[u,t]} dF_r) P(du, dz)
= ∫_{]s,t]} ∫_E (g(u, z)/F_{u−}) (F_t − F_{u−}) P(du, dz)
= F_t ∫_{]s,t]} ∫_E g(u, z) P(du, dz)/F_{u−} − ∫_{]s,t]} ∫_E g(u, z) P(du, dz).

Therefore (3.3.2) holds and the result follows.
3.4 Stochastic integration with respect to a Brownian motion

Let B = {B_t, t ≥ 0} be a Brownian motion process and let 0 = t_0^n < t_1^n < ··· < t_n^n = t denote a sequence of partitions of the interval [0, t] such that δ_n = max_k(t_k^n − t_{k−1}^n) → 0 as n → ∞. Write formally

I_t = ∫_0^t B_s dB_s.

If the usual integration-by-parts formula ∫_0^t B_s dB_s = ½(B_t² − B_0²) were true for stochastic integrals, then I_t = ½B_t². (This assumes the existence, in some sense, of the limit, as δ_n = max_k(t_k^n − t_{k−1}^n) → 0 (n → ∞), of the Riemann–Stieltjes sums S_n = Σ_{k=1}^n B_{τ_k^n}(B_{t_k^n} − B_{t_{k−1}^n}), where t_{k−1}^n ≤ τ_k^n ≤ t_k^n.) Now S_n can be written as

S_n = ½B_t² + S_n′,  (3.4.1)

where

S_n′ = −½ Σ_{k=1}^n (B_{t_k^n} − B_{t_{k−1}^n})² + Σ_{k=1}^n (B_{τ_k^n} − B_{t_{k−1}^n})² + Σ_{k=1}^n (B_{t_k^n} − B_{τ_k^n})(B_{τ_k^n} − B_{t_{k−1}^n}).

To see this write

B_{τ_k^n}(B_{t_k^n} − B_{t_{k−1}^n}) = (B_{τ_k^n} − B_{t_{k−1}^n} + B_{t_{k−1}^n})(B_{t_k^n} − B_{τ_k^n} + B_{τ_k^n} − B_{t_{k−1}^n})
= (B_{t_k^n} − B_{τ_k^n})(B_{τ_k^n} − B_{t_{k−1}^n}) + (B_{τ_k^n} − B_{t_{k−1}^n})² + B_{t_{k−1}^n}(B_{t_k^n} − B_{t_{k−1}^n}).  (3.4.2)
91
The last term in (3.4.2) is written n (B n − B n ) = (B n n ) Btk−1 tk tk−1 tk−1 − Btkn + Btkn )(Btkn − Btk−1
2 n ) + B n (B n − B n ) = −(Btkn − Btk−1 tk tk tk−1
1 2 n ) − 2B n (B n − B n )] = − [2(Btkn − Btk−1 tk tk tk−1 2 1 1 1 2 2 2 n ) − n ) − B n] + = − (Btkn − Btk−1 [(Btkn − Btk−1 Bn tk 2 2 2 tk 1 1 2 1 2 n ) − = − (Btkn − Btk−1 + Bt2kn . Btk−1 n 2 2 2 n Using this form of Sn one can show that if τkn = (1 − α)tk + αtk−1 , 0 ≤ α ≤ 1, then
L2
lim Sn =
δn →0
Bt2 + (α − 12 )t = It (α), 2
where Sn is given by (3.4.1). It is interesting to notice that the stochastic integral It (α) =
Bt2 + (α − 12 )t 2
n n -measurable is an Ft -martingale if and only if α = 0. When α = 0 the integrand Btk−1 is Ftk−1 and so does not anticipate future events in Ftkn . Then, because B has independent increments, n n Btk−1 is independent of the integrator Btkn − Btk−1 which gives E[Sn ] = 0. t K. Itˆo [17] has given a definition of the integral f (s, ω)dBs (ω) for the class of pre-
0
dictable, locally square integrable stochastic processes { f (t, ω)}. The next important step was given by H. Kunita and S. Watanabe in 1967 [24]. They extended the definition of Itˆo by replacing the Brownian motion process by an arbitrary square integrable martingale {X t } employing the quadratic variation processes X, X t . The stochastic (Itˆo) integral with respect to a Brownian motion integrator will be defined for two classes of integrands. The larger class of integrands gives an integral which is a local martingale. The more restricted class of integrands gives an integral which is a martingale. Suppose (, F, P) is a probability space and B = {Bt , t ≥ 0} is a standard Brownian motion. Write Ft0 = σ {Bu : u ≤ t} and {Ft , t ≥ 0} for the right continuous, complete filtration generated by B. Let H be the set of all adapted, measurable processes { f (ω, t), Ft } such that with probability 1, 0
t
f 2 (ω, s)ds < ∞,
∀t ≥ 0,
and let {H 2 , ||.|| H 2 } be the normed space of all adapted, measurable processes { f (ω, t), Ft } such that t 2 E f (ω, s)ds < ∞, ∀t ≥ 0, 0
where || f || H 2 = E
t 0
f 2 (ω, s)ds
1/2
, for f ∈ H 2 .
It is clear that H 2 ⊂ H , since for a nonnegative random variable X , if P(X = ∞) = 0
then
E[X ] = ∞,
in other words, if E[X ] < ∞
P(X = ∞) = 0. t In our case the nonnegative random variable is f 2 (ω, s)ds. then
0
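The dependence of the limit $I_t(\alpha)$ on the evaluation point can be checked by simulation. The sketch below (step counts, sample sizes and seed are arbitrary) builds Brownian paths on a fine grid, forms the Riemann sums with $\tau_k = (1-\alpha)t_{k-1} + \alpha t_k$ for $\alpha\in\{0,\tfrac12,1\}$, and averages $S_n - B_t^2/2$ over paths; the averages should lie near $(\alpha-\tfrac12)t$.

```python
import random

def riemann_sum_limit(alpha, n=500, paths=300, t=1.0, seed=1):
    """Average of S_n - B_t^2/2 over simulated Brownian paths, where
    S_n = sum_k B_{tau_k} (B_{t_k} - B_{t_{k-1}}) and
    tau_k = (1 - alpha) t_{k-1} + alpha t_k.
    The limit theory predicts a value near (alpha - 1/2) t."""
    rng = random.Random(seed)
    fine = 2 * n                 # two fine steps per partition interval, so
    sd = (t / fine) ** 0.5       # tau_k lies on the grid for alpha in {0, 1/2, 1}
    off = round(2 * alpha)       # fine-grid offset of tau_k past t_{k-1}
    acc = 0.0
    for _ in range(paths):
        b = [0.0]
        for _ in range(fine):
            b.append(b[-1] + rng.gauss(0.0, sd))
        s_n = sum(b[2 * k + off] * (b[2 * k + 2] - b[2 * k]) for k in range(n))
        acc += s_n - b[-1] ** 2 / 2
    return acc / paths

for a in (0.0, 0.5, 1.0):
    print(a, riemann_sum_limit(a))   # near -0.5, 0.0, +0.5 respectively
```

For $\alpha = 0$ (the Itô choice) the sums have mean zero, matching the martingale property discussed above; $\alpha = \tfrac12$ corresponds to the Stratonovich convention.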
As in the definition of the (deterministic) Stieltjes integral, a natural way to define the stochastic integral is to start with simple functions, that is, piecewise constant processes.

Definition 3.4.1 A (bounded and predictable) function $f(\omega,t)$ is simple on the interval $[0,t]$ if $f(0,\omega)$ is constant and, for $s\in(0,t]$,
$$f(s,\omega) = \sum_{k=0}^{n-1} f_k(\omega)\,I_{(t_k,t_{k+1}]}(s),$$
where $0 = t_0 < t_1 < \dots < t_n = t$ is a partition of the interval $[0,t]$ independent of $\omega$, each $f_k(\omega)$ is $\mathcal F_{t_k}$-measurable and $E[f_k^2] < \infty$.

For any simple function $f(\omega,t)\in H$ (or $f(\omega,t)\in H^2$) the Itô stochastic integral is defined as
$$I(f) = \int_0^t f(\omega,s)\,dB_s(\omega) = \sum_k f_k(\omega)\big(B_{t_{k+1}}(\omega) - B_{t_k}(\omega)\big).$$
Note that each $f_k$ is $\mathcal F_{t_k}$-measurable and hence independent of the increment $(B_{t_{k+1}} - B_{t_k})$, because of the independent increment property of the Brownian motion $B = \{B_t, t\ge0\}$.

In order to define the integral for functions in $\{H^2,\|\cdot\|_{H^2}\}$ we need a few preliminary results.

Lemma 3.4.2 ([16]). Let $(\Omega,\mathcal F,\mathcal F_t,P)$ be a filtered probability space. Let $L$ be a linear space of real, bounded, measurable stochastic processes such that:
1. $L$ contains all bounded, left-continuous adapted processes;
2. if $\{X^n\}$ is a monotone increasing sequence of processes in $L$ such that $X = \sup_n X^n$ is bounded, then $X\in L$.
Then $L$ contains all bounded predictable processes.

Proof See [16] page 21.
Lemma 3.4.3 Let $S^2$ be the set of all simple processes in $H^2$. Then:
1. $S^2$ is dense in $H^2$.
2. For $f\in S^2$, $\|I(f)\|_{L^2} = \|f\|_{H^2}$.
3. For $f\in S^2$, $E[I(f)] = 0$.

Proof 1. Let $f\in H^2$ and for $K > 0$ set $f^K = f\,I_{\{|f|\le K\}}$. Then $f^K\in H^2$ and $\|f - f^K\|_{H^2}\to0$ as $K\to\infty$. Therefore we may suppose that $f\in H^2$ is bounded. Let
$$L = \{f\in H^2 : f \text{ is bounded and there exist } f_n\in S^2 \text{ with } \|f - f_n\|_{H^2}\to0,\ n\to\infty\}.$$
$L$ is linear and is closed under monotone increasing sequences. If $f$ is left-continuous, bounded and adapted one can set $f_n(0,\omega) = f(0,\omega)$ and, for $t > 0$,
$$f_n(t,\omega) = \sum_{k\ge0} f(k/2^n,\omega)\,I_{(k/2^n,(k+1)/2^n]}(t),\qquad k = 0,1,\dots.$$
Then $f_n\in S^2$ and by bounded convergence $\|f - f_n\|_{H^2}\to0$, $n\to\infty$. Now, in view of Lemma 3.4.2, $L$ contains all bounded predictable processes, and hence $L$ contains all bounded processes in $H^2$. (See [16] Remark 1.1, page 45.)

2. Write $A_k = f(t_k,\omega)(B_{t_{k+1}} - B_{t_k})$, so that
$$\|I(f)\|_{L^2}^2 = E[I(f)^2] = E\Big[\Big(\sum_k A_k\Big)^2\Big] = \sum_k E[(A_k)^2] + 2\sum_{i<j} E[A_iA_j].$$
For $i < j$, conditioning on $\mathcal F_{t_j}$ gives
$$E[A_iA_j] = E[E[A_iA_j \mid \mathcal F_{t_j}]] = E\big[A_i f(t_j,\omega)\,E[B_{t_{j+1}} - B_{t_j} \mid \mathcal F_{t_j}]\big] = 0,$$
whilst
$$E[(A_k)^2] = E\big[E[f^2(t_k,\omega)(B_{t_{k+1}} - B_{t_k})^2 \mid \mathcal F_{t_k}]\big] = E[f^2(t_k,\omega)(t_{k+1} - t_k)].$$
Therefore
$$\|I(f)\|_{L^2}^2 = \sum_k E[f^2(t_k,\omega)(t_{k+1} - t_k)] = E\Big[\int_0^t f^2(\omega,s)\,ds\Big] = \|f\|_{H^2}^2.$$
The proof of the last part of the lemma is left as an exercise.

Theorem 3.4.4 Suppose that $f(\omega,t)\in H^2$. Then there exists an (a.s. unique) $L^2$ random variable $I(f)$ such that $I(f_n)\overset{L^2}{\to} I(f)$, independently of the choice of the approximating sequence of simple functions $f_n(\omega,t)\in S^2$; that is,
$$\int_0^t f_n(\omega,s)\,dB_s(\omega) \overset{L^2}{\longrightarrow} \int_0^t f(\omega,s)\,dB_s(\omega).\qquad(3.4.3)$$
The limit on the right hand side of (3.4.3) is called the Itô integral of $f$.

Proof In view of Lemma 3.4.3, for $f(\omega,t)\in H^2$ there exists a sequence of simple functions $f_n\in S^2$ such that $\|f_n - f\|_{H^2}\to0$, and by linearity and the isometry of Lemma 3.4.3,
$$\|I(f_n) - I(f_m)\|_{L^2} = \|I(f_n - f_m)\|_{L^2} = \|f_n - f_m\|_{H^2}\to0.$$
However, $L^2$ is complete, so the Cauchy sequence $I(f_n)$ has a limit $I(f)\in L^2$.
Suppose that $\{\bar f_n\}$ is a second sequence converging to $f$ but $I(\bar f_n)$ converges to another limit $\bar I(f)$. Then
$$\|f_n - f\|_{H^2} + \|f - \bar f_n\|_{H^2} \ge \|f_n - \bar f_n\|_{H^2} = \|I(f_n) - I(\bar f_n)\|_{L^2}.$$
However, $\|f_n - f\|_{H^2} + \|f - \bar f_n\|_{H^2}\to0$ by assumption, and therefore $\|I(f_n) - I(\bar f_n)\|_{L^2}\to0$, which establishes the uniqueness of the limit $I(f)$.

Remark 3.4.5 Since $I(f_n)\overset{L^2}{\to} I(f)$, we have $\lim_n E[I(f_n)] = E[I(f)]$ and $\lim_n\|I(f_n)\|_{L^2} = \|I(f)\|_{L^2}$, so in view of Lemma 3.4.3:
1. For $f\in H^2$, $E[I(f)] = 0$.
2. For $f\in H^2$, $\|I(f)\|_{L^2} = \|f\|_{H^2}$.
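Remark 3.4.5 can be illustrated numerically. The sketch below (integrand, sample sizes and seed chosen arbitrarily) approximates $I(f)$ for the deterministic integrand $f(s) = s\in H^2$ by left-endpoint simple sums, and estimates $E[I(f)]$ and $E[I(f)^2]$; the isometry predicts values near $0$ and $\int_0^1 s^2\,ds = \tfrac13$.

```python
import random

def ito_isometry_check(n=500, paths=2000, t=1.0, seed=8):
    """Monte Carlo estimates of E[I(f)] and E[I(f)^2] for f(s) = s,
    where I(f) is approximated by the simple-function sums
    sum_k f(t_k) (B_{t_{k+1}} - B_{t_k})."""
    rng = random.Random(seed)
    dt = t / n
    sd = dt ** 0.5
    m1 = m2 = 0.0
    for _ in range(paths):
        i_f = 0.0
        for k in range(n):
            i_f += (k * dt) * rng.gauss(0.0, sd)  # f(t_k) times a Brownian increment
        m1 += i_f
        m2 += i_f * i_f
    return m1 / paths, m2 / paths

print(ito_isometry_check())   # near (0, 1/3)
```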
3.5 Stochastic integration with respect to general martingales

Recall that $\mathcal H^2$ is given by (3.2.1). Write
$$S = \{\text{bounded simple predictable processes (Definition 3.4.1)}\},\qquad(3.5.1)$$
$$\mathcal H^2_0 = \{\{M_t\}\in\mathcal H^2 : M_0 = 0 \text{ a.s.}\},\qquad(3.5.2)$$
$$\mathcal H^{2,c} = \{\{M_t\}\in\mathcal H^2 : \{M_t\} \text{ is continuous}\},\qquad(3.5.3)$$
$$\mathcal H^{2,c}_0 = \{\{M_t\}\in\mathcal H^{2,c} : M_0 = 0 \text{ a.s.}\}.\qquad(3.5.4)$$
Suppose $X\in\mathcal H^2$. Then for $f\in S$ the integral
$$\int_0^t f(s,\omega)\,dX_s = f_0X_0 + \sum_{k=0}^{n-1} f_k\big(X_{t_{k+1}\wedge t} - X_{t_k\wedge t}\big)$$
exists.
Lemma 3.5.1 ([11]). For $f\in S$, $\int_0^t f(s)\,dX_s\in\mathcal H^2$ and
$$E\Big[\Big(\int_0^\infty f(s)\,dX_s\Big)^2\Big] = E\Big[\int_0^\infty f^2(s,\omega)\,d\langle X,X\rangle_s\Big] = E\Big[\int_0^\infty f^2(s,\omega)\,d[X,X]_s\Big].$$

Proof By definition
$$\int_0^t f(s,\omega)\,dX_s = f_0X_0 + \sum_{k=0}^{n-1} f_k\big(X_{t_{k+1}\wedge t} - X_{t_k\wedge t}\big).$$
By the optional stopping theorem, for $s\le t$:
$$E\Big[\int_0^t f(z,\omega)\,dX_z \,\Big|\, \mathcal F_s\Big] = \int_0^s f(z,\omega)\,dX_z.$$
For $k < \ell$, so that $k+1\le\ell$,
$$E\big[f_kf_\ell(X_{t_{k+1}\wedge t} - X_{t_k\wedge t})(X_{t_{\ell+1}\wedge t} - X_{t_\ell\wedge t})\big] = E\big[E[f_kf_\ell(X_{t_{k+1}\wedge t} - X_{t_k\wedge t})(X_{t_{\ell+1}\wedge t} - X_{t_\ell\wedge t}) \mid \mathcal F_{t_\ell}]\big] = 0.$$
Therefore
$$E\Big[\Big(\int_0^t f(s)\,dX_s\Big)^2\Big] = E\Big[\sum_{k=0}^{n-1} f_k^2\big(X^2_{t_{k+1}\wedge t} - X^2_{t_k\wedge t}\big)\Big] = E\Big[\sum_{k=0}^{n-1} f_k^2\big(\langle X,X\rangle_{t_{k+1}\wedge t} - \langle X,X\rangle_{t_k\wedge t}\big)\Big]$$
$$= E\Big[\int_0^t f^2(s,\omega)\,d\langle X,X\rangle_s\Big] \le E\Big[\int_0^\infty f^2(s,\omega)\,d\langle X,X\rangle_s\Big] < \infty,$$
because $f$ is bounded and $X\in\mathcal H^2$. The integrals on the right are Stieltjes integrals. Therefore, by Lebesgue's theorem, letting $t\to\infty$:
$$E\Big[\Big(\int_0^\infty f(s)\,dX_s\Big)^2\Big] = E\Big[\int_0^\infty f^2(s,\omega)\,d\langle X,X\rangle_s\Big].$$
Finally note that $\langle X,X\rangle - [X,X]$ is a martingale of integrable variation and the result follows.

Theorem 3.5.2 Write $L^2(\langle X,X\rangle)$ for the space of predictable processes $\{f(\omega,t)\}$ such that
$$\|f\|^2_{\langle X,X\rangle} = E\Big[\int_0^\infty f^2(\omega,s)\,d\langle X,X\rangle_s\Big] < \infty.$$
Then the map $f\mapsto\int_0^t f\,dX$ of $S$ into $\mathcal H^2$ extends in a unique manner to a linear isometry of $L^2(\langle X,X\rangle)$ into $\mathcal H^2$.

Proof Suppose that the space $S$ is endowed with the seminorm $\|\cdot\|_{\langle X,X\rangle}$. Then from Lemma 3.5.1 the map $f\mapsto\int_0^t f\,dX$ of $S$ into $\mathcal H^2$ is an isometry. However, $S$ is dense in $L^2(\langle X,X\rangle)$, so this map extends in a unique manner to an isometry of $L^2(\langle X,X\rangle)$ into $\mathcal H^2$.

The following characterization is due to Kunita and Watanabe [24]. (See [11] page 107.)

Theorem 3.5.3 Suppose $f\in L^2(\langle X,X\rangle)$.
1. For every $Y\in\mathcal H^2$,
$$E\Big[\int_0^\infty |f(s)|\,|d\langle X,Y\rangle_s|\Big] < \infty,\qquad E\Big[\int_0^\infty |f(s)|\,|d[X,Y]_s|\Big] < \infty.$$
2. The stochastic integral $I_t = \int_0^t f(s)\,dX_s$ is characterized as the unique element of $\mathcal H^2$ such that, for every $Y\in\mathcal H^2$,
$$E[I_\infty Y_\infty] = E\Big[\int_0^\infty f(s)\,d\langle X,Y\rangle_s\Big] = E\Big[\int_0^\infty f(s)\,d[X,Y]_s\Big].$$
3. Furthermore, for every $Y\in\mathcal H^2$,
$$\langle I,Y\rangle_t = \int_0^t f(s)\,d\langle X,Y\rangle_s,\qquad [I,Y]_t = \int_0^t f(s)\,d[X,Y]_s.$$

Proof 1. This follows from Theorem 3.2.25.
2. The linear functional on $L^2(\langle X,X\rangle)$ defined by
$$f\mapsto E\Big[I_\infty Y_\infty - \int_0^\infty f(s)\,d\langle X,Y\rangle_s\Big]$$
is continuous by Theorem 3.2.25 and is zero on the space of simple processes $S$, which is dense in $L^2(\langle X,X\rangle)$. Therefore it is zero on $L^2(\langle X,X\rangle)$ by continuity. The second identity follows because $\langle X,Y\rangle - [X,Y]$ is a martingale of integrable variation.
3. Note that
$$J_t = I_tY_t - \int_0^t f(s)\,d\langle X,Y\rangle_s\quad\text{satisfies}\quad |J_t| \le \sup_t|I_tY_t| + \int_0^\infty |f(s)|\,|d\langle X,Y\rangle_s| \in L^1.$$
Applying the identity in part 2 it is seen that, for any stopping time $T$, $E[J_T] = 0$. In view of Theorem 2.5.7, $J_t$ is a martingale. However, $\langle I,Y\rangle_t$ is the unique predictable process of integrable variation such that $I_tY_t - \langle I,Y\rangle_t$ is a martingale. Therefore $\langle I,Y\rangle_t = \int_0^t f(s)\,d\langle X,Y\rangle_s$. To prove the last identity, decompose $X$ and $Y$ into their continuous and totally discontinuous parts and use a similar argument. (See [11] page 108.)

Note that the first identity in part 2 uniquely characterizes the stochastic integral $I$. This is because the right hand side is a continuous linear functional of $Y$ (given $f$ and $X$), whilst the left hand side is just the inner product of $I$ and $Y$ in the Hilbert space $\mathcal H^2$. Consequently, given $f$ and $X$, there is a unique $I\in\mathcal H^2$ which gives this linear functional.

Definition 3.5.4 A process $\{f(t,\omega)\}$ is locally bounded if $f(0,\omega)$ is a.s. finite and there are a sequence of stopping times $\tau_n\uparrow\infty$ and constants $K_n$ such that $|f(t,\omega)|\,I_{\{0<t\le\tau_n\}}\le K_n$ a.s.

Definition 3.5.5 A martingale $X$ is a locally uniformly integrable martingale if there is a sequence of stopping times $\tau_n\uparrow\infty$ such that each stopped martingale $X_{\tau_n\wedge t}$ is a uniformly integrable martingale.
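The isometry of Lemma 3.5.1 can also be checked by simulation for a discontinuous martingale. In the sketch below (rate, horizon, sample size and seed all arbitrary) the integrator is the compensated Poisson martingale $X_s = N_s - \lambda s$ and the integrand is the deterministic $f(s) = s$; since $[X,X]_s = N_s$, the lemma predicts $E\big[\big(\int_0^t f\,dX\big)^2\big] = E\big[\sum_{T_i\le t}T_i^2\big] = \lambda\int_0^t s^2\,ds = \lambda t^3/3$.

```python
import random

def compensated_poisson_isometry(lam=2.0, t=1.0, runs=5000, seed=7):
    """Monte Carlo check that, for X_s = N_s - lam*s and f(s) = s,
    E[(int_0^t f dX)^2] = E[int_0^t f^2 d[X,X]] = lam * t**3 / 3."""
    rng = random.Random(seed)
    m1 = m2 = 0.0
    for _ in range(runs):
        jumps, s = [], rng.expovariate(lam)
        while s <= t:                       # jump times of a rate-lam Poisson process
            jumps.append(s)
            s += rng.expovariate(lam)
        # int_0^t s dX_s = sum of f at the jump times minus the compensator part
        i_f = sum(jumps) - lam * t * t / 2
        m1 += i_f
        m2 += i_f * i_f
    return m1 / runs, m2 / runs

print(compensated_poisson_isometry())   # near (0, 2/3)
```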
Theorem 3.5.6 ([11] page 121)
1. Suppose $X$ is a locally uniformly integrable martingale and $\{f(t,\omega)\}$ is a predictable locally bounded process. There is then a unique local martingale $\{I_t = \int_0^t f(s)\,dX_s\}$ such that, for every bounded martingale $Y$,
$$[I,Y]_t = \int_0^t f(s)\,d[X,Y]_s.$$
(Here the right hand side is just a Stieltjes integral on each sample path.)
2. $I_0 = f(0)X_0$, $I_t^c = \int_0^t f(s)\,dX_s^c$, and the processes $\Delta I_t$ and $f(t,\omega)\Delta X_t$ are indistinguishable.
3. If the local martingale $X$ is also of locally integrable variation (see Definition 2.6.2), then $I_t$ can be calculated as the Stieltjes integral along each sample path.

Proof Assume that $X_0 = 0$ and $f(0) = 0$. There is a sequence of stopping times $\tau_n\uparrow\infty$ such that the stopped martingale $X_{\tau_n\wedge t}$ is a uniformly integrable and bounded martingale by Theorem 2.5.11. Using Theorem 2.5.12 we can write $X_{\tau_n\wedge t} = U_{\tau_n\wedge t} + V_{\tau_n\wedge t}$, where $U_0 = 0$, $U_{\tau_n\wedge t}$ is square integrable and $V_{\tau_n\wedge t}$ is a martingale of integrable variation which is zero at $t = 0$. The stochastic integral
$$\int_0^t f(s)\,dX_{\tau_n\wedge s} = \int_0^t f(s)\,dU_{\tau_n\wedge s} + \int_0^t f(s)\,dV_{\tau_n\wedge s}$$
is defined by Theorem 3.5.2. Furthermore, this integral is a uniformly integrable martingale. If $n < m$ (so that $\tau_n\le\tau_m$ a.s.), then because $X_{\tau_n\wedge t}$ is equal to $X_{\tau_m\wedge t}$ stopped at $\tau_n$, $\int_0^t f(s)\,dX_{\tau_n\wedge s}$ is equal to $\int_0^t f(s)\,dX_{\tau_m\wedge s}$ stopped at $\tau_n$. A process $\{I_t = \int_0^t f(s)\,dX_s\}$ is then defined by putting $I_{\tau_n\wedge t} = \int_0^t f(s)\,dX_{\tau_n\wedge s}$, and it is seen that $I_t$ is a local martingale. The rest of the proof is left as an exercise.

3.6 The Itô formula for semimartingales

Write
$$\mathcal V = \{\{V_t\} \text{ adapted, right-continuous with left limits (corlol or càdlàg), almost every sample path of finite variation on each compact subset of } [0,\infty)\},\qquad(3.6.1)$$
$$\mathcal V_0 = \{\{V_t\}\in\mathcal V : V_0 = 0 \text{ a.s.}\},\qquad(3.6.2)$$
$$\mathcal A = \Big\{\{V_t\}\in\mathcal V : E\Big[\int_0^\infty |dV_t|\Big] < \infty\Big\},\qquad(3.6.3)$$
$$\mathcal A_0 = \{\{V_t\}\in\mathcal A : V_0 = 0 \text{ a.s.}\}.\qquad(3.6.4)$$
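Brownian motion itself does not belong to the class of finite-variation processes just defined: along a single path the discrete total variation $\sum_k|\Delta B_k|$ blows up as the mesh shrinks, while the quadratic variation $\sum_k(\Delta B_k)^2$ stabilizes near $t$. A minimal numerical illustration (step count and seed arbitrary):

```python
import random

def brownian_variations(t=1.0, n=100_000, seed=6):
    """Along one discretized Brownian path, return
    (sum |dB|, sum dB^2): the first grows like sqrt(n),
    the second stays close to t."""
    rng = random.Random(seed)
    sd = (t / n) ** 0.5
    total_var = quad_var = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, sd)
        total_var += abs(db)
        quad_var += db * db
    return total_var, quad_var

tv, qv = brownian_variations()
print(tv, qv)   # tv is in the hundreds; qv is near 1
```

This is exactly the behaviour exploited in the proof of the Itô formula below, where sums of squared martingale increments converge to the quadratic variation.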
Theorem 3.6.1 Suppose $X = X_0 + M + V$ is a semimartingale and $\{f(t,\omega)\}$ is a predictable locally bounded process. Then the process
$$I_t = \int_0^t f(s,\omega)\,dX_s(\omega) = f(0)X_0 + \int_0^t f(s,\omega)\,dM_s(\omega) + \int_0^t f(s,\omega)\,dV_s(\omega)$$
is a semimartingale. It is independent of the decomposition of $X$, and the processes $I_t^c$ and $\int_0^t f(s)\,dX_s^c$, and $\Delta I_t$ and $f(t)\Delta X_t$, are indistinguishable.

Proof Suppose $X = X_0 + \hat M + \hat V$ is a second decomposition of $X$. Then $M - \hat M = \hat V - V$ is a local martingale which is locally of integrable variation. Therefore, by Theorem 3.5.6(3), the stochastic integral $\int_0^t f(s)\,d(M - \hat M)_s$ is equal to the Stieltjes integral $\int_0^t f(s)\,d(\hat V - V)_s$, and so
$$f(0)X_0 + \int_0^t f(s,\omega)\,dM_s(\omega) + \int_0^t f(s,\omega)\,dV_s(\omega) = f(0)X_0 + \int_0^t f(s,\omega)\,d\hat M_s(\omega) + \int_0^t f(s,\omega)\,d\hat V_s(\omega).$$
Because $X^c = M^c$, the processes $\int_0^t f(s)\,dX_s^c$ and $I^c$ are indistinguishable by Theorem 3.5.6(2). Similarly,
$$f(t)\Delta X_t = f(t)(\Delta M_t + \Delta V_t) = \Delta\Big(\int_0^t f(s)\,dM_s + \int_0^t f(s)\,dV_s\Big) = \Delta I_t.$$
The Itô formula is first established for a continuous, bounded, real semimartingale.

Theorem 3.6.2 Suppose $X = X_0 + M + V$ is a semimartingale such that $|X_0|\le K$ a.s., $M\in\mathcal H^{2,c}_0$ (see (3.5.4)) and is bounded by $K$, and $V\in\mathcal V_0$ (see (3.6.2)) is continuous with $\int_0^\infty |dV_s|\le K$ a.s. Let $F$ be a twice continuously differentiable function on $\mathbb R$. Then
$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-})\,dM_s + \int_0^t F'(X_{s-})\,dV_s + \frac12\int_0^t F''(X_s)\,d\langle M,M\rangle_s.\qquad(3.6.5)$$
That is, the processes on the left and right hand sides are indistinguishable.

Proof Write
$$I_1 = \int_0^t F'(X_{s-})\,dM_s,\qquad I_2 = \int_0^t F'(X_{s-})\,dV_s,\qquad I_3 = \int_0^t F''(X_s)\,d\langle M,M\rangle_s.$$
Now $|X|\le 3K$. If $a,b\in[-3K,+3K]$,
$$F(b) - F(a) = (b-a)F'(a) + \frac12(b-a)^2F''(a) + r(a,b),$$
where, because $F''$ is uniformly continuous on $[-3K,+3K]$, $|r(a,b)|\le\epsilon(|b-a|)(b-a)^2$. Here $\epsilon(s)$ is an increasing function of $s$ such that $\lim_{s\to0}\epsilon(s) = 0$.
A stochastic subdivision of $[0,t]$ is now defined by putting $t_0 = 0$,
$$t_{i+1} = t\wedge(t_i + a)\wedge\inf\{s > t_i : |M_s - M_{t_i}| > a \text{ or } |V_s - V_{t_i}| > a\},$$
where $a$ is any positive real number. Then as $a\to0$ the steps of the subdivision, $\sup_i(t_{i+1} - t_i)$, converge uniformly to 0, and the random variables $\sup_i|M_{t_{i+1}} - M_{t_i}|\le a$ and $\sup_i|V_{t_{i+1}} - V_{t_i}|\le a$ tend uniformly to 0. Therefore the variation of $X$ on each interval $[t_i,t_{i+1}]$ is bounded by $4a$. Now
$$F(X_t) - F(X_0) = \sum_i F'(X_{t_i})(X_{t_{i+1}} - X_{t_i}) + \frac12\sum_i F''(X_{t_i})(X_{t_{i+1}} - X_{t_i})^2 + \sum_i r(X_{t_i},X_{t_{i+1}}) = S_1 + \frac12 S_2 + R,$$
say. We shall show that as $a\to0$, $S_1\overset{P}{\to} I_1 + I_2$, $S_2\overset{P}{\to} I_3$ and $R\overset{P}{\to} 0$. Write
$$S_1 = \sum_i F'(X_{t_i})(M_{t_{i+1}} - M_{t_i}) + \sum_i F'(X_{t_i})(V_{t_{i+1}} - V_{t_i}) = U_1 + U_2.$$

Step 1. We show that $U_1\overset{L^2}{\to} I_1$. Write
$$I_1 = \sum_i\int_{t_i}^{t_{i+1}} F'(X_s)\,dM_s.$$
The martingale property implies the different terms in the sum are mutually orthogonal, so
$$\|U_1 - I_1\|_2^2 = \sum_i E\Big[\Big(\int_{t_i}^{t_{i+1}} \big(F'(X_s) - F'(X_{t_i})\big)\,dM_s\Big)^2\Big] = \sum_i E\Big[\int_{t_i}^{t_{i+1}} \big(F'(X_s) - F'(X_{t_i})\big)^2\,d\langle M,M\rangle_s\Big]$$
$$\le E\Big[\Big\{\sup_i\ \sup_{t_i\le s\le t_{i+1}}\big(F'(X_s) - F'(X_{t_i})\big)^2\Big\}\,\langle M,M\rangle_t\Big].$$
By uniform continuity, the supremum tends uniformly to zero. $\langle M,M\rangle_t$ is integrable, so the result follows by the Monotone Convergence Theorem 1.3.15.

Step 2. We show that $U_2\overset{L^1}{\to} I_2$.
$$|U_2 - I_2| \le \sum_i\int_{t_i}^{t_{i+1}} \big|F'(X_s) - F'(X_{t_i})\big|\,|dV_s| \le \Big\{\sup_i\ \sup_{t_i\le s\le t_{i+1}}\big|F'(X_s) - F'(X_{t_i})\big|\Big\}\int_0^t |dV_s|.$$
Again by uniform continuity of $F'$ and the Monotone Convergence Theorem 1.3.15, $\|U_2 - I_2\|_1$ converges to 0.

Step 3. Write
$$S_2 = \sum_i F''(X_{t_i})(V_{t_{i+1}} - V_{t_i})^2 + 2\sum_i F''(X_{t_i})(V_{t_{i+1}} - V_{t_i})(M_{t_{i+1}} - M_{t_i}) + \sum_i F''(X_{t_i})(M_{t_{i+1}} - M_{t_i})^2 = V_1 + V_2 + V_3,$$
respectively. We first show that $V_1$ and $V_2$ converge to 0 both a.s. and in $L^1$. If $C > \sup\{|F'(x)| + |F''(x)| : -3K\le x\le 3K\}$, then
$$|V_1| \le C\sup_i|V_{t_{i+1}} - V_{t_i}|\int_0^t|dV_s| \le aCK,$$
and similarly $|V_2|\le 2aCK$, so both tend to 0 as $a\to0$.

Step 4. We show that $V_3\overset{P}{\to} I_3$. First recall that $M$ is bounded by $K$, so
$$E[\langle M,M\rangle_\infty - \langle M,M\rangle_t \mid \mathcal F_t] = E[M_\infty^2 \mid \mathcal F_t] - M_t^2 \le K^2.$$
Therefore
$$E[\langle M,M\rangle_\infty^2] = 2E\Big[\int_0^\infty \big(\langle M,M\rangle_\infty - \langle M,M\rangle_t\big)\,d\langle M,M\rangle_t\Big] = 2E\Big[\int_0^\infty \big(E[M_\infty^2\mid\mathcal F_t] - M_t^2\big)\,d\langle M,M\rangle_t\Big] \le 2K^2E[\langle M,M\rangle_\infty] \le 2K^4.$$
Consequently $\langle M,M\rangle_\infty\in L^2$ and the martingale $M^2 - \langle M,M\rangle$ is actually in $\mathcal H^2_0$. Write
$$J_3 = \sum_i F''(X_{t_i})\big(\langle M,M\rangle_{t_{i+1}} - \langle M,M\rangle_{t_i}\big).$$
Then the same argument as in Step 2 shows that
$$J_3\overset{L^1}{\to} I_3 = \int_0^t F''(X_s)\,d\langle M,M\rangle_s.$$
Therefore $J_3\overset{P}{\to} I_3$. We shall show that $\|V_3 - J_3\|_{L^2}^2\to0$. Because $M^2 - \langle M,M\rangle$ is a martingale,
$$E\big[(M_{t_{i+1}} - M_{t_i})^2 - \langle M,M\rangle_{t_{i+1}} + \langle M,M\rangle_{t_i} \mid \mathcal F_{t_i}\big] = 0.$$
Therefore distinct terms in the sum defining $V_3 - J_3$ are orthogonal and
$$\|V_3 - J_3\|_2^2 = \sum_i E\Big[F''(X_{t_i})^2\big((M_{t_{i+1}} - M_{t_i})^2 - \langle M,M\rangle_{t_{i+1}} + \langle M,M\rangle_{t_i}\big)^2\Big].$$
However, $F''(X_{t_i})^2\le C^2$ and $(\alpha-\beta)^2\le2(\alpha^2+\beta^2)$, so
$$\|V_3 - J_3\|_2^2 \le 2C^2\sum_i E\big[(M_{t_{i+1}} - M_{t_i})^4\big] + 2C^2\sum_i E\big[\big(\langle M,M\rangle_{t_{i+1}} - \langle M,M\rangle_{t_i}\big)^2\big].$$
The second sum here is treated similarly to $V_1$ in Step 3: because $\langle M,M\rangle$ is uniformly continuous on $[0,t]$, $\sup_i(\langle M,M\rangle_{t_{i+1}} - \langle M,M\rangle_{t_i})\overset{a.s.}{\to}0$ as $a\to0$ and is bounded by $\langle M,M\rangle_t$. Therefore
$$2C^2\sum_i E\big[\big(\langle M,M\rangle_{t_{i+1}} - \langle M,M\rangle_{t_i}\big)^2\big] \le 2C^2E\Big[\sup_i\big(\langle M,M\rangle_{t_{i+1}} - \langle M,M\rangle_{t_i}\big)\,\langle M,M\rangle_t\Big].$$
Now $\langle M,M\rangle_t\in L^2$, so the second sum converges to zero by Lebesgue's Dominated Convergence Theorem 1.3.17. For the first sum,
$$2C^2\sum_i E\big[(M_{t_{i+1}} - M_{t_i})^4\big] \le 2C^2E\Big[\sup_i(M_{t_{i+1}} - M_{t_i})^2\sum_i(M_{t_{i+1}} - M_{t_i})^2\Big] \le 2C^2a^2E\Big[\sum_i(M_{t_{i+1}} - M_{t_i})^2\Big] = 2C^2a^2E[M_t^2],$$
which again converges to zero as $a\to0$. (Note that it is only here, where we use the fact that $|M_{t_{i+1}} - M_{t_i}|\le a$, that the random character of the partition $\{t_i\}$ is used.) We have thus shown that $V_3 - J_3\overset{L^2}{\to}0$. However, $J_3\overset{P}{\to} I_3$, so $V_3\overset{P}{\to} I_3$.

Step 5. Finally, we show that the remainder term $R$ converges to 0 as $a\to0$. We have observed that the remainder term $r$ in the Taylor expansion satisfies $|r(a,b)|\le\epsilon(|b-a|)(b-a)^2$, where $\epsilon$ is an increasing function with $\lim_{s\to0}\epsilon(s) = 0$. Therefore
$$|R| \le \sum_i (X_{t_{i+1}} - X_{t_i})^2\,\epsilon\big(|X_{t_{i+1}} - X_{t_i}|\big) \le 2\epsilon(2a)\sum_i\big((V_{t_{i+1}} - V_{t_i})^2 + (M_{t_{i+1}} - M_{t_i})^2\big).$$
Now
$$E\Big[\sum_i(M_{t_{i+1}} - M_{t_i})^2\Big] = E[M_t^2]$$
is independent of the partition, and
$$E\Big[\sum_i(V_{t_{i+1}} - V_{t_i})^2\Big] \le aE\Big[\sum_i|V_{t_{i+1}} - V_{t_i}|\Big] \le Ka.$$
Because $\lim_{a\to0}\epsilon(2a) = 0$,
$$\lim_{a\to0}|E[R]| \le \lim_{a\to0}E[|R|] = 0.$$
For a fixed $t$, therefore,
$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-})\,dM_s + \int_0^t F'(X_{s-})\,dV_s + \frac12\int_0^t F''(X_s)\,d\langle M,M\rangle_s$$
almost surely. Because all the processes are right-continuous with left limits, the two sides are indistinguishable (see Definition 2.1.5).

The differentiation rule will next be proved for a function $F$ which is twice continuously differentiable with bounded first and second derivatives, and a semimartingale $X$ of the form $X_t = X_0 + M_t + V_t$, where $X_0\in L^1$, $M\in\mathcal H^2_0$ and $V\in\mathcal A_0$. That is, the following result will be proved after the lemmas and remarks below.

Theorem 3.6.3 Suppose $X = X_0 + M + V$ is a semimartingale such that $X_0\in L^1$, $M\in\mathcal H^2_0$, $V\in\mathcal A_0$, and $F$ is twice continuously differentiable with bounded first and second derivatives. Then the following two processes, the left and right hand sides, are indistinguishable:
$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-})\,dX_s + \frac12\int_0^t F''(X_{s-})\,d\langle X^c,X^c\rangle_s + \sum_{0<s\le t}\big(F(X_s) - F(X_{s-}) - F'(X_{s-})\Delta X_s\big).\qquad(3.6.6)$$

Remarks 3.6.4
1. Note that the series is absolutely convergent because, if $C$ is a bound for $|F''|$, then by Taylor's theorem
$$\sum_{0<s\le t}\big|F(X_s) - F(X_{s-}) - F'(X_{s-})\Delta X_s\big| \le \frac{C}{2}\sum_{0<s\le t}(\Delta X_s)^2,$$
and the right hand side is finite as in Definition 3.2.18. Also, because $\langle X^c,X^c\rangle$ is a continuous process,
$$\int_0^t F''(X_{s-})\,d\langle X^c,X^c\rangle_s = \int_0^t F''(X_s)\,d\langle X^c,X^c\rangle_s.$$
2. The first integral on the right of (3.6.6) is a well-defined stochastic integral since the integrand is predictable and locally bounded. Similar remarks apply to the second integral.
3. The series on the right of (3.6.6) is a correction term which balances the jumps on both sides of the equation.
4. Another form of the differentiation rule is the following:
$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-})\,dX_s + \frac12\int_0^t F''(X_{s-})\,d[X,X]_s + \sum_{0<s\le t}\Big(F(X_s) - F(X_{s-}) - F'(X_{s-})\Delta X_s - \frac12 F''(X_{s-})(\Delta X_s)^2\Big).$$
This representation is of interest because, whilst the predictable quadratic variation process $\langle X^c,X^c\rangle$ depends on the underlying probability measure, the optional quadratic variation process $[X,X]$ does not.
5. If $\{X_t\}$ is a deterministic function of bounded variation which is right-continuous and has left limits, we require $F$ to be only once continuously differentiable, and then
$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-})\,dX_s + \sum_{0<s\le t}\big(F(X_s) - F(X_{s-}) - F'(X_{s-})\Delta X_s\big) = F(X_0) + \int_0^t F'(X_{s-})\,dX_s^c + \sum_{0<s\le t}\big(F(X_s) - F(X_{s-})\big).$$

Lemma 3.6.5 Suppose the differentiation rule of Theorem 3.6.3 is true for all semimartingales of the form $X_t = X_0 + N_t + B_t$, where $X_0$ belongs to some dense set in $L^1$, $N$ belongs to some dense set in $\mathcal H^2_0$ and $B$ belongs to some dense set in $\mathcal A_0$. Then Theorem 3.6.3 is true for general semimartingales of the stated form.

Proof See [11] page 133.

Lemma 3.6.6 The semimartingales we need consider can be further restricted so that, if $X_t = X_0 + M_t + V_t$, then $M\in\mathcal H^{2,c}_0$ is bounded, $V\in\mathcal A$ has at most $N$ jumps with $\int_0^\infty |dV_s^c|$ bounded, and $X_0$ is bounded.

Proof See [11] page 137.

We now prove Theorem 3.6.3.

Proof of Theorem 3.6.3 From Lemma 3.6.5 and Lemma 3.6.6, the result of Theorem 3.6.3 will follow if it can be proved for a semimartingale $X_t = X_0 + M_t + V_t$, where $X_0$ is bounded, $M\in\mathcal H^{2,c}_0$ is bounded, and $V\in\mathcal A_0$ has at most $N$ jumps and $\int_0^\infty |dV_s| < \infty$.
However, note that the two sides of the differentiation formula have the same jump at time $t$: because $\langle X^c,X^c\rangle$ is continuous, the jump of the right hand side is
$$F'(X_{t-})\Delta X_t + \big(F(X_t) - F(X_{t-}) - F'(X_{t-})\Delta X_t\big) = F(X_t) - F(X_{t-}),$$
which is the jump of the left hand side at $t$. Consider the continuous semimartingale $\bar X_t = X_0 + M_t + V_t^c$. Then from Theorem 3.6.2 the differentiation rule is true for $\bar X$. Furthermore, if the jumps of $X$ are indexed in increasing order as $0 < S_1\le\dots\le S_N\le\infty$, then $X_t = \bar X_t$ on the stochastic interval $\{(t,\omega)\in[0,\infty[\times\Omega : 0\le t < S_1(\omega)\}$ (denoted $[[0,S_1[[$). Therefore $X_{t-} = \bar X_{t-}$ on the stochastic interval $\{(t,\omega)\in[0,\infty[\times\Omega : 0\le t\le S_1(\omega)\}$ (denoted $[[0,S_1]]$), so
$$\int_0^t F'(X_{s-})\,dM_s = \int_0^t F'(\bar X_{s-})\,dM_s \quad\text{on } [[0,S_1]].$$
Also,
$$\int_0^t F'(X_{s-})\,dV_s = \int_0^t F'(\bar X_{s-})\,dV_s^c \quad\text{on } [[0,S_1]].$$
Because $X^c = \bar X^c = M$ and the formula is true for $\bar X$ (on $[[0,\infty[[$), the differentiation formula is true on $[[0,S_1[[$. However, the two sides of the formula have equal jumps at $S_1$, so the formula is true on $[[0,S_1]]$. The same reasoning establishes the formula on $\{(t,\omega)\in[0,\infty[\times\Omega : S_1(\omega)\le t\le S_2(\omega)\}$ ($[[S_1,S_2]]$), and so on, up to $[[S_N,\infty[[$. Theorem 3.6.3 is, therefore, proved.

The differentiation rule will now be given when $X$ is a general semimartingale and $F$ is twice continuously differentiable, not necessarily having bounded derivatives.

Theorem 3.6.7 Suppose $X$ is a semimartingale and $F$ is a twice continuously differentiable function. Then $F(X)$ is a semimartingale and, with equality denoting indistinguishability,
$$F(X_t) = F(X_0) + \int_0^t F'(X_{s-})\,dX_s + \frac12\int_0^t F''(X_s)\,d\langle X^c,X^c\rangle_s + \sum_{0<s\le t}\big(F(X_s) - F(X_{s-}) - F'(X_{s-})\Delta X_s\big).$$

Proof See [11] page 138.
Remark 3.6.8 Note that $F'(X_s)$ is right-continuous with left limits, and so $F'(X_{s-})$ is predictable. Also, by considering stopping times such as $S = \inf\{t : |F'(X_t)|\ge n\}$, we see that $F'(X_{s-})$ is locally bounded. Similar remarks apply to $F''(X_{s-})$. Therefore, the two integrals are well defined. For any $\omega\in\Omega$ the trajectory $X_s(\omega)$, $0\le s\le t$, remains in a compact interval $[-C(t,\omega), C(t,\omega)]$. On such a compact interval the second derivative of $F$ is bounded by some constant $K(t,\omega)$. Therefore, if $s\le t$,
$$\big|F(X_s) - F(X_{s-}) - F'(X_{s-})\Delta X_s\big| \le \frac12 K(t,\omega)(\Delta X_s)^2(\omega).$$
As in Definition 3.2.18, we know that $\sum_{s\le t}(\Delta X_s)^2(\omega)$ is finite almost surely. Therefore, for any $t$ the sum occurring on the right hand side of the differentiation rule is a.s. absolutely convergent.

Theorem 3.6.9 If $\{X_t\} = \{(X_t^1,\dots,X_t^n)\}$ is an $n$-vector semimartingale and $F$ is a twice continuously differentiable function with respect to all arguments, we have
$$F(X_t^1,\dots,X_t^n) = F(X_0) + \sum_i\int_0^t \frac{\partial F(X_{s-})}{\partial x^i}\,dX_s^i + \frac12\sum_{i,j}\int_0^t \frac{\partial^2 F(X_{s-})}{\partial x^i\partial x^j}\,d\langle X^{ic},X^{jc}\rangle_s + \sum_{0<s\le t}\Big(F(X_s) - F(X_{s-}) - \sum_i\frac{\partial F(X_{s-})}{\partial x^i}\Delta X_s^i\Big).\qquad(3.6.7)$$

Remark 3.6.10 If $\{X_t\}$ is a deterministic, right-continuous function of bounded variation with left limits, then
$$\Lambda_t = 1 + \int_0^t \Lambda_{s-}\,dX_s$$
has the unique exponential solution ($\Lambda_0 = 1$)
$$\Lambda_t = e^{X_t - X_0}\prod_{s\le t}(1+\Delta X_s)e^{-\Delta X_s}.$$

The next example is a generalization of the exponential formula to special semimartingales.

Example 3.6.11 We shall apply Theorem 3.6.9 to show that if $X_t$ is a special semimartingale, then
$$\Lambda_t = 1 + \int_0^t \Lambda_{s-}\,dX_s\qquad(3.6.8)$$
has the unique stochastic exponential solution ($\Lambda_0 = 1$)
$$\Lambda_t = e^{X_t - \frac12\langle X^c,X^c\rangle_t}\prod_{s\le t}(1+\Delta X_s)e^{-\Delta X_s} = e^{Y_{1t}}\,Y_{2t},\qquad(3.6.9)$$
where
$$Y_{1t} = X_t - \frac12\langle X^c,X^c\rangle_t,\qquad Y_{2t} = \prod_{s\le t}(1+\Delta X_s)e^{-\Delta X_s}.$$
First note that the infinite product $Y_{2t}$ is finite (see Lemma 13.7 of [11]). Write $\Lambda_t = f(Y_{1t},Y_{2t})$ with $f(y_1,y_2) = e^{y_1}y_2$. Using rule (3.6.7),
$$f(Y_{1t},Y_{2t}) = \Lambda_0 + \sum_{i=1}^2\int_0^t \frac{\partial f(Y_{1s-},Y_{2s-})}{\partial y_i}\,dY_{is} + \frac12\sum_{i,j=1}^2\int_0^t \frac{\partial^2 f(Y_{1s-},Y_{2s-})}{\partial y_i\partial y_j}\,d\langle Y_i^c,Y_j^c\rangle_s$$
$$+ \sum_{0<s\le t}\Big(f(Y_{1s},Y_{2s}) - f(Y_{1s-},Y_{2s-}) - \sum_{i=1}^2\frac{\partial f(Y_{1s-},Y_{2s-})}{\partial y_i}\Delta Y_{is}\Big).\qquad(3.6.10)$$
Because $Y_{2t}$ is a purely discontinuous process of bounded variation, the second integral in the first sum of (3.6.10) is equal to
$$\sum_{0<s\le t} e^{Y_{1s-}}\Delta Y_{2s}.$$
Now $\langle Y_i^c,Y_j^c\rangle = 0$ except for $i = j = 1$, because the continuous part of $Y_{2t}$ is identically zero; and since $Y_1^c = X^c$, the second sum in (3.6.10) becomes
$$\frac12\int_0^t \Lambda_{s-}\,d\langle X^c,X^c\rangle_s.$$
In the last expression, using (3.6.9), we have
$$f(Y_{1s},Y_{2s}) - f(Y_{1s-},Y_{2s-}) = \Lambda_s - \Lambda_{s-} = e^{X_{s-} - \frac12\langle X^c,X^c\rangle_s}\prod_{r\le s-}(1+\Delta X_r)e^{-\Delta X_r}\,\big[(1+\Delta X_s) - 1\big] = \Lambda_{s-}\Delta X_s.$$
That is, $\Lambda_s = \Lambda_{s-}(1+\Delta X_s)$. Putting these results together gives (3.6.8). For the proof of the uniqueness see Theorem 13.5 of [11].

Example 3.6.12 Consider the following "log-Poisson plus log-normal" process, with its jump part driven by a finite sum of independent Poisson processes $N_t^i$, $i = 1,\dots,n$, with time varying jump sizes $a_t^i$ and intensities $\lambda_t^i$:
$$X_t = X_0 + \sigma\int_0^t X_{s-}\,dB_s + \sum_{i=1}^n\int_0^t X_{s-}a_s^i\big(dN_s^i - \lambda_s^i\,ds\big).$$
Applying the result of Example 3.6.11 we see that $X$ has the form
$$X_t = X_{s-}\exp\Big(\sigma(B_t - B_s) - \frac{\sigma^2}{2}(t-s) - \sum_{i=1}^n\int_s^t a_r^i\lambda_r^i\,dr\Big)\prod_{s\le r\le t}\prod_{i=1}^n\big(1 + a_r^i\Delta N_r^i\big).$$

Example 3.6.13 If $X_t$ and $Y_t$ are two semimartingales, the product rule gives
$$X_tY_t = X_0Y_0 + \int_0^t X_{s-}\,dY_s + \int_0^t Y_{s-}\,dX_s + [X,Y]_t,$$
and
$$X_t^2 = X_0^2 + 2\int_0^t X_{s-}\,dX_s + [X,X]_t,$$
or
$$\int_0^t X_{s-}\,dX_s = \frac12\big(X_t^2 - [X,X]_t\big).$$
Applying the preceding result to a Poisson process we obtain
$$\int_0^t N_{s-}\,dN_s = \frac12\big(N_t^2 - [N,N]_t\big) = \frac12\big(N_t^2 - N_t\big).$$

Theorem 3.6.14 (Lévy's characterization of the Poisson process) Suppose $\{Q_t\}$ is a purely discontinuous martingale on the filtered probability space $(\Omega,\mathcal F,P,\mathcal F_t)$, $t\ge0$, all of whose jumps equal 1. If $Q_t^2 - t$ is a martingale then $\{N_t = Q_t + t\}$ is a Poisson process.

Proof
We can suppose $N_0 = Q_0 = 0$. Because $Q_t$ is purely discontinuous,
$$E\big[[Q,Q]_t\big] = E[Q_t^2] < \infty,$$
but because all the jumps equal $+1$,
$$[Q,Q]_t = \sum_{s\le t}(\Delta Q_s)^2 = \sum_{s\le t}\Delta Q_s.$$
Write $N_t = \sum_{s\le t}\Delta Q_s$. Then $N_t$ is integrable, because $[Q,Q]_t$ is. Furthermore, $Q$ is a compensated sum of jumps, by Theorem 9.24 of [11], so $N_t$ has a predictable compensator $\lambda_t$:
$$Q_t = N_t - \lambda_t.$$
However,
$$N_t - t = [Q,Q]_t - t = (Q_t^2 - t) + \big([Q,Q]_t - Q_t^2\big)$$
is therefore a martingale. Consequently $\lambda_t = t$; that is,
$$Q_t = N_t - t.\qquad(3.6.12)$$
Applying the differentiation rule to the martingale $Q_t$ and the function $f(x) = e^{iux}$, $u\in\mathbb R$, from time 0 to the first jump time $T_1$, we have
$$e^{iuQ_{T_1}} = 1 + iu\int_0^{T_1} e^{iuQ_{v-}}\,dQ_v + \big(e^{iuQ_{T_1}} - e^{iuQ_{T_1-}} - iue^{iuQ_{T_1-}}\Delta Q_{T_1}\big) = 1 + iu\int_0^{T_1} e^{iuQ_{v-}}\,dQ_v + \big(e^{iuQ_{T_1}} - e^{-iuT_1} - iue^{-iuT_1}\big),$$
since $Q_{T_1-} = N_{T_1-} - T_1 = -T_1$ by (3.6.12) and $\Delta Q_{T_1} = 1$. Also note that $e^{iuQ_{T_1}}$ cancels from both sides. Hence
$$(1+iu)e^{-iuT_1} = 1 + iu\int_0^{T_1} e^{iuQ_{v-}}\,dQ_v.$$
Taking the conditional expectation with respect to $\mathcal F_0$ we have
$$(1+iu)E\big[e^{-iuT_1} \mid \mathcal F_0\big] = 1.$$
Therefore $T_1$ is independent of $\mathcal F_0$ and is exponentially distributed with parameter 1. A time translation and a similar argument show that $T_n - T_{n-1}$ is independent of $\mathcal F_{T_{n-1}}$ and is exponentially distributed with parameter 1. Therefore $N_t$ is a Poisson process.
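Two facts from this section lend themselves to a quick numerical check (rates, horizons and seeds are arbitrary): the pathwise identity $\int_0^t N_{s-}\,dN_s = \tfrac12(N_t^2 - N_t)$ of Example 3.6.13, which holds exactly because the $(k+1)$-th jump contributes $N_{s-} = k$; and the stochastic exponential of Example 3.6.11 applied to the compensated Poisson martingale $X_s = N_s - \lambda s$, for which (3.6.9) reduces to $\Lambda_t = e^{-\lambda t}2^{N_t}$ and should match the event-driven solution of $\Lambda_t = 1 + \int_0^t\Lambda_{s-}\,dX_s$ (exponential decay between jumps, doubling at jumps).

```python
import math, random

def poisson_jump_times(lam, t, rng):
    """Jump times of a rate-lam Poisson process on [0, t]."""
    jumps, s = [], rng.expovariate(lam)
    while s <= t:
        jumps.append(s)
        s += rng.expovariate(lam)
    return jumps

def check_poisson_integral(lam=1.0, t=5.0, seed=5):
    """Pathwise: int_0^t N_{s-} dN_s = (N_t^2 - N_t)/2, exactly."""
    jumps = poisson_jump_times(lam, t, random.Random(seed))
    n_t = len(jumps)
    integral = sum(k for k in range(n_t))  # the (k+1)-th jump adds N_{s-} = k
    return integral, n_t * (n_t - 1) // 2

def check_doleans_exponential(lam=1.5, t=2.0, seed=3):
    """Event-driven solution of Lambda = 1 + int Lambda_{s-} dX_s for
    X_s = N_s - lam*s, versus the closed form exp(-lam t) * 2**N_t."""
    jumps = poisson_jump_times(lam, t, random.Random(seed))
    val, last = 1.0, 0.0
    for tj in jumps:
        val *= math.exp(-lam * (tj - last))  # dLambda = -lam Lambda ds between jumps
        val *= 2.0                           # Lambda_s = Lambda_{s-}(1 + dX_s), dX_s = 1
        last = tj
    val *= math.exp(-lam * (t - last))
    return val, math.exp(-lam * t) * 2.0 ** len(jumps)

print(check_poisson_integral())      # two equal integers
print(check_doleans_exponential())   # two equal floats, up to rounding
```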
3.7 The Itô formula for Brownian motion

In this section, the Itô formula obtained above for general semimartingales is specialized to Brownian motion and the related Itô processes. (See Definition 3.7.7.)

Theorem 3.7.1 Let $f$ be a twice continuously differentiable function on $\mathbb R$ and let $B = \{B_t, t\ge0\}$ be a Brownian motion. Then, in view of Theorem 3.6.7, $f(B_t)$ is a semimartingale given by the formula
$$f(B_t) = f(B_0) + \int_0^t f'(B_s)\,dB_s + \frac12\int_0^t f''(B_s)\,d\langle B,B\rangle_s = f(B_0) + \int_0^t f'(B_s)\,dB_s + \frac12\int_0^t f''(B_s)\,ds.$$

Example 3.7.2 Taking $f(x) = x^2$,
$$f(B_t) = B_t^2 = B_0^2 + \int_0^t 2B_s\,dB_s + \frac12\int_0^t 2\,ds.$$
Then $d(B_t^2) = 2B_t\,dB_t + dt$.

We prove the converse of Theorem 2.7.1.

Theorem 3.7.3 Let $\{W_t,\mathcal F_t\}$, $t\ge0$, be a continuous (scalar) local martingale such that $\{W_t^2 - t\}$, $t\ge0$, is a local martingale. Then $\{W_t,\mathcal F_t\}$ is a Brownian motion.
Proof We must show that for $0\le s\le t$ the random variable $W_t - W_s$ is independent of $\mathcal F_s$ and is normally distributed with mean 0 and variance $t-s$. In terms of characteristic functions this means we must show that, for any real $u$,
$$E\big[e^{iu(W_t-W_s)} \mid \mathcal F_s\big] = E\big[e^{iu(W_t-W_s)}\big] = e^{-u^2(t-s)/2}.$$
Consider the (complex-valued) function $f(x) = e^{iux}$. Applying the Itô rule to the real and imaginary parts of $f(x)$ we have
$$f(W_t) = e^{iuW_t} = f(W_s) + iu\int_s^t e^{iuW_r}\,dW_r - \frac{u^2}{2}\int_s^t e^{iuW_r}\,dr,\qquad(3.7.1)$$
because $d\langle W,W\rangle_r = dr$ by hypothesis. Furthermore, the real and imaginary parts of $\int_s^t iue^{iuW_r}\,dW_r$ are in fact square integrable martingales because the integrands are bounded. Consequently $E\big[\int_s^t iue^{iuW_r}\,dW_r \mid \mathcal F_s\big] = 0$. For any $A\in\mathcal F_s$ we may multiply (3.7.1) by $I_Ae^{-iuW_s}$ and take expectations to deduce
$$E\big[e^{iu(W_t-W_s)}I_A\big] = P(A) - \frac{u^2}{2}\int_s^t E\big[e^{iu(W_r-W_s)}I_A\big]\,dr.$$
Solving this equation, we see
$$E\big[e^{iu(W_t-W_s)}I_A\big] = P(A)\,e^{-u^2(t-s)/2},$$
and the result follows.

If the function $f$ is a function of both time and space the Itô rule has the following form.

Theorem 3.7.4 Let $f(\cdot,\cdot)$ be continuously differentiable in the first argument and twice continuously differentiable in the second argument, and let $\{B_t\}$ be a Brownian motion. Then $f(t,B_t)$ is given by the formula
$$f(t,B_t) = f(0,B_0) + \int_0^t\frac{\partial f(s,B_s)}{\partial s}\,ds + \int_0^t\frac{\partial f(s,B_s)}{\partial B}\,dB_s + \frac12\int_0^t\frac{\partial^2 f(s,B_s)}{\partial B^2}\,ds = f(0,B_0) + \int_0^t\frac{\partial f(s,B_s)}{\partial B}\,dB_s + \int_0^t\Big(\frac{\partial f(s,B_s)}{\partial s} + \frac12\frac{\partial^2 f(s,B_s)}{\partial B^2}\Big)\,ds.$$
One can write formally the differential expression
$$df(t,B_t) = \frac{\partial f(t,B_t)}{\partial t}\,dt + \frac{\partial f(t,B_t)}{\partial B}\,dB_t + \frac12\frac{\partial^2 f(t,B_t)}{\partial B^2}\,dt.$$
Here the differentials satisfy
$$(dB_t)^2 = dt,\qquad (dB_t)^n = 0 \text{ for } n > 2,\qquad dt\,dB_t = 0.$$

Example 3.7.5 Let
$$f(t,B_t) = \exp\big(B_t - \tfrac12 t\big).$$
Then
$$df(t,B_t) = f(t,B_t)\,dB_t + \Big(-\frac12 f(t,B_t) + \frac12 f(t,B_t)\Big)\,dt = f(t,B_t)\,dB_t.\qquad(3.7.2)$$
Hence the function $\exp(B_t - \tfrac12 t)$ is the solution of the exponential equation (3.7.2). Since
$$E\Big[\int_0^t\big(\exp(B_s - \tfrac12 s)\big)^2\,ds\Big] < \infty,$$
the Itô integral $\int_0^t \exp(B_s - \tfrac12 s)\,dB_s$ is a martingale. However,
$$\int_0^t \exp\big(B_s - \tfrac12 s\big)\,dB_s = \exp\big(B_t - \tfrac12 t\big) - 1,$$
so the process $X_t = \exp(B_t - \tfrac12 t)$ is a martingale.
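In particular, the martingale property gives $E[\exp(B_t - \tfrac12 t)] = 1$ for every $t$, which is easy to test by simulation (sample size and seed arbitrary):

```python
import math, random

def exp_martingale_mean(t=1.0, paths=20000, seed=11):
    """Monte Carlo check that E[exp(B_t - t/2)] = 1, i.e. that the
    stochastic exponential of Brownian motion has constant expectation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(paths):
        b_t = rng.gauss(0.0, math.sqrt(t))  # B_t is N(0, t)
        total += math.exp(b_t - t / 2)
    return total / paths

print(exp_martingale_mean())   # near 1
```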
Example 3.7.6 Given two adapted, measurable processes $X_t$ and $Y_t$ such that, with probability 1,
$$\int_0^t X_s^2\,ds < \infty \quad\text{and}\quad \int_0^t Y_s^2\,ds < \infty,$$
we have
$$\Big\langle\int_0^\cdot X_s\,dB_s,\ \int_0^\cdot Y_s\,dB_s\Big\rangle_t = \int_0^t X_sY_s\,d\langle B,B\rangle_s = \int_0^t X_sY_s\,ds.$$
In particular,
$$\Big\langle\int_0^\cdot X_s\,dB_s,\ \int_0^\cdot X_s\,dB_s\Big\rangle_t = \int_0^t X_s^2\,d\langle B,B\rangle_s = \int_0^t X_s^2\,ds.$$

Definition 3.7.7 An Itô process is a (special) semimartingale of the form
$$X_t = X_0 + \int_0^t \mu(\omega,s)\,ds + \int_0^t \sigma(\omega,s)\,dB_s(\omega).$$
Here $B_t$ is a Brownian motion, and $\{\mu(\omega,t)\}$, $\{\sigma(\omega,t)\}$ are adapted, measurable processes such that, with probability 1,
$$\int_0^t |\mu(\omega,s)|\,ds < \infty \quad\text{and}\quad \int_0^t \sigma^2(\omega,s)\,ds < \infty,\qquad \forall t\ge0.$$

Given two Itô processes
$$X_t = X_0 + \int_0^t \alpha(\omega,s)\,ds + \int_0^t \beta(\omega,s)\,dB_s(\omega)$$
and
$$Y_t = Y_0 + \int_0^t \mu(\omega,s)\,ds + \int_0^t \sigma(\omega,s)\,dB_s(\omega),$$
we have
$$[X,Y]_t = \langle X,Y\rangle_t = X_0Y_0 + \int_0^t \beta(\omega,s)\sigma(\omega,s)\,ds.$$

Given an adapted, measurable process $Y_t$ such that, with probability 1,
$$\int_0^t |Y_s\mu(s)|\,ds < \infty \quad\text{and}\quad \int_0^t \big(Y_s\sigma(s)\big)^2\,ds < \infty,$$
and an Itô process
$$X_t = X_0 + \int_0^t \mu(\omega,s)\,ds + \int_0^t \sigma(\omega,s)\,dB_s(\omega),$$
we can define the stochastic integral
$$\int_0^t Y_s\,dX_s = \int_0^t Y_s\mu(\omega,s)\,ds + \int_0^t Y_s\sigma(\omega,s)\,dB_s(\omega).$$

Remarks 3.7.8
1. From the definition of an Itô process it follows that the process $\int_0^t Y_s\,dX_s$ is an Itô process.
2. The process $\int_0^t Y_s\,dX_s$ is a continuous semimartingale.
0
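The covariation formula above can be illustrated numerically. In the sketch below (integrands, step count and seed arbitrary) $X = \int \beta\,dB$ and $Y = B$ with $\beta(s) = s$ and $\sigma(s) = 1$, so the discrete covariation sum $\sum_k \Delta X_k\Delta Y_k$ should lie near $\int_0^1 s\,ds = \tfrac12$.

```python
import random

def discrete_covariation(t=1.0, n=20000, seed=9):
    """Discrete covariation sum for X = int s dB and Y = B:
    sum dX * dY should be close to int_0^t s ds = t**2 / 2."""
    rng = random.Random(seed)
    dt = t / n
    sd = dt ** 0.5
    acc = 0.0
    for k in range(n):
        db = rng.gauss(0.0, sd)
        acc += (k * dt) * db * db  # dX = beta(t_k) dB with beta(s) = s; dY = dB
    return acc

print(discrete_covariation())   # near 0.5
```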
Theorem 3.7.9 Let $f$ be a twice continuously differentiable function on $\mathbb R$ and let
$$X_t = X_0 + \int_0^t \mu(\omega,s)\,ds + \int_0^t \sigma(\omega,s)\,dB_s(\omega)\qquad(3.7.3)$$
be an Itô process. Then $f(X_t)$ is given by the formula
$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\,dX_s + \frac12\int_0^t f''(X_s)\,d\langle X,X\rangle_s.\qquad(3.7.4)$$

Proof See any of [6], [11], [16], [30], [34].

Or, in differential form,
$$df(X_t) = f'(X_t)\,dX_t + \frac12 f''(X_t)\,d\langle X,X\rangle_t.$$
Here $d\langle X,X\rangle_t = \langle\sigma(t)\,dB_t, \sigma(t)\,dB_t\rangle = \sigma(t)^2\,dt$. If (3.7.3) is substituted in (3.7.4) we have
$$f(X_t) = f(X_0) + \int_0^t f'(X_s)\sigma(\omega,s)\,dB_s(\omega) + \int_0^t\Big(\frac12 f''(X_s)\sigma^2(\omega,s) + \mu(\omega,s)f'(X_s)\Big)\,ds.$$

Remark 3.7.10 Note that $\int_0^t f'(X_s)\sigma(\omega,s)\,dB_s(\omega)$ is perhaps only a local martingale, even if $E\big[\int_0^t\sigma^2(s)\,ds\big] < \infty$, because $f'(X_t)\sigma(t)$ satisfies only the weaker condition $\int_0^t\big(f'(X_s)\sigma(s)\big)^2\,ds < \infty$ a.s. This is guaranteed by the continuity of $f'(X_t)$ in $t$. Consequently, local martingales arise naturally in the context of Itô stochastic calculus.
Example 3.7.11 Consider the Itˆo process t Xt = σ (ω, s)dBs (ω), 0
and suppose f (X t ) =
X t2 .
By the Itˆo formula,
1 d f (X t ) = 0dt + 2X t 0dt + 2X t σ (t)dBt + 2σ (s)2 dt 2 2 = 2X t σ (t)dBt + σ (s) dt.
Example 3.7.12 Solve the linear stochastic differential equation
$$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t \qquad (\mu, \sigma \in \mathbb{R}). \qquad (3.7.5)$$
Assume that $X_0$ is independent of $B_t$ and $E[X_0^2] < \infty$. We must find an adapted, measurable process $X_t$ such that
$$E\Big[\int_0^t X_s^2\,ds\Big] < \infty,$$
and (3.7.5) holds. Let $f(X_t) = \log X_t$ and apply the Itô formula:
$$\log X_t = \log X_0 + \int_0^t \Big(\frac{1}{X_s}\mu X_s - \frac{1}{2}\sigma^2 X_s^2\frac{1}{X_s^2}\Big)ds + \int_0^t \frac{1}{X_s}\sigma X_s\,dB_s = \log X_0 + \Big(\mu - \frac{1}{2}\sigma^2\Big)t + \sigma B_t.$$
Therefore,
$$X_t = X_0\exp\Big\{\Big(\mu - \frac{1}{2}\sigma^2\Big)t + \sigma B_t\Big\}.$$
As a Borel function of $B_t$, $X_t$ is adapted and, since it is continuous, it is measurable. Now $X_t^2 = X_0^2\exp\{(2\mu - \sigma^2)t + 2\sigma B_t\}$, and using the assumptions,
$$E[X_t^2] = E[X_0^2]\exp\{(2\mu - \sigma^2)t\}\,E[\exp\{2\sigma B_t\}].$$
Recall that $B_t$ is an $N(0, t)$ random variable, so that $E[\exp\{2\sigma B_t\}] = \exp\{2\sigma^2 t\}$. Therefore $E[X_t^2] = E[X_0^2]\exp\{(2\mu + \sigma^2)t\}$. Consequently,
$$E\Big[\int_0^t X_s^2\,ds\Big] = \int_0^t E[X_s^2]\,ds = E[X_0^2]\int_0^t \exp\{(2\mu + \sigma^2)s\}\,ds < \infty.$$
Setting $\mu = 0$ and $\sigma = 1$ in (3.7.5) gives the equation
$$dX_t = X_t\,dB_t. \qquad (3.7.6)$$
This has the solution
$$X_t = \exp\Big\{B_t - \frac{1}{2}t\Big\}. \qquad (3.7.7)$$
The process $X$ given by (3.7.7) is called the stochastic exponential of the Brownian motion process $B$. For a general Itô process
$$Z_t = Z_0 + \int_0^t \mu(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s(\omega),$$
consider the equation $dX_t = X_t\,dZ_t$, with $X_0$ given and $\mathcal{F}_0$-measurable, that is, $X_0$ a constant. Then the unique solution of the equation is the process
$$X_t = X_0\exp\Big\{\int_0^t \sigma(s)\,dB_s + \int_0^t \Big(\mu(s) - \frac{1}{2}\sigma^2(s)\Big)ds\Big\} = X_0\exp\Big\{Z_t - Z_0 - \frac{1}{2}\langle Z, Z\rangle_t\Big\}.$$
$X_t$ is then called the stochastic exponential of the Itô process $Z$. A generalization of Theorem 3.7.9 is:

Theorem 3.7.13 Suppose $f(\cdot, \cdot)$ is continuously differentiable in the first argument and twice continuously differentiable in the second argument, and consider the Itô process
$$X_t = X_0 + \int_0^t \mu(\omega, s)\,ds + \int_0^t \sigma(\omega, s)\,dB_s(\omega). \qquad (3.7.8)$$
Then $f(t, X_t)$ is given by the formula
$$f(t, X_t) = f(0, X_0) + \int_0^t \frac{\partial f(X_s)}{\partial X_s}\,dX_s + \int_0^t \frac{\partial f(X_s)}{\partial s}\,ds + \frac{1}{2}\int_0^t \frac{\partial^2 f(X_s)}{\partial X_s^2}\,d\langle X, X\rangle_s. \qquad (3.7.9)$$
Again in differential form:
$$df(t, X_t) = \frac{\partial f(X_t)}{\partial t}\,dt + \frac{\partial f(X_t)}{\partial X_t}\,dX_t + \frac{1}{2}\frac{\partial^2 f(X_t)}{\partial X_t^2}\,d\langle X, X\rangle_t.$$
Substituting (3.7.8) in (3.7.9) we have:
$$f(t, X_t) = f(0, X_0) + \int_0^t \frac{\partial f(X_s)}{\partial s}\,ds + \int_0^t \sigma(s)\frac{\partial f(X_s)}{\partial X_s}\,dB_s + \int_0^t \Big[\mu(s)\frac{\partial f(X_s)}{\partial X_s} + \frac{1}{2}\sigma^2(s)\frac{\partial^2 f(X_s)}{\partial X_s^2}\Big]ds.$$
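The time-dependent formula explains why the stochastic exponential is a (local) martingale: applying it to $f(t, x) = \exp\{\sigma x - \sigma^2 t/2\}$ along $B_t$ (so $\mu = 0$ and unit diffusion), the $ds$-integrand $\partial f/\partial s + \frac{1}{2}\partial^2 f/\partial x^2$ vanishes identically. A symbolic check of this cancellation (an illustrative sketch using sympy, not from the text):

```python
import sympy as sp

t, x, s = sp.symbols('t x sigma', positive=True)
f = sp.exp(s * x - s**2 * t / 2)   # candidate exponential martingale

# ds-integrand of the time-dependent Ito formula along B_t (mu = 0, sigma = 1):
drift_term = sp.diff(f, t) + sp.Rational(1, 2) * sp.diff(f, x, 2)
print(sp.simplify(drift_term))     # 0: no drift, only the dB_s term remains
```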
The following theorem gives the multi-dimensional Itô formula. (See [25].)

Theorem 3.7.14 Let $f(t, x_1, \ldots, x_n)$ be continuously differentiable in the first argument and twice continuously differentiable in the other arguments. Suppose $X^1, \ldots, X^n$ are Itô processes of the form:
$$dX_t^1 = \mu_1(t)\,dt + \sigma_{11}(t)\,dB_t^1 + \sigma_{12}(t)\,dB_t^2 + \cdots + \sigma_{1m}(t)\,dB_t^m,$$
$$dX_t^2 = \mu_2(t)\,dt + \sigma_{21}(t)\,dB_t^1 + \sigma_{22}(t)\,dB_t^2 + \cdots + \sigma_{2m}(t)\,dB_t^m,$$
$$\vdots$$
$$dX_t^n = \mu_n(t)\,dt + \sigma_{n1}(t)\,dB_t^1 + \sigma_{n2}(t)\,dB_t^2 + \cdots + \sigma_{nm}(t)\,dB_t^m.$$
We, therefore, require that with probability 1,
$$\int_0^t |\mu_i(s)|\,ds < \infty, \qquad i = 1, \ldots, n,$$
and
$$\int_0^t |\sigma_{kl}(s)|^2\,ds < \infty, \qquad k = 1, \ldots, n; \quad l = 1, \ldots, m.$$
Suppose $B^1, \ldots, B^m$ are $m$ independent Brownian motions. Then
$$f(t, X_t^1, \ldots, X_t^n) = f(0, X_0^1, \ldots, X_0^n) + \int_0^t \Big[\frac{\partial f}{\partial s} + \sum_i \mu_i(s)\frac{\partial f}{\partial X_s^i} + \frac{1}{2}\mathrm{Tr}\Big(\sigma(s)'\frac{\partial}{\partial X}\Big(\frac{\partial f}{\partial X}\Big)\sigma(s)\Big)\Big]ds + \sum_{ij}\int_0^t \sigma_{ij}(s)\frac{\partial f}{\partial X_s^i}\,dB_s^j. \qquad (3.7.10)$$
Here $\sigma(t) = \{\sigma_{ij}(t)\}$ is an $n \times m$ matrix, $X = (X^1, \ldots, X^n)$, $\frac{\partial}{\partial X}\big(\frac{\partial f}{\partial X}\big)$ is the matrix $\Big(\frac{\partial^2 f}{\partial X^i \partial X^j}\Big)$, $i, j = 1, \ldots, n$, and $\mathrm{Tr}(A)$ is the trace of the matrix $A$, i.e. the sum of the diagonal entries of $A$.
We can write (3.7.10) in differential form:
$$df(t, X_t^1, \ldots, X_t^n) = \frac{\partial f}{\partial t}\,dt + \sum_i \mu_i(t)\frac{\partial f}{\partial X_t^i}\,dt + \sum_{ij}\sigma_{ij}(t)\frac{\partial f}{\partial X_t^i}\,dB_t^j + \frac{1}{2}\sum_{i,j}\frac{\partial^2 f}{\partial X^i\partial X^j}\,d\langle X^i, X^j\rangle_t.$$
Example 3.7.15 Suppose that
$$dX_t^1 = \mu_1(t)\,dt + \sigma_1(t)\,dB_t^1,$$
$$dX_t^2 = \mu_2(t)\,dt + \sigma_2(t)\,dB_t^2$$
are two Itô processes and that $f(X_t^1, X_t^2) = X_t^1 X_t^2$. From the Itô rule (3.7.10),
$$d(X_t^1 X_t^2) = X_t^2\,dX_t^1 + X_t^1\,dX_t^2 + \sigma_1(t)\sigma_2(t)\,dt = X_t^2\,dX_t^1 + X_t^1\,dX_t^2 + d\langle X^1, X^2\rangle_t,$$
or equivalently,
$$X_t^1 X_t^2 = X_0^1 X_0^2 + \int_0^t X_s^2\,dX_s^1 + \int_0^t X_s^1\,dX_s^2 + \langle X^1, X^2\rangle_t.$$
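This integration-by-parts formula has an exact discrete-time analogue, which can be checked pathwise on arbitrary sequences (an illustrative sketch; the paths are arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.cumsum(rng.normal(size=n + 1))  # two arbitrary discrete paths
Y = np.cumsum(rng.normal(size=n + 1))
dX, dY = np.diff(X), np.diff(Y)

# Discrete product rule:
#   X_n Y_n - X_0 Y_0 = sum X_{k-1} dY_k + sum Y_{k-1} dX_k + sum dX_k dY_k.
lhs = X[-1] * Y[-1] - X[0] * Y[0]
rhs = np.sum(X[:-1] * dY) + np.sum(Y[:-1] * dX) + np.sum(dX * dY)
print(abs(lhs - rhs))  # zero up to floating-point roundoff
```

The last sum is the discrete quadratic covariation; in the continuous limit it becomes $\langle X^1, X^2\rangle_t$.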
3.8 Representation results
Measurable, adapted processes $\{f(\omega, t), \mathcal{F}_t\}$ such that
$$E\Big[\int_0^t f(\omega, s)^2\,ds\Big] < \infty, \qquad \forall t \geq 0,$$
generate martingales $\{X_t, \mathcal{F}_t\}$ via the formula
$$X_t = X_0 + \int_0^t f(\omega, s)\,dB_s(\omega). \qquad (3.8.1)$$
The following theorem (Davis [8]) gives a converse result, in the sense that any square integrable martingale $\{X_t, \mathcal{F}_t\}$ can be represented as an Itô integral similar to (3.8.1).

Theorem 3.8.1 Suppose $\{B_t\}$, $t \geq 0$, is a Brownian motion on the filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$. Write $\mathcal{G}_t^0 = \sigma\{B_s : s \leq t\}$ and $\{\mathcal{G}_t\}$ for the completion of $\{\mathcal{G}_t^0\}$, so that the filtration $\{\mathcal{G}_t\}$ is certainly right-continuous.
Then every random variable $X \in L^2(\Omega, \mathcal{G}_\infty)$ can be represented as a stochastic integral
$$X = E[X \mid \mathcal{G}_0] + \int_0^\infty f_s\,dB_s,$$
where $\{f_t\}$ is a $\mathcal{G}_t$-predictable process and $E[\int_0^\infty f_s^2\,ds] < \infty$. Furthermore,
$$E[X \mid \mathcal{G}_t] = E[X \mid \mathcal{F}_t] = E[X \mid \mathcal{G}_0] + \int_0^t f_s\,dB_s.$$
If $\{B_t\} = \{B_t^1, \ldots, B_t^n\}$, $t \geq 0$, is an $n$-dimensional Brownian motion, then $f_t = (f_t^1, \ldots, f_t^n)$ is a $\mathcal{G}_t$-predictable process such that $E[\int_0^\infty (f_s^i)^2\,ds] < \infty$, $i = 1, \ldots, n$, and $X \in L^2(\Omega, \mathcal{G}_\infty)$ has the representation
$$X_t = E[X \mid \mathcal{G}_0] + \sum_{i=1}^n \int_0^t f_s^i\,dB_s^i.$$
Proof   See [8].
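As a concrete illustration of the representation theorem (a hypothetical worked example, not from the text): for $X = B_T^2$ one has $E[X \mid \mathcal{G}_0] = T$ and the predictable integrand is $f_s = 2B_s$, since $B_T^2 = T + \int_0^T 2B_s\,dB_s$. On a discrete grid the identity holds up to the quadratic-variation discretization error:

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 100_000
dB = rng.normal(0.0, np.sqrt(T / n), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

# Representation of X = B_T^2: X = E[X] + int_0^T 2 B_s dB_s, E[B_T^2] = T.
X = B[-1]**2
representation = T + np.sum(2 * B[:-1] * dB)
print(abs(X - representation))  # small: equals |sum(dB^2) - T|
```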
Theorem 3.8.2 Suppose $\{N_t\}$, $t \geq 0$, is a Poisson process on the filtered probability space $(\Omega, \mathcal{F}, P, \mathcal{F}_t)$. Write $\mathcal{G}_t^0 = \sigma\{N_s : s \leq t\}$ and $\{\mathcal{G}_t\}$ for the completion of $\{\mathcal{G}_t^0\}$, so that the filtration $\{\mathcal{G}_t\}$ is certainly right-continuous. Then every random variable $X \in L^2(\Omega, \mathcal{G}_\infty)$ can be represented as a stochastic integral
$$X = E[X \mid \mathcal{G}_0] + \int_0^\infty f_s\,dQ_s,$$
where $Q_t = N_t - t$, $\{f_t\}$ is a $\mathcal{G}_t$-predictable process and $E[\int_0^\infty f_s^2\,ds] < \infty$. Furthermore,
$$E[X \mid \mathcal{G}_t] = E[X \mid \mathcal{F}_t] \qquad \text{a.s. for all } t.$$
Proof   See [8], [11].

Representation results for Markov chains
Consider a finite state Markov process $\{X_t\}$, $t \geq 0$, defined on a probability space $(\Omega, \mathcal{F}, P)$. We have noted in Example 2.6.17 that, without loss of generality, the state space of $X$ can be identified with the set $S = \{e_1, \ldots, e_N\}$ of standard unit vectors in $\mathbb{R}^N$. Recall that
$$p_t^i = P(X_t = e_i), \qquad 1 \leq i \leq N,$$
and
$$\frac{dp_t}{dt} = A_t p_t, \qquad A_t = (a_{ij}(t)), \quad t \geq 0. \qquad (3.8.2)$$
Write $\mathcal{F}_t^s$ for the right-continuous, complete filtration generated by $\sigma\{X_r : s \leq r \leq t\}$, and $\mathcal{F}_t = \mathcal{F}_t^0$. We saw in Lemma 2.6.18 that
$$V_t = X_t - X_0 - \int_0^t A_r X_r\,dr \qquad (3.8.3)$$
is an $\{\mathcal{F}_t\}$-martingale.

Lemma 3.8.3
$$X_t = \Phi(t, 0)\Big(X_0 + \int_0^t \Phi(r, 0)^{-1}\,dV_r\Big), \qquad (3.8.4)$$
where $\Phi(t, s)$ is the fundamental matrix solution of $\frac{d\Phi(t, s)}{dt} = A_t\Phi(t, s)$, $\Phi(s, s) = I$.

Proof   Differentiating (3.8.4) verifies the result.
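A consequence of (3.8.3)–(3.8.4) is that $E[X_t] = \Phi(t, 0)E[X_0]$, since $V$ is a mean-zero martingale. For a constant generator this can be checked by exact simulation (an illustrative sketch; the two-state rates, horizon, and sample size are arbitrary choices, and `scipy` is assumed available for the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
a, b, T, N = 1.0, 2.0, 1.0, 20_000      # rate 1->2, rate 2->1, horizon, paths
A = np.array([[-a, b], [a, -b]])        # columns sum to 0: dp/dt = A p

def sample_state(horizon):
    """Exact simulation via exponential holding times, starting in state 0."""
    state, t, rates = 0, 0.0, (a, b)
    while True:
        t += rng.exponential(1.0 / rates[state])
        if t > horizon:
            return state
        state = 1 - state

hits = sum(sample_state(T) == 0 for _ in range(N))
p_mc = hits / N
p_exact = (expm(A * T) @ np.array([1.0, 0.0]))[0]   # Phi(T,0) p_0
print(p_mc, p_exact)  # agree to Monte Carlo accuracy
```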
If $x$, $y$ are (column) vectors in $\mathbb{R}^N$ we shall write $\langle x, y\rangle = x'y$ for their scalar (inner) product. Consider $1 \leq i, j \leq N$ with $i \neq j$. Then, because the Markov chain is piecewise constant, $dX_s = \Delta X_s$ and
$$\langle X_{s-}, e_i\rangle\langle e_j, dX_s\rangle = \langle X_{s-}, e_i\rangle\langle e_j, \Delta X_s\rangle = \langle X_{s-}, e_i\rangle\langle e_j, X_s - X_{s-}\rangle = I(X_{s-} = e_i, X_s = e_j).$$
Therefore,
$$\int_0^t \langle X_{s-}, e_i\rangle\langle e_j, dX_s\rangle = \sum_{0 < s \leq t} I(X_{s-} = e_i, X_s = e_j) = J_t^{ij},$$
which equals the number of times $X$ jumps from $e_i$ to $e_j$ in the interval $[0, t]$. Define the martingale
$$V_t^{ij} = \int_0^t \langle X_{s-}, e_i\rangle\langle e_j, dV_s\rangle.$$
(Note the integrand is predictable.) Then
$$V_t^{ij} = \int_0^t \langle X_{s-}, e_i\rangle\langle e_j, dX_s\rangle - \int_0^t \langle X_{s-}, e_i\rangle\langle e_j, A_s X_{s-}\rangle\,ds = J_t^{ij} - \int_0^t \langle X_{s-}, e_i\rangle a_{ji}(s)\,ds = J_t^{ij} - \int_0^t \langle X_s, e_i\rangle a_{ji}(s)\,ds,$$
because $X_s = X_{s-}$ for each $\omega$, except for countably many $s$. That is, for $i \neq j$,
$$J_t^{ij} = \int_0^t \langle X_s, e_i\rangle a_{ji}(s)\,ds + V_t^{ij}.$$
The process $\int_0^t \langle X_s, e_i\rangle a_{ji}(s)\,ds$ is, therefore, the compensator of the counting process $J_t^{ij}$.
For a fixed $j$, $1 \leq j \leq N$, write $J_t^j$ for the number of jumps into state $e_j$ up to time $t$. Then, for $i \neq j$,
$$J_t^j = \sum_{i \neq j} J_t^{ij} = \sum_{i \neq j} \int_0^t \langle X_s, e_i\rangle a_{ji}(s)\,ds + V_t^j,$$
where $V_t^j$ is the martingale $\sum_{i \neq j} V_t^{ij}$. Finally, write $J_t$ for the total number of jumps (of all kinds) of the process $X$ up to time $t$. Then
$$J_t = \sum_{j=1}^N J_t^j = \sum_{i \neq j} \int_0^t \langle X_s, e_i\rangle a_{ji}(s)\,ds + Q_t,$$
where $Q_t$ is the martingale $\sum_{j=1}^N V_t^j$. However,
$$a_{ii}(s) = -\sum_{j \neq i} a_{ji}(s),$$
so
$$J_t = -\sum_{i=1}^N \int_0^t \langle X_s, e_i\rangle a_{ii}(s)\,ds + Q_t. \qquad (3.8.5)$$
Before we state the next result we need the following definition.

Definition 3.8.4 If $M = (M^1, \ldots, M^N)$ is a vector, $\mathbb{R}^N$-valued, square integrable martingale, the quadratic predictable variation process of $M$ is the (unique) predictable matrix valued process $\langle M, M\rangle \in \mathbb{R}^{N \times N}$ such that $MM' - \langle M, M\rangle$ is a martingale. Here $MM'$ is the (Kronecker) product of the (column) vector $M$ with the (row) vector $M'$, so that $MM'$ can be identified with the matrix valued process with entries $(M^i M^j)$.

Lemma 3.8.5 The quadratic predictable variation process of the (vector) martingale $V$ (see Definition 3.8.4) is given by the matrix valued process
$$\langle V, V\rangle_t = \mathrm{diag}\Big(\int_0^t A_r X_{r-}\,dr\Big) - \int_0^t (\mathrm{diag}\,X_{r-})A_r'\,dr - \int_0^t A_r(\mathrm{diag}\,X_{r-})\,dr.$$
Proof   Recall $X_t \in S$ is one of the unit vectors $e_i$. Therefore,
$$X_t X_t' = \mathrm{diag}\,X_t. \qquad (3.8.6)$$
Now by the product rule,
$$X_t X_t' = X_0 X_0' + \int_0^t X_{r-}(A_r X_{r-})'\,dr + \int_0^t (A_r X_{r-})X_{r-}'\,dr + \int_0^t X_{r-}\,dV_r' + \int_0^t dV_r\,X_{r-}' + \langle V, V\rangle_t + \big([V, V]_t - \langle V, V\rangle_t\big),$$
where $[V, V]_t - \langle V, V\rangle_t$ is an $\{\mathcal{F}_t\}$-martingale. However, a simple calculation using (3.8.6) shows
$$X_{r-}(A_r X_{r-})' = (\mathrm{diag}\,X_{r-})A_r', \quad \text{and} \quad (A_r X_{r-})X_{r-}' = A_r(\mathrm{diag}\,X_{r-}).$$
Therefore,
$$X_t X_t' = X_0 X_0' + \int_0^t (\mathrm{diag}\,X_{r-})A_r'\,dr + \int_0^t A_r(\mathrm{diag}\,X_{r-})\,dr + \langle V, V\rangle_t + \text{martingale}. \qquad (3.8.7)$$
Also, from (3.8.6),
$$X_t X_t' = \mathrm{diag}\,X_t = \mathrm{diag}\,X_0 + \mathrm{diag}\int_0^t A_r X_{r-}\,dr + \mathrm{diag}\,V_t. \qquad (3.8.8)$$
The semimartingale decompositions (3.8.7) and (3.8.8) must be the same, so equating the predictable terms,
$$\langle V, V\rangle_t = \mathrm{diag}\Big(\int_0^t A_r X_{r-}\,dr\Big) - \int_0^t (\mathrm{diag}\,X_{r-})A_r'\,dr - \int_0^t A_r(\mathrm{diag}\,X_{r-})\,dr.$$
We next note the following representation result:

Remark 3.8.6 A time varying function $f(t, X_t)$ of $X_t \in S$ takes only the values $f(t, e_1), \ldots, f(t, e_N)$ for each $t$. Writing $f_i(t) = f(t, e_i)$, $1 \leq i \leq N$, we see $f$ can be represented by the vector $f(t) = (f_1(t), \ldots, f_N(t)) \in \mathbb{R}^N$, so that $f(t, X_t) = \langle f(t), X_t\rangle$, where $\langle\cdot, \cdot\rangle$ denotes the inner product in $\mathbb{R}^N$.
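This identification is trivial to realize numerically (an illustrative sketch; the vector of values is an arbitrary choice): a function on the unit-vector state space becomes a vector, and evaluation becomes an inner product with the current state.

```python
import numpy as np

N = 3
f_vec = np.array([10.0, 20.0, 30.0])   # f(e_i) = f_i, illustrative values

for i in range(N):
    e_i = np.eye(N)[i]                 # state X_t = e_i
    # f(X_t) = <f, X_t> picks out exactly the i-th value
    assert np.dot(f_vec, e_i) == f_vec[i]
print("ok")
```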
Therefore, we have the following differentiation rule and representation result:

Lemma 3.8.7 Suppose the components of $f(t)$ are differentiable in $t$. Then
$$f(t, X_t) = f(0, X_0) + \int_0^t \langle f'(r), X_r\rangle\,dr + \int_0^t \langle f(r), A_r X_{r-}\rangle\,dr + \int_0^t \langle f(r), dV_r\rangle. \qquad (3.8.9)$$
Here, $\int_0^t \langle f(r), dV_r\rangle$ is an $\mathcal{F}_t$-martingale. Also, using (3.8.4),
$$f(t, X_t) = \langle f(t), \Phi(t, 0)X_0\rangle + \int_0^t \langle f(t), \Phi(t, r)\,dV_r\rangle. \qquad (3.8.10)$$
The single jump process Here we discuss representation results for the single jump process. (See Examples 2.1.4 and 2.6.12.) Lemma 3.8.8 Suppose {Mt } is a uniformly integrable {Ft }-martingale (see Example 2.6.12 for the definition of the filtration {Ft }) such that M0 = 0 a.s. Then there is an F = t Ft measurable function h : → IR such that h ∈ L 1 (P) and 1 Mt = h(T, Z )I{T ≤t} − I{T >t} h(s, z)P(ds, dz) a.s. Ft ]0,t] E Proof If {Mt } is a uniformly integrable {Ft }-martingale, then {Mt } = E[h | Ft ] for some F = t Ft measurable random variable h. From the definition of F = t Ft , h is of the form h(T, Z ). However, 1 E[h(T, Z ) | Ft ] = h(T, Z )I{T ≤t} + I{T >t} h(s, z)P(ds, dz). Ft ]t,∞] E Because 0 = M0 = E[h] =
h(s, z)P(ds, dz)
=
h(s, z)P(ds, dz) + ]0,t]
E
h(s, z)P(ds, dz), ]t,∞]
E
the result follows.
Lemma 3.8.9 Suppose {Mt }, t ≥ 0, is a local martingale of {Ft }. 1. If c = ∞, or c < ∞ and Fc− = 0, then {Mt } is a martingale on [0, c[. 2. If c < ∞ and Fc− > 0, then {Mt } is a uniformly integrable martingale. Proof 1. Let {Tk } be an increasing sequence of stopping times such that lim Tk = ∞ a.s. and {Mt∧Tk } is a uniformly integrable martingale. If there is a k such that Tk ≥ T a.s. then Mt = Mt∧Tk a.s. is a uniformly integrable martingale. Otherwise, suppose for each k that P(Tk < T ) > 0. Then, by Lemma 2.6.13, there is a sequence {tk } such that Tk ∧ T = tk ∧ T for each k, and because P(T > tk ) > 0 we have tk ≤ c, otherwise we should have Tk ≥ T . Because lim Tk = ∞ we see that limk P(T > tk ) = 0, so lim tk = c. Now {Mt } is stopped at time T so Mt∧Tk = Mt∧tk . Consequently {Mt }, t ≤ tk , is a uniformly integrable martingale, and {Mt } is certainly a martingale on [0, c[. 2. Suppose now that c < ∞ and Fc− > 0. Then P(T = c) > 0. Because lim Tk = ∞ a.s. there is a k such that P(T = c, Tk > c) > 0. Consequently, for such a k, Tk ≥ T a.s. and the process {Mt∧Tk } = {Mt } is a uniformly integrable martingale.
3.8 Representation results
Write E
121
L 1 (µ) for the set of measurable functions g : → IR such that |g|dµ < ∞, and L 1loc (P) for the set of measurable functions g : → IR
[0,∞]×E
such that I{s≤t} g(s, x) ∈ L 1 (P) for all t < c. We have the following martingale representation result (see [11]). g
Theorem 3.8.10 {Mt } is a local Ft -martingale with M0 = 0 a.s. if and only if Mt = Mt for some g ∈ L 1loc (P), where g
Mt =
I{s≤t} g(s, x)q(ds, dx).
Proof Suppose g ∈ L 1loc (P). Then there is an increasing sequence of stopping times {Tk } such that lim Tk = ∞ a.s. and I{s0. From Lemma 3.8.9 {Mt } is a uniformly integrable martingale, and so is of the form 1 Ft
Mt = h(T, Z )I{T ≤t} − I{T >t}
h(s, z)P(ds, dz), ]0,t]
(3.8.11)
E
where h(T, Z ) = M∞ . Define g(t, Z ) = h(t, Z ) − I{t
1 Ft
h(s, z)P(ds, dz) ]0,t]
if t < ∞,
(3.8.12)
E
and g(∞, Z ) = 0. Then g Mt
=
I{s≤t} gdq
= I{t≥T } g(T, Z ) −
]0,t]
− I{t
E
E
−1 g(s, z)Fs− P(ds, dz)
−1 g(s, z)Fs− P(ds, dz).
(3.8.13)
g
From (3.8.11) and (3.8.13) we see that Mt = Mt if
h(t, z) = g(t, z) − ]0,t]
E
−1 g(s, z)Fs− P(ds, dz).
(3.8.14)
122
Stochastic calculus
However, if g is given by (3.8.12) and t < c, −1 −1 g(s, z)Fs− P(ds, dz) = h(s, z)Fs− P(ds, dz) ]0,t]
E
]0,t]
E
− ]0,t]
E
h(u, z)P(du, dz)dFs ]0,s]
−
+ ]0,t]
E
]0,t]
E
E
]u,t]
−1 Fs−1 Fs− dFs h(u, z)P(du, dz)
−1 h(s, z)Fs− P(ds, dz)
=
+ ]0,t]
= Ft−1
−1 h(s, z)Fs− P(ds, dz)
= ]0,t]
−1 Fs−1 Fs−
E
−1 Ft−1 Fu− h(u, z)P(du, dz)
h(u, z)P(du, dz). ]0,t]
E
Therefore, (3.8.14) is satisfied if t < c. A similar calculation shows that the coefficients g g of Itc) = 0 so it remains only to show that Mc = Mc when T (ω) = c. This is verified by a similar calculation to that above. We now check that g ∈ L 1 (µ). Because {Mt } is uniformly integrable |h|d p < ∞.
Therefore
|h|d p ≤ ≤ =
|h|d p − ]0,c[ −1 |h|d p − Fc− −1 |h|d p + Fc−
≤ (1 +
−1 Fc− )
Ft−1
|h|P(ds, dz)dFt
]0,t]
E
|h|P(ds, dz)dFt ]0,c[
]0,t]
E
|h|(Ft − Fc− P(ds, dz) ]0,c[
E
|h|d p < ∞.
Consequently, g ∈ L 1 (µ). 2. Now suppose c = ∞, or c < ∞ and Fc− = 0. Then from Lemma 3.8.9 {Mt } is a martingale on [0, c[, and so uniformly integrable on [0, t] for any t < c. Therefore Mt is of the form (3.8.11) for some h satisfying |h|P(ds, dz) < ∞, ]0,t]
E
3.9 Random measures
123
for all t < c. Calculations as in (1.) above show that, for g given by (3.8.12) and t < c, g Mt = Mt . Also −1 |g|P(ds, dz) ≤ |h|d p − Fs |h|P(ds, dz)dFs ]0,t]
E
]0,t]
E
]0,t[
≤
]0,s]
|h|d p 1 − ]0,t]
E
]0,t[
Fs−1 dFs
E
<∞
if t < ∞.
1 Therefore g ∈ L loc (P) and the proof is complete.
3.9 Random measures Definition 3.9.1 A measure µ on (IR+ , B(IR+ )) is a counting measure if
1. µ(B) ∈ {0, 1, . . . , +∞} = IN ∪ {∞} for every B ∈ B(IR+ ), 2. µ([a, b]) < ∞ for all bounded intervals [a, b] ⊂ IR+ . In other words a counting measure µ is just a countable subset D ⊂ IR+ , and for any given B ∈ B(IR+ ), µ(B) is the number of points in D which belong to B and we write µ(dx) = δx (dx). x∈D
Here δx (dx) denotes the unit mass at x. Integration with respect to a counting measure µ is reduced to discrete time summation, i.e. for any real valued function f , we have f (x)µ(dx) = f (x). IR
x∈D
Definition 3.9.2 Let (E, E) be a measurable space. Let D ⊂ IR+ be a countable set. A function p from D to E is called a point function. A point function p defines a counting measure µp (dt, dx) on B(IR+ ) ⊗ E by µp ((0, t] × A) = #{s ∈ D; s ≤ t, p(s) ∈ A},
t > 0, A ∈ E.
(3.9.1)
The right hand side of (3.9.1) stands for the number of times s up to time t when p(s) landed in A. Definition 3.9.3 Let (, F, P) be a given probability space and (E, E) be a measurable space. A nonnegative kernel µ(ω, dt, dx) is called a random measure (on E) if 1. µ(., A) is F-measurable for each fixed A ∈ B(IR+ ) ⊗ E, 2. µ(ω, .) is a σ -finite measure for each ω. Such a random measure is said to be integer valued if also
3. µ(ω, A) ∈ {0, 1, . . . , +∞} = IN ∪ {∞} for every A ∈ B(IR+ ) ⊗ E, 4. µ(ω, (0, t] × E) − µ(ω, (0, t) × E) = µ(ω, {t} × E) ≤ 1 for all (ω, t), that is to say the counting done by µ cannot increase by more than one at any isolated single time t.
124
Stochastic calculus
Remarks 3.9.4 1. If µ is an integer valued random measure write D = {(ω, t) : µ(ω, {t} × E) = 1}. It follows from Definition 3.9.3(3, 4) that, for each fixed ω, the set Dω = {t ∈ IR+ : µ(ω, {t} × E) = 1},
(3.9.2)
which is the set of times when µ(ω, .) jumps, is at most countable. 2. For t ∈ Dω write {t} × E = ∞ of E into proper n=1 {t} × An , where {An } is a partition nonempty subsets. Then there exists one and only one subset Aωn t ∈ ∞ n=1 An , say, such that µ(ω, {t} × Aωn t ) = 1, which implies that there exists a single point εt (ω) ∈ Aωn t such that µ(ω, {t} × εt (ω)) = 1. In summary, for each (ω, t) ∈ D there is a unique point εt (ω) ∈ E such that µ(ω, {t} × dx) = δεt (ω) (dx). Here δεt (ω) (dx) denotes the unit mass at εt (ω) and we can write µ(ω, dt, dx) = δ(s,εs (ω)) (dt, dx) (ω,s)∈D
=
I(εs ∈E) δ(s,εs (ω)) (dt, dx).
(3.9.3)
s≥0
Note that the set given by (3.9.2) can be written Dω = {t ∈ IR+ : εt (ω) ∈ E)}.
(3.9.4)
3. If {εt } is an E-valued stochastic process such that for each fixed ω the t-function ε(ω, .) takes on at most a countable number of values in E, then the above expression (3.9.3) defines an integer valued random measure. These random measures are sometimes called point processes, since for each ω the sample path {εt (ω)} consists of the countable set of points {t, εt (ω)}, that is, {εt (ω)} is a point function as given by Definition 3.9.2. 4. Since an integer valued random measure process µ satisfies the assumptions of Doob– Meyer Theorem 2.6.9, then there exists a predictable increasing process µ p , the compensator of µ, such that µ p (ω, {t} × E) ≤ 1, and µ − µ p is a local martingale.
Random measures associated with jump processes When dealing with stochastic processes which are not continuous everywhere but with sample paths which are right-continuous with left limits, the notion of random measures enters naturally into the scene. For instance, the process µ B (ω, (0, t], B) = I (X s ∈ B), B ∈ B(IR − {0}), (3.9.5) 0<s≤t
3.9 Random measures
125
is called the measure of jumps of the process X and it counts the increments of X which fall in B up to time t. Note that, since X is right-continuous with left limits, the series given by (3.9.5) is finite (a.s.) for every finite t and any subset B that is bounded away from zero. However, the number of jumps of X t need not be finite on finite intervals of time. If we eliminate the randomness parameter ω from (3.9.5) we obtain a σ -finite measure on the product σ -field generated by (0, ∞) × (IR − {0}). Remark 3.9.5 Since the process (3.9.5) is an integer valued random measure, by Remark 3.9.4(4) there exists a predictable increasing process µ Bp , the compensator of µ B , such that µ B − µ Bp is a local martingale. Examples 3.9.6 1. Counting processes. Since all the jumps are of size +1, in formula (3.9.5) the only set of interest is B = {+1}, that is, µ B (ω, (0, t], B) = I (X s = 1) = X t . 0<s≤t
2. Finite state processes. Suppose that for all t ≥ 0, X t ∈ {0, 1, . . . , N }. Then the possible jumps are the integers {−N , −N + 1, . . . , −1, +1, . . . , N − 1, N }, and in formula (3.9.5) the sets B of interest are all subsets of {−N , −N + 1, . . . , −1, +1, . . . , N − 1, N }. 3. Let Z t , t ∈ IR+ , be a finite state space process with right-constant sample paths on the state space S = {e1 , e2 , . . . , e N }. Here ei is the standard basis (column) vector in IR N with unity in the i-th position and zero elsewhere. Let Tk (ω) be the k-th jump time of Z , δTk (ω) (dr ) be the unit mass at time Tk (ω) and δ Z Tk (ω) (ei ) be the unit mass at Z Tk (ω) (ω). Since Z t is a jump process taking values in the vector space IR N we can write Zt = Z0 + Z r . 0
Here Z r = Z r − Z r − =
N
(ei − Z r − )
i=1
=
N
∞
δTk (ω) (dr )δ Z Tk (ω) (ei )
k=1
(ei − Z r − )µ Z (dr, ei ).
i=1
We assume that each Z t has almost surely finitely many jumps in any finite interval so that the random measure µ Z is σ -finite. Let µ˜ Z (dr, ei ) be the predictable compensator of µ Z so that N t Zt = Z0 + (ei − Z r − )µ˜ Z (dr, ei ) + Wt , i=1
where Wt =
N
t
i=1 0 (ei
0
− Z r − )(µ Z (dr, ei ) − µ˜ Z (dr, ei )).
126
Stochastic calculus
Definition 3.9.7 An integer valued random measure µ(ω, dt, dx) is a Poisson random measure if 1. for each A ∈ B(IR+ ) ⊗ E the random variable µ(., A) is Poisson distributed with parameter λ(A) = E[µ(., A)], i.e. P(µ(., A) = k) = exp{−λ(A)}
(λ(A))k , k!
and 2. if A1 , A2 , . . . , An are disjoint subsets of B(IR+ ) ⊗ E, then the random variables µ(., A1 ), µ(., A2 ), . . . , µ(., An ) are mutually independent.
More of the differentiation rule Suppose X = {X t , Ft } is a real local martingale. Let X t = X 0 + X c + X d be the unique decomposition given in Lemma 3.2.14, where X c is the continuous local martingale part of X and X d is the purely discontinuous local martingale part of X . Suppose µ X (ω, dt, dx) = I(X s =0) δ(s,X s ) (dt, dx). s>0
with predictable compensator µ Xp (ω, dt, dx). We also have from [18] t c Xt = X0 + Xt + x(µ X (dt, dx) − µ Xp (dt, dx)). 0
IR
Suppose f ∈ C , the space of functions continuously differentiable in t and twice continuously differentiable in x. Then the differentiation rule gives (see [18]) t ∂ f (s, X s− ) f (t, X t ) = f (0, X 0 ) + ds ∂s 0 t ∂ f (s, X s− ) c + dX s ∂x 0 1 t ∂ 2 f (s, X s− ) + dX c , X c s 2 0 ∂x2 t + [ f (s, X s− + x) − f (s, X s− )] (µ X (ds, dx) − µ Xp (ds, dx)) 1,2
0
+
IR
t 0
f (s, X s− + x) − f (s, X s− ) −
IR
∂ f (s, X s− ) µ Xp (ds, dx). ∂x
Example 3.9.8 Suppose that the scalar process {X t } is described by the stochastic differential equation dX t = f (t, X t )dt + σ (t, X t )dBt + γ (t, X t− , x)(µ(dt, dx) − µ p (dt, dx)). IR
Here B is a standard Brownian motion and µ is a random measure with compensator µ p .
3.10 Problems
127
Suppose f ∈ C 1,2 . Then the differentiation rule gives (see [18]) t ∂ f (s, X s− ) f (t, X t ) = f (0, X 0 ) + ds ∂s 0 t ∂ f (s, X s− ) + ( f (s, X s )ds + σ (s, X s− )dBs ) ∂x 0 t ∂ f (s, X s− ) + γ (s, X s− , x)(µ(ds, dx) − µ p (ds, dx)) ∂s 0 IR 1 t ∂ 2 f (s, X s− ) 2 + σ (s, X s− )ds 2 0 ∂x2 t
+ f s, X s− + γ (s, X s− , x) − f s, X s− (µ(dt, dx) − µ Xp (dt, dx)) 0
+
IR
t
0
f (s, X s− + γ (s, X s− , x) − f s, X s−
IR
− γ (s, X s− , x)
∂ f (s, X s− ) X µ p (dt, dx). ∂x
3.10 Problems 1. Let X 1 , X 2 , . . . be a sequence of i.i.d. N (0, 1) random variables and consider the process Z 0 = 0 and Z n = nk=1 X k . Show that n [Z , Z ]n = X k2 , k=1
Z , Z n = n, E([Z , Z ]n ) = E(
n
X k2 ) = n.
k=1
2. Show that if X and Y are (square integrable) martingales, then X Y − X, Y is a martingale. 3. Establish the identity 1 ([X + Y, X + Y ]n − [X, X ]n − [Y, Y ]n ). 2 4. Show that for any processes X , Y , [X, Y ]n =
X n Yn =
n
X k−1 Yk +
k=1
n
Yk−1 X k + [X, Y ]n .
k=1
5. Show that for a real valued differentiable function f and a stochastic process X we have the discrete time version of the Itˆo formula, f (X n ) = f (X 0 ) +
n
f (X k−1 )X k
k=1
+
n k=1
[ f (X k ) − f (X k−1 ) − f (X k−1 )X k ].
128
Stochastic calculus
6. Show that if {X n } is a square integrable martingale then X 2 − X, X is a martingale. 7. Find [B + N , B + N ]t and B + N , B + N t for a Brownian motion process {Bt } and a Poisson process {Nt }. L2
8. Show that limδn →0 Sn = Bt2 /2 + (α − 12 )t, where Sn is given by (3.4.1) where τkn = n (1 − α)tk + αtk−1 , 0 ≤ α ≤ 1. 9. Let f be a deterministic square integrable function and Bt a Brownian motion. Show that the stochastic integral t f (s)dBs 0
is a normally distributed random variable with distribution t N (0, 0 f 2 (s)ds). 10. Show that if t E[ f (s)]2 ds < ∞, 0
the Itˆo process It =
t
f (s)dBs 0
has orthogonal increments, i.e., for 0 ≤ r ≤ s ≤ t ≤ u, E[(Iu − It )(Is − Ir )] = 0. 11. Show that
0
t
(Bs2 − s)dBs =
Bt3 − t Bt . 3
12. Prove the second part of Lemma 3.4.3. 13. Show that the process Bt2 /2 + (α − 12 )t is an Ft -martingale if and only if α = 0. 14. Using the Itˆo formula, show that the Doob–Meyer decomposition of Bt4 is given by t t Bt4 = 4 Bs3 dBs + 6 Bs2 ds, 0
0
where B is the Brownian motion process. 15. Using the Itˆo formula, show that d(Bt )n = n Btn−1 dBt +
n(n − 1) n−2 Bt dt. 2
16. If N is a standard Poisson process show that the stochastic integral t Ns d(Ns − s) 0
is not a martingale. However, show that t Ns− d(Ns − s) 0
3.10 Problems
129
is a martingale. Here Nt is a Poisson process. (Note that at any jump time s, Ns− = Ns − 1.) 17. Prove that t 2 Ns− dNs = 2 Nt − 1. 0
Here Nt is a Poisson process. 18. Show that the unique solution of
t
xt = 1 +
xs− dys 0
c is given by xt = e yt 0,s≤t (1 + ys ). Here yt is a finite-variation deterministic function. 19. Show that the unique solution of t xt = 1 + xs− dNs
0
is given by xt = 2 Nt . Here Nt is a Poisson process. 20. Show that given two adapted, measurable processes xt and yt , such that t 2 E (xs ) ds < ∞, 0
and
E
t
(ys )2 ds < ∞,
0
we have for 0 ≤ r ≤ t, t t t E xs dBs ys dBs | Fr = E xs ys ds | Fr , 0
0
0
where Bt is a Brownian motion process and Ft is its natural filtration. 21. Show that the linear stochastic differential equation dX t = F(t)X t dt + G(t)dt + H (t)dBt , with X 0 = ξ has the solution t t −1 −1 X t = (t) ξ + (s)G(s)ds + (s)H (s)dBs . 0
(3.10.1)
(3.10.2)
0
Here F(t) is an n × n bounded measurable matrix, H (t) is an n × m bounded measurable matrix, Bt is an m-dimensional Brownian motion and G(t) is an IRn -valued bounded measurable function. (t) is the fundamental matrix solution of the deterministic equation dX t = F(t)X t dt. See [1].
130
Stochastic calculus
22. Show that the solution (3.10.2) of the stochastic differential equation (3.10.1) with E|X 0 |2 = E|ξ |2 < ∞ has mean t −1 µt = E[X t ] = (t) E[ξ ] + (s)G(s)ds , 0
satisfying the deterministic differential equation dµt = F(t)µt dt + G(t)dt,
µ0 = E[ξ ],
and covariance matrix P(t) satisfying the deterministic matrix differential equation d p(t) = F(t)P(t)dt + P(t)F(t) dt + H (t)H (t) dt,
µ0 = E[ξ ],
with initial value P(0) = E[ξ − Eξ ][ξ − Eξ ] . 23. Show that the solution (3.10.2) of the stochastic differential equation (3.10.1) is a Gaussian stochastic process if and only if X 0 is normally distributed or constant. 24. Show that the linear stochastic differential equation dX t = −α X t dt + σ dBt , with E|X 0 |2 = E|ξ |2 < ∞ has the solution t X t = e−αt ξ + σ e−α(t−s) dBs , 0
and µt = E[X t ] = e−αt E[ξ ], σ 2 (1 − e−2αt ) . 2α 25. Show that the sequence of stopping times given by Remark 3.4.5 is indeed a localizing sequence of stopping times, i.e. a nondecreasing sequence, converging to ∞ with probability 1. 26. Suppose for θ ∈ IR, P(t) = Var(X t ) = e−2αt Var(ξ ) +
1 2 X tθ = eθ Mt − 2 θ At
is a martingale, and suppose there is an open neighborhood I of θ = 0 such that for all θ ∈ I and all t (P- a.s.), 1. |X tθ | ≤ a, dX θ 2. | t | ≤ b, dθ d2 X tθ 3. | 2 | ≤ c. dθ Here a, b, c are nonrandom constants which depend on I , but not on t. Show that then the processes {Mt } and {Mt2 − At } are martingales. 27. Prove the result of Example 3.6.12.
4
Change of measures
4.1 Introduction We begin by giving a conditional form of Bayes’ Theorem. The result relates conditional expectations under two different measures. Consider first a simple situation like, for instance, the throwing of a die. Here = {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 } = {1, 2, 3, 4, 5, 6}. Suppose P(ωi ) = pi = 1/6. Let P be another probability measure such that P(ωi ) =
1 . 6
Then the two measures are related by the Radon–Nikodym derivative 1 P I{ω } (ω). (ω) = (ω) = P 6 pi i i Write j = (ω j ) = 1/(6 p j ). Consider the sub-σ -field G = {{odd}, {even}, , ∅}. Consider a set of real numbers {x1 , x2 , . . . , x6 } and an associated random variable X (ω) → IR given by: X (ωi ) = xi ,
i = 1, . . . , 6.
The G-measurable random variable E[X | G](ω) is constant on the two atoms of G and is given by the following expression: E[X | G](ω) =
x j j P[ω j | G](ω)
j
=
j
x j j P[ω j | {even}]I{even} (ω) +
j
x j j P[ω j | {odd}]I{odd} (ω)
132
Change of measures
x2 2 p2 + x4 4 p4 + x6 6 p6 I{even} (ω) P({even}) x 1 1 p 1 + x 3 3 p 3 + x 5 5 p 5 + I{odd} (ω) P({odd}) x2 + x4 + x6 x1 + x3 + x5 = I{even} (ω) + I (ω). 6( p2 + p4 + p6 ) 6( p1 + p3 + p5 ) {odd}
=
Similarly, E[ | G](ω) = =
j P[ω j | G](ω) j {P[ω j | {even}]I{even} (ω) + P[ω j | {odd}]I{odd} (ω)}
=
2 p2 + 4 p4 + 6 p6 1 p 1 + 3 p 3 + 5 p 5 I{even} (ω) + I{odd} (ω) P({even}) P({odd})
=
1 1 I{even} (ω) + I (ω). 2( p2 + p4 + p6 ) 2( p1 + p3 + p5 ) {odd}
Now with E denoting expectation under P, E[X | G](ω) = x j P[X = x j | G](ω) = x j {P[X = x j | {even}]I{even} (ω) + P[X = x j | {odd}]I{odd} (ω)} =
x 2 p 2 + x 4 p 4 + x 6 p6
I{even} (ω) +
x1 p1 + x3 p3 + x5 p5
P({even}) P({odd}) x1 + x3 + x5 x2 + x4 + x6 = I{even} (ω) + I{odd} (ω). 3 3
I{odd} (ω)
We, therefore, see that
E X | G E X | G E[X | G](ω) = (ω). I{even} (ω) + I E |G E | G {odd}
We now prove this result in full generality. Recall that φ is integrable if E|φ| < ∞. Theorem 4.1.1 (Conditional Bayes’ Theorem) Suppose (, F, P) is a probability space and G ⊂ F is a sub-σ -field. Suppose P is another probability measure absolutely continuous with respect to P (P P) and with a Radon–Nikodym derivative dP = . dP Then if φ is any integrable F-measurable random variable, E φ | G if E | G > 0, E |G E[φ | G] = 0 otherwise. Proof
We must show that for any A ∈ G, E[φ | G]dP = αdP, A
A
4.1 Introduction
where
E φ | G E |G α= 0 otherwise.
133
if E | G > 0,
Write G = {ω : E | G = 0}, so G ∈ G. Then
E | G dP = 0 = G
dP, G
and ≥ 0 a.s. So either P(G) = 0, or the restriction of to G is 0 a.s. In either case, = 0 a.s. on G. Now G c = {ω : E | G > 0}. Suppose A ∈ G; then A = B ∪ C, where B = A ∩ G c and C = A ∩ G. Further, E[φ | G]dP = φdP = φdP A A A = φdP + φdP. B
Of course, = 0 a.s. on C ⊂ G, so φdP = 0 = αdP, C
by definition. Now
(4.1.2)
C
αdP = B
= = = = = That is
(4.1.1)
C
(E φ | G /E | G )dP B E φ | G
E IB E |G E φ | G
E IB E |G
E φ | G E E[I B |G E |G E φ | G
E I B E[ | G] E |G E I B φ .
φdP = B
αdP. B
(4.1.3)
134
Change of measures
From (4.1.1), adding (4.1.2) and (4.1.3), we see that φdP + φdP = φdP C B A = E[φ | G]dP = αdP, A
A
and the result follows. Another useful version of the preceding theorem is the following result. Theorem 4.1.2 Suppose (, F, P) is a probability space with a filtration {Ft , t ≥ 0}. Suppose P is another probability measure absolutely continuous with respect to P (P P) on F and with Radon–Nikodym derivative dP = . dP Define the martingale E | Ft = t . Then if {φt } is any {Ft }-adapted process, E t φt | Fs if E t | Fs > 0, E t | Fs E φt | Fs = 0 otherwise.
4.2 Measure change for discrete time processes Example 4.2.1 Let {bn } be a sequence of i.i.d. Bernouilli random variables on a probability space (, F, P) such that P(bk = 1) = p1 and P(bk = 2) = p2 , p1 + p2 = 1. Consider the filtration {Fk } = σ {b1 , . . . , bk }. Suppose that we wish to define a new probability measure P on (, Fk } such that P(bk = 1) = P(bk = 2) = 1/2. For 1 ≤ k ≤ N define a positive {Fk , P}-martingale {k } with P-mean 1 and put dP (ω) = N (ω). FN dP Let 0 = 1. Since 1 is F1 = σ {b1 }-measurable we have 1 (ω) =
P(b1 = 1) P(b1 = 2) I(b1 =1) (ω) + I(b =2) (ω), P(b1 = 1) P(b1 = 2) 1
or 1 (ω) =
1 1 I(b =1) (ω) + I(b =2) (ω). 2 p1 1 2 p2 1
(4.2.1)
4.2 Measure change for discrete time processes
135
Similarly, 2 (ω) = =
2 P(bi = j, b j = i) I(b = j,b j =i) P(bi = j, b j = i) i i, j=1 2
1 I(bi = j,b j =i) . 4 p i pj i, j=1
Define λk (ω) =
2 1 I(b =i) (ω), 2 pi k i=1
N (ω) =
N
λk (ω).
k=1
Now E[k | Fk−1 ] = k−1 E[λk | Fk−1 ] 2 1 = k−1 E[ I(b =i) (ω) | Fk−1 ] 2 pi k i=1 = k−1
2 1 pi = k−1 . 2 pi i=1
Hence for 1 ≤ k ≤ N , {k } is a martingale and since 0 = 1, E[k ] = 1. Lemma 4.2.2 Under the probability measure P defined by (4.2.1), {bn } is a sequence of i.i.d. Bernouilli random variables such that P(bn = 1) = P(bn = 2) = 1/2. Proof
Using Bayes' Theorem 4.1.1 write
$$\bar P[b_n = \ell \mid \mathcal F_{n-1}] = \bar E[I_{(b_n = \ell)} \mid \mathcal F_{n-1}] = \frac{E[I_{(b_n = \ell)}\, \Lambda_n \mid \mathcal F_{n-1}]}{E[\Lambda_n \mid \mathcal F_{n-1}]} = \frac{\Lambda_{n-1}\, E[I_{(b_n = \ell)}\, \lambda_n \mid \mathcal F_{n-1}]}{\Lambda_{n-1}\, E[\lambda_n \mid \mathcal F_{n-1}]} = E[I_{(b_n = \ell)}\, \lambda_n].$$
Here $\lambda_n = \sum_{i=1}^2 \frac{1}{2 p_i}\, I_{(b_n = i)}(\omega)$ and $E[\lambda_n] = 1$, so that
$$\bar P[b_n = \ell \mid \mathcal F_{n-1}] = \frac{1}{2 p_\ell}\, P[b_n = \ell] = \frac{1}{2 p_\ell}\, p_\ell = \frac{1}{2},$$
which shows that under $\bar P$, $\{b_n\}$ is a sequence of i.i.d. Bernoulli random variables such that $\bar P(b_n = 1) = \bar P(b_n = 2) = 1/2$.
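The construction can be checked by enumerating all $2^N$ outcome sequences. The values $p_1 = 0.3$ and $N = 4$ below are arbitrary choices for the check, not from the text; the enumeration recovers $E[\Lambda_N] = 1$ and $\bar P(b_1 = 1) = E[\Lambda_N I_{(b_1 = 1)}] = 1/2$ exactly.

```python
from itertools import product

p = {1: 0.3, 2: 0.7}          # P(b_k = 1), P(b_k = 2); illustrative values
N = 4

mean_lam = 0.0                 # accumulates E[Lambda_N]
p_bar_b1_is_1 = 0.0            # accumulates P-bar(b_1 = 1)
for seq in product([1, 2], repeat=N):
    prob = 1.0
    lam_N = 1.0
    for b in seq:
        prob *= p[b]
        lam_N *= 1.0 / (2.0 * p[b])   # lambda_k = 1/(2 p_i) on {b_k = i}
    mean_lam += prob * lam_N
    if seq[0] == 1:
        p_bar_b1_is_1 += prob * lam_N  # P-bar(b_1=1) = E[Lambda_N I(b_1=1)]
```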
Change of measures
Example 4.2.3  Let $\{X_n\}$ be a sequence of random variables with positive probability density functions $\phi_n$ on some probability space $(\Omega, \mathcal F, P)$. Consider the filtration $\mathcal F_n = \sigma\{X_1, \dots, X_n\}$. Suppose that we wish to define a new probability measure $\bar P$ on $(\Omega, \mathcal F_n)$ such that the $X_n$ are i.i.d. with positive probability density function $\alpha$. Let $\lambda_0 = 1$ and for $k \geq 1$,
$$\lambda_k = \frac{\alpha(X_k)}{\phi_k(X_k)}, \qquad \Lambda_n = \prod_{k=0}^n \lambda_k,$$
and
$$\frac{d\bar P}{dP}\Big|_{\mathcal F_n}(\omega) = \Lambda_n(\omega).$$

Lemma 4.2.4  The sequence of random variables $\{\Lambda_n\}$, $n \geq 0$, is an $\{\mathcal F_n, P\}$-martingale with $P$-mean 1. Moreover, under $\bar P$, $\{X_n\}$ is a sequence of i.i.d. random variables with probability density function $\alpha$.

Proof  We have to show that $E[\Lambda_n \mid \mathcal F_{n-1}] = \Lambda_{n-1}$. However, $\Lambda_n = \Lambda_{n-1} \lambda_n$ and, since $\Lambda_{n-1}$ is $\mathcal F_{n-1}$-measurable, we must show that $E[\lambda_n \mid \mathcal F_{n-1}] = 1$. In view of the definition of $\lambda_n$ we have
$$E[\lambda_n \mid \mathcal F_{n-1}] = E\Big[\frac{\alpha(X_n)}{\phi_n(X_n)} \,\Big|\, \mathcal F_{n-1}\Big] = \int_{\mathbb R} \frac{\alpha(x)}{\phi_n(x)}\, \phi_n(x)\, dx = 1.$$
Since $\{\Lambda_n\}$ is a martingale, for all $n$, $E[\Lambda_n] = E[\Lambda_0] = 1$. Let $f$ be any integrable real-valued "test" function. Using Bayes' Theorem 4.1.1,
$$\bar E[f(X_n) \mid \mathcal F_{n-1}] = \frac{E[f(X_n)\, \Lambda_n \mid \mathcal F_{n-1}]}{E[\Lambda_n \mid \mathcal F_{n-1}]} = E[f(X_n)\, \lambda_n \mid \mathcal F_{n-1}].$$
Using the form of $\lambda_n$ we have
$$E\Big[f(X_n)\, \frac{\alpha(X_n)}{\phi_n(X_n)} \,\Big|\, \mathcal F_{n-1}\Big] = \int_{\mathbb R} f(x)\, \frac{\alpha(x)}{\phi_n(x)}\, \phi_n(x)\, dx = \int_{\mathbb R} f(x)\, \alpha(x)\, dx,$$
which finishes the proof.

The next example is a generalization of Example 4.2.1; some dependence between the random variables $b_n$ is introduced.

Example 4.2.5  Let $\{\eta_n\}$, $1 \leq n \leq N$, be a Markov chain with state space $\{1, 2\}$ on a probability space $(\Omega, \mathcal F, P)$ such that $P(\eta_n = j \mid \eta_{n-1} = i) = p_{ji}$, and let $\{p_1^0, p_2^0\}$ be the distribution
of $\eta_0$. Consider the filtration $\mathcal F_n = \sigma\{\eta_0, \eta_1, \dots, \eta_n\}$. Suppose that we wish to define a new probability measure $\bar P$ on $(\Omega, \mathcal F_N)$ such that $\bar P(\eta_n = j \mid \eta_{n-1} = i) = \bar p_{ji}$. Let $\Lambda_0 = 1$. Since $\Lambda_1$ is $\mathcal F_1 = \sigma\{\eta_0, \eta_1\}$-measurable we have that
$$\Lambda_1(\omega) = \frac{\bar p_{11}}{p_{11}}\, I_{(\eta_0 = 1,\, \eta_1 = 1)}(\omega) + \frac{\bar p_{21}}{p_{21}}\, I_{(\eta_0 = 1,\, \eta_1 = 2)}(\omega) + \frac{\bar p_{12}}{p_{12}}\, I_{(\eta_0 = 2,\, \eta_1 = 1)}(\omega) + \frac{\bar p_{22}}{p_{22}}\, I_{(\eta_0 = 2,\, \eta_1 = 2)}(\omega).$$
Define
$$\lambda_n(\omega) = \sum_{i,j} \frac{\bar p_{ji}}{p_{ji}}\, I_{(\eta_{n-1} = i,\, \eta_n = j)}(\omega), \qquad \Lambda_N = \prod_{n=1}^N \lambda_n.$$
Lemma 4.2.6  $\{\Lambda_n\}$ is an $\{\mathcal F_n, P\}$-martingale and under $\bar P$ the Markov chain $\eta$ has transition probabilities $\bar p_{ji}$.

Proof  Using the fact that $\Lambda_{n-1}$ is $\mathcal F_{n-1}$-measurable and the Markov property of $\{\eta_n\}$ under $P$ we can write
$$E[\Lambda_n \mid \mathcal F_{n-1}] = \Lambda_{n-1} \sum_{ij} \frac{\bar p_{ji}}{p_{ji}}\, E[I_{(\eta_{n-1} = i,\, \eta_n = j)} \mid \eta_{n-1}] = \Lambda_{n-1} \sum_{ij} \frac{\bar p_{ji}}{p_{ji}}\, p_{ji}\, I_{(\eta_{n-1} = i)} = \Lambda_{n-1} \sum_i I_{(\eta_{n-1} = i)} \sum_j \bar p_{ji} = \Lambda_{n-1}.$$
Hence $\{\Lambda_n\}$ is a martingale and, since $\Lambda_0 = 1$, $E[\Lambda_n] = 1$ for all $n \geq 0$. Using Bayes' Theorem 4.1.1 write
$$\bar P[\eta_n = \ell \mid \mathcal F_{n-1}] = \bar E[I_{(\eta_n = \ell)} \mid \mathcal F_{n-1}] = \frac{E[I_{(\eta_n = \ell)}\, \Lambda_n \mid \mathcal F_{n-1}]}{E[\Lambda_n \mid \mathcal F_{n-1}]} = \frac{\Lambda_{n-1}\, E[I_{(\eta_n = \ell)}\, \lambda_n \mid \mathcal F_{n-1}]}{\Lambda_{n-1}\, E[\lambda_n \mid \mathcal F_{n-1}]} = E[I_{(\eta_n = \ell)}\, \lambda_n \mid \mathcal F_{n-1}].$$
Here $\lambda_n(\omega) = \sum_{ij} \frac{\bar p_{ji}}{p_{ji}}\, I_{(\eta_{n-1} = i,\, \eta_n = j)}(\omega)$ and $E[\lambda_n \mid \mathcal F_{n-1}] = 1$, so that
$$\bar P[\eta_n = \ell \mid \mathcal F_{n-1}] = \sum_i \frac{\bar p_{\ell i}}{p_{\ell i}}\, I_{(\eta_{n-1} = i)}\, P[\eta_n = \ell \mid \eta_{n-1} = i] = \sum_i \frac{\bar p_{\ell i}}{p_{\ell i}}\, p_{\ell i}\, I_{(\eta_{n-1} = i)} = \sum_i \bar p_{\ell i}\, I_{(\eta_{n-1} = i)} = \bar p_{\ell,\, \eta_{n-1}},$$
which shows that under $\bar P$, $\{\eta_n\}$ is a Markov chain with transition probabilities $\bar p_{ji}$.
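Lemma 4.2.6 can be checked the same way, by enumerating all paths of length $N$. The kernels and initial distribution below are invented for the check; the enumeration recovers $E[\Lambda_N] = 1$ and the target transition probability $\bar p_{11}$.

```python
from itertools import product

# p[(j, i)] = P(eta_n = j | eta_{n-1} = i); pb is the target kernel p-bar.
p  = {(1, 1): 0.6, (2, 1): 0.4, (1, 2): 0.3, (2, 2): 0.7}
pb = {(1, 1): 0.5, (2, 1): 0.5, (1, 2): 0.2, (2, 2): 0.8}
p0 = {1: 0.5, 2: 0.5}        # distribution of eta_0
N = 3

mean_lam = 0.0
joint = {}                    # P-bar(eta_0 = i, eta_1 = j)
for path in product([1, 2], repeat=N + 1):
    prob = p0[path[0]]
    lam = 1.0
    for prev, cur in zip(path, path[1:]):
        prob *= p[(cur, prev)]
        lam *= pb[(cur, prev)] / p[(cur, prev)]   # lambda_n on this step
    mean_lam += prob * lam
    key = (path[0], path[1])
    joint[key] = joint.get(key, 0.0) + prob * lam

# Conditional law of eta_1 given eta_0 = 1 under P-bar:
pbar_1_given_1 = joint[(1, 1)] / (joint[(1, 1)] + joint[(1, 2)])
```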
Example 4.2.7  Let $\{\eta_n\}$ be a Markov chain with state space $S = \{e_1, \dots, e_M\}$, where the $e_i$ are unit vectors in $\mathbb R^M$ with unity as the $i$-th element and zeros elsewhere. Write $\mathcal F_n^0 = \sigma\{\eta_0, \dots, \eta_n\}$ for the σ-field generated by $\eta_0, \dots, \eta_n$, and $\{\mathcal F_n\}$ for the complete filtration generated by the $\mathcal F_n^0$; this augments $\mathcal F_n^0$ by including all subsets of events of probability zero. The Markov property implies here that $P(\eta_{n+1} = e_j \mid \mathcal F_n) = P(\eta_{n+1} = e_j \mid \eta_n)$. Write
$$\Pi = (p_{ji}) \in \mathbb R^{M \times M},$$
so that $E[\eta_{k+1} \mid \mathcal F_k] = E[\eta_{k+1} \mid \eta_k] = \Pi \eta_k$. From (2.4.3) we have the semimartingale representation
$$\eta_{n+1} = \Pi \eta_n + V_{n+1}. \qquad (4.2.2)$$
The Markov chain is a simple kind of stochastic process on $S$. However, an even simpler process is one in which $\eta$ is independently and uniformly distributed over its state space $S$ at each time $n$. This is modeled by supposing there is a probability measure $\bar P$ on $(\Omega, \mathcal F)$ such that at each time $n$, $\bar P(\eta_{n+1} = e_j \mid \eta_n = e_i) = 1/M$. Given such a simple process, and its probability measure $\bar P$, we shall construct a new probability $P$ so that under $P$, $\eta$ is a Markov chain with transition matrix $\Pi$. Recall that if $\Pi = (p_{ji})$ is a transition matrix, then $p_{ji} \geq 0$ and $\sum_{j=1}^M p_{ji} = 1$. Suppose $\Pi$ is any transition matrix and $\{\eta_n\}$, $n \geq 0$, is a process on the finite state space $S$ such that, under a probability $\bar P$,
$$\bar P(\eta_n = e_j \mid \eta_{n-1} = e_i) = \frac{1}{M}.$$
That is, the probability distribution of $\eta$ is independent and uniform at each time $n$.
Lemma 4.2.8  Define
$$\bar\lambda_\ell = M \prod_{j=1}^M \langle \Pi \eta_{\ell-1}, e_j \rangle^{\langle \eta_\ell, e_j \rangle}, \qquad \bar\Lambda_n = \prod_{\ell=1}^n \bar\lambda_\ell.$$
A new probability measure $P$ is defined by putting $\dfrac{dP}{d\bar P}\Big|_{\mathcal F_n} = \bar\Lambda_n$, and under $P$, $\eta$ is a Markov chain with transition matrix $\Pi$.

Proof  Note first that
$$\bar E[\bar\lambda_\ell \mid \mathcal F_{\ell-1}] = M\, \bar E\Big[ \prod_{j=1}^M \langle \Pi \eta_{\ell-1}, e_j \rangle^{\langle \eta_\ell, e_j \rangle} \,\Big|\, \mathcal F_{\ell-1}\Big] = M \sum_{j=1}^M \frac{1}{M}\, \langle \Pi \eta_{\ell-1}, e_j \rangle = \sum_{i=1}^M \sum_{j=1}^M \langle \eta_{\ell-1}, e_i \rangle\, p_{ji} = 1.$$
Then, using Bayes' Theorem 4.1.1,
$$P(\eta_n = e_j \mid \mathcal F_{n-1}) = E[\langle \eta_n, e_j \rangle \mid \mathcal F_{n-1}] = \frac{\bar E[\langle \eta_n, e_j \rangle\, \bar\Lambda_n \mid \mathcal F_{n-1}]}{\bar E[\bar\Lambda_n \mid \mathcal F_{n-1}]}.$$
Because $\bar\Lambda_n = \bar\Lambda_{n-1} \bar\lambda_n$ and $\bar\Lambda_{n-1}$ is $\mathcal F_{n-1}$-measurable this is
$$\frac{\bar E[\langle \eta_n, e_j \rangle\, \bar\lambda_n \mid \mathcal F_{n-1}]}{\bar E[\bar\lambda_n \mid \mathcal F_{n-1}]} = M\, \bar E[\langle \Pi \eta_{n-1}, e_j \rangle \langle \eta_n, e_j \rangle \mid \mathcal F_{n-1}] = \langle \Pi \eta_{n-1}, e_j \rangle,$$
and, as this depends only on $\eta_{n-1}$, it equals $P(\eta_n = e_j \mid \eta_{n-1})$. If $\eta_{n-1} = e_i$ we see that $P(\eta_n = e_j \mid \eta_{n-1} = e_i) = p_{ji}$, and so, under $P$, $\eta$ is a Markov chain with transition matrix $\Pi$.

Example 4.2.9  In this example we discuss the filtering of a partially observed discrete-time, finite-state Markov chain; that is, the Markov chain is not observed directly. Rather, there is a discrete-time, finite-state observation process $\{Y_k\}$, $k \in \mathbb N$, which is a "noisy" function of the chain. All processes are defined initially on a probability space $(\Omega, \mathcal F, P)$; below a new probability measure $\bar P$ is defined. A system is considered whose state is described by a finite-state, homogeneous, discrete-time Markov chain $X_k$, $k \in \mathbb N$. We suppose $X_0$ is given, or its distribution known. If the state space of $X_k$ has $N$ elements it can be identified, without loss of generality, with the set $S_X = \{e_1, \dots, e_N\}$, where the $e_i$ are unit vectors in $\mathbb R^N$ with unity as the $i$-th element and zeros elsewhere.
Write $\mathcal F_k = \sigma\{X_0, \dots, X_k\}$ for the complete filtration generated by $X_0, \dots, X_k$. The Markov property implies here that $P(X_{k+1} = e_j \mid \mathcal F_k) = P(X_{k+1} = e_j \mid X_k)$. Write
$$a_{ji} = P(X_{k+1} = e_j \mid X_k = e_i), \qquad A = (a_{ji}) \in \mathbb R^{N \times N}, \qquad (4.2.3)$$
so that $E[X_{k+1} \mid \mathcal F_k] = E[X_{k+1} \mid X_k] = A X_k$ and $X_{k+1} = A X_k + V_{k+1}$.

The state process $X$ is not observed directly. We suppose there is a function $c(\cdot, \cdot)$ with finite range and we observe the values
$$Y_{k+1} = c(X_k, w_{k+1}), \qquad k \in \mathbb N, \qquad (4.2.4)$$
where the $w_k$ are a sequence of independent, identically distributed (i.i.d.) random variables. We shall write $\{\mathcal G_k\}$ for the complete filtration generated by $X$ and $Y$, and $\{\mathcal Y_k\}$ for the complete filtration generated by $Y$. Suppose the range of $c(\cdot, \cdot)$ consists of $M$ points. Then we can identify the range of $c(\cdot, \cdot)$ with the set of unit vectors $S_Y = \{f_1, \dots, f_M\}$, $f_j = (0, \dots, 1, \dots, 0)' \in \mathbb R^M$, where the unit entry is the $j$-th element. Now (4.2.4) implies $P(Y_{k+1} = f_j \mid \mathcal G_k) = P(Y_{k+1} = f_j \mid X_k)$. Write
$$C = (c_{ji}) \in \mathbb R^{M \times N}, \qquad c_{ji} = P(Y_{k+1} = f_j \mid X_k = e_i), \qquad (4.2.5)$$
so that $\sum_{j=1}^M c_{ji} = 1$ and $c_{ji} \geq 0$, $1 \leq j \leq M$, $1 \leq i \leq N$. Note that, for simplicity, we assume the $c_{ji}$ are independent of $k$. We have, therefore, $E[Y_{k+1} \mid X_k] = C X_k$. If $W_{k+1} := Y_{k+1} - C X_k$ then, taking the conditional expectation and noting $E[C X_k \mid X_k] = C X_k$, we have
$$E[W_{k+1} \mid \mathcal G_k] = E[Y_{k+1} - C X_k \mid X_k] = C X_k - C X_k = 0,$$
so $W_k$ is a $(P, \mathcal G_k)$ martingale increment and $Y_{k+1} = C X_k + W_{k+1}$.

Write $Y_k^i = \langle Y_k, f_i \rangle$, so $Y_k = (Y_k^1, \dots, Y_k^M)'$, $k \in \mathbb N$. For each $k \in \mathbb N$, exactly one component is equal to 1, the remainder being 0; note $\sum_{i=1}^M Y_k^i = 1$. Write
$$c_{k+1}^i = E[Y_{k+1}^i \mid \mathcal G_k] = \sum_{j=1}^N c_{ij}\, \langle e_j, X_k \rangle,$$
and $c_{k+1} = (c_{k+1}^1, \dots, c_{k+1}^M)'$. Then $c_{k+1} = E[Y_{k+1} \mid \mathcal G_k] = C X_k$. We shall suppose initially that $c_k^i > 0$, $1 \leq i \leq M$, $k \in \mathbb N$. (See, however, Remark 4.2.12.) Note $\sum_{i=1}^M c_k^i = 1$, $k \in \mathbb N$.
In summary then, we have under $P$,
$$X_{k+1} = A X_k + V_{k+1}, \qquad (4.2.6)$$
$$Y_{k+1} = C X_k + W_{k+1}, \qquad k \in \mathbb N, \qquad (4.2.7)$$
where $X_k \in S_X$, $Y_k \in S_Y$, and $A$ and $C$ are the matrices of transition probabilities given in (4.2.3), (4.2.5). The entries satisfy
$$\sum_{j=1}^N a_{ji} = 1, \quad a_{ji} \geq 0, \qquad (4.2.8)$$
$$\sum_{j=1}^M c_{ji} = 1, \quad c_{ji} \geq 0. \qquad (4.2.9)$$
We assume, for this measure change, $c_\ell^i > 0$, $1 \leq i \leq M$, $\ell \in \mathbb N$. This assumption says, in effect, that given any $\mathcal G_k$, the observation noise is such that there is a nonzero probability that $Y_{k+1}^i > 0$ for each $i$. This assumption is later relaxed to achieve the main results of this section. Define
$$\lambda_\ell = \prod_{i=1}^M \Big( \frac{M^{-1}}{c_\ell^i} \Big)^{\langle Y_\ell, f_i \rangle}, \qquad \Lambda_k = \prod_{\ell=1}^k \lambda_\ell.$$

Lemma 4.2.10  With the above definitions, $E[\lambda_{k+1} \mid \mathcal G_k] = 1$.

Proof
$$E[\lambda_{k+1} \mid \mathcal G_k] = \sum_{i=1}^M \frac{1}{M c_{k+1}^i}\, P(Y_{k+1}^i = 1 \mid \mathcal G_k) = \sum_{i=1}^M \frac{1}{M c_{k+1}^i} \cdot c_{k+1}^i = 1.$$

We now define a new probability measure $\bar P$ on $\Big(\Omega, \bigvee_{\ell=1}^\infty \mathcal G_\ell\Big)$ by putting the restriction of the Radon–Nikodym derivative $\dfrac{d\bar P}{dP}$ to the σ-field $\mathcal G_k$ equal to $\Lambda_k$. Thus $\dfrac{d\bar P}{dP}\Big|_{\mathcal G_k} = \Lambda_k$. This means that, for any set $B \in \mathcal G_k$,
$$\bar P(B) = \int_B \Lambda_k\, dP.$$
Equivalently, for any $\mathcal G_k$-measurable random variable $\phi$,
$$\bar E[\phi] = \int_\Omega \phi\, d\bar P = \int_\Omega \phi\, \frac{d\bar P}{dP}\, dP = \int_\Omega \phi\, \Lambda_k\, dP = E[\Lambda_k \phi],$$
where $E$ and $\bar E$ denote expectations under $P$ and $\bar P$, respectively.

Lemma 4.2.11  Under $\bar P$, $\{Y_k\}$, $k \in \mathbb N$, is a sequence of i.i.d. random variables, each having the uniform distribution which assigns probability $1/M$ to each point $f_i$, $1 \leq i \leq M$, in its range space.
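Before turning to the proof, the claim can be checked by enumeration on a tiny hypothetical model (the matrices $A$ and $C$ below are invented for the check, with $M = 3$ observation values): reweighting the two-step law by $\Lambda_2$ preserves total mass 1 and makes $Y_1$ uniform.

```python
from itertools import product

# Two-state chain, three-point observation alphabet (all values illustrative).
A = [[0.7, 0.4],
     [0.3, 0.6]]          # A[j][i] = P(X_{k+1} = e_j | X_k = e_i)
C = [[0.5, 0.2],
     [0.3, 0.3],
     [0.2, 0.5]]          # C[j][i] = P(Y_{k+1} = f_j | X_k = e_i), all > 0
M = 3
x0 = 0                    # X_0 = e_1, fixed

mean_lam = 0.0            # E[Lambda_2]
pbar_y1 = [0.0] * M       # P-bar(Y_1 = f_j) = E[Lambda_2 I(Y_1 = f_j)]
for x1, y1, y2 in product(range(2), range(M), range(M)):
    prob = C[y1][x0] * A[x1][x0] * C[y2][x1]            # joint law under P
    lam = (1.0 / (M * C[y1][x0])) * (1.0 / (M * C[y2][x1]))
    mean_lam += prob * lam
    pbar_y1[y1] += prob * lam
```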
Proof  Using Lemma 4.2.10 and Bayes' Theorem 4.1.1 we have
$$\bar P(Y_{k+1}^j = 1 \mid \mathcal G_k) = \bar E[\langle Y_{k+1}, f_j \rangle \mid \mathcal G_k] = \frac{E[\Lambda_{k+1} \langle Y_{k+1}, f_j \rangle \mid \mathcal G_k]}{E[\Lambda_{k+1} \mid \mathcal G_k]} = \frac{\Lambda_k\, E[\lambda_{k+1} \langle Y_{k+1}, f_j \rangle \mid \mathcal G_k]}{\Lambda_k\, E[\lambda_{k+1} \mid \mathcal G_k]} = E[\lambda_{k+1} \langle Y_{k+1}, f_j \rangle \mid \mathcal G_k]$$
$$= E\Big[ \prod_{i=1}^M \Big( \frac{1}{M c_{k+1}^i} \Big)^{\langle Y_{k+1}, f_i \rangle} \langle Y_{k+1}, f_j \rangle \,\Big|\, \mathcal G_k \Big] = \frac{1}{M c_{k+1}^j}\, E[Y_{k+1}^j \mid \mathcal G_k] = \frac{1}{M c_{k+1}^j}\, c_{k+1}^j = \frac{1}{M},$$
a quantity independent of $\mathcal G_k$, which finishes the proof.

Note that
$$\bar E[X_{k+1} \mid \mathcal G_k] = \frac{E[\Lambda_{k+1} X_{k+1} \mid \mathcal G_k]}{E[\Lambda_{k+1} \mid \mathcal G_k]} = E[\lambda_{k+1} X_{k+1} \mid \mathcal G_k] = A X_k,$$
so that under $\bar P$, $X$ remains a Markov chain with transition matrix $A$.

A reverse measure change

What we wish to do now is start with a probability measure $\bar P$ on $\Big(\Omega, \bigvee_{n=1}^\infty \mathcal G_n\Big)$ such that

1. the process $X$ is a finite-state Markov chain with transition matrix $A$, and
2. $\{Y_k\}$, $k \in \mathbb N$, is a sequence of i.i.d. random variables with $\bar P(Y_{k+1}^j = 1 \mid \mathcal G_k) = \bar P(Y_{k+1}^j = 1) = 1/M$.

Suppose $C = (c_{ji})$, $1 \leq j \leq M$, $1 \leq i \leq N$, is a matrix such that $c_{ji} \geq 0$ and $\sum_{j=1}^M c_{ji} = 1$. We shall now construct a new measure $P$ on $\Big(\Omega, \bigvee_{n=1}^\infty \mathcal G_n\Big)$ such that under $P$, (4.2.7) still holds and $E[Y_{k+1} \mid \mathcal G_k] = C X_k$. We again write
$$c_{k+1} = C X_k, \qquad c_{k+1}^i = \langle c_{k+1}, f_i \rangle = \langle C X_k, f_i \rangle,$$
so that $\sum_{i=1}^M c_{k+1}^i = 1$.

Remark 4.2.12  We do not divide by the $c_k^i$ in the construction of $P$ from $\bar P$. Therefore, we no longer require the $c_k^i$ to be strictly positive. The construction of $P$ from $\bar P$ is inverse to that of $\bar P$ from $P$. Write
$$\bar\lambda_\ell = M \prod_{i=1}^M (c_\ell^i)^{\langle Y_\ell, f_i \rangle}, \qquad \bar\Lambda_k = \prod_{\ell=1}^k \bar\lambda_\ell.$$
Lemma 4.2.13  With the above definitions, $\bar E[\bar\lambda_{k+1} \mid \mathcal G_k] = 1$.

Proof  Following the proof of Lemma 4.2.10,
$$\bar E[\bar\lambda_{k+1} \mid \mathcal G_k] = M \sum_{i=1}^M c_{k+1}^i\, \bar P(Y_{k+1}^i = 1 \mid \mathcal G_k) = M \sum_{i=1}^M c_{k+1}^i\, \frac{1}{M} = \sum_{i=1}^M c_{k+1}^i = 1.$$

This time set $\dfrac{dP}{d\bar P}\Big|_{\mathcal G_k} = \bar\Lambda_k$. (The existence of $P$ follows from Kolmogorov's Extension Theorem.)

Lemma 4.2.14  Under $P$, $E[Y_{k+1} \mid \mathcal G_k] = C X_k$.

Proof  The proof is left as an exercise.
Write $q_k(e_r)$, $1 \leq r \leq N$, $k \in \mathbb N$, for the unnormalized conditional probability distribution such that
$$\bar E[\bar\Lambda_k \langle X_k, e_r \rangle \mid \mathcal Y_k] = q_k(e_r).$$
Now $\sum_{i=1}^N \langle X_k, e_i \rangle = 1$, so
$$\sum_{i=1}^N q_k(e_i) = \bar E\Big[ \bar\Lambda_k \sum_{i=1}^N \langle X_k, e_i \rangle \,\Big|\, \mathcal Y_k \Big] = \bar E[\bar\Lambda_k \mid \mathcal Y_k].$$
Therefore, the normalized conditional probability distribution $p_k(e_r) = E[\langle X_k, e_r \rangle \mid \mathcal Y_k]$ is given by
$$p_k(e_r) = \frac{q_k(e_r)}{\sum_{j=1}^N q_k(e_j)}.$$
Theorem 4.2.15  For $k \in \mathbb N$ and $1 \leq r \leq N$, we have the recursive estimate
$$q_{k+1}(e_r) = M \sum_{j=1}^N q_k(e_j)\, a_{rj} \prod_{i=1}^M (c_{ij})^{Y_{k+1}^i}.$$
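Before the proof, a minimal numerical check of this recursion on a hypothetical two-state, two-output model (all matrix entries and the observation record are invented for the check): the normalized output of the recursion is compared with the conditional distribution computed by brute-force path enumeration.

```python
from itertools import product

A = [[0.8, 0.3],
     [0.2, 0.7]]                     # A[r][j] = P(X_{k+1}=e_r | X_k=e_j)
C = [[0.6, 0.1],
     [0.4, 0.9]]                     # C[i][j] = P(Y_{k+1}=f_i | X_k=e_j)
M, N = 2, 2
p0 = [0.5, 0.5]                      # law of X_0; q_0 = p0 since Lambda_0 = 1

def filter_update(q, y):
    # One step of q_{k+1}(e_r) = M * sum_j q_k(e_j) a_{rj} c_{y j}.
    return [M * sum(q[j] * A[r][j] * C[y][j] for j in range(N))
            for r in range(N)]

ys = [0, 1]                          # a fixed observation record
q = p0[:]
for y in ys:
    q = filter_update(q, y)
p_filter = [qr / sum(q) for qr in q]  # normalized conditional law of X_2

# Brute-force P(X_2 = e_r | Y_1, Y_2) by summing over all state paths.
num = [0.0] * N
for x0, x1, x2 in product(range(N), repeat=3):
    w = p0[x0] * C[ys[0]][x0] * A[x1][x0] * C[ys[1]][x1] * A[x2][x1]
    num[x2] += w
p_exact = [n / sum(num) for n in num]
```

The constant factor $M$ cancels under normalization, so it does not affect $p_k$; it only keeps $q_k$ on the unnormalized scale used in the theorem.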
Proof  Using the independence assumptions under $\bar P$ and the fact that $\sum_{j=1}^N \langle X_k, e_j \rangle = 1$, we have
$$q_{k+1}(e_r) = \bar E[\bar\Lambda_{k+1} \langle X_{k+1}, e_r \rangle \mid \mathcal Y_{k+1}] = \bar E[\bar\Lambda_k \bar\lambda_{k+1} \langle A X_k + V_{k+1}, e_r \rangle \mid \mathcal Y_{k+1}]$$
$$= M \sum_{j=1}^N \bar E[\bar\Lambda_k \langle X_k, e_j \rangle\, a_{rj} \mid \mathcal Y_k] \prod_{i=1}^M (c_{ij})^{Y_{k+1}^i} = M \sum_{j=1}^N q_k(e_j)\, a_{rj} \prod_{i=1}^M (c_{ij})^{Y_{k+1}^i},$$
and the result follows.

Example 4.2.16 (Change of measure for linear systems)  Consider a system whose state at times $k = 1, 2, \dots$ is $x_k \in \mathbb R$. Let $(\Omega, \mathcal F, P)$ be a probability space upon which $\{v_k\}$, $k \in \mathbb N$, is a sequence of i.i.d. $N(0, 1)$ Gaussian random variables, having zero means and unit variances. Let $\{\mathcal F_k\}$, $k \in \mathbb N$, be the complete filtration (that is, $\mathcal F_0$ contains all the $P$-null events) generated by $\{x_0, x_1, \dots, x_k\}$. The state of the system satisfies the linear dynamics
$$x_{k+1} = a x_k + b v_{k+1}.$$
(4.2.10)
Note that $E[v_{k+1} \mid \mathcal F_k] = 0$. Initially we suppose all processes are defined on an "ideal" probability space $(\Omega, \mathcal F, \bar P)$; then under a new probability measure $P$, to be defined, the model dynamics (4.2.10) will hold. Suppose that under $\bar P$, $\{x_k\}$, $k \in \mathbb N$, is an i.i.d. $N(0, 1)$ sequence with density function $\phi$. For each $l = 0, 1, 2, \dots$ define
$$\lambda_l = \frac{\phi(b^{-1}(x_l - a x_{l-1}))}{b\, \phi(x_l)}, \qquad \Lambda_k = \prod_{l=0}^k \lambda_l.$$
Lemma 4.2.17  The process $\{\Lambda_k\}$, $k \in \mathbb N$, is a $\bar P$-martingale with respect to the filtration $\{\mathcal F_k\}$.

Proof  Since $\Lambda_k$ is $\mathcal F_k$-measurable,
$$\bar E[\Lambda_{k+1} \mid \mathcal F_k] = \Lambda_k\, \bar E[\lambda_{k+1} \mid \mathcal F_k],$$
so it is enough to show that $\bar E[\lambda_{k+1} \mid \mathcal F_k] = 1$:
$$\bar E[\lambda_{k+1} \mid \mathcal F_k] = \bar E\Big[ \frac{\phi(b^{-1}(x_{k+1} - a x_k))}{b\, \phi(x_{k+1})} \,\Big|\, \mathcal F_k \Big] = \int_{\mathbb R} \frac{\phi(b^{-1}(x - a x_k))}{b\, \phi(x)}\, \phi(x)\, dx.$$
Using the change of variable $u = b^{-1}(x - a x_k)$, this becomes $\int_{\mathbb R} \phi(u)\, du = 1$, and the result follows.

Define $P$ on $(\Omega, \mathcal F)$ by setting the restriction of the Radon–Nikodym derivative $\dfrac{dP}{d\bar P}$ to $\mathcal F_k$ equal to $\Lambda_k$. Then:
Lemma 4.2.18  On $(\Omega, \mathcal F)$ and under $P$, $\{v_k\}$, $k \in \mathbb N$, is a sequence of i.i.d. $N(0, 1)$ random variables, where
$$v_{k+1} = b^{-1}(x_{k+1} - a x_k).$$

Proof  Suppose $f : \mathbb R \to \mathbb R$ is a "test" function (i.e. a measurable function with compact support). Then, with $E$ (resp. $\bar E$) denoting expectation under $P$ (resp. $\bar P$) and using Bayes' Theorem 4.1.1,
$$E[f(v_{k+1}) \mid \mathcal F_k] = \frac{\bar E[\Lambda_{k+1} f(v_{k+1}) \mid \mathcal F_k]}{\bar E[\Lambda_{k+1} \mid \mathcal F_k]} = \bar E[\lambda_{k+1} f(v_{k+1}) \mid \mathcal F_k],$$
where the last equality follows from Lemma 4.2.17. Consequently
$$E[f(v_{k+1}) \mid \mathcal F_k] = \bar E\Big[ \frac{\phi(b^{-1}(x_{k+1} - a x_k))}{b\, \phi(x_{k+1})}\, f(b^{-1}(x_{k+1} - a x_k)) \,\Big|\, \mathcal F_k \Big].$$
Using the independence assumption under $\bar P$ this is
$$\int_{\mathbb R} \frac{\phi(b^{-1}(x - a x_k))}{b\, \phi(x)}\, f(b^{-1}(x - a x_k))\, \phi(x)\, dx = \int_{\mathbb R} \phi(u)\, f(u)\, du,$$
and the lemma is proved.
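The key step in Lemmas 4.2.17 and 4.2.18 is the integral identity $\int \phi(b^{-1}(x - a x_k)) / (b\,\phi(x)) \cdot \phi(x)\, dx = 1$: the $\phi(x)$ factors cancel, leaving a shifted and scaled normal density. A quadrature sketch, with $a$, $b$ and the conditioning value $x_k$ chosen arbitrarily for the check:

```python
import math

def phi(u):
    # Standard normal density.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

a, b = 0.9, 0.5          # illustrative model parameters
x_prev = 1.3             # a fixed value of x_k

# Midpoint rule over a_x_prev +/- 8b captures essentially all the mass of
# the N(a*x_prev, b^2) density phi(b^{-1}(x - a x_prev)) / b.
lo = a * x_prev - 8.0 * b
hi = a * x_prev + 8.0 * b
n = 20000
h = (hi - lo) / n
integral = sum(phi(((lo + (i + 0.5) * h) - a * x_prev) / b) / b * h
               for i in range(n))
```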
4.3 Girsanov's Theorem

In this section we investigate how martingales, and in particular Brownian motion, are changed when a new, absolutely continuous probability measure is introduced. We need first the following results.

Theorem 4.3.1  Suppose $(\Omega, \mathcal F, P)$ is a probability space with a filtration $\{\mathcal F_t, t \geq 0\}$. Suppose $\bar P$ is another probability measure equivalent to $P$ ($\bar P \ll P$ and $P \ll \bar P$) and with Radon–Nikodym derivative
$$\frac{d\bar P}{dP} = \Lambda.$$
Define the martingale $\Lambda_t = E[\Lambda \mid \mathcal F_t]$. Then:

1. $\{X_t \Lambda_t\}$ is a local martingale under $P$ if and only if $\{X_t\}$ is a local martingale under $\bar P$.
2. Every $P$-semimartingale is a $\bar P$-semimartingale.

Proof  1. We prove the result for martingales; the extension to local martingales can be found in Proposition 3.3.8 of Jacod and Shiryayev [19]. Let $\{X_t\}$ be a $\bar P$-martingale and $F \in \mathcal F_s$, $s \leq t$. We have
$$\int_F X_t\, d\bar P = \int_F X_s\, d\bar P = \int_F X_s \Lambda_s\, dP,$$
and
$$\int_F X_t\, d\bar P = \int_F X_t \Lambda_t\, dP,$$
that is,
$$\int_F X_t \Lambda_t\, dP = \int_F X_s \Lambda_s\, dP.$$
Hence $\{X_t \Lambda_t\}$ is a $P$-martingale. The proof of the converse is identical.

2. By definition, a semimartingale is the sum of a local martingale and a process of finite variation. We need only prove the theorem in one direction and we can suppose $X_0 = 0$. If $\{X_t\}$ is a semimartingale under $P$, then by the product rule $\{X_t \Lambda_t\}$ is a semimartingale under $P$, which has a decomposition
$$X_t \Lambda_t = N_t + V_t,$$
where $N$ is a local martingale and $V$ is a process of finite variation. Therefore
$$X_t = N_t \Lambda_t^{-1} + V_t \Lambda_t^{-1},$$
since, by the equivalence of $P$ and $\bar P$, $\Lambda_t^{-1}$ exists and is a $\bar P$-martingale. By the first part of this theorem, $N_t \Lambda_t^{-1}$ is a local martingale under $\bar P$, and the second term is the product of the $P$-semimartingale $V$ of finite variation and the $\bar P$-martingale $\Lambda_t^{-1}$.

Theorem 4.3.2  Suppose $\Lambda_t$ and $\bar P$ are as in Theorem 4.3.1 above, and suppose $\{X_t\}$ is a local martingale under $P$ with $X_0 = 0$.

(i) $\{X_t\}$ is a special semimartingale under $\bar P$ if the process $\{\langle X, \Lambda \rangle_t\}$ exists, and then, under $\bar P$,
$$X_t = \Big( X_t - \int_0^t \Lambda_{s-}^{-1}\, d\langle X, \Lambda \rangle_s \Big) + \int_0^t \Lambda_{s-}^{-1}\, d\langle X, \Lambda \rangle_s.$$
Here, the first term is a local martingale under $\bar P$, and the second is a predictable process of finite variation.

(ii) In general, the process
$$X_t - \int_0^t \Lambda_{s-}^{-1}\, d[X, \Lambda]_s$$
is a local martingale under $\bar P$.

Proof  See [11], page 162.
The following important theorem is an extension of the following rather simple situation. Let $X_1, \dots, X_n$ be i.i.d. normal random variables, with mean $E(X_i) = 0$ and variance $E(X_i^2) = \sigma^2 \neq 0$ under probability measure $P$, and with mean $E(X_i) = \mu_i$ and variance $\sigma^2 \neq 0$ under probability measure $P^\mu$. Then it is clear that $P^\mu \ll P$ (and $P \ll P^\mu$) and that
$$\frac{dP^\mu}{dP}(\omega) = \exp\Big( \frac{1}{\sigma^2} \sum_{i=1}^n \mu_i X_i(\omega) - \frac{1}{2\sigma^2} \sum_{i=1}^n \mu_i^2 \Big).$$

Theorem 4.3.3 (Girsanov)  Suppose $B_t$, $t \in [0, T]$, is an $m$-dimensional Brownian motion on a filtered space $(\Omega, \mathcal F, \mathcal F_t, P)$. Let $f = (f^1, \dots, f^m) : \Omega \times [0, T] \to \mathbb R^m$ be a predictable process such that
$$\int_0^T |f_t|^2\, dt < \infty \quad a.s.$$
Write
$$\Lambda_t(f) = \exp\Big( \sum_{i=1}^m \int_0^t f_s^i\, dB_s^i - \frac{1}{2} \int_0^t |f_s|^2\, ds \Big),$$
and suppose $E[\Lambda_T(f)] = 1$ (which holds if Novikov's condition $E\big[ e^{\frac{1}{2} \int_0^T |f_t|^2 dt} \big] < \infty$ holds; see [11]). If $P^f$ is the probability measure on $(\Omega, \mathcal F)$ defined by $\dfrac{dP^f}{dP} = \Lambda_T(f)$, then $W_t$ is an $m$-dimensional Brownian motion on $(\Omega, \mathcal F, \mathcal F_t, P^f)$, where
$$W_t^i = B_t^i - \int_0^t f_s^i\, ds. \qquad (4.3.1)$$
Proof  We prove here the scalar case. To show $W$ is a standard Brownian motion we verify the conditions of Theorem 2.7.1. That is, we show that (i) it is continuous a.s., (ii) it is a (local) martingale, and (iii) $\{W_t^2 - t,\ t \geq 0\}$ is a (local) martingale. By definition $W$ is a continuous process a.s. ($B_t$ is continuous a.s. and an indefinite integral is a continuous process.) For (ii) we must show $W$ is a local $(\mathcal F_t)$-martingale under measure $P^f$. Equivalently, from Theorem 4.3.1 we must show that $\{\Lambda_t W_t\}$ is a local martingale under $P$. Using the Itô rule we see, as in Example 3.6.11, that
$$\Lambda_t(f) = 1 + \int_0^t \Lambda_s(f)\, f_s\, dB_s. \qquad (4.3.2)$$
Applying the Itô product rule to (4.3.2) and $W$,
$$\Lambda_t W_t = W_0 + \int_0^t \Lambda_s\, dW_s + \int_0^t W_s\, d\Lambda_s + \langle \Lambda, W \rangle_t$$
$$= W_0 + \int_0^t \Lambda_s\, dB_s - \int_0^t \Lambda_s f_s\, ds + \int_0^t W_s \Lambda_s f_s\, dB_s + \int_0^t \Lambda_s f_s\, ds$$
$$= W_0 + \int_0^t \Lambda_s (1 + W_s f_s)\, dB_s,$$
and, as a stochastic integral with respect to $B$, $\{\Lambda_t W_t,\ t \geq 0\}$ is a (local) martingale under $P$. Property (iii) is established similarly:
$$W_t^2 = 2 \int_0^t W_s\, dW_s + \langle W, W \rangle_t = 2 \int_0^t W_s\, dW_s + t,$$
or
$$W_t^2 - t = 2 \int_0^t W_s\, dW_s,$$
which, from (ii), is a (local) martingale under $P^f$, and the result follows.
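For the simplest drift, $f_t \equiv \mu$, the theorem can be checked by one-dimensional quadrature, since $\Lambda_T(f) = \exp(\mu B_T - \mu^2 T/2)$ then depends only on $B_T \sim N(0, T)$ under $P$. The values of $\mu$ and $T$ below are arbitrary; the check recovers $E[\Lambda_T] = 1$ and the drifted mean $E^f[B_T] = \mu T$.

```python
import math

mu, T = 0.7, 2.0         # constant drift f_t = mu and horizon; illustrative

# Midpoint quadrature over the effective support of the tilted density,
# which is N(mu*T, T).
s = math.sqrt(T)
lo, hi, n = mu * T - 10.0 * s, mu * T + 10.0 * s, 40000
h = (hi - lo) / n
mean_lam = 0.0           # E[Lambda_T(f)]            -> 1
mean_bt_f = 0.0          # E^f[B_T] = E[Lambda_T B_T] -> mu * T
for i in range(n):
    x = lo + (i + 0.5) * h
    pdf = math.exp(-0.5 * x * x / T) / math.sqrt(2.0 * math.pi * T)
    w = math.exp(mu * x - 0.5 * mu * mu * T) * pdf * h
    mean_lam += w
    mean_bt_f += x * w
```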
Example 4.3.4 As a simple application of Girsanov’s theorem, let us derive the distribution of the first passage time, α = inf{t, Bt = b}, for Brownian motion with drift to a level b ∈ IR (see Example 2.2.5). Suppose that under probability measure P, {Bt , FtB } is a standard Brownian motion. Write 1 t = exp µBt − µ2 t , 2 and set dP µ = t . dP µ
Using Girsanov’s theorem, the process Bt = Bt − µt is a standard Brownian motion unµ der probability measure P µ . That is, under probability measure P µ , Bt = µt + Bt is a Brownian motion with drift µt.
4.3 Girsanov’s theorem
149
Now P µ (α ≤ t) = E µ [I (α ≤ t)] = E[t I (α ≤ t)] = E[I (α ≤ t)E[t | Fα ]] (see (2.2.2) and (2.2.3) for the definition of Fα ) = E[I (α ≤ t)α ]
= E[I (α ≤ t) exp µb − t |b| = exp µb − √ 2πs 3 0
1 2 µ α ] 2 1 2 µ s − b/2s ds. 2
See Problem 10, Chapter 2 for the density function of α under P.
Remark 4.3.5 Equation (4.3.1) is equivalent to saying that the original Brownian motion process {Bt } is a weak solution of the stochastic differential equation dX t = f (t, ω)dt + dB t ,
X 0 = 0,
where {B t } is a Brownian motion. That is, we have constructed a probability measure P on (, F) and a new Brownian motion process {B t } such that dBt = f (t, ω)dt + dB t . Remark 4.3.6 Let X t be a special semimartingale; then (see Example 3.6.11) t t = 1 + s− dX s ,
(4.3.3)
0
has the unique solution (0 = 1) 1 c c t = e X t − 2 X , X t s≤t (1 + X s )e− X s ,
which is called the stochastic exponential of the semimartingale {X t }. If t is a uniformly integrable positive martingale then ∞ = limt→∞ t exists and E[∞ | Ft ] = t
(a.s.).
Consequently, E[∞ ] = E[0 ] = 1, so that a new probability measure P can be defined on (, F) by putting dP = ∞ . dP P is equivalent to P if and only if ∞ > 0 a.s. More precisely, we have the following form of Girsanov’s theorem. (See [11] page 165.)
150
Change of measures
Theorem 4.3.7 Suppose the exponential t and P are as mentioned in (4.3.3) and Remark 4.3.6. If {Mt } is a local martingale under probability measure P, and the predictable covariation process { M, X t } exists under probability measure P, then M t = Mt − M, X t is a local martingale under probability measure P. Proof
First note that t plays the role of t in part (i) of Theorem 4.3.2. However, t t = 1 + s− dX s , 0
so
M, t =
t
s− d M, X s
0
and
t 0
−1 s− d M, s = M, X t .
That is, from part (i) of Theorem 4.3.2, M t = Mt − M, X t is a local martingale under probability measure P. More generally, we have the following result which is proven in [11]. Theorem 4.3.8 Suppose for a continuous local martingale {X t } the exponential t and P are as mentioned in Remark 4.3.6. Let {Mt } = {Mt1 , . . . , Mtm } be an IRm -valued continuous local martingale under prob1 m ability measure P. Then {M t } = {M t , . . . , M t } is a continuous local martingale under i probability measure P, where M t = Mti − M i , X t , and the predictable covariation under probability measure P of {M t } is equal to the predictable covariation under probability measure P of {Mt }, that is i
j
M , M tP = M i , M j tP . 4.4 The single jump process In this section we investigate Radon–Nikodym derivatives relating probability measures that describe when the jump happens and where it goes for a single jump process. Recall a few facts from Chapters 2 and 3. Consider a stochastic process {X t }, t ≥ 0, which takes its values in some measurable space {E, E} and which remains at its initial value z 0 ∈ E until a random time T , when it jumps to a random position Z . A sample path of the process is z 0 if t < T (ω), X t (ω) = Z (ω) if t ≥ T (ω).
4.4 The single jump process
151
The underlying probability space can be taken to be = [0, ∞] × E, with the σ -field B × E. A probability measure P is given on (, B × E). Write Ft = P[T > t, Z ∈ E], c = inf{t : Ft = 0} and d(t) = P(T ≤ t, Z ∈ E | T > t − ) =
−dFt Ft− ,
for the rate of the jump of the process X . Write FtA = P[T > t, Z ∈ A], then there is a Radon–Nikodym derivative λ(A, s) such that A A Ft − F0 = λ(A, s)dFs . ]0,t[
There is a bijection between probability measures P on (, B × E) and L´evy systems (λ, ). For A ∈ E define P(]0, t] × A) = − λ(A, s)dFs . ]0,t]
For t ≥ 0 define µ(t, A) = IT ≤t I Z ∈A . The predictable compensator of µ is given by dFsA µ p (t, A) = − . ]0,T ∧t] Fs− Write Ft for the completed σ -field generated by {X s }, s ≤ t, then q(t, A) = µ(t, A) − µ p (t, A) is an Ft -martingale. Suppose P is absolutely continuous with respect to P. Then there is a Radon–Nikodym dP derivative L = . Write L t = E[L | Ft ]. From Lemma 3.8.8, dP 1 L t = L(T, Z )I{T ≤t} + I{T >t} L(s, z)P(ds, dz). Ft ]t,∞] E However, the P(ds, dz)-integral is equivalent to
P(T > t, Z ∈ E) = F t , so that L t = L(T, Z )I{T ≤t} + I{T >t}
Ft . Ft
If we substitute the mean 0 martingale L t − 1 for Mt in Theorem 3.8.10 we have the stochastic integral representation Lt − 1 = I{s≤t} g(s, x)q(ds, dx),
where g(s, x) = L(s, x) − I{sc.
152
Change of measures
In order to use the exponential formula given in Example 3.6.11 we write t Lt = 1 + L s− dMs .
(4.4.1)
0
Here
Mt =
I{s≤t} g(s, x)L −1 s− q(ds, dx).
The unique solution of (4.4.1) is the stochastic exponential (L 0 = 1)
L t = e Mt (1 + Ms )e− Ms . s≤t
At the discontinuity of Fs ,
Ms = E
g(s, z)L −1 s− λ(dz, s)
and at the jump time T , MT = g(T, z)L −1 T− + Hence
E
Fs , Fs−
g(T, z)L −1 T − λ(dz, T )
FT . FT −
−1 L t = exp − I{s≤t} g(s, x)L s− dµ p FT −1 −1 × 1 + g(T, z)L T − I{T ≤t} + I{T ≥t} g(T, z)L T − λ(dz, T ) FT − E
F s × 1+ . g(s, z)L −1 s− λ(dz, s) Fs− E s≤t∧T,u=T
We can relate the L´evy system (λ, ) of probability measure P to that of probability measure P. This is given in the next theorem (see [11]). Theorem 4.4.1 Suppose (λ, ) is the L´evy system of probability measure P. Then dF-a.s.: Fs −1 1 + g(s, z)L −1 + g(s, z)L dλ dλ s− s− Fs− E A , λ(A, s) = Fs −1 1 + g(s, z)L −1 + g(s, z)L dλ dλ s− s− Fs− E E and
t = ]0,t]
Proof
E
1 + g(s, z)L −1 s− +
For t > 0 and A ∈ E, F¯tA = P(]t, ∞] × A) =
However,
Fs Fs−
E
g(s, z)L −1 s− λ(dz, s)ds .
LdP = − ]t,∞]×A
F¯tA = −
L(s, z)λ(dz, ds)dFs . ]t,∞]
A
λ(A, s)d F¯s = − ]t,∞]
λ(A, s) ]t,∞]
d F¯s dFs . dFs
4.4 The single jump process
so dFs -a.s.: λ(A, s)
d F¯s = dFs
L(s, z)λ(dz, ds) = A
153
¯ F¯s Fs− g(s, z)L −1 + λ(dz, ds). s− Fs− Fs A
¯ and if F¯c− ¯ Therefore, for s < c, ¯ = 0, for s ≤ c, Fs Fs d F¯s F¯s λ(dz, ds) d F¯s -a.s. λ(A, s) = g(s, z)L −1 + s− Fs− F¯s− dFs F¯s− A Fs F¯s −1 = 1+ g(s, z)L s− + 1 + λ(dz, ds). Fs− F¯s− A (4.4.2) ¯ and Now if s is a point of continuity of F then it is also a point of continuity of F, ¯ ¯ d Fs Fs Fs = F¯s = 0. If Fs = 0 then the Radon–Nikodym derivative = , and the dFs Fs left hand side above is Fs− (Fs− + F¯s ) F¯s F¯s 1+ . λ(A, s) = λ(A, s) Fs Fs F¯s F¯s− Evaluating (4.4.2) when A = E, so λ(E, s) = 1 = λ(E, s), F¯s Fs Fs −1 −1 1 + g(s, z)L s− + = g(s, z)L s− λ(dz, s), Fs− E Fs− F¯s− if Fs = 0, and we have Fs d F¯s = (1 + g(s, z)L −1 s− )λ(dz, s), F¯s− dFs E Fs ) = 0, Fs− Fs −1 1 + g(s, z)L −1 + g(s, z)L dλ dλ s− s− Fs− E λ(A, s) = A Fs 1 + g(s, z)L −1 g(s, z)L −1 s− + s− dλ dλ Fs− E E
if Fs = 0. Substituting in (4.4.2) we have if (1 +
¯ and for s ≤ c¯ if F¯c− d F¯s -a.s. for s < c, ¯ = 0. Now (1 + Fs /Fs− ) = 0 only if s = c, c < ∞ and Fc− = 0. This situation is only of interest here if also c¯ = c and F¯c− = 0. However, in this case it is easily seen that substituting g(c, z)L −1 c− =
Fc− L(c, z) F¯c−
in (4.4.2) gives the correct expression for λ(A, c) = λ(A, c), because L(c, z) = Now
t = − ]0,t]
d F¯s = F¯s
]0,t]
Fs d F¯s ds . F¯s− dFs
F¯c dλ . Fc dλ
154
Change of measures
If Ft is continuous at s, again F¯s = Fs = 0 and evaluating (4.4.2) for A = E, ds Fs d F¯s = = (1 + g(s, z)L −1 s− )λ(dz, s). ds F¯s dFs E That is
t = ]0,t]
E
1 + g(s, z)L −1 s− +
Fs Fs−
E
g(s, z)L −1 s− λ(dz, s)ds .
Notation 4.4.2 Denote by A the set of right-continuous, monotonic increasing (deterministic) functions t , t ≥ 0, such that (1) 0 = 0, (2) u = u − u− ≤ 1 for all points of discontinuity u, (3) if u = 1 then t = u for t ≥ u. Remark 4.4.3 If t ∈ A then t = ct + dt , where dt = s≤t s and ct is continuous. The decomposition is unique and both dt and ct are in A. If dt = 0 and ct is absolutely continuous with respect to Lebesgue measure, there is a measurable function rs such that t c t = rs ds. 0
The function rs is often called the “rate” of the jump process. Note that might equal +∞ for finite t.
Lemma 4.4.4 The formulae Ft = 1 − G t , Ft = exp(−ct )
(1 − u ),
(4.4.3)
u≤t
t = − ]0,t]
−1 Fs− dFs ,
(4.4.4)
define a bijection between the set A and the set of all probability distributions {G} on ]0, ∞]. Proof Clearly if t ∈ A then Ft , defined by (4.4.3), is monotonic decreasing, rightcontinuous, F0 = 0 and 0 ≤ Ft ≤ 1. Therefore G t = 1 − Ft is a probability distribution on ]0, ∞]. Conversely, if G t is a probability distribution, if Ft = 1 − G t and t is given by (4.4.4), then t is in A. From Example 3.6.11 (taking to be a single point), Ft defined by (4.4.3) is the unique solution of the equation dFt = −Ft− dt , This shows the correspondence is a bijection.
F0 = 1.
4.4 The single jump process
155
Lemma 4.4.5 Suppose t ∈ A is a second process whose associated Stieltjes measure dt is absolutely continuous with respect to dt , that is dt = αt . dt Then the associated F t has the form F t = Ft
(1 − α(s) d ) s
s≤t
(1 − ds )
t exp − (α(s) − 1)dcs , 0
where Ft is defined by (4.4.3). Furthermore, α(s) ds ≤ 1, and if α(s) ds = 1 then α(t) = 0 for t ≥ s. Proof
By hypothesis
t = 0
so from (4.4.3) c
F t = e−t
t
α(s)dcs +
α(s) ds ,
s≤t
(1 − u )
u≤t
t c = exp − α(s)ds (1 − α(s) ds ) 0
= Ft
u≤t
(1 − α(s) d ) s
s≤t
(1 − ds )
t c exp − (α(s) − 1)ds . 0
The conditions on α follow from Lemma 4.4.4 and the definition of A. If λ(., .) is such that (λ1) λ(A, s) ≥ 0 for A ∈ E, s > 0, (λ2) for each A ∈ E λ(A, .) is Borel measurable, (λ3) for all s ∈]0, c[, (except perhaps on a set of d-measure 0), λ(., s) is a probability measure on (E, E), and if c < ∞ and c− < ∞ then λ(., c) is a probability measure. Then: Lemma 4.4.6 There is a bijection between probability measures P on (, B × E) and L´evy systems (λ, ). Proof In Example 2.1.4 we saw how a L´evy system is determined by a measure P. Conversely, given a pair (λ, ), because ∈ A we can determine a function Ft by (4.4.3). For A ∈ E define P(]0, t] × A) = − λ(A, s)dFs . ]0,t]
Now the converse of theorem 4.4.1 is given. (Theorem 17.12 of [11].)
156
Change of measures
Theorem 4.4.7 Suppose P, P have L´evy systems (λ, ) and (λ, ). Write c = inf{t : F t = 0}, and suppose c ≤ c, dt d on ]0, c] and λ(., t) λ(., t) d-a.e. Then P P with Radon–Nikodym derivative t c L(t, z) = α(t)β(t, z) t− exp − (α(s) − 1)ds I{t≤c} .
(4.4.5)
0
Fs
1 + Fs− α(s) , Here t = Fs s≤t 1+ Fs−
dt dλt = α(t), and = β(t, z). dt dλt Proof
Define L(t, Z ) by (4.4.5) and write t η(t) = exp − (α(s) − 1)dcs . 0
β(t, z)dλ = 1 a.s.
Then, because E
E[L(t, Z )] = −
α(t)η(t) t− dFt . ¯ ]0,c]
From Lemma 4.4.5 and Equations (4.4.3) and (4.4.4), η(t) t− =
F¯t− . Ft−
As measures on [0, ∞], dt = so
d F¯t dFt = −α(t) = α(t)dt , ¯ Ft− Ft−
E[L(t, Z )] = −
α(t) F¯t− ¯ ]0,c]
dFt =− Ft−
d F¯t− F¯t− = F¯0 − F¯c¯ = 1. F¯t− ¯ ]0,c]
dP ∗ A probability measure P ∗ P can, therefore, be defined on (, B × E) by putting = dP L. For t < c we have L t = E[L | F] = L(T, Z )I{t≥T } + I{t
By similar calculations to those above the later term is F¯t Ft−1 α(s)η(s) s− dFs = = η(t) t , Ft ¯ ]t,c]
4.5 Change of parameter in Poisson processes
157
so L t = α(T )β(T, Z )η(T ) T − I{t≥T¯} + I{t
¯ for t < c. ¯ and for t = c¯ if c¯ = ∞ or F¯c− (2) φ(t, z) = 0 for t > c, ¯ = 0. ¯ z) = L(c, ¯ z) if c¯ < ∞ and F¯c− ¯ z) = (3) φ(c, = 0, that is, substituting, in this case φ(c, ¯ ¯ c, ¯ z). α(c)β( The L´evy system (λ∗ , ∗ ) associated with P ∗ is then defined by Fs 1+φ+ φdλ dλ Fs− E , λ∗ (A, s) = A Fs 1+φ+ φdλ dλ Fs− E E and ∗t
= ]0,t]
E
Fs 1+φ+ Fs−
φ λ(dz, s)ds . E
Substituting the above expression for φ we have (1 + ( Ft /Ft− )α(t)) φdλ = α(t)I{t≤c} , ¯ − (1 + Ft /Ft− ) E and (1 + ( Ft /Ft− ) φdλ = 1. E
The above expression gives d∗t = α(t), and dt
dλ∗t = β(t, z), dλt
so ∗t = , and
λ∗t = λ.
By Lemma 4.4.6, P = P ∗ P and the result is proved.
4.5 Change of parameter in Poisson processes Let Nt be a Poisson process with constant parameter λ on a filtered probability space (, F, Ft , P) and suppose that we wish to define a new probability P such that Nt is a
158
Change of measures
ˆ Define the stochastic process Poisson process with constant parameter λ. t =
Nt λˆ λ
e(λ−λ)t . ˆ
(4.5.1)
Lemma 4.5.1 The process (4.5.1) is a martingale under probability measure P. Proof Let 0 ≤ s ≤ t and recall that Nt is an independent increment process adapted to Ft so that Ns (Nt −Ns ) ˆλ ˆ λ ˆ ˆ E[t | Fs ] = e(λ−λ)s E e(λ−λ)(t−s) λ λ (Nt −Ns ) ˆ λ ˆ = s e(λ−λ)(t−s) E λ k λˆ (λ(t − s))k ˆ (λ−λ)(t−s) = s e e−λ(t−s) λ k! k = s , which finishes the proof.
Lemma 4.5.2 The exponential martingale {t } given by (4.5.1) is the unique solution of
t
t = 1 −
ˆ s− λ−1 (λ − λ)(dN s − λds).
(4.5.2)
0
Proof
Write t = e(λ−λ)t Yt , ˆ
where Yt =
Nt λˆ λ
(4.5.3)
, and t = f (t, Yt ). Using rule (3.6.9),
t ∂ f (s, Ys− ) ∂ f (s, Ys− ) ds + dYs ∂s ∂Y 0 0 ∂ f (s, Ys− ) f (s, Ys ) − f (s, Ys− ) − + Ys . ∂Y 0<s≤t
f (t, Yt ) = 1 +
t
(4.5.4)
Because Yt is a purely discontinuous process and of bounded variation, the second integral ˆ in (4.5.4) is equal to 0<s≤t e(λ−λ)s Ys .
In expression (4.5.4) we have
$$\begin{aligned}
f(s,Y_s)-f(s,Y_{s-})&=\Lambda_s-\Lambda_{s-}\\
&=e^{(\lambda-\hat\lambda)s}\prod_{r\le s}\Big(\frac{\hat\lambda}{\lambda}\Big)^{\Delta N_r}-e^{(\lambda-\hat\lambda)s}\prod_{r\le s-}\Big(\frac{\hat\lambda}{\lambda}\Big)^{\Delta N_r}\\
&=\Lambda_{s-}\Big[\Big(\frac{\hat\lambda}{\lambda}\Big)^{\Delta N_s}-1\Big]=\Lambda_{s-}\Big(\frac{\hat\lambda}{\lambda}-1\Big)\Delta N_s.
\end{aligned}$$
Putting all these results together gives
$$\begin{aligned}
\Lambda_t&=1+\int_0^t(\lambda-\hat\lambda)\Lambda_{s-}\,ds+\sum_{0<s\le t}e^{(\lambda-\hat\lambda)s}\Delta Y_s+\sum_{0<s\le t}\Big[\Lambda_{s-}\Big(\frac{\hat\lambda}{\lambda}-1\Big)\Delta N_s-e^{(\lambda-\hat\lambda)s}\Delta Y_s\Big]\\
&=1+\int_0^t(\lambda-\hat\lambda)\Lambda_{s-}\,ds+\sum_{0<s\le t}\Lambda_{s-}\Big(\frac{\hat\lambda}{\lambda}-1\Big)\Delta N_s\\
&=1-\int_0^t\Lambda_{s-}\lambda^{-1}(\lambda-\hat\lambda)(dN_s-\lambda\,ds),
\end{aligned}$$
which, after simplification, is (4.5.2). Now define a new probability measure $\bar P$ by setting
$$E\Big[\frac{d\bar P}{dP}\,\Big|\,\mathcal F_t\Big]=\Lambda_t.$$

Lemma 4.5.3 Under probability measure $\bar P$ the process $N_t$ is a Poisson process with parameter $\hat\lambda$.

Proof By the characterization theorem of Poisson processes (see Theorem 3.6.14) we must show that $\bar M_t=N_t-\hat\lambda t$ and $\bar M_t^2-\hat\lambda t$ are $(\bar P,\mathcal F_t)$-martingales. By Bayes' Theorem 4.1.1, for $t\ge s\ge 0$,
$$\bar E[\bar M_t\mid\mathcal F_s]=\frac{E[\Lambda_t\bar M_t\mid\mathcal F_s]}{E[\Lambda_t\mid\mathcal F_s]}=\frac{E[\Lambda_t\bar M_t\mid\mathcal F_s]}{\Lambda_s}.$$
Therefore, $\bar M_t$ is a $(\bar P,\mathcal F_t)$-martingale if and only if $\Lambda_t\bar M_t$ is a $(P,\mathcal F_t)$-martingale. Now
$$\Lambda_t\bar M_t=\int_0^t\Lambda_{s-}\,d\bar M_s+\int_0^t\bar M_{s-}\,d\Lambda_s+[\Lambda,\bar M]_t.$$
Recall
$$[\Lambda,\bar M]_t=\sum_{0<s\le t}\Delta\Lambda_s\,\Delta\bar M_s=-\int_0^t\Lambda_{s-}\lambda^{-1}(\lambda-\hat\lambda)\,d[N,N]_s=-\int_0^t\Lambda_{s-}\lambda^{-1}(\lambda-\hat\lambda)\,dN_s.$$
Therefore
$$\Lambda_t\bar M_t=\int_0^t\Lambda_{s-}(dN_s-\hat\lambda\,ds)+\int_0^t\bar M_{s-}\,d\Lambda_s-\int_0^t\Lambda_{s-}\lambda^{-1}(\lambda-\hat\lambda)\,dN_s.\tag{4.5.5}$$
The second integral on the right of (4.5.5) is a $(P,\mathcal F_t)$-martingale. (Recall that $N_t-\lambda t$ is a $(P,\mathcal F_t)$-martingale.) The other two integrals are written as
$$\int_0^t\Lambda_{s-}(dN_s-\hat\lambda\,ds)=\int_0^t\Lambda_{s-}(dN_s-\lambda\,ds)+\int_0^t\Lambda_{s-}\lambda\,ds-\int_0^t\Lambda_{s-}\hat\lambda\,ds,\tag{4.5.6}$$
and
$$\int_0^t\Lambda_{s-}\lambda^{-1}(\lambda-\hat\lambda)\,dN_s=\int_0^t\Lambda_{s-}\lambda^{-1}(\lambda-\hat\lambda)(dN_s-\lambda\,ds)+\int_0^t\Lambda_{s-}(\lambda-\hat\lambda)\,ds.\tag{4.5.7}$$
Substituting (4.5.6) and (4.5.7) in (4.5.5) yields the desired result, and it remains to show that $\bar M_t^2-\hat\lambda t$ is also a $(\bar P,\mathcal F_t)$-martingale. Now
$$\bar M_t^2=2\int_0^t\bar M_{s-}\,d\bar M_s+[\bar M,\bar M]_t=2\int_0^t\bar M_{s-}\,d\bar M_s+N_t.\tag{4.5.8}$$
Subtracting $\hat\lambda t$ from both sides of (4.5.8) makes the last term on the right of (4.5.8) a $(\bar P,\mathcal F_t)$-martingale, and since the $d\bar M$ integral is a $(\bar P,\mathcal F_t)$-martingale the result follows.
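Lemma 4.5.3 underlies intensity-based importance sampling: an expectation under $\bar P$ can be estimated by simulating under $P$ and weighting each path by $\Lambda_t$. A Monte Carlo sketch (numpy assumed available; parameters, sample size and seed are illustrative) estimating $\bar E[N_t]=\hat\lambda t$ this way:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, lam_hat, t = 2.0, 3.5, 1.5   # illustrative values
n = 200_000

# Simulate N_t under P (intensity lam); only the counts enter Lambda_t.
counts = rng.poisson(lam * t, size=n)

# Radon-Nikodym weights from (4.5.1): Lambda_t = (lam_hat/lam)^{N_t} e^{(lam-lam_hat)t}
weights = (lam_hat / lam) ** counts * np.exp((lam - lam_hat) * t)

est = float(np.mean(weights * counts))   # estimates E-bar[N_t]
truth = lam_hat * t
print(est, truth)                        # the two should be close
```

The same weighting estimates any functional of the path on $[0,t]$, not just $N_t$; only the terminal count is needed here because $\Lambda_t$ depends on the path through $N_t$ alone.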
4.6 Poisson process with drift

Let $N_t$ be a Poisson process with parameter $\lambda$ on a filtered probability space $(\Omega,\mathcal F,\mathcal F_t,P)$ and suppose that we have the following process:
$$X_t=\mu t+\sigma N_t=\mu t+\sigma(N_t-\lambda t+\lambda t)=(\mu+\sigma\lambda)t+\sigma M_t.\tag{4.6.1}$$
Here $\mu$ and $\sigma$ are constants and $\{M_t\}=\{N_t-\lambda t\}$ is an $(\mathcal F_t,P)$-martingale. The dynamics (4.6.1) could describe the evolution of a system with a linear trend perturbed by random jumps of size $\sigma$ given by the Poisson process $N$. We wish to define a new probability $\bar P$ such that $X_t$ has dynamics $X_t=\sigma\bar M_t$, where $\{\bar M_t\}$ is an $(\mathcal F_t,\bar P)$-martingale. Define the stochastic process
$$\Lambda_t=\exp\Big(-\frac{\mu}{\lambda\sigma}\int_0^t dM_r\Big)\prod_{0<s\le t}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_s\Big)e^{\frac{\mu}{\lambda\sigma}\Delta N_s}=e^{-\frac{\mu}{\lambda\sigma}M_t}\prod_{0<s\le t}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_s\Big)e^{\frac{\mu}{\lambda\sigma}\Delta N_s},\tag{4.6.2}$$
$$\Lambda_t=e^{\frac{\mu}{\sigma}t}\prod_{0<s\le t}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_s\Big),\tag{4.6.3}$$
where the last expression is obtained if we recall that $M_t=N_t-\lambda t$ and that
$$\prod_{0<s\le t}e^{\frac{\mu}{\lambda\sigma}\Delta N_s}=\exp\Big(\frac{\mu}{\lambda\sigma}\sum_{0<s\le t}\Delta N_s\Big)=e^{\frac{\mu}{\lambda\sigma}N_t}.$$

Lemma 4.6.1 The process (4.6.3) is a martingale under probability measure $P$.

Proof Let $0\le s\le t$ and recall that $N_t$ is an independent-increment process adapted to $\mathcal F_t$, so that
$$E[\Lambda_t\mid\mathcal F_s]=e^{\frac{\mu}{\sigma}t}\prod_{0<r\le s}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_r\Big)E\Big[\prod_{s<r\le t}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_r\Big)\Big]=\Lambda_s e^{\frac{\mu}{\sigma}(t-s)}E\Big[\Big(1-\frac{\mu}{\lambda\sigma}\Big)^{N_t-N_s}\Big]=\Lambda_s,$$
since $E\big[(1-\frac{\mu}{\lambda\sigma})^{N_t-N_s}\big]=e^{-\frac{\mu}{\sigma}(t-s)}$.

Lemma 4.6.2 The exponential martingale $\{\Lambda_t\}$ given by (4.6.3) is the unique solution of
$$\Lambda_t=1-\frac{\mu}{\lambda\sigma}\int_0^t\Lambda_{s-}(dN_s-\lambda\,ds).\tag{4.6.4}$$

Proof Write $\Lambda_t=e^{\frac{\mu}{\sigma}t}Y_t$, where $Y_t=\prod_{0<s\le t}\big(1-\frac{\mu}{\lambda\sigma}\Delta N_s\big)$. In the jump term,
$$\begin{aligned}
\Lambda_s-\Lambda_{s-}&=e^{\frac{\mu}{\sigma}s}\prod_{r\le s-}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_r\Big)\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_s\Big)-e^{\frac{\mu}{\sigma}s}\prod_{r\le s-}\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_r\Big)\\
&=\Lambda_{s-}\Big[\Big(1-\frac{\mu}{\lambda\sigma}\Delta N_s\Big)-1\Big]=-\Lambda_{s-}\frac{\mu}{\lambda\sigma}\Delta N_s.
\end{aligned}$$
Combining these results gives
$$\begin{aligned}
\Lambda_t&=1+\frac{\mu}{\sigma}\int_0^t\Lambda_{s-}\,ds+\sum_{0<s\le t}e^{\frac{\mu}{\sigma}s}\Delta Y_s+\sum_{0<s\le t}\Big[-\Lambda_{s-}\frac{\mu}{\lambda\sigma}\Delta N_s-e^{\frac{\mu}{\sigma}s}\Delta Y_s\Big]\\
&=1+\frac{\mu}{\sigma}\int_0^t\Lambda_{s-}\,ds-\frac{\mu}{\lambda\sigma}\sum_{0<s\le t}\Lambda_{s-}\Delta N_s=1-\frac{\mu}{\lambda\sigma}\int_0^t\Lambda_{s-}(dN_s-\lambda\,ds),
\end{aligned}$$
which, after simplification, is (4.6.4). Now define a new probability measure $\bar P$ by setting
$$E\Big[\frac{d\bar P}{dP}\,\Big|\,\mathcal F_t\Big]=\Lambda_t.$$
Lemma 4.6.3 Under probability measure $\bar P$ the process $X_t$ has dynamics given by (4.6.2).

Proof Let $\bar M_t=N_t-\bar\lambda t$, where $\bar\lambda=\lambda-\mu/\sigma$, which is assumed to be positive. We claim that $\bar M_t$ is a $\bar P$-martingale. To see this we need only show that $\Lambda\bar M$ is a $P$-martingale. Using the differentiation rule,
$$\begin{aligned}
d(\Lambda_t\bar M_t)&=\Lambda_{t-}\,d\bar M_t+\bar M_{t-}\,d\Lambda_t+d[\Lambda,\bar M]_t\\
&=\Lambda_{t-}(dN_t-\lambda\,dt+\tfrac{\mu}{\sigma}\,dt)-\bar M_{t-}\Lambda_{t-}\frac{\mu}{\lambda\sigma}\,dM_t-\Lambda_{t-}\frac{\mu}{\lambda\sigma}\,dN_t\\
&=\Lambda_{t-}\Big(1-\frac{\mu}{\lambda\sigma}\Big)dM_t-\bar M_{t-}\Lambda_{t-}\frac{\mu}{\lambda\sigma}\,dM_t,
\end{aligned}$$
so that $\Lambda\bar M$ is a $dM$ stochastic integral, and hence is a $P$-martingale. So we can write $X_t=\sigma\bar M_t$ under the new measure $\bar P$.
Remark 4.6.4 The results of this section hold if (4.6.1) is replaced with the more general dynamics
$$dX_t=\mu(t,X_{t-})\,dt+\sigma(t,X_{t-})\,dN_t.$$
The stochastic exponential martingale (4.6.3) then takes the form
$$\Lambda_t=\exp\Big(\int_0^t\frac{\mu_s}{\sigma_s}\,ds\Big)\prod_{0<s\le t}\Big(1-\frac{\mu_s}{\lambda\sigma_s}\Delta N_s\Big).$$

4.7 Continuous-time Markov chains

Consider again the finite-state Markov process $\{X_t\}$ on the set of standard unit vectors of $\mathbb R^N$ (see Example 2.6.17 and Section 3.8). Write $\mathcal F_t$ for the right-continuous, complete filtration $\sigma\{X_r:0\le r\le t\}$. We saw in Lemma 2.6.18 that $X_t$ has the semimartingale representation
$$X_t=X_0+\int_0^t A_rX_r\,dr+V_t.$$
Recall that $J_t$ denotes the total number of jumps (of all kinds) of the process $X$ up to time $t$ and
$$J_t=-\sum_{i=1}^N\int_0^t\langle X_s,e_i\rangle a_{ii}(s)\,ds+Q_t.$$
Write
$$\lambda_t=-\sum_{i=1}^N\langle X_t,e_i\rangle a_{ii}(t).$$
Suppose that we wish to define a new probability $\bar P$ such that $J_t$ is a standard Poisson process with parameter 1. Define the $P$-martingale
$$\Lambda_t=\exp\Big(-\int_0^t\log\lambda_r\,dJ_r+\int_0^t(\lambda_r-1)\,dr\Big).$$
Since $\{\Lambda_t,\mathcal F_t\}$ is a $P$-martingale such that
$$\Lambda_t=1-\int_0^t\Lambda_{r-}\lambda_r^{-1}(\lambda_r-1)(dJ_r-\lambda_r\,dr),$$
we can define a new probability measure $\bar P$ by setting
$$E\Big[\frac{d\bar P}{dP}\,\Big|\,\mathcal F_t\Big]=\Lambda_t.$$

Lemma 4.7.1 Under probability measure $\bar P$ the process $J_t$ is a Poisson process with parameter 1.

Proof
By the characterization theorem of Poisson processes (see Theorem 3.6.14) we must show that $\bar Q_t=J_t-t$ and $\bar Q_t^2-t$ are $(\bar P,\mathcal F_t)$-martingales. By Bayes' Theorem 4.1.1, for $t\ge s\ge 0$,
$$\bar E[\bar Q_t\mid\mathcal F_s]=\frac{E[\Lambda_t\bar Q_t\mid\mathcal F_s]}{E[\Lambda_t\mid\mathcal F_s]}=\frac{E[\Lambda_t\bar Q_t\mid\mathcal F_s]}{\Lambda_s}.$$
Therefore, $\bar Q_t$ is a $(\bar P,\mathcal F_t)$-martingale if and only if $\Lambda_t\bar Q_t$ is a $(P,\mathcal F_t)$-martingale. Now
$$\Lambda_t\bar Q_t=\int_0^t\Lambda_{s-}\,d\bar Q_s+\int_0^t\bar Q_{s-}\,d\Lambda_s+[\Lambda,\bar Q]_t,$$
and
$$[\Lambda,\bar Q]_t=-\int_0^t\Lambda_{s-}\lambda_s^{-1}(\lambda_s-1)\,d[J,J]_s=\int_0^t\Lambda_{s-}(\lambda_s^{-1}-1)\,dJ_s.$$
Therefore
$$\Lambda_t\bar Q_t=\int_0^t\Lambda_{s-}(dJ_s-ds)+\int_0^t\bar Q_{s-}\,d\Lambda_s+\int_0^t\Lambda_{s-}(\lambda_s^{-1}-1)\,dJ_s.\tag{4.7.1}$$
The second integral on the right of (4.7.1) is a $(P,\mathcal F_t)$-martingale. However, $J_t-\int_0^t\lambda_s\,ds$ is a $(P,\mathcal F_t)$-martingale, so that the other two integrals are written as
$$\int_0^t\Lambda_{s-}(dJ_s-ds)=\int_0^t\Lambda_{s-}(dJ_s-\lambda_s\,ds+\lambda_s\,ds-ds)=P\text{-martingale}+\int_0^t\Lambda_{s-}\lambda_s\,ds-\int_0^t\Lambda_{s-}\,ds,\tag{4.7.2}$$
and
$$\int_0^t\Lambda_{s-}(\lambda_s^{-1}-1)\,dJ_s=\int_0^t\Lambda_{s-}(\lambda_s^{-1}-1)(dJ_s-\lambda_s\,ds+\lambda_s\,ds)=P\text{-martingale}+\int_0^t\Lambda_{s-}(\lambda_s^{-1}-1)\lambda_s\,ds.\tag{4.7.3}$$
Substituting (4.7.2) and (4.7.3) in (4.7.1) yields the desired result, since the three $ds$ integrals cancel. To finish the proof we have to show that $\bar Q_t^2-t$ is also a $(\bar P,\mathcal F_t)$-martingale. Now
$$\bar Q_t^2=2\int_0^t\bar Q_{s-}\,d\bar Q_s+[\bar Q,\bar Q]_t=2\int_0^t\bar Q_{s-}\,d\bar Q_s+J_t.\tag{4.7.4}$$
Subtracting $t$ from both sides of (4.7.4) makes the last term on the right of (4.7.4) a $(\bar P,\mathcal F_t)$-martingale, and since the $d\bar Q$ integral is a $(\bar P,\mathcal F_t)$-martingale the result follows.
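For a concrete chain the martingale property $E[\Lambda_t]=1$ can be checked by simulation. A Monte Carlo sketch (a hypothetical two-state chain with illustrative leaving rates; along each simulated path, $\Lambda_t$ is evaluated from the exponential formula above, with the $dJ$ integral picking up $\log\lambda$ at each jump time):

```python
import math, random

random.seed(1)
rates = {0: 0.7, 1: 1.9}   # illustrative leaving rates lambda_i of a 2-state chain
T, n_paths = 2.0, 100_000

total = 0.0
for _ in range(n_paths):
    state, t_now = 0, 0.0
    log_w, int_lam = 0.0, 0.0        # -sum log(lambda) at jumps; integral of lambda
    while True:
        hold = random.expovariate(rates[state])
        if t_now + hold >= T:
            int_lam += rates[state] * (T - t_now)
            break
        t_now += hold
        int_lam += rates[state] * hold
        log_w -= math.log(rates[state])   # contribution of the jump to -int log(lam) dJ
        state = 1 - state
    total += math.exp(log_w + int_lam - T)  # Lambda_T for this path

avg = total / n_paths
print(avg)
```

The average weight is close to 1, as the martingale property requires; the residual is Monte Carlo noise.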
Remark 4.7.2 The above counting processes generate the same information as the Markov chain $\{X_t\}$. If we wish to change the intensity matrix $A$ to another one, $\bar A$, under a new probability measure $\bar P$, we change the intensities in the counting processes
$$J_t^{ij}=\int_0^t\langle X_s,e_i\rangle a_{ji}(s)\,ds+V_t^{ij}=\int_0^t\lambda_s^{ij}\,ds+V_t^{ij}.$$
In order to define a new probability $\bar P$ such that
$$J_t^{ij}=\int_0^t\bar\lambda_s^{ij}\,ds+\bar V_t^{ij},$$
define the $P$-martingale
$$\Lambda_t=\prod_{i\ne j}\exp\Big(\int_0^t\log\frac{\bar\lambda_r^{ij}}{\lambda_r^{ij}}\,dJ_r^{ij}-\int_0^t(\bar\lambda_r^{ij}-\lambda_r^{ij})\,dr\Big),\tag{4.7.5}$$
and set
$$E\Big[\frac{d\bar P}{dP}\,\Big|\,\mathcal F_t\Big]=\Lambda_t.$$

Lemma 4.7.3 Under probability measure $\bar P$ the processes $J_t^{ij}$ have intensities $\bar\lambda_t^{ij}$ respectively.
4.8 Problems

1. Consider the probability space $([0,1],\mathcal B([0,1]),\lambda)$, where $\lambda$ is the Lebesgue measure on the Borel $\sigma$-field $\mathcal B([0,1])$. Let $\bar P$ be another probability measure carried by the singleton $\{0\}$, i.e. $\bar P(\{0\})=1$. Let
$$\pi_1=\{[0,\tfrac12],(\tfrac12,1]\},\quad \pi_2=\{[0,\tfrac14],(\tfrac14,\tfrac34],(\tfrac34,1]\},\quad\dots,\quad \pi_n=\{[0,1/2^n],\dots,(1-1/2^n,1]\}.$$
Define the random variable
$$\Lambda_n(\omega)=\begin{cases}\dfrac{\bar P([0,2^{-n}])}{\lambda([0,2^{-n}])}=2^n,&\omega\in[0,2^{-n}],\\[4pt]0,&\text{elsewhere in }[0,1].\end{cases}$$
Show that the sequence $\Lambda_n$ is a positive martingale (with respect to the filtration generated by the partitions $\pi_n$) such that $E_\lambda[\Lambda_n]=1$ for all $n$ but $\lim\Lambda_n=0$ $\lambda$-almost surely.
2. Prove Lemma 4.2.14.
3. Consider the order-2 Markov chain $\{X_n\}$, $1\le n\le N$, discussed in Example 2.4.6. Define a new probability measure $\bar P$ on $(\Omega,\mathcal F_n)$ such that $\bar P(X_n=e_k\mid X_{n-2}=e_i,X_{n-1}=e_j)=\bar p_{ij,k}$.
4. On a probability space $(\Omega,\mathcal F,P)$ consider the stochastic process $X_n$ with a finite state space and transition probabilities $P(X_{n+1}=k\mid X_{n-1}=i,X_n=j)=p_{ij,k}$. Transform the process $X$ into a Markov chain $Y$ with an appropriate state space and perform a change of measure under which the process $Y$ becomes a sequence of i.i.d. uniform random variables on the state space of the Markov chain. (Hint: show that the process $(X_{n-1},X_n),(X_n,X_{n+1}),\dots$ is a Markov chain.)
5. Let $N_t$ be a Poisson process with parameter $\lambda$ on a filtered probability space $(\Omega,\mathcal F,\mathcal F_t,P)$ and suppose that we have the following process:
$$X_t=\mu t+\sigma N_t.$$
Here $\mu$ and $\sigma$ are constants. Define a new probability $\bar P$ such that $X_t$ is a Poisson process with parameter $\bar\lambda=1$.
6. Show that the exponential martingale $\{\Lambda_t\}$ given by (4.7.5) is the unique solution of
$$\Lambda_t=1+\sum_{i,j}\int_0^t\Lambda_{s-}(\lambda_s^{ij})^{-1}(\bar\lambda_s^{ij}-\lambda_s^{ij})(dJ_s^{ij}-\lambda_s^{ij}\,ds).$$
7. Prove Lemma 4.7.3.
Part II Applications

5 Kalman filtering

5.1 Introduction

This chapter discusses the filtering of partially observed linear (and nonlinear) dynamics using the tools developed in Chapter 4. The chapter starts with simple applications and progresses to more involved situations.

5.2 Discrete-time scalar dynamics

Consider a system whose state at time $k$ is $x_k\in\mathbb R$. The time index $k$ of the state evolution will be discrete and identified with $\mathbb N=\{0,1,2,\dots\}$. Let $(\Omega,\mathcal F,P)$ be a probability space upon which $\{v_k\}$ and $\{w_k\}$, $k\in\mathbb N$, are independent and identically distributed (i.i.d.) sequences of $N(0,1)$ Gaussian random variables; $x_0$ is normally distributed. Let $\{\mathcal F_k\}$, $k\in\mathbb N$, be the complete filtration (that is, $\mathcal F_0$ contains all the $P$-null events) generated by $\{x_0,x_1,\dots,x_k\}$. The state of the system satisfies the linear dynamics
$$x_{k+1}=ax_k+bv_{k+1}.\tag{5.2.1}$$
Note that $E[v_{k+1}\mid\mathcal F_k]=0$. A useful and simple model for a noisy observation of $x_k$ is to suppose it is given as a linear function of $x_k$ plus a random "noise" term. That is, we suppose that for some real numbers $c$ and $d$ our observations have the form
$$y_k=cx_k+dw_k.\tag{5.2.2}$$
We shall also write $\{\mathcal Y_k\}$, $k\in\mathbb N$, for the complete filtration generated by $\{y_0,y_1,\dots,y_k\}$. Using measure change techniques we shall derive a recursive expression for the conditional distribution of $x_k$ given $\mathcal Y_k$.

5.3 Recursive estimation

Initially we suppose all processes are defined on an "ideal" probability space $(\Omega,\mathcal F,\bar P)$; then under a new probability measure $P$, to be defined, the model dynamics (5.2.1) and (5.2.2) will hold.
Suppose that under $\bar P$:
1. $\{x_k\}$, $k\in\mathbb N$, is an i.i.d. $N(0,1)$ sequence with density function $\phi(x)=\frac1{\sqrt{2\pi}}e^{-x^2/2}$;
2. $\{y_k\}$, $k\in\mathbb N$, is an i.i.d. $N(0,1)$ sequence with density function $\psi(y)=\frac1{\sqrt{2\pi}}e^{-y^2/2}$.

For $l=0$ set $\lambda_0=\dfrac{\psi(d^{-1}(y_0-cx_0))}{d\,\psi(y_0)}$, and for $l=1,2,\dots$ define
$$\lambda_l=\frac{\phi(b^{-1}(x_l-ax_{l-1}))\,\psi(d^{-1}(y_l-cx_l))}{b\,d\,\phi(x_l)\,\psi(y_l)},\tag{5.3.1}$$
$$\Lambda_k=\prod_{l=0}^k\lambda_l.\tag{5.3.2}$$
Let $\mathcal G_k$ be the complete $\sigma$-field generated by $\{x_0,x_1,\dots,x_k,y_0,y_1,\dots,y_k\}$ for $k\in\mathbb N$.

Lemma 5.3.1 The process $\{\Lambda_k\}$, $k\in\mathbb N$, is a $\bar P$-martingale with respect to the filtration $\{\mathcal G_k\}$, $k\in\mathbb N$.

Proof Since $\Lambda_k$ is $\mathcal G_k$-measurable,
$$\bar E[\Lambda_{k+1}\mid\mathcal G_k]=\Lambda_k\,\bar E[\lambda_{k+1}\mid\mathcal G_k].$$
Now
$$\bar E[\lambda_{k+1}\mid\mathcal G_k]=\bar E\Big[\frac{\phi(b^{-1}(x_{k+1}-ax_k))}{b\,\phi(x_{k+1})}\,\bar E\Big[\frac{\psi(d^{-1}(y_{k+1}-cx_{k+1}))}{d\,\psi(y_{k+1})}\,\Big|\,\mathcal G_k,x_{k+1}\Big]\,\Big|\,\mathcal G_k\Big].$$
Here
$$\bar E\Big[\frac{\psi(d^{-1}(y_{k+1}-cx_{k+1}))}{d\,\psi(y_{k+1})}\,\Big|\,\mathcal G_k,x_{k+1}\Big]=\int_{\mathbb R}\frac{\psi(d^{-1}(y-cx_{k+1}))}{d\,\psi(y)}\,\psi(y)\,dy=1,$$
and
$$\bar E\Big[\frac{\phi(b^{-1}(x_{k+1}-ax_k))}{b\,\phi(x_{k+1})}\,\Big|\,\mathcal G_k\Big]=\int_{\mathbb R}\frac{\phi(b^{-1}(x-ax_k))}{b\,\phi(x)}\,\phi(x)\,dx=1,$$
using the change of variable $u=d^{-1}(y-cx_{k+1})$ in the first integral and a similar change of variable in the second.

Define $P$ on $(\Omega,\mathcal F)$ by setting the restriction of the Radon–Nikodym derivative $\dfrac{dP}{d\bar P}$ to $\mathcal G_k$ equal to $\Lambda_k$. Then:
Lemma 5.3.2 On $(\Omega,\mathcal F)$ and under $P$, $\{v_k\}$ and $\{w_k\}$, $k\in\mathbb N$, are i.i.d. $N(0,1)$ sequences of random variables, where
$$v_{k+1}=b^{-1}(x_{k+1}-ax_k),\qquad w_k=d^{-1}(y_k-cx_k).$$

Proof Suppose $f,g:\mathbb R\to\mathbb R$ are "test" functions (i.e. measurable functions with compact support). Then with $E$ (resp. $\bar E$) denoting expectation under $P$ (resp. $\bar P$) and using Bayes' Theorem 4.1.1,
$$E[f(v_{k+1})g(w_{k+1})\mid\mathcal G_k]=\frac{\bar E[\Lambda_{k+1}f(v_{k+1})g(w_{k+1})\mid\mathcal G_k]}{\bar E[\Lambda_{k+1}\mid\mathcal G_k]}=\bar E[\lambda_{k+1}f(v_{k+1})g(w_{k+1})\mid\mathcal G_k],$$
where the last equality follows from Lemma 5.3.1. Consequently
$$\begin{aligned}
E[f(v_{k+1})g(w_{k+1})\mid\mathcal G_k]&=\bar E\Big[\frac{\phi(b^{-1}(x_{k+1}-ax_k))\psi(d^{-1}(y_{k+1}-cx_{k+1}))}{b\,d\,\phi(x_{k+1})\psi(y_{k+1})}f(b^{-1}(x_{k+1}-ax_k))\,g(d^{-1}(y_{k+1}-cx_{k+1}))\,\Big|\,\mathcal G_k\Big]\\
&=\bar E\Big[\frac{\phi(b^{-1}(x_{k+1}-ax_k))}{b\,\phi(x_{k+1})}f(b^{-1}(x_{k+1}-ax_k))\\
&\qquad\times\bar E\Big[\frac{\psi(d^{-1}(y_{k+1}-cx_{k+1}))}{d\,\psi(y_{k+1})}g(d^{-1}(y_{k+1}-cx_{k+1}))\,\Big|\,\mathcal G_k,x_{k+1}\Big]\,\Big|\,\mathcal G_k\Big].
\end{aligned}$$
Now
$$\bar E\Big[\frac{\psi(d^{-1}(y_{k+1}-cx_{k+1}))}{d\,\psi(y_{k+1})}g(d^{-1}(y_{k+1}-cx_{k+1}))\,\Big|\,\mathcal G_k,x_{k+1}\Big]=\int_{\mathbb R}\frac{\psi(d^{-1}(y-cx_{k+1}))}{d\,\psi(y)}g(d^{-1}(y-cx_{k+1}))\psi(y)\,dy=\int_{\mathbb R}\psi(u)g(u)\,du.$$
Similarly
$$\bar E\Big[\frac{\phi(b^{-1}(x_{k+1}-ax_k))}{b\,\phi(x_{k+1})}f(b^{-1}(x_{k+1}-ax_k))\,\Big|\,\mathcal G_k\Big]=\int_{\mathbb R}\phi(u)f(u)\,du.$$
Therefore
$$E[f(v_{k+1})g(w_{k+1})\mid\mathcal G_k]=\int_{\mathbb R}\phi(u)f(u)\,du\int_{\mathbb R}\psi(u)g(u)\,du,$$
and the lemma is proved.

Let $g:\mathbb R\to\mathbb R$ be a "test" function. Using Bayes' Theorem 4.1.1,
$$E[g(x_k)\mid\mathcal Y_k]=\frac{\bar E[\Lambda_kg(x_k)\mid\mathcal Y_k]}{\bar E[\Lambda_k\mid\mathcal Y_k]},\tag{5.3.3}$$
where $E$ (resp. $\bar E$) denotes expectation with respect to $P$ (resp. $\bar P$). Consider the unnormalized conditional expectation in the numerator of (5.3.3), $\bar E[\Lambda_kg(x_k)\mid\mathcal Y_k]$. This is a measure-valued process. Write $\alpha_k(\cdot)$, $k\in\mathbb N$, for its density, so that
$$\bar E[\Lambda_kg(x_k)\mid\mathcal Y_k]=\int_{\mathbb R}g(x)\alpha_k(x)\,dx.\tag{5.3.4}$$
If $p_k(\cdot)$ denotes the normalized conditional density, such that
$$E[g(x_k)\mid\mathcal Y_k]=\int_{\mathbb R}g(x)p_k(x)\,dx,$$
then from (5.3.3) we see that
$$p_k(x)=\alpha_k(x)\Big(\int_{\mathbb R}\alpha_k(z)\,dz\Big)^{-1},\tag{5.3.5}$$
for $x\in\mathbb R$, $k\in\mathbb N$. Then we have the following result.

Theorem 5.3.3
$$\alpha_{k+1}(x)=\frac{\psi(d^{-1}(y_{k+1}-cx))}{d\,b\,\psi(y_{k+1})}\int_{\mathbb R}\phi(b^{-1}(x-az))\,\alpha_k(z)\,dz.\tag{5.3.6}$$

Proof For any "test" function $g$, and in view of (5.3.1) and (5.3.2),
$$\begin{aligned}
\int_{\mathbb R}g(x)\alpha_{k+1}(x)\,dx&=\bar E[\Lambda_{k+1}g(x_{k+1})\mid\mathcal Y_{k+1}]=\bar E[\Lambda_k\lambda_{k+1}g(x_{k+1})\mid\mathcal Y_{k+1}]\\
&=\bar E\Big[\Lambda_k\frac{\phi(b^{-1}(x_{k+1}-ax_k))\psi(d^{-1}(y_{k+1}-cx_{k+1}))}{b\,d\,\phi(x_{k+1})\psi(y_{k+1})}g(x_{k+1})\,\Big|\,\mathcal Y_{k+1}\Big]\\
&=\bar E\Big[\Lambda_k\frac1{d\,b\,\psi(y_{k+1})}\int_{\mathbb R}\phi(b^{-1}(x-ax_k))\psi(d^{-1}(y_{k+1}-cx))g(x)\,dx\,\Big|\,\mathcal Y_{k+1}\Big].
\end{aligned}$$
The last equality follows from the fact that under $\bar P$, $x_{k+1}$ has density $\phi$ and is independent of everything else. Also, given $y_{k+1}$ we condition only on $\mathcal Y_k$ to get an expression similar to (5.3.4); that is,
$$\int_{\mathbb R}g(x)\alpha_{k+1}(x)\,dx=\frac1{d\,b\,\psi(y_{k+1})}\int_{\mathbb R}\int_{\mathbb R}\psi(d^{-1}(y_{k+1}-cx))\,\phi(b^{-1}(x-az))\,g(x)\,\alpha_k(z)\,dx\,dz.$$
This holds for all "test" functions $g$, so we can conclude that (5.3.6) holds.
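The recursion (5.3.6) can be implemented directly on a discretization grid and compared against the conditional mean produced by the standard Kalman recursions. A sketch (numpy; model parameters, grid and horizon are illustrative, and $x_0$ is taken $N(0,1)$ as in the measure-change setup; normalizing constants cancel when the density is normalized at the end):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, c, d, K = 0.9, 0.5, 1.0, 0.4, 20   # illustrative parameters

# Simulate (5.2.1)-(5.2.2) with x_0 ~ N(0, 1)
x = np.zeros(K + 1); y = np.zeros(K + 1)
x[0] = rng.standard_normal()
y[0] = c * x[0] + d * rng.standard_normal()
for k in range(K):
    x[k + 1] = a * x[k] + b * rng.standard_normal()
    y[k + 1] = c * x[k + 1] + d * rng.standard_normal()

phi = lambda u: np.exp(-u * u / 2) / np.sqrt(2 * np.pi)

# Propagate the unnormalized density alpha_k on a grid via (5.3.6)
grid = np.linspace(-8, 8, 801); dz = grid[1] - grid[0]
alpha = phi((y[0] - c * grid) / d) * phi(grid)   # alpha_0: prior times likelihood
for k in range(K):
    pred = phi((grid[:, None] - a * grid[None, :]) / b) @ alpha * dz
    alpha = phi((y[k + 1] - c * grid) / d) * pred
grid_mean = float(np.sum(grid * alpha) / np.sum(alpha))

# Reference: conditional mean via one-step-ahead Kalman recursions, prior N(0, 1)
m, S = 0.0, 1.0
for k in range(K + 1):
    mp, Sp = (a * m, a * a * S + b * b) if k > 0 else (m, S)
    g = c * Sp / (c * c * Sp + d * d)
    m, S = mp + g * (y[k] - c * mp), Sp - g * c * Sp

err = abs(grid_mean - m)
```

With this grid resolution the two means agree to well below the grid spacing, illustrating that (5.3.6) carries exactly the information the Kalman filter summarizes in two statistics.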
Remark 5.3.4 The linearity of (5.2.1) and (5.2.2) implies that (5.3.5) is also a normal density, with mean $\hat x_{k|k}=E[x_k\mid\mathcal Y_k]$ and variance $\Sigma_{k|k}=E[(x_k-\hat x_{k|k})^2\mid\mathcal Y_k]$. Our purpose now is to give recursive estimates of $\hat x_{k|k}$ and $\Sigma_{k|k}$ using the recursion for $\alpha_k(x)$.

Theorem 5.3.5 For the linear model described by (5.2.1) and (5.2.2) the conditional mean and variance of the state process $x_k$ are given by the following recursions:
$$\Sigma_{k+1|k+1}=A_k,\qquad \hat x_{k+1|k+1}=A_kB_k,$$
where
$$A_k=\Big(\frac{c^2}{d^2}+\frac1{b^2}-\frac{a^2\bar\Sigma_{k|k}}{b^4}\Big)^{-1},\qquad
B_k=\frac{c\,y_{k+1}}{d^2}+\frac{a\,\hat x_{k|k}\,\bar\Sigma_{k|k}}{b^2\,\Sigma_{k|k}},\qquad
\bar\Sigma_{k|k}=\Big(\frac{a^2}{b^2}+\frac1{\Sigma_{k|k}}\Big)^{-1}.$$

Proof Recall that $\phi(\cdot)$ and $\psi(\cdot)$ are normal densities with zero means and unit variances, and that $\alpha_k(x)$ is proportional to a normal density with mean $\hat x_{k|k}$ and variance $\Sigma_{k|k}$. Then
$$\begin{aligned}
\alpha_{k+1}(x)&=\frac{\psi(d^{-1}(y_{k+1}-cx))}{d\,b\,\psi(y_{k+1})}\int\phi(b^{-1}(x-az))\,\alpha_k(z)\,dz\\
&=\frac{\psi(d^{-1}(y_{k+1}-cx))}{d\,b\,\psi(y_{k+1})}\int\exp\Big(-\frac{(x-az)^2}{2b^2}-\frac{(z-\hat x_{k|k})^2}{2\Sigma_{k|k}}\Big)dz\\
&=\frac{\psi(d^{-1}(y_{k+1}-cx))}{d\,b\,\psi(y_{k+1})}\exp\Big(-\frac{x^2}{2b^2}-\frac{\hat x_{k|k}^2}{2\Sigma_{k|k}}\Big)\int\exp\Big(-\frac12\Big[z^2\Big(\frac{a^2}{b^2}+\frac1{\Sigma_{k|k}}\Big)-2z\Big(\frac{ax}{b^2}+\frac{\hat x_{k|k}}{\Sigma_{k|k}}\Big)\Big]\Big)dz.
\end{aligned}$$
Let
$$K(x)=\frac{\psi(d^{-1}(y_{k+1}-cx))}{d\,b\,\psi(y_{k+1})}\exp\Big(-\frac{x^2}{2b^2}-\frac{\hat x_{k|k}^2}{2\Sigma_{k|k}}\Big),\qquad
\bar\Sigma_{k|k}=\Big(\frac{a^2}{b^2}+\frac1{\Sigma_{k|k}}\Big)^{-1},\qquad
\beta_k(x)=\frac{ax}{b^2}+\frac{\hat x_{k|k}}{\Sigma_{k|k}}.$$
Then
$$\begin{aligned}
\alpha_{k+1}(x)&=K(x)\int\exp\Big(-\frac1{2\bar\Sigma_{k|k}}\big[z^2-2z\bar\Sigma_{k|k}\beta_k(x)\big]\Big)dz\\
&=K(x)\int\exp\Big(-\frac1{2\bar\Sigma_{k|k}}\big[z^2-2z\bar\Sigma_{k|k}\beta_k(x)+(\bar\Sigma_{k|k}\beta_k(x))^2-(\bar\Sigma_{k|k}\beta_k(x))^2\big]\Big)dz\\
&=K(x)\exp\Big(\frac{(\bar\Sigma_{k|k}\beta_k(x))^2}{2\bar\Sigma_{k|k}}\Big)\int\exp\Big(-\frac{(z-\bar\Sigma_{k|k}\beta_k(x))^2}{2\bar\Sigma_{k|k}}\Big)dz
=K(x)\exp\Big(\frac{(\bar\Sigma_{k|k}\beta_k(x))^2}{2\bar\Sigma_{k|k}}\Big)\sqrt{2\pi\bar\Sigma_{k|k}}.
\end{aligned}$$
The last expression follows from the fact that the integrand is proportional to a normal density with mean $\bar\Sigma_{k|k}\beta_k(x)$ and variance $\bar\Sigma_{k|k}$, so the integral is 1 when properly normalized. The final step is to group together all the terms containing $x$ and then complete the square with respect to the variable $x$:
$$\alpha_{k+1}(x)=K_1\exp\Big(-\frac12\Big[x^2\Big(\frac{c^2}{d^2}+\frac1{b^2}-\frac{a^2\bar\Sigma_{k|k}}{b^4}\Big)-2x\Big(\frac{c\,y_{k+1}}{d^2}+\frac{a\,\hat x_{k|k}\,\bar\Sigma_{k|k}}{b^2\,\Sigma_{k|k}}\Big)\Big]\Big)=K_2\exp\Big(-\frac{(x-A_kB_k)^2}{2A_k}\Big),$$
where $K_1$ and $K_2$ are constants independent of $x$, and $A_k$, $B_k$ are as in the statement of the theorem; that is, $\Sigma_{k+1|k+1}=A_k$ and $\hat x_{k+1|k+1}=A_kB_k$, which finishes the proof.

The Kalman filter is usually presented in terms of the one-step-ahead prediction
$$\hat x_{k+1|k}=E[x_{k+1}\mid\mathcal Y_k]=a\,\hat x_{k|k},\qquad
\Sigma_{k+1|k}=E[(x_{k+1}-\hat x_{k+1|k})^2\mid\mathcal Y_k]=a^2\Sigma_{k|k}+b^2.$$
Then, with
$$K_{k+1}=\frac{c\,\Sigma_{k+1|k}}{c^2\Sigma_{k+1|k}+d^2},$$
$$\hat x_{k+1|k+1}=\hat x_{k+1|k}+K_{k+1}(y_{k+1}-c\,\hat x_{k+1|k}),\qquad
\Sigma_{k+1|k+1}=\Sigma_{k+1|k}-\frac{c^2\Sigma_{k+1|k}^2}{c^2\Sigma_{k+1|k}+d^2}.$$
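The two parameterizations of the filter, the $(A_k,B_k)$ recursions of Theorem 5.3.5 and the one-step-ahead predictor-corrector form, can be checked against each other numerically. A sketch with illustrative parameters, an arbitrary observation sequence, and an assumed prior $\hat x_{0|0}=0$, $\Sigma_{0|0}=1$:

```python
# Illustrative parameters and observations
a, b, c, d = 0.8, 1.0, 1.2, 0.5
ys = [0.3, -1.1, 0.7, 2.0, -0.4]

x1, S1 = 0.0, 1.0   # Theorem 5.3.5 form
x2, S2 = 0.0, 1.0   # predictor-corrector form

for y in ys:
    # Theorem 5.3.5: Sigma-bar, then A_k and B_k
    Sb = 1.0 / (a * a / b**2 + 1.0 / S1)
    A = 1.0 / (c * c / d**2 + 1.0 / b**2 - a * a * Sb / b**4)
    B = c * y / d**2 + a * x1 * Sb / (b**2 * S1)
    x1, S1 = A * B, A

    # One-step-ahead prediction followed by the gain update
    xp, Sp = a * x2, a * a * S2 + b * b
    g = c * Sp / (c * c * Sp + d * d)
    x2, S2 = xp + g * (y - c * xp), Sp - g * c * Sp
```

Both recursions produce the same conditional mean and variance at every step, up to floating-point rounding; the predictor-corrector form is simply (5.3.6) re-expressed through $\Sigma_{k+1|k}=a^2\Sigma_{k|k}+b^2$.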
5.4 Vector dynamics

Consider a system whose state at time $k=0,1,2,\dots$ is $X_k\in\mathbb R^m$ and which can be observed only indirectly through another process $Y_k\in\mathbb R^d$. Let $(\Omega,\mathcal F,P)$ be a probability space upon which $V_k$ and $W_k$ are normally distributed with means 0 and respective covariance identity matrices $I_{m\times m}$ and $I_{d\times d}$. Assume that $D_k$ is nonsingular and that $B_k$ is nonsingular and symmetric (for notational convenience). $X_0$ is a Gaussian random variable with zero mean and covariance matrix $B_0^2$ (of dimension $m\times m$). Let $\{\mathcal F_k\}$, $k\in\mathbb N$, be the complete filtration (that is, $\mathcal F_0$ contains all the $P$-null events) generated by $\{X_0,X_1,\dots,X_k\}$. The state and observations of the system satisfy the linear dynamics
$$X_{k+1}=A_{k+1}X_k+B_{k+1}V_{k+1}\in\mathbb R^m,\tag{5.4.1}$$
$$Y_k=C_kX_k+D_kW_k\in\mathbb R^d.\tag{5.4.2}$$
$A_k$, $C_k$ are matrices of appropriate dimensions. We shall also write $\{\mathcal Y_k\}$, $k\in\mathbb N$, for the complete filtration generated by $\{Y_0,Y_1,\dots,Y_k\}$. Using measure change techniques we shall derive a recursive expression for the conditional distribution of $X_k$ given $\mathcal Y_k$.

Recursive estimation

Initially we suppose all processes are defined on an "ideal" probability space $(\Omega,\mathcal F,\bar P)$; then under a new probability measure $P$, to be defined, the model dynamics (5.4.1) and (5.4.2) will hold. Suppose that under $\bar P$:
1. $\{X_k\}$, $k\in\mathbb N$, is an i.i.d. $N(0,I_{m\times m})$ sequence with density function $\phi(x)=\frac1{(2\pi)^{m/2}}e^{-x'x/2}$;
2. $\{Y_k\}$, $k\in\mathbb N$, is an i.i.d. $N(0,I_{d\times d})$ sequence with density function $\psi(y)=\frac1{(2\pi)^{d/2}}e^{-y'y/2}$.
For any square matrix $B$ write $|B|$ for the absolute value of its determinant. For $l=0$ set $\lambda_0=\dfrac{\psi(D_0^{-1}(Y_0-C_0X_0))}{|D_0|\,\psi(Y_0)}$, and for $l=1,2,\dots$ define
$$\lambda_l=\frac{\phi(B_l^{-1}(X_l-A_lX_{l-1}))\,\psi(D_l^{-1}(Y_l-C_lX_l))}{|B_l|\,|D_l|\,\phi(X_l)\,\psi(Y_l)},\tag{5.4.3}$$
$$\Lambda_k=\prod_{l=0}^k\lambda_l.\tag{5.4.4}$$
Let $\{\mathcal G_k\}$ be the complete $\sigma$-field generated by $\{X_0,X_1,\dots,X_k,Y_0,Y_1,\dots,Y_k\}$ for $k\in\mathbb N$. The process $\{\Lambda_k\}$, $k\in\mathbb N$, is a $\bar P$-martingale with respect to the filtration $\{\mathcal G_k\}$. Define $P$ on $(\Omega,\mathcal F)$ by setting the restriction of the Radon–Nikodym derivative $\frac{dP}{d\bar P}$ to $\mathcal G_k$ equal to $\Lambda_k$. It can be shown that on $(\Omega,\mathcal F)$ and under $P$, $V_k$ and $W_k$ are normally distributed with means 0 and respective covariance identity matrices $I_{m\times m}$ and $I_{d\times d}$, where
$$V_{k+1}=B_{k+1}^{-1}(X_{k+1}-A_{k+1}X_k),\qquad W_k=D_k^{-1}(Y_k-C_kX_k).$$
Let $g:\mathbb R^m\to\mathbb R$ be a "test" function. Write $\alpha_k(\cdot)$, $k\in\mathbb N$, for the density
$$\bar E[\Lambda_kg(X_k)\mid\mathcal Y_k]=\int_{\mathbb R^m}g(x)\alpha_k(x)\,dx.\tag{5.4.5}$$
Then we have the following result:

Theorem 5.4.1
$$\alpha_{k+1}(x)=\frac{\psi(D_{k+1}^{-1}(Y_{k+1}-C_{k+1}x))}{|D_{k+1}|\,|B_{k+1}|\,\psi(Y_{k+1})}\int\phi(B_{k+1}^{-1}(x-A_{k+1}z))\,\alpha_k(z)\,dz.\tag{5.4.6}$$

Proof The proof is similar to the scalar case and is left as an exercise.
−1 ψ(Dk+1 (Yk+1 − Ck+1 x)) (2π )−m/2 | k|k |−1/2 . |Dk+1 ||Bk+1 |ψ(Yk+1 )
αk+1 (x) = (x, Yk+1 )
IRm
−1 −1 exp (−1/2)(Bk+1 (x − Ak+1 z)) (Bk+1 (x − Ak+1 z))
−1 + (z − Xˆ k|k ) k|k (z − Xˆ k|k ) dz −1 = K (x) exp(−1/2)z k|k z − 2βk+1 z dz,
(5.4.7) (5.4.8)
IRm
where K (x) is independent of the variable z and −1
−2 −1 k|k = Ak+1 Bk+1 Ak+1 + k|k , −2 −1 βk+1 = x Bk+1 Ak+1 + Xˆ k|k k|k .
The next step is to complete the “square” in the argument of the exponential in (5.4.8) in order to rewrite the integrand as a normal density which integrates out to 1. Now −1
−1
z k|k z − 2βk+1 z = (z − k|k βk+1 ) k|k (z − k|k βk+1 ) − βk+1 k|k βk+1 ,
(5.4.9)
5.5 The EM algorithm
177
after substitution of (5.4.9) into (5.4.8) and integration we are left with only the x variable. Completing the “square” with respect to x,
1 −1 αk+1 (x) = K 1 exp (− )(x − Xˆ k+1|k+1 ) k+1|k+1 (x − Xˆ k+1|k+1 ) , 2 where K 1 is a constant independent of x and −1 −2 −2 −2 k+1|k+1 = Bk+1 − Bk+1 Ak+1 k|k Ak+1 Bk+1 −2 + Ck+1 Dk+1 Ck+1 , −1 −2 −1 ˆ −2 k+1|k+1 Xˆ k+1|k+1 = Bk+1 X k|k + Ck+1 Ak+1 k|k k|k Dk+1 Yk+1 .
The one-step ahead prediction version is Xˆ k+1|k = E[X k+1 | Yk ] = Ak+1 Xˆ k|k , k+1|k = E[(X k+1 − Xˆ k+1|k )(X k+1 − Xˆ k+1|k ) | Yk ] −2 = Ak+1 k|k Ak+1 + Bk+1 .
−2 Then, with Hk+1 = k+1|k Ck+1 (Ck+1 k+1|k Ck+1 + Dk+1 ) use of the Matrix Inversion Lemma 5.4.3 gives the Kalman filter equations in the form:
Xˆ k+1|k+1 = Ak+1 Xˆ k+1|k + Hk+1 (Yk+1 − Ck+1 Xˆ k+1|k ), −2 k+1|k+1 = k+1|k − k+1|k Ck+1 (Ck+1 k+1|k Ck+1 + Dk+1 Ck+1 k+1|k ).
Lemma 5.4.3 (Matrix Inversion Lemma) Assuming the required inverses exist, the Matrix Inversion Lemma states that: −1 −1 −1 −1 −1 (A11 − A12 A−1 = A−1 22 A21 ) 11 + A11 A12 (A22 − A21 A11 A12 ) A21 A11 .
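Lemma 5.4.3 is easy to verify numerically. A sketch with random, well-conditioned illustrative blocks (the diagonal dominance is only there to guarantee the inverses exist):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
# Illustrative blocks; dominant diagonals keep every inverse well defined
A11 = 3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
A12 = 0.2 * rng.standard_normal((n, n))
A21 = 0.2 * rng.standard_normal((n, n))
A22 = 2.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))

inv = np.linalg.inv
lhs = inv(A11 - A12 @ inv(A22) @ A21)
rhs = inv(A11) + inv(A11) @ A12 @ inv(A22 - A21 @ inv(A11) @ A12) @ A21 @ inv(A11)
err = float(np.max(np.abs(lhs - rhs)))
```

This is the identity that converts the information-form covariance update above into the gain form with $H_{k+1}$.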
5.5 The EM algorithm

The EM algorithm ([3], [9]) is a widely used iterative numerical method for computing maximum likelihood parameter estimates of partially observed models such as linear Gaussian state space models. For such models, direct computation of the MLE is difficult. The EM algorithm has the appealing property that successive iterations yield parameter estimates with nondecreasing values of the likelihood function. Suppose that we have observations $y_1,\dots,y_K$ available, where $K$ is a fixed positive integer. Let $\{P_\theta,\theta\in\Theta\}$ be a family of probability measures on $(\Omega,\mathcal F)$, all absolutely continuous with respect to a fixed probability measure $P_0$. The log-likelihood function for computing an estimate of the parameter $\theta$ based on the information available in $\mathcal Y_K$ is
$$L_K(\theta)=E_0\Big[\log\frac{dP_\theta}{dP_0}\,\Big|\,\mathcal Y_K\Big],$$
and the maximum likelihood estimate (MLE) is defined by
$$\hat\theta\in\operatorname*{argmax}_{\theta\in\Theta}L_K(\theta).$$
Let $\hat\theta_0$ be the initial parameter estimate. The EM algorithm generates a sequence of parameter estimates $\{\hat\theta_j\}$, $j\ge1$, as follows. Each iteration of the algorithm consists of two steps:

Step 1 (E-step). Set $\tilde\theta=\hat\theta_j$ and compute $Q(\theta,\tilde\theta)$, where
$$Q(\theta,\tilde\theta)=E_{\tilde\theta}\Big[\log\frac{dP_\theta}{dP_{\tilde\theta}}\,\Big|\,\mathcal Y_K\Big].$$
Step 2 (M-step). Find $\hat\theta_{j+1}\in\operatorname*{argmax}_{\theta\in\Theta}Q(\theta,\hat\theta_j)$.

Using Jensen's inequality (2.3.3) it can be shown (see Theorem 1 in [9]) that the sequence of model estimates $\{\hat\theta_j,j\ge1\}$ from the EM algorithm is such that the sequence of likelihoods $\{L_K(\hat\theta_j)\}$, $j\ge1$, is monotonically increasing, with equality if and only if $\hat\theta_{j+1}=\hat\theta_j$. Sufficient conditions for convergence of the EM algorithm are given in [37]. We briefly summarize them here. Assume that:
(i) The parameter space $\Theta$ is a subset of some finite-dimensional Euclidean space $\mathbb R^r$.
(ii) $\{\theta\in\Theta:L_K(\theta)\ge L_K(\hat\theta_0)\}$ is compact for any $L_K(\hat\theta_0)>-\infty$.
(iii) $L_K$ is continuous in $\Theta$ and differentiable in the interior of $\Theta$. (As a consequence of (i), (ii) and (iii), $L_K(\hat\theta_j)$ is clearly bounded from above.)
(iv) The function $Q(\theta,\hat\theta_j)$ is continuous in both $\theta$ and $\hat\theta_j$.
Then by Theorem 2 in [37], the sequence of EM estimates $\{\hat\theta_j\}$ converges to a stationary point $\bar\theta$ of $L_K$, and $\{L_K(\hat\theta_j)\}$ converges monotonically to $\bar L=L_K(\bar\theta)$. To make sure that $\bar L$ is a maximum value of the likelihood, it is necessary to try different initial values $\hat\theta_0$.
5.6 Discrete-time model parameter estimation

In the existing literature on parameter estimation of linear Gaussian models via the EM algorithm, filtered estimates of the relevant quantities are computed via Kalman smoothing, which requires a large-memory numerical implementation. This problem is solved in [13] by providing finite-dimensional filters for (the components of) such integral processes. The authors further show that finite-dimensional filters exist for moments of all orders of the state process. Assume that the state and observation processes are given by the vector dynamics
$$X_{k+1}=A_{k+1}X_k+B_{k+1}V_{k+1}\in\mathbb R^m,\tag{5.6.1}$$
$$Y_k=C_kX_k+D_kW_k\in\mathbb R^d.\tag{5.6.2}$$
$A_k$, $C_k$ are matrices of appropriate dimensions; $V_k$ and $W_k$ are normally distributed with means 0 and respective covariance identity matrices $I_{m\times m}$ and $I_{d\times d}$. Assume that $D_k$ is nonsingular and that $B_k$ is nonsingular and symmetric (for notational convenience). $X_0$ is a Gaussian random variable with zero mean and covariance matrix $B_0^2$ (of dimension $m\times m$). The linear model given by (5.6.1) and (5.6.2) is determined by the matrices $A$, $B$, $C$ and $D$, which need to be known. These parameters are estimated using the expectation maximization (EM) algorithm. Maximum likelihood estimation of the parameters via the EM algorithm requires computation of the filtered estimates of quantities such as
$$T_k^{(0)}=\sum_{l=0}^kX_l\otimes X_l,\quad T_k^{(1)}=\sum_{l=1}^kX_l\otimes X_{l-1},\quad T_k^{(2)}=\sum_{l=1}^kX_{l-1}\otimes X_{l-1},\quad U_k=\sum_{l=0}^kX_l\otimes Y_l.$$
Consider the time-invariant version of (5.6.1), (5.6.2):
$$X_{k+1}=AX_k+BV_{k+1}\in\mathbb R^m,\tag{5.6.3}$$
$$Y_k=CX_k+DW_k\in\mathbb R^d.\tag{5.6.4}$$
The aim is to compute ML estimates of the parameters $\theta=(A,B,C,D)$ given the observations $\mathcal Y_k=\sigma\{Y_s:s\le k\}$. This is done via the EM algorithm.
Notation Let $e_i,e_j\in\mathbb R^m$ denote unit vectors with 1 in the $i$-th and $j$-th positions, respectively. For $i,j\in\{1,\dots,m\}$,
$$T_k^{ij(0)}=\sum_{l=0}^k\langle X_l,e_i\rangle\langle X_l,e_j\rangle,\tag{5.6.5}$$
$$T_k^{ij(1)}=\sum_{l=0}^k\langle X_l,e_i\rangle\langle X_{l-1},e_j\rangle;\tag{5.6.6}$$
here $\langle\cdot,\cdot\rangle$ denotes the scalar product. Also let $f_j\in\mathbb R^d$ denote the unit vector with 1 in the $j$-th position. For $n\in\{1,\dots,d\}$ write
$$U_k^{in}=\sum_{l=0}^k\langle X_l,e_i\rangle\langle Y_l,f_n\rangle.\tag{5.6.7}$$
Note that $T_k^{ij(0)}$, $T_k^{ij(1)}$ and $U_k^{in}$ are merely the elements of the matrices $T_k^{(0)}$, $T_k^{(1)}$, $T_k^{(2)}$ and $U_k$ respectively. Now the expression for $Q(\theta,\tilde\theta)$ is derived. To update the set of parameters from $\tilde\theta$ to $\theta$, the following density is introduced:
$$\frac{dP_\theta}{dP_{\tilde\theta}}\Big|_{\mathcal G_k}=\prod_{l=0}^k\gamma_l,$$
where
$$\gamma_0=\frac{|\tilde D|\,\phi(D^{-1}(Y_0-CX_0))}{|D|\,\phi(\tilde D^{-1}(Y_0-\tilde CX_0))},\qquad
\gamma_l=\frac{|\tilde B|\,\psi(B^{-1}(X_l-AX_{l-1}))}{|B|\,\psi(\tilde B^{-1}(X_l-\tilde AX_{l-1}))}\cdot\frac{|\tilde D|\,\phi(D^{-1}(Y_l-CX_l))}{|D|\,\phi(\tilde D^{-1}(Y_l-\tilde CX_l))}.$$
Now
$$\begin{aligned}
E_{\tilde\theta}\Big[\log\frac{dP_\theta}{dP_{\tilde\theta}}\Big|_{\mathcal G_k}\,\Big|\,\mathcal Y_k\Big]
&=-k\log|B|-(k+1)\log|D|\\
&\quad-\frac12E_{\tilde\theta}\Big[\sum_{l=1}^k(X_l-AX_{l-1})'B^{-2}(X_l-AX_{l-1})\,\Big|\,\mathcal Y_k\Big]\\
&\quad-\frac12E_{\tilde\theta}\Big[\sum_{l=0}^k(Y_l-CX_l)'(DD')^{-1}(Y_l-CX_l)\,\Big|\,\mathcal Y_k\Big]+R(\tilde\theta)=Q(\theta,\tilde\theta),
\end{aligned}\tag{5.6.8}$$
where $R(\tilde\theta)$ does not involve $\theta$.
To implement the M-step, set the derivatives $\dfrac{\partial Q}{\partial\theta}=0$. This yields
$$A=E_{\tilde\theta}\Big[\sum_{l=1}^kX_l\otimes X_{l-1}\,\Big|\,\mathcal Y_k\Big]\Big(E_{\tilde\theta}\Big[\sum_{l=1}^kX_{l-1}\otimes X_{l-1}\,\Big|\,\mathcal Y_k\Big]\Big)^{-1},\tag{5.6.9}$$
$$B^2=\frac1k\,E_{\tilde\theta}\Big[\sum_{l=1}^k(X_l-AX_{l-1})\otimes(X_l-AX_{l-1})\,\Big|\,\mathcal Y_k\Big],\tag{5.6.10}$$
$$C=E_{\tilde\theta}\Big[\sum_{l=0}^kY_l\otimes X_l\,\Big|\,\mathcal Y_k\Big]\Big(E_{\tilde\theta}\Big[\sum_{l=0}^kX_l\otimes X_l\,\Big|\,\mathcal Y_k\Big]\Big)^{-1},\tag{5.6.11}$$
$$DD'=\frac1{k+1}\,E_{\tilde\theta}\Big[\sum_{l=0}^k(Y_l-CX_l)\otimes(Y_l-CX_l)\,\Big|\,\mathcal Y_k\Big].\tag{5.6.12}$$
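When the states are fully observed, the conditional expectations in (5.6.9) and (5.6.11) reduce to plain sums, which makes the structure of the M-step easy to see. A sketch (numpy; the matrices and sample size are illustrative, and the true simulated states stand in for the smoothed estimates a real EM pass would use):

```python
import numpy as np

rng = np.random.default_rng(11)
m, d, K = 2, 2, 5000
A = np.array([[0.8, 0.1], [0.0, 0.7]])
B = 0.5 * np.eye(m)
C = np.array([[1.0, 0.0], [0.5, 1.0]])
D = 0.3 * np.eye(d)

# Simulate (5.6.3)-(5.6.4) with X_0 ~ N(0, B^2)
X = np.zeros((K + 1, m)); Y = np.zeros((K + 1, d))
X[0] = B @ rng.standard_normal(m)
Y[0] = C @ X[0] + D @ rng.standard_normal(d)
for k in range(K):
    X[k + 1] = A @ X[k] + B @ rng.standard_normal(m)
    Y[k + 1] = C @ X[k + 1] + D @ rng.standard_normal(d)

# M-step updates (5.6.9) and (5.6.11), with E[. | Y_k] replaced by exact sums
T1 = X[1:].T @ X[:-1]       # sum of X_l (x) X_{l-1}
T2 = X[:-1].T @ X[:-1]      # sum of X_{l-1} (x) X_{l-1}
A_hat = T1 @ np.linalg.inv(T2)
C_hat = (Y.T @ X) @ np.linalg.inv(X.T @ X)
```

Both estimates recover the generating matrices to within sampling error; the finite-dimensional filters below supply exactly these sums in filtered form when the states are hidden.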
Next, finite-dimensional recursive filters for $T_k^{ij(0)}$, $T_k^{ij(1)}$ and $U_k^{in}$ are derived; that is, these filters can be described in terms of a finite number of statistics.

5.7 Finite-dimensional filters

Initially, assume that all processes are defined on an "ideal" probability space $(\Omega,\mathcal F,\bar P)$. Suppose that under $\bar P$:
1. $\{X_k\}$, $k\in\mathbb N$, is an i.i.d. $N(0,I_{m\times m})$ sequence with density function $\psi$;
2. $\{Y_k\}$, $k\in\mathbb N$, is an i.i.d. $N(0,I_{d\times d})$ sequence with density function $\phi$.
Write $\lambda_0=\dfrac{\phi(D_0^{-1}(Y_0-C_0X_0))}{|D_0|\,\phi(Y_0)}$. For each $l=1,2,\dots$ define
$$\lambda_l=\frac{\phi(D_l^{-1}(Y_l-C_lX_l))}{|D_l|\,\phi(Y_l)}\cdot\frac{\psi(B_l^{-1}(X_l-A_lX_{l-1}))}{|B_l|\,\psi(X_l)}.$$
For each $k\ge0$ set $\Lambda_k=\prod_{l=0}^k\lambda_l$. Let $\mathcal G_k$ be the complete $\sigma$-field generated by $\{X_0,X_1,\dots,X_k,Y_0,Y_1,\dots,Y_k\}$ for $k\in\mathbb N$. Define $P$ on $(\Omega,\mathcal F)$ by setting the restriction of the Radon–Nikodym derivative $\frac{dP}{d\bar P}$ to $\mathcal G_k$ equal to $\Lambda_k$. Then:

Lemma 5.7.1 On $(\Omega,\mathcal F)$ and under $P$, $\{V_k\}$ and $\{W_k\}$, $k\in\mathbb N$, are i.i.d. $N(0,I_{d\times d})$ and $N(0,I_{m\times m})$ sequences respectively, where
$$V_l=D_l^{-1}(Y_l-C_lX_l),\qquad W_l=B_l^{-1}(X_l-A_lX_{l-1}).$$

Definition 5.7.2 Define the measure-valued processes
$$\begin{aligned}
\alpha_k(x)&=\bar E[\Lambda_kI(X_k\in dx)\mid\mathcal Y_k],\\
\beta_k^{ij(M)}(x)&=\bar E[\Lambda_kT_k^{ij(M)}I(X_k\in dx)\mid\mathcal Y_k],\qquad M=0,1,2,\\
\delta_k^{in}(x)&=\bar E[\Lambda_kU_k^{in}I(X_k\in dx)\mid\mathcal Y_k].
\end{aligned}\tag{5.7.1}$$
Then for any "test" function $g:\mathbb R^m\to\mathbb R$, write
$$\begin{aligned}
\bar E[\Lambda_kg(X_k)\mid\mathcal Y_k]&=\int_{\mathbb R^m}\alpha_k(x)g(x)\,dx,\\
\bar E[\Lambda_kT_k^{ij(M)}g(X_k)\mid\mathcal Y_k]&=\int_{\mathbb R^m}\beta_k^{ij(M)}(x)g(x)\,dx,\qquad M=0,1,2,\\
\bar E[\Lambda_kU_k^{in}g(X_k)\mid\mathcal Y_k]&=\int_{\mathbb R^m}\delta_k^{in}(x)g(x)\,dx.
\end{aligned}\tag{5.7.2}$$
The following theorem ([13]) gives recursive expressions for the unnormalized densities $\alpha_k(x)$, $\beta_k^{ij(M)}(x)$, $M=0,1,2$, and $\delta_k^{in}(x)$. The recursions are derived under the measure $\bar P$, where $\{X_l\}$ and $\{Y_l\}$, $l=0,1,\dots$, are independent sequences of random variables.

Theorem 5.7.3 For $k=0,1,\dots$ the unnormalized densities $\alpha_k(x)$, $\beta_k^{ij(M)}(x)$, $M=0,1,2$, and $\delta_k^{in}(x)$ defined by (5.7.1) are given by the following recursions:
$$\alpha_k(x)=\Psi(x,Y_k)\int_{\mathbb R^m}\alpha_{k-1}(z)\psi(B_k^{-1}(x-A_kz))\,dz,\tag{5.7.3}$$
$$\beta_k^{ij(0)}(x)=\Psi(x,Y_k)\Big[\int_{\mathbb R^m}\beta_{k-1}^{ij(0)}(z)\psi(B_k^{-1}(x-A_kz))\,dz+\langle x,e_i\rangle\langle x,e_j\rangle\int_{\mathbb R^m}\alpha_{k-1}(z)\psi(B_k^{-1}(x-A_kz))\,dz\Big],\tag{5.7.4}$$
$$\beta_k^{ij(1)}(x)=\Psi(x,Y_k)\Big[\int_{\mathbb R^m}\beta_{k-1}^{ij(1)}(z)\psi(B_k^{-1}(x-A_kz))\,dz+\langle x,e_i\rangle\int_{\mathbb R^m}\langle z,e_j\rangle\alpha_{k-1}(z)\psi(B_k^{-1}(x-A_kz))\,dz\Big],\tag{5.7.5}$$
$$\beta_k^{ij(2)}(x)=\Psi(x,Y_k)\Big[\int_{\mathbb R^m}\beta_{k-1}^{ij(2)}(z)\psi(B_k^{-1}(x-A_kz))\,dz+\int_{\mathbb R^m}\langle z,e_i\rangle\langle z,e_j\rangle\alpha_{k-1}(z)\psi(B_k^{-1}(x-A_kz))\,dz\Big],\tag{5.7.6}$$
$$\delta_k^{in}(x)=\Psi(x,Y_k)\Big[\int_{\mathbb R^m}\delta_{k-1}^{in}(z)\psi(B_k^{-1}(x-A_kz))\,dz+\langle x,e_i\rangle\langle Y_k,f_n\rangle\int_{\mathbb R^m}\alpha_{k-1}(z)\psi(B_k^{-1}(x-A_kz))\,dz\Big],\tag{5.7.7}$$
where
$$\Psi(x,Y_k)=\frac{\phi(D_k^{-1}(Y_k-C_kx))}{|B_k|\,|D_k|\,\phi(Y_k)}.$$
(5.7.8)
IRm
IRm
in δk−1 (z)ψ(Bk−1 (x − Ak z))dz + x, ei Yk , f n αk (x).
(5.7.9)
2. The above theorem does not require Vl and Wl to be Gaussian. The recursions (5.7.3), (5.7.4) and (5.7.7) hold for arbitrary densities ψ and φ as long as φ is strictly positive. 3. Initial conditions: Note that at k = 0, the following holds for any Borel “test” function g(x). φ(D0−1 (Y0 − C0 x)) E[0 g(x) | Y0 ] = E g(x) | Y0 |D0 |φ(Y0 ) 1 = φ(D0−1 (Y0 − C0 x))ψ(x)g(x)dx. |D0 |φ(Y0 ) IRm (5.7.10) Equating (5.7.1) and (5.7.10) yields α0 (x) =
φ(D0−1 (Y0 − C0 x)) ψ(x). |D0 |φ(Y0 ) i j(M)
Similarly the initial conditions for βk i j(0)
β0
(x) M = 0, 1, 2 and δkin (x) are
(x) = x, ei x, e j α0 (x),
i j(2) β0 (x)
= 0,
δ0in (x)
(5.7.11)
i j(1)
β0
(x) = 0,
= x, ei Y0 , f n α0 (x). (5.7.12) i j(M)
4. Theorems 5.2 and 5.3 in [13] derive finite-dimensional filters for Tk , M = 0, 1, 2, i j(M) and Ukin defined in (5.6.5), (5.6.6), and (5.6.7). In particular, the densities αk , βk , in M = 0, 1, 2, and δk are characterized in terms of a finite number of statistics. Recall from Section 5.4 that αk is an unnormalized normal density with mean µk = E[X k | Yk ] and variance Rk = E[(X k − µk )(X k − µk ) | Yk ] given by the well-known Kalman filter equations. −1 µk = Rk Bk−2 Ak σk−1 Rk−1 µk−1 + Rk Ck (Dk Dk )−1 Yk , −1 Rk = (Ak Rk−1 Ak + Bk2 )−1 + Ck (Dk Dk )−1 Ck .
(5.7.13) (5.7.14)
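The recursions (5.7.13) and (5.7.14) are the information form of the Kalman filter. As a quick consistency check, the sketch below compares one step of this form with the familiar predict–correct step; all matrices are invented test values, and `SB = B B'`, `SD = D D'` stand in for the book's B_k² and D_kD_k'.

```python
import numpy as np

def kf_standard(mu, R, A, SB, C, SD, y):
    """One standard Kalman predict-correct step.
    SB = B B' (state noise covariance), SD = D D' (observation noise covariance)."""
    m_pred = A @ mu
    P_pred = A @ R @ A.T + SB
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + SD)
    mu_new = m_pred + K @ (y - C @ m_pred)
    R_new = P_pred - K @ C @ P_pred
    return mu_new, R_new

def kf_information(mu, R, A, SB, C, SD, y):
    """Information-form step as in (5.7.13)-(5.7.14):
    sigma = A' SB^{-1} A + R^{-1},
    R_new^{-1} = (A R A' + SB)^{-1} + C' SD^{-1} C,
    mu_new = R_new (SB^{-1} A sigma^{-1} R^{-1} mu + C' SD^{-1} y)."""
    SBi, SDi, Ri = np.linalg.inv(SB), np.linalg.inv(SD), np.linalg.inv(R)
    sigma = A.T @ SBi @ A + Ri
    R_new = np.linalg.inv(np.linalg.inv(A @ R @ A.T + SB) + C.T @ SDi @ C)
    mu_new = R_new @ (SBi @ A @ np.linalg.inv(sigma) @ Ri @ mu + C.T @ SDi @ y)
    return mu_new, R_new

A = np.array([[0.9, 0.1], [0.0, 0.8]])
SB = 0.25 * np.eye(2)                      # B = 0.5 I, so SB = B B'
C = np.array([[1.0, 0.0]])
SD = np.array([[0.04]])                    # D = 0.2
mu, R = np.array([1.0, -0.5]), np.array([[0.3, 0.05], [0.05, 0.2]])
y = np.array([0.7])

m1, R1 = kf_standard(mu, R, A, SB, C, SD, y)
m2, R2 = kf_information(mu, R, A, SB, C, SD, y)
assert np.allclose(m1, m2) and np.allclose(R1, R2)
```

The agreement rests on the push-through identity (A R A' + Σ_B)⁻¹ A = Σ_B⁻¹ A σ⁻¹ R⁻¹, which is also what makes (5.7.13) equivalent to the "standard" form (5.7.58) below.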
Here σ_k is a symmetric m × m matrix defined as
$$
\sigma_k = A_k' B_k^{-2} A_k + R_{k-1}^{-1}. \tag{5.7.15}
$$
Due to the presence of the quadratic term ⟨x, e_i⟩⟨x, e_j⟩, the density β_k^{ij(0)} in (5.7.8) is not Gaussian. However, as will be shown below (Theorem 5.2 in [13]), it can be expressed as a quadratic expression in x multiplying α_k(x) for all k. The important conclusion is that updating the coefficients of this quadratic expression, together with the Kalman filter above, gives finite-dimensional filters for computing T_k^{ij(0)}. A similar result also holds for T_k^{ij(1)}, T_k^{ij(2)} and U_k^{in}. Theorems 5.7.5 and 5.7.6 below derive finite-dimensional sufficient statistics for the densities β_k^{ij(M)}, M = 0, 1, 2, and δ_k^{in}. To simplify notation, define
$$
\Gamma_k = B_k^{-2} A_k \sigma_k^{-1}, \qquad S_k = \sigma_{k+1}^{-1} R_k^{-1} \mu_k. \tag{5.7.16}
$$
Theorem 5.7.5 At time k, the density β_k^{ij(M)}(x) (initialized according to (5.7.12)) is completely defined by the five statistics a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, R_k and μ_k as follows:
$$
\beta_k^{ij(M)}(x) = \bigl(a_k^{ij(M)} + b_k^{ij(M)\prime}x + x'd_k^{ij(M)}x\bigr)\,\alpha_k(x), \qquad M = 0, 1, 2, \tag{5.7.17}
$$
where a_k^{ij(M)} ∈ ℝ, b_k^{ij(M)} ∈ ℝ^m, and d_k^{ij(M)} ∈ ℝ^{m×m} is a symmetric matrix with elements d_k^{ij(M)}(p, q), p = 1, ..., m, q = 1, ..., m. Furthermore, a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, M = 0, 1, 2, are given by the following recursions:
$$
a_{k+1}^{ij(0)} = a_k^{ij(0)} + b_k^{ij(0)\prime}S_k + \mathrm{Tr}\bigl[d_k^{ij(0)}\sigma_{k+1}^{-1}\bigr] + S_k'd_k^{ij(0)}S_k, \qquad a_0^{ij(0)} = 0, \tag{5.7.18}
$$
$$
b_{k+1}^{ij(0)} = \Gamma_{k+1}\bigl(b_k^{ij(0)} + 2d_k^{ij(0)}S_k\bigr), \qquad b_0^{ij(0)} = 0_{m\times1}, \tag{5.7.19}
$$
$$
d_{k+1}^{ij(0)} = \Gamma_{k+1}d_k^{ij(0)}\Gamma_{k+1}' + \frac{e_ie_j' + e_je_i'}{2}, \qquad d_0^{ij(0)} = \frac{e_ie_j' + e_je_i'}{2}, \tag{5.7.20}
$$
$$
a_{k+1}^{ij(1)} = a_k^{ij(1)} + b_k^{ij(1)\prime}S_k + \mathrm{Tr}\bigl[d_k^{ij(1)}\sigma_{k+1}^{-1}\bigr] + S_k'd_k^{ij(1)}S_k, \qquad a_0^{ij(1)} = 0, \tag{5.7.21}
$$
$$
b_{k+1}^{ij(1)} = \Gamma_{k+1}\bigl(b_k^{ij(1)} + 2d_k^{ij(1)}S_k\bigr) + e_ie_j'S_k, \qquad b_0^{ij(1)} = 0_{m\times1}, \tag{5.7.22}
$$
$$
d_{k+1}^{ij(1)} = \Gamma_{k+1}d_k^{ij(1)}\Gamma_{k+1}' + \frac12\bigl(e_ie_j'\Gamma_{k+1}' + \Gamma_{k+1}e_je_i'\bigr), \qquad d_0^{ij(1)} = 0_{m\times m}, \tag{5.7.23}
$$
$$
a_{k+1}^{ij(2)} = a_k^{ij(2)} + b_k^{ij(2)\prime}S_k + \mathrm{Tr}\bigl[d_k^{ij(2)}\sigma_{k+1}^{-1}\bigr] + S_k'\bigl(d_k^{ij(2)} + e_ie_j'\bigr)S_k + \mathrm{Tr}\bigl[e_ie_j'\sigma_{k+1}^{-1}\bigr], \qquad a_0^{ij(2)} = 0, \tag{5.7.24}
$$
$$
b_{k+1}^{ij(2)} = \Gamma_{k+1}\bigl(b_k^{ij(2)} + (2d_k^{ij(2)} + 2e_ie_j')S_k\bigr), \qquad b_0^{ij(2)} = 0_{m\times1}, \tag{5.7.25}
$$
$$
d_{k+1}^{ij(2)} = \Gamma_{k+1}\Bigl(d_k^{ij(2)} + \frac{e_ie_j' + e_je_i'}{2}\Bigr)\Gamma_{k+1}', \qquad d_0^{ij(2)} = 0_{m\times m}, \tag{5.7.26}
$$
where Tr[.] denotes the trace of a matrix (which is the sum of the diagonal elements), σk is defined in (5.7.15) and µk , Rk are obtained from the Kalman filter (5.7.13) and (5.7.14).
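The updates (5.7.18)–(5.7.20) can be iterated directly. The following sketch runs the M = 0 recursion with randomly generated stand-ins for Γ_{k+1}, S_k and σ_{k+1}^{-1} (in the actual filter these come from (5.7.15) and (5.7.16)) and checks that d_k stays symmetric, as Theorem 5.7.5 requires.

```python
import numpy as np

rng = np.random.default_rng(0)
m, i, j = 3, 0, 1
ei, ej = np.eye(m)[i], np.eye(m)[j]
sym = 0.5 * (np.outer(ei, ej) + np.outer(ej, ei))

# Initial conditions from (5.7.12)/(5.7.17): a0 = 0, b0 = 0, d0 = sym.
a, b, d = 0.0, np.zeros(m), sym.copy()

for _ in range(20):
    # Illustrative (made-up) values standing in for Gamma_{k+1}, S_k and
    # sigma_{k+1}^{-1}; in the filter these come from (5.7.15) and (5.7.16).
    G = rng.normal(scale=0.3, size=(m, m))
    S = rng.normal(size=m)
    sig_inv = np.eye(m) + 0.1 * np.ones((m, m))   # symmetric positive definite
    a = a + b @ S + np.trace(d @ sig_inv) + S @ d @ S        # (5.7.18)
    b = G @ (b + 2.0 * d @ S)                                # (5.7.19)
    d = G @ d @ G.T + sym                                    # (5.7.20)

# d_k stays symmetric for every k, so beta_k^{ij(0)} is a genuine quadratic.
assert np.allclose(d, d.T)
```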
Proof Only the proof of the theorem for M = 0 is given; the proofs for M = 1, 2 are similar and left as an exercise.
From (5.7.12), at time k = 0, β_0^{ij(0)}(x) is of the form (5.7.17) with a_0^{ij(0)} = 0, b_0^{ij(0)} = 0 and d_0^{ij(0)} = (e_ie_j' + e_je_i')/2. Assume that (5.7.17) holds at time k. Then at time k + 1, using (5.7.17) and the recursion (5.7.8), it follows that
$$
\beta_{k+1}^{ij(0)}(x) = \Lambda(x, Y_{k+1})\int_{\mathbb{R}^m}\bigl(a_k^{ij(0)} + b_k^{ij(0)\prime}z + z'd_k^{ij(0)}z\bigr)\alpha_k(z)\,\psi\bigl(B_{k+1}^{-1}(x - A_{k+1}z)\bigr)\,dz + \langle x, e_i\rangle\langle x, e_j\rangle\,\alpha_{k+1}(x). \tag{5.7.27}
$$
Denote the first term on the right-hand side of (5.7.27) by I_1:
$$
I_1 = K(x)\int_{\mathbb{R}^m}\exp\Bigl(-\frac12\bigl\{(x - A_{k+1}z)'B_{k+1}^{-2}(x - A_{k+1}z) + (z - \mu_k)'R_k^{-1}(z - \mu_k)\bigr\}\Bigr)\bigl(a_k^{ij(0)} + b_k^{ij(0)\prime}z + z'd_k^{ij(0)}z\bigr)\,dz
$$
$$
= K_1(x)\int_{\mathbb{R}^m}\exp\Bigl(-\frac12\bigl(z'\sigma_{k+1}z - \xi_{k+1}z\bigr)\Bigr)\bigl(a_k^{ij(0)} + b_k^{ij(0)\prime}z + z'd_k^{ij(0)}z\bigr)\,dz, \tag{5.7.28}
$$
where σ_{k+1} is defined in (5.7.15) and
$$
\xi_{k+1} = 2\bigl(x'B_{k+1}^{-2}A_{k+1} + \mu_k'R_k^{-1}\bigr), \qquad \bar\alpha_k = \int_{\mathbb{R}^m}\alpha_k(z)\,dz,
$$
$$
K(x) = \Lambda(x, Y_{k+1})\,(2\pi)^{-m}\,|B_{k+1}|^{-1}|R_k|^{-1/2}\,\bar\alpha_k, \qquad K_1(x) = K(x)\exp\Bigl(-\frac12\bigl(x'B_{k+1}^{-2}x + \mu_k'R_k^{-1}\mu_k\bigr)\Bigr). \tag{5.7.29}
$$
Completing the "square" in the exponential term in (5.7.28) yields
$$
I_1 = K_1(x)\exp\Bigl(\frac{\xi_{k+1}\sigma_{k+1}^{-1}\xi_{k+1}'}{8}\Bigr)\int_{\mathbb{R}^m}\exp\Bigl(-\frac12\Bigl(z - \frac{\sigma_{k+1}^{-1}\xi_{k+1}'}{2}\Bigr)'\sigma_{k+1}\Bigl(z - \frac{\sigma_{k+1}^{-1}\xi_{k+1}'}{2}\Bigr)\Bigr)\bigl(a_k^{ij(0)} + b_k^{ij(0)\prime}z + z'd_k^{ij(0)}z\bigr)\,dz. \tag{5.7.30}
$$
Now consider the integral in (5.7.30):
$$
\int_{\mathbb{R}^m}\exp\Bigl(-\frac12\Bigl(z - \frac{\sigma_{k+1}^{-1}\xi_{k+1}'}{2}\Bigr)'\sigma_{k+1}\Bigl(z - \frac{\sigma_{k+1}^{-1}\xi_{k+1}'}{2}\Bigr)\Bigr)\bigl(a_k^{ij(0)} + b_k^{ij(0)\prime}z + z'd_k^{ij(0)}z\bigr)\,dz = (2\pi)^{m/2}|\sigma_{k+1}|^{-1/2}\bigl(a_k^{ij(0)} + b_k^{ij(0)\prime}E[z] + E[z'd_k^{ij(0)}z]\bigr), \tag{5.7.31}
$$
since the exponential term is an unnormalized Gaussian density in z with normalization constant (2π)^{m/2}|σ_{k+1}|^{-1/2}. So
$$
E[z] = \frac{\sigma_{k+1}^{-1}\xi_{k+1}'}{2}, \tag{5.7.32}
$$
$$
E[z'd_k^{ij(0)}z] = E\bigl[(z - E[z])'d_k^{ij(0)}(z - E[z])\bigr] + E[z]'d_k^{ij(0)}E[z] = \mathrm{Tr}\bigl[d_k^{ij(0)}\sigma_{k+1}^{-1}\bigr] + \frac14\bigl(\sigma_{k+1}^{-1}\xi_{k+1}'\bigr)'d_k^{ij(0)}\bigl(\sigma_{k+1}^{-1}\xi_{k+1}'\bigr). \tag{5.7.33}
$$
Therefore from (5.7.30), (5.7.31), (5.7.32), (5.7.33) and (5.7.27) it follows that
$$
\beta_{k+1}^{ij(0)}(x) = \alpha_{k+1}(x)\Bigl(a_k^{ij(0)} + \frac12 b_k^{ij(0)\prime}\sigma_{k+1}^{-1}\xi_{k+1}' + \mathrm{Tr}\bigl[d_k^{ij(0)}\sigma_{k+1}^{-1}\bigr] + \frac14\,\xi_{k+1}\sigma_{k+1}^{-1}d_k^{ij(0)}\sigma_{k+1}^{-1}\xi_{k+1}' + x'e_ie_j'x\Bigr). \tag{5.7.34}
$$
Substituting for ξ_{k+1} (which is affine in x) yields
$$
\beta_{k+1}^{ij(0)}(x) = \bigl(a_{k+1}^{ij(0)} + b_{k+1}^{ij(0)\prime}x + x'd_{k+1}^{ij(0)}x\bigr)\alpha_{k+1}(x),
$$
where a_{k+1}^{ij(0)}, b_{k+1}^{ij(0)} and d_{k+1}^{ij(0)} are given by (5.7.18), (5.7.19) and (5.7.20).

The proof of the following theorem (Theorem 5.3 in [13]) is very similar and hence omitted.

Theorem 5.7.6 At time k, the density δ_k^{in}(x) (initialized according to (5.7.12)) is completely defined by the four statistics ā_k^{in}, b̄_k^{in}, R_k and μ_k as follows:
$$
\delta_k^{in}(x) = \bigl(\bar a_k^{in} + \bar b_k^{in\prime}x\bigr)\alpha_k(x), \tag{5.7.35}
$$
where ā_k^{in} ∈ ℝ and b̄_k^{in} ∈ ℝ^m are given by the following recursions:
$$
\bar a_{k+1}^{in} = \bar a_k^{in} + \bar b_k^{in\prime}S_k, \qquad \bar a_0^{in} = 0, \tag{5.7.36}
$$
$$
\bar b_{k+1}^{in} = \Gamma_{k+1}\bar b_k^{in} + e_i\langle Y_{k+1}, f_n\rangle, \qquad \bar b_0^{in} = e_i\langle Y_0, f_n\rangle, \tag{5.7.37}
$$
where Γ_k and S_k are defined in (5.7.16) and μ_k, R_k are obtained from the Kalman filter (5.7.13) and (5.7.14).
Having characterized the densities β_k^{ij(M)}, M = 0, 1, 2, and δ_k^{in} by their finite-dimensional sufficient statistics, finite-dimensional filters for T_k^{ij(M)} and U_k^{in} are now derived.

Theorem 5.7.7 (Theorem 5.4 in [13]) Finite-dimensional filters for T_k^{ij(M)}, M = 0, 1, 2, and U_k^{in} are given by
$$
E\bigl[T_k^{ij(M)}\mid\mathcal{Y}_k\bigr] = a_k^{ij(M)} + b_k^{ij(M)\prime}\mu_k + \mathrm{Tr}\bigl[d_k^{ij(M)}R_k\bigr] + \mu_k'd_k^{ij(M)}\mu_k, \tag{5.7.38}
$$
$$
E\bigl[U_k^{in}\mid\mathcal{Y}_k\bigr] = \bar a_k^{in} + \bar b_k^{in\prime}\mu_k. \tag{5.7.39}
$$
Proof Using the abstract Bayes' Theorem (4.1.1),
$$
E\bigl[T_k^{ij(M)}\mid\mathcal{Y}_k\bigr] = \frac{\bar E\bigl[\Lambda_kT_k^{ij(M)}\mid\mathcal{Y}_k\bigr]}{\bar E\bigl[\Lambda_k\mid\mathcal{Y}_k\bigr]} = \frac1K\int_{\mathbb{R}^m}\beta_k^{ij(M)}(x)\,dx, \tag{5.7.40}
$$
where the constant K = ∫_{ℝ^m} α_k(x) dx. But since α_k(x) is an unnormalized Gaussian density with mean μ_k and covariance R_k, from (5.7.17),
$$
\int_{\mathbb{R}^m}\beta_k^{ij(M)}(x)\,dx = K\,E\bigl[a_k^{ij(M)} + b_k^{ij(M)\prime}x + x'd_k^{ij(M)}x\bigr] = K\bigl(a_k^{ij(M)} + b_k^{ij(M)\prime}\mu_k + \mathrm{Tr}\bigl[d_k^{ij(M)}R_k\bigr] + \mu_k'd_k^{ij(M)}\mu_k\bigr). \tag{5.7.41}
$$
Substituting in (5.7.40) proves the theorem. The proof of (5.7.39) is left as an exercise.
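Step (5.7.41) rests on the Gaussian identity E[a + b'x + x'dx] = a + b'μ + Tr[dR] + μ'dμ. A minimal Monte Carlo check, with invented values for a, b, d, μ and R:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])
R = np.array([[0.6, 0.2], [0.2, 0.4]])
a, b = 0.3, np.array([1.0, -2.0])
d = np.array([[1.0, 0.5], [0.5, 2.0]])

# Closed form used in (5.7.41).
closed = a + b @ mu + np.trace(d @ R) + mu @ d @ mu

# Monte Carlo over x ~ N(mu, R).
x = rng.multivariate_normal(mu, R, size=400_000)
mc = np.mean(a + x @ b + np.einsum('ni,ij,nj->n', x, d, x))

assert abs(mc - closed) < 0.03
```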
Remark 5.7.8 Theorem 5.7.7 gives finite-dimensional filters for the time sum of the states, U_k^{in}, and the time sum of the squares of the states, T_k^{ij}. Theorem 6.2 in [13] shows that finite-dimensional filters exist for the time sum of any integral power of the states. For notational simplicity, assume that the state and observation processes are scalar-valued, i.e. m = d = 1 in (5.6.1) and (5.6.2). Let T_k = \sum_{i=0}^k X_i^p and define the unnormalized density β_k(x) = Ē[Λ_kT_kI(X_k ∈ dx) | Y_k]. The first step is to obtain a recursion for β_k(x). It can easily be shown that
$$
\beta_k(x) = \Lambda(x, Y_k)\int_{\mathbb{R}}\beta_{k-1}(z)\,\psi\bigl(B_k^{-1}(x - A_kz)\bigr)\,dz + x^p\,\alpha_k(x), \tag{5.7.42}
$$
where
$$
\Lambda(x, Y_k) = \frac{\phi\bigl(D_k^{-1}(Y_k - C_kx)\bigr)}{B_kD_k\,\phi(Y_k)}.
$$
Next, β_k(x) is characterized in terms of finitely many sufficient statistics, as in Theorem 5.7.3. For p = 1 and p = 2, Theorems 5.7.6 and 5.7.5 give finite-dimensional sufficient statistics. Theorem 6.2 in [13] shows that β_k can be characterized in terms of finite-dimensional statistics for any p ∈ ℕ.
Theorem 5.7.9 ([13], Theorem 6.2) At time k, the density β_k(x) in (5.7.42) is completely defined by the p + 3 statistics a_k(0), a_k(1), ..., a_k(p), R_k and μ_k as follows:
$$
\beta_k(x) = \Bigl(\sum_{i=0}^{p}a_k(i)\,x^i\Bigr)\alpha_k(x), \tag{5.7.43}
$$
where
$$
a_{k+1}(n) = \sum_{i=n}^{p}\sum_{j=n}^{i}a_k(i)\,\eta_{ij}\binom{j}{n}\bigl(R_k^{-1}\mu_k\bigr)^{j-n}\bigl(A_{k+1}B_{k+1}^{-2}\bigr)^{n}, \quad 0 \le n < p, \qquad a_{k+1}(p) = 1 + a_k(p)\,\eta_{pp}\bigl(A_{k+1}B_{k+1}^{-2}\bigr)^{p}, \tag{5.7.44}
$$
and
$$
\eta_{ij} = \begin{cases}\dbinom{i}{j}\,1\cdot3\cdots(i-j-1)\,\sigma_{k+1}^{-(i+j)/2} & \text{if } i-j \text{ is even}, \ i > j,\\[4pt] 0 & \text{if } i-j \text{ is odd}, \ i > j,\\[4pt] \sigma_{k+1}^{-j} & \text{if } i = j.\end{cases} \tag{5.7.45}
$$
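The coefficients η_ij in (5.7.45) package the Gaussian central moments of (5.7.50). As a sanity check on the double-factorial pattern, the sketch below assembles E[z^p] from the same binomial expansion and compares it with the known closed forms for p = 2 and p = 4; the Gaussian parameters are invented.

```python
import numpy as np
from math import comb

def moment(p, mean, var):
    """E[z^p] for z ~ N(mean, var), via the binomial expansion used in
    (5.7.48): central moments are 0 (odd order) or
    1*3*...*(r-1) * var^(r/2) (even order r)."""
    total = 0.0
    for j in range(p + 1):
        r = p - j
        if r % 2 == 1:
            continue
        dfact = np.prod(np.arange(1, r, 2)) if r > 0 else 1.0  # 1*3*...*(r-1)
        total += comb(p, j) * dfact * var ** (r // 2) * mean ** j
    return total

mean, var = 0.7, 0.3
# Known closed forms: E[z^2] = mean^2 + var,
#                     E[z^4] = mean^4 + 6 mean^2 var + 3 var^2.
assert np.isclose(moment(2, mean, var), mean**2 + var)
assert np.isclose(moment(4, mean, var), mean**4 + 6*mean**2*var + 3*var**2)
```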
Proof At time k = 0, β_0(x) = x^pα_0(x), which satisfies (5.7.43). Assume that (5.7.43) holds at time k. Then at time k + 1, using arguments similar to Theorem 5.7.5, it follows that
$$
\beta_{k+1}(x) = \Lambda(x, Y_{k+1})\int_{\mathbb{R}}\psi\bigl(B_{k+1}^{-1}(x - A_{k+1}z)\bigr)\Bigl(\sum_{i=0}^{p}a_k(i)z^i\Bigr)\alpha_k(z)\,dz + x^p\,\alpha_{k+1}(x). \tag{5.7.46}
$$
Denote the first term on the RHS by I_1. Completing the square as in (5.7.30),
$$
I_1 = K_1(x)\exp\Bigl(\frac{\sigma_{k+1}^{-1}\delta_{k+1}^2}{8}\Bigr)\int_{\mathbb{R}}\exp\Bigl(-\frac{\sigma_{k+1}}{2}\Bigl(z - \frac{\sigma_{k+1}^{-1}\delta_{k+1}}{2}\Bigr)^2\Bigr)\sum_{i=0}^{p}a_k(i)z^i\,dz, \tag{5.7.47}
$$
where δ_{k+1} = 2(B_{k+1}^{-2}A_{k+1}x + R_k^{-1}μ_k) is the scalar analogue of ξ_{k+1} in (5.7.29). The integral in (5.7.47) is
$$
(2\pi)^{1/2}|\sigma_{k+1}|^{-1/2}\,E\Bigl[\sum_{i=0}^{p}a_k(i)z^i\Bigr] = (2\pi)^{1/2}|\sigma_{k+1}|^{-1/2}\sum_{i=0}^{p}a_k(i)\sum_{j=0}^{i}\binom{i}{j}E\bigl[(z - E[z])^{i-j}\bigr]\bigl(E[z]\bigr)^j. \tag{5.7.48}
$$
Recall from (5.7.32) that E[z] is affine in x:
$$
E[z] = \sigma_{k+1}^{-1}\bigl(R_k^{-1}\mu_k + A_{k+1}B_{k+1}^{-2}x\bigr). \tag{5.7.49}
$$
Also, E[(z − E[z])^{i-j}] is independent of x. Indeed ([31], p. 111),
$$
E\bigl[(z - E[z])^{i-j}\bigr] = \begin{cases}0 & \text{if } i-j \text{ is odd}, \ i > j,\\ 1\cdot3\cdots(i-j-1)\,\sigma_{k+1}^{-(i-j)/2} & \text{if } i-j \text{ is even}, \ i > j,\\ 1 & \text{if } i = j.\end{cases} \tag{5.7.50}
$$
Thus
$$
\beta_{k+1}(x) = \alpha_{k+1}(x)\Bigl(\sum_{i=0}^{p}\sum_{j=0}^{i}\sum_{n=0}^{j}a_k(i)\,\eta_{ij}\binom{j}{n}\bigl(R_k^{-1}\mu_k\bigr)^{j-n}\bigl(A_{k+1}B_{k+1}^{-2}\bigr)^{n}x^n + x^p\Bigr)
$$
$$
= \alpha_{k+1}(x)\Bigl(\sum_{n=0}^{p}\Bigl[\sum_{i=n}^{p}\sum_{j=n}^{i}a_k(i)\,\eta_{ij}\binom{j}{n}\bigl(R_k^{-1}\mu_k\bigr)^{j-n}\bigl(A_{k+1}B_{k+1}^{-2}\bigr)^{n}\Bigr]x^n + x^p\Bigr). \tag{5.7.51}
$$
Equation (5.7.51) is of the form (5.7.43), with a_{k+1}(i), i = 0, ..., p, given by (5.7.44).
Remark 5.7.10 The filters derived in Theorems 5.7.3, 5.7.5 and 5.7.6 have one major drawback: they require B_k to be invertible, and in practice B_k is often not invertible. A simple transformation expresses the filters in terms of the inverse of the predicted Kalman covariance matrix. This inverse exists, even when B_k is singular, as long as a certain uniform complete controllability condition holds. Both the condition and the transformation are well known in the Kalman filter literature (Chapter 7 in [20]).

First define the Kalman predicted state estimate μ_{k|k−1} = E[X_k | Y_{k−1}] and the predicted state covariance R_{k|k−1} = E[(X_k − μ_{k|k−1})(X_k − μ_{k|k−1})′ | Y_{k−1}]. It is left as an exercise to show that
$$
R_{k|k-1} = B_k^2 + A_kR_{k-1}A_k'. \tag{5.7.52}
$$
The first step is to provide a sufficient condition for R_{k|k−1} to be nonsingular.

Definition 5.7.11 ([20], Chapter 7) The state space model (5.6.1), (5.6.2) is said to be uniformly completely controllable if there exist a positive integer N_1 and positive constants α, β such that
$$
\alpha I \le \mathcal{C}(k, k - N_1) \le \beta I \qquad \text{for all } k \ge N_1. \tag{5.7.53}
$$
Here
$$
\mathcal{C}(k, k - N_1) = \sum_{l=k-N_1}^{k}\phi(k, l+1)\,B_lB_l'\,\phi(k, l+1)', \tag{5.7.54}
$$
$$
\phi(k_2, k_1) = \begin{cases}A_{k_2}A_{k_2-1}\cdots A_{k_1+1} & \text{if } k_2 > k_1,\\ I & \text{if } k_2 = k_1.\end{cases} \tag{5.7.55}
$$
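The point of Definition 5.7.11 is that the Gramian (5.7.54) can be positive definite even when every B_l is singular. A small sketch with an invented time-invariant pair (A, B):

```python
import numpy as np

def gramian(A, B, N1):
    """Controllability Gramian C(k, k-N1) of (5.7.54) for time-invariant A, B:
    the sum of A^(k-l-1) B B' (A^(k-l-1))' over l = k-N1, ..., k."""
    m = A.shape[0]
    G, Phi = np.zeros((m, m)), np.eye(m)
    for _ in range(N1 + 1):
        G += Phi @ B @ B.T @ Phi.T
        Phi = A @ Phi
    return G

A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])   # singular: rank 1

G = gramian(A, B, 2)
# Even though B is singular, the Gramian is positive definite, so the
# uniform complete controllability bound (5.7.53) can hold.
assert np.linalg.eigvalsh(G).min() > 0
```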
Lemma 5.7.12 If the state space model (5.6.1), (5.6.2) is uniformly completely controllable and R0 ≥ 0 then Rk and Rk|k−1 are positive definite matrices (and hence nonsingular) for all k ≥ N1 . Proof See [20], p. 238, Lemma 7.3. The following lemma will be used in the sequel.
Lemma 5.7.13 (Lemma 7.3 in [13]) Assume R_{k|k−1}^{-1} exists. Then, with σ_k and Γ_k defined in (5.7.15) and (5.7.16), respectively,
$$
\sigma_k^{-1} = R_{k-1} - R_{k-1}A_k'R_{k|k-1}^{-1}A_kR_{k-1}, \tag{5.7.56}
$$
$$
\Gamma_{k+1} = R_{k+1|k}^{-1}A_{k+1}R_k. \tag{5.7.57}
$$
Furthermore, the Kalman filter (5.7.13), (5.7.14) can be expressed in "standard" form:
$$
\mu_k = A_k\mu_{k-1} + R_{k|k-1}C_k'\bigl[C_kR_{k|k-1}C_k' + D_kD_k'\bigr]^{-1}\bigl(Y_k - C_kA_k\mu_{k-1}\bigr),
$$
$$
R_k = R_{k|k-1} - R_{k|k-1}C_k'\bigl[C_kR_{k|k-1}C_k' + D_kD_k'\bigr]^{-1}C_kR_{k|k-1}, \qquad R_{k|k-1} = B_k^2 + A_kR_{k-1}A_k'. \tag{5.7.58}
$$

Proof Straightforward use of the Matrix Inversion Lemma 5.4.3 on (5.7.15) yields
$$
\sigma_k^{-1} = R_{k-1} - R_{k-1}A_k'\bigl(B_k^2 + A_kR_{k-1}A_k'\bigr)^{-1}A_kR_{k-1}. \tag{5.7.59}
$$
Substituting (5.7.52) into (5.7.59) proves (5.7.56). To prove (5.7.57), first note that
$$
\Gamma_{k+1} = B_{k+1}^{-2}A_{k+1}\sigma_{k+1}^{-1} = B_{k+1}^{-2}A_{k+1}R_k - B_{k+1}^{-2}A_{k+1}R_kA_{k+1}'R_{k+1|k}^{-1}A_{k+1}R_k = B_{k+1}^{-2}A_{k+1}R_k - B_{k+1}^{-2}\bigl(R_{k+1|k} - B_{k+1}^2\bigr)R_{k+1|k}^{-1}A_{k+1}R_k,
$$
because of (5.7.52). So
$$
\Gamma_{k+1} = B_{k+1}^{-2}A_{k+1}R_k - B_{k+1}^{-2}A_{k+1}R_k + R_{k+1|k}^{-1}A_{k+1}R_k = R_{k+1|k}^{-1}A_{k+1}R_k.
$$
To prove (5.7.58), consider the Kalman filter equations (5.7.13) and (5.7.14). Applying the Matrix Inversion Lemma 5.4.3 to the first term on the right-hand side of (5.7.13) yields the "standard" Kalman filter equations.

Now the filters derived earlier in this section are expressed in terms of R_{k|k−1} instead of B_k. As shown below (Theorem 7.4 in [13]), the advantage of doing so is that B_k no longer needs to be invertible, as long as the uniform complete controllability condition of Definition 5.7.11 holds.

Theorem 5.7.14 Consider the linear dynamical system (5.6.1) and (5.6.2), with B_k not necessarily invertible. Assume that the system is uniformly completely controllable, i.e. (5.7.53) holds. Then at time k, with σ_k^{-1} given by (5.7.56) and Γ_k defined by (5.7.57), the following hold.

1. The density α_k(x) (defined in (5.7.1)) is an unnormalized Gaussian density with mean μ_k ∈ ℝ^m and covariance matrix R_k ∈ ℝ^{m×m}, recursively computed via the standard Kalman filter equations (5.7.58).
2. The density β_k^{ij(M)}(x) (initialized according to (5.7.12)) is completely defined by the five statistics a_k^{ij(M)}, b_k^{ij(M)}, d_k^{ij(M)}, R_k and μ_k as follows:
$$
\beta_k^{ij(M)}(x) = \bigl(a_k^{ij(M)} + b_k^{ij(M)\prime}x + x'd_k^{ij(M)}x\bigr)\alpha_k(x), \qquad M = 0, 1, 2,
$$
where a_k^{ij(M)} ∈ ℝ, b_k^{ij(M)} ∈ ℝ^m, and d_k^{ij(M)} ∈ ℝ^{m×m} is a symmetric matrix with elements d_k^{ij(M)}(p, q), p = 1, ..., m, q = 1, ..., m. These statistics are recursively computed by equations (5.7.18) to (5.7.26).

3. The density δ_k^{in}(x) (initialized according to (5.7.12)) is completely defined by the four statistics ā_k^{in}, b̄_k^{in}, R_k and μ_k as follows:
$$
\delta_k^{in}(x) = \bigl(\bar a_k^{in} + \bar b_k^{in\prime}x\bigr)\alpha_k(x), \tag{5.7.60}
$$
where ā_k^{in} ∈ ℝ, b̄_k^{in} ∈ ℝ^m are given by the recursions (5.7.36) and (5.7.37).

Finally, finite-dimensional filters for T_k^{ij(M)} and U_k^{in} in terms of the above statistics are given by (5.7.38) and (5.7.39).
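The identities (5.7.56) and (5.7.57), on which this reformulation relies, are easy to confirm numerically (here with B_k invertible, so σ_k can be formed directly from (5.7.15); the matrices are random test values):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3
A = rng.normal(size=(m, m))
Bsq = np.eye(m) * 0.5                                       # B_k^2, invertible here
M = rng.normal(size=(m, m))
R = M @ M.T + 0.1 * np.eye(m)                               # R_{k-1} > 0

sigma = A.T @ np.linalg.inv(Bsq) @ A + np.linalg.inv(R)     # (5.7.15)
Rpred = Bsq + A @ R @ A.T                                   # (5.7.52)

# (5.7.56): sigma^{-1} = R - R A' Rpred^{-1} A R
lhs = np.linalg.inv(sigma)
rhs = R - R @ A.T @ np.linalg.inv(Rpred) @ A @ R
assert np.allclose(lhs, rhs)

# (5.7.57): Gamma = B^{-2} A sigma^{-1} = Rpred^{-1} A R
gamma1 = np.linalg.inv(Bsq) @ A @ np.linalg.inv(sigma)
gamma2 = np.linalg.inv(Rpred) @ A @ R
assert np.allclose(gamma1, gamma2)
```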
Proof It only remains to show that, subject to the uniform complete controllability condition (5.7.53), the filtering equations (5.7.18)–(5.7.26) and (5.7.36), (5.7.37) in Theorem 5.7.14 hold even if the matrices B_{k+1} are singular. If B_{k+1} is singular:

1. Add ε N(0, 1) noise to each component of X_{k+1}. This is done by replacing B_{k+1} in (5.6.1) with the nonsingular matrix B_{k+1}^ε = B_{k+1} + εI_m, where ε ∈ ℝ. Denote the resulting state process by X_{k+1}^ε.
2. Define R_{k+1|k}^ε as in (5.7.52), with B_{k+1} replaced by B_{k+1}^ε. Express the filters in terms of R_{k+1|k}^ε as in Theorem 5.7.14.
3. As ε → 0, R_{k+1|k}^ε → R_{k+1|k}.
4. Then, using the bounded conditional convergence theorem (p. 214, [4]), the conditional estimates of X_k^ε, X_k^εX_k^{ε′}, T_k^{ij(0)}(x^ε) and U_k^{in}(x^ε) converge to the conditional estimates of X_k, X_kX_k′, T_k^{ij(0)}(x) and U_k^{in}(x), respectively.
5.8 Continuous-time vector dynamics

Consider the classical linear Gaussian model for the signal and observation processes. That is, the signal {x_t}, t ≥ 0, is described by the equation
$$
dx_t = A_tx_t\,dt + B_t\,dw_t, \qquad x_0 \in \mathbb{R}^m, \tag{5.8.1}
$$
and the observation process {y_t}, t ≥ 0, is described by the equation
$$
dy_t = C_tx_t\,dt + D_t\,dv_t, \qquad y_0 = 0 \in \mathbb{R}^n. \tag{5.8.2}
$$
Here w and v are independent r-dimensional and n-dimensional Brownian motions, respectively, defined on a probability space (Ω, F, P) with complete filtrations F_t = σ{x_s, y_s : s ≤ t} and Y_t = σ{y_s : s ≤ t}, t ≥ 0. Further, w and v are independent of x_0. We assume that x_0 is a random variable with normal density
$$
p_0(x) = (2\pi)^{-m/2}|P_0|^{-1/2}\exp\bigl(-\tfrac12(x - \hat x_0)'P_0^{-1}(x - \hat x_0)\bigr).
$$
The matrix functions A_t ∈ ℝ^{m×m}, B_t ∈ ℝ^{m×r}, C_t ∈ ℝ^{n×m} and D_t ∈ ℝ^{n×n} are measurable functions of t. We assume D_t is a positive definite matrix. We model the above dynamics by supposing that initially we have an "ideal" probability space (Ω, F, P̄) such that, under P̄:

1. w is an r-dimensional Brownian motion and {x_t} is defined by (5.8.1);
2. y is an n-dimensional Brownian motion, independent of w and x_0, whose quadratic variation is determined by D_t, where D_t is positive definite.
Write ∇ = (∂/∂x_1, ..., ∂/∂x_m)′. For any function g : ℝ^m → ℝ, write
$$
\nabla^2g = \begin{pmatrix}\dfrac{\partial^2g}{\partial x_1^2} & \cdots & \dfrac{\partial^2g}{\partial x_1\,\partial x_m}\\ \vdots & \ddots & \vdots\\ \dfrac{\partial^2g}{\partial x_m\,\partial x_1} & \cdots & \dfrac{\partial^2g}{\partial x_m^2}\end{pmatrix}.
$$
For a vector field g(x) = (g_1(x), g_2(x), ..., g_m(x))′ defined on ℝ^m, define
$$
\mathrm{div}(g) = \frac{\partial g_1}{\partial x_1} + \frac{\partial g_2}{\partial x_2} + \cdots + \frac{\partial g_m}{\partial x_m}.
$$
Define
$$
\Lambda_t = \exp\Bigl(\int_0^t(C_sx_s)'(D_s^{-1})'D_s^{-1}\,dy_s - \frac12\int_0^tx_s'C_s'(D_s^{-1})'D_s^{-1}C_sx_s\,ds\Bigr), \tag{5.8.3}
$$
which is also given by
$$
\Lambda_t = 1 + \int_0^t\Lambda_s\,x_s'C_s'(D_s^{-1})'D_s^{-1}\,dy_s.
$$
To see this, apply the Itô rule to the function log Λ_t. Then Λ_t is an F_t-martingale and Ē[Λ_t] = 1. A new probability measure P can be defined by setting
$$
\frac{dP}{d\bar P}\Big|_{F_t} = \Lambda_t.
$$
Define the process v_t by the formula
$$
dv_t = D_t^{-1}(dy_t - C_tx_t\,dt), \qquad v_0 = 0.
$$
Then Girsanov's theorem 4.3.3 implies that {v_t} is a standard n-dimensional Brownian motion process under P. Therefore, under P, dy_t = C_tx_t dt + D_t dv_t. Note that under P the process {x_t} still satisfies (5.8.1). Consequently, under P the processes {x_t} and {y_t} satisfy the real-world dynamics (5.8.1) and (5.8.2). However, P̄ is a more convenient measure with which to work.
For any "test" function φ : ℝ^m → ℝ which is in C² and has compact support, write σ(φ)_t = Ē[Λ_tφ(x_t) | Y_t]. In the case when the measure defined by σ(·)_t has a density q(x, t), we have
$$
\sigma(\varphi)_t = \int_{\mathbb{R}^m}\varphi(x)\,q(x, t)\,dx.
$$
Using the vector Itô rule (Theorem 3.6.9) we establish
$$
\varphi(x_t) = \varphi(x_0) + \int_0^t(\nabla\varphi(x_s))'A_sx_s\,ds + \int_0^t(\nabla\varphi(x_s))'B_s\,dw_s + \frac12\int_0^t\mathrm{Tr}\bigl[\nabla^2\varphi(x_s)B_sB_s'\bigr]ds. \tag{5.8.4}
$$
In view of (5.8.3) and (5.8.4), and using the Itô product rule (Example 3.7.15),
$$
\Lambda_t\varphi(x_t) = \varphi(x_0) + \int_0^t\Lambda_s(\nabla\varphi(x_s))'A_sx_s\,ds + \int_0^t\Lambda_s(\nabla\varphi(x_s))'B_s\,dw_s + \frac12\int_0^t\Lambda_s\mathrm{Tr}\bigl[\nabla^2\varphi(x_s)B_sB_s'\bigr]ds + \int_0^t\Lambda_s\varphi(x_s)x_s'C_s'(D_s^{-1})'D_s^{-1}\,dy_s. \tag{5.8.5}
$$
Conditioning both sides of (5.8.5) on Y_t, and using the fact that w_t and y_t are independent and that y_t has independent increments under P̄ (it is Wiener; see [15], Lemma 3.2, p. 261), we have:

Theorem 5.8.1 Suppose φ ∈ C² is a real-valued function with compact support. Then
$$
\sigma(\varphi)_t = \sigma(\varphi)_0 + \int_0^t\sigma\bigl((\nabla\varphi(x_s))'A_sx_s\bigr)ds + \frac12\int_0^t\sigma\bigl(\mathrm{Tr}[\nabla^2\varphi(x_s)B_sB_s']\bigr)ds + \int_0^t\sigma\bigl(\varphi(x_s)x_s'\bigr)C_s'(D_s^{-1})'D_s^{-1}\,dy_s.
$$
If σ(·)_t has a density q(x, t), we integrate by parts each term of
$$
\int_{\mathbb{R}^m}q(x, t)\varphi(x)\,dx = \int_{\mathbb{R}^m}q(x, 0)\varphi(x)\,dx + \int_0^t\int_{\mathbb{R}^m}q(x, s)(\nabla\varphi(x))'A_sx\,dx\,ds + \frac12\int_0^t\int_{\mathbb{R}^m}q(x, s)\,\mathrm{Tr}\bigl[\nabla^2\varphi(x)B_sB_s'\bigr]dx\,ds + \int_0^t\int_{\mathbb{R}^m}q(x, s)\varphi(x)\,x'C_s'(D_s^{-1})'D_s^{-1}\,dx\,dy_s. \tag{5.8.6}
$$
For instance, if m = 2,
$$
\int_0^t\int_{\mathbb{R}^m}q(x, s)(\nabla\varphi(x))'A_sx\,dx\,ds = \int_0^t\int_{\mathbb{R}^m}q(x, s)\Bigl(\frac{\partial\varphi}{\partial x_1}, \frac{\partial\varphi}{\partial x_2}\Bigr)\begin{pmatrix}a_{11}x_1 + a_{12}x_2\\ a_{21}x_1 + a_{22}x_2\end{pmatrix}dx\,ds
$$
$$
= \int_0^t\int_{\mathbb{R}^m}q(x, s)(a_{11}x_1 + a_{12}x_2)\frac{\partial\varphi}{\partial x_1}\,dx\,ds + \int_0^t\int_{\mathbb{R}^m}q(x, s)(a_{21}x_1 + a_{22}x_2)\frac{\partial\varphi}{\partial x_2}\,dx\,ds
$$
$$
= -\int_0^t\int_{\mathbb{R}^m}\varphi(x)\Bigl(\frac{\partial\bigl(q(x, s)(a_{11}x_1 + a_{12}x_2)\bigr)}{\partial x_1} + \frac{\partial\bigl(q(x, s)(a_{21}x_1 + a_{22}x_2)\bigr)}{\partial x_2}\Bigr)dx\,ds = -\int_0^t\int_{\mathbb{R}^m}\varphi(x)\,\mathrm{div}\bigl(A_sx\,q(x, s)\bigr)dx\,ds.
$$
Similarly,
$$
\int_0^t\int_{\mathbb{R}^m}q(x, s)\,\mathrm{Tr}\bigl[\nabla^2\varphi(x)B_sB_s'\bigr]dx\,ds = \int_0^t\int_{\mathbb{R}^m}\varphi(x)\,\mathrm{Tr}\bigl[\nabla^2q(x, s)B_sB_s'\bigr]dx\,ds,
$$
which holds for all "test" functions φ; hence:

Lemma 5.8.2
$$
q(x, t) = q(x, 0) - \int_0^t\mathrm{div}\bigl(A_sx\,q(x, s)\bigr)ds + \frac12\int_0^t\mathrm{Tr}\bigl[\nabla^2q(x, s)B_sB_s'\bigr]ds + \int_0^tq(x, s)\,x'C_s'(D_s^{-1})'D_s^{-1}\,dy_s, \tag{5.8.7}
$$
with q(x, 0) = p_0(x), the density of x_0.

Remark 5.8.3 Equation (5.8.7) is a stochastic partial differential equation for the unnormalized conditional density of x_t given Y_t. In general, the solution of this equation is a conditional density function evolving stochastically in time. For the linear Gaussian dynamics (5.8.1) and (5.8.2), however, q(x, t) has a simple form.

Theorem 5.8.4 The solution of (5.8.7) is
$$
q(x, t) = (2\pi)^{-m/2}|\Sigma_t|^{-1/2}\,\nu_t\exp\bigl(-\tfrac12(x - m_t)'\Sigma_t^{-1}(x - m_t)\bigr). \tag{5.8.8}
$$
Here m_t = E[x_t | Y_t], m_0 = x̂_0, Σ_t = E[(x_t − m_t)(x_t − m_t)′ | Y_t], Σ_0 = P_0, and ν_t is a normalizing factor.
It is well known that m_t and Σ_t are given by the Kalman filter equations
$$
dm_t = A_tm_t\,dt + \Sigma_tC_t'(D_t^{-1})'D_t^{-1}(dy_t - C_tm_t\,dt) = A_tm_t\,dt + \Sigma_tC_t'(D_t^{-1})'\,dv_t \tag{5.8.9}
$$
(since under P, dv_t = D_t^{-1}(dy_t − C_tm_t dt)), and
$$
\frac{d\Sigma_t}{dt} = \Sigma_tA_t' + A_t\Sigma_t + B_tB_t' - \Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_t\Sigma_t. \tag{5.8.10}
$$
Note that Σ_t is deterministic and can be computed off-line. Also
$$
\nu_t = \exp\Bigl(\int_0^tm_s'C_s'(D_s^{-1})'D_s^{-1}\,dy_s - \frac12\int_0^tm_s'C_s'(D_s^{-1})'D_s^{-1}C_sm_s\,ds\Bigr). \tag{5.8.11}
$$

Proof
([2]) We have to show that, for any "test" function φ(·),
$$
\sigma(\varphi)_t = \sigma(\varphi)_0 + \int_0^t\sigma\bigl((\nabla\varphi(x_s))'A_sx_s\bigr)ds + \frac12\int_0^t\sigma\bigl(\mathrm{Tr}[\nabla^2\varphi(x_s)B_sB_s']\bigr)ds + \int_0^t\sigma\bigl(\varphi(x_s)x_s'\bigr)C_s'(D_s^{-1})'D_s^{-1}\,dy_s = \int_{\mathbb{R}^m}\varphi(x)\,q(x, t)\,dx, \tag{5.8.12}
$$
where q(x, t) is given by (5.8.8).

Let ξ = Σ_t^{-1/2}(x − m_t), so that x = m_t + Σ_t^{1/2}ξ and dx = |Σ_t|^{1/2}dξ. Hence the dx integral on the right-hand side of (5.8.12) is equal to
$$
(2\pi)^{-m/2}\int_{\mathbb{R}^m}\varphi\bigl(m_t + \Sigma_t^{1/2}\xi\bigr)\,\nu_t\,e^{-|\xi|^2/2}\,d\xi.
$$
Now
$$
d\Bigl((2\pi)^{-m/2}\int_{\mathbb{R}^m}\varphi\bigl(m_t + \Sigma_t^{1/2}\xi\bigr)\,\nu_t\,e^{-|\xi|^2/2}\,d\xi\Bigr) = (2\pi)^{-m/2}\int_{\mathbb{R}^m}d\bigl(\varphi\bigl(m_t + \Sigma_t^{1/2}\xi\bigr)\,\nu_t\bigr)\,e^{-|\xi|^2/2}\,d\xi,
$$
and, using the product rule,
$$
d(\varphi\nu_t) = \varphi\,d\nu_t + \nu_t\,d\varphi + d\langle\varphi, \nu\rangle_t, \qquad d\varphi\bigl(m_t + \Sigma_t^{1/2}\xi\bigr) = \frac{\partial\varphi}{\partial x}\bigl(dm_t + d\Sigma_t^{1/2}\,\xi\bigr) + \frac12\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}\,d\langle m, m\rangle_t\Bigr].
$$
From (5.8.9), (5.8.10) and (5.8.11),
$$
d\nu_t = \nu_t\,m_t'C_t'(D_t^{-1})'D_t^{-1}\,dy_t,
$$
$$
d\Sigma_t = \bigl(\Sigma_tA_t' + A_t\Sigma_t + B_tB_t' - \Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_t\Sigma_t\bigr)dt,
$$
$$
dm_t = \bigl(A_tm_t - \Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_tm_t\bigr)dt + \Sigma_tC_t'(D_t^{-1})'D_t^{-1}\,dy_t,
$$
$$
d\langle m, m\rangle_t = \Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_t\Sigma_t\,dt, \qquad d\langle\varphi, \nu\rangle_t = \nu_t\,\frac{\partial\varphi}{\partial x}\,\Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_tm_t\,dt.
$$
Therefore
$$
d(\varphi\nu_t) = \nu_t\Bigl(\frac{\partial\varphi}{\partial x}A_tm_t\,dt + \frac{\partial\varphi}{\partial x}\,d\Sigma_t^{1/2}\,\xi + \frac12\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}\,\Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_t\Sigma_t\Bigr]dt + \Bigl[\varphi\,m_t'C_t'(D_t^{-1})'D_t^{-1} + \frac{\partial\varphi}{\partial x}\,\Sigma_tC_t'(D_t^{-1})'D_t^{-1}\Bigr]dy_t\Bigr),
$$
and
$$
d\sigma(\varphi)_t = (2\pi)^{-m/2}\nu_t\int_{\mathbb{R}^m}\Bigl\{\frac{\partial\varphi}{\partial x}A_tm_t\,dt + \frac{\partial\varphi}{\partial x}\,d\Sigma_t^{1/2}\,\xi + \frac12\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}\,\Sigma_tC_t'(D_t^{-1})'D_t^{-1}C_t\Sigma_t\Bigr]dt + \Bigl[\varphi\,m_t'C_t'(D_t^{-1})'D_t^{-1} + \frac{\partial\varphi}{\partial x}\,\Sigma_tC_t'(D_t^{-1})'D_t^{-1}\Bigr]dy_t\Bigr\}e^{-|\xi|^2/2}\,d\xi.
$$
However, using integration by parts,
$$
\int_{\mathbb{R}^m}\frac{\partial\varphi}{\partial x}\,d\Sigma_t^{1/2}\,\xi\,e^{-|\xi|^2/2}\,d\xi = \frac12\int_{\mathbb{R}^m}\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}\,\frac{d\Sigma_t}{dt}\Bigr]e^{-|\xi|^2/2}\,d\xi\,dt,
$$
$$
\int_{\mathbb{R}^m}\frac{\partial\varphi}{\partial x}\,\Sigma_tC_t'(D_t^{-1})'D_t^{-1}\,e^{-|\xi|^2/2}\,d\xi = \int_{\mathbb{R}^m}\varphi\,\xi'\Sigma_t^{1/2}C_t'(D_t^{-1})'D_t^{-1}\,e^{-|\xi|^2/2}\,d\xi,
$$
where ξ′ = (x − m_t)′Σ_t^{-1/2}. It follows, since the two terms involving Σ_tC_t'(D_t^{-1})'D_t^{-1}C_tΣ_t cancel, that
$$
d\sigma(\varphi)_t = (2\pi)^{-m/2}\nu_t\int_{\mathbb{R}^m}\Bigl\{\Bigl(\frac{\partial\varphi}{\partial x}A_tm_t + \frac12\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}\bigl(\Sigma_tA_t' + A_t\Sigma_t\bigr)\Bigr] + \frac12\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}B_tB_t'\Bigr]\Bigr)dt + \varphi\,x'C_t'(D_t^{-1})'D_t^{-1}\,dy_t\Bigr\}e^{-|\xi|^2/2}\,d\xi.
$$
Using integration by parts again,
$$
\int_{\mathbb{R}^m}\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}\,\Sigma_tA_t'\Bigr]e^{-|\xi|^2/2}\,d\xi = \int_{\mathbb{R}^m}\frac{\partial\varphi}{\partial x}A_t\Sigma_t^{1/2}\xi\,e^{-|\xi|^2/2}\,d\xi = \int_{\mathbb{R}^m}\frac{\partial\varphi}{\partial x}A_tx\,e^{-|\xi|^2/2}\,d\xi - \int_{\mathbb{R}^m}\frac{\partial\varphi}{\partial x}A_tm_t\,e^{-|\xi|^2/2}\,d\xi,
$$
and similarly for the term with Tr[∂²φ/∂x² A_tΣ_t]. Hence
$$
d\sigma(\varphi)_t = (2\pi)^{-m/2}\nu_t\int_{\mathbb{R}^m}\Bigl\{\Bigl(\frac{\partial\varphi}{\partial x}A_tx + \frac12\mathrm{Tr}\Bigl[\frac{\partial^2\varphi}{\partial x^2}B_tB_t'\Bigr]\Bigr)dt + \varphi\,x'C_t'(D_t^{-1})'D_t^{-1}\,dy_t\Bigr\}e^{-|\xi|^2/2}\,d\xi = \sigma\bigl((\nabla\varphi(x))'A_tx\bigr)dt + \frac12\sigma\bigl(\mathrm{Tr}[\nabla^2\varphi(x)B_tB_t']\bigr)dt + \sigma\bigl(\varphi(x)x'\bigr)C_t'(D_t^{-1})'D_t^{-1}\,dy_t,
$$
which is the desired result.
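The conditional covariance in (5.8.10) solves a matrix Riccati ODE and, being deterministic, can be computed off-line. A minimal Euler integration, with invented stable coefficients, checks that Σ_t settles at the algebraic Riccati equation:

```python
import numpy as np

# Time-invariant coefficients (illustrative values, not from the text).
A = np.array([[-1.0, 0.2], [0.0, -1.5]])
BBt = 0.25 * np.eye(2)                       # B B'
C = np.array([[1.0, 0.0]])
DDt_inv = np.array([[4.0]])                  # (D D')^{-1}, with D = 0.5

def riccati_rhs(S):
    """Right-hand side of the Riccati equation (5.8.10)."""
    return S @ A.T + A @ S + BBt - S @ C.T @ DDt_inv @ C @ S

S = np.eye(2)                                # Sigma_0 = P_0
dt = 1e-3
for _ in range(20_000):                      # integrate (5.8.10) to t = 20
    S = S + dt * riccati_rhs(S)
    S = 0.5 * (S + S.T)                      # keep symmetric against round-off drift

# For stable dynamics Sigma_t converges, so the right-hand side of (5.8.10)
# vanishes (algebraic Riccati equation) and Sigma stays positive definite.
assert np.linalg.norm(riccati_rhs(S)) < 1e-3
assert np.all(np.linalg.eigvalsh(S) > 0)
```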
5.9 Continuous-time model parameters estimation

The linear model given by (5.8.1) and (5.8.2) is determined by the matrices A, B, C and D, which need to be known. These parameters are estimated using the expectation maximization (EM) algorithm, which we now describe. Maximum likelihood estimation of the parameters via the EM algorithm requires computation of the filtered estimates of quantities such as
$$
\int_0^tx_s\otimes dx_s, \qquad \int_0^tx_s\otimes dy_s, \qquad \int_0^tx_s\otimes x_s\,ds.
$$

Remark 5.9.1 In all the existing literature on parameter estimation of linear Gaussian models via the EM algorithm, filtered estimates of the above quantities are computed via Kalman smoothing, which requires a large-memory numerical implementation. This problem is solved in [14] by providing finite-dimensional filters for (the components of) such integral processes. It is further shown there that finite-dimensional filters exist for integrals and stochastic integrals of moments of all orders of the state process.

Consider the time-invariant version of (5.8.1), (5.8.2):
$$
dx_t = Ax_t\,dt + B\,dw_t, \quad x_0 \in \mathbb{R}^m, \qquad dy_t = Cx_t\,dt + D\,dv_t, \quad y_0 = 0 \in \mathbb{R}^n.
$$
The aim is to compute ML estimates of the parameters θ = (A, C) given the observations Y_t = σ{y_s : s ≤ t}, assuming B and D are known. This is done via the EM algorithm.

Remark 5.9.2 Unlike the discrete-time case, in continuous time it is not possible to obtain ML estimates of the variance terms B and D, because measures corresponding to Wiener processes with different variances are not mutually absolutely continuous (see Chapter 6.1 in [15]). Estimates for B and D are given in terms of the quadratic variations of the state and observation processes.
Notation Let e_i, e_j ∈ ℝ^m denote unit vectors with 1 in the i-th and j-th positions, respectively. Write
$$
T_t^{ij} = \int_0^t\langle x_s, e_i\rangle\langle e_j, x_s\rangle\,ds = \int_0^tx_s'(e_ie_j')x_s\,ds, \tag{5.9.1}
$$
$$
L_t^{ij} = \int_0^t\langle x_s, e_i\rangle\langle e_j, dx_s\rangle = \int_0^tx_s'(e_ie_j')\,dx_s; \tag{5.9.2}
$$
here ⟨·, ·⟩ denotes the scalar product. Also let f_j ∈ ℝ^n denote the unit vector with 1 in the j-th position. Write
$$
U_t^{ij} = \int_0^t\langle x_s, e_i\rangle\langle f_j, dy_s\rangle = \int_0^tx_s'(e_if_j')\,dy_s. \tag{5.9.3}
$$
Now the expression for Q(θ, θ̃) is derived. To update the estimate from Ã to A, introduce the density
$$
\frac{dP_\theta}{dP_{\tilde\theta}}\Big|_{\mathcal{G}_t} = \exp\Bigl(\int_0^tx_s'(A - \tilde A)'(BB')^{\#}\,dx_s - \frac12\int_0^tx_s'(A - \tilde A)'(BB')^{\#}(A + \tilde A)x_s\,ds\Bigr),
$$
where # denotes the pseudo-inverse. Then
$$
E\Bigl[\log\frac{dP(A)}{dP(\tilde A)}\,\Big|\,\mathcal{Y}_t\Bigr] = E\Bigl[\int_0^tx_s'A'(BB')^{\#}\,dx_s - \frac12\int_0^tx_s'A'(BB')^{\#}Ax_s\,ds\,\Big|\,\mathcal{Y}_t\Bigr] + R(\tilde A), \tag{5.9.4}
$$
where R(Ã) does not involve A. Similarly, to update the estimate from C̃ to C, introduce the density
$$
\frac{dP(C)}{dP(\tilde C)}\Big|_{\mathcal{G}_t} = \exp\Bigl(\int_0^tx_s'(C - \tilde C)'(DD')^{-1}\,dy_s - \frac12\int_0^tx_s'(C - \tilde C)'(DD')^{-1}(C + \tilde C)x_s\,ds\Bigr).
$$
Consequently,
$$
E\Bigl[\log\frac{dP(C)}{dP(\tilde C)}\,\Big|\,\mathcal{Y}_t\Bigr] = E\Bigl[\int_0^tx_s'C'(DD')^{-1}\,dy_s - \frac12\int_0^tx_s'C'(DD')^{-1}Cx_s\,ds\,\Big|\,\mathcal{Y}_t\Bigr] + S(\tilde C), \tag{5.9.5}
$$
where S(C̃) does not involve C.
Adding (5.9.4) and (5.9.5) yields
$$
Q(\theta, \tilde\theta) = E\Bigl[\int_0^tx_s'A'(BB')^{\#}\,dx_s - \frac12\int_0^tx_s'A'(BB')^{\#}Ax_s\,ds\,\Big|\,\mathcal{Y}_t\Bigr] + E\Bigl[\int_0^tx_s'C'(DD')^{-1}\,dy_s - \frac12\int_0^tx_s'C'(DD')^{-1}Cx_s\,ds\,\Big|\,\mathcal{Y}_t\Bigr] + E\bigl[R(\tilde\theta)\mid\mathcal{Y}_t\bigr].
$$
To implement the M-step, set the derivatives ∂Q/∂θ = 0. This yields
$$
A = E\Bigl[\int_0^tdx_s\otimes x_s\,\Big|\,\mathcal{Y}_t\Bigr]\,E\Bigl[\int_0^tx_s\otimes x_s\,ds\,\Big|\,\mathcal{Y}_t\Bigr]^{-1}, \qquad C = E\Bigl[\int_0^tdy_s\otimes x_s\,\Big|\,\mathcal{Y}_t\Bigr]\,E\Bigl[\int_0^tx_s\otimes x_s\,ds\,\Big|\,\mathcal{Y}_t\Bigr]^{-1},
$$
so that, entrywise, the updated A and C are determined by the filtered quantities T̂_t, L̂_t and Û_t, where T̂_t and L̂_t ∈ ℝ^{m×m} denote the matrices with elements T̂_t^{ij} = E[T_t^{ij} | Y_t] and L̂_t^{ij} = E[L_t^{ij} | Y_t], i, j ∈ {1, ..., m}, and Û_t ∈ ℝ^{m×n} denotes the matrix with elements Û_t^{ij} = E[U_t^{ij} | Y_t], i ∈ {1, ..., m}, j ∈ {1, ..., n}.

Remark 5.9.3 The terms T̂_t^{ij}, L̂_t^{ij} and Û_t^{ij} are computed in terms of finite-dimensional filters in Theorems 3.2, 3.8 and 3.5 of [14], thus giving a filter-based EM algorithm.
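The M-step ratio A = (∫x ⊗ dx)(∫x ⊗ x ds)⁻¹ can be illustrated on a scalar Ornstein–Uhlenbeck path. The sketch below uses the simulated true state in place of the filtered statistics L̂_t, T̂_t (so it is the full-information M-step, not the filter-based one); all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
a_true, b = -0.5, 1.0
dt, n = 1e-3, 200_000                        # horizon T = 200

# Simulate dx = a x dt + b dw by Euler-Maruyama.
x = np.empty(n + 1)
x[0] = 0.0
dW = rng.normal(scale=np.sqrt(dt), size=n)
for k in range(n):
    x[k + 1] = x[k] + a_true * x[k] * dt + b * dW[k]

# M-step estimate a = (int x dx) / (int x^2 ds), with the true path
# standing in for the filtered statistics L_hat and T_hat.
L = np.sum(x[:-1] * np.diff(x))
T = np.sum(x[:-1] ** 2) * dt
a_hat = L / T

assert abs(a_hat - a_true) < 0.3             # loose statistical tolerance
```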
Definition 5.9.4 For any "test" function g : ℝ^m → ℝ, define the measure-valued process Ē[Λ_tT_t^{ij}g(x_t) | Y_t]. This has a density β_t^{ij}(x), so that
$$
\bar E\bigl[\Lambda_tT_t^{ij}g(x_t)\mid\mathcal{Y}_t\bigr] = \int_{\mathbb{R}^m}\beta_t^{ij}(x)\,g(x)\,dx.
$$
The existence of the density β_t^{ij}(x) follows from the existence and uniqueness of solutions of stochastic partial differential equations. This is established in, for instance, Section 4.2 of [2]. The following theorem (Theorem 3.2 in [14]) shows the surprising result that we can describe the measure β_t^{ij}(x) exactly as a quadratic in x multiplying the density q(x, t) of (5.8.7).

Theorem 5.9.5 At time t, the density β_t^{ij}(x) is completely described by the five statistics ā_t^{ij}, b̄_t^{ij}, c̄_t^{ij}, Σ_t and m_t as follows:
$$
\beta_t^{ij}(x) = \bigl(\bar a_t^{ij} + x'\bar b_t^{ij} + x'\bar c_t^{ij}x\bigr)\,q(x, t). \tag{5.9.6}
$$
Here ā_t^{ij} ∈ ℝ, b̄_t^{ij} ∈ ℝ^m, and c̄_t^{ij} ∈ L_s(ℝ^m, ℝ^m), the space of symmetric m × m matrices. Further,
$$
\frac{d\bar a_t^{ij}}{dt} = \mathrm{Tr}\bigl[\bar c_t^{ij}B_tB_t'\bigr] + \bar b_t^{ij\prime}B_tB_t'\Sigma_t^{-1}m_t, \qquad \bar a_0^{ij} = 0 \in \mathbb{R}, \tag{5.9.7}
$$
$$
\frac{d\bar b_t^{ij}}{dt} = -\bigl(A_t' + \Sigma_t^{-1}B_tB_t'\bigr)\bar b_t^{ij} + 2\bar c_t^{ij}B_tB_t'\Sigma_t^{-1}m_t, \qquad \bar b_0^{ij} = 0 \in \mathbb{R}^m, \tag{5.9.8}
$$
$$
\frac{d\bar c_t^{ij}}{dt} = -\bigl(A_t' + \Sigma_t^{-1}B_tB_t'\bigr)\bar c_t^{ij} - \bar c_t^{ij}\bigl(A_t + B_tB_t'\Sigma_t^{-1}\bigr) + \frac12\bigl(e_je_i' + e_ie_j'\bigr), \qquad \bar c_0^{ij} = 0 \in L_s(\mathbb{R}^m, \mathbb{R}^m). \tag{5.9.9}
$$

Proof
dt φ(xt ) = t ( φ(xt )) At xt dt + t ( φ(xt )) Bt dw t 1 + t Tr[ 2 φ(xt ) Bt Bt ]dt 2 + t φ(xt )xt Ct (Dt−1 ) Dt−1 dyt ,
(5.9.10)
to get
t
ij
t φ(xt )Tt = 0
s Tsi j ( φ(xs )) As xs ds
t
+ 0
+
1 2
t
0 t
+
s Tsi j ( φ(xs )) Bs dw s
0 t
+ 0
s Tsi j Tr[ 2 φ(xs ) Bs Bs ]ds
s Tsi j φ(xs )xs Cs (Ds−1 ) Ds−1 dys s φ(xs )xs (ei ej )xs ds.
(5.9.11)
Conditioning both sides of (5.9.11) on Yt under the “ideal” world probability measure P and using Lemma 3.2 p. 261 of Hajek and Wong [15] gives ij E[t φ(xt )Tt
| Yt ] = 0
+
t
E[s Tsi j ( φ(xs )) As xs | Ys ]ds
1 2
+
0 t
0
+
0
t
t
E[s Tsi j Tr[ 2 φ(xs ) Bs Bs ] | Ys ]ds
E[s Tsi j φ(xs )xs Cs (Ds−1 ) Ds−1 | Ys ]dys E[s φ(xs )xs (ei ej )xs | Ys ]ds.
200
In terms of the densities β_t^{ij}(x) and q(x, t), integrate by parts each term of
$$
\int_{\mathbb{R}^m}\beta_t^{ij}(x)\varphi(x)\,dx = \int_0^t\int_{\mathbb{R}^m}\beta_s^{ij}(x)(\nabla\varphi(x))'A_sx\,dx\,ds + \frac12\int_0^t\int_{\mathbb{R}^m}\beta_s^{ij}(x)\,\mathrm{Tr}\bigl[\nabla^2\varphi(x)B_sB_s'\bigr]dx\,ds + \int_0^t\int_{\mathbb{R}^m}\beta_s^{ij}(x)\varphi(x)\,x'C_s'(D_s^{-1})'D_s^{-1}\,dx\,dy_s + \int_0^t\int_{\mathbb{R}^m}q(x, s)\varphi(x)\,x'(e_ie_j')x\,dx\,ds;
$$
that is, β_t^{ij}(x) must satisfy the stochastic partial differential equation
$$
\beta_t^{ij}(x) = -\int_0^t\mathrm{div}\bigl(A_sx\,\beta_s^{ij}(x)\bigr)ds + \frac12\int_0^t\mathrm{Tr}\bigl[\nabla^2\beta_s^{ij}(x)B_sB_s'\bigr]ds + \int_0^t\beta_s^{ij}(x)\,x'C_s'(D_s^{-1})'D_s^{-1}\,dy_s + \int_0^tq(x, s)\,x'(e_ie_j')x\,ds. \tag{5.9.12}
$$
We look for a solution of (5.9.12) of the form
$$
\bar\beta_s^{ij}(x) = \bigl(\bar a_s^{ij} + x'\bar b_s^{ij} + x'\bar c_s^{ij}x\bigr)q(x, s). \tag{5.9.13}
$$
As noted just after Definition 5.9.4, if such a solution exists, it is unique. To simplify notation, drop the superscripts i, j on ā, b̄ and c̄. Then
$$
\mathrm{div}\bigl(\bar\beta_s(x)A_sx\bigr) = (\bar b_s + 2\bar c_sx)'A_sx\,q(x, s) + (\bar a_s + x'\bar b_s + x'\bar c_sx)\,\mathrm{div}\bigl(A_sx\,q(x, s)\bigr),
$$
$$
\nabla\bar\beta_s(x) = (\bar b_s + 2\bar c_sx)\,q(x, s) + (\bar a_s + x'\bar b_s + x'\bar c_sx)\,\nabla q(x, s),
$$
$$
\nabla^2\bar\beta_s(x) = 2\bar c_s\,q(x, s) + 2(\bar b_s + 2\bar c_sx)(\nabla q(x, s))' + (\bar a_s + x'\bar b_s + x'\bar c_sx)\,\nabla^2q(x, s),
$$
$$
\mathrm{Tr}\bigl[\nabla^2\bar\beta_s(x)B_sB_s'\bigr] = 2q(x, s)\,\mathrm{Tr}(\bar c_sB_sB_s') + 2(\bar b_s + 2\bar c_sx)'B_sB_s'\,\nabla q(x, s) + (\bar a_s + x'\bar b_s + x'\bar c_sx)\,\mathrm{Tr}\bigl[\nabla^2q(x, s)B_sB_s'\bigr].
$$
Now from (5.8.8),
$$
\nabla q(x, s) = -\Sigma_s^{-1}(x - m_s)\,q(x, s).
$$
Consequently, substitution of β̄_t(x), given by (5.9.13), into the differential form of the right-hand side of (5.9.12) yields
$$
-\bigl((\bar b_s + 2\bar c_sx)'A_sx\bigr)q(x, s)\,ds - (\bar a_s + x'\bar b_s + x'\bar c_sx)\,\mathrm{div}\bigl(A_sx\,q(x, s)\bigr)ds + q(x, s)\,\mathrm{Tr}(\bar c_sB_sB_s')\,ds - (\bar b_s + 2\bar c_sx)'B_sB_s'\Sigma_s^{-1}(x - m_s)\,q(x, s)\,ds
$$
$$
+ \frac12(\bar a_s + x'\bar b_s + x'\bar c_sx)\,\mathrm{Tr}\bigl[\nabla^2q(x, s)B_sB_s'\bigr]ds + (\bar a_s + x'\bar b_s + x'\bar c_sx)\,q(x, s)\,x'C_s'(D_s^{-1})'D_s^{-1}\,dy_s + q(x, s)\,x'(e_ie_j')x\,ds. \tag{5.9.14}
$$
Also,
$$
d\bar\beta_s(x) = \bigl(d\bar a_s + x'\,d\bar b_s + x'\,d\bar c_s\,x\bigr)q(x, s) + \bigl(\bar a_s + x'\bar b_s + x'\bar c_sx\bigr)\,dq(x, s). \tag{5.9.15}
$$
Consequently, β̄_t(x), given by (5.9.13), is a solution of (5.9.12) if (5.9.14) equals (5.9.15). However, q(x, s) solves equation (5.8.7), so
$$
dq(x, s) = -\mathrm{div}\bigl(A_sx\,q(x, s)\bigr)ds + \frac12\mathrm{Tr}\bigl[\nabla^2q(x, s)B_sB_s'\bigr]ds + q(x, s)\,x'C_s'(D_s^{-1})'D_s^{-1}\,dy_s.
$$
Therefore, substituting this expression for dq(x, s) into (5.9.15) yields
$$
d\bar\beta_s(x) = \bigl(d\bar a_s + x'\,d\bar b_s + x'\,d\bar c_s\,x\bigr)q(x, s) - \bigl(\bar a_s + x'\bar b_s + x'\bar c_sx\bigr)\mathrm{div}\bigl(A_sx\,q(x, s)\bigr)ds + \frac12\bigl(\bar a_s + x'\bar b_s + x'\bar c_sx\bigr)\mathrm{Tr}\bigl[\nabla^2q(x, s)B_sB_s'\bigr]ds + \bigl(\bar a_s + x'\bar b_s + x'\bar c_sx\bigr)x'C_s'(D_s^{-1})'D_s^{-1}\,q(x, s)\,dy_s. \tag{5.9.16}
$$
Finally, equating the coefficients of x, the quadratic terms and the constants in (5.9.14) and (5.9.16), it is seen that the result holds if (5.9.7), (5.9.8) and (5.9.9) hold.

A solution of the ordinary differential equations (5.9.8) and (5.9.9) is now obtained. Write G_t for the matrix solution of
$$
\frac{dG_t}{dt} = -\bigl(A_t' + \Sigma_t^{-1}B_tB_t'\bigr)G_t, \qquad G_0 = I_{m\times m}. \tag{5.9.17}
$$
Note that G_t is deterministic and can be calculated off-line. Also, as an exponential matrix, G_t has an inverse G_t^{-1}.

Lemma 5.9.6 The explicit solutions of (5.9.8) and (5.9.9) are
$$
\bar b_t^{ij} = 2G_t\int_0^tG_s^{-1}\bar c_s^{ij}B_sB_s'\Sigma_s^{-1}m_s\,ds, \qquad \bar c_t^{ij} = \frac12G_t\Bigl(\int_0^tG_s^{-1}\bigl(e_je_i' + e_ie_j'\bigr)(G_s')^{-1}\,ds\Bigr)G_t'.
$$
Proof
The above equations follow using variation of constants.
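The variation-of-constants formula behind Lemma 5.9.6 — b_t = G_t ∫₀ᵗ G_s⁻¹ f(s) ds solves db/dt = −Mb + f when dG/dt = −MG, G_0 = I — can be checked numerically with invented M and f:

```python
import numpy as np

m = 2
M = np.array([[1.0, 0.3], [0.0, 1.5]])          # stands in for A' + Sigma^{-1} B B'
f = lambda t: np.array([np.sin(t), np.cos(t)])  # stands in for 2 c-bar B B' Sigma^{-1} m

dt, n = 1e-4, 20_000                            # integrate to t = 2
G = np.eye(m)
b_direct, integral = np.zeros(m), np.zeros(m)
for k in range(n):
    t = k * dt
    b_direct = b_direct + dt * (-M @ b_direct + f(t))   # db/dt = -M b + f
    integral = integral + dt * (np.linalg.inv(G) @ f(t))
    G = G + dt * (-M @ G)                               # dG/dt = -M G, G_0 = I

b_voc = G @ integral                            # variation-of-constants formula
assert np.linalg.norm(b_voc - b_direct) < 1e-2  # both Euler schemes agree
```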
Remark 5.9.7 We proceed similarly with the processes U_t^{ij} and L_t^{ij}, leaving the details as exercises.

Definition 5.9.8 For any "test" function g : ℝ^m → ℝ, define the measure-valued process Ē[Λ_tU_t^{ij}g(x_t) | Y_t]. This has a density γ_t^{ij}(x), so that
$$
\bar E\bigl[\Lambda_tU_t^{ij}g(x_t)\mid\mathcal{Y}_t\bigr] = \int_{\mathbb{R}^m}\gamma_t^{ij}(x)\,g(x)\,dx.
$$
The existence of the density γ_t^{ij}(x) follows from the existence and uniqueness of solutions of stochastic partial differential equations (see Section 4.2 of [2]). The following theorem (Theorem 3.5 in [14]) shows that we can describe the measure γ_t^{ij}(x) exactly as a quadratic in x multiplying the q(x, t) of (5.8.7).

Theorem 5.9.9 At time t, the density γ_t^{ij}(x) is completely described by the five statistics ă_t^{ij}, b̆_t^{ij}, c̆_t^{ij}, Σ_t and m_t as follows:
$$
\gamma_t^{ij}(x) = \bigl(\breve a_t^{ij} + x'\breve b_t^{ij} + x'\breve c_t^{ij}x\bigr)\,q(x, t).
$$
Here ă_t^{ij} ∈ ℝ, b̆_t^{ij} ∈ ℝ^m, and c̆_t^{ij} ∈ L_s(ℝ^m, ℝ^m), the space of symmetric m × m matrices. Further,
$$
\frac{d\breve a_t^{ij}}{dt} = \mathrm{Tr}\bigl[\breve c_t^{ij}B_tB_t'\bigr] + \breve b_t^{ij\prime}B_tB_t'\Sigma_t^{-1}m_t, \qquad \breve a_0^{ij} = 0 \in \mathbb{R}, \tag{5.9.18}
$$
$$
d\breve b_t^{ij} = \Bigl[-\bigl(A_t' + \Sigma_t^{-1}B_tB_t'\bigr)\breve b_t^{ij} + 2\breve c_t^{ij}B_tB_t'\Sigma_t^{-1}m_t\Bigr]dt + e_if_j'\,dy_t, \qquad \breve b_0^{ij} = 0 \in \mathbb{R}^m, \tag{5.9.19}
$$
$$
\frac{d\breve c_t^{ij}}{dt} = -\bigl(A_t' + \Sigma_t^{-1}B_tB_t'\bigr)\breve c_t^{ij} - \breve c_t^{ij}\bigl(A_t + B_tB_t'\Sigma_t^{-1}\bigr) + \frac12\bigl(e_if_j'C_t + C_t'f_je_i'\bigr), \qquad \breve c_0^{ij} = 0 \in L_s(\mathbb{R}^m, \mathbb{R}^m). \tag{5.9.20}
$$

Proof Apply the Itô product rule to dU_t^{ij} = x_t′(e_if_j′) dy_t and
$$
d\bigl(\Lambda_t\varphi(x_t)\bigr) = \Lambda_t(\nabla\varphi(x_t))'A_tx_t\,dt + \Lambda_t(\nabla\varphi(x_t))'B_t\,dw_t + \frac12\Lambda_t\mathrm{Tr}\bigl[\nabla^2\varphi(x_t)B_tB_t'\bigr]dt + \Lambda_t\varphi(x_t)x_t'C_t'(D_t^{-1})'D_t^{-1}\,dy_t,
$$
and condition both sides on Y_t under the "ideal" world probability measure P̄, using Lemma 3.2, p. 261, of [15].
5.9 Continuous-time model parameters estimation
203
ij
In terms of the densities γt (x) and q(x, t), integration by parts yields t 1 t ij ij γt (x) = − div(As xγs (x))ds + Tr[ 2 γsi j (x)Bs Bs ]ds 2 0 0 t t ij −1 −1 + γs (x)x Cs (Ds ) Ds dys + q(x, s)xs (ei f j )dys 0
0
t
+ 0
q(x, s)(x Cs f j ei x)ds.
(5.9.21)
We look for a solution of (5.9.21) of the form γ¯si j (x) = (a˘ si j + x b˘si j + x c˘si j x)q(x, s).
(5.9.22)
As noted just after Definition 5.9.4, if such solution exists, it is unique. Recall that q(x, s) solves the equation (5.8.7): dq(x, s) = − div(As xq(x, s))ds +
1 Tr[ 2 q(x, s)Bs Bs ]ds 2
+ q(x, s)x Cs (Ds−1 ) Ds−1 dys . So (5.9.22) is a solution of (5.9.21) if (5.9.18), (5.9.19) and (5.9.20) hold. Now the ordinary differential equations (5.9.19) and (5.9.20) are solved explicitly. Note j j that f j dyt = dyt f j = dyt , where yt denotes the j-th component of yt . Lemma 5.9.10 The explicit solutions of (5.9.19) and (5.9.20) are t t ij ij −1 ˘ b˘t = 2G t G −1 B B m ds + G e dy c s s i s , s s s s s 0
ij c˘t
1 = Gt 2
0
0
t
G −1 s (ei
f j Cs
+
Cs
f j ei )(G s )−1 ds
G t .
Definition 5.9.11 For any “test” function g : IRm → IR, define a measure-valued process ij ij E[t Ut g(xt ) | Yt ]. This has a density λt (x), so that ij ij E[t Ut g(xt ) | Yt ] = λt (x)g(xt )dx. IRm
ij
The existence of the density λt (x) follows from the existence and uniqueness of solutions of stochastic partial differential equations (see Section 4.2 of [2]). The following theorem (Theorem 3.8 in [14]) shows that one can describe the measure ij λt (x) exactly as a quadratic in x multiplying the q(x, t) of (5.8.7). ij
Theorem 5.9.12 At time t, the density λt (x) is completely described by the five statistics ij ij ij a˜ t , b˜t , c˜t , t , and m t as follows: λt (x) = (a˜ t + x b˜t + x c˜t x)q(x, t). ij
ij
ij
ij
204
Kalman filtering ij
ij
ij
Here a˜ t ∈ IR, b˜t ∈ IRm , and c˜t ∈ L s (IRm , IRm ), the space of symmetric m × m matrices. Further, 0 1 da˜ t ij i j = Tr c˜t Bt Bt + b˜t Bt Bt t−1 m t − Tr Bt Bt ei ej , dt ij
ij
a˜ 0 = 0 ∈ IR, ij db˜t
dt
(5.9.23)
1 ij 0 ij = − At + t−1 Bt Bt b˜t + 2c˜t Bt Bt t−1 m t − (e j ei )Bt Bt t−1 m t , ij b˜0 = 0 ∈ IRm ,
(5.9.24)
ij 1 ij 1 0 dc˜t ij 0 = − At + t−1 Bt Bt c˜t − c˜t At + Bt Bt t−1 dt 1 0 1 + (ei ej (At + Bt Bt t−1 ) + At + t−1 Bt Bt e j ei ), 2 ij
c˜0 = 0 ∈ L s (IRm , IRm ).
(5.9.25)
ij
Proof Apply the Itˆo product rule to t Ut φ(xt ) and condition on Yt under the “ideal” world probability measure P using Lemma 3.2, p. 261 of [15]. ij In terms of the densities λt (x) and q(x, t), integrate by parts to get t 1 t ij ij λt (x) = − div(As xλs (x))ds + Tr[ 2 λis j (x)Bs Bs ]ds 2 0 0 t t + λis j (x)x Cs (Ds−1 ) Ds−1 dys + q(x, s)x (ei ej )As xds 0
−
0
0
t
div(x (ei ej )Bt Bt q(x, s))ds.
(5.9.26)
We look for a solution of (5.9.26) of the form γ¯si j (x) = (a˜ si j + x b˜si j + x c˜si j x)q(x, s).
(5.9.27)
Recall that q(x, s) solves the equation (5.8.7). So (5.9.27) is a solution of (5.9.26) if (5.9.23), (5.9.24) and (5.9.25) hold. Now the ordinary differential equations (5.9.24) and (5.9.25) are solved explicitly. Lemma 5.9.13 The explicit solutions of (5.9.24) and (5.9.25) are t ij ij −1 ˜ b˜t = G t G −1 (2 c − e e ) B B m ds , j i s s s s s s 0
1 t −1 ij c˜t = G t [G s (ei ej (At + Bt Bt t−1 ) 2 0 0 1 + At + t−1 Bt Bt e j ei )(G s )−1 ]ds G t .
(5.9.28)
5.9 Continuous-time model parameters estimation
205
Remark 5.9.14 Note that from the definition of G t , (5.9.17), that the integrand in (5.9.28) −1 includes only half of the four terms in the derivative of G −1 t (ei e j + e j ei )(G t ) ), and so the integral cannot be evaluated in closed form. ij
ij
ij
Theorem 5.9.15 (Theorem 3.10 in [14]) Finite-dimensional filters for Tt , Ut , and L t , defined in (5.9.1), (5.9.3), and (5.9.2), are given by m
E[Tt | Yt ] = a¯ t + m t a¯ t + ij
ij
ij
c¯t ( p, q) t ( p, q) + m t c¯t m t , ij
ij
p,q=1 m
E[Ut | Yt ] = a˘ t + m t a˘ t + ij
ij
ij
c˘t ( p, q) t ( p, q) + m t c˘t m t ,
(5.9.29)
c˜t ( p, q) t ( p, q) + m t c˜t m t .
(5.9.30)
ij
ij
p,q=1 m
E[L t | Yt ] = a˜ t + m t a˜ t + ij
ij
ij
ij
ij
p,q=1
Proof Recall from (5.8.8) that q(x, t) is an unnormalized Gaussian density with mean m t and variance t . Therefore, q(x, t)dx = νt . IRm
Note that for u ∈ IRm ,
IR
m
u xq(x, t)dx = (u m t )νt .
Also, for any matrix M ∈ L(IRm , IRm ) with entries M( p, q), 1 ≤ p, q ≤ m, IRm
x M xq(x, t)dx =
IRm
(x − m t ) M(x − m t )q(x, t)dx
+ m t Mm t =
m
q(x, t)dx IRm
M( p, q) t ( p, q) +
m t Mm t
νt .
p,q=1
Now from Bayes’ Theorem (4.1.1), we have ij E[Tt
2 ij ij E[t Tt | Yt ] IRm βt (x)dx 2 | Yt ] = = E[t | Yt ] IRm q(x, t)dx = a¯ t + m t a¯ t + ij
ij
m
c¯t ( p, q) t ( p, q) + m t c¯t m t , ij
ij
p,q=1
by (5.9.6) and because the factors νt cancel. The proof of equations (5.9.29) and (5.9.30) are similar.
206
Kalman filtering
Estimation of B and D First consider the tensor product of xt with itself: t t xt ⊗ xt = x0 ⊗ x0 + xs ⊗ dxs + dxs ⊗ xs + 0
0
t
B B ds.
Conditioning both sides of (5.9.31) on Yt , we have t t t = E[x0 ⊗ x0 | Yt ] + E xs ⊗ dxs + dxs ⊗ xs | Yt + 0
(5.9.31)
0
0
t
B B ds.
(5.9.32)
0
E[x0 ⊗ x0 | Yt ] in (5.9.32) is the smoothed second moment and is given in terms of finitedimensional statistics; see Theorem 12.11, section 12.4 in [25]. The components of the ij conditioned stochastic integral in (5.9.32) are given by the filtered estimates of L t . Consequently, we have a procedure for estimating the matrix B B . Similarly, consider the tensor product of yt with itself: t t yt ⊗ yt = ys ⊗ dys + dys ⊗ ys + D D t. 0
0
This expression simply amounts to evaluating D D in terms of the quadratic variation of y. 5.10 Direct parameter estimation In the previous sections maximum likelihood arguments were used to estimate recursively the parameters for the linear model (5.8.1), (5.8.2). Here a direct approach to the estimation problem as well as rates of convergence are discussed. The following theorem ([12]) is a continuous-time version of Kronecker’s Lemma (see for example [26] or [29] for the discrete-time case). This result is applied to discuss rates of convergence of the estimates. Suppose (, F, Ft , P), t ≥ 0 is a stochastic basis and M is a continuous locally square integrable martingale. Further, u t is a positive nondecreasing predictable process such that ut > c > 0
Write z t =
2t t0
a.s.
u r−1 dMr , for 0 ≤ t0 ≤ t.
Theorem 5.10.1 Suppose lim z t (ω) = ξ (ω) < ∞ a.s.
t→∞
1 Then limt→∞ (Mt − Mt0 ) exists a.s. ut If limt→∞ u t (ω) = +∞, this limit is 0. Proof
For any s, t0 < s < t, because u is nondecreasing, t t Mt − Ms = u r dzr = u r d(zr − z s ) s
s
t
= u t (z t − z s ) − s
(zr − z s )du r
a.s.
5.10 Direct parameter estimation
207
|Mt − Ms | ≤ 2u t sup |zr − z s |.
(5.10.1)
Consequently, r ≥s
Suppose that limt→∞ u t (ω) = u(ω) < ∞. Then |Mt − Ms | ≤ 2u supr ≥s |zr − z s |. From the hypothesis that limt→∞ z t (ω) = ξ (ω) < ∞ a.s., for any > 0 there is an s such that, if r ≥ s ≥ s , |zr − z s | < /2u. Consequently, if r ≥ s ≥ s , |Mt − Ms | ≤ . That is, Mt
(ω) satisfies a Cauchy condition and converges to a limit µ(ω). Then 1 1 limt→∞ (Mt − Mt0 ) = (µ − Mt0 ). ut u Suppose now that limt→∞ u t (ω) = +∞. Given > 0, again using the Cauchy condition for z, there is an s such that, if r ≥ s ≥ s ∨ t0 , |zr − z s | < . 3 Consequently, supr ≥s ∨t0 |zr − z s | ≤ /3. From (5.10.1), if t ≥ s ∨ t0 , 1 2 . |Mt − Ms ∨t0 | ≤ ut 3 Suppose t0 ≤ s ∨ t0 < t0 . Now limt→∞ u t (ω) = +∞, so there is a t such that, t > t , ut ≥ That is
3|Ms ∨t0 − Mt0 | .
1 |Mt − Ms ∨t0 | ≤ . Now ut 3 1 1 1 |Mt − Mt0 | ≤ |Ms ∨t0 − Mt0 | + |Mt − Ms ∨t0 |. ut ut ut
So if t > s ∨ t ∨ t0 , 1 |Mt − Mt0 | ≤ , ut and the result is proved. The signal coefficient From (5.8.1),
0
t
t
dxs ⊗ xs = A
t
xs ⊗ xs ds +
0
dw s ⊗ xs ,
0
which we rewrite L t = ATt + Mt , and E[L t | Yt ] = AE[Tt | Yt ] + E[Mt | Yt ].
208
Kalman filtering
An estimate for A is, therefore, Aˆ t = Lˆ t Hˆ t−1 , and the error Aˆ t − A = Mˆ t Hˆ t−1 . Now as a special case of Theorem 5.10.1 we investigate the convergence of this error to zero. Consider a function ρ(t), t ≥ 0, which is positive nondecreasing and such that t
limt→∞
ρ −1 (s)ds = λ < ∞. Note from Theorem 5.10.1 this last condition implies that
0
limt→∞ tρ(t)−1 = 0. An example of such a function is ρ(t) = max(1, t(log t)(log log t)α ),
α > 1.
Clearly any function which grows faster than t α , α > 1, at infinity satisfies the condition. The strongest results are those for functions which have the slowest growth at infinity. Consider the (matrix) martingale Mt . M is locally square integrable; M will denote the predictable nonnegative process such that Mt Mt − Mt is a local martingale. t
In fact Mt = B B
0
gale
xs xs ds and Tr Mt = Tr(B B )
t
Rt =
t
0
xs xs ds. Consider the martin-
ρ(Tr Ms )−1/2 dMs .
0
Lemma 5.10.2 Rt is a square integrable martingale, so limt→∞ Rt = ξ (ω) < ∞ exists a.s. Proof E[Tr(Rt Rt )] Now
t
t
=E
−1
ρ(Tr Ms ) d(Tr Ms ) .
0
ρ(Tr Ms )−1 d(Tr Ms ) < λ a.s. So
0
lim E[Tr(Rt Rt )] ≤ λ < ∞.
t→∞
and Rt is a square integrable martingale for 0 ≤ t ≤ ∞. Corollary 5.10.3 From Theorem 5.10.1, if ρ is continuous, lim ρ(Tr Mt )−1/2 Mt
t→∞
exists. If limt→∞ Tr Mt = +∞, this limit is zero. (Tr Mt is an increasing process so limt→∞ Tr Mt exists and is either finite or +∞). Corollary 5.10.4 lim ρ
t→∞
exists a.s.
0
t
xs xs ds
−(1/2) Mt
5.10 Direct parameter estimation
Proof
209
Note that, apart from the positive constant B ∗ = Tr(B B ), Tr Mt is
0
Therefore as ρ is nondecreasing, ∗
−1
∗
ρ((B + 1) Tr Mt ) = ρ (B + 1) ≤ρ 0
so
ρ 0
t
−1
xs xs ds
With R¯ t =
t
t
−1
B
∗
0
t
xs xs ds
xs xs ds ,
1−1 0 ≤ ρ (B ∗ + 1)−1 Tr Mt .
ρ
0
0
u
xs xs ds
−1/2
dMu ,
we have E[Tr( R¯ t R¯ t )] ≤ (B ∗ + 1)λ < ∞. Therefore, limt→∞ R¯ t exists and is finite a.s., so from Theorem 5.10.1 t
−1/2 lim ρ xs xs ds Mt t→∞
0
exists a.s.
Corollary 5.10.5 Suppose x satisfies the stability property 1 t L = sup xs xs ds < ∞, t t 0 and lim ρ(t)Mt = ∞.
t→∞
Then lim ρ(t)−1/2 Mt = 0
t→∞
a.s.
Proof ρ((B ∗ + 1)−1 (L + 1)−1 Tr Mt )
t ∗ −1 −1 ∗ = ρ (B + 1) (L + 1) B xs xs ds
0
1 t = ρ (B + 1) (L + 1) B t x xs ds t 0 s 0 1 ≤ ρ (B ∗ + 1)−1 (L + 1)−1 B ∗ t L ≤ ρ(t). ∗
−1
−1
∗
t
xs xs ds.
210
Kalman filtering
Therefore ρ(t)−1 ≤ ρ((B ∗ + 1)−1 (L + 1)−1 Tr Mt ). With R˜ t =
t
ρ(s)−1/2 dMs ,
0
we have E[Tr R˜ t R˜ t ] = E
t
−1
ρ(s) d(Tr Ms )
0
≤ (B ∗ + 1)(L + 1)λ < ∞. Therefore, limt→∞ R˜ t exists and is finite a.s. Thus from Theorem 5.10.1 lim ρ (t)−1/2 Mt = 0
t→∞
a.s.
Theorem 5.10.6 Suppose x satisfies the stability property of Corollary 5.10.5 and limt→∞ ρ(t) = ∞. Further, suppose x satisfies the excitation condition ρ(t)−1 Hˆ t > K > 0,
t
where Tt = 0
xs xs ds and Hˆ t = E[Tt | Yt ]. Then lim Mˆ t Hˆ t−1 = 0 a.s.
t→∞
with convergence at a rate ρ(t)1/2 . Then lim ρ(t)−1/2 = 0 a.s.
t→∞
Proof
The stability property states that supt 1 sup E t t
t 0
1 t
t 0
xs xs ds ≤ L a.s. Therefore
xs xs ds ≤ L < ∞,
and because limt→∞ tρ(t)−1 = 0, sup t
1 E[Tr Mt ] < ∞, ρ(t)
and the set of random variables {ρ(t)−1/2 Mt } is bounded in L 2 . We can, therefore, condition the convergence of Corollary 5.10.5 and deduce lim ρ(t)−1/2 Mˆ t = 0
t→∞
a.s.
5.11 Continuous-time nonlinear filtering
211
Now Mˆ t Hˆ t−1 = ρ(t)−1/2 Mˆ t (ρ(t)−1/2 Hˆ t )−1 < ρ(t)−1/2 Mˆ t ρ(t)−1/2 K −1 . Therefore, limt→∞ ρ(t)1/2 Mˆ t Hˆ t−1 = 0 a.s. and the result follows.
The observation coefficient From (5.8.2),
t
t
dys ⊗ xs = C
0
t
xs ⊗ xs ds +
0
dvs ⊗ xs ,
0
which we rewrite Ut = C Tt + Nt , and E[Ut | Yt ] = C E[Tt | Yt ] + E[Nt | Yt ]. An estimate for C is, therefore, Aˆ t = Jˆt Hˆ t−1 , and the error Cˆ t − A = Nˆ t Hˆ t−1 . Similar discussions allow us to conclude that, under the stability and excitation conditions, the error Nˆ t Hˆ t−1 converges to zero almost surely at a rate ρ(t)1/2 . 5.11 Continuous-time nonlinear filtering Suppose (, F, P) is a probability space with a complete filtration {Ft }, t ≥ 0, on which are given two independent Ft -Brownian motion processes Bt and yt with quadratic variations Q(.) and R(.) respectively. Let x0 be a real valued random variable with distribution π0 (.). Consider the Borel functions g : IR × [0, ∞) → IR, s : IR × [0, ∞) → IR, where |g(x1 , t) − g(x2 , t)| ≤ k|x1 − x2 |, |s(x1 , t) − s(x2 , t)| ≤ k|x1 − x2 |. Write Yt = σ {ys : s ≤ t} for the complete filtration generated by the observation process y. Remark 5.11.1 The stochastic differential equation dxt = g(xt )dt + s(xt )dBt , with initial state x0 , has a strong solution.
212
Kalman filtering
Consider the Borel function h : IR × [0, ∞) → IR, where we suppose |h(x, t)| ≤ k(1 + |x|). Define
t
t = exp 0
h(xs )Rs−1 dys
which is also given by
t
t = 1 + 0
1 − 2
t
h
2
0
(xs )Rs−1 ds
,
s h(xs )Rs−1 dys .
(5.11.1)
To see this apply the Itˆo rule to the function log t . Then t is an Ft -martingale and E[t ] = 1. A new probability measure P can be defined by setting dP = t . dP Ft t Define the process bt by the formula bt = yt − h(xs )ds. Then {bt } is a Wiener process 0
under P with quadratic variation R(.). Therefore under P, t yt = h(xs )ds + bt . 0
For any real valued function φ for which the expectation is defined, write σ (φ)t = E[t φ(xt ) | Yt ].
(5.11.2)
In the case when the measure defined by σ (.)t has a density q(x, t), we have σ (φ)t = φ(x)q(x, t)dx. IR
Using the Itˆo rule, we establish
t
φ(xt ) = φ(x0 ) + +
t 0
0
∂φ(xs ) s(xs )dBs ∂x
1 ∂ 2 φ(xs ) 2 ∂φ(xs ) s (x ) + g(x ) ds. s s 2 ∂x2 ∂x
(5.11.3)
In view of (5.11.1) and (5.11.3) and using the Itˆo product rule (Example 3.7.15), t t t φ(xt ) = φ(x0 ) + s dφ(xs ) + φ(xs )ds + [, φ]t 0
= φ(x0 ) +
0
t
+ 0
+
0
t
0 t
s
∂φ(xs ) s s(xs )dBs ∂x
1 2 ∂ 2 φ(xs ) ∂φ(xs ) + g(xs ) s (xs ) ds 2 ∂x2 ∂x
s h(xs )Rs−1 φ(xs )dys .
(5.11.4)
5.11 Continuous-time nonlinear filtering
213
Conditioning both sides of (5.11.4) on Yt and using the fact that Bt and yt are independent and that yt has independent increments under P (it is Wiener) (see [15] Lemma 3.2 of Chapter 7), we obtain a stochastic differential equation for (5.11.2). Theorem 5.11.2 Suppose φ ∈ C 2 is a real valued function with compact support. Then t t σ (φ)t = σ (φ)0 + σ (Aφ)s ds + σ (h(xs )Rs−1 φ(xs ))dys , (5.11.5) 0
0
1 2 ∂ 2 φ(xs ) ∂φ(xs ) + g(xs ) s (xs ) . 2 ∂x2 ∂x If σ (.)t has a density q(x, t), we integrate by parts each term of (5.11.5) using the fact that φ ∈ C 2 has compact support: where Aφ(x) =
φ(x)q(x, t)dx =
IR
or
t
∂ 2 φ(x) ∂x2 0 IR IR t t ∂φ(x) + q(x, s)g(x) q(x, s)h(x)Rs−1 φ(x)dxdys , dxds + ∂x IR IR 0 0 φ(x)q0 (t)dx +
1 2
q(x, s)s 2 (x)
φ(x)q(x, t)dx =
φ(x)q0 (x)dx +
IR
IR
−
t 0
+
t
φ(xs ) IR
IR
0
1 2
t 0
φ(x)
IR
∂ 2 (q(x, s)s 2 (x)) ∂x2
∂(q(x, s)g(x)) dxds ∂x
q(x, s)h(x)Rs−1 φ(x)dxdys ,
for all “test” functions φ, hence Corollary 5.11.3 q satisfies the linear stochastic differential equation t t ∗ q(x, t) = q0 (x) + (A q)(x, s)ds + q(x, s)h(x, s)Rs−1 dys . 0
0
1 ∂ s (xt )q(x, t) ∂g(xt )q(x, t) Here (A∗ q)(x, t) = and q0 (x) is the density such that − 2 ∂x2 ∂x π0 (dx) = q0 (x)dx. 2 2
The correlated case Here we consider nonlinear dynamics with correlated noises. Suppose (, F, P) is a probability space with a complete filtration {Ft }, t ≥ 0, on which are given two Ft -Brownian motion processes Bt ∈ IRd and Wt ∈ IRm such that t
B i , W j t = ρsi j ds, 1 ≤ i, 1 ≤ j ≤ m. 0
x0 ∈ IRd has distribution π0 (.) and is independent of Bt and Wt . Consider the Borel functions
214
Kalman filtering
g : IRd × [0, ∞) → IRd , s : IRd × [0, ∞) → L(IRd , IRd ), h : IRd × [0, ∞) → IRm and the continuous and nonsingular matrix α : [0, ∞) → L(IRm , IRm ). We assume here that |g(x1 , t) − g(x2 , t)| ≤ k|x1 − x2 |, ||s(x1 , t) − s(x2 , t)|| ≤ k|x1 − x2 |, |h(x, t)| ≤ k(1 + |x|), ||α(y)|| ≥ δ > 0 ||α(yt1 )
−
α(yt2 )||
δ and
for some ≤
k|yt1
−
yt2 |.
dxt = g(xt )dt + s(xt )dBt , dyt = h(xt )dt + α(yt )dWt . Write Yt = σ {ys : s ≤ t} for the complete filtration generated by the observation process y. Define
t 1 t −1 −1 −1 2 t = exp − (α(ys ) h(xs )) dWs − |α(ys ) h(xs )| ds 2 0 0
t 1 t −1 −1 −1 2 = exp − (α(ys ) h(xs )) α(ys ) dys + |α(ys ) h(xs )| ds . 2 0 0 Consequently,
t
t = exp
(α(ys )−1 h(xs )) α(ys )−1 dys −
0
1 2
t
|α(ys )−1 h(xs )|2 ds .
0
dP = dP Ft d ¯t ∈ IRm are standard Brownian motions, −1 t , and under P the processes Vt ∈ IR and y where By Girsanov’s Theorem, a new probability measure P can be defined by setting
dVti = dBti + ρ i , α −1 hdt,
ρ i ∈ IRd ,
and d y¯t = α −1 dy = dWt + α −1 hdt. t Furthermore, under P, V i , y j t = ρsi j ds 0
For any real valued function φ for which the expectation is defined write σ (φ)t = E[t φ(xt ) | Yt ]. Theorem 5.11.4 Suppose φ ∈ C 2 (IRd ) is a real valued function with compact support. Then t σ (φ)t = σ (φ)0 + σ (Aφ)s ds +
0 t
{σ ( φ.s.ρ) + α −1 (ys )σ (φh)} α −1 (ys )dys ,
0
where Aφ(x) =
d d 1 ∂ 2 φ(xs ) ∂φ(xs ) (ss )i j (xs ) i j + g i (xs ) . 2 i, j=1 ∂x ∂x ∂xi i=1
5.12 Problems
Proof
215
The proof is left as an exercise. 5.12 Problems
1. Assume that the state and observation processes of a system are given by the vector dynamics (5.4.1) and (5.4.2). For m, k ∈ IN, m < k, write the unnormalized conditional density such that E[k I (X m ∈ dx) | Yk ] = γm,k (x)dx. Using the change of measure techniques described in Section 5.3, show that γm,k (x) = αm (x)βm,k (x), where αm (x) is given recursively by (5.3.6). Show that βm,k (x) = E[m+1,k | X m = x, Yk ] 1 = φm+1 (Ym+1 − Cm+1 z) φ(ym+1 ) IRm × ψm+1 (z − Am+1 x)βm+1,k (z)dz.
(5.12.1)
2. Show that the density βm,k (x) (5.12.1) is Gaussian and derive backward recursions for its conditional mean and covariance matrix ([10] page 101). 3. Assume that the state and observation processes are given by the vector dynamics X k+1 = Ak+1 X k + Vk+1 + Wk+1 ∈ IRm , Yk = Ck X k + Wk ∈ IRd . Ak , Ck are matrices of appropriate dimensions, Vk and Wk are normally distributed with means 0 and respective covariance matrices Q k and Rk , assumed nonsingular. Using measure change techniques derive recursions for the conditional mean and covariance matrix of the state X given the observations Y . 4. Let m = n = 1 in (5.8.1) and (5.8.2). The notation in Section 5.8 and Section 5.9 is used here. Let t be the process defined as t t = xsp ds, p = 1, 2, . . . . 0
Write E[t I(t ∈dx) | Yt ] = µt (x)dx. Show that at time t, the density µt (x) is completely described by the p + 3 statistics st (0), st (1), . . . , st ( p), t , and m t as follows: p st (i) q(x, t), µt (x) = i=1
216
Kalman filtering
where s0 (i) = 0, i = 1, . . . , p, and dst ( p) = − p(At + t−1 Bt2 )st ( p) + 1, dt dst ( p − 1) = −( p − 1)(At + t−1 Bt2 )st ( p − 1) + pst ( p) t−1 Bt2 m t , dt dst (i) 1 = −i(At + t−1 Bt2 )st (i) + (i + 1)(i + 2)st (i + 2) dt 2 + (i + 1)st (i + 1) t−1 Bt2 ,
5. 6. 7. 8. 9. 10. 11.
dst (0) = Bt2 st (2) + t−1 st (1)m t . dt Give a detailed proof of Lemma 5.7.1. Prove (5.7.5), (5.7.6), (5.7.7) and (5.7.3). Finish the proof of Theorem 5.7.5. Give the proof of Theorem 5.7.6. Prove (5.7.39). Establish (5.7.52). Give the proof of Theorem 5.11.4.
i = 1, . . . , p − 2,
6
Financial applications
6.1 Volatility estimation Suppose a price S evolves in discrete time, k = 0, 1, . . . , with dynamics Sk+1 = Sk eµ−
2 σk+1 2
+σk+1 bk+1
.
Here {bk } is a sequence of i.i.d. normal random variables with mean 0 and variance 1 (N (0, 1)) and σk+1 represents the volatility of the price change between times k and k + 1. E[Sk+1 | Sk ] = Sk eµ . The price sequence S0 , S1 , . . . is observed as are the logarithmic increments yk+1 = log
σ2 Sk+1 = µ − k+1 + σk+1 bk+1 . Sk 2
Let us suppose that log σk has dynamics log σk+1 = a + b log σk + θ w k+1 . Here again {w k } is a sequence of i.i.d. N (0, 1) random variables. Writing xk = log σk , so that σk = exk , we see xk+1 = a + bxk + θw k+1 , e2xk yk = µ − + e2xk bk . 2 Now assume that under the reference probability measure P both {xk } and {yk } are sequences of i.i.d. N (0, 1) random variables. Write Gk = σ {x0 , . . . , xk , y0 , . . . , yk−1 }, and denoting by φ(.) the N (0, 1) probability density function φ(θ −1 (xk − a − bxk−1 )) φ(e λk = θ φ(xk )
for k = 1, 2, . . . .
−xk
1 (yk − µ + e2xk )) 2 , exk φ(yk )
(6.1.1)
218
Financial applications
Set 1 φ(e−x0 (y0 − µ + e2x0 )) 2 λ0 = , ex0 φ(y0 ) n
n =
λk .
k=0
dP = n . dP Gn We can then show that under P, {w k }, {bk }, k = 0, 1, . . . are sequences of i.i.d. N (0, 1) random variables, where
Define a new probability measure P (the “real world” probability), by setting
w k = θ −1 (xk − a − bxk−1 ), 1 bk = e−xk (yk − µ + e2xk ). 2 From Bayes’ Theorem 4.1.1, for any Borel measurable function f , E[ f (xk ) | Yk ] =
E[k f (xk ) | Yk ] E[k | Yk ]
.
The numerator defines a measure; suppose it has a density qk (.) so that ∞ E[k f (xk ) | Yk ] = f (z)qk (z)dz,
(6.1.2)
−∞
and we have the recursion Theorem 6.1.1 qk (z) = (z, y)
∞ −∞
φ(θ −1 (z − a − bx))qk−1 (x)dx.
1 e−z φ(e−z (yk − µ + e2z )) 2 Here (z, y) = . θ φ(yk ) This gives the formula for updating the unnormalized conditional density of xk = log σk given Yk . Putting f (x) ≡ 1 in (6.1.2) we see ∞ E[k | Yk ] = qk (z)dz, (6.1.3) −∞
so that the normalized conditional density of xk = log σk given Yk is pk (z) =
qk (z)
∞
−∞
qk (x)dx
.
6.1 Volatility estimation
Furthermore, taking f (xk ) = xk we see
219
∞
E[xk | Yk ] = −∞ ∞ −∞
zqk (z)dz . qk (z)dz
This is the optimal estimate of the logarithm of the volatility given the observations of the price. Calibration Suppose H , F, G are integrable functions. Consider
Sn =
n
H (yk )F(xk )G(xk−1 ).
(6.1.4)
k=1
We wish to estimate E[Sk | Yk ]. Consider an associate measure and suppose there is a density L k (z) such that ∞ E[k Sk f (xk ) | Yk ] = f (z)L k (z)dz,
(6.1.5)
−∞
for any integrable function f . We can derive the following formula for updating L k . Theorem 6.1.2
L k (z) = (z, y)
∞ −∞
+ H (yk )F(z)
φ(θ −1 (z − a − bx))L k−1 (x)dx
∞ −∞
φ(θ −1 (z − a − bx))G(x)qk−1 (x)dx ,
1 e−z φ(e−z (yk − µ + e2z )) 2 where (z, y) = . θ φ(yk ) Proof
Using (6.1.1) and (6.1.4), ∞ E[k Sk f (xk ) | Yk ] = f (z)L k (z)dz −∞
= E[k−1 Sk−1 f (xk )
φ(θ −1 (xk − a − bxk−1 )) θφ(xk )
1 φ(e−xk (yk − µ + e2xk )) 2 × | Yk ] exk φ(yk ) + E[k−1 f (xk )H (yk )F(yk )G(xk−1 ) 1 φ(e−xk (yk − µ + e2xk )) 2 × | Yk ] exk φ(yk )
φ(θ −1 (xk − a − bxk−1 )) θφ(xk )
220
Financial applications
=
∞
−∞
+
∞
−∞
∞
−∞
φ(θ −1 (z − a − bx)) f (z)(z, y)L k−1 (x)dxdz ∞
−∞
φ(θ −1 (z − a − bx))(z, y)
× H (yk )F(z)G(x) f (z)qk−1 (x)dxdz. This equality holds for all integrable f and the result follows. Corollary 6.1.3 Taking f (z) ≡ 1 in (6.1.5) we see ∞ E[k Sk | Yk ] = L k (z)dz.
(6.1.6)
−∞
Further, from Bayes’ Theorem 4.1.1, E[Sk | Yk ] =
E[k Sk | Yk ]
=
E[k | Yk ]
∞
−∞ ∞ −∞
1. For sk1 =
L k (z)dz . qk (z)dz
Special cases
k i=1
xi a measure γk1 is defined by E[k Sk1 f (xk ) | Yk ] =
∞ −∞
f (z)γk1 (z)dz.
(6.1.7)
This is updated by the formula γk1 (z) = (z, y) +z
∞ −∞
∞
−∞
φ(θ
1 φ(θ −1 (z − a − bx))γk−1 (x)dx
−1
(z − a − bx))G(x)qk−1 (x)dx .
Then E[Sk1 | Yk ] =
∞
−∞ ∞ −∞
2. For sk2 =
k i=1
γk1 (z)dz . qk (z)dz
xi−1 the corresponding measure γk2 is updated by
γk2 (z)
= (z, y) +
∞ −∞
∞
xφ(θ −∞
2 φ(θ −1 (z − a − bx))γk−1 (x)dx
−1
(z − a − bx))G(x)qk−1 (x)dx .
6.2 Parameter estimation
3. For Jk =
k
221
xi xi−1 the corresponding measure βk1 is updated by
∞ 1 βk1 (z) = (z, y) φ(θ −1 (z − a − bx))βk−1 (x)dx
i=1
+z
−∞
∞
−∞
xφ(θ −1 (z − a − bx))G(x)qk−1 (x)dx .
4. Similar formulae, which are all special cases of the expression for L k (z), are obtained for updating the measures: βk2 (z) associated with
k
2 xi−1 ,
i=1
βk3 (z) associated with
k
xi2 ,
i=1
νk1 (z) associated with
k
yi e−2xi ,
i=1
νk2 (z) associated with
k
e−2xi .
i=1
In all cases the conditional expectation of the sum, given the observations, is obtained by normalizing the integral of the associated measure. For example, ∞
νk1 (z)dz k −∞ −2xi E yi e | Yk = ∞ . i=1 qk (z)dz −∞
6.2 Parameter estimation Estimates of the sums above can be used to apply to the EM algorithm. Parameters in our model can be re-estimated recursively and, further, one parameter at a time can be updated. For example, suppose after some iteration a parameter set (a, b, θ, µ) is obtained and we wish to re-estimate the parameter b, given the observations y1 , y2 , . . . , yk . ˆ This is Consider a change of measure which replaces parameter b in our model by b. given by a Radon–Nikodym derivative ˆ
bk =
k ˆ i−1 )) φ(θ −1 (xi − a − bx , φ(θ −1 (xi − a − bxi−1 )) i=1
dP b ˆ = bk . dP b Gk ˆ The maximizing step determines the conditional expectation of log bk given the observations. That is, consider
k 1 −1 2 bˆ ˆ i−1 ) ) + R(b) | Yk , E[log k | Yk ] = E − (θ (xi − a − bx 2 i=1 ˆ
and setting
where R(b) does not involve b.
222
Financial applications
The first order condition gives the maximum value of bˆ as: k k xi xi−1 − a i=1 xi−1 | Yk ] E[ i=1 bˆk = k 2 E[ i=1 xi−1 | Yk ] ∞ (βk1 (z) − aγk2 (z))dz = −∞ ∞ . βk2 (z)dz −∞
Similar arguments gives estimates k k 1 xi − b xi−1 | Yk E k i=1 i=1 ∞ (γk1 (z) − bγk2 (z))dz = −∞ ∞ . k qk (z)dz
aˆ k =
−∞
k 1 E (xi − a − bxi−1 )2 | Yk 2k i=1 ∞ F(z)dz a −∞ = − , ∞ 2 2k qk (z)dz
(θˆk )2 =
−∞
k k yi e2xi | Yk +E 2 i=1 ∞ ∞ t qk (z)dz + νk1 (z)dz 2 −∞ −∞ ∞ = . νk1 (z)dz
µˆ k =
−∞
Here F(z) = βk3 (z) + bβk2 (z) − 2bβk1 (z) − 2aγk1 (z) + 2abγk2 (z). 6.3 Filtering a price process Suppose in discrete time a price S has the form Sk+1 = Sk eYk+1 , where Yk+1 = ck + σk bk+1 . Here {b } is a sequence of i.i.d. normal random variables with mean 0 and variance 1 (N (0, 1)). Suppose (ck , σk ) takes values in a finite set B = {(ci , σi ) : 1 ≤ i ≤ N }. Write c = (c1 , c2 , . . . , c N ) ,
σ = (σ1 , σ2 , . . . , σ N ) ,
6.4 Parameter estimation
223
and suppose that (ck , σk ) evolves as a Markov chain with state space B. We can identify B with S = {e1 , e2 , . . . , e N }, where, as before, ei = (0, . . . , 1, . . . , 0) ∈ IR N . Suppose φ : B → S gives this bijection, so that for each i, 1 ≤ i ≤ N , φ(ci , σi ) = ei . Write X k = φ((ck , σk )) (where k now denotes the time parameter). Then ck = c, X k , and σk = σ, X k . We suppose X is a Markov chain on (, F, P) with state space S and transition matrix A. The state space S could be quite small and X could represent the state of the economy as “good”, “bad”, or “average”. Of course X is not observed directly. Instead we observe logarithmic increments of the price process: Yk+1 = log
Sk+1 = ck + σk bk+1 = c, X k + σ, X k bk+1 . Sk
The N (0, 1) random variable b models a purely random noise in the dynamics. The Markov chain X also models some random behavior, but hopefully random behavior with some structure. The results of the previous sections can now be applied. For any price process {Sk }, k = 1, 2, . . . , the steps are 1. calculate the sequence of logarithmic increments Yk+1 = log
Sk+1 = ck + σk bk+1 = c, X k + σ, X k bk+1 , Sk
2. choose “appropriate” prior values for {(ci , σi ) : 1 ≤ i ≤ N } and for the transition probabilities a ji = P(X k+1 = e j | X k = ei ) ≥ 0,
with Nj=1 a ji = 1, 3. after n values of Y have been observed, calculate new estimates for c, σ and the a, 4. use these values, iteratively, to re-estimate the c, σ and the a. The EM algorithm implies the estimates improve monotonically, in the sense that the expected log-likelihood increases with each re-estimation. Consequently, the model is ‘self-tuning’. This step is repeated until some stopping criterion is satisfied. 6.4 Parameter estimation for a modified Kalman filter This application considers a slightly modified linear Gaussian model. Consider the following model for the spot price of oil S: dSt = (µ − δt )St dt + σ1 St dz 1 (t).
(6.4.1)
224
Financial applications
Here z 1 is a standard Brownian motion and δt represents the “convenience yield”. (This models the value of holding amounts of the commodity.) In fact it is supposed that δ follows similar stochastic dynamics of the form dδt = κ(α − δt )dt + σ2 dz 2 (t).
(6.4.2)
Here z 2 is a second standard Brownian motion with z 1 (t), z 2 (t) = ρt. It is convenient to consider the logarithm of the stock price, X t = loge St . Then X satisfies 1 dX t = κ(µ − δt − σ12 )dt + σ1 dz 1 (t). 2
(6.4.3)
If r is the risk-free interest rate (taken to be constant here) and λ is the market price of convenience yield risk (also assumed constant), S and δ follow similar processes under an equivalent martingale measure. However, it is equations (6.4.2) and (6.4.3) which we discretize to give dynamics for the state vector (X t , δt ) as: (X t , δt ) = ct + Q t (X t−1 , δt−1 ) + ηt .
(6.4.4)
1 Here ct = ((µ − σ12 )t, καt) ∈ IR2 and 2 1 −t Qt = , a 2 × 2 matrix. 0 1 − κt The future price for oil for delivery at time T ≥ 0 is given by:
(1 − e−κ T ) F(S, δ, T ) = S exp −δ + A(T ) , κ where 1 σ22 1 (1 − e−κ T ) σ1 σ 2 ρ A(T ) = r − α + T + σ22 − 2 2κ κ 4 κ3 σ 2 (1 − e−κ T ) + ακ + σ1 σ2 ρ − 2 . κ κ2 Here S is the spot price today, T = 0 and δ is the value of the convenience yield today, T = 0. Consequently, loge F(S, δ, T ) = loge S − δ
(1 − e−κ T ) + A(T ). κ
(6.4.5)
6.4 Parameter estimation
225
It is these future prices, for various dates T , which are given in the market. That is, for different dates T1 , T2 , . . . , TN we have observations yt1 = loge F(S, δ, T 1 ), .. . ytN = loge F(S, δ, T N ). It is supposed these observations give the right hand side of (6.4.5) plus some “noise” term εt ∈ IR N , where εt = (εt1 , . . . , εtN ) is a sequence for t = 0, 1, . . . of independent Gaussian random variables with E[εt ] = 0 ∈ IR N and Var εt = E[εt εt ] = H ∈ IR N ×N . The observation equation (6.4.5), plus εt noise on the right side, therefore has the form: yt = dt + Z t (X t , δt ) + εt , for t = 1, 2, . . . , T , where
(6.4.6)
loge F(S, δ, T 1 ) yt1 .. yt = ... = .
ytN
loge F(S, δ, T N )
are the future prices at time t for delivery at times t + T 1 , . . . , t + T N . 1 1, −κ −1 (1 − eκ T ) A(T1 ) 2 1, −κ −1 (1 − eκ T ) .. N . dt = . ∈ IR , Z t = .. . A(TN ) N 1, −κ −1 (1 − eκ T ) The model, in summary, has dynamics (6.4.4) for the “signal” (X t , δt ), (X t , δt ) = c + Q(X t−1 , δt−1 ) + ηt ,
(6.4.7)
and dynamics (6.4.6) for the observations, yt = (yt1 , . . . , ytN ), yt = dt + Z t (X t , δt ) + εt .
(6.4.8)
Note that, in spite of Schwartz’s notation, c, Q, d and Z do not depend on t. They do include t, the time increment of fixed size. Equations (6.4.7) and (6.4.8) are of the form where the classical Kalman filter can be applied. This considers linear dynamics for the signal X t = (X t1 , . . . , X tm ) ∈ IRm , t = 0, 1, . . . , X t+1 = A¯ + AX t + Bw t+1 ,
A ∈ IRm×m ,
(6.4.9)
and observations yt = C¯ + C X t + Dvt ,
t = 0, 1, . . ..
(6.4.10)
¯ this model is slightly different Note that, because of the inclusion of the terms A¯ and C, from that considered previously.
226
Financial applications
One observes yt , t = 0, 1, . . . , T, . . . , and wishes to make the best estimate of X t . This is the quantity Xˆ t = E[X t | y0 , y1 , . . . , yt ]. In fact Xˆ t is also a Gaussian random variable with conditional mean µt = Xˆ t = E[X t | y0 , y1 , . . . , yt ], and variance Rt = E[(X t − µt )(X t − µt ) | y0 , y1 , . . . , yt ]. In fact, the formulae are better written in terms of the one-step predictions: µk|k−1 = E[xk | y0 , y1 , . . . , yk−1 ] = A¯ + Aµk−1 , and Rk|k−1 = E[(X k − µk|k−1 )(X k − µk|k−1 ) | y0 , y1 , . . . , yk−1 ]. Then Rk|k−1 = B 2 + A Rk−1 A .
Kalman filter The (modified) Kalman filter then gives recursive updates: µk+1 = A¯ + Aµk + Rk+1|k C (C Rk+1|k C + D D )−1 × (yk+1 − C¯ − C A¯ − C Aµk ), Rk+1 = Rk+1|k − Rk+1|k C (C Rk+1|k C + D D )−1 C Rk+1|k . As stated, µk = Xˆ k = E[X t | y0 , y1 , . . . , yk ] is the conditional mean, or best estimate, of X k given y0 , y1 , . . . , yk . Similarly, Rk = E[(X k − µk )(X k − µk ) | y0 , y1 , . . . , yk ].
Parameter estimation

However, to implement the Kalman filter, knowledge of the parameters Ā, A, B, C̄, C, D is required. Our algorithms, when modified for these "affine" dynamics, provide optimal ways of estimating these parameters.

In fact, consider the following recursions for a_k^{ij(M)} ∈ R, b_k^{ij(M)} ∈ R^m and d_k^{ij(M)} ∈ R^{m×m} (a symmetric matrix with elements d_k(p, q), p = 1, ..., m, q = 1, ..., m), together with ā_k^{in}, b̄_k^{in}, u_k^i, v_k^i, ū_k^i, v̄_k^i. Here M = 0, 1, 2, 1 ≤ i, j ≤ m, 1 ≤ n ≤ d, and

σ_{k+1}^{−1} = R_k^{−1} − R_k^{−1} A' R_{k+1|k}^{−1} A R_k^{−1}.

For M = 0, 1:

a_{k+1}^{ij(M)} = a_k^{ij(M)} + b_k^{ij(M)'} σ_{k+1}^{−1} R_k^{−1} μ_k + Tr[d_k^{ij(M)} σ_{k+1}^{−1}] + μ_k' R_k^{−1} σ_{k+1}^{−1} d_k^{ij(M)} σ_{k+1}^{−1} R_k^{−1} μ_k − b_k^{ij(M)'} R_k A' R_{k+1|k}^{−1} Ā + Ā' R_{k+1|k}^{−1} A R_k d_k^{ij(M)} R_k A' R_{k+1|k}^{−1} Ā − 2 Ā' R_{k+1|k}^{−1} A R_k d_k^{ij(M)} σ_{k+1}^{−1} R_k^{−1} μ_k,    a_0^{ij(M)} = 0 ∈ R,

b_{k+1}^{ij(0)} = R_{k+1|k}^{−1} A R_k b_k^{ij(0)} + 2 d_k^{ij(0)} σ_{k+1}^{−1} R_k^{−1} μ_k − 2 d_k^{ij(0)} R_k A' R_{k+1|k}^{−1} Ā,    b_0^{ij(0)} = 0 ∈ R^m,

d_{k+1}^{ij(0)} = R_{k+1|k}^{−1} A R_k d_k^{ij(0)} R_k A' R_{k+1|k}^{−1} + ½ (e_i e_j' + e_j e_i'),    d_0^{ij(0)} = ½ (e_i e_j' + e_j e_i') ∈ R^{m×m},

b_{k+1}^{ij(1)} = R_{k+1|k}^{−1} A R_k b_k^{ij(1)} + 2 d_k^{ij(1)} σ_{k+1}^{−1} R_k^{−1} μ_k − 2 d_k^{ij(1)} R_k A' R_{k+1|k}^{−1} Ā + e_i e_j' σ_{k+1}^{−1} R_k^{−1} μ_k − e_i e_j' R_k A' R_{k+1|k}^{−1} Ā,    b_0^{ij(1)} = 0 ∈ R^m,

d_{k+1}^{ij(1)} = R_{k+1|k}^{−1} A R_k d_k^{ij(1)} R_k A' R_{k+1|k}^{−1} + ½ (e_i e_j' R_k A' R_{k+1|k}^{−1} + R_{k+1|k}^{−1} A R_k e_j e_i'),    d_0^{ij(1)} = 0 ∈ R^{m×m}.

For M = 2:

a_{k+1}^{ij(2)} = a_k^{ij(2)} + b_k^{ij(2)'} σ_{k+1}^{−1} R_k^{−1} μ_k + Tr[d_k^{ij(2)} σ_{k+1}^{−1}] + μ_k' R_k^{−1} σ_{k+1}^{−1} d_k^{ij(2)} σ_{k+1}^{−1} R_k^{−1} μ_k + Tr[e_i e_j' σ_{k+1}^{−1}] + μ_k' R_k^{−1} σ_{k+1}^{−1} e_i e_j' σ_{k+1}^{−1} R_k^{−1} μ_k + Ā' R_{k+1|k}^{−1} A R_k e_i e_j' R_k A' R_{k+1|k}^{−1} Ā − b_k^{ij(2)'} R_k A' R_{k+1|k}^{−1} Ā + Ā' R_{k+1|k}^{−1} A R_k d_k^{ij(2)} R_k A' R_{k+1|k}^{−1} Ā − 2 Ā' R_{k+1|k}^{−1} A R_k d_k^{ij(2)} σ_{k+1}^{−1} R_k^{−1} μ_k − Ā' R_{k+1|k}^{−1} A R_k (e_i e_j' + e_j e_i') σ_{k+1}^{−1} R_k^{−1} μ_k,    a_0^{ij(2)} = 0 ∈ R,

b_{k+1}^{ij(2)} = R_{k+1|k}^{−1} A R_k b_k^{ij(2)} + 2 d_k^{ij(2)} σ_{k+1}^{−1} R_k^{−1} μ_k − 2 d_k^{ij(2)} R_k A' R_{k+1|k}^{−1} Ā + R_{k+1|k}^{−1} A R_k (e_i e_j' + e_j e_i') σ_{k+1}^{−1} R_k^{−1} μ_k − R_{k+1|k}^{−1} A R_k (e_i e_j' + e_j e_i') R_k A' R_{k+1|k}^{−1} Ā,    b_0^{ij(2)} = 0 ∈ R^m,

d_{k+1}^{ij(2)} = R_{k+1|k}^{−1} A R_k [d_k^{ij(2)} + ½ (e_i e_j' + e_j e_i')] R_k A' R_{k+1|k}^{−1},    d_0^{ij(2)} = 0 ∈ R^{m×m}.

Furthermore, for 1 ≤ i ≤ m and 1 ≤ n ≤ d:

ā_{k+1}^{in} = ā_k^{in} + b̄_k^{in'} σ_{k+1}^{−1} R_k^{−1} μ_k − b̄_k^{in'} R_k A' R_{k+1|k}^{−1} Ā,    ā_0^{in} = 0 ∈ R,

b̄_{k+1}^{in} = R_{k+1|k}^{−1} A R_k b̄_k^{in} + e_i ⟨y_{k+1}, e_n⟩,    b̄_0^{in} = e_i ⟨y_0, e_n⟩,

u_{k+1}^i = u_k^i + v_k^{i'} σ_{k+1}^{−1} R_k^{−1} μ_k − v_k^{i'} R_k A' R_{k+1|k}^{−1} Ā,    u_0^i = 0,

v_{k+1}^i = R_{k+1|k}^{−1} A R_k v_k^i + e_i,    v_0^i = e_i ∈ R^m,

ū_{k+1}^i = ū_k^i + (v̄_k^i + e_i)' σ_{k+1}^{−1} R_k^{−1} μ_k − (v̄_k^i + e_i)' R_k A' R_{k+1|k}^{−1} Ā,    ū_0^i = 0 ∈ R,

v̄_{k+1}^i = R_{k+1|k}^{−1} A R_k (v̄_k^i + e_i),    v̄_0^i = 0 ∈ R^m.
Here Tr[·] denotes the trace of a matrix (the sum of its diagonal elements). Write

H_k^{(0)} = Σ_{l=0}^k x_l x_l',    H_k^{(1)} = Σ_{l=1}^k x_l x_{l−1}',    H_k^{(2)} = Σ_{l=1}^k x_{l−1} x_{l−1}',

J_k = Σ_{l=0}^k x_l y_l',    L_k = Σ_{l=0}^k x_l,    L̄_k = Σ_{l=1}^k x_{l−1},

and Ĥ_k^{(0)} = E[H_k^{(0)} | Y_k], Ĥ_k^{(1)} = E[H_k^{(1)} | Y_k], etc. Then for M = 0, 1, 2:

E[H_k^{ij(M)} | Y_k] = a_k^{ij(M)} + b_k^{ij(M)'} μ_k + Tr[d_k^{ij(M)} R_k] + μ_k' d_k^{ij(M)} μ_k,

E[J_k^{in} | Y_k] = ā_k^{in} + b̄_k^{in'} μ_k,

E[L_k^i | Y_k] = u_k^i + v_k^{i'} μ_k,

E[L̄_k^i | Y_k] = ū_k^i + v̄_k^{i'} μ_k.

These equations give recursive finite-dimensional filters for estimating the matrices and vectors H_k^{(M)}, M = 0, 1, 2, J_k, L_k and L̄_k, given the observations y_0, y_1, ..., y_k.
The revised estimates for the parameters A, B, C, D, Ā, C̄ are then (given y_0, y_1, ..., y_k):

Ā_k = (1/k)(L̂_k − A L̄̂_k),    C̄_k = (1/(k+1))(Σ_{l=0}^k y_l − C L̂_k),

A_k = (Ĥ_k^{(1)} − Ā L̄̂_k')(Ĥ_k^{(2)})^{−1},    C_k = (Ĵ_k' − C̄ L̂_k')(Ĥ_k^{(0)})^{−1},

B_k² = (1/k){Ĥ_k^{(0)} − (A Ĥ_k^{(1)'} + Ĥ_k^{(1)} A') + A Ĥ_k^{(2)} A' − (Ā L̂_k' + L̂_k Ā') + (Ā L̄̂_k' A' + A L̄̂_k Ā') + k Ā Ā'},

(D D')_k = (1/(k+1)){Σ_{l=0}^k y_l y_l' − (Ĵ_k' C' + C Ĵ_k) + C Ĥ_k^{(0)} C' − C̄ Σ_{l=0}^k y_l' − Σ_{l=0}^k y_l C̄' + (C L̂_k C̄' + C̄ L̂_k' C') + (k+1) C̄ C̄'}.
Given observations y0 , y1 , . . . , yk , the parameters are initialized and the above algorithms run to re-estimate the parameters one at a time. With the same y0 , y1 , . . . , yk this process is iterated until some stopping rule is satisfied.
6.5 Estimating the implicit interest rate of a risky asset

In this section a risky asset is considered whose price at time t is described by an equation of the form

dS_t = S_t (ρ_t dt + σ dB_t),    t ≥ 0.    (6.5.1)

Here the drift coefficient ρ_t is the underlying interest rate of the risky asset, B is a standard Brownian motion, and the integrals are taken to be Itô integrals. This model is used frequently, and often the coefficients ρ and σ are supposed to be constant. Various forms and methods of estimating the volatility, or diffusion coefficient, σ can be found in the literature; see for example [33]. We shall suppose σ is constant and determined by one of these techniques. We shall suppose that the implicit interest rate ρ_t behaves like a Markov chain with state space {r_1, ..., r_N}; r will denote the (column) vector (r_1, ..., r_N)'. Suppose S_0 = S. Then from (6.5.1),

S_t = S exp(∫_0^t (ρ_u − ½σ²) du + σ B_t).

Write Y_t = ln S_t − ln S. Then

Y_t = ∫_0^t (ρ_u − ½σ²) du + σ B_t.
Now ρ_t − ½σ² takes values in the set {r_1 − ½σ², ..., r_N − ½σ²}. Write g_i = r_i − ½σ² and g for the (column) vector (g_1, ..., g_N)'. Without loss of generality, we shall consider a Markov chain on S = {e_1, ..., e_N} (see Example 2.6.17). Here, for 1 ≤ i ≤ N, e_i = (0, ..., 1, ..., 0)' is the i-th unit (column) vector in R^N. If X_t ∈ S denotes the state of this Markov chain at time t ≥ 0, then the corresponding value of ρ_t is ⟨X_t, r⟩, where ⟨·, ·⟩ denotes the inner product in R^N. A natural process to take as the observation process is Y_t, which can be written

Y_t = ∫_0^t ⟨X_u, g⟩ du + σ B_t.    (6.5.2)
Write F_t for the right-continuous, complete filtration generated by σ{X_r, Y_r : r ≤ t}, and Y_t for the right-continuous, complete filtration σ{Y_r : r ≤ t} generated by the observation process. We have the following semimartingale representation result (see Lemma 2.6.18):

X_t = X_0 + ∫_0^t A X_r dr + V_t.    (6.5.3)
Filtering

We model the above dynamics by supposing that initially we have an "ideal" probability space (Ω, F, P̄) such that under P̄:

1. X is a Markov chain with representation (6.5.3),
2. σ^{−1} Y is a standard Brownian motion, independent of X.

Define

Λ_t = exp(∫_0^t ⟨X_u, g⟩ σ^{−2} dY_u − ½ ∫_0^t ⟨X_u, g⟩² σ^{−2} du),

which is also given by

Λ_t = 1 + ∫_0^t Λ_s ⟨X_s, g⟩ σ^{−2} dY_s.    (6.5.4)

To see this, apply the Itô rule to the function log Λ_t. Then Λ_t is an F_t-martingale and Ē[Λ_t] = 1. A new probability measure P can be defined by setting

dP/dP̄ |_{F_t} = Λ_t.

Define the process B_t by the formula

dB_t = σ^{−1} (dY_t − ⟨X_t, g⟩ dt),    B_0 = 0.

Then Girsanov's theorem 4.3.3 implies that {B_t} is a standard Brownian motion process under P. Therefore, under P,

dY_t = ⟨X_t, g⟩ dt + σ dB_t.    (6.5.5)

Note that under P the process {X_t} still satisfies (6.5.3). Consequently, under P the processes {X_t} and {Y_t} satisfy the real-world dynamics (6.5.3) and (6.5.2). However, P̄ is a
more convenient measure with which to work. Using a version of Bayes' Theorem (4.1.1),

E[X_t | Y_t] = Ē[Λ_t X_t | Y_t] / Ē[Λ_t | Y_t].

Write

σ(X_t) = Ē[Λ_t X_t | Y_t].    (6.5.6)

Note that Ē[Λ_t | Y_t] = Σ_{i=1}^N σ(⟨X_t, e_i⟩) = σ(⟨X_t, Σ_{i=1}^N e_i⟩) = σ(1). More simply, Ē[Λ_t | Y_t] = ⟨σ(X_t), 1⟩, where 1 is the N-dimensional vector with all entries equal to 1. In view of (6.5.4) and (6.5.3), and using the Itô product rule (3.7.15),

Λ_t X_t = X_0 + ∫_0^t Λ_s A X_s ds + ∫_0^t Λ_s ⟨X_s, g⟩ X_s σ^{−2} dY_s
        = X_0 + ∫_0^t Λ_s A X_s ds + ∫_0^t Λ_s G X_s σ^{−2} dY_s.    (6.5.7)

Here G is the diagonal matrix whose entries are g_1, ..., g_N. Conditioning both sides of (6.5.7) on Y_t, and using the fact that Y_t has independent increments under P̄ (it is a scaled Wiener process; see [15], Lemma 3.2, p. 261), we have the following finite-dimensional filter for σ(X_t):

σ(X_t) = σ(X_0) + ∫_0^t A σ(X_s) ds + ∫_0^t G σ(X_s) σ^{−2} dY_s.    (6.5.8)

Note that σ(ρ_t) = ⟨σ(X_t), r⟩. For s ≤ t the smoother for ⟨X_s, e_i⟩ is defined as E[⟨X_s, e_i⟩ | Y_t], with unnormalized form under P̄

Ē[⟨X_s, e_i⟩ Λ_t | Y_t] = σ_t(⟨X_s, e_i⟩).

However, it is more convenient to work with σ_t(⟨X_s, e_i⟩ X_t) (see [10], Chapter 8, for more details), and we have, for t ≥ s,

σ_t(⟨X_s, e_i⟩ X_t) = σ_s(⟨X_s, e_i⟩ X_s) + ∫_s^t A σ_u(⟨X_s, e_i⟩ X_u) du + ∫_s^t G σ_u(⟨X_s, e_i⟩ X_u) σ^{−2} dY_u.

This is a finite-dimensional filter for σ_t(⟨X_s, e_i⟩ X_t). Consequently, σ_t(⟨X_s, e_i⟩) = ⟨σ_t(⟨X_s, e_i⟩ X_t), 1⟩ and

E[⟨X_s, e_i⟩ | Y_t] = σ_t(⟨X_s, e_i⟩) / Σ_{j=1}^N σ_t(⟨X_s, e_j⟩).

Revising the parameters

In addition to the volatility σ, the parameters introduced in the above model are the values r_i, 1 ≤ i ≤ N, of the implicit interest rate, and the entries a_{ij}, 1 ≤ i, j ≤ N, of the Q-matrix
A. Recall g_i = r_i − ½σ². Using the expectation maximization (EM) algorithm, it is shown in [10], Chapter 8, that the revised estimates are given by

â_{ji} = σ(N_t^{ij}) / σ(J_t^i),    ĝ_i = σ(G_t^i) / σ(J_t^i).

Here N_t^{ij} is the number of jumps of X from e_i to e_j in the time interval [0, t], J_t^i = ∫_0^t ⟨X_u, e_i⟩ du is the amount of time X spends in state e_i during [0, t], and

G_t^i = ∫_0^t ⟨X_u, e_i⟩ σ^{−2} dY_u = g_i ∫_0^t ⟨X_u, e_i⟩ σ^{−2} du + ∫_0^t ⟨X_u, e_i⟩ σ^{−1} dB_u.
The unnormalized estimates are given by the following linear equations:

σ(N_t^{ij} X_t) = ∫_0^t A σ(N_s^{ij} X_s) ds + ∫_0^t ⟨σ(X_s), e_i⟩ a_{ji} e_j ds + ∫_0^t G σ(N_s^{ij} X_s) σ^{−2} dY_s,    (6.5.9)

σ(J_t^i X_t) = ∫_0^t A σ(J_s^i X_s) ds + ∫_0^t ⟨σ(X_s), e_i⟩ e_i ds + ∫_0^t G σ(J_s^i X_s) σ^{−2} dY_s,    (6.5.10)

σ(G_t^i X_t) = ∫_0^t A σ(G_s^i X_s) ds + g_i ∫_0^t ⟨σ(X_s), e_i⟩ e_i σ^{−2} ds + ∫_0^t (G σ(G_s^i X_s) + ⟨σ(X_s), e_i⟩ e_i) σ^{−2} dY_s.    (6.5.11)

In each case we have

σ(N_t^{ij}) = ⟨σ_t(N_t^{ij} X_t), 1⟩,    σ(J_t^i) = ⟨σ_t(J_t^i X_t), 1⟩,    σ(G_t^i) = ⟨σ_t(G_t^i X_t), 1⟩.

Numerical methods

Here we describe numerical approximations to (6.5.8), (6.5.9), (6.5.10) and (6.5.11). Write q_t = σ(X_t). Then (6.5.8) is

q_t = σ(X_0) + ∫_0^t A q_s ds + ∫_0^t G q_s σ^{−2} dY_s.

Suppose h = t/n. For 0 ≤ k < n a first approximation gives

q_{(k+1)h} = q_{kh} + A q_{kh} h + G q_{kh} σ^{−2} (Y_{(k+1)h} − Y_{kh}).
However, this neglects terms which do not converge to 0 as h → 0. To capture these terms, Milshtein [27] noted that one should substitute

q_{kh} + A q_{kh} (s − kh) + G q_{kh} σ^{−2} (Y_s − Y_{kh})

for q_s in the expression

q_{(k+1)h} = q_{kh} + ∫_{kh}^{(k+1)h} A q_s ds + ∫_{kh}^{(k+1)h} G q_s σ^{−2} dY_s.

Neglecting terms which converge to 0 as h → 0, the Milshtein approximation is then

q_{(k+1)h} = q_{kh} + A q_{kh} h + G q_{kh} σ^{−2} (Y_{(k+1)h} − Y_{kh}) + ½ G² q_{kh} σ^{−4} [(Y_{(k+1)h} − Y_{kh})² − σ² h].
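As a concrete sketch of this step, the following function advances the unnormalized filter vector q by one Milshtein increment. It uses plain lists, and assumes (as an illustrative convention, not the book's) that the Q-matrix acts by rows, i.e. (A q)_i = Σ_j A[i][j] q[j].

```python
# Milshtein step for the unnormalized filter q_t = σ(X_t), following the
# discretization above. A is the N x N rate matrix, g the drift levels,
# sigma the volatility, h the step size and dY the observation increment.
# Row-action convention (A q)_i = sum_j A[i][j] q[j] is an assumption here.

def milstein_step(q, A, g, sigma, h, dY):
    """Advance q_{kh} -> q_{(k+1)h} given the log-price increment dY."""
    N = len(q)
    c1 = dY / sigma ** 2                               # G q σ^{-2} ΔY factor
    c2 = 0.5 * (dY ** 2 - sigma ** 2 * h) / sigma ** 4  # Milshtein correction
    q_new = []
    for i in range(N):
        drift = sum(A[i][j] * q[j] for j in range(N)) * h
        q_new.append(q[i] + drift + g[i] * q[i] * c1 + g[i] ** 2 * q[i] * c2)
    return q_new
```

Because G is diagonal with entries g_i, both the first-order and the correction terms act componentwise, which is why no matrix multiplication is needed for them.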
A full discussion of the Milshtein scheme, and of other more sophisticated schemes, can be found in [22]. Write n_t^{ij} = σ(N_t^{ij} X_t). Then (6.5.9) becomes

n_t^{ij} = ∫_0^t A n_s^{ij} ds + ∫_0^t ⟨q_s, e_i⟩ a_{ji} e_j ds + ∫_0^t G n_s^{ij} σ^{−2} dY_s.

The Milshtein form in this case is

n_{(k+1)h}^{ij} = ⟨q_{kh}, e_i⟩ a_{ji} e_j h + [I + A h + G (Y_{(k+1)h} − Y_{kh}) σ^{−2} + ½ G² [(Y_{(k+1)h} − Y_{kh})² − σ² h] σ^{−4}] n_{kh}^{ij}.

Similarly, writing τ_t^i = σ(J_t^i X_t) and discretizing (6.5.10),

τ_{(k+1)h}^i = ⟨q_{kh}, e_i⟩ e_i h + [I + A h + G (Y_{(k+1)h} − Y_{kh}) σ^{−2} + ½ G² [(Y_{(k+1)h} − Y_{kh})² − σ² h] σ^{−4}] τ_{kh}^i.

Finally, with γ_t^i = σ(G_t^i X_t), discretizing (6.5.11),

γ_{(k+1)h}^i = g_i ⟨q_{kh}, e_i⟩ e_i σ^{−2} h + [I + A h + G (Y_{(k+1)h} − Y_{kh}) σ^{−2} + ½ G² [(Y_{(k+1)h} − Y_{kh})² − σ² h] σ^{−4}] γ_{kh}^i + σ^{−2} ⟨q_{kh}, e_i⟩ e_i (Y_{(k+1)h} − Y_{kh}) + ½ σ^{−4} ⟨G q_{kh}, e_i⟩ [(Y_{(k+1)h} − Y_{kh})² − σ² h] e_i.    (6.5.12)
New estimates for the parameters a_{ji} and g_i, based on the observations of the price up to time t = nh, are therefore

â_{ji} = ⟨n_t^{ij}, 1⟩ / ⟨τ_t^i, 1⟩,    ĝ_i = ⟨γ_t^i, 1⟩ / ⟨τ_t^i, 1⟩.
Using the smoothed versions of (6.5.9), (6.5.10) and (6.5.11) (see [10] Chapter 8), and possibly additional data, a second revised estimate for these parameters can be obtained. Iterating this procedure provides a monotonic, increasing sequence of probability densities, so, in terms of maximizing the expectation, the models are improving with each step and the estimation methods are self-tuning.
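The re-estimation step just described reduces to summing the components of the filtered vectors and taking ratios. A minimal sketch, assuming the filtered quantities n_t^{ij}, τ_t^i and γ_t^i have already been computed (here they are simply passed in as nested lists; the indexing convention is stated in the comments and is this sketch's own):

```python
# Re-estimation step: a_hat[i][j] stands for the estimate of a_{ji}, formed as
# <n^{ij}, 1> / <tau^i, 1>; g_hat[i] = <gamma^i, 1> / <tau^i, 1>.
# n[i][j], tau[i] and gamma[i] are the N-vectors of unnormalized filtered values.

def reestimate(n, tau, gamma):
    N = len(tau)
    a_hat = [[sum(n[i][j]) / sum(tau[i]) for j in range(N)] for i in range(N)]
    g_hat = [sum(gamma[i]) / sum(tau[i]) for i in range(N)]
    return a_hat, g_hat
```

Iterating filter and re-estimation with the same observation record gives the self-tuning scheme described in the text.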
7
A genetics model
7.1 Introduction

Consider a population of N independent individuals. At each time k ∈ {0, 1, 2, ...} each individual can be in one of n states. The total number N of individuals in the population remains constant in time; however, the distribution of the N individuals among the n states changes. We suppose that initially all random variables are defined on a probability space (Ω, F, P). For 1 ≤ i, j ≤ n, p_{ji} is the probability that an individual in the population will jump from state i at time k − 1 to state j at time k. That is, we suppose each individual in the population behaves like an independent time-homogeneous Markov chain with transition matrix P = (p_{ji}); note that Σ_{j=1}^n p_{ji} = 1. Write p_j = (p_{1j}, p_{2j}, ..., p_{nj})' for the j-th column of P.

Write Δ(N) for the set of all partitions of N into n summands; that is, z ∈ Δ(N) if z = (z_1, z_2, ..., z_n), where each z_i is a nonnegative integer and z_1 + z_2 + ⋯ + z_n = N. Write X(k) = (X_1(k), X_2(k), ..., X_n(k)) ∈ Δ(N) for the distribution of the population at time k. It is easily checked that

E[X(k) | X(k − 1)] = P X(k − 1).    (7.1.1)

However, the population is sampled by withdrawing (with replacement), at each time k, M individuals from the population and observing to which state they belong. That is, at each time k a sample Y(k) = (Y_1(k), Y_2(k), ..., Y_n(k)) ∈ Δ(M) is obtained, where Δ(M) is the set of partitions of M into n summands. Clearly this sequence of samples Y(0), Y(1), Y(2), ... enables us to revise our estimates of the state X(k).

7.2 Recursive estimates

For α = (α_1, α_2, ..., α_n) ∈ R^n and s = (s_1, s_2, ..., s_n) ∈ Δ(N),
write

F(α, s) = Π_{j=1}^n ⟨p_j, α⟩^{s_j},

where ⟨·, ·⟩ denotes the scalar product in R^n. For r = (r_1, r_2, ..., r_n) ∈ Δ(N) write p_{rs} = P(X(k) = r | X(k − 1) = s). Then p_{rs} is the coefficient of α_1^{r_1} α_2^{r_2} ⋯ α_n^{r_n} in F(α, s). That is,

p_{rs} = (r_1! r_2! ⋯ r_n!)^{−1} (∂^N / ∂α_1^{r_1} ∂α_2^{r_2} ⋯ ∂α_n^{r_n}) F(α, s).    (7.2.1)

For y = (y_1, y_2, ..., y_n) ∈ Δ(M) write C(M; y_1, ..., y_n) for the multinomial coefficient M!/(y_1! y_2! ⋯ y_n!). This is just the number of ways of assigning y_1 of the M sampled individuals to state 1, y_2 to state 2, and so on. Then, under the original probability measure P,

P(Y(k) = y | X(k) = r) = C(M; y_1, ..., y_n) (r_1/N)^{y_1} (r_2/N)^{y_2} ⋯ (r_n/N)^{y_n}.
Write G_k for the complete σ-field generated by X(0), X(1), ..., X(k) and Y(0), Y(1), ..., Y(k − 1); Y_k will denote the complete σ-field generated by Y(0), Y(1), ..., Y(k). We wish to introduce a new probability measure P̄ under which the probability of withdrawing an individual in any one of the n states is just 1/n. For this define the factors

γ_k(Y(k)) = (1/n)^M (X_1(k)/N)^{−Y_1(k)} (X_2(k)/N)^{−Y_2(k)} ⋯ (X_n(k)/N)^{−Y_n(k)},

and write

Λ_k = Π_{l=0}^k γ_l.

A new probability measure can be defined by putting dP̄/dP |_{G_k} = Λ_k.
Lemma 7.2.1 For y ∈ Δ(M), r ∈ Δ(N),

P̄(Y(k) = y | G_k) = C(M; y_1, ..., y_n) (1/n)^M.

Proof P̄(Y(k) = y | G_k) = Ē[I(Y(k) = y) | G_k], and by a version of Bayes' Theorem (4.1.1) this is

= E[Λ_k I(Y(k) = y) | G_k] / E[Λ_k | G_k].

Now γ_k is the only factor of Λ_k that is not G_k-measurable, so this is

= E[γ_k I(Y(k) = y) | G_k] / E[γ_k | G_k].

The denominator E[γ_k | G_k] equals

E[(1/n)^M (X_1(k)/N)^{−Y_1(k)} (X_2(k)/N)^{−Y_2(k)} ⋯ (X_n(k)/N)^{−Y_n(k)} | G_k],

and the only variables here that are not G_k-measurable are Y_1(k), ..., Y_n(k). Consequently, this conditional expectation is

Σ_{y ∈ Δ(M)} C(M; y_1, ..., y_n) (1/n)^M = 1.

The numerator is

E[γ_k I(Y(k) = y) | G_k] = C(M; y_1, ..., y_n) (1/n)^M.

Consequently,

P̄(Y(k) = y | G_k) = C(M; y_1, ..., y_n) (1/n)^M = P̄(Y(k) = y).

That is, under P̄ each of the M draws falls in any one of the n states with probability 1/n, independently of everything else. □
Remark 7.2.2 Under P̄, P̄(X(k) = r | X(k − 1) = s) is still p_{rs}, given by (7.2.1). However, as we saw in Lemma 7.2.1,

P̄(Y(k) = y | G_k) = P̄(Y(k) = y | X(k) = r) = P̄(Y(k) = y) = C(M; y_1, ..., y_n) (1/n)^M.

To return from P̄ to P the inverse density must be introduced. That is, with

γ̄_k = γ_k^{−1} = (1/n)^{−M} (X_1(k)/N)^{Y_1(k)} (X_2(k)/N)^{Y_2(k)} ⋯ (X_n(k)/N)^{Y_n(k)},

Λ̄_k = Λ_k^{−1} = Π_{l=0}^k γ̄_l,

the probability P can be defined by putting dP/dP̄ |_{G_k} = Λ̄_k.
If {φ_k} is a {G_k}-adapted process then Bayes' Theorem (4.1.1) implies

E[φ_k | Y_k] = Ē[Λ̄_k φ_k | Y_k] / Ē[Λ̄_k | Y_k].

Ē[Λ̄_k φ_k | Y_k] is, therefore, an unnormalized conditional expectation of φ_k given Y_k; the denominator Ē[Λ̄_k | Y_k] is a normalizing factor. For r ∈ Δ(N) write

q_r(k) = Ē[Λ̄_k I(X(k) = r) | Y_k].

Note that Σ_{r ∈ Δ(N)} I(X(k) = r) = 1, so that Σ_{r ∈ Δ(N)} q_r(k) = Ē[Λ̄_k | Y_k]. We then have the following recursion.

Theorem 7.2.3 If Y(k) = (Y_1(k), Y_2(k), ..., Y_n(k)) = (y_1, y_2, ..., y_n) ∈ Δ(M), then

q_r(k) = n^M (r_1/N)^{y_1} (r_2/N)^{y_2} ⋯ (r_n/N)^{y_n} Σ_{s ∈ Δ(N)} p_{rs} q_s(k − 1).

(Note we take 0⁰ = 1.)

Proof

q_r(k) = Ē[Λ̄_k I(X(k) = r) | Y_k]
= Ē[Λ̄_k I(X(k) = r) | Y_{k−1}, Y(k) = (y_1, y_2, ..., y_n)]
= Ē[Λ̄_{k−1} γ̄_k I(X(k) = r) | Y_{k−1}, Y(k) = (y_1, y_2, ..., y_n)]
= n^M (r_1/N)^{y_1} (r_2/N)^{y_2} ⋯ (r_n/N)^{y_n} Σ_{s ∈ Δ(N)} Ē[Λ̄_{k−1} I(X(k) = r) I(X(k − 1) = s) | Y_{k−1}]
= n^M (r_1/N)^{y_1} (r_2/N)^{y_2} ⋯ (r_n/N)^{y_n} Σ_{s ∈ Δ(N)} Ē[Λ̄_{k−1} I(X(k − 1) = s) p_{rs} | Y_{k−1}]
= n^M (r_1/N)^{y_1} (r_2/N)^{y_2} ⋯ (r_n/N)^{y_n} Σ_{s ∈ Δ(N)} p_{rs} q_s(k − 1). □
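The recursion of Theorem 7.2.3 can be implemented directly for small N and n by enumerating partitions; the constant factor n^M is dropped below since it cancels on normalization. The transition probability p_{rs} is computed by convolving the multinomial scatterings of the individuals in each source state (function names are this sketch's own).

```python
from itertools import product
from math import factorial, prod

def partitions(total, parts):
    """All ordered ways to write `total` as a sum of `parts` nonnegative ints."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in partitions(total - first, parts - 1):
            yield (first,) + rest

def multinom(m, ys):
    return factorial(m) // prod(factorial(y) for y in ys)

def p_rs(r, s, P):
    """P(X(k) = r | X(k-1) = s), where P[j][i] is the jump probability i -> j."""
    n = len(r)
    total = 0.0
    # The s[i] individuals in state i scatter multinomially over destinations;
    # convolve the per-state splits and keep those whose totals equal r.
    splits = [list(partitions(s[i], n)) for i in range(n)]
    for choice in product(*splits):
        counts = tuple(sum(c[j] for c in choice) for j in range(n))
        if counts == tuple(r):
            term = 1.0
            for i in range(n):
                term *= multinom(s[i], choice[i])
                term *= prod(P[j][i] ** choice[i][j] for j in range(n))
            total += term
    return total

def genetics_filter_step(q_prev, y, P, N):
    """Theorem 7.2.3 up to a constant: q_r(k) ∝ Π_i (r_i/N)^{y_i} Σ_s p_rs q_s(k-1)."""
    n = len(y)
    q_new = {}
    for r in partitions(N, n):
        w = prod((ri / N) ** yi for ri, yi in zip(r, y) if yi > 0)  # 0^0 = 1
        q_new[r] = w * sum(p_rs(r, s, P) * q_prev[s] for s in q_prev)
    return q_new
```

With the identity transition matrix the update reduces to reweighting by the observation likelihood, which gives a simple sanity check.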
Remarks 7.2.4

P(X(k) = r | Y_k) = E[I(X(k) = r) | Y_k] = q_r(k) / Σ_{s ∈ Δ(N)} q_s(k).

To obtain the expected value of X(k) given the observations Y_k we consider the vector of values r = (r_1, r_2, ..., r_n) for each r ∈ Δ(N). Then

E[X(k) | Y_k] = Σ_{r ∈ Δ(N)} q_r(k) r / Σ_{s ∈ Δ(N)} q_s(k).

Unfortunately this does not have the simple form of (7.1.1). Also note that the transition probabilities p_{rs} can be re-estimated using the techniques described in Chapter 2 of [10].
7.3 Approximate formulae

Unfortunately the recursion for q_r(k) given by Theorem 7.2.3 is not easily evaluated. One approximation would be to use a smaller value Ñ in place of N in the summation; to obtain nontrivial partitions of Ñ into n summands, Ñ should be greater than n. Substitution of the observed Y(0), Y(1), Y(2), ... then gives a sequence of approximate distributions.

Alternatively, one could replace the martingale "noise" in the dynamics of X(k) by Gaussian noise ([23]). To describe this, first suppose the n states of the individuals in the population are identified with the unit (column) vectors e_1, ..., e_n, e_i = (0, ..., 1, 0, ..., 0)', of R^n. Let X^i(k) ∈ {e_1, ..., e_n} denote the state of the i-th individual at time k. Then for each i, 1 ≤ i ≤ N, X^i(k) behaves like a Markov chain on (Ω, F, P) with transition matrix P. Consequently,

X^i(k) = P X^i(k − 1) + M^i(k),    (7.3.1)

where E[M^i(k) | G_{k−1}] = E[M^i(k) | X^i(k − 1)] = 0. Write p(0) = (p_1(0), ..., p_n(0))' = E[X^i(0)]. Then from (7.3.1), E[X^i(k)] = p(k) = P^k p(0). For (column) vectors x, y ∈ R^n write x ⊗ y = x y' for their Kronecker, or tensor, product, and diag x for the matrix with x on the diagonal. Then, because X^i(k) is one of the unit vectors e_1, ..., e_n,

X^i(k) ⊗ X^i(k) = diag X^i(k) = P diag X^i(k − 1) P' + M^i(k) ⊗ (P X^i(k − 1)) + (P X^i(k − 1)) ⊗ M^i(k) + M^i(k) ⊗ M^i(k),

while also diag X^i(k) = diag(P X^i(k − 1)) + diag M^i(k). Taking expectations, we have

E[M^i(k) ⊗ M^i(k)] = diag(P p(k − 1)) − P diag(p(k − 1)) P' = Q(k), say.

For i ≠ j the processes X^i and X^j are independent.
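The covariance identity for the martingale increment can be verified numerically by enumerating the one-step outcomes of a single chain; the function below computes E[M(k) M(k)'] exactly for a given prior distribution p over states (a small self-check, not part of the text's development).

```python
# Numerical check of Q(k) = diag(P p) - P diag(p) P' for one individual.
# P[j][i] is the probability of a jump from state i to state j, p[i] the
# probability that X(k-1) = e_i; M(k) = X(k) - P X(k-1).

def increment_covariance(P, p):
    """E[M(k) M(k)'] by direct enumeration of (X(k-1), X(k)) outcomes."""
    n = len(p)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):                # X(k-1) = e_i with probability p[i]
        for j in range(n):            # X(k) = e_j with probability P[j][i]
            m = [(1.0 if a == j else 0.0) - P[a][i] for a in range(n)]
            for a in range(n):
                for b in range(n):
                    Q[a][b] += p[i] * P[j][i] * m[a] * m[b]
    return Q
```

Comparing the enumerated covariance with the closed form confirms the expansion carried out above.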
Define

X(k) = Σ_{i=1}^N X^i(k) / N,    M(k) = Σ_{i=1}^N M^i(k) / N.

The (vector) process X(k) describes the actual distribution of the population at time k. Its components sum to unity and

X(k) = P X(k − 1) + M(k).    (7.3.2)

Also, by independence of the X^i, E[M(k) ⊗ M(k)] = Q(k)/N. The suggestion made in [23] is to replace the martingale increments M(k) in (7.3.2) by independent (vector) Gaussian random variables W(k) with mean 0 and the same covariance. Write φ_k(w) for the corresponding normal density on R^n. That is, suppose the signal process X(k), taking values in R^n, has dynamics

X(k) = P X(k − 1) + W(k).

For y = (y_1, y_2, ..., y_n) ∈ Δ(M) and x = (x_1, x_2, ..., x_n) ∈ R^n, x ≠ 0, define

ρ(x, y) = |x|^{−M} |x_1|^{y_1} |x_2|^{y_2} ⋯ |x_n|^{y_n},

and set ρ(0, y) = 0 for y ∈ Δ(M). The observation process still gives rise to Y(0), Y(1), ..., Y(k) ∈ Δ(M), and for y ∈ Δ(M), x ∈ R^n we suppose

P(Y(k) = y | X(k) = x) = C(M; y_1, ..., y_n) ρ(x, y).

Starting with the probability P̄, now define γ̄_k = n^M ρ(X(k), Y(k)) and Λ̄_k = Π_{l=0}^k γ̄_l. Again P can be defined in terms of P̄ by setting dP/dP̄ |_{G_k} = Λ̄_k. Suppose f : R^n → R is any measurable "test" function. Consider

E[f(X(k)) | Y_k] = Ē[Λ̄_k f(X(k)) | Y_k] / Ē[Λ̄_k | Y_k].

Suppose there is an unnormalized conditional density q_k(x) such that

Ē[Λ̄_k f(X(k)) | Y_k] = ∫_{R^n} f(x) q_k(x) dx.

The next result gives a recursion for q_k which is the analog of Theorem 7.2.3.

Theorem 7.3.1

q_k(z) = n^M ρ(z, y) ∫_{R^n} φ_k(z − P x) q_{k−1}(x) dx.
Proof

∫_{R^n} f(z) q_k(z) dz = Ē[Λ̄_k f(X(k)) | Y_k]
= n^M Ē[Λ̄_{k−1} ρ(X(k), Y(k)) f(X(k)) | Y_k]
= n^M Ē[Λ̄_{k−1} ρ(P X(k − 1) + W(k), y) f(P X(k − 1) + W(k)) | Y_{k−1}, Y(k) = y]
= n^M Ē[Λ̄_{k−1} ρ(P X(k − 1) + W(k), y) f(P X(k − 1) + W(k)) | Y_{k−1}]
= n^M ∫∫ ρ(P x + w, y) f(P x + w) φ_k(w) q_{k−1}(x) dw dx
= n^M ∫∫ ρ(z, y) f(z) φ_k(z − P x) q_{k−1}(x) dz dx.

As this identity holds for all such f, the result follows. □
8
Hidden populations
8.1 Introduction

An important problem in statistical ecology is how to determine the size of an animal population. A large number of techniques for providing an answer are available (see [35]), but the best known is the capture–recapture method. A random sample of individuals is captured, tagged or marked in some way, and then released back into the population. After allowing time for the marked and unmarked individuals to mix sufficiently, a second simple random sample is taken and the marked ones are counted. At epoch l write N_l for the population size, n_l for the number of marked and released individuals, ñ_k = Σ_{l=1}^k n_l for the total number of captured and marked individuals up to time k, M_l for the sample size, n̄_l for the number of marked individuals available for sampling, and y_l for the number of captured (or recaptured) marked individuals. We are interested in estimating the size N_l of the population at time l. All random variables are defined initially on a probability space (Ω, F, P), and all the filtrations defined here are assumed to be complete. Write G_k = σ(N_l, n̄_l, y_l, M_l : l ≤ k) and Y_k = σ(y_l : l ≤ k). We assume here that:

1. The population sizes N_k follow the dynamics

N_k = N_{k−1} + σ(N_{k−1}) v_k,    (8.1.1)

where N_0 has distribution π_0 and the v_k are independent random variables with densities φ_k.

2. The n̄_k are random variables with conditional binomial distributions with parameters p_k = p(ñ_k, y_1, ..., y_k, θ) and ñ_k. For example,

p_1 = θ n_1 / ñ_1 = θ,    p_2 = (θ² n_1 + θ n_2) / (n_1 + n_2),    ...,

p_k = (Σ_{i=1}^k n_i θ^{k−i+1}) / ñ_k = θ (ñ_{k−1}/ñ_k) p_{k−1} + θ (n_k/ñ_k).    (8.1.2)

Here 0 < θ ≤ 1 is a parameter, assumed known or to be estimated. The powers of θ express our belief that, as time goes by, early-marked individuals become less and less available for recapture, owing to various causes including deaths, emigration, etc.
If the number n of individuals captured and marked at each epoch is kept constant, (8.1.2) takes the form

p_k(θ) = ((k − 1)/k) p_{k−1} + θ^k / k.    (8.1.3)
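The recursion (8.1.3) has the closed form p_k(θ) = (θ + θ² + ⋯ + θ^k)/k, which makes for a quick numerical check (a small sketch; the function name is chosen here):

```python
# The recursion (8.1.3) for constant per-epoch marking: p_k = ((k-1)/k) p_{k-1}
# + theta^k / k, equivalently the average (theta + ... + theta^k) / k.

def p_sequence(theta, k_max):
    ps = []
    p = 0.0
    for k in range(1, k_max + 1):
        p = (k - 1) / k * p + theta ** k / k   # (8.1.3)
        ps.append(p)
    return ps
```

For θ = 1 every p_k equals 1, recovering the case of permanently available marks.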
3. The observed random variable y_k is assumed to have a conditional binomial distribution,

P(y_k = m | G_k − {y_k}) = C(M_k, m) (n̄_k/N_k)^m (1 − n̄_k/N_k)^{M_k − m},    (8.1.4)

where C(M_k, m) = M_k! / (m! (M_k − m)!).
8.2 Distribution estimation

Define λ_0 = 1. For l ≥ 1 and suitable density functions ψ_l write

λ_l = (σ(N_{l−1}) ψ_l(N_l) / φ_l(v_l)) 2^{−(M_l + ñ_l)} (n̄_l/N_l)^{−y_l} (1 − n̄_l/N_l)^{y_l − M_l} p_l^{−n̄_l} (1 − p_l)^{−(ñ_l − n̄_l)},    (8.2.1)

and Λ_k = Π_{l=0}^k λ_l.

Lemma 8.2.1 The process Λ_k is a G-martingale.

Proof E[Λ_k | G_{k−1}] = Λ_{k−1} E[λ_k | G_{k−1}], so it remains to show that E[λ_k | G_{k−1}] = 1. Now

E[λ_k | G_{k−1}] = E[(σ(N_{k−1}) ψ_k(N_k) / φ_k(v_k)) 2^{−ñ_k} p_k^{−n̄_k} (1 − p_k)^{−(ñ_k − n̄_k)} 2^{−M_k} E[(n̄_k/N_k)^{−y_k} (1 − n̄_k/N_k)^{y_k − M_k} | G_{k−1}, N_k, n̄_k, M_k] | G_{k−1}].

Since y_k is conditionally bin(M_k, n̄_k/N_k), the inner expectation equals 2^{M_k}, so that

E[λ_k | G_{k−1}] = 2^{−ñ_k} E[(σ(N_{k−1}) ψ_k(N_k) / φ_k(v_k)) p_k^{−n̄_k} (1 − p_k)^{−(ñ_k − n̄_k)} | G_{k−1}]
= 2^{−ñ_k} Σ_{i=0}^{ñ_k} C(ñ_k, i) E[σ(N_{k−1}) ψ_k(N_{k−1} + σ(N_{k−1}) v_k) / φ_k(v_k) | G_{k−1}]
= E[σ(N_{k−1}) ψ_k(N_{k−1} + σ(N_{k−1}) v_k) / φ_k(v_k) | G_{k−1}]
= ∫ σ(N_{k−1}) ψ_k(N_{k−1} + σ(N_{k−1}) v) dv = ∫ ψ_k(u) du = 1. □
A new probability measure P̄ can be defined by setting dP̄/dP |_{G_k} = Λ_k. The point here is:

Lemma 8.2.2 Under the new probability measure P̄, the N_k, n̄_k and y_k are three sequences of independent random variables which are independent of each other. Further, N_k has density ψ_k, n̄_k has distribution bin(ñ_k, 1/2) and y_k has distribution bin(M_k, 1/2).

Proof For any "test" functions f, g and h, using Bayes' Theorem 4.1.1,

Ē[f(N_k) g(n̄_k) h(y_k) | G_{k−1}] = E[f(N_k) g(n̄_k) h(y_k) Λ_k | G_{k−1}] / E[Λ_k | G_{k−1}],

which equals

E[f(N_k) g(n̄_k) h(y_k) λ_k | G_{k−1}]
= E[f(N_k) g(n̄_k) (σ(N_{k−1}) ψ_k(N_k) / φ_k(v_k)) 2^{−ñ_k} p_k^{−n̄_k} (1 − p_k)^{−(ñ_k − n̄_k)} 2^{−M_k} E[h(y_k) (n̄_k/N_k)^{−y_k} (1 − n̄_k/N_k)^{y_k − M_k} | G_{k−1}, N_k, n̄_k, M_k] | G_{k−1}].

After cancellation the inner expectation equals

Σ_{m=0}^{M_k} h(m) C(M_k, m) 2^{−M_k},

which shows that y_k has distribution bin(M_k, 1/2), independently of N and n̄. Similarly,

E[f(N_k) g(n̄_k) (σ(N_{k−1}) ψ_k(N_k) / φ_k(v_k)) 2^{−ñ_k} p_k^{−n̄_k} (1 − p_k)^{−(ñ_k − n̄_k)} | G_{k−1}]
= Ē[g(n̄_k)] E[f(N_{k−1} + σ(N_{k−1}) v_k) σ(N_{k−1}) ψ_k(N_{k−1} + σ(N_{k−1}) v_k) / φ_k(v_k) | G_{k−1}]
= Ē[g(n̄_k)] ∫ f(N_{k−1} + σ(N_{k−1}) v) σ(N_{k−1}) ψ_k(N_{k−1} + σ(N_{k−1}) v) dv
= Ē[g(n̄_k)] ∫ f(u) ψ_k(u) du = Ē[g(n̄_k)] Ē[f(N_k)].

That is, under P̄ the three processes are independent sequences of random variables with the desired distributions. □

Using this fact we derive a recursive equation for the unnormalized conditional distribution of N_k given Y_k. For any "test" function f consider

E[f(N_k) | Y_k] = Ē[f(N_k) Λ_k^{−1} | Y_k] / Ē[Λ_k^{−1} | Y_k] = ∫ f(z) q_k(z) dz / Ē[Λ_k^{−1} | Y_k].    (8.2.2)

The denominator of (8.2.2) being a normalizing factor, we focus only on the numerator.
In view of Lemma 8.2.2,

Ē[f(N_k) Λ_k^{−1} | Y_k] = Ē[f(N_k) Λ_{k−1}^{−1} λ_k^{−1} | Y_k]
= 2^{ñ_k + M_k} Ē[∫ f(z) (φ_k((z − N_{k−1})/σ(N_{k−1})) / (σ(N_{k−1}) ψ_k(z))) ψ_k(z) Σ_{i=0}^{ñ_k} 2^{−ñ_k} C(ñ_k, i) (i/z)^{y_k} (1 − i/z)^{M_k − y_k} p_k^i (1 − p_k)^{ñ_k − i} dz · Λ_{k−1}^{−1} | Y_k]
= 2^{M_k} Σ_{i=0}^{ñ_k} C(ñ_k, i) p_k^i (1 − p_k)^{ñ_k − i} ∫∫ f(z) (i/z)^{y_k} (1 − i/z)^{M_k − y_k} (φ_k((z − u)/σ(u)) / σ(u)) q_{k−1}(u) dz du.

Comparing this last expression with the right-hand side of (8.2.2) we have:

Theorem 8.2.3 The unnormalized conditional probability density function of the hidden Markov model given by (8.1.1), (8.1.2) and (8.1.4) follows the recursion

q_k(z) = Σ_{i=0}^{ñ_k} B_k(y_k, z, i) ∫ Φ_k(z, u) q_{k−1}(u) du.    (8.2.3)

Here

B_k(y, z, i) = 2^{M_k} C(ñ_k, i) p_k^i (1 − p_k)^{ñ_k − i} (i/z)^y (1 − i/z)^{M_k − y},

and

Φ_k(z, u) = φ_k((z − u)/σ(u)) / σ(u).

(Note we take 0⁰ = 1.)
Remarks 8.2.4

1. The normalized conditional density of N_k is given by q_k(z) / ∫ q_k(u) du.

2. The initial (normalized) probability density of N_0, prior to sampling, is π_0(·), so q_0(z) = π_0(z). Using the notation of Theorem 8.2.3,

q_1(z) = Σ_{i=0}^{ñ_1} B_1(y_1, z, i) ∫ Φ_1(z, u) π_0(u) du,    (8.2.4)

and further estimates follow from (8.2.3).

3. If the distribution of N_0 is a delta function concentrated at some number A, (8.2.4) becomes

q_1(z) = Φ_1(z, A) Σ_{i=0}^{ñ_1} B_1(y_1, z, i).    (8.2.5)
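On a grid of population sizes the recursion (8.2.3) becomes a matrix-free loop: a prediction integral against Φ_k followed by reweighting with B_k. The sketch below assumes a constant diffusion coefficient σ(u) ≡ s0 and standard-normal noise densities, and drops the constant 2^{M} (normalization only); all names are illustrative.

```python
# Grid-based sketch of the population-size filter (8.2.3), assuming
# sigma(u) = s0 (constant) and phi_k = standard normal. Constant 2^M dropped.
from math import comb, exp, pi, sqrt

def phi(v):                                   # standard normal density
    return exp(-0.5 * v * v) / sqrt(2.0 * pi)

def population_filter_step(q_prev, grid, y, M, p, n_tilde, s0):
    """One step q_{k-1} -> q_k on a uniform grid of candidate sizes z > 0."""
    dz = grid[1] - grid[0]
    q_new = []
    for z in grid:
        # predict: integral of Phi(z, u) q_{k-1}(u) du, Phi(z,u) = phi((z-u)/s0)/s0
        pred = sum(phi((z - u) / s0) / s0 * qp
                   for u, qp in zip(grid, q_prev)) * dz
        # correct: sum_i C(n~, i) p^i (1-p)^{n~-i} (i/z)^y (1 - i/z)^{M-y}
        B = sum(comb(n_tilde, i) * p ** i * (1 - p) ** (n_tilde - i)
                * (i / z) ** y * (1 - i / z) ** (M - y)
                for i in range(n_tilde + 1))
        q_new.append(B * pred)
    return q_new
```

A large recapture count y pulls the posterior toward small populations, a small count toward large ones, which the filter reproduces.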
8.3 Parameter estimation

Our model is a function of the parameter p_k, the proportion of the accessible marked individuals at epoch k. Suppose p_k has dynamics given by (8.1.2). We also assume that θ takes values in some measurable space (Θ, β, γ). We now derive a recursive joint conditional unnormalized distribution for N_k and θ, still working under the probability measure P̄.

Lemma 8.3.1 Write q_k(z, θ) dz dθ = Ē[I(N_k ∈ dz, θ ∈ dθ) Λ_k^{−1} | Y_k]. Then

q_k(z, θ) = Σ_{i=0}^{ñ_k} B_k(y_k, z, i, θ) ∫ Φ_k(z, u) q_{k−1}(u, θ) du.    (8.3.1)

Here

B_k(y, z, i, θ) = 2^{M_k} C(ñ_k, i) p_k(θ)^i (1 − p_k(θ))^{ñ_k − i} (i/z)^y (1 − i/z)^{M_k − y},

and Φ_k(z, u) = φ_k((z − u)/σ(u)) / σ(u).
Proof Let f, g be integrable test functions. Then

Ē[f(N_k) g(θ) Λ_k^{−1} | Y_k] = ∫∫ f(z) g(v) q_k(z, v) dz dγ(v).    (8.3.2)

Using the independence assumptions under P̄, the left-hand side of (8.3.2) is

= Ē[f(N_k) g(θ) Λ_{k−1}^{−1} λ_k^{−1} | Y_k]
= 2^{M_k} ∫∫∫ f(z) g(v) Σ_{i=0}^{ñ_k} C(ñ_k, i) p_k(v)^i (1 − p_k(v))^{ñ_k − i} (i/z)^{y_k} (1 − i/z)^{M_k − y_k} Φ_k(z, u) q_{k−1}(u, v) dz du dγ(v).

Comparing this last expression with the right-hand side of (8.3.2) gives (8.3.1). □

If at time 1, θ has density h(θ), then

q_1(z, θ) = Σ_{i=0}^{ñ_1} B_1(y_1, z, i, θ) h(θ) ∫ Φ_1(z, u) π_0(u) du,

and further updates are given by Lemma 8.3.1.
If no dynamics enter the population size, so that N_k has density φ_k(·) independent of N_l, l < k, the recursion in Lemma 8.3.1 simplifies to

q_k(z, θ) = φ_k(z) q_{k−1}(z, θ) Σ_{i=0}^{ñ_k} B_k(y_k, z, i, θ).    (8.3.3)
Maximum posterior estimators

Quantity (8.2.4) (or (8.2.5)) is a function of the unknown population size and can be maximized with respect to z, yielding a critical value N̂_1, which is the maximum a posteriori (MAP) estimate of N at epoch 1 given y_1. Similar maximizations at later times provide MAP estimators for the population size at those times.

8.4 Pathwise estimation

We now derive a recursive equation, which does not involve any integration, for the unnormalized density of the whole path up to epoch k.
Write q_k(z_0, ..., z_k) dz_0 ⋯ dz_k = Ē[I(N_0 ∈ dz_0) ⋯ I(N_k ∈ dz_k) Λ_k^{−1} | Y_k].

Theorem 8.4.1 Using the notation of Theorem 8.2.3,

q_k(z_0, ..., z_k) = Σ_{i=0}^{ñ_k} B_k(y_k, z_k, i) Φ_k(z_{k−1}, z_k) q_{k−1}(z_0, ..., z_{k−1}).    (8.4.1)

Proof Let f_0, ..., f_k be "test" functions. Then

Ē[f_0(N_0) ⋯ f_k(N_k) Λ_k^{−1} | Y_k] = ∫⋯∫ f_0(z_0) ⋯ f_k(z_k) q_k(z_0, ..., z_k) dz_0 ⋯ dz_k,    (8.4.2)

and

Ē[f_0(N_0) ⋯ f_k(N_k) Λ_k^{−1} | Y_k] = Ē[f_0(N_0) ⋯ f_k(N_k) Λ_{k−1}^{−1} λ_k^{−1} | Y_k]
= 2^{M_k} Ē[f_0(N_0) ⋯ f_{k−1}(N_{k−1}) Λ_{k−1}^{−1} ∫ f_k(z_k) (n̄_k/z_k)^{y_k} (1 − n̄_k/z_k)^{M_k − y_k} (φ_k((z_k − N_{k−1})/σ(N_{k−1})) / σ(N_{k−1})) dz_k | Y_k]
= 2^{M_k} ∫⋯∫ f_0(z_0) ⋯ f_k(z_k) (n̄_k/z_k)^{y_k} (1 − n̄_k/z_k)^{M_k − y_k} (φ_k((z_k − z_{k−1})/σ(z_{k−1})) / σ(z_{k−1})) q_{k−1}(z_0, ..., z_{k−1}) dz_0 ⋯ dz_k.

Averaging over the conditional binomial distribution of n̄_k, and comparing the last expression with (8.4.2), yields (8.4.1) at once. □

Again we have q_0(z) = π_0(z) and

q_1(z_0, z_1) = Φ_1(z_0, z_1) π_0(z_0) Σ_{i=0}^{ñ_1} B_1(y_1, z_1, i),

and further estimates follow from (8.4.1). However, no integration is needed in subsequent recursions.
Maximum posterior estimators

Expression (8.4.1) is a function of the path (z_0, ..., z_k) and can be maximized, yielding a critical path (N̂_0, ..., N̂_k). Since no integration is involved here, one could substitute, at time k say, the sequence of critical values N̂_0, ..., N̂_{k−1} and then maximize q_k(N̂_0, ..., N̂_{k−1}, z_k) with respect to the variable z_k to obtain an estimate for N_k.

8.5 A Markov chain model

Suppose that on a probability space (Ω, F, P̄) are given three sequences of independent random variables N_k, n̄_k and y_k. For k ∈ N, N_k is uniformly distributed over some finite set S = {s_1, ..., s_L} ⊂ N − {0}, n̄_k has a binomial distribution with parameters (ñ_k, 1/2), and y_k has a binomial distribution with parameters (M_k, 1/2), where M_k ∈ N − {0} is given. We wish to define a new probability measure P such that y_k has a binomial distribution with parameters (M_k, n̄_k/N_k), N_k is a Markov chain with state space S and stochastic matrix C = (c_{ij}), c_{ij} = P[N_{k+1} = s_i | N_k = s_j], and the n̄_k are random variables with conditional binomial distributions with parameters (p_k, ñ_k).

Define the G-predictable sequences α_l^i = Σ_{j=1}^L I(N_{l−1} = s_j) c_{ij}, for i = 1, ..., L. In vector notation this is α_l(N_{l−1}) = C I(N_{l−1}), where I(N_{l−1}) = (I(N_{l−1} = s_1), ..., I(N_{l−1} = s_L))'. Now write

λ_l = 2^{M_l + ñ_l} p_l^{n̄_l} (1 − p_l)^{ñ_l − n̄_l} (n̄_l/N_l)^{y_l} (1 − n̄_l/N_l)^{M_l − y_l} Π_{i=1}^L (L α_l^i)^{I(N_l = s_i)},    (8.5.1)

and Λ_k = Π_{l=0}^k λ_l.

Lemma 8.5.1 The process Λ_k is a G-martingale.

Proof Ē[Λ_k | G_{k−1}] = Λ_{k−1} Ē[λ_k | G_{k−1}], so we must show that Ē[λ_k | G_{k−1}] = 1. Under P̄ the variables y_k, n̄_k and N_k are independent of G_{k−1} and of each other, so the three factors of λ_k may be averaged in turn:

Ē[2^{M_k} (n̄_k/N_k)^{y_k} (1 − n̄_k/N_k)^{M_k − y_k} | G_{k−1}, N_k, n̄_k] = Σ_{m=0}^{M_k} C(M_k, m) 2^{−M_k} 2^{M_k} (n̄_k/N_k)^m (1 − n̄_k/N_k)^{M_k − m} = 1,

Ē[2^{ñ_k} p_k^{n̄_k} (1 − p_k)^{ñ_k − n̄_k} | G_{k−1}] = Σ_{n=0}^{ñ_k} C(ñ_k, n) 2^{−ñ_k} 2^{ñ_k} p_k^n (1 − p_k)^{ñ_k − n} = 1,

Ē[Π_{i=1}^L (L α_k^i)^{I(N_k = s_i)} | G_{k−1}] = Σ_{i=1}^L (1/L) L α_k^i = Σ_{i=1}^L α_k^i = 1,

the last equality because the columns of C sum to one. Hence Ē[λ_k | G_{k−1}] = 1, and a new probability measure P can be defined by setting dP/dP̄ |_{G_k} = Λ_k. □
Lemma 8.5.2 Under the probability measure P the above processes obey the desired dynamics: N_k is a Markov chain with state space S and stochastic matrix C = (c_{ij}), while y_k and n̄_k are random variables with conditional binomial distributions with parameters (M_k, n̄_k/N_k) and (p_k, ñ_k) respectively.
Proof
We give a proof only for the first statement, regarding $N_k$.

$$\bar P[N_k = s_j \mid \mathcal G_{k-1}] = \bar E[I(N_k = s_j) \mid \mathcal G_{k-1}] = \frac{E[I(N_k = s_j) \Lambda_k \mid \mathcal G_{k-1}]}{E[\Lambda_k \mid \mathcal G_{k-1}]} = E[I(N_k = s_j) \lambda_k \mid \mathcal G_{k-1}]$$

$$= E\Big[I(N_k = s_j)\, 2^{M_k + \bar n_k} p_k^{n_k} (1-p_k)^{\bar n_k - n_k} \Big(\frac{n_k}{s_j}\Big)^{y_k} \Big(1 - \frac{n_k}{s_j}\Big)^{M_k - y_k} L \alpha_k^j \,\Big|\, \mathcal G_{k-1}\Big]$$

$$= L \alpha_k^j \, 2^{M_k + \bar n_k} E\Big[I(N_k = s_j)\, p_k^{n_k} (1-p_k)^{\bar n_k - n_k} \Big(\frac{n_k}{s_j}\Big)^{y_k} \Big(1 - \frac{n_k}{s_j}\Big)^{M_k - y_k} \,\Big|\, \mathcal G_{k-1}\Big]$$

$$= \alpha_k^j \, L\, 2^{M_k + \bar n_k} \frac{1}{L} \sum_{n=0}^{\bar n_k} \sum_{m=0}^{M_k} \binom{M_k}{m} \Big(\frac{n}{s_j}\Big)^m \Big(1 - \frac{n}{s_j}\Big)^{M_k - m} \frac{1}{2^{M_k}} \binom{\bar n_k}{n} p_k^n (1-p_k)^{\bar n_k - n} \frac{1}{2^{\bar n_k}}$$

$$= \alpha_k^j = \bar P[N_k = s_j \mid N_{k-1}].$$
Working under the probability measure $\bar P$, we derive recursive equations for the unnormalized conditional probability distribution of $N_k$. Write

$$\bar P[N_k = s_i \mid \mathcal Y_k] = \bar E[I(N_k = s_i) \mid \mathcal Y_k] = \frac{E[I(N_k = s_i) \Lambda_k \mid \mathcal Y_k]}{E[\Lambda_k \mid \mathcal Y_k]},$$

and $q_k^{s_i} = E[I(N_k = s_i) \Lambda_k \mid \mathcal Y_k]$.

Theorem 8.5.3

$$q_k^{s_i} = \sum_{n=0}^{\bar n_k} B_k(y, s_i, n) \sum_{j=1}^L c_{ij}\, q_{k-1}^{s_j}. \qquad (8.5.2)$$

Here

$$B_k(y, s_i, n) = 2^{M_k} \Big(\frac{n}{s_i}\Big)^{y_k} \Big(1 - \frac{n}{s_i}\Big)^{M_k - y_k} \binom{\bar n_k}{n} p_k^n (1 - p_k)^{\bar n_k - n}.$$

If at time 0, $q_0 = \pi = (\pi_1, \dots, \pi_L)$,

$$q_1^{s_i} = \sum_{n=0}^{\bar n_1} B_1(y, s_i, n) \sum_{j=1}^L c_{ij}\, \pi_j. \qquad (8.5.3)$$
If $N_0 = s_\alpha$ with probability 1,

$$q_1^{s_i} = c_{i\alpha} \sum_{n=0}^{\bar n_1} B_1(y, s_i, n), \qquad (8.5.4)$$

and further updates are given by (8.5.2). MAP estimators of $N_1, \dots, N_k$ are provided by

$$\widehat N_1 = \operatorname{argmax}\{q_1^{s_1}, q_1^{s_2}, \dots, q_1^{s_L}\}, \quad \dots, \quad \widehat N_k = \operatorname{argmax}\{q_k^{s_1}, q_k^{s_2}, \dots, q_k^{s_L}\}.$$
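The recursion (8.5.2) and the MAP estimator can be sketched in a few lines of code. All numbers below are hypothetical, chosen only to exercise one step of the filter; the function and variable names (`B`, `update`, `q_prev`, etc.) are ours:

```python
from math import comb

def B(M_k, y_k, nbar_k, p_k, s_i, n):
    # B_k(y, s_i, n) of Theorem 8.5.3
    return (2 ** M_k) * (n / s_i) ** y_k * (1 - n / s_i) ** (M_k - y_k) \
        * comb(nbar_k, n) * p_k ** n * (1 - p_k) ** (nbar_k - n)

def update(q_prev, C, S, M_k, y_k, nbar_k, p_k):
    # one step of (8.5.2): q_k^{s_i} = sum_n B_k(y, s_i, n) sum_j c_{ij} q_{k-1}^{s_j}
    L = len(S)
    return [sum(B(M_k, y_k, nbar_k, p_k, S[i], n) for n in range(nbar_k + 1))
            * sum(C[i][j] * q_prev[j] for j in range(L))
            for i in range(L)]

# toy example
S = [10, 20]                       # state space {s_1, s_2}
C = [[0.9, 0.2], [0.1, 0.8]]       # c_{ij} = P[N_k = s_i | N_{k-1} = s_j]
q = [0.5, 0.5]                     # q_0 = pi
q = update(q, C, S, M_k=5, y_k=2, nbar_k=4, p_k=0.5)
N_hat = S[max(range(len(S)), key=lambda i: q[i])]   # MAP estimate of N_1
```

Since the $q_k^{s_i}$ are unnormalized, only their ratios matter for the MAP estimate; normalization is needed only when a conditional probability is wanted.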
8.6 Recursive parameter estimation

The previous model is a function of the parameters $p_k$ and $C = \{c_{ij}\}$. Let $p_k = p_k(\theta_1)$, $C = C(\theta_2) = \{c_{ij}(\theta_2)\}$ and $\theta = (\theta_1, \theta_2)$. Suppose $\theta$ belongs to some measurable space $(\Theta, \beta, \gamma)$. Working again under the probability measure $\bar P$, write

$$q_k^{s_i}(\theta)\, d\theta = E[I(N_k = s_i) I(\theta \in d\theta) \Lambda_k \mid \mathcal Y_k]. \qquad (8.6.1)$$

Lemma 8.6.1

$$q_k^{s_i}(\theta) = \sum_{n=0}^{\bar n_k} B_k(y, s_i, n, \theta) \sum_{j=1}^L c_{ij}(\theta)\, q_{k-1}^{s_j}(\theta). \qquad (8.6.2)$$

Here

$$B_k(y, s_i, n, \theta) = 2^{M_k} \Big(\frac{n}{s_i}\Big)^{y_k} \Big(1 - \frac{n}{s_i}\Big)^{M_k - y_k} \binom{\bar n_k}{n} p_k^n(\theta) (1 - p_k(\theta))^{\bar n_k - n}.$$

If $\theta_1$ has density $h(.)$ and $\theta_2$ has density $g(.)$,

$$q_1^{s_i}(\theta) = h(\theta_1) \sum_{n=0}^{\bar n_1} B_1(y, s_i, n, \theta) \sum_{j=1}^L c_{ij}(\theta_2)\, g(\theta_2)\, \pi_j. \qquad (8.6.3)$$
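In practice the $\theta$-dependent recursion (8.6.2) can be run in parallel on a grid of parameter values, and a point estimate of the parameter read off from the grid point with the largest total unnormalized mass. A small sketch, assuming (hypothetically) that only $\theta_1 = p$ varies while $C$ is fixed; all numerical values are invented:

```python
from math import comb

def B_theta(M_k, y_k, nbar_k, s_i, n, p):
    # B_k(y, s_i, n, theta) with p = p_k(theta_1)
    return (2 ** M_k) * (n / s_i) ** y_k * (1 - n / s_i) ** (M_k - y_k) \
        * comb(nbar_k, n) * p ** n * (1 - p) ** (nbar_k - n)

thetas = [0.2, 0.4, 0.6, 0.8]      # grid over theta_1 = p
S = [10, 20]
C = [[0.9, 0.2], [0.1, 0.8]]
pi = [0.5, 0.5]

# q[t][i] approximates q_1^{s_i}(theta_t): one step of (8.6.2) from q_0 = pi
q = [[sum(B_theta(5, 2, 4, S[i], n, p) for n in range(5))
      * sum(C[i][j] * pi[j] for j in range(2))
      for i in range(2)]
     for p in thetas]

# a point estimate of theta_1: marginalize over states, take the mode
t_hat = thetas[max(range(len(thetas)), key=lambda t: sum(q[t]))]
```

The grid approach trades the integral over $(\Theta, \beta, \gamma)$ for a finite sum, which is the usual compromise when $\theta$ has no conjugate structure.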
8.7 A tags loss model

In this section we propose a model where the marks or tags are not permanent. In this situation double tagging is used: each individual is marked with two tags. For simplicity we assume that the two tags on each individual are indistinguishable and that individuals retain or lose their tags independently. We start again with a probability space $(\Omega, \mathcal F, P)$ on which are given two sequences of independent random variables $N_k$ and $y_k$. For $k \in \mathbb N$, $N_k$ is uniformly distributed over some finite set $S = \{s_1, \dots, s_L\} \subset \mathbb N - \{0\}$, and $y_k$ has a trinomial distribution with parameters $(M_k, 1/3)$, where $M_k \in \mathbb N - \{0\}$ is given. At any epoch each individual in the population is in one of three states, namely unmarked, marked with only one tag, and marked with two tags; we call these states 0, 1 and 2 respectively. We suppose that each individual behaves like an independent time-homogeneous Markov chain with transition matrix $\{p_{ij}\}$.
At each time $\ell$ the population size $N_\ell$ is partitioned into three groups $N_\ell(2)$, $N_\ell(1)$ and $N_\ell(0) = N_\ell - N_\ell(2) - N_\ell(1)$ among the three states, and we define the set of all such partitions as the states of a three-dimensional Markov chain $(N(0), N(1), N(2))$. Recall that at each epoch $\ell$, $0 \le N_\ell(2), N_\ell(1) \le \bar n$. Write

$$p_{(i_0,i_1,i_2),(j_0,j_1,j_2)} = P[(N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2) \mid (N_{k-1}(0), N_{k-1}(1), N_{k-1}(2)) = (j_0, j_1, j_2)],$$

and for any real numbers $x_0, x_1, x_2$ define the function

$$F(x_0, x_1, x_2, j_0, j_1, j_2) = \Big( \sum_{\ell=0}^2 p_{0\ell} x_\ell \Big)^{j_0} \Big( \sum_{\ell=0}^2 p_{1\ell} x_\ell \Big)^{j_1} \Big( \sum_{\ell=0}^2 p_{2\ell} x_\ell \Big)^{j_2}.$$

Then $p_{(i_0,i_1,i_2),(j_0,j_1,j_2)}$ is the coefficient of $x_0^{i_0} x_1^{i_1} x_2^{i_2}$ in $F(x_0, x_1, x_2, j_0, j_1, j_2)$. We wish to define a new probability measure $\bar P$ such that $y_k$ has a conditional trinomial distribution with parameters $(M_k, N_k(0)/N_k, N_k(1)/N_k, N_k(2)/N_k)$, and $N_k$ is a Markov chain with state space $S$ and stochastic matrix $C = \{c_{ij}\}$. The Markov chain $(N(0), N(1), N(2))$ is the same under both probability measures. Define again the $G$-predictable sequences

$$\alpha_\ell^i = \sum_{j=1}^L I(N_{\ell-1} = s_j)\, c_{ij}, \quad i = 1, \dots, L.$$

Now write

$$\lambda_\ell = 3^{M_\ell} \Big(\frac{N_\ell(0)}{N_\ell}\Big)^{y_\ell(0)} \Big(\frac{N_\ell(1)}{N_\ell}\Big)^{y_\ell(1)} \Big(\frac{N_\ell(2)}{N_\ell}\Big)^{y_\ell(2)} \prod_{i=1}^L (L \alpha_\ell^i)^{I(N_\ell = s_i)}, \qquad (8.7.1)$$

and $\Lambda_k = \prod_{\ell=0}^k \lambda_\ell$.

The process $\Lambda_k$ is a $G$-martingale and a new probability measure $\bar P$ can be defined by setting $\frac{d\bar P}{dP}\big|_{\mathcal G_k} = \Lambda_k$. It can be checked that under $\bar P$ the above processes have the desired distributions.

Working under the probability measure $\bar P$, we derive recursive equations for the unnormalized conditional joint probability distribution of $N_k$ and $(N_k(0), N_k(1), N_k(2))$. Write

$$\bar P[N_k = s_i, (N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2) \mid \mathcal Y_k] = \frac{E[I(N_k = s_i) I((N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2)) \Lambda_k \mid \mathcal Y_k]}{E[\Lambda_k \mid \mathcal Y_k]},$$

and $q_k(s_i, i_1, i_2) = E[I(N_k = s_i, N_k(2) = i_2, N_k(1) = i_1, N_k(0) = s_i - i_1 - i_2)\, \Lambda_k \mid \mathcal Y_k]$. It can be shown that $q_k(s_i, i_1, i_2)$ is given by the following recursion.
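The transition probability $p_{(i_0,i_1,i_2),(j_0,j_1,j_2)}$ defined above can be computed mechanically by expanding the generating polynomial $F$. A minimal sketch; the per-individual matrix `p` and the index values are hypothetical, and the polynomial representation (a dict from exponent triples to coefficients) is ours:

```python
from itertools import product

def poly_mul(A, B):
    # multiply two polynomials in x0, x1, x2, stored as {(e0, e1, e2): coeff}
    out = {}
    for (a, ca), (b, cb) in product(A.items(), B.items()):
        e = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
        out[e] = out.get(e, 0.0) + ca * cb
    return out

def poly_pow(A, n):
    out = {(0, 0, 0): 1.0}
    for _ in range(n):
        out = poly_mul(out, A)
    return out

def transition(p, i, j):
    # coefficient of x0^{i0} x1^{i1} x2^{i2} in
    # F = (sum_l p[0][l] x_l)^{j0} (sum_l p[1][l] x_l)^{j1} (sum_l p[2][l] x_l)^{j2}
    F = {(0, 0, 0): 1.0}
    for row, power in zip(p, j):
        linear = {tuple(1 if m == l else 0 for m in range(3)): row[l] for l in range(3)}
        F = poly_mul(F, poly_pow(linear, power))
    return F.get(tuple(i), 0.0)

# hypothetical per-individual tag-state transition matrix {p_ij}
p = [[1.0, 0.0, 0.0],      # an untagged individual stays untagged
     [0.3, 0.7, 0.0],      # a one-tag individual loses its tag w.p. 0.3
     [0.1, 0.3, 0.6]]      # a two-tag individual keeps 2, or drops to 1 or 0
prob = transition(p, (1, 1, 0), (0, 1, 1))
```

Since the rows of `p` sum to one, $F(1,1,1,j_0,j_1,j_2) = 1$, so the coefficients of $F$ form a probability distribution over the partitions $(i_0, i_1, i_2)$, as required.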
Theorem 8.7.1

$$q_k(s_i, i_1, i_2) = 3^{M_k} \Big(\frac{s_i - i_1 - i_2}{s_i}\Big)^{y_k(0)} \Big(\frac{i_1}{s_i}\Big)^{y_k(1)} \Big(\frac{i_2}{s_i}\Big)^{y_k(2)} \sum_{j=1}^L c_{ij} \sum_{j_1 + j_2 = 0}^{\bar n_{k-1}} p_{(s_i - i_1 - i_2,\, i_1,\, i_2),(s_j - j_1 - j_2,\, j_1,\, j_2)}\, q_{k-1}(s_j, j_1, j_2). \qquad (8.7.2)$$

The expected value of $N_k$ given the observations $\mathcal Y_k$ is given by

$$\bar E[N_k \mid \mathcal Y_k] = \frac{\displaystyle\sum_{i=1}^L s_i \sum_{i_1 + i_2 = 0}^{\bar n_k} q_k(s_i, i_1, i_2)}{\displaystyle\sum_{i=1}^L \sum_{i_1 + i_2 = 0}^{\bar n_k} q_k(s_i, i_1, i_2)}.$$
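In code, the conditional mean above is simply a ratio of a state-weighted sum to an unweighted sum of the unnormalized array $q_k$. A small sketch with made-up values (the sparse dict representation of $q_k(s_i, i_1, i_2)$ is ours):

```python
# q[i][(i1, i2)] plays the role of q_k(s_i, i1, i2); values are hypothetical
S = [10, 20]
q = [{(0, 0): 0.02, (1, 0): 0.01, (0, 1): 0.03},
     {(0, 0): 0.05, (1, 1): 0.04}]

num = sum(S[i] * v for i in range(len(S)) for v in q[i].values())
den = sum(v for row in q for v in row.values())
N_mean = num / den   # conditional mean E[N_k | Y_k]
```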
Another way of looking at the problem is to consider only the subpopulation of tagged individuals in the definition of the Markov chain $(N_k(0), N_k(1), N_k(2))$. In this case the state space is the set of all partitions of the totality of tagged individuals into three groups: those with two tags, those with one tag, and those who lost both tags. Hence we write the total number of tagged individuals as $\bar n_k = \bar n = N_k(2) + N_k(1) + N_k(0)$. Note that, when sampling, we cannot directly observe members of the group of individuals who lost both tags, as they are indistinguishable from the unmarked ones in the sample. Now we assume that under $P$ the observation process is multinomial with parameters $(M_k, 1/4)$ and under $\bar P$ it is (conditionally) multinomial with parameters $(M_k, N_k(0)/N_k, N_k(1)/N_k, N_k(2)/N_k, N_k(u)/N_k)$. Here $N_k(u)$ is the number of unmarked individuals in the population. Again note that $N_k(u)$ is not $N_k(0)$. Given $y_k(1), y_k(2)$, the unobserved component $y_k(0)$, under the probability measure $P$, is binomial with parameters $(M_k - y_k(1) - y_k(2), 1/2)$. Write

$$\bar P[N_k = s_i, (N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2) \mid \mathcal Y_k] = \frac{E[I(N_k = s_i) I((N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2)) \Lambda_k \mid \mathcal Y_k]}{E[\Lambda_k \mid \mathcal Y_k]},$$

and

$$q_k(s_i, i_0, i_1, i_2) = E[I(N_k = s_i) I((N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2))\, \Lambda_k \mid \mathcal Y_k]$$
$$= E\Big[I(N_k = s_i) I((N_k(0), N_k(1), N_k(2)) = (i_0, i_1, i_2))\, 4^{M_k} \Big(\frac{i_0}{s_i}\Big)^{y_k(0)} \Big(\frac{i_1}{s_i}\Big)^{y_k(1)} \Big(\frac{i_2}{s_i}\Big)^{y_k(2)} \Big(\frac{s_i - i_2 - i_1 - i_0}{s_i}\Big)^{M_k - y_k(2) - y_k(1) - y_k(0)}$$
$$\qquad \times L \sum_{j=1}^L I(N_{k-1} = s_j)\, c_{ij}\, \Lambda_{k-1} \,\Big|\, \mathcal Y_k \Big].$$

Summing over the possible values $m = 0, \dots, M_k - y_k(2) - y_k(1)$ of the unobserved component $y_k(0)$, which under $P$ is binomial with parameters $(M_k - y_k(2) - y_k(1), 1/2)$, and over the state of the tagged-subpopulation chain at time $k-1$, this equals

$$4^{M_k} \sum_{m=0}^{M_k - y_k(2) - y_k(1)} \Big(\frac{i_0}{s_i}\Big)^m \Big(\frac{i_1}{s_i}\Big)^{y_k(1)} \Big(\frac{i_2}{s_i}\Big)^{y_k(2)} \Big(\frac{s_i - i_2 - i_1 - i_0}{s_i}\Big)^{M_k - y_k(2) - y_k(1) - m} \binom{M_k - y_k(2) - y_k(1)}{m} \Big(\frac{1}{2}\Big)^{M_k - y_k(2) - y_k(1)}$$
$$\times E\Big[ I(N_k = s_i)\, L \sum_{j=1}^L I(N_{k-1} = s_j)\, c_{ij} \sum_{j_0 + j_1 + j_2 = \bar n} I((N_{k-1}(0), N_{k-1}(1), N_{k-1}(2)) = (j_0, j_1, j_2))\, p_{(i_0,i_1,i_2),(j_0,j_1,j_2)}\, \Lambda_{k-1} \,\Big|\, \mathcal Y_k \Big].$$

Taking the remaining expectation, and using the fact that under $P$ the random variable $N_k$ is uniform over $S$ (which contributes a factor $1/L$ cancelling the factor $L$), together with the definition of $q$, we have:

Theorem 8.7.2

$$q_k(s_i, i_0, i_1, i_2) = \sum_{m=0}^{M_k - y_k(2) - y_k(1)} B_k(m, i_0, i_1, i_2) \sum_{j=1}^L c_{ij} \sum_{j_0 + j_1 + j_2 = \bar n} p_{(i_0,i_1,i_2),(j_0,j_1,j_2)}\, q_{k-1}(s_j, j_0, j_1, j_2).$$

Here

$$B_k(m, i_0, i_1, i_2) = 2^{M_k + y_k(1) + y_k(2)} \Big(\frac{i_0}{s_i}\Big)^m \Big(\frac{i_1}{s_i}\Big)^{y_k(1)} \Big(\frac{i_2}{s_i}\Big)^{y_k(2)} \Big(\frac{s_i - i_2 - i_1 - i_0}{s_i}\Big)^{M_k - y_k(2) - y_k(1) - m} \binom{M_k - y_k(2) - y_k(1)}{m}.$$

8.8 Gaussian noise approximation
An approximate but simpler form of the recursion in Theorem 8.7.2 follows a suggestion of [23], where the martingale increment "noise" present in the representation of a Markov chain is replaced by Gaussian noise. To this effect, identify, as explained in [10], the three states 0, 1, 2 with the standard unit (column) vectors $e_1, e_2, e_3$ of $\mathbb R^3$. Write $X_k^n \in \{e_1, e_2, e_3\}$ for the state of the $n$-th individual at time $k$, $1 \le n \le \bar n$. Then each individual behaves like a Markov chain on $(\Omega, \mathcal F, P)$ with transition matrix $P$. Define $\bar X_k = \frac{1}{\bar n} \sum_{n=1}^{\bar n} X_k^n$. Then

$$\bar X_k = P \bar X_{k-1} + M_k, \qquad (8.8.1)$$

where $M_k$ is a martingale increment. The suggestion made in [23] is to replace the martingale increment $M_k$ in (8.8.1) by an independent Gaussian random variable $v_k$ of mean 0 and covariance matrix $E[M_k M_k']$, whose density is denoted by $\phi_k$. That is, the signal process $x_k$, taking values in $\mathbb R^3$, has dynamics

$$x_k = P x_{k-1} + v_k. \qquad (8.8.2)$$
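The occupation-vector process $\bar X_k$ underlying the approximation (8.8.2) is easy to simulate directly. A quick sketch with a hypothetical per-individual transition matrix (here stored so that column $j$ gives the law of the next state given current state $j$, matching $\bar X_k = P \bar X_{k-1} + M_k$ with states as unit column vectors); all numbers are invented:

```python
import random

# hypothetical per-individual transition matrix P (columns index the current state)
P = [[1.0, 0.3, 0.1],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 0.6]]

def step(state):
    # sample the next state of one individual: column `state` of P gives the law
    u, acc = random.random(), 0.0
    for i in range(3):
        acc += P[i][state]
        if u < acc:
            return i
    return 2

random.seed(0)
nbar = 500                       # number of tagged individuals
states = [2] * nbar              # all start with two tags
for _ in range(10):              # run the chain for a few epochs
    states = [step(s) for s in states]

# empirical occupation proportions: the vector \bar X_k
X_bar = [states.count(i) / nbar for i in range(3)]
```

With state 0 absorbing (both tags lost), the mass of `X_bar` drifts toward the first coordinate, and the fluctuations of `X_bar` about $P \bar X_{k-1}$ are the martingale increments that (8.8.2) replaces by Gaussian noise.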
We assume that under $P$ the observation process is multinomial with parameters $(M_k, 1/4)$, $x_k$ has density $\phi_k$ and $N_k$ is uniformly distributed over the set $\{s_1, \dots, s_L\}$. Under the "real world" probability measure $\bar P$, $N_k$ is a Markov chain with transition matrix $C$, $x_k$ has dynamics (8.8.2) and $y_k$ has conditional probability distribution given by

$$\bar P[y_k = (y_k(0), y_k(1), y_k(2), y_k(u)) \mid M_k, N_k = s_i, x_k = (x_0, x_1, x_2), N_k(u) = s_i - x_2 - x_1 - x_0]$$
$$= \binom{M_k}{y_k(2),\, y_k(1),\, y_k(0),\, y_k(u)} \Big(\frac{x_0}{s_i}\Big)^{y_k(0)} \Big(\frac{x_1}{s_i}\Big)^{y_k(1)} \Big(\frac{x_2}{s_i}\Big)^{y_k(2)} \Big(\frac{s_i - x_2 - x_1 - x_0}{s_i}\Big)^{M_k - y_k(2) - y_k(1) - y_k(0)}.$$

$\bar P$ is defined in terms of $P$ using the $G$-martingale

$$\Lambda_k = \prod_{\ell=0}^k 4^{M_\ell} \Big(\frac{x_\ell(0)}{N_\ell}\Big)^{y_\ell(0)} \Big(\frac{x_\ell(1)}{N_\ell}\Big)^{y_\ell(1)} \Big(\frac{x_\ell(2)}{N_\ell}\Big)^{y_\ell(2)} \Big(\frac{N_\ell - x_\ell(2) - x_\ell(1) - x_\ell(0)}{N_\ell}\Big)^{M_\ell - y_\ell(2) - y_\ell(1) - y_\ell(0)} \prod_{i=1}^L (L \alpha_\ell^i)^{I(N_\ell = s_i)}.$$

The next theorem is the analog of Theorem 8.7.2.

Theorem 8.8.1  The unnormalized joint conditional probability distribution of $N_k$ and $x_k$, $E[I(N_k = s_i) I(x_k \in dx) \Lambda_k \mid \mathcal Y_k] := q_k^{s_i}(x)\, dx$, is given recursively as follows:

$$q_k^{s_i}(x) = \sum_{m=0}^{M_k - y_k(2) - y_k(1)} B_k(m, x_0, x_1, x_2) \sum_{j=1}^L c_{ij} \int \phi_k(x - P u)\, q_{k-1}^{s_j}(u)\, du,$$

using the notation of Theorem 8.7.2.
References

[1] Arnold, L. Stochastic Differential Equations: Theory and Applications. John Wiley & Sons (1974).
[2] Bensoussan, A. Stochastic Control of Partially Observed Systems. Cambridge University Press (1992).
[3] Baum, L. E. and Petrie, T. Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statist. 37 (1966) 1554–1563.
[4] Billingsley, P. Probability and Measure. Third edn. Wiley Series in Probability and Mathematical Statistics (1995).
[5] Brémaud, P. Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues. Texts in Applied Mathematics 31. Springer (1999).
[6] Chung, K. L. and Williams, R. J. Introduction to Stochastic Integration. Second edn. Birkhäuser (1990).
[7] Chung, K. L. A Course in Probability Theory. Academic Press (1974).
[8] Davis, M. H. A. Martingales of Wiener and Poisson processes. J. London Math. Soc. (2) 13 (1976) 336–338.
[9] Dempster, A. P., Laird, N. M. and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society B 39 (1977) 1–38.
[10] Elliott, R. J., Aggoun, L. and Moore, J. B. Hidden Markov Models: Estimation and Control. Applications of Mathematics, Vol. 29. Springer-Verlag (1995).
[11] Elliott, R. J. Stochastic Calculus and Applications. Applications of Mathematics, Vol. 18. Springer-Verlag (1982).
[12] Elliott, R. J. and Moore, J. B. A martingale Kronecker lemma and parameter estimation for linear systems. IEEE Trans. Auto. Control 43, No. 9 (1998).
[13] Elliott, R. J. and Krishnamurthy, V. New finite dimensional filters for parameter estimation of discrete-time linear Gaussian models. IEEE Trans. Auto. Control 44 (1998) 938–951.
[14] Elliott, R. J. and Krishnamurthy, V. Exact finite dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM J. Control Optim. 35, No. 6 (1997) 1908–1923.
[15] Hajek, B. and Wong, E. Stochastic Processes in Engineering Systems. Springer-Verlag (1985).
[16] Ikeda, N. and Watanabe, S. Stochastic Differential Equations and Diffusion Processes. Second edn. North-Holland (1989).
[17] Itô, K. Stochastic integral. Proc. Imperial Acad. Tokyo 20 (1944) 519–524.
[18] Jacod, J. Calcul Stochastique et Problèmes de Martingales. Lecture Notes in Math., Vol. 714. Springer-Verlag (1979).
[19] Jacod, J. and Shiryayev, A. N. Limit Theorems for Stochastic Processes. Springer (1987).
[20] Jazwinski, A. H. Stochastic Processes and Filtering Theory. Academic Press (1970).
[21] Karatzas, I. and Shreve, S. E. Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics, Vol. 113. Springer-Verlag (1988).
[22] Kloeden, P. E. and Platen, E. Numerical Solution of Stochastic Differential Equations. Springer-Verlag (1992).
[23] Krichagina, N. V., Liptser, R. S. and Rubinovich, E. Y. Kalman filter for Markov processes. In Steklov Seminar (1984), eds. N. V. Krylov, R. S. Liptser and A. A. Novikov. Optimization Software Inc. (1985) 197–213.
[24] Kunita, H. and Watanabe, S. On square integrable martingales. Nagoya Math. J. 30 (1967) 209–245.
[25] Liptser, R. S. and Shiryayev, A. N. Statistics of Random Processes II. Springer-Verlag (1977).
[26] Loève, M. Probability Theory I. Fourth edn. Springer-Verlag (1977).
[27] Milshtein, G. N. Approximate integration of stochastic differential equations. Theory Prob. Appl. 19 (1974) 562–577.
[28] Meyer, P. A. Probability and Potentials. Blaisdell (1966).
[29] Neveu, J. Discrete Parameter Martingales. North-Holland (1975).
[30] Øksendal, B. Stochastic Differential Equations: An Introduction with Applications. Fourth edn. Universitext. Springer (1995).
[31] Papoulis, A. Probability, Random Variables and Stochastic Processes. McGraw-Hill (1984).
[32] Revuz, D. and Yor, M. Continuous Martingales and Brownian Motion. Third edn. Grundlehren der mathematischen Wissenschaften, Vol. 293. Springer-Verlag (1999).
[33] Rogers, L. C. G. and Satchell, S. E. Estimating variance from high, low and closing prices. Ann. Appl. Probability 1 (1991) 504–512.
[34] Rogers, L. C. G. and Williams, D. Diffusions, Markov Processes, and Martingales. Vol. 1: Foundations. Second edn. John Wiley & Sons (1994).
[35] Seber, G. A. F. The Estimation of Animal Abundance and Related Parameters. Second edn. Edward Arnold (1982).
[36] Shiryayev, A. N. Probability Theory. Springer-Verlag (1984).
[37] Wu, C. F. J. On the convergence properties of the EM algorithm. Ann. Statistics 11 (1983) 95–103.