Measure theory and filtering: introduction and applications

This page intentionally left blank Measure Theory and Filtering Introduction and Applications The estimation of noisi...

Author: Lakhdar Aggoun | Robert J. Elliott

48 downloads 986 Views 1MB Size Report

This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!

Report copyright / DMCA form

DOWNLOAD PDF

This page intentionally left blank

Measure Theory and Filtering Introduction and Applications The estimation of noisily observed states from a sequence of data has traditionally incorporated ideas from Hilbert spaces and calculus-based probability theory. As conditional expectation is the key concept, the correct setting for filtering theory is that of a probability space. Graduate engineers, mathematicians, and those working in quantitative finance wishing to use filtering techniques will find in the first half of this book an accessible introduction to measure theory, stochastic calculus, and stochastic processes, with particular emphasis on martingales and Brownian motion. Exercises are included, solutions to which are available from www.cambridge.org. The book then provides an excellent user’s guide to filtering: basic theory is followed by a thorough treatment of Kalman filtering, including recent results that exend the Kalman filter to provide parameter estimates. These ideas are then applied to problems arising in finance, genetics, and population modelling in three separate chapters, making this a comprehensive resource for both practitioners and researchers. Lakhdar Aggoun is Associate Professor in the Department of Mathematics and Statistics at Sultan Qabos University, Oman. Robert Elliott is RBC Financial Group Professor of Finance at the University of Calgary, Canada.

CAMBRIDGE SERIES IN STATISTICAL AND PROBABILISTIC MATHEMATICS Editorial Board R. Gill (Department of Mathematics, Utrecht University) B. D. Ripley (Department of Statistics, University of Oxford) S. Ross (Department of Industrial Engineering, University of California, Berkeley) M. Stein (Department of Statistics, University of Chicago) B. Silverman (St. Peter’s College, University of Oxford)

This series of high-quality upper-division textbooks and expository monographs covers all aspects of stochastic applicable mathematics. The topics range from pure and applied statistics to probability theory, operations research, optimization, and mathematical programming. The books contain clear presentations of new developments in the field and also of the state of the art in classical methods. While emphasizing rigorous treatment of theoretical methods, the books also contain applications and discussions of new techniques made possible by advances in computational practice. Already published 1. Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley 2. Markov Chains, by J. Norris 3. Asymptotic Statistics, by A. W. van der Vaart 4. Wavelet Methods for Time Series Analysis, by Donald B. Percival and Andrew T. Walden 5. Bayesian Methods, by Thomas Leonard and John S. J. Hsu 6. Empirical Processes in M-Estimation, by Sara van de Geer 7. Numerical Methods of Statistics, by John F. Monahan 8. A User’s Guide to Measure Theoretic Probability, by David Pollard 9. The Estimation and Tracking of Frequency, by B. G. Quinn and E. J. Hannan 10. Data Analysis and Graphics using R, by John Maindonald and John Braun 11. Statistical Models, by A. C. Davison 12. Semiparametric Regression, by D. Ruppert, M. P. Wand, R. J. Carroll 13. Exercises in Probability, by Loic Chaumont and Marc Yor

Measure Theory and Filtering Introduction and Applications

Lakhdar Aggoun Department of Mathematics and Statistics, Sultan Qaboos University, Oman

Robert J. Elliott Haskayne School of Business, University of Calgary

   Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo Cambridge University Press The Edinburgh Building, Cambridge  , UK Published in the United States of America by Cambridge University Press, New York www.cambridge.org Information on this title: www.cambridge.org/9780521838030 © Cambridge University Press 2004 This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published in print format 2004 - -

---- eBook (EBL) --- eBook (EBL)

- -

---- hardback --- hardback

Cambridge University Press has no responsibility for the persistence or accuracy of s for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Preface Part I

Theory

page ix 1

1

Basic probability concepts 1.1 Random experiments and probabilities 1.2 Conditional probabilities and independence 1.3 Random variables 1.4 Conditional expectations 1.5 Problems

3 3 9 14 28 34

2

Stochastic processes 2.1 Definitions and general results 2.2 Stopping times 2.3 Discrete time martingales 2.4 Doob decomposition 2.5 Continuous time martingales 2.6 Doob–Meyer decomposition 2.7 Brownian motion 2.8 Brownian motion process with drift 2.9 Brownian paths 2.10 Poisson process 2.11 Problems

38 38 46 50 56 59 62 70 72 72 75 75

3

Stochastic calculus 3.1 Introduction 3.2 Quadratic variations 3.3 Simple examples of stochastic integrals 3.4 Stochastic integration with respect to a Brownian motion 3.5 Stochastic integration with respect to general martingales 3.6 The Itˆo formula for semimartingales 3.7 The Itˆo formula for Brownian motion 3.8 Representation results 3.9 Random measures 3.10 Problems

79 79 80 87 90 94 97 108 115 123 127

vi

4

Contents

Change of measures 4.1 Introduction 4.2 Measure change for discrete time processes 4.3 Girsanov’s theorem 4.4 The single jump process 4.5 Change of parameter in poisson processes 4.6 Poisson process with drift 4.7 Continuous-time Markov chains 4.8 Problems

Part II

Applications

131 131 134 145 150 157 161 163 165 167

5

Kalman filtering 5.1 Introduction 5.2 Discrete-time scalar dynamics 5.3 Recursive estimation 5.4 Vector dynamics 5.5 The EM algorithm 5.6 Discrete-time model parameter estimation 5.7 Finite-dimensional filters 5.8 Continuous-time vector dynamics 5.9 Continuous-time model parameters estimation 5.10 Direct parameter estimation 5.11 Continuous-time nonlinear filtering 5.12 Problems

169 169 169 169 175 177 178 180 190 196 206 211 215

6

Financial applications 6.1 Volatility estimation 6.2 Parameter estimation 6.3 Filtering a price process 6.4 Parameter estimation for a modified Kalman filter 6.5 Estimating the implicit interest rate of a risky asset

217 217 221 222 223 229

7

A genetics model 7.1 Introduction 7.2 Recursive estimates 7.3 Approximate formulae

235 235 235 239

8

Hidden populations 8.1 Introduction 8.2 Distribution estimation 8.3 Parameter estimation 8.4 Pathwise estimation 8.5 A Markov chain model

242 242 243 246 247 248

Contents

8.6 8.7 8.8

Recursive parameter estimation A tags loss model Gaussian noise approximation

References Index

vii

250 250 253 255 257

Preface

Traditional courses for engineers in filtering and signal processing have been based on elementary linear algebra, Hilbert space theory and calculus. However, the key objective underlying such procedures is the (recursive) estimation of indirectly observed states given observed data. This means that one is discussing conditional expected values, given the observations. The correct setting for conditional expected value is in the context of measurable spaces equipped with a probability measure, and the initial object of this book is to provide an overview of required measure theory. Secondly, conditional expectation, as an inverse operation, is best formulated as a form of Bayes’ Theorem. A mathematically pleasing presentation of Bayes’ theorem is to consider processes as being initially defined under a “reference probability.” This is an idealized probability under which all the observations are independent and identically distributed. The reference probability is a much nicer measure under which to work. A suitably defined change of measure then transforms the distribution of the observations to their real world form. This setting for the derivation of the estimation and filtering results enables more general results to be obtained in a transparent way. The book commences with a leisurely and intuitive introduction to σ -fields and the results in measure theory that will be required. The first chapter also discusses random variables, integration and conditional expectation. Chapter 2 introduces stochastic processes, with particular emphasis on martingales and Brownian motion. Stochastic calculus is developed in Chapter 3 and techniques related to changing probability measures are described in Chapter 4. The change of measure method is the basic technique used in this book. The second part of the book commences with a treatment of Kalman filtering in Chapter 5. Recent results, which extend the Kalman filter and enable parameter estimates to be obtained, are included. These results are applied to financial models in Chapter 6. The final two chapters give some filtering applications to genetics and population models. The authors would like to express their gratitude to Professor Nadjib Bouzar of the Department of Mathematics and Computer Science, University of Indianapolis, for the incredible amount of time he spent reading through the whole manuscript and making many useful suggestions. Robert Elliott would like to acknowledge the support of NSERC and the hospitality of the Department of Applied Mathematics at the University of Adelaide, South Australia.

x

Preface

Lakhdar Aggoun would like to acknowledge the support of the Department of Mathematics and Statistics, Sultan Qaboos University, Al-Khoud, Sultanate of Oman; the hospitality of the Department of Mathematical Sciences at the University of Alberta, Canada; and the Haskayne School of Business, University of Calgary, Calgary, Canada.

Part I Theory

1

Basic probability concepts

1.1 Random experiments and probabilities An experiment is random if its outcome cannot be predicted with certainty. A simple example is the throwing of a die. This experiment can result in any of six unpredictable outcomes 1,

2, 3, 4, 5, 6 which we list in what is usually called a sample space = {1, 2, 3, 4, 5, 6} = {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 }. Another example is the amount of yearly rainfall in each of the next 10 years in Auckland. Each outcome here is an ordered set containing ten nonnegative real numbers (a vector in IR10 + ); however, one has to wait 10 years before observing the outcome ω. Another example is the following. Let X t be the water level of a dam at time t. If we are interested in the behavior of X t during an interval of time [t0 , t1 ] say, then it is necessary to consider simultaneously an uncountable family of X t s, that is, = {0 ≤ X t < ∞,

t0 ≤ t ≤ t1 }.

The “smallest” observable outcome ω of an experiment is called simple. The set {1} containing 1 resulting from a throw of a die is simple. The outcome “odd number” is not simple and it occurs if and only if the throw results in any of the three simple outcomes 1, 3, 5. If the throw results in a 5, say, then the same throw results also in “a number larger than 3” or “odd number.” Sets containing outcomes are called events. The events “odd number” and “a number larger than 3” are not mutually exclusive, that is, both can happen simultaneously, so that we can define the event “odd number and a number larger than 3.” The event “odd number and even number” is clearly impossible or empty. It is called the impossible event and is denoted, in analogy with the empty set in set theory, by ∅. The event “odd number or even number” occurs no matter what is the event ω. It is itself and is called the certain event. In fact possible events of the experiment can be combined naturally using the set operations union, intersection, and complementation. This leads to the concept of field or algebra (σ -field (sigma-field) or σ -algebra, respectively) which is of fundamental importance in the theory of probability.

4

Basic probability concepts

A nonempty class F of subsets of a nonempty set is called a field or algebra if 1. ∈ F, 2. F is closed under finite unions (or finite intersections), 3. F is closed under complementation. It is a σ -field or (σ -algebra) if the stronger condition 2. F is closed under countable unions (or countable intersections) holds. If {F} is a σ -field the pair (, F) is called a measurable space. The sets B ∈ F are called events and are said to be measurable sets. For instance, the collection of finite unions of the half open intervals (a, b], (−∞ < a < b ≤ +∞) in IR plus the empty set is a field but not a σ -field because it is not closed under infinite countable unions. The open interval (0, 1) = ∞ n=1 (0, 1 − 1/n] is not in this collection despite the fact it contains each interval (0, 1 − 1/n]. Neither does it contain the singletons {x}, even though {x} = ∞ n=1 (x − 1/n, x] and it does not contain many other useful sets. This suggests that the notion of σ -field is indeed needed. There exists a minimal σ -field denoted B(IR) containing all half open intervals (a, b]. This is the Borel σ -field on the real line and it is the smallest σ -field containing the collection of open intervals and hence all intervals. It contains also: 1 1 1. all singletons {x} since {x} = ∞ x − , x + , n=1 n n 2. the set Q of all rational numbers because it is a countable union: Q = r ∈Q {r }, 3. the complement of Q, which is the set of all irrational numbers, 4. all open sets since any open set O = n In , where {In } are disjoint intervals. To see this recall that since O is open, then for any x ∈ O there exits a maximal interval Ix containing x and contained in O and Ix = O if O is itself an interval. If O is not an interval then there is a collection of disjoint maximal intervals contained in O, one for each x ∈ O. Moreover, each of these intervals contains a rational number because of the density of Q. Let {rn : n = 1, 2, . . . } be an enumeration of these rationals. Consequently, there is only at most a countable number of these intervals I1 , I2 , . . .. Therefore, since each of these intervals is contained in O, their union n In ⊂ O. Conversely, for each x ∈ O there exits a maximal interval In(x) containing x and contained in n In , that is, O ⊂ n In . Consequently O = n In . Sets in B(IR) are called Borel sets. Note that a topological space, unlike a measure space, is not closed under complementation. A word of caution here: even σ -fields are not in general closed under uncountable unions. The largest possible σ -field on any set is the power class 2 containing all the subsets of . However this σ -field is in general “too big” to be of any use in probability theory. At the other extreme we have the smallest σ -field consisting of and the empty set ∅. Given any collection C of subsets of , the σ -field generated by C, denoted by σ {C}, is made up of the class of all countable unions, all countable intersections and all complements of the subsets in C and all countable unions, intersections and complements of these sets, and so on. For instance, if C contains one subset, F say, then σ {F} consists of the subset

1.1 Random experiments and probabilities

5

F itself, its complement F¯ (also denoted F c ), their union F ∪ F¯ (which is always ) and their intersection F ∩ F¯ (which is always ∅). The σ -field σ {C} generated by a class of subsets C contains by definition C itself (as a subset); however, there are other σ -fields also containing C, one of them being 2 (the largest one). The point here is that σ {C} is the smallest σ -field containing C. In the set theory context “smallest” means that σ {C} is in the intersection of all the σ -fields containing C. In summary: C ⊂ σ {C} ⊂ {any σ -field containing C}. It is left as an exercise to show that any σ -field is either finite or uncountably infinite. Fields, or σ -fields, are convenient mathematical objects that express how much we know about the outcome ω of a random experiment. For instance, if = {1, 2, 3, 4, 5, 6} we may not be able to observe ω but we may observe a “larger” event like “odd number”= {(1, 3, 5)}, so that our “observed” σ -field is smaller than the one generated by . In fact it is equal to {(1, 3, 5), (2, 4, 6), , ∅}, which does not contain events like {(1, 3)} or {6}. When the sample space is finite, it is enough to represent information through partitions of into atoms, which are the smallest observable events. Since a field is just a collection of finite unions and complements of these atoms, it represents the same information as the partition. This is not true on infinite sample spaces as partitions and fields are not big enough to represent information in all practical situations. Suppose that when the experiment of throwing a die is performed, an indirect observer of the outcome ω can only learn that the event {1, 2} did or did not occur. So for this observer the (smallest) decidable events, or atoms, are in the field F1 = σ {{1, 2}, {3, 4, 5, 6}} = {∅, {1, 2, 3, 4, 5, 6}, {1, 2}, {3, 4, 5, 6}}. Another observer with a better access to information might be able to observe the richer field F2 = σ {{1, 2}, {3, 4}, {5, 6}}, which contains more atoms. The point here is that, given a set of outcomes , it is possible to define many fields, or σ -fields, ranging from the coarsest (containing only and the empty set ∅), to the finest (containing all the subsets of ). A natural question is: what extra conditions will make a field into a σ -field? We have the following useful result. A field is a σ -field if and only if it is closed under monotonic sequences of events, that is, it contains the limit of every monotonically increasing or decreasing sequence of events. (A sequence of events Ai , i ∈ IN, is monotonic increasing if A1 ⊂ A2 ⊂ A3 . . . ). Let the index parameter t be either a nonnegative integer or a nonnegative real number. To keep track, to record, and to benefit from the flow of information accumulating in time and to give a mathematical meaning to the notions of past, present and future the concept of filtration is introduced. This is done by equipping the measurable space (, F) with a nondecreasing family {Ft , t ≥ 0} of “observable” sub-σ -fields of F such that Ft ⊂ Ft whenever t ≤ t . That is, as time flows, our information structures or σ -fields are becoming finer and finer. We define F∞ = σ ( t≥0 Ft ) = t≥0 Ft where the symbol = stands for “by definition.”

6

Basic probability concepts

Example 1.1.1 Let = {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 }. The σ -fields F0 = σ {, ∅}, F1 = σ {{ω1 , ω2 , ω3 }, {ω4 , ω5 , ω6 }}, F2 = σ {{ω1 , ω2 }, {ω3 }, {ω4 , ω5 , ω6 }}, F3 = σ {{ω1 }, {ω2 }, {ω3 }, {ω4 , ω5 , ω6 }}, form a filtration since F0 ⊂ F1 ⊂ F2 ⊂ F3 . However, the σ -fields F0 = σ {, ∅}, F1 = σ {{ω1 , ω2 , ω3 }, {ω4 , ω5 , ω6 }}, F2 = σ {{ω1 , ω4 }, {ω2 , ω5 }, {ω3 , ω6 }}, F3 = σ {{ω1 }, {ω2 }, {ω3 , ω4 }, {ω5 , ω6 }}, do not form a filtration since, for instance, F1 ⊂ F2 .

Example 1.1.2 Suppose is the unit interval (0, 1] and consider the following σ -fields: F0 = σ {, ∅}, F1 = σ {(0, 12 ], ( 12 , 34 ], ( 34 , 1]}, F2 = σ {(0, 14 ], ( 14 , 12 ], ( 12 , 34 ], ( 34 , 1]}, F3 = σ {(0, 18 ], ( 18 , 28 ], . . . , ( 78 , 1]}. These form a filtration since F0 ⊂ F1 ⊂ F2 ⊂ F3 .

When the time index t ∈ IR+ we are led naturally to introduce the concepts of rightcontinuity and left-continuity of a filtration as a function of t. A filtration {Ft , t ≥ 0} is right-continuous if Ft contains events immediately after t, that is Ft = >0 Ft+ . We may also say that a filtration {Ft , t ≥ 0} is right-continuous if new information at time t arrives precisely at time t and not an instant after t. It is left-continuous if {Ft } contains events strictly prior to t, that is Ft = s
1.1 Random experiments and probabilities

7

It is easily seen that if is finite we need only specify P on atoms of F. The triple (, F, P) is called a probability space. Nonempty events which are unlikely to occur and to which a zero probability is assigned are called negligible events or null events. A σ -field F is P-complete if all subsets of null events are also events. Of course, their probability is zero. A filtration is complete if F0 is complete, i.e. all the null events are known at the initial time. The mathematical object (, F, Ft , P), where the filtration {Ft , t ≥ 0} is rightcontinuous and complete, is sometimes called a stochastic basis or a filtered probability space . The filtration {Ft , t ≥ 0} is said to satisfy the “usual conditions” if it is right-continuous and complete. For monotonic sequences of events we have the following result on continuity of probability measures. Theorem 1.1.3 Let (, F, P) be a probability space. If {An } is an increasing sequence of events with limit A, then P(An ) ↑ P(A), and if {Bn } is a decreasing sequence of events with limit B, then P(Bn ) ↓ P(B). Proof To prove the first statement, visualize the sequence {An } as a sequence of increasing concentric disks and then define the sequence of disjoint rings {Rn } (except for R1 which is the disk A1 ): R1 = A1 , R2 = A2 − A1 , . . . , Rn = An − An−1 . Note that ∞ Ak = ∪kn=1 Rn , A = ∪∞ n=1 An = ∪n=1 Rn ,

so that by σ -additivity k k P(A) = ∞ n=1 P(Rn ) = limk n=1 P(Rn ) = limk P(∪n=1 Rn ) = limk P(Ak ). The proof of the second statement follows by considering the sequence of complementary ¯ so that events { B¯ n } which is increasing with limit B, 1 − P(An ) ↑ 1 − P(A) =⇒ P(An ) ↓ P(A).

Example 1.1.4 Consider the experiment of tossing a fair coin infinitely many times and “observing” the outcomes of all tosses. Here each ω ∈ = (H, T )∞ is a countably infinite sequence of “Heads” and “Tails”. If we denote “Heads” and “Tails” by 0 and 1, each ω is a sequence of 0s and 1s and it can be shown that there are as many ωs as there are points in the interval [0, 1)!

8

Basic probability concepts

Suppose we wish to estimate the probability of the event consisting of those ωs for which the proportion of heads converges to 1/2. The so-called Strong Law of Large Numbers says that this probability is equal to one, i.e. the ωs for which the convergence to 1/2 does not hold form a negligible set. However, this negligible set is rather huge, as can be imagined! Example 1.1.5 In Example 1.1.4 let Fn,S be the collection of infinite sequences of H s and T s with some restriction S put on the first n tosses. For instance, if n = 3, S = {H H T . . . , H T H . . . , T H H . . . } ⊂ (H, T )3 , F3,S is the collection of infinite sequences of H s and T s for which the first three entries contain exactly two H s. It is left as an exercise to show that the class F = {Fn,S , S ⊂ (H, T )n , n ∈ IN} is a field. We now quote without proof from [4] the following result on extending a function P defined on sets in a field. Theorem 1.1.6 ([4]) If P is a probability measure on a field A, then it can be extended uniquely to the σ -field F = σ {A} generated by A, i.e. the restriction of the extension measure to the field A is P itself and by tradition they are both denoted by P. Let us return to the coin-tossing situation of Example 1.1.5. Using the extension theorem (Theorem 1.1.6) one can construct a (unique) probability measure P called product probability measure on the space ((H, T )∞ , F), starting from an initial probability ( p(H ), p(T )) = (1/2, 1/2) by setting P(Fn,S ) =

1 n S

2

n 1 = (number of infinite sequences in S) × . 2

It is left as an exercise to show that P does not depend on the representations of sets in F and that it is countably additive. (See [4]). An immediate generalization of the coin tossing experiment in Example 1.1.5 is to consider an infinite sequence of independent experiments, to which corresponds an infinite sequence of probability spaces (1 , F1 , P1 ), (2 , F2 , P2 ), . . . . We are interested in the space (∞) = 1 × 2 × . . . of all infinite sequences ω = (ω1 , ω2 , . . . ). Events of interest are again cylinder sets, i.e. infinite sequences with restrictions put on the first n outcomes. The collection of all these cylinders form a field which generates a σ -field F, often denoted F1 ⊗ F2 ⊗ . . . . A probability measure P can be defined on cylinder sets then extended uniquely to F using the Extension Theorem 1.1.6. In the coin-tossing experiment, an example of an event which is in F is the event F that a “Head” will occur. Clearly, F = ∞ k=1 Fk , where Fk is the event that a “Head” occurs on the k-th trial and not before. Since each Fk is a cylinder set, P(Fk ) is well defined for each

1.2 Conditional probabilities and independence

9

k ≥ 1. Moreover the Fk s are pairwise disjoint, hence P(F) =

∞

P(Fk ) =

k=1

∞ 1 = 1. 2k k=1

Note that this probability is still 1 regardless of the size of the probability of occurrence of a “Head”, (as long as it is not 0). Modeling with infinite sample spaces is not a mathematical fantasy. In many very simple minded problems infinite sequences of outcomes cannot be avoided. For example, “the first time a Head occurs” event cannot be described in a finite sample space model because the number of trials before it occurs cannot be bounded in advance. In general, it is impossible to define a probability measure on all the subsets of an infinite sample space; that is, one cannot say any subset is an event. However, consider the following case. Example 1.1.7 Suppose that is countable and let F be the σ -field 2 . Then it is not difficult to define a probability measure on F. Choose P such that 0 ≤ P({ω}) ≤ 1 and P({}) = P(ω) = 1, ω∈

and for any F ∈ F, define P(F) = ω∈F P(ω). Let {Fn }n∈IN be a sequence of disjoint sets in F and let ωn, denote the simple events in Fn . Since we have an infinite series of nonnegative numbers, P( Fn ) = P(ωn,m ) = P(ωn,m ) = P(Fn ). n

n,m

n

m

n

1.2 Conditional probabilities and independence Given a probability space (, F, P) and some event B with P(B) = 0, we define a new posterior probability measure as follows. If A is any event we define the probability of A given B as P(A | B) =

P(A and B) P(A ∩ B) = , P(B) P(B)

provided P(B) > 0. Otherwise P(A | B) is left undefined. What we mean by “given event B” is that we know that event B has occurred, that is we know that ω ∈ B, so that we no longer assign the same probabilities given by P to events but assign new, or updated, probabilities given by the probability measure P(. | B). Any event which is mutually exclusive with B has probability zero under P(. | B) and the new probability space is now (B, F ∩ B, P(. | B)). If our observation is limited to knowing whether event B has occurred or not we may as well define P(. | B), where B is the complement of B within . Prior to knowing where the outcome ω is we define the, now random, quantity:

P(. | B or B)(ω) = P(. | σ {B})(ω) = P(. | B)I B (ω) + P(. | B)I B (ω).

10

Basic probability concepts

This definition extends in an obvious way to a σ -field G generated by a finite or countable partition {B1 , B2 , . . . } of and the random variable P(. | G)(ω) is called the conditional probability given G. The random function P(. | G)(ω) whose values on the atoms Bi are P(. ∩ Bi ) ordinary conditional probabilities P(. | Bi ) = is not defined if P(Bi ) = 0. In P(B I ) this case we have a family of functions P(. | G)(ω), one for each possible arbitrary value assigned to the undefined P(. | Bi ). Usually, one version is chosen and different versions differ only on a set of probability 0. Example 1.2.1 Phone calls arrive at a switchboard between 8:00 a.m. and 12:00 p.m. according to the following probability distribution: lk 1. P(k calls within an interval of length l) = e−l ; k! 2. If I1 and I2 are disjoint intervals, P((k1 calls within I1 ) ∩ (k2 calls within I2 )) = P(k1 calls within I1 )P(k2 calls within I2 ), that is, events occurring within disjoint time intervals are independent. Suppose that the operator wants to know the probability that 0 calls arrive between 8:00 and 9:00 given that the total number of calls from 8:00 a.m. to 12:00 p.m., N8−12 , is known. From past experience, the operator assumes that this number is near 30 calls, say. Hence P(0 calls within [8, 9) | 30 calls within [8, 12]) P((0 calls within [8, 9)) (30 calls within [9, 12])) = P(30 calls within [8, 12]) P(0 calls within [8, 9))P(30 calls within [9, 12]) = = P(30 calls within [8, 12])

30 3 , 4

which can be written as P(0 calls within [8, 9) | N8−12

N 3 = N) = . 4

(1.2.1)

Remarks 1.2.2 Consider again Example 1.2.1. 1. The events Fi = {ω : N8−12 (ω) = i}, i = 0, 1, . . . form a partition of and are atoms of the σ -field generated by observing only N8−12 , so we may write: P(0 calls within [8, 9) | Fi , i ∈ IN)(ω) = P(0 calls within [8, 9) | σ {Fi , i ∈ IN})(ω) ∞ i 3 = I Fi (ω). 4 i 2. Observe that since each event F ∈ σ {Fi , i ∈ IN} is a union of some Fi1 , Fi2 , . . . , and since we know, at the end of the experiment, which F j contains ω, then we know

1.2 Conditional probabilities and independence

11

whether or not ω lies in F, that is whether F or the complement of F has occurred. In this sense, σ {Fi , i ∈ IN} is indeed all we can answer about the experiment from what we know. The likelihood of occurrence of any event A could be affected by the realization of B. Roughly speaking if the “proportion” of A within B is the same as the “proportion” of A within then it is intuitively clear that P(A | B) = P(A | ) = P(A). Knowing that B has occurred does not change the prior probability P(A). In that case we say that events A and B are independent. Therefore two events A and B are independent if and only if P(A ∩ B) = P(A)P(B). Two σ -fields F1 and F2 are independent if and only if P(A1 ∩ A2 ) = P(A1 )P(A2 ) for all A1 ∈ F1 , A2 ∈ F2 . If events A and B are independent so are σ {A} and σ {B} because the impossible event ∅ is independent of everything else including itself, and so is . Also A and B c , Ac and B, Ac and B c are independent. We can say a bit more, if P(E) = 0 or P(E) = 1 then the event E is independent of any other event including E itself, which seems intuitively clear. Mutually exclusive events with positive probabilities provide a good example of dependent events. Example 1.2.3 In the die throwing experiment the σ -fields F1 = σ {{1, 2}, {3, 4, 5, 6}}, and F2 = σ {{1, 2}, {3, 4}, {5, 6}}, are not independent since if we know, for instance, that ω has landed in {5, 6} (or equivalently {5, 6} has occurred) in F2 then we also know that the event {3, 4, 5, 6} in F1 has occurred. This fact can be checked by direct calculation using the definition. However, the σ -fields F3 = σ {{1, 2, 3}, {4, 5, 6}}, and F4 = σ {{1, 4}, {2, 5}, {3, 6}}, are independent. The occurrence of any event in any of F3 or F4 does not provide any nontrivial information about the occurrence of any (nontrivial) event in the other field. Another fundamental concept of probability theory is conditional independence. Events A and C are said to be conditionally independent given event B if P(A ∩ C | B) = P(A | B)P(C | B), P(B) > 0. The following example shows that it is not always easy to decide, under a probability measure, if conditional independence holds or not between events. Example 1.2.4 Consider the following two events: A1 =“person 1 is going to watch a football game next weekend,” A2 =“person 2, with no relation at all with person 1, is going to watch a football game next weekend.”

12

Basic probability concepts

There is no reason to doubt the independence of A1 and A2 in our model. However consider now the event B = “next weekend weather is good.” Suppose that P(A1 | B) = .90,

P(A2 | B) = .95,

P(A2 | B) = .30,

P(A1 | B) = .40,

P(B) = .75

and

P(B) = .25.

Using this information it can be checked that P(A1 ∩ A2 ) = P(A1 )P(A2 ). The reason is that event B has “linked” events A1 and A2 in the sense that if we knew that A1 has occurred the probability of B should be high, resulting in the probability of A2 increasing. The independence concept extends to arbitrary families of events. A family of events {Aα , α ∈ I } is said to be a family of independent events if and only if any finite subfamily is independent, i.e., for any finite subset of indices {i 1 , i 2 , . . . , i k } ⊂ I , P(Ai1 ∩ Ai2 ∩ · · · ∩ Aik ) = P(Ai1 )P(Ai2 ) . . . P(Aik ). A family of σ -fields {Fα , α ∈ I } is said to be a family of independent σ -fields if and only if any finite subfamily {Fi1 , Fi2 , . . . , Fik } is independent; that is, if and only if any collection of events of the form {Ai1 ∈ Fi1 , Ai2 ∈ Fi2 , . . . , Aik ∈ Fik } is independent. An extremely powerful and standard tool in proving properties which are true with probability one is the Borel–Cantelli Lemma. This lemma concerns sequences of events. Let {An } be a monotone decreasing sequence of events, i.e. A1 ⊃ A2 ⊃ · · · ⊃ An ⊃ An+1 ⊃ . . . , then by definition lim An =

n→∞

∞

An .

n=1

Let {Bn } be a monotone increasing sequence of events, i.e. B1 ⊂ B2 ⊂ · · · ⊂ Bn ⊂ Bn+1 ⊂ . . . , then by definition lim Bn =

n→∞

∞

Bn .

n=1

Let {Cn } be an arbitrary sequence of events. Define

An = sup Ck = k≥n

∞

Ck ,

k=n

and

Bn = inf Ck = k≥n

∞

Ck .

k=n

Event An occurs if and only if at least one of the events Cn , Cn+1 , . . . occurs and event Bn occurs if and only if all the Cn occur simultaneously except for a finite number.

1.2 Conditional probabilities and independence

13

By construction, An and Bn are monotone. An is decreasing and Bn is increasing so that: A = lim An = n→∞

∞

An =

∞ ∞

Ck ,

n=1 k=n

n=1

and B = lim Bn = n→∞

∞

Bn =

∞ ∞

Ck .

n=1 k=n

n=1

∞ Event A = ∞ n=1 k=n C k = lim sup C n occurs if and only if infinitely many C n occur, or Cn occurs infinitely often (Cn i.o.). To see this suppose that ω belongs to an infinite number ∞ ∞ of Cn s; then for every n, ω ∈ ∞ if ω k=n C k . Therefore, ω ∈ n=1 k=n C k . Conversely, belongs to only a finite number of Cn s, then there is some n 0 such that ω ∈ ∞ k=n 0 C k . ∞ ∞ ∞ ∞ Since ∞ C ⊂ C , this shows that ω ∈ C if ω belongs to only n=1 k=n k k=n 0 k n=1 k=n k a finite number of Cn s. ∞ Event B = ∞ n=1 k=n C k = lim inf C n occurs if and only if all but a finite number of Cn occur. Clearly lim inf Cn ⊂ lim sup Cn . Consider the following simple example of sequences of intervals in IR. Example 1.2.5 Let A and B be any subsets of and define the sequences C2n = A and C2n+1 = B. Then: lim sup Cn = A ∪ B,

lim inf Cn = A ∩ B.

Example 1.2.6 Let Ck = {(x, y) ∈ IR2 :

0 ≤ x < k, 0 ≤ y <

1 }, k

then An = sup Ck = k≥n

∞

Ck = {x, y ∈ IR2 :

0 ≤ x < ∞, 0 ≤ y <

k=n

1 }, n

and Bn = inf Ck = k≥n

∞

Ck = {x, y ∈ IR2 :

0 ≤ x < n, y = 0}.

k=n

An and Bn are monotone and decreasing and increasing respectively so that: A = lim An = n→∞

∞

An = lim sup Cn = {x, y ∈ IR2 :

0 ≤ x < ∞, y = 0},

Bn = lim inf Cn = {x, y ∈ IR2 :

0 ≤ x < ∞, y = 0}.

n=1

and B = lim Bn = n→∞

∞ n=1

14

Basic probability concepts

Lemma 1.2.7 (Borel–Cantelli). Let (, F, P) be a probability space. 1. For an arbitrary sequence of events {Cn }, ∞ n=1 P(C n ) < ∞ implies P(lim sup C n ) = 0. 2. If {Cn } is a sequence of independent events, ∞ n=1 P(C n ) = ∞ implies P(lim sup C n ) = 1. Proof lim sup Cn =

1.

∞ ∞ n=1 k=n

Ck ⊂

∞

Ck ,

k=n

which implies P(lim sup Cn ) ≤ ∞ ∞. k=n P(C k ) → 0 as n → ∞ ¯ 2. Consider the complementary event of lim sup Cn which is ∞ n=1 k=n Ak . Now n+m n+m n+m

P( A¯ k ) = (1 − P(Ck )) ≤ exp − P(Ck ) → 0, k=n

k=n

k=n

for all n as m → ∞ because of the divergence of the series

∞ n=1

P(Cn ).

1.3 Random variables Definition 1.3.1 If (, F) and (E, E) are measurable spaces a map X : → E is measurable if X −1 (B) ∈ F for all B ∈ E. Definition 1.3.2 A measurable real valued function X : (, F, P) → (IR, B(IR)) is called a random variable. It is left as an exercise to show that if {ω : X (ω) ≤ x} = {ω : X (ω) ∈ (−∞, x]} ∈ F for all real x ∈ IR then X is a random variable. For C ∈ define IC (ω), (also denoted χC (ω) or simply I (C)), the indicator function of the set C, as follows: 1 if ω ∈ C, IC (ω) = 0 otherwise. Example 1.3.3 Let be the unit interval (0, 1] and on it are given the following σ -fields: F1 = σ {(0, 12 ], ( 12 , 34 ], ( 34 , 1]}, F2 = σ {(0, 14 ], ( 14 , 12 ], ( 12 , 34 ], ( 34 , 1]}, F3 = σ {(0, 18 ], ( 18 , 28 ], . . . , ( 78 , 1]}. Consider the mapping X (ω) = x1 I(0, 14 ] (ω) + x2 I( 14 , 12 ] (ω) + x3 I( 12 , 34 ] (ω) + x4 I( 34 ,1] (ω). It is an easy exercise to check that the inverse image X −1 of any interval in IR is in F2 and in F3 (F2 ⊂ F3 ). F2 is coarser than F3 because the atoms (smallest sets) of F2 are unions of the smaller atoms of F3 . So if we know in which atom of F3 the outcome ω is we can determine in which atom it is of F2 . However, if we know that ω is in the F2 atom (0, 14 ],

1.3 Random variables

15

say, then it could be in either of F3 atoms (0, 18 ] or ( 18 , 28 ]. X is not a random variable with respect to F1 since X −1 ({x1 }) = (0, 14 ] is not an atom (and, a fortiori, it is not an event) of F1 . To put it another way, knowing for instance that ω ∈ (0, 12 ] leaves us undecided about which value X has taken; that is to say X is not F1 -measurable. Note that in the above example F2 is the smallest σ -field with respect to which X is measurable and it coincides with the class of all inverse images of X . For this reason it is called the σ -field generated by X and is denoted σ (X ). We have F1 ⊂ σ (X ) ⊂ F3 . For more general cases where X takes its values in some topological space E, the Borel σ -field B on E is the smallest σ -field generated by the open sets of E. In general it is not possible to assign probabilities to all subsets of and, therefore, we cannot treat any function X as a random variable since X −1 might not be an event and so its probability is not defined. However, it is not an easy task to come up with an example of a function which is not a random variable! In the finite state space set up we have: X is F-measurable if and only if X is a constant function when restricted to any of the atoms of F. To see this suppose first that F ∈ F is an atom and that X takes values a and b on F with a < b. Let α = (a + b)/2, then {ω ∈ : X (ω) ≤ α} ∈ F is a nonempty proper subset of F, a contradiction. For the converse, let {F1 , . . . , F p } be the collection of atoms of F and suppose that X (ω) = αi for ω ∈ Fi , i = 1, 2, . . . , p. Then {ω ∈ : X (ω) ≤ α} = Fi ∈ F, αi ≤α

that is, X is F-measurable. It is interesting to note here that we can express a random variable in two ways: X (ω) = αi I Fi (ω) = α I X −1 (α) (ω). α∈IR

Fi

Definition 1.3.4 X is a simple function on (, F) if there exists a partition of = Ai ∩ A j = ∅ for i = j and X (ω) =

k

xi I Ai (ω).

Ai ,

(1.3.1)

i=1

Given a finite σ -field A = σ {A1 , . . . , A N }, where we assume without loss of generality that A1 , . . . , A N form a partition and a random variable X assuming finitely many (distinct) values, x1 , . . . , xk , and with inverse images X −1 ({xi }) contained in A (so k ≤ N ) we can N write uniquely X = l=1 yl I Al . To see this note that the inverse image of any point xi in the range of X is a union of atoms of A, Ai1 , . . . , Ais(xi ) , say, that is

k k N X= xi I[∪s Ais ] = xi I Ai s = yl I Al . i=1

i=1

s

l=1

The mapping in Example 1.3.3 is a simple random variable. The following result is fundamental in the theory of integration.

16

Basic probability concepts

Theorem 1.3.5 If X is a positive random variable (possibly with infinite values), there exists an increasing sequence of simple random variables X 1 , X 2 , . . . converging monotonically to X for each ω such that X n (ω) ≤ X (ω) for all n and all ω. Proof

If X is nonnegative, define X n (ω) =

n2n k−1 k=1

2n

k (ω). I[ k−1 2n , 2n )

Clearly, X n (ω) converges to X (ω) for all ω. As a corollary, notice that a general random variable X can be represented as the difference

of two nonnegative random variables: X = X I X ≥0 − (−X )I X <0 = X + − X − . Let us notice that, unlike spaces of continuous functions, the spaces of measurable functions are closed under pointwise limits, i.e. the spaces of measurable functions are more stable. In applications we often work with functions of random variables. Suppose f : IR → IR is a function, then Y (.) = f (X (.)) is also a function from to IR. Y will be a random variable if Y −1 (B) = [ f (X )]−1 (B) = X −1 ( f −1 (B)) ∈ F, for each Borel set B; which holds whenever f −1 (B) is a Borel set. By definition the random variable Y generates events which are in σ (X ); in other words σ (Y ) ⊂ σ (X ). In this case we say that Y is σ (X )-measurable. Conversely, we have the important result: Theorem 1.3.6 If Y is σ (X )-measurable then there exists a measurable function f such that Y (.) = f (X (.)). Proof

Start with simple functions. Let A ∈ σ (X ) and consider the random variable 1 if ω ∈ A, Y (ω) = I A (ω) = 0 otherwise.

Since A ∈ σ (X ) there exists a Borel set B such that X −1 (B) = A and clearly Y (ω) =

I A (ω) = I B (X (ω)) = f (X (ω)). Hence the result is true for simple functions. Using Theorem 1.3.5 and limiting arguments we can establish the result for general random variables which are σ (X )-measurable. A more precise argument is (see [36]): let = {Y : Y is σ (X )-measurable}, and X = {Z : Z is σ (X )-measurable and Z = f (X ), f is a Borel function}. We first note that ⊆ X . Now for any F ∈ σ (X ), there is a Borel set B ∈ B(IR) such that F = {ω : X (ω) ∈ B} and we see that I F (ω) = I B (X (ω)) ∈ X . Hence every simple σ (X )-measurable random variable is in X . However, if Y is an arbitrary, σ (X )-measurable random variable, Theorem 1.3.5 guarantees the existence of a sequence of simple random

1.3 Random variables

17

variables Y1 , Y2 , . . . which converges pointwise to Y . By the previous step, for each n, there is a Borel function f n such that Yn (ω) = f n (X (ω)) ∈ X . Then f n (X (ω)) converges to Y (ω) pointwise. Define the Borel function: limn f n (x) if the limit exists, f (x) = 0 otherwise. Clearly Y (ω) = limn f n (X (ω)) = f (X (ω)) for all ω ∈ , which gives the result. Example 1.3.7 Let X be a real valued function on with finite range. Suppose that the range of X is {x1 , . . . , xr }. Then we know that σ (X ) is the σ -algebra generated by the atoms {X −1 (x j ); j = 1, . . . , r }. Suppose that Y is σ (X )-measurable. Then Y must be constant on the atoms of σ (X ). That is, there exist constants y1 , . . . , yr such that Y (ω) = y j I X −1 (x j ) (ω) = g(x j )I X −1 (x j ) (ω) = g(X (ω)). j

j

where, by definition, g(x j ) = y j .

Remark 1.3.8 The result of Example 1.3.7 can be extended as follows. Let X 1 , . . . , X n be n random variables on (, F, P). Write σ (X 1 , . . . , X n ) for the σ -algebra generated by these random variables. The atoms of this σ -algebra are all sets of the form F1 ∩ F2 ∩ · · · ∩ Fn , where Fi is an atom of σ (X i ) for i = 1, 2, . . . , n. Then a function Y on is σ (X 1 , . . . , X n )measurable if and only if Y = g(X 1 , . . . , X n ) for some function g : IRn → IR. Thus the use of σ -algebra is a powerful way of expressing dependence of one random variable on a (possibly infinite) family of others. It should be pointed out that the class of Borel functions is rich enough for most practical purposes. It contains continuous functions, piecewise continuous functions, etc. It is closed under pointwise limit operations, linear combinations, products, compositions, etc. A real random variable X induces a probability measure on the Borel sets of B(IR), called the probability distribution of X , defined by PX (B) = P(X ∈ B), for any Borel set B. If there exists a nonnegative function f such that for any Borel set B PX (B) = f (x)dx, (⇒ f (x)dx = 1), B

IR

we say that PX is absolutely continuous and f is the probability density function of X . The probability distribution function of a real valued random variable is defined on IR by the formula: F(x) = P(X ≤ x).

18

Basic probability concepts

The probability distribution function has the following properties: 1. 0 ≤ F(x) ≤ 1 for all x ∈ IR. 2. F is nondecreasing and right-continuous with left limits at each x ∈ IR. Conversely, given a distribution function F(x) on the real line, there exists a probability space ( = IR, F = B(IR), P) and a random variable X (ω) ≡ ω on it such that P(X ≤ x) = F(x). P is defined on intervals as P((a, b]) = F(b) − F(a) and extended uniquely to a probability measure on B(IR). (See Theorem 1, page 152 in [36]). The measure P constructed from the function F is usually called the Lebesgue–Stieltjes probability measure corresponding to the distribution F. A similar situation exists for random vectors (X 1 , . . . , X n ) and probability distributions on (IRn , B(IRn )). Given a probability measure P on (IRn , B(IRn )), write: Fn (x1 , . . . , xn ) = P((−∞, x1 ] × · · · × (−∞, xn ]). An absolutely continuous random vector is one for which there exists a nonnegative probability density function f n (.) : IRn → IR such that: +∞ +∞ ··· f n (u 1 , . . . , u n )du 1 . . . du n = 1, −∞

−∞

and

Fn (x1 , . . . , xn ) =

x1 −∞

···

xn −∞

f n (u 1 , . . . , u n )du 1 . . . du n .

Conversely we have Theorem 1.3.9 (Theorem 2, page 160 in Shiryayev [36]) Let Fn be a distribution function on IRn . Then there is a unique probability measure P on (IRn , B(IRn )) such that P((a1 , b1 ] × · · · × (an , bn ]) = a1 b1 . . . an bn Fn (x1 , . . . , xn ), where ai bi Fn (x1 , . . . , xn ) = Fn (x1 , . . . , xi−1 , bi , xi+1 , . . . , xn ) − Fn (x1 , . . . , xi−1 , ai , xi+1 , . . . , xn ). The remarkable fact is that a similar construction for probability measures also works for the space (IR∞ , B(IR∞ )), where IR∞ = {(x1 , x2 , . . . ) ∈ IR × IR × . . . } and the Borel σ -field of IR∞ B(IR∞ ) is the smallest σ -field containing all the cylinder sets {x = (x1 , x2 , . . . ) ∈ IR∞ : x1 ∈ I1 , x2 ∈ I2 , . . . , xn ∈ In },

n ≥ 1,

where each Ii is an interval of the form (ai , bi ]. In other words, a cylinder set is a set of infinite sequences with restrictions placed on a finite number of coordinates.

1.3 Random variables

19

The next theorem is Kolmogorov’s Theorem on the Extension of Measures in (IR∞ , B(IR∞ )). See [36] page 163. Theorem 1.3.10 Suppose that P1 , P2 , . . . are probability measures on the spaces (IR, B(IR)), (IR2 , B(IR2 )), . . . such that for n ≥ 1 and B ∈ B(IRn )) Pn+1 (B × IR) = Pn (B). then there is a unique probability measure P on (IR∞ , B(IR∞ )) such that the restriction of P to B(IRn )) is Pn for all n ≥ 1. Properties of random variables are usually not required to hold for every ω. They may fail to hold on sets of measure zero, that is negligible subsets of . In this case they are said to hold almost surely (a.s.) or almost everywhere (a.e.) with respect to some probability measure P. σ -fields are P-completed by including in them all subsets of the P-negligible subsets of . Example 1.3.11 A sequence of random variables {X k } is said to converge almost surely a.s. to a random variable X (X k → X ) if P[ω : lim X k (ω) exists and is finite] = 1. k→∞

That is, {X k (ω)} converges pointwise to X (ω), except possibly on a negligible subset of . Independence of random variables is expressed in terms of the σ -fields they generate. Definition 1.3.12 Suppose X 1 , . . . , X n are random variables on a probability space (, F, P). Then X 1 , . . . , X n are independent if for any k ≤ n, P(X 1 ∈ B1 , . . . , X k ∈ Bk ) =

k

P(X i ∈ Bi ),

i=1

for all Borel sets (B1 , . . . , Bk ). An arbitrary class of random variables is independent if and only if any finite subfamily is independent. Theorem 1.3.13 Suppose X 1 , . . . , X n are random variables on a probability space (, F, P). Then X 1 , . . . , X n are independent if and only if P(X 1 ≤ x1 , . . . , X n ≤ xn ) =

n

P(X i ≤ xi ),

(x1 , . . . , xn ) ∈ IRn .

i=1

Proof

See [36] page 179.

Definition 1.3.14 Let (, F) be a measurable space. A map µ : F → [0, ∞] is called a measure on (, F) if µ is countably additive, that is if {Bk } is a countable sequence of pairwise disjoint elements of F, then µ( Bk ) = µ(Bk ). The triple (, F, µ) is then called a measure space.

20

Basic probability concepts

We say that µ is finite if µ() < ∞ and it is σ -finite if there exists a partition of into disjoint subsets {Sk } such that ∞ n=1 Sk = and µ(Sk ) < ∞, for all k ≥ 1. When we move from intervals, which are essential in the definition of Riemann integral, to more complicated subsets of IR whose “lengths” or measures are not obvious we use Lebesgue measure and integration theory. Roughly speaking, the Lebesgue measure of a set F (denoted λ(F)) is the infimum of the sums of lengths of intervals which cover the set F. If this infimum exists we say that F is measurable. More formally, consider the field A generated by finite unions of intervals of the form (a, b] and define the set function n λ0 (∪(ai , bi ]) = i=1 (bi − ai ), a1 ≤ b1 . . . an ≤ bn . Then λ0 is well defined, σ -finite and countably additive. Hence there exists a unique measure λ on the Borel σ -field B(IR) [34]. This measure is called the Lebesgue measure on (IR, B(IR)). Given a finite probability space (, F, P} and a simple random variable X (ω) =

n

x I A (ω),

=1

the expected value or mean or integral with respect to the measure P is defined by the formula: E[X ] =

n

x P(A ).

=1

For a nonnegative random variables X there exists, by Theorem 1.3.5, a sequence of simple random variables X 1 , X 2 , . . . increasing to X pointwise, and we define its Lebesgue integral or expectation by: E[X ] = lim E[X n ] = X (ω)d P(ω). n

Furthermore, it can be shown that the limit is independent of the increasing sequence of functions. If the above limit is finite, X is said to be integrable. If both its positive and negative parts, (X + = X I{X ≥0} , X − = −X I{X <0} ) are integrable, X is said to be integrable and we define X (ω)dP(ω) = X + (ω)dP(ω) − X − (ω)dP(ω). Note that

E[X ] =

X (ω)dP(ω) =

xdFX (x). IR

Here FX (x) = P(X ≤ x) is the usual probability distribution function of X and dFX is the Lebesgue–Stieltjes probability measure corresponding to the probability distribution function F. Theorem 1.3.15 (Monotone Convergence Theorem.) If X n is a sequence of nonnegative random variables increasing almost surely to X , then X n dP increases to X dP.

1.3 Random variables

21

Proof Let X n,k be a sequence of simple functions increasing pointwise to X k as n → ∞. Put Z n = max1≤k≤n X n,k . Then Z n is increasing and Z n ≤ max1≤k≤n X k = X n . Clearly, X k = lim X n,k ≤ lim Z n = Z ≤ lim X n = X, n

n

n

for all k ≥ 1, that is to say X = Z . However, the random variables Z n are simple and increasing to Z , so that lim X n dP ≤ X dP = Z dP = lim Z n dP ≤ lim X n dP. n

and hence limn

n

X n dP =

n

X dP.

An important consequence of the Monotone Convergence Theorem is Fatou’s Lemma. Theorem 1.3.16 (Fatou’s Lemma.) If X n is a sequence of nonnegative random variables then lim inf X n dP ≤ lim inf X n dP, n

n

where lim infn X n = limn infm≥n X m . Proof Let lim infn X n = limn infm≥n X m = limn Z n . Then Z n increases to lim infn X n so by the Monotone Convergence Theorem (Theorem 1.3.15) lim inf X n dP = lim Z n dP = lim Z n dP = lim inf Z n dP n n n n ≤ lim inf X n dP, n

which establishes the result. Theorem 1.3.17 (Lebesgue’s Dominated Convergence Theorem.) Suppose {X n } is a sequence of random variables such that |X n | ≤ Y almost surely where Y is an integrable random variable. If X n converges to X a.s., then X n and X are integrable, converges to X dP, and |X n − X |dP → 0 as n → ∞. Proof

X n dP

Using the hypothesis and Fatou’s Lemma, X dP = lim inf X n dP ≤ lim inf X n dP n n ≤ lim sup X n dP ≤ lim sup X n dP = X dP.

Since Y is integrable and |X | ≤ Y , X is also integrable. Now, note that |X n − X | ≤ 2Y so the same argument can be used to prove the second part of the theorem. A special case of Lebesgue’s Dominated Convergence Theorem is the Bounded Convergence Theorem where Y is replaced by some positive constant.

22

Basic probability concepts

If X and Y are independent random variables with finite means, then their product is a random variable with finite mean and E[X Y ] = E[X ]E[Y ].

(1.3.2)

To see this consider first independent events A and B and their indicator functions: X = I A , Y = I B . Then E[X Y ] = E[I A I B ] = E[I A∩B ] = P(A ∩ B) = P(A)P(B). m If X = i=1 ai I Ai , Y = nj=1 b j I B j are independent, we may assume that the Ai (resp. Bi ) form a partition of and ai = a j if i = j (resp. bi = b j if i = j). Then n m E[X Y ] = ai I Ai (ω) b j I B j (ω)dP(ω) i=1

=

m n

j=1

ai b j

i=1 j=1

=

I Ai (ω)dP(ω)

X (ω)dP(ω)

I B j (ω)dP(ω)

Y (ω)dP(ω)

= E[X ]E[Y ]. When X and Y are integrable, they are limits of simple functions X n , Yn for which E[X n Yn ] = E[X n ]E[Yn ] is true by the above argument. Taking limits, as we are allowed to by Theorem 1.3.17, finishes the proof. We also have the change of variable formula in Lebesgue Integral : Theorem 1.3.18 Let X be a real random variable and B a Borel set. Then: g(x)dFX (x) = g(X (ω))dP(ω). X −1 (B)

B

Here g is a Borel function and when B = IR g(X (ω))dP(ω) = g(x)dFX (x).

IR

(If is countable, the integral is replaced by a summation E[g(X )] = g(X (ωi ))P(ωi ) = g(x)P(X −1 (x)) = g(x)FX (x). ωi

x∈IR

x∈IR

Here FX (x) = FX (x) − FX (x−)). Proof

Let B and C be two Borel sets and g(x) = IC (x). Then g(x)dFX (x) = P(X −1 (B) ∩ X −1 (C)) = g(X (ω))dP(ω). B

X −1 (B)

Hence, the result is true for nonnegative simple functions, and by the Monotone Convergence Theorem 1.3.15 it is true for all nonnegative random variables. In the general case we need only represent g as the difference of two nonnegative functions: g = g + − g − .

1.3 Random variables

23

Definition 1.3.19 Given two σ -fields F1 and F2 the product σ -field of F1 and F2 , denoted F1 ⊗ F2 , is the smallest σ -field containing all “rectangles” F1 × F2 , F1 ∈ F1 , F2 ∈ F2 . Definition 1.3.20 Let (1 , F1 , µ1 ), (2 , F2 , µ2 ) be two measure spaces. The direct product of (1 , F1 , µ1 ) and (2 , F2 , µ2 ) is defined as the measure space ( = 1 × 2 , F = F1 ⊗ F2 , µ = µ1 × µ2 ) such that: µ1 × µ2 (F1 × F2 ) = µ1 (F1 )µ2 (F2 ),

F1 ∈ F1 , F2 ∈ F2 .

The following theorem is on the reduction of a (Lebesgue) double integral to an iterated integral. Theorem 1.3.21 (Fubini Theorem). Suppose X (ω1 , ω2 ) is an F1 ⊗ F2 -measurable function which is integrable with respect to µ1 × µ2 . Then 1. X dµ1 (ω1 ) is defined, F2 -measurable and finite µ2 -a.e. 1 2. X dµ2 (ω2 ) is defined, F1 -measurable and finite µ1 -a.e. 2 3. X d(µ1 × µ2 ) = X dµ2 dµ1 = X dµ1 dµ2 . 1 ×2

Proof

1

2

2

1

See [36] page 198.

For C ∈ F, we write

X dP =

IC X dP = E[IC X ].

C

If X = IC (.) we see that

E(X ) = E[IC ] =

dP = P(C), C

and we have the following important special cases. P(X ∈ B) = E[I B (X (.))], P(X ∈ B, Y ∈ C) = E[I B (X (.))IC (Y (.))]. Remark 1.3.22 When integrating random variables we are implicitly assuming measurability of the integrand with respect to the σ -field and measure at hand. However, consider the following example. Example 1.3.23 Let be the unit interval (0, 1] and on consider the σ -field F = σ {(0, 12 ], ( 12 , 34 ], ( 34 , 1]}. Define the measure λ((0, 12 ]) = 12 ,

λ(( 12 , 34 ]) = 14 ,

λ(( 34 , 1] = 14 ,

24

Basic probability concepts

Consider the mapping X (ω) = x1 I(0, 14 ] (ω) + x2 I( 14 , 12 ] (ω) + x3 I( 12 , 34 ] (ω) + x4 I( 34 ,1] (ω). There is no natural definition of the integral of X with respect to the given measure λ on the field F, because λ is not defined on the sets (0, 14 ] and ( 14 , 12 ]. The problem here is that as far as F is concerned, the event (0, 12 ] is an indivisible atom of the measure space. That is to say, X is not F-measurable. Thus, integration theory in this context is forced to consider only those integrands X with x1 = x2 , and in this case the integral is as defined earlier for simple functions: X dλ = x1 λ((0, 12 ]) + x3 λ(( 12 , 34 ]) + x4 λ(( 34 , 1]).

Remark 1.3.24 Given any nonnegative random variable X on a probability space (, F, P) with finite mean E(X ) = 1, one can define another (probability) measure P on F by setting for F ∈ F: P(F) = X dP. F

Clearly, if P(F) = 0, then P(F) = 0 for any F ∈ F and we say that P is absolutely continuous with respect to P (P P). However, the remarkable fact that the converse is true is given by the following theorem. (See [36].) Theorem 1.3.25 (Radon–Nikodym). Let (, F) be a measurable space, µ a σ -finite measure, and µ a signed measure (i.e. µ = µ1 − µ2 , where at least one of the measures µ1 and µ2 is finite) such that for each F ∈ F, µ(F) = 0 implies µ(F) = 0. We write µ µ. Then there exists an F-measurable function with values in the extended real line [−∞, +∞], such that µ(C) = (ω)dµ(ω), C

for all C ∈ F. The function is unique up to sets of µ-measure zero; if h is another Fmeasurable function such that µ(C) = C h(ω)dµ(ω) for all C ∈ F, then µ{ω : (ω) = dµ h(ω)} = 0. If µ is a positive measure, then has its values in [0, +∞]. We write = . dµ F Remark 1.3.26 In the case of probability measures the Radon–Nikodym Theorem reads as follows. If P and P are two probability measures on (, F) such that for each B ∈ F, P(B) = 0 implies P(B) = 0 (P P), then there exists a nonnegative random variable , dP such that P(C) = C dP for all C ∈ F. We write = . dP F Taking C = we see in particular that P() = 1 = d P = E[],

so that P is a probability measure if and only if is nonnegative and E[] = 1.

1.3 Random variables

25

is called the density of P, with respect to P, or the Radon–Nikodym derivative of P, with respect to P. Example 1.3.27 Suppose (, F, P) is a probability space, where is a finite set containing N outcomes and pi = P({ωi }). p Suppose is a positive, real valued function on such that (ωi ) = i , where pi = pi P(ωi ) is some new probability we would like to assign to outcome ωi . Since 1= pi = (ωi ) pi , the expected value of under probability measure P is equal to 1 and the random variable is the Radon–Nikodym derivative of P with respect to P. If X is a random variable on then: E[X ] = X (ωi ) pi = X (ωi )(ωi ) pi = E[X ]. Hence E[X ] = E[X ]. Conversely, if pi = 0 for all i, E[X ] = E[−1 X ].

Example 1.3.28 xLet f be a continuous function defined on [0, ∞) and F(x) = x f (u)du = f (u)dλ(u), where λ is the Lebesgue measure. Consider the signed mea0

0

sure µ (i.e. µ = µ1 − µ2 , where at least one of the measures µ1 and µ2 is finite) defined on x

([0, ∞), B([0, ∞)) by setting µ([0, x]) = given by the limit: lim

h→0

f (u)du. We know that the derivative of F is 0

F(x + h) − F(x) µ([x, x + h]) dµ(x) = lim = = f (x), h→0 λ([x, x + h]) h dλ(x)

and we may think of f as the Radon–Nikodym derivative of the measure µ induced by the integral with respect to Lebesgue measure λ. Example 1.3.29 Let (, F, P) be a probability space on which are defined the random variables Y1 , Y2 , . . . , Yn . Let Fn = σ {Y1 , Y2 , . . . , Yn }. Let P be another probability measure on F. Suppose that under P and P the random vector {Y1 , Y2 , . . . , Yn } has densities f n (.) and f n (.) respectively, with respect to n-dimensional Lebesgue measure. Then the Radon– Nikodym derivative dP f n (Y1 , Y2 , . . . , Yn ) = dP Fn f n (Y1 , Y2 , . . . , Yn )

(1.3.3)

is the likelihood ratio of the two probability measures in the presence of a sample of observations {Y1 , Y2 , . . . , Yn }. The higher (1.3.3) is the stronger is the evidence against f n (.). Now we state a few useful and standard theorems in the theory of integration.

26

Basic probability concepts

Definition 1.3.30 Suppose X is a square integrable random variable, that is E[X 2 ] < ∞, then Var(X ) = E[(X − E[X ])2 ] = E[X 2 ] − E[X ]2 , √ is called the variance of X . The square root of the variance Var(X ) = σ X is called the standard deviation of the random variable X . Definition 1.3.31 If X and Y are random variables with finite means µ X and µY and finite variances σ X2 and σY2 , the covariance of X , Y is given by Cov(X, Y ) = E[(X − µ X )(Y − µY )] = E[X Y ] − µ X µY , and the correlation coefficient of X , Y is given by ρ(X, Y ) = where X =

Cov(X, Y ) = Cov(X , Y ), σ X σY

X − µx Y − µY and Y = are the normalized versions of X and Y . σX σY

In view of (1.3.2) independence of X and Y implies Cov(X, Y ) = 0 = ρ(X, Y ). However, the converse is not true. In fact Cov(X, Y ) can vanish even if Y is a (nonlinear) function of X . The covariance, or the correlation coefficient, are related only with the linear dependence of X and Y . Lemma 1.3.32 (Cauchy–Schwarz Inequality). If X and Y are square integrable random variables, (E[X Y ])2 ≤ E[X 2 ]E[Y 2 ]. Proof Let R(t) = E[X + tY ]2 = E[X 2 ] + t 2 E[Y 2 ] + 2t E[X Y ] ≥ 0. Then for the quadratic function of t, R(t), to be nonnegative its discriminant must satisfy (E[X Y ])2 − E[X 2 ]E[Y 2 ] ≤ 0, which is the result. From the Cauchy–Schwarz Inequality 1.3.32 the integrability of X Y is guaranteed by the square integrability of X and Y . Another useful inequality is Lemma 1.3.33 (Chebyshev–Markov inequality). Let (, F, µ) be a measurable space. For every measurable real valued function on , and every pair of real numbers p > 0, α > 0, 1 µ({ω : | f (ω)| ≥ α}) ≤ p | f (ω)| p dµ(ω). α Proof

Let Fα = {ω : | f (ω)| ≥ α}. Then p p p | f (ω)| dµ(ω) ≥ | f (ω)| dµ(ω) ≥ α

Fα

dµ = α p µ(Fα ). Fα

In addition to almost sure convergence, which was defined in Example 1.3.11, we have the following types of convergence.

1.3 Random variables

27

First recall that L p (, F, P), p ≥ 1, is the space of random variables with finite absolute p-th moments, that is, E[|X | p ] < ∞. Lp

{X k } converges to X in L p (X k → X ), (0 < p < ∞), if E[|X k | p ] < ∞,

E[|X | p ] < ∞,

and E[|X k − X | p ] → 0 (k → ∞). P {X k } converges to X in probability or in measure (X k → X ) if for each > 0 the sequence P[|X k − X | > ] → 0

(k → ∞). D

Let Fn (x) = P[X n ≤ x], F(x) = P[X ≤ x]. X n converges in distribution to X (X n → X ) if g(x)dFn (x) → g(x)dF(x), IR

IR

for every real valued, continuous bounded function g defined on IR. A necessary and sufficient condition for that is: Fn (x) → F(x), at every continuity point x of F [7]. These convergence concepts are in the following relationship to each other. a.s.

P

D

(X k → X ) ⇒ (X k → X ) ⇒ (X n → X ). A useful concept is the uniform integrability of a family of random variables which permits the interchange of limits and expectations. Definition 1.3.34 A sequence {X n } of random variables is said to be uniformly integrable if sup E[|X n |I{|X n |>A} ] → 0, n

(A → ∞).

(1.3.4)

A family {X t }, t ≥ 0 of random variables is said to be uniformly integrable if sup E[|X t |I{|X t |>A} ] → 0, t

(A → ∞).

(1.3.5)

Example 1.3.35 If L is bounded in L p (, F, P) for some p > 1, then L is uniformly integrable. Proof Choose A so large that E[|X | p ] < A for all X ∈ L. For fixed X ∈ L, let Y = Y p−1 |X |I{|X |>K } . Then Y (ω) ≥ K I{|X |>K } > 0 for all ω ∈ . Since p > 1, p−1 ≥ I{|X |>K } , K and K 1− p Y p =

Y p−1 Y ≥ Y I{|X |>K } = Y. K p−1

Thus E[Y ] ≤ K 1− p E[Y p ] ≤ K 1− p E[|X | p ] ≤ K 1− p A, which goes to 0 when K → ∞, from which the result follows.

28

Basic probability concepts

The following result is a somewhat stronger version of Fatou’s Lemma 1.3.16. Theorem 1.3.36 Let {X n } be a uniformly integrable family of random variables. Then E[lim inf X n ] ≤ lim inf E[X n ]. Proof

The proof is left as an exercise

Corollary 1.3.37 Let {X n } be a uniformly integrable family of random variables such that X n → X (a.s.), then E|X n | < ∞, E(X n ) → E(X ), and E|X n − X | → 0. The following deep result (Shiryayev [36]) gives a necessary and sufficient condition for taking limits under the expectation sign. Theorem 1.3.38 Let 0 ≤ X n → X and E(X n ) < ∞. Then E(X n ) → E(X ) ⇐⇒ the family {X n } is uniformly integrable. Proof The sufficiency part follows from Theorem 1.3.36. To prove the necessity, note that if x is not a point of positive probability for the distribution of the random variable X then X n I{X n <x} → X I{X <x} and the family {X n I{X n <x} } is uniformly integrable. Hence, by the sufficiency part of the theorem, we have E[X n I{X n <x} ] → E[X I{X <x} ], and E[X n I{X n ≥x} ] → E[X I{X ≥x} ]. So given > 0 there exists N such that E[X n I{X n ≥x} ] ≤ E[X I{X ≥x} ] + 2 and also, since E[X ] is finite, there exists, for the same , an x0 (which is not a point of discontinuity of the distribution of X ) such that E[X I{X ≥x0 } ] < 2 . Then if n is large enough, E[X n I{X n ≥x0 } ] ≤ E[X I{X ≥x0 } ] +

≤ . 2

Since E[X n ] is finite for all n, choose x1 so large that E[X n I{X n ≥x1 } ] < for all n ≤ N . Then we can conclude that supn E[X n I{X n ≥x1 } ] < , which shows the uniform integrability of {X n }. Remark 1.3.39 If we delete n in (1.3.4), E[|X |I{|X |>A} ] → 0,

(A → ∞),

is the requirement for X to be integrable.

1.4 Conditional expectations We often have insights into the occurrence of events in gained through observations of realizations of related random variables. Conditional expectations incorporate this information by conditioning on events or random variables, i.e. on the σ -fields generated by these random variables.

1.4 Conditional expectations

29

Let X = i xi I Ai be a simple random variable on a probability space (, F, P). What is the expected value of X given some event B having positive probability P(B)? Under the posterior probability measure P(. | B) this is E[X | B] = xi P(X = xi | B) 1 1 = xi P({X = xi } ∩ B) = E[X I B ]. P(B) P(B) E[X I B ] is the probability weighted sum of the values taken on by X in the event B. We divide the weighted sum by P(B) to obtain the weighted average. We could write as a definition: E[X | B] =

E[X I B ] E[X I B ] = . E[I B ] P(B)

Let X = IC and Y = I B . The σ -field σ (Y ) is generated by the atoms B and B. To see this, consider any Borel set B:   ∅ if {0, 1} ∈ / B,     B if 0 ∈ B, 1 ∈ / B, Y −1 {B} =  B if 1 ∈ B, 0 ∈ / B,     if {0, 1} ∈ B. Hence σ (Y ) = {, B, B, ∅}. Define E[X | Y ] = E[X | σ (Y )] = E[X | atoms of σ (Y )] = E[X | B, B]. Or,

E[IC | B, B](ω) = P(C | B, B)(ω) = P(C | B)I B (ω) + P(C | B)I B (ω). Hence E[X | Y ] is a function constant on the atoms of σ (Y ). That is E[X | Y ] is σ (Y )-measurable. Since E[X | Y ] is a random variable its mean is: E[E[X | Y ]] = E[P(C | B)I B (ω) + P(C | B)I B (ω)] = P(C ∩ B) + P(C ∩ B) = P(C) = E[X ]. If X is an integrable random variable and Y = i yi I Bi is a simple random variable, we write E[X I B ] i E[X | Y ] = E[X | σ (Y )] = I B (ω). P(Bi ) i Hence E[X | Y ] is σ (Y )-measurable and E[E[X | Y ]] =

E[X I Bi ] = E[X ].

The expected value of E[X | Y ] is the same as the expected value of X .

30

Basic probability concepts

Let X ∈ L 1 (E|X | < ∞) be a (nonnegative for simplicity) random variable on a probability space (, F, P) and G be a sub-σ -field of F. The probability space (, G, P) is a coarsening of the original one and X is, in general, not measurable with respect to G. We seek now a G-measurable random variable, which we denote temporarily by X G , that assumes, on average, the same values as X . That is, we seek an integrable random variable X G such that X G is G-measurable and X G dP = X dP, for all A ∈ G. A

A

Now the set function Q(A) = A X dP is a measure absolutely continuous with respect to P, so that the Radon–Nikodym Theorem 1.3.25 guarantees the existence of a G-measurable random variable suggestively denoted by E(X | G), which is uniquely determined except on an event of probability zero, such that X dP = E[X | G]dP, A

A

for all A ∈ G. We say that X G is a version of E(X | G). For a general integrable random variable X we define E[X | G] as E[X + | G] − E[X − | G]. Remark 1.4.1 Let (, F, P) be given, and suppose X is an L 2 random variable (measurable with respect to F). Let G be a sub-σ -algebra of F, that is, G is less informative than F. A natural question is: by observing only G how much can we learn about X ? Or, among all random variables which are G-measurable which one gives us the best information (in the mean square sense) about the random variable X ? It turned out that E[X | G] is the closest (G-measurable) random variable to X . This is seen by considering, for any G-measurable random variable, Z = X − E[X | G]. Then: E[(Z − Y )2 ] = E[(X − E[X | G])2 + Y 2 + 2Y (X − E[X | G])] = E[E[(X − E[X | G])2 | G]] + E[Y 2 ].

This is minimized when Y = 0 a.s.

Example 1.4.2 Let = (0, 1], X (ω) = ω, P be Lebesgue measure and consider the σ -field

G = σ {(0, 14 ], ( 14 , 12 ], ( 12 , 34 ], ( 34 , 1]} = σ {A1 , A2 , A3 , A4 }. E[X | G] must be constant on the atoms of G so that E[X | G](ω) = xi I Ai (ω). E[X I Ai ] . P(Ai ) 1 Clearly P(Ai ) = and E[X I Ai ] = xdx. 4 Ai

where xi =

1.4 Conditional expectations

31

Hence E[X | G](ω) =

1 2 5 7 I A1 (ω) + I A2 (ω) + I A3 (ω) + I A4 (ω), 8 8 8 8

which is a G-measurable random variable.

Example 1.4.3 Let X 1 , X 2 and X 3 be three independent, identically distributed (i.i.d.) random variables such that P(X i = 1) = p = 1 − P(X i = 0) = 1 − q. Let S = X 1 + X 2 + X 3 . Suppose that we observe X 1 and X 2 and we wish to find the (conditional) probability that S = 2 given X 1 and X 2 . The σ -field generated by the (vector) random variable (X 1 , X 2 ) is generated by the atoms {Ai j }, i, j = 0, 1, where Ai j = [ω : X 1 (ω) = i, X 2 (ω) = j]. P(S = 2 | X 1 , X 2 )(ω) = P(S = 2 | σ {X 1 , X 2 })(ω) = P(S = 2 | Ai j )I Ai j (ω) i, j=0,1

= =

P(S = 2 ∩ Ai j ) I Ai j (ω) P(Ai j ) i, j=0,1 P(i + j + X 3 = 2)P(Ai j ) i, j=0,1

=

P(Ai j )

I Ai j (ω)

P(X 3 = 2 − i − j)I Ai j (ω)

i, j=0,1

= P(X 3 = 0)I A11 + P(X 3 = 1)I{A10 ∪A01 } = q I A11 (ω) + p I{A10 ∪A01 } (ω). The expected value of the (σ {X 1 , X 2 }-measurable) random variable P(S = 2 | X 1 , X 2 ) is E[q I A11 (ω) + p I{A10 ∪A01 } (ω)] = q P(A11 ) + p[P(A01 ) + P(A01 )] = P(S = 2). Example 1.4.4 Let f ∈ L 1 [0, 1], i.e. the Lebesgue integral [0,1) | f (x)|dx exists and is j j +1 finite. Let Fn = σ {[ n , n ), j = 0, . . . , 2n − 1}. Then 2 2 −n n 2 −1 ( j+1)2 f (x)dx j2−n E[ f | Fn ](ω) = I[ j2−n ,( j+1)2−n ) (ω). −n 2 j=0 Theorem 1.4.5 If X is real F-measurable random variable and if A ∈ F, then X = 0 a.s.

X dP = 0 for all A

32

Basic probability concepts

Suppose X ≥ 0 and

Proof

A

X dP = 0 for all A ∈ F. Write An = {ω : X (ω) ≥ n1 }. X dP ≥ An

1 P(An ) ≥ 0. n

X dP = 0 so P(An ) = 0 for all n. Therefore,

But An

P({X > 0} = P(

An ) ≤

P(An ) = 0.

For a general random variable X , recall that X = X + − X − , where both X + and X − are nonnegative. The following is a list of classical results on conditional expectation: 1. E(X | A) is unique (a.s.) Proof Let X 1 = E(X | A) and X 2 be an A-measurable random variable such that X 2 dP = X dP, A

A

for all A ∈ A and let 0 = {ω : X 1 > X 2 } ∈ A. Hence X 1 dP = E(X | A) = X dP, 0

and

0

X 2 dP =

0

so that

X dP, 0

0

or

0

X 1 dP =

0

X 2 dP,

0

(X 1 − X 2 )dP = 0.

Using Theorem 1.4.5 X 1 = X 2 a.s. 2. If A1 and A2 are two sub-σ -fields of F such that A1 ⊂ A2 , then E(E(X | A1 ) | A2 ) = E(E(X | A2 ) | A1 ) = E(X | A1 ).

(1.4.1)

Proof Clearly E(E(X | A1 ) | A2 ) = E(E(X | A2 ) | A1 ). Now E(E(X | A2 ) | A1 ) is A1 -measurable and for A ∈ A1 , E(E(X | A2 ) | A1 ))dP = E(X | A2 )dP A A = X dP = E(X | A1 )dP. A

Hence E(E(X | A2 ) | A1 ) = E(X | A1 ) a.s.

A

1.4 Conditional expectations

33

3. If X , Y , X Y ∈ L 1 and Y is A-measurable then E[X Y | A] = Y E[X | A].

(1.4.2)

Proof It is sufficient to prove the result when X and Y are positive. If Y = I A , A ∈ A, then for every B ∈ A X Y dP = X dP = E[X | A]dP B A∩B A∩B = I A E[X | A]dP = Y E[X | A]dP. B

B

That is, E[X Y | A] = Y E[X | A], if Y is an indicator function. It follows that the result is true for simple functions of sets in A and therefore for a limit of bounded increasing sequence of such functions converging to Y . 4. If X is independent of the σ -field A, then E(X | A) = E(X ). Proof

(1.4.3)

First note that E(X ) is A-measurable. Now, for A ∈ A we have to show that E(X | A)dP = E(X )dP. A

A

However, the left hand side is equal to E[I A X ] and the right hand side is equal to E[I A ]E[X ], and their equality follows from the definition of independence of random variables. 5. Conditional expectation is a projection operation, and so E[E[X | A] | A] = E[X | A].

(1.4.4)

Example 1.4.6 Consider the joint distribution function F(x1 , x2 ) of two real valued random variables X 1 , X 2 and the probability measure P on the two-dimensional Borel sets generated by the distribution function F(x1 , x2 ). Suppose that P is absolutely continuous with respect to two-dimensional Lebesgue measure. Then, by the Radon–Nikodym theorem, there exists a nonnegative density function f (x1 , x2 ) such that for any Borel set B: P(B) = I B (x1 , x2 ) f (x1 , x2 )dx1 dx2 . If f (x1 , x2 ) > 0 everywhere,

P(B | X 2 = x2 ) = from which we can deduce that +∞ probability measure P(. | X 2 =

−∞ x2 ).

{x1 :(x1 ,x2 )∈B} f (x 1 , x 2 )dx 1 , +∞ −∞ f (x 1 , x 2 )dx 1

f (x1 , x2 ) f (x1 , x2 )dx1

is the density function of the conditional

34

Basic probability concepts

Example 1.4.7 Let X 1 and X 2 be two random variables with a normal joint distribution. Then their probability density function has the form 2 1 1 2 ¯ ¯ ¯ ¯ φ(x1 , x2 ) = exp − − 2ρ x + x x , x 1 2 2 2(1 − ρ 2 ) 1 2πσ1 σ2 1 − ρ 2 xi − µi , i = 1, 2. The conditional density of X 1 given X 2 = σi σ1 x2 is a normal density with mean µ1 + ρ (x2 − µ2 ) and variance Var(X 1 | X 2 = x2 ) = σ2 (1 − ρ 2 )σ12 < σ12 = Var(X 1 ). To see this, recall that, by definition, the conditional density of X 1 given X 2 is given by where 0 ≤ ρ < 1 and x¯i =

φ(x1 | x2 ) =

φ(x1 , x2 ) IR φ(x 1 , x 2 )dx 1

2 1 2 ¯ ¯ ¯ ¯ x − 2ρ x + x x 1 2 2 2(1 − ρ 2 ) 1 2πσ1 σ2 1 − ρ 2 = 1 1 exp − x¯22 2π σ2 2 2 1 1 2 2 ¯ ¯ ¯ ¯ = exp − − 2ρ x + ρ x x x 1 2 2 2(1 − ρ 2 ) 1 2πσ1 1 − ρ 2 1 1 2 = exp − [x¯1 − ρ x¯2 ] 2(1 − ρ 2 ) 2πσ1 1 − ρ 2 1

=

exp −

1 2πσ1 1 − ρ 2

2 1 σ1 , × exp − 2 x1 − (µ1 + ρ (x2 − µ2 )) σ2 2σ1 (1 − ρ 2 ) and the result follows. Thus by conditioning on X 2 we have gained some statistical information about X 1 which resulted in a reduction in the variability of X 1 . 1.5 Problems

1. Let {Fi }i∈I be a family of σ -fields on . Prove that i∈I Fi is a σ -field. 2. Let A and B be two events. Express by means of the indicator functions of A and B I A∪B ,

I A∩B ,

I A−B ,

I B−A ,

I(A−B)∪(B−A) ,

¯ where A − B = A ∩ B.

1 3. Let = IR and define the sequences C2n = [−1, 2 + ) and C2n+1 = [−2 − 2n 1 , 1). Show that 2n + 1 lim sup Cn = [−2, −2],

lim inf Cn = [−1, 1].

1.5 Problems

35

1 1 1 5 4. Let = (ω1 , ω2 , ω3 , ω4 ) and P(ω1 ) = , P(ω2 ) = , P(ω3 ) = , and P(ω4 ) = . 12 6 3 12 Let {ω1 , ω3 } if n is odd, An = {ω2 , ω4 } if n is even. 5. 6. 7. 8.

9. 10. 11. 12.

13.

Find P(lim sup An ), P(lim inf An ), lim sup P(An ), and lim inf P(An ) and compare. Give a proof to Theorem 1.3.36. Show that a σ -field is either finite or uncountably infinite. Show that if X is a random variable, then σ {|X |} ⊆ σ {X }. Show that the set B0 of countable unions of open intervals in IR is not closed under complementation and hence is not a σ -field. (Hint: enumerate the rational numbers and choose, for each one of them, an open interval containing it. Now show that the complement of the union of all these open intervals is not in B0 .) Show that the class of finite unions of intervals of the form (−∞, a], (b, c], and (d, ∞) is a field but not a σ -field. Show that a sequence of random variables {X n } converges (a.s.) to X if and only if ∀ > 0 limm→∞ P[|X n − X | ≤ ∀ n ≥ m] = 1. Show that if {X k } converges (a.s.) to X then {X k } converges to X in probability but the converse is false. Consider the probability space (IN, F, P), where IN is the set of natural numbers, F 1 is the collection of all the subsets of IN and P({k}) = k . Let X k (ω) = I[ω=k] . Discuss 2 the convergence (a.s.) and in probability of X k and show that on this particular space they are equivalent. Let {X n } be a sequence of random variables with P[X n = 2n ] = P[X n = −2n ] =

1 , 2n

1 . 2n−1 Show that {X n } converges (a.s.) to 0 but E|X n | p does not converge to 0. 14. Let {X n } be a sequence of random variables with P[X n = 0] = 1 −

1 , n 1 P[X n = 0] = 1 − . n Show that {X n } does not converge (a.s.) to 0 but E|X n | p converges to 0. 15. Suppose Q is another probability measure on (, F) such that P(A) = 0 implies Q(A) = 0 (Q P). Show that P-a.s. convergence implies Q-a.s. convergence. 16. Prove that if F1 and F2 are independent sub-σ -fields and F3 is coarser than F1 , then F3 and F2 are independent. 1 17. Let = (ω1 , ω2 , ω3 , ω4 , ω5 , ω6 ), P(ωi ) = pi = and the sub-σ -fields 6 F1 = σ {{ω1 , ω2 }, {ω3 , ω4 , ω5 , ω6 }}, p

P[X n = n 1/2 ] =

F2 = σ {{ω1 , ω2 }, {ω3 , ω4 }, {ω5 , ω6 }}.

36

Basic probability concepts

Show that F1 and F2 are not independent. What can be said about the sub-σ -fields F3 = σ {{ω1 , ω2 }, {ω3 }, {ω4 , ω5 , ω6 }}, and F5 = σ {{ω1 , ω4 }, {ω2 , ω5 }, {ω3 , ω6 }}? 18. Let = {(i, j) : i, j = 1, . . . , 6} and P({i, j}) = 1/36. Define the quantity X (ω) =

∞

k I{(i, j):i+ j=k} .

k=0

Is X a random variable? Find PX (x) = P(X = x), calculate E[X ] and describe σ (X ), the σ -field generated by X . 19. For the function X defined in the previous exercise, describe the random variable P(A | X ), where A = {(i, j) : i odd, j even} and find its expected value E[P(A | X )]. 20. Let be the unit interval (0, 1] and on it be given the following σ -fields: F1 = σ {(0, 12 ], ( 12 , 34 ], ( 34 , 1]}, F2 = σ {(0, 14 ], ( 14 , 12 ], ( 12 , 34 ], ( 34 , 1]}, F3 = σ {(0, 18 ], ( 18 , 28 ], . . . , ( 78 , 1]}. Consider the mapping X (ω) = x1 I

(0,

1 (ω) ] 4

+ x2 I

(

1 1 (ω) , ] 4 2

+ x3 I

(

1 3 (ω) , ] 2 4

+ x4 I

(

(ω). 3 , 1] 4

Find E[X | F1 ], E[X | F2 ], and E[X | F3 ]. 21. Let be the unit interval and ((0, 1], P) be the Lebesgue-measurable space and consider the following sub-σ -fields: F1 = σ {(0, 12 ], ( 12 , 34 ], ( 34 , 1]}, F2 = σ {(0, 14 ], ( 14 , 12 ], ( 12 , 34 ], ( 34 , 1]}. Consider the mapping X (ω) = ω. Find E[E[X | F1 ] | F2 ], E[E[X | F2 ] | F1 ] and compare. 22. Consider the probability measure P on the real line such that: P(0) = p,

P((0, 1)) = q,

p + q = 1,

and the random variables defined on = IR, X 1 (x) = 1 + x, X 2 (x) = 0I{x≤0} + (1 + x)I{0<x<1} + 2I{x≥1} , +∞ X 3 (x) = (1 + x + k)I{k≤x≤k+1} . k=−∞

Is there any P-a.s. equality between X 1 , X 2 and X 3 ?

1.5 Problems

37

23. Let X 1 , X 2 and X 3 be three independent, identically distributed (i.i.d.) random variables such that P(X i = 1) = p = 1 − P(X i = 0) = 1 − q. Find P(X 1 + X 2 + X 3 = s | X 1 , X 2 ). 24. Let X 1 , X 2 and X 3 be three random variables with multinomial distribution with parameters p1 , p2 , p3 , n, that is P(X 1 = n 1 , X 2 = n 2 , X 3 = n 3 ) =

n! p1n 1 p2n 2 p3n 3 , n 1 !n 2 !n 3 !

where n 1 , n 2 and n 3 are nonnegative integers such that n 1 + n 2 + n 3 = n. Show that if n is a random variable with Poisson distribution with parameter λ then the three random X 1 , X 2 X 3 become mutually independent with Poisson distributions. 25. On = [0, 1] and P being Lebesgue measure show that X = x1 I(0, 12 ] + x2 I( 12 ,1] and Y = y1 I(0, 14 ]∪( 34 ,1] + y2 I( 14 ], 34 ] are independent. 26. Show that (see Example 1.4.4) E[ f | Fn ] =

n 2 −1

j=0

( j+1)2−n j2−n

f (x)dx

2−n

I[ j2−n ,( j+1)2−n )

converges a.s. and in L 1 to f as n → ∞. In particular, if f = I E for some Borel set E, then n 2 −1

j=0

λ(E ∩ [ j2−n , ( j + 1)2−n )) a.s. I[ j2−n ,( j+1)2−n ) (x) → I E (x), 2−n

x ∈ [0, 1]. Here λ(.) is the Lebesgue measure.

2

Stochastic processes

2.1 Definitions and general results A stochastic process is a mathematical model for any phenomenon evolving or varying in time (or over some index set), subject to random influences. Examples include the price of a commodity observed through time, the fluctuating water level behind a dam or the distribution of shades in a noisy image observed over a region of IR2 . Suppose (, F) is a measurable space. We shall define a stochastic process to be a mapping X (index) (ω) from × {index space} into a second measurable space (E, E), called the state space, or the range space. Alternatively, we can consider a stochastic process as a family {X t } t ∈ {index space} of random variables all defined on a measurable space (, F). For a fixed simple outcome ω, X (.) (ω) is a function describing one possible trajectory, or sample path, followed by the process. If the time index is frozen at t, say, then we have a random variable X t (.), i.e. an F-measurable function of ω. When the time index t is continuous, measurability, continuity, etc. in t are considered. A continuous-time stochastic process {X t } is said to have independent increments if for all t0 < t1 < t2 < · · · < tn , the random variables X t1 − X t0 , X t2 − X t1 , . . . , X tn − X tn−1 are independent. If for all s, X t+s − X t has the same distribution for all t, {X t } is said to possess stationary increments. Sometimes, a stochastic process is interpreted as just a single random variable taking values in a space of functions, that is, with each ω is associated a function. In analogy with real random variables, the state space is then endowed with a Borel σ -field (generated by the open sets of an underlying topology). Example 2.1.1 Let = {ω1 , ω2 , . . . }, and let the time index n be finite 0 ≤ n ≤ N . A stochastic process X in this setting is a two-dimensional array or matrix such that:

X=

X 1 (ω1 )

X 1 (ω2 ) . . .

X 2 (ω1 )

X 2 (ω2 ) . . .

...

...

...

X N (ω1 ) X N (ω2 ) . . .

2.1 Definitions and general results

39

Each row represents a random variable and each column is a sample path or a realization of the stochastic process X . If the time index is unbounded, each sample path is given by an infinite sequence. Example 2.1.2 Let N = 4 in the previous example and suppose that X is given by the following array. 2

3

5

7 11 √ −1 1 5.7 2 3

3

2.3

1

6

83 19

11 7 70

3

2

−5

2

21

5

1

0

1

2

3

3

2

The sample space of {X n } is IR4 and the stochastic process can be thought of as a mapping (in fact a random variable)

ωi → X (ωi ) = (X 1 (ωi ), . . . , X 4 (ωi )) = (x1i , x2i , x3i , x4i ) = x i ∈ IR4 . The random variable X induces a probability measure PX on the Borel σ -field B(IR4 ) in the usual way, i.e., for any B ∈ B(IR4 ),

PX (B) = P[ω : X (ω) ∈ B] = P(X −1 (B)). For instance, B1 = {x ∈ IR4 : 3 ≤ x1 ≤ 5, 2 ≤ x2 ≤ 7} contains a single trajectory (column 6 in the table) so that PX (B1 ) = P(ω6 ). B2 = {x ∈ IR4 : max xn ≤ 7} 1≤n≤4

contains four trajectories (column 2, column 3, column 4 and column 6 in the table) so that PX (B2 ) = P(ω2 , ω3 , ω4 , ω6 ). Example 2.1.3 Let = {ω1 , ω2 , . . . } and P be a probability measure on (, F). Suppose that the time index set is the set of positive integers. A real valued stochastic process X in this setting is a two-dimensional infinite array such that: X 1 (ω1 ) X 1 (ω2 ) . . . X = X 2 (ω1 ) X 2 (ω2 ) . . . . ...

...

...

Here the sample space is IR∞ = {(x1 , x2 , . . . ) ∈ IR × IR × . . . }.

40

Stochastic processes

Note that the Borel σ -field B(IR∞ ) coincides with the smallest σ -field containing the open |xk1 − xk2 | sets in IR∞ in the metric ρ∞ (x 1 , x 2 ) = k 2−k ([36]). 1 + |xk1 − xk2 | Now think of the stochastic process X as an IR∞ valued random variable

ωi → X (ωi ) = (X 1 (ωi ), X 2 (ωi ), . . . ) = (x1i , x2i , . . . ) = x i ∈ IR∞ . The random variable X induces a probability measure PX on the σ -field B(IR∞ ). For instance, if A = {x ∈ IR∞ : sup xn > a} ∈ B(IR∞ ), then the set A consists of all sequences with some of their entries larger than a and PX (A) = P(ω : X (ω) ∈ A). Example 2.1.4 (The Single Jump Process) Consider a stochastic process {X t }, t ≥ 0, which takes its values in some measurable space {E, E} and which remains at its initial value z 0 ∈ E until a random time T , when it jumps to a random position Z . A sample path of the process is z 0 if t < T (ω), X t (ω) = Z (ω) if t ≥ T (ω). The underlying probability space can be taken to be = [0, ∞] × E, with the σ -field B × E. A probability measure P is given on (, B × E) and we suppose P([∞, 0] × {z 0 }) = 0 = P({0} × E), so that the probabilities of a zero jump and a jump at time zero are zero. Write Ft = P[T > t, Z ∈ E], c = inf{t : Ft = 0}. Ft is right-continuous and monotonic decreasing, so there are only countably many points of discontinuity {u} = D where Fu = Fu − Fu− = 0. At points in D, there are positive probabilities that X jumps. Note that the more probability mass there is at a point u, the more predictable is the jump at that point. Formally define a function by setting: d(t) = P(T ∈]t − dt, t], Z ∈ E | T > t − dt). Then is the probability that the jump occurs in the interval ]t − dt, t], given it has not

2.1 Definitions and general results

41

happened at t − dt. Roughly speaking we have d(t) = P(T ∈]t − dt, t] | T > t − dt) =

P(T ∈]t − dt, t]) Ft−dt

=

1 − Ft − (1 − Ft−dt ) Ft−dt

=

−(Ft − Ft−dt ) Ft−dt

=

−(Ft − Ft− ) Ft−

=

−dFt . Ft−

Define

(t) = − ]0,t[

dFs . Fs−

(2.1.1)

For instance, if T is exponentially distributed with parameter θ we have d exp(−θs) (t) = − = θ. ]0,t[ exp(−θ s) Write FtA = P[T > t, Z ∈ A], then clearly the measure on (IR+ , B(IR+ )) given by FtA is absolutely continuous with respect to that given by Ft , so that there is a Radon–Nikodym derivative λ(A, s) such that A A Ft − F0 = λ(A, s)dFs . (2.1.2) ]0,t[

The pair (λ, ) is the L´evy system for the jump process. Roughly, λ(dx, s) is the conditional distribution of the jump position Z given the jump happens at time s. Let X t be a continuous time stochastic process. That is, the time index belongs to some interval of the real line, say, t ∈ [0, ∞). If we are interested in the behavior of X t during an interval of time [t0 , t1 ] it is necessary to consider simultaneously an uncountable family of X t s {X t , t0 ≤ t ≤ t1 }. This results in a technical problem because of the uncountability of the index parameter t. Recall that σ -fields are, by definition, closed under countable operations only and that statements like {X t ≥ x, t0 ≤ t ≤ t1 } = t0 ≤t≤t1 {X t ≥ x} are not events! However, for most practical situations this difficulty is bypassed by replacing uncountable index sets by countable dense subsets without losing any significant information. In general, these arguments are based on the separability of a continuous time stochastic process. This is possible, for example, if the stochastic process X is almost surely continuous (see Definition 2.1.6).

42

Stochastic processes

Let X = {X t : t ≥ 0} and Y = {Yt : t ≥ 0} be two stochastic processes defined on the same probability space (, F, P). Because of the presence of ω, the functions X t (ω) and Yt (ω) can be compared in different ways. Definition 2.1.5 1. X and Y are called indistinguishable if P({ω : X t (ω) = Yt (ω), t ≥ 0}) = 1. 2. Y is a modification of X if for every t ≥ 0, we have P({ω : X t (ω) = Yt (ω)}) = 1. 3. X and Y have the same law or probability distribution if and only if all their finite dimensional probability distributions coincide, that is, if and only if for any sequence of times 0 ≤ t1 ≤ · · · ≤ tn the joint probability distributions of (X t1 , . . . , X tn ) and (Yt1 , . . . , Ytn ) coincide. Note that the first property is much stronger than the other two. The null sets in the second and third properties may depend on t. Recall that there are different definitions of limit for sequences of random variables. So to each definition corresponds a type of continuity of a real valued time index process. Definition 2.1.6 1. {X t } is continuous in probability if for every t and > 0, lim P[|X t+h − X t | > ] = 0.

h→0

2. {X t } is continuous in L p if for every t, lim E[|X t+h − X t | p ] = 0.

h→0

3. {X t } is continuous almost surely (a.s.) if for every t, P[lim X t+h = X t ] = 1. h→0

4. {X t } is right continuous if for almost every ω the map t → X t (ω) is right continuous. That is, lim X s = X t a.s. s↓t

If in addition lim X s = X t− exists a.s., s↑t

{X t } is right continuous with left limits (rcll or corlol or c`adl`ag). However, none of the above notions is strong enough to differentiate, for instance, between a process for which almost all sample paths are continuous for every t, and a process for which almost all sample paths have a countable number of discontinuities, when the two processes have the same finite dimensional distributions. A much stronger criterion for continuity is sample paths continuity which requires continuity for all ts simultaneously! In other words,

2.1 Definitions and general results

43

for almost all ω the function X (.) (ω) is continuous in the usual sense. Unfortunately, the definition of a stochastic process in terms of its finite dimensional distributions does not help here since we are faced with whole intervals containing uncountable numbers of ts. Fortunately, for most useful processes in applications, continuous versions (sample path continuous), or right-continuous versions, can be constructed. If a stochastic process with index set [0, ∞) is continuous its sample space can be identified with C[0, ∞), the space of all real valued continuous functions. A metric on this space is sup0≤t≤k |x(t) − y(t)| ρ(x, y) = 2−k , 1 + sup0≤t≤k |x(t) − y(t)| k for x, y ∈ C[0, ∞). (See [36].) Let B(C) be the smallest σ -field containing the open sets of the topology induced by ρ on C[0, ∞), the Borel σ -field. Then ([36]) the same σ -field B(C) is generated by the cylinder sets of C[0, ∞) which have the form {x ∈ C[0, ∞) : xt1 ∈ I1 , xt2 ∈ I2 , . . . , xtn ∈ In }, where each Ii is an interval of the form (ai , bi ]. In other words, a cylinder set is a set of functions with restrictions put on a finite number of coordinates, or, in the language of Shiryayev ([36]), it is the set of functions that, at times t1 , . . . , tn , “get through the windows” I1 , . . . , In and at other times have arbitrary values. An example of a Borel set from B(C) is A = {x : sup xt > a, t ≥ 0}. Remark 2.1.7 Note that the set given by A depends on the behavior of functions on an uncountable set of points and would not be in the σ -field B(C) if C[0, ∞) were replaced by the much larger space IR[0,∞) (see Theorem 3, page 146 of [36]). In this latter space every Borel set is determined by restrictions imposed on the functions x, on an at most countable set of points t1 , t2 , . . . . Suppose the index parameter t is either a nonnegative integer or a nonnegative real number. The σ -fields FtX = σ {X u , u ≤ t} are the smallest ones with respect to which the random variables X u , u ≤ t, are measurable, and are naturally associated with any stochastic process {X t }. FtX is sometimes called the natural filtration associated with the stochastic process {X t }. The σ -field FtX contains all the events which by time t are known to have occurred or not by observing X up to time t. Often it is convenient to consider larger σ -fields than FtX . For instance, {Ft = σ {X u , Yu ; u ≤ t} where {Yt } is another stochastic process. Definition 2.1.8 The stochastic process X is adapted to the filtration {Ft , t ≥ 0} if for each t ≥ 0 X t is a Ft -measurable random variable. Clearly X is adapted to FtX . A function f is FtX -measurable if the value of f (ω) can be decided by observing the history of X up to time t (and nowhere else). This follows from the multivariate version of Theorem 1.3.6. For instance, f (ω) = X t 2 (ω) is FtX -measurable for 0 < t < 1 but it is not FtX -measurable for t ≥ 1.

44

Stochastic processes

As a function of two variables (t, ω), a stochastic process should be measurable with respect to both variables to allow a minimum of “good behavior”. Definition 2.1.9 A stochastic process {X t } with t ∈ [0, ∞) on a probability space {, F, P} is measurable if, for all Borel sets B in the Borel σ -field B(IRd ), {(ω, t) : X t (ω) ∈ B} ∈ F ⊗ B([0, ∞)). If the probability space {, F, P} is equipped with a filtration {Ft } then a much stronger statement of measurability which relates measurability in t and ω with the filtration {Ft } is progressive measurability. Definition 2.1.10 A stochastic process {X t } on a filtered probability space {, F, Ft , P} is progressively measurable if, for any t ∈ [0, ∞) and for any set B in the Borel σ -field B(IRd ), {(ω, s) : s ≤ t, X s (ω) ∈ B} ∈ Ft ⊗ B([0, t]). Here B([0, t]) is the σ -field of Borel sets on the interval [0, t]. A measurable process need not be progressively measurable since σ (X t ) may contain events not in Ft . Lemma 2.1.11 If X is a progressively measurable stochastic process, then X is adapted. Proof The map ω → (s, ω) from → [0, t] × is Ft -measurable. The map (s, ω) → X s (ω) from [0, t] × to the state space of X is Ft -measurable. By composition of the two maps the result follows. Theorem 2.1.12 If the stochastic process {X t : t ≥ 0} on the filtered probability space {, F, Ft , P} is measurable and adapted, then it has a progressively measurable modification. Proof

See [28] page 68.

Typically, in a description of a random process, the measure space and the probability measure on it are not given. One simply describes the family of joint distribution functions of every finite collection of random variables of the process. A basic question is whether there is a stochastic process with such a family of joint distribution functions. The following theorem ([36] page 244), due to Kolmogorov, guarantees us that this is the case if the joint distribution functions satisfy a set of natural consistency conditions. Theorem 2.1.13 (Kolmogorov Consistency Theorem) For all t1 , . . . , tk , k ∈ IN, in the time index T , let Pt1 ,...,tk be probability measures on (IRk , B(IRk )) such that Ptσ (1) ,...,tσ (k) (F1 × · · · × Fk ) = Pt1 ,...,tk (Fσ −1 (1) × · · · × Fσ −1 (k) ). for all permutations σ on {1, 2, . . . , k} and Pt1 ,...,tk (F1 × · · · × Fk ) = Pt1 ,...,tk ,tk+1 ,...,tk+m (F1 × · · · × Fk × IRn × · · · × IRn ),

2.1 Definitions and general results

45

for all m ∈ IN, and the set on the right hand side has a total of k + m factors. Then there is a unique probability measure P on the space (IRT , B(IRT )) such that the restriction of P to any cylinder set Bn = {x ∈ IRT : xt1 ∈ I1 , xt2 ∈ I2 , . . . , xtn ∈ In } is Pt1 ,...,tn , that is P(Bn ) = Pt1 ,...,tn (Bn ). Proof

See [36] page 167.

Theorem 2.1.14 ( Kolmogorov’s Existence Theorem). For all τ1 , . . . , τk , k ∈ IN and τ in the time index let Pτ1 ,...,τk be probability measures on IRnk such that Pτσ (1) ,...,τσ (k) (F1 × · · · × Fk ) = Pτ1 ,...,τk (Fσ −1 (1) × · · · × Fσ −1 (k) ), for all permutations σ on {1, 2, . . . , k} and Pτ1 ,...,τk (F1 × · · · × Fk ) = Pτ1 ,...,τk ,τk+1 ,...,τk+m (F1 × · · · × Fk × IRn × · · · × IRn ), for all m ∈ IN, and the set on the right hand side has a total of k + m factors. Then there exist a probability space (, F, P) and a stochastic process {X τ } on into IRn such that Pτ1 ,...,τk (F1 × · · · × Fk ) = P[X τ1 ∈ F1 , . . . , X τk ∈ Fk ], for all τi in the time set, k ∈ IN and all Borel sets Fi . Proof The proof follows essentially from Theorems 1.3.9, 1.3.10 and 2.1.13. See [36] page 247.

Definition 2.1.15 Suppose X is a stochastic process whose index set is the positive integers Z + . Suppose Fn is a filtration. Then {X n } is predictable if X n is Fn−1 -measurable, that is, X n (ω) is known from observing events in Fn−1 at time n − 1. In continuous time, without loss of generality, we shall take the time index set to be [0, ∞). In the continuous time case, roughly speaking, a stochastic process {X t } is predictable if knowledge about the behavior of the process is left-continuous, that is, X t is Ft− -measurable. Stated differently, for processes which are continuous on the left one may predict their value at each point by their values at preceding points. A Poisson process (see Section 2.10) is not predictable (its sample paths are right-continuous) otherwise we would be able to predict a jump time immediately before it jumps. More precisely, a stochastic process is predictable if it is measurable with respect to the σ -field on × [0, ∞) generated by the family of all left-continuous adapted stochastic processes. A stochastic process X with continuous time parameter is optional if it is measurable with respect to the σ -field on × [0, ∞) generated by the family of all right-continuous, adapted stochastic processes which have left limits. Definition 2.1.16 A measurable stochastic process {X t } with values in [0, ∞), is called an increasing process if almost every sample path X (ω) is right-continuous and increasing.

46

Stochastic processes

Theorem 2.1.17 Suppose {X t } is an increasing process. Then X t has a unique decomposition as X tc + X td , where {X tc } is an increasing continuous process, and {X td } is an increasing purely discontinuous process, that is, {X td } is the sum of the jumps of {X t } . If {X t } is predictable {X td } is predictable. If {X t } is adapted {X tc } is predictable. Proof

See [11] page 69. 2.2 Stopping times

One of the most important questions in the study of stochastic processes is the study of when a process hits a certain level or enters a certain region in its state space for the first time. Since for each possible trajectory, or realization ω, there is a hitting time (finite or infinite), the hitting time is a random variable taking values in the index, or time, space of the stochastic process. ∞ Let IN∞ = {1, 2, 3, . . . , ∞} and F∞ = σ {∪∞ n=1 Fn . n=1 Fn } = A random variable α taking values in IN∞ is a stopping time (or optional or Markov time) with respect to a filtration {Fn } if for all n ∈ IN∞ we have {ω : α(ω) ≤ n} ∈ Fn . An equivalent definition in discrete time is to require {ω : α(ω) = n} ∈ Fn . The concept of stopping time is directly related to the concept of the flow of information through time, that is, the filtration. The event {ω : α(ω) ≤ n} is Fn -measurable, that is, measurable with respect to the information available up to time n. This means a stopping time is a nonanticipative function, whereas a general random variable may anticipate the future. Example 2.2.1 Let {X n , Fn } be an adapted process (i.e. {Fn } is a filtration and X n is Fn measurable for all n). Suppose A is a measurable set of the state space of X . Then the random time α = min{k : X k ∈ A} is a stopping time since {α ≤ n} =

n

{X k ∈ A} ∈ Fn .

k=1

If α is a stopping time with respect to a filtration Fn so is α + m, m ∈ IN. However, α − m, m ∈ IN is not a stopping time since the event {α − m = n} = {α = n + m} is not in Fn ; it is in Fn+m and hence anticipates the future. In order to measure the information accumulated up to a stopping time we should define the σ -field Fα of events prior to a stopping time α. Suppose that some event B is part of this information. This means that if α ≤ n we should be able to tell whether or not B has occurred. However, {α ≤ n} ∈ Fn so that we should have B ∩ {α ≤ n} ∈ Fn and B c ∩ {α ≤ n} ∈ Fn . We, therefore, define: Fα = {A ∈ F∞ : A ∩ {ω : α(ω) ≤ n} ∈ Fn The next examples should help to clarify this concept.

∀n ≥ 0}.

2.2 Stopping times

47

Example 2.2.2 Let = {ωi ; i = 1, . . . , 8} and the time index T = {1, 2, 3}. Consider the following filtration: F1 = σ {{ω1 , ω2 , ω3 , ω4 , ω5 , ω6 }, {ω7 , ω8 }}, F2 = σ {{ω1 , ω2 }, {ω3 , ω4 }, {ω5 , ω6 }, {ω7 , ω8 }}, F3 = σ {{ω1 }, {ω2 }, {ω3 }, {ω4 }, {ω5 }, {ω6 }, {ω7 }, {ω8 }}. Now define the random variable α(ω1 ) = α(ω2 ) = α(ω5 ) = α(ω6 ) = 2, α(ω3 ) = α(ω4 ) = α(ω7 ) = α(ω8 ) = 3, so that {α = 0} = ∅,

{α = 1} = ∅,

{α = 2} = {ω1 , ω2 , ω5 , ω6 }, {α = 3} = {ω3 , ω4 , ω7 , ω8 }, and α is a stopping time. Now Fα = { all events A ∈ F∞ (= F3 ) such that for some n the event A is a subset of the event {ω : α(ω) ≤ n} }. In our situation Fα = σ {{ω1 , ω2 }, {ω5 , ω6 }, {ω3 }, {ω4 }, {ω7 }, {ω8 }}. Note that the first two simple events of Fα , {ω1 , ω2 }, {ω5 , ω6 }, are in F2 and the rest are in F3 as they should be. Also, note that Fα is not the σ -field generated by the random variable α. However, a closer look shows that α is Fα -measurable. If, for instance, the outcome is ω1 then α = 2 and α −1 (2) = {α = 2} = {ω1 , ω2 , ω5 , ω6 } is an atom of the σ -field generated by the random variable α but not an atom of Fα . Example 2.2.3 Consider again the experiment of tossing a fair coin infinitely many times. Each ω is an infinite sequence of heads and tails and = {H, T }IN . Define the filtration: F1 = σ {{ω starting with H }, {ω starting with T }}, F2 = σ {{ω starting with HH}, {ω starting with HT }, {ω starting with TH }, {ω starting with TT }}, . . . , Fn = σ {{ω starting with n fixed letters}} Suppose that we win one dollar each time “Heads” comes up and lose one otherwise. Let S0 = 0 and Sn be our fortune after the n-th toss. Define the random variable α = inf{n : Sn > 0}, which is the first time our winnings exceed our losses. Clearly, α is a stopping time with respect to the filtration Fn . Here Fα = σ {{ω starting with H }, {ω starting with THH}, {ω starting with THTHH}, {ω starting with TTHHH}, . . . }.

48

Stochastic processes

and α(ω starting with H ) = 1, α(ω starting with THH) = 3, α(ω starting with THTHH) = α(ω starting with TTHHH) = 5. If ω = T H T H H . . . , then the information at time α(T H T H H . . . ) = 5 is in F5 and is given by the event composed of all the smaller events starting with T H T H H and is an atom of Fα . However {α = 5} = {{T H T H H . . . }, {T T H H H . . . }} which is not an atom of Fα . If α ≤ β are two stopping times then Fα ⊂ Fβ , because if A ∈ Fα , A {β ≤ n} = (A {α ≤ n}) {β ≤ n} ∈ Fn

(2.2.1)

for all n. From this result we see that if {αn } is an increasing sequence of stopping times, the sequence {Fαn } is a filtration. Example 2.2.4 Let = {ωi , i = 1, . . . , 8} and the time index T = {1, 2, 3, 4}. Consider the following filtration: F1 = σ {{ωi , i = 1, . . . , 6}, {ω7 , ω8 }}, F2 = σ {{ω1 , ω2 , ω3 }, {ω4 , ω5 , ω6 }, {ω7 , ω8 }}, F3 = σ {{ω1 , ω2 }, {ω3 }, {ω4 }, {ω5 , ω6 }, {ω7 , ω8 }}, F4 = σ {{ω1 }, {ω2 }, {ω3 }, {ω4 }, {ω5 }, {ω6 }, {ω7 }, {ω8 }}. Now define the stopping times α1 and α2 : α1 (ω1 ) = α1 (ω2 ) = α1 (ω3 ) = α1 (ω4 ) = α1 (ω5 ) = α1 (ω6 ) = 2, α1 (ω7 ) = α1 (ω8 ) = 3, α2 (ω1 ) = α2 (ω2 ) = α2 (ω3 ) = 2,

α2 (ω5 ) = α2 (ω6 ) = 3,

α2 (ω4 ) = α2 (ω7 ) = α2 (ω8 ) = 4, so that α1 ≤ α2 and Fα1 ⊂ Fα2 , where Fα1 = σ {{ω1 , ω2 , ω3 }, {ω4 , ω5 , ω6 }, {ω7 , ω8 }}, Fα2 = σ {{ω1 , ω2 , ω3 }, {ω4 }, {ω5 , ω6 }, {ω7 }, {ω8 }}. For any Borel set B, {ω : X α(ω) (ω) ∈ B} =

∞

{X n (ω) ∈ B, α(ω) = n} ∈ F,

n=0

that is, X α is a random variable. If X ∞ has been defined and X ∞ ∈ F∞ = n Fn , then we define X α (ω) = X α(ω) (ω), i.e. X α = n∈IN∞ X n I{α=n} ∈ Fα , that is, X α is Fα -measurable.

2.2 Stopping times

49

In the continuous time situation, definitions are more involved and the time parameter t plays a much more important role since continuity, limits etc. enter the scene. Let {Ft }, t ∈ [0, ∞) be a filtration. A nonnegative random variable α is called a stopping time with respect to the filtration Ft if for all t ≥ 0 we have {ω : α(ω) ≤ t} ∈ Ft . A nonnegative random variable α is an optional time with respect to the filtration Ft if for all t ≥ 0 we have {ω : α(ω) < t} ∈ Ft . Every stopping time is optional, and the two concepts coincide if the filtration is right-continuous since {ω : α(ω) ≤ t} ∈ Ft+ for every > 0, and hence {ω : α(ω) ≤ t} ∈ >0 Ft+ = Ft+ = Ft provided that Ft is right-continuous. Example 2.2.5 Suppose {X t , t ≥ 0} is continuous and adapted to the filtration {Ft , t ≥ 0}. 1. Consider α(ω) = inf{t, X t (ω) = b}, the first time the process X hits level b ∈ IR (first passage time to a level b ∈ IR). Then α is a stopping time since {α ≤ t} =

{|X r − b| ≤

n∈IN {r ∈Q,r ≤t}

1 } ∈ Ft . n

2. Consider α(ω) = inf{t, |X t (ω)| ≥ 1}, the first time the process X leaves the interval [−1, +1]. Then α is a stopping time. 3. Consider α(ω) = inf{t, X t (ω) > 1} which is the first time the jump X t = X t − X t− exceeds 1. Then α is a stopping time. Similarly to the discrete time case, the σ -field of events prior to a stopping time α is defined by Fα = {A ∈ F∞ : A ∩ {ω : α(ω) ≤ t} ∈ Ft

∀t ≥ 0}.

(2.2.2)

Any stopping time α is Fα -measurable as, for s ≤ t, {ω : α(ω) ≤ s} ∩ {ω : α(ω) ≤ t} = {ω : α(ω) ≤ min(t, s)} ∈ Fmin(t,s) ⊂ Ft .

(2.2.3)

Hence {ω : α(ω) ≤ s} ∈ Fα . If α1 , α2 are stopping times, then min(α1 , α2 ), max(α1 , α2 ) and α1 + α2 are stopping times as: 1. {min(α1 , α2 ) ≤ t} = {α1 ≤ t} ∪ {α2 ≤ t} ∈ Ft , 2. {max(α1 , α2 ) ≤ t} = {α1 ≤ t} ∩ {α2 ≤ t} ∈ Ft , 3. {α1 + α2 ≤ t} = {α1 = 0, α2 = t} ∪ {α2 = 0, α1 = t} ({α1 ≤ p} ∩ {α2 ≤ q} , p,q∈Q, p+q≤t

where Q is the set of rational numbers. 4. If {αn } is a sequence of stopping times then sup αn is a stopping time since {sup αn ≤ t} = n {αn ≤ t} ∈ Ft . 5. If α1 , α2 are stopping times such that α1 ≤ α2 then Fα1 ⊂ Fα2 .

50

Stochastic processes

Perhaps one of the most important applications of the concept of stopping time is the so-called strong Markov property. A stochastic process {X t } is a Markov process if E[ f (X t+s ) | FtX ] = E[ f (X t+s ) | X t ],

(P-a.s.)

(2.2.4)

where f is any bounded measurable function and FtX = σ {X u , u ≤ t}. Equation (2.2.4) is termed the Markov property. A natural generalization of the Markov property is the strong Markov property, where the “present” time t in (2.2.4) is replaced by a stopping time and the “future” time t + s is replaced by another later stopping time. That is, if α and β are stopping times and α ≤ β, E[X β | Fα ] = X α a.s. In other words a stochastic process {X t } has the strong Markov property if the information about the behavior of {X t } prior to the stopping time α is irrelevant in predicting its behavior after that time α once X α is observed.

2.3 Discrete time martingales Martingales are probably the most important type of stochastic processes used for modeling. They occur naturally in almost any information processing problem involving sequential acquisition of data: for example, the sequence of estimates of a random variable based on increasing observations, and the sequence of likelihood ratios in a sequential hypothesis test are martingales. The stochastic process X is a submartingale (supermartingale) with respect to the filtration {Fn } if it is 1. Fn -adapted, 2. E[|X n |] < ∞ for all n, and 3. E[X n | Fn ] ≥ X n a.s. (E[X n | Fn ] ≤ X n a.s.) for all n ≤ n. The stochastic process X is a martingale if it is a submartingale and a supermartingale. If we recall the definition of conditional expectation we see that the requirement E[X n+1 | Fn ] = X n a.s. implies the following: E[X n+1 | Fn ]dP = X n+1 dP, F ∈ Fn , F

and

F

X n dP = F

X n+1 dP,

F ∈ Fn .

Since Fn ⊂ Fn+1 ⊂ · · · ⊂ Fn+k , it easily seen that X n dP = X n+1 dP · · · = X n+k dP, F

(2.3.1)

F

F

F

F ∈ Fn .

(2.3.2)

2.3 Discrete time martingales

51

and hence with probability 1 E[X n+k | Fn ] = X n . Setting F = and n = 1, 2, . . . in (2.3.2) gives E[X 1 ] = E[X 2 ] = · · · = E[X n ]. A classical example of a martingale X is a player’s fortune in successive plays of a fair game. If X 0 is the initial fortune, then “fair” means that, on average, the fortune at some future time n, after more plays, should be neither more nor less than X 0 . If the game is favorable to the player, then his fortune should increase on average and X n is a submartingale. If the game is unfavorable to the player, X n is a supermartingale. The following important inequality is used to prove a fundamental result on constructing a uniformly integrable family of random variables by conditioning a fixed (integrable) random variable on a family of sub-σ -fields. Lemma 2.3.1 (Jensen’s Inequality). Suppose X ∈ L 1 . If φ : IR → IR is convex and φ(X ) ∈ L 1 , then E[φ(X ) | G] ≥ φ(E[X | G]).

(2.3.3)

Proof (see, for example, [11]) Any convex function φ : IR → IR is the supremum of a family of affine functions, so there exists a sequence (φn ) of real functions with φn (x) = an x + bn for each n, such that φ = supn φn . Therefore φ(X ) ≥ an X + bn holds a.s. for each (and hence all) n. So by the positivity of E[. | G], E[φ(X ) | G] ≥ supn (an E[X | G] + bn ) = φ(E[X | G]) a.s. Lemma 2.3.2 Let X ∈ L p , p ≥ 1. The family L = {E[X | G] : G is a sub-σ -field of F}, is uniformly integrable. Proof

Since φ(x) = |x| p is convex, Jensen’s Inequality 2.3.1 implies that |E[X | G]| p ≤ E[|X | p | G].

Hence E[|E[X | G]| p ] ≤ E[E[|X | p | G]] = E[|X | p ], that is, E[|E[X | G]| p ] < ∞ for all G. Thus the family L is L p -bounded, hence uniformly integrable by Example 1.3.35. Specializing Lemma 2.3.2 to filtrations, we obtain an important type of martingale. Example 2.3.3 Let {Fn } be a filtration, suppose F∞ = Fn and Y ∈ L 1 (, F∞ ). Define X n = E[Y | Fn ], n ≥ 1. Then {X n , Fn }, n ≥ 1 is a martingale. To check this consider E[X n+1 | Fn ] = E[E[Y | Fn+1 ] | Fn ] = E[Y | Fn ] = X n ,

52

Stochastic processes

using property (1.4.1) of conditional expectations. Conversely, if the stochastic process {X n } is a martingale with respect to the filtration {Fn } and there exists an integrable random variable X such that X n = E[X | Fn ]

(P-a.s.) n ≥ 1,

then the martingale {X n , Fn } is called regular. Regularity of a martingale {X n , Fn } is in fact equivalent to the uniform integrability of the process {X n } by Lemma 2.3.2. In turn, this is equivalent to the convergence in L 1 of {X n } to X . (See [11].) Example 2.3.4 Let (, F, P) be a probability space equipped with a filtration {Fn }. Let P be another probability measure on (, F) absolutely continuous with respect to P when both are restricted to Fn (i.e. P(F) = 0 then P(F) = 0 for all F ∈ Fn ). Then from the Radon–Nikodym Theorem 1.3.25 there is an Fn -measurable derivative n such that: P(F) = n (ω)dP, F ∈ Fn . (2.3.4) F

Similarly, there is an Fn+1 -measurable density n+1 . Now, Fn ⊂ Fn+1 so that F ∈ Fn+1 and (2.3.4) remains true if n is replaced with n+1 : P(F) = n (ω)dP = n+1 (ω)dP, F ∈ Fn . F

F

which implies that {n } is an {Fn } martingale.

Definition 2.3.5 Let {X n , Fn } be a submartingale. The number Cn [a, b] of up-crossings of the interval [a, b] by the sequence X 1 , . . . , X n is defined to be the largest positive integer k such that we can find 0 ≤ s1 < t1 < s2 < t2 < · · · < sk < tk ≤ n with X si < a, X ti > b, for 1 ≤ i ≤ k. The following theorem is a useful tool in proving convergence results for submartingales. Theorem 2.3.6 (Doob). If {X n , Fn } is a submartingale then for all n ≥ 1, E[Cn [a, b]] ≤

E[X n − a]+ , b−a

where and [X n − a]+ = max{(X n − a), 0}. Proof

See [36] page 474.

Theorem 2.3.7 If {X n , Fn } is a nonnegative martingale then X n → X a.s., where X is an integrable random variable. Proof Suppose that the event {ω : lim inf X n (ω) < lim sup X n (ω)} = p 0.

(2.3.5)

2.3 Discrete time martingales

53

This means that {X n } oscillates about or up-crosses the interval [a, b] infinitely many times. However, using Theorem 2.3.6 and the fact that sup E[X n ] = E[X 1 ] < ∞ we have: lim E[Cn [a, b]] ≤ lim n

n

E[X n − a]+ E[X 1 ] + |a| ≤ < ∞, b−a b−a

which contradicts (2.3.5), that is, P({ω : lim inf X n (ω) < lim sup X n (ω)}) = 0. Hence limn X n = X a.s. To finish the proof we must show that E[|X |] < ∞. This follows from Fatou’s Lemma 1.3.16. Theorem 2.3.8 Let (, F, P) be a probability space equipped with a filtration {Fn }. Write F∞ = Fn ⊂ F. Let P be another probability measure on (, F) which is absolutely continuous with respect to P when both are restricted to Fn for each n (i.e. P(F) = 0 then P(F) = 0 for all F ∈ Fn ). Suppose n are the corresponding Radon–Nikodym derivatives. Then n converges to an integrable random variable with probability 1. Moreover, if P is absolutely continuous with respect to P on F∞ then is the corresponding Radon–Nikodym derivative. Proof The first statement of the theorem follows from Theorem 2.3.7; the second statement follows from Theorem 3, page 478 of Shiryayev [36]. See also Example 2.3.4. Returning to Example (1.3.29): Example 2.3.9 Suppose (, F, P) is a probability space on which is defined a sequence of random variables Y1 , Y2 , . . . and Fn = σ {Y1 , Y2 , . . . , Yn }. Let P be another probability measure on F. Suppose that under P and P the random vector {Y1 , Y2 , . . . , Yn } has densities f n (.) and f n respectively with respect to n-dimensional Lebesgue measure. Then by Theorem 2.3.8 the Radon–Nikodym derivatives dP f¯n (Y1 , Y2 , . . . , Yn ) = n = dP Fn f n (Y1 , Y2 , . . . , Yn ) converge to an integrable and F∞ -measurable random variable .

Example 2.3.10 If {X n } is an integrable, real valued process with independent increments having mean 0 then it is a martingale with respect to the filtration it generates. If, in addition, X n2 is integrable then X n2 − E(X n2 ) is a martingale with respect to the same filtration. The proof is left as an exercise. Theorem 2.3.11 If {X n , Fn } is a martingale and α is a stopping time with respect to the filtration Fn , then {X min(n,α) , Fn } is a martingale. Proof First we have to show that X min(n,α) is integrable. But X min(n,α) = n−1 k=0 X k + X n I{α≥n} and by assumption the variables X 0 , . . . , X n are integrable. Hence X min(n,α) is

54

Stochastic processes

integrable. Moreover, X min(n,α) is Fn -measurable. It remains to show that E[X min(n+1,α) | Fn ] = X min(n,α) . This follows from E[X min(n+1,α) − X min(n,α) | Fn ] = E[Iα>n (X n+1 − X n ) | Fn ] = I{α>n} E[(X n+1 − X n ) | Fn ] = 0, since {α > n} ∈ Fn . We also have that stopping at an optional time preserves the martingale property. Theorem 2.3.12 (Doob Optional Sampling Theorem). Suppose {X n , Fn } is a martingale. Let α ≤ β (a.s.) be stopping times such that X α and X β are integrable. Also suppose that lim inf |X n |dP → 0, (2.3.6) {α≥n}

and

lim inf {β≥n}

|X n |dP → 0.

(2.3.7)

Then E[X β | Fα ] = X α .

(2.3.8)

In particular E[X β ] = E[X α ]. Proof Using the definition of conditional expectation, we have to show that for every A ∈ Fα , I{α≤β} E[X β | Fα ]dP = I{α≤β} X β dP = I{α≤β} X α dP. A

A

However, {α ≤ β} =

A

n≥0 {α

= n} ∩ {β ≥ n}. Hence it suffices to show that, for all n ≥ 0: I{α=n}∩{β≥n} X β dP = I{α=n}∩{β≥n} X α dP

A

A

=

I{α=n}∩{β≥n} X n dP.

(2.3.9)

A

Now, {ω : β(ω) ≥ n} = {ω : β(ω) = n} last integral in (2.3.9) is equal to X n dP + A∩{α=n}∩{β=n}

=

{ω : β(ω) ≥ n + 1} and in view of (2.3.1), the

X n+1 dP

A∩{α=n}∩{β≥n+1}

X β dP + A∩{α=n}∩{β=n}

X n+1 dP. A∩{α=n}∩{β≥n+1}

(2.3.10)

2.3 Discrete time martingales

Also, {ω : β(ω) ≥ n} = {ω : n ≤ β(ω) ≤ n + 1} again, (2.3.10) equals X β dP + A∩{α=n}∩{n≤β≤n+1}

55

{ω : β(ω) ≥ n + 2} and using (2.3.1)

X n+2 dP.

A∩{α=n}∩{β≥n+2}

Repeating this step k times, I{α=n}∩{β≥n} X n dP = A

X β dP A∩{α=n}∩{n≤β≤n+k}

+

X n+k+1 dP, A∩{α=n}∩{β≥n+k+1}

that is

X β dP =

X n dP

A∩{α=n}∩{n≤β≤n+k}

A∩{α=n}∩{β≥n}

−

X n+k+1 dP. A∩{α=n}∩{β≥n+k+1}

Now, + − X n+k+1 = X n+k+1 − X n+k+1 + + − + = 2X n+k+1 − (X n+k+1 + X n+k+1 ) = 2X n+k+1 − |X n+k+1 |

so that

X β dP = A∩{α=n}∩{n≤β≤n+k}

X n dP A∩{α=n}∩{β≥n}

−2

A∩{α=n}∩{β≥n+k+1}

+

+ X n+k+1 dP

|X n+k+1 |dP.

(2.3.11)

A∩{α=n}∩{β≥n+k+1}

Taking the limit when k → ∞ of both sides of (2.3.11) and using (2.3.7), we obtain X β dP = X n dP, A∩{α=n}∩{n≤β}

A∩{α=n}∩{n≤β}

which establishes (2.3.9) and finishes the proof.

Definition 2.3.13 The stochastic process {X n , Fn } is a local martingale if there is a sequence of stopping times {αk } increasing to ∞ with probability 1 and such that {X n∧αk , Fn } is a martingale. Remark 2.3.14 The interesting fact about local martingales is that they can be obtained rather naturally through a martingale transform (stochastic integral in the continuous time case) which is defined as follows. Suppose {Yn , Fn } is a martingale and {An , Fn } is a

56

Stochastic processes

predictable process. Then the sequence X n = A0 Y0 +

n

Ak (Yk − Yk−1 )

k=1

is called a martingale transform and is a local martingale. Proof To show that {X n , Fn } is a local martingale we have to find a sequence of stopping times {αk }, k ≥ 1, increasing to infinity (P-a.s.) and such that the “stopped” process {X min(n,αk ) , Fn } is a martingale. Let αk = inf{n ≥ 0 : |An+1 | > k}. Since A is predictable the αk are stopping times and clearly αk ↑ ∞ (P-a.s.). Since Y is a martingale and |Amin(n,αk ) I{αk >n} | ≤ k then, for all n ≥ 1, E[|X min(n,αk ) I{αk >n} | < ∞. Moreover, from Theorem 2.3.11, E[(X min(n+1,αk ) − X min(n,αk ) )I{αk >n} | Fn ] = I{αk >n} Amin(n+1,αk ) E[Ymin(n+1,αk ) − Ymin(n,αk ) | Fn ] = 0. This finishes the proof. Example 2.3.15 Suppose that you are playing a game using the following “strategy”. At each time n your stake is An . Write X n for the state of your total gain through the n-th game with X 0 = 0 for simplicity. Write Fn = σ {X k : 0 ≤ k ≤ n}. We suppose for each n, An is Fn−1 measurable, that is A = {An } is predictable with respect to the filtration Fn . This means that An = An (X 0 , X 1 , . . . , X n−1 ) is a function of X 0 , X 1 , . . . , X n−1 . If we assume that you win (or lose) at time n if a Bernouilli random variable bn is equal to 1 (or −1), then n n Xn = Ak bk = Ak Ck . k=1

k

k=1

Here Ck = Ck − Ck−1 and Ck = i=1 bi . If C is a martingale with respect to the filtration Fn (in this case we say that the game is “fair”), then the same thing holds for X because E[X n | Fn−1 ] = X n−1 + An E[Cn − Cn−1 | Fn−1 ] = X n−1 + An (E[Cn | Fn−1 ] − Cn−1 ) = X n−1 + An (Cn−1 − Cn−1 ) = X n−1 . 2.4 Doob decomposition A submartingale is a process which “on average” is nondecreasing. Unlike a martingale, which has a constant mean over time, a submartingale has a trend or an increasing predictable part perturbated by a martingale component which is not predictable. This is made more precise by the following theorem due to J. L. Doob.

2.4 Doob decomposition

57

Theorem 2.4.1 (Doob Decomposition). Any submartingale {X n } can be written (P-a.s. uniquely) as X n = Yn + Z n ,

a.s.

(2.4.1)

where {Yn } is a martingale and {Z n } is a predictable, increasing process, i.e. E(Z n ) < ∞, Z 1 = 0 and Z n ≤ Z n+1 a.s. ∀n. Proof Then:

Write n = X n − X n−1 , yi = i − E[i | Fi−1 ] and z i = E[i | Fi−1 ], z 0 = 0. X n = 1 − E[1 | F0 ] + 2 − E[2 | F1 ] + · · · + n − E[n | Fn−1 ] +

n

E[i | Fi−1 ]

i=1

=

n

yi +

n

i=1

zi

i=1

= Yn + Z n , To prove uniqueness suppose that there is another decomposition X n = Yn + Z n = n n i=1 yi + i=1 z i . Let yn + z n = x n = yn + z n and take conditional expectation with respect to Fn−1 to get z n = z n , because yn is a martingale increment and z n is predictable. This implies yn = yn and the uniqueness of the decomposition. Remarks 2.4.2 1. In Theorem 2.4.1 if {X n } is just an Fn -adapted and integrable process the decomposition remains valid but we lose the “increasing” property of the process {Z n }. 2. The process X − Z is a martingale; as a result Z is called the compensator of the submartingale X . 3. A processes which is the sum of a predictable process and a martingale is called a semimartingale. 4. Uniqueness of the decomposition is ensured by the predictability of the process {Z n }. Definition 2.4.3 A discrete-time stochastic process {X n }, with finite-state space S = {s1 , s2 , . . . , s N }, defined on a probability space (, F, P) is a Markov chain if P(X n+1 = sin+1 | X 0 = si0 , . . . , X n = sin ) = P(X n+1 = sin+1 | X n = sin ), for all n ≥ 0 and all states si0 , . . . , sin , sin+1 ∈ S. This is termed the Markov property. {X n } is a homogeneous Markov chain if

P(X n+1 = s j | X n = si ) = π ji is independent of n.

58

Stochastic processes

The matrix = {π ji } is called the probability transition matrix of the homogeneous Markov chain and it satisfies the property Nj=1 π ji = 1. Note that our transition matrix is the transpose of the traditional transition matrix defined elsewhere. The convenience of this choice will be apparent later. The following properties of a homogeneous Markov chain are easy to check. 1. Let π 0 = (π10 , π20 , . . . , π N0 ) be the distribution of X 0 . Then P(X 0 = si0 , X 1 = si1 , . . . , X n = sin ) = πi00 πi0 i1 . . . πin−1 in . 2. Let π n = (π1n , π2n , . . . , π Nn ) be the distribution of X n . Then π n = n π 0 = π n−1 . Example 2.4.4 Let {ηn } be a discrete-time Markov chain as in Definition 2.4.3. Consider the filtration {Fn } = σ {η0 , η1 , . . . , ηn }. Write X n = (I(ηn =s1 ) , I(ηn =s2 ) , . . . , I(ηn =s N ) ). Then X n is a discrete-time Markov chain with state space the set of unit vectors e1 = (1, 0, . . . , 0) , . . . , e N = (0, . . . , 1) of IR N . However, the probability transitions matrix of X is . We can write: E[X n | Fn−1 ] = E[X n | X n−1 ] = X n−1 ,

(2.4.2)

from which we conclude that X n−1 is the predictable part of X n , given the history of X

up to time n − 1 and the nonpredictable part of X n must be Mn = X n − X n−1 . In fact it can be easily shown that Mn ∈ IR N is a mean 0, Fn -vector martingale and we have the semimartingale (or Doob decomposition) representation of the Markov chain {X n }, X n = X n−1 + Mn .

(2.4.3)

Definition 2.4.5 Given two (column) vectors X and Y the tensor or Kronecker product X ⊗ Y is the (column) vector obtained by stacking the rows of the matrix X Y , where is the transpose, with entries obtained by multiplying the i-th entry of X by the j-th entry of Y . Example 2.4.6 Let {X n } be an order-2 Markov chain (see (2.4.4) below) with state space the standard basis of IR2 {e1 , e2 } on a filtered probability space (, F, Fn , P), Fn = σ {X 0 , X 1 , . . . , X n } such that P(X n = ek | Fn−1 ) = P(X n = ek | X n−2 , X n−1 ), and probability transitions matrix = {πk, ji },

k

πk, ji = 1,

i, j, k = 1, 2

(2.4.4)

2.5 Continuous time martingales

or

=

π1,11 π2,11

π1,12 π2,12

π1,21 π2,21

59

π1,22 . π2,22

Lemma 2.4.7 A semimartingale representation (or Doob decomposition) of the order-2 Markov chain X is: X n = (X n−2 ⊗ X n−1 ) + Mn ,

(2.4.5)

that is Mn = X n − X n−2 ⊗ X n−1 is an Fn -martingale. (X n−2 ⊗ X n−1 ) is the tensor, or Kronecker, product of the vectors X n−1 , X n−2 . This can be identified with one of the standard unit vectors {e1 , e2 , e3 , e4 } of IR4 , that is e1 ⊗ e1 = (1, 0, 0, 0) ,

e1 ⊗ e2 = (0, 1, 0, 0) ,

e2 ⊗ e1 = (0, 0, 1, 0) ,

e2 ⊗ e2 = (0, 0, 0, 1) .

Proof E[X n | Fn−1 ] = E[X n | X n−2 , X n−1 ] = E[X n | X n−2 = ei , X n−1 = e j ]I{X n−2 =ei ,X n−1 =e j } ij

=

k

=

ij

=

ek πk, ji I{X n−2 =ei ,X n−1 =e j }

ij

(π1, ji , π2, ji )I{X n−2 =ei ,X n−1 =e j }

ei ⊗ e j I{X n−2 =ei ,X n−1 =e j } = X n−2 ⊗ X n−1 .

ij

2.5 Continuous time martingales The stochastic process X is a submartingale (supermartingale) with respect to the filtration {Ft } if 1. it is Ft -adapted, E[|X t |] < ∞ for all t and 2. E[X t | Ft ] ≥ X t (E[X t | Ft ] ≤ X t ) for all t ≤ t. The stochastic process X is a martingale if it is a submartingale and a supermartingale. Since for a martingale E[X t | Fs ] = X s , it follows that E[E[X t | Fs ]] = E[X s ], and E[X t ] = E[X s ] for all s ≥ 0, so that E[X t ] = E[X 0 ] for all t ≥ 0.

60

Stochastic processes

Example 2.5.1 If X is an integrable random variable on a filtered probability space then

X t = E[X | Ft ] is a martingale, since for s ≤ t, E[X t | Fs ] = E[E[X | Ft ] | Fs ] = E[X | Fs ] = X s .

An important application of Example 2.5.1 is Example 2.5.2 Let (, F, P, P) be a probability space with a filtration {Ft , t ≥ 0} and two probability measures such that P P. Then the Radon–Nikodym Theorem asserts the existence of a nonnegative random variable such that for all F ∈ F, P(F) = (ω)dP(ω). F

Then t = E[ | Ft ] is a nonnegative martingale with mean E[t ] = (ω)dP(ω) = 1.

t (ω)dP(ω) =

Example 2.5.3 Let {X t } be a stochastic process adapted to the filtration {Ft } with independent increments, that is, for s ≤ t, X t − X s is independent of the σ -field Fs . Then the process {X t − E[X t ]} is an Ft -martingale since E[X t − E[X t ] | Fs ] = E[X t − E[X t ] − (X s − E[X s ]) + (X s − E[X s ]) | Fs ] = X s − E[X s ] + E(X t − X s ) − E(X t − X s ) = X s − E[X s ]. The following martingale convergence result is proved in, for instance, [6] page 16. Theorem 2.5.4 (Martingale Convergence Theorem). Let {X t , Ft }, t ≥ 0, be a martingale with right-continuous sample paths. If supt E[|X t |] < ∞ then there is a random variable X ∞ ∈ L 1 such that limt→∞ X t = X ∞ a.s. Furthermore, if {X t , Ft }, t ≥ 0 is uniformly L1

integrable then (X t → X ∞ ) and E[|X t |] increases to E[|X ∞ |] as t → ∞. Theorem 2.5.5 (Stopped Martingales are Martingales). Let {X t , Ft } be a martingale with right-continuous sample paths and α a stopping time. The stopped process {X t∧α , t ≥ 0} is also a martingale. Proof

See [34] page 189.

Theorem 2.5.6 (Optional Stopping). Let {X t , Ft , t ≥ 0} be a right-continuous martingale with a last element X ∞ , and let α ≤ β be two stopping times. Then E[X β | Fα ] = X α In particular, we have E[X β ] = X 0 . Proof

See [21] page 19.

a.s.

2.5 Continuous time martingales

61

Now we give a characterization of a uniformly integrable martingale. We need this result to prove Theorem 3.5.3 Theorem 2.5.7 Suppose {X t }, 0 ≤ t ≤ ∞, is an adapted right-continuous process such that for every stopping time α, E[|X α |] < ∞ and E[X α ] = 0. Then {X t } is a uniformly integrable martingale. Proof

Consider any time t ∈ [0, ∞] and F ∈ Ft . Let α(ω) = t I{ω∈F} + ∞I{ω∈F} / .

Then α is a stopping time and by assumption 0 = E[X α ] = E[X t I{ω∈F} ] + E[X ∞ I{ω∈F} / ] = E[X ∞ ] = E[X ∞ I{ω∈F} ] + E[X ∞ I{ω∈F} / ]. Hence E[X t I{ω∈F} ] = E[X ∞ I{ω∈F} ] for all F ∈ Ft , so X t = E[X ∞ | Ft ] a.s. Recall that the definition of a martingale involves the integrability of X t , for all t which in fact is a sufficient condition for the existence of E[X t | Fs ], s ≤ t. However, E[X t | Fs ], s ≤ t may exist even though E[|X t |] = ∞, in which case {X t , Ft } is called a local martingale. First recall the concept of local properties of deterministic functions. The (deterministic) function X t = et /(t − 1) is locally bounded, i.e. it is bounded on compact sets not containing 1 (closed bounded intervals in IR − {1}). In fact we can define, for each n ∈ IN: Ytn = X t I[|X t |≤n] + n I[X t >n] − n I[X t <−n] . Clearly Ytn is bounded everywhere and equals X t on closed bounded intervals. However, for Ytn to converge to X t we must allow n to increase to infinity. The same idea is used when X t (ω) is a random function or a stochastic process. However, the localizing sequence is then a sequence of random variables, in fact stopping times. For example, consider αn (ω) = inf{t : |X t (ω)| > n}, which is the first time the sample path X t (ω) leaves the interval [−n, +n]. Then define Ytn (ω) = X t∧αn (ω) (ω), so that for different ωs there are, for each n, different times t when X t (ω) leaves the bounded set [−n, n]. As in the deterministic case, the sequence of stopping times αn (ω) must increase to infinity for almost all ω. Here x ∧ y stands for the smaller of x and y. Definition 2.5.8 The stochastic process X = {X t }, t ≥ 0, is said to be square integrable if supt E[X t2 ] < ∞. Definition 2.5.9 The stochastic process {X t , Ft } is a local martingale if there is a sequence of stopping times {αn } increasing to ∞ with probability 1 and such that for each n, {X t∧αn , Ft } is a martingale. Definition 2.5.10 The stochastic process {X t , Ft } is a locally square integrable martingale (i.e. locally in L 2 ) if there is a sequence of stopping times {αn } increasing to ∞ with probability 1 and such that for each n, {X t∧αn , Ft } is a square integrable martingale.

62

Stochastic processes

The following two theorems, whose proofs can be found in [11], are needed in the proof of Theorem 3.5.6. Theorem 2.5.11 Let {X t , Ft } be a local martingale which is zero at time t = 0. Then there exists a sequence of stopping times {αn } increasing to ∞ with probability 1 and such that for each n, {X t∧αn , Ft } is a uniformly integrable martingale and E[X t∧αn | Ft ] is bounded on the stochastic interval {(t, ω) ∈ [0, ∞[× : 0 ≤ t < αn (ω)) (denoted [[0, αn [[). Theorem 2.5.12 Let {X t , Ft } be a local martingale. Then there exists a sequence of stopping times {αn } increasing to ∞ with probability 1 such that for each n, X {αn ∧t} = U{τn ∧t} + V{αn ∧t} , where U0 = 0, U{αn ∧t} is square integrable and V{αn ∧t} is a martingale of integrable variation which is zero at t = 0.

2.6 Doob–Meyer decomposition The following definitions are needed in the sequel. Definition 2.6.1 Let f be a real valued function on an interval [a, b]. The variation of f on the interval [a, b] is given by lim

n→∞

n

|

f (tkn )

−

n f (tk−1 )| =

k=1

b

|d f |,

a

where a = t0n < t1n < · · · < tnn = b denotes a sequence of partitions of the interval [a, b] n such that δn = max(tkn − tk−1 ) → 0 as n → ∞. If

b

|d f | < ∞,

a

then we say that f has finite variation on the interval [a, b]. If

b

|d f | = ∞,

a

then we say that f has infinite variation on the interval [a, b]. Definition 2.6.2 A stochastic process X is of integrable variation if

E

∞

|dX s | < ∞.

0

Example 2.6.3 A typical example of a continuous function of infinite variation is the following: f (x) =

0 for x = 0, π x sin for 0 < x ≤ 1. 2x

2.6 Doob–Meyer decomposition

63

Consider the sequence of partitions of the interval [0, 1]: π1 = {0, 1}, π2 = {0, 12 , 1}, π3 = {0, 13 , 12 , 1}, π4 = {0, 14 , 13 , 12 , 1}, ...

1 1 1 πn = 0, , ,..., ,1 . n−1 n−2 n − (n − 2)

Then it can be verified that 0

1

|d f | = lim

n→∞

n

n | f (tkn ) − f (tk−1 )| = ∞.

k=1

Another example of a function of infinite variation in any interval containing 0 is f (x) =

(−1)[1/x] , 1+x

where [1/x] stands for the integral part of 1/x. Definition 2.6.4 An adapted process {X t , Ft } is called a semimartingale if it can be written in the form X t = X 0 + Mt + Vt . Here {Mt }, t ≥ 0 is a local martingale with M0 = 0; {Vt } is an adapted process with paths of finite variation (see Definition 2.6.1), and V0 = 0. {Vt } is not necessarily predictable. Roughly speaking, {Vt } is a slowly changing component (trend) and {Mt } is a quickly changing component. Definition 2.6.5 An adapted process {X t , Ft } is called a special semimartingale if it can be written in the form X t = X 0 + Mt + Vt . Here {Mt }, t ≥ 0, is a local martingale with M0 = 0; {Vt } is a predictable process with paths of finite variation, and V0 = 0. Theorem 2.6.6 X is a (special) semimartingale if and only if the stopped process X t∧τn is a (special) semimartingale, where {τn } is a sequence of stopping times such that limn τn = ∞ Proof (Elliott [11]). Clearly, if X is a (special) semimartingale then the stopped process X t∧τn is a (special) semimartingale for each n.

64

Stochastic processes

If S and T are stopping times and X t∧S and X t∧T are (special) semimartingales then the same is true of X t∧(S∨T ) = X t∧S + X t∧T − X t∧(S∧T ) . Therefore, we can assume that {τn } is an increasing sequence of stopping times with the stated properties. If X t∧τn is a special semimartingale for each n it has a unique decomposition X t∧τn = X 0 + Mtn + Ant . However, (X t∧τn+1 )t∧τn = X t∧τn , so (M n+1 )t∧τn = M n and (An+1 )t∧τn = An . The processes {M n } and {An } can, therefore, be “pasted” together to give a local martingale M and a predictable process A of locally finite variation, so the process X in this case is a special semimartingale. In the general case we know that X t∧τn is a semimartingale for each n. However, X is cer tainly a right-continuous process with left limits, so the process Vt = 0<s≤t X s I{|X s |≥1} is of finite variation, as is Y = X − V − X 0 . For each n, Yt∧τn = X t∧τn − Vt∧τn − X 0 is a semimartingale whose jumps are all bounded by 1. Therefore, by Corollary 12.40(c) page 150 in [11], Yt∧τn is a special semimartingale. By the first part of this proof Y is then a special semimartingale and X t = X 0 + Yt + Vt is a semimartingale. Definition 2.6.7 A right-continuous stochastic process {X t } on the stochastic basis (, F, Ft , P) is said to be of class D if the family {X τ }, for all τ which is an a.s. finite stopping time, is uniformly integrable. It is of class DL if it is of class D on each interval [0, a], a < ∞. Definition 2.6.8 A right-continuous uniformly integrable supermartingale {X t } is said to be of class D if the set of random variables {X τ }, for τ any stopping time, is uniformly integrable. Note that any uniformly integrable martingale is of class D. This follows from Doob’s Optional Stopping Theorem 2.5.6 because X τ = E[X ∞ | Fτ ] a.s. The proof of the following important theorem can be found, for instance, in [11]. Theorem 2.6.9 (Doob–Meyer Decomposition). Any class D supermartingale (X t , Ft ) can be written (P-a.s. uniquely) as Mt = X t + A t ,

(2.6.1)

where {Mt , Ft } is a uniformly integrable martingale and {At , Ft } is a predictable, increasing process. Remarks 2.6.10 1. If we replace class D by class DL in the theorem {Mt , Ft } is no longer a uniformly integrable martingale. (See Theorem 4.10 in [21].)

2.6 Doob–Meyer decomposition

65

2. The Doob–Meyer decomposition of a process is the special semimartingale representation of that process because of the predictability of the process {At , Ft }. 3. Recall that in the Doob decomposition for discrete time submartingales the increasing predictable process is given by Zn = E[X i − X i−1 | Fi−1 ]. or Z n = E[X n | Fn−1 ]. By analogy, At is obtained if we replace summation with integration in the following manner: t X s+h − X s At = lim E | Fs ds, h→0 0 h or dZ t = E[dX t | Ft ]. 4. An interesting consequence of the Doob–Meyer Theorem is that any continuous martingale has unbounded variation. To see this suppose that (Mt , Ft ) is a continuous martingale with bounded variation so that it can be written as a difference of two continuous increasing functions X t and Z t , Mt = X t − Z t

or

X t = Mt + Z t ,

which is a Doob–Meyer decomposition of the submartingale X t . But X t = 0 + X t is another Doob–Meyer decomposition of X t . By uniqueness Mt = 0. Example 2.6.11 Suppose τ1 ≤ τ2 ≤ . . . is a sequence of stopping times such that limn τn = +∞ a.s. Then the counting process Nt = I{τn ≤t} n≥1

is an Ft -submartingale and it admits a Doob–Meyer decomposition N t = Yt + Z t . Here, the predictable, increasing process {Z t , Ft } is called the compensator of Nt .

Example 2.6.12 Let X be the single jump process introduced in Example 2.1.4. For t ≥ 0 define the process µ(t, A) = I{T ≤t} I{Z ∈A} .

(2.6.2)

Note that the process µ(t, A) has sample paths which are identically zero until the jump time T . They then have a unit jump at T and Z ∈ A. We now show that the predictable

66

Stochastic processes

compensator of µ is given by µ p (t, A) = −

]0,T ∧t]

dFsA , Fs−

(2.6.3)

where FsA = P[T > s, Z ∈ A] and Fs = P[T > t, Z ∈ E]. Write Ft for the completed σ -field generated by {X s }, s ≤ t, so that Ft is generated by B([0, t]) × E. Note that ]t, ∞] × E is an atom of Ft . We have the following result which will be used later (see Lemma 3.8.9). Lemma 2.6.13 Suppose τ is an Ft stopping time with P(τ < T ) > 0. Then there is a t0 ∈ [0, ∞[ such that τ ∧ T = t0 ∧ T a.s. Proof Suppose τ takes two values t1 < t2 on {ω ∈ = [0, ∞] × E : τ (ω) ≤ T (ω)} with positive probability. Then for t1 < t < t2 , {ω ∈ = [0, ∞] × E : τ (ω) ≤ t}∩]t, ∞] × E = ]t, ∞] × E, so {τ ≤ t} ∈ Ft . Therefore for some t0 ∈ [0, ∞[, {τ ≤ T } ⊂ {t0 ≤ T }. A similar argument gives the reverse inclusion and the result follows.

Theorem 2.6.14 q(t, A) = µ(t, A) − µ p (t, A) is an Ft -martingale. Proof

([11]) For t > s,

E[q(t, A) − q(s, A) | Fs ] = E[µ(t, A) − µ p (t, A) − (µ(s, A) − µ p (s, A)) | Fs ] = E[µ(t, A) − µ(s, A) − (µ p (t, A) − µ p (s, A)) | Fs ] So we must show that E[µ(t, A) − µ(s, A) | Fs ] = E[µ p (t, A) − µ p (s, A) | Fs ].

(2.6.4)

First note that, in view of (2.6.2), if T ≤ s both sides of (2.6.4) are zero. Now recall that ]s, ∞] × E is an atom of Fs E[µ(t, A) − µ(s, A) | Fs ] = E[I{Z ∈A} I{s s, Z ∈ A | T > s, Z ∈ E)I{T >s,Z ∈E} − P(T > t, Z ∈ A | T > s, Z ∈ E)I{T >s,Z ∈E} =

FsA − FtA I{T >s,Z ∈E} . Fs

2.6 Doob–Meyer decomposition

67

On the other hand µ p (t, A) is a function of T only, and F(t) = P(T > t). Therefore, using (2.6.3), E[µ p (t, A) − µ p (s, A) | Fs ]

dFuA dFuA = −E − | T > s, Z ∈ E I{T >s,Z ∈E} ]0,T ∧t] Fu− ]0,T ∧s] Fu− dFuA dFuA I{T >s,Z ∈E} (IT >t + Is s, Z ∈ E) ]0,T ∧t] Fu− ]0,s] Fu−

I{T >s,Z ∈E} dFuA dFuA =− I {T > t, Z ∈ E} − E P(T > s, Z ∈ E) ]0,t] Fu− ]0,s] Fu−

dFuA dFuA + Is
I {T > s, Z ∈ E} dFuA dFuA = −Ft − (−dFr ) . Fs ]s,t] Fu− ]s,t] ]s,r ] Fu− Interchanging the order of integration, the double integral is dFuA 1 (dFr ) = dFr dFuA ]s,t] ]s,r ] Fu− ]s,t] Fu− ]u,t] 1 = (Ft − Fu− )dFuA ]s,t] Fu− dFuA = Ft − (FtA − FsA ). ]s,t] Fu− Therefore (2.6.4) holds and the result follows. A continuous-time, discrete-state stochastic process of great importance in stochastic modeling is the following. Definition 2.6.15 A continuous-time stochastic process {X t }, t ≥ 0, with finite-state space S = {s1 , s2 , . . . , s N }, defined on a probability space (, F, P) is a Markov chain if for all t, u ≥ 0 and 0 ≤ r ≤ u, P(X (t+u) = s j | X u = si , X r = sk ) = P(X (t+u) = s j | X u = si ), for all states si , s j , sk ∈ S. {X t }, t ≥ 0, is a homogeneous Markov chain if

P(X (t+u) = s j | X u = si ) = p ji (t) is independent of u. The family Pt = { p ji (t)} is called the transition semigroup of the homogeneous Markov chain and it satisfies the property Nj=1 p ji (t) = 1.

68

Stochastic processes

The following properties are similar to the discrete-time case. P(t+u) = Pt Pu and P0 = I , where I is the identity matrix. Let p0 = ( p01 , p02 , . . . , p0N ) be the distribution of X 0 and pt = ( pt1 , pt2 , . . . , ptN ) be the distribution of X t . Then pt = Pt p0 . Theorem 2.6.16 Let {Pt }, t ≥ 0 be a continuous transition semigroup. Then there exist

qi = lim h↓0

1 − pii (h) ∈ [0, ∞], h

and

q ji = lim h↓0

Proof

p ji (h) ∈ [0, ∞). h

See, for instance, [5] page 334.

The matrix A = {qi j } is called the infinitesimal generator of the continuous-time homogeneous Markov chain. N Note that since p ji (h) = 1 it follows immediately that j=1

qi = −

N

q ji .

j=i, j=1

The differential system d P(t+h) − Pt Ph − I Pt = lim = Pt lim = Pt A h↓0 h↓0 dt h h is called Kolmogorov’s forward differential system. d Similarly, the system Pt = A Pt is called Kolmogorov’s backward differential system. dt In this finite-state case, a solution for both systems, with initial condition P0 = I , is et A . Example 2.6.17 (Semimartingale representation of a continuous-time Markov chain) Let {Z t } t ≥ 0 be a continuous-time Markov chain, with state space {s1 . . . , s N }, defined on a probability space (, F, P). S will denote the (column) vector (s1 , . . . , s N ) . Suppose 1 ≤ i ≤ N , and for j = i, πi (x) =

N

(x − s j ),

j=1

πi (x) ; then φi (s j ) = δi j and φ = (φ1 , . . . , φ N ) is a bijection of the set πi (si ) {s1 . . . , s N } with the set S = {e1 , . . . , e N } . Here, for 0 ≤ i ≤ N , ei = (0, . . . , 1, . . . , 0) is the i-th unit (column) vector in IR N . Consequently, without loss of generality, we shall consider a Markov chain on S. If X t ∈ S denotes the state of this Markov chain at time t ≥ 0, then the corresponding value of Z t is X t , S, where ., . denotes the inner product in IR N . and φi (x) =

2.6 Doob–Meyer decomposition

69

Write pti = P(X t = ei ), 0 ≤ i ≤ N . We shall suppose that for some family of matrices At , pt = ( pt1 , . . . , ptN ) satisfies the forward Kolmogorov equation d pt = A t pt , dt with p0 known and At = (ai j (t)), t ≥ 0. The fundamental transition matrix associated with A will be denoted by (t, s), so with I the N × N identity matrix, d(t, s) = At (t, s), dt d(t, s) = −(t, s)As , ds

(s, s) = I

(2.6.5)

(t, t) = I.

(If At is constant (t, s) = exp(t − s)A.) Consider the process in state x ∈ S at time s and write X s,t (x) for its state at the later time t ≥ s. Then E s,x [X t | Fs ] = E s,x [X t | X s ] = E s,x [X s,t (x)] = (t, s)x. Write Fts for the right-continuous, complete filtration generated by σ {X r : s ≤ r ≤ t}, and Ft = Ft0 . We have the following representation result. Lemma 2.6.18

Mt = X t − X 0 −

t

Ar X r dr 0

is an {Ft } martingale. Proof

Suppose 0 ≤ s ≤ t. Then t E[Mt − Ms | Fs ] = E X t − X s − Ar X r dr | Fs s t = E Xt − Xs − Ar X r dr | X s s t = E s,X s [X t ] − X s − Ar E s,X s [X r ]dr s t = (t, s)X s − X s − Ar (r, s)X s dr = 0 s

by (2.6.5). Therefore, the (special) semimartingale representation of the Markov chain X is

t

Xt = X0 +

Ar X r dr + Mt .

0

70

Stochastic processes

2.7 Brownian motion Let X be a real valued random variable with E[X 2 ] < ∞, E[X ] = µ and E[X − µ]2 = σ 2 = 0. Recall that X is Gaussian if its probability density function is given by the function 1 (x − µ)2 f (x) = √ , x ∈ IR. exp − 2σ 2 2πσ 2 If X = (X 1 , . . . , X n ) is a vector valued random variable with positive definite covariance matrix C = {Cov(X i , X j )}, i, j = 1, . . . , n, E[X ] = µ = (µ1 , . . . , µn ) , then X = (X 1 , . . . , X n ) is Gaussian if its density function is 1 (x − µ) C −1 (x − µ) f (x1 , . . . , xn ) = exp − , (2π)n/2 (det C)1/2 2 (x1 , . . . , xn ) ∈ IRn . Notice that the first two moments completely characterize a Gaussian random variable and uncorrelatedness implies independence between Gaussian random variables. A continuous-time, continuous-state space stochastic process {Bt } is said to be a standard one-dimensional Brownian motion process if X 0 = 0 a.s., it has stationary independent increments and for every t > 0, Bt is normally distributed with mean 0 and variance t. These features make the {Bt }, perhaps the most well-known and extensively studied continuous-time stochastic process. The joint distribution of any finite number of the random variables Bt1 , Bt2 , . . . , Btn , t1 ≤ t2 ≤ · · · ≤ tn of the process is normal with density 1 1 x 2 n−1 exp − 1 √ 2t1 i=1 2π (ti+1 − ti ) 2πt1 (xi+1 − xi )2 × exp − . 2(ti+1 − ti )

f (x1 , x2 , . . . , xn ) = √

The form of the density function f (x1 , x2 , . . . , xn ) shows that indeed the random variables Bt1 , Bt2 − Bt1 , . . . , Btn − Btn−1 are independent. By the independent increment property, P(Bt ≤ x | Bt0 = xo ) = P(Bt − Bt0 ≤ x − x0 ) x−x0 1 u2 = √ exp − du. 2(t − t0 ) 2π(t − t0 ) −∞ If Bt = (Bt1 , . . . , Btn ) is a vector valued Brownian motion process and x, y ∈ IRn , then 1 |y − x|2 f B (t, x, y) = exp − (2πt)n/2 2t n (yi − xi )2 1 = exp − , √ 2t 2π t i=1

2.7 Brownian motion

71

so that the n components of Bt are themselves independent one-dimensional Brownian motion processes. Some properties of the Brownian motion process The proofs of the following properties are left as exercises. If {Bt } is a Brownian motion process then: 1. the process {−Bt } is a Brownian motion, 2. for any a ≥ 0, the process {Bt+a − Bt } is a Brownian motion and the same result holds if a is replaced with a finite valued stopping time a(ω), 3. for any a = 0, the process {a Bt/a 2 } is a Brownian motion, 4. the process {B1/t }, for t > 0, is a Brownian motion, 5. Almost all the paths of (one-dimensional) Brownian motion visit any real number infinitely often. Theorem 2.7.1 Let {Bt } be a standard Brownian motion process and Ft = σ {Bs : s ≤ t}. Then 1. {Bt } is an Ft -martingale, 2. {Bt2 − t} is an Ft -martingale, and 3. for any real number σ , {exp(σ Bt −

σ2 t)} is an Ft -martingale. 2

Proof 1. Let s ≤ t. E[Bt − Bs | Fs ] = E[Bt − Bs ] = 0 because {Bt } has independent increments and E[Bt ] = E[Bs ] = 0 by hypothesis. 2. E[(Bt − Bs )2 | Fs ] = E[Bt2 − Bs2 ] = t − s. Therefore E[Bt2 − t 2 | Fs ] = Bs2 − s. 1 2 3. If Z is a standard normal random variable, with density √ e−x /2 , and λ ∈ IR then 2π ∞ 1 λx −x 2 /2 λ2 /2 λZ E[e ] = √ e e dx = e . Using the independence of increments and 2π −∞ stationarity we have, for s < t, E[eσ Bt −

σ2 t 2

| Fs ] = eσ Bs − =

σ2 t 2 E[eσ (Bt −Bs )

| Fs ]

σ2 eσ Bs − 2 t E[eσ (Bt −Bs ) ]

= eσ Bs −

σ2 t 2 E[eσ Bt−s ].

Now σ Bt−s is N (0, σ 2 (t − s)); that is, if Z is N (0, 1) as previously, σ Bt−s has the same √ law as σ t − s Z and E[eσ Bt−s ] = E[eσ Therefore E[eσ Bt −

σ 2

2

t | F ] = eσ Bs − s

σ 2

2

√

t−s Z

] = eσ

2

(t−s)/2

.

s and the result follows.

It turns out that Theorem 2.7.1 (2) characterizes a Brownian motion (see Theorem 3.7.3).

72

Stochastic processes

Theorem 2.7.2 (The Strong Markov Property for Brownian Motion) Let {Bt } be a Brownian motion process on a filtered probability space (, F, Ft }, and let τ be a finite valued stopping time with respect to the filtration {Ft }. Then the process B{τ +t} − Bτ , t ≥ 0, is a Brownian motion independent of Fτ Proof

See [34] page 22.

Theorem 2.7.3 (Existence of Brownian Motion) There exists a probability space on which it is possible to define a process {Bt }, 0 ≤ t ≤ 1, which has all the properties of a Brownian motion process. Proof

See [34] page 10.

2.8 Brownian motion process with drift An important stochastic process in applications is the one-dimensional Brownian motion with drift X t = µt + σ Bt , where µ is a constant, called the drift parameter, and Bt is a standard Brownian motion. Then it is easily seen that X t has independent increments and that X t+h − X t is normally distributed with mean µh and variance σ h. By the independent increment property we have P(X t ≤ x | X t0 = xo ) = P(X t − X t0 ≤ x − x0 ) x−x0 1 (u − µ(t − t0 ))2 = √ du. exp − 2(t − t0 )σ 2 2π(t − t0 )σ −∞ 2.9 Brownian paths The sample paths of a Brownian motion process are highly irregular. In fact they model the motion of a microscopic particle suspended in a fluid and subjected to the impacts of the fluid molecules. This phenomenon was first reported by the Scottish botanist Robert Brown in 1828. The path followed by a Brownian particle is very irregular. The sample paths of a Brownian motion process are nowhere differentiable with probability 1. To see this consider the quantity

Zh =

Bt+h − Bt , h

which is normally distributed with variance 1/ h → ∞ as h → 0. Hence for every bounded Borel set B, P(Z h ∈ B) → 0

(h → 0),

2.9 Brownian paths

73

that is, Z h does not converge with positive probability to a finite random variable. Using Kolmogorov’s Continuity Theorem, which we now state, one can show that almost all sample paths of a Brownian motion process are continuous. ˘ Theorem 2.9.1 (Kolmogorov–Centsov Continuity Theorem). Suppose that the stochastic process {X t } satisfies the following conditions. For all T > 0 there exists constants α > 0, β > 0, D > 0 such that: E[|X t − X s |α ] ≤ D|t − s|1+β ;

0 ≤ s, t ≤ T,

(2.9.1)

then almost every sample path is uniformly continuous on the interval [0, T ]. For the proof see [15] page 57. Recall that for a Brownian motion, P(Bt − Bs ≤ x) = √

1 2π|t − s|

x −∞

exp −

u2 du. 2|t − s|

Hence E|Bt − Bs |4 = √

1 2π|t − s|

+∞ −∞

u 4 exp −

u2 du = 3(t − s)2 , 2|t − s|

which verifies the Kolmogorov condition with α = 4, D = 3, β = 1 and establishes the almost sure continuity of the Brownian motion process. We now show that each portion of almost every sample path of the Brownian motion process Bt has infinite length, i.e. almost all sample paths are of unbounded variation, so that terms in a Taylor series expansion which would ordinarily be of second order get promoted to first order. This is one of the most remarkable properties of a Brownian motion process. Lemma 2.9.2 Let Bt be a Brownian motion process and let a = t0n < t1n < · · · < tnn = b

n denote a sequence of partitions of the interval [a, b] such that δn = max(tkn − tk−1 )= n max tk → 0 as n → ∞. n )2 = 2 B n and Write (Btkn − Btk−1 tk

Sn (B) =

n

2 Btkn .

k=1

Then: 1. E[Sn (B) − (b − a)]2 → 0 2. If δn → 0 so fast that

(δn → 0). ∞ n=1

then Sn (B) → b − a

(a.s.)

δn < ∞

(2.9.2)

74

Stochastic processes

Proof E[Sn (B) − (b − a)]2 = E[

1.

n

(2 Btkn − tkn )]2

k=1

=

n

E[2 Btkn ]2 − 2tkn E[2 Btkn ] + (tkn )2 ]

k=1

=

n

(3(tkn )2 − 2(tkn )2 + (tkn )2 )

k=1

=

n

2(tkn )2 ≤ 2δn

k=1

n

2(tkn ) = 2δn (b − a),

k=1

which goes to zero as δn → 0 and E[Sn − (b − a)]2 → 0. 2. By Chebyshev’s inequality (1.3.33) P(|Sn (B) − (b − a)| ≥ ) ≤

Var(Sn (B) − (b − a)) 2δn (b − a) ≤ . 2 2

(2.9.3)

In view of (2.9.2) we can sum up both sides of (2.9.3) and use the Borel–Cantelli Lemma (1.2.7) to get P(lim sup{|Sn (B) − (b − a)| ≥ }) = 0, and the event {ω : |Sn (B(ω)) − (b − a)| ≥ } occurs only a finite number of times with probability 1 as n increases to infinity. Therefore we have almost sure convergence.

The above argument shows that Bt (ω) is, a.s., of infinite variation on [a, b]. To see this note that n (ω)| b − a ≤ lim sup max |Btkn (ω) − Btk−1

n

n |. |Btkn − Btk−1

k=1 n (ω)| can be made From the sample-path continuity of Brownian motion, max |Btkn (ω) − Btk−1 n n n arbitrarily small for almost all ω which implies that k=1 |Btk − Btk−1 | → ∞ for almost all ω as n → ∞. There is a simple construction for Brownian motion. Take a sequence X 1 , X 2 , . . . , of i.i.d. N (0, 1) random variables and an orthonormal basis {φn } for L 2 [0, 1]. That is t φn , φn L 2 = φn2 (s)ds = 1,

0

and φm , φn L 2 = 0

t

φm (s)φn (s)ds = 0,

2.10 Poisson process

if m = n. For t ∈ [0, 1] define Btn

=

n

Xk

k=1

t

75

φk (s)ds.

0

Using the Parseval equality it is seen that E[Btn − Btm ]2 → 0

(n, m → ∞).

The completeness of L 2 [0, 1] implies the existence of a limit process Bt with the same covariance function as a Brownian motion. It can also be shown that Btn converges uniformly in t ∈ [0, 1] to Bt with probability 1 (a.s.), that is, {Bt } has continuous sample paths a.s. 2.10 Poisson process A continuous-time, discrete-state space, stochastic process which keeps the count of the occurrences of some specific event (or events) {Nt }t≥0 is called a counting process. The Poisson process is a counting process which, like the Brownian motion, has independent increments, but its sample paths are not continuous. They are increasing step functions with each step having height 1 and a random waiting time between two consecutive jumps. The times between successive jumps are independent and exponentially distributed with parameter λ > 0. The joint probability distribution of any finite number of values Nt1 , Nt2 , . . . , Ntn of the process is P[Nt1 = k1 , . . . , Ntn = kn ] =

[λ(ti+1 − ti )]ki+1 −ki (λt1 )k1 exp(−λt1 ) n−1 exp(−λ(ti+1 − ti )), k1 ! (ki+1 − ki )! i=1

provided that t1 ≤ t2 ≤ · · · ≤ tn and k1 ≤ k2 ≤ · · · ≤ kn . The Poisson process is a.s. continuous at any point, as shown by P(ω : lim |Nt+ (ω) − Nt (ω)| = 0) = lim e−λ = 1. →0

→0

However, the probability of continuity at all points in any interval is less than 1 so that it is not (a.s.) sample path continuous. Like any process with independent increments, the Poisson process is Markovian (see 2.2.4). However, the independent increment assumption is stronger than the Markov property. 2.11 Problems 1. Show that the Borel σ -field B(IR∞ ) coincides with the smallest σ -field containing the |xk1 − xk2 | open sets in IR∞ in the metric ρ∞ (x 1 , x 2 ) = k 2−k . 1 + |xk1 − xk2 | 2. Suppose that at time 0 you have $a and your component has $b. At times 1, 2, . . . you bet a dollar and the game ends when somebody has $0. Let Sn be a random walk

76

3.

4. 5.

6.

Stochastic processes

on the integers {. . . , −2, −1, 0, +1, +2, . . . } with P(X = −1) = q, P(X = +1) = p. Let α = inf{n ≥ 1 : Sn = −a, +b}, i.e. the first time you or your component is ruined, then {Sn∧α }∞ n=0 is the running total of your profit. Show that if p = q = 1/2, {Sn∧α } is a bounded martingale with mean 0 and that the probability of your ruin is b/(a + b). q Sn Show that if the game is not fair ( p = q) then Sn is not a martingale but Yn = is p a martingale. Find the probability of your ruin and check that if a = b = 500, p = .499 and q = .501 then P(ruin) = .8806 and it is almost 1 if p = 1/3. Show that if {X n } is an integrable, real valued process, with independent increments and mean 0, then it is a martingale with respect to the filtration it generates; and if in addition X n2 is integrable, X n2 − E(X n2 ) is a martingale with respect to the same filtration. Let {X n } be a sequence of i.i.d. random variables with E[X n ] = 0 and E[X n2 ] = 1. n Show that Sn2 − n is an Fn = σ {X 1 , . . . , X n }-martingale, where Sn = i=1 Xi . Let {yn } be a sequence of independent random variables with E[yn ] = 1. Show that the sequence X n = nk=0 yn is a martingale with respect to the filtration Fn = σ {y0 , . . . , yn }. Let {X n } and {Yn } be two sequences of i.i.d. random variables with E[X n ] = E[Yn ] = 0, E[X n2 ] < ∞, E[Yn2 ] < ∞ and Cov(X n , Yn ) = 0. Show that {SnX SnY −

n

Cov(X i , Yi )}

i=1

is an Fn = σ {X 1 , . . . , X n , Y1 , . . . , Yn }-martingale, where n n SnX = i=1 X i , SnY = i=1 Yi . 7. Show that two square integrable martingales X and Y are orthogonal if and only if X 0 Y0 = 0 and the process {X n Yn } is a martingale. 8. Show that the square integrable martingales X and Y are orthogonal if and only if for every 0 ≤ m ≤ n, E[X n Yn | Fm ] = E[X n | Fm ]E[Yn | Fm ]. 9. Let {Bt } be a standard Brownian motion process (B0 = 0, a.s., σ 2 = 1). Show that the conditional density of {Bt } for t1 < t < t2 , P(Bt ∈ dx | Bt1 = x1 , Bt2 = x2 ), is a normal density with mean and variance µ = x1 +

x2 − x1 (t − t1 ), t2 − t1

σ2 =

(t2 − t)(t − t1 ) . t2 − t1

10. Let {Bt } be a standard Brownian motion process. Show that the density of α = inf{t, Bt = b}, the first time the process Bt hits level b ∈ IR (see Example 2.2.5), is given by |b| −b/2t f α (t) = √ e ; 2π t 3

t > 0.

2.11 Problems

77

11. Let {Bt } be a Brownian motion process with drift µ and diffusion coefficient σ 2 . Let xt = e Bt ,

t ≥ 0.

Show that E[xt | x0 = x] = xet(µ+ 2 σ ) , 1

2

and var[xt | x0 = x] = x 2 e2t(µ+ 2 σ 1

2

)

2 etσ − 1 .

12. Let Nt be a standard Poisson process and Z 1 , Z 2 . . . a sequence of i.i.d. random variables such that P(Z i = 1) = P(Z i = −1) = 1/2. Show that the process Xt =

Nt

Zi

i=1

is a martingale with respect to the filtration {Ft } = σ {X s , s ≤ t}. 13. Show that the process {Bt2 − t, FtB } is a martingale, where B is the standard Brownian motion process and {FtB } its natural filtration. 14. Show that the process {(Nt − λt)2 − λt} is a martingale, where Nt is a Poisson process with parameter λ. 15. Show that the process t It = f (ω, s)dMs 0

is a martingale. Here f (.) is an adapted, bounded, continuous sample paths process and Mt = Nt − λt is the Poisson martingale. 16. Referring to Example 2.4.4, define the processes Nnsr =

n

I(ηk−1 =s,ηk =r ) =

k=1

n

X k−1 , es X k , er ,

(2.11.1)

k=1

and Onr =

n

I(ηk =r ) =

k=1

n

X k , er .

(2.11.2)

k=1

Show that 2.11.1 and 2.11.2 are increasing processes and give their Doob decompositions. 17. Let {X k , Fk }, for 0 ≤ k ≤ n be a martingale and α a stopping time. Show that E[X α ] = E[X 0 ]. 18. Let α be a stopping time with respect to the filtration {X n , Fn }. Show that Fα = {A ∈ F∞ : A ∩ {ω : α(ω) ≤ n} ∈ Fn is a σ -field and that α is Fα -measurable.

∀ n ≥ 0}

78

Stochastic processes

19. Let {X n } be a stochastic process adapted to the filtration {Fn } and B a Borel set. Show that α B = inf{n ≥ 0,

X n ∈ B}

is a stopping time with respect Fn . 20. Show that if α1 , α2 , are two stopping times such that α1 ≤ α2 (a.s.) then Fα1 ⊂ Fα2 . 21. Show that if α is a stopping time and a is a positive constant, then α + a is a stopping time. 22. Show that if {αn } is a sequence of stopping times and the filtration {Ft } is rightcontinuous then inf αn , lim inf αn and lim sup αn are stopping times.

3

Stochastic calculus

3.1 Introduction It is known that if a function f is continuous and a function g is right continuous with left limits, of bounded variation (see Definition 2.6.1), then the Riemann–Stieltjes integral of f with respect to g on [0, t] is well-defined and equals t n n f (s)dg(s) = lim f (τkn )(g(tkn ) − g(tk−1 )), 0

δn →0

k=1

where 0 = t0n < t1n < · · · < tnn = t denotes a sequence of partitions of the interval [0, t] n n such that δn = max(tkn − tk−1 ) → 0 as n → ∞ and tk−1 ≤ τkn ≤ tkn . The Lebesgue–Stieltjes integral with respect to g can be defined by constructing a measure µg on the Borel field B([0, ∞)), starting with the definition µg ((a, b]) = g(b) − g(a), and then starting with the integral of simple functions f with respect to µg , as in Chapter 1. For right continuous left limited stochastic processes with bounded variation sample paths, path-by-path integration is defined for each sample path by fixing ω and performing Lebesgue–Stieltjes integration with respect to the variable t. If a continuous (local) martingale X has bounded variation, its quadratic variation is zero (see Remark 2.6.10(4)). However, continuous (local) martingales have unbounded variation, so that the Stieltjes definition cannot be used in stochastic integration to define path-by-path integrals. We assume the dependence of f as ω is constant in time.

3.2 Quadratic variations Discrete-time processes Definition 3.2.1 The stochastic process X = {X n }, n ≥ 0, is said to be square integrable if supn E[X n2 ] < ∞.

80

Stochastic calculus

Definition 3.2.2 Let {X n } be a discrete time, square integrable stochastic process on a filtered probability space (, F, {Fn }, P). 1. The nonnegative, increasing process defined by [X, X ]n = X 02 +

n

(X k − X k−1 )2

k=1

is called the optional quadratic variation of {X n }. The predictable quadratic variation of {X n } relative to the filtration {Fn } and probability measure P is defined by n

X, X n = E(X 02 ) +

E[(X k − X k−1 )2 | Fk−1 ].

k=1

2. Given two square integrable processes {X n } and {Yn } the optional covariation process is defined by [X, Y ]n = X 0 Y0 +

n

(X i − X i−1 )(Yi − Yi−1 ),

i=1

and the predictable covariation process is defined by X, Y n = E[X 0 Y0 ] +

n

E[(X i − X i−1 )(Yi − Yi−1 ) | Fi−1 ].

i=1

Example 3.2.3 Let X 1 , X 2 , . . . be a sequence of i.i.d. normal random variables with mean 0 and variance 1, and consider the process Z 0 = 0 and Z n = nk=1 X k . Then it is left as an exercise to show that [Z , Z ]n =

n

X k2 ,

k=1

Z , Z n = n, E([Z , Z ]n ) = E(

n

X k2 ) = n.

k=1

Here Z , Z n is not random and is equal to the variance of Z n .

Example 3.2.4 Let = {ωi , 1 ≤ i ≤ 8} and the time index be n = 1, 2, 3. Suppose we are given a probability measure P(ωi ) = 1/8, i = 1, . . . , 8, a filtration F0 = {, ∅}, F1 = σ {{ω1 , ω2 , ω3 , ω4 }, {ω5 , ω6 , ω7 , ω8 }}, F2 = σ {{ω1 , ω2 }, {ω3 , ω4 }, {ω5 , ω6 }, {ω7 , ω8 }}, F3 = σ {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 , ω7 , ω8 },

3.2 Quadratic variations

81

and a stochastic process X given by:

X=

X 0 (ω1 ) X 0 (ω2 ) . . .

X 0 (ω8 )

X 1 (ω1 ) X 1 (ω2 ) . . .

X 1 (ω8 )

X 2 (ω1 ) X 2 (ω2 ) . . .

X 2 (ω8 )

X 3 (ω1 ) X 3 (ω2 ) . . .

X 3 (ω8 )

which is adapted to the filtration {Fi , i = 0, 1, 2, 3}, that is x0 X=

x0

x0

x0

x0

x0

x0

x0

x1,1 x1,1 x1,1 x1,1 x1,2 x1,2 x1,2 x1,2 x2,1 x2,1 x2,2 x2,2 x2,3 x2,3 x2,4 x2,4

.

x3,1 x3,2 x3,3 x3,4 x3,5 x3,6 x3,7 x3,8 In this simple example the stochastic process X, X n = E(X 02 ) +

n

E[(X k − X k−1 )2 | Fk−1 ]

k=1

can be explicitly calculated: X, X 0 = E(X 02 ) = x02 , X, X 1 = E(X 02 ) + E[(X 1 − X 0 )2 | F0 ] = x02 + E[(X 1 − X 0 )2 ] 4 4 = x02 + (x1,1 − x0 )2 + (x1,2 − x0 )2 . 8 8 Note that X, X 0 and X, X 1 are both F0 -measurable, that is, they are constants. X, X 2 (ω) = E(X 02 ) + E[(X 1 − X 0 )2 ] + E[(X 2 − X 1 )2 | F1 ](ω) = X, X 1 + E[(X 2 − X 1 )2 | {ω1 , ω2 , ω3 , ω4 }]I{ω1 ,ω2 ,ω3 ,ω4 } + E[(X 2 − X 1 )2 | {ω5 , ω6 , ω7 , ω8 }]I{ω5 ,ω6 ,ω7 ,ω8 } = X, X 1 + +

(x2,1 − x1,1 )2 2/8 + (x2,2 − x1,1 )2 2/8 I{ω1 ,ω2 ,ω3 ,ω4 } P{ω1 , ω2 , ω3 , ω4 } = 4/8

(x2,3 − x1,2 )2 2/8 + (x2,4 − x1,2 )2 2/8 I{ω5 ,ω6 ,ω7 ,ω8 } P{ω5 , ω6 , ω7 , ω8 } = 4/8

4 4 = x02 + (x1,1 − x0 )2 + (x1,2 − x0 )2 8 8 +

(x2,1 − x1,1 )2 + (x2,2 − x1,1 )2 I{ω1 ,ω2 ,ω3 ,ω4 } 2

+

(x2,3 − x1,2 )2 + (x2,4 − x1,2 )2 I{ω5 ,ω6 ,ω7 ,ω8 } . 2

82

Stochastic calculus

Note that X, X 2 is F1 -measurable. X, X 3 (ω) = E(X 02 ) + E[(X 1 − X 0 )2 ] + E[(X 2 − X 1 )2 | F1 ](ω) + E[(X 3 − X 2 )2 | F2 ](ω) = X, X 2 + E[(X 3 − X 2 )2 | F2 ](ω) = X, X 2 +

(x3,1 − x2,1 )2 + (x3,2 − x2,1 )2 I{ω1 ,ω2 } 2

+

(x3,3 − x2,2 )2 + (x3,4 − x2,2 )2 I{ω3 ,ω4 } 2

+

(x3,5 − x2,3 )2 + (x3,6 − x2,3 )2 I{ω5 ,ω6 } 2

+

(x3,7 − x2,4 )2 + (x3,8 − x2,4 )2 I{ω7 ,ω8 } . 2

Note that X, X 3 is F2 -measurable.

Theorem 3.2.5 If {X n } is a square integrable martingale then X 2 is a submartingale and X 2 − X, X is a martingale, i.e. X, X is the unique predictable, increasing process in the Doob decomposition of X 2 . Proof

From Jensen’s inequality 2.3.3, 2 E[X n2 | Fn−1 ] ≥ (E[X n | Fn−1 ])2 = X n−1 .

Hence X 2 is a submartingale. The rest of the proof is left as an exercise.

Theorem 3.2.6 If X and Y are (square integrable) martingales, then X Y − [X, Y ] and X Y − X, Y are martingales. Proof E(X n Yn − [X, Y ]n | Fn−1 ) = −[X, Y ]n−1 + E(X n Yn − (X n − X n−1 )(Yn − Yn−1 ) | Fn−1 ) = −[X, Y ]n−1 − X n−1 Yn−1 + E(X n Yn − X n Yn + X n Yn−1 + X n−1 Yn | Fn−1 ) = −[X, Y ]n−1 − X n−1 Yn−1 + 2X n−1 Yn−1 = X n−1 Yn−1 − [X, Y ]n−1 . The proof for X Y − X, Y is similar. Two martingales X and Y are orthogonal if and only if X, Y n = 0 for, all n.

3.2 Quadratic variations

83

Example 3.2.7 Returning to Example 2.3.15, we call the stochastic process X = n n k=1 Ak bk = k=1 Ak C k a stochastic integral with predictable integrand A and integrator the martingale C. Note that the predictability of the integrand is a rather natural requirement. In discrete time the stochastic integral is usually called the martingale transform and it is usually written (A • C)n =

n

Ak Ck .

k=1

Stochastic integrals can be defined for more general integrands and integrators. Theorem 3.2.8 For any discrete time process X = {X n } we have: n

X k−1 X k =

k=1

1 2 (X − [X, X ]n ). 2 n

Proof 2

n

X k−1 X k + [X, X ]n =

k=1

n

[2X k−1 (X k − X k−1 ) + (X k − X k−1 )2 ] = X n2 .

k=1

1 2 (X − X 02 ) 2 t we should replace the integrand X n−1 by a non-predictable one, (X n−1 + X n )/2. This is a discrete-time Stratonovitch integral and: In order to recover the analog of the familiar form of the integral

X s dX s =

n X k−1 + X k 1 X k = (X n2 − X 02 ). 2 2 k=1

However, we then lose the martingale property of the stochastic integral. The following result, which is proved using the identity 1 ([X + Y, X + Y ]n − [X, X ]n − [Y, Y ]n ), 2 is the integration (or summation) by parts formula. [X, Y ]n =

Theorem 3.2.9 X n Yn =

n k=1

X k−1 Yk +

n

Yk−1 X k + [X, Y ]n .

k=1

We now state the rather trivial discrete-time version of the so-called Itˆo formula of stochastic calculus. Theorem 3.2.10 For a real valued differentiable function f and a stochastic process X we have n n f (X n ) = f (X 0 ) + f (X k−1 )X k + [ f (X k ) − f (X k−1 ) − f (X k−1 )X k ]. k=1

k=1

84

Stochastic calculus

Continuous-time processes We begin by recalling few definitions and results regarding deterministic functions. Definition 3.2.11 The quadratic variation Sn ( f ) of a function f on an interval [a, b] is Sn ( f ) =

n

n ( f (tkn ) − f (tk−1 ))2 ,

k=1

where a = < < · · · < tnn = b denotes a sequence of partitions of the interval [a, b] n such that δn = max(tkn − tk−1 ) → 0 as n → ∞. t0n

t1n

Lemma 3.2.12 If f is a continuous real valued function of bounded variation (see Definition 2.6.1) then its quadratic variation on any interval [a, b] is 0, that is lim Sn ( f ) = lim

n→∞

n→∞

n

n ( f (tkn ) − f (tk−1 ))2 = 0,

k=1

where a = t0n < t1n < · · · < tnn = b denotes a sequence of partitions of the interval [a, b] n such that δn = max(tkn − tk−1 ) → 0 as n → ∞. Proof Since f is continuous and of bounded variation there exists M > 0 such that for ε > 0 we can choose a partition so fine that ε n maxk (| f (tkn ) − f (tk−1 )|) < . nM Sn ( f ) < M

n

n | f (tkn ) − f (tk−1 )| < Mn

k=1

ε = ε, nM

and the result follows. Let {X t , Ft } be a square integrable martingale. Then {X t2 , Ft } is a nonnegative submartingale, hence of class DL and from the Doob–Meyer decomposition there exists a unique predictable increasing process {X, X t , Ft } such that X t2 = Mt + X, X t , where {Mt , Ft } is a right-continuous martingale and X, X 0 = X 02 . Lemma 3.2.13 Suppose X = {X t , Ft } is a square integrable martingale. Then: 1. X = X c + X d , where X c is the continuous martingale part of X and X d is the purely discontinuous martingale part of X . This decomposition is unique. 2 2. E[ s X s2 ] ≤ E[X ∞ ], where X ∞ = limt→∞ X t . 2 3. For any t, s≤t X s < ∞ a.s. Proof

See [11] page 97.

The following result is analogous to Lemma 3.2.13.

3.2 Quadratic variations

85

Lemma 3.2.14 Suppose X = {X t , Ft } is a local martingale. Then: 1. X = X c + X d , where X c is the continuous local martingale part of X and X d is the purely discontinuous local martingale part of X . This decomposition is unique. 2. For any t, X s2 < ∞ a.s. s≤t

Proof

See [11] page 119.

Definition 3.2.15 Let X = {X t , Ft } be a square integrable martingale. 1. X, X is called the predictable quadratic variation of X . 2. The optional increasing process [X, X ]t = X c , X c t + (X s )2 s≤t

is called the optional quadratic variation of X . X = X c + X d is the unique decomposition given by Lemma 3.2.13. Example 3.2.16 If {Nt } is a Poisson process with parameter λ, Ns = 0 or 1 for all s ≥ 0 and N c , N c t = 0. Therefore [N , N ]t = (Ns )2 = Nt . 0≤s≤t

Since {Nt − λt} is a martingale that is 0 at 0, we have N , N t = λt. Theorem 3.2.17 If X = {X t , Ft } is a continuous local martingale, there exists a unique increasing process X, X , vanishing at zero, such that X 2 − X, X is a continuous local martingale. Proof

See [32] page 124.

Definition 3.2.18 Suppose X = X 0 + M + V is a semimartingale (see Definition 2.6.4). Then the optional quadratic variation of X is the process [X, X ]t = X c , X c t + (X s )2 . s≤t

By definition V has finite variation in [0, t], (Vs )2 ≤ K |Vs )| < ∞, s≤t

s≤t

for some K . Also, from Lemma 3.2.14 s≤t (Ms )2 < ∞. Therefore finite because (X s )2 ≤ (Ms )2 + (Vs )2 .

s≤t (X s )

2

is a.s.

86

Stochastic calculus

Lemma 3.2.19 Almost every sample path of [X, X ] is right-continuous with left limits and of finite variation on each compact subset of IR. Further, [X, X ]t < ∞ a.s. for each t ∈ [0, ∞). Proof

See [11].

Definition 3.2.20 Suppose {X t , Ft } and {Yt , Ft } are two square integrable martingales. Then 1 X, Y = (X + Y, X + Y − X, X − Y, Y ). 2 X, Y is the unique predictable process of integrable variation (see Definition 2.6.2) such that X Y − X, Y is a martingale and X 0 Y0 = X, Y 0 . Two square integrable martingales X and Y are called orthogonal martingales if X, Y t = 0, a.s., holds for every t ≥ 0. Remark 3.2.21 From the definition, the orthogonality of two square integrable martingales X and Y implies that X Y is a martingale. Conversely, from the identity E[(X t − X s )(Yt − Ys ) | Fs ] = E[X t Yt − X t Ys − X s Yt + X s Ys | Fs ] = E[X t Yt − X s Ys | Fs ] = E[X, Y t − X, Y s | Fs ], if X Y is a martingale the two square integrable martingales X and Y are orthogonal.

Definition 3.2.22 Suppose {X t , Ft } and {Yt , Ft } are two square integrable martingales. Define 1 ([X + Y, X + Y ] − [X, X ] − [Y, Y ]). 2 Then [X, Y ] is of integrable variation (see Definition 2.6.2), [X, Y ] =

X Y − [X, Y ] is a martingale and X 0 Y0 = [X, Y ]0 = X 0 Y0 . Remark 3.2.23 From the definition [X, Y ]t = X c , Y c t +

X s Ys .

s≤t

Definition 3.2.24 Suppose X = {X t , Ft } is a local martingale and let X = X c + X d be its unique decomposition into a continuous local martingale and a totally discontinuous local martingale. Then the optional quadratic variation of X is the increasing process [X, X ]t = X c , X c t + (X s )2 . s≤t

3.3 Simple examples of stochastic integrals

87

If X , Y are local martingales, 1 ([X + Y, X + Y ] − [X, X ] − [Y, Y ]) 2 = X c , Y c t + X s Ys .

[X, Y ] =

s≤t

We end this section with the following useful inequalities. Write H2 = {uniformly integrable (see Definition 1.3.34) martingales {Mt } such that sup |Mt | ∈ L 2 }

(3.2.1)

t

Theorem 3.2.25 Suppose X, Y ∈ H2 and f , g are measurable processes. (See Definition 2.1.9.) If 1 < p < ∞ and 1/ p + 1/q = 1 then ∞ 1/2 ∞ 1/2 ∞ 2 2 E | f s ||gs ||dX, Y s | ≤ f s dX, X s gs dY, Y s , 0 0 0 p

and

E 0

Proof

∞

1/2 ∞ | f s ||gs ||d[X, Y ]s | ≤ f s2 d[X, X ]s 0

p

q

1/2 ∞ gs2 d[Y, Y ]s . 0 q

See [11] page 102.

Theorem 3.2.26 (Time-Change for Martingales). If M is an Ft -continuous local martingale vanishing at 0 and such that M, M∞ = ∞ and if we set Tt = inf{s : M, Ms > t}, then Bt = MTt is an FTt -Brownian motion and Mt = BM,Mt . Proof

See [32] page 181.

3.3 Simple examples of stochastic integrals Example 3.3.1 Suppose {X t }, t ≥ 0, is a stochastic process representing the random price of some asset. Consider a partition 0 = t0n < t1n < · · · < tnn = t of the interval [0, t]. Suppose ξti , i = 0, 1, . . . , n − 1, is the amount of the asset which is bought at time ti for the price X ti . This amount ξti is held until time ti+1 when it is sold for price X ti+1 . The amount gained (or lost) is, therefore, ξti (X ti+1 − X ti ). Then ξti+1 is bought at time ti+1 . Clearly ξti should be predictable with respect to the filtration {FtX } generated by X . Then t n−1

ξti (X ti+1 − X ti ) = ξs dX s i=0

0

is the total increase (or loss) in the trader’s wealth from holding these amounts of the asset.

88

Stochastic calculus

Example 3.3.2 Since the sample paths of a Poisson process Nt are increasing and of finite variation we can write t ∞ X s (ω)dNs (ω) = X αk (ω) (ω)I(αk ≤t) (ω), 0

k=1

where αk is the time of the k-th jump. Recall that the number of jumps in any finite interval [0, t] is finite with probability 1. Hence the infinite series has only finitely many nonzero terms for almost all ω. Example 3.3.3 Stochastic integration with respect to the family of martingales q(t, A) related to the single jump process (See Examples 2.1.4 and 2.6.12) is simply ordinary (Stieltjes) integration with respect to the measures µ and µ p applied to suitable integrands. Recall that µ picks out the jump time T and the location Z of the stochastic process X , that is µ(ds, dz) is non-zero only when T ∈ ds and Z ∈ dz. Therefore, we may write for any suitable real valued function g defined on = [0, ∞] × E: g(s, z)q(ds, dz) = g(s, z)µ(ds, dz) − g(s, z)µ p (ds, dz),

where

g(s, z)µ(ds, dz) = g(T, Z ),

since the random measure µ picks out the jump time T and the location Z only. We say that g ∈ L 1 (µ) if

||g|| L 1 (µ) = E |g|dµ = E[g(T, Z )] < ∞. E

We say that g ∈ if g I{t<τn } ∈ L 1 (µ) for some sequence of stopping times τn ↑ ∞ a.s. Using (2.1.1) and (2.1.2) we have dFsA µ p (t, A) = − = λ(A, s)d(s). (3.3.1) ]0,T ∧t] Fs− ]0,T ∧t] L 1loc (µ)

Hence

We also have

g(s, z)µ p (ds, dz) =

Define g

Mt =

g(s, z)λ(dz, s)d(s). ]0,T ]

E

g(s, z)µ p (ds, dz) =

g(s, z) ]0,T ]

E

P(ds, dz) . Fs−

E

=

I{s≤t} g(s, z)q(ds, dz)

I{s≤t} g(s, z)µ(ds, dz) − E

I{s≤t} g(s, z)µ p (ds, dz), E

3.3 Simple examples of stochastic integrals

89

or, from the definition,

g

Mt = g(T, Z )I{T ≤t} +

g(s, z) ]0,T ∧t]

E

P(ds, dz) . Fs−

g

Theorem 3.3.4 Mt is an Ft -martingale for g ∈ L 1 (µ). Proof

For t > s,

g E[Mt − Msg | Fs ] = E g(T, Z )(I{T ≤t} − I{T ≤s} ) P(du, dz) P(du, dz) + g(u, z) − g(u, z) | Fs . Fu− Fu− ]0,T ∧t] E ]0,T ∧s] E So we must show that E[g(T, Z )(I{T ≤t} − I{T ≤s} ) | Fs ] P(du, dz) = −E g(u, z) Fu− ]0,T ∧t] E P(du, dz) − g(u, z) | Fs . Fu− ]0,T ∧s] E First note that if T ≤ s both sides of (3.3.2) are zero. Now E[g(T, Z )(I{T ≤t} − I{T ≤s} ) | Fs ] = E[g(T, Z )I{s s]I{T >s} I{T >s} = g(u, z)P(du, dz), Fs ]s,t] E and

P(du, dz) P(du, dz) −E g(u, z) − g(u, z) | Fs Fu− Fu− ]0,T ∧t] E ]0,T ∧s] E P(du, dz) = −E g(u, z) Fu− ]0,T ∧t] E P(du, dz) − g(u, z) | T > s I{T >s} Fu− ]0,s] E I{T >s} P(du, dz) E =− g(u, z) P(T > s) Fu− ]0,T ∧t] E P(du, dz) (I{T >t} + I{ss} P(du, dz) = −Ft g(u, z) Fs Fu− ]s,t] E P(du, dz) + g(u, z) dFr . Fu− ]s,t] ]s,r ] E

(3.3.2)

90

Stochastic calculus

Interchanging the order of integration, the triple integral is P(du, dz) = g(u, z) dFr Fu− ]s,t] ]s,r ] E 1 = dFr g(u, z)P(du, dz) ]s,t] Fu− ]u,t] E 1 = (Ft − Fu− )g(u, z)P(du, dz) ]s,t] E Fu− P(du, dz) = Ft g(u, z) − g(u, z)P(du, dz). Fu− ]s,t] E ]s,t] E Therefore (3.3.2) holds and the result follows.

3.4 Stochastic integration with respect to a Brownian motion Let B = {Bt , t ≥ 0} be a Brownian motion process and let 0 = t0n < t1n < · · · < tnn = t n denote a sequence of partitions of the interval [0, t] such that δn = max(tkn − tk−1 ) → 0 as n → ∞. Write formally t It = Bs dBs . 0

If formula were true for stochastic integrals It = Bt2 − t the usual integration-by-parts 1 2 of the limit, as 0 Bs dBs , so It = 2 Bt . (This assumes the existence, in some sense, n n δn = max(tk − tk−1 ) → 0 ( n → ∞), of the Riemann–Stieltjes sums Sn = nk=1 Bτkn (Btkn − n n n n ), where t Btk−1 k−1 ≤ τk ≤ tk .) Now Sn can be written as Sn =

1 2 B + Sn , 2 t

(3.4.1)

where Sn = − +

n n 1 2 2 n ) + n ) (Btkn − Btk−1 (Bτkn − Btk−1 2 k=1 k=1 n

n ). (Btkn − Bτkn )(Bτkn − Btk−1

k=1

To see this write n ) = (B n − B n n )(B n − B n + B n − B n ) Bτkn (Btkn − Btk−1 τk tk−1 + Btk−1 tk τk τk tk−1

2 n ) + (B n − B n ) = (Btkn − Bτkn )(Bτkn − Btk−1 τk tk−1 n (B n − B n ). + Btk−1 tk tk−1

(3.4.2)

3.4 Integration with respect to a Brownian motion

91

The last term in (3.4.2) is written n (B n − B n ) = (B n n ) Btk−1 tk tk−1 tk−1 − Btkn + Btkn )(Btkn − Btk−1

2 n ) + B n (B n − B n ) = −(Btkn − Btk−1 tk tk tk−1

1 2 n ) − 2B n (B n − B n )] = − [2(Btkn − Btk−1 tk tk tk−1 2 1 1 1 2 2 2 n ) − n ) − B n] + = − (Btkn − Btk−1 [(Btkn − Btk−1 Bn tk 2 2 2 tk 1 1 2 1 2 n ) − = − (Btkn − Btk−1 + Bt2kn . Btk−1 n 2 2 2 n Using this form of Sn one can show that if τkn = (1 − α)tk + αtk−1 , 0 ≤ α ≤ 1, then

L2

lim Sn =

δn →0

Bt2 + (α − 12 )t = It (α), 2

where Sn is given by (3.4.1). It is interesting to notice that the stochastic integral It (α) =

Bt2 + (α − 12 )t 2

n n -measurable is an Ft -martingale if and only if α = 0. When α = 0 the integrand Btk−1 is Ftk−1 and so does not anticipate future events in Ftkn . Then, because B has independent increments, n n Btk−1 is independent of the integrator Btkn − Btk−1 which gives E[Sn ] = 0. t K. Itˆo [17] has given a definition of the integral f (s, ω)dBs (ω) for the class of pre-

0

dictable, locally square integrable stochastic processes { f (t, ω)}. The next important step was given by H. Kunita and S. Watanabe in 1967 [24]. They extended the definition of Itˆo by replacing the Brownian motion process by an arbitrary square integrable martingale {X t } employing the quadratic variation processes X, X t . The stochastic (Itˆo) integral with respect to a Brownian motion integrator will be defined for two classes of integrands. The larger class of integrands gives an integral which is a local martingale. The more restricted class of integrands gives an integral which is a martingale. Suppose (, F, P) is a probability space and B = {Bt , t ≥ 0} is a standard Brownian motion. Write Ft0 = σ {Bu : u ≤ t} and {Ft , t ≥ 0} for the right continuous, complete filtration generated by B. Let H be the set of all adapted, measurable processes { f (ω, t), Ft } such that with probability 1, 0

t

f 2 (ω, s)ds < ∞,

∀t ≥ 0,

92

Stochastic calculus

and let {H 2 , ||.|| H 2 } be the normed space of all adapted, measurable processes { f (ω, t), Ft } such that t 2 E f (ω, s)ds < ∞, ∀t ≥ 0, 0

where || f || H 2 = E

t 0

f 2 (ω, s)ds

1/2

, for f ∈ H 2 .

It is clear that H 2 ⊂ H , since for a nonnegative random variable X , if P(X = ∞) = 0

then

E[X ] = ∞,

in other words, if E[X ] < ∞

P(X = ∞) = 0. t In our case the nonnegative random variable is f 2 (ω, s)ds. then

0

As in the definition of the (deterministic) Stieltjes integral, a natural way to define the stochastic integral is to start with simple functions, that is, piecewise constant functions. Definition 3.4.1 A (bounded and predictable) function f (ω, t) is simple on the interval [0, t] if f (0, ω) is constant and for s ∈ (0, t], f (s, ω) =

n−1

f k (ω)I(tk ,tk+1 ] (s),

k=0

where 0 = t0 , . . . , tn = t is a partition of the interval [0, t] independent of ω, each f k (ω) is Ftk measurable and E[ f k2 ] < ∞. For any simple function f (ω, t) ∈ H (or f (ω, t) ∈ H 2 ) the Itˆo stochastic integral is defined as t

I( f ) = f (ω, s)dBs (ω) = f (tk , ω)(Btk+1 (ω) − Btk (ω)) 0

=

k

f k (ω)(Btk+1 (ω) − Btk (ω)).

k

Note that each f k is Ftk -measurable and hence independent of the integrator (Btk+1 − Btk ) because of the independent increment property of the Brownian motion B = {Bt , t ≥ 0}. In order to define the integral for functions in {H 2 , ||.|| H 2 } we need a few preliminary results. Lemma 3.4.2 ([16]). Let (, F, Ft , P) be a filtered probability space. Let L be a linear space of real and bounded measurable stochastic processes such that 1. L contains all bounded, left-continuous adapted processes, 2. if {X n } is a monotone increasing sequence of processes in L such that X = supn X n is bounded, then X ∈ L. Then L contains all bounded predictable processes. Proof

See [16] page 21.

3.4 Integration with respect to a Brownian motion

93

Lemma 3.4.3 Let S 2 be the set of all simple processes in H 2 . Then 1. S 2 is dense in H 2 . 2. For f ∈ S 2 , ||I ( f )|| L 2 = || f || H 2 . 3. For f ∈ S 2 , E[I ( f )] = 0. Proof 1. Let f ∈ H 2 and for K > 0 set F K = f I[−K ,K ] . Then f K ∈ H 2 and || f − f K || H 2 → 0 as K → ∞. Therefore suppose that f ∈ H 2 is bounded. Let L = { f ∈ H 2 : f is bounded and there exists f n ∈ S 2 such that || f − f n || H 2 → 0, n → ∞}. L is linear and is closed under monotone increasing sequences. If f is left-continuous bounded and adapted one can set f n (0, ω) = f (0, ω), and for t > 0, f n (t, ω) = f (k/2n , ω)I(k/2n ,(k+1)/2n ] (t),

k = 0, 1, . . . .

Then f n ∈ S 2 and by bounded convergence || f − f n || H 2 → 0, n → ∞. Now, in view of Lemma 3.4.2, L contains all bounded, predictable processes and L contains all bounded processes in H 2 . (See [16] Remark 1.1, page 45.)

2 2 2. ||I ( f )|| L 2 = E[I ( f )]2 = E[ f (tk , ω)(Btk+1 (ω) − Btk (ω)) ] = E[ Ak ] =

k

+2

k

E[(Ak ) ] + 2 2

E[Ai A j ] =

i< j

E[E[Ai A j | F j ]] =

i< j

k

E[E[(Ak ) | Fk ]] 2

k

E[E[ f 2 (tk , ω)(Btk+1 − Btk )2 | Fk ]]

k

= E[ f 2 (tk , ω)(tk+1 − tk )] = E[

t 0

k

f 2 (ω, s)ds] = || f ||2H 2 .

The proof of the last part of the lemma is left as an exercise. Theorem 3.4.4 Suppose that f (ω, t) ∈ H 2 . Then there exists an (a.s. unique) L 2 -random L2

variable I ( f ) such that I ( f n ) → I ( f ) independently of the choice of the sequence of simple functions f n (ω, t) ∈ H 2 , that is t t L2 f n (ω, s)dBs (ω) → f (ω, s)dBs (ω). (3.4.3) 0

0

The left hand side of (3.4.3) is also called the Itˆo integral of f . Proof In view of Lemma 3.4.3 we have that for f (ω, t) ∈ H 2 there exists a sequence of simple functions f n ∈ S 2 such that H2

lim f n → f, and we see that ||I ( f n ) − I ( f m )|| L 2 = ||I ( f n − f m )|| L 2 = || f n − f m || H 2 → 0. However, L 2 is complete, so that the Cauchy sequence I ( f n ) has a limit I ( f ) ∈ L 2 .

94

Stochastic calculus

Suppose that { f n } is a second sequence converging to f but I ( f n ) converges to another integral I ( f ). Then || f n − f || H 2 + || f − f n || H 2 ≥ || f n − f n || H 2 = ||I ( f n ) − I ( f n )|| L 2 . However, || f n − f || H 2 + || f − f n || H 2 → 0 by assumption and therefore ||I ( f n ) − I ( f n )|| L 2 → 0, which establishes the uniqueness of the limit I ( f ). L2

Remark 3.4.5 Since I ( f n ) → I ( f ), then lim E[I ( f n )] = E[I ( f )] and in view of Lemma 3.4.3 we have: 1. For f ∈ H 2 , E[I ( f )] = 0. 2. For f ∈ H 2 , ||I ( f )|| L 2 = || f || H 2 .

3.5 Stochastic integration with respect to general martingales Recall that H2 is given by (3.2.1). Write S = {bounded simple predictable processes (Definition 3.4.1) }, H02 2,c

(3.5.1)

= {{Mt } ∈ H : M0 = 0 a.s.},

(3.5.2)

H

= {{Mt } ∈ H and {Mt } is continuous},

(3.5.3)

H02,c

= {{Mt } ∈ H

(3.5.4)

2 2

2,c

: M0 = 0 a.s.}.

Suppose X ∈ H2 . Then the integral

t

f (s, ω)dX s = f 0 X 0 +

0

n

f k (X tk+1 ∧t − X tk ∧t ) exists for f ∈ S.

0

t

Lemma 3.5.1 ([11]).

f (s)dX s ∈ H2 and

0

E

2

∞

f (s)dX s

∞

=E

0

0

=E

∞

f (s, ω)dX, X s 2

f 2 (s, ω)d[X, X ]s .

0

Proof

By definition

t

f (s, ω)dX s = f 0 X 0 +

0

n

f k (X tk+1 ∧t − X tk ∧t ).

0

By the optional stopping theorem, for s ≤ t: t E f (z, ω)dX z | Fs = 0

s 0

f (z, ω)dX z .

3.5 Stochastic integration with respect to general martingales

95

For k < , so that k + 1 ≤ , E[ f k f (X tk+1 ∧t − X tk ∧t )(X t +1 ∧t − X t ∧t )] = E[E[ f k f (X tk+1 ∧t − X tk ∧t )(X t +1 ∧t − X t ∧t ) | Ft ]] = 0. Therefore

2

t

E

= E[

f (s)dX s 0

n

f k2 (X t2k+1 ∧t − X t2k ∧t )

0 n = E[ f k2 (X, X tk+1 ∧t − X, X tk ∧t ) 0

=E

t

f (s, ω)dX, X s 0 ∞ 2 ≤E f (s, ω)dX, X s < ∞, 2

0

because f is bounded and X ∈ H . The integrals on the right are Stieltjes integrals. Therefore, by Lebesgue’s Theorem, letting t → ∞: ∞ 2 ∞ 2 E =E f (s)dX s f (s, ω)dX, X s . 2

0

0

Finally note that X, X − [X, X ] is a martingale of integrable variation and the result follows. Theorem 3.5.2 Write L 2 (X, X ) for the space of predictable processes { f (ω, t)} such that ∞ || f ||2X,X = E f 2 (ω, s)dX, X s < ∞. 0

t

Then the map f → 0 f dX of S into H2 extends in a unique manner to a linear isometry of L 2 (X, X ) into H2 . Proof Suppose that the space t S is endowed with the seminorm ||.||X,X . Then from Lemma 3.5.1 the map f → 0 f dX of S into H2 is an isometry. However, S is dense in H2 and this map extends in a unique manner to an isometry of L 2 (X, X ) into H2 . The following characterization is due to Kunita and Watanabe [24]. (See [11] page 107.) Theorem 3.5.3 Suppose f ∈ L 2 (X, X ). 1. Then for every Y ∈ H2 ,

∞

E 0

E

0

∞

| f (s)||dX, Y s | < ∞, | f (s)||d[X, Y ]s | < ∞.

96

Stochastic calculus

t

2. The stochastic integral It = such that for every Y ∈ H2 , E[I∞ Y∞ ] = E

f (s)dX s is characterized as the unique element of H2

0 ∞

f (s)dX, Y s = E

0

3. Furthermore, for every Y ∈ H ,

∞

f (s)d[X, Y ]s .

0

2

t

I, Y t =

f (s)dX, Y s ,

0 t

I, Y t =

f (s)d[X, Y ]s .

0

Proof 1. Follows from Theorem 3.2.25. 2. The linear functional on L 2 (X, X ) defined by ∞ f → E I ∞ Y∞ − f (s)dX, Y s 0

is continuous by Theorem 3.2.25 and it is zero on the space of simple processes S which is dense in L 2 (X, X ). Therefore it is zero on L 2 (X, X ) by continuity. The second identity follows because X, Y − [X, Y ] is a martingale of integrable variation. 3. Note that t ∞

jt = It Yt − f (s)dX, Y s ≤ sup |It Yt | + | f (s)||dX, Y s | ∈ L 1 . 0

t

0

Applying the identity in part (ii) it is seen that, for any stopping time T , E[JT ] = 0. In view of Theorem 2.5.7, jt is a martingale. However, I, Y t is the unique predictable process of integrable variation such that It Yt − I, Y t is a martingale. Therefore, t I, Y t = 0 f (s)dX, Y s . To prove the last identity, decompose X and Y into their continuous and totally discontinuous parts and then use a similar argument. (See [11] page 108.) Note that the first identity in (2) uniquely characterizes the stochastic integral I . This is because the right hand side is a continuous linear functional Y (given f and X ), whilst the left hand side is just the inner product of I and Y in the Hilbert space H2 . Consequently, given f and X there is a unique I ∈ H2 which gives this linear functional. Definition 3.5.4 A process { f (t, ω)} is locally bounded process if { f (0, ω)} is a.s. finite, and if there is a sequence of stopping times τn ↑ ∞ and constants K n such that | f (t, ω)|I{0
a.s.

Definition 3.5.5 A martingale X is a locally uniformly integrable martingale if there is a sequence of stopping times τn ↑ ∞ such that the stopped martingale X {τn ∧t} is a uniformly integrable martingale.

3.6 The Itˆo formula for semimartingales

97

Theorem 3.5.6 ([11] page 121) 1. Suppose X is a locally uniformly integrable martingale and { f (t, ω)} is a predictable

t locally bounded process. There is then a unique local martingale {It = 0 f (s)dX s } such that for every bounded martingale Y , t [I, Y ]t = f (s)d[X, Y ]s . 0

(Here the right hand side is just a Stieltjes integral on each sample path.) t 2. I0 = f (0)X 0 , Itc = 0 f (s)dX sc and the processes (It ) and f (t, ω)X t are indistinguishable. 3. If the local martingale X is also of locally integrable variation (see Definition 2.6.2) then It can be calculated as the Stieltjes integral along each sample path. Proof Assume that X 0 = 0 and f (0) = 0. There is a sequence of stopping times τn ↑ ∞ such that the stopped martingale X {τn ∧t} is a uniformly integrable and bounded martingale by Theorem 2.5.11. Using Theorem 2.5.12 we can write X {τn ∧t} = U{τn ∧t} + V{τn ∧t} , where U0 = 0, U{τn ∧t} is square integrable and V{τn ∧t} is a martingale of integrable variation which is zero at t = 0. The stochastic integral t t t f (s)dX {τn ∧t} = f (s)dU{τn ∧t} + f (s)dV{τn ∧t} 0

0

0

is defined by Theorem 3.5.2. Furthermore, this integral a uniformly integrable martingale. If n < m (so that τn ≤ τm a.s.), because X {τn ∧t} is equal to X {τm ∧t} stopped at τn we have t t f (s)dX {τn ∧t} is equal to f (s)dX {τm ∧t} stopped at τn . 0 0 t

t A process {It = 0 f (s)dX s } is then defined by putting {I{τn ∧t} = 0 f (s)dX {τn ∧t} } and it is seen that It is a local martingale. The rest of the proof is left as an exercise. 3.6 The Itˆo formula for semimartingales Write V = {{Vt } which are adapted, right-continuous with left limits (corlol or c`adl`ag) and for which almost every sample path is of finite variation on each compact subset of [0, ∞[]}, V0 = {{Vt } ∈ V : V0 = 0 a.s}, ∞ |dVt | < ∞}, A = {{Vt } ∈ V : E

(3.6.1) (3.6.2) (3.6.3)

0

A0 = {{Vt } ∈ A : V0 = 0 a.s.}.

(3.6.4)

Theorem 3.6.1 Suppose X = X 0 + M + V is a semimartingale and { f (t, ω)} is a predictable locally bounded process. Then the process t t t

It = f (s, ω)dX s (ω) = f (0)X 0 + f (s, ω)dMs (ω) + f (s, ω)dVs (ω) 0

0

0

98

Stochastic calculus

c is at semimartingale. It is independent of the decomposition X , and the processes It and f (s)dX sc and (It ) and f (t)X t are indistinguishable. 0

Proof Suppose X = X 0 + Mˆ + Vˆ is a second decomposition of X . Then M − Mˆ = V − Vˆ is a local martingale which is locally of integrable variation. Therefore, by Thet

orem 3.5.6(3) the stochastic integral t f (s)d(V − Vˆ )s , and so 0

t

f (0)X 0 +

0

0

t

Because X c = M c the processes 3.5.6(ii). Similarly

t f (s, ω)dVs (ω) = f (0)X 0 + f (s, ω)d Mˆ s (ω) 0 t + f (s, ω)dVˆs (ω).

t

f (s, ω)dMs (ω) +

0

ˆ s is equal to the Stieltjes integral f (s)d(M − M)

0

f (s)dX sc and I c are indistinguishable by Theorem

0

t

f (t)X t = f (t)(Mt + Vt ) =

f (s)dMs +

0

t

f (s)dVs

= (It ).

0

The Itˆo formula is first established for a continuous, bounded, real semimartingale. Theorem 3.6.2 Suppose X = X 0 + M + V is a semimartingale such that |X 0 | ≤ K a.s., M ∈ H02,c (see (3.5.4)) and bounded by K , V ∈ V0 (see (3.6.3)), V is continuous and ∞ |dVs | ≤ K a.s. 0

Let F be a twice continuously differentiable function on IR. Then t t 1 t F(X t ) = F(X 0 ) + F (X s− )dMs + F (X s− )dVs + F (X s )dM, Ms . 2 0 0 0 (3.6.5) That is, the processes on the left and right hand sides are indistinguishable. Proof

Write t I1 = F (X s− )dMs , 0

t

I2 =

F (X s− )dVs ,

I3 =

0

t

F (X s )dM, Ms .

0

Now |X | ≤ 3K . If a, b ∈ [−3K , +3K ], 1 F(b) − F(a) = (b − a)F (a) + (b − a)2 F (a) + r (a, b), 2 where, because F is uniformly continuous on [−3K , +3K ], |r (a, b)| ≤ (|b − a|)(b − a)2 . Here (s) is an increasing function of s such that lims→0 (s) = 0.

3.6 The Itˆo formula for semimartingales

99

A stochastic subdivision of [0, t] is now defined by putting t0 = 0, ti+1 = t ∧ (ti + a) ∧ inf{s > ti : |Ms − Mti | > a or |Vs − Vti | > a}, where a is any positive real number. Then as a → 0 the steps of the subdivision, sup(ti+1 − ti ), converge uniformly to 0, and the random variables sup |Mti+1 − Mti | ≤ a, sup |Vti+1 − Vti | ≤ a, tend uniformly to 0. Therefore the variation of X on the interval [ti , ti+1 ] is bounded by 4a. Now F(X t ) − F(X 0 ) =

F (X ti )(X ti+1 − X ti ) +

i

+

r (X ti , X ti+1 ) = S1 +

i P

1 F (X ti )(X ti+1 − X ti )2 2 i 1 S2 + R, 2

P

say.

P

We shall show that as a → 0, S1 → I1 + I2 , S2 → I3 , and R → 0. Write S1 = F (X ti )(Mti+1 − Mti ) + F (X ti )(Vti+1 − Vti ) i

i

= U1 + U2 .

L2

Step 1. We show that U1 → I1 . Write I1 =

ti+1

F (X s )dMs .

ti

i

The martingale property implies different terms in the sum are mutually orthogonal, so 2 ti+1 ||U1 − I1 ||22 = (F (X ) − F (X ))dM s ti s i

=E

ti

2

ti+1

(F (X s ) − F (X ti )) dM, Ms 2

ti

i

≤ E[{sup sup (F (X s ) − F (X ti ))2 }M, Mt ]. ti ≤s≤ti+1

t

By uniform continuity, the supremum tends uniformly to zero. M, Mt is integrable, so the result follows by the Monotone Convergence Theorem 1.3.15. L1

Step 2. We show that U2 → I2 . |U2 − I2 | ≤

ti+1

|(F (X s ) − F (X ti ))||dVs |

ti

i

≤ {sup sup |F (X s ) − F (X ti )| t

ti ≤s≤ti+1

o

t

|dVs |}.

100

Stochastic calculus

Again by uniform continuity of F and the Monotone Convergence Theorem 1.3.15, ||U2 − I2 ||1 converges to 0. Step 3. Writing S2 = F (X ti )(Vti+1 − Vti )2 + 2 F (X ti )(Vti+1 − Vti )(Mti+1 − Mti ) i

+

i

F (X ti )(Mti+1 − Mti )2

i

= V1 + V2 + V3 , respectively. We first show that V1 and V2 converge to 0 both a.s. and in L 1 . However, if C > sup{|F (x)| + |F (x)| : −3K ≤ x ≤ 3K }, t |V1 | ≤ C sup |Vti+1 − Vti | |dVs | ≤ aC K . i

0

P

Step 4. We show that V3 → I3 . First recall that M is bounded by K , so 2 E[M, M∞ − M, Mt | Ft ] = E[M∞ | F] − Mt2 ≤ K 2 .

Therefore

E[M, M2∞ ] = 2E

∞

(M, M∞ − M, Mt )dM, Mt

0

= 2E

0

∞

2 (E[M∞

| F] −

Mt2 )dM,

Mt

≤ 2K 2 E[M, M∞ ] ≤ 2K 4 . Consequently M, M∞ ∈ L 2 and the martingale M 2 − M, M is actually in H02 . Write J3 = F (X ti )(M, Mti+1 − M, Mti ). i

Then the same argument as Step 2 shows that t L1 J3 → I3 = F (X s )dM, Ms . 0 P

Therefore, J3 → I3 . We shall show that ||V3 − J3 ||2L 2 → 0. Because M 2 − M, M is a martingale, E[(Mti+1 − Mti )2 − M, Mti+1 + M, Mti | Fti ] = 0. Therefore, distinct terms in the sum defining V3 − J3 are orthogonal and ||V3 − J3 ||22 = E[F (X ti )2 ((Mti+1 − Mti )2 − M, Mti+1 + M, Mti )2 ]. i

3.6 The Itˆo formula for semimartingales

101

However, F (X ti )2 ≤ C 2 and (α − β)2 ≤ 2(α 2 + β 2 ), so ||V3 − J3 ||22 ≤ 2C 2 E(Mti+1 − Mti )4 i

+ 2C 2

E(M, Mti+1 − M, Mti )2 .

i

The second sum here is treated similarly to V1 in Step 3: because M, M is unia.s. formly continuous on [0, t], supi (M, Mti+1 − M, Mti ) → 0 as a → 0 and is bounded by M, Mt . Therefore 2C 2 E(M, Mti+1 − M, Mti )2 i

≤ 2C 2 E(sup(M, Mti+1 − M, Mti )M, Mt ). i

Now M, Mt ∈ L 2 , so the second sum converges to zero by Lebesgue’s Dominated Convergence Theorem 1.3.17. For the first sum, 2C 2 E(Mti+1 − Mti )4 ≤ 2C 2 E(sup(Mti+1 − Mti )2 (Mti+1 − Mti )2 ) i

i

≤ 2C a E( 2 2

i

(Mti+1 − Mti ) ) = 2C 2 a 2 E[Mt2 ]. 2

i

which again converges to zero as a → 0. (Note that it is only here, where we use the fact that |Mti+1 − Mti | ≤ a, that the random character of the partition {ti } is used.) L2

P

P

We have, thus, shown that V3 − J3 → 0. However, J3 → I3 so V3 → I3 . Step 5. Finally, we show that the remainder term R converges to 0 as a → 0. We have observed that the remainder term r in the Taylor expansion is such that |r (a, b)| ≤ (|b − a|)(b − a)2 , where is an increasing function and lims→0 (s) = 0. Therefore, |R| ≤ (X ti+1 − X ti )2 (|X ti+1 − X ti |) i

≤ 2 (2a)

((Vti+1 − Vti )2 + (Mti+1 − Mti )2 ). i

Now

2 E (Mti+1 − Mti ) = E[Mt2 ] i

is independent of the partition, and E (Vti+1 − Vti )2 ≤ a E |Vti+1 − Vti | ≤ K a. i

i

Because lima→0 (2a) = 0, lim |E(R)| ≤ lim E(|R|) = 0.

a→0

a→0

102

Stochastic calculus

For a fixed t, therefore,

t

F(X t ) = F(X 0 ) + +

1 2

F (X s− )dMs +

0

t

t

F (X s− )dVs

0

F (X s )dM, Ms ,

0

almost surely. Because all the processes are right-continuous with left limits the two sides are indistinguishable (see Definition 2.1.5). The differentiation rule will next be proved for a function F which is twice continuously differentiable, and which has bounded first and second derivatives, and a semimartingale X of the form X t = X 0 + Mt + Vt , where X 0 ∈ L 1 , M ∈ H02 and V ∈ A0 . That is, the following result will be proved after the lemmas and remarks below. Theorem 3.6.3 Suppose X = X 0 + M + V is a semimartingale such that X 0 ∈ L 1 a.s., M ∈ H02 , V ∈ A0 and F is twice continuously differentiable with bounded first and second derivatives. Then the following two processes, the left and right hand sides, are indistinguishable: t 1 t F(X t ) = F(X 0 ) + F (X s− )dX s + F (X s− )dX c , X c s 2 0 0 + (F(X s ) − F(X s− ) − F (X s− )X s ). (3.6.6) 0<s≤t

Remarks 3.6.4 1. Note that the series is absolutely convergent because, if C is a bound for |F |, then by Taylor’s theorem

|F(X s ) − F(X s− ) − F (X s− )X s | ≤

0<s≤t

C (X s )2 , 2 0<s≤t

and the right hand side is finite as in Definition 3.2.18. Also, because X c , X c is a continuous process, t t F (X s− )dX c , X c s = F (X s )dX c , X c s . 0

0

2. The first integral on the right of (3.6.6) is a well-defined stochastic integral since the integrand is predictable and locally bounded. Similar remarks apply to the second integral. 3. The series on the right of (3.6.6) is a correction term to balance the number of jumps on both sides of the equation.

3.6 The Itˆo formula for semimartingales

103

4. Another form of the differentiation rule is the following: t 1 t F(X t ) = F(X 0 ) + F (X s− )dX s + F (X s− )d[X, X ]s 2 0 0 1 + (F(X s ) − F(X s− ) − F (X s− )X s − F (X s− )X s2 ). 2 0<s≤t This representation is of interest because whilst the predictable quadratic variation process X c , X c depends on the underlying probability measure, the optional quadratic variation process [X, X ] does not. 5. If {X t } is a deterministic function of bounded variation which is right-continuous and has left limits, we require F to be only once continuously differentiable and then: t F(X t ) = F(X 0 ) + F (X s− )dX s + (F(X s ) − F(X s− ) − F (X s− )X s ) = 0

0 t

F (X s− )dX sc +

0<s≤t

(F(X s ) − F(X s− )).

0<s≤t

Lemma 3.6.5 Suppose the differentiation rule of Theorem 3.6.6 is true for all semimartingales of the form X t = X 0 + Nt + Bt , where X 0 belongs to some dense set in L 1 , N belongs to some dense set in H02 and B belongs to some dense set in A0 . Then Theorem 3.6.6 is true for general semimartingales of the stated form. Proof

See [11] page 133.

Lemma 3.6.6 The semimartingale we need consider can be further restricted so that, 2,c if X t = X 0 + Mt + Vt , then M ∈ H0 is bounded, V ∈ A has at most N jumps and ∞

0

|dAcs |, is bounded, and X 0 is bounded.

Proof

See [11] page 137.

We now prove Theorem 3.6.6. Proof of Theorem 3.6.6 From Lemma 3.6.5 and Lemma 3.6.6, the result of Theorem 3.6.6 will follow if it can be proved that for a semimartingale X t = X 0 + Mt + Vt , 2 1 where ∞ X 0 ∈ L is bounded, M ∈ H0 and M is bounded, V ∈ A0 has at most N jumps and |dVs | < ∞. 0

104

Stochastic calculus

However, note that the two sides of the differentiation formula have the same jump at time t: because X c , X c is continuous the jump of the right hand side is F (X t− )X t + (F(X t ) − F(X t− ) − F (X t− )X t ) = F(X t ) − F(X t− ), which is the jump of the left hand side at t. Consider the continuous semimartingale X¯ t = X 0 + Mt + Vtc . Then from Theorem 3.6.2 the differentiation rule is true for X¯ . Furthermore, if the jumps of X are indexed in increasing order as 0 < S1 ≤ · · · ≤ S N ≤ ∞, then X t = X¯ t on the stochastic interval {(t, ω) ∈ [0, ∞[× : 0 ≤ t < S1 (ω)} (denoted [[0, S1 [[). Therefore, X t− = X¯ t− on the stochastic interval {(t, ω) ∈ [0, ∞[× : 0 ≤ t ≤ S1 (ω)} (denoted [[0, S1 ]]) so t t F (X s− )dMs = F ( X¯ s− )dMs 0

on [[0, S1 ]]. Also,

0

t 0

F (X s− )dVs =

0

t

F ( X¯ s− )dVsc

on [[0, S1 ]]. Because X c = X¯ c = M and the formula is true for X¯ (on [[0, ∞[[), the differentiation formula is true on [[0, S1 [[. However, the two sides of the formula have equal jumps at S1 , so the formula is true on [[0, S1 ]]. The same reasoning establishes the formula on {(t, ω) ∈ [0, ∞[× : S1 (ω) ≤ t ≤ S2 (ω)) ([[S1 , S2 ]]), and so on, up to [[S N , ∞[[. Theorem 3.6.6 is, therefore, proved. The differentiation rule will now be given when X is a general semimartingale and F is twice continuously differentiable, not necessarily having bounded derivatives. Theorem 3.6.7 Suppose X a semimartingale and F is a twice continuously differentiable function. Then F(X ) is a semimartingale, and with equality denoting indistinguishability: t 1 t F(X t ) = F(X 0 ) + F (X s− )dX s + F (X s )dX c , X c s 2 0 0 + (F(X s ) − F(X s− ) − F (X s− )X s ). 0<s≤t

Proof

See [11] page 138.

Remark 3.6.8 Note that F (X s ) is right-continuous with left limit, and so F (X s− ) is predictable. Also, by considering stopping times such as S = inf{t : F (X s ) ≥ n}, we see that F (X s− ) is locally bounded. Similar remarks apply to F (X s− ). Therefore, the two integrals are well-defined. For any ω ∈ the trajectory X s (ω), 0 ≤ s ≤ t, remains in a compact interval [−C(t, ω), C(t, ω)]. On such a compact interval the second derivative of F is bounded

3.6 The Itˆo formula for semimartingales

105

by some constant K (t, ω). Therefore, if s ≤ t, |F(X s ) − F(X s− ) − F (X s− )X s | ≤

1 K (t, ω)(X s )2 (ω). 2

As in Definition 3.2.18, we know that s≤t (X s )2 (ω) is finite almost surely. Therefore, for any t the sum occurring on the right hand side of the differentiation rule is a.s. absolutely convergent. Theorem 3.6.9 If {X t } = {(X t1 , . . . , X tn )} is an n-vector semimartingale and F is a twice continuously differentiable function with respect to all arguments, we have t ∂ F(X s− ) 1 t ∂ 2 F(X s− ) i F(X t1 , . . . , X tn ) = F(X 0 ) + dX + dX ic , X jc s s i i∂ X j ∂ X 2 ∂ X 0 0 s i i, j ∂ F(X s− ) i F(X s ) − F(X s− ) − (3.6.7) + X s . ∂ X si i 0<s≤t Remark 3.6.10 If {X t } is a deterministic right-continuous with left limits function of bounded variation, then t t = 1 + s− dX s 0

has the unique exponential solution (0 = 1) t = e X t −X 0 (1 + X s )e{−X s } . s≤t

The next example is a generalization of the exponential formula to special semimartingales. Example 3.6.11 We shall apply Theorem 3.6.9 to show that if X t is a special semimartingale, then t t = 1 + s− dX s (3.6.8) 0

has the unique stochastic exponential solution (0 = 1) 1 c c t = e X t − 2 X , X t (1 + X s )e{−X s } s≤t

= eY1t Y2t ,

(3.6.9)

where 1 Y1t = X t − X c , X c t , 2 Y2t = (1 + X s )e−X s . s≤t

First note that the infinite product Y2t is finite (see Lemma 13.7 of [11]).

106

Stochastic calculus

Write t = f (Y1t , Y2t ). Using rule (3.6.7), f (Y1t , Y2t ) = 0 +

2 i=1

0

t

∂ f (Y1s− , Y2s− ) dYis ∂Yi

(3.6.10)

2 1 ∂ 2 f (Y1s− , Y2s− ) c c + f (Y1s , Y2s ) − f (Y1s− , Y2s− ) dYi , Y j s + 2 i, j=1 ∂Yi ∂Y j 0<s≤t

−

2 ∂ f (Y1s− , Y2s− ) i=1

∂Yi

Yis .

(3.6.11)

Because Y2t is a purely discontinuous process and of bounded variation the second integral of the sum in (3.6.10) is equal to eY1s− Y2s . 0<s≤t

Now Yic , Y jc = 0 except for i = j = 1 because the continuous part of Y2t is identically zero and since Y1c = X c , (3.6.11) becomes 1 t s− dX c , X c s . 2 0 In the last expression, using (3.6.9), we have f (Y1s , Y2s ) − f (Y1s− , Y2s− ) = s − s− 1 c c = e X s −X s− +X s− 2 X , X s (1 + X r )e−X r (1 + X s )e−X s r ≤s− 1 c − e X s− − 2 X ,

X c s (1 + X )e−X r r r ≤s−

1 c c = e X s− − 2 X , X s (1 + X r )(1 + X s ) r ≤s−

= s− X s . That is, s = s− (1 + X s ). Putting these results together gives (3.6.8). For the proof of the uniqueness see Theorem 13.5 of [11]. Example 3.6.12 Consider the following “log-Poisson plus log-normal” process, with its jump part driven by a finite sum of independent Poisson processes Nti , i = 1, . . . , n with time varying jump sizes ati and intensities λit :

t

Xt = X0 + σ 0

X s− dBs +

n i=1

0

t

X s− asi (dNsi − λis ds).

3.6 The Itˆo formula for semimartingales

107

Applying the result of Example 3.6.11 we see that X has the form n t σ i i X t = X s− exp σ (Bt − Bs ) − (t − s) − as λs ds (1 + ari Nri ). 2 s≤r ≤t i=1 s Example 3.6.13 If X t and Yt are two semimartingales, the product rule gives t t X t Yt = X 0 Y0 + X s− dYs + Ys− dX s + [X, Y ]t , 0

0

and

=

X t2 or

t

X 02

t

+2

X s− dX s + [X, X ]t ,

0

X s− dX s =

0

1 2 (X − [X, X ]t ). 2 t

Applying the preceding result to a Poisson process we obtain t 1 1 Ns− dNs = (Nt2 − [N , N ]t ) = (Nt2 − Nt ). 2 2 0 Theorem 3.6.14 (L´evy’s characterization of the Poisson process) Suppose {Q t } is a purely discontinuous martingale on the filtered probability space (, F, P, Ft ), t ≥ 0, all of whose jumps equal 1. If Q 2t − t is a martingale then {Nt = Q t + t} is a Poisson process. Proof

We can suppose N0 = Q 0 = 0. Because Q t is purely discontinuous, E[[Q, Q]t ] = E[Q 2t ] < ∞,

but because all the jumps equal +1, [Q, Q]t =

Q 2s =

s≤t

Q s .

s≤t

Write Nt = s≤t Q s . Then Nt is integrable, because [Q, Q]t is. Furthermore, Q is a compensated sum of jumps, by Theorem 9.24 of [11], so Nt has a predictable compensator λt : Q t = N t − λt . However, Nt − t = [Q, Q]t − t = (Q 2t − t) + ([Q, Q]t − Q 2t ), and is therefore, a martingale. Consequently λt = t. That is, Q t = Nt − t.

(3.6.12)

108

Stochastic calculus

Applying the differentiation rule to the martingale Q t and the function f (x) = eiux , u ∈ IR, from time 0 to the first jump time T1 , we have T1 iu Q T1 e = 1 + iu eiu Q v− dQ v + (eiu Q T1 − eiu Q T1 − − iueiu Q T1 − Q T1 ) 0

= 1 + iu

T1

eiu Q v− dQ v + (eiu Q T1 − e−iuT1 − iue−iuT1 ),

0

since Q T1 − = N T1 − − T1 = −T1 by (3.6.12) and Q T1 = 1. Also note that eiu Q T1 cancels out. Hence T1 (1 + iu)e−iuT1 = 1 + iu eiu Q v− dQ v . 0

Taking the conditional expectation with respect to F0 we have (1 + iu)E[e−iuT1 | F0 ] = 1. Therefore, T1 is independent of F0 and is exponentially distributed with parameter 1. A time translation and a similar argument shows that Tn − Tn−1 is independent of Fn−1 , and is exponentially distributed with parameter 1. Therefore Nt is a Poisson process.

3.7 The Itˆo formula for Brownian motion In this section, the Itˆo formula obtained above for general semimartingales is specialized to the Brownian motion and the related Itˆo processes. (See Definition 3.7.7.) Theorem 3.7.1 Let f be a twice continuously differentiable function on IR and let B = {Bt , t ≥ 0} be a Brownian motion. Then, in view of Theorem 3.6.7, f (Bt ) is a semimartingale given by the formula t 1 t f (Bt ) = f (B0 ) + f (Bs )dBs + f (Bs )dB, Bs 2 0 0 t 1 t = f (B0 ) + f (Bs )dBs + f (Bs )ds. 2 0 0 Example 3.7.2 f (Bt ) =

Bt2

=

B02

+ 0

t

1 2Bs dBs + 2

t

2ds. 0

Then d(Bt2 ) = 2Bt dBt + dt. We prove the converse of Theorem 2.7.1. Theorem 3.7.3 Let {Wt , Ft }, t ≥ 0 be a continuous (scalar) local martingale such that {Wt2 − t}, t ≥ 0, is a local martingale. Then {Wt , Ft } is a Brownian motion.

3.7 The Itˆo formula for Brownian motion

109

Proof We must show that for 0 ≤ s ≤ t the random variable Wt − Ws is independent of Fs and is normally distributed with mean 0 and covariance t − s. In terms of characteristic functions this means we must show that for any real u, E[eiu(Wt −Ws ) | Fs ] = E[eiu(Wt −Ws ) ] = e−u

2

(t−s)/2

.

Consider the (complex-valued) function f (x) = eiux . Applying the Itˆo rule to the real and imaginary parts of f (x) we have t 1 t 2 iuWr iuWt iuWr f (Wt ) = e = f (Ws ) + iue dWr − u e dr, (3.7.1) 2 s s because dW r = dr by hypothesis. Furthermore, the real and imaginary parts of t iueiuWr dWr are in fact square integrable martingales because the integrands are bounded t s by 1. Consequently, E[ iueiuWr dWr | Fs ] = 0. For any A ∈ Fs we may multiply (3.7.1) s

by I A e−iuWs and take expectations to deduce E[e

iu(Wt −Ws )

1 I A ] = P(A) − 2

t

E[eiu(Wr −Ws ) I A ]dr.

0

Solving this equation, we see E[eiu(Wt −Ws ) I A ] = P(A)e−u

2

(t−s)/2

,

and the result follows. If the function f is a function of both time and space the Itˆo rule has the following form. Theorem 3.7.4 Let f (., .) be continuously differentiable in the first argument and twice continuously differentiable in the second argument and let {Bt } be a Brownian motion. Then f (t, Bt ) is given by the formula t t ∂ f (s, Bs ) ∂ f (s, Bs ) 1 t ∂ 2 f (s, Bs ) f (t, Bt ) = f (0, B0 ) + ds ds + dBs + ∂s ∂B 2 0 ∂ B2 0 0 t t ∂ f (s, Bs ) ∂ f (s, Bs ) 1 ∂ 2 f (s, Bs ) = f (0, B0 ) + ds. dBs + + ∂B ∂s 2 ∂ B2 0 0 One can write formally the differential expression: ∂ f (t, Bt ) 1 ∂ 2 f (t, Bt ) ∂ f (t, Bt ) dt. dt + dBt + ∂t ∂B 2 ∂ B2 Here the differentials satisfy: d f (t, Bt ) =

(dBt )2 = dt,

(dBt )n = 0,

n > 2,

dt dBt = 0.

Example 3.7.5 1 f (t, Bt ) = exp(Bt − t), 2 1 1 d f (t, Bt ) = f (t, Bt )dBt + (− f (t, Bt ) + f (t, Bt ))dt 2 2 = f (t, Bt )dBt .

(3.7.2)

110

Stochastic calculus

Hence the function exp(Bt − 12 t) is the solution of the exponential equation (3.7.2). Since t 1 2 E (exp(Bs − s)) ds < ∞, 2 0 t the Itˆo integral exp(Bs − 12 s)dBs is a martingale. However, 0

0

t

1 1 exp(Bs − s)dBs = exp(Bt − t) − 1, 2 2

so the process X t = exp(Bt − 12 t) is a martingale.

Example 3.7.6 Given two adapted, measurable, processes X t and Yt such that with probability 1, t X s2 ds < ∞, 0

and

t

0

we have:

t

0

In particular,

t

X s dBs ,

t

t

Ys dBs =

0

Ys2 ds < ∞,

0

t

X s dBs ,

0

X s Ys ds. 0

X s dBs =

0

t

X s Ys dBs , Bs =

0

t

X s2 dBs ,

t

Bs = 0

X s2 ds.

Definition 3.7.7 An Itˆo process is a (special) semimartingale of the form: t t Xt = X0 + µ(ω, s)ds + σ (ω, s)dBs (ω). 0

0

Here Bt is a Brownian motion, and {µ(ω, t)}, {σ (ω, t)} are adapted, measurable processes such that with probability 1, t µ(ω, s)ds < ∞, ∀t ≥ 0 0

and

t

σ 2 (ω, s)ds < ∞,

∀t ≥ 0.

0

Given two Itˆo processes

Xt = X0 + 0

t

t

α(ω, s)ds + 0

β(ω, s)dBs (ω),

3.7 The Itˆo formula for Brownian motion

and

t

Y t = Y0 +

t

µ(ω, s)ds +

0

111

σ (ω, s)dBs (ω),

0

we have

t

[X, Y ]t = X, Y t = X 0 Y0 +

β(ω, s)σ (ω, s)ds.

0

Given an adapted, measurable process Yt such that with probability 1: t |Ys µ(s)|ds < ∞, 0

and

t

(Ys σ (s))2 ds < ∞,

0

and an Itˆo process

t

Xt = X0 +

t

µ(ω, s)ds +

0

σ (ω, s)dBs (ω),

0

we can define the stochastic integral t t t Ys dX s = Ys µ(ω, s)ds + Ys σ (ω, s)dBs (ω). 0

0

0

Remarks 3.7.8

1. From the definition of an Itˆo process it follows that the process process. t 2. The process Ys dX s is a continuous semimartingale.

t

Ys dX s is an Itˆo 0

0

Theorem 3.7.9 Let f be a twice continuously differentiable function on IR and let t t Xt = X0 + µ(ω, s)ds + σ (ω, s)dBs (ω) (3.7.3) 0

0

be an Itˆo process. Then f (X t ) is given by the formula t 1 t f (X t ) = f (X 0 ) + f (X s )dX s + f (X s )dX, X t . 2 0 0 Proof

See any of [6], [11], [16], [30], [34].

Or in differential form, d f (X t ) = f (X t )dX t +

1 f (X t )dX, X t . 2

Here, dX, X t = σ (t)dBt , σ (t)dBt = σ (t)2 dt.

(3.7.4)

112

Stochastic calculus

If (3.7.3) is substituted in (3.7.4) we have t t 1 2 f (X t ) = f (X 0 ) + f (X s )σ (ω, s)dBs (ω) + f (X s )σ (s) + µ(ω, s) f (X s ) ds. 2 0 0 t Remark 3.7.10 Note that f (X s )σ (ω, s)dBs (ω) is perhaps only a local martin t 0 gale even if E σ 2 (s)ds < ∞ because f (X t )σ (t) satisfies the weaker condition 0 t ( f (X s )σ (s))2 ds < ∞. This is guaranteed by the continuity of f (X t ) in t. Consequently, 0

local martingales arise naturally in the context of Itˆo stochastic calculus.

Example 3.7.11 Consider the Itˆo process t Xt = σ (ω, s)dBs (ω), 0

and suppose f (X t ) =

X t2 .

By the Itˆo formula,

1 d f (X t ) = 0dt + 2X t 0dt + 2X t σ (t)dBt + 2σ (s)2 dt 2 2 = 2X t σ (t)dBt + σ (s) dt.

Example 3.7.12 Solve the following linear stochastic differential equation: dX t = µX t dt + σ X t dBt ,

(µ, σ ∈ IR).

(3.7.5)

Assume that X 0 is independent of Bt and E[X 0 ]2 < ∞. We must find an adapted, measurable process X t such that t 2 E X s ds < ∞, 0

and (3.7.5) holds. Let f (X t ) = log X t and apply the Itˆo formula: t t 1 1 1 1 log X t = log X 0 + µX s − σ 2 X s2 2 ds + σ X s dBs Xs 2 Xs Xs 0 0 1 = log X 0 + (µ − σ 2 )t + σ Bt . 2 Therefore,

1 2 X t = X 0 exp (µ − σ )t + σ Bt . 2

As a Borel function of Bt , X t is adapted and since it is continuous, it is measurable. Now X t2 = X 02 exp{(2µ − σ 2 )t + 2σ Bt }, and using the assumptions, E[X t2 ] = E[X 02 ] exp{(2µ − σ 2 )t}E[exp{2σ Bt }].

3.7 The Itˆo formula for Brownian motion

113

Recall that Bt is an N (0, t) random variable so that E[exp{2σ Bt }] = exp{2σ 2 t}. Therefore E[X t2 ] = E[X 02 ] exp{(2µ + σ 2 )t}. Consequently, E 0

t

X s2 ds

t

=

E[X s2 ]ds t = E[X 02 ] exp{(2µ + σ 2 )s}ds < ∞. 0

0

Setting µ = 0 and σ = 1 in (3.7.5) gives the equation dX t = X t dBt .

(3.7.6)

1 X t = exp{Bt − t}. 2

(3.7.7)

This has the solution

The process X given by (3.7.7) is called the stochastic exponential of the Brownian motion process B. For a general Itˆo process, t t Zt = Z0 + µ(ω, s)ds + σ (ω, s)dBs (ω), 0

0

consider the equation dX t = X t dZ t , with X 0 given and F0 measurable, that is, X 0 is a constant. Then the unique solution of the equation is the process t t 1 X t = X 0 exp{ σ (s)dBs + (µ(s) − σ 2 (s))ds} 2 0 0 1 = exp{X t − X, X t }. 2 X t is then called the stochastic exponential of the Itˆo process Z . A generalization of Theorem 3.7.9 is: Theorem 3.7.13 Suppose f (., .) is continuously differentiable in the first argument and twice continuously differentiable in the second argument, and consider the Itˆo process t t Xt = X0 + µ(ω, s)ds + σ (ω, s)dBs (ω). (3.7.8) 0

0

114

Stochastic calculus

Then f (t, X t ) is given by the formula

∂ f (X s ) dX s ∂ Xs 0 ∂ f (X s ) 1 t ∂ 2 f (X s ) dX, X t . ds + ∂s 2 0 ∂ X s2

f (t, X t ) = f (0, X 0 ) +

t

+ 0

t

(3.7.9)

Again in differential form: d f (t, X t ) =

1 ∂ 2 f (X t ) ∂ f (X t ) ∂ f (X t ) dt + dX t + dX, X t . ∂ Xt ∂t 2 ∂ X t2

Substituting (3.7.8) in (3.7.9) we have: f (t, X t ) = f (0, X 0 ) +

∂ f (X s ) ds + ∂s

∂ f (X s ) dBs ∂ Xs 0 0 t 2 ∂ f (X s ) 1 2 ∂ f (X s ) µ(s) ds. + + σ (s) ∂ Xs 2 ∂ X s2 0 t

t

σ (s)

The following theorem gives the multi-dimensional Itˆo formula. (See [25].) Theorem 3.7.14 Let f (t, x1 , . . . , xn ) be continuously differentiable in the first argument and twice continuously differentiable in the other arguments. Suppose X 1 , . . . , X n are Itˆo processes of the form: dX t1 = µ1 (t)dt + σ11 (t)dBt1 + σ12 (t)dBt2 + · · · + σ1m (t)dBtm dX t2 = µ2 (t)dt + σ21 (t)dBt1 + σ22 (t)dBt2 + · · · + σ2m (t)dBtm ... dX tn = µn (t)dt + σn1 (t)dBt1 + σn2 (t)dBt2 + · · · + σnm (t)dBtm . We, therefore, require that with probability 1, t |µi (s)|ds < ∞,

i = 1, . . . , n,

0

and

t

|σkl (t)|2 ds < ∞,

k = 1, . . . , n;

l = 1, . . . , m.

0

Suppose B 1 , . . . , B m are m independent Brownian motions. t ∂f ∂f 1 n 1 n f (t, X t , . . . , X t ) = f (0, X 0 , . . . , X 0 ) + µi (s) i + ∂s ∂ Xs 0 i 1 ∂ ∂ + Tr σ (s) f σ (s) ds 2 ∂X ∂X t ∂f + σi j (t)dBsj . ∂ X si 0 ij

(3.7.10)

3.8 Representation results

115

∂ ∂ f is the matrix ∂X ∂X

Here σ (t) = {σi j (t)} is an n × m matrix, X = (X , . . . , X ), ∂2 f , i, j = 1, . . . , n and Tr(A) is the trace of the matrix A, i.e. the sum of the ∂ Xi∂ X j diagonal entries of the matrix A. 1

n

We can write (3.7.10) in differential form: d f (t, X t1 , . . . , X tn ) =

∂f ∂f µi (t) i dt dt + ∂t ∂ Xt i +

σi (t)

i

∂f 1 ∂2 f i dB + dX i , X j . 2 i, j ∂ X i ∂ X j ∂ X ti t

Example 3.7.15 Suppose that dX t1 = µ1 (t)dt + σ1 (t)dBt1 dX t2 = µ2 (t)dt + σ2 (t)dBt2 are two Itˆo processes and that f (X t1 , X t2 ) = X t1 X t2 . From the Itˆo rule (3.7.10) d(X t1 X t2 ) = X t2 dX t1 + X t1 dX t2 + σ1 (t)σ2 (t)dt = X t2 dX t1 + X t1 dX t2 + dX 1 , X 2 t , or equivalently X t1 X t2 = X 01 X 02 +

0

t

X s2 dX s1 +

0

t

X s1 dX s2 + X 1 , X 2 t .

3.8 Representation results Measurable, adapted processes { f (ω, t), Ft } such that

t

f (ω, s)ds < ∞, 2

E

∀t ≥ 0

0

generate martingales {X t , Ft } via the formula Xt = X0 +

t

f (ω, s)dBs (ω).

(3.8.1)

0

The following theorem (Davis [8]) gives a converse result in the sense that any (square) martingale {X t , Ft } can be represented as an Itˆo integral similar to (3.8.1). Theorem 3.8.1 Suppose {Bt }, t ≥ 0, is a Brownian motion on the filtered probability space (, F, P, Ft ). Write Gt0 = σ {Bs : s ≤ t} and {Gt } for the completion of {Gt0 }, so that the filtration {Gt } is certainly right-continuous.

116

Stochastic calculus

Then every random variable X ∈ L 2 (, G∞ ) can be represented as a stochastic integral ∞ X = E[X | G0 ] + f s dBs , 0

∞

where { f t } is a Gt -predictable process and E[ 0

f s2 ds] < ∞. Furthermore,

t

E[X | Gt ] = E[X | Ft ] = E[X | G0 ] +

f s dBs .

0

If {Bt } = {Bt1 , . . . , Btn }, t ≥ 0, is an n-dimensional ∞ Brownian motion, then f t = ( f t1 , . . . , f tn ) is a Gt -predictable process such that E[ ( f si )2 ds] < ∞, i = 1, . . . , n and 0

X ∈ L 2 (, G∞ ) has representation X t = E[X | G0 ] +

n i=1

Proof

0

t

f si dBsi .

See [8].

Theorem 3.8.2 Suppose {Nt }, t ≥ 0, is a Poisson process on the filtered probability space (, F, P, Ft ). Write Gt0 = σ {Ns : s ≤ t} and {Gt } for the completion of {Gt0 }, so that the filtration {Gt } is certainly right-continuous. Then every random variable X ∈ L 2 (, G∞ ) can be represented as stochastic integral ∞ X = E[X | G0 ] + f s dQ s , 0

where Q t = Nt − t, { f t } is a Gt -predictable process and E[

∞ 0

E[X | Gt ] = E[X | Ft ] Proof

f s2 ds] < ∞. Furthermore,

a.s. for all t.

See [8], [11]. Representation results for Markov chains

Consider a finite state Markov process {X t }, t ≥ 0, defined on a probability space (, F, P). We have noted in Example 2.6.17 that, without loss of generality, the state space of X can be identified with the set S = {e1 , . . . , e N } of standard unit vectors in IR N . Recall that pti = P(X t = ei ),

0 ≤ i ≤ N,

and d pt = A t pt . dt At = (ai j (t)), t ≥ 0.

(3.8.2)

3.8 Representation results

117

Write Fts for the right-continuous, complete filtration generated by σ {X r : s ≤ r ≤ t}, and Ft = Ft0 . We saw in Lemma 2.6.18 that t

Vt = X t − X 0 − Ar X r dr (3.8.3) 0

is an {Ft }-martingale. Lemma 3.8.3

X t = (t, 0) X 0 +

t

(r, 0)−1 dVr .

(3.8.4)

0

Proof

Differentiating (3.8.4) verifies the result.

If x, y are (column) vectors in IR N we shall write x, y = x y for their scalar (inner) product. Consider 0 ≤ i, j ≤ N with i = j. Then, because the Markov chain is piecewise constant, dX s = X s and X s− , ei ej dX s = X s− , ei ej X s = X s− , ei ej (X s − X s− ) = I (X s− = ei , X s = e j ). Therefore, 0

t

X s− , ei ej dX s =

ij

I (X s− = ei , X s = e j ) = Jt ,

0<s≤t

which equals the number of times X jumps from ei to e j in the interval [0, t]. Define the martingale t ij

Vt = X s− , ei ej dVs . 0

(Note the integrand is predictable.) Then t t ij Vt = X s− , ei ej dX s − X s− , ei ej As X s− ds 0 0 t ij = Jt − X s− , ei a ji (s)ds 0 t ij = Jt − X s , ei a ji (s)ds, 0

because X s = X s− for each ω, except for countably many s. That is, for i = j, t ij ij Jt = X s , ei a ji (s)ds + Vt . 0

The process ij

Jt .

0

t

X s , ei a ji (s)ds is, therefore, the compensator of the counting process

118

Stochastic calculus j

For a fixed j, 0 ≤ j ≤ N , write Jt for the number of jumps into state e j up to time t. Then, for i = j, N

j

Jt =

ij Jt

=

i=1

N i=1

t

j

X s , ei a ji (s)ds + Vt ,

0

N j ij where Vt is the martingale i=1 Vt . Finally, write Jt for the total number of jumps (of all kinds) of the process X up to time t. Then Jt =

N

j

Jt =

t

X s , ei a ji (s)ds + Q t ,

i, j=1 0

j=1

where Q t is the martingale

N

N j=1

j

Vt . However, aii (s) = −

N

a ji (s),

j=1

so Jt = −

N i=1

t

X s , ei aii (s)ds + Q t .

(3.8.5)

0

Before we state the next result we need the following definition. Definition 3.8.4 If M = (M 1 , . . . , M N ) is a vector, IR N -valued, square integrable martingale, the quadratic predictable variation process of M is the (unique) predictable matrix valued process M, M ∈ IR N ×N such that M M − M, M is a martingale. Here M M is the (Kronecker) product of the (column) vector M with the (row) vector M , so that M M can be identified with the matrix valued process with entries (M i M j ). Lemma 3.8.5 The quadratic predictable variation process of the (vector) martingale V (see Definition 3.8.4) is given by the matrix valued process t t t V, V t = diag Ar X r − dr − (diagX r − )Ar dr − Ar (diagX r − )dr. 0

Proof

0

0

Recall X t ∈ S is one of the unit vectors ei . Therefore, X t X t = diagX t .

(3.8.6)

Now by the product rule X t X t = X 0 X 0 + + 0

t

t 0

X r − (Ar X r − ) dr +

t 0

X r − dVr +

t

0

dVr X r − + V, V t + ([V, V ]t − V, V t ),

(Ar X r − )X r − dr

3.8 Representation results

119

where [V, V ]t − V, V t is an {Ft }-martingale. However, a simple calculation using 3.8.6 shows X r − (Ar X r − ) = (diagX r − )Ar , and (Ar X r − )X r − = Ar (diagX r − ) . Therefore, X t X t = X 0 X 0 + +

t

0

t

(diagX r − )Ar dr

Ar (diagX r − )dr + V, V t + martingale.

(3.8.7)

0

Also, from (3.8.6), X t X t

t

= diagX t = diagX 0 + diag

Ar X r − dr + diagVt .

(3.8.8)

0

The semimartingale decompositions (3.8.7) and (3.8.8) must be the same, so equating the predictable terms, t t t V, V t = diag Ar X r − dr − (diagX r − )Ar dr − Ar (diagX r − )dr. 0

0

0

We next note the following representation result: Remark 3.8.6 A time varying function f (t, X t ) of X t ∈ S takes only the values f (t, e1 ), . . . , f (t, e N ) for each t. Writing f i (t) = f (t, ei ), 1 ≤ i ≤ N , we see f can be represented by the vector f (t) = ( f 1 (t), . . . , f N (t)) ∈ IR N , so that f (t, X t ) = f (t), X t , where , denotes the inner product in IR N .

Therefore, we have the following differentiation rule and representation result: Lemma 3.8.7 Suppose the components of f (t) are differentiable in t. Then t t f (t, X t ) = f (0, X 0 ) + f (r ), X r dr + f (r ), Ar X r − dr 0 0 t + f (r ), dVr . Here,

(3.8.9)

0 t

f (r ), dVr is an Ft -martingale. Also, using (3.8.4),

0

t

f (t, X t ) = f (t), (t, 0)X 0 + 0

f (t), (t, r )dVr .

(3.8.10)

120

Stochastic calculus

The single jump process Here we discuss representation results for the single jump process. (See Examples 2.1.4 and 2.6.12.) Lemma 3.8.8 Suppose {Mt } is a uniformly integrable {Ft }-martingale (see Example 2.6.12 for the definition of the filtration {Ft }) such that M0 = 0 a.s. Then there is an F = t Ft measurable function h : → IR such that h ∈ L 1 (P) and 1 Mt = h(T, Z )I{T ≤t} − I{T >t} h(s, z)P(ds, dz) a.s. Ft ]0,t] E Proof If {Mt } is a uniformly integrable {Ft }-martingale, then {Mt } = E[h | Ft ] for some F = t Ft measurable random variable h. From the definition of F = t Ft , h is of the form h(T, Z ). However, 1 E[h(T, Z ) | Ft ] = h(T, Z )I{T ≤t} + I{T >t} h(s, z)P(ds, dz). Ft ]t,∞] E Because 0 = M0 = E[h] =

h(s, z)P(ds, dz)

=

h(s, z)P(ds, dz) + ]0,t]

E

h(s, z)P(ds, dz), ]t,∞]

E

the result follows.

Lemma 3.8.9 Suppose {Mt }, t ≥ 0, is a local martingale of {Ft }. 1. If c = ∞, or c < ∞ and Fc− = 0, then {Mt } is a martingale on [0, c[. 2. If c < ∞ and Fc− > 0, then {Mt } is a uniformly integrable martingale. Proof 1. Let {Tk } be an increasing sequence of stopping times such that lim Tk = ∞ a.s. and {Mt∧Tk } is a uniformly integrable martingale. If there is a k such that Tk ≥ T a.s. then Mt = Mt∧Tk a.s. is a uniformly integrable martingale. Otherwise, suppose for each k that P(Tk < T ) > 0. Then, by Lemma 2.6.13, there is a sequence {tk } such that Tk ∧ T = tk ∧ T for each k, and because P(T > tk ) > 0 we have tk ≤ c, otherwise we should have Tk ≥ T . Because lim Tk = ∞ we see that limk P(T > tk ) = 0, so lim tk = c. Now {Mt } is stopped at time T so Mt∧Tk = Mt∧tk . Consequently {Mt }, t ≤ tk , is a uniformly integrable martingale, and {Mt } is certainly a martingale on [0, c[. 2. Suppose now that c < ∞ and Fc− > 0. Then P(T = c) > 0. Because lim Tk = ∞ a.s. there is a k such that P(T = c, Tk > c) > 0. Consequently, for such a k, Tk ≥ T a.s. and the process {Mt∧Tk } = {Mt } is a uniformly integrable martingale.

3.8 Representation results

Write E

121

L 1 (µ) for the set of measurable functions g : → IR such that |g|dµ < ∞, and L 1loc (P) for the set of measurable functions g : → IR

[0,∞]×E

such that I{s≤t} g(s, x) ∈ L 1 (P) for all t < c. We have the following martingale representation result (see [11]). g

Theorem 3.8.10 {Mt } is a local Ft -martingale with M0 = 0 a.s. if and only if Mt = Mt for some g ∈ L 1loc (P), where g

Mt =

I{s≤t} g(s, x)q(ds, dx).

Proof Suppose g ∈ L 1loc (P). Then there is an increasing sequence of stopping times {Tk } such that lim Tk = ∞ a.s. and I{s 0. From Lemma 3.8.9 {Mt } is a uniformly integrable martingale, and so is of the form 1 Ft

Mt = h(T, Z )I{T ≤t} − I{T >t}

h(s, z)P(ds, dz), ]0,t]

(3.8.11)

E

where h(T, Z ) = M∞ . Define g(t, Z ) = h(t, Z ) − I{t
1 Ft

h(s, z)P(ds, dz) ]0,t]

if t < ∞,

(3.8.12)

E

and g(∞, Z ) = 0. Then g Mt

=

I{s≤t} gdq

= I{t≥T } g(T, Z ) −

]0,t]

− I{t
E

E

−1 g(s, z)Fs− P(ds, dz)

−1 g(s, z)Fs− P(ds, dz).

(3.8.13)

g

From (3.8.11) and (3.8.13) we see that Mt = Mt if

h(t, z) = g(t, z) − ]0,t]

E

−1 g(s, z)Fs− P(ds, dz).

(3.8.14)

122

Stochastic calculus

However, if g is given by (3.8.12) and t < c, −1 −1 g(s, z)Fs− P(ds, dz) = h(s, z)Fs− P(ds, dz) ]0,t]

E

]0,t]

E

− ]0,t]

E

h(u, z)P(du, dz)dFs ]0,s]

−

+ ]0,t]

E

]0,t]

E

E

]u,t]

−1 Fs−1 Fs− dFs h(u, z)P(du, dz)

−1 h(s, z)Fs− P(ds, dz)

=

+ ]0,t]

= Ft−1

−1 h(s, z)Fs− P(ds, dz)

= ]0,t]

−1 Fs−1 Fs−

E

−1 Ft−1 Fu− h(u, z)P(du, dz)

h(u, z)P(du, dz). ]0,t]

E

Therefore, (3.8.14) is satisfied if t < c. A similar calculation shows that the coefficients g g of It c) = 0 so it remains only to show that Mc = Mc when T (ω) = c. This is verified by a similar calculation to that above. We now check that g ∈ L 1 (µ). Because {Mt } is uniformly integrable |h|d p < ∞.

Therefore

|h|d p ≤ ≤ =

|h|d p − ]0,c[ −1 |h|d p − Fc− −1 |h|d p + Fc−

≤ (1 +

−1 Fc− )

Ft−1

|h|P(ds, dz)dFt

]0,t]

E

|h|P(ds, dz)dFt ]0,c[

]0,t]

E

|h|(Ft − Fc− P(ds, dz) ]0,c[

E

|h|d p < ∞.

Consequently, g ∈ L 1 (µ). 2. Now suppose c = ∞, or c < ∞ and Fc− = 0. Then from Lemma 3.8.9 {Mt } is a martingale on [0, c[, and so uniformly integrable on [0, t] for any t < c. Therefore Mt is of the form (3.8.11) for some h satisfying |h|P(ds, dz) < ∞, ]0,t]

E

3.9 Random measures

123

for all t < c. Calculations as in (1.) above show that, for g given by (3.8.12) and t < c, g Mt = Mt . Also −1 |g|P(ds, dz) ≤ |h|d p − Fs |h|P(ds, dz)dFs ]0,t]

E

]0,t]

E

]0,t[

≤

]0,s]

|h|d p 1 − ]0,t]

E

]0,t[

Fs−1 dFs

E

<∞

if t < ∞.

1 Therefore g ∈ L loc (P) and the proof is complete.

3.9 Random measures Definition 3.9.1 A measure µ on (IR+ , B(IR+ )) is a counting measure if

1. µ(B) ∈ {0, 1, . . . , +∞} = IN ∪ {∞} for every B ∈ B(IR+ ), 2. µ([a, b]) < ∞ for all bounded intervals [a, b] ⊂ IR+ . In other words a counting measure µ is just a countable subset D ⊂ IR+ , and for any given B ∈ B(IR+ ), µ(B) is the number of points in D which belong to B and we write µ(dx) = δx (dx). x∈D

Here δx (dx) denotes the unit mass at x. Integration with respect to a counting measure µ is reduced to discrete time summation, i.e. for any real valued function f , we have f (x)µ(dx) = f (x). IR

x∈D

Definition 3.9.2 Let (E, E) be a measurable space. Let D ⊂ IR+ be a countable set. A function p from D to E is called a point function. A point function p defines a counting measure µp (dt, dx) on B(IR+ ) ⊗ E by µp ((0, t] × A) = #{s ∈ D; s ≤ t, p(s) ∈ A},

t > 0, A ∈ E.

(3.9.1)

The right hand side of (3.9.1) stands for the number of times s up to time t when p(s) landed in A. Definition 3.9.3 Let (, F, P) be a given probability space and (E, E) be a measurable space. A nonnegative kernel µ(ω, dt, dx) is called a random measure (on E) if 1. µ(., A) is F-measurable for each fixed A ∈ B(IR+ ) ⊗ E, 2. µ(ω, .) is a σ -finite measure for each ω. Such a random measure is said to be integer valued if also

3. µ(ω, A) ∈ {0, 1, . . . , +∞} = IN ∪ {∞} for every A ∈ B(IR+ ) ⊗ E, 4. µ(ω, (0, t] × E) − µ(ω, (0, t) × E) = µ(ω, {t} × E) ≤ 1 for all (ω, t), that is to say the counting done by µ cannot increase by more than one at any isolated single time t.

124

Stochastic calculus

Remarks 3.9.4 1. If µ is an integer valued random measure write D = {(ω, t) : µ(ω, {t} × E) = 1}. It follows from Definition 3.9.3(3, 4) that, for each fixed ω, the set Dω = {t ∈ IR+ : µ(ω, {t} × E) = 1},

(3.9.2)

which is the set of times when µ(ω, .) jumps, is at most countable. 2. For t ∈ Dω write {t} × E = ∞ of E into proper n=1 {t} × An , where {An } is a partition nonempty subsets. Then there exists one and only one subset Aωn t ∈ ∞ n=1 An , say, such that µ(ω, {t} × Aωn t ) = 1, which implies that there exists a single point εt (ω) ∈ Aωn t such that µ(ω, {t} × εt (ω)) = 1. In summary, for each (ω, t) ∈ D there is a unique point εt (ω) ∈ E such that µ(ω, {t} × dx) = δεt (ω) (dx). Here δεt (ω) (dx) denotes the unit mass at εt (ω) and we can write µ(ω, dt, dx) = δ(s,εs (ω)) (dt, dx) (ω,s)∈D

=

I(εs ∈E) δ(s,εs (ω)) (dt, dx).

(3.9.3)

s≥0

Note that the set given by (3.9.2) can be written Dω = {t ∈ IR+ : εt (ω) ∈ E)}.

(3.9.4)

3. If {εt } is an E-valued stochastic process such that for each fixed ω the t-function ε(ω, .) takes on at most a countable number of values in E, then the above expression (3.9.3) defines an integer valued random measure. These random measures are sometimes called point processes, since for each ω the sample path {εt (ω)} consists of the countable set of points {t, εt (ω)}, that is, {εt (ω)} is a point function as given by Definition 3.9.2. 4. Since an integer valued random measure process µ satisfies the assumptions of Doob– Meyer Theorem 2.6.9, then there exists a predictable increasing process µ p , the compensator of µ, such that µ p (ω, {t} × E) ≤ 1, and µ − µ p is a local martingale.

Random measures associated with jump processes When dealing with stochastic processes which are not continuous everywhere but with sample paths which are right-continuous with left limits, the notion of random measures enters naturally into the scene. For instance, the process µ B (ω, (0, t], B) = I (X s ∈ B), B ∈ B(IR − {0}), (3.9.5) 0<s≤t

3.9 Random measures

125

is called the measure of jumps of the process X and it counts the increments of X which fall in B up to time t. Note that, since X is right-continuous with left limits, the series given by (3.9.5) is finite (a.s.) for every finite t and any subset B that is bounded away from zero. However, the number of jumps of X t need not be finite on finite intervals of time. If we eliminate the randomness parameter ω from (3.9.5) we obtain a σ -finite measure on the product σ -field generated by (0, ∞) × (IR − {0}). Remark 3.9.5 Since the process (3.9.5) is an integer valued random measure, by Remark 3.9.4(4) there exists a predictable increasing process µ Bp , the compensator of µ B , such that µ B − µ Bp is a local martingale. Examples 3.9.6 1. Counting processes. Since all the jumps are of size +1, in formula (3.9.5) the only set of interest is B = {+1}, that is, µ B (ω, (0, t], B) = I (X s = 1) = X t . 0<s≤t

2. Finite state processes. Suppose that for all t ≥ 0, X t ∈ {0, 1, . . . , N }. Then the possible jumps are the integers {−N , −N + 1, . . . , −1, +1, . . . , N − 1, N }, and in formula (3.9.5) the sets B of interest are all subsets of {−N , −N + 1, . . . , −1, +1, . . . , N − 1, N }. 3. Let Z t , t ∈ IR+ , be a finite state space process with right-constant sample paths on the state space S = {e1 , e2 , . . . , e N }. Here ei is the standard basis (column) vector in IR N with unity in the i-th position and zero elsewhere. Let Tk (ω) be the k-th jump time of Z , δTk (ω) (dr ) be the unit mass at time Tk (ω) and δ Z Tk (ω) (ei ) be the unit mass at Z Tk (ω) (ω). Since Z t is a jump process taking values in the vector space IR N we can write Zt = Z0 + Z r . 0
Here Z r = Z r − Z r − =

N

(ei − Z r − )

i=1

=

N

∞

δTk (ω) (dr )δ Z Tk (ω) (ei )

k=1

(ei − Z r − )µ Z (dr, ei ).

i=1

We assume that each Z t has almost surely finitely many jumps in any finite interval so that the random measure µ Z is σ -finite. Let µ˜ Z (dr, ei ) be the predictable compensator of µ Z so that N t Zt = Z0 + (ei − Z r − )µ˜ Z (dr, ei ) + Wt , i=1

where Wt =

N

t

i=1 0 (ei

0

− Z r − )(µ Z (dr, ei ) − µ˜ Z (dr, ei )).

126

Stochastic calculus

Definition 3.9.7 An integer valued random measure µ(ω, dt, dx) is a Poisson random measure if 1. for each A ∈ B(IR+ ) ⊗ E the random variable µ(., A) is Poisson distributed with parameter λ(A) = E[µ(., A)], i.e. P(µ(., A) = k) = exp{−λ(A)}

(λ(A))k , k!

and 2. if A1 , A2 , . . . , An are disjoint subsets of B(IR+ ) ⊗ E, then the random variables µ(., A1 ), µ(., A2 ), . . . , µ(., An ) are mutually independent.

More of the differentiation rule Suppose X = {X t , Ft } is a real local martingale. Let X t = X 0 + X c + X d be the unique decomposition given in Lemma 3.2.14, where X c is the continuous local martingale part of X and X d is the purely discontinuous local martingale part of X . Suppose µ X (ω, dt, dx) = I(X s =0) δ(s,X s ) (dt, dx). s>0

with predictable compensator µ Xp (ω, dt, dx). We also have from [18] t c Xt = X0 + Xt + x(µ X (dt, dx) − µ Xp (dt, dx)). 0

IR

Suppose f ∈ C , the space of functions continuously differentiable in t and twice continuously differentiable in x. Then the differentiation rule gives (see [18]) t ∂ f (s, X s− ) f (t, X t ) = f (0, X 0 ) + ds ∂s 0 t ∂ f (s, X s− ) c + dX s ∂x 0 1 t ∂ 2 f (s, X s− ) + dX c , X c s 2 0 ∂x2 t + [ f (s, X s− + x) − f (s, X s− )] (µ X (ds, dx) − µ Xp (ds, dx)) 1,2

0

+

IR

t 0

f (s, X s− + x) − f (s, X s− ) −

IR

∂ f (s, X s− ) µ Xp (ds, dx). ∂x

Example 3.9.8 Suppose that the scalar process {X t } is described by the stochastic differential equation dX t = f (t, X t )dt + σ (t, X t )dBt + γ (t, X t− , x)(µ(dt, dx) − µ p (dt, dx)). IR

Here B is a standard Brownian motion and µ is a random measure with compensator µ p .

3.10 Problems

127

Suppose f ∈ C 1,2 . Then the differentiation rule gives (see [18]) t ∂ f (s, X s− ) f (t, X t ) = f (0, X 0 ) + ds ∂s 0 t ∂ f (s, X s− ) + ( f (s, X s )ds + σ (s, X s− )dBs ) ∂x 0 t ∂ f (s, X s− ) + γ (s, X s− , x)(µ(ds, dx) − µ p (ds, dx)) ∂s 0 IR 1 t ∂ 2 f (s, X s− ) 2 + σ (s, X s− )ds 2 0 ∂x2 t

+ f s, X s− + γ (s, X s− , x) − f s, X s− (µ(dt, dx) − µ Xp (dt, dx)) 0

+

IR

t

0

f (s, X s− + γ (s, X s− , x) − f s, X s−

IR

− γ (s, X s− , x)

∂ f (s, X s− ) X µ p (dt, dx). ∂x

3.10 Problems 1. Let X 1 , X 2 , . . . be a sequence of i.i.d. N (0, 1) random variables and consider the process Z 0 = 0 and Z n = nk=1 X k . Show that n [Z , Z ]n = X k2 , k=1

Z , Z n = n, E([Z , Z ]n ) = E(

n

X k2 ) = n.

k=1

2. Show that if X and Y are (square integrable) martingales, then X Y − X, Y is a martingale. 3. Establish the identity 1 ([X + Y, X + Y ]n − [X, X ]n − [Y, Y ]n ). 2 4. Show that for any processes X , Y , [X, Y ]n =

X n Yn =

n

X k−1 Yk +

k=1

n

Yk−1 X k + [X, Y ]n .

k=1

5. Show that for a real valued differentiable function f and a stochastic process X we have the discrete time version of the Itˆo formula, f (X n ) = f (X 0 ) +

n

f (X k−1 )X k

k=1

+

n k=1

[ f (X k ) − f (X k−1 ) − f (X k−1 )X k ].

128

Stochastic calculus

6. Show that if {X n } is a square integrable martingale then X 2 − X, X is a martingale. 7. Find [B + N , B + N ]t and B + N , B + N t for a Brownian motion process {Bt } and a Poisson process {Nt }. L2

8. Show that limδn →0 Sn = Bt2 /2 + (α − 12 )t, where Sn is given by (3.4.1) where τkn = n (1 − α)tk + αtk−1 , 0 ≤ α ≤ 1. 9. Let f be a deterministic square integrable function and Bt a Brownian motion. Show that the stochastic integral t f (s)dBs 0

is a normally distributed random variable with distribution t N (0, 0 f 2 (s)ds). 10. Show that if t E[ f (s)]2 ds < ∞, 0

the Itˆo process It =

t

f (s)dBs 0

has orthogonal increments, i.e., for 0 ≤ r ≤ s ≤ t ≤ u, E[(Iu − It )(Is − Ir )] = 0. 11. Show that

0

t

(Bs2 − s)dBs =

Bt3 − t Bt . 3

12. Prove the second part of Lemma 3.4.3. 13. Show that the process Bt2 /2 + (α − 12 )t is an Ft -martingale if and only if α = 0. 14. Using the Itˆo formula, show that the Doob–Meyer decomposition of Bt4 is given by t t Bt4 = 4 Bs3 dBs + 6 Bs2 ds, 0

0

where B is the Brownian motion process. 15. Using the Itˆo formula, show that d(Bt )n = n Btn−1 dBt +

n(n − 1) n−2 Bt dt. 2

16. If N is a standard Poisson process show that the stochastic integral t Ns d(Ns − s) 0

is not a martingale. However, show that t Ns− d(Ns − s) 0

3.10 Problems

129

is a martingale. Here Nt is a Poisson process. (Note that at any jump time s, Ns− = Ns − 1.) 17. Prove that t 2 Ns− dNs = 2 Nt − 1. 0

Here Nt is a Poisson process. 18. Show that the unique solution of

t

xt = 1 +

xs− dys 0

c is given by xt = e yt 0,s≤t (1 + ys ). Here yt is a finite-variation deterministic function. 19. Show that the unique solution of t xt = 1 + xs− dNs

0

is given by xt = 2 Nt . Here Nt is a Poisson process. 20. Show that given two adapted, measurable processes xt and yt , such that t 2 E (xs ) ds < ∞, 0

and

E

t

(ys )2 ds < ∞,

0

we have for 0 ≤ r ≤ t, t t t E xs dBs ys dBs | Fr = E xs ys ds | Fr , 0

0

0

where Bt is a Brownian motion process and Ft is its natural filtration. 21. Show that the linear stochastic differential equation dX t = F(t)X t dt + G(t)dt + H (t)dBt , with X 0 = ξ has the solution t t −1 −1 X t = (t) ξ + (s)G(s)ds + (s)H (s)dBs . 0

(3.10.1)

(3.10.2)

0

Here F(t) is an n × n bounded measurable matrix, H (t) is an n × m bounded measurable matrix, Bt is an m-dimensional Brownian motion and G(t) is an IRn -valued bounded measurable function. (t) is the fundamental matrix solution of the deterministic equation dX t = F(t)X t dt. See [1].

130

Stochastic calculus

22. Show that the solution (3.10.2) of the stochastic differential equation (3.10.1) with E|X 0 |2 = E|ξ |2 < ∞ has mean t −1 µt = E[X t ] = (t) E[ξ ] + (s)G(s)ds , 0

satisfying the deterministic differential equation dµt = F(t)µt dt + G(t)dt,

µ0 = E[ξ ],

and covariance matrix P(t) satisfying the deterministic matrix differential equation d p(t) = F(t)P(t)dt + P(t)F(t) dt + H (t)H (t) dt,

µ0 = E[ξ ],

with initial value P(0) = E[ξ − Eξ ][ξ − Eξ ] . 23. Show that the solution (3.10.2) of the stochastic differential equation (3.10.1) is a Gaussian stochastic process if and only if X 0 is normally distributed or constant. 24. Show that the linear stochastic differential equation dX t = −α X t dt + σ dBt , with E|X 0 |2 = E|ξ |2 < ∞ has the solution t X t = e−αt ξ + σ e−α(t−s) dBs , 0

and µt = E[X t ] = e−αt E[ξ ], σ 2 (1 − e−2αt ) . 2α 25. Show that the sequence of stopping times given by Remark 3.4.5 is indeed a localizing sequence of stopping times, i.e. a nondecreasing sequence, converging to ∞ with probability 1. 26. Suppose for θ ∈ IR, P(t) = Var(X t ) = e−2αt Var(ξ ) +

1 2 X tθ = eθ Mt − 2 θ At

is a martingale, and suppose there is an open neighborhood I of θ = 0 such that for all θ ∈ I and all t (P- a.s.), 1. |X tθ | ≤ a, dX θ 2. | t | ≤ b, dθ d2 X tθ 3. | 2 | ≤ c. dθ Here a, b, c are nonrandom constants which depend on I , but not on t. Show that then the processes {Mt } and {Mt2 − At } are martingales. 27. Prove the result of Example 3.6.12.

4

Change of measures

4.1 Introduction We begin by giving a conditional form of Bayes’ Theorem. The result relates conditional expectations under two different measures. Consider first a simple situation like, for instance, the throwing of a die. Here = {ω1 , ω2 , ω3 , ω4 , ω5 , ω6 } = {1, 2, 3, 4, 5, 6}. Suppose P(ωi ) = pi = 1/6. Let P be another probability measure such that P(ωi ) =

1 . 6

Then the two measures are related by the Radon–Nikodym derivative 1 P I{ω } (ω). (ω) = (ω) = P 6 pi i i Write j = (ω j ) = 1/(6 p j ). Consider the sub-σ -field G = {{odd}, {even}, , ∅}. Consider a set of real numbers {x1 , x2 , . . . , x6 } and an associated random variable X (ω) → IR given by: X (ωi ) = xi ,

i = 1, . . . , 6.

The G-measurable random variable E[X | G](ω) is constant on the two atoms of G and is given by the following expression: E[X | G](ω) =

x j j P[ω j | G](ω)

j

=

j

x j j P[ω j | {even}]I{even} (ω) +

j

x j j P[ω j | {odd}]I{odd} (ω)

132

Change of measures

x2 2 p2 + x4 4 p4 + x6 6 p6 I{even} (ω) P({even}) x 1 1 p 1 + x 3 3 p 3 + x 5 5 p 5 + I{odd} (ω) P({odd}) x2 + x4 + x6 x1 + x3 + x5 = I{even} (ω) + I (ω). 6( p2 + p4 + p6 ) 6( p1 + p3 + p5 ) {odd}

=

Similarly, E[ | G](ω) = =

j P[ω j | G](ω) j {P[ω j | {even}]I{even} (ω) + P[ω j | {odd}]I{odd} (ω)}

=

2 p2 + 4 p4 + 6 p6 1 p 1 + 3 p 3 + 5 p 5 I{even} (ω) + I{odd} (ω) P({even}) P({odd})

=

1 1 I{even} (ω) + I (ω). 2( p2 + p4 + p6 ) 2( p1 + p3 + p5 ) {odd}

Now with E denoting expectation under P, E[X | G](ω) = x j P[X = x j | G](ω) = x j {P[X = x j | {even}]I{even} (ω) + P[X = x j | {odd}]I{odd} (ω)} =

x 2 p 2 + x 4 p 4 + x 6 p6

I{even} (ω) +

x1 p1 + x3 p3 + x5 p5

P({even}) P({odd}) x1 + x3 + x5 x2 + x4 + x6 = I{even} (ω) + I{odd} (ω). 3 3

I{odd} (ω)

We, therefore, see that

E X | G E X | G E[X | G](ω) = (ω). I{even} (ω) + I E |G E | G {odd}

We now prove this result in full generality. Recall that φ is integrable if E|φ| < ∞. Theorem 4.1.1 (Conditional Bayes’ Theorem) Suppose (, F, P) is a probability space and G ⊂ F is a sub-σ -field. Suppose P is another probability measure absolutely continuous with respect to P (P P) and with a Radon–Nikodym derivative dP = . dP Then if φ is any integrable F-measurable random variable,    E φ | G if E | G > 0, E |G E[φ | G] =  0 otherwise. Proof

We must show that for any A ∈ G, E[φ | G]dP = αdP, A

A

4.1 Introduction

where

   E φ | G E |G α=  0 otherwise.

133

if E | G > 0,

Write G = {ω : E | G = 0}, so G ∈ G. Then

E | G dP = 0 = G

dP, G

and ≥ 0 a.s. So either P(G) = 0, or the restriction of to G is 0 a.s. In either case, = 0 a.s. on G. Now G c = {ω : E | G > 0}. Suppose A ∈ G; then A = B ∪ C, where B = A ∩ G c and C = A ∩ G. Further, E[φ | G]dP = φdP = φdP A A A = φdP + φdP. B

Of course, = 0 a.s. on C ⊂ G, so φdP = 0 = αdP, C

by definition. Now

(4.1.2)

C

αdP = B

= = = = = That is

(4.1.1)

C

(E φ | G /E | G )dP B E φ | G

E IB E |G E φ | G

E IB E |G

E φ | G E E[I B |G E |G E φ | G

E I B E[ | G] E |G E I B φ .

φdP = B

αdP. B

(4.1.3)

134

Change of measures

From (4.1.1), adding (4.1.2) and (4.1.3), we see that φdP + φdP = φdP C B A = E[φ | G]dP = αdP, A

A

and the result follows. Another useful version of the preceding theorem is the following result. Theorem 4.1.2 Suppose (, F, P) is a probability space with a filtration {Ft , t ≥ 0}. Suppose P is another probability measure absolutely continuous with respect to P (P P) on F and with Radon–Nikodym derivative dP = . dP Define the martingale E | Ft = t . Then if {φt } is any {Ft }-adapted process,  E t φt | Fs   if E t | Fs > 0, E t | Fs E φt | Fs =  0 otherwise.

4.2 Measure change for discrete time processes Example 4.2.1 Let {bn } be a sequence of i.i.d. Bernouilli random variables on a probability space (, F, P) such that P(bk = 1) = p1 and P(bk = 2) = p2 , p1 + p2 = 1. Consider the filtration {Fk } = σ {b1 , . . . , bk }. Suppose that we wish to define a new probability measure P on (, Fk } such that P(bk = 1) = P(bk = 2) = 1/2. For 1 ≤ k ≤ N define a positive {Fk , P}-martingale {k } with P-mean 1 and put dP (ω) = N (ω). FN dP Let 0 = 1. Since 1 is F1 = σ {b1 }-measurable we have 1 (ω) =

P(b1 = 1) P(b1 = 2) I(b1 =1) (ω) + I(b =2) (ω), P(b1 = 1) P(b1 = 2) 1

or 1 (ω) =

1 1 I(b =1) (ω) + I(b =2) (ω). 2 p1 1 2 p2 1

(4.2.1)

4.2 Measure change for discrete time processes

135

Similarly, 2 (ω) = =

2 P(bi = j, b j = i) I(b = j,b j =i) P(bi = j, b j = i) i i, j=1 2

1 I(bi = j,b j =i) . 4 p i pj i, j=1

Define λk (ω) =

2 1 I(b =i) (ω), 2 pi k i=1

N (ω) =

N

λk (ω).

k=1

Now E[k | Fk−1 ] = k−1 E[λk | Fk−1 ] 2 1 = k−1 E[ I(b =i) (ω) | Fk−1 ] 2 pi k i=1 = k−1

2 1 pi = k−1 . 2 pi i=1

Hence for 1 ≤ k ≤ N , {k } is a martingale and since 0 = 1, E[k ] = 1. Lemma 4.2.2 Under the probability measure P defined by (4.2.1), {bn } is a sequence of i.i.d. Bernouilli random variables such that P(bn = 1) = P(bn = 2) = 1/2. Proof

Using Bayes’ Theorem 4.1.1 write P[bn = | Fn−1 ] = E[I(bn =) | Fn−1 ] = =

E[I(bn =) n | Fn−1 ] E[n | Fn−1 ]

n−1 E[I(bn =) λn | Fn−1 ] E[I(bn =) λn ] = n−1 E[λn | Fn−1 ] E[λn ]

= E[I(bn =) λn ]. Here λn =

2 i=1

=

1 1 I(b =i) (ω) and E[λn ] = 1 so that 2 2 pi n 1 P[bn = ] 2 p 1 1 = p = , 2 p 2

P[bn = | Fn−1 ] =

which shows that under P {bn } is a sequence of i.i.d. Bernouilli random variables such that P(bn = 1) = P(bn = 2) = 1/2.

136

Change of measures

Example 4.2.3 Let {X n } be a sequence of random variables with positive probability density functions φn on some probability space (, F, P). Consider the filtration {Fn } = σ {X 1 , . . . , X n }. Suppose that we wish to define a new probability measure P on (, Fn } such that X n are i.i.d. with positive probability density function α. Let λ0 = 1 and for k ≥ 1, λk =

α(X k ) , φk (X k )

n =

n

λk ,

k=0

and dP (ω) = n (ω). Fn dP Lemma 4.2.4 The sequence of random variables {n }, n ≥ 0 is an {Fn , P}-martingale with P-mean 1. Moreover, under P, {X n } is a sequence of i.i.d. random variables with probability density function α. Proof

We have to show that E[n | Fn−1 ] = n−1 .

However, n = n−1 λn and since n−1 is Fn−1 -measurable we must show that E[λn | Fn−1 ] = 1. In view of the definition of λn we have α(x) α(X n ) E[λn | Fn−1 ] = E | Fn−1 = E φk (x)dx | Fn−1 = 1. φk (X n ) IR φk (x) Since {n } is a martingale, for all n, E[λn ] = E[λ0 ] = 1. Let f be any integrable real-valued “test” function. Using Bayes’ Theorem 4.1.1, E[ f (xn ) | Fn−1 ] =

E[ f (xn )n | Fn−1 ] = E[ f (xn )λn | Fn−1 ]. E[n | Fn−1 ]

Using the form of λn we have α(x) E f (x) f (x)α(x)dx, φk (x) | Fn−1 = φk (x) IR IR which finishes the proof. The next example is a generalization of Example 4.2.1. Some dependence between the random variables bn is introduced. Example 4.2.5 Let {ηn }, 1 ≤ n ≤ N be a Markov chain with state space {1, 2} on a probability space (, F, P) such that P(ηn = j | ηn−1 = i) = pi j and let { p10 , p20 } be the distribution

4.2 Measure change for discrete time processes

137

of η0 . Consider the filtration {Fn } = σ {η0 , η1 , . . . , ηn }. Suppose that we wish to define a new probability measure P on (, Fn } such that P(ηn = j | ηn−1 = i) = pi j . Let 0 = 1. Since 1 is F1 = σ {η0 , η1 }-measurable we have that 1 (ω) =

p 11 p I(η =1,η1 =1) (ω) + 12 I(η0 =1,η1 =2) (ω) p11 0 p12 +

p 21 p I(η =2,η1 =1) (ω) + 22 I(η0 =2,η1 =2) (ω). p21 0 p22

Define λn (ω) =

p ji I(η =i,η = j) (ω), p ji n−1 n

ij

N =

N

λn .

n=1

Lemma 4.2.6 {n } is an {Fn , P}-martingale and under P the Markov chain η has transition probabilities pi j . Proof Using the fact that n−1 is Fn−1 -measurable and the Markov property of {ηn } under P we can write E[n | Fn−1 ] = n−1

= n−1 = n−1

p ji E[I(ηn−1=i ,ηn = j) | ηn−1 ] pi j ij p ji pi j I(ηn−1=i ) p ji ij i

I(ηn−1=i )

p ji

j

= n−1 . Hence {n } is a martingale and since 0 = 1, E[n ] = 1 for all n ≥ 0. Using Bayes’ Theorem 4.1.1 write P[ηn = | Fn−1 ] = E[I(ηn =) | Fn−1 ] =

E[I(ηn =) n | Fn−1 ] E[n | Fn−1 ]

=

n−1 E[I(ηn =) λn | Fn−1 ] n−1 E[λn | Fn−1 ]

=

E[I(ηn =) λn | Fn−1 ] E[λn | Fn−1 ]

= E[I(ηn =) λn | Fn−1 ].

138

Here λn (ω) =

Change of measures

ij

p ji I(η =i,η = j) (ω) and E[λn ] = 1] so that: p ji n−1 n p i I(η ) P[ηn = | Fn−1 ] pi n−1=i i p i = I(η ) P[ηn = | ηn−1 ] pi n−1=i i p i = pi I(ηn−1=i ) pi i

P[ηn = | Fn−1 ] =

= p X n−1 , , which shows that under P, {ηn } is a Markov chain with transition probabilities pi j .

Example 4.2.7 Let {ηn } be a Markov chain with state space S = {e1 , . . . , e M }, where ei are unit vectors in IR M with unity as the i-th element and zeros elsewhere. Write Fn0 = σ {η0 , . . . , ηn } for the σ -field generated by η0 , . . . , ηn , and {Fn } for the complete filtration generated by the Fn0 ; this augments Fn0 by including all subsets of events of probability zero. The Markov property implies here that P(ηn+1 = e j | Fn ) = P(ηn+1 = e j | ηn ). Write

= ( p ji ) ∈ IR M×M , so that E[ηk+1 | Fk ] = E[ηk+1 | ηk ] = ηk . From (2.4.3) we have the semimartingale ηn+1 = ηn + Vn+1 .

(4.2.2)

The Markov chain is a simple kind of stochastic process on S. However, a more simple process would be one in which η is independently and uniformly distributed over its state space S at each time n. This is modeled by supposing there is a probability measure P on (, F) such that at time n, P(ηn+1 = j | ηn = i) = 1/M. Given such a simple process, and its probability P, we shall construct a new probability P so that under P, η is a Markov chain with transition matrix . Recall that, if = ( p ji ) is a transition matrix, then ( p ji ) ≥ 0 and M j=1 p ji = 1. Suppose is any transition matrix. Suppose {ηn }, n ≥ 0, is a process on the finite state space S such that, under a probability P, P(ηn = j | ηn−1 = i) =

1 . M

That is, the probability distribution of η is independent, and uniformly distributed at each time n.

4.2 Measure change for discrete time processes

139

Lemma 4.2.8 Define λ¯ = M

M

( η−1 , e j η , e j ),

j=1

¯n = and

n

=1

λ¯ .

A new probability measure P is defined by putting Markov chain with transition matrix . Proof

Note first that E[λ¯ | F−1 ] = M E

M

dP = n , and under P, η is a dP Fn

( η−1 , e j η , e j ) | F−1

j=1

=M =

M 1

η−1 , e j M j=1

M M

η−1 , ei p ji = 1.

i=1 j=1

Then, using Bayes’ Theorem 4.1.1, P(ηn = e j | Fn−1 ) = E[ X n , e j | Fn−1 ] =

E[ X n , e j n | Fn−1 ] E[n | Fn−1 ]

.

Because n = n−1 λ¯ n and n−1 is Fn−1 -measurable this is E[ X n , e j λ¯ n | Fn−1 ] = M E[ ηn−1 , e j ηn , e j ) | Fn−1 ] E[λ¯ n | Fn−1 ] = ηn−1 , e j , and, as this depends on ηn−1 this equals P(ηn = e j | ηn−1 ). If ηn−1 = ei we see that P(ηn = e j | ηn−1 = ei ) = p ji and so, under P, η is a Markov chain with transition matrix . Example 4.2.9 In this example we discuss the filtering of a partially observed discrete-time, finite-state Markov chain, that is, the Markov chain is not observed directly; rather there is a discrete-time, finite-state observation process {Yk }, k ∈ IN, which is a “noisy” function of the chain. All processes are defined initially on a probability space (, F, P); below a new probability measure P is defined. A system is considered whose state is described by a finite-state, homogeneous, discretetime Markov chain X k , k ∈ IN. We suppose X 0 is given, or its distribution known. If the state space of X k has N elements it can be identified, without loss of generality, with the set S X = {e1 , . . . , e N }, where ei are unit vectors in IR N with unity as the i-th element and zeros elsewhere.

140

Change of measures

Write Fk = σ {X 0 , . . . , X k }, for the complete filtration generated by X 0 , . . . , X k . The Markov property implies here that P(X k+1 = e j | Fk ) = P(X k+1 = e j | X k ). Write a ji = P(X k+1 = e j | X k = ei ), A = (a ji ) ∈ IR N ×N ,

(4.2.3)

so that E[X k+1 | Fk ] = E[X k+1 | X k ] = AX k and X k+1 = AX k + Vk+1 . The state process X is not observed directly. We suppose there is a function c(., .) with finite range and we observe the values Yk+1 = c(X k , w k+1 ),

k ∈ IN,

(4.2.4)

where the w k are a sequence of independent, identically distributed (i.i.d.) random variables. We shall write {Gk } for the complete filtration generated by X and Y , and {Yk } for the complete filtration generated by Y . Suppose the range of c(., .) consists of M points. Then we can identify the range of c(., .) with the set of unit vectors SY = { f 1 , . . . , f M }, f j = (0, . . . , 1, . . . , 0) ∈ IR M , where the unit element is the j-th element. Now (4.2.4) implies P(Yk+1 = f j | Gk ) = P(Yk+1 = f j | X k ). Write C = (c ji ) ∈ IR M×N , c ji = P(Yk+1 = f j | X k = ei ),

(4.2.5)

so that M j=1 c ji = 1 and c ji ≥ 0, 1 ≤ j ≤ M, 1 ≤ i ≤ N . Note that, for simplicity, we assume that the c ji are independent of k. We have, therefore, E[Yk+1 | X k ] = C X k . If Wk+1 := Yk+1 − C X k then, taking the conditional expectation and noting E[C X k | X k ] = C X k , we have E[Wk+1 | Gk ] = E[Yk+1 − C X k | X k ] = C X k − C X k = 0, so Wk is a (P, Gk ) martingale increment and Yk+1 = C X k + Wk+1 , Write Yki = Yk , f i so Yk = (Yk1 , . . . , YkM ) , k ∈ IN. For each k ∈ IN, exactly one component is equal to 1, the remainder being 0. M i Note i=1 Yk = 1. Write i i ck+1 = E[Yk+1 | Gk ] =

N

ci j e j , X k ,

j=1 1 M and ck+1 = (ck+1 , . . . , ck+1 ) . Then

ck+1 = E[Yk+1 | Gk ] = C X k . We shall suppose initially that cki > 0, 1 ≤ i ≤ M, k ∈ IN. (See, however, Remark 4.2.12.) M i Note i=1 ck = 1, k ∈ IN.

4.2 Measure change for discrete time processes

141

In summary then, we have under P, X k+1 = AX k + Vk+1 Yk+1 = C X k + Wk+1 ,

(4.2.6) k ∈ IN,

(4.2.7)

where X k ∈ S X , Yk ∈ SY , A and C are matrices of transition probabilities given in (4.2.3), (4.2.5). The entries satisfy N

a ji = 1, a ji ≥ 0,

(4.2.8)

c ji = 1, c ji ≥ 0.

(4.2.9)

j=1 M j=1

We assume, for this measure change, ci > 0, 1 ≤ i ≤ M, ∈ IN. This assumption, in effect, is that given any Gk , the observation noise is such that there is a nonzero probability i that Yk+1 > 0 for all i. This assumption is later relaxed to achieve the main results of this section. Define λ =

M M −1

Y , f i , i i=1 c

k =

k

λ .

=1

Lemma 4.2.10 With the above definitions E[λk+1 | Gk ] = 1. Proof E[λk+1 | Gk ] = =

M 1 1 i P(Yk+1 = 1 | Gk ) i M i=1 ck+1 M 1 1 · ci = 1. i M i=1 ck+1 k+1

∞ We now define a new probability measure P on , G by putting the restriction of =1

dP dP the Radon–Nikodym derivative to the σ -field Gk equal to k . Thus = k . This dP dP Gk means that, for any set B ∈ Gk , P(B) = k dP. B

Equivalently, for any Gk - measurable random variable φ, dP E φ = φdP = φ dP = φk dP = E k φ , dP where E and E denote expectations under P and P, respectively. Lemma 4.2.11 Under P, {Yk }, k ∈ IN, is a sequence of i.i.d. random variables each having the uniform distribution which assigns probability 1/M to each point f i , 1 ≤ i ≤ M, in its range space.

142

Proof

Change of measures

Using Lemma 4.2.10 and Bayes’ Theorem 4.1.1 we have j

P(Yk+1 = 1 | Gk ) = E[ Yk+1 , f j | Gk ] = E[k+1 Yk+1 , f j | Gk ]/E[k+1 | Gk ] = k E[λk+1 Yk+1 , f j | Gk ]/k E[λk+1 | Gk ] = E[λk+1 Yk+1 , f j | Gk ]

1 M =E

Y , f Y , f | G k+1 i k+1 j k i=1 i Mck+1

1 j = E Yk+1 | Gk j Mck+1 1 1 j = ck+1 = , j M Mck+1 a quantity independent of Gk , which finishes the proof. Note that E[X k+1 | Gk ] =

E[k+1 X k+1 | Gk ] = E[λk+1 X k+1 | Gk ] = AX k , E[k+1 | Gk ]

so that under P, X remains a Markov chain with transition matrix A. A reverse measure change ∞ What we wish to do now is start with a probability measure P on , Gn such that n=1

1. the process X is a finite-state Markov chain with transition matrix A and 2. {Yk }, k ∈ IN, is a sequence of i.i.d. random variables and j

j

P(Yk+1 = 1 | Gk ) = P(Yk+1 = 1) = 1/M. Suppose C = (c ji ), 1 ≤ j ≤ M, 1 ≤ i ≤ N is a matrix such that c ji ≥ 0 and M j=1 c ji = 1. ∞ We shall now construct a new measure P on , Gn such that under P, (4.2.7) still holds and E[Yk+1 | Gk ] = C X k . We again write

i and ck+1

n=1

ck+1 = C X k , K i = ck+1 , f i = C X k , f i , so that i=1 ck+1 = 1.

Remark 4.2.12 We do not divide by the cki in the construction of P from P. Therefore, we no longer require the cki to be strictly positive. The construction of P from P is inverse to that of P from P. Write = M

M i=1

ci Y , f i ,

k =

k

=1

λ .

4.2 Measure change for discrete time processes

143

Lemma 4.2.13 With the above definitions E[λk+1 | Gk ] = 1. Proof

Following the proof of Lemma 4.2.13, E[λk+1 | Gk ] = M

M

i i ck+1 P(Yk+1 = 1 | Gk )

i=1

=M

M i ck+1 i=1

This time set Theorem.)

M

=

M

i ck+1 = 1.

i=1

dP = k . (The existence of P follows from Kolmogorov’s Extension dP Gk

Lemma 4.2.14 Under P, E[Yk+1 | Gk ] = C X k . Proof

The proof is left as an exercise.

Write qk (er ), 1 ≤ r ≤ N , k ∈ IN, for the unnormalized, conditional probability distribution such that E[k X k , er | Yk ] = qk (er ). Now

N

i=1 X k , ei

= 1, so

N i=1

N qk (ei ) = E k

X k , ei | Yk = E k | Yk . i=1

Therefore, the normalized conditional probability distribution pk (er ) = E[ X k , er | Yk ] is given by pk (er ) =

qk (er ) . k qk (e j ) j=1

Theorem 4.2.15 For k ∈ IN, and 1 ≤ r ≤ N , we have the recursive estimate qk+1 = A diag(qk ) M

M

i=1

Yi

ci jk .

144

Change of measures

Proof Using the independence assumptions under P and the fact that Nj=1 X k , e j = 1, we have

E[ X k+1 , er k+1 | Yk+1 ] = E AX k + Vk+1 , er k k+1 | Yk+1 =M

N

E[ X k , e j ar j k | Yk ]

j=1

=M

N

M

Yi

ci jk+1

i=1

qk (e j )ar j

j=1

M

Yi

ci jk+1 ,

i=1

and the result follows. Example 4.2.16 (Change of measure for linear systems). Consider a system whose state at times k = 1, 2, . . . is xk ∈ IR. Let (, F, P) be a probability space upon which {vk }, k ∈ IN is a sequences of N (0, 1) Gaussian random variables, having zero means and variances 1. Let {Fk }, k ∈ IN be the complete filtration (that is, F0 contains all the P-null events) generated by {x0 , x1 , . . . , xk }. The state of the system satisfies the linear dynamics xk+1 = axk + bvk+1 .

(4.2.10)

Note that E[vk+1 | Fk ] = 0. Initially we suppose all processes are defined on an “ideal” probability space (, F, P); then under a new probability measure P, to be defined, the model dynamics (4.2.10) will hold. Suppose that under P, {xk }, k ∈ IN, is an i.i.d. N (0, 1) sequence with density function φ. For each l = 0, 1, 2, . . . define φ(b−1 (xl − axl−1 )) , bφ(xl ) k

k = λl . λl =

l=0

Lemma 4.2.17 The process {k }, k ∈ IN, is a P-martingale with respect to the filtration {Fk }. Proof

Since k is Fk -measurable, E[k+1 | Fk ] = k E[λk+1 | Fk ].

So that it is enough to show that E[λk+1 | Fk ] = 1: φ(b−1 (xk+1 − axk )) | Fk ] E[λk+1 | Fk ] = E[ bφ(xk+1 ) φ(b−1 (x − axk )) = φ(x)dx. bφ(x) IR

4.3 Girsanov’s theorem

145

Using the change of variable b−1 (x − axk ) = u, φ(u)du = 1, IR

and the result follows. Define P on {, F} by setting the restriction of the Radon–Nykodim derivative Gk equal to k . Then:

dP dP

to

Lemma 4.2.18 On {, F} and under P, {vk }, k ∈ IN, is a sequence of i.i.d. N (0, 1) random variables, where

vk+1 = b−1 (xk+1 − axk ). Proof Suppose f : IR → IR is a “test” function (i.e. measurable function with compact support). Then with E (resp. E) denoting expectation under P (resp. P) and using Bayes’ Theorem 4.1.1, E[ f (vk+1 ) | Fk ] =

E[k+1 f (vk+1 ) | Fk ] E[k+1 | Fk ]

= E[λk+1 f (vk+1 ) | Fk ], where the last equality follows from Lemma 4.2.17. Consequently E[ f (vk+1 ) | Fk ] = E[λk+1 f (vk+1 ) | Fk ] φ(b−1 (xk+1 − axk )) =E f (b−1 (xk+1 − axk )) | Fk . bφ(xk+1 ) Using the independence assumption under P this is φ(b−1 (x − axk )) −1 φ(u) f (u)du, f (b (x − axk ))φ(x)dx = bφ(x) IR IR and the lemma is proved.

4.3 Girsanov’s Theorem In this section we investigate how martingales, and in particular, Brownian motion, are changed when a new, absolutely continuous, probability measure is introduced. We need first the following results. Theorem 4.3.1 Suppose (, F, P) is a probability space with a filtration {Ft , t ≥ 0}. Suppose P is another probability measure equivalent to P (P P and P P) and with Radon–Nikodym derivative dP = . dP

146

Change of measures

Define the martingale E | Ft = t Then 1. {X t t } is a local martingale under P if and only if {X t } is a local martingale under P. 2. Every P-semimartingale is a P-semimartingale. Proof 1. We prove the result for martingales. The extension to local martingales can be found in Proposition 3.3.8 of Jacod and Shiryayev [19]. Let {X t } be a P martingale and F ∈ Fs , s ≤ t. We have X t dP = X s dP = X s s dP, F

F

and

F

X t dP = F

that is

X t t dP, F

X t t dP = F

X s s dP. F

Hence X is a P-martingale. The proof of the converse is identical. 2. By definition, a semimartingale is the sum of a local martingale and a process of finite variation. We need only prove the theorem in one direction and we can suppose X 0 = 0. If {X t } is a semimartingale under P, then by the product rule {X t t } is a semimartingale under P, which has a decomposition X t t = Nt + Vt , where N a local martingale and V is a process of finite variation. Therefore −1 X t = Nt −1 t + Vt t ,

since, by the equivalence of P and P, −1 t exists and is a P-martingale. By the first part −1 of this theorem Nt t is a local martingale under P, and the second term is the product of the P-semimartingale V of finite variation and the P-martingale −1 t . Theorem 4.3.2 Suppose t and P are as mentioned in Theorem 4.3.1 above. Suppose {X t } is a local martingale under P with X 0 = 0, (i) {X t } is a special semimartingale under P if the process { X, t } exists and then under P, t t −1 Xt = Xt − s− d X, s + −1 s− d X, s . 0

0

4.3 Girsanov’s theorem

147

Here, the first term is a local martingale under P, and the second is a predictable process of finite variation. (ii) In general, the process t Xt − −1 s− d[X, ]s 0

is a local martingale under P. Proof

See [11] page 162.

The following important theorem is an extension of the following rather simple situation. Let X 1 , . . . , X n be i.i.d. normal random variables with mean E(X ) = 0 and variance E(X 2 ) = σ 2 = 0 under probability measure P and with mean E(X ) = µ and variance E(X 2 ) = σ 2 = 0 under probability measure P µ . Then it is clear that P µ P (and P P µ ) and that n n dP µ 1 2 µi X i (ω) − µ . (ω) = exp dP 2 i=1 i i=1 Theorem 4.3.3 (Girsanov) Suppose Bt , t ∈ [0, T ], is an m-dimensional Brownian motion on a filtered space {, F, Ft , P}. Let f = ( f 1 , . . . , f m ) : × [0, T ] → IRm be a predictable process such that T | f t |2 dt < ∞ a.s. 0

Write

t ( f ) = exp

m i=1

t

0

f si dBsi

1 − 2

t

| f s | ds , 2

0

and suppose E[T ( f )] = 1, 1 T 2

(which holds if Novikov’s condition E e 2 0 | ft | dt < ∞ holds). (See [11].) If P f is the dP f = T ( f ), then Wt is an m-dimensional dP f Brownian motion on {, F, Ft , P }, where t Wti = Bti − f si ds. (4.3.1) probability measure on {, F} defined by

0

Proof We prove here the scalar case. To show W is a standard Brownian motion we verify the conditions of Theorem 2.7.1. That is, we show that (i) it is continuous a.s., (ii) it is a (local) martingale, and (iii) {Wt2 , t ≥ 0} is a (local) martingale. By definition W is a continuous process a.s. (Bt is continuous a.s. and an indefinite integral is a continuous process.) For (ii) we must show W is a local (Ft )-martingale under measure P f . Equivalently, from

148

Change of measures

Lemma 4.3.1 we must show that {t Wt } is a local martingale under P. Using the Itˆo rule we see, as in Example 3.6.11, that

t

t ( f ) = 1 +

s ( f ) f s dBs .

(4.3.2)

0

Applying the Itˆo rule to (4.3.2) and W , t Wt = W0 +

t

0

= W0 +

t

s dBs −

0

= W0 +

t

s dWs +

0 t

Ws ds +

t

d , W s

0

s f s ds +

0

t

t

t

Ws s f s dBs +

0

s f s ds

0

s (1 + Ws f s )dBs ,

0

and, as a stochastic integral with respect to B, {t Wt , t ≥ 0} is a (local) martingale under P. Property (iii) is established similarly, Wt2

=2

t

Ws dWs + W, W t = 2

0

t

Ws dWs + t,

0

or Wt2 − t = 2

t

Ws dWs ,

0

which, from (ii), is a (local) martingale under P f and the result follows.

Example 4.3.4 As a simple application of Girsanov’s theorem, let us derive the distribution of the first passage time, α = inf{t, Bt = b}, for Brownian motion with drift to a level b ∈ IR (see Example 2.2.5). Suppose that under probability measure P, {Bt , FtB } is a standard Brownian motion. Write 1 t = exp µBt − µ2 t , 2 and set dP µ = t . dP µ

Using Girsanov’s theorem, the process Bt = Bt − µt is a standard Brownian motion unµ der probability measure P µ . That is, under probability measure P µ , Bt = µt + Bt is a Brownian motion with drift µt.

4.3 Girsanov’s theorem

149

Now P µ (α ≤ t) = E µ [I (α ≤ t)] = E[t I (α ≤ t)] = E[I (α ≤ t)E[t | Fα ]] (see (2.2.2) and (2.2.3) for the definition of Fα ) = E[I (α ≤ t)α ]

= E[I (α ≤ t) exp µb − t |b| = exp µb − √ 2πs 3 0

1 2 µ α ] 2 1 2 µ s − b/2s ds. 2

See Problem 10, Chapter 2 for the density function of α under P.

Remark 4.3.5 Equation (4.3.1) is equivalent to saying that the original Brownian motion process {Bt } is a weak solution of the stochastic differential equation dX t = f (t, ω)dt + dB t ,

X 0 = 0,

where {B t } is a Brownian motion. That is, we have constructed a probability measure P on (, F) and a new Brownian motion process {B t } such that dBt = f (t, ω)dt + dB t . Remark 4.3.6 Let X t be a special semimartingale; then (see Example 3.6.11) t t = 1 + s− dX s ,

(4.3.3)

0

has the unique solution (0 = 1) 1 c c t = e X t − 2 X , X t s≤t (1 + X s )e− X s ,

which is called the stochastic exponential of the semimartingale {X t }. If t is a uniformly integrable positive martingale then ∞ = limt→∞ t exists and E[∞ | Ft ] = t

(a.s.).

Consequently, E[∞ ] = E[0 ] = 1, so that a new probability measure P can be defined on (, F) by putting dP = ∞ . dP P is equivalent to P if and only if ∞ > 0 a.s. More precisely, we have the following form of Girsanov’s theorem. (See [11] page 165.)

150

Change of measures

Theorem 4.3.7 Suppose the exponential t and P are as mentioned in (4.3.3) and Remark 4.3.6. If {Mt } is a local martingale under probability measure P, and the predictable covariation process { M, X t } exists under probability measure P, then M t = Mt − M, X t is a local martingale under probability measure P. Proof

First note that t plays the role of t in part (i) of Theorem 4.3.2. However, t t = 1 + s− dX s , 0

so

M, t =

t

s− d M, X s

0

and

t 0

−1 s− d M, s = M, X t .

That is, from part (i) of Theorem 4.3.2, M t = Mt − M, X t is a local martingale under probability measure P. More generally, we have the following result which is proven in [11]. Theorem 4.3.8 Suppose for a continuous local martingale {X t } the exponential t and P are as mentioned in Remark 4.3.6. Let {Mt } = {Mt1 , . . . , Mtm } be an IRm -valued continuous local martingale under prob1 m ability measure P. Then {M t } = {M t , . . . , M t } is a continuous local martingale under i probability measure P, where M t = Mti − M i , X t , and the predictable covariation under probability measure P of {M t } is equal to the predictable covariation under probability measure P of {Mt }, that is i

j

M , M tP = M i , M j tP . 4.4 The single jump process In this section we investigate Radon–Nikodym derivatives relating probability measures that describe when the jump happens and where it goes for a single jump process. Recall a few facts from Chapters 2 and 3. Consider a stochastic process {X t }, t ≥ 0, which takes its values in some measurable space {E, E} and which remains at its initial value z 0 ∈ E until a random time T , when it jumps to a random position Z . A sample path of the process is z 0 if t < T (ω), X t (ω) = Z (ω) if t ≥ T (ω).

4.4 The single jump process

151

The underlying probability space can be taken to be = [0, ∞] × E, with the σ -field B × E. A probability measure P is given on (, B × E). Write Ft = P[T > t, Z ∈ E], c = inf{t : Ft = 0} and d(t) = P(T ≤ t, Z ∈ E | T > t − ) =

−dFt Ft− ,

for the rate of the jump of the process X . Write FtA = P[T > t, Z ∈ A], then there is a Radon–Nikodym derivative λ(A, s) such that A A Ft − F0 = λ(A, s)dFs . ]0,t[

There is a bijection between probability measures P on (, B × E) and L´evy systems (λ, ). For A ∈ E define P(]0, t] × A) = − λ(A, s)dFs . ]0,t]

For t ≥ 0 define µ(t, A) = IT ≤t I Z ∈A . The predictable compensator of µ is given by dFsA µ p (t, A) = − . ]0,T ∧t] Fs− Write Ft for the completed σ -field generated by {X s }, s ≤ t, then q(t, A) = µ(t, A) − µ p (t, A) is an Ft -martingale. Suppose P is absolutely continuous with respect to P. Then there is a Radon–Nikodym dP derivative L = . Write L t = E[L | Ft ]. From Lemma 3.8.8, dP 1 L t = L(T, Z )I{T ≤t} + I{T >t} L(s, z)P(ds, dz). Ft ]t,∞] E However, the P(ds, dz)-integral is equivalent to

P(T > t, Z ∈ E) = F t , so that L t = L(T, Z )I{T ≤t} + I{T >t}

Ft . Ft

If we substitute the mean 0 martingale L t − 1 for Mt in Theorem 3.8.10 we have the stochastic integral representation Lt − 1 = I{s≤t} g(s, x)q(ds, dx),

where g(s, x) = L(s, x) − I{s c.

152

Change of measures

In order to use the exponential formula given in Example 3.6.11 we write t Lt = 1 + L s− dMs .

(4.4.1)

0

Here

Mt =

I{s≤t} g(s, x)L −1 s− q(ds, dx).

The unique solution of (4.4.1) is the stochastic exponential (L 0 = 1)

L t = e Mt (1 + Ms )e− Ms . s≤t

At the discontinuity of Fs ,

Ms = E

g(s, z)L −1 s− λ(dz, s)

and at the jump time T , MT = g(T, z)L −1 T− + Hence

E

Fs , Fs−

g(T, z)L −1 T − λ(dz, T )

FT . FT −

−1 L t = exp − I{s≤t} g(s, x)L s− dµ p FT −1 −1 × 1 + g(T, z)L T − I{T ≤t} + I{T ≥t} g(T, z)L T − λ(dz, T ) FT − E

F s × 1+ . g(s, z)L −1 s− λ(dz, s) Fs− E s≤t∧T,u=T

We can relate the L´evy system (λ, ) of probability measure P to that of probability measure P. This is given in the next theorem (see [11]). Theorem 4.4.1 Suppose (λ, ) is the L´evy system of probability measure P. Then dF-a.s.: Fs −1 1 + g(s, z)L −1 + g(s, z)L dλ dλ s− s− Fs− E A , λ(A, s) = Fs −1 1 + g(s, z)L −1 + g(s, z)L dλ dλ s− s− Fs− E E and

t = ]0,t]

Proof

E

1 + g(s, z)L −1 s− +

For t > 0 and A ∈ E, F¯tA = P(]t, ∞] × A) =

However,

Fs Fs−

E

g(s, z)L −1 s− λ(dz, s)ds .

LdP = − ]t,∞]×A

F¯tA = −

L(s, z)λ(dz, ds)dFs . ]t,∞]

A

λ(A, s)d F¯s = − ]t,∞]

λ(A, s) ]t,∞]

d F¯s dFs . dFs

4.4 The single jump process

so dFs -a.s.: λ(A, s)

d F¯s = dFs

L(s, z)λ(dz, ds) = A

153

¯ F¯s Fs− g(s, z)L −1 + λ(dz, ds). s− Fs− Fs A

¯ and if F¯c− ¯ Therefore, for s < c, ¯ = 0, for s ≤ c, Fs Fs d F¯s F¯s λ(dz, ds) d F¯s -a.s. λ(A, s) = g(s, z)L −1 + s− Fs− F¯s− dFs F¯s− A Fs F¯s −1 = 1+ g(s, z)L s− + 1 + λ(dz, ds). Fs− F¯s− A (4.4.2) ¯ and Now if s is a point of continuity of F then it is also a point of continuity of F, ¯ ¯ d Fs Fs Fs = F¯s = 0. If Fs = 0 then the Radon–Nikodym derivative = , and the dFs Fs left hand side above is Fs− (Fs− + F¯s ) F¯s F¯s 1+ . λ(A, s) = λ(A, s) Fs Fs F¯s F¯s− Evaluating (4.4.2) when A = E, so λ(E, s) = 1 = λ(E, s), F¯s Fs Fs −1 −1 1 + g(s, z)L s− + = g(s, z)L s− λ(dz, s), Fs− E Fs− F¯s− if Fs = 0, and we have Fs d F¯s = (1 + g(s, z)L −1 s− )λ(dz, s), F¯s− dFs E Fs ) = 0, Fs− Fs −1 1 + g(s, z)L −1 + g(s, z)L dλ dλ s− s− Fs− E λ(A, s) = A Fs 1 + g(s, z)L −1 g(s, z)L −1 s− + s− dλ dλ Fs− E E

if Fs = 0. Substituting in (4.4.2) we have if (1 +

¯ and for s ≤ c¯ if F¯c− d F¯s -a.s. for s < c, ¯ = 0. Now (1 + Fs /Fs− ) = 0 only if s = c, c < ∞ and Fc− = 0. This situation is only of interest here if also c¯ = c and F¯c− = 0. However, in this case it is easily seen that substituting g(c, z)L −1 c− =

Fc− L(c, z) F¯c−

in (4.4.2) gives the correct expression for λ(A, c) = λ(A, c), because L(c, z) = Now

t = − ]0,t]

d F¯s = F¯s

]0,t]

Fs d F¯s ds . F¯s− dFs

F¯c dλ . Fc dλ

154

Change of measures

If Ft is continuous at s, again F¯s = Fs = 0 and evaluating (4.4.2) for A = E, ds Fs d F¯s = = (1 + g(s, z)L −1 s− )λ(dz, s). ds F¯s dFs E That is

t = ]0,t]

E

1 + g(s, z)L −1 s− +

Fs Fs−

E

g(s, z)L −1 s− λ(dz, s)ds .

Notation 4.4.2 Denote by A the set of right-continuous, monotonic increasing (deterministic) functions t , t ≥ 0, such that (1) 0 = 0, (2) u = u − u− ≤ 1 for all points of discontinuity u, (3) if u = 1 then t = u for t ≥ u. Remark 4.4.3 If t ∈ A then t = ct + dt , where dt = s≤t s and ct is continuous. The decomposition is unique and both dt and ct are in A. If dt = 0 and ct is absolutely continuous with respect to Lebesgue measure, there is a measurable function rs such that t c t = rs ds. 0

The function rs is often called the “rate” of the jump process. Note that might equal +∞ for finite t.

Lemma 4.4.4 The formulae Ft = 1 − G t , Ft = exp(−ct )

(1 − u ),

(4.4.3)

u≤t

t = − ]0,t]

−1 Fs− dFs ,

(4.4.4)

define a bijection between the set A and the set of all probability distributions {G} on ]0, ∞]. Proof Clearly if t ∈ A then Ft , defined by (4.4.3), is monotonic decreasing, rightcontinuous, F0 = 0 and 0 ≤ Ft ≤ 1. Therefore G t = 1 − Ft is a probability distribution on ]0, ∞]. Conversely, if G t is a probability distribution, if Ft = 1 − G t and t is given by (4.4.4), then t is in A. From Example 3.6.11 (taking to be a single point), Ft defined by (4.4.3) is the unique solution of the equation dFt = −Ft− dt , This shows the correspondence is a bijection.

F0 = 1.

4.4 The single jump process

155

Lemma 4.4.5 Suppose t ∈ A is a second process whose associated Stieltjes measure dt is absolutely continuous with respect to dt , that is dt = αt . dt Then the associated F t has the form F t = Ft

(1 − α(s) d ) s

s≤t

(1 − ds )

t exp − (α(s) − 1)dcs , 0

where Ft is defined by (4.4.3). Furthermore, α(s) ds ≤ 1, and if α(s) ds = 1 then α(t) = 0 for t ≥ s. Proof

By hypothesis

t = 0

so from (4.4.3) c

F t = e−t

t

α(s)dcs +

α(s) ds ,

s≤t

(1 − u )

u≤t

t c = exp − α(s)ds (1 − α(s) ds ) 0

= Ft

u≤t

(1 − α(s) d ) s

s≤t

(1 − ds )

t c exp − (α(s) − 1)ds . 0

The conditions on α follow from Lemma 4.4.4 and the definition of A. If λ(., .) is such that (λ1) λ(A, s) ≥ 0 for A ∈ E, s > 0, (λ2) for each A ∈ E λ(A, .) is Borel measurable, (λ3) for all s ∈]0, c[, (except perhaps on a set of d-measure 0), λ(., s) is a probability measure on (E, E), and if c < ∞ and c− < ∞ then λ(., c) is a probability measure. Then: Lemma 4.4.6 There is a bijection between probability measures P on (, B × E) and L´evy systems (λ, ). Proof In Example 2.1.4 we saw how a L´evy system is determined by a measure P. Conversely, given a pair (λ, ), because ∈ A we can determine a function Ft by (4.4.3). For A ∈ E define P(]0, t] × A) = − λ(A, s)dFs . ]0,t]

Now the converse of theorem 4.4.1 is given. (Theorem 17.12 of [11].)

156

Change of measures

Theorem 4.4.7 Suppose P, P have L´evy systems (λ, ) and (λ, ). Write c = inf{t : F t = 0}, and suppose c ≤ c, dt d on ]0, c] and λ(., t) λ(., t) d-a.e. Then P P with Radon–Nikodym derivative t c L(t, z) = α(t)β(t, z) t− exp − (α(s) − 1)ds I{t≤c} .

(4.4.5)

0

Fs

1 + Fs− α(s) , Here t = Fs s≤t 1+ Fs−

dt dλt = α(t), and = β(t, z). dt dλt Proof

Define L(t, Z ) by (4.4.5) and write t η(t) = exp − (α(s) − 1)dcs . 0

β(t, z)dλ = 1 a.s.

Then, because E

E[L(t, Z )] = −

α(t)η(t) t− dFt . ¯ ]0,c]

From Lemma 4.4.5 and Equations (4.4.3) and (4.4.4), η(t) t− =

F¯t− . Ft−

As measures on [0, ∞], dt = so

d F¯t dFt = −α(t) = α(t)dt , ¯ Ft− Ft−

E[L(t, Z )] = −

α(t) F¯t− ¯ ]0,c]

dFt =− Ft−

d F¯t− F¯t− = F¯0 − F¯c¯ = 1. F¯t− ¯ ]0,c]

dP ∗ A probability measure P ∗ P can, therefore, be defined on (, B × E) by putting = dP L. For t < c we have L t = E[L | F] = L(T, Z )I{t≥T } + I{t
By similar calculations to those above the later term is F¯t Ft−1 α(s)η(s) s− dFs = = η(t) t , Ft ¯ ]t,c]

4.5 Change of parameter in Poisson processes

157

so L t = α(T )β(T, Z )η(T ) T − I{t≥T¯} + I{t
¯ for t < c. ¯ and for t = c¯ if c¯ = ∞ or F¯c− (2) φ(t, z) = 0 for t > c, ¯ = 0. ¯ z) = L(c, ¯ z) if c¯ < ∞ and F¯c− ¯ z) = (3) φ(c, = 0, that is, substituting, in this case φ(c, ¯ ¯ c, ¯ z). α(c)β( The L´evy system (λ∗ , ∗ ) associated with P ∗ is then defined by Fs 1+φ+ φdλ dλ Fs− E , λ∗ (A, s) = A Fs 1+φ+ φdλ dλ Fs− E E and ∗t

= ]0,t]

E

Fs 1+φ+ Fs−

φ λ(dz, s)ds . E

Substituting the above expression for φ we have (1 + ( Ft /Ft− )α(t)) φdλ = α(t)I{t≤c} , ¯ − (1 + Ft /Ft− ) E and (1 + ( Ft /Ft− ) φdλ = 1. E

The above expression gives d∗t = α(t), and dt

dλ∗t = β(t, z), dλt

so ∗t = , and

λ∗t = λ.

By Lemma 4.4.6, P = P ∗ P and the result is proved.

4.5 Change of parameter in Poisson processes Let Nt be a Poisson process with constant parameter λ on a filtered probability space (, F, Ft , P) and suppose that we wish to define a new probability P such that Nt is a

158

Change of measures

ˆ Define the stochastic process Poisson process with constant parameter λ. t =

Nt λˆ λ

e(λ−λ)t . ˆ

(4.5.1)

Lemma 4.5.1 The process (4.5.1) is a martingale under probability measure P. Proof Let 0 ≤ s ≤ t and recall that Nt is an independent increment process adapted to Ft so that   Ns (Nt −Ns ) ˆλ ˆ λ ˆ ˆ E[t | Fs ] = e(λ−λ)s E  e(λ−λ)(t−s)  λ λ   (Nt −Ns ) ˆ λ ˆ  = s e(λ−λ)(t−s) E  λ k λˆ (λ(t − s))k ˆ (λ−λ)(t−s) = s e e−λ(t−s) λ k! k = s , which finishes the proof.

Lemma 4.5.2 The exponential martingale {t } given by (4.5.1) is the unique solution of

t

t = 1 −

ˆ s− λ−1 (λ − λ)(dN s − λds).

(4.5.2)

0

Proof

Write t = e(λ−λ)t Yt , ˆ

where Yt =

Nt λˆ λ

(4.5.3)

, and t = f (t, Yt ). Using rule (3.6.9),

t ∂ f (s, Ys− ) ∂ f (s, Ys− ) ds + dYs ∂s ∂Y 0 0 ∂ f (s, Ys− ) f (s, Ys ) − f (s, Ys− ) − + Ys . ∂Y 0<s≤t

f (t, Yt ) = 1 +

t

(4.5.4)

Because Yt is a purely discontinuous process and of bounded variation, the second integral ˆ in (4.5.4) is equal to 0<s≤t e(λ−λ)s Ys .

4.5 Change of parameter in Poisson processes

159

In expression (4.5.4) we have f (s, Ys ) − f (s, Ys− ) = s − s−

Nr Ns λˆ λˆ =e λ r ≤s− λ Nr

λˆ ˆ −e(λ−λ)s r ≤s− λ   Ns ˆ λ = s−  − 1 λ λˆ = s− − 1 Ns . λ ˆ (λ−λ)s

Putting all these results together gives

t

t = 1 + 0

+

0<s≤t t

= 1+

ˆ s− ds + (λ − λ)

s−

− −

ˆ

0<s≤t

ˆ s− ds + (λ − λ) e(λ−λ)s Ys − ˆ

e(λ−λ)s Ys

λˆ ˆ − 1 Ns − e(λ−λ)s Ys λ

0

e(λ−λ)s Ys ˆ

0<s≤t

t

ˆ s− ds (λ − λ)

0

0<s≤t t

ˆ s− λ−1 (λ − λ)(dN s − λds),

0

which, after simplification, is (4.5.2). Now define a new probability measure P by setting dP E Ft = t . dP Lemma 4.5.3 Under probability measure P the process Nt is a Poisson process with paˆ rameter λ. Proof

By the characterization theorem of Poisson processes (see Theorem 3.6.14) we ˆ and Mt2 − λt ˆ are (P, Ft )-martingales. must show that Mt = Nt − λt By Bayes’ Theorem 4.1.1, for t ≥ s ≥ 0, E[Mt | Fs ] =

E[t Mt | Fs ] E[t Mt | Fs ] . = E[t | Fs ] s

160

Change of measures

Therefore, Mt is a (P, Ft )-martingale if and only if t Mt is a (P, Ft )-martingale. Now

t

t Mt =

s− dMs +

0

t

Ms− ds + [, M]t .

0

Recall

[, M]t =

s Ms

o<s≤t

=

t

o<s≤t t

=−

ˆ s− λ (λ − λ)dNs Ns −1

0

ˆ s− λ−1 (λ − λ)d[N , N ]s

0

=−

t

ˆ s− λ−1 (λ − λ)dN s.

0

Therefore t Mt =

t

t

ˆ s− (dNs − λds) +

0

Ms− ds −

0

t

ˆ s− λ−1 (λ − λ)dN s.

(4.5.5)

0

The second integral on the right of (4.5.5) is a (P, Ft )-martingale. (Recall that Nt − λt is a (P, Ft )-martingale.) The other two integrals are written as

t

t

ˆ s− (dNs − λds) =

ˆ s− (dNs − λds + λds − λds) t t t ˆ = s− (dNs − λds) + s− λds − s− λds,

0

0

0

0

(4.5.6)

0

and

t

−1

t

ˆ s− λ−1 (λ − λ)(dN s − λds + λds) t t ˆ = s− (dNs − λds) + s− λ−1 (λ − λ)λds.

ˆ s− λ (λ − λ)dN s =

0

0

0

(4.5.7)

0

Substituting (4.5.6) and (4.5.7) in (4.5.5) yields the desired result and it remains to show ˆ is also a (P, Ft )-martingale. that Mt2 − λt Now Mt2 = 2

0

t

Ms− dMs + [M, M]t = 2

t

Ms− dMs + Nt .

(4.5.8)

0

ˆ from both sides of (4.5.8) makes the last term on the right of (4.5.8) Subtracting λt a (P, Ft )-martingale and since the dM integral is a (P, Ft )-martingale the result follows.

4.6 Poisson process with drift

161

4.6 Poisson process with drift Let Nt be a Poisson process with parameter λ on a filtered probability space (, F, Ft , P) and suppose that we have the following process: X t = µt + σ Nt = µt + σ (Nt − λt + λt) = (µ + σ λ)t + σ Mt .

(4.6.1)

Here µ and σ are constants and {Mt } = {Nt − λt} is an (Ft , P)-martingale. The dynamics given by (4.6.1) could describe the evolution of a system with a linear trend perturbated by random jumps of size σ given by the Poisson process N . We wish to define a new probability P such that X t has dynamics Xt = σ Mt , where {M t } is an (Ft , P)-martingale. Define the stochastic process t µ µ µ t = exp − (1 − dMr Ns )e λσ Ns λσ 0 λσ 0<s≤t

µ µ µ Ns )e λσ Ns = e− λσ Mt (1 − λσ 0<s≤t

µ µ t = eσ (1 − Ns ), λσ 0<s≤t

(4.6.2)

(4.6.3)

where the last expression is obtained if we recall that Mt = Nt − λt and that

µ * ) µ Ns e λσ Ns = exp λσ 0<s≤t 0<s≤t µ

= e λσ Nt . Lemma 4.6.1 The process (4.6.3) is a martingale under probability measure P. Proof Let 0 ≤ s ≤ t and recall that Nt is an independent increment process and is adapted to Ft so that

µ µ E[t | Fs ] = e σ t E (1 − Nr ) | Fs λσ 0
µ µ = s e σ (t−s) E Nr ) (1 − λσ s
162

Change of measures

Lemma 4.6.2 The exponential martingale {t } given by (4.6.3) is the unique solution of t µ t = 1 − (dNs − λds). s− (4.6.4) λσ 0 µ µ Proof Write t = e σ t Yt , where Yt = 0
µ µ µ = eσ s 1− Nr 1 − Ns λσ λσ r ≤s−

µ µ −e σ s 1− Nr λσ r ≤s− µ = s− 1 − Ns − 1 µλσ = −s− Ns . λσ Combining these results together gives t µ µ t = 1 + e( σ )s Ys s− ds + σ 0 0<s≤t µ

µ + s− − Ns − e( σ )s Ys λσ 0<s≤t t µ µ = 1+ e( σ )s Ys s− ds + σ 0 0<s≤t t µ µ ( σ )s − e Ys − s− ds σ 0 0<s≤t t µ − s− (dNs − λds), λσ 0 which, after simplification, is (4.6.4). Now define a new probability measure P by setting dP E Ft = t . dP

4.7 Continuous-time Markov chains

163

Lemma 4.6.3 Under probability measure P the process X t has dynamics given by (4.6.2). Proof Let M t = Nt − λt, where λ = λ − µ/σ which is assumed to be positive. We claim that M t is a P-martingale. To see this we need only show that M is a P-martingale. Using the differentiation rule, d(t M t ) = t− dM t + M t− dt + d[, M]t µ µ dMt − t− dNt = t− (dNt − λdt) + M t− t− λσ λσ µ µ µ dMt − dNt = t− dNt − λdt + dt + M t− σ λσ λσ µ µ = t− dMt + M t− dMt − dMt , λσ λσ so that M is a dM stochastic integral, and hence is a P-martingale. So we can write X t = σ M t , under the new measure P.

Remark 4.6.4 The results of this section hold if (4.6.1) is replaced with the more general dynamics dX t = µ(t, X t− )dt + σ (t, X t− )dNt . The stochastic exponential martingale (4.6.3) takes the form t µs µs t = exp ds (1 − Ns ). σ λσ s s 0 0<s≤t 4.7 Continuous-time Markov chains Consider again the finite-state Markov process {X t } on the set of standard unit vectors of IR N (see Example 2.6.17 and Section 3.8). Write Ft for the right-continuous, complete filtration σ {X r : 0 ≤ r ≤ t}. We saw in Lemma 2.6.18 that X t has the semimartingale representation t Xt = X0 + Ar X r dr + Vt . 0

Recall that Jt denotes the total number of jumps (of all kinds) of the process X up to time t and N t Jt = −

X s , ei aii (s)ds + Q t . i=1

0

Write λt = −

N i=1

X t , ei aii (t).

164

Change of measures

Suppose that we wish to define a new probability P such that Jt is a standard Poisson process with parameter 1. Define the P-martingale t t t = exp{− log λr dJr + (λr − 1)dr }. 0

0

Since {t , Ft } is a P-martingale such that t t = 1 − r − λr−1 (λr − 1)(dJr − λr dr ), 0

we can define a new probability measure P by setting dP E Ft = t . dP Lemma 4.7.1 Under probability measure P the process Jt is a Poisson process with parameter 1. Proof

By the characterization theorem of Poisson processes (see Theorem 3.6.14) we

2

must show that Q t = Jt − t and Q t − t are (P, Ft )-martingales. By Bayes’ Theorem 4.1.1, for t ≥ s ≥ 0, E[Q t | Fs ] =

E[t Q t | Fs ] E[t Q t | Fs ] . = E[t | Fs ] s

Therefore, Q t is a (P, Ft )-martingale if and only if t Q t is a (P, Ft )-martingale. Now t t t Q t = s− dQ s + Q s− ds + [, Q]t , 0

and

0

t t −1 [, Q]t = 1 − s− λs (λs − 1)(dJs − λs ds), dJs − t 0 0 t =− s− λ−1 s (λs − 1)d[J , J ]s 0 t = s− (λ−1 s − 1)dJs . 0

Therefore

t

t Q t = 0

t

s− (dJs − ds) +

t

Q s− ds +

0

0

s− (λ−1 s − 1)dJs .

(4.7.1)

The second integral on the right of (4.7.1) is a (P, Ft )-martingale. However, Jt − λt is a (P, Ft )-martingale so that the other two integrals are written as t t s− (dJs − ds) = s− (dJs − λs ds + λs ds − ds) 0 0 t t = P-martingale + s− λs ds − s− ds, (4.7.2) 0

0

4.8 Problems

and

t 0

s− (λ−1 s − 1)dJs =

165

t

s− (λ−1 s − 1)(dJs − λs ds + λs ds) 0 t = P-martingale + s− (λ−1 s − 1)λs ds.

(4.7.3)

0

Substituting (4.7.2) and (4.7.3) in (4.7.1) yields the desired result. 2 To finish the proof we have to show that Q t − t is also a (P, Ft )-martingale. Now t t 2 Qt = 2 Q s− dQ s + [Q, Q]t = 2 Q s− dQ s + Jt . 0

(4.7.4)

0

Subtracting t from both sides of (4.7.4) makes the last term on the right of (4.7.4) a (P, Ft )martingale and since the dM integral is a (P, Ft )-martingale the result follows.

Remark 4.7.2 The above counting processes generate the same information as the Markov chain {X t }. If we wish to change the intensity matrix A to a another one, A, under a new probability measure P we change the intensities in the counting processes t t ij ij ij Jt =

X s , ei a ji (s)ds + Vt = λis j ds + Vt . 0

0

In order to define a new probability P such that t ij ij ij Jt = λs ds + V t , 0

define the P-martingale t =

i= j

and set

exp

ij

t

log 0

λs

dJri j ij λs

dP E Ft dP

−

t

0

+ ij (λs

−

λis j )dr

,

(4.7.5)

= t . ij

ij

Lemma 4.7.3 Under probability measure P the processes Jt have intensities λt respectively.

4.8 Problems 1. Consider the probability space ([0, 1], B([0, 1]), λ), where λ is the Lebesgue measure on the Borel σ -field B([0, 1]). Let P be another probability measure carried by the singleton {0}, i.e. P({0}) = 1. Let

166

Change of measures

π1 = {[0, 12 ], ( 12 , 1]}, π2 = {[0, 14 ], ( 14 , 34 ], ( 34 , 1]}, . . . , πn = {[0, 1/2n ], . . . , (1 − 1/2n , 1]}. Define the random variable

 1 1 P([0, 2 ])  n , n ([0, n ]) = = 2 0 elswhere in [0, 1]. 2 λ([0, 2−n ]) −n

2. 3.

4.

5.

Show that the sequence n is a positive martingale (with respect to the filtration generated by the partitions πn ) such that E λ [n ] = 1 for all n but lim n = 0 λ-almost surely. Prove Lemma 4.2.14. Consider the order-2 Markov chain {X n }, 1 ≤ n ≤ N discussed in Example 2.4.6. Define a new probability measure P on (, Fn } such that P(X n = ek | X n−2 = ei , X n−1 = e j ) = pi j,k . On a probability space (, F, P) consider the stochastic process X n with a finite state space and transition probabilities P(X n+1 = k | X n−1 = i, X n = j) = pi j,k . Transform the process X into a Markov chain Y with an appropriate state space and do a change of measure under which the process Y becomes a sequence of i.i.d. uniform random variables on the state space of the Markov chain. (Hint: show that the process (X n−1 , X n ), (X n , X n+1 ), . . . is a Markov chain.) Let Nt be a Poisson process with parameter λ on a filtered probability space (, F, Ft , P) and suppose that we have the following process: X t = µt + σ Nt .

Here µ and σ are constants. Define a new probability P such that X t is a Poisson process with parameter λ = 1. 6. Show that the exponential martingale {t } given by (4.7.5) is the unique solution of t ij t = 1 + s− (λis j )−1 (λs − λis j )(dJsi j − λis j ds). i, j

7. Prove Lemma 4.7.3.

0

Part II Applications

5

Kalman filtering

5.1 Introduction This chapter discusses the filtering of partially observed linear (and nonlinear) dynamics using the tools developed in Chapter 4. The chapter starts with simple applications and evolves into less easy situations.

5.2 Discrete-time scalar dynamics Consider a system whose state at time k is xk ∈ IR. The time index k of the state evolution will be discrete and identified with IN = {0, 1, 2, . . . }. Let (, F, P) be a probability space upon which {vk }, {w k }, k ∈ IN are independent and identically distributed (i.i.d.) sequences of Gaussian random variables, having zero means and variances 1 (N (0, 1)) , respectively; x0 is normally distributed. Let {Fk }, k ∈ IN be the complete filtration (that is, F0 contains all the P-null events) generated by {x0 , x1 , . . . , xk }. The state of the system satisfies the linear dynamics xk+1 = axk + bvk+1 .

(5.2.1)

Note that E[vk+1 | Fk ] = 0. A useful and simple model for a noisy observation of xk is to suppose it is given as a linear function of xk plus a random “noise” term. That is, we suppose that for some real numbers c and d our observations have the form yk = cxk + dw k .

(5.2.2)

We shall also write {Yk }, k ∈ IN for the complete filtration generated by {y0 , y1 , . . . , yk }. Using measure change techniques we shall derive a recursive expression for the conditional distribution of xk given Yk . 5.3 Recursive estimation Initially we suppose all processes are defined on an “ideal” probability space (, F, P); then under a new probability measure P, to be defined, the model dynamics (5.2.1) and (5.2.2) will hold.

170

Kalman filtering

Suppose that under P: 1 2 1. {xk }, k ∈ IN is an i.i.d. N (0, 1) sequence with density function φ(x) = √ e−x /2 , 2π 1 2 2. {yk }, k ∈ IN is an i.i.d. N (0, 1) sequence with density function ψ(y) = √ e−y /2 . 2π For l = 0, λ0 =

ψ(d −1 (y0 − cx0 )) and for l = 1, 2, . . . define dψ(y0 ) φ(b−1 (xl − axl−1 ))ψ(d −1 (yl − cxl )) , b dφ(xl )ψ(yl ) k k = λl . λl =

(5.3.1) (5.3.2)

l=0

Let Gk be the complete σ -field generated by {x0 , x1 , . . . , xk , y0 , y1 , . . . , yk } for k ∈ IN. Lemma 5.3.1 The process {k }, k ∈ IN is an P-martingale with respect to the filtration {Gk }, k ∈ IN. Proof

Since k is Gk -measurable, E[k+1 | Gk ] = k E[λk+1 | Gk ].

Now φ(b−1 (xk+1 − axk ))ψ(d −1 (yk+1 − cxk+1 )) E λk+1 | Gk = E | Gk bdφ(xk+1 )ψ(yk+1 ) φ(b−1 (xk+1 − axk )) ψ(d −1 (yk+1 − cxk+1 )) =E | Gk , xk+1 | Gk . E bφ(xk+1 ) dψ(yk+1 ) Now,

E

and

ψ(d −1 (y − cxk+1 )) ψ(d −1 (yk+1 − cxk+1 )) | Gk , xk+1 = ψ(y)dy = 1, dψ(yk+1 ) dψ(y) IR φ(b−1 (x − axk )) φ(b−1 (xk+1 − axk )) E | Gk = φ(x)dx = 1, bφ(xk+1 ) bφ(x) IR

using the change of variable d −1 (y − cxk+1 ) = u in the first integral and a similar change of variable in the second integral. Define P on {, F} by setting the restriction of the Radon–Nikodym derivative Gk equal to k . Then:

dP dP

to

Lemma 5.3.2 On {, F} and under P, {vk }, k ∈ IN, {w k }, k ∈ IN are i.i.d. N (0, 1) sequences of random variables, where

vk+1 = b−1 (xk+1 − axk ),

w k = d −1 (yk − cxk ).

5.3 Recursive estimation

171

Proof Suppose f, g : IR → IR are “test” functions (i.e. measurable functions with compact support). Then with E (resp. E) denoting expectation under P (resp. P) and using Bayes’ Theorem (4.1.1) E[ f (vk+1 )g(w k+1 ) | Gk ] =

E[k+1 f (vk+1 )g(w k+1 ) | Gk ] E[k+1 | Gk ]

= E[λk+1 f (vk+1 )g(w k+1 ) | Gk ], where the last equality follows from Lemma 5.3.1. Consequently E[ f (vk+1 )g(w k+1 ) | Gk ] = E[λk+1 f (vk+1 )g(w k+1 ) | Gk ] φ(b−1 (xk+1 − axk ))ψ(d −1 (yk+1 − cxk+1 )) =E bdφ(xk+1 )ψ(yk+1 ) −1 −1 × f (b (xk+1 − axk ))g(d (yk+1 − cxk+1 )) | Gk

φ(b−1 (xk+1 − axk )) f (b−1 (xk+1 − axk )) bφ(xk+1 ) ψ(d −1 (yk+1 − cxk+1 )) × E g(d −1 (yk+1 − cxk+1 )) | Gk , xk+1 | Gk . dψ(yk+1 )

=E

Now

ψ(d −1 (yk+1 − cxk+1 )) E g(d −1 (yk+1 − cxk+1 )) | Gk , xk+1 dψ(yk+1 ) ψ(d −1 (y − cxk+1 )) = ψ(y)g(d −1 (y − cxk+1 ))dy dψ(y) IR = ψ(u)g(u)du.

IR

Similarly E

φ(b−1 (xk+1 − axk )) f (b−1 (xk+1 − axk )) | Gk = φ(u) f (u)du. bφ(xk+1 ) IR

Therefore

E[ f (vk+1 )g(w k+1 ) | Gk ] =

φ(u) f (u)du IR

ψ(u)g(u)du, IR

and the lemma is proved. Let g : IR → IR be a “test” function. Using Bayes’ Theorem 4.1.1, E[g(xk ) | Yk ] =

E[k g(xk ) | Yk ] E[k | Yk ]

,

(5.3.3)

172

Kalman filtering

where E (resp. E) denotes expectations with respect to P (resp. P). Consider the unnormalized, conditional expectation which is the numerator of (5.3.3), E[k g(xk ) | Yk ]. This is a measure-valued process. Write αk (.), k ∈ IN for its density so that E[k g(xk ) | Yk ] = g(x)αk (x)dx.

(5.3.4)

IR

If pk (.) denotes the normalized conditional density, such that E[g(xk ) | Yk ] = g(x) pk (x)dx, IR

then from (5.3.3) we see that pk (x) = αk (x)

αk (z)dz

−1

,

(5.3.5)

IR

for x ∈ IR, k ∈ IN. Then we have the following result. Theorem 5.3.3 ψ(d −1 (yk+1 − cx)) αk+1 (x) = dbψ(yk+1 )

φ(b−1 (x − az))αk (z)dz.

(5.3.6)

Proof For any “test” function g and in view of (5.3.1) and (5.3.2), g(x)αk+1 (x)dx = E[k+1 g(xk+1 ) | Yk+1 ] IR

= E[k λk+1 g(xk+1 ) | Yk+1 ] φ(b−1 (xk+1 − axk ))ψ(d −1 (yk+1 − cxk+1 )) = E k g(xk+1 ) | Yk+1 bdφ(xk+1 )ψ(yk+1 ) 1 φ(b−1 (x − axk ))ψ(d −1 (yk+1 − cx)) = E k dbψ(yk+1 ) φ(x) × φ(x)g(x)dx | Yk+1 . The last equality follows from the fact that xk+1 has distribution φ and is independent of everything else under P. Also, note that given yk+1 we condition only on Yk to get an expression similar to notation (5.3.4), that is 1 g(x)αk+1 (x)dx = dbψ(yk+1 ) IR × ψ(d −1 (yk+1 − cx))φ(b−1 (x − az))g(x)αk (z)dxdz. IR

IR

This holds for all “test” functions g so we can conclude that (5.3.6) holds.

5.3 Recursive estimation

173

Remark 5.3.4 The linearity of (5.2.1) and (5.2.2) implies that (5.3.5) is also normally distributed with mean xˆk|k = E[xk | Yk ] and variance k|k = E[(xk − xˆk|k )2 | Yk ]. Our purpose now is to give recursive estimates of xˆk|k and k|k using the recursion for αk (x). Theorem 5.3.5 For the linear model described by (5.2.1) and (5.2.2) the conditional mean and variance of the state process xk are given by the following recursions: k+1|k+1 = Ak , xˆk+1|k+1 = Ak Bk , where Ak = Bk = −1

and k|k =

c2 1 a 2 k|k + − d2 b2 b4

−1 ,

cyk+1 a xˆk|k k|k , + 2 d2 b k|k

a2 1 . + b2 k|k

Proof Recall that φ(.) and ψ(.) are normal densities with zero means and variances 1, and using the fact that αn (x) is proportional to a normal density with mean xˆk|k and variance k|k we write ψ(d −1 (yk+1 − cx)) αk+1 (x) = φ(b−1 (x − az))αk (z)dz dbψ(yk+1 )

ψ(d −1 (yk+1 − cx)) 1 1 = (z − xˆk|k )2 dz exp − 2 (x − az)2 − dbψ(yk+1 ) 2b 2 k|k 2 xˆk|k ψ(d −1 (yk+1 − cx)) −x 2 = exp − dbψ(yk+1 ) 2b2 2 k|k

2 −1 2 a ax 1 xˆk|k × exp − 2z dz. + + z 2 b2 k|k b2 k|k Let

2 xˆk|k ψ(d −1 (yk+1 − cx)) −x 2 K (x) = , − exp dbψ(yk+1 ) 2b2 2 k|k

−1

a2 1 + , 2 b k|k

ax xˆk|k + . b2 k|k

k|k = and βk (x) =

174

Kalman filtering

αk+1 (x) = K (x)

exp

= K (x)

exp

−1 2 −1 [z k|k − 2zβk (x)] dz 2

−1 2 k|k

[z 2 − 2z k|k βk (x) + ( k|k βk (x))2 − ( k|k βk (x))2 ] dz

( k|k βk (x))2 −1 2 = K (x) exp (z − k|k βk (x)) dz exp 2 2 k|k ( k|k βk (x))2 = K (x) exp 2π k|k . 2 The last expression follows from the fact the integrand is proportional to a normal density with mean k|k βk (x) and variance k|k and hence the integral is 1 when properly normalized. The final step is to group together all the terms containing x and then completing the square with respect to the variable x: cy −1 2 c2 1 a 2 k|k a xˆk|k k|k k+1 αk+1 (x) = K 1 exp − 2x + 2− + 2 x 2 d2 b b4 d2 b k|k

−1 = K 2 exp (x − Ak Bk )2 , 2Ak where K 1 and K 2 are constants independent of x and −1 c2 1 a 2 k|k Ak = + 2− , d2 b b4 cyk+1 a xˆk|k k|k Bk = , + 2 d2 b k|k that is k+1|k+1 = Ak , xˆk+1|k+1 = Ak Bk , which finishes the proof. The Kalman filter is usually presented in terms of the one-step ahead prediction xˆk+1|k = E[xk+1 | Yk ] = a xˆk|k . Similarly, k+1|k = E[(xk+1 − xˆk+1|k )2 | Yk ] = a 2 k|k + b2 .

Then, with K k+1 =

c k+1|k , c2 k+1|k + d 2 xˆk+1|k+1 = a xˆk+1|k + K k+1 (yk − c xˆk+1|k ), k+1|k+1 = k+1|k −

2 c2 k+1|k

c2 k+1|k + d 2

.

5.4 Vector dynamics

175

5.4 Vector dynamics Consider a system whose state at time k = 0, 1, 2, . . . is X k ∈ IRm and which can be observed only indirectly through another process Yk ∈ IRd . Let (, F, P) be a probability space upon which Vk and Wk are normally distributed with means 0 and respective covariance identity matrices Im×m and Id×d . Assume that Dk is nonsingular, Bk is nonsingular and symmetric (for notational convenience). X 0 is a Gaussian random variable with zero mean and covariance matrix B02 (of dimension m × m). Let {Fk }, k ∈ IN be the complete filtration (that is F0 contains all the P-null events) generated by {X 0 , X 1 , . . . , X k }. The state and observations of the system satisfies the linear dynamics X k+1 = Ak+1 X k + Bk+1 Vk+1 ∈ IRm , Yk = Ck X k + Dk Wk ∈ IR . d

(5.4.1) (5.4.2)

Ak , Ck are matrices of appropriate dimensions. We shall also write {Yk }, k ∈ IN for the complete filtration generated by {Y0 , Y1 , . . . Yk }. Using measure change techniques we shall derive a recursive expression for the conditional distribution of X k given Yk . Recursive estimation Initially we suppose all processes are defined on an “ideal” probability space (, F, P); then under a new probability measure P, to be defined, the model dynamics (5.4.1) and (5.4.2) will hold. Suppose that under P: 1. {X k }, k ∈ IN is an i.i.d. N (0, Im×m ) sequence with density function φ(x) = 1 e−x x/2 , (2π )m/2 2. {Yk }, k ∈ IN is an i.i.d. N (0, Id×d ) sequence with density function ψ(y) = 1 e−y y/2 . d/2 (2π ) For any square matrix B write |B| for the absolute value of its determinant. ψ(D0−1 (Y0 − C0 X 0 )) For l = 0, λ0 = and for l = 1, 2, . . . define |D0 |ψ(Y0 ) λl = k =

φ(Bl−1 (X l − Al X l−1 ))ψ(Dl−1 (Yl − Cl X l )) , |Bl ||Dl |φ(X l )ψ(Yl ) k

λl .

(5.4.3) (5.4.4)

l=0

Let {Gk } be the complete σ -field generated by {X 0 , X 1 , . . . , X k , Y0 , Y1 , . . . , Yk } for k ∈ IN. The process {k }, k ∈ IN is a P-martingale with respect to the filtration {Gk }. dP Define P on {, F} by setting the restriction of the Radon–Nikodym derivative to dP Gk equal to k . It can be shown that on {, F} and under P, Vk and Wk are normally

176

Kalman filtering

distributed with means 0 and respective covariance identity matrices Im×m and Id×d , where

−1 Vk+1 = Bk+1 (X k+1 − Ak+1 X k ),

Wk = Dk−1 (Yk − Ck X k ). Let g : IRm → IR be a “test” function. Write αk (.), k ∈ IN for the density E[k g(xk ) | Yk ] = g(x)αk (x)dx.

(5.4.5)

IRm

Then we have the following result: Theorem 5.4.1 αk+1 (x) = Proof

−1 (Yk+1 − Ck+1 x)) ψ(Dk+1 |Dk+1 ||Bk+1 |ψ(Yk+1 )

−1 (x − Ak+1 z))αk (z)dz. φ(Bk+1

(5.4.6)

The proof is similar to the scalar case and is left as an exercise.

Remark 5.4.2 The linearity of (5.4.1) and (5.4.2) implies that (5.4.6) is proportional to a normal distribution with mean Xˆ k|k = E[X k | Yk ] and error covariance matrix k|k = E[(X k − Xˆ k|k )(X k − Xˆ k|k ) | Yk ] . Our purpose now is to give recursive estimates of Xˆ k|k and k|k using the recursion for αk (x). Write (x, Yk+1 ) = Then

−1 ψ(Dk+1 (Yk+1 − Ck+1 x)) (2π )−m/2 | k|k |−1/2 . |Dk+1 ||Bk+1 |ψ(Yk+1 )

αk+1 (x) = (x, Yk+1 )

IRm

−1 −1 exp (−1/2)(Bk+1 (x − Ak+1 z)) (Bk+1 (x − Ak+1 z))

−1 + (z − Xˆ k|k ) k|k (z − Xˆ k|k ) dz −1 = K (x) exp(−1/2)z k|k z − 2βk+1 z dz,

(5.4.7) (5.4.8)

IRm

where K (x) is independent of the variable z and −1

−2 −1 k|k = Ak+1 Bk+1 Ak+1 + k|k , −2 −1 βk+1 = x Bk+1 Ak+1 + Xˆ k|k k|k .

The next step is to complete the “square” in the argument of the exponential in (5.4.8) in order to rewrite the integrand as a normal density which integrates out to 1. Now −1

−1

z k|k z − 2βk+1 z = (z − k|k βk+1 ) k|k (z − k|k βk+1 ) − βk+1 k|k βk+1 ,

(5.4.9)

5.5 The EM algorithm

177

after substitution of (5.4.9) into (5.4.8) and integration we are left with only the x variable. Completing the “square” with respect to x,

1 −1 αk+1 (x) = K 1 exp (− )(x − Xˆ k+1|k+1 ) k+1|k+1 (x − Xˆ k+1|k+1 ) , 2 where K 1 is a constant independent of x and −1 −2 −2 −2 k+1|k+1 = Bk+1 − Bk+1 Ak+1 k|k Ak+1 Bk+1 −2 + Ck+1 Dk+1 Ck+1 , −1 −2 −1 ˆ −2 k+1|k+1 Xˆ k+1|k+1 = Bk+1 X k|k + Ck+1 Ak+1 k|k k|k Dk+1 Yk+1 .

The one-step ahead prediction version is Xˆ k+1|k = E[X k+1 | Yk ] = Ak+1 Xˆ k|k , k+1|k = E[(X k+1 − Xˆ k+1|k )(X k+1 − Xˆ k+1|k ) | Yk ] −2 = Ak+1 k|k Ak+1 + Bk+1 .

−2 Then, with Hk+1 = k+1|k Ck+1 (Ck+1 k+1|k Ck+1 + Dk+1 ) use of the Matrix Inversion Lemma 5.4.3 gives the Kalman filter equations in the form:

Xˆ k+1|k+1 = Ak+1 Xˆ k+1|k + Hk+1 (Yk+1 − Ck+1 Xˆ k+1|k ), −2 k+1|k+1 = k+1|k − k+1|k Ck+1 (Ck+1 k+1|k Ck+1 + Dk+1 Ck+1 k+1|k ).

Lemma 5.4.3 (Matrix Inversion Lemma) Assuming the required inverses exist, the Matrix Inversion Lemma states that: −1 −1 −1 −1 −1 (A11 − A12 A−1 = A−1 22 A21 ) 11 + A11 A12 (A22 − A21 A11 A12 ) A21 A11 .

5.5 The EM algorithm The EM algorithm ([3], [9]) is a widely used iterative numerical method for computing maximum likelihood parameter estimates of partially observed models such as linear Gaussian state space models. For such models, direct computation of the MLE is difficult. The EM algorithm has the appealing property that successive iterations yield parameter estimates with nondecreasing values of the likelihood function. Suppose that we have observations y1 , . . . , y K available, where K is a fixed positive integer. Let {Pθ , θ ∈ } be a family of probability measures on (, F), all absolutely continuous with respect to a fixed probability measure P0 . The log-likelihood function for computing an estimate of the parameter θ based on the information available in Y K is dPθ L K (θ ) = E 0 log | YK , dP0 and the maximum likelihood estimate (MLE) is defined by θˆ ∈ argmax L K (θ). θ∈

178

Kalman filtering

Let θˆ0 be the initial parameter estimate. The EM algorithm generates a sequence of parameter estimates {θ j }, j ≥ 1, as follows. Each iteration of the algorithm consists of two steps: Step 1 (E-step). Set θ˜ = θˆ j and compute Q(θ, θ˜ ), where dPθ Q(θ, θ˜ ) = E θ˜ log | YK . dPθ˜ Step 2 (M-step). Find θˆ j+1 ∈ argmaxθ∈ Q(θ, θ j ). Using Jensen’s inequality (2.3.3) it can be shown (see Theorem 1 in [9]) that the sequence of model estimates {θˆ j , j ≥ 1} from the EM algorithm are such that the sequence of likelihoods {L K (θˆ j )}, j ≥ 1, is monotonically increasing with equality if and only if θˆ j+1 = θˆ j . Sufficient conditions for convergence of the EM algorithm are given in [37]. We briefly summarize them here: Assume that (i) The parameter space is a subset of some finite dimensional Euclidean space IRr . (ii) {θ ∈ : L K (θ ) ≥ L K (θˆ0 )} is compact for any L K (θˆ0 ) > −∞. (iii) L K is continuous in and differentiable in the interior of . (As a consequence of (i), (ii) and (iii), clearly L K (θˆ j ) is bounded from above.) (iv) The function Q(θ, θˆ j ) is continuous in both θ and θˆ j . Then by Theorem 2 in [37], the limit of the sequence EM estimates {θˆ j } has a stationary ¯ for some stationary point θ¯ of L K . Also {L K (θˆ j )} converges monotonically to L¯ t = Lt (θ) ¯ ¯ point θ . To make sure that Lt is a maximum value of the likelihood, it is necessary to try different initial values θˆ0 .

5.6 Discrete-time model parameter estimation In all the existing literature on parameter estimation of linear Gaussian models via the EM algorithm, filtered estimates of the above quantities are computed via Kalman smoothing, which requires large memory numerical implementation. This problem is solved in [13] by providing finite-dimensional filters for (the components of) such integral processes. The authors further show that finite-dimensional filters exist for moments of all orders of the state process. Assume that the state and observation processes are given by the vector dynamics X k+1 = Ak+1 X k + Bk+1 Vk+1 ∈ IRm , Yk = Ck X k + Dk Wk ∈ IRd .

(5.6.1) (5.6.2)

Ak , Ck are matrices of appropriate dimensions, Vk and Wk are normally distributed with means 0 and respective covariance identity matrices Im×m and Id×d . Assume that Dk is nonsingular, Bk is nonsingular and symmetric (for notational convenience). X 0

5.6 Discrete-time model parameter estimation

179

is a Gaussian random variable with zero mean and covariance matrix B02 (of dimension m × m). The linear model given by (5.6.1) and (5.6.2) is determined by the matrices A, B, C and D which need to be known. These parameters are estimated using the expectation maximization (EM) algorithm. Maximum likelihood estimation of the parameters via the EM algorithm requires computation of the filtered estimates of quantities such as k k k Tk(0) = l=0 X l ⊗ X l , Tk(1) = l=1 X l ⊗ X l−1 , Tk(2) = l=1 X l−1 ⊗ X l−1 , and Uk = k l=0 X l ⊗ Yl . Consider the time-invariant version of (5.6.1), (5.6.2): X k+1 = AX k + BVk+1 ∈ IRm ,

(5.6.3)

Yk = C X k + DWk ∈ IR .

(5.6.4)

d

The aim is to compute ML estimates of the parameters θ = (A, B, C, D) given the observations Yk = σ {Ys : s ≤ k}. This is done via the EM algorithm.

Notation Let ei , e j ∈ IRm denote unit vectors with 1 in the i-th and j-th positions, respectively. For i, j ∈ {1, . . . , m}, i j(0)

Tk

=

k

X l , ei X l , e j ,

(5.6.5)

X l , ei X l−1 , e j ,

(5.6.6)

l=0 i j(1)

Tk

=

k l=0

here ., . denotes the scalar product. Also let f j ∈ IRd denote the unit vector with 1 in the j-th position. For n ∈ {1, . . . , d} write Ukin =

k

X l , ei Yl , f n .

(5.6.7)

l=0 i j(0)

i j(1)

Note that Tk , Tk and Ukin are merely the elements of the matrices Tk(0) , Tk(1) , Tk(2) , and Uk respectively. ˜ is derived. Now the expression for Q(θ, θ) To update the set of parameters from θ˜ to θ , the following density is introduced: k dPθ γl , Gk = dPθ˜ l=0

γl =

where γ0 =

−1 ˜ (Y0 − C X 0 )) | D|φ(D , −1 ˜ |D|φ( D (Y0 − C˜ X 0 ))

−1 −1 ˜ ˜ (Yl − C X l )) | B|ψ(B (X l − AX l−1 )) | D|φ(D . ˜ l−1 )) |D|φ( D˜ −1 (Yl − C˜ X l )) |B|ψ( B˜ −1 (X l − AX

180

Kalman filtering

Now

E θ˜

dPθ log | Yk = −k log |B| − (k + 1) log |D| dPθ˜ Gk k 1 −2 + E θ˜ (X l − AX l−1 ) B (X l − AX l−1 ) | Yk 2 l=1 k 1 − E θ˜ (Yl − C X l ) (D D )−1 (Yl − C X l ) | Yk 2 l=1 ˜ = Q(θ, θ˜ ), + R(θ)

where R(θ˜ ) does not involve θ .

(5.6.8)

∂Q = 0. This yields ∂θ −1 k E θ˜ | Yk X l−1 ⊗ X l−1 | Yk ,

To implement the M-step set the derivatives A = E θ˜

k

X l ⊗ X l−1

l=1

k 1 B = E θ˜ (X l − AX l−1 ) ⊗ (X l − AX l−1 ) | Yk , k l=1 −1 k k C = E θ˜ E θ˜ Yl ⊗ X l | Yk X l ⊗ X l | Yk , 2

l=0

(5.6.9)

l=1

(5.6.10)

(5.6.11)

l=0

k 1 DD = (Yl − C X l ) ⊗ (Yl − C X l ) | Yk . E˜ k + 1 θ l=0

i j(0)

(5.6.12)

i j(1)

Next, finite-dimensional recursive filters for Tk , Tk and Ukin are derived. That is, these filters can be described in terms of a finite number of statistics. 5.7 Finite-dimensional filters Initially, assume that all processes are defined on an “ideal” probability space (, F, P). Suppose that under P: 1. {X k }, k ∈ IN is an i.i.d. N (0, Im×m ) sequence with density function ψ, 2. {Yk }, k ∈ IN is an i.i.d. N (0, Id×d ) sequence with density function φ. Write λ0 =

φ(D0−1 (Y0 − C0 X 0 )) . For each l = 1, 2, . . . define |D0 |φ((Y0 )

φ(Dl−1 (Yl − Cl X l )) ψ(Bl−1 (X l − Al X l−1 )) . |Dl |φ((Yl ) |Bl |φ((X l ) k For each k ≥ 0 set k = l=0 λl . Let Gk be the complete σ -field generated by {X 0 , X 1 , . . . , X k , Y0 , Y1 , . . . , Yk } for k ∈ IN. dP Define P on {, F} by setting the restriction of the Radon–Nikodym derivative to dP Gk equal to k . Then: λl =

5.7 Finite-dimensional filters

181

Lemma 5.7.1 On {, F} and under P, {Vk }, k ∈ IN, {Wk }, k ∈ IN are i.i.d. N (0, Id×d ), N (0, Im×m ) sequences respectively, where

Vl = Dl−1 (Yl − Cl X l ),

Wl = Bl−1 (X l − Al X l−1 ). Definition 5.7.2 Define the measure-valued processes αk (x) = E[k I (X k ∈ dx) | Yk ], i j(M) βk (x)

i j(M)

= E[k Tk

I (X k ∈ dx) | Yk ],

M = 0, 1, 2,

δkin (x) = E[k Ukin I (X k ∈ dx) | Yk ].

(5.7.1)

Then for any “test” function g : IRm → IR, write E[k g(X k ) | Yk ] = i j(M) E[k Tk g(X k )

IRm

αk (x)g(x)dx, i j(M)

| Yk ] =

E[k Ukin g(X k ) | Yk ] =

IRm

IRm

βk

(x)g(x)dx,

M = 0, 1, 2,

δkin (x)g(x)dx.

(5.7.2)

The following theorem ([13]) gives recursive expressions for the unnormalized densities i j(M) αk (x), βk (x), M = 0, 1, 2, and δkin (x). The recursions are derived under the measure P where {X l } and {Yl }, l = 0, 1, . . . are independent sequences of random variables. Theorem 5.7.3 For k = 0, 1, . . . the unnormalized densities αk (x), i j(M)

βk

(x), M = 0, 1, 2, and δkin (x) defined by (5.7.1) are given by the following recursions. αk (x) = (x, Yk ) IR

i j(0) βk (x)

= (x, Yk )

m

αk−1 (z)ψ(Bk−1 (x − Ak z))dz, βk−1 (z)ψ(Bk−1 (x − Ak z))dz i j(0)

IRm

+ x, ei x, e j i j(1) βk (x)

+ x, ei i j(2)

βk

IR

IRm

m

+

IRm

(5.7.4)

βk−1 (z)ψ(Bk−1 (x − Ak z))dz

z, e j αk−1 (z)ψ(Bk−1 (x − Ak z))dz ,

(5.7.5)

βk−1 (z)ψ(Bk−1 (x − Ak z))dz i j(2)

(x) = (x, Yk )

IRm

αk−1 (z)ψ(Bk−1 (x − Ak z))dz ,

i j(1)

= (x, Yk )

(5.7.3)

IR

m

z, ei z, e j αk−1 (z)ψ(Bk−1 (x − Ak z))dz ,

(5.7.6)

182

Kalman filtering

δkin (x) = (x, Yk )

IRm

in δk−1 (z)ψ(Bk−1 (x − Ak z))dz

+ x, ei Yk , f n where (x, Yk ) =

IRm

αk−1 (z)ψ(Bk−1 (x − Ak z))dz ,

(5.7.7)

φ(Dk−1 (Yk − Ck x)) . |Bk ||Dk |φ((Yk )

Remarks 5.7.4 1. Using (5.7.3), (5.7.4) and (5.7.7) are written as i j(0) i j(0) βk (x) = (x, Yk ) βk−1 (z)ψ(Bk−1 (x − Ak z))dz + x, ei x, e j αk (x), δkin (x) = (x, Yk )

(5.7.8)

IRm

IRm

in δk−1 (z)ψ(Bk−1 (x − Ak z))dz + x, ei Yk , f n αk (x).

(5.7.9)

2. The above theorem does not require Vl and Wl to be Gaussian. The recursions (5.7.3), (5.7.4) and (5.7.7) hold for arbitrary densities ψ and φ as long as φ is strictly positive. 3. Initial conditions: Note that at k = 0, the following holds for any Borel “test” function g(x). φ(D0−1 (Y0 − C0 x)) E[0 g(x) | Y0 ] = E g(x) | Y0 |D0 |φ(Y0 ) 1 = φ(D0−1 (Y0 − C0 x))ψ(x)g(x)dx. |D0 |φ(Y0 ) IRm (5.7.10) Equating (5.7.1) and (5.7.10) yields α0 (x) =

φ(D0−1 (Y0 − C0 x)) ψ(x). |D0 |φ(Y0 ) i j(M)

Similarly the initial conditions for βk i j(0)

β0

(x) M = 0, 1, 2 and δkin (x) are

(x) = x, ei x, e j α0 (x),

i j(2) β0 (x)

= 0,

δ0in (x)

(5.7.11)

i j(1)

β0

(x) = 0,

= x, ei Y0 , f n α0 (x). (5.7.12) i j(M)

4. Theorems 5.2 and 5.3 in [13] derive finite-dimensional filters for Tk , M = 0, 1, 2, i j(M) and Ukin defined in (5.6.5), (5.6.6), and (5.6.7). In particular, the densities αk , βk , in M = 0, 1, 2, and δk are characterized in terms of a finite number of statistics. Recall from Section 5.4 that αk is an unnormalized normal density with mean µk = E[X k | Yk ] and variance Rk = E[(X k − µk )(X k − µk ) | Yk ] given by the well-known Kalman filter equations. −1 µk = Rk Bk−2 Ak σk−1 Rk−1 µk−1 + Rk Ck (Dk Dk )−1 Yk , −1 Rk = (Ak Rk−1 Ak + Bk2 )−1 + Ck (Dk Dk )−1 Ck .

(5.7.13) (5.7.14)

5.7 Finite-dimensional filters

183

Here σk is a symmetric m × m matrix defined as −1 σk = Ak Bk−2 Ak + Rk−1 .

(5.7.15)

Due to the presence of the quadratic term x, ei x, e j , the density βk(0) in (5.7.8) is not Gaussian. However, as will be shown below (Theorem 5.2 in [13]) it is possible to express it as a quadratic expression in x multiplied by αk (x) for all k. The important conclusion then is that by updating the coefficients of the quadratic expression, together with the Kalman i j(0) filter above, gives finite-dimensional filters for computing Tk . A similar result also holds i j(1) i j(2) for Tk , Tk and Ukin . Theorems 5.7.5 and 5.7.6 that follow derive finite-dimensional sufficient statistics for the i j(M) densities βk , M = 0, 1, 2, and δkin . To simplify notation, define k = Bk−2 Ak σk−1 ,

−1 −1 Sk = σk+1 R k µk .

(5.7.16)

i j(M)

Theorem 5.7.5 At time k, the density βk (x) (initialized according to (5.7.12)) is comi j(M) i j(M) i j(M) pletely defined by the five statistics ak , bk , dk , Rk and µk as follows: i j(M)

βk

i j(M) i j(M) i j(M) (x) = ak + bk x + x dk x αk (x),

i j(M)

i j(M)

M = 0, 1, 2,

(5.7.17)

i j(M)

where ak ∈ IR, bk ∈ IRm , and dk ∈ IRm×m is a symmetric matrix with elements dk ( p, q), p = 1, . . . , m, q = 1, . . . , m. i j(M) i j(M) i j(M) Furthermore, ak , bk , dk , M = 0, 1, 2, are given by the following recursions: i j(0)

i j(0) i j(0) −1 i j(0) + Sk dk Sk , + bk Sk + Tr dk σk+1 i j(0) i j(0) i j(0) = k+1 bk + 2dk Sk , b0 = 0m×1 , i j(0)

ak+1 = ak i j(0)

bk+1

i j(0)

i j(0)

a0

ei ej + e j ei 1 i j(0) k+1 + (ei ej + e j ei ), d0 = , 2 2 i j(1) i j(1) i j(1) −1 i j(1) i j(1) = ak + bk Sk + Tr dk σk+1 + Sk dk Sk , a0 = 0, i j(1) i j(1) i j(1) = k+1 bk + 2dk Sk + ei ej Sk , b0 = 0m×1 , i j(0)

dk+1 = k+1 dk i j(1)

ak+1

i j(1)

bk+1

i j(1)

= 0,

(5.7.18) (5.7.19) (5.7.20) (5.7.21) (5.7.22)

1 i j(1) k+1 + (ei ej k+1 + k+1 e j ei ), d0 = 0m×m , (5.7.23) 2 i j(2) i j(2) i j(2) −1 i j(2) = ak + bk Sk + Tr dk σk+1 + Sk (dk + ei ej )Sk i j(1)

dk+1 = k+1 dk i j(2)

ak+1

i j(2)

bk+1

i j(2) −1 , a0 = 0, + Tr ei ej σk+1 i j(2) i j(2) = k+1 bk + (2dk + 2ei ej )Sk ,

ei ej + e j ei i j(2) i j(2) k+1 dk+1 = k+1 dk + , 2

(5.7.24) i j(2)

b0

i j(2)

d0

= 0m×1 ,

(5.7.25)

= 0m×m ,

(5.7.26)

where Tr[.] denotes the trace of a matrix (which is the sum of the diagonal elements), σk is defined in (5.7.15) and µk , Rk are obtained from the Kalman filter (5.7.13) and (5.7.14).

184

Kalman filtering

Proof Only the proof of the theorem for M = 0 is given; the proofs for M = 1, 2 are similar and left as an exercise. i j(0) i j(0) i j(0) From (5.7.12), at time k = 0, β0 (x) is of the form (5.7.17) with a0 = 0, b0 = 0, i j(0) d0 = (ei e j + e j ei )/2. Assume that (5.7.17) holds at time k. Then at time k + 1, using (5.7.17) and the recursion (5.7.8) it follows that i j(0) i j(0) i j(0) i j(0) βk+1 (x) = (x, Yk ) ak + bk z + z dk z αk (z) IRm

× ψ(Bk−1 (x − Ak z))dz + x, ei x, e j αk+1 (x).

(5.7.27)

Denote the first term on the RHS of (5.7.27) as I1 . 1 −2 I1 = K (x) exp − {(x − Ak+1 z) Bk+1 (x − Ak+1 z) + (z − µk ) Rk−1 (z − µk )} 2 IRm i j(0) i j(0) i j(0) × ak + bk z + z dk z dz 1 i j(0) i j(0) i j(0) = K 1 (x) exp − (z σk+1 z − ξk+1 z) ak + bk z + z dk z dz, (5.7.28) 2 IRm where σk is defined in (5.7.15), −2 ξk+1 = 2(x Bk+1 Ak+1 + µk Rk−1 ), −1/2

K (x) = (x, Y )(2π)−m |Bk+1 |−1 |Rk α¯ k = αk (z)dz, IRm

α¯ k ,

1 −2 −1 K 1 (x) = K (x) exp − (x Bk+1 x + µk Rk µk ) . 2

(5.7.29)

Completing the “square” in the exponential term in (5.7.28) yields

−1 σk+1 ξk+1 ξk+1 1 I1 = K 1 (x) exp − − 2 4 −1 −1 σk+1 ξk+1 ξk+1 σk+1 1 × exp − σk+1 z − z− 2 2 2 IRm i j(0) i j(0) i j(0) × ak + bk z + z dk z dz. Now consider the integral in (5.7.30), −1 −1 σk+1 ξk+1 ξk+1 σk+1 1 exp − σk+1 z − z− 2 2 2 IRm i j(0) i j(0) i j(0) × ak + bk z + z dk z dz i j(0) i j(0) i j(0) = (2π)m/2 |σk+1 |−1/2 ak + bk E[z] + E[z dk z] ,

(5.7.30)

(5.7.31)

5.7 Finite-dimensional filters

185

since the exponential term is an unnormalized Gaussian density in z with normalization constant (2π)m/2 |σk+1 |−1/2 . So E[z] = E[z dk

i j(0)

−1 σk+1 ξk+1 , 2

z] = E{(z − E[z]) dk

i j(0)

i j(0)

= Tr[dk

(5.7.32)

(z − E[z])} + E[z ]dk

i j(0)

E[z],

1 −1 i j(0) −1 −1 σk+1 ] + (σk+1 ξk+1 ) dk (σk+1 ξk+1 ). 4

(5.7.33)

Therefore from (5.7.30), (5.7.31), (5.7.32), (5.7.33) and (5.7.27) it follows that 1 i j(0) −1 i j(M) i j(0) i j(0) −1 βk+1 (x) = αk+1 (x) ak + bk σk+1 ξk+1 + Tr[dk σk+1 ] 2 1 −1 i j(0) −1 + ξk+1 σk+1 dk σk+1 ξk+1 + x ei ej x . 4

(5.7.34)

Substituting for ξk+1 (which is affine in x) yields i j(0) i j(0) i j(0) i j(0) βk+1 (x) = ak+1 + bk+1 x + x dk+1 x αk+1 (x), i j(0)

i j(0)

i j(0)

where ak+1 , bk+1 and dk+1 are given by (5.7.18), (5.7.19) and (5.7.20). The proof of the following theorem (Theorem 5.3 in [13]) is very similar and hence omitted. Theorem 5.7.6 At time k, the density δkin (x) (initialized according to (5.7.12)) is completely defined by the four statistics a¯ kin , b¯kin , Rk and µk as follows: δkin (x) = a¯ kin + b¯kin x αk (x), (5.7.35)

where a¯ kin ∈ IR, b¯kin ∈ IRm , are given by the following recursions: in a¯ k+1 = a¯ kin + b¯kin Sk ,

in b¯k+1 = k+1 b¯kin ,

a¯ 0in = 0,

(5.7.36)

b¯0in = ei Yk+1 , f n ,

(5.7.37)

where k and Sk are defined in (5.7.16) and µk , Rk are obtained from the Kalman filter (5.7.13) and (5.7.14). i j(M)

Having characterized the densities βk , M = 0, 1, 2, and δkin by their finite-dimensional i j(M) sufficient statistics, finite-dimensional filters for Tk and Ukin are now derived. i j(M)

Theorem 5.7.7 (Theorem 5.4 in [13]) Finite-dimensional filters for Tk , M = 0, 1, 2, and Ukin are given by i j(M) i j(M) i j(M) i j(M) i j(M) E[Tk | Yk ] = a k + bk µk + Tr dk Rk + µk dk µk , (5.7.38)

E[Ukin | Yk ] = a¯ kin + b¯kin µk .

(5.7.39)

186

Proof

Kalman filtering

Using the abstract Bayes’ Theorem (4.1.1), i j(M)

i j(M) E[Tk

| Yk ] =

E[k Tk

i j(M)

| Yk ]

E[k | Yk ]

=

IR

m

βk

K

dx ,

(5.7.40)

where the constant K = (5.7.17), i j(M)

IRm

βk

IRm

αk (x)dx. But since αk (x) is an unnormalized density, from

i j(M) i j(M) i j(M) dx = K E ak + bk x + x dk x i j(M) i j(M) i j(M) i j(M) = K ak + bk µk + Tr dk Rk + µk dk µk .

(5.7.41)

Substituting in (5.7.40) proves the theorem. The proof of (5.7.39) is left as an exercise.

Remark 5.7.8 Theorem (5.7.7) gives finite-dimensional filters for the time sum of the ij states Ukin and time sum of the square of the states Tk . Theorem 6.2 in [13] shows that finite-dimensional filters exist for the time sum of any arbitrary integral power of the states. For notational simplicity, assume that the state and observation processes are scalar-valued i.e. m = d = 1 in (5.6.1) and (5.6.2). k p Let Tk = i=0 X i and define the unnormalized density βk (x) = E[k Tk I (X k ∈ dx) | Yk ]. The first step is to obtain a recursion for βk (x). It can be easily shown that βk (x) = (x, Yk ) βk−1 (z)ψ(Bk−1 (x − Ak z))dz + x p αk (x), IR

(5.7.42) φ(Dk−1 (Yk − Ck x)) Bk Dk φ(Yk ) Next, βk (x) is characterized in terms of finite sufficient statistics, as shown in Theorem 5.7.3. Also for p = 1 and 2, Theorems 5.7.6 and 5.7.5 give finite-dimensional sufficient statistics. Theorem 6.2 in [13] shows that βk can be characterized in terms of finitedimensional statistics for any p ∈ IN. where (x, Yk ) =

Theorem 5.7.9 ([13] Theorem 6.2) At time k, the density βk (x) in (5.7.42) is completely defined by p + 3 statistics ak (0), ak (1), . . . , ak ( p), Rk and µk as follows: p i ak (i)x αk (x), (5.7.43) βk (x) = i=0

5.7 Finite-dimensional filters

187

where p i

ak+1 (n) =

ak (i)ηi j

j

i=n j=n

n

−2 n (Rk−1 µk ) j−n (Ak+1 Bk+1 ) , 0 ≤ n < p,

−2 p ak+1 ( p) = 1 + ak ( p)η pp (Ak+1 Bk+1 ) ,

(5.7.44)

and    i  −( j+1)   1.3...(i − j − 1)σk+1    j ηi j =   0    σ − j k+1

if i − j is even, i > j, if i − j is odd, i > j,

(5.7.45)

if i = j.

Proof At time k = 0, β0 (x) = x p α0 (x) and so satisfies (5.7.43). Assume that (5.7.43) holds at time k. Then at time k + 1, using similar arguments to Theorem 5.7.5, it follows that p −1 i βk+1 (x) = (x, Yk ) ψ(Bk+1 (x − Ak+1 z)) ak (i)z αk (z)dz + x p αk (x). (5.7.46) IR

i=0

Denote the RHS as I1 .

−1 σk+1 δk+1 δk+1 1 I1 = K 1 (x) exp − − 2 4   2 p −1  1  δ σ k+1 × z − k+1 ak (i)z i exp − σk+1 dz.  2  2 IR i=0

(5.7.47)

The integral in (5.7.47) is 1/2

(2π)

−1/2

|σk+1 |

E

p

ak (i)z

i

i=0

= (2π)

1/2

−1/2

|σk+1 |

p i=0

ak (i)

i

i

j=0

j

E (z − E[z])i− j (E[z]) j .

(5.7.48)

Recall from (5.7.32) that E[z] is affine in x: −1 −2 E[z] = σk+1 [Rk−1 µk + Ak+1 Bk+1 x].

Also E[(z − E[z])2 ] is independent of x. Indeed ([31], p. 111),   if i − j is even, i > j,  0 i− j −1 E (z − E[z]) = 1.3...(i − j − 1)σk+1 if i − j is even, i > j,   1 if i = j.

(5.7.49)

(5.7.50)

188

Kalman filtering

Thus

βk+1 (x) = αk+1 (x) = αk+1 (x)

p p j

ak (i)ηi j

i=0 j=0 n=0 p p i

ak (i)ηi j

n=0 i=n j=n

j n j n

−2 n n (Rk−1 µk ) j−n (Ak+1 Bk+1 ) x

+x

p

−2 n n (Rk−1 µk ) j−n (Ak+1 Bk+1 ) x

+x

p

.

(5.7.51) Equation (5.7.51) is of the form (5.7.43) with ak+1 (i), i = 0, . . . , p given by (5.7.44).

Remark 5.7.10 The filters derived in Theorems 5.7.3, 5.7.5 and 5.7.6 have one major problem: they require Bk to be invertible. In practice, Bk is often not invertible. A simple transformation that expresses the filters in the terms of the inverse of the predicted Kalman covariance matrix is used. This inverse exists even if Bk is singular as long as a certain uniform controllability condition holds. Both the uniform controllability condition and the transformation used are well-known in Kalman filter literature (Chapter 7 in [20]).

First define the Kalman predicted state estimate µk|k−1 = E[X k | Yk−1 ] and the predicted

state covariance Rk|k−1 = E[(X k − µk|k−1 )(X k − µk|k−1 ) | Yk−1 ]. It is left as an exercise to show that Rk|k−1 = Bk2 + Ak Rk−1 Ak .

(5.7.52)

The first step is to provide a sufficient condition for Rk|k−1 to be non-singular. Definition 5.7.11 ([20] Chapter 7) The state space model (5.6.1), (5.6.2) is said to be uniformly completely controllable if there exist a positive integer N1 and positive constants α, β such that α I ≤ C(k, k − N1 ) ≤ β I

for all k ≥ N1 .

(5.7.53)

Here k

C(k, k − N1 ) =

φ(k, l + 1)Bl Bl φ(k, l + 1) .

(5.7.54)

l=k−N1

φ(k2 , k1 ) =

Ak2 Ak2 −1 ...Ak1 +1

if k2 > k1 ,

I

if k2 = k1 .

(5.7.55)

Lemma 5.7.12 If the state space model (5.6.1), (5.6.2) is uniformly completely controllable and R0 ≥ 0 then Rk and Rk|k−1 are positive definite matrices (and hence nonsingular) for all k ≥ N1 . Proof See [20], p. 238, Lemma 7.3. The following lemma will be used in the sequel.

5.7 Finite-dimensional filters

189

−1 Lemma 5.7.13 (Lemma 7.3 in [13]) Assume Rk|k−1 exists. Then with σk and k defined in (5.7.15) and (5.7.16), respectively, −1 σk−1 = Rk−1 − Rk−1 Ak Rk|k−1 Ak Rk−1 , −1 k+1 = Rk+1|k Ak+1 Rk .

(5.7.56) (5.7.57)

Furthermore, the Kalman filter (5.7.13), (5.7.14) can be expressed in “standard” form µk = Ak µk−1 + Rk|k−1 Ck [Ck Rk|k−1 Ck + Dk Dk ]−1 (Yk − Ck Ak µk−1 ), Rk = Rk|k−1 − Rk|k−1 Ck [Ck Rk|k−1 Ck + Dk Dk ]−1 Ck Rk|k−1 , Rk|k−1 = Bk2 + Ak Rk−1 Ak . Proof

(5.7.58)

Straightforward use of the Matrix Inversion Lemma 5.4.3 on (5.7.15) yields σk−1 = Rk−1 − Rk−1 Ak (Bk2 + Ak Rk−1 Ak )−1 Ak Rk−1 .

(5.7.59)

Substituting (5.7.52) in (5.7.59) proves (5.7.56). To prove (5.7.57), first note that

−2 −1 k+1 = Bk+1 Ak+1 σk+1 −2 −2 −1 = Bk+1 Ak+1 Rk − Bk+1 Ak+1 Rk Ak+1 Rk+1|k Ak+1 Rk −2 −2 −2 −1 = Bk+1 Ak+1 Rk − Bk+1 (Rk+1|k − Bk+1 )Rk+1|k Ak+1 Rk ,

because of (5.7.52). So −2 −2 −1 k+1 = Bk+1 Ak+1 Rk − Bk+1 Ak+1 Rk + Rk+1|k Ak+1 Rk −1 = Rk+1|k Ak+1 Rk .

To prove (5.7.58), consider the Kalman filter equations (5.7.13) and (5.7.14). Using Lemma 5.7.12 to the first term on the RHS of (5.7.13) yields the “standard” Kalman filter equations. Now the filters derived earlier in this section are expressed in terms of Rk|k−1 instead of Bk . As shown below (Theorem 7.4 in [13]), the advantage of doing so is that Bk no longer needs to be invertible, as long as the uniformly controllability condition in Definition 5.7.11 holds. Theorem 5.7.14 Consider the linear dynamical system (5.6.1) and (5.6.2) with Bk not necessarily invertible. Assume that the system is uniformly completely controllable, i.e. (5.7.53) holds. Then at time k, with σk−1 given by (5.7.56) and k defined in (5.7.57), the following model holds. 1. The density αk (x) (defined in (5.7.1)) is an unnormalized Gaussian density with mean µ ∈ IRm and covariance matrix Rk ∈ IRm×m . These are recursively computed via the standard Kalman filter equations (5.7.58).

190

Kalman filtering i j(M)

2. The density βk (x) (initialized according to (5.7.12)) is completely defined by the five i j(M) i j(M) i j(M) statistics ak , bk , dk , Rk and µk as follows: i j(M) i j(M) i j(M) i j(M) βk (x) = ak + bk x + x dk x αk (x), M = 0, 1, 2, i j(M)

i j(M)

i j(M)

where ak ∈ IR, bk ∈ IRm , and dk ∈ IRm×m is a symmetric matrix with elements dk ( p, q), p = 1, . . . , m, q = 1, . . . , m. These statistics are recursively computed by Equations (5.7.18) to (5.7.26). 3. The density δkin (x) (initialized according to (5.7.12)) is completely defined by the four statistics a¯ kin , b¯kin , Rk and µk as follows: δkin (x) = a¯ kin + b¯kin x αk (x), (5.7.60) where a¯ kin ∈ IR, b¯kin ∈ IRm are given by the recursions (5.7.36) and (5.7.37).

i j(M)

Finally, finite-dimensional filters for Tk given by (5.7.38) and (5.7.39).

and Ukin in terms of the above statistics are

Proof It only remains to show that subject to the uniform complete controllability condition (5.7.53), the filtering equations (5.7.18)–(5.7.26) and (5.7.36), (5.7.37) in Theorem 5.7.14 hold even if the matrices Bk+1 are singular. The proof of this is as follows. If Bk+1 is singular, 1. Add × N (0, 1) noise to each component of X k+1 . This is done by replacing Bk+1 in (5.6.1) with the nonsingular matrix Bk+1 = Bk+1 + Im , where ∈ IR. Denote the resulting state process X k+1 . 2. Define Rk+1|k as in (5.7.52) with Bk+1 replaced by Bk+1 . Express the filters in term of Rk+1|k as Theorem 5.7.14. 3. As → 0, Rk+1|k → Rk+1|k . 4. Then using the bounded conditional convergence theorem (p. 214, [4]), the conditional i j(0) estimates of X k , X k X k , Tk (x ), and Ukin (x ) converge to the conditional estimates of i j(0) X k , X k X k , Tk (x), and Ukin (x), respectively.

5.8 Continuous-time vector dynamics Consider the classical linear Gaussian model for the signal and observation processes. That is, the signal {xt }, t ≥ 0, is described by the equation dxt = At xt dt + Bt dw t ,

x0 ∈ IRm ,

(5.8.1)

and the observation process {yt }, t ≥ 0, is described by the equation dyt = Ct xt dt + Dt dvt ,

y0 = 0 ∈ IRn .

(5.8.2)

Here w and v are independent r -dimensional and n-dimensional Brownian motions, respectively, defined on a probability space (, F, P) with complete filtrations Ft = σ {xs , ys : s ≤ t}, and Yt = σ {ys : s ≤ t}, t ≥ 0. Further, w and v are independent of x0 . We assume

5.8 Continuous-time vector dynamics

191

that x0 is random variable with normal density

) p0 (x) = (2π )−n/2 |P0 |−1/2 exp(−1/2) (x − xˆ0 ) P0−1 (x − xˆ0 ) .

The matrix functions At ∈ IRm×m , Bt ∈ IRm×r , Ct ∈ IRn×m , and Dt ∈ IRn×n are measurable functions of t. We assume Dt is a positive definite matrix. We model the above dynamics by supposing that initially we have an “ideal” probability space (, F, P) such that under P, 1. w is an r -dimensional Brownian motion and {xt } is defined by (5.8.1), 2. y is an n-dimensional Brownian motion, independent of w and x0 , and having quadratic variation yt = Dt > 0; i.e., Dt is a positive definite matrix.

∂ ∂ Write = ,..., . ∂ x1 ∂ xn m For any function g : IR → IR, write   2 ∂2g ∂ g . . .  ∂x2 ∂ x1 ∂ xm    1  .. .. .. 2 =  . . . .   2  ∂ g ∂2g  ... ∂ xm ∂ x1 ∂ xm2 For a vector field g(x) = (g1 (x), g2 (x), . . . , gm (x)) defined on IRm , define div(g) = Define

t

t = exp

(Cs xs ) 0

∂g2 ∂gm ∂g1 + + ··· + . ∂ x1 ∂ x2 ∂ xm

Ds−1 (Ds−1 ) dys

which is also given by

t

t = 1 + 0

1 − 2

t 0

xs Cs (Ds−1 ) Ds−1 Cs xs ds

s xs Cs (Ds−1 ) Ds−1 dys .

To see this apply the Itˆo rule to the function log t . Then t is an Ft -martingale and E[t ] = 1. A new probability measure P can be defined by setting Define the process vt by the formula dvt = Dt−1 (yt − Ct xt dt),

,

(5.8.3)

dP = t . dP Ft

v0 = 0.

Then Girsanov’s theorem 4.3.3 implies that {vt } is a standard n-dimensional Brownian motion process under P. Therefore under P, dyt = Ct xt dt + Dt dvt . Note that under P, the process {xt } still satisfies (5.8.1). Consequently, under P the processes {xt } and {yt } satisfy the real world dynamics (5.8.1) and (5.8.2). However, P is a more convenient measure with which to work.

192

Kalman filtering

For any “test” function φ : IRm → IR which is in C 2 and has compact support, write σ (φ)t = E[t φ(xt ) | Yt ]. In the case when the measure defined by σ (.)t has a density q(x, t) we have σ (φ)t = φ(x)q(x, t)dx. IRm

Using the vector Itˆo rule (Theorem 3.6.9) we establish t t φ(xt ) = φ(x0 ) + ( φ(xs )) As xs ds + ( φ(xs )) Bs dw s 1 + 2

0

t 0

0

Tr[ 2 φ(xs ) Bs Bs ]ds.

(5.8.4)

In view of (5.8.3) and (5.8.4) and using the Itˆo product rule (Example 3.7.15), t t t φ(xt ) = φ(x0 ) + s ( φ(xs )) As xs ds + s ( φ(xs )) Bs dw s 0

+

1 2

+ 0

0

t 0

t

s Tr[ 2 φ(xs ) Bs Bs ]ds

s φ(xs )xs Cs (Ds−1 ) Ds−1 dys .

(5.8.5)

Conditioning both sides of (5.8.5) on Yt and using the fact that Bt and yt are independent and that yt has independent increments under P (it is Wiener) (see [15] Lemma 3.2 p. 261), we have Theorem 5.8.1 Suppose φ ∈ C 2 is a real-valued function with compact support. Then t σ (φ)t = σ (φ)0 + σ (( φ(xs )) As xs )ds 0

1 t + σ (Tr[ 2 φ(xs ) Bs Bs ])ds 2 0 t + σ (φ(xs )xs )Cs (Ds−1 ) Ds−1 dys . 0

If σ (.)t has a density q(x, t), we integrate by parts each term of q(x, t)φ(x)dx = q(x, 0)φ(x)dx IRm

IRm

+

t m

0

IR

0

IRm

q(x, t)( φ(x)) As x)dxds

1 t + q(x, t)Tr[ 2 φ(x) Bs Bs ]dxds 2 0 IRm t + q(x, t)φ(x)x Cs (Ds−1 ) Ds−1 dxdys .

(5.8.6)

5.8 Continuous-time vector dynamics

For instance, if m = 2, t t q(x, t)( φ(x)) As x)dxds = 0

IRm

193

∂φ ∂φ q(x, t) , m ∂ x1 ∂ x2 IR

0

× (a11 x1 + a12 x2 , a21 x1 + a22 x2 ) dxds t ∂φ = q(x, t)(a11 x1 + a12 x2 ) dxds m ∂ x1 0 IR t ∂φ + q(x, t)(a21 x1 + a22 x2 ) dxds ∂ x2 0 IRm t ∂q(x, t)(a11 x1 + a12 x2 ) =− φ(x) dxds m ∂ x1 0 IR t ∂q(x, t)(a21 x1 + a22 x2 ) − φ(x) dxds m ∂ x2 0 IR t ∂q(x, t)(a11 x1 + a12 x2 ) =− φ(x) ∂ x1 IRm 0 ∂q(x, t)(a21 x1 + a22 x2 ) dxds + ∂ x2 t =− φ(x)div(As xq(x, s))dxds. IRm

0

Similarly, t 0

IRm

q(x, s)Tr[ 2 φ(x) Bs Bs ]dxds =

t 0

IRm

φ(x)Tr[ 2 q(x, s)Bs Bs ]dxds,

which holds for all “test” functions φ; hence Lemma 5.8.2

q(x, t) = q(x, 0) − + 0

t

div(As xq(x, s))ds +

0 t

1 2

0

t

Tr[ 2 q(x, s)Bs Bs ]ds

q(x, s)x Cs (Ds−1 ) Ds−1 dys ,

(5.8.7)

with q(x, 0) = p0 (x), the density of x0 . Remark 5.8.3 Equation (5.8.7) is a stochastic partial differential equation for the unnormalized conditional density of xt given Yt . In general, the solution of this equation is a conditional density function, evolving stochastically in time. For the linear, Gaussian dynamics (5.8.1) and (5.8.2), however, q(x, t) has a simple form. Theorem 5.8.4 The solution of (5.8.7) is ) q(x, t) = (2π)−m/2 | t |−1/2 νt exp(−1/2) (x − m t ) t−1 (x − m t ) .

(5.8.8)

Here m t = E[xt | Yt ], m 0 = xˆ0 , t = E[(xt − m t ) (xt − m t ) | Yt ], 0 = P0 , and νt is a normalizing factor.

194

Kalman filtering

It is well known that m t and t are given by the Kalman filter equations dm t = At m t dt + t Ct (Dt−1 ) Dt−1 (dyt − Ct m t dt) = At m t dt + t Ct (Dt−1 ) dvt , (since under P, dvt = Dt−1 (dyt − Ct m t dt), d t = t At + At t + Bt Bt − t Ct (Dt−1 ) Dt−1 Ct t . dt Note that t is deterministic and can be computed off-line. Also

t 1 t −1 −1 νt = exp m s Cs (Ds−1 ) Ds−1 dys − m s Cs (Ds ) Ds Cs m s ds . 2 0 0 Proof

(5.8.9) (5.8.10)

(5.8.11)

([2]) We have to show that for any “test” function φ(.) t 1 t σ (φ)t = σ (φ)0 + σ (( φ(xs )) As xs )ds + σ (Tr[ 2 φ(xs ) Bs Bs ]ds 2 0 0 t + σ (φ(xs )xs )Cs (Ds−1 ) Ds−1 dys 0

=

IRm

φ(x)q(x, t)dx,

(5.8.12)

where q(x, t) is given by (5.8.8). −1/2 1/2 Let ξ = t (x − m t ), x = m t + t ξ and dx = | t |1/2 dξ . Hence the dx integral on the right hand side of (5.8.12) is equal to 2 1/2 (2π)−m/2 φ(m t + t ξ )νt e−|ξ | /2 dξ. IRm

Now:

d

(2π) IRm

= IR

m

−m/2

φ(m t +

2 1/2 t ξ )νt e−|ξ | /2 dξ

2 1/2 (2π)−m/2 d φ(m t + t ξ )νt e−|ξ | /2 dξ,

and using the product rule, d(φνt ) = φdνt + νt dφ + d φ, νt ∂φ 1 ∂ 2φ 1/2 (dm t + d t ξ ) + Tr[ 2 d m, mt ]. ∂x 2 ∂x From (5.8.9), (5.8.10) and (5.8.11), 1/2

dφ(m t + t ξ ) =

dνt = νt m s Cs (Ds−1 ) Ds−1 dys , d t = ( t At + At t + Bt Bt − t Ct (Dt−1 ) Dt−1 Ct t )dt, dm t = (At m t − t Ct (Dt−1 ) Dt−1 Ct m t )dt + t Ct (Dt−1 ) Dt−1 dyt , d m, mt = t Ct (Dt−1 ) Dt−1 Ct t dt, d φ, νt = νt

∂φ t Ct (Dt−1 ) Dt−1 Ct m t dt. ∂x

5.8 Continuous-time vector dynamics

195

Therefore ∂φ 1/2 1 ∂ 2φ d t ξ + Tr[ 2 t Ct (Dt−1 ) Dt−1 Ct t ] dt ∂x ∂x 2 ∂x ∂φ + [φm s Cs (Ds−1 ) Ds−1 + t Ct (Dt−1 ) Dt−1 ]dys , ∂x

d(φνt ) = νt

∂φ

At m t +

and

∂φ

∂φ 1/2 d t ξ ∂x ∂x IRm 1 ∂ 2φ + Tr[ 2 t Ct (Dt−1 ) Dt−1 Ct t ] dt 2 ∂x ∂φ 2 + [φm s Cs (Ds−1 ) Ds−1 + t Ct (Dt−1 ) Dt−1 ]dys e−|ξ | /2 dξ. ∂x

dσ (φ)t = (2π)

−m/2

νt

At m t +

However, using integration by parts, IRm

IRm

∂φ 1/2 −|ξ |2 /2 1 d t ξ e dξ = ∂x 2

∂ 2 φ d t −|ξ |2 /2 ]e dξ, ∂ x 2 dt IRm ∂φ 2 2 1/2 t Ct (Dt−1 ) Dt−1 e−|ξ | /2 dξ = φξ t Ct (Dt−1 ) Dt−1 e−|ξ | /2 dξ, ∂x IRm

−1/2

ξ = x t

−1/2

− m t t

Tr[

. It follows that

dσ (φ)t = (2π )−m/2 νt

∂φ IRm

∂x

At m t

1 1 ∂ 2 φ 1 ∂ 2φ 0 + Tr A + A B B dt + Tr t t t t t t 2 ∂x2 2 ∂x2 2 + φx Ct (Dt−1 ) Dt−1 dys e−|ξ | /2 dξ. Using integration by parts again,

∂ 2φ ∂ 2 φ −|ξ |2 /2 −|ξ |2 /2 Tr A dξ = Tr A dξ e e t t t t ∂x2 ∂x2 IRm IRm ∂φ 2 1/2 = At t ξ e−|ξ | /2 dξ m ∂x IR ∂φ 2 = At xe−|ξ | /2 dξ − IRm ∂ x IRm ∂ 2φ ∂φ 2 2 1/2 Tr At t e−|ξ | /2 dξ = At t ξ e−|ξ | /2 dξ 2 ∂x IRm IRm ∂ x ∂φ −|ξ |2 /2 = At xe dξ − IRm ∂ x IRm

∂φ 2 At m t e−|ξ | /2 dξ, ∂x

∂φ 2 At m t e−|ξ | /2 dξ. ∂x

196

Kalman filtering

Hence

1 ∂ 2φ At x + Tr B B dt t t ∂x 2 ∂x2 IRm 2 + φx Ct (Dt−1 ) Dt−1 dys e−|ξ | /2 dξ

dσ (φ)t = (2π )

−m/2

∂φ

νt

1 = σ (( φ(x)) At x)ds + σ (Tr[ 2 φ(x) Bt Bt )dt 2 + σ (φ(x)x )Ct (Dt−1 ) Dt−1 dyt , which is the desired result.

5.9 Continuous-time model parameters estimation The linear model given by (5.8.1) and (5.8.2) is determined by the matrices A, B, C and D which need to be known. These parameters are estimated using the expectation maximization (EM) algorithm which we describe here. Maximum likelihood estimation of the parameters via the EM algorithm requires computation of the filtered estimates of quantities such as

t 0

t

xs ⊗ dxs ,

xs ⊗ dys ,

0

t

xs ⊗ xs ds.

0

Remark 5.9.1 In all the existing literature on parameter estimation of linear Gaussian models via the EM algorithm, filtered estimates of the above quantities are computed via Kalman smoothing, which requires large memory numerical implementation. This problem is solved in [14] by providing finite-dimensional filters for (the components of) such integral processes. It is further shown that finite-dimensional filters exist for integrals and stochastic integrals of moments of all orders of the state process etc. Consider the time-invariant version of (5.8.1), (5.8.2): dxt = Axt dt + Bdw t ,

x0 ∈ IRm ,

dyt = C xt dt + Ddvt ,

y0 = 0 ∈ IRn .

The aim is to compute ML estimates of the parameters θ = (A, C) given the observations Yt = σ {ys : s ≤ t} and assuming B, D are known. This is done via the EM algorithm. Remark 5.9.2 Unlike the discrete-time case, in continuous time, it is not possible to obtain ML estimates of the variance terms B and D because measures corresponding to Wiener processes with different variances are not absolutely continuous (see Chapter 6.1 in [15]). Estimates for B and D are given in terms of the quadratic variations of the state and observation processes.

5.9 Continuous-time model parameters estimation

197

Notation Let ei , e j ∈ IRm denote unit vectors with 1 in the i-th and j-th positions, respectively. Write ij Tt

t

=

0

t

ij

Lt =

t

xs , ei e j , xs ds = 0

t

xs , ei e j , dxs =

0

xs (ei ej )xs ds,

xs (ei ej )dxs ;

0

(5.9.1) (5.9.2)

here ., . denotes the scalar product. Also let f j ∈ IRn denote the unit vector with 1 in the j-th position. Write

t

ij

Ut =

xs , ei f j , dys =

0

0

t

xs (ei f j )dys .

(5.9.3)

Now the expression for Q(θ, θ˜ ) is derived. To update the estimates from A˜ to A, introduce the density t dPθ xs (A − A˜ )(B B )# dxs = exp dPθ˜ Gt 0 1 t − xs (A − A˜ )(B B )# (A − A˜ )xs ds , 2 0 where # denotes the pseudo inverse. Then t dP(A) E log xs A (B B )# dxs | Yt = E ˜ dP( A) 0 1 t ˜ − xs A(B B )# Axs ds | Yt + R( A), 2 0

(5.9.4)

˜ does not involve A. where R( A) Similarly, to update the estimates from C˜ to C, introduce the density t dP(C) ˜ = exp xs (C − C)(D D )−1 dxs ˜ Gt dP(C) 0 1 t ˜ ˜ s ds , − xs (C − C)(D D )−1 (C + C)x 2 0 Consequently,

t dP(C) E log xs C (D D )−1 dys | Yt = E ˜ dP(C) 0 1 t ˜ − xs C (D D )−1 C xs ds | Yt + S(C), 2 0 ˜ does not involve C. where R(C)

(5.9.5)

198

Kalman filtering

Adding (5.9.4) and (5.9.5) yields t 1 t # # ˜ Q(θ, θ) = E xs A (B B ) dxs − x A(B B ) Axs ds | Yt 2 0 s 0 t 1 t +E xs C (D D )−1 dys − xs C (D D )−1 C xs ds | Yt 2 0 0 + E[R(θ˜ ) | Yt ]. ∂Q = 0. This yields ∂θ t −1 t E A=E dxs ⊗ xs | Yt xs ⊗ xs ds | Yt

To implement the M-step set the derivatives

0

0

= Lˆ t Tˆt−1 , t t −1 C=E E dys ⊗ xs | Yt xs ⊗ xs ds | Yt 0

0

= Jˆt Hˆ t−1 , ij ij ij where Tˆt and Lˆ t ∈ IRm×m denote matrices with elements Tˆt = E[Tt | Yt ] and Lˆ t = ij ij E[L t | Yt ], i, j ∈ {1, . . . , m}. Also Uˆ t ∈ IRm×n denotes the matrix with elements Uˆ t = ij E[Ut | Yt ], i ∈ {1, . . . , m}, j ∈ {1, . . . , n}. ij ij ij Remark 5.9.3 The terms Tˆt , Lˆ t and Uˆ t are computed in terms of finite-dimensional filters in Theorems 3.2, 3.8 and 3.5 in [14], thus obtaining a filter-based EM algorithm.

Definition 5.9.4 For any “test” function g : IRm → IR, define a measure-valued process ij ij E[t Tt g(xt ) | Yt ]. This has a density βt (x) so that ij ij E[t Tt g(xt ) | Yt ] = βt (x)g(xt )dx. IRm

ij

The existence of the density βt (x) follows from the existence and uniqueness of solutions of stochastic partial differential equations. This is established in, for instance, Section 4.2 of [2]. The following theorem (Theorem 3.2 in [14]) shows the surprising result that we can ij describe the measure βt (x) exactly as a quadratic in x multiplying the q(x, t) of (5.8.6). ij

Theorem 5.9.5 At time t, the density βt (x) is completely described by the five statistics ij ij ij a¯ t , b¯t , c¯t , t , and m t as follows: βt (x) = (a¯ t + x b¯t + x c¯t x)q(x, t). ij

ij

ij

ij

(5.9.6)

ij ij ij Here a¯ t ∈ IR, b¯t ∈ IRm , and c¯t ∈ L s (IRm , IRm ), the space of symmetric m × m matrices.

5.9 Continuous-time model parameters estimation

199

Further, ij da¯ t ij i j ij = Tr c¯t Bt Bt + b¯t Bt Bt t−1 m t , a¯ 0 = 0 ∈ IR, dt ij 1 ij 0 db¯t ij ij = − At + t−1 Bt Bt b¯t + 2c¯t Bt Bt t−1 m t , b¯0 = 0 ∈ IRm , dt ij 1 ij 1 0 dc¯t ij 0 = − At + t−1 Bt Bt c¯t − c¯t At + Bt Bt t−1 dt 1 ij + (e j ei + ei ej ), c¯0 = 0 ∈ L s (IRm , IRm ). 2

Proof

(5.9.7) (5.9.8)

(5.9.9)

Apply the Itˆo product rule to dTt = xt (ei ej )xt dt and ij

dt φ(xt ) = t ( φ(xt )) At xt dt + t ( φ(xt )) Bt dw t 1 + t Tr[ 2 φ(xt ) Bt Bt ]dt 2 + t φ(xt )xt Ct (Dt−1 ) Dt−1 dyt ,

(5.9.10)

to get

t

ij

t φ(xt )Tt = 0

s Tsi j ( φ(xs )) As xs ds

t

+ 0

+

1 2

t

0 t

+

s Tsi j ( φ(xs )) Bs dw s

0 t

+ 0

s Tsi j Tr[ 2 φ(xs ) Bs Bs ]ds

s Tsi j φ(xs )xs Cs (Ds−1 ) Ds−1 dys s φ(xs )xs (ei ej )xs ds.

(5.9.11)

Conditioning both sides of (5.9.11) on Yt under the “ideal” world probability measure P and using Lemma 3.2 p. 261 of Hajek and Wong [15] gives ij E[t φ(xt )Tt

| Yt ] = 0

+

t

E[s Tsi j ( φ(xs )) As xs | Ys ]ds

1 2

+

0 t

0

+

0

t

t

E[s Tsi j Tr[ 2 φ(xs ) Bs Bs ] | Ys ]ds

E[s Tsi j φ(xs )xs Cs (Ds−1 ) Ds−1 | Ys ]dys E[s φ(xs )xs (ei ej )xs | Ys ]ds.

200

Kalman filtering ij

In terms of the densities βt (x) and q(x, t), integrate by parts each term of ij

IR

m

βt (x)φ(x)dx =

t 0

βt (x)( φ(x)) As x)dxds ij

IR

m

1 t ij + βt (x)Tr[ 2 φ(x) Bs Bs ]dxds 2 0 IRm t ij + βt (x)φ(x)x Cs (Ds−1 ) Ds−1 dxdys

IRm

0

t

+ 0

q(x, t)φ(xs )xs (ei ej )xs ds.

ij

that is, βt (x) must satisfy the stochastic partial differential equation

t

ij

βt (x) = −

div(As xβsi j (x))ds

0

1 t + Tr[ 2 βsi j (x)Bs Bs ]ds 2 0 t + βsi j (x)x Cs (Ds−1 ) Ds−1 dys 0

+

0

t

q(x, s)xs (ei ej )xs ds.

(5.9.12)

We look for a solution of (5.9.12) of the form β¯si j (x) = (a¯ si j + x b¯si j + x c¯si j x)q(x, s).

(5.9.13)

As noted just after Definition 5.9.4, if such solution exists, it is unique. ¯ b¯ and c, ¯ To simplify notation drop superscripts i, j on a, div(β¯s (x)As x) = div((a¯ s + x b¯s + x c¯s x)q(x, s)As x) = (b¯s + 2c¯s ) As xq(x, s) + (a¯ s + b¯s x + x c¯s x)div(As xq(x, s)) β¯s (x) = ((a¯ s + x b¯s + x c¯s x)q(x, s)) = (b¯s + 2c¯s x)q(x, s) + (a¯ s + x b¯s + x c¯s x) q(x, s) 2 β¯s (x) = 2c¯s xq(x, s) + 2(b¯s + 2c¯s x)( q(x, s)) + (a¯ s + x b¯s + x c¯s x) 2 q(x, s) Tr[ 2 βsi j (x)Bs Bs ] = 2q(x, s)Tr(c¯s Bs Bs ) + 2(b¯s + 2c¯s x)Bs Bs q(x, s) + (a¯ s + x b¯s + x c¯s x)Tr[ 2 q(x, s)Bs Bs ]. Now from (5.8.8), q(x, s) = − s−1 (x − m s )q(x, s).

5.9 Continuous-time model parameters estimation

201

Consequently, substitution of β¯t (x), given by (5.9.13) in the differential form of the right hand side of (5.9.12), yields − ((b¯s + 2c¯s )x) As xq(x, s)ds − (a¯ s + b¯s x + x c¯s x)div(As xq(x, s))ds + q(x, s)Tr(c¯s Bs Bs )ds + (b¯s + 2c¯s x)Bs Bs s−1 (x − m s )q(x, s)ds 1 (a¯ s + x b¯s + x c¯s x)Tr[ 2 q(x, s)Bs Bs ]ds 2 + (a¯ s + x b¯s + x c¯s x)q(x, s)x Cs (Ds−1 ) Ds−1 dys +

+ q(x, s)xs (ei ej )xs ds.

(5.9.14)

Also, ¯ dβs(x) = (da¯ s + db¯s x + x dc¯s x)q(x, s) + (a¯ si j + x b¯s + x c¯s x)dq(x, s).

(5.9.15)

Consequently, β¯t (x), given by (5.9.13), is a solution of (5.9.12) if (5.9.14) equals (5.9.15). However, q(x, s) solves the equation (5.8.7), so dq(x, s) = − div(As xq(x, s))ds +

1 Tr[ 2 q(x, s)Bs Bs ]ds 2

+ q(x, s)x Cs (Ds−1 ) Ds−1 dys . Therefore, substituting the above expression for dq(x, s) into (5.9.15) yields ¯ dβs(x) = (da¯ s + db¯s x + x dc¯s x)q(x, s) − (a¯ si j + x b¯s + x c¯s x)div(As xq(x, s))ds 1 + (a¯ si j + x b¯s + x c¯s x)Tr[ 2 q(x, s)Bs Bs ]ds 2 + (a¯ si j + x b¯s + x c¯s x)x Cs (Ds−1 ) Ds−1 q(x, s)dys .

(5.9.16)

Finally, equating the coefficients of x, x and the constants in (5.9.14) and (5.9.16), it is seen the result holds if (5.9.7), (5.9.8) and (5.9.9) hold. A solution of the ordinary differential equations (5.9.8) and (5.9.9) is now obtained. Write G t for the matrix solution of dG t = −(At + t−1 Bt Bt )G t , G 0 = Im×m . (5.9.17) dt Note that G t is deterministic and can be calculated off-line. Also, as an exponential matrix, G t has an inverse G −1 t . Lemma 5.9.6 The explicit solutions of (5.9.8) and (5.9.9) are t

−1 i j −1 ¯bti j = 2G t G s c¯s Bs Bs s m s ds , 0

ij c¯t

1 = Gt 2

t 0

G −1 s (e j ei

+

ei ej )(G s )−1 ds

G t .

202

Kalman filtering

Proof

The above equations follow using variation of constants.

ij

ij

Remark 5.9.7 We proceed similarly with the processes Ut and L t leaving the details as exercises. Definition 5.9.8 For any “test” function g : IRm → IR, define a measure-valued process ij ij E[t Ut g(xt ) | Yt ]. This has a density γt (x) so that ij

ij

E[t Ut g(xt ) | Yt ] = IR

m

γt (x)g(xt )dx.

ij

The existence of the density γt (x) follows from the existence and uniqueness of solutions of stochastic partial differential equations (see Section 4.2 of [2]). The following theorem (Theorem 3.5 in [14]) shows that we can describe the measure ij γt (x) exactly as a quadratic in x multiplying the q(x, t) of (5.8.7). ij

Theorem 5.9.9 At time t, the density γt (x) is completely described by the five statistics ij ij ij a˘ t , b˘t , c˘t , t , and m t as follows: γt (x) = (a˘ t + x b˘t + x c˘t x)q(x, t). ij

ij

ij

ij

ij

ij

ij

Here a˘ t ∈ IR, b˘t ∈ IRm , and c˘t ∈ L s (IRm , IRm ), the space of symmetric m × m matrices. Further, ij da˘ t ij i j ij = Tr c˘t Bt Bt + b˘t Bt Bt t−1 m t , a˘ 0 = 0 ∈ IR, dt 1 ij 0 ij ij db˘t = [− At + t−1 Bt Bt b˘t + 2c˘t Bt Bt t−1 m t ]dt + dyt f j ei , ij b˘0 = 0 ∈ IRm ,

(5.9.19)

ij 1 ij 1 0 dc˘t ij 0 = − At + t−1 Bt Bt c˘t − c˘t At + Bt Bt t−1 dt 1 ij + (ei f j Ct + Ct f j ei ), c˘0 = 0 ∈ L s (IRm , IRm ). 2

Proof

(5.9.18)

(5.9.20)

Apply the Itˆo product rule to dUt = xt (ei f j )dys and ij

dt φ(xt ) = t ( φ(xt )) At xt dt + t ( φ(xt )) Bt dw t 1 + t Tr[ 2 φ(xt ) Bt Bt ]dt 2 + t φ(xt )xt Ct (Dt−1 ) Dt−1 dyt . and condition both sides on Yt under the “ideal” world probability measure P using Lemma 3.2, p. 261 of [15].

5.9 Continuous-time model parameters estimation

203

ij

In terms of the densities γt (x) and q(x, t), integration by parts yields t 1 t ij ij γt (x) = − div(As xγs (x))ds + Tr[ 2 γsi j (x)Bs Bs ]ds 2 0 0 t t ij −1 −1 + γs (x)x Cs (Ds ) Ds dys + q(x, s)xs (ei f j )dys 0

0

t

+ 0

q(x, s)(x Cs f j ei x)ds.

(5.9.21)

We look for a solution of (5.9.21) of the form γ¯si j (x) = (a˘ si j + x b˘si j + x c˘si j x)q(x, s).

(5.9.22)

As noted just after Definition 5.9.4, if such solution exists, it is unique. Recall that q(x, s) solves the equation (5.8.7): dq(x, s) = − div(As xq(x, s))ds +

1 Tr[ 2 q(x, s)Bs Bs ]ds 2

+ q(x, s)x Cs (Ds−1 ) Ds−1 dys . So (5.9.22) is a solution of (5.9.21) if (5.9.18), (5.9.19) and (5.9.20) hold. Now the ordinary differential equations (5.9.19) and (5.9.20) are solved explicitly. Note j j that f j dyt = dyt f j = dyt , where yt denotes the j-th component of yt . Lemma 5.9.10 The explicit solutions of (5.9.19) and (5.9.20) are t t ij ij −1 ˘ b˘t = 2G t G −1 B B m ds + G e dy c s s i s , s s s s s 0

ij c˘t

1 = Gt 2

0

0

t

G −1 s (ei

f j Cs

+

Cs

f j ei )(G s )−1 ds

G t .

Definition 5.9.11 For any “test” function g : IRm → IR, define a measure-valued process ij ij E[t Ut g(xt ) | Yt ]. This has a density λt (x), so that ij ij E[t Ut g(xt ) | Yt ] = λt (x)g(xt )dx. IRm

ij

The existence of the density λt (x) follows from the existence and uniqueness of solutions of stochastic partial differential equations (see Section 4.2 of [2]). The following theorem (Theorem 3.8 in [14]) shows that one can describe the measure ij λt (x) exactly as a quadratic in x multiplying the q(x, t) of (5.8.7). ij

Theorem 5.9.12 At time t, the density λt (x) is completely described by the five statistics ij ij ij a˜ t , b˜t , c˜t , t , and m t as follows: λt (x) = (a˜ t + x b˜t + x c˜t x)q(x, t). ij

ij

ij

ij

204

Kalman filtering ij

ij

ij

Here a˜ t ∈ IR, b˜t ∈ IRm , and c˜t ∈ L s (IRm , IRm ), the space of symmetric m × m matrices. Further, 0 1 da˜ t ij i j = Tr c˜t Bt Bt + b˜t Bt Bt t−1 m t − Tr Bt Bt ei ej , dt ij

ij

a˜ 0 = 0 ∈ IR, ij db˜t

dt

(5.9.23)

1 ij 0 ij = − At + t−1 Bt Bt b˜t + 2c˜t Bt Bt t−1 m t − (e j ei )Bt Bt t−1 m t , ij b˜0 = 0 ∈ IRm ,

(5.9.24)

ij 1 ij 1 0 dc˜t ij 0 = − At + t−1 Bt Bt c˜t − c˜t At + Bt Bt t−1 dt 1 0 1 + (ei ej (At + Bt Bt t−1 ) + At + t−1 Bt Bt e j ei ), 2 ij

c˜0 = 0 ∈ L s (IRm , IRm ).

(5.9.25)

ij

Proof Apply the Itˆo product rule to t Ut φ(xt ) and condition on Yt under the “ideal” world probability measure P using Lemma 3.2, p. 261 of [15]. ij In terms of the densities λt (x) and q(x, t), integrate by parts to get t 1 t ij ij λt (x) = − div(As xλs (x))ds + Tr[ 2 λis j (x)Bs Bs ]ds 2 0 0 t t + λis j (x)x Cs (Ds−1 ) Ds−1 dys + q(x, s)x (ei ej )As xds 0

−

0

0

t

div(x (ei ej )Bt Bt q(x, s))ds.

(5.9.26)

We look for a solution of (5.9.26) of the form γ¯si j (x) = (a˜ si j + x b˜si j + x c˜si j x)q(x, s).

(5.9.27)

Recall that q(x, s) solves the equation (5.8.7). So (5.9.27) is a solution of (5.9.26) if (5.9.23), (5.9.24) and (5.9.25) hold. Now the ordinary differential equations (5.9.24) and (5.9.25) are solved explicitly. Lemma 5.9.13 The explicit solutions of (5.9.24) and (5.9.25) are t ij ij −1 ˜ b˜t = G t G −1 (2 c − e e ) B B m ds , j i s s s s s s 0

1 t −1 ij c˜t = G t [G s (ei ej (At + Bt Bt t−1 ) 2 0 0 1 + At + t−1 Bt Bt e j ei )(G s )−1 ]ds G t .

(5.9.28)

5.9 Continuous-time model parameters estimation

205

Remark 5.9.14 Note that from the definition of G t , (5.9.17), that the integrand in (5.9.28) −1 includes only half of the four terms in the derivative of G −1 t (ei e j + e j ei )(G t ) ), and so the integral cannot be evaluated in closed form. ij

ij

ij

Theorem 5.9.15 (Theorem 3.10 in [14]) Finite-dimensional filters for Tt , Ut , and L t , defined in (5.9.1), (5.9.3), and (5.9.2), are given by m

E[Tt | Yt ] = a¯ t + m t a¯ t + ij

ij

ij

c¯t ( p, q) t ( p, q) + m t c¯t m t , ij

ij

p,q=1 m

E[Ut | Yt ] = a˘ t + m t a˘ t + ij

ij

ij

c˘t ( p, q) t ( p, q) + m t c˘t m t ,

(5.9.29)

c˜t ( p, q) t ( p, q) + m t c˜t m t .

(5.9.30)

ij

ij

p,q=1 m

E[L t | Yt ] = a˜ t + m t a˜ t + ij

ij

ij

ij

ij

p,q=1

Proof Recall from (5.8.8) that q(x, t) is an unnormalized Gaussian density with mean m t and variance t . Therefore, q(x, t)dx = νt . IRm

Note that for u ∈ IRm ,

IR

m

u xq(x, t)dx = (u m t )νt .

Also, for any matrix M ∈ L(IRm , IRm ) with entries M( p, q), 1 ≤ p, q ≤ m, IRm

x M xq(x, t)dx =

IRm

(x − m t ) M(x − m t )q(x, t)dx

+ m t Mm t =

m

q(x, t)dx IRm

M( p, q) t ( p, q) +

m t Mm t

νt .

p,q=1

Now from Bayes’ Theorem (4.1.1), we have ij E[Tt

2 ij ij E[t Tt | Yt ] IRm βt (x)dx 2 | Yt ] = = E[t | Yt ] IRm q(x, t)dx = a¯ t + m t a¯ t + ij

ij

m

c¯t ( p, q) t ( p, q) + m t c¯t m t , ij

ij

p,q=1

by (5.9.6) and because the factors νt cancel. The proof of equations (5.9.29) and (5.9.30) are similar.

206

Kalman filtering

Estimation of B and D First consider the tensor product of xt with itself: t t xt ⊗ xt = x0 ⊗ x0 + xs ⊗ dxs + dxs ⊗ xs + 0

0

t

B B ds.

Conditioning both sides of (5.9.31) on Yt , we have t t t = E[x0 ⊗ x0 | Yt ] + E xs ⊗ dxs + dxs ⊗ xs | Yt + 0

(5.9.31)

0

0

t

B B ds.

(5.9.32)

0

E[x0 ⊗ x0 | Yt ] in (5.9.32) is the smoothed second moment and is given in terms of finitedimensional statistics; see Theorem 12.11, section 12.4 in [25]. The components of the ij conditioned stochastic integral in (5.9.32) are given by the filtered estimates of L t . Consequently, we have a procedure for estimating the matrix B B . Similarly, consider the tensor product of yt with itself: t t yt ⊗ yt = ys ⊗ dys + dys ⊗ ys + D D t. 0

0

This expression simply amounts to evaluating D D in terms of the quadratic variation of y. 5.10 Direct parameter estimation In the previous sections maximum likelihood arguments were used to estimate recursively the parameters for the linear model (5.8.1), (5.8.2). Here a direct approach to the estimation problem as well as rates of convergence are discussed. The following theorem ([12]) is a continuous-time version of Kronecker’s Lemma (see for example [26] or [29] for the discrete-time case). This result is applied to discuss rates of convergence of the estimates. Suppose (, F, Ft , P), t ≥ 0 is a stochastic basis and M is a continuous locally square integrable martingale. Further, u t is a positive nondecreasing predictable process such that ut > c > 0

Write z t =

2t t0

a.s.

u r−1 dMr , for 0 ≤ t0 ≤ t.

Theorem 5.10.1 Suppose lim z t (ω) = ξ (ω) < ∞ a.s.

t→∞

1 Then limt→∞ (Mt − Mt0 ) exists a.s. ut If limt→∞ u t (ω) = +∞, this limit is 0. Proof

For any s, t0 < s < t, because u is nondecreasing, t t Mt − Ms = u r dzr = u r d(zr − z s ) s

s

t

= u t (z t − z s ) − s

(zr − z s )du r

a.s.

5.10 Direct parameter estimation

207

|Mt − Ms | ≤ 2u t sup |zr − z s |.

(5.10.1)

Consequently, r ≥s

Suppose that limt→∞ u t (ω) = u(ω) < ∞. Then |Mt − Ms | ≤ 2u supr ≥s |zr − z s |. From the hypothesis that limt→∞ z t (ω) = ξ (ω) < ∞ a.s., for any > 0 there is an s such that, if r ≥ s ≥ s , |zr − z s | < /2u. Consequently, if r ≥ s ≥ s , |Mt − Ms | ≤ . That is, Mt

(ω) satisfies a Cauchy condition and converges to a limit µ(ω). Then 1 1 limt→∞ (Mt − Mt0 ) = (µ − Mt0 ). ut u Suppose now that limt→∞ u t (ω) = +∞. Given > 0, again using the Cauchy condition for z, there is an s such that, if r ≥ s ≥ s ∨ t0 , |zr − z s | < . 3 Consequently, supr ≥s ∨t0 |zr − z s | ≤ /3. From (5.10.1), if t ≥ s ∨ t0 , 1 2 . |Mt − Ms ∨t0 | ≤ ut 3 Suppose t0 ≤ s ∨ t0 < t0 . Now limt→∞ u t (ω) = +∞, so there is a t such that, t > t , ut ≥ That is

3|Ms ∨t0 − Mt0 | .

1 |Mt − Ms ∨t0 | ≤ . Now ut 3 1 1 1 |Mt − Mt0 | ≤ |Ms ∨t0 − Mt0 | + |Mt − Ms ∨t0 |. ut ut ut

So if t > s ∨ t ∨ t0 , 1 |Mt − Mt0 | ≤ , ut and the result is proved. The signal coefficient From (5.8.1),

0

t

t

dxs ⊗ xs = A

t

xs ⊗ xs ds +

0

dw s ⊗ xs ,

0

which we rewrite L t = ATt + Mt , and E[L t | Yt ] = AE[Tt | Yt ] + E[Mt | Yt ].

208

Kalman filtering

An estimate for A is, therefore, Aˆ t = Lˆ t Hˆ t−1 , and the error Aˆ t − A = Mˆ t Hˆ t−1 . Now as a special case of Theorem 5.10.1 we investigate the convergence of this error to zero. Consider a function ρ(t), t ≥ 0, which is positive nondecreasing and such that t

limt→∞

ρ −1 (s)ds = λ < ∞. Note from Theorem 5.10.1 this last condition implies that

0

limt→∞ tρ(t)−1 = 0. An example of such a function is ρ(t) = max(1, t(log t)(log log t)α ),

α > 1.

Clearly any function which grows faster than t α , α > 1, at infinity satisfies the condition. The strongest results are those for functions which have the slowest growth at infinity. Consider the (matrix) martingale Mt . M is locally square integrable; M will denote the predictable nonnegative process such that Mt Mt − Mt is a local martingale. t

In fact Mt = B B

0

gale

xs xs ds and Tr Mt = Tr(B B )

t

Rt =

t

0

xs xs ds. Consider the martin-

ρ(Tr Ms )−1/2 dMs .

0

Lemma 5.10.2 Rt is a square integrable martingale, so limt→∞ Rt = ξ (ω) < ∞ exists a.s. Proof E[Tr(Rt Rt )] Now

t

t

=E

−1

ρ(Tr Ms ) d(Tr Ms ) .

0

ρ(Tr Ms )−1 d(Tr Ms ) < λ a.s. So

0

lim E[Tr(Rt Rt )] ≤ λ < ∞.

t→∞

and Rt is a square integrable martingale for 0 ≤ t ≤ ∞. Corollary 5.10.3 From Theorem 5.10.1, if ρ is continuous, lim ρ(Tr Mt )−1/2 Mt

t→∞

exists. If limt→∞ Tr Mt = +∞, this limit is zero. (Tr Mt is an increasing process so limt→∞ Tr Mt exists and is either finite or +∞). Corollary 5.10.4 lim ρ

t→∞

exists a.s.

0

t

xs xs ds

−(1/2) Mt

5.10 Direct parameter estimation

Proof

209

Note that, apart from the positive constant B ∗ = Tr(B B ), Tr Mt is

0

Therefore as ρ is nondecreasing, ∗

−1

∗

ρ((B + 1) Tr Mt ) = ρ (B + 1) ≤ρ 0

so

ρ 0

t

−1

xs xs ds

With R¯ t =

t

t

−1

B

∗

0

t

xs xs ds

xs xs ds ,

1−1 0 ≤ ρ (B ∗ + 1)−1 Tr Mt .

ρ

0

0

u

xs xs ds

−1/2

dMu ,

we have E[Tr( R¯ t R¯ t )] ≤ (B ∗ + 1)λ < ∞. Therefore, limt→∞ R¯ t exists and is finite a.s., so from Theorem 5.10.1 t

−1/2 lim ρ xs xs ds Mt t→∞

0

exists a.s.

Corollary 5.10.5 Suppose x satisfies the stability property 1 t L = sup xs xs ds < ∞, t t 0 and lim ρ(t)Mt = ∞.

t→∞

Then lim ρ(t)−1/2 Mt = 0

t→∞

a.s.

Proof ρ((B ∗ + 1)−1 (L + 1)−1 Tr Mt )

t ∗ −1 −1 ∗ = ρ (B + 1) (L + 1) B xs xs ds

0

1 t = ρ (B + 1) (L + 1) B t x xs ds t 0 s 0 1 ≤ ρ (B ∗ + 1)−1 (L + 1)−1 B ∗ t L ≤ ρ(t). ∗

−1

−1

∗

t

xs xs ds.

210

Kalman filtering

Therefore ρ(t)−1 ≤ ρ((B ∗ + 1)−1 (L + 1)−1 Tr Mt ). With R˜ t =

t

ρ(s)−1/2 dMs ,

0

we have E[Tr R˜ t R˜ t ] = E

t

−1

ρ(s) d(Tr Ms )

0

≤ (B ∗ + 1)(L + 1)λ < ∞. Therefore, limt→∞ R˜ t exists and is finite a.s. Thus from Theorem 5.10.1 lim ρ (t)−1/2 Mt = 0

t→∞

a.s.

Theorem 5.10.6 Suppose x satisfies the stability property of Corollary 5.10.5 and limt→∞ ρ(t) = ∞. Further, suppose x satisfies the excitation condition ρ(t)−1 Hˆ t > K > 0,

t

where Tt = 0

xs xs ds and Hˆ t = E[Tt | Yt ]. Then lim Mˆ t Hˆ t−1 = 0 a.s.

t→∞

with convergence at a rate ρ(t)1/2 . Then lim ρ(t)−1/2 = 0 a.s.

t→∞

Proof

The stability property states that supt 1 sup E t t

t 0

1 t

t 0

xs xs ds ≤ L a.s. Therefore

xs xs ds ≤ L < ∞,

and because limt→∞ tρ(t)−1 = 0, sup t

1 E[Tr Mt ] < ∞, ρ(t)

and the set of random variables {ρ(t)−1/2 Mt } is bounded in L 2 . We can, therefore, condition the convergence of Corollary 5.10.5 and deduce lim ρ(t)−1/2 Mˆ t = 0

t→∞

a.s.

5.11 Continuous-time nonlinear filtering

211

Now Mˆ t Hˆ t−1 = ρ(t)−1/2 Mˆ t (ρ(t)−1/2 Hˆ t )−1 < ρ(t)−1/2 Mˆ t ρ(t)−1/2 K −1 . Therefore, limt→∞ ρ(t)1/2 Mˆ t Hˆ t−1 = 0 a.s. and the result follows.

The observation coefficient From (5.8.2),

t

t

dys ⊗ xs = C

0

t

xs ⊗ xs ds +

0

dvs ⊗ xs ,

0

which we rewrite Ut = C Tt + Nt , and E[Ut | Yt ] = C E[Tt | Yt ] + E[Nt | Yt ]. An estimate for C is, therefore, Aˆ t = Jˆt Hˆ t−1 , and the error Cˆ t − A = Nˆ t Hˆ t−1 . Similar discussions allow us to conclude that, under the stability and excitation conditions, the error Nˆ t Hˆ t−1 converges to zero almost surely at a rate ρ(t)1/2 . 5.11 Continuous-time nonlinear filtering Suppose (, F, P) is a probability space with a complete filtration {Ft }, t ≥ 0, on which are given two independent Ft -Brownian motion processes Bt and yt with quadratic variations Q(.) and R(.) respectively. Let x0 be a real valued random variable with distribution π0 (.). Consider the Borel functions g : IR × [0, ∞) → IR, s : IR × [0, ∞) → IR, where |g(x1 , t) − g(x2 , t)| ≤ k|x1 − x2 |, |s(x1 , t) − s(x2 , t)| ≤ k|x1 − x2 |. Write Yt = σ {ys : s ≤ t} for the complete filtration generated by the observation process y. Remark 5.11.1 The stochastic differential equation dxt = g(xt )dt + s(xt )dBt , with initial state x0 , has a strong solution.

212

Kalman filtering

Consider the Borel function h : IR × [0, ∞) → IR, where we suppose |h(x, t)| ≤ k(1 + |x|). Define

t

t = exp 0

h(xs )Rs−1 dys

which is also given by

t

t = 1 + 0

1 − 2

t

h

2

0

(xs )Rs−1 ds

,

s h(xs )Rs−1 dys .

(5.11.1)

To see this apply the Itˆo rule to the function log t . Then t is an Ft -martingale and E[t ] = 1. A new probability measure P can be defined by setting dP = t . dP Ft t Define the process bt by the formula bt = yt − h(xs )ds. Then {bt } is a Wiener process 0

under P with quadratic variation R(.). Therefore under P, t yt = h(xs )ds + bt . 0

For any real valued function φ for which the expectation is defined, write σ (φ)t = E[t φ(xt ) | Yt ].

(5.11.2)

In the case when the measure defined by σ (.)t has a density q(x, t), we have σ (φ)t = φ(x)q(x, t)dx. IR

Using the Itˆo rule, we establish

t

φ(xt ) = φ(x0 ) + +

t 0

0

∂φ(xs ) s(xs )dBs ∂x

1 ∂ 2 φ(xs ) 2 ∂φ(xs ) s (x ) + g(x ) ds. s s 2 ∂x2 ∂x

(5.11.3)

In view of (5.11.1) and (5.11.3) and using the Itˆo product rule (Example 3.7.15), t t t φ(xt ) = φ(x0 ) + s dφ(xs ) + φ(xs )ds + [, φ]t 0

= φ(x0 ) +

0

t

+ 0

+

0

t

0 t

s

∂φ(xs ) s s(xs )dBs ∂x

1 2 ∂ 2 φ(xs ) ∂φ(xs ) + g(xs ) s (xs ) ds 2 ∂x2 ∂x

s h(xs )Rs−1 φ(xs )dys .

(5.11.4)

5.11 Continuous-time nonlinear filtering

213

Conditioning both sides of (5.11.4) on Yt and using the fact that Bt and yt are independent and that yt has independent increments under P (it is Wiener) (see [15] Lemma 3.2 of Chapter 7), we obtain a stochastic differential equation for (5.11.2). Theorem 5.11.2 Suppose φ ∈ C 2 is a real valued function with compact support. Then t t σ (φ)t = σ (φ)0 + σ (Aφ)s ds + σ (h(xs )Rs−1 φ(xs ))dys , (5.11.5) 0

0

1 2 ∂ 2 φ(xs ) ∂φ(xs ) + g(xs ) s (xs ) . 2 ∂x2 ∂x If σ (.)t has a density q(x, t), we integrate by parts each term of (5.11.5) using the fact that φ ∈ C 2 has compact support: where Aφ(x) =

φ(x)q(x, t)dx =

IR

or

t

∂ 2 φ(x) ∂x2 0 IR IR t t ∂φ(x) + q(x, s)g(x) q(x, s)h(x)Rs−1 φ(x)dxdys , dxds + ∂x IR IR 0 0 φ(x)q0 (t)dx +

1 2

q(x, s)s 2 (x)

φ(x)q(x, t)dx =

φ(x)q0 (x)dx +

IR

IR

−

t 0

+

t

φ(xs ) IR

IR

0

1 2

t 0

φ(x)

IR

∂ 2 (q(x, s)s 2 (x)) ∂x2

∂(q(x, s)g(x)) dxds ∂x

q(x, s)h(x)Rs−1 φ(x)dxdys ,

for all “test” functions φ, hence Corollary 5.11.3 q satisfies the linear stochastic differential equation t t ∗ q(x, t) = q0 (x) + (A q)(x, s)ds + q(x, s)h(x, s)Rs−1 dys . 0

0

1 ∂ s (xt )q(x, t) ∂g(xt )q(x, t) Here (A∗ q)(x, t) = and q0 (x) is the density such that − 2 ∂x2 ∂x π0 (dx) = q0 (x)dx. 2 2

The correlated case Here we consider nonlinear dynamics with correlated noises. Suppose (, F, P) is a probability space with a complete filtration {Ft }, t ≥ 0, on which are given two Ft -Brownian motion processes Bt ∈ IRd and Wt ∈ IRm such that t

B i , W j t = ρsi j ds, 1 ≤ i, 1 ≤ j ≤ m. 0

x0 ∈ IRd has distribution π0 (.) and is independent of Bt and Wt . Consider the Borel functions

214

Kalman filtering

g : IRd × [0, ∞) → IRd , s : IRd × [0, ∞) → L(IRd , IRd ), h : IRd × [0, ∞) → IRm and the continuous and nonsingular matrix α : [0, ∞) → L(IRm , IRm ). We assume here that |g(x1 , t) − g(x2 , t)| ≤ k|x1 − x2 |, ||s(x1 , t) − s(x2 , t)|| ≤ k|x1 − x2 |, |h(x, t)| ≤ k(1 + |x|), ||α(y)|| ≥ δ > 0 ||α(yt1 )

−

α(yt2 )||

δ and

for some ≤

k|yt1

−

yt2 |.

dxt = g(xt )dt + s(xt )dBt , dyt = h(xt )dt + α(yt )dWt . Write Yt = σ {ys : s ≤ t} for the complete filtration generated by the observation process y. Define

t 1 t −1 −1 −1 2 t = exp − (α(ys ) h(xs )) dWs − |α(ys ) h(xs )| ds 2 0 0

t 1 t −1 −1 −1 2 = exp − (α(ys ) h(xs )) α(ys ) dys + |α(ys ) h(xs )| ds . 2 0 0 Consequently,

t

t = exp

(α(ys )−1 h(xs )) α(ys )−1 dys −

0

1 2

t

|α(ys )−1 h(xs )|2 ds .

0

dP = dP Ft d ¯t ∈ IRm are standard Brownian motions, −1 t , and under P the processes Vt ∈ IR and y where By Girsanov’s Theorem, a new probability measure P can be defined by setting

dVti = dBti + ρ i , α −1 hdt,

ρ i ∈ IRd ,

and d y¯t = α −1 dy = dWt + α −1 hdt. t Furthermore, under P, V i , y j t = ρsi j ds 0

For any real valued function φ for which the expectation is defined write σ (φ)t = E[t φ(xt ) | Yt ]. Theorem 5.11.4 Suppose φ ∈ C 2 (IRd ) is a real valued function with compact support. Then t σ (φ)t = σ (φ)0 + σ (Aφ)s ds +

0 t

{σ ( φ.s.ρ) + α −1 (ys )σ (φh)} α −1 (ys )dys ,

0

where Aφ(x) =

d d 1 ∂ 2 φ(xs ) ∂φ(xs ) (ss )i j (xs ) i j + g i (xs ) . 2 i, j=1 ∂x ∂x ∂xi i=1

5.12 Problems

Proof

215

The proof is left as an exercise. 5.12 Problems

1. Assume that the state and observation processes of a system are given by the vector dynamics (5.4.1) and (5.4.2). For m, k ∈ IN, m < k, write the unnormalized conditional density such that E[k I (X m ∈ dx) | Yk ] = γm,k (x)dx. Using the change of measure techniques described in Section 5.3, show that γm,k (x) = αm (x)βm,k (x), where αm (x) is given recursively by (5.3.6). Show that βm,k (x) = E[m+1,k | X m = x, Yk ] 1 = φm+1 (Ym+1 − Cm+1 z) φ(ym+1 ) IRm × ψm+1 (z − Am+1 x)βm+1,k (z)dz.

(5.12.1)

2. Show that the density βm,k (x) (5.12.1) is Gaussian and derive backward recursions for its conditional mean and covariance matrix ([10] page 101). 3. Assume that the state and observation processes are given by the vector dynamics X k+1 = Ak+1 X k + Vk+1 + Wk+1 ∈ IRm , Yk = Ck X k + Wk ∈ IRd . Ak , Ck are matrices of appropriate dimensions, Vk and Wk are normally distributed with means 0 and respective covariance matrices Q k and Rk , assumed nonsingular. Using measure change techniques derive recursions for the conditional mean and covariance matrix of the state X given the observations Y . 4. Let m = n = 1 in (5.8.1) and (5.8.2). The notation in Section 5.8 and Section 5.9 is used here. Let t be the process defined as t t = xsp ds, p = 1, 2, . . . . 0

Write E[t I(t ∈dx) | Yt ] = µt (x)dx. Show that at time t, the density µt (x) is completely described by the p + 3 statistics st (0), st (1), . . . , st ( p), t , and m t as follows: p st (i) q(x, t), µt (x) = i=1

216

Kalman filtering

where s0 (i) = 0, i = 1, . . . , p, and dst ( p) = − p(At + t−1 Bt2 )st ( p) + 1, dt dst ( p − 1) = −( p − 1)(At + t−1 Bt2 )st ( p − 1) + pst ( p) t−1 Bt2 m t , dt dst (i) 1 = −i(At + t−1 Bt2 )st (i) + (i + 1)(i + 2)st (i + 2) dt 2 + (i + 1)st (i + 1) t−1 Bt2 ,

5. 6. 7. 8. 9. 10. 11.

dst (0) = Bt2 st (2) + t−1 st (1)m t . dt Give a detailed proof of Lemma 5.7.1. Prove (5.7.5), (5.7.6), (5.7.7) and (5.7.3). Finish the proof of Theorem 5.7.5. Give the proof of Theorem 5.7.6. Prove (5.7.39). Establish (5.7.52). Give the proof of Theorem 5.11.4.

i = 1, . . . , p − 2,

6

Financial applications

6.1 Volatility estimation Suppose a price S evolves in discrete time, k = 0, 1, . . . , with dynamics Sk+1 = Sk eµ−

2 σk+1 2

+σk+1 bk+1

.

Here {bk } is a sequence of i.i.d. normal random variables with mean 0 and variance 1 (N (0, 1)) and σk+1 represents the volatility of the price change between times k and k + 1. E[Sk+1 | Sk ] = Sk eµ . The price sequence S0 , S1 , . . . is observed as are the logarithmic increments yk+1 = log

σ2 Sk+1 = µ − k+1 + σk+1 bk+1 . Sk 2

Let us suppose that log σk has dynamics log σk+1 = a + b log σk + θ w k+1 . Here again {w k } is a sequence of i.i.d. N (0, 1) random variables. Writing xk = log σk , so that σk = exk , we see xk+1 = a + bxk + θw k+1 , e2xk yk = µ − + e2xk bk . 2 Now assume that under the reference probability measure P both {xk } and {yk } are sequences of i.i.d. N (0, 1) random variables. Write Gk = σ {x0 , . . . , xk , y0 , . . . , yk−1 }, and denoting by φ(.) the N (0, 1) probability density function φ(θ −1 (xk − a − bxk−1 )) φ(e λk = θ φ(xk )

for k = 1, 2, . . . .

−xk

1 (yk − µ + e2xk )) 2 , exk φ(yk )

(6.1.1)

218

Financial applications

Set 1 φ(e−x0 (y0 − µ + e2x0 )) 2 λ0 = , ex0 φ(y0 ) n

n =

λk .

k=0

dP = n . dP Gn We can then show that under P, {w k }, {bk }, k = 0, 1, . . . are sequences of i.i.d. N (0, 1) random variables, where

Define a new probability measure P (the “real world” probability), by setting

w k = θ −1 (xk − a − bxk−1 ), 1 bk = e−xk (yk − µ + e2xk ). 2 From Bayes’ Theorem 4.1.1, for any Borel measurable function f , E[ f (xk ) | Yk ] =

E[k f (xk ) | Yk ] E[k | Yk ]

.

The numerator defines a measure; suppose it has a density qk (.) so that ∞ E[k f (xk ) | Yk ] = f (z)qk (z)dz,

(6.1.2)

−∞

and we have the recursion Theorem 6.1.1 qk (z) = (z, y)

∞ −∞

φ(θ −1 (z − a − bx))qk−1 (x)dx.

1 e−z φ(e−z (yk − µ + e2z )) 2 Here (z, y) = . θ φ(yk ) This gives the formula for updating the unnormalized conditional density of xk = log σk given Yk . Putting f (x) ≡ 1 in (6.1.2) we see ∞ E[k | Yk ] = qk (z)dz, (6.1.3) −∞

so that the normalized conditional density of xk = log σk given Yk is pk (z) =

qk (z)

∞

−∞

qk (x)dx

.

6.1 Volatility estimation

Furthermore, taking f (xk ) = xk we see

219

∞

E[xk | Yk ] = −∞ ∞ −∞

zqk (z)dz . qk (z)dz

This is the optimal estimate of the logarithm of the volatility given the observations of the price. Calibration Suppose H , F, G are integrable functions. Consider

Sn =

n

H (yk )F(xk )G(xk−1 ).

(6.1.4)

k=1

We wish to estimate E[Sk | Yk ]. Consider an associate measure and suppose there is a density L k (z) such that ∞ E[k Sk f (xk ) | Yk ] = f (z)L k (z)dz,

(6.1.5)

−∞

for any integrable function f . We can derive the following formula for updating L k . Theorem 6.1.2

L k (z) = (z, y)

∞ −∞

+ H (yk )F(z)

φ(θ −1 (z − a − bx))L k−1 (x)dx

∞ −∞

φ(θ −1 (z − a − bx))G(x)qk−1 (x)dx ,

1 e−z φ(e−z (yk − µ + e2z )) 2 where (z, y) = . θ φ(yk ) Proof

Using (6.1.1) and (6.1.4), ∞ E[k Sk f (xk ) | Yk ] = f (z)L k (z)dz −∞

= E[k−1 Sk−1 f (xk )

φ(θ −1 (xk − a − bxk−1 )) θφ(xk )

1 φ(e−xk (yk − µ + e2xk )) 2 × | Yk ] exk φ(yk ) + E[k−1 f (xk )H (yk )F(yk )G(xk−1 ) 1 φ(e−xk (yk − µ + e2xk )) 2 × | Yk ] exk φ(yk )

φ(θ −1 (xk − a − bxk−1 )) θφ(xk )

220

Financial applications

=

∞

−∞

+

∞

−∞

∞

−∞

φ(θ −1 (z − a − bx)) f (z)(z, y)L k−1 (x)dxdz ∞

−∞

φ(θ −1 (z − a − bx))(z, y)

× H (yk )F(z)G(x) f (z)qk−1 (x)dxdz. This equality holds for all integrable f and the result follows. Corollary 6.1.3 Taking f (z) ≡ 1 in (6.1.5) we see ∞ E[k Sk | Yk ] = L k (z)dz.

(6.1.6)

−∞

Further, from Bayes’ Theorem 4.1.1, E[Sk | Yk ] =

E[k Sk | Yk ]

=

E[k | Yk ]

∞

−∞ ∞ −∞

1. For sk1 =

L k (z)dz . qk (z)dz

Special cases

k i=1

xi a measure γk1 is defined by E[k Sk1 f (xk ) | Yk ] =

∞ −∞

f (z)γk1 (z)dz.

(6.1.7)

This is updated by the formula γk1 (z) = (z, y) +z

∞ −∞

∞

−∞

φ(θ

1 φ(θ −1 (z − a − bx))γk−1 (x)dx

−1

(z − a − bx))G(x)qk−1 (x)dx .

Then E[Sk1 | Yk ] =

∞

−∞ ∞ −∞

2. For sk2 =

k i=1

γk1 (z)dz . qk (z)dz

xi−1 the corresponding measure γk2 is updated by

γk2 (z)

= (z, y) +

∞ −∞

∞

xφ(θ −∞

2 φ(θ −1 (z − a − bx))γk−1 (x)dx

−1

(z − a − bx))G(x)qk−1 (x)dx .

6.2 Parameter estimation

3. For Jk =

k

221

xi xi−1 the corresponding measure βk1 is updated by

∞ 1 βk1 (z) = (z, y) φ(θ −1 (z − a − bx))βk−1 (x)dx

i=1

+z

−∞

∞

−∞

xφ(θ −1 (z − a − bx))G(x)qk−1 (x)dx .

4. Similar formulae, which are all special cases of the expression for L k (z), are obtained for updating the measures: βk2 (z) associated with

k

2 xi−1 ,

i=1

βk3 (z) associated with

k

xi2 ,

i=1

νk1 (z) associated with

k

yi e−2xi ,

i=1

νk2 (z) associated with

k

e−2xi .

i=1

In all cases the conditional expectation of the sum, given the observations, is obtained by normalizing the integral of the associated measure. For example, ∞

νk1 (z)dz k −∞ −2xi E yi e | Yk = ∞ . i=1 qk (z)dz −∞

6.2 Parameter estimation Estimates of the sums above can be used to apply to the EM algorithm. Parameters in our model can be re-estimated recursively and, further, one parameter at a time can be updated. For example, suppose after some iteration a parameter set (a, b, θ, µ) is obtained and we wish to re-estimate the parameter b, given the observations y1 , y2 , . . . , yk . ˆ This is Consider a change of measure which replaces parameter b in our model by b. given by a Radon–Nikodym derivative ˆ

bk =

k ˆ i−1 )) φ(θ −1 (xi − a − bx , φ(θ −1 (xi − a − bxi−1 )) i=1

dP b ˆ = bk . dP b Gk ˆ The maximizing step determines the conditional expectation of log bk given the observations. That is, consider

k 1 −1 2 bˆ ˆ i−1 ) ) + R(b) | Yk , E[log k | Yk ] = E − (θ (xi − a − bx 2 i=1 ˆ

and setting

where R(b) does not involve b.

222

Financial applications

The first order condition gives the maximum value of bˆ as: k k xi xi−1 − a i=1 xi−1 | Yk ] E[ i=1 bˆk = k 2 E[ i=1 xi−1 | Yk ] ∞ (βk1 (z) − aγk2 (z))dz = −∞ ∞ . βk2 (z)dz −∞

Similar arguments gives estimates k k 1 xi − b xi−1 | Yk E k i=1 i=1 ∞ (γk1 (z) − bγk2 (z))dz = −∞ ∞ . k qk (z)dz

aˆ k =

−∞

k 1 E (xi − a − bxi−1 )2 | Yk 2k i=1 ∞ F(z)dz a −∞ = − , ∞ 2 2k qk (z)dz

(θˆk )2 =

−∞

k k yi e2xi | Yk +E 2 i=1 ∞ ∞ t qk (z)dz + νk1 (z)dz 2 −∞ −∞ ∞ = . νk1 (z)dz

µˆ k =

−∞

Here F(z) = βk3 (z) + bβk2 (z) − 2bβk1 (z) − 2aγk1 (z) + 2abγk2 (z). 6.3 Filtering a price process Suppose in discrete time a price S has the form Sk+1 = Sk eYk+1 , where Yk+1 = ck + σk bk+1 . Here {b } is a sequence of i.i.d. normal random variables with mean 0 and variance 1 (N (0, 1)). Suppose (ck , σk ) takes values in a finite set B = {(ci , σi ) : 1 ≤ i ≤ N }. Write c = (c1 , c2 , . . . , c N ) ,

σ = (σ1 , σ2 , . . . , σ N ) ,

6.4 Parameter estimation

223

and suppose that (ck , σk ) evolves as a Markov chain with state space B. We can identify B with S = {e1 , e2 , . . . , e N }, where, as before, ei = (0, . . . , 1, . . . , 0) ∈ IR N . Suppose φ : B → S gives this bijection, so that for each i, 1 ≤ i ≤ N , φ(ci , σi ) = ei . Write X k = φ((ck , σk )) (where k now denotes the time parameter). Then ck = c, X k , and σk = σ, X k . We suppose X is a Markov chain on (, F, P) with state space S and transition matrix A. The state space S could be quite small and X could represent the state of the economy as “good”, “bad”, or “average”. Of course X is not observed directly. Instead we observe logarithmic increments of the price process: Yk+1 = log

Sk+1 = ck + σk bk+1 = c, X k + σ, X k bk+1 . Sk

The N (0, 1) random variable b models a purely random noise in the dynamics. The Markov chain X also models some random behavior, but hopefully random behavior with some structure. The results of the previous sections can now be applied. For any price process {Sk }, k = 1, 2, . . . , the steps are 1. calculate the sequence of logarithmic increments Yk+1 = log

Sk+1 = ck + σk bk+1 = c, X k + σ, X k bk+1 , Sk

2. choose “appropriate” prior values for {(ci , σi ) : 1 ≤ i ≤ N } and for the transition probabilities a ji = P(X k+1 = e j | X k = ei ) ≥ 0,

with Nj=1 a ji = 1, 3. after n values of Y have been observed, calculate new estimates for c, σ and the a, 4. use these values, iteratively, to re-estimate the c, σ and the a. The EM algorithm implies the estimates improve monotonically, in the sense that the expected log-likelihood increases with each re-estimation. Consequently, the model is ‘self-tuning’. This step is repeated until some stopping criterion is satisfied. 6.4 Parameter estimation for a modified Kalman filter This application considers a slightly modified linear Gaussian model. Consider the following model for the spot price of oil S: dSt = (µ − δt )St dt + σ1 St dz 1 (t).

(6.4.1)

224

Financial applications

Here z 1 is a standard Brownian motion and δt represents the “convenience yield”. (This models the value of holding amounts of the commodity.) In fact it is supposed that δ follows similar stochastic dynamics of the form dδt = κ(α − δt )dt + σ2 dz 2 (t).

(6.4.2)

Here z 2 is a second standard Brownian motion with z 1 (t), z 2 (t) = ρt. It is convenient to consider the logarithm of the stock price, X t = loge St . Then X satisfies 1 dX t = κ(µ − δt − σ12 )dt + σ1 dz 1 (t). 2

(6.4.3)

If r is the risk-free interest rate (taken to be constant here) and λ is the market price of convenience yield risk (also assumed constant), S and δ follow similar processes under an equivalent martingale measure. However, it is equations (6.4.2) and (6.4.3) which we discretize to give dynamics for the state vector (X t , δt ) as: (X t , δt ) = ct + Q t (X t−1 , δt−1 ) + ηt .

(6.4.4)

1 Here ct = ((µ − σ12 )t, καt) ∈ IR2 and 2 1 −t Qt = , a 2 × 2 matrix. 0 1 − κt The future price for oil for delivery at time T ≥ 0 is given by:

(1 − e−κ T ) F(S, δ, T ) = S exp −δ + A(T ) , κ where 1 σ22 1 (1 − e−κ T ) σ1 σ 2 ρ A(T ) = r − α + T + σ22 − 2 2κ κ 4 κ3 σ 2 (1 − e−κ T ) + ακ + σ1 σ2 ρ − 2 . κ κ2 Here S is the spot price today, T = 0 and δ is the value of the convenience yield today, T = 0. Consequently, loge F(S, δ, T ) = loge S − δ

(1 − e−κ T ) + A(T ). κ

(6.4.5)

6.4 Parameter estimation

225

It is these future prices, for various dates T , which are given in the market. That is, for different dates T1 , T2 , . . . , TN we have observations yt1 = loge F(S, δ, T 1 ), .. . ytN = loge F(S, δ, T N ). It is supposed these observations give the right hand side of (6.4.5) plus some “noise” term εt ∈ IR N , where εt = (εt1 , . . . , εtN ) is a sequence for t = 0, 1, . . . of independent Gaussian random variables with E[εt ] = 0 ∈ IR N and Var εt = E[εt εt ] = H ∈ IR N ×N . The observation equation (6.4.5), plus εt noise on the right side, therefore has the form: yt = dt + Z t (X t , δt ) + εt , for t = 1, 2, . . . , T , where

(6.4.6)

   loge F(S, δ, T 1 ) yt1     .. yt =  ...  =   . 

ytN

loge F(S, δ, T N )

are the future prices at time t for delivery at times t + T 1 , . . . , t + T N .   1   1, −κ −1 (1 − eκ T ) A(T1 ) 2    1, −κ −1 (1 − eκ T )   ..  N  . dt =  .  ∈ IR , Z t =  ..  .   A(TN ) N 1, −κ −1 (1 − eκ T ) The model, in summary, has dynamics (6.4.4) for the “signal” (X t , δt ), (X t , δt ) = c + Q(X t−1 , δt−1 ) + ηt ,

(6.4.7)

and dynamics (6.4.6) for the observations, yt = (yt1 , . . . , ytN ), yt = dt + Z t (X t , δt ) + εt .

(6.4.8)

Note that, in spite of Schwartz’s notation, c, Q, d and Z do not depend on t. They do include t, the time increment of fixed size. Equations (6.4.7) and (6.4.8) are of the form where the classical Kalman filter can be applied. This considers linear dynamics for the signal X t = (X t1 , . . . , X tm ) ∈ IRm , t = 0, 1, . . . , X t+1 = A¯ + AX t + Bw t+1 ,

A ∈ IRm×m ,

(6.4.9)

and observations yt = C¯ + C X t + Dvt ,

t = 0, 1, . . ..

(6.4.10)

¯ this model is slightly different Note that, because of the inclusion of the terms A¯ and C, from that considered previously.

226

Financial applications

One observes yt , t = 0, 1, . . . , T, . . . , and wishes to make the best estimate of X t . This is the quantity Xˆ t = E[X t | y0 , y1 , . . . , yt ]. In fact Xˆ t is also a Gaussian random variable with conditional mean µt = Xˆ t = E[X t | y0 , y1 , . . . , yt ], and variance Rt = E[(X t − µt )(X t − µt ) | y0 , y1 , . . . , yt ]. In fact, the formulae are better written in terms of the one-step predictions: µk|k−1 = E[xk | y0 , y1 , . . . , yk−1 ] = A¯ + Aµk−1 , and Rk|k−1 = E[(X k − µk|k−1 )(X k − µk|k−1 ) | y0 , y1 , . . . , yk−1 ]. Then Rk|k−1 = B 2 + A Rk−1 A .

Kalman filter The (modified) Kalman filter then gives recursive updates: µk+1 = A¯ + Aµk + Rk+1|k C (C Rk+1|k C + D D )−1 × (yk+1 − C¯ − C A¯ − C Aµk ), Rk+1 = Rk+1|k − Rk+1|k C (C Rk+1|k C + D D )−1 C Rk+1|k . As stated, µk = Xˆ k = E[X t | y0 , y1 , . . . , yk ] is the conditional mean, or best estimate, of X k given y0 , y1 , . . . , yk . Similarly, Rk = E[(X k − µk )(X k − µk ) | y0 , y1 , . . . , yk ].

Parameter estimation ¯ A, B, C, ¯ C, D is However, to implement the Kalman filter knowledge of the parameters A, required. Our algorithms, when modified for these “affine” dynamics, provide optimal ways of estimating these parameters. i j(M) i j(M) i j(M) In fact, consider the following recursions for ak ∈ IR, bk ∈ IRm , dk ∈ IRm×m in ¯ in (a symmetric matrix with elements dk ( p, q), p = 1, . . . , m, q = 1, . . . , m), a¯ k , bk , u ik , vki ,

6.4 Parameter estimation −1 u¯ ik , v¯ki . M = 0, 1, 2, 1 ≤ i, j ≤ m, 1 ≤ n ≤ d, σk−1 = Rk−1 − Rk−1 A Rk|k−1 A Rk−1 . i j(M)

ak

i j(M)

= ak

i j(M)

+ bk

i j(M) −1 −1 −1 σk+1 Rk µk + Tr dk σk+1

−1 + µk Rk−1 σk+1 dk

i j(M)

i j(M)

−1 −1 σk+1 Rk µk − bk

−1 + A¯ Rk+1|k A R k dk

i j(M)

−1 A¯ Rk A Rk+1|k

−1 − 2A Rk+1|k A R k dk

i j(M)

i j(M)

a0

−1 A¯ Rk A Rk+1|k

−1 −1 σk+1 R k µk ,

= 0 ∈ IR,

i j(0) i j(0) i j(0) −1 −1 i j(0) −1 −1 bk+1 = Rk+1|k A¯ , A Rk bk + 2dk σk+1 Rk µk − 2dk Rk A Rk+1|k i j(0)

b0

= 0 ∈ IRm ,

−1 dk+1 = Rk+1|k A R k dk i j(0)

i j(0)

d0

i j(0)

=

ei ej + e j ei 2

1 −1 Rk A Rk+1|k + (ei ej + e j ei ), 2

∈ IRm×m ,

i j(1) i j(1) i j(1) −1 −1 i j(1) −1 −1 bk+1 = Rk+1|k A¯ A Rk bk + 2dk σk+1 Rk µk − 2dk Rk A Rk+1|k −1 −1 −1 ¯ + ei ej σk+1 A, Rk µk − ei ej Rk A Rk+1|k i j(1)

b0

= 0 ∈ IRm ,

−1 dk+1 = Rk+1|k A R k dk i j(1)

i j(1)

d0

i j(2)

i j(1)

1 −1 −1 −1 Rk A Rk+1|k + (ei ej Rk A Rk+1|k + Rk+1|k A Rk e j ei ), 2

= 0 ∈ IRm×m ,

i j(2) −1 −1 i j(2) −1 i j(2) −1 −1 + bk σk+1 Rk µk + Tr dk σk+1 Rk µk + µk Rk−1 dk σk+1 −1 −1 −1 −1 + Tr ei e j σk+1 + µk Rk−1 σk+1 (ei ej )σk+1 R k µk i j(2)

ak+1 = ak

−1 −1 A¯ + A¯ Rk+1|k A Rk (ei ej )Rk A Rk+1|k i j(2)

− bk

i j(2) −1 −1 −1 A¯ + A¯ Rk+1|k A¯ Rk A Rk+1|k A Rk dk Rk A Rk+1|k

i j(2) −1 −1 −1 − 2 A¯ Rk+1|k A Rk dk σk+1 R k µk −1 −1 −1 − A¯ Rk+1|k A Rk (ei ej + e j ei )σk+1 R k µk , i j(2)

a0

= 0 ∈ IR,

i j(2) i j(2) i j(2) −1 −1 −1 A Rk bk + 2dk σk+1 R k µk bk+1 = Rk+1|k i j(2) − 2d Rk A R −1 A¯ k

+

k+1|k

−1 Rk+1|k A Rk (ei ej

−1 −1 + e j ei )σk+1 R k µk

−1 −1 ¯ − Rk+1|k A Rk (ei ej + e j ei )Rk A Rk+1|k A,

227

228

Financial applications i j(2)

b0

i j(2) dk+1 i j(2)

d0

= 0 ∈ IRm , =

−1 Rk+1|k A Rk

i j(2) dk

ei ej + e j ei

+

2

−1 , Rk A Rk+1|k

= 0 ∈ IRm×m ,

−1 −1 −1 in ¯ a¯ k+1 = a¯ kin + b¯kin σk+1 Rk µk − b¯kin Rk A Rk+1|k A,

a¯ 0in = 0, −1 in b¯k+1 = Rk+1|k A Rk b¯kin + ei ej yk+1 ,

b¯0in = ei y0 , en ,

−1 −1 −1 ¯ u ik+1 = u ik + vki σk+1 A, Rk µk − vki Rk A Rk+1|k

u i0 = 0, −1 i vk+1 = Rk+1|k A Rk vki + ei ,

v0i = ei ∈ IRm ,

−1 −1 u¯ ik+1 = u¯ ik + (v¯ki + ei )σk+1 R k µk −1 ¯ A, − (v¯ki + ei )Rk A Rk+1|k

−1 i v¯k+1 = Rk+1|k A Rk (v¯ki + ei ),

u¯ i0 = 0 ∈ IR,

v¯0i = 0 ∈ IRm .

where Tr[.] denotes the trace of a matrix (which is the sum of the diagonal elements). Write Hk(0) =

k

=0

Hk(2) =

k

=1

Lk =

k

x x ,

Hk(1) =

=0

x −1 x −1 ,

x ,

k

Jk =

x x ,

k

=0

L¯ k =

=0

k

x y ,

x −1 ,

=1

Hˆ k(0) = E[Hk(0) | Yk ],

Hˆ k(1) = E[Hk(1) | Yk ],

etc. Then for M = 0, 1, 2: i j(M)

E[Hk

E[Jkin

i j(M)

| Yk ] = a k | Yk ] =

a¯ kin(M)

i j(M)

+ bk +

i j(M)

µk + Tr[dk

Rk ] + µk dk

i j(M)

µk ,

b¯kin µk ,

E[L ik | Yk ] = u ik + vki µk ,

E[ L¯ ik | Yk ] = u¯ ik + v¯ki µk . These equations give recursive finite dimensional filters for estimating the matrices and vectors Hk(M) , M = 0, 1, 2, Jk , L k and L¯ k given the observations y0 , y1 , . . . , yk .

6.5 Estimating the implicit interest rate

229

¯ C¯ are then (given y0 , y1 , . . . , yk ): The revised estimates for the parameters A, B, C, D, A, 1 A¯ k = ( Lˆ k − A Lˆ¯ k ), k

A¯ k =

k 1 y − C Lˆ k ), ( k + 1 =0

Ak = ( Hˆ k(1) − A¯ Lˆ¯ k )( Hˆ k(2) )−1 ,

Ck = ( Jˆk − C¯ Lˆ k )( Hˆ k(0) )−1 , B2 =

1 ˆ (0) { Hk − (A Hˆ k(1) + Hˆ k(1) A ) + A Hˆ k(2) A k − ( A¯ Lˆ k + Lˆ k A¯ ) + ( A¯ Lˆ¯ k A + A Lˆ¯ k A¯ ) + k A¯ A¯ },

k k 1 (0) (D D ) = y y − ( Jˆk C + C Jˆk ) + C Hˆ k C − C¯ y

k + 1 =0

=0 k ¯ ¯ ¯ ˆ ˆ ¯ ¯ − y C + (C L k C + C L k C ) + (k + 1)C C .

=0

Given observations y0 , y1 , . . . , yk , the parameters are initialized and the above algorithms run to re-estimate the parameters one at a time. With the same y0 , y1 , . . . , yk this process is iterated until some stopping rule is satisfied.

6.5 Estimating the implicit interest rate of a risky asset In this section a risky asset is considered whose price at time t is supposed described by an equation of the form dSt = St (ρt dt + σ dBt ),

t ≥ 0.

(6.5.1)

Here the drift coefficient ρt is the underlying interest rate of the risky asset, B is a standard Brownian motion and the integrals are taken to be Itˆo integrals. This model is used frequently, and often the coefficients ρ and σ are supposed to be constant. Various forms and methods of estimating the volatility of diffusion coefficient σ can be found in the literature; see for example [33]. We shall suppose σ is constant and determined by one of these techniques. We shall suppose that the implicit interest rate ρt behaves like a Markov chain with state space {r1 , . . . , r N }. r will denote the (column) vector (r1 , . . . , r N ) . Suppose S0 = S. Then from (6.5.1), t 1 2 St = S exp (ρu − σ )du + σ Bt . 2 0 Write Yt = ln St − ln S. Then

Yt = 0

t

1 (ρu − σ 2 )du + σ Bt . 2

230

Financial applications

Now ρt − 12 σ 2 takes values in the set {r1 − 12 σ 2 , . . . , r N − 12 σ 2 }. Write gi = ri − 12 σ 2 and g for the (column) vector (g1 , . . . , g N ) . Without loss of generality, we shall consider a Markov chain on S = {e1 , . . . , e N } (see Example 2.6.17). Here, for 0 ≤ i ≤ N , ei = (0, . . . , 1, . . . , 0) is the i-th unit (column) vector in IR N . If X t ∈ S denotes the state of this Markov chain at time t ≥ 0, then the corresponding value of ρt is X t , r , where ., . denotes the inner product in IR N . A natural process to take as the observation process is Yt , which can be written t Yt = X u , g du + σ Bt . (6.5.2) 0

Write Ft for the right-continuous, complete filtration generated by σ {X r , Yr : r ≤ t}, and Yt for the right-continuous, complete filtration σ {Yr : r ≤ t}, generated by the observation process. We have the following semimartingale representation result (see Lemma 2.6.18): t Xt = X0 + Ar X r dr + Vt . (6.5.3) 0

Filtering We model the above dynamics by supposing that initially we have an “ideal” probability space (, F, P) such that under P 1. X is a Markov chain with representation (6.5.3), 2. σ −1 Y is a standard Brownian motion, independent of X . Define

t

t = exp

X u , g σ

−1

0

1 dYs − 2

t

X u , g σ 2

−2

ds ,

0

which is also given by

t

t = 1 +

s X u , g σ −1 dYs .

(6.5.4)

0

To see this apply the Itˆo rule to the function log t . Then t is an Ft martingale and E[t ] = 1. A new probability measure P can be defined by setting Define the process Bt by the formula dBt = σ −1 (Yt − X t , g dt),

dP = t . dP Ft

B0 = 0.

Then Girsanov’s theorem 4.3.3 implies that {Bt } is a standard Brownian motion process under P. Therefore, under P, dYt = X t , g dt + σ dBt .

(6.5.5)

Note that under P, the process {X t } still satisfies (6.5.3). Consequently, under P the processes {X t } and {Yt } satisfy the real world dynamics (6.5.3) and (6.5.2). However, P is a

6.5 Estimating the implicit interest rate

231

more convenient measure with which to work. Using a version of Bayes’ Theorem (4.1.1), E[X t | Yt ] =

E[t X t | Yt ] . E[t | Yt ]

Write σ (X t ) = E[t X t | Yt ]. (6.5.6) N Note that E[t | Yt ] = i=1 σ (X t , ei ) = σ ( i=1 X t , ei ) = σ (1). More simply, E[t | Yt ] = σ (X t ), 1 , where 1 is an N -dimensional vector with all entries equal to 1. In view of (6.5.4) and (6.5.3) and using the Itˆo product rule (3.7.15), t t t X t = X 0 + s AX s ds + s X s , g X s σ −1 dYs N

0

= X0 +

t

0 t

s AX s ds +

0

s G X s σ −1 dYs .

(6.5.7)

0

Here G is the diagonal matrix whose entries are g1 , . . . , g N . Conditioning both sides of (6.5.7) on Yt and using the fact that Yt has independent increments under P (it is Wiener) (see [15] Lemma 3.2 p. 261), we have the following finite dimensional filter for σ (X t ): t t σ (X t ) = σ (X 0 ) + Aσ (X s )ds + Gσ (X s )σ −2 dYs . (6.5.8) 0

0

Note that σ (ρt ) = σ (X t ), r . For s ≤ t the smoother for X s , ei is defined as E[X s , ei | Yt ], with unnormalized form under P

E[X s , ei t | Yt ] = σt (X s , ei ). However, it is more convenient to work with σt (X s , ei X t ) (see [10] Chapter 8 for more details) and we have t σt (X s , ei X t ) = σs (X s , ei X s ) + Aσu (X s , ei X u )du 0

+

t

Gσu (X s , ei X u )σ −2 dYu .

0

This is a finite dimensional filter for σt (X s , ei X t ). Consequently, σt (X s , ei ) = σt (X s , ei X t ), 1 and σt (X s , ei ) E[X s , ei | Yt ] = N . j=1 σ (X s , e j ) Revising the parameters In addition to the volatility σ the parameters introduced in the above model are the values ri , 1 ≤ i ≤ N , of the implicit interest rate, and the entries ai j , 1 ≤ i, j ≤ N , of the Q-matrix

232

Financial applications

A. Recall gi = ri − 12 σ 2 . Using the expectation maximization (EM) algorithm, it is shown in [10] Chapter 8 that the revised estimates are given by ij

aˆ ji =

σ (Nt ) , σ (Jti )

gˆi =

σ (G it ) . σ (Jti )

ij

Here Nt is the number of jumps of X from ei to e j in the time interval [0, 1]. Jti = t 0 X u , ei du is the amount of time X spends in state ei in the time interval [0, t] and t t t X u , ei σ −2 dYu = gi X u , ei σ −2 du + X u , ei σ −2 dBu , G it = 0

0

0

the unnormalized estimates are given by the following linear equations: t t ij ij σ (Nt X t ) = Aσ (Ns X s )ds + σ (X s ), ei a ji e j ds 0

+ 0

σ (Jti X t ) =

t 0

t

0

t 0

Gσ (Nsi j X s )σ −2 dYs ,

Aσ (Jsi X s )ds +

+ σ (G it X t ) =

0

t

t

σ (X s ), ei ei ds

0

Gσ (Jsi X s )σ −2 dYs ,

Aσ (G is X s )ds + gi

+ 0

t

(6.5.9)

t

(6.5.10) σ (X s ), ei ei σ −2 ds

0

(Gσ (G is X s ) + σ (X s ), ei ei )σ −2 dYs .

(6.5.11)

In each case we have ij

ij

σ (Nt ) = σt (Nt X t ), 1 , σ (Jti ) = σt (Jti X t ), 1 , σ (G it ) = σt (G it X t ), 1 . Numerical methods Here we describe numerical approximations to (6.5.8), (6.5.9), (6.5.10), and (6.5.11). Write qt = σ (X t ). Then (6.5.8) is t t qt = σ (X 0 ) + Aqs ds + Gqs σ −2 dYs . 0

0

Suppose h = nt . For 0 ≤ k < n a first approximation gives q(k+1)h = qkh + Aqkh .h + Gqkh σ −2 (Y(k+1)h − Ykh ).

6.5 Estimating the implicit interest rate

233

However, this neglects terms which do not converge to 0 when h → 0. To capture these terms Milshtein [27] noted one should substitute qkh + Aqkh .s + Gqkh σ −2 (Ys − Ykh ) for qs in the expression q(k+1)h = qkh +

(k+1)h

Aqs ds +

kh

(k+1)h

Gqs σ −2 dYs .

kh

Neglecting terms which converge to 0 when h → 0, the Milshtein approximation then is q(k+1)h = qkh + Aqkh .h + Gqkh σ −2 (Y(k+1)h − Ykh ) 1 2 G qkh σ −4 [(Y(k+1)h − Ykh )2 − σ 2 h]. 2

+

A full discussion of the Milshtein scheme, and other more sophisticated schemes, can be ij ij found in [22]. Write n t = σ (Nt X t ). Then (6.5.9) becomes t t ij nt = An is j ds + qs , ei a ji e j ds 0

0

t

+ 0

Gn is j σ −2 dYs .

The Milshtein form in this case is: n (k+1)h = qkh , ei a ji e j h + [I + Ah + qkh .h + G(Y(k+1)h − Ykh )σ −2 ij

+

1 2 ij G [(Y(k+1)h − Ykh )2 − σ 2 h]σ −4 ]n kh . 2

Similarly, writing τti = σ (Jti X t ) and discretizing (6.5.10), i τ(k+1)h = qkh , ei ei h + [I + Ah + qkh .h + G(Y(k+1)h − Ykh )σ −2

+

1 2 i G [(Y(k+1)h − Ykh )2 − σ 2 h]σ −4 ]τkh . 2

Finally, with γti = σ (G it X t ) discretizing (6.5.11), i γ(k+1)h = gi qkh , ei ei σ −2 h + [I + Ah + qkh .h

+ G(Y(k+1)h − Ykh )σ −2 1 i + G 2 [(Y(k+1)h − Ykh )2 − σ 2 h]σ −4 ]γkh 2 + σ −2 qkh , ei ei (Y(k+1)h − Ykh ) 1 + σ −4 Gqkh , ei [(Y(k+1)h − Ykh )2 − σ 2 h]ei . 2

(6.5.12)

234

Financial applications

New estimates for the parameters a ji and gi , based on the observations of the price up to time t = nh, are, therefore, ij

aˆ ji =

n t , 1 , τti , 1

ij

aˆ ji =

n t , 1 . τti , 1

Using the smoothed versions of (6.5.9), (6.5.10) and (6.5.11) (see [10] Chapter 8), and possibly additional data, a second revised estimate for these parameters can be obtained. Iterating this procedure provides a monotonic, increasing sequence of probability densities, so, in terms of maximizing the expectation, the models are improving with each step and the estimation methods are self-tuning.

7

A genetics model

7.1 Introduction Consider a population of N independent individuals. At each time k ∈ {0, 1, 2, . . . } each individual can be in one of n states. The total number, N , of individuals in the population remains constant in time. However, the distribution of the N individuals among the n states changes. We suppose that initially all random variables are defined on a probability space (, F, P). For 1 ≤ i, j ≤ n, p ji is the probability that an individual in the population will jump from state i at time k − 1 to state j at time k. That is, we suppose each individual in the population behaves like an independent time-homogeneous Markov chain with transition matrix P = ( p ji ). Note nj=1 p ji = 1. Write p j = ( p1 j , p2 j , . . . , pn j ) for the j-th column of P. Write (N ) for the set of all partitions of N into n summands; that is, z ∈ (N ) if z = (z 1 , z 2 , . . . , z n ), where each z i is a nonnegative integer and z 1 + z 2 + · · · + z n = N . Write X (k) = (X 1 (k), X 2 (k), . . . , X n (k)) ∈ (N ) for the distribution of the population at time k. It is easily checked that E[X (k) | X (k − 1)] = P X (k − 1).

(7.1.1)

However, the population is sampled by withdrawing (with replacement), at each time k, M individuals from the population and observing to which state they belong. That is, at each time k a sample Y (k) = (Y1 (k), Y2 (k), . . . , Yn (k)) ∈ (M) is obtained, where (M) is the set of partitions of M. Clearly this sequence of samples, Y (0), Y (1), Y (2), . . . enables us to revise our estimates of the state X (k). 7.2 Recursive estimates For α = (α1 , α2 , . . . , αn ) ∈ R n and s = (s1 , s2 , . . . , sn ) ∈ (N ),

236

A genetics model

write F(α, s) =

n

p j , αs j ,

j=1

where , denotes the scalar product in R n . For r = (r1 , r2 , . . . , rn ) ∈ (N ) write pr s = P(X (k) = r | X (k − 1) = s). Then pr s is the coefficient of α1r1 α2r2 . . . αnrn in F(α, s). That is, ∂N F(α, s). (7.2.1) . . . ∂αnrn M For y = (y1 , y2 , . . . , yn ) ∈ (M) write for the multinomial coefficient y1 y2 . . . yn M! . This is just the number of ways of selecting y1 objects from M into state 1, y1 !y2 ! . . . yn ! y2 into state 2 and so on. Then, under the original probability measure P, r yn r1 y1 r2 y2 M n P(Y (k) = y | X (k) = r ) = ... . y1 y2 . . . yn N N N pr s = (r1 !r2 ! . . . rn !)

∂α1r1 ∂α2r2

Write Gk for the complete σ -field generated by X (0), X (1), . . . , X (k) and Y (0), Y (1), Y (2), . . . , Y (k − 1). Yk will denote the complete σ -field generated by Y (0), Y (1), Y (2), . . . , Y (k). We wish to introduce a new probability measure P under which the probability of withdrawing an element in any one of the n states is just 1/n. For this define factors M 1 X n (k) −Yn (k) X 1 (k) −Y1 (k) X 2 (k) −Y2 (k) γk (Y (k)) = ... , n N N N and write k =

k

γk .

=0

A new probability measure can be defined by putting

dP = k . dP Gk

Lemma 7.2.1 For y ∈ (M), r ∈ (N ), P(Y (k) = y | Gk ) =

M y1 y2 . . . yn

M 1 . n

Proof P(Y (k) = y | Gk ) = E[I (Y (k) = y) | Gk ] and by a version of Bayes’ Theorem (4.1.1), this is =

E[k I (Y (k) = y) | Gk ] . E[k | Gk ]

7.2 Recursive estimates

237

Now γk is the only factor of k not Gk -measurable, so this is =

E[γk I (Y (k) = y) | Gk ] . E[γk | Gk ]

The denominator E[γk | Gk ] equals M 1 X 1 (k) −Y1 (k) X 2 (k) −Y2 (k) X n (k) −Yn (k) E ... | Gk , n N N N and the only variables not Gk -measurable are Y1 (k), . . . , Yn (k). Consequently, this conditional expectation is M 1 M = = 1. y1 . . . yn n y∈(M) The numerator is E[γk I (Y (k) = y) | Gk ] =

M y1 y2 . . . yn

M 1 . n

Consequently, P(Y (k) = y | Gk ) =

1 M n y1 y2 . . . yn M

[3 pt] = P(Y (k) = y). That is, under P the n states are i.i.d. with probability 1/n.

Remark 7.2.2 Under P, P(X (k) = r | X (k − 1) = s) is still pr s given by (7.2.1). However, as we saw in Lemma 7.2.1, P(Y (k) = y | Gk ) = P(Y (k) = y | X (k) = r ) M 1 M = P(Y (k) = y) = . n y1 y2 . . . yn To return from P to P the inverse density must be introduced. That is, with −M X 1 (k) Y1 (k) X 2 (k) Y2 (k) 1 X n (k) Yn (k) −1 γ k = γk = ... , n N N N k = −1 k =

k =0

γ ,

the probability P can be defined by putting

dP = k . dP Gk

238

A genetics model

If {φk } is a {Gk } adapted process then Bayes’ Theorem (4.1.1) implies E[φk | Yk ] =

E[k φk | Yk ] E[k | Yk ]

.

E[k φk | Yk ] is, therefore, an unnormalized conditional expectation of φk given Yk . The denominator E[k | Yk ] is a normalizing factor. For r ∈ (N ) write qr (k) = E[k I (X (k) = r ) | Yk ]. Note that r ∈(N ) I (X (k) = r ) = 1 so that r ∈(N ) qr (k) = E[k | Yk ]. We then have the following recursion. Theorem 7.2.3 If Y (k) = (Y1 (k), Y2 (k), . . . , Yn (k)) = (y1 , y2 , . . . , yn ) ∈ (N ), r yn

r y1 r y2 1 1 n qr (k) = n −M ... pr s qs (k − 1). N N N s∈(N ) (Note we take 00 = 1.) Proof qr (k) = E[k I (X (k) = r ) | Yk ] = E[k I (X (k) = r ) | Yk−1 , Y (k) = (y1 , y2 , . . . , yn )] = E[k−1 γ k I (X (k) = r ) | Yk−1 , Y (k) = (y1 , y2 , . . . , yn )] r y1 r y2 r yn 1 2 n = n −M ... E[ k−1 I (X (k) = r ) N N N

× I (X (k − 1) = s) | Yk−1 ] s∈(N )

= n −M

r y1 r y2 1

2

N

N

×

...

r yn n

N

E[ k−1

(X (k − 1) = s) pr s | Yk−1 ]

s∈(N )

= n −M

r y1 r y2 1

2

N

N

...

r yn

n N

pr s qs (k − 1).

s∈(N )

Remarks 7.2.4 P(X (k) = r | Yk ) = E[I (X (k) = r ) | Yk ] =

qr (k) . s∈(N ) qs (k)

To obtain the expected value of X (k) given the observations Yk we consider the vector of

7.3 Approximate formulae

values r = (r1 , r2 , . . . , rn ) for any r ∈ (N ). Then E[X (k) | Yk ] =

r ∈(N )

qr (k) · r

s∈(N )

qr (k)

239

.

Unfortunately this does not have the simple form of (7.1.1). Also note that the transition probabilities pr s can be re-estimated using the techniques described in Chapter 2 of [10].

7.3 Approximate formulae Unfortunately the recursion for qr (k) given by Theorem 7.2.3 is not easily evaluated. One approximation would be to use a smaller value N for N in the summation. To obtain nontrivial partitions of N into n summands, N should be greater than n. Substitution of the observed Y (0), Y (1), Y (2), . . . then would give a sequence of approximate distributions. Alternatively, one could replace the martingale “noise” in the dynamics of X (k) by Gaussian noise ([23]). To describe this, first suppose the n states of the individuals in the population are identified with the unit (column) vectors e1 , . . . , en , ei = (0, . . . , 1, 0, . . . , 0) of R n . Let X i (k) ∈ {e1 , . . . , en } denote the state of the i-th individual at time k. Then for each i, 1 ≤ i ≤ N , X i (k) behaves like a Markov chain on (, F, P) with transition matrix P. Consequently, X i (k) = P X i (k − 1) + M i (k),

(7.3.1)

where E[M i (k) | Gk−1 ] = E[M i (k) | X i (k − 1)] = 0. Write p(0) = ( p1 (0), . . . , pn (0)) = E[X i (0)]. Then from (7.3.1) E[X i (k)] = p(k) = P k p(0). For (column) vectors x, y ∈ R n write x ⊗ y = x y for their Kronecker, or tensor, product, and diag x for the matrix with x on the diagonal. Then, because X i (k) is one of the unit vectors e1 , . . . , en , X i (k) ⊗ X i (k) = diag X i (k) = P diag X i (k − 1)P + M i (k) ⊗ (P X i (k − 1)) + (P X i (k − 1)) ⊗ M i (k) + M i (k) ⊗ M i (k) = diag P X i (k − 1) + diag M i (k). Taking the expectation, we have E[M i (k) ⊗ M i (k)] = diag P p(k − 1) − P diag p(k − 1)P = Q(k), say. For i = j the processes X i and X j are independent.

240

A genetics model

Define N X (k) =

X i (k) , N

i=1

N M(k) =

M i (k) . N

i=1

The (vector) process X (k) describes the actual distribution of the population at time k. Its components sum to unity and X (k) = P X (k − 1) + M(k).

(7.3.2)

Also, by independence, E[M(k) ⊗ M(k)] is also equal to the matrix Q(k). The suggestion made in [23] is to replace the martingale increments M(k) in (7.3.2) by independent (vector) Gaussian random variables W (k) of mean 0 and covariance Q(k). Write φk (w) for the normal density on R n of mean 0 and covariance Q(k). That is, suppose the signal process X (k), taking values in R n , has dynamics X (k) = P X (k − 1) + W (k). For y = (y1 , y2 , . . . , yn ) ∈ (M) and x = (x1 , x2 , . . . , xn ) ∈ R n , x = 0, define ρ(x, y) = |x|−M |x1 | y1 |x2 | y2 . . . |xn | yn ; set ρ(0, y) = 0 for y ∈ (M). The observation process still gives rise to Y (0), Y (1), . . . , Y (k) ∈ (M) and for y ∈ (M), x ∈ R n we suppose M ρ(x, y). P(Y (k) = y | X (k) = x) = y1 y2 . . . yn Starting with the probability P, now define γ k = n −M ρ(X (k), Y (k)), and k = dP Again P can be defined in terms of P by setting = k . Gk dP Suppose f : R n → R is any measurable “test” function. Consider E[ f (X (k)) | Yk ] =

E[k f (X (k)) | Yk ] E[k | Yk ]

.

Suppose there is an unnormalized conditional density qk (x) such that E[k f (X (k)) | Yk ] = f (x)qk (x)dx. Rn

The next result gives a recursion for qk which is the analog of Theorem 7.2.3. Theorem 7.3.1 qk (z) = n

−M

ρ(z, y)

Rn

φk (z − P x)qk−1 (x)ds.

k

=0

γ .

7.3 Approximate formulae

Proof

E[k f (X (k)) | Yk ] =

f (z)qk (z)dz Rn

= n −M E[k−1 ρ(X (k), Y (k)) f (X (k)) | Yk ] = n −M E[k−1 ρ(P X (k − 1) + W (k), y) × f (P X (k − 1) + W (k)) | Yk−1 , Y (k) = y] = n −M E[k−1 ρ(P X (k − 1) + W (k), y) × f (P X (k − 1) + W (k)) | Yk−1 ] = n −M E[ k−1 ρ(P X (k − 1) + W (k), y) × f (P x + w)φk (w)qk−1 (x) | Yk−1 ] −M =n ρ(z, y) f (z)φk (z − P x)qk−1 (x)dzdx. As this identity holds for all such f the result follows.

241

8

Hidden populations

8.1 Introduction An important problem in statistical ecology is how to determine the size of an animal population. A large number of techniques for providing an answer are available (see [35]) but the best-known one is the capture–recapture method. A random sample of individuals is captured, tagged or marked in some way, and then released back into the population. After allowing time for the marked and unmarked to mix sufficiently, a second simple random sample is taken and the marked ones are observed. At epoch write N for the population size, n for the number of marked and released individuals, n k = k=1 n for the total number of captured and marked individuals up to time k, M for the sample size, n for the number of available marked individuals for sampling and y for the number of captured (or recaptured) marked individuals. We are interested in estimating the size N at time of the population. All random variables are defined initially on a probability space (, F, P). All the filtrations defined here will be assumed to be complete. Write Gk = σ (N , n , y , M ≤ k), and Yk = σ (y ≤ k). We assume here that 1. The population sizes Nk follow the dynamics: Nk = Nk−1 + σ (Nk−1 )vk ,

(8.1.1)

N0 has distribution π0 and vk is a sequence of independent random variables with densities φk . 2. The n k are random variables with conditional binomial distributions with parameters pk = p( n k , y1 , . . . , yk , θ ) and n k . For example, θ n1 p1 = = θ, n1 n1 n1 θ n 2 + θ 2 θ n 2 + θ 2 = ,..., n2 n1 + n2 k n i θ k−i+1 n k−1 nk pk = i=1 = θ pk−1 + θ . (8.1.2) nk nk nk 0 < θ ≤ 1 is a parameter assumed to be known or it is to be estimated. The powers of θ express our belief that as time goes by early marked individuals are becoming less and less available for recapture due to various causes including deaths, emigration, etc. p2 =

8.2 Distribution estimation

243

If the number of captured and marked individuals n is kept constant (8.1.2) takes the form: pk (θ ) =

k−1 θk pk−1 + . k k

(8.1.3)

3. The observed random variable yk is assumed to have a conditional binomial distribution, n k Mk −m Mk n k m P(yk = m | Gk − {yk }) = 1− , (8.1.4) m Nk Nk Mk ! Mk where = . m m!(Mk − m)!

8.2 Distribution estimation Define λ0 = 1. For ≥ 1 and for suitable density functions ψ write −y σ (N−1 )ψ (N ) 1 n y −M −n − n +n n λ = p (1 − p ) , 1 − n φ (v ) N N 2 M + and k = k=0 λ

(8.2.1)

Lemma 8.2.1 The process k is a G-martingale. Proof

E[ k | Gk−1 ] = k−1 E[λk | Gk−1 ]. It remains to show that E[λk | Gk−1 ] = 1. σ (Nk−1 )ψk (Nk ) 1 n k −yk E[λk | Gk−1 ] = E[ n k Nk φk (vk ) 2 Mk + n k yk −Mk −n k × 1− pk (1 − pk )−n k +n k | Gk−1 ] Nk n −yk 1 σ (Nk−1 )ψk (Nk ) −n k k = E[ pk (1 − pk )−n k +n k E[ φk (vk ) Nk 2 Mk +n k n k yk −Mk × 1− | Gk−1 , Nk , n k , Mk ] | Gk−1 ]. Nk

The inner expectation equals 2 Mk , so that σ (Nk−1 )ψk (Nk ) −n k pk (1 − pk )−n k +n k | Gk−1 ] φk (vk )

nk nk 1 σ (Nk−1 )ψk (Nk ) = E[ | Gk−1 ] φk (vk ) 2n k i i=0 σ (Nk−1 )ψk (Nk−1 + σ vk ) = E[ | Gk−1 ] φk (vk ) σ (Nk−1 )ψk (Nk−1 + σ v) = φk (v)dv = ψk (u)du = 1. φk (v)

E[λk | Gk−1 ] =

1

nk 2

E[

244

Hidden populations

A new probability measure P can be defined by setting that:

dP = k . The point here is dP Gk

Lemma 8.2.2 Under the new probability measure P, Nk , n k and yk are three sequences of independent random variables which are independent of each other. Further, Nk has density ψk , n k has distribution bin( n k , 1/2) and yk has distribution bin(Mk , 1/2). Proof

For any “test” functions f , g and h, using Bayes’ Theorem 4.1.1, E[ f (Nk )g(n k )h(yk ) | Gk−1 ] =

E[ f (Nk )g(n k )h(yk ) k | Gk−1 ] , E[ k | Gk−1 ]

which equals E[ f (Nk )g(n k )h(yk )λk | Gk−1 ] σ (Nk−1 )ψk (Nk ) 1 −n k ˜ = E[ f (Nk )g(n k ) p (1 − pk )−n k +n k ˜k k n φk (vk ) 2 n k yk −Mk 1 n k −yk 1− × E[h(yk ) M | Gk−1 , Nk , n k , Mk ] | Gk−1 ]. 2 k Nk Nk After cancellation the inner expectation equals

Mk Mk 1 h(m) , m 2 Mk m=0 which shows that yk has distribution bin(Mk , 1/2) independent of N and n. Similarly, E[ f (Nk )g(n k )

σ (Nk−1 )ψk (Nk ) 1 −n k ˜ p (1 − pk )−n k +n k | Gk−1 ] φk (vk ) 2n˜ k k

= E[g(n k )]E[ f (Nk−1 + vk ) = E[g(n k )]

σ (Nk−1 )ψk (Nk−1 + σ vk ) | Gk−1 ] φk (vk )

f (Nk−1 + σ v)σ (Nk−1 )ψk (Nk−1 + σ v)dv

= E[g(n k )]

f (u)ψk (u)du

= E[g(n k )]E[ f (Nk )]. That is, under P the three processes are independent sequences of random variables with the desired distributions. Using this fact we derive a recursive equation for the unnormalized conditional distribution of Nk given Yk . For any “test” function f consider f (z)qk (z)dz E[ f (Nk ) −1 k | Yk ] E[ f (Nk ) | Yk ] = . (8.2.2) = E[ −1 E[ −1 k | Yk ] k | Yk ] The denominator of (8.2.2) being a normalizing factor, we focus only on the numerator.

8.2 Distribution estimation

245

In view of Lemma 8.2.2, −1 −1 E[ f (Nk ) −1 k | Yk ] = E[ f (Nk ) k−1 λk | Yk ]

= 2n k +Mk E

nk

f (z)

z

i=0

1

× pki (1 − pk )n k −i dz

= 2 Mk

nk

z − N k−1 φ k i Mk −yk σ (Nk−1 ) 1− ψk (z) z σ (Nk−1 )ψk (z)

i yk

nk

nk 2

i

i yk

f (z)

i=0

n k −i

× pki (1 − pk )

1−

z nk i

−1 k−1 | Yk

i Mk −yk z

φk

z − u σ (u) σ (u)

qk−1 (u)dzdu.

Comparing this last expression with the right hand side of (8.2.2) we have: Theorem 8.2.3 The unnormalized conditional probability density function of the hidden Markov model given by (8.1.1), (8.1.2), and (8.1.4) follow the recursions qk (z) =

nk

B(yk , z, i)

k (z, u)qk−1 (u)du.

(8.2.3)

i=0

i yk

i Mk −yk 1− z

Here Bk (y, z, i) = 2 z z − u φk σ (u) . (Note we take 00 = 1.) σ (u) Mk

nk i

pki (1 − pk )n k −i

and

k (z, u) =

Remarks 8.2.4 1. The normalized conditional density of Nk is given by

qk (z)

.

qk (u)du 2. The initial (normalized) probability density of N0 , prior to sampling, is π0 (.), so q0 (z) = π0 (z). Using the notation in Theorem 8.2.3, q1 (z) =

n1

B1 (y, z, i)

1 (z, u)π0 (u)du,

(8.2.4)

i=0

and further estimates follow from (8.2.3). 3. If the distribution of N0 is a delta function concentrated at some number A, (8.2.4) becomes q1 (z) = 1 (z, A)

n1

B1 (y, z, i).

(8.2.5)

i=0

246

Hidden populations

8.3 Parameter estimation Our model is function of the parameter pk , the proportion of the accessible marked individuals at epoch k. Suppose pk has dynamics given by (8.1.2). We also assume that θ will take values in some measurable space (, β, γ ). We now derive a recursive joint conditional unnormalized distribution for Nk and θ . We keep working under the probability measure P. Lemma 8.3.1 Write qk (z, θ)dzdθ = E[I (Nk ∈ dz, θ ∈ dθ) −1 k | Yk ]. Then qk (z, θ) =

nk

Bk (y, z, i, θ )

k (z, u)qk−1 (u, θ) du.

(8.3.1)

i=0

i yk n k −i i Mk −yk nk i 1− Here Bk (y, z, i, θ ) = 2 , pk (θ ) 1 − pk (θ) i z z z − u φk σ (u)

k (z, u) = . σ (u) Mk

Proof

Let f , g be integrable test functions. E[ f (Nk )g(θ ) −1 | Y ] = f (z)g(v)qk (z, v) dzdγ (v). k k

Using the independence assumption under P the left hand side of (8.3.2) is −1 = E[ f (Nk )g(θ ) −1 k−1 λk | Yk ] nk

φk

i yk

z − N

k−1

i Mk −yk σ (Nk−1 ) ψk (z) z z σ (N k−1 )ψk (z) i=0

n k −i nk i × pk (v) 1 − pk (v) dzdγ (v) −1 k−1 | Yk ] i z − u φ n k k i yk i Mk −yk σ (u) = 2 Mk f (z)g(v) 1− z z σ (u) i=0

nk × pk (v)i 1 − pk (v))n k −i qk−1 (u, v)dzdudγ (v). i

= 2 Mk E[

f (z)g(v)

1−

Comparing this last expression with the right hand side of (8.3.2) gives (8.3.1). If at time 1, θ has density h(θ ), then q1 (z, θ) =

nk

B1 (y, z, i, θ )h(θ)

i=0

and further updates are given by Lemma 8.3.1.

1 (z, u)π0 (u)du,

and

(8.3.2)

8.4 Pathwise estimation

247

If no dynamics enter the population size and Nk has density φk (.) independent of N , < k, the recursion in Lemma 8.3.1 simplifies to: qk (z, θ) = φk (z)qk−1 (z, θ)

nk

Bk (y, z, i, θ).

(8.3.3)

i=0

Maximum posterior estimators Quantity (8.2.4) (or 8.2.5) is a function of the unknown population size and could be 1 , which is the maximum posterior maximized with respect to z, yielding a critical value N estimate of N at epoch 1 given y1 . Similar maximizations at later times will provide MAP estimators for the population size at these times. 8.4 Pathwise estimation We now derive a recursive equation, which does not involve any integration, for the unnormalized density of the whole path up to epoch k.

Write qk (z 0 , . . . , z k )dz 0 . . . dz k = E[I (z 0 ∈ dz 0 ) . . . I (z k ∈ dz k ) −1 k | Yk ]. Theorem 8.4.1 Using the notation in Theorem 8.2.3, qk (z 0 , . . . , z k ) =

nk

Bk (y, z, i) k (z k−1 , z k )qk−1 (z 0 , . . . , z k−1 ).

(8.4.1)

i=0

Let f 0 , . . . , f k be “test functions”. Then E[ f 0 (N0 ) . . . f k (Nk ) −1 | Y ] = f 0 (z 0 ) . . . f k (z k )qk (z 0 , . . . , z k )dz 0 . . . dz k , k k

Proof

(8.4.2)

and −1 −1 E[ f 0 (N0 ) . . . f k (Nk ) −1 k | Yk ] = E[ f 0 (N0 ) . . . f k (Nk ) k−1 λk | Yk ] n yk n Mk −yk = 2 Mk E[ f 0 (N0 ) . . . f k (Nk−1 ) −1 f 1 − (z ) k k k−1 zk zk z − N k k−1 φk σ (Nk−1 ) × dz k | Yk ] σ (Nk−1 ) n yk n Mk −yk Mk =2 f 0 (z 0 ) . . . f k (z k ) 1− zk zk z − z k k−1 φk σ (z k−1 ) qk−1 (z 0 , . . . , z k−1 )dz 0 . . . dz k . × σ (z k−1 )

Comparing the last expression with (8.4.2) yields at once (8.4.1). n 1 Again we have q0 (z) = π0 (z), q1 (z 0 , z 1 ) = 1 (z 0 , z 1 )π0 (z 0 ) i=0 B1 (y, z, i, ), and further estimates follow from (8.4.1). However, no integration is needed in subsequent recursions.

248

Hidden populations

Maximum posterior estimators Expression (8.4.1) is a function of the path (z 0 , . . . , z k ) and could be maximized yielding a 0 , . . . , N k ). Since no integration is involved here one could substitute, at time critical path ( N 0 , . . . , N k−1 and then maximize qk ( N 0 , . . . , N k−1 , z k ) k say, the sequence of critical values N with respect to the variable z k to obtain an estimate for Nk . 8.5 A Markov chain model Suppose that on probability space (, F, P) are given three sequences of independent random variables, Nk , n k , and yk . For k ∈ IN, Nk is uniformly distributed over some finite set S = {s1 , . . . , s L } ⊂ IN − {0}, n k has a binomial distribution with parameters ( n k , 1/2) and yk has a binomial distribution with parameters (Mk , 1/2), where Mk ∈ IN − {0} is given. We wish to define a new probability measure P such that yk has a binomial distribution with parameters (Mk , n k /Nk ), Nk is a Markov chain with state space S and stochastic matrix C = {ci j } = P[Nk+1 = si | Nk = s j ], n k are random variables with conditional binomial distributions with parameters ( pk , n k ). Define the G-predictable sequences αi = Lj=1 I (N−1 = s j )ci j , for i = 1, . . . , L. In vector notation this is α (N−1 ) = C I (N−1 ), where I (N−1 ) = (I (N−1 = s1 ), . . . , I (N−1 = s L )). Now write L n y n M −y λ = 2 M +n pn (1 − p )n −n (Lαi ) I (N =si ) , (8.5.1) 1− N N i=1 k = k=0 λ . Lemma 8.5.1 The process k is a G-martingale. E[ k | Gk−1 ] = k−1 E[λk | Gk−1 ], so we must show that E[λk | Gk−1 ] = 1. n yk −Mk φk (Nk ) 1 n −yk 1 − E[λk | Gk−1 ] = E[ | Gk−1 ] φk (vk ) 2 Mk Nk Nk 1 φk (Nk ) n −yk n yk −Mk = M E[ | Gk−1 , Nk , Mk ] | Gk−1 ] 1− E[ 2 k φk (vk ) Nk Nk

Proof

Mk 1 φk (Nk ) Mk ! E[ | Gk−1 ] 2 Mk φk (vk ) m=0 m!(Mk − m)! φk (Nk−1 + v) = φk (v)dv = 1, φk (v)

=

and a new probability measure P can be defined by setting

dP = k . dP Gk

Lemma 8.5.2 Under the probability measure P the above processes obey the desired dynamics, i.e. Nk is a Markov chain with state space S and stochastic matrix C = {ci j }, yk and n k are random variables with conditional binomial distributions with parameters (Mk , n k /Nk ) and ( pk , n k ) respectively.

8.5 A Markov chain model

Proof

249

We give a proof only for the first statement regarding Nk . P[Nk = s j | Gk−1 ] = E[I (Nk = s j ) | Gk−1 ] =

E[I (Nk = s j ) k | Gk−1 ] E[ k | Gk−1 ]

= E[I (Nk = s j )λk | Gk−1 ]

= E[I (Nk = s j )2 Mk +n k pn k (1 − pk )n k −n k

n yk k

sj

n k Mk −yk j × 1− Lαk | Gk−1 ] sj

= Lαk 2 Mk +n k E[I (Nk = s j ) pn k (1 − pk )n k −n k j

n yk k

sj

n k Mk −yk × 1− | Gk−1 ] sj

Mk nk Mk n m n Mk −m 1 j Mk + nk 1 = αk L2 1− L n=0 m=0 m si si 2 Mk

nk 1 pkn (1 − pk )n k −n × n 2n k j

= αk = P[Nk = s j | Nk−1 ].

Working under the probability measure P, we derive recursive equations for the unnormalized conditional probability distribution of Nk . Write P[Nk = si | Yk ] = E[I (Nk = si ) | Yk ] =

E[I (Nk = si ) k | Yk ] E[ k | Yk ]

,

and qksi = E[I (Nk = si ) k | Yk ]. Theorem 8.5.3 qksi =

nk

Bk (y, si , n)

n=0

Here Bk (y, si , n) = 2

Mk

n yk si

L

s

j ci j qk−1 .

(8.5.2)

j=1

n Mk −yk 1− si

nk n

pkn (1 − pk )n k −n .

If at time 0, q0 = π = (π1 , . . . , π L ), q1si =

n1 n=0

B1 (y, si , n)

L

ci j π j .

j=1

(8.5.3)

250

Hidden populations

If N0 = sα with probability 1, q1si = ciα

n1

B1 (y, si , n),

(8.5.4)

n=0

and further updates are given by (8.5.2). MAP estimators of N1 , . . . , Nk are provided by 1 = argmax{q s1 , q s2 , . . . , q sL }, . . . , N 1 1 1 k = argmax{q s1 , q s2 , . . . , q sL }. N k k k

8.6 Recursive parameter estimation The previous model is function of the parameters pk and C = ci j . Let pk = pk (θ1 ) and C = C(θ2 ) = ci j (θ2 ) and θ = (θ1 , θ2 ). Suppose θ belongs to some measurable space (, β, γ ). Working again under the probability measure P, write qksi (θ )dθ = E[I (Nk = si )I (θ ∈ dθ) k | Yk ].

(8.6.1)

Lemma 8.6.1 qksi (θ ) =

nk

Bk (y, si , n, θ )

n=0

Here Bk (y, si , n, θ ) = 2

Mk

n yk si

L

s

j ci j (θ )qk−1 (θ).

(8.6.2)

j=1

n Mk −yk 1− si

nk n

pkn (θ)(1 − pk (θ ))n k −n .

If θ1 has density h(.) and θ2 has density g(.), q1si (θ ) =

n1 n=0

B1 (y, si , n, θ )

L

ci j (θ2 )g(θ2 )π j .

(8.6.3)

j=1

8.7 A tags loss model In this section we propose a model where the marks or tags are not permanent. In this situation the marking is done using double tagging where each individual is marked with two tags. For simplicity we assume that the two tags on each individual are nondistinguishable and that individuals retain or lose their tags independently. We start again with a probability space (, F, P) on which are given two sequences of independent random variables Nk and yk . For k ∈ IN, Nk is uniformly distributed over some finite set S = {s1 , . . . , s L } ⊂ IN − {0}, and yk has a trinomial distribution with parameters (Mk , 1/3), where Mk ∈ IN − {0} is given. At any epoch each individual in the population is in any of three states, namely unmarked, marked with only one tag, and marked with two tags, which states we shall call 0, 1, 2 respectively. We suppose that each individual behaves like an independent time homogeneous Markov chain with transition matrix { pi j }.

8.7 A tags loss model

251

At each time the population size N is distributed or partitioned into three groups N (2), N (1), and N (0) = N − N (2) − N (1) among the three states, and we would like to define the set of all such partitions as the states of a three-dimensional Markov chain (N (0), N (1), N (2)). Recall that at each epoch , 0 ≤ N (2), N (1) ≤ n. Write p(i0 ,i1 ,i2 ),( j0 , j1 , j2 ) = P[(Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 ) | (Nk−1 (0), Nk−1 (1), Nk−1 (2)) = ( j0 , j1 , j2 )], and for any real numbers x0 , x1 , x2 define the function F(x0 , x1 , x2 , j0 , j1 , j2 ) = (

2

2 2 p0 x ) j0 ( p1 x ) j1 ( p2 x ) j2 .

=0

=0

=0

x0i0 x1i1 x2i2

Then p(i0 ,i1 ,i2 ),( j0 , j1 , j2 ) is the coefficient of in F(x0 , x1 , x2 , j0 , j1 , j2 ). We wish to define a new probability measure P such that yk hasa conditional trinomial distribution with parameters Mk , Nk (0)/Nk , Nk (1)/Nk , Nk (2)/Nk , Nk is a Markov chain with state space S and stochastic matrix C = {ci j }. The Markov chain (N (0), N (1), N (2)) is the same under both probability measures. Define again the G-predictable sequences αi =

L

I (N−1 = s j )ci j , i = 1, . . . , L .

j=1

Now write λ = 3 M

L N (0) y (0) N (1) y (1) N (2) y (2) (Lαi ) I (N =si ) , N N N i=1

k =

k

(8.7.1)

λ .

=0

The process k is a G-martingale and a new probability measure P can be defined by dP setting = k . It can be checked that under P the above processes have the desired dP Gk distributions. Working under the probability measure P, we derive recursive equations for the unnormalized conditional joint probability distribution of Nk and (Nk (0), Nk (1), Nk (2)). Write P[Nk = si , (Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 ) | Yk ] = E[I (Nk = si )I [(Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 )] | Yk ] =

E[I (Nk = si )I [(Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 )] k | Yk ] E[ k | Yk ]

,

and qk (si , i 1 , i 2 ) = E[I (Nk = si , Nk (2) = i 2 , Nk (1) = i 1 , Nk (0) = si − i 1 − i 2 ) k | Yk ]. It can be shown that qk (si , i 1 , i 2 ) is given by the following recursions.

252

Hidden populations

Theorem 8.7.1 qk (si , i 1 , i 2 ) = 3 Mk ×

L s − i − i yk (0) i yk (1) i yk (2) i 1 2 1 2 ci j si si si j=1 n k−1

p(si −i1 −i2 ,i1 ,i2 )(s j − j1 − j2 , j1 , j2 ) qk−1 (s j , j1 , j2 ).

(8.7.2)

j1 + j2 =0

The expected value of Nk given the observations Yk is given by L

i=1 si

nk

i +i 2 =0

E[Nk | Yk ] = 1 nk L i=1

i 1 +i 2 =0

qk (si , i 1 , i 2 )

qk (si , i 1 , i 2 )

.

Another way of looking at the problem is by considering only the subpopulation of tagged individuals in the definition of the Markov chain (Nk (0), Nk (1), Nk (2)). In this case the state space is the set of all the partitions of the totality of tagged individuals into three groups: the ones with two tags, the ones with one tag and the ones who lost both tags. Hence we write the total number of tagged individuals as nk = n = Nk (2) + Nk (1) + Nk (0). Note that, when sampling, we cannot observe directly members belonging to the group of individuals who lost their two tags as they become undistinguishable from the unmarked ones in the sample. Now we assume that under P the observation process is multinomial with parameters (Mk , 1/4) and under P it is (conditional) multinomial with parameters Mk , Nk (0)/Nk , Nk (1)/Nk , Nk (2)/Nk , Nk (u)/Nk . Here Nk (u) is the number of unmarked individuals in the population. Again note that Nk (u) is not Nk (0). Given yk (1), yk (2), the unobserved component yk (0), under the probability measure P, is binomial with parameters (Mk − yk (1) − yk (2), 1/2). Write P[Nk = si , (Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 ) | Yk ] = E[I (Nk = si )I ((Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 )) | Yk ] =

E[I (Nk = si )I ((Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 )) k | Yk ] E[ k | Yk ]

,

and qk (si , i 0 , i 1 , i 2 ) = E[I (Nk = si )I ((Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 )) k | Yk ] = E[I (Nk = si )I ((Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 ))4 Mk i yk (0) i yk (1) i yk (2) i yk (u) 0 1 2 u × si si si si ×L

L j=1

I (Nk−1 = s j )ci j k−1 | Yk ]

8.8 Gaussian noise approximation

253

i yk (1) i yk (2) i yk (0) 1 2 0 E si si si s − i − i − i Mk −yk (2)−yk (1)−yk (0)

= 4 Mk × ×

i

2

1

0

si L

I (Nk−1 = s j )ci j I ((Nk (0), Nk (1), Nk (2)) = (i 0 , i 1 , i 2 ))

j=1

×

I ((Nk−1 , Nk−1 (1), Nk−1 (2)) = ( j0 , j1 , j2 )) k−1 | Yk

j0 + j1 + j2 = n

i yk (1) i yk (2) Mk −y k (2)−yk (1) i 0 m 1 2 E si si si m=0 s − i − i − i Mk −yk (2)−yk (1)−m

= 4 Mk ×

i

× ×

2

1

0

si

L Mk − yk (2) − yk (1) 1 Mk −yk (2)−yk (1) I (Nk−1 = s j )ci j m 2 j=1 I ((Nk−1 , Nk−1 (1), Nk−1 (2)) = ( j0 , j1 , j2 ))

j0 + j1 + j2 = n

× p(i0 ,i1 ,i2 )( j0 , j1 , j2 ) k−1 | Yk . Using the definition of q we have: Theorem 8.7.2 qk (si , i 0 , i 1 , i 2 ) =

Mk −y k (2)−yk (1)

Bk (m, i 0 , i 1 , i 2 )

m=0

L

ci j

j=1

p(i0 ,i1 ,i2 )( j0 , j1 , j2 ) qk−1 (s j , j0 , j1 , j2 ).

j0 + j1 + j2 = n

Here Bk (m, i 0 , i 1 , i 2 ) = 2 Mk +yk (1)+yk (2) ×

i yk (1) i yk (2) i m 1

2

0

si

si

si

s − i − i − i Mk −yk (2)−yk (1)−m M − y (2) − y (1) i 2 1 0 k k k × . m si 8.8 Gaussian noise approximation

An approximate but simpler form of the recursion in Lemma 8.7.2 is to use a suggestion proposed by [23] where the martingale increment “noise” present in the representation of a Markov chain is replaced by Gaussian noise. To this effect, let’s identify, as it is explained in [10], the three states 0, 1, 2 with the standard unit (column) vectors e1 , e2 , e3 of IR3 . Write X kn ∈ {e1 , e2 , e3 } for the state of the n-th individual at time k, 1 ≤ n ≤ n . Then each

254

Hidden populations

individual behaves like a Markov chain on (, F, P) with transition matrix P. n 1 X kn . Then Define X k = n n=1 X k = P X k−1 + Mk ,

(8.8.1)

where Mk is a martingale increment. The suggestion made in [23] is to replace the martingale increment Mk in (8.8.1) by an independent Gaussian random variable vk of mean 0 and covariance matrix E[Mk Mk ] whose density is denoted by φk . That is, the signal process xk , taking values in IR3 , has dynamics xk = P xk−1 + vk .

(8.8.2)

We assume that under P the observation process is multinomial with parameters (Mk , 1/4), xk has density φk and Nk is uniformly distributed over the set (s1 , . . . , s L ). Under the “real world” probability measure P, Nk is a Markov chain with transition matrix C, xk has dynamics (8.8.2) and yk has conditional probability distribution given by P[yk = yk (2) + yk (1) + yk (0) + yk (u) | Mk , Nk = si , xk = (x0 , x1 , x2 ), Nk (u) = si − x2 − x1 − x0 ] x0 yk (0) x1 yk (1) x2 yk (2) Mk = yk (2), yk (1), yk (0), yk (u) si si si s − x − x − x Mk −yk (2)−yk (1)−yk (0) i 2 1 0 × . si P is defined in terms of P using the G–martingale k =

k k=0

×

4 Mk

x (0) yk (0) x (1) yk (1) x (2) yk (2) k k k Nk Nk Nk

L N − x (2) − x (1) − x (0) Mk −yk (2)−yk (1)−yk (0) k k k k (Lαki ) I (Nk =si ) . Nk i=1

The next theorem is the analog of Theorem 8.7.2. Theorem 8.8.1 The unnormalized joint conditional probability distribution of Nk and xk E[I (Nk = si )I (xk ∈ dx) k | Yk ] := qksi (x)dx is given recursively as follows: Mk −y L k (2)−yk (1) sj qksi (x) = Bk (m, x0 , x1 , x2 ) ci j φk (x − Pu)qk−1 (u)du, m=0

using the notation in Theorem 8.7.2.

j=1

References

[1] Arnold, L. Stochastic Differential Equations: Theory and Applications. John Wiley & Sons (1974). [2] Bensoussan, A. Stochastic Control of Partially Observed Systems, Cambridge University Press (1992). [3] Baum, L. E. and Petrie, T. Statistical inference for probabilistic functions of finite state markov chains, Ann. Inst. Statistical Mathematics 37 (1966) 1554–1563. [4] Billingsley, P. Probability and Measure. Third edn. Wiley Series in Probability and Mathematical Statistics (1995). [5] Br´emaud, P. Markov Chains, Gibbs Fields, Monte Carlo Simulation and Queues. Text in Applied Mathematics 31. Springer (1999). [6] Chung, K. L. and Williams, R. J. Introduction to Stochastic Integration. Second edn. Birkhauser (1990). [7] Chung, K. L. A Course in Probability Theory. Academic Press (1974). [8] Davis, M. H. A. Martingales of Wiener and Poisson processes. J. London Math. Soc. (2) 13 (1976) 336–338. [9] Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statistical Society, B 39 (1977) 1–38. [10] Elliott, R. J., Aggoun L., and Moore, J. B. Hidden Markov Models: Estimation and Control. Applications of Mathematics, Vol. 29. Springer-Verlag (1995). [11] Elliott, R. J. Stochastic Calculus and Applications. Applications of Mathematics Vol. 18. Springer-Verlag (1982). [12] Elliott, R. J. and Moore, J. B. A martingale Kronecker lemma and parameter estimation for linear systems. IEEE Trans. Auto. Control 43, No. 9 (1998). [13] Elliott, R. J. and Krishnamurthy, Vikrum. New finite dimensional filters for parameter estimation of discrete-time linear Gaussian models. IEEE Trans. Auto. Control 44 (1998) 938–951. [14] Elliott, R. J. and Krishnamurthy, Vikrum. Exact finite dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM J. Control Optim. 35, No. 6 (1997) 1908–1923. [15] Hajek, B. and Wong, E. Stochastic Processes in Engineering Systems. Springer-Verlag (1985). [16] Ikeda, N. and Watanabe, S. Stochastic Differential Equations and Diffusion Processes. Second edn. North-Holland Publishing Company (1989). [17] Itˆo, K. Stochastic integral. Proc. Imperial Acad. Tokyo 20 (1944) 519–524. [18] Jacod, J. Calcul Stochastique et Problemes de Martingales, Lecture Notes in Math. Vol. 714, Springer-Verlag (1979). [19] Jacod, J. and Shiryayev, A. N. Limit Theorems for Stochastic Processes. Springer (1987). [20] Jazwinski, A. H. Stochastic Processes and Filtering Theory. Academic Press (1970).

256

References

[21] Karatzas, Ioannis and Shreve, S. E. Brownian Motion and Stochastic Calculus. Graduate Texts in Mathematics, Vol. 113. Springer-Verlag (1988). [22] Kloeden, P. E. and Platen, E. Numerical Solution of Stochastic Differential Equations. SpringerVerlag (1992). [23] Krichagina, N. V., Lipster, R. S., and Rubinovich, E. Y. Kalman filter for Markov processes. In Steklov Seminar (1984). Eds. N. V. Krylov, R. S. Lipster and A. A. Novikov, Optimization Software Inc. (1985) 197–213. [24] Kunita, H. and Watanabe, S. On square integrable martingales, Nagoya Math. J. 30 (1967) 209–245. [25] Lipster, R. S. and Shiryayev, A. N. (1977). Statistics of Random Processes 2. Springer-Verlag (1977). [26] Lo`eve, M. Probability Theory I. Fourth edn. Springer-Verlag (1977). [27] Milshtein, G. N. Approximate integration of stochastic differential equations. Theory of Prob. Appl. 19 (1974) 562–577. [28] Meyer, P. A. Probability and Potentials. Blaisdell Publishing Company (1966). [29] Neveu, J. Discrete Parameter Martingales. North-Holland (1975). [30] Oksendal, B. Stochastic Differential Equations, an Introduction with Applications. Fourth edn. Universitext. Springer (1995). [31] Papoulis, A. Probability, Random Variables and Stochastic Processes. McGraw Hill (1984). [32] Revuz, Daniel and Yor, Marc. Continuous Martingales and Brownian Motion. Third edn. Grundlehren der mathematischen Wissenschaften, Vol. 293. Springer-Verlag (1999). [33] Rogers, L. C. G. and Satchell, S. E. Estimating variance from high, low and closing prices, Ann. Appl. Probability 1 (1991) 504–512. [34] Rogers, L. C. G. and Williams, David. Diffusions, Markov Processes, and Martingales. Vol. 1: Foundations. Second edn. John Wiley & Sons (1994). [35] Seber, G. A. F. The Estimation of Animal Abundance and Related Parameters. Second edn. Edward Arnold (1982). [36] Shiryayev, A. N. Probability Theory. Springer-Verlag (1984). [37] Wu, C. F. J. On the convergence properties of the EM algorithm, Ann. Statistics 11 (1983) 95–103.

Index

σ -algebra, 4 σ -field, 4 σ -finite measure, 20 absolutely continuous, 17, 24 adapted process, 43 algebra, 3 almost surely (a.s.), 19 atoms, 5, 14 Bayes’ Theorem, 131 Borel field, 4 Borel sets, 4 Bounded Convergence Theorem, 21 Brownian motion with drift, 72 calibration, 219 capture–recapture, 242 Cauchy–Schwarz inequality, 26 certain event, 3 change of variable formula in Lebesgue integral, 22 Chebyshev–Markov inequality, 26 class D, 64 class D supermartingale, 64 coarser field, 14 compensator, 57, 65, 124 complete filtration, 7 complete σ -field, 7 Conditional Bayes’ Theorem, 132 conditional independence, 11 conditional probability given G, 10 continuity of probability, 7 continuous almost surely, 42 continuous in L p , 42 continuous in probability, 42 convenience yield, 224 converges almost surely, 19 converges in distribution, 27 converges in probability, 27 correlation coefficient, 26 counting measure, 123 counting process, 65, 75

covariance, 26 cylinder set, 8, 18 direct product, 23 discrete time Itˆo formula, 83 Doob–Meyer special semimartingale representation, 65 EM algorithm, 177, 196 events, 3, 4 expected value of a random variable, 20 Fatou’s Lemma, 21 field, 3 filtered probability space, 7 filtration, 5 finite variation function, 62 first passage time, 49 Gaussian random variable, 70 hitting time, 46 homogeneous Markov chain, 57, 67 impossible event, 3 increasing process, 45, 64 independent events, 11 independent increments, 38 indicator function, 14 indistinguishable processes, 42 infinite variation function, 62 infinitely often (i.o.), 13 infinitesimal generator, 68 integer valued measure, 123 integrable random variable, 20 integrable variation, 62, 97, 98 integration by parts formula, 83 Itˆo formula for Brownian motion, 108 Itˆo formula for continuous bounded semimartingales, 98 Jensen’s Inequality, 51

258 Kolmogorov’s backward differential system, 68 Kolmogorov’s Existence Theorem, 45 Kolmogorov’s forward differential system, 68 L´evy system, 41 L´evy’s characterization of the Poisson process, 107 Lebesgue measure, 20 Lebesgue’s Dominated Convergence Theorem, 21 left continuity of a filtration, 6 likelihood ratio, 25 local martingale, 55, 61 local square integrable martingale, 61 localizing sequence, 61 locally bounded process, 96 locally uniformly integrable martingale, 96 Markov chain, 57, 67 Markov process, 50 Markov property, 50, 57 Markov time, 46 martingale, 50, 59 Martingale Convergence Theorem, 60 martingale transform, 56, 83 mean of a random variable, 20 measurable sets, 4 measurable space, 4 measurable stochastic process, 44 measure, 19 measure of jumps, 125 measure space, 19 modification process, 42 modified Kalman filter, 226 Monotone Convergence Theorem, 20 multinomial distribution, 37 natural filtration, 43 negligible events, 7 normalized random variable, 26 null events, 7 optional covariation, 80 optional process, 45 optional quadratic variation, 80, 85, 86 optional time, 46, 49 orthogonal martingales, 82, 86 passage time of Brownian motion, 76 point function, 123 point processes, 124 Poisson process, 75 Poisson process with drift, 161 Poisson random measure, 126 posterior probability, 9 power class, 4 predictable covariation, 80

Index predictable process, 45 predictable quadratic variation, 80, 85 probability density function, 17, 18 probability distribution function, 17 probability measure, 6 probability space, 7 probability transition matrix, 58 product σ -field, 23 product probability measure, 8 product rule, 118 progressively measurable process, 44 Radon–Nikodym derivative, 25 Radon–Nikodym Theorem, 24 random measure, 123 random variable, 14 regular martingale, 52 right-continuity of a filtration, 6 right-continuous process, 42 right-continuous with left limits process, 42 sample path continuity, 42 sample space, 3 semimartingale, 57, 63 semimartingale representation of a Markov chain, 58, 69 simple event, 3 simple function, 15, 92 special semimartingale, 63 square integrable process, 61, 80 standard deviation, 26 standard one-dimensional Brownian motion, 70 stationary increments, 38 stochastic basis, 7 stochastic exponential, 113 stochastic integral, 83 stochastic process, 38 stopping time, 46 Stratonovitch integral, 83 strong Markov property, 50 submartingale, 50, 59 supermartingale, 50, 59 time-change for martingales, 87 transition semigroup, 67 uniform integrability, 27 up-crossings theorem, 52 usual conditions, 7 variance, 26 variation of a function, 62 volatility, 219 volatility estimation, 217 weak solution, 149