An Introduction to Stochastic Filtering Theory
OXFORD GRADUATE TEXTS IN MATHEMATICS

Books in the series
1. Keith Hannabuss: An introduction to quantum theory
2. Reinhold Meise and Dietmar Vogt: Introduction to functional analysis
3. James G. Oxley: Matroid theory
4. N.J. Hitchin, G.B. Segal, and R.S. Ward: Integrable systems: twistors, loop groups, and Riemann surfaces
5. Wulf Rossmann: Lie groups: An introduction through linear groups
6. Qing Liu: Algebraic geometry and arithmetic curves
7. Martin R. Bridson and Simon M. Salamon (eds): Invitations to geometry and topology
8. Shmuel Kantorovitz: Introduction to modern analysis
9. Terry Lawson: Topology: A geometric approach
10. Meinolf Geck: An introduction to algebraic geometry and algebraic groups
11. Alastair Fletcher and Vladimir Markovic: Quasiconformal maps and Teichmüller theory
12. Dominic Joyce: Riemannian holonomy groups and calibrated geometry
13. Fernando Villegas: Experimental Number Theory
14. Péter Medvegyev: Stochastic Integration Theory
15. Martin Guest: From Quantum Cohomology to Integrable Systems
16. Alan Rendall: Partial Differential Equations in General Relativity
17. Yves Félix, John Oprea and Daniel Tanré: Algebraic Models in Geometry
18. Jie Xiong: An Introduction to Stochastic Filtering Theory
An Introduction to Stochastic Filtering Theory
Jie Xiong
Department of Mathematics
University of Tennessee
Knoxville, TN 37996-1300, USA
Great Clarendon Street, Oxford OX2 6DP Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries Published in the United States by Oxford University Press Inc., New York © Jie Xiong 2008 The moral rights of the author have been asserted Database right Oxford University Press (maker) First published 2008 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer British Library Cataloguing in Publication Data Data available Library of Congress Cataloging in Publication Data Data available Typeset by Newgen Imaging Systems (P) Ltd., Chennai, India Printed in Great Britain on acid-free paper by Biddles Ltd., King’s Lynn, Norfolk ISBN 978–0–19–921970–4 10 9 8 7 6 5 4 3 2 1
To Jingli, Jerry and Michael
Preface

The object of stochastic filtering is to use probability tools to estimate unobservable stochastic processes that arise in many applied fields including communication, target tracking, and mathematical finance.

Stochastic filtering theory has seen a rapid development in recent years. First, the (branching) particle-system representation of the optimal filter has been studied by many authors in order to seek more effective numerical approximations of the optimal filter. It turns out that such a representation can be utilized to prove the uniqueness of the solution to the filtering equation itself and hence broadens the scope of the tractable class of models. Secondly, the stability of the filter with "incorrect" initial state, as well as the long-time behavior of the optimal filter, has attracted the attention of many researchers. This direction of research has become extremely challenging after a gap in a widely cited paper was discovered. Finally, many problems in mathematical finance, for example the stochastic volatility model, lead to singular filtering models. More specifically, the magnitude of the observation noise may depend on the signal, which makes the optimal filter singular. Some progress in this direction has been made recently.

It is the belief of this author that the time is ripe for a new textbook to reflect these recent developments. The main theme of this book is to recapitulate these advances in a succinct and efficient manner. The book can serve as a text for mathematics as well as engineering graduate students and inspired undergraduate students. It can also serve as a reference for practitioners in various fields of application.

As noted, the aim of this book is to take the students to this exciting field of research through the shortest route possible. To achieve this goal, we completely avoid the chaos decomposition used in the classical filtering theory (e.g. Kallianpur [81]). The main approach of this book is based on the particle representation for stochastic partial differential equations developed by Kurtz and Xiong ([97-99]). The methods used here can be applied to more general stochastic partial differential equations. Therefore, the book also provides a bridge for readers who are interested in studying the general theory of stochastic partial differential equations. We should mention that the propagation of chaos decomposition and the multiple stochastic integral methods have provided another type of
numerical scheme in the approximation of the optimal filter. The advantage of this approach is that most of the computations are done "offline" in advance. This important development is not covered in this book because we want to limit the prerequisites for this book. We only assume that the reader has a basic knowledge of probability theory, for example, the material in the book of Billingsley [11].
Acknowledgements

The author hopes that the readers find this book enjoyable and informative as they seek to enter the research field of stochastic filtering. He had much assistance in making the book a reality and wishes to thank all of those who helped along the way. In particular, he would like to note that his friend and colleague, Balram Rajput at the University of Tennessee, has read the entire manuscript and has helped the author to correct many grammatical and presentational problems. Tom Kurtz from the University of Wisconsin, Wei Sun from Concordia University at Montreal, Ofer Zeitouni from the University of Minnesota and Yong Zeng from the University of Missouri at Kansas City have read the manuscript and made many constructive suggestions. Don Dawson from Carleton University and Leonid Mytnik from the Israel Institute of Technology have had discussions with this author and made very important observations that helped to improve this book substantially. Zhiqiang Li, his graduate student, has read the entire manuscript and asked questions that helped him to clarify many points from a student's viewpoint.

This book is based on a course taught by the author in the Fall semester of 2005 at the University of Tennessee. After the book was almost finished, the author also presented it in a short course at the Summer School in Beijing Normal University in 2007. The author wishes to thank the audience in both classes who raised many interesting questions. He also wishes to thank Zenghu Li for the invitation to visit Beijing Normal University and to give the noted course in the Summer School there.

As much of the material is based on the author's collaborative research in this field, he would like to thank his collaborators on stochastic filtering for this enjoyable co-operation and for allowing him to include the joint research in this book. These collaborators include Dan Crisan (Imperial College in London), Mike Kouritzin (the University of Alberta at Edmonton), Tom Kurtz (the University of Wisconsin at Madison), Wei Sun (Concordia University at Montreal), and Xunyu Zhou (the Chinese University of Hong Kong). He also would like to thank the National Security Agency for the support of his research during the last five years. Finally, the author would like to thank the staff of the Oxford University Press: Editor Alison Jones and her assistant Dewi Jackson, as well as her former assistant Jessica Churchman, for their co-operation and help.
Contents

1 Introduction
  1.1 Examples
  1.2 Basic definitions and the filtering equation
  1.3 An overview
2 Brownian motion and martingales
  2.1 Martingales
  2.2 Doob–Meyer decomposition
  2.3 Meyer's processes
  2.4 Brownian motion
3 Stochastic integrals and Itô's formula
  3.1 Predictable processes
  3.2 Stochastic integral
  3.3 Itô's formula
  3.4 Martingale representation in terms of Brownian motion
  3.5 Change of measures
  3.6 Stratonovich integral
4 Stochastic differential equations
  4.1 Basic definitions
  4.2 Existence and uniqueness of a solution
  4.3 Martingale problem
  4.4 A stochastic flow
  4.5 Markov property
5 Filtering model and Kallianpur–Striebel formula
  5.1 The filtering model
  5.2 The optimal filter
  5.3 Filtering equation
  5.4 Particle-system representation
  5.5 Notes
6 Uniqueness of the solution for Zakai's equation
  6.1 Hilbert space
  6.2 Transformation to a Hilbert space
  6.3 Some useful inequalities
  6.4 Uniqueness for Zakai's equation
  6.5 A duality representation
  6.6 Notes
7 Uniqueness of the solution for the filtering equation
  7.1 An interacting particle system
  7.2 The uniqueness of the system
  7.3 Uniqueness for the filtering equation
  7.4 Notes
8 Numerical methods
  8.1 Monte-Carlo method
  8.2 A branching particle system
  8.3 Convergence of $V_t^n$
  8.4 Convergence of $V^n$
  8.5 Notes
9 Linear filtering
  9.1 Gaussian system
  9.2 Kalman–Bucy filtering
  9.3 Discrete-time approximation of the Kalman–Bucy filtering
  9.4 Some basic facts for a related deterministic control problem
  9.5 Stability for Kalman–Bucy filtering
  9.6 Notes
10 Stability of non-linear filtering
  10.1 Markov property of the optimal filter
  10.2 Ergodicity of the optimal filter
  10.3 Finite memory property
  10.4 Asymptotic stability for non-linear filtering with compact state space
  10.5 Exchangeability of union intersection for σ-fields
  10.6 Notes
11 Singular filtering
  11.1 A special example
  11.2 A general singular filtering model
  11.3 Optimal filter with discrete support
  11.4 Optimal filter supported on manifolds
  11.5 Filtering model with Ornstein–Uhlenbeck noise
  11.6 Notes
Bibliography
List of Notations
Index
1 Introduction
In this chapter, we first give a few motivating examples for stochastic filtering. Then, we introduce some basic definitions and state the main equations that arise in non-linear filtering theory. Finally, we give an overview of the topics to be covered in this book.
1.1 Examples
In this section, we study four examples arising from different fields of application. The first example comes from wireless communication, which was in fact the main motivation for filtering theory in its early stage. The second example comes from mathematical finance, where the random factors affecting the stock prices are not completely observed; instead, only the stock prices themselves are observed. The selection of a portfolio must be based on the available information provided by the movement of the stock prices. The third example comes from the field of environmental protection. In this example, we estimate the distribution of the undesired chemicals in a river using the data obtained from a few observation stations along the river. Finally, in the last example, we study the filtering problem when the observation noise is given by an Ornstein–Uhlenbeck process, which is an approximation of white noise; white noise exists only in the sense of generalized functions.
1.1.1 Wireless communication
A signal process $X_t$ taking values in a space $S$ is to be transmitted to a receiver. Because of the random noise, this signal is not directly observable. Instead, a function $h(X_t)$ (taking values in $\mathbb{R}^m$) of this signal plus an $m$-dimensional white noise $n_t$ is observed. The original observation model is then
\[
y_t = h(X_t) + n_t, \qquad (1.1)
\]
where $y_t$ is called the observation process. Note that the white noise exists only in the sense of generalized functions; it is the derivative (again, in the sense of generalized functions) of a Brownian motion, which exists in the ordinary sense (we refer the interested reader to the book of Kuo [96] for an introduction to the white noise theory). Therefore, it is natural for us to consider the accumulated observation process
\[
Y_t = \int_0^t y_s\,ds
\]
as the source of our information. The observation equation is then written as
\[
Y_t = \int_0^t h(X_s)\,ds + W_t, \qquad (1.2)
\]
where $W_t$ is an $m$-dimensional Brownian motion. The aim of the filtering theory is to estimate the signal based on the observation $\sigma$-field
\[
\mathcal{G}_t \equiv \sigma(Y_s : 0 \le s \le t)
\]
generated by the (accumulated) observation process $\{Y_s : s \le t\}$.
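To make the observation model concrete, the sketch below simulates a one-dimensional signal together with its accumulated observation $Y_t = \int_0^t h(X_s)\,ds + W_t$ on a time grid, using a simple Euler scheme. The signal dynamics (a mean-reverting drift) and the sensor function $h(x) = x$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

def simulate_observation(T=1.0, n_steps=1000, seed=0):
    """Euler scheme for a toy signal X and its accumulated observation Y, cf. (1.2)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    h = lambda x: x                          # assumed sensor function
    X = np.zeros(n_steps + 1)                # signal path
    Y = np.zeros(n_steps + 1)                # accumulated observation path
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))    # signal noise increment
        dW = rng.normal(0.0, np.sqrt(dt))    # observation noise increment
        X[k + 1] = X[k] - X[k] * dt + dB     # toy signal dynamics (assumption)
        Y[k + 1] = Y[k] + h(X[k]) * dt + dW  # Y_t = int_0^t h(X_s) ds + W_t
    return X, Y

X, Y = simulate_observation()
print(Y[-1])
```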
1.1.2 Portfolio optimization
We consider a market consisting of a bond and $d$ stocks whose prices are stochastic processes $S^i_t$, $i = 0, 1, \ldots, d$, governed by the following stochastic differential equations (SDEs):
\[
\begin{cases}
dS^i_t = S^i_t\Big(X^i_t\,dt + \sum_{j=1}^m \tilde\sigma^{ij}_t\,d\tilde W^j_t\Big), & i = 1, 2, \ldots, d,\\
dS^0_t = S^0_t X^0_t\,dt, & t \ge 0,
\end{cases}
\qquad (1.3)
\]
where $\tilde W := (\tilde W^1, \ldots, \tilde W^m)^*$ is a standard Brownian motion defined on a stochastic basis $(\Omega, \mathcal F, P; \{\mathcal F_t\}_{t\ge 0})$ satisfying the usual conditions, $X^i_t$, $i = 1, 2, \ldots, d$, are the appreciation rate processes of the stocks, $X^0_t$ is the interest rate process, and the $d\times m$ matrix-valued process $\tilde\sigma_t := (\tilde\sigma^{ij}_t)$ is the volatility process. Here and throughout this book $A^*$ denotes the transpose of a matrix $A$. Let
\[
\mathcal G_t := \sigma(S^i_s : s \le t,\ i = 0, 1, 2, \ldots, d), \qquad t \ge 0.
\]
In our model $\mathcal G_t$, rather than $\mathcal F^{\tilde W}_t$ (the filtration generated by $\tilde W$), is the only information available to the investors at time $t$. One of the objectives of mathematical finance is to study how to choose a suitable portfolio such that the terminal wealth is optimized. Let $u^i_t$ be the worth of an agent's wealth (dollar amount) in the $i$th stock, $i = 1, 2, \ldots, d$. Our decision must be based on the available information. Let $L^2_{\mathcal G}(0, T; \mathbb R^d)$ be the collection of square-integrable processes that are predictable with respect to the $\sigma$-fields $\mathcal G_t$.

Definition 1.1 A $d$-dimensional process $u_t \equiv (u^1_t, \ldots, u^d_t)^*$ is an admissible portfolio if $u_t \in L^2_{\mathcal G}(0, T; \mathbb R^d)$.

For the portfolio to be self-financed, the change in the wealth should be equal to the value change due to that of the stocks and the bond. Let $W_t$ be the wealth process. Then
\[
\begin{aligned}
dW_t &= \Big(W_t - \sum_{i=1}^d u^i_t\Big)\frac{dS^0_t}{S^0_t} + \sum_{i=1}^d u^i_t\,\frac{dS^i_t}{S^i_t}\\
&= \Big(X^0_t W_t + \sum_{i=1}^d (X^i_t - X^0_t)u^i_t\Big)dt + \sum_{i=1}^d\sum_{j=1}^m \tilde\sigma^{ij}_t u^i_t\,d\tilde W^j_t. \qquad (1.4)
\end{aligned}
\]
Applying Itô's formula to equation (1.3), we have
\[
d\log S^i_t = \Big(X^i_t - \frac12 a^{ii}_t\Big)dt + \sum_{j=1}^m \tilde\sigma^{ij}_t\,d\tilde W^j_t, \qquad i = 1, 2, \ldots, d, \qquad (1.5)
\]
where
\[
a^{ij}_t := \sum_{k=1}^m \tilde\sigma^{ik}_t \tilde\sigma^{jk}_t, \qquad i, j = 1, 2, \ldots, d.
\]
It is easy to show that the quadratic covariation process, which coincides with Meyer's process in this case, between $\log S^i_t$ and $\log S^j_t$ is given by $\int_0^t a^{ij}_s\,ds$. Therefore, the matrix-valued process $A_t \equiv (a^{ij}_t)$ is $\mathcal G_t$-adapted. Let $\Sigma_t \equiv (\sigma^{ij}_t)$ be the square root of $A_t$. We will prove in Chapter 11 that $\sigma^{ij}_t$ is $\mathcal G_t$-adapted, i.e. it is completely observable. As we shall see in equation (1.8) below, the stock price $S^i_t$ satisfies an equivalent stochastic differential equation that depends on $(\sigma^{ij}_t)$ instead of $(\tilde\sigma^{ij}_t)$. Moreover,
\[
X^0_t = \frac{d\log S^0_t}{dt}
\]
is also $\mathcal G_t$-adapted.

However, the stochastic process $X_t := (X^1_t, \ldots, X^d_t)^*$ is not necessarily $\mathcal G_t$-adapted and hence, its value is not available to the investors. We need to estimate $X_t$ based on the available information $\mathcal G_t$. From equation (1.5), we see that
\[
\log S^i_t - \log S^i_0 - \int_0^t \Big(X^i_s - \frac12 a^{ii}_s\Big)ds = \sum_{j=1}^m \int_0^t \tilde\sigma^{ij}_s\,d\tilde W^j_s, \qquad i = 1, 2, \ldots, d,
\]
are martingales with Meyer's process $\int_0^t A_s\,ds = \int_0^t \Sigma^2_s\,ds$. By the martingale representation theorem, there exists a $d$-dimensional standard Brownian motion $W \equiv (W^1, \ldots, W^d)$ on $(\Omega, \mathcal F, P)$ such that
\[
\sum_{j=1}^m \tilde\sigma^{ij}_t\,d\tilde W^j_t = \sum_{j=1}^d \sigma^{ij}_t\,dW^j_t, \qquad i = 1, \ldots, d. \qquad (1.6)
\]
Thus,
\[
d\log S^i_t = \Big(X^i_t - \frac12 a^{ii}_t\Big)dt + \sum_{j=1}^d \sigma^{ij}_t\,dW^j_t, \qquad i = 1, \ldots, d. \qquad (1.7)
\]
Equivalently, the stock prices satisfy the following modified stochastic differential equations:
\[
dS^i_t = S^i_t\Big(X^i_t\,dt + \sum_{j=1}^d \sigma^{ij}_t\,dW^j_t\Big), \qquad i = 1, \ldots, d. \qquad (1.8)
\]
We assume that $\Sigma_t$ is invertible. Let $\tilde S_t$ be defined by $d\tilde S_t := \Sigma_t^{-1}\,d\log S_t$. We can write the observation equation (1.7) as
\[
\tilde S_t = \tilde S_0 + \int_0^t \Sigma_s^{-1}\Big(X_s - \frac12\tilde A_s\Big)ds + W_t, \qquad (1.9)
\]
where
\[
\tilde A_s = \big(a^{11}_s, \ldots, a^{dd}_s\big)^*.
\]
If $\Sigma_t$ is non-random, then $\mathcal F^{\tilde S}_t = \mathcal G_t$. Let $Y_t = \tilde S_t - \tilde S_0$. The observation model can be written as
\[
Y_t = \int_0^t h_s(X_s)\,ds + W_t,
\]
where
\[
h_s(x) = \Sigma_s^{-1}\Big(x - \frac12\tilde A_s\Big).
\]
1.1.3 Environment pollution
Suppose that there is a source of pollution at a location $\theta$ at which undesired chemicals are dumped into the river $[0, \ell]$. We assume that the dumping times follow a Poisson process with parameter $\lambda$ and the amounts are independent $\mathbb R_+$-valued random variables $\xi_1, \xi_2, \ldots$, with the same distribution. Denote the dumping times by $\tau_1 < \tau_2 < \cdots$. Then, the chemical distribution $X_t$ in the river at time $t$ is an $\mathcal M_F([0, \ell])$-valued stochastic process, where $\mathcal M_F([0, \ell])$ denotes the collection of finite measures on $[0, \ell]$. For $\tau_j \le t < \tau_{j+1}$, the process $X_t$ satisfies the following partial differential equation
\[
\frac{d}{dt}\langle X_t, f\rangle = \langle X_t, Lf\rangle,
\]
where $\langle\mu, f\rangle$ represents the integral of a function $f$ with respect to a measure $\mu$,
\[
Lf(x) = Df''(x) + Vf'(x) - \alpha f(x),
\]
$D$ is the dispersion coefficient, $V$ is the river velocity and $\alpha$ is the leakage rate. At time $t = \tau_j$, there is a random jump for $X$ given by $X_t - X_{t-} = \xi_j\delta_\theta$, where $\delta_\theta$ is the Dirac measure at $\theta$. Suppose that $m$ observation stations $x_1, \ldots, x_m$ are set up along the river. The chemical concentrations near these stations are observed subject to random error:
\[
Y^i_t = \int_0^t \frac{1}{2\varepsilon}X_s([x_i - \varepsilon, x_i + \varepsilon])\,ds + W^i_t, \qquad i = 1, 2, \ldots, m.
\]
Let $\mathcal G_t = \sigma(Y_s : 0 \le s \le t)$. Then $\mathcal G_t$ is the information available and we need to estimate $X_t$ based on $\mathcal G_t$. It is also desirable to estimate the parameters $\theta$ and $\lambda$.
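As a small illustration of the random input in this model, the sketch below simulates only the dumping times (a Poisson process with rate $\lambda$) and the dumped amounts $\xi_j$ on a finite horizon; the exponential distribution chosen for the amounts is an assumption made for the example, and the transport equation between jumps is not simulated here.

```python
import numpy as np

def simulate_dumping(lam=2.0, T=10.0, seed=1):
    """Poisson dumping times tau_j on [0, T] and i.i.d. dumped amounts xi_j."""
    rng = np.random.default_rng(seed)
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)       # inter-arrival times are Exp(lam)
        if t > T:
            break
        times.append(t)
    amounts = rng.exponential(1.0, size=len(times))  # assumed law of the xi_j
    return np.array(times), amounts

taus, xis = simulate_dumping()
print(len(taus), taus[:3], xis[:3])
```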
1.1.4 Filtering with OU process as noise
As we indicated in Section 1.1.1, white noise does not exist in the ordinary sense. We will demonstrate below that the Ornstein–Uhlenbeck process provides a natural approximation of white noise.
Let $\beta > 0$ and consider the process $O^\beta_t$ governed by the following SDE:
\[
dO^\beta_t = -\beta O^\beta_t\,dt + \beta\,dW_t,
\]
where $W_t$ is an $m$-dimensional Brownian motion. $O^\beta_t$ is called the Ornstein–Uhlenbeck process with parameter $\beta$. Applying Itô's formula (to be given in Chapter 3), we get
\[
d\big(e^{\beta t}O^\beta_t\big) = \beta e^{\beta t}\,dW_t,
\]
and hence,
\[
O^\beta_t = O^\beta_0 e^{-\beta t} + \beta e^{-\beta t}\int_0^t e^{\beta s}\,dW_s.
\]
It follows from Theorem 3.6 that for $t \ge s \ge 0$,
\[
\mathrm{Cov}\big(O^\beta_t, O^\beta_s\big) = \frac{\beta}{2}\Big(e^{-\beta(t-s)} - e^{-\beta(t+s)}\Big) \to
\begin{cases}
\infty & \text{if } t = s,\\
0 & \text{if } t \ne s,
\end{cases}
\]
as $\beta \to \infty$. Thus, $O^\beta_t$ approximates white noise as $\beta \to \infty$. More precisely, we can prove that its integral converges to the Brownian motion $W_t$. In fact, as
\[
\int_0^t O^\beta_r\,dr = W_t - \frac{1}{\beta}O^\beta_t,
\]
it is easy to see that
\[
\int_0^t O^\beta_r\,dr \to W_t, \qquad \text{as } \beta \to \infty.
\]
For simplicity of notation, we take $\beta = 1$ and denote $O^1_t$ by $O_t$. We will consider the filtering problem with the following observation model:
\[
y_t = h(X_t) + O_t. \qquad (1.10)
\]
Since the law of y is not absolutely continuous with respect to the law of the OU-process O, the filtering problem with equation (1.10) as the observation model is singular. We will study a general singular filtering model with equation (1.10) as a special case in Chapter 11.
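The convergence of $\int_0^t O^\beta_r\,dr$ to $W_t$ can be seen numerically: the sketch below drives an Ornstein–Uhlenbeck process with the same Brownian path for several values of $\beta$ and compares the integral of the OU path with $W_T$. The Euler discretization and the parameter values are illustrative assumptions.

```python
import numpy as np

def ou_integral_vs_bm(beta, T=1.0, n=20000, seed=2):
    """Compare int_0^T O^beta_r dr with W_T, using one Brownian path for both."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    W = np.cumsum(dW)
    O, integral = 0.0, 0.0
    for k in range(n):
        integral += O * dt                    # accumulate int_0^T O_r dr
        O += -beta * O * dt + beta * dW[k]    # Euler step for dO = -beta*O dt + beta dW
    return integral, W[-1]

for beta in (1.0, 10.0, 100.0):
    I, WT = ou_integral_vs_bm(beta)
    print(beta, abs(I - WT))   # the gap shrinks as beta grows
```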
1.2 Basic definitions and the filtering equation
As we have seen from the examples in the previous section, the filtering problem consists of two processes: the signal process, which is what we want to estimate, and the observation process, which provides the information we can use. In this book, we will model the signal by a $d$-dimensional diffusion process $X_t$ governed by the following stochastic differential equation:
\[
dX_t = b(X_t)\,dt + c(X_t)\,dW_t + \sigma(X_t)\,dB_t, \qquad (1.11)
\]
where $W$ and $B$ are two independent Brownian motions taking values in $\mathbb R^m$ and $\mathbb R^d$, respectively. The mappings $b : \mathbb R^d \to \mathbb R^d$, $c : \mathbb R^d \to \mathbb R^{d\times m}$ and $\sigma : \mathbb R^d \to \mathbb R^{d\times d}$ are continuous. The observation process is an $m$-dimensional process satisfying the following stochastic differential equation:
\[
Y_t = \int_0^t h(X_s)\,ds + W_t, \qquad (1.12)
\]
where $h : \mathbb R^d \to \mathbb R^m$ is a continuous mapping. Let
\[
\mathcal G_t = \sigma(Y_s : 0 \le s \le t)
\]
be the information available to us. Note that such a setup will not cover the model in the pollution example, which needs an infinite-dimensional state space. Its solution is beyond the scope of this book.

As we will demonstrate in Chapter 5, the non-linear optimal filter is a $\mathcal P(\mathbb R^d)$-valued process $\pi_t$ that is the conditional distribution of $X_t$ given $\mathcal G_t$, where $\mathcal P(\mathbb R^d)$ denotes the collection of all Borel probability measures on $\mathbb R^d$. The key in the development of non-linear filtering theory is the Kallianpur–Striebel formula that represents the optimal filter $\pi_t$ in terms of an unnormalized filter $V_t$:
\[
\langle\pi_t, f\rangle = \frac{\langle V_t, f\rangle}{\langle V_t, 1\rangle}, \qquad \forall\, f \in C^2_b(\mathbb R^d), \qquad (1.13)
\]
where
\[
\langle V_t, f\rangle = \hat E\big(M_t f(X_t)\,\big|\,\mathcal G_t\big),
\]
and $\hat E$ refers to the expectation with respect to a probability measure $\hat P$, which is equivalent to $P$ such that
\[
\left.\frac{dP}{d\hat P}\right|_{\mathcal F_t} = M_t.
\]
The process $V_t$ takes values in $\mathcal M_F(\mathbb R^d)$, the space of finite Borel measures on $\mathbb R^d$.
The main advantage of using $\hat P$ is that $Y$ becomes a Brownian motion that is independent of $B$, and $X_t$ is governed by a stochastic differential equation driven by $B$ and $Y$. Based on this fact, a stochastic differential equation on $\mathcal M_F(\mathbb R^d)$ is derived:
\[
\langle V_t, f\rangle = \langle V_0, f\rangle + \int_0^t \langle V_s, Lf\rangle\,ds + \int_0^t \langle V_s, \nabla^* f c + f h^*\rangle\,dY_s, \qquad (1.14)
\]
where
\[
Lf = \nabla^* f b + \frac12\mathrm{tr}\big(c^*\partial^2 f c + \sigma^*\partial^2 f\sigma\big)
= \frac12\sum_{i,j=1}^d a^{ij}\partial^2_{ij} f + \sum_{i=1}^d b^i\partial_i f,
\]
and $a = cc^* + \sigma\sigma^*$. The equation above is called Zakai's equation for the unnormalized filter. Applying Itô's formula to the Kallianpur–Striebel formula and Zakai's equation, we can obtain the filtering equation for $\pi_t$:
\[
\langle\pi_t, f\rangle = \langle\pi_0, f\rangle + \int_0^t \langle\pi_s, Lf\rangle\,ds
+ \int_0^t \big(\langle\pi_s, \nabla^* f c + f h^*\rangle - \langle\pi_s, f\rangle\langle\pi_s, h^*\rangle\big)\,d\nu_s, \qquad (1.15)
\]
where
\[
\nu_t = Y_t - \int_0^t \langle\pi_s, h\rangle\,ds
\]
is a Brownian motion with respect to the original probability measure P. The process νt is called the innovation process and the filtering equation is called the Kushner–Stratonovich equation, or the FKK equation (which stands for Fujisaki–Kallianpur–Kunita).
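A crude way to see the Kallianpur–Striebel formula (1.13) in action is a weighted Monte Carlo approximation: simulate independent copies of the signal, weight each copy by a discretized Girsanov factor $M_t$ built from the observation path, and normalize. The sketch below does this in one dimension for the special case where the signal noise is independent of the observation noise ($c = 0$ in (1.11)); the toy coefficients, time grid and Euler discretizations are assumptions for illustration only (Chapter 8 develops rigorous particle approximations).

```python
import numpy as np

def weighted_particle_filter(Y, dt, n_particles=2000, seed=3):
    """Approximate <pi_T, f> = <V_T, f>/<V_T, 1> with weighted particles.

    Y  : accumulated observations on a uniform grid (Y[0] = 0)
    dt : grid spacing
    Assumed toy model: dX = -X dt + dB, observation function h(x) = x, c = 0.
    """
    rng = np.random.default_rng(seed)
    h = lambda x: x
    X = rng.normal(0.0, 1.0, size=n_particles)   # samples from an assumed pi_0
    logM = np.zeros(n_particles)                 # log of the Girsanov weights
    for k in range(len(Y) - 1):
        dY = Y[k + 1] - Y[k]
        logM += h(X) * dY - 0.5 * h(X) ** 2 * dt             # d log M = h dY - h^2/2 dt
        X += -X * dt + rng.normal(0.0, np.sqrt(dt), size=n_particles)  # signal step
    w = np.exp(logM - logM.max())
    w /= w.sum()
    return np.sum(w * X)        # estimate of <pi_T, f> for f(x) = x

# usage with a synthetic observation path
dt, n = 0.01, 100
rng = np.random.default_rng(0)
Y = np.concatenate([[0.0], np.cumsum(rng.normal(0, np.sqrt(dt), n))])
print(weighted_particle_filter(Y, dt))
```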
1.3 An overview
In this section, we will give an outline of the results that will be studied in this book. In Chapters 2–4, we will introduce the basic material of stochastic analysis that will be used in the rest of this book. We refer the reader who is interested in a more detailed treatment of this topic to the following books (Ikeda and Watanabe [76], Revuz and Yor [135], Protter [134]). Then, in Chapter 5, we derive the Kallianpur–Striebel formula as well as the filtering equation (1.15) and Zakai’s equation (1.14). Now we sketch the results of Chapters 6–11.
In Chapter 6, we study Zakai's equation (1.14) as a linear stochastic partial differential equation (SPDE). We make a linear transformation from $\mathcal M_F(\mathbb R^d)$ to a Hilbert space $H_0$ such that equation (1.14) is transformed to an equation on $H_0$. We then use Hilbert-space techniques to derive various estimates for equations on $H_0$. As a consequence, we prove that equation (1.14) has a unique solution.

In Chapter 7, we study the filtering equation (1.15) as a non-linear SPDE. To get the uniqueness of the solution, we consider a particle system
\[
\begin{cases}
X^i_t = X^i_0 + \int_0^t \sigma(X^i_s)\,dB^i_s + \int_0^t \tilde b(X^i_s, \mu_s)\,ds + \int_0^t c(X^i_s)\,d\nu_s,\\[2pt]
A^i_t = A^i_0 + \int_0^t A^i_s\,\beta^*(X^i_s, \mu_s)\,d\nu_s,\\[2pt]
\mu_t = \lim_{n\to\infty}\dfrac1n\sum_{i=1}^n A^i_t\,\delta_{X^i_t},
\end{cases}
\qquad (1.16)
\]
where $\nu, B^i$, $i = 1, 2, \ldots$, are independent Brownian motions, and
\[
\beta(x, \mu) = \langle\mu, h\rangle - h(x) \quad\text{and}\quad \tilde b(x, \mu) = b(x) - c(x)\beta(x, \mu)
\]
for $x \in \mathbb R^d$ and $\mu \in \mathcal P(\mathbb R^d)$. It can be proved that $\pi_t$ is a solution to equation (1.16), i.e. equation (1.16) holds with $\mu_t$ replaced by $\pi_t$ for suitable $(X^i_t, A^i_t)$, $i = 1, 2, \ldots$.

Theorem 1.2 Under suitable conditions, the infinite system (1.16) has a unique solution $(X, A, \mu)$.

Next, we proceed to provide an intuitive proof for the uniqueness of the solution to equation (1.15). Let $\{\mu_t\}$ be another solution to equation (1.15). Then $\{\mu_t\}$ is a solution to the following linear SPDE:
\[
\langle\eta_t, \phi\rangle = \langle\pi_0, \phi\rangle + \int_0^t \langle\eta_s, L\phi\rangle\,ds + \int_0^t \langle\eta_s, \beta^*_s\phi + \nabla^*\phi c\rangle\,d\nu_s, \qquad (1.17)
\]
where $\beta_s(x) = \beta(x, \mu_s)$. With $\mu_t$ given, we consider a system of the form (1.16) as follows:
\[
\begin{cases}
X^i_t = X^i_0 + \int_0^t \sigma(X^i_s)\,dB^i_s + \int_0^t \tilde b(X^i_s, \mu_s)\,ds + \int_0^t c(X^i_s)\,d\nu_s,\\[2pt]
A^i_t = A^i_0 + \int_0^t A^i_s\,\beta^*(X^i_s, \mu_s)\,d\nu_s.
\end{cases}
\qquad (1.18)
\]
We define a measure-valued process
\[
\tilde\mu_t = \lim_{n\to\infty}\frac1n\sum_{i=1}^n A^i_t\,\delta_{X^i_t}.
\]
10
1 : Introduction
Then, µ˜ t is a solution of equation (1.17). Since the linear SPDE has a unique solution, we get that µ˜ t = µt , and hence, µt is a solution to the system equation (1.16). By the uniqueness of the solution for the system equation (1.16) we see that µt = πt . Thus, we have “proved” the following Theorem 1.3 Under suitable conditions, the filtering equation (1.15) has a unique solution. Next, in Chapter 8, we study the numerical approximation for the optimal filter using some branching particle systems. Let {xni , i = 1, 2, . . . , n} be i.i.d. random vectors in Rd with common distribution π0 ∈ P (Rd ). Then 1 ≡ δxni → π0 in P (Rd ). n n
π0n
i=1
n−2α ,
0 < α < 1. For j = 0, 1, 2, . . ., we suppose that there Let δ = n are mj number of particles alive at time t = jδ. During the time interval (jδ, (j + 1)δ), the particles move according to the following diffusions: For i = 1, 2, . . . , mnj , t t t i i i i i ˆ b(Xs )ds + σ (Xs )dBs + c(Xsi )dYs , (1.19) Xt = Xjδ + jδ
jδ
jδ
where bˆ = b − ch. At the end of the interval, the ith particle (i = 1, 2, . . . , mnj ) branches i of offspring satisfying (independent of others) into a random number ξj+1 ˜ n (X i )] ˜ n (X i )}, [M with probability 1 − {M j+1 j+1 i ξj+1 = ˜ n (X i )}, ˜ n (X i )] + 1 with probability {M [M j+1 j+1
where {x} = x − [x] is the fraction of x, ˜ n (X i ) = M j+1 and n Mj+1 (X i )
= exp
( j+1)δ
∗
h jδ
1 mnj
n (X i ) Mj+1 , mnj n =1 Mj+1 (X )
(Xti )dYt
1 − 2
( j+1)δ
jδ
|h(Xti )|2 dt
Now we define the approximate filter as follows: mn
πtn
j 1 ˜n i Mj (X , t)δXi , = n t mj
i=1
(1.20)
jδ ≤ t < (j + 1)δ,
.
(1.21)
1.3
where
Mjn (X i , t)
= exp
t
h jδ
∗
(Xsi )dYs
1 − 2
t
jδ
An overview
|h(Xsi )|2 ds
,
(1.22)
and ˜ n (X i , s) = M j
1 mnj
Mjn (X i , s) . mnj n (X , s) M j =1
˜ n (X i , t). At the end Namely, the ith particle has a time-dependent weight M j of the interval, i.e. t = (j + 1)δ, this particle dies and give birth to a random number, whose conditional expectation is equal to the pre-death weight ˜ n (X i ) = M ˜ n (X i , (j + 1)δ)) of the particle, of offsprings. (note that M j+1 j Theorem 1.4 Under suitable conditions, there exists a constant K1 such that
E sup d(πtn , πt ) ≤ K1 n−
1−α 2
,
(1.23)
0≤t≤T
where d is a suitable distance on P (Rd ). In Chapter 9, we specialize the filtering problem to the linear case. The optimal filter in this case is called the Kalman–Bucy filter. We assume that π0 ˆ 0 and covariance is a multivariate normal distribution with mean vector X matrix γ0 . Suppose that the signal Xt and the observation Yt are given by linear equations ⎧ ⎨ dXt = b˜ t + bt Xt dt + ct dWt + σt dBt , (1.24) ⎩ dYt = h˜ t + ht Xt dt + dWt , Y0 = 0. Theorem 1.5 The optimal filter πt is a multivariate normal (random) disˆ t and covariance matrix γt characterized by tribution with mean vector X the following equations t t ˜ ˆ ˆ ˆ bs + bs Xs ds + (1.25) cs + γs h∗s dνs , Xt = X0 + 0
0
and d γt = γt b∗t + bt γt + at − (ct + γt h∗t )(ct + γt h∗t )∗ . dt
(1.26)
In Chapter 9, we also investigate the stability of the Kalman–Bucy filter. Let π¯ t be the multivariate normal (random) distribution with mean vector Zt
11
12
1 : Introduction
and covariance matrix Pt characterized by equations (1.25) and (1.26) with incorrect initial conditions Z0 and P0 , respectively. Then, as time increases, the Kalman–Bucy filter will correct the incorrect initials. We consider the following case: All coefficients in model equation (1.24) are independent of t, b˜ t = 0 and h˜ t = 0. Theorem 1.6 Under suitable conditions on the coefficient matrices (b − ch, σ , h), we have lim Pt = lim γt = γ∞ ,
t→∞
and, for some λ > 0, ˆ t − Zt eλt = 0, lim X t→∞
a.s. (almost surely)
As a consequence, we have that lim ρ(πt , π¯ t ) = 0,
t→∞
a.s.,
where ρ is the Wasserstein distance in P (Rd ). We come back to the non-linear filtering problem and consider the stability and other properties for the optimal filter πt in Chapter 10. Here, we assume that the signal is a general Markov process on a Polish space S with generator L and a unique invariant measure µ ∈ P (S). Suppose that the observation process Yt is given by t Yt = h(Xs )ds + Wt , (1.27) 0
Rm
is a continuous map and Wt is an m-dimensional where h : S → Brownian motion independent of X. The next theorem establishes the Markov property for the optimal filter. Theorem 1.7 Under suitable conditions, the filter {πt } is a Feller–Markov process taking values in P (S). The ergodicity for the non-linear filter was first studied by Kunita [92]. Subsequently, many authors extended it to various setups. Then, using the results for the ergodicity, many authors studied the stability of the optimal filter. Unfortunately, there is a gap in Kunita’s proof that was found by Baxendale et al. [5]. Let {ξt , −∞ < t < ∞} be a stationary Markov process on S with generator L and marginal distribution µ. Let {βt , −∞ < t < ∞} be a Brownian
1.3
An overview
motion on Rm and let {zt , t ∈ R} be a process satisfying t h(ξu )du + βt − βs . z t − zs = s
Define z Fs,t = σ (zv − zu : s ≤ u ≤ v ≤ t),
−∞ ≤ s ≤ t ≤ ∞.
Let π¯ t be the solution of the filtering equation with incorrect initial condition. The filter is asymptotically stable if lim d(πt , π¯ t ) = 0,
t→∞
a.s.
for a suitable metric d on P (S). The key equality used by Kunita [92] is that ξ z z ∩∞ s=−∞ F−∞,t ∨ F−∞,s = F−∞,t ,
a.s.
(1.28)
However, this equality does not hold in general as pointed out by Baxendale et al. [5]. The following theorem was proved by Budhiraja [18]. Theorem 1.8 Under suitable conditions on L, the following statements are equivalent: 1. The equality equation (1.28) holds. 2. The Markov process {πt } has a unique invariant measure. 3. The filter πt is asymptotically stable. Since condition equation (1.28) is not easy to verify, many authors studied the stability problem using other methods. Below we state a result of Atar and Zeitouni [3], which is proved using the Hilbert metric. Theorem 1.9 Suppose that S is a compact manifold in Rd . Then lim sup t→∞
1 log d(πt , π¯ t ) < 0, t
a.s.
Finally, in Chapter 11, we consider the filtering model when the observation noise depends on the signal: dXt = b(Xt )dt + c(Xt )dWt + c˜ (Xt )dBt , (1.29) dYt = h(Xt )dt + σ (Xt )dWt , where B and W are two independent Brownian motions in Rd and Rm , respectively, and b, c, c˜ , h, σ are functions on Rd with values in Rd , Rd×m , Rd×d , Rm , Rm×m , respectively. Without loss of generality, we
13
14
1 : Introduction
assume that for each x ∈ Rd , we have σ (x) ∈ Sd , the collection of all symmetric positive-definite d × d-matrices. Let Zt = σ (Xt ). It is easy to show that Zt , t > 0, is observable, and hence, the optimal filter πt is a probability measure supported on the manifold MZt where for z ∈ Sd , Mz = {x ∈ Rd : σ (x) = z}. Thus, the optimal filter is singular. The following decomposition plays a key role in the study of the singular filter πt . Theorem 1.10 Under suitable conditions, there exists an observable flow ξt,s from MZs to MZt , and a diffusion process κt on the manifold MZ0 such that Xt (ω) = ξt,0 (κt (ω), ω),
∀ t ≥ 0.
Let (Yˆ t , Zt ) be the observation process, where t ˆ σ −1 h(Xs )ds + Wt . Yt = 0
We denote the optimal filter of the signal κt by Ut , which is a P (MZ0 )valued process. Note that this is a classical filtering problem, and can be solved by the methods we mentioned above. The singular filter can be represented in terms of Ut , as the following theorem points out. Theorem 1.11 For any f ∈ Cb (Rd ) and t > 0, we have πt , f = Ut , f ◦ ξt,0 . Throughout this book, we will use K with a subscript to denote a constant. The subscript will be taken consecutively in each theorem and will restart from 1 at the beginning of each new theorem. Thus, for example, K1 in two different theorems may have different values.
2
Brownian motion and martingales
In this chapter, we introduce some basic concepts and properties of martingales that will be needed in the development of the filtering theory introduced in this book. The aim is to prepare the reader with necessary material for the study of filtering theory through the shortest route possible. Throughout this book, we fix a complete probability space (, F , P) and a family of increasing sub-σ -fields Ft (t ∈ T) satisfying the usual conditions: F0 contains all P-null sets and Ft is right-continuous. We shall take T = R+ = [0, ∞) unless stated otherwise. Occasionally, we take T = N = {0, 1, 2, . . .} for the discrete case. All the stochastic processes Xt in this book will be adapted to this family of σ -fields, i.e. ∀ t, Xt is Ft -measurable. The quadruple (, F , P, Ft ) is called a stochastic basis.
2.1
Martingales
Let Xt be a real-valued stochastic process such that E|Xt | < ∞, ∀ t ∈ T. Definition 2.1 {Xt }t∈T is a martingale if ∀ s < t,
E(Xt |Fs ) = Xs ,
a.s.
(2.1)
It is a supermartingale (resp. submartingale) if equation (2.1) is replaced by inequality:
E(Xt |Fs ) ≤ Xs ,
(resp. ≥)
a.s.
We consider the discrete case first. Let T = N and let Xn be a discrete-time stochastic process. Let fn be a predictable process (i.e. fn is Fn−1 -measurable). We define a transformation (f · X)n = f0 X0 +
n k=1
fk (Xk − Xk−1 ).
16
2 : Brownian motion and martingales
Note that this transformation is the counterpart in the discrete case of the stochastic integral that will be introduced in Chapter 3. Proposition 2.2 If Xn is a martingale (resp. supermartingale) and fn is a bounded (resp. non-negative and bounded) predictable process, then (f ·X)n is a martingale (resp. supermartingale). Proof Suppose that Xn is a martingale. As (f · X)n = (f · X)n−1 + fn (Xn − Xn−1 ), we have
E((f · X)n |Fn−1 ) = (f · X)n−1 . Thus, (f · X)n is a martingale. The other case can be verified similarly.
It is useful to consider a process at a random time τ . Such a time should be “adapted” to the σ -fields Ft . Namely, whether τ ≤ t or not should be decided by using the information available at time t. More precisely, we give the following Definition 2.3 τ : → T is a stopping time if ∀ t ∈ T, {τ ≤ t} ∈ Ft . We define the σ -field at time τ as
Fτ = {A ∈ F : ∀ t ∈ T, A ∩ {τ ≤ t} ∈ Ft }. We denote the collection of all stopping times bounded by T as ST . Theorem 2.4 (Optional sampling theorem) Let X = {Xn }n∈N be a martingale (resp. supermartingale). Let τ , σ ∈ SN be such that σ (ω) ≤ τ (ω), ∀ω ∈ . Then,
E(Xτ |Fσ ) = Xσ
(resp. ≤)
a.s.
(2.2)
Proof We assume that X is a martingale. Let fn = 1σ
E(Xτ ) = E(Xσ ). For any B ∈ Fσ , it is easy to show that τB ≡ τ 1B + N1Bc and σB ≡ σ 1B + N1Bc are two stopping times and σB (ω) ≤ τB (ω) ≤ N. Hence,
2.1
Martingales
E(XτB ) = E(XσB ). Therefore, E(Xσ 1B ) = E(XσB ) − E(XN 1Bc ) = E(XτB ) − E(XN 1Bc ) = E(Xτ 1B ). This proves equation (2.2). The case for supermartingales can be proved similarly. Next, we give some estimates on the probabilities related to submartingales. The corollary of these estimates will be very important throughout this book. Theorem 2.5 (Doob’s inequality) Let {Xn }n∈N be a submartingale. Then for every λ > 0 and N ∈ N, λP max Xn ≥ λ ≤ E XN 1maxn≤N Xn ≥λ ≤ E(|XN |), n≤N
and
λP min Xn ≤ −λ ≤ E(|X0 | + |XN |). n≤N
Proof Let σ = min{n ≤ N : Xn ≥ λ} with the convention that inf ∅ = N. Then σ ∈ SN . By equation (2.2), we have
E(XN ) ≥ E(Xσ ) = E Xσ 1maxn≤N Xn ≥λ + E XN 1maxn≤N Xn <λ ≥ λP max Xn ≥ λ + E XN 1maxn≤N Xn <λ . n≤N
Thus,
λP max Xn ≥ λ ≤ E(XN ) − E XN 1maxn≤N Xn <λ n≤N
= E XN 1maxn≤N Xn ≥λ ≤ E(|XN |).
The other inequality can be proved similarly.
17
18
2 : Brownian motion and martingales
Corollary 2.6 Let {Xn }n∈N be a submartingale. Then for every λ > 0 and N ∈ N,
λP max |Xn | ≥ λ ≤ 2E(|XN |) + E(|X0 |). n≤N
Proof Combining both inequalities in Theorem 2.5, we get
λP max |Xn | ≥ λ ≤ λP max Xn ∨ max(−Xn ) ≥ λ
n≤N
n≤N
n≤N
≤ λP max Xn ≥ λ + λP min Xn ≤ −λ n≤N
n≤N
≤ 2E(|XN |) + E(|X0 |).
Corollary 2.7 (Doob’s inequality) Let {Xn }n∈N be a martingale such that for some p > 1 we have E(|Xn |p ) < ∞, ∀ n ∈ N. Then for every N ∈ N,
P max |Xn | ≥ λ ≤ n≤N
E(|XN |p ) , λp
(2.3)
and
E max |Xn | n≤N
p
≤
p p−1
p
E(|XN |p ).
(2.4)
Proof By Jensen’s inequality, |Xn |p is a submartingale and hence, equation (2.3) follows from Theorem 2.5 directly. Let Y = max |Xn |. n≤N
By Theorem 2.5, we have λP(Y ≥ λ) ≤ E(|XN |1Y≥λ ).
2.1
Hence,
∞
p
E(Y ) = E =p
0 ∞
≤
0 ∞
0
= pE
Martingales
pλp−1 1λ≤Y dλ λp−1 P(Y ≥ λ)dλ
λp−2 E 1Y≥λ |XN | dλ
0
Y
λp−2 dλ|XN |
p E(Y p−1 |XN |) p−1 1/p (p−1)/p p ≤ E(|XN |p ) E(Y p ) , p−1
=
where the last inequality follows from Hölder’s inequality. The inequality equation (2.4) then follows easily. Next, we consider the limit of submartingales. Let {Xn }n∈N be a submartingale and a < b. Define τ0 = σ0 = 0 and for n ≥ 0, τn = min{m ≥ σn−1 : Xm ≤ a} σn = min{m ≥ τn : Xm ≥ b}.
(2.5)
Then, τn and σn are two sequences of increasing stopping times and the number of upcrossings of {Xn : 0 ≤ n ≤ N} for the interval [a, b] is X (a, b) = max{n : σn ≤ N}. UN
Theorem 2.8 Suppose that {Xn }n∈N is a submartingale. Then, ∀ N ∈ N and a < b, we have X EUN (a, b) ≤
1 E{(XN − a)+ − (X0 − a)+ }. b−a
Proof By Jensen’s inequality, Yn = (Xn − a)+ is a submartingale and X (a, b) = U Y (0, b − a). Let τ and σ be defined as in equation (2.5) UN n n N with X, a, b replaced by Y, 0, b − a, respectively. If σn > N, then YN − Y0 =
n
(Yσk ∧N − Yτk ∧N ) +
k=1 X (a, b) + ≥ (b − a)UN
n
(Yτk ∧N − Yσk−1 ∧N )
k=1 n
(Yτk ∧N − Yσk−1 ∧N ).
k=1
19
20
2 : Brownian motion and martingales
Therefore, X E(YN − Y0 ) ≥ (b − a)EUN (a, b).
As a consequence of the upcrossing estimate above, we have the following submartingale convergence theorem. Theorem 2.9 If {Xn }n∈N is a submartingale such that sup E(Xn+ ) < ∞, n
then X∞ = limn→∞ Xn exists a.s. and X∞ is integrable. Proof For any r > r, we have X X EU∞ (r, r ) = lim EUN (r, r ) N→∞
≤
r
1 lim E((XN − r)+ − (X0 − r)+ ) < ∞. − r N→∞
Hence, X P lim inf Xn < lim sup Xn = P ∪r
n→∞
which proves that X∞ exists a.s. By Fatou’s lemma,
E|X∞ | ≤ lim inf E|Xn | n→∞
= lim inf (2E(Xn+ ) − E(Xn )) n→∞
≤ 2 sup E(Xn+ ) − E(X0 ) < ∞. n
Hence, X∞ is integrable.
As a corollary of the theorem above, we now prove the following martingale convergence theorem. Theorem 2.10 Suppose that Y is an integrable random variable and {Fn } is an increasing sequence of sub-σ -fields of F . Let Xn = E(Y|Fn ), ∀ n ≥ 1. Then {Xn } is a uniformly integrable martingale and lim Xn = X∞ ,
n→∞
a.s. and in L1 ().
2.1
Martingales
Furthermore, X∞ = E(Y|F∞ ),
(2.6)
where F∞ = σ (∪n Fn ) ≡ ∨n Fn . Proof By Jensen’s inequality, we have |Xn | ≤ E |Y|Fn , and hence
E |Xn |1|Xn |>λ ≤ E E |Y|Fn 1|Xn |>λ = E |Y|1|Xn |>λ ≤ E |Y|1|Y|>λ + λ P (|Xn | > λ) ≤ E |Y|1|Y|>λ + λ−1 λ E (|Xn |) ≤ E |Y|1|Y|>λ + λ−1 λ E (|Y|) ,
where λ is an arbitrary constant. Then lim sup sup E |Xn |1|Xn |>λ ≤ E |Y|1|Y|>λ . λ→∞
n
Taking λ → ∞, we get
lim sup E |Xn |1|Xn |>λ = 0,
λ→∞ n
and hence {Xn } is uniformly integrable. By Theorem 2.9, as n → ∞, we have that Xn → X∞ a.s. and hence, in L1 (). In order to prove (2.6), we define
C = {B ∈ F : E(X∞ 1B ) = E(Y1B )} . For any B ∈ Fn , we have
E(Xn 1B ) = E (E (Y|Fn ) 1B ) = E(Y1B ). As B is also in Fm for any m ≥ n, we get
E(Y1B ) = E (Xm 1B ) . Taking m → ∞, we get that B ∈ C . Thus ∪n Fn ⊂ C . Clearly ∪n Fn is closed under finite intersection and C , containing ∪n Fn , is closed under increasing limit and closed under true difference. Thus, C contains the σ -field generated by ∪n Fn , i.e. F∞ ⊂ C . This proves (2.6).
21
22
2 : Brownian motion and martingales
We will need to consider martingales in reverse time in R− . To this end, we only need to study the martingales with time parameter in Z− . Let {F−n , n ≥ 0} be a family of increasing σ -fields. Let {X−n , n ≥ 0} be a sequence of integrable random variables adapted to {F−n , n ≥ 0}. Definition 2.11 The sequence {X−n , n ≥ 0} is a backward martingale if for n ≥ 0, X−n is F−n -measurable, and for 0 ≤ n < m, we have
E (X−n |F−m ) = X−m ,
a.s.
Now we state and prove the backward martingale convergence theorem. Theorem 2.12 Let {(X−n , F−n ), n ≥ 0} be a backward martingale, and let F−∞ = ∩∞ n=0 F−n . Then the sequence {X−n , n ≥ 0} converges a.s. and in L1 to X = E(X0 |F−∞ ) as n → ∞. Proof Let U−n be the number of upcrossings of {−X−n , n ≥ 0} of [a, b] between times −n and 0. Then U−n is increasing as n increases, and let U(a, b) = lim U−n . n→∞
By the monotone convergence theorem, we get E U(a, b) = lim E {U−n } n→∞
≤
1 E (−X0 − a)+ < ∞, b−a
and hence, P U(a, b) < ∞ = 1. The same upcrossing argument as in the proof of Theorem 2.9 implies that X = lim X−n n→∞
exists a.s. As X−n = E (X0 |F−n ), the family {X−n , n ≥ 0} is uniformly integrable and X−n converges to X in L1 . Using the same arguments as in the proof of Theorem 2.10, we can show that X = E (X0 |F−∞ ). Finally, we consider continuous-time submartingales. Lemma 2.13 Let {Xt }t≥0 be a submartingale. Then ∀ T > 0, P
sup
t∈Q+ ∩[0,T]
|Xt | < ∞ = 1,
(2.7)
2.1
and
P ∀ t ≥ 0,
Martingales
lim
s∈Q+ , s↓t
Xs and
lim
s∈Q+ , s↑t
Xs exist = 1.
(2.8)
Proof Let {r1 , r2 , . . .} be an enumeration of Q ∩ [0, T]. For each n, let s1 < s2 < · · · < sn be a rearrangement of {r1 , . . . , rn }. Define Y0 = X0 , Yn+1 = XT and Yi = Xsi , i = 1, 2, . . . , n. Then Y = {Yi }i=0,1,...,n+1 is a submartingale. Therefore, by Corollary 2.6 and Theorem 2.8, 1 P max |Yi | > λ ≤ (2E|XT | + E|X0 |) , 1≤i≤n λ and
EUnY (a, b) ≤
1 1 E(Yn − a)+ ≤ E(XT − a)+ . b−a b−a
Take n → ∞, we have P
sup
t∈Q∩[0,T]
|Xt | > λ ≤
1 (2E|XT | + E|X0 |) , λ
(2.9)
and X|
EU∞ Q∩[0,T] (a, b) ≤
1 E(XT − a)+ . b−a
(2.10)
The identity equation (2.7) follows from equation (2.9) by taking λ → ∞. By equation (2.10), we see that X| Q∩[0,T] (a, b) = ∞ = 0. P ∪a
lim
s∈Q+ , s↓t
Xs or
lim
s∈Q+ , s↑t
Xs does not exist
(2.11)
X| Q∩[0,T] (a, b) = ∞ . ∪a
Therefore, the set (2.11) is of probability 0. Equation (2.8) follows by taking T → ∞.
23
24
2 : Brownian motion and martingales
Theorem 2.14 Let {Xt }t≥0 be a submartingale. Then ˆt = X
lim Xr
r∈Q, r↓t
ˆ t is a submartingale that is right-continuous with left-limit exists a.s. and X ˆ t a.s. for every t ≥ 0, and (cádlág) a.s. Furthermore, Xt ≤ X ˆ t ) = 1, P(Xt = X
∀t≥0
(2.12)
if and only if E(Xt ) is right-continuous. ˆ t exists a.s. The cádlág Proof It follows from Lemma 2.13 directly that X ˆ property of Xt can be verified easily. ˆ t is Ft+ = Ft measurable. For s > t and B ∈ Ft , we have Note that X ˆ t 1B ) = lim E(Xr 1B ) ≤ E(X r∈Q, r↓t
lim
r ∈Q, r ↓s
ˆ s 1B ). E(Xr 1B ) = E(X
ˆ t is a submartingale. Similarly, we have Hence, X ˆ t 1B ), E(Xt 1B ) ≤ E(X
∀ B ∈ Ft ,
ˆ t a.s. and hence Xt ≤ X ˆ t = EXt and hence, Xt = X ˆ t a.s. If E(Xt ) is right-continuous, then EX On the other hand, if equation (2.12) holds, then E(Xt ) is right-continuous ˆ t ) is right-continuous. since E(X ˆ in Theorem 2.14 is called a cádlág If E(Xt ) is right-continuous, then X modification of X. From now on, we always take cádlág versions for such submartingales. The following theorem is an immediate consequence of Corollary 2.7. Theorem 2.15 (Doob’s inequality) Let {Xt }t≥0 be a right-continuous martingale such that E(|Xt |p ) < ∞, ∀ t ≥ 0 for some p > 1. Then for every t ≥ 0, E(|Xt |p ) , (2.13) P max |Xs | ≥ λ ≤ s≤t λp and
E max |Xs | s≤t
p
≤
p p−1
p
E(|Xt |p ).
(2.14)
Next, we consider the continuous-time counterpart of Theorem 2.4. We need to define the class (DL) first.
2.2
Doob–Meyer decomposition
Definition 2.16 A submartingale {Xt } is in the class (DL) if ∀ T > 0, the family of random variables {Xσ : σ ∈ ST } is uniformly integrable. Remark 2.17 By the argument as in the proof of Theorem 2.10, we can prove that every right-continuous martingale is of class (DL). Theorem 2.18 (Optional sampling theorem) Suppose that {Xt }t≥0 is a rightcontinuous martingale. Let τ , σ ∈ SN be such that σ (ω) ≤ τ (ω), ∀ω ∈ . Then
E(Xτ |Fσ ) = Xσ
a.s.
(2.15)
Proof Let τn =
k 2n
if
k−1 k ≤ τ < n. n 2 2
Then τn ↓ τ is a sequence of stopping times. Let σn be defined similarly. For any A ∈ Fσ , we have A ∈ Fσn and hence, by Theorem 2.4,
E(Xσn 1A ) = E(Xτn 1A ). Take n → ∞, we have
E(Xσ 1A ) = E(Xτ 1A ). This implies that E(Xτ |Fσ ) = Xσ .
The following theorem follows from Theorem 2.18 immediately. Theorem 2.19 Let {Xt }t≥0 be a right-continuous martingale and (σt )t≥0 be ˜ t = Xσ and F˜ t = Fσ , an increasing family of bounded stopping times. Let X t t ˜ ˜ t , Ft ) is a martingale. ∀ t ≥ 0. Then (X
2.2
Doob–Meyer decomposition
Note that a submartingale increases in expectation. Therefore, it should consist of two parts: the martingale part plus an increasing process. The rigorous treatment of this idea is the so-called Doob–Meyer decomposition that is the subject of this section. Theorem 2.20 (Doob’s decomposition) A submartingale {Xn }n∈N has exactly one decomposition Xn = Mn + An ,
(2.16)
25
26
2 : Brownian motion and martingales
where {Mn }n∈N is a martingale, A0 = 0, An is Fn−1 -measurable and An ≤ An+1 a.s. for all n ∈ N. Proof Define M0 = X0 and for n ≥ 1, Mn = Mn−1 + Xn − E(Xn |Fn−1 ). Then (Mn , Fn ) is a martingale. Define An = Xn − Mn . Then A0 = 0 and An = An−1 − Xn−1 + E(Xn |Fn−1 ).
(2.17)
It is clear that An is Fn−1 -measurable. Since Xn is a submartingale, by equation (2.17), we have An ≥ An−1 a.s. Next we prove the uniqueness. Suppose (Mn , An ) is such a decomposition, then
E(Xn |Fn−1 ) = Mn−1 + An .
(2.18)
From A0 = 0 and (2.16) we see that A0 and M0 are uniquely determined. Suppose that An−1 and Mn−1 are uniquely determined. Then An is uniquely determined by equation (2.18), and Mn by equation (2.16). Thus, the uniqueness follows by induction. Next, we consider the decomposition of a continuous-time submartingale. Definition 2.21 {At }t≥0 is an integrable increasing process if A0 = 0, t → At is right-continuous and increasing a.s. and
E(At ) < ∞,
∀ t ≥ 0.
An increasing process At is natural if it “almost” has no common jumps with any bounded martingale. Namely, for any bounded martingale mt , we have E ms As = 0, s≤t
where As = As − As− is the jump of A at s. Definition 2.22 An integrable increasing process At is natural if for every bounded martingale mt , t t E ms dAs = E ms− dAs 0
holds for every t ≥ 0.
0
2.2
Doob–Meyer decomposition
The following proposition gives a useful equivalent definition of the natural increasing process. Proposition 2.23 An integrable increasing process At is natural if and only if for every bounded martingale {mt }, t E(mt At ) = E ms− dAs 0
holds for every t ≥ 0. Proof Since {mt } is bounded and right-continuous, and {At } is integrable, it follows from the dominated convergence theorem that t E ms dAs 0
n−1
= E lim
n→∞
⎛ = lim ⎝ ⎛ = lim ⎝ n→∞
n
k=0
n−1
n→∞
m (k+1)t A (k+1)t − A kt
E m (k+1)t A (k+1)t − n
k=0 n
k=1
n
n
n
E m kt A kt − n
n
n−1
⎞ E E m (k+1)t A kt F kt ⎠ n
k=0 n−1 k=0
n
n
⎞
E m kt A kt ⎠ n
n
= E(mt At ); the third equality follows from the martingale property of mt .
Now we are ready to state the continuous-time counterpart of Theorem 2.20. Theorem 2.24 (Doob–Meyer decomposition) If {Xt }t≥0 is a submartingale of class (DL), then it is expressible uniquely as Xt = Mt + At , where At is an integrable natural increasing process and Mt is a martingale. Proof “Uniqueness”. Suppose that Xt = Mt − At = Mt − At are two such decompositions. Then At − At = Mt − Mt
27
28
2 : Brownian motion and martingales
is a martingale. Therefore, for any bounded martingale mt , we have
E(mt (At − At )) t =E ms− d(As − As ) 0
= lim E n→∞
= lim E n→∞
n−1 k=0 n−1 k=0
A (k+1)t − A (k+1)t
m kt n
n
m kt n
n
M (k+1)t − M (k+1)t n
− A kt − A k
n
n
− M kt − M k n
n
n
= 0. For any bounded random variable ξ , let mt = E(ξ |Ft ). Then
E(ξ At ) = E(E(ξ |Ft )At ) = E(E(ξ |Ft )At ) = E(ξ At ). Hence, for any t ≥ 0 fixed, At = At a.s. By the right-continuity of A and A , we see that A = A a.s. “Existence”. By the uniqueness, we only need to construct the decomposition on [0, T]. Set Yt = Xt − E(XT |Ft ). Then Yt is a non-positive submartingale with YT = 0. We only need to prove the Doob–Meyer decomposition for {Yt }0≤t≤T . Let tjn = 2jTn . As {Ytjn , j = 0, 1, 2, . . . , 2n } is an Ftjn -adapted submartingale with YT = 0, it follows from Theorem 2.20 that 0 = YT = MTn + AnT . Taking conditional expectation, we get 0 = E MTn + AnT Ftjn = Mtnn + E AnT Ftjn . j
This implies that
Mtnn = −E AnT Ftjn , j
and hence, Ytjn = −E(AnT |Ftjn ) + Antn , j
(2.19)
2.2
Doob–Meyer decomposition
n -measurable. Assume for the moment where An0 = 0, Antn ≤ Antn , Antn is Ftj−1 j
j+1
j
that the family {AnT }n≥1 is uniformly integrable, which will be shown in n Lemma 2.25 below. Then there is a subsequence nk such that ATk converges to a random variable AT in the weak topology of L1 (): For any bounded n random variable ξ , E(ATk ξ ) → E(AT ξ ). Denote by Mt a right-continuous version of the uniformly integrable martingale (E(AT |Ft ))0≤t≤T and let At = Yt − Mt . Then {At } is right-continuous. Let i ≤ j. For any n0 > 0, n
n
Yt n0 + E(ATk |Ft n0 ) ≤ Yt n0 + E(ATk |Ft n0 ), i
i
j
j
and hence by taking k → ∞, we have At n0 ≤ At n0 . Therefore, {At } is i
n
j
increasing on {ti 0 : n0 ≥ 1, i = 0, 1, . . . , 2n0 }, and thus on all [0, T]. Finally, we prove that {At } is natural. Let mt be a non-negative, bounded, right-continuous martingale. By the dominated convergence theorem,
E
0
T
ms− dAs = lim
n −1 2
n→∞
= lim
i=0 n −1 2
E mtin Antn − E mtin Antn )
n→∞
= lim
n E mtin (Ati+1 − Atin )
i+1
i=0 n −1 2
n→∞
i
n n n A n n E mti+1 − m A n ti t t
i=0
i+1
i
= E(mT AT ), where the penultimate equality follows from the fact that Antn
i+1
measurable. Hence At is natural.
is Ftin
Lemma 2.25 The family of random variables {AnT }n≥1 is uniformly integrable. Proof By equation (2.19) and the predictability of {Antn : j = 0, 1, . . . , 2n }, j
it is easy to show that Antn = k
k−1
n |Ft n ) − Yt n . E(Ytj+1 j j
j=0
29
30
2 : Brownian motion and martingales
Let c > 0 be fixed and n : Antn > c} σcn = inf{tk−1 k
with the convention that the infimum over an empty set is T. Then σcn ∈ ST and Anσcn ≤ c. By the optional sampling theorem and equation (2.19), we have Yσcn = Anσcn − E(AnT |Fσcn ). Hence,
E(AnT 1AnT >c ) = −E(Yσcn 1σcn
(2.20)
Note that
E(Anσcn 1σcn
≤
2E((AnT
n
n 1σ n
It then follows from equation (2.20) that n 1σ n c ) ≤ −E(Yσcn 1σcn
Note that sup E(Yσcn 1σcn c ) + sup E(Yσcn 1|Yσ n |≤c , σcn
c
n
c
n
≤ sup E(Yσcn 1|Yσ n |>c ) + c sup P(σcn < T). c
n
n
n } and {Yσ n } are uniformly integrable and As {Yσc/2 c
P(σcn < T) = P(AnT > c) ≤
1 1 EAnT ≤ EY0 → 0 c c
as c → ∞, uniformly in n, we have lim sup E(AnT 1AnT >c ) = 0.
c→∞ n
This then proves the uniform integrability.
2.2
Doob–Meyer decomposition
Sometimes, we need At to be a continuous process. To this end, we need to assume that EXσ is continuous in stopping time σ . Definition 2.26 A submartingale Xt is regular, if for any T > 0 and for any σn ∈ ST increasing to σ , we have E(Xσn ) → E(Xσ ). Theorem 2.27 Let Xt be a regular submartingale of class (DL). Then At in the Doob–Meyer decomposition is continuous a.s. Proof Suppose that σn ∈ ST increasing to σ , then E(Aσn ) ↑ E(Aσ ) and hence, Aσn ↑ Aσ a.s. Set tjn = 2jTn . For c > 0, we define n ∧ c|Ft ), Ant = E(Atj+1
n t ∈ (tjn , tj+1 ].
n ] and A is a natural Since Ant is a martingale on the interval (tjn , tj+1 t increasing process, it is easy to show that t t n Ans− dAs , ∀t ∈ [0, T]. (2.21) E As dAs = E 0
0
n
Next we show that there exists a subsequence nk such that At k → At ∧ c uniformly in t ∈ [0, T] so we can pass to the limit in the above equality. For > 0, we define σn, = inf{t ∈ [0, T] : Ant − At ∧ c > }, n for t ∈ (t n , t n ]. Then with the convention that inf ∅ = T. Let πn (t) = tj+1 j j+1 n σn, , πn (σn, ) ∈ ST . Since At is decreasing in n, σn, is increasing in n. Let σ = limn→∞ σn, . Then σ ∈ ST and limn→∞ πn (σn, ) = σ . By the optional sampling theorem,
E(Anσn, ) = E(Aπn (σn, ) ∧ c), and hence, P(σn, < T) ≤ −1 E(Anσn, − Aσn, ∧ c) = −1 E(Aπn (σn, ) ∧ c − Aσn, ∧ c) → 0,
as n → ∞.
Therefore, lim P
n→∞
sup |Ant − At ∧ c| >
t∈[0,T]
= 0.
31
32
2 : Brownian motion and martingales
Hence, there exists a subsequence nk such that lim
n
sup |At k − At ∧ c| = 0,
nk →∞ t∈[0,T]
Thus, by equation (2.21), we have T E As ∧ cdAs = E
T
0
0
a.s.
As− ∧ cdAs .
Hence, 0=E
T
0
(As ∧ c − As− ∧ c)dAs ≥ E
(As ∧ c − As− ∧ c)(As − As− ).
s≤T
This implies the continuity of s → As ∧ c. Since c is arbitrary, we have the continuity of At .
2.3
Meyer’s processes
In this section, we introduce Meyer’s process for each square-integrable martingale. This process will play an important role in the definition of the stochastic integral in the next chapter. Definition 2.28 A martingale {Mt }t≥0 is a square-integrable martingale (denoted by M ∈ M2 ) if
E(Mt2 ) < ∞,
∀ t ≥ 0.
If M is continuous, then we write M ∈ M2,c . Lemma 2.29 If {Mt }t≥0 is a right-continuous square-integrable martingale, then Mt2 is a right-continuous submartingale of class (DL). Proof By Jensen’s inequality, Mt2 is a submartingale. By Theorem 2.15, we have
E
sup Mt2
Hence, for c → ∞, we have sup E
σ ∈ST
≤ 4E(MT2 ) < ∞.
0≤t≤T
Mσ2 1Mσ2 ≥c
Hence, Mt2 is in class (DL).
≤E
sup
0≤t≤T
Mt2 1sup 2 0≤t≤T Mt ≥c
→ 0.
2.3
Meyer’s processes
Applying the Doob–Meyer decomposition, there exists a unique natural increasing process At such that Mt2 − At is a martingale. We shall denote At by Mt , which is called Meyer’s process of Mt . Finally, we consider Meyer’s process between two martingales. Definition 2.30 For M, N ∈ M2 , the stochastic process
M, Nt =
1 ( M + Nt − M − Nt ) 4
is called Meyer’s process of Mt and Nt . Sometimes, we need to define Meyer’s process for a more general class of stochastic processes. Definition 2.31 A real-valued process {Mt }t∈R+ is a local martingale if there exists a sequence of stopping times τn increasing to ∞ almost surely such that ∀ n, Mtn ≡ Mt∧τn is a martingale. We denote the collection of all continuous local martingales by Mcloc , and all continuous locally square2,c
integrable martingales by Mloc .
Remark 2.32 Let Mt be a continuous local martingale. Define σn (ω) = inf{t : |Mt (ω)| ≥ n}, with the convention that inf ∅ = ∞. Then ∀ n, Mtn ≡ Mt∧σn is a bounded continuous martingale. Theorem 2.33 Let Mt be a continuous local martingale. Then there exists a unique continuous increasing process At with A0 = 0 such that Mt2 − At is a local martingale. We shall denote At by Mt . Proof Let Mtn be given as in Remark 2.32. Let Ant = Mn t . The continuous n+1 martingale Mt∧σ has Meyer’s process An+1 t∧σn . However, n n+1 Mt∧σ = Mt∧σn ∧σn+1 = Mt∧σn = Mtn , n
which has Meyer’s process Ant . Hence, n An+1 t∧σn = At ,
∀t.
Define At = Ant ,
t ≤ σn .
Then A0 = 0 and At is a continuous increasing process and At∧σn = Ant .
33
34
2 : Brownian motion and martingales 2 Since Mt∧σ = (Mtn )2 , it is clear that Mt2 − At is a local martingale with n localizing stopping times {σn }. The uniqueness of At follows from that of the process Ant .
As a consequence of Theorem 2.18, we have Corollary 2.34 Let Xt be a continuous local martingale and (σt )t≥0 be an ˜ t = Xσ increasing family of right-continuous bounded stopping times. Let X t and F˜ t = Fσt , ∀ t ≥ 0. Suppose that X is constant on [σt− , σt ], for any t. ˜ t , F˜ t ) is a continuous local martingale with Then (X ! " ˜ = Xσ . X t t
Proof Let τn be a localizing stopping-time sequence for X. Let τ˜n = inf{t : σt ≥ τn }. For any s > 0, we have {τ˜n ≤ t} ∩ {σt ≤ s} = {τn ≤ σt ≤ s} ∈ Fs . Then, {τ˜n ≤ t} ∈ Fσt = F˜ t , and hence, τ˜n is an F˜ t -stopping time. As τn ∈ [στ˜n − , στ˜n ], we have ˜ t∧τ˜ = Xσ ∧σ = Xσ ∧τ X t t n n τ˜n is an F˜ t -martingale.
2.4
Brownian motion
Brownian motion is the simplest and the most useful square-integrable martingale. In a sense, stochastic analysis is a branch of mathematics that studies the functionals of Brownian motions. Definition 2.35 A d-dimensional continuous process Xt is a Brownian motion if X0 = 0, for any t > s, Xt −Xs is independent of Fs and Xt −Xs has a multivariate normal distribution with mean zero and covariance matrix (t − s)Id , where Id is the d × d identity matrix. The next theorem shows that Meyer’s process for Brownian motion is tId . The converse of this theorem is also true and will be proved in the next chapter.
2.4
Brownian motion
Theorem 2.36 Suppose that Xt = (Xt1 , Xt2 , . . . , Xtd ) is a d-dimensional j Brownian motion. Then Xt , j = 1, 2, . . . , d, are square-integrable martingales satisfying ! " X j , X k = δjk t. (2.22) t
Proof It is easy to show that Xti is a square-integrable martingale. We only prove (2.22). For t > s, we have j
E(Xt Xtk − δjk t|Fs ) j
j
j
= E((Xt − Xs )(Xtk − Xsk )|Fs ) + Xs Xsk − δjk t j
j
j
+ E(Xsk (Xt − Xs ) + Xs (Xtk − Xsk )|Fs ) j
= δjk (t − s) + Xs Xsk − δjk t j
= Xs Xsk − δjk s. j
Therefore, Xt Xtk − δjk t is a martingale. This proves (2.22).
35
3
Stochastic integrals and Itô’s formula
In this chapter, we define stochastic integrals and introduce some stochastic analysis results that are essential in the later chapters. This chapter is organized as follows: In Section 3.1, we define predictable processes. Intuitively, the set of predictable processes is the “closure” of the set of left-continuous processes. In Section 3.2, we give the definition of the stochastic integral with respect to a square-integrable martingale. Meyer’s process, defined in Chapter 2, plays a key role. We prove Itô’s formula in Section 3.3. This formula is very important in stochastic analysis, just like the chain rule in the ordinary analysis. In Section 3.4, we show that the Brownian motion is uniquely characterized by its Meyer process. We also give some representation theorems for square-integrable martingales in terms of Brownian motions. In Section 3.5, we present Girsanov’s formula for the change of probability measures. Finally, in Section 3.6, we study the Stratonovich integral. The advantage of this integral is that Itô’s formula based on it coincides with the chain rule in calculus, which is easier to use than Itô’s formula based on Itô’s integral.
3.1
Predictable processes
Let L be the collection of all measurable maps X : (R+ × , B (R+ ) ⊗ F ) → (R, B (R)), such that ∀ t ≥ 0, Xt : → R is Ft -measurable and, for each ω, t → Xt (ω) is left-continuous. Let P = σ X −1 (B) : B ∈ B (R), X ∈ L , where X −1 (B) = {(t, ω) ∈ R+ × : Xt (ω) ∈ B} .
3.2
Stochastic integral
Namely, P is the smallest σ -field on (R+ ×, B (R+ )⊗ F ) such that ∀ X ∈ L, X : (R+ × , P ) → (R, B (R)) is measurable. Definition 3.1 A stochastic process X = (Xt (ω)) is predictable if the mapping X : (R+ × , P ) → (R, B (R)) is measurable. Example 3.2 Let 0 = t0 < t1 < · · · < tn . Define the simple process Xt (ω) = X0 (ω)1{0} (t) +
n−1
Xj (ω)1(tj ,tj+1 ] (t).
j=0
If Xj is Ftj -measurable, j = 0, 1, . . . , n, then X is predictable. The following lemma gives a useful alternative description of the predictable σ -field P . Lemma 3.3 The σ -field P is generated by all sets of the form = (u, v]×B, where B ∈ Fu , or = {0} × B, where B ∈ F0 . Proof Let G be the collection of all sets of the form = (u, v] × B, where B ∈ Fu , or = {0} × B, where B ∈ F0 . For every ∈ G , it is easy to see that 1 ∈ L and hence G ⊂ P . This implies that σ (G ) ⊂ P , where σ (G ) is the σ -field generated by G . On the other hand, for each X ∈ L, we define 2
Xtn (ω)
= X0 (ω)1{0} (t) +
n
Xj/n (ω)1(jn−1 ,( j+1)n−1 ] (t).
j=0
It is clear that X n is σ (G )-measurable and Xtn (ω) → Xt (ω) for each t ≥ 0 and ω ∈ . Hence, X is σ (G )-measurable. This implies that P ⊂ σ (G ). Therefore, P = σ (G ).
3.2
Stochastic integral
Denote by L0 the collection of all simple predictable processes ft of the form ft (ω) =
n−1
fj (ω)1(tj ,tj+1 ] (t),
j=1
where 0 ≤ t0 < · · · < tn , fj is a bounded Ftj -measurable random variable.
37
38
3 : Stochastic integrals and Itô’s formula
Let M ∈ M2,c be fixed. For f ∈ L0 , we define the Itô stochastic integral as fs dMs =
I(f ) ≡
n−1
fj (Mtj+1 − Mtj ).
(3.1)
j=1
Proposition 3.4 The stochastic integral satisfies the following identities: For every f ∈ L0 , we have
E
fs dMs
= 0,
and
E
2 fs dMs
=E
fs2 d Ms .
Proof The first equality follows from
E
fs dMs
=
n−1
E(fj (Mtj+1 − Mtj ))
j=1
=
n−1
E(fj E(Mtj+1 − Mtj |Ftj ))
j=1
= 0.
(3.2)
To prove the second equality, we note that 2
fs dMs
=
n−1
fj2 (Mtj+1 − Mtj )2
j=1
+2
fj fk (Mtj+1 − Mtj )(Mtk+1 − Mtk )
0≤j
≡ I1 + I2 . Similar to equation (3.2), we have E(I2 ) = 0. On the other hand,
E(I1 ) =
n−1 j=1
E fj2 E (Mtj+1 − Mtj )2 Ftj
3.2
=
n−1
Stochastic integral
E fj2 Mtj+1 − Mtj
j=1
fs2 d Ms .
=E
To extend the definition of the stochastic integral to more general f , for M ∈ M2,c , we define a measure νM on (R+ × , P ) by νM (A) = E 1A (t, ω)d Mt . By Lemma 3.3, it is easy to show that L0 is a dense subspace of L2 (νM ). The following theorem follows from Proposition 3.4 directly. Theorem 3.5 The mapping I : L0 → L2 (, F , P) defined by equation (3.1) is a linear isometry. Namely, for f , g ∈ L0 and α, β ∈ R, we have I(αf + βg) = αI(f ) + βI(g), and
E |I(f )|2 =
R+ ×
a.s.
|f (t, ω)|2 νM (dtdω).
As a consequence, it can be extended uniquely to a linear isometry from L2 (νM ) into L2 (, F , P). We still denote the extension by I(f ) = fs dMs . We then define the stochastic integral as a process t It (f ) ≡ fs dMs ≡ fs 1[0,t] (s)dMs . 0
Theorem 3.6 The stochastic process It ( f ) is a continuous square-integrable martingale with Meyer’s process t fs2 d Ms . I( f ) t = 0
Proof We only need to prove the theorem for t ≤ T with T being fixed. Let {f n } be a sequence of simple predictable processes such that |fsn | ≤ |fs |,
(3.3)
39
40
3 : Stochastic integrals and Itô’s formula
and
E
T 0
(fsn − fs )2 d Ms < 2−n .
By the definition of the stochastic integral, we see that ∀ t ∈ [0, T], E |It (f n ) − It (f )|2 → 0.
(3.4)
It is easy to verify that It (f n ) and t n 2 (fsn )2 d Ms It (f ) − 0
are martingales. It then follows from equations (3.4) and (3.3) that It (f ) and t fs2 d Ms It (f )2 − 0
are martingales. Therefore, It (f ) is a square-integrable martingale with Meyer’s process t I(f ) t = fs2 d Ms . 0
From Theorem 2.15, we get
1 n P sup It (f ) − It (f ) > n 0≤t≤T 2 ≤ n2 E IT (f n ) − IT (f ) T (fsn − fs )2 d Ms < n2 2−n , = n2 E 0
which is summable. By Borel–Cantelli’s lemma, we have 1 n P sup It (f ) − It (f ) > , infinitely often = 0. n 0≤t≤T Hence,
sup It (f n ) − It (f ) → 0,
a.s.
0≤t≤T
As It ( f n ) are continuous, It (f ) is continuous a.s.
2,c
Finally, we give the definition of the stochastic integral when M ∈ Mloc .
3.3
Itô’s formula
2,c
Definition 3.7 For M ∈ Mloc , let L2loc (M) be the collection of all real-valued predictable processes f such that there exists a sequence of stopping times σn ↑ ∞ a.s. and
E
T∧σn
0
ft2 d Mt
< ∞,
∀ T > 0, n ∈ N.
(3.5)
It is clear that we may choose σn in Definition 3.7 such that ∀ n ∈ N, Mtσn ≡ Mt∧σn is a square-integrable martingale and equation (3.5) is satisfied. Define Itn (f ) = It (1(0,σn ] f ). For m < n, it is easy to verify that n (f ). Itm (f ) = It∧σ m
Therefore, there exists a unique stochastic process It (f ) such that Itn (f ) = It∧σn (f ). Definition 3.8 It (f ) is called the stochastic integral of f ∈ L2loc (M) with
t 2,c respect to M ∈ Mloc . We also write 0 fs dMs for It (f ).
3.3
Itô’s formula
In this section, we derive Itô’s formula for a function of a semimartingale. This formula is the counterpart in stochastic analysis of the chain rule in ordinary calculus. Definition 3.9 A d-dimensional process Xt is a continuous semimartingale if Xt = X0 + Mt + At , where M1 , . . . , Md are continuous local martingales and A1 , . . . , Ad are continuous finite-variation processes. Before we state Itô’s formula, we need to introduce the following notations. Let Cb2 (Rd ) be the collection of all bounded differentiable functions with bounded partial derivatives up to order 2. We denote the partial derivative of a function F with respect to its ith variable by ∂i F. Similarly, we 2F by ∂ij2 F. denote ∂x∂ i ∂x j
41
42
3 : Stochastic integrals and Itô’s formula
Theorem 3.10 (Itô’s formula) Let Xt be a d-dimensional continuous semimartingale and let F ∈ Cb2 (Rd ). Then F(Xt ) = F(X0 ) +
d
t
0
i=1
∂i F(Xs )dMsi
+
d i=1
t 0
∂i F(Xs )dAis
d " ! 1 t 2 + ∂ij F(Xs )d Mi , Mj . s 2 0
(3.6)
i,j=1
Proof For simplicity of notations, we assume that d = 1. Let τn =
0 if |X0 | > n, inf {t : |Mt | > n or Var(A)t > n or Mt > n} if |X0 | ≤ n,
where Var(A)t is the total variation of A on [0, t]. It is clear that τn ↑ ∞ a.s. We only need to prove equation (3.6) with t replaced by t ∧ τn . In other words, we assume that |X0 |, |Mt |, Var(A)t and Mt are all bounded by a constant K and F ∈ C02 (R). Here, C02 (R) stands for the set of all functions that are in Cb2 (R) and are of compact supports. Let ti = itn , i = 0, 1, . . . , n. Then n (F(Xti ) − F(Xti−1 ))
F(Xt ) − F(X0 ) =
i=1 n
=
F (Xti−1 )(Xti − Xti−1 )
i=1
1 + F (ξi )(Xti − Xti−1 )2 2 n
≡
I1n
i=1 + I2n ,
(3.7)
where ξi is between Xti−1 and Xti . Note that as n → ∞, I1n
=
n
→
i=1 t 0
F (Xti−1 )(Mti − Mti−1 ) + F (Xs )dMs +
n
F (Xti−1 )(Ati − Ati−1 )
i=1
t 0
F (Xs )dAs .
(3.8)
3.3
Itô’s formula
On the other hand, 2I2n =
n
F (ξi )(Mti − Mti−1 )2
i=1
+2
n
F (ξi )(Mti − Mti−1 )(Ati − Ati−1 )
i=1
+ ≡
n
F (ξi )(Ati − Ati−1 )2
i=1 n n I21 + I22
n + I23 .
(3.9)
Since At is continuous and of finite variation and M is continuous, it is easy n → 0 and I n → 0. to show that I22 23 Let Vkn
k = (Mti − Mti−1 )2 ,
k = 1, 2, . . . , n.
i=1
Then,
E
(Vnn )2
=
n
E(Mti − Mti−1 )4
i=1
+2
E E (Mtj − Mtj−1 )2 |Ftj−1 (Mti − Mti−1 )2
1≤i<j≤n
≤ 4K2
n
E(Mti − Mti−1 )2
i=1
+2
E
Mtj − Mtj−1 (Mti − Mti−1 )2
1≤i<j≤n
≤ 4K2 E(Vnn ) + 2K
E (Mti − Mti−1 )2
1≤i
≤ (4K
2
+ 2K)E(Vnn )
# ≤ (4K2 + 2K) E (Vnn )2 . Hence, E (Vnn )2 ≤ (4K2 + 2K)2 .
43
44
3 : Stochastic integrals and Itô’s formula
Let n
I3n =
F (Xti−1 )(Mti − Mti−1 )2 ,
i=1
and I4n =
n
F (Xti−1 ) Mtj − Mtj−1 .
i=1
Then,
E(|I3n
2 n − I21 |)
≤E
max |F (ξi ) − F (Xti−1 )|
1≤i≤n
2
E (Vnn )2 → 0, (3.10)
and
I4n →
Finally,
t
0
F (Xs )d Ms .
(3.11)
E |I3n − I4n |2 =E
n i=1
2 F (Xti−1 )2 (Mti − Mti−1 )2 − Mti − Mti−1
n 2 2 ≤ F ∞ E (Mti − Mti−1 )4 + Mti − Mti−1
$
i=1
→ 0.
(3.12)
Combining equations (3.10), (3.11) and (3.12), we see that t n F (Xs )d Ms . I21 →
(3.13)
0
Equation (3.6) then follows from equations (3.7), (3.8), (3.9) and (3.13). As an application of Itô’s formula, we prove that for continuous squareintegrable martingales, Meyer’s processes coincide with the quadratic variation process. This point of view will be useful in Chapter 11. 2,c
Theorem 3.11 Suppose that M ∈ Mloc . Let 0 = t0n < t1n < · · · < tnn = t be such that n max (tjn − tj−1 ) → 0.
1≤j≤n
3.3
Itô’s formula
Then, lim
n→∞
n 2 n ) = M . (Mtjn − Mtj−1 t j=1
Proof Note that n 2 n ) (Mtjn − Mtj−1 j=1
=
n
n tj−1
j=1
=2
t 0
tjn
n )dMs + M n − M n 2(Ms − Mtj−1 tj tj−1
Ms dMs − 2
n
n (Mt n − Mt n ) + M Mtj−1 t j j−1
j=1
→ Mt , where the first step follows from Itô’s formula, and the last step follows from the definition of the stochastic integral. As another application of Itô’s formula, we derive the Burkholder– Davis–Gundy inequality in this section. We recall Doob’s inequality (Theorem 2.15): p p p E max |Xs | ≤ E(|Xt |p ). s≤t p−1 Suppose that X0 = 0. As Xs2 − Xs is a martingale, we have E |Xt |2 = E ( Xt ), and Doob’s inequality becomes 2 (3.14) E max |Xs | ≤ 4E ( Xt ) . s≤t
Equation (3.14) is called the Burkholder–Davis–Gundy inequality. It also holds for general p ≥ 1. Since in this book we will only need the case of p ≥ 2, whose proof is much easier than other cases, we will only state this case in the next theorem. Theorem 3.12 (Burkholder–Davis–Gundy inequality) Suppose that p ≥ 2 and X ∈ M2,c satisfying X0 = 0. Then there exists a constant Kp such that p p 2 (3.15) E max |Xs | ≤ Kp E Xt . s≤t
45
46
3 : Stochastic integrals and Itô’s formula
Proof Since |x|p ∈ C 2 for p ≥ 2, it follows from Itô’s formula that t p(p − 1) t |Xt |p = p|Xs |p−2 Xs dXs + |Xs |p−2 d Xs . 2 0 0 Taking expectation on both sides, it follows from Hölder’s inequality that t p(p − 1) p E |Xt | = E |Xs |p−2 d Xs 2 0 p(p − 1) p−2
Xt ≤ E max |Xs | s≤t 2 p−2 2 p p p p(p − 1) p 2 ≤ E max |Xs | E Xt . s≤t 2 Combining with Doob’s inequality, we get
E max |Xs | s≤t
p
p ≤ p−1
p
p−2 2 p p p p(p − 1) p 2 E max |Xs | E Xt . s≤t 2
The inequality equation (3.15) then follows easily.
3.4
Martingale representation in terms of Brownian motion
In this section, we make use of Itô’s formula to show that the Brownian motion is characterized by its Meyer process. Then, as consequences of this result, we present some representation theorems for square-integrable martingales in terms of Brownian motions. Recall that ξ ∗ is the transpose of a vector or a matrix ξ . 2,c
Theorem 3.13 Suppose that Xt = (Xt1 , . . . , Xtd )∗ is such that X j ∈ Mloc , X0 = 0 and ! " X j , X k = δjk t, j, k = 1, 2, . . . , d. t
Then Xt is a d-dimensional Brownian motion. Proof Let ξ ∈ Rd . Applying Itô’s formula to the function exp (iξ ∗ x), we have t ∗ ∗ exp iξ Xt = exp iξ Xs + i exp iξ ∗ Xu ξ ∗ dXu −
1 2
s
s
t
|ξ |2 exp iξ ∗ Xu du.
3.4
Thus,
Martingale representation in terms of Brownian motion
E exp iξ Xt Fs = exp iξ ∗ Xs ∗ 1 t 2 |ξ | E exp iξ Xu Fs du. − 2 s
∗
By solving this integral equation, we get 1 E exp iξ ∗ (Xt − Xs ) Fs = exp − |ξ |2 (t − s) . 2 This implies that Xt −Xs is independent of Fs and Xt −Xs has a multivariate normal distribution with mean 0 and covariance matrix (t − s)Id . Namely, Xt is a d-dimensional Brownian motion. As an application of Theorem 3.13, we can represent any locally squareintegrable martingale as a time change of a Brownian motion. 2,c
Theorem 3.14 Suppose that M ∈ Mloc satisfying limt→∞ Mt = ∞ a.s. Let τt = inf{u : Mu > t}, and F˜ t = Fτt . Then Bt = Mτt is an (F˜ t )-Brownian motion. As a consequence, Mt has the following representation: Mt = B Mt . Proof We first prove that Bt is continuous. Note that the only possible case for Bt not being continuous is that τt has a jump and M is not constant over this jump. Suppose that τt has a jump at t0 . Denote r = τt0 − and r = τt0 . Then Mu = t0 for all u ∈ (r, r ); and M is not constant on the interval (r, r ). Therefore, we only need to show that (3.16) P { Mr = Mr } \ {Mu = Mr , ∀ u ∈ [r, r ]} = 0. Let σ = inf {s > r : Ms > Mr } . Then σ is a stopping time and hence, by Doob’s optional sampling theorem, Ns ≡ Mσ ∧(r+s) − Mr is a local martingale with respect to Fˆ s ≡ Fσ ∧(r+s) . Since
Ns = Mσ ∧(r+s) − Mr = 0, we have N = 0. This implies equation (3.16).
47
48
3 : Stochastic integrals and Itô’s formula
By Corollary 2.34,
Bt = Mτt = t. Therefore, Bt is a Brownian motion.
Next, we would like to remove the condition that limt→∞ Mt = ∞ a.s. To this end, we need to define the Brownian motion in an extended probability space. ˜ F˜ t ) is an extension of a stochastic ˜ F˜ , P, Definition 3.15 We say that (, ˜ → that is F˜ /F basis (, F , P, Ft ) if there exists a mapping π : −1 measurable such that i) F˜ t ⊃ π (Ft ); ii) P = P˜ ◦ π −1 and iii) for every bounded random variable X on , ˜ X( ˜ ω)| ˜ E( ˜ F˜ t ) = E(X|Ft )(π ω)
˜ ˜ P-a.s. (almost surely with respect to P),
˜ ω) ˜ by X if its meaning is ˜ We shall denote X where X( ˜ = X(π ω), ˜ for ω˜ ∈ . clear from the context. ˜ F˜ t ) is called a standard extension of the stochas˜ F˜ , P, The quadruple (, tic basis (, F , P, Ft ) if we have another stochastic basis ( , F , P , Ft ) such that ˜ F˜ t ) = (, F , P, Ft ) × ( , F , P , F ), ˜ F˜ , P, (, t ˜ and π ω˜ = ω for ω˜ = (ω, ω ) ∈ . 2,c
Theorem 3.16 For M ∈ Mloc , we define τt = inf{u : Mu > t}, with the convention that inf ∅ = ∞. Let Fˆ t = σ ∪s>0 Fτt ∧s . ˜ F˜ t ) of (, F , P, Fˆ t ) there exists an ˜ F˜ , P, Then, on an extension (, ˜ (Ft )-Brownian motion Bt such that Mt = B Mt . Proof By the optional sampling theorem, ∀ s ≥ s and u ≥ v,
E(Mτu ∧s |Fτv ∧s ) = Mτv ∧s , and
E((Mτu ∧s − Mτv ∧s )2 |Fτv ∧s ) = E( Mτu ∧s − Mτv ∧s |Fτv ∧s ).
3.4
Martingale representation in terms of Brownian motion
Therefore, s → Mτu ∧s is a square-integrable martingale with Meyer’s process Mτu ∧s . By the martingale convergence theorem (Theorem 2.9), ˜ u = lim Mτ ∧s B u s↑∞
exists a.s. Further, ∀ u ≥ v, we have
E(B˜ u |Fˆ v ) = B˜ v , and
E((B˜ u − B˜ v )2 |Fˆ v ) = E( Mτu − Mτv |Fˆ v ). Let ( , F , P , Ft ) be a stochastic basis and let Bt be a Brownian motion on . Define the standard extension by ˜ F˜ t ) = (, F , P, Fˆ t ) × ( , F , P , F ). ˜ F˜ , P, (, t Let ˜ t∧ M . Bt = Bt − Bt∧ M∞ + B ∞ Then Bt is a continuous F˜ t -martingale with Bt = t, and hence, a Brownian motion. The rest of the proof is easy. Next, we represent a square-integrable martingale as a stochastic integral with respect to a Brownian motion. Theorem 3.17 Let Mi ∈ M2,c , i = 1, 2, . . . , d. Let ij : R+ × → R, i, j = 1, 2, . . . , d be predictable processes such that !
i
M ,M
j
" t
=
t d 0
ik (s)jk (s)ds.
k=1
If det((s)) = 0 a.s.
∀ s,
(3.17)
then there exists a d-dimensional Brownian motion Bt (on the original stochastic basis) such that Mti =
d k=1
0
t
ik (s)dBks .
Proof For N > 0, let IN (s) = 1max1≤i,j≤d |( −1 )ij (s)|≤N ,
(3.18)
49
50
3 : Stochastic integrals and Itô’s formula
where −1 is the inverse matrix of . Define i,N
Bt
=
d
t
( −1 )ik (s)IN (s)dMsk ,
0
k=1
i = 1, 2, . . . , d.
Then, Bi,N ∈ M2,c and !
B
i,N
j,N
,B
" t
d
=
0
k,=1
= =
t
d
t
m=1 0 t 0
(
−1
)ik (s)(
−1
)j (s)IN (s)
d
km (s)jm (s)ds
m=1
δim δjm IN (s)ds
IN (s)dsδij .
So, i,N 2 E sup Bi,N − B ≤ 4 E t t 0≤t≤T
T 0
|IN (s) − IN (s)|2 ds → 0
as N, N → ∞. Therefore, Bi, N converges in M2,c to some Bi and ! " Bi , Bj = δij t. t
By Theorem 3.13, Bt = (B1t , . . . , Bdt ) is a d-dimensional Brownian motion. Note that d k=1
0
t
N ik (s)dBk, = s
t
0
IN (s)dMsi .
Taking N → ∞, we get the representation equation (3.18).
Next, we remove the condition equation (3.17). In this case, we need to construct the Brownian motion on an extension of the original stochastic basis. Theorem 3.18 Let Mi ∈ M2,c , i = 1, 2, . . . , d. Let ik : R+ × → R, i = 1, 2, . . . , d, k = 1, 2, . . . , r, be predictable processes such that !
Mi , Mj
" t
=
t r 0
k=1
ik (s)jk (s)ds,
i, j = 1, 2, . . . , d.
3.4
Martingale representation in terms of Brownian motion
˜ F˜ t ) of (, F , P, Ft ) there exists a ˜ F˜ , P, Then, on an extension (, r-dimensional Brownian motion Bt such that Mti =
r
t
0
k=1
ik (s)dBks ,
i = 1, 2, . . . , d.
(3.19)
Proof By taking Mti ≡ 0 or ik ≡ 0 if necessary, we may assume that d = r. Let ij (s) =
d
ik (s)jk (s).
k=1
Then (s) is a d × d non-negative definite matrix. Let 1
˜ (s) = lim (s) 2 ((s) + Id )−1 , ↓0
where Id is the d × d identity matrix. By diagonalization of the matrix (s), ˜ it is easy to see that (s) above is well defined. Let ER (s) be the projection matrix to the range of (s) and EN (s) = Id − ER (s). Then, 1
1
˜ ˜ 2 = (s) 2 (s) = ER (s). (s)(s) 1
First, we assume that (s) = (s) 2 . Let Bt be a d-dimensional Brownian motion on a stochastic basis ( , F , P , Ft ) and let ˜ F˜ t ) = (, F , P, Ft ) × ( , F , P , F ). ˜ F˜ , P, (, t Define Bit =
d k=1
0
t
d
˜ ik (s)dMsk +
k=1
t 0
EN (s)ik dBk s .
Then Bi , Bj t = δij t and hence, Bt is a d-dimensional Brownian motion. Further, d k=1
t 0
ik (s)dBks =
d k,j=1
+
t
˜ kj (s)dMs ik (s) j
0
d k,j=1
0
t
j
ik (s)EN (s)kj dBs
51
52
3 : Stochastic integrals and Itô’s formula
=
d
t
0
j=1
= Mti −
j
ER (s)ij dMs
d
t
0
j=1
j
EN (s)ij dMs .
(3.20)
Note that % d j=1
· 0
& j EN (s)ij dMs
= 0. t
Combining this with equation (3.20), we see that equation (3.19) holds. For general (s), there exists an orthogonal-matrix-valued predictable 1 process P(s) such that (s) 2 = (s)P(s). By the previous step, we have r
Mti =
k=1
t
0
1
(s) 2 dBks .
Let ˜k = B s
d i=1
0
t
j
Pkj (s)dBs .
Then B˜ t is a d-dimensional Brownian motion and equation (3.19) holds ˜ with B replaced by B.
3.5
Change of measures
In this section, we investigate the following question: how do the martingales change under equivalent probability measures? First, we consider a sequence of non-negative local martingales. Under Novikov’s condition equation (3.23), it becomes a martingale and gives the Radon–Nickodym derivative between two probability measures. 2,c For X ∈ Mloc with X0 = 0, we define 1 E (X)t ≡ exp Xt − Xt . 2
Lemma 3.19 The positive-valued process E (X)t is a continuous local martingale.
3.5
Change of measures
Proof Applying Itô’s formula, we have t E (X)t = 1 + E (X)s dXs . 0
Hence E (X) is a continuous local martingale. Theorem 3.20 (Kazamaki) If E exp 12 Xt < ∞ for all t, then E (X)t is a
martingale. Proof Let {σn } be a sequence of stopping times increasing to infinity such that for each n, {E (X)t∧σn : t ≥ 0} is a martingale. For any bounded stopping time σ , it follows from Fatou’s lemma that
EE (X)σ ≤ lim inf EE (X)σ ∧σn = 1.
(3.21)
n→∞
Since E exp
1 2 Xt
< ∞ for all t, it is easy to show that for any T ≥ 0,
the family {Xt : t ≤ T} is uniformly integrable. Thus, Xt is a martingale. 1 By Jensen’s inequality, exp 2 Xt is a submartingale. Let a ∈ (0, 1). Then
E (aX)t = (E (X)t )a (Zt(a) )1−a , 2
2
(a)
where Zt = exp (aXt /(1 + a)). By the optional sampling theorem for a submartingale, for any σ ∈ ST , we have 0 ≤ Zσ(a) ≤ E ZT(a) |Fσ , (a)
and hence, {Zσ : σ ∈ ST } is uniformly integrable. Then, sup E E (aX)σ 1E (aX)σ >c σ ∈ST
2
≤ sup (EE (aX)σ )a σ ∈ST
1−a2 E Zσ(a) 1E (aX)σ >c
1−a2 ≤ sup E Zσ(a) 1E (aX)σ >c σ ∈ST
→ 0, as c → ∞, i.e. {E (aX)σ : σ ∈ ST } is uniformly integrable. Thus 2
1 = E(E (aX)σ ) ≤ (E(E (X)σ ))a
1−a2
E(ZT(a) )
.
(3.22)
53
54
3 : Stochastic integrals and Itô’s formula
Note that ZT(a) ≤ 1XT ≤0 + exp
1 XT 2
is uniformly integrable for a ∈ (0, 1). Then, as a ↑ 1, we have 1 (a) XT ∈ (0, ∞), EZT → E exp 2 and hence, 1−a2 EZT(a) → 1. Therefore, by equation (3.22), we get
EE (X)σ ≥ 1. Combining with equation (3.21), we get EE (X)σ = 1 for any bounded stopping time σ . Let s ≤ t and B ∈ Fs . Define t if ω ∈ / B, σ = s if ω ∈ B. It is easy to show that σ ∈ St , and hence 0 = EE (X)σ − 1 = E (E (X)t 1Bc + E (X)s 1B ) − 1 = EE (X)s 1B − EE (X)t 1B . This implies that E (X)t is a martingale.
The next theorem gives another condition for E (X)t to be a martingale that is easier to verify than that in Kazamaki’s theorem. 2,c
Theorem 3.21 (Novikov) If the stochastic process X ∈ Mloc satisfies the following Novikov condition: 1
Xt < ∞, ∀ t ≥ 0, (3.23) E exp 2 then {E (X)t }t≥0 is a continuous martingale. Proof Note that
1 exp Xt 2
1
Xt . = (E (X)t ) exp 4 1 2
3.5
Change of measures
It follows from the Cauchy–Schwartz inequality that
E exp
1 Xt 2
1
≤ (EE (X)t ) 2
1 2 1
Xt E exp < ∞. 2
The conclusion then follows from Kazamaki’s theorem.
Throughout the rest of this section, we assume that E (X)t defined above is a martingale with E (X)0 = 1. We define a probability measure Pˆ t on (, Ft ) by Pˆ t (A) = E(E (X)t 1A ),
∀ A ∈ Ft .
Then ∀ t > s, we have Pˆ t |Fs = Pˆ s . In fact, ∀ A ∈ Fs , Pˆ t (A) = E(E(E (X)t 1A |Fs )) = E(E (X)s 1A ) = Pˆ s (A). We assume that F = σ ∪t≥0 Ft . ˆ F = Then there exists a unique probability measure Pˆ on (, F ) such that P| t ˆPt . Denote Pˆ by E (X) · P. The following theorem gives a formula for the conditional expectation of a random variable under a change of measure. Theorem 3.22 (Bayes’ formula) Suppose that ξ is an integrable random variable on (, F , P) and G is a sub-σ -field of F . Let Q >> P be another dP probability measure such that M = dQ . Then
E(ξ |G ) =
EQ (ξ M|G ) , EQ (M|G )
(3.24)
where EQ refers to the expectation with respect to the probability measure Q. Proof For any A ∈ G , we have A
EQ (ξ M|G ) EQ (ξ M|G ) Q 1A Q dP = E M EQ (M|G ) E (M|G ) EQ (ξ M|G ) Q Q E (M|G ) =E 1A Q E (M|G ) = EQ (1A ξ M).
55
56
3 : Stochastic integrals and Itô’s formula
On the other hand, A
E(ξ |G )dP = E(1A ξ ) = EQ (1A ξ M).
This proves the identity equation (3.24).
Finally, we state and prove the main theorem of this section that will be used extensively in this book. ˆ Denote the collection of all P-locally square-integrable martingales by 2,c ˆ Mloc . 2,c Theorem 3.23 (Girsanov’s transformation) i) If Y ∈ Mloc , then Y˜ defined by
Y˜ t = Yt − X, Yt
(3.25)
ˆ is a P-locally square-integrable martingale. 2,c ii) For Y 1 , Y 2 ∈ Mloc , let Y˜ 1 , Y˜ 2 be defined by equation (3.25). Then, Meyer’s processes satisfy the following identity: ! " ! " Y˜ 1 , Y˜ 2 = Y 1 , Y 2 , a.s. Proof i) First, we assume that Y˜ t is bounded. By Itô’s formula, d(E (X)t Y˜ t ) = Y˜ t d E (X)t + E (X)t d Y˜ t + d E (X), Yt = Y˜ t d E (X)t + E (X)t dYt . Therefore, E (X)t Y˜ t is a martingale and hence, by Bayes’ formula, ˜ Eˆ (Y˜ t |Fs ) = E(E (X)t Y˜ t |Fs )E (X)−1 s = Ys , ˆ (Y|G ) stands for the conditional expectation, given G , under the where E ˆ probability measure P. In general, we choose a sequence of increasing stopping times σn such ˆ 2,c , and so does Y. ˜ that ∀ n, Y˜ t∧σn is bounded. Therefore, Y˜ σn ∧· ∈ M loc Since the Meyer’s processes coincide with the corresponding quadratic variation processes, the proof of (ii) is trivial. Corollary 3.24 Suppose that Xt =
t 0
s∗ dBs ,
3.6
Stratonovich integral
where Bt is a d-dimensional Brownian motion and is a square-integrable Rd -valued predictable process. We assume that the Novikov condition is satisfied so that Pˆ is a probability measure. Then ˜ t = Bt − B
t
0
s ds
ˆ Ft ). is a d-dimensional Brownian motion on (, F , P, Proof Note that !
" Bi , X = ti . t
ˆ . As ˜i ∈ M Hence, B loc 2,c
!
˜ i , B˜ j B
" t
! " = Bi , Bj = δij t,
ˆ B˜ t is a d-dimensional Brownian motion under P.
3.6
Stratonovich integral
In the definition of Itô’s integral, the values of the integrand at the left points of each subinterval of a partition are taken in the definition of the approximating Riemann sum. Namely
t 0
fs dMs = lim
||→0
n−1
fti Mti+1 − Mti ,
i=0
where = {0 = t0 < t1 < · · · < tn = t} is a partition of [0, t] and = max0≤i
Mt is a square-integrable martingale and f ∈ L (νM ), the stochastic integral 0 fs dMs is also a martingale. The disadvantage of this definition of the stochastic integral is that Itô’s formula based on this integral is different from the chain rule, which is very convenient to use. To overcome this shortcoming of the Itô integral, Stratonovich modified the Riemann sum by taking the average of the values of the integrand at endpoints for each small interval.
57
58
3 : Stochastic integrals and Itô’s formula
Theorem 3.25 Let X and Y be two continuous semimartingales. Let be a partition 0 = t0 < t1 < · · · < tn = t of [0, t] such that || → 0. Then, the sum In ≡
n−1 i=0
1 Xti + Xti+1 Yti+1 − Yti 2
converges in probability to a random variable, denoted by Further, t t 1 Xs dYs + X, Yt . Xs ◦ dYs = 2 0 0
t 0
Xs ◦ dYs .
(3.26)
Proof Note that In =
n−1 i=0
1 n−1 Xti+1 − Xti Yti+1 − Yti . Xti Yti+1 − Yti + 2
(3.27)
i=0
By the definition of the Itô integral, the first term of equation (3.27) con t verges to 0 Xs dYs . By Theorem 3.11 and Definition 2.30, it is easy to show that the second term of equation (3.27) converges to 12 X, Yt . Thus, the conclusions of the theorem hold.
t Definition 3.26 The stochastic integral 0 Xs ◦dYs is called the Stratonovich integral. As a consequence of the theorem above, we have the following useful identities. For simplicity of notations, we shall use X ◦ dY for Xt ◦ dYt . Corollary 3.27 Suppose that X, Y, Z are semimartingales, then X ◦ (dY + dZ) = X ◦ dY + X ◦ dZ, (X + Y) ◦ dZ = X ◦ dZ + Y ◦ dZ, X ◦ (dY · dZ) = (X ◦ dY) · dZ = X · (dY · dZ), and (XY) ◦ dZ = X ◦ (Y ◦ dZ). Proof The first two equalities follow from the definition directly. By equation (3.26), we have Y ◦ dZ = YdZ +
1 d Y, Z . 2
3.6
Stratonovich integral
As Y, Z is of finite variation, its quadratic covariation with X is zero, and hence, X ◦ d Y, Z = Xd Y, Z . As
X, YdZ = Yd X, Z ,
we have
1 X ◦ (Y ◦ dZ) = X ◦ YdZ + d Y, Z 2 1 X ◦ d Y, Z 2 1 1 = XYdZ + Yd X, Z + Xd Y, Z 2 2 1 = XYdZ + d XY, Z 2 = (XY) ◦ dZ. = X ◦ (YdZ) +
This proves the last equality. The other equalities can be proved similarly. Finally, we give the counterpart of Itô’s formula in the present setup. This formula has the same form as the chain rule in calculus. Theorem 3.28 If X 1 , X 2 , . . . , X d are continuous semimartingales and f ∈ C 3 (Rd ), then Y = f (X 1 , . . . , X d ) is a semimartingale and dYt =
d
∂i f (Xt1 , . . . , Xtd ) ◦ dXti .
(3.28)
i=1
Proof Denote Xt = (Xt1 , . . . , Xtd ). By Itô’s formula, we get dYt =
d
∂i f (Xt )dXti +
i=1
d " ! 1 2 ∂ij f (Xt )d X i , X j . t 2 i,j=1
By Theorem 3.25, we have ∂i f (Xt )dXti = ∂i f (Xt ) ◦ dXti −
" 1 ! d ∂i f (X), X i . t 2
(3.29)
59
60
3 : Stochastic integrals and Itô’s formula
Applying Itô’s formula again, we get d∂i f (Xt ) =
d
j ∂ij2 f (Xt )dXt
j=1
d " ! 1 3 + ∂ijk f (Xt )d X j , X k . t 2 j,k=1
Thus, d ! ! " " d ∂i f (X), X i = ∂ij2 f (Xt )d X i , X j . t
j=1
t
(3.30)
The chain rule equation (3.28) follows from equations (3.29) and (3.30).
4
Stochastic differential equations
In many filtering problems, the signal Xt is a stochastic process governed by a stochastic differential equation (SDE). First, we derive this SDE intuitively. Suppose Xt is a continuous process taking values in Rd . Without noise, it should be governed by an ordinary differential equation of the form: dXt = b(Xt ), dt where b : Rd → Rd is a continuous map. In many real-world problems, there are (white) noises that perturb the signal. Namely, Xt is governed by the following SDE: dXt = b(Xt ) + σ (Xt )nt , dt where nt is an m-dimensional white noise and σ : Rd → Rd×m is a continuous mapping. It is well known that the white noise exists in the sense of generalized function only, while the accumulated process t ns ds Bt = 0
is an m-dimensional Brownian motion. Then Xt is governed by the following SDE: dXt = b(Xt )dt + σ (Xt )dBt ,
(4.1)
which is understood as Xt = X0 +
t 0
b(Xs )ds +
t 0
σ (Xs )dBs .
In this chapter, we study the existence and uniqueness for the solution to the SDE (4.1).
62
4 : Stochastic differential equations
4.1
Basic definitions
In this section, we introduce various meanings of the solution to equation (4.1). We shall introduce the weak solution, the strong solution as well as the martingale problem solution. For the uniqueness, we shall introduce uniqueness in law, pathwise uniqueness as well as the well posedness of the martingale problem. If Xt is a continuous Rd -valued process, then X : (, F ) → (Cd , B (Cd )) is a measurable mapping, where Cd = C(R+ , Rd ) is the collection of the continuous mappings from R+ to Rd . Then X induces a probability measure on Cd . We shall denote this measure by L(X) or P ◦ X −1 . Definition 4.1 i) A probability measure µ on Cd is a weak solution to equation (4.1) if there exists a stochastic process Xt and an m-dimensional Brownian motion Bt on a stochastic basis such that equation (4.1) holds and L(X) = µ. ii) We say that uniqueness of the weak solution for equation (4.1) holds if whenever X and X are two weak solutions to equation (4.1) with L(X0 ) = L(X0 ), then L(X) = L(X ). Sometimes, we need uniqueness of the solution at the level of each path. Definition 4.2 We say that pathwise uniqueness of solutions for equation (4.1) holds if whenever X and X are two solutions defined on the same stochastic basis with the same Brownian motion B such that X0 = X0 , then we have Xt = Xt , ∀ t ≥ 0, a.s. Sometimes, we need to construct solutions on a pre-specified stochastic basis with a given Brownian motion. Definition 4.3 i) A measurable functional F : Rd × Cm → Cd is a strong solution to equation (4.1) if for every Rd -valued random variable X0 and an m-dimensional Brownian motion B, X = F(X0 , B) satisfies equation (4.1). ii) We say that the uniqueness of the strong solution for equation (4.1) holds if for another solution X with the same initial X0 and the same Brownian B, we have X = F(X0 , B). To establish the relationship among weak and strong solutions, as well as the relationship among various versions of uniqueness, we demonstrate how to put two solutions of equation (4.1) on the same probability space, which might not be the case to begin with. Suppose that X and X are two solutions of the SDE (4.1) on stochastic bases ( , F , P , (Ft )) and ( , F , P , (Ft )) with initial random variables X0 and X0 (having the same distribution λ0 on Rd ) and Brownian motions B and B , respectively. Let λ and λ be the Borel probability measures
4.1
Basic definitions
on the Cartesian product Cd × Cm × Rd induced by (X , B , X0 ) and (X , B , X0 ), respectively. Define a mapping π : Cd × Cm × Rd → Cm × Rd by π(w1 , w2 , x) = (w2 , x). Then, λ ◦ π −1 = λ ◦ π −1 = PB ⊗ λ0 , where PB is the probability measure on Cm induced by a Brownian motion and PB ⊗ λ0 is the product measure of PB and λ0 on Cm × Rd . Let λw2 ,x (dw1 ) and λw2 ,x (dw1 ) be the regular conditional probability distribution of w1 given (w2 , x) with respect to λ and λ , respectively. This is possible since Cd is a Polish space. On the space = C d × C d × C m × Rd , we define a Borel probability measure λ by w3 ,x w3 ,x λ(A) = 1A (w1 , w2 , w3 , x)λ (dw1 )λ (dw2 ) × PB (dw3 )λ0 (dx)
(4.2)
for A ∈ B (). Then, it is easy to show that (w1 , w3 , x) and (X , B , X0 ) have the same distribution, and so do (w2 , w3 , x) and (X , B , X0 ). Let ιt be the operator of stopping a process at t. Namely, (ιt x)s = xs∧t , ∀ x ∈ Cd . Let d Bt (Cd ) = ι−1 t (B (C )).
Intuitively, for any solution Xt of equation (4.1), Xt should depend on {Bs : s ≤ t} only. Therefore, if A ∈ Bt (Cd ), then P(Xt ∈ A|B, X0 ) should be a functional of {X0 , Bs : s ≤ t}. The following lemma makes this intuition rigorous. Since the proof is quite technical, we suggest the reader skips it in the first reading. Lemma 4.4 For any A ∈ Bt (Cd ), we define two functions f1 and f2 f1 (w, x) = λw,x (A)
and
f2 (w, x) = λw,x (A).
Then f1 and f2 are measurable with respect to the completion of the σ -field Bt (Cm ) × B (Rd ) relative to the probability measure PB ⊗ λ0 . Proof We only prove the result for f1 . For fixed t > 0 and A ∈ Bt (Cd ), w,x let λt (A) be defined as λw,x (A) with λ replaced by its restriction to the sub-σ -field
Bt (Cd ) × Bt (Cm ) × B (Rd ).
63
64
4 : Stochastic differential equations w,x
Then, (w, x) → λt (A) is measurable with respect to the σ -field Bt (Cm ) × B (Rd ). Now, we only need to show that w,x
λt
(A) = f1 (w, x)
for PB ⊗ λ0 -a.s (w, x),
i.e. for any C ∈ B (Cm ) × B (Rd ), we have to show that w,x λt (A)PB (dw)λ0 (dx) = λ (A × C).
(4.3)
C
Define ρ : C([0, t], Rm ) × Cm → Cm by 1 ws ρ(w1 , w2 )s = 2 − w2 + w1 ws−t t 0
if s < t if s ≥ t.
Since ρ is a bijection and ρ, ρ −1 are continuous, we only need to prove equation (4.3) for C of the form C = {w ∈ Cm : ρ −1 w ∈ A1 × A2 } × D, where A1 ∈ B (C([0, t], Rm )), A2 ∈ B (Cm ) and D ∈ B (Rd ). As Brownian motions are of independent increments, PB ◦ ρ = P1 ⊗ P2 , where P1 and P2 are probability measures on C([0, t], Rm ) and Cm , respectively. Furtherw,x more, as λt (A) is Bt (Cm ) × B (Rd )-measurable, we can find a measurable function g in C([0, t], Rm ) × Rd such that λ t
w,x
(A) = g(ρ −1 (w)1 , x),
where ρ −1 (w)1 ∈ C([0, t], Rm ) is the first component of ρ −1 (w) in the product space C([0, t], Rm ) × Cm . Hence w,x λt (A)PB (dw)λ0 (dx) = g(u1 , x)P1 (du1 )P2 (du2 )λ0 (dx) A1 ×A2 ×D
C
=
A1 ×D
g(u1 , x)P1 (du1 )λ0 (dx)P2 (A2 ).
Let ˜ = {w ∈ Cm : (ρ −1 w)1 ∈ A1 } × D. C ˜ ∈ Bt (Cm ), and hence Then C w,x 1 1 g(u , x)P1 (du )λ0 (dx) = λ t (A)PB (dw)λ0 (dx) A1 ×D
˜ C
˜ = λ (A × C) = P X ∈ A, B |[0,t] ∈ A1 , X0 ∈ D .
4.1
Basic definitions
As P2 (A2 ) = P B (t + ·) − B (t) ∈ A2 , we have
w,x
C
λt
(A)PB (dw)λ0 (dx)
= P X ∈ A, B |[0,t] ∈ A1 , X0 ∈ D, B (t + ·) − B (t) ∈ A2 = P X ∈ A, (B , X0 ) ∈ C = λ (A × C).
This proves equation (4.3).
The next lemma identifies the Brownian motion in the new probability space (, B (), λ). Lemma 4.5 Let Bt be the completion of
Bt (Cd ) × Bt (Cd ) × Bt (Cm ) × B (Rd ) relative to the probability measure λ. Then w3 is a Brownian motion on the stochastic basis (, B , λ, Bt ). Proof First, we prove that w3 is of independent increments. To this end, we only need to show that for t ≥ s, Eλ {exp(i a, w3 (t) − w3 (s)Rm )|Bs } = Eλ {exp(i a, w3 (t) − w3 (s)Rm )}. Let A1 , A2 ∈ Bs (Cd ), A3 ∈ Bs (Cm ), A4 ∈ B (Rd ) and a ∈ Rm . Then we have Eλ {exp(i a, w3 (t) − w3 (s)Rm )1A1 ×A2 ×A3 ×A4 } exp(i a, w3 (t) − w3 (s)Rm)λw3 ,x (A1)λw3 ,x (A2)PB (dw3)λ0 (dx) = =
A3 ×A4
A3 ×A4
exp(i a, w3 (t) − w3 (s)Rm )f1 (w3 , x)f2 (w3 , x)PB (dw3 )λ0 (dx)
= Eλ exp(i a, w3 (t) − w3 (s)Rm )λ(A1 × A2 × A3 × A4 ), where f1 , f2 are defined in Lemma 4.4. Hence, w3 is of independent increments. Next, as the law of (w3 )t − (w3 )s under λ is the same as the law of Bt − Bs under P , we see that (w3 )t − (w3 )s has a multivariate normal distribution with mean zero and covariance matrix (t−s)Im . Therefore, w3 is a Brownian motion.
65
66
4 : Stochastic differential equations
The next lemma says that if a product probability measure P1 ⊗ P2 is supported on the diagonal of the product space, then the marginal probability measures must be the same and be degenerated. In other words, if two independent random variables are equal, then they have to be constant. Lemma 4.6 Let P1 and P2 be two probability measures on a Polish space X. If (P1 × P2 ){(x1 , x2 ) : x1 = x2 } = 1, there exists a unique x ∈ X such that P1 = P2 = δ{x} . Proof As
1=
P1 (dx)
1x=y P2 (dy) =
P1 (dx)P2 ({x}) ≤ 1,
(4.4)
we have P2 ({x}) = 1 for P1 − a.s. x. So, there exists a unique x such that P2 ({x}) = 1 and P1 = δx . After these preparations, we are now ready to state and to prove the main theorem of this section, which establishes the relationship between the pathwise uniqueness and the existence of a strong solution. Theorem 4.7 The equation (4.1) has a unique strong solution if and only if for every Borel probability measure µ0 on Rd , a weak solution µ of equation (4.1) exists with µ ◦ X0−1 = µ0 and the pathwise uniqueness of the solution holds. In this case, the weak uniqueness also holds. Proof If equation (4.1) has a unique strong solution, it is easy to verify that equation (4.1) has a weak solution and the pathwise uniqueness holds. We now prove the converse. Let X and X be two solutions of the SDE (4.1) (we can always take copies if necessary). From the arguments above, we see that (w1 , w3 , x) and (w2 , w3 , x) are two solutions of equation (4.1) on the same stochastic basis (, B , λ, Bt ). By the pathwise uniqueness, we have that λ(w2 = w1 ) = 1. By equation (4.2), we get λ(w2 = w1 ) = λw,x ⊗ λw,x (w2 = w1 )PB (dw)λ0 (dx), and hence, for PB ⊗ λ0 -a.s. (w, x), we have λw,x ⊗ λw,x (w1 = w2 ) = 1. By Lemmas 4.6 and (4.5), there exists a mapping F : Cm × Rd → Cd ,
(4.5)
4.2
Existence and uniqueness of a solution
such that λw,x = λw,x = δF(w,x) .
(4.6)
For any A ∈ Bt (Cd ), by equation (4.6), Lemma 4.4 and 1F −1 (A) (w, x) = λw,x (A), it follows that F −1 (A) is in the completion of Bt (Cm ) × B (Rd ) relative to PB ⊗ λ0 , and hence, F(w, x) is adapted. Then, for any Brownian motion B and initial random variable X0 , F(B, X0 ) is a solution of equation (4.1). The uniqueness of the strong solution follows directly from the pathwise uniqueness of equation (4.1).
4.2
Existence and uniqueness of a solution
In this section, we establish a unique solution to equation (4.1). The existence is established by the Picard approximation, while the uniqueness follows from Gronwall’s inequality. Suppose that the coefficients b and σ satisfy the following Lipschitz condition: There exists a constant K such that |b(x) − b(y)| + |σ (x) − σ (y)| ≤ K|x − y|,
∀x, y ∈ Rd .
(4.7)
Theorem 4.8 Under Condition equations (4.7), (4.1) has a unique strong solution. Proof We only need to prove the theorem for t ≤ T. First, we use Picard iteration to construct a solution. Let Xt0 ≡ X0 and Xtn+1
= X0 +
t 0
b(Xsn )ds +
t 0
σ (Xsn )dBs ,
Consider the equivalent probability P˜ given by d P˜ = e−|X0 | /E e−|X0 | , dP
n ≥ 1.
67
68
4 : Stochastic differential equations
if necessary, we may assume that E(|X0 |2 ) < ∞. By the Burkholder–Davis– Gundy inequality, we get n+1 2 gn+1 (t) ≡ E sup |Xs | s≤t
2
≤ 3E(|X0 | ) + 3T E + 12E
0
2
|b(0)| + K|Xsn |
t
2
|σ (0)| + K|Xsn |
0
≤ K1 + K2
t
t
0
ds
ds
gn (s)ds,
where K1 , K2 are two constants and g0 (t) ≤ K1 . Using induction, we can prove that E sup |Xsn |2 ≤ K1 eK2 t . s≤t
Note that Xtn+1
− Xtn
=
t
0
(b(Xsn ) − b(Xsn−1 ))ds +
t 0
(σ (Xsn ) − σ (Xsn−1 ))dBs .
Applying the Burkholder–Davis–Gundy inequality again, we have n+1 n 2 fn+1 (t) ≡ E sup |Xs − Xs | s≤t
≤ 2E
t
0
2 |b(Xsn ) − b(Xsn−1 )|ds
r 2 + 2E sup (σ (Xsn ) − σ (Xsn−1 ))dBs r≤t
≤ 2TK
2 0
0
t
E
≤ (2T + 8)K2
|Xsn
0
t
− Xsn−1 |2
ds + 8E
0
t
|σ (Xsn ) − σ (Xsn−1 )|2 ds
fn (s)ds.
(4.8)
Set K3 = (2T + 8)K2 . Using induction, it is easy to show that fn+1 (t) ≤
K3n
0
t
(t − s)n−1 f1 (s)ds → 0. (n − 1)!
(4.9)
4.2
Existence and uniqueness of a solution
Therefore, there exists a continuous stochastic process Xt such that
E sup |Xtn − Xt |2 → 0. t≤T
It is easy to show that Xt is a solution to equation (4.1). To prove the uniqueness, we let X and Y be two solutions to equation (4.1) with the same initial and the same driving Brownian motion. Using arguments similar to equation (4.8), we have t 2 g(t) ≡ E sup |Xs − Ys | ≤ K4 g(s)ds. s≤t
0
By Gronwall’s inequality, which is proved below, we get g(t) ≡ 0. This implies the pathwise uniqueness. It follows from Theorem 4.7 that equation (4.1) has a unique strong solution. Lemma 4.9 (Gronwall’s inequality) If t g(t) ≤ K1 + K2 g(s)ds,
∀t ≥ 0,
0
then g(t) ≤ K1 eK2 t , Proof Note that g(t) ≤ K1 + K2 =
t 0
∀t ≥ 0.
K1 + K 2
K1 + K1 K2 t + K22
≤ K1 + K1 K2 t + K22 = K 1 1 + K2 t +
t
0 t
0
s
g(r)dr ds 0
(t − r)g(r)dr r (t − r) K1 + K2 g(s)ds dr
(K2 2
t)2
+ K23
0
0
t
(t − s)2 g(s)ds. 2
Using induction, we can show that t (K2 t)n (K2 t)2 (t − s)n + ··· + + K2n+1 g(s)ds. g(t) ≤ K1 1 + K2 t + 2 n! n! 0 Taking n → ∞, we finish the proof.
Finally, in this section, we prove the continuous dependency of the solution on the coefficients. This theorem will be needed when we derive the duality for stochastic filtering.
69
70
4 : Stochastic differential equations
Theorem 4.10 Suppose that {(bn , σ n )} is a sequence of functions on Rd taking values in Rd × Rd×m . For each n, (bn , σ n ) satisfies Condition (4.7). Further, as n → ∞, n ≡ sup |bn (x) − b(x)|2 + |σ n (x) − σ (x)|2 → 0. x∈Rd
Let X n be the solution to equation (4.1) with (b, σ ) being replaced by (bn , σ n ). Then, for any T > 0,
E sup |Xtn − Xt |2 → 0. t≤T
Proof As Xtn
− Xt =
t
b
0
n
(Xsn ) − b(Xs )
ds +
0
t
σ n (Xsn ) − σ (Xs ) dBs ,
it follows from the Cauchy–Schwarz and Burkholder–Davis–Gundy inequalities that for t ≤ T, t E sup |Xsn − Xs |2 ≤ 2T E |bn (Xsn ) − b(Xs )|2 ds s≤t
0
+ 8E ≤ 4T E
t 0
|σ n (Xsn ) − σ (Xs )|2 ds
t
|b(Xsn ) − b(Xs )|2 + n ds
0
+ 16E
t 0
|σ (Xsn ) − σ (Xs )|2 + n ds
≤ 4(T + 4)n + 4(T + 4)KE
t
0
|Xsn − Xs |2 ds.
By Gronwall’s inequality, we have
E sup |Xsn − Xs |2 ≤ 4(T + 4)n e4(T+4)Kt . s≤t
This implies the desired estimate.
4.3
Martingale problem
Let Xt be the unique solution to equation (4.1). Applying Itô’s formula, we get for any f ∈ Cb2 (Rd ), df (Xt ) = Lf (Xt )dt + ∇ ∗ f σ (Xt )dBt ,
4.3
Martingale problem
where d d 1 2 Lf = aij ∂ij f + bi ∂i f , 2 i,j=1
i=1
and aij =
m
σik σjk .
k=1
Therefore, f Mt
≡ f (Xt ) − f (X0 ) −
t 0
Lf (Xs )ds
(4.10)
is a square-integrable martingale. Definition 4.11 We say that {Xt } is a solution to the L-martingale problem f if ∀f ∈ Cb2 (Rd ), Mt defined in equation (4.10) is a locally square-integrable martingale. The L-martingale problem is well posed in C([0, ∞), Rd ) if it has at least one solution and its solution is unique in distribution, i.e. if X, Y are two solutions, then L(X) = L(Y). From the above, we see that the solution to the SDE (4.1) is a solution to the L-martingale problem. The next theorem establishes the uniqueness for the solution of the martingale problem. Theorem 4.12 Under Condition (4.7), the L-martingale problem is well posed in C([0, ∞), Rd ). Proof Let Xt be a solution to the L-martingale problem. First, we prove that t i i i bi (Xs )ds Mt ≡ Xt − X0 − 0
is a local martingale. Let f ∈ Cb2 (Rd ) be such that f (x) = xi
for |x| ≤ r.
Then,
f
i − X0i − Mt∧σr ≡ Xt∧σ r
t∧σr 0
bi (Xs )ds
is a continuous martingale, where σr = inf {t : |Xt | > r} .
71
72
4 : Stochastic differential equations f
i As σr ↑ ∞ and Mt∧σ = Mt∧σr is a continuous square-integrable martingale, r 2,c
we see that Mi ∈ Mloc . Take f ∈ Cb2 (Rd ) such that f (x) = xi xj
for |x| ≤ r,
it is easy to verify that j
j
Xti Xt − X0i X0 −
t 0
j aij (Xs ) + Xsi bj (Xs ) + Xs bi (Xs ) ds
is a local martingale. Note that dXti = bi (Xt )dt + dMti ,
i = 1, 2, . . . , d.
By Itô’s formula, we get that " ! j j d(Xti Xt ) = Xti bj (Xt ) + Xt bi (Xt ) dt + d Mi , Mj + d(local mart.). t
Hence, t t m " ! aij (Xs )ds = σik σjk (Xs )ds. Mi , Mj = t
0
0
k=1
By the martingale representation theorem (Theorem 3.16), we see that there exists a Brownian motion Bt such that m t σik (Xs )dBis . Mti = k=1
0
Therefore, Xt is a solution to equation (4.1). By the uniqueness of the weak solution, we see that the L-martingale problem is well posed. Remark 4.13 From the proof of the theorem above we see that the solution to the martingale problem and the weak solution of the stochastic differential equation are equivalent.
4.4
A stochastic flow
Throughout this section, we assume that Condition (4.7) holds, so that the SDE (4.1) has a unique strong solution. Let Xt = F(t, x, B) be the unique strong solution of the SDE (4.1) with initial x, where F : R+ × Rd × Cd → Rd is a measurable mapping. We define a shift operator θt from Cd to Cd by (θt B)s = Bt+s − Bs ,
∀ s ≥ 0.
4.4
A stochastic flow
By the pathwise uniqueness of the solution, we see that for t, s ≥ 0 fixed F(t + s, x, B) = F(s, F(t, x, B), θs B)
a.s.
Namely, Xt = F(t, x, B) as a mapping x → F(t, x, B) is a stochastic flow. In this section, we consider the differentiability of this mapping. To this end, we study the Euler approximation of the solution. This approximation method will be useful when we study the numerical solution to the filtering equation in Chapter 8. For δ ∈ (0, 1), let ηδ (t) = jδ for jδ ≤ t < (j + 1)δ, j = 0, 1, 2, . . .. Let X δ be the solution to t t δ δ b Xηδ (s) ds + σ Xηδ δ (s) dBs . (4.11) Xt = x + 0
0
Note that for jδ ≤ t < (j + 1)δ, equation (4.11) becomes δ δ δ Xtδ = Xjδ + b(Xjδ )(t − jδ) + σ (Xjδ )(Bt − Bjδ ).
Therefore, the solution to equation (4.11) is given recursively starting from j = 0. Throughout the rest of this section, we assume that the coefficients b and σ are bounded and Lipschitz continuous, i.e. there exists a constant K such that for any x, y ∈ Rd , we have |b(x)| + |σ (x)| ≤ K, and |b(x) − b(y)| + |σ (x) − σ (y)| ≤ K|x − y|. Theorem 4.14 For any p > 12 , there exists a constant K1 such that
E sup |Xtδ − Xt |2p ≤ K1 δ p . 0≤t≤T
Proof Let fδ (t) = E sup |Xsδ − Xs |2p . 0≤s≤t
Using the inequality |a + b|2p ≤ 22p−1 |a|2p + |b|2p , we get 2p 2p E Xηδ δ (s) − Xsδ ≤ E b Xηδ δ (s) (s − ηδ (s)) + σ Xηδ δ (s) Bηδ (s) − Bs ≤ 22p−1 K2p δ 2p + 22p−2 K2p K2 δ p ≤ K3 δ p ,
73
74
4 : Stochastic differential equations
where K2 is a constant (the 2p-moment of a d-dimensional standard normal random vector) and K3 = 22p−2 K2p (2 + K2 ). By the Lipschitz continuity of b and σ , we get 2p E b Xηδ δ (s) − b Xsδ ≤ K2p K3 δ p , and
As Xtδ
2p E σ Xηδ δ (s) − σ Xsδ ≤ K2p K3 δ p . t t δ b Xηδ (s) − b(Xs ) ds + σ Xηδ δ (s) − σ (Xs ) dBs , − Xt = 0
0
by the Cauchy–Schwarz inequality and the Burkholder–Davis–Gundy inequality, we have t 2p fδ (t) ≤ 2p−1 T 2p−1 E b Xηδ δ (s) − b (Xs ) ds +2 ≤ K4
2p−1
t 0
0
2p 2p − 1
2p
T
t
p−1 0
2p E σ Xηδ δ (s) − σ (Xs ) ds
2p K3 δ p K2p + K2p E Xsδ − Xs ds
≤ K5 δ p + K6
t 0
fδ (s)ds.
By Gronwall’s inequality, we get fδ (t) ≤ K5 eK6 T δ p → 0.
Now we consider the differentiability of F(t, x, B) with respect to x. For convenience, we rewrite equations (4.1) and (4.11) as Xt = x +
0
t
b(Xs )ds +
d i=1
t
0
σi (Xs )dBis ,
and Xtδ = x +
t 0
d t b Xηδ δ (s) ds + σi Xηδ δ (s) dBis , i=1
where σi is the ith column of the matrix σ .
0
4.4
A stochastic flow
Suppose that b and σ are continuously differentiable in x. For 0 < t < δ, we have Xtδ = x + b(x)t +
d
σi (x)Bit ,
i=1
which is differentiable in x. Let Ytδ = ∇ ∗ Xtδ . Then, Ytδ = I + ∇ ∗ bt +
d
∇ ∗ σi (x)Bit .
i=1
For δ ≤ t < 2δ, we have d Xtδ = Xδδ + b Xδδ (t − δ) + σi Xδδ Bit − Biδ . i=1
Then, Ytδ is a Rd×d -valued process satisfying d ∇ ∗ σi Xδδ Yδδ Bit − Biδ . Ytδ = Yδδ + ∇ ∗ b Xδδ Yδδ (t − δ) + i=1
Using induction, we can prove that Xtδ is differentiable in x and its gradient Ytδ is the unique solution to the following SDE on Rd×d : Ytδ
=I+
0
t
∗
∇ b
Xηδ δ (s)
Yηδδ (s) ds +
d i=1
t 0
∇ ∗ σi Xηδ δ (s) Yηδδ (s) dBis . (4.12)
To study the limit of Y δ as δ → 0, we need the following discrete-time version of Gronwall’s inequality. Lemma 4.15 (Discrete-time Gronwall inequality) Let {an : n ∈ Z+ } be a non-negative sequence. Suppose that there are two constants K1 and K2 satisfying an+1 ≤ K1 + K2
n
aj ,
∀n ≥ 0.
j=0
Then, an ≤ K1 + (K1 + a0 K2 ) (1 + K2 )n−1 .
(4.13)
75
76
4 : Stochastic differential equations
Proof Let Sn =
n
aj .
j=0
Then, Sn+1 ≤ K1 + (1 + K2 )Sn . Using induction, we have Sn = K1 1 + (1 + K2 ) + · · · + (1 + K2 )n−1 + (1 + K2 )n a0 K1 (1 + K2 )n − 1 + (1 + K2 )n a0 K2 K1 (1 + K2 )n . ≤ a0 + K2 =
Inserting this back into equation (4.13), we finish the proof.
As a corollary of the discrete-time Gronwall inequality, we have Corollary 4.16 Suppose that ∇ ∗ b and ∇ ∗ σi , i = 1, 2, . . . , d are bounded and Lipschitz continuous. Then, for any p > 12 , we have sup E|Ytδ |2p < ∞. t≤T
Proof Applying the Cauchy–Schwarz and the Burkholder–Davis–Gundy inequalities to equation (4.12), we get t δ 2 E|Yt | ≤ K1 + K2 E|Yηδδ (s) |2p ds. (4.14) 0
Let t = (n + 1)δ. Then, δ |2p an+1 ≡ E|Y(n+1)δ
≤ K1 + K2 δ
n
aj .
j=0
By Lemma 4.15, we have an ≤ K1 + (K1 + K1 K2 δ) (1 + K2 δ)n−1 T
≤ K1 + (K1 + K1 K2 δ) (1 + K2 δ) δ −1 ≤ K1 + K1 eK2 T . Inserting this back into equation (4.14), we get the boundedness of E|Ytδ |2p .
4.4
A stochastic flow
To characterize the limit of Y δ as δ → 0, we consider the following linear SDE on Rd×d : Yt = I +
t
0
∗
∇ b (Xs ) Ys ds +
d i=1
0
t
∇ ∗ σi (Xs ) Ys dBis .
(4.15)
Theorem 4.17 Suppose that ∇ ∗ b and ∇ ∗ σi , i = 1, 2, . . . , d are bounded and Lipschitz continuous. Then equation (4.15) has a unique strong solution Yt taking values in Rd×d , and, for any p > 12 , there exists a constant K such that
E sup |Ytδ − Yt |2p ≤ Kδ p .
(4.16)
0≤t≤T
Proof The existence and uniqueness of Yt follow from the same arguments as in Section 4.2. Now we prove equation (4.16). Note that Ytδ − Yt (4.17) t = ∇ ∗ b Xηδ δ (s) − ∇ ∗ b (Xs ) Yηδδ (s) + ∇ ∗ b (Xs ) Yηδδ (s) − Ys ds 0
+
d t i=1
0
∇ ∗ σi Xηδ δ (s) − ∇ ∗ σi (Xs ) Yηδδ (s)
+∇ ∗ σi (Xs ) Yηδδ (s) − Ys dBis . By the Lipschitz continuity and the Cauchy–Schwarz inequality, we have 2p E ∇ ∗ b Xηδ δ (s) − ∇ ∗ b (Xs ) Yηδδ (s) 4p 1/2 1/2 δ δ 4p 2p ≤K E Xηδ (s) − Xs E Yηδ (s) . By Corollary 4.16 and the triangle inequality, we can continue with 2p E ∇ ∗ b Xηδ δ (s) − ∇ ∗ b (Xs ) Yηδδ (s) 2p 4p 1/4p 4p 1/4p δ δ δ ≤ K1 E Xηδ (s) − Xs + E Xs − X s ≤ K2 δ p ,
77
78
4 : Stochastic differential equations
where the last step follows from Theorem 4.14. Similarly, we can get 2p E ∇ ∗ b (Xs ) Yηδδ (s) − Ys ≤ K3 δ p , 2p E ∇ ∗ σi Xηδ δ (s) − ∇ ∗ σi (Xs ) Yηδδ (s) ≤ K4 δ p , and 2p E ∇ ∗ σi (Xs ) Yηδδ (s) − Ys ≤ K4 δ p . Applying the Cauchy–Schwarz and the Burkholder–Davis–Gundy inequalities to equation (4.17), we obtain equation (4.16). Finally, we consider the invertibility of the matrix Yt . To define a process and to prove it is the inverse of Yt , it is more convenient to use the Stratonovich form of the SDE. Recall that Xti
=
X0i
+
t 0
i
b (Xs )ds +
d
t
0
j=1
j
σij (Xs )dBs .
Applying Itô’s formula, we have dσij (Xt ) = Lσij (Xt )dt +
d
∂k σij (Xt )
d
σk (Xt )dBt .
=1
k=1
Thus, d ! " d σij (X), Bj = σkj (Xt )∂k σij (Xt )dt. t
k=1
Using equation (3.26), the basic process Xt is governed by the following Stratonovich form: Xt = X0 +
t 0
¯ s )ds + b(X
d i=1
t 0
σi (Xs ) ◦ dBis ,
where d 1 ¯ b(x) = b(x) − σij (x)∂i σj (x). 2 i,j=1
4.5
Markov property
Then, the Jacobian matrix Yt for the mapping x → F(t, x, B) satisfies Yt = I +
t
0
∗¯
∇ b (Xs ) Ys ds +
d i=1
t
0
∇ ∗ σi (Xs ) Ys ◦ dBis .
(4.18)
Theorem 4.18 The process {F(t, x, B)} is a stochastic flow and, for almost all B, the mapping x → F(t, x, B) is differentiable, and the Jacobian matrix Yt is invertible. Proof Let Zt be a Rd×d -valued process governed by the following SDE: Zt = I −
t
0
∗¯
Zs ∇ b (Xs ) ds −
d i=1
0
t
Zs ∇ ∗ σi (Xs ) ◦ dBis .
(4.19)
The existence of a solution follows from the same arguments as in Theorem 4.8. Applying Itô’s formula to equations (4.18) and (4.19) we have d(Zt Yt ) = Zt ◦ dYt + dZt ◦ Yt = 0. Thus, Zt Yt = I a.s.
4.5
Markov property
In this section, we consider the Markov property for the solution of the SDE (4.1). Definition 4.19 A stochastic process (Xt ) is said to be a Markov process if for all t ≥ s ≥ 0 and for any bounded measurable function f , we have E f (Xt )|Fs = E f (Xt )|Xs , a.s. We will need the following lemma for the proof of the Markov property. Lemma 4.20 Given a probability space (, G , P) and measurable spaces (E, E ) and (F, F ). Suppose that A ⊂ G is a σ -field, X : → E and Y : → F are random variables such that X is A-measurable and Y is independent of A. For a bounded real-valued function on (E × F, E × F ), we define φ(x) = E(x, Y). Then φ is Borel measurable on (E, E) and
E ((X, Y)|A) = φ(X),
a.s.
79
80
4 : Stochastic differential equations
Proof Write PY for the law of Y. Then, φ(x) =
F
(x, y)PY (dy).
The measurability of φ follows from Fubini’s theorem. Let Z be a bounded A-measurable random variable. Denote the law of (X, Z) by PX,Z . Then
E (ZE ((X, Y)|A)) = E (Z(X, Y)) = (x, y)zPX,Z (d(x, z))PY (dy) E×R F
=
E×R
=
E×R
F
(x, y)PY (dy) zPX,Z (d(x, z))
φ(x)zPX,Z (d(x, z))
= E(φ(X)Z).
This implies the desired identity.
Now we are ready to prove the Markov property for the solution to the SDE (4.1). Theorem 4.21 The solution (Xt ) to the SDE (4.1) is a Markov process. Proof Let g be a bounded real-valued function on Rd . We define a function φ on R+ × Rd by φ(t, x) = Eg(F(t, x, B)), where F is the stochastic flow given in the last section. Now we prove that for any t ≥ s ≥ 0, E g(Xt )Fs = φ(t − s, Xs ).
(4.20)
The Markov property of (Xt ) follows from equation (4.20) easily. Let Z be an Fs -measurable bounded random variable. It follows from Lemma 4.20 that E g(F(t − s, Z, θs B))Fs = φ(t − s, Z), a.s.
4.5
Markov property
Take Z = Xs = F(s, X0 , B), we see that φ(t − s, Xs ) = E g(F(t − s, F(s, X0 , B), θs B))Fs = E g(F(t, X0 , B))Fs = E g(Xt )Fs , where the second equality follows from the flow property of (Xt ).
81
5
Filtering model and Kallianpur–Striebel formula
In this chapter, we introduce the basic setup of the filtering models to be studied in this book. Then we demonstrate that the optimal filter is given by the conditional distribution of the signal. Bayes’ formula in the filtering setup, is called the Kallianpur–Striebel formula, which is the key in the development of non-linear filtering theory. We will derive this formula in Section 5.2. We establish the filtering equations in Section 5.3. Finally, in Section 5.4, we give a particle-system representation of the optimal filter that will be useful in understanding numerical schemes for the optimal filter.
5.1
The filtering model
As we mentioned in the introductory chapter, the filtering problem consists of two processes: The signal process, which is what we want to estimate, and the observation process that provides the information we can use. For the observation process, we assume that it is governed by the following stochastic differential equation: t h(Xs )ds + Wt , (5.1) Yt = 0
where Wt is an m-dimensional Brownian motion. For the signal process Xt , we assume that Xt is a Rd -valued process governed by the following stochastic differential equation: dXt = b(Xt )dt + c(Xt )dWt + σ (Xt )dBt ,
(5.2)
where B is a d-dimensional Brownian motion independent of W, and b : Rd → Rd , c : Rd → Rd×m and σ : Rd → Rd×d are continuous mappings. We make the following assumption (BC) throughout this book. Note that the boundedness condition equation (5.3) is not necessary. We make this assumption in this book for simplicity of presentation.
Assumption (BC): The mappings b, c, σ, h are bounded and Lipschitz continuous. We shall denote the bound and the Lipschitz constant in Assumption (BC) by K. Namely, for any x, y ∈ Rd, we assume that
max{ |b(x)|, |c(x)|, |σ(x)|, |h(x)| } ≤ K,   (5.3)
and
max{ |b(x) − b(y)|, |c(x) − c(y)|, |σ(x) − σ(y)| } ≤ K|x − y|,
where for x = (x1, x2, . . . , xd)∗ ∈ Rd, the Euclidean norm is defined by
|x| = ( Σ_{i=1}^d xi² )^{1/2},
and for a matrix c ∈ Rd×m,
|c| = ( Σ_{i=1}^d Σ_{j=1}^m cij² )^{1/2}.
Let
Gt = σ (Ys : 0 ≤ s ≤ t) be the σ -field generated by the observation up to time t; Gt will be the information available to us at time t. The aim of the filtering theory is to estimate the signal Xt based on the information Gt .
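For readers who wish to experiment numerically, the following is a minimal Euler–Maruyama sketch of the pair (5.1)–(5.2) in dimension d = m = 1. The particular coefficients b(x) = −x, c ≡ 0, σ ≡ 1 and h(x) = tanh x are illustrative assumptions only (they satisfy Assumption (BC)); they are not part of the model specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_signal_observation(T=1.0, n=1000, x0=0.0):
    """Euler-Maruyama discretization of dX = b(X)dt + sigma(X)dB and
    Y_t = int_0^t h(X_s)ds + W_t, with b(x) = -x, sigma = 1, h = tanh."""
    dt = T / n
    X = np.empty(n + 1)
    Y = np.empty(n + 1)
    X[0], Y[0] = x0, 0.0
    for k in range(n):
        dB = rng.normal(scale=np.sqrt(dt))   # signal noise increment
        dW = rng.normal(scale=np.sqrt(dt))   # observation noise increment
        X[k + 1] = X[k] - X[k] * dt + dB
        Y[k + 1] = Y[k] + np.tanh(X[k]) * dt + dW
    return X, Y
```

Only the path Y is available to the filter; X is simulated here solely so that estimates can later be compared with the true signal.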
5.2 The optimal filter
In this section, we proceed to estimate the signal Xt based on the available information Gt. The next lemma concerns the estimation of the exact value of a random variable ξ based on the information represented by a σ-field G. It says that the conditional expectation E(ξ|G) is the one that has the minimum square error among all G-measurable random variables. We denote the collection of all G-measurable square-integrable random variables by L²(Ω, G, P).
Lemma 5.1 Let ξ be a square-integrable random variable on the probability space (Ω, F, P). Let G be a sub-σ-field of F. Then
E((ξ − E(ξ|G))²) = min{ E((ξ − η)²) : η ∈ L²(Ω, G, P) }.
Proof Let η ∈ L²(Ω, G, P). Then
E((ξ − η)2 ) − E((ξ − E(ξ |G ))2 ) = E((E(ξ |G ) − η)(2ξ − η − E(ξ |G ))) = E(E{((E(ξ |G ) − η)(2ξ − η − E(ξ |G )))|G }). As E(ξ |G ) − η is G -measurable, we can continue the calculation above with
E((ξ − η)2 ) − E((ξ − E(ξ |G ))2 ) = E((E(ξ |G ) − η)E{(2ξ − η − E(ξ |G ))|G }) = E((E(ξ |G ) − η)2 ) ≥ 0.
This finishes the proof of the lemma.
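The optimality statement of Lemma 5.1 is easy to check by simulation. The following sketch (an illustration, not part of the text) compares the mean-square error of E(ξ|G) with that of another G-measurable estimator in a toy Gaussian model where ξ = X + ε and G = σ(X), so that E(ξ|G) = X.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.normal(size=n)          # G = sigma(X)
xi = X + rng.normal(size=n)     # xi = X + independent noise
cond_exp = X                    # E(xi | G) = X in this toy model
other = 0.5 * X                 # some other G-measurable estimator

mse_cond = np.mean((xi - cond_exp) ** 2)
mse_other = np.mean((xi - other) ** 2)
assert mse_cond <= mse_other    # Lemma 5.1 in action
```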
Most of the time, we are interested in some quantities that are functions of the signal instead of the signal itself. Therefore, we want to find a systematic way to estimate f(Xt) for f in a rich enough family of test functions. Although E(Xt|Gt) is the best estimate for Xt, f(E(Xt|Gt)) is not the best estimate for f(Xt) based on the least-square-error criterion if f is not a linear function. Instead, applying the above lemma with ξ = f(Xt) and G = Gt, we see that E(f(Xt)|Gt) is the best estimate of f(Xt). Let πt(·) ≡ P(Xt ∈ ·|Gt) be the regular conditional probability distribution of Xt given Gt; i.e. πt is a map from B(Rd) × Ω to [0, 1] such that
i) for any ω ∈ Ω, πt(·, ω) is a probability measure on Rd;
ii) for any A ∈ B(Rd), πt(A, ·) is a Gt-measurable random variable;
iii) for any A ∈ B(Rd), we have
πt(A, ω) = P(Xt ∈ A|Gt)(ω),  a.s. ω.   (5.4)
Now we prove that the conditional expectation is given by the integral of f with respect to the regular conditional probability distribution πt. Recall that Cb(Rd) is the set of bounded continuous functions on Rd. Throughout this book, we will use ⟨ν, f⟩ to denote the integral of a function f with respect to a measure ν.
Lemma 5.2 For any f ∈ Cb(Rd) and t ≥ 0, we have
E(f(Xt)|Gt) = ⟨πt, f⟩,  a.s.   (5.5)
Proof By equation (5.4), we see that equation (5.5) holds for f (x) = 1A (x). It follows from the linearity of the conditional expectation that equation (5.5) holds for simple functions. For f ≥ 0, we can take an increasing sequence of simple functions converging pointwise to f .
Then equation (5.5) follows from the monotone convergence theorem. Finally, the case for general f follows from the linearity and f = f⁺ − f⁻.
Let P(Rd) be the collection of all Borel probability measures on Rd. Then πt is a P(Rd)-valued stochastic process. Based on the observation above, we call πt the optimal filter of Xt.
To calculate ⟨πt, f⟩ = E(f(Xt)|Gt) effectively, we need to make a transformation of probabilities so that Yt becomes a Brownian motion under the new probability measure (see equation (5.1) for the definition of Yt). In fact, we can achieve this by a change of probability measures based on Girsanov's theorem. Since h is bounded,
Mt⁻¹ ≡ exp( −∫₀ᵗ ⟨h(Xs), dWs⟩ − ½ ∫₀ᵗ |h(Xs)|² ds )
is a martingale. Let P̂ be the measure on Ω that is absolutely continuous with respect to P and whose Radon–Nikodym derivative on (Ω, Ft) is
dP̂/dP |_{Ft} = Mt⁻¹,
where Ft = Ft^{W,B} is the σ-field generated by {Ws, Bs : s ≤ t}, which we shall take throughout this and the next section. Then P̂ is a probability measure. By Corollary 3.24, Yt is a P̂-Brownian motion independent of B.
The next theorem is Bayes' formula in the filtering setup. It gives a formula for the calculation of the conditional expectation under the new probability measure P̂. It will play a very important role in the filtering theory introduced in this book.
Theorem 5.3 (Kallianpur–Striebel formula) The optimal filter πt can be represented as
⟨πt, f⟩ = ⟨Vt, f⟩ / ⟨Vt, 1⟩,  ∀ f ∈ Cb(Rd),   (5.6)
where
⟨Vt, f⟩ = Ê(Mt f(Xt)|Gt),   (5.7)
and Ê refers to the expectation with respect to the measure P̂.
Proof Note that
dP/dP̂ |_{Ft} = Mt.
Replacing P and Q in Theorem 3.22 by P̂|_{Ft} and P|_{Ft}, respectively, we have
⟨πt, f⟩ = Ê(Mt f(Xt)|Gt) / Ê(Mt|Gt) = ⟨Vt, f⟩ / ⟨Vt, 1⟩.
Let MF (Rd ) be the collection of all finite Borel measures on Rd . The MF (Rd )-valued process Vt is called the unnormalized filter.
5.3 Filtering equation
In this section, we first derive a stochastic differential equation for the unnormalized filter Vt. Then we establish the filtering equation for the optimal filter πt by making use of the Kallianpur–Striebel formula.
Recall that Yt is a P̂-Brownian motion independent of B. As dWt = dYt − h(Xt)dt, it follows from equation (5.2) that the signal Xt is governed by the following stochastic differential equation:
dXt = (b − ch)(Xt) dt + c(Xt) dYt + σ(Xt) dBt.   (5.8)
Note that
Mt = exp( ∫₀ᵗ h∗(Xs) dYs − ½ ∫₀ᵗ |h(Xs)|² ds ),
where h∗ denotes the transpose of h ∈ Rm. By Itô's formula, it is easy to show that
dMt = Mt h∗(Xt) dYt.   (5.9)
We will need the following lemma. Recall that we take Ft = Ft^{W,B} in this section.
Lemma 5.4 Suppose that f and g are predictable processes on the stochastic basis (Ω, F, P̂, Ft) satisfying
Ê ∫₀ᵀ |fs| ds + Ê ∫₀ᵀ |gs|² ds < ∞.
Then,
Ê( ∫₀ᵗ fs ds | Gt ) = ∫₀ᵗ Ê(fs|Gs) ds,   (5.10)
Ê( ∫₀ᵗ gs dYs | Gt ) = ∫₀ᵗ Ê(gs|Gs) dYs,   (5.11)
and
Ê( ∫₀ᵗ gs dBs | Gt ) = 0.   (5.12)
Proof Suppose f is simple, i.e.
fs = Σ_{i=1}^k fi 1_{(ai,bi]}(s),
where (ai, bi], i = 1, 2, . . . , k, are disjoint subintervals of [0, t], and fi is Fai-measurable, i = 1, 2, . . . , k. Let
Gs,t = σ(Yu − Ys : s ≤ u ≤ t).
Note that Gt = Gai ∨ Gai,t ≡ σ(Gai ∪ Gai,t). Then,
Ê( ∫₀ᵗ fs ds | Gt ) = Ê( Σ_{i=1}^k fi (bi − ai) | Gt )
= Σ_{i=1}^k Ê( fi | Gai ∨ Gai,t ) (bi − ai)
= Σ_{i=1}^k Ê( fi | Gai ) (bi − ai)
= ∫₀ᵗ Ê(fs | Gs) ds,
where the equality prior to the last one follows from the independence of increments of the Brownian motion Y. Thus, equation (5.10) holds for simple processes. If f ≥ 0, we can take an increasing sequence of simple processes converging pointwise to f. Then equation (5.10) follows from the monotone
convergence theorem. For general f, we can write f = f⁺ − f⁻, and then equation (5.10) follows from the linearity.
Now we proceed to prove the second equation. Suppose g is simple, namely
gs = Σ_{i=1}^k gi 1_{(ai,bi]}(s),
where gi is Fai-measurable, i = 1, 2, . . . , k. Once again using the independence of increments for Y, we get
Ê( ∫₀ᵗ gs dYs | Gt ) = Ê( Σ_{i=1}^k gi (Ybi − Yai) | Gt )
= Σ_{i=1}^k Ê( gi | Gt ) (Ybi − Yai)
= Σ_{i=1}^k Ê( gi | Gai ) (Ybi − Yai)
= ∫₀ᵗ Ê(gs | Gs) dYs.
For a general g, we can approximate it by a sequence of simple processes gⁿ such that |gⁿs| ≤ |gs|, a.s. for all s ≤ t. Then
Ê( ∫₀ᵗ gⁿs dYs )² ≤ Ê ∫₀ᵗ |gs|² ds < ∞,
and hence { ∫₀ᵗ gⁿs dYs : n ≥ 1 } is uniformly integrable. Thus
Ê( ∫₀ᵗ gs dYs | Gt ) = lim_{n→∞} Ê( ∫₀ᵗ gⁿs dYs | Gt )
= lim_{n→∞} ∫₀ᵗ Ê( gⁿs | Gs ) dYs
= ∫₀ᵗ Ê( gs | Gs ) dYs.
Finally, we prove the last equation. Suppose g is simple first. As Bbi − Bai is independent of Gt ∨ σ(gi), we have
Ê( ∫₀ᵗ gs dBs | Gt ) = Ê( Σ_{i=1}^k gi (Bbi − Bai) | Gt )
= Σ_{i=1}^k Ê( Ê( gi (Bbi − Bai) | Gt ∨ σ(gi) ) | Gt )
= 0.
For general g, equation (5.12) follows from the same approximation argument as that used to derive equation (5.11).
After all these preparations, we are now ready to derive the main equations in the filtering theory. First, we consider the linear equation for the unnormalized filter Vt.
Theorem 5.5 (Zakai's equation) The unnormalized filter Vt satisfies the following stochastic differential equation:
⟨Vt, f⟩ = ⟨V0, f⟩ + ∫₀ᵗ ⟨Vs, Lf⟩ ds + ∫₀ᵗ ⟨Vs, ∇∗f c + f h∗⟩ dYs,   (5.13)
where
Lf = ½ Σ_{i,j=1}^d aij ∂²ij f + Σ_{i=1}^d bi ∂i f
is the generator of the signal process, and the d × d matrix a = (aij) is given by a = cc∗ + σσ∗.
Proof Applying Itô's formula to equation (5.2), we have
df(Xt) = L̃f(Xt) dt + ∇∗f c(Xt) dYt + ∇∗f σ(Xt) dBt,   (5.14)
where
L̃f = Lf − ∇∗f ch.
Applying Itô’s formula to equations (5.9) and (5.14), we obtain ˜ (Xt )dt + ∇ ∗ fc(Xt )dYt + ∇ ∗ f σ (Xt )dBt d(Mt f (Xt )) = Mt Lf + Mt f (Xt )h∗ (Xt )dYt + Mt ∇ ∗ fch(Xt )dt.
(5.15)
Namely, Mt f (Xt ) = f (X0 ) + + +
t
0 t 0
t 0
Ms Lf (Xs )ds
Ms (∇ ∗ fc(Xs ) + fh∗ (Xs ))dYs Ms ∇ ∗ f σ (Xs )dBs .
Taking conditional expectations on both sides, we get t ˆ Vt , f = V0 , f + E Ms Lf (Xs )dsGt ˆ +E ˆ +E
0
t
0
t
0
= V0 , f +
Ms (∇ ∗ fc(Xs ) + fh∗ (Xs ))dYs Gt Ms ∇ ∗ f σ (Xs )dBs Gt
t
0
Vs , Lf ds +
0
t
Vs , ∇ ∗ fc + fh∗ dYs ,
where the last equality follows from Lemma 5.4.
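As a rough illustration of how equation (5.13) can be used numerically, the following sketch propagates an unnormalized density on a grid for a scalar illustrative model (b(x) = −x, c ≡ 0, σ ≡ 1, h(x) = tanh x, matching the simulation sketch in Section 5.1), using a first-order Euler step for both the generator and the observation term. The grid, boundary treatment and step sizes are crude demonstration choices, not a production scheme.

```python
import numpy as np

def zakai_euler(Y, T=1.0, xgrid=np.linspace(-4.0, 4.0, 201)):
    """Euler/finite-difference sketch of dV = L*V dt + V h dY (c = 0),
    where L* is the adjoint of the generator acting on densities."""
    n = len(Y) - 1
    dt = T / n
    dx = xgrid[1] - xgrid[0]
    v = np.exp(-xgrid ** 2 / 2.0)        # unnormalized initial density
    v /= v.sum() * dx
    h = np.tanh(xgrid)
    b = -xgrid
    for k in range(n):
        dY = Y[k + 1] - Y[k]
        lap = (np.roll(v, -1) - 2.0 * v + np.roll(v, 1)) / dx ** 2
        drift = -np.gradient(b * v, dx)  # adjoint drift term -(b v)'
        v = v + (0.5 * lap + drift) * dt + v * h * dY
        v[0] = v[-1] = 0.0               # crude absorbing boundaries
    return xgrid, v                      # v approximates the density of V_T
```

Normalizing the output then gives a grid approximation of the optimal filter πT, in line with the Kallianpur–Striebel formula.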
To establish a stochastic differential equation for the optimal filter πt, we first define the innovation process νt by
dνt = dYt − ⟨πt, h⟩ dt.   (5.16)
Lemma 5.6 The innovation process νt is a Gt-Brownian motion under the original probability measure.
Proof It follows from equation (5.16) that, for t > s,
E(νt|Gs) = E( Yt − ∫₀ᵗ ⟨πr, h⟩ dr | Gs )
= E( Yt − Ys − ∫ₛᵗ ⟨πr, h⟩ dr | Gs ) + νs.
Now, recalling the observation equation (5.1), we obtain
E(νt|Gs) = E( Wt − Ws | Gs ) + E( ∫ₛᵗ ( h(Xr) − ⟨πr, h⟩ ) dr | Gs ) + νs
= ∫ₛᵗ E( h(Xr) − E(h(Xr)|Gr) | Gs ) dr + νs
= νs,
where the second equality follows from the independent increments of the Brownian motion W, the linearity of the conditional expectation and the boundedness of h. Therefore, νt is a Gt-martingale.
As Yt is a Brownian motion under P̂, its Meyer process is
lim_{n→∞} Σ_{i=1}^n ( Y^j_{it/n} − Y^j_{(i−1)t/n} )( Y^k_{it/n} − Y^k_{(i−1)t/n} ) = ⟨Y^j, Y^k⟩_t = δjk t.
Since
νt = Yt − ∫₀ᵗ ⟨πs, h⟩ ds,
and the quadratic variation of the second term is 0, it follows that Meyer's process for ν is
⟨ν^j, ν^k⟩_t = ⟨Y^j, Y^k⟩_t = δjk t.
It now follows from Theorem 3.13 that νt is a Gt-Brownian motion.
Finally, we are ready to derive the stochastic differential equation for the optimal filter.
Theorem 5.7 (Kushner–FKK equation) The optimal filter πt satisfies the following stochastic differential equation: for all f ∈ Cb²(Rd),
⟨πt, f⟩ = ⟨π0, f⟩ + ∫₀ᵗ ⟨πs, Lf⟩ ds + ∫₀ᵗ ( ⟨πs, ∇∗f c + f h∗⟩ − ⟨πs, f⟩⟨πs, h∗⟩ ) dνs.   (5.17)
Proof Applying Itô’s formula to equation (5.6) and making use of Zakai’s equation (5.13), we get 1 Vt , Lf dt + Vt , ∇ ∗ fc + fh∗ dYt
Vt , 1 Vt , f ∗ − , h V dYt t
Vt , 12 1 ∗ ∗ V V − , ∇ fc + fh , h dt t t
Vt , 12 Vt , f + Vt , h∗ Vt , h dt 3
Vt , 1 = πt , Lf dt + πt , ∇ ∗ fc + fh∗ dYt − πt , f πt , h∗ dYt − πt , ∇ ∗ fc + fh∗ πt , h dt + πt , f πt , h∗ πt , h dt.
d πt , f =
The equation (5.17) then follows if we replace Yt with νt + the above equation.
t 0
πs , h ds in
As an application of the innovation process, we now consider the portfolio optimization problem introduced in Chapter 1. Example 5.8 (Portfolio optimization) We demonstrate that the portfolio optimization problem can be solved in the filtering framework. Note that the signal process is the appreciation rate process Xt = (Xt1 , . . . , Xtd )∗ that can be modelled by a stochastic differential equation in Rd as follows: dXt = b(Xt )dt + c(Xt )dWt + σ (Xt )dBt ,
(5.18)
where (W, B) is an (m + d)-dimensional Brownian motion and the observation process Yt is the logarithm of the stock price process, which satisfies the following SDE:
Yt = ∫₀ᵗ hs(Xs) ds + Wt,   (5.19)
where hs(x) = Σs⁻¹x − ½Ãs is a function from Rd to itself. It is clear that equations (5.18) and (5.19) have a similar form to the filtering model we introduced in this chapter. Although the observation function h here is time dependent, the filtering equation still holds with obvious modifications. However, the key point in solving the portfolio optimization problem is to represent the wealth process in terms of processes that are Gt-adapted.
Note that by equations (1.4) and (1.6), the wealth process 𝒲t satisfies the following SDE:
d𝒲t = ( Xt⁰ 𝒲t + Σ_{i=1}^d (Xtⁱ − Xt⁰) utⁱ ) dt + Σ_{i,j=1}^d σt^{ij} utⁱ dWt^j,   (5.20)
where utⁱ is the dollar amount in the ith stock you decide to own at time t. Clearly, your decision has to be based on the available information, and hence the portfolio ut = (ut¹, . . . , ut^d)∗ is Gt-adapted. As we know from Section 1.1.2, Xt⁰ is Gt-adapted. However, (Xtⁱ, Wtⁱ), i = 1, 2, . . . , d, are not Gt-adapted. Note that the innovation process
νt = Yt − ∫₀ᵗ ⟨πs, hs⟩ ds
is Gt-adapted. By equation (5.19), we get
Wt = Yt − ∫₀ᵗ hs(Xs) ds = νt + ∫₀ᵗ Σs⁻¹ ( X̄s − Xs ) ds,
where X̄s = E(Xs|Gs). Inserting this back into equation (5.20), we get
d𝒲t = ( Xt⁰ 𝒲t + Σ_{i=1}^d (X̄tⁱ − Xt⁰) utⁱ ) dt + Σ_{i,j=1}^d σt^{ij} utⁱ dνt^j.   (5.21)
Note that all the processes in equation (5.21) are Gt-adapted. The methods of stochastic control theory can be applied to obtain an optimal portfolio. This is a case where we can separate the filtering problem from the optimal control, namely, the model satisfies the separation principle. Since the theory of stochastic control is beyond the scope of this book, we refer the interested reader to the book of Yong and Zhou [153] for the general theory, and to the paper of Xiong and Zhou [150] for the application to obtaining an optimal portfolio (in some sense) for the model (with a slight generalization) introduced in this example.
5.4 Particle-system representation
In this section, we establish a particle-system representation for the unnormalized filter that will be useful in Chapter 8. We will represent Vt in terms of a conditionally independent system of particles.
Recall that Yt is a P̂-Brownian motion. Note that the signal Xt is given by the following stochastic differential equation:
dXt = (b − ch)(Xt) dt + c(Xt) dYt + σ(Xt) dBt.
Further,
dMt = Mt h∗(Xt) dYt.
On the probability space (Ω, F, P̂), let Bⁱ, i = 1, 2, . . ., be independent copies of B, and let them be independent of Y. Now we consider an interacting particle system: for i = 1, 2, . . .,
dXtⁱ = (b − ch)(Xtⁱ) dt + c(Xtⁱ) dYt + σ(Xtⁱ) dBtⁱ,   (5.22)
and
dMtⁱ = Mtⁱ h∗(Xtⁱ) dYt,  M0ⁱ = 1.   (5.23)
Theorem 5.9 Suppose that {X0ⁱ, i = 1, 2, . . .} are i.i.d. random vectors with common distribution π0 on Rd. Then
⟨Vt, f⟩ = lim_{k→∞} (1/k) Σ_{i=1}^k Mtⁱ f(Xtⁱ),   (5.24)
where {(Mⁱ, Xⁱ) : i = 1, 2, . . .} is the unique strong solution to the particle system equations (5.22)–(5.23).
Proof Since the system of equations (5.22)–(5.23) has a unique solution, there is a functional Ft such that (Xtⁱ, Mtⁱ) = Ft(X0ⁱ, Bⁱ, Y). By the independence of the Bⁱ, we see that the (Xtⁱ, Mtⁱ), i = 1, 2, . . ., are conditionally (given Gt) independent with identical conditional distribution. Therefore, by a conditional version of the strong law of large numbers, we have
lim_{k→∞} (1/k) Σ_{i=1}^k Mtⁱ f(Xtⁱ) = Ê(Mt f(Xt)|Gt) = ⟨Vt, f⟩,  a.s.   (5.25)
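The representation (5.24) translates directly into a simulation scheme. The sketch below propagates k weighted particles according to (5.22)–(5.23) for the scalar illustrative model used earlier in this chapter (b(x) = −x, c ≡ 0, σ ≡ 1, h(x) = tanh x, with π0 taken to be standard normal for illustration), and returns the weighted average approximating ⟨Vt, f⟩ together with its normalized version ⟨πt, f⟩.

```python
import numpy as np

rng = np.random.default_rng(2)

def particle_unnormalized_filter(Y, f, T=1.0, k=5000):
    """Weighted-particle approximation of <V_T, f> based on (5.22)-(5.24)."""
    n = len(Y) - 1
    dt = T / n
    X = rng.normal(size=k)                 # X_0^i i.i.d. with law pi_0
    logM = np.zeros(k)
    for j in range(n):
        dY = Y[j + 1] - Y[j]
        h = np.tanh(X)
        logM += h * dY - 0.5 * h ** 2 * dt  # exponential form of dM^i = M^i h(X^i) dY
        X += -X * dt + rng.normal(scale=np.sqrt(dt), size=k)
    M = np.exp(logM)
    unnormalized = np.mean(M * f(X))        # approximates <V_T, f>
    return unnormalized, unnormalized / np.mean(M)   # and <pi_T, f>
```

Because c ≡ 0 in this illustration, the particle dynamics do not involve Y; only the weights Mⁱ depend on the observation path.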
5.5 Notes
Since the early work of Stratonovich [142], [143] and Kushner [100], [101], non-linear filtering has been studied by many authors under various setups. Here we only mention a few: Grigelionis [72], Kailath [79], Kailath and Greesy [80], Frost and Kailath [66], Liptser [110], [111], [112], Liptser and Shiryaev [113], [114], [115], [116], Rozovskii [136], Shiryaev [138], [139], Striebel [144], Wentzell [147], Wonham [149] and Yershov [151], [152]. The celebrated paper of Fujisaki et al. [67] brings to a culmination the innovation approach to non-linear filtering of diffusion processes. The filtering equation is usually called the Kushner–Stratonovich equation or the Kushner–FKK equation. The upper and lower bounds for the error of the optimal filter were studied by Bobrovsky and Zakai [13], [14], and Zakai and Ziv [155]. We omit this topic because we want to restrict the length of this book. The work of Kallianpur and Striebel [82], [83] establishes the representation of the optimal filter in terms of the unnormalized filter, which was studied in the pioneering doctoral dissertations of Duncan [61] and Mortensen [127] and the important paper of Zakai [154]. The linear SPDE (5.13) is called the Duncan–Mortensen–Zakai equation, or, simply, Zakai's equation.
6 Uniqueness of the solution for Zakai's equation
In this chapter, we prove the uniqueness of the solution to Zakai's equation by transforming it to an SDE in a Hilbert space and by making use of estimates based on Hilbert-space techniques. Most of the material in Sections 6.2–6.4 is taken from Kurtz and Xiong [97], where a large class of linear stochastic partial differential equations is studied. Although the techniques we introduce here are for Zakai's equation, they can be applied to other classes of linear SPDEs that include Zakai's equation as a special example.
6.1 Hilbert space
In this section, we state some useful facts about Hilbert spaces that will be utilized in proving the uniqueness for the solution of Zakai's equation.
Definition 6.1 A linear space H is an inner product space if there is a bilinear form ⟨·, ·⟩ on H × H such that ⟨x, x⟩ ≥ 0 for all x ∈ H and ⟨x, x⟩ = 0 if and only if x = 0.
In an inner product space, we define a norm
‖x‖ = ⟨x, x⟩^{1/2},  ∀ x ∈ H,
and a metric
d(x, y) = ‖x − y‖,  ∀ x, y ∈ H.
The space H is separable if it has a countable dense subset. A sequence {xn : n ≥ 1} is a Cauchy sequence if for any ε > 0, there exists N > 0 such that ‖xn − xm‖ < ε whenever n, m ≥ N. The space H is complete if every Cauchy sequence is convergent in H.
Definition 6.2 An inner product space H is a Hilbert space if it is complete. (We shall assume throughout this chapter that it is separable.)
The following inequality is the Cauchy–Schwarz inequality in Hilbert space.
Proposition 6.3 Let H be a Hilbert space and x, y ∈ H. Then |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
Proof For any t ∈ R, we have
0 ≤ ⟨tx + y, tx + y⟩ = ‖x‖² t² + 2⟨x, y⟩ t + ‖y‖².
Thus, (2⟨x, y⟩)² − 4‖x‖² ‖y‖² ≤ 0. The conclusion of the proposition then follows easily.
Next we consider the basis of H.
Definition 6.4 A countable set {hj : j = 1, 2, . . .} is a complete orthonormal system (CONS) of H if
i) ⟨hj, hk⟩ = δjk, ∀ j, k = 1, 2, . . . ;
ii) every h ∈ H can be represented as a linear combination of {hj : j = 1, 2, . . .}.
Proposition 6.5 If {hj : j = 1, 2, . . .} is a CONS of H, then for any h ∈ H, we have
i) h = Σ_{j=1}^∞ ⟨h, hj⟩ hj;
ii) Σ_{j=1}^∞ |⟨h, hj⟩|² = ‖h‖².
Proof i) As h can be represented as a linear combination of {hj : j = 1, 2, . . .},
h = Σ_{j=1}^∞ aj hj.
Then,
⟨h, hk⟩ = ⟨ Σ_{j=1}^∞ aj hj, hk ⟩ = ak.
ii) By the continuity and the linearity of the inner product, we have
‖h‖² = ⟨h, h⟩ = ⟨ Σ_{j=1}^∞ ⟨h, hj⟩ hj, Σ_{j=1}^∞ ⟨h, hj⟩ hj ⟩
= Σ_{j,k=1}^∞ ⟨h, hj⟩ ⟨h, hk⟩ ⟨hj, hk⟩
= Σ_{j=1}^∞ ⟨h, hj⟩².
Let H1 and H2 be two Hilbert spaces. We denote by H1 ⊗ H2 the completion of the linear span of {(h1, h2) : hi ∈ Hi, i = 1, 2} with respect to the norm ‖·‖_{H1⊗H2} defined by
‖(h1, h2)‖²_{H1⊗H2} = ‖h1‖²_{H1} + ‖h2‖²_{H2}.
It is easy to show that H1 ⊗ H2 is also a Hilbert space. H1 ⊗ H2 is called the tensor product of the Hilbert spaces H1 and H2.
6.2 Transformation to a Hilbert space
In this section, we consider the uniqueness of the solution to the SDE (5.13) for the unnormalized filter Vt taking values in MF(Rd). Let H0 = L²(Rd) be the Hilbert space consisting of square-integrable functions on Rd with the usual L²-norm and inner product given by
‖φ‖₀² = ∫_{Rd} |φ(x)|² dx  and  ⟨φ, ψ⟩₀ = ∫_{Rd} φ(x)ψ(x) dx,   (6.1)
respectively. Let MG(Rd) be the space of finite signed measures on Rd. To obtain good estimates and to derive uniqueness for the solution to equation (5.13), we transform an MG(Rd)-valued process to an H0-valued process. For any ν ∈ MG(Rd) and δ > 0, let
(Tδν)(x) = ∫_{Rd} Gδ(x − y) ν(dy),   (6.2)
where Gδ is the heat kernel given by
Gδ(x) = (2πδ)^{−d/2} exp( −|x|²/(2δ) ).
For t ≥ 0, we define operators Tt : H0 → H0 by
Ttφ(x) = ∫_{Rd} Gt(x − y) φ(y) dy,  ∀ φ ∈ H0.
Lemma 6.6 The family of operators {Tt : t ≥ 0} forms a contraction semigroup on H0, i.e. ∀ t, s ≥ 0 and φ ∈ H0, we have Tt+s = TtTs and ‖Ttφ‖₀ ≤ ‖φ‖₀.
Proof For any φ ∈ H0⁺, by Fubini's theorem, we have
Tt(Tsφ)(x) = ∫_{Rd} Gt(x − y) ( ∫_{Rd} Gs(y − z) φ(z) dz ) dy
= ∫_{Rd} ∫_{Rd} Gt(x − y) Gs(y − z) dy φ(z) dz
= ∫_{Rd} Gt+s(x − z) φ(z) dz
= Tt+sφ(x).
The case for general φ ∈ H0 follows from the linearity. By the Cauchy–Schwarz inequality, we have
‖Tδφ‖₀² = ∫_{Rd} ( ∫_{Rd} Gδ(x − y) φ(y) dy )² dx
≤ ∫_{Rd} ∫_{Rd} Gδ(x − y) φ(y)² dy ∫_{Rd} Gδ(x − z) dz dx
= ‖φ‖₀².
This proves the second property.
We shall need the following facts. Lemma 6.7 i) If ν ∈ MG (Rd ) and δ > 0, then Tδ ν ∈ H0 . ii) If ν ∈ MG (Rd ) and δ > 0, then T2δ |ν|0 ≤ Tδ |ν|0 , where |ν| is the total variation measure of ν.
99
100
6 : Uniqueness of the solution for Zakai’s equation
Proof i) By the Cauchy–Schwarz inequality, we get
2 Gδ (x − y)ν(dy) dx |Tδ ν(x)| dx = Rd Rd Rd Gδ (x − y)2 |ν|(dy)|ν|(Rd )dx ≤
2
Rd
Rd
= G2δ (0)|ν|(Rd )2 < ∞. ii) It follows from the semigroup property of Tt that T2δ = Tδ Tδ . Thus, by i) and the contraction property, we see that T2δ |ν|0 = Tδ (Tδ |ν|)0 ≤ Tδ |ν|0 .
Let Zsδ = Tδ Vs , where Vs is an MG (Rd )-valued solution to equaδ tion (5.13). To obtain an estimate for the H 0 -norm of the process Z , we need the following lemma. Recall that ν, f represents the integral of the function f with respect to the measure ν. Lemma 6.8 For any δ > 0, v ∈ MG (Rd ) and f ∈ H0 , we have i) Tδ v, f 0 = v, Tδ f .
(6.3)
ii) If, in addition, ∂i f ∈ H0 , then ∂i Tδ f = Tδ ∂i f , where ∂i f =
(6.4)
∂f ∂xi .
Proof i) First, we prove equation (6.3) for f ≥ 0. By Fubini’s theorem, we can change the order of the integrals below: Gδ (x − y)v(dy)f (x)dx Tδ v, f 0 = Rd Rd = Gδ (x − y)f (x)dxv(dy) Rd Rd = Tδ f (y)v(dy) = v, Tδ f . Rd
The case for general f follows from the linearity. ii) Recall that C01 (Rd ) denotes the collection of functions with compact support and continuous derivatives of order 1. Taking a test function ψ ∈
6.2
Transformation to a Hilbert space
C01 (Rd ), we have
2
Rd
≤
Rd
Rd
Gδ (x − y)|f (y)||∂i ψ(x)|dxdy
Rd
Gδ (x − y)2 |∂i ψ(x)|dxdy
= G2δ (0)
2 |∂i ψ(x)|dx
Rd
Rd
Rd
Rd
|f (y)|2 |∂i ψ(x)|dxdy
|f (y)|2 dy < ∞.
(6.5)
By Fubini’s theorem and the integration-by-parts formula, we have
− Tδ f , ∂i ψ
=−
0
Rd
Rd
Gδ (x − y)f (y)∂i ψ(x)dxdy
=− Gδ (x − y)∂i ψ(x)dxf (y)dy Rd Rd ∂ Gδ (x − y)ψ(x)dxf (y)dy. = d d R R ∂xi
(6.6)
Similar to equation (6.5), we can prove that
Rd
R
∂ Gδ (x − y)ψ(x)f (y) dxdy < ∞. d ∂x i
Thus, by Fubini’s theorem again, we can continue equation (6.6) and obtain
− Tδ f , ∂i ψ
0
∂ Gδ (x − y)f (y)dyψ(x)dx Rd Rd ∂yi ∂ = Gδ (x − y) f (y)dyψ(x)dx ∂yi Rd Rd = Tδ ∂i f , ψ 0 . =−
The proof of ii) now follows from duality. Replacing f by Tδ f in equation (5.13), we have δ Zt , f 0 = Vt , Tδ f = V0 , Tδ f +
t 0
Vs , LTδ f ds +
(6.7)
t 0
Vs , ∇ ∗ (Tδ f )c + (Tδ f )h∗ dYs .
101
102
6 : Uniqueness of the solution for Zakai’s equation
Note that for any ν ∈ MF (Rd ), % & d d 1 2 aij ∂ij (Tδ f ) + bi ∂i (Tδ f ) ν, LTδ f = ν, 2 i,j=1
i=1
d d " 1 ! bi ν, Tδ ∂i f , aij ν, Tδ ∂ij2 f − 2
=
i,j=1
i=1
where the last equality follows from equation (6.4). Hence, using equation (6.3), we have
d d " 1 ! 2 Tδ (bi ν), ∂i f 0 Tδ (aij ν), ∂ij f − ν, LTδ f = 0 2
i,j=1
=
i=1
d d " 1 ! 2 ∂i Tδ (bi ν), f 0 . ∂ij Tδ (aij ν), f + 0 2 i,j=1
(6.8)
i=1
Similarly, we can prove that ν, ∇ ∗ (Tδ f )c + (Tδ f )h∗ = −∇ ∗ Tδ (cν) + Tδ (h∗ ν), f 0 .
(6.9)
Inserting equations (6.8) and (6.9) back into equation (6.7) we have
Ztδ , f 0
=
Z0δ , f 0
d " 1 t! 2 ∂ij Tδ (aij Vs ), f ds + 0 2 0 i,j=1
+
d t i=1
0
∂i Tδ (bi Vs ), f
ds − 0
0
t
∇ ∗ Tδ (cVs ) − Tδ (h∗ Vs ), f
By Itô’s formula, we have
Ztδ , f
2 0
d t " δ ! 2 2 Zs , f 0 ∂ij Tδ (aij Vs ), f ds = Z0δ , f 0 + 0
i,j=1 0
+
d i=1 t
−
0
0
t
2 Zsδ , f 0 ∂i Tδ (bi Vs ), f 0 ds
2 Zsδ , f 0 ∇ ∗ Tδ (cVs ) − Tδ (h∗ Vs ), f 0 dYs
t ∗ + ∇ Tδ (cVs ) − Tδ (h∗ Vs ), f H 0
0
⊗Rm
2 ds.
0
dYs .
6.3
Some useful inequalities
Summing over f in a CONS of H0 , we get Ztδ 20
= Z0δ 20
+
d i=1 t
−
+
0
+
t
0
t 0
d t!
" Zsδ , ∂ij2 Tδ (aij Vs ) ds 0
i,j=1 0
2 Zsδ , ∂i Tδ (bi Vs ) 0 ds
2 Zsδ , ∇ ∗ Tδ (cVs ) − Tδ (h∗ Vs ) 0 dYs ' ' ∗ '∇ Tδ (cVs ) − Tδ (h∗ Vs )'2
H0 ⊗Rm
ds.
Taking expectations, we have
Eˆ Ztδ 20 = Eˆ Z0δ 20 +
+
+
d i,j=1 0 d i=1 t
+
0
−2 0
i,j=1 0
t
! " Eˆ Zsδ , ∂ij2 Tδ ((σ σ ∗ )ij Vs ) ds 0
! " Eˆ Zsδ , ∂ij2 Tδ ((cc∗ )ij Vs ) ds 0
ˆ Zδ , ∂i Tδ (bi Vs ) ds 2E s 0
' '2 Eˆ '∇ ∗ Tδ (cVs )'H
0 ⊗R
t
0 t
t 0
+
t
d
m
ds
Eˆ ∇ ∗ Tδ (cVs ), Tδ (h∗ Vs ) H
'2 ' Eˆ 'Tδ (h∗ Vs )'H
0 ⊗R
0 ⊗R
m
ds.
m
ds (6.10)
We will show that the integral terms on the right of equation (6.10) are bounded by a constant times the integral of Tδ (|Vs |)0 . To this end, we need some careful estimates that will be derived in the next section.
6.3
Some useful inequalities
To continue with the estimation in the last section, we now derive some useful inequalities. Throughout this section, we assume that fi : Rd → Rm , i = 1, 2, are two bounded Lipschitz continuous functions, namely, there
103
104
6 : Uniqueness of the solution for Zakai’s equation
exists a constant K such that ∀x, y ∈ Rd ,
|fi (x) − fi (y)| ≤ K|x − y|, and |fi (x)| ≤ K,
∀x ∈ Rd .
Note that |·| denotes the Euclidean norm. We will not indicate the dimension of the space when it is clear from the context. For example, | · | here denotes the norm in Rm . However, it might also denote the norm in Rd or R below. Lemma 6.9 Suppose that g ∈ H0 is such that ∂i g ∈ H0 , i = 1, . . . , d. Then, g, f1 ∂i g ≤ 1 Kg2 . 0 0 2
(6.11)
Proof First, we assume that f1 and g are continuously differentiable and have compact supports. Then, integrating by parts we have 1 1 2 g, f1 ∂i g 0 = f1 (x)∂i (g (x))dx = − g 2 (x)∂i f1 (x)dx, 2 Rd 2 Rd and hence
1 2 g, f1 ∂i g ≤ 1 g (x)∂i f1 (x)dx ≤ Kg20 . 0 2 Rd 2
The general result follows by approximation.
Note that for ζ ∈ MG (Rd ), f1 ζ is a Rm -valued signed measure and hence, Tδ (f1 ζ ) is in H0 ⊗ Rm . The next lemma gives some estimates for Tδ (f1 ζ ). We abuse the notation a little by using |ζ | to denote the total variation measure of the signed measure ζ , i.e. |ζ | = ζ + + ζ − ; while ζ + (resp. ζ − ) is the positive (resp. negative) part of ζ . Lemma 6.10 There exist constants K1 and K2 depending on K such that for any ζ ∈ MG (Rd ), ' ' ' ' 'Tδ (f1 ζ )' 'Tδ (|f1 ||ζ |)' ≤ K Tδ (|ζ |)0 , ≤ (6.12) m H ⊗R 0 0
' ' 'f1 ∂i Tδ (ζ ) − ∂i Tδ (f1 ζ )'
H0 ⊗Rm
and
≤ K1 T2δ (|ζ |)0 ,
Tδ (f2 ζ ), ∂i Tδ (f1 ζ ) H0 ⊗Rm ≤ K2 Tδ (|ζ |)20 .
(6.13)
(6.14)
6.3
Some useful inequalities
Proof The inequalities equation (6.12) follow from the following calculation: ' ' 'Tδ (f1 ζ )'2 = |Tδ (f1 ζ )(x)|2 dx H ⊗Rm
=
Rd
= ≤
(
Rd
Rd
0
Rd
Rd
Rd
Gδ (x − y)f1 (y)ζ (dy),
Rd
Rd
) Gδ (x − z)f1 (z)ζ (dz) dx
Gδ (x − y)Gδ (x − z) f1 (y), f2 (z) ζ (dy)ζ (dz)dx
Tδ (|f1 ||ζ |)(x)2 dx
'2 ' = 'Tδ (|f1 ||ζ |)'0 ≤ K2 Tδ (|ζ |)20 . Note that |f1 (x)∂i Tδ (ζ )(x) − ∂i Tδ (f1 ζ )(x)| (f1 (x) − f1 (y))∂xi Gδ (x − y)ζ (dy) . = Rd
(6.15)
As ∂i Gδ (x) = −
xi Gδ (x), δ
(6.16)
and
|x|2 Gδ (x) = exp − 4δ
d
2 2 G2δ (x),
(6.17)
by Lipschitz continuity of f1 , we can continue equation (6.15) as follows |f1 (x)∂i Tδ (ζ )(x) − ∂i Tδ (f1 ζ )(x)| |xi − yi | Gδ (x − y)|ζ |(dy) K|x − y| ≤ d δ R d |x − y|2 |x − y|2 ≤K exp − 2 2 G2δ (x − y)|ζ |(dy) δ 4δ Rd d
≤ 2 2 +2 KT2δ (|ζ |)(x),
(6.18)
105
106
6 : Uniqueness of the solution for Zakai’s equation
2 where the last inequality follows from the fact that u2 exp − u4 ≤ 4,
for all u ∈ R. Taking the H0 -norm of both sides of equation (6.18) gives d equation (6.13) with K1 = 2 2 +2 K. Finally, we prove equation (6.14). By triangular inequality, we have Tδ (f2 ζ ), ∂i Tδ (f1 ζ ) H0 ⊗Rm ≤ Tδ (f2 ζ ), f1 ∂i Tδ ζ H
0
⊗Rm
(6.19)
+ Tδ (f2 ζ ), ∂i Tδ (f1 ζ ) − f1 ∂i Tδ ζ H ⊗Rm . 0 Note that the first term on the right-hand side of equation (6.19) is ∂i (f ∗ Tδ (f2 ζ )), Tδ ζ 1
0
≤ ∂i f1∗ Tδ (f2 ζ ), Tδ ζ 0 + f1∗ ∂i (Tδ (f2 ζ )), Tδ ζ 0 ≤ KTδ (f2 ζ )H0 ⊗Rm Tδ ζ 0 + f2 ∂i (Tδ ζ ), f1 Tδ ζ H
+ ∂i (Tδ (f2 ζ )) − f2 ∂i (Tδ ζ ), f1 Tδ ζ H
m . 0 ⊗R
m 0 ⊗R
By equations (6.11) and (6.13), we then have ∂i (f ∗ Tδ (f2 ζ )), Tδ ζ 1
0
1 2 K Tδ ζ 2 + KK1 T2δ |ζ |0 Tδ ζ 0 2 1 2 2 ≤ K + K + KK1 Tδ |ζ |20 , 2 ≤ K2 Tδ |ζ |20 +
(6.20)
where the last inequality follows from Lemma 6.7. On the other hand, by equations (6.12) and (6.13), the second term of equation (6.19) is bounded by Tδ (f2 ζ )H0 ⊗Rm ∂i Tδ (f1 ζ ) − f1 ∂i Tδ ζ H0 ⊗Rm ≤ KTδ |ζ |0 K1 T2δ |ζ |0 ≤ KK1 Tδ |ζ |20 ,
(6.21)
6.3
Some useful inequalities
where the last inequality follows from Lemma 6.7. Therefore, using equations (6.20) and (6.21), we can continue equation (6.19) to finish the proof of equation (6.14). Lemma 6.11 There exists a constant K1 such that for any ζ ∈ MG (Rd ), we have ' '2 ' m ' d ' ' 2 ∗ ' ' ≤ K1 Tδ (|ζ |)2 . Tδ ζ , ∂ij Tδ ((cc )ij ζ ) + ∂ T (c ζ ) x δ ik i 0 ' ' 0 ' ' k=1 i=1
d ! i,j=1
"
0
(6.22) Proof To make clear the variable with respect to which the integral is taken, we use the following convention:
Rd
ζ (dx)f (x) for
Rd
f (x)ζ (dx)
when the expression for f is too long. Note that d !
Tδ ζ , ∂ij2 Tδ ((cc∗ )ij ζ )
i,j=1
=
d d i,j=1 R
dx
"
(6.23)
0
Rd
ζ (dy)Gδ (x−y)
Rd
ζ (dz)∂x2i xj Gδ (x−z)
k=1
Using the semigroup property of Gδ , we have Rd
Gδ (x − y)Gδ (x − z)dx = G2δ (y − z).
By equation (6.16), we get ∂ij2 Gδ (x)
=
xi xj 1i=j − δ δ2
m
Gδ (x).
As ∂x2i xj Gδ (x − z) = ∂z2i zj Gδ (x − z),
cik (z)cjk (z).
107
108
6 : Uniqueness of the solution for Zakai’s equation
we can continue equation (6.23) as follows d !
Tδ ζ , ∂ij2 Tδ ((cc∗ )ij ζ )
i,j=1
=
=
d
i,j=1
0
ζ (dy)
d i,j=1 R
d
"
ζ (dz)
Rd
m
cik (z)cjk (z)∂zi ∂zj
k=1
Rd
Gδ (x−y)Gδ (x−z)dx
(zi − yi )(zj − yj ) 1i=j ζ (dy) ζ (dz) − 2δ 4δ 2 Rd Rd
G2δ (z − y)
m
cik (z)cjk (z).
k=1
Interchanging y and z in the above equation, we arrive at d !
Tδ ζ , ∂ij2 Tδ ((cc∗ )ij ζ )
i,j=1
=
i,j=1
0
d
"
(zi − yi )(zj − yj ) 1i=j ζ (dy) ζ (dz) − 2δ 4δ 2 Rd Rd
1 (cik (z)cjk (z) + cik (y)cjk (y)). × G2δ (z − y) 2 m
(6.24)
k=1
Similarly, we can prove that '2 ' ' m ' ' ' d ' ' ∂ T (c ζ ) δ i ik ' ' ' ' k=1 i=1 0
=−
m
d
Tδ (cik ζ ), ∂i ∂j Tδ (cjk ζ ) 0
k=1 i,j=1
=−
m d k=1 i,j=1
(zi − yi )(zj − yj ) 1i=j ζ (dy) ζ (dz) − 2δ 4δ 2 Rd Rd
1 × G2δ (z − y) (cik (y)cjk (z) + cik (z)cjk (y)). 2
(6.25)
6.4
Uniqueness for Zakai’s equation
Hence, combining equations (6.24) and (6.25), the left-hand side of equation (6.22) is equal to
d
(zi − yi )(zj − yj ) 1i=j ζ (dy) ζ (dz) − 2δ 4δ 2 Rd Rd
i,j=1
G2δ (z − y)
1 (cik (y) − cik (z))(cjk (y) − cjk (z)). 2 m
×
k=1
Using equation (6.17) and the Lipschitz continuity of c, we see that the above quantity is bounded by d 1 |z − y|2 |z − y|2 1 exp − |ζ |(dy) |ζ |(dz) + 2 2δ 4δ 4δ 2 Rd Rd i,j=1
d
× 2 2 G4δ (z − y)K2 |z − y|2 ≤ 4K
d
2
d i,j=1 R
|ζ |(dy)
Rd
d
|ζ |(dz)2 2 G4δ (z − y)
d
= d 2 2 2 +2 K2 T2δ (|ζ |)20 d
≤ d 2 2 2 +2 K2 Tδ (|ζ |)20 , where the first inequality follows by bounding (4v2 + 2v)e−v . The lemma d follows with K1 = d 2 2 2 +2 K2 .
6.4
Uniqueness for Zakai’s equation
Now we continue the estimation started in Section 6.2 by making use of the inequalities we obtained in Section 6.3. Theorem 6.12 If V is an MG (Rd )-valued solution of equation (5.13) and Zδ = Tδ V, then t δ 2 δ 2 ˆ EZt 0 ≤ Z0 0 + K1 Eˆ Tδ (|Vs |)20 ds, (6.26) 0
where K1 is a constant. Proof The last term of equation (6.10) is bounded by a constant times Tδ (|Vs |)20 by equation (6.12). The bound for the second term of equation (6.10) follows from Lemma 6.11. The bound for the sum of the
109
110
6 : Uniqueness of the solution for Zakai’s equation
third and fifth terms of equation (6.10) also follows from Lemma 6.11. The bound for the fourth and sixth terms of equation (6.10) follows by equation (6.14). Corollary 6.13 If V is an MF (Rd )-valued solution of equation (5.13) and ˆ Vt 2 < ∞, ∀t ≥ 0. V0 ∈ H0 , then Vt ∈ H0 a.s. and E 0 Proof Since Vt is a measure, |Vt | = Vt . It follows from equation (6.26) that t δ 2 δ 2 ˆ EZt 0 ≤ Z0 0 + K1 Eˆ Zsδ 20 ds. 0
By Gronwall’s inequality, we have
Eˆ Ztδ 20 ≤ Z0δ 20 eK1 t . Note that lim Ztδ , φ 0 = lim
δ→0
(6.27)
δ→0 Rd
Gδ (x − y)φ(x)dxVt (dy) = Vt , φ .
Rd
Let {φj } be a complete, orthonormal system of H0 such that φj ∈ Cb (Rd ). Then, by Fatou’s lemma, ⎡ ⎤ ⎡ ⎤ 2 2 ˆ Zδ 2 ≤ V0 2 eK1 t . ˆ⎣ Eˆ ⎣ lim Ztδ , φj 0 ⎦ ≤ lim inf E Vt , φj ⎦ = E t 0 0 j
δ→0
j
Let
δ→0
Vt , φj φj .
V˜ t =
j
Then, V˜ t ∈ H0 and !
V˜ t , f
" 0
=
Vt , φj f , φj 0 = Vt , f .
j
ˆ Vt 2 < ∞. Hence, Vt ∈ H0 and E 0
These estimates give uniqueness of MF (Rd )-valued solutions with V0 ∈ H0 . Theorem 6.14 Suppose that V0 ∈ H0+ . Then Zakai’s equation (5.13) has at most one MF (Rd )-valued solution. Proof Let Vt1 and Vt2 be two MF (Rd )-valued solutions with the same initial value V0 . By Corollary 6.13, Vt1 , Vt2 ∈ H0 a.s. Let Vt = Vt1 − Vt2 . Then Vt ∈ H0 and t 2 ˆ ETδ Vt 0 ≤ K1 Eˆ Tδ (|Vs |)20 ds. 0
6.5
A duality representation
Note that for Vt ∈ H0 , we have |Vt | ∈ H0 and |Vt |0 = Vt 0 . Taking δ → 0, we get t t 2 2 ˆ ˆ EVt 0 ≤ K1 E |Vs |0 ds = K1 Eˆ Vs 20 ds. 0
0
By Gronwall’s inequality, we arrive at Vt ≡ 0.
By exactly the same argument we can prove the following theorem. Theorem 6.15 Suppose that V0 ∈ H0 . Then Zakai’s equation (5.13) has at most one H0 -valued solution.
6.5
A duality representation
In this section, we give a representation of the unnormalized filter in terms of the solution to an SPDE that is the dual of Zakai’s equation. This representation will be useful in proving the convergence of the numerical approximations of the optimal filter. To aid the understanding of this method, we recall the duality used in the proof of uniqueness for the solution of a linear partial differential equation (PDE). Let u be a solution to the following PDE: ∂u = L u, ∂s
(6.28)
with initial condition u0 , where L is a second-order differential operator and L is the adjoint operator of L. To prove the uniqueness for the solution of equation (6.28), we consider the following backward PDE for s ∈ [0, t] with t being fixed: ∂v ∂s = −Lv, (6.29) vt = g. Then,
d ds
us , vs 0 = 0 and, hence, ut , g 0 = u0 , v0 0 ,
here the notation ·, ·0 is the inner product in L2 (Rd ) introduced in equation (6.1). This implies the uniqueness for the solution to equation (6.28). Now we imitate equation (6.29) and consider the backward SPDE: ˆ dψs = −Lψs ds − ∇ ∗ ψs c + h∗ ψs dY 0 ≤ s ≤ t, s, (6.30) ψt = φ,
111
112
6 : Uniqueness of the solution for Zakai’s equation
where dˆ denotes the backward Itô integral. Namely, we take the right endpoints in the approximating Riemann sum in defining the stochastic integral. Remark 6.16 In the ordinary Riemann integral, it does not matter which point we take in each subinterval of a partition to define the Riemann sum. However, it is important to take the left endpoint when defining Itô’s stochastic integral. Therefore, it is crucial here to take the right endpoint in the Riemann sum when we consider the backward SPDE. Hereafter, we will denote by Cbk Rd , X the set of all bounded continuderivatives up to order ous mappings from Rd to X with bounded partial k d k, where X is a Hilbert space. We endow Cb R , X with the following norm ||ϕ||k,∞ = sup Dα ϕ (x)X , ϕ ∈ Cbk Rd , X , where α =
|α|≤k x∈R
α1, . . . , αd
d
is a multi-index, |α| = α 1 + · · · + α d and 1 d Dα ϕ = ∂1α · · · ∂dα ϕ. Also, let Wpk Rd , X be the set of all functions with generalized partial derivatives up to order k with both the function and all k d its partial derivatives being p-integrable. We endow Wp R , X with the following Sobolev norm ⎛ ||ϕ||k,p
=⎝
|α|≤k
Rd
⎞1 p
p Dα ϕ (x)X
dx⎠ .
When X is clear from the context or X = R, we will drop it from the notation for simplicity. To demonstrate the existence of a solution, we now convert the backward SPDE (6.30) to an ordinary SPDE by reversing the time parameter. Fix t > 0. For 0 < s < t, we define Y˜ s = Yt − Yt−s and ψ˜ s = ψt−s . Then, ψ˜ s satisfies the following forward SPDE d ψ˜ s = Lψ˜ s ds + ∇ ∗ ψ˜ s c + h∗ ψ˜ s d Y˜ s , ψ˜ 0 = φ.
0 ≤ s ≤ t,
(6.31)
˜ we get the As equation (6.31) is a Zakai-type equation with Y replaced by Y, existence of its solution (as the optimal filter for a suitable filtering model).
6.5
A duality representation
Similar to the proof of Theorem 6.14, we can show the uniqueness for the solution to equation (6.31) with φ ∈ H0+ = W20 (Rd )+ . However, we need
here the solution of equation (6.30) to be a process with values in Cb2 Rd . To achieve this, we show that ψs ∈ W2k Rd , where k is chosen so that
2(k − 2) > d, and then, using a standard Sobolev imbedding argument, which we state below (without giving the proof) for the convenience of the reader. We refer the reader to the book of Adams [1] for the proof of a more general version of the theorem. Theorem 6.17 (Sobolev) If kp > d + j, then Wpk (Rd ) can be embedded into j
Cb (Rd ), i.e. there is a constant K1 and a linear mapping from f ∈ Wpk (Rd ) j to f¯ ∈ C (Rd ) such that f (x) = f¯ (x) for almost every x and b
f¯ j,∞ ≤ K1 f k,p . Now we are ready to prove the existence of a smooth solution to the SPDE (6.31). We shall need the following Assumption (BD): The mappings a, b, c, h, φ are in Cbk (Rd , X ) with k = d2 + 2 and X being Sd , Rd , Rd×m , Rm and R, respectively. Also, we assume φ ∈ W2k (Rd ). Lemma 6.18 Suppose that Assumption (BD) holds. Then there exists a constant K1 independent of φ and s ∈ [0, t] such that (6.32) E[ψs 2k,2 ] ≤ K1 φ2k,2 . As a consequence ψs ∈ Cb2 Rd a.s. and there exists a constant K2 independent of φ and s ∈ [0, t] such that E[ψs 22,∞ ] ≤ K2 φ2k,2 . (6.33) Proof It follows from the same arguments as those leading to equation (6.27) that there exists a constant K3 such that
Eˆ ψs 20,2 ≤ K3 φ20,2 . Next, we take derivatives (smooth out by the Brownian semigroup Tδ as we did in Section 6.2 if necessary) on both sides of equation (6.31). For simplicity of notations, we assume d = 1. Then ψ˜ s1 ≡ ∇ ψ˜ s satisfies the following SPDE d ψ˜ s1 = L1 ψ˜ s1 ds + ∇ ∗ ψ˜ s1 c + c1 ψ˜ s1 + c2 ψ˜ d Y˜ s , ψ˜ 1 = ∇φ 0
113
114
6 : Uniqueness of the solution for Zakai’s equation
where L1 is a second order differential operator with bounded coefficients, ci s (i = 1, 2), are bounded functions. With similar arguments as in equation (6.27), we can prove that there is a constant K4 such that
Eˆ ψ˜ s1 20,2 ≤ K4 φ21,2 . The higher-derivative estimates for equation (6.32) follow by induction. The inequality equation (6.33) follows from Sobolev’s imbedding theorem. Remark 6.19 Condition (BD) is not sharp for the conclusion of Lemma 6.18 to hold. It can be relaxed by using Krylov’s Lp -theory, whose proof is more complicated than the proof we presented above. For g ∈ L2 ([0, t], Rm ), we define r √ 1 r W ∗ 2 θg (r) = exp −1 gs dWs + |gs | ds . 2 0 0
(6.34)
We will need the following lemma that implies that the family θgW (t) : g is bounded on [0, t] ˆ is dense in L2 (, FtW , P). ˆ satisfies Lemma 6.20 If ξ ∈ L2 (, FtW , P) Eˆ ξ θgW (t) = 0, for all bounded function g on [0, t], then ξ = 0 a.s. Proof Let
n : 1 ≤ i ≤ 2n , Hn = σ Wtin − Wti−1
where tin = it2−n , i = 0, 1, 2, . . . , 2n . Then {Hn } is a sequence of σ -fields increasing to FtW . By the martingale convergence theorem (Theorem 2.9), we have ˆ (ξ |Hn ) → ξ , ξn ≡ E a.s. n : 1 ≤ i ≤ 2n . Let Note that ξn is a function of Wtin − Wti−1 n
gsn
=
2 i=1
n ,t n ) (s), λi 1[ti−1 i
6.5
A duality representation
where the λi s are constants. Then ˆ ξ θ Wn (t) = E ˆ ξ θ Wn (t)|Hn = E ˆ ξn θ Wn (t) . ˆ E 0=E g
g
g
t Note that 0 |gsn |2 ds is non-random. This implies that the Fourier transformation of ξn is ⎛ ⎞⎞ ⎛ 2n √ ⎠⎠ = 0. n Eˆ ⎝ξn exp ⎝ −1 λi Wtin − Wti−1 i=1
Therefore, ξn = 0 a.s. and hence, ξ = 0 a.s.
The following lemma will play a key role in the proof of the convergence of Vtn to the unnormalized filter Vt in Chapter 8 as well as for the duality representation of the unnormalized filter. Recall that Xt is the signal and dMt = Mt h∗ (Xt )dYt . Note that ψ is Gt -measurable which is independent of FrB . The stochasr tic integral 0 Ms ∇ ∗ ψs σ (Xs )dBs is well defined on the stochastic basis ˆ F˜ s ), where F˜ s = Fs ∨ Gt , 0 ≤ s ≤ t. (, F , P, Lemma 6.21 Suppose that Condition (BD) holds. Then, for every t ≥ 0, we have t ψt (Xt )Mt − ψ0 (X0 ) = Ms ∇ ∗ ψs σ (Xs )dBs , a.s. (6.35) 0
Proof Let f and g be two bounded smooth functions on [0, t] taking values in Rm and Rd , respectively. Let θfY (r) be defined as in equation (6.34), and let θgB (r) be defined in a similar fashion. Note that both sides of equation (6.35) are Gt ∨ FtB -measurable. It follows from the previous lemma (with W replaced by (B, Y)) that in order to prove equation (6.35) it is sufficient to show that for all bounded functions f and g, we have Eˆ (ψt (Xt )Mt − ψ0 (X0 )) θfY (t)θgB (t) t ∗ Y B ˆ Ms ∇ ψs σ (Xs )dBs θf (t)θg (t) . (6.36) =E 0
Let
ˆ ψr (x)θ˜ Y (r)|Fr , r (x) = E f
∀x ∈ Rd ,
115
116
6 : Uniqueness of the solution for Zakai’s equation
where θ˜fY (r) = θfY (t)/θfY (r) = exp
t √ 1 t 2 ∗ −1 fs dYs + |fs | ds . 2 r r
Let θ˜gB (r) be defined similarly. Since ψr and θ˜fY (r) are measurable with respect to the σ -field Gr,t , which is independent of Fr , we get that ˆ ψr (x)θ˜ Y (r) . r (x) = E f As θ˜gB (r) is independent of Fr ∨ Gr,t and θgB (r) is a martingale, we have Eˆ ψr (x)θ˜fY (r)θ˜gB (r)|Fr = Eˆ Eˆ ψr (x)θ˜fY (r)θ˜gB (r)Fr ∨ Gr,t Fr ˆ ψr (x)θ˜ Y (r)E ˆ θ˜ B (r)Fr Fr =E g f ˆ ψr (x)θ˜ Y (r)Fr =E f = r (x). Hence, for r ∈ [0, t], we have Eˆ ψr (Xr )Mr θfY (t)θgB (t)|Fr = Mr θfY (r)θgB (r)Eˆ ψr (Xr )θ˜fY (r)θ˜gB (r)|Fr = r (Xr )Mr θfY (r)θgB (r).
(6.37)
t ˆ
t ∗ Note that r fs∗ dY s coincides with r fs dYs since fs is deterministic. Thus, by the backward Itô formula, we have √ ˆ r. dˆ θ˜fY (r) = − −1θ˜fY (r)fr∗ dY
(6.38)
Applying the backward Itô formula to equations (6.30) and (6.38), we get ˆ r θ˜ Y (r)) = − Lψr θ˜ Y (r)dr − ∇ ∗ ψr c + h∗ ψr θ˜ Y (r)dY ˆ r d(ψ f f f √ √ ˆ r + −1 ∇ ∗ ψr c + h∗ ψr fr θ˜ Y (r)dr − −1ψr θ˜fY (r)fr∗ dY f √ ∗ = −Lψr + −1 ∇ ψr c + h∗ ψr fr θ˜fY (r)dr √ ˆ r. − ∇ ∗ ψr c + h∗ ψr − −1ψr fr∗ θ˜fY (r)dY
6.5
A duality representation
Writing into integral form, we get t √ Y ˜ φ − ψs θf (s) = Lψr − −1 ∇ ∗ ψr c + h∗ ψr fr θ˜fY (r)dr s
+
t s
∇ ∗ ψr c + h∗ ψr −
√
ˆ r. −1ψr fr∗ θ˜fY (r)dY
Taking expectation on both sides, we see that t √ Lr − −1 ∇ ∗ r cfr + h∗ fr r dr, φ − s = s
and hence, r is the solution to the following PDE: √ d r = −Lr + −1 ∇ ∗ r cfr + h∗ fr r . dr As a consequence, is differentiable in r and has continuous first- and second-order partial derivatives in x. By Itô’s formula, we have √ dr (Xr ) = −Lr (Xr ) + −1 ∇ ∗ r cfr + h∗ fr r (Xr ) dr ˜ r (Xr )dr + ∇ ∗ r σ (Xr )dBr + c(Xr )dYr + L √ √ = −1 ∇ ∗ r cfr + h∗ fr r + −1∇ ∗ r ch (Xr )dr + ∇ ∗ r σ (Xr )dBr + c(Xr )dYr . (6.39) Note that dMr = Mr h∗ (Xr )dYr , dθfY (r) =
√ −1θfY (r)fr∗ dYr ,
dθgB (r) =
√ −1θgB (r)gr∗ dBr .
and
Applying Itô’s formula to the four equations above, we get d(r (Xr )Mr θfY (r)θgB (r)) √ = −1Mr θfY (r)θgB (r)∇ ∗ r σ (Xr )gr dr + d(mart.)
(6.40)
117
118
6 : Uniqueness of the solution for Zakai’s equation
Making use of equation (6.37) with r = t and r = 0, respectively, we get Eˆ (ψt (Xt )Mt − ψ0 (X0 )) θfY (t)θgB (t) ˆ E ˆ ψt (Xt )Mt θ Y (t)θ B (t)Ft − E ˆ ψ0 (X0 )θ Y (t)θ B (t) =E g g f f ˆ t (Xt )Mt θ Y (t)θ B (t) − 0 (X0 ) =E g f t √ (6.41) Eˆ Mr θfY (r)θgB (r)∇ ∗ r σ (Xr )gr dr, = −1 0
where the last equality follows from equation (6.38). ˆ F˜ s ), we get By Itô’s formula on the stochastic basis (, F , P, r Ms ∇ ∗ ψs σ (Xs )dBs θgB (r) 0
=
r
· · · dBs +
0
√
−1
0
r
Ms ∇ ∗ ψs σ (Xs )gs θgB (s)ds.
This implies that t ∗ Y B ˆ E Ms ∇ ψs σ (Xs )dBs θf (t)θg (t) 0
t ∗ Y B ˆ ˆ Ms ∇ ψs σ (Xs )dBs θf (t)θg (t)Gt =E E ˆ =E
0
√
−1
t
0
Ms ∇
∗
ψs σ (Xs )gs θgB (s)dsθfY (t)
t √ ˆ Ms ∇ ∗ ψs σ (Xs )gs θ B (s)θ Y (t)|Fs ds ˆ E Ms E = −1 g f 0
t √ ∗ Y B Y ˆ ˆ ˜ −1 Ms E ∇ ψs (Xs )θf (s)|Fs σ (Xs )gs θg (s)θf (s)ds =E ˆ =E
0
√
−1
0
t
Ms ∇ ∗ s (Xs )σ (Xs )gs θgB (s)θfY (s)ds .
(6.42)
From equations (6.41) and (6.42), we see that equation (6.36) holds, which then implies equation (6.35). As a corollary, we get the uniqueness for the solution to Zakai’s equation. Corollary 6.22 Suppose that Condition (BC) holds, φ ∈ Cb (Rd ) and π0 ∈ L2 (Rd ). Then
Vt , φ = π0 , ψ0 .
(6.43)
6.5
A duality representation
Proof First, we assume that Condition (BD) holds. By equation (5.12), we see that t Eˆ Ms ∇ ∗ ψs σ (Xs )dBs Gt = 0. 0
It then follows from equation (6.35) and Theorem 5.3 that ˆ (ψ0 (X0 )|Gt ) = π0 , ψ0 . ˆ φ(Xt )Mt Gt = E
Vt , φ = E Now we will remove the Condition (BD) but assume that φ and π0 are Lipschitz continuous functions. Let {(an , bn , cn , hn , φ n )} be a sequence of functions such that for each n, Condition (BD) is satisfied, and as n → ∞, it converges to (a, b, c, h, φ) in supremum norm. Let Vtn , ψ0n be given as before with (a, b, c, h, φ) being replaced by (an , bn , cn , hn , φ n ). We now prove the convergence of Vtn , ψ0n to Vt , ψ0 using the representation given by the Kallianpur–Striebel formula. By the proof above, we have that n Vt , φ = π0 , ψ0n . (6.44) Although the measure Pˆ n depends on n, the same process (Y, B) applies to all models and is an (m + d)-dimensional Brownian motion under Pˆ n for all n. As (X n , Mn ) is the unique strong solution to dXtn = (bn − cn hn )(Xtn )dt + cn (Xtn )dYt + σ n (Xtn )dBt dMtn = hn (Xtn )∗ dYt , its distribution does not depend on n. Thus
ˆn ˆ Mn f (X n )|Gt . Vtn , φ = EP Mtn f (Xtn )|Gt = E t t
Note that Eˆ Vtn , φ − Vt , φ ˆ Mn φ(X n ) − Mt φ(Xt ) ≤E t t # n # 2 ˆ ˆ |X n − Xt |2 , ˆ ≤ KE Mt − Mt + EMt K E t where we used the Lipschitz continuity and the boundedness of φ is the last ˆ |X n − Xt |2 → 0. Similar to the inequality. By Theorem 4.10, we have E t ˆ Mn − Mt → 0. Therefore, as proof of Theorem 4.10, we can show that E t n → ∞, Eˆ Vtn , φ − Vt , φ → 0. (6.45)
119
120
6 : Uniqueness of the solution for Zakai’s equation
Similarly, we have Eˆ π0 , ψ0n − π0 , ψ0 → 0.
(6.46)
Taking n → ∞ on both sides of equation (6.44), it follows from equations (6.45) and (6.46) that equation (6.43) holds when φ and π0 are Lipschitz continuous functions. Finally, for general φ ∈ Cb (Rd ) and π0 ∈ L2 (Rd ), we can approximate them by Lipschitz functions and obtain equation (6.43) by passing though the limit as we did above.
6.6
Notes
There are several theories developed that provide the existence and uniqueness for the solutions to linear SPDEs. For example, Krylov and Rozovskii [90], [91], Pardoux [132], Rozovskii [137] for the L2 -theory, Da Prato and J. Zabczyk [47], Krylov [88] for the Lp -theory, Krylov [89] for the analytic approach, and Kunita [94] for the semigroup approach. Most of the material in this chapter (except the last section) was taken from Kurtz and Xiong [97]. The method here can be applied to general linear SPDEs with Zakai’s equation as a special case. The last section is based on Crisan [38].
Uniqueness of the solution for the filtering equation
7
In this chapter, we prove the uniqueness for the solution to the filtering equation. To this end, we consider an interacting particle system whose weighted empirical distribution process satisfies the filtering equation. Note that the filtering equation is a non-linear stochastic partial differential equation (SPDE). The main idea in proving the uniqueness in this chapter is to show that the uniqueness of a non-linear SPDE is implied by that of the infinite system of ordinary stochastic differential equations and that of a corresponding linear SPDE, which follows from the same arguments as those in the previous chapter. The uniqueness for the system is obtained by a truncation argument.
7.1
An interacting particle system
In this section, we define an infinite particle system and prove that the weighted empirical measure process of this system is a solution of the filtering equation. This system will help us to prove the uniqueness for the solution to the filtering equation. Let β : Rd × P (Rd ) → Rm and b˜ : Rd × P (Rd ) → Rd be given by ˜ µ) = b(x) − c(x)β(x, µ). β(x, µ) = µ, h − h(x) and b(x, We consider an interacting particle system {(X i , Ai ), i = 1, 2, . . .} governed by the following equations: i = 1, 2, . . ., Xti
=
X0i
+
0
t
σ (Xsi )dBis
+
0
Ait
=
Ai0
+
t
t 0
˜ i , µs )ds + b(X s
t 0
Ais β ∗ (Xsi , µs )dνs ,
c(Xsi )dνs ,
(7.1)
(7.2)
122
7 : Uniqueness of the solution for the filtering equation
and 1 i At δXi , t n→∞ n n
µt = lim
(7.3)
i=1
where the Bi , i = 1, 2, . . ., are independent, standard Rd -valued Brownian motions; ν is a Rm -valued Brownian motion independent of {Bi : i = 1, 2, . . .}, and the notation δx stands for the Dirac point measure at x. More specifically, equation (7.3) means that for any f ∈ Cb (Rd ), n 1 i µt , f = lim At f (Xti ). n→∞ n
i=1
Definition 7.1 The triple (X, A, µ) is a solution to the system equations (7.1–7.3) if the equations (7.1–7.3) are satisfied and given the process (µ, ν), the sequence of stochastic processes (X i , Ai ), i = 1, 2, . . ., are conditionally independent with identical conditional distribution in the space C(R+ , Rd × R+ ). To prove the uniqueness of the solution to the system, we have to assume the Lipschitz continuity of the coefficients. To this end, we need to use a metric in the space P (Rd ). We shall take the Wasserstein metric defined below. For ν1 , ν2 ∈ P (Rd ), the Wasserstein metric is defined by ρ(ν1 , ν2 ) = sup {| ν1 , φ − ν2 , φ | : φ ∈ B1 } , where
B1 = φ : |φ(x) − φ(y)| ≤ |x − y|, |φ(x)| ≤ 1, ∀x, y ∈ Rd .
We note that the topology determined by the Wasserstein metric ρ is equivalent to the topology of weak convergence on P (Rd ). Under Condition (BC), we can verify that the coefficients of the infinite particle system satisfy the following conditions (S1) and (S2): (S1) There exists a constant K such that for each x ∈ Rd , ν ∈ P (Rd ) ˜ ν)|2 + |c(x)|2 + |β(x, ν)|2 ≤ K2 . |σ (x)|2 + |b(x, (S2) For each x1 , x2 ∈ Rd and ν1 , ν2 ∈ P (Rd ), ˜ 1 , ν1 ) − b(x ˜ 2 , ν2 )|2 |σ (x1 ) − σ (x2 )|2 + |b(x + |c(x1 ) − c(x2 )|2 + |β(x1 , ν1 ) − β(x2 , ν2 )|2 ≤ K2 (|x1 − x2 |2 + ρ(ν1 , ν2 )2 ).
7.1
An interacting particle system
We assume that (Ai0 , X0i ) are independent and identically distributed random vectors being independent of {Bi } and ν. For the moment, we suppose that the system has a solution, the existence of which will be proved later (cf. Theorem 7.7 below). The following proposition establishes the finiteness of the second moments for the locations and the weights of the particles in the system. Proposition 7.2 Suppose that Assumption (S1) holds and
E(A10 )2 + E|X01 |2 < ∞.
(7.4)
If (X, A, µ) is a solution of equations (7.1)–(7.3), then for every t ≥ 0 and i ∈ N, (7.5) E sup (Ais )2 + |Xsi |2 < ∞. 0≤s≤t
Proof Applying the Burkholder–Davis–Gundy inequality to Xti in equation (7.1) and using the inequality (a+b+c+d)2 ≤ 4(a2 +b2 +c2 +d 2 ), we have t i 2 i 2 E sup |Xs | ≤ 4E|X0 | + 16E |σ (Xsi )|2 ds 0≤s≤t
+ 4t E
0
0
t
˜ i , µs )|2 ds + 16E |b(X s
t
0
|c(Xsi )|2 ds
≤ 4E|X0i |2 + 32K2 t + 4K2 t 2 < ∞. Applying Itô’s formula to Ait in equation (7.2), we have ! " (Ait )2 = (Ai0 )2 exp 2Nti − N i , t
where
Nti =
0
t
β ∗ (Xsi , µs )dνs .
(7.6)
Since exp 2Nti − 2 N i t is a martingale and t ! " Ni = |β(Xsi , µs )|2 ds ≤ K2 t, t
0
by Doob’s inequality, we get
E sup (Ais )2 ≤ 4E(Ait )2 0≤s≤t
! " ! " = 4E (Ai0 )2 exp 2Nti − 2 N i exp N i t
≤ 4e
K2 t
E(Ai0 )2 .
t
123
124
7 : Uniqueness of the solution for the filtering equation
The next theorem shows that the weighted empirical measure of the system is indeed a solution to the filtering equation. Theorem 7.3 Let µt be the weighted empirical measure process for the particle system equations (7.1)–(7.3). Then, ∀φ ∈ Cb2 (Rd ), t t
µt , φ = π0 , φ +
µs , Lφ ds + µs , β ∗ (·, µs )φ + ∇ ∗ φc dνs . (7.7) 0
0
Proof Applying Itô’s formula to equations (7.1) and (7.2), for every φ ∈ Cb2 (Rd ), we have t i i i i Ais φ(Xsi )β ∗ (Xsi , µs )dνs At φ(Xt ) = A0 φ(X0 ) + 0
+
t 0
+ +
t
0 t 0
Ais Lφ(Xsi )ds
(7.8)
Ais ∇ ∗ φ(Xsi )σ (Xsi )dBis Ais ∇ ∗ φ(Xsi )c(Xsi )dνs .
i = 1, . . . , n, are independent Brownian motions, we can apply As the Burkholder–Davis–Gundy inequality to get n 2 1 t i ∗ i i i E sup As ∇ φ(Xs )σ (Xs )dBs 0 t≤T n Bi ,
i=1
T n 2 1 ≤4 2 E (Ais )2 ∇ ∗ φ(Xsi )σ (Xsi ) ds n 0 i=1
4 ∇φ2∞ K2 T E sup(A1s )2 → 0. n s≤T Taking limn→∞ n1 ni=1 on both sides of equation (7.8), we see that equation (7.7) holds. ≤
Remark 7.4 By the definition of β, we see that equation (7.7) coincides with the filtering equation (5.17).
7.2
The uniqueness of the system
In this section, we prove the uniqueness for the solution of the infinite system of stochastic differential equations. Although the coefficients are assumed
7.2
The uniqueness of the system
to be Lipschitz, the product aβ ∗ (x, µ), which appears in the system, is only Lipschitz in (a, x, µ) in the region where a is bounded. To get the uniqueness for the solution to the system, we need to use a localizing technique. Namely, first we prove the uniqueness when the weights of the particles are bounded, and then, extend to the whole space. However, there are infinitely many particles in the system, it is not possible to make their weights bounded (for example, Borel–Cantelli’s lemma tells us that, in the special case when individuals are independent of each other, the probability of their weights being bounded is 0). To overcome this difficulty, we will localize a kind of average of the weights. Here is the main theorem of this section. Theorem 7.5 Under the Assumptions (S1), (S2) and equation (7.4), the system has at most one solution. ˜ A, ˜ µ) Proof Let (X, A, µ) and (X, ˜ be two solutions of equations (7.1)–(7.3) with the same initial conditions. Recall that (Xi , Ai ), i = 1, 2, . . ., are conditionally independent with identical conditional distribution. It follows from the conditional version of the strong law of large numbers that the following limit 1 i 2 (At ) n→∞ n n
lim
i=1
exists almost surely. For any m ∈ N, let
$ n 1 i 2 τm = inf t : lim (At ) > m2 . n→∞ n i=1
Then, τm is an increasing sequence of stopping times. The stopping time τ˜m ˜ i ). Let ηm = τm ∧ τ˜m . Then, {ηm } is defined similarly (with Ait replaced by A t is again an increasing sequence of stopping times. We denote the limit of this sequence by η∞ . By equation (7.1) and the Burkholder–Davis–Gundy inequality, we get 2 i ˜i E|Xt∧η −X t∧ηm | m t ˜ i )|2 1s≤η ds |σ (Xsi ) − σ (X ≤ 12E m s 0
+ 3t E
t 0
˜ i , µs ) − b( ˜ X ˜ i , µ˜ s )|2 1s≤η ds |b(X m s s
125
126
7 : Uniqueness of the solution for the filtering equation
+ 12E
t 0
˜ i )|2 1s≤η ds |c(Xsi ) − c(X m s
≤ 3K2 (8 + t)E
t
˜ i |2 + ρ(µs , µ˜ s )2 1s≤η ds. |Xsi − X m s
0
(7.9)
For s ≤ ηm , we have n 1 i i i i ˜ φ(X ˜ )) (As φ(Xs ) − A ρ(µs , µ˜ s ) = sup lim s s φ∈B1 n→∞ n i=1
1 i ˜ i )| As |φ(Xsi ) − φ(X s φ∈B1 n→∞ n n
≤ sup lim
i=1
n 1 i ˜ i | φ(X ˜ i ) + sup lim |As − A s s φ∈B1 n→∞ n i=1
1 n→∞ n
≤ lim
n
1 i ˜ i |. |As − A s n→∞ n
˜ i | + lim Ais |Xsi − X s
i=1
n
i=1
Consequently, the by Cauchy–Schwarz inequality, we get
1 i 2 ρ(µs , µ˜ s ) ≤ lim (As ) n→∞ n n
1
1 i ˜ i |2 |Xs − X lim s n→∞ n
2
i=1
1 n→∞ n
+ lim
n
n
1 2
i=1
˜ i| |Ais − A s
i=1
1 i ˜ i |2 ≤ m lim |Xs − X s n→∞ n n
i=1
1 2
1 i ˜ i |, (7.10) |As − A s n→∞ n n
+ lim
i=1
where the last inequality follows from s ≤ ηm . Let i 2 ˜i −X fm (t) = E|Xt∧η t∧ηm | , m
and
⎡
2 ⎤
1 i ˜i |At∧ηm − A gm (t) = E ⎣ lim t∧ηm | n→∞ n n
i=1
⎦.
7.2
The uniqueness of the system
Then, 1 i 2 ˜i ≤ 2m E lim |Xs∧ηm − X s∧ηm | n→∞ n n
2
2
Eρ(µs , µ˜ s ) 1s≤ηm
i=1
2
1 i ˜i + 2E lim |As∧ηm − A s∧ηm | n→∞ n n
i=1
≤ 2m2 fm (s) + 2gm (s).
(7.11)
By equations (7.9), (7.10), (7.11) and Fatou’s lemma, we have, 2
fm (t) ≤ 3K (8 + t)
t 0
fm (s) + 2m2 fm (s) + 2gm (s) ds.
(7.12)
Next, we derive the estimate for $g_m(t)$. As
\[ A^i_t = A^i_0 \exp\Big( N^i_t - \tfrac{1}{2}\langle N^i\rangle_t \Big), \]
with $N^i$ given by equation (7.6), and making use of the fact that $|e^x - e^y| \le (e^x \vee e^y)|x-y|$, we have
\[
|A^i_t - \tilde A^i_t| \le (|A^i_t| \vee |\tilde A^i_t|)\,\Big| \int_0^t \big(\beta^*(X^i_s,\mu_s) - \beta^*(\tilde X^i_s,\tilde\mu_s)\big)\,d\nu_s
- \frac{1}{2}\int_0^t \big(|\beta(X^i_s,\mu_s)|^2 - |\beta(\tilde X^i_s,\tilde\mu_s)|^2\big)\,ds \Big|.
\]
For $t \le \eta_m$, it follows from the Cauchy–Schwarz inequality that
\[
\Big(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n |A^i_t - \tilde A^i_t|\Big)^2
\le \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \big((A^i_t)^2 \vee (\tilde A^i_t)^2\big)\;
\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \Big| \int_0^t \big(\beta^*(X^i_s,\mu_s) - \beta^*(\tilde X^i_s,\tilde\mu_s)\big)\,d\nu_s
- \frac{1}{2}\int_0^t \big(|\beta(X^i_s,\mu_s)|^2 - |\beta(\tilde X^i_s,\tilde\mu_s)|^2\big)\,ds \Big|^2.
\]
By the definition of $\eta_m$ and the inequality $(a+b)^2 \le 2(a^2+b^2)$, we can continue with
\[
\Big(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n |A^i_t - \tilde A^i_t|\Big)^2
\le 4m^2\Big( \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \Big|\int_0^t \big(\beta^*(X^i_s,\mu_s) - \beta^*(\tilde X^i_s,\tilde\mu_s)\big)\,d\nu_s\Big|^2
+ \frac{t}{4}\,4K^2 \int_0^t |\beta(X^i_s,\mu_s) - \beta(\tilde X^i_s,\tilde\mu_s)|^2\,ds \Big).
\]
By the Lipschitz continuity of $\beta$, we finally obtain
\[
\Big(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n |A^i_t - \tilde A^i_t|\Big)^2
\le 4m^2\Big( \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \Big|\int_0^t \big(\beta^*(X^i_s,\mu_s) - \beta^*(\tilde X^i_s,\tilde\mu_s)\big)\,d\nu_s\Big|^2
+ K^4 t \int_0^t \big(|X^i_s - \tilde X^i_s|^2 + \rho(\mu_s,\tilde\mu_s)^2\big)\,ds \Big).
\]
It follows from Fatou's lemma and the Burkholder–Davis–Gundy inequality that
\[
g_m(t) \le 4m^2 \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n \Big( 4E\int_0^t |\beta(X^i_s,\mu_s) - \beta(\tilde X^i_s,\tilde\mu_s)|^2\,1_{s\le\eta_m}\,ds
+ K^4 t\, E\int_0^t \big(|X^i_s - \tilde X^i_s|^2 + \rho(\mu_s,\tilde\mu_s)^2\big)\,1_{s\le\eta_m}\,ds \Big)
\le 4m^2(4K^2 + K^4 t)\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n E\int_0^t \big(|X^i_s - \tilde X^i_s|^2 + \rho(\mu_s,\tilde\mu_s)^2\big)\,1_{s\le\eta_m}\,ds
\le 4m^2(4K^2 + K^4 t)\int_0^t \big(f_m(s) + 2m^2 f_m(s) + 2g_m(s)\big)\,ds. \tag{7.13}
\]
Adding equations (7.12) and (7.13), for $t \le T$, we have
\[ f_m(t) + g_m(t) \le K(m,T)\int_0^t \big(f_m(s) + g_m(s)\big)\,ds, \]
where $K(m,T)$ is a constant. By Gronwall's inequality, we have
\[ f_m(t) + g_m(t) = 0. \tag{7.14} \]
Then for $m, i \in \mathbb{N}$ and $t \in [0,T]$, we have
\[ X^i_{t\wedge\eta_m} = \tilde X^i_{t\wedge\eta_m} \quad\text{and}\quad A^i_{t\wedge\eta_m} = \tilde A^i_{t\wedge\eta_m} \qquad \text{a.s.} \]
As $X^i$, $A^i$, $\tilde X^i$ and $\tilde A^i$ are continuous, we obtain almost surely that
\[ X^i_{t\wedge\eta_m} = \tilde X^i_{t\wedge\eta_m} \quad\text{and}\quad A^i_{t\wedge\eta_m} = \tilde A^i_{t\wedge\eta_m}, \qquad \forall\, m, i \in \mathbb{N} \text{ and } t \in [0,T]. \]
By equation (7.3), we have
\[ \mu_{t\wedge\eta_m} = \tilde\mu_{t\wedge\eta_m}, \quad \forall\, t \in [0,T], \quad \text{a.s.} \]
Hence,
\[ (X_t, A_t, \mu_t) = (\tilde X_t, \tilde A_t, \tilde\mu_t) \quad \text{for } t \le \eta_m \wedge T, \quad \text{a.s.} \]
Taking $T, m \to \infty$, we then get
\[ (X_t, A_t, \mu_t) = (\tilde X_t, \tilde A_t, \tilde\mu_t) \quad \text{for } t \le \eta_\infty, \quad \text{a.s.} \]
By the definition of $\eta_m$,
\[
P(\eta_m \le t) \le P\Big( \sup_{0\le s\le t}\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n (A^i_s)^2 \ge m^2 \Big)
\le \frac{1}{m^2}\,E\Big[ \sup_{0\le s\le t}\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n (A^i_s)^2 \Big]
\le \frac{1}{m^2}\,\liminf_{n\to\infty}\frac{1}{n}\sum_{i=1}^n E\Big[ \sup_{0\le s\le t}(A^i_s)^2 \Big]
= \frac{1}{m^2}\,E\Big[ \sup_{0\le s\le t}(A^1_s)^2 \Big],
\]
where the last inequality follows by moving the sup inside the sum and applying Fatou's lemma, and the last equality follows from the fact that $(X^i, A^i)$, $i = 1, 2, \ldots$, are conditionally independent with identical conditional distribution. Hence, by Proposition 7.2, $P(\eta_\infty \le t) = \lim_{m\to\infty} P(\eta_m \le t) = 0$, i.e. $\eta_\infty = \infty$ a.s., and the uniqueness follows.
7.3 Uniqueness for the filtering equation
In this section, we establish the uniqueness for the solution to the filtering equation (7.7). First, let us summarize the techniques that will be used in this section. Note that the optimal filter πt is a solution of equation (7.7). We assume
the existence of another solution $\mu_t$ and fix the variable in the non-linear functions in equation (7.7) by $\mu$ (cf. equations (7.15) and (7.19)) to obtain a linear SPDE whose uniqueness follows from arguments similar to those in Section 6.4. The uniqueness for the solution to equation (7.7) is implied by that of the linear SPDE equation (7.19) and that of the system equations (7.1)–(7.3) proved in the previous section (cf. the proof of Theorem 7.7 for this argument). Namely, we decompose the difficult uniqueness problem for the non-linear SPDE into two simpler problems: one for a linear SPDE and one for a system of ordinary SDEs.
Now we fix a $\mathcal{P}(\mathbb{R}^d)$-valued process $\mu_t$ and consider the linear equation
\[ \langle\eta_t, \phi\rangle = \langle\pi_0, \phi\rangle + \int_0^t \langle\eta_s, L\phi\rangle\,ds + \int_0^t \langle\eta_s, \beta_s^*\phi + \nabla^*\phi c\rangle\,d\nu_s, \tag{7.15} \]
where $\beta_s(x) = \beta(x,\mu_s)$.
Recall that $H_0 = L^2(\mathbb{R}^d)$ is defined in Section 6.2. By following exactly the same arguments as those used in Section 6.4, we have the following theorem.

Theorem 7.6 Suppose that $\pi_0 \in H_0$. Then equation (7.15) has at most one solution.

Finally, we consider the uniqueness of the solution of the non-linear SPDE equation (7.7).

Theorem 7.7 Suppose that $\pi_0 \in H_0$. Then there exists a unique $H_0$-valued solution of equation (7.7). Further, $\pi_t$ is the unique solution to the system equations (7.1)–(7.3).

Proof By Corollary 6.13, $\pi_t$ takes values in $H_0$. Let $\mu_t$ be another $H_0$-valued solution of equation (7.7). Consider the system of stochastic differential equations: for $i = 1, 2, \ldots$,
\[ X^i_t = X^i_0 + \int_0^t \tilde b(X^i_s,\mu_s)\,ds + \int_0^t \sigma(X^i_s)\,dB^i_s + \int_0^t c(X^i_s)\,d\nu_s, \tag{7.16} \]
and
\[ A^i_t = A^i_0 + \int_0^t A^i_s\,\beta^*(X^i_s,\mu_s)\,d\nu_s. \tag{7.17} \]
Let $\tilde\mu_t$ be given by
\[ \tilde\mu_t = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n A^i_t\,\delta_{X^i_t}. \tag{7.18} \]
As in Theorem 7.3, we can prove that $\tilde\mu$ is a solution of
\[ \langle\eta_t, \phi\rangle = \langle\pi_0, \phi\rangle + \int_0^t \langle\eta_s, L\phi\rangle\,ds + \int_0^t \langle\eta_s, \beta^*(\cdot,\mu_s)\phi + \nabla^*\phi c\rangle\,d\nu_s. \tag{7.19} \]
By Corollary 6.13, $\tilde\mu$ is $H_0$-valued. In particular, $\tilde\mu$ is an $H_0$-valued solution of equation (7.19). Since $\mu$ is also an $H_0$-valued solution of equation (7.19), it follows from Theorem 7.6 that $\tilde\mu = \mu$. Hence $\tilde\mu$, together with suitable $(X^i, A^i)$, $i = 1, 2, \ldots$, is a solution of the system equations (7.1)–(7.3).
Note that we may replace $\mu_t$ by $\pi_t$ in the equations (7.16) and (7.17) and then define $\tilde\pi_t$ by the right-hand side of equation (7.18). By a similar argument as the above, we then have $\tilde\pi_t = \pi_t$, and hence $\pi_t$ is a solution to the system equations (7.1)–(7.3). By the uniqueness of the solution to this system we see that $\pi_t = \mu_t$.
7.4 Notes
Limits of empirical measure processes for systems of interacting diffusions have been studied by various authors following the pioneering work by McKean [122] (see, for example, Chiang, et al. [28], Graham [71], Hitsuda and Mitoma [73], Kallianpur and Xiong [84], Méléard [123], and Morien [126]). In these papers, the driving processes in the models are assumed to be independent. The limit is then a deterministic, measure-valued function. Florchinger and Le Gland [63] and Del Moral [50] consider particle approximations for Zakai’s equation. Kotelenez [87] introduces a model of n-particles with the same driving process for each particle and studies the empirical process as the solution of an SPDE. In his model, the weights Ai are constants. Dawson and Vaillancourt [49] consider a model that corresponds to taking Ait ≡ 1. Bernard et al. [8] consider a system with time-varying weights and a deterministic limit. Kurtz and Xiong [97] study a general class of non-linear SPDEs whose solutions are represented as the weighted measure process of interacting particles systems. Most of the material in this chapter is taken from Kurtz and Xiong [97].
8 Numerical methods
Explicit solutions to the filtering equations are rarely available. Thus, to solve the filtering problems, we have to resort to numerical approximations. In this chapter, we study the numerical solutions to the optimal filters using certain particle systems. The main idea is to represent the solution to a stochastic partial differential equation through a system of weighted particles whose locations and weights are governed by stochastic differential equations that can be solved numerically. In Section 8.1, we consider a direct Monte-Carlo method based on the weighted particle-system representation. As the error in the Monte-Carlo approximation increases exponentially fast when the time parameter tends to infinity, due to the exponential growth of the variance of the weights of the particles in the system, we will modify the weight of each particle. However, we need to keep the total mass constant for the approximate filter. To this end, the number of particles in the system will be changed from time to time. We use a branching particle system to match the change of the number of particles in the system. In Section 8.2, we introduce this branching particle system to approximate the optimal filter. In Section 8.3, we give a primary estimate on the error bound between the unnormalized approximate filter and the optimal one with t fixed. Finally, in Section 8.4, we prove that the approximate filter converges uniformly in time as the number of particles tends to infinity and the step size between branching times tends to 0.
8.1 Monte-Carlo method
In this section, we will give a numerical scheme based on the weighted particle system representation equations (5.22–5.24) for the unnormalized filter. We first approximate the unnormalized filter by the weighted empirical measure of a finite system. Then, for each stochastic differential equation in that system, we approximate the solution using the Euler scheme.
First, we recall the weighted particle-system representation given in Section 5.4:
\[
\begin{cases}
dX^i_t = \tilde b(X^i_t)\,dt + c(X^i_t)\,dY_t + \sigma(X^i_t)\,dB^i_t, \\
dM^i_t = M^i_t\,h^*(X^i_t)\,dY_t, \qquad M^i_0 = 1, \quad i = 1, 2, \ldots, \\
\langle V_t, f\rangle = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^n M^i_t\,f(X^i_t),
\end{cases} \tag{8.1}
\]
where $\tilde b = b - ch$. We recall that $\{X^i_0, i = 1, 2, \ldots\}$ are i.i.d. random vectors with common distribution $\pi_0$ in $\mathbb{R}^d$. Let
\[ N^i_t = \int_0^t h^*(X^i_s)\,dY_s. \]
Then $N^i$ is a $\hat P$-martingale with Meyer's process satisfying
\[ \langle N^i\rangle_t = \int_0^t |h(X^i_s)|^2\,ds \le K^2 t, \]
where the constant $K$ is given in equation (5.3). An application of Itô's formula shows that $M^i_t$ is given by
\[ M^i_t = \exp\Big( N^i_t - \tfrac{1}{2}\langle N^i\rangle_t \Big). \tag{8.2} \]

Proposition 8.1 Suppose that Condition (BC) holds. Then for each $i \in \mathbb{N}$,
\[ \hat E \sup_{0\le s\le T} |M^i_s|^2 \le 4\,e^{K^2 T}. \tag{8.3} \]

Proof By equation (8.2), $M^i_t$ is a square-integrable martingale with
\[
\hat E\big[(M^i_t)^2\big] = \hat E\exp\big( 2N^i_t - \langle N^i\rangle_t \big)
= \hat E\Big[ \exp\Big( 2N^i_t - \tfrac{1}{2}\langle 2N^i\rangle_t \Big)\exp\big(\langle N^i\rangle_t\big) \Big]
\le e^{K^2 t}\,\hat E\exp\Big( 2N^i_t - \tfrac{1}{2}\langle 2N^i\rangle_t \Big)
= e^{K^2 t}.
\]
Consequently, equation (8.3) follows from Doob's inequality, equation (2.14).
Let
\[ V^n_t = \frac{1}{n}\sum_{i=1}^n M^i_t\,\delta_{X^i_t}. \tag{8.4} \]
The process $V^n_t$ is called the approximate unnormalized filter.
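In implementation terms, equation (8.4) says that the approximate unnormalized filter is just a weighted empirical measure, so integrating a test function against it is a weighted average of particle values. A minimal sketch in Python/NumPy (the array names are hypothetical and the one-dimensional setting is an assumption made only for the illustration):

```python
import numpy as np

def integrate_Vn(f, X, M):
    """<V_t^n, f> = (1/n) * sum_i M_t^i f(X_t^i) for particle locations X
    and weights M at time t, as in equation (8.4)."""
    return np.mean(M * f(X))
```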
The following corollary gives the convergence of $V^n_t$ as $n \to \infty$.

Corollary 8.2 Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. Let $f$ be a bounded continuous function. Then, for each $t \ge 0$, there exists a constant $K_1$ such that
\[ \hat E\,|\langle V^n_t, f\rangle - \langle V_t, f\rangle|^2 \le \frac{K_1\|f\|_\infty^2}{n}. \tag{8.5} \]

Proof Note that, given $\mathcal{G}_t$, the sequence of random vectors $\{(M^i_t, X^i_t) : i = 1, 2, \ldots\}$ is conditionally independent with the same conditional distribution. For any $i$, we have
\[ \hat E\big( M^i_t f(X^i_t) \,\big|\, \mathcal{G}_t \big) = \langle V_t, f\rangle. \]
Thus, by equation (8.4), we have
\[
\hat E\,|\langle V^n_t, f\rangle - \langle V_t, f\rangle|^2
= \hat E\Big( \frac{1}{n^2}\sum_{i=1}^n \hat E\big( \big| M^i_t f(X^i_t) - \hat E(M^i_t f(X^i_t)\,|\,\mathcal{G}_t) \big|^2 \,\big|\, \mathcal{G}_t \big) \Big)
= \frac{1}{n^2}\sum_{i=1}^n \hat E\,\big| M^i_t f(X^i_t) - \hat E(M^i_t f(X^i_t)\,|\,\mathcal{G}_t) \big|^2
= \frac{1}{n}\,\hat E\,\big| M^1_t f(X^1_t) - \hat E(M^1_t f(X^1_t)\,|\,\mathcal{G}_t) \big|^2
\le \frac{4\|f\|_\infty^2}{n}\,\hat E\big( (M^1_t)^2 \big). \tag{8.6}
\]
Equation (8.5) then follows from Proposition 8.1 and equation (8.6).
Next, we apply the Euler scheme to approximate the solution of the finite system $\{(M^i, X^i) : i = 1, 2, \ldots, n\}$. For $\delta > 0$, let
\[ \eta_\delta(t) = j\delta \quad \text{for } j\delta \le t < (j+1)\delta. \]
Set $Z^i_t = \log M^i_t$. Define the finite system $\{(X^{\delta,i}, Z^{\delta,i}, V^{n,\delta}) : i = 1, 2, \ldots, n\}$ as follows:
\[
\begin{cases}
dX^{\delta,i}_t = \tilde b(X^{\delta,i}_{\eta_\delta(t)})\,dt + c(X^{\delta,i}_{\eta_\delta(t)})\,dY_t + \sigma(X^{\delta,i}_{\eta_\delta(t)})\,dB^i_t, \\
dZ^{\delta,i}_t = h^*(X^{\delta,i}_{\eta_\delta(t)})\,dY_t - \tfrac{1}{2}\big|h(X^{\delta,i}_{\eta_\delta(t)})\big|^2\,dt, \\
V^{n,\delta}_t = \frac{1}{n}\sum_{i=1}^n \exp\big(Z^{\delta,i}_t\big)\,\delta_{X^{\delta,i}_t}.
\end{cases} \tag{8.7}
\]
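The system (8.7) can be advanced step by step directly. The sketch below (Python/NumPy) illustrates one possible implementation for a one-dimensional state and observation, with the coefficient functions b_tilde, c, sigma, h supplied by the caller; these names, and the choice of updating the log-weights Z rather than the weights themselves, mirror the scheme above, but the code is only an illustration under those simplifying assumptions and is not code from the text.

```python
import numpy as np

def euler_weighted_filter(n, delta, n_steps, b_tilde, c, sigma, h,
                          sample_pi0, dY, rng=None):
    """Euler scheme (8.7) for the finite weighted particle system.

    dY: observation increments over steps of length delta (length n_steps).
    Returns particle locations X and log-weights Z; the approximate
    unnormalized filter V^{n,delta}_t acts on f as mean(exp(Z) * f(X)).
    """
    rng = np.random.default_rng() if rng is None else rng
    X = sample_pi0(n)                    # i.i.d. initial positions with law pi_0
    Z = np.zeros(n)                      # Z^i_0 = log M^i_0 = 0
    for k in range(n_steps):
        Xold = X.copy()                  # coefficients frozen at eta_delta(t)
        dB = rng.normal(0.0, np.sqrt(delta), size=n)
        X = X + b_tilde(Xold) * delta + c(Xold) * dY[k] + sigma(Xold) * dB
        Z = Z + h(Xold) * dY[k] - 0.5 * h(Xold) ** 2 * delta
    return X, Z

def integrate_Vn_delta(f, X, Z):
    """<V^{n,delta}_t, f> = (1/n) sum_i exp(Z^i) f(X^i)."""
    return np.mean(np.exp(Z) * f(X))
```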
The next theorem proves the convergence of each particle in the finite system to the corresponding one in the infinite system equation (8.1) as $\delta \to 0$.

Theorem 8.3 Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. For each positive integer $i$ and for each $T > 0$, we have
\[ \hat E\Big( \sup_{t\le T}|X^{\delta,i}_t - X^i_t|^2 + \sup_{t\le T}|Z^{\delta,i}_t - Z^i_t|^2 \Big) \le K_1(T)\,\delta, \tag{8.8} \]
where $K_1(T)$ is a constant depending on $T$.

Proof By equations (8.1) and (8.7), we have
\[
X^{\delta,i}_t - X^i_t = \int_0^t \big( \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^i_s) \big)\,ds
+ \int_0^t \big( c(X^{\delta,i}_{\eta_\delta(s)}) - c(X^i_s) \big)\,dY_s
+ \int_0^t \big( \sigma(X^{\delta,i}_{\eta_\delta(s)}) - \sigma(X^i_s) \big)\,dB^i_s.
\]
Applying the Burkholder–Davis–Gundy and Hölder's inequalities, we obtain
\[
\hat E\sup_{s\le t}|X^{\delta,i}_s - X^i_s|^2
\le 3T\int_0^t \hat E\big| \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^i_s) \big|^2\,ds
+ 12\int_0^t \hat E\big| c(X^{\delta,i}_{\eta_\delta(s)}) - c(X^i_s) \big|^2\,ds
+ 12\int_0^t \hat E\big| \sigma(X^{\delta,i}_{\eta_\delta(s)}) - \sigma(X^i_s) \big|^2\,ds. \tag{8.9}
\]
It follows from equation (8.7) that
\[
\hat E|X^{\delta,i}_t - X^{\delta,i}_{\eta_\delta(t)}|^2 \le 3K^2(t - \eta_\delta(t))^2 + 3K^2 E(|Y_t - Y_{\eta_\delta(t)}|^2) + 3K^2\hat E(|B^i_t - B^i_{\eta_\delta(t)}|^2) \le K_2\delta,
\]
and hence,
\[
\hat E\big| \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^i_s) \big|^2
\le 2\hat E\big| \tilde b(X^{\delta,i}_{\eta_\delta(s)}) - \tilde b(X^{\delta,i}_s) \big|^2 + 2\hat E\big| \tilde b(X^{\delta,i}_s) - \tilde b(X^i_s) \big|^2
\le 2K^2 K_2\delta + 2K^2 f_\delta(s),
\]
where
\[ f_\delta(t) \equiv \hat E\sup_{s\le t}|X^{\delta,i}_s - X^i_s|^2. \]
The other two terms in equation (8.9) can be estimated similarly. Therefore,
\[
f_\delta(t) \le 3T\int_0^t 2\big(K^2 K_2\delta + K^2 f_\delta(s)\big)\,ds + 24\int_0^t 2\big(K^2 K_2\delta + K^2 f_\delta(s)\big)\,ds
\le K_3(T)\delta + K_4(T)\int_0^t f_\delta(s)\,ds.
\]
By Gronwall's inequality, we have
\[ \hat E\sup_{t\le T}|X^{\delta,i}_t - X^i_t|^2 \le K_5(T)\,\delta. \]
A similar inequality holds for $\hat E\sup_{t\le T}|Z^{\delta,i}_t - Z^i_t|^2$.
To study the convergence of measure-valued processes, we need a metric on the space of finite measures. Once again, we use the Wasserstein metric introduced in Chapter 6. We recall that for $\nu_1, \nu_2 \in M_F(\mathbb{R}^d)$, the Wasserstein metric is given by
\[ \rho(\nu_1,\nu_2) = \sup\big\{ |\langle\nu_1,\phi\rangle - \langle\nu_2,\phi\rangle| : \phi \in B_1 \big\}, \]
where
\[ B_1 = \big\{ \phi : |\phi(x) - \phi(y)| \le |x - y|,\ |\phi(x)| \le 1,\ \forall\,x, y \in \mathbb{R}^d \big\}. \]
Under this metric, $M_F(\mathbb{R}^d)$ becomes a Polish space. We will be dealing with measures of the form
\[ \nu_j = \frac{1}{n}\sum_{i=1}^n a^j_i\,\delta_{x^j_i}, \qquad j = 1, 2. \]
It is useful to note that in this case, for all $\phi \in B_1$,
\[
|\langle\nu_1,\phi\rangle - \langle\nu_2,\phi\rangle|
\le \frac{1}{n}\sum_{i=1}^n |a^1_i - a^2_i|\,|\phi(x^1_i)| + \frac{1}{n}\sum_{i=1}^n a^2_i\,|\phi(x^1_i) - \phi(x^2_i)|
\le \frac{1}{n}\sum_{i=1}^n (a^1_i \vee a^2_i)\big( |x^1_i - x^2_i| + |\log a^1_i - \log a^2_i| \big),
\]
where we used $|a - b| \le (a \vee b)\,|\log a - \log b|$ in the last inequality above. Hence
\[ \rho(\nu_1,\nu_2) \le \frac{1}{n}\sum_{i=1}^n (a^1_i \vee a^2_i)\big( |x^1_i - x^2_i| + |\log a^1_i - \log a^2_i| \big). \tag{8.10} \]
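The bound (8.10) is computable directly from the particle locations and weights, which makes it convenient for monitoring the distance between two weighted particle approximations in practice. A minimal sketch, assuming a one-dimensional state and index-paired particles (the function name is hypothetical):

```python
import numpy as np

def wasserstein_bound(x1, a1, x2, a2):
    """Upper bound (8.10) on rho(nu_1, nu_2) for two weighted samples of size n.

    x1, x2: particle locations, arrays of shape (n,);
    a1, a2: strictly positive weights of shape (n,), paired by particle index.
    """
    return np.mean(np.maximum(a1, a2) *
                   (np.abs(x1 - x2) + np.abs(np.log(a1) - np.log(a2))))
```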
Corollary 8.4 Assume that $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. Then there exists a constant $K_1(T)$ such that
\[ \hat E\sup_{t\le T}\rho(V^{n,\delta}_t, V^n_t) \le K_1(T)\sqrt{\delta}. \tag{8.11} \]

Proof Note that by equation (8.10), we have
\[
\rho(V^{n,\delta}_t, V^n_t)
\le \frac{1}{n}\sum_{i=1}^n (M^{\delta,i}_t \vee M^i_t)\big( |X^{\delta,i}_t - X^i_t| + |Z^{\delta,i}_t - Z^i_t| \big)
\le \Big( \frac{1}{n}\sum_{i=1}^n (M^{\delta,i}_t \vee M^i_t)^2 \Big)^{1/2}
\Big( \frac{1}{n}\sum_{i=1}^n \big( |X^{\delta,i}_t - X^i_t| + |Z^{\delta,i}_t - Z^i_t| \big)^2 \Big)^{1/2},
\]
where the last inequality follows from the Cauchy–Schwarz inequality. The conclusion then follows from Proposition 8.1 and Theorem 8.3.

Finally, we combine both approximating procedures ($\delta \to 0$ and $n \to \infty$). The sampling error and the discretization error together will give us the following overall error estimate.

Theorem 8.5 Let $\bar V^n_t = V^{n,1/n}_t$. Assume $\hat E|X^i_0|^2 < \infty$ and Condition (BC) holds. Then there exists a constant $K_1(t)$ such that
\[ \hat E\,\rho(\bar V^n_t, V_t) \le \frac{K_1(t)}{\sqrt{n}}. \]

We note that the weights introduced above have mean 1 and variance growing exponentially fast as $t \to \infty$. Therefore, the error associated with the numerical scheme introduced above grows exponentially fast as $t \to \infty$. To avoid this drawback of the numerical scheme, we consider in the next section a branching particle system to modify the weights of the particles at the time-discretization steps.
8.2 A branching particle system
In this section, we introduce the branching particle-system approximation of the optimal filter. The main purpose is to reduce the variance of the
weights of the particles in the system. The idea is to divide the time interval into small subintervals; the weight of each particle at any partition time is then modified by an exponential martingale that depends on the signal and the noise in the small interval prior to that time, instead of on the whole interval starting from 0.
Now we proceed to the definition of the branching particle system. Initially, there are $n$ particles of weight 1 each at locations $x^i_n$, $i = 1, 2, \ldots, n$, satisfying the following initial condition (I): The initial positions $\{x^i_n : i = 1, 2, \ldots, n\}$ of the particles are i.i.d. random vectors in $\mathbb{R}^d$ with the common distribution $\pi_0 \in \mathcal{P}(\mathbb{R}^d)$.
Note that, under Condition (I), we have that
\[ \pi^n_0 = \frac{1}{n}\sum_{i=1}^n \delta_{x^i_n} \to \pi_0 \]
in $\mathcal{P}(\mathbb{R}^d)$ almost surely; and for any $\phi \in C_b(\mathbb{R}^d)$,
\[ \hat E\,\langle\pi^n_0 - \pi_0, \phi\rangle^2 = \frac{1}{n}\big( \langle\pi_0, \phi^2\rangle - \langle\pi_0, \phi\rangle^2 \big) \le 4n^{-1}\|\phi\|_{0,\infty}^2, \]
where $\|\phi\|_{0,\infty}$ denotes the supremum norm of $\phi$.
Let $\delta = \delta_n = n^{-2\alpha}$, $0 < \alpha < 1$. For $j = 0, 1, 2, \ldots$, there are $m^n_j$ particles alive at time $t = j\delta$. Note that $m^n_0 = n$. During the time interval $(j\delta, (j+1)\delta)$, the particles move according to the following diffusions: for $i = 1, 2, \ldots, m^n_j$,
\[ X^i_t = X^i_{j\delta} + \int_{j\delta}^t \sigma(X^i_s)\,dB^i_s + \int_{j\delta}^t \tilde b(X^i_s)\,ds + \int_{j\delta}^t c(X^i_s)\,dY_s. \tag{8.12} \]
At the end of the interval, the $i$th particle ($i = 1, 2, \ldots, m^n_j$) branches (independently of the others) into a random number $\xi^i_{j+1}$ of offspring such that the conditional expectation and the conditional variance, given the information prior to the branching, satisfy
\[ \hat E\big( \xi^i_{j+1} \,\big|\, \mathcal{F}_{(j+1)\delta-} \big) = \tilde M^n_{j+1}(X^i), \quad\text{and}\quad \mathrm{Var}^{\hat P}\big( \xi^i_{j+1} \,\big|\, \mathcal{F}_{(j+1)\delta-} \big) = \gamma^n_{j+1}(X^i), \tag{8.13} \]
where $\gamma^n_{j+1}(X^i)$ is arbitrary,
\[ \tilde M^n_{j+1}(X^i) = \frac{M^n_{j+1}(X^i)}{\frac{1}{m^n_j}\sum_{\ell=1}^{m^n_j} M^n_{j+1}(X^\ell)}, \tag{8.14} \]
and
\[ M^n_{j+1}(X^i) = \exp\Big( \int_{j\delta}^{(j+1)\delta} h^*(X^i_t)\,dY_t - \frac{1}{2}\int_{j\delta}^{(j+1)\delta} |h(X^i_t)|^2\,dt \Big). \tag{8.15} \]
To minimize $\gamma^n_{j+1}$, we take
\[
\xi^i_{j+1} =
\begin{cases}
[\tilde M^n_{j+1}(X^i)] & \text{with probability } 1 - \{\tilde M^n_{j+1}(X^i)\}, \\
[\tilde M^n_{j+1}(X^i)] + 1 & \text{with probability } \{\tilde M^n_{j+1}(X^i)\},
\end{cases}
\]
where $\{x\} = x - [x]$ is the fractional part of $x$, and $[x]$ is the largest integer that is not greater than $x$. In this case, we have
\[ \gamma^n_{j+1}(X^i) = \{\tilde M^n_{j+1}(X^i)\}\,\big(1 - \{\tilde M^n_{j+1}(X^i)\}\big). \]
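This offspring rule is straightforward to implement: each particle produces either the integer part of its normalized weight or one more, with probabilities chosen so that the conditional mean matches the weight exactly. The following sketch (Python/NumPy, with hypothetical argument names) illustrates one branching step; it is an illustration of the rule above, not code from the text.

```python
import numpy as np

def branch(weights_tilde, rng):
    """Sample offspring numbers xi_i with E[xi_i] = weights_tilde[i] and
    minimal conditional variance {w}(1 - {w}), as in the rule above."""
    floor = np.floor(weights_tilde)
    frac = weights_tilde - floor                     # {x} = x - [x]
    extra = rng.random(len(weights_tilde)) < frac    # add 1 with probability {x}
    return (floor + extra).astype(int)

# Example of one branching step: particle indices are repeated according to
# their offspring numbers, and the offspring start with weight 1 each.
# rng = np.random.default_rng(0)
# offspring = branch(np.array([0.4, 1.7, 2.0]), rng)
# next_positions = np.repeat(current_positions, offspring)
```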
Now we define the approximate filter as follows:
\[ \pi^n_t = \frac{1}{m^n_j}\sum_{i=1}^{m^n_j} \tilde M^n_j(X^i, t)\,\delta_{X^i_t}, \qquad j\delta \le t < (j+1)\delta, \]
where
\[ M^n_j(X^i, t) = \exp\Big( \int_{j\delta}^t h^*(X^i_s)\,dY_s - \frac{1}{2}\int_{j\delta}^t |h(X^i_s)|^2\,ds \Big), \tag{8.16} \]
and
\[ \tilde M^n_j(X^i, s) = \frac{M^n_j(X^i, s)}{\frac{1}{m^n_j}\sum_{\ell=1}^{m^n_j} M^n_j(X^\ell, s)}. \]
Namely, the $i$th particle has a time-dependent weight $\tilde M^n_j(X^i, t)$. At the end of the interval, i.e. at $t = (j+1)\delta$, this particle dies and gives birth to a random number of offspring, whose conditional expectation is equal to the pre-death weight (note that $\tilde M^n_{j+1}(X^i) = \tilde M^n_j(X^i, (j+1)\delta)$) of the particle. The new particles start from their mother's position with weight 1 each.
The process $\pi^n_t$ is called the hybrid filter since it involves both a branching particle system and the empirical measure of these weighted particles. In the earlier stages of the study of particle approximations of the optimal filter, the particle approximation was defined as $\pi^n_t$ without the weights, i.e. the particle filter is
\[ \tilde\pi^n_t = \frac{1}{m^n_j}\sum_{i=1}^{m^n_j} \delta_{X^i_t}, \qquad j\delta \le t < (j+1)\delta. \tag{8.17} \]
Thus, the current approximate filter $\pi^n_t$ is a combination of the weighted filter introduced in Section 8.1 and the particle filter equation (8.17). That is why we call it the hybrid filter.
Since Zakai's equation for the unnormalized filter $V_t$ is much simpler than the Kushner–FKK equation for the optimal filter $\pi_t$, to study the convergence of $\pi^n_t$ to $\pi_t$ it is more convenient to consider an auxiliary process first. Let
\[ \eta^n_k = \prod_{j=1}^{k} \frac{1}{m^n_{j-1}}\sum_{\ell=1}^{m^n_{j-1}} M^n_j(X^\ell). \]
For $k\delta \le t < (k+1)\delta$, we define
\[ V^n_t = \eta^n_k\,\pi^n_t\,\frac{1}{n}\sum_{i=1}^{m^n_k} M^n_k(X^i, t) = \eta^n_k\,\frac{1}{n}\sum_{i=1}^{m^n_k} M^n_k(X^i, t)\,\delta_{X^i_t}. \]
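Putting the pieces together, one interval of the hybrid filter consists of propagating the particles by (8.12), accumulating the weights (8.16), normalizing them as in (8.14), and branching at the end of the interval. The sketch below (Python/NumPy, one-dimensional state and observation, user-supplied coefficient functions, Euler steps inside the interval) is a schematic illustration of this loop under those simplifying assumptions; it reuses the hypothetical branch function given above and is not code from the text.

```python
import numpy as np

def hybrid_filter_interval(X, dY, dt, b_tilde, c, sigma, h, rng):
    """One interval [j*delta, (j+1)*delta) of the branching (hybrid) filter.

    X: current particle positions (all with weight 1 at the start of the interval);
    dY: observation increments over the Euler sub-steps of this interval.
    Returns the positions starting the next interval and the average weight,
    whose running product approximates eta^n_k.
    """
    m = len(X)
    logM = np.zeros(m)                         # log of the interval weights (8.16)
    for dYk in dY:
        dB = rng.normal(0.0, np.sqrt(dt), size=m)
        X = X + b_tilde(X) * dt + c(X) * dYk + sigma(X) * dB     # (8.12), Euler step
        logM += h(X) * dYk - 0.5 * h(X) ** 2 * dt                # (8.16), Euler step
    M = np.exp(logM)
    M_tilde = M / M.mean()                     # normalized weights as in (8.14)
    offspring = branch(M_tilde, rng)           # residual branching rule
    X_next = np.repeat(X, offspring)           # offspring restart with weight 1
    return X_next, M.mean()
```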
We will prove that $V^n_t$ converges to the unnormalized filter $V_t$. In the rest of this section, we derive a few estimates for the branching particle system introduced above.

Lemma 8.6 There exists a constant $K_1$ such that for any $i = 1, 2, \ldots, m^n_j$ and $0 \le j \le [T/\delta]$, we have
\[ \hat E\big( |M^n_{j+1}(X^i) - 1|^2 \,\big|\, \mathcal{F}_{j\delta} \big) \le K_1\delta. \tag{8.18} \]

Proof By equation (8.16), similarly to Proposition 8.1, it is easy to show that for $j\delta \le t \le (j+1)\delta$,
\[ \hat E\big( (M^n_j(X^i, t))^2 \,\big|\, \mathcal{F}_{j\delta} \vee \mathcal{F}^i_{j\delta,(j+1)\delta} \big) \le e^{K^2\delta}, \tag{8.19} \]
where $\mathcal{F}^i_{j\delta,(j+1)\delta} = \sigma\{B^i_s - B^i_{j\delta} : j\delta \le s \le (j+1)\delta\}$ is the $\sigma$-field generated by the increments of $B^i_t$ in $t \in [j\delta, (j+1)\delta]$. Applying Itô's formula to
equation (8.16), we have
\[ M^n_j(X^i, s) = 1 + \int_{j\delta}^s M^n_j(X^i, r)\,h^*(X^i_r)\,dY_r. \tag{8.20} \]
Thus,
\[
\hat E\big( |M^n_{j+1}(X^i) - 1|^2 \,\big|\, \mathcal{F}_{j\delta} \big)
= \hat E\Big( \int_{j\delta}^{(j+1)\delta} |M^n_j(X^i, r)|^2\,|h(X^i_r)|^2\,dr \,\Big|\, \mathcal{F}_{j\delta} \Big)
\le \int_{j\delta}^{(j+1)\delta} e^{K^2\delta}K^2\,dr
= K^2 e^{K^2\delta}\delta. \tag{8.21}
\]
The inequality (8.18) then follows by taking $K_1 = K^2 e^{K^2}$.
Remark 8.7 In the proof of equation (8.21), we have not used the full strength of equation (8.19). Instead, we only need $\hat E\big( (M^n_j(X^i, t))^2 \,|\, \mathcal{F}_{j\delta} \big) \le e^{K^2\delta}$. The full strength of equation (8.19) will be needed in the proof of Theorem 8.10.

Lemma 8.8 For each $1 \le j \le [T/\delta]$, we have
\[ \hat E\big( m^n_j (\eta^n_j)^2 \big) \le n\,e^{K^2 T}. \]

Proof As
\[ m^n_j = \sum_{i=1}^{m^n_{j-1}} \xi^i_j, \]
we have
\[
\hat E\big( m^n_j \,\big|\, \mathcal{F}_{j\delta-} \big) = \sum_{i=1}^{m^n_{j-1}} \hat E\big( \xi^i_j \,\big|\, \mathcal{F}_{j\delta-} \big)
= \sum_{i=1}^{m^n_{j-1}} \tilde M^n_j(X^i) = m^n_{j-1}. \tag{8.22}
\]
By equation (8.19), for $1 \le i \le m^n_{j-1}$, we have
\[ \hat E\big( M^n_j(X^i)^2 \,\big|\, \mathcal{F}_{(j-1)\delta} \big) \le e^{K^2\delta}. \tag{8.23} \]
Thus,
\[
\hat E\Big( \big(\eta^n_j/\eta^n_{j-1}\big)^2 \,\Big|\, \mathcal{F}_{(j-1)\delta} \Big)
= \hat E\Big( \Big( \frac{1}{m^n_{j-1}}\sum_{k=1}^{m^n_{j-1}} M^n_j(X^k) \Big)^2 \,\Big|\, \mathcal{F}_{(j-1)\delta} \Big)
\le \frac{1}{m^n_{j-1}}\sum_{k=1}^{m^n_{j-1}} \hat E\big( M^n_j(X^k)^2 \,\big|\, \mathcal{F}_{(j-1)\delta} \big)
\le e^{K^2\delta}. \tag{8.24}
\]
As $\eta^n_j$ is $\mathcal{F}_{j\delta-}$-measurable and $\eta^n_{j-1}$ is $\mathcal{F}_{(j-1)\delta}$-measurable, it follows from equations (8.22) and (8.23) that
\[
\hat E\big( m^n_j (\eta^n_j)^2 \big) = \hat E\Big( \hat E\big( m^n_j (\eta^n_j)^2 \,\big|\, \mathcal{F}_{j\delta-} \big) \Big)
= \hat E\big( m^n_{j-1} (\eta^n_j)^2 \big)
= \hat E\Big( m^n_{j-1} (\eta^n_{j-1})^2\,\hat E\big( (\eta^n_j/\eta^n_{j-1})^2 \,\big|\, \mathcal{F}_{(j-1)\delta} \big) \Big)
\le e^{K^2\delta}\,\hat E\big( m^n_{j-1} (\eta^n_{j-1})^2 \big).
\]
By induction, we get
\[ \hat E\big( m^n_j (\eta^n_j)^2 \big) \le e^{K^2 T} m^n_0 = n\,e^{K^2 T}, \qquad \text{for } 1 \le j \le [T/\delta]. \]
Finally, we give an estimate of the conditional variance $\gamma^n_j(X^i)$.

Lemma 8.9 There exists a constant $K_1$ such that for any $j \ge 0$ and $i = 1, 2, \ldots, m^n_j$, we have
\[ \hat E\Big( \gamma^n_{j+1}(X^i)\,\big(\eta^n_{j+1}/\eta^n_j\big)^2 \,\Big|\, \mathcal{F}_{j\delta} \Big) \le K_1\sqrt{\delta}. \]
Proof As
\[ \gamma^n_{j+1}(X^i) \le \big| \tilde M^n_{j+1}(X^i) - 1 \big|, \]
we have
\[
\hat E\Big( \gamma^n_{j+1}(X^i)\,\big(\eta^n_{j+1}/\eta^n_j\big)^2 \,\Big|\, \mathcal{F}_{j\delta} \Big)
\le \hat E\Big( \big| \tilde M^n_{j+1}(X^i) - 1 \big|\Big( \frac{1}{m^n_j}\sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) \Big)^2 \,\Big|\, \mathcal{F}_{j\delta} \Big)
\le \hat E\Big( \Big( \big| M^n_{j+1}(X^i) - 1 \big| + \Big| \frac{1}{m^n_j}\sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) - 1 \Big| \Big)\frac{1}{m^n_j}\sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) \,\Big|\, \mathcal{F}_{j\delta} \Big)
\]
\[
\le \bigg( \Big\{ \hat E\Big( \big| M^n_{j+1}(X^i) - 1 \big|^2 \,\Big|\, \mathcal{F}_{j\delta} \Big) \Big\}^{1/2}
+ \Big\{ \hat E\Big( \Big| \frac{1}{m^n_j}\sum_{k=1}^{m^n_j} \big( M^n_{j+1}(X^k) - 1 \big) \Big|^2 \,\Big|\, \mathcal{F}_{j\delta} \Big) \Big\}^{1/2} \bigg)
\Big\{ \hat E\Big( \Big| \frac{1}{m^n_j}\sum_{k=1}^{m^n_j} M^n_{j+1}(X^k) \Big|^2 \,\Big|\, \mathcal{F}_{j\delta} \Big) \Big\}^{1/2}
\le 2\sqrt{K_2\delta}\,e^{K^2/2} = K_1\sqrt{\delta},
\]
where $K_1 = 2\sqrt{K_2}\,e^{K^2/2}$.

8.3 Convergence of $V^n_t$
In this section, we consider the convergence of $V^n_t$ to $V_t$ with $t$ fixed. We recall that $\{\psi_s, 0 \le s \le t\}$ is the solution to the backward SPDE equation (6.30).
Let $k\delta \le t < (k+1)\delta$. First, we note that $\langle V^n_{k\delta}, \psi_{k\delta}\rangle - \langle V^n_0, \psi_0\rangle$ can be written as a telescopic sum
\[ \sum_{j=1}^{k} \big( \langle V^n_{j\delta}, \psi_{j\delta}\rangle - \langle V^n_{(j-1)\delta}, \psi_{(j-1)\delta}\rangle \big). \]
As ψt = φ, we get
n , ψkδ Vtn , φ − V0n , ψ0 = Vtn , ψt − Vkδ +
k !
" ! " ˆ V n , ψjδ |Fjδ− ∨ Gjδ,t Vjδn , ψjδ − E jδ
j=1
+
k ! " ! Eˆ V n , ψjδ |Fjδ− ∨ Gjδ,t − V n
( j−1)δ , ψ( j−1)δ
jδ
"
j=1
≡ I1n + I2n + I3n ,
(8.25)
where Gs,t = σ (Yu − Ys : s ≤ u ≤ t). By the definition of V n , we have mn
k 1 i I1n = ηkn ) . Mkn (X i , t)ψt (Xti ) − ψkδ (Xkδ n
i=1
Note that
Eˆ
!
"
Vjδn , ψjδ |Fjδ− ∨ Gjδ,t
⎛
⎞ i ˆ ⎝ηn 1 ξji ψjδ (Xjδ )Fjδ− ∨ Gjδ,t ⎠ =E j n mnj−1 i=1
mnj−1
= ηjn
1 i ˆ ψjδ (Xjδ )E ξji |Fjδ− ∨ Gjδ,t n i=1
mn
j−1 1 i ˆ = ηjn ψjδ (Xjδ )E ξji |Fjδ− n
i=1
mn
j−1 1 i ˜ n = ηjn ψjδ (Xjδ )Mj (X i ), n
(8.26)
i=1
where the second equality follows from the measurability of ηjn , mnj−1 ,
i ) with respect to F ψjδ (Xjδ jδ− ∨ Gjδ,t , the third equality follows from the fact that Yt is of independent increments. Thus,
I2n =
k j=1
mn
j−1 1 i ˜ n (X i )). ηjn ψjδ (Xjδ )(ξji − M j n
i=1
By equation (8.26) and the definitions of V n and ηn , we have
I3n =
k j=1
=
k
⎞ mnj−1 mnj−1 1 1 i ˜ n n ⎝ηn ψjδ (Xjδ )Mj (X i ) − ηj−1 ψ( j−1)δ (X(i j−1)δ )⎠ j n n ⎛
i=1
i=1
mnj−1
n ηj−1
j=1
1 i )Mjn (X i ) − ψ( j−1)δ (X(i j−1)δ ) . ψjδ (Xjδ n i=1
Theorem 8.10 Suppose that the conditions (BD) and (I) hold. Then there exists a constant $K_1$ such that
\[ \hat E\,|\langle V^n_t, \phi\rangle - \langle V_t, \phi\rangle|^2 \le K_1 n^{-(1-\alpha)}\|\phi\|_{k,2}^2, \]
where $k = \frac{d}{2} + 2$ is given in Condition (BD).
Proof Denote the term in the summation in I3n by aj , i.e. I3n = equation (6.35), we get
1 ηjn
aj+1 =
mn
j
n
i=1
( j+1)δ
jδ
k
j=1 aj .
By
Mjn (X i , s)∇ ∗ ψs σ (Xsi )dBis .
Thus,
Eˆ aj+1 |Fjδ =
1 ηjn n
mn
j
i=1
Eˆ
( j+1)δ
jδ
Mjn (X i , s)∇ ∗ ψs σ (Xsi )dBis Fjδ
= 0,
and hence,
Eˆ ((I3n )2 ) =
k−1 j=0
⎛
mnj
1 Eˆ ⎝ηjn n i=1
( j+1)δ
jδ
⎞2 Mjn (X i , s)∇ ∗ ψs σ (Xsi )dBis ⎠ .
Since {(Bis − Bijδ , Xsi ) : jδ ≤ s ≤ (j + 1)δ}, i = 1, 2, . . . , mnj are conditionally (given Fjδ ∨ Gt ) independent, we can continue with
Eˆ ((I3n )2 ) =
k−1 j=0
ˆ =E
⎛ ⎛⎛ ⎞ ⎞ ⎞2 mnj ( j+1)δ ⎜ ⎜ 1 ⎟ n 2⎟ Eˆ ⎝Eˆ ⎝⎝ Mjn (X i , s)∇ ∗ ψs σ (Xsi )dBis ⎠Fjδ ∨ G⎠ t (ηj ) ⎠ n jδ
k−1 j=0
i=1
mn
j 1 ( j+1)δ n i 2 ∗ Mj (X , s) |∇ ψs σ (Xsi )|2 (ηjn )2 ds. n2 jδ
i=1
i Conditioning on Fjδ ∨ Fjδ,( j+1)δ , we then get
Eˆ ((I3n )2 ) ˆ =E
k−1 j=0
ˆ =E
k−1 j=0
n
mj 1 ( j+1)δ ˆ n i 2 ∗ i 2 i n 2 M E (X , s) |∇ ψ σ (X )| F ∨ F s jδ s j jδ,( j+1)δ (ηj ) ds n2 jδ i=1
mn
j 1 ( j+1)δ ˆ n i 2 i M E (X , s) | F ∨ F jδ j jδ,( j+1)δ n2 jδ
i=1
i n 2 Eˆ |∇ ∗ ψs σ (Xsi )|2 |Fjδ ∨ Fjδ,( j+1)δ (ηj ) ds, where the last equality follows from the independent increments of Y and i n i the fact that, given Fjδ ∨ Fjδ,( j+1)δ , Mj (X , s) is Gs -measurable and ψs is Gs,t -measurable. Using equation (8.19), we can continue with the above estimate as follows
Eˆ ((I3n )2 ) ≤ e
K2 δ
Eˆ
k−1 j=0
mn
j 1 ( j+1)δ ˆ ∗ 2 E ψ | F ∇ σ 20,∞ (ηjn )2 ds. s jδ 0,∞ n2 jδ
i=1
Since Y is of independent increments and ψs is Gs,t -measurable, for s ≥ jδ, we have Eˆ ∇ ∗ ψs 20,∞ |Fjδ = Eˆ ∇ ∗ ψs 20,∞ ≤ K2 φ2k,2 .
Thus, Eˆ ((I3n )2 ) ≤ K3 n−2 φ2k,2 Eˆ mnj (ηjn )2 ≤ K4 n−1 φ2k,2 ,
(8.27)
where the last inequality follows from Lemma 8.8. It follows from the independent increment property of Y that ˆ ξ i − M ˜ n (X i )Fj δ− ∨ Gt = E ˜ n (X i )Fj δ− = 0. Eˆ ξji − M j j j Thus, for j < j , we have mn
mn
j−1 j −1 1 i i n i 1 ˆ ˜ ˜ n (X i ))ηn ηn E ψjδ (Xjδ )(ξj − Mj (X )) ψj δ (Xji δ )(ξji − M j−1 j j n n
i=1
i=1
n
mj −1 1 i i n i 1 ˆ ˜ =E ψjδ (Xjδ )(ξj − Mj (X )) ψj δ (Xji δ ) n n i=1 i=1 i n i n n ˆ ˜ × E(ξj − Mj (X )|Fj δ− ∨ Gt )ηj−1 ηj mnj−1
= 0. Therefore 2 mnj−1 k n 2 n1 i i n i ˆ ˆ ˜ E((I2 ) ) = E ηj ψjδ (Xjδ )(ξj − Mj (X )) j=1 n i=1 =
k j=1
⎞2 mnj−1 1 i ˜ n (X i ))⎠ Eˆ ⎝ηjn ψjδ (Xjδ )(ξji − M j n ⎛
i=1
⎛ ⎛⎛ n ⎞ ⎞ ⎞2 mj−1 k ⎜ ⎜ 1 i n 2⎟ ˜ n (X i ))⎠ Fjδ− ∨ Gt⎟ = Eˆ ⎝Eˆ ⎝⎝ ψjδ (Xjδ )(ξji − M ⎠ (ηj ) ⎠ j n j=1
i=1
mn
j−1 k 1 i 2 n ˆ ψjδ (Xjδ ) γj (X i )(ηjn )2 . =E n2
j=1
i=1
By Lemma 8.9, we can continue the above estimate with n
mj−1 k 1 ˆ n 2 2 n i n 2 ˆ ˆ E I2 E γ (X )(η ) F =E ψ jδ ( j−1)δ 0,∞ j j n2 j=1
i=1 n
mj−1 k √ 1 ˆ 2 n 2 ˆ ≤E ψ E δ jδ 0,∞ K5 (η( j−1)δ ) 2 n j=1
i=1
k √ 1 ˆ n E mj−1 (η(nj−1)δ )2 ≤ K6 δφ2k,2 2 n j=1
√ 1 2 ≤ K6 δφ2k,2 2 kneK T n ≤ K7 n−(1−α) φ2k,2 . The term I1n can be estimated in a manner similar to the estimate for I3n . The conclusion then follows from equation (8.25) and the fact that 2 2 Eˆ V0n , ψ0 − Vt , φ = Eˆ V0n − V0 , ψ0 ˆ ψ0 2 ≤ 4n−1 E 0,∞ ≤ K8 n−1 φ2k,2 , where the first inequality follows from Condition (I). mnk
Remark 8.11 For the case of π˜ tn , we define V˜ tn = n ηkn π˜ tn for kδ ≤ t < (k + 1)δ. In that case, ! " V˜ tn , φ − V0n , φ = I0n + I1n + I2n + I3n , where Ijn , j = 1, 2, 3 as before and mn
k 1 1 − Mkn (X i , t) φ Xti . I0n = ηkn n
i=1
It is easy to show that Eˆ |I0n |2 ≤ K1 φ20,2 n−2α .
Therefore, ! 2 " Eˆ V˜ tn , φ − Vt , φ ≤ K2 n−(1−α) ∨ n−2α φ2k,2 . For this case, the optimal α is 13 .
8.4 Convergence of $V^n$
In this section, we study the convergence of V n , regarding V n as a sequence of stochastic processes taking values in MF (Rd ). More specifically, we derive the convergence rate uniformly for t in any finite interval. The main idea of this section is to obtain an equation for the process Vtn and then to derive a maximum inequality making use of martingale theory. First, we consider the equation satisfied by Vtn . Let jδ ≤ t < (j + 1)δ. Applying Itô’s formula to equation (8.12), we get ˜ (X i )dt + ∇ ∗ f σ (X i )dBi + c(X i )dYt , df (Xti ) = Lf t t t t ˜ = Lf − h∗ c∗ ∇f is defined in Section 5.3. Combining it with where Lf equation (8.20), by Itô’s formula again, we have d Mjn (X i , t)f (Xti ) = Mjn (X i , t)Lf (Xti )dt + Mjn (X i , t)∇ ∗ f σ (Xti )dBit + Mjn (X i , t) ∇ ∗ fc + fh (Xti )dYt . It follows from the definition of Vtn that mn
j n 1 n i n d Vt , f = Vt , Lf dt + Mj (X , t)∇ ∗ f σ (Xti )dBit ηjn n i=1 n ∗ + Vt , ∇ fc + fh dYt .
The jump of Vtn at t = (j + 1)δ is n 1 ηj+1
n
=
mn
j
i=1
i ξj+1 δXi ( j+1)δ
n 1 ηj+1
n
mn
1 − ηjn n
mn
j
n Mj+1 (X i )δXi
( j+1)δ
i=1
j
i=1
i ˜ n (X i ) δ i ξj+1 −M j+1 X
( j+1)δ
.
Therefore,
Vtn , f = V0n , f + n,f + Nt
t
0
Vsn , Lf ds +
t
0
Vsn , ∇ ∗ fc + h∗ f dYs
ˆ tn,f , +N
(8.28)
where n,f
Nt
=
[t/δ] j=0
mnj
1 n i=1
(( j+1)δ)∧t
jδ
∇ ∗ f σ (Xsi )dBis ηjn ,
and ˆ tn,f = N
[t/δ] j=1
mn
j−1 1 i ˜ n (X i ))f (X i ). ηjn (ξj − M j jδ n
i=1
We will need the following lemma to help us calculate Meyer’s process for discontinuous martingales. Lemma 8.12 Suppose that {ζi , i = 1, 2, . . .} is a sequence of squareintegrable random variables such that ζi is Fi -measurable and ∀i = 1, 2, . . . . E ζi Fi−1 = 0, Let Mt =
[t]
ζi .
i=1
Then, {Mt , t ≥ 0} is a square-integrable martingale with Meyer’s process
Mt =
[t]
E ζi2 |Fi−1 .
(8.29)
i=1
Proof Let F˜ s = F[s] , s ≥ 0. Then Mt is F˜ t -adapted. For s < t, we have ⎛ ⎞ [t] E Mt − Ms |F˜ s = E ⎝ ζi F[s] ⎠ i=[s]+1
=
[t] i=[s]+1
E ζi F[s] = 0.
8.4
Convergence of Vn
This proves that Mt is a martingale. Denote the right-hand side of equation (8.29) by γt . Then, γt is predictable and
E
Mt2 − γt − Ms2 − γs F˜ s ⎛ ⎞ 2 [s] 2 [t] [t] ζi − ζi − E ζi2 |Fi−1 F[s] ⎠ = E⎝ i=1
⎛⎛
i=1
[t]
⎜ = E ⎝⎝
⎞2
=
[t]
ζi ⎠ + 2
i=[s]+1
ζi
i=[s]+1
[t]
i=[s]+1
⎛
E ζi2 |F[s] − E ⎝
i=[s]+1
[s]
ζj −
j=1
[t]
[t] i=[s]+1
⎞ ⎟ E ζi2 |Fi−1 F[s] ⎠
⎞ E ζi2 |Fi−1 F[s] ⎠
i=[s]+1
= 0. Thus, Mt = γt .
As an application of the lemma, we have
n,f
Corollary 8.13 The processes Nt gales with Meyer’s processes !
N
n,f
" t
=
[t/δ] j=0
ˆ tn,f are two uncorrelated martinand N
n
mj 1 (( j+1)δ)∧t ˆ ∗ i 2 E f σ (X )| |∇ Fjδ ds(ηjn )2 , s n2 jδ i=1
and !
ˆ n,f N
" t
=
[t/δ] j=1
n
mj−1 1 ˆ n i 2 i n 2 γ . E (X )f (X )(η ) F ( j−1)δ j jδ j n2 i=1
151
152
8 : Numerical methods
Proof We prove the second equality only. By Lemma 8.12, we have !
ˆ n,f N
" t
=
[t/δ] j=1
=
[t/δ] j=1
=
[t/δ] j=1
⎛⎛
⎞ 1 ˆ ⎜⎝ i ⎟ i ⎠ E⎝ (ξj − Mjn (X i ))f (Xjδ ) (ηjn )2 F( j−1)δ ⎠ 2 n ⎞2
mnj−1 i=1
⎛ ⎛⎛ n ⎞ ⎞ ⎞2 mj−1 1 ˆ ⎜ ˆ ⎜⎝ ⎟ ⎟ i ⎠ E ⎝E ⎝ (ξji − Mjn (X i ))f (Xjδ ) Fjδ−⎠(ηjn )2 F( j−1)δ⎠ 2 n i=1
n
mj−1 1 ˆ n i 2 i n 2 γ . E (X )f (X )(η ) F ( j−1)δ j jδ j n2
(8.30)
i=1
Define the usual distance on MF (Rd ) by d(ν1 , ν2 ) =
∞
2−i | ν1 − ν2 , fi | ∧ 1 ,
∀ ν1 , ν2 ∈ MF (Rd ), (8.31)
i=0
where f0 = 1 and for i ≥ 1, fi ∈ Cbk+2 (Rd ) ∩ W2k+2 (Rd ) with ||fi ||k+2,∞ ≤ 1 and also ||fi ||k+2,2 ≤ 1, where k = d2 + 2 is given in Condition (BD). Theorem 8.14 Suppose that the conditions (BD) and (I) hold and, additionally, that h ∈ Cbk (Rd ) ∩ W2k (Rd ). Then, there exists a constant K1 such that
Eˆ sup d(Vtn , Vt )2 ≤ K1 n−(1−α) .
(8.32)
t≤T
˜ 1 , ν2 ) be defined as in equation (8.31) with “∧1” removed. Proof Let d(ν Note that ˜ n , Vt )2 ≤ Eˆ sup d(V t t≤T
∞ i=1
−k
2
n 2 ˆ ˆ sup V n − Vt , 1 2 , +E E sup Vt − Vt , fi t t≤T
t≤T
(8.33)
8.4
Convergence of Vn
and 2 2 Eˆ sup Vtn − Vt , f ≤ K2 Eˆ V0n − V0 , f t≤T
+ K2
T
2 Eˆ Vtn − Vt , Lf dt
T
2 Eˆ Vtn − Vt , ∇ ∗ fc + hf dt
0
+ K2
0
ˆ + K2 E
[T/δ] j=0
ˆ + K2 E
i=1
[T/δ] j=1
mn
j 1 (( j+1)δ)∧t ∗ |∇ f σ (Xsi )|2 ds(ηjn )2 n2 jδ
mn
j−1 1 n i 2 i γj (X )f (Xjδ )(ηjn )2 . n2
(8.34)
i=1
By Condition (I) we know that the first term is bounded by K3 n−1 . By Theorem 8.10, we see that the second and the third terms are bounded by K4 n−(1−α) . For the fourth, we note that 4th term ≤ K5
[T/δ] j=0
δ ˆ n n 2 E mj (ηj ) ≤ K6 n−1 . n2
By Lemma 8.9, we have 5th term ≤ K7
[T/δ] j=1
√ n
δˆ n n 2 m ≤ K8 n−(1−α) . E (η ) j−1 j−1 2
(8.35)
To complete the proof we consider the last term in equation (8.33). Taking f = 1 in equation (8.34), we get 2 Eˆ sup Vtn − Vt , 1 ≤ K t≤T
T
0
ˆ + KE
2 Eˆ Vtn − Vt , h H dt [T/δ] j=1
mn
j−1 1 n i n 2 γj (X )(ηj ) . n2
(8.36)
i=1
Applying equation (8.35) with f = 1, we see that the second term of equation (8.36) is bounded by K9 n−(1−α) . By Theorem 8.10, we get that the first term of equation (8.36) is bounded by K10 n−(1−α) .
153
154
8 : Numerical methods
Thus, by taking K1 = K3 + 2K4 + K6 + K8 + K9 + K10 , we get ˜ n , Vt )2 ≤ K1 n−(1−α) . Eˆ sup d(V t
(8.37)
t≤T
˜ 1 , ν2 ). The desired inequality follows from d(ν1 , ν2 ) ≤ d(ν
Remark 8.15 Here, the estimate equation (8.37) is stronger than equation (8.32). We will need this stronger version in the proof of Theorem 8.16 below. Finally, we convert the convergence result to that for πtn . Theorem 8.16 Suppose that the conditions in Theorem 8.14 are satisfied. Then, there exists a constant K1 such that
E sup d(πtn , πt ) ≤ K1 n−
1−α 2
.
(8.38)
0≤t≤T
Proof Note that for f bounded by 1, we have V n , f V − V n , 1 + V n , 1 V n − V , f n t t t t t t n | πt − πt , f | = Vt , 1 Vt , 1 | Vt − Vtn , 1 | | Vtn − Vt , f | + . ≤
Vt , 1
Vt , 1 Thus, d(πtn , πt ) ≤
1 1 ˜ n | Vt − V n , 1 | + d(Vt , Vt ).
Vt , 1
Vt , 1
Recalling that the Radon–Nickodym derivative of the probability measure P on FT with respect to the probability measure Pˆ is MT , we have 1 1 ˜ n n ˆ E sup d(πt , πt ) ≤ E sup | Vt −V n , 1 |+ d(Vt , Vt ) MT
Vt , 1 0≤t≤T 0≤t≤T Vt , 1 1 1 2 2 2 MT2 n ˆ ˆ ≤ E sup | Vt − V , 1 | E sup 2 0≤t≤T 0≤t≤T Vt , 1 1 1 2 2 2 M T ˜ n , Vt )2 ˆ sup d(V + E Eˆ sup . t 2 0≤t≤T 0≤t≤T Vt , 1 (8.39) ˆ M4 < ∞. As It follows from the same argument as in equation (8.3) that E T 1 d log Vt , 1 = πt , h∗ dYt − | πt , h∗ |2 dt, 2
8.5
we have −4
Vt , 1
= exp −4
t
0
∗
πs , h dYs + 2
0
t
∗
Notes
2
| πs , h | ds .
It follows from the same argument as in equation (8.3) again that
Eˆ sup Vt , 1−4 < ∞. 0≤t≤T
Thus, by equations (8.37) and (8.39), there is a constant K1 such that equation (8.38) holds. For the particle filter, we have the following estimate. Remark 8.17 For the particle filter π˜ tn , we have E sup d(π˜ tn , πt ) ≤ K1 n−(1−α) ∨ n−2α . 0≤t≤T
8.5
Notes
Particle-system approximation of optimal filter was studied in heuristic schemes in the beginning of the 1990s by Gordon et al. [70], Gordon et al. [69], Kitagawa [86], Carvalho et al. [26], Del Moral et al. [55]. The rigorous proof of the convergence results for the particle filter were published by Del Moral [51] in 1996, and independently, by Crisan and Lyons [42] in 1997. Since then, many improvements have been made by various authors. Here, we would like to mention only a few: Crisan and Lyons [43], Crisan [38], [36], [35], [37], Crisan et al. [39], [41], Crisan and Doucet [40], Del Moral and Guionnet [53], Del Moral and Miclo [56]. Later, a central-limittype theorem was proved by Crisan and Xiong [45], on which most of this chapter is based, for a new class of hybrid filter as well as for the original branching particle filters. The material in Section 8.1 is based on Kurtz and Xiong [98]. Other numerical methods for the filtering problems have been studied extensively, although much of the work has been done under the assumption that the observation noise is independent of the signal. Kushner [102, 103, 104] develops approximation methods based on replacing the signal process by a finite-state Markov chain that approximates the signal. In the simplest cases, this method is equivalent to a finite-difference approximation in the filtering equation. Picard [133] considers a time discretization of Zakai’s equation involving the replacement of the signal by a discrete-time process and discrete-time approximations of the Radon–Nickodym derivative in
155
156
8 : Numerical methods
the Kallianpur–Striebel formula. The approximations still involve integrals against process distributions, and Picard suggests a Monte-Carlo scheme to implement the approximation. Di Masi et al. [59] consider a similar time discretization, but they also introduce a signal approximation that reduces the problem to a finite-dimensional computation somewhat similar to the approach taken by Kushner. Lototsky and Rozovskii [117] and Lototsky et al. [118] derive algorithms based on a Wiener chaos decomposition. This point of view is also explored by Budhiraja and Kallianpur [19]. Hu et al. [75] consider a Wong–Zakai-type approximation for the filtering equation directly. Florchinger and Le Gland [64], [65] consider a time discretization of Zakai’s equation for diffusion processes observed in correlated noise based on a split-up approximation and a Trotter-type product formula. In [63], a particle approximation is formulated by Florchinger and Le Gland. Del Moral [50] considers a particle approximation for a model with independent observation noise that discounts past information.
9
Linear filtering
In this chapter, we consider a very special filtering model. Namely, the signal is a Gaussian process and the observation function is linear. The corresponding filtering theory is called Kalman–Bucy filtering. We will derive the Kalman–Bucy filter as a special case of the non-linear filter introduced in the previous chapters. After the filtering equations are established and the discrete-time approximation is studied, we will investigate the long time stability of the linear filter in Sections 9.4 and 9.5. The results in the last two sections can be regarded as a standard to compare with when we study the stability for the non-linear filter in Chapter 10. The material in Section 9.4 is very technical and can be skipped in the first reading.
9.1
Gaussian system
We consider the signal–observation model given by the following system:
\[
\begin{cases}
dX_t = (\tilde b_t + b_t X_t)\,dt + c_t\,dW_t + \sigma_t\,dB_t, \\
dY_t = (\tilde h_t + h_t X_t)\,dt + dW_t, \qquad Y_0 = 0,
\end{cases} \tag{9.1}
\]
ˆ 0 ∈ Rd and covariance where X0 is a normal random vector with mean X matrix γ0 ∈ Rd×d , (Wt , Bt ) is an m + d-dimensional Brownian motion, the coefficients b˜ t , bt , ct , σt , h˜ t , ht are deterministic matrices (or vectors) of dimensions d × 1, d × d, d × m, d × d, m × 1, m × d, respectively. To solve the linear stochastic system equation (9.1), we consider the following ordinary differential equations (9.2) and (9.3) in the space of d × d-matrices, whose solution will be the dual process of Xt . Let pt and qt be deterministic functions taking values in the space of d × d-matrices. Suppose that pt is the unique solution to the linear system d pt = −pt bt , dt
p0 = I,
(9.2)
158
9 : Linear filtering
and qt is the unique solution to the linear system d qt = bt qt , dt
q0 = I,
(9.3)
here, I is the d × d identity matrix. Then d dpt dqt (pt qt ) = qt + pt dt dt dt = −pt bt qt + pt bt qt = 0, and hence, pt qt = p0 q0 = I. Namely, pt is invertible with p−1 t = qt . Remark 9.1 If bt = b does not depend on t, then pt = e−bt ≡
∞ (−1)n t n n=0
n!
bn and qt = ebt .
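Remark 9.1 identifies $p_t$ and $q_t$ with matrix exponentials in the time-homogeneous case, and the relation $p_t q_t = I$ is easy to verify numerically. The following sketch (Python with NumPy/SciPy; the matrix b is a placeholder example chosen only for the illustration) is a numerical check of the remark, not code from the text.

```python
import numpy as np
from scipy.linalg import expm

b = np.array([[0.0, 1.0],
              [-2.0, -3.0]])       # example time-independent coefficient matrix
t = 1.5

p_t = expm(-b * t)                  # p_t = e^{-bt}
q_t = expm(b * t)                   # q_t = e^{bt}

# p_t q_t should be the identity matrix, as shown above for general p, q.
assert np.allclose(p_t @ q_t, np.eye(2))
```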
Definition 9.2 A stochastic process Xt is a Gaussian process if for any t1 < t2 < · · · < tn and λ1 , . . . , λn ∈ Rd , the random variable ni=1 λ∗i Xti has a normal distribution. Theorem 9.3 Suppose that b, h are bounded on [0, T], c, σ are square˜ h˜ are integrable on [0, T]. Then, the stochastic integrable on [0, T] and b, process (Xt , Yt ) is a Gaussian process. Proof Applying Itô’s formula to equations (9.1) and (9.2), we get d (pt Xt )
= − pt bt Xt dt + pt (b˜ t + bt Xt )dt + ct dWt + σt dBt = pt b˜ t dt + ct dWt + σt dBt . Thus,
t Xt = qt X0 + qt 0 pr b˜ r dr + cr dWr + σr dBr ,
t Yt = (h˜ s + hs Xs )ds + Wt . 0
Therefore, (Xt , Yt ) is obtained by a linear transformation of {(X0 , Wr , Br ) : 0 ≤ r ≤ t}, and hence, it is a Gaussian process. Recall that Gt is the σ -field generated by the observation process before time t. Let πt be the optimal filter, i.e. πt (ω) is the conditional distribution of Xt given Gt : πt (A, ω) = P (A|Gt ) (ω),
∀ A ∈ B (Rd ), ∀ ω ∈ .
9.1
Gaussian system
Theorem 9.4 For any t ≥ 0 and for ω ∈ being fixed, πt (ω) is a multivariate normal probability measure on Rd . N Proof Let DN = {0 = sN 1 < · · · < saN = t} be an increasing sequence of sets whose union is dense in [0, t]. Since (Xt , Yt ) is a Gaussian process, the conditional distribution πtN ≡ P(Xt ∈ ·|Ys , s ∈ DN ) is normal with condiˆ N and conditional covariance matrix γ N . We now consider tional mean X t t the characteristic function corresponding to πtN . For λ ∈ Rd , we define ∗ eiλ x πtN (dx) φN (λ) ≡ d R ∗ = E eiλ Xt Ys , s ∈ DN .
Note that for λ ∈ Rd fixed, {φN (λ) : N ≥ 1} is a martingale with ∗ φ∞ (λ) = Rd eiλ x πt (dx). By the martingale convergence theorem (Theorem 2.10), we see that φN (λ) → φ∞ (λ) a.s. Since the characteristic function of a multivariate normal distribution πtN is given explicitly as follows: ˆ N − 1 λ∗ γ N λ , φN (λ) = exp λ∗ X t t 2 ˆt ˆ N and γ N as N → ∞. Denote the limit by X we get the convergence of X t t and γt , respectively. Then 1 ∗ ∗ˆ φ∞ (λ) = exp λ Xt − λ γt λ . 2 Thus, πt (ω) is a multivariate normal distribution on Rd .
ˆ t and γt to denote the (ranAs in the proof of the theorem above, we use X dom) conditional mean vector and the conditional covariance matrix of the multivariate normal distribution corresponding to the random measure πt . Lemma 9.5 The matrix γt is non-random. In fact, for 1 ≤ i, j ≤ d, we have ˆ ˆ iX γt = E(Xti Xt ) − E(X t t ). ij
j
j
Proof As in the proof of Theorem 9.4, we get ij ˆ i )(Xtj − X ˆ tj )|Ys , s ∈ DN . γt = lim E (Xti − X t N→∞
Since
ˆ i |Ys , s ∈ DN ) = E E(X i − X ˆ i |Gt )|Ys , s ∈ DN = 0, E(Xti − X t t t
159
160
9 : Linear filtering
ˆ i is uncorrelated to {Ys , s ∈ DN }. By the properties of normal random Xti − X t ˆ i is independent of the family {Ys , s ∈ DN }. Similarly, Xtj − X ˆ tj vectors, Xti − X t is independent of {Ys , s ∈ DN }. Thus, j j ij i i ˆ ˆ γt = E (Xt − Xt )(Xt − Xt )Gt ˆ i )(Xtj − X ˆ tj )Ys , s ∈ DN = lim E (Xti − X t N→∞ ˆ i )(Xtj − X ˆ tj ) = lim E (Xti − X t N→∞ j ˆj ˆ iX = E Xti Xt − X t t .
9.2
Kalman–Bucy filtering
ˆ t and γt by making use In this section, we derive equations for the processes X of the filtering equation. The coefficients in the filtering model of Chapter 5 are assumed to be bounded, which is not satisfied in the current linear model. However, since the processes are Gaussian, which have moments of any order, it follows from the same arguments as in the derivation of the filtering equation (5.17) that πt satisfies the following stochastic differential equation on P (Rd ): For any f ∈ C 2 (Rd ) with at most exponential growth, t t! " πt , f = π0 , f + πs , ∇ ∗ fcs + f (h˜ s + hs ι)∗ dνs πs , Ls f ds + −
0
0
t
0
"
!
πs , f πs , (h˜ s + hs ι)∗ dνs ,
where νt = Yt −
(9.4)
t! 0
" πs , h˜ s + hs ι ds
t ˆ s ds h˜ s + hs X = Yt − 0
is a d-dimensional Brownian motion, d 1 ij 2 at ∂ij f (x), Ls f (x) = ∇ ∗ f (x)(b˜ t + bt x) + 2 ij=1
and at = ct ct∗ + σt σt∗ , ι is the identity function on Rd , i.e. ι(x) = x, ∀x ∈ Rd .
9.2
Kalman–Bucy filtering
ˆ t . Taking f (x) = xi , we have We now derive the equation satisfied by X Ls f (x) = b˜ is +
d
ij
bs xj and ∂j f = δij , i = 1, 2, . . . , d.
j=1
Then
d ij ˆ j bs X πs , Ls f = b˜ is + s, j=1
t!
" πs , ∇ ∗ fcs + f (h˜ s + hs ι)∗ dνs
0 d
=
% πs ,
=
j=1 d
=
⎛ j
∂ fcs + ⎝h˜ s +
=1
j=1 d
d
%
j
k=1
⎛
πs , cs + xi ⎝h˜ s + ij
d
j
d
⎞ & jk j hs xk ⎠ f dνs ⎞&
hs xk ⎠ dνs jk
j
j=1
⎛ ˆ i h˜ sj + ⎝csij + X s
j=1
d
⎞
jk ˆ iX ˆk ⎠ j hs (γsik + X s s ) dνs ,
k=1
and % & d d " ! j jk j πs , (h˜ s + hs ι)∗ dνs = hs xk dνs πs , h˜ s + j=1
=
d
⎛ ⎝h˜ sj +
j=1
k=1 d
⎞
jk ˆ k ⎠ j hs X dνs . s
k=1
Thus, ˆi =X ˆi + X t 0
+
⎛
t 0
d j=1
0
⎝b˜ is + ⎛
t
d
⎞ ˆ s ⎠ ds bt X ij
j
j=1
ˆ i h˜ sj + ⎝csij + X s
d k=1
⎞ ˆk ⎠ ˆ iX hs (γsik + X s s ) dνs jk
j
161
162
9 : Linear filtering
−
⎛
d j=1
ˆi + =X 0
t
0
0
ˆ i ⎝h˜ sj + X s
⎞ jk ˆ k ⎠ j hs X dνs s
k=1
⎛ t
d
⎝b˜ is +
d
⎞
ij ˆ j ⎠ bt X s ds +
j=1
d j=1
0
⎛ t
⎝csij +
d
⎞ hs γsik ⎠ dνs . jk
k=1
Writing in the vector form, we have t t ˜ ˆ ˆ ˆ bs + bs Xs ds + cs + γs h∗s dνs . Xt = X0 + 0
j
(9.5)
0
Therefore, we have proved the existence part of the following theorem. ˆ is the unique solution to the SDE Theorem 9.6 The mean vector process X (9.5), with νt being a d-dimensional Brownian motion given by t ˆ s ds. h˜ s + hs X νt = Yt − 0
Proof We only need to prove the uniqueness. Note that νt depend on the ˆ We write the SDE (9.5) as solution X. t ˆ0 + ˆt = X cs + γs h∗s dYs X 0
t ˆ s ds. + b˜ s − cs + γs h∗s h˜ s + bs − cs + γs h∗s hs X 0
˜ q˜ be defined as in equations (9.2) and (9.3) with bs replaced by Let p, bs − cs + γs h∗s hs . Similar to the proof of Theorem 9.3 we see that ˆ 0 + q˜ t ˆ t = q˜ t X X
0
t
p˜ s
b˜ s − cs + γs h∗s h˜ s ds + cs + γs h∗s dYs .
This yields the uniqueness.
Next, we derive the equation satisfied by γt . Applying Itô’s formula to equation (9.1), for 1 ≤ i, j ≤ d, we have j
j
j
ij
d(Xti Xt ) = Xti dXt + Xt dXti + at dt. Writing this in the integral form, we have t j j ij i j i j Xt Xt = X0 X0 + Xsi dXs + Xs dXsi + as ds . 0
9.2
Kalman–Bucy filtering
Taking expectations and then taking derivatives on both sides, we get ⎛ ⎞ ⎛ ⎞ d d d j j jk j ij k⎠ + at . E(Xti Xt ) = EXti ⎝b˜ t + bt Xtk ⎠ + EXt ⎝b˜ it + bik t Xt dt k=1
k=1
(9.6) Applying Itô’s formula to equation (9.5), we get ˆ ˆi ˆ ˆ ˆi ˆ iX d(X t t ) = Xt d Xt + Xt d Xt + j
j
j
m
(ct + γt h∗t )ik (ct + γt h∗t )jk dt.
k=1
Similar to equation (9.6), we have ⎛ ⎞ ⎛ ⎞ m m d jk ˆ k ⎠ ˆ iX ˆj ˆ i ⎝˜ j ˆ tj ⎝b˜ i + ˆ k⎠ E(X bt X bik + EX t t ) = EXt bt + t t t Xt dt k=1
ij + (ct + γt h∗t )(ct + γt h∗t )∗ .
k=1
(9.7)
Combining equations (9.6) and (9.7) with Lemma 9.5, we have ij d ij ik jk ik jk ij γt = γt bt + bt γt + at − (ct + γt h∗t )(ct + γt h∗t )∗ . dt d
d
k=1
k=1
It then follows that, in the matrix form, γt satisfies the following Riccati equation: d γt = γt b∗t + bt γt + at − (ct + γt h∗t )(ct + γt h∗t )∗ . dt
(9.8)
ˆ t , γt ) is the unique solution to equations (9.5) Theorem 9.7 The process (X and (9.8). Proof Suppose that γt1 and γt2 are two solutions to equations (9.8) and ζt is their difference. As |γt1 h∗t ht γt1 − γt2 h∗t ht γt2 | ≤ |ζt h∗t ht γt1 | + |γt2 h∗t ht ζt | ≤ |ht |2 |γt1 | + |γt2 | |ζt |, we have
|ζt | ≤
0
t
2 |bs | + |hs |2 |γs1 | + |γs2 | + |cs ||hs | |ζs |ds.
By Gronwall’s inequality we see that ζt = 0.
163
164
9 : Linear filtering
For $d = m = 1$ with time-independent coefficients, we can give an explicit formula for $\gamma_t$. Note that
\[
\frac{d}{dt}\gamma_t = 2b\gamma_t + a - (c + h\gamma_t)^2
= -h^2\gamma_t^2 + 2(b - ch)\gamma_t + (a - c^2)
= -h^2(\gamma_t - c_+)(\gamma_t - c_-),
\]
where
\[ c_\pm = \frac{b - ch \pm \sqrt{(b - ch)^2 + h^2(a - c^2)}}{h^2}. \]
Then,
\[ \frac{d\gamma_t}{(\gamma_t - c_+)(\gamma_t - c_-)} = -h^2\,dt. \]
Thus,
\[ \frac{\gamma_t - c_+}{\gamma_t - c_-} = \frac{\gamma_0 - c_+}{\gamma_0 - c_-}\,e^{-h^2(c_+ - c_-)t}. \]

Remark 9.8 The process $(\hat X_t, \gamma_t)$ is called the Kalman–Bucy filter of the linear system equation (9.1).
9.3
Discrete-time approximation of the Kalman–Bucy filtering
In this section, we consider the numerical solution of the Kalman–Bucy filter, since in most cases it cannot be given explicitly. Recall that the filter $(\hat X_t, \gamma_t)$ is given by equations (9.5) and (9.8). Using the Euler scheme, we approximate the Kalman–Bucy filter at the time points $t = k\delta$, $k = 0, 1, 2, \ldots$, by the following recursive formulas:
\[ \hat X^\delta_{(k+1)\delta} = \hat X^\delta_{k\delta} + \big( \tilde b_{k\delta} + b_{k\delta}\hat X^\delta_{k\delta} \big)\delta + \big( c_{k\delta} + \gamma^\delta_{k\delta}h^*_{k\delta} \big)\big( \nu^\delta_{(k+1)\delta} - \nu^\delta_{k\delta} \big), \tag{9.9} \]
\[ \nu^\delta_{(k+1)\delta} = \nu^\delta_{k\delta} + Y_{(k+1)\delta} - Y_{k\delta} - \big( \tilde h_{k\delta} + h_{k\delta}\hat X^\delta_{k\delta} \big)\delta, \tag{9.10} \]
and
\[ \gamma^\delta_{(k+1)\delta} = \gamma^\delta_{k\delta} + \Big( \gamma^\delta_{k\delta}b^*_{k\delta} + b_{k\delta}\gamma^\delta_{k\delta} + a_{k\delta} - \big( c_{k\delta} + \gamma^\delta_{k\delta}h^*_{k\delta} \big)\big( c_{k\delta} + \gamma^\delta_{k\delta}h^*_{k\delta} \big)^* \Big)\delta. \tag{9.11} \]
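The recursion (9.9)–(9.11) translates directly into a loop over observation increments. The sketch below (Python/NumPy, with time-independent matrices supplied by the caller as a simplifying assumption, and hypothetical argument names) illustrates one way to implement it; it is not code from the text.

```python
import numpy as np

def discrete_kalman_bucy(Y_inc, delta, b_t, b, c, h_t, h, a, x0_hat, gamma0):
    """Euler approximation (9.9)-(9.11) of the Kalman-Bucy filter.

    Y_inc: observation increments Y_{(k+1)delta} - Y_{k delta};
    b_t, h_t: the vectors written as b-tilde and h-tilde in the text;
    a = c c* + sigma sigma*.  Time-independent coefficients are assumed here.
    """
    x_hat, gamma = x0_hat.copy(), gamma0.copy()
    xs, gammas = [x_hat.copy()], [gamma.copy()]
    for dY in Y_inc:
        d_nu = dY - (h_t + h @ x_hat) * delta                          # (9.10)
        gain = c + gamma @ h.T
        x_hat = x_hat + (b_t + b @ x_hat) * delta + gain @ d_nu        # (9.9)
        gamma = gamma + (gamma @ b.T + b @ gamma + a
                         - gain @ gain.T) * delta                      # (9.11)
        xs.append(x_hat.copy())
        gammas.append(gamma.copy())
    return np.array(xs), np.array(gammas)
```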
Theorem 9.9 Suppose that the coefficients b˜ t , bt , ct , σt , h˜ t , ht are Lipschitz continuous in t. Then there exists a constant K1 such that ˆδ −X ˆ kδ |2 + max |γ δ − γkδ | ≤ K1 δ. E max |X kδ kδ kδ≤T
kδ≤T
Proof Consider the process in finite time interval [0, T]. Since γt and the coefficients at , bt , ct and ht are continuous, they are bounded. It follows d from equation (9.8) that dt γt is bounded. Thus, γt is also Lipschitz continuous. Form this, together with the Lipschitz continuity of the coefficients, we see that there exists a constant K2 such that (k+1)δ ∗ γs bs + bs γs + as − (cs + γs h∗s )(cs + γs h∗s )∗ ds kδ
−
γkδ b∗kδ
+ bkδ γkδ + akδ −
ckδ + γkδ h∗kδ
∗ ckδ + γkδ h∗kδ δ
≤ K2 δ 2 . As
γ(k+1)δ = γkδ +
(k+1)δ
kδ
γs b∗s + bs γs + as − (cs + γs h∗s )(cs + γs h∗s )∗ ds,
by equation (9.11), we get δ fk+1 ≡ |γ(k+1)δ − γ(k+1)δ |
≤ fk + K2 δ 2 + K3 fk δ. Using induction, we can prove that fk ≤ K2 δ 2 1 + (1 + K3 δ) + · · · + (1 + K3 δ)k−1 =
K2 (1 + K3 δ)k − 1 δ K3
≤ K4 δ. ˆ δ − Xkδ follows from the same arguments The proof for the estimate of X kδ as in Sections 4.4 and 8.1.
9.4
Some basic facts for a related deterministic control problem
In this section, we consider the time-homogeneous linear system, i.e. the case when the matrices b˜ t , bt , ct , σt , h˜ t , ht do not depend on t. This section is long and technical. Its main purpose is to prepare Lemma 9.35
165
166
9 : Linear filtering
with related definitions and results from the theory of stochastic control. We strongly suggest the reader skips this section in the first reading. We include this section here for the completeness of the book. We will investigate the limiting behavior of the Riccati equation (9.8). To this end, we consider the following optimal control problem with state equation on Rd : x˙ t = Axt + But , and the cost functional T 1 1 J(u) = |Dxt |2 + |ut |2 dt + x∗T γ0 xT , 2 0 2
(9.12)
∀ u ∈ L2 ([0, T], Rm ),
where | · | denotes the Euclidean norm, A, B, D are matrices of dimensions d d ×d, d ×m and m×d, respectively, and x˙ t denotes the time derivative dt xt . Suppose that the initial state x0 is fixed. The aim of the control problem is to find a control u ∈ L2 ([0, T], Rm ) such that J(u) is minimized. Here, ut is called the control variable, and xt is the state variable. The problem in the last paragraph is a special case of the LQ control problem. In this section, we shall study some properties for this special case. We refer the reader to Yong and Zhou [153] for a detailed treatment under a more general setting. First, we will derive a Riccati equation for this control problem. Then, we will choose the matrices A, B, D in terms of the coefficients in the filtering problem such that the Riccati equation for the control problem coincides with equation (9.8), which is the Riccati equation for the Kalman–Bucy filter. The representation for the Riccati equation in terms of the control problem will help us in studying the limit of the solution. The next theorem establishes the optimal control law. Theorem 9.10 Suppose that pt is given by the following differential equation on Rd : p˙ t = −A∗ pt + D∗ Dxt ,
pT = −γ0 xT .
Let ut = B∗ pt . Then for any u˜ ∈ L2 ([0, T], Rm ), we have J(u) ˜ ≥ J(u). Namely, u is the optimal control law. Proof Define vt = u˜ t − ut and let x˜ t be the solution to equation (9.12) with ut replaced by u˜ t . Let zt = x˜ t − xt . Then z˙ t = Azt + Bvt ,
z0 = 0.
9.4
Some basic facts for a related deterministic control problem
As d ∗ (p zt ) = x∗t D∗ Dzt + p∗t Bvt dt t = x∗t D∗ Dzt + u∗t vt , we have
0
T
Dxt , Dzt Rm dt +
T 0
u∗t vt dt = −x∗T γ0 zT .
Then, we have J(u) ˜ − J(u) =
1 2
T 0
1 |Dzt |2 + |vt |2 dt + zT∗ γ0 zT ≥ 0, 2
and hence, the minimal of J is attained at ut .
Next, we show that the optimal control law is related to the Riccati equation. Theorem 9.11 Let Pt be the solution to the following Riccati equation on Sd+ : −P˙ t = A∗ Pt + Pt A + D∗ D − Pt BB∗ Pt ,
PT = γ0 ,
(9.13)
where Sd+ is the collection of all d × d non-negative-definite matrices. Then, pt = −Pt xt and J(u) =
1
P0 x0 , x0 Rd . 2
Proof Let qt = −Pt xt . Taking derivatives on both sides, we get q˙ t = −P˙ t xt − Pt x˙ t = A∗ Pt + Pt A + D∗ D − Pt BB∗ Pt xt − Pt Axt + BB∗ pt = −A∗ qt + D∗ Dxt − Pt BB∗ (pt − qt ) . Note that pt is also a solution to the above equation. By the uniqueness of the solution to this equation, we see that pt = −Pt xt . Note that d
Pt xt , xt Rd = − A∗ Pt + Pt A + D∗ D − Pt BB∗ Pt xt , xt dt + Pt (Axt + But ) , xt + Pt xt , Axt + But = −|Dxt |2 − |ut |2 .
167
168
9 : Linear filtering
Hence J(u) =
1
P0 x0 , x0 Rd . 2
(9.14)
Remark 9.12 Note that equation (9.8) can be written as d γt = γt (b − ch)∗ + (b − ch)γt + σ σ ∗ − γt h∗ hγt . dt If A = (b − ch)∗ , B = h∗ and D = σ , then the Riccati equation (9.13) coincides with equation (9.8) with time reversed. To study the long-time limit of the (uncontrolled) linear system dxt = Axt , dt
(9.15)
which is a special case of the linear SDE (9.5), we need some facts from matrix algebra. We state these facts without giving their proofs, which can be found in any textbook about matrix algebra. Lemma 9.13 Suppose that the d×d-matrix A has k distinct (complex) eigenvalues λi , i = 1, 2, . . . , k. Let mi be the multiplicity of λi in the characteristic polynomial of A. Let Ni be the null space of the matrix Mi ≡ (A − λi I)mi . Then the dimension of Ni , denoted by dim(Ni ), is equal to mi , and the d-dimensional complex space is equal to the direct sum of the null spaces Ni , i = 1, 2, . . . , k. Based on this decomposition, we get the following Jordan form for the matrix A. Lemma 9.14 There is an invertible d×d-matrix C that can be partitioned as C = (C1 , C2 , . . . , Ck ) such that A = CJC −1 , where J = diag(J1 , J2 , . . . , Jk ), and Ci is a d × mi -matrix, i = 1, 2, . . . , k. The block Ji is an mi × mi -matrix that can be subpartitioned as Ji = diag Ji1 , Ji2 , . . . , Jii , where each subblock Jij is of the form ⎛ ⎞ λi 1 0 ··· ··· ⎜ 0 λi 1 0 ··· ⎟ ⎜ ⎟ ⎜ Jij = ⎜ · · · · · · · · · · · · · · · ⎟ ⎟. ⎝ 0 · · · 0 λi 1 ⎠ 0 · · · · · · 0 λi J is called the Jordan normal form of A.
9.4
Some basic facts for a related deterministic control problem
Now we can get an expression for the solution to equation (9.15). Theorem 9.15 Suppose that x0 = ki=1 vi with vi ∈ Ni , i = 1, 2, . . . , k. Write C −1 = (U1∗ , U2∗ , . . . , Uk∗ )∗ where the partitioning corresponds to that of C in Lemma 9.14, i.e. Ui is an mi × d-matrix. Then xt =
k
Ci exp (Ji t) Ui vi ,
i=1
where
eJi t = diag e Ji1 t , e Ji2 t , . . . , e Jii t ,
and
⎛
1
t2 2!
t
···
⎜ ∞ n t n ⎜ 0 1 t ··· Jij = ⎜ exp Jij t ≡ ⎜ n! ⎝ ··· ··· ··· ··· n=0 0 0 0 ···
n −1
t ij (nij −1)! n −2 t ij (nij −2)!
··· 1
⎞ ⎟ ⎟ λt ⎟e i , ⎟ ⎠
(9.16)
nij is the dimension of Jij . Proof It is easy to verify that xt = etA x0 . Thus, xt =
∞ n t n=1
=
n!
∞ n t n=1
n!
n
A =
∞ n t n=1
k
n!
CJ n C −1 x0
Ci Jin Ui vi
i=1
=
k
Ci exp (Ji t) Ui vi .
i=1
With the preparation above, we are ready to study the limiting behavior of the solution of equation (9.15). If all the eigenvalues of A have negative real parts, then lim xt = 0.
t→∞
Definition 9.16 Let A be a d × d-matrix. We call the direct sum of those subspaces Ni with Re(λi ) < 0 the stable subspace of A. The orthogonal complement of the stable subspace is called the unstable subspace. A is asymptotically stable if all its eigenvalues have negative real parts. Remark 9.17 If A is asymptotically stable, then the real parts of the eigenvalues of A are negative. Let λ0 > 0 be such that Re(λj ) < −λ0 for all
169
170
9 : Linear filtering
j = 1, 2, · · · , d. It follows from equation (9.16) that there exists a constant K such that |xt | ≤ Ke−λ0 t . Therefore, xt tends to 0 exponentially fast. To prove the convergence of the solution of the Riccati equation as T → ∞, we need to make some assumptions on the coefficient matrices. Definition 9.18 The linear system equation (9.12) is completely controllable if for any a ∈ Rd , there is a control u and a time t1 such that x0 = a and xt1 = 0. Remark 9.19 Using a shift transformation, we see that an equivalent definition for the linear system equation (9.12) being completely controllable is that for any a ∈ Rd , there is a control u and a time t1 such that xt1 = a and x0 = 0. In fact, this is the definition given in [105]. Namely, under suitable control, the system can reach any state starting from 0. Now we consider the system equation (9.15) with output variable yt , namely, we study the system dxt = Axt , dt
yt = Dxt .
(9.17)
Denote the solution yt by yt (a) if x0 = a. Definition 9.20 The system equation (9.17) is completely reconstructible if for any t1 > 0 and a ∈ Rd , yt (a) = 0,
∀ t ≤ t1
implies a = 0. Remark 9.21 By the linearity, the system equation (9.17) is completely reconstructible if for any t1 > 0 and a1 , a2 ∈ Rd , yt (a1 ) = yt (a2 ),
∀ t ≤ t1
implies a1 = a2 . This means that the initial state can be reconstructed from the output. We denote the solution to the Riccati equation (9.13) by PtT when we discuss its limit. We may omit T for simplicity of notation when there is no confusion. Theorem 9.22 Suppose that the system equation (9.12) is completely controllable and PT = 0. Then P0T converges as T → ∞. Denote the ¯ limit by P. If, in addition, the system equation (9.17) is completely reconstructible, then P¯ > 0 and the matrix A − BB∗ P¯ is asymptotically stable.
9.4
Some basic facts for a related deterministic control problem
Proof By equation (9.14), we have aT ≡
x∗0 P0T x0
=
min
u∈L2 ([0,T],Rm ) 0
T
|Dxt |2 + |ut |2 dt.
(9.18)
Thus, aT is non-decreasing in T. We now prove that {aT } is bounded from above. Since the system is completely controllable, there exists an input u that transfers state x0 to 0 at some time t1 . Clearly, the optimal control after t1 is ut = 0, and in this case, xt = 0, ∀ t ≥ t1 . Then, aT is bounded from above by t1 |Dxt |2 + |ut |2 dt. 0
Therefore, aT converges for any x0 . This proves the convergence of P0T as T → ∞. Next, we assume that the system equation (9.17) is completely reconstructible in addition to the assumption of equation (9.12) being completely controllable. Note that P¯ ≥ 0. Suppose that P¯ is singular. Then there exists x0 = 0 ¯ 0 = 0. Since aT is non-decreasing, we see that aT = 0 such that a∞ = x∗0 Px for all T ≥ 0. Hence, there exists u such that for any T ≥ 0, T |yt |2 + |ut |2 dt = 0. 0
Thus, ut = 0 and yt = 0 for all t ≤ T, but x0 ≠ 0. This contradicts the assumption that the system is completely reconstructible. Therefore, P̄ is positive-definite.
Let PT = P̄. It is clear that Pt = P̄ for all t ≤ T. By Theorem 9.11, we see that

    inf_{u ∈ L²([0,T], Rm)} { ∫_0^T (|yt|² + |ut|²) dt + x*_T P̄ xT } = x*_0 P̄ x0.    (9.19)

The optimal control is attained at ut = −B* P̄ xt. Thus

    ∫_0^∞ (|yt|² + |ut|²) dt < ∞.    (9.20)
Since P̄ is not singular, it follows from equation (9.19) that {xT} is bounded. Under the steady-state control law, the system becomes

    ẋt = (A − BB*P̄) xt.    (9.21)
If A − BB∗ P¯ is not asymptotically stable, then there is an eigenvalue of A − BB∗ P¯ whose real part is non-negative. It follows from equation (9.16) that for a suitably chosen x0 the sequence {xt } cannot be bounded, which contradicts the boundedness of {xt } we obtained earlier. Hence, A − BB∗ P¯ is asymptotically stable. In general, the system need not be completely reconstructible. We now define the subspace of states which cannot be reconstructed. Definition 9.23 The unreconstructible subspace of the system equation (9.17) is the linear subspace of Rd consisting of the states a ∈ Rd satisfying yt (a) = 0,
∀ t ≥ 0.
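Definitions 9.20 and 9.23 are easy to probe numerically: a state a is unreconstructible exactly when the output yt(a) = D e^{At} a vanishes for all t. The following Python sketch is a hypothetical illustration only (the matrices are arbitrary choices, not from the text):

import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])
D = np.array([[1.0, 0.0]])      # only the first component of the state is observed

def output(a, ts):
    """Evaluate y_t(a) = D exp(At) a on a grid of times."""
    return np.array([D @ expm(A * t) @ a for t in ts])

ts = np.linspace(0.0, 5.0, 6)
print(output(np.array([1.0, 0.0]), ts).ravel())  # reconstructible direction: nonzero output
print(output(np.array([0.0, 1.0]), ts).ravel())  # unreconstructible direction: output identically zero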
The following theorem will be needed in this section. We state it without giving the proof.
Theorem 9.24 (Cayley–Hamilton theorem) Let f(λ) = det(λI − A) = c0 + c1 λ + · · · + c_{d−1} λ^{d−1} + λ^d be the characteristic polynomial of the matrix A. Then, f(A) = 0.
The following theorem gives an explicit representation of the unreconstructible subspace.
Theorem 9.25 The unreconstructible subspace of equation (9.17) is equal to the null space of the matrix

    Q = (D; DA; · · · ; DA^{d−1}),

that is, the matrix obtained by stacking D, DA, . . . , DA^{d−1} on top of each other.
Proof If a is in the unreconstructible subspace, then D e^{At} a = 0 for all t ≥ 0. Thus,

    (D + DAt + DA² t²/2! + · · ·) a = 0,  for all t ≥ 0.    (9.22)

Therefore, Da = DAa = DA²a = · · · = 0 and, hence, Qa = 0. On the other hand, suppose that Qa = 0, i.e. Da = DAa = · · · = DA^{d−1} a = 0. It follows from the Cayley–Hamilton theorem that f(A) = 0, where f is the characteristic polynomial of the matrix A:

    c0 I + c1 A + · · · + c_{d−1} A^{d−1} + A^d = 0.
Thus, DAd a = 0. By induction, we can prove that DAk a = 0 for all k. Hence, equation (9.22) holds, and a is in the unreconstructible subspace. Definition 9.26 The pair of matrices (A, D) is called detectable if the unreconstructible subspace of the system equation (9.17) is contained in the stable subspace of A. Next, we study the system under state transformation, and investigate how the Riccati equation changes under this transformation. Consider the controlled system with output: x˙ t = Axt + But ,
yt = Dxt .
(9.23)
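Before transforming the system it may help to see the objects of Theorem 9.25 and Definition 9.26 computed explicitly. The sketch below is illustrative only (the matrices are arbitrary choices): it builds Q, reads off the rank d1 and a basis of the unreconstructible subspace from an SVD, and, as a numerical proxy for detectability, checks that every null vector of Q decays under e^{At}.

import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.0, 0.0],
              [ 1.0, -2.0, 0.0],
              [ 0.0, 0.0, -0.5]])
D = np.array([[1.0, 1.0, 0.0]])
d = A.shape[0]

# Q = (D; DA; ...; DA^{d-1}) stacked row-wise (Theorem 9.25)
Q = np.vstack([D @ np.linalg.matrix_power(A, k) for k in range(d)])
d1 = np.linalg.matrix_rank(Q)

# Null space of Q = unreconstructible subspace, read off the SVD
_, s, Vt = np.linalg.svd(Q)
null_basis = Vt[d1:].T                      # columns span the unreconstructible subspace
print("rank of Q (d1):", d1)
print("basis of the unreconstructible subspace:\n", null_basis)

# Detectability (Definition 9.26): unreconstructible states must be stable,
# i.e. exp(At) v -> 0 for every v in the null space of Q.
T = 50.0
detectable = all(np.linalg.norm(expm(A * T) @ null_basis[:, j]) < 1e-6
                 for j in range(null_basis.shape[1]))
print("detectable:", detectable)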
Let U be an invertible d × d-matrix and x′t = Uxt. Note that x′t is just the notation for another state variable; it should not be confused with the derivative or the transpose of xt. Then

    ẋ′t = UAU^{−1} x′t + UB ut,    yt = DU^{−1} x′t.

Let P′t = (U^{−1})* Pt U^{−1}. Then

    −Ṗ′t = (UAU^{−1})* P′t + P′t UAU^{−1} + (DU^{−1})* DU^{−1} − P′t UB(UB)* P′t.

Note that, for t → ∞, Pt has a limit if and only if P′t has a limit.
Suppose that the original system is not completely reconstructible. Denote the rank of the matrix Q by d1 < d. Let f1, f2, . . . , f_{d1} be a set of linearly independent row vectors of Q. Let f_{d1+1}, . . . , fd be row vectors, orthogonal to f1, f2, . . . , f_{d1}, such that {f1, . . . , fd} form a basis of Rd. Let U1 and U2 be the d1 × d and (d − d1) × d matrices formed by the row vectors f1, f2, . . . , f_{d1} and f_{d1+1}, . . . , fd, respectively. Let U = (U1; U2), and let U^{−1} = (T1, T2) be the corresponding partition, i.e. T1 is d × d1 and T2 is d × (d − d1). Then

    UU^{−1} = [U1T1  U1T2; U2T1  U2T2] = [I  0; 0  I].

Thus, U1T2 = 0. Since the row space of U1 coincides with the row space of Q, for x ∈ Rd, U1x = 0 implies Qx = 0, and hence, x is in the unreconstructible subspace. Since U1T2 = 0, all column vectors of T2 must be in the unreconstructible subspace. There are d − d1 linearly independent column vectors in T2. Hence, these column vectors form a basis for the unreconstructible subspace. Thus, U1x = 0 for any x in this subspace. Let x′ = (x′1; x′2) with x′1 ∈ R^{d1} and x′2 ∈ R^{d−d1}. As x = U^{−1}x′ = T1x′1 + T2x′2,
the unreconstructible subspace is {x′ : x′1 = 0} and the reconstructible subspace is {x′ : x′2 = 0}. Note that

    UAU^{−1} = [U1AT1  U1AT2; U2AT1  U2AT2],    DU^{−1} = (DT1, DT2).

If x0 is in the unreconstructible subspace, by Theorem 9.25 and the Cayley–Hamilton theorem, it is easy to verify that Ax0 is also in that subspace. Thus, the column vectors of AT2 are also in that subspace, and hence, U1AT2 = 0. Since the rows of D are row vectors of the unreconstructibility matrix Q, we must have DT2 = 0. We summarize these observations in the following:
Lemma 9.27 The transformed system is represented in the unreconstructibility canonical form

    ẋ′t = [A11  0; A21  A22] x′t + [B1; B2] ut,    yt = (D1, 0) x′t,

where A11 is a d1 × d1-matrix, and the pair (A11, D1) is completely reconstructible. Further, the transformed Riccati equation for P′t ≡ (U^{−1})* Pt U^{−1} is

    −Ṗ′t = [A11  0; A21  A22]* P′t + P′t [A11  0; A21  A22] + [(D1)*D1  0; 0  0] − P′t [B1B1*  B1B2*; B2B1*  B2B2*] P′t.

The reconstructible subspace is {x′ ∈ Rd : x′2 = 0}, where x′2 ∈ R^{d−d1} consists of the last d − d1 components of x′ = Ux.
Proof We only need to prove that (A11, D1) is completely reconstructible. Let x ∈ R^{d1} be such that D1x = D1A11x = D1(A11)²x = · · · = 0.
Then, DT1x = 0. As D1A11x = DT1U1AT1x = D(I − T2U2)AT1x = DAT1x, we get DAT1x = 0. Similarly, we can show that DA^k T1x = 0 for all k. Thus, T1x is in the unreconstructible subspace of (A, D). This implies T1x = 0, and hence, x = 0. Therefore, (A11, D1) is completely reconstructible.
We now consider the controlled system equation (9.12) when it is not necessarily completely controllable.
Definition 9.28 The controllable subspace of the linear system equation (9.12) consists of the states that can be reached from the zero state within a finite time. Namely, a ∈ Rd is in the controllable subspace if and only if there exist u ∈ L²([0, T], Rm) and t1 > 0 such that x0 = 0 and xt1 = a. The pair (A, B) is stabilizable if the unstable subspace of A is contained in the controllable subspace of the linear system equation (9.12).
The next theorem gives an explicit characterization of the controllable subspace.
Theorem 9.29 The controllable subspace of the linear system equation (9.12) is equal to the linear subspace spanned by the columns of the controllability matrix P = (B, AB, . . . , A^{d−1}B).
Proof Suppose x0 = 0. Then

    xt = ∫_0^t e^{A(t−s)} B us ds
       = B ∫_0^t us ds + AB ∫_0^t (t − s) us ds + A²B ∫_0^t ((t − s)²/2!) us ds + · · · .
Thus, xt is in the column space of the matrix P∞ = (B, AB, A2 B, . . .). By the Cayley–Hamilton theorem, Ad can be represented as a linear combination of I, A, A2 , · · · , Ad−1 . Thus, the column space of P∞ coincides with that of P. Hence, xt is in the column space of P.
On the other hand, suppose a is in the column space of P. Namely, there are vectors α0, α1, . . . , α_{d−1} ∈ Rm such that

    a = Bα0 + ABα1 + · · · + A^{d−1}Bα_{d−1}.

We can choose a suitable function u such that

    ∫_0^t ((t − s)^k / k!) us ds = αk,    k = 0, 1, . . . , d − 1.

Thus, a is in the controllable subspace of the linear system equation (9.12).
Next, we consider the controllability canonical form. Since the proof is similar to that of Lemma 9.27, we will omit it.
Lemma 9.30 Suppose that T1 is a d × d2-matrix whose column vectors form a basis for the controllable subspace and T2 is a d × (d − d2)-matrix whose column vectors are orthogonal to those of T1, and, together with those of T1, form a basis for Rd. Let T = (T1, T2) and x′t = T^{−1}xt. Then the system is transformed into the controllability canonical form

    ẋ′t = [A11  A12; 0  A22] x′t + [B1; 0] ut,

where A11 is a d2 × d2-matrix, and the pair (A11, B1) is completely controllable. Further, the transformed Riccati equation becomes

    −Ṗ′t = [A11  A12; 0  A22]* P′t + P′t [A11  A12; 0  A22] + (D′)* D′ − P′t [B1B1*  0; 0  0] P′t,

where D′ = DT.
Now we are ready to consider the limit of the Riccati equation. This limit will be established in the following four theorems.
Theorem 9.31 If PT = 0 and the intersection of the unstable, uncontrollable and reconstructible subspaces of the system equation (9.23) is empty, then, as T → ∞, P0T converges to P̄, which is a solution of the following algebraic Riccati equation:

    A* P̄ + P̄ A + D* D − P̄ BB* P̄ = 0.    (9.24)
Proof By Lemma 9.27, the system can be written in the unreconstructibility canonical form. For simplicity of notation, we denote x′t and P′t by xt and
Pt, respectively. Then

    ẋt = [A11  0; A21  A22] xt + [B1; B2] ut,    yt = (D1, 0) xt,

where A11 is a d1 × d1-matrix, and the pair (A11, D1) is completely reconstructible. Partitioning the solution Pt of the Riccati equation as

    Pt = [Pt11  Pt12; (Pt12)*  Pt22],

we obtain

    −Ṗt11 = D1*D1 − (Pt11B1 + Pt12B2)(Pt11B1 + Pt12B2)* + (Pt11A11 + Pt12A21) + (Pt11A11 + Pt12A21)*,
    −Ṗt12 = −(Pt11B1 + Pt12B2)((Pt12)*B1 + Pt22B2)* + Pt12A22 + A11*Pt12 + A21*Pt22,
    −Ṗt22 = −((Pt12)*B1 + Pt22B2)((Pt12)*B1 + Pt22B2)* + Pt22A22 + A22*Pt22,

with the terminal conditions PT11 = 0, PT12 = 0 and PT22 = 0. Therefore, Pt22 = 0, Pt12 = 0 and

    −Ṗt11 = D1*D1 − Pt11B1B1*Pt11 + Pt11A11 + A11*Pt11,    PT11 = 0.
It follows from this that the unreconstructible subspace does not affect the convergence of Pt. Therefore, we may and will assume that the system is completely reconstructible, and hence, the condition of the theorem becomes: "the intersection of the uncontrollable subspace and the unstable subspace is empty". Thus, the uncontrollable subspace is contained in the stable subspace, i.e. the pair (A, B) is stabilizable.
Now we transform the system equation (9.23) to the controllability canonical form (again, we write Pt instead of P′t for simplicity of notation):

    ẋt = [A11  A12; 0  A22] xt + [B1; 0] ut,    (9.25)
where A11 is a d2 × d2-matrix, and the pair (A11, B1) is completely controllable. The transformed Riccati equation becomes

    −Ṗt = [A11  A12; 0  A22]* Pt + Pt [A11  A12; 0  A22] + D*D − Pt [B1B1*  0; 0  0] Pt.    (9.26)

Let x1 ∈ R^{d2} and x2 ∈ R^{d−d2} be such that x = (x1; x2). For x2 = 0, x is in the controllable subspace; hence xt = (x1t; 0), and there is a control u such that x1t = 0 for large t. For x1 = 0, x is in the uncontrollable subspace, which is contained in the stable subspace. By the stability, xt → 0 as t → ∞. In both cases, we see that {aT} defined in the proof of Theorem 9.22 is bounded. It then follows from the same arguments as in the proof of Theorem 9.22 that the limit of P0T exists as T → ∞. It is clear that the limit P̄ is a solution of equation (9.24).
Theorem 9.32 Suppose that the system equation (9.23) is stabilizable and detectable. Then, the steady-state control law ut = −B* P̄ xt is asymptotically stable. Equivalently, the matrix A − BB* P̄ is asymptotically stable.
Proof We have already seen in the proof of Theorem 9.31 that the steady-state control does not affect and is not affected by the unreconstructible part of the system. Since the system is detectable, we may and will omit the unreconstructible part and assume that the system is completely reconstructible. We now use the controllability canonical form. Partition the matrix Pt as

    Pt = [Pt11  Pt12; (Pt12)*  Pt22].

It is easy to see from equation (9.26) that Pt11 is the solution of

    −Ṗt11 = D1*D1 − Pt11 B1B1* Pt11 + Pt11 A11 + A11* Pt11,    PT11 = 0.

By Theorem 9.22, we see that Pt11 has a limit P̄11 as T → ∞, and that A11 − B1B1* P̄11 is asymptotically stable. As x = (0; x2) is in the stable subspace, xt → 0, and hence, x2t → 0. By equation (9.25), we have ẋ2t = A22 x2t. Thus, A22 is stable.
The control law for the whole system is given by

    ut = −(B1*, 0) [P̄11  P̄12; (P̄12)*  P̄22] xt = −(B1* P̄11, B1* P̄12) xt.

With this control law, the system equation (9.23) becomes

    ẋt = [A11 − B1B1* P̄11   A12 − B1B1* P̄12; 0   A22] xt.

Since both A11 − B1B1* P̄11 and A22 are asymptotically stable, xt tends to 0 exponentially fast. Therefore, A − BB* P̄ is asymptotically stable.
Theorem 9.33 Suppose that the system equation (9.23) is stabilizable and detectable. Then, the steady-state control law minimizes

    lim_{T→∞} { ∫_0^T (|Dxt|² + |ut|²) dt + x*_T γ0 xT }    (9.27)
for all γ0 ≥ 0. Further, the minimal value is x*_0 P̄ x0. Here, γ0 ≥ 0 means that γ0 is a non-negative-definite matrix.
Proof Obviously, the steady-state control law minimizes

    ∫_0^∞ (|Dxt|² + |ut|²) dt,

and the minimal value is x*_0 P̄ x0. Denote the steady-state control law by ūt and the corresponding state process by x̄t. By the previous theorem, we see that lim_{T→∞} x̄T = 0. Therefore, the value of equation (9.27) with (xt, ut) replaced by (x̄t, ūt) is equal to x*_0 P̄ x0. Suppose that there exists another control law ut that gives a smaller value for equation (9.27). Then

    ∫_0^∞ (|Dxt|² + |ut|²) dt + lim_{T→∞} x*_T γ0 xT < x*_0 P̄ x0.

Since γ0 ≥ 0, this would imply that

    ∫_0^∞ (|Dxt|² + |ut|²) dt < x*_0 P̄ x0.    (9.28)

This is not possible because the minimal value of the left-hand side of equation (9.28) is equal to the right-hand side. Therefore, equation (9.27) is minimized by ū.
Theorem 9.34 Suppose that the system equation (9.23) is stabilizable and detectable. Then the solution of the Riccati equation with PT = γ0 tends to P̄ as T → ∞. Further, P̄ is the unique non-negative-definite solution of the algebraic Riccati equation

    A* P̄ + P̄ A + D* D − P̄ BB* P̄ = 0.    (9.29)

Proof Note that

    min_{u ∈ L²([0,T], Rm)} { ∫_0^T (|Dxt|² + |ut|²) dt + x*_T γ0 xT } = x*_0 P0T x0.
By the last theorem, we get P0T → P̄ as T → ∞. It is clear that P̄ solves the algebraic Riccati equation (9.29). Now we prove the uniqueness. Let P′ be any non-negative-definite solution of the algebraic Riccati equation. Consider the Riccati equation (9.13) with terminal condition PT = P′. Obviously, the solution of the Riccati equation is Pt = P′ for all t ≤ T. Then, the steady-state solution P̄ must also be given by P′. This proves the uniqueness of the solution to equation (9.29).
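The limit results of Theorems 9.22, 9.31, 9.32 and 9.34 are easy to observe numerically. The following Python sketch is an illustration only: the matrices are arbitrary choices, and it assumes the Riccati equation (9.13) has the form −Ṗt = A*Pt + PtA + D*D − PtBB*Pt used throughout this section. It solves the algebraic Riccati equation (9.29) with SciPy, checks that A − BB*P̄ is asymptotically stable, and confirms that the finite-horizon solution P0T approaches P̄ as the horizon grows.

import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.integrate import solve_ivp

# Illustrative data (hypothetical stabilizable/detectable triple (A, B, D))
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
D = np.array([[1.0, 0.0]])

# Algebraic Riccati equation (9.29): A* P + P A + D* D - P B B* P = 0
P_bar = solve_continuous_are(A, B, D.T @ D, np.eye(B.shape[1]))

# Theorem 9.32: the closed-loop matrix A - B B* P_bar is asymptotically stable
closed_loop = A - B @ B.T @ P_bar
print("closed-loop eigenvalues:", np.linalg.eigvals(closed_loop))

# Theorems 9.22/9.31/9.34: the finite-horizon Riccati solution converges to P_bar.
# Integrate in the time-to-go variable s = T - t, so that P(s) = P_{T-s}^T with P(0) = P_T = 0.
def riccati_rhs(s, p_flat):
    P = p_flat.reshape(A.shape)
    dP = A.T @ P + P @ A + D.T @ D - P @ B @ B.T @ P
    return dP.ravel()

horizon = 30.0
sol = solve_ivp(riccati_rhs, (0.0, horizon), np.zeros(A.size), rtol=1e-8, atol=1e-10)
P0T = sol.y[:, -1].reshape(A.shape)          # approximates P_0^T for T = horizon
print("max |P_0^T - P_bar| =", np.abs(P0T - P_bar).max())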
9.5
Stability for Kalman–Bucy filtering
After all the preparations in the last section, we can now discuss the stability of the linear filtering. We consider the filtering model whose signal is given by dXt = bXt dt + cdWt + σ dBt ,
(9.30)
and the observation process is dYt = hXt dt + dWt ,
(9.31)
where X0 is a d-dimensional normal random vector with mean X̂0 ∈ Rd and covariance matrix γ0 ∈ S+d, the space of all non-negative-definite symmetric d × d-matrices, (Wt, Bt) is an (m + d)-dimensional Brownian motion, and the coefficients b, c, σ, h are matrices of dimensions d × d, d × m, d × d and m × d, respectively. As dWt = dYt − hXt dt, we can rewrite equation (9.30) as

    dXt = (b − ch)Xt dt + c dYt + σ dBt.    (9.32)
Recall that X̂t = E(Xt | Gt) and γt = E[(Xt − X̂t)(Xt − X̂t)*] satisfy the following equations:

    dX̂t = (b − ch − γt h*h) X̂t dt + (c + γt h*) dYt,    (9.33)

and

    γ̇t = γt (b − ch)* + (b − ch) γt + σ*σ − γt h*h γt.

For any z ∈ Rd and R ∈ S+d, we define the d-dimensional stochastic process Zt and the S+d-valued function Pt by

    dZt = (b − ch − Pt h*h) Zt dt + (c + Pt h*) dYt,    Z0 = z,    (9.34)

and

    Ṗt = Pt (b − ch)* + (b − ch) Pt + σ*σ − Pt h*h Pt,    P0 = R.
Note that for z = X̂0 and R = γ0, we have Zt = X̂t and Pt = γt. Thus, (Zt, Pt) can be regarded as the linear filter started with an "incorrect" initial condition. We will study the limit behaviour of X̂t − Zt as t → ∞. We need the following
Assumption (A): There exists a matrix γ∞ ∈ S+d such that

    γ∞ (b − ch)* + (b − ch) γ∞ + σ*σ − γ∞ h*h γ∞ = 0,    (9.35)

and b − ch − γ∞ h*h is asymptotically stable.
Lemma 9.35 If ((b − ch)*, σ) is detectable and ((b − ch)*, h*) is stabilizable, then Assumption (A) holds, γ∞ is the unique solution of equation (9.35) in S+d, and Pt → γ∞ exponentially fast for any initial condition P0 = R ∈ S+d.
Proof By Remark 9.12 and Theorem 9.34, we see that γt converges to γ∞, which is the unique solution of the algebraic Riccati equation (9.35). By Theorem 9.32, the matrix (b − ch)* − h*h γ∞ is asymptotically stable. This implies that b − ch − γ∞ h*h is also asymptotically stable. Let

    0 < λ0 < inf{−Re λ : λ is an eigenvalue of b − ch − γ∞ h*h}.    (9.36)

Note that
    d/dt (Pt − γ∞) = [b − ch − ½(Pt + γ∞) h*h](Pt − γ∞) + {[b − ch − ½(Pt + γ∞) h*h](Pt − γ∞)}*.
Thus, there exists a constant K1 such that |Pt − γ∞ | ≤ K1 e−λ0 t .
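Lemma 9.35 can be checked numerically. In the Python sketch below (an illustration with arbitrary coefficients, not from the text), γ∞ is obtained from the algebraic Riccati equation (9.35) through its dual control form, and the Riccati equation for Pt is integrated from an "incorrect" initial condition R to observe the exponential convergence.

import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.integrate import solve_ivp

# Illustrative coefficients for the model (9.30)-(9.31); hypothetical values
b = np.array([[0.0, 1.0], [-2.0, -0.3]])
c = np.zeros((2, 1))
sigma = np.array([[0.5, 0.0], [0.0, 0.5]])
h = np.array([[1.0, 0.0]])

bc = b - c @ h                                   # b - ch
# gamma_inf solves (9.35); in ARE form this is the dual equation with A = (b - ch)^T, B = h^T
gamma_inf = solve_continuous_are(bc.T, h.T, sigma.T @ sigma, np.eye(h.shape[0]))

def riccati(t, p_flat):
    P = p_flat.reshape(2, 2)
    dP = P @ bc.T + bc @ P + sigma.T @ sigma - P @ h.T @ h @ P
    return dP.ravel()

R = 5.0 * np.eye(2)                              # an "incorrect" initial covariance
for T in [1.0, 5.0, 10.0, 20.0]:
    PT = solve_ivp(riccati, (0.0, T), R.ravel(), rtol=1e-8).y[:, -1].reshape(2, 2)
    print(f"T={T:5.1f}   max|P_T - gamma_inf| = {np.abs(PT - gamma_inf).max():.2e}")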
If R = γ∞, then Zt is called the steady-state filter.
Corollary 9.36 Suppose that ((b − ch)*, σ) is detectable and ((b − ch)*, h*) is stabilizable. If R = γ∞, then Zt is asymptotically optimal in the following sense:

    lim_{t→∞} E|Xt − Zt|² = lim_{t→∞} E|Xt − X̂t|² = tr(γ∞),

where tr(γ) denotes the trace of the matrix γ.
Proof The second equality follows from the fact that

    tr(γt) = E tr[(Xt − X̂t)(Xt − X̂t)*] = E tr[(Xt − X̂t)*(Xt − X̂t)] = E|Xt − X̂t|².

To prove the first equality, we only need to show that

    lim_{t→∞} E|X̂t − Zt|² = 0.
By equations (9.33) and (9.34), we get

    d(X̂t − Zt) = (b − ch − γ∞ h*h)(X̂t − Zt) dt + (γ∞ − γt) h*h X̂t dt + (Pt − γ∞) h*h Zt dt + (γt − Pt) h* dYt.

Applying Itô's formula, we have

    d[e^{−(b−ch−γ∞h*h)t}(X̂t − Zt)]
      = e^{−(b−ch−γ∞h*h)t}[(γ∞ − γt) h*h X̂t + (Pt − γ∞) h*h Zt] dt + e^{−(b−ch−γ∞h*h)t}(γt − Pt) h* dYt.    (9.37)

Then,

    E|X̂t − Zt|² ≤ 3 |X̂0 − z|² |e^{2(b−ch−γ∞h*h)t}|
      + 6t E ∫_0^t |e^{2(b−ch−γ∞h*h)(t−s)}| |γ∞ − γs|² |h*h|² |X̂s|² ds
      + 6t E ∫_0^t |e^{2(b−ch−γ∞h*h)(t−s)}| |γ∞ − Ps|² |h*h|² |Zs|² ds
      + 3 E |∫_0^t e^{(b−ch−γ∞h*h)(t−s)} (γs − Ps) h* dYs|².    (9.38)

By Theorem 9.15, it is easy to show that

    |e^{(b−ch−γ∞h*h)t}| ≤ e^{−λ0 t}.

Thus, by Lemma 9.35 and the boundedness of the second moments of X̂t and Zt, the fourth term on the right-hand side of equation (9.38) is bounded by

    6 E |∫_0^t e^{(b−ch−γ∞h*h)(t−s)} (γs − Ps) h* dWs|² + 6 E |∫_0^t e^{(b−ch−γ∞h*h)(t−s)} (γs − Ps) h*h Xs ds|²
      ≤ 6 E ∫_0^t |e^{2(b−ch−γ∞h*h)(t−s)}| |γs − Ps|² |h|² ds + 6t E ∫_0^t |e^{2(b−ch−γ∞h*h)(t−s)}| |γs − Ps|² |h*h|² |Xs|² ds
      ≤ K1 ∫_0^t e^{−2λ0(t−s)} e^{−2λ0 s} ds = K2 e^{−2λ0 t}.

The other terms can be estimated similarly. Therefore, we get

    E|X̂t − Zt|² ≤ K2 e^{−2λ0 t} → 0,    as t → ∞.    (9.39)
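The convergence in Corollary 9.36 can also be observed by direct simulation. The Python sketch below is purely illustrative: a scalar model (d = m = 1) with c = 0, arbitrary coefficients and a crude Euler-Maruyama discretization. It generates one path of the signal (9.30) and the observation (9.31), runs the optimal filter (9.33) next to the mis-initialized filter (9.34), and prints the gap |X̂t − Zt|, which decays.

import numpy as np

rng = np.random.default_rng(0)
b, sigma, h = -0.5, 1.0, 1.0          # illustrative scalar coefficients, c = 0
dt, n = 1e-3, 20000                   # Euler-Maruyama step and number of steps

x, xhat, z = 0.0, 0.0, 5.0            # true state, optimal filter, mis-initialized filter
gamma, P = 1.0, 10.0                  # correct and "incorrect" error covariances

for k in range(n):
    dB = np.sqrt(dt) * rng.standard_normal()
    dW = np.sqrt(dt) * rng.standard_normal()
    dY = h * x * dt + dW              # observation increment (9.31)
    x += b * x * dt + sigma * dB      # signal (9.30)
    # optimal filter (9.33) and its Riccati equation
    xhat += (b - gamma * h * h) * xhat * dt + gamma * h * dY
    gamma += (2 * b * gamma + sigma**2 - (gamma * h)**2) * dt
    # filter started from an incorrect initial condition (9.34)
    z += (b - P * h * h) * z * dt + P * h * dY
    P += (2 * b * P + sigma**2 - (P * h)**2) * dt
    if (k + 1) % 5000 == 0:
        print(f"t={(k+1)*dt:5.1f}  |xhat - z|={abs(xhat - z):.3e}  |gamma - P|={abs(gamma - P):.3e}")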
Now we study the a.s. convergence. Denote the right-hand side of equation (9.36) by λ̄.
Theorem 9.37 Suppose that Assumption (A) holds. Let X0 be normal with mean X̂0 and covariance matrix γ0. Suppose that

    lim_{t→∞} γt = lim_{t→∞} Pt = γ∞.

Then, for any z ∈ Rd and any 0 < λ < λ̄, we have

    lim_{t→∞} (X̂t − Zt) e^{λt} = 0,    a.s.
Proof First, we modify the proof of Corollary 9.36 to get an estimate that is better than that given in equation (9.39). By equation (9.37) and the
Burkholder–Davis–Gundy inequality, it follows from the same arguments as those leading to equation (9.39) that

    E sup_{s≥0} |e^{−(b−ch−γ∞h*h)s}(X̂s − Zs)|² ≤ K1,

where K1 is a constant independent of t. Denote

    Us = e^{−(b−ch−γ∞h*h)s}(X̂s − Zs).

Then,

    |X̂s − Zs| ≤ e^{−λ0 s}|Us|,

and hence,

    E sup_{s≥0} |X̂s − Zs|² e^{2λ0 s} < ∞.

Thus,

    |X̂t − Zt| e^{λt} ≤ e^{−(λ0−λ)t} sup_{s≥0} |X̂s − Zs| e^{λ0 s} → 0,    a.s.
Finally, we prove the asymptotic stability of the Kalman–Bucy filter. Note that, given Gt, πt is a Gaussian probability measure with mean X̂t and covariance matrix γt. Let π̄t be, given Gt, the Gaussian probability measure with mean Zt and covariance matrix Pt. Recall that the Wasserstein metric on the space P(Rd) of probability measures is defined by

    ρ(ν1, ν2) = sup{ |⟨ν1, φ⟩ − ⟨ν2, φ⟩| : φ ∈ B1 },    ∀ ν1, ν2 ∈ P(Rd),

where

    B1 = { φ : |φ(x) − φ(y)| ≤ |x − y|, |φ(x)| ≤ 1, ∀ x, y ∈ Rd }.
Corollary 9.38 Under the conditions of Theorem 9.37, we have

    lim_{t→∞} ρ(πt, π̄t) = 0,    a.s.

Proof Let ϕ(z) be the probability density function of the d-dimensional standard normal random vector. Then,

    ∫_{Rd} f(x) πt(dx) = ∫_{Rd} f(X̂t + √γt z) ϕ(z) dz.

Thus, for f ∈ B1, we have

    |∫_{Rd} f(x) πt(dx) − ∫_{Rd} f(x) π̄t(dx)|
      ≤ ∫_{Rd} |(X̂t + √γt z) − (Zt + √Pt z)| ϕ(z) dz
      ≤ |X̂t − Zt| + |√γt − √Pt| ∫_{Rd} |z| ϕ(z) dz
      ≤ |X̂t − Zt| + √d |√γt − √Pt| → 0,    a.s.
9.6
Notes
The Kalman–Bucy filter and its stability were first studied by Bucy and Kalman [15]. Some other properties and applications have been studied by many authors; here we refer the reader to Beneš and Karatzas [7], Chow et al. [33], Delyon and Zeitouni [58], Makowski [119], Makowski and Sowers [120], Miller and Runggaldier [125] and Miller and Rubinovich [124] for these topics. Section 9.5 is based on Ocone and Pardoux [129]. The basic definitions and results for deterministic linear systems in Section 9.4 are taken from the book of Kwakernaak and Sivan [105]. The stability problem for the linear filter was also studied in Bucy and Kalman [15].
10
Stability of non-linear filtering
In this chapter, we consider the stability of the non-linear filter for the case when the observation and signal noises are independent, i.e. c = 0 in the filtering model introduced in the previous chapters. In this case, Xt is a time-homogeneous Markov process taking values in Rd. We will consider the filtering problem with a state space that is more general than what we studied in the previous chapters, namely, we assume that Xt is a continuous time-homogeneous Markov process in a Polish space S. We denote the generator of Xt by L. Suppose that the observation process Yt is an m-dimensional process given by

    Yt = ∫_0^t h(Xs) ds + Wt,    (10.1)

where h : S → Rm is a continuous map and Wt is an m-dimensional Brownian motion independent of X. By arguments similar to those in Chapters 5 and 7, the optimal filter πt ≡ P(·|Gt) is the unique solution of the following filtering equation on P(S): for any f in the domain D(L) of L,

    ⟨πt, f⟩ = ⟨π0, f⟩ + ∫_0^t ⟨πs, Lf⟩ ds + ∫_0^t (⟨πs, fh*⟩ − ⟨πs, f⟩⟨πs, h*⟩) dνs,    (10.2)

where π0 is the law of X0, and νt is the innovation process given by dνt = dYt − ⟨πt, h⟩ dt. Note that νt is an m-dimensional Brownian motion.
In the last chapter, we studied the stability problem for the Kalman–Bucy filter: when the initial condition of the filtering equation is normal with "incorrect" mean and variance, the difference between the Kalman–Bucy filter with the correct initial condition and the one with the incorrect initial condition tends to zero as t → ∞. A natural generalization to the non-linear case is the investigation of the following question: under what conditions does the distance between
πt and π̄t tend to 0 as t → ∞? Here, π̄t is the solution of equation (10.2) with π0 replaced by π̄0 ∈ P(S).
Definition 10.1 The filtering model is asymptotically stable if for any π0, π̄0 ∈ P(S), we have

    lim_{t→∞} d(π̄t, πt) = 0    in probability,

where d(·, ·) is a suitable metric on the space of probability measures on S.
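Definition 10.1 can be made concrete in the simplest non-linear setting, a two-state signal, for which the filter is finite dimensional. The Python sketch below is an illustration only: it uses a hypothetical generator and observation function, S = {0, 1}, and an ad hoc Euler/splitting discretization of the filtering equation (10.2). Two filters are run from different priors on the same observation path, and their total variation distance is seen to shrink.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

# Two-state signal: generator (rate matrix) A and observation function h (illustrative values)
A = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
h = np.array([0.0, 1.0])
dt, n = 1e-3, 60000

x = 0                                   # current state of the signal
pi = np.array([0.9, 0.1])               # filter started from one prior
pi_bar = np.array([0.1, 0.9])           # filter started from an "incorrect" prior

transition = expm(A * dt)               # one-step prediction matrix for the splitting scheme

for k in range(n):
    # simulate the signal and the observation increment dY = h(X) dt + dW
    if rng.random() < -A[x, x] * dt:
        x = 1 - x
    dY = h[x] * dt + np.sqrt(dt) * rng.standard_normal()

    # splitting step for each (normalized) filter: prediction, then Bayes correction
    for p in (pi, pi_bar):
        p[:] = p @ transition                             # prediction over one time step
        p[:] = p * np.exp(h * dY - 0.5 * h**2 * dt)       # correction from the new observation
        p[:] = p / p.sum()                                # normalize

    if (k + 1) % 15000 == 0:
        print(f"t={(k+1)*dt:5.1f}  filter={pi.round(3)}  "
              f"mis-initialized={pi_bar.round(3)}  TV diff={abs(pi - pi_bar).sum():.3e}")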
10.1
Markov property of the optimal filter
As we mentioned above, the signal Xt is a continuous time-homogeneous Markov process taking values in S. We denote the transition probability of the Markov process Xt by p(t, x, A), where t ≥ 0, x ∈ S and A ∈ B (S). Then, for any (s, x) ∈ R+ × S, there exists a probability measure Ps, x on C(R+ , S) such that for t > s and A ∈ B (S), Ps,x ξt ∈ A|Fsξ = p(t − s, x, A), Ps,x − a.s., and Ps,x (ξu = x, 0 ≤ u ≤ s) = 1, where ξt is the co-ordinate process on C(R+ , S), i.e. ξt (θ ) = θt for all θ ∈ C(R+ , S). We assume throughout this chapter that the following condition (F) is satisfied. Condition (F): The mapping (s, x) → Ps,x from R+ × S to P (C(R+ , S)) is continuous. Remark 10.2 Under Condition (F), Xt becomes a Feller–Markov process, i.e. for any f ∈ Cb (S), Tt f defined by Tt f (x) ≡ f (y)p(t, x, dy) S
is still in Cb (S). Remark 10.3 Although the Markov process Xt is assumed to be timehomogeneous and hence, the distribution can be characterized by the family of probability measures {P0,x : x ∈ S}, it is more convenient to use {Ps,x : s ≥ 0, x ∈ S}, which is usually reserved for time-inhomogeneous Markov processes. This will become clear when we define st and !st later.
Since we will discuss the filtering problems with different initial distributions, it is convenient to construct all filtering models under the same (standard) stochastic basis. We recall that the observation is an mdimensional process given by equation (10.1). Let βt be the co-ordinate process on C(R+ , Rm ). Let Q be the probability measure on C(R+ , Rm ) induced by the Brownian motion Wt . Let ˆ = C(R+ , S) × C(R+ , Rm ) and Rs,λ = Ps,λ ⊗ Q, where for λ ∈ P (S), Ps,λ ∈ P (C(R+ , S)) is defined as ∀A ∈ B (C(R+ , S)) . Ps,λ (A) = Ps,x (A)λ(dx), S
Namely, Ps,λ is the distribution of the Markov process Xt with initial distribution λ, and Rs,λ is the joint distribution of (X, W). Let Fˆ t be the σ -field ˆ generated by the co-ordinate processes ξ and β stopped at t, i.e. on Fˆ t = Ftξ ,β . Denote Fˆ ∞ by Fˆ .
t To define a common version of the stochastic integral s h(ξu )∗ dβu with respect to the probability measures Rs,λ , (s, λ) ∈ R+ × P (S), we prove it to be a measurable functional of ξ and β. We need the following result adapted from Karandikar [85]. Lemma 10.4 Let W be an m-dimensional Brownian motion and let f be an adapted (with respect to the original stochastic basis (, F , P, Ft )) Rm valued continuous stochastic process. Let τ0n = 0 and n i ≥ 0. = inf t ≥ τin : |ft − fτin | ≥ 2−n , τi+1 n , k ≥ 0, we define For τkn ≤ t < τk+1
Itn =
k−1 i=0
∗ n − W τ n + f n Wt − W τ n . fτ∗n Wτi+1 τ i i
Then, for all T < ∞, when n → ∞, we have t n ∗ fs dWs → 0, sup It − 0≤t≤T
k
k
a.s.
0
Proof We denote the left-hand side of equation (10.3) by Un . Let vtn = τkn Then,
n if τkn ≤ t < τk+1 ,
k = 0, 1, 2, . . . .
t ∗ fvsn − fs dWs . Un = sup 0≤t≤T
0
(10.3)
10.1
Markov property of the optimal filter
By Doob’s inequality we have
EUn2
≤ 4E
T
|fvsn − fs |2 ds
0 −2n
≤ 4T2
.
Therefore,
E
∞
Un2 < ∞,
n=1
and hence, Un → 0 a.s.
As a consequence, we can prove that the stochastic integral is a functional of the integrand and the driving Brownian motion. Corollary 10.5 There is a measurable mapping F from C(R+ , Rm ) × C(R+ , Rm ) to C(R+ , R) such that for any m-dimensional Brownian motion W and any Rm -valued continuous stochastic process f , we have t fs∗ dWs , ∀t ≥ 0, a.s. (10.4) F(f , W)t = 0
Proof Fix ζ , η ∈ C(R+ , Rm ). Let an0 = 0 and, for i ≥ 0, ani+1 = inf t ≥ ani : |ζt − ζani | ≥ 2−n . Define a sequence Fn of mappings from C(R+ , Rm ) × C(R+ , Rm ) to C(R+ , R) as follows: For ank ≤ t < ank+1 , k ≥ 0, Fn (ζ , η)t =
k−1
ζa∗n ηani+1 − ηani + ζa∗n ηt − ηan . i
i=0
We then define F(ζ , η) =
lim Fn (ζ , η)
n→∞
0
k
k
if the limit exists, otherwise.
Equation (10.4) follows from Lemma 10.4. ˆ → R be the stochastic process such that Let Zt : t h(ξu )∗ dβu , Rs,λ − a.s., Zt − Z s = s
where the stochastic integral is understood as the functional of h(ξ ) and β ˆ let defined in the corollary above. For 0 ≤ s ≤ t and (θ , η) ∈ , 1 t qst (θ , η) ≡ exp Zt (θ , η) − Zs (θ , η) − |h(ξu (θ ))|2 du . 2 s Note that ξt (θ ) = θt and βt (η) = ηt . Therefore, qst above can also be denoted by qst (ξ , β). We define an MF (S)-valued process st (λ) and a P (S)-valued process !st (λ) on C(R+ , Rm ) as f (ξt (θ))qst (θ , η)Ps,x (dθ )λ(dx), Q-a.s. η, st (λ)(η), f ≡ S C(R+ ,S)
and !st (λ)(η) =
st (λ)(η) .
st (λ)(η), 1
(10.5)
We shall denote q0t , 0t and !0t by qt , t and !t , respectively. By the Kallianpur–Striebel formula, we have πt = !t (π0 )(Y),
a.s., ∀t ≥ 0.
Proposition 10.6 Let π¯ t be defined as the unique solution of equation (10.2) with initial π¯ 0 . Then, for any t ≥ 0, π¯ t = !t (π¯ 0 )(Y),
a.s.
(10.6)
Proof As ξt is a Markov process with generator L, for any f ∈ D(L) with Lf ∈ Cb (S), t f Nt ≡ f (ξt ) − Lf (ξs )ds 0
is a square-integrable martingale independent of βt , under probability measure R0, π¯ 0 . It is easy to show that dqt = qt h(ξt )∗ dβt . By Itô’s formula, we have d(f (ξt )qt ) = Lf (ξt )qt dt + qt dNt + f (ξt )qt h(ξt )∗ dβt . f
As
t (π¯ 0 )(η), f =
C(R+ ,S)
f (ξt (θ))qt (θ , η)Pπ¯ 0 (dθ ),
10.1
Markov property of the optimal filter
we get
t (π¯ 0 ), f = π¯ 0 , f +
0
t
s (π¯ 0 ), Lf ds +
t
0
s (π¯ 0 ), fh∗ dβs .
Applying Itô’s formula to equation (10.5), we then obtain d !t (π¯ 0 ), f = !t (π¯ 0 ), Lf dt + !s (π¯ 0 ), fh∗ dβt − !t (π¯ 0 ), f !t (π¯ 0 ), h∗ dβt − !t (π¯ 0 ), fh∗ !t (π¯ 0 ), h dt 2 + !t (π¯ 0 ), f !t (π¯ 0 ), h dt = !t (π¯ 0 ), Lf dt + !s (π¯ 0 ), fh∗ − !t (π¯ 0 ), f !t (π¯ 0 ), h∗ d ν˜ , where
ν˜ t = βt −
0
t
!s (π¯ 0 ), h ds.
Replacing βt by Yt , we see that !t (π¯ 0 )(Y) satisfies equation (10.2) with initial π¯ 0 . The representation equation (10.6) follows from the uniqueness of the solution to equation (10.2). The next lemma establishes the flow property for both processes and !, which is the key in the proof of their Markov property. Lemma 10.7 Fix 0 ≤ s < t < ∞ and λ ∈ P (S). Then t (λ) = st (s (λ)) and !t (λ) = !st (!s (λ)) ,
Q-a.s.
Proof Let D = {(θ , θ) ∈ C(R+ , S) × C(R+ , S) : θs = θs }. We define a mapping (θ , θ) ∈ D → θ˜ ∈ C(R+ , S) as if u ≤ s, θu θ˜u = θu if u ≥ s. Then, qt (θ˜ , η) = qs (θ , η)qst (θ , η), and, by the Markov property of the measure Px , we have ˜ = Px (dθ )Ps,ξs (θ ) (dθ ). Px (d θ)
Therefore, for any η ∈ C(R+ , Rm ), st (s (λ)) (η), f f (ξt (θ))qst (θ , η)Ps,x (dθ) s (λ)(dx) = S
=
C(R+ ,S)
S C(R+ ,S)
=
S C(R+ ,S)
C(R+ ,S)
f (ξt (θ))qst (θ , η)Ps,ξs (θ ) (dθ ) qs (θ , η)Px (dθ )λ(dx)
˜ t (θ, ˜ η)Px (d θ)λ(dx) ˜ f (ξt (θ))q
= t (λ)(η), f . The second equality of the lemma follows from the first one, the definition equation (10.5), and the fact that !st (λ) = !st (λ¯ ), where λ¯ = λ, 1−1 λ is the normalization of λ.
Note that, for (s, λ) ∈ R+ × P (S) fixed, the process {qst : t ≥ s} is a ˆ Fˆ , Rs,λ , Fˆ t ). martingale on the stochastic basis (, β
β
Lemma 10.8 Let s ≥ 0 be fixed. On the stochastic basis (Cm , F∞ , Q, Ft ), we define the process {ρst : t ≥ 0} as follows:
st (λ), 1 if t ≥ s, ρst ≡ 1 if t ≤ s. Then, {ρst , t ≥ 0} is a martingale. β
Proof Suppose that t > r ≥ s and A ∈ Fr . Then EQ (ρst 1A (η)) = qst (θ , η)Ps,λ (dθ )Q(dη) A C(R+ ,S)
= =
ˆ
1A (η)qst (θ , η)Rs,λ (dθdη) 1A (η)qsr (θ , η)Rs,λ (dθdη)
ˆ Q
= E (ρsr 1A (η)) . Thus, EQ ρst |Frβ = ρsr .
(10.7)
10.1
Markov property of the optimal filter
For t > s ≥ r, we have EQ ρst |Frβ = EQ EQ ρst |Fsβ Frβ = 1 = ρsr . The proof for the case s ≥ t > r is trivial. Thus, equation (10.7) holds in all cases. This implies the martingale property of ρst . Based on Lemma 10.8, we define a probability measure Qs,λ on Cm such that dQs,λ = ρst dQ
β
on Ft .
We denote Q0,λ by Qλ . Lemma 10.9 Qπ0 is the law of the observation process Y. ˆ π be the probability measure on ˆ such that on Ft , Proof Let R 0 ˆπ dR 0 (θ, η) = qt (θ , η). d(Pπ0 ⊗ Q)
(10.8)
ˆ π , the process It follows from Girsanov’s theorem that under R 0 t h(ξs )ds βt − 0
is a Brownian motion, which is independent of ξt . Thus, the law of βt under ˆ π coincides with that of Yt under P. Note that for A ∈ B (Cm ), R 0 ˆ π (C(R+ , S) × A) ˆ π (η : β(η) ∈ A) = R R 0 0 qt (θ , η)Pπ0 (dθ )Q(dη) = C(R+ ,S) A
=
A
t (π0 )(η), 1 Q(dη)
= Qπ0 (dη). This implies that Qπ0 is the law of the observation process Y.
Now, we are ready to prove the Markov property for the optimal filter. β
Theorem 10.10 Let λ ∈ P (S). Then, the stochastic processes (t (λ), Ft ) β Markov processes on the probabiland (!t (λ), Ft ) are time-homogeneous β m ity space C , F∞ , Q .
Proof Fix 0 ≤ s < t < ∞, λ ∈ P (S) and A ∈ Fs . Then, for f ∈ Cb (MF (S)),
A
f (t (λ))dQ =
A
f (st (s (λ)))dQ
Q
=E
1A E
=
A
β f (st (s (λ)))Fs
Q
f1 (s, t, s (λ))dQ,
(10.9)
where f1 (s, t, µ) = EQ f (st (µ)) for any s < t and µ ∈ MF (S). We note that the law of {(ξs+u , βs+u − βs ) : u ≥ 0} under Rs,λ is the same as that {(ξu , βu ) : u ≥ 0} under Rλ . As a consequence, the law of st (λ) under Rs,λ is the same as that of t−s (λ) under Rλ . Thus, f1 depends on (s, t) only through t − s. Let f2 (t − s, µ) = f1 (s, t, µ). We may continue the calculation of equation (10.9) above with
A
f (t (λ))dQ =
A
f2 (t − s, s (λ))dQ.
This implies that EQ f (t (λ))|Fsβ = f2 (t − s, s (λ)). β β Therefore, on the probability space Cm , F∞ , Q , t (λ), Ft is a timehomogeneous Markov process. The conclusion for !t (λ) can be proved similarly. ξ
β
As Ft and Ft are independent under Rπ0 , we have
(!t (π¯ 0 )(η), ξt (θ )) , Fˆ t ˆ Fˆ , Rπ ). Markov process on the probability space (, Corollary 10.11 The stochastic process
is a
0
ˆπ , The next theorem states the Markov property under the measure R 0 which is defined by equation (10.8). Theorem 10.12 The stochastic process (!t (π¯ 0 )(η), ξt (θ )) is Markovian on ˆ π , Fˆ t ) taking values in P (S) × S. ˆ Fˆ , R the stochastic basis (, 0
10.1
Markov property of the optimal filter
Proof Fix 0 < s < t. Let f : P (S) × S → R be a bounded measurable function and let A ∈ Fˆ s . Then A
ˆπ = f (!t (π¯ 0 ), ξt )d R 0
A
=
A
f (!t (π¯ 0 ), ξt )qt dRπ0 ERπ0 f (!st (!s (π¯ 0 )) , ξt ) qs qst Fˆ s dRπ0
=
A
=
A
f1 (!s (π¯ 0 ), ξs )qs dRπ0 ˆπ , f1 (!s (π¯ 0 ), ξs )d R 0
where f1 (λ, x) = ERs,x f (!st (λ), ξt )qst . Hence, ˆ ERπ0 f (!t (π¯ 0 ), ξt )Fˆ s = f1 (!s (π¯ 0 ), ξs ). This implies the desired Markov property.
ˆ π is the same as the law of (Xt , Yt ). Note that the law of (ξt , βt ) under R 0 As π¯ t = !t (π¯ 0 )(Y) and Xt = ξt (X), we get the following: Corollary 10.13 The process {(π¯ t , Xt ), Ft } is a P (S) × S-valued Markov process. Finally, in this section, we study the Feller property of the Markov process (π¯ t , Xt ). Lemma 10.14 Suppose that the sequence {λn } ⊂ P (S) converges to λ ∈ P (S) weakly. Then, for all t ≥ 0, t (λn ) → t (λ) in Q probability. As a consequence, we get that !t (λn ) → !t (λ) in Q probability. Proof By the continuity of Px in x, we see that Pλn converges weakly to Pλ in the space P (C(R+ , S)). By Skorohod’s representation theorem (cf. Theorem 25.6 on page 343 of Billingsley [11]), there exists a probability ˜ a sequence of continuous S-valued processes {ξ˜tn , n ≥ 1} and an space , S-valued process ξ˜t such that ξ˜ n and ξ˜ have distributions Pλn and Pλ on ˜ C(R+ , S), respectively, and ξ˜ n (ω) ˜ → ξ˜ (ω) ˜ in C(R+ , S) for almost all ω˜ ∈ .
Recall that from Condition (BC), the mapping h is bounded by the ˜ we have constant K. Note that for ω˜ ∈ , qt (ξ˜ n (ω), ˜ η)2 Q(dη) Cm
=
t
exp
Cm
0
2h(ξ˜sn (ω)) ˜ ∗ dβs (η) −
t 2 ˜n ( ω)) ˜ h( ξ × exp ds Q(dη) s ≤ eK =e
0
2t
Cm
K2 t
t
1 2
t 2 n ˜ ˜ ds 2h(ξs (ω)) 0
2h(ξ˜sn (ω)) ˜ ∗ dβs (η) −
exp 0
1 2
t 2 n ˜ 2h( ξ ( ω)) ˜ ds Q(dη) s 0
.
(10.10)
Similarly, we can prove that 2 qt (ξ˜ (ω), ˜ η)2 Q(dη) ≤ eK t .
(10.11)
Further, we have 2 ˜ ω)Q(dη) ˜ ˜ η) − log qt (ξ˜ (ω), ˜ η) P(d log qt (ξ˜ n (ω),
(10.12)
Cm
Cm
˜
2 t ∗ n ˜ ω)Q(dη) h(ξ˜s (ω)) ˜ − h(ξ˜s (ω)) ˜ dβs (η) P(d ˜ ≤2 ˜ Cm 0 t 2 n 2 2 ˜ ω)Q(dη) ˜ ˜ ˜ |h(ξs (ω))| ds P(d ˜ − |h(ξs (ω))| ˜ +2
Cm
˜
2
≤ 2 + (2K) t
0
Cm
t 2 ˜n ˜ ω)Q(dη). ˜ − h(ξ˜s (ω)) ˜ dsP(d ˜ h(ξs (ω)) ˜ 0
Using the inequalities equations (10.10), (10.11), (10.12) and x e − ey ≤ (ex + ey )|x − y|, we get
2 ˜ qt (ξ˜ n (ω), ˜ ˜ η) − qt (ξ (ω), ˜ η) P(d ω) ˜ Q(dη) ˜ Cm 2 t ˜n K2 t 2 ˜ ω)Q(dη). ˜ − h(ξ˜s (ω)) ˜ dsP(d ˜ 2 + (2K) t ≤ 2e h(ξs (ω)) Cm
˜ 0
10.1
Markov property of the optimal filter
Then, for any f ∈ Cb (S), | t (λn )(η) − t (λ)(η), f |2 Q(dη) Cm
2 n n Q(dη) ˜ ˜ ˜ ˜ ˜ ( ω))q ˜ ( ξ ( ω), ˜ η) − f ( ξ ( ω))q ˜ ( ξ ( ω), ˜ η) P(d ω) ˜ f ( ξ = t t t t ˜ Cm 2 ˜n ˜ ω) ˜ ˜ − f (ξ˜t (ω)) ˜ P(d ≤ K1 f (ξt (ω))
˜
2 ˜ n ˜ ˜ ˜ η) − qt (ξ (ω), ˜ η) P(d ω) ˜ Q(dη) + K1 qt (ξ (ω), ˜ Cm 2 ˜n ˜ ω) ˜ ˜ − f (ξ˜t (ω)) ˜ P(d ≤ K1 f (ξt (ω))
˜
+ K2
Cm
t 2 ˜n ˜ ω)Q(dη) ˜ − h(ξ˜s (ω)) ˜ dsP(d ˜ h(ξs (ω)) ˜ 0
→ 0,
(10.13)
where K1 , K2 are two constants. The conclusion of the lemma then follows easily. Define two families of operators {Tt : t ≥ 0} and {St : t ≥ 0} on Cb (P (S)) and Cb (P (S) × S), respectively, as follows:
Tt G(λ) = EQλ G(!t (λ)),
∀ G ∈ Cb (P (S)) and λ ∈ P (S),
and ˆ
St F(λ, x) = ERx F(!t (λ), ξt ),
∀ F ∈ Cb (P (S) × S) and (λ, x) ∈ P (S) × S.
Theorem 10.15 The families of operators {Tt } and {St } are Feller semigroups on Cb (P (S)) and Cb (P (S) × S), respectively. Proof By Lemma 10.14, for {λn } ⊂ P (S) converging to λ ∈ P (S), we have
Tt G(λn ) = EQλn G(!t (λn )) = EQ G(!t (λn )) t (λn ), 1 → EQ G(!t (λ)) t (λ), 1 = Tt G(λ). This proves that Tt is a mapping from Cb (P (S)) to Cb (P (S)), and hence, {Tt : t ≥ 0} is a Feller semigroup.
197
198
10 : Stability of non-linear filtering
Finally, we prove the Feller property for the semigroup {St : t ≥ 0}. Let (λn , xn ) → (λ, x) in P (S) × S. Note that ˆ
St F(λn , xn ) = ERxn F(!t (λn ), ξt ) F(!t (λn )(η), ξt (θ ))qt (θ , η)Pxn (dθ )Q(dη). = C(R+ ,S) Cm
˜ a sequence Similar to Lemma 10.14, there exist a probability space , of C(R+ , S)-valued random variables {θ˜ n : n ≥ 1} and a C(R+ , S)-valued ˜ with distributions Pxn and Px , respectively, and random variable θ˜ on ˜θ n (ω) ˜ ˜ Then, ˜ → θ (ω) ˜ in C(R+ , S) for almost all ω˜ ∈ . n ˜ ω)Q(dη). St F(λn , xn ) = F(!t (λn )(η), ξt (θ˜ n (ω)))q ˜ ˜ η)P(d ˜ t (θ˜ (ω), ˜ C(R+ ,S)
By estimates similar to those leading to equation (10.13), we can prove that St F(λn , xn ) → St F(λ, x). Thus, {St : t ≥ 0} is a Feller semigroup on Cb (P (S) × S).
10.2
Ergodicity of the optimal filter
In this section, we consider the ergodicity of the Markov process πt . Namely, we look for conditions under which the Markov process πt has a unique invariant measure. Definition 10.16 A probability measure M on P (S) is an invariant measure of the Markov process πt if for any F ∈ Cb (P (S)) and t ≥ 0, we have Tt F(λ)M(dλ) = F(λ)M(dλ). P (S)
P (S)
We make the following assumptions. Assumption (E1): The signal process Xt has a unique invariant measure µ ∈ P (S). Assumption (E2): For any f ∈ Cb (S), we have lim sup Tt f (x) − µ, f µ(dx) = 0, t→∞
S
where {Tt } is the semigroup of the signal process. To study the invariant measure of the optimal filter, we extend the filtering model to include the whole real line as the set for the time parameter. (1) Let Pµ ∈ P (C(R, S)) be such that the co-ordinate process ξt , t ∈ R, is a Markov process with stationary marginal distribution µ, i.e. for any
10.2
Ergodicity of the optimal filter
E1 , . . . , En ∈ B (S) and −∞ < t1 < · · · < tn < ∞, (1) ξt1 ∈ E1 , . . . , ξtn ∈ En Pµ ··· µ(dx1 )p(t2 − t1 , x1 , dx2 ) · · · p(tn − tn−1 , xn−1 , dxn ). = E1
En
Let Q(1) ∈ P (C(R, Rm )) be the distribution of the m-dimensional Brownian motion with time t ∈ R, i.e. for the co-ordinate process βt on β (C(R, Rm ), F∞ , Q(1) ) and −∞ < t0 < t1 < · · · < tn < ∞, the random vectors √
1 1 (βt − βt0 ), . . . , √ (βt − βtn−1 ) tn − tn−1 n t1 − t0 1
are i.i.d. with common distribution N(0, I) on Rm . Let (1) (1) (1) = C(R, S) × C(R, Rm ) and R(1) µ = Pµ ⊗ Q .
Let {zt , t ∈ R} be the observation process on ((1) , B ((1) ), R(1) µ ) such that t z t − zs = h(ξu )du + βt − βs . s
Denote the observation σ -fields by z Fs,t = σ (zv − zu : s ≤ u ≤ v ≤ t),
For f ∈ Cb (S), let
and
−∞ ≤ s ≤ t ≤ ∞.
" (1) (0) z , πs,t , f = ERµ f (ξt )|Fs,t
!
(10.14)
" (1) (1) z ∨ σ (ξs ) . πs,t , f = ERµ f (ξt )|Fs,t
! (0)
Note that πs,t is the usual optimal filter with s as the initial time; and
(1)
πs,t is the optimal filter with the complete knowledge of the initial (i.e. at time s) signal as well as the observation process. In the next lemma, we establish some useful alternative expressions for (0) (1) the measure-valued processes πs,t and πs,t . Lemma 10.17 For any s < t, we have (0)
πs,t = !t−s (µ)(zs ),
(1)
πs,t = !t−s (δξs )(zs ),
and
! " (1) z ξ (1) πs,t , f = ERµ f (ξt )F−∞,t ∨ F−∞,s ,
(10.15)
where zus = zs+u − zs . Proof Note that
zts =
t
0
h(ξus )du + βts ,
where ξus = ξs+u and βus = βs+u − βs . Then, ξts , βts , zts t≥0 has the same structure as the original filtering problem with initial µ. Hence, (0) πs,s+t = !t (µ)(zs ). (1)
Further, the filter πs,t is equivalent to the optimal filter in the original filtering problem with initial δξs , and hence, (1)
πs,t = !t−s (δξs )(zs ). By the Markov property of ξ and the independency of β and ξ , given ξ σ (ξs ), the σ -fields σ (ξu+s , βs+u − βs , u ≥ 0) and F−∞,s are independent. As z ⊂ σ (ξu+s , βs+u − βs , u ≥ 0), σ (ξt ) ∨ Fs,t ξ
we get its independency, given σ (ξs ), from F−∞,s , and hence, (1) (1) z z ξ ξ ERµ f (ξt )Fs,t ∨ F−∞,s = ERµ f (ξt )Fs,t ∨ F−∞,s ∨ σ (ξs ) (1) z = ERµ f (ξt )Fs,t ∨ σ (ξs ) (1)
= πs,t (f ).
(10.16)
Note that ξ ξ z z z F−∞,t ∨ F−∞,s = F−∞,s ∨ F−∞,s ∨ Fs,t β
ξ
z = F−∞,s ∨ F−∞,s ∨ Fs,t , β
ξ
z , so we get and that F−∞,s is independent of σ (ξt ) ∨ F−∞,s ∨ Fs,t (1) (1) ξ ξ z z ERµ f (ξt )|Fs,t ∨ F−∞,s = ERµ f (ξt )|F−∞,t ∨ F−∞,s .
(10.17)
Combining equations (10.16) and (10.17), we get the last expression of the lemma.
10.2 (0)
(1)
Ergodicity of the optimal filter (1)
µ
µ
Denote the laws of πs,s+t and πs,s+t under Rµ by mt and Mt , respectively. Recall that {Tt } is the semigroup for the Markov process πt . Lemma 10.18 For any t ≥ 0 and F ∈ Cb (P (S)), we have µ µ mt , F = Tt F(µ) and Mt , F = Tt F(δx )µ(dx). S
Proof Note that µ (1) (1) (0) mt , F = ERµ F(πs,s+t ) = ERµ F !t (µ)(zs ) = EQµ F(!t (µ)) = Tt F(µ), and
(1) µ (1) Mt , F = ERµ F(πs,s+t ) (1) F(!s,s+t (δξs )(zs ))Pµ (dξ )Q(1) (dβ) =
C(R+ ,S) Cm
= =
F(!t (δx )(z))Qδx (dz)µ(dx)
S Cm
S
Tt F(δx )µ(dx),
where the third equality follows from the stationarity and the fact that the conditional law of zs = zs (ξ , β) under R(1) µ given ξs = x is equal to Qδx . By equations (10.14) and (10.15) and the backward martingale convergence theorem, we get that as s → −∞, we have ! " ! " (1) (0) (0) z πs,t , f → πt , f = ERµ f (ξt )|F−∞,t , and
! " ! " (1) ξ (1) (1) z πs,t , f → πt , f = ERµ f (ξt )| ∩∞ . s=−∞ F−∞,t ∨ F−∞,s
As
we see that
(1) µ (0) mt−s , F = ERµ F(πs,t ),
µ lim mµ u , F = lim mt−s , F
u→∞
s→−∞
(1)
(0)
= lim ERµ F(πs,t ) s→−∞ (1)
(0)
= ERµ F(πt ).
As a consequence, the distribution of πt does not depend on t. We denote µ it by mµ = limu→∞ mu . Therefore, mµ is an invariant measure of the semiµ group {Tt }. Similarly, we can prove that Mu tends to Mµ , which is also an invariant measure of the semigroup {Tt }. (0) (1) By the expressions of πt and πt , we see that mµ = Mµ , if for some t, ξ z z ∩∞ R(1) (10.18) s=−∞ F−∞,t ∨ F−∞,s = F−∞,t , µ − a.s. In this case, we will prove the uniqueness of the invariant measure for the optimal filter. To this end, we need to introduce the concept of the barycenter for a probability measure on P (S). Note that ν, f (dν) f ∈ Cb (S) → P (S)
is a bounded linear functional on Cb (S), and there exists η ∈ P (S) such that η, f = ν, f (dν). P (S)
We denote η by
η=
P (S)
ν(dν)
and call it the barycenter of . Theorem 10.19 If mµ = Mµ , then the optimal filter has a unique invariant measure. Proof Suppose that is another invariant measure of the optimal filter. Let µ˜ be the barycenter of , i.e. µ˜ = ν(dν). P (S)
For f ∈ Cb (S), let F ∈ Cb (P (S)) be defined by F(ν) = ν, f , ν ∈ P (S). Then, Tt F(ν) = Eν F(πt ) = Eν πt , f = Eν Eν f (Xt )|Gt = Eν f (Xt ) = ν, Tt f ,
10.2
Ergodicity of the optimal filter
and hence, µ, ˜ f =
P (S)
ν, f (dν) =
P (S)
=
P (S)
F(ν)(dν)
Tt F(ν)(dν) =
P (S)
= µ, ˜ Tt f .
ν, Tt f (dν)
This proves that µ˜ is an invariant measure of the semigroup {Tt }. By the uniqueness of the invariant measure for {Tt }, we get µ˜ = µ. Therefore, the barycenter of is equal to µ. (c) (e) Now, we define two probability measures µ and µ on P (S) by !
" ! " (e) (c) , F = F(µ) and , F = F(δx )µ(dx), µ µ S
∀F ∈ Cb (P (S)).
Let F be a continuous convex function on P (S). Then, " (c) , F = F µ
!
P (S)
≤
P (S)
ν(dν)
F(ν)(dν) = , F ,
(10.19)
where the inequality above follows from Jensen’s inequality. On the other hand, we note that for any ν ∈ P (S),
ν, f =
S
f (x)ν(dx) =
( =
S
)
S
δx , f ν(dx)
δx ν(dx), f ,
and hence, ν=
S
δx ν(dx).
(10.20)
As µ is the barycenter of , we have !
" (e) , F = µ
P (S) S
F(δx )ν(dx)(dν).
(10.21)
203
204
10 : Stability of non-linear filtering
It follows from equation (10.20) and Jensen’s inequality that we can continue equation (10.21) with ! " (e) µ , F ≥ δx ν(dx) (dν) F P (S)
S
=
P (S)
F(ν)(dν) = , F .
(10.22)
Combining equations (10.19) and (10.22), we get ! " ! " (c) , F ≤ , F ≤ (e) , F , for any continuous convex function F on P (S). Next, we prove that Tt F is also convex. For α ∈ [0, 1] and λ1 , λ2 ∈ P (S), set λ = αλ1 + (1 − α)λ2 . Let πt = Eλ (·|Gt ) and π˜ t = Eλ (·|Gt ∨ σ (π0 )), where π0 is a random measure with Pλ (π0 = λ1 ) = α and Pλ (π0 = λ2 ) = 1 − α. Then,
Eλ (π˜ t |Gt ) = πt , and hence, by Jensen’s inequality, we have
Tt F(λ) = Eλ F(πt ) ≤ Eλ (Eλ (F(π˜ t )|Gt )) = Eλ F(π˜ t ) = α Tt F(λ1 ) + (1 − α)Tt F(λ2 ). Finally, we prove the uniqueness of the invariant measure for {Tt }. By Lemma 10.18 and the definition of (c) , we get ! " µ mt , F = Tt F(µ) = (c) , T F . (10.23) t µ Since is invariant for Tt , it follows from equations (10.19) and (10.23) that µ mt , F ≤ , Tt F = , F . (10.24) By Lemma 10.18 and the definition of (e) , we get " µ ! (e) Mt , F = µ , Tt F .
(10.25)
10.3
Finite memory property
Since is invariant for Tt , it follows from equations (10.22) and (10.25) that µ Mt , F ≥ , Tt F = , F . (10.26) Combining equations (10.23) and (10.25), taking t → ∞, we get µ m , F ≤ , F ≤ Mµ , F . Since mµ , F = Mµ , F, we get
, F = mµ , F = Mµ , F
for all convex F.
Thus, mµ = = Mµ . This implies the desired uniqueness.
10.3
Finite memory property
In this section, we establish an equivalent condition for the stability of the optimal filter. Let {Tt∗ } be the dual semigroup on P (S) of the semigroup {Tt } on Cb (S). First, we need the following lemma. Lemma 10.20 Let ν ∈ P (S) be such that for some > 0, T∗ ν is absolutely continuous with respect to µ. Then, Tt∗ ν → µ in P (S) as t → ∞. Proof For any f ∈ Cb (S) and t ≥ , we have ∗ Tt ν, f = T∗ ν, Tt− f dT ∗ ν = Tt− f (x) (x)µ(dx). dµ S Thus, ∗ T ν − µ, f ≤
t
For any K > 0, we have ∗ T ν − µ, f ≤ K t
∗ Tt− f (x) − µ, f dT ν (x)µ(dx). dµ S
S
Tt− f (x) − µ, f µ(dx)
+ 2f 0,∞
S
It follows from Assumption (E2) that lim sup T ∗ ν − µ, f ≤ 2f 0,∞ t→∞
t
dT∗ ν µ(dx). (x)1 dT∗ ν (x)>K dµ dµ S
dT∗ ν µ(dx). (x)1 dT∗ ν (x)>K dµ dµ
205
206
10 : Stability of non-linear filtering
Taking K → ∞, we get
lim sup Tt∗ ν − µ, f ≤ 0. t→∞
This implies the convergence of Tt∗ ν to µ.
To apply this lemma, we will need the following Assumption (E3): For any ν1 , ν2 ∈ P (S), there exists t ≥ 0 such that Tt∗ ν1 is absolutely continuous with respect to Tt∗ ν2 . As a consequence, we have the following corollary. Corollary 10.21 Suppose that Assumption (E3) holds. Then for any ν ∈ P (S), we have Tt∗ ν → µ as t → ∞. Proof By (E3), we get Tt∗ ν << Tt∗ µ = µ. The conclusion then follows from Lemma 10.20. In the next two propositions, we establish the absolute continuity relation among {Tt∗ }, {!t } and {Qλ }, which will be used to get the finite memory property of the optimal filter. Proposition 10.22 Let µ1 , µ2 ∈ P (S) and > 0. If T∗ µ1 << T∗ µ2 , then ! (µ1 ) << ! (µ2 ),
Q-a.s.
Proof Recall that Cm = C(R+ , Rm ). Let N ⊂ Cm be the Q-nullset such that for η ∈ / N , we have q (θ , η) > 0 a.s. with respect to both Pµ1 and Pµ2 . c Fix η ∈ N and let A ∈ B (S) be such that ! (µ2 )(η), 1A = 0. Then 1A (ξ (θ))q (θ , η)Pµ2 (dθ ) = 0. Cm
Thus, 1A (ξ (θ)) = 0,
Pµ2 -a.s.
This implies that T∗ µ2 (A) = 0, and hence, T∗ µ1 (A) = 0. Now we reverse the above argument with µ2 replaced by µ1 . Then, ! (µ1 )(η), 1A = 0. Proposition 10.23 Let µ1 , µ2 ∈ P (S). If there exists > 0 such that ! (µ1 ) << !(µ2 ),
Q-a.s.,
then Qµ1 << Qµ2 . β
Proof Denote ! (µi ) by γi , i = 1, 2. Suppose that A ∈ Ft . Then Qµ1 (A) = EQ { t (µ1 ), 1 1A } = EQ { (µ1 ), 1 t (γ1 ), 1 1A } ,
(10.27)
10.3
Finite memory property
where the last equality follows from the flow property of . Note that
t (γ1 ), 1 =
S Cm
qt (θ , η)P,x (dθ)γ1 (dx)
=
S Cm
qt (θ , η)P,x (dθ)1 dγ1 dγ2
+
S Cm
dγ1 (x)γ2 (dx) (x)≤K dγ2
qt (θ , η)P,x (dθ )1 dγ1 dγ2
≤ K t (γ2 ), 1 +
S
γ (dx) (x)>K 1
t (δx ), 1 1 dγ1 dγ2
γ (dx). (x)>K 1
As Q
E
β Q
t (δx ), 1 F = E
Cm
qt (θ , η)P,x (dθ )Fβ = 1,
we have Q
E
(µ1 ), 1
Q
≤E
=E
t (δx ), 1 1 dγ1
γ (dx)1A (x)>K 1
t (δx ), 1 1 dγ1
γ (dx) (x)>K 1
dγ2
(µ1 ), 1
Q
S
S
dγ2
(µ1 ), 1
S
1 dγ1 dγ2
γ (dx) (x)>K 1
≡ C1 (K). Since S
1 dγ1 dγ2
γ (dx) (x)>K 1
≤ 1,
by the dominated convergence theorem, we have that C1 (K) → 0 as K → ∞. Now we continue the estimate equation (10.27) to arrive at Qµ1 (A) ≤ KEQ { (µ1 ), 1 t (γ2 ), 1 1A } + C1 (K).
(10.28)
207
208
10 : Stability of non-linear filtering
The first term of equation (10.28) is dominated by
(µ1 ), 1 Q
t (γ2 ), 1 1A
(µ2 ), 1 KE
(µ2 ), 1 ≤ KLEQ { (µ2 ), 1 t (γ2 ), 1 1A } Q
+ KE
(µ1 ), 1 1 (µ1 ),1
(µ2 ),1 >L
$
t (γ2 ), 1 1A
= KLQµ2 (A) + KC2 (L), where
(10.29)
Q
C2 (L) = E
$
(µ1 ), 1 1 (µ1 ),1
(µ2 ),1 >L
t (γ2 ), 1 1A
and the last equality of equation (10.29) follows from equation (10.27) with µ1 replaced by µ2 . By the dominated convergence theorem again, we have that C2 (L) → 0 as L → ∞. Suppose that Qµ2 (A) = 0. It follows from equations (10.28) and (10.29) that Qµ1 (A) ≤ C1 (K) + KC2 (L). Taking L → ∞, and then taking K → ∞, we get Qµ1 (A) = 0. This proves the absolute continuity of Qµ1 with respect to Qµ2 . We now introduce the property of “finite memory of the filter”, by which we mean that the optimal filter can be approximated by a filter that uses only the observations from the past τ unit of time. More precisely, we have Definition 10.24 The filter has the finite memory property if for any f ∈ Cb (S) and for µ-almost all x ∈ S, we have ∗ lim sup lim sup EQδx !t (δx ) − !t−τ ,t (Tt−τ δx ), f = 0. (10.30) τ →∞
t→∞
∗ δ ) is the filter using the history of the observaRemark 10.25 !t−τ ,t (Tt−τ x tion from time t − τ to time t, i.e. it is a finite memory filter. The statement equation (10.30) says that the optimal filter can be approximated by one with finite memory.
Theorem 10.26 Suppose that Assumptions (E1)–(E3) hold. If the optimal filter is asymptotically stable with π0 = δx and π¯ 0 = µ, ∀ x ∈ S, then it has the finite memory property.
10.3
Finite memory property
Proof We note that 2 ∗ EQµ !t (µ) − !t−τ ,t (Tt−τ µ), f 2 2 ∗ = EQµ !t (µ), f µ), f + EQµ !t−τ ,t (Tt−τ ∗ µ), f . − 2EQµ !t (µ), f !t−τ ,t (Tt−τ
(10.31)
Note that !t (µ) = !t−τ ,t (!t−τ (µ)). As !t−τ ,t and !t−τ are independent under Qµ , and ∗ EQµ !t−τ (µ) = Tt−τ µ,
we see that 2 ∗ ∗ . EQµ !t (µ), f !t−τ ,t (Tt−τ µ), f = EQµ !t−τ ,t (Tt−τ µ), f Thus, we can continue equation (10.31) with 2 ∗ EQµ !t (µ) − !t−τ ,t (Tt−τ µ), f 2 2 ∗ − EQµ !t−τ ,t (Tt−τ = EQµ !t (µ), f µ), f 2 2 = EQµ !t (µ), f − EQµ !τ (µ), f µ
= mt (Ff ) − mµ τ (Ff ), 2 where Ff (ν) = ν, f for ν ∈ MF (S), the second equality follows from µ being an invariant element of Tt∗ and the fact that !t−τ ,t (µ) and !τ (µ) have the same distribution, and the last equality follows from Lemma 10.18. Taking t → ∞ and then, τ → ∞ in the above equation, we have 2 ∗ µ), f lim sup lim sup EQµ !t (µ) − !t−τ ,t (Tt−τ τ →∞
t→∞ µ
= lim sup m (Ff ) − mµ τ (Ff ) τ →∞
= 0. Thus, ∗ lim sup lim sup EQµ !t (µ) − !t−τ ,t (Tt−τ µ), f = 0. τ →∞
t→∞
Now we need to replace µ in the above equation by δx . By Assumption (E3), it follows from Propositions 10.22 and 10.23 that Qδx << Qµ . Note that !t (µ) − !t−τ ,t (T ∗ µ), f ≤ 2f 0,∞ . t−τ By the dominated convergence theorem, we have ∗ µ), f lim sup lim sup EQδx !t (µ) − !t−τ ,t (Tt−τ τ →∞
t→∞
Qµ
= lim sup lim sup E τ →∞
t→∞
!t (µ) − !t−τ ,t (T ∗ µ), f dQδx t−τ dQµ
= 0.
(10.32)
We note that ∗ EQδx !t (δx ) − !t−τ ,t (Tt−τ δx ), f ≤ EQδx !t (δx ) − !t (µ), f ∗ + EQδx !t (µ) − !t−τ ,t (Tt−τ µ), f ∗ ∗ + EQδx !t−τ ,t (Tt−τ µ) − !t−τ ,t (Tt−τ δx ), f . By the asymptotical stability, the first term tends to 0. The convergence of the second term follows from equation (10.32). Thus, we only need to prove that the third term tends to 0; this term can be rewritten as QT ∗
E
t−τ δx
∗ !τ T µ − !τ T ∗ δx , f . t−τ t−τ
(10.33)
∗ δ → µ as t → ∞. As T ∗ µ = µ, it By Corollary 10.21, we have Tt−τ x t−τ follows from the Feller property of the filter that equation (10.33) tends to 0. Combining the estimates above, we see that the filter has the finite memory property.
The proof of the next theorem, due to Budhiraja, is technically involved. We omit the proof but state the theorem for completeness.
Theorem 10.27 Suppose that Assumptions (E1)–(E3) hold. If the filter has the finite memory property, then it satisfies condition (10.18).
Remark 10.28 Combining the theorems introduced in this and the previous section, we see that condition (10.18), ergodicity, asymptotic stability and the finite memory property of the optimal filter are all equivalent, provided that Assumptions (E1)–(E3) hold.
10.4
Asymptotic stability for non-linear filtering with compact state space
In the last section, we gave equivalent conditions under which the optimal filter is stable. In this section we consider a situation in which all of these conditions are satisfied, i.e. the stability of the optimal filter holds. To this end, we will make use of the Hilbert metric, which we introduce now. We give proofs only of the facts that will be used in this book, and refer the reader interested in more detail to the books of Birkhoff [12] and Hopf [74].
Definition 10.29 i) For λ, µ ∈ MF(S), λ ≤ µ means that λ(A) ≤ µ(A) for all A ∈ B(S). ii) Two measures λ, µ ∈ MF(S) are comparable if there are two positive constants K1 and K2 such that K1λ ≤ µ ≤ K2λ. The Hilbert metric ρh on MF(S) is defined as

    ρh(λ, µ) = log [ sup{λ(A)/µ(A) : A ∈ B(S), µ(A) > 0} / inf{λ(A)/µ(A) : A ∈ B(S), µ(A) > 0} ]   if λ and µ are comparable,
    ρh(λ, µ) = 0   if λ = µ = 0,
    ρh(λ, µ) = ∞   otherwise.

First, let us see how to calculate the Hilbert distance in a special case. Suppose that S consists of n points. Then MF(S) can be identified with R^n_+. For x, y ∈ R^n_+, it is easy to show that

    ρh(x, y) = log sup_{1≤i,j≤n} (xi yj)/(xj yi).

The following special case will be used later. For n = 2, we have

    ρh(x, y) = |log (x1 y2)/(x2 y1)|.    (10.34)
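On a finite state space the Hilbert metric is a one-line computation. The following short Python check of the formulas above is an illustration only, with arbitrary positive vectors:

import numpy as np

def hilbert_metric(x, y):
    """Hilbert (projective) metric on R^n_+ for strictly positive vectors."""
    ratios = np.log(np.outer(x, y)) - np.log(np.outer(y, x))   # log(x_i y_j / (x_j y_i))
    return ratios.max()

x = np.array([1.0, 2.0, 0.5])
y = np.array([0.3, 1.0, 1.0])
print(hilbert_metric(x, y))
# projective invariance: rescaling either measure does not change the distance
print(hilbert_metric(10.0 * x, y), hilbert_metric(x / x.sum(), y / y.sum()))
# the two-point formula (10.34)
a, b = np.array([2.0, 3.0]), np.array([1.0, 5.0])
print(hilbert_metric(a, b), abs(np.log(a[0] * b[1] / (a[1] * b[0]))))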
Remark 10.30 The Hilbert metric ρh is only a pseudo-metric on MF (S). However, it is a metric on P (S). Proof It is clear that ρh (λ, µ) = ρh (µ, λ) and ρh (λ, Kλ) = 0 for any constant K and λ, µ ∈ MF (S). Now we prove the triangle inequality. For λ, µ, ν ∈ MF (S), we may and will assume that they are comparable (otherwise the
triangle inequality equation (10.35) is trivial). Then sup{λ(A)/ν(A) : A ∈ B (S), ν(A) > 0} inf{λ(A)/ν(A) : A ∈ B (S), ν(A) > 0} sup{ν(A)/µ(A) : A ∈ B (S), µ(A) > 0} + log inf{ν(A)/µ(A) : A ∈ B (S), µ(A) > 0}
ρh (λ, ν) + ρh (ν, µ) = log
= log
≥ log
ν(A) sup{ λ(A) ν(A) : ν(A) > 0} sup{ µ(A) : µ(A) > 0} ν(A) inf{ λ(A) ν(A) : ν(A) > 0} inf{ µ(A) : µ(A) > 0}
λ(A) sup{ µ(A) : µ(A) > 0} λ(A) inf{ µ(A) : µ(A) > 0}
= ρh (λ, µ).
(10.35)
Thus, ρh is a pseudo-metric on MF (S). However, if λ, µ ∈ P (S) satisfying ρh (λ, µ) = 0, then λ(A) λ(A) sup : µ(A) > 0 = inf : µ(A) > 0 . µ(A) µ(A) Thus, there exists a constant K such that λ(A) = Kµ(A),
∀ A ∈ B (S).
Taking A = S, we get K = 1, and hence, λ = µ. Namely, ρh is a metric on P (S). For a linear transformation T on MF (S), we define the ρh -diameter of the range T (MF (S)) of T as H(T ) = sup{ρh (T λ, T µ) : λ, µ ∈ MF (S)}. For the special case that MF (S) = R2+ and T given by a matrix 4
we have
ab cd
5 ,
a, b, c, d > 0,
(ax1 + bx2 )(cy1 + dy2 ) H(T ) = sup log : x, y ∈ R2+ (cx1 + dx2 )(ay1 + by2 ) bc = log . ad
(10.36)
(10.37)
10.4
Asymptotic stability for non-linear filtering with compact state space
Similarly, if for a specific λ ∈ MF (S), the transformation T has the kernel representation T µ, f = G(x, x )f (x )µ(dx)λ(dx ), S S
where
G(x, x )
is non-negative, then H(T ) = log esssup
G(x, y)G(x , y ) , G(x, y )G(x , y)
(10.38)
with the convention 0/0 = 1 and 1/0 = ∞. The supremum above is strict over x, x ∈ S, and is essential over y, y ∈ S with respect to λ. The next lemma will be useful in the proof of the stability result. Lemma 10.31 Let T be a linear transformation on MF (S). Then T is a contraction under the Hilbert metric and ρ (T λ, T µ) H(T ) τ (T ) ≡ sup h : 0 < ρh (λ, µ) < ∞ = tanh . (10.39) ρh (λ, µ) 4 The function τ is called Birkhoff’s contraction coefficient. Proof For simplicity, we consider only the case that MF (S) = R2+ and T is given by the matrix equation (10.36). We refer the reader to Birkhoff [12] (Theorem 3, p. 384) for the arguments on relating the general case to the current one. Let R(x) = xx12 for x = (x1 , x2 ) ∈ R2+ . It follows from equation (10.34) that ρh (x, y) = log(R(x)/R(y)) . Denote x =
(T x)1 aR(x) + b , = (T x)2 cR(x) + d
and y is defined similarly. Then,
y du ρh (T x, T y) = log(x /y ) = . x u
Define a new variable u such that u =
au + b . cu + d
As du ad − bc = , du (cu + d)2
(10.40)
we may continue equation (10.40) with
$$
\begin{aligned}
\rho_h(Tx,Ty)&=\left|\int_{R(x)}^{R(y)}\frac{ad-bc}{(au+b)(cu+d)}\,du\right|\\
&\le\frac{|ad-bc|}{2\sqrt{abcd}+ad+bc}\left|\int_{R(x)}^{R(y)}\frac{du}{u}\right|\\
&=\frac{|ad-bc|}{2\sqrt{abcd}+ad+bc}\,\rho_h(x,y).
\end{aligned}
$$
By an elementary calculation, it is easy to show that
$$
\frac{|ad-bc|}{2\sqrt{abcd}+ad+bc}=\tanh\left(\frac14\left|\log\frac{bc}{ad}\right|\right).
$$
We finish the proof of the lemma by making use of equation (10.37).
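As with (10.37), the contraction statement (10.39) is easy to probe numerically. The sketch below (our own construction) draws random pairs of positive vectors, computes the ratio $\rho_h(Tx,Ty)/\rho_h(x,y)$, and compares its largest sampled value with $\tanh(H(T)/4)$; the sampled ratios stay below this bound and come close to it.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = 2.0, 1.0, 0.5, 3.0
T = np.array([[a, b], [c, d]])

def rho_h(x, y):
    r = x / y
    return float(np.log(r.max() / r.min()))

tau = np.tanh(0.25 * abs(np.log(b * c / (a * d))))   # tanh(H(T)/4), cf. (10.39)

ratios = []
for _ in range(20000):
    x, y = rng.uniform(0.01, 100.0, 2), rng.uniform(0.01, 100.0, 2)
    den = rho_h(x, y)
    if den > 1e-8:
        ratios.append(rho_h(T @ x, T @ y) / den)

print(max(ratios), tau)        # empirical contraction factor is bounded by tanh(H(T)/4)
```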
The following technical lemma will be needed in the proof of the exponential stability of the filter.

Lemma 10.32 For any λ, µ ∈ P(S), we have
$$
d_{TV}(\lambda,\mu)\equiv 2\sup_{A\in\mathcal{B}(S)}|\lambda(A)-\mu(A)|
\le 2\wedge\left(e^{\rho_h(\lambda,\mu)}-1\right)\le\frac{2}{\log 3}\,\rho_h(\lambda,\mu). \qquad(10.41)
$$
$d_{TV}$ is called the total variation metric on P(S).

Proof If λ and µ are not comparable, then ρh(λ, µ) = ∞, and hence, equation (10.41) clearly holds. Now, we suppose that λ and µ are comparable. Let $\mathcal{A}\equiv\{A\in\mathcal{B}(S):\lambda(A)\ge\mu(A)\}$. Then, for A ∈ $\mathcal{A}$ with µ(A) > 0, we have
$$
1\le\frac{\lambda(A)}{\mu(A)}=\frac{\lambda(A)/\mu(A)}{\lambda(S)/\mu(S)}\le e^{\rho_h(\lambda,\mu)},
$$
and hence
$$
0\le\lambda(A)-\mu(A)\le\mu(A)\left(e^{\rho_h(\lambda,\mu)}-1\right). \qquad(10.42)
$$
It is clear that equation (10.42) holds even if µ(A) = 0, since then λ(A) = 0 by the comparability.
By symmetry, we can prove that for A ∈ $\mathcal{A}$,
$$
0\le\mu(A^c)-\lambda(A^c)\le\lambda(A^c)\left(e^{\rho_h(\lambda,\mu)}-1\right).
$$
Therefore,
$$
\begin{aligned}
d_{TV}(\lambda,\mu)&=\sup_{A\in\mathcal{A}}\left(\lambda(A)-\mu(A)+\mu(A^c)-\lambda(A^c)\right)\\
&\le\sup_{A\in\mathcal{A}}\left(\mu(A)+\lambda(A^c)\right)\left(e^{\rho_h(\lambda,\mu)}-1\right)\\
&\le e^{\rho_h(\lambda,\mu)}-1.
\end{aligned}
$$
It is clear that $d_{TV}(\lambda,\mu)$ is bounded by 2. Hence, the first part of the inequality (10.41) holds. The second part follows from the inequality
$$
2\wedge(e^{x}-1)\le\frac{2x}{\log 3},\qquad\forall\,x\ge 0.
$$
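The two inequalities in (10.41) can also be observed directly on discrete distributions. The following sketch (our own; the Dirichlet sampling is simply an arbitrary way of producing strictly positive probability vectors) verifies both bounds on random elements of P(S) with S a finite set.

```python
import numpy as np

rng = np.random.default_rng(2)

def rho_h(p, q):
    r = p / q
    return float(np.log(r.max() / r.min()))

def d_tv(p, q):
    # total variation metric of Lemma 10.32: 2 sup_A |p(A) - q(A)| = sum_i |p_i - q_i|
    return float(np.abs(p - q).sum())

for _ in range(5):
    p, q = rng.dirichlet(np.ones(6)), rng.dirichlet(np.ones(6))
    rho = rho_h(p, q)
    first = d_tv(p, q) <= min(2.0, np.exp(rho) - 1.0) + 1e-12
    second = min(2.0, np.exp(rho) - 1.0) <= 2.0 * rho / np.log(3.0) + 1e-12
    print(first, second)        # both bounds of (10.41) hold on every sample
```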
Recall that Xt is a continuous time-homogeneous Markov process taking values in S. $P_x$ is the probability measure on $C(\mathbf{R}_+,S)$ induced by Xt with X0 = x. Let (ξt, βt) be the co-ordinate process on the probability space $(\hat\Omega,\hat{\mathcal F},R_{\pi_0})$, where $\hat\Omega=C(\mathbf{R}_+,S)\times C(\mathbf{R}_+,\mathbf{R}^m)$ and $R_{\pi_0}=P_{\pi_0}\otimes Q$. Then, the observation process Yt has the same distribution as the process
$$
y_t=\int_0^t h(\xi_s)\,ds+\beta_t.
$$
Since we are going to study the stability problem with models on the standard probability space $(\hat\Omega,\hat{\mathcal F},R_{\pi_0})$ only, we shall use some of the notation that was used in the original filtering model. For example, we take $\mathcal G_t=\mathcal F^y_t$ and πt = !t(π0)(y). As $\hat\Omega$ is a Polish space, there exists a regular conditional probability distribution
$$
\hat R_x\left(\cdot\,\middle|\,\mathcal G_t,\ \xi_t=x'\right)
$$
on $\hat\Omega$, where $\hat R_x$ is the probability measure whose Radon–Nikodym derivative with respect to $R_x$ is
$$
\frac{d\hat R_x}{dR_x}(\theta,\eta)=q_t(\theta,\eta)=\exp\left(\int_0^t h(\xi_u(\theta))^*\,d\beta_u(\eta)-\frac12\int_0^t|h(\xi_u(\theta))|^2\,du\right).
$$
As $\mathcal G_t$ and ξt are independent under $\hat R_x$, it is easy to show that for any $B\in\hat{\mathcal F}_t$ and A ∈ B(S), we have
$$
\hat R_x\left(B\cap\{\xi_t\in A\}\,\middle|\,\mathcal G_t\right)=\int_A\hat R_x\left(B\,\middle|\,\mathcal G_t,\ \xi_t=x'\right)\hat R_x(\xi_t\in dx').
$$
Recall that
$$
q_t(\theta,y)=\exp\left(\int_0^t h(\xi_u(\theta))^*\,dy_u-\frac12\int_0^t|h(\xi_u(\theta))|^2\,du\right),
$$
and that p(t, x, dx') is the transition function of the Markov process Xt.
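The functional $q_t(\theta,y)$ is the Girsanov-type likelihood of a signal path θ given the observation path y, and it is what gets averaged in the Kallianpur–Striebel representation below. As a hedged illustration (the discretization and all identifiers are ours, not the book's), the following sketch evaluates $\log q_t$ on a time grid, replacing the stochastic integral by a left-endpoint Itô sum.

```python
import numpy as np

def log_q(h_path, y_path, dt):
    """Discretized log q_t = sum_k h(xi_{t_k}) (y_{t_{k+1}} - y_{t_k}) - 0.5 * sum_k h(xi_{t_k})^2 dt
    for a scalar observation; h_path[k] stands for h(xi_{t_k})."""
    h_path = np.asarray(h_path, float)
    dy = np.diff(np.asarray(y_path, float))
    return float(np.sum(h_path[:-1] * dy) - 0.5 * np.sum(h_path[:-1] ** 2) * dt)

rng = np.random.default_rng(3)
dt, n = 0.001, 1000
xi = np.cumsum(rng.normal(0.0, np.sqrt(dt), n + 1))        # a rough stand-in for a signal path
h_path = np.tanh(xi)                                        # a bounded observation function h
beta = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
y = np.concatenate([[0.0], np.cumsum(h_path[:-1] * dt)]) + beta   # y_t = int_0^t h(xi_s) ds + beta_t
print(log_q(h_path, y, dt))
```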
Lemma 10.33 For any y ∈ C(R+ , Rm ) and x, x ∈ S, let ˆ It (x, x ; y) = ERx qt (·, y)|Gt , ξt = x . Then, for any f ∈ Cb (S), we have f (x )p(t, x, dx )It (x, x ; y)λ(dx). t (λ)(y), f =
(10.43)
S S
Proof By the Kallianpur–Striebel formula, we have ˆ t (λ)(y), f = ERλ f (ξt (θ))qt (θ , y)|Gt ˆ = ERx f (ξt (θ))qt (θ , y)|Gt λ(dx) S ˆ = ERx qt (θ , y)|Gt , ξt = x f (x )p(t, x, dx )λ(dx) S S = f (x )It (x, x ; y)p(t, x, dx )λ(dx). S S
The aim of this section is to prove that dTV (!t (π¯ 0 ), !t (π0 )) → 0
as t → ∞.
More specifically, we shall calculate the asymptotic rate γ (π¯ 0 ) = lim sup t→∞
1 log dTV (!t (π¯ 0 ), !t (π0 )), t
and prove that γ (π¯ 0 ) < 0. Throughout the rest of this section, we assume that the Markov process Xt has a unique invariant measure µ satisfying the following conditions: Assumption (S1): The measure Pµ is ergodic, i.e. the tail σ -field ∩t≥0 σ {Xs : s ≥ t} contains only Pµ -trivial sets (those with Pµ measure 0 or 1).
Assumption (S2): The measures Pπ0 and Pµ are equivalent on the tail σ -field. Let λ ∈ MF (S), t ≥ 0 and y ∈ C(R+ , Rm ) be fixed. We define an operator λ,y It on MF (S) as follows: " ! λ,y f (x )It (x, x ; y)µ(dx)λ(dx ). It (µ), f = S S
For convenience, we denote the transition probability p(t, x, dx ) also by pt (x, dx ). Theorem 10.34 Suppose that Assumptions (S1) and (S2) hold, and let δ > 0. (a) If π¯ 0 ∈ P (S) is comparable with π0 , then γ (π¯ 0 ) ≤ δ −1 Eµ log τ (δ ),
Qπ0 -a.s.
(10.44)
(b) If π¯ 0 ∈ P (S) is comparable with π0 , and there exists λ ∈ MF (S) such that for every x ∈ S, the measure pδ (x, ·) is absolutely continuous with respect to λ, then λ,y γ (π¯ 0 ) ≤ δ −1 Eµ log tanh 4−1 H(Tδ ) + H(Iδ ) , Qπ0 -a.s. (10.45) (c) Suppose that there exists λ ∈ MF (S) such that, for every x ∈ S, pδ (x, ·) is comparable with λ, and Iδ (x, x ; y) can be bounded above and below by positive bounds that do not depend on x and x . Then equations (10.44) and (10.45) hold Qπ0 -a.s. for any π¯ 0 ∈ P (S). Proof (a) Let n = [t/δ]. By Lemma 10.32, we have 2 1 1 1 log dTV (!t (π¯ 0 ), !t (π0 )) − log ≤ log ρh (!t (π¯ 0 ), !t (π0 )). t t log 3 t Note that for any positive constants K1 and K2 , we have ρh (K1 λ, K2 µ) = ρh (λ, µ). Thus, ρh (!t (π¯ 0 ), !t (π0 )) = ρh (t (π¯ 0 ), t (π0 )). By the flow property proved in Lemma 10.7, we have t (π¯ 0 ) = nδ,t (nδ (π¯ 0 )). It follows from Lemma 10.31 and the fact of τ (·) is bounded by 1 that ρh (t (π¯ 0 ), t (π0 )) ≤ ρh (nδ (π¯ 0 ), nδ (π0 )).
Therefore, combining all the estimates above, we get 1 2 1 log dTV (!t (π¯ 0 ), !t (π0 )) − log t t log 3 1 ≤ log ρh (nδ (π¯ 0 ), nδ (π0 )) t ρ ((i+1)δ (π¯ 0 ), (i+1)δ (π0 )) 1 1 + log ρh (π¯ 0 , π0 ) log h t ρh (iδ (π¯ 0 ), iδ (π0 )) t n−1
=
i=0
1 1 log τ (iδ,(i+1)δ ) + log ρh (π¯ 0 , π0 ). t t n−1
≤
i=0
Note that τ (iδ,(i+1)δ ) is a function of {ξt+iδ , yiδ+t − yiδ : 0 ≤ t ≤ δ}. Thus, ˆ µ . It follows {τ (iδ,(i+1)δ ), i = 1, 2, . . .} is an ergodic sequence under R ˆ µ -a.s. By from Birkhoff’s ergodic theorem that equation (10.44) holds R ˆ π are equivalent on the tail σ -field. ˆ µ and R Assumption (S2), we see that R 0 ˆπ As equation (10.44) is a tail event, we get that equation (10.44) holds R 0 a.s. Since equation (10.44) depends on y only and the marginal measure of ˆ π is Qπ , we see that equation (10.44) holds Qπ -a.s. R 0 0 0 (b) By a slightly abuse of the notation, we denote the density of the probability measure pδ (x, ·) with respect to λ by pδ (x, x ). Define Jδ (x, x ) = pδ (x, x )Iδ (x, x ; y). By equation (10.43), we have f (x )Jδ (x, x )π¯ 0 (dx)λ(dx ). δ (π¯ 0 ), f = S S
Thus, δ has a kernel representation, and hence, H(δ ) = log esssup ≤ log esssup
Jδ (x, z)Jδ (x , z ) Jδ (x, z )Jδ (x , z) pδ (x, z)pδ (x , z ) Iδ (x, z; y)Iδ (x , z ; y) + log esssup pδ (x, z )pδ (x , z) Iδ (x, z ; y)Iδ (x , z; y)
= H(Tδ ) + H(Iδλ (·; y)). By equations (10.39) and (10.44), we see that equation (10.45) holds. (c) Note that by Lemma 10.33, we have d(δ (π¯ 0 )) (x ) = pδ (x, x )Iδ (x, x ; y)π¯ 0 (dx). dλ S
By the assumption of the theorem, pδ and Iδ are bounded, and hence, the Radon–Nickodym derivative above is bounded (below from 0 and above from infinity). Thus δ (π¯ 0 ) and δ (π0 ) are comparable. Therefore, the argument used in equation (10.45) holds when ρh (q, π0 ) is replaced by ρh (δ (π¯ 0 ), δ (π0 )). Finally, we specialize the results above to the diffusion in S. We assume that S is a compact manifold in Rd . For the reader who is not familiar with a differential manifold, you can regard S as a compact surface in Rd . Suppose that the Markov process ξt is governed by the following SDE on S: dξt = b(ξt )dt + σ (ξt )dαt ,
(10.46)
where αt is a d-dimensional Brownian motion. For simplicity of notation, we assume that the observation is a real-valued (m = 1) process yt given by
$$
y_t=\int_0^t h(\xi_s)\,ds+\beta_t.
$$
We assume that b, σ, h are Lipschitz continuous on S. We further assume that $h\in C^2_b(S)$ and that b, σ satisfy the uniform ellipticity condition. It is well known that the transition density for the Markov process ξt exists and satisfies
$$
K_1e^{-K_2t^{-1}}\le p_t(x,x')\le K_3t^{-d/2},\qquad\forall\,x,x'\in S, \qquad(10.47)
$$
where K1, K2, K3 are three constants. We refer the reader to Chapter 3 of Davies [48] for this result. We will need the following result in proving the asymptotic stability of the optimal filter.

Lemma 10.35 For δ > 0 and $y\in C(\mathbf{R}_+,\mathbf{R})$ fixed, there exists a constant K1 = K1(δ, y) such that
$$
I_\delta(x,x';y)\le K_1,\qquad\forall\,x,x'\in S.
$$
Proof Let λ be the surface measure on S. Note that
$$
I_\delta(x,x';y)=E^{\hat R_x}\left(\exp\left(\int_0^\delta h(\xi_s)\,dy_s-\frac12\int_0^\delta h^2(\xi_s)\,ds\right)\,\middle|\,\xi_\delta=x',\ \mathcal G_\delta\right).
$$
Applying Itô's formula to equation (10.46), we have
$$
d\big(h(\xi_s)y_s\big)=h(\xi_s)\,dy_s+y_s\nabla^*h(\xi_s)\,d\xi_s+\frac12\sum_{i,j,k=1}^d\sigma_{ik}\sigma_{jk}\,\partial^2_{ij}h(\xi_s)\,y_s\,ds.
$$
Thus,
$$
\begin{aligned}
\int_0^\delta h(\xi_s)\,dy_s-\frac12\int_0^\delta h^2(\xi_s)\,ds
&=h(\xi_\delta)y_\delta-\int_0^\delta\left(\frac12 h^2(\xi_s)+\frac12\sum_{i,j,k=1}^d\sigma_{ik}\sigma_{jk}\,\partial^2_{ij}h(\xi_s)\,y_s\right)ds-\int_0^\delta y_s\nabla^*h(\xi_s)\,d\xi_s\\
&\le K(y,\delta)-\int_0^\delta y_s\nabla^*h(\xi_s)\,d\xi_s,
\end{aligned}
$$
where K(y, δ) is a random variable depending on y and δ. Therefore,
$$
I_\delta(x,x';y)\le e^{K(y,\delta)}\,E^{\hat R_x}\left(\exp\left(-\int_0^\delta y_s\nabla^*h(\xi_s)\,d\xi_s\right)\,\middle|\,\xi_\delta=x',\ \mathcal G_\delta\right). \qquad(10.48)
$$
Set
$$
F_t=\exp\left(-\int_0^t y_s\nabla^*h(\xi_s)\,d\xi_s\right).
$$
Let δn ↑ δ. Applying Fatou's lemma, we get
$$
E^{\hat R_x}\left(F_\delta\,\middle|\,\xi_\delta=x',\ \mathcal G_\delta\right)\le\lim_{n\to\infty}E^{\hat R_x}\left(F_{\delta_n}\,\middle|\,\xi_\delta=x',\ \mathcal G_\delta\right). \qquad(10.49)
$$
For any $\mathcal F^{\xi}_{\delta_n}$-measurable random variable Z and Borel measurable function f, it follows from the Markov property of ξt that
$$
\begin{aligned}
\int_S E^{\hat R_x}\left(Z\,\middle|\,\xi_\delta=x'\right)p_\delta(x,x')f(x')\,dx'
&=E^{\hat R_x}\left(E^{\hat R_x}(Z|\xi_\delta)\,f(\xi_\delta)\right)\\
&=E^{\hat R_x}\left(Zf(\xi_\delta)\right)\\
&=E^{\hat R_x}\left(Z\,E^{\hat R_x}\left(f(\xi_\delta)\,\middle|\,\mathcal F^{\xi}_{\delta_n}\right)\right)\\
&=E^{\hat R_x}\left(Z\int_S p_{\delta-\delta_n}(\xi_{\delta_n},x')f(x')\,dx'\right)\\
&=\int_S E^{\hat R_x}\left(Z\,p_{\delta-\delta_n}(\xi_{\delta_n},x')\right)f(x')\,dx'.
\end{aligned}
$$
Thus,
$$
E^{\hat R_x}\left(Z\,\middle|\,\xi_\delta=x'\right)p_\delta(x,x')=E^{\hat R_x}\left(Z\,p_{\delta-\delta_n}(\xi_{\delta_n},x')\right)\qquad\text{for almost every }x'\in S.
$$
Under $\hat R_x$, the processes (ξt) and (yt) are independent. Thus, when considering the conditional expectation given $\mathcal G_\delta$, we may regard $y_0^\delta$, the path of yt for t ∈ [0, δ], as fixed (non-random). Namely, for Z being $\mathcal F^{\xi}_{\delta_n}\vee\mathcal G_\delta$-measurable, we have
$$
E^{\hat R_x}\left(Z\,\middle|\,\xi_\delta=x',\ \mathcal G_\delta\right)p_\delta(x,x')=E^{\hat R_x}\left(Z\,p_{\delta-\delta_n}(\xi_{\delta_n},x')\,\middle|\,\mathcal G_\delta\right),\qquad\text{a.s. }x'\in S.
$$
Taking Z = $F_{\delta_n}$, then
$$
E^{\hat R_x}\left(F_{\delta_n}\,\middle|\,\xi_\delta=x',\ \mathcal G_\delta\right)p_\delta(x,x')=E^{\hat R_x}\left(F_{\delta_n}\,p_{\delta-\delta_n}(\xi_{\delta_n},x')\,\middle|\,\mathcal G_\delta\right). \qquad(10.50)
$$
Denote the left-hand side of equation (10.50) by $b_n(x,x',y_0^\delta)$. Since $F_{\delta_{n-1}}$ is $\mathcal F^{\xi}_{\delta_n}\vee\mathcal G_\delta$-measurable, it follows from the same argument as in equation (10.50) that
$$
b_{n-1}(x,x',y_0^\delta)=E^{\hat R_x}\left(F_{\delta_{n-1}}\,p_{\delta-\delta_n}(\xi_{\delta_n},x')\,\middle|\,\mathcal G_\delta\right).
$$
In view of equations (10.48) and (10.49), in order to get a uniform upper bound for Iδ(x, x'; y), it suffices to bound bn uniformly in x, x' and n. Note that
$$
\begin{aligned}
\left|b_n(x,x',y_0^\delta)-b_{n-1}(x,x',y_0^\delta)\right|
&\le E^{\hat R_x}\left(|F_{\delta_n}-F_{\delta_{n-1}}|\,p_{\delta-\delta_n}(\xi_{\delta_n},x')\,\middle|\,\mathcal G_\delta\right)\\
&\le E^{\hat R_x}\left(|F_{\delta_n}-F_{\delta_{n-1}}|^{d+1}\,\middle|\,\mathcal G_\delta\right)^{\frac{1}{d+1}}
E^{\hat R_x}\left(p_{\delta-\delta_n}(\xi_{\delta_n},x')^{\frac{d+1}{d}}\,\middle|\,\mathcal G_\delta\right)^{\frac{d}{d+1}}\\
&\equiv Q_1Q_2. \qquad(10.51)
\end{aligned}
$$
By equation (10.47), we get
$$
\begin{aligned}
Q_2&=\left(\int_S p_{\delta-\delta_n}(z,x')^{\frac{d+1}{d}}\,p_{\delta_n}(x,z)\,dz\right)^{\frac{d}{d+1}}\\
&\le K_2\left(\int_S p_{\delta-\delta_n}(z,x')\,(\delta-\delta_n)^{-\frac12}\,p_{\delta_n}(x,z)\,dz\right)^{\frac{d}{d+1}}\\
&=K_2\,(\delta-\delta_n)^{-\frac{d}{2(d+1)}}\,p_\delta(x,x')^{\frac{d}{d+1}}\\
&\le K_3\,(\delta-\delta_n)^{-\frac{d}{2(d+1)}}\,\delta^{-d^2/(2(d+1))}. \qquad(10.52)
\end{aligned}
$$
To bound Q1, we first use the inequality
$$
|e^x-e^y|\le|x-y|\,(e^x+e^y),
$$
and the Cauchy–Schwarz inequality. Denoting m = 2(d + 1), we have
$$
\begin{aligned}
Q_1&\le E^{\hat R_x}\left(\left(\left|\int_{\delta_{n-1}}^{\delta_n}y_s\nabla^*h(\xi_s)\,d\xi_s\right|\left(F_{\delta_n}+F_{\delta_{n-1}}\right)\right)^{d+1}\,\middle|\,\mathcal G_\delta\right)^{\frac{1}{d+1}}\\
&\le E^{\hat R_x}\left(\left|\int_{\delta_{n-1}}^{\delta_n}y_s\nabla^*h(\xi_s)\,d\xi_s\right|^{m}\,\middle|\,\mathcal G_\delta\right)^{\frac1m}
E^{\hat R_x}\left(2^m\left(F^m_{\delta_n}+F^m_{\delta_{n-1}}\right)\,\middle|\,\mathcal G_\delta\right)^{\frac1m}\\
&\equiv Q_{11}Q_{12}.
\end{aligned}
$$
By the Burkholder–Davis–Gundy inequality, we get
$$
\begin{aligned}
Q_{11}&\le E^{\hat R_x}\left(\left|\int_{\delta_{n-1}}^{\delta_n}y_s\nabla^*h(\xi_s)b(\xi_s)\,ds\right|^{m}\,\middle|\,\mathcal G_\delta\right)^{\frac1m}
+E^{\hat R_x}\left(\left|\int_{\delta_{n-1}}^{\delta_n}y_s\nabla^*h(\xi_s)\sigma(\xi_s)\,d\alpha_s\right|^{m}\,\middle|\,\mathcal G_\delta\right)^{\frac1m}\\
&\le K_4\|y_0^\delta\|_\infty(\delta_n-\delta_{n-1})^{1/2}. \qquad(10.53)
\end{aligned}
$$
On the other hand, ˆ
ERx (Fδmn |Gδ ) ˆx R = E exp −m
δn
0
≤e
K5 my0δ ∞ δ
∗
ys ∇ h(ξs )b(ξs )ds − m
δn 0
∗
ys ∇ h(ξs )σ (ξs )dαs
m2 δn ∗ 2 E exp |ys ∇ h(ξs )σ (ξs )| ds 2 0
× exp −m
δn
0
δ
δn 0
|ys ∇ ∗ h(ξs )σ (ξs )|2 ds
2 yδ 2 δ 0 ∞
≤ eK5 my0 ∞ δ eK6 m × E exp −m
m2 ys ∇ ∗ h(ξs )σ (ξs )dαs − 2
0
δn
m2 ys ∇ h(ξs )σ (ξs )dαs − 2 ∗
δn 0
∗
2
|ys ∇ h(ξs )σ (ξs )| ds
= exp (mK(y, δ)) , where K(y, δ) = K5 y0δ ∞ δ + K6 my0δ 2∞ δ.
Thus, Q12 ≤ 4 exp (K(y, δ)) .
(10.54)
Combining equations (10.53) and (10.54), we conclude that
$$
Q_1\le 4K_4\|y_0^\delta\|_\infty(\delta_n-\delta_{n-1})^{1/2}\exp\left(K(y,\delta)\right). \qquad(10.55)
$$
Choosing $\delta_n=\delta(1-2^{-n})$ and combining equations (10.51), (10.52) and (10.55), we get
$$
\left|b_n(x,x',y_0^\delta)-b_{n-1}(x,x',y_0^\delta)\right|\le 4K_3K_4\|y_0^\delta\|_\infty(\delta 2^{-n})^{1/(2(d+1))}\exp\left(K(y,\delta)\right)\delta^{-d^2/(2(d+1))}.
$$
Since $b_0=p_\delta(x,x')$, it follows that
$$
b_n\le K_7\delta^{-d/2}+4K_3K_4\|y_0^\delta\|_\infty\delta^{(1-d)/2}\exp\left(K(y,\delta)\right). \qquad(10.56)
$$
Using equations (10.48), (10.49) and (10.47), we get
$$
I_\delta(x,x';y_0^\delta)\le e^{K(y,\delta)}\,\frac{b_n}{p_\delta(x,x')}.
$$
Making use of equation (10.56), we then get the boundedness of Iδ(x, x'; y₀^δ).

Theorem 10.36 There exists a constant K1 > 0 such that
$$
\gamma(\bar\pi_0)\le-K_1,\qquad Q_{\pi_0}\text{-a.s.} \qquad(10.57)
$$
Proof It follows from Lemma 10.35 and the Cauchy–Schwarz inequality $Ee^X\ge1/Ee^{-X}$ that there is a constant K2 = K2(δ, y) > 0 such that
$$
I_\delta(x,x';y_0^\delta)\ge K_2,\qquad\forall\,x,x'\in S. \qquad(10.58)
$$
By Lemma 10.35 and equation (10.58), it follows from equation (10.38) that $H(I_\delta^{\lambda,y})<\infty$. It follows from equations (10.38) and (10.47) that $H(T_\delta)<\infty$. Thus, equation (10.57) follows from equation (10.45).
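The compact-state-space diffusion in Theorem 10.36 is awkward to simulate directly, but the mechanism behind the exponential rate, namely that each filtering step acts as a positive kernel and is therefore a strict Hilbert-metric contraction, can be illustrated with a small discrete-time, finite-state analogue. The model below is entirely our own toy construction (it is not the model of this section): a three-state ergodic chain with Gaussian observations, filtered from two different initial distributions along the same observation path.

```python
import numpy as np

rng = np.random.default_rng(4)

P = np.array([[0.80, 0.15, 0.05],      # strictly positive transition matrix
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
levels = np.array([-1.0, 0.0, 1.0])    # observation function h on the three states
obs_sd = 0.8

def filter_step(pi, y):
    """one prediction/correction step of the discrete optimal filter"""
    pred = pi @ P
    post = pred * np.exp(-0.5 * ((y - levels) / obs_sd) ** 2)
    return post / post.sum()

T, x = 200, 0
pi_a = np.array([0.90, 0.05, 0.05])    # filter started near the true initial law
pi_b = np.array([0.05, 0.05, 0.90])    # badly misspecified initial condition
gap = []
for _ in range(T):
    x = rng.choice(3, p=P[x])
    y = levels[x] + obs_sd * rng.normal()
    pi_a, pi_b = filter_step(pi_a, y), filter_step(pi_b, y)
    gap.append(np.abs(pi_a - pi_b).sum())

print(gap[0], gap[50], gap[150])       # the total variation gap decays roughly exponentially
```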
10.5
Exchangeability of union intersection for σ -fields
As we saw in Section 10.3, the condition equation (10.18) is the key for the stability of the optimal filter. This condition is essentially about the exchangeability of the two operations (union and intersection) of σ -fields. In this section, we study this problem in a general setting.
Let {Gn , n ≥ 1} be a sequence of non-increasing sub-σ -fields of F with the intersection G∞ . Let A be another sub-σ -field of F . The goal of this section is to find conditions under which the following equality holds ∩∞ n=1 (A ∨ Gn ) = A ∨ G∞
(10.59)
in some sense we shall define below. Definition 10.37 For two sub-σ -fields A and B of F , we write
A=B
mod P,
if A and B induce the same sets of P-equivalence sets. More precisely, for any A ∈ A there exists B ∈ B such that P(AB) = 0, and vice versa, where AB = (A \ B) ∪ (B \ A) is the symmetric difference between A and B. Now we can modify equation (10.59), and the goal of this section is to find conditions under which the following equality holds ∩∞ n=1 (A ∨ Gn ) = A ∨ G∞ ,
mod P.
(10.60)
For convenience, we introduce a few more notations. For a σ -field A, Ab denotes the set of all bounded A-measurable random variables. PωA denotes the regular conditional probability measure on (, F , P) given A. First, we consider measurable spaces (X, X ) and (Y, Y ). Let Q be a probability measure on (X × Y, X ⊗ Y ). We denote the kernel from X to Y by Qx (dy), namely, F(x, y)Qx (dy) Q1 (dx) = F(x, y)dQ(x, y) (10.61) X
Y
X×Y
for all F ∈ (X ⊗ Y )b , where Q1 is the first marginal measure of Q. Let {Yn } be a decreasing sequence of sub-σ -fields of Y . We now find conditions for the following counterpart of equation (10.60): ∩∞ n=1 (X ⊗ Yn ) = X ⊗ Y∞ ,
mod P.
(10.62)
Lemma 10.38 Let Y b,0 be a generating system of Y b as a monotone class. The equality equation (10.62) holds if and only if for every g ∈ Y b,0 and every > 0 there exists a finite-dimensional subset R of X b and a uniformly bounded sequence {hn } of functions on X × Y such that for each n: i) hn ∈ (X ⊗ Yn )b . ii) hn (·, y) ∈ R for every y ∈ Y.
10.5
iii)
X×Y
Exchangeability of union intersection for σ -fields
Q (x, y) h (x, y) − E X ⊗ Y (g| ) dQ(x, y) < . n n
Proof “Only if” Suppose that equation (10.62) holds. Fix g ∈ Y b,0 and > 0. Let g(x, ˜ y) = g(y). By the backward martingale convergence theorem, we have ˜ X ⊗ Yn ) → EQ g| EQ (g| ˜ ∩∞ n=1 (X ⊗ Yn ) ˜ X ⊗ Y∞ ) = EQ (g| in L1 (Q). Choose n0 such that for n > n0 , ˜ X ⊗ Yn ) − EQ (g| ˜ X ⊗ Y∞ ) < . EQ EQ (g| 2 For n ≤ n0 or n = ∞, by the martingale convergence theorem, we can choose a finite subalgebra Hn of X such that ˜ Hn ⊗ Yn ) − EQ (g| ˜ X ⊗ Yn ) < . EQ EQ (g| 2 Let 0 b R = ∨nn=1 Hn ∨ H∞ . Then R is a finite-dimensional space. Denote ˜ Hn ⊗ Yn ) , hn = EQ (g|
n ≤ n0 or n = ∞.
For n > n0 , let hn = h∞ . It is easy to see that {hn , n ≥ 1} satisfies i)–iii). “If” We only need to show that for any F ∈ (X ⊗ Y )b , EQ (F|X ⊗ Y∞ ) = EQ F| ∩∞ (10.63) n=1 (X ⊗ Yn ) . Suppose F(x, y) = g(y) for some g ∈ Y b,0 . For every > 0, let {hn } and R be given by i)–iii). Let {f1 , . . . , fd } be a basis of R. Then, there exist uniformly bounded functions gin ∈ Ynb , i = 1, . . . , d such that hn (x, y) =
d
gin (y)fi (x),
x ∈ X, y ∈ Y.
i=1
Since the bounded set of L∞ is relatively compact in weak topology, we b of {g n }, i.e. for every h ∈ L1 (Q2 ), may take a (weak) limit point gi ∈ Y∞ i we have 2 2 EQ gin h → EQ gi h , i = 1, . . . , d,
as n → ∞. Let h˜ i (y) =
fi (x)H(x, y)Qy (dx),
X
i = 1, 2, . . . , d.
Then, for every H ∈ L∞ (Q), as n → ∞, we have
EQ (hn H) =
d i=1
=
X
d i=1
→
Y
Y
gin (y)h˜ i (y)Q2 (dy)
d i=1
fi (x)gin (y)H(x, y)Qy (dx)Q2 (dy)
Y
gi (y)h˜ i (y)Q2 (dy)
= EQ h∞ H , where h∞ (x, y) =
d
fi (x)gi (y) ∈ (X ⊗ Y∞ )b .
i=1
Taking n → ∞ in iii), by the backward martingale convergence theorem, we get Q ∞ (x, y) g| ∩ h (x, y) − E X ⊗ Y ( ) dQ(x, y) ≤ . ∞ n n=1 X×Y
Such an h∞ exists for all > 0. This implies that EQ g| ∩∞ n=1 (X ⊗ Yn ) is X ⊗ Y∞ -measurable, and hence, equals EQ (g|X ⊗ Y∞ ). Next, we take F(x, y) = f (x)g(x) for some f ∈ X b and g ∈ Y b,0 . The same arguments with fi replaced by ffi yield equation (10.63) for this F. The standard method of the measure theorem then implies equation (10.63) for general F ∈ (X ⊗ Y )b . Next, we transform the result of Lemma 10.38 to our original probability space by defining a mapping γ from (, F ) to ( × , A ⊗ G1 ) by γ (ω) = (ω, ω). Take X = Y = , X = A and Y = G1 . Lemma 10.39 For each n, we have γ −1 (A ⊗ Gn ) = A ∨ Gn . Proof Let
D = {B1 × B2 : B1 ∈ A, B2 ∈ Gn } ,
10.5
and
Exchangeability of union intersection for σ -fields
H = B ∈ A ⊗ Gn : γ −1 (B) ∈ A ∨ Gn .
It is easy to show that D is closed under finite intersection and H, containing D, is closed under increasing limit and closed under true difference. Thus, H = A ⊗ Gn . This proves γ −1 (A ⊗ Gn ) ⊂ A ∨ Gn . The other direction of the containment can be proved similarly. As a consequence of Lemma 10.39, we see that the equalities equations (10.60) and (10.62) are equivalent. Lemma 10.40 Define a probability measure Q on × by Q = P ◦ γ −1 . Then, for any random variable g on (, F , P), we have ˜ A ⊗ Gn ) (ω, ω) = E (g|A ∨ Gn ) (ω), EQ (g|
P-a.s. ω,
(10.64)
where g(ω ˜ 1 , ω2 ) = g(ω2 ) is a random variable on ( × , F ⊗ F , Q). Proof It follows from Lemma 10.39 that both sides of equation (10.64) are A ∨ Gn -measurable. We only need to show that for A1 ∈ A and A2 ∈ Gn , EQ (g|A ⊗ Gn ) (ω, ω)P(dω) = E (g|A ∨ Gn ) (ω)P(dω). A1 A2
A1 A2
It is clear that the right-hand side equals E g1A1 A2 . On the other hand, the left-hand side is EQ (g|A ⊗ Gn ) 1A1 ×A2 dQ = EQ g1A1 ×A2 = E g1A1 A2 . ×
By Lemmas 10.39 and 10.40, we can transform the result of Lemma 10.38 to our original setting now. b,0
Theorem 10.41 Let G1 be a generating system of G1b as a monotone class. b,0
The equality equation (10.60) holds if and only if for every g ∈ G1 and every > 0 there exists a finite-dimensional subset R of Ab and a uniformly bounded sequence {hn } of functions on × such that for each n: i) hn ∈ (A ⊗ Gn )b . ii) hn (·, ω) ∈ R for every ω ∈ . iii) hn (ω, ω) − E (g|A ∨ Gn ) (ω) P(dω) < .
In the rest of this section, we take X, Y as those defined just before Lemma 10.39. Recall that Q1 is the marginal measure of Q on X, and PxA coincides with Qx , the transition kernel from X to Y. Below, we will use PxA to indicate its measurability with respect to A as a function of x. Now we consider other conditions under which equation (10.60) holds. We need the following Definition 10.42 Let A, G be two sub-σ -fields of F . The σ -field G is PA -separable if there is a countably generated sub-σ -field H of G such that G = H mod PωA for P-a.s. ω. Throughout the rest of this section, we assume that G1 is PA -separable. b,0
Lemma 10.43 Let G be a sub-σ -field of G1 and let G1 be as before. Then, the following statements are equivalent: i) G is PA -separable. b,0 ii) For every g ∈ G1 there exists F ∈ (A ⊗ G )b such that A
F(x, ·) = EPx (g|G )
for Q1 -a.s. x,
(10.65) A
where PxA is the conditional probability measure given A, and EPx stands for the expectation with respect to the measure PxA . b,0 iii) For every g ∈ G1 and every F ∈ (A ⊗ G )b satisfying F = EQ (g|A ⊗ G ), we have that equation (10.65) holds. Proof i)⇒iii) Let H ⊂ G be countably generated such that H = G mod PxA for Q1 -a.s. x. Let F ∈ (A ⊗ G )b satisfy F = EQ (g|A ⊗ G ). Then, for all A ∈ A and G ∈ H, we have F(x, y)PxA (dy)Q1 (dx) = g(y)PxA (dy)Q1 (dx). A G
A G
Thus, for G ∈ H, A F(x, y)Px (dy) = g(y)PxA (dy) G
for Q1 -a.s. x.
G
Since H is countably generated there exists N ⊂ X such that Q1 (N) = 0 A / N. Because H = G mod PxA , F then satisfies and F(x, ·) = EPx (g|H), ∀ x ∈ the same relation for G instead of H. iii)⇒ii) is obvious. ii)⇒i) By a monotone class argument we see that the assertion in ii) carries over to all g ∈ G1b . Let H1 ⊂ G1 be countably generated such that H1 = G1 mod PxA for x ∈ / N1 while Q1 (N1 ) = 0. Fix a countb able subset {gm , m ≥ 1} of H1 that generates H1b as a monotone class.
10.5
Exchangeability of union intersection for σ -fields A
For each m, we choose Fm ∈ (A ⊗ G )b satisfying Fm (x, ·) = EPx (gm |G ) for Q1 -a.s. x. Since every product σ -field is the union of its countably generated subproduct-σ -fields, all functions Fm (x, ·), (m ≥ 1, x ∈ X), are measurable with respect to a countably generated sub-σ -field H of G . Then A
A
EPx (gm |G ) = EPx (gm |H),
∀m ≥ 1, x ∈ / N2 ,
where N2 is another Q1 -nullset. If x ∈ / N1 ∪ N2 , it follows that A
A
EPx (g|G ) = EPx (g|H),
∀ g ∈ H1b .
(10.66)
Then, equation (10.66) holds for all g ∈ G1b . Therefore, H = G mod PxA since G ⊂ G1 . Finally, we are ready to state the main theorem of this section. Theorem 10.44 Suppose that Gn is PA -separable for all n. Then, equation (10.60) holds if and only if G∞ is PA -separable. b,0
Proof Fix g ∈ G1 . Let Fn ∈ (A ⊗ Gn )b be such that Fn = EQ (g|A ⊗ Gn ). Define F = lim supn→∞ Fn . From Lemma 10.43 we see that Fn (x, ·) = A EPx (g|Gn ). By the martingale convergence theorem, we have A
F(x, ·) = EPx (g|G∞ )
Q1 -a.s.x.
For H ∈ (A⊗G∞ )b , it follows from equation (10.61) that the two statements H = F mod Q,
(10.67)
and A
H(x, ·) = EPx (g|G∞ )
Q1 -a.s.x
(10.68)
are equivalent. Suppose equation (10.60) holds. Then, equation (10.62) holds, and hence, there exists H ∈ (A ⊗ G∞ )b with property equation (10.67). Hence, H satisfies equation (10.68) that, by Lemma 10.43, implies that G∞ is PA -separable. On the other hand, suppose that G∞ is PA -separable. By Lemma 10.43, we see that there is H ∈ (A ⊗ G∞ )b satisfying equation (10.68) and hence equation (10.67). Therefore, ( A ⊗ G ) F ≡ EQ g ∩∞ n n=1
229
230
10 : Stability of non-linear filtering
coincides Q-a.s. with an A ⊗ G∞ -measurable function, i.e. EQ (g|A ⊗ G∞ ) = EQ g ∩∞ n=1 (A ⊗ Gn ) . b,0
Since this is true for every g ∈ G1 , it follows from the same arguments as in the proof of Theorem 10.41 that equation (10.60) holds.
10.6
Notes
The problem of invariant measures for filtering processes was first considered by Kunita [92]. Later, the results were extended by Kunita [93], Stettner [140], and Bhatt et al. [9] to more general settings. Using the results of Kunita [92], Ocone and Pardoux [129] studied the asymptotic stability of the filter. Since then, various authors have investigated this problem with various improvements. For the setting with Markov ergodic signals on compact domains, this problem is studied by Atar and Zeitouni [3], [4], Baxendale et al. [5], Chigansky [29], [30], Chigansky and Liptser [31], Da Prato, et al. [46], Del Moral and Guionnet [52–54], Del Moral and Miclo [57], Delyon and Zeitouni [58]. It is pushed to non-compact/non-ergodic settings by Atar [2], Budhiraja and Ocone [24], [25], Le Gland and Mevel [107], Le Gland and Oudjane [108], [109], Di Masi and Stettner [60], Tadi´c and Doucet [145], Oudjane and Rubenthaler [130], Papavasiliou [131]. Some other results using Kunita [92] include Budhiraja [16], [17], [18], Stettner [141], Le Breton and Roubaud [106]. The stability with respect to initial conditions is naturally related to the robustness of the filtering equation with respect to the model parameters. The related results appeared in, for example, a series of papers of Budhiraja and Kushner [20], [22], [21], [23]. Here, we also mention some other papers in this subject: Cérou [27], Chigansky and Liptser [32], Clark et al. [34], Ocone [128]. The key condition for the stability is equation (10.18). This condition was studied by Weizsächer [146]. Counterexamples for this condition can be found in Baxendale et al. [5], Delyon and Zeitouni [58], and Williams [148]. Sections 10.1 and 10.2 are based on the paper of Bhatt et al. [9]. Theorem 10.19 is due to Kunita. Section 10.3 is based on the paper of Budhiraja [18]. The concept of the finite-memory property is introduced by Ocone and Pardoux [129]. Section 10.4 is based on Atar and Zeitouni [3]. Section 10.5 is based on Weizsächer [146].
11
Singular filtering
In this chapter, we consider the filtering problem when the magnitude of the observation noise depends on the signal itself. Such a situation arises from many application problems, such as the stochastic volatility model in mathematical finance and the general filtering problem with the Ornstein–Uhlenbeck process as the observation noise. Since the covariance matrix of the observation noises is completely observable by using the quadratic covariation process, the optimal filter becomes a probability measure supported on the surface (manifold) on which the covariance matrix of the observation noises is constant. Thus, the optimal filter is singular. In this chapter, we will demonstrate how to transform this singular filtering problem to the classical one that we studied in the previous chapters.
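Before developing the theory, it may help to see numerically why the covariance of the observation noise is "completely observable": on a fine time grid, the realized quadratic variation of the observation recovers the integrated squared noise coefficient, so the noise level can be read off the data. The one-dimensional sketch below is our own illustration with an arbitrary choice of drift, h and σ; it is not the model treated in this chapter.

```python
import numpy as np

rng = np.random.default_rng(5)
dt, n = 1e-4, 20000

X = np.empty(n + 1); X[0] = 1.0
for k in range(n):                      # Euler scheme for a toy signal dX = -(X-1) dt + 0.3 dB
    X[k + 1] = X[k] - (X[k] - 1.0) * dt + 0.3 * np.sqrt(dt) * rng.normal()

h = lambda x: np.sin(x)
sigma = lambda x: 0.5 + 0.2 * np.cos(x)

dW = np.sqrt(dt) * rng.normal(size=n)
dY = h(X[:-1]) * dt + sigma(X[:-1]) * dW      # dY = h(X) dt + sigma(X) dW
Y = np.concatenate([[0.0], np.cumsum(dY)])

k0, win = 10000, 200                    # realized quadratic variation on a short window
qv_rate = np.sum(np.diff(Y[k0:k0 + win + 1]) ** 2) / (win * dt)
print(qv_rate, sigma(X[k0]) ** 2)       # close: the squared noise coefficient is observable
```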
11.1
A special example
In this section, we consider an inspiring example that can be solved by elementary calculations. Suppose that (B, W) is a (d + m)-dimensional standard Brownian motion. Let the signal be given by the following stochastic differential equation:
$$
dX^j_t=b_j(X_t)\,dt+\sqrt{X^j_t}\,dB^j_t,\qquad j=1,2,\ldots,d, \qquad(11.1)
$$
and let the observation process be given by
$$
dY_t=h(X_t)\,dt+\sqrt{Z_t}\,dW_t, \qquad(11.2)
$$
where $b=(b_1,\ldots,b_d)^*:\mathbf{R}^d\to\mathbf{R}^d$ and $h:\mathbf{R}^d\to\mathbf{R}^m$ are continuous mappings, and $Z_t=\sum_{j=1}^dX^j_t$ is a non-negative-valued process.

Remark 11.1 We now consider the example introduced in Section 1.1.2. Suppose that the volatilities of the stocks are not deterministic. Instead, we assume that they are all equal to the square root of the sum of the
appreciation rates of all available stocks, namely
$$
dS^j_t=S^j_t\left(X^j_t\,dt+\sqrt{Z_t}\,dW^j_t\right),\qquad j=1,2,\ldots,d.
$$
Then, Yt ≡ log St , j = 1, 2, . . . , d satisfy the equation (11.2) with m = d j and hj (x) = xj − 12 z. In this case, Xt is the appreciation rate of the jth stock. If bj (x) = αj (βj −xj ), j = 1, 2, . . . , d with αj , βj ≥ 0 being constants, then equation (11.1) is the Cox–Ingersoll–Ross (CIR) model for interest rates. We refer the reader to the book of Baxter and Rennie [6] for more details on this model. We can also use it to model the appreciation rate of the jth stock. √ Note that the volatility of the jth stock here is modelled as Zt for j = 1, 2, . . . , d. In general, the volatility of the jth stock may √ have a more general form, in particular, it may depend on j. We take Zt here for the convenience of the calculations. The general case can be solved using the method that will be introduced in other sections of this chapter. By Theorem 3.11 and Definition 2.30, we see that for 1 ≤ i, j ≤ m, the quadratic covariation process Y i , Y j t satisfies !
Yi, Yj
where 0 =
t0n
<
t1n
" t
= lim
n→∞
< ··· <
tnn
n
Ytin − Ytin k
k=1
k−1
j
j
k
k−1
Yt n − Yt n
,
= t is a partition of [0, t] with
n | → 0. max |tkn − tk−1
0≤k≤n
Then Y i , Y j t is Gt -measurable. By Theorem 3.6, we get that t ! " Yi, Yj = Zs dsδij . t
0
Therefore, Zt , t > 0, is an observable process. It will feedback and provide some extra information about the signal process Xt . For any z > 0, we define the hyperplane ⎧ ⎫ d ⎨ ⎬ xj = z . Mz := x ∈ Rd : ⎩ ⎭ j=1
As Xt1 + · · · + Xtd = Zt , the optimal filter πt is supported on the hyperplane MZt , namely, πt (MZt ) = 1. Note that 3 ¯ t )dt + Zt dW 0 , dZt = b(X (11.3) t
11.1
where ¯ b(x) ≡
d
bj (x) and
dWt0
j=1
≡
d j=1
7
A special example
j
Xt j dB . Zt t
By Theorem 3.13, it is easy to verify that Wt0 is a one-dimensional Brownian motion. Let ej be the unit vector in the jth axis of Rd , j = 1, 2, . . . , d, namely, the jth co-ordinate of ej is 1, and the other co-ordinates are 0. Let √ e¯ := (1, 1, . . . , 1)∗ / d. Note that the vectors b(x) and ej , 1 ≤ j ≤ d, can be decomposed orthogonally as follows: ˜ b(x) = b(x), e¯ e¯ + b(x),
1 ej = √ e¯ + e˜ j , d
˜ with b(x), e˜ j being orthogonal to e¯ . Then, in vector form, equation (11.1) becomes d # j j dXt = b(Xt )dt + Xt ej dBt (11.4) j=1
=
# d 1 ¯ 1 3 j j 0 ˜ e˜ j Xt dBt . √ e¯ b(Xt ) + b(Xt ) dt + √ e¯ Zt dWt + d d j=1
Let (x) be a d × d orthogonal matrix whose last column is # # ∗ x1 xd , . . . , . Define a d-dimensional process B˜ t by z z ˜ t = (Xt )∗ dBt . dB ˜ t is a d-dimensional Again, by Theorem 3.13, it is easy to show that B Brownian motion. Further, dWt0
=
d
j ˜ d, σjd (Xt )dBt = d B t
j=1
and hence, B˜ d = W 0 . As (t)∗ = (t)−1 , we can represent the Brownian j motions Bt , j = 1, 2, . . . , d, in terms of the stochastic integrals with respect ˜ d−1 and Wt0 : to the Brownian motions B˜ 1t , . . . , B t 7 d−1 j Xt j σjk (Xt )d B˜ kt + dWt0 , j = 1, 2, . . . , d. dBt = Zt k=1
233
234
11 : Singular filtering
By equation (11.3), the equation (11.4) is then rewritten as # d d−1 d j 1 X j ˜ t )dt + σjk (Xt ) Xt e˜ j d B˜ kt + dXt = √ e¯ dZt + b(X √ t e˜ j dWt0 . Zt d j=1 k=1 j=1 (11.5) Note that for fixed s and t with s < t, the mapping 1 x → ξt,s (x) ≡ x + √ e¯ (Zt − Zs ) d −1 is a one-to-one linear transformation from MZs onto MZt . Thus, ξt,0 is a
−1 mapping from MZt to Mz0 . Let κt = ξt,0 Xt . Then
1 κt = Xt − √ e¯ (Zt − z0 ) d satisfies the following stochastic differential equation on the hyperplane Mz0 : ˜ t )dt + dκt = b(X
d−1 d
# σjk (Xt )
k=1 j=1
j Xt e˜ j d B˜ kt
d j Xt + √ e˜ j dWt0 . Zt j=1
(11.6)
For κ ∈ Mz0 and z ∈ R+ , define ˆ b(κ, z) = b˜ κ + d −1/2 e¯ (z − z0 ) ,
σˆ 0 (κ, z) =
d κ j + d −1/2 (z − z0 ) e˜ j , √ z j=1
and, for k = 1, 2, . . . , d − 1, σˆ k (κ, z) =
d
# σjk κ + d −1/2 e¯ (z − z0 ) κ j + d −1/2 (z − z0 )˜ej .
j=1
Then, the singular filtering problem becomes the classical one with the signal κt given by the following SDE on Mz0 : ˆ t , Zt )dt + dκt = b(κ
d−1
˜ k + σˆ 0 (κt , Zt )dW 0 , σˆ k (κt , Zt )d B t t
(11.7)
k=1
and the observation process (Yt , Zt ) given by 3 ˆ t , Zt )dt + Zt dWt , dYt = h(κ
(11.8)
11.1
and
A special example
3 ˆ¯ 0 dZt = b(κ t , Zt )dt + Zt dWt ,
(11.9)
where
ˆ¯ ˆ h(κ, z) = h κ + d −1/2 e¯ (z − z0 ) and b(κ, z) = b¯ κ + d −1/2 e¯ (z − z0 ) . Define the optimal filter for κt by Y,Z Ut , f = E f (κt )Ft ,
∀ f ∈ Cb (Mz0 ).
To study the filtering problem with signal equation (11.7) and observations equations (11.8) and (11.9), we need to find the initial probability measure U0 on Mz . Suppose that the distribution of X0 has a continuous density π˜ 0 on Rd+ . Theorem 11.2 Let U0 be the conditional probability distribution of X0 given Z0 = z, i.e. U0 dx = P X0 ∈ dx|Z0 = z . Then, U0 (dx) = p(x)λz (dx), where λz is the Lebesgue measure on the hyperplane Mz and π˜ 0 (x) . ˜ 0 (y)λz (dy) Mz π
p(x) =
(11.10)
Proof For a test function φ defined on Rd+ , and a Borel set D in R+ , we have E φ(X0 )1Z0 ∈D = φ(x)1D (x1 + · · · + xd )π˜ 0 (x)dx Rd+
=
D Mz
By taking φ(x) ≡ 1, we get P(Z0 ∈ D) =
φ(x)π˜ 0 (x)λz (dx)dz.
D Mz
π˜ 0 (x)λz (dx)dz.
Thus, E φ(X0 )1Z0 ∈D = E E (φ(X0 )|Z0 ) 1Z0 ∈D E (φ(X0 )|Z0 = z) π˜ 0 (x)λz (dx)dz. = D
(11.11)
Mz
(11.12)
235
236
11 : Singular filtering
Comparing equations (11.11) and (11.12), we get that 8 E (φ(X0 )|Z0 = z) = φ(x)π˜ 0 (x)λz (dx) π˜ 0 (y)λz (dy). Mz
Mz
Therefore, U0 is absolutely continuous with respect to λz and the density is given by equation (11.10). In contrast to the filtering models equations (5.2) and (5.1), the coefficients in the equations (11.7), (11.8) and (11.9) here depend on the observation process Zt . However, since the observation processes are regarded as known (behaves deterministically) in the filtering problem, there is no essential difficulty in deriving the filtering equation and the numerical solutions for Ut using the methods introduced in the previous chapters. We leave the details to the interested reader. Finally, we note that Y,Z πt (·) = P Xt ∈ ·|Ft 1 Y,Z = P κt + √ e¯ (Zt − Z0 ) ∈ ·|Ft . d Thus, the optimal filter for Xt is given by πt (·) = Ut · − d −1/2 e¯ (Zt − Z0 ) ,
t > 0.
It is clear that for t = 0, the optimal filter π0 coincides with the initial distribution π˜ 0 of X. Note that the optimal filter πt is not continuous at time t = 0 since Z0 is not observed at time t = 0.
11.2
A general singular filtering model
In this section, we study the following filtering model: dXt = b(Xt )dt + c(Xt )dWt + c˜ (Xt )dBt , dYt = h(Xt )dt + σ (Xt )dWt ,
(11.13)
where B and W are two independent Brownian motions in Rd and Rm , respectively, and b, c, c˜ , h, σ are functions on Rd with values in Rd , Rd×m , Rd×d , Rm , Rm×m , respectively. The analysis can be easily extended to cover the case when all terms in equation (11.13) depend on X as well as on Y. Since the coefficient matrix of the observation noise plays an important role in this chapter, we reserve the notation σ for it. For the
11.2
A general singular filtering model
coefficient matrix of the second noise of the signal, we use c˜ instead of σ (contrary to what we did in the previous chapters). To aid the understanding of the material in Sections 11.2–11.4, we suggest the reader compares the results in these three sections with the corresponding ones in the previous section, which is more explicit. For convenience, we make the following assumption throughout the rest of this chapter. Condition (ND): For any x ∈ Rd , the m × m-matrix σ (x) is invertible. Without loss of generality, we can and will assume that σ (x) is a symmetric and positive-definite matrix. √ In fact,d we can define the −1 d × m-matrix-valued function α = cσ σ σ ∗ on R and m-dimensional ˜ process W by 3 −1 ˜t= σ σ ∗ (Xt ) σ (Xt )dWt . dW ˜ is a martingale with Meyer’s processes Then, W ! " ˜ i, W ˜ j = δij t, W ∀ 1 ≤ i, j ≤ m, t
and
!
˜ i , Bj W
" t
= 0,
∀ 1 ≤ i ≤ m, 1 ≤ j ≤ d.
˜ B) is a standard m + d-dimensional By Theorem 3.13, the process (W, Brownian motion. Note that the system equation (11.13) is equivalent to the system ˜ t + c˜ (Xt )dBt dXt = b(Xt )dt + α(Xt )d W √ ∗ ˜ t. dYt = h(Xt )dt + σ σ (Xt )d W √ In this case, σ σ ∗ is symmetric and positive-definite. Let Yt be Meyer’s process of Y. Recall that, since Y is a continuous semimartingale, Yt coincides with the quadratic covariation (Rm×m -valued) process of Y. From equation (11.13) it follows that t
Yt = σ 2 (Xt ) ds, 0
and hence the process σ 2 (Xt ) =
d dt
Yt is Gt -measurable for any t > 0.
Remark 11.3 The random variable σ 2 (X0 ) is not G0 -measurable, instead, it is G0+ -measurable. However, σ 2 (Xt ) is Gt -measurable for t > 0. + the set of all symmetric positive-definite m × m-matrices. Denote by Sm +. Then for x ∈ Rd , we have σ (x) and σ 2 (x) ∈ Sm
237
238
11 : Singular filtering + to Rm ˜ with m Next, we define a mapping a from Sm ˜ = m(m+1) as the 2 list of the diagonal entries and those above the diagonal in lexicographical + , a (r) is defined as order, i.e. for any r ∈ Sm
a (r)1 = r11 , a (r)2 = r12 , . . . , a (r)m = r1m , a (r)m+1 = r22 , . . . , a (r)2m−1 = r2m , a (r)2m = r33 , . . . , a (r)m˜ = rmm . + onto a(S + ) ⊂ Rm ˜ . Further, a and It is clear that a is one-to-one from Sm m are continuous.
a−1
+ ) is open in Rm ˜. Lemma 11.4 The set a(Sm + if and only Proof Suppose that σ ∈ Rm×m . It is well known that σ ∈ Sm if det(σk ) > 0, k = 1, 2, . . . , m, where σk is the k × k submatrix obtained from σ by removing the last m − k rows and m − k columns. Note that + ) consists det(σk ) is a polynomial of the entries in σ . Thus, the image a(Sm m ˜ of points in R such that these polynomials of its co-ordinates are positive. + ) is open. This implies that a(Sm + → S + such that for any Now let s be the square-root mapping s : Sm m + + such that s (r)2 = r. To r ∈ Sm , s (r) is the unique matrix belonging to Sm see that s is a continuous and, in particular, a Borel measurable mapping, we prove the following representation.
Lemma 11.5 Suppose that is a smooth closed path in the complex half-plane with positive real part and contains all the eigenvalues of the positive-definite matrix r. Then √ 1 s(r) = z(r − zI)−1 dz. (11.14) 2π + to itself. As a consequence, s is a continuous mapping from Sm
Proof Since r is a positive-definite matrix, there exists an orthogonal matrix r˜ such that r˜∗ r˜r = diag(λ1 , . . . , λm ), where diag(λ1 , . . . , λm ) is the diagonal matrix with diagonal (λ1 , . . . , λm ), and λ1 , . . . , λm > 0 are the eigenvalues of the matrix r. Then, √ √ √ z(r − zI)−1 = r˜diag z(λ1 − z)−1 , . . . , z(λm − z)−1 r˜∗ . By Cauchy’s theorem, we have # √ 1 z(λj − z)−1 dz = λj , 2π
j = 1, 2, . . . , m.
11.2
Thus,
A general singular filtering model
3 3 λ1 , . . . , λm r˜∗ s(r) = r˜diag √ 1 = z˜rdiag (λ1 − z)−1 , . . . , (λm − z)−1 r˜∗ dz 2π √ 1 = z(r − zI)−1 dz. 2π
Now, since σ 2 (Xt ) is Gt -measurable for any t>0, then σ (Xt )=s σ 2 (Xt ) is Gt -measurable for any t > 0, too. In fact, we have the following Lemma 11.6 Let Z, Yˆ be defined by Zt = σ (Xt ), t > 0 and ˜ t )dt + dWt , d Yˆ t = σ −1 (Xt )dYt = h(X where h˜ = σ −1 h. Then ˆ
Gt = FtY ∨ FtZ , t > 0. Proof As we indicated in Remark 11.3, we have FtZ ⊂ Gt . Since σ −1 (Xt ) ˆ is also FtZ -measurable, Yˆ t is Gt -measurable, i.e. FtY ⊂ Gt . Therefore ˆ
FtY ∨ FtZ ⊂ Gt .
(11.15)
On the other hand, as dYt = Zt d Yˆ t , we have, ˆ
Gt ⊂ FtY ∨ FtZ .
(11.16)
The conclusion of the lemma follows from equations (11.15) and (11.16). From Lemma 11.6, we see that the signal–observation pair can be written as ⎧ ⎨ dXt = b(Xt )dt + c(Xt )dWt + c˜ (Xt )dBt , ˜ t )dt + dWt , (11.17) d Yˆ = h(X ⎩ t Zt = σ (Xt ). We see now that the framework is truly non-classical as part of the observation process is noiseless. It follows that, given the observation, Xt takes + , M is defined as values in the set MZt where, for any z ∈ Sm z Mz = {x ∈ Rd : σ (x) = z}.
239
240
11 : Singular filtering
More precisely, if πt is the optimal filter, that is πt f = E( f (Xt )|Gt ), t > 0, then πt has support in MZt . Hence, πt will no longer have total support and will be singular with respect to the Lebesgue measure on Rd . To study the process πt , we need the following smoothness condition on σ . Condition (S): The matrix-valued function σ on Rd , together with its partial derivatives up to order 2, are bounded and Lipschitz continuous. As already observed, only the diagonal entries and those above the diagonal of the process Zt (in other words, a (Zt )) are required to generate FtZ . Hence, we only need to take into account the properties of the mapping aσ aσ : Rd −→ Rm˜ , defined as aσ (x) = a (σ (x)) for all x ∈ Rd . Note that, for all x ∈ Rd , ˜ × d matrix with (∇ ∗ aσ ) (x) is a linear mapping from Rd to Rm˜ , i.e. a m entries ij qij (x) = ∇ ∗ aσ (x) = ∂j aiσ (x), i = 1, . . . , m; ˜ j = 1, . . . , d. Definition 11.7 The vector x ∈ Rd is a regular point for aσ if and only if the matrix q(x) has full rank d ∧ m. ˜ We will study the optimal filter πt in the next two sections according to the type of level set Mz . For convenience, we will make the following assumption throughout the rest of this chapter. Condition (R): Every point in Rd is regular for aσ .
11.3
Optimal filter with discrete support
In this section, we consider the case when d ≤ m. ˜ We will show that Mz consists of countably many points and the optimal filter πt is a discrete type probability measure on Mz . Note that x ∈ Rd is a regular point if and only if # J (x) = det q∗ q(x) > 0. Applying Itô’s formula to aσ (Xt ) with Xt being given by equation (11.13), we get da (Zt ) = Laσ (Xt )dt + q (Xt ) c(Xt )dWt + c˜ (Xt )dBt , (11.18)
11.3
Optimal filter with discrete support
where L is the second-order differential operator given by d d 1 2 a˜ ij ∂ij f + bi ∂i f , Lf = 2 i,j=1
i=1
with a˜ = cc∗ + c˜ c˜ ∗ . Hence, −1 ∗ q (Xt ) da (Zt ) − Laσ (Xt )dt . c(Xt )dWt + c˜ (Xt )dBt = q∗ q Inserting back into equation (11.13), we have −1 ∗ q (Xt ) da (Zt ) − Laσ (Xt )dt . dXt = b(Xt )dt + q∗ q
(11.19)
Now we proceed to study the optimal filter πt . First, we consider the case when X0 = x0 is constant. By Conditions (S) and (BC), it is easy to show that the coefficients for the SDE (11.19) are Lipschitz continuous. Therefore, equation (11.19) has a unique strong solution Xt that is a functional of {a(Zs ) : s ≤ t} and x0 = 0. Thus, Xt is Gt -measurable and hence, t > 0. πt f = Ex (f (Xt )|Gt ) = f (Xt ), In other words, the noiseless component of the observation uniquely identifies the conditional distribution of the signal, given the observation: πt = δXt ,
t > 0.
Next, we consider the case when X0 is not constant. We assume that the distribution of X0 has a continuous density π˜ 0 with respect to the Lebesgue measure in Rd . Now we need to take into account the additional information that may arise from observing the quadratic variation of the process a (Zt ) and the covariation process between a(Zt ) and Yt . This will not influence the trajectory of the process. However, it may reduce the number of possible initial values. By equation (11.18), we see that the quadratic covariation process of a (Zt ) is t q(Xs ) cc∗ (Xs ) + c˜ c˜ ∗ (Xs ) q∗ (Xs ) ds. 0
It follows from equations (11.18) and (11.13) that the quadratic covariation process between a(Zt ) and Yt is t qc(Xs )Zs∗ ds. 0
241
242
11 : Singular filtering
Therefore, ˜ 0 ≡ q cc∗ + c˜ c˜ ∗ q∗ (X0 ) , qc(X0 )Z∗ Z 0 is G0+ measurable. We now divide the discussion into two cases. Case 1 The matrices q cc∗ + c˜ c˜ ∗ q∗ and qc are functions of aσ , in other +) words there exist two Borel measurable functions H1 and H2 from a(Sm m× ˜ m ˜ m×m ˜ to R and R , respectively, such that ∗ q cc + c˜ c˜ ∗ q∗ = H1 (aσ ) and qc = H2 (aσ ). (11.20) ˜ 0 = H1 (a (Z0 )) , H2 (a (Z0 )) Z∗ brings no new knowledge, In this case, Z 0 hence we can ignore it. Before we can proceed further, we need an area formula and the definition of the Hausdorff measure. We will state them here for the convenience of the reader. We refer the reader who is interested in more details to the book of Evans and Gariepy [62] for the proofs and other related definitions. Definition 11.8 (i) Let A ⊂ Rd , 0 ≤ s < ∞ and δ > 0. Define ⎧ ⎫ ∞ ⎨ ⎬ s Hδs (A) = inf α(s) r(Cj ) A ⊂ ∪∞ C , r(C ) ≤ δ , j j=1 j ⎩ ⎭ j=1
where r(Cj ) is the radius of the ball Cj and 8 ∞ s/2 α(s) = π xs/2 e−x dx. 0
(ii) Let
Hs (A) = sup Hδs (A). δ>0
We call Hs the s-dimensional Hausdorff measure on Rd . It can be proved that H0 is the counting measure. If s = k is a positive integer, then Hk agrees with ordinary “k-dimensional surface area” on nice sets; this is the reason we include the normalizing constant α(s) in the definition. Now, we are ready to state the area formula. Lemma 11.9 (Area formula) For every g ∈ L1 (Rd , J(x)dx), we have g(x)J(x)dx = g(x)Hd (du). Rd
Rm˜ x∈M a−1 (u)
11.3
Optimal filter with discrete support
Now we can state the result for the case of d ≤ m ˜ when equation (11.20) is satisfied. Theorem 11.10 Suppose that Z0 = z and Mz = {xi , i ∈ I} . Suppose that X0 has a continuous density π˜ 0 with respect to the Lebesgue measure in Rd , and the conditions (R, S, BC) are satisfied. If π˜ 0 (xj ) j∈I J (xj ) < ∞, then pi δXˆ i , t > 0, πt = t
i∈I
ˆ i is the solution to equation (11.19) with initial xi , and where X t π˜ 0 (xi ) J(xi ) π˜ 0 (xi ) j∈I J(xi )
pi =
.
Proof Let µz be the conditional probability distribution of X0 given Z0 = z, i.e. µz dx = P X0 ∈ dx|Z0 = z . For any B ∈ B (Rm˜ ) and φ ∈ L1 (Rd , π˜ 0 (x)dx), let g(x) = φ(x)1aσ (x)∈B By the area formula, we have
E φ(X0 )1aσ (X0 )∈B =
Rd
Rd
= =
φ(x)1aσ (x)∈B π˜ 0 (x)dx g(x)J(x)dx
Rm˜
=
π˜ 0 (x) . J(x)
x∈Ma−1 (u)
B x∈M
Taking φ = 1, we get P(aσ (X0 ) ∈ B) =
g(x)Hd (du)
φ(x)
a−1 (u)
B x∈M
a−1 (u)
π˜ 0 (x) d H (du). J(x)
π˜ 0 (x) d H (du). J(x)
(11.21)
243
244
11 : Singular filtering
Thus, E φ(X0 )1aσ (X0 )∈B = E 1a(Z0 )∈B φ(x)µZ0 (dx) =
φ(x)µa−1 (u) (dx)
B
Comparing with equation (11.21), we get φ(x)µa−1 (u) (dx) B
=
x ∈Ma−1 (u)
B x∈M
a−1 (u)
x ∈Ma−1 (u)
π˜ 0 (x ) d H (du). J(x )
π˜ 0 (x ) d H (du) J(x )
π˜ 0 (x) φ(x)Hd (du). J(x)
Therefore, φ(x)µz (dx)
π˜ 0 (x) π˜ 0 (x ) = φ(x). J(x ) J(x)
x ∈Mz
x∈Mz
Hence, µz has the support in the set Mz and µz = pi δxi . i∈I
Following the case with constant initial, we then have ˆ i ), πt f = Eµz (f (Xt )|Gt ) = pi f (X ∀t > 0. t i∈I
Remark 11.11 Since Z0 is not observed at time 0, π0 is deterministic and is given by the law of X0 for which we used the notation π˜ 0 . Therefore, the optimal filter πt is not continuous at t = 0. Case 2 If q cc∗ + c˜ c˜ ∗ q∗ and qc are not functions of aσ , then µ has support in the set ˜ z, z , z = {x ∈ Rd : σ (x) = z, q cc∗ + c˜ c˜ ∗ q∗ (x) = z1 , qc(x) = z2 }, M 1 2 and a similar formula as above is valid under additional smoothness assumptions c˜ . Namely, we replace σ (x) in Case 1 by σ˜ (x) = ∗ on c∗ and ∗ (σ (x), q cc + c˜ c˜ q (x) , qc(x)) and continue with the discussion there.
11.4
11.4
Optimal filter supported on manifolds
Optimal filter supported on manifolds
In this section, we consider the case when d > m. ˜ We will show that Mz is a surface (manifold) and the optimal filter πt is a probability measure on the surface Mz and is absolutely continuous with respect to the surface measure. Note that x ∈ Rd is a regular point if and only if # J (x) = det qq∗ (x) > 0. We list some facts about the transformation aσ without giving their proofs. Again, we refer the reader to the book of Evans and Gariepy [62] for more details. We shall use Tx Mz to denote the space consisting of all the tangent vectors of the manifold (surface) Mz at point x ∈ Mz . Tx Mz is called the tangent space of Mz at x. We will use Nx Mz to denote the orthogonal complement of Tx Mz in Rd . We call it the normal space of Mz at x. ˜ Lemma 11.12 i) For any u ∈ Rm˜ , Ma−1 (u) is a d-dimensional manifold ˜ d ˜ where d = d − m. ˜ The Hausdorff measure H on Ma−1 (u) is the surface measure. ii) For any x ∈ Mz , the rows of the matrix q generate the normal space Nx Mz to the manifold Mz at point x. iii) Let ρ(x) = q∗ (qq∗ )−1 (x) and p(x) = ρ(x)q(x). Then p(x) is the orthogonal projection matrix from Rm˜ to the subspace N x Mz . For simplicity of presentation, we make an assumption that is slightly stronger than equation (11.20). Condition (IN): There exist two Borel measurable functions H1 and H2 + ) to Rm×m ˜ ˜ from a(Sm and Rm×d , respectively, such that qc = H1 (aσ ) and q˜c = H2 (aσ ).
(11.22)
Throughout the rest of this section, the assumptions (IN), (R) and (S) will be in force. Now, we proceed to study the “initial” π0+ of the optimal filter πt . Note that the real initial π0 coincides with the law of X0 , while π0+ is the initial for the filtering equation satisfied by the optimal filter. Another point of view is that π0+ is the initial of the optimal filter when both Zt and Yt , including Z0 , are observed. We need a co-area formula whose proof and related definitions can be found in Chapter 3 of [62].
245
246
11 : Singular filtering
Lemma 11.13 (Co-area formula) For every g ∈ L1 (Rd , J(x)dx), the restriction of g to the level set Ma−1 (u) is Hd−m˜ -integrable for almost all u ∈ Rm˜ , and g(x)J(x)dx = g(x)Hd−m˜ (dx)du. Rm˜
Rd
Ma−1 (u)
The next theorem gives the “initial” π0+ of the optimal filter in the case of d > m. ˜ Theorem 11.14 For u ∈ Rm˜ , let λu denote the surface measure on the level set Ma−1 (u) . Suppose that X0 has a continuous density π˜ 0 that is not identically zero on Ma−1 (u) , and satisfies the following integrability condition: π˜ 0 (x) λu (dx) < ∞. J(x) M −1 a
(u)
Let µz be the conditional probability distribution of X0 given Z0 = z, i.e. µz dx = P X0 ∈ dx|Z0 = z . Then, µz (dx) = p(x)λz (dx), where π˜ 0 (x)/J(x) . ˜ 0 (y)/J(y)λu (dy) Mz π
p(x) =
Proof For any test function φ defined on Rd , and any Borel set D in Rm˜ , we define π˜ 0 (x) g(x) = φ(x)1σ (x)∈D . J(x) Then, by the co-area formula, we have E φ(X0 )1σ (X0 )∈D = g(x)J(x)dx Rd = g(x)Hd−m˜ (dx)du Rm˜
Ma−1 (u)
π˜ 0 (x) d−m˜ = φ(x) H (dx) du J(x) D Ma−1 (u) π˜ 0 (x) = λu (dx) du. φ(x) J(x) D M −1
a
(u)
11.4
Optimal filter supported on manifolds
By taking φ(x) ≡ 1, we get P(σ (X0 ) ∈ D) =
D
Ma−1 (u)
π˜ 0 (x) λu (dx) du. J(x)
The result follows from the definition of the conditional expectation.
As we did in equation (11.5), we now decompose the vector fields in the SDE satisfied by the signal according to their components in the spaces Tx Mz and Nx Mz . In contrast to the case in Section 11.1, here the tangent space and the normal space depend on the location of the point on the manifold, it is more convenient to use the Stratonovich form for the signal process. Lemma 11.15 The signal Xt satisfies the following SDE in Stratonovich form: ˜ t )dt + c(Xt ) ◦ dWt + c˜ (Xt ) ◦ dBt , dXt = b(X
(11.23)
where for i = 1, 2, . . . , d, the ith component of b˜ is d m k 1 1 ckj ∂k cij − c˜ kj ∂k c˜ ij . b˜ i = bi − 2 2 k=1 j=1
k, j=1
Proof By equations (11.13) and (3.26), we have dXt = b(Xt )dt + c(Xt )dWt + c˜ (Xt )dBt = b(Xt )dt + c(Xt ) ◦ dWt + c˜ (Xt ) ◦ dBt −
1 1 d c(X), Wt − d ˜c(X), Bt . 2 2
(11.24)
Applying Itô’s formula, we get dc(Xt ) = Lc(Xt )dt +
d
⎛ ⎞ m d j j ∂k c(Xt ) ⎝ ckj (Xt )dWt + c˜ kj (Xt )dBt ⎠ . j=1
k=1
j=1
Thus, d m " d ! j d
c(X), Wt = W ,W ∂k c(Xt ) ckj (Xt ) t dt dt k=1
j=1
247
248
11 : Singular filtering
=
d m
ckj ∂k c(Xt )ej
k=1 j=1
=
d m
ckj ∂k c·j (Xt ),
k=1 j=1
here, c·j denotes the jth column of the matrix c. Similarly, we can prove that k d
˜c(X), Bt = c˜ kj ∂k c˜ ·j (Xt ). dt k,j=1
Inserting back into equation (11.24), we see that equation (11.23) holds. Note that a(Zt ) is observable. Recall equation (11.18) that da(Zt ) = Laσ (Xt )dt + q(Xt ) c(Xt )dWt + c˜ (Xt )dBt .
(11.25)
Define Vt =
t
0
qc(Xs )dWs +
t 0
q˜c(Xs )dBs .
Then, Vt is an m-dimensional ˜ continuous martingale with quadratic covariation process
Vt =
t
0
H(a(Zs ))ds,
where H = H1 H1∗ + H2 H2∗ . Lemma 11.16 The observation process a(Zt ) satisfies ˜ t )dt. da(Zt ) = qc(Xt ) ◦ dWt + q˜c(Xt ) ◦ dBt + qb(X
(11.26)
Proof Similar to the proof of Lemma 11.15, we have dVt = qc(Xt ) ◦ dWt + q˜c(Xt ) ◦ dBt −
d m 1 ∂k (q·j cj )ck (Xt )dt 2 k,j=1 =1
−
d 1 ∂k (q·j c˜ j )˜ck (Xt )dt 2 k,j,=1
11.4
Optimal filter supported on manifolds
d d 1 1 ∗ 2 ∗ = qc(Xt ) ◦ dWt + q˜c(Xt ) ◦ dBt − (cc )jk ∂kj adt− q∂k cc·k dt 2 2 k,j=1
k=1
d d 1 1 ∗ 2 ∗ − (˜cc˜ )jk ∂kj adt − q∂k c˜ c˜ ·k dt 2 2 k,j=1
k=1
˜ t )dt. = qc(Xt ) ◦ dWt + q˜c(Xt ) ◦ dBt − Laσ (Xt )dt + qb(X Combining with equation (11.25), we see that equation (11.26) holds. Finally, we arrive at the main decomposition result. Theorem 11.17 The filtering model can be rewritten as ˜ t )dt + c(Xt ) ◦ dWt + c˜ (Xt ) ◦ dBt + ρ(Xt ) ◦ da(Zt ), dXt = (I − p) ◦ b(X (11.27) with observations da(Zt ) = Laσ (Xt )dt + dVt ,
(11.28)
dYt = h(Xt )dt + Zt dWt .
(11.29)
and
Proof By Lemma 11.15, we have ˜ t )dt + c(Xt ) ◦ dWt + c˜ (Xt ) ◦ dBt dXt = b(X ˜ t )dt + (I − p)c(Xt ) ◦ dWt + (I − p)˜c(Xt ) ◦ dBt = (I − p)b(X ˜ t )dt + pc(Xt ) ◦ dWt + p˜c(Xt ) ◦ dBt . + pb(X
(11.30)
By equation (11.26) and Corollary 3.27, we get ˜ t )dt + pc(Xt ) ◦ dWt + p˜c(Xt ) ◦ dBt . ρ(Xt ) ◦ da(Zt ) = pb(X Inserting back into equation (11.30), we see that equation (11.27) holds. The equality equation (11.28) is a rewrite of equation (11.25). Equation (11.29) is just the original observation model equation (11.13). Let {ξt,s : 0 ≤ s ≤ t} be the stochastic flow associated with the SDE: dξt = ρ(ξt ) ◦ da(Zt ).
(11.31)
Lemma 11.18 The flow ξt,s maps MZs to MZt . Further, the process ξt is FtZ -adapted.
249
250
11 : Singular filtering
Proof Applying the Stratonovich form of Itô’s formula, we get daσ (ξt ) = ∇ ∗ aσ (ξt )ρ(ξt ) ◦ da(Zt ) = da(Zt ). Thus, for σ (ξs ) = Zs , we get σ (ξt ) = Zt . The second conclusion follows from the uniqueness of the solution to the SDE equation (11.31). Denote the column vectors of (I − p)c and (I − p)˜c by g1 , . . . , gm and ˜ Then for each x ∈ Mz , the g˜ 1 , . . . , g˜ d , respectively. Let b0 = (I − p)b. vectors g1 (x), . . . , gm (x), g˜ 1 (x), . . . , g˜ d (x), b0 (x) are all in Tx Mz . The signal process Xt satisfies dXt = ρ(Xt ) ◦ da(Zt ) +
m
gi (Xt ) ◦ dWti +
i=1
d
j
g˜ j (Xt ) ◦ dBt + b0 (Xt )dt.
j=1
(11.32) of the stochastic flow ξ By Theorem 4.18, the Jacobian matrix ξt,s t,s is −1 invertible. The operator (ξt,s )∗ defined below pulls a vector in the tangent space of MZt at ξt,s (x) back to a vector in the tangent space of MZs at x.
Definition 11.19 Let g be a vector field in Rd . The random vector field −1 (ξt,s )∗ g is defined as −1 −1 )∗ g(x) ≡ (ξt,s ) g(ξt,s (x)) (ξt,s
for any regular point x ∈ Rd . Similar to what we did in Section 11.1, we consider the following SDE on Rd : dκt =
−1 (ξt,0 )∗ b0 (κt )dt
m d j −1 −1 i + (ξt,0 )∗ gi (κt ) ◦ dWt + (ξt,0 )∗ g˜ j (κt ) ◦ dBt . i=1
j=1
(11.33) Lemma 11.20 The SDE equation (11.33) has a unique strong solution. Further, if Z0 = z, then κt ∈ Mz for all t ≥ 0 a.s. )−1 satisfies an equation of the form equation (4.19). Proof Note that (ξt,0 By Gronwall’s inequality, it is easy to show that −1 p E sup (ξt,0 ) < ∞,
∀p > 1.
0≤t≤T
Therefore, we can prove that there is a constant K such that 2 −1 −1 E (ξt,0 )∗ b0 (κ1 ) − (ξt,0 )∗ b0 (κ2 ) ≤ K|κ1 − κ2 |2 , ∀ κ1 , κ2 ∈ Rd .
11.4
Optimal filter supported on manifolds
The same inequalities hold with b0 replaced by gi , 1 ≤ i ≤ m or g˜ j , 1 ≤ j ≤ d. By the same argument as in the proof of Theorem 4.8, we see that equation (11.33) has a unique strong solution. Applying Itô’s formula to equation (11.33), we get daσ (κt ) =
−1 q(κt )(ξt,0 )∗ b0 (κt )dt
+
−1 )∗ gi (κt ) ◦ dWti q(κt )(ξt,0
m i=1
+
j −1 )∗ g˜ j (κt ) ◦ dBt . q(κt )(ξt,0
d j=1
−1 Since b0 (ξt,0 (κt )) ∈ Tξt,0 (κt ) MZt , we have (ξt,0 )∗ b0 (κt ) ∈ Tκt Mσ (κt ) . As the row vectors of q(κt ) are in Nκt Mσ (κt ) , we see that −1 q(κt )(ξt,0 )∗ b0 (κt ) = 0.
The same equalities hold with b0 replaced by gi , 1 ≤ i ≤ m or g˜ j , 1 ≤ j ≤ d. Thus, daσ (κt ) = 0, and hence, aσ (κt ) = aσ (κ0 ), ∀ t ≥ 0. This proves that σ (κt ) = z, and hence κt ∈ Mz , ∀ t ≥ 0, a.s. The next theorem gives the decomposition of the signal process. Theorem 11.21 For almost all ω ∈ , we have Xt (ω) = ξt,0 (κt (ω), ω),
∀t ≥ 0.
(11.34)
˜ t (ω). Applying Proof Denote the right-hand side of equation (11.34) by X Itô’s formula to ξt,0 and κt in equations (11.31) and (11.33), we get ˜t = dX
d
j
∂j ξt,0 (κt ) ◦ dκt + ρt (ξt,0 (κt )) ◦ da(Zt )
j=1 (κt )(ξt,0 (κt ))−1 b0 (ξt,0 (κt ))dt = ξt,0
+
m
ξt,0 (κt )(ξt,0 (κt ))−1 gi (ξt,0 (κt )) ◦ dWti
i=1
+
d
˜ t ) ◦ da(Zt ) ξt,0 (κt )(ξt,0 (κt ))−1 g˜ j (ξt,0 (κt )) ◦ dBt + ρt (X j
j=1
˜ t ) ◦ da(Zt ) + = ρ(X
m i=1
˜ t ) ◦ dW i + gi (X t
d j=1
˜ t ) ◦ dBjt + b0 (X ˜ t )dt. g˜ j (X
251
252
11 : Singular filtering
By the uniqueness of the solution to equation (11.32), we see that the representation equation (11.34) holds. The optimal filter then satisfies ˆ
πt f = E( f (ξt,0 (κt ))|FtY ∨ FtZ ),
∀ t > 0.
Note that ξt,0 is FtZ -measurable. Thus, we may regard ξt,0 as known and the singular filtering problem can be transformed to a classical one as follows: For f ∈ Cb (Mz ), let ˆ Ut f ≡ E f (κt )|FtY ∨ FtZ . Then, Ut is the optimal filter with the signal process κt given by equation (11.33) and the observation (Yˆ t , a(Zt )) given by d Yˆ t = Zt−1 h(ξt,0 (κt ))dt + dWt ,
(11.35)
and da(Zt ) = Laσ (ξt,0 (κt ))dt + H1 (a(Zt ))dWt + H2 (a(Zt ))dBt .
(11.36)
Note that the filtering problem with signal equation (11.33) and observations equations (11.35) and (11.36) are classical. We leave the detail on how to derive Zakai’s equation and the filtering equation for Ut to the reader. Finally, we note that t > 0. (11.37) πt f = Ut f ◦ ξt,0 , As we studied in Sections 11.1 and 11.3, π0 is not given by equation (11.37). In fact, π0 coincides with π˜ 0 , which is the initial distribution of X.
11.5
Filtering model with Ornstein–Uhlenbeck noise
In this section, we consider the filtering problem with the OU process as the observation noise. As we indicated in Section 1.1.4, the OU process is an approximation of white noise, which exists only in the sense of a generalized function. Recall that the observation model is $y_t = h(X_t) + O_t$, where $O_t$ is the OU process given by
\[
dO_t = -O_t\,dt + dW_t, \tag{11.38}
\]
$h : \mathbb{R}^d \to \mathbb{R}^m$ is continuous and $W_t$ is an $m$-dimensional Brownian motion. Suppose that the signal process $X_t$ is given by
\[
dX_t = b(X_t)\,dt + c(X_t)\,dB_t,
\]
where $b : \mathbb{R}^d \to \mathbb{R}^d$ and $c : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ are continuous mappings, and $B_t$ is a $d$-dimensional Brownian motion independent of $W$.
Now we transform this filtering problem with OU-process noise into a singular filtering problem of the type studied in the previous sections. Let
\[
Y_t = \int_0^t e^{-s}\, d\bigl(e^s y_s\bigr).
\]
Then $\mathcal F_t^Y = \mathcal F_t^y$ and, by Itô's formula (note that $d(e^s y_s) = e^s(y_s\,ds + dy_s)$, while $dh(X_s) = Lh(X_s)\,ds + \nabla^* h\, c(X_s)\,dB_s$ and $dO_s = -O_s\,ds + dW_s$, so the $O_s$ terms cancel),
\[
dY_t = \bigl(Lh(X_t) + h(X_t)\bigr)\,dt + dW_t + \nabla^* h\, c(X_t)\,dB_t.
\]
Define
\[
dV_t = \bigl(I + \nabla^* h\, c\,(\nabla^* h\, c)^*\bigr)^{-1/2}\bigl(dW_t + \nabla^* h\, c(X_t)\,dB_t\bigr).
\]
Then $V_t$ is an $m$-dimensional Brownian motion and
\[
dY_t = \bigl(Lh(X_t) + h(X_t)\bigr)\,dt + \bigl(I + \nabla^* h\, c\,(\nabla^* h\, c)^*\bigr)^{1/2}(X_t)\,dV_t. \tag{11.39}
\]
Let
\[
d\tilde V_t = \bigl((\nabla^* h\, c)^*\,\nabla^* h\, c + I\bigr)^{-1/2}\bigl((\nabla^* h\, c)^*\,dW_t - dB_t\bigr).
\]
Then $\tilde V$ is a $d$-dimensional Brownian motion independent of $V_t$. It is easy to solve for $dB_t$ to get
\[
dB_t = \bigl((\nabla^* h\, c)^*\,\nabla^* h\, c + I\bigr)^{-1}(\nabla^* h\, c)^*\bigl(I + \nabla^* h\, c\,(\nabla^* h\, c)^*\bigr)^{1/2}(X_t)\,dV_t - \bigl((\nabla^* h\, c)^*\,\nabla^* h\, c + I\bigr)^{-1/2}\,d\tilde V_t.
\]
The signal process is then written as
\[
\begin{split}
dX_t &= b(X_t)\,dt - c(X_t)\bigl((\nabla^* h\, c)^*\,\nabla^* h\, c + I\bigr)^{-1/2}\,d\tilde V_t \\
&\quad + c(X_t)\bigl((\nabla^* h\, c)^*\,\nabla^* h\, c + I\bigr)^{-1}(\nabla^* h\, c)^*\bigl(I + \nabla^* h\, c\,(\nabla^* h\, c)^*\bigr)^{1/2}(X_t)\,dV_t.
\end{split}
\tag{11.40}
\]
It is clear that the filtering model given by equations (11.39) and (11.40) is a special case of the singular filtering model (11.13).
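The pre-processing step above is easy to carry out on a simulated path: since $d(e^s y_s) = e^s(y_s\,ds + dy_s)$, the transformed observation is $Y_t = \int_0^t y_s\,ds + y_t - y_0$, which can be formed directly from the recorded path $y$ on a time grid. The following sketch does exactly that for a hypothetical scalar model (the drift, diffusion, and observation function are illustrative choices, not prescribed by the text).

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical scalar model (placeholders, not prescribed by the text):
# signal dX_t = -X_t dt + dB_t, observation y_t = h(X_t) + O_t with
# OU noise dO_t = -O_t dt + dW_t, as in equation (11.38).
def b(x): return -x
def h(x): return np.tanh(x)

T, n = 5.0, 50_000
dt = T / n
x = np.zeros(n + 1)
o = np.zeros(n + 1)
for k in range(n):
    x[k + 1] = x[k] + b(x[k]) * dt + np.sqrt(dt) * rng.normal()   # unit diffusion
    o[k + 1] = o[k] - o[k] * dt + np.sqrt(dt) * rng.normal()      # OU noise

y = h(x) + o                                    # observed path y_t
# Pre-processed observation Y_t = int_0^t e^{-s} d(e^s y_s)
#                              = int_0^t y_s ds + y_t - y_0,
# with the time integral approximated by a left-endpoint Riemann sum.
Y = np.concatenate(([0.0], np.cumsum(y[:-1]) * dt)) + y - y[0]

print(Y[-1])   # value of the transformed observation at time T
```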
11.6 Notes
Stochastic filtering with the Ornstein–Uhlenbeck process as the observation noise has been studied by Kunita [95], Mandal and Mandrekar [121] and Gawarecki and Mandrekar [68] under the condition that the signal is differentiable in time. Bhatt and Karandikar [10] relax the differentiability condition on the signal by smoothing the observation function. Most of the material in this chapter is taken from the paper of Crisan et al. [44]; the main ideas come from the papers of Joannides and LeGland [77], [78]. Section 11.1 is based on an unpublished manuscript written jointly with Xunyu Zhou.
Bibliography
[1] R. A. Adams (1975). Sobolev spaces. Pure and Applied Mathematics, Vol. 65. Academic Press, New York-London. [2] R. Atar (1998). Exponential stability for nonlinear filtering of diffusion processes in non-compact domain. Ann. Probab. 26, 1552–1574. [3] R. Atar and O. Zeitouni (1997). Exponential stability for nonlinear filtering. Ann. Inst. H. Poincaré Probab. Statist. 33, no. 6, 697–725. [4] R. Atar and O. Zeitouni (1997). Lyapunov exponents for finite state nonlinear filtering. SIAM J. Control Optim. 35, 36–55. [5] P. Baxendale, P. Chigansky and R. Liptser (2004). Asymptotic stability of the Wonham filter: ergodic and nonergodic signals, SIAM J. Opt. Contr. 43, no. 2, 643–669. [6] M. Baxter and A. Rennie (1996). Financial calculus: an introduction to derivative pricing, Cambridge University Press. [7] V. E. Beneš and I. Karatzas (1983). Estimation and control for linear, partially observable systems with non-Gaussian initial distribution. Stochastic Process Appl. 14, 233–248. [8] P. Bernard, D. Talay, and L. Tubaro (1994). Rate of convergence for the Kolmogorov equation with variable coefficients. Math. Comp. 63, 555–587. [9] A. G. Bhatt, A. Budhiraja, and R. L. Karandikar (2000). Markov property and ergodicity of the nonlinear filter. SIAM J. Control Optim. 39, 928–949. [10] A. G. Bhatt and R. L. Karandikar (2003). On filtering with Ornstein– Uhlenbeck process as noise. J. Ind. Statist. Assoc. 41, no. 2, 205–220. [11] P. Billingsley (1986). Probability and measure. Wiley, New York. [12] G. Birkhoff (1967). Lattice theory, Am. Math. Soc. Publ. 25, 3rd edn. [13] B. Z. Bobrovsky and M. Zakai (1975). A lower bound on the estimation error for Markov processes. IEEE Trans. Automat. Control 20, no. 6, 785–788. [14] B. Z. Bobrovsky and M. Zakai (1975). A lower bound on the estimation error for certain diffusion processes. IEEE Trans. Inform. Theory IT-22, no. 1, 45–52.
[15] R. S. Bucy and R. E. Kalman (1961). New results in linear filtering and prediction theory. J. Basic Eng., Trans. ASME 83, 95–108. [16] A. Budhiraja (2001). Ergodic properties of the nonlinear filter. Stochastic Process. Appl. 95, 1–24. [17] A. Budhiraja (2002). On invariant measures of discrete time filters in the correlated signal-noise case. Ann. Appl. Probab. 12, no.3, 1096– 1113. [18] A. Budhiraja (2003). Asymptotic stability, ergodicity and other asymptotic properties of the nonlinear filter. Ann. Inst. H. Poincaré Probab. Statist. 39, no. 6, 919–941. [19] A. Budhiraja and G. Kallianpur (1996). Approximations to the solution of the Zakai equation using multiple Wiener and Stratonovitch integral expansions. Stochastics Stochastics Rep. 56, 271–315. [20] A. Budhiraja and H. J. Kushner (1998). Robustness of nonlinear filters over the infinite time interval. SIAM J. Control Optim. 36, 1618–1637. [21] A. Budhiraja and H. J. Kushner (1999). Approximation and limit results for nonlinear filters over an infinite time interval. SIAM J. Control Optim. 37, no. 6, 1946–1979 [22] A. Budhiraja and H. J. Kushner (2001). Monte Carlo algorithms and asymptotic problems in nonlinear filtering. Stochastics in finite/infinite dimension, in: trends math., Birkhäuser, Boston, 59–87. [23] A. Budhiraja and H. J. Kushner (2000). Approximation and limit results for nonlinear filters over an infinite time interval. II. Random sampling algorithms. SIAM J. Control Optim. 38, no. 6, 1874–1908 [24] A. Budhiraja and D. L. Ocone (1997). Exponential stability of discrete time filters without signal ergodicity. Systems Control Lett. 30, 185–193. [25] A. Budhiraja and D. L. Ocone (1999). Exponential stability in discrete time filtering for non-ergodic signals. Stochastic Process. Appl. 82, 245–257. [26] H. Carvalho, P. Del Moral, A. Monin and G. Salut (1997). Optimal nonlinear filtering in GPS/INS integration. IEEE Trans. Aerosp. Electron. Syst., 33, no. 3, 835–850. [27] F. Cérou (1994). Long time asymptotics for some dynamical noise free nonlinear filtering problems. Rapport de Recherche 2446, INRIA. [28] T. Chiang, G. Kallianpur and P. Sundar (1991). Propagation of chaos and McKean-Vlasov equation in duals of nuclear spaces. Appl. Math. Optim. 24, 55–83. [29] P. Chigansky (2006). An ergodic theorem for filtering with applications to stability. Systems Control Lett. 55, no. 11, 908–917.
[30] P. Chigansky (2006). Stability of the nonlinear filter for slowly switching Markov chains. Stochastic Process. Appl. 116, no. 8, 1185–1194. [31] P. Chigansky and R. Liptser (2004). Stability of nonlinear filters in nonmixing case. Ann. Appl. Probab. 14, no. 4, 2038–2056. [32] P. Chigansky and R. Liptser (2006). On a role of predictor in the filtering stability. Electron. Comm. Probab. 11, 129–140 [33] P. L. Chow, R. Z. Khasminskii and R. S. Liptser (1997). Tracking of a signal and its derivatives in Gaussian white noise. Stochastic Process. Appl. 69, no. 2, 259–273. [34] J. M. C. Clark, D. L. Ocone, and C. Coumarbatch (1999). Relative entropy and error bounds for filtering of Markov processes. Math. Control Signals Syst. 12, 346–360. [35] D. Crisan (2001). Particle filters—a theoretical perspective. Sequential Monte Carlo methods in practice, 17–41, Stat. Eng. Inf. Sci., Springer, New York. [36] D. Crisan (2002). Numerical methods for solving the stochastic filtering problem, Numerical methods and stochastics (Toronto, ON, 1999), 1–20, Fields Inst. Commun., 34, Amer. Math. Soc., Providence, RI. [37] D. Crisan (2003). Exact rates of convergence for a branching particle approximation to the solution of the Zakai equation. Ann. Probab. 31, no. 2, 693–718. [38] D. Crisan (2004). Superprocesses in a Brownian environment. Stochastic analysis with applications to mathematical finance. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 460, no. 2041, 243–270. [39] D. Crisan, P. Del Moral and T. Lyons (1999). Interacting particle systems approximations of the Kushner–Stratonovitch equation. Adv. Appl. Probab. 31, no. 3, 819–838. [40] D. Crisan and A. Doucet (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 50, no. 3, pp 736–746. [41] D. Crisan, J. Gaines and T. Lyons (1998). Convergence of a branching particle method to the solution of the Zakai equation. SIAM J. Appl. Math. 58, no. 5, 1568–1590 (electronic). [42] D. Crisan and T. Lyons (1997). Nonlinear filtering and measurevalued processes. Probab. Theory Related Fields, 109, 217–244. [43] D. Crisan and T. Lyons (1999). A particle approximation of the solution of the Kushner–Stratonovitch equation. Probab. Theory Related Fields 115, no. 4, 549–578. [44] D. Crisan, M. Kouritzin and J. Xiong (2007). Nonlinear filtering with signal depending observation noise. Submitted.
[45] D. Crisan and J. Xiong (2007). A central limit type theorem for particle filter. Comm. Stoch. Analysis 1, no. 1, 103–122. [46] G. Da Prato, M. Fuhrman and P. Malliavin (1995). Asymptotic ergodicity for the Zakai filtering equation. C. R. Acad. Sci. Paris Sér. I Math. I 321, 613–616. [47] G. Da Prato and J. Zabczyk (1992). Stochastic equations in infinite dimensions. University Press, New York. [48] E. B. Davies (1989). Heat kernels and spectral theory. Cambridge University Press. [49] D. A. Dawson and J. Vaillancourt (1995). Stochastic McKean– Vlasov equations. NoDEA Nonlinear Differential Equations Appl. 2, 199–229. [50] P. Del Moral (1995). Non-linear filtering using random particles. Theory Probab. Appl. 40, 690–701. [51] P. Del Moral (1996). Non-linear filtering: interacting particle resolution. Markov Process. Related Fields 2, no. 4, 555–581. [52] P. Del Moral and A. Guionnet (1999). On the stability of measurevalued processes with applications to filtering. C. R. Acad. Sci. Paris Sér. I Math. I 329, 429–434. [53] P. Del Moral and A. Guionnet (1999) Central limit theorem for nonlinear filtering and interacting particle systems. Ann. Appl. Probab. 9, no. 2, 275–297. [54] P. Del Moral and A. Guionnet (2001). On the stability of interacting processes with applications to filtering and genetic algorithms. Ann. Inst. H. Poincaré Probab. Statist. 37, no. 2, 155–194. [55] P. Del Moral, J. C. Noyer and G. Salut (1995). Résolution particulaire et traitement non-linéaire du signal: application radar/sonar. In traitement du signal 12, no. 4, 287–301. [56] P. Del Moral and L. Miclo (2000). Branching and interacting particle systems approximations of Feynman–Kac formulae with applications to non-linear filtering. Séminaire de Probabilités, XXXIV, Lecture Notes in Math., 1729, Springer, Berlin, 1–145. [57] P. Del Moral and L. Miclo (2002). On the stability of nonlinear Feynman–Kac semigroups. Ann. Fac. Sci. Toulouse Math. (6) 11, no. 2, 135–175. [58] B. Delyon and O. Zeitouni (1991). Lyapunov exponents for filtering problem. Applied Stochastic Analysis, M. H. A. Davis and R. J. Elliot, ed., Gordon & Breach, New York, 511–521. [59] G. B. Di Masi, M. Pratelli and W. J. Runggaldier (1985). An approximation for the nonlinear filtering problem with error bound. Stochastics. 14, 247–271.
[60] G. B. Di Masi and L. Stettner (2005). Ergodicity of hidden Markov models. Math. Control Signals Systems 17, no. 4, 269–296. [61] T. E. Duncan (1967). Doctoral Dissertation, Department of Electrical Engineering, Stanford University. [62] L.C. Evans and R.F. Gariepy (1992). Measure theory and fine properties of functions. CRC Press, Boca Raton, FL. [63] P. Florchinger and F. Le Gland (1992). Particle approximations for first order stochastic partial differential equations. Applied stochastic analysis (New Brunswick, NJ, 1991), 121–133, Lecture Notes in Control and Inform. Sci., 177 Springer, Berlin. [64] P. Florchinger and F. Le Gland (1991). Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. Stochastics Stochastics Rep. 35, 233–256. [65] P. Florchinger and F. Le Gland (1990). Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. In: Analysis and optimization of systems (Antibes, 1990), 228–237, Lecture Notes in Control and Inform. Sci., 144, Springer, Berlin. [66] P. Frost and T. Kailath (1971). An innovation approach to leastsquares estimation, Part III. IEEE Trans. Automat. Control, AC-16, 217–226. [67] M. Fujisaki, G. Kallianpur and H. Kunita (1972). Stochastic differential equations for the non-linear filtering problem. Osaka J. Math. 9, 19–40. [68] L. Gawarecki and V. Mandrekar (2000). On the Zakai equation of filtering with Gaussian noise. Stochastics in finite and infinite dimensions, volume in honor of Gopinath Kallianpur, eds. T. Hida et al., 145–151. [69] N. J. Gordon, D. J. Salmon and A. F. M. Smith (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F, 140, 107–113. [70] N. J. Gordon, D. J. Salmon and C. Ewing (1995). Bayesian state estimation for tracking and guidance using the bootstrap filter. J. Guidance Control Dyn., 18, no. 6, 1434–1443. [71] C. Graham (1992). Nonlinear Itô–Skorohod equations and martingale problem with discrete jump sets, Stochastic Process. Appl. 40, 69–82. [72] B. Grigelionis (1973). On stochastic equations of nonlinear filtering of random processes. Litov. Mat. Sb., 12, no. 4, 37–51. [73] M. Hitsuda and I. Mitoma (1986). Tightness problem and stochastic evolution equation arising from fluctuation phenomena for interacting diffusions, J. Multivariate Anal. 19, 311–328.
[74] E. Hopf (1963). An inequality for positive linear integral operators. J. Math. Mech. 12, no. 5, 683–692. [75] Y. Hu, G. Kallianpur and J. Xiong (2002). An approximation for Zakai equation, Appl. Math. Optim. 1, 23–44. [76] N. Ikeda and S. Watanabe (1989). Stochastic differential equations and diffusions. North Holland Publishing Company, Amsterdam. [77] M. Joannides and F. LeGland (1997). Nonlinear Filtering with Perfect Discrete Time Observations, Proceedings of the 34th IEEE Conference on Decision and Control, New Orleans, December 13–15, 1995, pp. 4012–4017. [78] M. Joannides and F. LeGland (1997). Nonlinear Filtering with Continuous Time Perfect Observations and Noninformative Quadratic Variation. Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, December 10–12, 1997, pp. 1645–1650. [79] T. Kailath (1968). An innovation approach to least-squares estimation, Parts I, II. IEEE Trans. Automat. Control, AC-13, 646–660. [80] T. Kailath and R. Greesy (1971). An innovation approach to leastsquares estimation, Part IV. IEEE Trans. Automat. Control, AC-16, 720–727. [81] G. Kallianpur (1980). Stochastic filtering theory. Springer-Verlag, New York. [82] G. Kallianpur and C. Striebel (1968). Estimation of stochastic systems: arbitrary system process with additive noise observation errors. Ann. Math. Statist., 39, 785–801. [83] G. Kallianpur and C. Striebel (1969). Stochastic differential equations occurring in the estimation of continuous parameter stochastic processes. Teor. Veroyatn. Primen., 14, no. 4, 597–622. [84] G. Kallianpur and J. Xiong (1994). Asymptotic behavior of a system of interacting nuclear-space-valued stochastic differential equations driven by Poisson random measures. Appl. Math. Optim. 30, 175– 201. [85] R. L. Karandikar (1995). On pathwise stochastic integration. Stochastic Process. Appl. 57, 11–18. [86] G. Kitagawa (1996). Monte-Carlo filter and smoother for nonGaussian non-linear state space models. J. Comput. Graphical Stat., 5, no. 1, 1–25. [87] P. Kotelenez (1995). A class of quasilinear stochastic partial differential equation of McKean-Vlasov type with mass conservation. Probab. Theory Related Fields 102, 159–188. [88] N. Krylov (1996). On the Lp -theory of stochastic partial differential equations in the whole space. SIAM J. Math. Anal. 27, 313–340.
[89] N. Krylov (1999). An analytic approach to SPDEs. In: Stochastic partial differential equations, six perspectives. Mathematical Surveys and Monographs. AMS, Providence, RI. [90] N. Krylov and B. L. Rozovskii (1981). Stochastic evolution equations. J. Sov. Math. 16, 1233–1277. [91] N. Krylov and B. L. Rozovskii (1982). On the characteristics of degenerate second order parabolic Itô equations. Trudy seminara imeni Petrovskago 8, 153–168 in Russian; English translation in J. Sov. Math. 32, no. 4 (1986), 336–348. [92] H. Kunita (1971). Asymptotic behavior of the nonlinear filtering errors of Markov processes. J. Multivariate Anal. 1, 365–393. [93] H. Kunita (1991). Ergodic properties of nonlinear filtering processes. Spatial Stochastic Processes, K.C. Alexander and J. C. Watkins (ed.). [94] H. Kunita (1990). Stochastic flows and stochastic differential equations. Cambridge Studies in Advanced Mathematics 24. Cambridge University Press, Cambridge. [95] H. Kunita (1993). Representation and stability of nonlinear filters associated with Gaussian noises. Stochastic processes: a festschrift in honor of Gopinath Kallianpur, (ed.) Cambanis et al., SpringerVerlag, New York. 201–210. [96] H. H. Kuo (1996). White Noise Distribution Theory. CRC Press, Boca Raton. [97] T. Kurtz and J. Xiong (1999). Particle representations for a class of nonlinear SPDEs. Stochastic Process. Appl. 83, 103–126. [98] T. Kurtz and J. Xiong (2000). Numerical solutions for a class of SPDEs with application to filtering. Stochastics in finite and infinite dimension: in honor of Gopinath Kallianpur. (ed.) T. Hida, R. Karandikar, H. Kunita, B. Rajput, S. Watanabe and J. Xiong. Trends in mathematics. Birkhäuser, Boston. [99] T. Kurtz and J. Xiong (2004). A stochastic evolution equation arising from the fluctuation of a class of interacting particle systems. Commun. Math. Sci. 2, 325–358. [100] H. J. Kushner (1964). On the dynamic equations of conditional probability density functions with applications to optimal stochastic control theory. J. Math. Anal. Appl., 8, 332–344. [101] H. J. Kushner (1967). Dynamic equations for nonlinear filtering. J. Differ. Equations, 3, 179–190. [102] H. J. Kushner (1977). Probability methods for approximations in stochastic control and for elliptic equations. Academic Press, New York.
[103] H. J. Kushner (1979). A robust discrete state approximation to the optimal nonlinear filter for a diffusion. Stochastics Stochastics Rep. 3, 75–83. [104] H. J. Kushner (1997). Robustness and convergence of approximations to nonlinear filters for jump-diffusions. Comput. Appl. Math., V. 16, 153–183. [105] H. Kwakernaak and R. Sivan (1972). Linear optimal control systems. Wiley-Interscience, New York. [106] A. Le Breton and M. Roubaud (2000). Asymptotic optimality of approximate filters in stochastic systems with colored noises. SIAM J. Control Optim. 39, no. 3, 917–927. [107] F. Le Gland and L. Mevel (2000). Exponential forgetting and geometric ergodicity in hidden Markov models. Math. Control Signals Syst. 13, 63–93. [108] F. Le Gland and N. Oudjane (2003). A robustification approach to stability and to uniform particle approximation of nonlinear filters: the example of pseudo-mixing signals. Stochastic Process. Appl. 106, no. 2, 279–316. [109] F. Le Gland and N. Oudjane (2004). Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. Ann. Appl. Probab. 14, no. 1, 144–187. [110] R. S. Liptser (1967). On filtering and extrapolation of the components of diffusion type Markov processes. Teor. Veroyatn. Primen., 12, no. 4, 754–756. [111] R. S. Liptser (1968). On filtering and extrapolation of some Markov processes, I. Kibernetika (Kiev), 3, 63–70. [112] R. S. Liptser (1968). On filtering and extrapolation of some Markov processes, II. Kibernetika (Kiev), 6, 70–76. [113] R. S. Liptser and A. N. Shiryaev (1968). Nonlinear filtering of diffusion type Markov processes. Tr. Mat. Inst. Steklova, 104, 135–180. [114] R. S. Liptser and A. N. Shiryaev (1968). On the case of effective solution of the problems of optimal nonlinear filtering, interpolation, and extrapolation. Teor. Veroyatn. Primen., 13, no. 3, 570–571. [115] R. S. Liptser and A. N. Shiryaev (1968). Nonlinear interpolation of the components of diffusion type Markov processes (forward equations, effective formulae). Teor. Veroyatn. Primen., 13, no. 4, 602–620. [116] R. S. Liptser and A. N. Shiryaev (1969). Interpolation and filtering of the jump components of a Markov process. Izv. Akad. Nauk SSSR, Ser. Mat., 33, no. 4, 901–914.
[117] S. Lototsky and B. L. Rozovskii (1997). Recursive multiple Wiener integral expansion for nonlinear filtering of diffusion processes. In: Stochastic processes and functional analysis (Riverside, CA, 1994), 199–208, Lecture Notes in Pure and Appl. Math., 186, Dekker, New York. [118] S. Lototsky, R. Mikulevicius and B. L. Rozovskii (1997). Nonlinear filtering revisited: a spectral approach. SIAM J. Control Optim. 35, 435–461. [119] A. M. Makowski (1986). Filtering formula for partially observed linear systems with non-Gaussian initial conditions. Stochastics 16, 1–24. [120] A. M. Makowski and R. B. Sowers (1992). Discrete-time filtering for linear systems with non-Gaussian initial conditions: Asymptotic behaviors of the difference between the MMSE and the LMSE estimates. IEEE Trans. Automat. Control 37, 114–120. [121] V. Mandrekar and P. K. Mandal (2000). A Bayes formula for Gaussian processes and its applications. SIAM J. Control Optim. 39, 852–871. [122] H. P. McKean (1967). Propagation of Chaos for a Class of Nonlinear Parabolic Equations, Lecture Series in Differential Equations 2, 177–194. [123] S. Méléard (1996). Asymptotic behavior of some interacting particle systems, McKean-Vlasov and Boltzmann models, Probabilistic models for nonlinear partial differential equations, Lecture Notes in Math., 1627, 42–95. [124] B. M. Miller and E. Ya. Rubinovich (1995). Regularization of a generalized Kalman filter. Math. Comput. Simul. 39, 87–108. [125] B. M. Miller and W. J. Runggaldier (1997) Kalman filtering for linear systems with coefficients driven by a hidden Markov jump process. Systems Control Lett. 31, 93–102. [126] P. L. Morien (1996). Propagation of chaos and fluctuations for a system of weakly interacting white noise driven parabolic SPDE’s. Stochastics Stochastics Reps. 58, 1–43. [127] R. Mortensen (1966). Doctoral Dissertation, Department of Electrical Engineering, University of California, Berkeley. [128] D. L. Ocone (1999). Entropy inequalities and entropy dynamics in nonlinear filtering of diffusion processes. Stochastic analysis, control, optimization and Applications, W. McEneaney, G. Yin, and Q. Zhang (ed.). [129] D. L. Ocone and E. Pardoux (1996). Asymptotic stability of the optimal filter with respect to its initial condition. SIAM J. Control Optim. 34, no. 1, 226–243.
[130] N. Oudjane and S. Rubenthaler (2005). Stability and uniform particle approximation of nonlinear filters in case of non-ergodic signals. Stoch. Anal. Appl. 23, no. 3, 421–448. [131] A. Papavasiliou (2006). Parameter estimation and asymptotic stability in stochastic filtering. Stochastic Process. Appl. 116, no. 7, 1048–1065. [132] E. Pardoux (1979). Stochastic partial differential equations and filtering of diffusion processes. Stochastics 3, 127–167. [133] J. Picard (1984). Approximation of nonlinear filtering problems and order of convergence. Filtering and Control of Random Processes. Lecture Notes Control Inform. Sci. 61, Springer, New York. [134] P. Protter (1990). Stochastic integration and differential equations: a new approach. Springer, New York. [135] D. Revuz and M. Yor (1999). Continuous martingales and Brownian motion. Springer, New York. [136] B. L. Rozovskii (1972). Stochastic partial differential equations arising in nonlinear filtering problems. Usp. Mat. Nauk. SSSR, 27, 3, 213–214. [137] B. L. Rozovskii (1990). Stochastic evolution systems. Linear theory and applications to nonlinear filtering. Kluwer, Dordrecht. [138] A. N. Shiryaev (1966). On stochastic equations in the theory of conditional Markov processes. Teor. Veroyatn. Primen., 11, no. 1, 200–206. [139] A. N. Shiryaev (1966). Stochastic equations of nonlinear filtering of jump Markov processes. Probl. Peredachi. Inf., 2, no. 3, 3–22. [140] L. Stettner (1989). On invariant measures of filtering processes. Stochastic Differential Systems, Proc. 4th Bad Honnef Conf., K. Helmes, N. Christopeit, and M. Kohlmann (ed.), Lecture Notes in Control and Inform. Sci., 279–292. [141] L. Stettner (1991). Invariant measures of the pair: state, approximating filtering process. Colloq. Math. 62, no. 2, 347–351. [142] R. L. Stratonovich (1960). Conditional Markov processes. Teor. Veroyatn. Primen., 5, no. 2, 172–195. [143] R. L. Stratonovich (1966). Conditional Markov processes and their applications to optimal control theory. Izd. MGU, Moscow. [144] C. Striebel (1968). Partial differential equations for the conditional distribution of a Markov process given noisy observations. J. Math. Anal. Appl., 11, 151–159. [145] V. B. Tadi´c and A. Doucet (2005). Exponential forgetting and geometric ergodicity for optimal filtering in general state-space models. Stochastic Process. Appl. 115, no. 8, 1408–1436.
[146] H. V. Weizsächer (1983). Exchanging the order of taking suprema and countable intersections of σ -algebras. Ann. Inst. H. Poincaré Probab. Statist. 19, no. 1, 91–100. [147] A. D. Wentzell (1965). On equations of the conditional Markov processes. Teor. Veroyatn. Primen., 10, no. 2, 390–393. [148] D. Williams (1991). Probability with martingales. Cambridge University Press, Cambridge, UK. [149] W. M. Wonham (1965). Some applications of stochastic differential equations to optimal nonlinear filtering. SIAM J. Control Optim., 2, 347–369. [150] J. Xiong and X.Y. Zhou (2006). Mean-variance portfolio selection under partial information. SIAM J. Control Optim. 46, no. 1, 156– 175. [151] M. P. Yershov (1969). Nonlinear filtering of Markov processes. Teor. Veroyatn. Primen., 14, no. 4, 757–758. [152] M. P. Yershov (1970). Sequential estimation of diffusion processes. Teor. Veroyatn. Primen., 15, no. 4, 705–717. [153] J. Yong and X.Y. Zhou (1999). Stochastic control: Hamiltonian systems and HJB equations. Springer, New York. [154] M. Zakai (1969). On the optimal filtering of diffusion processes. Z. Wahr. Verw. Gebiete 11, 230–243. [155] M. Zakai and J. Ziv (1972). Lower and upper bounds on the optimal filtering error of certain diffusion processes. IEEE Trans. Inform. Theory IT-18, no. 3, 325–331.
List of Notations
• $A = B$ mod $P$: $A$ and $B$ induce the same sets of $P$-equivalence sets
• $A_b$: the set of all bounded $A$-measurable random variables
• $A \triangle B$: the symmetric difference between two sets $A$ and $B$
• $\mathcal B(S)$: Borel σ-field of a metric space $S$
• $C_d = C(\mathbb{R}_+, \mathbb{R}^d)$: the collection of the continuous maps from $\mathbb{R}_+$ to $\mathbb{R}^d$
• $C_b^2(\mathbb{R}^d)$: the collection of all bounded differentiable functions with bounded derivatives up to order 2
• $C_0(\mathbb{R})$: the collection of all continuous functions with compact support
• $\delta_\theta$: the Dirac measure at $\theta$
• $D(L)$: the domain of the operator $L$
• $E_Q$: the expectation with respect to the probability measure $Q$
• $\hat E(Y|\mathcal G)$: the conditional expectation, given $\mathcal G$, under the probability measure $\hat P$
• $\mathcal F_t$: the σ-field generated by the observation up to time $t$
• $\mathcal G_1 \vee \mathcal G_2$: the σ-field generated by $\mathcal G_1 \cup \mathcal G_2$
• $H_0 = L^2(\mathbb{R}^d)$: the Hilbert space consisting of square-integrable functions on $\mathbb{R}^d$
  ◦ $\|\phi\|_0^2 = \int_{\mathbb{R}^d} |\phi(x)|^2\,dx$
  ◦ $\langle \phi, \psi\rangle_0 = \int_{\mathbb{R}^d} \phi(x)\psi(x)\,dx$
• $H_1 \otimes H_2$: the tensor product of the Hilbert spaces $H_1$ and $H_2$
• $L(X)$ or $P \circ X^{-1}$: the measure induced by $X$
• $L^2_{loc}(M)$: the collection of all locally square-integrable predictable processes
• $L^2_{\mathcal G}(0,T;\mathbb{R}^d)$: the collection of square-integrable processes that are predictable with respect to the σ-fields $\mathcal G_t$
• $M^2$: the collection of all square-integrable martingales
• $M^{2,c}$: the collection of all continuous square-integrable martingales
• $M^c_{loc}$: the collection of all continuous local martingales
• $M^{2,c}_{loc}$: all continuous locally square-integrable martingales
• $M_G(\mathbb{R}^d)$: the space of finite signed measures on $\mathbb{R}^d$
• $\langle M\rangle_t$: quadratic variation process of $M_t$
• $\langle M, N\rangle_t$: quadratic covariation process of $M_t$ and $N_t$
• $M_F(\mathbb{R}^d)$: collection of all finite Borel measures on $\mathbb{R}^d$
• $\mu \ll \nu$: the measure $\mu$ is absolutely continuous with respect to $\nu$
• $|\nu|$: the total variation measure of $\nu$
• $\langle \nu, f\rangle$: the integral of the function $f$ with respect to the measure $\nu$
• $N_x M_z$: the normal space of the manifold $M_z$ at the point $x \in M_z$
• $P_\omega^{A}$: the regular conditional probability measure on $(\Omega, \mathcal F, P)$ given the sub-σ-field $A$
• $P(\mathbb{R}^d)$: the collection of all Borel probability measures on $\mathbb{R}^d$
• $\partial_i F$: the partial derivative of a function $F$ with respect to its $i$th variable
• $\partial^2_{ij} F$: $\partial_i \partial_j F$
• $S_T$: the collection of all stopping times bounded by $T$
• $\|\cdot\|_{0,\infty}$: supremum norm of a function
• $S_m^+$: the set of symmetric positive-definite $m\times m$ matrices
• $T_x M_z$: the tangent space of the manifold $M_z$ at the point $x \in M_z$
• $\xi^*$: the transpose of a vector or a matrix $\xi$
Index PA -separable, 228 ρh -diameter, 212 s-dimensional Hausdorff measure, 242 accumulated observation process, 2 algebraic Riccati equation, 176 Area formula, 242 Assumption (A), 181 Assumption (BC), 83 Assumption (BD), 113 Assumption (E1), 198 Assumption (E2), 198 Assumption (E3), 206 Assumption (S1), 216 Assumption (S2), 217 asymptotically stable matrix, 169 backward Itô integral, 112 backward martingale, 22 backward martingale convergence theorem, 22 backward SPDE, 111 barycenter, 202 Bayes’ formula, 55 Birkhoff’s contraction coefficient, 213 Brownian motion, 34 Burkholder–Davis–Gundy inequality, 45 cádlág, 24 cádlág modification, 24 Cauchy sequence, 96 Cayley–Hamilton theorem, 172 characteristic polynomial, 172 class (DL), 25 comparable measures, 211 complete orthonormal basis, 97 completely controllable matrix, 170 completely reconstructible system, 170 Condition (F), 187 Condition (I), 138 Condition (IN), 245 Condition (ND), 237 Condition (R), 240
Condition (S), 240 CONS, 97 controllable subspace of a system, 175 covariance matrix, 159 Cox–Ingersoll–Ross model, 232 detectable pair, 173 discrete-time Gronwall’s inequality, 75 Doob’s decomposition, 25 Doob’s inequality, 17, 18, 24 Doob–Meyer decomposition, 27 Duncan–Mortensen–Zakai equation, 95 Euler approximation, 73 Euler scheme, 134 extension, 48 Feller–Markov process, 187 filter is asymptotically stable, 187 finite memory property, 208 FKK equation, 8 Gaussian process, 158 generator of the signal, 89 Girsanov’s transformation, 56 Gronwall’s inequality, 69 Hilbert metric, 211 Hilbert space, 96 hybrid filter, 140 inner product space, 96 innovation process, 90 integrable increasing process, 26 invariant measure of filter, 198 Itô stochastic integral, 38 Itô’s formula, 42 Jordan form, 168 Jordan normal form, 168
Kallianpur–Striebel formula, 7 Kallianpur–Striebel, 85 Kalman–Bucy filter, 164 Kazamaki’s theorem, 53 Kushner–FKK equation, 91 Kushner–Stratonovich equation, 8, 95 local martingale, 33 Markov process, 79 martingale, 15 martingale convergence theorem, 20 martingale problem, 71 mean vector, 159 Meyer’s process, 33 natural increasing process, 26 normal space, 245 Novikov condition, 54 observation process, 2 optimal control problem, 166 optimal filter, 85 optional sampling theorem, 16, 25 Ornstein–Uhlenbeck process, 6 particle filter, 140 pathwise uniqueness, 62 PDE, 111 portfolio, 3 predictable process, 15, 37 regular conditional probability distribution, 84 regular point for mapping, 240 regular submartingale, 31 Riccati equation, 163, 167
SDE, 2 semimartingale, 41 separable space, 96 separation principle, 93 SPDE, 9 square-integrable martingale, 32 stabilizable pair, 175 stable subspace of matrix A, 169 standard extension, 48 stochastic basis, 15 stochastic flow, 73 stochastic integral, 41 stopping time, 16 Stratonovich integral, 58 strong solution, 62 submartingale, 15 submartingale convergence theorem, 20 supermartingale, 15
tangent space, 245 tensor product of Hilbert spaces, 98 total variation metric on P(S), 214 uniqueness of strong solution, 62 uniqueness of weak solution, 62 unnormalized filter, 86 unreconstructible subspace of a system, 172 unstable subspace of matrix A, 169 upcrossing number, 19
Wasserstein metric, 122 weak solution, 62 weighted unnormalized filter, 133 well-posedness, 71
Zakai’s equation, 8, 89