Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance
Stochastic Modelling and Applied Probability (Formerly: Applications of Mathematics)
Stochastic Optimization Stochastic Control Stochastic Models in Life Sciences
60
Edited by B. Rozovskiĭ, G. Grimmett
Advisory Board: D. Dawson, D. Geman, I. Karatzas, F. Kelly, Y. Le Jan, B. Øksendal, G. Papanicolaou, E. Pardoux
For other titles published in this series, go to www.springer.com/series/602
Alan Bain · Dan Crisan
Fundamentals of Stochastic Filtering
Alan Bain, BNP Paribas, 10 Harewood Av, London NW1 6AA, United Kingdom, alan.bain@bnpparibas.com
Dan Crisan, Department of Mathematics, Imperial College London, 180 Queen’s Gate, London SW7 2AZ, United Kingdom, d.crisan@imperial.ac.uk

Managing Editors
B. Rozovskiĭ, Division of Applied Mathematics, 182 George St., Providence, RI 02912, USA, rozovski@dam.brown.edu
G. Grimmett, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WB, UK, G.R.Grimmett@statslab.cam.ac.uk
ISSN: 0172-4568 Stochastic Modelling and Applied Probability
ISBN: 978-0-387-76895-3
e-ISBN: 978-0-387-76896-0
DOI 10.1007/978-0-387-76896-0
Library of Congress Control Number: 2008938477
Mathematics Subject Classification (2000): 93E10, 93E11, 60G35, 62M20, 60H15

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
springer.com
Preface
Many aspects of phenomena critical to our lives cannot be measured directly. Fortunately, models of these phenomena, together with more limited observations, frequently allow us to make reasonable inferences about the state of the systems that affect us. The process of using partial observations and a stochastic model to make inferences about an evolving system is known as stochastic filtering.

The objective of this text is to assist anyone who would like to become familiar with the theory of stochastic filtering, whether graduate student or more experienced scientist. The majority of the fundamental results of the subject are presented using modern methods, making them readily available for reference. The book may also be of interest to practitioners of stochastic filtering who wish to gain a better understanding of the underlying theory.

Stochastic filtering in continuous time relies heavily on measure theory, stochastic processes and stochastic calculus. While knowledge of basic measure theory and probability is assumed, the text is largely self-contained in that the majority of the results needed are stated in two appendices. This should make it easy for the book to be used as a graduate teaching text. With this in mind, each chapter contains a number of exercises, with solutions detailed at the end of the chapter.

The book is divided into two parts. The first covers four basic topics within the theory of filtering: the filtering equations (Chapters 3 and 4), Clark’s representation formula (Chapter 5), finite-dimensional filters, in particular, the Beneš and the Kalman–Bucy filter (Chapter 6) and the smoothness of the solution of the filtering equations (Chapter 7). These chapters could be used as the basis of a one- or two-term graduate lecture course.

The second part of the book is dedicated to numerical schemes for the approximation of the solution of the filtering problem. After a short survey of the existing numerical schemes (Chapter 8), the bulk of the material is dedicated to particle approximations. Chapters 9 and 10 describe various particle filtering methods in continuous and discrete time and prove associated convergence results. The material in Chapter 10 does not require knowledge of stochastic integration and could form the basis of a short introductory course.

We should like to thank the publishers, in particular the senior editor, Achi Dosanjh, for her understanding and patience. Thanks are also due to various people who offered their support and advice during the project, in particular Martin Clark, Mark Davis and Boris Rozovsky. One of the authors (D.C.) would like to thank Robert Piché for the invitation to give a series of lectures on the subject in August 2006.

Part of the book grew out of notes on lectures given at Imperial College London, University of Cambridge and Tampere University of Technology. Special thanks are due to Kari Heine from Tampere University of Technology and Olasunkanmi Obanubi from Imperial College London, who read large portions of the first draft and suggested many corrections and improvements.

Finally we would like to thank our families for their support, without which this project would have never happened.
London December 2007
Alan Bain Dan Crisan
Contents

Preface
Notation

1 Introduction
  1.1 Foreword
  1.2 The Contents of the Book
  1.3 Historical Account

Part I Filtering Theory

2 The Stochastic Process π
  2.1 The Observation σ-algebra $\mathcal{Y}_t$
  2.2 The Optional Projection of a Measurable Process
  2.3 Probability Measures on Metric Spaces
    2.3.1 The Weak Topology on $\mathcal{P}(S)$
  2.4 The Stochastic Process π
    2.4.1 Regular Conditional Probabilities
  2.5 Right Continuity of Observation Filtration
  2.6 Solutions to Exercises
  2.7 Bibliographical Notes

3 The Filtering Equations
  3.1 The Filtering Framework
  3.2 Two Particular Cases
    3.2.1 X a Diffusion Process
    3.2.2 X a Markov Process with a Finite Number of States
  3.3 The Change of Probability Measure Method
  3.4 Unnormalised Conditional Distribution
  3.5 The Zakai Equation
  3.6 The Kushner–Stratonovich Equation
  3.7 The Innovation Process Approach
  3.8 The Correlated Noise Framework
  3.9 Solutions to Exercises
  3.10 Bibliographical Notes

4 Uniqueness of the Solution to the Zakai and the Kushner–Stratonovich Equations
  4.1 The PDE Approach to Uniqueness
  4.2 The Functional Analytic Approach
  4.3 Solutions to Exercises
  4.4 Bibliographical Notes

5 The Robust Representation Formula
  5.1 The Framework
  5.2 The Importance of a Robust Representation
  5.3 Preliminary Bounds
  5.4 Clark’s Robustness Result
  5.5 Solutions to Exercises
  5.6 Bibliographic Note

6 Finite-Dimensional Filters
  6.1 The Beneš Filter
    6.1.1 Another Change of Probability Measure
    6.1.2 The Explicit Formula for the Beneš Filter
  6.2 The Kalman–Bucy Filter
    6.2.1 The First and Second Moments of the Conditional Distribution of the Signal
    6.2.2 The Explicit Formula for the Kalman–Bucy Filter
  6.3 Solutions to Exercises

7 The Density of the Conditional Distribution of the Signal
  7.1 An Embedding Theorem
  7.2 The Existence of the Density of $\rho_t$
  7.3 The Smoothness of the Density of $\rho_t$
  7.4 The Dual of $\rho_t$
  7.5 Solutions to Exercises

Part II Numerical Algorithms

8 Numerical Methods for Solving the Filtering Problem
  8.1 The Extended Kalman Filter
  8.2 Finite-Dimensional Non-linear Filters
  8.3 The Projection Filter and Moments Methods
  8.4 The Spectral Approach
  8.5 Partial Differential Equations Methods
  8.6 Particle Methods
  8.7 Solutions to Exercises

9 A Continuous Time Particle Filter
  9.1 Introduction
  9.2 The Approximating Particle System
    9.2.1 The Branching Algorithm
  9.3 Preliminary Results
  9.4 The Convergence Results
  9.5 Other Results
  9.6 The Implementation of the Particle Approximation for $\pi_t$
  9.7 Solutions to Exercises

10 Particle Filters in Discrete Time
  10.1 The Framework
  10.2 The Recurrence Formula for $\pi_t$
  10.3 Convergence of Approximations to $\pi_t$
    10.3.1 The Fixed Observation Case
    10.3.2 The Random Observation Case
  10.4 Particle Filters in Discrete Time
  10.5 Offspring Distributions
  10.6 Convergence of the Algorithm
  10.7 Final Discussion
  10.8 Solutions to Exercises

Part III Appendices

A Measure Theory
  A.1 Monotone Class Theorem
  A.2 Conditional Expectation
  A.3 Topological Results
  A.4 Tulcea’s Theorem
    A.4.1 The Daniell–Kolmogorov–Tulcea Theorem
  A.5 Càdlàg Paths
    A.5.1 Discontinuities of Càdlàg Paths
    A.5.2 Skorohod Topology
  A.6 Stopping Times
  A.7 The Optional Projection
    A.7.1 Path Regularity
  A.8 The Previsible Projection
  A.9 The Optional Projection Without the Usual Conditions
  A.10 Convergence of Measure-valued Random Variables
  A.11 Gronwall’s Lemma
  A.12 Explicit Construction of the Underlying Sample Space for the Stochastic Filtering Problem

B Stochastic Analysis
  B.1 Martingale Theory in Continuous Time
  B.2 Itô Integral
    B.2.1 Quadratic Variation
    B.2.2 Continuous Integrator
    B.2.3 Integration by Parts Formula
    B.2.4 Itô’s Formula
    B.2.5 Localization
  B.3 Stochastic Calculus
    B.3.1 Girsanov’s Theorem
    B.3.2 Martingale Representation Theorem
    B.3.3 Novikov’s Condition
    B.3.4 Stochastic Fubini Theorem
    B.3.5 Burkholder–Davis–Gundy Inequalities
  B.4 Stochastic Differential Equations
  B.5 Total Sets in $L^1$
  B.6 Limits of Stochastic Integrals
  B.7 An Exponential Functional of Brownian motion

References
Author Name Index
Subject Index
Notation

Spaces
• $\mathbb{R}^d$ – the d-dimensional Euclidean space.
• $\overline{\mathbb{R}^d}$ – the one-point compactification of $\mathbb{R}^d$ formed by adjoining a single point at infinity to $\mathbb{R}^d$.
• $\mathcal{B}(S)$ – the Borel σ-field on S. That is the σ-field generated by the open sets in S. If $S = \mathbb{R}^d$ for some d, then this σ-field is countably generated.
• $(S, \mathcal{S})$ – the state space for the signal. Unless otherwise stated, S is a complete separable metric space and $\mathcal{S}$ is the associated Borel σ-field $\mathcal{B}(S)$.
• $C(S)$ – the space of real-valued continuous functions defined on S.
• $M(S)$ – the space of $\mathcal{B}(S)$-measurable functions $S \to \mathbb{R}$.
• $B(S)$ – the space of bounded $\mathcal{B}(S)$-measurable functions $S \to \mathbb{R}$.
• $C_b(S)$ – the space of bounded continuous functions $S \to \mathbb{R}$.
• $C_k(S)$ – the space of compactly supported continuous functions $S \to \mathbb{R}$.
• $C_k^m(S)$ – the space of compactly supported continuous functions $S \to \mathbb{R}$ whose first m derivatives are continuous.
• $C_b^m(\mathbb{R}^d)$ – the space of all bounded, continuous functions with bounded partial derivatives up to order m. The norm $\|\cdot\|_{m,\infty}$ is frequently used with this space.
• $C_b^\infty(\mathbb{R}^d) = \bigcap_{m=0}^{\infty} C_b^m(\mathbb{R}^d)$.
• $D_S[0, \infty)$ – the space of càdlàg functions from $[0, \infty) \to S$.
• $C_b^{1,2}$ – the space of bounded continuous real-valued functions $u(t, x)$ with domain $[0, \infty) \times \mathbb{R}$, which are differentiable with respect to t and twice differentiable with respect to x. These derivatives are bounded and continuous with respect to $(t, x)$.
• $C^l(\mathbb{R}^d)$ – the subspace of $C(\mathbb{R}^d)$ containing functions $\varphi$ such that $\varphi/\psi \in C_b(\mathbb{R}^d)$, where $\psi(x) = 1 + \|x\|$.
• $W_p^m(\mathbb{R}^d)$ – the Sobolev space of all functions with generalized partial derivatives up to order m with both the function and all its partial derivatives being $L^p$-integrable. This space is usually endowed with the norm $\|\cdot\|_{m,p}$.
• $SL(\mathbb{R}^d) = \{\varphi \in C_b(\mathbb{R}^d) : \exists\, M \text{ such that } \varphi(x) \le M/(1 + \|x\|),\ \forall x \in \mathbb{R}^d\}$.
• $\mathcal{M}(S)$ – the space of finite measures over $(S, \mathcal{S})$.
• $\mathcal{P}(S)$ – the space of probability measures over $(S, \mathcal{S})$, i.e. the subspace of $\mathcal{M}(S)$ such that $\mu \in \mathcal{P}(S)$ satisfies $\mu(S) = 1$.
• $D_{\mathcal{M}_F(\mathbb{R}^d)}[0, \infty)$ – the space of right continuous functions with left limits $a : [0, \infty) \to \mathcal{M}_F(\mathbb{R}^d)$ endowed with the Skorohod topology.
• $I$ – an arbitrary finite set $\{a_1, a_2, \ldots\}$.
• $P(I)$ – the power set of I, i.e. the set of all subsets of I.
• $\mathcal{M}(I)$ – the space of finite positive measures over $(I, P(I))$.
• $\mathcal{P}(I)$ – the space of probability measures over $(I, P(I))$, i.e. the subspace of $\mathcal{M}(I)$ such that $\mu \in \mathcal{P}(I)$ satisfies $\mu(I) = 1$.
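Other notations
• $\|\cdot\|$ – the Euclidean norm, for $x = (x_i)_{i=1}^m \in \mathbb{R}^m$, $\|x\| = \sqrt{x_1^2 + \cdots + x_m^2}$. It is also applied to d × p-matrices by considering them as d × p vectors, viz
  $$\|a\| = \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{p} a_{ij}^2}.$$
• $\|\cdot\|_\infty$ – the supremum norm; for $\varphi : \mathbb{R}^d \to \mathbb{R}$, $\|\varphi\|_\infty = \sup_{x \in \mathbb{R}^d} |\varphi(x)|$. In general if $\varphi : \mathbb{R}^d \to \mathbb{R}^m$ then
  $$\|\varphi\|_\infty = \max_{i=1,\ldots,m}\ \sup_{x \in \mathbb{R}^d} |\varphi_i(x)|.$$
  The notation $\|\cdot\|_\infty$ is equivalent to $\|\cdot\|_{0,\infty}$. This norm is especially useful on spaces such as $C_b(\mathbb{R}^d)$, or $C_k(\mathbb{R}^d)$, which only contain functions of bounded supremum norm; in other words, $\|\varphi\|_\infty < \infty$.
• $\|\cdot\|_{m,p}$ – the norm used on the space $W_p^m$ defined by
  $$\|\varphi\|_{m,p} = \Bigg( \sum_{|\alpha| \le m} \|D^\alpha \varphi(x)\|_p^p \Bigg)^{1/p},$$
  where $\alpha = (\alpha^1, \ldots, \alpha^d)$ is a multi-index and $D^\alpha \varphi = (\partial_1)^{\alpha^1} \cdots (\partial_d)^{\alpha^d} \varphi$.
• $\|\cdot\|_{m,\infty}$ is the special case of the above norm when $p = \infty$, defined by
  $$\|\varphi\|_{m,\infty} = \sum_{|\alpha| \le m}\ \sup_{x \in \mathbb{R}^d} |D^\alpha \varphi(x)|.$$
• $\delta_a$ – the Dirac measure concentrated at $a \in S$, $\delta_x(A) \equiv 1_A(x)$.
• $\mathbf{1}$ – the constant function 1.
• $\Rightarrow$ – used to denote weak convergence of probability measures in $\mathcal{P}(S)$; see Definition 2.14.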
• $\mu f$, $\mu(f)$ – the integral of $f \in B(S)$ with respect to $\mu \in \mathcal{M}(S)$, i.e. $\mu f \triangleq \int_S f(x)\,\mu(dx)$.
• $a^\top$ is the transpose of the matrix a.
• $I_d$ – the d × d identity matrix.
• $O_{d,m}$ – the d × m zero matrix.
• $\operatorname{tr}(A)$ – the trace of the matrix A, i.e. if $A = (a_{ij})$, then $\operatorname{tr}(A) = \sum_i a_{ii}$.
• $[x]$ – the integer part of $x \in \mathbb{R}$.
• $\{x\}$ – the fractional part of $x \in \mathbb{R}$, i.e. $x - [x]$.
• $\langle M \rangle_t$ – the quadratic variation of the semimartingale M.
• $s \wedge t$ – for $s, t \in \mathbb{R}$, $s \wedge t = \min(s, t)$.
• $s \vee t$ – for $s, t \in \mathbb{R}$, $s \vee t = \max(s, t)$.
• $\mathcal{A} \vee \mathcal{B}$ – the σ-algebra generated by the union $\mathcal{A} \cup \mathcal{B}$.
• $A \triangle B$ – the symmetric difference of sets A and B, i.e. all elements that are in one of A or B but not both, formally $A \triangle B = (A \setminus B) \cup (B \setminus A)$.
• $\mathcal{N}$ – the collection of null sets in the probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
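A few of the conventions above can be made concrete with a small sketch. The helper names below (`euclid_norm`, `integer_part`, `fractional_part`, `integrate`) are hypothetical and not part of the text; the integer part is taken here to be the floor, which is an assumption for negative arguments, and a finite measure is represented as a dictionary of point masses.

```python
import math

def euclid_norm(a):
    """Euclidean norm of a vector, or of a matrix read as one long vector."""
    rows = a if isinstance(a[0], (list, tuple)) else [a]
    return math.sqrt(sum(x * x for row in rows for x in row))

def integer_part(x):
    """[x]: the integer part of x, assumed here to be the floor of x."""
    return math.floor(x)

def fractional_part(x):
    """{x} = x - [x]."""
    return x - integer_part(x)

def integrate(mu, f):
    """mu f for a finite measure mu given as {point: mass}."""
    return sum(mass * f(point) for point, mass in mu.items())

a = [[1.0, 2.0], [2.0, 4.0]]
print(euclid_norm(a))                    # norm of a 2 x 2 matrix
print(min(2, 3), max(2, 3))              # s ∧ t and s ∨ t
print({1, 2, 3} ^ {3, 4})                # symmetric difference A △ B
print(integrate({0: 0.25, 1: 0.75}, lambda x: x))
```

The matrix norm computed this way is the one described above, with the d × p entries treated as a single vector.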
Part I
Filtering Theory
2 The Stochastic Process π
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, DOI 10.1007/978-0-387-76896-0_2, © Springer Science+Business Media, LLC 2009

The principal aim of this chapter is to familiarize the reader with the fact that the conditional distribution of the signal can be viewed as a stochastic process with values in the space of probability measures. While it is true that this chapter sets the scene for the subsequent chapters, it can be skipped by those readers whose interests are biased towards the applied aspects of the subject. The gist of the chapter can be summarized by the following.

The principal aim of solving a filtering problem is to determine the conditional distribution of the signal X given the observation σ-algebra $\mathcal{Y}_t$, where

$$\mathcal{Y}_t \triangleq \sigma(Y_s,\ 0 \le s \le t) \vee \mathcal{N},$$

and $\mathcal{N}$ is the collection of all null sets of the complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ (see Remark 2.3 for comments on what is possible without the addition of these null sets to $\mathcal{Y}_t$). We wish to formalise this by defining a stochastic process describing this conditional distribution. Let the signal process X take values in a measurable space $(S, \mathcal{S})$. Suppose we naïvely define a stochastic process $(\omega, t) \to \pi_t^\omega$ taking values in the space of functions from $\mathcal{S}$ into [0, 1] by

$$\pi_t^\omega(A) = \mathbb{P}[X_t \in A \mid \mathcal{Y}_t](\omega), \tag{2.1}$$

where A is an arbitrary set in the σ-algebra $\mathcal{S}$. Recalling Kolmogorov’s definition of conditional expectation†, $\pi_t^\omega(A)$ is not uniquely defined for all $\omega \in \Omega$, but only for ω outside a $\mathbb{P}$-null set, which may depend upon the set A.

† See Section A.2 in the appendix for a brief review of the properties of conditional expectation and conditional probability.

It would be natural to think of this $\pi_t$ as a probability measure on $(S, \mathcal{S})$. However, this is not straightforward. For example, consider the countable additivity property which any measure must satisfy. Let $A_1, A_2, \ldots \in \mathcal{S}$ be a sequence of pairwise disjoint sets; then by properties a. and c. of conditional expectation (see Section A.2), $\pi_t(\cdot)(\omega)$ satisfies the expected σ-additivity condition

$$\pi_t^\omega\Big(\bigcup_n A_n\Big) = \sum_n \pi_t^\omega(A_n)$$

for every $\omega \in \Omega \setminus N(A_n, n \ge 1)$, where $N(A_n, n \ge 1)$ is a $\mathbb{P}$-null set which depends on the choice of the disjoint sets $A_n$, $n \ge 1$. Then we define

$$\bar{N} = \bigcup N(A_n, n \ge 1),$$

where the union is taken over all sequences of disjoint sets $(A_n)_{n \ge 1}$, such that for all n > 0, $A_n \in \mathcal{S}$. Then $\pi_t^\omega$ satisfies the σ-additivity property for arbitrary sets $\{A_n, n \ge 1\}$ only if $\omega \notin \bar{N}$. Although the $\mathbb{P}$-measure of $N(A_n, n \ge 1)$ is zero, the set $\bar{N}$ need not even be measurable because it is defined in terms of an uncountable union, and furthermore, $\bar{N}$ need not be contained in a $\mathbb{P}$-null set. This would imply that $\pi_t$ cannot be a probability measure.

To solve this difficulty we require that the state space of the signal S be a complete separable metric space and $\mathcal{S}$ be the Borel σ-algebra $\mathcal{B}(S)$. This enables us to define $\pi_t$ as the regular conditional distribution (in the sense of Definition A.2) of $X_t$ given $\mathcal{Y}_t$. Defined in this manner, the process $\pi = \{\pi_t, t \ge 0\}$ will be a $\mathcal{P}(S)$-valued $\mathcal{Y}_t$-adapted process which satisfies (2.1) for any $t \ge 0$.

Unfortunately this is not enough. A second requirement must be satisfied by the process π. One of the results established in Chapter 3 is an evolution equation (1.4) for π, which is called the filtering equation. This evolution equation involves a stochastic integral with respect to the observation process Y whose integrand is described in terms of π. Since the integrator process Y is continuous, it follows from Theorem B.19 that the stochastic integral with respect to Y is defined if π is a progressively measurable process, that is, if the function

$$(t, \omega) \to \pi_t : ([0, T] \times \Omega,\ \mathcal{B}([0, T]) \otimes \mathcal{Y}_T) \to (\mathcal{P}(S), \mathcal{B}(\mathcal{P}(S)))$$

is measurable for any T > 0. It is necessary to show that π has a version which is progressively measurable. We construct such a version for a signal process X which has càdlàg paths. In general, such a version is no longer adapted with respect to $\mathcal{Y}_t$, but with respect to a right continuous enlargement of $\mathcal{Y}_t$.
In the case of the problems considered within this book $\mathcal{Y}_t$ itself is right continuous (see Section 2.5) so no enlargement is required.

Theorem 2.1. Let S be a complete separable metric space and $\mathcal{S}$ be the associated Borel σ-algebra. Then there exists a $\mathcal{P}(S)$-valued $\mathcal{Y}_t$-adapted process $\pi = \{\pi_t, t \ge 0\}$ such that for any $f \in B(S)$

$$\pi_t f = \mathbb{E}[f(X_t) \mid \mathcal{Y}_t] \qquad \mathbb{P}\text{-a.s.}$$

In particular, identity (2.1) holds true for any $A \in \mathcal{B}(S)$. Moreover, if Y satisfies the evolution equation

$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + W_t, \qquad t \ge 0, \tag{2.2}$$

where $W = \{W_t, t \ge 0\}$ is a standard $\mathcal{F}_t$-adapted m-dimensional Brownian motion and $h = (h_i)_{i=1}^m : S \to \mathbb{R}^m$ is a measurable function such that

$$\mathbb{E}\left[\int_0^t \|h(X_s)\|\,ds\right] < \infty \tag{2.3}$$

and

$$\mathbb{P}\left[\int_0^t \|\pi_s(h)\|^2\,ds < \infty\right] = 1 \tag{2.4}$$

for all $t \ge 0$, then π has a $\mathcal{Y}_t$-adapted progressively measurable modification. Furthermore, if X is càdlàg then $\pi_t$ can be chosen to have càdlàg paths.

The conditions (2.3) and (2.4) are frequently difficult to check (particularly (2.4)). They are implied by the stronger, but simpler condition

$$\mathbb{E}\left[\int_0^t \|h(X_s)\|^2\,ds\right] < \infty. \tag{2.5}$$
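To make the observation model (2.2) more tangible, the following is a minimal simulation sketch rather than anything prescribed by the text. It assumes a scalar signal sampled on a uniform grid, takes $Y_0 = 0$, and uses a simple Euler discretisation; the function names `simulate_observation` and `sensor_energy`, the sine signal path and the identity sensor function are all hypothetical choices made for illustration.

```python
import math
import random

def simulate_observation(x_path, h, dt, rng):
    """Euler sketch of Y_t = Y_0 + int_0^t h(X_s) ds + W_t with Y_0 = 0.

    x_path holds the signal values X_{k dt}; h is the sensor function.
    """
    y = [0.0]
    for x in x_path[:-1]:
        dw = rng.gauss(0.0, math.sqrt(dt))     # Brownian increment over dt
        y.append(y[-1] + h(x) * dt + dw)
    return y

def sensor_energy(x_path, h, dt):
    """Riemann sum for int_0^t |h(X_s)|^2 ds along one path, cf. (2.5)."""
    return sum(h(x) ** 2 for x in x_path[:-1]) * dt

dt = 0.01
x_path = [math.sin(k * dt) for k in range(101)]   # hypothetical signal path
y = simulate_observation(x_path, lambda x: x, dt, random.Random(0))
print(len(y), y[-1], sensor_energy(x_path, lambda x: x, dt))
```

For a bounded sensor function such as the one used here, the path-wise energy is bounded on every finite interval, so condition (2.5) holds trivially; the sketch is only meant to show which quantity the condition controls.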
To prove Theorem 2.1 we first prove a more general result (Theorem 2.24) which justifies the existence of a version of π adapted with respect to a right continuous enlargement of the observation filtration $\mathcal{Y}_t$. This result is proved without imposing any additional constraints on the observation process Y. However, under the additional constraints (2.2)–(2.4), as a consequence of Theorem 2.35, the filtration $\mathcal{Y}_t$ is right continuous, so no enlargement is required. Theorem 2.1 then follows.

In order to prove Theorem 2.24, we must introduce the optional projection of a stochastic process with respect to a filtration which satisfies the usual conditions. The standard construction of the optional projection requires the filtration to be right continuous and a priori the filtration $\mathcal{Y}_t$ may not have this property. Therefore choose a right continuous enlargement of the filtration $\mathcal{Y}_t$ defined by $\{\mathcal{Y}_{t+}, t \ge 0\}$, where $\mathcal{Y}_{t+} = \bigcap_{s > t} \mathcal{Y}_s$. The existence of such an optional projection is established in Section 2.2.

Remark 2.2. The construction of the optional projection is valid without requiring that the filtration satisfy the usual conditions (see Section A.9). However such conditions are too weak for the proof of Theorem 2.24.

Remark 2.3. We always assume that the process π is this progressively measurable version and consequently $\{\mathcal{Y}_t, t \ge 0\}$ always denotes the augmented observation filtration. However, for any $t \ge 0$, the random probability measure $\pi_t$ has a $\sigma(Y_s, s \in [0, t])$-measurable version, which can be used whenever the progressive measurability property is not required (see Exercise 2.36). Such a version of $\pi_t$, being $\sigma(Y_s, s \in [0, t])$-adapted, is a function of the observation path and thus is completely determined by the observation data. It turns out that $\pi_t$ is a continuous function of the observation path. This is known as the path-robustness of filtering theory and it is discussed in Chapter 5.
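The statement that the conditional distribution is a probability measure determined by the observation can be illustrated in a finite setting, where none of the null-set difficulties discussed earlier arise. The sketch below is only a toy analogue: the joint law `joint`, the three-point state space `S` and the observation values 'a' and 'b' are all hypothetical, and the function `pi` plays the role of $A \mapsto \mathbb{P}[X \in A \mid Y = y]$.

```python
from fractions import Fraction as F

# Hypothetical joint law of (X, Y): joint[(x, y)] = P[X = x, Y = y].
joint = {
    (0, 'a'): F(1, 8), (0, 'b'): F(1, 8),
    (1, 'a'): F(1, 4), (1, 'b'): F(0),
    (2, 'a'): F(1, 8), (2, 'b'): F(3, 8),
}
S = {0, 1, 2}

def pi(A, y):
    """P[X in A | Y = y], defined whenever P[Y = y] > 0."""
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return sum(p for (x, yy), p in joint.items() if yy == y and x in A) / p_y

for y in ('a', 'b'):
    # For each observation value, A -> pi(A, y) is a probability measure on S.
    assert pi(S, y) == 1
    assert pi({0, 1}, y) == pi({0}, y) + pi({1}, y)
    print(y, [pi({x}, y) for x in sorted(S)])
```

Exact rational arithmetic is used so that additivity can be checked with equality rather than up to rounding error.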
2.1 The Observation σ-algebra $\mathcal{Y}_t$

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space together with a filtration $(\mathcal{F}_t)_{t \ge 0}$ which satisfies the usual conditions:

1. $\mathcal{F}$ is complete i.e. $A \subset B$, $B \in \mathcal{F}$ and $\mathbb{P}(B) = 0$ implies that $A \in \mathcal{F}$ and $\mathbb{P}(A) = 0$.
2. The filtration $\mathcal{F}_t$ is right continuous i.e. $\mathcal{F}_t = \mathcal{F}_{t+}$.
3. $\mathcal{F}_0$ (and consequently all $\mathcal{F}_t$ for $t \ge 0$) contains all the $\mathbb{P}$-null sets.

On $(\Omega, \mathcal{F}, \mathbb{P})$ we consider a stochastic process $X = \{X_t, t \ge 0\}$ which takes values in a complete separable metric space S (the state space). Let $\mathcal{S}$ be the associated Borel σ-algebra. We assume that X is measurable. That is, X has the property that the mapping

$$(t, \omega) \to X_t(\omega) : ([0, \infty) \times \Omega,\ \mathcal{B}([0, \infty)) \otimes \mathcal{F}) \to (S, \mathcal{S})$$

is measurable. Moreover we assume that X is $\mathcal{F}_t$-adapted. Also let $Y = \{Y_t, t \ge 0\}$ be another $\mathcal{F}_t$-adapted process. The σ-algebra $\mathcal{Y}_t$ has already been mentioned in the introductory chapter. We now make a formal definition

$$\mathcal{Y}_t \triangleq \sigma(Y_s,\ 0 \le s \le t) \vee \mathcal{N}, \tag{2.6}$$

where $\mathcal{N}$ is the set of $\mathbb{P}$-null sets in $\mathcal{F}$ and the notation $\mathcal{A} \vee \mathcal{B}$ is the standard notation for the σ-algebra generated by $\mathcal{A}$ and $\mathcal{B}$, i.e. $\sigma(\mathcal{A}, \mathcal{B})$.

The addition of the null sets $\mathcal{N}$ to the observation σ-algebra considerably increases the complexity of the proofs in the derivation of the filtering equations via the innovations approach in Chapter 3, so we should be clear why it is necessary. It is important that we can modify $\mathcal{Y}_t$-adapted processes. Suppose $N_t$ is such a process; then we need to be able to construct a process $\tilde{N}_t$ so that for $\omega \in G$ we change the values of the process, and for all $\omega \notin G$, $\tilde{N}_t(\omega) = N_t(\omega)$, where G is a $\mathbb{P}$-null set. In order that $\tilde{N}_t$ be $\mathcal{Y}_t$-adapted, the set G must be in $\mathcal{Y}_t$, which is assured by the augmentation of $\mathcal{Y}_t$ with the $\mathbb{P}$-null sets $\mathcal{N}$.

The following exercise gives a straightforward characterization of the σ-algebra $\mathcal{Y}_t$ and the relation between the expectation conditional upon the augmented filtration $\mathcal{Y}_t$ and that conditional upon the unaugmented filtration $\mathcal{Y}_t^o$.

Exercise 2.4. Let $\mathcal{Y}_t^o = \sigma(Y_s,\ 0 \le s \le t)$.

i. Prove that

$$\mathcal{Y}_t = \{F \subset \Omega : F = (G \setminus N_1) \cup N_2,\ G \in \mathcal{Y}_t^o,\ N_1, N_2 \in \mathcal{N}\}. \tag{2.7}$$

ii. Deduce from part (i) that if ξ is $\mathcal{Y}_t$-measurable, then there exists a $\mathcal{Y}_t^o$-measurable random variable η, such that ξ = η $\mathbb{P}$-almost surely. In particular, for any integrable random variable ξ, the identity

$$\mathbb{E}[\xi \mid \mathcal{Y}_t] = \mathbb{E}[\xi \mid \mathcal{Y}_t^o]$$

holds $\mathbb{P}$-almost surely.

As already stated, we consider a right continuous enlargement of the filtration $\mathcal{Y}_t$ defined by $\{\mathcal{Y}_{t+}, t \ge 0\}$, where $\mathcal{Y}_{t+} = \bigcap_{s > t} \mathcal{Y}_s$. We do not wish a priori to impose the requirement that this observation σ-algebra be right continuous and satisfy $\mathcal{Y}_{t+} = \mathcal{Y}_t$, because verifying the right continuity of a σ-algebra which depends upon observations might not be possible before the observations have been made! We note, however, that the σ-algebra $\mathcal{Y}_{t+}$ satisfies the usual conditions; it is right continuous and complete.

Finally we note that no path regularity is assumed on either X or Y. Also no explicit connection exists between the processes X and Y.
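The characterization (2.7) can be checked by brute force on a small finite sample space. The sketch below is an illustration under invented assumptions, not a proof: the four-point space `OMEGA`, the weights `PROB` (with two points of probability zero) and the two-block partition generating the unaugmented σ-algebra are all hypothetical, and every subset is taken to be an event.

```python
from itertools import combinations

OMEGA = frozenset(range(4))
PROB = {0: 0.5, 1: 0.5, 2: 0.0, 3: 0.0}   # hypothetical P on the power set

def powerset(s):
    s = list(s)
    return {frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)}

def generate_sigma_algebra(sets):
    """Close a family of subsets of OMEGA under complement and pairwise union."""
    family = set(sets) | {frozenset(), OMEGA}
    while True:
        new = {OMEGA - a for a in family} | {a | b for a in family for b in family}
        if new <= family:
            return family
        family |= new

# The P-null sets, the unaugmented sigma-algebra and its augmentation.
null_sets = {a for a in powerset(OMEGA) if sum(PROB[w] for w in a) == 0.0}
y_raw = generate_sigma_algebra({frozenset({0, 2}), frozenset({1, 3})})
y_aug = generate_sigma_algebra(y_raw | null_sets)

# Right-hand side of (2.7): all sets of the form (G \ N1) ∪ N2.
y_formula = {(g - n1) | n2 for g in y_raw for n1 in null_sets for n2 in null_sets}
print(len(y_raw), len(y_aug), y_aug == y_formula)
```

On a finite space closure under pairwise unions and complements is enough to generate a σ-algebra, which is why the simple fixed-point loop suffices here.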
2.2 The Optional Projection of a Measurable Process

From the perspective of measure theory, the filtering problem is associated with the construction of the optional projection of a process. The results in this section are standard in the theory of continuous time stochastic processes, but since they are often not mentioned in elementary treatments we consider the results which we require in detail.

Definition 2.5. The optional σ-algebra $\mathcal{O}$ is defined as the σ-algebra on $[0,\infty)\times\Omega$ generated by $\mathcal{F}_t$-adapted processes with càdlàg paths. A process is said to be optional if it is $\mathcal{O}$-measurable.

There is a well-known inclusion result: the set of previsible processes is contained in the set of optional processes, which is contained in the set of progressively measurable processes. We only require the second part of this inclusion; for a proof of the first part see Rogers and Williams [249].

Lemma 2.6. Every optional process is progressively measurable.

Proof. As the optional processes are generated by the adapted processes with càdlàg paths, it is sufficient to show that any such process $X$ is progressively measurable. For fixed $T > 0$, define an approximation process
\[ Y^{(n)}(s,\omega) \triangleq \sum_{k=0}^{2^n-1} X_{T2^{-n}(k+1)}(\omega)\, 1_{[Tk2^{-n},\,T(k+1)2^{-n})}(s) + X_T(\omega)\, 1_{[T,\infty)}(s). \]
It is immediate that $Y^{(n)}(s,\omega)$ restricted to $s \in [0,T]$ is $\mathcal{B}([0,T]) \otimes \mathcal{F}_T$-measurable and progressive. Since $X$ has right continuous paths, as does $Y^{(n)}$, it follows that $\lim_{n\to\infty} Y^{(n)}_t = \lim_{s\downarrow t} X_s = X_t$. Since the limit exists, $X = \liminf_{n\to\infty} Y^{(n)}$, and is therefore progressively measurable. □
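The dyadic approximation used in the proof of Lemma 2.6 can be tried out on a single deterministic path. The following is only a minimal sketch: the names `dyadic_approx` and `step_path`, the horizon $T = 1$ and the jump location 0.3 are illustrative assumptions and do not come from the text.

```python
import math

def dyadic_approx(path, T, n, s):
    """Y^(n)(s): value of `path` at the right endpoint of the dyadic cell containing s."""
    if s >= T:
        return path(T)
    k = math.floor(s * 2**n / T)        # s lies in [T k 2^-n, T (k+1) 2^-n)
    return path(T * (k + 1) / 2**n)

def step_path(s):
    """Hypothetical right-continuous path with a single jump at s = 0.3."""
    return 0.0 if s < 0.3 else 1.0

if __name__ == "__main__":
    for t in (0.1, 0.29, 0.3, 0.7):
        approximations = [dyadic_approx(step_path, 1.0, n, t) for n in (2, 5, 10, 20)]
        print(t, approximations, "target:", step_path(t))
```

Because the approximation samples the path at right endpoints, right continuity is exactly what makes the values settle on the path value as $n$ grows; just to the left of the jump a finer grid is needed before the approximation agrees with the path.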
The following theorem is only important in the case of a process $X$ which is not adapted to the filtration $\mathcal{F}_t$. It allows us to construct from $X$ an $\mathcal{F}_t$-adapted process. Unlike in the case of discrete time, we cannot simply use the process defined by the conditional expectation $\mathbb{E}[X_t \mid \mathcal{F}_t]$, since for each $t$ this is only defined up to a null set which depends upon $t$. The process would thus be unspecified on the uncountable union of these null sets over $t \in [0,\infty)$, which need not be null; the definition could therefore leave the process unspecified on a set of strictly positive measure, which is unacceptable.

Theorem 2.7. Let $X$ be a bounded measurable process; then there exists an optional process ${}^oX$, called the optional projection† of $X$, such that for every stopping time $T$
\[ {}^oX_T\, 1_{\{T<\infty\}} = \mathbb{E}\left[ X_T\, 1_{\{T<\infty\}} \mid \mathcal{F}_T \right]. \tag{2.8} \]
This process is unique up to indistinguishability, i.e. any two processes which satisfy these conditions are indistinguishable.

† In some older French literature relevant to the subject, this projection is called the projection bien-mesurable, although more recently projection optionnelle has superseded this.

As we have assumed that the filtration $\mathcal{F}_t$ satisfies the usual conditions, this result can be established using Doob's result on the regularization of the trajectories of martingales. The proof is given in Section A.7 of the Appendix.

Remark 2.8. A simple consequence of the uniqueness part of this result is the fact that if $X$ is itself optional then ${}^oX = X$.

The definition can be extended to unbounded non-negative measurable processes by applying Theorem 2.7 to $X \wedge n$ and taking the limit as $n \to \infty$.

While Theorem 2.7 establishes the existence of the optional projection process, it does not provide us with any information about the trajectories of this process; for example, if the process $X$ has continuous paths, does ${}^oX$ also have continuous paths? This turns out not to be true; see Remark 2.10. We must establish some kind of path regularity in order to apply many of the standard techniques of continuous time processes to the optional projection process. The following theorem establishes the regularity which we need. Its proof is fairly long and uses multiple applications of the optional section theorem; it is therefore not given here, but can be found in Section A.7.1 of the appendix.

Theorem 2.9. If $Y$ is a bounded càdlàg process then the optional projection ${}^oY$ is also càdlàg.

Since the optional projection is only unique up to indistinguishability, the theorem is in fact stating that ${}^oY_t$ is indistinguishable from a càdlàg process. As may be expected, this result depends upon $\mathcal{F}_t$ satisfying the usual conditions. The restriction to bounded processes in the statement of the theorem is not essential, but is natural since our definition of optional projection was for a bounded process. The theorem can be extended to a process $Y$ in the class D (i.e. the class of processes such that the set $\{Y_T : T \text{ is a stopping time and } \mathbb{P}(T < \infty) = 1\}$ is uniformly integrable). As a uniformly integrable martingale is of class D, it follows that the theorem applies to uniformly integrable martingales.

Remark 2.10. The optional projection of a bounded continuous process need not itself be continuous. As an example, consider the process whose value at any time $t$ is given by the same bounded random variable $A$; that is, $X_t(\omega) = A(\omega)$. The optional projection of such a process is the càdlàg modification of the martingale $\mathbb{E}[A \mid \mathcal{F}_t]$, and this modification need not be continuous.
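A toy instance of Remark 2.10 can be written down on a two-point sample space. This is a sketch under assumptions that are not in the text: a fair coin $A \in \{0,1\}$ whose outcome is revealed at time 1, so that $\mathcal{F}_t$ is trivial for $t < 1$ and equals the full σ-algebra for $t \ge 1$; the names `OMEGA`, `atom` and `optional_projection` are hypothetical.

```python
from fractions import Fraction

OMEGA = ("H", "T")                                # hypothetical two-point sample space
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
A = {"H": Fraction(1), "T": Fraction(0)}          # the random variable A

def atom(t, omega):
    """Atom of F_t containing omega: nothing is known before time 1."""
    return OMEGA if t < 1 else (omega,)

def optional_projection(t, omega):
    """E[A | F_t](omega), computed as the average of A over the atom of F_t."""
    cell = atom(t, omega)
    mass = sum(P[w] for w in cell)
    return sum(A[w] * P[w] for w in cell) / mass

if __name__ == "__main__":
    for t in (Fraction(1, 2), Fraction(99, 100), 1, 2):
        print(t, {w: optional_projection(t, w) for w in OMEGA})
```

The path of $X_t = A$ is constant in $t$, hence continuous, whereas the projected path sits at the prior mean before time 1 and jumps to the realised value of $A$ at time 1; it is right continuous with a left limit there.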
2.3 Probability Measures on Metric Spaces

This section presents some results on probability measures on metric spaces which are needed in order to construct the process $\pi$, and which are used throughout the book. The reader familiar with these topics can skip this section to proceed with the construction of $\pi$.

Let $\mathcal{P}(S)$ denote the space of probability measures on the space $S$, that is, the subspace of $\mu \in \mathcal{M}(S)$ such that $\mu(S) = 1$. Let $B(S)$ be the space of bounded $\mathcal{B}(S)$-measurable functions $S \to \mathbb{R}$. If $\nu \in \mathcal{P}(S)$ and $f \in B(S)$ we write
\[ \nu f = \int_S f(x)\,\nu(dx). \]
The following standard results about probability measures are necessary. For more details on these subjects, the reader should consult one of the many references, such as Billingsley [19] and Parthasarathy [239].

Theorem 2.11. Any probability measure $\mu$ on a metric space $S$ endowed with the associated Borel σ-algebra $\mathcal{B}(S)$ is regular. That is, if $A \in \mathcal{B}(S)$, given $\varepsilon > 0$ we can find an open set $G$ and a closed set $F$ such that $F \subseteq A \subseteq G$ and $\mu(G \setminus F) < \varepsilon$.

Proof. Let $d$ be the metric on $S$. If $A$ is closed then we can take $F = A$ and $G = \{x : d(x,A) < \delta\}$; as $\delta \downarrow 0$ the set $G$ decreases to $A$, so that $\mu(G \setminus F) < \varepsilon$ for $\delta$ sufficiently small. So if we let $\mathcal{H}$ be the class of sets $A$ with the property of regularity then all the closed sets are contained in $\mathcal{H}$. As the closed sets are the complements of the open sets they also generate the Borel σ-algebra. So if we show that $\mathcal{H}$ is a σ-algebra then we shall have established the result. As $\mathcal{H}$ is obviously closed under complementation we only need to prove that it is closed under the formation of countable unions. Let $A_n \in \mathcal{H}$, write $A = \bigcup_{n=1}^\infty A_n$, and let $F_n$ and $G_n$ be closed and open sets such that $F_n \subseteq A_n \subseteq G_n$. By the definition of $\mathcal{H}$ we can choose these sets such that $\mu(G_n \setminus F_n) < \varepsilon/2^{n+1}$. If we define $G = \bigcup_{n=1}^\infty G_n$ this is clearly an open set. Choose $n_0$ such that $\mu\bigl(\bigcup_{n=1}^{\infty} F_n \setminus \bigcup_{n=1}^{n_0} F_n\bigr) < \varepsilon/2$ and then define $F = \bigcup_{n=1}^{n_0} F_n$ which, by virtue of being a finite union, is a closed set. Thus $F \subseteq A \subseteq G$ and $\mu(G \setminus F) < \varepsilon$, establishing that $\mathcal{H}$ is closed under countable unions. Hence $\mathcal{H}$ contains the Borel sets. □

The main consequence for us of this theorem is that if two probability measures on $(S, \mathcal{B}(S))$ agree on the closed sets then they are equal.

Definition 2.12. A subset $A \subset B(S)$ is said to be separating if for $\nu, \mu \in \mathcal{P}(S)$, the condition $\nu f = \mu f$ for all $f \in A$ implies that $\mu = \nu$.

The following result determines a very important separating class which motivates the definition of weak convergence. It should be noted, however, that the conclusion of Theorem 2.13 also follows from the more general Portmanteau theorem (Theorem 2.17).

Theorem 2.13. Let $(S,d)$ be a metric space and $U^d(S)$ be the space of all continuous bounded functions $S \to \mathbb{R}$ which are uniformly continuous with respect to the metric $d$ on $S$. If $\mu, \nu$ are elements of $\mathcal{P}(S)$, and
\[ \int_S f(x)\,\mu(dx) = \int_S f(x)\,\nu(dx) \qquad \forall f \in U^d(S), \]
then this implies that $\mu = \nu$. That is, the space $U^d(S)$ is separating.

Proof. By Theorem 2.11 it is sufficient to show that $\nu$ and $\mu$ agree on closed subsets of $S$. Let $F$ be a closed set and define $F^\varepsilon = \{x : d(x,F) < \varepsilon\}$, which is clearly open, and $\bigcap_{n=1}^\infty F^{1/n} = F$. The sets $F$ and $(F^{1/n})^c$ are disjoint closed sets and $d(F, (F^{1/n})^c) \ge 1/n$. Define $f_n(x) = (1 - n\,d(x,F))^+$. It is clear that $f_n \in U^d(S)$ and $0 \le f_n \le 1$. For $x \in F$ it follows that $f_n(x) = 1$, and $f_n(x) = 0$ for $x \in (F^{1/n})^c$. Hence
\[ \mu(F) = \int_S 1_F(x)\,\mu(dx) \le \int_S f_n(x)\,\mu(dx) \]
and
\[ \nu(F^{1/n}) = \int_S 1_{F^{1/n}}(x)\,\nu(dx) \ge \int_S f_n(x)\,\nu(dx), \]
but by assumption, as $f_n \in U^d(S)$, the right-hand sides of these two inequalities are equal; therefore $\mu(F) \le \nu(F^{1/n})$ and letting $n$ tend to infinity we obtain $\mu(F) \le \nu(F)$. By symmetry we obtain the opposite inequality; hence $\mu(F) = \nu(F)$ for all closed sets $F$. □
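The functions $f_n(x) = (1 - n\,d(x,F))^+$ from the proof can be checked numerically. The sketch below assumes $S = \mathbb{R}$, the closed set $F = [0,1]$ and a four-atom probability measure `MU`; all of these, and the helper names, are hypothetical choices made only for illustration.

```python
from fractions import Fraction

A_END, B_END = Fraction(0), Fraction(1)           # F = [0, 1], a hypothetical closed set

def dist_to_F(x):
    """d(x, F) for the interval F = [A_END, B_END]."""
    return max(A_END - x, Fraction(0), x - B_END)

def f(n, x):
    """f_n(x) = (1 - n d(x, F))^+."""
    return max(1 - n * dist_to_F(x), Fraction(0))

# Hypothetical discrete probability measure, stored as point -> mass.
MU = {Fraction(1, 2): Fraction(1, 4), Fraction(1): Fraction(1, 4),
      Fraction(21, 20): Fraction(1, 4), Fraction(2): Fraction(1, 4)}

def integral(mu, n):
    """mu f_n."""
    return sum(mass * f(n, x) for x, mass in mu.items())

def mass_of_F(mu):
    """mu(F)."""
    return sum(mass for x, mass in mu.items() if dist_to_F(x) == 0)

def mass_of_open_nbhd(mu, n):
    """mu(F^{1/n}) with F^{1/n} = {x : d(x, F) < 1/n}."""
    return sum(mass for x, mass in mu.items() if dist_to_F(x) < Fraction(1, n))

if __name__ == "__main__":
    for n in (1, 2, 10, 20, 40):
        print(n, mass_of_F(MU), integral(MU, n), mass_of_open_nbhd(MU, n))
```

For every $n$ the three printed numbers satisfy $\mu(F) \le \mu f_n \le \mu(F^{1/n})$, which is the sandwich the proof relies on, and $\mu f_n$ decreases to $\mu(F)$ once $1/n$ is smaller than the distance from $F$ to the nearest atom outside it.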
2.3.1 The Weak Topology on $\mathcal{P}(S)$

Let us endow the space $\mathcal{P}(S)$ with the weak topology. Familiarity with basic results of general topology is assumed here, but some less elementary results which are required are proved in Appendix A.3.

Definition 2.14. A sequence of probability measures $\mu_n \in \mathcal{P}(S)$ converges weakly to $\mu \in \mathcal{P}(S)$ if and only if $\mu_n\varphi$ converges to $\mu\varphi$ as $n \to \infty$ for all $\varphi \in C_b(S)$. Weak convergence of $\mu_n$ to $\mu$ is denoted $\mu_n \Rightarrow \mu$.

No restriction is implied in this definition by the assertion that the limit $\mu$ is a probability measure: since $1 \in C_b(S)$ it follows that $\mu_n 1 = 1$ for all $n$; hence $\mu 1 = 1$. We now exhibit a topology which engenders this form of convergence. The reader with an interest in functional analysis should be aware that the concept of weak topology in the following definition is really the weak* topology on the dual of $C_b(S)$, which is the space $\mathcal{M}(S)$, but the terminology weak convergence for this concept has become standard within probability theory.

Recall that for a space $S$, if $\mathcal{T}_1$ and $\mathcal{T}_2$ are topologies (collections of subsets of $S$ which are closed under finite intersections and under arbitrary unions, and which contain $S$ and $\emptyset$), then we say that $\mathcal{T}_1$ is weaker (coarser) than $\mathcal{T}_2$ if $\mathcal{T}_1 \subset \mathcal{T}_2$, in which case $\mathcal{T}_2$ is said to be finer than $\mathcal{T}_1$.

Definition 2.15. The weak topology on the space $\mathcal{P}(S)$ is defined to be the weakest topology such that for all $f \in C_b(S)$, the function $\mu \mapsto \mu f$ is continuous.

A basis for the neighbourhoods of a measure $\mu$ is defined to be a collection of open sets which contain $\mu$, such that if $V$ is another open set containing $\mu$ then there exists an element of the basis which is a subset of $V$. It is clear that for $f \in C_b(S)$ the required continuity of the real-valued function on $\mathcal{P}(S)$ given by $\nu \mapsto \nu f$ implies that the set $\{\nu : |\nu f - \mu f| < \varepsilon\}$ is open and contains $\mu$.
As the axioms of a topology require closure under finite intersections, we can construct a neighbourhood basis from these sets by taking finite intersections of them; thus in the weak topology on $\mathcal{P}(S)$ a basis for the neighbourhoods of $\mu$ (a local basis) is provided by the sets of the form
\[ \{\nu \in \mathcal{P}(S) : |\mu f_i - \nu f_i| < \varepsilon,\ 1 \le i \le m\} \tag{2.9} \]
for $m \in \mathbb{N}$, $\varepsilon > 0$ and where $f_1, \ldots, f_m$ are elements of $C_b(S)$.

Theorem 2.16. A sequence of probability measures $\mu_n \in \mathcal{P}(S)$ converges weakly to $\mu \in \mathcal{P}(S)$ if and only if $\mu_n$ converges to $\mu$ in the weak topology.

Proof. If $\mu_n$ converges to $\mu$ in the weak topology then for any set $A$ in the neighbourhood basis of $\mu$, there exists $n_0$ such that for $n \ge n_0$, $\mu_n \in A$. For any $f \in C_b(S)$ and $\varepsilon > 0$, the set $\{\nu : |\mu f - \nu f| < \varepsilon\}$ is in such a neighbourhood basis; thus $\mu_n f \to \mu f$ for all $f \in C_b(S)$, which implies that $\mu_n \Rightarrow \mu$. Conversely suppose $\mu_n \Rightarrow \mu$, and let $A$ be the element of the neighbourhood basis for the weak topology given by (2.9). By the definition of weak convergence, it follows that $\mu_n f_i \to \mu f_i$ for $i = 1, \ldots, m$, so there exists $n_i$ such that for $n \ge n_i$, $|\mu_n f_i - \mu f_i| < \varepsilon$; thus for $n \ge \max_{i=1,\ldots,m} n_i$, $\mu_n$ is in $A$ and thus $\mu_n$ converges to $\mu$ in the weak topology. □

We do not a priori know that this topology is metrizable; therefore we are forced to consider convergence of nets instead of sequences until such point as we prove that the space is metrizable. Consequently we make this proof our first priority. Recall that a net in $E$ is a family of elements of $E$ indexed by $\alpha \in D$, where $D$ is a directed set (i.e. a partially ordered set in which every pair of elements has an upper bound). Let $x_\alpha$ be a real-valued net. Define
\[ \limsup_\alpha x_\alpha \triangleq \inf_{\alpha_0 \in D}\ \sup_{\alpha \ge \alpha_0} x_\alpha \]
and
\[ \liminf_\alpha x_\alpha \triangleq \sup_{\alpha_0 \in D}\ \inf_{\alpha \ge \alpha_0} x_\alpha. \]
The net is said to converge to $x$ if and only if
\[ \liminf_\alpha x_\alpha = \limsup_\alpha x_\alpha = x. \]
If $S$ is compact then by Theorem A.9, the space of continuous functions $C(S) = C_b(S)$ is separable and we can metrize weak convergence immediately; however, in the general case $C_b(S)$ is not separable. Is it possible to find a smaller space of functions which still guarantees weak convergence but which is separable? The first thought might be the functions $C_k(S)$ with compact support; however, these functions generate a different topology called the vague topology, which is weaker than the weak topology. To see this, consider $S = \mathbb{R}$ and $\mu_n = \delta_n$, the measure with an atom at $n \in \mathbb{N}$; clearly this sequence does not converge in the weak topology, but in the vague topology it converges to the zero measure. (Although this is not an element of $\mathcal{P}(S)$, it is an element of $\mathcal{M}(S)$.)

The Portmanteau theorem provides a crucial characterization of weak convergence; while it is an important part of the theory of weak convergence, its main importance to us is as a step in the metrization of the weak topology.

Theorem 2.17. Let $S$ be a metric space with metric $d$. Then the following are equivalent.
1. $\mu_\alpha \Rightarrow \mu$.
2. $\lim_\alpha \mu_\alpha g = \mu g$ for all uniformly continuous functions $g$, with respect to the metric $d$.
3. $\lim_\alpha \mu_\alpha g = \mu g$ for all Lipschitz functions $g$, with respect to the metric $d$.
4. $\limsup_\alpha \mu_\alpha(F) \le \mu(F)$ for all $F$ closed in $S$.
5. $\liminf_\alpha \mu_\alpha(G) \ge \mu(G)$ for all $G$ open in $S$.
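The behaviour of Dirac masses gives a quick numerical feel for these statements. The sketch below is an illustration only: the test functions `tent` and `triangle_wave`, the sets $F = \{0\}$ and $G = (0,2)$, and the second sequence $\delta_{1/n} \Rightarrow \delta_0$ are assumptions introduced here, not examples taken from the text.

```python
from fractions import Fraction

def tent(x):
    """Continuous with compact support [-1, 1]."""
    return max(1 - abs(x), 0)

def triangle_wave(x):
    """Bounded and continuous on R, equal to 0 at even and 1 at odd integers."""
    return abs(((x + 1) % 2) - 1)

def dirac_integral(point, f):
    """Integral of f against the Dirac measure at `point`."""
    return f(point)

def dirac_mass(point, indicator):
    """Mass given by the Dirac measure at `point` to the set described by `indicator`."""
    return 1 if indicator(point) else 0

def in_closed_F(x):
    return x == 0                       # F = {0}, closed

def in_open_G(x):
    return 0 < x < 2                    # G = (0, 2), open

if __name__ == "__main__":
    # delta_n: integrals of a compactly supported function vanish,
    # but a bounded continuous function can keep oscillating.
    print([dirac_integral(n, tent) for n in range(1, 7)])
    print([dirac_integral(n, triangle_wave) for n in range(1, 7)])
    # delta_{1/n} => delta_0: the inequalities in (4) and (5) can be strict.
    points = [Fraction(1, n) for n in range(1, 7)]
    print([dirac_mass(p, in_closed_F) for p in points], dirac_mass(0, in_closed_F))
    print([dirac_mass(p, in_open_G) for p in points], dirac_mass(0, in_open_G))
```

The first two lines of output show why compactly supported functions alone cannot detect the failure of weak convergence of $\delta_n$; the last two show that for a weakly convergent sequence the mass of a closed set can jump up in the limit and the mass of an open set can drop, which is why items 4 and 5 are stated as inequalities.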
Proof. The equivalence of (4) and (5) is immediate since the complement of an open set $G$ is closed. That (1)⇒(2)⇒(3) is immediate. So it is sufficient to prove that (3)⇒(4)⇒(1). Start with (3)⇒(4) and suppose that $\mu_\alpha f \to \mu f$ for all Lipschitz continuous $f \in C_b(S)$. Let $F$ be a closed set in $S$. We construct a sequence $f_n \downarrow 1_F$, viz. for $n \ge 1$,
\[ f_n(x) = (1 - n\,d(x,F))^+. \tag{2.10} \]
Clearly $f_n \in C_b(S)$ and $f_n$ is Lipschitz continuous with Lipschitz constant $n$. But $0 \le f_n \le 1$ and for $x \in F$, $f_n(x) = 1$, so it follows that $f_n \ge 1_F$, and it is also immediate that this is a decreasing sequence. Thus by the monotone convergence theorem
\[ \lim_{n\to\infty} \mu f_n = \mu(F). \tag{2.11} \]
Consider $n$ fixed; since $1_F \le f_n$ it follows that for $\alpha \in D$, $\mu_\alpha(F) \le \mu_\alpha f_n$, and thus
\[ \limsup_{\alpha\in D} \mu_\alpha(F) \le \limsup_{\alpha\in D} \mu_\alpha f_n. \]
But by (3)
\[ \limsup_{\alpha\in D} \mu_\alpha f_n = \lim_{\alpha\in D} \mu_\alpha f_n = \mu f_n; \]
it follows that for all $n \in \mathbb{N}$, $\limsup_{\alpha\in D} \mu_\alpha(F) \le \mu f_n$, and by (2.11) it follows that $\limsup_{\alpha\in D} \mu_\alpha(F) \le \mu(F)$, which is (4).

The harder part of the proof is that (4)⇒(1). Given $f \in C_b(S)$ we split it up horizontally as in the definition of the Lebesgue integral. Let $-\|f\|_\infty = a_0 < a_1 < \cdots < a_n = \|f\|_\infty + \varepsilon/2$ be constructed with $n$ sufficiently large to ensure that $a_{i+1} - a_i < \varepsilon$. Define $F_i \triangleq \{x : a_i \le f(x)\}$, which by continuity of $f$ is clearly a closed set. It is clear that $\mu(F_0) = 1$ and $\mu(F_n) = 0$. Therefore
\[ \sum_{i=1}^{n} a_{i-1}\left[\mu(F_{i-1}) - \mu(F_i)\right] \le \mu f < \sum_{i=1}^{n} a_i\left[\mu(F_{i-1}) - \mu(F_i)\right]. \]
By telescoping the sums on the left and right and using the fact that $a_0 = -\|f\|_\infty$, we obtain
\[ -\|f\|_\infty + \varepsilon \sum_{i=1}^{n-1} \mu(F_i) \le \mu f < -\|f\|_\infty + \varepsilon + \varepsilon \sum_{i=1}^{n-1} \mu(F_i). \tag{2.12} \]
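By the assumption that (4) holds, $\limsup_\alpha \mu_\alpha(F_i) \le \mu(F_i)$ for $i = 0, \ldots, n$; hence we obtain from the right-hand inequality in (2.12), which holds equally with $\mu_\alpha$ in place of $\mu$, that
\[ \mu_\alpha f \le -\|f\|_\infty + \varepsilon + \varepsilon \sum_{i=1}^{n-1} \mu_\alpha(F_i); \]
thus
\[ \limsup_\alpha \mu_\alpha f \le -\|f\|_\infty + \varepsilon + \varepsilon \sum_{i=1}^{n-1} \limsup_\alpha \mu_\alpha(F_i) \le -\|f\|_\infty + \varepsilon + \varepsilon \sum_{i=1}^{n-1} \mu(F_i), \]
and from the left-hand inequality in (2.12) this yields
\[ \limsup_\alpha \mu_\alpha f \le \varepsilon + \mu f. \]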
As $\varepsilon$ was arbitrary we obtain $\limsup_\alpha \mu_\alpha f \le \mu f$, and application to $-f$ yields $\liminf_\alpha \mu_\alpha f \ge \mu f$, which establishes (1). □

While it is clearly true that a convergence determining set of functions is separating, the converse is not true in general: in the case when $S$ is not compact, there may exist separating sets which are not sufficiently large to be convergence determining. For further details see Ethier and Kurtz [95, Chapter 3, Theorem 4.5].

Theorem 2.18. If $S$ is a separable metric space then there exists a countable convergence determining class $\varphi_1, \varphi_2, \ldots$ where $\varphi_i \in C_b(S)$.

Proof. By Lemma A.6 a separable metric space is homeomorphic to a subset of $[0,1]^{\mathbb{N}}$; let the homeomorphism be denoted $\alpha$. As the space $[0,1]^{\mathbb{N}}$ is compact, the closure $\overline{\alpha(S)}$ is also compact. Thus by Theorem A.9 the space $C(\overline{\alpha(S)})$ is separable. Let $\psi_1, \psi_2, \ldots$ be a countable dense family, where $\psi_i \in C(\overline{\alpha(S)})$. It is therefore immediate that we can approximate any function $\psi \in C(\alpha(S))$ arbitrarily closely in the uniform metric by suitable choice of $\psi_i$, provided that $\psi$ is the restriction to $\alpha(S)$ of a function in $C(\overline{\alpha(S)})$. Now define $\varphi_i = \psi_i \circ \alpha$ for each $i$. By the same reasoning, we can approximate $f \in C(S)$ arbitrarily closely in the uniform metric by some $\varphi_i$, provided that $f = g \circ \alpha$ where $g$ is the restriction to $\alpha(S)$ of a function in $C(\overline{\alpha(S)})$. Define a metric on $S$, $\hat\rho(x,y) = d(\alpha(x), \alpha(y))$, where $d$ is a metric inducing the topology of co-ordinatewise convergence on $[0,1]^{\mathbb{N}}$. As $\alpha$ is a homeomorphism, this is a metric on $S$. For $F$ closed in $S$, define the function
\[ f_n^F(x) \triangleq (1 - n\hat\rho(x,F))^+ = (1 - n\,d(\alpha(x), \alpha(F)))^+ = (g_n^F \circ \alpha)(x), \tag{2.13} \]
where
\[ g_n^F(x) \triangleq (1 - n\,d(x, \alpha(F)))^+. \]
This function $g_n^F$ is an element of $C([0,1]^{\mathbb{N}})$, and hence its restriction is an element of $C(\overline{\alpha(S)})$; thus by the foregoing argument, we can approximate $f_n^F$ arbitrarily closely by one of the functions $\varphi_i$. But we have seen from the proof that (3)⇒(4) in Theorem 2.17 that the functions $f_n^F$ of the form (2.13), for all $F$ closed and $n \in \mathbb{N}$, form a convergence determining class. Suppose that for all $i$, we have that $\lim_\alpha \mu_\alpha\varphi_i = \mu\varphi_i$; then for each $i$
\[ |\mu_\alpha f_n^F - \mu f_n^F| \le 2\|f_n^F - \varphi_i\|_\infty + |\mu_\alpha\varphi_i - \mu\varphi_i|; \]
by the postulated convergence for all $i$ of $\mu_\alpha\varphi_i$, it follows that the second term vanishes in the limit and thus for all $i$,
\[ \limsup_\alpha |\mu_\alpha f_n^F - \mu f_n^F| \le 2\|f_n^F - \varphi_i\|_\infty. \]
As $i$ was arbitrary, it is immediate that
\[ \limsup_\alpha |\mu_\alpha f_n^F - \mu f_n^F| \le 2 \liminf_i \|f_n^F - \varphi_i\|_\infty, \]
and since $f_n^F$ can be arbitrarily approximated in the uniform norm by a $\varphi_i$, it follows that $\lim_\alpha \mu_\alpha f_n^F = \mu f_n^F$, and since this holds for all $n$ and all closed $F$, it follows that $\mu_\alpha \Rightarrow \mu$. □

Theorem 2.19. If $S$ is a separable metric space, then $\mathcal{P}(S)$ with the weak topology is separable. We can then find a countable subset $\varphi_1, \varphi_2, \ldots$ of $C_b(S)$, with $\|\varphi_i\|_\infty = 1$ for all $i$, such that
\[ d : \mathcal{P}(S) \times \mathcal{P}(S) \to [0,\infty), \qquad d(\mu,\nu) = \sum_{i=1}^{\infty} \frac{|\mu\varphi_i - \nu\varphi_i|}{2^i} \tag{2.14} \]
defines a metric on $\mathcal{P}(S)$ which generates the weak topology; i.e., a net $\mu_\alpha$ converges to $\mu$ weakly if and only if $\lim_\alpha d(\mu_\alpha, \mu) = 0$.

Proof. By Theorem 2.18 there exists a countable set $f_1, f_2, \ldots$ of elements of $C_b(S)$ which is convergence determining for weak convergence. Define $\varphi_i \triangleq f_i/\|f_i\|_\infty$; clearly $\|\varphi_i\|_\infty = 1$, and the $\varphi_i$ also form a convergence determining set. Define the map
\[ \beta : \mathcal{P}(S) \to [0,1]^{\mathbb{N}}, \qquad \beta : \mu \mapsto (\mu\varphi_1, \mu\varphi_2, \ldots). \]
Since the $\varphi_i$ are convergence determining, they must also be separating and thus the map $\beta$ is one to one. It is clear that if $\mu_\alpha \Rightarrow \mu$ then from the definition of weak convergence, $\lim_\alpha \beta(\mu_\alpha) = \beta(\mu)$. Conversely, since the $\varphi_i$ are convergence determining, if $\lim_\alpha \mu_\alpha\varphi_i = \mu\varphi_i$ for all $i$ then $\mu_\alpha \Rightarrow \mu$. Thus $\beta$ is a homeomorphism from $\mathcal{P}(S)$ with the topology of weak convergence onto its image in $[0,1]^{\mathbb{N}}$ with the topology of co-ordinatewise convergence. Since $[0,1]^{\mathbb{N}}$ is a separable metrizable space, so is each of its subsets, and this implies that $\mathcal{P}(S)$ is separable. The space $[0,1]^{\mathbb{N}}$ admits a metric which generates the topology of co-ordinatewise convergence, given for $x, y \in [0,1]^{\mathbb{N}}$ by
\[ D(x,y) = \sum_{i=1}^{\infty} \frac{|x_i - y_i|}{2^i}. \tag{2.15} \]
Therefore it follows that $d(\mu,\nu) = D(\beta(\mu), \beta(\nu))$ is a metric on $\mathcal{P}(S)$ which generates the weak topology. □

As a consequence of this theorem, when $S$ is a complete separable metric space the weak topology on $\mathcal{P}(S)$ is metrizable, so it is possible to consider convergence in terms of convergent sequences instead of using nets.

Exercise 2.20. Exhibit a countable dense subset of the space $\mathcal{P}(\mathbb{R})$ endowed with the weak topology. (Such a set must exist since $\mathbb{R}$ is a complete separable metric space, which implies that $\mathcal{P}(\mathbb{R})$ is separable.) Show further that $\mathcal{P}(\mathbb{R})$ is not complete under the metric $d$ defined by (2.14).

Separability is a topological property of the space (i.e. it is independent of both existence and choice of metric), whereas completeness is a property of the metric. The topology of weak convergence on a complete separable space $S$ can be metrized by a different metric called the Prohorov metric, under which it is complete (see, e.g. Theorem 1.7 of Chapter 3 of Ethier and Kurtz [95]).

Exercise 2.21. Let $(\Omega, \mathcal{F})$ be a measurable space and $S$ be a separable metric space. Let $\zeta : \Omega \to \mathcal{P}(S)$ be a function. Write $\mathcal{B}(\mathcal{P}(S))$ for the Borel σ-algebra on $\mathcal{P}(S)$ generated by the open sets in the weak topology. Let $\{\varphi_i\}_{i>0}$ be a countable convergence determining set of functions in $C_b(S)$, whose existence is guaranteed by Theorem 2.18. Prove that $\zeta$ is $\mathcal{F}/\mathcal{B}(\mathcal{P}(S))$-measurable (and thus a random variable) if and only if $\zeta\varphi_i : \Omega \to \mathbb{R}$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable for all $i > 0$. [Hint: Consider the case where $S$ is compact for a simpler argument.]

Let us now turn our attention to the case of a finite state space $I$. The situation is much easier in this case since both $\mathcal{M}(I)$ and $\mathcal{P}(I)$ can be viewed as subsets of the Euclidean space $\mathbb{R}^{|I|}$ with the product topology (which is separable), and equipped with a suitable complete metric.
\[ \mathcal{M}(I) = \Bigl\{ (x_i)_{i\in I} \in \mathbb{R}^{|I|} : \sum_{i\in I} x_i < \infty,\ x_i \ge 0\ \ \forall i \in I \Bigr\}, \]
\[ \mathcal{P}(I) = \Bigl\{ (x_i)_{i\in I} \in \mathcal{M}(I) : \sum_{i\in I} x_i = 1 \Bigr\}. \]
The Borel sets in $\mathcal{M}(I)$, viz. $\mathcal{B}(\mathcal{M}(I))$, are generated by the cylinder sets $\{R_{i,a,b}\}_{i\in I;\,a,b\ge 0}$, where $R_{i,a,b} = \{(x_j)_{j\in I} \in \mathcal{M}(I) : a \le x_i \le b\}$, and $\mathcal{B}(\mathcal{P}(I))$ is similarly described in terms of cylinders.

Exercise 2.22. Let $d(x,y)$ be the Euclidean metric on $\mathbb{R}^{|I|}$. Prove that $d$ metrizes the topology of weak convergence on $\mathcal{P}(I)$ and that $(\mathcal{P}(I), d)$ is a complete separable metric space.
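As a numerical companion to the finite state space setting (not a solution of the exercise), the sketch below identifies a measure on a three-point set with a vector and compares $|\mu f - \nu f|$ with the Euclidean distance. The sequence `mu_n`, the limit `MU` and the function `F` are hypothetical; the bound used is the Cauchy–Schwarz inequality.

```python
import math

def integrate(mu, f):
    """mu f for a measure and a function on a finite set, both stored as tuples."""
    return sum(m * v for m, v in zip(mu, f))

def euclid(mu, nu):
    """Euclidean distance between two measures viewed as vectors in R^|I|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu, nu)))

def mu_n(n):
    """Hypothetical sequence in P(I), |I| = 3, converging to (1/2, 1/2, 0)."""
    return (0.5 + 0.5 / n, 0.5 - 0.5 / n, 0.0)

MU = (0.5, 0.5, 0.0)
F = (3.0, -1.0, 7.0)                    # an arbitrary function on I

def gap(n):
    """|mu_n f - mu f|."""
    return abs(integrate(mu_n(n), F) - integrate(MU, F))

def bound(n):
    """Cauchy-Schwarz: |mu f - nu f| <= ||f||_2 * d(mu, nu)."""
    return math.sqrt(sum(v * v for v in F)) * euclid(mu_n(n), MU)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, euclid(mu_n(n), MU), gap(n), bound(n))
```

Convergence in the Euclidean metric therefore forces $\mu_n f \to \mu f$ for every function on $I$; in the other direction, taking $f$ to be a coordinate indicator recovers convergence of each component.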
2.4 The Stochastic Process $\pi$

The aim of this section is to construct a $\mathcal{P}(S)$-valued stochastic process $\pi$ which is progressively measurable. In order to guarantee the existence of such a stochastic process some topological restrictions must be imposed on the state space $S$. In this chapter we assume that $S$ is a complete separable metric space.† While this topological restriction is not the most general possible, it includes all the cases which are of interest to us; extensions to more general spaces are possible at the expense of additional technical complications (for details of these extensions, see Getoor [105]).

† A complete separable metric space is sometimes called a Polish space following Bourbaki in recognition of the work of Kuratowski.

If we only wished to construct for a fixed $t \in [0,\infty)$ a $\mathcal{P}(S)$-valued random variable $\pi_t$ then we could use the theory of regular conditional probabilities. If the index set (in which $t$ takes values) were countable then we could construct a suitable conditional distribution $Q_t$ for each $t$. However, in the theory of continuous time processes the index set is $[0,\infty)$. If suitable conditions are satisfied, then by making a specific choice of $\Omega$ (usually the canonical path space), it is possible to regularize the sequence of regular conditional distributions $\{Q_t : t \in \mathbb{Q}^+\}$ to obtain a càdlàg $\mathcal{P}(\Omega)$-valued stochastic process $(Q_t)_{t\ge 0}$, which is called a kernel for the optional projection. Such a kernel is independent of the signal process $X$ and depends only on the probability space $(\Omega, \mathcal{F})$ and the filtration $\mathcal{Y}_t$. Performing the construction in this way (see Meyer [206] for details) is somewhat involved and imposes unnecessary conditions on $\Omega$, which are irrelevant since we are only interested in the distribution of the signal process $X_t$. Thus we do not follow this approach and instead choose to construct $\pi_t$ by piecing together optional projections.

The existence and uniqueness theorem for optional projections requires that we work with a filtration which satisfies the usual conditions, since the proof makes use of Doob's martingale regularisation theorem. Therefore, since we do not assume right continuity of $\mathcal{Y}_t$, in the following lemma the right continuous enlargement $\mathcal{Y}_{t+}$ is used, as this satisfies the usual conditions.

Lemma 2.23. Assume that $S$ is a compact metric space and $\mathcal{S} = \mathcal{B}(S)$ is the corresponding Borel σ-algebra. Then there exists a $\mathcal{P}(S)$-valued stochastic process $\pi_t$ which satisfies the following conditions.
1. $\pi_t$ is a $\mathcal{Y}_{t+}$-optional process.
2. For any $f \in B(S)$, the process $\pi_t f$ is indistinguishable from the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$.

Proof. The proof of this lemma is based upon the proofs of Proposition 1 in Yor [279], Theorem 4.1 in Getoor [105] and Theorem 5.1.15 in Stroock [262]. Let $\{f_i\}_{i=1}^\infty$ be a set of continuous bounded functions $f_i : S \to \mathbb{R}$ whose linear span is dense in $C_b(S)$. The compactness of $S$ implies by Corollary A.10 that such a set must exist. Set $f_0 = 1$. We may choose such a set so that $\{f_0, \ldots, f_n\}$ is linearly independent for each $n$. Set $g_0 = 1$, and for $n \ge 1$ set the process $g_n$ equal to a $\mathcal{Y}_{t+}$-optional projection of $f_n(X)$. The existence of such an optional projection is guaranteed by Theorem 2.7.

Let $U$ be the (countable) vector space generated by finite linear combinations of these $f_i$ with rational coefficients. If for some $N \in \mathbb{N}$, $f = \sum_{i=0}^{N} \alpha_i f_i$ with $\alpha_i \in \mathbb{Q}$, then define the process $\Lambda^\omega(f) = \sum_{i=0}^{N} \alpha_i g_i$. By the linear independence property, it is clear that any such representation is unique and therefore this is well defined. Define the subset $U^+ \triangleq \{v \in U : v \ge 0\}$. For $v \in U^+$ define
\[ N(v) = \{\omega \in \Omega : \Lambda^\omega_t(v) < 0 \text{ for some } t \ge 0\}. \]
It is immediate from Lemma A.26 that for each $v \in U^+$, the process $\Lambda^\omega(v)$ has non-negative paths a.s., thus $N(v)$ is a $\mathbb{P}$-null set. Define
\[ N = \bigcup_{v \in U^+} N(v), \]
which is also a $\mathbb{P}$-null set since this is a countable union. By construction $\Lambda^\omega$ is linear and $\Lambda^\omega(1) = 1$. Define a modified version of the process $\Lambda^\omega$ which is a functional on $U \subset C_b(S)$ and retains the properties of non-negativity and linearity for all $\omega \in \Omega$,
\[ \bar\Lambda^\omega(f) = \begin{cases} \Lambda^\omega(f) & \omega \notin N, \\ 0 & \omega \in N. \end{cases} \]
It only remains to check that $\bar\Lambda^\omega$ is a bounded operator. Let $f \in U \subset C_b(S)$; then trivially $|f| \le \|f\|_\infty 1$, so it follows that $\|f\|_\infty 1 \pm f \ge 0$ and hence for all $t \ge 0$, $\bar\Lambda^\omega_t(\|f\|_\infty 1 \pm f) \ge 0$ by the non-negativity property. But by linearity, since $\Lambda^\omega(1) = 1$, it follows that for all $t \ge 0$, $\|f\|_\infty \pm \Lambda^\omega_t(f) \ge 0$, from which we deduce
\[ \sup_{t\in[0,\infty)} \|\Lambda^\omega_t(f)\|_\infty \le \|f\|_\infty. \]
Since $\bar\Lambda^\omega$ is bounded, and $U$ is dense in $C_b(S)$, we can extend† the definition of $\bar\Lambda^\omega(f)$ for $f$ outside of $U$ as follows. Let $f \in C_b(S)$; since $U$ is dense in $C_b(S)$, we can find a sequence $f_k \in U$ such that $f_k \to f$ pointwise. Define
\[ \bar\Lambda^\omega(f) \triangleq \lim_{k\to\infty} \bar\Lambda^\omega(f_k), \]
which is clearly well defined since if $f'_k \in U$ is another sequence such that $f'_k \to f$, then by the boundedness of $\bar\Lambda$ and using the triangle inequality
\[ \sup_{t\in[0,\infty)} \|\bar\Lambda^\omega_t(f_k) - \bar\Lambda^\omega_t(f'_n)\|_\infty \le \|f_k - f'_n\|_\infty \le \|f_k - f\|_\infty + \|f - f'_n\|_\infty. \]

† Functional analysts will realise that we can use the Hahn–Banach theorem to construct a norm preserving extension. Since this is a metric space we can use the constructive proof given here instead.

Since $S$ is compact and the convergence $f_k \to f$ and $f'_n \to f$ is uniform on $S$, given $\varepsilon > 0$ there exists $k_0$ such that $k \ge k_0$ implies $\|f_k - f\|_\infty < \varepsilon/2$ and similarly $n_0$ such that $n \ge n_0$ implies $\|f'_n - f\|_\infty < \varepsilon/2$, whence it follows that the limit as $n \to \infty$ of $\bar\Lambda^\omega(f'_n)$ is $\bar\Lambda^\omega(f)$.

We must check that for $f \in C_b(S)$, $\bar\Lambda^\omega_t(f)$ is the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. By the foregoing it is $\mathcal{Y}_{t+}$-optional. Let $T$ be a $\mathcal{Y}_{t+}$-stopping time; then
\[ \mathbb{E}\bigl[\bar\Lambda_T(f) 1_{\{T<\infty\}}\bigr] = \lim_{k\to\infty} \mathbb{E}\bigl[\bar\Lambda_T(f_k) 1_{\{T<\infty\}}\bigr] = \lim_{k\to\infty} \mathbb{E}\bigl[f_k(X_T) 1_{\{T<\infty\}}\bigr] = \mathbb{E}\bigl[f(X_T) 1_{\{T<\infty\}}\bigr], \]
where the second equality follows since $\bar\Lambda(f_k)$ is a $\mathcal{Y}_{t+}$-optional projection of $f_k(X)$ and the other two equalities follow by the dominated convergence theorem.

By the Riesz representation theorem, which applies since $S$ is compact,‡ we can find a kernel $\pi^\omega(\cdot)$ such that for $\omega \in \Omega$,
\[ \bar\Lambda^\omega_t(f) = \int_S f(x)\,\pi^\omega_t(dx) = \pi^\omega_t f, \qquad \text{for all } t \ge 0. \tag{2.16} \]

‡ Without the compactness property, we cannot guarantee that the kernel be σ-additive.

To establish the first and second parts of the lemma, we need to check that for $f \in B(S)$, $(\pi^\omega f)_t$ is the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. We do this via the monotone class framework (see Theorem A.1 in the appendix) since on a metric space the σ-algebra generated by $C_b(S)$ is $\mathcal{B}(S)$. It is clear from (2.16) and the preceding argument that for $f \in C_b(S)$, $(\pi^\omega f)_t$ is the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. Let $\mathcal{H}$ be the subset of $B(S)$ for which $(\pi^\omega f)_t$ is the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. Clearly $1 \in \mathcal{H}$, $\mathcal{H}$ is a vector space and $C_b(S) \subseteq \mathcal{H}$. The monotone convergence theorem for integration implies that $\mathcal{H}$ is a monotone class. Therefore by the monotone class theorem $\mathcal{H}$ contains $B(S)$. □

Theorem 2.24. Let $S$ be a complete separable metric space. Then there exists a $\mathcal{P}(S)$-valued stochastic process $\pi_t$ which satisfies the following conditions.
1. $\pi_t$ is a $\mathcal{Y}_{t+}$-optional process.
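2. For any $f \in B(S)$, the process $\pi_t f$ is indistinguishable from the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$.

Proof. Since $S$ is a complete separable metric space, by Theorem A.7 of the appendix, it is homeomorphic to a Borel subset of a compact metric space $\hat S$; we denote the homeomorphism by $\alpha$. Define a process $\hat X_t = \alpha(X_t)$ taking values in $\hat S$. Since $\hat S$ is a compact separable metric space, by Lemma 2.23 there exists a $\mathcal{P}(\hat S)$-valued stochastic process $\hat\pi$ such that for each $\hat f \in B(\hat S)$, $\hat\pi_t \hat f$ is a $\mathcal{Y}_{t+}$-optional projection of $\hat f(\hat X_t)$. Since the process $\hat X$ takes values in $\alpha(S) \subset \hat S$, it is immediate that
\[ {}^o\bigl(1_{\hat S \setminus \alpha(S)}(\hat X_t)\bigr) = {}^o\bigl(1_{\hat S \setminus \alpha(S)}(\alpha(X_t))\bigr) = {}^o 0 = 0. \]
As the optional projection is only defined up to indistinguishability, it follows that
\[ \hat\pi^\omega_t\bigl(\hat S \setminus \alpha(S)\bigr) = \hat\pi^\omega_t 1_{\hat S \setminus \alpha(S)} = {}^o\bigl(1_{\hat S \setminus \alpha(S)}(\hat X_t)\bigr) = 0 \qquad \forall t \in [0,\infty) \]
$\mathbb{P}$-a.s. Define
\[ \bar N \triangleq \bigl\{\omega \in \Omega : \hat\pi^\omega_t\bigl(\hat S \setminus \alpha(S)\bigr) = 0\ \ \forall t \in [0,\infty)\bigr\}^c, \]
which we have just shown to be a $\mathbb{P}$-null set. We define a $\mathcal{P}(S)$-valued random process $\pi$ as follows; let $A \in \mathcal{B}(S)$,
\[ \pi^\omega_t(A) \triangleq \begin{cases} \hat\pi^\omega_t(\alpha(A)) & \omega \notin \bar N, \\ \mathbb{P}Y_t^{-1}(A) & \omega \in \bar N. \end{cases} \]
Here the choice of $\pi$ on $\bar N$ is arbitrary; we cannot choose 0, because $\pi^\omega_t$ must be a probability measure on $S$ for all $\omega \in \Omega$. Thus it is immediate that $\pi^\omega_t \in \mathcal{P}(S)$. If $f \in B(S)$ then we can extend $f$ to a function $\hat f$ in $B(\hat S)$ by defining
\[ \hat f(x) = \begin{cases} f(\alpha^{-1}(x)) & \text{if } x \in \alpha(S), \\ 0 & \text{otherwise.} \end{cases} \]
Clearly
\[ \pi(f) = \pi(\hat f \circ \alpha) = \hat\pi(\hat f 1_{\alpha(S)}) = \hat\pi(\hat f) \qquad \mathbb{P}\text{-a.s.}, \]
but $\hat\pi_t \hat f$ is the $\mathcal{Y}_{t+}$-optional projection of $\hat f(\hat X_t) = f(X_t)$; hence, as required, $\pi_t(f)$ is the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. □

Exercise 2.25. Let $\pi_t$ be defined as above. Show that for any $f \in B(S)$ and any $t \in [0,\infty)$, $\pi_t f = \mathbb{E}[f(X_t) \mid \mathcal{Y}_{t+}]$ holds $\mathbb{P}$-a.s.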
Corollary 2.26. If the sample paths of $X$ are càdlàg then there is a version of $\pi_t$ with càdlàg paths (where $\mathcal{P}(S)$ is endowed with the topology of weak convergence) and a countable set $Q \subset [0,\infty)$, such that for $t \in [0,\infty) \setminus Q$, for any $f \in B(S)$, $\pi_t f = \mathbb{E}[f(X_t) \mid \mathcal{Y}_t]$.

Proof. For any $f \in C_b(S)$, the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$ is indistinguishable from a càdlàg process by Theorem 2.9. Since by Theorem 2.24 $\pi_t f$ is indistinguishable from the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$, it follows that $\pi_t f$ is indistinguishable from a càdlàg process. By Theorem 2.18, there is a countable convergence determining class $\{\varphi_i\}_{i\ge 0}$, which is therefore also a separating class. We can therefore choose a modification of $\pi_t$ such that $\pi_t\varphi_i$ is càdlàg for all $i$. Therefore $\pi_t$ is càdlàg. Since $\mathcal{P}(S)$ with the weak topology is metrizable it then follows by Lemma A.14 that $I \triangleq \{t > 0 : \mathbb{P}(\pi_{t-} \ne \pi_t) > 0\}$ is countable. But for $t \notin I$, $\pi_t = \pi_{t-}$ a.s., thus $\pi_t = \lim_{s \uparrow\uparrow t} \pi_s$ (where the notation $s \uparrow\uparrow t$ is defined in Section A.7.1). Clearly $\pi_s$ for $s < t$ is $\mathcal{Y}_t$-measurable and therefore so is the limit $\pi_t$. As $\mathcal{Y}_t \subset \mathcal{Y}_{t+}$ it follows from the definition of Kolmogorov conditional expectation that $\pi_t f = \mathbb{E}[\pi_t f \mid \mathcal{Y}_t] = \mathbb{E}[f(X_t) \mid \mathcal{Y}_t]$. □

Remark 2.27. The theorem as stated above only guarantees that $\pi_t f$ is a $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$ for $f$ a bounded measurable function. Examining the proof shows that this restriction to bounded $f$ is the usual one arising from the use of the monotone class theorem A.1. It is useful to consider $\pi f$ when $f$ is not bounded. Consider $f$ non-negative and define $f^n \triangleq f \wedge n$, which is bounded, so by the above theorem $\pi(f^n)$ is $\mathcal{Y}_{t+}$-optional. Clearly $f^n \to f$ in a monotone fashion as $n \to \infty$, and since $\pi(f^n)$ is the expectation of $f^n$ under the measure $\pi_t$, by the monotone convergence theorem $\pi(f^n) \to \pi(f)$. Since $\pi(f)$ is the limit of a sequence of $\mathcal{Y}_{t+}$-optional processes, it is $\mathcal{Y}_{t+}$-optional.
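The truncation step $f^n = f \wedge n$ can be illustrated for a single fixed measure. In this sketch the measure `PI` on $\{0, \ldots, 10\}$ and the function $f(x) = x^2$ are hypothetical; because the support is finite, $f$ is in fact bounded on it, so this is only a toy picture of the monotone limit and not an instance of a genuinely unbounded integrand.

```python
from fractions import Fraction

# Hypothetical probability measure on {0, ..., 10}: geometric weights plus a final atom.
PI = {k: Fraction(1, 2 ** (k + 1)) for k in range(10)}
PI[10] = Fraction(1, 2 ** 10)

def f(x):
    """Non-negative test function, standing in for an unbounded f."""
    return x * x

def integral(g):
    """pi(g) for the discrete measure PI."""
    return sum(mass * g(x) for x, mass in PI.items())

def truncated_integral(n):
    """pi(f ∧ n)."""
    return integral(lambda x: min(f(x), n))

if __name__ == "__main__":
    for n in (0, 1, 10, 50, 100):
        print(n, truncated_integral(n))
    print("limit:", integral(f))
```

The printed values are non-decreasing in $n$ and reach $\pi(f)$ once the truncation level exceeds the largest value of $f$ on the support, mirroring the monotone convergence argument of the remark.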
By application of the monotone convergence theorem to the defining equation of optional projection (2.8), it follows that $\pi(f)$ is a $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. In the general case where $f$ is unbounded, but not necessarily non-negative, if $\pi_t |f| < \infty$ for all $t \in [0, \infty)$, $\mathbb{P}$-a.s., then writing $f_+ = f \vee 0$ and $f_- = (-f) \vee 0$, it follows that $|f| = f_+ + f_-$ and hence $\pi_t f_+ < \infty$ and $\pi_t f_- < \infty$ for all $t \in [0, \infty)$ $\mathbb{P}$-a.s. Thus $\pi_t f = \pi_t f_+ - \pi_t f_-$ is well defined (i.e. it cannot be $\infty - \infty$) and a similar argument verifies that it satisfies the conditions for the $\mathcal{Y}_{t+}$-optional projection of $f(X_t)$. The pathwise regularity of $\pi_t f$ (i.e. showing that the trajectories of $\pi f$ are càdlàg if $X$ is càdlàg) requires stronger conditions in the unbounded case irrespective of whether $f$ is non-negative. In particular we need to be able to exchange a limit and an expectation; a suitable condition for this to be valid is that the family of random variables $\{f(X_t) : t \in [0, \infty)\}$ is uniformly integrable. For example, this is true if this family is dominated by an integrable random variable, in other words if $\sup_{s \in [0,\infty)} |f(X_s)|$ is integrable.

2.4.1 Regular Conditional Probabilities

This section is not essential reading in order to understand the subsequent chapters. It describes the construction of a regular conditional probability. The ideas involved are important in many areas of probability theory and most of the work in establishing them has been done in the previous section, hence their inclusion here. For many purposes a stronger notion of conditional probability is required than that provided by Kolmogorov conditional expectation (see Appendix A.2). The most useful form is that of regular conditional probability.

Definition 2.28. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathcal{G}$ a sub-σ-algebra of $\mathcal{F}$. A function $Q(\omega, B)$ defined for all $\omega \in \Omega$ and $B \in \mathcal{F}$ is called a regular conditional probability of $\mathbb{P}$ with respect to $\mathcal{G}$ if
(a) For each $B \in \mathcal{F}$, $Q(\omega, B) = \mathbb{E}[1_B \mid \mathcal{G}]$ $\mathbb{P}$-a.s.
(b) For each $\omega \in \Omega$, $Q(\omega, \cdot)$ is a probability measure on $(\Omega, \mathcal{F})$.
(c) For each $B \in \mathcal{F}$, the map $Q(\cdot, B)$ is $\mathcal{G}$-measurable.
(d) If the σ-algebra $\mathcal{G}$ is countably generated then for all $G \in \mathcal{G}$, $Q(\omega, G) = 1_G(\omega)$ $\mathbb{P}$-a.s.
Regular conditional probabilities as described in Definition 2.28 do not always exist. For an example of non-existence of regular conditional probabilities due to Halmos, Dieudonné, Andersen and Jessen see Rogers and Williams [248, Section II.43].

Exercise 2.29. Prove by similar methods to those used in the proof of Theorem 2.24 that if $\Omega$ is a compact metric space then there exists a regular conditional probability distribution with respect to the σ-algebra $\mathcal{G} \subset \mathcal{F}$. Furthermore, show that, in the case where $\mathcal{G}$ is finitely generated, if $A^{\mathcal{G}}(\omega)$ is the atom of $\mathcal{G}$ containing $\omega$ (i.e. $\bigcap \{G \in \mathcal{G} : \omega \in G\}$) then $Q(\omega, A^{\mathcal{G}}(\omega)) = 1$. This argument can be extended to complete separable metric spaces by means of Theorem A.7, with an argument similar to that in the proof of Theorem 2.24.
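On a finite sample space a regular conditional probability always exists and can be written down explicitly, which may help to make Definition 2.28 concrete. The following is a minimal sketch, assuming that $\mathcal{G}$ is generated by a finite partition into atoms of positive probability; the function name `regular_conditional_probability` and the die example are illustrative assumptions, not part of the text. It sets $Q(\omega, B) = \mathbb{P}(B \cap A(\omega)) / \mathbb{P}(A(\omega))$, where $A(\omega)$ is the atom containing $\omega$.

```python
from fractions import Fraction


def regular_conditional_probability(prob, partition):
    """Build Q(omega, B) on a finite sample space (illustrative sketch).

    prob      -- dict mapping each outcome omega to P({omega}), assumed > 0
    partition -- list of disjoint sets (atoms) whose union is the sample space;
                 G is taken to be the sigma-algebra generated by these atoms
    """
    # Map every outcome to the atom of G that contains it.
    atom_of = {omega: frozenset(atom) for atom in partition for omega in atom}

    def Q(omega, B):
        atom = atom_of[omega]
        mass = sum(prob[w] for w in atom)
        # Conditional probability of B given the atom containing omega.
        return sum(prob[w] for w in atom if w in B) / mass

    return Q, atom_of


# Hypothetical example: a fair six-sided die, G generated by {odd, even}.
prob = {w: Fraction(1, 6) for w in range(1, 7)}
partition = [{1, 3, 5}, {2, 4, 6}]
Q, atom_of = regular_conditional_probability(prob, partition)

print(Q(1, {1, 2}))      # conditional probability of {1, 2} given "odd"
print(Q(4, atom_of[4]))  # Q(omega, A(omega)), the property asked for in Exercise 2.29
```

Exact rational arithmetic is used so that the defining identities can be checked without rounding error. In this finite setting properties (a) to (d) hold for every $\omega$, not just almost surely.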
2.5 Right Continuity of Observation Filtration

The results in this section are proved under more restrictive conditions than those of Section 2.1. The observation process $Y$ is assumed to satisfy the evolution equation (2.2). That is,

$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + W_t, \qquad t \ge 0,$$

where $W = \{W_t, t \ge 0\}$ is a standard $\mathcal{F}_t$-adapted $m$-dimensional Brownian motion and $h = (h_i)_{i=1}^m : S \to \mathbb{R}^m$ is a measurable function. Assume that conditions (2.3) and (2.4) are satisfied; that is

$$\mathbb{E}\left[\int_0^t \|h(X_s)\|\,ds\right] < \infty,$$

and

$$\mathbb{P}\left(\int_0^t \|\pi_s(h)\|^2\,ds < \infty\right) = 1.$$
Let $I = \{I_t, t \ge 0\}$ be the following process, called the innovation process,

$$I_t = Y_t - \int_0^t \pi_s(h)\,ds = W_t + \int_0^t h(X_s)\,ds - \int_0^t \pi_s(h)\,ds, \qquad t \ge 0. \tag{2.17}$$

For this innovation process to be well defined it is necessary that

$$\int_0^t \pi_s(\|h\|)\,ds < \infty \quad \mathbb{P}\text{-a.s.}, \tag{2.18}$$

which is clearly implied by the stronger condition (2.3). The condition (2.18) is not strong enough for the proof of the following theorem; consequently only condition (2.3) is referenced subsequently.

Proposition 2.30. If condition (2.3) is satisfied then $I_t$ is a $\mathcal{Y}_t$-adapted Brownian motion under the measure $\mathbb{P}$.

Proof. Obviously $I_t$ is $\mathcal{Y}_t$-adapted as both $Y_t$ and $\int_0^t \pi_s(h)\,ds$ are. First it is shown that $I_t$ is a continuous martingale. As a consequence of (2.3) $I_t$ is integrable, hence taking conditional expectation
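$$\begin{aligned}
\mathbb{E}[I_t \mid \mathcal{Y}_s] - I_s &= \mathbb{E}\left[W_t + \int_0^t h(X_r)\,dr \,\Big|\, \mathcal{Y}_s\right] - \left(W_s + \int_0^s h(X_r)\,dr\right) \\
&\quad - \mathbb{E}\left[\int_0^t \pi_r(h)\,dr \,\Big|\, \mathcal{Y}_s\right] + \int_0^s \pi_r(h)\,dr \\
&= \mathbb{E}\left[Y_s + W_t - W_s + \int_s^t h(X_r)\,dr \,\Big|\, \mathcal{Y}_s\right] - Y_s \\
&\quad - \mathbb{E}\left[\int_0^t \pi_r(h)\,dr \,\Big|\, \mathcal{Y}_s\right] + \int_0^s \pi_r(h)\,dr.
\end{aligned}$$

Since $Y_t$ and $\int_0^t \pi_r(h)\,dr$ are $\mathcal{Y}_t$-measurable,

$$\mathbb{E}[I_t \mid \mathcal{Y}_s] - I_s = \mathbb{E}[W_t - W_s \mid \mathcal{Y}_s] + \int_s^t \mathbb{E}[h(X_r) - \pi_r(h) \mid \mathcal{Y}_s]\,dr = 0,$$

where we have used the fact that for $r \ge s$, $\mathbb{E}[\pi_r(h) \mid \mathcal{Y}_s] = \mathbb{E}[h(X_r) \mid \mathcal{Y}_s]$ and $\mathbb{E}[W_t - W_s \mid \mathcal{Y}_s] = \mathbb{E}[\mathbb{E}[W_t - W_s \mid \mathcal{F}_s] \mid \mathcal{Y}_s] = 0$. The cross-variation of $I$ is the same as the cross-variation of $W$ as the other two terms in (2.17) give zero cross-variation. So $I$ is a continuous martingale and its cross-variation is given by

$$\langle I^i, I^j \rangle_t = \langle W^i, W^j \rangle_t = t\,\delta_{ij}. \tag{2.19}$$

Hence $I$ is a Brownian motion by Lévy's characterisation of a Brownian motion (Theorem B.27). □

From the first part of (2.17), for small $\delta$, $Y_{t+\delta} - Y_t \simeq \pi_t(h)\delta + I_{t+\delta} - I_t$. Heuristically the incoming observation $Y_{t+\delta} - Y_t$ has a part which could be predicted from the knowledge of the system state, $\pi_t(h)\delta$, and an additional component $I_{t+\delta} - I_t$, containing new information which is independent of the current knowledge. This is why $I$ is called the innovation process.

Proposition 2.31 (Fujisaki, Kallianpur and Kunita [104]). Assume the conditions (2.3) and (2.4) are satisfied. Then every square integrable random variable $\eta$ which is $\mathcal{Y}_\infty$-measurable has a representation of the form

$$\eta = \mathbb{E}[\eta] + \int_0^\infty \nu_s^\top\,dI_s, \tag{2.20}$$

where $\nu = \{\nu_t, t \ge 0\}$ is a progressively measurable $\mathcal{Y}_t$-adapted process such that

$$\mathbb{E}\left[\int_0^\infty \|\nu_s\|^2\,ds\right] < \infty.$$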
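Proposition 2.30 can be checked numerically in a case where $\pi_t(h)$ is available in closed form. The sketch below assumes a scalar linear Gaussian model (signal $dX = -aX\,dt + \sigma\,dV$, sensor function $h(x) = cx$), for which the conditional mean and variance solve the Kalman-Bucy equations; this model, the Euler discretisation and all parameter values are assumptions made purely for illustration and are not taken from this section. The discretised innovation increments are $\Delta I = \Delta Y - \pi_t(h)\,\Delta t$, and if $I$ behaves like a Brownian motion their sum of squares over $[0, T]$ should be close to $T$, in line with (2.19).

```python
import math
import random


def simulate_innovation(T=1.0, n_steps=1000, a=1.0, sigma=1.0, c=2.0, seed=0):
    """Euler scheme for a scalar linear-Gaussian model (an assumed example).

    Signal:       dX = -a X dt + sigma dV
    Observation:  dY = c X dt + dW, i.e. h(x) = c x
    Filter:       Kalman-Bucy equations for the conditional mean and variance
    Returns the discretised innovation increments dI = dY - pi(h) dt.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    x = rng.gauss(0.0, 1.0)   # X_0 ~ N(0, 1)
    x_hat, p = 0.0, 1.0       # conditional mean and variance at time 0
    increments = []
    for _ in range(n_steps):
        dW = rng.gauss(0.0, sqrt_dt)
        dV = rng.gauss(0.0, sqrt_dt)
        dY = c * x * dt + dW          # observation increment
        dI = dY - c * x_hat * dt      # innovation increment, pi(h) = c * x_hat
        increments.append(dI)
        # Kalman-Bucy update of the conditional mean, driven by the innovation.
        x_hat += -a * x_hat * dt + p * c * dI
        # Riccati equation for the conditional variance.
        p += (-2.0 * a * p + sigma ** 2 - (c * p) ** 2) * dt
        # Signal update.
        x += -a * x * dt + sigma * dV
    return increments


increments = simulate_innovation()
quadratic_variation = sum(dI * dI for dI in increments)
print(quadratic_variation)  # should be close to T = 1
```

The agreement is only approximate, since both the time discretisation and the finite sample introduce error. Refining the grid should bring the empirical quadratic variation closer to $T$.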
This theorem is often proved under the stronger condition

$$\mathbb{E}\left[\int_0^t \|h(X_s)\|^2\,ds\right] < \infty, \qquad \forall t \ge 0,$$

which implies both conditions (2.4) and (2.3). The innovation process $I_t$ is clearly $\mathcal{Y}_t$-adapted. If the converse result, that $\mathcal{Y}_t = \sigma(I_s : 0 \le s \le t) \vee \mathcal{N}$, were known† then this proposition would be a trivial application of the martingale representation theorem B.32. However, the representation provided by the proposition is the closest to a satisfactory converse which is known to hold.‡

The main element of the proof of Proposition 2.31 is an application of Girsanov's theorem followed by use of the martingale representation theorem. In Section 2.1 it was necessary to augment the filtration $\mathcal{Y}_{t+}$ with the null sets in order to construct the process $\pi$. This will cause some difficulties, because the process to be used as a change of measure is not necessarily a martingale. In order to construct a uniformly integrable martingale, a stopping argument must be used and this cannot be done directly working with the augmented filtration. This has the unfortunate effect of obscuring a simple and elegant proof. The proof for a simpler case, where the process is a martingale, is discussed in Exercise 2.33 and the reader who is uninterested in measurability aspects would be well advised to consult the solution to this exercise instead of reading the proof.

To be clear in notation, we denote by $\mathcal{Y}_t^o$ the unaugmented σ-algebra (i.e. without the addition of the null sets) corresponding to $\mathcal{Y}_t$. The following technical lemma, whose conclusion might well seem to be 'obvious', is required. The proof of the lemma is not important for understanding the proof of the representation result, so it is deferred to the appendix, where it is proved as Lemma A.24.

Lemma 2.32. Let $\mathcal{X}_t^o$ be the unaugmented σ-algebra generated by a process $X_t$. Then for $T$ a $\mathcal{X}^o$-stopping time, for all $t \ge 0$,

$$\mathcal{X}^o_{t \wedge T} = \sigma\{X_{s \wedge T} : 0 \le s \le t\}.$$

† An example of Tsirel'son which is presented in a filtering context in Beneš [11] demonstrates that in general $\mathcal{Y}_t$ is not equal to $\sigma(I_s : 0 \le s \le t) \vee \mathcal{N}$.

‡ In special cases the observation and innovation filtrations can be shown to be equal. Allinger and Mitter [3] extend an earlier result of Clark [55] (see also Theorem 11.4.1 in Kallianpur [145] and Meyer [205] pp 244–246) to show that if the observation and signal noise are uncorrelated and for some $T$, $\mathbb{E}[\int_0^T \|h(X_s)\|^2\,ds] < \infty$, then for $t \le T$, $\sigma(I_s : 0 \le s \le t) \vee \mathcal{N} = \mathcal{Y}_t$. Their proof consists of an analysis of the Kallianpur–Striebel functional which leads to a pathwise uniqueness result for weak solutions of the equation $I_t = Y_t - \int_0^t \pi_s(h)\,ds$. That is, if two valid weak solutions $(Y, I)$ and $(\tilde{Y}, \tilde{I})$ of this equation have a common Brownian motion $I$ (but not necessarily a common filtration) then $Y$ and $\tilde{Y}$ are indistinguishable. From a result of Yamada and Watanabe (Remark 2, Corollary 1 of [275]; see also Chapter 8 of Stroock and Varadhan [261]) this establishes the result.
We are now in a position to prove the representation result, Proposition 2.31.

Proof. Since the integral in (2.4) is non-decreasing in $t$, this condition implies that

$$\mathbb{P}\left(\int_0^t \|\pi_r(h)\|^2\,dr < \infty, \ \forall t \in [0, \infty)\right) = 1. \tag{2.21}$$

Define

$$\bar{Z}_t \triangleq \exp\left(-\int_0^t \pi_r(h^\top)\,dI_r - \frac{1}{2}\int_0^t \|\pi_r(h)\|^2\,dr\right), \tag{2.22}$$

and for $n > 0$ define

$$\bar{T}^n \triangleq \inf\left\{t \ge 0 : \int_0^t \|\pi_r(h)\|^2\,dr \ge n \ \text{or} \ |\bar{Z}_t| \ge n\right\}, \tag{2.23}$$

which by Lemma A.19 is a $\mathcal{Y}_t$-stopping time, since the processes $t \mapsto \int_0^t \|\pi_r(h)\|^2\,dr$ and $\bar{Z}$ are both continuous and $\mathcal{Y}_t$-adapted. By Lemma A.21 the $\mathcal{Y}_t$-stopping time $\bar{T}^n$ is a.s. equal to a $\mathcal{Y}^o_{t+}$-stopping time. However, this is not strong enough; a $\mathcal{Y}^o_t$-stopping time is required.

The process $\pi_t(h)$ gives rise to a sequence of càdlàg step function approximations

$$\pi^n(h)(\omega) \triangleq \sum_{i=0}^\infty 1_{[2^{-n}i,\, 2^{-n}(i+1))}(t)\,\pi_{2^{-n}i}(h)(\omega).$$

Each $\pi_{2^{-n}i}(h)$ is a $\mathcal{Y}_{2^{-n}i}$-measurable random variable. From the definition of augmentation, by modification on a $\mathbb{P}$-null set a $\mathcal{Y}^o_{2^{-n}i}$-measurable random variable $P_i^n$ can be defined such that $\pi_{2^{-n}i}(h) = P_i^n$ holds $\mathbb{P}$-a.s. Then define

$$\bar{\pi}^n(h)(\omega) \triangleq \sum_{i=0}^\infty 1_{[2^{-n}i,\, 2^{-n}(i+1))}(t)\,P_i^n(\omega),$$

and as a countable family of random variables has been modified on null sets, it follows that the processes $\bar{\pi}^n(h)$ and $\pi^n(h)$ are indistinguishable. The process $\bar{\pi}^n(h)$ is càdlàg and $\mathcal{Y}^o_t$-adapted, therefore it must be $\mathcal{Y}^o_t$-optional. As the process $\pi(h)$ has by Lemma A.13 at most a countable number of discontinuities, it follows that $\pi^n(h)$ converges $\lambda$-a.s. to $\pi(h)$. Therefore the sequence $\bar{\pi}^n(h)$ converges $\lambda \otimes \mathbb{P}$-a.s. to $\pi(h)$. The limit $\bar{\pi}(h) \triangleq \liminf_{n \to \infty} \bar{\pi}^n(h)$ is a limit of $\mathcal{Y}^o_t$-optional processes and is therefore $\mathcal{Y}^o$-optional.

Using this process $\bar{\pi}$ in place of $\pi(h)$ in the definition of $\bar{Z}$ we may define $\hat{Z}$. This process $\hat{Z}$ as constructed need not be continuous as it can explode from a finite value to infinity, because (2.21) only holds outside a null set. This process cannot simply be modified on a null set, as this might destroy the property of $\mathcal{Y}^o_t$-adaptedness. Instead define $Z$ to be a modification of $\hat{Z}$, which is zero on the set

$$\left\{\omega \in \Omega : \int_0^r \|\bar{\pi}_s(h)\|^2\,ds = \infty \ \text{for} \ r < t, \ r \in \mathbb{Q}\right\}.$$
This set is clearly $\mathcal{Y}^o_t$-measurable, hence this modified process $Z$ is $\mathcal{Y}^o_t$-adapted and continuous. As the processes $Z$ and $\int_0^\cdot \|\bar{\pi}_s(h)\|^2\,ds$ are continuous and $\mathcal{Y}^o_t$-adapted, by Lemma A.19 $\inf\{t \ge 0 : Z_t \ge n\}$ and $\inf\{t \ge 0 : \int_0^t \|\bar{\pi}_s(h)\|^2\,ds \ge n\}$ are both $\mathcal{Y}^o_t$-stopping times. The process $\bar{\pi}(h)$ is indistinguishable from $\pi(h)$, therefore define a second sequence of stopping times

$$T^n \triangleq \inf\left\{t \ge 0 : \int_0^t \|\bar{\pi}_r(h)\|^2\,dr \ge n \ \text{or} \ |Z_t| \ge n\right\} \tag{2.24}$$

and it follows that $T^n$ is a.s. equal to $\bar{T}^n$ and $Z_n \triangleq Z_{T^n}$ is $\mathbb{P}$-a.s. equal to $\bar{Z}_{\bar{T}^n}$. Clearly $Z$ is a local martingale; but in general it is not a martingale. The next argument shows that by stopping at $T^n$, the process $Z^{T^n}$ is a uniformly integrable martingale and therefore suitable for use as a measure change. From (2.22), using Itô's formula

$$Z_t^{T^n} = Z_0 - \int_0^{t \wedge T^n} Z_s^{T^n}\,\pi_s(h^\top)\,dI_s,$$

and since by Proposition 2.30, $I$ is a $\mathbb{P}$-Brownian motion adapted to $\mathcal{Y}_t$, it follows that the stochastic integral is a $\mathcal{Y}_t$-adapted martingale provided that

$$\mathbb{E}\left[\int_0^{t \wedge T^n} \|\pi_s(h)\|^2 \left(Z_s^{T^n}\right)^2 ds\right] < \infty, \qquad \text{for all } t \ge 0.$$

It is clear that

$$\int_0^{t \wedge T^n} \|\pi_s(h)\|^2 \left(Z_s^{T^n}\right)^2 ds \le n^2 \int_0^{t \wedge T^n} \|\pi_s(h)\|^2\,ds \le n^4 < \infty. \tag{2.25}$$

It follows that $Z^{T^n}$ is a martingale which by (2.25) is uniformly bounded in $L^2$ and hence uniformly integrable. Define a change of measure by

$$\frac{d\tilde{\mathbb{P}}^n}{d\mathbb{P}} = Z_{T^n}.$$

As $\mathbb{P}$ and $\tilde{\mathbb{P}}^n$ are by construction equivalent probability measures, it follows that statements which hold $\mathbb{P}$-a.s. also hold $\tilde{\mathbb{P}}^n$-a.s. As a consequence of Girsanov's theorem (see Theorem B.28 of the appendix), since $Z^{T^n}$ is a uniformly integrable martingale, under the measure $\tilde{\mathbb{P}}^n$, the process

$$Y_t^n \triangleq I_t + \int_0^{T^n \wedge t} \pi_r(h)\,dr,$$

is a Brownian motion with respect to the filtration $\mathcal{Y}_t$.
We are forced to use this Brownian motion $Y^n$ in place of $Y$ when applying the martingale representation theorem. Were $Z$ itself a uniformly integrable martingale we could use this directly to construct a representation of $Z_\infty^{-1}\eta$, which is $\mathcal{Y}_\infty$-measurable and square integrable, as an integral over $Y$. Using $Y^n$ instead of $Y$ as our Brownian motion is not itself a problem. However, the martingale representation theorem only allows representations to be constructed of random variables which are measurable with respect to the augmentation of the filtration generated by the Brownian motion. In this case this means measurable with respect to the augmentation of the filtration

$$\mathcal{Y}_t^{n,o} \triangleq \sigma\{Y_s^n : 0 \le s \le t\}.$$

Clearly this filtration $\mathcal{Y}_t^{n,o}$ is not the same as $\mathcal{Y}_t^o$. From the definition of the innovation process

$$Y_t = I_t + \int_0^t \pi_r(h)\,dr.$$

Thus $Y$ and $Y^n$ agree on the time interval $[0, T^n]$. It must now be shown that the σ-algebras generated by these processes stopped at $T^n$ agree. From Lemma A.24 it follows that

$$\mathcal{Y}^{n,o}_{t \wedge T^n} = \sigma\{Y^n_{s \wedge T^n} : 0 \le s \le t\} = \sigma\{Y_{s \wedge T^n} : 0 \le s \le t\} = \mathcal{Y}^o_{t \wedge T^n},$$

where the second equality follows from the fact that $Y^n$ and $Y$ agree on the interval $[0, T^n]$.

Suppose that $\eta$ is an element of $L^2(\mathcal{Y}^o_{T^n}, \mathbb{P})$, that is $\eta$ is $\mathcal{Y}^o_{T^n}$-measurable, and $\mathbb{E}[\eta^2] < \infty$. As the process $Z_t$ is continuous, it is progressively measurable, therefore $Z_n$ is $\mathcal{Y}^o_{T^n}$-measurable. Thus $Z_n^{-1}\eta$ is also $\mathcal{Y}^o_{T^n}$-measurable. One of the conditions defining the stopping time $T^n$ ensures that $|(Z_n)^{-1}| < \exp(2n)$, thus

$$\mathbb{E}_{\tilde{\mathbb{P}}^n}\left[(Z_n^{-1}\eta)^2\right] = \mathbb{E}\left[Z_n Z_n^{-2}\eta^2\right] \le \exp(2n)\,\mathbb{E}[\eta^2] < \infty,$$

and hence $Z_n^{-1}\eta$ is an element of $L^2(\mathcal{Y}^o_{T^n}, \tilde{\mathbb{P}}^n)$.

We can now apply the classical martingale representation theorem B.32 (together with Remark B.33) to construct a representation of $Z_n^{-1}\eta$ with respect to the Brownian motion $Y^n$, establishing the existence of a previsible process $\Phi^n$ adapted to the filtration $\mathcal{Y}^n_t$ (the representation theorem requires the use of the augmented filtration) such that

$$\begin{aligned}
Z_n^{-1}\eta &= \mathbb{E}_{\tilde{\mathbb{P}}^n}(Z_n^{-1}\eta) + \int_0^\infty (\Phi_s^n)^\top\,dY_s^n \\
&= \mathbb{E}[\eta] + \int_0^\infty (\Phi_s^n)^\top\,dI_s + \int_0^\infty (\Phi_s^n)^\top \pi_s(h)\,ds.
\end{aligned}$$

As $\Phi^n_s$ is $\mathcal{Y}^n_s$-adapted, it follows that for $s > T^n$, $\Phi^n_s = 0$ and since $Y$ and $Y^n$ agree on $[0, T^n]$ it follows that $\Phi^n_s$ is adapted to $\mathcal{Y}_s$. We now construct a $\tilde{\mathbb{P}}^n$-martingale from $\eta Z_n^{-1}$ via
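$$\tilde{\eta}_t = \mathbb{E}_{\tilde{\mathbb{P}}^n}\left[\eta Z_n^{-1} \mid \mathcal{Y}^n_t\right].$$

Applying Itô's formula to the product $\tilde{\eta}_t Z_t^{T^n}$,

$$d\left(\tilde{\eta}_t Z_t^{T^n}\right) = 1_{t \le T^n}\left[-Z_t \tilde{\eta}_t\,\pi_t(h^\top)\,dI_t + Z_t (\Phi_t^n)^\top dI_t + Z_t (\Phi_t^n)^\top \pi_t(h)\,dt - Z_t (\Phi_t^n)^\top \pi_t(h)\,dt\right].$$

The finite variation terms in this expression cancel and thus integrating from $0$ to $t$,

$$\tilde{\eta}_t Z_t^{T^n} = \mathbb{E}[\eta] + \int_0^{t \wedge T^n} \left(Z_s (\Phi_s^n)^\top - \tilde{\eta}_s Z_s\,\pi_s(h^\top)\right) dI_s.$$

Writing $\nu_t^n \triangleq Z_t \Phi_t^n - \tilde{\eta}_t Z_t\,\pi_t(h)$,

$$\tilde{\eta}_t Z_t^{T^n} = \mathbb{E}[\eta] + \int_0^{t \wedge T^n} (\nu_s^n)^\top\,dI_s,$$

taking the limit as $t \to \infty$, yields a representation

$$Z_n^{-1}\eta\,Z_{T^n} = \mathbb{E}[\eta] + \int_0^{T^n} (\nu_t^n)^\top\,dI_t.$$

By choice of $Z_n$, the left-hand side is a.s. equal to $\eta$ and since $\Phi^n$ and $Z$ are $\mathcal{Y}_t$-adapted, it follows that $\nu^n$ is also $\mathcal{Y}_t$-adapted. The fact that $\Phi^n$ is previsible implies that it is progressively measurable and hence, since $\pi(h)$ is progressively measurable, the progressive measurability of $\nu^n$ follows and we have established that for $\eta \in L^2(\mathcal{Y}^o_{T^n}, \mathbb{P})$ there is a representation

$$\eta = \mathbb{E}[\eta] + \int_0^{T^n} (\nu_t^n)^\top\,dI_t, \qquad \mathbb{P}\text{-a.s.}, \tag{2.26}$$

where $\nu^n$ is progressively measurable. Taking expectation of the square of (2.26) it follows that

$$\mathbb{E}\left[(\eta - \mathbb{E}[\eta])^2\right] = \mathbb{E}\left[\int_0^{T^n} \|\nu_s^n\|^2\,ds\right].$$

Since $\eta$ is a priori a square integrable random variable, the left-hand side is finite and hence

$$\mathbb{E}\left[\int_0^{T^n} \|\nu_s^n\|^2\,ds\right] < \infty.$$

The representation of the form (2.26) must be unique, thus it follows that there exists $\nu_t$ such that $\nu_t = \nu_t^n$ on $t \le T^n$ for all $n \in \mathbb{N}$.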
To complete the proof let $H$ be the set of all elements of $L^2(\mathcal{Y}_\infty, \mathbb{P})$ which have a representation of the form (2.20). By the foregoing argument, for any $n$, $L^2(\mathcal{Y}^o_{T^n}, \mathbb{P}) \subseteq H$. Clearly the set $H$ is closed, and since

$$\bigcup_{n \in \mathbb{N}} L^2(\mathcal{Y}^o_{T^n}; \mathbb{P})$$

is dense in $L^2(\mathcal{Y}_\infty; \mathbb{P})$, it follows that $H = L^2(\mathcal{Y}_\infty, \mathbb{P})$. □

Exercise 2.33. To ensure you understand the above proof, simplify the proof of Proposition 2.31 in the case where for $\omega$ not in some null set

$$\int_0^t \|\pi_r(h)\|^2\,dr < K(t) < \infty, \tag{2.27}$$

where $K(t)$ is independent of $\omega$, a condition which is satisfied if $h$ is bounded. In this case the condition (2.3) holds trivially.

Proposition 2.31 offers an easy route to showing that the filtration $\mathcal{Y}_t$ is right continuous.

Lemma 2.34. Let $M = \{M_t, t \ge 0\}$ be a right continuous $\mathcal{Y}_{t+}$-adapted martingale that is bounded in $L^2(\Omega)$; that is $M$ satisfies $\sup_{t \ge 0} \mathbb{E}[M_t^2] < \infty$. Then $M$ is $\mathcal{Y}_t$-adapted and continuous.

Proof. By the martingale convergence theorem (Theorem B.1) $M_t = \mathbb{E}[M_\infty \mid \mathcal{Y}_{t+}]$, and by Proposition 2.31

$$M_\infty = \mathbb{E}[M_\infty] + \int_0^\infty \nu_s^\top\,dI_s,$$

so using the fact that $I_t$ is $\mathcal{Y}_t$-adapted

$$M_t = \mathbb{E}[M_\infty] + \mathbb{E}\left[\int_0^\infty \nu_s^\top\,dI_s \,\Big|\, \mathcal{Y}_{t+}\right] = \mathbb{E}[M_\infty] + \int_0^t \nu_s^\top\,dI_s,$$

from which it follows both that $M_t$ is $\mathcal{Y}_t$-measurable and that $M$ is continuous. □

Theorem 2.35. The observation σ-algebra is right continuous, that is, $\mathcal{Y}_{t+} = \mathcal{Y}_t$.

Proof. For a given $t \ge 0$ let $A \in \mathcal{Y}_{t+}$. Then the process $M = \{M_s, s \ge 0\}$ defined by

$$M_s \triangleq \begin{cases} 1_A - \mathbb{E}[1_A \mid \mathcal{Y}_t] & \text{for } s \ge t \\ 0 & \text{for } s < t \end{cases}$$
is a $\mathcal{Y}_{s+}$-adapted right continuous martingale bounded in $L^2(\Omega)$. Hence, by Lemma 2.34, $M$ is also a continuous $\mathcal{Y}_s$-adapted martingale. In particular $1_A - \mathbb{E}[1_A \mid \mathcal{Y}_t]$ is $\mathcal{Y}_t$-measurable so $A \in \mathcal{Y}_t$. Hence $\mathcal{Y}_{t+} \subseteq \mathcal{Y}_t$ and the conclusion follows since $t$ was arbitrarily chosen. □

Exercise 2.36. Let $\pi = \{\pi_t, t \ge 0\}$ be the $\mathcal{Y}_t$-adapted process defined in Theorem 2.24. Prove that for any $t \ge 0$, $\pi_t$ has a $\sigma(Y_s, 0 \le s \le t)$-measurable modification.
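2.6 Solutions to Exercises

2.4 i. Let $\mathcal{H}_t$ be the set on the right-hand side of (2.7). Since, for any $G \in \mathcal{Y}^o_t$ and $N_1, N_2 \in \mathcal{N}$, $(G \setminus N_1) \cup N_2 \in \mathcal{Y}_t$ it follows that $\mathcal{H}_t \subseteq \mathcal{Y}_t$. Since $\mathcal{Y}^o_t$ and $\mathcal{N}$ are subsets of $\mathcal{H}_t$ and $\mathcal{H}_t$ is a σ-algebra, $\mathcal{Y}_t = \mathcal{Y}^o_t \vee \mathcal{N} \subseteq \mathcal{H}_t$.

ii. From Part (i) the result is true for $\xi$, the indicator function of an arbitrary set in $\mathcal{Y}_t$. By linearity the result holds for simple random variables, that is, for linear combinations of indicator functions of sets in $\mathcal{Y}_t$. Finally let $\xi$ be an arbitrary $\mathcal{Y}_t$-measurable function. Then there exists a sequence $(\xi_n)_{n \ge 1}$ of simple random variables such that $\lim_{n \to \infty} \xi_n(\omega) = \xi(\omega)$ for any $\omega \in \Omega$. Let $(\eta_n)_{n \ge 1}$ be the corresponding sequence of $\mathcal{Y}^o_t$-measurable simple random variables such that, for any $n \ge 1$, $\xi_n(\omega) = \eta_n(\omega)$ for any $\omega \in \Omega \setminus N_n$ where $N_n \in \mathcal{N}$. Define $\eta = \limsup_{n \to \infty} \eta_n$. Hence $\eta$ is $\mathcal{Y}^o_t$-measurable and $\xi(\omega) = \eta(\omega)$ for any $\omega \in \Omega \setminus (\cup_{n \ge 1} N_n)$ which establishes the result.

2.20 The rational numbers $\mathbb{Q}$ are a dense subset of $\mathbb{R}$. We show that the set $G \subset \mathcal{P}(\mathbb{R})$ of measures $\sum_{k=1}^n \alpha_k \delta_{x_k}$, for $\alpha_k \in \mathbb{Q}^+$, and $x_k \in \mathbb{Q}$ for all $k$ with $\sum_{k=1}^n \alpha_k = 1$, is dense in $\mathcal{P}(\mathbb{R})$ with the weak topology. Given $\mu \in \mathcal{P}(\mathbb{R})$ we must find a sequence $\mu_n \in G$ such that $\mu_n \Rightarrow \mu$. It is sufficient to show that we can find an approximating sequence $\mu_n$ in the space $H$ of measures of the form $\sum_{i=1}^\infty \alpha_i \delta_{x_i}$ where $\alpha_i \in \mathbb{R}^+$, $x_i \in \mathbb{Q}$ and $\sum_{i=1}^\infty \alpha_i = 1$. It is clear that each such measure in $H$ is the weak limit of a sequence of measures in $G$. We can cover $\mathbb{R}$ by the countable collection of disjoint sets of the form $[k/n, (k+1)/n)$ for $k \in \mathbb{Z}$. Define

$$\mu_n \triangleq \sum_{k=-\infty}^\infty \mu([k/n, (k+1)/n))\,\delta_{k/n};$$

then $\mu_n \in H$. Let $g \in C_b(\mathbb{R})$ be a Lipschitz continuous function. Define

$$a_k^n \triangleq \inf_{x \in [k/n, (k+1)/n)} g(x), \qquad b_k^n \triangleq \sup_{x \in [k/n, (k+1)/n)} g(x).$$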
As $n \to \infty$, since $g$ is uniformly continuous it is clear that $\sup_k |a_k^n - b_k^n| \to 0$. Thus as

$$\mu_n g = \sum_{k=-\infty}^\infty g(k/n)\,\mu([k/n, (k+1)/n)),$$

and

$$\sum_{k=-\infty}^\infty a_k^n\,\mu([k/n, (k+1)/n)) \le \mu g \le \sum_{k=-\infty}^\infty b_k^n\,\mu([k/n, (k+1)/n)),$$

it follows that

$$|\mu_n g - \mu g| \le \sum_{k=-\infty}^\infty |b_k^n - a_k^n|\,\mu([k/n, (k+1)/n)) \le \sup_k |b_k^n - a_k^n| \to 0.$$

As this holds for all uniformly continuous $g$, we have established (2) of Theorem 2.17 and thus $\mu_n \Rightarrow \mu$. For the second part, define $\mu_n \triangleq \delta_n$ for $n \in \mathbb{N}$. This sequence does not converge weakly to any element of $\mathcal{P}(\mathbb{R})$ but the sequence is Cauchy in $d$, hence the space $(\mathcal{P}(\mathbb{R}), d)$ is not complete.

2.21 Suppose that $\zeta\varphi_i$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable for all $i$. To show that $\zeta$ is $\mathcal{F}/\mathcal{B}(\mathcal{P}(S))$-measurable, it is sufficient to show that for all elements $A$ of the neighbourhood basis of $\mu$, the set $\zeta^{-1}(A) \in \mathcal{F}$. But the sets of the neighbourhood basis have the form given by (2.9). We show that the weak topology is also generated by the local neighbourhoods of $\mu$ of the form

$$B = \bigcap_{i=1}^m \left\{\nu \in \mathcal{P}(S) : |\nu\varphi_{j_i} - \mu\varphi_{j_i}| < \varepsilon\right\}, \tag{2.28}$$

where $\varepsilon > 0$, and $j_1, \ldots, j_m$ are elements of $\mathbb{N}$. Clearly the topology with this basis must be weaker than the weak topology. We establish the equivalence of the topologies if we also show that the weak topology is weaker than the topology with neighbourhoods of the form (2.28). To this end, consider an element $A$ in the neighbourhood basis of $\mu$ in the weak topology

$$A = \bigcap_{i=1}^m \left\{\nu \in \mathcal{P}(S) : |\nu f_i - \mu f_i| < \varepsilon\right\};$$

we show that there is an element of the neighbourhood (2.28) which is a subset of $A$. Suppose no such subset exists; in this case we can find a sequence $\mu_n$ in $\mathcal{P}(S)$ such that $\mu_n\varphi_i \to \mu\varphi_i$ for all $i$, yet $\mu_n \notin A$ for all $n$. But since $\{\varphi_i\}_{i=1}^\infty$ is a convergence determining set, this implies that $\mu_n \Rightarrow \mu$ and hence $\mu_n f \to \mu f$ for all $f \in C_b(S)$, in which case for $n$ sufficiently large $\mu_n$ must be in $A$, which is a contradiction. Thus we need only consider

$$\zeta^{-1}(B) = \bigcap_{i=1}^m \left\{\omega : |\zeta(\omega)\varphi_{j_i} - \mu\varphi_{j_i}| < \varepsilon\right\},$$
where $\varepsilon > 0$ and $j_1, \ldots, j_m$ in $\mathbb{N}$. Since $\zeta\varphi_i$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable, it follows that each element of the intersection is $\mathcal{F}$-measurable and thus $\zeta^{-1}(B) \in \mathcal{F}$. Thus we have established that $\zeta$ is $\mathcal{F}/\mathcal{B}(\mathcal{P}(S))$-measurable. For the converse implication suppose that $\zeta$ is $\mathcal{B}(\mathcal{P}(S))$-measurable. We must show that $\zeta f$ is $\mathcal{B}(\mathbb{R})$-measurable for any $f \in C_b(\mathbb{R})$. For any $x \in \mathbb{R}$, $\varepsilon > 0$ the set $\{\mu \in \mathcal{P}(S) : |\mu f - x| < \varepsilon\}$ is open in the weak topology on $\mathcal{P}(S)$, hence $\{\omega : |\zeta f - x| < \varepsilon\}$ is $\mathcal{F}$-measurable; thus we have shown that $(\zeta f)^{-1}(x - \varepsilon, x + \varepsilon) \in \mathcal{F}$. The open intervals $(x - \varepsilon, x + \varepsilon)$ for all $x \in \mathbb{R}$, $\varepsilon > 0$ generate the open sets in $\mathbb{R}$, hence $\zeta f$ is $\mathcal{F}/\mathcal{B}(\mathbb{R})$-measurable.

2.22 Considering $\mathcal{P}(I)$ as a subset of $\mathbb{R}^{|I|}$, a continuous bounded function $\varphi$ on a finite set $I$ may be thought of as an element of $\mathbb{R}^{|I|}$ and $\mu\varphi$ is the dot product $\mu \cdot \varphi$. If $\mu_n, \mu \in \mathcal{P}(I)$ and $\mu_n \Rightarrow \mu$, then by choosing the functions to be the basis vectors of $\mathbb{R}^{|I|}$ we see that $\mu_n\{i\} \to \mu\{i\}$ as $n \to \infty$ for all $i \in I$. Thus weak convergence in $\mathcal{P}(I)$ is equivalent to co-ordinatewise convergence in $\mathbb{R}^{|I|}$. It is then clear that $\mathcal{P}(I)$ is separable since the set $\mathbb{Q}^{|I|}$ is a countable dense subset of $\mathbb{R}^{|I|}$. Since $(\mathbb{R}^{|I|}, d)$ is complete and since $d$ is a metric for co-ordinatewise convergence in $\mathbb{R}^{|I|}$, it also metrizes weak convergence on $\mathcal{P}(I)$.

2.25 We know from Theorem 2.24 that $\pi f$ is indistinguishable from the $\mathcal{Y}_{t+}$-optional projection of $f(X)$. As $t$ is a bounded stopping time, for any $t \in [0, \infty)$, $\mathbb{E}[f(X_t) \mid \mathcal{Y}_{t+}] = {}^o(f(X_t))$ $\mathbb{P}$-a.s., hence the result.

2.29 Parts (a) and (b) are similar to the argument given for the existence of the process $\pi$, but in this case taking $f_i \in C_b(\Omega, \mathbb{R})$ and $g_i = \mathbb{E}[f_i \mid \mathcal{G}]$ choosing some version of the conditional expectation. For (c) let $G_i$ be a countable family generating $\mathcal{G}$. Define $\mathcal{K}$ to be the (countable) π-system generated by these $G_i$s. Clearly $\mathcal{G} = \sigma(\mathcal{K})$. Define $\Omega' \triangleq \{\omega \in \Omega : Q(\omega, K) = 1_K(\omega), \ \forall K \in \mathcal{K}\}$. Since

$$\mathbb{E}[1_K \mid \mathcal{G}] = 1_K, \qquad \mathbb{P}\text{-a.s.},$$

it follows that $\mathbb{P}(\Omega') = 1$. For $\omega \in \Omega'$ the set of $G \in \mathcal{G}$ on which $Q(\omega, G) = 1_G(\omega)$ is a d-system; so by Dynkin's lemma (see A1.3 of Williams [272]) since this d-system includes the π-system $\mathcal{K}$ it must include $\sigma(\mathcal{K}) = \mathcal{G}$. Thus for $\omega \in \Omega'$ it follows that

$$Q(\omega, G) = 1_G(\omega), \qquad \forall G \in \mathcal{G}.$$

To show that $Q(\omega, A^{\mathcal{G}}(\omega)) = 1$, observe that this would follow immediately from the above if $A^{\mathcal{G}}(\omega) \in \mathcal{G}$, but since it is defined in terms of an uncountable intersection we must use the countable generating system to write
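$$A^{\mathcal{G}}(\omega) = \left(\bigcap_{G_i : \omega \in G_i} G_i\right) \cap \left(\bigcap_{G_i : \omega \notin G_i} G_i^c\right)$$

and since the expression on the right-hand side is in terms of a countable intersection of elements of $\mathcal{G}$, the result follows.

2.33 To keep the solution concise, consider the even simpler case where the process $Z$ defined in (2.22) is itself a uniformly integrable martingale (the general case can be handled by defining the change of measure on each $\mathcal{F}_t$ to be given by $Z_t$ as in Section 3.3). Thus we define a change of measure via

$$\frac{d\tilde{\mathbb{P}}}{d\mathbb{P}} = Z_\infty,$$

and consequently under $\tilde{\mathbb{P}}$ by Girsanov's theorem $Y_t$ is a Brownian motion. Let $\eta \in L^2(\mathcal{Y}_\infty, \mathbb{P})$, and apply the martingale representation theorem to $Z_\infty^{-1}\eta$, to obtain a previsible process $\Phi_t$ such that

$$Z_\infty^{-1}\eta = \mathbb{E}_{\tilde{\mathbb{P}}}(Z_\infty^{-1}\eta) + \int_0^\infty \Phi_s^\top\,dY_s.$$

If we define a $\tilde{\mathbb{P}}$-martingale via $\tilde{\eta}_t = \mathbb{E}_{\tilde{\mathbb{P}}}[Z_\infty^{-1}\eta \mid \mathcal{Y}_t]$ then by stochastic integration by parts

$$d(Z_t \tilde{\eta}_t) = \left(Z_t \Phi_t^\top - \tilde{\eta}_t Z_t\,\pi_t(h^\top)\right) dI_t,$$

consequently we may define $\nu_t^\top = Z_t \Phi_t^\top - \tilde{\eta}_t Z_t\,\pi_t(h^\top)$. We may integrate this to obtain

$$Z_t \tilde{\eta}_t = \mathbb{E}[\eta] + \int_0^t \nu_s^\top\,dI_s,$$

and passing to the $t \to \infty$ limit

$$\eta = Z_\infty \tilde{\eta}_\infty = \mathbb{E}[\eta] + \int_0^\infty \nu_t^\top\,dI_t.$$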
2.36 Follow the same steps as in Lemma 2.23 for arbitrary fixed $t \ge 0$, only consider the random variables $g_i$ to be given by the (Kolmogorov) conditional expectations $\mathbb{E}[f_i(X_t) \mid \sigma(Y_s, 0 \le s \le t)]$ instead of the $\mathcal{Y}_t$-optional projection. Then use Exercise 2.4 part (ii) to show that the two constructions give rise to the same (random) probability measure almost surely. Alternatively, let $\bar{\pi}_t$ be the regular conditional distribution (in the sense of Definition A.2) of $X_t$ given $\sigma(Y_s, 0 \le s \le t)$. Then for any $f \in B(S)$, $\bar{\pi}_t f = \mathbb{E}[f(X_t) \mid \sigma(Y_s, 0 \le s \le t)]$ holds $\mathbb{P}$-a.s. Following Exercise 2.25 using the right continuity of the filtration $(\mathcal{Y}_t)_{t \ge 0}$ and Exercise 2.4, for any $f \in B(S)$,

$$\pi_t f = \mathbb{E}[f(X_t) \mid \mathcal{Y}_t] = \mathbb{E}[f(X_t) \mid \sigma(Y_s, 0 \le s \le t)] \qquad \mathbb{P}\text{-a.s.}$$

Since $S$ is a complete separable metric space there exists a countable separating set $A \subset C_b(S)$. Therefore, there exists a null set $N(A)$ such that for any $\omega \in \Omega \setminus N(A)$ we have $\bar{\pi}_t f(\omega) = \pi_t f(\omega)$ for any $f \in A$. Therefore $\bar{\pi}_t(\omega) = \pi_t(\omega)$ for any $\omega \in \Omega \setminus N(A)$.
2.7 Bibliographical Notes

The majority of the results about weak convergence and probability measures on metric spaces can be found in Prokhorov [246] and are part of the standard theory of probability measures. The innovations argument originates in the work of Fujisaki, Kallianpur and Kunita [104]; however, there are some technical difficulties whose resolution is not clear from this paper but which are discussed in detail in Meyer [205].
3 The Filtering Equations
3.1 The Filtering Framework

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space together with a filtration $(\mathcal{F}_t)_{t \ge 0}$ which satisfies the usual conditions. (See Section 2.1 for a definition of the usual conditions.) On $(\Omega, \mathcal{F}, \mathbb{P})$ we consider an $\mathcal{F}_t$-adapted process $X = \{X_t, t \ge 0\}$ which takes values in a complete separable metric space $S$ (the state space). Let $\mathcal{S}$ be the associated Borel σ-algebra $\mathcal{B}(S)$. The process $X$ is assumed to have paths which are càdlàg. (See Appendix A.5 for details.) In the following $X$ is called the signal process. Let $\{\mathcal{X}_t, t \ge 0\}$ be the usual augmentation with null sets of the filtration associated with the process $X$. In other words define

$$\mathcal{X}_t = \sigma(X_s, s \in [0, t]) \vee \mathcal{N}, \tag{3.1}$$

where $\mathcal{N}$ is the collection of all $\mathbb{P}$-null sets of $(\Omega, \mathcal{F})$ and define

$$\mathcal{X} \triangleq \bigvee_{t \in \mathbb{R}_+} \mathcal{X}_t, \tag{3.2}$$

where the $\vee$ notation denotes taking the σ-algebra generated by the union $\cup_t \mathcal{X}_t$. That is,

$$\mathcal{X} = \sigma\left(\bigcup_{t \in \mathbb{R}_+} \mathcal{X}_t\right).$$

Recall that $B(S)$ is the space of bounded $\mathcal{B}(S)$-measurable functions. Let $A : B(S) \to B(S)$ and write $\mathcal{D}(A)$ for the domain of $A$ which is a subset of $B(S)$. We assume that $1 \in \mathcal{D}(A)$ and $A1 = 0$. This definition implies that if $f \in \mathcal{D}(A)$ then $Af$ is bounded. This is a very important observation which is crucial for many of the bounds in this chapter. Let $\pi_0 \in \mathcal{P}(S)$. Assume that $X$ is a solution of the martingale problem for $(A, \pi_0)$. In other words, assume that the distribution of $X_0$ is $\pi_0$ and that the process $M^\varphi = \{M_t^\varphi, t \ge 0\}$ defined as
$$M_t^\varphi = \varphi(X_t) - \varphi(X_0) - \int_0^t A\varphi(X_s)\,ds, \qquad t \ge 0, \tag{3.3}$$

is an $\mathcal{F}_t$-adapted martingale for any $\varphi \in \mathcal{D}(A)$. The operator $A$ is called the generator of the process $X$.

Let $h = (h_i)_{i=1}^m : S \to \mathbb{R}^m$ be a measurable function such that

$$\mathbb{P}\left(\int_0^t \|h(X_s)\|\,ds < \infty\right) = 1 \tag{3.4}$$

for all $t \ge 0$. Let $W$ be a standard $\mathcal{F}_t$-adapted $m$-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$ independent of $X$, and $Y$ be the process satisfying the following evolution equation

$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + W_t. \tag{3.5}$$

The condition (3.4) ensures that the Riemann integral in the definition of $Y_t$ exists a.s. This process $\{Y_t, t \ge 0\}$ is the observation process. Let $\{\mathcal{Y}_t, t \ge 0\}$ be the usual augmentation of the filtration associated with the process $Y$, viz

$$\mathcal{Y}_t = \sigma(Y_s, s \in [0, t]) \vee \mathcal{N}, \tag{3.6}$$

$$\mathcal{Y} = \bigvee_{t \in \mathbb{R}_+} \mathcal{Y}_t. \tag{3.7}$$

Then note that since by the measurability of $h$, $Y_t$ is $\mathcal{F}_t$-adapted, it follows that $\mathcal{Y}_t \subset \mathcal{F}_t$.

Remark 3.1. To simplify notation we have considered $A$ and $h$ as having no explicit time dependence. By addition of $t$ as a component of the state vector $X$, most results immediately extend to the case when $A$ and $h$ are time dependent. The reason for adopting this approach is that it keeps the notation simple.

Definition 3.2. The filtering problem consists in determining the conditional distribution $\pi_t$ of the signal $X$ at time $t$ given the information accumulated from observing $Y$ in the interval $[0, t]$; that is, for $\varphi \in B(S)$, computing

$$\pi_t \varphi = \mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t]. \tag{3.8}$$

As discussed in the previous chapter, we must choose a suitable regularisation of the process $\pi = \{\pi_t, t \ge 0\}$, and by Theorem 2.24 we can do this so that $\pi_t$ is an optional (and hence progressively measurable), $\mathcal{Y}_t$-adapted probability measure-valued process for which (3.8) holds almost surely. While (3.8) was established for $\varphi$ bounded, $\pi_t$ as constructed is a probability measure-valued process, so it is quite legitimate to compute $\pi_t \varphi$ when $\varphi$ is unbounded
provided that the expectation in question is well defined, in other words when $\pi_t |\varphi| < \infty$. In the following, $Y_0$ is considered to be identically zero (there is no information available initially). Hence $\pi_0$, the initial distribution of $X$, is identical with the conditional distribution of $X_0$ given $Y_0$ and we use the same notation for both

$$\pi_0 \varphi = \int_S \varphi(x)\,\mathbb{P}X_0^{-1}(dx).$$

In the following we deduce the evolution equation for $\pi$. We consider two possible approaches.

• The change of measure method. A new measure is constructed under which $Y$ becomes a Brownian motion and $\pi$ has a representation in terms of an associated unnormalised version $\rho$. This $\rho$ is then shown to satisfy a linear evolution equation which leads to the evolution equation for $\pi$ by an application of Itô's formula.
• The innovation process method. The second approach isolates the Brownian motion driving the evolution equation for $\pi$ (called the innovation process) and then identifies the corresponding terms in the Doob–Meyer decomposition of $\pi$.

Before we proceed, we first present two important examples of the above framework.
3.2 Two Particular Cases

We consider here two particular cases. One is a diffusion process and the second is a Markov chain with a finite state space. The results in the chapter are stated in as general a form as possible and the various exercises show how the results can be applied in these two particular cases. The exercises establish suitable conditions on the processes, under which the general results of the chapter are valid. The process of verifying these conditions is sequential and the exercises build upon the results of earlier exercises, thus they are best attempted in order. As usual, the solutions may be found at the end of the chapter.

3.2.1 X a Diffusion Process

Let $X = (X^i)_{i=1}^d$ be the solution of a $d$-dimensional stochastic differential equation driven by a $p$-dimensional Brownian motion $V = (V^j)_{j=1}^p$:

$$X_t^i = X_0^i + \int_0^t f^i(X_s)\,ds + \sum_{j=1}^p \int_0^t \sigma^{ij}(X_s)\,dV_s^j, \qquad i = 1, \ldots, d. \tag{3.9}$$
We assume that both $f = (f^i)_{i=1}^d : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma = (\sigma^{ij})_{i=1,\ldots,d,\,j=1,\ldots,p} : \mathbb{R}^d \to \mathbb{R}^{d \times p}$ are globally Lipschitz: that is, there exists a positive constant $K$ such that for all $x, y \in \mathbb{R}^d$ we have

$$\|f(x) - f(y)\| \le K\|x - y\|, \qquad \|\sigma(x) - \sigma(y)\| \le K\|x - y\|, \tag{3.10}$$

where the Euclidean norm $\|\cdot\|$ is defined in the usual fashion for vectors, and extended to $d \times p$-matrices by considering them as $d \times p$-dimensional vectors, viz:

$$\|\sigma\| = \sqrt{\sum_{i=1}^d \sum_{j=1}^p \sigma_{ij}^2}.$$
Under the globally Lipschitz condition, (3.9) has a unique solution by Theorem B.38. The generator $A$ associated with the process $X$ is the second-order differential operator
\[ A = \sum_{i=1}^d f^i \frac{\partial}{\partial x_i} + \sum_{i,j=1}^d a^{ij} \frac{\partial^2}{\partial x_i \partial x_j}, \tag{3.11} \]
where $a = (a^{ij})_{i,j=1,\dots,d} : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ is the matrix-valued function defined as
\[ a^{ij} = \frac{1}{2} \sum_{k=1}^p \sigma^{ik}\sigma^{jk} = \frac{1}{2} \left(\sigma\sigma^\top\right)^{ij} \tag{3.12} \]
for all $i, j = 1, \dots, d$. Recall from the definition that $Af$ must be bounded for $f \in D(A)$. There are various possible choices of the domain. For example, we can choose $D(A) = C_k^2(\mathbb{R}^d)$, the space of twice differentiable, compactly supported, continuous functions on $\mathbb{R}^d$, since $A\varphi \in B(\mathbb{R}^d)$ for all $\varphi \in C_k^2(\mathbb{R}^d)$ and the process $M^\varphi = \{M_t^\varphi,\ t \ge 0\}$ defined as in (3.3) is a martingale for any $\varphi \in C_k^2(\mathbb{R}^d)$.
Exercise 3.3. If the global Lipschitz condition (3.10) holds, show that there exists $\kappa > 0$ such that for $x \in \mathbb{R}^d$,
\[ \|\sigma(x)\|^2 \le \kappa(1 + \|x\|)^2, \tag{3.13} \]
\[ \|f(x)\| \le \kappa(1 + \|x\|). \tag{3.14} \]
Consequently show that there exists $\kappa' > 0$ such that
\[ \|\sigma(x)\sigma^\top(x)\| \le \kappa'(1 + \|x\|^2). \tag{3.15} \]
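To make the diffusion model (3.9) concrete, the following is a minimal sketch of one simulated path in the scalar case $d = p = 1$. The Euler–Maruyama discretisation, the function name `euler_maruyama` and the Ornstein–Uhlenbeck coefficients are illustrative assumptions, not part of the text; the scheme is a standard approximation for globally Lipschitz coefficients as in (3.10).

```python
import math
import random


def euler_maruyama(f, sigma, x0, t_end, n_steps, rng):
    """Approximate one path of dX = f(X) dt + sigma(X) dV on [0, t_end]."""
    dt = t_end / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        dv = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x = x + f(x) * dt + sigma(x) * dv
        path.append(x)
    return path


# Hypothetical example: f(x) = -x and sigma(x) = 0.5 are both globally Lipschitz.
rng = random.Random(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, 1.0, 1.0, 1000, rng)

# With sigma = 0 the scheme reduces to the explicit Euler method for x' = -x.
noise_free = euler_maruyama(lambda x: -x, lambda x: 0.0, 1.0, 1.0, 1000, rng)
```

Setting the diffusion coefficient to zero gives a quick sanity check, since the noise-free path should then stay close to $e^{-t}$.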
Exercise 3.4. Let $SL^2(\mathbb{R}^d)$ be the subset of all twice continuously differentiable real-valued functions on $\mathbb{R}^d$ for which there exists a constant $C$ such that for all $i, j = 1, \dots, d$ and $x \in \mathbb{R}^d$ we have
\[ |\partial_i \varphi(x)| \le \frac{C}{1 + \|x\|}, \qquad |\partial_i \partial_j \varphi(x)| \le \frac{C}{1 + \|x\|^2}. \]
Prove that $A\varphi \in B(\mathbb{R}^d)$ for all $\varphi \in SL^2(\mathbb{R}^d)$ and the process $M^\varphi$ defined as in (3.3) is a martingale for any $\varphi \in SL^2(\mathbb{R}^d)$.
We can also choose $D(A)$ to be the maximal domain of $A$. That is, $D(A)$ is the set of all $\varphi \in B(\mathbb{R}^d)$ for which $A\varphi \in B(\mathbb{R}^d)$ and $M^\varphi$ is a martingale. In the following, unless otherwise stated, we assume that $D(A)$ is the maximal domain of $A$.
Remark 3.5. The following question is interesting to answer. Under what conditions is the solution of a martingale problem associated with the second-order differential operator defined in (3.11) the solution of the SDE (3.9)? The answer is surprisingly complicated. If $D(A)$ contains the sequences $(\varphi_k^i)_{k>0}$, $(\varphi_k^{i,j})_{k>0}$ of functions in $C_k^2(\mathbb{R}^d)$ such that $\varphi_k^i(x) = x^i$ and $\varphi_k^{i,j}(x) = x^i x^j$ for $\|x\| \le k$, then there exists a $p$-dimensional Brownian motion $V$ defined on an extension $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{\mathbb{P}})$ of $(\Omega, \mathcal{F}, \mathbb{P})$ such that $X$ is a weak solution of (3.9). For details see Proposition 4.6, page 315 together with Remark 4.12, page 318 in Karatzas and Shreve [149].
3.2.2 X a Markov Process with a Finite Number of States
Let $X$ be an $\mathcal{F}_t$-adapted Markov process with values in a finite state space $I$. Then $B(S)$ is isomorphic to $\mathbb{R}^I$ and the rôle of $A$ is taken by the Q-matrix $Q = \{q_{ij}(t),\ i, j \in I,\ t \ge 0\}$ associated with the process. The Q-matrix is defined so that for all $t, h \ge 0$ as $h \to 0$, uniformly in $t$, for any $i, j \in I$,
\[ \mathbb{P}(X_{t+h} = j \mid X_t = i) = J_i(j) + q_{ij}(t)h + o(h). \tag{3.16} \]
In (3.16) $J_i$ is the indicator function of the atom $i$. In other words, $q_{ij}(t)$ is the rate at which the process jumps from site $i$ to site $j$ and $-q_{ii}(t)$ is the rate at which the process leaves site $i$. Assume that $Q$ has the properties:
a. $q_{ii}(t) \le 0$ for all $i \in I$, $q_{ij}(t) \ge 0$ for all $i, j \in I$, $i \ne j$.
b. $\sum_{j \in I} q_{ij}(t) = 0$ for all $i \in I$.
c. $\sup_{t \ge 0} |q_{ij}(t)| < \infty$ for all $i, j \in I$.
Exercise 3.6. Prove that for all $\varphi \in B(S)$, the process $M^\varphi = \{M_t^\varphi,\ t \ge 0\}$ defined as
\[ M_t^\varphi = \varphi(X_t) - \varphi(X_0) - \int_0^t Q\varphi(s, X_s)\,ds, \qquad t \ge 0, \tag{3.17} \]
is an $\mathcal{F}_t$-adapted right-continuous martingale. In (3.17), $Q\varphi : [0, \infty) \times I \to \mathbb{R}$ is defined in a natural way as
\[ (Q\varphi)(s, i) = \sum_{j \in I} q_{ij}(s)\varphi(j), \qquad \text{for all } (s, i) \in [0, \infty) \times I. \]
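As a small illustration of properties a and b and of the action of $Q$ on a function, here is a sketch for a time-homogeneous chain. The time-homogeneity, the function names `is_q_matrix` and `apply_q`, and the two-state rates are assumptions made for the example only.

```python
def is_q_matrix(Q, tol=1e-12):
    """Check properties a and b: sign pattern of the entries and zero row sums."""
    for i, row in enumerate(Q):
        if abs(sum(row)) > tol:
            return False
        if row[i] > 0:
            return False
        if any(q < 0 for j, q in enumerate(row) if j != i):
            return False
    return True


def apply_q(Q, phi):
    """Return the vector with entries (Q phi)(i) = sum_j q_ij * phi(j)."""
    return [sum(q_ij * p for q_ij, p in zip(row, phi)) for row in Q]


# Hypothetical two-state chain: leaves state 0 at rate 2 and state 1 at rate 1.
Q = [[-2.0, 2.0], [1.0, -1.0]]
phi = [0.0, 1.0]          # indicator function of state 1
q_phi = apply_q(Q, phi)   # integrand of the compensator in (3.17)
```

Because the rows of $Q$ sum to zero, `apply_q` maps constant functions to zero, which mirrors $A1 = 0$ for the diffusion generator.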
Exercise 3.7. The following is a simple example with real-world applications which fits within the above framework. Let $X = \{X_t,\ t \ge 0\}$ be the process
\[ X_t = I_{[T,\infty)}(t), \qquad t \ge 0, \]
where $T$ is a positive random variable with probability density $p$ and tail probability $g_t = \mathbb{P}(T \ge t)$, $t > 0$. Prove that the Q-matrix associated with $X$ has entries $q_{01}(t) = -q_{00}(t) = p_t/g_t$, $q_{11}(t) = q_{10}(t) = 0$. See Exercise 3.32 for more on how the associated filtering problem is solved.
Remark 3.8. We can think of $T$ as the time of a certain event occurring, for example, the failure of a piece of signal processing equipment, or the onset of a medical condition, which we would like to detect based on the information given by observing $Y$. This is the so-called change-detection filtering problem.
3.3 The Change of Probability Measure Method
This method consists in modifying the probability measure on $\Omega$, in order to transform the process $Y$ into a Brownian motion by means of Girsanov's theorem. Let $Z = \{Z_t,\ t > 0\}$ be the process defined by
\[ Z_t = \exp\left( -\sum_{i=1}^m \int_0^t h^i(X_s)\,dW_s^i - \frac{1}{2} \sum_{i=1}^m \int_0^t h^i(X_s)^2\,ds \right), \qquad t \ge 0. \tag{3.18} \]
We need to introduce conditions under which the process $Z$ is a martingale. The classical condition is Novikov's condition (see Theorem B.34). If
\[ \mathbb{E}\left[ \exp\left( \frac{1}{2} \sum_{i=1}^m \int_0^t h^i(X_s)^2\,ds \right) \right] < \infty \tag{3.19} \]
for all $t > 0$, then $Z$ is a martingale. Since (3.19) is quite difficult to verify directly, we use an alternative condition provided by the following lemma.
Lemma 3.9. Let $\xi = \{\xi_t,\ t \ge 0\}$ be a càdlàg $m$-dimensional process such that
\[ \mathbb{E}\left[ \sum_{i=1}^m \int_0^t (\xi_s^i)^2\,ds \right] < \infty \tag{3.20} \]
and $z = \{z_t,\ t > 0\}$ be the process defined as
\[ z_t = \exp\left( \sum_{i=1}^m \int_0^t \xi_s^i\,dW_s^i - \frac{1}{2} \sum_{i=1}^m \int_0^t (\xi_s^i)^2\,ds \right), \qquad t \ge 0. \tag{3.21} \]
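If the pair $(\xi, z)$ satisfies for all $t \ge 0$
\[ \mathbb{E}\left[ \sum_{i=1}^m \int_0^t z_s (\xi_s^i)^2\,ds \right] < \infty, \tag{3.22} \]
then $z$ is a martingale.
Proof. From (3.20), we see that the process
\[ t \mapsto \sum_{i=1}^m \int_0^t \xi_s^i\,dW_s^i \]
is a continuous (square-integrable) martingale with quadratic variation process
\[ t \mapsto \sum_{i=1}^m \int_0^t (\xi_s^i)^2\,ds. \]
By Itô's formula, the process $z$ satisfies the equation
\[ z_t = 1 + \sum_{i=1}^m \int_0^t z_s \xi_s^i\,dW_s^i. \]
Hence $z$ is a non-negative, continuous, local martingale and therefore by Fatou's lemma a continuous supermartingale. To prove that $z$ is a (genuine) martingale it is enough to show that it has constant expectation. Using the supermartingale property we note that $\mathbb{E}[z_t] \le \mathbb{E}[z_0] = 1$. By Itô's formula, for $\varepsilon > 0$,
\[ \frac{z_t}{1 + \varepsilon z_t} = \frac{1}{\varepsilon} - \frac{1}{\varepsilon(1 + \varepsilon z_t)} = \frac{1}{1 + \varepsilon} + \sum_{i=1}^m \int_0^t \frac{z_s}{(1 + \varepsilon z_s)^2}\,\xi_s^i\,dW_s^i - \sum_{i=1}^m \int_0^t \frac{\varepsilon z_s^2}{(1 + \varepsilon z_s)^3}\,(\xi_s^i)^2\,ds. \tag{3.23} \]
From (3.20) it follows that
\[ \mathbb{E}\left[ \sum_{i=1}^m \int_0^t \left( \frac{z_s}{(1 + \varepsilon z_s)^2} \right)^2 (\xi_s^i)^2\,ds \right] = \mathbb{E}\left[ \sum_{i=1}^m \int_0^t \frac{1}{\varepsilon^2} \left( \frac{\varepsilon z_s}{1 + \varepsilon z_s} \right)^2 \frac{1}{(1 + \varepsilon z_s)^2}\,(\xi_s^i)^2\,ds \right] \le \frac{1}{\varepsilon^2}\,\mathbb{E}\left[ \sum_{i=1}^m \int_0^t (\xi_s^i)^2\,ds \right] < \infty, \]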
hence the second term in (3.23) is a martingale with zero expectation. By taking expectation in (3.23),
\[ \mathbb{E}\left[ \frac{z_t}{1 + \varepsilon z_t} \right] = \frac{1}{1 + \varepsilon} - \mathbb{E}\left[ \sum_{i=1}^m \int_0^t \frac{\varepsilon z_s}{1 + \varepsilon z_s}\,\frac{1}{(1 + \varepsilon z_s)^2}\,z_s (\xi_s^i)^2\,ds \right]. \tag{3.24} \]
We now take the limit in (3.24) as $\varepsilon$ tends to 0. From (3.22) we obtain our claim by means of the dominated convergence theorem. □
As we require $Z$ to be a martingale in order to construct the change of measure, the preceding lemma suggests the following as a suitable condition to impose upon $h$,
\[ \mathbb{E}\left[ \int_0^t \|h(X_s)\|^2\,ds \right] < \infty, \qquad \mathbb{E}\left[ \int_0^t Z_s \|h(X_s)\|^2\,ds \right] < \infty, \qquad \forall t > 0. \tag{3.25} \]
Note that, since $X$ has càdlàg paths, the process $s \mapsto h(X_s)$ is progressively measurable. Condition (3.25) implies conditions (2.3) and (2.4) and hence $\mathcal{Y}_t$ is right continuous and $\pi_t$ has a $\mathcal{Y}_t$-adapted progressively measurable version.
Exercise 3.10. Let $X$ be the solution of (3.9). Prove that if (3.10) is satisfied and $X_0$ has finite second moment, then the second moment of $\|X_t\|$ is bounded on any finite time interval $[0, T]$. That is, there exists $G_T$ such that for all $0 \le t \le T$,
\[ \mathbb{E}[\|X_t\|^2] < G_T. \tag{3.26} \]
Further show that under the same conditions, if $X_0$ has finite third moment, then for any time interval $[0, T]$ there exists $H_T$ such that for $0 \le t \le T$,
\[ \mathbb{E}[\|X_t\|^3] < H_T. \tag{3.27} \]
[Hint: Use Gronwall's lemma, in the form of Corollary A.40 in the appendix.]
Exercise 3.11.
i. (Difficult) Let $X$ be the solution of (3.9). Prove that if condition (3.10) is satisfied and $X_0$ has finite second moment and $h$ has linear growth, that is, there exists $C$ such that
\[ \|h(x)\|^2 \le C(1 + \|x\|^2) \qquad \forall x \in \mathbb{R}^d, \tag{3.28} \]
then (3.25) is satisfied.
ii. Let $X$ be the Markov process with values in the finite state space $I$ as described in Section 3.2. Then show that (3.25) is satisfied.
Proposition 3.12. If (3.25) holds then the process $Z = \{Z_t,\ t \ge 0\}$ is an $\mathcal{F}_t$-adapted martingale.
Proof. Condition (3.25) implies condition (3.22) of Lemma 3.9, which implies the result. □
For fixed $t \ge 0$, since $Z_t > 0$ introduce a probability measure $\tilde{\mathbb{P}}^t$ on $\mathcal{F}_t$ by specifying its Radon–Nikodym derivative with respect to $\mathbb{P}$ to be given by $Z_t$, viz
\[ \left. \frac{d\tilde{\mathbb{P}}^t}{d\mathbb{P}} \right|_{\mathcal{F}_t} = Z_t. \]
It is immediate from the martingale property of $Z$ that the measures $\tilde{\mathbb{P}}^t$ form a consistent family. That is, if $A \in \mathcal{F}_t$ and $T \ge t$ then
\[ \tilde{\mathbb{P}}^T(A) = \mathbb{E}[Z_T 1_A] = \mathbb{E}\left[ \mathbb{E}[Z_T 1_A \mid \mathcal{F}_t] \right] = \mathbb{E}\left[ 1_A \mathbb{E}[Z_T \mid \mathcal{F}_t] \right] = \mathbb{E}[1_A Z_t] = \tilde{\mathbb{P}}^t(A), \]
where $\mathbb{E}$ denotes expectation with respect to the probability measure $\mathbb{P}$, a convention which we adhere to throughout this chapter. Therefore we can define a probability measure $\tilde{\mathbb{P}}$ which is equivalent to $\mathbb{P}$ on $\bigcup_{0 \le t < \infty} \mathcal{F}_t$ and we are able to suppress the superscript $t$ in subsequent calculations. It is important to realise that we have not defined a measure on $\mathcal{F}_\infty$, where
\[ \mathcal{F}_\infty = \bigvee_{t=0}^{\infty} \mathcal{F}_t = \sigma\left( \bigcup_{0 \le t < \infty} \mathcal{F}_t \right). \]
We cannot in general use the Daniell–Kolmogorov theorem here to extend the definition of $\tilde{\mathbb{P}}$ to $\mathcal{F}_\infty$. Indeed there may not exist a measure defined on $\mathcal{F}_\infty$ which agrees with $\tilde{\mathbb{P}}^t$ on $\mathcal{F}_t$ for all $0 \le t < \infty$. For a more detailed discussion of why this extension may not be possible, see Section B.3.1 of the appendix.
Proposition 3.13. If condition (3.25) is satisfied then under $\tilde{\mathbb{P}}$, the observation process $Y$ is a Brownian motion independent of $X$; additionally the law of the signal process $X$ under $\tilde{\mathbb{P}}$ is the same as its law under $\mathbb{P}$.
Proof. By Corollary B.31 to Girsanov's theorem, the process
\[ Y_t = W_t + \int_0^t h(X_s)\,ds \]
is a Brownian motion with respect to $\tilde{\mathbb{P}}$. Also, the law of the pair process $(X, Y)$ can be written as
\[ (X, Y) = (X, W) + \left( 0, \int_0^t h(X_s)\,ds \right), \]
thus on the interval $[0, t]$ where $t$ is arbitrary, the law of $(X, W)$ is absolutely continuous with respect to the law of the process $(X, Y)$, and its Radon–Nikodym derivative is $Z_t$ (see Exercise 3.14). That is, for any bounded measurable function $f$ defined on the product of the corresponding path spaces for the pair $(X, Y)$,
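\[ \mathbb{E}[f(X, Y) Z_t] = \mathbb{E}[f(X, W)], \tag{3.29} \]
where in (3.29) both processes are regarded up to time $t$. Hence
\[ \tilde{\mathbb{E}}[f(X, Y)] = \mathbb{E}[f(X, Y) Z_t] = \mathbb{E}[f(X, W)] \]
and therefore $X$ and $Y$ are independent under $\tilde{\mathbb{P}}$ since $(X, W)$ has the same joint distribution under $\mathbb{P}$ as $(X, Y)$ has under $\tilde{\mathbb{P}}$ and a priori $X$ and $W$ are independent. □
Exercise 3.14.
i. Show that the process $P = \{P_t,\ t \ge 0\}$ defined with $\beta \in \mathbb{R}^m$ as
\[ P_t = \exp\left( i\beta^\top Y_t - \frac{1}{2}\|\beta\|^2 t \right) Z_t \]
is a $\mathcal{X} \vee \mathcal{F}_t$-martingale.
ii. Deduce from (i) that for any $n \ge 1$ and $0 \le t_1 \le t_2 \le \cdots \le t_n < \infty$ and any $\beta_1, \dots, \beta_n \in \mathbb{R}^m$, we have
\[ \mathbb{E}\left[ \exp\left( \sum_{j=1}^n i\beta_j^\top Y_{t_j} \right) Z_{t_n} \,\Big|\, \mathcal{X} \right] = \mathbb{E}\left[ \exp\left( \sum_{j=1}^n i\beta_j^\top W_{t_j} \right) \Big|\, \mathcal{X} \right]. \]
iii. Deduce from (ii) that (3.29) holds true for any bounded measurable function $f$ defined on the product of the corresponding path spaces for the pair $(X, Y)$.
Let $\tilde{Z} = \{\tilde{Z}_t,\ t \ge 0\}$ be the process defined as $\tilde{Z}_t = Z_t^{-1}$ for $t \ge 0$. Under $\tilde{\mathbb{P}}$, $\tilde{Z}_t$ satisfies the following stochastic differential equation,
\[ d\tilde{Z}_t = \sum_{i=1}^m \tilde{Z}_t h^i(X_t)\,dY_t^i \tag{3.30} \]
and since $\tilde{Z}_0 = 1$,
\[ \tilde{Z}_t = \exp\left( \sum_{i=1}^m \int_0^t h^i(X_s)\,dY_s^i - \frac{1}{2} \sum_{i=1}^m \int_0^t h^i(X_s)^2\,ds \right), \tag{3.31} \]
then $\tilde{\mathbb{E}}[\tilde{Z}_t] = \mathbb{E}[\tilde{Z}_t Z_t] = 1$, so $\tilde{Z}_t$ is an $\mathcal{F}_t$-adapted martingale under $\tilde{\mathbb{P}}$ and we have
\[ \left. \frac{d\mathbb{P}}{d\tilde{\mathbb{P}}} \right|_{\mathcal{F}_t} = \tilde{Z}_t \qquad \text{for } t \ge 0. \]
Proposition 3.13 implies that under $\tilde{\mathbb{P}}$ the observation process $Y$ is a $\mathcal{Y}_t$-adapted Brownian motion; we can make use of the fact that Brownian motion is a Markov process to derive the following proposition.
Proposition 3.15. Let $U$ be an integrable $\mathcal{F}_t$-measurable random variable. Then we have
\[ \tilde{\mathbb{E}}[U \mid \mathcal{Y}_t] = \tilde{\mathbb{E}}[U \mid \mathcal{Y}]. \tag{3.32} \]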
Proof. Let us denote by
\[ \mathcal{Y}_t' = \sigma(Y_{t+u} - Y_t;\ u \ge 0); \]
then $\mathcal{Y} = \sigma(\mathcal{Y}_t, \mathcal{Y}_t')$. Under the probability measure $\tilde{\mathbb{P}}$ the σ-algebra $\mathcal{Y}_t' \subset \mathcal{Y}$ is independent of $\mathcal{F}_t$ because $Y$ is an $\mathcal{F}_t$-adapted Brownian motion. Hence since $U$ is $\mathcal{F}_t$-adapted, using property (f) of conditional expectation,
\[ \tilde{\mathbb{E}}[U \mid \mathcal{Y}_t] = \tilde{\mathbb{E}}[U \mid \sigma(\mathcal{Y}_t, \mathcal{Y}_t')] = \tilde{\mathbb{E}}[U \mid \mathcal{Y}]. \]
□
This proposition is an important step in the change of measure route to deriving the equations of non-linear filtering. It allows us to replace the time-dependent family of σ-algebras $\mathcal{Y}_t$ in the conditional expectations with the fixed σ-algebra $\mathcal{Y}$. This enables us to use techniques based on results from Kolmogorov conditional expectation which would not be applicable if the conditioning set were time dependent (as in the case of $\mathcal{Y}_t$). The proposition also has an interesting physical interpretation: the solution of the filtering problem for an $\mathcal{F}_t$-adapted random variable $U$ given all observations (future, present and past) is equal to $\tilde{\mathbb{E}}[U \mid \mathcal{Y}_t]$; that is, future observations will not influence the estimator.
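3.4 Unnormalised Conditional Distribution
In this section we first prove the Kallianpur–Striebel formula and use this to define the unnormalised conditional distribution process. The notation $\tilde{\mathbb{P}}(\mathbb{P})$-a.s. in Proposition 3.16 means that the result holds both $\tilde{\mathbb{P}}$-a.s. and $\mathbb{P}$-a.s. We only need to show that it holds true in the first sense since $\tilde{\mathbb{P}}$ and $\mathbb{P}$ are equivalent probability measures.
Proposition 3.16 (Kallianpur–Striebel). Assume that condition (3.25) holds. For every $\varphi \in B(S)$, for fixed $t \in [0, \infty)$,
\[ \pi_t(\varphi) = \frac{\tilde{\mathbb{E}}[\tilde{Z}_t \varphi(X_t) \mid \mathcal{Y}]}{\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}]} \qquad \tilde{\mathbb{P}}(\mathbb{P})\text{-a.s.} \tag{3.33} \]
Proof. It is clear from the definition that $\tilde{Z}_t \ge 0$; furthermore it is readily observed that
\[ 0 = \tilde{\mathbb{E}}\left[ 1_{\{\tilde{Z}_t = 0\}} \tilde{Z}_t \right] = \mathbb{E}\left[ 1_{\{\tilde{Z}_t = 0\}} \right] = \mathbb{P}(\tilde{Z}_t = 0), \]
whence it follows that $\tilde{Z}_t > 0$ $\mathbb{P}$-a.s., as a consequence of which $\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}] > 0$ $\tilde{\mathbb{P}}$-a.s. and the right-hand side of (3.33) is well defined. Hence using Proposition 3.15 it suffices to show that
\[ \pi_t(\varphi)\,\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] = \tilde{\mathbb{E}}[\tilde{Z}_t \varphi(X_t) \mid \mathcal{Y}_t] \qquad \tilde{\mathbb{P}}\text{-a.s.} \]
As both the left- and right-hand sides of this equation are $\mathcal{Y}_t$-measurable, this is equivalent to showing that for any bounded $\mathcal{Y}_t$-measurable random variable $b$,
\[ \tilde{\mathbb{E}}\left[ \pi_t(\varphi)\,\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t]\,b \right] = \tilde{\mathbb{E}}\left[ \tilde{\mathbb{E}}[\tilde{Z}_t \varphi(X_t) \mid \mathcal{Y}_t]\,b \right]. \]
A consequence of the definition of the process $\pi_t$ is that $\pi_t\varphi = \mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t]$ $\mathbb{P}$-a.s., so from the definition of Kolmogorov conditional expectation
\[ \mathbb{E}[\pi_t(\varphi) b] = \mathbb{E}[\varphi(X_t) b]. \]
Writing this under the measure $\tilde{\mathbb{P}}$,
\[ \tilde{\mathbb{E}}\left[ \pi_t(\varphi)\,b\,\tilde{Z}_t \right] = \tilde{\mathbb{E}}\left[ \varphi(X_t)\,b\,\tilde{Z}_t \right]. \]
By the tower property of the conditional expectation, since by assumption the function $b$ is $\mathcal{Y}_t$-measurable,
\[ \tilde{\mathbb{E}}\left[ \pi_t(\varphi)\,\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t]\,b \right] = \tilde{\mathbb{E}}\left[ \tilde{\mathbb{E}}[\varphi(X_t) \tilde{Z}_t \mid \mathcal{Y}_t]\,b \right], \]
which proves that the result holds $\tilde{\mathbb{P}}$-a.s. □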
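The Kallianpur–Striebel formula (3.33) can be read as a weighted average: under $\tilde{\mathbb{P}}$ the signal is independent of $Y$ and keeps its law, so independent copies of $X$ can be weighted by a discretised version of (3.31). The sketch below assumes a scalar signal and observation and left-endpoint Riemann and Itô sums; the function name `ks_estimate` and the toy data are hypothetical, and no claim is made about the accuracy of this naive estimator.

```python
import math


def ks_estimate(paths, dy, dt, h, phi):
    """Weighted estimate of pi_t(phi) from signal paths and observation increments.

    paths: signal paths sampled on the time grid (each one value longer than dy)
    dy:    increments of the observed path Y over the grid
    """
    num = 0.0
    den = 0.0
    for path in paths:
        log_w = 0.0
        for x, inc in zip(path, dy):  # left-endpoint sums approximating (3.31)
            log_w += h(x) * inc - 0.5 * h(x) ** 2 * dt
        weight = math.exp(log_w)      # discretised Z-tilde for this path
        num += weight * phi(path[-1])
        den += weight
    return num / den


# Toy data: two constant signal paths and two observation increments.
paths = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
dy = [0.1, 0.1]
estimate = ks_estimate(paths, dy, 0.1, lambda x: x, lambda x: x)
```

When $h \equiv 0$ all weights equal one and the estimate reduces to the plain sample mean, reflecting that the observations then carry no information about the signal.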
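Let $\zeta = \{\zeta_t,\ t \ge 0\}$ be the process defined by
\[ \zeta_t = \tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t], \tag{3.34} \]
then as $\tilde{Z}_t$ is an $\mathcal{F}_t$-martingale under $\tilde{\mathbb{P}}$ and $\mathcal{Y}_s \subseteq \mathcal{F}_s$, it follows that for $0 \le s < t$,
\[ \tilde{\mathbb{E}}[\zeta_t \mid \mathcal{Y}_s] = \tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_s] = \tilde{\mathbb{E}}\left[ \tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{F}_s] \mid \mathcal{Y}_s \right] = \tilde{\mathbb{E}}[\tilde{Z}_s \mid \mathcal{Y}_s] = \zeta_s. \]
Therefore by Doob's regularization theorem (see Rogers and Williams [248, Theorem II.67.7]), since the filtration $\mathcal{Y}_t$ satisfies the usual conditions, we can choose a càdlàg version of $\zeta_t$ which is a $\mathcal{Y}_t$-martingale. In what follows, assume that $\{\zeta_t,\ t \ge 0\}$ has been chosen to be such a version. Given such a $\zeta$, Proposition 3.16 suggests the following definition.
Definition 3.17. Define the unnormalised conditional distribution of $X$ to be the measure-valued process $\rho = \{\rho_t,\ t \ge 0\}$ which is determined (see Theorem 2.13) by the values of $\rho_t(\varphi)$ for $\varphi \in B(S)$ which are given for $t \ge 0$ by
\[ \rho_t(\varphi) \triangleq \pi_t(\varphi)\zeta_t. \]
Lemma 3.18. The process $\{\rho_t,\ t \ge 0\}$ is càdlàg and $\mathcal{Y}_t$-adapted. Furthermore, for any $t \ge 0$,
\[ \rho_t(\varphi) = \tilde{\mathbb{E}}\left[ \tilde{Z}_t \varphi(X_t) \mid \mathcal{Y}_t \right] \qquad \tilde{\mathbb{P}}(\mathbb{P})\text{-a.s.} \tag{3.35} \]
Proof. Both $\pi_t(\varphi)$ and $\zeta_t$ are $\mathcal{Y}_t$-adapted. By construction $\{\zeta_t,\ t \ge 0\}$ is also càdlàg.† By Theorem 2.24 and Corollary 2.26 $\{\pi_t,\ t \ge 0\}$ is càdlàg and $\mathcal{Y}_t$-adapted, therefore the process $\{\rho_t,\ t \ge 0\}$ is also càdlàg and $\mathcal{Y}_t$-adapted. For the second part, from Proposition 3.15 and Proposition 3.16 it follows that
\[ \pi_t(\varphi)\,\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] = \tilde{\mathbb{E}}[\tilde{Z}_t \varphi(X_t) \mid \mathcal{Y}_t] \qquad \tilde{\mathbb{P}}\text{-a.s.} \]
From (3.34), $\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] = \zeta_t$ a.s., from which the result follows. □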
† It is in fact the case that $\zeta_t = \exp\left( \int_0^t \pi_s(h^\top)\,dY_s - \frac{1}{2} \int_0^t \|\pi_s(h)\|^2\,ds \right)$; see Lemma 3.29.
It may be useful to point out that for general $\varphi$, the process $\rho_t(\varphi)$ is not a $\mathcal{Y}_t$-martingale but a semimartingale. This misconception arising from (2.8) is due to confusion with the well-known result that taking conditional expectation of an integrable random variable $Z$ with respect to the family $\mathcal{Y}_t$ gives rise to a (uniformly integrable) martingale $\mathbb{E}[Z \mid \mathcal{Y}_t]$. But this is only true for a fixed random variable $Z$ which does not depend upon $t$.
Corollary 3.19. Assume that condition (3.25) holds. For every $\varphi \in B(S)$,
\[ \pi_t(\varphi) = \frac{\rho_t(\varphi)}{\rho_t(1)} \qquad \forall t \in [0, \infty) \qquad \tilde{\mathbb{P}}(\mathbb{P})\text{-a.s.} \tag{3.36} \]
Proof. It is clear from Definition 3.17 that $\zeta_t = \rho_t(1)$. The result then follows immediately. □
The Kallianpur–Striebel formula explains the usage of the term unnormalised in the definition of $\rho_t$ as the denominator $\rho_t(1)$ can be viewed as the normalising factor. The result can also be viewed as the abstract version of Bayes' identity in this filtering framework. In theory at least the Kallianpur–Striebel formula provides a method for solving the filtering problem.
Remark 3.20. The Kallianpur–Striebel formula (3.33) holds true for any Borel-measurable $\varphi$, not necessarily bounded, such that $\mathbb{E}[|\varphi(X_t)|] < \infty$; see Exercise 5.1 for details.
Lemma 3.21.
i. Let $\{u_t,\ t \ge 0\}$ be an $\mathcal{F}_t$-progressively measurable process such that for all $t \ge 0$, we have
\[ \tilde{\mathbb{E}}\left[ \int_0^t u_s^2\,ds \right] < \infty; \tag{3.37} \]
then, for all $t \ge 0$ and $j = 1, \dots, m$, we have
\[ \tilde{\mathbb{E}}\left[ \int_0^t u_s\,dY_s^j \,\Big|\, \mathcal{Y} \right] = \int_0^t \tilde{\mathbb{E}}[u_s \mid \mathcal{Y}]\,dY_s^j. \tag{3.38} \]
ii. Now let $\{u_t,\ t \ge 0\}$ be an $\mathcal{F}_t$-progressively measurable process such that for all $t \ge 0$, we have
\[ \tilde{\mathbb{E}}\left[ \int_0^t u_s^2\,d\langle M^\varphi \rangle_s \right] < \infty; \tag{3.39} \]
then
\[ \tilde{\mathbb{E}}\left[ \int_0^t u_s\,dM_s^\varphi \,\Big|\, \mathcal{Y} \right] = 0. \tag{3.40} \]
Proof. i. Every $\varepsilon_t$ from the total set $S_t$ as defined in Lemma B.39 satisfies the following stochastic differential equation
\[ \varepsilon_t = 1 + \int_0^t i\varepsilon_s r_s^\top\,dY_s. \]
We observe the following sequence of identities
\begin{align*}
\tilde{\mathbb{E}}\left[ \varepsilon_t\,\tilde{\mathbb{E}}\left[ \int_0^t u_s\,dY_s^j \,\Big|\, \mathcal{Y} \right] \right]
&= \tilde{\mathbb{E}}\left[ \varepsilon_t \int_0^t u_s\,dY_s^j \right] \\
&= \tilde{\mathbb{E}}\left[ \int_0^t u_s\,dY_s^j \right] + \tilde{\mathbb{E}}\left[ \int_0^t i\varepsilon_s r_s^j u_s\,ds \right] \\
&= \tilde{\mathbb{E}}\left[ \tilde{\mathbb{E}}\left[ \int_0^t i\varepsilon_s r_s^j u_s\,ds \,\Big|\, \mathcal{Y} \right] \right] \\
&= \tilde{\mathbb{E}}\left[ \int_0^t i\varepsilon_s r_s^j\,\tilde{\mathbb{E}}[u_s \mid \mathcal{Y}]\,ds \right] \\
&= \tilde{\mathbb{E}}\left[ \varepsilon_t \int_0^t \tilde{\mathbb{E}}[u_s \mid \mathcal{Y}]\,dY_s^j \right],
\end{align*}
which completes the proof of (3.38).
ii. Since for all $\varphi \in D(A)$, $\{M_t^\varphi, \mathcal{F}_t\}$ is a square integrable martingale, we can define the Itô integral with respect to it. The proof of (3.40) is similar to that of (3.38). We once again choose $\varepsilon_t$ from the set $S_t$ and obtain the following sequence of identities (we use the fact that the quadratic covariation between $M_t^\varphi$ and $Y$ is 0).
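\begin{align*}
\tilde{\mathbb{E}}\left[ \varepsilon_t\,\tilde{\mathbb{E}}\left[ \int_0^t u_s\,dM_s^\varphi \,\Big|\, \mathcal{Y} \right] \right]
&= \tilde{\mathbb{E}}\left[ \varepsilon_t \int_0^t u_s\,dM_s^\varphi \right] \\
&= \tilde{\mathbb{E}}\left[ \int_0^t u_s\,dM_s^\varphi \right] + \sum_{j=1}^m \tilde{\mathbb{E}}\left[ \left\langle \int_0^\cdot i\varepsilon_s r_s^j\,dY_s^j, \int_0^\cdot u_s\,dM_s^\varphi \right\rangle_t \right] \\
&= \tilde{\mathbb{E}}\left[ \int_0^t u_s\,dM_s^\varphi \right] + \sum_{j=1}^m \tilde{\mathbb{E}}\left[ \int_0^t i\varepsilon_s r_s^j u_s\,d\langle M_\cdot^\varphi, Y_\cdot^j \rangle_s \right] \\
&= \tilde{\mathbb{E}}\left[ \int_0^t u_s\,dM_s^\varphi \right] \\
&= 0,
\end{align*}
where the final equality follows from the fact that the condition (3.39) ensures that the stochastic integral is a martingale. □
Exercise 3.22. Prove that if $\varphi, \varphi^2 \in D(A)$ then
\[ \langle M^\varphi \rangle_t = \int_0^t \left( A\varphi^2 - 2\varphi A\varphi \right)(X_s)\,ds. \tag{3.41} \]
Hence, show in this case that condition (3.37) implies condition (3.39) of Lemma 3.21.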
3.5 The Zakai Equation
In the following, we further assume that for all $t \ge 0$,
\[ \tilde{\mathbb{P}}\left[ \int_0^t [\rho_s(\|h\|)]^2\,ds < \infty \right] = 1. \tag{3.42} \]
Exercise 3.25 gives some convenient conditions under which (3.42) holds for the two example classes of signal processes considered in this chapter.
Exercise 3.23. Show that the stochastic integral $\int_0^t \rho_s(\varphi h^\top)\,dY_s$ is well defined for any $\varphi \in B(S)$ under condition (3.42). Hence the process
\[ t \mapsto \int_0^t \rho_s(\varphi h^\top)\,dY_s \]
is a local martingale with paths which are almost surely continuous, since it is $\mathcal{Y}_t$-adapted and $(\mathcal{Y}_t)_{t \ge 0}$ is a Brownian filtration.
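Theorem 3.24. If conditions (3.25) and (3.42) are satisfied then the process $\rho_t$ satisfies the following evolution equation, called the Zakai equation,
\[ \rho_t(\varphi) = \pi_0(\varphi) + \int_0^t \rho_s(A\varphi)\,ds + \int_0^t \rho_s(\varphi h^\top)\,dY_s, \qquad \tilde{\mathbb{P}}\text{-a.s.}\ \forall t \ge 0 \tag{3.43} \]
for any $\varphi \in D(A)$.
Proof. We first approximate $\tilde{Z}_t$ with $\tilde{Z}_t^\varepsilon$ given by
\[ \tilde{Z}_t^\varepsilon = \frac{\tilde{Z}_t}{1 + \varepsilon\tilde{Z}_t}. \]
Using Itô's rule and integration by parts, we find
\begin{align*}
d\left( \tilde{Z}_t^\varepsilon \varphi(X_t) \right) &= \tilde{Z}_t^\varepsilon A\varphi(X_t)\,dt + \tilde{Z}_t^\varepsilon\,dM_t^\varphi - \varepsilon\varphi(X_t)(1 + \varepsilon\tilde{Z}_t)^{-3}\tilde{Z}_t^2\|h(X_t)\|^2\,dt \\
&\quad + \varphi(X_t)(1 + \varepsilon\tilde{Z}_t)^{-2}\tilde{Z}_t h^\top(X_t)\,dY_t.
\end{align*}
Since $\tilde{Z}_t^\varepsilon$ is bounded, (3.39) is satisfied; hence by Lemma 3.21
\[ \tilde{\mathbb{E}}\left[ \int_0^t \tilde{Z}_s^\varepsilon\,dM_s^\varphi \,\Big|\, \mathcal{Y} \right] = 0. \]
Also
\begin{align*}
\tilde{\mathbb{E}}\left[ \int_0^t \varphi^2(X_s)\,\frac{1}{\varepsilon^2} \left( \frac{\varepsilon\tilde{Z}_s}{1 + \varepsilon\tilde{Z}_s} \right)^2 \frac{1}{(1 + \varepsilon\tilde{Z}_s)^2}\,\|h(X_s)\|^2\,ds \right]
&\le \frac{\|\varphi\|_\infty^2}{\varepsilon^2}\,\tilde{\mathbb{E}}\left[ \int_0^t \|h(X_s)\|^2\,ds \right] \\
&= \frac{\|\varphi\|_\infty^2}{\varepsilon^2}\,\mathbb{E}\left[ \int_0^t Z_s\|h(X_s)\|^2\,ds \right] < \infty,
\end{align*}
where the final inequality is a consequence of (3.25). Therefore condition (3.37) is satisfied. Hence, by taking conditional expectation with respect to $\mathcal{Y}$ and applying (3.38) and (3.40), we obtain
\begin{align*}
\tilde{\mathbb{E}}[\tilde{Z}_t^\varepsilon \varphi(X_t) \mid \mathcal{Y}] &= \frac{\pi_0(\varphi)}{1 + \varepsilon} + \int_0^t \tilde{\mathbb{E}}[\tilde{Z}_s^\varepsilon A\varphi(X_s) \mid \mathcal{Y}]\,ds \\
&\quad - \int_0^t \tilde{\mathbb{E}}\left[ \varepsilon\varphi(X_s)(\tilde{Z}_s^\varepsilon)^2\,\frac{1}{1 + \varepsilon\tilde{Z}_s}\,\|h(X_s)\|^2 \,\Big|\, \mathcal{Y} \right] ds \\
&\quad + \int_0^t \tilde{\mathbb{E}}\left[ \tilde{Z}_s^\varepsilon\,\frac{1}{1 + \varepsilon\tilde{Z}_s}\,\varphi(X_s)h^\top(X_s) \,\Big|\, \mathcal{Y} \right] dY_s. \tag{3.44}
\end{align*}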
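Now let $\varepsilon$ tend to 0. We have, writing $\lambda$ for Lebesgue measure on $[0, \infty)$,
\begin{align*}
\lim_{\varepsilon \to 0} \tilde{Z}_t^\varepsilon &= \tilde{Z}_t, \\
\lim_{\varepsilon \to 0} \tilde{\mathbb{E}}[\tilde{Z}_t^\varepsilon \varphi(X_t) \mid \mathcal{Y}] &= \rho_t(\varphi), \qquad \tilde{\mathbb{P}}\text{-a.s.} \\
\lim_{\varepsilon \to 0} \tilde{\mathbb{E}}[\tilde{Z}_t^\varepsilon A\varphi(X_t) \mid \mathcal{Y}] &= \rho_t(A\varphi), \qquad \lambda \otimes \tilde{\mathbb{P}}\text{-a.e.}
\end{align*}
This last sequence remains bounded by the random variable $\|A\varphi\|_\infty\,\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}]$, which can be seen to be in $L^1([0, t] \times \Omega;\ \lambda \otimes \tilde{\mathbb{P}})$ since
\[ \tilde{\mathbb{E}}\left[ \int_0^t \|A\varphi\|_\infty\,\tilde{\mathbb{E}}[\tilde{Z}_s \mid \mathcal{Y}]\,ds \right] \le \|A\varphi\|_\infty \int_0^t \tilde{\mathbb{E}}[\tilde{Z}_s]\,ds \le \|A\varphi\|_\infty\,t < \infty. \]
Consequently by the conditional form of the dominated convergence theorem as $\varepsilon \to 0$,
\[ \tilde{\mathbb{E}}\left[ \int_0^t \tilde{\mathbb{E}}[\tilde{Z}_s^\varepsilon A\varphi(X_s) \mid \mathcal{Y}]\,ds \,\Big|\, \mathcal{Y} \right] \to \tilde{\mathbb{E}}\left[ \int_0^t \rho_s(A\varphi)\,ds \,\Big|\, \mathcal{Y} \right], \qquad \tilde{\mathbb{P}}\text{-a.s.} \]
Using the definition of $\rho_t$, we see that by Fubini's theorem
\[ \int_0^t \tilde{\mathbb{E}}[\tilde{Z}_s^\varepsilon A\varphi(X_s) \mid \mathcal{Y}]\,ds \to \int_0^t \rho_s(A\varphi)\,ds, \qquad \tilde{\mathbb{P}}\text{-a.s.} \]
Next we have that for almost every $t$,
\[ \lim_{\varepsilon \to 0} \varepsilon\varphi(X_s)(\tilde{Z}_s^\varepsilon)^2(1 + \varepsilon\tilde{Z}_s)^{-1}\|h(X_s)\|^2 = 0, \qquad \tilde{\mathbb{P}}\text{-a.s.,} \]
and
\begin{align*}
\left| \varepsilon\varphi(X_s)(\tilde{Z}_s^\varepsilon)^2(1 + \varepsilon\tilde{Z}_s)^{-1}\|h(X_s)\|^2 \right|
&= \left| \varphi(X_s) \right| \tilde{Z}_s\,\frac{\varepsilon\tilde{Z}_s}{1 + \varepsilon\tilde{Z}_s}\,(1 + \varepsilon\tilde{Z}_s)^{-2}\,\|h(X_s)\|^2 \\
&\le \|\varphi\|_\infty \tilde{Z}_s\|h(X_s)\|^2. \tag{3.45}
\end{align*}
The right-hand side of (3.45) is integrable over $[0, t] \times \Omega$ with respect to $\lambda \otimes \tilde{\mathbb{P}}$ using (3.25):
\[ \tilde{\mathbb{E}}\left[ \int_0^t \tilde{Z}_s\|h(X_s)\|^2\,ds \right] = \mathbb{E}\left[ \int_0^t \|h(X_s)\|^2\,ds \right] < \infty. \]
Thus we can use the conditional form of the dominated convergence theorem to obtain that
\[ \lim_{\varepsilon \to 0} \int_0^t \varepsilon\,\tilde{\mathbb{E}}\left[ \varphi(X_s)(\tilde{Z}_s^\varepsilon)^2(1 + \varepsilon\tilde{Z}_s)^{-1}\|h(X_s)\|^2 \,\Big|\, \mathcal{Y} \right] ds = 0. \]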
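To complete the proof it only remains to show that as $\varepsilon \to 0$,
\[ \int_0^t \tilde{\mathbb{E}}\left[ \tilde{Z}_s^\varepsilon\,\frac{1}{1 + \varepsilon\tilde{Z}_s}\,\varphi(X_s)h^\top(X_s) \,\Big|\, \mathcal{Y} \right] dY_s \to \int_0^t \rho_s(\varphi h^\top)\,dY_s. \tag{3.46} \]
Consider the process
\[ t \mapsto \int_0^t \tilde{\mathbb{E}}\left[ \tilde{Z}_s^\varepsilon\,\frac{1}{1 + \varepsilon\tilde{Z}_s}\,\varphi(X_s)h^\top(X_s) \,\Big|\, \mathcal{Y} \right] dY_s; \tag{3.47} \]
we show that this is a martingale. By Jensen's inequality, Fubini's theorem and (3.25),
\begin{align*}
\tilde{\mathbb{E}}\left[ \int_0^t \left\| \tilde{\mathbb{E}}\left[ \tilde{Z}_s^\varepsilon\,\frac{1}{1 + \varepsilon\tilde{Z}_s}\,\varphi(X_s)h^\top(X_s) \,\Big|\, \mathcal{Y} \right] \right\|^2 ds \right]
&\le \frac{\|\varphi\|_\infty^2}{\varepsilon^2}\,\tilde{\mathbb{E}}\left[ \int_0^t \tilde{\mathbb{E}}[\|h(X_s)\|^2 \mid \mathcal{Y}]\,ds \right] \\
&= \frac{\|\varphi\|_\infty^2}{\varepsilon^2} \int_0^t \tilde{\mathbb{E}}[\|h(X_s)\|^2]\,ds \\
&= \frac{\|\varphi\|_\infty^2}{\varepsilon^2}\,\mathbb{E}\left[ \int_0^t Z_s\|h(X_s)\|^2\,ds \right] \\
&< \infty.
\end{align*}
Thus the process defined in (3.47) is an $\mathcal{F}_t$-martingale. From condition (3.42) and Exercise 3.23 the postulated limit process as $\varepsilon \to 0$,
\[ t \mapsto \int_0^t \rho_s(\varphi h^\top)\,dY_s, \tag{3.48} \]
is a well defined local martingale. Thus the difference of (3.47) and (3.48) is a well defined local martingale,
\[ t \mapsto \int_0^t \tilde{\mathbb{E}}\left[ \frac{\varepsilon\tilde{Z}_s^2(2 + \varepsilon\tilde{Z}_s)}{(1 + \varepsilon\tilde{Z}_s)^2}\,\varphi(X_s)h^\top(X_s) \,\Big|\, \mathcal{Y} \right] dY_s. \tag{3.49} \]
We use Proposition B.41 to prove that the integral in (3.49) converges to 0, $\tilde{\mathbb{P}}$-almost surely. Since, for all $i = 1, \dots, m$,
\[ \lim_{\varepsilon \to 0} \frac{\varepsilon\tilde{Z}_s^2(2 + \varepsilon\tilde{Z}_s)}{(1 + \varepsilon\tilde{Z}_s)^2}\,\varphi(X_s)h^i(X_s) = 0, \qquad \tilde{\mathbb{P}}\text{-a.s.} \]
and
\[ \left| \tilde{Z}_s\,\frac{\varepsilon\tilde{Z}_s}{1 + \varepsilon\tilde{Z}_s}\,\frac{2 + \varepsilon\tilde{Z}_s}{1 + \varepsilon\tilde{Z}_s}\,\varphi(X_s)h^i(X_s) \right| \le 2\|\varphi\|_\infty \tilde{Z}_s \left| h^i(X_s) \right|, \tag{3.50} \]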
using (3.25) it follows that for Lebesgue a.e. $s \ge 0$, the right-hand side is $\tilde{\mathbb{P}}$-integrable, and hence it follows by the dominated convergence theorem that for almost every $s \ge 0$,
\[ \lim_{\varepsilon \to 0} \tilde{\mathbb{E}}\left[ \frac{\varepsilon\tilde{Z}_s^2(2 + \varepsilon\tilde{Z}_s)}{(1 + \varepsilon\tilde{Z}_s)^2}\,\varphi(X_s)h^i(X_s) \,\Big|\, \mathcal{Y} \right] = 0, \qquad \tilde{\mathbb{P}}\text{-a.s.} \]
As a consequence of (3.50),
\[ \left| \tilde{\mathbb{E}}\left[ \frac{\varepsilon\tilde{Z}_s^2(2 + \varepsilon\tilde{Z}_s)}{(1 + \varepsilon\tilde{Z}_s)^2}\,\varphi(X_s)h^i(X_s) \,\Big|\, \mathcal{Y} \right] \right| \le 2\|\varphi\|_\infty\,\rho_s(\|h\|), \]
and using the assumed condition (3.42), it follows that $\tilde{\mathbb{P}}$-a.s.
\[ \int_0^t \left( \tilde{\mathbb{E}}\left[ \frac{\varepsilon\tilde{Z}_s^2(2 + \varepsilon\tilde{Z}_s)}{(1 + \varepsilon\tilde{Z}_s)^2}\,\varphi(X_s)h^i(X_s) \,\Big|\, \mathcal{Y} \right] \right)^2 ds \le 4\|\varphi\|_\infty^2 \int_0^t [\rho_s(\|h\|)]^2\,ds < \infty. \]
Thus using the dominated convergence theorem for $L^2([0, t])$, we obtain that
\[ \int_0^t \sum_{i=1}^m \left( \tilde{\mathbb{E}}\left[ \frac{\varepsilon\tilde{Z}_s^2(2 + \varepsilon\tilde{Z}_s)}{(1 + \varepsilon\tilde{Z}_s)^2}\,\varphi(X_s)h^i(X_s) \,\Big|\, \mathcal{Y} \right] \right)^2 ds \to 0 \qquad \tilde{\mathbb{P}}\text{-a.s.} \tag{3.51} \]
Because this convergence only holds almost surely we cannot apply the Itô isometry to conclude that the stochastic integrals in (3.46) converge. However, Proposition B.41 of the appendix is applicable as a consequence of (3.51), which establishes the convergence in (3.46).† □
† The convergence established in Proposition B.41 is in probability only. Therefore the convergence in (3.46) follows for a suitably chosen sequence $(\varepsilon_n)$ such that $\varepsilon_n \to 0$. The theorem follows by taking the limit in (3.44) as $\varepsilon_n \to 0$.
Exercise 3.25.
i. (Difficult) Let $X$ be the solution of (3.9). Prove that if (3.10) is satisfied, $X_0$ has finite third moment and $h$ has linear growth (3.28), then (3.42) is satisfied. [Hint: Use the result of Exercise 3.10.]
ii. Let $X$ be the Markov process with values in the finite state space $I$ as described in Section 3.2. Then (3.42) is satisfied.
Remark 3.26. If $X$ is a Markov process with finite state space $I$, then the Zakai equation is, in fact, a (finite-dimensional) linear stochastic differential equation. To see this, let us define by $\rho_t^i$ the mass that $\rho_t$ puts on site $\{i\}$ for any $i \in I$. In particular,
\[ \rho_t^i = \rho_t(\{i\}) = \tilde{\mathbb{E}}[J_i(X_t)\tilde{Z}_t \mid \mathcal{Y}_t], \qquad i \in I, \]
where $J_i$ is the indicator function of the singleton set $\{i\}$ and for an arbitrary function $\varphi : I \to \mathbb{R}$, we have
\[ \rho_t(\varphi) = \sum_{i \in I} \varphi(i)\rho_t^i. \]
Hence the measure $\rho_t$ and the $|I|$-dimensional vector $(\rho_t^i)_{i \in I}$ can be identified as one and the same object and from (3.43) we get that
\[ \rho_t(\varphi) = \sum_{i \in I} \varphi(i)\rho_t^i = \sum_{i \in I} \varphi(i) \left( \pi_0^i + \int_0^t \sum_{j \in I} Q_{ji}\rho_s^j\,ds + \sum_{j=1}^m \int_0^t \rho_s^i h^j(i)\,dY_s^j \right). \]
Hence $\rho_t = (\rho_t^i)_{i \in I}$ satisfies the $|I|$-dimensional linear stochastic differential equation
\[ \rho_t = \pi_0 + \int_0^t Q^\top \rho_s\,ds + \sum_{j=1}^m \int_0^t H^j \rho_s\,dY_s^j, \tag{3.52} \]
where, for $j = 1, \dots, m$, $H^j = \operatorname{diag}(h^j)$ is the $|I| \times |I|$ diagonal matrix with entries $H^j_{ii} = h^j_i$, and $\pi_0$ is the $|I|$-dimensional vector with entries
\[ \pi_0^i = \pi_0(\{i\}) = \mathbb{P}(X_0 = i). \]
The use of the same notation for the vector and the corresponding measure is warranted for the same reasons as above. Evidently, due to its linearity, (3.52) has a unique solution.
Exercise 3.27. Let $X$ be a Markov process with finite state space $I$ with associated Q-matrix $Q$ and $\pi = \{(\pi_t^i)_{i \in I},\ t \ge 0\}$ be the conditional distribution of $X$ given the σ-algebra $\mathcal{Y}_t$ viewed as a process with values in $\mathbb{R}^I$.
i. Deduce from (3.52) that the $|I|$-dimensional process $\pi$ solves the following (non-linear) stochastic differential equation,
\[ \pi_t = \pi_0 + \int_0^t Q^\top \pi_s\,ds + \sum_{j=1}^m \int_0^t \left( H^j - \pi_s(h^j) I_{|I|} \right) \pi_s \left( dY_s^j - \pi_s(h^j)\,ds \right), \tag{3.53} \]
where $I_{|I|}$ is the identity matrix of size $|I|$.
ii. Prove that (3.53) has a unique solution in the space of continuous $\mathcal{Y}_t$-adapted $|I|$-dimensional processes.
Remark 3.28. There is a corresponding treatment of the Zakai equation for the case where $S = \mathbb{R}^d$ and $X$ is the solution of the stochastic differential equation (3.9). This is done in Chapter 7. In this case, $\rho_t$ can no longer be associated with a finite-dimensional object (a vector). Under additional assumptions, it can be associated with functions defined on $\mathbb{R}^d$ which represent the density of the measure $\rho_t$ with respect to the Lebesgue measure. The analysis goes in two steps. First one needs to make sense of the stochastic partial differential equation satisfied by the density of $\rho_t$ (the equivalent of (3.52)). That is, one shows the existence and uniqueness of its solution in a suitably chosen space of functions. Next one shows that the measure with that given density solves the Zakai equation, which has been established beforehand to have a unique solution. This implies that $\rho_t$ has the solution of the stochastic partial differential equation as its density with respect to the Lebesgue measure.
3.6 The Kushner–Stratonovich Equation
An equation has been derived for the unnormalised conditional distribution $\rho$. In order to solve the filtering problem the normalised conditional distribution $\pi$ is required. In this section an equation is derived which $\pi$ satisfies. The condition (2.4) viz:
\[ \mathbb{P}\left[ \int_0^t \|\pi_s(h)\|^2\,ds < \infty \right] = 1, \qquad \text{for all } t \ge 0, \tag{3.54} \]
turns out to be fundamental to the derivation of the Kushner–Stratonovich equation by various methods. This technical condition (3.54) is unfortunate since it depends on the process $\pi$ which we are trying to find, rather than being a direct condition on the system. It is, however, a consequence of the stronger condition which was required for the change of measure approach to the derivation of the Zakai equation, which is the first part of (3.25), since $\pi_t$ is a probability measure for all $t \in [0, \infty)$.
Lemma 3.29. If conditions (3.25) and (3.42) are satisfied then the process $t \mapsto \rho_t(1)$ has the following explicit representation,
\[ \rho_t(1) = \exp\left( \int_0^t \pi_s(h^\top)\,dY_s - \frac{1}{2} \int_0^t \pi_s(h^\top)\pi_s(h)\,ds \right). \tag{3.55} \]
Proof. Because $h$ is not bounded, it is not automatic that $\pi_t(h)$ is defined ($h$ might not be integrable with respect to $\pi_t$). However (3.25) ensures that it is defined $\lambda \otimes \mathbb{P}$-a.s. which suffices. From the Zakai equation (3.43), since $A1 = 0$, one obtains that $\rho_t(1)$ satisfies the following equation,
\[ \rho_t(1) = 1 + \int_0^t \rho_s(h^\top)\,dY_s, \]
which gives
\[ \rho_t(1) = 1 + \int_0^t \rho_s(1)\pi_s(h^\top)\,dY_s. \]
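We cannot simply apply Itô's formula to $\log \rho_t(1)$ to conclude that $\rho_t(1)$ has the explicit form (3.55), because the function $x \mapsto \log x$ is not continuous at $x = 0$ (it is not even defined at 0) and we do not know a priori that $\rho_t(1) > 0$. Using the fact that $\rho_t(1)$ is non-negative, we use Itô's formula to compute for $\varepsilon > 0$
\begin{align*}
d\left( \log \sqrt{\varepsilon + \rho_t(1)^2} \right)
&= \frac{\rho_t(1)^2}{\varepsilon + \rho_t(1)^2}\,\pi_t(h^\top)\,dY_t + \frac{1}{2}\,\frac{(\varepsilon - \rho_t(1)^2)\,\rho_t(1)^2}{(\varepsilon + \rho_t(1)^2)^2}\,\pi_t(h^\top)\pi_t(h)\,dt \\
&= \frac{\rho_t(1)^2}{\varepsilon + \rho_t(1)^2}\,\pi_t(h^\top)h(X_t)\,dt + \frac{\rho_t(1)^2}{\varepsilon + \rho_t(1)^2}\,\pi_t(h^\top)\,dW_t \\
&\quad + \frac{1}{2}\,\frac{(\varepsilon - \rho_t(1)^2)\,\rho_t(1)^2}{(\varepsilon + \rho_t(1)^2)^2}\,\pi_t(h^\top)\pi_t(h)\,dt. \tag{3.56}
\end{align*}
Here the second-order term carries the factor $\rho_t(1)^2$ coming from the quadratic variation $d\langle \rho(1) \rangle_t = \rho_t(1)^2 \pi_t(h^\top)\pi_t(h)\,dt$. From (3.25) the condition (2.4) is satisfied; thus
\[ \int_0^t \left( \frac{\rho_s(1)^2}{\varepsilon + \rho_s(1)^2} \right)^2 \|\pi_s(h)\|^2\,ds \le \int_0^t \|\pi_s(h)\|^2\,ds < \infty \qquad \mathbb{P}\text{-a.s.} \]
and from (3.25) and (2.4)
\[ \int_0^t \left| \pi_s(h^\top)h(X_s) \right| ds \le \sqrt{ \int_0^t \|\pi_s(h)\|^2\,ds \int_0^t \|h(X_s)\|^2\,ds } \qquad \mathbb{P}\text{-a.s.} \]
Thus $s \mapsto \pi_s(h^\top)h(X_s)$ is integrable, so by dominated convergence the limit as $\varepsilon \to 0$ in (3.56) yields
\begin{align*}
d\left( \log \rho_t(1) \right) &= \pi_t(h^\top)\left( h(X_t)\,dt + dW_t \right) - \tfrac{1}{2}\pi_t(h^\top)\pi_t(h)\,dt \\
&= \pi_t(h^\top)\,dY_t - \tfrac{1}{2}\pi_t(h^\top)\pi_t(h)\,dt.
\end{align*}
Integrating this SDE, followed by exponentiation, yields the desired result. □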
Theorem 3.30. If conditions (3.25) and (3.42) are satisfied then the conditional distribution of the signal $\pi_t$ satisfies the following evolution equation, called the Kushner–Stratonovich equation,
\[ \pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(A\varphi)\,ds + \int_0^t \left( \pi_s(\varphi h^\top) - \pi_s(h^\top)\pi_s(\varphi) \right) \left( dY_s - \pi_s(h)\,ds \right), \tag{3.57} \]
for any $\varphi \in D(A)$.
Proof. From Lemma 3.29 we obtain
\begin{align*}
\frac{1}{\rho_t(1)} &= \exp\left( -\int_0^t \pi_s(h^\top)\,dY_s + \frac{1}{2} \int_0^t \pi_s(h^\top)\pi_s(h)\,ds \right), \\
d\left( \frac{1}{\rho_t(1)} \right) &= \frac{1}{\rho_t(1)} \left( -\pi_t(h^\top)\,dY_t + \pi_t(h^\top)\pi_t(h)\,dt \right). \tag{3.58}
\end{align*}
By using (stochastic) integration by parts, (3.58), the Zakai equation for $\rho_t(\varphi)$ and the Kallianpur–Striebel formula, we obtain the stochastic differential equation satisfied by $\pi_t$,
\begin{align*}
\pi_t(\varphi) &= \rho_t(\varphi) \cdot \frac{1}{\rho_t(1)}, \\
d\pi_t(\varphi) &= \pi_t(A\varphi)\,dt + \pi_t(\varphi h^\top)\,dY_t - \pi_t(\varphi)\pi_t(h^\top)\,dY_t + \pi_t(\varphi)\pi_t(h^\top)\pi_t(h)\,dt - \pi_t(\varphi h^\top)\pi_t(h)\,dt,
\end{align*}
which gives us the result. □
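The integration by parts step in this proof can be written out term by term. The following is a sketch using only the Zakai equation (3.43), the differential (3.58) and $\pi_t(\cdot) = \rho_t(\cdot)/\rho_t(1)$; the grouping of the terms is ours and is not taken from the text.

```latex
\begin{align*}
d\pi_t(\varphi)
  &= \frac{1}{\rho_t(1)}\,d\rho_t(\varphi)
   + \rho_t(\varphi)\,d\Big(\frac{1}{\rho_t(1)}\Big)
   + d\Big\langle \rho(\varphi), \frac{1}{\rho(1)} \Big\rangle_t, \\
\frac{1}{\rho_t(1)}\,d\rho_t(\varphi)
  &= \pi_t(A\varphi)\,dt + \pi_t(\varphi h^\top)\,dY_t, \\
\rho_t(\varphi)\,d\Big(\frac{1}{\rho_t(1)}\Big)
  &= -\pi_t(\varphi)\pi_t(h^\top)\,dY_t + \pi_t(\varphi)\pi_t(h^\top)\pi_t(h)\,dt, \\
d\Big\langle \rho(\varphi), \frac{1}{\rho(1)} \Big\rangle_t
  &= -\pi_t(\varphi h^\top)\pi_t(h)\,dt.
\end{align*}
```

Summing the last three lines and collecting the $dY_t$ and $dt$ terms gives the factorised form $(\pi_t(\varphi h^\top) - \pi_t(h^\top)\pi_t(\varphi))(dY_t - \pi_t(h)\,dt)$ appearing in (3.57).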
Remark 3.31. The Zakai and Kushner–Stratonovich equations can be extended for time inhomogeneous test functions. Let $\varphi : [0, \infty) \times S \to \mathbb{R}$ be a bounded measurable function and let $\varphi_t(\cdot) = \varphi(t, \cdot)$ for any $t \ge 0$. Then
\[ \rho_t(\varphi_t) = \pi_0(\varphi_0) + \int_0^t \rho_s(\partial_s\varphi_s + A\varphi_s)\,ds + \int_0^t \rho_s(\varphi_s h^\top)\,dY_s, \tag{3.59} \]
\[ \pi_t(\varphi_t) = \pi_0(\varphi_0) + \int_0^t \pi_s(\partial_s\varphi_s + A\varphi_s)\,ds + \int_0^t \left( \pi_s(\varphi_s h^\top) - \pi_s(h^\top)\pi_s(\varphi_s) \right) \left( dY_s - \pi_s(h)\,ds \right) \tag{3.60} \]
for any $\varphi \in D(A)$. This extension is carried out in Lemma 4.8.
Exercise 3.32. Consider once again the change detection filter introduced in Exercise 3.7. Starting from the result of this exercise define an observation process
\[ Y_t = \int_0^t X_s\,ds + W_t. \]
Show that the Kushner–Stratonovich equation for the process $X$ takes the form
\[ d\pi_t(J_1) = \pi_t(J_1)(1 - \pi_t(J_1))\left( dY_t - \pi_t(J_1)\,dt \right) + (1 - \pi_t(J_1))\,p_t/g_t\,dt, \tag{3.61} \]
where $J_1$ is the indicator function of the singleton set $\{1\}$.
3.7 The Innovation Process Approach

Here we use the representation implied by Proposition 2.31 to derive the Kushner–Stratonovich equation. The following corollary gives us a representation for $\mathcal{Y}_t$-adapted martingales.

Corollary 3.33. Under the conditions of Proposition 2.31 every right continuous square integrable martingale which is $\mathcal{Y}_t$-adapted has a representation
\[
\eta_t = \eta_0 + \int_0^t \nu_s^\top\,dI_s, \qquad t \ge 0. \tag{3.62}
\]

Proof. Following Proposition 2.31, for any $n \ge 0$, the $\mathcal{Y}_\infty$-measurable (square integrable) random variable $\eta_n - \eta_0$ has a representation of the form
\[
\eta_n - \eta_0 = \int_0^\infty (\nu_s^n)^\top\,dI_s.
\]
By conditioning with respect to $\mathcal{Y}_t$, for arbitrary $t \in [0,n]$, we get that
\[
\eta_t = \eta_0 + \int_0^t (\nu_s^n)^\top\,dI_s, \qquad t \in [0,n].
\]
The result follows by observing that the processes $\nu^n$, $n = 1, 2, \ldots$ must be compatible. That is, for any $n, m > 0$, $\nu^n$ and $\nu^m$ are equal on the set $[0, \min(n,m)]$. □

We therefore identify a square integrable martingale to which Corollary 3.33 may be applied.

Lemma 3.34. Define $N_t \triangleq \pi_t\varphi - \int_0^t \pi_s(A\varphi)\,ds$, then $N$ is a $\mathcal{Y}_t$-adapted square integrable martingale under the probability measure $\mathbb{P}$.

Proof. Recall that $\pi_t\varphi$ is indistinguishable from the $\mathcal{Y}_t$-optional projection of $\varphi(X_t)$, hence let $T$ be a bounded $\mathcal{Y}_t$-stopping time such that $T(\omega) \le K$ for all $\omega \in \Omega$. Then since $A\varphi$ is bounded it follows that we can apply Fubini's theorem combined with the definition of optional projection to obtain,
\begin{align*}
\mathbb{E}N_T &= \mathbb{E}\left[\pi_T\varphi - \int_0^T \pi_s(A\varphi)\,ds\right] \\
&= \mathbb{E}[\pi_T\varphi] - \mathbb{E}\left[\int_0^K 1_{[0,T]}(s)\,\pi_s(A\varphi)\,ds\right] \\
&= \mathbb{E}[\varphi(X_T)] - \int_0^K \mathbb{E}\left[1_{[0,T]}(s)\,\pi_s(A\varphi)\right] ds \\
&= \mathbb{E}[\varphi(X_T)] - \int_0^K \mathbb{E}\left[1_{[0,T]}(s)\,A\varphi(X_s)\right] ds \\
&= \mathbb{E}[\varphi(X_T)] - \mathbb{E}\left[\int_0^T A\varphi(X_s)\,ds\right].
\end{align*}
Then using the definition of the generator $A$ in the form of (3.3), we can find $M_t^\varphi$ an $\mathcal{F}_t$-adapted martingale such that
\[
\mathbb{E}N_T = \mathbb{E}[\varphi(X_T)] - \mathbb{E}\left[\varphi(X_T) - \varphi(X_0) - M_T^\varphi\right] = \mathbb{E}[\varphi(X_0)].
\]
Thus since $N_t$ is $\mathcal{Y}_t$-adapted, and this holds for all bounded $\mathcal{Y}_t$-stopping times, it follows by Lemma B.2 that $N$ is a $\mathcal{Y}_t$-adapted martingale. Furthermore since $A\varphi$ is bounded for $\varphi \in \mathcal{D}(A)$, it follows that $N_t$ is bounded and hence square integrable. □

An alternative proof of Theorem 3.30 can now be given using the innovation process approach. The theorem is restated because the conditions under which it is proved via the innovations method differ slightly from those in Theorem 3.30.

Theorem 3.35. If the conditions (2.3) and (2.4) are satisfied then the conditional distribution of the signal $\pi$ satisfies the following evolution equation,
\[
\pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(A\varphi)\,ds + \int_0^t \left(\pi_s(\varphi h^\top) - \pi_s(h^\top)\pi_s(\varphi)\right)(dY_s - \pi_s(h)\,ds), \tag{3.63}
\]
for any $\varphi \in \mathcal{D}(A)$.

Proof. Let $\varphi$ be an element of $\mathcal{D}(A)$. The process $N_t = \pi_t\varphi - \int_0^t \pi_s(A\varphi)\,ds$ is by Lemma 3.34 a square integrable $\mathcal{Y}_t$-martingale. By assumption, condition (2.21) is satisfied, thus Corollary 3.33 allows us to find an integral representation for $N_t$. This means that there exists a progressively measurable process $\nu$ such that
\[
N_t = \mathbb{E}N_0 + \int_0^t \nu_s^\top\,dI_s = \pi_0(\varphi) + \int_0^t \nu_s^\top\,dI_s; \tag{3.64}
\]
thus using the definition of $N_t$, we obtain the following evolution equation for the conditional distribution process $\pi$,
\[
\pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(A\varphi)\,ds + \int_0^t \nu_s^\top\,dI_s. \tag{3.65}
\]
To complete the proof, it only remains to identify explicitly the process $\nu_t$. Let $\varepsilon = (\varepsilon_t)_{t \ge 0}$ be the process as defined in (B.19), Lemma B.39. Thus $d\varepsilon_t = i\varepsilon_t r_t^\top\,dY_t$, hence, by stochastic integration by parts (i.e. by applying Itô's formula to the products $\pi_t(\varphi)\varepsilon_t$ and $\varphi(X_t)\varepsilon_t$)
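\begin{align*}
\pi_t(\varphi)\varepsilon_t &= \pi_0(\varphi)\varepsilon_0 + \int_0^t \pi_s(A\varphi)\varepsilon_s\,ds + \int_0^t \nu_s^\top \varepsilon_s\,dI_s \\
&\quad + \int_0^t \pi_s(\varphi)\, i\varepsilon_s r_s^\top (dI_s + \pi_s(h)\,ds) + \int_0^t i\varepsilon_s r_s^\top \nu_s\,ds \tag{3.66} \\
\varphi(X_t)\varepsilon_t &= \varphi(X_0)\varepsilon_0 + \int_0^t A\varphi(X_s)\varepsilon_s\,ds + \int_0^t \varepsilon_s\,dM_s^\varphi + \int_0^t i\varepsilon_s r_s^\top\,d\langle M^\varphi, W\rangle_s \\
&\quad + \int_0^t \varphi(X_s)\, i\varepsilon_s r_s^\top (h(X_s)\,ds + dW_s). \tag{3.67}
\end{align*}
Since we have assumed that the signal process and the observation process noise are uncorrelated, $\langle M^\varphi, Y\rangle_t = \langle M^\varphi, W\rangle_t = 0$; consequently subtracting (3.67) from (3.66) and taking the expectation, all of the martingale terms vanish and we obtain
\begin{align*}
\int_0^t i r_s^\top\, \mathbb{E}\left[\varepsilon_s\left(\nu_s - \varphi(X_s)h(X_s) + \pi_s(h)\pi_s(\varphi)\right)\right] ds
&= \mathbb{E}\left[\varepsilon_t(\pi_t(\varphi) - \varphi(X_t))\right] + \mathbb{E}\left[\varepsilon_0(\pi_0(\varphi) - \varphi(X_0))\right] \\
&\quad + \mathbb{E}\left[\int_0^t \varepsilon_s\left(A\varphi(X_s) - \pi_s(A\varphi)\right) ds\right] \\
&= \mathbb{E}\left[\varepsilon_t\left(\mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t] - \varphi(X_t)\right)\right] = 0.
\end{align*}
Hence, for almost all $t \ge 0$,
\[
\mathbb{E}\left[\varepsilon_t\left(\nu_t - \varphi(X_t)h(X_t) + \pi_t(\varphi)\pi_t(h)\right)\right] = 0,
\]
so since $\varepsilon_t$ belongs to a total set it follows that
\[
\nu_t = \pi_t(\varphi h) - \pi_t(\varphi)\pi_t(h), \qquad \mathbb{P}\text{-a.s.} \tag{3.68}
\]
Using the expression for $\pi_t(\varphi)$ given by (3.65) expressing the final term using the representation (3.64) with $\nu_t$ given by (3.68)
\[
\pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(A\varphi)\,ds + \int_0^t \left(\pi_s(\varphi h^\top) - \pi_s(\varphi)\pi_s(h^\top)\right) dI_s, \tag{3.69}
\]
which is the Kushner–Stratonovich equation as desired. □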
The following exercise shows how the filtering equations can be derived in a situation which on first inspection does not appear to have an interpretation as a filtering problem, but which can be approached via the innovation process method.

Exercise 3.36. Define the $\mathcal{F}_t$-adapted semimartingale $\alpha$ via
\[
\alpha_t = \alpha_0 + \int_0^t \beta_s\,ds + V_t, \qquad t \ge 0
\]
and
\[
\delta_t = \delta_0 + \int_0^t \gamma_s\,ds + W_t, \qquad t \ge 0,
\]
where $\beta_t$ and $\gamma_t$ are bounded progressively measurable processes and where $W$ is an $\mathcal{F}_t$-adapted Brownian motion which is independent of $\beta$ and $\gamma$. Define $\mathcal{D}_t = \sigma(\delta_s;\ 0 \le s \le t) \vee \mathcal{N}$. Find the equivalent of the Kushner–Stratonovich equation for $\pi_t(\varphi) = \mathbb{E}[\varphi(\alpha_t) \mid \mathcal{D}_t]$.

The following exercise shows how one can deduce the Zakai equation from the Kushner–Stratonovich equation. For this introduce the exponential martingale $\hat{Z} = \{\hat{Z}_t,\ t > 0\}$ defined by
\[
\hat{Z}_t \triangleq \exp\left(\int_0^t \pi_s(h^\top)\,dY_s - \frac12\int_0^t \|\pi_s(h)\|^2\,ds\right), \qquad t \ge 0.
\]

Exercise 3.37.
i. Show that
\[
d\left(\frac{1}{\hat{Z}_t}\right) = -\frac{1}{\hat{Z}_t}\,\pi_t(h^\top)\,dI_t.
\]
ii. Show that for any $\varepsilon_t$ from the total set $S_t$ as defined in Lemma B.39,
\[
\mathbb{E}\left[\frac{\varepsilon_t}{\hat{Z}_t}\right] = \mathbb{E}[\varepsilon_t Z_t].
\]
iii. Show that $\hat{Z}_t = \tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] = \rho_t(1)$.
iv. Use the Kallianpur–Striebel formula to deduce the Zakai equation.
3.8 The Correlated Noise Framework

Hitherto the noise in the observations $W$ has been assumed to be independent of the signal process $X$. In this section we extend the results to the case when this noise $W$ is correlated to the signal. As in the previous section, the signal process $\{X_t,\ t \ge 0\}$ is the solution of a martingale problem associated with the generator $A$. That is, for $\varphi \in \mathcal{D}(A)$,
\[
M_t^\varphi \triangleq \varphi(X_t) - \varphi(X_0) - \int_0^t A\varphi(X_s)\,ds, \qquad t \ge 0
\]
is a martingale. We assume that there exists a vector of operators $B = (B_1, \ldots, B_m)^\top$ such that $B_i : B(S) \to B(S)$ for $i = 1, \ldots, m$. Let $\mathcal{D}(B_i) \subseteq B(S)$ denote the domain of the operator $B_i$. We require for each $i = 1, \ldots, m$ that $B_i 1 = 0$ and for $\varphi \in \mathcal{D}(B_i)$,
\[
\langle M^\varphi, W^i\rangle_t = \int_0^t B_i\varphi(X_s)\,ds. \tag{3.70}
\]
Define
\[
\mathcal{D}(B) \triangleq \bigcap_{i=1}^{m} \mathcal{D}(B_i).
\]
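Corollary 3.38. In the correlated noise case, the Kushner–Stratonovich equation is
\[
d\pi_t(\varphi) = \pi_t(A\varphi)\,dt + \left(\pi_t(h^\top\varphi) - \pi_t(h^\top)\pi_t(\varphi) + \pi_t(B^\top\varphi)\right)(dY_t - \pi_t(h)\,dt), \tag{3.71}
\]
for all $\varphi \in \mathcal{D}(A) \cap \mathcal{D}(B)$.

Proof. We now follow the innovations proof of the Kushner–Stratonovich equation. However, by (3.70) the cross-variation term is now
\[
\int_0^t i\varepsilon_s r_s^\top\,d\langle M^\varphi, W\rangle_s = \int_0^t i\varepsilon_s r_s^\top B\varphi(X_s)\,ds.
\]
Inserting this term, we obtain instead of (3.68),
\[
\nu_t = \pi_t(\varphi h) - \pi_t(\varphi)\pi_t(h) + \pi_t(B\varphi), \qquad \mathbb{P}\text{-a.s.}
\]
and using this in (3.65) yields the result. □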
Corollary 3.39. In the correlated noise case, for $\varphi \in B(S)$, the Zakai equation is
\[
\rho_t(\varphi) = \rho_0(\varphi) + \int_0^t \rho_s(A\varphi)\,ds + \int_0^t \rho_s\left((h^\top + B^\top)\varphi\right) dY_s. \tag{3.72}
\]
Consider the obvious extension of the diffusion process example studied earlier to the case where the signal process is a diffusion given by
\[
dX_t = b(X_t)\,dt + \sigma(X_t)\,dV_t + \bar{\sigma}(X_t)\,dW_t; \tag{3.73}
\]
thus $\bar{\sigma}$ is a $d \times m$ matrix-valued process. If $\bar{\sigma} \equiv 0$ this case reduces to the uncorrelated case which was studied previously.

Corollary 3.40. When the signal process is given by (3.73), the operator $B = (B_i)_{i=1}^m$ defined by (3.70) is given for $k = 1, \ldots, m$ by
\[
B_k = \sum_{i=1}^d \bar{\sigma}_{ik}\,\frac{\partial}{\partial x_i}.
\]
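Proof. Denoting by $A$ the generator of $X$,
\begin{align*}
M_t^\varphi &= \varphi(X_t) - \varphi(X_0) - \int_0^t A\varphi(X_s)\,ds \\
&= \sum_{i=1}^d \int_0^t \frac{\partial\varphi}{\partial x_i}\,(\sigma\,dV_s)^i + \sum_{i=1}^d \int_0^t \frac{\partial\varphi}{\partial x_i}\,(\bar{\sigma}\,dW_s)^i. \tag{3.74}
\end{align*}
Thus
\begin{align*}
\langle M^\varphi, W^k\rangle_t &= \sum_{i=1}^d \sum_{j=1}^m \int_0^t \frac{\partial\varphi}{\partial x_i}\,\bar{\sigma}_{ij}\,d\langle W^j, W^k\rangle_s \\
&= \sum_{i=1}^d \int_0^t \frac{\partial\varphi}{\partial x_i}\,\bar{\sigma}_{ik}\,ds
\end{align*}
and the result follows from (3.70). □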
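As a numerical illustration of Corollary 3.40, the following Python sketch evaluates $B_k\varphi(x) = \sum_i \bar{\sigma}_{ik}(x)\,\partial\varphi/\partial x_i(x)$ with the gradient approximated by central finite differences. The function names `grad_fd` and `apply_B`, the test function `phi` and the matrix `sigma_bar` are hypothetical choices made purely for illustration; none of them comes from the text.

```python
def grad_fd(phi, x, eps=1e-6):
    """Central finite-difference gradient of phi at the point x (list of floats)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += eps
        dn[i] -= eps
        grad.append((phi(up) - phi(dn)) / (2.0 * eps))
    return grad


def apply_B(phi, sigma_bar, x):
    """Return [B_1 phi(x), ..., B_m phi(x)], B_k = sum_i sigma_bar[i][k] d/dx_i.

    sigma_bar(x) is assumed to return a d x m nested list (hypothetical interface).
    """
    g = grad_fd(phi, x)
    s = sigma_bar(x)
    d, m = len(s), len(s[0])
    return [sum(s[i][k] * g[i] for i in range(d)) for k in range(m)]


# Illustrative data (assumptions): d = m = 2 and a constant sigma_bar.
phi = lambda x: x[0] ** 2 + x[1]
sigma_bar = lambda x: [[1.0, 0.0], [2.0, 3.0]]
B_phi = apply_B(phi, sigma_bar, [1.0, 2.0])  # gradient is (2, 1) at this point
```

With these choices the exact values are $B_1\varphi = 1\cdot 2 + 2\cdot 1 = 4$ and $B_2\varphi = 0\cdot 2 + 3\cdot 1 = 3$, which the finite-difference version reproduces up to rounding.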
3.9 Solutions to Exercises

3.3 From (3.10) with $y = 0$, $\|\sigma(x) - \sigma(0)\| \le K\|x\|$, by the triangle inequality
\[
\|\sigma(x)\| \le \|\sigma(x) - \sigma(0)\| + \|\sigma(0)\| \le \|\sigma(0)\| + K\|x\|.
\]
Thus since $(a+b)^2 \le 2a^2 + 2b^2$,
\[
\|\sigma(x)\|^2 \le 2\|\sigma(0)\|^2 + 2K^2\|x\|^2;
\]
thus setting $\kappa_1 = \max(2\|\sigma(0)\|^2, 2K^2)$, we see that
\[
\|\sigma(x)\|^2 \le \kappa_1(1 + \|x\|^2).
\]
Similarly from (3.10) with $y = 0$, and the triangle inequality, it follows that
\[
\|f(x)\| \le \|f(0)\| + K\|x\|,
\]
so setting $\kappa_2 = \max(\|f(0)\|, K)$,
\[
\|f(x)\| \le \kappa_2(1 + \|x\|).
\]
The result follows if we take $\kappa = \max(\kappa_1, \kappa_2)$. For the final part, note that
\[
(\sigma\sigma^\top)_{ij} = \sum_{k=1}^p \sigma_{ik}\sigma_{jk},
\]
hence $|(\sigma\sigma^\top)_{ij}(x)| \le p\|\sigma\|^2$, consequently
\[
\|\sigma(x)\sigma^\top(x)\| \le pd^2\kappa(1 + \|x\|^2);
\]
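thus we set $\kappa' = pd^2\kappa$ to get the required result.

3.4 First we must check that $A\varphi$ is bounded for $\varphi \in SL^2(\mathbb{R}^d)$. By the result of Exercise 3.3, with $\kappa' = \kappa pd^2/2$,
\[
\|a\| = \tfrac12\|\sigma(x)\sigma^\top(x)\| \le \kappa'(1 + \|x\|^2).
\]
Hence
\begin{align*}
|A\varphi(x)| &\le \sum_{i=1}^d |f_i(x)|\,|\partial_i\varphi(x)| + \sum_{i,j=1}^d |a_{ij}(x)|\,|\partial_i\partial_j\varphi(x)| \\
&\le \sum_{i=1}^d |f_i(x)|\,\frac{C}{1 + \|x\|} + \sum_{i,j=1}^d |a_{ij}(x)|\,\frac{C}{1 + \|x\|^2} \\
&\le Cd\kappa + Cpd^2\kappa' < \infty,
\end{align*}
so $A\varphi \in B(\mathbb{R}^d)$. By Itô's formula since $\varphi \in C^2(\mathbb{R}^d)$,
\begin{align*}
\varphi(X_t) &= \varphi(X_0) + \int_0^t \sum_{i=1}^d \partial_i\varphi(X_s)\Big(f^i(X_s)\,ds + \sum_{j=1}^p \sigma^{ij}\,dV_s^j\Big) \\
&\quad + \frac12\int_0^t \sum_{i,j=1}^d \partial_i\partial_j\varphi(X_s)\sum_{k=1}^p \sigma^{ik}(X_s)\sigma^{jk}(X_s)\,ds.
\end{align*}
Hence
\[
M_t^\varphi = \sum_{i=1}^d \int_0^t \partial_i\varphi(X_s)\sum_{j=1}^p \sigma^{ij}(X_s)\,dV_s^j,
\]
which is clearly a local martingale. Consider
\begin{align*}
\sum_{j=1}^p \int_0^t \Big(\sum_{i=1}^d |\partial_i\varphi(X_s)|\,\sigma^{ij}(X_s)\Big)^2 ds
&\le p\int_0^t \frac{C^2\|\sigma(X_s)\|^2}{(1 + \|X_s\|)^2}\,ds \\
&\le C^2 p\int_0^t \frac{pd^2\kappa(1 + \|X_s\|^2)}{(1 + \|X_s\|)^2}\,ds \\
&\le C^2p^2d^2\kappa t < \infty.
\end{align*}
Hence $M^\varphi$ is a martingale.

3.6 It is sufficient to show that for all $i \in I$, the process $M^i = \{M_t^i,\ t \ge 0\}$ defined as
\[
M_t^i = J_i(X_t) - J_i(X_0) - \int_0^t q_{X_s i}(s)\,ds, \qquad t \ge 0,
\]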
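where $J_i$ is the indicator function of the singleton set $\{i\}$, is an $\mathcal{F}_t$-adapted right-continuous martingale. This is sufficient since
\[
M^\varphi = \sum_{i \in I} \varphi(i)M^i, \qquad \text{for all } \varphi \in B(S).
\]
Thus if $M^i$ is a martingale for $i \in I$ then so is $M^\varphi$ which establishes the result. The adaptedness, integrability and right continuity of $M_t^i$ are straightforward. From (3.16) and using the Markov property for $0 \le s \le t$,
\begin{align*}
\mathbb{P}(X_t = i \mid \mathcal{F}_s) &= \mathbb{E}\left[\mathbb{E}\left[1_{\{X_t = i\}} \mid \mathcal{F}_{t-h}\right] \mid \mathcal{F}_s\right] = \mathbb{E}\left[\mathbb{P}(X_t = i \mid X_{t-h}) \mid \mathcal{F}_s\right] \\
&= \mathbb{E}[J_i(X_{t-h}) \mid \mathcal{F}_s] + \mathbb{E}\left[q_{X_{t-h} i}(t-h) \mid \mathcal{F}_s\right] h + o(h) \\
&= \mathbb{P}(X_{t-h} = i \mid \mathcal{F}_s) + \mathbb{E}\left[q_{X_{t-h} i}(t-h) \mid \mathcal{F}_s\right] h + o(h).
\end{align*}
It is clear that we may apply this iteratively; the error term is $o(h)/h$ which by definition tends to zero as $h \to 0$. Doing this and passing to the limit as $h \to 0$ we obtain
\[
\mathbb{P}(X_t = i \mid X_s) = J_i(X_s) + \mathbb{E}\left[\int_s^t q_{X_r i}(r)\,dr \;\Big|\; \mathcal{F}_s\right].
\]
Now
\begin{align*}
\mathbb{E}[M_t^i \mid \mathcal{F}_s] &= \mathbb{P}(X_t = i \mid \mathcal{F}_s) - J_i(X_0) - \mathbb{E}\left[\int_0^t q_{X_r i}(r)\,dr \;\Big|\; \mathcal{F}_s\right] \\
&= J_i(X_s) - J_i(X_0) - \int_0^s q_{X_r i}(r)\,dr \\
&= M_s^i.
\end{align*}
It follows that $M_t^i$ is a martingale.

3.7 Clearly the state space of $X$ is $\{0, 1\}$. Once in state 1 the process never leaves the state 1 hence $q_{10}(t) = q_{11}(t) = 0$. Consider the transition from state 0 to 1,
\[
\mathbb{P}(X_{t+h} = 1 \mid X_t = 0) = \mathbb{P}(T \le t + h \mid T > t) = \frac{\mathbb{P}(t < T \le t + h)}{\mathbb{P}(T > t)} = \frac{p_t}{g_t}\,h + o(h).
\]
Thus $q_{01}(t) = p_t/g_t$ and hence $q_{00}(t) = -q_{01}(t) = -p_t/g_t$.

3.10 By Itô's formula
\[
d\left(\|X_t\|^2\right) = 2X_t^\top\left(f(X_t)\,dt + \sigma(X_t)\,dV_t\right) + \operatorname{tr}\left(\sigma(X_t)\sigma^\top(X_t)\right) dt. \tag{3.75}
\]
Thus if we define
\[
M_t \triangleq \int_0^t 2X_s^\top\sigma(X_s)\,dV_s,
\]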
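this is clearly a local martingale. Take $T_n$ a reducing sequence (see Definition B.4) such that $M_t^{T_n}$ is a martingale for all $n$ and $T_n \to \infty$. Integrating between 0 and $t \wedge T_n$ and taking expectation, $\mathbb{E}M_{t \wedge T_n} = 0$, hence
\[
\mathbb{E}\|X_{t \wedge T_n}\|^2 = \mathbb{E}\|X_0\|^2 + \mathbb{E}\int_0^{t \wedge T_n} \left(2X_s^\top f(X_s) + \operatorname{tr}(\sigma(X_s)\sigma^\top(X_s))\right) ds.
\]
By the results of Exercise 3.3,
\[
\mathbb{E}\|X_{t \wedge T_n}\|^2 \le \mathbb{E}\|X_0\|^2 + \mathbb{E}\int_0^{t \wedge T_n} \left(2d\kappa\|X_s\|(1 + \|X_s\|) + \kappa'(1 + \|X_s\|^2)\right) ds
\]
so setting $c = \max(2d\kappa,\ 2d\kappa + \kappa',\ \kappa') > 0$,
\[
\mathbb{E}\|X_{t \wedge T_n}\|^2 \le \mathbb{E}\|X_0\|^2 + c\,\mathbb{E}\int_0^{t \wedge T_n} \left(1 + \|X_s\| + \|X_s\|^2\right) ds.
\]
But by Jensen's inequality for $p > 1$, it follows that for $Y$ a non-negative random variable
\[
\mathbb{E}[Y] \le \left(\mathbb{E}[Y^p]\right)^{1/p} \le 1 + \mathbb{E}[Y^p].
\]
Thus
\[
1 + \mathbb{E}\|X_{t \wedge T_n}\|^2 \le 1 + \mathbb{E}\|X_0\|^2 + 2c\int_0^{t \wedge T_n} \mathbb{E}\left[1 + \|X_s\|^2\right] ds,
\]
and by Corollary A.40 to Gronwall's lemma
\[
1 + \mathbb{E}\|X_{t \wedge T_n}\|^2 \le \left(1 + \mathbb{E}\|X_0\|^2\right)e^{2c(t \wedge T_n)}.
\]
We may take the limit as $n \to \infty$ by Fatou's lemma to obtain
\[
\mathbb{E}\|X_t\|^2 \le \left(1 + \mathbb{E}\|X_0\|^2\right)e^{2ct} - 1, \tag{3.76}
\]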
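which establishes the result for the second moment. In the case of the third moment, applying Itô's formula to $f(x) = x^{3/2}$ and the process $\|X_t\|^2$ yields
\[
d\left(\|X_t\|^3\right) = \tfrac32\|X_t\|\left(2X_t^\top(f(X_t)\,dt + \sigma(X_t)\,dV_t) + \operatorname{tr}(\sigma(X_t)\sigma^\top(X_t))\,dt\right) + \frac{3}{2\|X_t\|}\,X_t^\top\sigma(X_t)\sigma^\top(X_t)X_t\,dt.
\]
Define
\[
N_t \triangleq 3\int_0^t \|X_s\|\,X_s^\top\sigma(X_s)\,dV_s,
\]
and let $T_n$ be a reducing sequence for the local martingale $N_t$. Integrating between 0 and $t \wedge T_n$ and taking expectation, we obtain for some constant $c > 0$ (independent of $n$, $t$) that
\[
\mathbb{E}\left[\|X_{t \wedge T_n}\|^3\right] \le \mathbb{E}\left[\|X_0\|^3\right] + c\int_0^{t \wedge T_n} \mathbb{E}\left[\|X_s\| + \|X_s\|^2 + \|X_s\|^3\right] ds,
\]
using Jensen's inequality as before,
\[
\mathbb{E}\left[\|X_{t \wedge T_n}\|^3\right] \le \mathbb{E}\left[\|X_0\|^3\right] + 3c\int_0^{t \wedge T_n} \left(1 + \mathbb{E}\left[\|X_s\|^3\right]\right) ds,
\]
thus by Corollary A.40 to Gronwall's lemma
\[
\mathbb{E}\left[\|X_{t \wedge T_n}\|^3 + 1\right] \le \mathbb{E}\left[\|X_0\|^3\right] + \left(1 + \mathbb{E}\|X_0\|^3\right)e^{3c(t \wedge T_n)},
\]
passing to the limit as $n \to \infty$ using Fatou's lemma
\[
\mathbb{E}\left[\|X_t\|^3\right] \le \left(1 + \mathbb{E}\left[\|X_0\|^3\right]\right)e^{3ct} - 1, \tag{3.77}
\]
and since $\mathbb{E}[\|X_0\|^3] < \infty$ ($X_0$ has finite third moment) this yields the result.

3.11 i. As a consequence of the linear growth bound on $h$,
\[
\mathbb{E}\left[\int_0^t \|h(X_s)\|^2\,ds\right] \le C\,\mathbb{E}\left[\int_0^t (1 + \|X_s\|^2)\,ds\right] \le Ct + C\,\mathbb{E}\left[\int_0^t \|X_s\|^2\,ds\right].
\]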
It follows by Jensen's inequality that
\[
\mathbb{E}\left[\|X_t\|^2\right] \le \left(\mathbb{E}\|X_t\|^3\right)^{2/3}.
\]
Since the conditions (3.10) are satisfied and the second moment of $X_0$ is finite, we can use the bound derived in Exercise 3.10 as (3.76); viz
\[
\mathbb{E}\left[\|X_t\|^2\right] \le \left(\mathbb{E}\|X_0\|^2 + 1\right)e^{2ct}.
\]
Consequently for $t \ge 0$,
\[
\mathbb{E}\left[\int_0^t \|h(X_s)\|^2\,ds\right] \le Ct + C\,\frac{e^{2ct} - 1}{2c}\,\mathbb{E}\left[\|X_0\|^2 + 1\right] < \infty. \tag{3.78}
\]
This establishes the first of the conditions (3.25). For the second condition, using the result of (3.75), Itô's formula yields
\[
d\left(Z_t\|X_t\|^2\right) = Z_t\left(2X_t^\top(f(X_t)\,dt + \sigma(X_t)\,dV_t) + \operatorname{tr}\left(\sigma(X_t)\sigma^\top(X_t)\right) dt\right) - Z_t\|X_t\|^2 h^\top(X_t)\,dY_t.
\]
Thus applying Itô's formula to the function $f(x) = x/(1 + \varepsilon x)$ and the process $Z_t\|X_t\|^2$ yields
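\[
d\left(\frac{Z_t\|X_t\|^2}{1 + \varepsilon Z_t\|X_t\|^2}\right) = \frac{1}{(1 + \varepsilon Z_t\|X_t\|^2)^2}\,d\left(Z_t\|X_t\|^2\right) - \frac{\varepsilon}{(1 + \varepsilon Z_t\|X_t\|^2)^3}\left(Z_t^2\|X_t\|^4 h^\top(X_t)h(X_t) + 4Z_t^2 X_t^\top\sigma(X_t)\sigma^\top(X_t)X_t\right) dt. \tag{3.79}
\]
Integrating between 0 and $t$ and taking expectation, the stochastic integrals are local martingales; we must show that they are martingales. Consider first the term
\[
\int_0^t \frac{Z_s\,2X_s^\top\sigma(X_s)}{(1 + \varepsilon Z_s\|X_s\|^2)^2}\,dV_s;
\]
to show that this is a martingale we must therefore establish that
\[
\mathbb{E}\left[\int_0^t \left\|\frac{Z_s\,2X_s^\top\sigma}{(1 + \varepsilon Z_s\|X_s\|^2)^2}\right\|^2 ds\right] = 4\,\mathbb{E}\left[\int_0^t \frac{Z_s^2\,X_s^\top\sigma\sigma^\top X_s}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds\right] < \infty.
\]
In order to establish this inequality notice that
\[
|X_t^\top\sigma(X_t)\sigma^\top(X_t)X_t| \le d^2\|X_t\|^2\,\|\sigma(X_t)\sigma^\top(X_t)\|,
\]
and from Exercise 3.3
\[
\|\sigma\sigma^\top\| \le \kappa'(1 + \|X\|^2),
\]
hence
\[
|X_t^\top\sigma(X_t)\sigma^\top(X_t)X_t| \le d^2\kappa'\|X_t\|^2\left(1 + \|X_t\|^2\right),
\]
so the integral may be bounded by
\begin{align*}
\int_0^t \frac{Z_s^2\,X_s^\top\sigma\sigma^\top X_s}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds
&\le \kappa' d^2\int_0^t \frac{Z_s^2\|X_s\|^2\left(1 + \|X_s\|^2\right)}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds \\
&= \kappa' d^2\int_0^t \left(\frac{Z_s^2\|X_s\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^4} + \frac{Z_s^2\|X_s\|^4}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\right) ds.
\end{align*}
Considering each term of the integral separately, the first satisfies
\begin{align*}
\int_0^t \frac{Z_s^2\|X_s\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds
&\le \int_0^t Z_s \times \frac{Z_s\|X_s\|^2}{1 + \varepsilon Z_s\|X_s\|^2} \times \frac{1}{(1 + \varepsilon Z_s\|X_s\|^2)^3}\,ds \\
&\le \int_0^t \frac{Z_s}{\varepsilon}\,ds \le \frac{1}{\varepsilon}\int_0^t Z_s\,ds.
\end{align*}
Thus the expectation of this integral is bounded by $t/\varepsilon$, because $\mathbb{E}[Z_s] \le 1$. Similarly for the second term,
\[
\int_0^t \left[\frac{Z_s\|X_s\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^2}\right]^2 ds \le \int_0^t \frac{Z_s^2\|X_s\|^4}{(1 + \varepsilon Z_s\|X_s\|^2)^2} \times \frac{1}{(1 + \varepsilon Z_s\|X_s\|^2)^2}\,ds \le \frac{1}{\varepsilon^2}\,t < \infty.
\]
For the second stochastic integral term,
\[
-\int_0^t \frac{Z_s\|X_s\|^2 h^\top(X_s)}{(1 + \varepsilon Z_s\|X_s\|^2)^2}\,dV_s,
\]
to show that this is a martingale, we must show that
\[
\mathbb{E}\left[\int_0^t \frac{Z_s^2\|X_s\|^4\|h(X_s)\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds\right] < \infty.
\]
Thus bounding this integral
\begin{align*}
\int_0^t \frac{Z_s^2\|X_s\|^4\|h(X_s)\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds
&\le \int_0^t \left(\frac{Z_s\|X_s\|^2}{1 + \varepsilon Z_s\|X_s\|^2}\right)^2 \frac{\|h(X_s)\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^2}\,ds \\
&\le \frac{C}{\varepsilon^2}\int_0^t \|h(X_s)\|^2\,ds.
\end{align*}
Taking expectation, and using the result (3.78),
\[
\mathbb{E}\left[\int_0^t \frac{Z_s^2\|X_s\|^4\|h(X_s)\|^2}{(1 + \varepsilon Z_s\|X_s\|^2)^4}\,ds\right] \le \frac{C}{\varepsilon^2}\,\mathbb{E}\left[\int_0^t \|h(X_s)\|^2\,ds\right] < \infty.
\]
Therefore we have established that the stochastic integrals in (3.79) are martingales and have zero expectation. Consider now the remaining terms; by an application of Fubini's theorem, we see that
\begin{align*}
\frac{d}{dt}\,\mathbb{E}\left[\frac{Z_t\|X_t\|^2}{1 + \varepsilon Z_t\|X_t\|^2}\right]
&\le \mathbb{E}\left[\frac{Z_t\left(2X_t^\top f(X_t) + \operatorname{tr}\left(\sigma(X_t)\sigma^\top(X_t)\right)\right)}{1 + \varepsilon Z_t\|X_t\|^2}\right] \\
&\le K\left(\mathbb{E}\left[\frac{Z_t\|X_t\|^2}{1 + \varepsilon Z_t\|X_t\|^2}\right] + 1\right),
\end{align*}
where we used the fact that $\mathbb{E}[Z_t] \le 1$. Hence, by Corollary A.40 to Gronwall's inequality there exists $K_t$ such that for $0 \le s \le t$,
\[
\mathbb{E}\left[\frac{Z_s\|X_s\|^2}{1 + \varepsilon Z_s\|X_s\|^2}\right] \le K_t < \infty,
\]
by Fatou's lemma as $\varepsilon \to 0$,
\[
\mathbb{E}\left[Z_s\|X_s\|^2\right] \le K_t < \infty.
\]
Then by Fubini's theorem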
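\begin{align*}
\mathbb{E}\left[\int_0^t Z_s\|h(X_s)\|^2\,ds\right] &= \mathbb{E}\left[\int_0^t Z_s\sum_{i=1}^m h_i(X_s)^2\,ds\right] \\
&= \int_0^t \mathbb{E}\left[Z_s\|h(X_s)\|^2\right] ds \\
&\le C\int_0^t \mathbb{E}\left[Z_s\left(1 + \|X_s\|^2\right)\right] ds \le Ct(1 + K_t) < \infty,
\end{align*}
which establishes the second condition in (3.25).

ii. Let $H = \max_{i \in I}|h(\{i\})|$, as the state space $I$ is finite, it is clear that $H < \infty$. Therefore
\[
\mathbb{E}\left[\int_0^t \|h(X_s)\|^2\,ds\right] \le \mathbb{E}[Ht] = Ht < \infty,
\]
which establishes the first condition of (3.25). For the second condition by Fubini's theorem and the fact that $Z_t \ge 0$,
\[
\mathbb{E}\left[\int_0^t Z_s\|h(X_s)\|^2\,ds\right] \le H\int_0^t \mathbb{E}[Z_s]\,ds \le Ht < \infty.
\]
Thus both conditions in (3.25) are satisfied ($\mathbb{E}[Z_s] \le 1$ for any $s \in [0,\infty)$).

3.14 i.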
It is clear that Pt is X ∨ Ft -measurable and that it is integrable. Now for 0 ≤ s ≤ t, E [Pt | X ∨ Fs ] = E exp iβ > Yt − 12 kβk2 t Zt X ∨ Fs ˜ exp iβ > Yt − 1 kβk2 t | X ∨ Fs E 2 h i = ˜ Z˜t | X ∨ Fs E = Z˜s−1 exp iβ > Ys − 12 kβk2 s = Ps .
Hence Pt is a X ∨ Ft martingale under P. ii. For notational convenience let us fix t0 = 0 and define li =
n X
βj .
j=i
Since W is independent of X it follows that
n n X X E exp iβj> Wtj X = E exp iβj> Wtj j=1 j=1 n X = E exp ilj> (Wtj − Wtj−1 )
j=1
n X 2 1 = exp 2 klj k (tj − tj−1 ) . j=1
For the left-hand side we write n n X X E exp iβj> Ytj Ztn , X = E expi lj> (Ytj − Ytj−1 ) Ztn X j=1 j=1 > Zt2 exp il2 Yt2 = E Zt1 exp il1> Yt1 Zt1 exp il2> Yt1 Ztn exp iln> Ytn X . × ··· × Ztn−1 exp iln> Ytn−1 Write Pt (l) = exp il> Yt − 12 klk2 t Zt ; then n X > E exp iβj Ytj Ztn X j=1 Pt (ln−1 ) Ptn (ln ) Pt (l2 ) = E Pt1 (l1 ) 2 · · · n−1 X Pt1 (l2 ) Ptn−2 (ln−1 ) Ptn−1 (ln ) n X 2 1 . × exp 2 klj k (tj − tj−1 ) j=1
From part (i) we know that Pt (l) is a X ∨ Ft martingale for each l ∈ Rm ; thus conditioning on X ∨ Ftn−1 , Ptn−1 (ln−1 ) Ptn (ln ) Pt2 (l2 ) E Pt1 (l1 ) ··· X Pt1 (l2 ) Ptn−2 (ln−1 ) Ptn−1 (ln ) Ptn−1 (ln−1 ) Pt1 (l1 ) Pt2 (l2 ) =E ··· Ptn (ln ) X Pt1 (l2 ) Pt2 (l3 ) Ptn−1 (ln ) Ptn−1 (ln−1 ) Pt1 (l1 ) Pt2 (l2 ) =E ··· E Ptn (ln ) | X ∨ Ftn−1 X Pt1 (l2 ) Pt2 (l3 ) Ptn−1 (ln ) Pt (ln−2 ) Pt1 (l1 ) Pt2 (l2 ) =E · · · n−2 Ptn−1 (ln−1 ) X . Pt1 (l2 ) Pt2 (l3 ) Ptn−2 (ln−1 )
Repeating this conditioning we obtain Ptn−1 (ln−1 ) Ptn (ln ) Pt2 (l2 ) E Pt1 (l1 ) ··· X Pt1 (l2 ) Ptn−2 (ln−1 ) Ptn−1 (ln ) = E [ Pt1 (l1 ) | X ] = E [ E [ Pt1 (l1 ) | X ∨ Ft0 ] | X ] = E [ Pt0 (l1 ) | X ] = 1. Hence n n X X 2 1 E exp iβj> Ytj Ztn X = exp 2 klj k (tj − tj−1 ) , j=1 j=1
which is the same as the result computed earlier for the right-hand side. iii. By Weierstrass’ approximation theorem any bounded continuous complex valued function g(Yt1 , . . . , Ytp ) can be approximated by a sequence as r → ∞, p mr X X > r g (r) (Yt1 , . . . , Ytp ) , ark expi βk,j Ytj . k=1
j=1
Thus as a consequence of (ii) it follows that for such a function g, E[g(Yt1 , . . . , Ytp )Zt | X ] = E[g(Yt1 , . . . , Ytp ) | X ], which since p was arbitrary by a further standard approximation argument extends to any bounded Borel measurable function g, E[g(Y )Zt | X ] = E[g(Y ) | X ]. Thus given f (X, Y ) bounded and measurable on the path spaces of X and Y it follows that E[f (X, Y )Zt ] = E [E[f (X, Y )Zt | X ]] . Conditional on X , f (X, Y ) may be considered as a function g X (Y ) on the path space of Y and hence E[f (X, Y )Zt ] = E E[g X (Y )Zt | X ] = E E[g X (W ) | X ] = E[f (X, W )]. 3.22 The result (3.41) is immediate from the following identities, Z t ϕ(Xt ) = ϕ(X0 ) + Mtϕ + Aϕ(Xs ) ds, 0 Z t Z t ϕ2 (Xt ) = ϕ2 (X0 ) + 2 ϕ(Xs ) dMsϕ + 2ϕAϕ(Xs ) ds + hM ϕ it , 0 0 Z t ϕ2 2 2 2 ϕ (Xt ) = ϕ (X0 ) + Mt + Aϕ (Xs ) ds; 0
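thus
\[
\langle M^\varphi\rangle_t = \int_0^t \left(A\varphi^2 - 2\varphi A\varphi\right)(X_s)\,ds.
\]
Hence (3.39) becomes
\[
\int_0^t u_s^2\left(A\varphi^2 - 2\varphi A\varphi\right) ds \le \left(\|A\varphi^2\|_\infty + 2\|\varphi\|_\infty\|A\varphi\|_\infty\right)\int_0^t u_s^2\,ds < \infty.
\]

3.23 Since under $\tilde{\mathbb{P}}$ the process $Y$ is a Brownian motion a sufficient condition for the stochastic integral to be well defined is given by (B.9) which in this case takes the form, for all $t \ge 0$, that
\[
\tilde{\mathbb{P}}\left[\int_0^t \sum_{i=1}^d \left(\rho_s(\varphi h_i)\right)^2 ds < \infty\right] = 1.
\]
But since $\varphi \in B(\mathbb{R}^d)$ it follows that
\[
\int_0^t \sum_{i=1}^d \rho_s(\varphi h_i)^2\,ds \le \|\varphi\|_\infty^2\int_0^t \sum_{i=1}^d \rho_s(h_i)^2\,ds \le d\|\varphi\|_\infty^2\int_0^t \rho_s(\|h\|)^2\,ds.
\]
Thus under (3.42) for all $t \ge 0$
\[
\tilde{\mathbb{P}}\left[\int_0^t \rho_s(\|h\|)^2\,ds < \infty\right] = 1,
\]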
and the result follows. 3.25 i. As a consequence of the linear growth condition (3.28) we have that p ρt (khk) ≤ Cρt 1 + kXt k2 , and we prove that t 7→ ρt
p
1 + kXt k2
(3.80)
is uniformly bounded on compact intervals. The derivation of (3.44) p did not require condition (3.42). We should like to apply this to ψ(x) = 1 + kxk2 , but while continuous this is not bounded. Thus choosing an approximating test function s 1 + kxk2 ϕλ (x) = 1 + λkxk2 in (3.44), we wish to take the limit as λ tends to 0 as ϕλ converges pointwise to ψ. Note that
ϕ (X )Z˜
1
εZ˜s 1
λ s s
h(Xs ) = ϕλ (Xs ) h(Xs )
˜ ˜
(1 + εZ˜s )2
ε
1 + εZs 1 + εZs 1 ≤ ϕλ (Xs )kh(Xs )k ε s ! √ C 1 + kXs k2 p ≤ 1 + kXs k2 ε 1 + λkXs k2 √ C ≤ 1 + kXs k2 . ε Therefore we have the bound,
" # √
C
˜ (ϕλ (Xs ) − ψ(Xs ))Z˜s
2 ˜ h(Xs ) | Y ≤ 1 + E[kX k | Y] .
E s
ε (1 + εZ˜s )2 ˜ the process X is independent of Y , But by Proposition 3.13 since under P ˜ it follows that and since the law of X is the same under P as it is under P,
" # √
C
˜ (ϕλ (Xs ) − ψ(Xs ))Z˜s
h(Xs ) | Y ≤ 1 + E[kXs k2 ] . (3.81)
E 2 ˜
ε (1 + εZs ) Using the result (3.76) of Exercise 3.10 conclude that Z 0
t
√
C 1 + EkXs k2 ε
!2
2 C ds ≤ 2 1 + EkX0 k2 ε
Z
t
e4cs ds < ∞.
0
Thus by the dominated convergence theorem using the right-hand side of (3.81) as a dominating function, λ → 0, Z t h i ˜ ϕλ (Xs )Z˜s (1 + εZ˜s )−2 h(Xs ) | Y E 0
h i2 ˜ ψ(Xs )Z˜s (1 + εZ˜s )−2 h(Xs ) | Y −E ds → 0; thus using Itˆ o’s isometry it follows that as λ → 0, Z t h i ˜ ϕλ (Xs )Z˜s (1 + εZ˜s )−2 h(Xs ) | Y dYs E 0 Z t h i ˜ ϕ(Xs )Z˜s (1 + εZ˜s )−2 h(Xs ) | Y dYs → 0, → E 0
whence we see that (3.44) holds for the unbounded test function ψ. This ψ is not contained in D(A) since it is not bounded; however, computing using (3.11) directly
Aψ =
1 ψ
1 1 > > > > f x + tr(σσ ) − (X σσ X) . 2 2ψ 2
Thus using the bounds in (3.14) and (3.15) which follow from (3.10), 1 κd(1 + kXk)kXk + 12 κ0 (1 + kXk2 ) + 12 κ0 d2 kXk2 ψ2 ≤ 12 κ0 + κd + 12 d2 κ0 .
|Aψ|/ψ ≤
For future reference we define kA , 12 κ0 + κd + 12 d2 κ0 .
(3.82)
We also need a simple bound which follows from (3.26) and Jensen’s inequality p p ˜ Z˜t ψ(Xt )] = E[ψ(Xt )] ≤ 1 + E[kXt k2 ] ≤ 1 + Gt . E[ (3.83) In the argument following (3.47) the stochastic integral in (3.44) was shown ˜ Therefore for 0 ≤ r ≤ to be a Yt -adapted martingale under the measure P. t, Z t ˜ E[ ˜ Z˜ ε ψ(Xt ) | Y] − π0 (ψ) + ˜ Z˜ ε Aψ(Xs ) | Y] ds E E[ t s 1+ε 0 Z t 2 −3 2 ˜ ˜ ˜ − E εψ(Xs ) Zs 1 + εZs kh(Xs )k | Y ds Yr 0 Z r h i ε ˜ Z˜ ψ(Xr ) | Y] − π0 (ψ) + ˜ Z˜ ε Aψ(Xs ) | Y ds = E[ E r s 1+ε 0 Z r 2 −3 ˜ εψ(Xs ) Z˜s − E 1 + εZ˜s kh(Xs )k2 | Y ds. 0
Then we the term the term
can take the limit on both sides of this equality as ε → 0. For ˜ Z˜ ε ψ(Xt ) | Y] the limit follows by monotone convergence. For E[ t involving π0 (ψ), since X0 has finite third moment, p ˜ 0 (ψ)) = E[ψ(X0 )] < 1 + EkX0 k2 < ∞, E(π (3.84)
the limit follows by the dominated convergence theorem. For the integral involving the generator A we use the bound (3.82) to construct a domi˜ Z˜t kA ψ(Xt )] < ∞, the nating function since using (3.83) it follows that E[ limit then follows by the dominated convergence theorem. This only leaves the integral term which does not involve A; as this is not monotone in ε we must construct a dominating function. As a consequence of (3.28) and the definition of ψ(x),
εψ(X )Z˜ 2 −2 ˜ ε Z s s s kh(Xs )k2 = ψ(Xs )Z˜s kh(Xs )k2 1 + εZ˜s ˜ (1 + εZ˜s )3 1 + εZs ≤ ψ(Xs )Z˜s kh(Xs )k2 ≤ C Z˜s (1 + kXs k2 )1/2 (1 + kXs k2 ) ≤ C Z˜s (1 + kXs k2 )3/2 . and use the fact that the third moment of kXt k is bounded (3.27) to see that this is a suitable dominating function. Hence as ε → 0, Z t 2 −3 ˜ εϕ(Xs ) Z˜s E 1 + εZ˜s kh(Xs )k2 | Y ds → 0, 0
and thus passing to the ε → 0 limit we obtain that Z t Mt , ρt (ψ) − π0 (ψ) + ρs (Aψ) ds
(3.85)
0
˜ t | Fr ] = Mr for 0 ≤ r ≤ t, and Mt is Yt -adapted. To show satisfies E[M ˜ t | < ∞, but this that Mt is a martingale, it only remains to show that E|M follows from the fact that for s ∈ [0, t] using (3.83), h i ˜ t (ψ)] = E ˜ E[ ˜ Z˜t ψ(Xt ) | Y] = E( ˜ Z˜t ψ(Xt )) < ∞, E[ρ together with the bounds (3.82) and (3.84) this implies Z t ˜ ˜ ˜ ˜ s (ψ)] ds < ∞ E[|Mt |] ≤ E(ρt (ψ))) + E[π0 (ψ)] + kA E[ρ 0 p p ≤ 1 + Gt (1 + ka t) + 1 + EkX0 k2 < ∞. But since ρt (ψ) is c` adl` ag (from the properties of ρt ) it follows that Mt ˜ Finally we use the fact that is a c` adl` ag Yt -adapted martingale under P. a c` adl` ag martingale has paths which are bounded on compact intervals in time (a consequence of Doob’s submartingale inequality, see Theorem 3.8 page 13 of Karatzas and Shreve [149] for a proof) to see that ˜ P(sup s∈[0,t] |Mt | < ∞) = 1. Then for ω fixed we have from (3.82) that Z t |ρt (ψ)| ≤ sup |Mt | + |π0 (ψ)| + kA |ρs (ψ)| ds, s∈[0,t]
0
so Gronwall’s inequality implies that ! |ρt (ψ)(ω)| ≤
sup |Mt | + |π0 (ψ)| ekA t , s∈[0,t]
whence for ω not in a null set ρs (ψ) is bounded for s ∈ [0, t]. Hence the result.
ii. Setting H = maxi∈I kh({i})k, since I is finite, H < ∞, thus using the fact that ρs is a probability measure Z t Z t 2 2 ρs (khk) ds ≤ H ρs (1)2 ds. 0
0
From (3.44) with ϕ = 1, since A1 = 0, Z t π0 (1) 1 ε ε 2 2 ˜ ˜ ˜ ˜ E[Zt | Y] = − E ε(Zt ) kh(Xs )k | Y ds 1+ε (1 + εZ˜s ) 0 Z t 1 > ˜ Z˜ ε + E h (X ) | Y dYs . s t 1 + εZ˜s 0 Taking conditional expectation with respect to Yr for 0 ≤ r ≤ t, Z t 1 2 ˜ E[ ˜ Z˜ ε ] + ˜ ε(Z˜ ε )2 Yr E E kh(X )k | Y ds s t t (1 + εZ˜s ) 0 Z r 1 ε ε 2 2 ˜ ˜ ˜ ˜ = E[Zt | Y] + E ε(Zt ) kh(Xs )k | Y ds. (1 + εZ˜s ) 0 Since khk ≤ H, it is straightforward to pass to the limit as ε → 0 which yields ρt (1) is a Yt -martingale. As in case (i) above then this has a c`adl`ag version which is a.s. bounded on finite intervals. Thus Z t ˜ ρs (1) ds < ∞ P-a.s., 0
which establishes (3.42) since the measures $\tilde{\mathbb{P}}$ and $\mathbb{P}$ are equivalent on $\mathcal{F}_t$ and thus have the same null sets.

3.27 i. Observe first that (using the properties of the matrix $Q$):
\[
\rho_t(1) = \sum_{i \in I} \rho_t^i = 1 + \sum_{j=1}^m \int_0^t \rho_s(h^j)\,dY_s^j.
\]
Next apply Itô's formula and integration by parts to obtain the evolution equation of
\[
\pi_t^i = \frac{\rho_t^i}{\sum_{i \in I}\rho_t^i}.
\]
ii. Assume that there are two continuous $\mathcal{Y}_t$-adapted $|I|$-dimensional processes, $\pi$ and $\bar{\pi}$, solutions of the equation (3.53). Show that the continuous $\mathcal{Y}_t$-adapted $|I|$-dimensional processes $\rho$ and $\bar{\rho}$ defined as
\begin{align*}
\rho_t &= \exp\left(\sum_{j=1}^m \left(\int_0^t \pi_s(h^j)\,dY_s^j - \frac12\int_0^t \pi_s(h^j)^2\,ds\right)\right)\pi_t, \qquad t \ge 0 \\
\bar{\rho}_t &= \exp\left(\sum_{j=1}^m \left(\int_0^t \bar{\pi}_s(h^j)\,dY_s^j - \frac12\int_0^t \bar{\pi}_s(h^j)^2\,ds\right)\right)\bar{\pi}_t, \qquad t \ge 0
\end{align*}
satisfy equation (3.52) hence must coincide. Hence their normalised versions must do so, too. Note that the continuity and the adaptedness of the processes are used to ensure that the stochastic integrals appearing in (3.52) and, respectively, (3.53) are well defined.

3.32 It is easiest to start from the finite-dimensional form of the Kushner–Stratonovich equation which was derived as (3.53). The Markov chain has two states, 0 and 1, depending upon whether the event is yet to happen, or has happened. Since it is clear that $\pi_t^0 + \pi_t^1 = 1$, it suffices to write the equation for the component corresponding to the state 1 as this is $\pi_t^1 = \pi_t(J_1)$. Then $h$ is given by $1_{\{T \le t\}}$ and hence $h = J_1$. Writing the equation for state $\{1\}$,
\begin{align*}
\pi_t^1 &= \pi_0^1 + \int_0^t \left(q_{01}\pi_s^0 + q_{11}\pi_s^1\right) ds + \int_0^t \left(h(1) - \pi_s(h)\right)\pi_s^1\,(dY_s - \pi_s^1\,ds) \\
&= \pi_0^1 + \int_0^t (1 - \pi_s^1)\,p_s/g_s\,ds + \int_0^t (1 - \pi_s^1)\pi_s^1\,(dY_s - \pi_s^1\,ds).
\end{align*}
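To make the change detection filter of Exercise 3.32 concrete, the following Python sketch applies a naive Euler discretisation to (3.61). It is an illustration under stated assumptions rather than a recommended numerical scheme: the change time $T$ is assumed exponential with rate `lam` (so that $p_t/g_t$ is constant and equal to `lam`), and the names `change_detection_step` and `simulate`, the step size and the clipping to $[0,1]$ are all hypothetical choices that do not come from the text.

```python
import math
import random


def change_detection_step(p, hazard, dY, dt):
    """One Euler step of (3.61) for p = pi_t(J_1).

    hazard stands for p_t / g_t, i.e. the rate q_01(t) of jumping from 0 to 1.
    """
    return p + (1.0 - p) * hazard * dt + p * (1.0 - p) * (dY - p * dt)


def simulate(lam=1.0, horizon=2.0, n_steps=2000, seed=0):
    """Simulate X_t = 1{T <= t}, dY_t = X_t dt + dW_t and the Euler filter.

    Assumption: T is exponential with rate lam, so that p_t / g_t = lam.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    T = rng.expovariate(lam)              # change time
    p = 0.0                               # P(T <= 0) = 0 for an exponential time
    path = [p]
    for k in range(n_steps):
        x = 1.0 if T <= k * dt else 0.0   # signal on the current step
        dY = x * dt + rng.gauss(0.0, math.sqrt(dt))
        p = change_detection_step(p, lam, dY, dt)
        p = min(1.0, max(0.0, p))         # numerical safeguard, not part of (3.61)
        path.append(p)
    return T, path


one_step = change_detection_step(0.5, 1.0, 0.01, 0.01)
T, path = simulate()
```

For the single step shown, the update is $0.5 + 0.5 \cdot 0.01 + 0.25 \cdot (0.01 - 0.005) = 0.50625$. The clipping is needed only because the Euler scheme, unlike the exact solution, can leave $[0,1]$ for large increments.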
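3.36 Since $\beta$ is bounded, for $\varphi \in C_b^2(\mathbb{R})$ by Itô's formula
\[
\varphi(\alpha_t) - \varphi(\alpha_0) = \int_0^t A_s\varphi(\alpha_s)\,ds + M_t^\varphi,
\]
where
\[
A_s = \beta_s\frac{\partial}{\partial x} + \frac12\frac{\partial^2}{\partial x^2},
\]
and $M_t^\varphi = \int_0^t \varphi'(\alpha_s)\,dV_s$ is an $\mathcal{F}_t$-adapted martingale. Analogously to Theorem 2.24, we can define a probability measure-valued process $\pi_t$, such that for $f_t$ a bounded $\mathcal{F}_t$-adapted process, $\pi(f_t)$ is a version of the $\mathcal{D}_t$-optional projection of $f_t$. The equivalent of the innovations process $I_t$ for this problem is
\[
I_t \triangleq \delta_t - \int_0^t \pi_s(\gamma_s)\,ds,
\]
which is a $\mathcal{D}_t$-adapted Brownian motion under $\mathbb{P}$. By the representation result, Proposition 2.31, we can find a progressively measurable process $\nu_t$ such that
\[
\pi_t(\varphi(\alpha_t)) - \int_0^t \pi_s(A_s\varphi(\alpha_s))\,ds = \pi_0(\varphi(\alpha_0)) + \int_0^t \nu_s\,dI_s,
\]
therefore it follows that
\[
\pi_t(\varphi(\alpha_t)) = \pi_0(\varphi(\alpha_0)) + \int_0^t \pi_s(A_s\varphi(\alpha_s))\,ds + \int_0^t \nu_s\,dI_s.
\]
As in the innovations proof of the Kushner–Stratonovich equation, to identify $\nu$, we can compute $d(\pi_t(\varphi(\alpha_t))\varepsilon_t)$ and $d(\varepsilon_t\varphi(\alpha_t))$ whence subtracting and taking expectations and using the independence of $W$ and $V$ we obtain that
\[
\nu_t = \pi_t(\gamma_t\varphi(\alpha_t)) - \pi_t(\gamma_t)\pi_t(\varphi(\alpha_t)),
\]
whence
\begin{align*}
\pi_t(\varphi(\alpha_t)) &= \pi_0(\varphi(\alpha_0)) + \int_0^t \pi_s\left(\beta_s\varphi'(\alpha_s) + \tfrac12\varphi''(\alpha_s)\right) ds \\
&\quad + \int_0^t \left(\pi_s(\gamma_s\varphi(\alpha_s)) - \pi_s(\gamma_s)\pi_s(\varphi(\alpha_s))\right)(d\delta_s - \pi_s(\gamma_s)\,ds).
\end{align*}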
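3.37 i. By Itô's formula
\begin{align*}
d(\hat{Z}_t^{-1}) &= \hat{Z}_t^{-1}\left(-\pi_t(h^\top)\,dY_t + \tfrac12\|\pi_t(h)\|^2\,dt\right) + \tfrac12\hat{Z}_t^{-1}\|\pi_t(h)\|^2\,dt \\
&= -\hat{Z}_t^{-1}\pi_t(h^\top)\,(dY_t - \pi_t(h)\,dt) = -\hat{Z}_t^{-1}\pi_t(h^\top)\,dI_t.
\end{align*}

ii. Let $\varepsilon_t \in S_t$ be such that $d\varepsilon_t = i\varepsilon_t r_t^\top\,dY_t$ and apply Itô's formula to the product $\varepsilon_t\hat{Z}_t^{-1}$, which yields
\begin{align*}
d(\varepsilon_t\hat{Z}_t^{-1}) &= -\varepsilon_t\hat{Z}_t^{-1}\pi_t(h^\top)\,dI_t + i\hat{Z}_t^{-1}\varepsilon_t r_t^\top\,dY_t - i\varepsilon_t\hat{Z}_t^{-1}\left\langle r_t^\top\,dY_t,\ \pi_t(h^\top)\,dI_t\right\rangle \\
&= \varepsilon_t\hat{Z}_t^{-1}\left(-\pi_t(h^\top)\,dI_t + i r_t^\top\,dY_t - i r_t^\top\pi_t(h)\,dt\right) \\
&= \varepsilon_t\hat{Z}_t^{-1}\left(-\pi_t(h^\top) + i r_t^\top\right) dI_t.
\end{align*}
Since by Proposition 2.30 the innovation process $I_t$ is a $\mathcal{Y}_t$-adapted Brownian motion under the measure $\mathbb{P}$ it follows that taking expectation
\[
\mathbb{E}[\varepsilon_t\hat{Z}_t^{-1}] = \mathbb{E}[\varepsilon_0\hat{Z}_0^{-1}] = 1.
\]
Now consider
\[
\mathbb{E}[Z_t\varepsilon_t] = \tilde{\mathbb{E}}[\varepsilon_t] = \tilde{\mathbb{E}}\left[1 + \int_0^t i\varepsilon_s r_s^\top\,dY_s\right] = 1,
\]
since $Y_t$ is a Brownian motion under $\tilde{\mathbb{P}}$. Thus
\[
\mathbb{E}[\hat{Z}_t^{-1}\varepsilon_t] = \mathbb{E}[Z_t\varepsilon_t].
\]

iii. It follows from the result of the previous part that
\[
\tilde{\mathbb{E}}\left[\tilde{Z}_t\varepsilon_t/\hat{Z}_t\right] = \tilde{\mathbb{E}}[\tilde{Z}_t Z_t\varepsilon_t].
\]
Hence
\[
\tilde{\mathbb{E}}\left[\varepsilon_t\left(\hat{Z}_t^{-1}\tilde{Z}_t - 1\right)\right] = 0.
\]
Clearly $\hat{Z}_t$ and $\varepsilon_t$ are $\mathcal{Y}_t$-measurable, so
\[
\tilde{\mathbb{E}}\left[\varepsilon_t\left(\hat{Z}_t^{-1}\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] - 1\right)\right] = 0.
\]
Since $\hat{Z}_t^{-1}\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] - 1$ is $\mathcal{Y}_t$-measurable, it follows from the total set property of $S_t$ that
\[
\hat{Z}_t^{-1}\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t] = 1, \qquad \mathbb{P}\text{-a.s.}
\]
Since $\hat{Z}_t > 0$ it follows that
\[
\hat{Z}_t = \tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t].
\]
We may drop the a.s. qualification since it is implicit from the fact that conditional expectations are only defined almost surely.

iv. By the Kallianpur–Striebel formula $\mathbb{P}$-a.s. using the result of part (iii)
\[
\pi_t(\varphi) = \frac{\rho_t(\varphi)}{\rho_t(1)} = \hat{Z}_t^{-1}\rho_t(\varphi).
\]
Hence $\rho_t(\varphi) = \hat{Z}_t\pi_t(\varphi)$, and note that by a simple application of Itô's formula
\[
d\hat{Z}_t = \hat{Z}_t\pi_t(h^\top)\,dY_t.
\]
Starting from the Kushner–Stratonovich equation
\[
d\pi_t(\varphi) = \pi_t(A\varphi)\,dt + \pi_t(\varphi h^\top)\,dI_t - \pi_t(\varphi)\pi_t(h^\top)\,dI_t.
\]
Applying Itô's formula to the product $\hat{Z}_t\pi_t(\varphi)$ we find
\begin{align*}
d\rho_t(\varphi) &= d\pi_t(\varphi)\,\hat{Z}_t + \pi_t(\varphi)\hat{Z}_t\pi_t(h^\top)\,dY_t + d\langle\hat{Z}, \pi(\varphi)\rangle_t \\
&= \left(\pi_t(A\varphi)\,dt + \pi_t(\varphi h^\top)\,dI_t - \pi_t(\varphi)\pi_t(h^\top)\,dI_t\right)\hat{Z}_t \\
&\quad + \pi_t(\varphi)\hat{Z}_t\pi_t(h^\top)\,dY_t + \hat{Z}_t\pi_t(h)\left(\pi_t(\varphi h^\top) - \pi_t(\varphi)\pi_t(h^\top)\right) dt \\
&= \hat{Z}_t\left(\pi_t(A\varphi)\,dt + \pi_t(\varphi h^\top)\,dY_t\right) \\
&= \rho_t(A\varphi)\,dt + \rho_t(\varphi h^\top)\,dY_t.
\end{align*}
But this is the Zakai equation as required.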
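The relation $\rho_t(\varphi) = \hat{Z}_t\pi_t(\varphi)$ between the two equations can be checked numerically in the finite-state setting. The sketch below takes one Euler step of a finite-state Zakai equation, normalises it, and compares the result with one Euler step of the corresponding Kushner–Stratonovich equation. The discrete forms used, the scalar observation ($m = 1$), the two-state data and all function names are assumptions made for illustration and are not taken from the text; the increment is chosen with $(dY)^2 = dt$ to mimic the Itô rule.

```python
def zakai_step(rho, Q, h, dY, dt):
    """Euler step of d rho^i = sum_j q_ji rho^j dt + h_i rho^i dY (assumed form)."""
    n = len(rho)
    return [rho[i]
            + sum(Q[j][i] * rho[j] for j in range(n)) * dt
            + h[i] * rho[i] * dY
            for i in range(n)]


def ks_step(pi, Q, h, dY, dt):
    """Euler step of d pi^i = sum_j q_ji pi^j dt
    + pi^i (h_i - pi(h)) (dY - pi(h) dt) (assumed form)."""
    n = len(pi)
    pi_h = sum(pi[i] * h[i] for i in range(n))
    return [pi[i]
            + sum(Q[j][i] * pi[j] for j in range(n)) * dt
            + pi[i] * (h[i] - pi_h) * (dY - pi_h * dt)
            for i in range(n)]


def normalise(rho):
    """Divide by the total mass rho(1), as in the Kallianpur-Striebel formula."""
    total = sum(rho)
    return [r / total for r in rho]


# Illustrative two-state data (assumptions, not taken from the text).
Q = [[-1.0, 1.0], [0.0, 0.0]]
h = [0.0, 1.0]
pi0 = [0.5, 0.5]
dt = 1e-4
dY = dt ** 0.5   # so that dY * dY equals dt up to rounding

via_zakai = normalise(zakai_step(pi0, Q, h, dY, dt))
via_ks = ks_step(pi0, Q, h, dY, dt)
gap = max(abs(a - b) for a, b in zip(via_zakai, via_ks))
```

The two one-step updates agree up to terms of higher order than `dt`, which is the discrete counterpart of the Itô correction $-\pi_t(h)\,dt$ appearing in the innovation $dI_t$.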
3.10 Bibliographical Notes

In [160], Krylov and Rozovskii develop the theory of strong solutions of Itô equations in Banach spaces and use this theory to deduce the filtering equations in a different manner from the two methods presented here. In [163], Krylov and Zatezalo deduce the filtering equations using a PDE, rather than probabilistic, approach. They use extensively the elaborate theoretical framework for analyzing SPDEs developed by Krylov in [157] and [158]. The approach requires boundedness of the coefficients and strict ellipticity of the signal's diffusion matrix.
4 Uniqueness of the Solution to the Zakai and the Kushner–Stratonovich Equations
The conditional distribution of the signal $\pi = \{\pi_t, t \ge 0\}$ is a solution of the Kushner–Stratonovich equation, whilst its unnormalised version $\rho = \{\rho_t, t \ge 0\}$ solves the Zakai equation. It then becomes natural to ask whether the Zakai equation uniquely characterizes $\rho$, and the Kushner–Stratonovich equation uniquely characterizes $\pi$. In other words, we should like to know under what assumptions on the coefficients of the signal and observation processes the two equations have a unique solution. The question of uniqueness of the solutions of the two equations is central when attempting to approximate $\pi$ or $\rho$ numerically, as most of the analysis of existing numerical algorithms relies on the SPDE characterization of the two processes.

To answer the uniqueness question one has to identify suitable spaces of possible solutions to the equations (3.43) and (3.57). These spaces must be large enough to allow for the existence of solutions of the corresponding SPDE. Thus $\pi$ should naturally belong to the space of possible solutions for the Kushner–Stratonovich equation, and $\rho$ to the space of possible solutions to the Zakai equation. However, if we choose a space of possible solutions which is too large this may make the analysis more difficult, and even allow multiple solutions.

In the following we present two approaches to prove the uniqueness of the solutions to the two equations: the first one is a PDE approach, inspired by Bensoussan [13]; the second one is a more recent functional analytic approach introduced by Lucic and Heunis [200]. For both approaches the following result is useful.

Exercise 4.1. Let $\mu^1 = \{\mu^1_t, t \ge 0\}$ and $\mu^2 = \{\mu^2_t, t \ge 0\}$ be two $\mathcal{M}(S)$-valued stochastic processes with càdlàg paths and $(\varphi_i)_{i\ge 0}$ be a separating set of bounded measurable functions (in the sense of Definition 2.12). If for each $t \ge 0$ and $i \ge 0$, the identity $\mu^1_t(\varphi_i) = \mu^2_t(\varphi_i)$ holds almost surely, then $\mu^1$ and $\mu^2$ are indistinguishable.
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, © Springer Science+Business Media, LLC 2009, DOI 10.1007/978-0-387-76896-0_4
4.1 The PDE Approach to Uniqueness

In this section we assume that the state space of the signal is $S = \mathbb{R}^d$ and that the signal process is a diffusion process as described in Section 3.2.1. First we define the space of measure-valued stochastic processes within which we prove uniqueness of the solution. This space has to be chosen so that it contains only measures with respect to which the integral of any function with linear growth is finite. The reason for this is that we want to allow the coefficients of the signal and observation processes to be unbounded. Define first the class of integrands for these measures. Let $\psi : \mathbb{R}^d \to \mathbb{R}$ be the function
\[
\psi(x) = 1 + \|x\|, \tag{4.1}
\]
for any $x \in \mathbb{R}^d$ and define $C^l(\mathbb{R}^d)$ to be the space of continuous functions $\varphi$ such that $\varphi/\psi \in C_b(\mathbb{R}^d)$. Endow the space $C^l(\mathbb{R}^d)$ with the norm
\[
\|\varphi\|^l_\infty = \sup_{x\in\mathbb{R}^d}\frac{|\varphi(x)|}{\psi(x)}.
\]
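As a rough illustration of the weighted norm just defined, the following sketch approximates $\|\varphi\|^l_\infty$ by a maximum over sample points. The helper names `psi` and `weighted_sup_norm`, the grid, and the test function $\varphi(x) = 3x + 1$ are assumptions made for this example only; a maximum over finitely many points gives merely a lower bound for the supremum over $\mathbb{R}^d$.

```python
import math


def psi(x):
    """Weight psi(x) = 1 + ||x|| from (4.1); x is a sequence of floats."""
    return 1.0 + math.sqrt(sum(c * c for c in x))


def weighted_sup_norm(phi, points):
    """Maximum of |phi(x)| / psi(x) over the sample points.

    This is only a lower bound for the supremum over all of R^d.
    """
    return max(abs(phi(x)) / psi(x) for x in points)


# Hypothetical example in d = 1: phi is unbounded but has linear growth,
# so phi / psi stays bounded and the weighted norm is finite (here it is 3).
grid = [(k / 10.0,) for k in range(-10000, 10001)]
phi = lambda x: 3.0 * x[0] + 1.0
estimate = weighted_sup_norm(phi, grid)
print(estimate)
```

The printed estimate approaches 3 from below as the grid is extended, which is consistent with $\varphi \in C^l(\mathbb{R})$ even though $\varphi \notin C_b(\mathbb{R})$.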
Also let $E$ be the space of continuous functions $\varphi : [0,\infty)\times\mathbb{R}^d \to \mathbb{R}$ such that for all $t \ge 0$, we have
\[
\sup_{s\in[0,t]}\|\varphi_s\|^l_\infty < \infty, \tag{4.2}
\]
where $\varphi_s(x) = \varphi(s,x)$ for any $(s,x) \in [0,\infty)\times\mathbb{R}^d$. Let $\mathcal{M}^l(\mathbb{R}^d) \subset \mathcal{M}(\mathbb{R}^d)$ be the space of finite measures $\mu$ over $\mathcal{B}(\mathbb{R}^d)$ such that $\mu(\psi) < \infty$. In particular, this implies that $\mu(\varphi) < \infty$ for all $\varphi \in C^l(\mathbb{R}^d)$. We endow $\mathcal{M}^l(\mathbb{R}^d)$ with the corresponding weak topology. That is, a sequence $(\mu_n)$ of measures in $\mathcal{M}^l(\mathbb{R}^d)$ converges to $\mu \in \mathcal{M}^l(\mathbb{R}^d)$ if and only if
\[
\lim_{n\to\infty}\mu_n(\varphi) = \mu(\varphi), \tag{4.3}
\]
for all $\varphi \in C^l(\mathbb{R}^d)$. Obviously this topology is finer than the usual weak topology (i.e. the topology under which (4.3) holds true only for $\varphi \in C_b(\mathbb{R}^d)$).

Exercise 4.2. For any $\mu \in \mathcal{M}^l(\mathbb{R}^d)$ define $\nu_\mu \in \mathcal{M}(\mathbb{R}^d)$ to be the measure whose Radon–Nikodym derivative with respect to $\mu$ is $\psi$ (defined in (4.1)). Let $\mu, \mu_n$, $n \ge 1$ be measures in $\mathcal{M}^l(\mathbb{R}^d)$. Then $\mu_n$ converges to $\mu$ in $\mathcal{M}^l(\mathbb{R}^d)$ if and only if $(\nu_{\mu_n})$ converges weakly to $\nu_\mu$ in $\mathcal{M}(\mathbb{R}^d)$.

Definition 4.3. The class $\mathcal{U}$ is the space of all $\mathcal{Y}_t$-adapted $\mathcal{M}^l(\mathbb{R}^d)$-valued stochastic processes $\mu = \{\mu_t, t \ge 0\}$ with càdlàg paths such that, for all $t \ge 0$, we have
\[
\tilde{\mathbb{E}}\left[\int_0^t (\mu_s(\psi))^2\,ds\right] < \infty. \tag{4.4}
\]
Exercise 4.4. (Difficult) Let $X$ be the solution of (3.9). Prove that if (3.10) is satisfied, $X_0$ has finite second moment, and $h$ is bounded then $\rho$ belongs to the class $\mathcal{U}$. [Hint: You will need to use the Kallianpur–Striebel formula and the normalised conditional distribution $\pi_t$.]

We prove that the Zakai equation (3.43) has a unique solution in the class $\mathcal{U}$ subject to the following conditions on the processes.

Condition 4.5 (U). The functions $f = (f^i)_{i=1}^d : \mathbb{R}^d \to \mathbb{R}^d$ appearing in the signal equation (3.9), $a = (a^{ij})_{i,j=1,\dots,d} : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ as defined in (3.12) and $h = (h^i)_{i=1}^m : \mathbb{R}^d \to \mathbb{R}^m$ appearing in the observation equation (3.5) have twice continuously differentiable components and all their derivatives of first- and second-order are bounded.

Remark 4.6. Under condition U all components of the functions $a$, $f$ and $h$ are in $C^l(\mathbb{R}^d)$, but need not be bounded. However, condition U does imply that $a$, $f$ and $h$ satisfy the linear growth condition (see Exercise 4.11 for details).

Exercise 4.7.
i. Show that if the process $\mu$ belongs to the class $\mathcal{U}$ then $t \mapsto \mu_t(\varphi_t)$ is a $\mathcal{Y}_t$-adapted process for all $\varphi \in E$ (where $E$ is defined in (4.2)).
ii. Let $\varphi$ be a function in $C_b^{1,2}([0,t]\times\mathbb{R}^d)$ and $\mu$ be a process in the class $\mathcal{U}$. Assume that $h$ satisfies the linear growth condition (3.28). Then the processes
\[
t \mapsto \int_0^t \mu_s\left(\frac{\partial\varphi_s}{\partial s} + A\varphi_s\right)ds, \qquad t \ge 0,
\]
\[
t \mapsto \int_0^t \mu_s(\varphi_s h^\top)\,dY_s, \qquad t \ge 0,
\]
are well defined $\mathcal{Y}_t$-adapted processes. In particular, the second process is a square integrable continuous martingale under the measure $\tilde{\mathbb{P}}$.

When establishing uniqueness of the solution of the Zakai equation, we need to make use of a time-inhomogeneous version of (3.43).

Lemma 4.8. Assume that the coefficients $a$, $f$ and $h$ satisfy condition U. Let $\mu$ be a process belonging to the class $\mathcal{U}$ which satisfies (3.43) for any $\varphi \in \mathcal{D}(A)$. Then, $\tilde{\mathbb{P}}$-almost surely,
\[
\mu_t(\varphi_t) = \pi_0(\varphi_0) + \int_0^t \mu_s\left(\frac{\partial\varphi_s}{\partial s} + A\varphi_s\right)ds + \int_0^t \mu_s(\varphi_s h^\top)\,dY_s, \tag{4.5}
\]
for any $\varphi \in C_b^{1,2}([0,t]\times\mathbb{R}^d)$.

Proof. Let us first prove that under condition U, $\mu$ satisfies equation (3.43) for any function $\varphi \in C_b^2(\mathbb{R}^d)$, not just for $\varphi$ in the domain of the infinitesimal generator, $\varphi \in \mathcal{D}(A) \subset C_b^2(\mathbb{R}^d)$. We do this via an approximation argument.
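Choose a sequence $(\varphi_n)$ with $\varphi_n \in \mathcal{D}(A)$ (e.g. $\varphi_n \in C_k^2(\mathbb{R}^d)$) such that $\varphi_n$, $\partial_\alpha\varphi_n$, $\alpha = 1,\dots,d$ and $\partial_\alpha\partial_\beta\varphi_n$, $\alpha,\beta = 1,\dots,d$ converge boundedly pointwise to $\varphi$, $\partial_\alpha\varphi$, $\alpha = 1,\dots,d$ and $\partial_\alpha\partial_\beta\varphi$, $\alpha,\beta = 1,\dots,d$. In other words the sequence $(\varphi_n)$ is uniformly bounded and for all $x \in \mathbb{R}^d$, $\lim_{n\to\infty}\varphi_n(x) = \varphi(x)$, with a similar convergence assumed for the first and second partial derivatives of $\varphi_n$. Then, $\tilde{\mathbb{P}}$-almost surely,
\[
\mu_t(\varphi_n) = \pi_0(\varphi_n) + \int_0^t \mu_s(A\varphi_n)\,ds + \int_0^t \mu_s(\varphi_n h^\top)\,dY_s. \tag{4.6}
\]
Since $(\varphi_n)$ is uniformly bounded and pointwise convergent, by the dominated convergence theorem, we get that
\[
\lim_{n\to\infty}\mu_t(\varphi_n) = \mu_t(\varphi), \tag{4.7}
\]
and similarly
\[
\lim_{n\to\infty}\pi_0(\varphi_n) = \pi_0(\varphi). \tag{4.8}
\]
The use of bounded pointwise convergence and condition U implies that there exists a constant $K$ such that
\[
|A\varphi_n(x)| \le K\psi(x),
\]
for any $x \in \mathbb{R}^d$ and $n > 0$. Since $\mu \in \mathcal{U}$ implies that $\mu_s(\psi) < \infty$, by the dominated convergence theorem $\lim_{n\to\infty}\mu_s(A\varphi_n) = \mu_s(A\varphi)$. Also, from (4.4) it follows that
\[
\tilde{\mathbb{E}}\left[\int_0^t \mu_s(\psi)\,ds\right] \le \frac{1}{2}\,\tilde{\mathbb{E}}\left[\int_0^t \left(1 + (\mu_s(\psi))^2\right)ds\right] < \infty. \tag{4.9}
\]
Therefore, $\tilde{\mathbb{P}}$-almost surely,
\[
\int_0^t \mu_s(\psi)\,ds < \infty
\]
and, again by the dominated convergence theorem, it follows that
\[
\lim_{n\to\infty}\int_0^t \mu_s(A\varphi_n)\,ds = \int_0^t \mu_s(A\varphi)\,ds \qquad \tilde{\mathbb{P}}\text{-a.s.} \tag{4.10}
\]
Similarly, one uses the integrability condition (4.4) and again the dominated convergence theorem to show that for $i = 1,\dots,m$,
\[
\lim_{n\to\infty}\tilde{\mathbb{E}}\left[\int_0^t \left(\mu_s(\varphi_n h^i) - \mu_s(\varphi h^i)\right)^2 ds\right] = 0;
\]
hence by Itô's isometry property, we get that
\[
\lim_{n\to\infty}\int_0^t \mu_s(\varphi_n h^\top)\,dY_s = \int_0^t \mu_s(\varphi h^\top)\,dY_s. \tag{4.11}
\]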
Finally, by taking the limit of both sides of the identity (4.6) and using the results (4.7), (4.8), (4.10) and (4.11) we obtain that $\mu$ satisfies equation (3.43) for any function $\varphi \in C_b^2(\mathbb{R}^d)$. The limiting processes $t \mapsto \int_0^t \mu_s(A\varphi)\,ds$ and $t \mapsto \int_0^t \mu_s(\varphi h^\top)\,dY_s$, $t \ge 0$ are well defined as a consequence of Exercise 4.7.

Let us extend the result to the case of time-dependent test functions $\varphi \in C_b^{1,2}([0,t]\times\mathbb{R}^d)$. Once again by Exercise 4.7 all the integral terms in (4.5) are well defined and finite. Also from (3.43), for $i = 0, 1, \dots, n-1$ we have
\[
\mu_{(i+1)t/n}(\varphi_{it/n}) = \mu_{it/n}(\varphi_{it/n}) + \int_{it/n}^{(i+1)t/n}\mu_s(A\varphi_{it/n})\,ds + \int_{it/n}^{(i+1)t/n}\mu_s(\varphi_{it/n}h^\top)\,dY_s.
\]
By Fubini's theorem we have that
\[
\mu_{(i+1)t/n}(\varphi_{(i+1)t/n} - \varphi_{it/n}) = \int_{it/n}^{(i+1)t/n}\mu_{(i+1)t/n}\left(\frac{\partial\varphi_s}{\partial s}\right)ds.
\]
Hence
\[
\begin{aligned}
\mu_{(i+1)t/n}(\varphi_{(i+1)t/n}) &= \mu_{(i+1)t/n}(\varphi_{(i+1)t/n} - \varphi_{it/n}) + \mu_{(i+1)t/n}(\varphi_{it/n}) \\
&= \mu_{it/n}(\varphi_{it/n}) + \int_{it/n}^{(i+1)t/n}\mu_{(i+1)t/n}\left(\frac{\partial\varphi_s}{\partial s}\right)ds \\
&\qquad + \int_{it/n}^{(i+1)t/n}\mu_s(A\varphi_{it/n})\,ds + \int_{it/n}^{(i+1)t/n}\mu_s(\varphi_{it/n}h^\top)\,dY_s.
\end{aligned}
\]
Summing over the intervals $[it/n, (i+1)t/n]$ from $i = 0$ to $n-1$,
\[
\mu_t(\varphi_t) = \pi_0(\varphi_0) + \int_0^t \mu_{([ns/t]+1)t/n}\left(\frac{\partial\varphi_s}{\partial s}\right)ds + \int_0^t \mu_s\left(A\varphi_{[ns/t]t/n}\right)ds + \int_0^t \mu_s\left(\varphi_{[ns/t]t/n}h^\top\right)dY_s. \tag{4.12}
\]
The claim follows by taking the limit as $n$ tends to infinity of both sides of the identity (4.12) and using repeatedly the dominated convergence theorem. Note that we use the càdlàg property of the paths of $\mu$ to find the upper bound for the second term. □
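The time indices $[ns/t]\,t/n$ and $([ns/t]+1)\,t/n$ in (4.12) are simply the left and right endpoints of the partition interval containing $s$. The sketch below, with hypothetical helper names `left_endpoint` and `right_endpoint` and exact rational arithmetic chosen here to avoid floating-point rounding in the floor, shows how the left endpoint approaches $s$ as the partition is refined.

```python
from fractions import Fraction
import math


def left_endpoint(s, t, n):
    """Return floor(n*s/t) * t/n, the left endpoint of the interval containing s."""
    s, t = Fraction(s), Fraction(t)
    return math.floor(n * s / t) * t / n


def right_endpoint(s, t, n):
    """Return (floor(n*s/t) + 1) * t/n, the matching right endpoint."""
    return left_endpoint(s, t, n) + Fraction(t) / n


# Distance from s = 1/3 to its left endpoint on [0, 1] for finer partitions.
s = Fraction(1, 3)
gaps = [s - left_endpoint(s, 1, n) for n in (1, 10, 100, 1000)]
print(gaps)
```

Each gap is non-negative and smaller than the mesh $t/n$, which is the elementary fact behind replacing $\varphi_{[ns/t]t/n}$ by $\varphi_s$ in the limit.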
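Exercise 4.9. Assume that the coefficients $a$, $f$ and $h$ satisfy condition U. Let $\mu$ be a process belonging to the class $\mathcal{U}$ which satisfies the Zakai equation (3.43) and $\varphi$ be a function in $C_b^{1,2}([0,t]\times\mathbb{R}^d)$. Let $\varepsilon_t \in S_t$, where $S_t$ is the set defined in Corollary B.40, that is,
\[
\varepsilon_t = \exp\left(i\int_0^t r_s^\top\,dY_s + \frac{1}{2}\int_0^t \|r_s\|^2\,ds\right),
\]
where $r \in C_b^m([0,t],\mathbb{R}^m)$. Then
\[
\tilde{\mathbb{E}}[\varepsilon_t\mu_t(\varphi_t)] = \pi_0(\varphi_0) + \tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\,\mu_s\left(\frac{\partial\varphi_s}{\partial s} + A\varphi_s + i\varphi_s h^\top r_s\right)ds\right] \tag{4.13}
\]
for any $\varphi \in C_b^{1,2}([0,t]\times\mathbb{R}^d)$.

In the following we establish the existence of a function $\varphi \in C_b^{1,2}([0,t]\times\mathbb{R}^d)$ which plays the rôle of a (partial) function dual of the process $\mu$; in other words we seek $\varphi$ such that for $s \in [0,t]$, $\mu_s(\varphi_s) = 0$. In particular as a consequence of (4.13) and the fact that the set $S_t$ is total, such a function could arise as a solution $\varphi \in C_b^{1,2}([0,t]\times\mathbb{R}^d)$ of the second-order parabolic partial differential equation
\[
\frac{\partial\varphi_s}{\partial s}(x) + A\varphi_s(x) + i\varphi_s(x)h^\top(x)r_s = 0, \tag{4.14}
\]
where the operator $A$ is given by
\[
A\varphi = \sum_{i,j=1}^d a^{ij}\frac{\partial^2\varphi}{\partial x_i\partial x_j} + \sum_{i=1}^d f^i\frac{\partial\varphi}{\partial x_i}.
\]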
This leads to a unique characterisation of $\mu$. The partial differential equation (4.14) turns out to be very hard to analyse for two reasons. Firstly, the coefficients $a^{ij}(x)$ for $i,j = 1,\dots,d$, $f^i(x)$ for $i = 1,\dots,d$, and $h^i(x)$ for $i = 1,\dots,m$ are not in general bounded as functions of $x$. Secondly, the matrix $a(x)$ may be degenerate at some points $x \in \mathbb{R}^d$. A few remarks on this degeneracy may be helpful. Since $a(x) = \frac{1}{2}\sigma^\top(x)\sigma(x)$ it is clear that
\[
y^\top a(x)y = \tfrac{1}{2}y^\top\sigma^\top(x)\sigma(x)y = \tfrac{1}{2}(\sigma(x)y)^\top(\sigma(x)y) \ge 0,
\]
thus for all $x \in \mathbb{R}^d$, $a(x)$ is positive semidefinite. However, $a(x)$ is not guaranteed to be positive definite for all $x \in \mathbb{R}^d$; in other words there may exist $x \in \mathbb{R}^d$ and a non-zero $y$ such that $y^\top a(x)y = 0$. For example, if $a(x) = 0$ for some $x$, then $a(x)$ is not positive definite. Such a situation is not physically unrealistic since it has the interpretation of an absence of noise in the signal process at the point $x$. A typical existence and uniqueness result for parabolic PDEs is the following.

Theorem 4.10. If the PDE
\[
\frac{\partial\varphi_t}{\partial t} = \sum_{i,j=1}^d a^{ij}\frac{\partial^2\varphi_t}{\partial x_i\partial x_j} + \sum_{i=1}^d f^i\frac{\partial\varphi_t}{\partial x_i} \tag{4.15}
\]
is uniformly parabolic, that is, if there exists $\lambda > 0$ such that $x^\top a x \ge \lambda\|x\|^2$ for every $x \ne 0$, the functions $f$ and $a$ are bounded and Hölder continuous with exponent $\alpha$ and $\Phi$ is a $C^{2+\alpha}$ function, then there exists a unique solution to the initial condition problem given by (4.15) and the condition $\varphi_0(x) = \Phi(x)$. Furthermore if the coefficients $a$, $f$ and the initial condition $\Phi$ are infinitely differentiable then the solution $\varphi$ is infinitely differentiable in the spatial variable $x$.

The proof of the existence of solutions to the parabolic PDE is fairly difficult and its length precludes its inclusion here. These details can be found in Friedman [102] as Theorem 7 of Chapter 3 and the continuity result follows from Corollary 2 in Chapter 3. Recall that the Hölder continuity condition is satisfied with $\alpha = 1$ for Lipschitz functions.

As these conditions are not satisfied by the PDE (4.14), we use a sequence of functions $(v^n)$ which solve uniformly parabolic PDEs with smooth bounded coefficients. For this, we approximate $a$, $f$ and $h$ by bounded continuous functions. More precisely let $(a_n)_{n\ge 1}$ be a sequence of functions $a_n : \mathbb{R}^d \to \mathbb{R}^{d\times d}$, $(f_n)_{n\ge 1}$ a sequence of functions $f_n : \mathbb{R}^d \to \mathbb{R}^d$ and $(h_n)_{n\ge 1}$ a sequence of functions $h_n : \mathbb{R}^d \to \mathbb{R}^m$. We denote components as usual by superscript indices. We require that these sequences of functions have the following properties. All the component functions have bounded continuous derivatives of all orders; in other words each component is an element of $C_b^\infty(\mathbb{R}^d)$. There exists a constant $K_0$ such that the bounds on the first- and second-order derivatives (but not necessarily on the function values) hold uniformly in $n$,
\[
\sup_n\max_{i,j,\alpha}\left\|\partial_\alpha a_n^{ij}\right\|_\infty \le K_0, \qquad \sup_n\max_{i,j,\alpha,\beta}\left\|\partial_\alpha\partial_\beta a_n^{ij}\right\|_\infty \le K_0, \tag{4.16}
\]
and the same inequality holds true for the partial derivatives of the components of $f_n$ and $h_n$. We also require that these sequences converge to the original functions $a$, $f$ and $h$; i.e. $\lim_{n\to\infty}a_n(x) = a(x)$, $\lim_{n\to\infty}f_n(x) = f(x)$ and $\lim_{n\to\infty}h_n(x) = h(x)$ for any $x \in \mathbb{R}^d$. Finally we require that the matrix $a_n$ is uniformly elliptic; in other words for each $n$, there exists $\lambda_n$ such that $x^\top a_n x \ge \lambda_n\|x\|^2$ for all $x \in \mathbb{R}^d$. We write
\[
A_n \triangleq \sum_{i,j=1}^d a_n^{ij}\frac{\partial^2}{\partial x_i\partial x_j} + \sum_{i=1}^d f_n^i\frac{\partial}{\partial x_i},
\]
for the associated generator of the $n$th approximating system.†

† To obtain $a_n$, we use first the procedure detailed in Section 6.2.1. That is, we consider first the function $\psi^n a$, where $\psi^n$ is the function defined in (6.23) (see also the limits (6.24), (6.25) and (6.26)). Then we regularize $\psi^n a$ by using the convolution operator $T_{1/n}$ as defined in (7.4), to obtain the function $T_{1/n}(\psi^n a)$. More precisely, $T_{1/n}(\psi^n a)$ is a matrix-valued function with components $T_{1/n}(\psi^n a^{ij})$, $1 \le i,j \le d$. Finally, we define the function $a_n$ to be equal to $T_{1/n}(\psi^n a) + \frac{1}{n}I_d$, where $I_d$ is the $d\times d$ identity matrix. The functions $f_n$ and $h_n$ are constructed in the same manner (without using the last step).
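The role of the last step in the footnote, adding $\frac{1}{n}I_d$, can be checked on a small example. The sketch below is an illustration only: it skips the truncation by $\psi^n$ and the convolution $T_{1/n}$ entirely, and the matrix $\sigma$, the value $n = 10$ and the helper names are assumptions chosen for the example. It takes a degenerate $a = \frac{1}{2}\sigma^\top\sigma$ and verifies numerically that $a + \frac{1}{n}I_d$ satisfies $y^\top(a + \frac{1}{n}I_d)y \ge \frac{1}{n}\|y\|^2$.

```python
import random


def quad_form(a, y):
    """Return y^T a y for a square matrix a given as nested lists."""
    d = len(y)
    return sum(y[i] * a[i][j] * y[j] for i in range(d) for j in range(d))


def add_scaled_identity(a, eps):
    """Return a + eps * I_d."""
    d = len(a)
    return [[a[i][j] + (eps if i == j else 0.0) for j in range(d)] for i in range(d)]


# Hypothetical degenerate example: a = (1/2) sigma^T sigma with rank-one sigma.
sigma = [[1.0, 1.0], [0.0, 0.0]]
a = [[0.5 * sum(sigma[k][i] * sigma[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
y_null = [1.0, -1.0]  # sigma y = 0, so y^T a y = 0 and a is not positive definite

n = 10
a_n = add_scaled_identity(a, 1.0 / n)

# Check the ellipticity bound y^T a_n y >= (1/n) ||y||^2 on random directions.
random.seed(0)
samples = [[random.uniform(-5, 5), random.uniform(-5, 5)] for _ in range(1000)]
elliptic = all(
    quad_form(a_n, y) >= (1.0 / n) * (y[0] ** 2 + y[1] ** 2) - 1e-9 for y in samples
)
print(quad_form(a, y_null), quad_form(a_n, y_null), elliptic)
```

In this example the ellipticity constant is $\lambda_n = 1/n$, which tends to zero with $n$; this is consistent with the text requiring a constant $\lambda_n$ for each $n$ rather than a bound uniform in $n$.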
Exercise 4.11. If condition U holds, show that the entries of the sequences $(a_n)_{n\ge 1}$, $(f_n)_{n\ge 1}$ and $(h_n)_{n\ge 1}$ belong to $C^l(\mathbb{R}^d)$. Moreover show that there exists a constant $K_1$ such that
\[
\sup_n\max\left\{\max_{i,j}\left\|a_n^{ij}\right\|^l_\infty,\ \max_i\left\|f_n^i\right\|^l_\infty,\ \max_i\left\|h_n^i\right\|^l_\infty\right\} \le K_1.
\]

Next we use a result from the theory of systems of parabolic partial differential equations. Consider the following partial differential equation
\[
\frac{\partial v_s^n}{\partial s} = -A_n v_s^n - i v_s^n h_n^\top r_s, \qquad s \in [0,t], \tag{4.17}
\]
with final condition
\[
v_t^n(x) = \Phi(x), \tag{4.18}
\]
where $r \in C_b^m([0,t],\mathbb{R}^m)$ and $\Phi$ is a complex-valued $C^\infty$ function. In other words, if $v_s^n = v_s^{n,1} + i v_s^{n,2}$, $s \in [0,t]$, and $\Phi = \Phi^1 + i\Phi^2$, then we have the equivalent system of real-valued PDEs
\[
\begin{aligned}
\frac{\partial v_s^{n,1}}{\partial s} &= -A_n v_s^{n,1} + v_s^{n,2}h_n^\top r_s, & v_t^{n,1}(x) &= \Phi^1(x), \\
\frac{\partial v_s^{n,2}}{\partial s} &= -A_n v_s^{n,2} - v_s^{n,1}h_n^\top r_s, & v_t^{n,2}(x) &= \Phi^2(x).
\end{aligned} \tag{4.19}
\]
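The signs of the coupling terms in (4.19) come from the zero-order term $-i v\,g$ with $v = v^1 + i v^2$ and real $g = h_n^\top r_s$. As a quick sanity check, and assuming nothing beyond complex arithmetic, the following sketch (the function name `split_zero_order_term` is hypothetical) confirms that the real part is $v^2 g$ and the imaginary part is $-v^1 g$.

```python
def split_zero_order_term(v1, v2, g):
    """Real and imaginary parts of -i * (v1 + i*v2) * g for real v1, v2, g."""
    z = -1j * complex(v1, v2) * g
    return z.real, z.imag


# With v1 = 2, v2 = 3, g = 5 the real part is v2*g and the imaginary part is -v1*g.
re_part, im_part = split_zero_order_term(2.0, 3.0, 5.0)
print(re_part, im_part)
```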
We need to make use of the maximum principle for parabolic PDEs in the domain $[0,T]\times\mathbb{R}^d$.

Lemma 4.12. Let
\[
A = \sum_{i,j=1}^d a_{ij}(x)\frac{\partial^2}{\partial x_i\partial x_j} + \sum_{i=1}^d f_i(x)\frac{\partial}{\partial x_i}
\]
be an elliptic operator; that is, for all $x \in \mathbb{R}^d$, it holds that $\sum_{i,j=1}^d y_i a_{ij}(x)y_j > 0$ for all $y \in \mathbb{R}^d\setminus\{0\}$. Let the coefficients $a_{ij}(x)$ and $f_i(x)$ be continuous in $x$. If $u \in C^{1,2}([0,\infty)\times\mathbb{R}^d)$ is such that
\[
Au - \frac{\partial u}{\partial t} \ge 0 \tag{4.20}
\]
in $(0,\infty)\times\mathbb{R}^d$ with $u(0,x) = \Phi(x)$ and $u$ is bounded above, then for all $t \in [0,\infty)$,
\[
\|u(t,x)\|_\infty \le \|\Phi\|_\infty. \tag{4.21}
\]

Proof. Define $w(t,x) = u(t,x) - \|\Phi\|_\infty$. It is immediate that $Aw - \frac{\partial w}{\partial t} \ge 0$. Clearly $w(0,x) \le 0$ for all $x \in \mathbb{R}^d$. Consider the region $(0,t]\times\mathbb{R}^d$ for $t$ fixed. If (4.21) does not hold for $s \in [0,t]$ then $w(s,x) > 0$ for some $0 < s \le t$, $x \in \mathbb{R}^d$. As we have assumed that $u$ is bounded above, the same holds for $w$, which implies that $w$ has a positive maximum in the region $(0,t]\times\mathbb{R}^d$ (including the boundary at $t$). Suppose this occurs at the point $P_0 = (x,t)$; then it follows by Theorem 40 of Chapter 2 of Friedman [102] that $w$ assumes this positive constant value over the whole region $S(P_0) = [0,t]\times\mathbb{R}^d$, which is clearly a contradiction since $w(0,x) \le 0$ and $w$ is continuous in $t$. Thus $w(t,x) \le 0$ for all $x \in \mathbb{R}^d$, which establishes the result. □

Exercise 4.13. Prove the above result in the case where the coefficients $a_{ij}$ for $i,j = 1,\dots,d$ and $f_i$ for $i = 1,\dots,d$ are bounded, without appealing to general results from the theory of parabolic PDEs. By modifying the above proof of Lemma 4.12 it is clear that it is sufficient to prove directly that if $u \in C^{1,2}([0,\infty)\times\mathbb{R}^d)$ is bounded above, satisfies (4.20), and $u(0,x) \le 0$, then $u(t,x) \le 0$ for $t \in [0,\infty)$ and $x \in \mathbb{R}^d$. This may be done in the following stages.
i. First, by considering derivatives prove that if (4.20) were replaced by
\[
Au - \frac{\partial u}{\partial t} > 0 \tag{4.22}
\]
then $u(t,x)$ cannot have a maximum in $(0,t]\times\mathbb{R}^d$.
ii. Show that if $u$ satisfies the original condition (4.20) then we can find $\delta$ and $\varepsilon$ such that
\[
w_{\delta,\varepsilon} \triangleq u(t,x) - \delta t - \varepsilon e^{-t}\|x\|^2
\]
satisfies the stronger condition (4.22).
iii. Show that if $u(t,x) \ge 0$ then $w_{\delta,\varepsilon}$ must have a maximum in $(0,t]\times\mathbb{R}^d$; hence use (i) to establish the result.

Proposition 4.14. If $\Phi^1, \Phi^2 \in C_b^\infty(\mathbb{R}^d)$, then the system of PDEs (4.19) has a solution $(v^{n,1}, v^{n,2})$ where $v^{n,i} \in C_b^{1,2}([0,t]\times\mathbb{R}^d)$ for $i = 1, 2$, for which there exists a constant $K_2$ independent of $n$ such that $\|v^{n,i}\|_\infty$, $\|\partial_\alpha v^{n,i}\|_\infty$, $\|\partial_\alpha\partial_\beta v^{n,i}\|_\infty$, for $i = 1, 2$, $\alpha,\beta = 1,\dots,d$ are bounded by $K_2$ on $[0,t]\times\mathbb{R}^d$.

Proof. We must rewrite our PDE as an initial value problem, by reversing time. That is, we define $\bar{v}_s^n \triangleq v_{t-s}^n$ for $s \in [0,t]$. Then we have the following system of real-valued partial differential equations and initial conditions
\[
\begin{aligned}
\frac{\partial\bar{v}_s^{n,1}}{\partial s} &= A_n\bar{v}_s^{n,1} - \bar{v}_s^{n,2}h_n^\top r_{t-s}, & \bar{v}_0^{n,1}(x) &= \Phi^1(x), \\
\frac{\partial\bar{v}_s^{n,2}}{\partial s} &= A_n\bar{v}_s^{n,2} + \bar{v}_s^{n,1}h_n^\top r_{t-s}, & \bar{v}_0^{n,2}(x) &= \Phi^2(x).
\end{aligned} \tag{4.23}
\]
As the operator $A_n$ is uniformly elliptic and has smooth bounded coefficients, the existence of the solution of (4.23) is justified by Theorem 4.10 (the coefficients have uniformly bounded first derivative and are therefore Lipschitz and thus satisfy the Hölder continuity condition). Furthermore since the initial condition and coefficients are also smooth, the solution $\bar{v}^n$ (and thus $v^n$) is also smooth (has continuous derivatives of all orders) in the spatial variable.
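It only remains to prove the boundedness of the solution and of its first and second derivatives. Here we follow the argument in Proposition 4.2.1, page 90 from Bensoussan [13]. Define
\[
z_t^n \triangleq \frac{1}{2}\left(\left(\bar{v}_t^{n,1}\right)^2 + \left(\bar{v}_t^{n,2}\right)^2\right). \tag{4.24}
\]
Then
\[
\frac{\partial z_s^n}{\partial s} - A_n z_s^n = -\sum_{\alpha,\beta=1}^d a_n^{\alpha\beta}\left(\partial_\alpha\bar{v}_s^{n,1}\,\partial_\beta\bar{v}_s^{n,1} + \partial_\alpha\bar{v}_s^{n,2}\,\partial_\beta\bar{v}_s^{n,2}\right) \le 0.
\]
Therefore from our version of the positive maximum principle, Lemma 4.12, it follows that
\[
\left\|\bar{v}_s^{n,1}\right\|_\infty^2 + \left\|\bar{v}_s^{n,2}\right\|_\infty^2 \le \left\|\Phi^1\right\|_\infty^2 + \left\|\Phi^2\right\|_\infty^2, \tag{4.25}
\]
for any $s \in [0,t]$, which establishes the bound on $\|v^{n,i}\|$. Define
\[
u_s^n \triangleq \frac{1}{2}\sum_{\alpha=1}^d\left(\left(\partial_\alpha\bar{v}_s^{n,1}\right)^2 + \left(\partial_\alpha\bar{v}_s^{n,2}\right)^2\right). \tag{4.26}
\]
Then
\[
\begin{aligned}
\frac{\partial u_s^n}{\partial s} - A_n u_s^n = {}& -\sum_{\alpha,\beta,\gamma=1}^d a_n^{\alpha\beta}\left(\partial_\alpha\partial_\gamma\bar{v}_s^{n,1}\,\partial_\beta\partial_\gamma\bar{v}_s^{n,1} + \partial_\alpha\partial_\gamma\bar{v}_s^{n,2}\,\partial_\beta\partial_\gamma\bar{v}_s^{n,2}\right) \\
& + \sum_{\alpha,\beta,\gamma=1}^d \partial_\gamma a_n^{\alpha\beta}\left(\partial_\alpha\partial_\beta\bar{v}_s^{n,1}\,\partial_\gamma\bar{v}_s^{n,1} + \partial_\alpha\partial_\beta\bar{v}_s^{n,2}\,\partial_\gamma\bar{v}_s^{n,2}\right) \\
& + \sum_{\alpha,\beta=1}^d \partial_\beta f_n^\alpha\left(\partial_\alpha\bar{v}_s^{n,1}\,\partial_\beta\bar{v}_s^{n,1} + \partial_\alpha\bar{v}_s^{n,2}\,\partial_\beta\bar{v}_s^{n,2}\right) \\
& + \sum_{\alpha=1}^d \partial_\alpha g_{n,s}\left(-\bar{v}_s^{n,2}\,\partial_\alpha\bar{v}_s^{n,1} + \bar{v}_s^{n,1}\,\partial_\alpha\bar{v}_s^{n,2}\right),
\end{aligned} \tag{4.27}
\]
where $g_{n,s} = h_n^\top r_{t-s}$. The first term in (4.27) is non-positive as a consequence of the non-negative definiteness of $a$. Then by (4.16), since $|\partial_\beta f_n^\alpha|$ is uniformly bounded by $K_0$, using the inequality $\left(\sum_{i=1}^d a_i\right)^2 \le d\sum_{i=1}^d a_i^2$, the third term of (4.27) satisfies
\[
\sum_{\alpha,\beta=1}^d \partial_\beta f_n^\alpha\left(\partial_\alpha\bar{v}^{n,1}\,\partial_\beta\bar{v}^{n,1} + \partial_\alpha\bar{v}^{n,2}\,\partial_\beta\bar{v}^{n,2}\right) \le 2K_0 d\,u_s^n. \tag{4.28}
\]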
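Similarly, from (4.16) and (4.25) we see that the fourth term of (4.27) satisfies
\[
\begin{aligned}
\sum_{\alpha=1}^d \partial_\alpha g_{n,s}\left(-\bar{v}_s^{n,2}\,\partial_\alpha\bar{v}_s^{n,1} + \bar{v}_s^{n,1}\,\partial_\alpha\bar{v}_s^{n,2}\right)
&\le K_0\sum_{\alpha=1}^d\left(\left|\bar{v}_s^{n,2}\right|\left|\partial_\alpha\bar{v}_s^{n,1}\right| + \left|\bar{v}_s^{n,1}\right|\left|\partial_\alpha\bar{v}_s^{n,2}\right|\right) \\
&\le K_0\left(\left\|\Phi^1\right\|_\infty + \left\|\Phi^2\right\|_\infty\right)\sum_{\alpha=1}^d\left(\left|\partial_\alpha\bar{v}_s^{n,1}\right| + \left|\partial_\alpha\bar{v}_s^{n,2}\right|\right) \\
&\le K_0\left(\left\|\Phi^1\right\|_\infty + \left\|\Phi^2\right\|_\infty\right)(u_s^n + d) \\
&\le C_4(u_s^n + d),
\end{aligned} \tag{4.29}
\]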
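where the constant $C_4 \triangleq K_0(\|\Phi^1\|_\infty + \|\Phi^2\|_\infty)$. It only remains to find a suitable bound for the second term in (4.27). This is done using the following lemma, which is due to Oleinik–Radkevic (see [234, page 64]). Recall that a $d\times d$-matrix $a$ is said to be non-negative definite if $\theta^\top a\theta \ge 0$ for all $\theta \in \mathbb{R}^d$.

Lemma 4.15. Let $a : \mathbb{R} \to \mathbb{R}^{d\times d}$ be a symmetric non-negative definite matrix-valued function which is twice continuously differentiable and denote its components $a_{ij}(x)$ for $1 \le i,j \le d$. Let $u$ be any symmetric $d\times d$-matrix; then
\[
\left(\operatorname{tr}(a'(x)u)\right)^2 \le 2d^2\lambda\,\operatorname{tr}(u\,a(x)\,u) \qquad \forall x \in \mathbb{R},
\]
where primes denote differentiation with respect to $x$, and
\[
\lambda = \sup\left\{\frac{\theta^\top a''(x)\theta}{\|\theta\|^2} : x \in \mathbb{R},\ \theta \in \mathbb{R}^d\setminus\{0\}\right\}.
\]

Proof. We start by showing that
\[
\left|a_{ij}'(x)\right| \le \sqrt{\lambda\left(a_{ii}(x) + a_{jj}(x)\right)} \qquad \forall x \in \mathbb{R}. \tag{4.30}
\]
Let $\varphi \in C^2(\mathbb{R})$ be a non-negative function with uniformly bounded second derivative; let $\alpha = \sup_{x\in\mathbb{R}}|\varphi''(x)|$. Then Taylor's theorem implies that
\[
0 \le \varphi(x+y) \le \varphi(x) + y\varphi'(x) + \alpha y^2/2;
\]
thus the quadratic in $y$ cannot have two distinct real roots, which implies that the discriminant is non-positive; thus
\[
|\varphi'(x)| \le \sqrt{2\alpha\varphi(x)}.
\]
Let $e_i$ denote the standard basis of $\mathbb{R}^d$; define the functions
\[
\varphi_\pm^{ij}(x) = (e_i \pm e_j)^\top a(x)(e_i \pm e_j) = a_{ii}(x) \pm 2a_{ij}(x) + a_{jj}(x).
\]
From the fact that $a$ is non-negative definite, it follows that $\varphi_\pm^{ij}(x) \ge 0$. From the definition of $\lambda$, since $\|e_i \pm e_j\| = \sqrt{2}$, it follows that $|\varphi_\pm''(x)| \le 2\lambda$; thus applying the above result
\[
\left|\varphi_\pm'(x)\right| \le \sqrt{4\lambda\varphi_\pm(x)}.
\]
From the definition $a_{ij}(x) = (\varphi_+ - \varphi_-)/4$, we obtain (4.30):
\[
\begin{aligned}
|a_{ij}'(x)| &\le \left(|\varphi_+'(x)| + |\varphi_-'(x)|\right)/4 \\
&\le \tfrac{1}{2}\left(\sqrt{\lambda\varphi_+(x)} + \sqrt{\lambda\varphi_-(x)}\right) \\
&\le \sqrt{\lambda\left(\varphi_+(x) + \varphi_-(x)\right)}\big/\sqrt{2} \\
&\le \sqrt{\lambda\left(a_{ii}(x) + a_{jj}(x)\right)}.
\end{aligned}
\]
To establish the main result, suppose first that $a(x)$ is diagonal. By Cauchy–Schwarz and (4.30), using the symmetry of $u$ in the last step,
\[
\begin{aligned}
\left(\operatorname{tr}(a'(x)u)\right)^2 &= \left(\sum_{i,j=1}^d a_{ij}'(x)u_{ji}\right)^2 \\
&\le d^2\sum_{i,j=1}^d\left(a_{ij}'(x)u_{ji}\right)^2 \\
&\le \lambda d^2\sum_{i,j=1}^d\left(a_{ii}(x) + a_{jj}(x)\right)(u_{ji})^2 \\
&\le 2d^2\lambda\sum_{i,j=1}^d u_{ij}\,a_{jj}(x)\,u_{ji}.
\end{aligned}
\]
In general since $a$ is real-valued and symmetric, at any $x$ we can find an orthogonal matrix $q$ such that $q^\top a(x)q$ is diagonal. We fix this matrix $q$ and then since $\operatorname{tr}(q^\top uq) = \operatorname{tr}(qq^\top u) = \operatorname{tr}u$, it follows that
\[
\begin{aligned}
\left(\operatorname{tr}(a'(x)u)\right)^2 &= \left(\operatorname{tr}(q^\top a'(x)q\,q^\top uq)\right)^2 \\
&\le 2d^2\lambda\sum_{i,j=1}^d (q^\top uq)_{ij}\,(q^\top a(x)q)_{jj}\,(q^\top uq)_{ji} \\
&\le 2d^2\lambda\,\operatorname{tr}\left((q^\top uq)(q^\top a(x)q)(q^\top uq)\right) \\
&\le 2\lambda d^2\operatorname{tr}(u\,a(x)\,u).
\end{aligned}
\]
□

Taking $u_{\alpha\beta} = \partial_\alpha\partial_\beta\bar{v}_s^{n,i}$ and applying Lemma 4.15 in each coordinate direction $\gamma$, we obtain
\[
\sum_{\gamma=1}^d\left(\sum_{\alpha,\beta=1}^d \partial_\gamma a_n^{\alpha\beta}\,\partial_\alpha\partial_\beta\bar{v}_s^{n,i}\right)^2 \le C_2\sum_{\alpha,\beta,\gamma=1}^d a_n^{\alpha\beta}\,\partial_\alpha\partial_\gamma\bar{v}_s^{n,i}\,\partial_\beta\partial_\gamma\bar{v}_s^{n,i}, \qquad i = 1, 2,
\]
where $C_2$ only depends upon the dimension of the space and $K_0$ (in particular, it depends on the bound on the second partial derivatives of the entries of $a_n$). Hence, by using the elementary inequality, for $C > 0$,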
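\[
\tau\zeta \le \frac{1}{2C}\tau^2 + \frac{1}{2}C\zeta^2, \tag{4.31}
\]
on each term in the summation in the second term of (4.27) one can find an upper bound for the second sum of the form
\[
\tfrac{1}{2}\Theta_s^n + C_2 u_s^n,
\]
where $\Theta_s^n$ is given by
\[
\Theta_s^n \triangleq \sum_{\alpha,\beta,\gamma=1}^d a_n^{\alpha\beta}\left(\partial_\alpha\partial_\gamma\bar{v}_s^{n,1}\,\partial_\beta\partial_\gamma\bar{v}_s^{n,1} + \partial_\alpha\partial_\gamma\bar{v}_s^{n,2}\,\partial_\beta\partial_\gamma\bar{v}_s^{n,2}\right),
\]
and as $a$ is non-negative definite $\Theta_s^n \ge 0$. By substituting the bounds (4.28), (4.29) and (4.31) into (4.27) we obtain the bound
\[
\begin{aligned}
\frac{\partial u_s^n}{\partial s} - A_n u_s^n &\le -\Theta_s^n + \tfrac{1}{2}\Theta_s^n + C_2 u_s^n + 2K_0 d\,u_s^n + C_4(u_s^n + d) \\
&\le C_2 u_s^n + 2K_0 d\,u_s^n + C_4(u_s^n + d) \\
&\le C_0 u_s^n + C_1,
\end{aligned}
\]
where the constants $C_0$ and $C_1$ only depend upon the dimension of the space and $K_0$ (and not upon $s$ or $x$). Thus
\[
\hat{u}_s^n = \frac{C_1}{C_0}e^{-C_0 s} + u_s^n e^{-C_0 s}
\]
satisfies
\[
\frac{\partial\hat{u}_s^n}{\partial s} - A_n\hat{u}_s^n \le 0;
\]
thus from the maximum principle in the form of Lemma 4.12 we have that $\|\hat{u}_s^n\|_\infty \le \|\hat{u}_0^n\|_\infty$, but $\hat{u}_0^n = C_1/C_0 + u_0^n$, so
\[
\|u_s^n\|_\infty \le e^{C_0 T}\left(\frac{1}{2}\sum_{\alpha=1}^d\left(\left\|\partial_\alpha\Phi^1\right\|_\infty^2 + \left\|\partial_\alpha\Phi^2\right\|_\infty^2\right) + \frac{C_1}{C_0}\right),
\]
which establishes the uniform bound on the first derivatives. The bound on the second-order partial derivatives of $\bar{v}$ is obtained by performing a similar, but more tedious, analysis of the function
\[
w_t^n \triangleq \frac{1}{2}\sum_{\alpha,\beta=1}^d\left(\left(\partial_\alpha\partial_\beta\bar{v}_t^{n,1}\right)^2 + \left(\partial_\alpha\partial_\beta\bar{v}_t^{n,2}\right)^2\right).
\]
Similar bounds will not hold for higher-order partial derivatives. □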
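The inequality of Lemma 4.15, used in the proof above, can be sanity-checked numerically on a degenerate example. The sketch below is an illustration under assumptions made here, not part of the argument: it takes $d = 2$ and the rank-one matrix $a(x) = b(x)b(x)^\top$ with $b(x) = (\cos x, \sin x)$, for which $a''(x)$ has eigenvalues $\pm 2$ and hence $\lambda = 2$, and tests the bound on randomly drawn $x$ and symmetric $u$.

```python
import math
import random


def a(x):
    """Hypothetical example: a(x) = b b^T with b = (cos x, sin x); rank one, so degenerate."""
    c, s = math.cos(x), math.sin(x)
    return [[c * c, c * s], [c * s, s * s]]


def a_prime(x):
    """Entrywise derivative of a(x) with respect to x."""
    c2, s2 = math.cos(2 * x), math.sin(2 * x)
    return [[-s2, c2], [c2, s2]]


def matmul(p, q):
    d = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(d)) for j in range(d)] for i in range(d)]


def trace(p):
    return sum(p[i][i] for i in range(len(p)))


LAMBDA = 2.0  # largest eigenvalue of a''(x) for this example
D = 2


def lemma_gap(x, u):
    """Right-hand side minus left-hand side of the inequality in Lemma 4.15."""
    lhs = trace(matmul(a_prime(x), u)) ** 2
    rhs = 2 * D ** 2 * LAMBDA * trace(matmul(matmul(u, a(x)), u))
    return rhs - lhs


random.seed(1)
gaps = []
for _ in range(2000):
    x = random.uniform(-10, 10)
    u11, u12, u22 = (random.uniform(-3, 3) for _ in range(3))
    gaps.append(lemma_gap(x, [[u11, u12], [u12, u22]]))
min_gap = min(gaps)
print(min_gap)
```

A non-negative minimum gap (up to rounding) is what the lemma predicts; a random search of this kind cannot prove the inequality, it only fails to find a counterexample.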
Theorem 4.16. Assuming condition U on the coefficients $a$, $f$ and $h$, the equation (4.5) has a unique solution in the class $\mathcal{U}$, up to indistinguishability.
Proof. Let $v^n$ be the solution to the PDE (4.17). Applying Exercise 4.9 to $v^n$ yields that for any solution $\mu$ of (3.43) in the class $\mathcal{U}$ we have
\[
\tilde{\mathbb{E}}[\varepsilon_t\mu_t(v_t^n)] = \pi_0(v_0^n) + \tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\,\mu_s\left(\frac{\partial v_s^n}{\partial s} + Av_s^n + i v_s^n h^\top r_s\right)ds\right]
\]
and using the fact that $v_s^n$ satisfies (4.17) we see that
\[
\tilde{\mathbb{E}}[\varepsilon_t\mu_t(v_t^n)] = \pi_0(v_0^n) + \tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\,\mu_s\left((A - A_n)v_s^n + i v_s^n(h - h_n)^\top r_s\right)ds\right]. \tag{4.32}
\]
As a consequence of Proposition 4.14, $v^n$ and its first- and second-order partial derivatives are uniformly bounded and consequently,
\[
\lim_{n\to\infty}(A - A_n)v_s^n(x) = 0, \qquad \lim_{n\to\infty}v_s^n(x)\left(h^\top(x) - h_n^\top(x)\right)r_s = 0
\]
for any $x \in \mathbb{R}^d$. Also there exists a constant $C_t$ independent of $n$ such that
\[
\left|(A - A_n)v_s^n(x)\right|,\ \left|v_s^n(x)(h(x) - h_n(x))^\top r_s\right| \le C_t\psi(x)
\]
for any $x \in \mathbb{R}^d$ and $s \in [0,t]$. Hence, as $\mu \in \mathcal{U}$ it follows that $\mu_s(\psi) < \infty$ and thus by the dominated convergence theorem we have that
\[
\lim_{n\to\infty}\mu_s\left((A - A_n)v_s^n + i v_s^n(h - h_n)^\top r_s\right) = 0.
\]
Next let us observe that $\sup_{s\in[0,t]}|\varepsilon_s| \le \exp\left(\sup_{s\in[0,t]}\|r_s\|^2 t/2\right) < \infty$, hence there exists a constant $C_t'$ such that for $s \in [0,t]$,
\[
\left|\varepsilon_s\,\mu_s\left((A - A_n)v_s^n + i v_s^n(h - h_n)^\top r_s\right)\right| \le C_t'\mu_s(\psi)
\]
and since, as a consequence of (4.4), the bound (4.9) holds, we have
\[
\tilde{\mathbb{E}}\left[\int_0^t \mu_s(\psi)\,ds\right] < \infty.
\]
It follows that $C_t'\mu_s(\psi)$ is a dominating function, thus by the dominated convergence theorem it follows that
\[
\lim_{n\to\infty}\tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\,\mu_s\left((A - A_n)v_s^n + i v_s^n(h - h_n)^\top r_s\right)ds\right] = 0. \tag{4.33}
\]
Finally, let $\mu^1$ and $\mu^2$ be two solutions of the Zakai equation (3.43) in the class $\mathcal{U}$. Then from (4.32),
\[
\tilde{\mathbb{E}}[\varepsilon_t\mu_t^1(v_t^n)] - \tilde{\mathbb{E}}[\varepsilon_t\mu_t^2(v_t^n)] = \tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\left(\mu_s^1 - \mu_s^2\right)\left((A - A_n)v_s^n + i v_s^n(h - h_n)^\top r_s\right)ds\right].
\]
The final condition of the partial differential equation (4.18) implies that $v_t^n(x) = \Phi(x)$ for all $x \in \mathbb{R}^d$; thus
\[
\tilde{\mathbb{E}}[\varepsilon_t\mu_t^1(\Phi)] - \tilde{\mathbb{E}}[\varepsilon_t\mu_t^2(\Phi)] = \tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\left(\mu_s^1 - \mu_s^2\right)\left((A - A_n)v_s^n + i v_s^n(h - h_n)^\top r_s\right)ds\right]
\]
and we may then pass to the limit as $n \to \infty$ using (4.33) to obtain
\[
\tilde{\mathbb{E}}(\varepsilon_t\mu_t^1(\Phi)) = \tilde{\mathbb{E}}(\varepsilon_t\mu_t^2(\Phi)). \tag{4.34}
\]
The function $\Phi$ was an arbitrary $C_b^\infty$ function, therefore using the fact that the set $S_t$ is total, for $\varphi$ any smooth bounded function, $\tilde{\mathbb{P}}$-almost surely $\mu_t^1(\varphi) = \mu_t^2(\varphi)$. From the bounds we know that $\|v_0^n\|_\infty \le \|\Phi\|_\infty$, thus by the dominated convergence theorem since $\pi_0$ is a probability measure
\[
\lim_{n\to\infty}\pi_0(v_0^n) = \pi_0\left(\lim_{n\to\infty}v_0^n\right);
\]
passing to $n \to \infty$ we get
\[
\tilde{\mathbb{E}}(\varepsilon_t\mu_t(\Phi)) = \pi_0\left(\lim_{n\to\infty}v_0^n\right)
\]
whence
\[
\left|\tilde{\mathbb{E}}(\varepsilon_t\mu_t(\Phi))\right| \le \|\Phi\|_\infty.
\]
By the dominated convergence theorem, we can extend (4.34) to any $\varphi$ which is a continuous bounded function. Hence by Exercise 4.1 $\mu^1$ and $\mu^2$ are indistinguishable. □

Exercise 4.17. (Difficult) Extend Theorem 4.16 to the correlated noise framework.

Now let $\mu = \{\mu_t, t \ge 0\}$ be a $\mathcal{Y}_t$-adapted $\mathcal{M}^l(\mathbb{R}^d)$-valued stochastic process with càdlàg paths and $m^\mu = \{m_t^\mu, t \ge 0\}$ be the $\mathcal{Y}_t$-adapted real-valued process
\[
m_t^\mu = \exp\left(\int_0^t \mu_s(h^\top)\,dY_s - \frac{1}{2}\int_0^t \mu_s(h^\top)\mu_s(h)\,ds\right), \qquad t \ge 0.
\]
We prove uniqueness for the Kushner–Stratonovich equation (3.57) in the class $\bar{\mathcal{U}}$ of all $\mathcal{Y}_t$-adapted $\mathcal{M}^l(\mathbb{R}^d)$-valued stochastic processes $\mu = \{\mu_t, t \ge 0\}$ with càdlàg paths such that the process $m^\mu\mu$ belongs to the class $\mathcal{U}$.

Exercise 4.18. Let $X$ be the solution of the SDE (3.9). Prove that if (3.10) is satisfied, $\pi_0$ has finite third moment and $h$ satisfies the linear growth condition (3.28) then the process $\pi$ belongs to the class $\bar{\mathcal{U}}$.
Theorem 4.19. Assuming condition U on the coefficients $a$, $f$ and $h$, the equation (3.57) has a unique solution in the class $\bar{\mathcal{U}}$, up to indistinguishability.

Proof. Let $\pi^1$ and $\pi^2$ be two solutions of the equation (3.57) belonging to the class $\bar{\mathcal{U}}$. Then by a straightforward integration by parts, one shows that $\rho^i = m^{\pi^i}\pi^i$, $i = 1, 2$ are solutions of the Zakai equation (3.43). However, by Theorem 4.16, equation (3.43) has a unique solution in the class $\mathcal{U}$ (where both $\rho^1$ and $\rho^2$ reside). Hence, $\rho^1$ and $\rho^2$ coincide. In particular, $\mathbb{P}$-almost surely
\[
m_t^{\pi^1} = \rho_t^1(1) = \rho_t^2(1) = m_t^{\pi^2} \qquad \text{for all } t \ge 0,
\]
and hence
\[
\pi_t^1 = \frac{1}{\rho_t^1(1)}\rho_t^1 = \frac{1}{\rho_t^2(1)}\rho_t^2 = \pi_t^2 \qquad \text{for all } t \ge 0, \quad \mathbb{P}\text{-almost surely.}
\]
□
4.2 The Functional Analytic Approach

In this section, uniqueness is proved directly for the case when the signal and observation noise are correlated. However, in contrast to all of the arguments which have preceded this we assume that the function $h$ is bounded. We recall that $A, B_i : B(S) \to B(S)$, $i = 1,\dots,m$ are operators with domains, respectively, $\mathcal{D}(A), \mathcal{D}(B_i) \subseteq B(S)$, $i = 1,\dots,m$ with
\[
1 \in \mathcal{D} \triangleq \mathcal{D}(A) \cap \bigcap_{i=1}^m \mathcal{D}(B_i) \quad\text{and}\quad A1 = B_1 1 = \dots = B_m 1 = 0. \tag{4.35}
\]
As in the previous section we need to define the space of measure-valued stochastic processes within which we prove uniqueness of the solution. We recall that $(\Omega, \mathcal{F}, \tilde{\mathbb{P}})$ is a complete probability space and that the filtration $(\mathcal{F}_t)_{t\ge 0}$ satisfies the usual conditions. Also recall that, under $\tilde{\mathbb{P}}$, the process $Y$ is an $\mathcal{F}_t$-adapted Brownian motion. The conditions (4.35) imply that for all $t \ge 0$ and $\varphi \in \mathcal{D}$, since $B\varphi$ is bounded,
\[
\int_0^t \left(\mu_s(\|B\varphi\|)\right)^2 ds \le \|B\varphi\|_\infty^2\int_0^t \left(\mu_s(1)\right)^2 ds, \tag{4.36}
\]
for any $\mu = \{\mu_t, t \ge 0\}$ which is an $\mathcal{F}_t$-adapted $\mathcal{M}(S)$-valued stochastic process.

Definition 4.20. Let $\mathcal{U}'$ be the class of $\mathcal{F}_t$-adapted $\mathcal{M}(S)$-valued stochastic processes $\mu = \{\mu_t, t \ge 0\}$ with càdlàg paths that satisfy conditions (4.36) and (3.42); that is, for all $t \ge 0$, $\varphi \in \mathcal{D}$,
\[
\tilde{\mathbb{P}}\left[\int_0^t \sum_{i=1}^m \left[\mu_s\left(|(h_i + B_i)\varphi|\right)\right]^2 ds < \infty\right] = 1. \tag{4.37}
\]
Let $\rho = \{\rho_s, s \ge 0\}$ be the $\mathcal{M}(S)$-valued process with càdlàg paths which is the unnormalised conditional distribution of the signal given the observation process as defined in Section 3.4. We have assumed that $h = (h_i)_{i=1}^m$, with $h_i : S \to \mathbb{R}$ for $i = 1,\dots,m$, is a bounded measurable function; hence it satisfies condition (3.25), which in turn ensures that the process $\tilde{Z} = \{\tilde{Z}_t, t \ge 0\}$ introduced in (3.30) and (3.31) is a (genuine) martingale under $\tilde{\mathbb{P}}$, where $\tilde{\mathbb{P}}$ is the probability measure defined in Section 3.3.

Exercise 4.21. Prove that the mass process $\rho(1) = \{\rho_t(1), t \ge 0\}$ is a $\mathcal{Y}_t$-adapted martingale under $\tilde{\mathbb{P}}$.

Since the mass process $\rho(1) = \{\rho_t(1), t \ge 0\}$ is a martingale under $\tilde{\mathbb{P}}$ which is càdlàg by Lemma 3.18, it is almost surely bounded on compact intervals.

Exercise 4.22. Prove that if (3.42) is satisfied, then the process $\rho$ as defined by Definition 3.17 belongs to the class $\mathcal{U}'$.

Recall that, for any $t \ge 0$ and $\varphi \in \mathcal{D}$ we have, $\tilde{\mathbb{P}}$-almost surely, that the unnormalised conditional distribution satisfies the Zakai equation, which in the correlated noise situation which we are considering here is
\[
\rho_t(\varphi) = \pi_0(\varphi) + \int_0^t \rho_s(A\varphi)\,ds + \int_0^t \rho_s\left((h^\top + B^\top)\varphi\right)dY_s, \tag{4.38}
\]
where condition (4.37) ensures that the stochastic integral in this equation is well defined.

Proposition 4.23. If $h$ is a bounded measurable function and $\rho = \{\rho_t, t \ge 0\}$ is an $\mathcal{F}_t$-adapted $\mathcal{M}(S)$-valued stochastic process belonging to the class $\mathcal{U}'$ which satisfies (4.38), then for any $\alpha > 0$, there exists a constant $k(\alpha)$ such that
\[
\tilde{\mathbb{E}}\left[\sup_{s\in[0,t]}\left(\rho_s(1)\right)^\alpha\right] < k(\alpha) < \infty. \tag{4.39}
\]

Proof. From condition (4.35) and equation (4.38) for $\varphi = 1$, we get that
\[
\rho_t(1) = 1 + \int_0^t \rho_s(h^\top)\,dY_s. \tag{4.40}
\]
In the following we make use of the normalised version of $\rho_t(h_i)$. Since we do not know that $\rho_t(1)$ is strictly positive this normalisation must be defined with some care. Let $\bar{\rho}_t(h_i)$ be defined as
\[
\bar{\rho}_t(h_i) =
\begin{cases}
\dfrac{\rho_t(h_i)}{\rho_t(1)} & \text{if } \rho_t(1) > 0, \\[1ex]
0 & \text{if } \rho_t(1) = 0.
\end{cases}
\]
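The guarded normalisation above can be mimicked on a finite state space. In the sketch below the measure is represented as a dictionary of non-negative masses; this representation, the function name `normalised_integral` and the sample values are assumptions made purely for illustration and stand in for $\rho_t$ at a fixed time.

```python
def normalised_integral(rho, h):
    """Return rho(h) / rho(1) if rho(1) > 0, and 0 otherwise.

    rho: dict mapping states to non-negative masses (a finite-state stand-in
         for the unnormalised measure at a fixed time);
    h:   bounded real-valued function on the states.
    """
    mass = sum(rho.values())
    if mass <= 0:
        return 0.0  # convention used when the total mass vanishes
    return sum(w * h(x) for x, w in rho.items()) / mass


rho = {0: 0.2, 1: 0.5, 2: 1.3}
h = lambda x: (-1.0) ** x  # bounded by 1 in absolute value
value = normalised_integral(rho, h)
print(value, normalised_integral({0: 0.0}, h))
```

Because the result is a weighted average of values of $h$ whenever the mass is positive, its absolute value never exceeds $\|h\|_\infty$, which is the boundedness property exploited in the remainder of the proof.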
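Since $h$ is bounded it follows that $|\rho_t(h_i)| \le \|h_i\|_\infty\,\rho_t(\mathbf{1})$; hence $|\bar\rho_t(h_i)| \le \|h_i\|_\infty$. Hence $\rho_t(\mathbf{1})$ satisfies the equation
\[
\rho_t(\mathbf{1}) = 1 + \int_0^t \bar\rho_s(h^\top)\,\rho_s(\mathbf{1})\,dY_s \tag{4.41}
\]
and has the explicit representation (as in Lemma 3.29)
\[
\rho_t(\mathbf{1}) = \exp\Biggl(\sum_{i=1}^m \biggl(\int_0^t \bar\rho_s(h_i)\,dY^i_s - \frac12 \int_0^t (\bar\rho_s(h_i))^2\,ds\biggr)\Biggr).
\]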
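We apply Lemma 3.9 to the bounded $m$-dimensional process $\xi = \{\xi_t,\ t \ge 0\}$ defined as $\xi^i_t \triangleq \bar\rho_t(h_i)$, $i = 1,\dots,m$, $t \ge 0$, and deduce from the boundedness of $\bar\rho_t$ that $\rho_t(\mathbf{1})$ is a (genuine) $\mathcal{Y}_t$-adapted martingale under $\tilde{\mathbb{P}}$. Also
\[
(\rho_t(\mathbf{1}))^\alpha = z^\alpha_t \exp\Biggl(\frac{\alpha^2-\alpha}{2}\int_0^t \sum_{i=1}^m (\bar\rho_s(h_i))^2\,ds\Biggr)
\le z^\alpha_t \exp\Bigl(\frac{m}{2}\,t\,(\alpha^2-\alpha)\,\|h\|_\infty^2\Bigr), \tag{4.42}
\]
where the process $z^\alpha = \{z^\alpha_t,\ t \ge 0\}$ is defined by
\[
z^\alpha_t \triangleq \exp\Biggl(\sum_{i=1}^m\biggl(\alpha\int_0^t \bar\rho_s(h_i)\,dY^i_s - \frac{\alpha^2}{2}\int_0^t (\bar\rho_s(h_i))^2\,ds\biggr)\Biggr), \qquad t \ge 0,
\]
and is again a genuine $\tilde{\mathbb{P}}$-martingale by Lemma 3.9. By Doob's maximal inequality we get from (4.42) that for $\alpha > 1$,
\[
\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} (\rho_s(\mathbf{1}))^\alpha\Bigr]
\le \Bigl(\frac{\alpha}{\alpha-1}\Bigr)^{\alpha}\,\tilde{\mathbb{E}}\bigl[(\rho_t(\mathbf{1}))^\alpha\bigr]
\le \Bigl(\frac{\alpha}{\alpha-1}\Bigr)^{\alpha} \exp\Bigl(\frac{m}{2}\,t\,(\alpha^2-\alpha)\,\|h\|_\infty^2\Bigr).
\]
Hence, defining
\[
k(\alpha) = \Bigl(\frac{\alpha}{\alpha-1}\Bigr)^{\alpha} \exp\Bigl(\frac{m}{2}\,t\,(\alpha^2-\alpha)\,\|h\|_\infty^2\Bigr),
\]
we have established the required bound for $\alpha > 1$. The bound (4.39) for $0 < \alpha \le 1$ follows by a straightforward application of Jensen's inequality. For example,
\[
\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} (\rho_s(\mathbf{1}))^\alpha\Bigr]
\le \Bigl(\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} (\rho_s(\mathbf{1}))^2\Bigr]\Bigr)^{\alpha/2}
\le k(2)^{\alpha/2}.
\]
□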
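To get a feel for the size of the constant produced by this proof, the following is a minimal sketch that evaluates $k(\alpha)$ for $\alpha > 1$ and the Jensen bound $k(2)^{\alpha/2}$ for $0 < \alpha \le 1$. The function names `moment_bound` and `mass_sup_bound` and the argument `h_sup` (standing for $\|h\|_\infty$) are illustrative choices, not notation from the text.

```python
import math


def moment_bound(alpha: float, t: float, m: int, h_sup: float) -> float:
    """k(alpha) from the proof of Proposition 4.23 (only defined for alpha > 1).

    h_sup plays the role of ||h||_infty; m is the observation dimension.
    """
    if alpha <= 1:
        raise ValueError("k(alpha) is only defined for alpha > 1")
    doob_factor = (alpha / (alpha - 1.0)) ** alpha  # constant from Doob's inequality
    return doob_factor * math.exp(0.5 * m * t * (alpha ** 2 - alpha) * h_sup ** 2)


def mass_sup_bound(alpha: float, t: float, m: int, h_sup: float) -> float:
    """Upper bound for E~[sup_{s<=t} rho_s(1)^alpha], valid for every alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha > 1:
        return moment_bound(alpha, t, m, h_sup)
    # Jensen step used in the proof for 0 < alpha <= 1
    return moment_bound(2.0, t, m, h_sup) ** (alpha / 2.0)
```

The bound grows exponentially in $t$, $m$ and $\|h\|_\infty^2$, so it is a qualitative finiteness statement rather than a sharp estimate.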
The class $\mathcal{U}'$ of measure-valued stochastic processes is larger than the class $\mathcal{U}$ defined in Section 4.1. This is for two reasons. Firstly, the constituent processes are no longer required to be adapted to the observation filtration $\mathcal{Y}_t$, but to the larger filtration $\mathcal{F}_t$. This relaxation is quite important as it leads to the uniqueness in distribution of the weak solutions of the Zakai equation (4.38) (see Lucic and Heunis [200] for details). The second relaxation is that condition (4.4) is no longer imposed. Unfortunately, this has to be done at the expense of the boundedness assumption on the function $h$. Following Proposition 4.23, assumption (4.37) can be strengthened to
\[
\tilde{\mathbb{E}}\Biggl[\int_0^t \sum_{i=1}^m \bigl(\rho_s(|(h_i + B_i)\varphi|)\bigr)^2\,ds\Biggr]
\le m\,\bigl(\|B\varphi\|_\infty + \|h\|_\infty\|\varphi\|_\infty\bigr)^2\,\tilde{\mathbb{E}}\Bigl[\int_0^t (\rho_s(\mathbf{1}))^2\,ds\Bigr]
\le m\,\bigl(\|B\varphi\|_\infty + \|h\|_\infty\|\varphi\|_\infty\bigr)^2\,t\,k(2) < \infty. \tag{4.43}
\]
In particular, this implies that the stochastic integral in (4.38) is a (genuine) martingale. Let us define the operator $\Phi : B(S \times S) \to B(S \times S)$ with domain
\[
\mathcal{D}(\Phi) = \{\varphi \in B(S\times S) : \varphi(x_1, x_2) = \varphi_1(x_1)\varphi_2(x_2),\ \forall x_1, x_2 \in S,\ \varphi_1, \varphi_2 \in \mathcal{D}\}
\]
as follows. For $\varphi \in \mathcal{D}(\Phi)$ such that $\varphi(x_1, x_2) = \varphi_1(x_1)\varphi_2(x_2)$ for all $x_1, x_2 \in S$, we have
\[
\Phi\varphi(x_1, x_2) = \varphi_1(x_1)A\varphi_2(x_2) + \varphi_2(x_2)A\varphi_1(x_1) + \sum_{i=1}^m (h_i + B_i)\varphi_1(x_1)\,(h_i + B_i)\varphi_2(x_2). \tag{4.44}
\]
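The action of $\Phi$ on a product function can be made concrete on a finite state space. The sketch below is an illustration under assumptions that are not part of the text: $S = \{0, \dots, n-1\}$, the generator $A$ is an $n \times n$ rate matrix acting on functions stored as lists, and the noise is uncorrelated so that $B_i = 0$. The helper names `matvec`, `outer` and `phi_operator` are hypothetical.

```python
def matvec(A, v):
    """Apply the generator matrix A to a function v on the finite state space."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def outer(u, v):
    """Product function (x1, x2) -> u(x1) * v(x2), stored as a nested list."""
    return [[a * b for b in v] for a in u]


def phi_operator(A, hs, phi1, phi2):
    """Evaluate (Phi phi)(x1, x2) from (4.44) for phi = phi1 x phi2, with B_i = 0.

    A    : generator as a list of rows (assumed finite-state stand-in for A)
    hs   : list of the components h_i, each a list indexed by the state
    phi1, phi2 : the two factors of the product test function
    """
    n = len(phi1)
    first = outer(phi1, matvec(A, phi2))    # phi1(x1) * (A phi2)(x2)
    second = outer(matvec(A, phi1), phi2)   # (A phi1)(x1) * phi2(x2)
    result = [[first[i][j] + second[i][j] for j in range(n)] for i in range(n)]
    for h in hs:
        g1 = [h[x] * phi1[x] for x in range(n)]
        g2 = [h[x] * phi2[x] for x in range(n)]
        cross = outer(g1, g2)               # h_i phi1(x1) * h_i phi2(x2)
        result = [[result[i][j] + cross[i][j] for j in range(n)] for i in range(n)]
    return result
```

For constant factors $\varphi_1 = \varphi_2 = \mathbf{1}$ the generator terms vanish (rows of a rate matrix sum to zero) and only $\sum_i h_i(x_1)h_i(x_2)$ remains, which gives a quick sanity check of an implementation.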
We introduce next the following deterministic evolution equation
\[
\nu_t\varphi = \nu_0(\varphi) + \int_0^t \nu_s(\Phi\varphi)\,ds, \tag{4.45}
\]
where $\nu = \{\nu_t,\ t \ge 0\}$ is an $\mathcal{M}(S \times S)$-valued stochastic process, with the property that the map $t \mapsto \nu_t\varphi : [0,\infty) \to [0,\infty)$ is Borel-measurable for any $\varphi \in B(S \times S)$ and integrable for any $\varphi$ in the range of $\Phi$.

Condition 4.24 (U′). The function $h = (h_i)_{i=1}^m : S \to \mathbb{R}^m$ appearing in the observation equation (3.5) is a bounded measurable function and the deterministic evolution equation (4.45) has a unique solution.
Of course, condition U′ is not as easy to verify as the corresponding condition U which is used in the PDE approach of Section 4.1. However, Lucic and Heunis [200] prove that, in the case when the signal satisfies the stochastic differential equation
\[
dX^i_t = f^i(X_t)\,dt + \sum_{j=1}^n \sigma^{ij}(X_t)\,dV^j_t + \sum_{j=1}^m \bar\sigma^{ij}(X_t)\,dW^j_t, \tag{4.46}
\]
then condition U′ is implied by the following condition, which is easier to verify.

Condition 4.25 (U″). The function $f = (f^i)_{i=1}^d : \mathbb{R}^d \to \mathbb{R}^d$ appearing in the signal equation (4.46) is Borel-measurable, whilst the functions $\sigma = (\sigma^{ij})_{i=1,\dots,d,\ j=1,\dots,n} : \mathbb{R}^d \to \mathbb{R}^{d\times n}$ and $\bar\sigma = (\bar\sigma^{ik})_{i=1,\dots,d,\ k=1,\dots,m} : \mathbb{R}^d \to \mathbb{R}^{d\times m}$ are continuous and there exists a constant $K$ such that, for $x \in \mathbb{R}^d$, they satisfy the following linear growth condition
\[
\max_{i,j,k}\bigl\{|f^i(x)|,\ |\sigma^{ij}(x)|,\ |\bar\sigma^{ik}(x)|\bigr\} \le K(1 + |x|).
\]
Also $\bar\sigma\bar\sigma^\top$ is a strictly positive definite matrix for any $x \in \mathbb{R}^d$. Finally, the function $h = (h_i)_{i=1}^m : S \to \mathbb{R}^m$ appearing in the observation equation (3.5) is a bounded measurable function.

The importance of Condition U′ is that it ensures that there are enough functions in the domain of $\Phi$ so that $\nu = \{\nu_t,\ t \ge 0\}$ is uniquely characterized by (4.45). Lucic and Heunis [200] show that, under condition U″, the closure of the domain of $\Phi$ contains the set of bounded continuous functions, which in turn implies the uniqueness of (4.45).

Theorem 4.26. Assuming condition U′, the equation (4.38) has a unique solution in the class $\mathcal{U}'$, up to indistinguishability.

Proof. Let $\rho^1 = \{\rho^1_t,\ t \ge 0\}$ and $\rho^2 = \{\rho^2_t,\ t \ge 0\}$ be two solutions of (4.38) belonging to the class $\mathcal{U}'$ and define the $\mathcal{M}(S \times S)$-valued processes
\[
\rho^{\alpha\beta} = \{\rho^{\alpha\beta}_t,\ t \ge 0\}, \qquad \alpha, \beta = 1, 2,
\]
to be the unique processes for which
\[
\rho^{\alpha\beta}_t(\Gamma_1 \times \Gamma_2) = \rho^\alpha_t(\Gamma_1)\,\rho^\beta_t(\Gamma_2) \qquad \text{for any } \Gamma_1, \Gamma_2 \in \mathcal{B}(S) \text{ and } t \ge 0.
\]
Of course $\rho^{\alpha\beta}$ is an $\mathcal{F}_t$-adapted, progressively measurable process. Also define $\nu^{\alpha\beta} = \{\nu^{\alpha\beta}_t,\ t \ge 0\}$ for $\alpha, \beta = 1, 2$ as follows:
\[
\nu^{\alpha\beta}_t(\Gamma) = \tilde{\mathbb{E}}\bigl[\rho^{\alpha\beta}_t(\Gamma)\bigr] \qquad \text{for any } \Gamma \in \mathcal{B}(S \times S) \text{ and } t \ge 0.
\]
It follows that $\nu^{\alpha\beta}_t$ is a positive measure on $(S \times S, \mathcal{B}(S \times S))$ and from Proposition 4.23 we get that, for any $t \ge 0$,
\[
\sup_{s\in[0,t]} \nu^{\alpha\beta}_s(S \times S) = \sup_{s\in[0,t]} \tilde{\mathbb{E}}\bigl[\rho^\alpha_s(S)\,\rho^\beta_s(S)\bigr] \le k(2);
\]
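hence $\nu^{\alpha\beta}$ is uniformly bounded with respect to $s$ in any interval $[0,t]$ and by Fubini's theorem $t \mapsto \nu^{\alpha\beta}_t(\Gamma)$ is Borel-measurable for any $\Gamma \in \mathcal{B}(S \times S)$. Let $\varphi \in B(S \times S)$ be such that $\varphi \in \mathcal{D}(\Phi)$. By definition, $\varphi(x_1, x_2) = \varphi_1(x_1)\varphi_2(x_2)$ for all $x_1, x_2 \in S$, with $\varphi_1, \varphi_2 \in \mathcal{D}$, and
\[
\begin{aligned}
d\rho^{\alpha\beta}_t(\varphi) &= d\bigl(\rho^\alpha_t(\varphi_1)\,\rho^\beta_t(\varphi_2)\bigr)\\
&= \rho^\alpha_t(\varphi_1)\,d\rho^\beta_t(\varphi_2) + \rho^\beta_t(\varphi_2)\,d\rho^\alpha_t(\varphi_1) + d\bigl\langle \rho^\alpha(\varphi_1), \rho^\beta(\varphi_2)\bigr\rangle_t\\
&= \rho^\alpha_t(\varphi_1)\Bigl(\rho^\beta_t(A\varphi_2)\,dt + \rho^\beta_t\bigl((h^\top + B^\top)\varphi_2\bigr)\,dY_t\Bigr)\\
&\quad + \rho^\beta_t(\varphi_2)\Bigl(\rho^\alpha_t(A\varphi_1)\,dt + \rho^\alpha_t\bigl((h^\top + B^\top)\varphi_1\bigr)\,dY_t\Bigr)\\
&\quad + \sum_{i=1}^m \rho^\alpha_t\bigl((h_i + B_i)\varphi_1\bigr)\,\rho^\beta_t\bigl((h_i + B_i)\varphi_2\bigr)\,dt.
\end{aligned}
\]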
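In other words, using $\Phi$ defined in (4.44), for $\varphi \in \mathcal{D}(\Phi)$,
\[
\rho^{\alpha\beta}_t(\varphi) = \rho^{\alpha\beta}_0(\varphi) + \int_0^t \rho^{\alpha\beta}_s(\Phi\varphi)\,ds + \int_0^t \Lambda^{\alpha\beta}_s(\varphi)\,dY_s, \tag{4.47}
\]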
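where $\Lambda^{\alpha\beta}_s(\varphi) \triangleq \rho^\alpha_s(\varphi_1)\,\rho^\beta_s\bigl((h^\top + B^\top)\varphi_2\bigr) + \rho^\beta_s(\varphi_2)\,\rho^\alpha_s\bigl((h^\top + B^\top)\varphi_1\bigr)$. By Proposition 4.23 and the Cauchy–Schwarz inequality we have that
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\int_0^t \bigl(\Lambda^{\alpha\beta}_s(\varphi)\bigr)^2\,ds\Bigr]
&\le M\,\tilde{\mathbb{E}}\Bigl[\int_0^t \rho^\alpha_s(\mathbf{1})^2\,\rho^\beta_s(\mathbf{1})^2\,ds\Bigr]\\
&\le M t\,\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} \rho^\alpha_s(\mathbf{1})^2\,\sup_{s\in[0,t]} \rho^\beta_s(\mathbf{1})^2\Bigr]\\
&\le M t\,\sqrt{\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} \rho^\alpha_s(\mathbf{1})^4\Bigr]\,\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} \rho^\beta_s(\mathbf{1})^4\Bigr]}\\
&\le M t\,k(4) < \infty,
\end{aligned}
\]
where the constant $M$ is given by
\[
M = 4\max\Biggl(\|\varphi_1\|_\infty^2,\ \|\varphi_2\|_\infty^2,\ \sum_{i=1}^m \|(h_i + B_i)\varphi_1\|_\infty^2,\ \sum_{i=1}^m \|(h_i + B_i)\varphi_2\|_\infty^2\Biggr),
\]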
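which is finite since $\varphi_1, \varphi_2 \in \mathcal{D}$ and consequently they belong to the domain of $B_i$, $i = 1, \dots, m$. It follows that the stochastic integral in (4.47) is a martingale with zero expectation. In particular, from (4.47) and Fubini's theorem we get that for $\varphi \in \mathcal{D}(\Phi)$,
\[
\begin{aligned}
\nu^{\alpha\beta}_t(\varphi) &= \tilde{\mathbb{E}}\bigl[\rho^{\alpha\beta}_t(\varphi)\bigr]\\
&= \tilde{\mathbb{E}}\Bigl[\rho^{\alpha\beta}_0(\varphi) + \int_0^t \rho^{\alpha\beta}_s(\Phi\varphi)\,ds\Bigr]\\
&= \nu^{\alpha\beta}_0(\varphi) + \int_0^t \nu^{\alpha\beta}_s(\Phi\varphi)\,ds.
\end{aligned} \tag{4.48}
\]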
In (4.48), the use of Fubini's theorem is justified as the mapping $(\omega, s) \in \Omega \times [0,t] \mapsto \rho^{\alpha\beta}_s(\Phi\varphi) \in \mathbb{R}$ is $\mathcal{F} \times \mathcal{B}([0,t])$-measurable (it is a product of two $\mathcal{F} \times \mathcal{B}([0,t])$-measurable mappings) and integrable (following Proposition 4.23). From (4.48), we deduce that $\nu^{\alpha\beta}$ is a solution of the equation (4.45). By condition U′ the deterministic evolution equation has a unique solution, and since $\nu^{11}_0 = \nu^{12}_0 = \nu^{22}_0$, we have that for any $t \ge 0$,
\[
\nu^{11}_t = \nu^{22}_t = \nu^{12}_t.
\]
This implies that for any bounded Borel-measurable function $\varphi$ we have
\[
\tilde{\mathbb{E}}\Bigl[\bigl(\rho^1_t(\varphi) - \rho^2_t(\varphi)\bigr)^2\Bigr] = \nu^{11}_t(\varphi \times \varphi) + \nu^{22}_t(\varphi \times \varphi) - 2\nu^{12}_t(\varphi \times \varphi) = 0.
\]
Hence $\rho^1_t(\varphi) = \rho^2_t(\varphi)$ holds $\tilde{\mathbb{P}}$-almost surely and by Exercise 4.1, the measure-valued processes $\rho^1$ and $\rho^2$ are indistinguishable. □

As in the previous section, now let $\mu = \{\mu_t,\ t \ge 0\}$ be an $\mathcal{F}_t$-adapted $\mathcal{M}(S)$-valued stochastic process with càdlàg paths and $m^\mu = \{m^\mu_t,\ t \ge 0\}$ be the $\mathcal{F}_t$-adapted real-valued process
\[
m^\mu_t = \exp\Bigl(\int_0^t \mu_s(h^\top)\,dY_s - \frac12 \int_0^t \mu_s(h^\top)\mu_s(h)\,ds\Bigr), \qquad t \ge 0.
\]
Define the class $\bar{\mathcal{U}}'$ of all $\mathcal{F}_t$-adapted $\mathcal{M}(S)$-valued stochastic processes $\mu$ with càdlàg paths such that the process $m^\mu\mu$ belongs to the class $\mathcal{U}'$.

Exercise 4.27. Let $X$ be the solution of the SDE (4.46). Prove that if $h$ is bounded then $\pi$ belongs to the class $\bar{\mathcal{U}}'$.

Exercise 4.28. Assume that condition U′ holds. Prove that the Kushner–Stratonovich equation has a unique solution (up to indistinguishability) in the class $\bar{\mathcal{U}}'$.
4.3 Solutions to Exercises

4.1 Since $\mu^1_t(\varphi_i) = \mu^2_t(\varphi_i)$ almost surely for any $i \ge 0$, one can find a set $\hat\Omega_t$ of measure one, independent of $i \ge 0$, such that for any $\omega \in \hat\Omega_t$, $\mu^1_t(\varphi_i)(\omega) = \mu^2_t(\varphi_i)(\omega)$ for all $i \ge 0$. Since $(\varphi_i)_{i\ge0}$ is a separating sequence, it follows that for any $\omega \in \hat\Omega_t$, $\mu^1_t(\omega) = \mu^2_t(\omega)$. Hence one can find a set $\hat\Omega$ of measure one, independent of $t$, such that for any $\omega \in \hat\Omega$, $\mu^1_t(\omega) = \mu^2_t(\omega)$ for all $t \in \mathbb{Q}_+$ (the positive rational numbers). This together with the right continuity of the sample paths of $\mu^1$ and $\mu^2$ implies that for any $\omega \in \hat\Omega$, $\mu^1_t(\omega) = \mu^2_t(\omega)$ for all $t \ge 0$.
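4.2 Suppose $\nu_{\mu_n} \Rightarrow \nu_\mu$; then from the definition of weak convergence, for any $\varphi \in C_b(\mathbb{R}^d)$ it follows that $\nu_{\mu_n}\varphi \to \nu_\mu\varphi$ as $n \to \infty$. Thus $\mu_n(\varphi\psi) \to \mu(\varphi\psi)$. Since any function in $C^l(\mathbb{R}^d)$ is of the form $\varphi\psi$ where $\varphi \in C_b(\mathbb{R}^d)$, it follows that $\mu_n$ converges to $\mu$ in $\mathcal{M}^l(\mathbb{R}^d)$. Conversely suppose that $\mu_n$ converges to $\mu$ in $\mathcal{M}^l(\mathbb{R}^d)$; thus $\mu_n\varphi \to \mu\varphi$ for $\varphi \in C^l(\mathbb{R}^d)$. If we set $\varphi = \psi\theta$ for $\theta \in C_b(\mathbb{R}^d)$, then as $\varphi/\psi \in C_b(\mathbb{R}^d)$, it follows that $\varphi \in C^l(\mathbb{R}^d)$. Thus $\mu_n(\psi\theta) \to \mu(\psi\theta)$ for all $\theta \in C_b(\mathbb{R}^d)$, whence $\nu_{\mu_n} \Rightarrow \nu_\mu$.

4.4 We have by the Kallianpur–Striebel formula
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t (\rho_s(\psi))^2\,ds\Bigr] = \tilde{\mathbb{E}}\Bigl[\int_0^t (\pi_s(\psi))^2\rho_s^2(\mathbf{1})\,ds\Bigr] = \int_0^t \tilde{\mathbb{E}}\bigl[(\pi_s(\psi))^2\rho_s^2(\mathbf{1})\bigr]\,ds.
\]
Now
\[
\tilde{\mathbb{E}}\bigl[(\pi_s(\psi))^2\rho_s^2(\mathbf{1})\bigr] \le \tilde{\mathbb{E}}\bigl[\pi_s(\psi^2)\rho_s^2(\mathbf{1})\bigr] = \mathbb{E}\bigl[\pi_s(\psi^2)\rho_s^2(\mathbf{1})Z_s\bigr] = \mathbb{E}\bigl[\pi_s(\psi^2)\rho_s^2(\mathbf{1})\,\mathbb{E}[Z_s \mid \mathcal{Y}_s]\bigr].
\]
Since $\rho_s(\mathbf{1}) = 1/\mathbb{E}[Z_s \mid \mathcal{Y}_s]$ (see Exercise 3.37 part (iii)) we get that
\[
\tilde{\mathbb{E}}\bigl[\pi_s(\psi^2)\rho_s^2(\mathbf{1})\bigr] = \mathbb{E}\bigl[\pi_s(\psi^2)\rho_s(\mathbf{1})\bigr] = \mathbb{E}\bigl[\mathbb{E}[\psi^2(X_s) \mid \mathcal{Y}_s]\,\rho_s(\mathbf{1})\bigr] = \mathbb{E}\bigl[\psi^2(X_s)\rho_s(\mathbf{1})\bigr].
\]
Now since $h$ is bounded,
\[
\begin{aligned}
\rho_s(\mathbf{1}) &= \exp\Bigl(\int_0^s \pi_r(h^\top)\,dY_r - \frac12\int_0^s \|\pi_r(h)\|^2\,dr\Bigr)\\
&= \exp\Bigl(\int_0^s \pi_r(h^\top)\,dW_r + \int_0^s \pi_r(h^\top)h(X_r)\,dr - \frac12\int_0^s \|\pi_r(h)\|^2\,dr\Bigr)\\
&\le e^{s\|h\|_\infty^2}\exp\Bigl(\int_0^s \pi_r(h^\top)\,dW_r - \frac12\int_0^s \|\pi_r(h)\|^2\,dr\Bigr).
\end{aligned}
\]
Using the independence of $W$ and $X$ we see that
\[
\mathbb{E}\Bigl[\exp\Bigl(\int_0^s \pi_r(h^\top)\,dW_r - \frac12\int_0^s \pi_r(h^\top)\pi_r(h)\,dr\Bigr) \Bigm| \sigma(X_r,\ r \in [0,s])\Bigr] = 1,
\]
hence
\[
\mathbb{E}\bigl[\rho_s(\mathbf{1}) \mid \sigma(X_r,\ r \in [0,s])\bigr] \le e^{s\|h\|_\infty^2}.
\]
It follows that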
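\[
\tilde{\mathbb{E}}\bigl[\pi_s(\psi^2)\rho_s^2(\mathbf{1})\bigr] \le e^{s\|h\|_\infty^2}\,\mathbb{E}\bigl[(1 + \|X_s\|)^2\bigr],
\]
and therefore
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t (\rho_s(\psi))^2\,ds\Bigr] \le t e^{t\|h\|_\infty^2} \sup_{s\in[0,t]} \mathbb{E}\bigl[(1 + \|X_s\|)^2\bigr] \le 2t e^{t\|h\|_\infty^2}\Bigl(1 + \sup_{s\in[0,t]} \mathbb{E}\bigl[\|X_s\|^2\bigr]\Bigr).
\]
As a consequence of Exercise 3.10, the last term in this equation is finite if $X_0$ has finite second moment and (3.10) is satisfied. Thus $\rho$ satisfies condition (4.4) and hence it belongs to the class $\mathcal{U}$.

4.7 i. We know that for $t$ in $[0,\infty)$ the process $\mu_t$ is $\mathcal{Y}_t$-measurable. As $\varphi \in \mathcal{E}$, this implies that $\varphi_t \in C^l(\mathbb{R}^d)$ and thus $|\varphi_t(x)| \le \|\varphi_t\|^l_\infty\,\psi(x)$. Define the sequence $\varphi^n_t(x) \triangleq \varphi_t(x)\mathbf{1}_{\{|\varphi_t(x)| \le n\}}$. By the argument used for Exercise 2.21 we know that $\mu_t(\varphi^n_t)$ is $\mathcal{Y}_t$-adapted since $\varphi^n$ is bounded. But $\|\varphi_t\|^l_\infty\,\psi$ is a dominating function, and since $\mu \in \mathcal{U}$, it follows that $\mu_t(\psi) < \infty$; hence it is a $\mu_t$-measurable dominating function. Thus $\mu_t(\varphi^n_t) \to \mu_t(\varphi_t)$ as $n \to \infty$, which implies that $\mu_t(\varphi_t)$ is $\mathcal{Y}_t$-measurable. As this holds for all $t \in [0,\infty)$ it follows that $\mu_t(\varphi_t)$ is $\mathcal{Y}_t$-adapted.

ii. From the solution to Exercise 3.23, a sufficient condition for the stochastic integral to be well defined is
\[
\tilde{\mathbb{P}}\Bigl(\int_0^t (\mu_s(\varphi\|h\|))^2\,ds < \infty\Bigr) = 1.
\]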
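We establish the stronger condition for the stochastic integral to be a martingale; viz. for all $t \ge 0$,
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t (\mu_s(\varphi\|h\|))^2\,ds\Bigr] < \infty.
\]
Using the boundedness of $\varphi$ and the linear growth condition,
\[
\|\varphi(x)h(x)\| \le \sqrt{C}\,\|\varphi\|_\infty\sqrt{1 + \|x\|^2} = \sqrt{C}\,\|\varphi\|_\infty\,\psi(x),
\]
but since $\mu_s \in \mathcal{M}^l(\mathbb{R}^d)$, it follows that $\mu_s(\psi) < \infty$. Thus
\[
\int_0^t (\mu_s(\varphi\|h\|))^2\,ds \le \|\varphi\|_\infty^2\,C \int_0^t (\mu_s(\psi))^2\,ds,
\]
and by condition (4.4) it follows that
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t (\mu_s(\psi))^2\,ds\Bigr] < \infty,
\]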
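so the stochastic integral is both well defined and a martingale.

4.9 Starting from (4.5) we apply Itô's formula to the product $\varepsilon_t\mu_t(\varphi_t)$, obtaining
\[
\begin{aligned}
\varepsilon_t\mu_t(\varphi_t) &= \varepsilon_0\pi_0(\varphi_0) + \int_0^t \varepsilon_s\,\mu_s\Bigl(\frac{\partial\varphi_s}{\partial s} + A\varphi_s\Bigr)\,ds\\
&\quad + \int_0^t \varepsilon_s\,\mu_s(\varphi_s h^\top)\,dY_s + \int_0^t i\varepsilon_s r_s^\top\mu_s(\varphi_s)\,dY_s + \int_0^t i\varepsilon_s r_s^\top\mu_s(\varphi_s h)\,ds.
\end{aligned}
\]
Next we take expectation under $\tilde{\mathbb{P}}$. We now show that as a consequence of condition (4.4) both stochastic integrals are genuine martingales. Because $\varepsilon_t$ is complex-valued we need to introduce the notation
\[
\|\varepsilon(\omega)\|_\infty = \sup_{t\in[0,\infty)} |\varepsilon_t(\omega)|,
\]
where $|\cdot|$ denotes the modulus of the complex number. The following bound is elementary,
\[
\|\varepsilon_t\|_\infty \le \exp\Bigl(\tfrac12 \max_{i=1,\dots,m}\|r^i\|_\infty^2\,t\Bigr) < \infty;
\]
for notational conciseness write $R = \max_{i=1,\dots,m}\|r^i\|_\infty$. By assumption there is a uniform bound on $\|\varphi_s\|_\infty$ for $s \in [0,t]$; hence
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t \bigl\|\varepsilon_s\,\mu_s(\varphi_s h^\top)\bigr\|^2\,ds\Bigr] \le e^{R^2 t}\sup_{s\in[0,t]}\|\varphi_s\|_\infty^2\,\tilde{\mathbb{E}}\Bigl[\int_0^t (\mu_s(\|h\|))^2\,ds\Bigr]
\]
and the right-hand side is finite by (4.4). The second stochastic integral is treated in a similar manner:
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t |\varepsilon_s|^2\|r_s\|^2(\mu_s(\varphi_s))^2\,ds\Bigr] \le R^2 e^{R^2 t}\sup_{s\in[0,t]}\|\varphi_s\|_\infty^2\,\tilde{\mathbb{E}}\Bigl[\int_0^t (\mu_s(\mathbf{1}))^2\,ds\Bigr].
\]
Therefore
\[
\tilde{\mathbb{E}}\bigl(\varepsilon_t\mu_t(\varphi_t)\bigr) = \pi_0(\varphi_0) + \tilde{\mathbb{E}}\Bigl[\int_0^t \varepsilon_s\,\mu_s\Bigl(\frac{\partial\varphi_s}{\partial s} + A\varphi_s + i r_s^\top\varphi_s h\Bigr)\,ds\Bigr],
\]
which is (4.13).

4.11 Since the components of $a_n$, $f_n$ and $h_n$ are bounded it is immediate that they belong to $C_b(\mathbb{R}^d)$ and consequently to the larger space $C^l(\mathbb{R}^d)$. For the bound, as there are a finite number of components it is sufficient to establish the result for one of them. Clearly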
\[
a^{ij}_n(x) = a^{ij}_n(0) + \sum_{k=1}^d \int_0^1 \frac{\partial a^{ij}_n}{\partial x_k}(xs)\,x_k\,ds.
\]
By (4.16), uniformly in $x$ and $k$,
\[
\Bigl|\frac{\partial a^{ij}_n}{\partial x_k}\Bigr| \le K_0;
\]
thus
\[
|a^{ij}_n(x)| \le |a^{ij}_n(0)| + dK_0\|x\|.
\]
Secondly, since $a^{ij}_n \to a^{ij}$ it follows that $a^{ij}_n(0) \to a^{ij}(0)$; thus given $\varepsilon > 0$, there exists $n_0$ such that for $n \ge n_0$, $|a^{ij}_n(0) - a^{ij}(0)| < \varepsilon$. Thus we obtain the bound
\[
|a^{ij}_n(x)| \le \max_{1\le k\le n_0}|a^{ij}_k(0)| + |a^{ij}(0)| + \varepsilon + dK_0\|x\|.
\]
Hence, since
\[
\|a^{ij}_n\|^l_\infty = \sup_{x\in\mathbb{R}^d}\frac{|a^{ij}_n(x)|}{1 + \|x\|},
\]
setting $A = \max\bigl(\max_{1\le k\le n_0}|a^{ij}_k(0)| + |a^{ij}(0)| + \varepsilon,\ dK_0\bigr)$, it follows that $\|a^{ij}_n\|^l_\infty \le A$.
4.13 i. At such a maximum $(t_0, x_0)$ in $(0,t] \times \mathbb{R}^d$,
\[
\frac{\partial u}{\partial t}(t_0, x_0) \ge 0, \qquad \frac{\partial u}{\partial x_i}(t_0, x_0) = 0, \quad i = 1, \dots, d,
\]
(we cannot assert that the time derivative is zero, since the maximum might occur on the boundary at $t$) and the Hessian matrix of $u$ (i.e. $(\partial_i\partial_j u)$) is negative definite. Thus since $a$ is positive definite, it follows that
\[
\sum_{i,j=1}^d a^{ij}(x_0)\frac{\partial^2 u}{\partial x_i\partial x_j}(t_0, x_0) \le 0, \qquad \sum_{i=1}^d f^i(x_0)\frac{\partial u}{\partial x_i}(t_0, x_0) = 0;
\]
consequently
\[
Au(x_0) - \frac{\partial u}{\partial t}(t_0, x_0) \le 0,
\]
which is a contradiction since we had assumed that the left-hand side was strictly positive.

ii. It is easy to verify that
\[
\frac{\partial w}{\partial t} = \frac{\partial u}{\partial t} - \delta + \varepsilon e^{-t}\|x\|^2,
\]
and
\[
Aw = Au - \varepsilon e^{-t}\bigl(2\operatorname{tr} a + 2b^\top x\bigr).
\]
Thus
\[
Aw - \frac{\partial w}{\partial t} \ge -\varepsilon e^{-t}\bigl(2\operatorname{tr} a + 2(b - x)^\top x\bigr) + \delta.
\]
Thus given $\delta > 0$, using the fact that $a$ and $b$ are bounded, we can find $\varepsilon(\delta)$ so that this right-hand side is strictly positive.

iii. Choose $\delta, \varepsilon$ so that the condition in part (ii) is satisfied. It is clear that $w_{\delta,\varepsilon}(0, x) = u(0, x) - \varepsilon\|x\|^2$. Thus since $\varepsilon > 0$, if $u(0, x) \le 0$, it follows that $w_{\delta,\varepsilon}(0, x) \le 0$. Also since $u$ is bounded above, it is clear that as $\|x\| \to \infty$, $w_{\delta,\varepsilon}(t, x) \to -\infty$. Therefore if $u(t, x) \ge 0$ at some point, it is clear that $w_{\delta,\varepsilon}$ has a maximum. But by part (i) $w_{\delta,\varepsilon}(t, x)$ cannot have such a maximum on $(0,t] \times \mathbb{R}^d$. Hence $u(t, x) \le 0$ for all $t \in [0,\infty)$ and $x \in \mathbb{R}^d$.
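4.17 Under the condition that
\[
\tilde{\mathbb{E}}\Bigl[\int_0^t \bigl(\rho_s(\|(h^\top + B^\top)\varphi\|)\bigr)^2\,ds\Bigr] < \infty,
\]
we deduce that the corresponding complex-valued PDE for a functional dual $\varphi$ is
\[
A\varphi_t + \frac{\partial\varphi_t}{\partial t} + i r_t^\top(h\varphi_t + B\varphi_t) = 0.
\]
If we write $\varphi_t = v^1_t + i v^2_t$, then the time reversed equation is
\[
\begin{aligned}
\frac{\partial\bar v^1}{\partial t} &= A\bar v^1 - \bar v^2 g_s - \bar r^\top B\bar v^2,\\
\frac{\partial\bar v^2}{\partial t} &= A\bar v^2 + \bar v^1 g_s + \bar r^\top B\bar v^1,
\end{aligned}
\]
where $\bar r_s = r_{t-s}$ and $g_s = h^\top\bar r_s$. As in the proof for the uncorrelated case an approximating sequence of uniformly parabolic PDEs is taken, with smooth bounded coefficients and so that (4.16) holds together with the analogue for $f$. Then with $z^n_t$ defined by (4.24),
\[
\frac{\partial z^n_s}{\partial s} - Az^n_s = -\sum_{\alpha,\beta=1}^d a^{\alpha\beta}\bigl(\partial_\alpha\bar v^{n,1}_s\,\partial_\beta\bar v^{n,1}_s + \partial_\alpha\bar v^{n,2}_s\,\partial_\beta\bar v^{n,2}_s\bigr) - \bar v^{n,1}_s\,\bar r^\top B\bar v^{n,2}_s + \bar v^{n,2}_s\,\bar r^\top B\bar v^{n,1}_s.
\]
If we consider the special case of Corollary 3.40, and write $c_t = \bar\sigma\bar r_t$, which we assume to be uniformly bounded, then
\[
\frac{\partial z^n_s}{\partial s} - Az^n_s = -\sum_{\alpha,\beta=1}^d a^{\alpha\beta}\bigl(\partial_\alpha\bar v^{n,1}_s\,\partial_\beta\bar v^{n,1}_s + \partial_\alpha\bar v^{n,2}_s\,\partial_\beta\bar v^{n,2}_s\bigr) + \sum_{\gamma=1}^d c^\gamma_s\bigl(-\bar v^{n,1}_s\,\partial_\gamma\bar v^{n,2}_s + \bar v^{n,2}_s\,\partial_\gamma\bar v^{n,1}_s\bigr).
\]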
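Using the inequality $ab \le \frac12(a^2 + b^2)$, it follows that for $\varepsilon > 0$,
\[
\begin{aligned}
\frac{\partial z^n_s}{\partial s} - Az^n_s &\le -\sum_{\alpha,\beta=1}^d a^{\alpha\beta}\bigl(\partial_\alpha\bar v^{n,1}_s\,\partial_\beta\bar v^{n,1}_s + \partial_\alpha\bar v^{n,2}_s\,\partial_\beta\bar v^{n,2}_s\bigr)\\
&\quad + \frac{1}{2\varepsilon}\sum_{\gamma=1}^d |c^\gamma_s|\bigl((\bar v^{n,1}_s)^2 + (\bar v^{n,2}_s)^2\bigr) + \frac{\varepsilon}{2}\sum_{\gamma=1}^d \bigl((\partial_\gamma\bar v^{n,1}_s)^2 + (\partial_\gamma\bar v^{n,2}_s)^2\bigr)\\
&\le \frac{z^n_s\,d\|c\|_\infty}{\varepsilon} - \sum_{\alpha,\beta=1}^d \bigl(a - \tfrac{\varepsilon}{2}I\bigr)^{\alpha\beta}\bigl(\partial_\alpha\bar v^{n,1}_s\,\partial_\beta\bar v^{n,1}_s + \partial_\alpha\bar v^{n,2}_s\,\partial_\beta\bar v^{n,2}_s\bigr).
\end{aligned}
\]
As $a$ is uniformly elliptic, $x^\top a x \ge \lambda\|x\|^2$; therefore, by choosing $\varepsilon$ sufficiently small (i.e. $\varepsilon < 2\lambda$), the matrix $a - \frac{\varepsilon}{2}I$ is positive definite. Thus
\[
\frac{\partial z^n_s}{\partial s} - Az^n_s \le \frac{z^n_s\,d\|c\|_\infty}{\varepsilon}.
\]
Writing $\bar C_0 = d\|c\|_\infty/\varepsilon$ and $\hat z^n_t = e^{-\bar C_0 t}z^n_t$, then
\[
\frac{\partial\hat z^n_s}{\partial s} - A\hat z^n_s \le 0,
\]
from which the positive maximum principle (Lemma 4.12) implies that
\[
\|\bar v^{n,1}_t\|_\infty^2 + \|\bar v^{n,2}_t\|_\infty^2 \le e^{\bar C_0 t}\bigl(\|\Phi^1_t\|_\infty^2 + \|\Phi^2_t\|_\infty^2\bigr)
\]
and the boundedness of $\bar v^{n,1}$ and $\bar v^{n,2}$ follows. To show the boundedness of the first derivatives, define $u^n_s$ as in (4.26); then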
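\[
\begin{aligned}
\frac{\partial u^n_s}{\partial s} - A_n u^n_s &= -\sum_{\alpha,\beta,\gamma=1}^d a^{\alpha\beta}_n\bigl(\partial_\alpha\partial_\gamma\bar v^{n,1}_s\,\partial_\beta\partial_\gamma\bar v^{n,1}_s + \partial_\alpha\partial_\gamma\bar v^{n,2}_s\,\partial_\beta\partial_\gamma\bar v^{n,2}_s\bigr)\\
&\quad + \sum_{\alpha,\beta,\gamma=1}^d \partial_\gamma a^{\alpha\beta}_n\bigl(\partial_\alpha\partial_\beta\bar v^{n,1}_s\,\partial_\gamma\bar v^{n,1}_s + \partial_\alpha\partial_\beta\bar v^{n,2}_s\,\partial_\gamma\bar v^{n,2}_s\bigr)\\
&\quad + \sum_{\alpha,\beta=1}^d \partial_\beta f^\alpha_n\bigl(\partial_\alpha\bar v^{n,1}_s\,\partial_\beta\bar v^{n,1}_s + \partial_\alpha\bar v^{n,2}_s\,\partial_\beta\bar v^{n,2}_s\bigr)\\
&\quad + \sum_{\alpha=1}^d \partial_\alpha g_{n,s}\bigl(-\bar v^{n,2}_s\,\partial_\alpha\bar v^{n,1}_s + \bar v^{n,1}_s\,\partial_\alpha\bar v^{n,2}_s\bigr)\\
&\quad + \sum_{\alpha=1}^d \bigl(-(\partial_\alpha\bar v^{n,1}_s)\,\partial_\alpha(\bar r^\top B\bar v^{n,2}_s) + (\partial_\alpha\bar v^{n,2}_s)\,\partial_\alpha(\bar r^\top B\bar v^{n,1}_s)\bigr).
\end{aligned}
\]
Bounds on the first four summations are identical to those used in the proof in the uncorrelated noise case, so
\[
\begin{aligned}
\frac{\partial u^n_s}{\partial s} - A_n u^n_s &\le -\Theta^n_s + \tfrac12\Theta^n_s + C_2 u^n_s + 2K_0 d\,u^n_s + C_4(u^n_s + d)\\
&\quad + \sum_{\alpha=1}^d \bigl(-(\partial_\alpha\bar v^{n,1}_s)\,\partial_\alpha(\bar r^\top B\bar v^{n,2}_s) + (\partial_\alpha\bar v^{n,2}_s)\,\partial_\alpha(\bar r^\top B\bar v^{n,1}_s)\bigr).
\end{aligned}
\]
To bound the final summation again use the special form of Corollary 3.40,
\[
\begin{aligned}
\frac{\partial u^n_s}{\partial s} - A_n u^n_s &\le -\tfrac12\Theta^n_s + C_0 u^n_s + C_1\\
&\quad + \sum_{\alpha,\gamma=1}^d c^\gamma_s\bigl(-(\partial_\alpha\bar v^{n,1}_s)(\partial_\alpha\partial_\gamma\bar v^{n,2}_s) + (\partial_\alpha\bar v^{n,2}_s)(\partial_\alpha\partial_\gamma\bar v^{n,1}_s)\bigr)\\
&\quad + \sum_{\alpha,\gamma=1}^d (\partial_\alpha c^\gamma_s)\bigl(-(\partial_\alpha\bar v^{n,1}_s)(\partial_\gamma\bar v^{n,2}_s) + (\partial_\alpha\bar v^{n,2}_s)(\partial_\gamma\bar v^{n,1}_s)\bigr).
\end{aligned}
\]
The first summation can be bounded using $ab \le \frac12(a^2 + b^2)$ for $\varepsilon > 0$,
\[
\begin{aligned}
\sum_{\alpha,\gamma=1}^d c^\gamma_s\bigl(-(\partial_\alpha\bar v^{n,1}_s)(\partial_\alpha\partial_\gamma\bar v^{n,2}_s) + (\partial_\alpha\bar v^{n,2}_s)(\partial_\alpha\partial_\gamma\bar v^{n,1}_s)\bigr)
&\le \frac{d\|c\|_\infty u^n_s}{\varepsilon}\\
&\quad + \frac{\varepsilon}{2}\|c\|_\infty\sum_{\alpha,\gamma=1}^d \bigl((\partial_\alpha\partial_\gamma\bar v^{n,1}_s)^2 + (\partial_\alpha\partial_\gamma\bar v^{n,2}_s)^2\bigr).
\end{aligned}
\]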
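Again by choice of $\varepsilon$ sufficiently small, the matrix $a - \varepsilon\|c\|_\infty I$ remains positive definite (for $\varepsilon < \lambda$); therefore
\[
-\frac12\Theta^n_s + \frac{\varepsilon}{2}\|c\|_\infty\sum_{\alpha,\gamma=1}^d \bigl((\partial_\alpha\partial_\gamma\bar v^{n,1}_s)^2 + (\partial_\alpha\partial_\gamma\bar v^{n,2}_s)^2\bigr) \le 0.
\]
Since $\partial_\alpha c^\gamma_t$ is uniformly bounded by $C_5$, it follows that
\[
\begin{aligned}
\sum_{\alpha,\gamma=1}^d (\partial_\alpha c^\gamma_s)\bigl(-(\partial_\alpha\bar v^{n,1}_s)(\partial_\gamma\bar v^{n,2}_s) + (\partial_\alpha\bar v^{n,2}_s)(\partial_\gamma\bar v^{n,1}_s)\bigr)
&\le C_5\sum_{\alpha,\gamma=1}^d \bigl(|\partial_\alpha\bar v^{n,2}_s||\partial_\gamma\bar v^{n,1}_s| + |\partial_\alpha\bar v^{n,1}_s||\partial_\gamma\bar v^{n,2}_s|\bigr)\\
&\le \frac{C_5}{2}\sum_{\alpha,\gamma=1}^d \bigl(|\partial_\alpha\bar v^{n,1}_s|^2 + |\partial_\gamma\bar v^{n,2}_s|^2 + |\partial_\alpha\bar v^{n,2}_s|^2 + |\partial_\gamma\bar v^{n,1}_s|^2\bigr)\\
&\le dC_5\sum_{\alpha=1}^d \bigl(|\partial_\alpha\bar v^{n,1}_s|^2 + |\partial_\alpha\bar v^{n,2}_s|^2\bigr)\\
&\le 2dC_5 u^n_s.
\end{aligned}
\]
Using all these bounds,
\[
\frac{\partial u^n_s}{\partial s} - A_n u^n_s \le \hat C_0 u^n_s + \hat C_1,
\]
where $\hat C_0 \triangleq C_2 + 2K_0 d + C_4 + d\|c\|_\infty/\varepsilon + 2dC_5$ and $\hat C_1 \triangleq dC_4$; thus as in the uncorrelated case
\[
\|u^n_s\|_\infty \le e^{\hat C_0 T}\Biggl(\frac12\sum_{\alpha=1}^d \bigl(\|\partial_\alpha\Phi^1\|_\infty^2 + \|\partial_\alpha\Phi^2\|_\infty^2\bigr) + \frac{\hat C_1}{\hat C_0}\Biggr),
\]
from which the bound follows. The boundedness of the second derivatives is established by a similar but longer argument.

4.18 Using Exercises 3.11 and 3.25 the conditions (3.25) and (3.42) are satisfied. Lemma 3.29 then implies that $m^\pi_t = \rho_t(\mathbf{1})$. From the Kallianpur–Striebel formula (3.36), for any bounded Borel-measurable $\varphi$, $\rho_t(\varphi) = \pi_t(\varphi)\rho_t(\mathbf{1})$, and by Exercise 4.4 the process $\rho_t$ belongs to $\mathcal{U}$.

4.21 Since $\rho_t(\mathbf{1}) = \tilde{\mathbb{E}}[\tilde Z_t \mid \mathcal{Y}_t]$, we need to prove that $\tilde{\mathbb{E}}[\rho_t(\mathbf{1})\xi] = \tilde{\mathbb{E}}[\rho_s(\mathbf{1})\xi]$ for any $\mathcal{Y}_s$-measurable function $\xi$. We have, using the martingale property of $\tilde Z$, that
\[
\tilde{\mathbb{E}}\bigl[\tilde{\mathbb{E}}[\tilde Z_t \mid \mathcal{Y}_t]\,\xi\bigr] = \tilde{\mathbb{E}}\bigl[\tilde Z_t\xi\bigr] = \tilde{\mathbb{E}}\bigl[\tilde{\mathbb{E}}[\tilde Z_t\xi \mid \mathcal{Y}_s]\bigr] = \tilde{\mathbb{E}}\bigl[\tilde Z_s\xi\bigr] = \tilde{\mathbb{E}}\bigl[\tilde{\mathbb{E}}[\tilde Z_s \mid \mathcal{Y}_s]\,\xi\bigr],
\]
which implies that $\rho_t(\mathbf{1})$ is a $\mathcal{Y}_t$-martingale.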
4.22 From Lemma 3.18 it follows that $\rho_t$ is càdlàg, and $\rho_t$ is $\mathcal{Y}_t$-adapted, which implies that it is $\mathcal{F}_t$-adapted since $\mathcal{Y}_t \subset \mathcal{F}_t$. To check the condition (4.37), note that
\[
\bigl(\mu_t(|(h_i + B_i)\varphi|)\bigr)^2 \le 2\bigl(\mu_t(|h_i\varphi|)\bigr)^2 + 2\bigl(\mu_t(|B_i\varphi|)\bigr)^2 \le 2\|\varphi\|_\infty^2\bigl(\mu_t(\|h\|)\bigr)^2 + 2\|B\varphi\|_\infty^2\bigl(\mu_t(\mathbf{1})\bigr)^2.
\]
Thus
\[
\begin{aligned}
\int_0^t \sum_{i=1}^m \bigl[\mu_s(|(h_i + B_i)\varphi|)\bigr]^2\,ds
&\le 2m\Bigl(\|\varphi\|_\infty^2\int_0^t \bigl(\mu_s(\|h\|)\bigr)^2\,ds + \|B\varphi\|_\infty^2\int_0^t \bigl(\mu_s(\mathbf{1})\bigr)^2\,ds\Bigr)\\
&\le 2m\Bigl(\|\varphi\|_\infty^2\int_0^t \bigl(\mu_s(\|h\|)\bigr)^2\,ds + t\|B\varphi\|_\infty^2\Bigl(\sup_{s\in[0,t]}\mu_s(\mathbf{1})\Bigr)^2\Bigr).
\end{aligned}
\]
Since (3.42) is satisfied, the first term is $\tilde{\mathbb{P}}$-a.s. finite. As $\mu_t(\mathbf{1})$ has càdlàg paths, it follows that the second term is $\tilde{\mathbb{P}}$-a.s. finite.

4.27 If $h$ is bounded then conditions (3.25) and (3.42) are automatically satisfied. If $\pi_t$ is the normalised conditional distribution, by Lemma 3.29, $m^\pi_t = \rho_t(\mathbf{1})$; hence from the Kallianpur–Striebel formula (3.36) $m^\pi_t\pi_t(\varphi) = \rho_t(\varphi)$, and from Exercise 4.22 it then follows that $m^\pi\pi$ is in $\mathcal{U}'$. As $\pi_t$ is $\mathcal{Y}_t$-adapted, it is $\mathcal{F}_t$-adapted. Furthermore, from Corollary 2.26 the process $\pi_t$ has càdlàg paths; thus $\pi_t$ is in $\bar{\mathcal{U}}'$.

4.28 Suppose that there are two solutions $\pi_1$ and $\pi_2$ in $\bar{\mathcal{U}}'$. Then $\rho_i \triangleq m^{\pi_i}\pi_i$ are corresponding solutions of the Zakai equation, and from the definition of $\bar{\mathcal{U}}'$ must lie in $\mathcal{U}'$. As condition U′ holds, by Theorem 4.26, it follows that $\rho_1$ and $\rho_2$ are indistinguishable. The remainder of the proof is identical to that of Theorem 4.19.
4.4 Bibliographical Notes

There are numerous other approaches to establishing uniqueness of the solution to the filtering equations. Several papers address the question of uniqueness without assuming that the solution of the two SPDEs (Zakai's equation or the Kushner–Stratonovich equation) is adapted with respect to the given observation σ-field $\mathcal{Y}_t$. A benefit of this approach is that it allows uniqueness in law of the solution to be established. In Szpirglas [264], the author shows that in the absence of correlation between the observation noise and the signal, the Zakai equation is equivalent to the equation
\[
\rho_t(\varphi) = \pi_0(P_t\varphi) + \int_0^t \rho_s(P_{t-s}\varphi\,h^\top)\,dY_s, \tag{4.49}
\]
for all $\varphi \in B(S)$, where $P_t$ is the semigroup associated with the generator $A$. This equivalence means that a solution of the Zakai equation is a solution of (4.49) and vice versa. The uniqueness of the solution of (4.49) is established by iterating a simple integral inequality (Section V2, [264]). However, this technique does not appear to extend to the case of correlated noise.

More recently, Lucic and Heunis [200] prove uniqueness for the correlated case, again without the assumption of adaptedness of the solution to the observation σ-algebra. There are no smoothness conditions imposed on the coefficients of the signal or observation equation. However, $h$ is assumed to be bounded and the signal non-degenerate (i.e. $\sigma^\top\sigma$ is required to be positive definite).

The problem of establishing uniqueness when $\rho_t$ and $\pi_t$ are required to be adapted to a specified σ-algebra $\mathcal{Y}_t$ is considered in Kurtz and Ocone [170] and further in Bhatt et al. [18]. This form of uniqueness can be established under much less restrictive conditions on the system.
5 The Robust Representation Formula
5.1 The Framework

Throughout this section we assume that the pair $(X, Y)$ are as defined in Chapter 3. That is, $X$ is a solution of the martingale problem for $(A, \pi_0)$ and $Y$ satisfies the evolution equation (3.5) with null initial condition; that is,
\[
Y_s = \int_0^s h(X_r)\,dr + W_s, \qquad s \ge 0. \tag{5.1}
\]
To start off with, we assume that the function $h = (h_i)_{i=1}^m : S \to \mathbb{R}^m$ satisfies either Novikov's condition (3.19) or condition (3.25) so that the process $Z = \{Z_t,\ t \ge 0\}$ defined by
\[
Z_t = \exp\Bigl(-\int_0^t h(X_s)^\top\,dW_s - \frac12\int_0^t \|h(X_s)\|^2\,ds\Bigr), \qquad t \ge 0, \tag{5.2}
\]
is a genuine martingale and the probability measure $\tilde{\mathbb{P}}$ defined on $\mathcal{F}_t$ by taking its Radon–Nikodym derivative with respect to $\mathbb{P}$ to be given by $Z_t$, viz
\[
\frac{d\tilde{\mathbb{P}}}{d\mathbb{P}}\Bigr|_{\mathcal{F}_t} = Z_t,
\]
is well defined (see Section 3.3 for details; see also Theorem B.34 and Corollary B.31). We remind the reader that, under $\tilde{\mathbb{P}}$, the process $Y$ is a Brownian motion independent of $X$. The Kallianpur–Striebel formula (3.33) implies that for any bounded Borel-measurable function $\varphi$,
\[
\pi_t(\varphi) = \frac{\rho_t(\varphi)}{\rho_t(\mathbf{1})} \qquad \tilde{\mathbb{P}}(\mathbb{P})\text{-a.s.},
\]
where $\rho_t$ is the unnormalised conditional distribution of $X$,
\[
\rho_t(\varphi) = \tilde{\mathbb{E}}\bigl[\varphi(X_t)\tilde Z_t \mid \mathcal{Y}_t\bigr],
\]
and
\[
\tilde Z_t = \exp\Bigl(\int_0^t h(X_s)^\top\,dY_s - \frac12\int_0^t \|h(X_s)\|^2\,ds\Bigr). \tag{5.3}
\]

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, © Springer Science+Business Media, LLC 2009, DOI 10.1007/978-0-387-76896-0_5
Exercise 5.1. Show that the Kallianpur–Striebel formula holds true for any Borel-measurable function $\varphi$ such that $\mathbb{E}[|\varphi(X_t)|] < \infty$.

In the following, we require that $s \mapsto h(X_s)$ be a semimartingale. Let
\[
h(X_s) = H^{\mathrm{fv}}_s + H^{\mathrm{m}}_s, \qquad s \ge 0,
\]
be the Doob–Meyer decomposition of $h(X_s)$ with $H^{\mathrm{fv}}_\cdot = (H^{\mathrm{fv},i}_\cdot)_{i=1}^m$ the finite variation part of $h(X)$, and $H^{\mathrm{m}}_\cdot = (H^{\mathrm{m},i}_\cdot)_{i=1}^m$ the martingale part, which is assumed to be square integrable. We require that for all $k > 0$, the following conditions be satisfied,
\[
c^{\mathrm{fv},k} = \tilde{\mathbb{E}}\Biggl[\exp\Biggl(k\sum_{i=1}^m\int_0^t \bigl|dH^{\mathrm{fv},i}_s\bigr|\Biggr)\Biggr] < \infty, \tag{5.4}
\]
\[
c^{\mathrm{m},k} = \tilde{\mathbb{E}}\Biggl[\exp\Biggl(k\sum_{i=1}^m\int_0^t d\bigl\langle H^{\mathrm{m},i}\bigr\rangle_s\Biggr)\Biggr] < \infty, \tag{5.5}
\]
where $s \mapsto \langle H^{\mathrm{m},i}\rangle_s$ is the quadratic variation of $H^{\mathrm{m},i}$, for $i = 1, \dots, m$, and $\int_0^t |dH^{\mathrm{fv},i}_s|$ is the total variation of $H^{\mathrm{fv},i}$ on $[0,t]$ for $i = 1, \dots, m$.

Exercise 5.2. Using the notation from Chapter 3, show that if $X$ is a solution of the martingale problem for $(A, \pi_0)$ and $h_i, (h_i)^2 \in \mathcal{D}(A)$, $i = 1, \dots, m$, then conditions (5.4) and (5.5) are satisfied. [Hint: Use Exercise 3.22.]
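The Kallianpur–Striebel formula of this section can be illustrated on a deliberately simple toy model. The sketch below makes assumptions that are not in the text: the signal is frozen at its initial value $X_0$, which takes finitely many values with known prior probabilities; the observation is scalar ($m = 1$); and the integrals in (5.3) are replaced by sums over observation increments `dy` with time step `dt`. Because $X$ is independent of $Y$ under $\tilde{\mathbb{P}}$, $\rho_t(\varphi)$ then reduces to a prior-weighted sum of $\varphi(x)\tilde Z_t(x)$. The names `log_weight` and `posterior` are illustrative.

```python
import math


def log_weight(h_path, dy, dt):
    """Discretised log Z~_t: sum of h dY minus half the sum of h^2 dt (scalar case)."""
    return sum(h * inc - 0.5 * h * h * dt for h, inc in zip(h_path, dy))


def posterior(prior, h_values, phi, dy, dt):
    """pi_t(phi) = rho_t(phi) / rho_t(1) for a signal frozen at X_0 in a finite set.

    prior    : probabilities of the possible values of X_0
    h_values : h(x) for each possible value x
    phi      : phi(x) for each possible value x
    dy, dt   : observation increments and the (assumed constant) time step
    """
    n = len(dy)
    logs = [log_weight([h] * n, dy, dt) for h in h_values]
    shift = max(logs)  # subtracting a constant cancels in the ratio, avoids overflow
    weights = [p * math.exp(lw - shift) for p, lw in zip(prior, logs)]
    rho_phi = sum(w * f for w, f in zip(weights, phi))
    rho_one = sum(weights)
    return rho_phi / rho_one
```

With $h(\pm 1) = \pm 1$ and a uniform prior, the result depends on the increments only through $Y_t$ and equals $1/(1 + e^{-2Y_t})$, which is a convenient check.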
5.2 The Importance of a Robust Representation

In the following we denote by $y_\cdot$ an arbitrary element of the set $C_{\mathbb{R}^m}[0,t]$, where $t \ge 0$ is arbitrary but fixed throughout the section. In other words $s \mapsto y_s$ is a continuous function $y_\cdot : [0,t] \to \mathbb{R}^m$. Also let $Y_\cdot$ be the path-valued random variable
\[
Y_\cdot : \Omega \to C_{\mathbb{R}^m}[0,t], \qquad Y_\cdot(\omega) = (Y_s(\omega),\ 0 \le s \le t).
\]
Similar to Theorem 1.1, one can show that if $\varphi$ is, for example, a bounded Borel-measurable function, then $\pi_t(\varphi)$ can be written as a function of the observation path. That is, there exists a bounded measurable function $f^\varphi : C_{\mathbb{R}^m}[0,t] \to \mathbb{R}$ such that
\[
\pi_t(\varphi) = f^\varphi(Y_\cdot) \qquad \mathbb{P}\text{-a.s.} \tag{5.6}
\]
Of course, $f^\varphi$ is not unique. Any other function $\bar f^\varphi$ such that
\[
\mathbb{P}\circ Y_\cdot^{-1}\bigl(\bar f^\varphi \ne f^\varphi\bigr) = 0,
\]
where $\mathbb{P}\circ Y_\cdot^{-1}$ is the distribution of $Y_\cdot$ on the path space $C_{\mathbb{R}^m}[0,t]$, can replace $f^\varphi$ in (5.6). In the following we obtain a robust representation of the conditional expectation $\pi_t(\varphi)$ (following Clark [56]). That is, we show that there exists a continuous function $\hat f^\varphi : C_{\mathbb{R}^m}[0,t] \to \mathbb{R}$ (with respect to the supremum norm on $C_{\mathbb{R}^m}[0,t]$) such that
\[
\pi_t(\varphi) = \hat f^\varphi(Y_\cdot) \qquad \mathbb{P}\text{-a.s.} \tag{5.7}
\]
The following exercise shows that such a continuous $\hat f^\varphi$ has the virtue of uniqueness.

Exercise 5.3. Show that if $\mathbb{P}\circ Y_\cdot^{-1}$ positively charges all non-empty open sets in $C_{\mathbb{R}^m}[0,t]$, then there exists a unique continuous function $\hat f^\varphi : C_{\mathbb{R}^m}[0,t] \to \mathbb{R}$ for which (5.7) holds true. Finally show that if $Y$ satisfies evolution equation (5.1) then it charges all non-empty open sets.

The need for this type of representation arises when the filtering framework is used to model and solve 'real-life' problems. As explained in a substantial number of papers (e.g. [56, 74, 73, 75, 76, 179, 180]) the model $Y$ chosen for the 'real-life' observation process $\bar Y$ may not be a perfect one. However, as long as the distribution of $\bar Y_\cdot$ is close in a weak sense to that of $Y_\cdot$ (and some integrability assumptions hold), the estimate $\hat f^\varphi(\bar Y_\cdot)$ computed on the actual observation will still be reasonable, as $\mathbb{E}[(\varphi(X_t) - \hat f^\varphi(\bar Y_\cdot))^2]$ is well approximated by the idealized error $\mathbb{E}[(\varphi(X_t) - \hat f^\varphi(Y_\cdot))^2]$.

Even when $Y$ and $\bar Y$ coincide, one is never able to obtain and exploit a continuous stream of data as modelled by the continuous path $Y_\cdot(\omega)$. Instead the observation arrives and is processed at discrete moments in time
\[
0 = t_0 < t_1 < t_2 < \cdots < t_n = t.
\]
However, the continuous path $\hat Y_\cdot(\omega)$ obtained from the discrete observations $(Y_{t_i}(\omega))_{i=1}^n$ by linear interpolation is close to $Y_\cdot(\omega)$ (with respect to the supremum norm on $C_{\mathbb{R}^m}[0,t]$); hence, by the same argument, $\hat f^\varphi(\hat Y_\cdot)$ will be a sensible approximation to $\pi_t(\varphi)$.
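The interpolation step described above can be sketched as follows. This is only an illustration: a deterministic scalar function stands in for the observation path, the supremum is approximated by a maximum over a finite grid, and the helper names `interpolate` and `sup_distance` are hypothetical.

```python
def interpolate(times, values, s):
    """Piecewise-linear interpolation of the samples (times[i], values[i]) at time s."""
    knots = list(zip(times, values))
    for (t0, y0), (t1, y1) in zip(knots, knots[1:]):
        if t0 <= s <= t1:
            return y0 + (y1 - y0) * (s - t0) / (t1 - t0)
    raise ValueError("s lies outside the sampled interval")


def sup_distance(path, times, grid):
    """Grid approximation of sup_s |path(s) - interpolated path(s)|.

    path  : callable standing in for the continuous observation path
    times : sampling times t_0 < t_1 < ... < t_n
    grid  : points of [t_0, t_n] at which the two paths are compared
    """
    values = [path(t) for t in times]
    return max(abs(path(s) - interpolate(times, values, s)) for s in grid)
```

Refining the sampling times shrinks this distance for any continuous path, which is the sense in which the interpolated path is close to the true one.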
5.3 Preliminary Bounds

Let $\Theta(y_\cdot)$ be the following random variable
\[
\Theta(y_\cdot) \triangleq \exp\Bigl(h(X_t)^\top y_t - I(y_\cdot) - \frac12\int_0^t \|h(X_s)\|^2\,ds\Bigr), \tag{5.8}
\]
where $I(y_\cdot)$ is a version of the stochastic integral $\int_0^t y_s^\top\,dh(X_s)$. The argument of the exponent in the definition of $\Theta(y_\cdot)$ will be recognized as a formal integration by parts of the argument of the exponential in (5.3). In the following, for any random variable $\xi$ we denote by $\|\xi\|_{\Omega,p}$ the usual $L^p$ norm of $\xi$,
\[
\|\xi\|_{\Omega,p} = \tilde{\mathbb{E}}\bigl[|\xi|^p\bigr]^{1/p}.
\]
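The integration by parts behind (5.8) has an exact discrete counterpart (Abel summation), sketched below for scalar sequences. The lists `h` and `y` stand for samples of $h(X_s)$ and $y_s$ on a common time grid; the function names are illustrative. Note that the discrete identity carries a cross term $\sum \Delta h\,\Delta y$ that has no counterpart in (5.8).

```python
def stieltjes_sum(integrand, integrator):
    """Left-point sum: sum_i f_i * (g_{i+1} - g_i)."""
    return sum(f * (g1 - g0) for f, g0, g1 in zip(integrand, integrator, integrator[1:]))


def by_parts_gap(h, y):
    """Return (sum h dy) - (h_n y_n - h_0 y_0 - sum y dh - sum dh dy).

    The summation-by-parts identity says this difference is zero.
    """
    cross = sum((h1 - h0) * (y1 - y0) for h0, h1, y0, y1 in zip(h, h[1:], y, y[1:]))
    lhs = stieltjes_sum(h, y)
    rhs = h[-1] * y[-1] - h[0] * y[0] - stieltjes_sum(y, h) - cross
    return lhs - rhs
```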
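Lemma 5.4. For any $R > 0$ and $p \ge 1$ there exists a positive constant $M^\Theta_{R,p}$ such that
\[
\sup_{\|y_\cdot\| \le R}\|\Theta(y_\cdot)\|_{\Omega,p} \le M^\Theta_{R,p}. \tag{5.9}
\]
Proof. In the following, for notational conciseness, for arbitrary $y_\cdot \in C_{\mathbb{R}^m}[0,t]$, define $\bar y_\cdot \in C_{\mathbb{R}^m}[0,t]$ by
\[
\bar y_s \triangleq y_t - y_s, \qquad s \in [0,t].
\]
If $\|y_\cdot\| \le R$, then it is clear that $\|\bar y_\cdot\| \le 2R$. From (5.8) we get that
\[
\Theta(y_\cdot) = \exp\Bigl(\int_0^t \bar y_s^\top\,dh(X_s) - \frac12\int_0^t \|h(X_s)\|^2\,ds\Bigr) \le \exp\Bigl(\int_0^t \bar y_s^\top\,dH^{\mathrm{fv}}_s + \int_0^t \bar y_s^\top\,dH^{\mathrm{m}}_s\Bigr).
\]
Next observe that, from (5.4) we have
\[
\tilde{\mathbb{E}}\Bigl[\exp\Bigl(2p\int_0^t \bar y_s^\top\,dH^{\mathrm{fv}}_s\Bigr)\Bigr] \le \tilde{\mathbb{E}}\Bigl[\exp\Bigl(4pR\int_0^t \bigl|dH^{\mathrm{fv}}_s\bigr|\Bigr)\Bigr] = c^{\mathrm{fv},4pR},
\]
and by using the Cauchy–Schwarz inequality
\[
\begin{aligned}
\tilde{\mathbb{E}}\Bigl[\exp\Bigl(2p\int_0^t \bar y_s^\top\,dH^{\mathrm{m}}_s\Bigr)\Bigr]
&= \tilde{\mathbb{E}}\Biggl[\exp\Biggl(2p\int_0^t \bar y_s^\top\,dH^{\mathrm{m}}_s - 4p^2\sum_{i,j=1}^m\int_0^t \bar y^i_s\bar y^j_s\,d\langle H^{\mathrm{m},i}, H^{\mathrm{m},j}\rangle_s\\
&\qquad\qquad + 4p^2\sum_{i,j=1}^m\int_0^t \bar y^i_s\bar y^j_s\,d\langle H^{\mathrm{m},i}, H^{\mathrm{m},j}\rangle_s\Biggr)\Biggr]\\
&\le \sqrt{\tilde{\mathbb{E}}\bigl[\Theta^0_t(y_\cdot)\bigr]}\,\sqrt{\tilde{\mathbb{E}}\Biggl[\exp\Biggl(8p^2\sum_{i,j=1}^m\int_0^t \bar y^i_s\bar y^j_s\,d\langle H^{\mathrm{m},i}, H^{\mathrm{m},j}\rangle_s\Biggr)\Biggr]}\\
&\le \sqrt{\tilde{\mathbb{E}}\bigl[\Theta^0_t(y_\cdot)\bigr]}\,\sqrt{\tilde{\mathbb{E}}\Biggl[\exp\Biggl(32p^2R^2\sum_{i,j=1}^m\int_0^t \bigl|d\langle H^{\mathrm{m},i}, H^{\mathrm{m},j}\rangle_s\bigr|\Biggr)\Biggr]},
\end{aligned}
\]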
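where
\[
\Theta^0_r(y_\cdot) \triangleq \exp\Biggl(4p\int_0^r \bar y_s^\top\,dH^{\mathrm{m}}_s - \frac{(4p)^2}{2}\sum_{i,j=1}^m\int_0^r \bar y^i_s\bar y^j_s\,d\langle H^{\mathrm{m},i}, H^{\mathrm{m},j}\rangle_s\Biggr).
\]
The process $r \mapsto \Theta^0_r(y_\cdot)$ is clearly an exponential local martingale and by Novikov's condition and (5.5) it is a martingale, so
\[
\tilde{\mathbb{E}}\bigl[\Theta^0_r(y_\cdot)\bigr] = 1.
\]
From this, the fact that
\[
\int_0^t \bigl|d\langle H^{\mathrm{m},i}, H^{\mathrm{m},j}\rangle_s\bigr| \le \frac12\int_0^t d\langle H^{\mathrm{m},i}\rangle_s + \frac12\int_0^t d\langle H^{\mathrm{m},j}\rangle_s,
\]
and (5.5) we get
\[
\tilde{\mathbb{E}}\Bigl[\exp\Bigl(2p\int_0^t \bar y_s^\top\,dH^{\mathrm{m}}_s\Bigr)\Bigr] \le \sqrt{c^{\mathrm{m},32p^2R^2m}}.
\]
Hence, again by applying the Cauchy–Schwarz inequality, (5.9) follows with $M^\Theta_{R,p} = \bigl(c^{\mathrm{fv},4pR}\sqrt{c^{\mathrm{m},32p^2R^2m}}\bigr)^{1/2p}$. □

Now let $\varphi$ be a Borel-measurable function such that $\|\varphi(X_t)\|_{\Omega,p} < \infty$ for some $p > 1$. Note that $\|\varphi(X_t)\|_{\Omega,p}$ is the same whether we integrate with respect to $\mathbb{P}$ or $\tilde{\mathbb{P}}$. Let $\hat g^\varphi, \hat g^1, \hat f^\varphi : C_{\mathbb{R}^m}[0,t] \to \mathbb{R}$ be the following functions,
\[
\hat g^\varphi(y_\cdot) = \tilde{\mathbb{E}}\bigl[\varphi(X_t)\Theta(y_\cdot)\bigr], \qquad \hat g^1(y_\cdot) = \tilde{\mathbb{E}}\bigl[\Theta(y_\cdot)\bigr], \qquad \hat f^\varphi(y_\cdot) = \frac{\hat g^\varphi(y_\cdot)}{\hat g^1(y_\cdot)}. \tag{5.10}
\]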
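Lemma 5.5. For any $R > 0$ and $q \ge 1$ there exists a positive constant $M^\Theta_{R,q}$ such that
\[
\bigl\|\Theta(y^1_\cdot) - \Theta(y^2_\cdot)\bigr\|_{\Omega,q} \le M^\Theta_{R,q}\,\bigl\|y^1_\cdot - y^2_\cdot\bigr\| \tag{5.11}
\]
for any two paths $y^1_\cdot, y^2_\cdot$ such that $\|y^1_\cdot\|, \|y^2_\cdot\| \le R$. In particular, (5.11) implies that $\hat g^1$ is locally Lipschitz; more precisely
\[
\bigl|\hat g^1(y^1_\cdot) - \hat g^1(y^2_\cdot)\bigr| \le M^\Theta_R\,\bigl\|y^1_\cdot - y^2_\cdot\bigr\|
\]
for any two paths $y^1_\cdot, y^2_\cdot$ such that $\|y^1_\cdot\|, \|y^2_\cdot\| \le R$, where $M^\Theta_R = \inf_{q\ge1} M^\Theta_{R,q}$.

Proof. For the two paths $y^1_\cdot, y^2_\cdot$ denote by $y^{12}_\cdot$ the difference path defined as $y^{12}_\cdot \triangleq y^1_\cdot - y^2_\cdot$. Then
\[
\bigl|\Theta(y^1_\cdot) - \Theta(y^2_\cdot)\bigr| \le \bigl(\Theta(y^1_\cdot) + \Theta(y^2_\cdot)\bigr)\,\Bigl|\int_0^t (\bar y^{12}_s)^\top\,dh(X_s)\Bigr|.
\]
Using the Cauchy–Schwarz inequality,
\[
\bigl\|\Theta(y^1_\cdot) - \Theta(y^2_\cdot)\bigr\|_{\Omega,q} \le 2M^\Theta_{R,2q}\,\Bigl\|\int_0^t (\bar y^{12}_s)^\top\,dh(X_s)\Bigr\|_{\Omega,2q}. \tag{5.12}
\]
Finally, since $\|\bar y^{12}_\cdot\| \le 2\|y^1_\cdot - y^2_\cdot\|$, a standard argument based on the Burkholder–Davis–Gundy inequality shows that the expectation on the right-hand side of (5.12) is bounded by
\[
\begin{aligned}
\Bigl\|\int_0^t (\bar y^{12}_s)^\top\,dh(X_s)\Bigr\|_{\Omega,2q}
&\le \Bigl\|\int_0^t (\bar y^{12}_s)^\top\,dH^{\mathrm{fv}}_s\Bigr\|_{\Omega,2q} + \Bigl\|\int_0^t (\bar y^{12}_s)^\top\,dH^{\mathrm{m}}_s\Bigr\|_{\Omega,2q}\\
&\le 2\bigl\|y^1_\cdot - y^2_\cdot\bigr\|\,\Bigl\|\int_0^t \bigl|dH^{\mathrm{fv}}_s\bigr|\Bigr\|_{\Omega,2q} + 2c_q\bigl\|y^1_\cdot - y^2_\cdot\bigr\|\,\Bigl\|\sum_{i=1}^m\int_0^t d\langle H^{\mathrm{m},i}\rangle_s\Bigr\|^{1/2}_{\Omega,q},
\end{aligned}
\]
where $c_q$ is the constant appearing in the Burkholder–Davis–Gundy inequality. Hence (5.11) holds true. □

Lemma 5.6. The function $\hat g^\varphi$ is locally Lipschitz and locally bounded.

Proof. Fix $R > 0$ and let $y^1_\cdot, y^2_\cdot$ be two paths such that $\|y^1_\cdot\|, \|y^2_\cdot\| \le R$. By Hölder's inequality and (5.11), we see that
\[
\Bigl|\tilde{\mathbb{E}}\bigl[\varphi(X_t)\bigl(\Theta(y^1_\cdot) - \Theta(y^2_\cdot)\bigr)\bigr]\Bigr| \le \|\varphi(X_t)\|_{\Omega,p}\,M^\Theta_{R,q}\,\bigl\|y^1_\cdot - y^2_\cdot\bigr\|, \tag{5.13}
\]
where $q$ is such that $p^{-1} + q^{-1} = 1$. Hence $\hat g^\varphi$ is locally Lipschitz, since
\[
\hat g^\varphi(y^1_\cdot) - \hat g^\varphi(y^2_\cdot) = \tilde{\mathbb{E}}\bigl[\varphi(X_t)\bigl(\Theta(y^1_\cdot) - \Theta(y^2_\cdot)\bigr)\bigr]
\]
and $R > 0$ was arbitrarily chosen. Next let $y_\cdot$ be a path such that $\|y_\cdot\| \le R$. Again, by Hölder's inequality and (5.9), we get that
\[
\sup_{\|y_\cdot\|\le R}|\hat g^\varphi(y_\cdot)| = \sup_{\|y_\cdot\|\le R}\Bigl|\tilde{\mathbb{E}}\bigl[\varphi(X_t)\Theta(y_\cdot)\bigr]\Bigr| \le \|\varphi(X_t)\|_{\Omega,p}\,M^\Theta_{R,q} < \infty.
\]
Hence $\hat g^\varphi$ is locally bounded. □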
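Theorem 5.7. The function $\hat f^\varphi$ is locally Lipschitz.

Proof. The ratio $\hat g^\varphi/\hat g^1$ of the two locally Lipschitz functions $\hat g^\varphi$ and $\hat g^1$ (Lemma 5.5 and Lemma 5.6) is locally Lipschitz provided both $\hat g^\varphi$ and $1/\hat g^1$ are locally bounded. The local boundedness property of $\hat g^\varphi$ is shown in Lemma 5.6 and that of $1/\hat g^1$ follows from the following simple argument. If $\|y_\cdot\| \le R$, Jensen's inequality implies that
\[
\begin{aligned}
\tilde{\mathbb{E}}\bigl[\Theta(y_\cdot)\bigr] &\ge \exp\Bigl(\tilde{\mathbb{E}}\Bigl[\int_0^t \bar y_s^\top\,dH^{\mathrm{m}}_s + \int_0^t \bar y_s^\top\,dH^{\mathrm{fv}}_s - \frac12\int_0^t \|h(X_s)\|^2\,ds\Bigr]\Bigr)\\
&\ge \exp\Biggl(-2R\,\tilde{\mathbb{E}}\Biggl[\sum_{i=1}^m\int_0^t \bigl|dH^{\mathrm{fv},i}_s\bigr|\Biggr] - \frac12\,\tilde{\mathbb{E}}\Bigl[\int_0^t \|h(X_s)\|^2\,ds\Bigr]\Biggr).
\end{aligned} \tag{5.14}
\]
Note that both expectations in (5.14) are finite, by virtue of condition (5.4). □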
5.4 Clark’s Robustness Result

We proceed next to show that $\hat f^{\varphi}(Y_\cdot)$ is a version of $\pi_t(\varphi)$. This fact is much more delicate than showing that $\hat f^{\varphi}$ is locally Lipschitz. The main difficulty is the fact that the mapping
\[
(y_\cdot, \omega) \in C_{\mathbb{R}^m}[0,t] \times \Omega \to I(y_\cdot) \in \mathbb{R}
\]
is not $\mathcal{B}(C_{\mathbb{R}^m}[0,t]) \otimes \mathcal{F}$-measurable since the integral $I(y_\cdot)$ is constructed path by path (where $\mathcal{B}(C_{\mathbb{R}^m}[0,t])$ is the Borel σ-field on $C_{\mathbb{R}^m}[0,t]$). Let $H_{1/3}$ be the following subset of $C_{\mathbb{R}^m}[0,t]$,
\[
H_{1/3} = \left\{ y_\cdot \in C_{\mathbb{R}^m}[0,t] : K(y_\cdot) \triangleq \sup_{s_1, s_2 \in [0,t]} \frac{\|y_{s_1} - y_{s_2}\|_\infty}{|s_1 - s_2|^{1/3}} < \infty \right\} .
\]

Exercise 5.8. Show that almost all paths of $Y$ belong to $H_{1/3}$, in other words show that
\[
\tilde{\mathbb{P}} \left( \left\{ \omega \in \Omega : Y_\cdot(\omega) \in H_{1/3} \right\} \right) = 1 .
\]
[Hint: Use the modulus of continuity for Brownian motion; see, for example, [149, page 114].]

Lemma 5.9. There exists a version of the stochastic integral $I(y_\cdot)$ which has the property that the mapping $(y_\cdot, \omega) \in C_{\mathbb{R}^m}[0,t] \times \Omega \to I(y_\cdot) \in \mathbb{R}$, whilst still non-measurable, is equal on $H_{1/3} \times \Omega$ to a $\mathcal{B}(C_{\mathbb{R}^m}[0,t]) \otimes \mathcal{F}$-measurable mapping.

Proof. Denote by $I^{\mathrm{fv}}(y_\cdot)$ the Stieltjes integral with respect to $H^{\mathrm{fv}}$. $I^{\mathrm{fv}}(y_\cdot)$ is defined unambiguously pathwise. To avoid ambiguity, for arbitrary $y_\cdot \in C_{\mathbb{R}^m}[0,t]$ and all $\omega \in \Omega$, we have
\[
I^{\mathrm{fv}}(y_\cdot)(\omega) = \lim_{n \to \infty} \sum_{i=0}^{n-1} y_{it/n}^{\top} \left( H^{\mathrm{fv}}_{(i+1)t/n}(\omega) - H^{\mathrm{fv}}_{it/n}(\omega) \right) .
\]
Hence defining $I(y_\cdot)$ only depends on selecting the version of $\int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s$, the stochastic integral with respect to the martingale part of $h(X_\cdot)$, which we denote by $I^{m}(y_\cdot)$. Recall that for integrators which have unbounded variation on locally compact intervals it is not possible to define a stochastic integral pathwise for general integrands. However, if we restrict to a suitable class of integrands (such as $H_{1/3}$) then this is possible. Let
\[
I^{m}_n(y_\cdot)(\omega) \triangleq \sum_{i=0}^{n-1} y_{it/n}^{\top} \left( H^{m}_{(i+1)t/n}(\omega) - H^{m}_{it/n}(\omega) \right) .
\]
Since, for $y_\cdot \in H_{1/3}$,
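\[
\begin{aligned}
\tilde{\mathbb{E}} \left[ \left( I^{m}_{2^k}(y_\cdot) - \int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s \right)^2 \right]
&= \tilde{\mathbb{E}} \left[ \left( \sum_{i=1}^m \int_0^t \left( y^i_s - y^i_{[s 2^k/t] t 2^{-k}} \right) \mathrm{d}H^{m,i}_s \right)^2 \right] \\
&\le m \sum_{i=1}^m \tilde{\mathbb{E}} \left[ \int_0^t \left( y^i_s - y^i_{[s 2^k/t] t 2^{-k}} \right)^2 \mathrm{d}\langle H^{m,i} \rangle_s \right] \\
&\le \frac{m c_X K(y_\cdot)^2 t^{2/3}}{2^{2k/3}} ,
\end{aligned}
\]
where
\[
c_X = \sum_{i=1}^m \tilde{\mathbb{E}} \left[ \langle H^{m,i} \rangle_t \right] < \infty .
\]
Hence by Chebyshev's inequality
\[
\tilde{\mathbb{P}} \left( \left| I^{m}_{2^k}(y_\cdot) - \int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s \right| > \varepsilon \right) \le \frac{1}{\varepsilon^2} \frac{m c_X K(y_\cdot)^2 t^{2/3}}{2^{2k/3}} .
\]
But since
\[
\sum_{k=1}^{\infty} \frac{m c_X K(y_\cdot)^2 t^{2/3}}{2^{2k/3}} < \infty ,
\]
by the first Borel–Cantelli lemma it follows that
\[
\tilde{\mathbb{P}} \left( \limsup_{k \to \infty} \left| I^{m}_{2^k}(y_\cdot) - \int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s \right| > \varepsilon \right) = 0 ;
\]
hence for $y_\cdot \in H_{1/3}$, $I^{m}_{2^k}(y_\cdot)$ converges to $\int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s$, $\tilde{\mathbb{P}}$-almost surely. We define $I^{m}(y_\cdot)$ to be the limit
\[
I^{m}(y_\cdot)(\omega) \triangleq \limsup_{k \to \infty} I^{m}_{2^k}(y_\cdot)(\omega)
\]
for any $(\omega, y_\cdot) \in \Omega \times H_{1/3}$ and any version of $\int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s$ on $\left( C_{\mathbb{R}^m}[0,t] \setminus H_{1/3} \right) \times \Omega$. Although the resulting map is generally non-measurable with respect to $\mathcal{B}(C_{\mathbb{R}^m}[0,t]) \otimes \mathcal{F}$, it is equal on $H_{1/3} \times \Omega$ to the following jointly measurable function
\[
J^{m}(y_\cdot) \triangleq \limsup_{k \to \infty} I^{m}_{2^k}(y_\cdot) \tag{5.15}
\]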
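defined on the whole of $C_{\mathbb{R}^m}[0,t] \times \Omega$. We emphasize that for $y \notin H_{1/3}$ it is quite possible that $J^{m}(y)$ differs from the value of $\int_0^t y_s^{\top} \, \mathrm{d}H^{m}_s$. □

In order to simplify the proof of the robustness result which follows, it is useful to decouple the two processes $X$ and $Y$. Let $(\hat\Omega, \hat{\mathcal{F}}, \hat{\mathbb{P}})$ be an identical copy of $(\Omega, \mathcal{F}, \tilde{\mathbb{P}})$ and let $\hat X$ be the copy of $X$ within the new space $(\hat\Omega, \hat{\mathcal{F}}, \hat{\mathbb{P}})$. Let $\hat H^{m}$ and $\hat H^{\mathrm{fv}}$ be the processes within the new space $(\hat\Omega, \hat{\mathcal{F}}, \hat{\mathbb{P}})$ corresponding to the original $H^{m}$ and $H^{\mathrm{fv}}$. Then the function $\hat g^{\varphi}$ has the following representation,
\[
\hat g^{\varphi}(y_\cdot) = \hat{\mathbb{E}} \left[ \varphi(\hat X_t) \hat\Theta(y_\cdot) \right] , \tag{5.16}
\]
\[
\hat\Theta(y_\cdot) = \exp \left( h(\hat X_t)^{\top} y_t - \hat I(y_\cdot) - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) , \tag{5.17}
\]
where $\hat{\mathbb{E}}$ denotes integration on $(\hat\Omega, \hat{\mathcal{F}}, \hat{\mathbb{P}})$, and $\hat I(y_\cdot)$ is the version of the stochastic integral $\int_0^t y_s^{\top} \, \mathrm{d}h(\hat X_s)$ corresponding to $I(y_\cdot)$ as constructed above. Denote by $\hat I^{m}(y_\cdot)$ the respective version of the stochastic integral with respect to the martingale $\hat H^{m}$ and by $\hat I^{\mathrm{fv}}(y_\cdot)$ the Stieltjes integral with respect to $\hat H^{\mathrm{fv}}$. Let $\hat J^{m}(y_\cdot)$ be the function corresponding to $J^{m}(y_\cdot)$ as defined in (5.15). Then, for $y_\cdot \in H_{1/3}$, $\hat\Theta(y_\cdot)$ can be written as
\[
\hat\Theta(y_\cdot) = \exp \left( h(\hat X_t)^{\top} y_t - \hat I^{\mathrm{fv}}(y_\cdot) - \hat J^{m}(y_\cdot) - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) . \tag{5.18}
\]

Finally, let $(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}})$ be the product space
\[
(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}}) = (\Omega \times \hat\Omega, \; \mathcal{F} \otimes \hat{\mathcal{F}}, \; \tilde{\mathbb{P}} \otimes \hat{\mathbb{P}})
\]
on which we ‘lift’ the processes $\hat H$ and $Y$ from the component spaces. In other words, $Y(\omega, \hat\omega) = Y(\omega)$ and $\hat H(\omega, \hat\omega) = \hat H(\hat\omega)$ for all $(\omega, \hat\omega) \in \Omega \times \hat\Omega$.

Lemma 5.10. There exists a null set $N \in \mathcal{F}$ such that the mapping $(\omega, \hat\omega) \in \bar\Omega \mapsto \hat I(Y(\omega))(\hat\omega)$ coincides on $(\Omega \setminus N) \times \hat\Omega$ with an $\bar{\mathcal{F}}$-measurable mapping.

Proof. First let us remark that $(\omega, \hat\omega) \mapsto \hat I^{\mathrm{fv}}(Y(\omega))(\hat\omega)$ is equal to
\[
\hat I^{\mathrm{fv}}(Y_\cdot(\omega))(\hat\omega) = \lim_{n \to \infty} \sum_{i=0}^{n-1} Y_{it/n}(\omega)^{\top} \left( \hat H^{\mathrm{fv}}_{(i+1)t/n}(\hat\omega) - \hat H^{\mathrm{fv}}_{it/n}(\hat\omega) \right) \tag{5.19}
\]
and since
\[
(\omega, \hat\omega) \in \bar\Omega \mapsto \sum_{i=0}^{n-1} Y_{it/n}(\omega)^{\top} \left( \hat H^{\mathrm{fv}}_{(i+1)t/n}(\hat\omega) - \hat H^{\mathrm{fv}}_{it/n}(\hat\omega) \right)
\]
is $\bar{\mathcal{F}}$-measurable then so is its limit. Define $N \triangleq \{ \omega \in \Omega : Y_\cdot(\omega) \notin H_{1/3} \}$. Then $N \in \mathcal{F}$ and $\tilde{\mathbb{P}}(N) = 0$. Following the definition of $I^{m}(y_\cdot)$, the mapping $(\omega, \hat\omega) \mapsto \hat I^{m}(Y(\omega))(\hat\omega)$ coincides with the mapping $(\omega, \hat\omega) \mapsto \hat J^{m}(Y(\omega))(\hat\omega)$ on $(\Omega \setminus N) \times \hat\Omega$. Then $\hat J^{m}$ is an $\bar{\mathcal{F}}$-measurable random variable, since
\[
\hat J^{m}(Y(\omega))(\hat\omega) = \limsup_{k \to \infty} \sum_{i=0}^{2^k - 1} Y_{it/2^k}(\omega)^{\top} \left( \hat H^{m}_{(i+1)t/2^k}(\hat\omega) - \hat H^{m}_{it/2^k}(\hat\omega) \right) . \tag{5.20}
\]
Combining this with the measurability of $\hat I^{\mathrm{fv}}(Y_\cdot)$ gives us the lemma. □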
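Lemma 5.11. $\bar{\mathbb{P}}$-almost surely
\[
\int_0^t Y_s^{\top} \, \mathrm{d}\hat H_s = \hat I^{\mathrm{fv}}(Y_\cdot) + \hat J^{m}(Y_\cdot) . \tag{5.21}
\]

Proof. We have
\[
\int_0^t Y_s^{\top} \, \mathrm{d}\hat H_s = \int_0^t Y_s^{\top} \, \mathrm{d}\hat H^{m}_s + \int_0^t Y_s^{\top} \, \mathrm{d}\hat H^{\mathrm{fv}}_s .
\]
Following (5.19) it is obvious that $\int_0^t Y_s^{\top} \, \mathrm{d}\hat H^{\mathrm{fv}}_s = \hat I^{\mathrm{fv}}(Y_\cdot)$. Hence, following the proof of the previous lemma, it suffices to prove that, $\bar{\mathbb{P}}$-almost surely, $\int_0^t Y_s^{\top} \, \mathrm{d}\hat H^{m}_s = \hat J^{m}(Y_\cdot)$ where $\hat J^{m}(Y_\cdot)$ is the function defined in (5.20). Without loss of generality we assume that $m = 1$ (the general case follows by treating each of the $m$ components in turn) and we note that we only need to prove that, for arbitrary $K > 0$, $\bar{\mathbb{P}}$-almost surely,
\[
\int_0^t Y^K_s \, \mathrm{d}\hat H^{m}_s = \hat J^{m}(Y^K_\cdot) , \tag{5.22}
\]
where
\[
Y^K_s = \begin{cases} Y_s & \text{if } |Y_s| \le K \\ K & \text{otherwise.} \end{cases}
\]
In turn, (5.22) follows once we prove that
\[
\lim_{n \to \infty} \bar{\mathbb{E}} \left[ \left( \sum_{i=0}^{n-1} Y^K_{it/n} \left( \hat H^{m}_{(i+1)t/n} - \hat H^{m}_{it/n} \right) - \hat J^{m}(Y^K_\cdot) \right)^2 \right] = 0 .
\]
By Fubini's theorem, using the $\bar{\mathcal{F}}$-measurability of $\hat J^{m}(Y^K_\cdot)$ and the fact that $\hat I^{m}(Y^K_\cdot)$ coincides with $\hat J^{m}(Y^K_\cdot)$ on $(\Omega \setminus N) \times \hat\Omega$ we have
\[
\begin{aligned}
\bar{\mathbb{E}} \left[ \left( \sum_{i=0}^{n-1} Y^K_{it/n} \left( \hat H^{m}_{(i+1)t/n} - \hat H^{m}_{it/n} \right) - \hat J^{m}(Y^K_\cdot) \right)^2 \right]
&= \int_{\Omega \setminus N} \hat{\mathbb{E}} \left[ \left( \hat I^{m}_n(Y^K_\cdot(\omega)) - \hat J^{m}(Y^K_\cdot(\omega)) \right)^2 \right] \mathrm{d}\tilde{\mathbb{P}}(\omega) \\
&= \int_{\Omega \setminus N} \hat{\mathbb{E}} \left[ \left( \hat I^{m}_n(Y^K_\cdot(\omega)) - \hat I^{m}(Y^K_\cdot(\omega)) \right)^2 \right] \mathrm{d}\tilde{\mathbb{P}}(\omega) .
\end{aligned}
\]
Now since $s \mapsto Y^K_s(\omega)$ is a continuous function and $\hat I^{m}(Y^K_\cdot(\omega))$ is a version of the stochastic integral $\int_0^t Y^K_s(\omega) \, \mathrm{d}\hat H^{m}_s$, it follows that
\[
\lim_{n \to \infty} \hat{\mathbb{E}} \left[ \left( \hat I^{m}_n(Y^K_\cdot(\omega)) - \hat I^{m}(Y^K_\cdot(\omega)) \right)^2 \right] = 0
\]
for all $\omega \in \Omega \setminus N$. Also, we have the following upper bound
\[
\hat{\mathbb{E}} \left[ \left( \hat I^{m}_n(Y^K_\cdot(\omega)) - \hat I^{m}(Y^K_\cdot(\omega)) \right)^2 \right] \le 4 K^2 \, \hat{\mathbb{E}} \left[ (\hat H^{m}_t)^2 \right] < \infty .
\]
Hence, by the dominated convergence theorem,
\[
\begin{aligned}
\lim_{n \to \infty} \bar{\mathbb{E}} \left[ \left( \sum_{i=0}^{n-1} Y^K_{it/n} \left( \hat H^{m}_{(i+1)t/n} - \hat H^{m}_{it/n} \right) - \hat I^{m}(Y^K_\cdot) \right)^2 \right]
&= \int_{\Omega \setminus N} \lim_{n \to \infty} \hat{\mathbb{E}} \left[ \left( \hat I^{m}_n(Y^K_\cdot(\omega)) - \hat I^{m}(Y^K_\cdot(\omega)) \right)^2 \right] \mathrm{d}\tilde{\mathbb{P}}(\omega) = 0 .
\end{aligned}
\]
□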
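Theorem 5.12. The random variable $\hat f^{\varphi}(Y_\cdot)$ is a version of $\pi_t(\varphi)$; that is, $\pi_t(\varphi) = \hat f^{\varphi}(Y_\cdot)$, $\mathbb{P}$-almost surely. Hence $\hat f^{\varphi}(Y_\cdot)$ is the unique robust representation of $\pi_t(\varphi)$.

Proof. It suffices to prove that, $\mathbb{P}$-almost surely (or, equivalently, $\tilde{\mathbb{P}}$-almost surely), $\rho_t(\varphi) = \hat g^{\varphi}(Y_\cdot)$ and $\rho_t(\mathbf{1}) = \hat g^{1}(Y_\cdot)$. We need only prove the first identity as the second is just a special case obtained by setting $\varphi = \mathbf{1}$ in the first. From the definition of abstract conditional expectation it therefore suffices to show
\[
\tilde{\mathbb{E}} \left[ \rho_t(\varphi) b(Y_\cdot) \right] = \tilde{\mathbb{E}} \left[ \hat g^{\varphi}(Y_\cdot) b(Y_\cdot) \right] , \tag{5.23}
\]
where $b$ is an arbitrary continuous bounded function $b : C_{\mathbb{R}^m}[0,t] \to \mathbb{R}$. Since $X$ and $Y$ are independent under $\tilde{\mathbb{P}}$, it follows that the pair processes $(X, Y)$ under $\tilde{\mathbb{P}}$ and $(\hat X, Y)$ under $\bar{\mathbb{P}}$ have the same distribution. Hence, the left-hand side of (5.23) has the following representation,
\[
\begin{aligned}
\tilde{\mathbb{E}} \left[ \rho_t(\varphi) b(Y_\cdot) \right]
&= \tilde{\mathbb{E}} \left[ \varphi(X_t) \exp \left( \int_0^t h(X_s)^{\top} \, \mathrm{d}Y_s - \frac{1}{2} \int_0^t \|h(X_s)\|^2 \, \mathrm{d}s \right) b(Y_\cdot) \right] \\
&= \bar{\mathbb{E}} \left[ \varphi(\hat X_t) \exp \left( \int_0^t h(\hat X_s)^{\top} \, \mathrm{d}Y_s - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) b(Y_\cdot) \right] \\
&= \bar{\mathbb{E}} \left[ \varphi(\hat X_t) \exp \left( h(\hat X_t)^{\top} Y_t - \int_0^t Y_s^{\top} \, \mathrm{d}h(\hat X_s) - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) b(Y_\cdot) \right] .
\end{aligned}
\]
On the other hand, using (5.18), the right-hand side of (5.23) has the representation
\[
\begin{aligned}
\tilde{\mathbb{E}} \left[ \hat g^{\varphi}(Y_\cdot) b(Y_\cdot) \right]
&= \tilde{\mathbb{E}} \left[ b(Y_\cdot) \, \hat{\mathbb{E}} \left[ \varphi(\hat X_t) \exp \left( h(\hat X_t)^{\top} Y_t - \hat I^{\mathrm{fv}}(Y_\cdot) - \hat I^{m}(Y_\cdot) - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) \right] \right] \\
&= \tilde{\mathbb{E}} \left[ b(Y_\cdot) \, \hat{\mathbb{E}} \left[ \varphi(\hat X_t) \exp \left( h(\hat X_t)^{\top} Y_t - \hat I^{\mathrm{fv}}(Y_\cdot) - \hat J^{m}(Y_\cdot) - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) \right] \right] .
\end{aligned}
\]
Hence by Fubini's theorem (using, again, the $\bar{\mathcal{F}}$-measurability of $\hat J^{m}(Y_\cdot)$)
\[
\tilde{\mathbb{E}} \left[ \hat g^{\varphi}(Y_\cdot) b(Y_\cdot) \right] = \bar{\mathbb{E}} \left[ \varphi(\hat X_t) \exp \left( h(\hat X_t)^{\top} Y_t - \hat I^{\mathrm{fv}}(Y_\cdot) - \hat J^{m}(Y_\cdot) - \frac{1}{2} \int_0^t \|h(\hat X_s)\|^2 \, \mathrm{d}s \right) b(Y_\cdot) \right] .
\]
Finally, from Lemma 5.11, the two representations coincide. □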
Remark 5.13. Lemma 5.11 appears to suggest a pathwise construction for the stochastic integral
\[
\int_0^t h(X_s)^{\top} \, \mathrm{d}Y_s ,
\]
but we know that for cases such as $\int_0^t B_s \, \mathrm{d}B_s$ a stochastic integral cannot be defined pathwise (see Remark B.17). However, this apparent paradox is resolved by noting that the terms appearing in the lemma are only constructed on the space $\bar\Omega$.

This construction has other uses in the numerical solution of problems involving stochastic integrals. For example, adaptive pathwise approximation is sometimes used in the numerical evaluation of stochastic integrals. Suppose we wish to evaluate the stochastic integral $\int_0^t X_s \, \mathrm{d}Y_s$ where $X$ and $Y$ are càdlàg processes and we assume the usual conditions on the filtration. Given $\delta > 0$, if we define stopping times $T^{\delta}_0 = 0$ and
\[
T^{\delta}_k = \inf \left\{ t > T^{\delta}_{k-1} : |X_t - X_{T^{\delta}_{k-1}}| > \delta \right\} ,
\]
then the stochastic integral may be approximated pathwise by
\[
(X \cdot Y)^{(\delta)} \triangleq \sum_{k=0}^{\infty} X_{T^{\delta}_k} \left( Y_{T^{\delta}_{k+1}} - Y_{T^{\delta}_k} \right) .
\]
If $\delta_n$ is a sequence of values of $\delta$ which tends to zero sufficiently fast, by similar calculations to those used in the justification that $I^{m}$ is a pathwise approximation to the stochastic integral, this series of approximations can be shown to converge $\mathbb{P}$-a.s. uniformly on a finite interval to the stochastic integral as $n \to \infty$.
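The adaptive approximation described in Remark 5.13 can be sketched on sampled paths. The following is only an illustration under stated assumptions: both processes are observed on a common finite time grid, the stopping times are located on that grid rather than in continuous time, and the sum is truncated at the end of the grid. The function name `adaptive_pathwise_integral` is hypothetical.

```python
import numpy as np

def adaptive_pathwise_integral(x, y, delta):
    """Grid version of sum_k X_{T_k} (Y_{T_{k+1}} - Y_{T_k}).

    x, y  : samples of the integrand X and the integrator Y on a common grid
    delta : threshold used to define the stopping times T_k
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    total = 0.0
    k = 0  # grid index of the current stopping time T_k (T_0 = 0)
    while k < n - 1:
        # first later grid index at which X has moved by more than delta
        exceed = np.nonzero(np.abs(x[k + 1:] - x[k]) > delta)[0]
        # if no such index exists, truncate at the end of the grid
        nxt = k + 1 + int(exceed[0]) if exceed.size else n - 1
        total += x[k] * (y[nxt] - y[k])
        k = nxt
    return total
```

For a simulated Brownian path `b` one could, for instance, compare `adaptive_pathwise_integral(b, b, delta)` for decreasing `delta` with the Itô value `(b[-1]**2 - t) / 2`; agreement is only approximate on a finite grid.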
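5.5 Solutions to Exercises

5.1 Repeat the proof of the formula for a Borel-measurable function $\varphi$ such that $\mathbb{E}[|\varphi(X_t)|] < \infty$. Alternatively use the following argument. It suffices to prove the result only for $\varphi$ a non-negative Borel-measurable function such that $\mathbb{E}[\varphi(X_t)] < \infty$, as the general result follows by decomposing the function into its positive and negative parts. Consider the sequence $(\varphi_n)_{n \ge 0}$ of functions defined as
\[
\varphi_n(x) = \begin{cases} \varphi(x) & \text{if } \varphi(x) \le n \\ n & \text{otherwise.} \end{cases}
\]
Then $\varphi_n$ is bounded and by the Kallianpur–Striebel formula (3.33),
\[
\pi_t(\varphi_n) = \frac{\rho_t(\varphi_n)}{\rho_t(\mathbf{1})} \qquad \tilde{\mathbb{P}}(\mathbb{P})\text{-a.s.}
\]
Also
\[
\tilde{\mathbb{E}} \left[ \varphi(X_t) \tilde Z_t \right] = \mathbb{E} \left[ \varphi(X_t) \right] < \infty .
\]
Hence, by the conditional monotone convergence theorem
\[
\pi_t(\varphi) = \lim_{n \to \infty} \pi_t(\varphi_n) = \frac{1}{\rho_t(\mathbf{1})} \lim_{n \to \infty} \rho_t(\varphi_n) = \frac{\rho_t(\varphi)}{\rho_t(\mathbf{1})} .
\]

5.3 Let $\hat f^{\varphi}_1$ and $\hat f^{\varphi}_2$ be two continuous functions, both versions of $\pi_t(\varphi)$. Then
\[
A = \left\{ y_\cdot \in C_{\mathbb{R}^m}[0,t] : \hat f^{\varphi}_1(y_\cdot) \ne \hat f^{\varphi}_2(y_\cdot) \right\}
\]
is an open set in $C_{\mathbb{R}^m}[0,t]$. Also, from (5.7), we get that
\[
\mathbb{P} \circ Y_\cdot^{-1}(A) = \mathbb{P} \left( \left\{ \omega \in \Omega : \hat f^{\varphi}_1(Y_\cdot(\omega)) \ne \hat f^{\varphi}_2(Y_\cdot(\omega)) \right\} \right) = 0 .
\]
Since $\mathbb{P} \circ Y_\cdot^{-1}$ positively charges all non-empty open sets in $C_{\mathbb{R}^m}[0,t]$, it follows that $A$ must be empty. Finally observe that, by Girsanov's theorem, the distribution of $Y_\cdot$ under $\mathbb{P}$ is absolutely continuous with respect to the distribution of $Y$ under $\tilde{\mathbb{P}}$. The result follows since the Wiener measure charges all open sets in $C_{\mathbb{R}^m}[0,t]$ and the Radon–Nikodym derivative $\mathrm{d}\mathbb{P}/\mathrm{d}\tilde{\mathbb{P}}$ is almost surely positive.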
5.6 Bibliographic Note

The robust representation was introduced by Clark [56]. Both Clark and Kushner [179] show that the associated robust expression for the conditional distribution $\hat f^{\varphi}$ given by (5.10) is locally Lipschitz continuous in the observation path $y$. Very general robustness results have been obtained by Gyöngy [115] and Gyöngy and Krylov [114].
6 Finite-Dimensional Filters
In Section 3.5 we analyzed the case when $X$ is a Markov process with finite state space $I$ and associated Q-matrix $Q$ (see Exercise 3.27). In that case, $\pi = \{\pi_t, t \ge 0\}$, the conditional distribution of $X_t$ given the σ-algebra $\mathcal{Y}_t$, is a finite-dimensional process. More precisely, $\pi = \{(\pi^i_t)_{i \in I}, t \ge 0\}$ is a process with values in $\mathbb{R}^I$ which solves the stochastic differential equation (3.53). The natural question which arises is whether the finite-dimensionality property is preserved when the signal is a diffusion process, in particular when the signal is the solution of the d-dimensional stochastic differential equation (3.9) (see Section 3.2 for details). In general, the answer to this question is negative (see, e.g. [42, 189, 231, 233]). With some notable exceptions, $\pi$ is truly an infinite-dimensional stochastic process. The aim of this chapter is to study two special classes of filters for which the corresponding $\pi$ is finite-dimensional: the Beneš filter (see [9]) and the linear filter, also known as the Kalman–Bucy filter ([29, 146, 147]).
6.1 The Beneš Filter

To simplify the calculations, we assume that both the signal and the observation are one-dimensional. We also assume that the signal process satisfies a stochastic differential equation with constant diffusion term and non-random initial condition; that is, $X$ is a solution of the equation
\[
X_t = x_0 + \int_0^t f(X_s) \, \mathrm{d}s + \sigma V_t . \tag{6.1}
\]
In (6.1) $\sigma > 0$ is a positive constant, $x_0 \in \mathbb{R}$, $V$ is a Brownian motion and the function $f : \mathbb{R} \to \mathbb{R}$ is differentiable, and satisfies the analogue of (3.10),
\[
|f(x) - f(y)| \le K |x - y| . \tag{6.2}
\]
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, c Springer Science+Business Media, LLC 2009 DOI 10.1007/978-0-387-76896-0 6,
As in Chapter 3, the Lipschitz condition (6.2) is to ensure that the SDE for the signal process has a unique solution. We assume that $W$ is a standard Brownian motion which is independent of $V$ and that $Y$ is the process satisfying the following evolution equation
\[
Y_t = \int_0^t h(X_s) \, \mathrm{d}s + W_t . \tag{6.3}
\]
In (6.3) $h : \mathbb{R} \to \mathbb{R}$ is chosen to be the linear function
\[
h(x) = h_1 x + h_2 , \quad x \in \mathbb{R} , \qquad \text{where } h_1, h_2 \in \mathbb{R} .
\]
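To make the model (6.1) and (6.3) concrete, the following sketch simulates the signal and the observation with an Euler–Maruyama scheme. It is an illustration only: the discretisation, the function name `simulate_benes_model` and its arguments are assumptions made here and are not part of the text.

```python
import numpy as np

def simulate_benes_model(f, h1, h2, sigma, x0, t_end, n_steps, rng):
    """Euler-Maruyama paths of X (6.1) and Y (6.3) on a uniform grid."""
    dt = t_end / n_steps
    X = np.empty(n_steps + 1)
    Y = np.empty(n_steps + 1)
    X[0], Y[0] = x0, 0.0
    # independent Brownian increments for the signal noise V and observation noise W
    dV = rng.normal(0.0, np.sqrt(dt), n_steps)
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)
    for k in range(n_steps):
        X[k + 1] = X[k] + f(X[k]) * dt + sigma * dV[k]
        Y[k + 1] = Y[k] + (h1 * X[k] + h2) * dt + dW[k]
    return X, Y
```

A typical call would be `simulate_benes_model(np.tanh, 1.0, 0.0, 1.0, 0.0, 1.0, 1000, np.random.default_rng(0))`, where the drift `np.tanh` is just one Lipschitz choice.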
We assume that the following condition, introduced by Beneš in [9], is satisfied
\[
f'(x) + f^2(x) \sigma^{-2} + h^2(x) = P(x) , \quad x \in \mathbb{R} , \tag{6.4}
\]
where $f'$ is the derivative of $f$ and $P(x)$ is a second-order polynomial with positive leading-order coefficient.

Exercise 6.1.
i. Show that if $f$ is linear then the Beneš condition is satisfied (which establishes that the linear filter with time-independent coefficients is a Beneš filter).
ii. Show that the function $f$ defined as
\[
f(x) = \alpha \sigma \, \frac{\beta e^{2\alpha x/\sigma} - 1}{\beta e^{2\alpha x/\sigma} + 1} , \qquad \text{where } \alpha, \beta \in \mathbb{R} ,
\]
satisfies the Beneš condition. Thus show that $f(x) = a \sigma \tanh(ax/\sigma)$ satisfies the Beneš condition.
iii. Show that the function $f$ defined as
\[
f(x) = a \sigma \tanh(b + ax/\sigma) , \qquad \text{where } a, b \in \mathbb{R} ,
\]
satisfies the Beneš condition.

6.1.1 Another Change of Probability Measure

We need to apply a change of the probability measure similar to the one detailed in Section 3.3. This time the distributions of both $X$ and $Y$ are affected, not just that of the observation process $Y$ as was previously the case. Let $\breve Z = \{\breve Z_t, t \ge 0\}$ be the process defined by
\[
\breve Z_t \triangleq \exp \left( - \int_0^t \frac{f(X_s)}{\sigma} \, \mathrm{d}V_s - \frac{1}{2} \int_0^t \frac{f(X_s)^2}{\sigma^2} \, \mathrm{d}s - \int_0^t h(X_s) \, \mathrm{d}W_s - \frac{1}{2} \int_0^t h(X_s)^2 \, \mathrm{d}s \right) , \quad t \ge 0 . \tag{6.5}
\]
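Returning briefly to the Beneš condition (6.4), a quick numerical sanity check can be reassuring before working through Exercise 6.1. The sketch below evaluates the left-hand side of (6.4) for the drift $f(x) = a\sigma\tanh(b + ax/\sigma)$ and subtracts the candidate polynomial $a^2 + (h_1 x + h_2)^2$; the function name `benes_residual` is hypothetical, and the check is no substitute for the algebraic argument.

```python
import numpy as np

def benes_residual(x, a, b, sigma, h1, h2):
    """Left-hand side of (6.4) minus the candidate P(x) = a^2 + (h1 x + h2)^2."""
    u = b + a * x / sigma
    f = a * sigma * np.tanh(u)          # drift of Exercise 6.1 (iii)
    f_prime = a**2 / np.cosh(u)**2      # derivative of f, computed by hand
    h = h1 * x + h2
    lhs = f_prime + f**2 / sigma**2 + h**2
    candidate = a**2 + h**2
    return lhs - candidate
```

On a grid such as `np.linspace(-5, 5, 101)` the residual should vanish up to rounding, because $\operatorname{sech}^2 u + \tanh^2 u = 1$.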
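Exercise 6.2. Show that the process $\breve Z = \{\breve Z_t, t \ge 0\}$ is an $\mathcal{F}_t$-adapted martingale under the measure $\mathbb{P}$.

Let $\hat{\mathbb{P}}$ be a new probability measure such that its Radon–Nikodym derivative with respect to $\mathbb{P}$ is
\[
\left. \frac{\mathrm{d}\hat{\mathbb{P}}}{\mathrm{d}\mathbb{P}} \right|_{\mathcal{F}_t} = \breve Z_t
\]
for all $t \ge 0$. Let $\hat V = \{\hat V_t, t \ge 0\}$ be the process
\[
\hat V_t \triangleq V_t + \int_0^t \frac{f(X_s)}{\sigma} \, \mathrm{d}s , \quad t \ge 0 .
\]
Using Girsanov's theorem the pair process $(\hat V, Y) = \{(\hat V_t, Y_t), t \ge 0\}$ is a standard two-dimensional Brownian motion under $\hat{\mathbb{P}}$. Let $\hat Z = \{\hat Z_t, t \ge 0\}$ be the process defined as $\hat Z_t = \breve Z_t^{-1}$ for $t \ge 0$. By Itô's formula, this process $\hat Z$ satisfies the following stochastic differential equation,
\[
\mathrm{d}\hat Z_t = \hat Z_t \left( h(X_t) \, \mathrm{d}Y_t + f(X_t) \sigma^{-1} \, \mathrm{d}\hat V_t \right) , \tag{6.6}
\]
and since $\hat Z_0 = 1$,
\[
\hat Z_t = \exp \left( \int_0^t \frac{f(X_s)}{\sigma} \, \mathrm{d}\hat V_s - \frac{1}{2} \int_0^t \frac{f(X_s)^2}{\sigma^2} \, \mathrm{d}s + \int_0^t h(X_s) \, \mathrm{d}Y_s - \frac{1}{2} \int_0^t h(X_s)^2 \, \mathrm{d}s \right) , \quad t \ge 0 . \tag{6.7}
\]
It is clear that $\hat{\mathbb{E}}[\hat Z_t] = \mathbb{E}[\hat Z_t \breve Z_t] = 1$, so $\hat Z$ is a martingale under $\hat{\mathbb{P}}$ and we have
\[
\left. \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\hat{\mathbb{P}}} \right|_{\mathcal{F}_t} = \hat Z_t \quad \text{for } t \ge 0 .
\]
Let $F$ be an antiderivative of $f$; that is, $F$ is such that $F'(x) = f(x)$ for all $x \in \mathbb{R}$. By Itô's formula,
\[
F(X_t) = F(X_0) + \int_0^t f(X_s) \sigma \, \mathrm{d}\hat V_s + \frac{1}{2} \int_0^t f'(X_s) \sigma^2 \, \mathrm{d}s .
\]
Thus from the Beneš condition (6.4) we get that, for all $t \ge 0$,
\[
\hat Z_t = \exp \left( \frac{F(X_t)}{\sigma^2} - \frac{F(x_0)}{\sigma^2} + \int_0^t h(X_s) \, \mathrm{d}Y_s - \frac{1}{2} \int_0^t P(X_s) \, \mathrm{d}s \right) .
\]

Exercise 6.3. Prove that, under $\hat{\mathbb{P}}$, the observation process $Y$ is a Brownian motion independent of $X$, where we can write $X_t = X_0 + \sigma \hat V_t$.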
Define $\hat\rho_t$ to be a measure-valued process following the definition of the unnormalised conditional expectation in Chapter 3. For every $\varphi$ a bounded Borel-measurable function, it follows that $\hat\rho_t(\varphi)$ satisfies
\[
\hat\rho_t(\varphi) \triangleq \hat{\mathbb{E}} \left[ \varphi(X_t) \hat Z_t \mid \mathcal{Y} \right] \quad \hat{\mathbb{P}}\text{-a.s.,} \tag{6.8}
\]
where $\hat{\mathbb{E}}$ is the expectation with respect to $\hat{\mathbb{P}}$. As a consequence of Proposition 3.15, the process $\hat\rho(\varphi)$ is a modification of that defined with $\mathcal{Y}$ replaced by $\mathcal{Y}_t$ in (6.8).

Exercise 6.4. For every $\varphi$ a bounded Borel-measurable function we have
\[
\pi_t(\varphi) = \frac{\hat\rho_t(\varphi)}{\hat\rho_t(\mathbf{1})} , \quad \hat{\mathbb{P}}(\mathbb{P})\text{-a.s.} \tag{6.9}
\]
6.1.2 The Explicit Formula for the Beneš Filter

We aim to obtain an explicit expression of the (normalised) density of $\hat\rho_t$. For this we make use of the closed form expression (B.30) of the functional $I^{\beta,\Gamma,\delta}_t$ as described in equation (B.22) of the appendix. This cannot be done directly as the argument of the exponential in (B.22) contains no stochastic integral. However, similar to the analysis in Chapter 5, one can show that
\[
\frac{\hat\rho_t(\varphi)}{\hat\rho_t(\mathbf{1})} = \lim_{n \to \infty} \frac{\hat\rho^{n}_t(\varphi)}{\hat\rho^{n}_t(\mathbf{1})} ,
\]
where $\hat\rho^{n}_t$ is the measure defined as
\[
\hat\rho^{n}_t(\varphi) \triangleq \hat{\mathbb{E}} \left[ \varphi(X_t) \exp \left( \frac{F(X_t)}{\sigma^2} - \frac{F(x_0)}{\sigma^2} + \int_0^t h(X_s) y^{n}_s \, \mathrm{d}s - \frac{1}{2} \int_0^t P(X_s) \, \mathrm{d}s \right) \right] , \tag{6.10}
\]
for any bounded measurable function $\varphi$ and $y^{n} = \{y^{n}_s, s \in [0,t]\}$ the piecewise constant process
\[
y^{n}_s = \frac{Y_{(k+1)t/n} - Y_{kt/n}}{t/n} , \quad s \in [kt/n, (k+1)t/n) , \quad k = 0, 1, \ldots, n-1 .
\]
As explained in Chapter 5, the expectation in (6.10) is no longer conditional. We keep $y^{n}$ fixed to the observation path, or rather the approximation of its ‘derivative’, and integrate with respect to the law of $\hat V$.

Exercise 6.5. Prove that, almost surely,
\[
\lim_{n \to \infty} \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} y^{n}_s \, \mathrm{d}s = \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \mathrm{d}Y_s ,
\]
and that there exists a positive random variable $c(t, Y)$ such that, uniformly in $n \ge 1$, we have
\[
\left| \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} y^{n}_s \, \mathrm{d}s \right| \le c(t, Y) .
\]

In the following, we express the polynomial $P(x)$ in the form $P(x) = p^2 x^2 + 2qx + r$, where $p, q, r \in \mathbb{R}$ are arbitrary. Then we have the following.

Lemma 6.6. For an arbitrary bounded Borel-measurable function $\varphi$, the ratio $\hat\rho^{n}_t(\varphi)/\hat\rho^{n}_t(\mathbf{1})$ has the following explicit formula
\[
\frac{\hat\rho^{n}_t(\varphi)}{\hat\rho^{n}_t(\mathbf{1})} = \frac{1}{c^{n}_t} \int_{-\infty}^{\infty} \varphi(x_0 + \sigma z) \exp \left( F(x_0 + \sigma z) \sigma^{-2} + Q^{n}_t(z) \right) \mathrm{d}z , \tag{6.11}
\]
where $Q^{n}_t(z)$ is the second-order polynomial
\[
Q^{n}_t(z) \triangleq z \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \sigma \left( h_1 y^{n}_s - q - p^2 x_0 \right) \mathrm{d}s - \frac{p\sigma \coth(tp\sigma)}{2} z^2 ,
\]
and $c^{n}_t$ is the normalising constant
\[
c^{n}_t \triangleq \int_{-\infty}^{\infty} \exp \left( F(x_0 + \sigma z) \sigma^{-2} + Q^{n}_t(z) \right) \mathrm{d}z .
\]
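Proof. From (6.10), the expression for $\hat\rho^{n}_t(\varphi)$ becomes
\[
\hat\rho^{n}_t(\varphi) = \lambda^{n}_t \, \hat{\mathbb{E}} \left[ \varphi(x_0 + \sigma \hat V_t) \exp \left( F(x_0 + \sigma \hat V_t) \sigma^{-2} + \int_0^t \hat V_s \beta^{n}_s \, \mathrm{d}s - \frac{1}{2} \int_0^t (p \sigma \hat V_s)^2 \, \mathrm{d}s \right) \right] , \tag{6.12}
\]
where
\[
\lambda^{n}_t \triangleq \exp \left( -F(x_0) \sigma^{-2} + \int_0^t (h_1 x_0 + h_2) y^{n}_s \, \mathrm{d}s - \frac{1}{2} \left( r + 2 x_0 q + p^2 x_0^2 \right) t \right) ,
\qquad
\beta^{n}_s \triangleq \sigma \left( h_1 y^{n}_s - q - p^2 x_0 \right) .
\]
If we make the definition
\[
I^{\beta^{n}, p\sigma, z}_t \triangleq \hat{\mathbb{E}} \left[ \exp \left( \int_0^t \hat V_s \beta^{n}_s \, \mathrm{d}s - \frac{1}{2} \int_0^t (p \sigma \hat V_s)^2 \, \mathrm{d}s \right) \,\middle|\, \hat V_t = z \right] ,
\]
then
\[
\hat{\mathbb{E}} \left[ \varphi(x_0 + \sigma \hat V_t) \exp \left( F(x_0 + \sigma \hat V_t) \sigma^{-2} + \int_0^t \hat V_s \beta^{n}_s \, \mathrm{d}s - \frac{1}{2} \int_0^t (p \sigma \hat V_s)^2 \, \mathrm{d}s \right) \,\middle|\, \hat V_t = z \right]
= I^{\beta^{n}, p\sigma, z}_t \, \varphi(x_0 + \sigma z) \exp \left( \frac{F(x_0 + \sigma z)}{\sigma^2} \right) . \tag{6.13}
\]
Following (B.36) we get that
\[
I^{\beta^{n}, p\sigma, z}_t = \bar f^{\beta^{n}, p\sigma}_t \exp \left( z \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \beta^{n}_s \, \mathrm{d}s - \frac{p\sigma \coth(tp\sigma)}{2} z^2 + \frac{z^2}{2t} \right) , \tag{6.14}
\]
where
\[
\bar f^{\beta^{n}, p\sigma}_t \triangleq \sqrt{\frac{tp\sigma}{\sinh(tp\sigma)}} \exp \left( \int_0^t \int_0^t \frac{\sinh((s - t)p\sigma) \sinh(s' p\sigma)}{2 p \sigma \sinh(tp\sigma)} \beta^{n}_s \beta^{n}_{s'} \, \mathrm{d}s \, \mathrm{d}s' \right) .
\]
Identity (6.11) then follows from (6.12)–(6.14) by integrating over the $N(0, t)$ law of $\hat V_t$,
\[
\hat{\mathbb{E}}(\cdot) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{\infty} \hat{\mathbb{E}}(\cdot \mid \hat V_t = z) \, e^{-z^2/2t} \, \mathrm{d}z .
\]
□

Observe that the function $\bar f^{\beta^{n}, p\sigma}_t$ which is used in the above proof does not appear in the final expression for $\hat\rho^{n}_t(\varphi)/\hat\rho^{n}_t(\mathbf{1})$. We are now ready to obtain the formula for $\pi_t(\varphi)$.

Proposition 6.7. If the Beneš condition (6.4) is satisfied then for arbitrary bounded Borel-measurable $\varphi$, it follows that $\pi_t(\varphi)$ satisfies the following explicit formula
\[
\pi_t(\varphi) = \frac{1}{c_t} \int_{-\infty}^{\infty} \varphi(z) \exp \left( F(z) \sigma^{-2} + Q_t(z) \right) \mathrm{d}z , \tag{6.15}
\]
where $Q_t(z)$ is the second-order polynomial
\[
Q_t(z) \triangleq z \left( h_1 \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \mathrm{d}Y_s + \frac{q + p^2 x_0}{p \sigma \sinh(tp\sigma)} - \frac{q}{p\sigma} \coth(tp\sigma) \right) - \frac{p \coth(tp\sigma)}{2\sigma} z^2 ,
\]
and $c_t$ is the corresponding normalising constant,
\[
c_t \triangleq \int_{-\infty}^{\infty} \exp \left( F(z) \sigma^{-2} + Q_t(z) \right) \mathrm{d}z . \tag{6.16}
\]
In particular, $\pi$ depends only on the one-dimensional $\mathcal{Y}_t$-adapted process
\[
t \mapsto \int_0^t \sinh(sp\sigma) \, \mathrm{d}Y_s .
\]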
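Proof. Making a change of variable in (6.11), we get that
\[
\frac{\hat\rho^{n}_t(\varphi)}{\hat\rho^{n}_t(\mathbf{1})} = \frac{1}{c^{n}_t} \int_{-\infty}^{\infty} \varphi(u) \exp \left( \frac{F(u)}{\sigma^2} + Q^{n}_t \left( \frac{u - x_0}{\sigma} \right) \right) \frac{1}{\sigma} \, \mathrm{d}u .
\]
Following Exercise 6.5 we get that
\[
\lim_{n \to \infty} \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} y^{n}_s \, \mathrm{d}s = \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \mathrm{d}Y_s ,
\]
hence†
\[
\lim_{n \to \infty} Q^{n}_t(z) = z \sigma h_1 \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \mathrm{d}Y_s - \frac{q + p^2 x_0}{p} \left( \coth(tp\sigma) - \operatorname{csch}(tp\sigma) \right) z - \frac{p \sigma \coth(tp\sigma)}{2} z^2 .
\]
Thus, up to an additive term which does not depend on $u$ and is therefore absorbed into the normalising constant,
\[
Q_t(u) = \lim_{n \to \infty} Q^{n}_t \left( \frac{u - x_0}{\sigma} \right) = u h_1 \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \mathrm{d}Y_s - \frac{p \coth(tp\sigma)}{2\sigma} u^2 - \frac{q + p^2 x_0}{p\sigma} \left( \coth(tp\sigma) - \operatorname{csch}(tp\sigma) \right) u + \frac{p \coth(tp\sigma)}{\sigma} u x_0 .
\]
Finally, since
\[
\pi_t(\varphi) = \lim_{n \to \infty} \frac{\hat\rho^{n}_t(\varphi)}{\hat\rho^{n}_t(\mathbf{1})} ,
\]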
the proposition follows by the dominated convergence theorem (again use Exercise 6.5). □

Remark 6.8. For large $t$, as $\coth(x) \to 1$ and $\operatorname{csch}(x) \to 0$ as $x \to \infty$, it follows that $\pi_t(\varphi)$ is approximately equal to
\[
\pi_t(\varphi) \simeq \frac{1}{c_t} \int_{-\infty}^{\infty} \varphi(z) \exp \left( F(z) \sigma^{-2} + \tilde P_t(z) \right) \mathrm{d}z ,
\]
where $\tilde P_t(z)$ is the second-order polynomial
\[
\tilde P_t(z) \triangleq \left( h_1 \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)} \, \mathrm{d}Y_s - \frac{q}{\sigma p} \right) z - \frac{p}{2\sigma} z^2 , \qquad t_0 < t .
\]
In particular, past observations become quickly (exponentially) irrelevant and so does the initial position of the signal $x_0$.

† Recall that $\coth(x) = \cosh(x)/\sinh(x)$ and $\operatorname{csch}(x) = 1/\sinh(x)$.
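The explicit density can be evaluated numerically on a grid. The sketch below is an illustration under stated assumptions: it uses the coefficient of $u$ obtained in the proof of Proposition 6.7 (the observation enters through $h_1 \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)}\,\mathrm{d}Y_s$), replaces the stochastic integral by a left-point Riemann sum over a sampled observation path, and normalises by a simple Riemann sum instead of computing $c_t$ in closed form. The names `weighted_observation_integral` and `benes_density` are hypothetical, and $p > 0$ is assumed.

```python
import numpy as np

def weighted_observation_integral(Y, t, p, sigma):
    """Left-point Riemann sum for int_0^t sinh(s p sigma)/sinh(t p sigma) dY_s."""
    Y = np.asarray(Y, dtype=float)
    s = np.linspace(0.0, t, len(Y))
    weights = np.sinh(s[:-1] * p * sigma) / np.sinh(t * p * sigma)
    return float(np.sum(weights * np.diff(Y)))

def benes_density(z, F, sigma, h1, p, q, x0, t, stoch_int):
    """Normalised density proportional to exp(F(z)/sigma^2 + Q_t(z)) on a uniform grid z.

    F         : antiderivative of the drift f (vectorised callable)
    stoch_int : value of int_0^t sinh(s p sigma)/sinh(t p sigma) dY_s
    """
    z = np.asarray(z, dtype=float)
    a = t * p * sigma
    coth = np.cosh(a) / np.sinh(a)
    linear = (h1 * stoch_int
              + (q + p**2 * x0) / (p * sigma * np.sinh(a))
              - q * coth / (p * sigma))
    log_w = F(z) / sigma**2 + linear * z - p * coth / (2.0 * sigma) * z**2
    log_w -= log_w.max()                # avoid overflow before exponentiating
    w = np.exp(log_w)
    dz = z[1] - z[0]
    return w / (w.sum() * dz)           # Riemann-sum normalisation
```

For the drift $f(x) = a\sigma\tanh(ax/\sigma)$ one may take `F = lambda z: sigma**2 * np.log(np.cosh(a * z / sigma))` with `p = abs(h1)` and `q = h1 * h2`. With $f = 0$, $x_0 = 0$ and $h(x) = h_1 x$ the density reduces to a Gaussian whose variance $(\sigma/h_1)\tanh(\sigma h_1 t)$ agrees with the solution of the scalar Riccati equation of the linear filter started from zero.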
Exercise 6.9. Compute the normalising constant $c_t$ for the linear filter and the filter given by $f(x) = a\sigma \tanh(ax/\sigma)$, which were shown to satisfy the Beneš condition in Exercise 6.1. Hence determine an explicit expression for the density of $\pi_t$. What is the asymptotic behaviour of $\pi_t$ for large $t$?

If the initial state of the signal $X_0$ is random, then the formula for $\pi_t(\varphi)$ is obtained by integrating (6.15) in the $x_0$ variable with respect to the law of $X_0$. A multidimensional version of (6.15) can be obtained by following the same procedure as above. The details of the computation of the exponential Brownian functional $I^{\beta,\Gamma,\delta}_t$ are described in formula (B.22) of the appendix in the multidimensional case. Including the full form of $\pi_t(\varphi)$ in this case would make this chapter excessively long. However, the fact that such a computation is possible is fairly important, due to the scarcity of explicit expressions for $\pi$. Such explicit expressions provide benchmarks for testing numerical algorithms for computing approximations to $\pi$.
6.2 The Kalman–Bucy Filter

Let now $X = (X^i)_{i=1}^d$ be the solution of the linear SDE driven by a p-dimensional Brownian motion process $V = (V^j)_{j=1}^p$,
\[
X_t = X_0 + \int_0^t (F_s X_s + f_s) \, \mathrm{d}s + \int_0^t \sigma_s \, \mathrm{d}V_s , \tag{6.17}
\]
where, for any $s \ge 0$, $F_s$ is a $d \times d$ matrix, $\sigma_s$ is a $d \times p$ matrix and $f_s$ is a d-dimensional vector. The functions $s \mapsto F_s$, $s \mapsto \sigma_s$ and $s \mapsto f_s$ are measurable and locally bounded.† Assume that $X_0 \sim N(x_0, r_0)$ is independent of $V$. Next assume that $W$ is a standard $\mathcal{F}_t$-adapted m-dimensional Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$ independent of $X$ and let $Y$ be the process satisfying the following evolution equation
\[
Y_t = \int_0^t (H_s X_s + h_s) \, \mathrm{d}s + W_t , \tag{6.18}
\]
where, for any $s \ge 0$, $H_s$ is an $m \times d$ matrix and $h_s$ is an m-dimensional vector.

Remark 6.10. Let $I_m$ be the $m \times m$ identity matrix and $0_{a,b}$ be the $a \times b$ matrix with all entries equal to 0. Let $L_s$ be the $(d + m) \times (d + m)$ matrix, $l_s$ be the $(d + m)$-dimensional vector and $z_s$ be the $(d + m) \times (p + m)$ matrix given by, respectively,
\[
L_s = \begin{pmatrix} F_s & 0_{d,m} \\ H_s & 0_{m,m} \end{pmatrix} , \qquad
l_s = \begin{pmatrix} f_s \\ h_s \end{pmatrix} , \qquad
z_s = \begin{pmatrix} \sigma_s & 0_{d,m} \\ 0_{m,p} & I_m \end{pmatrix} .
\]

† That is, for every time $t$, the functions are bounded for $s \in [0, t]$.
Let $T = \{T_t, t \ge 0\}$ be the $(d + m)$-dimensional pair process $(X, Y)$ and $U = \{U_t, t \ge 0\}$ be the $(p + m)$-dimensional Brownian motion $(V, W)$. Then $T$ is a solution of the linear SDE
\[
T_t = T_0 + \int_0^t (L_s T_s + l_s) \, \mathrm{d}s + \int_0^t z_s \, \mathrm{d}U_s . \tag{6.19}
\]

Exercise 6.11.
i. Prove that $T$ has the following representation
\[
T_t = \Phi_t \left( T_0 + \int_0^t \Phi_s^{-1} l_s \, \mathrm{d}s + \int_0^t \Phi_s^{-1} z_s \, \mathrm{d}U_s \right) , \tag{6.20}
\]
where $\Phi$ is the unique solution of the matrix equation
\[
\frac{\mathrm{d}\Phi_t}{\mathrm{d}t} = L_t \Phi_t , \tag{6.21}
\]
with initial condition $\Phi_0 = I_{d+m}$.
ii. Deduce from (i) that for any $n > 0$, any $(n+1)$-tuple of the form
\[
\left( Y_{t_1}, Y_{t_2}, \ldots, Y_{t_{n-1}}, Y_t, X_t \right) ,
\]
where $0 \le t_1 \le \cdots \le t_{n-1} \le t$, has a $(d + nm)$-variate normal distribution.
iii. Let $K : [0, t] \to \mathbb{R}^{d \times m}$ be a measurable $(d \times m)$ matrix-valued function with all of its entries square integrable. Deduce from (ii) that the pair
\[
\left( X_t, \int_0^t K_s \, \mathrm{d}Y_s \right)
\]
has a 2d-variate normal distribution.

Lemma 6.12. In the case of the linear filter, the normalised conditional distribution $\pi_t$ of $X_t$ conditional upon $\mathcal{Y}_t$ is a multivariate normal distribution.

Proof. Consider the orthogonal projection of the components of the signal $X^i_t$, $i = 1, \ldots, d$, onto the Hilbert space $\mathcal{H}^Y_t \subset L^2(\Omega)$ generated by the components of the observation process
\[
\left\{ Y^j_s , \; s \in [0, t], \; j = 1, \ldots, m \right\} .
\]
Using Lemma 4.3.2, page 122 in Davis [71], the elements of $\mathcal{H}^Y_t$ have the following representation
\[
\mathcal{H}^Y_t = \left\{ \sum_{i=1}^m \int_0^t a_i \, \mathrm{d}Y^i_s : a_i \in L^2([0, t]), \; i = 1, \ldots, m \right\} .
\]
It follows that there exists a $(d \times m)$ matrix-valued function $K : [0, t] \to \mathbb{R}^{d \times m}$ with all of its entries square integrable, and a random variable $\breve X_t = (\breve X^i_t)_{i=1}^d$ with entries orthogonal to $\mathcal{H}^Y_t$ such that
\[
X_t = \breve X_t + \int_0^t K_s \, \mathrm{d}Y_s .
\]
In particular, as a consequence of Exercise 6.11 part (iii), $\breve X_t$ has a Gaussian distribution. Moreover, for any $n > 0$, any n-tuple of the form
\[
\left( Y_{t_1}, Y_{t_2}, \ldots, Y_{t_{n-1}}, \breve X_t \right) ,
\]
where $0 \le t_1 \le \cdots \le t_{n-1} \le t$, has a $(d + (n-1)m)$-variate normal distribution. Now since $\breve X_t$ has all entries orthogonal to $\mathcal{H}^Y_t$ it follows that $\breve X_t$ is independent of $(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_{n-1}})$ and since the time instances $0 \le t_1 \le \cdots \le t_{n-1} \le t$ have been arbitrarily chosen it follows that $\breve X_t$ is independent of $\mathcal{Y}_t$. This observation is crucial! It basically says that, in the linear/Gaussian case, the linear projection (the projection onto the linear space generated by the observation) coincides with the non-linear projection (the conditional expectation with respect to the observation σ-algebra). Hence the distribution of $X_t$ conditional upon $\mathcal{Y}_t$ is the same as the distribution of $\breve X_t$ shifted by the (fixed) quantity $\int_0^t K_s \, \mathrm{d}Y_s$. In particular $\pi_t$ is characterized by its first and second moments alone. □

6.2.1 The First and Second Moments of the Conditional Distribution of the Signal

We know from Chapter 3 that the conditional distribution of the signal is the unique solution of the Kushner–Stratonovich equation (3.57). Unlike the model analysed in Chapter 3, the above linear filter has time-dependent coefficients. Nevertheless all the results and proofs presented there apply to the linear filter with time-dependent coefficients (see Remark 3.1). In the following we deduce the equations for the first and second moments of $\pi$. Let $\varphi_i$, $\varphi_{ij}$ for $i, j = 1, \ldots, d$ be the functions
\[
\varphi_i(x) = x_i , \qquad \varphi_{ij}(x) = x_i x_j , \qquad x \in \mathbb{R}^d ,
\]
and let $\pi^i_t$, $\pi^{ij}_t$ be the moments of $\pi_t$
\[
\pi^i_t = \pi_t(\varphi_i) , \qquad \pi^{ij}_t = \pi_t(\varphi_{ij}) , \qquad i, j = 1, \ldots, d .
\]
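Exercise 6.13.
i. Show that for any $t \ge 0$ and $i = 1, \ldots, d$ and $p \ge 1$, the solution of the equation (6.17) satisfies
\[
\sup_{s \in [0,t]} \mathbb{E} \left[ |X^i_s|^p \right] < \infty .
\]
ii. Deduce from (i) that for any $t \ge 0$ and $i, j = 1, \ldots, d$
\[
\sup_{s \in [0,t]} \mathbb{E} \left[ (\pi_s(|\varphi_i|))^p \right] < \infty , \qquad
\sup_{s \in [0,t]} \mathbb{E} \left[ |\pi_s(|\varphi_{ij}|)|^p \right] < \infty .
\]
In particular
\[
\sup_{s \in [0,t]} \mathbb{E} \left[ |\pi^i_s|^p \right] < \infty , \qquad
\sup_{s \in [0,t]} \mathbb{E} \left[ |\pi^{ij}_s|^p \right] < \infty .
\]

In this case the innovation process $I = \{I_t, t \ge 0\}$ defined by (2.17) has the components
\[
I^j_t = Y^j_t - \int_0^t \left( \sum_{i=1}^d H^{ji}_s \pi^i_s + h^j_s \right) \mathrm{d}s , \qquad t \ge 0 , \quad j = 1, \ldots, m .
\]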
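The Kushner–Stratonovich equation (3.57) now takes the form
\[
\pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(A_s \varphi) \, \mathrm{d}s + \sum_{i=1}^d \sum_{j=1}^m \int_0^t \pi_s \left( \varphi \left( \varphi_i - \pi^i_s \right) \right) H^{ji}_s \, \mathrm{d}I^j_s , \tag{6.22}
\]
where the time-dependent generator $A_s$, $s \ge 0$, is given by
\[
A_s \varphi = \sum_{i=1}^d \left( \sum_{j=1}^d F^{ij}_s x_j + f^i_s \right) \frac{\partial \varphi}{\partial x_i} + \frac{1}{2} \sum_{i=1}^d \sum_{j=1}^d (\sigma_s \sigma_s^{\top})_{ij} \frac{\partial^2 \varphi}{\partial x_i \partial x_j} ,
\]
and $\varphi$ is chosen in the domain of $A_s$ for any $s \in [0, t]$ such that
\[
\sup_{s \in [0,t]} \|A_s \varphi\| < \infty .
\]
To find the equations satisfied by $\pi^i_t$ and $\pi^{ij}_t$ we cannot replace $\varphi$ by $\varphi_i$ and $\varphi$ by $\varphi_{ij}$ in (6.22) because neither of them belongs to the domain of $A_s$ (since they are unbounded). We proceed by cutting off $\varphi_i$ and $\varphi_{ij}$ at a fixed level which we let tend to infinity. For this let us introduce the functions $(\psi^k)_{k > 0}$ defined as
\[
\psi^k(x) = \psi(x/k) , \quad x \in \mathbb{R}^d , \tag{6.23}
\]
where
\[
\psi(x) = \begin{cases}
1 & \text{if } |x| \le 1 \\
\exp \left( \dfrac{|x|^2 - 1}{|x|^2 - 4} \right) & \text{if } 1 < |x| < 2 \\
0 & \text{if } |x| \ge 2 .
\end{cases}
\]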
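The cut-off functions in (6.23) are easy to evaluate directly, which can help when checking the relations used below. The following sketch is an illustration only; the function names `psi` and `psi_k` are chosen here.

```python
import numpy as np

def psi(x):
    """Cut-off function: 1 on the unit ball, 0 outside the ball of radius 2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    r2 = float(np.dot(x, x))            # |x|^2
    if r2 <= 1.0:
        return 1.0
    if r2 >= 4.0:
        return 0.0
    return float(np.exp((r2 - 1.0) / (r2 - 4.0)))

def psi_k(x, k):
    """Rescaled cut-off psi^k(x) = psi(x / k) from (6.23)."""
    return psi(np.asarray(x, dtype=float) / k)
```

For example, `psi_k(x, k)` equals 1 as soon as `k` is at least `|x|`, which is the pointwise convergence used when passing to the limit in `k`.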
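Obviously, for all $k > 0$, $\psi^k \in C_b^{\infty}(\mathbb{R}^d)$ and $0 \le \mathbf{1}_{B(k)} \le \psi^k \le 1$. Also, all partial derivatives of $\psi^k$ tend uniformly to 0. In particular
\[
\lim_{k \to \infty} \|A \psi^k\|_{\infty} = 0 , \qquad \lim_{k \to \infty} \|\partial_i \psi^k\|_{\infty} = 0 , \qquad i = 1, \ldots, d .
\]
In the following we use the relations
\[
\lim_{k \to \infty} \varphi_i(x) \psi^k(x) = \varphi_i(x) , \qquad |\varphi_i(x) \psi^k(x)| \le |\varphi_i(x)| , \tag{6.24}
\]
\[
\lim_{k \to \infty} A_s(\varphi_i \psi^k)(x) = A_s \varphi_i(x) , \tag{6.25}
\]
\[
\sup_{s \in [0,t]} |A_s(\varphi_i \psi^k)(x)| \le C_t \left( \sum_{i=1}^d |\varphi_i(x)| + \sum_{i,j=1}^d |\varphi_{ij}(x)| \right) . \tag{6.26}
\]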
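Proposition 6.14. Let $\hat x = \{\hat x_t, t \ge 0\}$ be the conditional mean of the signal. In other words, $\hat x$ is the d-dimensional process with components
\[
\hat x^i_t = \mathbb{E}[X^i_t \mid \mathcal{Y}_t] = \pi^i_t , \qquad i = 1, \ldots, d , \; t \ge 0 .
\]
Define $R = \{R_t, t \ge 0\}$ to be the conditional covariance matrix of the signal. In other words, $R_t$ is the $d \times d$-dimensional process with components
\[
R^{ij}_t = \mathbb{E}[X^i_t X^j_t \mid \mathcal{Y}_t] - \mathbb{E}[X^i_t \mid \mathcal{Y}_t] \, \mathbb{E}[X^j_t \mid \mathcal{Y}_t] = \pi^{ij}_t - \pi^i_t \pi^j_t , \qquad i, j = 1, \ldots, d , \; t \ge 0 .
\]
Then $\hat x$ satisfies the stochastic differential equation
\[
\mathrm{d}\hat x_t = (F_t \hat x_t + f_t) \, \mathrm{d}t + R_t H_t^{\top} \left( \mathrm{d}Y_t - (H_t \hat x_t + h_t) \, \mathrm{d}t \right) , \tag{6.27}
\]
and $R$ satisfies the deterministic matrix Riccati equation
\[
\frac{\mathrm{d}R_t}{\mathrm{d}t} = \sigma_t \sigma_t^{\top} + F_t R_t + R_t F_t^{\top} - R_t H_t^{\top} H_t R_t . \tag{6.28}
\]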
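Equations (6.27) and (6.28) lend themselves to a simple time-stepping scheme. The sketch below performs one explicit Euler step of both equations; it assumes coefficients that are constant over the step and is meant only as an illustration (the function name `kalman_bucy_step` is hypothetical, and an explicit Euler step does not by itself guarantee that the discretised covariance stays positive semi-definite).

```python
import numpy as np

def kalman_bucy_step(x_hat, R, dY, dt, F, f, sig, H, h):
    """One explicit Euler step of (6.27) for the mean and (6.28) for the covariance.

    x_hat : (d,) conditional mean        R   : (d, d) conditional covariance
    dY    : (m,) observation increment   dt  : step size
    F     : (d, d), f : (d,), sig : (d, p), H : (m, d), h : (m,)
    """
    # innovation increment dY - (H x_hat + h) dt
    innovation = dY - (H @ x_hat + h) * dt
    x_new = x_hat + (F @ x_hat + f) * dt + R @ H.T @ innovation
    # Riccati right-hand side: sig sig^T + F R + R F^T - R H^T H R
    R_new = R + (sig @ sig.T + F @ R + R @ F.T - R @ H.T @ H @ R) * dt
    return x_new, R_new
```

In the scalar case $F = 0$, $\sigma = 1$, $H = 1$ with $R_0 = 0$, iterating the step reproduces $R_t = \tanh(t)$ up to the discretisation error.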
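Proof. Replacing $\varphi$ by $\varphi_i \psi^k$ in (6.22) gives us
\[
\pi_t(\varphi_i \psi^k) = \pi_0(\varphi_i \psi^k) + \int_0^t \pi_s \left( A_s(\varphi_i \psi^k) \right) \mathrm{d}s + \sum_{l=1}^d \sum_{j=1}^m \int_0^t \pi_s \left( \varphi_i \psi^k \left( \varphi_l - \pi^l_s \right) \right) H^{jl}_s \, \mathrm{d}I^j_s . \tag{6.29}
\]
By the dominated convergence theorem (use (6.24)–(6.26)) we may pass to the limit as $k \to \infty$,
\[
\lim_{k \to \infty} \pi_t(\varphi_i \psi^k) = \pi_t(\varphi_i) , \tag{6.30}
\]
\[
\lim_{k \to \infty} \left( \pi_0(\varphi_i \psi^k) + \int_0^t \pi_s \left( A_s(\varphi_i \psi^k) \right) \mathrm{d}s \right) = \pi_0(\varphi_i) + \int_0^t \pi_s(A_s \varphi_i) \, \mathrm{d}s .
\]
Also
\[
\lim_{k \to \infty} \mathbb{E} \left[ \left( \int_0^t \pi_s \left( \varphi_i \left( \psi^k - 1 \right) \left( \varphi_l - \pi^l_s \right) \right) H^{jl}_s \, \mathrm{d}I^j_s \right)^2 \right] = 0 . \tag{6.31}
\]
Hence at least for a subsequence $(k_n)_{n \ge 0}$, we have that
\[
\lim_{k_n \to \infty} \sum_{l=1}^d \sum_{j=1}^m \int_0^t \pi_s \left( \varphi_i \psi^{k_n} \left( \varphi_l - \pi^l_s \right) \right) H^{jl}_s \, \mathrm{d}I^j_s = \sum_{l=1}^d \sum_{j=1}^m \int_0^t R^{il}_s H^{jl}_s \, \mathrm{d}I^j_s . \tag{6.32}
\]
By taking the limit in (6.29) along a convenient subsequence and using (6.30)–(6.32) we obtain (6.27).

We now derive the equation for the evolution of the covariance matrix $R$. Again we cannot apply the Kushner–Stratonovich equation directly to $\varphi_{ij}$ but use first an intermediate step. We ‘cut off’ $\varphi_{ij}$ using the functions $(\psi^k)_{k > 0}$ and take the limit as $k$ tends to infinity. After doing that we obtain the equation for $\pi^{ij}_t$ which is
\[
\mathrm{d}\pi^{ij}_t = \left( (\sigma_t \sigma_t^{\top})^{ij} + \sum_{k=1}^d \left( F^{ik}_t \pi^{kj}_t + F^{jk}_t \pi^{ik}_t \right) + f^i_t \hat x^j_t + f^j_t \hat x^i_t \right) \mathrm{d}t + \sum_{k=1}^d \sum_{l=1}^m \left( \pi_t(\varphi_i \varphi_j \varphi_k) - \pi^{ij}_t \hat x^k_t \right) H^{lk}_t \, \mathrm{d}I^l_t . \tag{6.33}
\]
Observe that since $\pi_t$ is normal we have the following result on the third moments of a multivariate normal distribution
\[
\pi_t(\varphi_i \varphi_j \varphi_k) = \hat x^i_t \hat x^j_t \hat x^k_t + \hat x^i_t R^{jk}_t + \hat x^j_t R^{ik}_t + \hat x^k_t R^{ij}_t .
\]
It is clear that
\[
\mathrm{d}R^{ij}_t = \mathrm{d}\pi^{ij}_t - \mathrm{d}(\hat x^i_t \hat x^j_t) , \tag{6.34}
\]
where the first term is given by (6.33) and the second term is expanded using Itô's form of the product rule,
\[
\mathrm{d}(\hat x^i_t \hat x^j_t) = \hat x^i_t \, \mathrm{d}\hat x^j_t + \hat x^j_t \, \mathrm{d}\hat x^i_t + \mathrm{d}\langle \hat x^i, \hat x^j \rangle_t .
\]
Therefore using (6.27) we can evaluate this as
\[
\begin{aligned}
\mathrm{d}(\hat x^i_t \hat x^j_t) ={}& \sum_{k=1}^d \left( F^{ik}_t \hat x^k_t \hat x^j_t + F^{jk}_t \hat x^i_t \hat x^k_t \right) \mathrm{d}t + \left( f^i_t \hat x^j_t + f^j_t \hat x^i_t \right) \mathrm{d}t + \hat x^i_t (R_t H_t^{\top} \, \mathrm{d}I_t)^j \\
&+ \hat x^j_t (R_t H_t^{\top} \, \mathrm{d}I_t)^i + \left\langle (R_t H_t^{\top} \, \mathrm{d}I_t)^i , (R_t H_t^{\top} \, \mathrm{d}I_t)^j \right\rangle .
\end{aligned} \tag{6.35}
\]
For evaluating the quadratic covariation term in this expression it is simplest to work componentwise using the Einstein summation convention and use the fact that by Proposition 2.30 the innovation process $I_t$ is a $\mathbb{P}$-Brownian motion
\[
\begin{aligned}
\left\langle (R_t H_t^{\top} \, \mathrm{d}I_t)^i , (R_t H_t^{\top} \, \mathrm{d}I_t)^j \right\rangle
&= \left\langle R^{il}_t H^{kl}_t \, \mathrm{d}I^k_t , R^{jm}_t H^{nm}_t \, \mathrm{d}I^n_t \right\rangle \\
&= R^{il}_t H^{kl}_t R^{jm}_t H^{nm}_t \delta_{kn} \, \mathrm{d}t \\
&= R^{il}_t H^{kl}_t H^{km}_t R^{jm}_t \, \mathrm{d}t \\
&= (R H^{\top} H R^{\top})^{ij} \, \mathrm{d}t = (R H^{\top} H R)^{ij} \, \mathrm{d}t ,
\end{aligned} \tag{6.36}
\]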
where the last equality follows since $R^\top = R$. Substituting (6.33), (6.35) and (6.36) into (6.34) yields the following equation for the evolution of the $ij$th element of the covariance matrix:
\[
dR_t^{ij} = \Big( (\sigma_t\sigma_t^\top)^{ij} + (F_tR_t)^{ij} + (R_t^\top F_t^\top)^{ij} - (R_tH_t^\top H_tR_t)^{ij} \Big)\,dt
+ \big(\hat{x}_t^i R_t^{jm} + \hat{x}_t^j R_t^{im}\big) H_t^{lm}\,dI_t^l
- \big(\hat{x}_t^i R_t^{jm} + \hat{x}_t^j R_t^{im}\big) H_t^{lm}\,dI_t^l .
\]
Thus we obtain the final differential equation for the evolution of the conditional covariance matrix (notice that all of the stochastic terms cancel out). □

6.2.2 The Explicit Formula for the Kalman–Bucy Filter

In the following we use the notation $R^{1/2}$ to denote the square root of the symmetric positive semi-definite matrix $R$; that is, $R^{1/2}$ is the (unique) symmetric positive semi-definite matrix $A$ such that $A^2 = R$.

Theorem 6.15. The conditional distribution of $X_t$ given the observation σ-algebra is given by the explicit formula
\[
\pi_t(\varphi) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \varphi\big(\hat{x}_t + R_t^{1/2}\zeta\big) \exp\left(-\tfrac12\|\zeta\|^2\right) d\zeta
\]
for any $\varphi \in B(\mathbb{R}^d)$.

Proof. Immediate, as $\pi_t$ is a normal distribution with mean $\hat{x}_t$ and covariance matrix $R_t$. □

We remark that, in this case too, $\pi$ is finite-dimensional as it depends only on the $(d + d^2)$-dimensional process $(\hat{x}, R)$ (its mean and covariance matrix).

Corollary 6.16. The process $\rho_t$ satisfying the Zakai equation (3.43) is given by
\[
\rho_t(\varphi) = \hat{Z}_t\, \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \varphi\big(\hat{x}_t + R_t^{1/2}\zeta\big) \exp\left(-\tfrac12\|\zeta\|^2\right) d\zeta,
\]
where $\varphi \in B(\mathbb{R}^d)$ and
\[
\hat{Z}_t = \exp\left( \int_0^t (H\hat{x}_s + h)^\top dY_s - \frac12\int_0^t \|H\hat{x}_s + h\|^2\,ds \right).
\]

Proof. Immediate from Theorem 6.15 and the fact that $\rho_t(1)$ has the representation
\[
\rho_t(1) = \exp\left( \int_0^t (H\hat{x}_s + h)^\top dY_s - \frac12\int_0^t \|H\hat{x}_s + h\|^2\,ds \right),
\]
as proved in Exercise 3.37. □
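The formula of Theorem 6.15 can be evaluated numerically once the pair (mean, covariance) is known. The following is a minimal sketch for the scalar case $d = 1$ only, using a plain trapezoidal rule in the variable $\zeta$; the function name `gaussian_filter_expectation`, the truncation `zeta_max` and the sample values of the mean and variance are illustrative assumptions, not part of the text.

```python
import math

def gaussian_filter_expectation(phi, x_hat, R, n_nodes=4001, zeta_max=10.0):
    """Approximate pi_t(phi) for d = 1:
    (2*pi)^(-1/2) * integral of phi(x_hat + sqrt(R)*zeta) * exp(-zeta^2/2) d zeta,
    using the trapezoidal rule on [-zeta_max, zeta_max] (hypothetical helper)."""
    root = math.sqrt(R)                      # R^{1/2} in one dimension
    h = 2.0 * zeta_max / (n_nodes - 1)       # grid spacing
    total = 0.0
    for k in range(n_nodes):
        zeta = -zeta_max + k * h
        w = 0.5 if k in (0, n_nodes - 1) else 1.0   # trapezoid end-point weights
        total += w * phi(x_hat + root * zeta) * math.exp(-0.5 * zeta * zeta)
    return total * h / math.sqrt(2.0 * math.pi)

# Illustrative values for the conditional mean and variance (assumptions).
x_hat, R = 1.5, 0.4
mean = gaussian_filter_expectation(lambda x: x, x_hat, R)
second = gaussian_filter_expectation(lambda x: x * x, x_hat, R)
variance = second - mean ** 2
print(mean, variance)
```

Taking $\varphi(x) = x$ and $\varphi(x) = x^2$ should recover the mean and the second moment, which gives a quick consistency check of the formula; a multidimensional version would need a matrix square root and a different quadrature or sampling scheme.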
6.3 Solutions to Exercises

6.1
i. Suppose that $f(x) = ax + b$; then $P(x) = a + (ax + b)^2\sigma^{-2} + (h_1x + h_2)^2$, which is a second-order polynomial with leading coefficient $a^2/\sigma^2 + h_1^2 \ge 0$. The Lipschitz condition on $f$ is trivial.
ii. In this case $P(x) = \alpha^2 + (h_1x + h_2)^2$, which is a second-order polynomial with leading coefficient $h_1^2 \ge 0$. The case $f(x) = a\sigma\tanh(ax/\sigma)$ is obtained by taking $\alpha = a$ and $\beta = 1$. The derivative $f'(x)$ is bounded by $1/(4\beta)$, thus the function $f$ is Lipschitz and satisfies (6.2).
iii. Use the previous result with $\alpha = a$, $\beta = e^{2b}$.

6.2 Lemma 3.9 implies that it is sufficient to show that
\[
E\left[ \int_0^t \big( f(X_s)^2\sigma^{-2} + h^2(X_s) \big)\,ds \right] < \infty.
\]
From the Lipschitz condition (6.2) on $f$, the fact that $\sigma$ is constant, and that $X_0 = x_0$ is constant and thus trivially has bounded second moment, it follows from Exercise 3.11 that for $0 \le t \le T$, $E X_t^2 < G_T < \infty$. It also follows from Exercise 3.3 that $f$ has a linear growth bound $|f(x)| \le \kappa(1 + \|x\|)$, therefore
\[
E\left[ \int_0^t \left( \frac{f(X_s)^2}{\sigma^2} + h(X_s)^2 \right) ds \right]
\le E\left[ \int_0^t \left( \frac{\kappa^2}{\sigma^2}(1 + |X_s|)^2 + (h_1X_s + h_2)^2 \right) ds \right]
\]
\[
\le 2\left( h_1^2 + \frac{\kappa^2}{\sigma^2} \right) \int_0^t E|X_s|^2\,ds + 2\left( h_2^2 + \frac{\kappa^2}{\sigma^2} \right) t
\le 2\left( h_1^2 + \frac{\kappa^2}{\sigma^2} \right) tG_T + 2\left( h_2^2 + \frac{\kappa^2}{\sigma^2} \right) t < \infty.
\]

6.3 By Girsanov's theorem, under $\tilde{P}$ the process with components
\[
X_t^1 = W_t - \left\langle W, -\int_0^\cdot \frac{f(X_s)}{\sigma}\,dV_s - \int_0^\cdot h(X_s)\,dW_s \right\rangle_t = W_t + \int_0^t h(X_s)\,ds
\]
and
\[
X_t^2 = V_t - \left\langle V, -\int_0^\cdot \frac{f(X_s)}{\sigma}\,dV_s - \int_0^\cdot h(X_s)\,dW_s \right\rangle_t = V_t + \int_0^t \frac{f(X_s)}{\sigma}\,ds
\]
is a two-dimensional Brownian motion. Therefore the law of $(X_t^1, X_t^2) = (Y_t, \hat{V}_t)$ is bivariate normal, so to show the components are independent it is sufficient to consider the covariation
\[
\langle \hat{V}, Y \rangle_t = \langle V, W \rangle_t = 0, \qquad \forall t \in [0, \infty),
\]
from which we may conclude that $Y$ is independent of $\hat{V}$, and since $X_t = X_0 + \sigma\hat{V}_t$, it follows that under $\hat{P}$ the processes $Y$ and $X$ are independent.
6.4 Follow the same argument as in the proof of Proposition 3.16.

6.5 Consider $t$ as fixed; it is then sufficient to show that uniformly in $n$,
\[
\int_0^t \sinh(sp\sigma)\,y_s^n\,ds
= \sum_{k=0}^{n-1} \frac{Y_{(k+1)t/n} - Y_{kt/n}}{t/n} \int_{kt/n}^{(k+1)t/n} \sinh(sp\sigma)\,ds
\]
\[
= \sum_{k=0}^{n-1} \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n} \int_{kt/n}^{(k+1)t/n} dY_s
= \int_0^t \sum_{k=0}^{n-1} \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n}\, 1_{(kt/n,(k+1)t/n]}(s)\,dY_s .
\]
Thus by Itô's isometry, since $Y$ is a Brownian motion under $\hat{P}$, it is sufficient to show
\[
E\left[ \int_0^t \left( \sum_{k=0}^{n-1} \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n}\, 1_{(kt/n,(k+1)t/n]}(s) - \sinh(sp\sigma) \right)^2 ds \right] \to 0.
\]
Using the mean value theorem, for each $k = 0, \dots, n-1$ there exists $\xi \in [kt/n, (k+1)t/n]$ such that
\[
\sinh(\xi p\sigma) = \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n};
\]
therefore, since $\sinh(x)$ is monotonic increasing for $x > 0$,
\[
E\left[ \sum_{k=0}^{n-1} \int_{kt/n}^{(k+1)t/n} \left( \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n} - \sinh(sp\sigma) \right)^2 ds \right]
\le \sum_{k=0}^{n-1} \frac{t}{n} \big( \sinh((k+1)p\sigma t/n) - \sinh(kp\sigma t/n) \big)^2
\]
\[
\le \sum_{k=0}^{n-1} \frac{t}{n} \left( \frac{tp\sigma}{n} \right)^2 \cosh^2((k+1)p\sigma t/n)
\le t\cosh^2(tp\sigma)\,\frac{(tp\sigma)^2}{n^2},
\]
where we use the bound, for $a, x > 0$, $\sinh(a + x) - \sinh(a) \le \sinh'(a + x)\,x = \cosh(a + x)\,x$.
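The $L^2$ bound above can be checked numerically. The sketch below is illustrative only: it approximates the integral of the squared difference between the piecewise-constant integrand and $\sinh(sp\sigma)$ by a midpoint rule and compares it with $t\cosh^2(tp\sigma)(tp\sigma)^2/n^2$. The names `l2_error` and `bound`, the sub-grid size `m` and the parameter values are assumptions made for the example; `p_sigma` stands for the product $p\sigma$.

```python
import math

def l2_error(t, p_sigma, n, m=200):
    """Midpoint-rule approximation of
    int_0^t (g_n(s) - sinh(s * p_sigma))^2 ds,
    where g_n equals (cosh((k+1)c) - cosh(kc)) / c on (kt/n, (k+1)t/n], c = p_sigma*t/n."""
    dt = t / n
    err = 0.0
    for k in range(n):
        c_k = (math.cosh((k + 1) * p_sigma * dt) - math.cosh(k * p_sigma * dt)) / (p_sigma * dt)
        h = dt / m
        for j in range(m):
            s = k * dt + (j + 0.5) * h          # midpoint of the j-th sub-cell
            err += (c_k - math.sinh(s * p_sigma)) ** 2 * h
    return err

def bound(t, p_sigma, n):
    """Right-hand side t * cosh^2(t p sigma) * (t p sigma)^2 / n^2 of the estimate."""
    return t * math.cosh(t * p_sigma) ** 2 * (t * p_sigma) ** 2 / n ** 2

t, p_sigma = 1.0, 0.8                           # illustrative values
errors = {n: l2_error(t, p_sigma, n) for n in (4, 16, 64)}
for n, e in errors.items():
    print(n, e, bound(t, p_sigma, n))
```

The computed error should sit below the bound for each $n$ and decay at the rate $n^{-2}$, which is the convergence the argument requires.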
Thus this tends to zero as $n \to \infty$, which establishes the required convergence. For the uniform bound, it is sufficient to show that for fixed $t$,
\[
E\left[ \left( \sum_{k=0}^{n-1} \int_{kt/n}^{(k+1)t/n} \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n}\,dY_s \right)^2 \right] \tag{6.37}
\]
is uniformly bounded in $n$. We can then use the fact that $E|Z| \le \sqrt{EZ^2}$ to see that the modulus of the integral is bounded in the $L^1$ norm and hence in probability. The dependence on $\omega$ in this bound arises solely from the process $Y$; thus considered as a functional of $Y$, there is a bound that is uniform in $n$. To complete the proof we establish a bound on (6.37) that is uniform in $n$, using the Itô isometry and the mean value theorem:
\[
E\left[ \left( \sum_{k=0}^{n-1} \int_{kt/n}^{(k+1)t/n} \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n}\,dY_s \right)^2 \right]
\le \sum_{k=0}^{n-1} \int_{kt/n}^{(k+1)t/n} \left( \frac{\cosh((k+1)p\sigma t/n) - \cosh(kp\sigma t/n)}{p\sigma t/n} \right)^2 ds
\]
\[
\le t \left( \frac{n}{p\sigma t} \right)^2 \left( \frac{p\sigma t}{n} \right)^2 \sinh^2(tp\sigma) = t\sinh^2(tp\sigma).
\]

6.9 For the linear filter take $F(x) = ax^2/2 + bx$; computing the normalising constant involves computing, for $B > 0$,
\[
\int_{-\infty}^{\infty} \exp(-Bx^2 + Ax)\,dx = e^{A^2/(4B)} \int_{-\infty}^{\infty} \exp\left( -\left( \sqrt{B}\,x - \frac{A}{2\sqrt{B}} \right)^2 \right) dx = e^{A^2/(4B)}\,\sqrt{\pi}/\sqrt{B}. \tag{6.38}
\]
In the case of the linear filter the coefficients are $p = \sqrt{a^2/\sigma^2 + h_1^2}$, $q = ab/\sigma^2 + h_1h_2$ and $r = a + b^2/\sigma^2 + h_2^2$. Thus from the equation for the normalising constant (6.16),
\[
A_t = b/\sigma^2 + h_1\Psi_t + \frac{q + p^2x_0}{p\sigma\sinh(tp\sigma)} - \frac{q}{p\sigma}\coth(tp\sigma),
\]
where
\[
\Psi_t = \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)}\,dY_s
\]
and
\[
B_t = -\frac{a}{2\sigma^2} + \frac{p}{2\sigma}\coth(tp\sigma).
\]
Since $\coth(x) > 1$ for $x > 0$ and $p \ge a/\sigma$, it follows that $B_t > 0$ as required. Using the result (6.38) we see that the normalised conditional distribution is given by (6.15),
\[
\pi_t(\varphi) = \frac{\sqrt{B_t}}{\sqrt{\pi}} \int_{-\infty}^{\infty} \varphi(x) \exp\left( -\frac12 \left( \frac{x - A_t/(2B_t)}{1/\sqrt{2B_t}} \right)^2 \right) dx,
\]
which corresponds to a Gaussian distribution with mean $\hat{x}_t = A_t/(2B_t)$ and variance $R_t = 1/(2B_t)$. Differentiating,
\[
\frac{dR_t}{dt} = \frac{p^2}{4\sinh^2(tp\sigma)B_t^2};
\]
thus with the aid of the identity $\coth^2(x) - 1 = 1/\sinh^2(x)$, it is easy to check that
\[
\frac{dR_t}{dt} = \sigma^2 + 2aR_t - R_t^2h_1^2,
\]
which is the one-dimensional form of the Kalman filter covariance equation (6.28). In one dimension the Kalman filter equation for the conditional mean is
\[
d\hat{x}_t = (a\hat{x}_t + b)\,dt + R_th_1\,dY_t - R_th_1(h_1\hat{x}_t + h_2)\,dt;
\]
thus to verify that the mean $A_tR_t$ is a solution of this SDE we compute (writing $c_t = \coth(tp\sigma)$ for brevity)
\[
d(A_tR_t) = \frac{A_tR_t^2p^2}{\sinh^2(tp\sigma)}\,dt + R_th_1\,dY_t - R_tc_t\,p\sigma(A_t - b/\sigma^2)\,dt - qR_t\,dt
\]
\[
= R_th_1\,dY_t + \left[ (A_tR_t)\big( R_tp^2c_t^2 - p\sigma c_t - R_tp^2 \big) + \frac{pR_tb}{\sigma}c_t - \frac{ab}{\sigma^2}R_t - h_1h_2R_t \right] dt
\]
\[
= R_th_1\,dY_t + \left[ -h_1h_2R_t + b + (A_tR_t)\big( R_tp^2c_t^2 - p\sigma c_t - R_tp^2 \big) \right] dt
\]
\[
= R_th_1\,dY_t + \left[ -h_1h_2R_t + b - R_th_1^2(R_tA_t) + (A_tR_t)R_t\left( p^2c_t^2 - \frac{p\sigma}{R_t}c_t - \frac{a^2}{\sigma^2} \right) \right] dt
\]
\[
= R_th_1\,dY_t + \left[ -h_1h_2R_t + b - R_th_1^2(A_tR_t) + (A_tR_t)R_t\left( p^2c_t^2 - p\sigma c_t\left( -\frac{a}{\sigma^2} + \frac{p}{\sigma}c_t \right) - \frac{a^2}{\sigma^2} \right) \right] dt
\]
\[
= R_th_1\,dY_t - R_th_1(h_1A_tR_t + h_2)\,dt + \big( (A_tR_t)a + b \big)\,dt.
\]
Therefore the solution computed explicitly solves the SDEs for the one-dimensional Kalman filter.
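The claim that $R_t = 1/(2B_t)$ solves the scalar Riccati equation can also be checked numerically. The sketch below is illustrative: it compares a central finite difference of the closed-form variance with the right-hand side $\sigma^2 + 2aR_t - h_1^2R_t^2$, and compares the large-time value with $\sigma^2/(p\sigma - a)$. The function names and the parameter values for $a$, $\sigma$, $h_1$ are assumptions chosen for the example.

```python
import math

def R_closed(t, a, sigma, h1):
    """Closed-form variance R_t = 1 / (2 B_t), with
    B_t = -a / (2 sigma^2) + p coth(t p sigma) / (2 sigma), p = sqrt(a^2/sigma^2 + h1^2)."""
    p = math.sqrt(a * a / sigma ** 2 + h1 * h1)
    coth = 1.0 / math.tanh(t * p * sigma)
    B = -a / (2.0 * sigma ** 2) + p * coth / (2.0 * sigma)
    return 1.0 / (2.0 * B)

def riccati_rhs(R, a, sigma, h1):
    """Right-hand side of dR/dt = sigma^2 + 2 a R - h1^2 R^2."""
    return sigma ** 2 + 2.0 * a * R - (h1 * R) ** 2

a, sigma, h1 = -0.5, 1.2, 0.8                   # illustrative parameters
eps = 1e-5
residuals = []
for t in (0.1, 1.0, 5.0):
    dR = (R_closed(t + eps, a, sigma, h1) - R_closed(t - eps, a, sigma, h1)) / (2.0 * eps)
    residuals.append(abs(dR - riccati_rhs(R_closed(t, a, sigma, h1), a, sigma, h1)))
print(residuals)

p = math.sqrt(a * a / sigma ** 2 + h1 * h1)
R_limit = sigma ** 2 / (p * sigma - a)          # stationary variance
print(R_closed(50.0, a, sigma, h1), R_limit)
```

The residuals should be at the level of the finite-difference error, and the value at a large time should agree with the stationary variance.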
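In the limit as $t \to \infty$, $B_t \to -a/(2\sigma^2) + p/(2\sigma)$ and $A_t \simeq b/\sigma^2 + h_1\Psi_t - q/(p\sigma)$, and thus the law of the conditional distribution asymptotically for large $t$ is given by
\[
N\left( \frac{h_1\Psi_t\sigma^2 + b - q\sigma/p}{p\sigma - a},\; \frac{\sigma^2}{p\sigma - a} \right).
\]
For the second Beneš filter, from the solution to Exercise 6.1, $p = h_1$, $q = h_1h_2$ and $r = h_2^2 + \alpha^2$, so
\[
Q_t(x) = \left( h_1\Psi_t + \frac{h_2 + h_1x_0}{\sigma\sinh(tp\sigma)} - \frac{h_2}{\sigma}\coth(tp\sigma) \right) x - \frac{h_1}{2\sigma}\coth(tp\sigma)\,x^2 .
\]
In the general case we can take as antiderivative to $f$
\[
F(x) = \frac{\sigma^2}{\alpha}\log\left( e^{2\alpha x/\sigma} + 1/\beta \right) - \sigma x.
\]
However, there does not seem to be an easy way to evaluate this integral in general, so consider the specific case where $\beta = 1$ and $\alpha = a$, $F(x) = \sigma^2\log(\cosh(ax/\sigma))$; thus from (6.16) the normalising constant is
\[
c_t = \int_{-\infty}^{\infty} \cosh\left( \frac{ax}{\sigma} \right) \exp\left( \left( h_1\Psi_t + \frac{h_2 + h_1x_0}{\sigma\sinh(tp\sigma)} - \frac{h_2}{\sigma}\coth(tp\sigma) \right) x - \frac{h_1}{2\sigma}\coth(tp\sigma)\,x^2 \right) dx,
\]
which can be evaluated using two applications of the result (6.38), with
\[
B_t \triangleq \frac{h_1}{2\sigma}\coth(tp\sigma), \qquad
A_t^{\pm} \triangleq \pm\frac{a}{\sigma} + h_1\Psi_t + \frac{h_2 + h_1x_0}{\sigma\sinh(tp\sigma)} - \frac{h_2}{\sigma}\coth(tp\sigma).
\]
Thus the normalising constant is given by
\[
c_t = \frac{\sqrt{\pi}}{2\sqrt{B_t}} \left( e^{(A_t^+)^2/(4B_t)} + e^{(A_t^-)^2/(4B_t)} \right).
\]
Therefore the normalised conditional distribution is given by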
\[
\pi_t(\varphi) = \frac{\sqrt{B_t}}{\sqrt{\pi}}\, \frac{1}{e^{(A_t^+)^2/(4B_t)} + e^{(A_t^-)^2/(4B_t)}} \int_{-\infty}^{\infty} \varphi(x) \exp(-B_tx^2) \big( \exp(A_t^+x) + \exp(A_t^-x) \big)\,dx
\]
\[
= \frac{\sqrt{B_t}}{\sqrt{\pi}}\, \frac{1}{e^{(A_t^+)^2/(4B_t)} + e^{(A_t^-)^2/(4B_t)}}
\left[ e^{(A_t^+)^2/(4B_t)} \int_{-\infty}^{\infty} \varphi(x) \exp\left( -\frac12 \left( \frac{x - A_t^+/(2B_t)}{1/\sqrt{2B_t}} \right)^2 \right) dx
+ e^{(A_t^-)^2/(4B_t)} \int_{-\infty}^{\infty} \varphi(x) \exp\left( -\frac12 \left( \frac{x - A_t^-/(2B_t)}{1/\sqrt{2B_t}} \right)^2 \right) dx \right].
\]
Thus the normalised conditional distribution is the weighted mixture of two normal distributions, with weight
\[
w^{\pm} = \frac{\exp\big( (A_t^{\pm})^2/(4B_t) \big)}{\exp\big( (A_t^+)^2/(4B_t) \big) + \exp\big( (A_t^-)^2/(4B_t) \big)}
\]
on a $N(A_t^{\pm}/(2B_t), 1/(2B_t))$ distributed random variable. In the limit as $t \to \infty$, $B_t \to h_1/(2\sigma)$ and $A_t^{\pm} \simeq \pm a/\sigma + h_1\Psi_t - h_2/\sigma$, and the asymptotic expressions for the weights become
\[
w^{\pm} = \frac{\exp\big( \pm a(h_1\Psi_t - h_2/\sigma)/h_1 \big)}{2\cosh\big( a(h_1\Psi_t - h_2/\sigma)/h_1 \big)}
\]
and the distributions are $N(\pm a/h_1 + \sigma\Psi_t - h_2/h_1,\; \sigma/h_1)$.

6.11
i. Setting
\[
C_t \triangleq T_0 + \int_0^t \Phi_s^{-1}l_s\,ds + \int_0^t \Phi_s^{-1}z_s\,dU_s, \tag{6.39}
\]
and $A_t \triangleq \Phi_tC_t$, where $\Phi_t$ is given by (6.21), it follows by integration by parts that
\[
dA_t = d\Phi_t \left( T_0 + \int_0^t \Phi_s^{-1}l_s\,ds + \int_0^t \Phi_s^{-1}z_s\,dU_s \right) + \Phi_t \left( \Phi_t^{-1}l_t\,dt + \Phi_t^{-1}z_t\,dU_t \right)
= L_tA_t\,dt + l_t\,dt + z_t\,dU_t,
\]
which is the SDE for $T_t$. As $\Phi_0 = I_{d+m}$, it follows that $A_0 = T_0$. Thus $T_t$ has the representation (6.20).
ii. In this part we continue to use the notation for the process $C_t$ introduced above. It is clearly sufficient to show that $(T_{t_1}, \dots, T_{t_{n-1}}, T_t)$ has a multivariate-normal distribution, since $T = (X, Y)$. Note that the process
$\Phi_t$ is a deterministic matrix-valued process, thus if for fixed $t$, $C_t$ has a multivariate normal distribution then so does $\Phi_tC_t$. Since $X_0$ has a multivariate normal distribution and $Y_0 = 0$, $T_0$ has a multivariate normal distribution. From the SDE (6.39) it follows that $C_{t_1}, C_{t_2} - C_{t_1}, \dots, C_t - C_{t_{n-1}}$ are independent random variables, each of which has a multivariate-normal distribution. The result now follows since
\[
\begin{aligned}
T_{t_1} &= \Phi_{t_1}C_{t_1}, \\
T_{t_2} &= \Phi_{t_2}\big( C_{t_1} + (C_{t_2} - C_{t_1}) \big), \\
&\;\;\vdots \\
T_t &= \Phi_t\big( C_{t_1} + \cdots + (C_{t_{n-1}} - C_{t_{n-2}}) + (C_t - C_{t_{n-1}}) \big).
\end{aligned}
\]
iii. It follows from (ii) and the fact that the image under a linear map of a multivariate-normal distribution is also multivariate-normal, that for any $n$ and fixed times $0 \le t_1 \le \cdots \le t_{n-1} \le t$,
\[
\left( X_t,\; \sum_{i=0}^{n-2} K_{t_i}\big( Y_{t_{i+1}} - Y_{t_i} \big) + K_{t_{n-1}}\big( Y_t - Y_{t_{n-1}} \big) \right)
\]
has a multivariate-normal distribution. By the usual Itô isometry argument, as the mesh of the partition tends to zero this term converges in $L^2$, and thus in probability, to
\[
\left( X_t,\; \int_0^t K_s\,dY_s \right).
\]
By a standard result on weak convergence (e.g. Theorem 4.3 of [19]) this convergence in probability implies that the sequence converges weakly; consequently the characteristic functions must also converge. As each element of the sequence is multivariate normal it follows that the limit must be multivariate normal.

6.13
i. The first part follows using the SDE for $X$ and Itô's formula, using the local boundedness of $f_s$, $F_s$, $\sigma_s$. In the case $p = 1$, local boundedness of $\sigma_s$ implies that the stochastic integral is a martingale; thus using the notation
\[
\|F\|_{[0,t]} \triangleq \sup_{0\le s\le t}\, \max_{i,j=1,\dots,d} |F_s^{ij}| < \infty, \qquad
\|f\|_{[0,t]} \triangleq \sup_{0\le s\le t}\, \max_{i=1,\dots,d} |f_s^i| < \infty,
\]
we can obtain the following bound:
\[
E\|X_t\| \le E\|X_0\| + E\left\| \int_0^t (F_sX_s + f_s)\,ds \right\|
\le E\|X_0\| + td\|f\|_{[0,t]} + d\|F\|_{[0,t]} \int_0^t E\|X_s\|\,ds.
\]
Thus from Corollary A.40 to Gronwall's lemma,
\[
E\|X_t\| \le \big( E\|X_0\| + td\|f\|_{[0,t]} \big) \exp\big( td\|F\|_{[0,t]} \big).
\]
Similarly for $p = 2$, use $f(x) = x^\top x$:
\[
d\|X_t\|^2 = 2X_t^\top(F_tX_t + f_t)\,dt + 2X_t^\top\sigma\,dV_t + \mathrm{tr}(\sigma^\top\sigma)\,dt.
\]
Let $T_n$ be a reducing sequence for the stochastic integral, which is a local martingale (see Exercise 3.10 for more details). Then
\[
E\|X_{t\wedge T_n}\|^2 = E\|X_0\|^2 + E\left[ \int_0^{t\wedge T_n} \big( 2X_s^\top(F_sX_s + f_s) + \mathrm{tr}(\sigma^\top\sigma) \big)\,ds \right]
\]
\[
\le E\|X_0\|^2 + 2d^2\|F\|_{[0,t]} \int_0^t E\big[ \|X_s\|^2 \big]\,ds + dt\|f\|_{[0,t]} \sup_{0\le s\le t} E\|X_s\| + td\|\sigma\|_{[0,t]}^2 .
\]
Using the first moment bound, Gronwall's inequality yields a bound independent of $n$; thus as $n \to \infty$ Fatou's lemma implies that
\[
\sup_{0\le s\le t} E\|X_s\|^2 < \infty.
\]
We can proceed by induction to the general case for the $p$th moment. Apply Itô's formula with $f(x) = x^{p/2}$ to $\|X_t\|^2$, for $p \ge 3$, to obtain the $p$th moment bound; thus
\[
d\|X_t\|^p = \frac{p}{2}\|X_t\|^{p-2} \Big( 2X_t^\top(F_tX_t + f_t)\,dt + \mathrm{tr}(\sigma^\top\sigma)\,dt + 2X_t^\top\sigma\,dV_t \Big) + \frac{p(p-2)}{2}\|X_t\|^{p-4}\big( X_t^\top\sigma\sigma^\top X_t \big)\,dt.
\]
The stochastic integral is a local martingale and so a reducing sequence $T_n$ can be found. The other terms involve moments of order $p$, $p - 1$ and $p - 2$, so the result follows as in the case above from the inductive hypotheses, Gronwall's lemma followed by Fatou's lemma, and the fact that all moments of the initial $X_0$ are finite since it is normally distributed.
ii. For any $s \in [0, \infty)$,
\[
E\big[ (\pi_s(|\varphi_i|))^p \big] = E\big[ (E[\,|\varphi_i(X_s)| \mid \mathcal{Y}_s])^p \big] \le E\big[ E[\,|\varphi_i(X_s)|^p \mid \mathcal{Y}_s] \big] = E\big[ |\varphi_i(X_s)|^p \big],
\]
where the inequality follows from the conditional form of Jensen's inequality. Therefore from part (i),
\[
\sup_{s\in[0,t]} E\big[ (\pi_s(|\varphi_i|))^p \big] \le \sup_{s\in[0,t]} E\big[ |X_s^i|^p \big] < \infty.
\]
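For the product term,
\[
E\big[ (\pi_s(|\varphi_{ij}|))^p \big] = E\Big[ \big( E[\,|X_s^iX_s^j| \mid \mathcal{Y}_s] \big)^p \Big] \le E\big[ E[\,|X_s^i|^p|X_s^j|^p \mid \mathcal{Y}_s] \big] = E\big[ |X_s^i|^p|X_s^j|^p \big]
\le \sqrt{ E\big[ |X_s^i|^{2p} \big]\, E\big[ |X_s^j|^{2p} \big] } < \infty.
\]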
7 The Density of the Conditional Distribution of the Signal
The question which we consider in this chapter is whether $\pi_t$, the conditional distribution of $X_t$ given the observation σ-algebra $\mathcal{Y}_t$, has a density with respect to a reference measure, in particular with respect to Lebesgue measure. We prove that, under fairly mild conditions, the unnormalised conditional distribution $\rho_t$, which is the unique solution of the Zakai equation (3.43), has a square integrable density with respect to Lebesgue measure. This automatically implies that $\pi_t$ has the same property. There are various approaches to answer this question. The approach presented here is that adopted by Kurtz and Xiong in [174].

In the second part of the chapter we discuss the smoothness properties (i.e. the differentiability) of the density of $\rho$. Finally we show the existence of the dual of the solution of the Zakai equation (see (7.30) below). The dual of $\rho$ plays an important rôle in establishing the rates of convergence of particle approximations to $\pi$ and $\rho$, which are discussed in more detail in Chapter 9.

In the following, we take the signal $X$ to be the solution of the stochastic differential equation (3.9); that is, $X = (X^i)_{i=1}^d$ is the solution of the stochastic differential equation
\[
dX_t = f(X_t)\,dt + \sigma(X_t)\,dV_t, \tag{7.1}
\]
where $f : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d\times p}$ are bounded and globally Lipschitz (i.e. they satisfy the conditions (3.10)) and $V = (V^j)_{j=1}^p$ is a $p$-dimensional Brownian motion. The observation process is the solution of the evolution equation (3.5). That is, $Y$ is an $m$-dimensional stochastic process which satisfies
\[
dY_t = h(X_t)\,dt + dW_t,
\]
where $h = (h_i)_{i=1}^m : \mathbb{R}^d \to \mathbb{R}^m$ is a bounded measurable function and $W$ is a standard $m$-dimensional Brownian motion which is independent of $X$.

In the following we make use of an embedding theorem which we state below. In order to state this theorem, we need a few notions related to Sobolev spaces. Further details on this topic can be found, for example, in Adams [1].
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, © Springer Science+Business Media, LLC 2009. DOI 10.1007/978-0-387-76896-0_7
7.1 An Embedding Theorem

Let $\alpha = (\alpha^1, \dots, \alpha^d) \in \mathbb{N}^d$ be an arbitrary multi-index. Given two functions $f$ and $g \in L^p(\mathbb{R}^d)$, we say that $\partial^\alpha f = g$ in the weak sense if for all $\psi \in C_0^\infty(\mathbb{R}^d)$ we have
\[
\int_{\mathbb{R}^d} f(x)\,\partial^\alpha\psi(x)\,dx = (-1)^{|\alpha|} \int_{\mathbb{R}^d} g(x)\psi(x)\,dx. \tag{7.2}
\]
We immediately see that if the partial derivative of a function exists in the conventional sense and is continuous up to order $|\alpha|$, integration by parts will yield (7.2). The converse is not true; to see this one can consider, for example, the function $\exp(i/|x|^n)$.

Let $k$ be a non-negative integer. The Sobolev space, denoted $W_k^p(\mathbb{R}^d)$, is the space of all functions $f \in L^p(\mathbb{R}^d)$ such that the partial derivatives $\partial^\alpha f$ exist in the weak sense and are in $L^p(\mathbb{R}^d)$ whenever $|\alpha| \le k$, where $\alpha$ is a multi-index. We endow $W_k^p(\mathbb{R}^d)$ with the norm
\[
\|f\|_{k,p} = \left( \sum_{|\alpha|\le k} \|\partial^\alpha f\|_p^p \right)^{1/p}, \tag{7.3}
\]
where $\partial^0 f = f$ and the norms on the right are the usual norms in $L^p(\mathbb{R}^d)$. Then $W_k^p(\mathbb{R}^d)$ is complete with respect to the norm defined by (7.3); hence it is a Banach space. In the following, we make use, without proof, of the following Sobolev-type embedding theorem (for a proof see Adams [1], Saloff-Coste [252], or Stein [256]).

Theorem 7.1. If $k > d/p$ then there exists a modification of $f \in W_k^p(\mathbb{R}^d)$ on a set of zero Lebesgue measure so that the resulting function is continuous.

In the following we work mostly with the space $W_k^2(\mathbb{R}^d)$. This space is a Hilbert space with the inner product
\[
\langle f, g \rangle_{W_k^2(\mathbb{R}^d)} = \sum_{|\alpha|\le k} \langle \partial^\alpha f, \partial^\alpha g \rangle,
\]
where $\langle\cdot,\cdot\rangle$ is the usual inner product on $L^2(\mathbb{R}^d)$,
\[
\langle f, g \rangle = \int_{\mathbb{R}^d} f(x)g(x)\,dx.
\]
Exercise 7.2. Let $\{\varphi_i\}_{i>0}$ be an orthonormal basis of $L^2(\mathbb{R}^d)$ with the property that $\varphi_i \in C_b(\mathbb{R}^d)$ for all $i > 0$. Let $\mu \in \mathcal{M}(\mathbb{R}^d)$ be a finite measure. Show that if
\[
\sum_{i=1}^{\infty} \mu(\varphi_i)^2 < \infty,
\]
then $\mu$ is absolutely continuous with respect to Lebesgue measure. Moreover if $g_\mu : \mathbb{R}^d \to \mathbb{R}$ is the density of $\mu$ with respect to Lebesgue measure then $g_\mu \in L^2(\mathbb{R}^d)$.
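The results below make use of the regularisation method. Let $\psi$ be the kernel for the heat equation ($\partial_tu = \tfrac12\sum_{i=1}^d \partial_i\partial_iu$), viz
\[
\psi_\varepsilon(x) \triangleq (2\pi\varepsilon)^{-d/2} \exp\big( -\|x\|^2/(2\varepsilon) \big),
\]
and define the convolution operator $T_\varepsilon : B(\mathbb{R}^d) \to B(\mathbb{R}^d)$,
\[
T_\varepsilon f(x) \triangleq \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)f(y)\,dy, \qquad x \in \mathbb{R}^d. \tag{7.4}
\]
Also define the corresponding operator on the space of finite measures $T_\varepsilon : \mathcal{M}(\mathbb{R}^d) \to \mathcal{M}(\mathbb{R}^d)$,
\[
T_\varepsilon\mu(f) \triangleq \mu(T_\varepsilon f) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)f(y)\,dy\,\mu(dx) = \int_{\mathbb{R}^d} f(y)\,T_\varepsilon\mu(y)\,dy,
\]
where $y \mapsto T_\varepsilon\mu(y)$ is the density of the measure $T_\varepsilon\mu$ with respect to Lebesgue measure, which by the above exists even if $\mu$ is not absolutely continuous with respect to Lebesgue measure; furthermore the density is given by
\[
T_\varepsilon\mu(y) = \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\,\mu(dx), \qquad y \in \mathbb{R}^d.
\]
In the following, we use the same notation $T_\varepsilon\mu$ for the regularised measure and its density.

Exercise 7.3. Let $\mu$ be a finite measure on $\mathbb{R}^d$ and $|\mu| \in \mathcal{M}(\mathbb{R}^d)$ be its total variation measure. Show that:
i. For any $\varepsilon > 0$ and $g \in L^2(\mathbb{R}^d)$, $\|T_\varepsilon g\|_2 \le \|g\|_2$.
ii. For any $\varepsilon > 0$, $T_\varepsilon\mu \in W_k^2(\mathbb{R}^d)$.
iii. For any $\varepsilon > 0$, $\|T_{2\varepsilon}\mu\|_2 \le \|T_\varepsilon|\mu|\|_2$.

Let $\mu$ be a finite signed measure on $\mathbb{R}^d$; then for $f \in C_b(\mathbb{R}^d)$, denote by $f\mu$ the finite signed measure on $\mathbb{R}^d$ which is absolutely continuous with respect to $\mu$ and whose density with respect to $\mu$ is $f$.

Exercise 7.4. Let $\mu$ be a finite (signed) measure on $\mathbb{R}^d$ and $|\mu| \in \mathcal{M}(\mathbb{R}^d)$ be its total variation measure. Also let $f \in C_b(\mathbb{R}^d)$ be a Lipschitz continuous and bounded function. Denote $k_f \triangleq \sup_{x\in\mathbb{R}^d}|f(x)|$ and let $k_f'$ be the Lipschitz constant of $f$. Show that:
i. For any $\varepsilon > 0$, $\|T_\varepsilon f\mu\|_2 \le k_f\|T_\varepsilon|\mu|\|_2$.
ii. For any $\varepsilon > 0$ and $i = 1, \dots, d$, we have $\big|\langle T_\varepsilon\mu, f\partial^iT_\varepsilon\mu \rangle\big| \le \tfrac12 k_f'\|T_\varepsilon|\mu|\|_2^2$.
iii. For any $\varepsilon > 0$ and $i = 1, \dots, d$, we have $\|f\partial^iT_\varepsilon\mu - \partial^iT_\varepsilon f\mu\|_2 \le 2^{d/2+2}k_f'\|T_{2\varepsilon}|\mu|\|_2$.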
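To make the regularisation concrete, the following sketch computes the density $T_\varepsilon\mu$ for a discrete measure on the real line ($d = 1$), the kind of measure produced by a particle approximation. It is illustrative only: the atoms, weights and function names are assumptions. It uses the semigroup identity $\int \psi_\varepsilon(x - y)\psi_\varepsilon(x - z)\,dx = \psi_{2\varepsilon}(y - z)$ to obtain $\|T_\varepsilon\mu\|_2^2$ in closed form, and compares the result with a direct midpoint quadrature.

```python
import math

def psi(eps, x):
    """Heat kernel psi_eps on R (d = 1)."""
    return math.exp(-x * x / (2.0 * eps)) / math.sqrt(2.0 * math.pi * eps)

def regularised_density(atoms, weights, eps, y):
    """T_eps mu(y) = sum_i w_i psi_eps(x_i - y) for mu = sum_i w_i delta_{x_i}."""
    return sum(w * psi(eps, x - y) for x, w in zip(atoms, weights))

def l2_norm_sq(atoms, weights, eps):
    """||T_eps mu||_2^2 = sum_{i,j} w_i w_j psi_{2 eps}(x_i - x_j)."""
    return sum(wi * wj * psi(2.0 * eps, xi - xj)
               for xi, wi in zip(atoms, weights)
               for xj, wj in zip(atoms, weights))

def l2_norm_sq_quadrature(atoms, weights, eps, lo=-20.0, hi=20.0, n=20000):
    """Midpoint-rule check of the closed form above."""
    h = (hi - lo) / n
    return h * sum(regularised_density(atoms, weights, eps, lo + (k + 0.5) * h) ** 2
                   for k in range(n))

atoms, weights, eps = [-1.0, 0.3, 2.0], [0.2, 0.5, 0.3], 0.1   # illustrative measure
closed = l2_norm_sq(atoms, weights, eps)
numeric = l2_norm_sq_quadrature(atoms, weights, eps)
coarser = l2_norm_sq(atoms, weights, 2.0 * eps)
print(closed, numeric, coarser)
```

With non-negative weights $\mu = |\mu|$, so the last two printed values illustrate part iii of Exercise 7.3: doubling the regularisation parameter does not increase the $L^2$ norm.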
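7.2 The Existence of the Density of ρt

In this section we prove that the unnormalised conditional distribution $\rho_t$ is absolutely continuous with respect to Lebesgue measure and its density is square integrable. We start with two technical lemmas. We require a set of functions $\{\varphi_i\}_{i\ge 1}$, where $\varphi_i \in C_b^2(\mathbb{R}^d)$, such that these functions form an orthonormal basis of the space $L^2(\mathbb{R}^d)$. There are many methods to construct such a basis. One of the most straightforward is to use wavelets (see, e.g. [224]). For any orthonormal basis of $L^2(\mathbb{R}^d)$ and arbitrary $f \in L^2(\mathbb{R}^d)$,
\[
f = \sum_{i=1}^{\infty} \langle f, \varphi_i \rangle\varphi_i,
\]
so
\[
\|f\|_2^2 = \sum_{i=1}^{\infty} \langle f, \varphi_i \rangle^2\|\varphi_i\|_2^2 = \sum_{i=1}^{\infty} \langle f, \varphi_i \rangle^2 .
\]
The function $\psi_\varepsilon(x)$ decays to zero as $\|x\| \to \infty$; therefore for $\varphi \in C_b^1(\mathbb{R}^d)$, using the symmetry of $\psi_\varepsilon(x - y)$ and integration by parts,
\[
\partial^iT_\varepsilon\varphi = \frac{\partial}{\partial x_i} \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\varphi(y)\,dy = \int_{\mathbb{R}^d} \frac{\partial}{\partial x_i}\psi_\varepsilon(x - y)\varphi(y)\,dy
= -\int_{\mathbb{R}^d} \frac{\partial}{\partial y_i}\psi_\varepsilon(x - y)\varphi(y)\,dy = \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\frac{\partial\varphi(y)}{\partial y_i}\,dy = T_\varepsilon(\partial^i\varphi).
\]

Lemma 7.5. Let $A$ be a generator of the form
\[
A\varphi = \sum_{i,j=1}^d a^{ij}\frac{\partial^2\varphi}{\partial x^i\partial x^j} + \sum_{i=1}^d f^i\frac{\partial\varphi}{\partial x^i}, \qquad \varphi \in \mathcal{D}(A) \subset C_b(\mathbb{R}^d), \tag{7.5}
\]
where the matrix $a$ is defined as in (3.12); that is, $a = \tfrac12\sigma\sigma^\top$. Let $\{\varphi_i\}_{i>0}$ be any orthonormal basis of $L^2(\mathbb{R}^d)$ with the property that $\varphi_i \in C_b^2(\mathbb{R}^d)$ for all $i > 0$. Then
\[
\sum_{k=1}^{\infty} \rho_s\big( A(T_\varepsilon\varphi_k) \big)^2 \le d\sum_{i=1}^d \big\| \partial^iT_\varepsilon(f^i\rho_s) \big\|_2^2 + d^2\sum_{i,j=1}^d \big\| \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \big\|_2^2 . \tag{7.6}
\]
In particular, if
\[
k_f = \max_{i=1,\dots,d}\, \sup_{x\in\mathbb{R}^d} |f^i(x)| < \infty, \qquad
k_a = \max_{i,j=1,\dots,d}\, \sup_{x\in\mathbb{R}^d} |a^{ij}(x)| < \infty,
\]
then there exists a constant $k = k(f, a, \varepsilon, d)$ such that
\[
\sum_{k=1}^{\infty} \rho_s\big( A(T_\varepsilon\varphi_k) \big)^2 \le k\,\|T_\varepsilon\rho_s\|_2^2 .
\]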
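Proof. For any $i = 1, \dots, d$ and $\varphi \in C_b^2(\mathbb{R}^d)$, integration by parts yields
\[
\rho_s(f^i\partial^iT_\varepsilon\varphi) = \rho_s(f^iT_\varepsilon\partial^i\varphi) = (f^i\rho_s)(T_\varepsilon\partial^i\varphi) = \langle \partial^i\varphi, T_\varepsilon(f^i\rho_s) \rangle = -\langle \varphi, \partial^iT_\varepsilon(f^i\rho_s) \rangle \tag{7.7}
\]
and
\[
\rho_s(a^{ij}\partial^i\partial^jT_\varepsilon\varphi) = \rho_s(a^{ij}T_\varepsilon\partial^i\partial^j\varphi) = (a^{ij}\rho_s)(T_\varepsilon\partial^i\partial^j\varphi) = \langle \partial^i\partial^j\varphi, T_\varepsilon(a^{ij}\rho_s) \rangle = \langle \varphi, \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \rangle. \tag{7.8}
\]
Thus using (7.7) and (7.8),
\[
\rho_s\big( A(T_\varepsilon\varphi_k) \big) = -\sum_{i=1}^d \big\langle \varphi_k, \partial^iT_\varepsilon(f^i\rho_s) \big\rangle + \sum_{i,j=1}^d \big\langle \varphi_k, \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \big\rangle, \tag{7.9}
\]
from which inequality (7.6) follows. Then
\[
\big| \partial^iT_\varepsilon(f^i\rho_s)(x) \big| \le \int_{\mathbb{R}^d} \frac{|x_i - y_i|}{\varepsilon}\,\psi_\varepsilon(x - y)\,|f^i(y)|\,\rho_s(dy)
\le 2^{d/2}k_f \int_{\mathbb{R}^d} \frac{|x_i - y_i|}{\varepsilon} \exp\left( -\frac{\|x - y\|^2}{4\varepsilon} \right) \psi_{2\varepsilon}(x - y)\,\rho_s(dy)
\le \frac{2^{d/2}k_f}{\sqrt{\varepsilon}}\,T_{2\varepsilon}\rho_s(x),
\]
where the last inequality follows as $\sup_{t\ge 0} t\exp(-t^2/4) = \sqrt{2/e} < 1$. For the second term in (7.9) we can construct a similar bound:
\[
\big| \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s)(x) \big| \le \int_{\mathbb{R}^d} \left| \frac{(x_i - y_i)(x_j - y_j)}{\varepsilon^2} - \frac{1_{i=j}}{\varepsilon} \right| \psi_\varepsilon(x - y)\,|a^{ij}(y)|\,\rho_s(dy)
\]
\[
\le 2^{d/2}k_a \int_{\mathbb{R}^d} \left( \frac{\|x - y\|^2}{\varepsilon^2} + \frac{1}{\varepsilon} \right) \exp\left( -\frac{\|x - y\|^2}{4\varepsilon} \right) \psi_{2\varepsilon}(x - y)\,\rho_s(dy)
\le 2^{d/2}k_a\left( \frac{2}{\varepsilon} + \frac{1}{\varepsilon} \right) T_{2\varepsilon}\rho_s(x),
\]
where we used the fact that $\sup_{t\ge 0} te^{-t/4} = 4/e < 2$ with $t = \|x - y\|^2/\varepsilon$. The lemma then follows using part iii of Exercise 7.3. □

Lemma 7.6. Let $k_\sigma'$ be the Lipschitz constant of the function $\sigma$, where $a = \tfrac12\sigma\sigma^\top$. Then we have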
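\[
\sum_{i,j=1}^d \big\langle T_\varepsilon\rho_s, \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \big\rangle + \frac12\sum_{k=1}^p \Big\| \sum_{i=1}^d \partial^iT_\varepsilon(\sigma^{ik}\rho_s) \Big\|_2^2
\le 2^{d/2+3}d^2p\,(k_\sigma')^2\,\|T_\varepsilon\rho_s\|_2^2 . \tag{7.10}
\]
Proof. First let us note that
\[
\big\langle T_\varepsilon\rho_s, \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \big\rangle
= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\,\rho_s(dy) \int_{\mathbb{R}^d} \frac{\partial^2}{\partial x_i\partial x_j}\psi_\varepsilon(x - z)\,a^{ij}(z)\,\rho_s(dz)\,dx
\]
\[
= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \Theta(y, z)\,a^{ij}(z)\,\rho_s(dy)\rho_s(dz)
= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \Theta(y, z)\,\frac{a^{ij}(z) + a^{ij}(y)}{2}\,\rho_s(dy)\rho_s(dz), \tag{7.11}
\]
where the last equality follows from the symmetry in $z$ and $y$, and where (for fixed $i$ and $j$)
\[
\Theta(y, z) \triangleq \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\,\frac{\partial^2}{\partial x_i\partial x_j}\psi_\varepsilon(x - z)\,dx
= \frac{\partial^2}{\partial z_i\partial z_j} \int_{\mathbb{R}^d} \psi_\varepsilon(x - z)\psi_\varepsilon(x - y)\,dx
= \frac{\partial^2}{\partial z_i\partial z_j}\psi_{2\varepsilon}(z - y)
= \left( \frac{(z_i - y_i)(z_j - y_j)}{4\varepsilon^2} - \frac{1_{\{i=j\}}}{2\varepsilon} \right) \psi_{2\varepsilon}(z - y).
\]
Then by integration by parts and the previous calculation we get that
\[
\big\langle \partial^iT_\varepsilon(\sigma^{ik}\rho_s), \partial^jT_\varepsilon(\sigma^{jk}\rho_s) \big\rangle
= -\big\langle T_\varepsilon(\sigma^{ik}\rho_s), \partial^i\partial^jT_\varepsilon(\sigma^{jk}\rho_s) \big\rangle
= -\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \Theta(y, z)\,\frac{\sigma^{ik}(y)\sigma^{jk}(z) + \sigma^{ik}(z)\sigma^{jk}(y)}{2}\,\rho_s(dy)\rho_s(dz). \tag{7.12}
\]
Combining (7.11) and (7.12), summing over all the indices, and using the fact that $a = \tfrac12\sigma\sigma^\top$, the left-hand side of (7.10) is equal to
\[
\frac14 \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \sum_{k=1}^p\sum_{i,j=1}^d \Theta(y, z)\,\big( \sigma^{ik}(y) - \sigma^{ik}(z) \big)\big( \sigma^{jk}(y) - \sigma^{jk}(z) \big)\,\rho_s(dy)\rho_s(dz),
\]
and hence using the Lipschitz property of $\sigma$,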
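\[
\sum_{i,j=1}^d \big\langle T_\varepsilon\rho_s, \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \big\rangle + \frac12\sum_{k=1}^p \Big\| \sum_{i=1}^d \partial^iT_\varepsilon(\sigma^{ik}\rho_s) \Big\|_2^2
\le \frac{d^2p}{4}(k_\sigma')^2 \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \|y - z\|^2\,|\Theta(y, z)|\,\rho_s(dy)\rho_s(dz).
\]
It then follows that
\[
\|y - z\|^2\,|\Theta(y, z)| \le 2^{d/2}\|y - z\|^2\,\psi_{4\varepsilon}(z - y) \left( \frac{\|z - y\|^2}{4\varepsilon^2} + \frac{1}{2\varepsilon} \right) \exp\left( -\frac{\|z - y\|^2}{8\varepsilon} \right)
\le 2^{d/2+5}\,\psi_{4\varepsilon}(z - y),
\]
where the final inequality follows by setting $x = \|y - z\|^2/(2\varepsilon)$ in the inequality
\[
\sup_{x\ge 0}\,(x^2 + x)\exp(-x/4) < 2^5 .
\]
Hence the left-hand side of (7.10) is bounded by
\[
2^{d/2+3}d^2p\,(k_\sigma')^2\,\|T_{2\varepsilon}\rho_s\|_2^2 \le 2^{d/2+3}d^2p\,(k_\sigma')^2\,\|T_\varepsilon\rho_s\|_2^2,
\]
the final inequality being a consequence of Exercise 7.3, part (iii). □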
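The proofs of Lemmas 7.5 and 7.6 rest on three elementary suprema. A brute-force check on a fine grid is sketched below; it is illustrative only, and the helper name `sup_on_grid`, the grid range and the grid resolution are assumptions.

```python
import math

def sup_on_grid(g, upper=60.0, n=600001):
    """Maximum of g over an equally spaced grid on [0, upper] (hypothetical helper)."""
    step = upper / (n - 1)
    return max(g(k * step) for k in range(n))

s1 = sup_on_grid(lambda t: t * math.exp(-t * t / 4.0))        # expected sqrt(2/e) < 1
s2 = sup_on_grid(lambda t: t * math.exp(-t / 4.0))            # expected 4/e < 2
s3 = sup_on_grid(lambda x: (x * x + x) * math.exp(-x / 4.0))  # claimed to be < 2^5

print(s1, math.sqrt(2.0 / math.e))
print(s2, 4.0 / math.e)
print(s3, 2 ** 5)
```

All three functions decay rapidly beyond the grid range used here, so the grid maxima should reproduce the suprema quoted in the proofs to within the grid resolution.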
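Proposition 7.7. If the function $h$ is uniformly bounded, then there exists a constant $c$ depending only on the functions $f$, $\sigma$ and $h$ such that for any $\varepsilon > 0$ and $t \ge 0$ we have
\[
\tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le \|T_\varepsilon\pi_0\|_2^2 + c\int_0^t \tilde{E}\big[ \|T_\varepsilon\rho_s\|_2^2 \big]\,ds.
\]
Proof. For any $t \ge 0$ and $\varphi_i$ an element of an orthonormal basis of $L^2(\mathbb{R}^d)$ chosen so that $\varphi_i \in C_b(\mathbb{R}^d)$, we have from the Zakai equation, using the fact that $\rho_t(T_\varepsilon\varphi_i) = T_\varepsilon\rho_t(\varphi_i)$,
\[
T_\varepsilon\rho_t(\varphi_i) = T_\varepsilon\pi_0(\varphi_i) + \int_0^t \rho_s\big( A(T_\varepsilon\varphi_i) \big)\,ds + \sum_{j=1}^m \int_0^t \rho_s(h^jT_\varepsilon\varphi_i)\,dY_s^j
\]
and by Itô's formula
\[
\big( T_\varepsilon\rho_t(\varphi_i) \big)^2 = \big( T_\varepsilon\pi_0(\varphi_i) \big)^2 + 2\int_0^t T_\varepsilon\rho_s(\varphi_i)\,\rho_s\big( A(T_\varepsilon\varphi_i) \big)\,ds
+ 2\sum_{j=1}^m \int_0^t T_\varepsilon\rho_s(\varphi_i)\,\rho_s(h^jT_\varepsilon\varphi_i)\,dY_s^j
+ \sum_{j=1}^m \int_0^t \big( \rho_s(h^jT_\varepsilon\varphi_i) \big)^2\,ds.
\]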
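The stochastic integral term in the above identity is a martingale, hence its expectation is 0. By taking expectation and using Fatou's lemma we get that
\[
\tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le \liminf_{n\to\infty} \tilde{E}\left[ \sum_{i=1}^n \big( T_\varepsilon\rho_t(\varphi_i) \big)^2 \right]
\le \|T_\varepsilon\pi_0\|_2^2 + \liminf_{n\to\infty} \sum_{i=1}^n \tilde{E}\left[ \int_0^t \Big( 2T_\varepsilon\rho_s(\varphi_i)\,\rho_s\big( A(T_\varepsilon\varphi_i) \big) + \sum_{j=1}^m \big( \rho_s(h^jT_\varepsilon\varphi_i) \big)^2 \Big)\,ds \right]. \tag{7.13}
\]
By applying the inequality $|ab| \le (a^2 + b^2)/2$,
\[
\sum_{i=1}^n \tilde{E}\left[ \int_0^t \big| T_\varepsilon\rho_s(\varphi_i)\,\rho_s\big( A(T_\varepsilon\varphi_i) \big) \big|\,ds \right]
\le \frac12\tilde{E}\left[ \int_0^t \sum_{i=1}^n \big( T_\varepsilon\rho_s(\varphi_i) \big)^2\,ds \right] + \frac12\tilde{E}\left[ \int_0^t \sum_{i=1}^n \big( \rho_s(A(T_\varepsilon\varphi_i)) \big)^2\,ds \right].
\]
Thus using the bound of Lemma 7.5, it follows that uniformly in $n \ge 0$,
\[
\sum_{i=1}^n \tilde{E}\left[ \int_0^t \big| T_\varepsilon\rho_s(\varphi_i)\,\rho_s\big( A(T_\varepsilon\varphi_i) \big) \big|\,ds \right] \le \frac{1 + k}{2} \int_0^t \tilde{E}\big[ \|T_\varepsilon\rho_s\|_2^2 \big]\,ds.
\]
For the second part of the last term on the right-hand side of (7.13), for any $n \ge 0$,
\[
\sum_{i=1}^n\sum_{j=1}^m \tilde{E}\left[ \int_0^t \big( \rho_s(h^jT_\varepsilon\varphi_i) \big)^2\,ds \right] \le mk_h^2 \int_0^t \tilde{E}\big[ \|T_\varepsilon\rho_s\|_2^2 \big]\,ds,
\]
where
\[
k_h \triangleq \max_{j=1,\dots,m}\, \sup_{x\in\mathbb{R}^d} |h^j(x)|.
\]
As a consequence, there exists a constant $\bar{k} = \bar{k}(f, a, h, \varepsilon, d, m)$ such that
\[
\tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le \|T_\varepsilon\pi_0\|_2^2 + \bar{k}\int_0^t \tilde{E}\big[ \|T_\varepsilon\rho_s\|_2^2 \big]\,ds;
\]
hence by Corollary A.40 to Gronwall's lemma
\[
\tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le \|T_\varepsilon\pi_0\|_2^2\,e^{\bar{k}t},
\]
thus
\[
\int_0^t \tilde{E}\big[ \|T_\varepsilon\rho_s\|_2^2 \big]\,ds \le \frac{\|T_\varepsilon\pi_0\|_2^2\,e^{\bar{k}t}}{\bar{k}} < \infty,
\]
where we used Exercise 7.3 part (ii) to see that $\|T_\varepsilon\pi_0\|_2^2 < \infty$. Thus as a consequence of the dominated convergence theorem, in (7.13) the limit can be exchanged with the integral and expectation (which is a double integral). From (7.9), using $\langle f, g \rangle = \sum_{i=1}^{\infty} \langle f, \varphi_i \rangle\langle g, \varphi_i \rangle$, we then get that
\[
\tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le \|T_\varepsilon\pi_0\|_2^2 - 2\sum_{i=1}^d \int_0^t \tilde{E}\big[ \big\langle T_\varepsilon\rho_s, \partial^iT_\varepsilon(f^i\rho_s) \big\rangle \big]\,ds
+ 2\sum_{i,j=1}^d \int_0^t \tilde{E}\big[ \big\langle T_\varepsilon\rho_s, \partial^i\partial^jT_\varepsilon(a^{ij}\rho_s) \big\rangle \big]\,ds
+ \sum_{j=1}^m \int_0^t \tilde{E}\big[ \|T_\varepsilon(h^j\rho_s)\|_2^2 \big]\,ds. \tag{7.14}
\]
From Exercise 7.4 parts (ii) and (iii), we obtain
\[
\big| \big\langle T_\varepsilon\rho_s, \partial^iT_\varepsilon(f^i\rho_s) \big\rangle \big| \le \big| \big\langle T_\varepsilon\rho_s, f^i\partial^iT_\varepsilon\rho_s \big\rangle \big| + \big| \big\langle T_\varepsilon\rho_s, \partial^iT_\varepsilon(f^i\rho_s) - f^i\partial^iT_\varepsilon\rho_s \big\rangle \big|
\le \frac12 k_f'\|T_\varepsilon\rho_s\|_2^2 + 2^{d/2+2}k_f'\|T_\varepsilon\rho_s\|_2\|T_{2\varepsilon}\rho_s\|_2 . \tag{7.15}
\]
Since the function $h$ is uniformly bounded, it follows that
\[
\|T_\varepsilon(h^j\rho_t)\|_2^2 \le k_h^2\|T_\varepsilon\rho_t\|_2^2, \qquad j = 1, \dots, m. \tag{7.16}
\]
The proposition follows now by bounding the terms on the right-hand side of (7.14) using (7.10) for the third term, (7.15) for the second term and (7.16) for the fourth term. □

Theorem 7.8. If $\pi_0$ is absolutely continuous with respect to Lebesgue measure with a density which is in $L^2(\mathbb{R}^d)$ and the sensor function $h$ is uniformly bounded, then almost surely $\rho_t$ has a density with respect to Lebesgue measure and this density is square integrable.

Proof. In view of Exercise 7.2, it is sufficient to show that
\[
\tilde{E}\left[ \sum_{i=1}^{\infty} \rho_t(\varphi_i)^2 \right] < \infty,
\]
where $\{\varphi_i\}_{i>0}$ is an orthonormal basis of $L^2(\mathbb{R}^d)$ with the property that $\varphi_i \in C_b(\mathbb{R}^d)$ for all $i > 0$. From Proposition 7.7, Corollary A.40 to Gronwall's lemma and Exercise 7.3 part (iii) we get that
\[
\sup_{\varepsilon>0} \tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le e^{ct}\|\pi_0\|_2^2 . \tag{7.17}
\]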
Hence, by Fatou's lemma,
\[
\tilde{E}\left[ \sum_{i=1}^{\infty} \big( \rho_t(\varphi_i) \big)^2 \right] = \tilde{E}\left[ \lim_{\varepsilon\to 0} \sum_{i=1}^{\infty} \big( T_\varepsilon\rho_t(\varphi_i) \big)^2 \right]
\le \liminf_{\varepsilon\to 0} \tilde{E}\big[ \|T_\varepsilon\rho_t\|_2^2 \big] \le e^{ct}\|\pi_0\|_2^2 < \infty,
\]
hence the result. □

Corollary 7.9. If $\pi_0$ is absolutely continuous with respect to Lebesgue measure with a density which is in $L^2(\mathbb{R}^d)$ and the sensor function $h$ is uniformly bounded, then almost surely $\pi_t$ has a density with respect to Lebesgue measure and this density is square integrable.

Proof. Immediate from Theorem 7.8 and the fact that $\pi_t$ is the normalised version of $\rho_t$. □
7.3 The Smoothness of the Density of ρt

So far we have proved that $\rho_t$ has a density in $L^2(\mathbb{R}^d)$. The above proof has the advantage that the conditions on the coefficients are fairly minimal. In particular, the diffusion matrix $a$ is not required to be strictly positive. From (7.17) we get that
\[
\sup_{\varepsilon>0} \tilde{E}\big[ \|T_\varepsilon\rho_t\|_2 \big] < \infty.
\]
Since, for example, the sequence $(\|T_{2^{-n}}\rho_t\|_2)_{n>0}$ is non-decreasing (see part (iii) of Exercise 7.3), by Fatou's lemma, this implies that
\[
\sup_{n>0} \|T_{2^{-n}}\rho_t\|_2 < \infty.
\]
This implies that $T_{2^{-n}}\rho_t$ belongs to a finite ball in $L^2(\mathbb{R}^d)$. But $L^2(\mathbb{R}^d)$, and in general any Sobolev space $W_k^p(\mathbb{R}^d)$ with $p \in (1, \infty)$, has the property that its balls are weakly sequentially compact (as Banach spaces, they are reflexive; see, for instance, Adams [1]). In particular, this implies that the sequence $T_{2^{-n}}\rho_t$ has a weakly convergent subsequence. So $\rho_t$, the (weak) limit of the convergent subsequence of $T_{2^{-n}}\rho_t$, must be in $L^2(\mathbb{R}^d)$ almost surely. Similarly, if we can prove the stronger result
\[
\sup_{\varepsilon>0} \tilde{E}\big[ \|T_\varepsilon\rho_t\|_{W_k^p(\mathbb{R}^d)} \big] < \infty, \tag{7.18}
\]
then, by the same argument, we can get that the density of $\rho_t$ belongs to $W_k^p(\mathbb{R}^d)$. Moreover by Theorem 7.1, if $k > d/p$ then the density of $\rho_t$ is continuous (more precisely it has a continuous modification with which we can
identify it) and bounded. Furthermore, if $k > d/p + n$, not just the density of $\rho_t$ but also all of its partial derivatives up to order $n$ are continuous and bounded.

To obtain (7.18) we require additional smoothness conditions imposed on the coefficients $f$, $\sigma$ and $h$, and we also need $\pi_0$ to have a density that belongs to $W_k^p(\mathbb{R}^d)$. We need to analyse the evolution equation not just of $T_\varepsilon\rho_t$ but also that of all of its partial derivatives up to the required order $k$. Unfortunately, the analysis becomes too involved to be covered here. The following exercise should provide a taster of what would be involved if we were to take this route.

Exercise 7.10. Consider the case where $d = m = 1$ and let $\{z_t^\varepsilon, t \ge 0\}$ be the measure-valued process (signed measures) whose density is the spatial derivative of $T_\varepsilon\rho_t$. Show that
\[
\tilde{E}\big[ \|z_t^\varepsilon\|_2^2 \big] \le \big\| (T_\varepsilon\pi_0)' \big\|_2^2 - 2\int_0^t \tilde{E}\big[ \big\langle z_s^\varepsilon, (T_\varepsilon f\rho_s)'' \big\rangle \big]\,ds
- \int_0^t \tilde{E}\big[ \big\langle z_s^\varepsilon, (T_\varepsilon a\rho_s)''' \big\rangle \big]\,ds + \int_0^t \tilde{E}\big[ \big\| (T_\varepsilon h\rho_s)' \big\|_2^2 \big]\,ds.
\]

A much cleaner approach, but just as lengthy, is to recast the Zakai equation in its strong form. Heuristically, if the unnormalised conditional distribution of the signal $\rho_t$ has a density $p_t$ with respect to Lebesgue measure for all $t \ge 0$ and $p_t$ is 'sufficiently nice' then from (3.43) we get that
\[
\rho_t(\varphi) = \int_{\mathbb{R}^d} \varphi(x)p_t(x)\,dx
= \int_{\mathbb{R}^d} \varphi(x) \left( p_0(x) + \int_0^t A^*p_s(x)\,ds + \int_0^t h^\top(x)p_s(x)\,dY_s \right) dx. \tag{7.19}
\]
In (7.19), $\varphi$ is a bounded function of compact support with bounded first and second derivatives and $A^*$ is the adjoint of the operator $A$, where
\[
A\varphi = \sum_{i,j=1}^d a^{ij}\frac{\partial^2\varphi}{\partial x^i\partial x^j} + \sum_{i=1}^d f^i\frac{\partial\varphi}{\partial x^i},
\qquad
A^*\varphi = \sum_{i,j=1}^d \frac{\partial^2}{\partial x_i\partial x_j}(a^{ij}\varphi) - \sum_{i=1}^d \frac{\partial}{\partial x_i}(f^i\varphi),
\]
and for suitably chosen functions $\psi$, $\varphi$ (e.g. $\psi, \varphi \in W_2^2(\mathbb{R}^d)$),†
\[
\langle A^*\psi, \varphi \rangle = \langle \psi, A\varphi \rangle.
\]
It follows that it is natural to look for a solution of the stochastic partial differential equation

† We also need $f$ to be differentiable and $a$ to be twice differentiable.
Z pt (x) = p0 (x) +
t
A∗ ps (x) ds +
0
Z
t
h> (x)ps (x) dYs ,
(7.20)
0
in a suitably chosen function space. It turns out that a suitable function space within which we can study (7.20) is the Hilbert space $W_k^2(\mathbb{R}^d)$. A multitude of difficulties arise when studying (7.20): the stochastic integral in (7.20) needs to be redefined as a Hilbert space operator, the operator $A^*$ has to be rewritten in its divergence form and the solution of (7.20) needs further explanations in terms of measurability, continuity and so on. A complete analysis of (7.20) is contained in Rozovskii [250]. The following two results are immediate corollaries of Theorem 1, page 155 and, respectively, Corollary 1, page 156 in [250] (see also Section 6.2, page 229). We need to assume the following.

C1. The matrix-valued function $a$ is uniformly strictly elliptic. That is, there exists a constant $c > 0$ such that $\xi^\top a(x)\,\xi \ge c\|\xi\|^2$ for any $x, \xi \in \mathbb{R}^d$ such that $\xi \ne 0$.
C2. For all $i, j = 1, \ldots, d$, $a_{ij} \in C_b^{k+2}(\mathbb{R}^d)$, $f_i \in C_b^{k+1}(\mathbb{R}^d)$ and for all $i = 1, \ldots, m$, we have $h_i \in C_b^{k+1}(\mathbb{R}^d)$.
C3. $p_0 \in W_k^r(\mathbb{R}^d)$, $r \ge 2$.

Theorem 7.11. Under the assumptions C1–C3 there exists a unique $\mathcal{Y}_t$-adapted process $p = \{p_t,\ t \ge 0\}$, such that $p_t \in W_k^2(\mathbb{R}^d)$ and $p$ is a solution of the stochastic PDE (7.20). Moreover there exists a constant $c = c(k, r, t)$ such that

\[
\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t} \|p_s\|^{r'}_{W_k^r(\mathbb{R}^d)}\Bigr] \le c\,\|p_0\|^{r'}_{W_k^r(\mathbb{R}^d)}, \tag{7.21}
\]

where $r'$ can be chosen to be either 2 or $r$.

Theorem 7.12. Under the assumptions C1–C3, if $n \in \mathbb{N}$ is given and $(k - n)r > d$, then $p = \{p_t,\ t \ge 0\}$, the solution of (7.20), has a unique modification with the following properties.

1. For every $x \in \mathbb{R}^d$, $p_t(x)$ is a real-valued $\mathcal{Y}_t$-adapted process.
2. Almost surely, $(t, x) \mapsto p_t(x)$ is jointly continuous over $[0, \infty) \times \mathbb{R}^d$ and is continuously differentiable up to order $n$ in the space variable. Both $p_t$ and its partial derivatives are continuous bounded functions.
3. There exists a constant $c = c(k, n, r, t)$ such that

\[
\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} \|p_s\|^{r}_{n,\infty}\Bigr] \le c\,\|p_0\|^{r}_{W_k^r(\mathbb{R}^d)}. \tag{7.22}
\]
Remark 7.13. The inequality (7.21) implies that, almost surely, $p_t$ belongs to the subspace $W_k^r(\mathbb{R}^d)$ or $W_k^2(\mathbb{R}^d)$. However, the definition of the solution of (7.20) requires the Hilbert space structure of $W_k^2(\mathbb{R}^d)$, which is why the conclusion of Theorem 7.11 is that $p$ is a $W_k^2(\mathbb{R}^d)$-valued process.
Let now $\tilde\rho_t$ be the measure which is absolutely continuous with respect to Lebesgue measure with density $p_t$. For the following exercise, use the fact that the stochastic integral appearing on the right-hand side of the stochastic partial differential equation (7.20) is defined as the unique $L^2(\mathbb{R}^d)$-valued stochastic process $M = \{M_t,\ t \ge 0\}$ satisfying

\[
\langle M_t, \varphi\rangle = \int_0^t \langle p_s h^\top, \varphi\rangle\,dY_s, \qquad t \ge 0 \tag{7.23}
\]

for any $\varphi \in L^2(\mathbb{R}^d)$ (see Chapter 2 in Rozovskii [250] for details).

Exercise 7.14. Show that $\tilde\rho = \{\tilde\rho_t,\ t \ge 0\}$ satisfies the Zakai equation (3.43); that is, for any test function $\varphi \in C_k^2(\mathbb{R}^d)$,

\[
\tilde\rho_t(\varphi) = \pi_0(\varphi) + \int_0^t \tilde\rho_s(A\varphi)\,ds + \int_0^t \tilde\rho_s(\varphi h^\top)\,dY_s. \tag{7.24}
\]
Even though we proved that $\tilde\rho$ satisfies the Zakai equation we cannot conclude that it must be equal to $\rho$ based on the uniqueness theorems proved in Chapter 4. This is because the measure-valued process $\tilde\rho$ does not a priori belong to the class of processes within which we proved uniqueness for the solution of the Zakai equation. In particular, we do not know if $\tilde\rho$ has finite mass (i.e. $\tilde\rho(1)$ may be infinite), so the required inequalities (4.4) or (4.37) may not be satisfied. Instead we use the same approach as that adopted in Section 4.1.

Exercise 7.15. Let $\varepsilon_t \in S_t$ where $S_t$ is the set defined in Corollary B.40; that is,

\[
\varepsilon_t = \exp\left( i\int_0^t r_s^\top\,dY_s + \frac12 \int_0^t \|r_s\|^2\,ds \right),
\]

where $r \in C_b^p([0, t], \mathbb{R}^m)$. Then show that

\[
\tilde{\mathbb{E}}[\varepsilon_t\,\tilde\rho_t(\varphi_t)] = \pi_0(\varphi_0) + \tilde{\mathbb{E}}\left[ \int_0^t \varepsilon_s\,\tilde\rho_s\!\left( \frac{\partial\varphi_s}{\partial s} + A\varphi_s + i\varphi_s h^\top r_s \right) ds \right], \tag{7.25}
\]

for any $\varphi \in C_b^{1,2}([0, t] \times \mathbb{R}^d)$ such that, for any $t \ge 0$, $\varphi_t \in W_2^2(\mathbb{R}^d)$ and

\[
\sup_{s\in[0,t]} \|\varphi_s\|_{W_2^2(\mathbb{R}^d)} < \infty. \tag{7.26}
\]
Proposition 7.16. Under the assumptions C1–C3, for any $\psi \in C_k^\infty(\mathbb{R}^d)$ we have $\tilde\rho_t(\psi) = \rho_t(\psi)$, $\tilde{\mathbb{P}}$-a.s.

Proof. Since all coefficients are now bounded and $a$ is not degenerate there exists a (unique) function $\varphi \in C_b^{1,2}([0, t] \times \mathbb{R}^d)$ which solves the parabolic PDE (4.14); that is,

\[
\frac{\partial\varphi_s}{\partial s} + A\varphi_s + i\varphi_s h^\top r_s = 0, \qquad s \in [0, t],
\]

with final condition $\varphi_t = \psi$. The compact support of $\psi$ ensures that (7.26) is also satisfied. From (7.25) we obtain that

\[
\tilde{\mathbb{E}}[\varepsilon_t\,\tilde\rho_t(\psi)] = \pi_0(\varphi_0).
\]

As the same identity holds for $\rho_t(\psi)$ the conclusion follows since the set $S_t$ is total. □

Theorem 7.17. Under the assumptions C1–C3, the unnormalised conditional distribution of the signal has a density with respect to Lebesgue measure and its density is the process $p = \{p_t,\ t \ge 0\}$ which is the unique solution of the stochastic PDE (7.20).

Proof. Similar to Exercise 4.1, choose $(\varphi_i)_{i\ge0}$ to be a sequence of $C_k^\infty(\mathbb{R}^d)$ functions dense in the set of all continuous functions with compact support. Then choose a common null set for all the elements of the sequence outside which $\rho_t(\varphi_i) = \tilde\rho_t(\varphi_i)$ for all $i \ge 0$. By a standard approximation argument one shows that outside this null set $\rho_t(A) = \tilde\rho_t(A)$ for any ball $A = B(x, r)$ for arbitrary $x \in \mathbb{R}^d$ and $r > 0$, hence the two measures must coincide. □

The following corollary identifies the density of the conditional distribution of the signal (its existence follows from Corollary 7.9). Denote the density of $\pi_t$ by $\tilde\pi_t \in L^2(\mathbb{R}^d)$.

Corollary 7.18. Under the assumptions C1–C3, the conditional distribution of the signal has a density with respect to Lebesgue measure and its density is the normalised version of the process $p = \{p_t,\ t \ge 0\}$ which is the solution of the stochastic PDE (7.20). In particular, $\tilde\pi_t \in W_k^2(\mathbb{R}^d)$ and there exists a constant $c = c(k, r, t)$ such that

\[
\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t} \|\tilde\pi_s\|^{r'}_{W_k^r(\mathbb{R}^d)}\Bigr] \le c\,\|p_0\|^{r'}_{W_k^r(\mathbb{R}^d)}, \tag{7.27}
\]

where $r'$ can be chosen to be either 1 or $r/2$.

Proof. The first part of the corollary is immediate from Theorem 7.11 and Theorem 7.17. Inequality (7.27) follows from (7.21) and the Cauchy–Schwarz inequality

\[
\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t} \|\tilde\pi_s\|^{r'}_{W_k^r(\mathbb{R}^d)}\Bigr]
\le \sqrt{\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t} \rho_s^{-2r'}(1)\Bigr]\;\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t} \|p_s\|^{2r'}_{W_k^r(\mathbb{R}^d)}\Bigr]}.
\]

Exercise 9.16 establishes the finiteness of the term $\tilde{\mathbb{E}}[\sup_{0\le s\le t} \rho_s^{-2r'}(1)]$.
Additional smoothness properties of $\pi$ follow in a similar manner from Theorem 7.12. Following the Kushner–Stratonovich equation (see Theorem 3.30), the density of $\pi$ satisfies the following non-linear stochastic PDE

\[
\tilde\pi_t(x) = \tilde\pi_0(x) + \int_0^t A^*\tilde\pi_s(x)\,ds + \int_0^t \tilde\pi_s(x)\bigl(h^\top(x) - \tilde\pi_s(h^\top)\bigr)\bigl(dY_s - \tilde\pi_s(h)\,ds\bigr). \tag{7.28}
\]

It is possible to recast the SPDE for the density $p$ into a form in which there are no stochastic integral terms. This form can be analysed; for example, Baras et al. [7] treat the one-dimensional case in this way, establishing the existence of a fundamental solution to this form of the Zakai equation. They then use this fundamental solution to prove existence and uniqueness results for the solution to the Zakai equation without requiring bounds on the sensor function $h$.

Theorem 7.19. If we write

\[
R_t(x) \triangleq \exp\left( -Y_t^\top h(x) + \frac12 \|h(x)\|^2 t \right) \tag{7.29}
\]

and define $\tilde p_t(x) \triangleq R_t(x)\,p_t(x)$ then this satisfies the following partial differential equation with stochastic coefficients

\[
d\tilde p_t = R_t A^*(R_t^{-1}\tilde p_t)\,dt
\]

with initial condition $\tilde p_0(x) = p_0(x)$.

Proof. Clearly

\[
dR_t = R_t\left( -h^\top(x)\,dY_t + \frac12\|h(x)\|^2\,dt + \frac12\|h(x)\|^2\,d\langle Y\rangle_t \right)
= R_t\left( -h^\top(x)\,dY_t + \|h(x)\|^2\,dt \right).
\]

Therefore using (7.20) for $dp_t$ it follows by Itô's formula that

\begin{align*}
d\tilde p_t(x) = d(R_t(x)\,p_t(x)) &= R_t A^* p_t(x)\,dt + R_t(x)\,h^\top(x)\,p_t(x)\,dY_t \\
&\quad + p_t(x)\,R_t(x)\bigl(-h^\top(x)\,dY_t + \|h(x)\|^2\,dt\bigr) - p_t(x)\,R_t\,\|h(x)\|^2\,dt \\
&= R_t A^* p_t(x)\,dt = R_t A^*\bigl(R_t(x)^{-1}\tilde p_t(x)\bigr)\,dt.
\end{align*}

The initial condition result follows from the fact that $R_0(x) = 1$. □
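As a quick sanity check of Theorem 7.19, consider the special case of a constant sensor function, $h(x) \equiv h_0$. This example is an illustration added here, not taken from the text, and the symbol $h_0$ is hypothetical. In this case $R_t$ does not depend on $x$, so it commutes with $A^*$ and the transformed equation reduces to the Fokker–Planck equation:

```latex
\begin{align*}
R_t &= \exp\bigl(-Y_t^\top h_0 + \tfrac12\|h_0\|^2 t\bigr)
      \quad\text{(independent of $x$)},\\
d\tilde p_t &= R_t A^*(R_t^{-1}\tilde p_t)\,dt = A^*\tilde p_t\,dt,\\
p_t(x) &= R_t^{-1}\tilde p_t(x)
        = \exp\bigl(h_0^\top Y_t - \tfrac12\|h_0\|^2 t\bigr)\,\tilde p_t(x).
\end{align*}
```

The observation then enters $p_t$ only through a scalar factor that cancels upon normalisation, which is consistent with a constant $h$ carrying no information about the signal.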
7.4 The Dual of $\rho_t$

A result similar to Theorem 7.12 justifies the existence of a function dual for the unnormalised conditional distribution of the signal. Theorem 7.20 stated below is an immediate corollary of Theorem 7.12 using a straightforward time-reversal argument. Choose a fixed time horizon $t > 0$ and let $\mathcal{Y}^t = \{\mathcal{Y}_s^t,\ s \in [0, t]\}$ be the backward filtration

\[
\mathcal{Y}_s^t = \sigma(Y_t - Y_r,\ r \in [s, t]).
\]

Theorem 7.20. Let $m > 2$ be an integer such that $(m - 2)p > d$. Then under the assumptions C1–C2, for any bounded $\varphi \in W_p^m(\mathbb{R}^d)$ there exists a unique function-valued process $\psi^{t,\varphi} = \{\psi_s^{t,\varphi},\ s \in [0, t]\}$ with the following properties:

1. For every $x \in \mathbb{R}^d$, $\psi_s^{t,\varphi}(x)$ is a real-valued process measurable with respect to the backward filtration $\mathcal{Y}_s^t$.
2. Almost surely, $\psi_s^{t,\varphi}(x)$ is jointly continuous over $(s, x) \in [0, \infty) \times \mathbb{R}^d$ and is twice differentiable in the spatial variable $x$. Both $\psi_s^{t,\varphi}$ and its partial derivatives are continuous bounded functions.
3. $\psi^{t,\varphi}$ is a (classical) solution of the following backward stochastic partial differential equation,

\[
\psi_s^{t,\varphi}(x) = \varphi(x) - \int_s^t A\psi_p^{t,\varphi}(x)\,dp - \int_s^t \psi_p^{t,\varphi}(x)\,h^\top(x)\,d\bar{Y}_p, \qquad 0 \le s \le t, \tag{7.30}
\]

where $\int_s^t \psi_p^{t,\varphi} h^\top\,d\bar{Y}_p$ is a backward Itô integral.
4. There exists a constant $c = c(m, p)$ independent of $\varphi$ such that

\[
\tilde{\mathbb{E}}\Bigl[\sup_{s\in[0,t]} \bigl\|\psi_s^{t,\varphi}\bigr\|^p_{2,\infty}\Bigr] \le c\,\|\varphi\|^p_{m,p}. \tag{7.31}
\]

Exercise 7.21. If $\varphi \in W_p^m(\mathbb{R}^d)$ as above, prove that for $0 \le r \le s \le t$ we have

\[
\psi_r^{s,\psi_s^{t,\varphi}} = \psi_r^{t,\varphi}.
\]

Theorem 7.22. The process $\psi^{t,\varphi} = \{\psi_s^{t,\varphi},\ s \in [0, t]\}$ is the dual of the solution of the Zakai equation. That is, for any $\varphi \in W_p^m(\mathbb{R}^d) \cap B(\mathbb{R}^d)$, the process

\[
s \mapsto \rho_s\bigl(\psi_s^{t,\varphi}\bigr), \qquad s \in [0, t],
\]

is almost surely constant.

Proof. Let $\varepsilon_t \in S_t$ where $S_t$ is the set defined in Corollary B.40; that is,

\[
\varepsilon_t = \exp\left( i\int_0^t r_s^\top\,dY_s + \frac12\int_0^t \|r_s\|^2\,ds \right),
\]
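where $r \in C_b^m([0, t], \mathbb{R}^m)$. Then for any $\varphi \in C_b^{1,2}([0, t] \times \mathbb{R}^d)$, the identity (4.13) gives

\[
\tilde{\mathbb{E}}[\varepsilon_t\,\rho_t(\varphi_t)] = \tilde{\mathbb{E}}[\varepsilon_r\,\rho_r(\varphi_r)]
+ \tilde{\mathbb{E}}\left[ \int_r^t \varepsilon_s\,\rho_s\!\left( \frac{\partial\varphi_s}{\partial s} + A\varphi_s + i\varphi_s h^\top r_s \right) ds \right]. \tag{7.32}
\]

Let

\[
\tilde\varepsilon_s = \exp\left( i\int_s^t r_u^\top\,dY_u + \frac12\int_s^t \|r_u\|^2\,du \right);
\]

then for $s \in [0, t]$, it is immediate that

\[
\tilde{\mathbb{E}}\bigl[\psi_s^{t,\varphi}\varepsilon_t \mid \mathcal{Y}_s\bigr] = \varepsilon_s\,\tilde{\mathbb{E}}\bigl[\psi_s^{t,\varphi}\tilde\varepsilon_s \mid \mathcal{Y}_s\bigr].
\]

Since $\psi_s^{t,\varphi}$ and $\tilde\varepsilon_s$ are both $\mathcal{Y}_s^t$-measurable, it follows that they are independent of $\mathcal{Y}_s$; thus defining $\Xi = \{\Xi_s,\ s \in [0, t]\}$ to be given by $\Xi_s = \tilde{\mathbb{E}}[\psi_s^{t,\varphi}\tilde\varepsilon_s]$, it follows that

\[
\tilde{\mathbb{E}}\bigl[\psi_s^{t,\varphi}\varepsilon_t \mid \mathcal{Y}_s\bigr] = \varepsilon_s\,\Xi_s.
\]

Since $\tilde\varepsilon = \{\tilde\varepsilon_s,\ s \in [0, t]\}$ is a solution of the backward stochastic differential equation

\[
\tilde\varepsilon_s = 1 - i\int_s^t \tilde\varepsilon_u\,r_u^\top\,d\bar{Y}_u, \qquad 0 \le s \le t,
\]

it follows by stochastic integration by parts using the SDE (7.30) that

\[
d\bigl(\psi_p^{t,\varphi}\tilde\varepsilon_p\bigr) = -i\psi_p^{t,\varphi}\tilde\varepsilon_p\,r_p^\top\,d\bar{Y}_p + \tilde\varepsilon_p\,A\psi_p^{t,\varphi}\,dp + \tilde\varepsilon_p\,\psi_p^{t,\varphi}h^\top\,d\bar{Y}_p + i\tilde\varepsilon_p\,h^\top r_p\,\psi_p^{t,\varphi}\,dp
\]

and taking expectation and using the fact that $\psi_t^{t,\varphi} = \varphi$ and $\tilde\varepsilon_t = 1$,

\[
\Xi_s = \varphi - \tilde{\mathbb{E}}\left[ \int_s^t \tilde\varepsilon_p\,A\psi_p^{t,\varphi}\,dp \right] - i\,\tilde{\mathbb{E}}\left[ \int_s^t \tilde\varepsilon_p\,h^\top r_p\,\psi_p^{t,\varphi}\,dp \right], \qquad 0 \le s \le t;
\]

using the boundedness properties of $\psi$, $a$, $f$, $h$ and $r$ we see that

\begin{align*}
\tilde{\mathbb{E}}\left[ \int_s^t \tilde\varepsilon_p\,A\psi_p^{t,\varphi}\,dp \right] &= \int_s^t A\Xi_p\,dp, \\
\tilde{\mathbb{E}}\left[ \int_s^t \tilde\varepsilon_p\,h^\top r_p\,\psi_p^{t,\varphi}\,dp \right] &= \int_s^t h^\top r_p\,\Xi_p\,dp,
\end{align*}

hence

\[
\Xi_s = \varphi - \int_s^t A\Xi_p\,dp - i\int_s^t h^\top r_p\,\Xi_p\,dp, \qquad 0 \le s \le t;
\]

in other words $\Xi = \{\Xi_s,\ s \in [0, t]\}$ is the unique solution of the parabolic PDE (4.14), therefore $\Xi \in C_b^{1,2}([0, t] \times \mathbb{R}^d)$. Hence from (7.32), for arbitrary $r \in [0, t]$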
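\begin{align*}
\tilde{\mathbb{E}}[\varepsilon_t\,\rho_t(\varphi)] &= \tilde{\mathbb{E}}[\varepsilon_t\,\rho_t(\Xi_t)] = \tilde{\mathbb{E}}[\rho_r(\varepsilon_r\Xi_r)] = \tilde{\mathbb{E}}[\varepsilon_r\Xi_r] \\
&= \tilde{\mathbb{E}}\bigl[\varepsilon_r\,\tilde{\mathbb{E}}\bigl[\psi_r^{t,\varphi}\tilde\varepsilon_r \mid \mathcal{Y}_r\bigr]\bigr] = \tilde{\mathbb{E}}\bigl[\tilde{\mathbb{E}}\bigl[\varepsilon_r\tilde\varepsilon_r\,\psi_r^{t,\varphi} \mid \mathcal{Y}_r\bigr]\bigr] \\
&= \tilde{\mathbb{E}}\bigl[\varepsilon_t\,\psi_r^{t,\varphi}\bigr] = \tilde{\mathbb{E}}\bigl[\varepsilon_t\,\tilde{\mathbb{E}}\bigl[\psi_r^{t,\varphi} \mid \mathcal{Y}_r\bigr]\bigr] = \tilde{\mathbb{E}}\bigl[\varepsilon_t\,\rho_r\bigl(\psi_r^{t,\varphi}\bigr)\bigr],
\end{align*}

where the penultimate equality uses the fact that $\psi_r^{t,\varphi}$ is $\mathcal{Y}_r^t$-adapted and hence independent of $\mathcal{Y}_r$. The conclusion of the theorem then follows since this holds for any $\varepsilon_t \in S_t$ and the set $S_t$ is total, thus $\rho_r(\psi_r^{t,\varphi}) = \rho_t(\varphi)$ $\mathbb{P}$-a.s., and as $t$ is fixed this implies that $\rho_r(\psi_r^{t,\varphi})$ is a.s. constant. □

Remark 7.23. Theorem 7.22 with $r = 0$ implies that

\[
\rho_t(\varphi) = \pi_0\bigl(\psi_0^{t,\varphi}\bigr), \qquad \tilde{\mathbb{P}}\text{-a.s.},
\]

hence the solution of the Zakai equation is unique (up to indistinguishability). We can represent $\psi^{t,\varphi}$ by using the following version of the Feynman–Kac formula (see Pardoux [238])

\[
\psi_s^{t,\varphi}(x) = \tilde{\mathbb{E}}\bigl[\varphi(X_t(x))\,a_s^t(X(x), Y) \mid \mathcal{Y}\bigr], \qquad s \in [0, t], \tag{7.33}
\]

where

\[
a_s^t(X(x), Y) = \exp\left( \int_s^t h^\top(X_u(x))\,dY_u - \frac12\int_s^t \|h(X_u(x))\|^2\,du \right), \tag{7.34}
\]

and $X_t(x)$ follows the law of the signal starting from $x$ at time $s$, viz

\[
X_t = x + \int_s^t \tilde f(X_u)\,du + \int_s^t \sigma(X_u)\,dV_u + \int_s^t \bar\sigma(X_u)\,dW_u. \tag{7.35}
\]

The same formula appears in Rozovskii [250] (formula (0.3), page 176) under the name of the averaging over the characteristics (AOC) formula. Using (7.33) we can prove that if $\varphi$ is a non-negative function, then so is $\psi_s^{t,\varphi}$ for any $s \in [0, t]$ (see also Corollary 5, page 192 of Rozovskii [250]). We can also use (7.33) to define the dual $\psi^{t,\varphi}$ of $\rho$ for $\varphi$ in a larger class than $W_p^m(\mathbb{R}^d)$, for example, for $B(\mathbb{R}^d)$. For these classes of $\varphi$, Rozovskii's result no longer applies: the dual may not be differentiable and may not satisfy an inequality similar to (7.31). However, if $\varphi$ has higher derivatives, one can use Kunita's theory of stochastic flows (see Kunita [164]) to prove that $\psi^{t,\varphi}$ is differentiable.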
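The Feynman–Kac representation (7.33) suggests a simple Monte Carlo approximation of the dual for a fixed observation path. The following is only a sketch under assumptions that are not part of the text: a scalar signal and observation, no correlation between signal and observation noise (so that, under $\tilde{\mathbb{P}}$, independent signal paths can be averaged with the observation path held fixed), Euler–Maruyama time stepping, and a left-point rule for the stochastic integral in (7.34). The function name and its arguments are hypothetical.

```python
import numpy as np

def dual_feynman_kac(phi, f, sigma, h, x, s_idx, times, Y, n_paths=2000, seed=0):
    """Monte Carlo estimate of psi_s^{t,phi}(x) from (7.33), scalar case.

    Assumptions (illustrative, not from the text): one-dimensional signal,
    uncorrelated noises, Euler-Maruyama stepping on the grid `times`, and
    `Y` holding the observed path on that grid.
    Here s = times[s_idx] and t = times[-1].
    """
    rng = np.random.default_rng(seed)
    X = np.full(n_paths, float(x))      # all signal paths start from x at time s
    log_a = np.zeros(n_paths)           # running exponent of a_s^t in (7.34)
    for j in range(s_idx, len(times) - 1):
        dt = times[j + 1] - times[j]
        dY = Y[j + 1] - Y[j]
        hX = h(X)
        # int h dY - 0.5 * int h^2 du, evaluated at the left end point
        log_a += hX * dY - 0.5 * hX**2 * dt
        # Euler-Maruyama step for the signal
        X = X + f(X) * dt + sigma(X) * np.sqrt(dt) * rng.standard_normal(n_paths)
    return float(np.mean(phi(X) * np.exp(log_a)))
```

Because every weight $\exp(\cdot)$ is positive, the estimate is non-negative whenever $\varphi$ is, mirroring the positivity property noted above. With $h \equiv 0$ and $\varphi \equiv 1$ all weights equal one and the estimate is exactly 1.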
7.5 Solutions to Exercises

7.2 Let $\bar g_\mu : \mathbb{R}^d \to \mathbb{R}$ be defined as

\[
\bar g_\mu = \sum_{i=1}^\infty \mu(\varphi_i)\,\varphi_i.
\]

Then $\bar g_\mu \in L^2(\mathbb{R}^d)$. Let $\bar\mu$ be a measure absolutely continuous with respect to Lebesgue measure with density $\bar g_\mu$. Then $\mu(\varphi_i) = \bar\mu(\varphi_i)$, since

\[
\bar\mu(\varphi_i) = \int_{\mathbb{R}^d} \varphi_i\,\bar g_\mu\,dx = \Bigl\langle \sum_{j=1}^\infty \mu(\varphi_j)\,\varphi_j,\ \varphi_i \Bigr\rangle = \mu(\varphi_i);
\]

hence via an approximation argument $\mu(A) = \bar\mu(A)$ for any ball $A$ of arbitrary center and radius. Hence $\mu = \bar\mu$ and since $\bar\mu$ is absolutely continuous with respect to Lebesgue measure the result follows.

7.3 i.
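First we show that if, for $p, q \ge 1$, $1/p + 1/q = 1 + 1/r$, then

\[
\|f \star g\|_r \le \|f\|_p\,\|g\|_q,
\]

where $f \star g$ denotes the convolution of $f$ and $g$. Then choosing $p = 2$, $q = 1$, and $r = 2$, we see that for $g \in L^2(\mathbb{R}^d)$, using the fact that the $L^1$ norm of the heat kernel is unity,

\[
\|T_\varepsilon g\|_2 = \|\psi_\varepsilon \star g\|_2 \le \|\psi_\varepsilon\|_1\,\|g\|_2 = \|g\|_2.
\]

We now prove the result for convolution. Consider $f, g$ non-negative; let $1/p' + 1/p = 1$ and $1/q + 1/q' = 1$. Since $1/p' + 1/q' + 1/r = 1$ we may apply Hölder's inequality,

\begin{align*}
f \star g(x) &= \int_{\mathbb{R}^d} f(y)\,g(x - y)\,dy \\
&= \int_{\mathbb{R}^d} f(y)^{p/r}\,g(x - y)^{q/r}\,f(y)^{1 - p/r}\,g(x - y)^{1 - q/r}\,dy \\
&\le \left( \int_{\mathbb{R}^d} f(y)^p\,g(x - y)^q\,dy \right)^{1/r}
\left( \int_{\mathbb{R}^d} f(y)^{(1 - p/r)q'}\,dy \right)^{1/q'}
\left( \int_{\mathbb{R}^d} g(x - y)^{(1 - q/r)p'}\,dy \right)^{1/p'} \\
&= \left( \int_{\mathbb{R}^d} f(y)^p\,g(x - y)^q\,dy \right)^{1/r}
\left( \int_{\mathbb{R}^d} f(y)^p\,dy \right)^{1/q'}
\left( \int_{\mathbb{R}^d} g(y)^q\,dy \right)^{1/p'}.
\end{align*}

Therefore

\[
(f \star g)^r(x) \le (f^p \star g^q)(x)\,\|f\|_p^{pr/q'}\,\|g\|_q^{rq/p'},
\]

so, noting that $pr/q' = r - p$ and $rq/p' = r - q$, by Fubini's theorem

\begin{align*}
\|f \star g\|_r^r &\le \|f\|_p^{r-p}\,\|g\|_q^{r-q} \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} f^p(y)\,g^q(x - y)\,dy\,dx \\
&\le \|f\|_p^{r-p}\,\|g\|_q^{r-q} \int_{\mathbb{R}^d} f^p(y) \int_{\mathbb{R}^d} g^q(x - y)\,dx\,dy \\
&\le \|f\|_p^{r-p}\,\|g\|_q^{r-q}\,\|f\|_p^p\,\|g\|_q^q = \|f\|_p^r\,\|g\|_q^r.
\end{align*}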
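The inequality just proved can be checked numerically in its discrete analogue (sequences on $\mathbb{Z}$ with counting measure, where the same Hölder argument applies). The snippet below is an illustrative sketch only; the helper names `lp_norm` and `young_gap` and the sample vectors are hypothetical, and the case $1/p + 1/q = 1$ (that is, $r = \infty$) is deliberately not handled.

```python
import numpy as np

def lp_norm(v, p):
    """l^p norm of a finite sequence."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def young_gap(f, g, p, q):
    """Return (||f*g||_r, ||f||_p ||g||_q) with 1/p + 1/q = 1 + 1/r.

    Requires 1/p + 1/q > 1 so that r is finite.
    """
    r = 1.0 / (1.0 / p + 1.0 / q - 1.0)
    conv = np.convolve(f, g)          # full discrete convolution
    return lp_norm(conv, r), lp_norm(f, p) * lp_norm(g, q)

f = np.array([1.0, 2.0, 0.5, 3.0])
g = np.array([0.2, 0.3, 0.5])         # unit l^1 norm, mimicking the heat kernel
lhs, rhs = young_gap(f, g, p=2, q=1)  # here r = 2
```

With $p = 2$, $q = 1$ the right-hand side equals $\|f\|_2$ because $\|g\|_1 = 1$, which is the contraction property used for $T_\varepsilon$ above.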
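ii. The function $\psi_{2\varepsilon}(x)$ is bounded by $1/(4\pi\varepsilon)^{d/2}$, therefore

\begin{align*}
\|T_\varepsilon\mu\|_2^2 &= \int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\,\psi_\varepsilon(x - z)\,\mu(dy)\,\mu(dz)\,dx \\
&= \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \psi_{2\varepsilon}(y - z)\,\mu(dy)\,\mu(dz) \\
&\le \left(\frac{1}{4\pi\varepsilon}\right)^{d/2} \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} |\mu|(dy)\,|\mu|(dz) \\
&\le \left(\frac{1}{4\pi\varepsilon}\right)^{d/2} \bigl(|\mu|(\mathbb{R}^d)\bigr)^2 < \infty.
\end{align*}

Also

\begin{align*}
\|\partial^i T_\varepsilon\mu\|_2^2 &= \int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \frac{(x_i - y_i)}{\varepsilon}\,\psi_\varepsilon(x - y)\,\frac{(x_i - z_i)}{\varepsilon}\,\psi_\varepsilon(x - z)\,\mu(dy)\,\mu(dz)\,dx \\
&= 2^d \int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \frac{(x_i - y_i)}{\varepsilon}\,\psi_{2\varepsilon}(x - y)\exp\left(-\frac{\|x - y\|^2}{4\varepsilon}\right) \\
&\qquad\qquad \times \frac{(x_i - z_i)}{\varepsilon}\,\psi_{2\varepsilon}(x - z)\exp\left(-\frac{\|x - z\|^2}{4\varepsilon}\right) \mu(dy)\,\mu(dz)\,dx \\
&\le \frac{2^d}{\varepsilon} \int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \psi_{2\varepsilon}(x - y)\,\psi_{2\varepsilon}(x - z)\,|\mu|(dy)\,|\mu|(dz)\,dx \\
&\le \frac{2^d}{\varepsilon} \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} \psi_{4\varepsilon}(y - z)\,|\mu|(dy)\,|\mu|(dz) \\
&\le \frac{2^d}{\varepsilon} \left(\frac{1}{8\pi\varepsilon}\right)^{d/2} \bigl(|\mu|(\mathbb{R}^d)\bigr)^2 < \infty.
\end{align*}

In the above the bound $\sup_{t\ge0} t e^{-t^2/4} < 1$ was used twice. Similar bounds hold for higher-order derivatives and are proved in a similar manner.

iii. From part (ii) $T_\varepsilon\mu \in L^2(\mathbb{R}^d)$, thus by part (i),

\[
\|T_{2\varepsilon}\mu\|_2^2 = \|T_\varepsilon(T_\varepsilon\mu)\|_2^2 \le \|T_\varepsilon\mu\|_2^2.
\]

7.4 i.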
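Immediate from

\[
|T_\varepsilon f\mu(x)| = \left| \int_{\mathbb{R}^d} \psi_\varepsilon(x - y)\,f(y)\,\mu(dy) \right| \le \|f\|_\infty\,T_\varepsilon|\mu|(x).
\]

ii. Assuming first that $f \in C_b^1(\mathbb{R}^d)$, integration by parts yields

\begin{align*}
\langle T_\varepsilon\mu,\ f\partial_i T_\varepsilon\mu\rangle &= \frac12 \int_{\mathbb{R}^d} f(x)\,\partial^i\bigl(T_\varepsilon\mu(x)\bigr)^2\,dx \\
&= -\frac12 \int_{\mathbb{R}^d} \bigl(T_\varepsilon\mu(x)\bigr)^2\,\partial^i f(x)\,dx.
\end{align*}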
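Thus $|\langle T_\varepsilon\mu,\ f\partial_i T_\varepsilon\mu\rangle| \le \tfrac12\|f'\|_\infty\,\|T_\varepsilon\mu\|_2^2$, which implies (ii) for $f \in C_b^1(\mathbb{R}^d)$. The general result follows via a standard approximation argument.

iii.

\begin{align*}
\bigl| f\,\partial^i T_\varepsilon\mu(x) - \partial^i T_\varepsilon(f\mu)(x) \bigr| &= \left| \int_{\mathbb{R}^d} (f(x) - f(y))\,\partial^i\psi_\varepsilon(x - y)\,\mu(dy) \right| \\
&\le \|f'\|_\infty \int_{\mathbb{R}^d} \|x - y\|\,\frac{|x_i - y_i|}{\varepsilon}\,\psi_\varepsilon(x - y)\,|\mu|(dy) \\
&\le 2^{d/2}\,\|f'\|_\infty \int_{\mathbb{R}^d} \frac{\|x - y\|^2}{\varepsilon}\exp\left(-\frac{\|x - y\|^2}{4\varepsilon}\right)\psi_{2\varepsilon}(x - y)\,|\mu|(dy) \\
&\le 2^{d/2+1}\,\|f'\|_\infty\,T_{2\varepsilon}|\mu|(x),
\end{align*}

where the final inequality follows as a consequence of the fact that $\sup_{t\ge0}(t\exp(-t/4)) < 2$.

7.10 Using primes to denote differentiation with respect to the spatial variable, from the Zakai equation,

\[
T_\varepsilon\rho_t(\varphi') = T_\varepsilon\pi_0(\varphi') + \int_0^t \rho_s(AT_\varepsilon\varphi')\,ds + \int_0^t \rho_s(hT_\varepsilon\varphi')\,dY_s.
\]

By Itô's formula, setting $z_t^\varepsilon = (T_\varepsilon\rho_t)'$,

\begin{align*}
\bigl(z_t^\varepsilon(\varphi)\bigr)^2 &= \bigl((T_\varepsilon\pi_0)'(\varphi)\bigr)^2 + 2\int_0^t z_s^\varepsilon(\varphi)\,\rho_s(AT_\varepsilon\varphi')\,ds + 2\int_0^t z_s^\varepsilon(\varphi)\,\rho_s(hT_\varepsilon\varphi')\,dY_s \\
&\quad + \int_0^t \bigl(\rho_s(hT_\varepsilon\varphi')\bigr)^2\,ds.
\end{align*}

Taking expectation and using Fatou's lemma,

\[
\tilde{\mathbb{E}}\bigl[(z_t^\varepsilon(\varphi))^2\bigr] \le \tilde{\mathbb{E}}\bigl[\bigl((T_\varepsilon\pi_0)'(\varphi)\bigr)^2\bigr] + 2\,\tilde{\mathbb{E}}\left[\int_0^t z_s^\varepsilon(\varphi)\,\rho_s(AT_\varepsilon\varphi')\,ds\right] + \tilde{\mathbb{E}}\left[\int_0^t \bigl(\rho_s(hT_\varepsilon\varphi')\bigr)^2\,ds\right].
\]

For the final term $\rho_s(hT_\varepsilon\varphi') = (h\rho_s)(T_\varepsilon\varphi') = \langle\varphi', T_\varepsilon(h\rho_s)\rangle$; using this and the result (7.9) of Lemma 7.5 it follows that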
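\begin{align*}
\tilde{\mathbb{E}}\bigl[(z_t^\varepsilon(\varphi))^2\bigr] &\le \tilde{\mathbb{E}}\bigl[\bigl((T_\varepsilon\pi_0)'(\varphi)\bigr)^2\bigr] + 2\,\tilde{\mathbb{E}}\left[\int_0^t z_s^\varepsilon(\varphi)\,\langle\varphi', (T_\varepsilon f\rho_s)'\rangle\,ds\right] \\
&\quad + 2\,\tilde{\mathbb{E}}\left[\int_0^t z_s^\varepsilon(\varphi)\,\langle\varphi', (T_\varepsilon a\rho_s)''\rangle\,ds\right] + \tilde{\mathbb{E}}\left[\int_0^t \langle\varphi', T_\varepsilon(h\rho_s)\rangle^2\,ds\right].
\end{align*}

Therefore integrating by parts yields

\begin{align*}
\tilde{\mathbb{E}}\bigl[(z_t^\varepsilon(\varphi))^2\bigr] &\le \tilde{\mathbb{E}}\bigl[\bigl((T_\varepsilon\pi_0)'(\varphi)\bigr)^2\bigr] + 2\,\tilde{\mathbb{E}}\left[\int_0^t z_s^\varepsilon(\varphi)\,\langle\varphi, (T_\varepsilon f\rho_s)''\rangle\,ds\right] \\
&\quad + 2\,\tilde{\mathbb{E}}\left[\int_0^t z_s^\varepsilon(\varphi)\,\langle\varphi, (T_\varepsilon a\rho_s)'''\rangle\,ds\right] + \tilde{\mathbb{E}}\left[\int_0^t \langle\varphi, T_\varepsilon(h\rho_s)'\rangle^2\,ds\right]. \tag{7.36}
\end{align*}

Now let $\varphi$ range over an orthonormal basis of $L^2(\mathbb{R}^d)$, and bound

\[
\lim_{n\to\infty} \sum_{i=1}^n \bigl(z_t^\varepsilon(\varphi_i)\bigr)^2
\]

using the result (7.36) applied to each term. By the dominated convergence theorem the limit can be exchanged with the integrals and the result is obtained.

7.14 By Fubini and integration by parts (use the bound (7.21) to prove the integrability of $\int_0^t A^* p_s(x)\,ds$),

\[
\Bigl\langle \int_0^t A^* p_s\,ds,\ \varphi \Bigr\rangle = \int_0^t \tilde\rho_s(A\varphi)\,ds.
\]

Next, using the definition (7.23) of the stochastic integral appearing in the stochastic partial differential equation (7.20),

\[
\Bigl\langle \int_0^t h^\top(x)\,p_s(x)\,dY_s,\ \varphi \Bigr\rangle = \int_0^t \tilde\rho_s(\varphi h^\top)\,dY_s.
\]

Hence the result.

7.15 This proof requires that we repeat, with suitable modifications, the proof of Lemma 4.8 and Exercise 4.9. In the earlier proofs, (4.4) was used for two purposes, firstly in the proof of Lemma 4.8 to justify via dominated convergence interchange of limits and integrals, and secondly in the solution to Exercise 4.9 to show that the various stochastic integrals are martingales. The condition (7.26) must be used instead. First for the analogue of Lemma 4.9 we show that (7.24) also holds for $\varphi \in W_2^2(\mathbb{R}^d)$, by considering a sequence $\varphi_n \in C_k^2(\mathbb{R}^d)$ converging to $\varphi$ in the $\|\cdot\|_{2,2}$ norm. From Theorem 7.11 with $k = 0$,

\[
\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t} \|p_s\|_2^2\Bigr] \le c\,\|p_0\|_2^2 < \infty,
\]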
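since we assumed the initial state density was in $L^2(\mathbb{R}^d)$; thus $\sup_{0\le s\le t}\|p_s\|_2^2 < \infty$ $\tilde{\mathbb{P}}$-a.s. Therefore by the Cauchy–Schwarz inequality

\begin{align*}
\int_0^t \tilde\rho_s(\varphi)\,ds = \int_0^t \langle p_s, \varphi\rangle\,ds &\le \int_0^t \|p_s\|_2\,\|\varphi\|_2\,ds \\
&\le \|\varphi\|_2 \int_0^t \|p_s\|_2\,ds \le t\,\|\varphi\|_2 \sup_{0\le s\le t}\|p_s\|_2 < \infty \qquad \tilde{\mathbb{P}}\text{-a.s.}
\end{align*}

and similarly

\[
\int_0^t \tilde\rho_s(\partial^i\varphi)\,ds \le \|\partial^i\varphi\|_2 \int_0^t \|p_s\|_2\,ds < \infty \qquad \tilde{\mathbb{P}}\text{-a.s.},
\]

and

\[
\int_0^t \tilde\rho_s(\partial^i\partial^j\varphi)\,ds \le \|\partial^i\partial^j\varphi\|_2 \int_0^t \|p_s\|_2\,ds < \infty \qquad \tilde{\mathbb{P}}\text{-a.s.}
\]

Thus using the boundedness (from C2) of the $a_{ij}$ and $f_i$, it follows from the dominated convergence theorem that

\[
\lim_{n\to\infty} \int_0^t \tilde\rho_s(A\varphi_n)\,ds = \int_0^t \tilde\rho_s(A\varphi)\,ds.
\]

From the boundedness of $h$, and Cauchy–Schwarz,

\begin{align*}
\lim_{n\to\infty} \int_0^t \bigl[\tilde\rho_s(h_i\varphi_n) - \tilde\rho_s(h_i\varphi)\bigr]^2\,ds &\le \lim_{n\to\infty} \|h\|_\infty^2 \int_0^t \langle p_s, \varphi_n - \varphi\rangle^2\,ds \\
&\le \lim_{n\to\infty} \|h\|_\infty^2 \sup_{0\le s\le t}\|p_s\|_2^2\; t\,\|\varphi_n - \varphi\|_2^2 = 0,
\end{align*}

so by Itô's isometry

\[
\lim_{n\to\infty} \int_0^t \tilde\rho_s(h^\top\varphi_n)\,dY_s = \int_0^t \tilde\rho_s(h^\top\varphi)\,dY_s.
\]

Thus from these convergence results (7.24) is satisfied for any $\varphi \in W_2^2$. The result can then be extended to time-dependent $\varphi$, which is uniformly bounded in $W_2^2$ over $[0, t]$, by piecewise approximation followed by the dominated convergence theorem using the bounds just derived. Thus for any $\varphi \in C_b^{1,2}([0, t] \times \mathbb{R}^d)$ such that $\varphi_t \in W_2^2$,

\[
\tilde\rho_t(\varphi_t) = \tilde\rho_0(\varphi_0) + \int_0^t \tilde\rho_s\!\left(\frac{\partial\varphi_s}{\partial s} + A\varphi_s\right) ds + \int_0^t \tilde\rho_s(\varphi_s h^\top)\,dY_s.
\]

For the second part of the proof, apply Itô's formula to $\varepsilon_t\,\tilde\rho_t(\varphi_t)$ and then take expectation. In order to show that the stochastic integrals are martingales and therefore have zero expectation, we may use the bound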
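\begin{align*}
\tilde{\mathbb{E}}\left[\int_0^t |\varepsilon_s|^2\,\bigl(\tilde\rho_s(\varphi_s)\bigr)^2\,ds\right] &\le e^{\|r\|_\infty^2 t}\;\tilde{\mathbb{E}}\left[\int_0^t \bigl(\tilde\rho_s(\varphi_s)\bigr)^2\,ds\right] \\
&= e^{\|r\|_\infty^2 t}\;\tilde{\mathbb{E}}\left[\int_0^t \langle p_s, \varphi_s\rangle^2\,ds\right] \\
&\le e^{\|r\|_\infty^2 t}\;\tilde{\mathbb{E}}\left[\int_0^t \|\varphi_s\|_2^2\,\|p_s\|_2^2\,ds\right] \\
&\le e^{\|r\|_\infty^2 t}\;t\,\Bigl(\sup_{0\le s\le t}\|\varphi_s\|_2\Bigr)^2\,\tilde{\mathbb{E}}\Bigl[\sup_{0\le s\le t}\|p_s\|_2^2\Bigr] < \infty.
\end{align*}

Consequently, since the stochastic integrals are all martingales, we obtain

\[
\tilde{\mathbb{E}}[\varepsilon_t\,\tilde\rho_t(\varphi_t)] = \pi_0(\varphi_0) + \tilde{\mathbb{E}}\left[\int_0^t \varepsilon_s\,\tilde\rho_s\!\left(\frac{\partial\varphi_s}{\partial s} + A\varphi_s + i\varphi_s h^\top r_s\right) ds\right].
\]

7.21 It is immediate from (7.30) that

\[
\psi_s^{s,\psi_s^{t,\varphi}} = \psi_s^{t,\varphi};
\]

thus by subtraction of (7.30) at times $s$ and $r$, for $0 \le r \le s \le t$, we obtain

\[
\psi_r^{t,\varphi} = \psi_s^{t,\varphi} - \int_r^s A\psi_p^{t,\varphi}\,dp - \int_r^s \psi_p^{t,\varphi}h^\top\,d\bar{Y}_p
\]

and this is the same as the evolution equation for $\psi_r^{s,\psi_s^{t,\varphi}}$. Therefore by the uniqueness of its solution (Theorem 7.20), $\psi_r^{t,\varphi} = \psi_r^{s,\psi_s^{t,\varphi}}$ for $r \in [0, s]$.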
Part II
Numerical Algorithms
8 Numerical Methods for Solving the Filtering Problem
This chapter contains an overview of six classes of numerical methods for solving the filtering problem. For each of the six classes, we give a brief description of the ideas behind the methods and state some related results. The last class of methods presented here, particle methods, is developed and studied in depth in Chapter 9 for the continuous time framework and in Chapter 10 for the discrete one.
8.1 The Extended Kalman Filter

This approximation method is based on a natural extension of the exact computation of the conditional distribution for the linear/Gaussian case. Recall from Chapter 6 that in the linear/Gaussian framework the pair $(X, Y)$ satisfies the $(d + m)$-dimensional system of linear stochastic differential equations (6.17) and (6.18); that is,

\begin{align*}
dX_t &= (F_t X_t + f_t)\,dt + \sigma_t\,dV_t \\
dY_t &= (H_t X_t + h_t)\,dt + dW_t.
\end{align*}
\tag{8.1}

In (8.1), the pair $(V, W)$ is a $(d + m)$-dimensional standard Brownian motion. Also $Y_0 = 0$ and $X_0$ has a Gaussian distribution, $X_0 \sim N(x_0, p_0)$, and is independent of $(V, W)$. The functions

\[
F : [0, \infty) \to \mathbb{R}^{d\times d}, \qquad f : [0, \infty) \to \mathbb{R}^d, \qquad
H : [0, \infty) \to \mathbb{R}^{m\times d}, \qquad h : [0, \infty) \to \mathbb{R}^m
\]

are locally bounded, measurable functions. Then $\pi_t$, the conditional distribution of the signal $X_t$ given the observation $\sigma$-algebra $\mathcal{Y}_t$, is Gaussian. Therefore $\pi_t$ is uniquely identified by its mean and covariance matrix. Let $\hat x = \{\hat x_t,\ t \ge 0\}$ be the conditional mean of the signal; that is, $\hat x_t^i = \mathbb{E}[X_t^i \mid \mathcal{Y}_t]$. Then $\hat x$ satisfies the stochastic differential equation (6.27), that is,
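\[
d\hat x_t = (F_t\hat x_t + f_t)\,dt + R_t H_t^\top\bigl(dY_t - (H_t\hat x_t + h_t)\,dt\bigr),
\]

and $R = \{R_t,\ t \ge 0\}$ satisfies the deterministic matrix Riccati equation (6.28),

\[
\frac{dR_t}{dt} = \sigma_t\sigma_t^\top + F_t R_t + R_t F_t^\top - R_t H_t^\top H_t R_t.
\]

We note that $R = \{R_t,\ t \ge 0\}$ is the conditional covariance matrix of the signal; that is, $R_t = (R_t^{ij})_{i,j=1}^d$ has components

\[
R_t^{ij} = \mathbb{E}[X_t^i X_t^j \mid \mathcal{Y}_t] - \mathbb{E}[X_t^i \mid \mathcal{Y}_t]\,\mathbb{E}[X_t^j \mid \mathcal{Y}_t], \qquad i, j = 1, \ldots, d,\ t \ge 0.
\]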
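To make the recursive structure of the two equations above concrete, here is a minimal sketch of an explicit Euler discretisation with time-constant coefficients. It is an illustration under stated assumptions, not the scheme used in the text: the function name `kalman_bucy_step`, the step size and the scalar test coefficients are hypothetical, and a production implementation would normally use a more careful integrator for the Riccati equation.

```python
import numpy as np

def kalman_bucy_step(xhat, R, dY, dt, F, f, H, h, sigma):
    """One explicit Euler step for the conditional mean and the Riccati equation.

    Shapes: xhat (d,), R (d, d), dY (m,), F (d, d), f (d,), H (m, d), h (m,),
    sigma (d, d). The coefficients are taken constant in time for simplicity.
    """
    innovation = dY - (H @ xhat + h) * dt
    xhat_new = xhat + (F @ xhat + f) * dt + R @ H.T @ innovation
    dR = sigma @ sigma.T + F @ R + R @ F.T - R @ H.T @ H @ R
    return xhat_new, R + dR * dt

# Scalar illustration (hypothetical coefficients): F = -1, H = 1, sigma = 1.
F = np.array([[-1.0]]); f = np.zeros(1)
H = np.array([[1.0]]);  h = np.zeros(1)
sigma = np.array([[1.0]])
xhat, R, dt = np.zeros(1), np.array([[1.0]]), 1e-3
for _ in range(20000):
    # A zero observation increment is used here only to exercise the recursion.
    xhat, R = kalman_bucy_step(xhat, R, np.zeros(1), dt, F, f, H, h, sigma)
```

In this scalar case the Riccati equation reads $dR/dt = 1 - 2R - R^2$, whose positive equilibrium is $\sqrt{2} - 1$; the covariance recursion never touches `dY`, which reflects the fact that $R_t$ can be computed offline.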
Therefore, in this particular case, the conditional distribution of the signal is explicitly described by a finite set of parameters ($\hat x_t$ and $R_t$) which, in turn, are easy to compute numerically. The conditional mean $\hat x_t$ satisfies a stochastic differential equation driven by the observation process $Y$ and is computed online, in a recursive fashion, updating it as new observation values become available. However $R_t$ is independent of $Y$ and can be computed offline, i.e., before any observation is obtained.

Some of the early applications of the linear/Gaussian filter, known as the Kalman–Bucy filter, date back to the early 1960s. They include applications to space navigation, aircraft navigation, anti-submarine warfare and calibration of inertial navigation systems. Notably, the Kalman–Bucy filter was used to guide Rangers VI and VII in 1964 and the Apollo space missions. See Bucy and Joseph [31] for details and a list of early references. For a recent self-contained treatment of the Kalman–Bucy filter and a number of applications to mathematical finance, genetics and population modelling, see Aggoun and Elliott [2] and references therein.

The result obtained for the linear filtering problem (8.1) can be generalized as follows. Let $(X, Y)$ be the solution of the following $(d + m)$-dimensional system of stochastic differential equations

\begin{align*}
dX_t &= (F(t, Y)X_t + f(t, Y))\,dt + \sigma(t, Y)\,dV_t + \sum_{i=1}^m (G_i(t, Y)X_t + g_i(t, Y))\,dY_t^i \\
dY_t &= (H(t, Y)X_t + h(t, Y))\,dt + dW_t,
\end{align*}
\tag{8.2}

where $F, \sigma, G_1, \ldots, G_m : [0, \infty) \times \Omega \to \mathbb{R}^{d\times d}$, $f, g_1, \ldots, g_m : [0, \infty) \times \Omega \to \mathbb{R}^d$, $H : [0, \infty) \times \Omega \to \mathbb{R}^{m\times d}$ and $h : [0, \infty) \times \Omega \to \mathbb{R}^m$ are progressively measurable† locally bounded functions. Then, as above, $\pi_t$ is Gaussian with mean $\hat x_t$ and variance $R_t$ which satisfy the following equations

† If $(\Omega, \mathcal{F}, \mathcal{F}_t, \mathbb{P})$ is a filtered probability space, then we say that $a : [0, \infty) \times \Omega \to \mathbb{R}^N$ is a progressively measurable function if, for all $t \ge 0$, its restriction to $[0, t] \times \Omega$ is $\mathcal{B}([0, t]) \times \mathcal{F}_t$-measurable, where $\mathcal{B}([0, t])$ is the Borel $\sigma$-algebra on $[0, t]$.
\begin{align*}
d\hat x_t &= \Bigl( F(t, Y)\hat x_t + f(t, Y) + \sum_{i=1}^m G_i(t, Y)\,R_t\,H_i^\top(t, Y) \Bigr) dt + \sum_{i=1}^m (G_i(t, Y)\hat x_t + g_i(t, Y))\,dY_t^i \\
&\quad + R_t H^\top(t, Y)\bigl(dY_t - (H(t, Y)\hat x_t + h(t, Y))\,dt\bigr) \tag{8.3} \\
dR_t &= \Bigl( F(t, Y)R_t + R_t F^\top(t, Y) + \sigma(t, Y)\sigma^\top(t, Y) + \sum_{i=1}^m G_i(t, Y)\,R_t\,G_i^\top(t, Y) \Bigr) dt \\
&\quad - R_t H^\top(t, Y)H(t, Y)R_t\,dt + \sum_{i=1}^m \bigl(G_i(t, Y)R_t + R_t G_i^\top(t, Y)\bigr)\,dY_t^i. \tag{8.4}
\end{align*}
The above formulae can be used to estimate $\pi_t$ for more general classes of filtering problems, which are non-linear. This will lead to the well-known extended Kalman filter (EKF for short). The following heuristic justification of the EKF follows that given in Pardoux [238]. Let $(X, Y)$ be the solution of the following $(d + m)$-dimensional system of non-linear stochastic differential equations

\begin{align*}
dX_t &= f(X_t)\,dt + \sigma(X_t)\,dV_t + g(X_t)\,dW_t \\
dY_t &= h(X_t)\,dt + dW_t,
\end{align*}
\tag{8.5}

and assume that $(X_0, Y_0) = (x_0, 0)$, where $x_0 \in \mathbb{R}^d$. Define $\bar x_t$ to be the solution of the ordinary differential equation

\[
\frac{d\bar x_t}{dt} = f(\bar x_t), \qquad \bar x_0 = x_0.
\]

The contribution of the two stochastic terms in (8.5) remains small, at least within a small window of time $[0, \varepsilon]$, so a trajectory $t \mapsto X_t$ may be viewed as being a perturbation from the (deterministic) trajectory $t \mapsto \bar x_t$. Therefore the following Taylor-like expansion is expected

\begin{align*}
dX_t &\simeq \bigl(f'(\bar x_t)(X_t - \bar x_t) + f(\bar x_t)\bigr)\,dt + \sigma(\bar x_t)\,dV_t + g(\bar x_t)\,dW_t \\
dY_t &\simeq \bigl(h'(\bar x_t)(X_t - \bar x_t) + h(\bar x_t)\bigr)\,dt + dW_t.
\end{align*}

In the above equation, ‘$\simeq$’ means approximately equal, although one cannot attach a rigorous mathematical meaning to it. Here $f'$ and $h'$ are the derivatives of $f$ and $h$. In other words, for a small time window, the equation satisfied by the pair $(X, Y)$ is nearly linear. By analogy with the generalized linear filter (8.2), we can ‘conclude’ that $\pi_t$ is ‘approximately’ normal with mean $\hat x_t$ and with covariance $R_t$ which satisfy (cf. (8.3) and (8.4))
\begin{align*}
d\hat x_t &= \bigl[(f' - gh')(\bar x_t)\hat x_t + (f - gh)(\bar x_t) - (f' - gh')(\bar x_t)\bar x_t\bigr]\,dt + g(\bar x_t)\,dY_t \\
&\quad + R_t h'^\top(\bar x_t)\bigl[dY_t - (h'(\bar x_t)\hat x_t + h(\bar x_t) - h'(\bar x_t)\bar x_t)\,dt\bigr] \\
\frac{dR_t}{dt} &= (f' - gh')(\bar x_t)R_t + R_t(f' - gh')^\top(\bar x_t) + \sigma\sigma^\top(\bar x_t) - R_t h'^\top h'(\bar x_t)R_t
\end{align*}

with $\hat x_0 = x_0$ and $R_0 = p_0$. Hence, we can estimate the position of the signal by using $\hat x_t$ as computed above. We can use the same procedure, but instead of $\bar x_t$ we can use any $\mathcal{Y}_t$-adapted ‘estimator’ process $m_t$. Thus, we obtain a mapping $\Lambda$ from the set of $\mathcal{Y}_t$-adapted ‘estimator’ processes into itself

\[
m_t \xrightarrow{\ \Lambda\ } \hat x_t.
\]

The extended Kalman filter (EKF) is the fixed point of $\Lambda$; that is, the solution of the following system

\begin{align*}
d\hat x_t &= (f - gh)(\hat x_t)\,dt + g(\hat x_t)\,dY_t + R_t h'^\top(\hat x_t)\bigl[dY_t - h(\hat x_t)\,dt\bigr] \\
\frac{dR_t}{dt} &= (f' - gh')(\hat x_t)R_t + R_t(f' - gh')^\top(\hat x_t) + \sigma\sigma^\top(\hat x_t) - R_t h'^\top h'(\hat x_t)R_t.
\end{align*}

Although this method is not mathematically justified, it is widely used in practice. The following is a minute sample of some of the more recent applications of the EKF.

• In Bayro-Corrochano et al. [8], a variant of the EKF is used for the motion estimation of a visually guided robot operator.
• In Kao et al. [148], the EKF is used to optimise a model's physical parameters for the simulation of the evolution of a shock wave produced through a high-speed flyer plate.
• In Mangold et al. [202], the EKF is used to estimate the state of a molten carbonate fuel cell.
• In Ozbek and Efe [235], the EKF is used to estimate the state and the parameters for a model for the ingestion and subsequent metabolism of a drug in an individual.
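The EKF system stated before the list of applications can be discretised in the same way as the Kalman–Bucy filter. The sketch below treats only the scalar case with an explicit Euler step; the function name `ekf_step`, its argument names and the example coefficients are hypothetical, and the scheme is an assumption made for illustration rather than a method prescribed by the text.

```python
import math

def ekf_step(xhat, R, dY, dt, f, df, h, dh, sigma, g=lambda x: 0.0):
    """One explicit Euler step of the scalar EKF system for (x-hat, R).

    f, h, sigma, g are the coefficients of (8.5); df and dh are the
    derivatives f' and h'. All functions map float -> float.
    """
    innovation = dY - h(xhat) * dt
    # (f - g h)(x-hat) dt + g(x-hat) dY + R h'(x-hat) [dY - h(x-hat) dt]
    xhat_new = (xhat + (f(xhat) - g(xhat) * h(xhat)) * dt
                + g(xhat) * dY + R * dh(xhat) * innovation)
    a = df(xhat) - g(xhat) * dh(xhat)            # (f' - g h')(x-hat)
    dR = 2.0 * a * R + sigma(xhat) ** 2 - (R * dh(xhat)) ** 2
    return xhat_new, R + dR * dt

# Example call with a non-linear drift and a linear sensor (illustrative values).
x1, R1 = ekf_step(
    xhat=1.0, R=0.5, dY=0.01, dt=0.01,
    f=lambda x: 2.0 * math.atan(x) - x,
    df=lambda x: 2.0 / (1.0 + x * x) - 1.0,
    h=lambda x: x, dh=lambda x: 1.0,
    sigma=lambda x: 1.0,
)
```

When $f$ and $h$ are linear and $g = 0$ the step coincides with the Euler step of the Kalman–Bucy equations, which is a convenient consistency check.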
The EKF will give a good estimate if the initial position of the signal is well approximated ($p_0$ is ‘small’), the coefficients $f$ and $g$ are only ‘slightly’ non-linear, $h$ is injective and the system is stable. Theorem 8.5 (below) gives a result of this nature. The result requires a number of definitions.

Definition 8.1. The family of functions $f^\varepsilon : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^d$, $\varepsilon \ge 0$, is said to be almost linear if there exists a family of matrix-valued functions $F_t : \mathbb{R}^d \to \mathbb{R}^{d\times d}$ such that, for any $t \ge 0$ and $x, y \in \mathbb{R}^d$, we have

\[
|f^\varepsilon(t, x) - f^\varepsilon(t, y) - F_t(x - y)| \le \mu_\varepsilon\,|x - y|,
\]

for some family of numbers $\mu_\varepsilon$ converging to 0 as $\varepsilon$ converges to 0.
Definition 8.2. The function $f : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^d$ is said to be strongly injective if there exists a constant $c > 0$ such that

\[
|f(t, x) - f(t, y)| \ge c\,|x - y|
\]

for any $x, y \in \mathbb{R}^d$.

Definition 8.3. A family of stochastic processes $\{\xi_t^\varepsilon,\ t \ge 0\}$, $\varepsilon > 0$, is said to be bounded in $L^{\infty-}$ if, for any $q < \infty$ there exists $\varepsilon_q > 0$ such that $\|\xi_t^\varepsilon\|_q$ is bounded uniformly for $(t, \varepsilon) \in [0, \infty) \times [0, \varepsilon_q]$.

Definition 8.4. The family $\xi_t^\varepsilon$, $\varepsilon > 0$, is said to be of order $\varepsilon^\alpha$ for some $\alpha > 0$ if $\varepsilon^{-\alpha}\xi_t^\varepsilon$ is bounded in $L^{\infty-}$.

Assume that the pair $(X^\varepsilon, Y^\varepsilon)$ satisfies the following system of SDEs,

\begin{align*}
dX_t^\varepsilon &= \beta^\varepsilon(t, X_t^\varepsilon)\,dt + \sqrt{\varepsilon}\,\sigma(t, X_t^\varepsilon)\,dW_t + \sqrt{\varepsilon}\,\gamma(t, X_t^\varepsilon)\,dB_t \\
dY_t^\varepsilon &= h^\varepsilon(t, X_t^\varepsilon)\,dt + \sqrt{\varepsilon}\,dB_t.
\end{align*}

The following theorem is proved in Picard [240].

Theorem 8.5. Assume that $p_0^{-1/2}(X_0^\varepsilon - \hat x_0)$ is of order $\sqrt{\varepsilon}$ and the following conditions are satisfied.

• $\sigma$ and $\gamma$ are bounded.
• $\beta^\varepsilon$ and $h^\varepsilon$ are continuously differentiable and almost linear.
• $h$ is strongly injective and $\sigma\sigma^\top$ is uniformly elliptic.
• The ratio of the largest and smallest eigenvalues of $P_0$ is bounded.

Then $(R_t^\varepsilon)^{-1/2}(X_t^\varepsilon - \hat x_t^\varepsilon)$ is of order $\sqrt{\varepsilon}$.

Hence the EKF works well under the conditions described above. If any of these conditions are not satisfied, the approximation can be very bad. The following two examples, again taken from [240], show this fact. Suppose first that $X^\varepsilon$ and $Y^\varepsilon$ are one-dimensional and satisfy

\begin{align*}
dX_t^\varepsilon &= (2\arctan X_t^\varepsilon - X_t^\varepsilon)\,dt + \sqrt{\varepsilon}\,dW_t \\
dY_t^\varepsilon &= HX_t^\varepsilon\,dt + \sqrt{\varepsilon}\,dB_t,
\end{align*}

where $H$ is a positive real number. In particular, the signal's drift is no longer almost linear. The deterministic dynamical system associated with $X^\varepsilon$ (obtained for $\varepsilon = 0$) has two stable points of equilibrium denoted by $x_0 > 0$ and $-x_0$. The point 0 is an unstable equilibrium point. The EKF performs badly in this case. For instance, it cannot be used to detect phase transitions of the signal. More precisely, suppose that the signal starts from $x_0$. Then, for all $\varepsilon$, $X_t^\varepsilon$ will change sign with probability one. In fact, one can check that

\[
\alpha_0 = \lim_{\varepsilon\to0} \varepsilon\log\bigl(\mathbb{E}\bigl[\inf\{t > 0;\ X_t^\varepsilon < 0\}\bigr]\bigr)
\]
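exists and is finite. We choose $\alpha_1 > \alpha_0$ and $t_1 \triangleq \exp(\alpha_1/\varepsilon)$. One can prove that

\[
\lim_{\varepsilon\to0} \mathbb{P}\bigl[X_{t_1}^\varepsilon < 0\bigr] = \frac12,
\]

but on the other hand,

\[
\lim_{\varepsilon\to0} \mathbb{P}\bigl[\hat x_{t_1} > x_0 - \delta\bigr] = 1
\]

for small $\delta > 0$. Hence $X_t^\varepsilon - \hat x_t^\varepsilon$ does not converge to 0 in probability as $\varepsilon$ tends to 0.

In the following example the EKF does not work because the initial condition of the signal is imprecisely known. Assume that $X^\varepsilon$ is one-dimensional, $Y^\varepsilon$ is two-dimensional, and that they satisfy the system of SDEs,

\begin{align*}
dX_t^\varepsilon &= \sqrt{\varepsilon}\,dW_t \\
dY_t^{\varepsilon,1} &= X_t^\varepsilon\,dt + \sqrt{\varepsilon}\,dB_t^1 \\
dY_t^{\varepsilon,2} &= 2|X_t^\varepsilon|\,dt + \sqrt{\varepsilon}\,dB_t^2,
\end{align*}

and $X_0^\varepsilon \sim N(-2, 1)$. In this case $X_t^\varepsilon - \hat x_t^\varepsilon$ does not converge to 0. To be precise,

\[
\liminf_{\varepsilon\to0} \mathbb{P}\Bigl[\inf_{s\le t} X_s^\varepsilon \ge 1,\ \sup_{s\le t} \hat x_s^\varepsilon \le -1\Bigr] > 0.
\]

For further results and examples see Bensoussan [12], Bobrovsky and Zakai [21], Fleming and Pardoux [97] and Picard [240, 243].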
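For the first (bistable) example above, the equilibria of the noise-free dynamics $\dot x = 2\arctan x - x$ and their stability can be located numerically. The following sketch uses plain bisection; the helper names and the bracketing interval $[1, 4]$ are choices made here for illustration and are not part of the text.

```python
import math

def drift(x):
    """Drift of the first example: 2 arctan(x) - x."""
    return 2.0 * math.atan(x) - x

def drift_slope(x):
    """Derivative of the drift; negative values indicate a stable equilibrium."""
    return 2.0 / (1.0 + x * x) - 1.0

def bisect(fn, lo, hi, tol=1e-12):
    """Plain bisection; assumes fn(lo) and fn(hi) have opposite signs."""
    flo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = fn(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = bisect(drift, 1.0, 4.0)   # positive equilibrium; -x0 follows by symmetry
```

The slope of the drift is negative at $\pm x_0$ and positive at 0, which matches the statement that $\pm x_0$ are stable and 0 is unstable.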
8.2 Finite-Dimensional Non-linear Filters

We begin by recalling the explicit expression of the conditional distribution of the Beneš filter as presented in Chapter 6. Let $X$ and $Y$ be one-dimensional processes satisfying the system of stochastic differential equations (6.1) and (6.3); that is,

\begin{align*}
dX_t &= f(X_t)\,dt + \sigma\,dV_t \\
dY_t &= (h_1 X_t + h_2)\,dt + dW_t
\end{align*}
\tag{8.6}

with $(X_0, Y_0) = (x_0, 0)$, where $x_0 \in \mathbb{R}$. In (8.6), the pair process $(V, W)$ is a two-dimensional Brownian motion, $h_1, h_2, \sigma \in \mathbb{R}$ are constants with $\sigma > 0$, and $f : \mathbb{R} \to \mathbb{R}$ is differentiable with bounded derivative (Lipschitz) satisfying the Beneš condition

\[
f'(x) + f^2(x)\sigma^{-2} + (h_1 x + h_2)^2 = p^2x^2 + 2qx + r, \qquad x \in \mathbb{R},
\]

where $p, q, r \in \mathbb{R}$ are arbitrary. Then $\pi_t$ satisfies the explicit formula (6.15); that is,

\[
\pi_t(\varphi) = \frac{1}{c_t} \int_{-\infty}^\infty \varphi(z)\exp\bigl(F(z)\sigma^{-2} + Q_t(z)\bigr)\,dz, \tag{8.7}
\]

where $F$ is an antiderivative of $f$, $\varphi$ is an arbitrary bounded Borel-measurable function, $Q_t(z)$ is the second-order polynomial

\[
Q_t(z) \triangleq z\left( h_1\sigma \int_0^t \frac{\sinh(sp\sigma)}{\sinh(tp\sigma)}\,dY_s + \frac{q + p^2x_0}{p\sigma\sinh(tp\sigma)} - \frac{q}{p\sigma}\coth(tp\sigma) \right) - \frac{p\coth(tp\sigma)}{2\sigma}\,z^2
\]

and $c_t$ is the corresponding constant,

\[
c_t \triangleq \int_{-\infty}^\infty \exp\bigl(F(z)\sigma^{-2} + Q_t(z)\bigr)\,dz. \tag{8.8}
\]

In particular, $\pi$ only depends on the one-dimensional $\mathcal{Y}_t$-adapted process

\[
t \mapsto \psi_t = \int_0^t \sinh(sp\sigma)\,dY_s.
\]
The explicit formulae (8.7) and (8.8) are very convenient. If the observations arrive at the given times $(t_i)_{i \ge 0}$, then $\psi_{t_i}$ can be recursively approximated using, for example, the Euler method

$$\psi_{t_{i+1}} = \psi_{t_i} + \sinh(t_{i+1} p\sigma)\,(Y_{t_{i+1}} - Y_{t_i}),$$

and, provided the constant $c_t$ and the antiderivative $F$ can be computed, this gives an explicit approximation of the density of $\pi_t$. Chapter 6 gives some examples where this is possible. If $c_t$ and $F$ are not available in closed form then they can be approximated via a Monte Carlo method for $c_t$ and numerical integration for $F$.

The following extension to the $d$-dimensional case (see Beneš [9] for details) is valid. Let $f : \mathbb{R}^d \to \mathbb{R}^d$ be an irrotational vector field; that is, there exists a scalar function $F$ such that $f = \nabla F$, and assume that the signal and the observation satisfy

$$dX_t = f(X_t)\,dt + dV_t, \qquad X_0 = x, \tag{8.9}$$

$$dY_t = X_t\,dt + dW_t, \qquad Y_0 = 0, \tag{8.10}$$
and further assume that $F$ satisfies the following condition

$$\nabla^2 F + |\nabla F|^2 + |z|^2 = z^\top Q z + q^\top z + c, \tag{8.11}$$
where $Q \ge 0$ and $Q = Q^\top$. Let $T$ be an orthogonal matrix such that $T Q T^\top = \Lambda$, where $\Lambda$ is the diagonal matrix of (nonnegative) eigenvalues $\lambda_i$ of $Q$, and $b = Tq$. Let $k = (\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_d})$, $u^\top = (0, 1, -1, 0, 1, -1, \ldots)$ (the pattern $(0, 1, -1)$ repeated $d$ times) and $m$ be the $3d$-dimensional solution of the equation

$$\frac{dm}{dt} = Am, \tag{8.12}$$
where m(0) = (x1 , 0, 0, x2 , 0, 0, . . . , xd , 0, 0) and A1 0 −ki 00 A2 0 0. A= Ai = 0 , .. . ki (T y)i − bi /2 0 0 0 Ad Let also R be the 3d × 3d matrix-valued solution of dR = Y¯ + RA∗ + AR, dt where R=
R1
0 R2 ..
.
,
Y¯1
Y¯ =
0 Rd 1 Y¯i = (T Yt )i (1, (T Yt )i , 0) . 0
0 Y¯2 ..
0
. Y¯d
,
Then we have the following theorem (see Beneš [9] for details).

Theorem 8.6. If condition (8.11) is satisfied, then $\pi_t$ satisfies the explicit formula

$$\pi_t(\varphi) = \frac{1}{c_t} \int_{\mathbb{R}^d} \varphi(z) \exp\left(F(z) + U_t(z)\right) dz,$$

where $\varphi$ is an arbitrary bounded Borel-measurable function, $U_t(z)$ is the second-order polynomial

$$U_t(z) = z^\top Y_t + \frac{1}{2} z^\top Q^{1/2} z - \frac{1}{2} (Tz + Ru - m)^\top R^{-1} (Tz + Ru - m), \qquad z \in \mathbb{R}^d,$$

and $c_t$ is the corresponding normalising constant

$$c_t = \int_{\mathbb{R}^d} \exp\left(F(z) + U_t(z)\right) dz.$$
As in the one-dimensional case, this filter is finite-dimensional. The conditional distribution of the signal $\pi_t$ depends on the triplet $(Y, m, R)$, which can be recursively computed/approximated. Again, as long as the normalising constant $c_t$ and the antiderivative $F$ can be computed we have an explicit approximation of the density of $\pi_t$, and if $c_t$ and $F$ are not available in closed form they can be approximated via a Monte Carlo method and numerical integration, respectively.

The above filter is equivalent to the Kalman–Bucy filter: one can be obtained from the other via a certain space transformation. This in turn induces a homeomorphism which makes the Lie algebras associated with the two filters equivalent (again see Beneš [9] for details). However in [10], Beneš has extended the above class of finite-dimensional non-linear filters to a larger class with corresponding Lie algebras which are no longer homeomorphic to the Lie algebra associated with the Kalman–Bucy filter. Further work on finite-dimensional filters and numerical schemes based on approximation using these classes of filter can be found in Cohen de Lara [58, 59], Daum [69, 70], Schmidt [253] and the references therein. See also Darling [68] for another related approach.
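The one-dimensional formulae (8.7) and (8.8) can be sketched numerically as follows. This is only an illustration under stated assumptions, not an implementation taken from the text: it fixes the example $f = \tanh$, $\sigma = h_1 = 1$, $h_2 = 0$ (so that the Beneš condition holds with $p = 1$, $q = 0$, $r = 1$ and $F(z) = \log\cosh z$), accumulates $\psi_t$ with the Euler recursion, and replaces the integral defining $c_t$ by a Riemann sum on a uniform grid. The function name `benes_tanh_density` and the grid are hypothetical choices.

```python
import numpy as np

def benes_tanh_density(dy, dt, x0, z):
    """Approximate density of pi_t on the uniform grid z.

    Assumed example: f = tanh, sigma = h1 = 1, h2 = 0, hence p = 1, q = 0.
    dy holds the observation increments Y_{t_{i+1}} - Y_{t_i}.
    """
    p, q, sigma, h1 = 1.0, 0.0, 1.0, 1.0
    t_obs = dt * np.arange(1, len(dy) + 1)      # observation times t_1, ..., t_n
    t = t_obs[-1]
    # Euler recursion for psi_t, written as a single sum
    psi = np.sum(np.sinh(t_obs * p * sigma) * dy)
    sinh_t = np.sinh(t * p * sigma)
    coth_t = np.cosh(t * p * sigma) / sinh_t
    # coefficient of z in the second-order polynomial Q_t(z)
    linear = (h1 * sigma * psi / sinh_t
              + (q + p**2 * x0) / (p * sigma * sinh_t)
              - q / (p * sigma) * coth_t)
    Q = z * linear - p * coth_t / (2.0 * sigma) * z**2
    log_w = np.log(np.cosh(z)) / sigma**2 + Q   # F(z) = log cosh z
    w = np.exp(log_w - log_w.max())             # rescale to avoid overflow
    dz = z[1] - z[0]
    return w / (w.sum() * dz)                   # Riemann-sum stand-in for 1/c_t

# Simulate a short signal/observation path (Euler-Maruyama) and evaluate the filter.
rng = np.random.default_rng(0)
dt, n_steps, x = 0.01, 200, 0.0
dy = np.empty(n_steps)
for i in range(n_steps):
    dy[i] = x * dt + np.sqrt(dt) * rng.normal()
    x += np.tanh(x) * dt + np.sqrt(dt) * rng.normal()
z = np.linspace(-10.0, 10.0, 4001)
density = benes_tanh_density(dy, dt, 0.0, z)
```

For this particular $f$ the density is a mixture of two Gaussians, so a coarse grid already resolves it; for large $t$ the hyperbolic functions overflow and the expression would need to be rescaled.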
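8.3 The Projection Filter and Moments Methods

The projection filter (see Brigo et al. [24] and the references therein) is an algorithm which provides an approximation of the conditional distribution of the signal in a systematic way, the method being based on the differential geometric approach to statistics. The algorithm works well in some cases, for example, the cubic sensor example discussed below, but no general convergence theorem is known.

Let $S \triangleq \{p(\cdot, \theta),\ \theta \in \Theta\}$ be a family of probability densities on $\mathbb{R}^d$, where $\Theta \subseteq \mathbb{R}^n$ is an open set of parameters, and let

$$S^{1/2} \triangleq \left\{\sqrt{p(\cdot, \theta)},\ \theta \in \Theta\right\} \subset L^2(\mathbb{R}^d)$$

be the corresponding set of square roots of densities. We assume that for all $\theta \in \Theta$,

$$\left\{ \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_1}, \ldots, \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_n} \right\}$$

are independent vectors in $L^2(\mathbb{R}^d)$, i.e., that $S^{1/2}$ is an $n$-dimensional submanifold of $L^2(\mathbb{R}^d)$. The tangent vector space at $\sqrt{p(\cdot, \theta)}$ to $S^{1/2}$ is

$$L_{\sqrt{p(\cdot,\theta)}} S^{1/2} = \operatorname{span}\left\{ \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_1}, \ldots, \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_n} \right\}.$$

The $L^2$-inner product of any two elements of the basis is defined as

$$\left\langle \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_i}, \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_j} \right\rangle = \frac{1}{4} \int_{\mathbb{R}^d} \frac{1}{p(x, \theta)} \frac{\partial p(x, \theta)}{\partial \theta_i} \frac{\partial p(x, \theta)}{\partial \theta_j}\, dx = \frac{1}{4} g_{ij}(\theta),$$

where $g(\theta) = (g_{ij}(\theta))$ is called the Fisher information matrix and, following normal tensorial convention, its inverse is denoted by $g^{-1}(\theta) = (g^{ij}(\theta))$.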
In the following we choose $S$ to be an exponential family, i.e.,

$$S = \left\{ p(x, \theta) = \exp\left(\theta^\top c(x) - \psi(\theta)\right) : \theta \in \Theta \right\},$$

where $c_1, \ldots, c_n$ are scalar functions such that $\{1, c_1, \ldots, c_n\}$ are linearly independent. We also assume that $\Theta \subseteq \Theta_0$, where

$$\Theta_0 = \left\{ \theta \in \mathbb{R}^n : \psi(\theta) \triangleq \log \int e^{\theta^\top c(x)}\, dx < \infty \right\}$$

and that $\Theta_0$ has non-empty interior. Let $X$ and $Y$ be the solution of the following system of SDEs,

$$dX_t = f(t, X_t)\, dt + \sigma(t, X_t)\, dW_t, \qquad dY_t = h(t, X_t)\, dt + dV_t.$$

The density $\pi_t(z)$ of the conditional distribution of the signal satisfies the Stratonovich SDE,

$$d\pi_t(z) = A^* \pi_t(z)\, dt - \tfrac{1}{2} \pi_t(z) \left( \|h(z)\|^2 - \pi_t(\|h\|^2) \right) dt + \pi_t(z) \left( h^\top(z) - \pi_t(h^\top) \right) \circ dY_t, \tag{8.13}$$
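where $\circ$ is used to denote Stratonovich integration and $A^*$ is the operator which is the formal adjoint of $A$,

$$A^* \varphi \triangleq - \sum_{i=1}^{d} \frac{\partial}{\partial x_i} (f^i \varphi) + \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2}{\partial x_i \partial x_j} \left( \varphi \sum_{k=1}^{d} \sigma_{ik} \sigma_{jk} \right).$$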
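By using the Stratonovich chain rule, we get from (8.13) that

$$d\sqrt{\pi_t} = \frac{1}{2\sqrt{\pi_t}} \circ d\pi_t = R_t(\sqrt{\pi_t})\, dt - Q_t^0(\sqrt{\pi_t})\, dt + \sum_{k=1}^{m} Q_t^k(\sqrt{\pi_t}) \circ dY_t^k,$$

where $R_t$ and $\{Q_t^k\}_{k=0}^{m}$ are the following non-linear time-dependent operators

$$R_t(\sqrt{p}) \triangleq \frac{A^* p}{2\sqrt{p}}, \qquad Q_t^0(\sqrt{p}) \triangleq \frac{\sqrt{p}}{4} \left( \|h\|^2 - \pi_t(\|h\|^2) \right), \qquad Q_t^k(\sqrt{p}) \triangleq \frac{\sqrt{p}}{2} \left( h^k - \pi_t(h^k) \right).$$

Assume now that for all $\theta \in \Theta$ and all $t \ge 0$

$$E_{p(\cdot, \theta)} \left[ \left| \frac{A^* p(\cdot, \theta)}{p(\cdot, \theta)} \right|^2 \right] < \infty$$

and $E_{p(\cdot, \theta)}[|h|^4] < \infty$. This implies that $R_t(\sqrt{p(\cdot, \theta)})$ and $Q_t^k(\sqrt{p(\cdot, \theta)})$, for $k = 0, 1, \ldots, m$, are vectors in $L^2(\mathbb{R}^d)$. We define the exponential projection filter for the exponential family $S$ to be the solution of the stochastic differential equation

$$d\sqrt{p(\cdot, \theta_t)} = \Lambda_{\theta_t} \circ R_t(\sqrt{p(\cdot, \theta_t)})\, dt - \Lambda_{\theta_t} \circ Q_t^0(\sqrt{p(\cdot, \theta_t)})\, dt + \sum_{k=1}^{m} \Lambda_{\theta_t} \circ Q_t^k(\sqrt{p(\cdot, \theta_t)}) \circ dY_t^k,$$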
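where $\Lambda_{\theta_t}$ is the orthogonal projection $\Lambda_\theta : L^2 \to L_{\sqrt{p(\cdot, \theta)}} S^{1/2}$,

$$v \mapsto \sum_{i=1}^{n} \sum_{j=1}^{n} 4 g^{ij}(\theta) \left\langle v, \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_j} \right\rangle \frac{\partial \sqrt{p(\cdot, \theta)}}{\partial \theta_i},$$

evaluated at $\theta = \theta_t$.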
In other words, $\sqrt{p(\cdot, \theta_t)}$ satisfies a differential equation whose driving vector fields are the projections of the corresponding vector fields appearing in the equation satisfied by $\sqrt{\pi_t}$ onto the tangent space of the manifold $S^{1/2}$, and therefore $p(\cdot, \theta_t)$ is a natural candidate for an approximation of the conditional distribution of the signal at time $t$, when the approximation is sought among the elements of $S$. One can prove that for the exponential family $p(x, \theta) = \exp\left(\theta^\top c(x) - \psi(\theta)\right)$, the projection filter density $R_t^\pi$ is equal to $p(\cdot, \theta_t)$, where the parameter $\theta_t$ satisfies the stochastic differential equation

$$d\theta_t = g^{-1}(\theta_t) \left( \bar{E}\left[ Ac - \tfrac{1}{2} \|h\|^2 (c - \bar{E}[c]) \right] dt + \sum_{k=1}^{m} \bar{E}\left[ h_t^k (c - \bar{E}[c]) \right] \circ dY_t^k \right), \tag{8.14}$$

where $\bar{E}[\cdot] = E_{p(\cdot, \theta_t)}[\cdot]$. Therefore, in order to approximate $\pi_t$, solve (8.14) and then compute the density corresponding to its solution.

Example 8.7. We consider the cubic sensor, i.e., the following problem

$$dX_t = \sigma\, dW_t, \qquad dY_t = X_t^3\, dt + dV_t.$$

We choose now $S$ to be the following family of densities

$$S = \left\{ p(x, \theta) = \exp\left( \sum_{i=1}^{6} \theta_i x^i - \psi(\theta) \right) : \theta \in \Theta \subset \mathbb{R}^6,\ \theta_6 < 0 \right\}.$$
Let $\eta_k(\theta)$ be the $k$th moment of the probability with density $p(\cdot, \theta)$, i.e., $\eta_k(\theta) \triangleq \int_{-\infty}^{\infty} x^k p(x, \theta)\, dx$; clearly $\eta_0(\theta) = 1$. Integrating the derivative of $x^{i+1} p(x, \theta)$ over $\mathbb{R}$ shows that the following recurrence relation holds

$$\eta_{6+i}(\theta) = - \frac{1}{6\theta_6} \left[ (i + 1)\eta_i(\theta) + \sum_{j=1}^{5} j\, \theta_j\, \eta_{i+j}(\theta) \right], \qquad i \ge 0,$$

and therefore we only need to compute $\eta_1(\theta), \ldots, \eta_5(\theta)$ in order to compute all the moments. The entries of the Fisher information matrix $g_{ij}(\theta)$ are given by

$$g_{ij}(\theta) = \frac{\partial^2 \psi(\theta)}{\partial \theta_i \partial \theta_j} = \eta_{i+j}(\theta) - \eta_i(\theta)\eta_j(\theta)$$

and (8.14) reduces to the SDE,

$$d\theta_t = g^{-1}(\theta_t)\gamma_\bullet(\theta_t)\, dt - \lambda_\bullet^0\, dt + \lambda_\bullet\, dY_t,$$

where

$$\lambda_\bullet^0 = (0, 0, 0, 0, 0, 1/2)^\top, \qquad \lambda_\bullet = (0, 0, 1, 0, 0, 0)^\top,$$

$$\gamma_\bullet = \tfrac{1}{2} \sigma^2 \left( 0,\ 2\eta_0(\theta),\ 6\eta_1(\theta),\ 12\eta_2(\theta),\ 20\eta_3(\theta),\ 30\eta_4(\theta) \right)^\top.$$

See Brigo et al. [24] for details of the numerical implementation of the projection filter in this case.

The idea of fixing the form of the approximating conditional density and then evolving it by imposing appropriate constraints on the parameters was first introduced by Kushner in 1967 (see [177]). In [183], the same method is used to produce approximations for the filtering problem with a continuous time signal and discrete time observations.
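The moment recurrence and the Fisher information matrix of Example 8.7 can be checked numerically. The sketch below is an illustration only: it computes the moments by quadrature on a truncated grid (an assumption; the text does not prescribe how $\eta_1, \ldots, \eta_5$ are obtained), uses the recurrence in the integration-by-parts form $(i+1)\eta_i + \sum_{j=1}^{6} j\theta_j\eta_{i+j} = 0$ solved for $\eta_{6+i}$, and the parameter vector `theta` is an arbitrary hypothetical choice with $\theta_6 < 0$.

```python
import numpy as np

def moments(theta, kmax, x=None):
    """eta_0, ..., eta_kmax of p(x, theta) = exp(sum_i theta_i x^i - psi(theta)),
    computed by a Riemann sum on a truncated uniform grid (illustrative only)."""
    if x is None:
        x = np.linspace(-6.0, 6.0, 24001)
    theta = np.asarray(theta, dtype=float)
    powers = np.arange(1, 7)
    log_p = theta @ x[None, :] ** powers[:, None]   # sum_i theta_i x^i
    w = np.exp(log_p - log_p.max())
    dx = x[1] - x[0]
    w /= w.sum() * dx                               # plays the role of exp(-psi(theta))
    return np.array([np.sum(x**k * w) * dx for k in range(kmax + 1)])

def recurrence(eta, theta, i):
    """Right-hand side of the recurrence giving eta_{6+i}."""
    j = np.arange(1, 6)
    return -((i + 1) * eta[i] + np.sum(j * theta[:5] * eta[i + j])) / (6.0 * theta[5])

def fisher_matrix(eta):
    """g_ij = eta_{i+j} - eta_i eta_j for i, j = 1, ..., 6 (needs eta up to order 12)."""
    idx = np.arange(1, 7)
    return eta[idx[:, None] + idx[None, :]] - np.outer(eta[idx], eta[idx])

theta = np.array([0.3, -0.5, 0.1, 0.2, 0.0, -0.1])  # hypothetical parameter, theta_6 < 0
eta = moments(theta, 12)
g = fisher_matrix(eta)
```

Comparing `eta[6 + i]` with `recurrence(eta, theta, i)` for $i = 0, \ldots, 6$ gives a quick consistency check of both the quadrature and the recurrence.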
8.4 The Spectral Approach

The spectral approach for the numerical estimation of the conditional distribution of the signal was introduced by Lototsky, Mikulevicius and Rozovskii in 1997 (see [197] for details). Further developments on spectral methods can be found in [195, 198, 199]. For a recent survey see [196]. This section follows closely the original approach and the results contained in [197] (see also [208]).

Let us begin by recalling from Chapter 7 that $p_t(z)$, the density of the unnormalised conditional distribution of the signal, is the (unique) solution of the stochastic partial differential equation (7.20),

$$p_t(x) = p_0(x) + \int_0^t A^* p_s(x)\, ds + \int_0^t h^\top(x) p_s(x)\, dY_s,$$

in a suitably chosen function space (e.g. $L_k^2(\mathbb{R}^d)$). The spectral approach is based on decomposing $p_t$ into a sum of the form

$$p_t(z) = \sum_{\alpha} \frac{1}{\sqrt{\alpha!}}\, \varphi_\alpha(t, z)\, \xi_\alpha(Y), \tag{8.15}$$
where $\xi_\alpha(Y)$ are certain polynomials (see below) of Wiener integrals with respect to $Y$ and $\varphi_\alpha(t, z)$ are deterministic Hermite–Fourier coefficients in the Cameron–Martin orthogonal decomposition of $p_t(z)$. This expansion separates the parameters from the observations: the Hermite–Fourier coefficients are determined only by the coefficients of the signal process, its initial distribution and the observation function $h$, whereas the polynomials $\xi_\alpha(Y)$ are completely determined by the observation process.

A collection $\alpha = (\alpha_k^l)_{1 \le l \le d,\, k \ge 1}$ of nonnegative integers is called a $d$-dimensional multi-index if only finitely many of the $\alpha_k^l$ are different from zero. Let $J$ be the set of all $d$-dimensional multi-indices. For $\alpha \in J$ we define:

$$|\alpha| \triangleq \sum_{l,k} \alpha_k^l \quad \text{(the length of } \alpha\text{)},$$

$$d(\alpha) \triangleq \max\left\{ k \ge 1 : \alpha_k^l > 0 \text{ for some } 1 \le l \le d \right\} \quad \text{(the order of } \alpha\text{)},$$

$$\alpha! \triangleq \prod_{k,l} \alpha_k^l!.$$
Let $\{m_k\} = \{m_k(s)\}_{k \ge 1}$ be an orthonormal system in the space $L^2([0, t])$ and $\xi_{k,l}$ be the following random variables

$$\xi_{k,l} = \int_0^t m_k(s)\, dY^l(s).$$

Under the new probability measure $\tilde{\mathbb{P}}$, the $\xi_{k,l}$ are i.i.d. Gaussian random variables (as $Y$ is a standard Brownian motion under $\tilde{\mathbb{P}}$). Let also $(H_n)_{n \ge 1}$ be the Hermite polynomials

$$H_n(x) \triangleq (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}$$

and $(\xi_\alpha)_\alpha$ be the Wick polynomials

$$\xi_\alpha \triangleq \prod_{k,l} \frac{H_{\alpha_k^l}(\xi_{k,l})}{\sqrt{\alpha_k^l!}}.$$

Then $(\xi_\alpha)_\alpha$ form a complete orthonormal system in $L^2(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$. Their corresponding coefficients in the expansion (8.15) satisfy the following system of deterministic partial differential equations

$$\frac{d\varphi_\alpha(t, z)}{dt} = A^* \varphi_\alpha(t, z) + \sum_{k,l} \alpha_k^l\, m_k(t)\, h^l(z)\, \varphi_{\alpha(k,l)}(t, z), \qquad \varphi_\alpha(0, z) = \pi_0(z) 1_{\{|\alpha| = 0\}}, \tag{8.16}$$
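where $\alpha = (\alpha_k^l)_{1 \le l \le d,\, k \ge 1} \in J$ and $\alpha(i, j)$ stands for the multi-index $(\tilde{\alpha}_k^l)_{1 \le l \le d,\, k \ge 1}$ with

$$\tilde{\alpha}_k^l = \begin{cases} \alpha_k^l & \text{if } k \ne i \text{ or } l \ne j \text{ or both}, \\ \max(0, \alpha_i^j - 1) & \text{if } k = i \text{ and } l = j. \end{cases}$$

Theorem 8.8. Under certain technical assumptions (given in Lototsky et al. [197]), the series

$$\sum_{\alpha} \frac{1}{\sqrt{\alpha!}}\, \varphi_\alpha(t, z)\, \xi_\alpha$$

converges in $L^2(\Omega, \tilde{\mathbb{P}})$ and in $L^1(\Omega, \mathbb{P})$ and we have

$$p_t(z) = \sum_{\alpha} \frac{1}{\sqrt{\alpha!}}\, \varphi_\alpha(t, z)\, \xi_\alpha, \qquad \mathbb{P}\text{-a.s.} \tag{8.17}$$

Also the following Parseval's equality holds

$$\tilde{\mathbb{E}}[|p_t(z)|^2] = \sum_{\alpha} \frac{1}{\alpha!} |\varphi_\alpha(t, z)|^2.$$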
For computational purposes one needs to truncate the sum in the expansion of $p_t$. Let $J_N^n$ be the following finite set of indices

$$J_N^n = \{\alpha : |\alpha| \le N,\ d(\alpha) \le n\}$$

and choose the following deterministic basis

$$m_1(s) = \frac{1}{\sqrt{t}}; \qquad m_k(s) = \sqrt{\frac{2}{t}} \cos\left( \frac{\pi (k - 1) s}{t} \right), \qquad k \ge 2,\ 0 \le s \le t.$$
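Then, again under some technical assumptions, we have the following.

Theorem 8.9. If $p_t^{n,N}(z) \triangleq \sum_{\alpha \in J_N^n} (1/\sqrt{\alpha!})\, \varphi_\alpha(t, z)\, \xi_\alpha$, then

$$\tilde{\mathbb{E}}\left[ \|p_t^{n,N} - p_t\|_{L^2}^2 \right] \le \frac{C_t^1}{(N + 1)!} + \frac{C_t^2}{n},$$

$$\sup_{z \in \mathbb{R}^d} \tilde{\mathbb{E}}\left[ |p_t^{n,N}(z) - p_t(z)|^2 \right] \le \frac{\bar{C}_t^1}{(N + 1)!} + \frac{\bar{C}_t^2}{n},$$

where the constants $C_t^1$, $C_t^2$, $\bar{C}_t^1$, and $\bar{C}_t^2$ are independent of $n$ and $N$.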
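The ingredients of the truncated expansion that depend only on the observations can be sketched in a few lines. The following is an illustration under simplifying assumptions, not the scheme of [197]: it takes a scalar observation (so a multi-index is just a tuple $(\alpha_1, \ldots, \alpha_n)$), uses the cosine basis with $m_1 = 1/\sqrt{t}$, and evaluates $H_n$ through NumPy's probabilists' Hermite polynomials, which coincide with the definition of $H_n$ used in this section. All function names are hypothetical, and the deterministic coefficients $\varphi_\alpha$ (which require a PDE solver) are not computed.

```python
import itertools
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def cosine_basis(k, s, t):
    """m_k(s) on [0, t]: constant for k = 1, cosine for k >= 2."""
    if k == 1:
        return np.full_like(s, 1.0 / np.sqrt(t))
    return np.sqrt(2.0 / t) * np.cos(np.pi * (k - 1) * s / t)

def hermite(n, x):
    """H_n(x) = (-1)^n exp(x^2/2) d^n/dx^n exp(-x^2/2) (probabilists' Hermite)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermeval(x, coeffs)

def index_set(n, N):
    """J_N^n for a scalar observation: tuples of length n with |alpha| <= N."""
    return [a for a in itertools.product(range(N + 1), repeat=n) if sum(a) <= N]

def wick_polynomial(alpha, xi):
    """xi_alpha = prod_k H_{alpha_k}(xi_k) / sqrt(alpha_k!)."""
    out = 1.0
    for a_k, xi_k in zip(alpha, xi):
        out *= hermite(a_k, xi_k) / math.sqrt(math.factorial(a_k))
    return out

# Wiener integrals xi_k approximated by left-point sums over simulated increments.
rng = np.random.default_rng(0)
t, steps, n = 1.0, 1000, 3
s = np.linspace(0.0, t, steps, endpoint=False)
dY = rng.normal(0.0, np.sqrt(t / steps), steps)   # Brownian increments (law under P-tilde)
xi = [np.sum(cosine_basis(k, s, t) * dY) for k in range(1, n + 1)]
features = {alpha: wick_polynomial(alpha, xi) for alpha in index_set(n, 2)}
```

The size of `index_set(n, N)` grows like $\binom{n+N}{N}$ already for a scalar observation, which gives a feel for the cost of increasing $n$ and $N$ in Theorem 8.9.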
One can also construct a recursive version of the expansion (8.17) (see [197] for a discussion of the method based on the above approximation). Let $0 = t_0 < t_1 < \cdots < t_M = T$ be a uniform partition of the interval $[0, T]$ with step $\Delta$ ($t_i = i\Delta$, $i = 0, \ldots, M$). Let $m_k^i = \{m_k^i(s)\}$ be a complete orthonormal system in $L^2([t_{i-1}, t_i])$. We define the random variables

$$\xi_{k,l}^i = \int_{t_{i-1}}^{t_i} m_k^i(s)\, dY^l(s), \qquad \xi_\alpha^i = \prod_{k,l} \frac{H_{\alpha_k^l}(\xi_{k,l}^i)}{\sqrt{\alpha_k^l!}},$$

where $H_n$ is the $n$th Hermite polynomial. Consider the following system of deterministic partial differential equations

$$\frac{d\varphi_\alpha^i(t, z, g)}{dt} = A^* \varphi_\alpha^i(t, z, g) + \sum_{k,l} \alpha_k^l\, m_k^i(t)\, h^l(z)\, \varphi_{\alpha(k,l)}^i(t, z, g), \qquad t \in [t_{i-1}, t_i], \tag{8.18}$$

$$\varphi_\alpha^i(t_{i-1}, z, g) = g(z) 1_{\{|\alpha| = 0\}}.$$

We observe that, for each $i = 1, \ldots, M$, the system (8.18) is similar to (8.16), the difference being that the initial time is no longer zero and we allow for an arbitrary initial condition which may be different for different $i$. The following is the recursive version of Theorem 8.8.

Theorem 8.10. If $p_0(z) = \pi_0(z)$, then for each $z \in \mathbb{R}^d$ and each $t_i$, $i = 1, \ldots, M$, the unnormalised conditional distribution of the signal is given by

$$p_{t_i}(z) = \sum_{\alpha} \frac{1}{\sqrt{\alpha!}}\, \varphi_\alpha^i(t_i, z, p_{t_{i-1}}(\cdot))\, \xi_\alpha^i \qquad (\mathbb{P}\text{-a.s.}). \tag{8.19}$$

The series converges in $L^2(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$ and $L^1(\Omega, \mathcal{Y}_t, \mathbb{P})$ and the following Parseval's equality holds,

$$\tilde{\mathbb{E}}[|p_{t_i}(z)|^2] = \sum_{\alpha} \frac{1}{\alpha!} |\varphi_\alpha^i(t_i, z, p_{t_{i-1}}(\cdot))|^2.$$
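For computational purposes we truncate (8.19). We introduce the following basis

$$m_k^i(t) = m_k(t - t_{i-1}), \qquad t_{i-1} \le t \le t_i,$$

$$m_1(t) = \frac{1}{\sqrt{\Delta}}, \qquad m_k(t) = \sqrt{\frac{2}{\Delta}} \cos\left( \frac{\pi (k - 1) t}{\Delta} \right), \quad k \ge 2,\ t \in [0, \Delta],$$

$$m_k(t) = 0, \qquad k \ge 1,\ t \notin [0, \Delta].$$

Theorem 8.11. If $p_0^{n,N}(z) = \pi_0(z)$ and

$$p_{t_i}^{n,N}(z) = \sum_{\alpha \in J_N^n} \frac{1}{\sqrt{\alpha!}}\, \varphi_\alpha^i(\Delta, z)\, \xi_\alpha^i,$$

where $\varphi_\alpha^i(\Delta, z)$ are the solutions of the system

$$\frac{d\varphi_\alpha^i(t, z)}{dt} = A^* \varphi_\alpha^i(t, z) + \sum_{k,l} \alpha_k^l\, m_k(t)\, h^l(z)\, \varphi_{\alpha(k,l)}^i(t, z), \qquad t \in [0, \Delta],$$

$$\varphi_\alpha^i(0, z) = p_{t_{i-1}}^{n,N}(z) 1_{\{|\alpha| = 0\}},$$

then

$$\max_{1 \le i \le M} \tilde{\mathbb{E}}\left[ \|p_{t_i}^{n,N} - p_{t_i}\|_{L^2}^2 \right] \le B e^{BT} \left( \frac{(C\Delta)^N}{(N + 1)!} + \frac{\Delta^2}{n} \right),$$

$$\max_{1 \le i \le M} \sup_{z} \tilde{\mathbb{E}}\left[ |p_{t_i}^{n,N}(z) - p_{t_i}(z)|^2 \right] \le \bar{B} e^{\bar{B}T} \left( \frac{(\bar{C}\Delta)^N}{(N + 1)!} + \frac{\Delta^2}{n} \right),$$

where the constants $B$, $C$, $\bar{B}$ and $\bar{C}$ are independent of $n$, $N$, $\Delta$ and $T$.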
8.5 Partial Differential Equations Methods

This type of method uses the fact that $p_t(z)$, the density of the unnormalised conditional distribution of the signal, is the solution of a partial differential equation, albeit a stochastic one. Therefore classical PDE methods may be applied to this stochastic PDE to obtain an approximation to the density $p_t$. These methods are very successful in low-dimensional problems, but cannot be applied in high-dimensional problems as they require the use of a space grid whose size increases exponentially with the dimension of the state space of the signal. This section follows closely the description of the method given in Cai et al. [37].

The first step is to apply the splitting-up algorithm (see [186, 187] for results and details) to the Zakai equation

$$dp_t(z) = A^* p_t(z)\, dt + p_t(z) h^\top(z)\, dY_t.$$

Let $0 = t_0 < t_1 < \cdots < t_n < \cdots$ be a uniform partition of the interval $[0, \infty)$ with time step $\Delta = t_n - t_{n-1}$. Then the density $p_{t_n}(z)$ will be approximated by $p_n^\Delta(z)$, where the transition from $p_{n-1}^\Delta(z)$ to $p_n^\Delta(z)$ is divided into the following two steps.

• The first step, called the prediction step, consists in solving the following Fokker–Planck equation for the time interval $[t_{n-1}, t_n]$,

$$\frac{\partial p_t^n}{\partial t} = A^* p_t^n, \qquad p_{t_{n-1}}^n = p_{n-1}^\Delta,$$

and we denote the prior estimate by $\bar{p}_n^\Delta \triangleq p_{t_n}^n$. The Fokker–Planck equation is solved by using the implicit Euler scheme, i.e., we solve

$$\bar{p}_n^\Delta - \Delta A^* \bar{p}_n^\Delta = p_{n-1}^\Delta. \tag{8.20}$$

• The second step, called the correction step, uses the new observation $Y_{t_n}$ to update $\bar{p}_n^\Delta$. Define

$$z_n^\Delta \triangleq \frac{1}{\Delta} \left( Y_{t_n} - Y_{t_{n-1}} \right) = \frac{1}{\Delta} \int_{t_{n-1}}^{t_n} h(X_s)\, ds + \frac{1}{\Delta} \left( W_{t_n} - W_{t_{n-1}} \right).$$

Using the Kallianpur–Striebel formula, define $p_n^\Delta(z)$ for $z \in \mathbb{R}^d$ as

$$p_n^\Delta(z) \triangleq c_n \psi_n^\Delta(z) \bar{p}_n^\Delta(z),$$

where $\psi_n^\Delta(z) \triangleq \exp\left( -\tfrac{1}{2} \Delta \|z_n^\Delta - h(z)\|^2 \right)$ and $c_n$ is a normalisation constant chosen such that

$$\int_{\mathbb{R}^d} p_n^\Delta(z)\, dz = 1.$$
Assume that the infinitesimal generator of the signal is the following second-order differential operator

$$A = \sum_{i,j=1}^{d} a_{ij}(\cdot) \frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i=1}^{d} f_i(\cdot) \frac{\partial}{\partial x_i}.$$
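We can approximate the solution to equation (8.20) by using a finite difference scheme on a given $d$-dimensional regular grid $\Omega^h$ with mesh $h = (h_1, \ldots, h_d)$ in order to approximate the differential operator $A$. The scheme approximates first-order derivatives evaluated at $x$ as ($e_i$ is the unit vector in the $i$th coordinate)

$$\left. \frac{\partial \varphi}{\partial x_i} \right|_x \simeq \begin{cases} \dfrac{\varphi(x + e_i h_i) - \varphi(x)}{h_i} & \text{if } f_i(x) \ge 0, \\[2ex] \dfrac{\varphi(x) - \varphi(x - e_i h_i)}{h_i} & \text{if } f_i(x) < 0, \end{cases}$$

and the second-order derivatives as

$$\left. \frac{\partial^2 \varphi}{\partial x_i^2} \right|_x \simeq \frac{\varphi(x + e_i h_i) - 2\varphi(x) + \varphi(x - e_i h_i)}{h_i^2}$$

and

$$\left. \frac{\partial^2 \varphi}{\partial x_i \partial x_j} \right|_x \simeq \begin{cases} \dfrac{1}{2h_i} \left[ \dfrac{\varphi(x + e_i h_i + e_j h_j) - \varphi(x + e_i h_i)}{h_j} - \dfrac{\varphi(x + e_j h_j) - \varphi(x)}{h_j} \right. \\[2ex] \quad \left. {} + \dfrac{\varphi(x) - \varphi(x - e_j h_j)}{h_j} - \dfrac{\varphi(x - e_i h_i) - \varphi(x - e_i h_i - e_j h_j)}{h_j} \right] & \text{if } a_{ij} \ge 0, \\[3ex] \dfrac{1}{2h_i} \left[ \dfrac{\varphi(x + e_i h_i) - \varphi(x + e_i h_i - e_j h_j)}{h_j} - \dfrac{\varphi(x) - \varphi(x - e_j h_j)}{h_j} \right. \\[2ex] \quad \left. {} + \dfrac{\varphi(x + e_j h_j) - \varphi(x)}{h_j} - \dfrac{\varphi(x - e_i h_i + e_j h_j) - \varphi(x - e_i h_i)}{h_j} \right] & \text{if } a_{ij} < 0. \end{cases}$$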
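For each grid point $x \in \Omega^h$ define the set $V^h(x)$ to be the set of points accessible from $x$, that is,

$$V^h(x) \triangleq \{x + \varepsilon_i e_i h_i + \varepsilon_j e_j h_j,\ \forall\, \varepsilon_i, \varepsilon_j \in \{-1, 0, +1\},\ i \ne j\}$$

and the set $N^h(x) \supset V^h(x)$ to be the set of nearest neighbors of $x$, including $x$ itself,

$$N^h(x) \triangleq \{x + \varepsilon_1 e_1 h_1 + \cdots + \varepsilon_d e_d h_d,\ \forall\, \varepsilon_1, \ldots, \varepsilon_d \in \{-1, 0, +1\}\}.$$

The operator $A$ is approximated by $A^h$, where $A^h$ is the operator

$$A^h \varphi(x) \triangleq \sum_{y \in V^h(x)} A^h(x, y) \varphi(y)$$

with coefficients† given for each $x \in \Omega^h$ by

$$A^h(x, x) = - \sum_{i=1}^{d} \left[ \frac{1}{h_i^2} a_{ii}(x) - \sum_{j : j \ne i} \frac{1}{2 h_i h_j} |a_{ij}(x)| \right] - \sum_{i=1}^{d} \frac{1}{h_i} |f_i(x)|,$$

$$A^h(x, x \pm e_i h_i) = \frac{1}{2 h_i^2} a_{ii}(x) - \sum_{j : j \ne i} \frac{1}{2 h_i h_j} |a_{ij}(x)| + \frac{1}{h_i} f_i^{\pm}(x),$$

$$A^h(x, x + e_i h_i \pm e_j h_j) = \frac{1}{2 h_i h_j} a_{ij}^{\pm}(x),$$

$$A^h(x, x - e_i h_i \mp e_j h_j) = \frac{1}{2 h_i h_j} a_{ij}^{\pm}(x),$$

$$A^h(x, y) = 0, \qquad \text{otherwise},$$

for all $i, j = 1, \ldots, d$, $i \ne j$. One can check that, for all $x \in \bar{\Omega}^h$, where

$$\bar{\Omega}^h \triangleq \bigcup_{x \in \Omega^h} N^h(x),$$

it holds that

$$\sum_{y \in V^h(x)} A^h(x, y) = 0.$$

If for all $x \in \mathbb{R}^d$ and $i = 1, \ldots, d$, the condition

$$\frac{1}{h_i^2} a_{ii}(x) - \sum_{j : j \ne i} \frac{1}{2 h_i h_j} |a_{ij}(x)| \ge 0 \tag{8.21}$$

is satisfied then

$$A^h(x, x) \le 0, \qquad A^h(x, y) \ge 0 \quad \forall x \in \Omega^h,\ \forall y \in V^h(x) \setminus \{x\}.$$

† The notation $x^+$ denotes $\max(x, 0)$ and $x^-$ denotes $\max(-x, 0)$, so that $x^+ + x^- = |x|$ and the coefficients above sum to zero.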
Condition (8.21) ensures that $A^h$ can be interpreted as the generator of a pure jump Markov process taking values in the discretisation grid $\Omega^h$. As a consequence the solution $\bar{p}_n^\Delta$ of the resulting approximation of the Fokker–Planck equation will always be a discrete probability distribution.

For recent results regarding the splitting-up algorithm see the work of Gyöngy and Krylov in [118, 119]. The method described above can be refined to permit better approximations of $p_t$ by using composite or adaptive grids (see Cai et al. [37] for details). See also Kushner and Dupuis [181], Lototsky et al. [194], Sun and Glowinski [263], Beneš [9] and Florchinger and Le Gland [101] for related results. For a general framework for proving convergence results for this class of methods, see Chapter 7 of the monograph by Kushner [182] and the references contained therein. See also Kushner and Huang [184] for further convergence results.
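The prediction and correction steps can be sketched in one space dimension as follows. This is a minimal illustration under assumptions that are not taken from the text or from Cai et al. [37]: the generator is written as $A = a(x)\,\partial_x^2 + f(x)\,\partial_x$ and discretised by a standard upwind scheme with jump rates $a/h^2 + f^{\pm}/h$ (normalisation conventions for $a$ may differ from those above), jumps leaving the grid are suppressed at the two end points, and the linear system of the implicit Euler step is solved densely. The function names are hypothetical.

```python
import numpy as np

def generator_matrix(x, f, a):
    """Upwind finite-difference generator on the uniform grid x (rows sum to zero)."""
    h = x[1] - x[0]
    fx, ax = f(x), a(x)
    up = ax / h**2 + np.maximum(fx, 0.0) / h      # rate of the jump x -> x + h
    down = ax / h**2 + np.maximum(-fx, 0.0) / h   # rate of the jump x -> x - h
    up[-1] = 0.0                                  # no jumps out of the grid
    down[0] = 0.0
    A = np.diag(up[:-1], 1) + np.diag(down[1:], -1)
    A -= np.diag(A.sum(axis=1))                   # diagonal chosen so rows sum to zero
    return A

def splitting_up_step(p, A, dt, dy, hx):
    """One prediction/correction step for a probability vector p on the grid.

    dy is the observation increment Y_{t_n} - Y_{t_{n-1}}, hx the sensor
    function evaluated on the grid.
    """
    # prediction: implicit Euler for the Fokker-Planck equation, (I - dt A^T) p_bar = p
    prior = np.linalg.solve(np.eye(len(p)) - dt * A.T, p)
    # correction: multiply by psi = exp(-dt/2 (dy/dt - h)^2), then normalise
    log_psi = -0.5 * dt * (dy / dt - hx) ** 2
    post = prior * np.exp(log_psi - log_psi.max())
    return prior, post / post.sum()

# Example: Ornstein-Uhlenbeck-type signal with a cubic sensor, one step.
x = np.linspace(-3.0, 3.0, 61)
A = generator_matrix(x, lambda s: -s, lambda s: np.full_like(s, 0.5))
p0 = np.full(len(x), 1.0 / len(x))
prior, post = splitting_up_step(p0, A, dt=0.01, dy=0.02, hx=x**3)
```

Because the off-diagonal rates are nonnegative and the rows sum to zero, the prediction step preserves both nonnegativity and total mass, which is the discrete counterpart of the remark following condition (8.21).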
8.6 Particle Methods

Particle methods† are algorithms which approximate the stochastic process $\pi_t$ with discrete random measures of the form

$$\sum_i a_i(t) \delta_{v_i(t)},$$

in other words, with empirical distributions associated with sets of randomly located particles of stochastic masses $a_1(t), a_2(t), \ldots$, which have stochastic positions $v_1(t), v_2(t), \ldots$ where $v_i(t) \in S$. Particle methods are currently among the most successful and versatile methods for numerically solving the filtering problem and are discussed in depth in the following two chapters.

The basis of this class of numerical method is the representation of $\pi_t$ given by the Kallianpur–Striebel formula (3.33). That is, for any $\varphi$ a bounded Borel-measurable function, we have

$$\pi_t(\varphi) = \frac{\rho_t(\varphi)}{\rho_t(1)},$$

where $\rho_t$ is the unnormalised conditional distribution of $X_t$,

$$\rho_t(\varphi) = \tilde{\mathbb{E}}\left[ \varphi(X_t) \tilde{Z}_t \mid \mathcal{Y}_t \right], \tag{8.22}$$

and

$$\tilde{Z}_t = \exp\left( \int_0^t h(X_s)^\top\, dY_s - \frac{1}{2} \int_0^t \|h(X_s)\|^2\, ds \right).$$

† Also known as particle filters or sequential Monte Carlo methods.
The expectation in (8.22) is taken with respect to the probability measure $\tilde{\mathbb{P}}$ under which the process $Y$ is a Brownian motion independent of $X$ (see Section 3.3 for details).

One can then use a Monte Carlo approximation for $\tilde{\mathbb{E}}[\varphi(X_t)\tilde{Z}_t \mid \mathcal{Y}_t]$. That is, a large number of independent realisations of the signal are produced (say $n$) and, for each of them, the corresponding expression $\varphi(X_t)\tilde{Z}_t$ is computed. Then, by taking the average of all the resulting values, one obtains an approximation of $\tilde{\mathbb{E}}[\varphi(X_t)\tilde{Z}_t \mid \mathcal{Y}_t]$. To be more precise, let $v_j$, $j = 1, \ldots, n$ be $n$ mutually independent stochastic processes and independent of $Y$, each of them being a solution of the martingale problem for $(A, \pi_0)$. In other words the pairs $(v_j, Y)$, $j = 1, \ldots, n$ are identically distributed and have the same distribution as the pair $(X, Y)$ (under $\tilde{\mathbb{P}}$). Also let $a_j$, $j = 1, \ldots, n$ be the following exponential martingales

$$a_j(t) = 1 + \int_0^t a_j(s) h(v_j(s))^\top\, dY_s, \qquad t \ge 0. \tag{8.23}$$

In other words

$$a_j(t) = \exp\left( \int_0^t h(v_j(s))^\top\, dY_s - \frac{1}{2} \int_0^t \|h(v_j(s))\|^2\, ds \right), \qquad t \ge 0.$$

Hence, the triples $(v_j, a_j, Y)$, $j = 1, \ldots, n$ are identically distributed and have the same distribution as the triple $(X, \tilde{Z}, Y)$ (under $\tilde{\mathbb{P}}$).

Exercise 8.12. Show that the pairs $(v_j(t), a_j(t))$, $j = 1, \ldots, n$ are mutually independent conditional upon the observation σ-algebra $\mathcal{Y}_t$.

Let $\rho^n = \{\rho_t^n,\ t \ge 0\}$ and $\pi^n = \{\pi_t^n,\ t \ge 0\}$ be the following sequences of measure-valued processes

$$\rho_t^n \triangleq \frac{1}{n} \sum_{j=1}^{n} a_j(t) \delta_{v_j(t)}, \qquad t \ge 0, \tag{8.24}$$

$$\pi_t^n \triangleq \frac{\rho_t^n}{\rho_t^n(1)} = \sum_{j=1}^{n} \bar{a}_j^n(t) \delta_{v_j(t)}, \qquad t \ge 0, \tag{8.25}$$

where the normalised weights $\bar{a}_j^n$ have the form

$$\bar{a}_j^n(t) = \frac{a_j(t)}{\sum_{k=1}^{n} a_k(t)}, \qquad j = 1, \ldots, n,\ t \ge 0.$$

That is, $\rho_t^n$ is the empirical measure of $n$ (random) particles with positions $v_j(t)$, $j = 1, \ldots, n$ and weights $a_j(t)/n$, $j = 1, \ldots, n$ and $\pi_t^n$ is its normalised version. We have the following.
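Lemma 8.13. For any $\varphi \in B(S)$ we have

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^2 \mid \mathcal{Y}_t \right] = \frac{c_{1,\varphi}(t)}{n}, \tag{8.26}$$

where $c_{1,\varphi}(t) \triangleq \tilde{\mathbb{E}}[(\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi))^2 \mid \mathcal{Y}_t]$. Moreover

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^4 \mid \mathcal{Y}_t \right] \le \frac{c_{2,\varphi}(t)}{n^2}, \tag{8.27}$$

where $c_{2,\varphi}(t) \triangleq 6\, \tilde{\mathbb{E}}[(\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi))^4 \mid \mathcal{Y}_t]$.

Proof. Observe that since the triples $(v_j, a_j, Y)$, $j = 1, \ldots, n$ are identically distributed and have the same distribution as the triple $(X, \tilde{Z}, Y)$, we have for $j = 1, \ldots, n$,

$$\tilde{\mathbb{E}}\left[ \varphi(v_j(t)) a_j(t) \mid \mathcal{Y}_t \right] = \tilde{\mathbb{E}}\left[ \varphi(X_t)\tilde{Z}_t \mid \mathcal{Y}_t \right] = \rho_t(\varphi).$$

In particular $\tilde{\mathbb{E}}[\rho_t^n(\varphi) \mid \mathcal{Y}_t] = \rho_t(\varphi)$ and the random variables $\xi_j^\varphi$, $j = 1, \ldots, n$ defined by

$$\xi_j^\varphi \triangleq \varphi(v_j(t)) a_j(t) - \rho_t(\varphi), \qquad j = 1, \ldots, n,$$

have zero mean and the same distribution as $\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi)$. It then follows that

$$\frac{1}{n} \sum_{j=1}^{n} \xi_j^\varphi = \rho_t^n(\varphi) - \rho_t(\varphi).$$

Since the pairs $(v_i(t), a_i(t))$ and $(v_j(t), a_j(t))$ for $i \ne j$, conditional upon $\mathcal{Y}_t$, are independent, it follows that the random variables $\xi_j^\varphi$, $j = 1, \ldots, n$ are mutually independent conditional upon $\mathcal{Y}_t$. It follows immediately that

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^2 \mid \mathcal{Y}_t \right] = \frac{1}{n^2} \tilde{\mathbb{E}}\left[ \Big( \sum_{j=1}^{n} \xi_j^\varphi \Big)^2 \,\Big|\, \mathcal{Y}_t \right] = \frac{1}{n^2} \sum_{j=1}^{n} \tilde{\mathbb{E}}\left[ (\xi_j^\varphi)^2 \mid \mathcal{Y}_t \right]$$

$$= \frac{1}{n^2} \sum_{j=1}^{n} \tilde{\mathbb{E}}\left[ (\varphi(v_j(t)) a_j(t) - \rho_t(\varphi))^2 \mid \mathcal{Y}_t \right] = \frac{c_{1,\varphi}(t)}{n}.$$

Similarly, since the cross terms containing an odd power of some $\xi_j^\varphi$ vanish,

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^4 \mid \mathcal{Y}_t \right] = \frac{1}{n^4} \tilde{\mathbb{E}}\left[ \Big( \sum_{j=1}^{n} \xi_j^\varphi \Big)^4 \,\Big|\, \mathcal{Y}_t \right]$$

$$= \frac{1}{n^4} \sum_{j=1}^{n} \tilde{\mathbb{E}}\left[ (\xi_j^\varphi)^4 \mid \mathcal{Y}_t \right] + \frac{6}{n^4} \sum_{1 \le j_1 < j_2 \le n} \tilde{\mathbb{E}}\left[ (\xi_{j_1}^\varphi)^2 \mid \mathcal{Y}_t \right] \tilde{\mathbb{E}}\left[ (\xi_{j_2}^\varphi)^2 \mid \mathcal{Y}_t \right]$$

$$= \frac{\tilde{\mathbb{E}}[(\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi))^4 \mid \mathcal{Y}_t]}{n^3} + \frac{3n(n - 1)}{n^4} (c_{1,\varphi}(t))^2$$

and the claim follows since, by Jensen's inequality, we have

$$(c_{1,\varphi}(t))^2 \le \tilde{\mathbb{E}}[(\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi))^4 \mid \mathcal{Y}_t].$$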
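□

Remark 8.14. More generally one can prove that for any integer $p$ and any $\varphi \in B(S)$,

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^{2p} \mid \mathcal{Y}_t \right] \le \frac{c_{p,\varphi}(t)}{n^p}, \tag{8.28}$$

where

$$c_{p,\varphi}(t) = k_p\, \tilde{\mathbb{E}}\left[ (\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi))^{2p} \mid \mathcal{Y}_t \right], \tag{8.29}$$

and $k_p$ is some universal constant.

Of course, Lemma 8.13 and Remark 8.14 are of little use if the random variables $c_{p,\varphi}(t)$ are not finite a.s. In the following we assume that they are. Under this condition the lemma implies that $\rho_t^n(\varphi)$ converges in expectation to $\rho_t(\varphi)$ for any $\varphi \in B(S)$ with the rate of convergence of order $1/\sqrt{n}$.

Exercise 8.15. Let $c_{p,\varphi}(t)$ be the $\mathcal{Y}_t$-adapted random variable defined in (8.29). Show that if $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$, then $\tilde{\mathbb{E}}[c_{p,\varphi}(t)] < \infty$, hence the random variable $c_{p,\varphi}(t)$ is finite $\tilde{\mathbb{P}}$-almost surely for any $\varphi \in B(S)$. In particular, show that if the function $h$ is bounded, then $c_{p,\varphi}(t) < \infty$, $\tilde{\mathbb{P}}$-almost surely for any $\varphi \in B(S)$.

The convergence of $\rho_t^n(\varphi)$ to $\rho_t(\varphi)$ is valid for larger classes of function $\varphi$ (not just bounded functions) provided that $\varphi(X_t)\tilde{Z}_t$ is $\tilde{\mathbb{P}}$-integrable. Moreover, the existence of higher moments of $\varphi(X_t)\tilde{Z}_t$ ensures a control on the rate of convergence. However, in the following we restrict ourselves to just bounded test functions.

Proposition 8.16. If $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$, then for any $\varphi \in B(S)$, there exists a finite $\mathcal{Y}_t$-adapted random variable $\bar{c}_{p,\varphi}(t)$ such that

$$\tilde{\mathbb{E}}\left[ (\pi_t^n(\varphi) - \pi_t(\varphi))^{2p} \mid \mathcal{Y}_t \right] \le \frac{\bar{c}_{p,\varphi}(t)}{n^p}. \tag{8.30}$$

Proof. Observe that

$$\pi_t^n(\varphi) - \pi_t(\varphi) = \frac{\rho_t^n(\varphi)}{\rho_t^n(1)} \frac{1}{\rho_t(1)} \left( \rho_t(1) - \rho_t^n(1) \right) + \frac{1}{\rho_t(1)} \left( \rho_t^n(\varphi) - \rho_t(\varphi) \right),$$

hence, since $|\rho_t^n(\varphi)| \le \|\varphi\|_\infty \rho_t^n(1)$, we have

$$|\pi_t^n(\varphi) - \pi_t(\varphi)| \le \frac{\|\varphi\|_\infty}{\rho_t(1)} |\rho_t^n(1) - \rho_t(1)| + \frac{1}{\rho_t(1)} |\rho_t^n(\varphi) - \rho_t(\varphi)| \tag{8.31}$$

and, by the triangle inequality,

$$\tilde{\mathbb{E}}\left[ (\pi_t^n(\varphi) - \pi_t(\varphi))^{2p} \mid \mathcal{Y}_t \right]^{1/2p} \le \frac{\|\varphi\|_\infty}{\rho_t(1)} \tilde{\mathbb{E}}\left[ (\rho_t^n(1) - \rho_t(1))^{2p} \mid \mathcal{Y}_t \right]^{1/2p} + \frac{1}{\rho_t(1)} \tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^{2p} \mid \mathcal{Y}_t \right]^{1/2p}.$$

Remark 8.14 and Exercise 8.15 imply that for any $\varphi \in B(S)$ there exists a finite $\mathcal{Y}_t$-adapted random variable $c_{p,\varphi}(t)$ such that

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^{2p} \mid \mathcal{Y}_t \right] \le \frac{c_{p,\varphi}(t)}{n^p};$$

hence (8.30) holds with $\bar{c}_{p,\varphi}(t)$ being the $\mathcal{Y}_t$-adapted random variable

$$\bar{c}_{p,\varphi}(t) \triangleq \frac{\left( c_{p,\varphi}(t)^{1/2p} + \|\varphi\|_\infty\, c_{p,1}(t)^{1/2p} \right)^{2p}}{\rho_t(1)^{2p}}. \tag{8.32}$$

□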
Lemma 8.13 shows the convergence of $\rho_t^n(\varphi)$ to $\rho_t(\varphi)$ when conditioned with respect to the observation σ-algebra $\mathcal{Y}_t$. It also implies the convergence in expectation† and the almost sure convergence of $\rho_t^n$ to $\rho_t$.

Theorem 8.17. If $\tilde{\mathbb{E}}[\tilde{Z}_t^2] < \infty$, then for any $\varphi \in B(S)$ we have

$$\tilde{\mathbb{E}}\left[ |\rho_t^n(\varphi) - \rho_t(\varphi)| \right] \le \frac{\tilde{c}_1(t)}{\sqrt{n}} \|\varphi\|_\infty,$$

where $\tilde{c}_1(t) \triangleq \sqrt{\tilde{\mathbb{E}}[\tilde{Z}_t^2]}$. In particular $\operatorname{e-lim}_{n \to \infty} \rho_t^n = \rho_t$. Moreover, if $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$ for $p \ge 2$, then for any $\varepsilon \in (0, 1/2 - 1/(2p))$ and $\varphi \in B(S)$ there exists a positive random variable $\tilde{c}_{\varepsilon,p,\varphi}(t)$ which is almost surely finite such that

$$|\rho_t^n(\varphi) - \rho_t(\varphi)| \le \frac{\tilde{c}_{\varepsilon,p,\varphi}(t)}{n^\varepsilon}. \tag{8.33}$$

In particular, $\rho_t^n$ converges to $\rho_t$, $\tilde{\mathbb{P}}$-almost surely.

† Recall that $\rho_t^n \to \rho_t$ in expectation if $\lim_{n \to \infty} \tilde{\mathbb{E}}[|\rho_t^n f - \rho_t f|] = 0$ for all $f \in C_b(S)$. See Section A.10 for the definition of convergence in expectation.
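Proof. From Lemma 8.13 we get, using Jensen's inequality, that

$$\tilde{\mathbb{E}}\left[ |\rho_t^n(\varphi) - \rho_t(\varphi)| \right] \le \sqrt{\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^2 \right]} = \frac{\sqrt{\tilde{\mathbb{E}}[c_{1,\varphi}(t)]}}{\sqrt{n}},$$

hence the first claim is true since

$$\tilde{\mathbb{E}}[c_{1,\varphi}(t)] = \tilde{\mathbb{E}}\left[ (\varphi(X_t)\tilde{Z}_t - \rho_t(\varphi))^2 \right] = \tilde{\mathbb{E}}\left[ \varphi(X_t)^2 \tilde{Z}_t^2 - 2\rho_t(\varphi)\tilde{Z}_t\varphi(X_t) + \rho_t(\varphi)^2 \right]$$

$$= \tilde{\mathbb{E}}\left[ \varphi(X_t)^2 \tilde{Z}_t^2 - \rho_t(\varphi)^2 \right] \le \|\varphi\|_\infty^2\, \tilde{\mathbb{E}}[\tilde{Z}_t^2].$$

Similarly

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^{2p} \right] \le \frac{\tilde{\mathbb{E}}[c_{p,\varphi}(t)]}{n^p},$$

where $c_{p,\varphi}(t)$ is the random variable defined in (8.29), which implies (8.33), and the almost sure convergence of $\rho_t^n$ to $\rho_t$ follows as a consequence of Remark A.38 in the appendix. □

Let us turn our attention to the convergence of $\pi_t^n$. The almost sure convergence of $\pi_t^n$ to $\pi_t$ holds under the same conditions as the convergence of $\rho_t^n$ to $\rho_t$. However, the convergence in expectation of $\pi_t^n$ to $\pi_t$ requires an additional integrability condition on $\rho_t^{-1}(1)$.

Theorem 8.18. If $\tilde{\mathbb{E}}[\tilde{Z}_t^2] < \infty$ and $\tilde{\mathbb{E}}[\rho_t^{-2}(1)] < \infty$, then for any $\varphi \in B(S)$, we have

$$\tilde{\mathbb{E}}\left[ |\pi_t^n(\varphi) - \pi_t(\varphi)| \right] \le \frac{\hat{c}_1(t)}{\sqrt{n}} \|\varphi\|_\infty, \tag{8.34}$$

where $\hat{c}_1(t) = 2\sqrt{\tilde{\mathbb{E}}[\tilde{Z}_t^2]\, \tilde{\mathbb{E}}[\rho_t^{-2}(1)]}$. In particular $\pi_t^n$ converges to $\pi_t$ in expectation. Moreover, if $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$ for $p \ge 2$, then for any $\varepsilon \in (0, 1/2 - 1/(2p))$ there exists a positive random variable $\hat{c}_{\varepsilon,p,\varphi}(t)$, almost surely finite, such that

$$|\pi_t^n(\varphi) - \pi_t(\varphi)| \le \frac{\hat{c}_{\varepsilon,p,\varphi}(t)}{n^\varepsilon}. \tag{8.35}$$

In particular, $\pi_t^n$ converges to $\pi_t$, $\tilde{\mathbb{P}}$-almost surely.

Proof. By inequality (8.31) and the Cauchy–Schwarz inequality we get that

$$\tilde{\mathbb{E}}\left[ |\pi_t^n(\varphi) - \pi_t(\varphi)| \right] \le \|\varphi\|_\infty \sqrt{\tilde{\mathbb{E}}[\rho_t^{-2}(1)]\, \tilde{\mathbb{E}}\left[ |\rho_t^n(1) - \rho_t(1)|^2 \right]} + \sqrt{\tilde{\mathbb{E}}[\rho_t^{-2}(1)]\, \tilde{\mathbb{E}}\left[ |\rho_t^n(\varphi) - \rho_t(\varphi)|^2 \right]}.$$

Moreover, for any $\varphi \in B(S)$, from the proof of Theorem 8.17, it follows that

$$\tilde{\mathbb{E}}\left[ (\rho_t^n(\varphi) - \rho_t(\varphi))^2 \right] \le \frac{1}{n} \|\varphi\|_\infty^2\, \tilde{\mathbb{E}}[\tilde{Z}_t^2];$$

hence the first claim is true. For the almost sure convergence result observe that inequalities (8.31) and (8.33) imply that

$$|\pi_t^n(\varphi) - \pi_t(\varphi)| \le \frac{\|\varphi\|_\infty}{\rho_t(1)} |\rho_t^n(1) - \rho_t(1)| + \frac{1}{\rho_t(1)} |\rho_t^n(\varphi) - \rho_t(\varphi)| \le \frac{\|\varphi\|_\infty}{\rho_t(1)} \frac{\tilde{c}_{\varepsilon,p,1}(t)}{n^\varepsilon} + \frac{1}{\rho_t(1)} \frac{\tilde{c}_{\varepsilon,p,\varphi}(t)}{n^\varepsilon}$$

and the claim follows with

$$\hat{c}_{\varepsilon,p,\varphi}(t) = \frac{\|\varphi\|_\infty\, \tilde{c}_{\varepsilon,p,1}(t) + \tilde{c}_{\varepsilon,p,\varphi}(t)}{\rho_t(1)}.$$

□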
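Exercise 8.19. Show that $\tilde{\mathbb{E}}[\rho_t^{-2}(1)] < \infty$ if the function $h$ is bounded.

Exercise 8.20. Show that if $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$, then there exists a positive constant $\tilde{c}_p(t)$ such that for any $\varphi \in B(S)$ we have

$$\tilde{\mathbb{E}}\left[ |\rho_t^n(\varphi) - \rho_t(\varphi)|^p \right] \le \frac{\tilde{c}_p(t)}{n^{p/2}} \|\varphi\|_\infty^p.$$

Similarly, show that if $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$ and $\tilde{\mathbb{E}}[\rho_t^{-2p}(1)] < \infty$, then for any $\varphi \in B(S)$, we have

$$\tilde{\mathbb{E}}\left[ |\pi_t^n(\varphi) - \pi_t(\varphi)|^p \right] \le \frac{\hat{c}_p(t)}{n^{p/2}} \|\varphi\|_\infty^p.$$

Let $\mathcal{M} = \{\varphi_i,\ i \ge 0\}$, where $\varphi_i \in C_b(S)$, be a countable convergence determining set such that $\|\varphi_i\|_\infty \le 1$ for any $i \ge 0$ and let $d_{\mathcal{M}}$ be the metric on $M(S)$ (see Section A.10 for additional details):

$$d_{\mathcal{M}} : M(S) \times M(S) \to [0, \infty), \qquad d_{\mathcal{M}}(\mu, \nu) = \sum_{i=0}^{\infty} \frac{1}{2^i} |\mu\varphi_i - \nu\varphi_i|.$$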
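Theorems 8.17 and 8.18 give the following corollary.

Corollary 8.21. If $\tilde{\mathbb{E}}[\tilde{Z}_t^2] < \infty$, then

$$\tilde{\mathbb{E}}\left[ d_{\mathcal{M}}(\rho_t^n, \rho_t) \right] \le \frac{2\sqrt{\tilde{\mathbb{E}}[\tilde{Z}_t^2]}}{\sqrt{n}}. \tag{8.36}$$

Similarly, if $\tilde{\mathbb{E}}[\tilde{Z}_t^2] < \infty$ and $\tilde{\mathbb{E}}[\rho_t^{-2}(1)] < \infty$, then we have

$$\tilde{\mathbb{E}}\left[ d_{\mathcal{M}}(\pi_t^n, \pi_t) \right] \le \frac{4\sqrt{\tilde{\mathbb{E}}[\tilde{Z}_t^2]\, \tilde{\mathbb{E}}[\rho_t^{-2}(1)]}}{\sqrt{n}}. \tag{8.37}$$

Moreover, if $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$ for $p \ge 2$, then for any $\varepsilon \in (0, 1/2 - 1/(2p))$ there exists a positive random variable $\tilde{c}_\varepsilon(t)$ which is almost surely finite such that

$$d_{\mathcal{M}}(\rho_t^n, \rho_t) \le \frac{\tilde{c}_\varepsilon(t)}{n^\varepsilon}, \qquad d_{\mathcal{M}}(\pi_t^n, \pi_t) \le \frac{\tilde{c}_\varepsilon(t)}{n^\varepsilon}. \tag{8.38}$$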
Proof. From Theorem 8.17 we get, using the fact that $\|\varphi_i\|_\infty \le 1$, that

$$\tilde{\mathbb{E}}\left[ d_{\mathcal{M}}(\rho_t^n, \rho_t) \right] \le \sum_{i=0}^{\infty} \frac{1}{2^i} \tilde{\mathbb{E}}\left[ |\rho_t^n(\varphi_i) - \rho_t(\varphi_i)| \right] \le \frac{\tilde{c}_1(t)}{\sqrt{n}} \sum_{i=0}^{\infty} \frac{1}{2^i} \le \frac{2\tilde{c}_1(t)}{\sqrt{n}},$$

which establishes (8.36). Inequality (8.37) follows by a similar argument. By the triangle inequality and Exercise 8.20 it follows that

$$\tilde{\mathbb{E}}\left[ d_{\mathcal{M}}(\rho_t^n, \rho_t)^p \right]^{1/p} \le \sum_{i=0}^{\infty} \frac{1}{2^i} \tilde{\mathbb{E}}\left[ |\rho_t^n(\varphi_i) - \rho_t(\varphi_i)|^p \right]^{1/p} \le \frac{\tilde{c}_p(t)^{1/p}}{\sqrt{n}} \sum_{i=0}^{\infty} \frac{1}{2^i}.$$

The first inequality in (8.38) then follows from Remark A.38 in the appendix. The second inequality in (8.38) follows in a similar manner. □

Corollary 8.21 states that both $\rho_t^n$ converges to $\rho_t$ and $\pi_t^n$ converges to $\pi_t$ in expectation with the rate $1/\sqrt{n}$. It also states that the corresponding rate for the almost sure convergence is slightly lower than $1/\sqrt{n}$.

The above analysis requires the existence of higher moments of the martingale $\tilde{Z}$. Of course, the question arises as to what happens if they do not exist and we only know that $\tilde{Z}$ is integrable. In this case $\pi_t^n$ still converges almost surely to $\pi_t$ for fixed observation paths $s \mapsto Y_s$ as a consequence of the strong law of large numbers. To state this precisely, it is necessary to use an explicit description of the underlying probability space $\Omega$ as a product space, where the processes $(X, Y)$ live on one component and the processes $v_j$, $j = 1, 2, \ldots$ live on another. The details and the ensuing analysis are cumbersome, so we do not include them. Moreover, in this case, the random measure $\pi_t^n$ will not converge to the random measure $\pi_t$ (over the product space) and no convergence rates may be available. In Chapter 10 we discuss the convergence for fixed observations for the discrete time framework.

Theorems 8.17 and 8.18 and Corollary 8.21 show that the Monte Carlo method will produce approximations for $\rho_t$, respectively $\pi_t$, provided enough particles (independent realizations of the signal) are used. The number of particles depends upon the magnitude of the constants appearing in the upper bounds of the rates of convergence, which in turn depend on the magnitude of the higher moments of the exponential martingale $\tilde{Z}$. This is bad news, because these higher moments of the exponential martingale $\tilde{Z}$ increase very rapidly as functions of time.

The particle picture makes the reason for the deterioration in the accuracy of the approximations with time clearer. Each particle has a trajectory which is independent of the signal trajectory, and its corresponding weight depends on how close its trajectory is to the signal trajectory: the weight is the likelihood of the trajectory given the observation.† Typically, most particles' trajectories diverge very quickly from the signal trajectory, with a few 'lucky' ones remaining close to the signal. Therefore the majority of the weights decrease to zero, while a small minority become very large. As a result only the 'lucky' particles will contribute significantly to the sums (8.24) and (8.25) giving the approximations for $\rho_t$, respectively, $\pi_t$. The convergence of the Monte Carlo method is therefore very slow as a large number of particles is needed in order to have a sufficient number of particles in the right area (with correspondingly large weights).

To solve this problem, a wealth of methods have been proposed. In filtering theory, the generic name for these methods is particle filters or sequential Monte Carlo methods. These methods use a correction mechanism that culls particles with small weights and multiplies particles with large weights. The correction procedure depends on the trajectory of the particle and the observation data. This is effective as particles with small weights (i.e. particles with unlikely trajectories/positions) are not carried forward uselessly whereas the most probable regions of the signal state space are explored more thoroughly. The result is a cloud of particles, with those surviving to the current time providing an estimate for the conditional distribution of the signal. In the following two chapters we study this class of methods in greater detail.
In Chapter 9 we discuss such a particle method for the continuous time framework together with corresponding convergence results. In Chapter 10, we look at particle methods for solving the filtering problem in the discrete framework.
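The weight degeneracy described above can be observed numerically. The following is a minimal sketch, not taken from the text: it assumes a one-dimensional toy model ($dX = dV$, $h(x) = x$), an Euler discretisation with step `dt`, and uses the effective sample size $1/\sum_j \bar{a}_j^2$ as a hypothetical diagnostic of how many particles carry non-negligible weight. All function names are illustrative.

```python
import math
import random

def normalised_weights(log_w):
    """Normalise log-weights stably (subtract the maximum before exponentiating)."""
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]

def effective_sample_size(weights):
    """1 / sum of squared normalised weights: n if uniform, 1 if one weight dominates."""
    return 1.0 / sum(x * x for x in weights)

def monte_carlo_weights(n, t_end, dt, rng):
    """Plain Monte Carlo weights for the toy model dX = dV, dY = X dt + dW (an assumption)."""
    sqdt = math.sqrt(dt)
    x = 0.0                  # hidden signal
    v = [0.0] * n            # particles, evolving independently of the signal
    log_w = [0.0] * n        # log of the unnormalised weights
    for _ in range(int(round(t_end / dt))):
        dy = x * dt + sqdt * rng.gauss(0.0, 1.0)     # observation increment
        for j in range(n):
            # Discretised exponent: h(v) dY - 0.5 * h(v)^2 dt, with h(x) = x.
            log_w[j] += v[j] * dy - 0.5 * v[j] ** 2 * dt
            v[j] += sqdt * rng.gauss(0.0, 1.0)
        x += sqdt * rng.gauss(0.0, 1.0)
    return normalised_weights(log_w)

rng = random.Random(0)
for t_end in (0.5, 2.0, 8.0):
    w = monte_carlo_weights(200, t_end, 0.01, rng)
    print(t_end, round(effective_sample_size(w), 1))
```

In runs of this kind the effective sample size typically shrinks as the time horizon grows, which is the behaviour the correction mechanism is meant to counteract.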
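† In Chapter 10 we make this statement precise for the discrete time framework.

8.7 Solutions to Exercises

8.12 It is enough to show that the stochastic integrals
\[
I_j = \int_0^t h(v_j(s))^\top \, dY_s, \qquad j = 1, \dots, n
\]
are mutually independent given $\mathcal{Y}_t$. This follows immediately from the fact that the random variables
\[
I_j^m = \sum_{i=1}^{m} h(v_j(it/m))^\top \left( Y_{(i+1)t/m} - Y_{it/m} \right), \qquad j = 1, \dots, n
\]
are mutually independent given $\mathcal{Y}_t$, hence by the bounded convergence theorem
\[
\begin{aligned}
\tilde{\mathbb{E}}\Big[ \prod_{j=1}^n \exp(i\lambda_j I_j) \,\Big|\, \mathcal{Y}_t \Big]
&= \lim_{k\to\infty} \tilde{\mathbb{E}}\Big[ \prod_{j=1}^n \exp\big(i\lambda_j I_j^{m_k}\big) \,\Big|\, \mathcal{Y}_t \Big] \\
&= \lim_{k\to\infty} \prod_{j=1}^n \tilde{\mathbb{E}}\big[ \exp\big(i\lambda_j I_j^{m_k}\big) \,\big|\, \mathcal{Y}_t \big] \\
&= \prod_{j=1}^n \tilde{\mathbb{E}}\big[ \exp(i\lambda_j I_j) \,\big|\, \mathcal{Y}_t \big]
\end{aligned}
\tag{8.39}
\]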
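for any $\lambda_j$, $j = 1, \dots, n$. In (8.39), $(I_j^{m_k})_{k>0}$ is a suitably chosen subsequence of $(I_j^m)_{m>0}$ so that $I_j^{m_k}$ converges to $I_j$ almost surely.

8.15 From (8.29) and the inequality $(a+b)^k \le 2^{k-1}(a^k + b^k)$,
\[
\begin{aligned}
\tilde{\mathbb{E}}[c_{p,\varphi}] &= k_p\, \tilde{\mathbb{E}}\Big[ \big( \varphi(X_t)\tilde{Z}_t - \rho_t(\varphi) \big)^{2p} \Big] \\
&\le 2^{2p-1} k_p\, \tilde{\mathbb{E}}\Big[ \big(\varphi(X_t)\tilde{Z}_t\big)^{2p} + (\rho_t(\varphi))^{2p} \Big] \\
&\le 2^{2p-1} k_p \|\varphi\|_\infty^{2p} \Big( \tilde{\mathbb{E}}\big[\tilde{Z}_t^{2p}\big] + \tilde{\mathbb{E}}\big[(\rho_t(1))^{2p}\big] \Big).
\end{aligned}
\]
The first term is bounded by the assumption $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}] < \infty$; for the second term use the conditional form of Jensen
\[
\tilde{\mathbb{E}}\big[(\rho_t(1))^{2p}\big] = \tilde{\mathbb{E}}\Big[ \big(\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t]\big)^{2p} \Big] \le \tilde{\mathbb{E}}\Big[ \tilde{\mathbb{E}}\big[\tilde{Z}_t^{2p} \mid \mathcal{Y}_t\big] \Big] = \tilde{\mathbb{E}}\big[\tilde{Z}_t^{2p}\big] < \infty.
\]
Therefore $\tilde{\mathbb{E}}[c_{p,\varphi}] < \infty$, which implies that $c_{p,\varphi} < \infty$ $\mathbb{P}$-a.s. For the second part, where $h$ is bounded, use the explicit form
\[
\tilde{Z}_t^{2p} = \exp\Big( 2p \sum_{i=1}^m \int_0^t h^i(X_s)\, dY_s^i - p \sum_{i=1}^m \int_0^t h^i(X_s)^2\, ds \Big)
\le \exp\big( (2p^2 - p)\, m t \|h\|_\infty^2 \big)\, \Theta_t,
\]
where $\Theta = \{\Theta_t,\ t \ge 0\}$ is the exponential martingale
\[
\Theta_t \triangleq \exp\Big( 2p \sum_{i=1}^m \int_0^t h^i(X_s)\, dY_s^i - \frac{(2p)^2}{2} \sum_{i=1}^m \int_0^t h^i(X_s)^2\, ds \Big).
\]
The boundedness of $h$ implies that $\Theta$ is a genuine martingale via Novikov's condition (see Theorem B.34). Taking expectations, we see that $\tilde{\mathbb{E}}[\tilde{Z}_t^{2p}]$ is bounded by $\exp((2p^2 - p)\, m t \|h\|_\infty^2)$.

8.19 By Jensen's inequality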
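\[
\tilde{\mathbb{E}}\big[\rho_t(1)^{-2}\big] = \tilde{\mathbb{E}}\Big[ \big(\tilde{\mathbb{E}}[\tilde{Z}_t \mid \mathcal{Y}_t]\big)^{-2} \Big] \le \tilde{\mathbb{E}}\Big[ \tilde{\mathbb{E}}\big[\tilde{Z}_t^{-2} \mid \mathcal{Y}_t\big] \Big] = \tilde{\mathbb{E}}\big[\tilde{Z}_t^{-2}\big]
\]
and from the explicit form for $\tilde{Z}_t$,
\[
\tilde{Z}_t^{-2} = \exp\Big( -2 \int_0^t h(X_s)^\top dY_s + \int_0^t \|h(X_s)\|^2\, ds \Big) \le \exp\big(3 m t \|h\|_\infty^2\big)\, \bar{\Theta}_t,
\]
where $\bar{\Theta} = \{\bar{\Theta}_t,\ t \ge 0\}$ is the exponential martingale
\[
\bar{\Theta}_t \triangleq \exp\Big( -2 \sum_{i=1}^m \int_0^t h^i(X_s)\, dY_s^i - 2 \sum_{i=1}^m \int_0^t h^i(X_s)^2\, ds \Big).
\]
The boundedness of $h$ implies that $\bar{\Theta}$ is a genuine martingale via Novikov's condition (see Theorem B.34). Taking expectations, we see that $\tilde{\mathbb{E}}[\rho_t^{-2}(1)]$ is bounded by $\exp(3 m t \|h\|_\infty^2)$.

8.20 By Jensen's inequality and (8.28)
\[
\begin{aligned}
\tilde{\mathbb{E}}\big[ |\rho_t^n(\varphi) - \rho_t(\varphi)|^p \big]
&\le \sqrt{ \tilde{\mathbb{E}}\big[ (\rho_t^n(\varphi) - \rho_t(\varphi))^{2p} \big] } \\
&= \sqrt{ \tilde{\mathbb{E}}\Big[ \tilde{\mathbb{E}}\big[ (\rho_t^n(\varphi) - \rho_t(\varphi))^{2p} \mid \mathcal{Y}_t \big] \Big] } \\
&\le \frac{ \sqrt{ \tilde{\mathbb{E}}[c_{p,\varphi}(t)] } }{ n^{p/2} }.
\end{aligned}
\]
From the computations in Exercise 8.15,
\[
\tilde{\mathbb{E}}[c_{p,\varphi}(t)] \le K_p(t) \|\varphi\|_\infty^{2p},
\qquad\text{where}\qquad
K_p(t) = 4^p k_p\, \tilde{\mathbb{E}}\big[\tilde{Z}_t^{2p}\big],
\]
thus
\[
\tilde{\mathbb{E}}\big[ |\rho_t^n(\varphi) - \rho_t(\varphi)|^p \big] \le \frac{ \sqrt{K_p(t)}\, \|\varphi\|_\infty^p }{ n^{p/2} }.
\]
Therefore the result follows with $\tilde{c}_p(t) = \sqrt{K_p(t)}$. For the second part, from (8.31) and the inequality $(a+b)^p \le 2^{p-1}(a^p + b^p)$,
\[
|\pi_t^n(\varphi) - \pi_t(\varphi)|^p \le 2^{p-1} \frac{\|\varphi\|_\infty^p}{\rho_t(1)^p} |\rho_t^n(1) - \rho_t(1)|^p + \frac{2^{p-1}}{\rho_t(1)^p} |\rho_t^n(\varphi) - \rho_t(\varphi)|^p,
\]
so by Cauchy–Schwarz
\[
\begin{aligned}
\tilde{\mathbb{E}}\big[ |\pi_t^n(\varphi) - \pi_t(\varphi)|^p \big]
&\le 2^{p-1} \|\varphi\|_\infty^p \sqrt{ \tilde{\mathbb{E}}\big[\rho_t(1)^{-2p}\big]\, \tilde{\mathbb{E}}\Big[\frac{c_{p,1}}{n^p}\Big] }
 + 2^{p-1} \sqrt{ \tilde{\mathbb{E}}\big[\rho_t(1)^{-2p}\big]\, \tilde{\mathbb{E}}\Big[\frac{c_{p,\varphi}}{n^p}\Big] } \\
&\le 2^{p-1} \frac{ \sqrt{ \tilde{\mathbb{E}}[\rho_t(1)^{-2p}] } }{ n^{p/2} } \Big( \|\varphi\|_\infty^p \sqrt{ \tilde{\mathbb{E}}[c_{p,1}] } + \sqrt{ \tilde{\mathbb{E}}[c_{p,\varphi}] } \Big) \\
&\le 2^{p-1} \frac{ \sqrt{ \tilde{\mathbb{E}}[\rho_t(1)^{-2p}] } }{ n^{p/2} } \|\varphi\|_\infty^p \cdot 2 \sqrt{K_p(t)},
\end{aligned}
\]
so the result follows with $\hat{c}_p(t) = 2^p \sqrt{K_p(t)}\, \sqrt{ \tilde{\mathbb{E}}[\rho_t(1)^{-2p}] }$.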
9 A Continuous Time Particle Filter
9.1 Introduction

Throughout this chapter, we take the signal $X$ to be the solution of (3.9); that is, $X = (X^i)_{i=1}^d$ is the solution of the stochastic differential equation
\[
dX_t = f(X_t)\, dt + \sigma(X_t)\, dV_t, \tag{9.1}
\]
where $f : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times p}$ are bounded and globally Lipschitz functions and $V = (V^j)_{j=1}^p$ is a $p$-dimensional Brownian motion. As discussed in Section 3.2, the generator $A$ associated with the process $X$ is the second-order differential operator
\[
A = \sum_{i=1}^d f^i \frac{\partial}{\partial x_i} + \sum_{i,j=1}^d a^{ij} \frac{\partial^2}{\partial x_i \partial x_j},
\]
where $a = \tfrac{1}{2} \sigma \sigma^\top$. Since both $f$ and $a$ are bounded, the domain of the generator $A$, $\mathcal{D}(A)$, is $C_b^2(\mathbb{R}^d)$, the space of bounded twice continuously differentiable functions with bounded first and second partial derivatives; for any $\varphi \in C_b^2(\mathbb{R}^d)$, the process $M^\varphi = \{M_t^\varphi,\ t \ge 0\}$ defined by†
\[
M_t^\varphi \triangleq \varphi(X_t) - \varphi(X_0) - \int_0^t A\varphi(X_s)\, ds
= \int_0^t \big((\nabla\varphi)^\top \sigma\big)(X_s)\, dV_s, \qquad t \ge 0
\]
is an $\mathcal{F}_t$-adapted martingale.

† In the following $(\nabla\varphi)^\top$ is the row vector $(\partial_1 \varphi, \dots, \partial_d \varphi)$.

The observation process is the solution of the evolution equation (3.5); that is, $Y$ is an $m$-dimensional stochastic process that satisfies
\[
dY_t = h(X_t)\, dt + dW_t,
\]
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, © Springer Science+Business Media, LLC 2009. DOI 10.1007/978-0-387-76896-0_9
where $h = (h^i)_{i=1}^m : \mathbb{R}^d \to \mathbb{R}^m$ is a bounded measurable function and $W$ is a standard $m$-dimensional Brownian motion independent of $X$. Since $h$ is bounded, condition (3.25) is satisfied. Hence the process $Z = \{Z_t,\ t > 0\}$ defined by
\[
Z_t \triangleq \exp\Big( -\int_0^t h(X_s)^\top dW_s - \frac{1}{2} \int_0^t \|h(X_s)\|^2\, ds \Big), \qquad t \ge 0, \tag{9.2}
\]
is a genuine martingale and the probability $\tilde{\mathbb{P}}$ whose Radon–Nikodym derivative with respect to $\mathbb{P}$ is given on $\mathcal{F}_t$ by $Z_t$, viz
\[
\left. \frac{d\tilde{\mathbb{P}}}{d\mathbb{P}} \right|_{\mathcal{F}_t} = Z_t,
\]
is well defined (see Section 3.3 for details, also Theorem B.34 and Corollary B.31). As was shown in Chapter 3, under $\tilde{\mathbb{P}}$, the process $Y$ is a Brownian motion independent of $X$. Then the Kallianpur–Striebel formula (3.33) states that
\[
\pi_t(\varphi) = \frac{\rho_t(\varphi)}{\rho_t(1)}, \qquad \tilde{\mathbb{P}}(\mathbb{P})\text{-a.s.},
\]
where $\rho_t$ is the unnormalized conditional distribution of $X$, which satisfies
\[
\rho_t(\varphi) = \tilde{\mathbb{E}}\big[ \varphi(X_t) \tilde{Z}_t \mid \mathcal{Y}_t \big]
\]
for any bounded Borel-measurable function $\varphi$ and
\[
\tilde{Z}_t = \exp\Big( \int_0^t h(X_s)^\top dY_s - \frac{1}{2} \int_0^t \|h(X_s)\|^2\, ds \Big). \tag{9.3}
\]
Similar to the Monte Carlo method which is described in Section 8.6, the particle filter presented below produces a measure-valued process $\pi^n = \{\pi_t^n,\ t \ge 0\}$ which represents the empirical measure of $n$ (random) particles with varying weights
\[
\pi_t^n \triangleq \sum_{j=1}^n \bar{a}_j^n(t)\, \delta_{v_j^n(t)}, \qquad t \ge 0.
\]
The difference between the Monte Carlo method described earlier and the particle filter which we are about to describe is the presence of an additional correction procedure, which is applied at regular intervals to the system of particles. At the correction times, each particle is replaced by a random number of particles (possibly zero). We say that the particles branch into a random number of offspring. This is done in a consistent manner so that particles with small weights have no offspring (i.e. are killed), and particles with large weights are replaced by several offspring.
The chapter is organised as follows. In the following section we describe in detail the particle filter and some of its properties. In Section 9.3 we review the dual of the process ρ, which was introduced in Chapter 7, and give a number of preliminary results. The convergence results are proved in Section 9.4.
9.2 The Approximating Particle System

The particle system at time 0 consists of $n$ particles all with equal weights $1/n$, and positions $v_j^n(0)$, for $j = 1, \dots, n$. We choose the initial positions of the particles to be independent, identically distributed random variables with common distribution $\pi_0$, for $j, n \in \mathbb{N}$. Hence the approximating measure at time 0 is
\[
\pi_0^n = \frac{1}{n} \sum_{j=1}^n \delta_{v_j^n(0)}.
\]
The time interval $[0, \infty)$ is partitioned into sub-intervals of equal length $\delta$. During the time interval $[i\delta, (i+1)\delta)$, the particles all move with the same law as the signal $X$; that is, for $t \in [i\delta, (i+1)\delta)$,
\[
v_j^n(t) = v_j^n(i\delta) + \int_{i\delta}^t f(v_j^n(s))\, ds + \int_{i\delta}^t \sigma(v_j^n(s))\, dV_s^{(j)}, \qquad j = 1, \dots, n, \tag{9.4}
\]
where $(V^{(j)})_{j=1}^n$ are mutually independent $\mathcal{F}_t$-adapted $p$-dimensional Brownian motions which are independent of $Y$, and independent of all other random variables in the system. The notation $V^{(j)}$ is used to make it clear that these are not the components of each $p$-dimensional Brownian motion. The weights $\bar{a}_j^n(t)$ are of the form
\[
\bar{a}_j^n(t) \triangleq \frac{a_j^n(t)}{\sum_{k=1}^n a_k^n(t)},
\]
where
\[
a_j^n(t) = 1 + \sum_{k=1}^m \int_{i\delta}^t a_j^n(s)\, h^k(v_j^n(s))\, dY_s^k; \tag{9.5}
\]
in other words
\[
a_j^n(t) = \exp\Big( \int_{i\delta}^t h(v_j^n(s))^\top dY_s - \frac{1}{2} \int_{i\delta}^t \|h(v_j^n(s))\|^2\, ds \Big). \tag{9.6}
\]
For $t \in [i\delta, (i+1)\delta)$, define
\[
\pi_t^n \triangleq \sum_{j=1}^n \bar{a}_j^n(t)\, \delta_{v_j^n(t)}.
\]
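The evolution of the particles and weights between two branching times can be sketched numerically. The code below is only an illustration under stated assumptions: it takes $d = p = m = 1$, replaces (9.4) by an Euler–Maruyama step of size `dt`, replaces the exponent in (9.6) by its left-point discretisation, and takes the observation increments `dy` as given input. The function name `advance_interval` and the example coefficients are hypothetical.

```python
import math
import random

def advance_interval(v, dy, dt, f, sigma, h, rng):
    """Move particles across one interval [i*delta, (i+1)*delta).

    v  : particle positions at time i*delta (scalars; d = p = m = 1 assumed)
    dy : observation increments Y_{s+dt} - Y_s over the interval
    Returns the new positions and the normalised weights a-bar.
    """
    v = list(v)
    log_a = [0.0] * len(v)            # a_j(i*delta) = 1, so log a_j = 0
    for inc in dy:
        for j, x in enumerate(v):
            hx = h(x)
            # Discretised exponent of (9.6): h dY - 0.5 * h^2 ds.
            log_a[j] += hx * inc - 0.5 * hx * hx * dt
            # Euler-Maruyama step for (9.4), one Brownian motion per particle.
            v[j] = x + f(x) * dt + sigma(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    top = max(log_a)                  # normalise in a numerically stable way
    a = [math.exp(la - top) for la in log_a]
    total = sum(a)
    return v, [x / total for x in a]

rng = random.Random(1)
dt, steps = 0.01, 20
# Placeholder observation increments (pure noise), used only to exercise the code.
dy = [math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(steps)]
v0 = [rng.gauss(0.0, 1.0) for _ in range(5)]
v1, abar = advance_interval(v0, dy, dt,
                            f=lambda x: -math.tanh(x),   # bounded, Lipschitz drift
                            sigma=lambda x: 1.0,
                            h=math.tanh,                 # bounded sensor function
                            rng=rng)
print(abar)
```

Working with logarithms of the weights is a design choice of this sketch; it avoids overflow and underflow when the exponent in (9.6) becomes large in absolute value.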
At the end of the interval, each particle branches into a random number of particles. Each offspring particle initially inherits the spatial position of its parent. After branching all the particles are reindexed (from 1 to $n$) and all of the (unnormalized) weights are reinitialised back to 1. When necessary, we use the notation $j' = 1, 2, \dots, n$ to denote the particle index prior to the branching event, to distinguish it from the index after the branching event, which we denote by $j = 1, 2, \dots, n$. Let $o_{j'}^{n,(i+1)\delta}$ be the number of offspring produced by the $j'$th particle at time $(i+1)\delta$ in the $n$-particle approximating system. Then $o_{j'}^{n,(i+1)\delta}$ is $\mathcal{F}_{(i+1)\delta}$-adapted and†
\[
o_{j'}^{n,(i+1)\delta} \triangleq
\begin{cases}
\big[ n \bar{a}_{j'}^{n,(i+1)\delta} \big] & \text{with prob. } 1 - \big\{ n \bar{a}_{j'}^{n,(i+1)\delta} \big\} \\[4pt]
\big[ n \bar{a}_{j'}^{n,(i+1)\delta} \big] + 1 & \text{with prob. } \big\{ n \bar{a}_{j'}^{n,(i+1)\delta} \big\},
\end{cases}
\tag{9.7}
\]
where $\bar{a}_{j'}^{n,(i+1)\delta}$ is the value of the particle's weight immediately prior to the branching; in other words,
\[
\bar{a}_{j'}^{n,(i+1)\delta} = \bar{a}_{j'}^n((i+1)\delta-) = \lim_{t \nearrow (i+1)\delta} \bar{a}_{j'}^n(t). \tag{9.8}
\]
Hence if $\mathcal{F}_{(i+1)\delta-}$ is the $\sigma$-algebra of events up to time $(i+1)\delta$, viz
\[
\mathcal{F}_{(i+1)\delta-} = \sigma(\mathcal{F}_s,\ s < (i+1)\delta),
\]
then from (9.7),
\[
\mathbb{E}\big[ o_{j'}^{n,(i+1)\delta} \mid \mathcal{F}_{(i+1)\delta-} \big] = n \bar{a}_{j'}^{n,(i+1)\delta}, \tag{9.9}
\]
and the conditional variance of the number of offspring is
\[
\mathbb{E}\Big[ \big( o_{j'}^{n,(i+1)\delta} \big)^2 \,\Big|\, \mathcal{F}_{(i+1)\delta-} \Big] - \Big( \mathbb{E}\big[ o_{j'}^{n,(i+1)\delta} \mid \mathcal{F}_{(i+1)\delta-} \big] \Big)^2
= \big\{ n \bar{a}_{j'}^{n,(i+1)\delta} \big\} \Big( 1 - \big\{ n \bar{a}_{j'}^{n,(i+1)\delta} \big\} \Big).
\]

† In the following, $[x]$ is the largest integer not exceeding $x$ and $\{x\}$ is the fractional part of $x$; that is, $\{x\} = x - [x]$.

Exercise 9.1. Let $a > 0$ be a positive constant and $\mathcal{A}_a$ be the set of all integer-valued random variables $\xi$ such that $\mathbb{E}[\xi] = a$, viz
\[
\mathcal{A}_a \triangleq \{ \xi : \Omega \to \mathbb{N} \mid \mathbb{E}[\xi] = a \}.
\]
Let $\operatorname{var}(\xi) = \mathbb{E}[\xi^2] - a^2$ be the variance of an arbitrary random variable $\xi \in \mathcal{A}_a$. Show that there exists a random variable $\xi^{\min} \in \mathcal{A}_a$ with minimal variance. That is, $\operatorname{var}(\xi^{\min}) \le \operatorname{var}(\xi)$ for any $\xi \in \mathcal{A}_a$. Moreover show that
\[
\xi^{\min} =
\begin{cases}
[a] & \text{with prob. } 1 - \{a\} \\
[a] + 1 & \text{with prob. } \{a\}
\end{cases}
\tag{9.10}
\]
and $\operatorname{var}(\xi^{\min}) = \{a\}(1 - \{a\})$. More generally show that $\mathbb{E}[\varphi(\xi^{\min})] \le \mathbb{E}[\varphi(\xi)]$ for any convex function $\varphi : \mathbb{R} \to \mathbb{R}$.

Remark 9.2. Following Exercise 9.1, we deduce that the random variables $o_{j'}^{n,(i+1)\delta}$ defined by (9.7) have conditional minimal variance in the set of all integer-valued random variables $\xi$ such that $\mathbb{E}[\xi \mid \mathcal{F}_{(i+1)\delta-}] = n \bar{a}_{j'}^{n,(i+1)\delta}$ for $j' = 1, \dots, n$. This property is important as it is the variance of the random variables $o_j^n$ that influences the speed of convergence of the corresponding algorithm.

9.2.1 The Branching Algorithm

We wish to control the branching process so that the number of particles in the system remains constant at $n$; that is, we require that for each $i$,
\[
\sum_{j'=1}^n o_{j'}^{n,(i+1)\delta} = n,
\]
which implies that the random variables $o_{j'}^{n,(i+1)\delta}$, $j' = 1, \dots, n$ will be correlated. Let $u_{j'}^{n,(i+1)\delta}$, $j' = 1, \dots, n-1$ be $n-1$ mutually independent random variables, uniformly distributed on $[0, 1]$, which are independent of all other random variables in the system. To simplify notation in the statement of the algorithm, we omit the superscript $(i+1)\delta$ in the notation for $o_{j'}^{n,(i+1)\delta}$, $\bar{a}_{j'}^{n,(i+1)\delta}$ and $u_{j'}^{n,(i+1)\delta}$. The following algorithm is then applied.
    $g := n$
    $h := n$
    for $j' := 1$ to $n - 1$
        if $\{n\bar{a}^n_{j'}\} + \{g - n\bar{a}^n_{j'}\} < 1$ then
            if $u^n_{j'} < 1 - \{n\bar{a}^n_{j'}\} / \{g\}$ then
                $o^n_{j'} := [n\bar{a}^n_{j'}]$
            else
                $o^n_{j'} := [n\bar{a}^n_{j'}] + (h - [g])$
            end if
        else
            if $u^n_{j'} < 1 - (1 - \{n\bar{a}^n_{j'}\}) / (1 - \{g\})$ then
                $o^n_{j'} := [n\bar{a}^n_{j'}] + 1$
            else
                $o^n_{j'} := [n\bar{a}^n_{j'}] + (h - [g])$
            end if
        end if
        $g := g - n\bar{a}^n_{j'}$
        $h := h - o^n_{j'}$
    end for
    $o^n_n := h$
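The algorithm above can be transcribed almost line by line. The following sketch is one possible transcription rather than a reference implementation, and it makes two assumptions that are not in the text: the weights are converted to exact rationals (`fractions.Fraction`) so that the integer and fractional parts are not disturbed by floating-point rounding, and the case $\{g\} = 0$ is guarded explicitly to avoid a division by zero (in that case $\{n\bar{a}^n_{j'}\} = 0$ as well and $h = [g]$, so both branches assign $[n\bar{a}^n_{j'}]$ offspring). Note that `h` here is the counter from the algorithm, not the sensor function.

```python
import math
import random
from fractions import Fraction

def branch(weights, rng):
    """Offspring numbers o_1, ..., o_n for the given (unnormalised) weights."""
    n = len(weights)
    w = [Fraction(x) for x in weights]     # exact arithmetic for [.] and {.}
    total = sum(w)
    mean = [n * x / total for x in w]      # n * a-bar_j; these sum exactly to n
    g = Fraction(n)                        # sum of the means not yet assigned
    h = n                                  # number of offspring not yet assigned
    offspring = []
    for m in mean[:-1]:
        fl_m = math.floor(m)
        fr_m = m - fl_m                    # {n a-bar_j}
        fl_g = math.floor(g)
        fr_g = g - fl_g                    # {g}
        rest = g - m
        fr_rest = rest - math.floor(rest)  # {g - n a-bar_j}
        u = rng.random()
        if fr_m + fr_rest < 1:
            # Guard for {g} = 0 (an addition of this sketch, see the lead-in).
            if fr_g == 0 or u < 1 - fr_m / fr_g:
                o = fl_m
            else:
                o = fl_m + (h - fl_g)
        else:
            if u < 1 - (1 - fr_m) / (1 - fr_g):
                o = fl_m + 1
            else:
                o = fl_m + (h - fl_g)
        offspring.append(o)
        g -= m
        h -= o
    offspring.append(h)                    # the last particle takes what is left
    return offspring

rng = random.Random(0)
counts = branch([0.05, 0.4, 0.15, 0.3, 0.1], rng)
print(counts, sum(counts))                 # the counts always sum to n = 5
```

Each returned count is either the integer part of its mean or that integer plus one, and the counts sum to $n$ by construction, since the last entry is whatever remains in `h`.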
Some of the properties of the random variables $\{o_{j'}^n,\ j' = 1, \dots, n\}$ are given by the following proposition. Since there is no risk of confusion, in the statement and proof of this proposition, the primes on the indices are omitted and thus the variables are denoted $\{o_j^n\}_{j=1}^n$.

Proposition 9.3. The random variables $o_j^n$ for $j = 1, \dots, n$ have the following properties.
a. $\sum_{j=1}^n o_j^n = n$.
b. For any $j = 1, \dots, n$ we have $\mathbb{E}[o_j^n] = n\bar{a}_j^n$.
c. For any $j = 1, \dots, n$, $o_j^n$ has minimal variance, specifically
\[
\mathbb{E}\big[ (o_j^n - n\bar{a}_j^n)^2 \big] = \{n\bar{a}_j^n\} (1 - \{n\bar{a}_j^n\}).
\]
d. For any $k = 1, \dots, n-1$, the random variables $o_{1:k}^n = \sum_{j=1}^k o_j^n$ and $o_{k+1:n}^n = \sum_{j=k+1}^n o_j^n$ have variance
\[
\begin{aligned}
\mathbb{E}\big[ (o_{1:k}^n - n\bar{a}_{1:k}^n)^2 \big] &= \{n\bar{a}_{1:k}^n\} (1 - \{n\bar{a}_{1:k}^n\}), \\
\mathbb{E}\big[ (o_{k+1:n}^n - n\bar{a}_{k+1:n}^n)^2 \big] &= \{n\bar{a}_{k+1:n}^n\} (1 - \{n\bar{a}_{k+1:n}^n\}),
\end{aligned}
\]
where $\bar{a}_{1:k}^n = \sum_{j=1}^k \bar{a}_j^n$ and $\bar{a}_{k+1:n}^n = \sum_{j=k+1}^n \bar{a}_j^n$.
e. For $1 \le i < j \le n$, the random variables $o_i^n$ and $o_j^n$ are negatively correlated. That is,
\[
\mathbb{E}\big[ (o_i^n - n\bar{a}_i^n)(o_j^n - n\bar{a}_j^n) \big] \le 0.
\]

Proof. Property (a) follows immediately from the fact that $o_n^n$ is defined as
\[
o_n^n = n - \sum_{j'=1}^{n-1} o_{j'}^n.
\]
For properties (b), (c) and (d), we proceed by induction. First define the sequence of $\sigma$-algebras
\[
\mathcal{U}_k = \sigma(\{u_j^n,\ j = 1, \dots, k\}), \qquad k = 1, \dots, n-1,
\]
where $u_j^n$, $j = 1, \dots, n-1$ are the random variables used to construct the $o_j^n$s. Then from the algorithm,
\[
o_1^n = [n\bar{a}_1^n] + 1_{[0, \{n\bar{a}_1^n\}]}(u_1^n);
\]
hence $o_1^n$ has mean $n\bar{a}_1^n$ and minimal variance from Exercise 9.1. As a consequence of property (a), it also holds that $o_{2:n}^n$ has minimal variance. The induction step follows from the fact that $h$ stores the number of offspring which are not yet assigned and $g$ stores the sum of their corresponding means. In other words at the $k$th iteration for $k \ge 2$, $h = o_{k:n}^n = n - o_{1:k-1}^n$ and $g = n\bar{a}_{k:n}^n = n - n\bar{a}_{1:k-1}^n$. It is clear that $\{n\bar{a}_k^n\} + \{n\bar{a}_{k+1:n}^n\}$ is either equal
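to $\{n\bar{a}_{k:n}^n\}$ or $\{n\bar{a}_{k:n}^n\} + 1$. In the first of these cases, from the algorithm it follows that for $k \ge 2$,
\[
o_k^n = [n\bar{a}_k^n] + \big( o_{k:n}^n - [n\bar{a}_{k:n}^n] \big)\, 1_{[1 - \{n\bar{a}_k^n\}/\{n\bar{a}_{k:n}^n\},\, 1]}(u_k^n), \tag{9.11}
\]
from which it follows from the fact that $o_{k+1:n}^n + o_k^n = o_{k:n}^n$, that
\[
o_{k+1:n}^n = [n\bar{a}_{k+1:n}^n] + \big( o_{k:n}^n - [n\bar{a}_{k:n}^n] \big)\, 1_{[0,\, 1 - \{n\bar{a}_k^n\}/\{n\bar{a}_{k:n}^n\}]}(u_k^n); \tag{9.12}
\]
hence, using the fact that $o_{k:n}^n$ is $\mathcal{U}_{k-1}$-measurable and $u_k^n$ is independent of $\mathcal{U}_{k-1}$, we get from (9.11) that
\[
\begin{aligned}
\mathbb{E}\big[ (o_k^n - n\bar{a}_k^n) \mid \mathcal{U}_{k-1} \big]
&= -\{n\bar{a}_k^n\} + \big( o_{k:n}^n - [n\bar{a}_{k:n}^n] \big) \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}} \\
&= \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}} \big( o_{k:n}^n - n\bar{a}_{k:n}^n \big)
\end{aligned}
\tag{9.13}
\]
and by a similar calculation
\[
\begin{aligned}
\mathbb{E}\big[ (o_k^n - n\bar{a}_k^n)^2 \mid \mathcal{U}_{k-1} \big]
&= \big( o_{k:n}^n - n\bar{a}_{k:n}^n \big)^2 \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}}
 + \big( \{n\bar{a}_{k:n}^n\} - \{n\bar{a}_k^n\} \big) \{n\bar{a}_k^n\} \\
&\quad + 2 \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}} \big( o_{k:n}^n - n\bar{a}_{k:n}^n \big) \big( \{n\bar{a}_{k:n}^n\} - \{n\bar{a}_k^n\} \big).
\end{aligned}
\tag{9.14}
\]
The identities (9.13), (9.14) and the corresponding identities derived from (9.12), viz:
\[
\mathbb{E}\big[ o_{k+1:n}^n - n\bar{a}_{k+1:n}^n \mid \mathcal{U}_{k-1} \big] = \big( o_{k:n}^n - n\bar{a}_{k:n}^n \big) \Big( 1 - \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}} \Big)
\]
and
\[
\begin{aligned}
\mathbb{E}\big[ (o_{k+1:n}^n - n\bar{a}_{k+1:n}^n)^2 \mid \mathcal{U}_{k-1} \big]
&= \big( o_{k:n}^n - n\bar{a}_{k:n}^n \big)^2 \Big( 1 - \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}} \Big) \\
&\quad + 2 \big( o_{k:n}^n - n\bar{a}_{k:n}^n \big) \{n\bar{a}_k^n\} \Big( 1 - \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}} \Big) \\
&\quad + \big( \{n\bar{a}_{k:n}^n\} - \{n\bar{a}_k^n\} \big) \{n\bar{a}_k^n\},
\end{aligned}
\]
give the induction step for properties (b), (c) and (d). For example, in the case of (b), taking expectation over (9.13) we see that
\[
\mathbb{E}\big[ o_k^n - n\bar{a}_k^n \big] = \frac{\{n\bar{a}_k^n\}}{\{n\bar{a}_{k:n}^n\}}\, \mathbb{E}\big[ o_{k:n}^n - n\bar{a}_{k:n}^n \big]
\]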
and the right-hand side is zero by the inductive hypothesis. The case $\{n\bar{a}_k^n\} + \{n\bar{a}_{k+1:n}^n\} = \{n\bar{a}_{k:n}^n\} + 1$ is treated in a similar manner. Finally, for the proof of property (e) one shows first that for $j > i$,
\[
\mathbb{E}\big[ o_j^n - n\bar{a}_j^n \mid \mathcal{U}_i \big] = c_{i:j} \big( o_{i+1:n}^n - n\bar{a}_{i+1:n}^n \big),
\qquad
c_{i:j} = p_j \prod_{k=i}^{j-2} q_k \ge 0,
\]
where we adopt the convention $\prod_{k=i}^{j-2} q_k = 1$ if $i = j - 1$, and where
\[
p_j =
\begin{cases}
\{n\bar{a}_j^n\} / \{n\bar{a}_{j:n}^n\} & \text{if } \{n\bar{a}_j^n\} + \{n\bar{a}_{j+1:n}^n\} = \{n\bar{a}_{j:n}^n\} \\[4pt]
(1 - \{n\bar{a}_j^n\}) / (1 - \{n\bar{a}_{j:n}^n\}) & \text{if } \{n\bar{a}_j^n\} + \{n\bar{a}_{j+1:n}^n\} = \{n\bar{a}_{j:n}^n\} + 1
\end{cases}
\]
\[
q_k =
\begin{cases}
\{n\bar{a}_{k:n}^n\} / \{n\bar{a}_{k-1:n}^n\} & \text{if } \{n\bar{a}_{k-1}^n\} + \{n\bar{a}_{k:n}^n\} = \{n\bar{a}_{k-1:n}^n\} \\[4pt]
(1 - \{n\bar{a}_{k:n}^n\}) / (1 - \{n\bar{a}_{k-1:n}^n\}) & \text{otherwise.}
\end{cases}
\]
Then, for $j > i$
\[
\mathbb{E}\big[ (o_i^n - n\bar{a}_i^n)(o_j^n - n\bar{a}_j^n) \big] = c_{i:j}\, \mathbb{E}\big[ (o_i^n - n\bar{a}_i^n)(o_{i+1:n}^n - n\bar{a}_{i+1:n}^n) \big] = -r_i c_{i:j},
\]
where
\[
r_i =
\begin{cases}
\{n\bar{a}_i^n\} \{n\bar{a}_{i+1:n}^n\} & \text{if } \{n\bar{a}_i^n\} + \{n\bar{a}_{i+1:n}^n\} = \{n\bar{a}_{i:n}^n\} \\[4pt]
(1 - \{n\bar{a}_i^n\})(1 - \{n\bar{a}_{i+1:n}^n\}) & \text{if } \{n\bar{a}_i^n\} + \{n\bar{a}_{i+1:n}^n\} = \{n\bar{a}_{i:n}^n\} + 1.
\end{cases}
\]
As $r_i \ge 0$ and $c_{i:j} \ge 0$, it follows that
\[
\mathbb{E}\big[ (o_i^n - n\bar{a}_i^n)(o_j^n - n\bar{a}_j^n) \big] \le 0. \qquad \square
\]

Remark 9.4. Proposition 9.3 states that the algorithm presented above produces an $n$-tuple of integer-valued random variables $o_j^n$ for $j = 1, \dots, n$ with minimal variance, negatively correlated and whose sum is always $n$. Moreover, not only do the individual $o_j^n$s have minimal variance, but also any sum of the form $\sum_{j=1}^k o_j^n$ or $\sum_{j=k}^n o_j^n$ is an integer-valued random variable with minimal variance for any $k = 1, \dots, n$. This additional property can be interpreted as a further restriction on the random perturbation introduced by the branching correction.

Remark 9.5. Since the change of measure from $\mathbb{P}$ to $\tilde{\mathbb{P}}$ does not affect the distribution of the random variables $u_{j'}^n$, for $j' = 1, \dots, n-1$, all the properties stated in Proposition 9.3 hold true under $\tilde{\mathbb{P}}$ as well.
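Lemma 9.6. The process $\pi^n = \{\pi_t^n,\ t \ge 0\}$ is a probability measure-valued process with càdlàg paths. In particular, $\pi^n$ is continuous on any interval $[i\delta, (i+1)\delta)$, $i \ge 0$. Also, for any $i > 0$ we have
\[
\mathbb{E}\big[ \pi_{i\delta}^n \mid \mathcal{F}_{i\delta-} \big] = \lim_{t \nearrow i\delta} \pi_t^n. \tag{9.15}
\]
The same identity holds true under the probability measure $\tilde{\mathbb{P}}$. That is,
\[
\tilde{\mathbb{E}}\big[ \pi_{i\delta}^n \mid \mathcal{F}_{i\delta-} \big] = \lim_{t \nearrow i\delta} \pi_t^n.
\]

Proof. Since the pair processes $(\bar{a}_j^n(t), v_j^n(t))$, $j = 1, 2, \dots, n$ are continuous in the interval $[i\delta, (i+1)\delta)$ it follows that for any $\varphi \in C_b(\mathbb{R}^d)$ the function
\[
\pi_t^n(\varphi) = \sum_{j=1}^n \bar{a}_j^n(t)\, \varphi(v_j^n(t))
\]
is continuous for $t \in (i\delta, (i+1)\delta)$. Hence $\pi^n$ is continuous with respect to the weak topology on $\mathcal{M}(\mathbb{R}^d)$ for $t \in (i\delta, (i+1)\delta)$, for each $i \ge 0$. By the same argument, $\pi^n$ is right continuous and has left limits at $i\delta$ for any $i > 0$. For any $t \ge 0$,
\[
\pi_t^n(1) = \sum_{j=1}^n \bar{a}_j^n(t) = 1,
\]
therefore $\pi^n$ is probability measure-valued. The identity (9.15) follows by observing that at the time $i\delta$ the weights are reset to one; thus for $\varphi \in B(\mathbb{R}^d)$, it follows that
\[
\pi_{i\delta}^n(\varphi) = \frac{1}{n} \sum_{j'=1}^n o_{j'}^{n,i\delta}\, \varphi(v_{j'}^n(i\delta))
\]
and from (9.8) and (9.9), we have
\[
\begin{aligned}
\mathbb{E}\big[ \pi_{i\delta}^n(\varphi) \mid \mathcal{F}_{i\delta-} \big]
&= \frac{1}{n} \sum_{j'=1}^n \mathbb{E}\big[ o_{j'}^{n,i\delta} \mid \mathcal{F}_{i\delta-} \big]\, \varphi\big(v_{j'}^n(i\delta)\big) \\
&= \sum_{j'=1}^n \bar{a}_{j'}^{n,i\delta}\, \varphi\big(v_{j'}^n(i\delta)\big) \\
&= \lim_{t \nearrow i\delta} \sum_{j'=1}^n \bar{a}_{j'}^n(t)\, \varphi\big(v_{j'}^n(t)\big).
\end{aligned}
\]
Finally, from Remark 9.5, since the change of measure from $\mathbb{P}$ to $\tilde{\mathbb{P}}$ does not affect the distribution of the random variables $u_{j'}^n$, for $j' = 1, \dots, n-1$, it follows that
\[
\tilde{\mathbb{E}}\big[ o_{j'}^{n,i\delta} \mid \mathcal{F}_{i\delta-} \big] = n\bar{a}_{j'}^{n,i\delta},
\]
hence also $\tilde{\mathbb{E}}[\pi_{i\delta}^n \mid \mathcal{F}_{i\delta-}] = \lim_{t \nearrow i\delta} \pi_t^n$. $\square$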
If the system does not undergo any corrections, that is, δ = ∞, then the above method is simply the Monte Carlo method described in Section 8.6. The convergence of the Monte Carlo approximation is very slow as the particles wander away from the signal’s trajectory forcing the unnormalised weights to become infinitesimally small. Consequently the branching correction procedure is introduced to cull the unlikely particles and multiply those situated in the right areas. However, the branching procedure introduces randomness into the system as it replaces each weight with a random number of offspring. As such, the distribution of the number of offspring has to be chosen with great care to minimise this effect. The random number of offspring should have minimal variance. That is, as the mean number of offspring is pre-determined, we should choose the onj0 s to have the smallest possible variance amongst all integer-valued random variables with the given mean n¯ anj0 . It is easy to check n that if the oj 0 s have the distribution described by (9.7) then they have minimal variance. In [66], Crisan and Lyons describe a generic way to construct n-tuples of integer-valued random variables with the minimal variance property and the total sum equal to n. This is done by means of an associated binary tree, hence the name Tree-based branching Algorithms (which are sometimes abbreviated as TBBAs). The algorithm presented above is a specific example of the class described in [66]. To the authors’ knowledge only one other alternative algorithm is known that produces n-tuples which satisfy the minimal variance property. It was introduced by Whitley [268] and independently by Carpenter, Clifford and Fearnhead [39]. Further remarks on the branching algorithm can be found at the end of Chapter 10.
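The claim about the distribution (9.7) can be checked empirically for a single particle. The sketch below is illustrative only: the helper name `offspring` and the sample mean value $a = 2.3$ are assumptions made for this example. It draws from the two-point distribution in (9.10) and compares the sample mean and variance with $a$ and $\{a\}(1 - \{a\})$.

```python
import math
import random

def offspring(a, rng):
    """Sample [a] with prob. 1 - {a} and [a] + 1 with prob. {a}, as in (9.7) and (9.10)."""
    base = math.floor(a)
    return base + (1 if rng.random() < a - base else 0)

rng = random.Random(0)
a = 2.3                                   # hypothetical mean number of offspring
draws = [offspring(a, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
frac = a - math.floor(a)
# Sample mean and variance, followed by the theoretical variance {a}(1 - {a}).
print(mean, var, frac * (1 - frac))
```

Sampling each particle independently in this way gives the minimal variance per particle, but the total number of offspring is then random; the algorithm of Section 9.2.1 additionally forces the total to equal $n$.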
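9.3 Preliminary Results

The following proposition gives us the evolution equation for the approximating measure-valued process $\pi^n$.

Proposition 9.7. The probability measure-valued process $\pi^n = \{\pi_t^n,\ t \ge 0\}$ satisfies the following evolution equation
\[
\begin{aligned}
\pi_t^n(\varphi) &= \pi_0^n(\varphi) + \int_0^t \pi_s^n(A\varphi)\, ds + S_t^{n,\varphi} + M_{[t/\delta]}^{n,\varphi} \\
&\quad + \sum_{k=1}^m \int_0^t \big( \pi_s^n(h^k \varphi) - \pi_s^n(h^k)\, \pi_s^n(\varphi) \big) \big( dY_s^k - \pi_s^n(h^k)\, ds \big),
\end{aligned}
\tag{9.16}
\]
for any $\varphi \in C_b^2(\mathbb{R}^d)$, where $S^{n,\varphi} = \{S_t^{n,\varphi},\ t \ge 0\}$ is the $\mathcal{F}_t$-adapted martingale
\[
S_t^{n,\varphi} = \sum_{i=0}^\infty \sum_{j=1}^n \int_{i\delta \wedge t}^{(i+1)\delta \wedge t} \bar{a}_j^n(s) \big( (\nabla\varphi)^\top \sigma \big)(v_j^n(s))\, dV_s^{(j)},
\]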
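and $M^{n,\varphi} = \{M_k^{n,\varphi},\ k > 0\}$ is the discrete parameter martingale
\[
M_k^{n,\varphi} = \frac{1}{n} \sum_{i=1}^k \sum_{j'=1}^n \big( o_{j'}^n(i\delta) - n\bar{a}_{j'}^{n,i\delta} \big)\, \varphi(v_{j'}^n(i\delta)), \qquad k > 0. \tag{9.17}
\]

Proof. Let $\mathcal{F}_{k\delta-} = \sigma(\mathcal{F}_s,\ 0 \le s < k\delta)$ be the $\sigma$-algebra of events up to time $k\delta$ (the time of the $k$th branching) and $\pi_{k\delta-}^n = \lim_{t \nearrow k\delta} \pi_t^n$. For $t \in [i\delta, (i+1)\delta)$, we have† for $\varphi \in C_b^2(\mathbb{R}^d)$,
\[
\pi_t^n(\varphi) = \pi_0^n(\varphi) + M_i^{n,\varphi} + \sum_{k=1}^i \big( \pi_{k\delta-}^n(\varphi) - \pi_{(k-1)\delta}^n(\varphi) \big) + \big( \pi_t^n(\varphi) - \pi_{i\delta}^n(\varphi) \big), \tag{9.18}
\]
where $M^{n,\varphi} = \{M_j^{n,\varphi},\ j \ge 0\}$ is the process defined as
\[
M_j^{n,\varphi} = \sum_{k=1}^j \big( \pi_{k\delta}^n(\varphi) - \pi_{k\delta-}^n(\varphi) \big), \qquad \text{for } j \ge 0.
\]

† We use the standard convention $\sum_{k=1}^0 = 0$.

The martingale property of $M^{n,\varphi}$ follows from (9.15) and the explicit expression (9.17) from the fact that $\pi_{k\delta}^n = (1/n) \sum_{j'=1}^n o_{j'}^{n,k\delta}\, \delta_{v_{j'}^n(k\delta)}$ and $\pi_{k\delta-}^n = \sum_{j'=1}^n \bar{a}_{j'}^{n,k\delta}\, \delta_{v_{j'}^n(k\delta)}$.

We now find an expression for the third and fourth terms on the right-hand side of (9.18). From Itô's formula using (9.4), (9.5) and the independence of $Y$ and $V$, it follows that
\[
\begin{aligned}
d\big( a_j^n(t)\, \varphi(v_j^n(t)) \big) &= a_j^n(t)\, A\varphi(v_j^n(t))\, dt \\
&\quad + a_j^n(t) \big( (\nabla\varphi)^\top \sigma \big)(v_j^n(t))\, dV_t^{(j)} \\
&\quad + a_j^n(t)\, \varphi(v_j^n(t))\, h^\top(v_j^n(t))\, dY_t,
\end{aligned}
\]
and
\[
d\Big( \sum_{k=1}^n a_k^n(t) \Big) = \sum_{k=1}^n a_k^n(t)\, h^\top(v_k^n(t))\, dY_t,
\]
for any $\varphi \in C_b^2(\mathbb{R}^d)$. Hence for $t \in [k\delta, (k+1)\delta)$ and $k = 0, 1, \dots, i$, we have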
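\[
\pi_t^n(\varphi) - \pi_{(k-1)\delta}^n(\varphi) = \int_{(k-1)\delta}^t d\Big( \sum_{j=1}^n \bar{a}_j^n(s)\, \varphi(v_j^n(s)) \Big) \tag{9.19}
\]
\[
\begin{aligned}
&= \int_{(k-1)\delta}^t d\Big( \sum_{j=1}^n \frac{ a_j^n(s)\, \varphi(v_j^n(s)) }{ \sum_{p=1}^n a_p^n(s) } \Big) \\
&= \int_{(k-1)\delta}^t \pi_s^n(A\varphi)\, ds \\
&\quad + \sum_{r=1}^m \int_{(k-1)\delta}^t \big( \pi_s^n(h^r \varphi) - \pi_s^n(h^r)\, \pi_s^n(\varphi) \big) \big( dY_s^r - \pi_s^n(h^r)\, ds \big) \\
&\quad + \sum_{j=1}^n \int_{(k-1)\delta}^t \bar{a}_j^n(s) \big( (\nabla\varphi)^\top \sigma \big)(v_j^n(s))\, dV_s^{(j)}.
\end{aligned}
\tag{9.20}
\]
Taking the limit as $t \nearrow k\delta$ yields
\[
\begin{aligned}
\pi_{k\delta-}^n(\varphi) - \pi_{(k-1)\delta}^n(\varphi)
&= \int_{(k-1)\delta}^{k\delta} \pi_s^n(A\varphi)\, ds \\
&\quad + \sum_{j=1}^n \int_{(k-1)\delta}^{k\delta} \bar{a}_j^n(s) \big( (\nabla\varphi)^\top \sigma \big)(v_j^n(s))\, dV_s^{(j)} \\
&\quad + \sum_{r=1}^m \int_{(k-1)\delta}^{k\delta} \big( \pi_s^n(h^r \varphi) - \pi_s^n(h^r)\, \pi_s^n(\varphi) \big) \big( dY_s^r - \pi_s^n(h^r)\, ds \big).
\end{aligned}
\tag{9.21}
\]
Finally, (9.18), (9.20) and (9.21) imply (9.16). $\square$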
In the following we choose a fixed time horizon $t > 0$ and let $\mathcal{Y}^t = \{\mathcal{Y}_s^t,\ s \in [0, t]\}$ be the backward filtration
\[
\mathcal{Y}_s^t = \sigma(Y_t - Y_r,\ r \in [s, t]).
\]
Recall that $C_b^m(\mathbb{R}^d)$ is the set of all bounded, continuous functions with bounded partial derivatives up to order $m$ on which we define the norm
\[
\|\varphi\|_{m,\infty} = \sum_{|\alpha| \le m} \sup_{x \in \mathbb{R}^d} |D^\alpha \varphi(x)|, \qquad \varphi \in C_b^m(\mathbb{R}^d),
\]
where $\alpha = (\alpha^1, \dots, \alpha^d)$ is a multi-index and $D^\alpha \varphi = (\partial_1)^{\alpha^1} \cdots (\partial_d)^{\alpha^d} \varphi$. Also recall that $W_p^m(\mathbb{R}^d)$ is the set of all functions with generalized partial derivatives up to order $m$ with both the function and all its partial derivatives being $p$-integrable on which we define the Sobolev norm
\[
\|\varphi\|_{m,p} = \Big( \sum_{|\alpha| \le m} \int_{\mathbb{R}^d} |D^\alpha \varphi(x)|^p\, dx \Big)^{1/p}.
\]
In the following we impose conditions under which the dual of the solution of the Zakai equation exists (see Chapter 7 for details). We assume that the matrix-valued function $a$ is uniformly strictly elliptic. We also assume that there exists an integer $m > 2$ and a positive constant $p > \max(d/(m-2), 2)$ such that for all $i, j = 1, \dots, d$, $a^{ij} \in C_b^{m+2}(\mathbb{R}^d)$, $f^i \in C_b^{m+1}(\mathbb{R}^d)$ and for all $i = 1, \dots, m$ we have $h^i \in C_b^{m+1}(\mathbb{R}^d)$. Under these conditions, for any bounded $\varphi \in W_p^m(\mathbb{R}^d)$ there exists a function-valued process $\psi^{t,\varphi} = \{\psi_s^{t,\varphi},\ s \in [0, t]\}$ which is the dual of the measure-valued process $\rho = \{\rho_s,\ s \in [0, t]\}$ (the solution of the Zakai equation) in the sense of Theorem 7.22. That is, for any $\varphi \in W_p^m(\mathbb{R}^d) \cap B(\mathbb{R}^d)$, the process
\[
s \mapsto \rho_s\big( \psi_s^{t,\varphi} \big), \qquad s \in [0, t]
\]
is almost surely constant. We recall below the properties of the dual as described in Chapter 7.

1. For every $x \in \mathbb{R}^d$, $\psi_s^{t,\varphi}(x)$ is a real-valued process measurable with respect to the backward filtration $\mathcal{Y}^t$.
2. Almost surely, $\psi^{t,\varphi}$ is jointly continuous over $[0, \infty) \times \mathbb{R}^d$ and is twice differentiable in the spatial variable. Both $\psi_s^{t,\varphi}$ and its partial derivatives are continuous bounded functions.
3. $\psi^{t,\varphi}$ is a solution of the following backward stochastic partial differential equation which is identical to (7.30):
\[
\psi_s^{t,\varphi}(x) = \varphi(x) - \int_s^t A\psi_p^{t,\varphi}(x)\, dp - \int_s^t \psi_p^{t,\varphi}(x)\, h^\top(x)\, \bar{d}Y_p, \qquad 0 \le s \le t,\ x \in \mathbb{R}^d,
\]
where $\int_s^t \psi_p^{t,\varphi} h^\top\, \bar{d}Y_p$ is a backward Itô integral.
4. There exists a constant $c = c(p)$ independent of $\varphi$ such that
\[
\tilde{\mathbb{E}}\Big[ \sup_{s \in [0, t]} \big\| \psi_s^{t,\varphi} \big\|_{2,\infty}^p \Big] \le c \|\varphi\|_{m,p}^p. \tag{9.22}
\]
As mentioned in Chapter 7, the dual $\psi^{t,\varphi}$ can be defined for a larger class of the test functions $\varphi$ than $W_p^m(\mathbb{R}^d)$, using the representation (7.33). We can rewrite (7.33) in the following form,
\[
\psi_s^{t,\varphi}(x) = \tilde{\mathbb{E}}\big[ \varphi(v(t))\, a_s^t(v, Y) \mid \mathcal{Y}_t,\ v(s) = x \big], \tag{9.23}
\]
for any $\varphi \in B(\mathbb{R}^d)$. In (9.23), $v = \{v(s),\ s \in [0, t]\}$ is an $\mathcal{F}_s$-adapted Markov process, independent of $Y$, that satisfies the same stochastic differential equation as the signal; that is,
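\[
dv(t) = f(v(t))\, dt + \sigma(v(t))\, dV_t
\]
and
\[
a_s^t(v, Y) = \exp\Big( \int_s^t h(v(r))^\top dY_r - \frac{1}{2} \int_s^t \|h(v(r))\|^2\, dr \Big).
\]

Lemma 9.8. For $s \in [0, t]$ and $\varphi \in B(\mathbb{R}^d)$, we have
\[
\psi_s^{t,\varphi}(v(s)) = \tilde{\mathbb{E}}\big[ \varphi(v(t))\, a_s^t(v, Y) \mid \mathcal{F}_s \vee \mathcal{Y}_t \big].
\]

Proof. From (9.23) and the properties of the conditional expectation
\[
\psi_s^{t,\varphi}(v(s)) = \tilde{\mathbb{E}}\big[ \varphi(v(t))\, a_s^t(v, Y) \mid \mathcal{Y}_t \vee \sigma(v(s)) \big]
\]
and the claim follows by the Markov property of the process $v$ and its independence from $\mathcal{Y}_t$. $\square$

Lemma 9.9. For any $\varphi \in B(\mathbb{R}^d)$ and any $k < [t/\delta]$, the real-valued process
\[
s \in [k\delta, (k+1)\delta \wedge t) \mapsto \psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s)
\]
is an $\mathcal{F}_s \vee \mathcal{Y}_t$-adapted martingale. Moreover, if $\varphi \in W_p^m(\mathbb{R}^d) \cap B(\mathbb{R}^d)$ where $m > 2$ and $(m-2)p > d$,
\[
\psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s) = \psi_{k\delta}^{t,\varphi}\big(v_j^n(k\delta)\big) + \int_{k\delta}^s a_j^n(p) \big( (\nabla\psi_p^{t,\varphi})^\top \sigma \big)\big(v_j^n(p)\big)\, dV_p^{(j)}, \tag{9.24}
\]
for $s \in [k\delta, (k+1)\delta \wedge t)$ and $j = 1, \dots, n$.

Proof. For the first part of the proof we cannot simply use the fact that $\psi^{t,\varphi}$ is a (classical) solution of the backward stochastic partial differential equation (7.30) as the test function $\varphi$ does not necessarily belong to $W_p^m(\mathbb{R}^d)$. However, from Lemma 9.8 it follows that
\[
\psi_s^{t,\varphi}(v_j^n(s)) = \tilde{\mathbb{E}}\big[ \varphi\big(v_j^n(t)\big)\, a_s^t(v_j^n, Y) \mid \mathcal{F}_s \vee \mathcal{Y}_t \big], \tag{9.25}
\]
where for $j = 1, \dots, n$, following (9.6),
\[
a_s^t(v_j^n, Y) = \exp\Big( \int_s^t h\big(v_j^n(r)\big)^\top dY_r - \frac{1}{2} \int_s^t \|h(v_j^n(r))\|^2\, dr \Big)
\]
and $v_j^n(s)$ is given by
\[
v_j^n(s) = v_j^n(k\delta) + \int_{k\delta}^s f(v_j^n(r))\, dr + \int_{k\delta}^s \sigma(v_j^n(r))\, dV_r^{(j)}, \qquad j = 1, \dots, n, \tag{9.26}
\]
which is taken as the definition for $s \in [k\delta, t]$. Comparing this with (9.4) it is clear that if $(k+1)\delta < t$, then this $v_j^n(s)$ may not agree with the previous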
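definition on $((k+1)\delta, t]$. Observe that $a_s^t(v_j^n, Y) = a_j^n(t)/a_j^n(s)$ where $a_j^n(s)$ is given for $s \in [k\delta, t]$ by
\[
a_j^n(s) = \exp\Big( \int_{k\delta}^s h(v_j^n(p))^\top dY_p - \frac{1}{2} \int_{k\delta}^s \|h(v_j^n(p))\|^2\, dp \Big); \tag{9.27}
\]
since $a_j^n(s)$ is $\mathcal{F}_s$-adapted it is also $\mathcal{F}_s \vee \mathcal{Y}_t$-adapted, thus
\[
\psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s) = \tilde{\mathbb{E}}\big[ \varphi\big(v_j^n(t)\big)\, a_j^n(t) \mid \mathcal{F}_s \vee \mathcal{Y}_t \big]. \tag{9.28}
\]
Since $s \mapsto \tilde{\mathbb{E}}[\varphi(v_j^n(t))\, a_j^n(t) \mid \mathcal{F}_s \vee \mathcal{Y}_t]$ is an $\mathcal{F}_s \vee \mathcal{Y}_t$-adapted martingale for $s \in [0, t]$, so is $s \mapsto \psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s)$. This completes the proof of the first part of the lemma.

For the second part of the lemma, as $\varphi \in W_p^m(\mathbb{R}^d)$, it is now possible to use properties 1–4 of the dual process $\psi^{t,\varphi}$, in particular the fact that $\psi^{t,\varphi}$ is differentiable. The stochastic integral on the right-hand side of (9.24) is well defined as the Brownian motion $V^{(j)} = \{V_s^{(j)},\ s \in [k\delta, (k+1)\delta \wedge t)\}$ is $\mathcal{F}_s \vee \mathcal{Y}_t$-adapted ($V^{(j)}$ is independent of $Y$) and so is the integrand
\[
p \in [k\delta, (k+1)\delta \wedge t) \mapsto a_j^n(p) \big( (\nabla\psi_p^{t,\varphi})^\top \sigma \big)\big(v_j^n(p)\big).
\]
Moreover, the stochastic integral on the right-hand side of (9.24) is a genuine martingale since its quadratic variation process $Q = \{Q_s,\ s \in [k\delta, (k+1)\delta \wedge t)\}$ satisfies the inequality
\[
\tilde{\mathbb{E}}[Q_s] \le K_\sigma^2 \int_{k\delta}^s \tilde{\mathbb{E}}\big[ \|\psi_p^{t,\varphi}\|_{1,\infty}^2 \big]\, \tilde{\mathbb{E}}\big[ (a_j^n(p))^2 \big]\, dp < \infty. \tag{9.29}
\]
In (9.29) we used the fact that $\|\psi_p^{t,\varphi}\|_{1,\infty}^2$ and $a_j^n(p)$ are mutually independent and that $\sigma$ is uniformly bounded by $K_\sigma$. We cannot prove (9.24) by applying Itô's formula directly: $\psi_p^{t,\varphi}$ is $\mathcal{Y}_p^t$-measurable, whereas $a_j^n(p)$ is $\mathcal{F}_p$-measurable. Instead, we use a density argument. Since all terms appearing in (9.24) are measurable with respect to the $\sigma$-algebra $\mathcal{F}_{k\delta} \vee \mathcal{Y}_{k\delta}^t \vee (\mathcal{V}^j)_{k\delta}^t$, where $\mathcal{Y}_{k\delta}^t = \sigma(Y_r - Y_{k\delta},\ r \in [k\delta, t])$ and $(\mathcal{V}^j)_{k\delta}^t = \sigma(V_r^j - V_{k\delta}^j,\ r \in [k\delta, t])$, it suffices to prove that
\[
\tilde{\mathbb{E}}\Big[ \chi \Big( \psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s) - \psi_{k\delta}^{t,\varphi}\big(v_j^n(k\delta)\big) \Big) \Big]
= \tilde{\mathbb{E}}\Big[ \chi \int_{k\delta}^s a_j^n(p) \big( (\nabla\psi_p^{t,\varphi})^\top \sigma \big)\big(v_j^n(p)\big)\, dV_p^{(j)} \Big], \tag{9.30}
\]
where $\chi$ is any bounded $\mathcal{F}_{k\delta} \vee \mathcal{Y}_{k\delta}^t \vee (\mathcal{V}^j)_{k\delta}^t$-measurable random variable. It is sufficient to work with a much smaller class of bounded $\mathcal{F}_{k\delta} \vee \mathcal{Y}_{k\delta}^t \vee (\mathcal{V}^j)_{k\delta}^t$-measurable random variables. Let $b : [k\delta, t] \to \mathbb{R}^m$ and $c : [k\delta, t] \to \mathbb{R}^d$ be bounded, Borel-measurable functions and let $\theta^b$ and $\theta^c$ be the following (bounded) processes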
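\[
\theta_r^b \triangleq \exp\Big( i \int_{k\delta}^r b_p^\top\, dY_p + \frac{1}{2} \int_{k\delta}^r \|b_p\|^2\, dp \Big), \tag{9.31}
\]
and
\[
\theta_r^c \triangleq \exp\Big( i \int_{k\delta}^r c_p^\top\, dV_p^{(j)} + \frac{1}{2} \int_{k\delta}^r \|c_p\|^2\, dp \Big). \tag{9.32}
\]
Then it is sufficient to show that (9.30) holds true for $\chi$ of the form
\[
\chi = \zeta\, \theta_t^b\, \theta_t^c,
\]
for any choice of $b$ in (9.31) and $c$ in (9.32) and any bounded $\mathcal{F}_{k\delta}$-measurable random variable $\zeta$ (see Corollary B.40 for a justification of the above). For $s \in [k\delta, (k+1)\delta \wedge t)$,
\[
\tilde{\mathbb{E}}\Big[ \psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s)\, \zeta\, \theta_t^b\, \theta_t^c \,\Big|\, \mathcal{F}_{k\delta} \vee \mathcal{Y}_{k\delta}^s \vee (\mathcal{V}^j)_{k\delta}^s \Big]
= \Xi_s(v_j^n(s))\, a_j^n(s)\, \zeta\, \theta_s^b\, \theta_s^c, \tag{9.33}
\]
where $\Xi = \{\Xi_s(\cdot),\ s \in [k\delta, (k+1)\delta \wedge t]\}$ is given by
\[
\Xi_s(\cdot) \triangleq \tilde{\mathbb{E}}\Big[ \psi_s^{t,\varphi}(\cdot)\, \tilde{\theta}_s^b \,\Big|\, \mathcal{F}_{k\delta} \vee \mathcal{Y}_{k\delta}^s \vee (\mathcal{V}^j)_{k\delta}^s \Big],
\]
and
\[
\tilde{\theta}_s^b \triangleq \frac{\theta_t^b}{\theta_s^b} = \exp\Big( i \int_s^t b_p^\top\, dY_p + \frac{1}{2} \int_s^t \|b_p\|^2\, dp \Big).
\]
Both $\psi_s^{t,\varphi}$ and $\tilde{\theta}_s^b$ are measurable with respect to the $\sigma$-algebra $\mathcal{Y}_s^t$, which is independent of $\mathcal{F}_{k\delta} \vee \mathcal{Y}_{k\delta}^s \vee (\mathcal{V}^j)_{k\delta}^s$, hence $\Xi_s(\cdot) = \tilde{\mathbb{E}}[\psi_s^{t,\varphi}(\cdot)\, \tilde{\theta}_s^b]$. As in the proof of Theorem 7.22 it follows that for any $r \in C_b^m([0, \infty), \mathbb{R}^d)$ and any $x \in \mathbb{R}^d$,
\[
\Xi_s(x) = \varphi(x) - \int_s^t A\Xi_p(x)\, dp - i \int_s^t h^\top(x)\, r_p\, \Xi_p(x)\, dp, \qquad 0 \le s \le t. \tag{9.34}
\]
Equivalently $\Xi(\cdot) = \{\Xi_s(\cdot),\ s \in [0, t]\}$ is the unique solution of the parabolic PDE (4.14) with final time condition $\Xi_t(\cdot) = \varphi(\cdot)$. From the Sobolev embedding theorem as a consequence of the condition $(m-2)p > d$, it follows that $\varphi$ has a modification on a set of null Lebesgue measure which is in $C_b(\mathbb{R}^d)$, therefore the solution to the PDE $\Xi \in C_b^{1,2}([0, t] \times \mathbb{R}^d)$. From (9.33) it follows that
\[
\tilde{\mathbb{E}}\Big[ \Big( \psi_s^{t,\varphi}(v_j^n(s))\, a_j^n(s) - \psi_{k\delta}^{t,\varphi}\big(v_j^n(k\delta)\big) \Big) \chi \Big]
= \tilde{\mathbb{E}}\Big[ \zeta \Big( \Xi_s(v_j^n(s))\, a_j^n(s)\, \theta_s^b\, \theta_s^c - \Xi_{k\delta}\big(v_j^n(k\delta)\big) \Big) \Big]. \tag{9.35}
\]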
As $\Xi$ is the solution of a deterministic PDE with deterministic initial condition, it follows that $\Xi_s(v_j^n(s))$ is $\mathcal{F}_s$-measurable. Thus as all the terms are now measurable with respect to the same filtration, it is possible to apply Itô's rule and use the PDE (9.34) to obtain
˜ ζ Ξs (v n (s))an (s)θb θc − Ξkδ v n (kδ) an (kδ)θb θc E j j s s j j kδ kδ Z s ˜ ζ =E d anj (p)Ξp (vjn (p))θpb θpc Z kδ s ˜ ζ =E anj (p)θpb θpc AΞp (vjn (p)) + iΞp (vjn (p))h> (vjn (p))bp kδ ∂Ξp n + (vj (p)) + i(∇Ξ)> σcp θpb θpc dp ∂p Z s ˜ iζ =E anj (p)(∇Ξ > σ)cp θpb θpc dp Zkδs ˜ iζ =E anj (p) ∇Ξp> σ (vjn (p))cp θpb θpc dp . (9.36) kδ
A second similar application of Itˆ o’s formula using (9.32) yields Z s n t,ϕ > n j t ˜ ζθb θc E a (p)((∇ψ ) σ)(v (p)) dV F ∨ Y kδ t t j p j p kδ kδ Z s Z s t ˜ d θtc anj (p)((∇ψpt,ϕ )> σ)(vjn (p)) dVpj Fkδ ∨ Ykδ = ζθtb E kδ Zkδs t ˜ = iζθtb E anj (p) (∇ψpt,ϕ )> σ (vjn (p))cp θpc dp Fkδ ∨ Ykδ . (9.37) kδ
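Use of Fubini's theorem and the tower property of conditional expectation gives
\begin{align*}
\tilde{\mathbb{E}}&\Big[\zeta\int_{k\delta}^s a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,c_p\,\theta_t^b\theta_p^c\,dp\Big]\\
&=\int_{k\delta}^s\tilde{\mathbb{E}}\Big[\zeta a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,c_p\,\theta_t^b\theta_p^c\Big]dp\\
&=\int_{k\delta}^s\tilde{\mathbb{E}}\Big[\tilde{\mathbb{E}}\Big[\zeta a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,c_p\,\theta_t^b\theta_p^c\ \Big|\ \mathcal{F}_{k\delta}\vee\mathcal{Y}_{k\delta}^p\vee(\mathcal{V}^j)_{k\delta}^p\Big]\Big]dp\\
&=\int_{k\delta}^s\tilde{\mathbb{E}}\Big[\zeta\theta_p^c\theta_p^b a_j^n(p)\,c_p\,\tilde{\mathbb{E}}\Big[\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,\tilde\theta_p^b\ \Big|\ \mathcal{F}_{k\delta}\vee\mathcal{Y}_{k\delta}^p\vee(\mathcal{V}^j)_{k\delta}^p\Big]\Big]dp\\
&=\int_{k\delta}^s\tilde{\mathbb{E}}\Big[\zeta\theta_p^c\theta_p^b a_j^n(p)\,c_p\,\tilde{\mathbb{E}}\Big[(\nabla\psi_p^{t,\varphi})^\top(v_j^n(p))\,\tilde\theta_p^b\Big]\sigma(v_j^n(p))\Big]dp\\
&=\int_{k\delta}^s\tilde{\mathbb{E}}\Big[\zeta\theta_p^c\theta_p^b a_j^n(p)\,c_p\,\nabla\tilde{\mathbb{E}}\Big[(\psi_p^{t,\varphi})^\top(v_j^n(p))\,\tilde\theta_p^b\Big]\sigma(v_j^n(p))\Big]dp\\
&=\tilde{\mathbb{E}}\Big[\zeta\int_{k\delta}^s a_j^n(p)\big(\nabla\Xi_p^\top\sigma\big)(v_j^n(p))\,c_p\,\theta_p^b\theta_p^c\,dp\Big].
\end{align*}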
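Using this result and (9.37) it follows that
\[
\tilde{\mathbb{E}}\Big[\zeta\theta_t^b\theta_t^c\int_{k\delta}^s a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,dV_p^j\Big]
=\tilde{\mathbb{E}}\Big[i\zeta\int_{k\delta}^s a_j^n(p)\big(\nabla\Xi_p^\top\sigma\big)(v_j^n(p))\,c_p\,\theta_p^b\theta_p^c\,dp\Big]. \tag{9.38}
\]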
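From (9.35), (9.36) and (9.38) we deduce (9.30) and hence the result of the lemma. □

To show that $\psi_s^{t,\varphi}$ is dual to $\rho_s$ for arbitrary $\varphi\in B(\mathbb{R}^d)$, use the fact that $(v_j^n(s),a_j^n(s))$ have the same law as $(X,\tilde Z)$, and (9.28):
\begin{align*}
\rho_s(\psi_s^{t,\varphi})
&=\tilde{\mathbb{E}}[\tilde Z_s\,\psi_s^{t,\varphi}(X_s)\mid\mathcal{Y}_s]\\
&=\tilde{\mathbb{E}}[\tilde Z_s\,\psi_s^{t,\varphi}(X_s)\mid\mathcal{Y}_t]\\
&=\tilde{\mathbb{E}}[\psi_s^{t,\varphi}(v_j^n(s))\,a_j^n(s)\mid\mathcal{Y}_t]\\
&=\tilde{\mathbb{E}}\big[\tilde{\mathbb{E}}\big[\varphi(v_j^n(t))\,a_j^n(t)\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]\mid\mathcal{Y}_t\big]\\
&=\tilde{\mathbb{E}}\big[\varphi(v_j^n(t))\,a_j^n(t)\mid\mathcal{Y}_t\big]\\
&=\tilde{\mathbb{E}}\big[\varphi(X_t)\,\tilde Z_t\mid\mathcal{Y}_t\big]=\rho_t(\varphi).
\end{align*}
Define the following $\mathcal{F}_t$-adapted martingale $\xi^n=\{\xi_t^n,\ t\ge 0\}$ by
\[
\xi_t^n\triangleq\Bigg(\prod_{i=1}^{[t/\delta]}\frac1n\sum_{j=1}^n a_j^{n,i\delta}\Bigg)\Bigg(\frac1n\sum_{j=1}^n a_j^n(t)\Bigg).
\]

Exercise 9.10. Prove that for any $t\ge 0$ and $p\ge 1$, there exist two constants $c_1^{t,p}$ and $c_2^{t,p}$ which depend only on $\max_{k=1,\dots,m}\|h_k\|_{0,\infty}$ such that
\[
\sup_{n\ge 0}\ \sup_{s\in[0,t]}\tilde{\mathbb{E}}\big[(\xi_s^n)^p\big]\le c_1^{t,p}, \tag{9.39}
\]
and
\[
\max_{j=1,\dots,n}\ \sup_{n\ge 0}\ \sup_{s\in[0,t]}\tilde{\mathbb{E}}\big[\big(\xi_s^n a_j^n(s)\big)^p\big]\le c_2^{t,p}. \tag{9.40}
\]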
We use the martingale $\xi_t^n$ to linearize $\pi_t^n$ in order to make it easier to analyze the convergence of $\pi^n$. Let $\rho^n=\{\rho_t^n,\ t\ge 0\}$ be the measure-valued process defined by
\[
\rho_t^n\triangleq\xi_t^n\pi_t^n=\frac{\xi_{[t/\delta]\delta}^n}{n}\sum_{j=1}^n a_j^n(t)\,\delta_{v_j^n(t)}.
\]
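Exercise 9.11. Show that $\rho^n=\{\rho_t^n,\ t\ge 0\}$ is a measure-valued process which satisfies the following evolution equation
\[
\rho_t^n(\varphi)=\pi_0^n(\varphi)+\int_0^t\rho_s^n(A\varphi)\,ds+\bar S_t^{n,\varphi}+\bar M_{[t/\delta]}^{n,\varphi}+\sum_{k=1}^m\int_0^t\rho_s^n(h_k\varphi)\,dY_s^k, \tag{9.41}
\]
for any $\varphi\in C_b^2(\mathbb{R}^d)$. In (9.41), $\bar S^{n,\varphi}=\{\bar S_t^{n,\varphi},\ t\ge 0\}$ is the $\mathcal{F}_t$-adapted martingale
\[
\bar S_t^{n,\varphi}=\frac1n\sum_{i=0}^\infty\sum_{j=1}^n\int_{i\delta\wedge t}^{(i+1)\delta\wedge t}\xi_{i\delta}^n\,a_j^n(s)\big((\nabla\varphi)^\top\sigma\big)(v_j^n(s))\,dV_s^j
\]
and $\bar M^{n,\varphi}=\{\bar M_k^{n,\varphi},\ k>0\}$ is the discrete martingale
\[
\bar M_k^{n,\varphi}=\frac1n\sum_{i=1}^k\xi_{i\delta}^n\sum_{j'=1}^n\big(o_{j'}^n(i\delta)-n\bar a_{j'}^{n,i\delta}\big)\varphi(v_{j'}^n(i\delta)),\qquad k>0.
\]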
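Proposition 9.12. For any $\varphi\in B(\mathbb{R}^d)$, the real-valued process $\rho_\cdot^n(\psi_\cdot^{t,\varphi})=\{\rho_s^n(\psi_s^{t,\varphi}),\ s\in[0,t]\}$ is an $\mathcal{F}_s\vee\mathcal{Y}_t$-adapted martingale.

Proof. From Lemma 9.9 we deduce that for $s\in[[t/\delta]\delta,t]$, we have
\[
\tilde{\mathbb{E}}\big[a_j^n(t)\,\varphi(v_j^n(t))\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]=a_j^n(s)\,\psi_s^{t,\varphi}(v_j^n(s)),
\]
which implies, in particular, that
\[
\tilde{\mathbb{E}}\big[a_{j'}^{n,k\delta}\,\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]=a_{j'}^n(s)\,\psi_s^{t,\varphi}(v_{j'}^n(s))
\]
for any $s\in[(k-1)\delta,k\delta)$. Hence
\[
\tilde{\mathbb{E}}\big[\rho_t^n(\varphi)\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]
=\frac{\xi_{[t/\delta]\delta}^n}{n}\sum_{j=1}^n\tilde{\mathbb{E}}\big[a_j^n(t)\,\varphi(v_j^n(t))\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]
=\rho_s^n(\psi_s^{t,\varphi}),\qquad\text{for }[t/\delta]\delta\le s\le t, \tag{9.42}
\]
and, for $s\in[(k-1)\delta,k\delta)$,
\[
\tilde{\mathbb{E}}\big[\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]
=\frac{\xi_{(k-1)\delta}^n}{n}\sum_{j'=1}^n\tilde{\mathbb{E}}\big[a_{j'}^{n,k\delta}\,\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))\mid\mathcal{F}_s\vee\mathcal{Y}_t\big]
=\rho_s^n(\psi_s^{t,\varphi}). \tag{9.43}
\]
Finally
\[
\tilde{\mathbb{E}}\big[\rho_{k\delta}^n(\psi_{k\delta}^{t,\varphi})\mid\mathcal{F}_{k\delta-}\vee\mathcal{Y}_t\big]
=\frac{\xi_{k\delta}^n}{n}\sum_{j'=1}^n\frac{a_{j'}^{n,k\delta}}{\sum_{k'=1}^n a_{k'}^{n,k\delta}/n}\,\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))
=\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi}). \tag{9.44}
\]
The proposition now follows from (9.42), (9.43) and (9.44). □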
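Proposition 9.13. For any $\varphi\in W_p^m(\mathbb{R}^d)\cap B(\mathbb{R}^d)$, the real-valued process $\rho_\cdot^n(\psi_\cdot^{t,\varphi})=\{\rho_s^n(\psi_s^{t,\varphi}),\ s\in[0,t]\}$ has the representation
\[
\rho_t^n(\varphi)=\pi_0^n(\psi_0^{t,\varphi})+\hat S_t^{n,\varphi}+\hat M_{[t/\delta]}^{n,\varphi}. \tag{9.45}
\]
In (9.45), $\hat S^{n,\varphi}=\{\hat S_s^{n,\varphi},\ s\in[0,t]\}$ is the $\mathcal{F}_s\vee\mathcal{Y}_t$-adapted martingale
\[
\hat S_s^{n,\varphi}\triangleq\sum_{i=0}^\infty\sum_{j=1}^n\frac{\xi_{i\delta}^n}{n}\int_{i\delta\wedge s}^{(i+1)\delta\wedge s}a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,dV_p^{(j)}
\]
and $\hat M^{n,\varphi}=\{\hat M_k^{n,\varphi},\ k>0\}$ is the discrete martingale
\[
\hat M_k^{n,\varphi}\triangleq\sum_{i=1}^k\frac{\xi_{i\delta}^n}{n}\sum_{j=1}^n\big(o_j^n(i\delta)-n\bar a_j^n(i\delta)\big)\psi_{i\delta}^{t,\varphi}(v_j^n(i\delta)),\qquad k>0.
\]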
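Proof. As in (9.18), we have for $t\in[i\delta,(i+1)\delta)$ that
\begin{align*}
\rho_t^n(\varphi)=\rho_t^n(\psi_t^{t,\varphi})
&=\pi_0^n(\psi_0^{t,\varphi})+\hat M_i^{n,\varphi}
+\sum_{k=1}^i\Big(\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})-\rho_{(k-1)\delta}^n(\psi_{(k-1)\delta}^{t,\varphi})\Big)\\
&\quad+\Big(\rho_t^n(\psi_t^{t,\varphi})-\rho_{i\delta}^n(\psi_{i\delta}^{t,\varphi})\Big), \tag{9.46}
\end{align*}
where $\hat M^{n,\varphi}=\{\hat M_i^{n,\varphi},\ i\ge 0\}$ is the process defined as (note that $\psi_{k\delta-}^{t,\varphi}=\psi_{k\delta}^{t,\varphi}$)
\begin{align*}
\hat M_i^{n,\varphi}
&=\sum_{k=1}^i\Big(\rho_{k\delta}^n(\psi_{k\delta}^{t,\varphi})-\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})\Big)\\
&=\sum_{k=1}^i\xi_{k\delta}^n\Big(\pi_{k\delta}^n(\psi_{k\delta}^{t,\varphi})-\pi_{k\delta-}^n(\psi_{k\delta}^{t,\varphi})\Big)\\
&=\frac1n\sum_{k=1}^i\xi_{k\delta}^n\sum_{j'=1}^n\big(o_{j'}^{n,k\delta}-n\bar a_{j'}^{n,k\delta}\big)\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta)),\qquad\text{for }i\ge 0. \tag{9.47}
\end{align*}
The random variables $o_{j'}^{n,k\delta}$ are independent of $\mathcal{Y}_{k\delta}^t$ since they are $\mathcal{F}_{k\delta}$-adapted. Then (9.9) implies
\[
\tilde{\mathbb{E}}\big[o_{j'}^{n,k\delta}\mid\mathcal{F}_{k\delta-}\vee\mathcal{Y}_{k\delta}^t\big]=\tilde{\mathbb{E}}\big[o_{j'}^{n,k\delta}\mid\mathcal{F}_{k\delta-}\big]=n\bar a_{j'}^{n,k\delta},
\]
whence the martingale property of $\hat M^{n,\varphi}$. Finally, from the representation (9.24) we deduce that for $t\in[i\delta,(i+1)\delta)$,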
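\begin{align*}
\rho_t^n(\psi_t^{t,\varphi})
&=\frac{\xi_{i\delta}^n}{n}\sum_{j=1}^n a_j^n(t)\,\psi_t^{t,\varphi}(v_j^n(t))\\
&=\frac{\xi_{i\delta}^n}{n}\sum_{j=1}^n\psi_{i\delta}^{t,\varphi}(v_j^n(i\delta))
+\frac{\xi_{i\delta}^n}{n}\sum_{j=1}^n\int_{i\delta}^t a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,dV_p^{(j)},
\end{align*}
hence
\[
\rho_t^n(\psi_t^{t,\varphi})-\rho_{i\delta}^n(\psi_{i\delta}^{t,\varphi})
=\frac{\xi_{i\delta}^n}{n}\sum_{j=1}^n\int_{i\delta}^t a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,dV_p^{(j)}.
\]
Similarly
\[
\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})-\rho_{(k-1)\delta}^n(\psi_{(k-1)\delta}^{t,\varphi})
=\frac{\xi_{(k-1)\delta}^n}{n}\sum_{j=1}^n\int_{(k-1)\delta}^{k\delta}a_j^n(p)\big((\nabla\psi_p^{t,\varphi})^\top\sigma\big)(v_j^n(p))\,dV_p^{(j)},
\]
which completes the proof of the representation (9.45). □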
9.4 The Convergence Results

In this section we begin by showing that $\rho_t^n(\varphi)$ converges to $\rho_t(\varphi)$ in Proposition 9.14 and that $\pi_t^n(\varphi)$ converges to $\pi_t(\varphi)$ in Theorem 9.15 for any $\varphi\in C_b(\mathbb{R}^d)$. These results imply that $\rho_t^n$ converges to $\rho_t$ and $\pi_t^n$ converges to $\pi_t$ as measure-valued random variables (Corollary 9.17). Proposition 9.14 and Theorem 9.15 are then used to prove two stronger results, namely that the process $\rho_\cdot^n(\varphi)$ converges to $\rho_\cdot(\varphi)$ in Proposition 9.18 and that the process $\pi_\cdot^n(\varphi)$ converges to $\pi_\cdot(\varphi)$ in Theorem 9.19 for any $\varphi\in C_b^2(\mathbb{R}^d)$.† These imply in turn, by Corollary 9.20, that the measure-valued process $\rho_\cdot^n$ converges to $\rho_\cdot$ and that the probability measure-valued process $\pi_\cdot^n$ converges to $\pi_\cdot$. Bounds on the rates of convergence are also obtained.

† Note the smaller class of test functions for which results 9.18 and 9.19 hold true.

Proposition 9.14. If the coefficients $\sigma$, $f$ and $h$ are bounded and Lipschitz, then for any $T\ge 0$, there exists a constant $c_3^T$ independent of $n$ such that for any $\varphi\in C_b(\mathbb{R}^d)$, we have
\[
\tilde{\mathbb{E}}\big[(\rho_t^n(\varphi)-\rho_t(\varphi))^2\big]\le\frac{c_3^T}{n}\|\varphi\|_{0,\infty}^2,\qquad t\in[0,T]. \tag{9.48}
\]
In particular, for all $t\ge 0$, $\rho_t^n$ converges in expectation to $\rho_t$.
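Proof. It suffices to prove (9.48) for any non-negative $\varphi\in C_b(\mathbb{R}^d)$. Obviously, we have
\begin{align*}
\rho_t^n(\varphi)-\rho_t(\varphi)
&=\Big(\rho_t^n(\varphi)-\rho_{[t/\delta]\delta}^n(\psi_{[t/\delta]\delta}^{t,\varphi})\Big)
+\sum_{k=1}^{[t/\delta]}\Big(\rho_{k\delta}^n(\psi_{k\delta}^{t,\varphi})-\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})\Big)\\
&\quad+\sum_{k=1}^{[t/\delta]}\Big(\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})-\rho_{(k-1)\delta}^n(\psi_{(k-1)\delta}^{t,\varphi})\Big)
+\Big(\pi_0^n(\psi_0^{t,\varphi})-\pi_0(\psi_0^{t,\varphi})\Big). \tag{9.49}
\end{align*}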
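We must bound each term on the right-hand side individually. For the first term, using the martingale property of $\rho^n(\psi^{t,\varphi})$ and the fact that the random variables $v_j^n(t)$ for $j=1,2,\dots,n$ are mutually independent conditional upon $\mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t$ (since the generating Brownian motions $V^{(j)}$, for $j=1,2,\dots,n$, are mutually independent), we have
\begin{align*}
\tilde{\mathbb{E}}&\Big[\big(\rho_t^n(\varphi)-\rho_{[t/\delta]\delta}^n(\psi_{[t/\delta]\delta}^{t,\varphi})\big)^2\ \Big|\ \mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t\Big]\\
&=\tilde{\mathbb{E}}\Big[\big(\rho_t^n(\varphi)-\tilde{\mathbb{E}}[\rho_t^n(\varphi)\mid\mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t]\big)^2\ \Big|\ \mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t\Big]\\
&=\frac{(\xi_{[t/\delta]\delta}^n)^2}{n^2}\sum_{j=1}^n\tilde{\mathbb{E}}\Big[\big(\varphi(v_j^n(t))\,a_j^n(t)\big)^2\ \Big|\ \mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t\Big]
-\frac{(\xi_{[t/\delta]\delta}^n)^2}{n^2}\sum_{j=1}^n\Big(\tilde{\mathbb{E}}\big[\varphi(v_j^n(t))\,a_j^n(t)\mid\mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t\big]\Big)^2\\
&\le\frac{(\xi_{[t/\delta]\delta}^n)^2}{n^2}\|\varphi\|_{0,\infty}^2\sum_{j=1}^n\tilde{\mathbb{E}}\big[a_j^n(t)^2\mid\mathcal{F}_{[t/\delta]\delta}\vee\mathcal{Y}_t\big]. \tag{9.50}
\end{align*}
By taking expectation on both sides of (9.50) and using (9.40) for $p=2$, we obtain
\[
\tilde{\mathbb{E}}\Big[\big(\rho_t^n(\varphi)-\rho_{[t/\delta]\delta}^n(\psi_{[t/\delta]\delta}^{t,\varphi})\big)^2\Big]
\le\frac{\|\varphi\|_{0,\infty}^2}{n^2}\sum_{j=1}^n\tilde{\mathbb{E}}\big[(\xi_{[t/\delta]\delta}^n)^2a_j^n(t)^2\big]
\le\frac{c_2^{t,2}}{n}\|\varphi\|_{0,\infty}^2. \tag{9.51}
\]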
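Similarly (although in this case we do not have the uniform bound on $\psi_{k\delta}^{t,\varphi}$ which was used with $\psi_t^{t,\varphi}$),
\[
\tilde{\mathbb{E}}\Big[\big(\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})-\rho_{(k-1)\delta}^n(\psi_{(k-1)\delta}^{t,\varphi})\big)^2\Big]
\le\frac1{n^2}\sum_{j'=1}^n\tilde{\mathbb{E}}\Big[\big(\xi_{(k-1)\delta}^n a_{j'}^{n,k\delta}\big)^2\,\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))^2\Big]. \tag{9.52}
\]
From (9.25) we deduce that
\[
\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))=\tilde{\mathbb{E}}\big[\varphi(v_{j'}^n(t))\,a_{k\delta}^t(v_{j'}^n,Y)\mid\mathcal{F}_{k\delta}\vee\mathcal{Y}_t\big];
\]
hence by Jensen's inequality
\[
\tilde{\mathbb{E}}\big[\big|\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))\big|^p\big]
\le\tilde{\mathbb{E}}\Big[\tilde{\mathbb{E}}\big[\big|\varphi(v_{j'}^n(t))\,a_{k\delta}^t(v_{j'}^n,Y)\big|^p\mid\mathcal{F}_{k\delta}\vee\mathcal{Y}_t\big]\Big]
=\tilde{\mathbb{E}}\big[\big|\varphi(v_{j'}^n(t))\,a_{k\delta}^t(v_{j'}^n,Y)\big|^p\big].
\]
Therefore
\begin{align*}
\tilde{\mathbb{E}}\big[\big|\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))\big|^p\big]
&\le\|\varphi\|_{0,\infty}^p\,\tilde{\mathbb{E}}\bigg[\exp\bigg(\int_{k\delta}^t p\,h(v_{j'}^n(r))^\top dY_r-\frac{p^2}{2}\int_{k\delta}^t\|h(v_{j'}^n(r))\|^2dr\bigg)\\
&\qquad\qquad\times\exp\bigg(\frac{p^2-p}{2}\int_{k\delta}^t\|h(v_{j'}^n(r))\|^2dr\bigg)\bigg]\\
&\le\exp\Big(\tfrac12 m(p^2-p)\,t\max_{k=1,\dots,m}\|h_k\|_{0,\infty}^2\Big)\|\varphi\|_{0,\infty}^p. \tag{9.53}
\end{align*}
Using this upper bound with $p=4$, the bound (9.40) and the Cauchy–Schwarz inequality on the right-hand side of (9.52),
\[
\tilde{\mathbb{E}}\Big[\big(\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})-\rho_{(k-1)\delta}^n(\psi_{(k-1)\delta}^{t,\varphi})\big)^2\Big]
\le\frac{\sqrt{c_2^{t,4}}}{n}\exp\Big(3mt\max_{k=1,\dots,m}\|h_k\|_{0,\infty}^2\Big)\|\varphi\|_{0,\infty}^2. \tag{9.54}
\]
For the second term on the right-hand side of (9.49), observe that
\begin{align*}
\tilde{\mathbb{E}}&\Big[\big(\rho_{k\delta}^n(\psi_{k\delta}^{t,\varphi})-\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})\big)^2\ \Big|\ \mathcal{F}_{k\delta-}\vee\mathcal{Y}_t\Big]\\
&=\frac{(\xi_{k\delta}^n)^2}{n^2}\sum_{j',l'=1}^n\tilde{\mathbb{E}}\Big[\big(o_{j'}^{n,k\delta}-n\bar a_{j'}^{n,k\delta}\big)\big(o_{l'}^{n,k\delta}-n\bar a_{l'}^{n,k\delta}\big)\ \Big|\ \mathcal{F}_{k\delta-}\vee\mathcal{Y}_t\Big]
\,\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))\,\psi_{k\delta}^{t,\varphi}(v_{l'}^n(k\delta)).
\end{align*}
Since the test function $\varphi$ was chosen to be non-negative, and the random variables $\{o_{j'}^{n,k\delta},\ j'=1,\dots,n\}$ are negatively correlated (see Proposition 9.3 part e.), it follows that
\begin{align*}
\tilde{\mathbb{E}}&\Big[\big(\rho_{k\delta}^n(\psi_{k\delta}^{t,\varphi})-\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})\big)^2\ \Big|\ \mathcal{F}_{k\delta-}\vee\mathcal{Y}_t\Big]\\
&\le\frac{(\xi_{k\delta}^n)^2}{n^2}\sum_{j'=1}^n\tilde{\mathbb{E}}\Big[\big(o_{j'}^{n,k\delta}-n\bar a_{j'}^{n,k\delta}\big)^2\ \Big|\ \mathcal{F}_{k\delta-}\vee\mathcal{Y}_t\Big]\,\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))^2\\
&\le\frac{(\xi_{k\delta}^n)^2}{n^2}\sum_{j'=1}^n\big\{n\bar a_{j'}^{n,k\delta}\big\}\Big(1-\big\{n\bar a_{j'}^{n,k\delta}\big\}\Big)\psi_{k\delta}^{t,\varphi}(v_{j'}^n(k\delta))^2.
\end{align*}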
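Finally, using the inequality $q(1-q)\le\frac14$ for $q=\{n\bar a_{j'}^{n,k\delta}\}$ and (9.53) with $p=2$, it follows that
\[
\tilde{\mathbb{E}}\Big[\big(\rho_{k\delta}^n(\psi_{k\delta}^{t,\varphi})-\rho_{k\delta-}^n(\psi_{k\delta-}^{t,\varphi})\big)^2\Big]
\le\frac{1}{4n}\exp\Big(mt\max_{k=1,\dots,m}\|h_k\|_{0,\infty}^2\Big)\|\varphi\|_{0,\infty}^2. \tag{9.55}
\]
For the last term, note that $\psi_0^{t,\varphi}$ is $\mathcal{Y}_t$-measurable; therefore, using the mutual independence of the initial points $v_j^n(0)$ and the fact that
\[
\tilde{\mathbb{E}}\big[\psi_0^{t,\varphi}(v_j^n(0))\mid\mathcal{Y}_t\big]=\pi_0(\psi_0^{t,\varphi}),
\]
we obtain
\begin{align*}
\tilde{\mathbb{E}}\Big[\big(\pi_0^n(\psi_0^{t,\varphi})-\pi_0(\psi_0^{t,\varphi})\big)^2\ \Big|\ \mathcal{Y}_t\Big]
&=\frac1{n^2}\sum_{j=1}^n\Big(\tilde{\mathbb{E}}\big[\psi_0^{t,\varphi}(v_j^n(0))^2\mid\mathcal{Y}_t\big]-\pi_0(\psi_0^{t,\varphi})^2\Big)\\
&\le\frac1{n^2}\sum_{j=1}^n\tilde{\mathbb{E}}\big[\psi_0^{t,\varphi}(v_j^n(0))^2\mid\mathcal{Y}_t\big].
\end{align*}
Hence using the result (9.53) with $p=2$,
\[
\tilde{\mathbb{E}}\Big[\big(\pi_0^n(\psi_0^{t,\varphi})-\pi_0(\psi_0^{t,\varphi})\big)^2\Big]
\le\frac1{n^2}\sum_{j=1}^n\tilde{\mathbb{E}}\big[\psi_0^{t,\varphi}(v_j^n(0))^2\big]
\le\frac1n\exp\Big(mt\max_{k=1,\dots,m}\|h_k\|_{0,\infty}^2\Big)\|\varphi\|_{0,\infty}^2. \tag{9.56}
\]
The bounds on the individual terms (9.51), (9.54), (9.55) and (9.56), substituted into (9.49), yield the result (9.48). □

Theorem 9.15. If the coefficients $\sigma$, $f$ and $h$ are bounded and Lipschitz, then for any $T\ge 0$, there exists a constant $c_4^T$ independent of $n$ such that for any $\varphi\in C_b(\mathbb{R}^d)$, we have
\[
\tilde{\mathbb{E}}\big[|\pi_t^n(\varphi)-\pi_t(\varphi)|\big]\le\frac{c_4^T}{\sqrt n}\|\varphi\|_{0,\infty},\qquad t\in[0,T]. \tag{9.57}
\]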
In particular, for all $t\ge 0$, $\pi_t^n$ converges in expectation to $\pi_t$.

Proof. Since $\pi_t^n(\varphi)\,\rho_t^n(1)=\xi_t^n\pi_t^n(\varphi)=\rho_t^n(\varphi)$,
\[
\pi_t^n(\varphi)-\pi_t(\varphi)=\big(\rho_t^n(\varphi)-\rho_t(\varphi)\big)(\rho_t(1))^{-1}-\pi_t^n(\varphi)\big(\rho_t^n(1)-\rho_t(1)\big)(\rho_t(1))^{-1}.
\]
Define
\[
m_t\triangleq\sqrt{\tilde{\mathbb{E}}\big[(\rho_t(1))^{-2}\big]}.
\]
Following Exercise 9.16 below, $m_t<\infty$, hence by the Cauchy–Schwarz inequality
\[
\tilde{\mathbb{E}}\big[|\pi_t^n(\varphi)-\pi_t(\varphi)|\big]
\le m_t\sqrt{\tilde{\mathbb{E}}\big[(\rho_t^n(\varphi)-\rho_t(\varphi))^2\big]}
+m_t\|\varphi\|_{0,\infty}\sqrt{\tilde{\mathbb{E}}\big[(\rho_t^n(1)-\rho_t(1))^2\big]}, \tag{9.58}
\]
and the result follows by applying Proposition 9.14 to the two expectations on the right-hand side of (9.58). □

Exercise 9.16. Prove that $\tilde{\mathbb{E}}\big[\sup_{t\in[0,T]}(\rho_t(1))^{-2}\big]<\infty$ for any $T\ge 0$.
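The decomposition used at the start of the proof of Theorem 9.15 can be checked by a short computation. The sketch below assumes only the identity $\rho_t^n(\varphi)=\pi_t^n(\varphi)\rho_t^n(1)$ stated in the proof and the normalisation $\pi_t(\varphi)=\rho_t(\varphi)/\rho_t(1)$; it is a verification of the algebra, not a new result.

```latex
\begin{align*}
\frac{\rho_t^n(\varphi)-\rho_t(\varphi)}{\rho_t(1)}
 -\pi_t^n(\varphi)\,\frac{\rho_t^n(1)-\rho_t(1)}{\rho_t(1)}
&=\frac{\rho_t^n(\varphi)-\pi_t^n(\varphi)\rho_t^n(1)}{\rho_t(1)}
 +\pi_t^n(\varphi)-\frac{\rho_t(\varphi)}{\rho_t(1)}\\
&=0+\pi_t^n(\varphi)-\pi_t(\varphi).
\end{align*}
```

Since $\pi_t^n$ is a probability measure, $|\pi_t^n(\varphi)|\le\|\varphi\|_{0,\infty}$, which accounts for the factor $\|\varphi\|_{0,\infty}$ in the second term of (9.58).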
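Let $\mathcal{M}=\{\varphi_i,\ i\ge 0\}\subset C_b(\mathbb{R}^d)$ be a countable convergence determining set such that $\|\varphi_i\|\le 1$ for any $i\ge 0$, and let $d_{\mathcal{M}}$ be the metric on $\mathcal{M}_F(\mathbb{R}^d)$ (see Section A.10 for additional details)
\[
d_{\mathcal{M}}:\mathcal{M}_F(\mathbb{R}^d)\times\mathcal{M}_F(\mathbb{R}^d)\to[0,\infty),\qquad
d_{\mathcal{M}}(\mu,\nu)=\sum_{i=0}^\infty\frac{|\mu\varphi_i-\nu\varphi_i|}{2^i}.
\]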
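To make the metric concrete, the following minimal sketch evaluates a truncated version of the sum for two weighted empirical measures, which is the form taken by $\rho_t^n$ and $\pi_t^n$. The family `test_functions` (cosines of increasing frequency on the real line) and the truncation at eight terms are assumptions chosen for illustration only; they are bounded by 1 but are not claimed to be a convergence determining set.

```python
import numpy as np

def d_M(atoms_mu, w_mu, atoms_nu, w_nu, test_functions):
    """Truncated sum_i |mu(phi_i) - nu(phi_i)| / 2^i for two weighted
    empirical measures sum_j w_j * delta_{atom_j} on the real line."""
    total = 0.0
    for i, phi in enumerate(test_functions):
        mu_phi = float(np.dot(w_mu, phi(atoms_mu)))   # mu(phi_i)
        nu_phi = float(np.dot(w_nu, phi(atoms_nu)))   # nu(phi_i)
        total += abs(mu_phi - nu_phi) / 2.0 ** i
    return total

# Hypothetical test functions, each bounded by 1 in the supremum norm.
test_functions = [lambda x, k=k: np.cos(k * x) for k in range(8)]

# Distance between the Dirac masses at 0 and at pi.
dist = d_M(np.array([0.0]), np.array([1.0]),
           np.array([np.pi]), np.array([1.0]), test_functions)
```

Only the odd frequencies separate the two Dirac masses in this example, so the truncated distance is a sum of terms $2/2^i$ over odd $i$.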
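Proposition 9.14 and Theorem 9.15 give the following corollary.

Corollary 9.17. If the coefficients $\sigma$, $f$ and $h$ are bounded and Lipschitz, then
\[
\sup_{t\in[0,T]}\tilde{\mathbb{E}}\big[d_{\mathcal{M}}(\rho_t^n,\rho_t)\big]\le\frac{2\sqrt{c_3^T}}{\sqrt n},\qquad
\sup_{t\in[0,T]}\tilde{\mathbb{E}}\big[d_{\mathcal{M}}(\pi_t^n,\pi_t)\big]\le\frac{2c_4^T}{\sqrt n}. \tag{9.59}
\]
Thus $\rho_t^n$ converges to $\rho_t$ in expectation and $\pi_t^n$ converges to $\pi_t$ in expectation.

In the following, we prove a stronger convergence result.

Proposition 9.18. If the coefficients $\sigma$, $f$ and $h$ are bounded and Lipschitz, then for any $T\ge 0$, there exists a constant $c_5^T$ independent of $n$ such that
\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}\big(\rho_t^n(\varphi)-\rho_t(\varphi)\big)^2\Big]\le\frac{c_5^T}{n}\|\varphi\|_{2,\infty}^2 \tag{9.60}
\]
for any $\varphi\in C_b^2(\mathbb{R}^d)$.

Proof. Again, it suffices to prove (9.60) for any non-negative $\varphi\in C_b^2(\mathbb{R}^d)$. Following Exercise 9.11 we have that
\begin{align*}
\rho_t^n(\varphi)-\rho_t(\varphi)
&=\big(\pi_0^n(\varphi)-\pi_0(\varphi)\big)+\int_0^t\big(\rho_s^n(A\varphi)-\rho_s(A\varphi)\big)ds+\bar S_t^{n,\varphi}\\
&\quad+\bar M_{[t/\delta]}^{n,\varphi}+\sum_{k=1}^m\int_0^t\big(\rho_s^n(h_k\varphi)-\rho_s(h_k\varphi)\big)dY_s^k, \tag{9.61}
\end{align*}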
where $\bar S^{n,\varphi}=\{\bar S_t^{n,\varphi},\ t\ge 0\}$ is the martingale
\[
\bar S_t^{n,\varphi}\triangleq\frac1n\sum_{i=0}^\infty\sum_{j=1}^n\int_{i\delta\wedge t}^{(i+1)\delta\wedge t}\xi_{i\delta}^n\,a_j^n(s)\big((\nabla\varphi)^\top\sigma\big)(v_j^n(s))\,dV_s^{(j)},
\]
and $\bar M^{n,\varphi}=\{\bar M_k^{n,\varphi},\ k>0\}$ is the discrete parameter martingale
\[
\bar M_k^{n,\varphi}\triangleq\frac1n\sum_{i=1}^k\xi_{i\delta}^n\sum_{j'=1}^n\big(o_{j'}^n(i\delta)-n\bar a_{j'}^{n,i\delta}\big)\varphi(v_{j'}^n(i\delta)),\qquad k>0.
\]
We show that each of the five terms on the right-hand side of (9.61) satisfies an inequality of the form (9.60). For the first term, using the mutual independence of the initial locations of the particles $v_j^n(0)$, we obtain
\[
\tilde{\mathbb{E}}\big[(\pi_0^n(\varphi)-\pi_0(\varphi))^2\big]=\frac1n\big(\pi_0(\varphi^2)-\pi_0(\varphi)^2\big)\le\frac1n\|\varphi\|_{0,\infty}^2. \tag{9.62}
\]
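The identity in (9.62) is the usual variance formula for an average of $n$ independent samples, and it can be checked numerically. The sketch below is an illustration under assumed choices that are not part of the text: $\pi_0$ is taken to be the standard normal law on $\mathbb{R}$ and $\varphi=\cos$, for which $\pi_0(\varphi)=e^{-1/2}$ and $\pi_0(\varphi^2)=(1+e^{-2})/2$.

```python
import numpy as np

def initial_mse(n, trials, seed=0):
    """Monte Carlo estimate of E[(pi_0^n(phi) - pi_0(phi))^2] for
    phi = cos and pi_0 = N(0, 1) (both are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((trials, n))   # each row: one set of n particles
    pi0n_phi = np.cos(samples).mean(axis=1)      # pi_0^n(phi) for each trial
    pi0_phi = np.exp(-0.5)                       # exact pi_0(cos)
    return float(np.mean((pi0n_phi - pi0_phi) ** 2))

def predicted_mse(n):
    """Right-hand side of the equality in (9.62): (pi_0(phi^2) - pi_0(phi)^2) / n."""
    variance = 0.5 * (1.0 + np.exp(-2.0)) - np.exp(-1.0)
    return float(variance / n)
```

Comparing `initial_mse(n, trials)` with `predicted_mse(n)` for a few values of `n` should show the $1/n$ decay, and both stay below the cruder bound $\|\varphi\|_{0,\infty}^2/n=1/n$.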
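For the second term, by the Cauchy–Schwarz inequality,
\begin{align*}
\tilde{\mathbb{E}}\Bigg[\sup_{t\in[0,T]}\bigg(\int_0^t\big(\rho_s^n(A\varphi)-\rho_s(A\varphi)\big)ds\bigg)^2\Bigg]
&\le\tilde{\mathbb{E}}\Bigg[\sup_{t\in[0,T]}t\int_0^t\big(\rho_s^n(A\varphi)-\rho_s(A\varphi)\big)^2ds\Bigg]\\
&=\tilde{\mathbb{E}}\Bigg[T\int_0^T\big(\rho_s^n(A\varphi)-\rho_s(A\varphi)\big)^2ds\Bigg]. \tag{9.63}
\end{align*}
By Fubini's theorem and (9.48), we obtain
\[
\tilde{\mathbb{E}}\Bigg[\int_0^T\big(\rho_s^n(A\varphi)-\rho_s(A\varphi)\big)^2ds\Bigg]\le\frac{c_3^TT}{n}\|A\varphi\|_{0,\infty}^2. \tag{9.64}
\]
Since $\sigma$ and $f$ are bounded, there exists $c_6=c_6(\|\sigma\|_{0,\infty},\|f\|_{0,\infty})$ such that $\|A\varphi\|_{0,\infty}^2\le c_6\|\varphi\|_{2,\infty}^2$; it follows from (9.63) and (9.64) that
\[
\tilde{\mathbb{E}}\Bigg[\sup_{t\in[0,T]}\bigg(\int_0^t\big(\rho_s^n(A\varphi)-\rho_s(A\varphi)\big)ds\bigg)^2\Bigg]\le\frac{c_3^Tc_6T^2}{n}\|\varphi\|_{2,\infty}^2. \tag{9.65}
\]
For the third term, we use the Burkholder–Davis–Gundy inequality (Theorem B.36). If we denote by $C$ the constant in the Burkholder–Davis–Gundy inequality applied to $F(x)=x^2$, then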
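\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}\big(\bar S_t^{n,\varphi}\big)^2\Big]
\le C\,\tilde{\mathbb{E}}\Big[\big\langle\bar S^{n,\varphi}\big\rangle_T\Big]
=\frac{C}{n^2}\sum_{j=1}^n\int_0^T\tilde{\mathbb{E}}\Big[\big(\xi_{[s/\delta]\delta}^na_j^n(s)\big)^2\big((\nabla\varphi)^\top\sigma\sigma^\top\nabla\varphi\big)(v_j^n(s))\Big]ds. \tag{9.66}
\]
From (9.40) and the fact that $\sigma$ is bounded, we deduce that there exists a constant $c_7^T$ such that
\[
\tilde{\mathbb{E}}\Big[\big(\xi_{[s/\delta]\delta}^na_j^n(s)\big)^2\big((\nabla\varphi)^\top\sigma\sigma^\top\nabla\varphi\big)(v_j^n(s))\Big]\le c_7^T\|\varphi\|_{2,\infty}^2, \tag{9.67}
\]
for any $s\in[0,T]$. From (9.66) and (9.67)
\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}\big(\bar S_t^{n,\varphi}\big)^2\Big]\le\frac{Cc_7^TT}{n}\|\varphi\|_{2,\infty}^2. \tag{9.68}
\]
For the fourth term on the right-hand side of (9.61), by Doob's maximal inequality
\[
\tilde{\mathbb{E}}\Big[\max_{k=1,\dots,[T/\delta]}\big(\bar M_k^{n,\varphi}\big)^2\Big]\le4\,\tilde{\mathbb{E}}\Big[\big(\bar M_{[T/\delta]}^{n,\varphi}\big)^2\Big]. \tag{9.69}
\]
Since $\varphi$ is non-negative and the offspring numbers $o_{j'}^n(i\delta)$, for $j'=1,\dots,n$, are negatively correlated, from the orthogonality of martingale increments
\begin{align*}
\tilde{\mathbb{E}}\Big[\big(\bar M_{[T/\delta]}^{n,\varphi}\big)^2\Big]
&\le\frac1{n^2}\sum_{i=1}^{[T/\delta]}\sum_{j=1}^n\tilde{\mathbb{E}}\Big[(\xi_{i\delta}^n)^2\,\{n\bar a_j^n(i\delta)\}\big(1-\{n\bar a_j^n(i\delta)\}\big)\,\varphi(v_j^n(i\delta))^2\Big]\\
&\le\frac{\|\varphi\|_{0,\infty}^2}{4n^2}\sum_{i=1}^{[T/\delta]}\sum_{j=1}^n\tilde{\mathbb{E}}\big[(\xi_{i\delta}^n)^2\big]. \tag{9.70}
\end{align*}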
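Then, from (9.39), (9.69) and (9.70) there exists a constant $c_8^T=c_1^{T,2}[T/\delta]/4$ independent of $n$ such that
\[
\tilde{\mathbb{E}}\Big[\max_{k=1,\dots,[T/\delta]}\big(\bar M_k^{n,\varphi}\big)^2\Big]\le\frac{c_8^T}{n}\|\varphi\|_{0,\infty}^2. \tag{9.71}
\]
To bound the last term, we use the Burkholder–Davis–Gundy inequality (Theorem B.36), Fubini's theorem and the conclusion of Proposition 9.14 (viz. equation (9.48)) to obtain
\begin{align*}
\tilde{\mathbb{E}}\Bigg[\sup_{t\in[0,T]}\bigg(\int_0^t\big(\rho_s^n(h_k\varphi)-\rho_s(h_k\varphi)\big)dY_s^k\bigg)^2\Bigg]
&\le C\,\tilde{\mathbb{E}}\Bigg[\int_0^T\big(\rho_s^n(h_k\varphi)-\rho_s(h_k\varphi)\big)^2ds\Bigg]\\
&\le C\int_0^T\tilde{\mathbb{E}}\Big[\big(\rho_s^n(h_k\varphi)-\rho_s(h_k\varphi)\big)^2\Big]ds\\
&\le\frac{Cc_3^TT\|h_k\|_{0,\infty}^2}{n}\|\varphi\|_{0,\infty}^2. \tag{9.72}
\end{align*}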
The bounds (9.62), (9.65), (9.68), (9.71) and (9.72) together imply (9.60). □

Theorem 9.19. If the coefficients $\sigma$, $f$ and $h$ are bounded and Lipschitz, then for any $T\ge 0$, there exists a constant $c_9^T$ independent of $n$ such that
\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}|\pi_t^n(\varphi)-\pi_t(\varphi)|\Big]\le\frac{c_9^T}{\sqrt n}\|\varphi\|_{2,\infty} \tag{9.73}
\]
for any $\varphi\in C_b^2(\mathbb{R}^d)$.

Proof. As in the proof of Theorem 9.15,
\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}|\pi_t^n(\varphi)-\pi_t(\varphi)|\Big]
\le\bar m_T\sqrt{\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}\big(\rho_t^n(\varphi)-\rho_t(\varphi)\big)^2\Big]}
+\bar m_T\|\varphi\|_{0,\infty}\sqrt{\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}\big(\rho_t^n(1)-\rho_t(1)\big)^2\Big]},
\]
where, following Exercise 9.16,
\[
\bar m_T\triangleq\sqrt{\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}(\rho_t(1))^{-2}\Big]}<\infty,
\]
and the result follows from Proposition 9.18. □

Let $\bar{\mathcal{M}}=\{\varphi_i,\ i\ge 0\}$, where each $\varphi_i\in C_b^2(\mathbb{R}^d)$, be a countable convergence determining set such that $\|\varphi_i\|_\infty\le 1$ and $\|\varphi_i\|_{2,\infty}\le 1$ for any $i\ge 0$, and let $d_{\bar{\mathcal{M}}}$ be the corresponding metric on $\mathcal{M}_F(\mathbb{R}^d)$ as defined in Section A.10. The following corollary of Proposition 9.18 and Theorem 9.19 is then immediate.

Corollary 9.20. If the coefficients $\sigma$, $f$ and $h$ are bounded and Lipschitz, then we have
\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}d_{\bar{\mathcal{M}}}(\rho_t^n,\rho_t)\Big]\le\frac{2\sqrt{c_5^T}}{\sqrt n},\qquad
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}d_{\bar{\mathcal{M}}}(\pi_t^n,\pi_t)\Big]\le\frac{2c_9^T}{\sqrt n} \tag{9.74}
\]
for any $T\ge 0$.
9.5 Other Results

The particle filter described above merges the weighted approximation approach, as presented in Kurtz and Xiong [171, 174] for a general class of non-linear stochastic partial differential equations (to which the Kushner–Stratonovich equation belongs), with the branching corrections approach introduced by Crisan and Lyons in [65]. The convergence of the resulting approximation follows from Theorem 9.15 under fairly mild conditions on the coefficients. The convergence results described above can be extended to the correlated noise framework. See Section 3.8 for a description of this framework and Crisan [61] for details of the proofs in this case.

More refined convergence results require the use of the decomposition (9.61). For this we make use of the properties of the dual of $\rho$ supplied by the theory of stochastic evolution systems (cf. Rozovskii [250]; see also Veretennikov [267] for a direct approach to establishing the dual property of $\psi^{t,\varphi}$). The decomposition (9.61) is very important. It will lead to an exact rate of convergence, that is, to computing the limit
\[
\lim_{n\to\infty}n\,\tilde{\mathbb{E}}\big[(\rho_t^n(\varphi)-\rho_t(\varphi))^2\big]
\]
and also to a central limit theorem (note that the three terms on the right-hand side of (9.61) are mutually orthogonal). For this we need to understand the limiting behaviour of the covariance matrix of the random variables $\{o_j^n,\ j=1,\dots,n\}$. This has yet to be achieved.

In the last ten years we have witnessed a rapid development of the theory of particle approximations to the solution of non-linear filtering, and implicitly to solving SPDEs similar to the filtering equations. The discrete time framework has been extensively studied and a multitude of convergence and stability results have been proved. A comprehensive description of these developments in the wider context of approximations of Feynman–Kac formulae can be found in Del Moral [216] and the references therein. See also Del Moral and Jacod [217] for a result involving discrete observations but a continuous signal.

Results concerning particle approximations for the continuous time filtering problem are far fewer than their discrete counterparts. The development of particle filters for continuous time problems started in the mid-1990s. In Crisan and Lyons [64], the particle construction of a superprocess is extended to the case of a branching measure-valued process in a random environment. When averaged, the particle system used in the construction is shown to converge to the solution of the Zakai equation. In Crisan et al. [63], the idea of minimal variance branching is introduced (instead of fixed variance branching) with the resulting particle system shown to converge to the solution of the Zakai equation. Finally, in Crisan and Lyons [65], a direct approximation of $\pi_t$ is produced by using a normalised branching approach.

In Crisan et al. [62], an alternative approximation to the Kushner–Stratonovich equation (3.57) is given where the branching step is replaced by a correction procedure using multinomial resampling. The multinomial resampling procedure produces conditionally independent approximate samples from the conditional distribution of the signal, thus facilitating the analysis of the corresponding algorithms. It is, however, suboptimal. For a heuristic explanation, assume that between two consecutive correction steps, the information we receive on the signal is 'bad' (the signal-to-noise ratio is small). Consequently the corresponding weights will all be (roughly) equal: that is, all the particles are equally likely. The correction procedure should leave the particles untouched in this case as there is no reason to cull or multiply any of the particles. This is exactly what the minimal branching step does: each particle has exactly one offspring. The multinomial resampling correction will not do this: some particles will be resampled more than others, thus introducing an unnecessary random perturbation to the system. For theoretical results related to the suboptimality of the multinomial resampling procedure, see e.g. Crisan and Lyons [66] and Chopin [51].

Even if one uses the minimal variance branching correction, additional randomness is still introduced in the system, which can affect the convergence rates (see Crisan [60]). It remains an open question as to when and how often one should use the correction procedure.

On a parallel approach, Del Moral and Miclo [218] produced a particle filter using the pathwise approach of Davis [74]. The idea is to recast the equations of non-linear filtering in a form in which no stochastic integration is required. Then one can apply Del Moral's general method of approximating Feynman–Kac formulae. This approach is important as it emphasises the robustness of the particle filter, although it requires that the observation noise and signal noise are independent. While it cannot be applied to the correlated noise framework, it is nevertheless a very promising approach and we expect further research to show its full potential.
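The heuristic comparison of multinomial resampling with minimal variance branching made earlier in this section can be illustrated numerically. The sketch below assumes two standard facts: under multinomial resampling the offspring number of a particle with normalised weight $\bar a_j$ is Binomial$(n,\bar a_j)$, with variance $n\bar a_j(1-\bar a_j)$, and under minimal variance branching its variance is $\{n\bar a_j\}(1-\{n\bar a_j\})$, the quantity appearing in the proof of Proposition 9.14. The function names are illustrative and not part of the text.

```python
import numpy as np

def offspring_variances(weights):
    """Per-particle offspring variances for normalised weights.

    Returns (multinomial, minimal): n*a*(1-a) for multinomial resampling
    and {n a}(1 - {n a}) for minimal variance branching, where {x}
    denotes the fractional part of x."""
    a = np.asarray(weights, dtype=float)
    n = len(a)
    na = n * a
    frac = na - np.floor(na)
    return n * a * (1.0 - a), frac * (1.0 - frac)

def multinomial_offspring(weights, rng):
    """One draw of the offspring numbers under multinomial resampling."""
    return rng.multinomial(len(weights), weights)

# Uninformative observations: all four particles carry the same weight.
equal = np.full(4, 0.25)
var_multinomial, var_minimal = offspring_variances(equal)
counts = multinomial_offspring(equal, np.random.default_rng(1))
```

With equal weights the minimal variance offspring numbers are deterministic (variance zero, one offspring each), whereas each multinomial offspring number has variance $1-1/n$, which is the unnecessary perturbation described above.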
9.6 The Implementation of the Particle Approximation for $\pi_t$

In the following we give a brief description of the implementation of the particle approximation analysed in this chapter. We start by choosing parameters $n$, $\delta$ and $m$. We use $n$ particles and we apply the correction (branching) procedure at times $k\delta$, for $k\ge 1$. We divide the inter-branching intervals $[(k-1)\delta,k\delta]$ into $m$ subintervals of length $\delta/m$ and apply the Euler method to generate the trajectories of the particles. The following is the initialization step.

Initialization
    for $j:=1,\dots,n$
        Sample $v_j(0)$ from $\pi_0$.
        $a_j(0):=1$.
    end for
    $\pi_0^n:=\frac1n\sum_{j=1}^n\delta_{v_j(0)}$
    Assign value $t:=0$

The standard sampling procedure can be replaced by any alternative method that produces an approximation for $\pi_0$. For example, a stratified sampling procedure, if available, will produce a better approximation. In the special case where $\pi_0$ is a Dirac measure concentrated at $x_0\in\mathbb{R}^d$, the value $x_0$ is assigned to all initial positions $v_j(0)$ of the particles. The following is the (two-step) iteration procedure.

Iteration [$i\delta$ to $(i+1)\delta$]

1. Evolution of the particles
    for $l:=0$ to $m-1$
        for $j:=1$ to $n$
            Generate the Gaussian random vector $\Delta V$.
            $v_j(t+\delta/m):=v_j(t)+f(v_j(t))\,\delta/m+\sigma(v_j(t))\,\Delta V\sqrt{\delta/m}$.
            $b_j(t+\delta/m):=h(v_j(t))^\top(Y_{t+\delta/m}-Y_t)-(\delta/2m)\|h(v_j(t))\|^2$
            $a_j(t+\delta/m):=a_j(t)\exp(b_j(t+\delta/m))$
        end for
        $t:=t+\delta/m$
        $\Sigma(t):=\sum_{j=1}^na_j(t)$
        $\pi_t^n:=\frac{1}{\Sigma(t)}\sum_{j=1}^na_j(t)\,\delta_{v_j(t)}$.
    end for

In the above, $\Delta V=(\Delta V_1,\Delta V_2,\dots,\Delta V_p)^\top$ is a $p$-dimensional random vector with independent identically distributed entries $\Delta V_i\sim N(0,1)$ for all $i=1,\dots,p$. The Euler method used above can be replaced by any other weak approximation method for the solution of the stochastic differential equation satisfied by the signal (see for example Kloeden and Platen [151] for alternative approximation methods). The choice of the parameters $\delta$ and $m$ depends on the frequency of the arrivals of the new observations $Y_t$. We have assumed that the observation $Y_t$ is available for all time instants $t$ which are integer multiples of $\delta/m$. There are no theoretical results as to what is the right balance between the size of the intervals between corrections and the number of steps used to approximate the law of the signal, in other words what is the optimal choice of parameters $\delta$ and $m$.

2. Branching procedure
    for $j:=1$ to $n$
        $\bar a_j(t):=a_j(t)/\Sigma(t)$
    end for
    for $j':=1$ to $n$
        Calculate the number of offspring $o_{j'}^n(t)$ for the $j'$th particle in the system of particles with weights/positions $(\bar a_j(t),v_j(t))$ using the algorithm described in Section 9.2.1.
    end for
    We have now $n$ particles with positions
    \[
    (\underbrace{v_1(t),v_1(t),\dots,v_1(t)}_{o_1(t)},\ \underbrace{v_2(t),v_2(t),\dots,v_2(t)}_{o_2(t)},\ \dots) \tag{9.75}
    \]
    Reindex the positions of the particles as $v_1(t),v_2(t),\dots,v_n(t)$.
    for $j:=1,\dots,n$
        $a_j(t):=1$
    end for

The positions of the particles with no offspring will no longer appear among those described by the formula (9.75). Alternatives to the branching procedure are described in Section 10.5. For example, one can use the sampling with replacement method. In this case Step 2 is replaced by the following.

2′. Resampling procedure
    for $j:=1$ to $n$
        $\bar a_j(t):=a_j(t)/\Sigma(t)$.
    end for
    for $j:=1$ to $n$
        Pick $v_j(t)$ by sampling with replacement from the set of particle positions $(v_1(t),v_2(t),\dots,v_n(t))$ according to the probability vector of normalized weights $(\bar a_1(t),\bar a_2(t),\dots,\bar a_n(t))$.
    end for
    Reindex the positions of the particles as $v_1(t),v_2(t),\dots,v_n(t)$.
    for $j:=1,\dots,n$
        $a_j(t):=1$
    end for

However, the resampling procedure generates a multinomial offspring distribution which is known to be suboptimal. In particular, it does not have the minimal variance property enjoyed by the offspring distribution produced by the algorithm described in Section 9.2.1 (see Section 10.5 for details).
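The iteration of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the correction step uses the multinomial resampling of Step 2′ (the minimal variance algorithm of Section 9.2.1 is not reproduced here), the coefficient functions `f`, `sigma` and `h` are assumed to act on an array of all particle positions at once, and all function and argument names are hypothetical.

```python
import numpy as np

def evolve(v, a, f, sigma, h, dY, dt, rng):
    """Step 1 for one subinterval: Euler move and weight update.

    v: (n, d) positions; a: (n,) weights; dY: (m_obs,) increment Y_{t+dt} - Y_t.
    f(v) -> (n, d), sigma(v) -> (n, d, p), h(v) -> (n, m_obs)."""
    sig = sigma(v)
    dV = rng.standard_normal((v.shape[0], sig.shape[2]))   # Delta V for each particle
    hv = h(v)                                              # h at the positions before the move
    v_new = v + f(v) * dt + np.einsum("ndp,np->nd", sig, dV) * np.sqrt(dt)
    b = hv @ dY - 0.5 * dt * np.sum(hv ** 2, axis=1)       # exponent b_j of the weight update
    return v_new, a * np.exp(b)

def resample(v, a, rng):
    """Step 2': sampling with replacement; all weights are reset to 1."""
    n = len(a)
    idx = rng.choice(n, size=n, replace=True, p=a / a.sum())
    return v[idx], np.ones(n)

def particle_filter(v0, f, sigma, h, Y, delta, m_sub, rng):
    """Run the filter on observations Y of shape (K * m_sub + 1, m_obs),
    sampled on the grid of mesh delta / m_sub. Returns, for each grid time,
    the pair (positions, normalised weights) describing pi_t^n."""
    dt = delta / m_sub
    v = np.asarray(v0, dtype=float).copy()
    a = np.ones(len(v))
    out = []
    for step in range(len(Y) - 1):
        v, a = evolve(v, a, f, sigma, h, Y[step + 1] - Y[step], dt, rng)
        out.append((v.copy(), a / a.sum()))
        if (step + 1) % m_sub == 0:        # correction at the times k * delta
            v, a = resample(v, a, rng)
    return out
```

A call such as `particle_filter(v0, f, sigma, h, Y, delta=0.2, m_sub=2, rng=np.random.default_rng(0))` returns the weighted samples from which $\pi_t^n(\varphi)$ can be computed as a weighted average. Replacing `resample` by an implementation of the branching algorithm of Section 9.2.1 would give Step 2 instead of Step 2′.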
9.7 Solutions to Exercises

9.1 In the case where $a$ is an integer it is immediate that taking $\xi^{\min}=a$ achieves the minimal variance of zero, and by Jensen's inequality for any convex function $\varphi$, for $\xi\in\mathcal{A}_a$, $\mathbb{E}[\varphi(\xi)]\ge\varphi(\mathbb{E}[\xi])=\varphi(a)=\mathbb{E}[\varphi(\xi^{\min})]$; thus $\mathbb{E}[\varphi(\xi^{\min})]\le\mathbb{E}[\varphi(\xi)]$ for any $\xi\in\mathcal{A}_a$.

For the more general case, let $\xi\in\mathcal{A}_a$. Suppose that the law of $\xi$ assigns non-zero probability mass to two integers which are not adjacent. That is, we can find $k,l$ such that $\mathbb{P}(\xi=k)>0$ and $\mathbb{P}(\xi=l)>0$ and $k+1\le l-1$. We construct a new random variable $\zeta$ from $\xi$ by moving some probability mass $\beta>0$ from $k$ to $k+1$ and some from $l$ to $l-1$. Let $U\subset\{\omega:\xi(\omega)=k\}$ and $D\subset\{\omega:\xi(\omega)=l\}$ be such that $\mathbb{P}(U)=\mathbb{P}(D)=\beta$; then define
\[
\zeta\triangleq\xi+1_U-1_D.
\]
Thus by direct computation, $\mathbb{E}[\zeta]=a+\beta-\beta=a$, so $\zeta\in\mathcal{A}_a$; secondly
\[
\operatorname{var}(\zeta)=\mathbb{E}[\zeta^2]-a^2=\mathbb{E}[\xi^2]+2\beta(1+k-l)-a^2=\operatorname{var}(\xi)+2\beta(1+k-l).
\]
As we assumed that $k+1\le l-1$, it follows that $\operatorname{var}(\zeta)<\operatorname{var}(\xi)$. Consequently the variance minimizing element of $\mathcal{A}_a$ can only have non-zero probability mass on two adjacent non-negative integers, and then the condition on the expectation ensures that this must be $\xi^{\min}$ given by (9.10).

Now consider a convex function $\varphi$. We use the same argument:
\[
\mathbb{E}[\varphi(\zeta)]=\mathbb{E}[\varphi(\xi)]+\beta\big(\varphi(k+1)-\varphi(k)+\varphi(l-1)-\varphi(l)\big).
\]
Now we use the fact that if $\varphi$ is a convex function, then for any points $a<b<c$, since the graph of $\varphi$ lies below the chord $(a,\varphi(a))$–$(c,\varphi(c))$,
\[
\varphi(b)\le\varphi(a)\frac{c-b}{c-a}+\varphi(c)\frac{b-a}{c-a},
\]
which implies that
\[
\frac{\varphi(b)-\varphi(a)}{b-a}\le\frac{\varphi(c)-\varphi(b)}{c-b}.
\]
If $k+1=l-1$ we can apply this result directly to see that $\varphi(k+1)-\varphi(k)\le\varphi(l)-\varphi(l-1)$; otherwise we use the result twice, for $k<k+1<l-1$ and for $k+1<l-1<l$, to obtain
\[
\varphi(k+1)-\varphi(k)\le\frac{\varphi(l-1)-\varphi(k+1)}{l-k-2}\le\varphi(l)-\varphi(l-1);
\]
thus $\mathbb{E}[\varphi(\zeta)]\le\mathbb{E}[\varphi(\xi)]$. This inequality will be strict unless $\varphi$ is linear between $k$ and $l$. If it is strict, then we can argue as before that $\mathbb{E}[\varphi(\zeta)]<\mathbb{E}[\varphi(\xi)]$. It is therefore clear that if we can find a non-adjacent pair of integers $k$ and $l$ such that $\varphi$ is not linear between $k$ and $l$, then the random variable $\xi$ cannot minimize $\mathbb{E}[\varphi(\xi)]$.

Consequently, a $\xi$ which minimizes $\mathbb{E}[\varphi(\xi)]$ can either assign strictly positive mass to a single pair of adjacent integers, or it can assign strictly positive probability to any number of integers, provided that they are all contained in a single interval of $\mathbb{R}$ where the function $\varphi(x)$ is linear. In the second case, where $\xi\in\mathcal{A}_a$ only assigns non-zero probability to integers in an interval where $\varphi$ is linear, it is immediate that $\mathbb{E}[\varphi(\xi)]=\varphi(\mathbb{E}[\xi])=\varphi(a)$; thus as a consequence of Jensen's inequality such a $\xi$ achieves the minimum value of $\mathbb{E}[\varphi(\xi)]$ over $\xi\in\mathcal{A}_a$. Since $\xi\in\mathcal{A}_a$ satisfies $\mathbb{E}[\xi]=a$, the region where $\varphi$ is linear must include the integers $[a]$ and $[a]+1$; therefore with $\xi^{\min}$ defined by (9.10), $\mathbb{E}[\varphi(\xi^{\min})]=\varphi(a)$. It therefore follows that in either case, the minimum value is uniquely attained by $\xi^{\min}$ unless $\varphi$ is linear, in which case $\mathbb{E}[\varphi(\xi)]$ is constant for any $\xi\in\mathcal{A}_a$. Hence $\mathbb{E}[\varphi(\xi^{\min})]\le\mathbb{E}[\varphi(\xi)]$ for any $\xi\in\mathcal{A}_a$.

9.10 We have for $t\in[k\delta,(k+1)\delta]$
\begin{align*}
\big(a_j^n(t)\big)^p
&=\exp\bigg(p\int_{k\delta}^th(v_j^n(s))^\top dY_s-\frac p2\int_{k\delta}^t\|h(v_j^n(s))\|^2ds\bigg)\\
&=M_p(t)\exp\bigg(\frac{p^2-p}{2}\int_{k\delta}^t\|h(v_j^n(s))\|^2ds\bigg)\\
&\le M_p(t)\exp\bigg(\frac{p^2-p}{2}\sum_{i=1}^m\|h^i\|_\infty^2(t-k\delta)\bigg),
\end{align*}
where $M_p=\{M_p(t),\ t\in[k\delta,(k+1)\delta]\}$ is the exponential martingale defined as
\[
M_p(t)\triangleq\exp\bigg(p\int_{k\delta}^th(v_j^n(s))^\top dY_s-\frac{p^2}{2}\int_{k\delta}^t\|h(v_j^n(s))\|^2ds\bigg).
\]
Hence
\[
\tilde{\mathbb{E}}\big[\big(a_j^n(t)\big)^p\mid\mathcal{F}_{k\delta}\big]\le\exp\bigg(\frac{p^2-p}{2}\sum_{i=1}^m\|h^i\|_\infty^2(t-k\delta)\bigg),
\]
which, in turn, implies that
\[
\tilde{\mathbb{E}}\Bigg[\bigg(\frac1n\sum_{j=1}^na_j^n(t)\bigg)^p\ \Bigg|\ \mathcal{F}_{k\delta}\Bigg]\le\exp\bigg(\frac{p^2-p}{2}\sum_{i=1}^m\|h^i\|_\infty^2(t-k\delta)\bigg). \tag{9.76}
\]
Therefore
\begin{align*}
\tilde{\mathbb{E}}\big[(\xi_t^n)^p\mid\mathcal{F}_{[t/\delta]\delta}\big]
&=\big(\xi_{[t/\delta]\delta}^n\big)^p\,\tilde{\mathbb{E}}\Bigg[\bigg(\frac1n\sum_{j=1}^na_j^n(t)\bigg)^p\ \Bigg|\ \mathcal{F}_{[t/\delta]\delta}\Bigg]\\
&\le\big(\xi_{[t/\delta]\delta}^n\big)^p\exp\bigg(\frac{(p^2-p)(t-k\delta)}{2}\sum_{i=1}^m\|h^i\|_\infty^2\bigg). \tag{9.77}
\end{align*}
Also from (9.76) one proves that
\[
\tilde{\mathbb{E}}\big[(\xi_{k\delta}^n)^p\mid\mathcal{F}_{(k-1)\delta}\big]\le\big(\xi_{(k-1)\delta}^n\big)^p\exp\bigg(\frac{p^2-p}{2}\sum_{i=1}^m\|h^i\|_\infty^2\,\delta\bigg);
\]
hence, by induction,
\[
\tilde{\mathbb{E}}\big[(\xi_{k\delta}^n)^p\big]\le\exp\bigg(\frac{p^2-p}{2}\sum_{i=1}^m\|h^i\|_\infty^2\,k\delta\bigg). \tag{9.78}
\]
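Finally from (9.76), (9.77) and (9.78) we get (9.39). The bound (9.40) follows in a similar manner.

9.11 We follow the proof of Proposition 9.7. Let $\mathcal{F}_{k\delta-}=\sigma(\mathcal{F}_s,\ 0\le s<k\delta)$ be the $\sigma$-algebra of events up to time $k\delta$ (the time of the $k$th branching) and $\rho_{k\delta-}^n=\lim_{t\nearrow k\delta}\rho_t^n$. For $t\in[i\delta,(i+1)\delta)$, we have†
\[
\rho_t^n(\varphi)=\pi_0^n(\varphi)+\bar M_i^{n,\varphi}+\sum_{k=1}^i\big(\rho_{k\delta-}^n(\varphi)-\rho_{(k-1)\delta}^n(\varphi)\big)+\big(\rho_t^n(\varphi)-\rho_{i\delta}^n(\varphi)\big),
\]
where $\bar M^{n,\varphi}=\{\bar M_k^{n,\varphi},\ k>0\}$ is the martingale
\[
\bar M_i^{n,\varphi}=\sum_{k=1}^i\big(\rho_{k\delta}^n(\varphi)-\rho_{k\delta-}^n(\varphi)\big)
=\frac1n\sum_{k=1}^i\xi_{k\delta}^n\sum_{j'=1}^n\big(o_{j'}^n(k\delta)-n\bar a_{j'}^{n,k\delta}\big)\varphi(v_{j'}^n(k\delta)),\qquad\text{for }i\ge 0.
\]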
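Next, by Itô's formula, from (9.4) and (9.5), we get that
\[
d\big(a_j^n(t)\,\varphi(v_j^n(t))\big)=a_j^n(t)\,A\varphi(v_j^n(t))\,dt+a_j^n(t)\big((\nabla\varphi)^\top\sigma\big)(v_j^n(t))\,dV_t^j+a_j^n(t)\,\varphi(v_j^n(t))\,h(v_j^n(t))^\top dY_t
\]
for $\varphi\in C_b^2(\mathbb{R}^d)$. Hence for $t\in[k\delta,(k+1)\delta)$, for $k=0,1,\dots,i$, we have
\begin{align*}
\rho_t^n(\varphi)-\rho_{k\delta}^n(\varphi)
&=\frac{\xi_{k\delta}^n}{n}\sum_{j=1}^n\int_{k\delta}^td\big(a_j^n(s)\,\varphi(v_j^n(s))\big)\\
&=\int_{k\delta}^t\rho_s^n(A\varphi)\,ds
+\frac1n\sum_{j=1}^n\int_{k\delta}^t\xi_{k\delta}^n\,a_j^n(s)\big((\nabla\varphi)^\top\sigma\big)(v_j^n(s))\,dV_s^j
+\sum_{r=1}^m\int_{k\delta}^t\rho_s^n(h_r\varphi)\,dY_s^r.
\end{align*}
Similarly
\begin{align*}
\rho_{k\delta-}^n(\varphi)-\rho_{(k-1)\delta}^n(\varphi)
&=\int_{(k-1)\delta}^{k\delta}\rho_s^n(A\varphi)\,ds
+\frac1n\sum_{j=1}^n\int_{(k-1)\delta}^{k\delta}\xi_{(k-1)\delta}^n\,a_j^n(s)\big((\nabla\varphi)^\top\sigma\big)(v_j^n(s))\,dV_s^j\\
&\quad+\sum_{r=1}^m\int_{(k-1)\delta}^{k\delta}\rho_s^n(h_r\varphi)\,dY_s^r.
\end{align*}

† We use the standard convention $\sum_{k=1}^0=0$.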
9.16 Following Lemma 3.29, the process $t\mapsto\rho_t(1)$ has the explicit representation (3.55). That is,
\[
\rho_t(1)=\exp\bigg(\int_0^t\pi_s(h^\top)\,dY_s-\frac12\int_0^t\pi_s(h^\top)\pi_s(h)\,ds\bigg).
\]
As in Exercise 9.10 with $p=-2$, for $t\in[0,T]$,
\[
\rho_t(1)^{-2}\le\exp\big(3mt\|h\|_\infty^2\big)M_t,
\]
where $M=\{M_t,\ t\in[0,T]\}$ is the exponential martingale defined as
\[
M_t\triangleq\exp\bigg(-2\int_0^t\pi_s(h^\top)\,dY_s-2\int_0^t\pi_s(h^\top)\pi_s(h)\,ds\bigg).
\]
Using an argument similar to that used in the solution of Exercise 3.10, based on the Gronwall inequality and the Burkholder–Davis–Gundy inequality (see Theorem B.36 in the appendix), one shows that
\[
\tilde{\mathbb{E}}\Big[\sup_{t\in[0,T]}M_t\Big]<\infty;
\]
hence the claim.
10 Particle Filters in Discrete Time
The purpose of this chapter is to present a rigorous mathematical treatment of the convergence of particle filters in the (simpler) framework where both the signal X and the observation Y are discrete time processes. This restriction means that this chapter does not use stochastic calculus. The chapter is organized as follows. In the following section we describe the discrete time framework. In Section 10.2 we deduce the recurrence formula for the conditional distribution of the signal in discrete time. In Section 10.3 we deduce necessary and sufficient conditions for sequences of (random) measures to converge to the conditional distribution of the signal. In Section 10.4 we describe a generic class of particle filters which are shown to converge in the following section.
A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, © Springer Science+Business Media, LLC 2009. DOI 10.1007/978-0-387-76896-0_10

10.1 The Framework

Let the signal $X = \{X_t,\ t \in \mathbb{N}\}$ be a stochastic process defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with values in $\mathbb{R}^d$. Let $\mathcal{F}_t^X$ be the filtration generated by the process; that is,
\[
\mathcal{F}_t^X \triangleq \sigma(X_s,\ s \in [0, t]).
\]
We assume that $X$ is a Markov chain. That is, for all $t \in \mathbb{N}$ and $A \in \mathcal{B}(\mathbb{R}^d)$,
\[
\mathbb{P}\left( X_{t+1} \in A \mid \mathcal{F}_t^X \right) = \mathbb{P}\left( X_{t+1} \in A \mid X_t \right). \tag{10.1}
\]
The transition kernel of the Markov chain $X$ is the function $K_t(\cdot, \cdot)$ defined on $\mathbb{R}^d \times \mathcal{B}(\mathbb{R}^d)$ such that, for all $t \in \mathbb{N}$ and $x \in \mathbb{R}^d$,
\[
K_t(x, A) = \mathbb{P}(X_{t+1} \in A \mid X_t = x). \tag{10.2}
\]
The transition kernel $K_t$ is required to have the following properties.

i. $K_t(x, \cdot)$ is a probability measure on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$, for all $t \in \mathbb{N}$ and $x \in \mathbb{R}^d$.
ii. $K_t(\cdot, A) \in B(\mathbb{R}^d)$, for all $t \in \mathbb{N}$ and $A \in \mathcal{B}(\mathbb{R}^d)$.

The distribution of $X$ is uniquely determined by its initial distribution and its transition kernel (see Theorem A.11 for details of how a stochastic process may be constructed from its transition kernels). Let us denote by $q_t$ the distribution of the random variable $X_t$,
\[
q_t(A) \triangleq \mathbb{P}(X_t \in A).
\]
Then, from (10.2), it follows that $q_t$ satisfies the recurrence formula $q_{t+1} = K_t q_t$, $t \ge 0$, where $K_t q_t$ is the measure defined by
\[
(K_t q_t)(A) \triangleq \int_{\mathbb{R}^d} K_t(x, A)\, q_t(dx). \tag{10.3}
\]
Hence, by induction it follows that
\[
q_t = K_{t-1} \cdots K_1 K_0 q_0, \qquad t > 0.
\]
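The recurrence (10.3) is easiest to see on a finite state space, where a kernel is a row-stochastic matrix and $K_t q_t$ reduces to a vector-matrix product. The following minimal sketch assumes a hypothetical two-state, time-homogeneous chain; the function name `push_forward` and all numerical values are illustrative and are not taken from the text.

```python
def push_forward(kernel, q):
    """Finite-state form of (10.3): (K q)(j) = sum_i K(i, j) q(i)."""
    n_states = len(kernel[0])
    return [sum(kernel[i][j] * q[i] for i in range(len(q)))
            for j in range(n_states)]

# Hypothetical row-stochastic kernel: K[i][j] = P(X_{t+1} = j | X_t = i).
K = [[0.9, 0.1],
     [0.5, 0.5]]
q0 = [1.0, 0.0]          # initial distribution concentrated on state 0

# Iterate q_{t+1} = K q_t, so that history[t] = K ... K q_0 (t factors).
history = [q0]
for _ in range(3):
    history.append(push_forward(K, history[-1]))
```

Each entry of `history` remains a probability vector, which is the finite-state counterpart of part (ii) of Exercise 10.1.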
Exercise 10.1. For arbitrary $\varphi \in B(\mathbb{R}^d)$ and $t \ge 0$, define $K_t\varphi$ as
\[
K_t\varphi(x) = \int_{\mathbb{R}^d} \varphi(y)\, K_t(x, dy).
\]
i. Prove that $K_t\varphi \in B(\mathbb{R}^d)$ for any $t \ge 0$.
ii. Prove that $K_t q_t$ is a probability measure for any $t \ge 0$.
iii. Prove that, for any $\varphi \in B(\mathbb{R}^d)$ and $t > 0$, we have $K_t q_t(\varphi) = q_t(K_t\varphi)$, hence in general
\[
q_t(\varphi) = q_0(\varphi_t), \qquad t > 0,
\]
where $\varphi_t = K_0 K_1 \cdots K_{t-1}\varphi \in B(\mathbb{R}^d)$.

Let the observation process $Y = \{Y_t,\ t \in \mathbb{N}\}$ be an $\mathbb{R}^m$-valued stochastic process defined as follows
\[
Y_t \triangleq h(t, X_t) + W_t, \qquad t > 0, \tag{10.4}
\]
and $Y_0 = 0$. In (10.4), $h : \mathbb{N} \times \mathbb{R}^d \to \mathbb{R}^m$ is a Borel-measurable function and for all $t \in \mathbb{N}$, $W_t : \Omega \to \mathbb{R}^m$ are mutually independent random vectors with laws absolutely continuous with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}^m$. We denote by $g(t, \cdot)$ the density of $W_t$ with respect to $\lambda$ and we further assume that $g(t, \cdot) \in B(\mathbb{R}^m)$ and is a strictly positive function.

The filtering problem consists of computing the conditional distribution of the signal given the $\sigma$-algebra generated by the observation process from time
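0 up to the current time, i.e. computing the (random) probability measure $\pi_t$, where
\[
\begin{aligned}
\pi_t(A) &\triangleq \mathbb{P}(X_t \in A \mid \sigma(Y_{0:t})), \\
\pi_t f &= \mathbb{E}\left[ f(X_t) \mid \sigma(Y_{0:t}) \right]
\end{aligned}
\tag{10.5}
\]
for all $A \in \mathcal{B}(\mathbb{R}^d)$ and $f \in B(\mathbb{R}^d)$, where $Y_{0:t}$ is the random vector $Y_{0:t} \triangleq (Y_0, Y_1, \ldots, Y_t)$.† For arbitrary $y_{0:t} \triangleq (y_0, y_1, \ldots, y_t) \in (\mathbb{R}^m)^{t+1}$, let $\pi_t^{y_{0:t}}$ be the (non-random) probability measure defined as
\[
\begin{aligned}
\pi_t^{y_{0:t}}(A) &\triangleq \mathbb{P}(X_t \in A \mid Y_{0:t} = y_{0:t}), \\
\pi_t^{y_{0:t}} f &= \mathbb{E}\left[ f(X_t) \mid Y_{0:t} = y_{0:t} \right]
\end{aligned}
\tag{10.6}
\]
for all $A \in \mathcal{B}(\mathbb{R}^d)$ and $f \in B(\mathbb{R}^d)$. Then $\pi_t = \pi_t^{Y_{0:t}}$. While $\pi_t$ is a random probability measure, $\pi_t^{y_{0:t}}$ is a deterministic probability measure. We also introduce $p_t$ and $p_t^{y_{0:t-1}}$, $t > 0$, the predicted conditional probability measures defined by
\[
\begin{aligned}
p_t^{y_{0:t-1}}(A) &\triangleq \mathbb{P}(X_t \in A \mid Y_{0:t-1} = y_{0:t-1}), \\
p_t^{y_{0:t-1}} f &= \mathbb{E}\left[ f(X_t) \mid Y_{0:t-1} = y_{0:t-1} \right].
\end{aligned}
\]
Again $p_t = p_t^{Y_{0:t-1}}$. In the statistics and engineering literature the probability $q_t$ is commonly called the prior distribution of the signal $X_t$, whilst $\pi_t$ is called the (Bayesian) posterior distribution.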
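To make the framework concrete, the following sketch simulates one path of a signal and its observations. It assumes a hypothetical scalar model that is not taken from the text: a linear Gaussian kernel $K_t(x, \cdot) = N(0.9x, 1)$, observation function $h(t, x) = x$, observation noise $W_t \sim N(0, 0.5^2)$ and initial law $q_0 = N(0, 1)$.

```python
import random

def simulate(T, seed=0):
    """Simulate (X_0..X_T, Y_0..Y_T) for an assumed scalar linear Gaussian model."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)            # X_0 ~ q_0 = N(0, 1)  (assumed)
    xs, ys = [x], [0.0]                # Y_0 = 0, as in the text
    for _ in range(1, T + 1):
        x = 0.9 * x + rng.gauss(0.0, 1.0)   # X_t ~ K_{t-1}(X_{t-1}, .)  (assumed kernel)
        y = x + rng.gauss(0.0, 0.5)         # Y_t = h(t, X_t) + W_t with h(t, x) = x
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate(T=5)
```

The filtering problem is then to describe the law of `xs[t]` given only `ys[0..t]`.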
10.2 The Recurrence Formula for πt

The following lemma gives the density of the random vector $Y_{s:t} = (Y_s, \ldots, Y_t)$ for arbitrary $s, t \in \mathbb{N}$, $s \le t$. Here and below, $g_i \triangleq g(i, \cdot)$.

Lemma 10.2. Let $P_{Y_{s:t}} \in \mathcal{P}((\mathbb{R}^m)^{t-s+1})$ be the probability distribution of $Y_{s:t}$ and $\lambda$ be the Lebesgue measure on $((\mathbb{R}^m)^{t-s+1}, \mathcal{B}((\mathbb{R}^m)^{t-s+1}))$. Then, for all $0 < s \le t < \infty$, $P_{Y_{s:t}}$ is absolutely continuous with respect to $\lambda$ and its Radon–Nikodym derivative is
\[
\frac{dP_{Y_{s:t}}}{d\lambda}(y_{s:t}) = \Upsilon(y_{s:t}) \triangleq \int_{(\mathbb{R}^d)^{t-s+1}} \prod_{i=s}^{t} g_i(y_i - h(i, x_i))\, P_{X_{s:t}}(dx_{s:t}),
\]
where $P_{X_{s:t}} \in \mathcal{P}((\mathbb{R}^d)^{t-s+1})$ is the probability distribution of the random vector $X_{s:t} = (X_s, \ldots, X_t)$.

† $\{Y_{0:t},\ t \in \mathbb{N}\}$ is the path process associated with the observation process $Y = \{Y_t,\ t \in \mathbb{N}\}$. That is, $\{Y_{0:t},\ t \in \mathbb{N}\}$ records the entire history of $Y$ up to time $t$, not just its current value.
Proof. Let $C_{s:t} = C_s \times \cdots \times C_t$, where $C_r$ are arbitrary Borel sets, $C_r \in \mathcal{B}(\mathbb{R}^m)$ for all $s \le r \le t$. We need to prove that
\[
P_{Y_{s:t}}(C_{s:t}) = \mathbb{P}(\{Y_{s:t} \in C_{s:t}\}) = \int_{C_{s:t}} \Upsilon(y_{s:t})\, dy_s \cdots dy_t. \tag{10.7}
\]
Using the properties of the conditional probability,
\[
\mathbb{P}(Y_{s:t} \in C_{s:t}) = \int_{(\mathbb{R}^d)^{t-s+1}} \mathbb{P}(Y_{s:t} \in C_{s:t} \mid X_{s:t} = x_{s:t})\, P_{X_{s:t}}(dx_{s:t}). \tag{10.8}
\]
Since $(X_s, \ldots, X_t)$ is independent of $(W_s, \ldots, W_t)$, from (10.4) it follows that
\[
\begin{aligned}
\mathbb{P}(Y_{s:t} \in C_{s:t} \mid X_{s:t} = x_{s:t})
&= \mathbb{E}\left[ \prod_{i=s}^{t} 1_{C_i}(h(i, X_i) + W_i) \,\Big|\, X_{s:t} = x_{s:t} \right] \\
&= \mathbb{E}\left[ \prod_{i=s}^{t} 1_{C_i}(h(i, x_i) + W_i) \right],
\end{aligned}
\]
thus by the mutual independence of $W_s, \ldots, W_t$,
\[
\begin{aligned}
\mathbb{P}(Y_{s:t} \in C_{s:t} \mid X_{s:t} = x_{s:t})
&= \prod_{i=s}^{t} \mathbb{E}\left[ 1_{C_i}(h(i, x_i) + W_i) \right] \\
&= \prod_{i=s}^{t} \int_{C_i} g_i(y_i - h(i, x_i))\, dy_i.
\end{aligned}
\tag{10.9}
\]
By combining (10.8) and (10.9) and applying Fubini's theorem, we obtain (10.7). □

Remark 10.3. A special case of (10.9) gives that
\[
\mathbb{P}(Y_t \in dy_t \mid X_t = x_t) = g_t(y_t - h(t, x_t))\, dy_t,
\]
which explains why the function $g_t^{y_t} : \mathbb{R}^d \to \mathbb{R}$ defined by
\[
g_t^{y_t}(x) = g_t(y_t - h(t, x)), \qquad x \in \mathbb{R}^d \tag{10.10}
\]
is commonly referred to as the likelihood function.

Since $g_i$ for $i = s, \ldots, t$ are strictly positive, the density of the random vector $(Y_s, \ldots, Y_t)$ is also strictly positive. This condition can be relaxed (i.e. $g_i$ required only to be non-negative); however, the relaxation requires a more involved theoretical treatment of the particle filter.

The recurrence formula for $\pi_t$ involves two operations defined on $\mathcal{P}(\mathbb{R}^d)$: a transformation via the transition kernel $K_t$ and a projective product associated with the likelihood function $g_t^{y_t}$ defined as follows.
Definition 10.4. Let $p \in \mathcal{P}(\mathbb{R}^d)$ be a probability measure, and let $\varphi \in B(\mathbb{R}^d)$ be a non-negative function such that $p(\varphi) > 0$. The projective product $\varphi * p$ is the (set) function $\varphi * p : \mathcal{B}(\mathbb{R}^d) \to \mathbb{R}$ defined by
\[
\varphi * p(A) \triangleq \frac{\int_A \varphi(x)\, p(dx)}{p(\varphi)}
\]
for any $A \in \mathcal{B}(\mathbb{R}^d)$. In the above definition, recall that
\[
p(\varphi) = \int_{\mathbb{R}^d} \varphi(x)\, p(dx).
\]

Exercise 10.5. Prove that $\varphi * p$ is a probability measure on $\mathcal{B}(\mathbb{R}^d)$.

The projective product $\varphi * p$ is a probability measure which is absolutely continuous with respect to $p$, whose Radon–Nikodym derivative with respect to $p$ is proportional to $\varphi$, viz:
\[
\frac{d(\varphi * p)}{dp} = c\varphi,
\]
where $c$ is the normalizing constant, $c = 1/p(\varphi)$.

The following result gives the recurrence formula for the conditional probability of the signal. The prior and the posterior distributions coincide at time 0, $\pi_0 = q_0$, since $Y_0 = 0$ (i.e. no observations are available at time 0).

Proposition 10.6. For any fixed path $(y_0, y_1, \ldots, y_t, \ldots)$ the sequence of (non-random) probability measures $(\pi_t^{y_{0:t}})_{t \ge 0}$ satisfies the following recurrence relation
\[
\pi_t^{y_{0:t}} = g_t^{y_t} * K_{t-1}\pi_{t-1}^{y_{0:t-1}}, \qquad t > 0. \tag{10.11}
\]
The recurrence formula (10.11) holds $P_{Y_{0:t}}$-almost surely.† Equivalently, the conditional distribution of the signal satisfies the following recurrence relation
\[
\pi_t = g_t^{Y_t} * K_{t-1}\pi_{t-1}, \qquad t > 0, \tag{10.12}
\]
and the recurrence is satisfied $\mathbb{P}$-almost surely.

Proof. For all $f \in B(\mathbb{R}^d)$, using the Markov property of $X$ and the definition of the transition kernel $K$,
\[
\mathbb{E}\left[ f(X_t) \mid \mathcal{F}_{t-1}^X \right] = \mathbb{E}\left[ f(X_t) \mid X_{t-1} \right] = K_{t-1} f(X_{t-1}).
\]

† Equivalently, formula (10.11) holds true $\lambda$-almost surely where $\lambda$ is the Lebesgue measure on $(\mathbb{R}^m)^{t+1}$.
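Since $W_{0:t-1}$ is independent of $X_{0:t}$, from property (f) of conditional expectation,†
\[
\mathbb{E}\left[ f(X_t) \mid \mathcal{F}_{t-1}^X \vee \sigma(W_{0:t-1}) \right] = \mathbb{E}\left[ f(X_t) \mid \mathcal{F}_{t-1}^X \right],
\]
hence, using property (d) of conditional expectation
\[
\begin{aligned}
p_t f &= \mathbb{E}\left[ f(X_t) \mid Y_{0:t-1} \right] \\
&= \mathbb{E}\left[ \mathbb{E}\left[ f(X_t) \mid \mathcal{F}_{t-1}^X \vee \sigma(W_{0:t-1}) \right] \mid \sigma(Y_{0:t-1}) \right] \\
&= \mathbb{E}\left[ K_{t-1} f(X_{t-1}) \mid \sigma(Y_{0:t-1}) \right] \\
&= \pi_{t-1}(K_{t-1} f),
\end{aligned}
\]
which implies that $p_t = K_{t-1}\pi_{t-1}$ (as in Exercise 10.1 part (iii)) or equivalently $p_t^{y_{0:t-1}} = K_{t-1}\pi_{t-1}^{y_{0:t-1}}$.

Next we prove that $\pi_t^{y_{0:t}} = g_t^{y_t} * p_t^{y_{0:t-1}}$. Let $C_{0:t} = C_0 \times \cdots \times C_t$ where $C_r \in \mathcal{B}(\mathbb{R}^m)$ for $r = 0, 1, \ldots, t$. We need to prove that for any $A \in \mathcal{B}(\mathbb{R}^d)$,
\[
\int_{C_{0:t}} \pi_t^{y_{0:t}}(A)\, P_{Y_{0:t}}(dy_{0:t}) = \int_{C_{0:t}} g_t^{y_t} * p_t^{y_{0:t-1}}(A)\, P_{Y_{0:t}}(dy_{0:t}). \tag{10.13}
\]
By (A.2), the left-hand side of (10.13) is equal to $\mathbb{P}(\{X_t \in A\} \cap \{Y_{0:t} \in C_{0:t}\})$. Since $\sigma(X_{0:t}, W_{0:t-1}) \supset \sigma(X_t, Y_{0:t-1})$, from property (f) of conditional expectation
\[
\mathbb{P}(Y_t \in C_t \mid X_t, Y_{0:t-1}) = \mathbb{E}\left( \mathbb{P}(Y_t \in C_t \mid X_{0:t}, W_{0:t-1}) \mid X_t, Y_{0:t-1} \right) \tag{10.14}
\]
and using property (d) of conditional expectations and (10.9)
\[
\begin{aligned}
\mathbb{P}(Y_t \in C_t \mid X_{0:t}, W_{0:t-1}) &= \mathbb{P}(Y_t \in C_t \mid X_{0:t}) \\
&= \mathbb{P}\left( Y_{0:t} \in (\mathbb{R}^m)^t \times C_t \mid X_{0:t} \right) \\
&= \int_{C_t} g_t(y_t - h(t, X_t))\, dy_t.
\end{aligned}
\tag{10.15}
\]
From (10.14) and (10.15),
\[
\begin{aligned}
\mathbb{P}(Y_t \in C_t \mid X_t, Y_{0:t-1}) &= \mathbb{E}\left( \mathbb{P}(Y_t \in C_t \mid X_{0:t}, W_{0:t-1}) \mid X_t, Y_{0:t-1} \right) \\
&= \int_{C_t} g_t(y_t - h(t, X_t))\, dy_t.
\end{aligned}
\]
This gives us
\[
\mathbb{P}(Y_t \in C_t \mid X_t = x_t, Y_{0:t-1} = y_{0:t-1}) = \int_{C_t} g_t^{y_t}(x_t)\, dy_t, \tag{10.16}
\]
where $g_t^{y_t}$ is defined in (10.10); hence
\[
\begin{aligned}
P_{Y_{0:t}}(C_{0:t}) &= \mathbb{P}\left( \{Y_t \in C_t\} \cap \{X_t \in \mathbb{R}^d\} \cap \{Y_{0:t-1} \in C_{0:t-1}\} \right) \\
&= \int_{\mathbb{R}^d \times C_{0:t-1}} \mathbb{P}(Y_t \in C_t \mid X_t = x_t, Y_{0:t-1} = y_{0:t-1})\, P_{X_t, Y_{0:t-1}}(dx_t, dy_{0:t-1}) \\
&= \int_{\mathbb{R}^d \times C_{0:t-1}} \int_{C_t} g_t^{y_t}(x_t)\, dy_t\, p_t^{y_{0:t-1}}(dx_t)\, P_{Y_{0:t-1}}(dy_{0:t-1}) \\
&= \int_{C_{0:t}} \int_{\mathbb{R}^d} g_t^{y_t}(x_t)\, p_t^{y_{0:t-1}}(dx_t)\, P_{Y_{0:t-1}}(dy_{0:t-1})\, dy_t.
\end{aligned}
\tag{10.17}
\]
In (10.17), we used the identity
\[
P_{X_t, Y_{0:t-1}}(dx_t, dy_{0:t-1}) = p_t^{y_{0:t-1}}(dx_t)\, P_{Y_{0:t-1}}(dy_{0:t-1}), \tag{10.18}
\]
which is again a consequence of the vector-valued equivalent of (A.2), since for all $A \in \mathcal{B}(\mathbb{R}^d)$, we have
\[
\begin{aligned}
\mathbb{P}\left( (X_t, Y_{0:t-1}) \in A \times C_{0:t-1} \right) &= \int_{C_{0:t-1}} \mathbb{P}(X_t \in A \mid Y_{0:t-1} = y_{0:t-1})\, P_{Y_{0:t-1}}(dy_{0:t-1}) \\
&= \int_{A \times C_{0:t-1}} p_t^{y_{0:t-1}}(dx_t)\, P_{Y_{0:t-1}}(dy_{0:t-1}).
\end{aligned}
\]
From (10.17)
\[
P_{Y_{0:t}}(dy_{0:t}) = p_t^{y_{0:t-1}}(g_t^{y_t})\, dy_t\, P_{Y_{0:t-1}}(dy_{0:t-1}).
\]
Hence the second term in (10.13) is equal to
\[
\begin{aligned}
\int_{C_{0:t}} g_t^{y_t} * p_t^{y_{0:t-1}}(A)\, P_{Y_{0:t}}(dy_{0:t})
&= \int_{C_{0:t}} \frac{\int_A g_t^{y_t}(x_t)\, p_t^{y_{0:t-1}}(dx_t)}{p_t^{y_{0:t-1}}(g_t^{y_t})}\, P_{Y_{0:t}}(dy_{0:t}) \\
&= \int_{C_{0:t}} \int_A g_t^{y_t}(x_t)\, p_t^{y_{0:t-1}}(dx_t)\, dy_t\, P_{Y_{0:t-1}}(dy_{0:t-1}).
\end{aligned}
\]
Finally, using (10.16) and (10.18),
\[
\begin{aligned}
\int_{C_{0:t}} g_t^{y_t} * p_t^{y_{0:t-1}}(A)\, P_{Y_{0:t}}(dy_{0:t})
&= \int_{A \times C_{0:t-1}} \int_{C_t} g_t^{y_t}(x_t)\, dy_t\, p_t^{y_{0:t-1}}(dx_t)\, P_{Y_{0:t-1}}(dy_{0:t-1}) \\
&= \int_{A \times C_{0:t-1}} \mathbb{P}(Y_t \in C_t \mid X_t = x_t, Y_{0:t-1} = y_{0:t-1})\, P_{X_t, Y_{0:t-1}}(dx_t, dy_{0:t-1}) \\
&= \mathbb{P}\left( \{X_t \in A\} \cap \{Y_{0:t} \in C_{0:t}\} \right).
\end{aligned}
\]
From the earlier discussion this is sufficient to establish the result. □

† See Section A.2 for a list of the properties of conditional expectation.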
As can be seen from its proof, the recurrence formula (10.12) can be rewritten in the following expanded way,
\[
\pi_{t-1} \mapsto p_t = K_{t-1}\pi_{t-1} \mapsto \pi_t = g_t^{Y_t} * p_t, \qquad t > 0. \tag{10.19}
\]
The first step is called the prediction step: it occurs at time $t$ before the arrival of the new observation $Y_t$. The second step is the updating step as it takes into account the new observation $Y_t$. A similar expansion holds true for the recurrence formula (10.11); that is,
\[
\pi_{t-1}^{y_{0:t-1}} \mapsto p_t^{y_{0:t-1}} = K_{t-1}\pi_{t-1}^{y_{0:t-1}} \mapsto \pi_t^{y_{0:t}} = g_t^{y_t} * p_t^{y_{0:t-1}}, \qquad t > 0. \tag{10.20}
\]
The simplicity of the recurrence formulae (10.19) and (10.20) is misleading. A closed formula for the posterior distribution exists only in exceptional cases (the linear/Gaussian filter). The main difficulty resides in the updating step: the projective product is a non-linear transformation involving the computation of the normalising constant $p_t(g_t^{Y_t})$ or $p_t^{y_{0:t-1}}(g_t^{y_t})$ which requires an integration over a (possibly) high-dimensional space. In Section 10.4 we present a generic class of particle filters which can be used to approximate numerically the posterior distribution. Before that we state and prove necessary and sufficient criteria for sequences of approximations to converge to the posterior distribution.
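On a finite state space the two steps of (10.20) can be carried out exactly, because the normalising constant is a finite sum rather than an integral. The sketch below illustrates the prediction step and the projective product under that assumption; the two-state kernel, the likelihood values and the function names are hypothetical and serve only as an illustration.

```python
def predict(kernel, pi_prev):
    """Prediction step: p_t = K_{t-1} pi_{t-1} on a finite state space."""
    n_states = len(kernel[0])
    return [sum(kernel[i][j] * pi_prev[i] for i in range(len(pi_prev)))
            for j in range(n_states)]

def projective_product(phi, p):
    """(phi * p)(j) = phi(j) p(j) / p(phi); requires p(phi) > 0."""
    norm = sum(f * m for f, m in zip(phi, p))   # normalising constant p(phi)
    if norm <= 0.0:
        raise ValueError("p(phi) must be strictly positive")
    return [f * m / norm for f, m in zip(phi, p)]

def filter_step(kernel, likelihood, pi_prev):
    """One step pi_{t-1} -> p_t -> pi_t: prediction followed by update."""
    return projective_product(likelihood, predict(kernel, pi_prev))

K = [[0.9, 0.1],
     [0.5, 0.5]]          # hypothetical transition matrix
pi0 = [0.5, 0.5]          # pi_0 = q_0
g = [0.2, 0.8]            # hypothetical values of the likelihood g_t^{y_t}(x), x = 0, 1

p1 = predict(K, pi0)
pi1 = filter_step(K, g, pi0)
```

In continuous state spaces the sum defining `norm` becomes an integral over $\mathbb{R}^d$, which is the difficulty referred to above.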
10.3 Convergence of Approximations to πt

We have two sets of criteria: for the case when the observation is a priori fixed to a particular outcome, that is, say $Y_0 = y_0$, $Y_1 = y_1$, ... and for the case when the observation remains random. The first case is the simpler of the two, since the measures to be approximated are not random.

10.3.1 The Fixed Observation Case

We look first at the case when the observation process has an arbitrary, but fixed, value $y_{0:T}$, where $T$ is a finite time horizon. We assume that the recurrence formula (10.20) for $\pi_t^{y_{0:t}}$, the conditional distribution of the signal given the event $\{Y_{0:t} = y_{0:t}\}$, holds true for the particular observation path $y_{0:t}$ for all $0 \le t \le T$ (remember that (10.20) is valid $P_{Y_{0:t}}$-almost surely). As stated above, (10.20) requires the computation of the predicted conditional probability measure $p_t^{y_{0:t-1}}$:
\[
\pi_{t-1}^{y_{0:t-1}} \longrightarrow p_t^{y_{0:t-1}} \longrightarrow \pi_t^{y_{0:t}}.
\]
Therefore it is natural to study algorithms which provide recursive approximations for $\pi_t^{y_{0:t}}$ using intermediate approximations for $p_t^{y_{0:t-1}}$. Denote by $(\pi_t^n)_{n=1}^{\infty}$ the approximating sequence for $\pi_t^{y_{0:t}}$ and by $(p_t^n)_{n=1}^{\infty}$ the approximating sequence for $p_t^{y_{0:t-1}}$. It is assumed that the following three conditions are satisfied.

• $\pi_t^n$ and $p_t^n$ are random measures, not necessarily probability measures.
• $p_t^n \ne 0$, $\pi_t^n \ne 0$ (i.e. no approximation should be trivial).
• $p_t^n g_t^{y_t} > 0$ for all $n > 0$, $0 \le t \le T$.

Let $\bar\pi_t^n$ be defined as a (random) probability measure absolutely continuous with respect to $p_t^n$ for $t \in \mathbb{N}$ and $n \ge 1$ such that
\[
\bar\pi_t^n = g_t^{y_t} * p_t^n; \tag{10.21}
\]
thus
\[
\bar\pi_t^n f = \frac{p_t^n(f g_t^{y_t})}{p_t^n g_t^{y_t}}. \tag{10.22}
\]
The following theorems give necessary and sufficient conditions for the convergence of $p_t^n$ to $p_t^{y_{0:t-1}}$ and $\pi_t^n$ to $\pi_t^{y_{0:t}}$. In order to simplify notation, for the remainder of this subsection, dependence on $y_{0:t}$ is suppressed and $\pi_t^{y_{0:t}}$ is denoted by $\pi_t$, $p_t^{y_{0:t-1}}$ by $p_t$ and $g_t^{y_t}$ by $g_t$. It is important to remember that the observation process is a given fixed path $y_{0:T}$.

Theorem 10.7. For all $f \in B(\mathbb{R}^d)$ and all $t \in [0, T]$ the limits

a0. $\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^n f - \pi_t f| \right] = 0$,
b0. $\lim_{n\to\infty} \mathbb{E}\left[ |p_t^n f - p_t f| \right] = 0$,

hold if and only if for all $f \in B(\mathbb{R}^d)$ and all $t \in [0, T]$ we have

a1. $\lim_{n\to\infty} \mathbb{E}\left[ |\pi_0^n f - \pi_0 f| \right] = 0$,
b1. $\lim_{n\to\infty} \mathbb{E}\left[ |p_t^n f - K_{t-1}\pi_{t-1}^n f| \right] = \lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^n f - \bar\pi_t^n f| \right] = 0$.
Proof. The necessity of conditions (a0) and (b0) is proved by induction. The limit (a0) follows in the starting case of $t = 0$ from (a1). We need to show that if $\pi_{t-1}^n$ converges in expectation to $\pi_{t-1}$ and $p_t^n$ converges in expectation to $p_t$ then $\pi_t^n$ converges in expectation to $\pi_t$. Since $p_t = K_{t-1}\pi_{t-1}$, for all $f \in B(\mathbb{R}^d)$, by the triangle inequality
\[
|p_t^n f - p_t f| \le |p_t^n f - K_{t-1}\pi_{t-1}^n f| + |K_{t-1}\pi_{t-1}^n f - K_{t-1}\pi_{t-1} f|. \tag{10.23}
\]
The expected value of the first term on the right-hand side of (10.23) converges to zero from (b1). Also using Exercise 10.1, $K_{t-1} f \in B(\mathbb{R}^d)$ and $K_{t-1}\pi_{t-1}^n f = \pi_{t-1}^n(K_{t-1} f)$ and $K_{t-1}\pi_{t-1} f = \pi_{t-1}(K_{t-1} f)$, hence
\[
\lim_{n\to\infty} \mathbb{E}\left[ |K_{t-1}\pi_{t-1}^n f - K_{t-1}\pi_{t-1} f| \right] = 0.
\]
By taking expectation of both sides of (10.23),
\[
\lim_{n\to\infty} \mathbb{E}\left[ |p_t^n f - p_t f| \right] = 0, \tag{10.24}
\]
which establishes condition (b0). From (10.22)
\[
\begin{aligned}
\bar\pi_t^n f - \pi_t f &= \frac{p_t^n(f g_t)}{p_t^n g_t} - \frac{p_t(f g_t)}{p_t g_t} \\
&= -\frac{p_t^n(f g_t)}{p_t^n g_t}\, \frac{1}{p_t g_t}\left( p_t^n g_t - p_t g_t \right) + \left( \frac{p_t^n(f g_t)}{p_t g_t} - \frac{p_t(f g_t)}{p_t g_t} \right),
\end{aligned}
\]
and as $|p_t^n(f g_t)| \le \|f\|_\infty\, p_t^n g_t$,
\[
|\bar\pi_t^n f - \pi_t f| \le \frac{\|f\|_\infty}{p_t g_t} |p_t^n g_t - p_t g_t| + \frac{1}{p_t g_t} |p_t^n(f g_t) - p_t(f g_t)|. \tag{10.25}
\]
Therefore
\[
\mathbb{E}\left[ |\bar\pi_t^n f - \pi_t f| \right] \le \frac{\|f\|_\infty}{p_t g_t} \mathbb{E}\left[ |p_t^n g_t - p_t g_t| \right] + \frac{1}{p_t g_t} \mathbb{E}\left[ |p_t^n(f g_t) - p_t(f g_t)| \right]. \tag{10.26}
\]
From (10.24) both terms on the right-hand side of (10.26) converge to zero. Finally,
\[
|\pi_t^n f - \pi_t f| \le |\pi_t^n f - \bar\pi_t^n f| + |\bar\pi_t^n f - \pi_t f|. \tag{10.27}
\]
As the expected value of the first term on the right-hand side of (10.27) converges to zero using (b1) and the expected value of the second term converges to zero using (10.26), $\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^n f - \pi_t f| \right] = 0$.

For the sufficiency part, assume that conditions (a0) and (b0) hold. Thus for all $t \ge 0$ and for all $f \in B(\mathbb{R}^d)$,
\[
\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^n f - \pi_t f| \right] = \lim_{n\to\infty} \mathbb{E}\left[ |p_t^n f - p_t f| \right] = 0.
\]
Clearly condition (a1) follows as a special case of (a0) with $t = 0$. Since $p_t = K_{t-1}\pi_{t-1}$, we have for all $f \in B(\mathbb{R}^d)$,
\[
\mathbb{E}\left[ |p_t^n f - K_{t-1}\pi_{t-1}^n f| \right] \le \mathbb{E}\left[ |p_t^n f - p_t f| \right] + \mathbb{E}\left[ |\pi_{t-1}(K_{t-1} f) - \pi_{t-1}^n(K_{t-1} f)| \right], \tag{10.28}
\]
which implies the first limit in (b1). From (10.26),
\[
\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t f - \bar\pi_t^n f| \right] = 0
\]
and by the triangle inequality
\[
\mathbb{E}\left[ |\pi_t^n f - \bar\pi_t^n f| \right] \le \mathbb{E}\left[ |\pi_t^n f - \pi_t f| \right] + \mathbb{E}\left[ |\pi_t f - \bar\pi_t^n f| \right] \tag{10.29}
\]
from which the second limit in (b1) follows. □
Thus conditions (a1) and (b1) imply that $p_t^n$ converges in expectation to $p_t$ and $\pi_t^n$ converges in expectation to $\pi_t$ (see Section A.10 for the definition of convergence in expectation). The convergence in expectation of $p_t^n$ and of $\pi_t^n$ holds if and only if conditions (a1) and (b1) are satisfied for all $f \in C_b(\mathbb{R}^d)$ (not necessarily for all $f \in B(\mathbb{R}^d)$) provided additional constraints are imposed on the transition kernel of the signal and on the likelihood functions; see Corollary 10.10 below.

Definition 10.8. The transition kernel $K_t$ is said to satisfy the Feller property if $K_t f \in C_b(\mathbb{R}^d)$ for all $f \in C_b(\mathbb{R}^d)$.

Exercise 10.9. Let $\{V_t\}_{t=1}^{\infty}$ be a sequence of independent one-dimensional standard normal random variables.
i. Let $X = \{X_t,\ t \in \mathbb{N}\}$ be given by the following recursive formula
\[
X_{t+1} = a(X_t) + V_t,
\]
where $a : \mathbb{R} \to \mathbb{R}$ is a continuous function. Show that the corresponding transition kernel for $X$ satisfies the Feller property.
ii. Let $X = \{X_t,\ t \in \mathbb{N}\}$ be given by the following recursive formula
\[
X_{t+1} = X_t + \operatorname{sgn}(X_t) + V_t.
\]
Then show that the corresponding transition kernel for $X$ does not satisfy the Feller property.

The following result gives equivalent conditions for the convergence in expectation.

Corollary 10.10. Assume that the transition kernel for $X$ is Feller and that the likelihood functions $g_t$ are all continuous. Then the sequences $p_t^n$, $\pi_t^n$ converge in expectation to $p_t$ and $\pi_t$ for all $t \in [0, T]$ if and only if conditions (a1) and (b1) are satisfied for all $f \in C_b(\mathbb{R}^d)$ and all $t \in [0, T]$.

Proof. The proof is a straightforward modification of the proof of Theorem 10.7. The Feller property is used in the convergence to zero of the second term on the right-hand side of (10.23):
\[
\lim_{n\to\infty} \mathbb{E}\left[ |K_{t-1}\pi_{t-1}^n f - K_{t-1}\pi_{t-1} f| \right] = \lim_{n\to\infty} \mathbb{E}\left[ |\pi_{t-1}^n(K_{t-1} f) - \pi_{t-1}(K_{t-1} f)| \right] = 0.
\]
That is, only if $K_{t-1} f$ is continuous can we conclude that the limit above is zero. The continuity of $g_t$ is used to conclude that both terms on the right-hand side of (10.26) converge to zero. □
Following Remark A.38 in the appendix, if there exists a positive constant $p > 1$ such that
\[
\mathbb{E}\left[ |\pi_t^n f - \pi_t f|^{2p} \right] \le \frac{c_f}{n^p}, \tag{10.30}
\]
where $c_f$ is a positive constant depending on the test function $f$, but independent of $n$, then, for any $\varepsilon \in (0, 1/2 - 1/(2p))$ there exists a positive random variable $c_{f,\varepsilon}$ almost surely finite such that
\[
|\pi_t^n f - \pi_t f| \le \frac{c_{f,\varepsilon}}{n^{\varepsilon}}.
\]
In particular $\pi_t^n f$ converges to $\pi_t f$ almost surely. Moreover if (10.30) holds for any $f \in \mathcal{M}$ where $\mathcal{M}$ is a countable convergence determining set (as defined in Section A.10), then, almost surely, $\pi_t^n$ converges to $\pi_t$ in the weak topology. This means that there exists a set $\bar\Omega \in \mathcal{F}$ such that $\mathbb{P}(\bar\Omega) = 1$ and for any $\omega \in \bar\Omega$ the corresponding sequence of probability measures $\pi_t^{n,\omega}$ satisfies
\[
\lim_{n\to\infty} \pi_t^{n,\omega}(f) = \pi_t(f),
\]
for any $f \in C_b(\mathbb{R}^d)$. This cannot be extended to the convergence for any $f \in B(\mathbb{R}^d)$ (i.e. to the stronger, so-called convergence in total variation, of $\pi_t^{n,\omega}$ to $\pi_t$).

Exercise 10.11. Let $\mu$ be the uniform measure on the interval $[0, 1]$ and $(\mu_n)_{n \ge 1}$ be the sequence of probability measures
\[
\mu_n = \frac{1}{n} \sum_{i=1}^{n} \delta_{i/n}.
\]
i. Show that $(\mu_n)_{n \ge 1}$ converges to $\mu$ in the weak topology.
ii. Let $f = 1_{\mathbb{Q} \cap [0,1]} \in B(\mathbb{R}^d)$ be the indicator function of the set of all the rational numbers in $[0, 1]$. Show that $\mu_n(f) \not\to \mu(f)$, hence $\mu_n$ does not converge to $\mu$ in total variation.

Having rates of convergence for the higher moments of the error terms $\pi_t^n f - \pi_t f$ as in (10.30) is therefore very useful as they imply the almost sure convergence of the approximations in the weak topology with no additional assumptions required on the transition kernels of the signal and the likelihood function. However, if we wish a result in the same vein as that of Theorem 10.7, the same assumptions as in Corollary 10.10 must be imposed. The following theorem gives us the corresponding criterion for the almost sure convergence of $p_t^n$ to $p_t$ and $\pi_t^n$ to $\pi_t$ in the weak topology. The theorem makes use of the metric $d_{\mathcal{M}}$ as defined in Section A.10 which generates the weak topology on $\mathcal{M}_F(\mathbb{R}^d)$. The choice of the metric is not important; any metric which generates the weak topology may be used.
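The gap between weak convergence and convergence in total variation in Exercise 10.11 can be observed numerically. The sketch below integrates two test functions against $\mu_n$ with exact rational arithmetic; the choice of the continuous test function $x \mapsto x^2$ (for which $\mu(f) = 1/3$) is an illustrative assumption, and the computation is a sanity check rather than a proof.

```python
from fractions import Fraction

def mu_n(f, n):
    """Integrate f against mu_n = (1/n) * sum_{i=1}^n delta_{i/n}."""
    return sum(f(Fraction(i, n)) for i in range(1, n + 1)) / n

def square(x):
    # Continuous test function; its integral against the uniform law mu is 1/3.
    return x * x

def indicator_of_rationals(x):
    # Every atom i/n of mu_n is rational, so the indicator equals 1 at each atom.
    return 1

sizes = (10, 100, 1000)
errors = [abs(mu_n(square, n) - Fraction(1, 3)) for n in sizes]
rational_mass = [mu_n(indicator_of_rationals, n) for n in sizes]
```

The entries of `errors` shrink as $n$ grows, whereas `rational_mass` stays at 1 for every $n$, while the uniform measure assigns mass 0 to the rationals.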
Theorem 10.12. Assume that the transition kernel for $X$ is Feller and that the likelihood functions $g_t$ are all continuous for all $t \in [0, T]$. Then the sequence $p_t^n$ converges almost surely to $p_t$ and $\pi_t^n$ converges almost surely to $\pi_t$ for all $t \in [0, T]$ if and only if the following two conditions are satisfied for all $t \in [0, T]$

a2. $\lim_{n\to\infty} \pi_0^n = \pi_0$, $\mathbb{P}$-a.s.
b2. $\lim_{n\to\infty} d_{\mathcal{M}}\left( p_t^n, K_{t-1}\pi_{t-1}^n \right) = \lim_{n\to\infty} d_{\mathcal{M}}\left( \pi_t^n, \bar\pi_t^n \right) = 0$, $\mathbb{P}$-a.s.

Proof. The sufficiency of the conditions (a2) and (b2) is proved as above by induction using inequalities (10.23), (10.25) and (10.27). It remains to prove that (a2) and (b2) are necessary. Assume that for all $t \ge 0$, $p_t^n$ converges almost surely to $p_t$ and $\pi_t^n$ converges almost surely to $\pi_t$. This implies that $K_{t-1}\pi_{t-1}^n$ converges almost surely to $p_t$ (which is equal to $K_{t-1}\pi_{t-1}$) and, using (10.25), that $\bar\pi_t^n$ converges almost surely to $\pi_t$. Hence, almost surely, $\lim_{n\to\infty} d_{\mathcal{M}}(p_t^n, p_t) = 0$, $\lim_{n\to\infty} d_{\mathcal{M}}(\pi_t^n, \pi_t) = 0$, $\lim_{n\to\infty} d_{\mathcal{M}}(K_{t-1}\pi_{t-1}^n, p_t) = 0$ and $\lim_{n\to\infty} d_{\mathcal{M}}(\bar\pi_t^n, \pi_t) = 0$. Finally, using the triangle inequality
\[
d_{\mathcal{M}}\left( p_t^n, K_{t-1}\pi_{t-1}^n \right) \le d_{\mathcal{M}}\left( p_t^n, p_t \right) + d_{\mathcal{M}}\left( p_t, K_{t-1}\pi_{t-1}^n \right)
\]
and
\[
d_{\mathcal{M}}\left( \pi_t^n, \bar\pi_t^n \right) \le d_{\mathcal{M}}\left( \pi_t^n, \pi_t \right) + d_{\mathcal{M}}\left( \pi_t, \bar\pi_t^n \right),
\]
which imply (b2). □
Remark 10.13. Theorems 10.7 and 10.12 and Corollary 10.10 are very natural. They say that we obtain approximations of $p_t^{y_{0:t-1}}$ and $\pi_t^{y_{0:t}}$ for all $t \in [0, T]$ if and only if we start from an approximation of $\pi_0$ and then 'follow closely' the recurrence formula (10.20) for $p_t^{y_{0:t-1}}$ and $\pi_t^{y_{0:t}}$.

The natural question arises as to whether we can lift the results to the case when the observation process is random and not just a given fixed observation path.

10.3.2 The Random Observation Case

In the previous section both the converging sequences and the limiting measures depend on the fixed value of the observation. Let us look first at the convergence in mean. If for an arbitrary $f \in B(\mathbb{R}^d)$, the condition
\[
\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^{n,y_{0:t}} f - \pi_t^{y_{0:t}} f| \right] = 0
\]
holds for $P_{Y_{0:t}}$-almost all values $y_{0:t}$ and there exists a $P_{Y_{0:t}}$-integrable function $w_f(y_{0:t})$ such that, for all $n \ge 0$,
\[
\mathbb{E}\left[ |\pi_t^{n,y_{0:t}} f - \pi_t^{y_{0:t}} f| \right] \le w_f(y_{0:t}) \qquad P_{Y_{0:t}}\text{-a.s.},† \tag{10.31}
\]
then by the dominated convergence theorem,
\[
\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^{n,Y_{0:t}} f - \pi_t f| \right] = \lim_{n\to\infty} \int_{(\mathbb{R}^m)^{t+1}} \mathbb{E}\left[ |\pi_t^{n,y_{0:t}} f - \pi_t^{y_{0:t}} f| \right] P_{Y_{0:t}}(dy_{0:t}) = 0.
\]
Hence conditions (a1) and (b1) are also sufficient for convergence in the random observation case. In particular, if (a1) and (b1) are satisfied for any $f \in C_b(\mathbb{R}^d)$ and the two additional assumptions of Corollary 10.10 hold then $\pi_t^{n,Y_{0:t}}$ converges in expectation to $\pi_t$. Similar remarks apply to $p_t$. Also, the existence of rates of convergence for higher moments and appropriate integrability conditions can lead to the $\mathbb{P}$-almost sure convergence of $\pi_t^{n,Y_{0:t}}$ to $\pi_t$. However, a necessary and sufficient condition cannot be obtained in this manner, since $\lim_{n\to\infty} \mathbb{E}[|\pi_t^{n,Y_{0:t}} f - \pi_t f|] = 0$ does not imply
\[
\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^{n,y_{0:t}} f - \pi_t^{y_{0:t}} f| \right] = 0
\]
for $P_{Y_{0:t}}$-almost all values $y_{0:t}$.

The randomness of the approximating measures $p_t^{n,Y_{0:t-1}}$ and $\pi_t^{n,Y_{0:t}}$ now comes from two sources; one is the (random) observation $Y$ and the other one is the actual construction of the approximations. In the case of particle approximations, randomness is introduced in the system during each of the propagation steps (see the next section for details). As the following convergence results show, the effect of the second source of randomness vanishes asymptotically (the approximating measures converge to $p_t$ and $\pi_t$).

The following proposition is the equivalent of Theorem 10.7 for the random observation case. Here and throughout the remainder of the section the dependence on the process $Y$ is suppressed from the notations $p_t^{n,Y_{0:t-1}}$, $\pi_t^{n,Y_{0:t}}$, $g_t^{Y_t}$, and so on.

Proposition 10.14. Assume that for any $t \ge 0$, there exists a constant $c_t > 0$ such that $p_t g_t \ge c_t$. Then, for all $f \in B(\mathbb{R}^d)$ and all $t \ge 0$ the limits

a0′. $\lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^n f - \pi_t f| \right] = 0$,
b0′. $\lim_{n\to\infty} \mathbb{E}\left[ |p_t^n f - p_t f| \right] = 0$,

hold if and only if for all $f \in B(\mathbb{R}^d)$ and all $t \ge 0$

a1′. $\lim_{n\to\infty} \mathbb{E}\left[ |\pi_0^n f - \pi_0 f| \right] = 0$,
b1′. $\lim_{n\to\infty} \mathbb{E}\left[ |p_t^n f - K_{t-1}\pi_{t-1}^n f| \right] = \lim_{n\to\infty} \mathbb{E}\left[ |\pi_t^n f - \bar\pi_t^n f| \right] = 0$.

† Condition (10.31) is trivially satisfied for approximations which are probability measures since in this case $w_f = 2\|f\|_\infty$ satisfies the condition.
Proof. The proof follows step by step that of Theorem 10.7. The only step that differs slightly is the proof of convergence to zero of $\mathbb{E}[|\bar\pi_t^n f - \pi_t f|]$. Using the equivalent of the inequality (10.25)
\[
\mathbb{E}\left[ |\bar\pi_t^n f - \pi_t f| \right] \le \|f\|_\infty\, \mathbb{E}\left[ \frac{1}{p_t g_t} |p_t^n g_t - p_t g_t| \right] + \mathbb{E}\left[ \frac{1}{p_t g_t} |p_t^n(f g_t) - p_t(f g_t)| \right]. \tag{10.32}
\]
Since $1/(p_t g_t)$ is now random it cannot be taken outside the expectations as in (10.26). However, by using the assumption $p_t g_t \ge c_t$, we deduce that
\[
\mathbb{E}\left[ |\bar\pi_t^n f - \pi_t f| \right] \le \frac{\|f\|_\infty}{c_t} \mathbb{E}\left[ |p_t^n g_t - p_t g_t| \right] + \frac{1}{c_t} \mathbb{E}\left[ |p_t^n(f g_t) - p_t(f g_t)| \right]
\]
and hence the required convergence. □

The condition that $p_t g_t \ge c_t$ is difficult to check in practice. It is sometimes replaced by the condition that $\mathbb{E}[1/(p_t g_t)^2] < \infty$ together with the convergence to zero of the second moments of $p_t^n g_t - p_t g_t$ and $p_t^n(f g_t) - p_t(f g_t)$ (see the proof of convergence of the particle filter in continuous time described in the previous chapter).

As in the previous case, conditions (a1′) and (b1′) imply that $p_t^n$ converges in expectation to $p_t$ and $\pi_t^n$ converges in expectation to $\pi_t$. A result analogous to Corollary 10.10 is true for the convergence in expectation of $p_t^n$ and $\pi_t^n$, provided that the same additional constraints are imposed on the transition kernel of the signal and on the likelihood functions.

The existence of rates of convergence for the higher moments of the error terms $\pi_t^n f - \pi_t f$ as in (10.30) can be used to deduce the almost sure convergence of the approximations in the weak topology with no additional constraints imposed upon the transition kernel of the signal or the likelihood function. However, in order to prove a similar result to Theorem 10.7, the same assumptions as in Corollary 10.10 must be imposed. The following theorem gives us the corresponding criterion for the almost sure convergence of $p_t^n$ to $p_t$ and $\pi_t^n$ to $\pi_t$ in the weak topology. The result is true without the need to use the cumbersome assumption $p_t g_t \ge c_t$ for any $t \ge 0$. It makes use of the metric $d_{\mathcal{M}}$, defined in Section A.10, which generates the weak topology on $\mathcal{M}_F(\mathbb{R}^d)$. The choice of the metric is not important; any metric which generates the weak topology may be used.

Proposition 10.15. Assume that the transition kernel for $X$ is Feller and that the likelihood functions $g_t$ are all continuous. Then the sequence $p_t^n$ converges almost surely to $p_t$ and $\pi_t^n$ converges almost surely to $\pi_t$, for all $t \ge 0$ if and only if, for all $t \ge 0$,

a2′. $\lim_{n\to\infty} \pi_0^n = \pi_0$, $\mathbb{P}$-a.s.
b2′. $\lim_{n\to\infty} d_{\mathcal{M}}\left( p_t^n, K_{t-1}\pi_{t-1}^n \right) = \lim_{n\to\infty} d_{\mathcal{M}}\left( \pi_t^n, \bar\pi_t^n \right) = 0$.
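Proof. The proof is similar to that of Theorem 10.12, the only difference being the proof that $\lim_{n\to\infty} p_t^n = p_t$, $\mathbb{P}$-a.s. implies $\lim_{n\to\infty} \bar\pi_t^n = \pi_t$, $\mathbb{P}$-a.s., which is as follows. Let $\mathcal{M}$ be a convergence determining set of functions in $C_b(\mathbb{R}^d)$, for instance, the set used to construct the metric $d_{\mathcal{M}}$. Then almost surely
\[
\lim_{n\to\infty} p_t^n g_t = p_t g_t \quad \text{and} \quad \lim_{n\to\infty} p_t^n(g_t f) = p_t(g_t f) \quad \text{for all } f \in \mathcal{M}.
\]
Hence, again almost surely, we have
\[
\lim_{n\to\infty} \bar\pi_t^n f = \lim_{n\to\infty} \frac{p_t^n(g_t f)}{p_t^n g_t} = \frac{p_t(g_t f)}{p_t g_t}(\omega) = \pi_t f, \qquad \forall f \in \mathcal{M},
\]
which implies $\lim_{n\to\infty} \bar\pi_t^n = \pi_t$, $\mathbb{P}$-a.s. □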
In the next section we present examples of approximations to the posterior distribution which satisfy the conditions of these results. The algorithms used to produce these approximations are called particle filters or sequential Monte Carlo methods.
10.4 Particle Filters in Discrete Time

The algorithms presented below involve the use of a system of $n$ particles which evolve (mutate) according to the law of $X$. After each mutation the system is corrected: each particle is replaced by a random number of particles whose mean is proportional to the likelihood of the position of the particle. After imposing some weak restrictions on the offspring distribution of the particles, the empirical measure associated with the particle systems is proven to converge (as $n$ tends to $\infty$) to the conditional distribution of the signal given the observation.

Denote by $\pi_t^n$ the approximation to $\pi_t$ and by $p_t^n$ the approximation to $p_t$. The particle filter has the following description.

1. Initialization [$t = 0$].
   For $i = 1, \ldots, n$, sample $x_0^{(i)}$ from $\pi_0$.
2. Iteration [$t - 1$ to $t$].
   Let $x_{t-1}^{(i)}$, $i = 1, \ldots, n$ be the positions of the particles at time $t - 1$.
   a) For $i = 1, \ldots, n$, sample $\bar x_t^{(i)}$ from $K_{t-1}(x_{t-1}^{(i)}, \cdot)$. Compute the (normalized) weight $w_t^{(i)} = g_t(\bar x_t^{(i)}) / \big( \sum_{j=1}^{n} g_t(\bar x_t^{(j)}) \big)$.
   b) Replace each particle by $\xi_t^{(i)}$ offspring such that $\sum_{i=1}^{n} \xi_t^{(i)} = n$. Denote the positions of the offspring particles by $x_t^{(i)}$, $i = 1, \ldots, n$.
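The two-step iteration above can be sketched in a few lines. This is a minimal illustration under assumptions that are not part of the description above: a hypothetical scalar model with kernel $K_t(x, \cdot) = N(0.9x, 1)$, $h(t, x) = x$ and Gaussian observation noise with standard deviation 0.5, made-up observations, and multinomial resampling as the offspring mechanism (one admissible choice among many).

```python
import math
import random

def particle_filter_step(particles, sample_kernel, likelihood, rng):
    """One iteration t-1 -> t: mutation (a) followed by selection (b)."""
    n = len(particles)
    # (a) Mutation: move each particle using the signal's transition kernel.
    moved = [sample_kernel(x, rng) for x in particles]
    # Normalized weights w^(i) = g(x_bar^(i)) / sum_j g(x_bar^(j)).
    raw = [likelihood(x) for x in moved]
    total = sum(raw)
    weights = [r / total for r in raw]
    # (b) Selection: draw n offspring positions with probabilities w^(i)
    #     (multinomial offspring numbers, an assumed choice; they sum to n).
    offspring = rng.choices(moved, weights=weights, k=n)
    return offspring, moved, weights

def sample_kernel(x, rng):
    # Hypothetical transition kernel K(x, .) = N(0.9 x, 1).
    return 0.9 * x + rng.gauss(0.0, 1.0)

def make_likelihood(y, sd=0.5):
    # Unnormalized Gaussian likelihood g^y(x); the constant cancels in the weights.
    return lambda x: math.exp(-0.5 * ((y - x) / sd) ** 2)

rng = random.Random(0)
n = 500
particles = [rng.gauss(0.0, 1.0) for _ in range(n)]   # sample from pi_0 = N(0, 1) (assumed)
for y in [0.3, 0.8, 1.1]:                              # made-up observations y_1, y_2, y_3
    particles, moved, weights = particle_filter_step(
        particles, sample_kernel, make_likelihood(y), rng)

posterior_mean = sum(particles) / n   # pi_t^n f for f(x) = x
```

After the loop, `particles` carries the equally weighted approximation $\pi_t^n$, while `moved` together with `weights` carries the weighted approximation of the last step.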
It follows from the above that the particle filter starts from $\pi_0^n$: the empirical measure associated with a set of $n$ random particles of mass $1/n$ whose positions $x_0^{(i)}$ for $i = 1, \ldots, n$ form a sample of size $n$ from $\pi_0$,
\[
\pi_0^n \triangleq \frac{1}{n} \sum_{i=1}^{n} \delta_{x_0^{(i)}}.
\]
In general, define $\pi_t^n$ to be
\[
\pi_t^n \triangleq \frac{1}{n} \sum_{i=1}^{n} \delta_{x_t^{(i)}},
\]
where $x_t^{(i)}$ for $i = 1, \ldots, n$ are the positions of the particles of mass $1/n$ obtained after the second step of the iteration. Let $\bar\pi_t^n$ be the weighted measure
\[
\bar\pi_t^n \triangleq \sum_{i=1}^{n} w_t^{(i)} \delta_{\bar x_t^{(i)}}.
\]
We introduce the following $\sigma$-algebras
\[
\begin{aligned}
\mathcal{F}_t &= \sigma\left( x_s^{(i)}, \bar x_s^{(i)},\ s \le t,\ i = 1, \ldots, n \right), \\
\bar{\mathcal{F}}_t &= \sigma\left( x_s^{(i)}, \bar x_s^{(i)},\ s < t,\ \bar x_t^{(i)},\ i = 1, \ldots, n \right).
\end{aligned}
\]
Obviously $\bar{\mathcal{F}}_t \subset \mathcal{F}_t$ and the (random) probability measures $p_t^n$ and $\bar\pi_t^n$ are $\bar{\mathcal{F}}_t$-measurable whilst $\pi_t^n$ is $\mathcal{F}_t$-measurable for any $t \ge 0$. The random variables $\bar x_t^{(i)}$ for $i = 1, \ldots, n$ are chosen to be mutually independent conditional upon $\mathcal{F}_{t-1}$.

The iteration uses $\pi_{t-1}^n$ to obtain $\pi_t^n$, but not any of the previous approximations. Following part (a) of the iteration, each particle changes its position according to the transition kernel of the signal. Let $p_t^n$ be the empirical distribution associated with the cloud of particles of mass $1/n$ after part (a) of the iteration
\[
p_t^n = \frac{1}{n} \sum_{i=1}^{n} \delta_{\bar x_t^{(i)}}.
\]
This step of the algorithm is known as the importance sampling step (popular in the statistics literature) or mutation step (inherited from the genetic algorithms literature).

Exercise 10.16. Prove that $\mathbb{E}\left[ p_t^n \mid \mathcal{F}_{t-1} \right] = K_{t-1}\pi_{t-1}^n$.

Remark 10.17. An alternative way to obtain $p_t^n$ from $\pi_{t-1}^n$ is to sample $n$ times from the measure $K_{t-1}\pi_{t-1}^n$ and define $p_t^n$ to be the empirical measure associated with this sample.
We assume that the offspring vector ξt = (ξt )ni=1 satisfies the following two conditions. (i)
1. The conditional mean number of offspring is proportional to wt . More precisely h i (i) (i) E ξt | F¯t = nwt . (10.33) 2. Let Ant be the conditional covariance matrix of the random vector ξt , (i) (ξt )ni=1 , h i > Ant , E (ξt − nwt ) (ξt − nwt ) | F¯t with entries (Ant )ij = E
h
(i)
(i)
ξt − nwt
i (j) (j) ¯ ξt − nwt Ft ,
(i)
where wt , (wt )ni=1 is the vector of weights. Then assume that there exists a constant ct , such that q > Ant q ≤ nct (10.34) n for any n-dimensional vector q = q (i) i=1 ∈ Rn , such that |q (i) | ≤ 1 for i = 1, . . . , n. Exercise 10.18. Prove that the following identity holds n
πtn =
1 X (i) ξ δ (i) , n i=1 t x¯t
and that $\mathbb E[\pi_t^n \mid \bar{\mathcal F}_t] = \bar\pi_t^n$.

Step (b) of the iteration is called the selection step. The particles obtained after the first step of the recursion are multiplied or discarded according to the magnitude of the likelihood weights. In turn the likelihood weights are proportional to the likelihood of the new observation given the corresponding position of the particle (see Remark 10.3). Hence if $n w_t^{(i)}$ is small, fewer offspring are expected than if $n w_t^{(i)}$ is large. Since
\[ n w_t^{(i)} = \frac{g_t\big(\bar x_t^{(i)}\big)}{\frac{1}{n}\sum_{j=1}^n g_t\big(\bar x_t^{(j)}\big)}, \]
$n w_t^{(i)}$ is small when the corresponding value of the likelihood function $g_t(\bar x_t^{(i)})$ is smaller than the likelihood function averaged over the positions of all the particles. In conclusion, the effect of part (b) of the iteration is that it discards particles in unlikely positions and multiplies those in more likely ones. Following Exercise 10.18, this is done in an unbiased manner: the conditional expectation of the approximation after applying the step is equal to the weighted
sample obtained after the first step of the recursion. That is, the average of the mass $\xi_t^{(i)}/n$ associated with particle $i$ is equal to $w_t^{(i)}$, the weight of the particle before applying the step.

Exercise 10.19. Prove that, for all $f \in B(\mathbb R^d)$, we have
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^2\big] \le \frac{c_t \|f\|_\infty^2}{n}. \]

Exercise 10.19 implies that the randomness introduced in part (b) of the iteration, as measured by the second moment of $\pi_t^n f - \bar\pi_t^n f$, tends to zero with rate given by $1/n$, where $n$ is the number of particles in the system.

Lemma 10.20. Condition (10.34) is equivalent to
\[ q^\top A_t^n q \le n \bar c_t \tag{10.35} \]
for any $n$-dimensional vector $q = (q^{(i)})_{i=1}^n \in [0,1]^n$, where $\bar c_t$ is a fixed constant.

Proof. Obviously (10.34) implies (10.35), so we only need to show the reverse implication. Let $q \in \mathbb R^n$ be an arbitrary vector such that $q = (q^{(i)})_{i=1}^n$, $|q^{(i)}| \le 1$, $i = 1, \dots, n$. Let also
\[ q_+^{(i)} \triangleq \max\big(q^{(i)}, 0\big), \qquad q_-^{(i)} \triangleq \max\big(-q^{(i)}, 0\big), \qquad 0 \le q_+^{(i)}, q_-^{(i)} \le 1 \]
and $q_+ = (q_+^{(i)})_{i=1}^n$ and $q_- = (q_-^{(i)})_{i=1}^n$. Then $q = q_+ - q_-$. Define $\|\cdot\|_A$ to be the semi-norm associated with the matrix $A$; that is,
\[ \|q\|_A \triangleq \sqrt{q^\top A q}. \]
If all the eigenvalues of $A$ are strictly positive, then $\|\cdot\|_A$ is a genuine norm. Using the triangle inequality and (10.35),
\[ \|q\|_{A_t^n} \le \|q_+\|_{A_t^n} + \|q_-\|_{A_t^n} \le 2\sqrt{n \bar c_t}, \]
which implies that (10.34) holds with $c_t = 4\bar c_t$. □
10.5 Offspring Distributions

In order to have a complete description of the particle filter we need to specify the offspring distribution. The most popular offspring distribution is the multinomial distribution
\[ \xi_t = \mathrm{Multinomial}\big(n, w_t^{(1)}, \dots, w_t^{(n)}\big); \]
that is,
\[ \mathbb P\big(\xi_t^{(i)} = n^{(i)},\ i = 1, \dots, n\big) = \frac{n!}{\prod_{i=1}^n n^{(i)}!} \prod_{i=1}^n \big(w_t^{(i)}\big)^{n^{(i)}}. \]

The multinomial distribution is the empirical distribution of an $n$-sample from the distribution $\bar\pi_t^n$. In other words, if we sample (with replacement) $n$ times from the population of particles with positions $\bar x_t^{(i)}$, $i = 1, \dots, n$ according to the probability distribution given by the corresponding weights $w_t^{(i)}$, $i = 1, \dots, n$ and denote by $\xi_t^{(i)}$ the number of times that the particle with position $\bar x_t^{(i)}$ is chosen, then $\xi_t = (\xi_t^{(i)})_{i=1}^n$ has the above multinomial distribution.

Lemma 10.21. If $\xi_t$ has a multinomial distribution then it satisfies the unbiasedness condition; that is,
\[ \mathbb E\big[\xi_t^{(i)} \mid \bar{\mathcal F}_t\big] = n w_t^{(i)}, \]
for any $i = 1, \dots, n$. Also $\xi_t$ satisfies condition (10.34).

Proof. The unbiasedness condition follows immediately from the properties of the multinomial distribution. Also
\[ \mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)^2 \mid \bar{\mathcal F}_t\big] = n w_t^{(i)}\big(1 - w_t^{(i)}\big), \]
\[ \mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)\big(\xi_t^{(j)} - n w_t^{(j)}\big) \mid \bar{\mathcal F}_t\big] = -n w_t^{(i)} w_t^{(j)}, \qquad i \ne j. \]
Then for all $q = (q^{(i)})_{i=1}^n \in [-1,1]^n$,
\[
\begin{aligned}
q^\top A_t^n q &= \sum_{i=1}^n n w_t^{(i)}\big(1 - w_t^{(i)}\big)\big(q^{(i)}\big)^2 - 2 \sum_{1 \le i < j \le n} n w_t^{(i)} w_t^{(j)} q^{(i)} q^{(j)} \\
&= n \sum_{i=1}^n w_t^{(i)} \big(q^{(i)}\big)^2 - n \Big(\sum_{i=1}^n w_t^{(i)} q^{(i)}\Big)^2 \\
&\le n \sum_{i=1}^n w_t^{(i)},
\end{aligned}
\]
and since $\sum_{i=1}^n w_t^{(i)} = 1$, (10.34) holds with $c_t = 1$. □
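The multinomial offspring distribution can be tried out numerically. The following is a minimal sketch, assuming NumPy; the function name `multinomial_offspring` and the weight vector are chosen here for illustration and do not come from the text. It draws offspring vectors, estimates the conditional mean $n w_t^{(i)}$, and evaluates $q^\top A_t^n q$ over all sign vectors $q \in \{-1,1\}^n$ using the covariance $A_t^n = n(\mathrm{diag}(w_t) - w_t^\top w_t)$ from the proof of Lemma 10.21.

```python
import itertools
import numpy as np

def multinomial_offspring(weights, rng):
    """Draw xi ~ Multinomial(n, w^(1), ..., w^(n)), with n = number of particles."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # guard against round-off in the weights
    return rng.multinomial(w.size, w)

rng = np.random.default_rng(0)
w = np.array([0.5, 0.3, 0.15, 0.05])     # illustrative normalised weights
n = w.size

# Empirical check of the unbiasedness condition E[xi^(i)] = n w^(i).
draws = np.array([multinomial_offspring(w, rng) for _ in range(20000)])
mean_offspring = draws.mean(axis=0)

# Exact conditional covariance of the multinomial offspring vector.
A = n * (np.diag(w) - np.outer(w, w))

# A is positive semi-definite, so q -> q^T A q is convex and its maximum over
# [-1, 1]^n is attained at a vertex; enumerate all sign vectors.
worst = max(
    float(np.array(q) @ A @ np.array(q))
    for q in itertools.product([-1.0, 1.0], repeat=n)
)
```

Here `worst` should not exceed `n`, in line with the constant $c_t = 1$ obtained in the proof; enumerating the vertices is only feasible for small $n$.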
The particle filter with this choice of offspring distribution is called the bootstrap filter or the sampling importance resampling algorithm (SIR algorithm). It was introduced by Gordon, Salmond and Smith in [106] (see the last section for further historical remarks). Within the context of the bootstrap filter, the second step is called the resampling step.

The bootstrap filter is quick and easy to implement and amenable to parallelisation. This explains its great popularity among practitioners. However, it is suboptimal: the resampling step replaces the (normalised) weights $w_t^{(i)}$ by the random masses $\xi_t^{(i)}/n$, where $\xi_t^{(i)}$ is the number of offspring of the $i$th particle. Since $\xi_t$ has a multinomial distribution, $\xi_t^{(i)}$ can take any value between $0$ and $n$. That is, even when $w_t^{(i)}$ is high (the position of the $i$th particle is very likely), the $i$th particle may have very few offspring or even none at all (albeit with small probability).

If $\xi_t$ is obtained by residual sampling, rather than by independent sampling with replacement, then the above disadvantage can be avoided. In this case
\[ \xi_t = [n w_t] + \bar\xi_t. \tag{10.36} \]
In (10.36), $[n w_t]$ is the (row) vector of integer parts of the quantities $n w_t^{(i)}$. That is,
\[ [n w_t] = \big(\big[n w_t^{(1)}\big], \dots, \big[n w_t^{(n)}\big]\big), \]
and $\bar\xi_t$ has multinomial distribution
\[ \bar\xi_t = \mathrm{Multinomial}\big(\bar n, \bar w_t^{(1)}, \dots, \bar w_t^{(n)}\big), \]
where the integer $\bar n$ is given by
\[ \bar n \triangleq n - \sum_{i=1}^n \big[n w_t^{(i)}\big] = \sum_{i=1}^n \big\{n w_t^{(i)}\big\} \]
and the weights $\bar w_t^{(i)}$ are given by
\[ \bar w_t^{(i)} \triangleq \frac{\big\{n w_t^{(i)}\big\}}{\sum_{i=1}^n \big\{n w_t^{(i)}\big\}}. \]
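As a small worked illustration of (10.36) (the numbers are chosen here for illustration and do not come from the text), take $n = 4$ and $w_t = (0.5, 0.3, 0.15, 0.05)$. Then
\[
\begin{aligned}
n w_t &= (2,\ 1.2,\ 0.6,\ 0.2), & [n w_t] &= (2,\ 1,\ 0,\ 0), & \{n w_t\} &= (0,\ 0.2,\ 0.6,\ 0.2), \\
\bar n &= 4 - (2 + 1) = 1, & \bar w_t &= (0,\ 0.2,\ 0.6,\ 0.2).
\end{aligned}
\]
The first two particles receive 2 and 1 offspring deterministically, and the single remaining offspring is assigned to particle 2, 3 or 4 with probabilities 0.2, 0.6 and 0.2 respectively.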
By using residual sampling to obtain $\xi_t$, we ensure that the original weights $w_t^{(i)}$ are replaced by a random weight which is at least $[n w_t^{(i)}]/n$. This is the closest integer multiple of $1/n$ lower than the actual weight $w_t^{(i)}$. In this way, eliminating particles with likely positions is no longer possible. As long as the corresponding weight is larger than $1/n$, the particle will have at least one offspring.

Lemma 10.22. If $\xi_t$ has distribution given by (10.36), it satisfies both the unbiasedness condition (10.33) and condition (10.34).

Proof. The unbiasedness condition follows from the properties of the multinomial distribution:
\[
\begin{aligned}
\mathbb E\big[\xi_t^{(i)} \mid \bar{\mathcal F}_t\big] &= \big[n w_t^{(i)}\big] + \mathbb E\big[\bar\xi_t^{(i)} \mid \bar{\mathcal F}_t\big] \\
&= \big[n w_t^{(i)}\big] + \bar n \bar w_t^{(i)} \\
&= \big[n w_t^{(i)}\big] + \big\{n w_t^{(i)}\big\} = n w_t^{(i)}.
\end{aligned}
\]
Also
\[
\mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)^2 \mid \bar{\mathcal F}_t\big] = \mathbb E\big[\big(\bar\xi_t^{(i)} - \{n w_t^{(i)}\}\big)^2 \mid \bar{\mathcal F}_t\big] = \bar n \bar w_t^{(i)}\big(1 - \bar w_t^{(i)}\big)
\]
and
\[
\mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)\big(\xi_t^{(j)} - n w_t^{(j)}\big) \mid \bar{\mathcal F}_t\big] = -\bar n \bar w_t^{(i)} \bar w_t^{(j)}.
\]
Then for all $q = (q^{(i)})_{i=1}^n \in [-1,1]^n$, we have
\[
\begin{aligned}
q^\top A_t^n q &= \sum_{i=1}^n \bar n \bar w_t^{(i)}\big(1 - \bar w_t^{(i)}\big)\big(q^{(i)}\big)^2 - 2 \sum_{1 \le i < j \le n} \bar n \bar w_t^{(i)} \bar w_t^{(j)} q^{(i)} q^{(j)} \\
&= \sum_{i=1}^n \bar n \bar w_t^{(i)} \big(q^{(i)}\big)^2 - \bar n \Big(\sum_{i=1}^n \bar w_t^{(i)} q^{(i)}\Big)^2 \\
&\le \sum_{i=1}^n \bar n \bar w_t^{(i)},
\end{aligned}
\]
and since $\sum_{i=1}^n \bar n \bar w_t^{(i)} = \sum_{i=1}^n \{n w_t^{(i)}\} < n$, (10.34) holds with $c_t = 1$. □
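Residual sampling as in (10.36) can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function name `residual_offspring` and the test weights are hypothetical. The deterministic part is the vector of integer parts $[n w_t]$, and the remaining $\bar n$ offspring are drawn from a multinomial distribution with the renormalised fractional parts as weights.

```python
import numpy as np

def residual_offspring(weights, rng):
    """Offspring numbers xi = [n w] + xi_bar, following (10.36)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                        # guard against round-off in the weights
    n = w.size
    scaled = n * w
    base = np.floor(scaled).astype(int)    # integer parts [n w^(i)]
    n_bar = n - int(base.sum())            # number of offspring still to allocate
    xi = base.copy()
    if n_bar > 0:
        frac = scaled - base               # fractional parts {n w^(i)}
        # Residual multinomial draw with weights w_bar = frac / sum(frac).
        xi += rng.multinomial(n_bar, frac / frac.sum())
    return xi

rng = np.random.default_rng(0)
w = np.array([0.5, 0.3, 0.15, 0.05])       # illustrative normalised weights
draws = np.array([residual_offspring(w, rng) for _ in range(20000)])
mean_offspring = draws.mean(axis=0)        # should be close to n * w
```

Every draw keeps the total number of particles equal to $n$ and gives particle $i$ at least $[n w_t^{(i)}]$ offspring. Floating-point round-off in `n * w` can shift an integer part by one when $n w_t^{(i)}$ is numerically very close to an integer; the sketch does not treat that case specially.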
Exercise 10.23. In addition to the bound on the second moment of $\pi_t^n f - \bar\pi_t^n f$ resulting from imposing the assumption (10.34) on the offspring distribution $\xi_t$ (see Exercise 10.19), prove that if $\xi_t$ has multinomial distribution or the distribution given by (10.36), then there exists a constant $c$ such that, for all $f \in B(\mathbb R^d)$, we have
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^4 \mid \bar{\mathcal F}_t\big] \le \frac{c \|f\|_\infty^4}{n^2}. \]

The residual sampling distribution is still suboptimal; the correction step now replaces the weight $w_t^{(i)}$ by the deterministic mass $[n w_t^{(i)}]/n$ to which it adds a random mass given by $\bar\xi_t^{(i)}/n$, where $\bar\xi_t^{(i)}$ can take any value between $0$ and $\bar n$. This creates a problem for particles with small weights. Even when $w_t^{(i)}$ is small (the position of the $i$th particle is very unlikely) it may have a large number of offspring: up to $\bar n$ offspring are possible (albeit with small probability). The multinomial distribution also suffers from this problem.

If $\xi_t$ is obtained by using the branching algorithm described in Section 9.2.1, then both the above difficulties are eliminated. In this case, the number of offspring $\xi_t^{(i)}$ for each individual particle has the distribution
\[
\xi_t^{(i)} =
\begin{cases}
\big[n w_t^{(i)}\big] & \text{with probability } 1 - \big\{n w_t^{(i)}\big\}, \\
\big[n w_t^{(i)}\big] + 1 & \text{with probability } \big\{n w_t^{(i)}\big\},
\end{cases}
\tag{10.37}
\]
whilst $\sum_{i=1}^n \xi_t^{(i)}$ remains equal to $n$.

If the particle has a weight $w_t^{(i)} > 1/n$, then the particle will have offspring. Thus if the corresponding likelihood function $g_t(\bar x_t^{(i)})$ is larger than the likelihood averaged over all the existing particles $(1/n)\sum_{j=1}^n g_t(\bar x_t^{(j)})$, then the $i$th site is selected and the higher the weight $w_t^{(i)}$ the more offspring the $i$th particle will have. If $w_t^{(i)}$ is less than or equal to $1/n$, the particle will have at most one offspring. It will have no offspring with probability $1 - n w_t^{(i)}$, as in this case $n w_t^{(i)} = \{n w_t^{(i)}\}$. Hence, if $w_t^{(i)} \ll 1/n$, no mass is likely to be assigned to site $i$; the $i$th particle is very unlikely and it is eliminated from the sample.

The algorithm described in Section 9.2.1 belongs to a class of algorithms called tree-based branching algorithms. If $\xi_t$ is obtained by using the branching algorithm described in Section 9.2.1, then it is optimal in the sense that, for any $i = 1, \dots, n$, $\xi_t^{(i)}$ has the smallest possible variance amongst all integer-valued random variables with the given mean $n w_t^{(i)}$. Hence, the algorithm ensures that minimal randomness, as measured by the variance of the mass allocated to individual sites, is introduced to the system. The minimal variance property for the distribution produced by any tree-based branching algorithm holds true not only for individual sites but also for all groups of sites corresponding to a node of the building binary tree. A second optimality property of this distribution is that it has the minimal relative entropy with respect to the measure $\bar\pi_t$ which it replaces in the class of all empirical distributions of $n$ particles of mass $1/n$. The interested reader should consult Crisan [60] for details of these properties. See also Künsch [169] for further results on the distribution produced by the branching algorithm.

Lemma 10.24. If $\xi_t$ is produced by the algorithm described in Section 9.2.1, it satisfies both unbiasedness condition (10.33) and condition (10.34).

Proof. The unbiasedness condition immediately follows from (10.37)
\[
\mathbb E\big[\xi_t^{(i)} \mid \bar{\mathcal F}_t\big] = \big[n w_t^{(i)}\big]\big(1 - \big\{n w_t^{(i)}\big\}\big) + \big(\big[n w_t^{(i)}\big] + 1\big)\big\{n w_t^{(i)}\big\} = n w_t^{(i)}.
\]
Also
\[
\mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)^2 \mid \bar{\mathcal F}_t\big] = \big\{n w_t^{(i)}\big\}\big(1 - \big\{n w_t^{(i)}\big\}\big),
\]
and from Proposition 9.3, part (e),
\[
\mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)\big(\xi_t^{(j)} - n w_t^{(j)}\big) \mid \bar{\mathcal F}_t\big] \le 0.
\]
Then for all $q = (q^{(i)})_{i=1}^n \in [0,1]^n$, we have
\[
q^\top A_t^n q \le \sum_{i=1}^n \big\{n w_t^{(i)}\big\}\big(1 - \big\{n w_t^{(i)}\big\}\big),
\]
and since $\{n w_t^{(i)}\}(1 - \{n w_t^{(i)}\}) \le \tfrac14$, condition (10.35) holds with $\bar c_t = \tfrac14$; following Lemma 10.20, condition (10.34) then holds with $c_t = 4\bar c_t = 1$. □

For further theoretical results related to the properties of the above offspring distributions, see Chopin [51] and Künsch [169].

There exists another algorithm that satisfies the same minimal variance property of the branching algorithm described above. It was introduced by Carpenter, Clifford and Fearnhead in the context of particle approximations (see [38]). The method had appeared earlier in the field of genetic algorithms and it is known under the name of stochastic universal sampling (see Baker [6] and Whitley [268]). However, the offspring distribution generated by this method does not satisfy condition (10.34) and the convergence of the particle filter with this method is still an open question.†

All offspring distributions presented above leave the total number of particles constant and satisfy (10.34). However, the condition that the total number of particles does not change is not essential. One can choose the individual offspring numbers $\xi_t^{(i)}$ to be mutually independent given $\bar{\mathcal F}_t$. As alternatives for the distribution of the integer-valued random variables $\xi_t^{(i)}$ the following can be used.
(i)
(i)
1. ξt = B(n, wt ); that is, ξt are binomially distributed with parameters (i) (n, wt ). (i) (i) (i) 2. ξt = P (nwt ); that is, ξt are Poisson distributed with parameters (i) nwt . (i) 3. ξt are Bernoulli distributed with distribution given by (10.37). (i)
Exercise 10.25. Show that if the individual offspring numbers ξt are mutually independent given F¯t and have any of the three distributions described above, then ξt satisfies both the unbiasedness condition and condition (10.34). The Bernoulli distribution is the optimal choice for independent offspring Pn (i) distributions. Since i=1 ξt is no longer equal to n, the approximating mean sure πt is no longer a probability measure. However, following the unbiasedness condition (10.33) and condition (10.34), the total mass πtn (1) of the approximating measure is a martingale which satisfies, for any t ∈ [0, T ], h i c 2 E (πtn (1) − 1) ≤ , n where c = c(T ) is a constant independent of n. This implies that for large n the mass oscillations become very small. Indeed, by Chebyshev’s inequality P (|πtn (1) − 1| ≥ ε) ≤ †
See K¨ unsch [169] for some partial results.
c . nε2
Hence, having a non-constant number of particles does not necessarily lead to instability. The oscillations in the number of particles can in themselves constitute an indicator of the convergence of the algorithm. Such an offspring distribution with independent individual offspring numbers is easy to implement and saves computational effort. An algorithm with variable number of particles is presented in Crisan, Del Moral and Lyons [67]. Theorems 10.7 and 10.12 and all other results presented above can be used in order to prove the convergence of the algorithm in [67] and indeed any algorithm based on such offspring distributions.
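The effect of independent offspring numbers on the total mass can be illustrated numerically. The sketch below assumes NumPy and uses the Bernoulli-type distribution (10.37) applied independently to each particle; the function name `bernoulli_offspring` and the randomly generated weights are hypothetical. It covers a single selection step only, for which the variance of the total number of offspring is $\sum_i \{n w_t^{(i)}\}(1 - \{n w_t^{(i)}\}) \le n/4$, so that the second moment of the total mass minus one is at most $1/(4n)$.

```python
import numpy as np

def bernoulli_offspring(weights, rng):
    """Independent offspring numbers with the two-point distribution (10.37)."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    scaled = n * w
    base = np.floor(scaled)                # integer parts [n w^(i)]
    frac = scaled - base                   # fractional parts {n w^(i)}
    # Add one extra offspring to particle i with probability {n w^(i)}.
    return (base + (rng.random(n) < frac)).astype(int)

rng = np.random.default_rng(1)
n = 1000
w = rng.random(n)
w /= w.sum()                               # illustrative normalised weights

# Total mass pi_t^n(1) = (1/n) * sum_i xi^(i) over repeated selection steps.
totals = np.array([bernoulli_offspring(w, rng).sum() for _ in range(2000)])
mass = totals / n
mse = np.mean((mass - 1.0) ** 2)           # estimate of E[(pi_t^n(1) - 1)^2]
```

For these weights `mse` should lie below $1/(4n)$, which is the one-step analogue of the bound $c/n$ quoted above.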
10.6 Convergence of the Algorithm

First we fix the observation process to an arbitrary value $y_{0:T}$, where $T$ is a finite time horizon, and prove that the random measures resulting from the class of algorithms described above converge to $\pi_t^{y_{0:t}}$ and $p_t^{y_{0:t-1}}$ for all $0 \le t \le T$.

Exercise 10.26. Prove that $\pi_0^n$ converges in expectation to $\pi_0$ and also $\lim_{n\to\infty} \pi_0^n = \pi_0$, $\mathbb P$-a.s.

Theorem 10.27. Let $(p_t^n)_{n=1}^\infty$ and $(\pi_t^n)_{n=1}^\infty$ be the measure-valued sequences produced by the class of algorithms described above. Then, for all $0 \le t \le T$, we have
\[ \lim_{n\to\infty} \mathbb E\big[|\pi_t^n f - \pi_t f|\big] = \lim_{n\to\infty} \mathbb E\big[|p_t^n f - p_t f|\big] = 0, \]
for all $f \in B(\mathbb R^d)$. In particular, $(p_t^n)_{n=1}^\infty$ converges in expectation to $p_t^{y_{0:t-1}}$ and $(\pi_t^n)_{n=1}^\infty$ converges in expectation to $\pi_t^{y_{0:t}}$ for all $0 \le t \le T$.

Proof. We apply Theorem 10.7. Since (a1) holds as a consequence of Exercise 10.26, it is only necessary to verify condition (b1). From Exercise 10.16, $\mathbb E[p_t^n f \mid \mathcal F_{t-1}] = \pi_{t-1}^n(K_{t-1} f)$ and using the independence of the sample $\{\bar x_t^{(i)}\}_{i=1}^n$ conditional on $\mathcal F_{t-1}$,
\[
\begin{aligned}
\mathbb E\big[\big(p_t^n f - \pi_{t-1}^n(K_{t-1} f)\big)^2 \mid \mathcal F_{t-1}\big]
&= \frac{1}{n^2}\, \mathbb E\Big[\Big(\sum_{i=1}^n \big(f(\bar x_t^{(i)}) - K_{t-1} f(x_{t-1}^{(i)})\big)\Big)^2 \,\Big|\, \mathcal F_{t-1}\Big] \\
&= \frac{1}{n^2} \sum_{i=1}^n \mathbb E\big[\big(f(\bar x_t^{(i)})\big)^2 \mid \mathcal F_{t-1}\big] - \frac{1}{n^2} \sum_{i=1}^n \Big(\mathbb E\big[K_{t-1} f(x_{t-1}^{(i)}) \mid \mathcal F_{t-1}\big]\Big)^2 \\
&= \frac{1}{n}\, \pi_{t-1}^n\big(K_{t-1} f^2 - (K_{t-1} f)^2\big).
\end{aligned}
\]
Therefore $\mathbb E[(p_t^n f - \pi_{t-1}^n K_{t-1} f)^2] \le \|f\|_\infty^2 / n$ and the first limit in (b1) is satisfied. The second limit in (b1) follows from Exercise 10.19. □
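The variance identity used in the proof can be checked by simulation. The sketch below assumes NumPy and a hypothetical kernel $K_{t-1}(x, \cdot) = N(x, 1)$ with test function $f = \cos$ (neither is taken from the text), for which $K_{t-1} f(x) = e^{-1/2}\cos x$ and $K_{t-1} f^2(x) = \tfrac12(1 + e^{-2}\cos 2x)$ are available in closed form. The particle positions at time $t-1$ are held fixed, which corresponds to conditioning on $\mathcal F_{t-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x_prev = rng.normal(size=n)        # positions x_{t-1}^{(i)}, fixed (conditioning on F_{t-1})

# Hypothetical kernel K: x -> x + N(0, 1); test function f = cos, so ||f||_inf = 1.
Kf = np.cos(x_prev) * np.exp(-0.5)                       # K f(x)   = E[cos(x + Z)]
Kf2 = 0.5 * (1.0 + np.cos(2.0 * x_prev) * np.exp(-2.0))  # K f^2(x) = E[cos^2(x + Z)]

# Right-hand side of the identity: (1/n) * pi_{t-1}^n (K f^2 - (K f)^2).
exact = np.mean(Kf2 - Kf ** 2) / n

# Monte Carlo estimate of E[(p_t^n f - pi_{t-1}^n K f)^2 | F_{t-1}].
reps = 20000
x_bar = x_prev + rng.normal(size=(reps, n))   # mutation step, one row per replication
p_f = np.cos(x_bar).mean(axis=1)              # p_t^n f for each replication
mc = np.mean((p_f - Kf.mean()) ** 2)
```

The two quantities `mc` and `exact` should agree up to Monte Carlo error, and both are bounded by $\|f\|_\infty^2 / n = 1/n$.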
Corollary 10.28. For all $0 \le t \le T$, there exists a constant $k_t$ such that
\[ \mathbb E\big[(\pi_t^n f - \pi_t f)^2\big] \le \frac{k_t \|f\|_\infty^2}{n}, \tag{10.38} \]
for all $f \in B(\mathbb R^d)$.

Proof. We proceed by induction. Since $\{x_0^{(i)},\ i = 1, \dots, n\}$ is an $n$-independent sample from $\pi_0$,
\[ \mathbb E\big[(\pi_0^n f - \pi_0 f)^2\big] \le \frac{\|f\|_\infty^2}{n}, \]
hence by Jensen's inequality (10.38) is true for $t = 0$ with $k_0 = 1$. Now assume that (10.38) holds at time $t - 1$. Then
\[ \mathbb E\big[\big(\pi_{t-1}^n(K_{t-1} f) - \pi_{t-1}(K_{t-1} f)\big)^2\big] \le \frac{k_{t-1} \|K_{t-1} f\|_\infty^2}{n} \le \frac{k_{t-1} \|f\|_\infty^2}{n}. \tag{10.39} \]
Also from the proof of Theorem 10.27,
\[ \mathbb E\big[\big(p_t^n f - \pi_{t-1}^n K_{t-1} f\big)^2\big] \le \frac{\|f\|_\infty^2}{n}. \tag{10.40} \]
By using inequality (10.23) and the triangle inequality for the $L^2$-norm,
\[ \mathbb E\big[(p_t^n f - p_t f)^2\big] \le \frac{\hat k_t \|f\|_\infty^2}{n}, \tag{10.41} \]
where $\hat k_t = (\sqrt{k_{t-1}} + 1)^2$. In turn, (10.41) and (10.25) imply that
\[ \mathbb E\big[(\bar\pi_t^n f - \pi_t f)^2\big] \le \frac{\bar k_t \|f\|_\infty^2}{n}, \tag{10.42} \]
where $\bar k_t = 4 \hat k_t \|g_t\|_\infty^2 / (p_t g_t)^2$. From Exercise 10.19,
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^2\big] \le \frac{c_t \|f\|_\infty^2}{n}, \tag{10.43} \]
where $c_t$ is the constant appearing in (10.34). Finally from (10.42), (10.43) and the triangle inequality (10.27), (10.38) holds with $k_t = (\sqrt{c_t} + \sqrt{\bar k_t})^2$. This completes the induction step. □

Condition (10.34) is essential in establishing the above rate of convergence. A more general condition than (10.34) is possible, for example, that there exists $\alpha > 0$ such that
\[ q^\top A_t^n q \le n^\alpha c_t \tag{10.44} \]
for any $q \in [-1,1]^n$. In this case, inequality (10.43) would become
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^2\big] \le \frac{c_t \|f\|_\infty^2}{n^{2-\alpha}}. \]
Hence the overall rate of convergence would take the form
\[ \mathbb E\big[(\pi_t^n f - \pi_t f)^2\big] \le \frac{k_t \|f\|_\infty^2}{n^{\min(2-\alpha,\,1)}} \]
for all $f \in B(\mathbb R^d)$. Hence if $\alpha > 1$ we will see a deterioration in the overall rate of convergence. On the other hand, if $\alpha < 1$ no improvement in the rate of convergence is obtained as the error in all the other steps of the particle filter remains of order $1/n$. So $\alpha = 1$ is the most suitable choice for condition (10.34).

Theorem 10.29. If the offspring distribution is multinomial or is given by (10.36), then for all $0 \le t \le T$,
\[ \lim_{n\to\infty} p_t^n = p_t^{y_{0:t-1}} \quad \text{and} \quad \lim_{n\to\infty} \pi_t^n = \pi_t^{y_{0:t}} \qquad \mathbb P\text{-a.s.} \]
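Proof. We apply Theorem 10.12. Since condition (a2) holds as a consequence of Exercise 10.26, it is only necessary to verify condition (b2). Let $\mathcal M \subset C_b(\mathbb R^d)$ be a countable, convergence determining set of functions (see Section A.10 for details). Following Exercise 10.16, for any $f \in \mathcal M$,
\[ \mathbb E\big[f(\bar x_t^{(i)}) \mid \mathcal F_{t-1}\big] = K_{t-1} f\big(x_{t-1}^{(i)}\big) \]
and using the independence of the sample $\{\bar x_t^{(i)}\}_{i=1}^n$ conditional on $\mathcal F_{t-1}$,
\[
\begin{aligned}
\mathbb E\big[\big(p_t^n f - K_{t-1}\pi_{t-1}^n f\big)^4 \mid \mathcal F_{t-1}\big]
&= \mathbb E\Big[\Big(\frac{1}{n}\sum_{i=1}^n \big(f(\bar x_t^{(i)}) - K_{t-1} f(x_{t-1}^{(i)})\big)\Big)^4 \,\Big|\, \mathcal F_{t-1}\Big] \\
&= \frac{1}{n^4} \sum_{i=1}^n \mathbb E\Big[\big(f(\bar x_t^{(i)}) - K_{t-1} f(x_{t-1}^{(i)})\big)^4 \,\Big|\, \mathcal F_{t-1}\Big] \\
&\quad + \frac{6}{n^4} \sum_{1 \le i < j \le n} \mathbb E\Big[\big(f(\bar x_t^{(i)}) - K_{t-1} f(x_{t-1}^{(i)})\big)^2 \big(f(\bar x_t^{(j)}) - K_{t-1} f(x_{t-1}^{(j)})\big)^2 \,\Big|\, \mathcal F_{t-1}\Big].
\end{aligned}
\tag{10.45}
\]
Observe that since $\|K_{t-1} f\|_\infty \le \|f\|_\infty$,
\[ \mathbb E\Big[\big(f(\bar x_t^{(i)}) - K_{t-1} f(x_{t-1}^{(i)})\big)^4 \,\Big|\, \mathcal F_{t-1}\Big] \le 16 \|f\|_\infty^4 \]
and
\[ \mathbb E\Big[\big(f(\bar x_t^{(i)}) - K_{t-1} f(x_{t-1}^{(i)})\big)^2 \big(f(\bar x_t^{(j)}) - K_{t-1} f(x_{t-1}^{(j)})\big)^2 \,\Big|\, \mathcal F_{t-1}\Big] \le 16 \|f\|_\infty^4. \]
Hence by taking the expectation of both terms in (10.45)
\[
\mathbb E\big[\big(p_t^n f - K_{t-1}\pi_{t-1}^n f\big)^4\big] \le \frac{16 \|f\|_\infty^4}{n^3} + \frac{6}{n^4}\, 16 \|f\|_\infty^4\, \frac{n(n-1)}{2} \le \frac{48 \|f\|_\infty^4}{n^2}. \tag{10.46}
\]
From (10.46), following Remark A.38 in the appendix, for any $\varepsilon \in (0, \tfrac14)$ there exists a positive random variable $c_{f,\varepsilon}$ which is almost surely finite such that
\[ \big|p_t^n f - K_{t-1}\pi_{t-1}^n f\big| \le \frac{c_{f,\varepsilon}}{n^\varepsilon}. \]
In particular $|p_t^n f - K_{t-1}\pi_{t-1}^n f|$ converges to zero, $\mathbb P$-a.s., for any $f \in \mathcal M$. Therefore $\lim_{n\to\infty} d_{\mathcal M}(p_t^n, K_{t-1}\pi_{t-1}^n) = 0$ which is the first limit in (b2). Similarly, following Exercise 10.23, one proves that, for all $f \in \mathcal M$,
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^4\big] \le \frac{c \|f\|_\infty^4}{n^2} \tag{10.47} \]
which implies that $\lim_{n\to\infty} d_{\mathcal M}(\pi_t^n, \bar\pi_t^n) = 0$, hence also the second limit in (b2) holds. □

We now consider the case where the observation process is no longer a particular fixed outcome, but is random. With similar arguments one uses Propositions 10.14 and 10.15 to prove the following.

Corollary 10.30. Assume that for all $t \ge 0$, there exists a constant $c_t > 0$ such that $p_t g_t \ge c_t$. Then we have
\[ \lim_{n\to\infty} \mathbb E\big[\big|\pi_t^{n,Y_{0:t}} f - \pi_t f\big|\big] = \lim_{n\to\infty} \mathbb E\big[\big|p_t^{n,Y_{0:t-1}} f - p_t f\big|\big] = 0 \]
for all $f \in B(\mathbb R^d)$ and all $t \ge 0$. In particular, $(p_t^{n,Y_{0:t-1}})_{n=1}^\infty$ converges in expectation to $p_t^{y_{0:t-1}}$ and $(\pi_t^{n,Y_{0:t}})_{n=1}^\infty$ converges in expectation to $\pi_t^{y_{0:t}}$ for all $t \ge 0$.

Corollary 10.31. If the offspring distribution is multinomial or is given by (10.36), then
\[ \lim_{n\to\infty} p_t^{n,Y_{0:t-1}} = p_t \quad \text{and} \quad \lim_{n\to\infty} \pi_t^{n,Y_{0:t}} = \pi_t \qquad \mathbb P\text{-a.s.} \]
for all $t \ge 0$.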
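The convergence statements can be illustrated on a model where the filter is known in closed form. The sketch below assumes NumPy and a hypothetical scalar linear Gaussian model $X_t = a X_{t-1} + \sigma_x V_t$, $Y_t = X_t + \sigma_y W_t$ with $\pi_0 = N(0,1)$; the model, its parameters, the observation values and the convention that each observation is processed after one mutation step are all assumptions made here, not taken from the text. The particle approximation uses multinomial offspring (the bootstrap filter) and is compared with the Kalman filter mean.

```python
import numpy as np

A_COEF, SIGMA_X, SIGMA_Y = 0.9, 1.0, 0.5     # hypothetical model parameters

def bootstrap_filter_means(ys, n, rng):
    """Means of pi_t^n for a bootstrap filter with n particles."""
    x = rng.normal(size=n)                   # n-sample from pi_0 = N(0, 1)
    means = []
    for y in ys:
        # (a) mutation: move each particle with the signal transition kernel.
        x_bar = A_COEF * x + SIGMA_X * rng.normal(size=n)
        # Weights proportional to the likelihood g_t(x_bar) of the observation.
        log_g = -0.5 * ((y - x_bar) / SIGMA_Y) ** 2
        w = np.exp(log_g - log_g.max())      # subtract max for numerical stability
        w /= w.sum()
        # (b) selection: multinomial offspring numbers, then replicate particles.
        xi = rng.multinomial(n, w)
        x = np.repeat(x_bar, xi)
        means.append(x.mean())
    return np.array(means)

def kalman_means(ys):
    """Exact filter means for the same model (standard Kalman recursion)."""
    m, p = 0.0, 1.0
    out = []
    for y in ys:
        m, p = A_COEF * m, A_COEF ** 2 * p + SIGMA_X ** 2   # prediction
        k = p / (p + SIGMA_Y ** 2)                          # Kalman gain
        m, p = m + k * (y - m), (1.0 - k) * p               # update
        out.append(m)
    return np.array(out)

ys = np.array([0.3, -0.1, 0.8, 1.2, 0.4])    # an arbitrary fixed observation path
exact = kalman_means(ys)
rng = np.random.default_rng(2)
err_small = np.abs(bootstrap_filter_means(ys, 100, rng) - exact).max()
err_large = np.abs(bootstrap_filter_means(ys, 20000, rng) - exact).max()
```

With the identity as test function (which is unbounded, so this lies slightly outside the setting $f \in B(\mathbb R^d)$ of the results above), the error for the larger particle count is typically much smaller than for the smaller one, in line with a root-mean-square error of order $n^{-1/2}$.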
10.7 Final Discussion

The results presented in Section 10.3 provide efficient techniques for proving convergence of particle algorithms. The necessary and sufficient conditions ((a0), (b0)), ((a1), (b1)) and ((a2), (b2)) are natural and easy to verify, as can be seen in the proofs of Theorems 10.27 and 10.29.

The necessary and sufficient conditions can be applied when the algorithms studied provide both $\pi_t^n$ (the approximation to $\pi_t$) and also $p_t^n$ (the intermediate approximation to $p_t$). Algorithms are possible where $\pi_t^n$ is obtained from $\pi_{t-1}^n$ without using the approximation for $p_t$. In other words one can perform the mutation step using a different transition from that of the signal. In the statistics literature, the transition kernel $K_t$ is usually called the importance distribution. Should a kernel (or importance distribution) $\bar K_t$ be used which is different from that of the signal $K_t$, the form of the weights appearing in the selection step of the particle filter must be changed. The results presented in Section 10.3 then apply for $p_t$ now given by $\bar K_{t-1}\pi_{t-1}$ and the weighted measure $\bar\pi_t^n$ defined in (10.21) given by
\[ \bar\pi_t^n = \sum_{i=1}^n \bar w_t^{(i)} \delta_{\bar x_t^{(i)}}, \]
where $\bar w_t^{(i)}$ are the new weights. See Doucet et al. [83] and Pitt and Shephard [244] and the references contained therein which describe the use of such importance distributions.

As already pointed out, the randomness introduced in the system at each selection step must be kept to a minimum as it affects the rate of convergence of the algorithm. Therefore one should not apply the selection step after every new observation arrives. Assume that the information received from the observation is 'bad' (i.e. the signal-to-noise ratio is small). Because of this, the likelihood function is close to being constant and the corresponding weights are all (roughly) equal; $\bar w_t^{(i)} \simeq 1/n$. In other words, the observation is uninformative; it cannot distinguish between different sites and all particles are equally likely. In this case no selection procedure needs to be performed. The observation is stored in the weights of the approximation $\bar\pi_t^n$ and carried forward to the next step. If a correction procedure is nevertheless performed and $\xi_t$ has a minimal variance distribution, all particles will have a single offspring 'most of the time'. In other words the system remains largely unchanged with high probability. However, with small probability, the $i$th particle might have no offspring (if $\bar w_t^{(i)} < 1/n$) or two offspring (if $\bar w_t^{(i)} > 1/n$). Hence randomness still enters the system and this can affect the convergence rates (see Crisan and Lyons [66] for a related result in the continuous time framework). If $\xi_t$ does not have a minimal variance distribution, the amount of randomness is even higher. It remains an open question as to when and how often one should use the selection procedure.
The first paper on the sequential Monte Carlo methods was that of Handschin and Mayne [120] which appeared in 1969. Unfortunately, Handschin and Mayne’s paper appeared at a time when the lack of computing power meant that it could not be implemented; thus their ideas were overlooked. In the late 1980s, the advances in computer hardware rekindled interest in obtaining sequential Monte Carlo methods for approximating the posterior distribution. The first paper describing numerical integration for Bayesian filtering was published by Kitagawa [150] in 1987. The area developed rapidly following the publication of the bootstrap filter by Gordon, Salmond and Smith [106] in 1993. The development of the bootstrap filter was inspired by the earlier work of Rubin [251] on the SIR algorithm from 1987. The use of the algorithm has spread very quickly among engineers and computer scientists. An important example is the work of Isard and Blake in computer vision (see [132, 133, 134]). The first convergence results on particle filters in discrete time were published by Del Moral in 1996 (see [214, 215]). Together with Rigal and Salut, he produced several earlier LAAS-CNRS reports which were originally classified ([219, 221, 220]) which contain the description of the bootstrap filter. The condition (10.34) was introduced by Crisan, Del Moral and Lyons in 1999 (see [67]). The tree-based branching algorithm appeared in Crisan and Lyons [66]. In the last ten years we have witnessed a rapid development of the theory of particle filters in discrete time. The discrete time framework has been extensively studied and a multitude of convergence and stability results have been proved. A comprehensive account of these developments in the wider context of approximations of Feynman–Kac formulae can be found in Del Moral [216] and the references therein.
10.8 Solutions to Exercises

10.1 i. For $\varphi = I_A$ where $A$ is an arbitrary Borel set, $K_t\varphi \in B(\mathbb R^d)$ by property (ii) of the transition kernel. By linearity, the same is true for $\varphi$ being a simple function, that is, a linear combination of indicator functions. Consider next an arbitrary $\varphi \in B(\mathbb R^d)$. Then there exists a sequence of simple functions $(\varphi_n)_{n \ge 0}$ uniformly bounded which converges to $\varphi$. Then by the dominated convergence theorem $K_t(\varphi_n)(x)$ converges to $K_t(\varphi)(x)$ for any $x \in \mathbb R^d$. Hence $K_t(\varphi)(x)$ is Borel-measurable. The boundedness results from the fact that
\[ |K_t\varphi(x)| = \Big|\int_{\mathbb R^d} \varphi(y) K_t(x, dy)\Big| \le \|\varphi\|_\infty \int_{\mathbb R^d} K_t(x, dy) = \|\varphi\|_\infty \]
for any $x \in \mathbb R^d$; hence $\|K_t\varphi\|_\infty \le \|\varphi\|_\infty$.
ii. Let $A_i \in \mathcal B(\mathbb R^d)$ be a sequence of disjoint sets for $i = 1, 2, \dots$, then using property (i) of $K_t$,
\[
\begin{aligned}
K_t q_t\Big(\bigcup_{i=1}^\infty A_i\Big) &= \int_{\mathbb R^d} K_t\Big(x, \bigcup_{i=1}^\infty A_i\Big) q_t(dx) = \int_{\mathbb R^d} \lim_{N\to\infty} \sum_{i=1}^N K_t(x, A_i)\, q_t(dx) \\
&= \lim_{N\to\infty} \sum_{i=1}^N \int_{\mathbb R^d} K_t(x, A_i)\, q_t(dx) = \sum_{i=1}^\infty (K_t q_t)(A_i),
\end{aligned}
\]
where the bounded convergence theorem was used to interchange the limit and the integral (using $K_t(x, \Omega) = 1$ as the bound). Consequently $K_t q_t$ is countably additive and hence a measure. To check that it is a probability measure
\[ K_t q_t(\Omega) = \int_{\mathbb R^d} K_t(x, \Omega)\, q_t(dx) = \int_{\mathbb R^d} q_t(dx) = 1. \]

iii.
\[ (K_t q_t)(\varphi) = \int_{y \in \mathbb R^d} \varphi(y) \int_{x \in \mathbb R^d} K_t(x, dy)\, q_t(dx). \]
By Fubini's theorem (which is applicable since $\varphi$ is bounded and, as a consequence of (ii), $K_t q_t$ is a probability measure, which implies that $K_t q_t(|\varphi|) \le \|\varphi\|_\infty < \infty$),
\[ (K_t q_t)(\varphi) = \int_{x \in \mathbb R^d} q_t(dx) \int_{y \in \mathbb R^d} \varphi(y) K_t(x, dy) = q_t(K_t\varphi) \]
and the general case follows by induction.

10.5 Finite additivity is trivial from the linearity of the integral, and countable additivity follows from the bounded convergence theorem, since $\varphi$ is bounded. Thus $\varphi * p$ is a measure. It is clear that $\varphi * p(\Omega) = p(\varphi)/p(\varphi) = 1$, so it is a probability measure.

10.9 i. For arbitrary $\varphi \in C_b(\mathbb R^d)$, it is clear that
\[
K_t\varphi(x) = \int_{\mathbb R^d} \varphi(y) \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{(y - a(x))^2}{2}\Big)\, dy = \int_{\mathbb R^d} \varphi(y + a(x)) \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{y^2}{2}\Big)\, dy.
\]
Then by the dominated convergence theorem using the continuity of $a$,
\[ \lim_{x \to x_0} K_t\varphi(x) = K_t\varphi(x_0) \]
for arbitrary $x_0 \in \mathbb R^d$, hence the continuity of $K_t\varphi$.
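ii. Choose a strictly increasing $\varphi \in C_b(\mathbb R^d)$. Then, as above
\[
\lim_{x \uparrow 0} K_t\varphi(x) = \int_{\mathbb R^d} \varphi(y - 1) \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{y^2}{2}\Big)\, dy < \int_{\mathbb R^d} \varphi(y + 1) \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{y^2}{2}\Big)\, dy = \lim_{x \downarrow 0} K_t\varphi(x).
\]

10.11 i. Let $f \in C_b(\mathbb R)$; then
\[ \mu_n f = \frac{1}{n} \sum_{i=1}^n f(i/n) \]
and
\[ \mu f = \int_0^1 f(x)\, dx. \]
As $f \in C_b(\mathbb R)$, it is Riemann integrable. Therefore the Riemann approximation $\mu_n f \to \mu f$ as $n \to \infty$.

ii. If $f = 1_{\mathbb Q \cap [0,1]}$ then $\mu_n f = 1$, yet $\mu f = 0$. Hence $\mu_n f \not\to \mu f$ as $n \to \infty$.

10.16 Following part (a) of the iteration we get, for arbitrary $f \in B(\mathbb R^d)$, that
\[ \mathbb E\big[f(\bar x_t^{(i)}) \mid \mathcal F_{t-1}\big] = K_{t-1} f\big(x_{t-1}^{(i)}\big) \qquad \forall i = 1, \dots, n; \]
hence
\[
\mathbb E[p_t^n f \mid \mathcal F_{t-1}] = \frac{1}{n} \sum_{i=1}^n \mathbb E\big[f(\bar x_t^{(i)}) \mid \mathcal F_{t-1}\big] = \frac{1}{n} \sum_{i=1}^n K_{t-1} f\big(x_{t-1}^{(i)}\big) = K_{t-1}\pi_{t-1}^n(f) = \pi_{t-1}^n(K_{t-1} f).
\]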
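10.18 The first assertion follows trivially from part (b) of the iteration. Next observe that, for all $f \in B(\mathbb R^d)$,
\[ \mathbb E\big[\pi_t^n f \mid \bar{\mathcal F}_t\big] = \frac{1}{n} \sum_{i=1}^n f(\bar x_t^{(i)})\, \mathbb E\big[\xi_t^{(i)} \mid \bar{\mathcal F}_t\big] = \bar\pi_t^n f, \]
since $\mathbb E[\xi_t^{(i)} \mid \bar{\mathcal F}_t] = n w_t^{(i)}$ for any $i = 1, \dots, n$.

10.19
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^2 \mid \bar{\mathcal F}_t\big] = \frac{1}{n^2} \sum_{i,j=1}^n f(\bar x_t^{(i)}) f(\bar x_t^{(j)}) (A_t^n)_{ij}. \tag{10.48} \]
By applying (10.34) with $q = (q^{(i)})_{i=1}^n$, where $q^{(i)} = f(\bar x_t^{(i)})/\|f\|_\infty$, we get that
\[ \sum_{i,j=1}^n \frac{f(\bar x_t^{(i)})}{\|f\|_\infty} (A_t^n)_{ij} \frac{f(\bar x_t^{(j)})}{\|f\|_\infty} \le n c_t. \tag{10.49} \]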
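The exercise now follows from (10.48) and (10.49).

10.23 The multinomial distribution is the empirical distribution of an $n$-sample from the distribution $\bar\pi_t^n$. Hence, in this case, $\pi_t^n$ has the representation
\[ \pi_t^n = \frac{1}{n} \sum_{i=1}^n \delta_{\zeta_t^{(i)}}, \]
where $\zeta_t^{(i)}$ are random variables mutually independent given $\bar{\mathcal F}_t$ such that $\mathbb E[f(\zeta_t^{(i)}) \mid \bar{\mathcal F}_t] = \bar\pi_t^n f$ for any $f \in B(\mathbb R^d)$. Using the independence of the sample $\{\zeta_t^{(i)}\}_{i=1}^n$ conditional on $\bar{\mathcal F}_t$,
\[
\begin{aligned}
\mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^4 \mid \bar{\mathcal F}_t\big]
&= \mathbb E\Big[\Big(\frac{1}{n} \sum_{i=1}^n \big(f(\zeta_t^{(i)}) - \bar\pi_t^n f\big)\Big)^4 \,\Big|\, \bar{\mathcal F}_t\Big] \\
&= \frac{1}{n^4} \sum_{i=1}^n \mathbb E\big[\big(f(\zeta_t^{(i)}) - \bar\pi_t^n f\big)^4 \mid \bar{\mathcal F}_t\big] \\
&\quad + \frac{6}{n^4} \sum_{1 \le i < j \le n} \mathbb E\big[\big(f(\zeta_t^{(i)}) - \bar\pi_t^n f\big)^2 \big(f(\zeta_t^{(j)}) - \bar\pi_t^n f\big)^2 \mid \bar{\mathcal F}_t\big].
\end{aligned}
\]
Observe that since $|\bar\pi_t^n f| \le \|f\|_\infty$,
\[ \mathbb E\big[\big(f(\zeta_t^{(i)}) - \bar\pi_t^n f\big)^4 \mid \bar{\mathcal F}_t\big] \le 16 \|f\|_\infty^4 \]
and
\[ \mathbb E\big[\big(f(\zeta_t^{(i)}) - \bar\pi_t^n f\big)^2 \big(f(\zeta_t^{(j)}) - \bar\pi_t^n f\big)^2 \mid \bar{\mathcal F}_t\big] \le 16 \|f\|_\infty^4. \]
Hence
\[ \mathbb E\big[(\pi_t^n f - \bar\pi_t^n f)^4 \mid \bar{\mathcal F}_t\big] \le \frac{16 \|f\|_\infty^4}{n^3} + \frac{48 \|f\|_\infty^4 (n-1)}{n^3} \le \frac{48 \|f\|_\infty^4}{n^2}. \]
The bound for the case when the offspring distribution is given by (10.36) is proved in a similar manner.

10.25 Immediate from the computation of the first and second moments of the binomial, Poisson and Bernoulli distributions and the fact that, due to the independence of the random variables $\xi_t^{(i)}$, the conditional covariance matrix $A_t^n$ is diagonal,
\[ (A_t^n)_{ij} = \mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)\big(\xi_t^{(j)} - n w_t^{(j)}\big) \mid \bar{\mathcal F}_t\big] = 0, \qquad i \ne j, \]
where $w_t \triangleq (w_t^{(i)})_{i=1}^n$ is the vector of weights. Hence
\[ q^\top A_t^n q = \sum_{i=1}^n \big(q^{(i)}\big)^2\, \mathbb E\big[\big(\xi_t^{(i)} - n w_t^{(i)}\big)^2 \mid \bar{\mathcal F}_t\big] \]
for any $n$-dimensional vector $q = (q^{(i)})_{i=1}^n \in [-1,1]^n$.

10.26 $\pi_0^n$ is the empirical measure associated with a set of $n$ random particles of mass $1/n$ whose positions $x_0^{(i)}$ for $i = 1, \dots, n$ form a sample of size $n$ from $\pi_0$. Hence, in particular, $\mathbb E[f(x_0^{(i)})] = \pi_0 f$ for any $f \in B(\mathbb R^d)$ and by a similar argument to that in Exercise 10.23,
\[ \mathbb E\big[(\pi_0^n f - \pi_0 f)^4\big] \le \frac{48 \|f\|_\infty^4}{n^2}, \]
which implies the convergence in expectation by Jensen's inequality and the almost sure convergence follows from Remark A.38 in the appendix.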
Part III
Appendices
A Measure Theory
A.1 Monotone Class Theorem

Let $S$ be a set. A family $\mathcal C$ of subsets of $S$ is called a π-system if it is closed under finite intersection. That is, for any $A, B \in \mathcal C$ we have that $A \cap B \in \mathcal C$.

Theorem A.1. Let $\mathcal H$ be a vector space of bounded functions from $S$ into $\mathbb R$ containing the constant function 1. Assume that $\mathcal H$ has the property that for any sequence $(f_n)_{n \ge 1}$ of non-negative functions in $\mathcal H$ such that $f_n \uparrow f$ where $f$ is a bounded function on $S$, then $f \in \mathcal H$. Also assume that $\mathcal H$ contains the indicator function of every set in some π-system $\mathcal C$. Then $\mathcal H$ contains every bounded $\sigma(\mathcal C)$-measurable function of $S$.

For a proof of Theorem A.1 and other related results see Williams [272] or Rogers and Williams [248].
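A.2 Conditional Expectation

Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space and $\mathcal G \subset \mathcal F$ be a sub-σ-algebra of $\mathcal F$. The conditional expectation of an integrable $\mathcal F$-measurable random variable $\xi$ given $\mathcal G$ is defined as the integrable $\mathcal G$-measurable random variable, denoted by $\mathbb E[\xi \mid \mathcal G]$, with the property that
\[ \int_A \xi \, d\mathbb P = \int_A \mathbb E[\xi \mid \mathcal G] \, d\mathbb P, \quad \text{for all } A \in \mathcal G. \tag{A.1} \]
Then $\mathbb E[\xi \mid \mathcal G]$ exists and is almost surely unique (for a proof of this result see for example Williams [272]). By this we mean that if $\bar\xi$ is another $\mathcal G$-measurable integrable random variable such that
\[ \int_A \bar\xi \, d\mathbb P = \int_A \mathbb E[\xi \mid \mathcal G] \, d\mathbb P, \quad \text{for all } A \in \mathcal G, \]
then $\mathbb E[\xi \mid \mathcal G] = \bar\xi$, $\mathbb P$-a.s.

The following are some of the important properties of the conditional expectation which are used throughout the text.

a. If $\alpha_1, \alpha_2 \in \mathbb R$ and $\xi_1, \xi_2$ are $\mathcal F$-measurable, then $\mathbb E[\alpha_1\xi_1 + \alpha_2\xi_2 \mid \mathcal G] = \alpha_1 \mathbb E[\xi_1 \mid \mathcal G] + \alpha_2 \mathbb E[\xi_2 \mid \mathcal G]$, $\mathbb P$-a.s.
b. If $\xi \ge 0$, then $\mathbb E[\xi \mid \mathcal G] \ge 0$, $\mathbb P$-a.s.
c. If $0 \le \xi_n \uparrow \xi$, then $\mathbb E[\xi_n \mid \mathcal G] \uparrow \mathbb E[\xi \mid \mathcal G]$, $\mathbb P$-a.s.
d. If $\mathcal H$ is a sub-σ-algebra of $\mathcal G$, then $\mathbb E[\mathbb E[\xi \mid \mathcal G] \mid \mathcal H] = \mathbb E[\xi \mid \mathcal H]$, $\mathbb P$-a.s.
e. If $\xi$ is $\mathcal G$-measurable, then $\mathbb E[\xi\eta \mid \mathcal G] = \xi\, \mathbb E[\eta \mid \mathcal G]$, $\mathbb P$-a.s.
f. If $\mathcal H$ is independent of $\sigma(\sigma(\xi), \mathcal G)$, then $\mathbb E[\xi \mid \sigma(\mathcal G, \mathcal H)] = \mathbb E[\xi \mid \mathcal G]$, $\mathbb P$-a.s.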
The conditional probability of a set $A \in \mathcal F$ with respect to the σ-algebra $\mathcal G$ is the random variable denoted by $\mathbb P(A \mid \mathcal G)$ defined as $\mathbb P(A \mid \mathcal G) \triangleq \mathbb E[I_A \mid \mathcal G]$, where $I_A$ is the indicator function of the set $A$. From (A.1),
\[ \mathbb P(A \cap B) = \int_B \mathbb P(A \mid \mathcal G) \, d\mathbb P, \quad \text{for all } B \in \mathcal G. \tag{A.2} \]

This definition of conditional probability has the shortcoming that the conditional probability $\mathbb P(A \mid \mathcal G)$ is only defined outside of a null set which depends upon the set $A$. As there may be an uncountable number of possible choices for $A$, $\mathbb P(\cdot \mid \mathcal G)$ may not be a probability measure. Under certain conditions regular conditional probabilities as in Definition 2.28 exist. Regular conditional distributions (following the nomenclature in Breiman [23] whose proof we follow) exist under much less restrictive conditions.

Definition A.2. Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space, $(E, \mathcal E)$ be a measurable space, $X : \Omega \to E$ be an $\mathcal F/\mathcal E$-measurable random element and $\mathcal G$ a sub-σ-algebra of $\mathcal F$. A function $Q(\omega, B)$ defined for all $\omega \in \Omega$ and $B \in \mathcal E$ is called a regular conditional distribution of $X$ with respect to $\mathcal G$ if

(a) For each $B \in \mathcal E$, the map $Q(\cdot, B)$ is $\mathcal G$-measurable.
(b) For each $\omega \in \Omega$, $Q(\omega, \cdot)$ is a probability measure on $(E, \mathcal E)$.
(c) For any $B \in \mathcal E$,
\[ Q(\cdot, B) = \mathbb P(X \in B \mid \mathcal G) \qquad \mathbb P\text{-a.s.} \tag{A.3} \]

Theorem A.3. If the space $(E, \mathcal E)$ in which $X$ takes values is a Borel space, that is, if there exists a function $\varphi : E \to \mathbb R$ such that $\varphi$ is $\mathcal E$-measurable and $\varphi^{-1}$ is $\mathcal B(\mathbb R)$-measurable, then the regular conditional distribution of the variable $X$ conditional upon $\mathcal G$ in the sense of Definition A.2 exists.
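Proof. Consider the case when $(E, \mathcal{E}) = (\mathbb{R}, \mathcal{B}(\mathbb{R}))$. First we construct a regular version of the distribution function $\mathbb{P}(X < x \mid \mathcal{G})$. Define a countable family of random variables by selecting, for each $q \in \mathbb{Q}$, a version $Q_q(\omega) = \mathbb{P}(X < q \mid \mathcal{G})(\omega)$. For $r, q \in \mathbb{Q}$ define $M_{r,q} \triangleq \{\omega : Q_r < Q_q\}$, and then define the set on which monotonicity of the distribution function fails,
\[ M \triangleq \bigcup_{\substack{r > q \\ r, q \in \mathbb{Q}}} M_{r,q}. \]
It is clear from property (b) of the conditional expectation that $\mathbb{P}(M) = 0$. Similarly define for $q \in \mathbb{Q}$,
\[ N_q \triangleq \Big\{ \omega : \lim_{r \uparrow q} Q_r \neq Q_q \Big\} \quad \text{and} \quad N \triangleq \bigcup_{q \in \mathbb{Q}} N_q; \]
by property (c) of conditional expectation it follows that $\mathbb{P}(N_q) = 0$, so $\mathbb{P}(N) = 0$. Finally define
\[ L_\infty \triangleq \Big\{ \omega : \lim_{\substack{q \to \infty \\ q \in \mathbb{Q}}} Q_q \neq 1 \Big\} \quad \text{and} \quad L_{-\infty} \triangleq \Big\{ \omega : \lim_{\substack{q \to -\infty \\ q \in \mathbb{Q}}} Q_q \neq 0 \Big\}, \]
and again $\mathbb{P}(L_\infty) = \mathbb{P}(L_{-\infty}) = 0$. Define
\[ F(x \mid \mathcal{G}) \triangleq \begin{cases} \lim_{r \uparrow x,\, r \in \mathbb{Q}} Q_r & \text{if } \omega \notin M \cup N \cup L_\infty \cup L_{-\infty}, \\ \Phi(x) & \text{otherwise,} \end{cases} \]
where $\Phi(x)$ is the distribution function of the normal $N(0, 1)$ distribution (its choice is arbitrary). It follows using property (c) of conditional expectation applied to the functions $f_{r_i} = 1_{(-\infty, r_i)}$, with $r_i \in \mathbb{Q}$ a sequence such that $r_i \uparrow x$, that $F(x \mid \mathcal{G})$ satisfies all the properties of a distribution function and is a version of $\mathbb{P}(X < x \mid \mathcal{G})$. This distribution function can be extended to define a measure $Q(\cdot \mid \mathcal{G})$. Let $\mathcal{H}$ be the class of $B \in \mathcal{B}(\mathbb{R})$ such that $Q(B \mid \mathcal{G})$ is a version of $\mathbb{P}(X \in B \mid \mathcal{G})$. It is clear that $\mathcal{H}$ contains all finite disjoint unions of intervals of the form $[a, b)$ for $a, b \in \mathbb{R}$, so by the monotone class theorem A.1 the result follows.

In the general case, $Y = \varphi(X)$ is a real-valued random variable and so has a regular conditional distribution $\hat{Q}$ such that for $B \in \mathcal{B}(\mathbb{R})$, $\hat{Q}(B \mid \mathcal{G}) = \mathbb{P}(Y \in B \mid \mathcal{G})$; thus define
\[ Q(B \mid \mathcal{G}) \triangleq \hat{Q}(\varphi(B) \mid \mathcal{G}), \]
and since $\varphi^{-1}$ is measurable it follows that $Q$ has the required properties. □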
Lemma A.4. If $X$ is as in the statement of Theorem A.3 and $\psi$ is an $\mathcal{E}$-measurable function such that $\mathbb{E}[|\psi(X)|] < \infty$, then if $Q(\cdot \mid \mathcal{G})$ is a regular conditional distribution for $X$ given $\mathcal{G}$ it follows that
\[ \mathbb{E}[\psi(X) \mid \mathcal{G}] = \int_E \psi(x) \, Q(dx \mid \mathcal{G}). \]

Proof. If $\psi = 1_A$ is the indicator function of a set $A \in \mathcal{E}$, then the result follows directly from (A.3). By linearity this extends to simple functions, by monotone convergence to non-negative functions, and in general write $\psi = \psi^+ - \psi^-$. □
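On a finite sample space the statement of Lemma A.4 reduces to a finite sum, which makes it easy to check directly. The sketch below assumes a six-point space, a partition-generated σ-algebra and a hypothetical test function `psi`; the kernel `Q` is built as the conditional probability of each value of X, mirroring (A.3).

```python
from fractions import Fraction


def cond_exp(values, partition, prob):
    """E[values | G] on a finite space, G being generated by `partition`."""
    out = [None] * len(values)
    for block in partition:
        mass = sum(prob[w] for w in block)
        avg = sum(values[w] * prob[w] for w in block) / mass
        for w in block:
            out[w] = avg
    return out


# Hypothetical data: six outcomes, X takes values in E = {0, 1, 2}.
prob = [Fraction(k, 10) for k in (1, 2, 3, 1, 2, 1)]
G = [[0, 1, 2], [3, 4, 5]]
X = [0, 1, 2, 2, 0, 2]
E = sorted(set(X))
n = len(X)

# Q[x][w] = P(X = x | G)(w): a regular conditional distribution of X given G.
Q = {x: cond_exp([Fraction(int(X[w] == x)) for w in range(n)], G, prob) for x in E}


def psi(x):
    return x * x - 1  # arbitrary test function, chosen for illustration


# Left-hand side: E[psi(X) | G]; right-hand side: integral of psi against Q.
lhs = cond_exp([Fraction(psi(X[w])) for w in range(n)], G, prob)
rhs = [sum(psi(x) * Q[x][w] for x in E) for w in range(n)]
```

For each outcome `w` the values `Q[x][w]` sum to one over `x`, which is property (b) of Definition A.2 in this finite setting.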
A.3 Topological Results

Definition A.5. A metric space $(E, d)$ is said to be separable if it has a countable dense set. That is, for any $x \in E$, given $\varepsilon > 0$ we can find $y$ in this countable set such that $d(x, y) < \varepsilon$.

Lemma A.6. Let $(X, \rho)$ be a separable metric space. Then $X$ is homeomorphic to a subspace of $[0, 1]^{\mathbb{N}}$, the space of sequences of real numbers in $[0, 1]$ with the topology of co-ordinatewise convergence.

Proof. Define a bounded version of the metric $\hat{\rho} \triangleq \rho/(1 + \rho)$; it is easily checked that this is a metric on $X$, and the space $(X, \hat{\rho})$ is also separable. Clearly the metric satisfies the bounds $0 \le \hat{\rho} \le 1$. As a consequence of separability we can choose a countable set $x_1, x_2, \ldots$ which is dense in $(X, \hat{\rho})$. Define $J = [0, 1]^{\mathbb{N}}$ and endow this space with the metric $d$ which generates the topology of co-ordinatewise convergence. Define $\alpha : X \to J$,
\[ \alpha : x \mapsto (\hat{\rho}(x, x_1), \hat{\rho}(x, x_2), \ldots). \]
Suppose $x^{(n)} \to x$ in $X$; then by continuity of $\hat{\rho}$ it is immediate that $\hat{\rho}(x^{(n)}, x_k) \to \hat{\rho}(x, x_k)$ for each $k \in \mathbb{N}$ and thus $\alpha(x^{(n)}) \to \alpha(x)$. Conversely if $\alpha(x^{(n)}) \to \alpha(x)$ then this implies that $\hat{\rho}(x^{(n)}, x_k) \to \hat{\rho}(x, x_k)$ for each $k$. Then by the triangle inequality
\[ \hat{\rho}(x^{(n)}, x) \le \hat{\rho}(x^{(n)}, x_k) + \hat{\rho}(x_k, x), \]
and since $\hat{\rho}(x^{(n)}, x_k) \to \hat{\rho}(x, x_k)$ it is immediate that
\[ \limsup_{n \to \infty} \hat{\rho}(x^{(n)}, x) \le 2 \hat{\rho}(x_k, x) \quad \forall k. \]
As this holds for all $k \in \mathbb{N}$ and the $x_k$ are dense in $X$, we may pick a sequence $x_{m_k} \to x$, whence $\hat{\rho}(x^{(n)}, x) \to 0$ as $n \to \infty$. Hence $\alpha$ is a homeomorphism of $X$ onto its image $\alpha(X) \subseteq J$. □

The following is a standard result and the proof is based on that in Rogers and Williams [248] who reference Bourbaki [22] Chapter IX, Section 6, No 1.
Theorem A.7. A complete separable metric space $X$ is homeomorphic to a Borel subset of a compact metric space.

Proof. By Lemma A.6 there is a homeomorphism $\alpha$ of $X$ onto its image $\alpha(X) \subseteq J$. Let $d$ denote the metric giving the topology of co-ordinatewise convergence on $J$. We must now consider $\alpha(X)$ and show that it is a countable intersection of open sets in $J$ and hence belongs to the σ-algebra generated by the open sets, the Borel σ-algebra.

For $\varepsilon > 0$ and $x \in X$ we can find $\delta(\varepsilon)$ such that for any $y \in X$, $d(\alpha(x), \alpha(y)) < \delta$ implies that $\hat{\rho}(x, y) < \varepsilon$. For $n \in \mathbb{N}$ set $\varepsilon = 1/(2n)$ and then consider the ball $B(\alpha(x), \delta(\varepsilon) \wedge \varepsilon)$. It is immediate that the $d$-diameter of this ball is at most $1/n$. But also, as a consequence of the choice of $\delta$, the image under $\alpha^{-1}$ of the intersection of this ball with $\alpha(X)$ has $\hat{\rho}$-diameter at most $1/n$.

Let $\overline{\alpha(X)}$ be the closure of $\alpha(X)$ under the metric $d$ in $J$. Define a set $U_n \subseteq \overline{\alpha(X)}$ to be the set of $x \in \overline{\alpha(X)}$ such that there exists an open ball $N_{x,n}$ about $x$ of $d$-diameter less than $1/n$, with $\hat{\rho}$-diameter of the image under $\alpha^{-1}$ of the intersection of $\alpha(X)$ and this ball less than $1/n$. By the argument of the previous paragraph we see that if $x \in \alpha(X)$ we can always find such a ball; hence $\alpha(X) \subseteq U_n$.

For $x \in \bigcap_n U_n$ choose $x_n \in \alpha(X) \cap \bigcap_{k \le n} N_{x,k}$. By construction $d(x, x_n) \le 1/n$, thus $x_n \to x$ as $n \to \infty$ under the $d$ metric on $J$. However, for $r \ge n$ both points $x_r$ and $x_n$ are in $N_{x,n}$, thus $\hat{\rho}(\alpha^{-1}(x_r), \alpha^{-1}(x_n)) \le 1/n$, so $(\alpha^{-1}(x_r))_{r \ge 1}$ is a Cauchy sequence in $(X, \hat{\rho})$. But this space is complete so there exists $y \in X$ such that $\alpha^{-1}(x_n) \to y$. As $\alpha$ is a homeomorphism this implies that $d(x_n, \alpha(y)) \to 0$. Hence by uniqueness of limits $x = \alpha(y)$ and thus it is immediate that $x \in \alpha(X)$. Therefore $\bigcap_n U_n \subseteq \alpha(X)$; since $\alpha(X) \subseteq U_n$ it follows immediately that
\[ \alpha(X) = \bigcap_n U_n. \tag{A.4} \]
It is now necessary to show that $U_n$ is relatively open in $\overline{\alpha(X)}$. From the definition of $U_n$, for any $x \in U_n$ we can find $N_{x,n}$ with diameter properties as above which is a subset of $J$ containing $x$. For any arbitrary $z \in \alpha(X)$, by (A.4) there exists $x \in U_n$ such that $z \in N_{x,n}$; then by choosing $N_{z,n} = N_{x,n}$ it is clear that $z \in U_n$. Therefore $N_{x,n} \cap \overline{\alpha(X)} \subseteq U_n$, from which we conclude that $U_n$ is relatively open in $\overline{\alpha(X)}$. Therefore we can write $U_n = \overline{\alpha(X)} \cap V_n$ where $V_n$ is open in $J$, so that
\[ \alpha(X) = \bigcap_n U_n = \overline{\alpha(X)} \cap \Big( \bigcap_n V_n \Big), \tag{A.5} \]
where $V_n$ are open subsets of $J$. It only remains to show that $\overline{\alpha(X)}$ can be expressed as a countable intersection of open sets; this is easily done since
\[ \overline{\alpha(X)} = \bigcap_n \{ x \in J : d(x, \alpha(X)) < 1/n \}, \]
therefore it follows that $\overline{\alpha(X)}$ is a countable intersection of open sets in $J$. Together with (A.5) it follows that $\alpha(X)$ is a countable intersection of open sets. □

Theorem A.8. Any compact metric space $X$ is separable.

Proof. For each $n \in \mathbb{N}$ consider the open cover of $X$ formed by all the balls of radius $1/n$ centred on the points of $X$. As $X$ is compact there exists a finite subcover. Let $x^n_1, \ldots, x^n_{N_n}$ be the centres of the balls in one such finite subcover. The union of all these centres over $n \in \mathbb{N}$ is a countable union of finite sets and is therefore countable. This set is clearly dense in $X$, so $X$ is separable. □

Theorem A.9. If $E$ is a compact metric space then the set of continuous real-valued functions defined on $E$ is separable.

Proof. By Theorem A.8, the space $E$ is separable. Let $x_1, x_2, \ldots$ be a countable dense subset of $E$. Define $h_0(x) = 1$, and $h_n(x) = d(x, x_n)$ for $n \ge 1$. Now define an algebra of polynomials in these $h_n$ with coefficients in the rationals,
\[ A = \Big\{ x \mapsto \sum q^{n_0, \ldots, n_r}_{k_0, \ldots, k_r} h_{k_0}^{n_0}(x) \cdots h_{k_r}^{n_r}(x) : q^{n_0, \ldots, n_r}_{k_0, \ldots, k_r} \in \mathbb{Q} \Big\}. \]
The closure of $A$ is an algebra containing constant functions and it is clear that it separates points in $E$; therefore by the Stone–Weierstrass theorem, it follows that $A$ is dense in $C(E)$. Since $A$ is countable, $C(E)$ is separable. □

Corollary A.10. If $E$ is a compact metric space then there exists a countable set $f_1, f_2, \ldots$ which is dense in $C(E)$.

Proof. By Theorem A.8 $E$ is separable, so by Theorem A.9 the space $C(E)$ is separable and hence has a dense countable subset. □
A.4 Tulcea’s Theorem

Tulcea’s theorem (see Tulcea [265]) is frequently stated in the form for product spaces and their σ-algebras (for a very elegant proof in this vein see Ethier and Kurtz [95, Appendix 9]) and this form is sufficient to establish the existence of stochastic processes. We give the theorem in a more general form where the measures are defined on the same space $X$, but defined on an increasing family of σ-algebras $\mathcal{B}_n$, as this makes the important condition on the atoms of the σ-algebras clear. The approach taken here is based on that in Stroock and Varadhan [261].

Define the atom $A(x)$ of the Borel σ-algebra $\mathcal{B}$ on the space $X$, for $x \in X$, by
\[ A(x) \triangleq \bigcap \{ B : B \in \mathcal{B},\ x \in B \}, \tag{A.6} \]
that is, $A(x)$ is the smallest element of $\mathcal{B}$ which contains $x$.
Theorem A.11. Let $(X, \mathcal{B})$ be a measurable space and let $\mathcal{B}_n$ be an increasing family of sub-σ-algebras of $\mathcal{B}$ such that $\mathcal{B} = \sigma(\bigcup_{n=1}^\infty \mathcal{B}_n)$. Suppose that these σ-algebras satisfy the following constraint. If $A_n$ is a sequence of atoms such that $A_n \in \mathcal{B}_n$ and $A_1 \supseteq A_2 \supseteq \cdots$ then $\bigcap_{n=0}^\infty A_n \neq \emptyset$.

Let $\mathbb{P}_0$ be a probability measure defined on $\mathcal{B}_0$ and let $\pi_n$ be a family of probability kernels, where $\pi_n(x, \cdot)$ is a measure on $(X, \mathcal{B}_n)$ and the mapping $x \mapsto \pi_n(x, \cdot)$ is $\mathcal{B}_{n-1}$-measurable. Such a probability kernel allows us to define inductively a family of probability measures on $(X, \mathcal{B}_n)$ via
\[ \mathbb{P}_n(A) \triangleq \int_X \pi_n(x, A) \, \mathbb{P}_{n-1}(dx), \tag{A.7} \]
with the starting point for the induction being given by the probability measure $\mathbb{P}_0$.

Suppose that the kernels $\pi_n(x, \cdot)$ satisfy the compatibility condition that for $x \notin N_n$, where $N_n$ is a $\mathbb{P}_n$-null set, the kernel $\pi_{n+1}(x, \cdot)$ is supported on $A_n(x)$ (i.e. if $B \in \mathcal{B}_{n+1}$ and $B \cap A_n(x) = \emptyset$ then $\pi_{n+1}(x, B) = 0$). That is, starting from a point $x$, the transition measure only contains with positive probability transitions to points $y$ such that $x$ and $y$ belong to the same atom of $\mathcal{B}_n$. Then there exists a unique probability measure $\mathbb{P}$ defined on $\mathcal{B}$ such that $\mathbb{P}|_{\mathcal{B}_n} = \mathbb{P}_n$ for all $n \in \mathbb{N}$.

Proof. It is elementary to see that $\mathbb{P}_n$ as defined in (A.7) is a probability measure on $\mathcal{B}_n$ and that $\mathbb{P}_{n+1}$ agrees with $\mathbb{P}_n$ on $\mathcal{B}_n$. We can then define a set function $\mathbb{P}$ on $\bigcup_n \mathcal{B}_n$ by setting $\mathbb{P}(B_n) = \mathbb{P}_n(B_n)$ for $B_n \in \mathcal{B}_n$. From the definition (A.7), for $B \in \mathcal{B}_n$ we have defined $\mathbb{P}_n$ inductively via the transition functions
\[ \mathbb{P}_n(B) = \int_X \cdots \int_X \pi_n(q_{n-1}, B) \, \pi_{n-1}(q_{n-2}, dq_{n-1}) \cdots \pi_1(q_0, dq_1) \, \mathbb{P}_0(dq_0). \tag{A.8} \]
To simplify the notation define $\pi^{m,n}$ such that $\pi^{m,n}(x, \cdot)$ is a measure on $(X, \mathcal{B}_n)$ as follows. If $m \ge n \ge 0$ and $B \in \mathcal{B}_n$, then define $\pi^{m,n}(x, B) = 1_B(x)$, which is clearly $\mathcal{B}_n$-measurable and hence, as $\mathcal{B}_m \supseteq \mathcal{B}_n$, $x \mapsto \pi^{m,n}(x, B)$ is also $\mathcal{B}_m$-measurable. If $m < n$ define $\pi^{m,n}$ inductively using the transition kernel $\pi_n$,
\[ \pi^{m,n}(x, B) \triangleq \int_X \pi_n(y_{n-1}, B) \, \pi^{m,n-1}(x, dy_{n-1}). \tag{A.9} \]
It is clear that in both cases $x \mapsto \pi^{m,n}(x, \cdot)$ is $\mathcal{B}_m$-measurable. Thus $\pi^{m,n}$ can be viewed as a transition kernel from $(X, \mathcal{B}_m)$ to $(X, \mathcal{B}_n)$. From these definitions, for $m < n$,
\begin{align*}
\pi^{m,n}(x, B) &= \int_X \cdots \int_X \pi_n(y_{n-1}, B) \cdots \pi_{m+1}(y_m, dy_{m+1}) \, \pi^{m,m}(x, dy_m) \\
&= \int_X \cdots \int_X \pi_n(y_{n-1}, B) \cdots \pi_{m+2}(y_{m+1}, dy_{m+2}) \, \pi_{m+1}(x, dy_{m+1}).
\end{align*}
It therefore follows from the above with $m = 0$ and (A.8) that for $B \in \mathcal{B}_n$,
\[ \mathbb{P}(B) = \mathbb{P}_n(B) = \int_X \pi^{0,n}(y_0, B) \, \mathbb{P}_0(dy_0). \tag{A.10} \]
We must show that $\mathbb{P}$ is a probability measure on $\bigcup_{n=0}^\infty \mathcal{B}_n$, as then the Carathéodory extension theorem† establishes the existence of an extension to a probability measure on $(X, \sigma(\bigcup_{n=0}^\infty \mathcal{B}_n))$. The only non-trivial condition which must be verified for $\mathbb{P}$ to be a measure is countable additivity. A necessary and sufficient condition for countable additivity of $\mathbb{P}$ is that if $B_n \in \bigcup_n \mathcal{B}_n$ are such that $B_1 \supseteq B_2 \supseteq \cdots$ and $\bigcap_n B_n = \emptyset$ then $\mathbb{P}(B_n) \to 0$ as $n \to \infty$ (the proof can be found in many books on measure theory, see for example page 200 of Williams [272]). It is clear that the non-trivial cases are covered by considering $B_n \in \mathcal{B}_n$ for each $n \in \mathbb{N}$.

† Carathéodory extension theorem: Let $S$ be a set, $\mathcal{S}_0$ be an algebra of subsets of $S$ and $\mathcal{S} = \sigma(\mathcal{S}_0)$. Let $\mu_0$ be a countably additive map $\mu_0 : \mathcal{S}_0 \to [0, \infty]$; then there exists a measure $\mu$ on $(S, \mathcal{S})$ such that $\mu = \mu_0$ on $\mathcal{S}_0$. Furthermore if $\mu_0(S) < \infty$, then this extension is unique. For a proof of the theorem see, for example, Williams [272] or Rogers and Williams [248].

We argue by contradiction; suppose that $\mathbb{P}(B_n) \ge \varepsilon > 0$ for all $n \in \mathbb{N}$. We must exhibit a point of $\bigcap_n B_n$; as we started with the assumption that this intersection was empty, this is the desired contradiction. Define
\[ F^0_n \triangleq \{ x \in X : \pi^{0,n}(x, B_n) \ge \varepsilon/2 \}. \tag{A.11} \]
Since $x \mapsto \pi^{0,n}(x, \cdot)$ is $\mathcal{B}_0$-measurable, it follows that $F^0_n \in \mathcal{B}_0$. Then from (A.10) it is clear that
\[ \mathbb{P}(B_n) \le \mathbb{P}_0(F^0_n) + \varepsilon/2. \]
As by assumption $\mathbb{P}(B_n) \ge \varepsilon$ for all $n \in \mathbb{N}$, we conclude that $\mathbb{P}_0(F^0_n) \ge \varepsilon/2$ for all $n \in \mathbb{N}$.

Suppose that $x \in F^0_{n+1}$; then $\pi^{0,n+1}(x, B_{n+1}) \ge \varepsilon/2$. But $B_{n+1} \subseteq B_n$, so $\pi^{0,n+1}(x, B_n) \ge \varepsilon/2$. From (A.9) it follows that
\[ \pi^{0,n+1}(x, B_n) = \int_X \pi_{n+1}(y_n, B_n) \, \pi^{0,n}(x, dy_n), \]
and for $y \notin N_n$, the probability measure $\pi_{n+1}(y, \cdot)$ is supported on $A_n(y)$. As $B_n \in \mathcal{B}_n$, from the definition of an atom, it follows that $y \in B_n$ if and only if $A_n(y) \subseteq B_n$, thus $\pi_{n+1}(y, B_n) = 1_{B_n}(y)$ for $y \notin N_n$. So on integration we obtain that $\pi^{0,n}(x, B_n) = \pi^{0,n+1}(x, B_n) \ge \varepsilon/2$. Thus $x \in F^0_n$. So we have shown that $F^0_{n+1} \subseteq F^0_n$.

Since $\mathbb{P}_0(F^0_n) \ge \varepsilon/2$ for all $n$ and the $F^0_n$ form a non-increasing sequence, it is then immediate that $\mathbb{P}_0(\bigcap_{n=0}^\infty F^0_n) \ge \varepsilon/2$, whence we can find $x_0 \notin N_0$ such that $\pi^{0,n}(x_0, B_n) \ge \varepsilon/2$ for all $n \in \mathbb{N}$.

Now we proceed inductively; suppose that we have found $x_0, x_1, \ldots, x_{m-1}$ such that $x_0 \notin N_0$ and $x_i \in A_{i-1}(x_{i-1}) \setminus N_i$ for $i = 1, \ldots, m - 1$, and
$\pi^{i,n}(x_i, B_n) \ge \varepsilon/2^{i+1}$ for all $n \in \mathbb{N}$ for each $i = 0, \ldots, m - 1$. We have already established the result for the case $m = 0$. Now define
\[ F^m_n \triangleq \{ x \in X : \pi^{m,n}(x, B_n) \ge \varepsilon/2^{m+1} \}; \]
from the integral representation for $\pi^{m,n}$,
\[ \pi^{m-1,n}(x, B_n) = \int_X \pi^{m,n}(y_m, B_n) \, \pi_m(x, dy_m), \]
it follows by an argument analogous to that for $F^0_n$ that
\[ \varepsilon/2^m \le \pi^{m-1,n}(x_{m-1}, B_n) \le \varepsilon/2^{m+1} + \pi_m(x_{m-1}, F^m_n), \]
where the inequality on the left hand side follows from the inductive hypothesis. As in the case for $m = 0$, we can deduce that $F^m_{n+1} \subseteq F^m_n$. Thus
\[ \pi_m\Big( x_{m-1}, \bigcap_{n=0}^\infty F^m_n \Big) \ge \varepsilon/2^{m+1}, \tag{A.12} \]
which implies that we can choose $x_m \in \bigcap_{n=0}^\infty F^m_n$, such that $\pi^{m,n}(x_m, B_n) \ge \varepsilon/2^{m+1}$ for all $n \in \mathbb{N}$, and from (A.12) as the set of suitable $x_m$ has strictly positive probability, it cannot be empty, and we can choose an $x_m$ not in the $\mathbb{P}_m$-null set $N_m$. Therefore, this choice can be made such that $x_m \in A_{m-1}(x_{m-1}) \setminus N_m$. This establishes the required inductive step.

Now consider the case of $\pi^{n,n}(x_n, B_n)$; we see from the definition that this is just $1_{B_n}(x_n)$, but by choice of the $x_n$, $\pi^{n,n}(x_n, B_n) > 0$. Consequently as $x_n \notin N_n$, by the support property of the transition kernels, it follows that $A_n(x_n) \subseteq B_n$ for each $n$. Thus $\bigcap_n A_n(x_n) \subset \bigcap_n B_n$ and if we define $K_n \triangleq \bigcap_{i=0}^n A_i(x_i)$ it follows that $x_n \in K_n$ and $K_n$ is a descending sequence; by the σ-algebra property it is clear that $K_n \in \mathcal{B}_n$, and since $A_n(x_n)$ is an atom in $\mathcal{B}_n$ it follows that $K_n = A_n(x_n)$. We thus have a decreasing sequence of atoms; by the initial assumption, such an intersection is non-empty, that is, $\bigcap_n A_n(x_n) \neq \emptyset$, which implies that $\bigcap_n B_n \neq \emptyset$, but this is a contradiction, since we assumed that this intersection was empty. Therefore $\mathbb{P}$ is countably additive and the existence of an extension follows from the theorem of Carathéodory. □

A.4.1 The Daniell–Kolmogorov–Tulcea Theorem

The Daniell–Kolmogorov–Tulcea theorem gives conditions under which the law of a stochastic process can be extended from its finite-dimensional distributions to its full (infinite-dimensional) law. The original form of this result due to Daniell and Kolmogorov (see Doob [81] or Rogers and Williams [248, section II.30]) requires topological conditions on the space $X$; the space $X$ needs to be Borel, that is, homeomorphic to a
Borel set in some space, which is the case if $X$ is a complete separable metric space as a consequence of Theorem A.7. It is possible to take an alternative probabilistic approach using Tulcea’s theorem. In this approach the finite-dimensional distributions are related to each other through the use of regular conditional probabilities as transition kernels; while this does not explicitly use topological conditions, such conditions may be required to establish the existence of these regular conditional probabilities (as was seen in Exercise 2.29 regular conditional probabilities are guaranteed to exist if $X$ is a complete separable metric space).

We use the notation $X^I$ for the $I$-fold product space generated by $X$, that is, $X^I = \prod_{i \in I} X_i$ where the $X_i$ are identical copies of $X$, and let $\mathcal{B}^I$ denote the product σ-algebra on $X^I$; that is, $\mathcal{B}^I = \prod_{i \in I} \mathcal{B}_i$ where the $\mathcal{B}_i$ are copies of $\mathcal{B}$. If $U$ and $V$ are finite subsets of the index set $I$, let $\pi^V_U$ denote the restriction map from $X^V$ to $X^U$.

Theorem A.12. Let $X$ be a complete separable metric space. Let $\mu_U$ be a family of probability measures on $(X^U, \mathcal{B}^U)$, for $U$ any finite subset of $I$. Suppose that these measures satisfy the compatibility condition for $U \subset V$,
\[ \mu_U = \mu_V \circ \pi^V_U. \]
Then there exists a unique probability measure $\mu$ on $(X^I, \mathcal{B}^I)$ such that $\mu_U = \mu \circ \pi^I_U$ for any $U$ a finite subset of $I$.

Proof. Let $\mathrm{Fin}(I)$ denote the set of all finite subsets of $I$. It is immediate from the compatibility condition that we can find a finitely additive $\mu_0$ which is a probability measure on $(X^I, \bigcup_{F \in \mathrm{Fin}(I)} (\pi^I_F)^{-1}(\mathcal{B}^F))$, such that for $U \in \mathrm{Fin}(I)$,
\[ \mu_U = (\pi^I_U)^{-1} \circ \mu_0. \]
If we can show that $\mu_0$ is countably additive, then the Carathéodory extension theorem implies that $\mu_0$ can be extended to a measure $\mu$ on $(X^I, \mathcal{B}^I)$.

We cannot directly use Tulcea’s theorem to construct the extension measure; however we can use it to show that $\mu_0$ is countably additive. Suppose $A_n$ is a non-increasing family of sets $A_n \in \bigcup_{F \in \mathrm{Fin}(I)} (\pi^I_F)^{-1}(\mathcal{B}^F)$ such that $A_n \downarrow \emptyset$; we must show that $\mu_0(A_n) \to 0$.

Given the $A_i$, we can find finite subsets $F_i$ of $I$ such that $A_i \in (\pi^I_{F_i})^{-1}(\mathcal{B}^{F_i})$ for each $i$. Without loss of generality we can choose this sequence so that $F_0 \subset F_1 \subset F_2 \subset \cdots$. Define $\mathcal{F}_n \triangleq (\pi^I_{F_n})^{-1}(\mathcal{B}^{F_n}) \subset \mathcal{B}^I$. As a consequence of the product space structure, these σ-algebras satisfy the condition that the intersection of a decreasing family of atoms $Z_n \in \mathcal{F}_n$ is non-empty. For $q \in X^I$ and $B \in \mathcal{F}_n$, let
\[ \pi_n(q, B) \triangleq \Big( \mu_{F_n} \,\Big|\, \big(\pi^{F_n}_{F_{n-1}}\big)^{-1}\big(\mathcal{B}^{F_{n-1}}\big) \Big) \big( \pi^I_{F_n}(q), \pi^I_{F_n}(B) \big), \]
where $(\mu_{F_n} \mid \mathcal{G})(\omega, \cdot)$ for $\mathcal{G} \subset \mathcal{B}^{F_n}$ is the regular conditional probability distribution of $\mu_{F_n}$ given $\mathcal{G}$. We refer to the properties of regular conditional
probability distribution using the nomenclature of Definition 2.28. This $\pi_n$ is a probability kernel from $(X^I, \mathcal{F}_{n-1})$ to $(X^I, \mathcal{F}_n)$, i.e. $\pi_n(q, \cdot)$ is a measure on $(X^I, \mathcal{F}_n)$ and the map $q \mapsto \pi_n(q, \cdot)$ is $\mathcal{F}_{n-1}$-measurable (which follows from property (b) of regular conditional distribution). In order to apply Tulcea’s theorem we must verify that the compatibility condition is satisfied, i.e. $\pi_n(q, \cdot)$ is supported for a.e. $q$ on the atom in $\mathcal{F}_{n-1}$ containing $q$, which is denoted $A_{n-1}(q)$. This is readily established by computing $\pi_n(q, (A_{n-1}(q))^c)$ and using property (c) of regular conditional distribution and the fact that $q \notin (A_{n-1}(q))^c$. Thus we can apply Tulcea’s theorem to find a unique probability measure $\mu$ on $(X^I, \sigma(\bigcup_{n=0}^\infty \mathcal{F}_n))$ such that $\mu$ is equal to $\mu_0$ on $\bigcup_{n=0}^\infty \mathcal{F}_n$. Hence as $A_n \in \mathcal{F}_n$, it follows that $\mu(A_n) = \mu_0(A_n)$ for each $n$ and therefore, since $\mu$ is countably additive, $\mu_0(A_n) \downarrow 0$, which establishes the required countable additivity of $\mu_0$. □
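The compatibility condition of Theorem A.12 says that restricting a finite-dimensional law to fewer coordinates must reproduce the law attached to the smaller index set. As an illustrative sketch only, the code below takes a hypothetical two-state Markov chain (the initial law `P0` and transition matrix `T` are assumptions made for the example), builds its finite-dimensional laws at a set of time points, and checks the condition by summing out the unwanted coordinates.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)
P0 = [Fraction(1, 3), Fraction(2, 3)]          # assumed initial law
T = [[Fraction(1, 2), Fraction(1, 2)],         # assumed one-step transitions
     [Fraction(1, 4), Fraction(3, 4)]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in STATES) for j in STATES] for i in STATES]


def mat_pow(m, n):
    result = [[Fraction(int(i == j)) for j in STATES] for i in STATES]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def mu(times):
    """Law of the chain observed at the increasing time points `times`."""
    law = {}
    first = mat_pow(T, times[0])
    for xs in product(STATES, repeat=len(times)):
        p = sum(P0[i] * first[i][xs[0]] for i in STATES)
        for j in range(1, len(times)):
            step = mat_pow(T, times[j] - times[j - 1])
            p *= step[xs[j - 1]][xs[j]]
        law[xs] = p
    return law


def push_forward(law_v, times_v, times_u):
    """Image of law_v under the restriction map from X^V to X^U."""
    keep = [times_v.index(t) for t in times_u]
    out = {}
    for xs, p in law_v.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out


V = (0, 1, 3, 4)
U = (1, 4)
compatible = push_forward(mu(V), V, U) == mu(U)
```

In this example the condition holds because summing over the dropped coordinates is exactly the Chapman–Kolmogorov relation for the powers of `T`.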
A.5 Càdlàg Paths

A càdlàg (continue à droite, limite à gauche) path is one which is right continuous with left limits; that is, $x_t$ has càdlàg paths if for all $t \in [0, \infty)$, the limit $x_{t-}$ exists and $x_t = x_{t+}$. Such paths are sometimes described as RCLL (right continuous with left limits). The space of càdlàg functions from $[0, \infty)$ to $E$ is conventionally denoted $D_E[0, \infty)$. Useful references for this material are Billingsley [19, Chapter 3], Ethier and Kurtz [95, Sections 3.5–3.9], and Whitt [269, Chapter 12].

A.5.1 Discontinuities of Càdlàg Paths

Clearly càdlàg paths can only have left discontinuities, i.e. points $t$ where $x_t \neq x_{t-}$.

Lemma A.13. For any $\varepsilon > 0$, a càdlàg path taking values in a metric space $(E, d)$ has at most a finite number of discontinuities of size in the metric $d$ greater than $\varepsilon$; that is, the set
\[ D = \{ t \in [0, T] : d(x_t, x_{t-}) > \varepsilon \} \]
contains at most a finite number of points.

Proof. Let $\tau$ be the supremum of $t \in [0, T]$ such that $[0, t)$ can be finitely subdivided $0 = t_0 < t_1 < \cdots < t_k = t$ with the subdivision having the property that for $i = 0, \ldots, k - 1$, $\sup_{s, r \in [t_i, t_{i+1})} d(x_s, x_r) < \varepsilon$. As right limits exist at $0$ it is clear that $\tau > 0$, and since a left limit exists at $\tau-$ it is clear that the interval $[0, \tau)$ can be thus subdivided. Right continuity implies that there exists $\delta > 0$ such that for $0 \le t' - \tau < \delta$, $d(x_{t'}, x_\tau) < \varepsilon$; consequently the result holds for $[0, t')$, which contradicts the fact that $\tau$ is the supremum unless $\tau = T$; consequently $\tau = T$. Therefore $[0, T)$ can be so subdivided:
jumps of size greater than $\varepsilon$ can only occur at the $t_i$, of which there are a finite number, and thus there must be at most a finite number of such jumps. □

Lemma A.14. Let $X$ be a càdlàg stochastic process taking values in a metric space $(E, d)$; then
\[ \{ t \in [0, \infty) : \mathbb{P}(X_{t-} \neq X_t) > 0 \} \]
contains at most countably many points.

Proof. For $\varepsilon > 0$, define
\[ J_t(\varepsilon) \triangleq \{ \omega : d(X_t(\omega), X_{t-}(\omega)) > \varepsilon \}. \]
Fix $\varepsilon$; then for any $T > 0$, $\delta > 0$ we show that there are at most a finite number of points $t \in [0, T]$ such that $\mathbb{P}(J_t(\varepsilon)) > \delta$. Suppose this is false, and an infinite sequence $t_i$ of distinct times $t_i \in [0, T]$ with this property exists. Then by Fatou’s lemma
\[ \mathbb{P}\Big( \liminf_{i \to \infty} (J_{t_i}(\varepsilon))^c \Big) \le \liminf_{i \to \infty} \mathbb{P}\big( (J_{t_i}(\varepsilon))^c \big), \]
thus
\[ \mathbb{P}\Big( \limsup_{i \to \infty} J_{t_i}(\varepsilon) \Big) \ge \limsup_{i \to \infty} \mathbb{P}(J_{t_i}(\varepsilon)) \ge \delta > 0, \]
so the event that $J_t(\varepsilon)$ occurs for an infinite number of the $t_i$ has strictly positive probability and is hence non-empty. This implies that there is a càdlàg path with an infinite number of jumps in $[0, T]$ of size greater than $\varepsilon$, which contradicts the conclusion of Lemma A.13. Taking the union over a countable sequence $\delta_n \downarrow 0$, it then follows that $\mathbb{P}(J_t(\varepsilon)) > 0$ for at most a countable set of $t \in [0, T]$.

Clearly $\mathbb{P}(J_t(\varepsilon)) \to \mathbb{P}(X_t \neq X_{t-})$ as $\varepsilon \to 0$, thus the set $\{ t \in [0, T] : \mathbb{P}(X_t \neq X_{t-}) > 0 \}$ contains at most a countable number of points. By taking the countable union over $T \in \mathbb{N}$, it follows that $\{ t \in [0, \infty) : \mathbb{P}(X_t \neq X_{t-}) > 0 \}$ is at most countable. □

A.5.2 Skorohod Topology

Consider the sequence of functions $x_n(t) = 1_{\{t \ge 1/n\}}$, and the function $x(t) = 1_{\{t > 0\}}$ which are all elements of $D_E[0, \infty)$. In the uniform topology which we used on $C_E[0, \infty)$, as $n \to \infty$ the sequence $x_n$ does not converge to $x$; yet considered as càdlàg paths it appears natural that $x_n$ should converge to $x$ since the location of the unit jump of $x_n$ converges to the location of the unit jump of $x$. A different topology is required. The Skorohod topology is the most frequently used topology on the space $D_E[0, \infty)$ which resolves this problem. Let $\lambda : [0, \infty) \to [0, \infty)$, and define
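\[ \gamma(\lambda) \triangleq \operatorname*{ess\,sup}_{t \ge 0} |\log \lambda'(t)| = \sup_{s > t \ge 0} \left| \log \frac{\lambda(s) - \lambda(t)}{s - t} \right|. \]
Let $\Lambda$ be the subspace of Lipschitz continuous increasing functions from $[0, \infty) \to [0, \infty)$ such that $\lambda(0) = 0$, $\lim_{t \to \infty} \lambda(t) = \infty$ and $\gamma(\lambda) < \infty$. The Skorohod topology is most readily defined in terms of a metric which induces the topology. For $x, y \in D_E[0, \infty)$ define a metric $d_{D_E}(x, y)$ by
\[ d_{D_E}(x, y) = \inf_{\lambda \in \Lambda} \left[ \gamma(\lambda) \vee \int_0^\infty e^{-u} d(x, y, \lambda, u) \, du \right], \]
where
\[ d(x, y, \lambda, u) = \sup_{t \ge 0} d\big( x(t \wedge u), y(\lambda(t) \wedge u) \big). \]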
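To see how a time change $\lambda$ absorbs a small displacement of a jump, the sketch below works with an assumed pair of step functions, $1_{\{t \ge 1 + 1/n\}}$ and $1_{\{t \ge 1\}}$ (a shifted variant chosen for illustration, not the example in the text), and a hypothetical piecewise-linear time change `time_change` with slopes $n/(n+1)$ and $1$. For this pair one can check by hand that $d(x_n, x, \lambda_n, u)$ equals 1 for $u \in [1, 1 + 1/n)$ and 0 otherwise, which gives the closed-form integral used in `skorohod_upper_bound`.

```python
from fractions import Fraction
import math


def x_limit(t):
    return 1 if t >= 1 else 0


def x_n(t, n):
    return 1 if t >= 1 + Fraction(1, n) else 0


def time_change(t, n):
    """Piecewise-linear lambda_n with lambda_n(0) = 0 and lambda_n(1 + 1/n) = 1."""
    knee = 1 + Fraction(1, n)
    return t * Fraction(n, n + 1) if t <= knee else t - Fraction(1, n)


def gamma(n):
    # The slopes of lambda_n are n/(n+1) and 1, so gamma = |log(n/(n+1))|.
    return abs(math.log(n / (n + 1)))


# Exact rational grid on [0, 4]; exactness avoids rounding at the jump times.
grid = [Fraction(k, 200) for k in range(0, 801)]


def uniform_gap(n):
    """Grid approximation of sup_t |x_n(t) - x(t)| (no time change)."""
    return max(abs(x_n(t, n) - x_limit(t)) for t in grid)


def aligned_gap(n):
    """Grid approximation of sup_t |x_n(t) - x(lambda_n(t))|."""
    return max(abs(x_n(t, n) - x_limit(time_change(t, n))) for t in grid)


def skorohod_upper_bound(n):
    # Integral of e^{-u} over [1, 1 + 1/n), derived by hand for this example.
    integral = math.exp(-1) * (1 - math.exp(-1 / n))
    return max(gamma(n), integral)
```

Since the infimum in the definition of $d_{D_E}$ is at most the value attained at any particular $\lambda$, `skorohod_upper_bound(n)` bounds the Skorohod distance of this pair from above and tends to zero, while `uniform_gap(n)` stays equal to one.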
It is of course necessary to verify that $d_{D_E}$ satisfies the definition of a metric. This is straightforward, but details may be found in Ethier and Kurtz [95, Chapter 3, pages 117-118]. For the functions $x_n$ and $x$ in the example, it is clear that $d_{D_{\mathbb{R}}}(x_n, x) \to 0$ as $n \to \infty$. While there are other simpler topologies which have this property, the following proposition is the main reason why the Skorohod topology is the preferred choice of topology on $D_E$.

Proposition A.15. If the metric space $(E, d)$ is complete and separable, then $(D_E[0, \infty), d_{D_E})$ is also complete and separable.

Proof. The following proof follows Ethier and Kurtz [95]. As $E$ is separable, it has a countable dense set. Let $\{x_n\}_{n \ge 1}$ be such a set. Given $n$, $0 = t_0 < t_1 < \cdots < t_n$ where $t_j \in \mathbb{Q}^+$ and $i_j \in \mathbb{N}$ for $j = 0, \ldots, n$, define the piecewise constant function
\[ x(t) = \begin{cases} x_{i_k} & t_k \le t < t_{k+1}, \\ x_{i_n} & t \ge t_n. \end{cases} \]
The set of all such functions forms a dense subset of $D_E[0, \infty)$, therefore the space is separable.

To show that the space is complete, suppose that $\{y_n\}_{n \ge 1}$ is a Cauchy sequence in $(D_E[0, \infty), d_{D_E})$, which implies that there exists an increasing sequence of numbers $N_k$ such that for $n, m \ge N_k$,
\[ d_{D_E}(y_n, y_m) \le 2^{-k-1} e^{-k}. \]
Set $v_k = y_{N_k}$; then $d_{D_E}(v_k, v_{k+1}) \le 2^{-k-1} e^{-k}$. Thus there exists $\lambda_k$ such that
\[ \int_0^\infty e^{-u} d(v_k, v_{k+1}, \lambda_k, u) \, du < 2^{-k} e^{-k}. \]
As $d(x, y, \lambda, u)$ is monotonic increasing in $u$, it follows that for any $v \ge 0$,
\[ \int_0^\infty e^{-u} d(x, y, \lambda, u) \, du \ge d(x, y, \lambda, v) \int_v^\infty e^{-u} \, du = e^{-v} d(x, y, \lambda, v). \]
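Therefore it is possible to find $\lambda_k \in \Lambda$ and $u_k > k$ such that
\[ \max\big( \gamma(\lambda_k), d(v_k, v_{k+1}, \lambda_k, u_k) \big) \le 2^{-k}. \tag{A.13} \]
Then form the limit of the composition of the functions $\lambda_i$,
\[ \mu_k \triangleq \lim_{n \to \infty} \lambda_{k+n} \circ \cdots \circ \lambda_{k+1} \circ \lambda_k. \]
It then follows that
\[ \gamma(\mu_k) \le \sum_{i=k}^\infty \gamma(\lambda_i) \le \sum_{i=k}^\infty 2^{-i} = 2^{-k+1} < \infty; \]
thus $\mu_k \in \Lambda$. Using the bound (A.13) it follows that for $k \in \mathbb{N}$,
\begin{align*}
\sup_{t \ge 0} d\big( v_k(\mu_k^{-1}(t) \wedge u_k), v_{k+1}(\mu_{k+1}^{-1}(t) \wedge u_k) \big)
&= \sup_{t \ge 0} d\big( v_k(\mu_k^{-1}(t) \wedge u_k), v_{k+1}(\lambda_k \circ \mu_k^{-1}(t) \wedge u_k) \big) \\
&= \sup_{t \ge 0} d\big( v_k(t \wedge u_k), v_{k+1}(\lambda_k(t) \wedge u_k) \big) \\
&= d(v_k, v_{k+1}, \lambda_k, u_k) \le 2^{-k}.
\end{align*}
Since $(E, d)$ is complete, it now follows that $z_k = v_k \circ \mu_k^{-1}$ converges uniformly on compact sets of $t$ to some limit, which we denote $z$. As each $z_k$ has càdlàg paths, it follows that the limit also has càdlàg paths and thus belongs to $D_E[0, \infty)$. It only remains to show that $v_k$ converges to $z$ in the Skorohod topology. This follows since $\gamma(\mu_k^{-1}) \to 0$ as $k \to \infty$ and for fixed $T > 0$,
\[ \lim_{k \to \infty} \sup_{0 \le t \le T} d\big( v_k \circ \mu_k^{-1}(t), z(t) \big) = 0. \]
□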
A.6 Stopping Times

In this section, the notation $\mathcal{F}^o_t$ is used to emphasise that this filtration has not been augmented.

Definition A.16. A random variable $T$ taking values in $[0, \infty)$ is said to be an $\mathcal{F}^o_t$-stopping time, if for all $t \ge 0$, the event $\{T \le t\} \in \mathcal{F}^o_t$.
The subject of stopping times is too large to cover in any detail here. For more details see Rogers and Williams [248], or Dellacherie and Meyer [77, Section IV.3].

Lemma A.17. A random variable $T$ taking values in $[0, \infty)$ is an $\mathcal{F}^o_{t+}$-stopping time if and only if $\{T < t\} \in \mathcal{F}^o_t$ for all $t \ge 0$.

Proof. If $\{T < t\} \in \mathcal{F}^o_t$ for all $t \ge 0$ then since
\[ \{T \le t\} = \bigcap_{\varepsilon > 0} \{T < t + \varepsilon\}, \]
it follows that $\{T \le t\} \in \mathcal{F}^o_{t+\varepsilon}$ for any $t \ge 0$ and $\varepsilon > 0$, thus $\{T \le t\} \in \mathcal{F}^o_{t+}$. Thus $T$ is an $\mathcal{F}^o_{t+}$-stopping time.

Conversely if $T$ is an $\mathcal{F}^o_{t+}$-stopping time then since
\[ \{T < t\} = \bigcup_{n=1}^\infty \{T \le t - 1/n\} \]
and each $\{T \le t - 1/n\} \in \mathcal{F}^o_{(t-1/n)+} \subseteq \mathcal{F}^o_t$, therefore $\{T < t\} \in \mathcal{F}^o_t$. □
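Lemma A.18. Let $T^n$ be a sequence of $\mathcal{F}^o_t$-stopping times. Then $T = \inf_n T^n$ is an $\mathcal{F}^o_{t+}$-stopping time.

Proof. Write the event $\{\inf_n T^n < t\}$ as
\[ \Big\{ \inf_n T^n < t \Big\} = \bigcup_n \{T^n < t\}. \]
Each $T^n$ is an $\mathcal{F}^o_t$-stopping time and hence also an $\mathcal{F}^o_{t+}$-stopping time, so by Lemma A.17 each term in this union belongs to $\mathcal{F}^o_t$; therefore so does the union, which again by Lemma A.17 implies that $\inf_n T^n$ is an $\mathcal{F}^o_{t+}$-stopping time. □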
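The strict inequality in the proof of Lemma A.18 matters: the event $\{\inf_n T^n < t\}$ is the union of the events $\{T^n < t\}$, but the analogous statement with $\le$ can fail. A minimal sketch, assuming the deterministic family $T^n(\omega) = \omega + 1/n$ (a hypothetical example whose infimum is $\omega$), makes both points concrete.

```python
from fractions import Fraction
import math


def T_n(omega, n):
    return omega + Fraction(1, n)


def T_inf(omega):
    # Infimum over n >= 1 of omega + 1/n; it is not attained by any n.
    return omega


def witness_strict(omega, t):
    """Return some n with T_n(omega) < t, or None if no such n exists."""
    if t <= omega:
        return None  # every T_n(omega) exceeds omega >= t
    return math.floor(1 / (t - omega)) + 1  # then 1/n < t - omega


omegas = [Fraction(k, 4) for k in range(0, 9)]
times = [Fraction(k, 8) for k in range(0, 25)]

# Strict version: {inf_n T_n < t} coincides with the union of {T_n < t}.
strict_ok = all(
    (T_inf(w) < t) == (witness_strict(w, t) is not None)
    for w in omegas for t in times
)
witnesses_ok = all(
    T_n(w, witness_strict(w, t)) < t
    for w in omegas for t in times if witness_strict(w, t) is not None
)

# Non-strict version fails at t = omega: inf_n T_n <= t, yet no T_n <= t.
w0 = Fraction(1, 2)
nonstrict_fails = (T_inf(w0) <= w0) and not any(T_n(w0, n) <= w0 for n in range(1, 1001))
```

This is why the infimum is in general only a stopping time for the right-continuous filtration $\mathcal{F}^o_{t+}$.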
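Lemma A.19. Let $X$ be a real-valued, continuous, adapted process and $a \in \mathbb{R}$. Define $T_a \triangleq \inf\{t \ge 0 : X_t \ge a\}$. Then $T_a$ is an $\mathcal{F}_t$-stopping time.

Proof. The set $\{\omega : X_q(\omega) > a - 1/n\}$ is $\mathcal{F}_q$-measurable for any $q \in \mathbb{Q}^+$ and $n \in \mathbb{N}$, as $X$ is $\mathcal{F}_t$-adapted. Then using the path continuity of $X$,
\[ \{T_a \le t\} = \Big\{ \omega : \sup_{0 \le s \le t} X_s(\omega) \ge a \Big\} = \bigcap_{n \ge 1} \ \bigcup_{q \in \mathbb{Q}^+ : 0 \le q \le t} \{ \omega : X_q(\omega) > a - 1/n \}. \]
Thus $\{T_a \le t\}$ may be written as a countable intersection of countable unions of $\mathcal{F}_t$-measurable sets and so is itself $\mathcal{F}_t$-measurable. Hence $T_a$ is an $\mathcal{F}_t$-stopping time. □

Theorem A.20 (Début Theorem). Let $X$ be a process defined in some topological space $S$ (with its associated Borel σ-algebra $\mathcal{B}(S)$). Assume that $X$ is progressively measurable relative to a filtration $\mathcal{F}^o_t$. Then for $A \in \mathcal{B}(S)$, the mapping $D_A = \inf\{t \ge 0 : X_t \in A\}$ defines an $\mathcal{F}_t$-stopping time, where $\mathcal{F}_t$ is the augmentation of $\mathcal{F}^o_t$.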
For a proof see Theorem IV.50 of Dellacherie and Meyer [77]. See also Rogers and Williams [248, 249] for related results. We require a technical result regarding the augmentation aspect of the usual conditions which is used during the innovations derivation of the filtering equations.

Lemma A.21. Let $\mathcal{G}_t$ be the filtration $\mathcal{F}^o_t \vee \mathcal{N}$ where $\mathcal{N}$ contains all the $\mathbb{P}$-null sets. If $T$ is a $\mathcal{G}_t$-stopping time, then there exists an $\mathcal{F}^o_{t+}$-stopping time $T'$ such that $T = T'$ $\mathbb{P}$-a.s. In addition if $L \in \mathcal{G}_T$ then there exists $M \in \mathcal{F}^o_{T+}$ such that $L = M$ $\mathbb{P}$-a.s.

Proof. Consider a stopping time of the form $T = a 1_A + \infty 1_{A^c}$ where $a \in \mathbb{R}^+$ and $A \in \mathcal{G}_a$; in this case let $B$ be an element of $\mathcal{F}^o_a$ such that the symmetric difference $A \triangle B$ is a $\mathbb{P}$-null set and define $T' = a 1_B + \infty 1_{B^c}$. For a general $\mathcal{G}_t$-stopping time $T$ use a dyadic approximation. Let
\[ S^{(n)} \triangleq \sum_{k=0}^\infty k 2^{-n} 1_{\{(k-1) 2^{-n} \le T < k 2^{-n}\}}. \]
Clearly $S^{(n)}$ is $\mathcal{G}_T$-measurable and by construction $S^{(n)} \ge T$. Thus $S^{(n)}$ is a $\mathcal{G}_t$-stopping time. But the stopping time $S^{(n)}$ takes values in a countable set, so
\[ S^{(n)} = \inf_k \big( k 2^{-n} 1_{A_k} + \infty 1_{A_k^c} \big), \]
where $A_k \triangleq \{S^{(n)} = k 2^{-n}\}$. The result has already been proved for stopping times of the form of those inside the infimum. As $T = \lim_n S^{(n)} = \inf_n S^{(n)}$, consequently the result holds for all $\mathcal{G}_t$-stopping times. As a consequence of this limiting operation $\mathcal{F}^o_{t+}$ appears instead of $\mathcal{F}^o_t$.

To prove the second assertion, let $L \in \mathcal{G}_T$. By the first part since $L \in \mathcal{G}_\infty$ there exists $L' \in \mathcal{F}^o_\infty$ such that $L = L'$ $\mathbb{P}$-a.s. Let $V = T 1_L + \infty 1_{L^c}$ a.s. Using the first part again, $\mathcal{F}^o_{t+}$-stopping times $V'$ and $T'$ can be constructed such that $V = V'$ a.s. and $T = T'$ a.s. Define $M \triangleq \{L' \cap \{T' = \infty\}\} \cup \{V' = T' < \infty\}$. Clearly $M$ is $\mathcal{F}^o_{T+}$-measurable and it follows that $L = M$ $\mathbb{P}$-a.s. □

The following lemma is trivial, but worth stating to avoid confusion in the more complex proof which follows.

Lemma A.22. Let $\mathcal{X}^o_t$ be the unaugmented σ-algebra generated by a process $X$. Then for $T$ an $\mathcal{X}^o_t$-stopping time, if $T(\omega) \le t$ and $X_s(\omega) = X_s(\omega')$ for $s \le t$ then $T(\omega') \le t$.

Proof. As $T$ is a stopping time, $\{T \le t\} \in \mathcal{X}^o_t = \sigma(X_s : 0 \le s \le t)$ from which the result follows. □

Corollary A.23. Let $\mathcal{X}^o_t$ be the unaugmented σ-algebra generated by a process $X$. Then for $T$ an $\mathcal{X}^o_t$-stopping time, if $T(\omega) \le t$ and $X_s(\omega) = X_s(\omega')$ for $s \le t$ then $T(\omega') = T(\omega)$.
Proof. Apply Lemma A.22 with $t = T(\omega)$ to conclude $T(\omega') \le T(\omega)$. By symmetry, $T(\omega) \le T(\omega')$ from which the result follows. □

Lemma A.24. Let $\mathcal{X}^o_t$ be the unaugmented σ-algebra generated by a process $X$. Then for $T$ an $\mathcal{X}^o_t$-stopping time, for all $t \ge 0$,
\[ \mathcal{X}^o_{t \wedge T} = \sigma\{ X_{s \wedge T} : 0 \le s \le t \}. \]

Proof. Since $T \wedge t$ is also an $\mathcal{X}^o_t$-stopping time, it suffices to show
\[ \mathcal{X}^o_T = \sigma\{ X_{s \wedge T} : s \ge 0 \}. \]
The definition of the σ-algebra associated with a stopping time is that
\[ \mathcal{X}^o_T \triangleq \{ B \in \mathcal{X}^o_\infty : B \cap \{T \le s\} \in \mathcal{X}^o_s \text{ for all } s \ge 0 \}. \]
If $A \in \mathcal{X}^o_T$ then it follows from this definition that
\[ T_A = \begin{cases} T & \text{if } \omega \in A, \\ +\infty & \text{otherwise,} \end{cases} \]
defines an $\mathcal{X}^o_t$-stopping time. Conversely if for some set $A$, the time $T_A$ defined as above is a stopping time it follows that $A \in \mathcal{X}^o_T$. Therefore we will have established the result if we can show that $A \in \sigma\{X_{s \wedge T} : s \ge 0\}$ is a necessary and sufficient condition for $T_A$ to be a stopping time.

For the first implication, assume that $T_A$ is an $\mathcal{X}^o_t$-stopping time. It is necessary to show that $A \in \sigma\{X_{s \wedge T} : s \ge 0\}$. Suppose that $\omega, \omega' \in \Omega$ are such that $X_s(\omega) = X_s(\omega')$ for $s \le T(\omega)$. We will establish that $A \in \sigma\{X_{s \wedge T} : s \ge 0\}$ if we show $\omega \in A$ implies that $\omega' \in A$. If $T(\omega) = \infty$ then it is immediate that the trajectories $X_s(\omega)$ and $X_s(\omega')$ are identical and hence $\omega' \in A$. Therefore consider $T(\omega) < \infty$; if $\omega \in A$ then $T_A(\omega) = T(\omega)$ and since it was assumed that $T_A$ is an $\mathcal{X}^o_t$-stopping time the fact that $X_s(\omega)$ and $X_s(\omega')$ agree for $s \le T_A(\omega)$ implies by Corollary A.23 that $T_A(\omega') = T_A(\omega) = T(\omega) < \infty$ and from $T_A(\omega') < \infty$ it follows that $\omega' \in A$.

We must now prove the opposite implication; that is, given that $T$ is a stopping time and $A \in \sigma\{X_{s \wedge T} : s \ge 0\}$, we must show that $T_A$ is a stopping time. Given arbitrary $t \ge 0$, if $T_A(\omega) \le t$ and $X_s(\omega) = X_s(\omega')$ for $s \le t$ it follows that $\omega \in A$ (since $T_A(\omega) < \infty$). If $T(\omega) \le t$ and $X_s(\omega) = X_s(\omega')$ for $s \le t$, since $T$ is a stopping time it follows from Corollary A.23 that $T(\omega) = T(\omega')$. Since we assumed $A \in \sigma\{X_{s \wedge T} : s \ge 0\}$ it follows that $\omega' \in A$ from which we deduce $T_A(\omega) = T(\omega) = T(\omega') = T_A(\omega')$ whence
\[ \{ T_A(\omega) \le t,\ X_s(\omega) = X_s(\omega') \text{ for all } s \le t \} \Rightarrow T_A(\omega') \le t, \]
which implies that $\{T_A(\omega) \le t\} \in \mathcal{X}^o_t$ and hence that $T_A$ is an $\mathcal{X}^o_t$-stopping time. □
For many arguments it is required that the augmentation of the filtration generated by a process be right continuous. While left continuity of sample paths does imply left continuity of the filtration, right continuity (or even continuity) of the sample paths does not imply that the augmentation of the generated filtration is right continuous. This can be seen by considering the event that a process has a local maximum at time $t$, which may be in $\mathcal{X}_{t+}$ but not $\mathcal{X}_t$ (see the solution to Problem 7.1 (iii) in Chapter 2 of Karatzas and Shreve [149]). The following proposition gives an important class of processes for which the right continuity does hold.

Proposition A.25. If $X$ is a $d$-dimensional strong Markov process, then the augmentation of the filtration generated by $X$ is right continuous.

Proof. Denote by $\mathcal{X}^o$ the (unaugmented) filtration generated by the process $X$. If $0 \le t_0 < t_1 < \cdots < t_n \le s < t_{n+1} < \cdots < t_m$, then by application of the strong Markov property to the trivial $\mathcal{X}^o_{t+}$-stopping time $s$,
$$\mathbb{P}(X_{t_0} \in \Gamma_0, \ldots, X_{t_m} \in \Gamma_m \mid \mathcal{X}^o_{s+}) = 1_{\{X_{t_0} \in \Gamma_0, \ldots, X_{t_n} \in \Gamma_n\}}\, \mathbb{P}(X_{t_{n+1}} \in \Gamma_{n+1}, \ldots, X_{t_m} \in \Gamma_m \mid X_s).$$
The right-hand side in this equation is clearly $\mathcal{X}_s^o$-measurable and it is $\mathbb{P}$-a.s. equal to $\mathbb{P}(X_{t_0} \in \Gamma_0, \ldots, X_{t_m} \in \Gamma_m \mid \mathcal{X}^o_{s+})$. As this holds for all cylinder sets, it follows that for all $F \in \mathcal{X}_\infty^o$ there exists an $\mathcal{X}_s^o$-measurable random variable which is $\mathbb{P}$-a.s. equal to $\mathbb{P}(F \mid \mathcal{X}^o_{s+})$.

Suppose that $F \in \mathcal{X}^o_{s+} \subseteq \mathcal{X}^o_\infty$; then clearly $\mathbb{P}(F \mid \mathcal{X}^o_{s+}) = 1_F$. As above there exists an $\mathcal{X}_s^o$-measurable random variable $\hat{1}_F$ such that $\hat{1}_F = 1_F$ a.s. Define the event $G \triangleq \{\omega : \hat{1}_F(\omega) = 1\}$; then $G \in \mathcal{X}_s^o$ and the events $G$ and $F$ differ by at most a null set (i.e. the symmetric difference $G \,\triangle\, F$ is null). Therefore $F \in \mathcal{X}_s$, which establishes that $\mathcal{X}^o_{s+} \subseteq \mathcal{X}_s$ for all $s \ge 0$.

It is clear that $\mathcal{X}_s \subseteq \mathcal{X}_{s+}$. Now prove the converse implication. Suppose that $F \in \mathcal{X}_{s+}$, which implies that for all $n$, $F \in \mathcal{X}_{s+1/n}$. Therefore there exists $G_n \in \mathcal{X}^o_{s+1/n}$ such that $F$ and $G_n$ differ by a null set. Define
$$G \triangleq \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} G_n.$$
Then clearly $G \in \mathcal{X}^o_{s+} \subseteq \mathcal{X}_s$ (by the result just proved). To show that $F \in \mathcal{X}_s$, it suffices to show that this $G$ differs by at most a null set from $F$. Consider
$$G \setminus F \subseteq \left(\bigcup_{n=1}^{\infty} G_n\right) \setminus F = \bigcup_{n=1}^{\infty} (G_n \setminus F),$$
where the right-hand side is a countable union of null sets; thus $G \setminus F$ is null. Secondly
$$F \setminus G = F \cap \left(\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} G_n\right)^{c} = F \cap \left(\bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} G_n^c\right) = \bigcup_{m=1}^{\infty} \left(F \cap \bigcap_{n=m}^{\infty} G_n^c\right) \subseteq \bigcup_{m=1}^{\infty} (F \cap G_m^c) = \bigcup_{m=1}^{\infty} (F \setminus G_m),$$
and again the right-hand side is a countable union of null sets, thus $F \setminus G$ is null. Therefore $F \in \mathcal{X}_s$, which implies that $\mathcal{X}_{s+} \subseteq \mathcal{X}_s$; hence $\mathcal{X}_s = \mathcal{X}_{s+}$. □
A.7 The Optional Projection

Proof of Theorem 2.7

Proof. The proof uses a monotone class argument (Theorem A.1). Let $\mathcal{H}$ be the class of bounded measurable processes for which an optional projection exists. The class of functions $1_{[s,t)} 1_F$, where $s < t$ and $F \in \mathcal{F}$, can readily be seen to form a π-system which generates the measurable processes. Define $Z$ to be a càdlàg version of the martingale $t \mapsto \mathbb{E}(1_F \mid \mathcal{F}_t)$ (which necessarily exists since we have assumed that the usual conditions hold); then we may set
$${}^o\!\left(1_{[s,t)} 1_F\right)(r, \omega) = 1_{[s,t)}(r)\, Z_r(\omega).$$
It is necessary to check that the defining condition (2.8) is satisfied. Let $T$ be a stopping time. Then it follows from Doob's optional sampling theorem (which is applicable in this case without restrictions on $T$, because the martingale $Z$ is bounded and hence uniformly integrable) that
$$\mathbb{E}[1_F \mid \mathcal{F}_T] = \mathbb{E}[Z_\infty \mid \mathcal{F}_T] = Z_T,$$
whence
$$\mathbb{E}[1_F 1_{\{T<\infty\}} \mid \mathcal{F}_T] = Z_T 1_{\{T<\infty\}} \quad \mathbb{P}\text{-a.s.}$$

To apply the monotone class theorem A.1 it is necessary to check that if $X_n$ is a bounded monotone sequence in $\mathcal{H}$ with limit $X$ then the optional projections ${}^oX_n$ converge to the optional projection of $X$. Consider
$$Y \triangleq \liminf_{n\to\infty} {}^oX_n \, 1_{\{|\liminf_{n\to\infty} {}^oX_n| < \infty\}}.$$
We must check that $Y$ is the optional projection of $X$. Thanks to property (c) of conditional expectation the condition (2.8) is immediate. Consequently $\mathcal{H}$ is a monotone class and thus by the monotone class theorem A.1 the optional projection exists for any bounded $\mathcal{B} \times \mathcal{F}$-measurable process. To extend to the unbounded non-negative case consider $X \wedge n$ and pass to the limit.

In order to verify that the projection is unique up to indistinguishability, consider two candidates for the optional projection, $Y$ and $Z$. For any stopping time $T$ it follows from (2.8) that
$$Y_T 1_{\{T<\infty\}} = Z_T 1_{\{T<\infty\}}, \quad \mathbb{P}\text{-a.s.} \tag{A.14}$$
Define $F \triangleq \{(t, \omega) : Z_t(\omega) \ne Y_t(\omega)\}$. Since both $Z$ and $Y$ are optional processes the set $F$ is an optional subset of $[0, \infty) \times \Omega$. Write $\pi : [0, \infty) \times \Omega \to \Omega$ for the canonical projection map $\pi : (t, \omega) \mapsto \omega$. Now argue by contradiction. Suppose that $Z$ and $Y$ are not indistinguishable; this implies that $\mathbb{P}(\pi(F)) > 0$. By the optional section theorem (see Dellacherie and Meyer [77, IV.84]) it follows that given $\varepsilon > 0$ there exists a stopping time $U$ such that when $U(\omega) < \infty$, $(U(\omega), \omega) \in F$ and $\mathbb{P}(U < \infty) \ge \mathbb{P}(\pi(F)) - \varepsilon$. As it has been assumed that $\mathbb{P}(\pi(F)) > 0$, by choosing $\varepsilon$ sufficiently small, $\mathbb{P}(U < \infty) > 0$. It follows that on some set of non-zero probability $1_{\{U<\infty\}} Y_U \ne 1_{\{U<\infty\}} Z_U$. But from (A.14) this may only hold on a null set, which is a contradiction. Therefore $\mathbb{P}(\pi(F)) = 0$ and it follows that $Z$ and $Y$ are indistinguishable. □

Lemma A.26. If almost surely $X_t \ge 0$ for all $t \ge 0$ then ${}^oX_t \ge 0$ for all $t \ge 0$ almost surely.

Proof. Use the monotone class argument (Theorem A.1) in the proof of the existence of the optional projection, noting that if $F \in \mathcal{F}$ then the càdlàg version of $\mathbb{E}[1_F \mid \mathcal{F}_t]$ is non-negative a.s. Alternatively use the optional section theorem as in the proof of uniqueness. □

A.7.1 Path Regularity

Introduce the following notation for a one-sided limit, in this case the right limit,
$$\limsup_{s \downarrow\downarrow t} x_s \triangleq \limsup_{s \to t:\, s > t} x_s = \inf_{v > t}\, \sup_{t < u < v} x_u,$$
a similar notation with $s \uparrow\uparrow t$ being used for the left limit. The following lemma is required to establish right continuity. It can be applied to the optional projection since, being optional, it must also be progressively measurable.

Lemma A.27. Let $X$ be a progressively measurable stochastic process taking values in $\mathbb{R}$; then $\liminf_{s \downarrow\downarrow t} X_s$ and $\limsup_{s \downarrow\downarrow t} X_s$ are progressively measurable.

Proof. It is sufficient to consider the case of lim sup. Let $b \in \mathbb{R}$ be such that $b > 0$, then define
$$X_t^n \triangleq \begin{cases} \sup_{kb2^{-n} \le s < (k+1)b2^{-n}} X_s & \text{if } b(k-1)2^{-n} \le t < bk2^{-n},\ k < 2^n, \\ \limsup_{s \downarrow\downarrow b} X_s & \text{if } b(1 - 2^{-n}) \le t \le b. \end{cases}$$
For every $t \le b$, the supremum in the above definition is $\mathcal{F}_b$-measurable since $X$ is progressively measurable; thus the random variable $X_t^n$ is $\mathcal{F}_b$-measurable. For every $\omega \in \Omega$, $X_t^n(\omega)$ has trajectories which are right continuous for $t \in [0, b]$. Therefore $X^n$ is $\mathcal{B}([0, b]) \otimes \mathcal{F}_b$-measurable and is thus progressively measurable. On $[0, b]$ it is clear that $\limsup_{n\to\infty} X_t^n = \limsup_{s \downarrow\downarrow t} X_s$, hence $\limsup_{s \downarrow\downarrow t} X_s$ is progressively measurable. □
In a similar vein, the following lemma is required in order to establish the existence of left limits. For the left limits the result is stronger and the lim inf and lim sup are previsible and thus also progressively measurable.

Lemma A.28. Let $X$ be a progressively measurable stochastic process taking values in $\mathbb{R}$; then $\liminf_{s \uparrow\uparrow t} X_s$ and $\limsup_{s \uparrow\uparrow t} X_s$ are previsible.

Proof. It suffices to consider $\limsup_{s \uparrow\uparrow t} X_s$. Define
$$X_t^n \triangleq \sum_{k > 0} 1_{\{k2^{-n} < t \le (k+1)2^{-n}\}} \sup_{(k-1)2^{-n} < s \le k2^{-n}} X_s;$$
from this definition it is clear that $X_t^n$ is previsible as it is a sum of left continuous, adapted processes. But as $\limsup_{n\to\infty} X_t^n = \limsup_{s \uparrow\uparrow t} X_s$, it follows that $\limsup_{s \uparrow\uparrow t} X_s$ is previsible. □

Proof of Theorem 2.9

Proof. First observe that if $Y_t$ is bounded then ${}^oY_t$ must also be bounded. There are three things which must be established: first, the existence of right limits; second, right continuity; and third, the existence of left limits. Because of the difference between Lemmas A.27 and A.28 the cases of left and right limits are not identical.

The first part of the proof establishes the existence of right limits. It is sufficient to show that
$$\mathbb{P}\left(\liminf_{s \downarrow\downarrow t} {}^oY_s < \limsup_{s \downarrow\downarrow t} {}^oY_s \text{ for some } t \in [0, \infty)\right) = 0. \tag{A.15}$$
The following steps are familiar from the proof of Doob's martingale regularization theorem, which is used to guarantee the existence of càdlàg modifications of martingales. If the right limit does not exist at $t \in [0, \infty)$, that is, if $\liminf_{s \downarrow\downarrow t} {}^oY_s < \limsup_{s \downarrow\downarrow t} {}^oY_s$, then rationals $a, b$ can be found such that $\liminf_{s \downarrow\downarrow t} {}^oY_s < a < b < \limsup_{s \downarrow\downarrow t} {}^oY_s$. The event that the right limit does not exist has thus been decomposed into a countable union over the rationals:
$$\left\{\omega : \liminf_{s \downarrow\downarrow t} {}^oY_s(\omega) < \limsup_{s \downarrow\downarrow t} {}^oY_s(\omega) \text{ for some } t \in [0, \infty)\right\} = \bigcup_{a, b \in \mathbb{Q}} \left\{\omega : \liminf_{s \downarrow\downarrow t} {}^oY_s(\omega) < a < b < \limsup_{s \downarrow\downarrow t} {}^oY_s(\omega) \text{ for some } t \in [0, \infty)\right\}.$$
The lim sup and lim inf processes are progressively measurable by Lemma A.27, therefore for rationals $a < b$, the set
$$E_{a,b} \triangleq \left\{(t, \omega) : \liminf_{s \downarrow\downarrow t} {}^oY_s < a < b < \limsup_{s \downarrow\downarrow t} {}^oY_s\right\}$$
is progressively measurable. Now argue by contradiction; suppose that (A.15) is not true. Then from the decomposition into a countable union, it follows that we can find $a, b \in \mathbb{Q}$ such that $a < b$ and
$$0 < \mathbb{P}\left(\liminf_{s \downarrow\downarrow t} {}^oY_s < a < b < \limsup_{s \downarrow\downarrow t} {}^oY_s \text{ for some } t \in [0, \infty)\right) = \mathbb{P}(\pi(E_{a,b})),$$
where the projection $\pi$ is defined for $A \subset [0, \infty) \times \Omega$ by $\pi(A) = \{\omega : (t, \omega) \in A \text{ for some } t \in [0, \infty)\}$. Define
$$S_{a,b} \triangleq \inf\{t \ge 0 : (t, \omega) \in E_{a,b}\},$$
which is the début of a progressively measurable set, and thus by the Début theorem (Theorem A.20 applied to the progressive process $1_{E_{a,b}}(t, \omega)$) is a stopping time (and hence optional). For a given $\omega$, this stopping time $S_{a,b}(\omega)$ is the first time where $\liminf_{s \downarrow\downarrow t} {}^oY_s$ and $\limsup_{s \downarrow\downarrow t} {}^oY_s$ straddle the interval $[a, b]$ and thus the right limit fails to exist at this point.

If $\omega \in \pi(E_{a,b})$ then there exists $t \in [0, \infty)$ such that $(t, \omega) \in E_{a,b}$ and this implies $t \ge S_{a,b}(\omega)$, whence $S_{a,b}(\omega) < \infty$. Thus, if $\mathbb{P}(\pi(E_{a,b})) > 0$ then this implies $\mathbb{P}(S_{a,b} < \infty) > 0$. Thus a consequence of the assumption that (A.15) is false is that we can find $a, b \in \mathbb{Q}$, with $a < b$, such that $\mathbb{P}(S_{a,b} < \infty) > 0$. This will lead to a contradiction. For the remainder of the argument we can keep $a$ and $b$ fixed and consequently we write $S$ in place of $S_{a,b}$.

Define
$$A_0 \triangleq \{(t, \omega) : S(\omega) < t < S(\omega) + 1,\ {}^oY_t(\omega) < a\};$$
it then follows that the projection $\pi(A_0) = \{S < \infty\}$. Thus by the optional section theorem, since $A_0$ is optional ($S$ is a stopping time and ${}^oY_t$ is a priori optional), we can find a stopping time $S_0$ such that on $\{S_0 < \infty\}$, $(S_0(\omega), \omega) \in A_0$ and
$$\mathbb{P}(S_0 < \infty) > (1 - 1/2)\, \mathbb{P}(S < \infty).$$
Define
$$A_1 \triangleq \{(t, \omega) : S(\omega) < t < (S(\omega) + 1/2) \wedge S_0(\omega),\ {}^oY_t(\omega) > b\}$$
and again by the optional section theorem we can find a stopping time $S_1$ such that on $\{S_1 < \infty\}$, $(S_1(\omega), \omega) \in A_1$ and
$$\mathbb{P}(S_1 < \infty) > (1 - 1/2^2)\, \mathbb{P}(S < \infty).$$
We can carry on this construction inductively, defining
$$A_{2k} \triangleq \left\{(t, \omega) : S(\omega) < t < (S(\omega) + 2^{-2k}) \wedge S_{2k-1}(\omega),\ {}^oY_t(\omega) < a\right\}$$
and
$$A_{2k+1} \triangleq \left\{(t, \omega) : S(\omega) < t < (S(\omega) + 2^{-(2k+1)}) \wedge S_{2k}(\omega),\ {}^oY_t(\omega) > b\right\}.$$
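We can construct stopping times using the optional section theorem such that for each $i$, on $\{S_i < \infty\}$, $(S_i(\omega), \omega) \in A_i$, and such that
$$\mathbb{P}(S_i < \infty) > \left(1 - 2^{-(i+1)}\right) \mathbb{P}(S < \infty).$$
On the event $\{S_i < \infty\}$ it is clear that $S_i < S_{i-1}$ and $S_i < S + 2^{-i}$. Also if $S = \infty$ it follows that $S_i = \infty$ for all $i$, thus $S_i < \infty$ implies $S < \infty$, so $\mathbb{P}(S_i < \infty, S < \infty) = \mathbb{P}(S_i < \infty)$, whence
$$\mathbb{P}(S_i = \infty, S < \infty) = \mathbb{P}(S < \infty) - \mathbb{P}(S_i < \infty, S < \infty) = \mathbb{P}(S < \infty) - \mathbb{P}(S_i < \infty) \le \mathbb{P}(S < \infty)/2^{i+1}.$$
Thus
$$\sum_{i=0}^{\infty} \mathbb{P}(S_i = \infty, S < \infty) \le \mathbb{P}(S < \infty) \le 1 < \infty,$$
so by the first Borel–Cantelli lemma the probability that infinitely many of the events $\{S_i = \infty, S < \infty\}$ occur is zero. In other words, for $\omega \in \{S < \infty\}$ we can find an index $i_0(\omega)$ such that for $i \ge i_0$, the sequence $S_i$ converges in a decreasing fashion to $S$ and ${}^oY_{S_i} < a$ for even $i$, and ${}^oY_{S_i} > b$ for odd $i$.

Define $R_i = \sup_{j \ge i} S_j$, which is a monotonically decreasing sequence. Almost surely, $R_i = S_i$ for $i$ sufficiently large, therefore $\lim_{i\to\infty} R_i = S$ a.s. and on the event $\{S < \infty\}$, for $i$ sufficiently large, ${}^oY_{R_i} < a$ for $i$ even, and ${}^oY_{R_i} > b$ for $i$ odd. Set $T_i = R_i \wedge N$. On $\{S < N\}$, for $j$ sufficiently large $S_j < N$, hence using the boundedness of ${}^oY$ to enable interchange of limit and expectation,
$$\limsup_{i\to\infty} \mathbb{E}\left[{}^oY_{T_{2i}}\right] \le a\, \mathbb{P}(S < N) + \mathbb{E}\left[{}^oY_N 1_{\{S \ge N\}}\right],$$
$$\liminf_{i\to\infty} \mathbb{E}\left[{}^oY_{T_{2i+1}}\right] \ge b\, \mathbb{P}(S < N) + \mathbb{E}\left[{}^oY_N 1_{\{S \ge N\}}\right].$$
But since $T_i$ is bounded by $N$, from the definition of the optional projection (2.8) it is clear that
$$\mathbb{E}[{}^oY_{T_i}] = \mathbb{E}\left[\mathbb{E}\left[Y_{T_i} 1_{\{T_i < \infty\}} \mid \mathcal{F}_{T_i}\right]\right] = \mathbb{E}[Y_{T_i}]. \tag{A.16}$$
Thus, since $Y$ has right limits, by an application of the bounded convergence theorem $\mathbb{E}[Y_{T_i}] \to \mathbb{E}[Y_T]$, and so as $i \to \infty$
$$\mathbb{E}[{}^oY_{T_i}] \to \mathbb{E}[{}^oY_T]. \tag{A.17}$$
Thus
$$\limsup_{i\to\infty} \mathbb{E}[{}^oY_{T_i}] = \limsup_{i\to\infty} \mathbb{E}[{}^oY_{T_{2i}}] \quad \text{and} \quad \liminf_{i\to\infty} \mathbb{E}[{}^oY_{T_i}] = \liminf_{i\to\infty} \mathbb{E}[{}^oY_{T_{2i+1}}],$$
so, if $\mathbb{P}(S < N) > 0$ we see that since $a < b$, $\limsup_{i\to\infty} \mathbb{E}[{}^oY_{T_i}] < \liminf_{i\to\infty} \mathbb{E}[{}^oY_{T_i}]$, which is a contradiction; therefore $\mathbb{P}(S < N) = 0$. As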
$N$ was chosen arbitrarily, this implies that $\mathbb{P}(S = \infty) = 1$, which is a contradiction, since we assumed $\mathbb{P}(S < \infty) > 0$. Thus a.s., right limits of ${}^oY_t$ exist.

Now we must show that ${}^oY_t$ is right continuous. Let ${}^oY_{t+}$ be the process of right limits. As this process is adapted and right continuous, it follows that it is optional. Consider for $\varepsilon > 0$ the set
$$A_\varepsilon \triangleq \{(t, \omega) : {}^oY_t(\omega) \ge {}^oY_{t+}(\omega) + \varepsilon\}.$$
Suppose that $\mathbb{P}(\pi(A_\varepsilon)) > 0$, from which we deduce a contradiction. By the optional section theorem, for $\delta > 0$, we can find a stopping time $S$ such that on $\{S < \infty\}$, $(S(\omega), \omega) \in A_\varepsilon$, and $\mathbb{P}(S < \infty) \ge \mathbb{P}(\pi(A_\varepsilon)) - \delta$. We may choose $\delta$ such that $\mathbb{P}(S < \infty) > 0$. Let $S_n = S + 1/n$, and bound these times by some $N$, which is chosen sufficiently large that $\mathbb{P}(S < N) > 0$. Thus set $T_n \triangleq S_n \wedge N$ and $T \triangleq S \wedge N$. Hence by bounded convergence
$$\lim_{n\to\infty} \mathbb{E}[{}^oY_{T_n}] = \mathbb{E}\left[{}^oY_N 1_{\{S \ge N\}}\right] + \mathbb{E}\left[{}^oY_{T+} 1_{\{S < N\}}\right], \tag{A.18}$$
but
$$\mathbb{E}[{}^oY_T] = \mathbb{E}\left[{}^oY_N 1_{\{S \ge N\}}\right] + \mathbb{E}\left[{}^oY_T 1_{\{S < N\}}\right]. \tag{A.19}$$
As the right-hand sides of (A.18) and (A.19) are not equal we conclude that $\lim_{n\to\infty} \mathbb{E}({}^oY_{T_n}) \ne \mathbb{E}({}^oY_T)$, which contradicts (A.17). Therefore $\mathbb{P}(\pi(A_\varepsilon)) = 0$. The same argument can be applied to
$$B_\varepsilon = \{(t, \omega) : {}^oY_t(\omega) \le {}^oY_{t+}(\omega) - \varepsilon\},$$
which allows us to conclude that $\mathbb{P}(\pi(B_\varepsilon)) = 0$; hence
$$\mathbb{P}\left({}^oY_t = {}^oY_{t+},\ \forall t \in [0, \infty)\right) = 1,$$
and thus, up to indistinguishability, the process ${}^oY_t$ is right continuous.

The existence of left limits is approached in a similar fashion; by Lemma A.28, the processes $\liminf_{s \uparrow\uparrow t} {}^oY_s$ and $\limsup_{s \uparrow\uparrow t} {}^oY_s$ are previsible and hence optional. For $a, b \in \mathbb{Q}$ we define
$$F_{a,b} \triangleq \left\{(t, \omega) : \liminf_{s \uparrow\uparrow t} {}^oY_s(\omega) < a < b < \limsup_{s \uparrow\uparrow t} {}^oY_s(\omega)\right\}.$$
We assume $\mathbb{P}(\pi(F_{a,b})) > 0$ and deduce a contradiction. Since $F_{a,b}$ is optional, we may apply the optional section theorem to find an optional time $T$ such that on $\{T < \infty\}$, the point $(T(\omega), \omega) \in F_{a,b}$ and with $\mathbb{P}(T < \infty) > \varepsilon$. Define
$$C_0 \triangleq \{(t, \omega) : t < T(\omega),\ {}^oY_t < a\},$$
which is itself optional; thus another application of the optional section theorem constructs a stopping time $R_0$ such that on $\{R_0 < \infty\}$, $(R_0(\omega), \omega) \in C_0$, and since $R_0 < T$ it is clear that $\mathbb{P}(R_0 < \infty) > \varepsilon$.
Then define
$$C_1 \triangleq \{(t, \omega) : R_0(\omega) < t < T(\omega),\ {}^oY_t > b\},$$
which is optional, and by the optional section theorem we can find a stopping time $R_1$ such that on $\{R_1 < \infty\}$, $(R_1(\omega), \omega) \in C_1$, and again $R_1 < T$ implies that $\mathbb{P}(R_1 < \infty) > \varepsilon$. Proceed inductively.

We have constructed an increasing sequence of optional times $R_k$ such that on the event $\{T < \infty\}$, ${}^oY_{R_k} < a$ for even $k$, and ${}^oY_{R_k} > b$ for odd $k$. Define $L_k = R_k \wedge N$ for some $N$; then this is an increasing sequence of bounded stopping times and clearly on $\{T < N\}$ the limit $\lim_{n\to\infty} \mathbb{E}[{}^oY_{L_n}]$ does not exist. But since $L_n$ is bounded, from (A.16) it follows that this limit must exist a.s.; hence $\mathbb{P}(T < N) = 0$, which as $N$ was arbitrary implies $\mathbb{P}(T < \infty) = 0$, which is a contradiction. □

The results used in the above proof are due to Doob and can be found in a very clear paper [82] which is discussed further in Benveniste [16]. These papers work in the context of separable processes, which are processes whose graph is the closure of the graph of the process with time restricted to some countable set $D$. That is, for every $t \in [0, \infty)$ there exists a sequence $t_i \in D$ such that $t_i \to t$ and $x_{t_i} \to x_t$. In these papers ‘rules of play’ disallow the use of the optional section theorem except when unavoidable, and the above results are proved without its use. These results can be extended (with the addition of extra conditions) to optionally separable processes, which are similarly defined, except that the set $D$ consists of a countable collection of stopping times; by an application of the optional section theorem it can be shown that every optional process is optionally separable. The direct approach via the optional section theorems is used in Dellacherie and Meyer [79].
A.8 The Previsible Projection

The optional projection (called the projection bien mesurable in some early articles) has been discussed extensively and is the projection which is of importance in the theory of filtering; a closely related concept is the previsible (or predictable) projection. Some of the very early theoretical papers make use of this projection. By convention we take $\mathcal{F}_{0-} = \mathcal{F}_0$.

Theorem A.29. Let $X$ be a bounded measurable process; then there exists a previsible process ${}^pX$, called the previsible projection of $X$, such that for every previsible stopping time $T$,
$${}^pX_T 1_{\{T<\infty\}} = \mathbb{E}\left[X_T 1_{\{T<\infty\}} \mid \mathcal{F}_{T-}\right]. \tag{A.20}$$
This process is unique up to indistinguishability, i.e. any processes which satisfy these conditions will be indistinguishable.
Proof. As in the proof of Theorem 2.7, let $F$ be a measurable set, and define $Z_t$ to be a càdlàg version of the martingale $\mathbb{E}[1_F \mid \mathcal{F}_t]$. Then we define the previsible projection of $1_{(s,t]} 1_F$ by
$${}^p\!\left(1_{(s,t]} 1_F\right)(r, \omega) = 1_{(s,t]}(r)\, Z_{r-}(\omega).$$
We must verify that this satisfies (A.20); let $T$ be a previsible stopping time. Then we can find a sequence $T_n$ of stopping times announcing $T$, that is, such that $T_n \le T_{n+1} < T$ for all $n$ and $T_n \uparrow T$. By Doob's optional sampling theorem applied to the uniformly integrable martingale $Z$,
$$\mathbb{E}[1_F \mid \mathcal{F}_{T_n}] = \mathbb{E}[Z_\infty \mid \mathcal{F}_{T_n}] = Z_{T_n};$$
now pass to the limit as $n \to \infty$, using the martingale convergence theorem (see Theorem B.1), and we get $Z_{T-} = \mathbb{E}\left[Z_\infty \mid \bigvee_{n=1}^{\infty} \mathcal{F}_{T_n}\right]$, and from the definition of the σ-algebra of $T-$ it follows that
$$Z_{T-} = \mathbb{E}[Z_\infty \mid \mathcal{F}_{T-}].$$
To complete the proof, apply the monotone class theorem A.1 as in the proof for the optional projection and use the same optional section theorem argument for uniqueness. □

The previsible and optional projections are actually very similar, as the following theorem illustrates.

Theorem A.30. Let $X$ be a bounded measurable process; then the set
$$\{(t, \omega) : {}^oX_t(\omega) \ne {}^pX_t(\omega)\}$$
is a countable union of graphs of stopping times.

Proof. Again we use the monotone class argument. Consider the process $1_{[s,t)}(r) 1_F$; from (2.8) and (A.20) the set of points of difference is
$$\{(t, \omega) : Z_t(\omega) \ne Z_{t-}(\omega)\}$$
and since $Z$ is a càdlàg process we can define a sequence $T_n$ of stopping times corresponding to the $n$th discontinuity of $Z$, and by Lemma A.13 there are at most countably many such discontinuities; therefore the points of difference are contained in the countable union of the graphs of these $T_n$s. □
A.9 The Optional Projection Without the Usual Conditions

The proof of the optional projection theorem in Section A.7 depends crucially on the usual conditions to construct a càdlàg version of a martingale, both the augmentation by null sets and the right continuity of the filtration being used. The result can be proved on the uncompleted σ-algebra by making suitable modifications to the process constructed by Theorem 2.7. These results were first established in Dellacherie and Meyer [78] and take their definitive form in [77], the latter approach being followed here. The proofs in this section are of a more advanced nature and make use of a number of non-trivial results about σ-algebras of stopping times which are not proved here. These results and their proofs can be found in, for example, Rogers and Williams [249]. As usual let $\mathcal{F}_t^o$ denote the unaugmented σ-algebra corresponding to $\mathcal{F}_t$.

Lemma A.31. Let $L \subset \mathbb{R}^+ \times \Omega$ be such that
$$L = \bigcup_n \{(S_n(\omega), \omega) : \omega \in \Omega\},$$
where the $S_n$ are positive $\mathcal{F}_t^o$-stopping times. We can find disjoint $\mathcal{F}_t^o$-stopping times $T_n$ to replace the $S_n$ such that
$$L = \bigcup_n \{(T_n(\omega), \omega) : \omega \in \Omega\}.$$

Proof. Define $T_1 = S_1$ and define
$$A_n \triangleq \{\omega \in \Omega : S_1 \ne S_n,\ S_2 \ne S_n,\ \ldots,\ S_{n-1} \ne S_n\}.$$
Then it is clear that $A_n \in \mathcal{F}^o_{S_n}$. From the definition of this σ-algebra, if we define
$$T_n \triangleq S_n 1_{A_n} + \infty\, 1_{A_n^c},$$
then this $T_n$ is a stopping time. It is clear that this process may be continued inductively. The disjointness of the $T_n$s follows by construction. □

Given this lemma the following result is useful when modifying a process as it allows us to break down the ‘bad’ set $A$ of points in a useful fashion.

Lemma A.32. Let $A$ be a subset of $\mathbb{R}^+ \times \Omega$ contained in a countable union of graphs of positive random variables; then $A = K \cup L$ where $K$ and $L$ are disjoint measurable sets such that $K$ is contained in a disjoint union of graphs of optional times and $L$ intersects the graph of any optional time on an evanescent set.†

† A set $A \subset [0, \infty) \times \Omega$ is evanescent if the projection $\pi(A) = \{\omega : \exists t \in [0, \infty) \text{ such that } (t, \omega) \in A\}$ is contained in a $\mathbb{P}$-null set. Two indistinguishable processes differ on an evanescent set.
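The construction in the proof of Lemma A.31 can be illustrated on a finite sample space. The following Python sketch is only an illustration under stated assumptions: the sample space is a finite index set, each time $S_n$ is stored as a list of its values (with `math.inf` standing for $+\infty$), and the function names `disjointify` and `graph_union` are hypothetical. It checks only the set-theoretic content of the lemma (same union of graphs, pairwise disjoint graphs) and says nothing about measurability or the stopping-time property.

```python
import math

def disjointify(S):
    """Mimic T_n = S_n on A_n and +infinity off A_n, where
    A_n = {S_1 != S_n, ..., S_{n-1} != S_n}.

    S[n][w] is the value of the n-th time at sample point w
    (finite toy sample space; math.inf encodes +infinity).
    """
    T = []
    for n, Sn in enumerate(S):
        Tn = [s if all(S[k][w] != s for k in range(n)) else math.inf
              for w, s in enumerate(Sn)]
        T.append(Tn)
    return T

def graph_union(times):
    """Union of the graphs {(time(w), w)}, ignoring infinite values."""
    return {(t, w) for row in times for w, t in enumerate(row) if t != math.inf}

# Three times on a three-point sample space; some graphs overlap.
S = [[1.0, 2.0, 3.0],
     [1.0, 5.0, 3.0],
     [4.0, 2.0, math.inf]]
T = disjointify(S)

# The union of the graphs is unchanged.
same_union = graph_union(S) == graph_union(T)

# At each sample point the finite values of the T_n are distinct,
# i.e. the graphs of the T_n are pairwise disjoint.
disjoint = all(
    len([row[w] for row in T if row[w] != math.inf])
    == len({row[w] for row in T if row[w] != math.inf})
    for w in range(len(S[0]))
)
print(same_union, disjoint)
```

Each later time is switched off (sent to $+\infty$) exactly at the sample points where it coincides with an earlier one, which is why the union of graphs is preserved while overlaps are removed.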
Proof. Let $\mathcal{V}$ denote the set of all optional times. For $Z$ a positive random variable define $V(Z) = \bigcup_{T \in \mathcal{V}} \{\omega : Z(\omega) = T(\omega)\}$; consequently there is a useful decomposition $Z = Z' \wedge Z''$, where
$$Z' = Z 1_{V(Z)} + \infty\, 1_{V(Z)^c}, \qquad Z'' = Z 1_{V(Z)^c} + \infty\, 1_{V(Z)}.$$
From the definition of $V(Z)$ the set $\{(Z'(\omega), \omega) : \omega \in \Omega\}$ is contained in the graph of a countable number of optional times and if $T$ is an optional time then $\mathbb{P}(Z'' = T < \infty) = 0$. Let the covering of $A$ by a countable family of graphs of random variables be written
$$A \subseteq \bigcup_{n=1}^{\infty} \{(Z_n(\omega), \omega) : \omega \in \Omega\}$$
and form a decomposition of each random variable $Z_n = Z_n' \wedge Z_n''$ as above. Clearly $\bigcup_{n=1}^{\infty} \{(Z_n'(\omega), \omega) : \omega \in \Omega\}$ is also covered by a countable union of graphs of optional times and by Lemma A.31 we can find a sequence of disjoint optional times $T_n$ such that
$$\bigcup_{n=1}^{\infty} \{(Z_n'(\omega), \omega) : \omega \in \Omega\} \subseteq \bigcup_{n=1}^{\infty} \{(T_n(\omega), \omega) : \omega \in \Omega\}.$$
Define
$$K = A \cap \bigcup_{n=1}^{\infty} \{(Z_n'(\omega), \omega) : \omega \in \Omega\} = A \cap \bigcup_n \{(T_n(\omega), \omega) : \omega \in \Omega\},$$
$$L = A \cap \bigcup_{n=1}^{\infty} \{(Z_n''(\omega), \omega) : \omega \in \Omega\} = A \setminus \bigcup_n \{(T_n(\omega), \omega) : \omega \in \Omega\}.$$
Clearly $A = K \cup L$, hence this is a decomposition of $A$ which has the required properties. □

Lemma A.33. For every $\mathcal{F}_t$-optional process $X_t$ there is an indistinguishable $\mathcal{F}^o_{t+}$-optional process.

Proof. Let $T$ be an $\mathcal{F}_t$-stopping time. Consider the process $X_t = 1_{[0,T)}$, which is càdlàg and $\mathcal{F}_t$-adapted, and hence $\mathcal{F}_t$-optional. By Lemma A.21 there exists an $\mathcal{F}^o_{t+}$-stopping time $T'$ such that $T = T'$ a.s. If we define $X_t' = 1_{[0,T')}$, then since this process is càdlàg and $\mathcal{F}^o_{t+}$-adapted, it is clearly an $\mathcal{F}^o_{t+}$-optional process. Moreover
$$\mathbb{P}(\omega : X_t'(\omega) = X_t(\omega)\ \forall t) = \mathbb{P}(T = T') = 1,$$
which implies that the processes $X'$ and $X$ are indistinguishable. We extend from processes of the form $1_{[0,T)}$ to the whole of $\mathcal{O}$ using the monotone class framework (Theorem A.1) to extend to bounded optional processes, and use truncation to extend to the unbounded case. □
Lemma A.34. For every $\mathcal{F}_t$-previsible process, there is an indistinguishable $\mathcal{F}_t^o$-previsible process.

Proof. We first show that if $T$ is $\mathcal{F}_t$-previsible, then there exists $T'$ which is $\mathcal{F}_t^o$-previsible, such that $T = T'$ a.s. As $\{T = 0\} \in \mathcal{F}_{0-}$, we need only consider the case where $T > 0$. Let $T_n$ be a sequence of $\mathcal{F}_t$-stopping times announcing† $T$. By Lemma A.33 it is clear that we can find $R_n$ an $\mathcal{F}^o_{t+}$-stopping time such that $R_n = T_n$ a.s. Define $L_n \triangleq \max_{i=1,\ldots,n} R_i$; clearly this is an increasing sequence of stopping times. Let this sequence have limit $L$. Define $A_n \triangleq \{L_n = 0\} \cup \{L_n < L\}$ and define
$$M_n = \begin{cases} L_n \wedge n & \text{if } \omega \in A_n, \\ +\infty & \text{otherwise.} \end{cases}$$
Since the sets $A_n$ are decreasing, the stopping times $M_n$ form an increasing sequence and the sequence $M_n$ announces everywhere its limit $T'$. This limit is strictly positive. Because $T'$ is announced, $T'$ is an $\mathcal{F}_t^o$-previsible time and $T = T'$ a.s. Finish the proof by a monotone class argument as in Lemma A.33. □

The main result of this section is the following extension of the optional projection theorem which does not require the imposition of the usual conditions.

Theorem A.35. Given a stochastic process $X$, we can construct an $\mathcal{F}_t^o$-optional process $Z_t$ such that for every stopping time $T$,
$$Z_T 1_{\{T<\infty\}} = \mathbb{E}\left[X_T 1_{\{T<\infty\}} \mid \mathcal{F}_T\right], \tag{A.21}$$
and this process is unique up to indistinguishability.

Proof. By the optional projection theorem 2.7 we can construct an $\mathcal{F}_t$-optional process $\bar{Z}_t$ which satisfies (A.21). By Lemma A.33 we can find a process $Z_t$ which is indistinguishable from $\bar{Z}_t$ but which is $\mathcal{F}^o_{t+}$-optional. In general this process $Z$ will not be $\mathcal{F}_t^o$-optional. We must therefore modify it. Similarly using Theorem A.29, we can construct an $\mathcal{F}_t$-previsible process $Y_t$, and using Lemma A.34, we can find an $\mathcal{F}_t^o$-previsible process $\bar{Y}_t$ which is indistinguishable from the process $Y_t$.
† A stopping time $T$ is called announceable if there exists an announcing sequence $(T_n)_{n \ge 1}$ for $T$. This means that for any $n \ge 1$ and $\omega \in \Omega$, $T_n(\omega) \le T_{n+1}(\omega) < T(\omega)$ and $T_n(\omega) \nearrow T(\omega)$. A stopping time $T$ is announceable if and only if it is previsible. For details see Rogers and Williams [248].

Let $H = \{(t, \omega) : Y_t(\omega) \ne Z_t(\omega)\}$; then it follows by Theorem A.30 that this graph of differences is contained within a countable disjoint union of graphs of random variables. Thus by Lemma A.32 we may write $H = K \cup L$ such that for $T$ any $\mathcal{F}_t^o$-stopping time
$$\mathbb{P}(\omega : (T(\omega), \omega) \in L) = 0$$
and there exists a sequence of $\mathcal{F}_t^o$-stopping times $T_n$ such that
$$K \subset \bigcup_n \{(T_n(\omega), \omega) : \omega \in \Omega\}.$$
For each $n$ let $Z_n$ be a version of $\mathbb{E}[X_{T_n} 1_{\{T_n<\infty\}} \mid \mathcal{F}^o_{T_n}]$; then we can define
$$Z_t(\omega) \triangleq \begin{cases} Y_t(\omega) & \text{if } (t, \omega) \notin \bigcup_n \{(T_n(\omega), \omega) : \omega \in \Omega\}, \\ Z_n(\omega) & \text{if } (t, \omega) \in \{(T_n(\omega), \omega) : \omega \in \Omega\}. \end{cases} \tag{A.22}$$
It is immediate that this $Z_t$ is $\mathcal{F}_t^o$-optional. Let us now show that it satisfies (A.21). Let $T$ be an $\mathcal{F}_t^o$-optional time and let $A \in \mathcal{F}_T^o$. Set $A_n = A \cap \{T = T_n\}$; thus $A_n \in \mathcal{F}^o_{T_n}$. Let $B = A \setminus \bigcup_n A_n$ and thus $B \in \mathcal{F}_T^o$. From the definition (A.22),
$$Z_T 1_{A_n} 1_{\{T<\infty\}} = Z_n 1_{A_n} 1_{\{T_n<\infty\}} = 1_{A_n} \mathbb{E}[1_{\{T_n<\infty\}} X_{T_n} \mid \mathcal{F}^o_{T_n}] = \mathbb{E}[X_{T_n} 1_{A_n} 1_{\{T_n<\infty\}} \mid \mathcal{F}^o_{T_n}] = \mathbb{E}[X_T 1_{A_n} 1_{\{T<\infty\}} \mid \mathcal{F}^o_T].$$
Consequently
$$\mathbb{E}[1_{A_n} Z_T 1_{\{T<\infty\}}] = \mathbb{E}[1_{A_n} \mathbb{E}[1_{\{T<\infty\}} X_T \mid \mathcal{F}^o_T]] = \mathbb{E}[1_{A_n} X_T 1_{\{T<\infty\}}].$$
So on $A_n$ the conditions are satisfied. Now consider $B$, on which a.s. $T \ne T_n$ for all $n$; hence $(T(\omega), \omega) \notin K$. Since $\mathbb{P}((T(\omega), \omega) \in L) = 0$, it follows that a.s. $(T(\omega), \omega) \notin H$. Recalling the definition of $H$ this implies that $Y_t(\omega) = \zeta_t(\omega)$ a.s.; from the definition (A.22), on $B$, $Z_T = Y_T$, thus
$$\mathbb{E}[1_B Z_T] = \mathbb{E}[1_B \zeta_T] = \mathbb{E}[1_B \mathbb{E}[X_T \mid \mathcal{F}^o_{T+}]] = \mathbb{E}[1_B X_T].$$
Thus on $A_n$ for each $n$ and on $B$ the process $Z$ is an optional projection of $X$. The uniqueness argument using the optional section theorem is exactly analogous to that used in the proof of Theorem 2.7. □
A.10 Convergence of Measure-valued Random Variables

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and let $(\mu^n)_{n=1}^{\infty}$ be a sequence of random measures, $\mu^n : \Omega \to \mathcal{M}(S)$, and $\mu : \Omega \to \mathcal{M}(S)$ be another measure-valued random variable. In the following we define two types of convergence for sequences of measure-valued random variables:

1. $\lim_{n\to\infty} \mathbb{E}[|\mu^n f - \mu f|] = 0$ for all $f \in C_b(S)$.
2. $\lim_{n\to\infty} \mu^n = \mu$, $\mathbb{P}$-a.s.

We call the first type of convergence convergence in expectation. If there exists an integrable random variable $w : \Omega \to \mathbb{R}$ such that $\mu^n(1) \le w$ for all $n$, then $\lim_{n\to\infty} \mu^n = \mu$, $\mathbb{P}$-a.s., implies that $\mu^n$ converges to $\mu$ in expectation by the dominated convergence theorem. The extra condition is satisfied if $(\mu^n)_{n=1}^{\infty}$ is a sequence of random probability measures, since in this case $\mu^n(1) = 1$ for all $n$. We also have the following.
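Remark A.36. If $\mu^n$ converges in expectation to $\mu$, then there exist sequences $n(m)$ such that $\lim_{m\to\infty} \mu^{n(m)} = \mu$, $\mathbb{P}$-a.s.

Proof. Since $\mathcal{M}(S)$ is isomorphic to $(0, \infty) \times \mathcal{P}(S)$, with the isomorphism being given by
$$\nu \in \mathcal{M}(S) \mapsto (\nu(1), \nu/\nu(1)) \in (0, \infty) \times \mathcal{P}(S),$$
it follows from Theorem 2.18 that there exists a countable convergence determining set of functions†
$$\mathcal{M} \triangleq \{\varphi_0, \varphi_1, \varphi_2, \ldots\}, \tag{A.23}$$
where $\varphi_0$ is the constant function equal to 1 everywhere and $\varphi_i \in C_b(S)$ for any $i > 0$. Since
$$\lim_{n\to\infty} \mathbb{E}[|\mu^n f - \mu f|] = 0$$
for all $f \in \{\varphi_0, \varphi_1, \varphi_2, \ldots\}$ and the set $\{\varphi_0, \varphi_1, \varphi_2, \ldots\}$ is countable, one can find a subsequence $n(m)$ such that, with probability one, $\lim_{m\to\infty} \mu^{n(m)} \varphi_i = \mu \varphi_i$ for all $i \ge 0$, hence the claim. □

† Recall that $\mathcal{M}$ is a convergence determining set if, for any sequence of finite measures $\nu_n$, $n = 1, 2, \ldots$ and $\nu$ being another finite measure for which $\lim_{n\to\infty} \nu_n f = \nu f$ for all $f \in \mathcal{M}$, it follows that $\lim_{n\to\infty} \nu_n = \nu$.

If a suitable bound on the rate of convergence for $\mathbb{E}[|\mu^n f - \mu f|]$ is known, then the sequence $n(m)$ can be specified explicitly. For instance we have the following.

Remark A.37. Assume that there exists a countable convergence determining set $\mathcal{M}$ such that, for any $f \in \mathcal{M}$,
$$\mathbb{E}[|\mu^n f - \mu f|] \le \frac{c_f}{\sqrt{n}},$$
where $c_f$ is a positive constant independent of $n$; then $\lim_{m\to\infty} \mu^{m^3} = \mu$, $\mathbb{P}$-a.s.

Proof. By Fatou's lemma
$$\mathbb{E}\left[\sum_{m=1}^{\infty} \left|\mu^{m^3} f - \mu f\right|\right] \le \lim_{n\to\infty} \sum_{m=1}^{n} \mathbb{E}\left[\left|\mu^{m^3} f - \mu f\right|\right] \le c_f \sum_{m=1}^{\infty} \frac{1}{m^{3/2}} < \infty.$$
Hence
$$\sum_{m=1}^{\infty} \left|\mu^{m^3} f - \mu f\right| < \infty \quad \mathbb{P}\text{-a.s.},$$
therefore
$$\lim_{m\to\infty} \mu^{m^3} f = \mu f$$
for any $f \in \mathcal{M}$. Since $\mathcal{M}$ is countable and convergence determining, it also follows that $\lim_{m\to\infty} \mu^{m^3} = \mu$, $\mathbb{P}$-a.s. □

Let $d : \mathcal{P}(S) \times \mathcal{P}(S) \to [0, \infty)$ be the metric defined in Theorem 2.19; that is, for $\mu, \nu \in \mathcal{P}(S)$,
$$d(\mu, \nu) = \sum_{i=1}^{\infty} \frac{|\mu \varphi_i - \nu \varphi_i|}{2^i},$$
where $\varphi_1, \varphi_2, \ldots$ are elements of $C_b(S)$ such that $\|\varphi_i\|_\infty = 1$, and let $\varphi_0 = 1$. We can extend $d$ to a metric on $\mathcal{M}(S)$ as follows:
$$d_{\mathcal{M}} : \mathcal{M}(S) \times \mathcal{M}(S) \to [0, \infty), \qquad d_{\mathcal{M}}(\mu, \nu) \triangleq \sum_{i=0}^{\infty} \frac{1}{2^i} |\mu \varphi_i - \nu \varphi_i|. \tag{A.24}$$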
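To make the definition (A.24) concrete, the following Python sketch evaluates a truncated version of $d_{\mathcal{M}}$ for two finite measures supported on finitely many points of $\mathbb{R}$. Everything specific here is an assumption made for illustration: the measures are lists of (point, weight) pairs, the sum is cut off after finitely many terms, and the test functions $\varphi_i(x) = \cos(ix)$ are a hypothetical choice with $\|\varphi_i\|_\infty = 1$, not the convergence determining set of Theorem 2.19. The helper names `integrate` and `d_M` are likewise hypothetical.

```python
import math

def integrate(measure, f):
    """Integral of f against a finite measure given as (point, weight) pairs."""
    return sum(w * f(x) for x, w in measure)

def d_M(mu, nu, test_functions):
    """Truncated version of (A.24): sum over i of 2**(-i) * |mu(phi_i) - nu(phi_i)|."""
    return sum(abs(integrate(mu, phi) - integrate(nu, phi)) / 2 ** i
               for i, phi in enumerate(test_functions))

# Hypothetical test functions: phi_0 = 1 and phi_i(x) = cos(i x), sup norm 1.
phis = [lambda x: 1.0] + [lambda x, i=i: math.cos(i * x) for i in range(1, 20)]

mu = [(0.0, 1.0), (1.0, 2.0)]  # total mass 3
nu = [(0.0, 1.0), (1.0, 1.0)]  # total mass 2

print(d_M(mu, mu, phis))  # a measure is at distance zero from itself
print(d_M(mu, nu, phis))  # the i = 0 term alone contributes |3 - 2| = 1
```

The $i = 0$ term compares total masses, which is what distinguishes $d_{\mathcal{M}}$ on $\mathcal{M}(S)$ from the metric $d$ on probability measures, where that term would always vanish.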
The careful reader should check that $d_{\mathcal{M}}$ is a metric and that indeed $d_{\mathcal{M}}$ induces the weak topology on $\mathcal{M}(S)$. Using $d_{\mathcal{M}}$, the almost sure convergence 2. is equivalent to

2′. $\lim_{n\to\infty} d_{\mathcal{M}}(\mu^n, \mu) = 0$, $\mathbb{P}$-a.s.

If there exists an integrable random variable $w : \Omega \to \mathbb{R}$ such that $\mu^n(1) \le w$ for all $n$, then similarly, (1) implies

1′. $\lim_{n\to\infty} \mathbb{E}[d_{\mathcal{M}}(\mu^n, \mu)] = 0$.

However, a stronger condition (such as tightness) must be imposed in order for condition (1) to be equivalent to condition (1′).

It is usually the case that convergence in expectation is easier to establish than almost sure convergence. However, if we have control on the higher moments of the error variables $\mu^n f - \mu f$ then we can deduce the almost sure convergence of $\mu^n$ to $\mu$. The following remark shows how this can be achieved and is used repeatedly in Chapters 8, 9 and 10.

Remark A.38. i. Assume that there exists a positive constant $p > 1$ and a countable convergence determining set $\mathcal{M}$ such that, for any $f \in \mathcal{M}$, we have
$$\mathbb{E}\left[|\mu^n f - \mu f|^{2p}\right] \le \frac{c_f}{n^p},$$
where $c_f$ is a positive constant independent of $n$. Then, for any $\varepsilon \in (0, 1/2 - 1/(2p))$ there exists a positive random variable $c_{f,\varepsilon}$, almost surely finite, such that
$$|\mu^n f - \mu f| \le \frac{c_{f,\varepsilon}}{n^\varepsilon}.$$
In particular, $\lim_{n\to\infty} \mu^n = \mu$, $\mathbb{P}$-a.s.
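ii. Similarly, assume that there exists a positive constant $p > 1$ and a countable convergence determining set $\mathcal{M}$ such that
$$\mathbb{E}\left[d_{\mathcal{M}}(\mu^n, \mu)^{2p}\right] \le \frac{c}{n^p},$$
where $d_{\mathcal{M}}$ is the metric defined in (A.24) and $c$ is a positive constant independent of $n$. Then, for any $\varepsilon \in (0, 1/2 - 1/(2p))$ there exists a positive random variable $c_\varepsilon$, almost surely finite, such that
$$|\mu^n f - \mu f| \le \frac{c_\varepsilon}{n^\varepsilon}, \quad \mathbb{P}\text{-a.s.}$$
In particular, $\lim_{n\to\infty} \mu^n = \mu$, $\mathbb{P}$-a.s.

Proof. As in the proof of Remark A.37,
$$\mathbb{E}\left[\sum_{n=1}^{\infty} n^{2\varepsilon p} |\mu^n f - \mu f|^{2p}\right] \le c_f \sum_{n=1}^{\infty} \frac{1}{n^{p - 2\varepsilon p}} < \infty,$$
since $p - 2\varepsilon p > 1$. Let $c_{f,\varepsilon}$ be the random variable
$$c_{f,\varepsilon} = \left(\sum_{n=1}^{\infty} n^{2\varepsilon p} |\mu^n f - \mu f|^{2p}\right)^{1/2p}.$$
As $(c_{f,\varepsilon})^{2p}$ is integrable, $c_{f,\varepsilon}$ is almost surely finite and
$$n^\varepsilon |\mu^n f - \mu f| \le c_{f,\varepsilon}.$$
Therefore $\lim_{n\to\infty} \mu^n f = \mu f$ for any $f \in \mathcal{M}$. Again, since $\mathcal{M}$ is countable and convergence determining, it also follows that $\lim_{n\to\infty} \mu^n = \mu$, $\mathbb{P}$-a.s. Part (ii) of the remark follows in a similar manner. □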
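The rate statement in Remark A.38 can be explored numerically. The sketch below is an illustration under assumptions that are not taken from the text: $\mu^n$ is the empirical measure of $n$ independent Uniform(0, 1) samples, $\mu$ is the uniform distribution, and $f(x) = x$, so that $\mu f = 1/2$. For bounded i.i.d. samples the moment bound $\mathbb{E}[|\mu^n f - \mu f|^{2p}] \le c/n^p$ is a standard fact, so the remark suggests that $n^\varepsilon |\mu^n f - \mu f|$ stays bounded along a sample path for any $\varepsilon < 1/2$. The function name `scaled_errors` is hypothetical, and a single simulated path is of course not a proof.

```python
import random

def scaled_errors(n_max, eps, seed=0):
    """Return (sup over n of n**eps * |mu^n f - mu f|, final error) for f(x) = x.

    mu^n is the empirical measure of the first n Uniform(0, 1) samples
    of one simulated path, so mu^n f is the running sample mean.
    """
    rng = random.Random(seed)
    total = 0.0
    sup_scaled = 0.0
    err = 0.0
    for n in range(1, n_max + 1):
        total += rng.random()        # one more Uniform(0, 1) sample
        err = abs(total / n - 0.5)   # |mu^n f - mu f| with mu f = 1/2
        sup_scaled = max(sup_scaled, n ** eps * err)
    return sup_scaled, err

sup_scaled, final_err = scaled_errors(n_max=20000, eps=0.25)
print(sup_scaled, final_err)
```

The first returned value plays the role of a path-wise lower estimate of the random constant $c_{f,\varepsilon}$ over the simulated range of $n$; rerunning with other seeds gives other values, consistent with $c_{f,\varepsilon}$ being random.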
A.11 Gronwall's Lemma

An important and frequently used result in the theory of stochastic differential equations is Gronwall's lemma.

Lemma A.39 (Gronwall). Let $x$, $y$ and $z$ be measurable non-negative functions on the real numbers. If $y$ is bounded and $z$ is integrable on $[0, T]$ for some $T \ge 0$, and for all $0 \le t \le T$,
$$x_t \le z_t + \int_0^t x_s y_s \, ds, \tag{A.25}$$
then for all $0 \le t \le T$,
$$x_t \le z_t + \int_0^t z_s y_s \exp\left(\int_s^t y_r \, dr\right) ds.$$
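Proof. Multiplying both sides of the inequality (A.25) by $y_t \exp\left(-\int_0^t y_s \, ds\right)$ yields
\[
x_t y_t \exp\left(-\int_0^t y_s \, ds\right) - \left(\int_0^t x_s y_s \, ds\right) y_t \exp\left(-\int_0^t y_s \, ds\right) \le z_t y_t \exp\left(-\int_0^t y_s \, ds\right).
\]
The left-hand side can be written as the derivative of a product,
\[
\frac{d}{dt}\left[\left(\int_0^t x_s y_s \, ds\right) \exp\left(-\int_0^t y_s \, ds\right)\right] \le z_t y_t \exp\left(-\int_0^t y_s \, ds\right),
\]
which can be integrated to give
\[
\left(\int_0^t x_s y_s \, ds\right) \exp\left(-\int_0^t y_s \, ds\right) \le \int_0^t z_s y_s \exp\left(-\int_0^s y_r \, dr\right) ds,
\]
or equivalently
\[
\int_0^t x_s y_s \, ds \le \int_0^t z_s y_s \exp\left(\int_s^t y_r \, dr\right) ds.
\]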
Combining this with the original equation (A.25) gives the desired result. □

Corollary A.40. If $x$ is a real-valued function such that for all $t \ge 0$,
\[
x_t \le A + B \int_0^t x_s \, ds,
\]
then for all $t \ge 0$, $x_t \le A e^{Bt}$.

Proof. Applying Lemma A.39 with $z_t = A$ and $y_t = B$, we have for $t \ge 0$,
\[
x_t \le A + \int_0^t A B e^{B(t-s)} \, ds \le A + A B e^{Bt}\,\frac{e^{-tB} - 1}{-B} = A e^{Bt}.
\]
□
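As a numerical illustration of Corollary A.40, the following minimal sketch (not part of the original text) builds a step function on a uniform grid by the recursion $x_k = A + B \sum_{j<k} x_j \, \Delta t$. For $A, B \ge 0$ this step function satisfies the hypothesis $x_t \le A + B \int_0^t x_s \, ds$, so the corollary predicts $x_k \le A e^{Bk\Delta t}$. The function names and the grid are illustrative assumptions.

```python
import math


def euler_majorant(A, B, T, steps):
    """Grid values x_k with x_k = A + B * sum_{j<k} x_j * dt.

    This is the left-endpoint Riemann-sum version of
    x_t = A + B * int_0^t x_s ds, so x_k = A * (1 + B*dt)**k.
    """
    dt = T / steps
    xs = [A]
    integral = 0.0
    for _ in range(steps):
        integral += xs[-1] * dt      # running left-endpoint integral of x
        xs.append(A + B * integral)
    return dt, xs


def gronwall_bound_holds(A, B, T, steps):
    """Check x_k <= A * exp(B * t_k) at every grid point (tiny float slack)."""
    dt, xs = euler_majorant(A, B, T, steps)
    return all(
        x <= A * math.exp(B * k * dt) * (1 + 1e-12)
        for k, x in enumerate(xs)
    )


if __name__ == "__main__":
    print(gronwall_bound_holds(1.0, 2.0, 1.0, 1000))
```

The recursion reduces to $x_k = A(1 + B\Delta t)^k$, and the bound is then the elementary inequality $1 + h \le e^h$; refining the grid brings $x_k$ closer to $A e^{Bt}$ from below.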
A.12 Explicit Construction of the Underlying Sample Space for the Stochastic Filtering Problem

Let $(S, d)$ be a complete separable metric space (a Polish space) and $\Omega^1$ be the space of $S$-valued continuous functions defined on $[0, \infty)$, endowed with the topology of uniform convergence on compact intervals and with the associated Borel σ-algebra, denoted by $\mathcal{F}^1$,
\[
\Omega^1 = C([0, \infty), S), \qquad \mathcal{F}^1 = \mathcal{B}(\Omega^1). \tag{A.26}
\]
Let $X$ be an $S$-valued process defined on this space; $X_t(\omega^1) = \omega^1(t)$, $\omega^1 \in \Omega^1$. We observe that $X_t$ is measurable with respect to the σ-algebra $\mathcal{F}^1$ and consider the filtration associated with the process $X$,
\[
\mathcal{F}^1_t = \sigma(X_s, \ s \in [0, t]). \tag{A.27}
\]
Let $A : C_b(S) \to C_b(S)$ be an unbounded operator with domain $D(A)$ with $1 \in D(A)$ and $A1 = 0$ and let $\mathbb{P}^1$ be a probability measure which is a solution of the martingale problem associated with the infinitesimal generator $A$ and the initial distribution $\pi_0 \in \mathcal{P}(S)$, i.e., under $\mathbb{P}^1$, the distribution of $X_0$ is $\pi_0$ and
\[
M^{\varphi}_t = \varphi(X_t) - \varphi(X_0) - \int_0^t A\varphi(X_s) \, ds, \qquad \mathcal{F}^1_t, \quad 0 \le t < \infty, \tag{A.28}
\]
is a martingale for any $\varphi \in D(A)$. Let also $\Omega^2$ be defined similarly to $\Omega^1$, but with $S = \mathbb{R}^m$. Hence
\[
\Omega^2 = C([0, \infty), \mathbb{R}^m), \qquad \mathcal{F}^2 = \mathcal{B}(\Omega^2). \tag{A.29}
\]
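We consider also $V$ to be the canonical process in $\Omega^2$ (i.e. $V_t(\omega^2) = \omega^2(t)$, $\omega^2 \in \Omega^2$) and $\mathbb{P}^2$ to be a probability measure such that $V$ is an $m$-dimensional standard Brownian motion on $(\Omega^2, \mathcal{F}^2)$ with respect to it. We consider now the following:
\[
\begin{aligned}
\Omega &\triangleq \Omega^1 \times \Omega^2, \\
\mathcal{F}^0 &\triangleq \mathcal{F}^1 \otimes \mathcal{F}^2, \\
\mathbb{P} &\triangleq \mathbb{P}^1 \otimes \mathbb{P}^2, \\
\mathcal{N} &\triangleq \{B \subset \Omega : B \subset A, \ A \in \mathcal{F}^0, \ \mathbb{P}(A) = 0\}, \\
\mathcal{F} &\triangleq \mathcal{F}^0 \vee \mathcal{N}.
\end{aligned}
\]
So $(\Omega, \mathcal{F}, \mathbb{P})$ is a complete probability space and, under $\mathbb{P}$, $X$ and $V$ are two independent processes. They can be viewed as processes on the product space $(\Omega, \mathcal{F}, \mathbb{P})$ in the usual way: as projections onto their original spaces of definition. If $W$ is the canonical process on $\Omega$, then
\[
\begin{aligned}
W(t) &= \omega(t) = (\omega^1(t), \omega^2(t)), \\
X &= p_1(\omega), \quad \text{where } p_1 : \Omega \to \Omega^1, \ p_1(\omega) = \omega^1, \\
V &= p_2(\omega), \quad \text{where } p_2 : \Omega \to \Omega^2, \ p_2(\omega) = \omega^2.
\end{aligned}
\]
$M^{\varphi}_t$ is also a martingale with respect to the larger filtration $\mathcal{F}_t$, where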
\[
\mathcal{F}_t = \sigma(X_s, V_s, \ s \in [0, t]) \vee \mathcal{N}.
\]
Let $h : S \to \mathbb{R}^m$ be a Borel-measurable function with the property that
\[
\mathbb{P}\left(\int_0^T \|h(X_s)\| \, ds < \infty\right) = 1 \qquad \text{for all } T > 0.
\]
Finally let $Y$ be the following stochastic process (usually called the observation process)
\[
Y_t = \int_0^t h(X_s) \, ds + V_t, \qquad t \ge 0.
\]
B Stochastic Analysis
B.1 Martingale Theory in Continuous Time

The subject of martingale theory is too large to cover in an appendix such as this. There are many useful references, for example, Rogers and Williams [248] or Doob [81].

Theorem B.1. If $M = \{M_t, t \ge 0\}$ is a right continuous martingale bounded in $L^p$ for $p \ge 1$, that is, $\sup_{t \ge 0} \mathbb{E}[|M_t|^p] < \infty$, then there exists an $L^p$-integrable random variable $M_\infty$ such that $M_t \to M_\infty$ almost surely as $t \to \infty$. Furthermore,

1. If $M$ is bounded in $L^p$ for $p > 1$, then $M_t \to M_\infty$ in $L^p$ as $t \to \infty$.
2. If $M$ is bounded in $L^1$ and $\{M_t, t \ge 0\}$ is uniformly integrable then $M_t \to M_\infty$ in $L^1$ as $t \to \infty$.

If either condition (1) or (2) holds then the extended process $\{M_t, t \in [0, \infty]\}$ is a martingale.

For a proof see Theorem 1.5 of Chung and Williams [53]. The following lemma provides a very useful test for identifying martingales.

Lemma B.2. Let $M = \{M_t, t \ge 0\}$ be a càdlàg adapted process such that for each bounded stopping time $T$, $\mathbb{E}[|M_T|] < \infty$ and $\mathbb{E}[M_T] = \mathbb{E}[M_0]$; then $M$ is a martingale.

Proof. For $s < t$ and $A \in \mathcal{F}_s$ define
\[
T(\omega) \triangleq
\begin{cases}
s & \text{if } \omega \in A, \\
t & \text{if } \omega \in A^c.
\end{cases}
\]
Then $T$ is a stopping time and
\[
\mathbb{E}[M_0] = \mathbb{E}[M_T] = \mathbb{E}[M_s 1_A] + \mathbb{E}[M_t 1_{A^c}],
\]
and trivially for the stopping time $t$,
\[
\mathbb{E}[M_0] = \mathbb{E}[M_t] = \mathbb{E}[M_t 1_A] + \mathbb{E}[M_t 1_{A^c}],
\]
so $\mathbb{E}[M_t 1_A] = \mathbb{E}[M_s 1_A]$, which implies that $M_s = \mathbb{E}[M_t \mid \mathcal{F}_s]$ a.s., which together with the integrability condition implies $M$ is a martingale. □

By a straightforward change to this proof the following corollary may be established.

Corollary B.3. Let $\{M_t, t \ge 0\}$ be a càdlàg adapted process such that for each stopping time (potentially infinite) $T$, $\mathbb{E}[|M_T|] < \infty$ and $\mathbb{E}[M_T] = \mathbb{E}[M_0]$; then $M$ is a uniformly integrable martingale.

Definition B.4. Let $M$ be a stochastic process. If $M_0$ is $\mathcal{F}_0$-measurable and there exists an increasing sequence $T_n$ of stopping times such that $T_n \to \infty$ a.s. and such that
\[
M^{T_n} = \{M_{t \wedge T_n} - M_0, \ t \ge 0\}
\]
is a $\mathcal{F}_t$-adapted martingale for each $n \in \mathbb{N}$, then $M$ is called a local martingale and the sequence $T_n$ is called a reducing sequence for the local martingale $M$. The initial condition $M_0$ is treated separately to avoid imposing integrability conditions on $M_0$.
B.2 Itô Integral

The stochastic integrals which arise in this book are the integrals of stochastic processes with respect to continuous local martingales. The following section contains a very brief overview of the construction of the Itô integral in this context and the necessary conditions on the integrands for the integral to be well defined. The results are presented starting from the previsible integrands, since in the general theory of stochastic integration these form the natural class of integrands. The results then extend, in the case of continuous martingale integrators, to integrands in the class of progressively measurable processes. If, in addition, the quadratic variation of the continuous martingale is absolutely continuous with respect to Lebesgue measure (as for example in the case of integrals with respect to Brownian motion), then this extends further to all adapted, jointly measurable processes. It is also possible to construct directly the stochastic integral with a continuous martingale integrator on the space of progressively measurable processes (this approach is followed in e.g. Ethier and Kurtz [95]). There are numerous references which describe the material in this section in much greater detail; examples include Chung and Williams [53], Karatzas and Shreve [149], Protter [247] and Dellacherie and Meyer [79].
Definition B.5. The previsible (predictable) σ-algebra, denoted $\mathcal{P}$, is the σ-algebra of subsets of $[0, \infty) \times \Omega$ generated by left continuous processes valued in $\mathbb{R}$; that is, it is the smallest σ-algebra with respect to which all left continuous processes are measurable. A process is said to be previsible if it is $\mathcal{P}$-measurable.

Lemma B.6. Let $\mathcal{A}$ be the ring† of subsets of $[0, \infty) \times \Omega$ generated by the sets of the form $\{(s, t] \times A\}$ where $A \in \mathcal{F}_s$ and $0 \le s < t$ and the sets $\{0\} \times A$ for $A \in \mathcal{F}_0$. Then $\sigma(\mathcal{A}) = \mathcal{P}$.

Proof. It suffices to show that any adapted left continuous process (as a generator of $\mathcal{P}$) can be approximated by finite linear combinations of indicator functions of elements of $\mathcal{A}$. Let $H$ be a bounded adapted left continuous process; define
\[
H_t = \lim_{k \to \infty} \lim_{n \to \infty} \sum_{i=2}^{nk} H_{(i-1)/n} \, 1_{((i-1)/n, \, i/n]}(t).
\]
As $H_t$ is adapted it follows that $H_{(i-1)/n} \in \mathcal{F}_{(i-1)/n}$; thus each term in the sum is $\mathcal{A}$-measurable, and therefore by linearity so is the whole sum. □

Definition B.7. Define the vector space of elementary functions $\mathcal{E}$ to be the space of finite linear combinations of indicator functions of elements of $\mathcal{A}$.

Definition B.8. For the indicator function $X = 1_{\{(s,t] \times A\}}$ for $A \in \mathcal{F}_s$, which is an element of $\mathcal{E}$, we can define the stochastic integral
\[
\int_0^\infty X_r \, dM_r \triangleq 1_A (M_t - M_s).
\]
For $X = 1_{\{0\} \times A}$ where $A \in \mathcal{F}_0$, define the integral to be identically zero. This definition can be extended by linearity to the space of elementary functions $\mathcal{E}$. Further define the integral between $0$ and $t$ by
\[
\int_0^t X_r \, dM_r \triangleq \int_0^\infty 1_{[0,t]}(r) X_r \, dM_r.
\]

Lemma B.9. If $M$ is a martingale and $X \in \mathcal{E}$ then $\int_0^t X_r \, dM_r$ is a $\mathcal{F}_t$-adapted martingale.
Proof. Consider $X_t = 1_A 1_{(r,s]}(t)$ where $A \in \mathcal{F}_r$. From Definition B.8,
\[
\int_0^t X_p \, dM_p = \int_0^\infty 1_{[0,t]}(p) X_p \, dM_p = 1_A (M_{s \wedge t} - M_{r \wedge t}),
\]
and hence as $M$ is a martingale and $A \in \mathcal{F}_r$, then by considering separately the cases $0 \le p \le r$, $r < p \le s$ and $p > s$ it follows that
\[
\mathbb{E}\left[\int_0^t X_r \, dM_r \,\Big|\, \mathcal{F}_p\right] = \mathbb{E}\left[1_A (M_{s \wedge t} - M_{r \wedge t}) \mid \mathcal{F}_p\right] = 1_A \mathbb{E}(M_{s \wedge p} - M_{r \wedge p}) = \int_0^p X_s \, dM_s.
\]
By linearity, this result extends to $X \in \mathcal{E}$. □

† A ring is a class of subsets closed under finite unions and set differences $A \setminus B$ and which contains the empty set.
B.2.1 Quadratic Variation

The total variation is the variation which is used in the construction of the usual Lebesgue–Stieltjes integral. This cannot be used to define a non-trivial stochastic integral, as any continuous local martingale of finite variation is indistinguishable from zero.

Definition B.10. The quadratic variation process† $\langle M \rangle_t$ of a continuous square integrable martingale $M$ is a continuous increasing process $A_t$ starting from zero such that $M_t^2 - A_t$ is a martingale.

Theorem B.11. If $M$ is a continuous square integrable martingale then the quadratic variation process $\langle M \rangle_t$ exists and is unique.

The following proof is based on Theorem 4.3 of Chung and Williams [53], who attribute the argument to M. J. Sharpe.

Proof. Without loss of generality consider a martingale starting from zero. The result is first proved for a martingale which is bounded by $C$. For given $n \in \mathbb{N}$, define $t^n_j \triangleq j 2^{-n}$ and $\hat{t}^n_j \triangleq t \wedge t^n_j$ for $j \in \mathbb{N}$ and
\[
S^n_t \triangleq \sum_{j=0}^{\infty} \left(M_{\hat{t}^n_{j+1}} - M_{\hat{t}^n_j}\right)^2.
\]
By rearrangement of terms in the summations
\[
M_t^2 = \sum_{k=0}^{\infty} \left(M^2_{\hat{t}^n_{k+1}} - M^2_{\hat{t}^n_k}\right)
= 2 \sum_{k=0}^{\infty} M_{\hat{t}^n_k} \left(M_{\hat{t}^n_{k+1}} - M_{\hat{t}^n_k}\right) + \sum_{k=0}^{\infty} \left(M_{\hat{t}^n_{k+1}} - M_{\hat{t}^n_k}\right)^2.
\]
Therefore
\[
S^n_t = M_t^2 - 2 \sum_{k=0}^{\infty} M_{\hat{t}^n_k} \left(M_{\hat{t}^n_{k+1}} - M_{\hat{t}^n_k}\right). \tag{B.1}
\]

† Technically, if we were to study discontinuous processes what is being constructed here should be denoted $[M]_t$. The process $\langle M \rangle_t$, when it exists, is the dual previsible projection of $[M]_t$. In the continuous case, the two processes coincide, and historical precedent makes $\langle M \rangle_t$ the more common notation.
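For fixed $n$ and $t$ the summation in (B.1) contains a finite number of non-zero terms, each of which is a continuous martingale. It therefore follows that $S^n_t - M_t^2$ is a continuous martingale for each $n$. It is now necessary to show that as $n \to \infty$, for fixed $t$, the sequence $\{S^n_t, n \in \mathbb{N}\}$ is a Cauchy sequence and therefore converges in $L^2$. If we consider fixed $m < n$ and for notational convenience write $t_j$ for $t^n_j$, then it is possible to relate the points on the two dyadic meshes by setting $t'_j = 2^{-m}[t_j 2^m]$ and $\hat{t}'_j \triangleq t \wedge t'_j$; that is, $t'_j$ is the closest point on the coarser mesh to the left of $t_j$. It follows from (B.1) that
\[
S^n_t - S^m_t = -2 \sum_{j=0}^{[2^n t]} \left(M_{\hat{t}_j} - M_{\hat{t}'_j}\right)\left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right). \tag{B.2}
\]
Define $Z_j \triangleq M_{\hat{t}_j} - M_{\hat{t}'_j}$; as $t'_j \le t_j$ it follows that $Z_j$ is $\mathcal{F}_{\hat{t}_j}$-measurable. For $j < k$, since $Z_j (M_{\hat{t}_{j+1}} - M_{\hat{t}_j}) Z_k$ is $\mathcal{F}_{\hat{t}_k}$-measurable it follows that
\[
\mathbb{E}\left[Z_j \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right) Z_k \left(M_{\hat{t}_{k+1}} - M_{\hat{t}_k}\right)\right] = 0. \tag{B.3}
\]
Hence using (B.3) and the Cauchy–Schwarz inequality
\[
\begin{aligned}
\mathbb{E}\left[(S^n_t - S^m_t)^2\right]
&= 4\,\mathbb{E}\left[\sum_{j=0}^{[2^n t]} Z_j^2 \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2\right] \\
&\le 4\,\mathbb{E}\left[\sup_{\substack{0 \le r \le s \le t \\ s - r < 2^{-m}}} (M_r - M_s)^2 \sum_{j=0}^{[2^n t]} \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2\right] \\
&\le 4 \sqrt{\mathbb{E}\left[\Bigg(\sup_{\substack{0 \le r \le s \le t \\ s - r < 2^{-m}}} (M_r - M_s)^2\Bigg)^{2}\right]}
\times \sqrt{\mathbb{E}\left[\Bigg(\sum_{j=0}^{[2^n t]} \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2\Bigg)^{2}\right]}.
\end{aligned}
\]
The first term tends to zero using the fact that $M$, being continuous, is uniformly continuous on the bounded time interval $[0, t]$. It remains to show that the second term is bounded. Write $a_j \triangleq (M_{\hat{t}_{j+1}} - M_{\hat{t}_j})^2$ for $j \in \mathbb{N}$; then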
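\[
\begin{aligned}
\mathbb{E}\left[\Bigg(\sum_{j=0}^{[2^n t]} \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2\Bigg)^{2}\right]
&= \mathbb{E}\left[\Bigg(\sum_{j=0}^{[2^n t]} a_j\Bigg)^{2}\right] \\
&= \mathbb{E}\left[\sum_{j=0}^{[2^n t]} a_j^2 + 2 \sum_{j=0}^{[2^n t]} a_j \sum_{k=j+1}^{[2^n t]} a_k\right] \\
&= \mathbb{E}\left[\sum_{j=0}^{[2^n t]} a_j^2\right] + 2\,\mathbb{E}\left[\sum_{j=0}^{[2^n t]} a_j \, \mathbb{E}\left[\sum_{k=j+1}^{[2^n t]} a_k \,\Big|\, \mathcal{F}_{\hat{t}_{j+1}}\right]\right].
\end{aligned}
\]
It is clear that since the $a_j$s are non-negative and $M$ is bounded by $C$ that
\[
\sum_{j=0}^{[2^n t]} a_j^2 \le \left(\max_{l = 0, \ldots, [2^n t]} a_l\right) \sum_{j=0}^{[2^n t]} a_j \le 4C^2 \sum_{j=0}^{[2^n t]} a_j
\]
and
\[
\begin{aligned}
\mathbb{E}\left[\sum_{k=j+1}^{[2^n t]} a_k \,\Big|\, \mathcal{F}_{\hat{t}_{j+1}}\right]
&= \mathbb{E}\left[\sum_{k=j+1}^{\infty} \left(M_{\hat{t}_{k+1}} - M_{\hat{t}_k}\right)^2 \,\Big|\, \mathcal{F}_{\hat{t}_{j+1}}\right] \\
&= \sum_{k=j+1}^{\infty} \mathbb{E}\left[M^2_{\hat{t}_{k+1}} - M^2_{\hat{t}_k} \mid \mathcal{F}_{\hat{t}_{j+1}}\right] \\
&= \mathbb{E}\left[M_t^2 - M^2_{\hat{t}_{j+1}} \mid \mathcal{F}_{\hat{t}_{j+1}}\right] \le C^2.
\end{aligned}
\]
From these two bounds
\[
\mathbb{E}\left[\Bigg(\sum_{j=0}^{[2^n t]} \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2\Bigg)^{2}\right]
\le (4C^2 + 2C^2)\,\mathbb{E}\left[\sum_{j=0}^{[2^n t]} a_j\right] = 6C^2\,\mathbb{E}\left[M_t^2\right] < \infty.
\]
As this bound holds uniformly in $n$, $m$, as $n$ and $m \to \infty$ it follows that $S^n_t - S^m_t \to 0$ in the $L^2$ sense and hence the sequence $\{S^n_t, n \in \mathbb{N}\}$ converges in $L^2$ to a limit which we denote $S_t$. As the martingale property is preserved by $L^2$ limits, it follows that $\{M_t^2 - S_t, t \ge 0\}$ is a martingale.

It is necessary to show that $S_t$ is increasing. Let $s < t$,
\[
S_t - S_s = \lim_{n \to \infty} (S^n_t - S^n_s) \quad \text{in } L^2.
\]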
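Then writing $k \triangleq \inf\{j : \hat{t}_j > s\}$,
\[
S^n_t - S^n_s = \sum_{t_j > s} \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2 + \left(M_{\hat{t}_k} - M_{\hat{t}_{k-1}}\right)^2 - \left(M_s - M_{\hat{t}_{k-1}}\right)^2.
\]
Clearly
\[
\left|\left(M_{\hat{t}_k} - M_{\hat{t}_{k-1}}\right)^2 - \left(M_s - M_{\hat{t}_{k-1}}\right)^2\right| \le 2 \sup_{\substack{0 \le r \le s \le t \\ s - r < 2^{-n}}} (M_r - M_s)^2,
\]
where the bound on the right-hand side tends to zero in $L^2$ as $n \to \infty$. Therefore in $L^2$
\[
S_t - S_s = \lim_{n \to \infty} \sum_{t_j > s} \left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)^2
\]
and hence $S_t - S_s \ge 0$ almost surely, so the process $S$ is a.s. increasing. It remains to show that a version of $S_t$ can be chosen which is almost surely continuous. By Doob's $L^2$-inequality applied to the martingale (B.2) it follows that
\[
\mathbb{E}\left[\sup_{t \le a} |S^n_t - S^m_t|^2\right] \le 4\,\mathbb{E}\left[(S^n_a - S^m_a)^2\right];
\]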
thus a suitable subsequence $n_k$ can be chosen such that $S^{n_k}_t$ converges a.s. uniformly on compact time intervals to a limit $S$ which from the continuity of $M$ must be continuous a.s.

Uniqueness follows from the result that a continuous local martingale of finite variation is everywhere zero. Suppose the process $A$ in the above definition were not unique. That is, suppose that also for some $B_t$ continuous increasing from zero, $M_t^2 - B_t$ is a martingale. Then as $M_t^2 - A_t$ is also a martingale, by subtracting these two equations we get that $A_t - B_t$ is a martingale, null at zero. It clearly must have finite variation, and hence be zero.

To extend to the general case where the martingale $M$ is not bounded use a sequence of stopping times
\[
T_n \triangleq \inf\{t \ge 0 : |M_t| > n\};
\]
then $\{M^{T_n}_t, t \ge 0\}$ is a bounded martingale to which the proof can be applied to construct $\langle M^{T_n} \rangle$. By uniqueness it follows that $\langle M^{T_n} \rangle$ and $\langle M^{T_{n+1}} \rangle$ agree on $[0, T_n]$ so a process $\langle M \rangle$ may be defined. □

Definition B.12. Define a measure on $([0, \infty) \times \Omega, \mathcal{P})$ in terms of the quadratic variation of $M$ via
\[
\mu_M(A) \triangleq \mathbb{E}\left[\int_0^\infty 1_A(s, \omega) \, d\langle M \rangle_s\right]. \tag{B.4}
\]
In terms of this measure we can define an associated norm on a $\mathcal{P}$-measurable process $X$ via
\[
\|X\|_M \triangleq \left(\int_{[0,\infty) \times \Omega} X^2 \, d\mu_M\right)^{1/2}. \tag{B.5}
\]
This norm can be written using (B.4) and (B.5) more simply as
\[
\|X\|_M^2 = \mathbb{E}\left[\int_0^\infty X_s^2 \, d\langle M \rangle_s\right].
\]
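The dyadic sums $S^n_t$ used in the proof of Theorem B.11 can be explored numerically. The sketch below is an illustration only, under the assumption that $M$ is a standard Brownian motion (for which $\langle M \rangle_t = t$) sampled on a fine dyadic grid of $[0, 1]$; the function names, the grid depth and the seed are hypothetical choices, not part of the text.

```python
import math
import random


def brownian_path(levels, t=1.0, seed=0):
    """Sample Brownian motion at the grid points j * t / 2**levels."""
    rng = random.Random(seed)
    n = 2 ** levels
    dt = t / n
    path = [0.0]
    for _ in range(n):
        # independent N(0, dt) increments
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def dyadic_sum(path, levels, n):
    """S^n: sum of squared increments over the mesh with 2**n intervals."""
    stride = 2 ** (levels - n)
    pts = path[::stride]           # subsample the fine path on the coarser mesh
    return sum((b - a) ** 2 for a, b in zip(pts, pts[1:]))


if __name__ == "__main__":
    levels = 14
    path = brownian_path(levels, seed=1)
    for n in (2, 6, 10, 14):
        print(n, dyadic_sum(path, levels, n))
```

For a single sampled path the sums on coarse meshes fluctuate considerably, while on fine meshes they settle near $t = 1$; this mirrors the $L^2$ convergence established in the proof, although a simulation on a finite grid is of course no substitute for it.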
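Definition B.13. Define $\mathcal{L}^2_{\mathcal{P}} \triangleq \{X \in \mathcal{P} : \|X\|_M < \infty\}$. This space $\mathcal{L}^2_{\mathcal{P}}$ with associated norm $\|\cdot\|_M$ is a Banach space. Denote by $L^2_{\mathcal{P}}$ the space of equivalence classes of elements of $\mathcal{L}^2_{\mathcal{P}}$, where we consider the equivalence class of an element $X$ to be all those elements $Y \in \mathcal{L}^2_{\mathcal{P}}$ which satisfy $\|X - Y\|_M = 0$.

Lemma B.14. The space of bounded elements of $\mathcal{E}$, which we denote $\bar{\mathcal{E}}$, is dense in the subspace of bounded functions in $L^2_{\mathcal{P}}$.

Proof. This is a classical monotone class theorem proof, which explains the requirement to work within spaces of bounded functions. Define
\[
\mathcal{C} = \left\{H \in \mathcal{P} : H \text{ is bounded}, \ \forall \varepsilon > 0 \ \exists J \in \bar{\mathcal{E}} : \|H - J\|_M < \varepsilon\right\}.
\]
It is clear that $\bar{\mathcal{E}} \subset \mathcal{C}$. Thus it also follows that the constant function one is included in $\mathcal{C}$. The fact that $\mathcal{C}$ is a vector space is immediate. It remains to verify that if $H_n \uparrow H$ where $H_n \in \mathcal{C}$ with $H$ bounded then this implies $H \in \mathcal{C}$. Fix $\varepsilon > 0$. By the bounded convergence theorem for Stieltjes integrals, it follows that $\|H_n - H\|_M \to 0$ as $n \to \infty$; thus we can find $N$ such that for $n \ge N$, $\|H_n - H\|_M < \varepsilon/2$. As $H_N \in \mathcal{C}$, it follows that there exists $J \in \bar{\mathcal{E}}$ such that $\|J - H_N\|_M < \varepsilon/2$. Thus by the triangle inequality $\|H - J\|_M \le \|H - H_N\|_M + \|H_N - J\|_M < \varepsilon$. Hence by the monotone class theorem $\sigma(\bar{\mathcal{E}}) \subset \mathcal{C}$. □

Lemma B.15. For $X \in \mathcal{E}$ it follows that
\[
\mathbb{E}\left[\left(\int_0^\infty X_r \, dM_r\right)^2\right] = \|X\|_M^2.
\]

Proof. Consider $X = 1_{(s,t] \times A}$ where $A \in \mathcal{F}_s$ and $s < t$. Then
\[
\begin{aligned}
\mathbb{E}\left[\left(\int_0^\infty X_r \, dM_r\right)^2\right]
&= \mathbb{E}\left[\left(\int_0^\infty 1_{(s,t]}(r) 1_A \, dM_r\right)^2\right] \\
&= \mathbb{E}\left[(M_t - M_s)^2 1_A\right] \\
&= \mathbb{E}\left[1_A \left(M_t^2 - 2 M_t M_s + M_s^2\right)\right] \\
&= \mathbb{E}\left[1_A \left(M_t^2 + M_s^2\right)\right] - 2\,\mathbb{E}\left[1_A \, \mathbb{E}[M_t M_s \mid \mathcal{F}_s]\right] \\
&= \mathbb{E}\left[1_A \left(M_t^2 - M_s^2\right)\right].
\end{aligned}
\]
Then from the definition of $\mu_M$ it follows that
\[
\mu_M((s, t] \times A) = \mathbb{E}\left[1_A \left(\langle M \rangle_t - \langle M \rangle_s\right)\right].
\]
We know $M_t^2 - \langle M \rangle_t$ is a local martingale, so it follows that
\[
\mu_M((s, t] \times A) = \mathbb{E}\left[\left(\int_0^\infty 1_{(s,t]}(r) 1_A \, dM_r\right)^2\right]
\]
and by linearity this extends to functions in $\mathcal{E}$. □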
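The isometry of Lemma B.15 has an exact discrete-time analogue that can be checked by brute force. The sketch below is an assumption-laden illustration rather than the construction of the text: it replaces $M$ by a simple symmetric random walk (so that the discrete bracket has unit increments), takes an integrand $X_k$ that depends only on $M_{k-1}$ (the discrete counterpart of previsibility), and averages over all $2^n$ equally likely paths. The name `isometry_sides` and the example integrand are hypothetical.

```python
from itertools import product


def isometry_sides(n, integrand):
    """Compare E[(sum_k X_k dM_k)^2] with E[sum_k X_k^2 d<M>_k].

    M is a simple symmetric random walk with steps +1/-1, so <M>_k = k.
    integrand(k, m) returns X_k as a function of M_{k-1} = m (previsible).
    Expectations are exact averages over all 2**n paths.
    """
    lhs = rhs = 0.0
    for steps in product((-1, 1), repeat=n):
        m = 0
        integral = 0.0
        energy = 0.0
        for k, step in enumerate(steps, start=1):
            x = integrand(k, m)      # uses only the walk up to time k-1
            integral += x * step     # X_k * (M_k - M_{k-1})
            energy += x * x          # X_k^2 * (<M>_k - <M>_{k-1}), increment is 1
            m += step
        lhs += integral ** 2
        rhs += energy
    return lhs / 2 ** n, rhs / 2 ** n


if __name__ == "__main__":
    print(isometry_sides(8, lambda k, m: m + 0.5 * k))
```

The two sides agree because the cross terms $X_j \Delta M_j X_k \Delta M_k$ with $j < k$ have zero mean, which is the same cancellation used for elementary integrands in the continuous-time argument. An integrand that peeks at the current step would in general break the equality.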
As a consequence of Lemma B.14 it follows that given any bounded $X \in L^2_{\mathcal{P}}$ we can construct an approximating sequence $X^n \in \mathcal{E}$ such that $\|X^n - X\|_M \to 0$ as $n \to \infty$. Using Lemma B.15 it follows that $\int_0^\infty X^n_s \, dM_s$ is a Cauchy sequence in the $L^2$ sense; thus we can make the following definition.

Definition B.16. For $X \in L^2_{\mathcal{P}}$ we may define the Itô integral in the $L^2$ sense through the isometry
\[
\mathbb{E}\left[\left(\int_0^\infty X_r \, dM_r\right)^2\right] = \|X\|_M^2. \tag{B.6}
\]
We must check that this extension of the stochastic integral is well defined. That is, consider another approximating sequence $Y^n \to X$; we must show that this converges to the same limit as the sequence $X^n$ considered previously, but this is immediate from the isometry.

Remark B.17. From the above definition of the stochastic integral in an $L^2$ sense as a limit of approximations $\int_0^\infty X^n_r \, dM_r$, it follows that since convergence in $L^2$ implies convergence in probability we can also define the extension of the stochastic integral as a limit in probability. By a standard result, there exists a subsequence $n_k$ such that $\int_0^\infty X^{n_k}_r \, dM_r$ converges a.s. as $k \to \infty$. It might appear that this would lead to a pathwise extension (i.e. a definition for each $\omega$). However, this a.s. limit is not well defined: different choices of approximating sequence can give rise to limits which differ on (potentially different) null sets. As there are an uncountable number of possible approximating sequences the union of these null sets may not be null and thus the limit is not well defined.

The following theorem finds numerous applications throughout the book, usually to show that the expectation of a particular stochastic integral term is 0.

Theorem B.18. If $X \in L^2_{\mathcal{P}}$ and $M$ is a square integrable martingale then $\int_0^t X_s \, dM_s$ is a martingale.

Proof. Let $X^n \in \mathcal{E}$ be a sequence converging to $X$ in the $\|\cdot\|_M$ norm; then by Lemma B.9 each $\int_0^t X^n_s \, dM_s$ is a martingale. By the Itô isometry $\int_0^t X^n_s \, dM_s$ converges to $\int_0^t X_s \, dM_s$ in $L^2$ and the martingale property is preserved by $L^2$ limits. □
B.2.2 Continuous Integrator

The foregoing arguments cannot be used to extend the definition of the stochastic integral to integrands outside of the class of previsible processes. For example, the previsible processes do not form a dense set in the space of progressively measurable processes, so approximation arguments cannot be used to extend the definition to progressively measurable integrands. The approach taken here is based on Chung and Williams [53].

Let $\tilde{\mu}_M$ be a measure on $[0, \infty) \times \Omega$ which is an extension of $\mu_M$ (that is, $\mu_M$ and $\tilde{\mu}_M$ agree on $\mathcal{P}$ and $\tilde{\mu}_M$ is defined on a larger σ-algebra than $\mathcal{P}$). Given a process $X$ which is $\mathcal{B} \times \mathcal{F}$-measurable, if there is a previsible process $Z$ such that
\[
\int_{[0,\infty) \times \Omega} (X - Z)^2 \, d\tilde{\mu}_M = 0, \tag{B.7}
\]
which, by the usual Lebesgue argument, is equivalent to
\[
\tilde{\mu}_M\left((t, \omega) : X_t(\omega) \ne Z_t(\omega)\right) = 0,
\]
then we may define $\int_0^\infty X_s \, dM_s \triangleq \int_0^\infty Z_s \, dM_s$. In general we cannot hope to find such a $Z$ for all $\mathcal{B} \times \mathcal{F}$-measurable $X$. However, in the case where the integrator $M$ is continuous we can find such a previsible $Z$ for all progressively measurable $X$.

Let $\tilde{\mathcal{N}}$ be the set of $\tilde{\mu}_M$ null sets and define $\tilde{\mathcal{P}} = \mathcal{P} \vee \tilde{\mathcal{N}}$; then it follows that for $X$ a $\tilde{\mathcal{P}}$-measurable process, we can find a process $Z$ in $\mathcal{P}$ such that $\tilde{\mu}_M((t, \omega) : X_t(\omega) \ne Z_t(\omega)) = 0$. Hence (B.7) will hold and consequently we may define $\int_0^\infty X_s \, dM_s \triangleq \int_0^\infty Z_s \, dM_s$. The following theorem is an important application of this result.

Theorem B.19. Let $M$ be a continuous martingale. Then if $X$ is progressively measurable we can define the integral of $X$ with respect to $M$ in the Itô sense through the extension of the isometry
\[
\mathbb{E}\left[\left(\int_0^\infty X_s \, dM_s\right)^2\right] = \mathbb{E}\left[\int_0^\infty X_s^2 \, d\tilde{\mu}_M\right].
\]
Proof. From the foregoing remarks, it is clear that it is sufficient to show that every progressively measurable process $X$ is $\tilde{\mathcal{P}}$-measurable. There are two approaches to establishing this: one is direct via the previsible projection and the other indirect via the optional projection. In either case, the result of Lemma B.21 is established, and the conclusion of the theorem follows. □

Optional Projection Route

We begin with a measurability result which we need in the proof of the main result in this section.
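Lemma B.20. If $X$ is progressively measurable and $T$ is a stopping time, then $X_T 1_{\{T < \infty\}}$ is $\mathcal{F}_T$-measurable.

Proof. For fixed $t$ the map $(s, \omega) \mapsto X(s, \omega)$ defined on $[0, t] \times \Omega$ is $\mathcal{B}[0, t] \otimes \mathcal{F}_t$-measurable. Since $T$ is a stopping time $\omega \mapsto T(\omega) \wedge t$ is $\mathcal{F}_t$-measurable. By composition of functions† it follows that $\omega \mapsto X(T(\omega) \wedge t, \omega)$ is $\mathcal{F}_t$-measurable. Now define $Y = X_T 1_{\{T < \infty\}}$; for any $t$ it is clear that $Y 1_{\{T \le t\}} = X_{T \wedge t} 1_{\{T \le t\}}$. Hence on $\{T \le t\}$ it follows that $Y$ is $\mathcal{F}_t$-measurable, which by the definition of $\mathcal{F}_T$ implies that $Y$ is $\mathcal{F}_T$-measurable. □

Lemma B.21. The set of progressively measurable functions on $[0, \infty) \times \Omega$ is contained in $\tilde{\mathcal{P}}$.

Proof. First we must show that all optional processes are $\tilde{\mathcal{P}}$-measurable. This is straightforward: if $\tau$ is a stopping time we must show that $1_{[0,\tau)}$ is $\tilde{\mathcal{P}}$-measurable. But $1_{[0,\tau]}$ is previsible and thus automatically $\tilde{\mathcal{P}}$-measurable, hence it is sufficient to establish that $[\tau] \triangleq \{(\tau(\omega), \omega) : \tau(\omega) < \infty, \omega \in \Omega\} \in \tilde{\mathcal{P}}$. But
\[
\tilde{\mu}_M([\tau]) = \mathbb{E}\left[\int_0^\infty 1_{\{\tau(\omega) = s\}} \, d\langle M \rangle_s\right] = \mathbb{E}\left[\langle M \rangle_\tau - \langle M \rangle_{\tau-}\right] = 0;
\]
the final equality follows from the fact that $M_t$ is continuous.

Starting from a progressively measurable process $X$, by Theorem 2.7 we can construct its optional projection ${}^o X$. From (B.4),
\[
\tilde{\mu}_M\left((t, \omega) : {}^o X_t(\omega) \ne X_t(\omega)\right) = \mathbb{E}\left[\int_0^\infty 1_{\{{}^o X_s(\omega) \ne X_s(\omega)\}} \, d\langle M \rangle_s\right].
\]
Define $\tau_t = \inf\{s \ge 0 : \langle M \rangle_s > t\}$; since the set $(t, \infty) \times \Omega$ is progressively measurable, and $\langle M \rangle_t$ is continuous and hence progressively measurable, it follows that $\tau_t$ is a stopping time by the Début theorem (Theorem A.20). Hence,
\[
\begin{aligned}
\tilde{\mu}_M\left((t, \omega) : {}^o X_t(\omega) \ne X_t(\omega)\right)
&= \mathbb{E}\left[\int_0^\infty 1_{\{{}^o X_s(\omega) \ne X_s(\omega)\}} \, d\langle M \rangle_s\right] \\
&= \mathbb{E}\left[\int_0^{\langle M \rangle_\infty} 1_{\{{}^o X_{\tau_s}(\omega) \ne X_{\tau_s}(\omega)\}} \, ds\right] \\
&= \mathbb{E}\left[\int_0^\infty 1_{\{\tau_s < \infty\}} 1_{\{{}^o X_{\tau_s}(\omega) \ne X_{\tau_s}(\omega)\}} \, ds\right].
\end{aligned}
\]

† It is important to realise that this argument depends fundamentally on the progressive measurability of $X$; it is in fact the same argument which is used (e.g. in Rogers and Williams [248, Lemma II.73.11]) to show that for progressively measurable $X$, $X_T$ is $\mathcal{F}_T$-measurable for $T$ an $\mathcal{F}_t$-stopping time.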
Thus using Fubini's theorem
\[
\tilde{\mu}_M\left((t, \omega) : {}^o X_t(\omega) \ne X_t(\omega)\right)
= \mathbb{E}\left[\int_0^\infty 1_{\{\tau_s < \infty\}} 1_{\{{}^o X_{\tau_s} \ne X_{\tau_s}\}} \, ds\right]
= \int_0^\infty \mathbb{P}\left(\tau_s < \infty, \ {}^o X_{\tau_s} \ne X_{\tau_s}\right) ds.
\]
From Lemma B.20 it follows that for any stopping time $\tau$, $X_\tau 1_{\{\tau < \infty\}}$ is $\mathcal{F}_\tau$-measurable; thus from the definition of optional projection
\[
{}^o X_\tau 1_{\{\tau < \infty\}} = \mathbb{E}\left[X_\tau 1_{\{\tau < \infty\}} \mid \mathcal{F}_\tau\right] = X_\tau 1_{\{\tau < \infty\}} \qquad \mathbb{P}\text{-a.s.}
\]
Hence $\tilde{\mu}_M((t, \omega) : {}^o X_t(\omega) \ne X_t(\omega)) = 0$. But we have shown that the optional processes are $\tilde{\mathcal{P}}$-measurable, and ${}^o X$ is an optional process; thus from the definition of $\tilde{\mathcal{P}}$ there exists a previsible process $Z$ such that $\tilde{\mu}_M((t, \omega) : Z_t(\omega) \ne {}^o X_t(\omega)) = 0$; hence using these two results $\tilde{\mu}_M((t, \omega) : Z_t(\omega) \ne X_t(\omega)) = 0$, which implies that $X$ is $\tilde{\mathcal{P}}$-measurable. □

Previsible Projection Route

While the previous approach shows that the progressively measurable processes can be viewed as the class of integrands, the argument is not constructive. By considering the previsible projection we can provide a constructive argument. In brief, if $X$ is progressively measurable and $M$ is a continuous martingale then
\[
\int_0^\infty X_s \, dM_s = \int_0^\infty {}^p X_s \, dM_s,
\]
where ${}^p X$, the previsible projection of $X$, is a previsible process and the integral on the right-hand side is to be understood in the sense of Definition B.16.

Lemma B.22. If $X$ is progressively measurable and $T$ is a previsible time, then $X_T 1_{\{T < \infty\}}$ is $\mathcal{F}_{T-}$-measurable.

Proof. If $T$ is a previsible time then there exists an announcing sequence $T_n \uparrow T$ such that each $T_n$ is a stopping time. By Lemma B.20 it follows for each $n$ that $X_{T_n} 1_{\{T_n < \infty\}}$ is $\mathcal{F}_{T_n}$-measurable. Recall that
\[
\mathcal{F}_{T-} = \bigvee_n \mathcal{F}_{T_n},
\]
so if we define random variables $Y^n \triangleq X_{T_n} 1_{\{T_n < \infty\}}$ and
\[
Y \triangleq \liminf_{n \to \infty} Y^n,
\]
then it follows that $Y$ is $\mathcal{F}_{T-}$-measurable. □
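From the Début theorem, $\tau_t \triangleq \inf\{s \ge 0 : \langle M \rangle_s > t\}$ is a $\mathcal{F}_t$-stopping time. Therefore $\tau_{t - 1/n}$ is an increasing sequence of stopping times and their limit is
\[
\hat{\tau}_t \triangleq \inf\{s \ge 0 : \langle M \rangle_s \ge t\};
\]
therefore $\hat{\tau}_t$ is a previsible time. We can now complete the proof of Lemma B.21 using the definition of the previsible projection.

Proof. Starting from a progressively measurable process $X$, by Theorem A.29 we can construct its previsible projection ${}^p X$; from (B.4),
\[
\tilde{\mu}_M\left((t, \omega) : {}^p X_t(\omega) \ne X_t(\omega)\right) = \mathbb{E}\left[\int_0^\infty 1_{\{{}^p X_s(\omega) \ne X_s(\omega)\}} \, d\langle M \rangle_s\right].
\]
Using the previsible time $\hat{\tau}_t$,
\[
\begin{aligned}
\tilde{\mu}_M\left((t, \omega) : {}^p X_t(\omega) \ne X_t(\omega)\right)
&= \mathbb{E}\left[\int_0^\infty 1_{\{{}^p X_s(\omega) \ne X_s(\omega)\}} \, d\langle M \rangle_s\right] \\
&= \mathbb{E}\left[\int_0^{\langle M \rangle_\infty} 1_{\{{}^p X_{\hat{\tau}_s}(\omega) \ne X_{\hat{\tau}_s}(\omega)\}} \, ds\right] \\
&= \mathbb{E}\left[\int_0^\infty 1_{\{\hat{\tau}_s < \infty\}} 1_{\{{}^p X_{\hat{\tau}_s}(\omega) \ne X_{\hat{\tau}_s}(\omega)\}} \, ds\right].
\end{aligned}
\]
Thus using Fubini's theorem
\[
\tilde{\mu}_M\left((t, \omega) : {}^p X_t(\omega) \ne X_t(\omega)\right) = \int_0^\infty \mathbb{P}\left(\hat{\tau}_s < \infty, \ {}^p X_{\hat{\tau}_s} \ne X_{\hat{\tau}_s}\right) ds.
\]
From Lemma B.22 it follows that for any previsible time $\hat{\tau}$, $X_{\hat{\tau}} 1_{\{\hat{\tau} < \infty\}}$ is $\mathcal{F}_{\hat{\tau}-}$-measurable; thus from the definition of previsible projection
\[
{}^p X_{\hat{\tau}} 1_{\{\hat{\tau} < \infty\}} = \mathbb{E}\left[X_{\hat{\tau}} 1_{\{\hat{\tau} < \infty\}} \mid \mathcal{F}_{\hat{\tau}-}\right] = X_{\hat{\tau}} 1_{\{\hat{\tau} < \infty\}} \qquad \mathbb{P}\text{-a.s.}
\]
Hence $\tilde{\mu}_M((t, \omega) : {}^p X_t(\omega) \ne X_t(\omega)) = 0$. Therefore $X$ is $\tilde{\mathcal{P}}$-measurable. We also see that the previsible process $Z$ in (B.7) is just the previsible projection of $X$. □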
B.2.3 Integration by Parts Formula

The stochastic form of the integration by parts formula leads to Itô's formula, which is the most important result for practical computations.
Lemma B.23. Let $M$ be a continuous martingale. Then
\[
\langle M \rangle_t = M_t^2 - M_0^2 - 2 \int_0^t M_s \, dM_s.
\]

Proof. Following the argument and notation of the proof of Theorem B.11 define $X^n$ by
\[
X^n_s(\omega) \triangleq \sum_{j=0}^{\infty} M_{t_j}(\omega) 1_{(t_j, t_{j+1}]}(s);
\]
while $X^n$ is defined in terms of an infinite number of non-zero terms, it is clear that $1_{[0,t]}(s) X^n_s \in \mathcal{E}$. Therefore using Definition B.8,
\[
\begin{aligned}
S^n_t &= \sum_{j=0}^{\infty} \left(M^2_{\hat{t}_{j+1}} - M^2_{\hat{t}_j} - 2 M_{\hat{t}_j}\left(M_{\hat{t}_{j+1}} - M_{\hat{t}_j}\right)\right) \\
&= M_t^2 - M_0^2 - 2 \int_0^\infty 1_{[0,t]}(s) X^n_s \, dM_s.
\end{aligned}
\]
As the process $M$ is continuous, it is clear that for fixed $\omega$, $X^n(\omega) \to M(\omega)$ uniformly on compact subsets of time and therefore by bounded convergence, $\|X^n 1_{[0,t]} - M 1_{[0,t]}\|_M$ tends to zero. Thus by the Itô isometry (B.6) the result follows. □

Lemma B.24. Let $M$ and $N$ be square integrable martingales; then
\[
M_t N_t = M_0 N_0 + \int_0^t M_s \, dN_s + \int_0^t N_s \, dM_s + \langle M, N \rangle_t.
\]
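Proof. Apply the polarization identity $\langle M, N \rangle_t = (\langle M + N \rangle_t - \langle M - N \rangle_t)/4$ to the result of Lemma B.23, to give
\[
\begin{aligned}
\langle M, N \rangle_t = \frac{1}{4} \Bigg[ & (M_t + N_t)^2 - (M_0 + N_0)^2 - 2 \int_0^t (M_s + N_s) \, dM_s - 2 \int_0^t (M_s + N_s) \, dN_s \\
& - \left( (M_t - N_t)^2 - (M_0 - N_0)^2 - 2 \int_0^t (M_s - N_s) \, dM_s + 2 \int_0^t (M_s - N_s) \, dN_s \right) \Bigg] \\
= {} & M_t N_t - M_0 N_0 - \int_0^t N_s \, dM_s - \int_0^t M_s \, dN_s.
\end{aligned}
\]
□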
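The formula of Lemma B.24 is the continuous-time limit of an elementary summation-by-parts identity, which holds exactly for any two sequences. The sketch below illustrates that discrete identity only; the sampled paths are generic Gaussian random walks chosen for illustration, and the function names are hypothetical.

```python
import random


def by_parts_terms(M, N):
    """Return (M_n N_n - M_0 N_0, sum M dN, sum N dM, sum dM dN).

    All sums use the left endpoint of each interval, as in the
    approximating sums for the Ito integral.
    """
    idx = range(len(M) - 1)
    int_M_dN = sum(M[j] * (N[j + 1] - N[j]) for j in idx)
    int_N_dM = sum(N[j] * (M[j + 1] - M[j]) for j in idx)
    bracket = sum((M[j + 1] - M[j]) * (N[j + 1] - N[j]) for j in idx)
    return M[-1] * N[-1] - M[0] * N[0], int_M_dN, int_N_dM, bracket


def random_walk(n, seed):
    """A Gaussian random walk with a random starting point."""
    rng = random.Random(seed)
    path = [rng.gauss(0.0, 1.0)]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, 0.1))
    return path


if __name__ == "__main__":
    M, N = random_walk(500, 1), random_walk(500, 2)
    total, a, b, c = by_parts_terms(M, N)
    print(total, a + b + c)
```

Each increment satisfies $M_{j+1} N_{j+1} - M_j N_j = M_j \Delta N_j + N_j \Delta M_j + \Delta M_j \Delta N_j$, and the sum telescopes. The last term, the sum of products of increments, is the discrete counterpart of $\langle M, N \rangle_t$; it does not vanish in the limit for martingales, which is what distinguishes the stochastic formula from the classical one.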
B.2.4 Itô's Formula

Theorem B.25. If $X$ is an $\mathbb{R}^d$-valued semimartingale and $f \in C^2(\mathbb{R}^d)$ then
\[
f(X_t) = f(X_0) + \sum_{i=1}^{d} \int_0^t \frac{\partial}{\partial x^i} f(X_s) \, dX^i_s + \frac{1}{2} \sum_{i,j=1}^{d} \int_0^t \frac{\partial^2}{\partial x^i \partial x^j} f(X_s) \, d\langle X^i, X^j \rangle_s.
\]
The continuity condition on $f$ in the statement of Itô's lemma is important; if it does not hold then the local time of $X$ must be considered (see for example Chapter 7 of Chung and Williams [53] or Section IV.43 of Rogers and Williams [249]).

Proof. We sketch a proof for $d = 1$. The finite variation case is the standard fundamental theorem of calculus for Stieltjes integration. Consider the case of $M$ a martingale. The proof is carried out by showing it holds for $f(x) = x^k$ for all $k$; by linearity it then holds for all polynomials and by a standard approximation argument for all $f \in C^2(\mathbb{R})$. To establish the result for polynomials proceed by induction. Suppose it holds for functions $f$ and $g$; then by Lemma B.24,
\[
\begin{aligned}
d(f(M_t) g(M_t)) &= f(M_t) \, dg(M_t) + g(M_t) \, df(M_t) + d\langle f(M_t), g(M_t) \rangle_t \\
&= f(M_t)\left(g'(M_t) \, dM_t + \tfrac{1}{2} g''(M_t) \, d\langle M \rangle_t\right) \\
&\quad + g(M_t)\left(f'(M_t) \, dM_t + \tfrac{1}{2} f''(M_t) \, d\langle M \rangle_t\right) + g'(M_t) f'(M_t) \, d\langle M \rangle_t.
\end{aligned}
\]
Since the result clearly holds for $f(x) = x$, it follows that it holds for all polynomials. The extension to $C^2(\mathbb{R})$ functions follows from a standard approximation argument (see e.g. Rogers and Williams [249] for details). □

B.2.5 Localization

The integral may be extended to a larger class of integrands by the procedure of localization. Let $H$ be a progressively measurable process. Define a non-decreasing sequence of stopping times
\[
T_n \triangleq \inf\left\{t \ge 0 : \int_0^t H_s^2 \, d\langle M \rangle_s > n\right\}; \tag{B.8}
\]
then it is clear that the process $H^{T_n}_t \triangleq H_{t \wedge T_n}$ is in the space $L^2_{\mathcal{P}}$. Thus the stochastic integral $\int_0^\infty H^{T_n}_s \, dM_s$ is defined in the Itô sense of Definition B.16.

Theorem B.26. If for all $t \ge 0$,
\[
\mathbb{P}\left(\int_0^t H_s^2 \, d\langle M \rangle_s < \infty\right) = 1, \tag{B.9}
\]
then we may define the stochastic integral
\[
\int_0^\infty H_s \, dM_s \triangleq \lim_{n \to \infty} \int_0^\infty H^{T_n}_s \, dM_s.
\]

Proof. Under condition (B.9) the sequence of stopping times $T_n$ defined in (B.8) tends to infinity $\mathbb{P}$-a.s. It is straightforward to verify that this is well defined; that is, different choices of sequence $T_n$ tending to infinity give rise to the same limit. □

This general definition of integral is then a local martingale. We can similarly extend to integrators $M$ which are local martingales by using the minimum of a reducing sequence $R_n$ for the local martingale $M$ and the sequence $T_n$ above.
B.3 Stochastic Calculus

A very useful result can be proved using the Itô calculus about the characterisation of Brownian motion, due to Lévy.

Theorem B.27. Let $\{B^i_t\}_{t \ge 0}$ be continuous local martingales starting from zero for $i = 1, \ldots, n$. Then $B_t = (B^1_t, \ldots, B^n_t)$ is a Brownian motion with respect to $(\Omega, \mathcal{F}, \mathbb{P})$ adapted to the filtration $\mathcal{F}_t$, if and only if
\[
\langle B^i, B^j \rangle_t = \delta_{ij} t \qquad \forall i, j \in \{1, \ldots, n\}.
\]

Proof. In these circumstances it follows that the statement that $B_t$ is a Brownian motion is by definition equivalent to stating that $B_t - B_s$ is independent of $\mathcal{F}_s$ and is distributed normally with mean zero and covariance matrix $(t - s)I$. Clearly if $B_t$ is a Brownian motion then the covariation result follows trivially from the definitions. To establish the converse, we assume $\langle B^i, B^j \rangle_t = \delta_{ij} t$ for $i, j \in \{1, \ldots, n\}$ and prove that $B_t$ is a Brownian motion. Observe that for fixed $\theta \in \mathbb{R}^n$ we can define $M^\theta_t$ by
\[
M^\theta_t = f(B_t, t) \triangleq \exp\left(i \theta^\top B_t + \tfrac{1}{2} \|\theta\|^2 t\right).
\]
By application of Itô's formula to $f$ we obtain (in differential form using the Einstein summation convention)
\[
\begin{aligned}
d(f(B_t, t)) &= \frac{\partial f}{\partial x^j}(B_t, t) \, dB^j_t + \frac{\partial f}{\partial t}(B_t, t) \, dt + \frac{1}{2} \frac{\partial^2 f}{\partial x^j \partial x^k}(B_t, t) \, d\langle B^j, B^k \rangle_t \\
&= i \theta_j f(B_t, t) \, dB^j_t + \tfrac{1}{2} \|\theta\|^2 f(B_t, t) \, dt - \tfrac{1}{2} \theta_j \theta_k \delta_{jk} f(B_t, t) \, dt \\
&= i \theta_j f(B_t, t) \, dB^j_t.
\end{aligned}
\]
Hence
\[
M^\theta_t = 1 + \int_0^t d(f(B_s, s)),
\]
and is a sum of stochastic integrals with respect to continuous local martingales and is hence itself a continuous local martingale. But for each $t$, using $|\cdot|$ to denote the complex modulus,
\[
|M^\theta_t| = \exp\left(\tfrac{1}{2} \|\theta\|^2 t\right) < \infty.
\]
Hence for any fixed time $t_0$, the stopped process $M^{t_0}_t \triangleq M^\theta_{t \wedge t_0}$ satisfies
\[
|M^{t_0}_t| \le |M^{t_0}_\infty| < \infty,
\]
and so is a bounded local martingale. Hence $\{M^{t_0}_t, t \ge 0\}$ is a genuine martingale. Thus for $0 \le s < t$ we have
\[
\mathbb{E}\left[\exp\left(i \theta^\top (B_t - B_s)\right) \mid \mathcal{F}_s\right] = \exp\left(-\tfrac{1}{2} (t - s) \|\theta\|^2\right) \qquad \text{a.s.}
\]
However, this is the characteristic function of a multivariate normal random variable distributed as $N(0, (t - s)I)$. Thus by the Lévy characteristic function theorem $B_t - B_s$ is an $N(0, (t - s)I)$ random variable. □

B.3.1 Girsanov's Theorem

Girsanov's theorem for the change of drift underlies many important results. The result has an important converse but this is not used here.

Theorem B.28. Let $M$ be a continuous martingale, and let $Z$ be the associated exponential martingale
\[
Z_t = \exp\left(M_t - \tfrac{1}{2} \langle M \rangle_t\right). \tag{B.10}
\]
If $Z$ is a uniformly integrable martingale, then a new measure $\mathbb{Q}$, equivalent to $\mathbb{P}$, may be defined by
\[
\frac{d\mathbb{Q}}{d\mathbb{P}} \triangleq Z_\infty.
\]
Furthermore, if $X$ is a continuous $\mathbb{P}$-local martingale then $X_t - \langle X, M \rangle_t$ is a $\mathbb{Q}$-local martingale.

Proof. Since $Z$ is a uniformly integrable martingale it follows from Theorem B.1 (martingale convergence) that $Z_t = \mathbb{E}[Z_\infty \mid \mathcal{F}_t]$. Hence $\mathbb{Q}$ constructed thus is a probability measure which is equivalent to $\mathbb{P}$. Now consider $X$, a $\mathbb{P}$-local martingale. Define a sequence of stopping times which tend to infinity via
\[
T_n \triangleq \inf\{t \ge 0 : |X_t| \ge n \text{ or } |\langle X, M \rangle_t| \ge n\}.
\]
Consider the process $Y$ defined via

$$Y_t \triangleq X_t^{T_n} - \langle X^{T_n}, M\rangle_t.$$

By Itô's formula applied to (B.10), $dZ_t = Z_t\,dM_t$; a second application of Itô's formula yields

$$\begin{aligned}
d(Z_t Y_t) &= 1_{t \le T_n}\left(Z_t\,dY_t + Y_t\,dZ_t + d\langle Z, Y\rangle_t\right) \\
&= 1_{t \le T_n}\left(Z_t(dX_t - d\langle X, M\rangle_t) + Y_t Z_t\,dM_t + d\langle Z, Y\rangle_t\right) \\
&= 1_{t \le T_n}\left(Z_t(dX_t - d\langle X, M\rangle_t) + (X_t - \langle X, M\rangle_t) Z_t\,dM_t + Z_t\,d\langle X, M\rangle_t\right) \\
&= 1_{t \le T_n}\left((X_t - \langle X, M\rangle_t) Z_t\,dM_t + Z_t\,dX_t\right),
\end{aligned}$$

where the result $d\langle Z, Y\rangle_t = Z_t\,d\langle X, M\rangle_t$ follows from the Kunita–Watanabe identity; hence $ZY$ is a $\mathbb{P}$-local martingale. But $Z$ is uniformly integrable and $Y$ is bounded (by construction of the stopping time $T_n$), hence $ZY$ is a genuine $\mathbb{P}$-martingale. Hence for $s < t$ and $A \in \mathcal{F}_s$, we have

$$\mathbb{E}_{\mathbb{Q}}[(Y_t - Y_s)1_A] = \mathbb{E}[Z_\infty (Y_t - Y_s)1_A] = \mathbb{E}[(Z_t Y_t - Z_s Y_s)1_A] = 0;$$

hence $Y$ is a $\mathbb{Q}$-martingale. Thus $X_t - \langle X, M\rangle_t$ is a $\mathbb{Q}$-local martingale, since $T_n$ is a reducing sequence such that $(X - \langle X, M\rangle)^{T_n}$ is a $\mathbb{Q}$-martingale, and $T_n \uparrow \infty$ as $n \to \infty$. □

Corollary B.29. Let $W_t$ be a $\mathbb{P}$-Brownian motion and define $\mathbb{Q}$ as in Theorem B.28; then $\tilde{W}_t = W_t - \langle W, M\rangle_t$ is a $\mathbb{Q}$-Brownian motion.

Proof. Since $W$ is a Brownian motion it follows that $\langle W, W\rangle_t = t$ for all $t \ge 0$. By Theorem B.28, $\tilde{W}$ is a $\mathbb{Q}$-local martingale. Since $\tilde{W}_t$ is continuous and $\langle \tilde{W}, \tilde{W}\rangle_t = \langle W, W\rangle_t = t$, it follows from Lévy's characterisation of Brownian motion (Theorem B.27) that $\tilde{W}$ is a $\mathbb{Q}$-Brownian motion. □

The form of Girsanov's theorem in Theorem B.28 is too restrictive for many applications of interest. In particular the requirement that the martingale $Z$ be uniformly integrable and the implied equivalence of $\mathbb{P}$ and $\mathbb{Q}$ on $\mathcal{F}$ rules out even such simple applications as transforming $X_t = \mu t + W_t$ to remove the constant drift. In this case the martingale $Z_t = \exp(\mu W_t - \tfrac{1}{2}\mu^2 t)$ is clearly not uniformly integrable. If we consider $A \in \mathcal{F}_\infty$ defined by

$$A = \left\{\lim_{t\to\infty} \frac{X_t - \mu t}{t} = 0\right\}, \tag{B.11}$$

it is clear that $\mathbb{P}(A) = 1$, yet under a measure $\mathbb{Q}$ under which $X$ has no drift $\mathbb{Q}(A) = 0$.
Since equivalent measures have the same null sets it would follow that if this measure which killed the drift were equivalent to P then A should also be null, a contradiction. Hence on F the measures P and Q cannot be equivalent.
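On a finite horizon the change of drift can be checked numerically. The following sketch is not from the text: it assumes a fixed horizon $T$, takes $M_t = -\mu W_t$ (the sign for which Theorem B.28 removes the drift of $X_t = \mu t + W_t$), and estimates $\mathbb{E}_{\mathbb{P}}[Z_T\, g(X_T)]$ by plain Monte Carlo for $g(x) = 1, x, x^2$. The function name `girsanov_moments` and all parameter values are illustrative.

```python
import math
import random


def girsanov_moments(mu=0.5, T=1.0, n_paths=100_000, seed=0):
    """Estimate E_P[Z_T], E_P[Z_T X_T] and E_P[Z_T X_T^2] by Monte Carlo.

    Under P, X_T = mu*T + W_T with W_T ~ N(0, T).  The weight
    Z_T = exp(-mu*W_T - mu^2*T/2) is the exponential martingale of
    M = -mu*W (an assumption of this sketch) evaluated at time T.
    """
    rng = random.Random(seed)
    s0 = s1 = s2 = 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(T))
        x = mu * T + w
        z = math.exp(-mu * w - 0.5 * mu * mu * T)
        s0 += z
        s1 += z * x
        s2 += z * x * x
    return s0 / n_paths, s1 / n_paths, s2 / n_paths


mass, mean, second = girsanov_moments()
print(mass, mean, second)
```

If the weights are right, the three estimates should be close to $1$, $0$ and $T$: the total mass, mean and second moment of a driftless Brownian motion at time $T$.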
If we consider restricting the definition of the measure $\mathbb{Q}$ to $\mathcal{F}_t$ for finite $t$ then the above problem is avoided. In the example given earlier, under $\mathbb{Q}^t$ the process $X$ restricted to $[0, t]$ is a Brownian motion with zero drift. This approach via a family of consistent measures is used in the change of measure approach to filtering, which is described in Chapter 3. Since we have just shown that there does not exist any measure equivalent to $\mathbb{P}$ under which $X$ is a Brownian motion on $[0, \infty)$ it is clear that we cannot, in general, find a measure $\mathbb{Q}$ defined on $\mathcal{F}_\infty$ such that the restriction of $\mathbb{Q}$ to $\mathcal{F}_t$ is $\mathbb{Q}^t$. Define a set function on $\bigcup_{0 \le t < \infty} \mathcal{F}_t$ by

$$\mathbb{Q}(A) = \mathbb{Q}^t(A), \qquad \forall A \in \mathcal{F}_t,\ \forall t \ge 0. \tag{B.12}$$

If we have a finite set $A_1, \dots, A_n$ of elements of $\bigcup_{0 \le t < \infty} \mathcal{F}_t$, then we can find $s$ such that $A_i \in \mathcal{F}_s$ for $i = 1, \dots, n$ and since $\mathbb{Q}^s$ is a probability measure it follows that the set function $\mathbb{Q}$ is finitely additive. It is immediate that $\mathbb{Q}(\emptyset) = 0$ and $\mathbb{Q}(\Omega) = 1$. It is not obvious whether $\mathbb{Q}$ is countably additive. If $\mathbb{Q}$ is countably additive, then Carathéodory's theorem allows us to extend the definition of $\mathbb{Q}$ to $\sigma\left(\bigcup_{0 \le t < \infty} \mathcal{F}_t\right) = \mathcal{F}_\infty$. This can be resolved in special situations by using Tulcea's theorem. The σ-algebras $\mathcal{F}_t$ are all defined on the same space, so the atom condition of Tulcea's theorem is non-trivial (contrast with the case of the product spaces used in the proof of the Daniell–Kolmogorov–Tulcea theorem), which explains why this extension cannot be carried out in general. The following corollary gives an important example where an extension is possible.

Corollary B.30. Let $\Omega = C([0, \infty), \mathbb{R}^d)$ and let $X_t$ be the canonical process on $\Omega$. Define $\mathcal{F}_t^o = \sigma(X_s : 0 \le s \le t)$. If

$$Z_t = \exp\left(M_t - \tfrac{1}{2}\langle M\rangle_t\right)$$

is a $\mathcal{F}_{t+}^o$-adapted martingale then there exists a unique measure $\mathbb{Q}$ on $(\Omega, \mathcal{F}_\infty^o)$ such that

$$\left.\frac{d\mathbb{Q}}{d\mathbb{P}}\right|_{\mathcal{F}_{t+}^o} = Z_t, \qquad \forall t,$$

and the process $X_t - \langle X, M\rangle_t$ is a $\mathbb{Q}$-local martingale with respect to $\{\mathcal{F}_{t+}^o\}_{t \ge 0}$.

Proof. We apply Theorem B.28 to the process $Z^t$, which is clearly a uniformly integrable martingale (since $Z_s^t = \mathbb{E}[Z_t^t \mid \mathcal{F}_{s+}^o]$). We may thus define a family $\mathbb{Q}^t$ of measures equivalent to $\mathbb{P}$ on $\mathcal{F}_{t+}^o$. It is clear that these measures are consistent; that is, for $s \le t$, $\mathbb{Q}^t$ restricted to $\mathcal{F}_{s+}^o$ is identical to $\mathbb{Q}^s$. For any sequence of times $t_1 < t_2 < \cdots$ such that $t_k \to \infty$ as $k \to \infty$, since the sample space $\Omega = C([0, \infty), \mathbb{R}^d)$ is a complete separable metric space, regular conditional probabilities in the sense of Definition 2.28 exist as a consequence of Exercise 2.29, and we may denote them $\mathbb{Q}^{t_k}(\cdot \mid \mathcal{F}_{t_{k-1}+}^o)$ for $k = 1, 2, \dots$.
The sequence of σ-algebras $\mathcal{F}_{t_k+}^o$ is clearly increasing. If we consider a sequence $A_k$ of atoms with each $A_k \in \mathcal{F}_{t_k+}^o$ such that $A_1 \supseteq A_2 \supseteq \cdots$, then using the fact that these are the unaugmented σ-algebras on the canonical sample space it follows that $\bigcap_{k=1}^\infty A_k \ne \emptyset$. Therefore, using these regular conditional probabilities as the transition kernels, we may now apply Tulcea's theorem A.11 to construct a measure $\mathbb{Q}$ on $\mathcal{F}_\infty^o$ which is consistent with $\mathbb{Q}^{t_k}$ on $\mathcal{F}_{t_k+}^o$ for each $k$. The consistency condition ensures that the measure $\mathbb{Q}$ thus obtained is independent of the choice of the times $t_k$. □

Corollary B.31. Let $W_t$ be a $\mathbb{P}$-Brownian motion and define $\mathbb{Q}$ as in Corollary B.30; then $\tilde{W}_t = W_t - \langle W, M\rangle_t$ is a $\mathbb{Q}$-Brownian motion with respect to $\mathcal{F}_{t+}^o$.

Proof. As for Corollary B.29. □

B.3.2 Martingale Representation Theorem
The following representation theorem has many uses. The proof given here only establishes the existence of the representation. The results of Clark allow an explicit form to be established (see Nualart [227, Proposition 1.3.14] for details, or Section IV.41 of Rogers and Williams [249] for an elementary account).

Theorem B.32. Let $B$ be an $m$-dimensional Brownian motion and let $\mathcal{F}_t$ be the right continuous enlargement of the σ-algebra generated by $B$ augmented† with the null sets $\mathcal{N}$. Let $T > 0$ be a constant time. If $X$ is a square integrable random variable measurable with respect to the σ-algebra $\mathcal{F}_T$ then there exists a previsible $\nu_s$ such that

$$X = \mathbb{E}[X] + \int_0^T \nu_s^\top\,dB_s. \tag{B.13}$$

† This condition is satisfied automatically if the filtration satisfies the usual conditions.

Proof. To establish the representation (B.13), without loss of generality we may consider the case $\mathbb{E}X = 0$ (in the general case apply the result to $X - \mathbb{E}X$). Define the space

$$L_T^2 = \left\{H : H \text{ is } \mathcal{F}_t\text{-previsible and } \mathbb{E}\left[\int_0^T \|H_s\|^2\,ds\right] < \infty\right\}.$$

Consider the stochastic integral map $J : L_T^2 \to L^2(\mathcal{F}_T)$, defined by

$$J(H) = \int_0^T H_s^\top\,dB_s.$$

As a consequence of the Itô isometry theorem, this map is an isometry. Hence the image $V$ under $J$ of the Hilbert space $L_T^2$ is complete and hence a closed subspace of $L_0^2(\mathcal{F}_T) = \{H \in L^2(\mathcal{F}_T) : \mathbb{E}H = 0\}$. The theorem is proved if we can establish that the image is equal to the whole space $L_0^2(\mathcal{F}_T)$, for the image is the space of random variables $X$ which admit a representation of the form (B.13). Consider the orthogonal complement of $V$ in $L_0^2(\mathcal{F}_T)$. We aim to show that every element of this orthogonal complement is zero. Suppose that $Z$ is in the orthogonal complement of $V$ in $L_0^2(\mathcal{F}_T)$; thus

$$\mathbb{E}(ZX) = 0 \qquad \text{for all } X \in V. \tag{B.14}$$

Define $Z_t = \mathbb{E}[Z \mid \mathcal{F}_t]$ which is an $L^2$-bounded martingale. We know that the σ-algebra $\mathcal{F}_0$ is trivial by the Blumenthal 0–1 law; therefore

$$Z_0 = \mathbb{E}[Z \mid \mathcal{F}_0] = \mathbb{E}(Z) = 0 \qquad \mathbb{P}\text{-a.s.}$$
Let $H \in L_T^2$ and $N_T \triangleq J(H)$ and define $N_t \triangleq \mathbb{E}[N_T \mid \mathcal{F}_t]$ for $0 \le t \le T$. It is clear that $N_T \in V$. Let $S$ be a stopping time such that $S \le T$; then by optional sampling

$$N_S = \mathbb{E}[N_T \mid \mathcal{F}_S] = \mathbb{E}\left[\int_0^S H_s^\top\,dB_s + \int_S^T H_s^\top\,dB_s \,\Big|\, \mathcal{F}_S\right] = J(H 1_{[0,S]}),$$

so consequently $N_S \in V$. The orthogonality relation (B.14) then implies that $\mathbb{E}(ZN_S) = 0$. Thus using the properties of conditional expectation

$$0 = \mathbb{E}[ZN_S] = \mathbb{E}[\mathbb{E}[ZN_S \mid \mathcal{F}_S]] = \mathbb{E}[N_S\,\mathbb{E}[Z \mid \mathcal{F}_S]] = \mathbb{E}[Z_S N_S].$$

Since this holds for $S$ a bounded stopping time, and $Z_T$ and $N_T$ are square integrable, it follows that $Z_t N_t$ is a uniformly integrable martingale and hence $\langle Z, N\rangle_t$ is a null process. Let $\varepsilon_t$ be an element of the set $S_t$ defined in Lemma B.39 where the stochastic process $Y$ is taken to be the Brownian motion $B$. Extending $J$ in the obvious way to $m$-dimensional vector processes, we have that $\varepsilon_t = 1 + J(i\varepsilon r 1_{[0,t]})$ for some $r \in L^\infty([0, t], \mathbb{R}^m)$. Using the above, $Z_t\varepsilon_t = Z_t + Z_t J(i\varepsilon r 1_{[0,t]})$. Both $\{Z_t J(i\varepsilon r 1_{[0,t]}), t \ge 0\}$ and $\{Z_t, t \ge 0\}$ are martingales and $Z_0 = 0$; hence

$$\mathbb{E}[\varepsilon_t Z_t] = \mathbb{E}[Z_t] + \mathbb{E}\left[Z_t J(i\varepsilon r 1_{[0,t]})\right] = \mathbb{E}(Z_0) = 0.$$

Thus since this holds for all $\varepsilon_t \in S_t$ and the set $S_t$ is total this implies that $Z_t = 0$ $\mathbb{P}$-a.s. □
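As a concrete instance of (B.13), Itô's formula in one dimension gives $B_T^2 = T + \int_0^T 2B_s\,dB_s$, so that $\nu_s = 2B_s$. The sketch below is an illustration only and is not from the text: it approximates the stochastic integral by a left-point sum on a uniform grid (an assumption of the discretisation) and reports the gap between the two sides on a few simulated paths. The name `representation_error` is hypothetical.

```python
import math
import random


def representation_error(T=1.0, n_steps=2000, seed=1):
    """Return B_T^2 - (E[B_T^2] + sum of 2*B_{t_k}*dB_k) on one simulated path."""
    rng = random.Random(seed)
    dt = T / n_steps
    b = 0.0          # current value of the Brownian path
    integral = 0.0   # left-point approximation of int_0^T 2 B_s dB_s
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        integral += 2.0 * b * db   # integrand evaluated before the increment (previsible)
        b += db
    return b * b - (T + integral)  # E[B_T^2] = T


errors = [representation_error(seed=s) for s in range(5)]
print(errors)
```

On the grid the gap equals the sum of squared increments minus $T$, so it should shrink as `n_steps` grows.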
Remark B.33. For $X$ a square integrable $\mathcal{F}_t$-adapted martingale this result can be applied to $X_T$, followed by conditioning and use of the martingale property to obtain for any $0 \le t \le T$,

$$X_t = \mathbb{E}[X_T \mid \mathcal{F}_t] = \mathbb{E}[X_T] + \int_0^{t \wedge T} \nu_s^\top\,dB_s = \mathbb{E}(X_0) + \int_0^t \nu_s^\top\,dB_s.$$

As the choice of the constant time $T$ was arbitrary, it is clear that this result holds for all $t \ge 0$.

B.3.3 Novikov's Condition

One of the most useful conditions for checking whether a local martingale of exponential form is a martingale is that due to Novikov.

Theorem B.34. If $Z_t = \exp\left(M_t - \tfrac{1}{2}\langle M\rangle_t\right)$ for $M$ a continuous local martingale, then a sufficient condition for $Z$ to be a martingale is that

$$\mathbb{E}\left[\exp\left(\tfrac{1}{2}\langle M\rangle_t\right)\right] < \infty, \qquad 0 \le t < \infty.$$

Proof. Define the stopping time $S_b = \inf\{t \ge 0 : M_t - t = b\}$ and note that $\mathbb{P}(S_b < \infty) = 1$. Then define

$$Y_t \triangleq \exp\left(M_t - \tfrac{1}{2}t\right); \tag{B.15}$$

it follows by the optional stopping theorem that $\mathbb{E}[\exp(M_{S_b} - \tfrac{1}{2}S_b)] = 1$, which implies $\mathbb{E}[\exp(\tfrac{1}{2}S_b)] = e^{-b}$. Consider

$$N_t \triangleq Y_{t \wedge S_b}, \qquad t \ge 0,$$

which is also a martingale. Since $\mathbb{P}(S_b < \infty) = 1$ it follows that

$$N_\infty = \lim_{s \to \infty} N_s = \exp\left(M_{S_b} - \tfrac{1}{2}S_b\right).$$

By Fatou's lemma $N_s$ is a supermartingale with last element. But $\mathbb{E}(N_\infty) = 1 = \mathbb{E}(N_0)$ whence $N$ is a uniformly integrable martingale. So by optional sampling for any stopping time $R$,

$$\mathbb{E}\left[\exp\left(M_{R \wedge S_b} - \tfrac{1}{2}(R \wedge S_b)\right)\right] = 1.$$

Fix $t \ge 0$ and set $R = \langle M\rangle_t$. It then follows for $b < 0$, E 1Sb
B.3.4 Stochastic Fubini Theorem

The Fubini theorem of measure theory has a useful extension to stochastic integrals. The form stated here requires a boundedness assumption and as such is not the most general form possible, but is that which is most useful for applications. We assume that all the stochastic integrals are with respect to continuous semimartingales, because this is the framework considered here. To extend the result it is simply necessary to stipulate that a càdlàg version of the stochastic integrals be chosen. For a more general form see Protter [247, Theorem IV.46]. In this theorem we consider a family of processes parametrised by an index $a \in A$, and let $\mu$ be a finite measure on the space $(A, \mathcal{A})$; that is, $\mu(A) < \infty$.

Theorem B.35. Let $X$ be a semimartingale and $\mu$ a finite measure. Let $H_t^a = H(t, a, \omega)$ be a bounded $\mathcal{B}[0, t] \otimes \mathcal{A} \otimes \mathcal{P}$ measurable process and $Z_t^a \triangleq \int_0^t H_s^a\,dX_s$. If we define $H_t \triangleq \int_A H_t^a\,\mu(da)$ then $Y_t = \int_A Z_t^a\,\mu(da)$ is the process given by the stochastic integral $\int_0^t H_s\,dX_s$.

Proof. By stopping we can reduce the case to that of $X \in L^2$. As a consequence of the usual Fubini theorem it suffices to consider $X$ a martingale. The proof proceeds via a monotone class argument. Suppose $H(t, a, \omega) = K(t, \omega) f(a)$ for $f$ bounded $\mathcal{A}$-measurable. Then it follows that

$$Z_t^a = f(a)\int_0^t K(s, \omega)\,dX_s,$$

and hence

$$\begin{aligned}
\int_A Z_t^a\,\mu(da) &= \int_A f(a)\left(\int_0^t K(s,\omega)\,dX_s\right)\mu(da) \\
&= \int_0^t K(s,\omega)\,dX_s \int_A f(a)\,\mu(da) \\
&= \int_0^t \left(\int_A f(a)\,\mu(da)\right) K(s,\omega)\,dX_s \\
&= \int_0^t H_s\,dX_s.
\end{aligned}$$

Thus we have established the result in this simple case and by linearity to the vector space of finite linear combinations of bounded functions of this form. It remains to show the monotone property; that is, suppose that the result holds for $H_n$ and $H_n \to H$. We must show that the result holds for $H$. Let $Z_{n,t}^a \triangleq \int_0^t H_{n,s}^a\,dX_s$. We are interested in convergence uniformly in $t$; thus note that

$$\mathbb{E}\left[\sup_t \left|\int_A Z_{n,t}^a\,\mu(da) - \int_A Z_t^a\,\mu(da)\right|\right] \le \mathbb{E}\left[\int_A \sup_t |Z_{n,t}^a - Z_t^a|\,\mu(da)\right].$$
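We show that the right-hand side tends to zero as $n \to \infty$. By Jensen's inequality and Cauchy–Schwarz we can compute as follows,

$$\begin{aligned}
\left(\mathbb{E}\left[\int_A \sup_t |Z_{n,t}^a - Z_t^a|\,\mu(da)\right]\right)^2 &\le \mathbb{E}\left[\left(\int_A \sup_t |Z_{n,t}^a - Z_t^a|\,\mu(da)\right)^2\right] \\
&\le \int_A \mu(da)\;\mathbb{E}\left[\int_A \sup_t |Z_{n,t}^a - Z_t^a|^2\,\mu(da)\right].
\end{aligned}$$

Then an application of the non-stochastic version of Fubini's theorem followed by Doob's $L^2$-inequality implies that

$$\begin{aligned}
\frac{1}{\mu(A)}\,\mathbb{E}\left[\left(\int_A \sup_t |Z_{n,t}^a - Z_t^a|\,\mu(da)\right)^2\right] &\le \int_A \mathbb{E}\left[\sup_{s \in [0,T]} |Z_{n,s}^a - Z_s^a|^2\right]\mu(da) \\
&\le 4\int_A \mathbb{E}\left[(Z_{n,\infty}^a - Z_\infty^a)^2\right]\mu(da) \\
&\le 4\int_A \mathbb{E}\left[\langle Z_n^a - Z^a\rangle_\infty\right]\mu(da).
\end{aligned}$$

Then by the Kunita–Watanabe identity

$$\frac{1}{\mu(A)}\,\mathbb{E}\left[\left(\int_A \sup_t |Z_{n,t}^a - Z_t^a|\,\mu(da)\right)^2\right] \le 4\int_A \mathbb{E}\left[\int_0^\infty (H_{n,s}^a - H_s^a)^2\,d\langle X\rangle_s\right]\mu(da).$$

Since $H_n$ increases monotonically to a bounded process $H$ it follows that $H_n$ and $H$ are uniformly bounded; we may apply the dominated convergence theorem to the double integral and expectation and thus the right-hand side converges to zero. Thus

$$\lim_{n\to\infty} \mathbb{E}\left[\sup_t \left|\int_A Z_{n,t}^a\,\mu(da) - \int_A Z_t^a\,\mu(da)\right|\right] = 0. \tag{B.16}$$

We may conclude from this that

$$\int_A \sup_t |Z_{n,t}^a - Z_t^a|\,\mu(da) < \infty \qquad \text{a.s.}$$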
as a consequence of which $\int_A |Z_t^a|\,\mu(da) < \infty$ for all $t$ a.s., and thus the integral $\int_A Z_t^a\,\mu(da)$ is defined a.s. for all $t$. Defining $H_{n,t} \triangleq \int_A H_{n,t}^a\,\mu(da)$, we have from (B.16) that $\int_0^t H_{n,s}\,dX_s$ converges in probability uniformly in $t$ to $\int_A Z_t^a\,\mu(da)$. Since a priori the result holds for $H_n$ we have that

$$\int_0^t H_{n,s}\,dX_s = \int_A Z_{n,t}^a\,\mu(da),$$

and since by the stochastic form of the dominated convergence theorem $\int_0^t H_{n,s}\,dX_s$ tends to $\int_0^t H_s\,dX_s$ as $n \to \infty$ it follows that

$$\int_0^t H_s\,dX_s = \int_A Z_t^a\,\mu(da).$$
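□

B.3.5 Burkholder–Davis–Gundy Inequalities

Theorem B.36. If $F : [0, \infty) \to [0, \infty)$ is a continuous increasing function such that $F(0) = 0$, and for every $\alpha > 1$

$$K_F = \sup_{x \in [0,\infty)} \frac{F(\alpha x)}{F(x)} < \infty,$$

then there exist constants $c_F$ and $C_F$ such that for every continuous local martingale $M$,

$$c_F\,\mathbb{E}\left[F\left(\sqrt{\langle M\rangle_\infty}\right)\right] \le \mathbb{E}\left[F\left(\sup_{t \ge 0}|M_t|\right)\right] \le C_F\,\mathbb{E}\left[F\left(\sqrt{\langle M\rangle_\infty}\right)\right].$$

An example of a suitable function $F$ which satisfies the conditions of the theorem is $F(x) = x^p$ for $p > 0$. Various proofs exist of this result. The proof given follows Burkholder's approach in Chapter II of [36]. The proof requires the following lemma.

Lemma B.37. Let $X$ and $Y$ be nonnegative real-valued random variables. Let $\beta > 1$, $\delta > 0$, $\varepsilon > 0$ be such that for all $\lambda > 0$,

$$\mathbb{P}(X > \beta\lambda,\ Y \le \delta\lambda) \le \varepsilon\,\mathbb{P}(X > \lambda). \tag{B.17}$$

Let $\gamma$ and $\eta$ be such that $F(\beta\lambda) \le \gamma F(\lambda)$ and $F(\delta^{-1}\lambda) \le \eta F(\lambda)$. If $\gamma\varepsilon < 1$ then

$$\mathbb{E}[F(X)] \le \frac{\gamma\eta}{1 - \gamma\varepsilon}\,\mathbb{E}[F(Y)].$$

Proof. Assume without loss of generality that $\mathbb{E}[F(X)] < \infty$. It is clear from (B.17) that for $\lambda > 0$,

$$\begin{aligned}
\mathbb{P}(X > \beta\lambda) &= \mathbb{P}(X > \beta\lambda,\ Y \le \delta\lambda) + \mathbb{P}(X > \beta\lambda,\ Y > \delta\lambda) \\
&\le \varepsilon\,\mathbb{P}(X > \lambda) + \mathbb{P}(Y > \delta\lambda).
\end{aligned} \tag{B.18}$$

Since $F(0) = 0$ by assumption, it follows that

$$F(x) = \int_0^x dF(\lambda) = \int_0^\infty I_{\{\lambda < x\}}\,dF(\lambda);$$

thus by Fubini's theorem

$$\mathbb{E}[F(X)] = \int_0^\infty \mathbb{P}(X > \lambda)\,dF(\lambda).$$

Thus using (B.18) it follows that

$$\begin{aligned}
\mathbb{E}[F(X/\beta)] &= \int_0^\infty \mathbb{P}(X > \beta\lambda)\,dF(\lambda) \\
&\le \varepsilon\int_0^\infty \mathbb{P}(X > \lambda)\,dF(\lambda) + \int_0^\infty \mathbb{P}(Y > \delta\lambda)\,dF(\lambda) \\
&\le \varepsilon\,\mathbb{E}[F(X)] + \mathbb{E}[F(Y/\delta)];
\end{aligned}$$

from the conditions on $\eta$ and $\gamma$ it then follows that

$$\mathbb{E}[F(X/\beta)] \le \varepsilon\gamma\,\mathbb{E}[F(X/\beta)] + \eta\,\mathbb{E}[F(Y)].$$

Since we assumed $\mathbb{E}[F(X)] < \infty$, and $\varepsilon\gamma < 1$, it follows that

$$\mathbb{E}[F(X/\beta)] \le \frac{\eta}{1 - \varepsilon\gamma}\,\mathbb{E}[F(Y)],$$

and the result follows using the condition on $\gamma$. □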
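We can now prove the Burkholder–Davis–Gundy inequality, by using the above lemma.

Proof. Let $\tau = \inf\{u : |M_u| > \lambda\}$ which is an $\mathcal{F}_t$-stopping time. Define

$$N_t \triangleq (M_{\tau+t} - M_\tau)^2 - (\langle M\rangle_{\tau+t} - \langle M\rangle_\tau),$$

which is a continuous $\mathcal{F}_{\tau+t}$-adapted local martingale. Choose $\beta > 1$, $0 < \delta < 1$. On the event defined by $\{\sup_{t \ge 0}|M_t| > \beta\lambda,\ \langle M\rangle_\infty \le \delta^2\lambda^2\}$ the martingale $N_t$ must hit the level $(\beta - 1)^2\lambda^2 - \delta^2\lambda^2$ before it hits $-\delta^2\lambda^2$. From elementary use of the optional sampling theorem the probability of a martingale hitting a level $b$ before a level $a$ is given by $-a/(b - a)$; thus

$$\mathbb{P}\left(\sup_{t \ge 0}|M_t| > \beta\lambda,\ \langle M\rangle_\infty \le \delta^2\lambda^2 \,\Big|\, \mathcal{F}_\tau\right) \le \delta^2/(\beta - 1)^2.$$

Hence as $\beta > 1$,

$$\begin{aligned}
\mathbb{P}\left(\sup_{t \ge 0}|M_t| > \beta\lambda,\ \langle M\rangle_\infty \le \delta^2\lambda^2\right) &= \mathbb{P}\left(\sup_{t \ge 0}|M_t| > \beta\lambda,\ \langle M\rangle_\infty \le \delta^2\lambda^2,\ \tau < \infty\right) \\
&= \mathbb{E}\left[\mathbb{P}\left(\sup_{t \ge 0}|M_t| > \beta\lambda,\ \langle M\rangle_\infty \le \delta^2\lambda^2 \,\Big|\, \mathcal{F}_\tau\right) 1_{\tau < \infty}\right] \\
&\le \delta^2\,\mathbb{P}(\tau < \infty)/(\beta - 1)^2.
\end{aligned}$$

Since $\{\tau < \infty\} = \{\sup_{t \ge 0}|M_t| > \lambda\}$, this is condition (B.17) with $X = \sup_{t \ge 0}|M_t|$, $Y = \sqrt{\langle M\rangle_\infty}$ and $\varepsilon = \delta^2/(\beta - 1)^2$. It is immediate that since $\beta > 1$, $F(\beta\lambda) < K_F F(\lambda)$ and similarly since $\delta < 1$, $F(\lambda/\delta) < K_F F(\lambda)$, so we may take $\gamma = \eta = K_F$. Now we can choose $0 < \delta < 1$ sufficiently small that $\varepsilon = \delta^2/(\beta - 1)^2 < 1/K_F$, so that $\varepsilon\gamma < 1$. Therefore all the conditions of Lemma B.37 are satisfied whence

$$\mathbb{E}\left[F\left(\sup_{t \ge 0}|M_t|\right)\right] \le C\,\mathbb{E}\left[F\left(\sqrt{\langle M\rangle_\infty}\right)\right]$$

and the opposite inequality can be established similarly. □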
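For $F(x) = x^2$ and $M$ a Brownian motion stopped at a fixed time $T$, so that $\langle M\rangle_\infty = T$, the two-sided bound can be checked by simulation. In this special case Doob's $L^2$-inequality gives the upper constant $4$ and the identity $\mathbb{E}[M_T^2] = \mathbb{E}[\langle M\rangle_T]$ gives the lower constant $1$; these values are particular to this example and are not claimed by the theorem in general. The sketch below works on a time grid, which slightly underestimates the supremum, and the name `bdg_ratio` is illustrative.

```python
import math
import random


def bdg_ratio(T=1.0, n_steps=200, n_paths=2000, seed=2):
    """Estimate E[sup_{t<=T} |B_t|^2] / E[<B>_T] for a Brownian motion B on a grid."""
    rng = random.Random(seed)
    sd = math.sqrt(T / n_steps)
    total = 0.0
    for _ in range(n_paths):
        b = 0.0
        running_max_sq = 0.0
        for _ in range(n_steps):
            b += rng.gauss(0.0, sd)
            running_max_sq = max(running_max_sq, b * b)
        total += running_max_sq
    return (total / n_paths) / T   # <B>_T = T is deterministic


ratio = bdg_ratio()
print(ratio)
```

The printed ratio should lie between the two constants, in line with the inequality for this choice of $F$.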
B.4 Stochastic Differential Equations

Theorem B.38. Let $f : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^p$ be Lipschitz functions. That is, there exist positive constants $K_f$ and $K_\sigma$ such that

$$\|f(x) - f(y)\| \le K_f\|x - y\|, \qquad \|\sigma(x) - \sigma(y)\| \le K_\sigma\|x - y\|,$$

for all $x, y \in \mathbb{R}^d$. Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a filtration $\{\mathcal{F}_t, t \ge 0\}$ which satisfies the usual conditions, let $W$ be an $\mathcal{F}_t$-adapted Brownian motion and let $\zeta$ be an $\mathcal{F}_0$-measurable random variable. Then there exists a unique continuous adapted process $X = \{X_t, t \ge 0\}$ which is a strong solution of the SDE,

$$X_t = \zeta + \int_0^t f(X_s)\,ds + \int_0^t \sigma(X_s)\,dW_s.$$
The proof of this theorem can be found as Theorem 10.6 of Chung and Williams [53] and is similar to the proof of Theorem 2.9 of Chapter 5 in Karatzas and Shreve [149].
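Theorem B.38 is an existence and uniqueness statement; it does not prescribe a numerical scheme. As an illustration only, the sketch below applies the Euler–Maruyama discretisation (a standard scheme that is not taken from the text) to a scalar SDE with the Lipschitz coefficients $f(x) = -x$ and $\sigma(x) = 0.5$, for which $\mathbb{E}[X_T] = \zeta e^{-T}$ when $\zeta$ is deterministic. All names and parameter values are assumptions of the example.

```python
import math
import random


def euler_maruyama(f, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = f(X) dt + sigma(X) dW and return X_T."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += f(x) * dt + sigma(x) * dw
    return x


rng = random.Random(3)
samples = [
    euler_maruyama(lambda x: -x, lambda x: 0.5, x0=1.0, T=1.0, n_steps=100, rng=rng)
    for _ in range(5000)
]
sample_mean = sum(samples) / len(samples)
print(sample_mean, math.exp(-1.0))
```

The two printed numbers should agree up to Monte Carlo and discretisation error.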
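B.5 Total Sets in $L^1$

The use of the following density result in stochastic filtering originated in the work of Krylov and Rozovskii.

Lemma B.39. On the filtered probability space $(\Omega, \mathcal{F}, \tilde{\mathbb{P}})$ let $Y$ be a Brownian motion starting from zero adapted to the filtration $\mathcal{Y}_t$; then define the set

$$S_t = \left\{\varepsilon_t = \exp\left(i\int_0^t r_s^\top\,dY_s + \frac{1}{2}\int_0^t \|r_s\|^2\,ds\right) : r \in L^\infty([0, t], \mathbb{R}^m)\right\}. \tag{B.19}$$

Then $S_t$ is a total set in $L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$. That is, if $a \in L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$ and $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t$, then $a = 0$ $\tilde{\mathbb{P}}$-a.s. Furthermore each process $\varepsilon$ in the set $S_t$ satisfies an SDE of the form

$$d\varepsilon_t = i\varepsilon_t r_t^\top\,dY_t,$$

for some $r \in L^\infty([0, t], \mathbb{R}^m)$.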
Proof. We follow the proof in Bensoussan [13, page 83]. Define a set

$$S_t' = \left\{\varepsilon_t = \exp\left(i\int_0^t r_s^\top\,dY_s\right) : r \in L^\infty([0, t], \mathbb{R}^m)\right\}.$$

Let $a$ be a fixed element of $L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$ such that $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t'$. This can easily be seen to be equivalent to the statement that $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t$, which we assume. To establish the result, we assume that $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t'$, and show that $a$ is zero a.s. Take $t_1, t_2, \dots, t_p \in (0, t)$ with $t_1 < t_2 < \cdots < t_p$, then given $l_1, l_2, \dots, l_p \in \mathbb{R}^m$, define

$$\mu_p \triangleq l_p, \qquad \mu_{p-1} \triangleq l_p + l_{p-1}, \qquad \dots, \qquad \mu_1 \triangleq l_p + \cdots + l_1.$$

Adopting the convention that $t_0 = 0$, define a function

$$r_s = \begin{cases} \mu_h & \text{for } s \in (t_{h-1}, t_h),\ h = 1, \dots, p, \\ 0 & \text{for } s \in (t_p, t), \end{cases}$$

whence as $Y_{t_0} = Y_0 = 0$,

$$\sum_{h=1}^p l_h^\top Y_{t_h} = \sum_{h=1}^p \mu_h^\top (Y_{t_h} - Y_{t_{h-1}}) = \int_0^t r_s^\top\,dY_s.$$

Hence for $a \in L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$

$$\tilde{\mathbb{E}}\left[a\exp\left(i\sum_{h=1}^p l_h^\top Y_{t_h}\right)\right] = \tilde{\mathbb{E}}\left[a\exp\left(i\int_0^t r_s^\top\,dY_s\right)\right] = 0,$$

where the second equality follows from the fact that we have assumed $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t'$. By linearity therefore,

$$\tilde{\mathbb{E}}\left[a\sum_{k=1}^K c_k\exp\left(i\sum_{h=1}^p l_{h,k}^\top Y_{t_h}\right)\right] = 0,$$

where this holds for all $K$ and for all coefficients $c_1, \dots, c_K \in \mathbb{C}$, and values $l_{h,k} \in \mathbb{R}^m$. Let $F(x_1, \dots, x_p)$ be a continuous bounded complex-valued function defined on $(\mathbb{R}^m)^p$. By Weierstrass' approximation theorem, there exists a uniformly bounded sequence of functions of the form

$$P^{(n)}(x_1, \dots, x_p) = \sum_{k=1}^{K_n} c_k^{(n)}\exp\left(i\sum_{h=1}^p (l_{h,k}^{(n)})^\top x_h\right)$$

such that

$$\lim_{n\to\infty} P^{(n)}(x_1, \dots, x_p) = F(x_1, \dots, x_p).$$

Hence we have $\tilde{\mathbb{E}}[aF(Y_{t_1}, \dots, Y_{t_p})] = 0$ for every continuous bounded function $F$, and by a further approximation argument, we can take $F$ to be a
bounded function, measurable with respect to the σ-algebra $\sigma(Y_{t_1}, \dots, Y_{t_p})$. Since $t_1, t_2, \dots, t_p$ were chosen arbitrarily, we obtain that $\tilde{\mathbb{E}}[ab] = 0$, for $b$ any bounded $\mathcal{Y}_t$-measurable function. In particular it gives $\tilde{\mathbb{E}}[a^2 \wedge m] = 0$ for arbitrary $m$; hence $a = 0$ $\tilde{\mathbb{P}}$-a.s. □

The following corollary enables us to use a smaller set of functions in the definition of the set $S_t$; in particular we can consider only bounded continuous functions with any number $p$ of bounded continuous derivatives.

Corollary B.40. Assume the same conditions as in Lemma B.39. Define the set

$$S_t^p = \left\{\varepsilon_t = \exp\left(i\int_0^t r_s^\top\,dY_s + \frac{1}{2}\int_0^t \|r_s\|^2\,ds\right) : r \in C_b^p([0, t], \mathbb{R}^m)\right\} \tag{B.20}$$

where $p$ is an arbitrary non-negative integer. Then $S_t^p$ is a total set in $L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$. That is, if $a \in L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$ and $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t^p$, then $a = 0$ $\tilde{\mathbb{P}}$-a.s. Furthermore each process $\varepsilon$ in the set $S_t^p$ satisfies an SDE of the form

$$d\varepsilon_t = i\varepsilon_t r_t^\top\,dY_t,$$

for some $r \in L^\infty([0, t], \mathbb{R}^m)$.

Proof. Let us prove the corollary for the case $p = 0$, that is, for $r$ a bounded continuous function. To do this, as a consequence of Lemma B.39, it suffices to show that if $a \in L^1(\Omega, \mathcal{Y}_t, \tilde{\mathbb{P}})$ and $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t^0$, then $\tilde{\mathbb{E}}[a\varepsilon_t] = 0$ for all $\varepsilon_t \in S_t$. Pick an arbitrary $\varepsilon_t \in S_t$,

$$\varepsilon_t = \exp\left(i\int_0^t r_s^\top\,dY_s + \frac{1}{2}\int_0^t \|r_s\|^2\,ds\right), \qquad r \in L^\infty([0, t], \mathbb{R}^m).$$

First let us note that by the fundamental theorem of calculus, as $r \in L^\infty([0, t], \mathbb{R}^m)$, the function $p : [0, t] \to \mathbb{R}^m$ defined as

$$p_s = \int_0^s r_u\,du$$

is continuous and differentiable almost everywhere. Moreover, for almost all $s \in [0, t]$

$$\frac{dp_s}{ds} = r_s.$$

Now let $r^n \in C_b^0([0, t], \mathbb{R}^m)$ be defined as

$$r_s^n \triangleq n\left(p_s - p_{0 \vee (s - 1/n)}\right), \qquad s \in [0, t].$$

Then $r^n$ is uniformly bounded by the same bound as $r$ and from the above, for almost all $s \in [0, t]$, $\lim_{n\to\infty} r_s^n = r_s$. By the bounded convergence theorem,
$$\lim_{n\to\infty}\int_0^t \|r_s^n\|^2\,ds = \int_0^t \|r_s\|^2\,ds$$

and also

$$\lim_{n\to\infty}\tilde{\mathbb{E}}\left[\left(\int_0^t r_s^\top\,dY_s - \int_0^t (r_s^n)^\top\,dY_s\right)^2\right] = 0.$$

Hence at least for a subsequence $(r^{n_k})_{n_k > 0}$, by the Itô isometry

$$\lim_{k\to\infty}\int_0^t (r_s^{n_k})^\top\,dY_s = \int_0^t r_s^\top\,dY_s, \qquad \tilde{\mathbb{P}}\text{-a.s.}$$

and hence, the uniformly bounded sequence

$$\varepsilon_t^k = \exp\left(i\int_0^t (r_s^{n_k})^\top\,dY_s + \frac{1}{2}\int_0^t \|r_s^{n_k}\|^2\,ds\right)$$

converges, $\tilde{\mathbb{P}}$-almost surely, to $\varepsilon_t$. Then, via another use of the dominated convergence theorem

$$\tilde{\mathbb{E}}[a\varepsilon_t] = \lim_{k\to\infty}\tilde{\mathbb{E}}[a\varepsilon_t^k] = 0,$$

since $\varepsilon_t^k \in S_t^0$ for all $k \ge 0$. This completes the proof of the corollary for $p = 0$. For higher values of $p$, one iterates the above procedure. □
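B.6 Limits of Stochastic Integrals

The following proposition is used in the proof of the Zakai equation.

Proposition B.41. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $\{B_t, \mathcal{F}_t\}$ be a standard $n$-dimensional Brownian motion defined on this space and $\Psi_n$, $\Psi$ be $\mathcal{F}_t$-adapted processes such that $\int_0^t \|\Psi_n\|^2\,ds < \infty$, $\int_0^t \|\Psi\|^2\,ds < \infty$, $\mathbb{P}$-a.s. and

$$\lim_{n\to\infty}\int_0^t \|\Psi_n - \Psi\|^2\,ds = 0$$

in probability; then

$$\lim_{n\to\infty}\sup_{t \in [0,T]}\left|\int_0^t (\Psi_n^\top - \Psi^\top)\,dB_s\right| = 0$$

in probability.

Proof. Given arbitrary $t, \varepsilon, \eta > 0$ we first prove that for an $n$-dimensional process $\varphi$,

$$\mathbb{P}\left(\sup_{0 \le s \le t}\left|\int_0^s \varphi_r^\top\,dB_r\right| \ge \varepsilon\right) \le \mathbb{P}\left(\int_0^t \|\varphi_s\|^2\,ds > \eta\right) + \frac{4\eta}{\varepsilon^2}. \tag{B.21}$$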
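To this end, define

$$\tau_\eta \triangleq \inf\left\{t : \int_0^t \|\varphi_s\|^2\,ds > \eta\right\},$$

and a corresponding stopped version of $\varphi$,

$$\varphi_s^\eta \triangleq \varphi_s 1_{[0,\tau_\eta]}(s).$$

Then using these definitions

$$\begin{aligned}
\mathbb{P}\left(\sup_{0 \le s \le t}\left|\int_0^s \varphi_r^\top\,dB_r\right| \ge \varepsilon\right) &= \mathbb{P}\left(\tau_\eta < t;\ \sup_{0 \le s \le t}\left|\int_0^s \varphi_r^\top\,dB_r\right| \ge \varepsilon\right) + \mathbb{P}\left(\tau_\eta \ge t;\ \sup_{0 \le s \le t}\left|\int_0^s \varphi_r^\top\,dB_r\right| \ge \varepsilon\right) \\
&\le \mathbb{P}(\tau_\eta < t) + \mathbb{P}\left(\sup_{0 \le s \le t}\left|\int_0^s (\varphi_r^\eta)^\top\,dB_r\right| \ge \varepsilon\right) \\
&\le \mathbb{P}\left(\int_0^t \|\varphi_s\|^2\,ds > \eta\right) + \mathbb{P}\left(\sup_{0 \le s \le t}\left|\int_0^s (\varphi_r^\eta)^\top\,dB_r\right| \ge \varepsilon\right).
\end{aligned}$$

By Chebychev's inequality and Doob's $L^2$-inequality the second term on the right-hand side can be bounded

$$\begin{aligned}
\mathbb{P}\left(\sup_{0 \le s \le t}\left|\int_0^s (\varphi_r^\eta)^\top\,dB_r\right| \ge \varepsilon\right) &\le \frac{1}{\varepsilon^2}\,\mathbb{E}\left[\sup_{0 \le s \le t}\left|\int_0^s (\varphi_r^\eta)^\top\,dB_r\right|^2\right] \\
&\le \frac{4}{\varepsilon^2}\,\mathbb{E}\left[\left(\int_0^t (\varphi_r^\eta)^\top\,dB_r\right)^2\right] \\
&\le \frac{4}{\varepsilon^2}\,\mathbb{E}\left[\int_0^t \|\varphi_r^\eta\|^2\,dr\right] \\
&\le \frac{4\eta}{\varepsilon^2},
\end{aligned}$$

which establishes (B.21). Applying this result with fixed $\varepsilon$ to $\varphi = \Psi_n - \Psi$ yields

$$\mathbb{P}\left(\sup_{t \in [0,T]}\left|\int_0^t (\Psi_n^\top - \Psi^\top)\,dB_s\right| \ge \varepsilon\right) \le \mathbb{P}\left(\int_0^t \|\Psi_n - \Psi\|^2\,ds > \eta\right) + \frac{4\eta}{\varepsilon^2}.$$

Given arbitrary $\delta > 0$, by choosing $\eta < \delta\varepsilon^2/8$ the second term on the right-hand side is then bounded by $\delta/2$ and with this $\eta$ by the condition of the proposition there exists $N(\eta)$ such that for $n \ge N(\eta)$ the first term is bounded by $\delta/2$. Thus the right-hand side can be bounded by $\delta$. □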
B.7 An Exponential Functional of Brownian motion

In this section we deduce an explicit expression of a certain exponential functional of Brownian motion which is used in Chapter 6. Let $\{B_t, t \ge 0\}$ be a $d$-dimensional standard Brownian motion. Let $\beta : [0, t] \to \mathbb{R}^d$ be a bounded measurable function, $\Gamma$ a $d \times d$ real matrix and $\delta \in \mathbb{R}^d$. In this section, we compute the following functional of $B$,

$$I_t^{\beta,\Gamma,\delta} = \mathbb{E}\left[\exp\left(\int_0^t B_s^\top\beta_s\,ds - \frac{1}{2}\int_0^t \|\Gamma B_s\|^2\,ds\right) \,\Big|\, B_t = \delta\right]. \tag{B.22}$$

In (B.22) we use the standard notation

$$B_s^\top\beta_s = \sum_{i=1}^d B_s^i\beta_s^i, \qquad \|\Gamma B_s\|^2 = \sum_{i=1}^d \left(\sum_{j=1}^d \Gamma^{ij}B_s^j\right)^2, \qquad s \ge 0.$$

To obtain a closed formula for (B.22), we use Lévy's diagonalisation procedure, a powerful tool for deriving explicit formulae. Other results and techniques of this kind can be found in Yor [280] and the references contained therein. The orthogonal decomposition of $B_s$ with respect to $B_t$ is

$$B_s = \frac{s}{t}B_t + \left(B_s - \frac{s}{t}B_t\right), \qquad s \in [0, t],$$

and using the Fourier decomposition of the Brownian motion (as in Wiener's construction of the Brownian motion)

$$B_s = \frac{s}{t}B_t + \sum_{k \ge 1}\sqrt{\frac{2}{t}}\,\frac{\sin(ks\pi/t)}{k\pi/t}\,\xi_k, \qquad s \in [0, t], \tag{B.23}$$

where $\{\xi_k; k \ge 1\}$ are standard normal random vectors with independent entries, which are also independent of $B_t$, and the infinite sum has a subsequence of its partial sums which almost surely converges uniformly (see Itô and McKean [135, page 22]), we obtain the following.

Lemma B.42. Let $\nu \in \mathbb{R}$ and $\mu_k \in \mathbb{R}^d$, $k \ge 1$ be the following constants

$$\nu^{\beta,\Gamma,\delta}(t) \triangleq \exp\left(\frac{1}{t}\int_0^t s\,\delta^\top\beta_s\,ds - \frac{1}{6}\|\Gamma\delta\|^2 t\right),$$

$$\mu_k^{\beta,\Gamma,\delta}(t) \triangleq \int_0^t \frac{\sin(ks\pi/t)}{k\pi/t}\,\beta_s\,ds + (-1)^k\frac{t^2}{k^2\pi^2}\,\Gamma^\top\Gamma\delta, \qquad k \ge 1.$$

Then

$$I_t^{\beta,\Gamma,\delta} = \nu^{\beta,\Gamma,\delta}(t)\,\mathbb{E}\left[\exp\left(\sum_{k \ge 1}\left(\sqrt{\frac{2}{t}}\,\xi_k^\top\mu_k^{\beta,\Gamma,\delta}(t) - \frac{t^2}{2k^2\pi^2}\|\Gamma\xi_k\|^2\right)\right)\right]. \tag{B.24}$$
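Proof. We have from (B.23),

$$\int_0^t B_s^\top\beta_s\,ds = \frac{1}{t}\int_0^t s\,\delta^\top\beta_s\,ds + \sum_{k \ge 1}\sqrt{\frac{2}{t}}\int_0^t \frac{\sin(ks\pi/t)}{k\pi/t}\,\xi_k^\top\beta_s\,ds \tag{B.25}$$

and similarly

$$\int_0^t \|\Gamma B_s\|^2\,ds = \frac{1}{3}\|\Gamma\delta\|^2 t - 2\sqrt{\frac{2}{t}}\sum_{k \ge 1}(-1)^k\frac{t^2}{k^2\pi^2}\,\xi_k^\top\Gamma^\top\Gamma\delta + \int_0^t \left\|\Gamma\sum_{k \ge 1}\sqrt{\frac{2}{t}}\,\xi_k\,\frac{\sin(ks\pi/t)}{k\pi/t}\right\|^2 ds. \tag{B.26}$$

Next using the standard orthonormality results for Fourier series

$$\int_0^t \left(\sqrt{\frac{2}{t}}\sin\left(\frac{ks\pi}{t}\right)\right)^2 ds = 1, \qquad \forall k \ge 1,$$

$$\int_0^t \sin\left(\frac{k_1 s\pi}{t}\right)\sin\left(\frac{k_2 s\pi}{t}\right) ds = 0, \qquad \forall k_1, k_2 \ge 1,\ k_1 \ne k_2,$$

it follows that

$$\int_0^t \left\|\Gamma\sum_{k \ge 1}\sqrt{\frac{2}{t}}\,\xi_k\,\frac{\sin(ks\pi/t)}{k\pi/t}\right\|^2 ds = \sum_{k \ge 1}\|\Gamma\xi_k\|^2\frac{t^2}{k^2\pi^2}. \tag{B.27}$$

The identity (B.24) follows immediately from equations (B.25), (B.26) and (B.27). □

Let $P$ be an orthogonal matrix ($PP^\top = P^\top P = I$) and $D$ be a diagonal matrix $D = \operatorname{diag}(\gamma_1, \gamma_2, \dots, \gamma_d)$ such that $\Gamma^\top\Gamma = P^\top DP$. Obviously $(\gamma_i)_{i=1}^d$ are the eigenvalues of the real symmetric matrix $\Gamma^\top\Gamma$.

Lemma B.43. Let $a_{i,k}^{\beta,\Gamma,\delta}(t)$, for $i = 1, \dots, d$ and $k \ge 1$ be the following constants

$$a_{i,k}^{\beta,\Gamma,\delta}(t) = \sum_{j=1}^d P^{ij}\left(\mu_k^{\beta,\Gamma,\delta}(t)\right)^j.$$

Then

$$I_t^{\beta,\Gamma,\delta} = \nu^{\beta,\Gamma,\delta}(t)\prod_{i=1}^d \frac{1}{\sqrt{\prod_{k \ge 1}\left[\frac{\gamma_i t^2}{k^2\pi^2} + 1\right]}}\exp\left(\frac{1}{t}\sum_{k \ge 1}\frac{a_{i,k}^{\beta,\Gamma,\delta}(t)^2}{\frac{t^2\gamma_i}{k^2\pi^2} + 1}\right). \tag{B.28}$$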
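Proof. Let $\{\bar{\xi}_k, k \ge 1\}$ be the independent identically distributed standard normal random vectors defined by $\bar{\xi}_k = P\xi_k$ for any $k \ge 1$. As a consequence of Lemma B.42 we obtain that

$$I_t^{\beta,\Gamma,\delta} = \nu^{\beta,\Gamma,\delta}(t)\,\mathbb{E}\left[\exp\left(\sum_{k \ge 1}\left(\sqrt{\frac{2}{t}}\,\bar{\xi}_k^\top P\mu_k^{\beta,\Gamma,\delta}(t) - \frac{t^2}{2k^2\pi^2}\,\bar{\xi}_k^\top D\bar{\xi}_k\right)\right)\right]. \tag{B.29}$$

Define the σ-algebras

$$\mathcal{G}_k \triangleq \sigma(\bar{\xi}_p,\ p \ge k) \qquad \text{and} \qquad \mathcal{G} \triangleq \bigcap_{k \ge 1}\mathcal{G}_k.$$

Now define

$$\zeta \triangleq \exp\left(\sum_{k \ge 1}\left(\sqrt{\frac{2}{t}}\,\bar{\xi}_k^\top P\mu_k^{\beta,\Gamma,\delta}(t) - \frac{t^2}{2k^2\pi^2}\,\bar{\xi}_k^\top D\bar{\xi}_k\right)\right);$$

using the independence of $\bar{\xi}_1, \dots, \bar{\xi}_n, \dots$ and Kolmogorov's 0–1 Law (see Williams [272, page 46]), we see that

$$\mathbb{E}[\zeta] = \mathbb{E}\left[\zeta \,\Big|\, \bigcap_{k \ge 1}\mathcal{G}_k\right].$$

Since $\mathcal{G}_k$ is a decreasing sequence of σ-algebras, the Lévy downward theorem (see Williams [272, page 136]) implies that

$$\mathbb{E}\left[\zeta \,\Big|\, \bigcap_{k \ge 1}\mathcal{G}_k\right] = \lim_{k\to\infty}\mathbb{E}[\zeta \mid \mathcal{G}_k].$$

Hence we determine first $\mathbb{E}[\zeta \mid \mathcal{G}_k]$ and then take the limit as $k \to \infty$ to obtain the expectation in (B.29). Hence

$$\begin{aligned}
\mathbb{E}[\zeta] &= \prod_{k \ge 1}\mathbb{E}\left[\exp\left(\sqrt{\frac{2}{t}}\,\bar{\xi}_k^\top P\mu_k^{\beta,\Gamma,\delta}(t) - \frac{t^2}{2k^2\pi^2}\,\bar{\xi}_k^\top D\bar{\xi}_k\right)\right] \\
&= \prod_{k \ge 1}\prod_{i=1}^d \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi}}\exp\left(\sqrt{\frac{2}{t}}\,a_{i,k}^{\beta,\Gamma,\delta}(t)\,x - \frac{\frac{t^2\gamma_i}{k^2\pi^2} + 1}{2}\,x^2\right)dx,
\end{aligned}$$

and identity (B.28) follows immediately. □

Proposition B.44. Let $f^{\beta,\Gamma}(t)$ be the following constant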
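$$f^{\beta,\Gamma}(t) \triangleq \int_0^t\int_0^t \sum_{i=1}^d \frac{\sinh((s - t)\sqrt{\gamma_i})\sinh(s'\sqrt{\gamma_i})}{2\sqrt{\gamma_i}\sinh(t\sqrt{\gamma_i})} \times \sum_{j=1}^d P^{ij}\beta_s^j \sum_{j'=1}^d P^{ij'}\beta_{s'}^{j'}\,ds\,ds',$$

and $R^{t,\beta,\Gamma}(\delta)$ be the following second-order polynomial in $\delta$

$$R^{t,\beta,\Gamma}(\delta) \triangleq \int_0^t \sum_{i=1}^d \frac{\sinh(s\sqrt{\gamma_i})}{\gamma_i\sinh(t\sqrt{\gamma_i})}\sum_{j=1}^d P^{ij}\beta_s^j\,ds \sum_{j'=1}^d P^{ij'}\left(\Gamma^\top\Gamma\delta\right)^{j'} - \sum_{i=1}^d \frac{\coth(t\sqrt{\gamma_i})}{2\gamma_i\sqrt{\gamma_i}}\left(\sum_{j=1}^d P^{ij}\left(\Gamma^\top\Gamma\delta\right)^j\right)^2.$$

Then

$$I_t^{\beta,\Gamma,\delta} = \prod_{i=1}^d \sqrt{\frac{t\sqrt{\gamma_i}}{\sinh(t\sqrt{\gamma_i})}}\;\exp\left(f^{\beta,\Gamma}(t) + R^{t,\beta,\Gamma}(\delta) + \frac{\|\delta\|^2}{2t}\right). \tag{B.30}$$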
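Proof. Using the classical identity (B.35), the infinite product in the denominator of (B.28) is equal to $\sinh(t\sqrt{\gamma_i})/(t\sqrt{\gamma_i})$. Then we need to expand the argument of the exponential in (B.28). The following argument makes use of the identities (B.32)–(B.34). We have that

$$a_{i,k}^{\beta,\Gamma,\delta}(t) = \int_0^t \frac{\sin(ks\pi/t)}{k\pi/t}\,c_i^{\beta,\Gamma}(s)\,ds + (-1)^k\frac{t^2}{k^2\pi^2}\,c_i^{\Gamma,\delta}$$

and

$$\begin{aligned}
a_{i,k}^{\beta,\Gamma,\delta}(t)^2 &= \int_0^t\int_0^t \frac{\sin(ks\pi/t)}{k\pi/t}\,\frac{\sin(ks'\pi/t)}{k\pi/t}\,c_i^{\beta,\Gamma}(s)\,c_i^{\beta,\Gamma}(s')\,ds\,ds' \\
&\quad + 2(-1)^k\frac{t^2}{k^2\pi^2}\,c_i^{\Gamma,\delta}\int_0^t \frac{\sin(ks\pi/t)}{k\pi/t}\,c_i^{\beta,\Gamma}(s)\,ds \\
&\quad + \left(\frac{t^2}{k^2\pi^2}\,c_i^{\Gamma,\delta}\right)^2,
\end{aligned} \tag{B.31}$$

where $c_i^{\beta,\Gamma}(s) = \sum_{j=1}^d P^{ij}\beta_s^j$ and $c_i^{\Gamma,\delta} = \sum_{j=1}^d P^{ij}\left(\Gamma^\top\Gamma\delta\right)^j$. Next we sum up over $k$ each of the three terms on the right-hand side of (B.31). For the first term we use

$$\begin{aligned}
\sum_{k \ge 1}\frac{\sin(ks\pi/t)\sin(ks'\pi/t)}{(k\pi/t)^2\,t\,(t^2\gamma_i/(k^2\pi^2) + 1)} &= \frac{t}{2\pi^2}\sum_{k \ge 1}\frac{\cos(k(s - s')\pi/t) - \cos(k(s + s')\pi/t)}{t^2\gamma_i/\pi^2 + k^2} \\
&= \frac{t}{2\pi^2}\,\frac{\pi}{2t\sqrt{\gamma_i}/\pi}\,\frac{\cosh((s - t - s')\sqrt{\gamma_i}) - \cosh((s - t + s')\sqrt{\gamma_i})}{\sinh(t\sqrt{\gamma_i})} \\
&= \frac{\sinh((s - t)\sqrt{\gamma_i})\sinh(s'\sqrt{\gamma_i})}{2\sqrt{\gamma_i}\sinh(t\sqrt{\gamma_i})};
\end{aligned}$$

hence

$$\sum_{k \ge 1}\int_0^t\int_0^t \frac{\frac{\sin(ks\pi/t)}{k\pi/t}\,\frac{\sin(ks'\pi/t)}{k\pi/t}}{t\,(t^2\gamma_i/(k^2\pi^2) + 1)}\,c_i^{\beta,\Gamma}(s)\,c_i^{\beta,\Gamma}(s')\,ds\,ds' = \int_0^t\int_0^t \frac{\sinh((s - t)\sqrt{\gamma_i})\sinh(s'\sqrt{\gamma_i})}{2\sqrt{\gamma_i}\sinh(t\sqrt{\gamma_i})}\,c_i^{\beta,\Gamma}(s)\,c_i^{\beta,\Gamma}(s')\,ds\,ds'.$$

For the second term,

$$\begin{aligned}
\sum_{k \ge 1}\frac{(-1)^k\frac{t^2}{k^2\pi^2}\,\frac{\sin(ks\pi/t)}{k\pi/t}}{t\,(t^2\gamma_i/(k^2\pi^2) + 1)} &= \frac{t^2}{\pi^3}\sum_{k \ge 1}\frac{(-1)^k\sin(ks\pi/t)}{k\,(t^2\gamma_i/\pi^2 + k^2)} \\
&= \frac{t^2}{\pi^3}\left(\frac{\pi}{2t^2\gamma_i/\pi^2}\,\frac{\sinh(s\sqrt{\gamma_i})}{\sinh(t\sqrt{\gamma_i})} - \frac{s\pi}{2t^3\gamma_i/\pi^2}\right) \\
&= \frac{1}{2\gamma_i}\,\frac{\sinh(s\sqrt{\gamma_i})}{\sinh(t\sqrt{\gamma_i})} - \frac{s}{2t\gamma_i};
\end{aligned}$$

hence

$$\begin{aligned}
\sum_{i=1}^d\sum_{k \ge 1}\frac{2(-1)^k\frac{t^2}{k^2\pi^2}\,c_i^{\Gamma,\delta}\int_0^t \frac{\sin(ks\pi/t)}{k\pi/t}\,c_i^{\beta,\Gamma}(s)\,ds}{t\,(t^2\gamma_i/(k^2\pi^2) + 1)} &= \int_0^t \sum_{i=1}^d \left(\frac{\sinh(s\sqrt{\gamma_i})}{\sinh(t\sqrt{\gamma_i})} - \frac{s}{t}\right)\frac{c_i^{\beta,\Gamma}(s)\,c_i^{\Gamma,\delta}}{\gamma_i}\,ds \\
&= \int_0^t \sum_{i=1}^d \frac{\sinh(s\sqrt{\gamma_i})}{\sinh(t\sqrt{\gamma_i})}\,\frac{c_i^{\beta,\Gamma}(s)\,c_i^{\Gamma,\delta}}{\gamma_i}\,ds - \frac{1}{t}\int_0^t s\,\delta^\top\beta_s\,ds,
\end{aligned}$$

since $\sum_{i=1}^d c_i^{\beta,\Gamma}(s)\,c_i^{\Gamma,\delta}/\gamma_i = \delta^\top\beta_s$. For the last term we get

$$\begin{aligned}
\sum_{k \ge 1}\frac{\left(t^2 c_i^{\Gamma,\delta}/(k^2\pi^2)\right)^2}{t\,(t^2\gamma_i/(k^2\pi^2) + 1)} &= \frac{t}{\gamma_i}\left(c_i^{\Gamma,\delta}\right)^2\sum_{k \ge 1}\left(\frac{1}{k^2\pi^2} - \frac{1}{t^2\gamma_i + k^2\pi^2}\right) \\
&= \frac{t}{\gamma_i}\left(c_i^{\Gamma,\delta}\right)^2\left(\frac{1}{6} + \frac{1}{2t^2\gamma_i} - \frac{1}{2t\sqrt{\gamma_i}}\coth(t\sqrt{\gamma_i})\right);
\end{aligned}$$

then

$$\sum_{i=1}^d\sum_{k \ge 1}\frac{\left(t^2 c_i^{\Gamma,\delta}/(k^2\pi^2)\right)^2}{t\,(t^2\gamma_i/(k^2\pi^2) + 1)} = \frac{\|\Gamma\delta\|^2 t}{6} + \frac{\|\delta\|^2}{2t} - \sum_{i=1}^d \frac{\coth(t\sqrt{\gamma_i})}{2\gamma_i\sqrt{\gamma_i}}\left(\sum_{j=1}^d P^{ij}\left(\Gamma^\top\Gamma\delta\right)^j\right)^2,$$

since $\sum_{i=1}^d \left(c_i^{\Gamma,\delta}\right)^2/\gamma_i = \|\Gamma\delta\|^2$ and $\sum_{i=1}^d \left(c_i^{\Gamma,\delta}\right)^2/\gamma_i^2 = \|\delta\|^2$. In the above we used the following classical identities.

$$\sum_{k \ge 1}\frac{\cos kr}{z^2 + k^2} = \frac{\pi}{2z}\,\frac{e^{(r - \pi)z} + e^{-(r - \pi)z}}{e^{\pi z} - e^{-\pi z}} - \frac{1}{2z^2}, \qquad \forall r \in (0, 2\pi), \tag{B.32}$$

$$\sum_{k \ge 1}(-1)^k\frac{\sin kr}{k\,(z^2 + k^2)} = \frac{\pi}{2z^2}\,\frac{e^{rz} - e^{-rz}}{e^{\pi z} - e^{-\pi z}} - \frac{r}{2z^2}, \qquad \forall r \in (-\pi, \pi), \tag{B.33}$$

$$\sum_{k \ge 1}\frac{1}{z^2 + k^2\pi^2} = \frac{1}{2z}\left(\coth z - \frac{1}{z}\right), \tag{B.34}$$

$$\prod_{k \ge 1}\left[1 + \frac{l^2}{k^2}\right] = \frac{\sinh(\pi l)}{\pi l}, \tag{B.35}$$

and $\sum_{k \ge 1} 1/k^2 = \pi^2/6$ (for proofs of these identities see for example, Macrobert [201]). We finally find the closed formula for the Brownian functional (B.22). □

In the one-dimensional case Proposition B.44 takes the following simpler form. This is the form of the result which is used in Chapter 6 to derive the density of $\pi_t$ for the Beneš filter.

Corollary B.45. Let $\{B_t, t \ge 0\}$ be a standard Brownian motion, $\beta : [0, t] \to \mathbb{R}$ be a bounded measurable function, and $\Gamma \in \mathbb{R}$ be a positive constant. Then

$$\mathbb{E}\left[\exp\left(\int_0^t B_s\beta_s\,ds - \frac{1}{2}\int_0^t \Gamma^2 B_s^2\,ds\right) \,\Big|\, B_t = \delta\right] = \bar{f}^{\beta,\Gamma}(t)\exp\left(\int_0^t \frac{\sinh(s\Gamma)}{\sinh(t\Gamma)}\,\beta_s\,ds\;\delta - \frac{\Gamma\coth(t\Gamma)}{2}\,\delta^2 + \frac{\delta^2}{2t}\right), \tag{B.36}$$

where

$$\bar{f}^{\beta,\Gamma}(t) = \sqrt{\frac{t\Gamma}{\sinh(t\Gamma)}}\,\exp\left(\int_0^t\int_0^t \frac{\sinh((s - t)\Gamma)\sinh(s'\Gamma)}{2\Gamma\sinh(t\Gamma)}\,\beta_s\beta_{s'}\,ds\,ds'\right).$$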
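With $\beta \equiv 0$ the right-hand side of (B.36) reduces to $\sqrt{t\Gamma/\sinh(t\Gamma)}\,\exp\left(-\tfrac{1}{2}\Gamma\coth(t\Gamma)\delta^2 + \delta^2/(2t)\right)$, which can be compared with a Monte Carlo estimate of the conditional expectation. The sketch below is illustrative and not from the text: it conditions on $B_t = \delta$ by simulating a Brownian bridge as $B_s - (s/t)(B_t - \delta)$, approximates $\int_0^t B_s^2\,ds$ by a trapezoidal sum on a uniform grid, and uses hypothetical function names throughout.

```python
import math
import random


def closed_form(gamma, t, delta):
    """Right-hand side of (B.36) in the special case beta = 0."""
    prefactor = math.sqrt(t * gamma / math.sinh(t * gamma))
    coth = math.cosh(t * gamma) / math.sinh(t * gamma)
    return prefactor * math.exp(-0.5 * gamma * coth * delta ** 2 + delta ** 2 / (2.0 * t))


def bridge_estimate(gamma, t, delta, n_steps=100, n_paths=4000, seed=4):
    """Monte Carlo estimate of E[exp(-(gamma^2/2) int_0^t B_s^2 ds) | B_t = delta]."""
    rng = random.Random(seed)
    dt = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        # Unconditioned Brownian path on the grid.
        path = [0.0]
        for _ in range(n_steps):
            path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
        # Pin the endpoint at delta to obtain a Brownian bridge.
        end = path[-1]
        bridge = [path[k] - (k / n_steps) * (end - delta) for k in range(n_steps + 1)]
        # Trapezoidal approximation of the time integral of the squared path.
        integral = dt * (sum(b * b for b in bridge) - 0.5 * (bridge[0] ** 2 + bridge[-1] ** 2))
        total += math.exp(-0.5 * gamma ** 2 * integral)
    return total / n_paths


estimate = bridge_estimate(gamma=1.0, t=1.0, delta=0.5)
exact = closed_form(gamma=1.0, t=1.0, delta=0.5)
print(estimate, exact)
```

The check is restricted to $\beta \equiv 0$ so that only the prefactor and the quadratic terms in $\delta$ are exercised; the two printed values should agree up to sampling and grid error.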
1 Introduction
1.1 Foreword

The development of mathematics since the 1950s has gone through many radical changes both in scope and in depth. Practical applications are being found for an increasing number of theoretical results and practical problems have also stimulated the development of theory. In the case of stochastic filtering, it is not clear whether this first arose as an application found for general theory, or as the solution of a practical problem. Stochastic filtering now covers so many areas that it would be futile to attempt to write a comprehensive book on the subject. The purpose of this text is not to be exhaustive, but to provide a modern, solid and accessible starting point for studying the subject.

The aim of stochastic filtering is to estimate an evolving dynamical system, the signal, customarily modelled by a stochastic process. Throughout the book the signal process is denoted by $X = \{X_t, t \ge 0\}$, where $t$ is the temporal parameter. Alternatively, one could choose a discrete time process, i.e. a process $X = \{X_t, t \in \mathbb{N}\}$ where $t$ takes values in the (discrete) set $\{0, 1, 2, \dots\}$. The former continuous time description of the process has the benefit that use can be made of the power of stochastic calculus. A discrete time process may be viewed as a continuous time process with jumps at fixed times. Thus a discrete time process can be viewed as a special case of a continuous time process. However, it is not necessarily effective to do so since it is much easier and more transparent to study the discrete case directly.

Unless otherwise stated, the process $X$ and all other processes are defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The signal process $X$ cannot be measured directly. However, a partial measurement of the signal can be obtained. This measurement is modelled by another continuous time process $Y = \{Y_t, t \ge 0\}$ which is called the observation process. This observation process is a function of $X$ and a measurement noise.
The measurement noise is modelled by a stochastic process $W = \{W_t, t \ge 0\}$. Hence,

$$Y_t = f_t(X_t, W_t), \qquad t \in [0, \infty).$$

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering, © Springer Science+Business Media, LLC 2009, DOI 10.1007/978-0-387-76896-0_1
Let $\mathcal{Y} = \{\mathcal{Y}_t, t \ge 0\}$ be the filtration generated by the observation process Y; namely,

$$\mathcal{Y}_t = \sigma(Y_s, \ s \in [0, t]), \qquad t \ge 0.$$

This σ-algebra $\mathcal{Y}_t$ can be interpreted as the information available from observations up to time t. This information can be used to make various inferences about X, for example:

• What is the best estimate (denoted by $\hat{X}_t$) of the value of the signal at time t, given the observations up to time t? If best estimate means the best mean square estimate, then this translates into computing $E[X_t \mid \mathcal{Y}_t]$, the conditional mean of $X_t$ given $\mathcal{Y}_t$.
• Given the observations up to time t, what is the estimate of the difference $X_t - \hat{X}_t$? For example, if the signal is real-valued, we may want to compute $E[(X_t - \hat{X}_t)^2 \mid \mathcal{Y}_t] = E[X_t^2 \mid \mathcal{Y}_t] - E[X_t \mid \mathcal{Y}_t]^2$.
• What is the probability that the signal at time t can be found within a certain set A, again given the observations up to time t? This means computing $P(X_t \in A \mid \mathcal{Y}_t)$, the conditional probability of the event $\{X_t \in A\}$ given $\mathcal{Y}_t$.
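As a minimal illustration of the three inferences listed above, suppose (purely as an assumption for this sketch) that the conditional distribution of the signal at time t given the observations is concentrated on finitely many points with known weights. The state values, the weights and the helper name `conditional_expectation` are hypothetical and are not taken from the text; each inference then reduces to a weighted sum.

```python
import numpy as np

def conditional_expectation(phi, states, weights):
    """Weighted sum of phi over a finitely supported conditional distribution."""
    states = np.asarray(states, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(phi(states) * weights))

# Hypothetical conditional distribution of X_t given the observations.
states = np.array([-1.0, 0.0, 2.0])
weights = np.array([0.2, 0.5, 0.3])   # non-negative, summing to one

# Best mean square estimate: the conditional mean.
mean = conditional_expectation(lambda x: x, states, weights)

# Conditional mean square error: E[X^2 | Y] - E[X | Y]^2.
second_moment = conditional_expectation(lambda x: x**2, states, weights)
var = second_moment - mean**2

# Conditional probability of the set A = [0, infinity), via an indicator function.
prob_A = conditional_expectation(lambda x: (x >= 0).astype(float), states, weights)
```

All three quantities are obtained from the same object (the weights) by changing only the function being averaged, which is the point made in the paragraph that follows.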
The typical form of such an inference requires the computation or approximation of one or more quantities of the form $E[\varphi(X_t) \mid \mathcal{Y}_t]$, where ϕ is a real-valued function defined on the state space of the signal. Each of these statistics will provide fragments of information about $X_t$. But what if all information about $X_t$ which is contained in $\mathcal{Y}_t$ is required? Mathematically, this means computing $\pi_t$, the conditional distribution of $X_t$ given $\mathcal{Y}_t$. This $\pi_t$ is defined as a random probability measure which is measurable with respect to $\mathcal{Y}_t$ so that†

$$E[\varphi(X_t) \mid \mathcal{Y}_t] = \int_S \varphi(x)\,\pi_t(dx), \tag{1.1}$$

for all statistics ϕ for which both terms of the above identity make sense. Knowing $\pi_t$ will enable us, at least theoretically, to compute any inference of $X_t$ given $\mathcal{Y}_t$ which is of interest, by integrating a suitable function ϕ with respect to $\pi_t$.

† The identity (1.1) holds P-almost surely, i.e. there can be a subset of Ω of probability zero where (1.1) does not hold. The formal definition of the process $\pi_t$ can be found in Chapter 2.

The measurability of $\pi_t$ with respect to $\mathcal{Y}_t$ is crucial. However, this condition is sometimes overlooked and treated as a rather meaningless theoretical requirement. The following theorem illustrates the significance of the condition (for a proof see, e.g. Proposition 4.9 page 69 in [23]).

Theorem 1.1. Let Ω be a probability space and $a, b : \Omega \to \mathbb{R}$ be two arbitrary functions. Let $\mathcal{A}$ be the σ-algebra generated by a, that is the smallest σ-algebra such that a is $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable. Then if b is also $\mathcal{A}/\mathcal{B}(\mathbb{R})$-measurable there exists a $\mathcal{B}(\mathbb{R})/\mathcal{B}(\mathbb{R})$-measurable function $f : \mathbb{R} \to \mathbb{R}$ such that $b = f \circ a$, where ◦ denotes function composition.

Hence if b is “a-measurable”, then b is determined by a. If we know the value of a then (theoretically) we will know the value of b. In practice however, it is often impossible to obtain an explicit formula for the connecting function f and this is the main difficulty in solving the filtering problem. Translating this concept into the context of filtering tells us that the random probability $\pi_t$ is a function of $Y_s$ for $s \in [0, t]$. Thus $\pi_t$ is determined by the values of the observation process in the time interval [0, t].
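Theorem 1.1 can be checked by hand when Ω is finite. In that setting (an assumption made only for this sketch) the σ-algebra generated by a consists of unions of the level sets of a, so b is measurable with respect to it exactly when b is constant on each level set, and the connecting function f can be read off as a lookup table. The function name `factor_through` and the example maps below are hypothetical.

```python
def factor_through(a, b, omega):
    """Return a table f with b(w) == f[a(w)] for every w in omega.

    Returns None when no such table exists, i.e. when b takes two different
    values on some level set of a (b is then not determined by a).
    """
    f = {}
    for w in omega:
        key = a(w)
        if key in f and f[key] != b(w):
            return None          # b is not constant on the level set {a = key}
        f[key] = b(w)
    return f

omega = range(-3, 4)             # a small finite sample space
a = lambda w: w * w              # a forgets the sign of w
b = lambda w: abs(w) + 1         # depends on w only through w*w
c = lambda w: w                  # needs the sign, which a does not reveal

f = factor_through(a, b, omega)  # f == {9: 4, 4: 3, 1: 2, 0: 1}
g = factor_through(a, c, omega)  # g is None
```

In the filtering problem the role of a is played by the whole observation path, which is why an explicit table or formula for f is rarely available.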
1.2 The Contents of the Book

The book is divided into two parts. The first part deals with the theoretical aspects of the problem of stochastic filtering and the second describes numerical methods for solving the filtering problem with emphasis on the class of particle approximations.

In Chapter 2 a fundamental measure-theoretic result related to π is proved: that the conditional distribution of the signal can be viewed as a stochastic process with values in the space of probability measures.

The filtering problem is stated formally in Chapter 3 for a class of problem where the signal X takes values in a state space S and is the solution of a martingale problem associated with an operator A. Two examples of filtering problems which can be considered in this fashion are:

1. The state space $S = \mathbb{R}^d$ and $X = (X^i)_{i=1}^d$ is the solution of a d-dimensional stochastic differential equation driven by an m-dimensional Brownian motion process $V = (V^j)_{j=1}^m$,

$$X_t^i = X_0^i + \int_0^t f^i(X_s)\,ds + \sum_{j=1}^m \int_0^t \sigma^{ij}(X_s)\,dV_s^j, \qquad i = 1, \dots, d. \tag{1.2}$$

In this case, the signal process is the solution of a martingale problem associated with the second-order differential operator

$$A = \sum_{i=1}^d f^i \frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^d \left( \sum_{k=1}^m \sigma^{ik} \sigma^{jk} \right) \frac{\partial^2}{\partial x_i \partial x_j}.$$

2. The state space S = I and X is a continuous time Markov chain with finite state space I. In this case, the corresponding operator is given by the Q-matrix of the chain.

The observation process Y is required to satisfy a stochastic evolution equation of the form
$$Y_t = Y_0 + \int_0^t h(X_s)\,ds + W_t, \tag{1.3}$$
where $W = (W^i)_{i=1}^n$ is an n-dimensional Brownian motion independent of X and $h = (h^i)_{i=1}^n : S \to \mathbb{R}^n$ is called the sensor function. The filtering equations for a problem of this class are then deduced. In particular, it is proved that for any test function ϕ in the domain of A we have†

$$d\pi_t(\varphi) = \pi_t(A\varphi)\,dt + \sum_{i=1}^n \left( \pi_t(h^i \varphi) - \pi_t(h^i)\,\pi_t(\varphi) \right) \left( dY_t^i - \pi_t(h^i)\,dt \right). \tag{1.4}$$

† If a is a measure on a space S and f is an a-integrable function then $a(f) \triangleq \int_S f(x)\,a(dx)$.

Also, $\pi_t$ has an unnormalized version, denoted by $\rho_t$, which satisfies the linear equation

$$d\rho_t(\varphi) = \rho_t(A\varphi)\,dt + \sum_{i=1}^n \rho_t(h^i \varphi)\,dY_t^i. \tag{1.5}$$

The identity

$$\pi_t(\varphi) = \frac{\rho_t(\varphi)}{\rho_t(\mathbf{1})}$$

is called the Kallianpur–Striebel formula. The first term of (1.5) describes the evolution of the signal and the accumulation of observations is reflected in the second term. The same terms (with the same interpretations) can be found in (1.4) and the additional terms are due to the normalization procedure.

In Chapter 3 we present two approaches to deducing the filtering equations (1.4) and (1.5): the change of measure approach and the innovation approach. An extension is also described to the case where the noise driving the observation process is no longer independent of the signal. This feature is quite common, for example, in financial applications.

Chapter 4 contains a detailed study of the uniqueness of the solution of the filtering equations (1.4) and (1.5). The uniqueness can be shown by following a partial differential equations approach. The solution of certain partial differential equations with final condition is proved to be a partial dual for the filtering equations which leads to a proof of uniqueness. The second approach to proving uniqueness of the solution of the filtering equations follows the recent work of Heunis and Lucic.

In Chapter 5, we study the robust representation formula for the conditional expectation of the signal. The representation is robust in the sense that its dependence on the observation process Y is continuous. The result has important practical and theoretical consequences.
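To make (1.3), (1.5) and the Kallianpur–Striebel formula concrete, consider the second example of this section: a signal that is a Markov chain on a finite state space with Q-matrix Q, observed through a one-dimensional process. In that case $\rho_t$ can be identified with a vector of weights and (1.5) reads $d\rho_t = Q^{\top}\rho_t\,dt + \mathrm{diag}(h)\,\rho_t\,dY_t$. The sketch below is only an illustration under stated assumptions: the time step, the two-state chain, the sensor values and the function names are hypothetical, and the discretization (an Euler step for the signal term followed by an exponential update for the observation term) is one simple choice, not the method developed in the book.

```python
import numpy as np

def simulate_chain_and_observations(Q, h, pi0, n_steps, dt, rng):
    """Simulate the chain and the increments of Y from (1.3) on a time grid."""
    P = np.eye(len(h)) + dt * np.asarray(Q, dtype=float)  # one-step transition matrix, valid for small dt
    x = rng.choice(len(h), p=pi0)
    xs = np.empty(n_steps, dtype=int)
    dY = np.empty(n_steps)
    for k in range(n_steps):
        xs[k] = x
        dY[k] = h[x] * dt + np.sqrt(dt) * rng.standard_normal()
        x = rng.choice(len(h), p=P[x])
    return xs, dY

def finite_state_filter(Q, h, pi0, dY, dt):
    """Approximate the conditional distribution pi_t from observation increments."""
    Q = np.asarray(Q, dtype=float)
    h = np.asarray(h, dtype=float)
    rho = np.asarray(pi0, dtype=float).copy()
    pis = np.empty((len(dY), len(h)))
    for k, dy in enumerate(dY):
        rho = rho + dt * (Q.T @ rho)                   # first term of (1.5): evolution of the signal
        rho = rho * np.exp(h * dy - 0.5 * h**2 * dt)   # second term of (1.5): new observation
        pis[k] = rho / rho.sum()                       # Kallianpur-Striebel: pi = rho / rho(1)
        rho = pis[k].copy()                            # rescaling rho by a constant leaves pi unchanged
    return pis

# Hypothetical two-state example.
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
h = np.array([0.0, 3.0])
pi0 = np.array([0.5, 0.5])
rng = np.random.default_rng(1)
xs, dY = simulate_chain_and_observations(Q, h, pi0, n_steps=2000, dt=1e-3, rng=rng)
pis = finite_state_filter(Q, h, pi0, dY, dt=1e-3)
```

Each row of `pis` is a probability vector on the state space. Renormalizing at every step is a numerical convenience only, since the ratio in the Kallianpur–Striebel formula is unaffected by a constant factor in $\rho_t$.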
Chapter 6 is devoted to finite-dimensional filters. Two classes of filter are described: the Kalman–Bucy filter and the Beneš filter. Explicit formulae are deduced for both $\pi_t$ and $\rho_t$ and the finite-dimensionality of the filters is emphasized. The analysis of the Beneš filter uses the robust representation result presented in Chapter 5.

Among practitioners, it is generally accepted that the state space for $\pi_t$ is that of densities with respect to the Lebesgue measure. Inherent in this is the (often unproved) assumption that $\pi_t$ will always be absolutely continuous with respect to the Lebesgue measure. This is not always the case, although usually practitioners assume the correct conditions to ensure this. We discuss this issue in Chapter 7 and we look at the stochastic PDEs satisfied by the density of $\pi_t$ and the density of $\rho_t$.

Chapter 8 gives an overview of the main computational methods currently available for solving the filtering problem. As expected of a topic with such a diversity of applications, numerous algorithms for solving the filtering problem have been developed. Six classes of numerical method are presented: linearization methods (the extended Kalman filter), approximations by (exact) finite-dimensional filters, the projection filter/moment methods, spectral methods, PDE methods and particle methods.

Chapter 9 contains a detailed study of a continuous time particle filter. Particle filters (also known as sequential Monte Carlo methods) are some of the most successful methods for the numerical approximations of the solution of the filtering problem.

Chapter 10 is a self-contained, elementary treatment of particle approximations to the solution of the stochastic filtering problem in the discrete time case.

Finally, two appendices contain an assortment of measure theory, probability theory and stochastic analysis results included in order to make the text as self-contained as possible.
1.3 Historical Account

The origins of the filtering problem in discrete time can be traced back to the work of Kolmogorov [152, 153] and Krein [155, 156]. In the continuous time case Wiener [270] was the first to discuss the optimal estimation of dynamical systems in the presence of noise. The Wiener filter consists of a signal X which is a stationary process and an associated measurement process Y = X + V where V is some independent noise. The object is to use the values of Y to estimate X, where the estimation is required to have the following three properties.

• Causal: $X_t$ is to be estimated using $Y_s$ for s ≤ t.
• Optimal: The estimate, say $\hat{X}_t$, should minimise the mean square error $E[(X_t - \hat{X}_t)^2]$.
• Online: At any (arbitrary) time t, the estimate $\hat{X}_t$ should be available.

The Wiener filter gives a linear, time-invariant causal estimate of the form

$$\hat{X}_t = \int_{-\infty}^{t} h(t - s)\,Y(s)\,ds,$$
where h(s) is called the transfer function. Wiener studied and solved this problem using the spectral theory of stationary processes. The results were included in a classified National Defense Research Committee report issued in January/February 1942. The report, nicknamed “The Yellow Peril” (according to Wiener [271] this was because of the yellow paper in which it was bound) was widely circulated among defence engineers. Subsequently declassified, it appeared as a book, [270], in 1949.

It is important to note that all subsequent advances in the theory and practical implementation of stochastic filtering have adhered to the three precepts enumerated above: causality, optimality and online estimation.

The next major development in stochastic filtering was the introduction of the linear filter. In this case, the signal satisfies a stochastic differential equation of the form (1.2) with linear coefficients and Gaussian initial condition and the observation process satisfies an evolution equation of the form (1.3) with a linear sensor function. The linear filter can be solved explicitly; in other words, $\pi_t$ is given by a closed formula. The solution is a finite-dimensional one: $\pi_t$ is Gaussian, hence completely determined by its mean and its covariance matrix. Moreover it is quite easy to estimate the two parameters. The covariance matrix does not depend on Y and it satisfies a deterministic Riccati equation. Hence it can be solved in advance, before the filter is applied online. The mean satisfies a linear stochastic differential equation driven by Y, whose solution can be easily computed. These were the reasons for the linear filter’s widespread success in the 1960s; for example it was used by NASA to get the Apollo missions off the ground and to the moon.† Bucy and Kalman were the pioneers in this field. Kalman was the first to publish in a wide circulation journal. In [146], he solved the discrete time version of the linear filter. Bucy obtained similar results independently.

† For an account of the linear filter’s applications to aerospace engineering and further references see Cipra [54].

Following the success of the linear filter, scientists started to explore different avenues. Firstly they extended the application of the Kalman filter beyond the linear/Gaussian framework. The basis of this extension is the fact that, locally, all systems behave linearly. So, at least locally, one can apply the Kalman filter equation. This gave rise to a class of algorithm called the extended Kalman filter. At the time of writing these algorithms, most of which are empirical and without theoretical foundation, are still widely used in a variety of applications.‡

‡ We study the extended Kalman filter in some detail in Chapter 8.
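The structure of the linear filter described above (a deterministic Riccati equation for the covariance that can be solved before any data arrive, and a linear equation driven by Y for the mean) can be sketched in the scalar case. The model below is an assumption made for illustration: signal $dX_t = aX_t\,dt + \sigma\,dV_t$, observation $dY_t = cX_t\,dt + dW_t$, for which the conditional variance solves $\dot{P} = 2aP + \sigma^2 - c^2P^2$ and the conditional mean solves $d\hat{X}_t = a\hat{X}_t\,dt + cP_t\,(dY_t - c\hat{X}_t\,dt)$. The parameter values, the Euler time stepping and the function names are hypothetical choices, not taken from the text.

```python
import numpy as np

def riccati_path(a, sigma, c, p0, T, dt):
    """Euler scheme for dP/dt = 2aP + sigma^2 - c^2 P^2; uses no observations."""
    n = int(round(T / dt))
    P = np.empty(n + 1)
    P[0] = p0
    for k in range(n):
        P[k + 1] = P[k] + dt * (2 * a * P[k] + sigma**2 - (c * P[k])**2)
    return P

def simulate_observations(a, sigma, c, x0, T, dt, rng):
    """Euler-Maruyama simulation of the signal and of the increments of Y."""
    n = int(round(T / dt))
    X = np.empty(n + 1)
    X[0] = x0
    dY = np.empty(n)
    for k in range(n):
        dY[k] = c * X[k] * dt + np.sqrt(dt) * rng.standard_normal()
        X[k + 1] = X[k] + a * X[k] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return X, dY

def filter_mean(a, c, P, dY, m0, dt):
    """Conditional mean: a linear recursion driven by the observation increments."""
    m = np.empty(len(dY) + 1)
    m[0] = m0
    for k in range(len(dY)):
        m[k + 1] = m[k] + a * m[k] * dt + c * P[k] * (dY[k] - c * m[k] * dt)
    return m

a, sigma, c, dt, T = -0.5, 1.0, 1.0, 1e-3, 20.0
rng = np.random.default_rng(0)
P = riccati_path(a, sigma, c, p0=1.0, T=T, dt=dt)        # computed offline
X, dY = simulate_observations(a, sigma, c, x0=rng.standard_normal(), T=T, dt=dt, rng=rng)
m = filter_mean(a, c, P, dY, m0=0.0, dt=dt)              # computed online from Y

# Stationary solution of the Riccati equation, for comparison with P[-1].
P_inf = (a + np.sqrt(a**2 + (c * sigma)**2)) / c**2
```

Note that `riccati_path` never touches the observations, which is the property that allows the covariance to be tabulated in advance; only `filter_mean` has to run online.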
Stratonovich’s work in non-linear filtering theory took place at the same time as the work of Bucy and Kalman. Stratonovich† presented his first results in the theory of conditional Markov processes and the related optimal nonlinear filtering at the All-Union Conference on Statistical Radiophysics in Gorki (1958) and in a seminar [257]; they were published as [259]. Nevertheless, there was considerable unease about the methods used by Stratonovich to deduce the continuous time filtering equation. The paper [259] appeared with an editorial footnote indicating that part of the exposition was not wholly convincing. Writing in Mathematical Reviews, Bharucha-Reid [17] indicated that he was inclined to agree with the editor’s comment concerning the author’s arguments in the continuous case. Part of the problem was that Stratonovich was using the stochastic integral which today bears his name. Stratonovich himself mentions this misunderstanding in [260, page 42]. He also points out (ibid., page 227) that the linear filtering equations were published by him in [258].

On the other side of the Atlantic in the mid-1960s Kushner [175, 176, 178] derived and analysed equation (1.4) using Itô (and not Stratonovich) calculus. Shiryaev [255] provided the first rigorous derivation in the case of a general observation process where the signal and observation noises may be correlated. The equation (1.4) was also obtained in various forms by other authors, namely: Bucy [30] and Wonham [273]. In 1968, Kailath [137] introduced the innovation approach to linear filtering. This new method for deducing the filtering equations was extended in the early 1970s by Frost and Kailath [103] and by Fujisaki, Kallianpur and Kunita [104]. The equation (1.4) is now commonly referred to as either the Fujisaki–Kallianpur–Kunita equation or the Kushner–Stratonovich equation.
† We thank Gregorii Milstein and Michael Tretyakov for drawing our attention to Stratonovich’s historical account [260].

Similarly, the filtering equation (1.5) was introduced in the same period by Duncan [85], [84], Mortensen [222] and Zakai [281], and is consequently referred to as the Zakai or the Duncan–Mortensen–Zakai equation. The stochastic partial differential equations‡ associated with the filtering equations were rigorously analysed and extended in the late 1970s by Pardoux [236, 237, 238] and Krylov and Rozovskii [159, 160, 161, 162]. Pardoux adopted a functional analytic approach in analysing these SPDEs, whereas Krylov and Rozovskii examined the filtering equations using methods inherited from classical PDE theory. See Rozovskii [250] and the references therein for an analysis of the filtering equations using these methods.

‡ Here we refer to the strong version of the filtering equations (1.4) and (1.5) as described in Chapter 7.

Another important development in filtering theory was initiated by Clark [56] and continued by Davis [72, 74, 75]. In the late 1970s, Clark introduced the concept of robust or pathwise filtering; that is, $\pi_t(\varphi)$ is a function of the observation path $\{Y_s, s \in [0, T]\}$,

$$\pi_t(\varphi) = \Phi(Y_s;\ s \in [0, T]),$$

where Φ is a function defined on the corresponding space of trajectories. But Φ is not uniquely defined. Any other function Φ′ equal to Φ on a set of measure one would be an equally acceptable version of $\pi_t(\varphi)$. From a computational point of view, we need to identify a continuous version of Φ.†

Given the success of the linear/Gaussian filter, scientists tried to find other classes of filtering problem where the solution was finite-dimensional and/or had a closed form. Beneš [9] succeeded in doing this. The class of filter which he studied had a linearly evolving observation process. However the signal was allowed to have a non-linear drift as long as it satisfied a certain (quite restrictive) condition, thenceforth known as the Beneš condition. The linear filter satisfies the Beneš condition.

Brockett and Clark [26, 27, 28] initiated a Lie algebraic approach to the filtering problem. From the linearized form of the Zakai equation one can deduce that $\rho_t$ lies on a surface “generated” by two differential operators. One is the infinitesimal generator of X, generally a second-order differential operator and the other is a linear zero-order operator. From a Lie algebraic point of view the Kalman filter and the Beneš filter are isomorphic, where the isomorphism is given by a state space transformation.

Beneš continued his work in [10] where he found a larger class of exact filter for which the corresponding Lie algebra is no longer isomorphic with that associated with the Kalman–Bucy filter. Following Beneš, Daum derived new classes of exact filters in [69] and [70]. A number of other classes of finite-dimensional filter have been discovered; see the series of papers by Chiou, Chen, Hu, Leung, Wu, Yau and Yau [48, 49, 50, 131, 274, 277, 276, 278]. See also the papers by Maybank [203, 204] and Schwartz and Dickinson [254].
† We analyze the pathwise approach to stochastic filtering in Chapter 5.

In contrast to these finite-dimensional filters, results have been discovered which prove that generically the filtering problem is infinite-dimensional (Chaleyat-Maurel and Michel [42]). Hazewinkel, Marcus and Sussmann [121, 122] and Mitter [210] have contributed to this area. The general consensus is now that finite-dimensional filters are the exceptions and not the rule.

The work of Kallianpur has been influential in the field. The papers which contain the derivation of the Kallianpur–Striebel formula [144] and the derivation of the filtering equation [104] are of particular interest. Jointly with Karandikar in the papers [138, 139, 140, 141, 142, 143], Kallianpur extended the theory of stochastic filtering to finitely additive measures in place of countably additive measures.

The area expanded rapidly in the 1980s and 1990s. Among the topics developed in this period were: stability of the solution of the filtering problem, the uniqueness and Feynman–Kac representations of the solutions of the filtering equations, Malliavin calculus applied to the qualitative analysis of $\pi_t$, and connections between filtering and information theory. In addition to the scientists already mentioned, Bensoussan [12, 14, 15], Budhiraja [32, 33, 34, 35], Chaleyat-Maurel [40, 41, 44, 45], Duncan [86, 87, 88, 89], Elliott [90, 91, 92, 94], Grigelionis [107, 108, 109, 111], Gyöngy [112, 113, 115, 116, 117], Hazewinkel [124, 123, 125, 126], Heunis [127, 128, 129, 130], Kunita [165, 166, 167, 168], Kurtz [170, 172, 173, 174], Liptser [52, 190, 191], Michel [46, 47, 207, 20], Mikulevicius [109, 110, 208, 209], Mitter [98, 211, 212, 213], Newton [212, 225, 226], Picard [240, 241, 242, 243], Ocone [57, 228, 229, 230, 232, 233], Runggaldier [80, 96, 154, 191] and Zeitouni [4, 5, 282, 283, 284] contributed during this period. In addition to these papers, monographs were written by Bensoussan [13], Liptser and Shiryaev [192, 193] and Rozovskii [250] and Pardoux published lecture notes [238].

Much of the work carried out in the 1990s has focussed on the numerical solution of the filtering problem. The advent of fast computers has encouraged research in this area beyond the linear/Gaussian filter. Development in this area continues today. In Chapter 8 some historical comments are given for each of the six classes of numerical method discussed. Kushner (see e.g. [177, 179, 180, 181]) worked in particular on approximations of the solution of the filtering problem by means of finite Markov chain approximations (which are classified in Chapter 8 as PDE methods). Among others he introduced the important idea of a robust discrete state approximation, the finite difference method. Le Gland and his collaborators (see [25, 24, 100, 101, 136, 187, 188, 223]) have contributed to the development of several classes of approximation including the projection filter, PDE methods and particle methods.

Rapid progress continues to be made in both the theory and applications of stochastic filtering.
In addition to work on the classical filtering problem, there is ongoing work on the analysis of the filtering problem for infinite-dimensional problems and problems where the Brownian motion noise is replaced by either ‘coloured’ noise, or fractional Brownian motion. Applications of stochastic filtering have been found within mathematical finance. There is continuing work for developing both generic/universal numerical methods for solving the filtering problem and problem specific ones. At a Cambridge conference on stochastic processes in July 2001, Moshe Zakai was asked what he thought of stochastic filtering as a subject for future research students. He replied that he always advised his students ‘to have an alternative subject on the side, just in case!’ We hope that this book will assist anyone interested in learning about this challenging subject!
References
1. Robert A. Adams. Sobolev Spaces. Academic Press, Orlando, FL, 2nd edition, 2003.
2. Lakhdar Aggoun and Robert J. Elliott. Measure Theory and Filtering, volume 15 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK, 2004.
3. Deborah F. Allinger and Sanjoy K. Mitter. New results on the innovations problem for nonlinear filtering. Stochastics, 4(4):339–348, 1980/81.
4. Rami Atar, Frederi Viens, and Ofer Zeitouni. Robustness of Zakai’s equation via Feynman-Kac representations. In Stochastic Analysis, Control, Optimization and Applications, Systems Control Found. Appl., pages 339–352. Birkhäuser Boston, Boston, MA, 1999.
5. Rami Atar and Ofer Zeitouni. Exponential stability for nonlinear filtering. Ann. Inst. H. Poincaré Probab. Statist., 33(6):697–725, 1997.
6. J. E. Baker. Reducing bias and inefficiency in the selection algorithm. In John J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms and their Applications, pages 14–21, Mahwah, NJ, 1987. Lawrence Erlbaum.
7. John S. Baras, Gilmer L. Blankenship, and William E. Hopkins Jr. Existence, uniqueness, and asymptotic behaviour of solutions to a class of Zakai equations with unbounded coefficients. IEEE Trans. Automatic Control, AC-28(2):203–214, 1983.
8. Eduardo Bayro-Corrochano and Yiwen Zhang. The motor extended Kalman filter: A geometric approach for rigid motion estimation. J. Math. Imaging Vision, 13(3):205–228, 2000.
9. V. E. Beneš. Exact finite-dimensional filters for certain diffusions with nonlinear drift. Stochastics, 5(1-2):65–92, 1981.
10. V. E. Beneš. New exact nonlinear filters with large Lie algebras. Systems Control Lett., 5(4):217–221, 1985.
11. V. E. Beneš. Nonexistence of strong nonanticipating solutions to stochastic DEs: implications for functional DEs, filtering and control. Stochastic Process. Appl., 5(3):243–263, 1977.
12. A. Bensoussan. On some approximation techniques in nonlinear filtering. In Stochastic Differential Systems, Stochastic Control Theory and Applications (Minneapolis, MN, 1986), volume 10 of IMA Vol. Math. Appl., pages 17–31. Springer, New York, 1988.
13. A. Bensoussan. Stochastic Control of Partially Observable Systems. Cambridge University Press, Cambridge, UK, 1992.
14. A. Bensoussan, R. Glowinski, and A. Rascanu. Approximation of the Zakai equation by splitting up method. SIAM J. Control Optim., 28:1420–1431, 1990.
15. Alain Bensoussan. Nonlinear filtering theory. In Recent advances in stochastic calculus (College Park, MD, 1987), Progr. Automat. Info. Systems, pages 27–64. Springer, New York, 1990.
16. Albert Benveniste. Séparabilité optionnelle, d’après Doob. In Séminaire de Probabilités, X (Univ. Strasbourg), Année universitaire 1974/1975, volume 511 of Lecture Notes in Math., pages 521–531. Springer Verlag, Berlin, 1976.
17. A. T. Bharucha-Reid. Review of Stratonovich, Conditional Markov processes. Mathematical Reviews, (MR0137157), 1963.
18. A. G. Bhatt, G. Kallianpur, and R. L. Karandikar. Uniqueness and robustness of solution of measure-valued equations of nonlinear filtering. Ann. Probab., 23(4):1895–1938, 1995.
19. P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.
20. Jean-Michel Bismut and Dominique Michel. Diffusions conditionnelles. II. Générateur conditionnel. Application au filtrage. J. Funct. Anal., 45(2):274–292, 1982.
21. B. Z. Bobrovsky and M. Zakai. Asymptotic a priori estimates for the error in the nonlinear filtering problem. IEEE Trans. Inform. Theory, 28:371–376, 1982.
22. N. Bourbaki. Éléments de Mathématique: Topologie Générale [French]. Hermann, Paris, France, 1958.
23. Leo Breiman. Probability. Classics in Applied Mathematics. SIAM, Philadelphia, PA, 1992.
24. Damiano Brigo, Bernard Hanzon, and François Le Gland. A differential geometric approach to nonlinear filtering: the projection filter. IEEE Trans. Automat. Control, 43(2):247–252, 1998.
25. Damiano Brigo, Bernard Hanzon, and François Le Gland. Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli, 5(3):495–534, 1999.
26. R. W. Brockett. Nonlinear systems and nonlinear estimation theory. In Stochastic Systems: The Mathematics of Filtering and Identification and Applications (Les Arcs, 1980), volume 78 of NATO Adv. Study Inst. Ser. C: Math. Phys. Sci., pages 441–477, Dordrecht-Boston, 1981. Reidel.
27. R. W. Brockett. Nonlinear control theory and differential geometry. In Z. Ciesielski and C. Olech, editors, Proceedings of the International Congress of Mathematicians, pages 1357–1367, Warsaw, 1984. Polish Scientific.
28. R. W. Brockett and J. M. C. Clark. The geometry of the conditional density equation. Analysis and optimisation of stochastic systems. In Proceedings of the International Conference, University of Oxford, Oxford, 1978, pages 299–309, London-New York, 1980. Academic Press.
29. R. S. Bucy. Optimum finite time filters for a special non-stationary class of inputs. Technical Report Internal Report B. B. D. 600, March 31, Johns Hopkins Applied Physics Laboratory, 1959.
30. R. S. Bucy. Nonlinear filtering. IEEE Trans. Automatic Control, AC-10:198, 1965.
31. R. S. Bucy and P. D. Joseph. Filtering for Stochastic Processes with Applications to Guidance. Chelsea, New York, second edition, 1987.
32. A. Budhiraja and G. Kallianpur. Approximations to the solution of the Zakai equation using multiple Wiener and Stratonovich integral expansions. Stochastics Stochastics Rep., 56(3-4):271–315, 1996.
33. A. Budhiraja and G. Kallianpur. The Feynman-Stratonovich semigroup and Stratonovich integral expansions in nonlinear filtering. Appl. Math. Optim., 35(1):91–116, 1997.
34. A. Budhiraja and D. Ocone. Exponential stability in discrete-time filtering for non-ergodic signals. Stochastic Process. Appl., 82(2):245–257, 1999.
35. Amarjit Budhiraja and Harold J. Kushner. Approximation and limit results for nonlinear filters over an infinite time interval. II. Random sampling algorithms. SIAM J. Control Optim., 38(6):1874–1908 (electronic), 2000.
36. D. L. Burkholder. Distribution function inequalities for martingales. Ann. Prob., 1(1):19–42, 1973.
37. Z. Cai, F. Le Gland, and H. Zhang. An adaptive local grid refinement method for nonlinear filtering. Technical Report 2679, INRIA, 1995.
38. J. Carpenter, P. Clifford, and P. Fearnhead. An improved particle filter for non-linear problems. IEE Proceedings – Radar, Sonar and Navigation, 146:2–7, 1999.
39. J. R. Carpenter, P. Clifford, and P. Fearnhead. Sampling strategies for Monte Carlo filters for non-linear systems. IEE Colloquium Digest, 243:6/1–6/3, 1996.
40. M. Chaleyat-Maurel. Robustesse du filtre et calcul des variations stochastique. J. Funct. Anal., 68(1):55–71, 1986.
41. M. Chaleyat-Maurel. Continuity in nonlinear filtering. Some different approaches. In Stochastic Partial Differential Equations and Applications (Trento, 1985), volume 1236 of Lecture Notes in Math., pages 25–39. Springer, Berlin, 1987.
42. M. Chaleyat-Maurel and D. Michel. Des résultats de non existence de filtre de dimension finie. Stochastics, 13(1-2):83–102, 1984.
43. M. Chaleyat-Maurel and D. Michel. Hypoellipticity theorems and conditional laws. Z. Wahrsch. Verw. Gebiete, 65(4):573–597, 1984.
44. M. Chaleyat-Maurel and D. Michel. The support of the law of a filter in $C^\infty$ topology. In Stochastic Differential Systems, Stochastic Control Theory and Applications (Minneapolis, MN, 1986), volume 10 of IMA Vol. Math. Appl., pages 395–407. Springer, New York, 1988.
45. M. Chaleyat-Maurel and D. Michel. The support of the density of a filter in the uncorrelated case. In Stochastic Partial Differential Equations and Applications, II (Trento, 1988), volume 1390 of Lecture Notes in Math., pages 33–41. Springer, Berlin, 1989.
46. M. Chaleyat-Maurel and D. Michel. Support theorems in nonlinear filtering. In New Trends in Nonlinear Control Theory (Nantes, 1988), volume 122 of Lecture Notes in Control and Inform. Sci., pages 396–403. Springer, Berlin, 1989.
47. M. Chaleyat-Maurel and D. Michel. A Stroock Varadhan support theorem in nonlinear filtering theory. Probab. Theory Related Fields, 84(1):119–139, 1990.
48. J. Chen, S. S.-T. Yau, and C.-W. Leung. Finite-dimensional filters with nonlinear drift. IV. Classification of finite-dimensional estimation algebras of maximal rank with state-space dimension 3. SIAM J. Control Optim., 34(1):179–198, 1996.
49. J. Chen, S. S.-T. Yau, and C.-W. Leung. Finite-dimensional filters with nonlinear drift. VIII. Classification of finite-dimensional estimation algebras of maximal rank with state-space dimension 4. SIAM J. Control Optim., 35(4):1132–1141, 1997.
50. W. L. Chiou and S. S.-T. Yau. Finite-dimensional filters with nonlinear drift. II. Brockett’s problem on classification of finite-dimensional estimation algebras. SIAM J. Control Optim., 32(1):297–310, 1994.
51. N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Annals of Statistics, 32(6):2385–2411, 2004.
52. P.-L. Chow, R. Khasminskii, and R. Liptser. Tracking of signal and its derivatives in Gaussian white noise. Stochastic Process. Appl., 69(2):259–273, 1997.
53. K. L. Chung and R. J. Williams. Introduction to Stochastic Integration. Birkhäuser, Boston, second edition, 1990.
54. B. Cipra. Engineers look to Kalman filtering for guidance. SIAM News, 26(5), 1993.
55. J. M. C. Clark. Conditions for one to one correspondence between an observation process and its innovation. Technical report, Centre for Computing and Automation, Imperial College, London, 1969.
56. J. M. C. Clark. The design of robust approximations to the stochastic differential equations of nonlinear filtering. In J. K. Skwirzynski, editor, Communication Systems and Random Process Theory, volume 25 of Proc. 2nd NATO Advanced Study Inst. Ser. E, Appl. Sci., pages 721–734. Sijthoff & Noordhoff, Alphen aan den Rijn, 1978.
57. J. M. C. Clark, D. L. Ocone, and C. Coumarbatch. Relative entropy and error bounds for filtering of Markov processes. Math. Control Signals Systems, 12(4):346–360, 1999.
58. M. Cohen de Lara. Finite-dimensional filters. II. Invariance group techniques. SIAM J. Control Optim., 35(3):1002–1029, 1997.
59. M. Cohen de Lara. Finite-dimensional filters. Part I: The Wei-Norman technique. Part II: Invariance group technique. SIAM J. Control Optim., 35(3):980–1029, 1997.
60. D. Crisan. Exact rates of convergence for a branching particle approximation to the solution of the Zakai equation. Ann. Probab., 31(2):693–718, 2003.
61. D. Crisan. Particle approximations for a class of stochastic partial differential equations. Appl. Math. Optim., 54(3):293–314, 2006.
62. D. Crisan, P. Del Moral, and T. Lyons. Interacting particle systems approximations of the Kushner-Stratonovitch equation. Adv. in Appl. Probab., 31(3):819–838, 1999.
63. D. Crisan, J. Gaines, and T. Lyons. Convergence of a branching particle method to the solution of the Zakai equation. SIAM J. Appl. Math., 58(5):1568–1590, 1998.
64. D. Crisan and T. Lyons. Nonlinear filtering and measure-valued processes. Probab. Theory Related Fields, 109(2):217–244, 1997.
65. D. Crisan and T. Lyons. A particle approximation of the solution of the Kushner-Stratonovitch equation. Probab. Theory Related Fields, 115(4):549–578, 1999.
66. D. Crisan and T. Lyons. Minimal entropy approximations and optimal algorithms for the filtering problem. Monte Carlo Methods and Applications, 8(4):343–356, 2002.
67. D. Crisan, P. Del Moral, and T. Lyons. Discrete filtering using branching and interacting particle systems. Markov Processes and Related Fields, 5(3):293–318, 1999.
68. R. W. R. Darling. Geometrically intrinsic nonlinear recursive filters. Technical report, Berkeley Statistics Department, 1998. http://www.stat.berkeley.edu/~darling/GINRF.
69. F. E. Daum. New exact nonlinear filters. In J. C. Spall, editor, Bayesian Analysis of Time Series and Dynamic Models, pages 199–226, New York, 1988. Marcel Dekker.
70. F. E. Daum. New exact nonlinear filters: Theory and applications. Proc. SPIE, 2235:636–649, 1994.
71. M. H. A. Davis. Linear Estimation and Stochastic Control. Chapman and Hall Mathematics Series. Chapman and Hall, London, 1977.
72. M. H. A. Davis. On a multiplicative functional transformation arising in nonlinear filtering theory. Z. Wahrsch. Verw. Gebiete, 54(2):125–139, 1980.
73. M. H. A. Davis. New approach to filtering for nonlinear systems. Proc. IEE-D, 128(5):166–172, 1981.
74. M. H. A. Davis. Pathwise nonlinear filtering. In M. Hazewinkel and J. C. Willems, editors, Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Proc. NATO Advanced Study Inst. Ser. C 78, pages 505–528, Dordrecht-Boston, 1981. Reidel.
75. M. H. A. Davis. A pathwise solution of the equations of nonlinear filtering. Theory Probability Applications [trans. of Teor. Veroyatnost. i Primenen.], 27(1):167–175, 1982.
76. M. H. A. Davis and M. P. Spathopoulos. Pathwise nonlinear filtering for nondegenerate diffusions with noise correlation. SIAM J. Control Optim., 25(2):260–278, 1987.
77. Claude Dellacherie and Paul-André Meyer. Probabilités et potentiel. Chapitres I à IV. [French] [Probability and potential. Chapters I–IV]. Hermann, Paris, 1975.
78. Claude Dellacherie and Paul-André Meyer. Un nouveau théorème de projection et de section [French]. In Séminaire de Probabilités, IX (Seconde Partie, Univ. Strasbourg, Années universitaires 1973/1974 et 1974/1975), pages 239–245. Springer Verlag, New York, 1975.
79. Claude Dellacherie and Paul-André Meyer. Probabilités et potentiel. Chapitres V à VIII. [French] [Probability and potential. Chapters V–VIII] Théorie des martingales. Hermann, Paris, 1980.
80. Giovanni B. Di Masi and Wolfgang J. Runggaldier. An adaptive linear approach to nonlinear filtering. In Applications of Mathematics in Industry and Technology (Siena, 1988), pages 308–316. Teubner, Stuttgart, 1989.
81. J. L. Doob. Stochastic Processes. Wiley, New York, 1963.
82. J. L. Doob. Stochastic process measurability conditions. Annales de l’institut Fourier, 25(3-4):163–176, 1975.
83. Arnaud Doucet, Nando de Freitas, and Neil Gordon. Sequential Monte Carlo Methods in Practice. Stat. Eng. Inf. Sci. Springer, New York, 2001.
84. T. E. Duncan. Likelihood functions for stochastic signals in white noise. Information and Control, 16:303–310, 1970.
85. T. E. Duncan. On the absolute continuity of measures. Ann. Math. Statist., 41:30–38, 1970.
86. T. E. Duncan. On the steady state filtering problem for linear pure delay time systems. In Analysis and control of systems (IRIA Sem., Rocquencourt, 1979), pages 25–42. INRIA, Rocquencourt, 1980.
87. T. E. Duncan. Stochastic filtering in manifolds. In Control Science and Technology for the Progress of Society, Vol. 1 (Kyoto, 1981), pages 553–556. IFAC, Luxembourg, 1982.
88. T. E. Duncan. Explicit solutions for an estimation problem in manifolds associated with Lie groups. In Differential Geometry: The Interface Between Pure and Applied Mathematics (San Antonio, TX, 1986), volume 68 of Contemp. Math., pages 99–109. Amer. Math. Soc., Providence, RI, 1987.
89. T. E. Duncan. An estimation problem in compact Lie groups. Systems Control Lett., 10(4):257–263, 1988.
90. R. J. Elliott and V. Krishnamurthy. Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM J. Control Optim., 35(6):1908–1923, 1997.
91. R. J. Elliott and J. van der Hoek. A finite-dimensional filter for hybrid observations. IEEE Trans. Automat. Control, 43(5):736–739, 1998.
92. Robert J. Elliott and Michael Kohlmann. Robust filtering for correlated multidimensional observations. Math. Z., 178(4):559–578, 1981.
93. Robert J. Elliott and Michael Kohlmann. The existence of smooth densities for the prediction filtering and smoothing problems. Acta Appl. Math., 14(3):269–286, 1989.
94. Robert J. Elliott and John B. Moore. Zakai equations for Hilbert space valued processes. Stochastic Anal. Appl., 16(4):597–605, 1998.
95. Stewart N. Ethier and Thomas G. Kurtz. Markov Processes: Characterization and Convergence. Wiley, New York, 1986.
96. Marco Ferrante and Wolfgang J. Runggaldier. On necessary conditions for the existence of finite-dimensional filters in discrete time. Systems Control Lett., 14(1):63–69, 1990.
97. W. H. Fleming and E. Pardoux. Optimal control of partially observed diffusions. SIAM J. Control Optim., 20(2):261–285, 1982.
98. Wendell H. Fleming and Sanjoy K. Mitter. Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics, 8(1):63–77, 1982/83.
99. Patrick Florchinger. Malliavin calculus with time dependent coefficients and application to nonlinear filtering. Probab. Theory Related Fields, 86(2):203–223, 1990.
100. Patrick Florchinger and François Le Gland. Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. In Analysis and Optimization of Systems (Antibes, 1990), volume 144 of Lecture Notes in Control and Inform. Sci., pages 228–237. Springer, Berlin, 1990.
101. Patrick Florchinger and François Le Gland. Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. Stochastics Stochastics Rep., 35(4):233–256, 1991.
102. Avner Friedman. Partial Differential Equations of Parabolic Type. Prentice-Hall, Englewood Cliffs, NJ, 1964.
103. P. Frost and T. Kailath. An innovations approach to least-squares estimation. III. IEEE Trans. Autom. Control, AC-16:217–226, 1971.
104. M. Fujisaki, G. Kallianpur, and H. Kunita. Stochastic differential equations for the non linear filtering problem. Osaka J. Math., 9:19–40, 1972.
105. R. K. Getoor. On the construction of kernels. In Séminaire de Probabilités, IX (Seconde Partie, Univ. Strasbourg, Années universitaires 1973/1974 et 1974/1975), volume 465 of Lecture Notes in Math., pages 443–463. Springer Verlag, Berlin, 1975.
106. N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings, Part F, 140(2):107–113, 1993.
107. B. Grigelionis. The theory of nonlinear estimation and semimartingales. Izv. Akad. Nauk UzSSR Ser. Fiz.-Mat. Nauk, (3):17–22, 97, 1981.
108. B. Grigelionis. Stochastic nonlinear filtering equations and semimartingales. In Nonlinear Filtering and Stochastic Control (Cortona, 1981), volume 972 of Lecture Notes in Math., pages 63–99. Springer, Berlin, 1982.
109. B. Grigelionis and R. Mikulevičius. On weak convergence to random processes with boundary conditions. In Nonlinear Filtering and Stochastic Control (Cortona, 1981), volume 972 of Lecture Notes in Math., pages 260–275. Springer, Berlin, 1982.
110. B. Grigelionis and R. Mikulevičius. Stochastic evolution equations and densities of the conditional distributions. In Theory and Application of Random Fields (Bangalore, 1982), volume 49 of Lecture Notes in Control and Inform. Sci., pages 49–88. Springer, Berlin, 1983.
111. B. Grigelionis and R. Mikulyavichyus. Robustness in nonlinear filtering theory. Litovsk. Mat. Sb., 22(4):37–45, 1982.
112. I. Gyöngy. The approximation of stochastic partial differential equations and applications in nonlinear filtering. Comput. Math. Appl., 19(1):47–63, 1990.
113. I. Gyöngy and N. V. Krylov. Stochastic partial differential equations with unbounded coefficients and applications. II. Stochastics Stochastics Rep., 32(3-4):165–180, 1990.
114. I. Gyöngy and N. V. Krylov. On stochastic partial differential equations with unbounded coefficients. In Stochastic partial differential equations and applications (Trento, 1990), volume 268 of Pitman Res. Notes Math. Ser., pages 191–203. Longman Sci. Tech., Harlow, 1992.
115. István Gyöngy. On stochastic partial differential equations. Results on approximations. In Topics in Stochastic Systems: Modelling, Estimation and Adaptive Control, volume 161 of Lecture Notes in Control and Inform. Sci., pages 116–136. Springer, Berlin, 1991.
116. István Gyöngy. Filtering on manifolds. Acta Appl. Math., 35(1-2):165–177, 1994. White noise models and stochastic systems (Enschede, 1992).
117. István Gyöngy. Stochastic partial differential equations on manifolds. II. Nonlinear filtering. Potential Anal., 6(1):39–56, 1997.
118. István Gyöngy and Nicolai Krylov. On the rate of convergence of splitting-up approximations for SPDEs. In Stochastic inequalities and applications, volume 56 of Progr. Probab., pages 301–321. Birkhäuser, 2003.
119. István Gyöngy and Nicolai Krylov. On the splitting-up method and stochastic partial differential equations. Ann. Probab., 31(2):564–591, 2003.
120. J. E. Handschin and D. Q. Mayne. Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering. Internat. J. Control, 1(9):547–559, 1969.
121. M. Hazewinkel, S. I. Marcus, and H. J. Sussmann. Nonexistence of finite-dimensional filters for conditional statistics of the cubic sensor problem. Systems Control Lett., 3(6):331–340, 1983.
122. M. Hazewinkel, S. I. Marcus, and H. J. Sussmann. Nonexistence of finite-dimensional filters for conditional statistics of the cubic sensor problem. In Filtering and Control of Random Processes (Paris, 1983), volume 61 of Lecture Notes in Control and Inform. Sci., pages 76–103, Berlin, 1984. Springer.
123. Michiel Hazewinkel. Lie algebraic methods in filtering and identification. In VIIIth International Congress on Mathematical Physics (Marseille, 1986), pages 120–137. World Scientific, Singapore, 1987.
124. Michiel Hazewinkel. Lie algebraic method in filtering and identification. In Stochastic Processes in Physics and Engineering (Bielefeld, 1986), volume 42 of Math. Appl., pages 159–176. Reidel, Dordrecht, 1988.
125. Michiel Hazewinkel. Non-Gaussian linear filtering, identification of linear systems, and the symplectic group. In Modeling and Control of Systems in Engineering, Quantum Mechanics, Economics and Biosciences (Sophia-Antipolis, 1988), volume 121 of Lecture Notes in Control and Inform. Sci., pages 299–308. Springer, Berlin, 1989.
126. Michiel Hazewinkel. Non-Gaussian linear filtering, identification of linear systems, and the symplectic group. In Signal Processing, Part II, volume 23 of IMA Vol. Math. Appl., pages 99–113. Springer, New York, 1990.
127. A. J. Heunis. Nonlinear filtering of rare events with large signal-to-noise ratio. J. Appl. Probab., 24(4):929–948, 1987.
128. A. J. Heunis. On the stochastic differential equations of filtering theory. Appl. Math. Comput., 37(3):185–218, 1990.
129. A. J. Heunis. On the stochastic differential equations of filtering theory. Appl. Math. Comput., 39(3, suppl.):3s–36s, 1990.
130. Andrew Heunis. Rates of convergence for an adaptive filtering algorithm driven by stationary dependent data. SIAM J. Control Optim., 32(1):116–139, 1994.
131. Guo-Qing Hu, Stephen S. T. Yau, and Wen-Lin Chiou. Finite-dimensional filters with nonlinear drift. XIII. Classification of finite-dimensional estimation algebras of maximal rank with state space dimension five. Loo-Keng Hua: a great mathematician of the twentieth century. Asian J. Math., 4(4):905–931, 2000.
132. M. Isard and A. Blake. Visual tracking by stochastic propagation of conditional density. In Proceedings of the 4th European Conference on Computer Vision, pages 343–356, New York, 1996. Springer Verlag.
133. M. Isard and A. Blake. Condensation conditional density propagation for visual tracking. Int. J. Computer Vision, 1998.
134. M. Isard and A. Blake. A mixed-state condensation tracker with automatic model switching. In Proceedings of the 6th International Conference on Computer Vision, pages 107–112, 1998.
135. K. Ito and H. P. McKean. Diffusion Processes and Their Sample Paths. Academic Press, New York, 1965.
136. Matthew R. James and François Le Gland. Numerical approximation for nonlinear filtering and finite-time observers. In Applied Stochastic Analysis (New Brunswick, NJ, 1991), volume 177 of Lecture Notes in Control and Inform. Sci., pages 159–175. Springer, Berlin, 1992.
137. T. Kailath. An innovations approach to least-squares estimation. I. Linear filtering in additive white noise. IEEE Trans. Autom. Control, AC-13:646–655, 1968.
138. G. Kallianpur. White noise theory of filtering: Some robustness and consistency results. In Stochastic Differential Systems (Marseille-Luminy, 1984), volume 69 of Lecture Notes in Control and Inform. Sci., pages 217–223. Springer, Berlin, 1985.
139. G. Kallianpur and R. L. Karandikar. The Markov property of the filter in the finitely additive white noise approach to nonlinear filtering. Stochastics, 13(3):177–198, 1984.
140. G. Kallianpur and R. L. Karandikar. Measure-valued equations for the optimum filter in finitely additive nonlinear filtering theory. Z. Wahrsch. Verw. Gebiete, 66(1):1–17, 1984.
141. G. Kallianpur and R. L. Karandikar. A finitely additive white noise approach to nonlinear filtering: A brief survey. In Multivariate Analysis VI (Pittsburgh, PA, 1983), pages 335–344. North-Holland, Amsterdam, 1985.
142. G. Kallianpur and R. L. Karandikar. White noise calculus and nonlinear filtering theory. Ann. Probab., 13(4):1033–1107, 1985.
143. G. Kallianpur and R. L. Karandikar. White Noise Theory of Prediction, Filtering and Smoothing, volume 3 of Stochastics Monographs. Gordon & Breach Science, New York, 1988.
144. G. Kallianpur and C. Striebel. Estimation of stochastic systems: Arbitrary system process with additive white noise observation errors. Ann. Math. Statist., 39(3):785–801, 1968.
145. Gopinath Kallianpur. Stochastic filtering theory, volume 13 of Applications of Mathematics. Springer, New York, 1980.
146. R. E. Kalman. A new approach to linear filtering and prediction problems. J. Basic Eng., 82:35–45, 1960.
147. R. E. Kalman and R. S. Bucy. New results in linear filtering and prediction theory. Trans. ASME, Ser. D, J. Basic Eng., 83:95–108, 1961.
148. Jim Kao, Dawn Flicker, Kayo Ide, and Michael Ghil. Estimating model parameters for an impact-produced shock-wave simulation: Optimal use of partial data with the extended Kalman filter. J. Comput. Phys., 214(2):725–737, 2006.
149. I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer, New York, second edition, 1991.
150. Genshiro Kitagawa. Non-Gaussian state-space modeling of nonstationary time series. With comments and a reply by the author. J. Amer. Statist. Assoc., 82(400):1032–1063, 1987.
151. P. E. Kloeden and E. Platen. The Numerical Solution of Stochastic Differential Equations. Springer, New York, 1992.
152. A. N. Kolmogorov. Sur l’interpolation et extrapolation des suites stationnaires. C. R. Acad. Sci., 208:2043, 1939.
153. A. N. Kolmogorov. Interpolation and extrapolation. Bulletin de l’académie des sciences de U.S.S.R., Ser. Math., 5:3–14, 1941.
154. Hayri Körezlioğlu and Wolfgang J. Runggaldier. Filtering for nonlinear systems driven by nonwhite noises: An approximation scheme. Stochastics Stochastics Rep., 44(1-2):65–102, 1993.
155. M. G. Krein. On a generalization of some investigations of G. Szegö, W. M. Smirnov, and A. N. Kolmogorov. Dokl. Akad. Nauk SSSR, 46:91–94, 1945.
156. M. G. Krein. On a problem of extrapolation of A. N. Kolmogorov. Dokl. Akad. Nauk SSSR, 46:306–309, 1945.
157. N. V. Krylov. On L_p-theory of stochastic partial differential equations in the whole space. SIAM J. Math. Anal., 27(2):313–340, 1996.
158. N. V. Krylov. An analytic approach to SPDEs. In Stochastic Partial Differential Equations: Six Perspectives, number 64 in Math. Surveys Monogr., pages 185–242. Amer. Math. Soc., Providence, RI, 1999.
159. N. V. Krylov and B. L. Rozovskiĭ. The Cauchy problem for linear stochastic partial differential equations. Izv. Akad. Nauk SSSR Ser. Mat., 41(6):1329–1347, 1448, 1977.
160. N. V. Krylov and B. L. Rozovskiĭ. Conditional distributions of diffusion processes. Izv. Akad. Nauk SSSR Ser. Mat., 42(2):356–378, 470, 1978.
161. N. V. Krylov and B. L. Rozovskiĭ. Characteristics of second-order degenerate parabolic Itô equations. Trudy Sem. Petrovsk., (8):153–168, 1982.
162. N. V. Krylov and B. L. Rozovskiĭ. Stochastic partial differential equations and diffusion processes. Uspekhi Mat. Nauk, 37(6(228)):75–95, 1982.
163. N. V. Krylov and A. Zatezalo. A direct approach to deriving filtering equations for diffusion processes. Appl. Math. Optim., 42(3):315–332, 2000.
164. H. Kunita. Stochastic Flows and Stochastic Differential Equations. Number 24 in Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, UK, 1990.
165. Hiroshi Kunita. Cauchy problem for stochastic partial differential equations arising in nonlinear filtering theory. Systems Control Lett., 1(1):37–41, 1981/82.
166. Hiroshi Kunita. Stochastic partial differential equations connected with nonlinear filtering. In Nonlinear Filtering and Stochastic Control (Cortona, 1981), volume 972 of Lecture Notes in Math., pages 100–169. Springer, Berlin, 1982.
167. Hiroshi Kunita. Ergodic properties of nonlinear filtering processes. In Spatial Stochastic Processes, volume 19 of Progr. Probab., pages 233–256. Birkhäuser Boston, 1991.
168. Hiroshi Kunita. The stability and approximation problems in nonlinear filtering theory. In Stochastic Analysis, pages 311–330. Academic Press, Boston, 1991.
169. Hans R. Künsch. Recursive Monte Carlo filters: Algorithms and theoretical analysis. Ann. Statist., 33(5):1983–2021, 2005.
170. T. G. Kurtz and D. L. Ocone. Unique characterization of conditional distributions in nonlinear filtering. Ann. Probab., 16(1):80–107, 1988.
171. T. G. Kurtz and J. Xiong. Numerical solutions for a class of SPDEs with application to filtering. In Stochastics in Finite and Infinite Dimensions, Trends Math., pages 233–258. Birkhäuser Boston, 2001.
172. Thomas G. Kurtz. Martingale problems for conditional distributions of Markov processes. Electron. J. Probab., 3:no. 9, 29 pp. (electronic), 1998.
173. Thomas G. Kurtz and Daniel Ocone. A martingale problem for conditional distributions and uniqueness for the nonlinear filtering equations. In Stochastic Differential Systems (Marseille-Luminy, 1984), volume 69 of Lecture Notes in Control and Inform. Sci., pages 224–234. Springer, Berlin, 1985.
174. Thomas G. Kurtz and Jie Xiong. Particle representations for a class of nonlinear SPDEs. Stochastic Process. Appl., 83(1):103–126, 1999.
175. H. Kushner. On the differential equations satisfied by conditional densities of Markov processes, with applications. SIAM J. Control, 2:106–119, 1964.
176. H. Kushner. Technical Report JA2123, M.I.T. Lincoln Laboratory, March 1963.
177. H. J. Kushner. Approximations of nonlinear filters. IEEE Trans. Automat. Control, AC-12:546–556, 1967.
178. H. J. Kushner. Dynamical equations for optimal nonlinear filtering. J. Differential Equations, 3:179–190, 1967.
179. H. J. Kushner. A robust discrete state approximation to the optimal nonlinear filter for a diffusion. Stochastics, 3(2):75–83, 1979.
180. H. J. Kushner. Robustness and convergence of approximations to nonlinear filters for jump-diffusions. Matemática Aplicada e Computacional, 16(2):153–183, 1997.
181. H. J. Kushner and P. Dupuis. Numerical Methods for Stochastic Control Problems in Continuous Time. Number 24 in Applications of Mathematics. Springer, New York, 1992.
182. Harold J. Kushner. Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems, volume 3 of Systems & Control: Foundations & Applications. Birkhäuser Boston, 1990.
183. Harold J. Kushner and Amarjit S. Budhiraja. A nonlinear filtering algorithm based on an approximation of the conditional distribution. IEEE Trans. Autom. Control, 45(3):580–585, 2000.
184. Harold J. Kushner and Hai Huang. Approximate and limit results for nonlinear filters with wide bandwidth observation noise. Stochastics, 16(1-2):65–96, 1986.
185. S. Kusuoka and D. Stroock. The partial Malliavin calculus and its application to nonlinear filtering. Stochastics, 12(2):83–142, 1984.
186. F. Le Gland. Time discretization of nonlinear filtering equations. In Proceedings of the 28th IEEE-CSS Conference Decision Control, Tampa, FL, pages 2601–2606, 1989.
187. François Le Gland. Splitting-up approximation for SPDEs and SDEs with application to nonlinear filtering. In Stochastic Partial Differential Equations and Their Applications (Charlotte, NC, 1991), volume 176 of Lecture Notes in Control and Inform. Sci., pages 177–187. Springer, New York, 1992.
188. François Le Gland and Nadia Oudjane. Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. Ann. Appl. Probab., 14(1):144–187, 2004.
189. J. Lévine. Finite-dimensional realizations of stochastic PDEs and application to filtering. Stochastics Stochastics Rep., 37(1-2):75–103, 1991.
190. Robert Liptser and Ofer Zeitouni. Robust diffusion approximation for nonlinear filtering. J. Math. Systems Estim. Control, 8(1):22 pp. (electronic), 1998.
191. Robert S. Liptser and Wolfgang J. Runggaldier. On diffusion approximations for filtering. Stochastic Process. Appl., 38(2):205–238, 1991.
192. Robert S. Liptser and Albert N. Shiryaev. Statistics of Random Processes. I General Theory, volume 5 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2001. Translated from the 1974 Russian original by A. B. Aries.
193. Robert S. Liptser and Albert N. Shiryaev. Statistics of Random Processes. II Applications, volume 6 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2001. Translated from the 1974 Russian original by A. B. Aries.
194. S. Lototsky, C. Rao, and B. Rozovskii. Fast nonlinear filter for continuous-discrete time multiple models. In Proceedings of the 35th IEEE Conference on Decision and Control, Kobe, Japan, 1996, volume 4, pages 4060–4064, Madison, WI, 1997. Omnipress.
195. S. V. Lototsky. Optimal filtering of stochastic parabolic equations. In Recent Developments in Stochastic Analysis and Related Topics, pages 330–353. World Scientific, Hackensack, NJ, 2004.
196. S. V. Lototsky. Wiener chaos and nonlinear filtering. Appl. Math. Optim., 54(3):265–291, 2006.
197. Sergey Lototsky, Remigijus Mikulevicius, and Boris L. Rozovskii. Nonlinear filtering revisited: A spectral approach. SIAM J. Control Optim., 35(2):435–461, 1997.
198. Sergey Lototsky and Boris Rozovskii. Stochastic differential equations: A Wiener chaos approach. In From Stochastic Calculus to Mathematical Finance, pages 433–506. Springer, New York, 2006.
199. Sergey V. Lototsky. Nonlinear filtering of diffusion processes in correlated noise: analysis by separation of variables. Appl. Math. Optim., 47(2):167–194, 2003.
200. Vladimir M. Lucic and Andrew J. Heunis. On uniqueness of solutions for the stochastic differential equations of nonlinear filtering. Ann. Appl. Probab., 11(1):182–209, 2001.
201. T. M. Macrobert. Functions of a Complex Variable. St. Martin’s Press, New York, 1954.
202. Michael Mangold, Markus Grotsch, Min Sheng, and Achim Kienle. State estimation of a molten carbonate fuel cell by an extended Kalman filter. In Control and Observer Design for Nonlinear Finite and Infinite Dimensional Systems, volume 322 of Lecture Notes in Control and Inform. Sci., pages 93–109. Springer, New York, 2005.
203. S. J. Maybank. Path integrals and finite-dimensional filters. In Stochastic Partial Differential Equations (Edinburgh, 1994), volume 216 of London Math. Soc. Lecture Note Ser., pages 209–229, Cambridge, UK, 1995. Cambridge University Press.
204. Stephen Maybank. Finite-dimensional filters. Phil. Trans. R. Soc. Lond. A, 354(1710):1099–1123, 1996.
205. Paul-André Meyer. Sur un problème de filtration [French]. In Séminaire de Probabilités, VII (Univ. Strasbourg), Année universitaire 1971/1972, volume 321 of Lecture Notes in Math., pages 223–247. Springer Verlag, Berlin, 1973.
206. Paul-André Meyer. La théorie de la prédiction de F. Knight [French]. In Séminaire de Probabilités, X (Univ. Strasbourg), Année universitaire 1974/1975, volume 511 of Lecture Notes in Math., pages 86–103. Springer Verlag, Berlin, 1976.
207. Dominique Michel. Régularité des lois conditionnelles en théorie du filtrage non-linéaire et calcul des variations stochastique. J. Funct. Anal., 41(1):8–36, 1981.
208. R. Mikulevicius and B. L. Rozovskii. Separation of observations and parameters in nonlinear filtering. In Proceedings of the 32nd IEEE Conference on Decision and Control, Part 2, San Antonio. IEEE Control Systems Society, 1993.
209. R. Mikulevicius and B. L. Rozovskii. Fourier-Hermite expansions for nonlinear filtering. Teor. Veroyatnost. i Primenen., 44(3):675–680, 1999.
210. Sanjoy K. Mitter. Existence and nonexistence of finite-dimensional filters. Rend. Sem. Mat. Univ. Politec. Torino, Special Issue:173–188, 1982.
211. Sanjoy K. Mitter. Geometric theory of nonlinear filtering. In Mathematical Tools and Models for Control, Systems Analysis and Signal Processing, Vol. 3 (Toulouse/Paris, 1981/1982), Travaux Rech. Coop. Programme 567, pages 37–60. CNRS, Paris, 1983.
212. Sanjoy K. Mitter and Nigel J. Newton. A variational approach to nonlinear estimation. SIAM J. Control Optim., 42(5):1813–1833 (electronic), 2003.
213. Sanjoy K. Mitter and Irvin C. Schick. Point estimation, stochastic approximation, and robust Kalman filtering. In Systems, Models and Feedback: Theory and Applications (Capri, 1992), volume 12 of Progr. Systems Control Theory, pages 127–151. Birkhäuser Boston, 1992.
214. P. Del Moral. Non-linear filtering: Interacting particle solution. Markov Processes Related Fields, 2:555–580, 1996.
215. P. Del Moral. Non-linear filtering using random particles. Theory Probability Applications, 40(4):690–701, 1996.
216. P. Del Moral. Feynman-Kac formulae. Genealogical and Interacting Particle Systems with Applications. Springer, New York, 2004.
217. P. Del Moral and J. Jacod. The Monte-Carlo method for filtering with discrete-time observations: Central limit theorems. In Numerical Methods and Stochastics (Toronto, ON, 1999), Fields Inst. Commun., 34, pages 29–53. Amer. Math. Soc., Providence, RI, 2002.
218. P. Del Moral and L. Miclo. Branching and interacting particle systems approximations of Feynman-Kac formulae with applications to non-linear filtering. In Séminaire de Probabilités, XXXIV, volume 1729 of Lecture Notes in Math., pages 1–145. Springer, Berlin, 2000.
219. P. Del Moral, J. C. Noyer, G. Rigal, and G. Salut. Traitement particulaire du signal radar : détection, estimation et reconnaissance de cibles aériennes. Technical Report 92495, LAAS, Décembre 1992.
220. P. Del Moral, G. Rigal, and G. Salut. Estimation et commande optimale non-linéaire : un cadre unifié pour la résolution particulaire. Technical Report 91137, LAAS, 1991.
221. P. Del Moral, G. Rigal, and G. Salut. Filtrage non-linéaire non-gaussien appliqué au recalage de plates-formes inertielles. Technical Report 92207, LAAS, Juin 1992.
222. R. E. Mortensen. Stochastic optimal control with noisy observations. Internat. J. Control, 1(4):455–464, 1966.
223. Christian Musso, Nadia Oudjane, and François Le Gland. Improving regularised particle filters. In Sequential Monte Carlo Methods in Practice, Stat. Eng. Inf. Sci., pages 247–271. Springer, New York, 2001.
224. David E. Newland. Harmonic wavelet analysis. Proc. Roy. Soc. London Ser. A, 443(1917):203–225, 1993.
225. Nigel J. Newton. Observation sampling and quantisation for continuous-time estimators. Stochastic Process. Appl., 87(2):311–337, 2000.
226. Nigel J. Newton. Observations preprocessing and quantization for nonlinear filters. SIAM J. Control Optim., 38(2):482–502 (electronic), 2000.
227. David Nualart. The Malliavin Calculus and Related Topics. Springer, New York, second edition, 2006.
228. D. L. Ocone. Asymptotic stability of Beneš filters. Stochastic Anal. Appl., 17(6):1053–1074, 1999.
229. Daniel Ocone. Multiple integral expansions for nonlinear filtering. Stochastics, 10(1):1–30, 1983.
230. Daniel Ocone. Application of Wiener space analysis to nonlinear filtering. In Theory and Applications of Nonlinear Control Systems (Stockholm, 1985), pages 387–400. North-Holland, Amsterdam, 1986.
231. Daniel Ocone. Stochastic calculus of variations for stochastic partial differential equations. J. Funct. Anal., 79(2):288–331, 1988.
232. Daniel Ocone. Entropy inequalities and entropy dynamics in nonlinear filtering of diffusion processes. In Stochastic Analysis, Control, Optimization and Applications, Systems Control Found. Appl., pages 477–496. Birkhäuser Boston, 1999.
233. Daniel Ocone and Étienne Pardoux. A Lie algebraic criterion for nonexistence of finite-dimensionally computable filters. In Stochastic Partial Differential Equations and Applications, II (Trento, 1988), volume 1390 of Lecture Notes in Math., pages 197–204. Springer, Berlin, 1989.
234. O. A. Oleĭnik and E. V. Radkevič. Second Order Equations with Nonnegative Characteristic Form. Plenum Press, New York, 1973.
235. Levent Ozbek and Murat Efe. An adaptive extended Kalman filter with application to compartment models. Comm. Statist. Simulation Comput., 33(1):145–158, 2004.
236. E. Pardoux. Equations aux dérivées partielles stochastiques non linéaires monotones. PhD thesis, Univ Paris XI, Orsay, 1975.
237. E. Pardoux. Stochastic partial differential equations and filtering of diffusion processes. Stochastics, 3(2):127–167, 1979.
238. E. Pardoux. Filtrage non linéaire et equations aux dérivées partielles stochastiques associées. In Ecole d’Eté de Probabilités de Saint-Flour XIX – 1989, volume 1464 of Lecture Notes in Mathematics, pages 67–163. Springer, 1991.
239. P. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.
240. J. Picard. Efficiency of the extended Kalman filter for nonlinear systems with small noise. SIAM J. Appl. Math., 51(3):843–885, 1991.
241. Jean Picard. Approximation of nonlinear filtering problems and order of convergence. In Filtering and Control of Random Processes (Paris, 1983), volume 61 of Lecture Notes in Control and Inform. Sci., pages 219–236. Springer, Berlin, 1984.
242. Jean Picard. An estimate of the error in time discretization of nonlinear filtering problems. In Theory and Applications of Nonlinear Control Systems (Stockholm, 1985), pages 401–412. North-Holland, Amsterdam, 1986.
243. Jean Picard. Nonlinear filtering of one-dimensional diffusions in the case of a high signal-to-noise ratio. SIAM J. Appl. Math., 46(6):1098–1125, 1986.
244. Michael K. Pitt and Neil Shephard. Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc., 94(446):590–599, 1999.
245. M. Pontier, C. Stricker, and J. Szpirglas. Sur le théorème de représentation par rapport à l’innovation [French]. In Séminaire de Probabilités, XX (Univ. Strasbourg, Années universitaires 1984/1985), volume 1204 of Lecture Notes in Math., pages 34–39. Springer Verlag, Berlin, 1986.
246. Yu. V. Prokhorov. Convergence of random processes and limit theorems in probability theory. Theory Probability Applications [Teor. Veroyatnost. i Primenen.], 1(2):157–214, 1956.
247. P. Protter. Stochastic Integration and Differential Equations. Springer, Berlin, second edition, 2003.
References
381
248. L. C. G. Rogers and D. Williams. Diffusions, Markov Processes and Martingales: Volume I Foundations. Cambridge University Press, Cambridge, UK, second edition, 2000. 249. L. C. G. Rogers and D. Williams. Diffusions, Markov Processes and Martingales: Volume II Itˆ o Calculus. Cambridge University Press, Cambridge, UK, second edition, 2000. 250. B. L. Rozovskii. Stochastic Evolution Systems. Kluwer, Dordrecht, 1990. 251. D. B. Rubin. A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when the fraction of missing information is modest: The SIR algorithm (discussion of Tanner and Wong). J. Amer. Statist. Assoc., 82:543–546, 1987. 252. Laurent Saloff-Coste. Aspects of Sobolev-Type Inequalities, volume 289 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, UK, 2002. 253. G. C. Schmidt. Designing nonlinear filters based on Daum’s theory. J. of Guidance, Control Dynamics, 16(2):371–376, 1993. 254. Carla A.I. Schwartz and Bradley W. Dickinson. Characterizing finitedimensional filters for the linear innovations of continuous-time random processes. IEEE Trans. Autom. Control, 30(3):312–315, 1985. 255. A. N. Shiryaev. Some new results in the theory of controlled random processes [Russian]. In Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1965), pages 131–203. Academia Prague, 1967. 256. Elias M. Stein. Singular Integrals and Differentiability Properties of Functions. Number 30 in Princeton Mathematical Series. Princeton University Press, Princeton, NJ, 1970. 257. R. L. Stratonovich. On the theory of optimal non-linear filtration of random functions. Teor. Veroyatnost. i Primenen., 4:223–225, 1959. 258. R. L. Stratonovich. Application of the theory of Markov processes for optimum filtration of signals. Radio Eng. Electron. Phys, 1:1–19, 1960. 259. R. L. Stratonovich. 
Conditional Markov processes. Theory Probability Applications [translation of Teor. Verojatnost. i Primenen.], 5(2):156–178, 1960. 260. R. L. Stratonovich. Conditional Markov Processes and Their Application to the Theory of Optimal Control, volume 7 of Modern Analytic and Computational Methods in Science and Mathematics. Elsevier, New York, 1968. Translated from the Russian by R. N. and N. B. McDonough for Scripta Technica. 261. D. W. Stroock and S. R. S. Varadhan. Multidimensional Diffusion Processes. Springer, New York, 1979. 262. Daniel W. Stroock. Probability Theory, An Analytic View. Cambridge University Press, Cambridge, UK, 1993. 263. M. Sun and R. Glowinski. Pathwise approximation and simulation for the Zakai filtering equation through operator splitting. Calcolo, 30(3):219–239 (1994), 1993. 264. J. Szpirglas. Sur l’´equivalence d’´equations diff´erentielles stochastiques a ` valeurs mesures intervenant dans le filtrage Markovien non lin´eaire [French]. Ann. Inst. H. Poincar´e Sect. B (N.S.), 14(1):33–59, 1978. 265. I. Tulcea. Measures dans les espaces produits [French]. Atti. Accad. Naz. Lincei Rend. Cl. Sci. Fis. Math. Nat., 8(7):208–211, 1949.
382
References
¨ unel. Some comments on the filtering of diffusions and the Malliavin 266. A. S. Ust¨ calculus. In Stochastic analysis and related topics (Silivri, 1986), volume 1316 of Lecture Notes in Math., pages 247–266. Springer, Berlin, 1988. 267. A. Yu. Veretennikov. On backward filtering equations for SDE systems (direct approach). In Stochastic Partial Differential equations (Edinburgh, 1994), volume 216 of London Math. Soc. Lecture Note Ser., pages 304–311, Cambridge, UK, 1995. Cambridge Univ. Press. 268. D. Whitley. A genetic algorithm tutorial. Statist. Comput., 4:65–85, 1994. 269. Ward Whitt. Stochastic Process Limits. An Introduction to Stochastic-Process Limits and Their Application to Queues. Springer, New York, 2002. 270. N. Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. MIT Press, Cambridge, MA, 1949. 271. N. Wiener. I Am a Mathematician. Doubleday, Garden City, NY; Victor Gollancz, London, 1956. 272. D. Williams. Probability with Martingales. Cambridge University Press, Cambridge, UK, 1991. 273. W. M. Wonham. Some applications of stochastic differential equations to optimal nonlinear filtering. J. Soc. Indust. Appl. Math. Ser. A Control, 2:347–369, 1965. 274. Xi Wu, Stephen S.-T. Yau, and Guo-Qing Hu. Finite-dimensional filters with nonlinear drift. XII. Linear and constant structure of Wong-matrix. In Stochastic Theory and Control (Lawrence, KS, 2001), volume 280 of Lecture Notes in Control and Inform. Sci., pages 507–518, Berlin, 2002. Springer. 275. T. Yamada and S. Watanabe. On the uniqueness of solutions of stochastic differential equations. J. Math. Kyoto Univ., 11:151–167, 1971. 276. Shing-Tung Yau and Stephen S. T. Yau. Finite-dimensional filters with nonlinear drift. XI. Explicit solution of the generalized Kolmogorov equation in Brockett-Mitter program. Adv. Math., 140(2):156–189, 1998. 277. Stephen S.-T. Yau. Finite-dimensional filters with nonlinear drift. I. 
A class of filters including both Kalman-Bucy filters and Benes filters. J. Math. Systems Estim. Control, 4(2):181–203, 1994. 278. Stephen S.-T. Yau and Guo-Qing Hu. Finite-dimensional filters with nonlinear drift. X. Explicit solution of DMZ equation. IEEE Trans. Autom. Control, 46(1):142–148, 2001. 279. Marc Yor. Sur les th´eories du filtrage et de la pr´ediction [French]. In S´eminaire de Probabiliti´es, XI (Univ. Strasbourg), Ann´ees universitaire 1975/1976, volume 581 of Lecture Notes in Math., pages 257–297. Springer Verlag, Berlin, 1977. 280. Marc Yor. Some Aspects of Brownian Motion, Part 1: Some Special Functionals (Lectures in Mathematics, ETH, Z¨ urich). Birkh¨ auser Boston, 1992. 281. Moshe Zakai. On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 11:230–243, 1969. 282. O. Zeitouni. On the tightness of some error bounds for the nonlinear filtering problem. IEEE Trans. Autom. Control, 29(9):854–857, 1984. 283. O. Zeitouni and B. Z. Bobrovsky. On the reference probability approach to the equations of nonlinear filtering. Stochastics, 19(3):133–149, 1986. 284. Ofer Zeitouni. On the filtering of noise-contaminated signals observed via hard limiters. IEEE Trans. Inform. Theory, 34(5, part 1):1041–1048, 1988.
Author Name Index
A
Adams, R. A. 165, 166
Aggoun, L. 192
Allinger, D. F. 35

B
Baker, J. E. 280
Baras, J. S. 179
Bayro-Corrochano, E. 194
Beneš, V. E. 8, 142, 197–199
Bensoussan, A. 8, 9, 95, 104, 196, 356
Bharucha-Reid, A. T. 7
Bhatt, A. G. 126
Billingsley, P. 303
Blake, A. 286
Bobrovsky, B. Z. 196
Bourbaki, N. 27, 296
Breiman, L. 294
Brigo, D. 199, 202
Brockett, R. W. 8
Bucy, R. S. 6, 7, 192
Budhiraja, A. 9
Burkholder, D. L. 353

C
Carpenter, J. 230, 280
Chaleyat-Maurel, M. 8, 9
Chen, J. 8
Chiou, W. L. 8
Chopin, N. 280
Chung, K. L. 329, 330, 332, 338, 343, 355
Cipra, B. 6
Clark, J. M. C. 7, 8, 35, 129, 139, 348
Clifford, P. 230, 280
Cohen de Lara, M. 199
Crisan, D. 230, 249, 279, 281, 285, 286

D
Daniell, P. J. 301
Darling, R. W. R. 199
Daum, F. E. 8, 199
Davis, M. H. A. 7, 149, 250
Del Moral, P. 249, 250, 281, 286
Dellacherie, C. 307, 308, 312, 317, 319, 330
Dickinson, B. W. 8
Dieudonné, J. 32
Doob, J. L. 18, 58, 88, 301, 329
Doucet, A. 285
Duncan, T. E. 7, 9
Dynkin, E. B. 43

E
Efe, M. 194
Elliott, R. J. 9, 192
Ethier, S. N. 298, 303, 305, 330

F
Fearnhead, P. 230, 280
Fleming, W. H. 196
Friedman, A. 101, 103
Frost, P. 7
Fujisaki, M. 7, 34, 45

G
Getoor, R. K. 27, 28
Gordon, N. J. 276, 286
Grigelionis, B. 9
Gyöngy, I. 9, 139, 209

H
Halmos, P. R. 32
Handschin, J. E. 286
Hazewinkel, M. 8, 9
Heunis, A. J. 9, 95, 113, 114, 126
Hu, G.-Q. 8

I
Isard, M. 286
Itô, K. 360

J
Jacod, J. 249
Joseph, P. D. 192

K
Künsch, H. R. 279, 280
Kailath, T. 7
Kallianpur, G. 7, 8, 34, 35, 45, 57
Kalman, R. E. 6
Kao, J. 194
Karandikar, R. L. 8
Karatzas, I. 51, 88, 310, 330, 355
Kitagawa, G. 286
Kloeden, P. E. 251
Kolmogorov, A. N. 5, 13, 31, 32, 301
Krein, M. G. 5
Krylov, N. V. 7, 93, 139, 209, 355
Kunita, H. 7, 9, 34, 45, 182
Kuratowski, K. 27
Kurtz, T. G. 9, 126, 165, 249, 298, 303, 305, 330
Kushner, H. J. 7, 9, 139, 202

L
Lévy, P. 344, 362
Le Gland, F. 9
Leung, C. W. 8
Liptser, R. S. 9
Lototsky, S. 202, 204
Lucic, V. M. 95, 113, 114, 126
Lyons, T. J. 230, 249, 250, 281, 285, 286

M
Mangold, M. 194
Marcus, S. I. 8
Maybank, S. J. 8
Mayne, D. Q. 286
McKean, H. P. 360
Meyer, P. A. 27, 45, 307, 308, 312, 317, 319, 330
Michel, D. 8, 9
Miclo, L. 250
Mikulevicius, R. 9, 202
Mitter, S. K. 8, 9, 35
Mortensen, R. E. 7

N
Newton, N. J. 9
Novikov, A. A. 52, 350
Nualart, D. 348

O
Ocone, D. L. 9, 126
Oleĭnik, O. A. 105
Ozbek, L. 194

P
Pardoux, E. 7, 9, 182, 193, 196
Picard, J. 9, 195, 196
Pitt, M. K. 285
Platen, E. 251
Prokhorov, Y. V. 45
Protter, P. 330, 351

R
Radkevič, E. V. 105
Rigal, G. 286
Rogers, L. C. G. 17, 32, 58, 293, 296, 300, 301, 307, 308, 319, 321, 329, 339, 343, 348
Rozovskii, B. L. 7, 9, 93, 176, 177, 182, 202, 355
Rubin, D. B. 286
Runggaldier, W. J. 9

S
Salmond, D. J. 276, 286
Saloff-Coste, L. 166
Salut, G. 286
Schmidt, G. C. 199
Schwartz, C. A. I. 8
Sharpe, M. J. 332
Shephard, N. 285
Shiryaev, A. N. 7, 9
Shreve, S. E. 51, 88, 310, 330, 355
Smith, A. F. M. 276, 286
Stein, E. M. 166
Stratonovich, R. L. 7
Striebel, C. 8, 57
Stroock, D. W. 28, 298
Sussmann, H. J. 8
Szpirglas, J. 125

T
Tsirel’son, B. S. 35
Tulcea, I. 298

V
Varadhan, S. R. S. 298
Veretennikov, A. Y. 249

W
Watanabe, S. 35
Whitley, D. 230, 280
Whitt, W. 303
Wiener, N. 5
Williams, D. 17, 32, 43, 58, 293, 296, 300, 301, 307, 308, 319, 321, 329, 339, 343, 348, 362
Williams, R. J. 329, 330, 332, 338, 343, 355
Wonham, W. M. 7
Wu, X. 8

X
Xiong, J. 165, 249

Y
Yamada, T. 35
Yau, S.-T. 8
Yau, S. S.-T. 8
Yor, M. 28, 360

Z
Zakai, M. 7, 196
Zatezalo, A. 93
Zeitouni, O. 9
Subject Index
A
Announcing sequence 321
Atom 298
Augmented filtration see Observation filtration
Averaging over the characteristics formula 182

B
Beneš condition 142, 196
Beneš filter 141, 146, 196
  the d-dimensional case 197
Bootstrap filter 276, 286
Borel space 301
Branching algorithm 278
Brownian motion 346
  exponential functional of 360, 361, 363, 365
  Fourier decomposition of 360
  Lévy’s characterisation 344, 346
Burkholder–Davis–Gundy inequalities 246, 256, 353

C
Càdlàg path 303
Carathéodory extension theorem 300, 347
Change detection filter see Change-detection problem
Change of measure method 49, 52
Change-detection problem 52, 69
Clark’s robustness result see Robust representation formula
Class
  U 96, 97, 100, 107, 109, 110, 113, 118
  Ū 109, 110
  U′ 110, 111, 113, 114, 116
  Ū′ 116
Condition
  U 97, 102, 107, 110
  U′ 113, 114, 116
  U″ 114
Conditional distribution of X_t 2–3, 191
  approximating sequence 265
  density of 174
  density of the 200
  recurrence formula 261, 264
  unnormalised 58, 173, 175
  regular 294
Conditional expectation 293
Conditional probability
  of a set 294
  regular 32, 294, 296, 347
Convergence determining set 323
Convergence in expectation 322, 324
Cubic sensor 201

D
Début theorem 307, 314, 339, 341
Daniell–Kolmogorov–Tulcea theorem 301, 302, 347
Density of ρ_t
  existence of 168
  smoothness of 174
Dual previsible projection 332
Duncan–Mortensen–Zakai equation see Zakai equation

E
Empirical measure 210
Euler method 251
Evanescent set 319
Exponential projection filter 201
Extended Kalman filter 194

F
Feller property 267
Feynman–Kac formula 182
Filtering
  equations 4, 16, 72, 93, 125, 249, 308, see Kushner–Stratonovich equation, Zakai equation
    for inhomogeneous test functions 69
  problem 13, 48
    discrete time 258–259
    the correlated noise case 73–75, 109
Finite difference scheme 207
Finite-dimensional filters 141, 146, 154, 196–199
Fisher information matrix 199
Fokker–Planck equation 206
Fujisaki–Kallianpur–Kunita equation see Kushner–Stratonovich equation

G
Generator of the process X 48, 50, 51, 151, 168, 207, 221
  domain of the 47, 50–51
  maximal 51
Girsanov’s theorem 345, 346
Gronwall’s lemma 78, 79, 81, 88, 172, 325

H
Hermite polynomials 203

I
Importance distribution 285
Importance sampling 273
Indistinguishable processes 319
Infinitesimal generator see Generator of the process X
Innovation
  approach 7, 49, 70–73
  process 33–34
Itô integral see Stochastic integral
Itô isometry 337, 338, 349
Itô’s formula 343

K
Kallianpur–Striebel formula 57, 59, 128
Kalman–Bucy filter 6, 148–154, 191, 192, 199
  1D case 158
  as a Beneš filter 142, 148
Kushner–Stratonovich equation 68, 71, 153
  correlated noise case 74
  finite-dimensional 66
  for inhomogeneous test functions 69
  linear case 151
  strong form 179
  uniqueness of solution 110, 116

L
Likelihood function 260
Linear filter see Kalman–Bucy filter
Local martingale 330, 344

M
Markov chain 257
Martingale 329
  representation theorem 348
  uniformly integrable 330, 346
Martingale convergence theorem 318, 329, 345
Martingale problem 47
Martingale representation theorem 35, 38, 44
Measurement noise 1
Monotone class theorem 29, 31, 293, 295, 311, 318, 336
Monte Carlo approximation 210, 216, 222, 230
  convergence of 213, 214, 217
  convergence rate 215, 216
Multinomial resampling see Resampling procedure
Mutation step 273

N
Non-linear filtering see Stochastic filtering
Non-linear filtering problem see Filtering problem
Novikov’s condition 52, 127, 131, 218, 222, 350

O
Observation
  filtration 13–17
    right continuity of the 17, 27, 33–40
    unaugmented 16
  process 1, 3, 16
    discrete time 258
  σ-algebra see Observation filtration
Offspring distribution 224, 252, 274–281
  Bernoulli 280
  binomial 280
  minimal variance 225, 226, 228, 230, 279, 280
  multinomial 275–277
  obtained by residual sampling 277
  Poisson 280
Optional process 320
Optional projection of a process 17–19, 311–317, 338
  kernel for the 27
  without the usual conditions 321

P
Parabolic PDEs
  existence and uniqueness result 100
  maximum principle for 102
  systems of 102
  uniformly 101, 121
Parseval’s equality 204, 205
Particle filter 209, 222–224
  branching algorithm 225
  convergence rates 241, 244, 245, 248
  correction step 222, 230, 250
  discrete time 272–273
    convergence of 281–284
    prediction step 264
    updating step 264
  evolution equation 230
  implementation 250–252
    correction step 251, 252
    evolution step 251
  offspring distribution see Offspring distribution
  path regularity 229
  resampling procedure 250, 252
Particle methods see Particle filter
Path process 259
PDE Methods
  correction step 207
  prediction step 206
π
  the stochastic process 14, 27–32
    càdlàg version of 31
π_t see Conditional distribution of X_t
Polarization identity 342
Posterior distribution 259
Predictable σ-algebra see Previsible σ-algebra
Predictable process see Previsible process
Predicted conditional probability 259
Previsible σ-algebra 331
Previsible process 321, 331, 338
Previsible projection of a process 317, 321, 340, 341
Prior distribution 259
Projection bien measurable see Optional projection of a process
Projection filter 199
Projective product 261

Q
Q-matrix 51
Quadratic variation 332, 335, 342

R
Reducing sequence 330
Regular grid 207
Regularisation method 167
Regularised measure 167
Resampling procedure 276
Residual sampling 277
ρ see Conditional distribution of X_t, unnormalised
  density of 173, 178
  dual of 165, 180–182, 233, 238
Riccati equation 152, 192
Ring of subsets 331
Robust representation formula 129, 137

S
Sampling with replacement method see Resampling procedure
Selection step 274
Sensor function 4
Separable metric space 296
Sequential Monte Carlo methods see Particle filter
Signal process 1, 3, 16, 47
  discrete time version 257
  filtration associated with the 47
  in discrete time 257
  particular cases 49–52
SIR algorithm 276
Skorohod topology 304–305
Sobolev
  embedding theorem 166
  space 166
Splitting-up algorithm 206
Stochastic differential equation
  strong solution 355
Stochastic filtering 1, 3, 6, 8, 9, see also Filtering problem
Stochastic Fubini’s theorem 351
Stochastic integral 330–341
  limits of 358
  localization 343
  martingale property 337
Stochastic integration by parts 342
Stopping time 306
  announceable 321

T
TBBA see Tree-based branching algorithms
Total sets in L^1 355, 357
Transition kernel 257
Tree-based branching algorithms 230, 279
Tulcea’s theorem 298, 303, 347, 348

U
Uniqueness of solution see Kushner–Stratonovich equation, uniqueness of solution; see Zakai equation, uniqueness of solution
Usual conditions 16, 319

W
Weak topology on P(S) 21–27
  metric for 26
Wick polynomials 203
Wiener filter 5–6

Z
Zakai equation 62, 69, 73, 154, 177
  correlated noise case 74, 111
  finite-dimensional 65
  for inhomogeneous test functions 69, 97
  strong form 67, 175–178, 202–203, 206
  uniqueness of solution 107, 109, 114, 182