A Weak Existence Result with Application to the Financial Engineer's Calibration Problem


Author: Gerard Brunick


Abstract

Given an initial Itô process, Krylov and Gyöngy have shown that it is often possible to construct a diffusion process with the same one-dimensional marginal distributions. As the one-dimensional marginal distributions of a price process under a pricing measure essentially determine the prices of European options written on that price process, this result has found wide application in Mathematical Finance. In this dissertation, we extend the result of Krylov and Gyöngy in two directions: We relax the technical conditions which must be imposed on the initial Itô process. And we clarify the relationship between the stochastic differential equation that is solved by the mimicking process and the properties of the initial process that are preserved.


Advisor: Steven E. Shreve
Defense Date: July 29th, 2008

A dissertation in the Department of Mathematical Sciences submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Carnegie Mellon University.

Copyright © 2008 by Gerard Brunick All rights reserved.



Acknowledgments

I would like to express my gratitude to my adviser, Steven Shreve, for his guidance and support as I worked on this dissertation. His comments and insight have been invaluable. I would also like to thank Dmitry Kramkov and Kasper Larson for many useful conversations on a wide range of topics. Finally, I would like to acknowledge Peter Carr, who made me aware of the previous work of Krylov and Gyöngy, and Silviu Predoiu, who produced a very nice counterexample that allowed me to abandon a fallacious conjecture.

I would also like to take this opportunity to thank my family for their love and encouragement during my time at Carnegie Mellon University. In particular, I would never have had this opportunity without my parents’ constant love, patience, and support. Finally, I would like to express my gratitude to Jessica, whose love and kindness have been a constant source of inspiration.

During my time at Carnegie Mellon University I was supported by an NSF VIGRE fellowship and grant DMS-0404682.


Contents

1 Introduction
  1.1 Introduction
  1.2 Definitions and Notation
2 Statement of Results
  2.1 Updating Functions
  2.2 Main Result
  2.3 Applications to Mixture Models
3 A Cross Product Construction
  3.1 The Binary Construction
  3.2 Properties Preserved by the Binary Construction
  3.3 The General Construction
4 Main Theorem
  4.1 Conditional Expectation Lemmas
  4.2 Approximation Lemmas
  4.3 Main Theorem
A Galmarino’s Test
B Metric Space-Valued Random Variables
C FV and AC Processes
D Semimartingale Characteristics
E Rebolledo’s Criterion
F Convergence of Characteristics
References

Chapter 1 Introduction

1.1 Introduction

Emanuel Derman [Der01] neatly summarizes the way in which many market participants make use of financial models as follows:

Trading desks in many product areas at investment banks often have substantial positions in long-term or exotic over-the-counter derivative securities that have been designed to satisfy the risk preferences of their customers . . . Because liquid market prices are unavailable, these positions are marked and hedged by means of sophisticated and complex financial models . . . These models derive prices from market parameters (volatilities, correlations, prepayment rates or default probabilities, for example) that are forward-looking and should ideally be implied from market prices of traded securities.

In this application, the market participant identifies a set of primary and derivative securities which she believes characterize the state of the market. In particular, she selects a set of securities which are actively traded so that price information is available, and she then attempts to construct a financial model in such a way that the prices computed within the model are consistent with the prices quoted in the market for the securities in this set. The process of constructing such a model is known as model calibration. Once a model has been calibrated, the market participant then uses the model to draw inferences about the market.


One approach to this calibration problem is to suppose that the financial model takes a particular parametric form. The calibration problem then reduces to a nonlinear optimization over the parameter space to minimize an objective function that measures the difference between the prices computed in the model and the prices quoted in the market. While the absence of arbitrage does serve to reduce the range of possible market price configurations, the space of market price configurations is generally of much higher dimension than the set of parameters for a given model. This raises the possibility that a particular parametric model simply cannot be calibrated to market prices. From the market participant’s perspective, this is in fact the main indictment of the Black-Scholes model, which provides a single volatility parameter for calibration. As the map from parameters to prices is often rather complicated, it can be difficult to determine, a priori, whether a particular parametric form will allow for a good fit to a given set of market prices. Instead, this question is empirical, and a great deal of research has been devoted to determining which classes of models allow for good fits to market data.

In this dissertation, we develop a result that provides the market participant with a tool to approach the calibration problem from the opposite direction. We impose a specific structure on the derivative securities, and we then use this structure to draw conclusions about the models that are consistent with a given set of prices. To illustrate the structure that we must impose on the derivative securities to apply our result, we introduce an auxiliary process that we use to record information about the history of the primary security, and we require that this auxiliary process can be updated using only the changes in the price of the primary security.
This is actually a technical notion whose definition we postpone until Chapter 2; however, we do provide the following heuristic version of that definition. Suppose that, given the value of the auxiliary process today and a full description of the changes in the price of the primary security over the next week, we can give a full description of the changes in the auxiliary process over the next week. Then we say that the auxiliary process may be updated using only the changes in the price of the primary security.

As an example of an auxiliary process that satisfies this condition, we might choose to track the current value of the primary security as well as its running maximum. In this case, the auxiliary process takes values in $\mathbb{R}^2$. We could also choose to track the current value of the primary security and its historical average. We could even let the auxiliary process take values in $\mathbb{R}^4$ and track the current value of the primary security, the running maximum, the running minimum, and the historical average. An example of an auxiliary process that may not be updated using only the changes in the price of the primary security would be the current price of the primary security and the price one week prior. Of course, if we instead decided to track the price of the primary security over the entire previous week, this auxiliary process could be updated using only the changes in the price of the primary security.

We now assume that we have fixed some auxiliary process that satisfies the updating condition sketched above, and that we have prices for a collection of European-style derivative securities. In this context, European-style means that the holder of the derivative security receives a single payment at a fixed maturity and makes no decisions prior to that date. The main result of this dissertation essentially asserts that when the payoff of each derivative security can be expressed as a function of the auxiliary process evaluated at the derivative’s maturity, then it is possible to construct a model with a price process that satisfies a “simple” stochastic differential equation (SDE) in such a way that the model prices for all derivative securities agree with the given prices. Moreover, the structure of this SDE is determined by the structure of the auxiliary process. As an example of such a situation, we note that the payoff of most barrier options can be expressed as a function of the current value, the running maximum, and the running minimum of the underlying security’s price at the maturity of the option.

To see how such a result might be useful, consider the simplest case where the auxiliary process is simply the underlying security’s price. In this case, the payoffs of European puts and calls are functions of the auxiliary process at each option’s maturity.
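The heuristic updating condition described above can be illustrated in discrete time. The following Python sketch is not taken from the dissertation: the class name `AuxState` and the sample price changes are hypothetical, and the precise continuous-time definition is the one given in Chapter 2. The state holds the current price, the running maximum, and the running minimum, and the example shows that today's state together with the subsequent price changes is enough to produce the later state.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AuxState:
    """Hypothetical auxiliary state: current price, running max, running min."""
    price: float
    run_max: float
    run_min: float

    def update(self, increments: Iterable[float]) -> "AuxState":
        """Advance the state using only the changes in the price."""
        price, hi, lo = self.price, self.run_max, self.run_min
        for dx in increments:
            price += dx
            hi = max(hi, price)
            lo = min(lo, price)
        return AuxState(price, hi, lo)


start = AuxState(price=100.0, run_max=100.0, run_min=100.0)
week1 = [1.0, -3.0, 0.5]   # illustrative price changes, not market data
week2 = [2.0, 4.0, -1.0]

# Updating week by week gives the same state as processing both weeks at once.
step_by_step = start.update(week1).update(week2)
all_at_once = start.update(week1 + week2)
```

A state made of today's price and the price one week earlier admits no such `update` method, because the price changes after today say nothing about where the price stood at intermediate dates of the previous week.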
This simplest case, in which the auxiliary process is just the underlying security’s price, corresponds to the notion of “local volatility models” and has been studied extensively, starting with the work of Dupire [Dup94] and Derman and Kani [DK94]. (Rubinstein [Rub94] has also given similar results in a binomial tree model.) The forward equation of Fokker [Fok13], Planck [Pla17], and Kolmogorov [Kol31] expresses the relationship between the drift and volatility coefficients of a diffusion process and the one-dimensional marginal distributions of that process. Breeden and Litzenberger [BL78] have shown the equivalence between European option prices and the one-dimensional marginal distributions of the underlying price process under any pricing measure. Connecting these two results and observing that the drift of a price process under any pricing measure is determined by no-arbitrage, Dupire argued that the volatility coefficient in a diffusion model for a price process may be implied directly from the set of European option prices. In particular, assuming zero interest rates for simplicity of exposition, it is possible to choose a (deterministic) diffusion coefficient $\hat\sigma$ in such a way that the model prices for European options written on a price process $\hat S$ that solves the SDE

$$ d\hat S_t = \hat\sigma(\hat S_t, t)\, \hat S_t \, dW_t \tag{1.1} $$

will agree with the market prices. It was clear to Dupire from the start that (1.1) may not be a very good model for the price process and he advocated a hedging strategy that is robust to violations of the dynamics given in (1.1). Indeed, empirical work such as [DFW98] indicates that $\hat\sigma$ must often be modified to refit market prices. As the model assumes that $\hat\sigma$ should be a fixed function, this is inconsistent.

Nevertheless, (1.1) is still useful because it can be used to characterize the models that are consistent with a given set of European option prices. We will say that a model is an Itô model if the price processes for the primary securities are modeled as Itô processes. Consider a general Itô model where the price process solves the SDE

$$ dS_t = \sigma_t S_t \, dW_t \tag{1.2} $$

for some adapted process $\sigma$ under some pricing measure. Given such a model, we could compute the European option prices, take these prices as inputs to Dupire’s approach, and then choose $\hat\sigma$ such that the prices for European options written on price processes which solve (1.1) and (1.2) agree. It turns out that the process $\sigma$ and the function $\hat\sigma$ that we imply are related by the rather intuitive formula

$$ \hat\sigma^2(x, t) = E[\sigma_t^2 \mid S_t = x]. \tag{1.3} $$
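Formula (1.3) can be motivated by a formal calculation. The following sketch is heuristic and is not the argument developed in this dissertation: it assumes zero interest rates, that $S_T$ has a smooth density $p(\cdot, T)$, and that manipulations with the Dirac delta $\delta$ are justified. The symbols $C(K, T)$ (the price of a call with strike $K$ and maturity $T$) and $p$ are introduced only for this sketch. The first line is the Breeden-Litzenberger relation, the second comes from formally applying the Itô-Tanaka formula to $(S_t - K)^+$ under (1.2), and the third is the same computation under (1.1).

```latex
\begin{align*}
C(K,T) &= E\big[(S_T - K)^+\big],
\qquad
\frac{\partial^2 C}{\partial K^2}(K,T) = p(K,T), \\
\frac{\partial C}{\partial T}(K,T)
  &= \tfrac12\, E\big[\sigma_T^2 S_T^2\, \delta(S_T - K)\big]
   = \tfrac12\, K^2\, E\big[\sigma_T^2 \mid S_T = K\big]\, p(K,T), \\
\frac{\partial C}{\partial T}(K,T)
  &= \tfrac12\, \hat\sigma^2(K,T)\, K^2\, \frac{\partial^2 C}{\partial K^2}(K,T)
  \quad \text{in the model (1.1)}.
\end{align*}
```

If both models are to produce the same call prices, equating the last two expressions at points where $p(K, T) > 0$ gives (1.3).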

Derman and Kani give such a formula in [DK98]. As the local volatility function $\hat\sigma$ essentially characterizes the European option prices, (1.3) essentially characterizes the Itô models that are consistent with a given collection of European option prices. Gatheral [Gat06] argues that one should understand local volatilities as an “effective theory,” and (1.3) connects a local volatility model with a stochastic volatility model in a way that is consistent, at least with respect to pricing European options.

The relationship given in (1.3) has found a wide range of applications. Brigo and Mercurio [BM01] [BM02] use (1.3) to produce local volatility models where the prices for European options are given as simple mixtures of Black-Scholes prices. To do this, they fix a finite number of deterministic volatility scenarios and build a stochastic volatility model by randomly choosing a volatility scenario at the initial time. It is clear that the price for any option in such a model is given as a mixture of the prices that are computed in each scenario. They then compute $\hat\sigma$ explicitly using (1.3) and conclude that the corresponding local volatility model has European option prices that are given as mixtures of the prices computed in each scenario. We note that the prices for non-European options in the initial mixture model and the mimicking local volatility model may differ. Piterbarg stresses this point in the working paper [Pit03a] and argues against the use of such mixture models. We will briefly revisit this point at the end of Chapter 2.

Avellaneda et al. [ABOBF02] use (1.3) for pricing European options on baskets of securities. In this case, the volatility of the basket is given as the sum of the volatilities of the underlying securities and ideas from Varadhan [Var67] are used to compute the most likely configuration of the basket and approximate $\hat\sigma$. Combining (1.3) with parameter averaging techniques, Antonov, Misirpashaev, and Piterbarg [Pit03b] [Pit05] [Pit06] [AM06] [Pit07] [AMP07] have developed pricing approximations for a range of markets.

Inspired by the success of the HJM methodology, some authors have attempted to use $\hat\sigma$ as the state of a process. Unlike parameterized models, this approach has the advantage that essentially any set of European option prices may be matched with an appropriate choice of state. Derman and Kani [DK98] initiate such an approach for trinomial trees, but comment that the no-arbitrage drift conditions for such a model in continuous-time are rather involved.
More recently, Carmona and Nadtochiy [Car07] [CN07] have provided a rigorous development of this approach in continuous-time. They use results of Kunita [Kun90] on stochastic flows to ensure that $\hat\sigma_t(\cdot, \cdot)$, which is now a random field, remains regular enough that the pricing PDE may be solved, and they derive the drift restrictions, correcting a mistake in [DK98]. Formula (1.3) actually provides some clue as to why the drift restrictions in such a model are so difficult to deal with. To see how a perturbation to $\hat\sigma$ affects $P[S_t \in dx]$, one must re-solve the forward equation using the perturbed value of $\hat\sigma$, so the drift restrictions that enforce (1.3) are not available in closed-form.

Formula (1.3) is also useful because it suggests a way to adjust an initial model to fit a set of European option prices. In particular, given a model of the form (1.2), equation (1.3) suggests that we might attempt to choose a deterministic function $f$ such that the solution to the SDE

$$ dS_t = f(S_t, t)\, \sigma_t S_t \, dW_t \tag{1.4} $$

has the required European option prices. In [BJN00], Britten-Jones and Neuberger develop this approach for a discrete-time model, and Madan, Qian, and Ren [MQR07] propose a numerical method to compute such an $f$ in a continuous-time model by solving the associated forward equation.

In this dissertation, we extend the relationship among the formulas (1.1), (1.2), and (1.3) beyond the diffusion case. In particular, we show that by conditioning on the value of an auxiliary process in (1.3), we may construct a path-dependent function $\hat\sigma$ and then find a process that solves a generalization of (1.1) and preserves some path-dependent properties of the initial process given in (1.2). In particular, the one-dimensional marginal distributions of the auxiliary process are preserved, so the prices for European-style options with payoffs that can be written as functions of the auxiliary process at maturity are also preserved. We hope that with such a result, it will be possible to adapt some of the local volatility-based approaches mentioned above to handle common path-dependent options.

Moreover, even in the diffusion case, we believe that our approach offers a technical advantage over previous PDE-based approaches to local volatility models. Dupire’s derivation of the local volatility SDE using the forward equation is essentially formal; however, in earlier work only recently rediscovered by the finance community, Krylov [Kry84] and Gyöngy [Gyö86] develop a result that may be considered a rigorous proof of the existence of a local volatility model. They have provided the following result.

1.5 Theorem ([Gyö86] Thm. 4.6). Let $W$ be an $\mathbb{R}^d$-valued Wiener process, and let $X$ be an $\mathbb{R}^d$-valued process that solves the SDE $dX_t = \mu_t\, dt + \sigma_t\, dW_t$, where $\mu_t \in \mathbb{R}^d$ and $\sigma_t \in \mathbb{R}^d \otimes \mathbb{R}^d$ are bounded, adapted processes and $\sigma_t \sigma_t^T$ is uniformly positive definite.
Then there exist (deterministic) functions $\hat\mu : \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d$ and $\hat\sigma : \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d \otimes \mathbb{R}^d$ and a Lebesgue-null set $N \subset \mathbb{R}_+$ such that $\hat\mu(X_t, t) = E[\mu_t \mid X_t]$ a.s. and $\hat\sigma^2(X_t, t) = E[\sigma_t \sigma_t^T \mid X_t]$ a.s. when $t \notin N$, and there exists a weak solution to the SDE

$$ d\hat X_t = \hat\mu(\hat X_t, t)\, dt + \hat\sigma(\hat X_t, t)\, d\hat W_t \tag{1.6} $$

with the same one-dimensional marginal distributions as $X$, where $\hat W$ is another $\mathbb{R}^d$-valued Wiener process.

The requirements on the covariance process $\sigma\sigma^T$ in this theorem are rather strong. For instance, in Heston’s model [Hes93] the volatility process is neither bounded, nor uniformly bounded away from zero. As a result, one may not apply Thm. 1.5 to conclude that there exists a local volatility model with the same one-dimensional marginal distributions as a given parameterization of Heston’s model. Using the main result of this dissertation, we may replace the requirements of boundedness and uniform positive definiteness in Thm. 1.5 with a weaker integrability condition (compare Thm. 1.5 with Cor. 2.16) that is satisfied in Heston’s model. So, even in the diffusion case, we provide a stronger result by avoiding the use of PDE-based arguments.
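To make formula (1.3) concrete, consider the mixture setting of Brigo and Mercurio discussed earlier in this section. The sketch below is an illustration under simplifying assumptions that are not taken from the dissertation: each scenario has a constant volatility, interest rates are zero, and all function and parameter names are hypothetical. Because the scenario is drawn at the initial time, conditioning on $S_t = x$ in (1.3) reduces to a Bayes-rule weighting of the scenario variances by the lognormal densities at $x$.

```python
import math
from typing import Sequence


def lognormal_density(x: float, s0: float, sigma: float, t: float) -> float:
    """Density at x of S_t when dS = sigma * S dW and S_0 = s0 (zero rates)."""
    var = sigma * sigma * t
    z = (math.log(x / s0) + 0.5 * var) / math.sqrt(var)
    return math.exp(-0.5 * z * z) / (x * math.sqrt(2.0 * math.pi * var))


def mixture_local_variance(x: float, t: float, s0: float,
                           sigmas: Sequence[float],
                           probs: Sequence[float]) -> float:
    """E[sigma^2 | S_t = x] when the volatility is sigmas[i] w.p. probs[i]."""
    weights = [p * lognormal_density(x, s0, s, t)
               for s, p in zip(sigmas, probs)]
    total = sum(weights)
    return sum(w * s * s for w, s in zip(weights, sigmas)) / total


# Illustrative parameters only.
local_var = mixture_local_variance(x=120.0, t=1.0, s0=100.0,
                                   sigmas=[0.1, 0.3], probs=[0.5, 0.5])
```

The returned value is a weighted average of the scenario variances, so it always lies between the smallest and the largest of them, and with a single scenario it reduces to that scenario's variance.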

1.2 Definitions and Notation

We have $\mathbb{N} = \{1, 2, \dots\}$ and $\overline{\mathbb{N}} \triangleq \mathbb{N} \cup \{\infty\}$. We also have $\mathbb{R} \triangleq (-\infty, \infty)$, $\mathbb{R}_+ \triangleq [0, \infty)$, and $\overline{\mathbb{R}}_+ \triangleq [0, \infty]$. We define $\overline{\mathbb{R}}^d \triangleq \mathbb{R}^d \cup \{\infty\}$ to be the one-point compactification of the locally compact space $\mathbb{R}^d$. We treat $\mathbb{R}^d$ as a Hilbert space with inner product $(x, y) \triangleq \sum_{1 \le i \le d} x^i y^i$ and $\|x\| = \sqrt{(x, x)}$. We denote the set of $n \times d$ matrices by $\mathbb{R}^n \otimes \mathbb{R}^d$. We treat $\mathbb{R}^n \otimes \mathbb{R}^d$ as a Hilbert space as well with inner product $(A, B) \triangleq \sum_{1 \le i \le n} (AB^T)^{ii}$, where $B^T$ denotes the transpose of $B$ (this is the Frobenius or Hilbert-Schmidt norm). If $z \in \mathbb{R}^n$ and $w \in \mathbb{R}^d$, then $z \otimes w$ denotes the matrix in $\mathbb{R}^n \otimes \mathbb{R}^d$ with $(z \otimes w)^{ij} = z^i w^j$ and $\|z \otimes w\| = \|z\| \|w\|$. We also identify $A \in \mathbb{R}^n \otimes \mathbb{R}^d$ with the linear operator from $\mathbb{R}^d$ to $\mathbb{R}^n$ that acts by matrix multiplication. The Frobenius norm is stronger than the operator norm, so $\|Ax\| \le \|A\| \|x\|$. We denote by $S^d_+ \subset \mathbb{R}^d \otimes \mathbb{R}^d$ the set of symmetric nonnegative definite matrices. We let $\lambda$ denote Lebesgue’s measure on $\mathbb{R}$, and we let $\lambda_B(A) \triangleq \lambda(A \cap B)$ denote the restriction of $\lambda$ to $B$. For easy reference, we tag the following remark.

1.7 Remark. We will always use the “extended” version of the integral

$$ \int f \, d\mu \triangleq \begin{cases} \int f \, d\mu & \text{if } \int \|f\| \, d\mu < \infty, \\ \infty & \text{otherwise,} \end{cases} $$

so the integral takes values in the set $\overline{\mathbb{R}}^d$ when $f$ takes values in $\mathbb{R}^d$. In particular, the integral is always defined. With this definition, one should interpret the symbol $\infty$ to mean that the integral is not finite. In particular, according to this definition, we have $\int_0^1 -1/x \, dx = \infty$ rather than $\int_0^1 -1/x \, dx = -\infty$. This may seem to be a strange choice, but will prove to be very convenient as we will be using this convention mainly with $\mathbb{R}^d$-valued integrands. Rather than trying to keep track of which coordinates are finite or infinite or undefined, we simply define the value of the whole integral to be “$\infty$” whenever any coordinate is not finite. If we fix any measurable $f : \mathbb{R}_+ \to \mathbb{R}^d$, and define $F(t) \triangleq \int_0^t f(s) \, ds$, then $F$ is left continuous. In fact, the only way that $F$ may jump is if it jumps to $\infty$ and then stays there, so $F$ is continuous if it is finitely-valued for all $t \in \mathbb{R}_+$. For instance, if we fix any nonzero $x \in \mathbb{R}^d$ and take $f(t) = x/(t - 1) \, 1_{\{t > 1\}}$, then $F(t) = \infty \, 1_{\{t > 1\}}$.

1.8 Definition. If $(E, \mathcal{E})$ is a measurable space and $X$ is an $E$-valued random variable, then we let $\mathcal{L}(X \mid P) \triangleq P \circ X^{-1}$ denote the measure induced by $X$ on $(E, \mathcal{E})$, and we say that $\mathcal{L}(X \mid P)$ is the law of $X$ under $P$. When $P$ is clear from the context, we abbreviate $\mathcal{L}(X \mid P)$ to $\mathcal{L}(X)$, and we simply say that $\mathcal{L}(X)$ is the law of $X$.

1.9 Definition. Let $E$ be a topological space, and let $\{X^n\}_{n \le \infty}$ be a sequence of $E$-valued random variables, possibly defined on different probability spaces. We say that $X^n$ converges in distribution to $X^\infty$, written $X^n \Rightarrow X^\infty$, if

$$ \lim_{n \to \infty} E^n f(X^n) = E^\infty f(X^\infty) $$

for each bounded, continuous function $f : E \to \mathbb{R}$.

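Filtrations and stochastic bases

1.10 Definition. Given a probability space $(\Omega, \mathcal{F}, P)$, we say that a collection of σ-fields $\mathbb{F}^0 = \{\mathcal{F}^0_t\}_{t \in \mathbb{R}_+}$ is a filtration if $\mathcal{F}^0_s \subset \mathcal{F}^0_t \subset \mathcal{F}$ when $s < t$. We say that $\mathbb{F}^0$ is right continuous if $\mathcal{F}^0_s = \cap_{t > s} \mathcal{F}^0_t$. We say that the σ-field $\mathcal{F}$ is complete if $A \subset N \in \mathcal{F}$ and $P[N] = 0$ implies that $A \in \mathcal{F}$. We say that the filtration $\mathbb{F}^0$ is complete if $A \subset N \in \mathcal{F}$ and $P[N] = 0$ implies $A \in \mathcal{F}^0_0$, and we say that $\mathbb{F}^0$ satisfies the usual conditions if $\mathbb{F}^0$ is right-continuous and complete.

In particular, we do not require a filtration to be right continuous as in [JS87] Def. I.1.2. We often superscript a filtration which may not satisfy the usual conditions with a zero to warn the reader. Some authors differentiate between the “completion” and the “augmentation” of a filtration; we make no such distinction.

1.11 Definition. We say that $\mathcal{B} = (\Omega, \mathcal{F}, \mathbb{F}^0 = \{\mathcal{F}^0_t\}_{t \in \mathbb{R}_+}, P)$ is a stochastic basis if $(\Omega, \mathcal{F}, P)$ is a probability space and $\mathbb{F}^0$ is a filtration. We say that $\mathcal{B}$ is complete if $\mathcal{F}$ and $\mathbb{F}^0$ are both complete, and we say that $\mathcal{B}$ satisfies the usual conditions if $\mathcal{F}$ is complete and $\mathbb{F}^0$ satisfies the usual conditions. If $\hat{\mathcal{B}} = (\hat\Omega, \hat{\mathcal{F}}, \{\hat{\mathcal{F}}^0_t\}_{t \in \mathbb{R}_+}, \hat P)$ is another stochastic basis, then we define

$$ \mathcal{B} \otimes \hat{\mathcal{B}} \triangleq \big(\Omega \times \hat\Omega,\ \mathcal{F} \otimes \hat{\mathcal{F}},\ \{\mathcal{F}^0_t \otimes \hat{\mathcal{F}}^0_t\}_{t \in \mathbb{R}_+},\ P \times \hat P\big), $$

and we say that $\mathcal{B} \otimes \hat{\mathcal{B}}$ is an extension of $\mathcal{B}$.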

Spaces of functions

If $E_1$ is a topological space, then the Borel σ-field on $E_1$ is the σ-field generated by the open subsets of $E_1$. If $E_2$ is another topological space, then $C(E_1; E_2)$ denotes the set of continuous maps from $E_1$ to $E_2$. If $E_2$ has a metric $d_2$, then we will always endow $C(\mathbb{R}_+; E_2)$ with the locally uniform topology and the compatible distance

$$ d(x, y) \triangleq \sum_{n=1}^{\infty} 2^{-n} \Big( 1 \wedge \sup_{s \le n} d_2\big(x(s), y(s)\big) \Big). $$
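As a numerical illustration of this distance, the following sketch approximates $d(x, y)$ for real-valued paths. It is only an approximation, and its names are hypothetical: the series is truncated after `n_terms` terms (which changes the value by at most $2^{-n}$ with $n$ equal to `n_terms`), each supremum is taken over a finite grid rather than over all $s \le n$, and $d_2$ is taken to be the absolute difference on $\mathbb{R}$.

```python
from typing import Callable

Path = Callable[[float], float]


def locally_uniform_distance(x: Path, y: Path,
                             n_terms: int = 20,
                             points_per_unit: int = 100) -> float:
    """Approximate sum_{n>=1} 2^-n * min(1, sup_{s<=n} |x(s) - y(s)|)."""
    total = 0.0
    running_sup = abs(x(0.0) - y(0.0))
    for n in range(1, n_terms + 1):
        # Extend the running supremum from [0, n-1] to [0, n] on the grid.
        for k in range(1, points_per_unit + 1):
            s = (n - 1) + k / points_per_unit
            running_sup = max(running_sup, abs(x(s) - y(s)))
        total += 2.0 ** (-n) * min(1.0, running_sup)
    return total


zero = lambda s: 0.0
far = lambda s: 5.0                  # uniformly far from the zero path
late = lambda s: max(0.0, s - 3.0)   # agrees with the zero path on [0, 3]

d_far = locally_uniform_distance(zero, far)
d_late = locally_uniform_distance(zero, late)
```

The weights $2^{-n}$ mean that paths which differ only far from time zero are close in this distance, which is what makes it compatible with locally uniform convergence.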

If $E_2$ is a Polish space, then $C(\mathbb{R}_+; E_2)$ is a Polish space as well. If $E_2$ is a vector space, then $C(\mathbb{R}_+; E_2)$ is also a vector space.

We now assume that $E$ is a Polish space. We will be slicing up paths and patching them back together, so we fix some useful notation. We define the shift operator $\Theta : C(\mathbb{R}_+; E) \times \mathbb{R} \to C(\mathbb{R}_+; E)$ by

$$ \Theta(x, t) \triangleq x\big((t + \cdot)^+\big), $$

where $(\cdot)^+$ denotes the positive part. Notice that this is a slight extension of the standard shift operator of Markov process theory as we allow for negative shifts and the value of the path at 0 is used to fill the “gap” that is created when the path is shifted to the right. In particular, if $0 \le s \le t$ then $\Theta(x, -t)(s) = x(0)$. We also define the stopping operator $\nabla : C(\mathbb{R}_+; E) \times \mathbb{R}_+ \to C(\mathbb{R}_+; E)$ by

$$ \nabla(x, t) \triangleq x(\cdot \wedge t). $$

If $E$ has a vector space structure, then $C(\mathbb{R}_+; E)$ has a vector space structure, so $E$ has a zero element, and we may define the space of paths that start at zero

$$ C_0(\mathbb{R}_+; E) \triangleq \big\{ x \in C(\mathbb{R}_+; E) : x(0) = 0 \big\}. $$

This is a closed subset of $C(\mathbb{R}_+; E)$, so $C_0(\mathbb{R}_+; E)$ is a Polish space in the relative topology. In this situation, we can also define the difference operator $\Delta : C(\mathbb{R}_+; E) \times \mathbb{R}_+ \to C_0(\mathbb{R}_+; E)$ by

$$ \Delta(x, t) \triangleq x(t + \cdot) - x(t). $$

We will slightly abuse the notation and use the same symbols for these operators as we vary the space $E$. The domain and range of the operators should be clear from the context. The maps $\Theta$, $\nabla$, and $\Delta$ are continuous, and they are linear in $x$, for fixed $t$, when $E$ is a vector space.

If $X : \Omega \to C(\mathbb{R}_+; E)$, then we use the notation $X_t : \Omega \to E$ to denote the map $\omega \mapsto X(\omega)(t)$, and we use the standard “stopped process” notation. In particular, if $T$ is an $\mathbb{R}_+$-valued random variable, then $X^T : \Omega \to C(\mathbb{R}_+; E)$ denotes the map $\omega \mapsto \nabla\big(X(\omega), T(\omega)\big)$. Notice that if $t$ and $u$ are nonnegative, then we have

$$ \Theta(X^{t+u}, t) = \Theta\big(\nabla(X, t + u), t\big) = \nabla\big(\Theta(X, t), u\big) = \Theta^u(X, t), $$

and a similar chain of equalities for $\Delta(X^{t+u}, t)$ when $E$ is a vector space.

If $X$ is a random variable, then $\sigma(X)$ denotes the σ-field generated by $X$. If $\mathcal{G}$ and $\mathcal{H}$ are σ-fields, and $X$ and $Y$ are random variables, then $\sigma(\mathcal{G}, \mathcal{H}, X, Y) = \mathcal{G} \vee \mathcal{H} \vee \sigma(X) \vee \sigma(Y)$.
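The operators $\Theta$, $\nabla$, and $\Delta$ can be sketched in code by representing a real-valued path as a Python function of time. This representation and the function names `shift`, `stop`, and `diff` are illustrative choices, not notation from the text. The final lines spot-check the identity $\Theta(\nabla(x, t+u), t) = \nabla(\Theta(x, t), u)$ stated above at a few time points for one hypothetical path.

```python
import math
from typing import Callable

Path = Callable[[float], float]


def shift(x: Path, t: float) -> Path:
    """Theta(x, t): s -> x((t + s)^+); t may be negative."""
    return lambda s: x(max(t + s, 0.0))


def stop(x: Path, t: float) -> Path:
    """Nabla(x, t): s -> x(min(s, t)), the path stopped at time t >= 0."""
    return lambda s: x(min(s, t))


def diff(x: Path, t: float) -> Path:
    """Delta(x, t): s -> x(t + s) - x(t), a path starting at zero."""
    return lambda s: x(t + s) - x(t)


x: Path = lambda s: math.sin(s) + 0.5 * s   # hypothetical sample path
t, u = 1.0, 2.0
times = [0.0, 0.5, 1.5, 2.0, 3.0, 10.0]

lhs = shift(stop(x, t + u), t)   # Theta(Nabla(x, t + u), t)
rhs = stop(shift(x, t), u)       # Nabla(Theta(x, t), u)
identity_holds = all(lhs(s) == rhs(s) for s in times)

# A negative shift fills the gap with the initial value x(0).
gap_value = shift(x, -t)(0.5)
```

Both sides of the identity evaluate $x$ at the same time point, $\min(t + s, t + u)$, which is why the comparison can be exact rather than approximate.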

Processes

We denote by $BV([a, b]; \mathbb{R}^d)$ the class of $\mathbb{R}^d$-valued functions that are of bounded variation when restricted to the interval $[a, b]$. Similarly, $AC([a, b]; \mathbb{R}^d)$ denotes the class of $\mathbb{R}^d$-valued functions that are absolutely continuous when restricted to the interval $[a, b]$. One may consult Appendix C for further details.

1.12 Definition. If $X$ is an $\mathbb{R}^d$-valued process, then we say that $X$ is a finite variation process if $X \in BV([0, t]; \mathbb{R}^d)$ for all $t \in \mathbb{R}_+$, and we say that $X$ is an absolutely continuous process if $X \in AC([0, t]; \mathbb{R}^d)$ for all $t \in \mathbb{R}_+$.

1.13 Definition. If $X$ is an $\mathbb{R}^d$-valued process, then we define

$$ \mathrm{Var}_t(X) \triangleq \sup_{\pi} \sum_i \big\| X_{s_i} - X_{s_{i-1}} \big\|, $$

where the sup is taken over all partitions of the form $\pi = \{0 = s_0 < s_1 < \dots < s_n = t\}$. If $X$ is a càdlàg process, then we need only consider partitions containing rational points in the definition above, and $\mathrm{Var}_t(X)$ is a (measurable) random variable.

1.14 Definition. Let $\mathcal{B} = (\Omega, \mathcal{F}, \mathbb{F}^0, P)$ be a stochastic basis supporting a continuous, $\mathbb{R}^d$-valued process $X$. We say that $X$ is a continuous semimartingale if we can decompose $X$ as

$$ X_t = X_0 + M_t + B_t, \tag{1.15} $$

where $M$ is a continuous local martingale with $M_0 = 0$, and $B$ is a continuous process with $B_0 = 0$ that is $P$-a.s. of finite variation. In this case, we say that $X$ has the characteristics $(B, \langle M \rangle)$. If $B$ and $\langle M \rangle$ are both absolutely continuous, $P$-a.s., then we say that $X$ is an Itô process.

Chapter 2 Statement of Results

In this chapter, we present the main result of the dissertation and we give a few corollaries to illustrate potential applications. To state the main result, we need to first define the notion of an updating function. We give this definition and present a few examples of updating functions in Section 2.1. In Section 2.2, we state the main result of the dissertation and give corollaries. In Section 2.3, we show how these results can be used to give an answer to a question raised by Piterbarg about the prices for barrier options in mixture models.

2.1 Updating Functions

The following definition is fundamental for all that follows.

2.1 Definition. Let $E$ be a Polish space, and let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}^d) \to C(\mathbb{R}_+; E)$ be a function. We say that $\Phi$ is an updating function if

(a) $\Phi_t(e, x) = \Phi_t\big(e, \nabla(x, t)\big)$ for all $t \in \mathbb{R}_+$, and
(b) $\Theta\big(\Phi(e, x), t\big) = \Phi\big(\Phi_t(e, x), \Delta(x, t)\big)$ for all $t \in \mathbb{R}_+$.

If $\Phi$ is also continuous as a map from $E \times C_0(\mathbb{R}_+; \mathbb{R}^d)$ to $C(\mathbb{R}_+; E)$, then we say that $\Phi$ is a continuous updating function.

Property (a) of Def. 2.1 is an adaptedness condition. Property (b) restricts the way in which $\Phi$ may depend upon the history of $X$. We state this precisely as a lemma.
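Properties (a) and (b) can be spot-checked numerically. The sketch below does this for a running-maximum state of the kind described heuristically in Chapter 1, with $E = \mathbb{R}^2$ holding the current value and the running maximum. Everything in it is an illustrative assumption rather than part of the formal development: paths are Python functions, the running maximum is approximated on a finite grid, the helper names are hypothetical, and property (b) is checked only at the single time $u$.

```python
from typing import Callable, Tuple

Path = Callable[[float], float]
State = Tuple[float, float]          # (current value, running maximum)
H = 1.0 / 64                         # grid step used to approximate maxima


def stop(x: Path, t: float) -> Path:
    return lambda s: x(min(s, t))                 # Nabla(x, t)


def diff(x: Path, t: float) -> Path:
    return lambda s: x(t + s) - x(t)              # Delta(x, t)


def running_max(x: Path, t: float) -> float:
    """Grid approximation of max{x(s) : 0 <= s <= t}, t a multiple of H."""
    return max(x(k * H) for k in range(int(round(t / H)) + 1))


def phi(e: State, x: Path, t: float) -> State:
    """Phi_t(e, x) for a path x with x(0) = 0."""
    e1, e2 = e
    return (e1 + x(t), max(e2, e1 + running_max(x, t)))


def close(a: State, b: State, tol: float = 1e-12) -> bool:
    return all(abs(p - q) <= tol for p, q in zip(a, b))


x: Path = lambda s: s * (2.0 - s)    # hypothetical path with x(0) = 0
e: State = (1.0, 1.5)
t, u = 0.5, 1.25

# (a) Phi_t depends on x only through the path stopped at t.
prop_a = close(phi(e, x, t), phi(e, stop(x, t), t))

# (b) evaluated at time u: Phi_{t+u}(e, x) = Phi_u(Phi_t(e, x), Delta(x, t)).
prop_b = close(phi(e, x, t + u), phi(phi(e, x, t), diff(x, t), u))
```

The left side of the check for (b) uses the fact that evaluating $\Theta(\Phi(e, x), t)$ at time $u$ gives $\Phi_{t+u}(e, x)$.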

2.2 Lemma. Let $E$ be a Polish space, let $X$ be a continuous, $\mathbb{R}^d$-valued process, let $Z$ be a continuous, $E$-valued process, and let $\Phi$ be an updating function. If $Z = \Phi\big(Z_0, \Delta(X, 0)\big)$, then $\Theta(Z, t) = \Phi\big(Z_t, \Delta(X, t)\big)$ for all $t \in \mathbb{R}_+$.

Proof. We simply write

$$ \Theta(Z, t) = \Theta\big(\Phi(Z_0, \Delta(X, 0)), t\big) = \Phi\big(\Phi_t(Z_0, \Delta(X, 0)), \Delta(X, t)\big) = \Phi\big(Z_t, \Delta(X, t)\big), $$

where we have used property (b) of Def. 2.1 and the fact that $\Delta\big(\Delta(X, 0), t\big) = \Delta(X, t)$.

This lemma suggests that processes of the form $Z = \Phi\big(Z_0, \Delta(X, 0)\big)$ have some desirable properties, and we give this relationship an intuitive name.

2.3 Definition. Let $E$ be a Polish space, let $X$ be a continuous, $\mathbb{R}^d$-valued process, and let $Z$ be a continuous, $E$-valued process. If we can write $Z = \Phi\big(Z_0, \Delta(X, 0)\big)$ for some updating function $\Phi$, then we say that $Z$ may be updated using only the changes in $X$.

We now present a number of examples to illustrate the relationship between the processes $X$ and $Z$ of Def. 2.3. Unfortunately, the notational burden increases with each example; however, the point of these examples is simply to show that the notion of updating function is quite general, and the reader will not lose much by skimming the details.

2.4 Example. Let $X$ be a continuous, $\mathbb{R}^d$-valued process, and set $Z \triangleq X$. Then $Z$ may be updated using only the changes in $X$. To see this, we set $E \triangleq \mathbb{R}^d$, and we define $\Phi(e, x) \triangleq e \, 1_{[0,\infty)} + x$, where $e \, 1_{[0,\infty)}$ denotes the constant path in $C(\mathbb{R}_+; \mathbb{R}^d)$ that is equal to $e$ at all times. We have $Z = \Phi\big(Z_0, \Delta(X, 0)\big)$, and we now check that $\Phi$ is an updating function. For $t \in \mathbb{R}_+$ and $x \in C_0(\mathbb{R}_+; \mathbb{R}^d)$, we have

$$ \Phi^t(e, x) = e \, 1_{[0,\infty)} + \nabla(x, t) = \Phi^t\big(e, \nabla(x, t)\big), $$

so property (a) of Def. 2.1 holds. If we know the value of $X$ at time $t$, and we know how the process $X$ changes after time $t$, then we may reconstruct the path of $X$ after time $t$. In particular, we have

$$ \Theta\big(\Phi(e, x), t\big) = e \, 1_{[0,\infty)} + \Theta(x, t) = \big(e + x(t)\big) \, 1_{[0,\infty)} + \Delta(x, t) = \Phi\big(\Phi_t(e, x), \Delta(x, t)\big), $$

so property (b) of Def. 2.1 holds, and $\Phi$ is an updating function.

In Example 2.4, the only information that we decided to record about the process $X$ was the current location. In the next example, we choose to track both the current location and the running maximum. We restrict ourselves to the one-dimensional case for simplicity.

2.5 Example. Let $X$ be a continuous process and set $Z \triangleq (X, M)$ where $M_t \triangleq \max\{X_s : s \in [0, t]\}$. Then $Z$ may be updated using only the changes in $X$. This time we take $E = \mathbb{R}^2$, and we write a typical point in $E$ as $e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$. We let $\psi : C_0(\mathbb{R}_+; \mathbb{R}) \to C(\mathbb{R}_+; \mathbb{R}_+)$ denote the map such that $\psi_t(x) = \max\{x(s) : s \in [0, t]\}$, and we let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}) \to C(\mathbb{R}_+; \mathbb{R}^2)$ denote the map such that

$$ \Phi_t(e, x) \triangleq \begin{bmatrix} e_1 + x(t) \\ \max\{e_2,\ e_1 + \psi_t(x)\} \end{bmatrix}. $$

Notice that

$$ \Phi_t\big(Z_0, \Delta(X, 0)\big) = \Phi_t\Big(\begin{bmatrix} X_0 \\ X_0 \end{bmatrix}, \Delta(X, 0)\Big) = \begin{bmatrix} X_0 + \Delta_t(X, 0) \\ X_0 + \psi_t \circ \Delta(X, 0) \end{bmatrix} = Z_t, $$

where we have used the fact that $X_0 + \psi_t \circ \Delta(X, 0) \ge X_0$. We now check that $\Phi$ is an updating function. Fixing any $s \le t$, we have

\begin{align*}
\Phi^t_s(e, x) = \Phi_s(e, x)
  &= \begin{bmatrix} e_1 + x(s) \\ \max\{e_2,\ e_1 + \psi_s(x)\} \end{bmatrix}
   = \begin{bmatrix} e_1 + \nabla_s(x, t) \\ \max\{e_2,\ e_1 + \psi_s \circ \nabla(x, t)\} \end{bmatrix} \\
  &= \Phi_s\big(e, \nabla(x, t)\big) = \Phi^t_s\big(e, \nabla(x, t)\big).
\end{align*}

As this is true for all $s \le t$, we conclude that property (a) of Def. 2.1 holds.


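Finally, we check that, for all $t, u \in \mathbb{R}_+$, we have
\[
\begin{aligned}
\Theta_u\big(\Phi(e, x), t\big) = \Phi_{t+u}(e, x)
&= \begin{bmatrix} e_1 + x(t + u) \\ \max\{e_2,\ e_1 + \psi_{t+u}(x)\} \end{bmatrix} \\
&= \begin{bmatrix} e_1 + x(t) + \Delta_u(x, t) \\ \max\{e_2,\ e_1 + \psi_t(x),\ e_1 + x(t) + \psi_u \circ \Delta(x, t)\} \end{bmatrix} \\
&= \Phi_u\left(\begin{bmatrix} e_1 + x(t) \\ \max\{e_2,\ e_1 + \psi_t(x)\} \end{bmatrix}, \Delta(x, t)\right) \\
&= \Phi_u\big(\Phi_t(e, x), \Delta(x, t)\big),
\end{aligned}
\]
so property (b) of Def. 2.1 holds, and $\Phi$ is an updating function.

When dealing with time-dependent Markov processes, a standard technique is to append the time to the state of the process and form a time-homogeneous "space-time" process. In the next example, we see that we may employ a similar technique with an updating function. We also see how we might construct updating functions that record information about the joint distributions of the $X$ process.

2.6 Example. Let $X$ be a continuous, real-valued process. Fix some $T > 0$, and set $Z_t \triangleq (X_t, X^T_t, t)$. Then $Z$ may be updated using only the changes in $X$. This time we take $E \triangleq \mathbb{R}^2 \times \mathbb{R}_+$, and we write a typical point in $E$ as $e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$. Let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}) \to C(\mathbb{R}_+; E)$ denote the map such that
\[
\Phi_t(e, x) = \begin{bmatrix} e_1 + x(t) \\ e_2 + \nabla_t\big(x, (T - e_3)^+\big) \\ e_3 + t \end{bmatrix}.
\]
We check that
\[
\Phi_t\big(Z_0, \Delta(X, 0)\big) = \Phi_t\left(\begin{bmatrix} X_0 \\ X_0 \\ 0 \end{bmatrix}, \Delta(X, 0)\right) = \begin{bmatrix} X_0 + \Delta_t(X, 0) \\ X_0 + \Delta^T_t(X, 0) \\ t \end{bmatrix} = Z_t.
\]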


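Fixing any $s \le t$, we have
\[
\Phi^t_s(e, x) = \Phi_s(e, x) = \begin{bmatrix} e_1 + x(s) \\ e_2 + \nabla_s\big(x, (T - e_3)^+\big) \\ e_3 + s \end{bmatrix} = \begin{bmatrix} e_1 + \nabla_s(x, t) \\ e_2 + \nabla_s\big(\nabla(x, t), (T - e_3)^+\big) \\ e_3 + s \end{bmatrix} = \Phi_s\big(e, \nabla(x, t)\big) = \Phi^t_s\big(e, \nabla(x, t)\big).
\]
As this is true for any $s \le t$, we again conclude that property (a) of Def. 2.1 holds.

To see that property (b) holds, we first note that for any path $y$ and times $s, t \in \mathbb{R}_+$, we have
\[
\nabla\big(\Theta(y, -t), s\big) = \Theta\big(\nabla(y, (s - t)^+), -t\big). \tag{2.7}
\]
Here $\Theta(y, -t)$ denotes the path $y$ slid to the right by $t$, so that $x = \nabla(x, t) + \Theta(\Delta(x, t), -t)$. Taking any $t, u \in \mathbb{R}_+$ and letting $y = \Delta(x, t)$, we write
\[
\begin{aligned}
\nabla_{t+u}\big(x, (T - e_3)^+\big)
&= \nabla_{t+u}\big(\nabla(x, t) + \Theta(\Delta(x, t), -t),\ (T - e_3)^+\big) \\
&= \nabla_{t+u}\big(\nabla(x, t), (T - e_3)^+\big) + \nabla_{t+u}\big(\Theta(y, -t), (T - e_3)^+\big) \\
&= x\big(t \wedge (T - e_3)^+\big) + \Theta_{t+u}\big(\nabla(y, (T - e_3 - t)^+), -t\big) \qquad (2.8) \\
&= \nabla_t\big(x, (T - e_3)^+\big) + \nabla_u\big(\Delta(x, t), (T - e_3 - t)^+\big),
\end{aligned}
\]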


where (2.8) follows from (2.7) with $s = (T - e_3)^+$. To conclude, we write
\[
\begin{aligned}
\Theta_u\big(\Phi(e, x), t\big) = \Phi_{t+u}(e, x)
&= \begin{bmatrix} e_1 + x(t + u) \\ e_2 + \nabla_{t+u}\big(x, (T - e_3)^+\big) \\ e_3 + t + u \end{bmatrix} \\
&= \begin{bmatrix} e_1 + x(t) + \Delta_u(x, t) \\ e_2 + \nabla_t\big(x, (T - e_3)^+\big) + \nabla_u\big(\Delta(x, t), (T - e_3 - t)^+\big) \\ e_3 + t + u \end{bmatrix} \\
&= \Phi_u\left(\begin{bmatrix} e_1 + x(t) \\ e_2 + \nabla_t\big(x, (T - e_3)^+\big) \\ e_3 + t \end{bmatrix}, \Delta(x, t)\right) \\
&= \Phi_u\big(\Phi_t(e, x), \Delta(x, t)\big).
\end{aligned}
\]
This shows that property (b) of Def. 2.1 holds, so $\Phi$ is an updating function.

As is quickly becoming clear, the hardest part of checking that a function is an updating function is working through the notation. This is particularly true of the last example that we present. In this last example, we use $Z$ to record the entire trajectory of $X$ up until the current time. The updating function removes an initial segment of path from the front end of $\Delta(X, t)$ and appends it to the end of the initial path segment stored in $Z$. As we now have a path-valued process, this situation is somewhat unpleasant to deal with notationally, and the reader will not lose much by omitting the details.

2.9 Example. Let $X$ be a continuous, $\mathbb{R}^d$-valued process, and set $Z_t = (X^t, t)$, so $Z_t$ records the entire trajectory of $X$ up until the time $t$. Then $Z$ may be updated using only the changes in $X$. This example is extremal in the sense that we choose to record the most information about the path of $X$ that is possible without violating property (a) of Def. 2.1. We take $E \triangleq C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+$, and we write a typical point in $E$ as $e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$. We map a segment of path to a point in $E$, using the second coordinate to record the length of the segment. Let $\Psi : E \to C(\mathbb{R}_+; E)$ denote the map such that
\[
\Psi_t(e) = \begin{bmatrix} \nabla(e_1, e_2 + t) \\ e_2 + t \end{bmatrix}.
\]
We might describe $\Psi$ as a map that reveals more and more of the path $e_1$ over time. In particular, we give $\Psi$ a path $e_1$ and an initial time $e_2$, and $\Psi_t$ shows us the piece of $e_1$ that lives on the interval $[0, e_2 + t]$. Let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}^d) \to C(\mathbb{R}_+; E)$ denote the map such that
\[
\Phi_t(e, x) = \Psi_t\big(\nabla(e_1, e_2) + \Theta(x, -e_2),\ e_2\big) = \begin{bmatrix} \nabla\big(\nabla(e_1, e_2) + \Theta(x, -e_2),\ e_2 + t\big) \\ e_2 + t \end{bmatrix}.
\]
Recall that to compute $\Theta(x, -e_2)$, we slide the path $x$ to the right by the amount $e_2$, and we have $\Theta_t(x, -e_2) = x(0) = 0$ for $t \in [0, e_2]$. $\Phi$ appends the path $x$ to the initial path segment $e$ and then hands the newly constructed path over to $\Psi$, which slowly reveals information about the path in an adapted way. As $Z_0 = \begin{bmatrix} \nabla(X, 0) \\ 0 \end{bmatrix}$, we have
\[
\Phi_t\big(Z_0, \Delta(X, 0)\big) = \Psi_t\big(\nabla(X, 0) + \Delta(X, 0),\ 0\big) = \Psi_t(X, 0) = Z_t.
\]
We now check that $\Phi$ is an updating function. To check the first property of Def. 2.1, we first notice that if $x \in C(\mathbb{R}_+; \mathbb{R}^d)$ and $0 \le s \le t$, then
\[
\nabla\big(\Theta(x, -e_2), e_2 + s\big) = \Theta\big(\nabla(x, s), -e_2\big) = \Theta\big(\nabla(\nabla(x, t), s), -e_2\big) = \nabla\big(\Theta(\nabla(x, t), -e_2), e_2 + s\big).
\]
The first and last equalities state that sliding the path $x$ or $\nabla(x, t)$ to the right by $e_2$ and then stopping it at time $e_2 + s$ is the same as first stopping the path at time $s$ and then sliding it to the right by $e_2$. The second equality follows from the fact that stopping a path at two deterministic times is equivalent to stopping the path once at the earlier time. Using this observation and the fact that $\nabla(x + y, t) = \nabla(x, t) + \nabla(y, t)$, we may write
\[
\begin{aligned}
\nabla\big(\nabla(e_1, e_2) + \Theta(x, -e_2),\ e_2 + s\big)
&= \nabla\big(\nabla(e_1, e_2), e_2 + s\big) + \nabla\big(\Theta(x, -e_2), e_2 + s\big) \\
&= \nabla\big(\nabla(e_1, e_2), e_2 + s\big) + \nabla\big(\Theta(\nabla(x, t), -e_2), e_2 + s\big) \\
&= \nabla\big(\nabla(e_1, e_2) + \Theta(\nabla(x, t), -e_2),\ e_2 + s\big).
\end{aligned}
\]


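Fixing any $0 \le s \le t$, we then have
\[
\Phi^t_s(e, x) = \Phi_s(e, x) = \begin{bmatrix} \nabla\big(\nabla(e_1, e_2) + \Theta(x, -e_2),\ e_2 + s\big) \\ e_2 + s \end{bmatrix} = \begin{bmatrix} \nabla\big(\nabla(e_1, e_2) + \Theta(\nabla(x, t), -e_2),\ e_2 + s\big) \\ e_2 + s \end{bmatrix} = \Phi_s\big(e, \nabla(x, t)\big) = \Phi^t_s\big(e, \nabla(x, t)\big).
\]
We have now shown that property (a) of Def. 2.1 holds.

To check the second property of Def. 2.1, we first observe that $x = \nabla(x, t) + \Theta(\Delta(x, t), -t)$, which implies that
\[
\Theta(x, -e_2) = \Theta\big(\nabla(x, t), -e_2\big) + \Theta\big(\Delta(x, t), -e_2 - t\big) = \nabla\big(\Theta(x, -e_2), e_2 + t\big) + \Theta\big(\Delta(x, t), -e_2 - t\big).
\]
Using these observations, we may write
\[
\begin{aligned}
\hat e_1 \triangleq \nabla(e_1, e_2) + \Theta(x, -e_2)
&= \nabla\big(\nabla(e_1, e_2), e_2 + t\big) + \nabla\big(\Theta(x, -e_2), e_2 + t\big) + \Theta\big(\Delta(x, t), -e_2 - t\big) \\
&= \nabla\big(\nabla(e_1, e_2) + \Theta(x, -e_2),\ e_2 + t\big) + \Theta\big(\Delta(x, t), -e_2 - t\big) \\
&= \nabla(\hat e_1, e_2 + t) + \Theta\big(\Delta(x, t), -e_2 - t\big).
\end{aligned}
\]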


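To conclude, we write
\[
\begin{aligned}
\Theta_u\big(\Phi(e, x), t\big) = \Phi_{t+u}(e, x) = \Psi_{t+u}(\hat e_1, e_2)
&= \begin{bmatrix} \nabla(\hat e_1, e_2 + t + u) \\ e_2 + t + u \end{bmatrix} \\
&= \begin{bmatrix} \nabla\big(\nabla(\hat e_1, e_2 + t) + \Theta(\Delta(x, t), -e_2 - t),\ e_2 + t + u\big) \\ e_2 + t + u \end{bmatrix} \\
&= \Psi_u\big(\nabla(\hat e_1, e_2 + t) + \Theta(\Delta(x, t), -e_2 - t),\ e_2 + t\big) \\
&= \Phi_u\left(\begin{bmatrix} \nabla(\hat e_1, e_2 + t) \\ e_2 + t \end{bmatrix}, \Delta(x, t)\right) \\
&= \Phi_u\big(\Psi_t(\hat e_1, e_2), \Delta(x, t)\big) \\
&= \Phi_u\big(\Phi_t(e, x), \Delta(x, t)\big).
\end{aligned}
\]
We have now shown that property (b) of Def. 2.1 holds, so $\Phi$ is an updating function.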
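The two properties verified by hand in the examples of this section can also be spot-checked numerically. The following Python sketch is only an illustration under simplifying assumptions: paths are sampled on the integer grid $\{0, 1, \dots, n\}$ and stored as lists with `x[0] == 0`, and the helper names `stop`, `increments`, `phi_running_max`, and `is_updating_function_on` are hypothetical, not part of the dissertation. It implements the running-maximum map of Example 2.5 and tests properties (a) and (b) of Def. 2.1 on a single path.

```python
from itertools import accumulate


def stop(x, t):
    """Discrete analogue of ∇(x, t): the path x frozen after index t."""
    return [x[min(s, t)] for s in range(len(x))]


def increments(x, t):
    """Discrete analogue of ∆(x, t): u -> x[t + u] - x[t]."""
    return [x[t + u] - x[t] for u in range(len(x) - t)]


def phi_running_max(e, x):
    """Example 2.5 on a grid: e = (e1, e2), x a path with x[0] == 0."""
    e1, e2 = e
    psi = accumulate(x, max)  # running maximum ψ_t(x)
    return [(e1 + xt, max(e2, e1 + m)) for xt, m in zip(x, psi)]


def is_updating_function_on(phi, e, x):
    """Check properties (a) and (b) of Def. 2.1 for one pair (e, x)."""
    z = phi(e, x)
    for t in range(len(x)):
        # (a): up to time t, phi only sees the path x stopped at t
        if z[: t + 1] != phi(e, stop(x, t))[: t + 1]:
            return False
        # (b): restarting from Φ_t(e, x) with the increments after t
        if z[t:] != phi(z[t], increments(x, t)):
            return False
    return True


path = [0, 1, -1, 2, 0, 3, 1]
print(is_updating_function_on(phi_running_max, (5, 6), path))
```

Integer-valued paths keep the comparisons exact. A check on one path is of course no substitute for the proofs above; it is only a quick way to catch a mis-specified candidate map before working through the notation.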

2.2 Main Result

Before we present the main result of the dissertation, we pause to give the following simple example.

2.10 Example. Let $(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}_{t \in \mathbb{R}_+}, \tilde{\mathbb P})$ be a stochastic basis that supports a Wiener process $\widetilde W$ and an $\tilde{\mathcal F}_0$-measurable random variable $\tilde U$ that is uniformly distributed over the interval $[0, 1]$. Fix constants $0 < c_1 < c_2$, and set
\[
\tilde\sigma_t \triangleq \sqrt{c_1}\,\mathbf{1}_{\{\tilde U < 1/2\}} + \sqrt{c_2}\,\mathbf{1}_{\{\tilde U \ge 1/2\}},
\]
where we take the nonnegative square root. The process $\tilde\sigma$ is bounded and adapted, so we may define $\tilde Y_t \triangleq \int_0^t \tilde\sigma_s\, d\widetilde W_s$ and $\tilde C_t \triangleq \int_0^t \tilde\sigma_s^2\, ds$. Notice that $\tilde C_t = t c_1 \mathbf{1}_{\{\tilde U < 1/2\}} + t c_2 \mathbf{1}_{\{\tilde U \ge 1/2\}}$ and that $\tilde Y$ is an Itô process with the characteristics $(0, \tilde C)$. Letting $\eta(x, v) \triangleq (2\pi v)^{-1/2} e^{-x^2/(2v)}$ denote the density of the normal distribution with mean 0 and variance $v$, we see that the density of the random variable $\tilde Y_t$ is given by
\[
\tilde{\mathbb P}[\tilde Y_t \in dx] = \eta(x, t c_1)/2 + \eta(x, t c_2)/2.
\]
This example is essentially the simplest possible stochastic volatility model and we will use this example to illustrate most of the results that follow.
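As a sanity check on Example 2.10, the mixture density can be compared with simulated draws of $\tilde Y_t$. The sketch below is illustrative only: the function names, the seed, and the parameter values $t = 1$, $c_1 = 1$, $c_2 = 4$ are assumptions chosen for the demonstration. It uses the fact that, conditionally on the coin flip, $\tilde Y_t$ is normal with variance $t c_i$, so $\tilde{\mathbb P}[|\tilde Y_t| \le a]$ has a closed form in terms of the error function.

```python
import math
import random


def eta(x, v):
    """Density at x of the normal distribution with mean 0 and variance v."""
    return math.exp(-x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def mixture_density(x, t, c1, c2):
    """Density of Y_t in Example 2.10: an equal-weight normal mixture."""
    return 0.5 * eta(x, t * c1) + 0.5 * eta(x, t * c2)


def prob_abs_below(a, t, c1, c2):
    """P[|Y_t| <= a], obtained by integrating the mixture density."""
    return 0.5 * math.erf(a / math.sqrt(2.0 * t * c1)) + 0.5 * math.erf(
        a / math.sqrt(2.0 * t * c2)
    )


def sample_Y(t, c1, c2, rng):
    """Draw Y_t: flip the coin once, then scale a Brownian increment."""
    c = c1 if rng.random() < 0.5 else c2
    return math.sqrt(c * t) * rng.gauss(0.0, 1.0)


rng = random.Random(0)  # fixed seed, for reproducibility of the illustration
t, c1, c2, a, n = 1.0, 1.0, 4.0, 1.0, 20000
hits = sum(abs(sample_Y(t, c1, c2, rng)) <= a for _ in range(n))
print(hits / n, prob_abs_below(a, t, c1, c2))
```

The two printed numbers should agree up to Monte Carlo error, which is of order $n^{-1/2}$.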


We now present the main result of this dissertation.

2.11 Theorem. Let $W$ be an $\mathbb{R}^{r_1}$-valued Wiener process, let $\mu$ be an adapted, $\mathbb{R}^d$-valued process, let $\sigma$ be an adapted, $\mathbb{R}^d \otimes \mathbb{R}^{r_1}$-valued process, assume that
\[
\mathbb{E}\left[\int_0^t \|\mu_s\| + \|\sigma_s \sigma_s^T\|\, ds\right] < \infty \quad \forall t \in \mathbb{R}_+, \tag{2.12}
\]
and set
\[
Y_t \triangleq \int_0^t \mu_s\, ds + \int_0^t \sigma_s\, dW_s. \tag{2.13}
\]
Let $E$ be a Polish space, and let $Z$ be a continuous, $E$-valued process with $Z = \Phi(Z_0, Y)$ for some continuous updating function $\Phi$. Finally, suppose that $N \subset \mathbb{R}_+$ is a Lebesgue-null set and that we have (deterministic) functions $\hat\mu : E \times \mathbb{R}_+ \to \mathbb{R}^d$ and $\hat\sigma : E \times \mathbb{R}_+ \to \mathbb{R}^d \otimes \mathbb{R}^{r_2}$ such that $\hat\mu(Z_t, t) = \mathbb{E}[\mu_t \mid Z_t]$ a.s. and $\hat\sigma\hat\sigma^T(Z_t, t) = \mathbb{E}[\sigma_t \sigma_t^T \mid Z_t]$ a.s. when $t \notin N$.

Then there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ supporting processes $\widehat W$, $\widehat Y$, and $\widehat Z$ such that

(a) $\widehat W$ is an $\mathbb{R}^{r_2}$-valued Wiener process,

(b) $\widehat Y_t = \int_0^t \hat\mu(\widehat Z_s, s)\, ds + \int_0^t \hat\sigma(\widehat Z_s, s)\, d\widehat W_s$,

(c) $\widehat Z$ is a continuous, $E$-valued process with $\widehat Z = \Phi(\widehat Z_0, \widehat Y)$, and

(d) $\widehat Z$ has the same one-dimensional marginal distributions as $Z$.

2.14 Remark. Cor. 4.5 asserts that we may find deterministic functions $\hat\mu : E \times \mathbb{R}_+ \to \mathbb{R}^d$ and $\hat\nu : E \times \mathbb{R}_+ \to S^d_+$ and a Lebesgue-null set $N \subset \mathbb{R}_+$ such that $\hat\mu(Z_t, t) = \mathbb{E}[\mu_t \mid Z_t]$ a.s. and $\hat\nu(Z_t, t) = \mathbb{E}[\sigma_t \sigma_t^T \mid Z_t]$ a.s. when $t \notin N$. If we take $r_2 = d$, then Lem. D.6 asserts that we may take the positive square root of $\hat\nu$ to get a function $\hat\sigma$ taking values in $S^d_+$ and satisfying $\hat\sigma^2 = \hat\nu$. As a result, we can always find functions satisfying the requirements of the previous theorem; however, in applications we can often compute versions of $\hat\mu$ and $\hat\sigma$ explicitly, so we formulate the theorem to take these functions as inputs.
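To make the square-root step of Remark 2.14 concrete, the following Python sketch computes the positive semidefinite square root of a symmetric $2 \times 2$ matrix in closed form. It is a sketch for the special case $d = 2$ only, and it is not the construction of Lem. D.6: the function name `psd_sqrt_2x2` is hypothetical, and the formula relies on the Cayley-Hamilton identity $R = (A + \sqrt{\det A}\, I)/\operatorname{tr} R$ with $(\operatorname{tr} R)^2 = \operatorname{tr} A + 2\sqrt{\det A}$, which holds for the positive semidefinite root $R$ of a $2 \times 2$ positive semidefinite matrix $A$.

```python
import math


def psd_sqrt_2x2(a, b, d):
    """Positive semidefinite square root of the matrix [[a, b], [b, d]].

    Returns the entries (r11, r12, r22) of the symmetric root R with R @ R = A.
    Only valid for 2x2 positive semidefinite input.
    """
    det = a * d - b * b
    if a < 0 or d < 0 or det < -1e-12:
        raise ValueError("matrix is not positive semidefinite")
    s = math.sqrt(max(det, 0.0))      # sqrt(det A) = det R
    tau = math.sqrt(a + d + 2.0 * s)  # trace of R
    if tau == 0.0:                    # A is the zero matrix
        return (0.0, 0.0, 0.0)
    return ((a + s) / tau, b / tau, (d + s) / tau)


r11, r12, r22 = psd_sqrt_2x2(2.0, 1.0, 2.0)
# Squaring the root should recover the original entries 2, 1, 2.
print(r11 * r11 + r12 * r12, r12 * (r11 + r22), r12 * r12 + r22 * r22)
```

For general $d$ one would instead diagonalize $\hat\nu$ and take square roots of the eigenvalues; the closed form above merely avoids a linear-algebra dependency in this illustration.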


2.15 Remark. In this formulation, we always have $Y_0 = 0$, so $Z_0$ is the only "initial condition." In the corollaries that follow, we will see that this does not restrict generality.

To appreciate this result, it is probably helpful to first consider two essentially extremal corollaries. The first corollary reads as follows.

2.16 Corollary. Let $W$ be an $\mathbb{R}^{r_1}$-valued Wiener process, and let $X$ be an $\mathbb{R}^d$-valued Itô process with stochastic differential $dX_t = \mu_t\, dt + \sigma_t\, dW_t$, where $\mu_t \in \mathbb{R}^d$ and $\sigma_t \in \mathbb{R}^d \otimes \mathbb{R}^{r_1}$ are adapted processes satisfying (2.12). Let $N \subset \mathbb{R}_+$ be a Lebesgue-null set, and let $\hat\mu : \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d$ and $\hat\sigma : \mathbb{R}^d \times \mathbb{R}_+ \to \mathbb{R}^d \otimes \mathbb{R}^{r_2}$ be (deterministic) functions such that $\hat\mu(X_t, t) = \mathbb{E}[\mu_t \mid X_t]$ a.s. and $\hat\sigma\hat\sigma^T(X_t, t) = \mathbb{E}[\sigma_t \sigma_t^T \mid X_t]$ a.s. when $t \notin N$. Then there exists a weak solution to the SDE
\[
d\widehat X_t = \hat\mu(\widehat X_t, t)\, dt + \hat\sigma(\widehat X_t, t)\, d\widehat W_t, \tag{2.17}
\]
where $\widehat W$ is an $\mathbb{R}^{r_2}$-valued Wiener process, and $\widehat X$ has the same one-dimensional marginal distributions as $X$.

Proof. Set $E \triangleq \mathbb{R}^d$, $\Phi(e, x) \triangleq e\,\mathbf{1}_{[0,\infty)} + x$, $Y \triangleq \Delta(X, 0)$, and $Z \triangleq X$, so $Z = \Phi(Z_0, Y)$. It is clear that $\Phi$ is continuous, and we have already shown that $\Phi$ is an updating function in Example 2.4, so we may apply Thm. 2.11 to conclude that there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ that supports processes $\widehat Y$, $\widehat Z$, and $\widehat W$ such that $\widehat W$ is an $\mathbb{R}^{r_2}$-valued Wiener process, $\widehat Y$ satisfies (b) of Thm. 2.11, $\widehat Z$ is a continuous, $\mathbb{R}^d$-valued process with $\widehat Z = \Phi(\widehat Z_0, \widehat Y)$, and $\widehat Z$ has the same one-dimensional marginal distributions as $Z$. We set $\widehat X \triangleq \widehat Z$, so $\widehat X_t = \Phi_t(\widehat Z_0, \widehat Y) = \widehat Z_0 + \widehat Y_t$. As $\widehat Y$ satisfies (b) of Thm. 2.11, we conclude that $\widehat X$ solves (2.17), and we are done.

As noted in the introduction, Krylov [Kry84] and Gyöngy [Gyö86] have proved this result under the additional hypotheses that $\mu$ and $\sigma$ are both bounded and $\sigma\sigma^T$ is uniformly positive definite.

Let $\mathcal M$ denote the class of financial models in which the interest rate is deterministic and the price processes are modeled as Itô processes whose coefficients satisfy the integrability requirement (2.12). Let $\mathcal D \subset \mathcal M$ denote
the class of financial models in which the price processes solve an SDE of the form (2.17). Given the equivalence between the one-dimensional marginal distributions of a price process under a pricing measure and the prices of European options, Cor. 2.16 admits the following financial interpretation: If there exists any model in $\mathcal M$ which is consistent with a given set of market prices for European options, then there also exists a model in $\mathcal D$ which is consistent with that set of market prices.

2.18 Example. Let $\tilde Y$ and $\eta$ be defined as in Example 2.10, and take $X = \tilde Y$ in Cor. 2.16. In this case, we can compute $\hat\mu$ and $\hat\sigma$ explicitly. For $t > 0$, we have $\hat\mu(x, t) = 0$ and
\[
\begin{aligned}
\hat\sigma^2(x, t) = \tilde{\mathbb E}\big[\tilde\sigma_t^2 \mid \tilde Y_t = x\big]
&= \frac{c_1\, \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_1,\ \tilde Y_t \in dx\big] + c_2\, \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_2,\ \tilde Y_t \in dx\big]}{\tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_1,\ \tilde Y_t \in dx\big] + \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_2,\ \tilde Y_t \in dx\big]} \\
&= \frac{c_1\, \eta(x, t c_1) + c_2\, \eta(x, t c_2)}{\eta(x, t c_1) + \eta(x, t c_2)}.
\end{aligned}
\]
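The closed form just derived is easy to evaluate numerically. The sketch below is an illustration, not part of the original text: the function name `sigma_hat_sq` and the values $c_1 = 1$, $c_2 = 4$ are assumptions. It rewrites the quotient in terms of the ratio $r = \eta(x, t c_1)/\eta(x, t c_2)$, which stays bounded, so the evaluation does not break down when both densities underflow for small $t$.

```python
import math


def sigma_hat_sq(x, t, c1, c2):
    """Conditional variance E[sigma_t^2 | Y_t = x] for 0 < c1 < c2 and t > 0.

    Uses r = eta(x, t*c1) / eta(x, t*c2)
           = sqrt(c2 / c1) * exp(-x^2 / (2t) * (1/c1 - 1/c2)),
    so that the result equals (c1 * r + c2) / (r + 1).
    """
    r = math.sqrt(c2 / c1) * math.exp(-x * x / (2.0 * t) * (1.0 / c1 - 1.0 / c2))
    return (c1 * r + c2) / (r + 1.0)


for t in (1.0, 1e-2, 1e-6):
    print(t, sigma_hat_sq(0.0, t, 1.0, 4.0), sigma_hat_sq(1.0, t, 1.0, 4.0))
```

With these values the function returns $\sqrt{c_1 c_2} = 2$ at $x = 0$ for every $t > 0$, while at $x = 1$ it approaches $c_2 = 4$ as $t \downarrow 0$. The conditional variance always lies between $c_1$ and $c_2$, but its limit as $t \downarrow 0$ depends on whether $x = 0$.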

So, taking $\hat\mu = 0$ and
\[
\hat\sigma(x, t) = \sqrt{\frac{c_1\, \eta(x, t c_1) + c_2\, \eta(x, t c_2)}{\eta(x, t c_1) + \eta(x, t c_2)}}, \tag{2.19}
\]
Cor. 2.16 asserts the existence of a solution to the SDE (2.17) with the same one-dimensional marginal distributions as $\tilde Y$.

If we were to start with a mixture of geometric Brownian motions with differing volatilities rather than (arithmetic) Brownian motions, then we would recover the results of Brigo and Mercurio [BM01] [BM02], who show how to construct models in which European option prices are given as mixtures of Black-Scholes prices. In fact, our results are slightly stronger, as Brigo and Mercurio require the existence of a strong solution to (2.17). In Example 2.18, we already see a situation where the solution to (2.17) may not be strong. In particular, looking at (2.19), we see that we cannot define $\hat\sigma$ in such a way that it is continuous at $t = 0$. Brigo and Mercurio avoid this problem by choosing volatility scenarios that are deterministic functions of time and requiring that all volatility scenarios agree on some arbitrarily small initial
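time interval.

The first corollary that we presented corresponds to the diffusion case where the only information that we choose to track about the process $X$ is the current location. At the other extreme, we might choose to remember the entire history of $X$.

2.20 Corollary. Let $W$ be an $\mathbb{R}^{r_1}$-valued Wiener process, and let $X$ be an $\mathbb{R}^d$-valued Itô process with stochastic differential $dX_t = \mu_t\, dt + \sigma_t\, dW_t$, where $\mu_t \in \mathbb{R}^d$ and $\sigma_t \in \mathbb{R}^d \otimes \mathbb{R}^{r_1}$ are adapted processes satisfying (2.12). Let $N \subset \mathbb{R}_+$ be a Lebesgue-null set, and let $\hat\mu : C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+ \to \mathbb{R}^d$ and $\hat\sigma : C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+ \to \mathbb{R}^d \otimes \mathbb{R}^{r_2}$ be functions such that $\hat\mu(X^t, t) = \mathbb{E}[\mu_t \mid X^t]$ a.s. and $\hat\sigma\hat\sigma^T(X^t, t) = \mathbb{E}[\sigma_t \sigma_t^T \mid X^t]$ a.s. when $t \notin N$. Then there exists a weak solution to the SDE
\[
d\widehat X_t = \hat\mu(\widehat X^t, t)\, dt + \hat\sigma(\widehat X^t, t)\, d\widehat W_t, \tag{2.21}
\]
with the same law as $X$, where $\widehat W$ is some $\mathbb{R}^{r_2}$-valued Wiener process.

Proof. Take $E = C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+$ and let $e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$ denote a typical point in $E$. Set $Y \triangleq \Delta(X, 0)$, set $Z_t \triangleq (X^t, t)$, and let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}^d) \to C(\mathbb{R}_+; E)$ denote the map such that
\[
\Phi_t(e, x) \triangleq \begin{bmatrix} \nabla\big(\nabla(e_1, e_2) + \Theta(x, -e_2),\ e_2 + t\big) \\ e_2 + t \end{bmatrix}.
\]
$Z$ is a continuous, $E$-valued process with $Z = \Phi(Z_0, Y)$. We showed that $\Phi$ is an updating function in Example 2.9. $\nabla$ and $\Theta$ are continuous functions, and the addition of paths in $C(\mathbb{R}_+; \mathbb{R}^d)$ is a continuous operation, so $\Phi$ is a continuous function, and we may apply Thm. 2.11 to conclude that there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ that supports processes $\widehat Y$, $\widehat Z$, and $\widehat W$ such that $\widehat W$ is an $\mathbb{R}^{r_2}$-valued Wiener process, $\widehat Y$ satisfies (b) of Thm. 2.11, $\widehat Z$ is a continuous, $E$-valued process with $\widehat Z = \Phi(\widehat Z_0, \widehat Y)$, and $\widehat Z$ has the same one-dimensional marginal distributions as $Z$. There is a slight abuse of notation here as $\hat\mu$ has domain $C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+$, but Thm. 2.11 expects $\hat\mu$ to have domain $E \times \mathbb{R}_+ = C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+ \times \mathbb{R}_+$. This happens because the process $Z$ already includes the time, and we implicitly identify $\hat\mu(x, t, t)$ with $\hat\mu(x, t)$ and $\hat\sigma(x, t, t)$ with $\hat\sigma(x, t)$.

Define $\widehat X \triangleq (\widehat Z^1_0)_0 + \widehat Y$, where $\widehat Z^i$ denotes the $i$th component of $\widehat Z$. This looks a little awkward, but notice that $\widehat Z^1_0 \in C(\mathbb{R}_+; \mathbb{R}^d)$, so $(\widehat Z^1_0)_0 \in \mathbb{R}^d$. Given the definition of $\widehat X$ and the fact that $\widehat Y$ satisfies (b) of Thm. 2.11, it then follows that $\widehat X$ solves (2.21). As $\mathcal L(\widehat Z_0) = \mathcal L(Z_0) = \mathcal L(X^0, 0)$, $\widehat Z^2_0$ is $\hat{\mathbb P}$-a.s. equal to 0. This means that the $E$-valued processes
\[
\widehat Z_t = \Phi_t\left(\begin{bmatrix} \widehat Z^1_0 \\ \widehat Z^2_0 \end{bmatrix}, \widehat Y\right) \quad \text{and} \quad (\widehat X^t, t) = \Phi_t\left(\begin{bmatrix} \widehat Z^1_0 \\ 0 \end{bmatrix}, \widehat Y\right)
\]
are $\hat{\mathbb P}$-indistinguishable. In particular, $\mathcal L(\widehat X^t) = \mathcal L(\widehat Z^1_t) = \mathcal L(Z^1_t) = \mathcal L(X^t)$ for each $t$, so $\mathcal L(\widehat X) = \mathcal L(X)$.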

Liptser and Shiryaev refer to a process that solves an SDE of the form (2.21) as a process of diffusion-type, and they give Cor. 2.20 under the additional assumptions that $d = r_1 = 1$ and $\sigma = 1$ (see [LS01] Thm. 7.12), although it is not clear that these assumptions are necessary for their approach to work. Liptser and Shiryaev provide an explicit formula for the Radon-Nikodym derivative of the law of a process of diffusion-type with respect to the law of a Wiener process. To apply this result to a general Itô process like $X$, they must show that $X$ solves some SDE of the form (2.21). Their approach is to filter the drift from the path of the process $X$. They subtract the filtered drift from $X$, and they show that what remains is a Wiener process, although it may differ from the Wiener process that was initially used to define $X$.

2.22 Example. Assume that we are in the setting of Example 2.10. For each fixed $n$, define the sequence of functions $\xi^n_m : C(\mathbb{R}_+; \mathbb{R}) \to \mathbb{R}_+$ by
\[
\xi^n_m(x) \triangleq \sum_{i=1}^{m} \left( x\Big(\frac{i}{nm}\Big) - x\Big(\frac{i-1}{nm}\Big) \right)^2.
\]
For each fixed $n$, $n\,\xi^n_m(\tilde Y)$ converges to $\tilde\sigma_0^2$ in probability as $m \to \infty$. Moving to a subsequence $\{a(n, m)\}_m$ that converges a.s., we define
\[
\xi^n \triangleq \liminf_{m \to \infty}\ n\, \xi^n_{a(n,m)},
\]
so $\xi^n(\tilde Y) = \xi^n(\tilde Y^{1/n}) = \tilde\sigma_0^2$, $\tilde{\mathbb P}$-a.s., where $\tilde Y^{1/n}$ denotes the process stopped at time $1/n$ (not the $n$th root). Define $\hat\sigma : C(\mathbb{R}_+; \mathbb{R}) \times \mathbb{R}_+ \to \mathbb{R}_+$ by
\[
\hat\sigma(x, t) \triangleq \sum_{n=1}^{\infty} \sqrt{\xi^n(x)}\ \mathbf{1}_{[1/n,\ 1/(n-1))}(t),
\]
where we take the positive root, and we take $1/0$ to be $\infty$. Then $\hat\sigma(\tilde Y, t) = \hat\sigma(\tilde Y^t, t) = \tilde\sigma_0$, $\tilde{\mathbb P}$-a.s., for $t > 0$. In this simple case, it is clear without even applying Cor. 2.20 that $\tilde Y$ solves $d\tilde Y_t = \hat\sigma(\tilde Y^t, t)\, d\widetilde W_t$.

It seems that the results that fall between Cor. 2.16 and Cor. 2.20 are new. For example, we have the following corollary.

2.23 Corollary. Let $W$ be a real-valued Wiener process, let $X$ have stochastic differential $dX_t = \mu_t\, dt + \sigma_t\, dW_t$, where $\mu$ and $\sigma$ are real-valued, adapted processes that satisfy (2.12), and set $M_t \triangleq \max\{X_s : s \in [0, t]\}$. Let $N \subset \mathbb{R}_+$ be a Lebesgue-null set, and let $\hat\mu : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$ and $\hat\sigma : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$ be functions with $\hat\mu(X_t, M_t, t) = \mathbb{E}[\mu_t \mid X_t, M_t]$ a.s. and $\hat\sigma^2(X_t, M_t, t) = \mathbb{E}[\sigma_t^2 \mid X_t, M_t]$ a.s. when $t \notin N$. Then there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ that supports processes $\widehat W$, $\widehat X$, and $\widehat M$ such that $\widehat W$ is a Wiener process, $\widehat X$ solves the SDE
\[
d\widehat X_t = \hat\mu(\widehat X_t, \widehat M_t, t)\, dt + \hat\sigma(\widehat X_t, \widehat M_t, t)\, d\widehat W_t, \tag{2.24}
\]

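$\widehat M_t = \max\{\widehat X_s : s \in [0, t]\}$, and $\mathcal L(\widehat X_t, \widehat M_t) = \mathcal L(X_t, M_t)$ for all $t \in \mathbb{R}_+$.

Proof. Take $E \triangleq \mathbb{R}^2$ and let $e = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$ denote a typical point in $E$. Set $Y \triangleq \Delta(X, 0)$, set $Z \triangleq (X, M)$, and let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}) \to C(\mathbb{R}_+; \mathbb{R}^2)$ denote the map such that
\[
\Phi_t(e, x) = \begin{bmatrix} e_1 + x(t) \\ \max\{e_2,\ e_1 + x(s) : s \in [0, t]\} \end{bmatrix},
\]
so $Z = \Phi(Z_0, Y)$. It is clear that $\Phi$ is a continuous map, and we have shown that $\Phi$ is an updating function in Example 2.5, so we may apply Thm. 2.11 to conclude that there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ that supports processes $\widehat W$, $\widehat Y$, and $\widehat Z$ such that $\widehat W$ is a Wiener process, $\widehat Y$ satisfies (b) of Thm. 2.11, $\widehat Z = \Phi(\widehat Z_0, \widehat Y)$, and $\mathcal L(\widehat Z_t) = \mathcal L(Z_t)$ for all $t \in \mathbb{R}_+$. Set $(\widehat X, \widehat N) \triangleq \widehat Z$, and set $\widehat M_t = \max\{\widehat X_s : s \in [0, t]\}$. Then $\widehat X_t = \widehat Z^1_0 + \widehat Y_t$, where $\widehat Z^i_t$ denotes the $i$th component of $\widehat Z_t$. As $\widehat Y$ satisfies (b) of Thm. 2.11, $\widehat X$ solves
\[
d\widehat X_t = \hat\mu(\widehat X_t, \widehat N_t, t)\, dt + \hat\sigma(\widehat X_t, \widehat N_t, t)\, d\widehat W_t. \tag{2.25}
\]
We also see that
\[
\widehat N_t = \max\{\widehat Z^2_0,\ \widehat Z^1_0 + \widehat Y_s : s \in [0, t]\} = \max\{\widehat Z^2_0,\ \widehat X_s : s \in [0, t]\}.
\]
Now $Z^2_0 = M_0 = X_0 = Z^1_0$ and $\mathcal L(\widehat Z_0) = \mathcal L(Z_0)$, so $\widehat Z^2_0 = \widehat Z^1_0$, $\hat{\mathbb P}$-a.s. In particular, $\hat{\mathbb P}[\widehat N_t = \widehat M_t\ \forall t] = 1$. As $\widehat X$ solves (2.25), it also solves (2.24). Finally, we notice that
\[
\mathcal L(\widehat X_t, \widehat M_t) = \mathcal L(\widehat X_t, \widehat N_t) = \mathcal L(\widehat Z_t) = \mathcal L(Z_t) = \mathcal L(X_t, M_t) \quad \forall t \in \mathbb{R}_+,
\]
so we are done.

2.26 Example. Assume that we are in the setting of Example 2.10, set $\tilde N_t \triangleq \max\{\widetilde W_s : s \in [0, t]\}$, and define
\[
p(x, m; t) \triangleq \frac{2(2m - x)}{\sqrt{2\pi t^3}} \exp\left(-\frac{(2m - x)^2}{2t}\right) \mathbf{1}_{\{x \le m,\ m \ge 0\}}.
\]
According to [KS91] Prop. 2.8.1, the $\mathbb{R}^2$-valued random variable $(\widetilde W_t, \tilde N_t)$ admits the density $\tilde{\mathbb P}[\widetilde W_t \in dx,\ \tilde N_t \in dm] = p(x, m; t)$. Setting $\widetilde M_t = \max\{\tilde Y_s : s \in [0, t]\}$ and taking $X = \tilde Y$ and $M = \widetilde M$ in Cor. 2.23, we may use this density to compute $\hat\mu$ and $\hat\sigma$ explicitly. We set $\tilde A \triangleq (\tilde Y, \widetilde M)$ and $\tilde B \triangleq (\widetilde W, \tilde N)$, and we will write a typical point in $\mathbb{R}^2$ as $a = (x, m)$. From the scaling properties of Brownian motion (e.g., [RY99] Prop. 1.1.10 (iii)), it follows that $\mathcal L(\sqrt{c_i}\, \tilde B_t) = \mathcal L(\tilde B_{t c_i})$. For $t > 0$, we have
\[
\begin{aligned}
\hat\sigma^2(x, m, t) = \tilde{\mathbb E}\big[\tilde\sigma_t^2 \mid \tilde Y_t = x,\ \widetilde M_t = m\big]
&= \frac{c_1\, \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_1,\ \tilde A_t \in da\big] + c_2\, \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_2,\ \tilde A_t \in da\big]}{\tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_1,\ \tilde A_t \in da\big] + \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_2,\ \tilde A_t \in da\big]} \\
&= \frac{c_1\, \tilde{\mathbb P}\big[\sqrt{c_1}\, \tilde B_t \in da\big] + c_2\, \tilde{\mathbb P}\big[\sqrt{c_2}\, \tilde B_t \in da\big]}{\tilde{\mathbb P}\big[\sqrt{c_1}\, \tilde B_t \in da\big] + \tilde{\mathbb P}\big[\sqrt{c_2}\, \tilde B_t \in da\big]} \\
&= \frac{c_1\, \tilde{\mathbb P}\big[\tilde B_{t c_1} \in da\big] + c_2\, \tilde{\mathbb P}\big[\tilde B_{t c_2} \in da\big]}{\tilde{\mathbb P}\big[\tilde B_{t c_1} \in da\big] + \tilde{\mathbb P}\big[\tilde B_{t c_2} \in da\big]} \\
&= \frac{c_1\, p(x, m; t c_1) + c_2\, p(x, m; t c_2)}{p(x, m; t c_1) + p(x, m; t c_2)}.
\end{aligned}
\]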
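The expression above can be evaluated directly from the joint density $p$. The following Python sketch is illustrative, with hypothetical function names: it codes $p(x, m; t)$ as quoted from [KS91] and computes $\hat\sigma^2(x, m, t)$ through the ratio $p(x, m; t c_1)/p(x, m; t c_2)$, in which the common factor $2(2m - x)$ cancels.

```python
import math


def p(x, m, t):
    """Joint density of (W_t, max_{s <= t} W_s) at (x, m), as quoted above."""
    if m < 0.0 or x > m:
        return 0.0
    a = 2.0 * m - x
    return 2.0 * a / math.sqrt(2.0 * math.pi * t**3) * math.exp(-a * a / (2.0 * t))


def sigma_hat_sq(x, m, t, c1, c2):
    """E[sigma_t^2 | Y_t = x, M_t = m] for x <= m, m >= 0, 0 < c1 < c2, t > 0."""
    if m < 0.0 or x > m:
        raise ValueError("(x, m) lies outside the support {x <= m, m >= 0}")
    a = 2.0 * m - x
    # r = p(x, m; t*c1) / p(x, m; t*c2); the prefactor 2a cancels in the ratio
    r = (c2 / c1) ** 1.5 * math.exp(-a * a / (2.0 * t) * (1.0 / c1 - 1.0 / c2))
    return (c1 * r + c2) / (r + 1.0)


print(sigma_hat_sq(0.5, 1.0, 1.0, 1.0, 4.0))
```

Working with the ratio also assigns a value at the single point $x = m = 0$, where $p$ vanishes and the quotient in the text is $0/0$; since that point has probability zero, the choice there is immaterial.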


So taking $\hat\mu(x, m, t) = 0$ and
\[
\hat\sigma(x, m, t) = \sqrt{\frac{c_1\, p(x, m; t c_1) + c_2\, p(x, m; t c_2)}{p(x, m; t c_1) + p(x, m; t c_2)}},
\]
Cor. 2.23 asserts the existence of a solution to (2.24) such that the one-dimensional marginal distributions of the process $(\widehat X, \widehat M)$ agree with the one-dimensional marginal distributions of $(\tilde Y, \widetilde M)$, where $\widehat M_t = \max\{\widehat X_s : s \in [0, t]\}$.

2.27 Corollary. Let $W$ be a Wiener process, fix some time $T \in \mathbb{R}_+$, and let $X$ have stochastic differential $dX_t = \mu_t\, dt + \sigma_t\, dW_t$, where $\mu$ and $\sigma$ are adapted processes that satisfy (2.12). Further assume that $N \subset \mathbb{R}_+$ is a Lebesgue-null set and that $\hat\mu : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$ and $\hat\sigma : \mathbb{R}^2 \times \mathbb{R}_+ \to \mathbb{R}$ are functions such that $\hat\mu(X_t, X^T_t; t) = \mathbb{E}[\mu_t \mid X_t, X^T_t]$ a.s. and $\hat\sigma^2(X_t, X^T_t; t) = \mathbb{E}[\sigma_t^2 \mid X_t, X^T_t]$ a.s. when $t \notin N$. Then there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ that supports processes $\widehat W$ and $\widehat X$ such that $\widehat W$ is a Wiener process, $\widehat X$ solves the SDE
\[
d\widehat X_t = \hat\mu(\widehat X_t, \widehat X^T_t; t)\, dt + \hat\sigma(\widehat X_t, \widehat X^T_t; t)\, d\widehat W_t, \tag{2.28}
\]

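and $\mathcal L(\widehat X_t, \widehat X^T_t) = \mathcal L(X_t, X^T_t)$ for all $t \in \mathbb{R}_+$.

Proof. Take $E = \mathbb{R}^2 \times \mathbb{R}_+$, and write a typical point in $E$ as $e = \begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix}$. Define $Y \triangleq \Delta(X, 0)$, $Z_t \triangleq \begin{bmatrix} X_t \\ X^T_t \\ t \end{bmatrix}$, and $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}) \to C(\mathbb{R}_+; E)$ by
\[
\Phi_t(e, x) \triangleq \begin{bmatrix} e_1 + x(t) \\ e_2 + \nabla_t\big(x, (T - e_3)^+\big) \\ e_3 + t \end{bmatrix}.
\]
It is clear that $\Phi$ is a continuous map, and we have shown that $\Phi$ is an updating function in Example 2.6, so we may apply Thm. 2.11 to conclude that there exists a stochastic basis $(\hat\Omega, \hat{\mathcal F}, \hat{\mathbb F}, \hat{\mathbb P})$ that supports processes $\widehat W$, $\widehat Y$, and $\widehat Z$ such that $\widehat W$ is a Wiener process, $\widehat Y$ satisfies (b) of Thm. 2.11, $\widehat Z = \Phi(\widehat Z_0, \widehat Y)$, and $\mathcal L(Z_t) = \mathcal L(\widehat Z_t)$ for all $t \in \mathbb{R}_+$. As in Cor. 2.20, there is a slight abuse of notation here as $\hat\mu$ has domain $\mathbb{R}^2 \times \mathbb{R}_+$, but Thm. 2.11 expects $\hat\mu$ to have domain $E \times \mathbb{R}_+ = \mathbb{R}^2 \times \mathbb{R}_+ \times \mathbb{R}_+$. We are implicitly identifying $\hat\mu(e, t, t)$ with $\hat\mu(e, t)$ and $\hat\sigma(e, t, t)$ with $\hat\sigma(e, t)$.

Set $\widehat X \triangleq \widehat Z^1$, where $\widehat Z^i$ denotes the $i$th component of $\widehat Z$, so $\widehat X_t = \widehat Z^1_0 + \widehat Y_t$. As $\widehat Y$ satisfies (b) of Thm. 2.11, $\widehat X$ solves
\[
d\widehat X_t = \hat\mu(\widehat X_t, \widehat Z^2_t; t)\, dt + \hat\sigma(\widehat X_t, \widehat Z^2_t; t)\, d\widehat W_t. \tag{2.29}
\]
$Z^2_0 = X_0 = Z^1_0$ and $\mathcal L(\widehat Z_0) = \mathcal L(Z_0)$, so $\widehat Z^2_0 = \widehat Z^1_0$, $\hat{\mathbb P}$-a.s. Notice that $\Phi^1_{t \wedge T}(e, x) = \Phi^2_t(e, x)$ for all $t \in \mathbb{R}_+$ when $e_1 = e_2$ and $e_3 = 0$. As $\widehat Z = \Phi(\widehat Z_0, \widehat Y)$, we have
\[
\hat{\mathbb P}\big[\widehat Z^2_t = \widehat X^T_t\ \ \forall t\big] \ge \hat{\mathbb P}\big[\widehat Z^2_0 = \widehat Z^1_0\big] = 1. \tag{2.30}
\]
It then follows from (2.29) and (2.30) that $\widehat X$ solves (2.28). It also follows from (2.30) that we have
\[
\mathcal L(\widehat X_t, \widehat X^T_t) = \mathcal L(\widehat Z^1_t, \widehat Z^2_t) = \mathcal L(Z^1_t, Z^2_t) = \mathcal L(X_t, X^T_t) \quad \forall t \in \mathbb{R}_+,
\]
so we are done.

2.31 Example. Assume that we are in the setting of Example 2.10. Taking $X = \tilde Y$ in Cor. 2.27, we may compute $\hat\mu$ and $\hat\sigma$ explicitly. It is clear that $\hat\mu = 0$. When $t \le T$, $\hat\sigma(e, t)$ is a.s. only evaluated at the points with $e_1 = e_2$, and we may use the formula given in (2.19). We now assume that $t > T$. Write a typical point in $\mathbb{R}^2$ as $x = (x_1, x_2)$. Set $\tilde A_t \triangleq (\tilde Y_t, \tilde Y^T_t)$ and define
\[
\eta'(x_1, x_2; v, t) \triangleq \eta(x_2, T v)\ \eta\big(x_1 - x_2, (t - T) v\big), \quad \text{for } t > T,
\]
so $\eta'(x_1, x_2; c, t)$ is the density of the $\mathbb{R}^2$-valued random variable $(\sqrt{c}\, \widetilde W_t, \sqrt{c}\, \widetilde W_T)$. Recall that $\eta(x, v)$ was defined as the density of the normal distribution with mean 0 and variance $v$, and $\widetilde W$ was defined as a Wiener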
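process in Example 2.10. We then have
\[
\begin{aligned}
\hat\sigma^2(x, t) = \tilde{\mathbb E}\big[\tilde\sigma_t^2 \mid \tilde Y_t = x_1,\ \tilde Y^T_t = x_2\big]
&= \frac{c_1\, \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_1,\ \tilde A_t \in dx\big] + c_2\, \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_2,\ \tilde A_t \in dx\big]}{\tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_1,\ \tilde A_t \in dx\big] + \tilde{\mathbb P}\big[\tilde\sigma_t^2 = c_2,\ \tilde A_t \in dx\big]} \\
&= \frac{c_1\, \eta'(x_1, x_2; c_1, t) + c_2\, \eta'(x_1, x_2; c_2, t)}{\eta'(x_1, x_2; c_1, t) + \eta'(x_1, x_2; c_2, t)}.
\end{aligned}
\]
So taking $\hat\mu(x_1, x_2, t) = 0$ and
\[
\hat\sigma(x_1, x_2, t) =
\begin{cases}
\sqrt{\dfrac{c_1\, \eta(x_1, t c_1) + c_2\, \eta(x_1, t c_2)}{\eta(x_1, t c_1) + \eta(x_1, t c_2)}} & \text{if } t \le T, \text{ and} \\[3ex]
\sqrt{\dfrac{c_1\, \eta'(x_1, x_2; c_1, t) + c_2\, \eta'(x_1, x_2; c_2, t)}{\eta'(x_1, x_2; c_1, t) + \eta'(x_1, x_2; c_2, t)}} & \text{if } t > T,
\end{cases}
\]
Cor. 2.27 asserts the existence of a solution to (2.28) such that the one-dimensional marginal distributions of the process $(\widehat X, \widehat X^T)$ agree with the one-dimensional marginal distributions of the process $(\tilde Y, \tilde Y^T)$.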
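The piecewise formula of Example 2.31 can be coded in a few lines. The sketch below is illustrative and rests on stated assumptions: the function name is hypothetical, $T > 0$ is required for the second branch, and the quotient is evaluated through the ratio of the two densities, which for $t > T$ equals $(c_2/c_1)\exp\big(-\tfrac12 q\,(1/c_1 - 1/c_2)\big)$ with $q = x_2^2/T + (x_1 - x_2)^2/(t - T)$.

```python
import math


def sigma_hat_sq(x1, x2, t, T, c1, c2):
    """E[sigma_t^2 | Y_t = x1, Y_{min(t, T)} = x2] for 0 < c1 < c2, t > 0, T > 0."""
    inv_gap = 1.0 / c1 - 1.0 / c2
    if t <= T:
        # one-dimensional case: ratio eta(x1, t*c1) / eta(x1, t*c2)
        r = math.sqrt(c2 / c1) * math.exp(-x1 * x1 / (2.0 * t) * inv_gap)
    else:
        # two-dimensional case: ratio eta'(x1, x2; c1, t) / eta'(x1, x2; c2, t)
        q = x2 * x2 / T + (x1 - x2) ** 2 / (t - T)
        r = (c2 / c1) * math.exp(-0.5 * q * inv_gap)
    return (c1 * r + c2) / (r + 1.0)


print(sigma_hat_sq(0.0, 0.0, 0.5, 1.0, 1.0, 4.0))  # branch t <= T
print(sigma_hat_sq(0.0, 0.0, 2.0, 1.0, 1.0, 4.0))  # branch t > T
```

At the origin the first branch gives $\sqrt{c_1 c_2}$ and the second gives $2 c_1 c_2/(c_1 + c_2)$, so with $c_1 = 1$ and $c_2 = 4$ the two printed values are 2 and 1.6 up to rounding. This suggests that $\hat\sigma$ need not be continuous across $t = T$.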

2.3 Applications to Mixture Models

To conclude this chapter, we will show how our results may be used to give an answer to a question raised by Piterbarg in the working paper [Pit03a]. Let $S$ denote the price process for some primary security. We will assume throughout this section that $S_0 = s_0$ where $s_0$ is a constant. We will also abuse notation and use $S$ to denote the price process in different models which may be defined on different spaces. Recall that an up-and-out call option with maturity $T$, strike $K$, and barrier $L$ is an option that pays the amount $(S_T - K)^+$ at time $T$ if $S$ remains below the barrier $L$ until time $T$. If $S$ exceeds the barrier $L$, then the option knocks out and loses all value.

Fix two constant volatility levels $0 \le \sigma^1 < \sigma^2$ and two probabilities $p_1 + p_2 = 1$. For $i \in \{1, 2\}$, let $B^i(T, K, L)$ denote the price at time 0 of an up-and-out call option with maturity $T$, strike $K$, and barrier $L$ in a Black-Scholes model with constant interest rate $r$ and volatility $\sigma^i$. No-arbitrage arguments imply
that the price for such an option is given as
\[
B^i(T, K, L) = \mathbb{E}^i\big[e^{-rT}\, \mathbf{1}_{\{M_T \le L\}} (S_T - K)^+\big],
\]
where $M_t \triangleq \max\{S_u : u \in [0, t]\}$, $\mathbb{P}^i[S_0 = s_0] = 1$, and $S$ satisfies the SDE $dS_t = \sigma^i S_t\, dW_t$ under $\mathbb{P}^i$ for some Wiener process $W$.

We will let $\tilde{\mathbb P}$ denote a probability measure under which the volatility is a random variable that takes the value $\sigma^i$ with probability $p_i$ at the initial time, just as in Example 2.10. As
\[
\tilde{\mathbb E}\big[e^{-rT}\, \mathbf{1}_{\{M_T \le L\}} (S_T - K)^+\big] = p_1 B^1(T, K, L) + p_2 B^2(T, K, L), \tag{2.32}
\]
we conclude that
\[
\tilde B(T, K, L) \triangleq p_1 B^1(T, K, L) + p_2 B^2(T, K, L) \tag{2.33}
\]
gives an arbitrage-free pricing rule for up-and-out call options in the model $\tilde{\mathbb P}$, where $\tilde B(T, K, L)$ is the price for the option with maturity $T$, strike $K$, and barrier $L$. Note that we are not making any effort to justify the pricing formula (2.32); instead, we are simply observing that the existence of a martingale measure is sufficient to ensure the absence of arbitrage.

Piterbarg conjectures that this "coin-flip" model is essentially the only model in which the pricing rule (2.33) is arbitrage-free. In particular, Piterbarg writes:

Does there exist a "real" and "reasonable" dynamic model, in which uncertainty is revealed over time, and not in an instant explosion of information as in [the coin-flip model], such that all European options and all barriers are priced using (2.33)? The answer is most likely no, but we do not have a formal proof.

To produce another model in which the pricing rule (2.33) is arbitrage-free, we apply Cor. 2.23 to the process $(S, M)$ under the measure $\tilde{\mathbb P}$ to produce a measure $\hat{\mathbb P}$ and processes $\widehat S$ and $\widehat M$ such that $\mathcal L(\widehat S_t, \widehat M_t \mid \hat{\mathbb P}) = \mathcal L(S_t, M_t \mid \tilde{\mathbb P})$ for all $t$. This is the geometric version of Example 2.26. It then follows that
\[
\hat{\mathbb E}\big[e^{-rT}\, \mathbf{1}_{\{\widehat M_T \le L\}} (\widehat S_T - K)^+\big] = \tilde{\mathbb E}\big[e^{-rT}\, \mathbf{1}_{\{M_T \le L\}} (S_T - K)^+\big] = \tilde B(T, K, L),
\]
so the pricing rule (2.33) is also arbitrage-free for the model $\hat{\mathbb P}$. We should
also observe that the prices computed by discounting cash flows under the pricing measure $\hat{\mathbb P}$ will no longer be given as mixtures of Black-Scholes prices after the initial time. By including the running minimum in the auxiliary process, it is possible to construct an arbitrage-free model which is distinct from the coin-flip model, and in which all options with both upper and lower barriers may be priced as simple mixtures of Black-Scholes prices without introducing arbitrage. While we make no claims about the extent to which the model $\hat{\mathbb P}$ is "real" or "reasonable", it is fully-specified and dynamically-consistent.
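The pricing rule (2.33) is straightforward to illustrate by simulation. The Python sketch below is a rough illustration under stated assumptions, not a production pricer: all function names and parameter values are hypothetical, the price follows $dS_t = \sigma S_t\, dW_t$ as in the text (so $r$ enters only through discounting), and the barrier is monitored only at the grid times, which biases the estimate upward relative to continuous monitoring.

```python
import math
import random


def up_and_out_call_mc(s0, sigma, T, K, L, r, n_paths, n_steps, rng):
    """Monte Carlo estimate of E[e^{-rT} 1{M_T <= L} (S_T - K)^+] when
    dS_t = sigma * S_t dW_t, with the barrier checked only on the time grid."""
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        log_s, alive = math.log(s0), s0 <= L
        for _ in range(n_steps):
            if not alive:
                break
            # exact log-normal step of the driftless geometric Brownian motion
            log_s += -0.5 * sigma**2 * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            alive = math.exp(log_s) <= L
        if alive:
            total += max(math.exp(log_s) - K, 0.0)
    return math.exp(-r * T) * total / n_paths


def mixture_price(prices, probs):
    """Pricing rule (2.33): probability-weighted average of scenario prices."""
    return sum(p * b for p, b in zip(probs, prices))


rng = random.Random(1)  # fixed seed, illustrative only
b1 = up_and_out_call_mc(100.0, 0.1, 1.0, 100.0, 130.0, 0.0, 2000, 50, rng)
b2 = up_and_out_call_mc(100.0, 0.3, 1.0, 100.0, 130.0, 0.0, 2000, 50, rng)
print(b1, b2, mixture_price([b1, b2], [0.5, 0.5]))
```

The sketch only reproduces the coin-flip side of the argument, namely that the mixture price is the expected discounted payoff when the volatility scenario is drawn at time 0. Simulating the mimicking model $\hat{\mathbb P}$ would additionally require the geometric analogue of the coefficient $\hat\sigma$ from Example 2.26.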


Chapter 3 A Cross Product Construction

In this section, we develop a cross product construction for probability measures that preserves certain properties of the composed measures. We first develop a binary product, and then we show that the construction is associative, so we may repeat the construction iteratively. This section is very much in the spirit of Chapter 6 of Stroock and Varadhan [SV79]; however, the goal is slightly different. Stroock and Varadhan begin with a collection of measures that each solve a martingale problem locally, and they show that these measures can be patched together to produce a global solution to that martingale problem. We start with a single initial measure which we break into a measure on the events prior to some stopping time $T$, and a conditional probability measure on the events after $T$. We then patch these two objects back together to form a new measure, and we do this in such a way that we preserve the unconditional distribution of the events before $T$ and the unconditional distribution of the events after $T$. In the next chapter, we will need to keep track of an auxiliary process that takes values in a Polish space, so the natural initial condition will be a measure on that Polish space. These considerations lead us to work in the following setting.

3.1 Setting. Let $(E, \mathcal E)$ be a Polish space with its Borel $\sigma$-field, and set $\Omega \triangleq E \times C_0(\mathbb{R}_+; \mathbb{R}^d)$ with typical point $\omega = (e, x)$. Define the random variable $E(e, x) \triangleq e$, the process $X(e, x) \triangleq x$, and the filtration $\mathbb{F}^0 \triangleq \{\mathcal F^0_t\}_{t \in \mathbb{R}_+}$ where $\mathcal F^0_t \triangleq \mathcal E \otimes \sigma(X^t) = \mathcal E \otimes \sigma(X_s : s \le t)$. Then $\Omega$ is a Polish space under the standard product topology on $E \times C_0(\mathbb{R}_+; \mathbb{R}^d)$ with Borel $\sigma$-field
\[
\mathcal F \triangleq \mathcal E \otimes \sigma(X) = \mathcal E \otimes X^{-1}(\mathcal C_0) = \bigvee_t \mathcal F^0_t,
\]

where $\mathcal C_0$ denotes the Borel $\sigma$-field on $C_0(\mathbb{R}_+; \mathbb{R}^d)$.

In this chapter, we will always assume that we are in Setting 3.1. Notice that by taking $E = \mathbb{R}^d$ and defining $Z_t(e, x) \triangleq e + x(t)$, we recover the standard Wiener space with canonical process $Z$. One might be concerned that $\mathbb{F}^0$ does not satisfy the usual conditions; however, Lem. F.8 asserts that every right-continuous $\mathbb{F}^0$-martingale remains a martingale when we move to the smallest filtration generated by $\mathbb{F}^0$ that satisfies the usual conditions, so we can move to a filtration that satisfies the usual conditions if we need to invoke results from the general theory of processes. Moreover, $\mathbb{F}^0$-stopping times have a number of useful properties that are lost when we move to the right-continuous filtration generated by $\mathbb{F}^0$. In particular, if $T$ is an $\mathbb{F}^0$-stopping time, then the events in the $\sigma$-field $\mathcal F^0_T$ have a nice characterization (e.g., Lem. A.1), and $\mathcal F^0_T$ is countably generated. These results are developed in Appendix A.

The following notion will be fundamental.

3.2 Definition. Let $\{0 = T_0 \le T_1 \le \dots \le T_n < \infty\}$ be an increasing sequence of finite $\mathbb{F}^0$-stopping times, and let $\{\mathcal G_i\}_{0 \le i \le n}$ be a collection of $\sigma$-fields. Set $\mathcal H_0 \triangleq \sigma(E)$, $T_{n+1} = \infty$, and
\[
\mathcal H_i \triangleq \sigma\big(\mathcal G_{i-1},\ \Delta(X^{T_i}, T_{i-1})\big) \quad \text{for } 1 \le i \le n + 1. \tag{3.3}
\]
We say that $\Pi \triangleq \{(T_i, \mathcal G_i)\}_{0 \le i \le n}$ is an extended partition if both the following properties hold:

(a) $T_i - T_{i-1} \in \sigma\big(\mathcal G_{i-1},\ \Delta(X, T_{i-1})\big)$ for $1 \le i \le n$, and

(b) $\mathcal G_i \subset \mathcal H_i$ for $0 \le i \le n$.

One possible way to interpret this structure is to think of an extended partition as a filtration-like object in which information is lost at each time $T_{i-1}$, and $\mathcal G_{i-1}$ denotes the information that we keep. If we watch the process $\Delta(X, T_{i-1})$ over the stochastic interval $[T_{i-1}, T_i]$, and we combine what we learn with the information that we kept at $T_{i-1}$, then the amount of
CHAPTER 3. A CROSS PRODUCT CONSTRUCTION information that we have at time Ti is Hi . As Ti is an F0 -stopping time by assumption, Lem. A.5 asserts that (a) is actually equivalent to the seemingly stronger Ti − Ti−1 ∈ Hi . The σ-field Hi represents the information that we have at time Ti , and property (b) states that this is the only information that we may include in the next Gi . In essence, once you choose to forget something by leaving it out of some Gi , that information is gone forever. 3.4 Example. We specialize to the case Ω = {0}×C0 (R+ ; R2 ), so E contains a single point and there is no initial condition. We assume that the canonical process is divided into (Y, C) = X, where Y and C are real-valued processes. We fix some n, and we fix a deterministic partition π = {0 = t0 < t1 < . . . < tn < tn+1 = ∞}.

(3.5)

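We take $T_i = t_i$ and $\mathcal{G}_i = \sigma(Y_{t_i})$ in Def. 3.2. For $i \in \{1, \dots, n + 1\}$, we have
\[ \mathcal{H}_i = \sigma\bigl(\mathcal{G}_{i-1}, \Delta(X^{T_i}, T_{i-1})\bigr) = \sigma\bigl(Y_{t_{i-1}},\, Y_s - Y_{t_{i-1}},\, C_s - C_{t_{i-1}} : s \in [t_{i-1}, t_i] \cap \mathbb{R}_+\bigr) = \sigma\bigl(\Theta(Y^{t_i}, t_{i-1}),\, \Delta(C^{t_i}, t_{i-1})\bigr). \]
As $Y_{t_i} = \Theta_{t_i - t_{i-1}}(Y^{t_i}, t_{i-1}) \in \mathcal{H}_i$, $\Pi = (t_i, \mathcal{G}_i)_{0 \le i \le n}$ is an extended partition. Notice that in this example, we do not have $C_{t_i} \in \mathcal{H}_i$ when $i > 0$.

The goal of this section is to prove the following theorem.

3.6 Theorem. Let $\mathbb{P}$ be a probability measure on $\Omega$ and $\Pi = \{(T_i, \mathcal{G}_i)\}_{0 \le i \le n}$ be an extended partition. Then there exists a unique measure, denoted $\mathbb{P}^{\otimes\Pi}$, such that
(a) $\mathbb{P}^{\otimes\Pi}[A] = \mathbb{P}[A]$ for $A \in \bigcup_i \mathcal{H}_i$, and
(b) any version of $\mathbb{P}[B \mid \mathcal{G}_i]$ is a version of $\mathbb{P}^{\otimes\Pi}[B \mid \mathcal{F}_{T_i}^0]$ for $B \in \mathcal{H}_{i+1}$ and $0 \le i \le n$,
where $\mathcal{H}_i \triangleq \sigma\bigl(\mathcal{G}_{i-1}, \Delta(X^{T_i}, T_{i-1})\bigr)$ for $1 \le i \le n + 1$.

3.7 Remark. Two versions of $\mathbb{P}[B \mid \mathcal{G}_i]$ may differ on a $\mathbb{P}$-null set $N \in \mathcal{G}_i \subset \mathcal{H}_i$, but $\mathbb{P}^{\otimes\Pi}[N] = \mathbb{P}[N] = 0$ by (a) of Thm. 3.6, so $N$ is also a $\mathbb{P}^{\otimes\Pi}$-null set and the statement of the theorem is at least plausible.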


Property (a) says that we do not change the unconditional distributions of events that are $\mathcal{H}_i$-measurable for some $i$; however, if the random variable $A$ is $\mathcal{F}_{T_i}^0$-measurable and the random variable $B$ is $\mathcal{H}_{i+1}$-measurable, then we may change the joint distribution of $(A, B)$. In particular, (b) implies that $A$ and $B$ are conditionally independent given $\mathcal{G}_i$ under $\mathbb{P}^{\otimes\Pi}$, regardless of their joint distribution under $\mathbb{P}$.

3.8 Example. Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t^0\}_{t \in \mathbb{R}_+}, \mathbb{P})$ be a stochastic basis that supports a Wiener process $W$ and a collection of independent, $\mathcal{F}_0^0$-measurable random variables $\{U_i\}_{0 \le i \le n}$, each of which is uniformly distributed over the interval $[0, 1]$. Let $c_1$, $c_2$, and $\eta$ be defined as in Example 2.10, and let $\pi$ be defined as in (3.5) of Example 3.4. Define $\sigma : \Omega \times C_0(\mathbb{R}_+; \mathbb{R}) \times \mathbb{R}_+ \to \mathbb{R}$ by
\[
\sigma_t(y) \triangleq
\begin{cases}
\sqrt{c_1} & \text{if } t \in [0, t_1) \text{ and } U_0 < 1/2, \\
\sqrt{c_2} & \text{if } t \in [0, t_1) \text{ and } U_0 \ge 1/2, \\
\sqrt{c_1} & \text{if } t \in [t_i, t_{i+1}) \text{ and } U_i < \dfrac{\eta(y(t_i), t_i c_1)}{\eta(y(t_i), t_i c_1) + \eta(y(t_i), t_i c_2)} \text{ for } i > 0, \text{ and} \\
\sqrt{c_2} & \text{if } t \in [t_i, t_{i+1}) \text{ and } U_i \ge \dfrac{\eta(y(t_i), t_i c_1)}{\eta(y(t_i), t_i c_1) + \eta(y(t_i), t_i c_2)} \text{ for } i > 0.
\end{cases}
\]
Let $Y$ solve $dY_t = \sigma_t(Y)\, dW_t$ with $Y_0 = 0$, and set $C_t = \int_0^t \sigma_s^2(Y_s)\, ds$. In prose, we flip a coin to choose a volatility at the initial time, and we use this volatility over the time interval $[0, t_1)$. At each time $t_i$, we flip again to reset the volatility level, but the odds are adjusted so that the conditional distribution of the volatility chosen at time $t_i$ given $Y_{t_i} = y$ is the same as the conditional distribution of $\tilde\sigma_0 = \tilde\sigma_{t_i}$ in Example 2.10 given $\tilde Y_{t_i} = y$. Let $\Omega$ and $\Pi$ be defined as in Example 3.4, and set $\mathbb{P} \triangleq \mathcal{L}(\tilde Y, \tilde C)$, where $\tilde Y$ and $\tilde C$ are defined as in Example 2.10. In particular, $\mathbb{P}$ is a measure on $\Omega$. In this case, we have $\mathbb{P}^{\otimes\Pi} = \mathcal{L}(Y, C)$.

3.1 The Binary Construction

We work up to this result in steps. In the first lemma, we take an initial point $\omega = (e, x) \in \Omega$, and we cut the path $x$ at time $t$, keeping the initial segment from $0$ to $t$ and discarding the rest. We then randomly draw a path from $C_0(\mathbb{R}_+; \mathbb{R}^d)$ according to some measure $Q$ and append this path to the initial segment of $x$. Recall that $C_0(\mathbb{R}_+; \mathbb{R}^d)$ denotes the set of continuous functions from $\mathbb{R}_+$ to $\mathbb{R}^d$ that start at $0$.
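The cut-and-append step described above can be illustrated on a discrete time grid. The following is only a sketch under stated assumptions: paths are finite lists sampled at integer times, and the helper names `stop`, `increments_after`, and `paste` are hypothetical stand-ins for what the operators $\nabla$, $\Delta$, and the pasted path appear to do in this chapter (their precise definitions live in the appendix, which is not reproduced here).

```python
def stop(x, k):
    """Discrete stand-in for the stopped path: freeze x after index k."""
    return [x[min(i, k)] for i in range(len(x))]


def increments_after(x, k):
    """Discrete stand-in for Delta(x, t): the path i -> x[k + i] - x[k]."""
    return [x[i] - x[k] for i in range(k, len(x))]


def paste(x0, k, y):
    """Keep x0 on indices 0..k, then append the path y shifted to start at x0[k].

    y plays the role of a path drawn from Q, so it must start at 0.
    """
    if y[0] != 0:
        raise ValueError("the appended path must start at 0")
    return x0[: k + 1] + [x0[k] + v for v in y[1:]]


# Hypothetical example data: an initial path cut at index 2 and a new tail.
x0 = [0, 1, 3, 2, 5]
y = [0, -1, 4]
z = paste(x0, 2, y)
```

In this toy setting the pasted path `z` agrees with `x0` up to the cut, and its increments after the cut are exactly `y`; this mirrors the defining identity of Lemma 3.9, where events before $t$ are decided by $\omega'$ and the increment path after $t$ is distributed according to $Q$.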

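3.9 Lemma. Fix some $\omega' = (e', x') \in \Omega$, $t \ge 0$, and let $Q$ be a probability measure on $C_0(\mathbb{R}_+; \mathbb{R}^d)$. Then there exists a unique measure on $\Omega$, denoted $\delta_{\omega'} \oplus_t Q$, such that
\[ \delta_{\omega'} \oplus_t Q\bigl[A \cap \{\Delta(X, t) \in B\}\bigr] = 1_A(\omega')\, Q[B] \]
for all $A \in \mathcal{F}_t^0$ and $B \in \mathcal{C}_0$, where $\mathcal{C}_0$ denotes the Borel σ-field on $C_0(\mathbb{R}_+; \mathbb{R}^d)$.

Proof. Let $\varphi : \Omega \to C_0(\mathbb{R}_+; \mathbb{R}^d)$ denote the map $\omega = (e, x) \mapsto \Delta(x, t)$. As $\mathcal{F} = \sigma\bigl(\mathcal{F}_t^0, \varphi^{-1}(\mathcal{C}_0)\bigr) = \sigma\bigl(\mathcal{F}_t^0, \Delta(X, t)\bigr)$ (e.g., Lem. A.3), uniqueness follows from the standard π-system argument. If we let $\psi : C_0(\mathbb{R}_+; \mathbb{R}^d) \to \Omega$ denote the map $y \mapsto \bigl(e', \nabla(x', t) + \Theta(y, -t)\bigr)$, then
\[
\psi^{-1}\bigl(A \cap \{\Delta(X, t) \in B\}\bigr) = \psi^{-1}(A) \cap \bigl\{y \in C_0(\mathbb{R}_+; \mathbb{R}^d) : \Delta\bigl(\nabla(x', t) + \Theta(y, -t), t\bigr) \in B\bigr\}
= \begin{cases} B & \text{if } \omega' \in A, \text{ and} \\ \emptyset & \text{otherwise,} \end{cases}
\]
as $\Delta\bigl(\nabla(x', t) + \Theta(y, -t), t\bigr) = \Delta\bigl(\nabla(x', t), t\bigr) + \Delta\bigl(\Theta(y, -t), t\bigr) = 0 + y$. This means that $Q \circ \psi^{-1}$ is the required measure.

3.10 Lemma. Let $T$ be an $\mathbb{F}^0$-stopping time, let $Q$ be a probability measure on $C_0(\mathbb{R}_+; \mathbb{R}^d)$, and let $\mathcal{C}_0$ denote the Borel σ-field on $C_0(\mathbb{R}_+; \mathbb{R}^d)$. Fix some $\omega' = (e', x') \in \Omega$ and set $\mathbb{P} \triangleq \delta_{\omega'} \oplus_{T(\omega')} Q$. Then
(a) $\mathbb{P}[T = T(\omega')] = 1$,
(b) $\mathbb{P}\bigl[A \cap \{\Delta(X, T) \in C\}\bigr] = 1_A(\omega')\, Q[C]$ for all $A \in \mathcal{F}_T^0$, $C \in \mathcal{C}_0$, and
(c) $\mathbb{P}[A \cap F] = 1_A(\omega')\, \mathbb{P}[F]$ for all $A \in \mathcal{F}_T^0$, $F \in \mathcal{F}$.

Proof. Set $B \triangleq \{E = e', X^t = \nabla(x', t)\}$, where $t \triangleq T(\omega')$. Lem. A.1 asserts that $B \in \mathcal{F}_t^0$, so we may apply the previous lemma to conclude that $\mathbb{P}[B] = 1_B(\omega') = 1$. Applying Lem. A.1 again, we see that $T(\omega) = T(\omega')$ for all $\omega \in B$, so we have (a).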


CHAPTER 3. A CROSS PRODUCT CONSTRUCTION If A ∈ FT0 and C ∈ C0 , then P[A ∩ {∆(X, T ) ∈ C }] = P[A ∩ {∆(X, T ) ∈ C } ∩ {T = T (ω 0 )}] = P[A ∩ {∆ X, T (ω 0 ) ∈ C } ∩ {T = T (ω 0 )}] = P[A ∩ {∆ X, T (ω 0 ) ∈ C }] = 1A (ω 0 ) Q[C ], so (b) follows. Finally, take A ∈ FT0 and let F = B ∩ {∆(X, T ) ∈ C} with B ∈ FT0 and C ∈ C0 . In this case, we have P[A ∩ F ] = 1A∩B (ω 0 ) Q[C ] = 1A (ω 0 )P[B ∩ {∆(X, T ) ∈ C }] = 1A (ω 0 )P[F ]. As F = σ FT0 , ∆(X, T ) (e.g., Lem. A.3), (c) then follows from the standard π-system argument. We can now patch together a fixed initial point in Ω and a single probability measure on C0 (R+ ; Rd ). We use this construction to glue together a probability measure P on Ω and a probability kernel Q on C0 (R+ ; Rd ). 3.11 Definition. Let (Ω0 , F 0 ) and (Ω00 , F 00 ) be a measurable spaces and fix some G 0 ⊂ F 0 . We say that Q : Ω0 ×F 00 → [0, 1] is a G 0 -measurable probability kernel from (Ω0 , F 0 ) to (Ω00 , F 00 ), if (a) Q[A] is a G 0 -measurable random variable for fixed A ∈ F 00 , and (b) Qω0 is a probability measure on (Ω00 , F 00 ) for fixed ω 0 ∈ Ω0 . 3.12 Theorem. Let P be a probability measure on (Ω, F ), T be an F0 stopping time, and Q be an FT0 -measurable probability kernel from (Ω, F ) to (C0 (R+ ; Rd ), C0 ). Then there exists a unique probability measure on (Ω, F ), denoted P⊕T Q, such that (a) P⊕T Q[A] = P[A] for all A ∈ FT0 , and (b) the map ω 7→ δω ⊕T (ω) Qω [B] is a version of P⊕T Q[B | FT0 ] for each B ∈ F. b : Ω×F → [0, 1] denote the map (ω, A) 7→ δω ⊕T (ω) Qω [A]. Proof. Let Q b is an F 0 -measurable probability kernel from (Ω, F ) to We first show that Q T (Ω, F ). 38

CHAPTER 3. A CROSS PRODUCT CONSTRUCTION b bω [Ac ] = Let A , {A ∈ F : Q[A] is FT0 -measurable}. If A ∈ F , then Q bω [A] because Q bω is a probability measure for each fixed ω. In particular, 1− Q c b c ] = 1 − Q[A] b if A ∈ A , then Q[A soPA ∈ A . Similarly, if An ∈ A and the b ∪n An = b An are disjoint, then Q n Q[An ] so ∪n An ∈ A . We have now shown that A is a λ-system. Set B , F ∈ F : F = A ∩ B for some A ∈ FT0 and B = {∆(X, T ) ∈ C } , b ] = Q[A b ∩ B] = 1A Q(C) ∈ F 0 by the previous and take F ∈ B. Then Q[F T lemma, so B ⊂ A . B is closed with respect to finite intersections and 0 σ(B) ⊃ FT ∨ σ ∆(X, T ) = F (e.g., Lem A.3), so F ⊂ A by the π-λ bω is a probability b theorem, and Q[A] is an FT0 -measurable for all A ∈ F . As Q b is an F 0 -measurable probability kernel measure for fixed ω by construction, Q T on (Ω, F ) b ] for F ∈ F . If A ∈ F 0 , then Now define the measure Q[F ] , EP Q[F T b Q[A] = EP Q[A] = EQ [1A ], where we use the second property in Lem. 3.10. Therefore Q has property (a). If B ∈ F , then b ∩ B] = EP 1A Q[B] b b Q A ∩ B = EP Q[A = EQ 1A Q[B] , b ∈ F0, where we have used the last property in Lem. 3.10, the fact that Q T 0 0 and the fact that Q and P agree on FT . As A ∈ FT was arbitrary, we have now shown that Q has property (b) of Thm. 3.12. The uniqueness is evident, as any other measure R with these properties must assign measure h i 0 b R B = R R B FT = EP Q[B] to any set B ∈ F . The previous construction connects an initial law with a collection of probability kernels. In our application, the collection of probability kernels will be generated by conditioning a probability measure, so we will need the following definition. 39

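3.13 Definition. Let $\mathbb{P}$ be a probability measure on $(\Omega', \mathcal{F}')$ and fix some $\mathcal{G}' \subset \mathcal{F}'$. We say that $Q$ is a conditional probability distribution for $\mathbb{P}$ given $\mathcal{G}'$ if
(a) $Q$ is a $\mathcal{G}'$-measurable probability kernel from $(\Omega', \mathcal{F}')$ into $(\Omega', \mathcal{F}')$, and
(b) $Q[A]$ is a version of $\mathbb{P}[A \mid \mathcal{G}']$ for all $A \in \mathcal{F}'$.
In addition, we say that $Q$ is regular if there exists a $\mathbb{P}$-null set $N$ such that $Q_{\omega'}[G] = 1_G(\omega')$ for all $G \in \mathcal{G}'$ and $\omega' \notin N$.

We recall the following result.

3.14 Theorem. Let $(\Omega', \mathcal{F}')$ be a Polish space with its Borel σ-field, and let $\mathbb{P}$ be a probability measure on $(\Omega', \mathcal{F}')$. If we fix some $\mathcal{G}' \subset \mathcal{F}'$, then a conditional probability distribution for $\mathbb{P}$ given $\mathcal{G}'$ exists. Moreover, if $\mathcal{G}'$ is countably generated, then we may choose a regular version.

For a proof, one may consult [SV79] Thm 1.1.6 and Thm 1.1.8.

3.15 Corollary. Let $\mathbb{P}^1$ and $\mathbb{P}^2$ be probability measures on $\Omega$, let $T$ be an $\mathbb{F}^0$-stopping time, and let $\mathcal{G} \subset \mathcal{F}_T^0$ with $\mathbb{P}^1|_{\mathcal{G}} \ll \mathbb{P}^2|_{\mathcal{G}}$. Then there exists a unique measure, denoted $\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2$, such that
(a) $\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2[A] = \mathbb{P}^1[A]$ for any $A \in \mathcal{F}_T^0$, and
(b) any version of $\mathbb{P}^2[B \mid \mathcal{G}]$ is a version of $\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2[B \mid \mathcal{F}_T^0]$ for all $B \in \sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$.
In particular, if $\mathbb{P}^1$ and $\mathbb{P}^2$ agree when restricted to $\mathcal{G}$, then $\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2$ and $\mathbb{P}^2$ agree when restricted to $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$.

3.16 Remark. If $\mathcal{G} = \sigma(X_T)$, then $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr) = \sigma\bigl(\Theta(X, T)\bigr)$, so we can read (b) as saying that $X$ has a strong Markov-like property at the stopping time $T$ under the measure $\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2$.

Proof. Let $\tilde Q$ be a regular conditional probability distribution of $\mathbb{P}^2$ conditioned on $\mathcal{G}$ (which exists as the conditions of Thm. 3.14 are satisfied), and let $Q(\omega, C) \triangleq \tilde Q\bigl(\omega, \{\Delta(X, T) \in C\}\bigr)$ for $\omega \in \Omega$ and $C \in \mathcal{C}_0$. Notice that $Q$ is a $\mathcal{G}$-measurable probability kernel from $(\Omega, \mathcal{F})$ to $\bigl(C_0(\mathbb{R}_+; \mathbb{R}^d), \mathcal{C}_0\bigr)$, so we may define $\hat{\mathbb{P}} \triangleq \mathbb{P}^1 \oplus_T Q$ as our candidate for $\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2$. Property (a) is simply (a) of Thm. 3.12.

To show that $\hat{\mathbb{P}}$ has property (b), consider the classes of sets
\[ \mathcal{A} \triangleq \bigl\{A \in \mathcal{F} : \text{there exists a version of } \mathbb{P}^2[A \mid \mathcal{G}] \text{ which is also a version of } \hat{\mathbb{P}}[A \mid \mathcal{F}_T^0]\bigr\}, \text{ and} \]
\[ \mathcal{B} \triangleq \bigl\{B \in \mathcal{F} : B = G \cap \{\Delta(X, T) \in C\} \text{ for some } G \in \mathcal{G} \text{ and } C \in \mathcal{C}_0\bigr\}. \]
Now fix some $B = G \cap \{\Delta(X, T) \in C\} \in \mathcal{B}$. By (b) of Thm. 3.12, the map $\omega \mapsto \delta_\omega \oplus_{T(\omega)} Q_\omega[B]$ is a version of $\hat{\mathbb{P}}[B \mid \mathcal{F}_T^0]$. But
\[ \delta_\omega \oplus_{T(\omega)} Q_\omega[B] = \delta_\omega \oplus_{T(\omega)} Q_\omega\bigl[G \cap \{\Delta(X, T) \in C\}\bigr] = 1_G(\omega)\, Q(\omega, C) = 1_G(\omega)\, \tilde Q\bigl(\omega, \{\Delta(X, T) \in C\}\bigr), \]
where these equalities hold for all $\omega$ and we have used (b) of Lem. 3.10. We can then conclude that $1_G\, \tilde Q[\Delta(X, T) \in C]$ is a version of $\hat{\mathbb{P}}[B \mid \mathcal{F}_T^0]$, and we already know that $1_G\, \tilde Q[\Delta(X, T) \in C]$ is a version of $\mathbb{P}^2\bigl[G \cap \{\Delta(X, T) \in C\} \mid \mathcal{G}\bigr] = \mathbb{P}^2[B \mid \mathcal{G}]$, so $\mathcal{B} \subset \mathcal{A}$. But $\mathcal{A}$ is a σ-field and $\mathcal{B}$ is closed with respect to intersection, so $\sigma(\mathcal{B}) = \sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr) \subset \mathcal{A}$.

We now have the existence of a common version, but we still need to show that every version of $\mathbb{P}^2[A \mid \mathcal{G}]$ actually works. Fix $A \in \sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$, let $Y$ be any version of $\mathbb{P}^2[A \mid \mathcal{G}]$, and let $Z$ be a version of $\mathbb{P}^2[A \mid \mathcal{G}]$ which is also a version of $\hat{\mathbb{P}}[A \mid \mathcal{F}_T^0]$. Then $\mathbb{P}^2[Y \ne Z] = 0$ implies $\mathbb{P}^1[Y \ne Z] = 0$ as $\mathbb{P}^2|_{\mathcal{G}} \gg \mathbb{P}^1|_{\mathcal{G}}$, but then $\hat{\mathbb{P}}[Y \ne Z] = 0$ by (a), so $Y$ is a version of $\hat{\mathbb{P}}[A \mid \mathcal{F}_T^0]$ and we are done.

3.17 Remark. If we did not have $\mathbb{P}^2|_{\mathcal{G}} \gg \mathbb{P}^1|_{\mathcal{G}}$, we could still attempt to define some measure "$\mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2$"; however, we could not hope for uniqueness. In this case, $\mathbb{P}^1$ charges events in $\mathcal{G}$ that do not happen under $\mathbb{P}^2$, so a conditional probability distribution of $\mathbb{P}^2$ conditioned on $\mathcal{G}$ can be defined arbitrarily (up to measurability requirements) on these events.

3.18 Example. We specialize Example 3.8 to the case $n = 1$. In this case, we have $T = t_1$, $\mathcal{G} = \sigma(Y_{t_1})$, and $Y$ and $C$ admit the relatively explicit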


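representations
\[
Y_t =
\begin{cases}
\sqrt{c_1}\, W_t & \text{for } t \le t_1 \text{ and } U_0 < 1/2, \\
\sqrt{c_2}\, W_t & \text{for } t \le t_1 \text{ and } U_0 \ge 1/2, \\
Y_{t_1} + \sqrt{c_1}\,(W_t - W_{t_1}) & \text{for } t > t_1 \text{ and } U_1 < \dfrac{\eta(Y_{t_1}, t_1 c_1)}{\eta(Y_{t_1}, t_1 c_1) + \eta(Y_{t_1}, t_1 c_2)}, \\
Y_{t_1} + \sqrt{c_2}\,(W_t - W_{t_1}) & \text{for } t > t_1 \text{ and } U_1 \ge \dfrac{\eta(Y_{t_1}, t_1 c_1)}{\eta(Y_{t_1}, t_1 c_1) + \eta(Y_{t_1}, t_1 c_2)},
\end{cases}
\]
and
\[
C_t =
\begin{cases}
t c_1 & \text{for } t \le t_1 \text{ and } U_0 < 1/2, \\
t c_2 & \text{for } t \le t_1 \text{ and } U_0 \ge 1/2, \\
C_{t_1} + (t - t_1) c_1 & \text{for } t > t_1 \text{ and } U_1 < \dfrac{\eta(Y_{t_1}, t_1 c_1)}{\eta(Y_{t_1}, t_1 c_1) + \eta(Y_{t_1}, t_1 c_2)}, \text{ and} \\
C_{t_1} + (t - t_1) c_2 & \text{for } t > t_1 \text{ and } U_1 \ge \dfrac{\eta(Y_{t_1}, t_1 c_1)}{\eta(Y_{t_1}, t_1 c_1) + \eta(Y_{t_1}, t_1 c_2)}.
\end{cases}
\]
Recall that in Example 3.8 we set $\mathbb{P} \triangleq \mathcal{L}(\tilde Y, \tilde C)$, where $\tilde Y$ and $\tilde C$ were defined as in Example 2.10. Then $\mathbb{Q} \triangleq \mathcal{L}(Y, C) = \mathbb{P} \otimes_{T,\mathcal{G}} \mathbb{P}$. It is clear that $\mathbb{Q}$ and $\mathbb{P}$ agree on $\mathcal{F}_{t_1}^0$, and this is property (a) of Cor. 3.15. Using the independence of the increments of $W$, the fact that $U_1$ is independent of $U_0$, and the representation of $Y$ and $C$ above, we see that $\Delta(X, t_1)$, where $X = (Y, C)$, only depends upon $\mathcal{F}_{t_1}^0$ through the value of $Y_{t_1}$. This is property (b) of Cor. 3.15.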
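The explicit representation in Example 3.18 lends itself to a small Monte Carlo sketch. Example 2.10 is not reproduced in this chapter, so the code below makes a loudly stated assumption: $\eta(y, v)$ is taken to be the centered normal density with variance $v$. The function name `sample_example_318` and all parameter values are hypothetical; the sketch only draws $(Y_{t_1}, Y_t, C_t)$ at one fixed time $t > t_1$.

```python
import math
import random


def eta(y, v):
    """ASSUMPTION: eta(y, v) is the N(0, v) density at y (Example 2.10 is not shown here)."""
    return math.exp(-y * y / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def sample_example_318(c1, c2, t1, t, rng):
    """Draw (Y_t1, Y_t, C_t, same_vol) for a fixed t > t1, following the case formulas."""
    u0, u1 = rng.random(), rng.random()
    first = c1 if u0 < 0.5 else c2                 # variance rate used on [0, t1)
    w_t1 = rng.gauss(0.0, math.sqrt(t1))           # W_{t1}
    y_t1 = math.sqrt(first) * w_t1
    # Adjusted odds for the second flip, given the observed value of Y_{t1}.
    p1 = eta(y_t1, t1 * c1) / (eta(y_t1, t1 * c1) + eta(y_t1, t1 * c2))
    second = c1 if u1 < p1 else c2                 # variance rate used after t1
    dw = rng.gauss(0.0, math.sqrt(t - t1))         # W_t - W_{t1}
    y_t = y_t1 + math.sqrt(second) * dw
    c_t = t1 * first + (t - t1) * second
    return y_t1, y_t, c_t, first == second
```

Under the stated assumption on $\eta$, repeated draws let one estimate how often the same volatility is chosen on both intervals; the case formulas allow both outcomes, which is the feature that distinguishes $\mathcal{L}(Y, C)$ from $\mathcal{L}(\tilde Y, \tilde C)$.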

3.2 Properties Preserved by the Binary Construction.

The results given in this section share the following assumptions.

3.19 Assumption. Let $\mathbb{P}^1$ and $\mathbb{P}^2$ be probability measures on $\Omega$, let $T$ be an $\mathbb{F}^0$-stopping time, let $\mathcal{G} \subset \mathcal{F}_T^0$ with $\mathbb{P}^1|_{\mathcal{G}} \ll \mathbb{P}^2|_{\mathcal{G}}$, and set $\mathbb{P}^{12} \triangleq \mathbb{P}^1 \otimes_{T,\mathcal{G}} \mathbb{P}^2$.

3.20 Lemma. In addition to the assumptions of 3.19, let $A$ be an $\mathbb{R}^d$-valued, continuous process such that $\Delta(A, T)$ is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$-measurable. Then the following two implications hold.
(a) If $A^T$ is $\mathbb{P}^1$-a.s. of finite variation, and $\Delta(A, T)$ is $\mathbb{P}^2$-a.s. of finite variation, then $A$ is $\mathbb{P}^{12}$-a.s. of finite variation.
(b) If $A^T$ is $\mathbb{P}^1$-a.s. absolutely continuous, and $\Delta(A, T)$ is $\mathbb{P}^2$-a.s. absolutely continuous, then $A$ is $\mathbb{P}^{12}$-a.s. absolutely continuous.


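Proof. Set $FV^d \triangleq \{x \in C_0(\mathbb{R}_+; \mathbb{R}^d) : x \in BV([0, t]; \mathbb{R}^d) \text{ for all } t\}$, and $AC^d \triangleq \{x \in C_0(\mathbb{R}_+; \mathbb{R}^d) : x \in AC([0, t]; \mathbb{R}^d) \text{ for all } t\}$. These are both Borel measurable subsets of $C_0(\mathbb{R}_+; \mathbb{R}^d)$ (e.g., Cor. C.9 and Cor. C.11). Fixing any $\omega = (e, x) \in \Omega$, we see that $\nabla(x, T(\omega)) \in BV([0, t]; \mathbb{R}^d)$ and $\Delta(x, T(\omega)) \in BV([0, (t - T(\omega)) \vee 0]; \mathbb{R}^d)$ implies that $x \in BV([0, t]; \mathbb{R}^d)$, so
\[ \{A^T \in FV^d\} \cap \{\Delta(A, T) \in FV^d\} \subset \{A \in FV^d\}. \]
As $\mathbb{P}^2[\Delta(A, T) \in FV^d] = 1$, $1$ is a version of $\mathbb{P}^2[\Delta(A, T) \in FV^d \mid \mathcal{G}]$, and we may apply the properties of $\mathbb{P}^{12}$ listed in Cor. 3.15 to conclude that
\[
\begin{aligned}
\mathbb{P}^{12}[A \in FV^d] &\ge \mathbb{E}^{12}\bigl[1_{\{A^T \in FV^d\}}\, 1_{\{\Delta(A, T) \in FV^d\}}\bigr] \\
&= \mathbb{E}^{12}\Bigl[1_{\{A^T \in FV^d\}}\, \mathbb{E}^{12}\bigl[1_{\{\Delta(A, T) \in FV^d\}} \bigm| \mathcal{F}_T^0\bigr]\Bigr] \\
&= \mathbb{E}^{1}\Bigl[1_{\{A^T \in FV^d\}}\, \mathbb{E}^{2}\bigl[1_{\{\Delta(A, T) \in FV^d\}} \bigm| \mathcal{G}\bigr]\Bigr] = 1.
\end{aligned}
\]
This is (a). Similarly, if $\omega = (e, x) \in \Omega$, $\nabla(x, T(\omega)) \in AC([0, t]; \mathbb{R}^d)$, and $\Delta(x, T(\omega)) \in AC([0, (t - T(\omega)) \vee 0]; \mathbb{R}^d)$, then $x \in AC([0, t]; \mathbb{R}^d)$, so we may replace $FV^d$ with $AC^d$ in the previous argument to get (b).

The following corollary is often easier to use.

3.21 Corollary. In addition to the assumptions of 3.19, let $A$ be an $\mathbb{R}^d$-valued, continuous process such that $\Delta(A, T)$ is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$-measurable. Then the following two implications hold.
(a) If $A$ is $\mathbb{P}^i$-a.s. of finite variation for $i \in \{1, 2\}$, then $A$ is $\mathbb{P}^{12}$-a.s. of finite variation.
(b) If $A$ is $\mathbb{P}^i$-a.s. absolutely continuous for $i \in \{1, 2\}$, then $A$ is $\mathbb{P}^{12}$-a.s. absolutely continuous.

Proof. We have $\{A \in FV^d\} \subset \{A^T \in FV^d\}$, so $\mathbb{P}^1[A \in FV^d] = 1$ implies $\mathbb{P}^1[A^T \in FV^d] = 1$. We also have $\{A \in FV^d\} \subset \{\Delta(A, T) \in FV^d\}$, so $\mathbb{P}^2[A \in FV^d] = 1$ implies $\mathbb{P}^2[\Delta(A, T) \in FV^d] = 1$. Assertion (a) then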


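follows from (a) of Lem. 3.20, and essentially the same argument shows that (b) follows from (b) of Lem. 3.20.

3.22 Lemma. In addition to the assumptions of 3.19, let $A$ be an $\mathbb{F}^0$-adapted, $\mathbb{R}^d$-valued, continuous process such that $\Delta(A, T)$ is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$-measurable, and let $a$ be an $\mathbb{R}^d$-valued, measurable process such that the set
\[ B(\omega) \triangleq \Bigl\{t \in \mathbb{R}_+ : \tfrac{\partial}{\partial t} A_t(\omega) \text{ exists and } a_t(\omega) \ne \tfrac{\partial}{\partial t} A_t(\omega)\Bigr\} \]
has Lebesgue measure 0 for all $\omega$. Further assume that $S$ is an $\mathbb{R}_+$-valued, $\mathbb{F}^0$-stopping time such that $(S - T)^+$ is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$-measurable. If we have $\mathbb{P}^i\bigl[A_t = \int_0^t a_u\, du \ \forall t \in \mathbb{R}_+\bigr] = 1$ for $i \in \{1, 2\}$ and $\mathbb{P}^1|_{\mathcal{G}} = \mathbb{P}^2|_{\mathcal{G}}$, then $\mathbb{P}^{12}\bigl[A_t = \int_0^t a_u\, du \ \forall t \in \mathbb{R}_+\bigr] = 1$ and
\[ \mathbb{E}^{12}\Bigl[\int_0^S f(a_u)\, du\Bigr] = \mathbb{E}^{1}\Bigl[\int_0^{T \wedge S} f(a_u)\, du\Bigr] + \mathbb{E}^{2}\Bigl[\int_T^{T \vee S} f(a_u)\, du\Bigr]. \tag{3.23} \]

3.24 Remark. Each $B(\omega)$ is automatically Lebesgue measurable as it is a null set. We do not require these sets to be Borel measurable.

3.25 Remark. $S = \infty$ always satisfies the requirements of this lemma.

Proof. It follows from the previous corollary that $A$ is $\mathbb{P}^{12}$-a.s. absolutely continuous. As $a(\omega)$ is a version of the derivative for each $\omega$, we must have $\mathbb{P}^{12}\bigl[A_t = \int_0^t a_u\, du \text{ for all } t\bigr] = 1$.

By taking divided differences of the process $A^T$, we may find a $\sigma(A^T) \otimes \mathcal{R}_+$-measurable process $a^1$ such that $a^1_t(\omega) = \frac{\partial}{\partial t} A^T_t(\omega)$ whenever this derivative exists. Similarly, there exists a $\sigma\bigl(\Delta(A, T)\bigr) \otimes \mathcal{R}_+$-measurable process $a^2$ such that $a^2_t(\omega) = \frac{\partial}{\partial t} \Delta_t(A, T)$ whenever this derivative exists (e.g., take $\mathcal{F}_t^0 = \sigma(A^T)$ or $\mathcal{F}_t^0 = \sigma\bigl(\Delta(A, T)\bigr)$ for all $t$ in Lem. C.10). Now define the sets
\[ B^1(\omega) \triangleq \Bigl\{t \in \mathbb{R}_+ : \tfrac{\partial}{\partial t} A^T_t(\omega) \text{ does not exist}\Bigr\}, \text{ and} \]
\[ B^2(\omega) \triangleq \Bigl\{t \in \mathbb{R}_+ : \tfrac{\partial}{\partial t} \Delta_t\bigl(A(\omega), T(\omega)\bigr) \text{ does not exist}\Bigr\}. \]
If $0 < t < T(\omega)$ and $t \notin B^1(\omega)$, then $\frac{\partial}{\partial t} A^T_t(\omega)$ exists and $A^T(\omega)$ and $A(\omega)$ agree in some neighborhood of $t$, so $\frac{\partial}{\partial t} A_t(\omega)$ exists and agrees with $\frac{\partial}{\partial t} A^T_t(\omega)$.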


In particular, if $0 < t < T(\omega)$ and $t \notin B(\omega) \cup B^1(\omega)$, then $a_t(\omega) = a^1_t(\omega)$. If $\omega \in \{A^T \in FV^d\}$, then $B(\omega) \cup B^1(\omega)$ has Lebesgue measure zero, so $a_t(\omega)$ and $a^1_t(\omega)$ agree for Lebesgue-a.e. $t \in [0, T(\omega)]$. This means that
\[ 1_{\{A^T \in FV^d\}} \int_0^{T \wedge S} f(a_u)\, du = 1_{\{A^T \in FV^d\}} \int_0^{T \wedge S} f(a^1_u)\, du \tag{3.26} \]
for all $\omega$, where we use the extended integral of Rem. 1.7. The process $a^1 1_{[0, T \wedge S]}$ is $\mathcal{F}_T^0 \otimes \mathcal{R}_+$-measurable, so $\int_0^{T \wedge S} f(a^1_u)\, du$ is $\mathcal{F}_T^0$-measurable by Fubini's Theorem, and the same holds for the left hand side of (3.26).

Similar pathwise arguments show that $a_{T(\omega) + t}(\omega) = a^2_t(\omega)$ if $t > 0$, $T(\omega) + t \notin B(\omega)$, and $t \notin B^2(\omega)$. If $\omega \in \{\Delta(A, T) \in FV^d\}$, then $a_{T(\omega) + t}(\omega)$ and $a^2_t(\omega)$ agree for Lebesgue-a.e. $t$, so
\[ 1_{\{\Delta(A, T) \in FV^d\}} \int_T^{T \vee S} f(a_u)\, du = 1_{\{\Delta(A, T) \in FV^d\}} \int_0^{(S - T)^+} f(a^2_u)\, du \tag{3.27} \]
for all $\omega$. The process $a^2 1_{[0, (S - T)^+]}$ is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr) \otimes \mathcal{R}_+$-measurable, so the right hand side of (3.27) is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$-measurable by Fubini's Theorem and the assumption that $\Delta(A, T)$ is $\sigma\bigl(\mathcal{G}, \Delta(X, T)\bigr)$-measurable.

We have assumed that $\mathbb{P}^2\bigl[A_t = \int_0^t a_u\, du \text{ for all } t\bigr] = 1$, so $A$ is absolutely continuous $\mathbb{P}^2$-a.s. and $1_{\{\Delta(A, T) \in FV^d\}} \int_T^{T \vee S} f(a_u)\, du = \int_T^{T \vee S} f(a_u)\, du$ $\mathbb{P}^2$-a.s. This means that any version of $\mathbb{E}^2\bigl[1_{\{\Delta(A, T) \in FV^d\}} \int_T^{T \vee S} f(a_u)\, du \bigm| \mathcal{G}\bigr]$ is also a version of $\mathbb{E}^2\bigl[\int_T^{T \vee S} f(a_u)\, du \bigm| \mathcal{G}\bigr]$.

We have assumed that $\mathbb{P}^i\bigl[A_t = \int_0^t a_u\, du \text{ for all } t\bigr] = 1$ for $i \in \{1, 2\}$, so we may apply the previous corollary to conclude that $A$ is $\mathbb{P}^{12}$-a.s. absolutely continuous. As $\{A \in AC^d\} \subset \{A^T \in AC^d\}$ and $\{A \in AC^d\} \subset \{\Delta(A, T) \in AC^d\}$, $\mathbb{P}^{12}[A^T \in AC^d] = \mathbb{P}^{12}[\Delta(A, T) \in AC^d] = 1$. We now use (3.26), (3.27), the properties stated in Cor. 3.15, and the


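assumption that $\mathbb{P}^1|_{\mathcal{G}} = \mathbb{P}^2|_{\mathcal{G}}$ to write
\[
\begin{aligned}
\mathbb{E}^{12}\Bigl[\int_0^S f(a_u)\, du\Bigr]
&= \mathbb{E}^{12}\Bigl[1_{\{A^T \in FV^d\}} \int_0^{T \wedge S} f(a_u)\, du\Bigr] + \mathbb{E}^{12}\Bigl[1_{\{\Delta(A, T) \in FV^d\}} \int_T^{T \vee S} f(a_u)\, du\Bigr] \\
&= \mathbb{E}^{1}\Bigl[1_{\{A^T \in FV^d\}} \int_0^{T \wedge S} f(a_u)\, du\Bigr] + \mathbb{E}^{1}\Bigl[\mathbb{E}^{2}\Bigl[1_{\{\Delta(A, T) \in FV^d\}} \int_T^{T \vee S} f(a_u)\, du \Bigm| \mathcal{G}\Bigr]\Bigr] \\
&= \mathbb{E}^{1}\Bigl[\int_0^{T \wedge S} f(a_u)\, du\Bigr] + \mathbb{E}^{2}\Bigl[\int_T^{T \vee S} f(a_u)\, du\Bigr].
\end{aligned}
\]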

3.28 Example. Resume the setting of Example 3.18 and take A = C in Lem. 3.22. Recall that in this case we have T = t1 and σ G , ∆(X, t1 ) = σ Yt1 , ∆(X, t1 ) = σ Θ(Y, t1 ), ∆(C, t1 ) , so ∆(C, T ) is σ G , ∆(X, T ) -measurable. Define the constant process ct , R e we have P Ct = t cs ds ∀t = 1, but under C1 . Then under P , L (Ye , C), Rt 0 Q , P⊗ t1 ,σ(Yt1 ) P = L (Y, C), we have Q Ct = 0 cs ds ∀t < 1. In particular, Rt Ct = 0 cs ds ∀t is the event where we choose the same volatility over the interval [0, t1 ) as we choose over the interval (t1 , ∞). This shows that Q is not absolutely continuous with respect to P. This example also shows why we must require a to agree with the derivative of A at all ω, rather than just P-a.e. ω. b0 , {Fb0 } where 3.29 Lemma. In addition to the assumptions of 3.19, let F t Fbt0 , FT0 +t and let M be a continuous, real-valued process with M0 = 0. Set c , ∆(M, T ) and assume that M c is σ G , ∆(X, T ) -measurable. If M T is M b0 , P2 )-local martingale, then M c is an (F an (F0 , P1 )-local martingale and M is an (F0 , P12 )-local martingale. Proof. First we note that if S is an F0 -stopping time, then S ∨ T − T is an b0 -stopping time and F 0 ⊂ Fb0 F S S∨T −T . To see this, notice that T + t is an 46

CHAPTER 3. A CROSS PRODUCT CONSTRUCTION F0 -stopping time, so the first claim follows from the equalities {S ∨ T − T ≤ t} = {S ≤ T + t} ∈ FT0 +t = Fbt0 . If A ∈ FS0 , then the same chain of equalities gives A ∩ {S ∨ T − T ≤ t} = A ∩ {S ≤ T + t} ∈ FT0 +t = Fbt0 . From now on, if S is an F0 -stopping time, then Sb , S ∨ T − T will denote b0 -stopping time. the corresponding F b0 )-local martingale. Let c is a (P12 , F We first show that M T2n , inf{t ≥ T : |Mt − MT | ≥ n}. As this is the hitting time of a closed set, T2n is an F0 -stopping time. Notice that in this case, Tb2n = inf{t ≥ 0 : |∆t (M, T )| ≥ n}, so Tb2n is σ ∆(M, T ) cTb2n = ∇ ∆(M, T ), Tb2n . Also notice that M cn , M cTb2n ≤ n, measurable as is M b0 , P2 )-martingale. We write Z ∈ bF to mean that Z is a cn is an (F so M bounded F -measurable random variable. For 0 ≤ s ≤ t < ∞, let ctn − M csn ) = 0 , and A , Z ∈ bF : E12 Y (M B , Z ∈ bF : Z = Z1 Z2 with Z1 ∈ bFb00 , Z2 ∈ bσ ∆s (X, T ) . If we fix some Z = Z1 Z2 in B, then h i ctn − M csn ) = E12 Z1 E12 Z2 (M ctn − M csn ) Fb00 E12 Z(M h i cn − M cn ) G = E1 Z1 E2 Z2 (M t s h 0 i 1 2 2 cn n c ) Fb G , = E Z1 E Z2 E (Mt − M s s {z } | =0

where the second equality follows from (b) of Cor. 3.15. This means that B ⊂ A , but A is a monotone class (by bounded convergence) and B is closed with respect to forming finite products, so σ(B) ⊂ A by a monotone class argument. As Fbs0 = FT0 +s = σ FT0 , ∆(X T +s , T ) = σ FT0 , ∆s (X, T ) ⊂ σ(B), 47

CHAPTER 3. A CROSS PRODUCT CONSTRUCTION b0 )-martingale. As M cn is a (P12 , F c is conby Lem. A.3, we conclude that M c tinuous, we have Tb2n → ∞ everywhere, so T2n is a localizing sequence and M b0 )-local martingale. is a (P12 , F Now we show that M is an (F0 , P12 )-local martingale. Retaining the previous notation, also define T1n , inf{0 ≤ t ≤ T : |Mt | ≥ n}, and T n , T1n ∧ T2n . Notice that {T1n = ∞} = {sups≤T |Ms | < n} and T n is an F0 -stopping time. n M n , M T is bounded by 2n as it is bounded by n on the interval [0, T ] and can potentially make a move of size n after T before getting stopped. We now show that M n is an (F0 , P12 )-martingale. To this end, fix some s < t and bounded Z ∈ Fs0 . Then E12 [Z(Mtn − Msn )] = E12 [Z 1{s≤T } (Mtn − MTn )] + E12 [Z 1{s≤T } (MTn − Msn )] + E12 [Z 1{T <s} (Mtn − Msn )] , A + B + C. To see that A = 0, first notice that ( Tn Tn Mt 2 − MT 2 n n Mt − Mt∧T = 0 = 1{T
if T1n = ∞ and T < t, and if T1n ≤ T or t ≤ T T2n Tn Mt∨T − MT 2 cbn − M cn M 0

t

b0 -stopping time. As Z 1{s≤T

CHAPTER 3. A CROSS PRODUCT CONSTRUCTION n

where we have also used the fact that M T ∧T is a bounded (F0 , P1 )-local martingale, so it is in fact a martingale. Finally, we show that C = 0. Let sb , s ∨ T − T and b t , t ∨ T − T . Notice 0 0 b that Z 1{T <s} ∈ Fs ⊂ Fsb . Then Tn

Tn

2 2 Z 1{T <s} (Mtn − Msn ) = Z 1{T <s∧T1n } (Mt∨T − Ms∨T ) cbn − M cn ) = Z 1{T <s∧T n } (M 1

t

sb

Finally notice that {T < T1n } ∈ FT0 = Fb00 , so Z 1{T <s∧T1n } ∈ Fbsb0 . We then write cbn − M cn )] = 0 C = E12 [Z 1{T <s∧T1n } (M sb t cn is a (P12 , F0 )- martingale as shown above. using the fact that M To conclude the proof, we again note that T n → ∞ everywhere as M is continuous, so T n is a localizing sequence, and M is an (F0 , P12 )-local martingale. b0 , P2 )c is an (F To apply the previous theorem, you must know that M local martingale. While this looks like an unpleasant property to check, the following lemma shows that this condition is automatically satisfied when M is an (F0 , P2 )-local martingale. b0 , 3.30 Corollary. Let M be a continuous, real-valued process, and set F 0 0 0 0 c , {Fbt }, where Fbt , FT +t . If M is an (F , P)-local martingale, then M 0 b , P)-local martingale. ∆(M, T ) is an (F Proof. Take Fb0 , T1n , T2n , Tb2n , and T n as in the proof of the previous lemma. cn , M cTb2n , M n , M T2n , and M n,m , M T2n ∧T m . M cn is bounded by n, Let M n,m and M is bounded by m + (n ∧ m). In particular, M n,m is an (F0 , P)martingale. Notice that if m ≥ n, then {T m ≥ T2n } = {T1m > T } ∈ FT0 . In prose, when m ≥ n, the only way for T m to happen strictly before T2n is for the process to make a move of at least size m before time T . Also notice that if m ≥ n, then M n = M n,m on the set {T m ≥ T2n }. b0 )-martingale. Fix s < t and cn is a (P, F We are now ready to show that M


CHAPTER 3. A CROSS PRODUCT CONSTRUCTION bounded Z ∈ Fbs0 = FT0 +s . Notice that Z1{T m ≥T2n } ∈ Fbs0 = FT0 +s , and write cn − M cn )] = E[Z(M n − M n )] E[Z(M t s T +t T +s n,m = lim E[Z1{T m ≥T2n } (MTn,m +t − MT +s )] m

= 0, where we have used bounded convergence, and the fact that M n,m is a (P, F0 )martingale. This means that Tbn is a localizing sequence, and M is a (P, Fb0 )local martingale. Combining Cor. 3.30 and Lem. 3.29 yields the following corollary. 3.31 Corollary. In addition to the assumptions of 3.19, let M be a continuous, real-valued process. Suppose that M is a local martingale with respect to both (F, P1 ) and (F, P2 ) and that ∆(M, T ) is σ G , ∆(X, T ) -measurable. Then M is an (F, P12 )-local martingale. Before we present the corresponding result for quadratic variation, we give an easy lemma. 3.32 Lemma. Let M be a uniformly integrable (F0 , P)-martingale, and let S, T , and U be F0 -stopping times with T ≤ U . If Z is an FT0 -measurable random variable, then E[(MU − MT ) Z | FS0 ] = (MU ∧S − MT ∧S ) Z. 3.33 Remark. We often apply this result with Z = YT for some process Y . In this situation, we have (MU ∧S − MT ∧S ) YT = (MU ∧S − MT ∧S ) YT ∧S as MU ∧S − MT ∧S is only nonzero if T < S.


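Proof. We write
\[
\begin{aligned}
\mathbb{E}\bigl[(M_U - M_T) Z \bigm| \mathcal{F}_S^0\bigr]
&= 1_{\{S \le T\}}\, \mathbb{E}\bigl[(M_U - M_T) Z \bigm| \mathcal{F}_S^0\bigr] + 1_{\{T < S \le U\}}\, \mathbb{E}\bigl[(M_U - M_T) Z \bigm| \mathcal{F}_S^0\bigr] + 1_{\{U < S\}}\, \mathbb{E}\bigl[(M_U - M_T) Z \bigm| \mathcal{F}_S^0\bigr] \\
&= 1_{\{S \le T\}}\, \mathbb{E}\Bigl[\underbrace{\mathbb{E}\bigl[(M_U - M_T) \bigm| \mathcal{F}_T^0\bigr]}_{=0}\, Z \Bigm| \mathcal{F}_S^0\Bigr] + 1_{\{T < S \le U\}} \bigl(\mathbb{E}[M_U \mid \mathcal{F}_S^0] - M_T\bigr) Z + 1_{\{U < S\}} (M_U - M_T) Z \\
&= 1_{\{T < S \le U\}} (M_S - M_T) Z + 1_{\{U < S\}} (M_U - M_T) Z \\
&= (M_{U \wedge S} - M_{T \wedge S}) Z.
\end{aligned}
\]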
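The identity of Lem. 3.32 can be checked exactly in a toy case. The sketch below is an illustration under assumptions that are not part of the text: $M$ is a simple symmetric random walk on $n$ steps, the stopping times are deterministic times $s$, $t \le u$, and $Z = M_t$ (the situation of Rem. 3.33 with $Y = M$). The helper names `walk` and `identity_holds` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def walk(signs):
    """Partial sums M_0 = 0, M_1, ..., M_n of a sequence of +/-1 steps."""
    path = [0]
    for step in signs:
        path.append(path[-1] + step)
    return path


def identity_holds(n, s, t, u):
    """Check E[(M_u - M_t) M_t | F_s] == (M_{u^s} - M_{t^s}) M_t on every atom of F_s."""
    assert 0 <= s <= n and 0 <= t <= u <= n
    for prefix in product((-1, 1), repeat=s):
        # All equally likely continuations of the first s steps.
        paths = [walk(prefix + tail) for tail in product((-1, 1), repeat=n - s)]
        lhs = Fraction(sum((m[u] - m[t]) * m[t] for m in paths), len(paths))
        # The right hand side must be constant on the atom; collect its values.
        rhs = {(m[min(u, s)] - m[min(t, s)]) * m[t] for m in paths}
        if rhs != {lhs}:
            return False
    return True
```

Collecting the right hand side into a set also confirms that it is constant on each atom of $\mathcal{F}_s$, which reflects the observation in Rem. 3.33 that $M_{U \wedge S} - M_{T \wedge S}$ vanishes unless $T < S$.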

3.34 Corollary. In addition to the assumptions of 3.19, let M 1 , M 2 , and C 1 2 be continuous, real-valued processes, and assume that i∆(M , T ), ∆(M , T ), and ∆(C, T ) are all σ G , ∆(X, T ) -measurable. If M is a local martingale under both P1 and P2 for i ∈ {1, 2}, and M 3 , M 1 M 2 − C is a local martingale under both P1 and P2 , then M 3 is a local martingale under P12 . Proof. We cannot apply Lem. 3.29 directly to M 3 as we do not assume 3 ∆(M , T ) ∈ σ G , ∆(X, T ) . Instead, we define the process 1 2 1 2 Yt , Mt∧T Mt∧T + (M 1 − Mt∧T )(M 2 − Mt∧T ) − Ct .

Notice that ∆(Y, T ) = ∆(M 1 , T )∆(M 2 , T )−∆(C, T ) ∈ σ G , ∆(X, T ) . Now let T n , inf{t ≥ 0 : |Mti | ≥ n for any i ∈ {1, 2, 3} or |Ct | ≥ n}. n

n

n

and define M i,n , (M i )T for i ∈ {1, 2, 3}, C n , C T , and Y n , Y T . Notice that M i,n is bounded by n for i ∈ {1, 2, 3} and Y n is bounded by 5n2 + n. We now show that Y n is a martingale under P1 and P2 . For 0 ≤ s ≤ t, we write 0 2,n 1,n 2,n 1,n Ei (Mt1,n − Mt∧T )Mt∧T Fs = (Ms1,n − Ms∧T Mt∧T (3.35) 1,n 2,n Ms∧T , = (Ms1,n − Ms∧T 2,n where we have applied the previous lemma with M = M 1,n , Z = Mt∧T , 1,n 1,n S = s, T = t ∧ T , and U = t and used the fact that (Ms − Ms∧T is zero


CHAPTER 3. A CROSS PRODUCT CONSTRUCTION if T ≥ s. Clearly the same equality holds if we reverse the roles of M 1,n and M 2,n . Now we use the fact that M 3,n is a martingale to write 0 1,n 2,n )Mt∧T Fs Ei Ytn Fs0 = Ei Mt3,n Fs0 − Ei (Mt1,n − Mt∧T 0 2,n 2,n 1,n i − E (Mt − Mt∧T )Mt∧T Fs 2,n 1,n 1,n 2,n = Ms3,n − (Ms1,n − Ms∧T Ms∧T − (Ms2,n − Ms∧T Ms∧T = Ysn , so Y is local martingale under P1 and P2 . By Cor. 3.31, M 1 , M 2 , and Y are all P12 -local martingales, so the stopped versions are P12 -martingales. This means that (3.35) holds for P12 as well, and we can run the above argument in the opposite direction. In particular, 0 1,n 2,n E12 Mt3,n F 0 = E12 Ytn Fs0 + E12 (Mt1,n − Mt∧T Fs )Mt∧T 0 2,n 2,n 1,n 12 + E (Mt − Mt∧T )Mt∧T Fs 2,n 1,n 1,n 2,n n = Ys + (Ms1,n − Ms∧T Ms∧T + (Ms2,n − Ms∧T Ms∧T = Ms3,n We conclude that M 3 is a P12 -local martingales. Putting all of this together, we see that this construction preserves the characteristics of continuous semimartingales. Recall that a continuous, Rd valued semimartingale X is said to admit the characteristics (B, C) with respect to (F, P) if we can write X = X0 + M + B, where M is a continuous (F, P)-local martingale with M0 = 0, B is a continuous process with B0 = 0 that is P-a.s. of finite variation, and C = hM i. 3.36 Corollary. In addition to the assumptions of 3.19, let Y be a continuous, Rd -valued that is a semimartingale which admits the characteristics 1 2 (B, C) with respect to both (F, P ) and (F, P ). If ∆(Y, T ), ∆(B, T ), and ∆(C, T ) are all σ G , ∆(X, T ) -measurable, then Y admits the characteristics (B, C) with respect to (F, P12 ). Proof. As B and C are both P1 and P2 -a.s. of finite variation, we may apply (a) of Cor. 3.21 to conclude that B and C are both P12 -a.s. of finite variation. We then notice that ∆(Y − B, T ) = ∆(Y, T ) − ∆(B, T ) ∈ σ G , ∆(X, T ) , so we may apply Cor. 3.31 to each component of M , Y − B to conclude that M is a (P12 , F)-local martingale. We then apply Cor. 
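3.34 to each component of $M \otimes M - C$ to conclude that $M \otimes M - C$ is a $(\mathbb{P}^{12}, \mathbb{F})$-local martingale. As $C$ is $\mathbb{P}^{12}$-a.s. of finite variation, we conclude that $\langle M \rangle = C$.

3.37 Example. Resume the setting of Example 3.18. As
\[ \sigma\bigl(\mathcal{G}, \Delta(X, t_1)\bigr) = \sigma\bigl(Y_{t_1}, \Delta(X, t_1)\bigr) = \sigma\bigl(\Theta(Y, t_1), \Delta(C, t_1)\bigr), \]
$\Delta(Y, t_1)$ and $\Delta(C, t_1)$ are both $\sigma\bigl(\mathcal{G}, \Delta(X, t_1)\bigr)$-measurable. Notice that under both $\mathbb{P} \triangleq \mathcal{L}(\tilde Y, \tilde C)$ and $\mathbb{Q} \triangleq \mathbb{P} \otimes_{t_1, \sigma(Y_{t_1})} \mathbb{P} = \mathcal{L}(Y, C)$, $Y$ is a continuous semimartingale with the characteristics $(0, C)$.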

3.3 The General Construction.

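We get the general construction announced at the beginning of this chapter by repeated application of the binary construction. Fortunately, the binary construction is associative.

3.38 Theorem. Let $\mathbb{P}^1$, $\mathbb{P}^2$, and $\mathbb{P}^3$ be measures on $\Omega$, and let $S \le T$ be finite $\mathbb{F}^0$-stopping times with $T - S \in \sigma\bigl(\mathcal{G}, \Delta(X, S)\bigr)$. Fix σ-fields $\mathcal{G} \subset \mathcal{F}_S^0$ and $\mathcal{H} \subset \sigma\bigl(\mathcal{G}, \Delta(X^T, S)\bigr)$. If $\mathbb{P}^1|_{\mathcal{G}} \ll \mathbb{P}^2|_{\mathcal{G}}$ and $\mathbb{P}^2|_{\mathcal{H}} \ll \mathbb{P}^3|_{\mathcal{H}}$, then
(a) $\mathbb{P}^1|_{\mathcal{G}} \ll \bigl(\mathbb{P}^2 \otimes_{T,\mathcal{H}} \mathbb{P}^3\bigr)|_{\mathcal{G}}$,
(b) $\bigl(\mathbb{P}^1 \otimes_{S,\mathcal{G}} \mathbb{P}^2\bigr)|_{\mathcal{H}} \ll \mathbb{P}^3|_{\mathcal{H}}$, and
(c) $\bigl(\mathbb{P}^1 \otimes_{S,\mathcal{G}} \mathbb{P}^2\bigr) \otimes_{T,\mathcal{H}} \mathbb{P}^3 = \mathbb{P}^1 \otimes_{S,\mathcal{G}} \bigl(\mathbb{P}^2 \otimes_{T,\mathcal{H}} \mathbb{P}^3\bigr)$.

Proof. To reduce the now rather burdensome notation, we set $\mathbb{P}^{12} \triangleq \mathbb{P}^1 \otimes_{S,\mathcal{G}} \mathbb{P}^2$, $\mathbb{P}^{23} \triangleq \mathbb{P}^2 \otimes_{T,\mathcal{H}} \mathbb{P}^3$, $\mathbb{P}^{12,3} \triangleq \bigl(\mathbb{P}^1 \otimes_{S,\mathcal{G}} \mathbb{P}^2\bigr) \otimes_{T,\mathcal{H}} \mathbb{P}^3$, and $\mathbb{P}^{1,23} \triangleq \mathbb{P}^1 \otimes_{S,\mathcal{G}} \bigl(\mathbb{P}^2 \otimes_{T,\mathcal{H}} \mathbb{P}^3\bigr)$.

(a) Fix $A \in \mathcal{G}$ with $\mathbb{P}^1[A] > 0$. $A \in \mathcal{F}_T^0$ and (a) of Cor. 3.15 imply $\mathbb{P}^{23}[A] = \mathbb{P}^2[A]$, and $\mathbb{P}^2[A] > 0$ as $\mathbb{P}^1|_{\mathcal{G}} \ll \mathbb{P}^2|_{\mathcal{G}}$.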


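(b) Fix $A \in \mathcal{H}$ with $\mathbb{P}^3[A] = 0$, so $\mathbb{P}^2[A] = 0$ as $\mathbb{P}^2|_{\mathcal{H}} \ll \mathbb{P}^3|_{\mathcal{H}}$. This means that $0$ is a version of $\mathbb{P}^2[A \mid \mathcal{G}]$, but $A \in \sigma\bigl(\mathcal{G}, \Delta(X^T, S)\bigr) \subset \sigma\bigl(\mathcal{G}, \Delta(X, S)\bigr)$, so $0$ is also a version of $\mathbb{P}^{12}[A \mid \mathcal{F}_S^0]$ by (b) of Cor. 3.15 and $\mathbb{P}^{12}[A] = 0$.

(c) Fix $G \in \mathcal{G}$, $B \in \sigma\bigl(\Delta(X^T, S)\bigr)$, and $C \in \sigma\bigl(\Delta(X, T)\bigr)$. Let $Z$ be any version of $\mathbb{E}^3[1_C \mid \mathcal{H}]$ and $Y$ be any version of $\mathbb{E}^2[1_B Z \mid \mathcal{G}]$. Two applications of Cor. 3.15 give
\[ \mathbb{E}^{23}[1_G Y] = \mathbb{E}^{2}[1_G Y] = \mathbb{E}^{2}[1_{G \cap B} Z] = \mathbb{E}^{23}[1_{G \cap B} Z] = \mathbb{P}^{23}[G \cap B \cap C]. \]
This means that any version of $\mathbb{E}^2[1_B Z \mid \mathcal{G}]$ is a version of $\mathbb{E}^{23}[1_{B \cap C} \mid \mathcal{G}]$. We will use this fact in the next chain of equalities. Now fix $A \in \mathcal{F}_S^0$ as well. Again using the properties listed in Cor. 3.15, we have
\[
\begin{aligned}
\mathbb{P}^{12,3}[A \cap B \cap C] &= \mathbb{E}^{12}\bigl[1_{A \cap B}\, \mathbb{E}^3[1_C \mid \mathcal{H}]\bigr] \\
&= \mathbb{E}^{1}\Bigl[1_A\, \mathbb{E}^2\bigl[1_B\, \mathbb{E}^3[1_C \mid \mathcal{H}] \bigm| \mathcal{G}\bigr]\Bigr] \\
&= \mathbb{E}^{1,23}\Bigl[1_A\, \mathbb{E}^2\bigl[1_B\, \mathbb{E}^3[1_C \mid \mathcal{H}] \bigm| \mathcal{G}\bigr]\Bigr] \\
&= \mathbb{E}^{1,23}\bigl[1_A\, \mathbb{E}^{23}[1_{B \cap C} \mid \mathcal{G}]\bigr] = \mathbb{P}^{1,23}[A \cap B \cap C].
\end{aligned}
\]
As we have $\sigma\bigl(\mathcal{F}_S^0, \Delta(X^T, S), \Delta(X, T)\bigr) = \sigma\bigl(\mathcal{F}_T^0, \Delta(X, T)\bigr) = \mathcal{F}$ (e.g., Lem. A.3), the measures agree on a π-system that generates $\mathcal{F}$, which is enough to conclude that the measures agree everywhere.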
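Associativity can be verified by exact arithmetic in a finite analogue of the construction. The sketch below is not the setting of Thm. 3.38: it assumes three-step paths with $\pm 1$ increments, deterministic times $S = 1$ and $T = 2$, and the choices $\mathcal{G} = \sigma(X_1)$, $\mathcal{H} = \sigma(X_2)$. The function `glue` and the weight vectors are hypothetical; all weights are strictly positive so that the absolute continuity conditions hold trivially.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import accumulate, product

N = 3  # number of +/-1 steps; a path is the tuple (X_1, X_2, X_3) with X_0 = 0
PATHS = [tuple(accumulate(steps)) for steps in product((-1, 1), repeat=N)]


def measure(weights):
    """Normalize positive integer weights into a probability measure on PATHS."""
    total = sum(weights)
    return {x: Fraction(w, total) for x, w in zip(PATHS, weights)}


def law(P, statistic):
    """Distribution of statistic(path) under P."""
    out = defaultdict(Fraction)
    for x, p in P.items():
        out[statistic(x)] += p
    return out


def glue(Pa, Pb, k):
    """Finite analogue of Pa (x)_{k, sigma(X_k)} Pb.

    Uses Pa for the path up to time k, then Pb's conditional law of the
    increments after time k given the position X_k.
    """
    before = law(Pa, lambda x: x[:k])
    at_k = law(Pb, lambda x: x[k - 1])
    after = law(Pb, lambda x: (x[k - 1], tuple(v - x[k - 1] for v in x[k:])))
    out = {}
    for x in PATHS:
        inc = tuple(v - x[k - 1] for v in x[k:])
        out[x] = before[x[:k]] * after[(x[k - 1], inc)] / at_k[x[k - 1]]
    return out


# Hypothetical strictly positive measures.
P1 = measure([1, 2, 3, 4, 5, 6, 7, 8])
P2 = measure([8, 1, 6, 3, 5, 7, 2, 4])
P3 = measure([2, 2, 1, 5, 3, 1, 4, 6])

left = glue(glue(P1, P2, 1), P3, 2)   # (P1 (x) P2) (x) P3
right = glue(P1, glue(P2, P3, 2), 1)  # P1 (x) (P2 (x) P3)
```

In this toy model both bracketings give the same measure, and the glued measure `glue(P1, P2, 1)` keeps the law of $X_1$ under `P1`, which is the finite counterpart of property (a) of Cor. 3.15. Exact rational arithmetic is used so that the comparison does not depend on floating-point tolerance.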

We now have everything that we need for the proof of Thm. 3.6. We restate the result below for the reader’s convenience.

3.6 Theorem. Let $P$ be a probability measure on $\Omega$ and $\Pi = \{(T_i, \mathcal{G}_i)\}_{0 \le i \le n}$ be an extended partition. Then there exists a unique measure, denoted $P^{\otimes\Pi}$, such that

(a) $P^{\otimes\Pi}[A] = P[A]$ for $A \in \cup_i \mathcal{H}_i$, and

(b) any version of $P[B \mid \mathcal{G}_i]$ is a version of $P^{\otimes\Pi}[B \mid \mathcal{F}^0_{T_i}]$ for $B \in \mathcal{H}_{i+1}$ and $0 \le i \le n$,

where $\mathcal{H}_i \triangleq \sigma(\mathcal{G}_{i-1}, \Delta(X^{T_i}, T_{i-1}))$ for $1 \le i \le n+1$.

Proof. First notice that if $0 \le i \le j \le k \le l \le n$, then repeated applications of Cor. A.7 give
\[ \sigma\big(\mathcal{G}_j, \Delta(X^{T_{k+1}}, T_j)\big) \subset \sigma\big(\mathcal{G}_i, \Delta(X^{T_{l+1}}, T_i)\big). \]
We will make use of this fact repeatedly, and without further mention. We will argue inductively, making the inductive assumption that there exists a measure $P^m$ such that

(a) $P^m[A] = P[A]$ if $A \in \mathcal{H}_i$ and $0 \le i \le n+1$, and

(b) any version of $P[B \mid \mathcal{G}_i]$ is a version of $P^m[B \mid \mathcal{F}^0_{T_i}]$ if $B \in \mathcal{H}_{i+1}$ and $0 \le i \le m$.

Setting $P^{-1} \triangleq P$, it is clear that (a) holds trivially and (b) holds vacuously for the base case $m = -1$. We now assume that some $P^m$ exists which satisfies (a) and (b). As $\mathcal{G}_{m+1} \subset \mathcal{H}_{m+1}$, we have $P^m|_{\mathcal{G}_{m+1}} = P|_{\mathcal{G}_{m+1}}$ by assumption (a), so we may define $P^{m+1} \triangleq P^m \otimes_{T_{m+1},\mathcal{G}_{m+1}} P$.

If $A \in \mathcal{H}_i$ for some $i \le m+1$, then $A \in \mathcal{F}^0_{T_{m+1}}$ and $P^{m+1}[A] = P^m[A] = P[A]$ by (a) of Cor. 3.15 and the first inductive assumption. If $A \in \mathcal{H}_i$ for some $i > m+1$, then $\mathcal{H}_i \subset \sigma(\mathcal{G}_{m+1}, \Delta(X, T_{m+1}))$ and $P^{m+1}[A] = P[A]$ by (b) of Cor. 3.15. Either way, (a) is satisfied by $P^{m+1}$.

If $A \in \mathcal{F}^0_{T_i}$, $B \in \mathcal{H}_{i+1}$ for some $i \le m$, and $Z$ is any version of $P[B \mid \mathcal{G}_i]$, then we note that $A \cap B$ and $1_A Z$ are both $\mathcal{F}^0_{T_{m+1}}$-measurable, so applying (a) of Cor. 3.15, followed by our inductive assumption (b), and then (a) of Cor. 3.15 again, gives
\[ E^{m+1}[1_A Z] = E^m[1_A Z] = P^m[A \cap B] = P^{m+1}[A \cap B], \]
and $Z$ is a version of $E^{m+1}[1_B \mid \mathcal{F}^0_{T_i}]$. If $B \in \mathcal{H}_{m+2} \subset \sigma(\mathcal{G}_{m+1}, \Delta(X, T_{m+1}))$, then (b) of Cor. 3.15 says that any version of $E[1_B \mid \mathcal{G}_{m+1}]$ is a version of $E^{m+1}[1_B \mid \mathcal{F}^0_{T_{m+1}}]$, so $P^{m+1}$ satisfies the inductive assumption (b) for all $0 \le i \le m+1$.

It is then clear that $P^n$ satisfies properties (a) and (b). To see that this measure is unique, fix any $A_0 \in \sigma(E) = \mathcal{F}^0_0$ and $A_i \in \sigma(\Delta(X^{T_i}, T_{i-1}))$ for $1 \le i \le n+1$. Then
\begin{align*}
P^n\big[\cap_{i=0}^{n+1} A_i\big] &= E^n\Big[1_{A_0}\, E^n\big[1_{A_1} \cdots E^n[1_{A_{n+1}} \mid \mathcal{F}^0_{T_n}] \cdots \,\big|\, \mathcal{F}^0_0\big]\Big] \\
&= E^n\Big[1_{A_0}\, E^n\big[1_{A_1} \cdots E[1_{A_{n+1}} \mid \mathcal{G}_n] \cdots \,\big|\, \mathcal{F}^0_0\big]\Big] \\
&= E\Big[1_{A_0}\, E\big[1_{A_1} \cdots E[1_{A_{n+1}} \mid \mathcal{G}_n] \cdots \,\big|\, \mathcal{G}_0\big]\Big],
\end{align*}
so the probability assigned to the event $\cap_{i=0}^{n+1} A_i$ is fully determined by $P$ and the properties (a) and (b). As $\sigma(E, \Delta(X^{T_1}, 0), \ldots, \Delta(X^{T_{n+1}}, T_n)) = \mathcal{F}$, any two measures which agree on sets of the form $\cap_{i=0}^{n+1} A_i$ must agree on all of $\mathcal{F}$ by the π-λ theorem.

Now we quickly check that the properties of the original measure which were preserved by the binary construction are also preserved by the general construction. All of these proofs are essentially the same, and we use induction to reduce to the binary case.

3.39 Lemma. Let $P$ be a measure on $\Omega$ and $\Pi = \{(T_i, \mathcal{G}_i)\}_{0 \le i \le n}$ be an extended partition. Let $A$ be a continuous, $\mathbb{R}^d$-valued process, and assume that $\Delta(A, T_i)$ is $\sigma(\mathcal{G}_i, \Delta(X, T_i))$-measurable for each $i \in \{1, \ldots, n\}$. Then the following two implications hold.

(a) If $A$ is $P$-a.s. of finite variation, then $A$ is $P^{\otimes\Pi}$-a.s. of finite variation.

(b) If $A$ is $P$-a.s. absolutely continuous, then $A$ is $P^{\otimes\Pi}$-a.s. absolutely continuous.

Proof. Assume that $A$ is $P$-a.s. of finite variation, and set $P^0 \triangleq P \otimes_{T_0,\mathcal{G}_0} P$. It then follows from Cor. 3.21 that $A$ is $P^0$-a.s. of finite variation. We now proceed by induction, so assuming that $A$ is $P^i$-a.s. of finite variation and setting $P^{i+1} = P^i \otimes_{T_{i+1},\mathcal{G}_{i+1}} P$, it again follows from Cor. 3.21 that $A$ is $P^{i+1}$-a.s. of finite variation. As $P^n = P^{\otimes\Pi}$, we have (a). Assertion (b) follows in the same way.

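3.40 Lemma. Let $P$ be a measure on $\Omega$ and $\Pi = \{(T_i, \mathcal{G}_i)\}_{0 \le i \le n}$ be an extended partition. Let $A$ be a continuous, $\mathbb{R}^d$-valued process such that $\Delta(A, T_i)$ is $\sigma(\mathcal{G}_i, \Delta(X, T_i))$-measurable for each $i \in \{1, \ldots, n\}$. Let $a$ be a measurable, $\mathbb{R}^d$-valued process such that the set
\[ B(\omega) \triangleq \Big\{ t \in \mathbb{R}_+ : \tfrac{\partial}{\partial t} A_t(\omega) \text{ exists and } a_t(\omega) \ne \tfrac{\partial}{\partial t} A_t(\omega) \Big\} \]
has Lebesgue measure 0 for all $\omega$. Finally, let $S$ be an $\mathbb{R}_+$-valued $\mathbb{F}^0$-stopping time such that $(S - T_i)^+$ is $\sigma(\mathcal{G}_i, \Delta(X, T_i))$-measurable for each $i \in \{1, \ldots, n\}$. If $P\big[A_t = \int_0^t a_u\,du \ \forall t\big] = 1$, then $P^{\otimes\Pi}\big[A_t = \int_0^t a_u\,du \ \forall t\big] = 1$ and
\[ E^{\otimes\Pi}\Big[\int_0^S f(a_u)\,du\Big] = E\Big[\int_0^S f(a_u)\,du\Big]. \tag{3.41} \]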

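Proof. Set $P^0 \triangleq P \otimes_{T_0,\mathcal{G}_0} P$. It then follows from Lem. 3.22 that
\begin{align*}
E^0\Big[\int_0^S f(a_u)\,du\Big] &= E\Big[\int_0^{T_0 \wedge S} f(a_u)\,du\Big] + E\Big[\int_{T_0}^{T_0 \vee S} f(a_u)\,du\Big] \\
&= E\Big[\int_0^S f(a_u)\,du\Big].
\end{align*}
We then proceed by induction, setting $P^{i+1} \triangleq P^i \otimes_{T_{i+1},\mathcal{G}_{i+1}} P$, and applying Lem. 3.22 to conclude that
\[ E^{i+1}\Big[\int_0^S f(a_u)\,du\Big] = E\Big[\int_0^S f(a_u)\,du\Big]. \]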

As $P^n = P^{\otimes\Pi}$, we are done.

3.42 Corollary. Let $P$ be a measure on $\Omega$, let $A$ be a continuous, $\mathbb{R}^d$-valued, $P$-a.s. absolutely continuous process, let $a$ be a measurable, $\mathbb{R}^d$-valued process, and let $\Pi(n) = \{(T_i^n, \mathcal{G}_i^n)\}_{0 \le i \le N(n)}$ be an extended partition for each $n$. Set $T^n_{N(n)+1} \triangleq \infty$ and $P^n \triangleq P^{\otimes\Pi(n)}$, and assume that $a_t(\omega) = \frac{\partial}{\partial t} A_t(\omega)$ whenever this derivative exists. Further assume that $T_i^n$ and $\Delta(A, T_i^n)$ are $\sigma(\mathcal{G}_i^n, \Delta(X, T_i^n))$-measurable for all $n$ and $i \in \{0, \ldots, N(n)\}$. If
\[ E^P\Big[\int_0^t \|a_u\|\,du\Big] < \infty \quad \forall t \in \mathbb{R}_+, \tag{3.43} \]
then $a$ is uniformly integrable with respect to $\{P^n \times \lambda_{[0,t]}\}_n$ for each $t \in \mathbb{R}_+$.

Proof. Fix any $t$ and $\varepsilon > 0$ and choose $M$ so large that
\[ E^P\Big[\int_0^t \big(\|a_u\| - M\big)^+ du\Big] < \varepsilon \]
using the integrability assumption (3.43). Applying the previous lemma with $f(x) = (\|x\| - M)^+$ and $S = t$, we have
\[ E^n\Big[\int_0^t \big(\|a_u\| - M\big)^+ du\Big] = E^P\Big[\int_0^t \big(\|a_u\| - M\big)^+ du\Big] < \varepsilon. \]
This shows that $a$ is uniformly integrable with respect to $\{P^n \times \lambda_{[0,t]}\}_n$.

3.44 Corollary. Let $P$ be a measure on $\Omega$, let $A$ be a continuous, $\mathbb{R}^d$-valued, $P$-a.s. absolutely continuous process, and let $\Pi(n) = \{(T_i^n, \mathcal{G}_i^n)\}_{0 \le i \le N(n)}$ be an extended partition for each $n$. Set $T^n_{N(n)+1} \triangleq \infty$ and $P^n \triangleq P^{\otimes\Pi(n)}$. Further suppose that $T_i^n$ and $\Delta(A, T_i^n)$ are both $\sigma(\mathcal{G}_i^n, \Delta(X, T_i^n))$-measurable for all $n$ and $i \in \{0, \ldots, N(n)\}$ and that
\[ E^P[\mathrm{Var}_t(A)] < \infty \quad \forall t \in \mathbb{R}_+. \tag{3.45} \]
Then the collection $\{\mathcal{L}(A \mid P^n)\}_n$ is tight.

Proof. By taking the limit of divided differences on the left (e.g., Lem. C.10), we may find an $\mathbb{F}^0$-predictable process $a$ such that $a_t(\omega) = \frac{\partial}{\partial t} A_t(\omega)$ whenever this derivative exists. As $A$ is $P$-a.s. absolutely continuous, pathwise arguments show that we have
\begin{align*}
& P\Big[A_t = \int_0^t a_u\,du \ \forall t\Big] = 1, \text{ and} \\
& P\Big[\mathrm{Var}_t(A) = \int_0^t \|a_u\|\,du \ \forall t\Big] = 1. \tag{3.46}
\end{align*}
Combining (3.45) and (3.46), we see that we may apply Cor. 3.42 to conclude that $a$ is uniformly integrable with respect to $\{P^n \times \lambda_{[0,t]}\}_n$ for each $t$. As $A_0 = 0$, we only need to check that we can control the modulus of continuity on each compact interval $[0, t]$ to conclude that $\{\mathcal{L}(A \mid P^n)\}_n$ is tight. Fix some $T > 0$. Using the uniform integrability, we may choose $M$ so large that
\[ \sup_n E^n\Big[\int_0^T \big(\|a_u\| - M\big)^+ du\Big] < \varepsilon^2/2. \]
Setting $\delta = \varepsilon^2/(2M)$ and letting $D(\delta, T) \triangleq \{(s,t) \in \mathbb{R}_+^2 : s \le t \le T \wedge (s + \delta)\}$, we have
\begin{align*}
P^n\Big[\sup\big\{\|A_t - A_s\| : (s,t) \in D(\delta,T)\big\} \ge \varepsilon\Big]
&\le P^n\Big[\sup\Big\{\int_s^t \|a_u\|\,du : (s,t) \in D(\delta,T)\Big\} \ge \varepsilon\Big] \\
&\le \frac{1}{\varepsilon}\, E^n\Big[\sup\Big\{\int_s^t \|a_u\|\,du : (s,t) \in D(\delta,T)\Big\}\Big] \\
&\le \frac{1}{\varepsilon}\, E^n\Big[\delta M + \int_0^T \big(\|a_u\| - M\big)^+ du\Big] \\
&\le \varepsilon,
\end{align*}
so $\{\mathcal{L}(A \mid P^n)\}_n$ is tight.

3.47 Lemma. Let $P$ be a measure on $\Omega$, and let $\Pi = \{(T_i, \mathcal{G}_i)\}_{0 \le i \le n}$ be an extended partition. Let $Y$ be a continuous, $\mathbb{R}^d$-valued process, and suppose that $Y$ is a semimartingale with the characteristics $(B, C)$ under $P$. If $\Delta(Y, T_i)$, $\Delta(B, T_i)$, and $\Delta(C, T_i)$ are all $\sigma(\mathcal{G}_i, \Delta(X, T_i))$-measurable for each $i \in \{1, \ldots, n\}$, then $Y$ is a semimartingale with the characteristics $(B, C)$ under $P^{\otimes\Pi}$.

Proof. Setting $P^0 \triangleq P \otimes_{T_0,\mathcal{G}_0} P$, we may apply Cor. 3.36 to conclude that $Y$ has characteristics $(B, C)$ under $P^0$. We then proceed inductively, setting $P^{i+1} = P^i \otimes_{T_{i+1},\mathcal{G}_{i+1}} P$ and applying Cor. 3.36 to conclude that $Y$ has characteristics $(B, C)$ under $P^{i+1}$ for each $i < n$. As $P^n = P^{\otimes\Pi}$, we are done.

Chapter 4

Main Theorem

In this chapter, we develop the main theorem of this dissertation. In Section 4.1, we develop some results related to conditional expectations. In Section 4.2, we develop some approximation lemmas that we will need for the proof of the main result, and in Section 4.3 we prove Thm. 2.11.

4.1

Conditional Expectation Lemmas

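The results of this section are implicit in [Gyö86].

4.1 Lemma. Let $(E, \mathcal{E})$ be a measurable space with a countably generated σ-field, let $Y$ be an $E$-valued process, let $z$ be an $\mathbb{R}^d$-valued process with
\[ E\Big[\int_0^t \|z_u\|\,du\Big] < \infty \quad \forall t \in \mathbb{R}_+, \tag{4.2} \]
and let $\hat z : E \times \mathbb{R}_+ \to \mathbb{R}^d$ be a (deterministic) $\mathcal{E} \otimes \mathcal{R}_+$-measurable function. Then $\hat z(Y_t, t)$ is a version of $E[z_t \mid Y_t]$ for Lebesgue-a.e. $t$ if and only if
\[ E\Big[\int_0^t \hat z(Y_u, u)\, f(Y_u, u)\,du\Big] = E\Big[\int_0^t z_u\, f(Y_u, u)\,du\Big] \tag{4.3} \]
for all $t \in \mathbb{R}_+$ and all bounded $f : E \times \mathbb{R}_+ \to \mathbb{R}^d$ that are $\mathcal{E} \otimes \mathcal{R}_+$-measurable. Moreover, in this case we have
\[ E\Big[\int_0^t \|\hat z(Y_u, u)\|\,du\Big] < \infty \quad \forall t \in \mathbb{R}_+. \tag{4.4} \]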
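As a sanity check on the characterization in Lem. 4.1, the following Python sketch replaces Lebesgue measure on $[0, t]$ by counting measure on a finite time grid and works on a four-point probability space. The model (`OMEGA`, `TIMES`, `Y`, `z`) and all helper names are hypothetical and chosen only for illustration; they do not come from the text. Exact rational arithmetic is used so that the two sides of the discrete analogue of (4.3) can be compared without rounding, and scalar indicator functions play the role of the test functions $f$.

```python
from fractions import Fraction

# Hypothetical finite model: four equally likely outcomes, three time points.
OMEGA = range(4)
TIMES = range(3)
PROB = Fraction(1, 4)

def Y(t, w):
    # Observed state at time t; takes the values 0 and 1.
    return (w + t) % 2

def z(t, w):
    # Process whose conditional expectation given Y_t is sought.
    return Fraction(w * (t + 1))

def z_hat(y, t):
    # Candidate for E[z_t | Y_t = y], computed as a ratio of two sums.
    num = sum(PROB * z(t, w) for w in OMEGA if Y(t, w) == y)
    den = sum(PROB for w in OMEGA if Y(t, w) == y)
    return num / den

def z_bar(y, t):
    # A deliberately wrong candidate: the unconditional mean E[z_t].
    return sum(PROB * z(t, w) for w in OMEGA)

def sides(candidate, f):
    # Discrete analogue of the two sides of (4.3): sums over time replace integrals.
    lhs = sum(PROB * candidate(Y(t, w), t) * f(Y(t, w), t)
              for t in TIMES for w in OMEGA)
    rhs = sum(PROB * z(t, w) * f(Y(t, w), t)
              for t in TIMES for w in OMEGA)
    return lhs, rhs

def indicator(y0, t0):
    # Test function f(y, t) = 1 if (y, t) == (y0, t0), else 0.
    return lambda y, t: 1 if (y, t) == (y0, t0) else 0

TEST_FUNCTIONS = [indicator(y0, t0) for y0 in (0, 1) for t0 in TIMES]

good = all(lhs == rhs for lhs, rhs in (sides(z_hat, f) for f in TEST_FUNCTIONS))
bad = all(lhs == rhs for lhs, rhs in (sides(z_bar, f) for f in TEST_FUNCTIONS))
print(good, bad)
```

In this toy model the conditional mean passes every indicator test, while the unconditional mean already fails for the indicator of $\{Y_0 = 0\}$, which mirrors the "if and only if" in the lemma.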

Proof. Suppose that $\hat z(Y_t, t)$ is a version of $E[z_t \mid Y_t]$ when $t \notin N$, where $N \subset \mathbb{R}_+$ is a Lebesgue-null set. It then follows that $E\|\hat z(Y_t, t)\| \le E\|z_t\|$ when $t \notin N$. We may apply Tonelli's Theorem and (4.2) to conclude that (4.4) holds, and (4.3) then follows by Fubini's Theorem.

Now assume that (4.3) holds. Taking $f = 1$ in (4.3), we see that we have $E\big[\int_0^t \|\hat z(Y_s, s)\|\,ds\big] < \infty$ for all $t$ (recall Rem. 1.7). Set
\[ N_1 \triangleq \big\{ t \in \mathbb{R}_+ : E\big[\|\hat z(Y_t, t)\| + \|z_t\|\big] = \infty \big\}. \]
By Tonelli's Theorem, $N_1$ is an $\mathcal{R}_+$-measurable Lebesgue-null set. Let $\mathcal{C} = \{C_n\}_{n \in \mathbb{N}}$ denote a countable collection of sets that generate $\mathcal{E}$. Without loss of generality, we may assume that $\mathcal{C}$ is closed with respect to finite intersections. Now define
\[ g_n(t) \triangleq E\big[\big(\hat z(Y_t, t) - z_t\big)\, 1_{\{Y_t \in C_n\}}\big], \]
so $g_n$ is $\mathcal{R}_+/\mathcal{R}^d$-measurable by Fubini's Theorem. Suppose that $A \subset [0, t]$ is $\mathcal{R}_+$-measurable. Then (4.3) implies that
\[ \int_A g_n(u)\,du = E\Big[\int_0^t \big(\hat z(Y_u, u) - z_u\big)\, 1_{\{(Y_u, u) \in C_n \times A\}}\,du\Big] = 0. \]
As this is true for all such $A$, we have $g_n = 0$ Lebesgue-a.e. when restricted to $[0, t]$ (e.g., [Roy88] Lem. 5.3.8). Letting $t \to \infty$, we see that $g_n = 0$ Lebesgue-a.e. on $\mathbb{R}_+$. Setting $N_2 = \cup_n \{t \in \mathbb{R}_+ : g_n(t) \ne 0\}$, Fubini's Theorem implies that $N_2$ is an $\mathcal{R}_+$-measurable Lebesgue-null set. Now consider the class of sets
\[ \mathcal{B} \triangleq \big\{ B \in \mathcal{E} : E\big[\hat z(Y_t, t)\, 1_{\{Y_t \in B\}}\big] = E\big[z_t\, 1_{\{Y_t \in B\}}\big] \text{ for all } t \notin N_1 \cup N_2 \big\}. \]
We have $\mathcal{C} \subset \mathcal{B}$ by construction and $\mathcal{B}$ is a monotone class by dominated convergence, so $\mathcal{B} = \mathcal{E}$. In particular, $\hat z(Y_t, t)$ is a version of $E[z_t \mid Y_t]$ for all $t \notin N_1 \cup N_2$.

4.5 Corollary. Let $(E, \mathcal{E})$ be a measurable space with a countably generated σ-field, let $Y$ be an $E$-valued process, and let $z$ be a $K$-valued process, where $K$ is a closed convex subset of $\mathbb{R}^d$ or $\mathbb{R}^d \otimes \mathbb{R}^r$. If
\[ E\Big[\int_0^t \|z_s\|\,ds\Big] < \infty \quad \forall t \in \mathbb{R}_+, \]
then there exists a (deterministic) $\mathcal{E} \otimes \mathcal{R}_+$-measurable function $\hat z : E \times \mathbb{R}_+ \to K$ such that $\hat z(Y_t, t)$ is a version of $E[z_t \mid Y_t]$ for Lebesgue-a.e. $t$.

4.6 Remark. $S^d_+$ is a closed convex subset of $\mathbb{R}^d \otimes \mathbb{R}^d$, so if $z$ takes values in $S^d_+$, then the corollary asserts that we may choose $\hat z : E \times \mathbb{R}_+ \to S^d_+$.

Proof. We consider the case $K \subset \mathbb{R}^d$ with $z = (z^1, \ldots, z^d)$. Define the following σ-finite, signed measures on $\mathcal{E} \otimes \mathcal{R}_+$:
\begin{align*}
\mu(A) &\triangleq E\Big[\int_0^\infty 1_A(Y_u, u)\,du\Big], \text{ and} \\
\nu^i(A) &\triangleq E\Big[\int_0^\infty z^i_u\, 1_A(Y_u, u)\,du\Big] \quad \text{for } i \in \{1, \ldots, d\}.
\end{align*}

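It is clear from these definitions that $\nu^i \ll \mu$ for each $i$. As $\mu$ is σ-finite, the Radon-Nikodym derivatives $d\nu^i/d\mu$ are well-defined for each $i$. Let $\tilde z = (\tilde z^1, \ldots, \tilde z^d)$ denote the function with $\tilde z^i = d\nu^i/d\mu$ for each $i$. Fixing any bounded, $\mathcal{E} \otimes \mathcal{R}_+/\mathcal{R}^d$-measurable $g = (g^1, \ldots, g^d) : E \times \mathbb{R}_+ \to \mathbb{R}^d$ and letting $\{e_1, \ldots, e_d\}$ denote the canonical basis on $\mathbb{R}^d$, we have
\begin{align*}
E\Big[\int_0^t \tilde z(Y_u, u)\, g(Y_u, u)\,du\Big]
&= \sum_{i=1}^d e_i \int_{E \times [0,t]} \tilde z^i(y, u)\, g(y, u)\, \mu(dy, du) \\
&= \sum_{i=1}^d e_i \int_{E \times [0,t]} g(y, u)\, \nu^i(dy, du) \\
&= E\Big[\int_0^t z_u\, g(Y_u, u)\,du\Big].
\end{align*}
We now show that $\tilde z$ takes values in $K$, $\mu$-a.e. We argue by contradiction, so assume that
\[ \int_{E \times \mathbb{R}_+} 1_{\{\tilde z(e,t) \notin K\}}\, \mu(de, dt) > 0. \tag{4.7} \]
As $\mathbb{R}^d$ is separable, we may write $K$ as an intersection of a countable collection of closed half-spaces. In particular, we have $K = \cap_n H_n$ where $H_n = \{x \in \mathbb{R}^d : (x, y'_n) \le \alpha_n\}$ with $y'_n \in \mathbb{R}^d$ and $\alpha_n \in \mathbb{R}$. This means that $K^c = \cup_n H_n^c$. As we have assumed (4.7), there must exist some $N$ and $T$ with
\[ \int_{E \times [0,T]} 1_{\{\tilde z(e,t) \in H_N^c\}}\, \mu(de, dt) > 0. \]
But then
\begin{align*}
\alpha_N\, E\Big[\int_0^T 1_{\{\tilde z(Y_s,s) \in H_N^c\}}\,ds\Big]
&< E\Big[\int_0^T \big(\tilde z(Y_s, s), y'_N\big)\, 1_{\{\tilde z(Y_s,s) \in H_N^c\}}\,ds\Big] \\
&= E\Big[\int_0^T \big(z_s, y'_N\big)\, 1_{\{\tilde z(Y_s,s) \in H_N^c\}}\,ds\Big] \\
&\le \alpha_N\, E\Big[\int_0^T 1_{\{\tilde z(Y_s,s) \in H_N^c\}}\,ds\Big],
\end{align*}
which is a contradiction. We conclude that $\tilde z$ takes values in $K$, $\mu$-a.e. We then pick any $k \in K$ and define
\[ \hat z(e, t) \triangleq \tilde z(e, t)\, 1_{\{\tilde z(e,t) \in K\}} + k\, 1_{\{\tilde z(e,t) \notin K\}}. \]
We have
\[ E\Big[\int_0^t \hat z(Y_u, u)\, g(Y_u, u)\,du\Big] = E\Big[\int_0^t \tilde z(Y_u, u)\, g(Y_u, u)\,du\Big] = E\Big[\int_0^t z_u\, g(Y_u, u)\,du\Big], \]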

so we may apply the previous lemma to conclude that $\hat z(Y_t, t)$ is a version of $E[z_t \mid Y_t]$ for Lebesgue-a.e. $t$. The case where $z$ takes values in $\mathbb{R}^d \otimes \mathbb{R}^r$ follows in the same way.

4.8 Definition. Let $(\Omega, \mathcal{F}, \mathbb{F}, P)$ be a stochastic basis which supports processes $\{X^i\}_i$ and a random variable $T$. We say that $T$ is strongly independent of the processes $\{X^i\}_i$ if there exists a σ-field $\mathcal{G} \subset \mathcal{F}$ such that $X^i$ is $\mathcal{G} \otimes \mathcal{R}_+$-measurable for each $i$ and $\sigma(T)$ is independent of $\mathcal{G}$.

4.9 Remark. The statement that “$X$ is independent of $T$” means that $\sigma(X) \triangleq \sigma(X_t : t \in \mathbb{R}_+)$ and $\sigma(T)$ are independent. Unfortunately, for general measurable processes $X$, we cannot immediately conclude that $X \in \sigma(X) \otimes \mathcal{R}_+$. Indeed, if this were true, it would imply that every adapted process is progressive without modification. If the sample paths of $X$ have enough regularity that we may write $X$ as the pointwise limit of simple functions, then it is of course true that $X \in \sigma(X) \otimes \mathcal{R}_+$. We give the previous definition so that we may handle the situation where we do not assume any sample path regularity.

4.10 Lemma. Let $(\Omega, \mathcal{F}, \mathbb{F}, P)$ be a stochastic basis which supports an $\mathbb{R}^d$-valued process $z$ and an $\mathbb{R}_+$-valued random time $T$ with law $\mu \triangleq \mathcal{L}(T)$. If $T$ is strongly independent of $z$, then
\[ \int_{\mathbb{R}_+} E[z_t]\, \mu(dt) = E[z_T]. \tag{4.11} \]
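The role of independence in (4.11) can be seen in a small finite example. The Python sketch below is an illustration under stated assumptions, not part of the argument: the three-point sample space, the process `z`, and the uniform law `mu` are all hypothetical. It computes the left-hand side of (4.11) exactly, then computes $E[z_T]$ once for a random time that is independent of $z$ and once for a random time with the same law that is a function of the outcome.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite model: omega uniform on {0, 1, 2}; times in {0, 1, 2}.
OMEGAS = [0, 1, 2]
TIMES = [0, 1, 2]
P_OMEGA = {w: Fraction(1, 3) for w in OMEGAS}
mu = {t: Fraction(1, 3) for t in TIMES}  # law of the random time T

def z(t, w):
    # Illustrative process z_t(omega).
    return Fraction(w * w + t * (w + 1))

def integral_of_means():
    # Left-hand side of (4.11): integrate t -> E[z_t] against mu.
    return sum(mu[t] * sum(P_OMEGA[w] * z(t, w) for w in OMEGAS) for t in TIMES)

def mean_at_independent_time():
    # E[z_T] when T is independent of z: product measure on (omega, t).
    return sum(P_OMEGA[w] * mu[t] * z(t, w) for w, t in product(OMEGAS, TIMES))

def mean_at_dependent_time():
    # E[z_T] when T(omega) = omega; T still has law mu but is not independent of z.
    return sum(P_OMEGA[w] * z(w, w) for w in OMEGAS)

print(integral_of_means(), mean_at_independent_time(), mean_at_dependent_time())
```

The first two quantities agree, while the dependent time gives a different value even though its law is the same, so the identity genuinely uses the independence hypothesis and not just the law of $T$.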

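Proof. Let $\mathcal{G} \subset \mathcal{F}$ be a σ-field such that $T$ is independent of $\mathcal{G}$ and $z$ is $\mathcal{G} \otimes \mathcal{R}_+$-measurable. We will show that $\int_{\mathbb{R}_+} E[X_t]\, \mu(dt) = E[X_T]$ for all bounded, $\mathcal{G} \otimes \mathcal{R}_+$-measurable processes $X$. Letting $X \in b(\mathcal{F} \otimes \mathcal{R}_+)$ mean that $X$ is a bounded, $\mathcal{F} \otimes \mathcal{R}_+$-measurable process, we set
\[ \mathcal{C} \triangleq \Big\{ X \in b(\mathcal{F} \otimes \mathcal{R}_+) : \int_{\mathbb{R}_+} E[X_t]\, \mu(dt) = E[X_T] \Big\}. \]
If $X_t = \sum_{i=1}^n A_i\, 1_{B_i}(t)$ for some random variables $A_i \in \mathcal{G}$ and sets $B_i \in \mathcal{R}_+$, then
\[ \int_{\mathbb{R}_+} E[X_t]\, \mu(dt) = \sum_i E[A_i]\, P[T \in B_i] = \sum_i E\big[A_i\, 1_{\{T \in B_i\}}\big] = E[X_T]. \]
As $\mathcal{C}$ is a monotone class that contains all $X$ of this form, we conclude that $b(\mathcal{G} \otimes \mathcal{R}_+) \subset \mathcal{C}$. We then use monotone convergence to write
\begin{align*}
\int_{\mathbb{R}_+} E[\|z_t\|]\, \mu(dt) &= \lim_n \int_{\mathbb{R}_+} E[\|z_t\| \wedge n]\, \mu(dt) \\
&= \lim_n E[\|z_T\| \wedge n] = E[\|z_T\|].
\end{align*}
If this expression is finite, then (4.11) follows in the same way by dominated convergence at each coordinate. If this expression is infinite, then both sides of (4.11) are defined to be $\infty$ (recall Rem. 1.7). Either way, (4.11) holds.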

4.12 Corollary. Let $(E, \mathcal{E})$ be a measurable space, and let $(\Omega, \mathcal{F}, \mathbb{F}, P)$ be a stochastic basis which supports an $E$-valued process $Y$, an $\mathbb{R}^d$-valued process $z$, and an $\mathbb{R}_+$-valued random time $T$. Let $\hat z$ be a (deterministic) $\mathcal{E} \otimes \mathcal{R}_+/\mathcal{R}^d$-measurable function such that $\hat z(Y_t, t)$ is a version of $E[z_t \mid Y_t]$ for Lebesgue-a.e. $t$. If the law of $T$ is absolutely continuous with respect to Lebesgue's measure, and $T$ is strongly independent of $Y$ and $z$, then $\hat z(Y_T, T)$ is a version of $E[z_T \mid Y_T, T]$.

Proof. Let $\mathcal{G} \subset \mathcal{F}$ be a σ-field such that $T$ is independent of $\mathcal{G}$ and $z$ is $\mathcal{G} \otimes \mathcal{R}_+$-measurable. Set $\mu \triangleq \mathcal{L}(T)$ and write $\hat z = (\hat z^1, \ldots, \hat z^d)$ and $z = (z^1, \ldots, z^d)$. We now check each component. Fix any bounded deterministic $f : E \times \mathbb{R}_+ \to \mathbb{R}$ which is $\mathcal{E} \otimes \mathcal{R}_+/\mathcal{R}$-measurable. A standard monotone class argument shows that the maps $(\omega, t) \mapsto f(Y_t(\omega), t)$ and $(\omega, t) \mapsto \hat z^i(Y_t(\omega), t)$ are both $\mathcal{G} \otimes \mathcal{R}_+/\mathcal{R}$-measurable. We have
\[ E[f(Y_t, t)\, \hat z^i(Y_t, t)] = E[f(Y_t, t)\, z^i_t] \tag{4.13} \]
for Lebesgue-a.e. $t$ by assumption. As $\mu$ is absolutely continuous with respect to Lebesgue's measure, (4.13) holds for $\mu$-a.e. $t$ as well. We then write
\begin{align*}
E[f(Y_T, T)\, \hat z^i(Y_T, T)] &= \int_{\mathbb{R}_+} E[f(Y_t, t)\, \hat z^i(Y_t, t)]\, \mu(dt) \\
&= \int_{\mathbb{R}_+} E[f(Y_t, t)\, z^i_t]\, \mu(dt) \\
&= E[f(Y_T, T)\, z^i_T],
\end{align*}
where the first and last equalities follow from Lem. 4.10.

4.2

Approximation Lemmas

To appreciate our first approximation result, consider the following lemma from Revuz and Yor.

4.14 Lemma (Lem. 0.5.7 from [RY99]). Let $(X_n, Y_n)$ be a sequence of random variables with values in separable metric spaces $E$ and $F$ and such that

(a) $(X_n, Y_n)$ converges in distribution to $(X, Y)$, and

(b) the law of $Y_n$ does not depend on $n$.

Then for every Borel function $f : F \to G$, where $G$ is a separable metric space, the sequence $(X_n, f(Y_n))$ converges in distribution to $(X, f(Y))$.

While we do not repeat the proof, it essentially results from the fact that we can approximate $f$ arbitrarily well in $L^1(\mathcal{L}(Y))$ with bounded, continuous functions. The point is that we get a stronger kind of convergence from the fact that the $Y_n$ share a common law. This is related to the notion of weak-strong convergence as developed by Jacod and Memin in [JM81b] and [JM81a]. In the following lemma, it is the assumption of common one-dimensional marginal distributions that allows us to conclude that we have weak convergence even though $f$ is only assumed to be measurable.

4.15 Lemma. Let $E$ be a Polish space and let $\{Y^n\}_{n \le \infty}$ be a sequence of continuous $E$-valued processes, possibly defined on different spaces. Let $f : E \times \mathbb{R}_+ \to \mathbb{R}^d$ be a measurable function and define $F^n_t \triangleq \int_0^t f(Y^n_s, s)\,ds$. If

(a) $\mathcal{L}(Y^n_t) = \mathcal{L}(Y^1_t)$ $\forall t \in \mathbb{R}_+$ and $\forall n \in \mathbb{N}$,

(b) $Y^n \Rightarrow Y^\infty$, and

(c) $E^1\big[\int_0^t \|f(Y^1_u, u)\|\,du\big] < \infty$ $\forall t \in \mathbb{R}_+$,

then

(d) $\big\{\big(f(Y^n_\cdot, \cdot),\, P^n \times \lambda_{[0,t]}\big)\big\}_{n \in \mathbb{N}}$ is uniformly integrable $\forall t \in \mathbb{R}_+$,

(e) $P^n\big[F^n \in C(\mathbb{R}_+; \mathbb{R}^d)\big] = 1$ for each $n \in \mathbb{N}$, and

(f) $(Y^n, F^n) \Rightarrow (Y^\infty, F^\infty)$.

4.16 Remark. There may very well be paths of $Y^n$ for which $f(Y^n_\cdot, \cdot)$ is not integrable. Recall that in Rem. 1.7 we adopted the convention that the integral is $\overline{\mathbb{R}}^d$-valued, where $\overline{\mathbb{R}}^d \triangleq \mathbb{R}^d \cup \{\infty\}$ is the one-point compactification of $\mathbb{R}^d$, so $F^n_t$ may take the value $\infty$. It is a conclusion of this lemma that $F^n$ is a continuous, finitely-valued process $P^n$-a.s. for each $n$. As a result, we may treat $F^n$ as a $C(\mathbb{R}_+; \mathbb{R}^d)$-valued random variable and (f) makes sense.

Proof. Define the σ-finite measure $\mu$ on $E \times \mathbb{R}_+$ by
\[ \mu(A) = E^n\Big[\int_0^\infty 1_A(Y^n_u, u)\,du\Big]. \]
If we fix some $t \in \mathbb{R}_+$, then $x \mapsto x(t)$ is a continuous map from $C(\mathbb{R}_+; E)$ to $E$. This means that (a) actually holds for $n = \infty$ as well, and it does not matter which $n \in \mathbb{N}$ we use in this definition of $\mu$. In particular, we have
\begin{align*}
\sup_{n \le \infty} E^n\Big[\int_0^t \|f(Y^n_u, u)\|\, 1_{\{\|f(Y^n_u, u)\| > M\}}\,du\Big]
&= \int_{E \times [0,t]} \|f(e, u)\|\, 1_{\{\|f(e,u)\| > M\}}\, \mu(de, du) \\
&= E^1\Big[\int_0^t \|f(Y^1_u, u)\|\, 1_{\{\|f(Y^1_u, u)\| > M\}}\,du\Big].
\end{align*}
Using the integrability assumption (c), we may make this last expression arbitrarily small by choosing $M$ sufficiently large. This implies (d), and also implies that
\[ P^n\Big[\int_0^t \|f(Y^n_u, u)\|\,du < \infty \ \forall t \in \mathbb{R}_+\Big] = 1 \quad \forall n \in \mathbb{N}, \]
which then implies (e). For the final claim, we approximate $f$ in $L^1(\mu)$ with bounded continuous functions. Cor. B.9 asserts that we may choose a sequence of bounded functions $f^m \in C(E \times [0, m]; \mathbb{R}^d)$ such that
\[ \lim_{m \to \infty} \int_{E \times [0,m]} \|f(e, t) - f^m(e, t)\|\, \mu(de, dt) = 0. \]
Define the processes
\[ Z^{n,m}_t \triangleq \int_0^{t \wedge m} f^m(Y^n_s, s)\,ds \]

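for $n \in \mathbb{N}$ and $m \in \mathbb{N}$. If $y_i \to y_\infty$ in $C(\mathbb{R}_+; E)$, then
\begin{align*}
\limsup_{i \to \infty}\, \sup_{t \le m} \Big\| \int_0^t f^m\big(y_i(s), s\big)\,ds - \int_0^t f^m\big(y_\infty(s), s\big)\,ds \Big\|
\le \limsup_{i \to \infty} \int_0^m \big\| f^m\big(y_i(s), s\big) - f^m\big(y_\infty(s), s\big) \big\|\,ds = 0
\end{align*}
by bounded convergence. In particular, the map $y \mapsto \int_0^{\cdot \wedge m} f^m(y(s), s)\,ds$ is continuous from $C(\mathbb{R}_+; E)$ to $C(\mathbb{R}_+; \mathbb{R}^d)$, so we have
\[ (Y^n, Z^{n,m}) \Rightarrow (Y^\infty, Z^{\infty,m}) \quad \text{for each fixed } m. \tag{4.17} \]
For each fixed $T$, we also have
\begin{align*}
\limsup_{m \to \infty}\, \sup_{n \in \mathbb{N}} P^n\Big[\sup_{t \le T} \|F^n_t - Z^{n,m}_t\| > \varepsilon\Big]
&\le \limsup_{m \to \infty}\, \sup_{n \in \mathbb{N}}\, \varepsilon^{-1} E^n\Big[\sup_{t \le T} \|F^n_t - Z^{n,m}_t\|\Big] \\
&\le \limsup_{m \to \infty}\, \varepsilon^{-1} \int_{E \times [0,m]} \|f(e, s) - f^m(e, s)\|\, \mu(de, ds) = 0.
\end{align*}
In particular,
\[ \inf_{m \in \mathbb{N}}\, \sup_{n \in \mathbb{N}} P^n\big[d(F^n, Z^{n,m}) > \delta\big] = 0 \]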

for each $\delta > 0$, so (4.17) implies (f) (e.g., Lem. B.3).

4.18 Remark. If we add an additional sequence of random variables, $\{Z^n\}$, which take values in some metric space $E^2$, to the statement of this lemma, and we assume that $(Y^n, Z^n) \Rightarrow (Y^\infty, Z^\infty)$, then we may conclude that $(Y^n, Z^n, F^n) \Rightarrow (Y^\infty, Z^\infty, F^\infty)$.

The next result will show that we may approximate an integrable process in $L^1(P \times \lambda_{[0,t]})$ using step functions if we randomize the partition that we use to generate the step functions. First we will need to present a lemma from analysis. If $f : \mathbb{R} \to \mathbb{R}$ is a function which is integrable over $[0, T]$, and we set $\varphi_n(t) \triangleq n\, 1_{[0,1/n]}(t)$, then $\varphi_n$ converges to the Dirac mass at 0 in some sense, so we might expect $f * \varphi_n$ to converge to $f$ in $L^1([0, T], \lambda_{[0,T]})$. This observation motivates, but is not quite equivalent to, the following lemma.

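4.19 Lemma. Let $f : \mathbb{R}_+ \to \mathbb{R}^d$ be a function with $\int_0^t \|f(s)\|\,ds < \infty$ for all $t \in \mathbb{R}_+$. Define the sets
\[ I_i^n \triangleq \Big\{ (t, u) \in \mathbb{R}_+ \times [0, 1] : \frac{u + i - 1}{n} \le t < \frac{u + i}{n} \Big\}, \]
and define the sequence of approximating functions $f_n : \mathbb{R}_+ \times [0, 1] \to \mathbb{R}^d$ by $f_n(t, u) \triangleq \sum_{i=1}^\infty f\big(\tfrac{u + i - 1}{n}\big)\, 1_{I_i^n}(t, u)$. Then
\[ \lim_{n \to \infty} \int_0^1 \int_0^t \|f(s) - f_n(s, u)\|\,ds\,du = 0 \quad \forall t \in \mathbb{R}_+. \]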
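The convergence asserted in Lem. 4.19 can be observed numerically. The following Python sketch is only an illustration under stated assumptions: the test function `f` (a square wave plus a linear trend), the horizon `t = 2`, and the midpoint-rule grid sizes are hypothetical choices, and the double integral is approximated rather than computed exactly. The function `f_n` samples `f` at the left endpoint $(u + i - 1)/n$ of the randomly shifted cell containing $s$, and is zero on $[0, u/n)$, exactly as in the definition of $f_n$.

```python
import math

def f(s):
    # Hypothetical integrable test function: a square wave plus a linear trend.
    return (1.0 if int(3 * s) % 2 == 0 else -1.0) + s

def f_n(s, u, n):
    # Sample f at the left endpoint (u + i - 1)/n of the cell containing s.
    # With k = i - 1 = floor(n*s - u), the cell is [(u + k)/n, (u + k + 1)/n).
    # f_n vanishes on [0, u/n), where no partition point lies to the left of s.
    k = math.floor(n * s - u)
    return f((u + k) / n) if k >= 0 else 0.0

def l1_error(n, t=2.0, num_u=50, num_s=2000):
    # Midpoint-rule approximation of the double integral in Lem. 4.19:
    # integral over u in [0, 1] and s in [0, t] of |f(s) - f_n(s, u)|.
    du, ds = 1.0 / num_u, t / num_s
    total = 0.0
    for a in range(num_u):
        u = (a + 0.5) * du
        for b in range(num_s):
            s = (b + 0.5) * ds
            total += abs(f(s) - f_n(s, u, n))
    return total * du * ds

errors = [l1_error(n) for n in (4, 16, 64, 256)]
print(errors)
```

For this particular `f` the error shrinks roughly like $1/n$, driven by the jumps of the square wave; the lemma itself only asserts convergence to zero, with no rate, for general integrable $f$.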

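Proof. Fix any $t$ and $\varepsilon > 0$. Then choose $g \in C([0, t+1]; \mathbb{R}^d)$ with
\[ \int_0^{t+1} \|f(s) - g(s)\|\,ds < \varepsilon/4. \]
Set $m \triangleq \lceil t \rceil \in [t, t+1) \cap \mathbb{N}$ and set $g_n(s, u) \triangleq \sum_{i=1}^{mn} g\big(\tfrac{u + i - 1}{n}\big)\, 1_{I_i^n}(s, u)$. We have
\begin{align*}
\int_0^1 \int_0^t \|f_n(s, u) - g_n(s, u)\|\,ds\,du
&\le \sum_{i=1}^{mn} \int_0^1 \int_{\frac{u+i-1}{n}}^{\frac{u+i}{n}} \Big\| f\big(\tfrac{u+i-1}{n}\big) - g\big(\tfrac{u+i-1}{n}\big) \Big\|\,ds\,du \\
&= \sum_{i=1}^{mn} \int_0^1 \Big\| f\big(\tfrac{u+i-1}{n}\big) - g\big(\tfrac{u+i-1}{n}\big) \Big\|\,\frac{du}{n} \\
&= \sum_{i=1}^{mn} \int_{\frac{i-1}{n}}^{\frac{i}{n}} \|f(v) - g(v)\|\,dv \le \varepsilon/4.
\end{align*}
As $g$ is uniformly continuous on the interval $[0, t+1]$, we may choose $N$ so large that $|s_2 - s_1| < 1/N$ implies $\|g(s_2) - g(s_1)\| < \varepsilon/(4t)$. By enlarging $N$ if necessary, we may also assume that $\int_0^{1/N} \|g(s)\|\,ds \le \varepsilon/4$. Putting all of these estimates together gives
\begin{align*}
\int_0^1 \int_0^t \|f(s) - f_n(s, u)\|\,ds\,du
&\le \int_0^t \|f(s) - g(s)\|\,ds + \int_0^{1/n} \|g(s)\|\,ds \\
&\quad + \int_0^1 \int_{u/n}^t \|g(s) - g_n(s, u)\|\,ds\,du + \int_0^1 \int_0^t \|g_n(s, u) - f_n(s, u)\|\,ds\,du \\
&\le \varepsilon/4 + \varepsilon/4 + \varepsilon/4 + \varepsilon/4
\end{align*}
when $n \ge N$.

Using this lemma, we see that we may approximate an arbitrary integrable process using step functions if we first extend the space to add an independent uniform random variable, and we use this variable to randomize the placement of the partition points that we use to do the sampling.

4.20 Lemma. Let $(\Omega', \mathcal{G}, Q)$ be a probability space which supports an $\mathbb{R}^d$-valued process $a'$ with
\[ E^Q\Big[\int_0^t \|a'_s\|\,ds\Big] < \infty \quad \forall t \in \mathbb{R}_+. \tag{4.21} \]
Set $\Omega \triangleq [0, 1] \times \Omega'$ with typical point $\omega = (u, \omega')$, and define $U(u, \omega') \triangleq u$. Letting $\mathcal{R}_{[0,1]}$ denote the Borel σ-field on $[0, 1]$, set $\mathcal{F} \triangleq \mathcal{R}_{[0,1]} \otimes \mathcal{G}$, $P \triangleq \lambda_{[0,1]} \times Q$, and $a(u, \omega') \triangleq a'(\omega')$. Finally, define the random times $T_0^n \triangleq 0$, $T_i^n \triangleq (U + i - 1)/n$ for $i \in \{1, \ldots, n^2\}$, and $T^n_{n^2+1} \triangleq \infty$ and the sampled processes $a^n_t \triangleq \sum_{i=1}^{n^2} a_{T_i^n}\, 1_{[T_i^n, T_{i+1}^n)}(t)$. Then
\[ \lim_{n \to \infty} E^P\Big[\int_0^t \|a_s - a^n_s\|\,ds\Big] = 0 \quad \forall t \in \mathbb{R}_+. \tag{4.22} \]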

Proof. We will first show that the collection of processes $\{a^n\}_n$ is uniformly integrable with respect to $P \times \lambda_{[0,t]}$ for each $t \in \mathbb{R}_+$. To see this, fix some $t \in \mathbb{R}_+$ and set $m \triangleq \lceil t \rceil \in [t, t+1) \cap \mathbb{N}$ so $t \le T^n_{mn+1}$. We then write
\begin{align*}
E^P\Big[\int_0^t \|a^n_s\|\, 1_{\{\|a^n_s\| > M\}}\,ds\Big]
&\le E^P\Big[\int_0^{T^n_{mn+1}} \|a^n_s\|\, 1_{\{\|a^n_s\| > M\}}\,ds\Big] \\
&= \frac{1}{n} \sum_{i=1}^{mn} E^P\Big[\|a_{T_i^n}\|\, 1_{\{\|a_{T_i^n}\| > M\}}\Big] \\
&= \sum_{i=1}^{mn} \int_0^1 E^Q\Big[\|a'_{(u+i-1)/n}\|\, 1_{\{\|a'_{(u+i-1)/n}\| > M\}}\Big]\,\frac{du}{n} \\
&= \sum_{i=1}^{mn} \int_{(i-1)/n}^{i/n} E^Q\Big[\|a'_s\|\, 1_{\{\|a'_s\| > M\}}\Big]\,ds \\
&\le E^Q\Big[\int_0^m \|a'_s\|\, 1_{\{\|a'_s\| > M\}}\,ds\Big],
\end{align*}
where the third relation follows from Fubini's Theorem applied to the product measure $P = \lambda_{[0,1]} \times Q$. As $a'$ is integrable over the interval $[0, m]$ under $Q$, we may make this last expression arbitrarily small by choosing $M$ large. We have now shown that $\{a^n\}_n$ is uniformly integrable with respect to $P \times \lambda_{[0,t]}$. As a result, $\{\|a - a^n\|\}_n$ is also uniformly integrable with respect to $P \times \lambda_{[0,t]}$. Define
\[ A'_{t,n}(\omega') \triangleq \int_0^1 \int_0^t \|a'_s(\omega') - a^n_s(u, \omega')\|\,ds\,du, \tag{4.23} \]
which is a random variable on $\Omega'$. We then write
\begin{align*}
E^Q\Big[\big(A'_{t,n} - M\big)^+\Big]
&= E^Q\Big[\Big(\int_0^1 \int_0^t \|a'_s(\omega') - a^n_s(u, \omega')\|\,ds\,du - M\Big)^+\Big] \\
&\le E^Q\Big[\int_0^1 \Big(\int_0^t \|a'_s(\omega') - a^n_s(u, \omega')\|\,ds - M\Big)^+ du\Big] \\
&= E^P\Big[\Big(\int_0^t \|a_s - a^n_s\|\,ds - M\Big)^+\Big] \\
&\le E^P\Big[\int_0^t \big(\|a_s - a^n_s\| - M/t\big)^+ ds\Big],
\end{align*}
where both inequalities follow from Jensen's inequality. The uniform integrability of $\{A'_{t,n}\}_n$ with respect to $Q$ then follows from the uniform integrability of $\{\|a - a^n\|\}_n$ with respect to $P \times \lambda_{[0,t]}$.

If we fix an $\omega'$ such that $\int_0^t \|a'_s(\omega')\|\,ds < \infty$ for all $t \in \mathbb{R}_+$, then we may apply the previous lemma to the right-hand side of (4.23) to conclude that $\lim_n A'_{t,n}(\omega') = 0$. (4.21) implies that $Q\big[\int_0^t \|a'_s\|\,ds < \infty \ \forall t \in \mathbb{R}_+\big] = 1$, so we may conclude that $\lim_n A'_{t,n} = 0$ $Q$-a.e. Combining this with the uniform integrability of $\{A'_{t,n}\}_n$, we conclude that $\lim_n E^Q[A'_{t,n}] = 0$. As $t$ is arbitrary, we have shown (4.22).

To motivate the final approximation result, we recall that a local martingale of finite variation is constant. The next result is essentially a prelimiting version of that result. In this lemma, we have a sequence of absolutely continuous processes that are only martingales with respect to a discrete partition. We will show that we can control such a sequence by controlling the width of the partition and the integrability of the derivatives. To state the lemma, we need the following definition.

4.24 Definition. If $\pi = \{0 = T_0 \le T_1 \le \ldots \le T_n\}$ is a linearly ordered sequence of random times, then we call $\pi$ a random partition and we set $|\pi|(\omega) \triangleq \sup_{1 \le i \le n} |T_i(\omega) - T_{i-1}(\omega)|$. If $\{\pi^m\}_m$ is a sequence of random partitions, possibly defined on different spaces $\{\Omega^m\}_m$, with $\pi^m = \{0 = T_0^m \le T_1^m \le \ldots \le T^m_{N(m)}\}$, then we say that $\{\pi^m\}_m$ converges uniformly to the identity if
\[ \lim_{m \to \infty}\, \sup_{\omega \in \Omega^m} |\pi^m|(\omega) = 0, \quad \text{and} \quad \lim_{m \to \infty}\, \inf_{\omega \in \Omega^m} T^m_{N(m)}(\omega) = \infty. \]

4.25 Lemma. Let $\{(\Omega^n, \mathcal{F}^n, P^n)\}_n$ be a sequence of probability spaces. Assume that on each space there is defined a process $x^n$ and a random partition $\pi(n) = \{0 = T_0^n \le T_1^n \le \ldots \le T^n_{N(n)}\}$. Further assume that the collection of processes and measures $\{(x^n, P^n \times \lambda_{[0,t]})\}_n$ is uniformly integrable for each $t \in \mathbb{R}_+$ and that the sequence of partitions $\{\pi(n)\}_n$ converges uniformly to the identity. Finally, define
\[ Y_k^n \triangleq \int_0^{T_k^n} x^n_u\,du \]
and $\mathcal{F}_k^n \triangleq \sigma(Y_j^n, T_j^n : j \le k)$ for $k \in \{0, \ldots, N(n)\}$, and assume that $\{Y_k^n, \mathcal{F}_k^n\}_{0 \le k \le N(n)}$ is a martingale for each $n$. Then
\[ \lim_{n \to \infty} E^n\Big[\sup_{s \in [0,t]} \Big\| \int_0^s x^n_u\,du \Big\|\Big] = 0 \quad \forall t \in \mathbb{R}_+. \tag{4.26} \]

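Proof. First we derive an estimate for a single process. Let $x : \Omega \times \mathbb{R}_+ \to \mathbb{R}^d$ be a process and suppose that $\pi = \{0 = T_0 \le T_1 \le \ldots \le T_N\}$ is a random partition with $T_N > t$ and $|\pi| \le 1$. Set $X_t \triangleq \int_0^t x_s\,ds$, $Y_k \triangleq X_{T_k}$, and $\mathcal{F}_k \triangleq \sigma(Y_j, T_j : j \le k)$. We show below that if $\{Y_k, \mathcal{F}_k\}_{0 \le k \le N}$ is a martingale, then
\begin{align*}
E\Big[\sup_{s \in [0,t]} \|X_s\|\Big]
&\le M\, E|\pi| + E\Big[\int_0^{t+1} \big(\|x_s\| - M\big)^+ ds\Big] \\
&\quad + d\, C_1 \Big( M\, E|\pi| + E\Big[\int_0^{t+1} \big(\|x_u\| - M\big)^+ du\Big] \Big)^{1/2} \times \sqrt{E\Big[\int_0^{t+1} \|x_u\|\,du\Big]}, \tag{4.27}
\end{align*}
where $C_1$ is a constant that does not depend on $x$ and $M$ is arbitrary. To see this, let $S \triangleq \inf\{k \in \{0, \ldots, N\} : T_k \ge t\}$, so $S$ is an $\mathbb{F}$-stopping time and $Y$ stopped at $S$ is still an $\mathbb{F}$-martingale. Also notice that $|\pi| \le 1$ implies that $T_S \le t + 1$. Letting $Y^i$ and $X^i$ denote the $i$th components of $Y$ and $X$, we write
\begin{align*}
E\Big[\max_{1 \le n \le N \wedge S} \|Y_n\|\Big]
&\le \sum_{1 \le i \le d} E\Big[\max_{0 \le n \le N \wedge S} |Y^i_n|\Big] \\
&\le \sum_{1 \le i \le d} C_1\, E\Big[\sqrt{\textstyle\sum_{0 \le n \le N \wedge S} (Y^i_n - Y^i_{n-1})^2}\Big] \\
&\le d\, C_1\, E\Big[\sqrt{\textstyle\sum_{0 \le n \le N \wedge S} \|X_{T_n} - X_{T_{n-1}}\|^2}\Big] \\
&\le d\, C_1\, E\Big[\sqrt{\max_{0 \le n \le N \wedge S} \|X_{T_n} - X_{T_{n-1}}\| \int_0^{t+1} \|x_u\|\,du}\Big] \\
&\le d\, C_1 \sqrt{E\Big[\max_{1 \le n \le N \wedge S} \|X_{T_n} - X_{T_{n-1}}\|\Big]}\, \sqrt{E\Big[\int_0^{t+1} \|x_u\|\,du\Big]} \\
&\le d\, C_1 \Big( M\, E|\pi| + E\Big[\int_0^{t+1} \big(\|x_u\| - M\big)^+ du\Big] \Big)^{1/2} \times \sqrt{E\Big[\int_0^{t+1} \|x_u\|\,du\Big]}, \tag{4.28}
\end{align*}
where $C_1$ is the “universal” constant from the discrete-time BDG inequality with $p = 1$ (e.g., [Gar73] II.1.1) and the fifth inequality is Hölder's. Now fix some $s \le t$ and temporarily set $\lfloor s \rfloor \triangleq \max\{i \in \{0, 1, \ldots, N-1\} : T_i \le s\}$ so $T_{\lfloor s \rfloor}$ is the largest random time before $s$. With this notation, we write
\begin{align*}
\|X_s\| &\le \|X_s - X_{T_{\lfloor s \rfloor}}\| + \|X_{T_{\lfloor s \rfloor}}\| \\
&\le \int_{T_{\lfloor s \rfloor}}^{T_{\lfloor s \rfloor + 1}} \|x_u\|\,du + \|X_{T_{\lfloor s \rfloor}}\| \tag{4.29} \\
&\le M |\pi| + \int_0^t \big(\|x_u\| - M\big)^+ du + \max_{1 \le n \le N \wedge S} \|Y_n\|.
\end{align*}
We now take the supremum over $s \in [0, t]$ on the left-hand side of (4.29) and take expectations to give
\[ E\Big[\sup_{s \in [0,t]} \|X_s\|\Big] \le M\, E|\pi| + E\Big[\int_0^t \big(\|x_u\| - M\big)^+ du\Big] + E\Big[\max_{1 \le n \le N \wedge S} \|Y_n\|\Big]. \]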

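(4.27) then follows from this inequality and (4.28). We now use (4.27) to show that (4.26) holds. Fix some $t \in \mathbb{R}_+$ and $\varepsilon > 0$ and set
\[ C_2 \triangleq \sup_n E^n\Big[\int_0^{t+1} \|x^n_u\|\,du\Big]. \]
The uniform integrability of $\{(x^n, P^n \times \lambda_{[0,t+1]})\}_n$ ensures that $C_2 < \infty$, and we also use the uniform integrability to choose $M_1(\varepsilon, C_2)$ so large that
\[ \sup_n E^n\Big[\int_0^{t+1} \big(\|x^n_u\| - M_1\big)^+ du\Big] \le \frac{\varepsilon}{4} \wedge \frac{\varepsilon^2}{8 C_1^2 C_2 d^2}. \]
Set
\[ \delta = \delta(\varepsilon, C_2, M_1) \triangleq \frac{\varepsilon}{2 M_1} \wedge \frac{\varepsilon^2}{8 M_1 C_1^2 C_2 d^2}, \]
and choose $M_2(\delta)$ so large that $|\pi(n)| \le \delta \wedge 1$ and $T^n_{N(n)} > t$ for all $n > M_2$ using the fact that $\{\pi(n)\}_n$ converges uniformly to the identity. Putting this all together and applying the estimate (4.27) to $X^n$ then gives
\begin{align*}
E^n\Big[\sup_{s \in [0,t]} \|X^n_s\|\Big]
&\le M_1\, E^n|\pi(n)| + E^n\Big[\int_0^{t+1} \big(\|x^n_s\| - M_1\big)^+ ds\Big] \\
&\quad + d\, C_1 \Big( M_1\, E^n|\pi(n)| + E^n\Big[\int_0^{t+1} \big(\|x^n_s\| - M_1\big)^+ ds\Big] \Big)^{1/2} \times \sqrt{E^n\Big[\int_0^{t+1} \|x^n_s\|\,ds\Big]} \\
&\le M_1 \delta + \varepsilon/4 + d\, C_1 \Big( M_1 \delta + \frac{\varepsilon^2}{8 C_1^2 C_2 d^2} \Big)^{1/2} \sqrt{C_2} \\
&\le \varepsilon/2 + d\, C_1 \sqrt{C_2}\, \frac{\varepsilon}{2 C_1 \sqrt{C_2}\, d} \\
&\le \varepsilon
\end{align*}
for all $n > M_2$. As $t$ and $\varepsilon$ were arbitrary, we have shown that (4.26) holds.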

4.3

Main Theorem

We recall the following definition for the reader’s convenience.

2.1 Definition. Let $E$ be a Polish space, and let $\Phi : E \times C_0(\mathbb{R}_+; \mathbb{R}^d) \to C(\mathbb{R}_+; E)$ be a function. We say that $\Phi$ is an updating function if

(a) $\Phi_t(e, x) = \Phi_t\big(e, \nabla(x, t)\big)$ $\forall t \in \mathbb{R}_+$, and

(b) $\Theta\big(\Phi(e, x), t\big) = \Phi\big(\Phi_t(e, x), \Delta(x, t)\big)$ $\forall t \in \mathbb{R}_+$.

If $\Phi$ is also continuous as a map from $E \times C_0(\mathbb{R}_+; \mathbb{R}^d)$ to $C(\mathbb{R}_+; E)$, then we say that $\Phi$ is a continuous updating function.

We first give a version of the main theorem stated in terms of the characteristics of an Itô process.

4.30 Theorem. Let $(\Omega, \mathcal{F}, \mathbb{F}, P)$ be a stochastic basis that supports an $\mathbb{R}^d$-valued Itô process $Y$ with $Y_0 = 0$ and characteristics $B_t = \int_0^t b_s\,ds$ and $C_t = \int_0^t c_s\,ds$, where $b_t \in \mathbb{R}^d$ and $c_t \in S^d_+$ are $\mathbb{F}$-adapted processes with
\[ E\Big[\int_0^t \|b_s\| + \|c_s\|\,ds\Big] < \infty \quad \forall t \in \mathbb{R}_+. \tag{4.31} \]
Let $E$ be a Polish space, and let $Z$ be a continuous, $E$-valued process with $Z = \Phi(Z_0, Y)$ for some continuous updating function $\Phi$. Let $N \subset \mathbb{R}_+$ be a Lebesgue-null set, and let $\hat b : E \times \mathbb{R}_+ \to \mathbb{R}^d$ and $\hat c : E \times \mathbb{R}_+ \to S^d_+$ be (deterministic) functions such that $\hat b(Z_t, t) = E[b_t \mid Z_t]$ a.s. and $\hat c(Z_t, t) = E[c_t \mid Z_t]$ a.s. when $t \notin N$. Then there exists a stochastic basis $(\hat\Omega, \hat{\mathcal{F}}, \hat{\mathbb{F}}, \hat P)$ that supports continuous, $\hat{\mathbb{F}}$-adapted processes $\hat Y$ and $\hat Z$ such that

(a) $\hat Y$ is an Itô process with characteristics $(\hat B, \hat C)$, where $\hat B_t \triangleq \int_0^t \hat b(\hat Z_s, s)\,ds$ and $\hat C_t \triangleq \int_0^t \hat c(\hat Z_s, s)\,ds$,

(b) $\hat Z = \Phi(\hat Z_0, \hat Y)$, and

(c) $\hat Z$ has the same one-dimensional marginal distributions as $Z$.

4.32 Remark. As the proof is somewhat involved, we first give a heuristic explanation of the main steps in the context of an example. Let $\tilde Y$, $\tilde C$, and $\tilde\sigma$ be defined as in Example 2.10, and set $\tilde c \triangleq \tilde\sigma^2$, so we have $\langle \tilde Y \rangle_t = \tilde C_t = \int_0^t \tilde c_s\,ds$. In Example 2.4, we produced an updating function $\Phi$ such that $Y = \Phi(0, Y)$ when $Y_0 = 0$, so we may take $Y = Z = \tilde Y$ in the statement of the theorem. We take $\hat c = \hat\sigma^2$, where $\hat\sigma$ is defined as in (2.19) of Example 2.18. In particular, we showed in Example 2.18 that $\hat c(x, t) = E[\tilde c_t \mid \tilde Y_t = x]$ for $t > 0$. Defining $\hat C_t(Y) \triangleq \int_0^t \hat c(Y_s, s)\,ds$ (which is slightly at odds with the definition given in the statement of the theorem, but convenient for the purposes of this remark), the theorem asserts that we may find a process $\hat Y$ such that $\langle \hat Y \rangle_t = \hat C_t(\hat Y)$ and such that $\hat Y$ has the same one-dimensional marginal distributions as $\tilde Y$. We will construct a sequence of processes in such a way that $\hat Y$ is given as the weak limit of this sequence.

Set $\Omega = \{0\} \times C_0(\mathbb{R}_+; \mathbb{R}^2)$ with canonical process $X = (Y, C)$ as in Example 3.4, so $P \triangleq \mathcal{L}(\tilde Y, \tilde C)$ is a measure on $\Omega$. Define a sequence of deterministic partitions
\[ \pi(n) \triangleq \{0 = t_0^n < \ldots < t_n^n < t_{n+1}^n = \infty\}, \]

and set $\mathscr{G}_i^n \triangleq \sigma(Y_{t_i^n})$. In Example 3.4, we showed that $\Pi(n) \triangleq \{(\mathscr{G}_i^n, t_i^n)\}_{0\le i\le n}$ is an extended partition, so we may define the sequence of measures $\mathbb{P}^n \triangleq \mathbb{P}^{\otimes\Pi(n)}$. Recall that we interpreted these extended partitions as filtration-like objects in which we choose to forget everything about the process $X$ at time $t_i^n$ except the current location of $Y$. We also showed in Example 3.4 that, in this case, we have

$$\mathscr{H}_i^n \triangleq \sigma\bigl(\mathscr{G}_{i-1}^n, \Delta(X^{t_i^n}, t_{i-1}^n)\bigr) = \sigma\bigl(\Theta(Y^{t_i^n}, t_{i-1}^n), \Delta(C^{t_i^n}, t_{i-1}^n)\bigr).$$

In particular, if we choose some $s \in \mathbb{R}_+$, then we may choose some $j$ such that $s \in [t_{j-1}^n, t_j^n]$, and we then have $Y_s = \Theta_{s - t_{j-1}^n}(Y^{t_j^n}, t_{j-1}^n) \in \mathscr{H}_j^n$. Using (a) of Thm. 3.6, we see that $\mathbb{P}^n[Y_s \le y] = \mathbb{P}[Y_s \le y]$ for any $y \in \mathbb{R}$. In particular, $Y$ has the same one-dimensional marginal distributions under each $\mathbb{P}^n$ as $\tilde Y$.

To show that this sequence of measures is tight, we use a result of Rebolledo [Reb79] which asserts that the sequence $\{\mathscr{L}(Y \mid \mathbb{P}^n)\}_n$ is tight when the sequence $\{\mathscr{L}(C \mid \mathbb{P}^n)\}_n$ is tight. But the tightness of $\{\mathscr{L}(C \mid \mathbb{P}^n)\}_n$ follows from the integrability condition (4.31) and Cor. 3.44, so by passing to a subsequence, we may assume that $\mathscr{L}(Y \mid \mathbb{P}^n) \Rightarrow \mathscr{L}(\hat Y)$ for some limiting process $\hat Y$. As $Y$ has the same one-dimensional marginal distributions under each $\mathbb{P}^n$ as $\tilde Y$, $\hat Y$ also has this property.

We still need to show that $\langle \hat Y \rangle = \hat C(\hat Y)$. As $\langle Y \rangle = C$ under each $\mathbb{P}^n$ (e.g., Lem. 3.47), the main result of Appendix F asserts that if $\mathscr{L}(Y, C \mid \mathbb{P}^n) \Rightarrow \mathscr{L}(\hat Y, \hat C(\hat Y))$, then $\langle \hat Y \rangle = \hat C(\hat Y)$. Lem. 4.15 essentially asserts that we have $\mathscr{L}(Y, \hat C(Y) \mid \mathbb{P}^n) \Rightarrow \mathscr{L}(\hat Y, \hat C(\hat Y))$, so all we need to show is that $\mathscr{L}(C - \hat C(Y) \mid \mathbb{P}^n) \Rightarrow 0$. This is probably the most technical part of the proof, and we only give a plausibility argument now. Let $Y^n$ and $C^n$ denote the processes $Y$ and $C$ of Example 3.8, when we take $\pi = \pi(n)$ in that example, and let $c^n$ denote the right derivative of $C^n$. As noted in Example 3.8, we have $\mathscr{L}(Y^n, C^n) = \mathbb{P}^n$.
Notice that $\mathbb{P}$ and $\mathbb{P}^n$ are not equivalent. In particular, $\mathbb{P}^n$ charges paths of $C$ which change slope, and $\mathbb{P}$ does not. Comparing the definition of $\sigma$ in Example 3.8 with the definition of $\hat\sigma$ given in (2.19) of Example 2.18, we see that, for each $n$ and $i > 0$, we have $\mathbb{E}^n[c_{t_i^n} \mid Y_{t_i^n} = x] = \hat c(x, t_i^n)$, so $\hat c$ is the expected variance accumulation rate, conditional on the location of $Y$ at a reset time. As each reset is conditionally independent given the value of $Y$ at the time of the reset, we might hope that when we have enough partition points, a law of large numbers will kick in, causing $C - \hat C$ to converge to zero. To make this work without imposing continuity assumptions on $\hat c$, we need to slightly randomize the placement of the partition points and use Lem. 4.20. We introduce the uniform random variable in the proof that follows for just this purpose.

Proof. To free up some notational space, we add a tilde to every symbol in the statement of the theorem which does not have a hat, and we set $\Phi \triangleq \tilde\Phi$. The first thing that we do is transport the problem from $\tilde\Omega$ to a space with more structure where it is easier to work. Set $\Omega' \triangleq E\times C_0(\mathbb{R}_+; \mathbb{R}^{d+d+d^2})$ and set $\Omega \triangleq [0,1]\times\Omega'$. We will write a typical point of $\Omega$ as $\omega = (u, \omega') = (u, e, x)$ where $u \in [0,1]$, $\omega' \in \Omega'$, $e \in E$, and $x \in C_0(\mathbb{R}_+; \mathbb{R}^{d+d+d^2})$. We define the random variables $U(u, \omega') \triangleq u$, $E(u, e, x) \triangleq e$, and $X(u, e, x) \triangleq x$, and we subdivide $X$ as $(Y, B, C) = X$ where $Y \in C_0(\mathbb{R}_+; \mathbb{R}^d)$, $B \in C_0(\mathbb{R}_+; \mathbb{R}^d)$, and $C \in C_0(\mathbb{R}_+; \mathbb{R}^d\otimes\mathbb{R}^d)$. We also define the continuous, $E$-valued process $Z \triangleq \Phi(E, Y)$. Let $E'$, $X'$, $Y'$, $Z'$, $B'$, and $C'$ denote the corresponding random variables defined on $\Omega'$.

Letting $\mathscr{E}$ denote the Borel σ-field on $E$, we see that $\Omega'$ is a Polish space with Borel σ-field $\mathscr{G} \triangleq \mathscr{E}\otimes\sigma(X)$. The filtration on $\Omega'$ is given by $\mathbb{G}^0 \triangleq \{\mathscr{G}_t^0\}_{t\in\mathbb{R}_+}$ where $\mathscr{G}_t^0 = \mathscr{E}\otimes\sigma(X^t)$. Letting $\mathscr{R}_{[0,1]}$ denote the Borel σ-field on $[0,1]$, we see that $\Omega$ is also a Polish space with Borel σ-field $\mathscr{F} \triangleq \mathscr{R}_{[0,1]}\otimes\mathscr{G}$. The filtration on $\Omega$ is $\mathbb{F}^0 \triangleq \{\mathscr{F}_t^0\}_{t\in\mathbb{R}_+}$ where $\mathscr{F}_t^0 = \mathscr{R}_{[0,1]}\otimes\mathscr{G}_t^0$. Because $[0,1]\times E$ is a Polish space, we are in Setting 3.1, and we may apply the results from Chapter 3 to measures on $\Omega$. We define the measure

$$\mathbb{Q} \triangleq \mathscr{L}(\tilde Z_0, \tilde Y, \tilde B, \tilde C) \tag{4.33}$$

on $\Omega'$ and the measure $\mathbb{P} \triangleq \lambda_{[0,1]}\times\mathbb{Q}$ on $\Omega$. In particular, we have

$$\mathscr{L}(E', Y', Z', B', C' \mid \mathbb{Q}) = \mathscr{L}(\tilde Z_0, \tilde Y, \tilde Z, \tilde B, \tilde C). \tag{4.34}$$

By taking divided differences on the left (e.g., Lem. C.10), we may find $\mathbb{G}^0$-predictable processes $b'$ and $c'$ such that, for each $\omega' \in \Omega'$, we have $b'_t(\omega') = \frac{\partial}{\partial t} B'_t(\omega')$ and $c'_t(\omega') = \frac{\partial}{\partial t} C'_t(\omega')$ whenever these derivatives exist. As $\tilde B$ and $\tilde C$ are $\tilde{\mathbb{P}}$-a.s. absolutely continuous, (4.34) implies that $B'$ and $C'$ are $\mathbb{Q}$-a.s. absolutely continuous. Pathwise arguments show that we have

$$\mathbb{Q}\Bigl[B'_t = \int_0^t b'_s\,ds \ \ \forall t\Bigr] = 1, \quad\text{and} \tag{4.35}$$

$$\mathbb{Q}\Bigl[C'_t = \int_0^t c'_s\,ds \ \ \forall t\Bigr] = 1. \tag{4.36}$$

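Setting $b(u, \omega') \triangleq b'(\omega')$ and $c(u, \omega') \triangleq c'(\omega')$, we see that the corresponding properties also hold under $\mathbb{P}$. It is then clear from the product structure that $U$ is strongly independent of $(Y, Z, B, C, b, c)$ under $\mathbb{P}$.

We have $\tilde{\mathbb{P}}[\tilde B_t = \int_0^t \tilde b_s\,ds \ \forall t] = 1$, $\mathbb{P}[B_t = \int_0^t b_s\,ds \ \forall t] = 1$, and $\mathscr{L}(\tilde B) = \mathscr{L}(B \mid \mathbb{P})$. As $\tilde B$ and $B$ are both a.s. absolutely continuous, all of the information about their derivatives is essentially encoded in their common law. In particular, the random variables $\int_0^T \|\tilde b_s\|\,ds$ (under $\tilde{\mathbb{P}}$) and $\int_0^T \|b_s\|\,ds$ (under $\mathbb{P}$) agree in law. The details of this argument are given in Cor. C.18. This means that we have

$$\mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|b_s\|\,ds\Bigr] = \tilde{\mathbb{E}}\Bigl[\int_0^t \|\tilde b_s\|\,ds\Bigr] < \infty. \tag{4.37}$$

Repeating the argument for $c$ gives

$$\mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|c_s\|\,ds\Bigr] = \tilde{\mathbb{E}}\Bigl[\int_0^t \|\tilde c_s\|\,ds\Bigr] < \infty. \tag{4.38}$$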

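We will now show that $\hat b(Z_t, t)$ is still a version of $\mathbb{E}^{\mathbb{P}}[b_t \mid Z_t]$ for Lebesgue-a.e. $t$. Fixing any $t \in \mathbb{R}_+$ and any bounded, $\mathscr{E}\otimes\mathscr{R}_+/\mathscr{R}^d$-measurable $f : E\times\mathbb{R}_+ \to \mathbb{R}^d$, we write

$$\begin{aligned}
\mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t b_s\, f(Z_s, s)\,ds\Bigr]
&= \tilde{\mathbb{E}}\Bigl[\int_0^t \tilde b_s\, f(\tilde Z_s, s)\,ds\Bigr] \\
&= \tilde{\mathbb{E}}\Bigl[\int_0^t \hat b(\tilde Z_s, s)\, f(\tilde Z_s, s)\,ds\Bigr] \\
&= \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \hat b(Z_s, s)\, f(Z_s, s)\,ds\Bigr].
\end{aligned}$$

The first and last equalities follow from the fact that $\mathscr{L}(B, Z \mid \mathbb{P}) = \mathscr{L}(\tilde B, \tilde Z)$ (e.g., Cor. C.18). The middle equality follows from our assumption that $\hat b(\tilde Z_t, t)$ is a version of $\tilde{\mathbb{E}}[\tilde b_t \mid \tilde Z_t]$ for Lebesgue-a.e. $t$ and Lem. 4.1. We then apply Lem. 4.1 again to conclude that $\hat b(Z_t, t)$ is a version of $\mathbb{E}^{\mathbb{P}}[b_t \mid Z_t]$ for Lebesgue-a.e. $t$. It follows in the same way that $\hat c(Z_t, t)$ is a version of $\mathbb{E}^{\mathbb{P}}[c_t \mid Z_t]$ for Lebesgue-a.e. $t$. In particular, (4.4) of Lem. 4.1 asserts that

$$\mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|\hat b(Z_s, s)\|\,ds\Bigr] < \infty \qquad \forall t \in \mathbb{R}_+, \quad\text{and} \tag{4.39}$$

$$\mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|\hat c(Z_s, s)\|\,ds\Bigr] < \infty \qquad \forall t \in \mathbb{R}_+. \tag{4.40}$$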

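$\tilde Y$ has the characteristics $(\tilde B, \tilde C)$ with respect to $(\tilde{\mathbb{F}}, \tilde{\mathbb{P}})$. It follows from (4.33) that $Y'$ has the characteristics $(B', C')$ with respect to $(\mathbb{G}^0, \mathbb{Q})$. As the only difference between $(\mathbb{F}^0, \mathbb{P})$ and $(\mathbb{G}^0, \mathbb{Q})$ is the addition of an $\mathscr{F}_0^0$-measurable random variable that is independent of $\sigma(E, X)$, we may conclude that $Y$ still has the characteristics $(B, C)$ with respect to $(\mathbb{F}^0, \mathbb{P})$.

We define the random times $T_0^n \triangleq 0$, $T_i^n \triangleq (U + i - 1)/n$ for $i \in \{1, \dots, n^2\}$, and $T_{n^2+1}^n \triangleq \infty$. We collect these random times into the random partitions $\pi(n) \triangleq \{0 = T_0^n \le \dots \le T_{n^2}^n \le n\}$. Notice that each $T_i^n$ is trivially an $\mathbb{F}^0$-stopping time as $T_i^n$ is $\mathscr{F}_0^0$-measurable, and notice that the sequence of partitions $\{\pi(n)\}_n$ converges uniformly to the identity. We now define the additional objects that we need to specify a generalized partition. For each $n \in \mathbb{N}$, let

$$\begin{aligned}
\mathscr{G}_0^n &\triangleq \mathscr{H}_0^n \triangleq \mathscr{F}_0^0 = \sigma(U, E), \\
\mathscr{G}_i^n &\triangleq \sigma(U, Z_{T_i^n}) && \text{for } 1 \le i \le n^2, \text{ and} \\
\mathscr{H}_i^n &\triangleq \sigma\bigl(\mathscr{G}_{i-1}^n, \Delta(X^{T_i^n}, T_{i-1}^n)\bigr) && \text{for } 1 \le i \le n^2 + 1.
\end{aligned}$$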

Intuitively, this structure means that the only historical information that we keep at the reset time $T_i^n$ is the value of $U$ and the current location of $Z$. Notice that $T_1^n - T_0^n = U/n$ and $T_i^n - T_{i-1}^n = 1/n$ for $i > 1$, so $T_i^n - T_{i-1}^n$ is always $\mathscr{G}_{i-1}^n$-measurable. As $Z$ may be updated using only the changes in $Y$, we have

$$\Theta(Z^{T_i^n}, T_{i-1}^n) \in \mathscr{H}_i^n \qquad \forall i \in \{1, \dots, n^2 + 1\}. \tag{4.41}$$
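The randomized reset times are simple enough to tabulate directly. The following Python sketch is illustrative only: the function name `reset_times` and the fixed value `u` (standing in for one draw of the uniform variable U) are assumptions and do not appear in the original text. It lists the finite times T_0^n, ..., T_{n^2}^n and lets one check the spacing noted above.

```python
def reset_times(u, n):
    """Finite reset times T_0 = 0 and T_i = (u + i - 1) / n for i = 1, ..., n**2.

    `u` stands in for one realization of the uniform variable U in [0, 1];
    the final time T_{n**2 + 1} = infinity is omitted from the list.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return [0.0] + [(u + i - 1) / n for i in range(1, n * n + 1)]


if __name__ == "__main__":
    u, n = 0.5, 3
    times = reset_times(u, n)
    gaps = [b - a for a, b in zip(times, times[1:])]
    print(times)
    print(gaps)  # the first gap is u / n, every later gap is 1 / n
```

Only the first gap depends on `u`, which is why every increment is measurable with respect to a σ-field containing σ(U), and the last finite time never exceeds n.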


To show this rigorously, we write

\begin{align}
\Theta(Z^{T_i^n}, T_{i-1}^n)
&= \Theta^{T_i^n - T_{i-1}^n}(Z, T_{i-1}^n) \notag \\
&= \Theta^{T_i^n - T_{i-1}^n}\bigl(\Phi(E, Y), T_{i-1}^n\bigr) \notag \\
&= \Phi^{T_i^n - T_{i-1}^n}\bigl(\Phi_{T_{i-1}^n}(E, Y), \Delta(Y, T_{i-1}^n)\bigr) \tag{4.42} \\
&= \Phi^{T_i^n - T_{i-1}^n}\bigl(Z_{T_{i-1}^n}, \Delta(Y, T_{i-1}^n)\bigr) \notag \\
&= \Phi^{T_i^n - T_{i-1}^n}\bigl(Z_{T_{i-1}^n}, \Delta^{T_i^n - T_{i-1}^n}(Y, T_{i-1}^n)\bigr) \tag{4.43} \\
&= \Phi^{T_i^n - T_{i-1}^n}\bigl(Z_{T_{i-1}^n}, \Delta(Y^{T_i^n}, T_{i-1}^n)\bigr), \notag
\end{align}

where we use property (b) of Def. 2.1 at (4.42) and property (a) of Def. 2.1 at (4.43). Because $T_i^n - T_{i-1}^n$, $Z_{T_{i-1}^n}$, and $\Delta(Y^{T_i^n}, T_{i-1}^n)$ are all $\mathscr{H}_i^n$-measurable, we have now written $\Theta(Z^{T_i^n}, T_{i-1}^n)$ as a function of $\mathscr{H}_i^n$-measurable random variables.

We now set $\Pi(n) \triangleq \{(T_i^n, \mathscr{G}_i^n)\}_{0\le i\le n^2}$ for $n \ge 1$, and we show that each $\Pi(n)$ is an extended partition. Specifically, we need to check that
(a) each $T_i^n$ is a finite $\mathbb{F}^0$-stopping time,
(b) $T_i^n - T_{i-1}^n \in \sigma\bigl(\mathscr{G}_{i-1}^n, \Delta(X, T_{i-1}^n)\bigr)$, and
(c) $\mathscr{G}_i^n \subset \mathscr{H}_i^n$.
Claim (a) holds as $\pi(n)$ is uniformly bounded by $n$ and $T_i^n \in \mathscr{F}_0^0$ for all $i$, and we have already shown (b). Writing $Z_{T_i^n} = \Theta_{T_i^n - T_{i-1}^n}(Z^{T_i^n}, T_{i-1}^n)$, we see that (4.41) and $T_i^n - T_{i-1}^n \in \mathscr{G}_{i-1}^n$ imply that $Z_{T_i^n} \in \mathscr{H}_i^n$ for each $i$, so (c) holds as well. We now use this sequence of extended partitions to define a sequence of measures on $\Omega$, setting $\mathbb{P}^n \triangleq \mathbb{P}^{\otimes\Pi(n)}$ for each $n \in \mathbb{N}$.

We will now show that the collection of laws $\{\mathscr{L}(E, Y \mid \mathbb{P}^n)\}_n$ is tight. $\{\mathscr{L}(E, Y \mid \mathbb{P}^n)\}_n$ is tight if and only if the collections $\{\mathscr{L}(E \mid \mathbb{P}^n)\}_n$ and $\{\mathscr{L}(Y \mid \mathbb{P}^n)\}_n$ are both tight (e.g., Lem. B.5), so we may check each collection individually. $\{\mathscr{L}(E \mid \mathbb{P}^n)\}_n$ contains a single element, so it is clearly tight. Because $Y$ has the characteristics $(B, C)$ under $\mathbb{P}$, and $\Delta(Y, T_i^n)$, $\Delta(B, T_i^n)$, and $\Delta(C, T_i^n)$ are all trivially $\Delta(X, T_i^n)$-measurable for each $i \in \{0, 1, \dots, n^2\}$, we may apply Lem. 3.47 to conclude that $Y$ has the characteristics $(B, C)$ with respect to any measure in the set $\{\mathbb{P}^n\}_n$. As we have (4.37) and (4.38), we may apply Cor. 3.44 to conclude that the collection $\{\mathscr{L}(B, C \mid \mathbb{P}^n)\}_n$ is tight. We then use the results of Rebolledo (e.g., Cor. E.12 or [JS87] Thm. VI.4.18) to conclude that the collection $\{\mathscr{L}(Y \mid \mathbb{P}^n)\}_n$ is tight.

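As the collection of laws $\{\mathscr{L}(E, Y \mid \mathbb{P}^n)\}_n$ is tight, we may assume (by passing to a subsequence if necessary) that $\mathscr{L}(E, Y \mid \mathbb{P}^n) \Rightarrow \hat{\mathbb{P}}$ for some measure $\hat{\mathbb{P}}$ on the Polish space $\hat\Omega \triangleq E\times C_0(\mathbb{R}_+; \mathbb{R}^d)$ with $\hat E(e, x) \triangleq e$ and $\hat Y(e, x) \triangleq x$. We also set $\hat Z \triangleq \Phi(\hat E, \hat Y)$, and we define the filtration $\hat{\mathbb{F}}^0 \triangleq \{\hat{\mathscr{F}}_t^0\}_{t\in\mathbb{R}_+}$ where $\hat{\mathscr{F}}_t^0 \triangleq \sigma(\hat E, \hat Y^t)$. As we assumed that $\Phi$ is continuous, we have

$$\mathscr{L}(E, Y, Z \mid \mathbb{P}^n) \Rightarrow \mathscr{L}(\hat E, \hat Y, \hat Z). \tag{4.44}$$

We now show that $\mathscr{L}(Z_t \mid \mathbb{P}^n) = \mathscr{L}(Z_t \mid \mathbb{P})$ for all $n \in \mathbb{N}$ and $t \in \mathbb{R}_+$. Fix any $A \in \mathscr{E}$ and $t \in \mathbb{R}_+$. As $U \in \mathscr{H}_i^n$ for all $i$, the event $\{t \in [T_{i-1}^n, T_i^n)\}$ and the random variable $S_i^n \triangleq (t - T_{i-1}^n)^+$ are both $\mathscr{H}_i^n$-measurable. Notice that $Z_t = \Theta_{S_i^n}(Z^{T_i^n}, T_{i-1}^n)$ when $t \in [T_{i-1}^n, T_i^n)$. Combining this observation with (4.41) gives

$$\begin{aligned}
\mathbb{P}^n[Z_t \in A]
&= \sum_{i=1}^{n^2+1} \mathbb{P}^n\bigl[\Theta_{S_i^n}(Z^{T_i^n}, T_{i-1}^n) \in A \text{ and } t \in [T_{i-1}^n, T_i^n)\bigr] \\
&= \sum_{i=1}^{n^2+1} \mathbb{P}\bigl[\Theta_{S_i^n}(Z^{T_i^n}, T_{i-1}^n) \in A \text{ and } t \in [T_{i-1}^n, T_i^n)\bigr] \\
&= \mathbb{P}[Z_t \in A],
\end{aligned}$$

where we have used the fact that $\mathbb{P}^n$ agrees with $\mathbb{P}$ on each $\mathscr{H}_{i+1}^n$ (e.g., (a) of Thm. 3.6). It then follows from (4.44) that $\mathscr{L}(\hat Z_t) = \mathscr{L}(Z_t \mid \mathbb{P}) = \mathscr{L}(\tilde Z_t)$ for all $t \in \mathbb{R}_+$.

To complete the proof, we need to characterize the limit. We will show that $\hat Y$ has the characteristics $(\hat B, \hat C)$ with respect to $(\hat{\mathbb{F}}, \hat{\mathbb{P}})$ by showing that

$$\mathscr{L}(Y, Z, B, C \mid \mathbb{P}^n) \Rightarrow \mathscr{L}(\hat Y, \hat Z, \hat B, \hat C), \tag{4.45}$$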

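and applying Thm. F.1. As a first step, define the processes

$$\bar b_t \triangleq \hat b(Z_t, t), \quad \bar B_t \triangleq \int_0^t \bar b_s\,ds, \quad \bar c_t \triangleq \hat c(Z_t, t), \quad\text{and}\quad \bar C_t \triangleq \int_0^t \bar c_s\,ds.$$

As we have (4.39), (4.40), and (4.44), and we have shown that $Z$ has the same one-dimensional marginal distributions under each $\mathbb{P}^n$, we may apply Lem. 4.15 and Rem. 4.18 to conclude that

$$\mathscr{L}(E, Y, Z, \bar B, \bar C \mid \mathbb{P}^n) \Rightarrow \mathscr{L}(\hat E, \hat Y, \hat Z, \hat B, \hat C), \tag{4.46}$$

$$\bigl\{(\bar b, \mathbb{P}^n\times\lambda_{[0,t]})\bigr\}_n \text{ is u.i.} \quad \forall t \in \mathbb{R}_+, \text{ and} \tag{4.47}$$

$$\bigl\{(\bar c, \mathbb{P}^n\times\lambda_{[0,t]})\bigr\}_n \text{ is u.i.} \quad \forall t \in \mathbb{R}_+. \tag{4.48}$$

If we show that $\lim_n \mathbb{P}^n[d(B, \bar B) > \varepsilon] = 0$ and $\lim_n \mathbb{P}^n[d(C, \bar C) > \varepsilon] = 0$ for each $\varepsilon$, then (4.45) follows from (4.46) (e.g., Lem. B.2). We will actually do slightly more. We will show that

$$\lim_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|B_s - \bar B_s\|\Bigr] = 0 \qquad \forall t \in \mathbb{R}_+, \text{ and} \tag{4.49}$$

$$\lim_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|C_s - \bar C_s\|\Bigr] = 0 \qquad \forall t \in \mathbb{R}_+. \tag{4.50}$$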

We now show that (4.49) holds by approximating $B$ and $\bar B$ with step functions. As a first step, we show that there exist random variables $\{\xi_i^n\}_{1\le i\le n^2}$ such that $\mathbb{P}[\xi_i^n = b_{T_i^n}] = 1$ and $\xi_i^n$ is $\mathscr{H}_{i+1}^n$-measurable. Recall that $T_i^n \triangleq (U + i - 1)/n$ for $i \in \{1, \dots, n^2\}$, and define the $\mathbb{R}^d$-valued random variables

$$\xi_i^n \triangleq \liminf_{m\to\infty}\, m\bigl(B_{T_i^n + 1/m} - B_{T_i^n}\bigr)$$

for $i \in \{1, \dots, n^2\}$, where the lim inf is taken at each coordinate. As $T_{i+1}^n - T_i^n \ge 1/n$ when $i \ge 1$, it is clear that $\xi_i^n \in \sigma\bigl(\Delta(B^{T_{i+1}^n}, T_i^n)\bigr) \subset \mathscr{H}_{i+1}^n$. In prose, $\xi_i^n$ is the right derivative of $B$ at the time $T_i^n$ (when it exists), so $\xi_i^n$ is fully determined by the changes in $B$ just after $T_i^n$. For each $\omega' \in \Omega'$, we define the sets

$$\begin{aligned}
A^{n,i}_{\omega'} &\triangleq \bigl\{u \in [0,1] : \xi_i^n(u, \omega') = b'_{(u+i-1)/n}(\omega')\bigr\}, \text{ and} \\
B^{n,i}_{\omega'} &\triangleq \bigl\{u \in [0,1] : \tfrac{\partial}{\partial t} B'_t(\omega') \text{ exists at } t = (u + i - 1)/n\bigr\}.
\end{aligned}$$

Recall that $b'_t(\omega') = \frac{\partial}{\partial t} B'_t(\omega')$ whenever this derivative exists. It is clear from the construction of $\xi_i^n$ that $\xi_i^n(u, \omega') = \frac{\partial}{\partial t} B'_t(\omega')$ at $t = (u + i - 1)/n$ whenever this derivative exists. Combining these two observations, we see that $B^{n,i}_{\omega'} \subset A^{n,i}_{\omega'}$ for all $\omega' \in \Omega'$. Using Fubini's Theorem, we may write

$$\mathbb{P}[\xi_i^n = b_{T_i^n}] = \int_{\Omega'} \lambda_{[0,1]}\bigl(A^{n,i}_{\omega'}\bigr)\,\mathbb{Q}(d\omega').$$

If we choose an $\omega' \in \Omega'$ such that $B'(\omega')$ is absolutely continuous, then we have $\lambda_{[0,1]}(B^{n,i}_{\omega'}) = 1$, so $\lambda_{[0,1]}(A^{n,i}_{\omega'}) = 1$ as well. As $B'$ is $\mathbb{Q}$-a.s. absolutely continuous, this equality holds for $\mathbb{Q}$-a.e. $\omega'$, and we conclude that

$$\mathbb{P}[\xi_i^n = b_{T_i^n}] = 1. \tag{4.51}$$
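The role of the right difference quotient can be seen in a toy case. The sketch below is an illustration under assumptions that are not in the original: the path B(t) = |t - 1/2| and the helper name `right_quotient` are hypothetical, and a single large m stands in for the lim inf. Away from the kink the quotient recovers the derivative; at the kink it returns the right derivative only, which is why the argument needs the randomized time to land on a point of differentiability almost surely.

```python
def path(t):
    """Toy absolutely continuous path with a single kink at t = 0.5 (an assumption)."""
    return abs(t - 0.5)


def right_quotient(f, t, m):
    """Right difference quotient m * (f(t + 1/m) - f(t)).

    For large m this approximates the right derivative of f at t; the text
    takes the lim inf over m, which one large m can only approximate.
    """
    return m * (f(t + 1.0 / m) - f(t))


if __name__ == "__main__":
    m = 10**6
    print(right_quotient(path, 0.3, m))  # close to the derivative -1 at t = 0.3
    print(right_quotient(path, 0.5, m))  # close to the right derivative +1 at the kink
```

Because the kink set of this toy path is a single point, a uniformly distributed evaluation time avoids it with probability one, mirroring the Fubini argument above.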

We pause for a brief comment on this argument. We never assume that $B^{n,i}_{\omega'}$ is a Borel measurable subset of $[0,1]$. If we choose $\omega' \in \Omega'$ such that $B'(\omega')$ is absolutely continuous, then $B^{n,i}_{\omega'}$ is necessarily Lebesgue measurable as it is the complement of a null set. On the other hand, $A^{n,i}_{\omega'}$ is a cross section of the $\mathscr{R}_{[0,1]}\otimes\mathscr{G}$-measurable set $\{\xi_i^n = b_{T_i^n}\}$, so it is Borel measurable. We only apply Fubini's Theorem to the set $\{\xi_i^n = b_{T_i^n}\}$, so the potential lack of Borel measurability of the set $B^{n,i}_{\omega'}$ is not a problem.

We now define some sequences of step functions which we will use to approximate $b$ and $\bar b$. Let

$$\begin{aligned}
b^n_t &\triangleq \sum_{i=1}^{n^2} \xi_i^n\, 1_{[T_i^n, T_{i+1}^n)}(t), & B^n_t &\triangleq \int_0^t b^n_s\,ds, \\
\bar b^n_t &\triangleq \sum_{i=1}^{n^2} \bar b_{T_i^n}\, 1_{[T_i^n, T_{i+1}^n)}(t), & \bar B^n_t &\triangleq \int_0^t \bar b^n_s\,ds, \text{ and} \\
b^{\Pi(n)}_t &\triangleq \sum_{i=1}^{n^2} b_{T_i^n}\, 1_{[T_i^n, T_{i+1}^n)}(t).
\end{aligned}$$

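As $\mathbb{P}[\xi_i^n = b_{T_i^n}] = 1$ for each $i$, $b^n$ and $b^{\Pi(n)}$ are $\mathbb{P}$-indistinguishable. Each $B^n$ is piecewise affine, so, for each $\omega$, we have $\frac{\partial}{\partial t} B^n_t(\omega) = b^n_t(\omega)$ except at a finite number of points. It is also clear that $\Delta(B^n, T_i^n)$ is $\sigma(\xi_j^n : j \ge i)$-measurable. As each $\xi_j^n$ is $\mathscr{H}_{j+1}^n$-measurable and $\mathscr{H}_{j+1}^n \subset \sigma\bigl(\mathscr{G}_i^n, \Delta(X, T_i^n)\bigr)$ for $j \ge i$, we conclude that $\Delta(B^n, T_i^n)$ is $\sigma\bigl(\mathscr{G}_i^n, \Delta(X, T_i^n)\bigr)$-measurable. This means that we may apply Lem. 3.40 to the $\mathbb{R}^d\times\mathbb{R}^d$-valued process $(B, B^n)$. In particular, taking $f : \mathbb{R}^d\times\mathbb{R}^d \to \mathbb{R}$ to be the function $f(x, y) = \|x - y\|$, we may apply Lem. 3.40 to conclude that for each $n$, we have

$$\mathbb{E}^n\Bigl[\int_0^t \|b_s - b^n_s\|\,ds\Bigr] = \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|b_s - b^n_s\|\,ds\Bigr] \qquad \forall t \in \mathbb{R}_+.$$

Fixing any $t \in \mathbb{R}_+$, we then write

$$\begin{aligned}
\limsup_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|B_s - B^n_s\|\Bigr]
&\le \limsup_{n\to\infty} \mathbb{E}^n\Bigl[\int_0^t \|b_s - b^n_s\|\,ds\Bigr] \\
&= \limsup_{n\to\infty} \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|b_s - b^n_s\|\,ds\Bigr] \\
&= \limsup_{n\to\infty} \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|b_s - b^{\Pi(n)}_s\|\,ds\Bigr].
\end{aligned}$$

$U$ is strongly independent of $b$ under $\mathbb{P}$, so we may apply Lem. 4.20 to conclude that

$$\lim_{n\to\infty} \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|b_s - b^{\Pi(n)}_s\|\,ds\Bigr] = 0.$$

In particular, we have now shown that

$$\lim_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|B_s - B^n_s\|\Bigr] = 0 \qquad \forall t \in \mathbb{R}_+. \tag{4.52}$$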

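To estimate the distance between $\bar B$ and $\bar B^n$, first notice that

$$\begin{aligned}
\int_{T_{i-1}^n\wedge t}^{T_i^n\wedge t} \bigl\|\bar b_s - \bar b^n_s\bigr\|\,ds
&= \int_{T_{i-1}^n\wedge t}^{T_i^n\wedge t} \bigl\|\hat b(Z_s, s) - \hat b(Z_{T_{i-1}^n}, T_{i-1}^n)\,1_{\{i>1\}}\bigr\|\,ds \\
&= \int_0^{1/n\,\wedge\,(t - T_{i-1}^n)^+} \bigl\|\hat b\bigl(\Theta_s(Z^{T_i^n}, T_{i-1}^n), T_{i-1}^n + s\bigr) - \hat b\bigl(\Theta_0(Z^{T_i^n}, T_{i-1}^n), T_{i-1}^n\bigr)\,1_{\{i>1\}}\bigr\|\,ds.
\end{aligned} \tag{4.53}$$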

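$\Theta(Z^{T_i^n}, T_{i-1}^n)$, $T_{i-1}^n$, and $1/n \wedge (t - T_{i-1}^n)^+$ are all $\mathscr{H}_i^n$-measurable, so (4.53) is $\mathscr{H}_i^n$-measurable as well. Then, fixing any $t$ and using the fact that $\mathbb{P}^n$ and $\mathbb{P}$ agree on each $\mathscr{H}_i^n$ (e.g., (a) of Thm. 3.6), we write

$$\begin{aligned}
\limsup_{n\to\infty} \mathbb{E}^n\Bigl[\int_0^t \|\bar b_s - \bar b^n_s\|\,ds\Bigr]
&= \limsup_{n\to\infty} \sum_{i=1}^{n^2} \mathbb{E}^n\Bigl[\int_{T_{i-1}^n\wedge t}^{T_i^n\wedge t} \bigl\|\bar b_s - \bar b^n_s\bigr\|\,ds\Bigr] \\
&= \limsup_{n\to\infty} \sum_{i=1}^{n^2} \mathbb{E}^{\mathbb{P}}\Bigl[\int_{T_{i-1}^n\wedge t}^{T_i^n\wedge t} \bigl\|\bar b_s - \bar b^n_s\bigr\|\,ds\Bigr] \\
&= \limsup_{n\to\infty} \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|\bar b_s - \bar b^n_s\|\,ds\Bigr].
\end{aligned}$$

Once we have reduced the estimate to a calculation under $\mathbb{P}$, we may use the fact that $U$ is strongly independent of $Z$, so $U$ is strongly independent of $\bar b$, and we may again apply Lem. 4.20 to conclude that

$$\lim_{n\to\infty} \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|\bar b_s - \bar b^n_s\|\,ds\Bigr] = 0.$$

In particular, we have now shown that

$$\lim_{n\to\infty} \mathbb{E}^n\Bigl[\int_0^t \|\bar b_s - \bar b^n_s\|\,ds\Bigr] = 0 \qquad \forall t \in \mathbb{R}_+. \tag{4.54}$$

As $\mathbb{E}^n\bigl[\sup_{s\le t} \|\bar B_s - \bar B^n_s\|\bigr] \le \mathbb{E}^n\bigl[\int_0^t \|\bar b_s - \bar b^n_s\|\,ds\bigr]$, (4.54) implies

$$\lim_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|\bar B_s - \bar B^n_s\|\Bigr] = 0 \qquad \forall t \in \mathbb{R}_+. \tag{4.55}$$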

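We are now almost done. We only need to estimate the difference between $B^n$ and $\bar B^n$. To do this, we define

$$\Psi^n_t \triangleq B^n_t - \bar B^n_t = \int_0^t b^n_s - \bar b^n_s\,ds.$$

We will now show that

$$\bigl\{(b^n - \bar b^n, \mathbb{P}^n\times\lambda_{[0,t]})\bigr\}_n \text{ is u.i.} \quad \forall t \in \mathbb{R}_+. \tag{4.56}$$

Combining (4.47) and (4.54) shows that $\{(\bar b^n, \mathbb{P}^n\times\lambda_{[0,t]})\}_n$ is uniformly integrable for each $t$, so we only need to show that $\{(b^n, \mathbb{P}^n\times\lambda_{[0,t]})\}_n$ is uniformly integrable for each $t$. As $\xi_i^n$ and $T_{i+1}^n\wedge t - T_i^n\wedge t$ are both $\mathscr{H}_{i+1}^n$-measurable, we may break the integral into pieces again to show that

$$\begin{aligned}
\sup_n \mathbb{E}^n\Bigl[\int_0^t \|b^n_s\|\,1_{\{\|b^n_s\| > M\}}\,ds\Bigr]
&= \sup_n \sum_{i=1}^{n^2} \mathbb{E}^n\Bigl[\bigl(T_{i+1}^n\wedge t - T_i^n\wedge t\bigr)\,\|\xi_i^n\|\,1_{\{\|\xi_i^n\| > M\}}\Bigr] \\
&= \sup_n \sum_{i=1}^{n^2} \mathbb{E}^{\mathbb{P}}\Bigl[\bigl(T_{i+1}^n\wedge t - T_i^n\wedge t\bigr)\,\|\xi_i^n\|\,1_{\{\|\xi_i^n\| > M\}}\Bigr] \\
&= \sup_n \mathbb{E}^{\mathbb{P}}\Bigl[\int_0^t \|b^n_s\|\,1_{\{\|b^n_s\| > M\}}\,ds\Bigr].
\end{aligned} \tag{4.57}$$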

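We have already shown that $b^n$ converges to $b$ in $L^1(\mathbb{P}\times\lambda_{[0,t]})$, which implies that $\{(b^n, \mathbb{P}\times\lambda_{[0,t]})\}_n$ is uniformly integrable. In particular, we may make (4.57) arbitrarily small by choosing sufficiently large $M$. But this means that $\{(b^n, \mathbb{P}^n\times\lambda_{[0,t]})\}_n$ is uniformly integrable, so we have (4.56).

Set $\delta_i^n \triangleq T_i^n - T_{i-1}^n = (U/n)\,1_{\{i=1\}} + (1/n)\,1_{\{i>1\}}$. We then write

$$\begin{aligned}
\mathbb{E}^n\bigl[\Psi^n_{T_i^n} - \Psi^n_{T_{i-1}^n} \bigm| \mathscr{F}^0_{T_{i-1}^n}\bigr]
&= \delta_i^n\, \mathbb{E}^n\bigl[\xi_{i-1}^n - \hat b(Z_{T_{i-1}^n}, T_{i-1}^n) \bigm| \mathscr{F}^0_{T_{i-1}^n}\bigr] \\
&= \delta_i^n\, \mathbb{E}^{\mathbb{P}}\bigl[\xi_{i-1}^n - \hat b(Z_{T_{i-1}^n}, T_{i-1}^n) \bigm| \mathscr{G}_{i-1}^n\bigr] \\
&= \delta_i^n\, \mathbb{E}^{\mathbb{P}}\bigl[b_{T_{i-1}^n} \bigm| Z_{T_{i-1}^n}, T_{i-1}^n\bigr] - \delta_i^n\, \hat b(Z_{T_{i-1}^n}, T_{i-1}^n) \\
&= 0.
\end{aligned}$$

The first equality follows from the $\mathscr{F}_0^0$-measurability of $\delta_i^n$. The second equality follows from the $\mathscr{H}_i^n$-measurability of $\xi_{i-1}^n - \hat b(Z_{T_{i-1}^n}, T_{i-1}^n)$ and property (b) of Thm. 3.6. The third equality follows from the $\mathbb{P}$-equivalence of $\xi_{i-1}^n$ and $b_{T_{i-1}^n}$ and the definition of $\mathscr{G}_{i-1}^n$. The final equality follows from Cor. 4.12 and the fact that $U$ is strongly independent of $b$ and $Z$ under $\mathbb{P}$. This means that $\{\Psi^n_{T_i^n}\}_{0\le i\le n^2}$ is a discrete time martingale under $\mathbb{P}^n$, and we may apply Lem. 4.25 to conclude that

$$\lim_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|B^n_s - \bar B^n_s\|\Bigr] = \limsup_{n\to\infty} \mathbb{E}^n\Bigl[\sup_{s\le t} \|\Psi^n_s\|\Bigr] = 0 \qquad \forall t \in \mathbb{R}_+. \tag{4.58}$$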

Combining (4.52), (4.55), and (4.58) gives (4.49). We make essentially the same argument to get (4.50). Combining (4.46) with (4.49) and (4.50) then gives (4.45), so we may apply Thm. F.1 to conclude that $\hat Y$ has the characteristics $(\hat B, \hat C)$ with respect to $(\hat{\mathbb{F}}, \hat{\mathbb{P}})$, completing the proof.

4.59 Remark. In this proof, we construct $\xi_i^n$ such that $\mathbb{P}[\xi_i^n = b_{T_i^n}] = 1$ for each $i$ by taking the right derivative of $B$ at $T_i^n$. In this remark, we want to emphasize that this does not imply that $\xi_i^n$ and $b_{T_i^n}$ are $\mathbb{P}^n$-indistinguishable. In general, the measures in the sequence $\{\mathbb{P}^n\}_n$ are not equivalent to $\mathbb{P}$. The reason that $\xi_i^n$ and $b_{T_i^n}$ agree under $\mathbb{P}$ is that $U$ is (strongly) independent of $B$ under $\mathbb{P}$, and $B$ is absolutely continuous. As a result, $T_i^n$ is $\mathbb{P}$-a.s. a point at which $B$ is differentiable, and the left and right derivatives agree at such a point. Once we start constructing new measures, $U$ and $B$ are no longer independent. In fact, we would expect that the characteristics quite often have "kinks" at reset times as we reset the dynamics of the process at these times, so we should not expect the left and right derivatives to agree at these points. In particular, if we resume the setting of Remark 4.32, we see that $C$ is $\mathbb{P}$-a.s. linear, so $C$ is $\mathbb{P}$-a.s. differentiable for all $t > 0$; however, $C$ has a "kink" at each reset time under each $\mathbb{P}^n$ whenever we "reflip" and change the variance accumulation rate. Also notice that the right derivative of $C$ at the reset time $t_i^n$ is equal to the derivative of $C$ over the interval $(t_i^n, t_{i+1}^n)$, while the left derivative of $C$ at the reset time $t_i^n$ is equal to the derivative of $C$ over the previous interval $(t_{i-1}^n, t_i^n)$.

To get the theorem announced in Section 2.2, we must show that we can add a Wiener process to the stochastic basis produced in Thm. 4.30. This involves moving to an extension, so we make the following definition.

4.60 Definition. Let $X$ denote the canonical process on the space $C(\mathbb{R}_+; \mathbb{R}^r)$, let $\mathscr{C}$ denote the Borel σ-field on $C(\mathbb{R}_+; \mathbb{R}^r)$, let $\mathbb{C}^0 = \{\sigma(X^t)\}_{t\in\mathbb{R}_+}$ denote the filtration generated by $X$, and let $\mathbb{W}$ denote Wiener's measure on $C(\mathbb{R}_+; \mathbb{R}^r)$. We refer to $\mathcal{W} \triangleq \bigl(C(\mathbb{R}_+; \mathbb{R}^r), \mathscr{C}, \mathbb{C}^0, \mathbb{W}\bigr)$ as Wiener's basis on $C(\mathbb{R}_+; \mathbb{R}^r)$.

According to this definition, Wiener's basis on $C(\mathbb{R}_+; \mathbb{R}^r)$ does not satisfy the usual conditions, but this will not matter in what follows. We restate the result presented in Section 2.2 for the reader's convenience.

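2.11 Theorem. Let $W$ be an $\mathbb{R}^{r_1}$-valued Wiener process, let $\mu$ be an adapted, $\mathbb{R}^d$-valued process, let $\sigma$ be an adapted, $\mathbb{R}^d\otimes\mathbb{R}^{r_1}$-valued process, assume that

$$\mathbb{E}\Bigl[\int_0^t \|\mu_s\| + \|\sigma_s\sigma_s^T\|\,ds\Bigr] < \infty \qquad \forall t \in \mathbb{R}_+, \tag{2.12}$$

and set

$$Y_t \triangleq \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s. \tag{2.13}$$

Let $E$ be a Polish space, and let $Z$ be a continuous, $E$-valued process with $Z = \Phi(Z_0, Y)$ for some continuous updating function $\Phi$. Finally, suppose that $N \subset \mathbb{R}_+$ is a Lebesgue-null set and that we have (deterministic) functions $\hat\mu : E\times\mathbb{R}_+ \to \mathbb{R}^d$ and $\hat\sigma : E\times\mathbb{R}_+ \to \mathbb{R}^d\otimes\mathbb{R}^{r_2}$ such that $\hat\mu(Z_t, t) = \mathbb{E}[\mu_t \mid Z_t]$ a.s. and $\hat\sigma\hat\sigma^T(Z_t, t) = \mathbb{E}[\sigma_t\sigma_t^T \mid Z_t]$ a.s. when $t \notin N$. Then there exists a stochastic basis $(\hat\Omega, \hat{\mathscr{F}}, \hat{\mathbb{F}}, \hat{\mathbb{P}})$ supporting processes $\hat W$, $\hat Y$, and $\hat Z$ such that
(a) $\hat W$ is an $\mathbb{R}^{r_2}$-valued Wiener process,
(b) $\hat Y_t = \int_0^t \hat\mu(\hat Z_s, s)\,ds + \int_0^t \hat\sigma(\hat Z_s, s)\,d\hat W_s$,
(c) $\hat Z$ is a continuous, $E$-valued process with $\hat Z = \Phi(\hat Z_0, \hat Y)$, and
(d) $\hat Z$ has the same one-dimensional marginal distributions as $Z$.

Proof. Set $b \triangleq \mu$, $\hat b \triangleq \hat\mu$, $c \triangleq \sigma\sigma^T$, and $\hat c \triangleq \hat\sigma\hat\sigma^T$. It is clear from (2.13) that $Y$ is an Itô process with the characteristics $B_t \triangleq \int_0^t b_s\,ds$ and $C_t \triangleq \int_0^t c_s\,ds$ (e.g., Lem. D.2), so Thm. 4.30 asserts the existence of a stochastic basis $\tilde{\mathcal{B}} = (\tilde\Omega, \tilde{\mathscr{F}}, \tilde{\mathbb{F}}, \tilde{\mathbb{P}})$ that supports adapted, continuous processes $\tilde Y$ and $\tilde Z$ such that $\tilde Y$ is an Itô process with characteristics $(\tilde B, \tilde C)$, where $\tilde B_t \triangleq \int_0^t \hat b(\tilde Z_s, s)\,ds$ and $\tilde C_t \triangleq \int_0^t \hat c(\tilde Z_s, s)\,ds$, $\tilde Z = \Phi(\tilde Z_0, \tilde Y)$, and $\tilde Z$ has the same one-dimensional marginal distributions as $Z$. Set $\tilde M \triangleq \tilde Y - \tilde B$, so $\tilde M$ is a local martingale with

$$\langle \tilde M \rangle_t = \tilde C_t = \int_0^t \hat c(\tilde Z_s, s)\,ds = \int_0^t \hat\sigma\hat\sigma^T(\tilde Z_s, s)\,ds.$$

Let $\mathcal{W}$ denote Wiener's basis on $C(\mathbb{R}_+; \mathbb{R}^{r_2})$, and set $\hat{\mathcal{B}} = (\hat\Omega, \hat{\mathscr{F}}, \hat{\mathbb{F}}, \hat{\mathbb{P}}) \triangleq \tilde{\mathcal{B}}\otimes\mathcal{W}$ (see Def. 1.11). Let $\hat B$, $\hat C$, $\hat M$, $\hat Y$, $\hat Z$ denote the extensions of $\tilde B$, $\tilde C$, $\tilde M$, $\tilde Y$ and $\tilde Z$ from $\tilde\Omega$ to $\hat\Omega \triangleq \tilde\Omega\times C(\mathbb{R}_+; \mathbb{R}^{r_2})$. Moving to the extension, we still have $\hat B_t \triangleq \int_0^t \hat b(\hat Z_s, s)\,ds$ and $\hat Z = \Phi(\hat Z_0, \hat Y)$, and $\hat Z$ still has the same one-dimensional marginal distributions as $Z$. Thm. D.9 asserts the existence of an $\hat{\mathbb{F}}$-adapted, continuous, $\mathbb{R}^{r_2}$-valued Wiener process $\hat W$ defined on $\hat\Omega$ such that

$$\hat M_t = \int_0^t \hat\sigma(\hat Z_s, s)\,d\hat W_s.$$

As $\hat Y = \hat M + \hat B$, we see that $\hat Y$ satisfies (b), and we are done.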
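To make the coefficient in statement (b) concrete, consider a toy case that is not taken from the text above: d = 1, μ = 0, Z = Y, and σ equal to one of two constants s1 or s2, chosen by a fair coin at time 0 independently of W. Then Y_t is an equal mixture of N(0, s1² t) and N(0, s2² t), and Bayes' rule gives E[σ² | Y_t = y] in closed form. The function name `local_variance` and the parameter values are assumptions made for illustration; the sketch only evaluates the conditional variance rate and does not simulate the mimicking process.

```python
import math


def gaussian_density(y, variance):
    """Density of the N(0, variance) distribution at y."""
    return math.exp(-y * y / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def local_variance(y, t, s1, s2):
    """E[sigma^2 | Y_t = y] when sigma is s1 or s2 with probability 1/2 each.

    Toy model (an assumption): Y_t = sigma * W_t with sigma independent of W,
    so Y_t is an equal mixture of N(0, s1^2 t) and N(0, s2^2 t).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    p1 = gaussian_density(y, s1 * s1 * t)
    p2 = gaussian_density(y, s2 * s2 * t)
    # Posterior-weighted average of the two variance rates.
    return (s1 * s1 * p1 + s2 * s2 * p2) / (p1 + p2)


if __name__ == "__main__":
    for y in (0.0, 1.0, 3.0):
        print(y, local_variance(y, 1.0, 1.0, 2.0))
```

With s1 < s2, values of y near zero are more likely under the low-variance regime, so the conditional rate at y = 0 sits below the unconditional mean (s1² + s2²)/2, and it approaches s2² as |y| grows.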

Appendix A Galmarino's Test

In this section, we assume that we are in Setting 3.1.

A.1 Lemma (Galmarino's test). Let $T$ be an $\mathscr{F}$-measurable, $\mathbb{R}_+$-valued random variable. The following are equivalent:
(a) $T$ is an $\mathbb{F}^0$-stopping time, and
(b) if $E(\omega) = E(\omega')$ and $X_u(\omega) = X_u(\omega')$ for $0 \le u \le T(\omega')$, then $T(\omega) = T(\omega')$.
Moreover, if $T$ is an $\mathbb{F}^0$-stopping time and $Z$ is an $\mathscr{F}$-measurable random variable, then
(c) $Z$ is $\mathscr{F}_T^0$-measurable if and only if $Z = Z(E, X^T)$.
In particular, $\mathscr{F}_T^0 = \sigma(E, X^T)$.

A.2 Remark. If $T$ is the last time that $X$ leaves an open set $G$, then $X_T \notin G$. This means that $T$ is also the last time that the process stopped at $T$ leaves the set $G$. In particular, $T = T(E, X^T)$, but $T$ is clearly not a stopping time as you must look into the future to determine if you will enter the set $G$ again later. In particular, the property which must be checked in (b) is strictly stronger than the property which must be checked in (c).

Proof. First we show that $Z$ is $\mathscr{F}_t^0$-measurable if and only if $Z$ is $\mathscr{F}$-measurable and $Z = Z(E, X^t)$.

⇒ The class of bounded random variables such that $Z = Z(E, X^t)$ is a monotone class that contains finite products of the form $f(E)\,g_1(X_{t_1}) \cdots g_n(X_{t_n})$ for bounded measurable $f$ and $g_i$ and $0 \le t_i \le t$. The property then holds for all bounded $Z \in \mathscr{F}_t^0$ by a monotone class argument.

⇐ $E$ and $X^t$ are both $\mathscr{F}_t^0$-measurable, and $Z$ is $\mathscr{F}$-measurable, so $Z(E, X^t)$ is also $\mathscr{F}_t^0$-measurable.

Now we show the first equivalence.

⇒ Assume $T$ is an $\mathbb{F}^0$-stopping time and fix $\omega$ and $\omega'$ with $E(\omega) = E(\omega')$ and $X_u(\omega) = X_u(\omega')$ for $0 \le u \le t \triangleq T(\omega')$. $T$ is an $\mathbb{F}^0$-stopping time, so $\{T = t\} \in \mathscr{F}_t^0$. By the previous case,

$$1_{\{T=t\}}(\omega) = 1_{\{T=t\}}\bigl(E(\omega), X^t(\omega)\bigr) = 1_{\{T=t\}}\bigl(E(\omega'), X^t(\omega')\bigr) = 1_{\{T=t\}}(\omega') = 1.$$

⇐ Assume that property (b) holds. We need to show that this implies $\{T \le t\} \in \mathscr{F}_t^0$. By the previous case, it is sufficient to show that $\omega \in \{T \le t\} \Rightarrow \bigl(E(\omega), X^t(\omega)\bigr) \in \{T \le t\}$. Fix $\omega \in \{T \le t\}$ and set $\omega^t \triangleq \bigl(E(\omega), X^t(\omega)\bigr)$. Then $E(\omega^t) = E(\omega)$ and $X_u(\omega^t) = X_u(\omega)$ for $0 \le u \le T(\omega) \le t$. Using the assumption, we see that $T(\omega^t) = T(\omega)$, so $T(\omega^t) \le t$ and $\omega^t \in \{T \le t\}$.

Finally we show (c).

⇒ Assume that $Z \in \mathscr{F}_T^0$. Fix any $\omega$ and set $t \triangleq T(\omega)$ and $z \triangleq Z(\omega)$. By assumption, we have $A \triangleq \{T = t \text{ and } Z = z\} \in \mathscr{F}_t^0$. So $\omega \in A \Rightarrow \bigl(E(\omega), X^t(\omega)\bigr) \in A$, but then $Z\bigl(E(\omega), X^t(\omega)\bigr) = z$. As $\omega$ was arbitrary, we conclude that $Z = Z(E, X^T)$.

⇐ Suppose that $Z$ is $\mathscr{F}$-measurable, and that $Z = Z(E, X^T)$. Fixing an arbitrary constant $z$, we need to show that $A \triangleq \{Z \le z \text{ and } T \le t\} \in \mathscr{F}_t^0$. Fix some $\omega' \in A$ and set $\omega \triangleq \bigl(E(\omega'), X^t(\omega')\bigr)$. Then $E(\omega) = E(\omega')$ and $X_u(\omega) = X_u(\omega')$ for $0 \le u \le T(\omega') \le t$, so $T(\omega) = T(\omega')$ by the previous equivalence. Then

$$X^{T(\omega)}(\omega) = X^{T(\omega')}(\omega) = X^{T(\omega')}(\omega').$$

Using the assumption, we see that

$$Z(\omega) = Z\bigl(E(\omega), X^{T(\omega)}(\omega)\bigr) = Z\bigl(E(\omega'), X^{T(\omega')}(\omega')\bigr) = Z(\omega'),$$

so $\omega \in A$. This implies that $A \in \mathscr{F}_t^0$, so we are done.

A.3 Lemma. Let $S \le T$ be $\mathbb{F}^0$-stopping times with $S \in \mathbb{R}_+$ and $T \in \mathbb{R}_+$. Then $\mathscr{F}_T^0 = \sigma\bigl(\mathscr{F}_S^0, \Delta(X^T, S)\bigr)$.

Proof. By the previous lemma, we have $\mathscr{F}_T^0 = \sigma(E, X^T)$. Writing $X^T = X^S + \Theta\bigl(\Delta(X^T, S), -S\bigr)$, and observing that $E$, $X^S$, and $S$ are all $\mathscr{F}_S^0$-measurable, we conclude that $\mathscr{F}_T^0 \subset \sigma\bigl(\mathscr{F}_S^0, \Delta(X^T, S)\bigr)$. On the other hand, $\mathscr{F}_S^0$, $X^T$, and $S$ are all $\mathscr{F}_T^0$-measurable. This means that we also have $\sigma\bigl(\mathscr{F}_S^0, \Delta(X^T, S)\bigr) \subset \mathscr{F}_T^0$, completing the proof.

A.4 Lemma. Suppose that $S$ is a finite $\mathbb{F}^0$-stopping time and that $T \ge S$ is an $\mathbb{R}_+$-valued random time. Further suppose that $\mathscr{G} \subset \mathscr{F}_S^0$ and that $Z$ is a random variable. Let $I \triangleq (E, X)$ denote the identity operator on $\Omega$, set $C_0 \triangleq C_0(\mathbb{R}_+; \mathbb{R}^d)$, and let $\mathscr{C}_0$ denote the Borel σ-field on $C_0$. Then the following are equivalent:
(a) $Z$ is $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)$-measurable,
(b) $Z = f\bigl(I; \Delta(X^T, S)\bigr)$ for some $f : \Omega\times C_0 \to \mathbb{R}$ which is $\mathscr{G}\otimes\mathscr{C}_0$-measurable, and
(c) $Z = g\bigl(E, X^S; \Delta(X^T, S)\bigr)$ for some $g : \Omega\times C_0 \to \mathbb{R}$ which is $\mathscr{G}\otimes\mathscr{C}_0$-measurable.

Proof. $Z \in b\mathscr{F}$ means that $Z$ is a bounded $\mathscr{F}$-measurable random variable. Let

$$\begin{aligned}
\mathscr{B} &\triangleq \bigl\{Z \in b\mathscr{F} : Z = f\bigl(I; \Delta(X^T, S)\bigr) \text{ with } f \in b\,\mathscr{G}\otimes\mathscr{C}_0\bigr\}, \\
\mathscr{C} &\triangleq \bigl\{Z \in b\mathscr{F} : Z = g\bigl(E, X^S; \Delta(X^T, S)\bigr) \text{ with } g \in b\,\mathscr{G}\otimes\mathscr{C}_0\bigr\}, \text{ and} \\
\mathscr{D} &\triangleq \bigl\{Z \in b\mathscr{F} : Z = Z'\,h\bigl(\Delta(X^T, S)\bigr) \text{ with } Z' \in b\mathscr{G} \text{ and } h \in b\mathscr{C}_0\bigr\}.
\end{aligned}$$

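It is clear that $\mathscr{D} \subset \mathscr{B}$. As $Z' = Z'(E, X^S)$ by (c) of Lem. A.1, we also have $\mathscr{D} \subset \mathscr{C}$. As $\mathscr{B}$ and $\mathscr{C}$ are closed with respect to uniformly bounded, pointwise limits, they are monotone classes, and we have $\sigma(\mathscr{D}) = \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \subset \mathscr{B} \cap \mathscr{C}$. In particular, (a) implies (b) and (c).

Now assume $Z = f\bigl(I; \Delta(X^T, S)\bigr)$ for some $\mathscr{G}\otimes\mathscr{C}_0/\mathscr{R}$-measurable $f$. By checking the preimages of rectangles, we see that the map which sends $\omega \mapsto \bigl(\omega, \Delta(X^T(\omega), S(\omega))\bigr)$ is $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)/\mathscr{G}\otimes\mathscr{C}_0$-measurable. As $Z$ is the composition of this map with $f$, it follows that $Z$ is $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)/\mathscr{R}$-measurable. In particular, (b) implies (a).

Let $\varphi$ denote the map $\omega \mapsto \bigl(E(\omega), X^S(\omega)\bigr)$. If $G \in \mathscr{G}$, then Lem. A.1 asserts that $\omega \in G \Leftrightarrow \varphi(\omega) \in G$. As a result, $\varphi^{-1}(G) = G$ and $\varphi$ is $\mathscr{G}/\mathscr{G}$-measurable. Now assume that $Z = g\bigl(\varphi; \Delta(X^T, S)\bigr)$ for some $\mathscr{G}\otimes\mathscr{C}_0/\mathscr{R}$-measurable $g$. By checking the preimages of rectangles, we see that the map which sends $\omega \mapsto \bigl(\varphi(\omega), \Delta(X^T(\omega), S(\omega))\bigr)$ is $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)/\mathscr{G}\otimes\mathscr{C}_0$-measurable. As $Z$ is the composition of this map with $g$, it follows that $Z$ is $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)/\mathscr{R}$-measurable. In particular, (c) implies (a) and we are done.

A.5 Lemma. Let $S \le T$ be finite $\mathbb{F}^0$-stopping times and let $U \ge T$ be an $\mathbb{R}_+$-valued random time. If $\mathscr{G} \subset \mathscr{F}_S^0$, then

$$\sigma\bigl(\mathscr{G}, \Delta(X^U, S)\bigr) \cap \mathscr{F}_T^0 = \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \cap \mathscr{F}_T^0.$$

Proof. Fix any $\mathbb{R}_+$-valued random times $U_1 \ge T$ and $U_2 \ge T$. We will show that

$$\sigma\bigl(\mathscr{G}, \Delta(X^{U_1}, S)\bigr) \cap \mathscr{F}_T^0 \subset \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr) \cap \mathscr{F}_T^0. \tag{A.6}$$

To do this, choose any bounded random variable $Z$ which is measurable with respect to $\sigma\bigl(\mathscr{G}, \Delta(X^{U_1}, S)\bigr) \cap \mathscr{F}_T^0$. Using Lem. A.4, we may choose some $g \in \mathscr{G}\otimes\mathscr{C}_0$ such that $Z = g\bigl(E, X^S; \Delta(X^{U_1}, S)\bigr)$. Now fix any $\omega \in \Omega$, let $t = T(\omega)$, and set $\omega^t \triangleq \bigl(E(\omega), X^t(\omega)\bigr) \in \Omega$. As $\omega^t$ agrees with $\omega$ up until time $t = T(\omega) \ge S(\omega)$, $T(\omega^t) = T(\omega)$ and $S(\omega^t) = S(\omega)$ by (b) of Lem. A.1. This means that

$$X^S(\omega^t) = \nabla\bigl(X(\omega^t), S(\omega^t)\bigr) = \nabla\bigl(X(\omega^t), S(\omega)\bigr) = \nabla\bigl(X(\omega), S(\omega)\bigr) = X^S(\omega).$$

We also have $U_i(\omega^t) \ge T(\omega^t) = t$. This means that

$$X^{U_1}(\omega^t) = \nabla\bigl(X(\omega^t), U_1(\omega^t)\bigr) = X(\omega^t) = \nabla\bigl(X(\omega^t), U_2(\omega^t)\bigr) = X^{U_2}(\omega).$$

As $Z \in \mathscr{F}_T^0$, an application of (b) of Lem. A.1 followed by the use of the characterization in terms of $g$ gives

$$\begin{aligned}
Z(\omega) = Z(\omega^t)
&= g\bigl(E(\omega^t), X^S(\omega^t); \Delta(X^{U_1}(\omega^t), S(\omega^t))\bigr) \\
&= g\bigl(E(\omega), X^S(\omega); \Delta(X^{U_2}(\omega), S(\omega))\bigr).
\end{aligned}$$

As $\omega$ is arbitrary, we have $Z = g\bigl(E, X^S; \Delta(X^{U_2}, S)\bigr)$ and the characterization given in the preceding lemma implies that $Z \in \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr)$. We have now shown that (A.6) holds.

The result follows by first taking $U_1 = U$ and $U_2 = T$ and applying (A.6) to conclude that $\sigma\bigl(\mathscr{G}, \Delta(X^U, S)\bigr) \cap \mathscr{F}_T^0 \subset \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)$, and then taking $U_1 = T$ and $U_2 = U$ and applying (A.6) to conclude that $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \cap \mathscr{F}_T^0 \subset \sigma\bigl(\mathscr{G}, \Delta(X^U, S)\bigr)$.

A.7 Corollary. Let $S \le T$ be finite $\mathbb{F}^0$-stopping times, and let $U_1 \ge T$ and $U_2 \ge T$ be $\mathbb{R}_+$-valued random times. If $\mathscr{G} \subset \mathscr{F}_S^0$ and $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^{U_1}, S)\bigr)$, then

$$\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) = \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr) \cap \mathscr{F}_T^0, \text{ and} \tag{A.8}$$

$$\sigma\bigl(\mathscr{G}, \Delta(X^T, S), \Delta(X^{U_2}, T)\bigr) = \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr). \tag{A.9}$$

In particular, if $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X, S)\bigr)$ then $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)$.

Proof. $S$ is an $\mathbb{F}^0$-stopping time and $S \le T$, so $S$ is $\mathscr{F}_T^0$-measurable. This means that we have $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \subset \mathscr{F}_T^0$ and that $T - S \in \mathscr{F}_T^0$. As $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^{U_1}, S)\bigr)$ by assumption, we have

$$T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^{U_1}, S)\bigr) \cap \mathscr{F}_T^0 = \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \cap \mathscr{F}_T^0 = \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr) \cap \mathscr{F}_T^0$$

by the previous lemma. In particular, we know that $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr)$, so if we then write $\Delta(X^T, S) = \nabla\bigl(\Delta(X^{U_2}, S), T - S\bigr)$, then it is clear that $\sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \subset \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr)$, and we have one of the inclusions needed for (A.8). The opposite inclusion follows immediately from the previous lemma as

$$\sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr) \cap \mathscr{F}_T^0 = \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr) \cap \mathscr{F}_T^0 \subset \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr).$$

To show that $\sigma\bigl(\mathscr{G}, \Delta(X^T, S), \Delta(X^{U_2}, T)\bigr) \subset \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr)$, we write

$$\Delta(X^T, S) = \nabla\bigl(\Delta(X^{U_2}, S), T - S\bigr), \qquad \Delta(X^{U_2}, T) = \Delta\bigl(\Delta(X^{U_2}, S), T - S\bigr),$$

and use the fact that $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr)$. To show that $\sigma\bigl(\mathscr{G}, \Delta(X^{U_2}, S)\bigr) \subset \sigma\bigl(\mathscr{G}, \Delta(X^T, S), \Delta(X^{U_2}, T)\bigr)$, we write

$$\Delta(X^{U_2}, S) = \Delta(X^T, S) + \Theta\bigl(\Delta(X^{U_2}, T), -(T - S)\bigr),$$

and use the fact that $T - S \in \sigma\bigl(\mathscr{G}, \Delta(X^T, S)\bigr)$. We have now shown (A.9).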

Appendix B Metric Space-Valued Random Variables.

Here we collect a number of results on metric space-valued random variables. We recall the following definition from Section 1.2.

1.9 Definition. Let $E$ be a topological space, and let $\{X^n\}_{n \le \infty}$ be a sequence of $E$-valued random variables, possibly defined on different probability spaces. We say that $X^n$ converges in distribution to $X^\infty$, written $X^n \Rightarrow X^\infty$, if
\[
\lim_{n \to \infty} \mathbb{E}^n f(X^n) = \mathbb{E}^\infty f(X^\infty)
\]
for each bounded, continuous function $f : E \to \mathbb{R}$.

B.1 Theorem (Portmanteau). When $E$ is a metric space, the following are equivalent:
(a) $X^n \Rightarrow X^\infty$,
(b) $\mathbb{E}^n[f(X^n)] \to \mathbb{E}^\infty[f(X^\infty)]$ for all bounded, uniformly continuous $f$,
(c) $\limsup_n \mathbb{P}^n[X^n \in F] \le \mathbb{P}^\infty[X^\infty \in F]$ for all closed $F \subset E$,
(d) $\liminf_n \mathbb{P}^n[X^n \in G] \ge \mathbb{P}^\infty[X^\infty \in G]$ for all open $G \subset E$, and
(e) $\mathbb{P}^n[X^n \in A] \to \mathbb{P}^\infty[X^\infty \in A]$ for all $A \subset E$ with $\mathbb{P}^\infty[X^\infty \in \partial A] = 0$.

Proof. See [Bil68] Thm. 2.1.
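As a small illustration of why (c) and (d) are inequalities and why (e) needs the boundary condition, the following sketch takes the deterministic random variables $X^n = 1/n$ and $X^\infty = 0$ on $E = \mathbb{R}$. The helper names (`law`, `expect`, `prob`) and the particular sets $F = \{0\}$ and $G = (0, 2)$ are our own choices for this example and do not come from [Bil68].

```python
import math

def law(n):
    """Law of X^n: a unit point mass at 1/n (at 0 when n is None, i.e. n = infinity)."""
    return {0.0 if n is None else 1.0 / n: 1.0}

def expect(f, mu):
    """Expectation of f under the discrete law mu = {atom: probability}."""
    return sum(p * f(x) for x, p in mu.items())

def prob(indicator, mu):
    """Probability of the set described by the indicator function."""
    return sum(p for x, p in mu.items() if indicator(x))

f = math.atan                     # bounded and uniformly continuous
in_F = lambda x: x == 0.0         # closed set F = {0}
in_G = lambda x: 0.0 < x < 2.0    # open set G = (0, 2)

for n in (1, 10, 100, 1000):
    print(n, expect(f, law(n)), prob(in_F, law(n)), prob(in_G, law(n)))
print("limit", expect(f, law(None)), prob(in_F, law(None)), prob(in_G, law(None)))
```

Here the expectations converge, the mass of $F$ jumps up in the limit, and the mass of $G$ drops in the limit. Since $\partial F = \{0\}$ carries all of the limiting mass, (e) does not apply to $A = F$.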


B.2 Lemma. Let $(E, d)$ be a metric space and let $\{X^n\}_{n \in \mathbb{N}}$ and $\{Y^n\}_{n \in \mathbb{N}}$ be collections of $E$-valued random variables. If $X^n \Rightarrow X^\infty$ and $d(X^n, Y^n) \Rightarrow 0$, then $Y^n \Rightarrow X^\infty$.

Proof. Fix any bounded, uniformly continuous $f : E \to \mathbb{R}$, and write
\[
\bigl| \mathbb{E}^\infty[f(X^\infty)] - \mathbb{E}^n[f(Y^n)] \bigr| \le \bigl| \mathbb{E}^\infty[f(X^\infty)] - \mathbb{E}^n[f(X^n)] \bigr| + \mathbb{E}^n\bigl[ |f(X^n) - f(Y^n)| \bigr].
\]
As $d(X^n, Y^n) \Rightarrow 0$ and $f$ is uniformly continuous, we conclude that $f(X^n) - f(Y^n) \Rightarrow 0$, and the result follows.

B.3 Lemma. Let $(E, d)$ be a metric space, let $\{S^n\}_{n \in \mathbb{N}}$ be a sequence of probability spaces where $S^n = (\Omega^n, \mathcal{F}^n, \mathbb{P}^n)$, and suppose that on each space $S^n$ there is defined an $E$-valued random variable $Y^n$ and a collection of approximating random variables $\{X^{n,a}\}_{a \in A}$. If
\[
\inf_{a \in A} \sup_{n \in \mathbb{N}} \mathbb{P}^n\bigl[ d(X^{n,a}, Y^n) > \delta \bigr] = 0 \tag{B.4}
\]
for each $\delta > 0$, and $X^{n,a} \Rightarrow X^{\infty,a}$ as $n \to \infty$ for each $a \in A$, then $Y^n \Rightarrow Y^\infty$.

Proof. Fix any bounded, uniformly continuous $f : E \to \mathbb{R}$, choose $C$ such that $|f| \le C$, and then choose any $\varepsilon > 0$. Using the uniform continuity of $f$, choose $\delta = \delta(\varepsilon)$ so small that $|f(e_2) - f(e_1)| \le \varepsilon/6$ when $d(e_1, e_2) \le \delta$. Using (B.4), choose $a \in A$ such that
\[
\sup_{n \in \mathbb{N}} \mathbb{P}^n\bigl[ d(X^{n,a}, Y^n) > \delta \bigr] < \varepsilon/(12C).
\]
Finally, choose $N = N(\varepsilon, a)$ so large that
\[
\bigl| \mathbb{E}^\infty[f(X^{\infty,a})] - \mathbb{E}^n[f(X^{n,a})] \bigr| < \varepsilon/3
\]
when $n \ge N$. Putting this all together gives
\[
\begin{aligned}
\bigl| \mathbb{E}^\infty[f(Y^\infty)] - \mathbb{E}^n[f(Y^n)] \bigr|
&\le \mathbb{E}^\infty\bigl[ |f(Y^\infty) - f(X^{\infty,a})| \bigr] + \bigl| \mathbb{E}^\infty[f(X^{\infty,a})] - \mathbb{E}^n[f(X^{n,a})] \bigr| + \mathbb{E}^n\bigl[ |f(X^{n,a}) - f(Y^n)| \bigr] \\
&\le 2C\, \mathbb{P}^\infty\bigl[ d(Y^\infty, X^{\infty,a}) > \delta \bigr] + \varepsilon/6 + \varepsilon/3 + 2C\, \mathbb{P}^n\bigl[ d(X^{n,a}, Y^n) > \delta \bigr] + \varepsilon/6 \\
&\le \varepsilon
\end{aligned}
\]
when $n \ge N$. As $f$ and $\varepsilon$ are arbitrary, we conclude that $Y^n \Rightarrow Y^\infty$.

B.5 Lemma. Let $E_1$ and $E_2$ be topological spaces and let $\{X^i_n\}$ be a collection of $E_i$-valued random variables for $i \in \{1, 2\}$. Then the collection of $E_1 \times E_2$-valued random variables $\{(X^1_n, X^2_n)\}$ is tight if and only if the collection $\{X^1_n\}$ is tight and the collection $\{X^2_n\}$ is tight.

Proof. We let $\pi_i : E_1 \times E_2 \to E_i$ denote projection onto the $i$th component. We now check both implications:

$\Rightarrow$ Fix $\varepsilon$ and choose compact $K \subset E_1 \times E_2$ with $\mathbb{P}_n[(X^1_n, X^2_n) \in K] \ge 1 - \varepsilon$. Without loss of generality, we may assume that $K = K_1 \times K_2$ for compact sets $K_i \subset E_i$ (otherwise replace $K$ with $\pi_1(K) \times \pi_2(K)$ and note that the continuous forward image of a compact set is compact). Then we have
\[
\mathbb{P}_n[X^i_n \in K_i] \ge \mathbb{P}_n[(X^1_n, X^2_n) \in K] \ge 1 - \varepsilon.
\]

$\Leftarrow$ Fix $\varepsilon$ and choose $K_i$ with $\mathbb{P}_n[X^i_n \notin K_i] \le \varepsilon/2$. Then $K_1 \times K_2$ is compact and
\[
\mathbb{P}_n[(X^1_n, X^2_n) \notin K_1 \times K_2] \le \mathbb{P}_n[X^1_n \notin K_1] + \mathbb{P}_n[X^2_n \notin K_2] \le \varepsilon.
\]

B.6 Lemma. Let $E$ be a Polish space. If $f \in C(\mathbb{R}^d \times \mathbb{R}_+; E)$, and $F : C(\mathbb{R}_+; \mathbb{R}^d) \to C(\mathbb{R}_+; E)$ denotes the map such that $F_t(x) = f\bigl(x(t), t\bigr)$ for $x \in C(\mathbb{R}_+; \mathbb{R}^d)$ and $t \in \mathbb{R}_+$, then $F$ is a continuous map.

Proof. Fix a path $x \in C(\mathbb{R}_+; \mathbb{R}^d)$. If $t_n \to t_\infty$, then
\[
F_{t_n}(x) = f\bigl(x(t_n), t_n\bigr) \to f\bigl(x(t_\infty), t_\infty\bigr) = F_{t_\infty}(x),
\]

so $F(x)$ is a continuous process. Set $x^*(t) \triangleq \sup_{s \le t} \|x(s)\| < \infty$ for all $t$, so $f$ is uniformly continuous when restricted to the compact set $B(x^*(t) + 1) \times [0, t]$, where $B(r) \triangleq \{a \in \mathbb{R}^d : \|a\| \le r\}$. Now let $x_n \to x$ and fix $t$ and $\varepsilon > 0$. Choose $\delta > 0$ so small that $\|f(b, s) - f(a, s)\| \le \varepsilon$ if $a, b \in B(x^*(t) + 1)$, $s \le t$, and $\|b - a\| \le \delta$. Then choose $N$ so large that $\sup_{s \le t} \|x(s) - x_n(s)\| \le \delta \wedge 1$ for all $n \ge N$. For such $n$, we have
\[
\sup_{s \le t} \|F_s(x) - F_s(x_n)\| = \sup_{s \le t} \bigl\| f\bigl(x(s), s\bigr) - f\bigl(x_n(s), s\bigr) \bigr\| \le \varepsilon,
\]
as $x(s)$ and $x_n(s)$ are both in $B(x^*(t) + 1)$. In particular, $F(x_n) \to F(x)$.

We used the fact that closed bounded subsets of $\mathbb{R}^d$ are compact in the previous proof.

B.7 Theorem (Lusin's Theorem). Let $E$ be a metric space, $\mu$ be a finite measure on $E$, and $f$ be a real-valued measurable function on $E$. Given any $\varepsilon > 0$, there exists a continuous function $g$ such that $\mu(\{x : f(x) \ne g(x)\}) < \varepsilon$.

Proof. See [Kec95] Thm. 17.12.

B.8 Lemma. Let $(E, d)$ be a metric space, and let $\mu$ be a finite measure on that space. Then the collection of bounded Lipschitz continuous functions on $E$ is dense in $L^p(E, \mu)$ for any $p \ge 1$.

Proof. Let $f : E \to \mathbb{R}$ with $0 \le f \le M$ for some finite constant $M$. Fix any $\varepsilon > 0$ and choose continuous $g$ with $\mu(\{x : f(x) \ne g(x)\}) < \varepsilon\, 2^{-p-1} M^{-p}$ using the last theorem. Without loss of generality, we may assume that $0 \le g \le M$; otherwise, replace $g$ with $(0 \vee g) \wedge M$. Let $g_n(x) = \inf_{y \in E} \bigl\{ g(y) + n\, d(y, x) \bigr\}$, so we have $0 \le g_n \le g$, $g_n(x) \to g(x)$ as $n \to \infty$, and each $g_n$ is Lipschitz continuous with constant $n$. Using bounded convergence, we may choose $N$ so large that $\int_E |g - g_N|^p \, d\mu < \varepsilon/2$, and then
\[
\int_E |f - g_N|^p \, d\mu \le \int_E |f - g|^p \, d\mu + \int_E |g_N - g|^p \, d\mu \le \varepsilon.
\]
The result follows for arbitrary $f \in L^p(E, \mu)$ by first truncating, and then approximating the positive and negative parts.

B.9 Corollary. Let $E$ be a metric space, let $\mu$ be a finite measure on $E$, and let $f : E \to \mathbb{R}^d$ be a measurable function. Then there exists a sequence of bounded, Lipschitz continuous functions $\{f_n\}$ such that $\int \|f - f_n\| \, d\mu \to 0$.

Proof. Letting $f^i$ denote the $i$th component of $f$, we choose $d$ sequences of $\mathbb{R}$-valued functions, $\{f^i_n\}_n$, with $f^i_n \to f^i$ in $L^1(E, \mu)$, and we let $f_n$ be the $\mathbb{R}^d$-valued function with these components. Then
\[
\int \|f - f_n\| \, d\mu \le \sum_{i=1}^d \int |f^i - f^i_n| \, d\mu \to 0.
\]
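The Lipschitz approximation $g_n(x) = \inf_y \{g(y) + n\, d(y, x)\}$ used in the proof of Lem. B.8 can be checked numerically. The sketch below is only an illustration under our own assumptions: $E$ is taken to be a finite grid in $[0, 1]$ with $d(x, y) = |x - y|$, $g(x) = \sqrt{x}$, and the function names are hypothetical.

```python
import math

# Finite metric space E: a grid in [0, 1] with d(x, y) = |x - y|.
E = [i / 100 for i in range(101)]
g = {x: math.sqrt(x) for x in E}   # continuous with 0 <= g <= 1

def inf_convolution(g, n):
    """Return g_n(x) = min over y of g(y) + n * d(y, x), evaluated on the grid."""
    return {x: min(g[y] + n * abs(y - x) for y in g) for x in g}

def lipschitz_constant(h):
    """Smallest L with |h(x) - h(y)| <= L * d(x, y) for all grid points x != y."""
    pts = sorted(h)
    return max(abs(h[x] - h[y]) / abs(x - y) for x in pts for y in pts if x != y)

for n in (1, 2, 5, 10):
    gn = inf_convolution(g, n)
    gap = max(g[x] - gn[x] for x in E)   # sup-distance between g and g_n
    print(n, round(lipschitz_constant(gn), 6), round(gap, 6))
```

On this grid the printed Lipschitz constants stay below $n$ and the gap between $g$ and $g_n$ shrinks as $n$ grows, in line with the three properties of $g_n$ quoted in the proof.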

Appendix C FV and AC Processes

C.1 Definition. If $f : D \to \mathbb{R}^d$ and $[a, b] \subset D$, then we define
\[
\mathrm{Var}_{[a,b]}(f) \triangleq \sup_\pi \sum_{i=1}^n \bigl\| f(t_i) - f(t_{i-1}) \bigr\|,
\]
where the supremum is taken over all partitions of the form $\pi = \{a = t_0 < t_1 < \dots < t_n = b\}$. We say that $f$ is of bounded variation on the interval $[a, b]$ if $\mathrm{Var}_{[a,b]}(f) < \infty$, and we let $BV\bigl([a, b]; \mathbb{R}^d\bigr)$ denote the collection of all such functions. We abbreviate $\mathrm{Var}_{[0,t]}(f)$ to $\mathrm{Var}_t(f)$.

Definition C.1 extends Def. 1.13.

C.2 Definition. Let $f : D \to \mathbb{R}^d$. We say that $f$ is absolutely continuous on the interval $[a, b]$ if $[a, b] \subset D$, and there exists a function $\delta : (0, \infty) \to (0, \infty)$ such that $\sum_{i=1}^n \|f(t_i) - f(s_i)\| < \varepsilon$ whenever $s_i, t_i \in [a, b]$ with $s_i < t_i$, $\sum_{i=1}^n |t_i - s_i| < \delta(\varepsilon)$, and the intervals $\{(s_i, t_i)\}_i$ are disjoint. We let $AC\bigl([a, b]; \mathbb{R}^d\bigr)$ denote the collection of all such functions.

It is clear that $AC\bigl([a, b]; \mathbb{R}^d\bigr) \subset BV\bigl([a, b]; \mathbb{R}^d\bigr)$.

C.3 Theorem. If $f \in BV\bigl([a, b]; \mathbb{R}^d\bigr)$, then $f'$ exists for Lebesgue-a.e. $t \in [a, b]$, and we have $\int_a^b \|f'(u)\| \, du < \infty$. If $f \in AC\bigl([a, b]; \mathbb{R}^d\bigr)$, then
\[
f(t) = f(a) + \int_a^t f'(u) \, du \qquad \forall t \in [a, b]. \tag{C.4}
\]

Proof. Write $f = (f_i)_{1 \le i \le d}$. Recall that $f'$ exists at $t$ if and only if $f_i'$ exists at $t$ for all $i \in \{1, \dots, d\}$. It is clear that $f_i \in BV\bigl([a, b]; \mathbb{R}\bigr)$ for each $i \in \{1, \dots, d\}$, so we may apply the scalar result at each component to conclude that there exist Lebesgue-null sets $\{N_i\}_{1 \le i \le d}$ such that $f_i'$ exists when $t \notin N_i$. Setting $N = \cup_{1 \le i \le d} N_i$, we see that $N$ is a Lebesgue-null set and $f'$ exists when $t \notin N$. If $f \in AC\bigl([a, b]; \mathbb{R}^d\bigr)$, then $f_i \in AC\bigl([a, b]; \mathbb{R}\bigr)$ for each $i \in \{1, \dots, d\}$. Equation (C.4) then follows by applying the scalar result componentwise.

C.5 Theorem. If $g : [a, b] \to \mathbb{R}^d$ is integrable on $[a, b]$, and
\[
f(t) = f(a) + \int_a^t g(u) \, du \qquad \forall t \in [a, b],
\]
then $f \in AC\bigl([a, b]; \mathbb{R}^d\bigr)$, $f'$ exists for Lebesgue-a.e. $t \in [a, b]$, and $f' = g$ for Lebesgue-a.e. $t \in [a, b]$.

Proof. Given $\varepsilon$, we can choose $\delta$ so small that $\int_A \|g(u)\| \, du < \varepsilon$ when $A \subset [a, b]$ and $\lambda(A) < \delta$. As $\|f(t) - f(s)\| \le \int_s^t \|g(u)\| \, du$, the absolute continuity of $f$ follows just as in the scalar-valued case. The previous theorem asserts that $f'$ exists for Lebesgue-a.e. $t \in [a, b]$, and $\int_a^t \bigl( g(u) - f'(u) \bigr) \, du = 0$ for all $t \in [a, b]$. Fixing any $x \in \mathbb{R}^d$, we have
\[
\int_a^t \bigl( g(u) - f'(u), x \bigr) \, du = \Bigl( \int_a^t g(u) - f'(u) \, du, \; x \Bigr) = 0 \qquad \forall t \in [a, b],
\]
so we conclude that $(g(t) - f'(t), x) = 0$ for Lebesgue-a.e. $t \in [a, b]$. Letting $\{x_n\}$ denote a countable dense subset of $\mathbb{R}^d$, we can choose a single Lebesgue-null set $N \subset [a, b]$ such that $(g(t) - f'(t), x_n) = 0$ for all $n \in \mathbb{N}$ when $t \notin N$. This implies that $g(t) - f'(t) = 0$ when $t \notin N$, so $g = f'$ Lebesgue-a.e.

If the function $f : [a, b] \to \mathbb{R}$ is nondecreasing, then $f \in BV\bigl([a, b]; \mathbb{R}\bigr)$. The next result generalizes this observation.

C.6 Lemma. Let $f : [a, b] \to \mathbb{R}^d \otimes \mathbb{R}^d$. If $f(t) - f(s) \in S^d_+$ for all $s, t \in [a, b]$ with $s \le t$, then $f \in BV\bigl([a, b]; \mathbb{R}^d \otimes \mathbb{R}^d\bigr)$.

Proof. It is clear that $f^{ii}$ is nondecreasing and, therefore, of finite variation for each $i \in \{1, \dots, d\}$. Letting $\{e_i\}$ denote the canonical basis on $\mathbb{R}^d$, and

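fixing any $s, t \in [a, b]$ with $s < t$, we see that
\[
\bigl( f(s)(e_i \pm e_j), e_i \pm e_j \bigr) \le \bigl( f(t)(e_i \pm e_j), e_i \pm e_j \bigr).
\]
This implies that
\[
2\, \bigl| f^{ij}(t) - f^{ij}(s) \bigr| \le f^{ii}(t) - f^{ii}(s) + f^{jj}(t) - f^{jj}(s).
\]
In particular, if we fix a partition $\pi = \{a = t_0 < t_1 < \dots < t_n = b\}$, then
\[
\begin{aligned}
\sum_{k=1}^n \bigl| f^{ij}(t_k) - f^{ij}(t_{k-1}) \bigr|
&\le \sum_{k=1}^n \bigl( f^{ii}(t_k) - f^{ii}(t_{k-1}) + f^{jj}(t_k) - f^{jj}(t_{k-1}) \bigr) / 2 \\
&= \bigl( f^{ii}(b) - f^{ii}(a) + f^{jj}(b) - f^{jj}(a) \bigr) / 2.
\end{aligned}
\]
Taking the supremum over all such partitions, we see that $f^{ij} \in BV\bigl([a, b]; \mathbb{R}\bigr)$. We have now shown that each component of $f$ is of bounded variation on the interval $[a, b]$, so $f$ must be of bounded variation on the interval $[a, b]$.

We recall the following definition from Section 1.2.

1.12 Definition. If $X$ is an $\mathbb{R}^d$-valued process, then we say that $X$ is a finite variation process if $X \in BV\bigl([0, t]; \mathbb{R}^d\bigr)$ for all $t \in \mathbb{R}_+$, and we say that $X$ is an absolutely continuous process if $X \in AC\bigl([0, t]; \mathbb{R}^d\bigr)$ for all $t \in \mathbb{R}_+$.

C.7 Lemma. The map $\mathrm{Var}_t : C(\mathbb{R}_+; \mathbb{R}^d) \to \overline{\mathbb{R}}_+$ is lower semicontinuous for each fixed $t$.

Proof. Take $x_n \to x_\infty$, fix $\varepsilon > 0$, and choose a partition $\{0 = s_0 < s_1 < \dots < s_m = t\}$ such that
\[
\mathrm{Var}_t(x_\infty) \le \sum_{i=1}^m \bigl\| x_\infty(s_i) - x_\infty(s_{i-1}) \bigr\| + \varepsilon/2.
\]
Then choose $N = N(\varepsilon, m)$ so large that
\[
\sup_{u \in [0, t]} \bigl\| x_\infty(u) - x_n(u) \bigr\| \le \varepsilon/4m
\]
for all $n \ge N$. So when $n \ge N$, we have
\[
\begin{aligned}
\mathrm{Var}_t(x_n) &\ge \sum_{i=1}^m \bigl\| x_n(s_i) - x_n(s_{i-1}) \bigr\| \\
&\ge \sum_{i=1}^m \Bigl( \bigl\| x_\infty(s_i) - x_\infty(s_{i-1}) \bigr\| - \varepsilon/2m \Bigr) \\
&\ge \mathrm{Var}_t(x_\infty) - \varepsilon.
\end{aligned}
\]
Letting $\varepsilon \to 0$, we are done.

To see that we cannot hope for more than lower semicontinuity, write $u = t - \lfloor t \rfloor$ for the fractional part of $t$, consider the function
\[
f(t) \triangleq
\begin{cases}
u & \text{for } 0 \le u \le 1/4, \\
1/2 - u & \text{for } 1/4 \le u \le 3/4, \text{ and} \\
u - 1 & \text{for } 3/4 \le u \le 1,
\end{cases}
\]
and set $f_n(t) \triangleq f(nt)/n$, so $f_n \to 0$ uniformly, but $\mathrm{Var}_t(f_n) = t$ for all $n$.

C.8 Corollary. If $X$ is a continuous process, then $\mathrm{Var}_t(X)$ is a (measurable) random variable.

Proof. The composition of a measurable map and a lower semicontinuous map is measurable.

C.9 Corollary. The set
\[
FV^d \triangleq \bigl\{ x \in C(\mathbb{R}_+; \mathbb{R}^d) : x \in BV\bigl([0, t]; \mathbb{R}^d\bigr) \;\; \forall t \in \mathbb{R}_+ \bigr\}
\]
is a Borel measurable subset of $C(\mathbb{R}_+; \mathbb{R}^d)$.

Proof. Treating $\mathrm{Var}_t$ as a map from $C(\mathbb{R}_+; \mathbb{R}^d)$ to $\overline{\mathbb{R}}_+$, we write
\[
FV^d = \bigcap_{n \in \mathbb{N}} \mathrm{Var}_n^{-1}(\mathbb{R}_+).
\]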
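The sawtooth example given after Lem. C.7 can be verified exactly with rational arithmetic. The following sketch is only an illustration: it reads the example as a period-one sawtooth with slopes $\pm 1$ evaluated on the fractional part of its argument, and the function names and the choice of partition are our own.

```python
from fractions import Fraction
from math import floor

def f(t):
    """Sawtooth with period 1 and slopes +-1, evaluated on the fractional part of t."""
    u = t - floor(t)
    if u <= Fraction(1, 4):
        return u
    if u <= Fraction(3, 4):
        return Fraction(1, 2) - u
    return u - 1

def f_n(n, t):
    """Rescaled sawtooth f_n(t) = f(n t) / n."""
    return f(n * t) / n

def variation_and_sup(n, t, cells_per_kink=4):
    """Sum |f_n(s_i) - f_n(s_{i-1})| over a uniform partition of [0, t] whose mesh
    1 / (4 n cells_per_kink) places every kink of f_n on a grid point (t must be
    a multiple of the mesh), and return it with the grid maximum of |f_n|."""
    steps = int(t * 4 * n * cells_per_kink)
    pts = [Fraction(i, 4 * n * cells_per_kink) for i in range(steps + 1)]
    vals = [f_n(n, s) for s in pts]
    total = sum(abs(b - a) for a, b in zip(vals, vals[1:]))
    return total, max(abs(v) for v in vals)

for n in (1, 5, 50):
    print(n, *variation_and_sup(n, Fraction(1)))
```

Because $f_n$ is piecewise linear with all kinks on the grid, the partition sum equals the total variation, which stays at $t = 1$ while the uniform norm $1/(4n)$ tends to zero.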

C.10 Lemma. If $X$ is an $\mathbb{R}^d$-valued, continuous process which is adapted to some filtration $\mathbb{F}^0 = \{\mathcal{F}^0_t\}_{t \in \mathbb{R}_+}$, then there exists an $\mathbb{F}^0$-predictable process $x$ such that, for each $\omega$, we have $x_t(\omega) = \frac{\partial}{\partial t} X_t(\omega)$ whenever this derivative exists.

Proof. Define $x^n_t \triangleq n (X_t - X_{t - 1/n}) 1_{\{t > 1/n\}}$. Each $x^n$ is left-continuous and $\mathbb{F}^0$-adapted, so each $x^n$ is $\mathbb{F}^0$-predictable. By taking the lim sup or lim inf at each coordinate, we get an $\mathbb{F}^0$-predictable process $x$ such that $x_t(\omega) = \lim_n x^n_t(\omega)$ whenever the limit exists. In particular, if $\frac{\partial}{\partial t} X_t(\omega)$ exists, then $\lim_n x^n_t(\omega)$ exists, and $x_t(\omega) = \lim_n x^n_t(\omega) = \frac{\partial}{\partial t} X_t(\omega)$.

C.11 Corollary. The set
\[
AC^d \triangleq \bigl\{ y \in C(\mathbb{R}_+; \mathbb{R}^d) : y \in AC\bigl([0, t]; \mathbb{R}^d\bigr) \;\; \forall t \in \mathbb{R}_+ \bigr\}
\]
is a Borel measurable subset of $C(\mathbb{R}_+; \mathbb{R}^d)$.

Proof. Let $X$ denote the canonical process on $C(\mathbb{R}_+; \mathbb{R}^d)$, let $\mathbb{C}^0 = \{\mathcal{C}^0_t\}_{t \in \mathbb{R}_+}$, where $\mathcal{C}^0_t = \sigma(X^t)$, denote the filtration generated by $X$, and let $\mathcal{C}$ denote the Borel $\sigma$-field on $C(\mathbb{R}_+; \mathbb{R}^d)$. Lem. C.10 asserts the existence of a $\mathbb{C}^0$-predictable process $x$ such that $x_t(y) = \frac{\partial}{\partial t} X_t(y) = \frac{\partial}{\partial t} y(t)$ whenever the derivative exists. Set
\[
A(t) \triangleq \Bigl\{ y \in C(\mathbb{R}_+; \mathbb{R}^d) : y(t) = \int_0^t x_u(y) \, du \Bigr\}.
\]
As $x$ is $\mathbb{C}^0$-predictable, it is certainly $\mathcal{C} \otimes \mathcal{R}_+$-measurable, so $\int_0^t x_u \, du$ is $\mathcal{C}$-measurable by Fubini's theorem (recall the convention of Rem. 1.7), and $A(t)$ is $\mathcal{C}$-measurable as well. Setting $B = \cap_{q \in \mathbb{Q}_+} A(q)$, we will show that $B = AC^d$.

First assume that $y \in AC^d$. Thm. C.3 asserts that $y'(t)$ exists for Lebesgue-a.e. $t$, $y'$ is integrable on each interval $[0, t]$, and $y(t) = \int_0^t y'(u) \, du$ for all $t$. As $x_t(y)$ agrees with $y'(t)$ whenever it exists, $x_t(y) = y'(t)$ for Lebesgue-a.e. $t$, and $y(t) = \int_0^t x_u(y) \, du$ for all $t$. In particular, $y \in B$.

Now assume that $y \in B$. As $y$ is continuous and the map $t \mapsto \int_0^t x_u(y) \, du$ is left-continuous (e.g., Rem. 1.7), we have $y(t) = \int_0^t x_u(y) \, du$ for all $t$, and we may apply Thm. C.5 to conclude that $y$ is absolutely continuous on each compact interval.

If $X$ and $Y$ are two absolutely continuous processes which share the same law, then the derivatives of $X$ and $Y$ should share the same law in some sense. To make this precise, one must address the fact that the derivatives are only specified up to equivalence with respect to Lebesgue's measure, and the following lemma gives one possible approach.
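The left difference quotients $n\bigl(y(t) - y(t - 1/n)\bigr) 1_{\{t > 1/n\}}$ from the proof of Lem. C.10 are used again in the next lemma, so a quick numerical look at them may be helpful. The sketch below is only an illustration; the two test paths and the name `left_quotient` are our own choices.

```python
def left_quotient(path, t, n):
    """Left difference quotient n * (path(t) - path(t - 1/n)) * 1{t > 1/n}."""
    if t <= 1.0 / n:
        return 0.0
    return n * (path(t) - path(t - 1.0 / n))

smooth = lambda t: t * t          # differentiable everywhere, derivative 2t
kinked = lambda t: abs(t - 1.0)   # not differentiable at t = 1

for n in (10, 100, 1000):
    print(n, left_quotient(smooth, 2.0, n), left_quotient(kinked, 1.0, n))
```

For the smooth path the quotients approach the derivative $4$ at $t = 2$. For the kinked path they converge to the left derivative $-1$ at $t = 1$ even though the derivative does not exist there, which is consistent with the lemma only claiming agreement where the derivative exists.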


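C.12 Lemma. Let $(E, \mathcal{E})$ be a metric space with its Borel $\sigma$-field, and let $S^1$ and $S^2$ be probability spaces with $S^i = (\Omega^i, \mathcal{F}^i, \mathbb{P}^i)$. Let $S^i$ support a continuous, $\mathbb{R}^d$-valued process $X^i$, a measurable, $\mathbb{R}^d$-valued process $x^i$, and a continuous, $E$-valued process $Y^i$. Let $f : \mathbb{R}^d \times E \to \overline{\mathbb{R}}_+$ be an $\mathcal{R}^d \otimes \mathcal{E}$-measurable function, and define the $\overline{\mathbb{R}}_+$-valued random variables
\[
Z^{f,i} \triangleq \int_0^\infty f(x^i_s, Y^i_s) \, ds \qquad \text{for } i \in \{1, 2\}. \tag{C.13}
\]
If
\[
\mathbb{P}^i \Bigl[ X^i_t = \int_0^t x^i_s \, ds \;\; \forall t \in \mathbb{R}_+ \Bigr] = 1 \qquad \text{for } i \in \{1, 2\},
\]
and $\mathcal{L}(X^1, Y^1) = \mathcal{L}(X^2, Y^2)$, then
\[
\mathcal{L}(X^1, Y^1, Z^{f,1}) = \mathcal{L}(X^2, Y^2, Z^{f,2}). \tag{C.14}
\]

Proof. Set
\[
A^i \triangleq \Bigl\{ X^i_t = \int_0^t x^i_s \, ds \;\; \forall t \in \mathbb{R}_+ \Bigr\}. \tag{C.15}
\]
$X^i$ is continuous and $\int_0^{\cdot} x^i_u \, du$ is left-continuous (e.g., Rem. 1.7), so we may replace $\mathbb{R}_+$ with $\mathbb{Q}_+$ in (C.15) to see that $A^i$ is measurable. We first show that the lemma holds when $f$ is of the form $f(a, b) = e^{-t} g(a, b)$ for some bounded, $\mathcal{R}^d \otimes \mathcal{E}$-measurable $g$, and we then show that the lemma holds as stated using monotone convergence.

Assume that $f(a, b) = e^{-t} g(a, b)$ for some bounded, continuous $g$. Define $\varphi^n : C(\mathbb{R}_+; \mathbb{R}^d) \times \mathbb{R}_+ \to \mathbb{R}^d$ by $\varphi^n_t(y) \triangleq n \bigl( y(t) - y(t - 1/n) \bigr) 1_{\{t > 1/n\}}$, and set $Z^{f,i}_n \triangleq \int_0^\infty f(\varphi^n_s \circ X^i, Y^i_s) \, ds$. As $\mathcal{L}(X^1, Y^1) = \mathcal{L}(X^2, Y^2)$, we have
\[
\mathcal{L}(X^1, Y^1, Z^{f,1}_n) = \mathcal{L}(X^2, Y^2, Z^{f,2}_n) \qquad \forall n \in \mathbb{N}. \tag{C.16}
\]
Set
\[
B^i(\omega^i) \triangleq \bigl\{ t \in \mathbb{R}_+ : \lim_n \varphi^n_t \circ X^i(\omega^i) \ne x^i_t(\omega^i) \bigr\} \qquad \text{for } \omega^i \in \Omega^i,
\]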


where $\lim_n z_n \ne z_\infty$ means that either the limit doesn't exist, or that the limit exists and differs from $z_\infty$. If $\frac{\partial}{\partial t} X^i_t(\omega^i)$ exists and agrees with $x^i_t(\omega^i)$, then the difference quotients used to define $\varphi^n_t \circ X^i(\omega^i)$ must converge to this value. In particular,
\[
B^i(\omega^i) \subset \Bigl\{ t \in \mathbb{R}_+ : \frac{\partial}{\partial t} X^i_t(\omega^i) \ne x^i_t(\omega^i) \Bigr\}.
\]
If $\omega^i \in A^i$, then Thm. C.5 asserts that $\frac{\partial}{\partial t} X^i_t(\omega^i)$ exists and agrees with $x^i_t(\omega^i)$ for Lebesgue-a.e. $t$. In particular, $\lambda(B^i(\omega^i)) = 0$ and
\[
\lim_n \varphi^n_t \circ X^i(\omega^i) = x^i_t(\omega^i) \qquad \text{for Lebesgue-a.e. } t.
\]
Using the continuity of $f$ and dominated convergence, we conclude that $\lim_n Z^{f,i}_n(\omega^i) = Z^{f,i}(\omega^i)$ when $\omega^i \in A^i$. As $\mathbb{P}^i[A^i] = 1$, $\lim_n Z^{f,i}_n = Z^{f,i}$, $\mathbb{P}^i$-a.s., and this implies that $(X^i, Y^i, Z^{f,i}_n) \Rightarrow (X^i, Y^i, Z^{f,i})$. Combining this with (C.16), we conclude that (C.14) holds for this case.

We will now extend the result to functions $f$ of the form $f(a, b) = e^{-t} g(a, b)$ for some bounded, measurable $g$ using a monotone class argument. Here $g \in b\mathcal{R}^d \otimes \mathcal{E}$ means that $g$ is a bounded $\mathcal{R}^d \otimes \mathcal{E}$-measurable function. Let
\[
\mathcal{C} \triangleq \bigl\{ g \in b\mathcal{R}^d \otimes \mathcal{E} : \text{(C.14) holds with } f(a, b) = e^{-t} g(a, b) \bigr\}.
\]
We now show that $\mathcal{C}$ is a monotone class. Assume that $\{g_n\}_{n \in \mathbb{N}}$ is a uniformly bounded sequence of functions in $\mathcal{C}$ that converge to some limiting function $g$ pointwise on $\mathbb{R}^d \times E$. Setting $f_n(a, b) \triangleq e^{-t} g_n(a, b)$ for $n \in \mathbb{N}$ and $f(a, b) \triangleq e^{-t} g(a, b)$, we have $\lim_n f_n(x^i_t, Y^i_t) = f(x^i_t, Y^i_t)$ for each $t \in \mathbb{R}_+$, so we may apply dominated convergence to conclude that
\[
\lim_{n \to \infty} Z^{f_n, i} = \lim_{n \to \infty} \int_0^\infty f_n(x^i_t, Y^i_t) \, dt = \int_0^\infty f(x^i_t, Y^i_t) \, dt = Z^{f,i}
\]
pointwise on $\Omega^i$. This implies that $(X^i, Y^i, Z^{f_n, i}) \Rightarrow (X^i, Y^i, Z^{f,i})$, and we have
\[
\mathcal{L}(X^1, Y^1, Z^{f_n, 1}) = \mathcal{L}(X^2, Y^2, Z^{f_n, 2}) \qquad \forall n \in \mathbb{N} \tag{C.17}
\]
from the definition of $\mathcal{C}$, so we may conclude that (C.14) holds for this case.

Finally, we show that the result holds for nonnegative $f$. Setting $f_n =$

$f \wedge (n e^{-t})$ and applying the monotone convergence theorem, we see that $\lim_n Z^{f_n, i} = Z^{f,i}$ pointwise on $\Omega^i$ as $\overline{\mathbb{R}}_+$-valued random variables. Applying the previous case to each $f_n$, we see that (C.17) holds, so (C.14) holds as well.

The following corollary is often more convenient for applications than Lem. C.12.

C.18 Corollary. Let $(E, \mathcal{E})$ be a metric space with its Borel $\sigma$-field, and let $S^1$ and $S^2$ be probability spaces with $S^i = (\Omega^i, \mathcal{F}^i, \mathbb{P}^i)$. Let $S^i$ support a continuous, $\mathbb{R}^d$-valued process $X^i$, a measurable, $\mathbb{R}^d$-valued process $x^i$, and a continuous, $E$-valued process $Y^i$. Let $f : \mathbb{R}^d \times E \to \mathbb{R}^r$ be an $\mathcal{R}^d \otimes \mathcal{E} / \mathcal{R}^r$-measurable function, and define the $\overline{\mathbb{R}}^r$-valued random variables
\[
Z^{f,i} \triangleq \int_0^\infty f(x^i_s, Y^i_s) \, ds \qquad \text{for } i \in \{1, 2\}. \tag{C.19}
\]
If
\[
\mathbb{P}^i \Bigl[ X^i_t = \int_0^t x^i_s \, ds \;\; \forall t \in \mathbb{R}_+ \Bigr] = 1 \qquad \text{for } i \in \{1, 2\},
\]
and $\mathcal{L}(X^1, Y^1) = \mathcal{L}(X^2, Y^2)$, then
\[
\mathcal{L}(X^1, Y^1, Z^{f,1}) = \mathcal{L}(X^2, Y^2, Z^{f,2}). \tag{C.20}
\]

C.21 Remark. According to the conventions of Rem. 1.7, the integral in (C.19) is always defined and takes the value $\infty \in \overline{\mathbb{R}}^d$ when any component is infinite or undefined.

Proof. Write $f = (f_i)_{1 \le i \le r}$, and let $f_i^+$ and $f_i^-$ denote the positive and negative parts of $f_i$. We may apply Lem. C.12 to conclude that
\[
\mathcal{L}(X^1, Y^1, Z^{f_1^+, 1}) = \mathcal{L}(X^2, Y^2, Z^{f_1^+, 2}),
\]
and then that
\[
\mathcal{L}(X^1, Y^1, Z^{f_1^+, 1}, Z^{f_1^-, 1}) = \mathcal{L}(X^2, Y^2, Z^{f_1^+, 2}, Z^{f_1^-, 2}).
\]


Repeating this argument a finite number of times, we see that
\[
\mathcal{L}(X^1, Y^1, Z^{f_1^+, 1}, Z^{f_1^-, 1}, \dots, Z^{f_r^+, 1}, Z^{f_r^-, 1}) = \mathcal{L}(X^2, Y^2, Z^{f_1^+, 2}, Z^{f_1^-, 2}, \dots, Z^{f_r^+, 2}, Z^{f_r^-, 2}). \tag{C.22}
\]
Define the $\overline{\mathbb{R}}_+$-valued random variables
\[
\varphi^i \triangleq Z^{f_1^+, i} + Z^{f_1^-, i} + \dots + Z^{f_r^+, i} + Z^{f_r^-, i} \qquad \text{for } i \in \{1, 2\}.
\]
According to the conventions of Rem. 1.7, we have
\[
Z^{f,i} =
\begin{cases}
\bigl( Z^{f_1^+, i} - Z^{f_1^-, i}, \dots, Z^{f_r^+, i} - Z^{f_r^-, i} \bigr) & \text{if } \varphi^i < \infty, \text{ and} \\
\infty & \text{otherwise,}
\end{cases}
\]
so (C.22) implies (C.20).


Appendix D Semimartingale Characteristics

In Section 1.2, we presented the following definitions.

1.14 Definition. Let $\mathcal{B} = (\Omega, \mathcal{F}, \mathbb{F}^0, \mathbb{P})$ be a stochastic basis supporting a continuous, $\mathbb{R}^d$-valued process $X$. We say that $X$ is a continuous semimartingale if we can decompose $X$ as
\[
X_t = X_0 + M_t + B_t, \tag{1.15}
\]
where $M$ is a continuous local martingale with $M_0 = 0$, and $B$ is a continuous process with $B_0 = 0$ that is $\mathbb{P}$-a.s. of finite variation. In this case, we say that $X$ has the characteristics $(B, \langle M \rangle)$. If $B$ and $\langle M \rangle$ are both absolutely continuous, $\mathbb{P}$-a.s., then we say that $X$ is an Itô process.

Our definition of Itô process is technically convenient; however, it differs from the standard definition where an Itô process is defined as a process of the form
\[
X_t = \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dW_s. \tag{D.1}
\]
In this section, we show that our definition is essentially equivalent to the standard definition. One direction is trivial.

D.2 Lemma. Suppose that $W$ is an $\mathbb{R}^r$-valued Wiener process and that $X$ is a continuous, $\mathbb{R}^d$-valued process which satisfies (D.1), where $\mu$ is an adapted, $\mathbb{R}^d$-valued process, and $\sigma$ is an adapted, $\mathbb{R}^d \otimes \mathbb{R}^r$-valued process. Then $X$ is an Itô process.

Proof. Set $B_t \triangleq \int_0^t \mu_s \, ds$ and $M_t \triangleq \int_0^t \sigma_s \, dW_s$. It is then clear that $X$ has the canonical decomposition $X = X_0 + M + B$. As $\langle M \rangle_t = \int_0^t \sigma_s \sigma_s^T \, ds$, it is clear that $B$ and $\langle M \rangle$ are both a.s. absolutely continuous, so $X$ is an Itô process.

Going in the other direction, we will show that we can construct a Wiener process $W$ such that (D.1) holds. The first step is to find good versions of the characteristics.

D.3 Lemma. Let $\mathcal{B} = (\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a stochastic basis which satisfies the usual conditions and supports an $\mathbb{R}^d$-valued Itô process $X$. Then there exist an $\mathbb{F}$-predictable, $\mathbb{R}^d$-valued process $b$ and an $\mathbb{F}$-predictable, $S^d_+$-valued process $c$ such that $X$ has the characteristics $(B, C)$, where $B_t \triangleq \int_0^t b_s \, ds$ and $C_t \triangleq \int_0^t c_s \, ds$.

Proof. As $X$ is an Itô process, we may write $X = X_0 + M + B$ for some continuous local martingale $M$ with $M_0 = 0$ and some continuous process $B$ with $B_0 = 0$ which is $\mathbb{P}$-a.s. absolutely continuous. Moreover, there also exists a process $C$ which is a version of $\langle M \rangle$ and is $\mathbb{P}$-a.s. absolutely continuous. As $\mathcal{B}$ satisfies the usual conditions, we may assume that $C$ is continuous (otherwise we redefine $C$ on a null set). Lem. C.10 asserts that by taking divided differences from the left, we may construct $\mathbb{F}$-predictable processes $b$ and $c$ such that, for each $\omega \in \Omega$, $b_t(\omega) = \frac{\partial}{\partial t} B_t(\omega)$ and $c_t(\omega) = \frac{\partial}{\partial t} C_t(\omega)$ whenever either derivative exists. Using the a.s. absolute continuity of $B$ and $C$, we conclude that
\[
\mathbb{P} \Bigl[ B_t = \int_0^t b_s \, ds \;\; \forall t \Bigr] = 1, \quad \text{and} \quad \mathbb{P} \Bigl[ C_t = \int_0^t c_s \, ds \;\; \forall t \Bigr] = 1.
\]
We now need to modify $c$ so that it only takes values in $S^d_+$, and we follow


[JS87] II.2.9. For $q \in \mathbb{Q}^d$, we define
\[
a^q_t \triangleq 1_{\{(c_t q, q) < 0\}}, \qquad M^q_t \triangleq (M_t, q),
\]
\[
Y^q_t \triangleq \int_0^t a^q_s \, d\langle M^q \rangle_s, \qquad Z^q_t \triangleq \int_0^t a^q_s (c_s q, q) \, ds, \qquad \text{and} \qquad \widehat{Z}^q_t \triangleq \int_0^t a^q_s \, ds,
\]
where $(x, y)$ denotes the inner product on $\mathbb{R}^d$. $Y^q$ and $Z^q$ are $\mathbb{P}$-indistinguishable, but $Y^q$ is $\mathbb{P}$-a.s. nonnegative for all $t$ and $Z^q$ is $\mathbb{P}$-a.s. nonpositive for all $t$, so we conclude that $Y^q$ and $Z^q$ are both $\mathbb{P}$-indistinguishable from the zero process. This implies that $\widehat{Z}^q$ is also $\mathbb{P}$-indistinguishable from the zero process.

Letting $\{q_n\}_n$ be an enumeration of $\mathbb{Q}^d$, we define $b^n_t \triangleq \max_{i \le n} a^{q_i}_t$, $b_t \triangleq \max_{n \in \mathbb{N}} b^n_t$, and $\widehat{Z}_t \triangleq \int_0^t b_s \, ds$. We may find a single $\mathbb{P}$-null set $N$ such that $\widehat{Z}^q_t(\omega) = 0$ for all $q \in \mathbb{Q}^d$ and all $t \in \mathbb{R}_+$ when $\omega \notin N$. As $b(\omega) = \lim_n b^n(\omega)$ pointwise on $\mathbb{R}_+$, and the sequence $b^n(\omega)$ is nondecreasing, we may apply the monotone convergence theorem to conclude that $\widehat{Z}_t(\omega) = 0$ for all $t \in \mathbb{R}_+$ when $\omega \notin N$. Writing $a_t \triangleq 1 - b_t$, this implies that
\[
\mathbb{P} \Bigl[ \int_0^t c_s \, ds = \int_0^t a_s c_s \, ds \;\; \forall t \Bigr] = 1.
\]
As $a_t(\omega) c_t(\omega) \in S^d_+$ for all $t$ and $\omega$, we are done.

D.4 Remark. We only use the usual conditions to ensure that we may choose a version of $\langle M \rangle$ which is continuous, rather than a.s. continuous. If we know, a priori, that such a version of $\langle M \rangle$ exists, then the usual conditions are not necessary in this lemma.

Once we have good versions of the characteristics, the rest of the work is essentially linear algebra. We will need the following definitions and results.

D.5 Definition. Given a matrix $A \in S^d_+$, we say that the matrix $A^{1/2} \in S^d_+$ is the positive square root of $A$ if $A^{1/2} A^{1/2} = A$.

D.6 Lemma. Given any $A \in S^d_+$, the positive square root of $A$ exists and is unique, and the map $A \mapsto A^{1/2}$ is a measurable map from $S^d_+$ to $S^d_+$.

Proof. It is a classical result that a bounded self-adjoint linear operator on a Hilbert space has a unique positive, self-adjoint square root. Moreover, if

we define $B_1 = (I - A)/2$ and $B_{n+1} = (I - A + B_n^2)/2$ for $n \ge 1$, where $I$ denotes the identity matrix in $S^d_+$, then, when $A \le I$ (the general case follows by rescaling $A$), the sequence $\{I - B_n\}$ converges in operator norm to $A^{1/2}$; indeed, any limit $B$ of the recursion satisfies $(I - B)^2 = A$. This implies that the map $A \mapsto A^{1/2}$ is measurable. One may consult [RSN90] VII.104 for the details of this argument.

D.7 Definition. Given a matrix $A \in \mathbb{R}^n \otimes \mathbb{R}^m$, we say that a matrix $A^+ \in \mathbb{R}^m \otimes \mathbb{R}^n$ is the Moore-Penrose generalized inverse of $A$ if $A A^+ A = A$, $A^+ A A^+ = A^+$, $(A A^+)^T = A A^+$, and $(A^+ A)^T = A^+ A$.

D.8 Lemma. The Moore-Penrose generalized inverse exists and is unique, and the map $A \mapsto A^+$ is measurable.

Proof. The existence and uniqueness of the Moore-Penrose generalized inverse is shown in [Pen55]. [BIG03] provides a textbook treatment. Moreover, if we define $B_1 \triangleq A^T / \|A A^T\|$ and $B_{n+1} \triangleq B_n (2I - A B_n)$, then it is shown in [BI66] that the sequence $\{B_n\}$ converges to $A^+$ at each coordinate. This implies that the map $A \mapsto A^+$ is measurable.

While the map $A \mapsto A^+$ is measurable, it is not continuous. To see this, consider
\[
A_n = \begin{pmatrix} 1 & 0 \\ 0 & 1/n \end{pmatrix} \to A_\infty = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}; \qquad \text{then} \quad A_n^+ = \begin{pmatrix} 1 & 0 \\ 0 & n \end{pmatrix} \quad \text{but} \quad A_\infty^+ = A_\infty.
\]
[Con98] shows that the map $A \mapsto A^+$ is in fact analytic when restricted to matrices of a common rank. Notice that $A A^+$ and $A^+ A$ are idempotent and self-adjoint, so they are orthogonal projections.

We recall the following definition from Section 4.3.
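Before turning to that definition, the iteration quoted in the proof of Lem. D.8 and the discontinuity example that follows it can be tried out on small matrices. The sketch below is only an illustration in plain Python: the helper names, the stopping tolerance, and the iteration cap are our own assumptions and are not taken from [BI66].

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius(A):
    return sum(x * x for row in A for x in row) ** 0.5

def pinv_iteration(A, tol=1e-12, max_iter=100):
    """Iterate B_1 = A^T / ||A A^T||, B_{n+1} = B_n (2I - A B_n) for a nonzero A,
    stopping once successive iterates differ by less than tol (our own choice)."""
    At = transpose(A)
    scale = frobenius(matmul(A, At))
    B = [[x / scale for x in row] for row in At]
    m = len(A)
    for _ in range(max_iter):
        AB = matmul(A, B)
        R = [[(2.0 if i == j else 0.0) - AB[i][j] for j in range(m)] for i in range(m)]
        B_next = matmul(B, R)
        change = max(abs(x - y) for r1, r2 in zip(B_next, B) for x, y in zip(r1, r2))
        B = B_next
        if change < tol:
            break
    return B

for n in (1, 10, 100):
    print(n, pinv_iteration([[1.0, 0.0], [0.0, 1.0 / n]]))
print("limit", pinv_iteration([[1.0, 0.0], [0.0, 0.0]]))
```

The lower-right entry of the computed inverse grows like $n$ along the sequence but is zero at the limit matrix, which is the discontinuity described above. The early stopping rule is there because, in floating point, continuing this iteration long after convergence can amplify rounding errors for rank-deficient inputs.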

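4.60 Definition. Let $X$ denote the canonical process on the space $C(\mathbb{R}_+; \mathbb{R}^r)$, let $\mathcal{C}$ denote the Borel $\sigma$-field on $C(\mathbb{R}_+; \mathbb{R}^r)$, let $\mathbb{C}^0 = \{\sigma(X^t)\}_{t \in \mathbb{R}_+}$ denote the filtration generated by $X$, and let $\mathbb{W}$ denote Wiener's measure on $C(\mathbb{R}_+; \mathbb{R}^r)$. We refer to $\mathcal{W} \triangleq \bigl( C(\mathbb{R}_+; \mathbb{R}^r), \mathcal{C}, \mathbb{C}^0, \mathbb{W} \bigr)$ as Wiener's basis on $C(\mathbb{R}_+; \mathbb{R}^r)$.

D.9 Theorem. Let $\mathcal{B} = (\Omega, \mathcal{F}, \mathbb{F}^0, \mathbb{P})$ be a stochastic basis which supports an $\mathbb{R}^d$-valued, $\mathbb{P}$-a.s. continuous local martingale $M$ and an adapted, $\mathbb{R}^d \otimes \mathbb{R}^r$-valued process $\sigma$ with
\[
\langle M \rangle_t = \int_0^t \sigma_s \sigma_s^T \, ds.
\]
Let $\mathcal{W}$ denote Wiener's basis on $C(\mathbb{R}_+; \mathbb{R}^r)$, set $\widehat{\mathcal{B}} \triangleq \mathcal{B} \otimes \mathcal{W}$ (see Def. 1.11), and let $\widehat{M}$ and $\hat{\sigma}$ denote the extensions of $M$ and $\sigma$ to $\widehat{\mathcal{B}}$. Then $\widehat{\mathcal{B}}$ supports an $\mathbb{R}^r$-valued Wiener process $\widehat{W}$ such that
\[
\widehat{M}_t = \int_0^t \hat{\sigma}_s \, d\widehat{W}_s.
\]

Proof. We will let $(\widehat{\Omega}, \widehat{\mathcal{F}}, \widehat{\mathbb{F}}^0, \widehat{\mathbb{P}}) = \widehat{\mathcal{B}}$. Let $X$ denote the canonical process on $C(\mathbb{R}_+; \mathbb{R}^r)$, and let $\widehat{X}$ denote the extension of $X$ to $\widehat{\Omega}$. $\widehat{X}$ is a continuous martingale with $\langle \widehat{X} \rangle_t = t I$ under $\widehat{\mathbb{P}}$, where $I$ denotes the identity matrix in $\mathbb{R}^r \otimes \mathbb{R}^r$, so we may apply Lévy's characterization to conclude that $\widehat{X}$ is a Wiener process. Applying Lem. D.8, we see that $\hat{\sigma}^+$ is an adapted, $\mathbb{R}^r \otimes \mathbb{R}^d$-valued process. As $(\hat{\sigma}_s^+ \hat{\sigma}_s)(\hat{\sigma}_s^+ \hat{\sigma}_s)^T = \hat{\sigma}_s^+ \hat{\sigma}_s$ and $(I - \hat{\sigma}_s^+ \hat{\sigma}_s)(I - \hat{\sigma}_s^+ \hat{\sigma}_s)^T = I - \hat{\sigma}_s^+ \hat{\sigma}_s$, we have
\[
\int_0^t \hat{\sigma}_s^+ \otimes \hat{\sigma}_s^+ \, d\langle \widehat{M} \rangle_s = \int_0^t \hat{\sigma}_s^+ \hat{\sigma}_s (\hat{\sigma}_s^+ \hat{\sigma}_s)^T \, ds = \int_0^t \hat{\sigma}_s^+ \hat{\sigma}_s \, ds, \quad \text{and}
\]
\[
\int_0^t (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \otimes (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \, d\langle \widehat{X} \rangle_s = \int_0^t (I - \hat{\sigma}_s^+ \hat{\sigma}_s)(I - \hat{\sigma}_s^+ \hat{\sigma}_s)^T \, ds = \int_0^t (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \, ds.
\]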

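As $\hat{\sigma}_s^+ \hat{\sigma}_s$ is an orthogonal projection, we have $\|\hat{\sigma}_s^+ \hat{\sigma}_s\| \le r$ and $\|I - \hat{\sigma}_s^+ \hat{\sigma}_s\| \le r$. Recall that we use the Frobenius norm on $\mathbb{R}^r \otimes \mathbb{R}^r$ rather than the operator norm, so $\|I\| = r$. This means that
\[
\Bigl\| \int_0^t \hat{\sigma}_s^+ \otimes \hat{\sigma}_s^+ \, d\langle \widehat{M} \rangle_s \Bigr\| \le \int_0^t \|\hat{\sigma}_s^+ \hat{\sigma}_s\| \, ds \le t r, \quad \text{and}
\]
\[
\Bigl\| \int_0^t (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \otimes (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \, d\langle \widehat{X} \rangle_s \Bigr\| \le \int_0^t \|I - \hat{\sigma}_s^+ \hat{\sigma}_s\| \, ds \le t r,
\]
so
\[
\widehat{W}_t \triangleq \int_0^t \hat{\sigma}_s^+ \, d\widehat{M}_s + \int_0^t (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \, d\widehat{X}_s
\]
is well-defined. We have
\[
\begin{aligned}
\langle \widehat{W} \rangle_t &= \int_0^t \hat{\sigma}_s^+ \otimes \hat{\sigma}_s^+ \, d\langle \widehat{M} \rangle_s + \int_0^t (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \otimes (I - \hat{\sigma}_s^+ \hat{\sigma}_s) \, d\langle \widehat{X} \rangle_s \\
&= \int_0^t \hat{\sigma}_s^+ \hat{\sigma}_s + I - \hat{\sigma}_s^+ \hat{\sigma}_s \, ds \\
&= t I,
\end{aligned}
\]
where $\langle \widehat{M}, \widehat{X} \rangle = 0$ because $\widehat{M}$ and $\widehat{X}$ are orthogonal as a result of the product construction. As $\widehat{W}$ is a continuous martingale, we conclude that $\widehat{W}$ is an $\mathbb{R}^r$-valued Wiener process. Finally, let $I^d$ denote the identity matrix in $\mathbb{R}^d \otimes \mathbb{R}^d$, and set $\widehat{L} \triangleq \int_0^{\cdot} \hat{\sigma}_s \, d\widehat{W}_s$, so $\widehat{L} - \widehat{M}$ is a local martingale and
\[
\begin{aligned}
\langle \widehat{L} - \widehat{M} \rangle_t &= \langle \widehat{L} \rangle_t + \langle \widehat{M} \rangle_t - 2 \langle \widehat{L}, \widehat{M} \rangle_t \\
&= 2 \int_0^t \hat{\sigma}_s \hat{\sigma}_s^T \, ds - 2 \int_0^t \hat{\sigma}_s \otimes I^d \, d\langle \widehat{W}, \widehat{M} \rangle_s \\
&= 2 \int_0^t \hat{\sigma}_s \hat{\sigma}_s^T \, ds - 2 \int_0^t \hat{\sigma}_s \hat{\sigma}_s^+ \otimes I^d \, d\langle \widehat{M}, \widehat{M} \rangle_s \\
&= 2 \int_0^t \hat{\sigma}_s \hat{\sigma}_s^T \, ds - 2 \int_0^t \hat{\sigma}_s \hat{\sigma}_s^+ \hat{\sigma}_s \hat{\sigma}_s^T \, ds \\
&= 0.
\end{aligned}
\]
In particular, $\widehat{M} = \widehat{L}$, so we are done.

D.10 Corollary. Let $\mathcal{B} = (\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a stochastic basis which satisfies the usual conditions and supports an adapted, $\mathbb{R}^d$-valued Itô process $X$. Let $\mathcal{W}$ denote Wiener's basis on $C(\mathbb{R}_+; \mathbb{R}^d)$, set $\widehat{\mathcal{B}} \triangleq \mathcal{B} \otimes \mathcal{W}$, and let $\widehat{X}$ denote the extension of $X$ to $\widehat{\mathcal{B}}$. Then there exist adapted, $\mathbb{R}^d$-valued processes $\hat{\mu}$ and $\widehat{W}$ and an adapted, $S^d_+$-valued process $\hat{\sigma}$ which are defined on $\widehat{\mathcal{B}}$ such that $\widehat{W}$ is a Wiener process, and
\[
\widehat{X}_t = \int_0^t \hat{\mu}_s \, ds + \int_0^t \hat{\sigma}_s \, d\widehat{W}_s. \tag{D.11}
\]

Proof. Let $\widehat{\mathcal{B}} = (\widehat{\Omega}, \widehat{\mathcal{F}}, \widehat{\mathbb{F}}^0, \widehat{\mathbb{P}})$. If $a$ is a process defined on $\mathcal{B}$, then $\hat{a}$ will denote the extension of $a$ to $\widehat{\mathcal{B}}$. Lem. D.3 asserts the existence of an $\mathbb{F}$-predictable, $\mathbb{R}^d$-valued process $b$ and an $\mathbb{F}$-predictable, $S^d_+$-valued process $c$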

such that $X$ has the characteristics $(B, C)$, where $B_t \triangleq \int_0^t b_s \, ds$ and $C_t \triangleq \int_0^t c_s \, ds$. Define $\sigma_t \triangleq c_t^{1/2}$. Lem. D.6 asserts that the map $A \mapsto A^{1/2}$ is measurable, so $\sigma$ is also $\mathbb{F}$-predictable. Defining $M \triangleq X - X_0 - B$, we see that $\langle M \rangle_t = \int_0^t \sigma_s \sigma_s^T \, ds$. The previous theorem asserts the existence of an $\widehat{\mathbb{F}}$-adapted, $\mathbb{R}^d$-valued, continuous Wiener process $\widehat{W}$ such that $\widehat{M}_t = \int_0^t \hat{\sigma}_s \, d\widehat{W}_s$. Setting $\hat{\mu} \triangleq \hat{b}$, we see that $\widehat{X}$ solves (D.11).
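The linear algebra behind the proof of Thm. D.9 can be checked at a single fixed time on a concrete matrix. The sketch below is only an illustration under our own assumptions: it uses a rank-one $\sigma = u v^T$, for which the Moore-Penrose inverse has the closed form $\sigma^+ = v u^T / (\|u\|^2 \|v\|^2)$, and the vectors $u$, $v$ and all helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

def close(A, B, tol=1e-9):
    return all(abs(x - y) < tol for r1, r2 in zip(A, B) for x, y in zip(r1, r2))

u = [3.0, 4.0]          # hypothetical vector in R^d, d = 2
v = [1.0, 2.0, 2.0]     # hypothetical vector in R^r, r = 3
sigma = [[ui * vj for vj in v] for ui in u]                # d x r matrix of rank one
norm2 = sum(x * x for x in u) * sum(x * x for x in v)
sigma_plus = [[vj * ui / norm2 for ui in u] for vj in v]   # r x d closed-form inverse

C = matmul(sigma, transpose(sigma))                        # density of <M>: sigma sigma^T
P = matmul(sigma_plus, sigma)                              # projection sigma^+ sigma
Q = sub(identity(3), P)                                    # complementary projection

# Density of <W-hat>: sigma^+ C (sigma^+)^T + (I - P)(I - P)^T, expected to equal I.
cov_W = add(matmul(matmul(sigma_plus, C), transpose(sigma_plus)),
            matmul(Q, transpose(Q)))
# Cross term from the last display of the proof: sigma sigma^+ sigma sigma^T = sigma sigma^T.
cross = matmul(matmul(sigma, sigma_plus), C)

print(close(cov_W, identity(3)), close(cross, C))
```

The first check corresponds to $\langle \widehat{W} \rangle_t = tI$ and the second to the cancellation that gives $\langle \widehat{L} - \widehat{M} \rangle = 0$.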


Appendix E Rebolledo’s Criterion

If a collection of probability measures on a Polish space is tight, then Prokhorov’s theorem tells us that we may select a weakly convergent sequence from that collection. We will also say that a collection of processes is tight if the collection of laws induced by those processes is tight. Given a collection of $\mathbb{R}^d$-valued continuous processes, $\{X^\alpha\}$, each defined on a stochastic basis $(\Omega^\alpha, \mathbb{P}^\alpha, \mathbb{F}^\alpha, \mathcal{F}^\alpha)$, we list five potential conditions.

[C1] The collection of random variables $\{X^\alpha_0\}$ is tight.

[C2] For each $t$ and $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[
\mathbb{P}^\alpha \Bigl[ \sup_{(s_1, s_2) \in A_{t,\delta}} \|X^\alpha_{s_2} - X^\alpha_{s_1}\| \ge \varepsilon \Bigr] \le \varepsilon
\]
for any $\alpha$, where $A_{t,\delta} \triangleq \{(s_1, s_2) \in \mathbb{R}^2_+ : s_1 \le s_2 \le t \text{ and } s_2 - s_1 \le \delta\}$.

[C3] Each $X^\alpha$ is a continuous semimartingale with characteristics $A^\alpha \triangleq (B^\alpha, C^\alpha)$, and the collection of continuous $\mathbb{R}^d \times (\mathbb{R}^d \otimes \mathbb{R}^d)$-valued processes, $\{A^\alpha\}$, is tight.

[C4] For each $t$ and $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[
\mathbb{P}^\alpha \bigl[ \|X^\alpha_{T^\alpha + u} - X^\alpha_{T^\alpha}\| \ge \varepsilon \bigr] \le \varepsilon
\]
for each $\alpha$, $\mathbb{F}^\alpha$-stopping time $T^\alpha \le t$, and $u \in [0, \delta]$.


[C5] For each $t$ and $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[
\mathbb{P}^\alpha \bigl[ \|X^\alpha_{T^\alpha} - X^\alpha_{S^\alpha}\| \ge \varepsilon \bigr] \le \varepsilon
\]
for any $\alpha$, and any two $\mathbb{F}^\alpha$-stopping times $S^\alpha \le T^\alpha \le t$ with $T^\alpha - S^\alpha \le \delta$.

Arzelà and Ascoli's characterization of the compact subsets of spaces of continuous functions implies that a collection of processes is tight if and only if [C1] and [C2] hold (e.g., [Bil68] or [Par67]); however, it is often difficult to directly verify condition [C2] for a collection of Itô processes if the drift and diffusion processes are not uniformly bounded. Fortunately, Rebolledo [Reb79] has shown that [C3] is actually sufficient to ensure that [C2] holds. This result is quite useful because it is often easier to compute with the characteristics of a semimartingale than with the semimartingale itself. The goal of this section is to provide a relatively brief and self-contained derivation of this result.

To prove Rebolledo's result, we will first show that the conditions [C2], [C4], and [C5] are all equivalent for continuous processes. This result is essentially given in [Ald78], and we borrow heavily from the presentation given in [JM86]. To facilitate the proof, we give two lemmas. First we note that if a function does not oscillate too wildly within each interval of a partition, then the function also cannot oscillate wildly between points in adjacent intervals. This is the content of the following rather obvious lemma.

E.1 Lemma. Let $x \in C(\mathbb{R}_+; \mathbb{R}^d)$ and suppose that we have a (deterministic) partition $\{0 = t_0 < t_1 < t_2 < \dots < t_n\}$ such that $|t_i - t_{i-1}| \ge \delta$ for all $i \in \{2, \dots, n-1\}$ and $|x(v) - x(u)| < \varepsilon$ if $u, v \in [t_{i-1}, t_i]$ for some $i \in \{1, \dots, n\}$. Then $|v - u| \le \delta$ and $u, v \in [0, t_n]$ implies $|x(v) - x(u)| < 2\varepsilon$.

E.2 Remark. We do not need to control the size of the first or the last interval.

Proof. If $u, v \in [t_{i-1}, t_i]$ for some $i$, the result is immediate. The only other possibility is that $t_{i-1} \le u < t_i \le v < t_{i+1}$ for some $i$, but then
\[
|x(v) - x(u)| \le |x(v) - x(t_i)| + |x(t_i) - x(u)| < 2\varepsilon.
\]
We also observe that if [C5] holds, then we can bound the probability that the process makes a large number of large moves in a given time interval. This is the content of the following lemma.

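E.3 Lemma. Suppose that $P\bigl[|X_T - X_S| \ge \varepsilon\bigr] \le \varepsilon$ for all stopping times $S \le T \le t$ with $T - S \le \delta$. If we define the stopping times $T_0 \triangleq 0$ and $T_i \triangleq \inf\{t > T_{i-1} : |X_t - X_{T_{i-1}}| \ge \varepsilon\}$, then
\[
  \bigl(1 - t/(\delta n)\bigr)\, P[T_n \le t] \le \varepsilon. \tag{E.4}
\]

Proof. First we notice that
\[
  \sum_{i=1}^n P[T_i - T_{i-1} \le \delta]
  = \sum_{i=1}^n P\bigl[|X_{T_i \wedge (T_{i-1}+\delta)} - X_{T_{i-1}}| \ge \varepsilon\bigr]
  \le n\varepsilon,
\]
so we have
\[
\begin{aligned}
  n\, P[T_n \le t]
  &\le \sum_{i=1}^n P[T_n \le t \text{ and } T_i - T_{i-1} > \delta] + n\varepsilon \\
  &\le \sum_{i=1}^n E\bigl[1_{\{T_n \le t\}} (T_i - T_{i-1})/\delta\bigr] + n\varepsilon \\
  &= E\bigl[1_{\{T_n \le t\}}\, T_n/\delta\bigr] + n\varepsilon \\
  &\le t\, P[T_n \le t]/\delta + n\varepsilon.
\end{aligned}
\]
Dividing by $n$ and rearranging gives (E.4).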
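To see how (E.4) is used later, here is a worked instance with illustrative numbers. The values of $t$, $\delta$, $\varepsilon$, and $n$ below are assumptions chosen only to make the arithmetic concrete; they are not taken from the text.

```latex
% Illustrative values: t = 1, \delta = 0.1, \varepsilon = 0.01, n = 20.
\[
  1 - \frac{t}{\delta n} = 1 - \frac{1}{0.1 \cdot 20} = \frac{1}{2},
  \qquad\text{so}\qquad
  P[T_n \le t] \le \frac{\varepsilon}{1 - t/(\delta n)} = 2\varepsilon = 0.02 .
\]
```

In other words, once $n$ is large enough that $t/(\delta n) \le 1/2$, the probability of seeing $n$ moves of size $\varepsilon$ before time $t$ is at most $2\varepsilon$.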

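We now have everything that we need to show the equivalence of [C2], [C4], and [C5]. Notice that [C4] looks much weaker than [C2] or [C5]. In particular, one must choose a single deterministic offset $u$ in condition [C4] which cannot vary from path to path.

E.5 Theorem. If $\{X^\alpha\}$ is a collection of continuous processes, then [C2], [C4], and [C5] are all equivalent.

Proof. [C2] clearly implies [C4], so we now assume that [C4] holds and show that this implies [C5]. Fix $t$ and $\varepsilon > 0$ and then choose $\delta$ as in condition [C4], so that $P^\alpha\bigl[|X^\alpha_{T^\alpha+u} - X^\alpha_{T^\alpha}| \ge \varepsilon/2\bigr] \le \varepsilon/3$ for every $\alpha$, $\mathbb{F}^\alpha$-stopping time $T^\alpha$, and $u \in [0, 2\delta]$. Pick any $\alpha$ and any two $\mathbb{F}^\alpha$-stopping times $S^\alpha \le T^\alpha \le t$ with $T^\alpha - S^\alpha \le \delta$. In particular, we have $[T^\alpha, T^\alpha + \delta] \subset [S^\alpha, S^\alpha + 2\delta]$. Also notice that for any $s$, we have
\[
  \bigl\{|X^\alpha_{T^\alpha} - X^\alpha_{S^\alpha}| \ge \varepsilon\bigr\}
  \subseteq \bigl\{|X^\alpha_{T^\alpha} - X^\alpha_s| \ge \varepsilon/2\bigr\}
  \cup \bigl\{|X^\alpha_s - X^\alpha_{S^\alpha}| \ge \varepsilon/2\bigr\}.
\]
Combining these two observations with Fubini's Theorem, we write
\[
\begin{aligned}
  \delta\, P^\alpha\bigl[|X^\alpha_{T^\alpha} - X^\alpha_{S^\alpha}| \ge \varepsilon\bigr]
  &= E^\alpha\Bigl[1_{\{|X^\alpha_{T^\alpha} - X^\alpha_{S^\alpha}| \ge \varepsilon\}} \int_{T^\alpha}^{T^\alpha+\delta} ds\Bigr] \\
  &= \int_0^\infty P^\alpha\bigl[|X^\alpha_{T^\alpha} - X^\alpha_{S^\alpha}| \ge \varepsilon \text{ and } s \in [T^\alpha, T^\alpha+\delta]\bigr]\, ds \\
  &\le \int_0^\infty \Bigl( P^\alpha\bigl[|X^\alpha_{T^\alpha} - X^\alpha_s| \ge \varepsilon/2 \text{ and } s \in [T^\alpha, T^\alpha+\delta]\bigr] \\
  &\qquad\quad + P^\alpha\bigl[|X^\alpha_s - X^\alpha_{S^\alpha}| \ge \varepsilon/2 \text{ and } s \in [S^\alpha, S^\alpha+2\delta]\bigr] \Bigr)\, ds \\
  &= \int_0^{\delta} P^\alpha\bigl[|X^\alpha_{T^\alpha+u} - X^\alpha_{T^\alpha}| \ge \varepsilon/2\bigr]\, du
   + \int_0^{2\delta} P^\alpha\bigl[|X^\alpha_{S^\alpha+u} - X^\alpha_{S^\alpha}| \ge \varepsilon/2\bigr]\, du \\
  &\le \delta\varepsilon,
\end{aligned}
\]
so [C5] holds.

Finally, we will show that [C5] implies [C2], so assume [C5], fix some $t$ and $\varepsilon > 0$, and define the stopping times $T_0^\alpha \triangleq 0$ and $T_i^\alpha \triangleq \inf\{s > T_{i-1}^\alpha : |X^\alpha_s - X^\alpha_{T^\alpha_{i-1}}| \ge \varepsilon/2\}$. Choose $\delta_1$, as in [C5], such that $P^\alpha\bigl[|X^\alpha_{S^\alpha} - X^\alpha_{T^\alpha}| \ge \varepsilon/4\bigr] \le \varepsilon/4$ for each $\alpha$ and all $\mathbb{F}^\alpha$-stopping times $S^\alpha \le T^\alpha \le t$ with $T^\alpha - S^\alpha \le \delta_1$. Then choose $n$ so large that $1 - t/(n\delta_1) \ge 1/2$, which means that $P^\alpha[T_n^\alpha \le t] \le \varepsilon/2$ by (E.4). Finally, choose another $\delta_2$ such that $P^\alpha\bigl[|X^\alpha_{S^\alpha} - X^\alpha_{T^\alpha}| \ge \varepsilon/2\bigr] \le \varepsilon/(2n)$ for each $\alpha$ and all $\mathbb{F}^\alpha$-stopping times $S^\alpha \le T^\alpha \le t$ with $T^\alpha - S^\alpha \le \delta_2$. Now notice that if we fix a point $\omega^\alpha \in B^\alpha \subset \Omega^\alpha$ where
\[
  B^\alpha \triangleq \bigl\{T_i^\alpha - T_{i-1}^\alpha \ge \delta_2/2 \text{ for all } i \ge 1 \text{ with } T_i^\alpha \le t\bigr\},
\]
then we may apply Lemma E.1 to conclude that $|X^\alpha_{s_2}(\omega^\alpha) - X^\alpha_{s_1}(\omega^\alpha)| \le \varepsilon$ for all $s_1 \le s_2 \le t$ with $s_2 - s_1 \le \delta_2$. In particular, we are done if $P^\alpha[B^\alpha] \ge 1 - \varepsilon$. Define the sets
\[
  C_i^\alpha \triangleq \bigl\{T_i^\alpha - T_{i-1}^\alpha < \delta_2/2 \text{ and } T_i^\alpha \le t\bigr\}
  = \bigl\{|X^\alpha_{T_i^\alpha \wedge (T_{i-1}^\alpha + \delta_2/2) \wedge t} - X^\alpha_{T_{i-1}^\alpha \wedge t}| \ge \varepsilon/2\bigr\},
\]
so $P^\alpha[C_i^\alpha] \le \varepsilon/(2n)$ for all $i \ge 1$. As
\[
  (B^\alpha)^c \subset \bigcup_{i=1}^\infty C_i^\alpha \subset \bigcup_{i=1}^n C_i^\alpha \cup \{T_n^\alpha \le t\},
\]
we have
\[
  P^\alpha\bigl[(B^\alpha)^c\bigr] \le \sum_{i=1}^n P^\alpha[C_i^\alpha] + P^\alpha[T_n^\alpha \le t] \le \varepsilon,
\]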

and we are done.

We now recall the following

E.6 Definition. We say that the process $A$ dominates the process $X$ in the sense of Lenglart [Len77], if $E[X_T] \le E[A_T]$ for all bounded stopping times $T$.

The domination property is useful because it implies the following

E.7 Lemma. Let $X$ be a right-continuous process and let $A$ be a continuous increasing process which dominates $X$ in the sense of Lenglart. If $T$ is an $\mathbb{R}$-valued stopping time and $a$ and $x$ are strictly positive constants, then
\[
  P[X_T^* \ge x] \le \frac{a}{x} + P[A_T \ge a],
\]
where $X_t^* \triangleq \sup_{s \le t} |X_s|$ and $Y_\infty \triangleq \lim_{t\to\infty} Y_t$ for any increasing process $Y$.

Proof. If $A$ dominates $X$, then $A^T$ dominates $X^T$, so we may assume without loss of generality that $T = \infty$ by redefining $X$ to be $X^T$ and $A$ to be $A^T$. Fix $x$ and $a$ and let $S \triangleq \inf\{t : X_t \ge x\}$ and $U \triangleq \inf\{t : A_t \ge a\}$, so we have
\[
  \{X_t^* \ge x \text{ and } A_t < a\} = \{S \le t < U\} \subset \{S = S \wedge U \wedge t\} = \{X_{S \wedge U \wedge t} \ge x\}.
\]
Using Chebyshev's inequality, the domination property, and the fact that $A$ is increasing, we see that
\[
\begin{aligned}
  P[X_t^* \ge x] &\le P[X_{S \wedge U \wedge t} \ge x] + P[A_t \ge a] \\
  &\le \frac{1}{x}\, E[A_{S \wedge U \wedge t}] + P[A_\infty \ge a] \\
  &\le \frac{a}{x} + P[A_\infty \ge a]
\end{aligned}
\]
holds for all $t$. Letting $t \to \infty$ through some sequence and noting that $\{X_\infty^* > x\} \subset \{X_t^* \ge x \text{ for some } t\}$, we have
\[
  P[X_\infty^* > x] \le \frac{a}{x} + P[A_\infty \ge a].
\]
But the right hand side is continuous in $x$, so we really have
\[
  P[X_\infty^* \ge x] = \lim_n P[X_\infty^* > x - 1/n]
  \le \lim_n \Bigl( \frac{a}{x - 1/n} + P[A_\infty \ge a] \Bigr)
  = \frac{a}{x} + P[A_\infty \ge a].
\]
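The inequality of Lemma E.7 can be sanity-checked by simulation in the simplest case. The sketch below assumes a standard Brownian motion $M$ on a fixed horizon, takes $X = M^2$ and $A_t = t$ (the quadratic variation of Brownian motion, which dominates $M^2$), and compares a Monte Carlo estimate of the left-hand side with the right-hand side. The function name `lenglart_sides`, the parameter values, and the Euler discretization are illustrative assumptions, not part of the text.

```python
import numpy as np

def lenglart_sides(x=4.0, a=1.25, horizon=1.0, n_paths=5000, n_steps=400, seed=0):
    """Estimate P[sup_{s<=T} M_s^2 >= x] and evaluate a/x + P[A_T >= a].

    M is a discretized Brownian motion on [0, horizon] and A_t = t,
    so A_T = horizon is deterministic (illustrative setup only).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    brownian = np.cumsum(steps, axis=1)
    sup_sq = np.max(brownian ** 2, axis=1)       # running maximum of M^2 at time T
    lhs = float(np.mean(sup_sq >= x))            # Monte Carlo estimate
    rhs = a / x + (1.0 if horizon >= a else 0.0) # P[A_T >= a] is 0 or 1 here
    return lhs, rhs

lhs, rhs = lenglart_sides()
print(lhs, rhs)
```

With a deterministic $A$ the bound is only informative when $a$ exceeds the horizon; the interest of the lemma lies in random $A$, where the two terms can be traded off through the choice of $a$.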

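E.8 Lemma. If $M$ is a continuous local martingale with $M_0 = 0$, then $\langle M \rangle$ dominates $M^2$ in the sense of Lenglart.

Proof. Define the stopping times $T_n \triangleq \inf\{t : |M_t| \ge n \text{ or } \langle M \rangle_t \ge n\}$, fix some stopping time $T$, and set $N^n \triangleq (M^{T_n \wedge T})^2$. Applying Doob's maximal inequality to the positive submartingale $M^{T_n \wedge T}$ gives $E[\sup_{s \le t} N^n_s] \le 4\, E[N^n_t] = 4\, E[\langle M \rangle^{T_n \wedge T}_t]$. Letting $t \to \infty$ and then $n \to \infty$, we apply the monotone convergence theorem to conclude that $E[\sup_{s \le T} M_s^2] \le E[\langle M \rangle_T]$.

E.9 Lemma. If $M$ is an $\mathbb{R}^d$-valued continuous local martingale and $T$ is an extended real-valued stopping time, then $N \triangleq M - M^T$ is a continuous local martingale and $\langle N \rangle = \langle M \rangle - \langle M \rangle^T$.

Proof. By stopping, we may assume without loss of generality that $M$ is a martingale and that $M$ and $\langle M \rangle$ are bounded. Let $N^i$ denote the $i$th component of $N$ and let $M^i$ denote the $i$th component of $M$. If $s < t$, then
\[
  E[N^i_t \mid \mathcal{F}_s] = E[M^i_t - M^i_{t \wedge T} \mid \mathcal{F}_s] = M^i_s - M^i_{s \wedge T} = N^i_s
\]
and
\[
\begin{aligned}
  &E\bigl[N^i_t N^j_t - \bigl(\langle M^i, M^j \rangle_t - \langle M^i, M^j \rangle_{t \wedge T}\bigr) \mid \mathcal{F}_s\bigr] \\
  &\quad = E\bigl[M^i_t M^j_t - \langle M^i, M^j \rangle_t \mid \mathcal{F}_s\bigr]
     - E\bigl[M^i_{t \wedge T} M^j_{t \wedge T} - \langle M^i, M^j \rangle_{t \wedge T} \mid \mathcal{F}_s\bigr] \\
  &\qquad - E\bigl[(M^i_t - M^i_{t \wedge T})\, M^j_{t \wedge T} \mid \mathcal{F}_s\bigr]
     - E\bigl[(M^j_t - M^j_{t \wedge T})\, M^i_{t \wedge T} \mid \mathcal{F}_s\bigr] \\
  &\quad = M^i_s M^j_s - \langle M^i, M^j \rangle_s - M^i_{s \wedge T} M^j_{s \wedge T} + \langle M^i, M^j \rangle_{s \wedge T} \\
  &\qquad - (M^i_s - M^i_{s \wedge T})\, M^j_{s \wedge T} - (M^j_s - M^j_{s \wedge T})\, M^i_{s \wedge T} \\
  &\quad = N^i_s N^j_s - \bigl(\langle M^i, M^j \rangle_s - \langle M^i, M^j \rangle_{s \wedge T}\bigr),
\end{aligned}
\]
where we have applied Lem. 3.32.

E.10 Lemma. If $M \triangleq \{M^\alpha\}$ is a collection of $\mathbb{R}^d$-valued continuous local martingales and $\{\langle M^\alpha \rangle\}$ satisfies condition [C4], then $M$ also satisfies condition [C4].

Proof. Fix $t$ and $\varepsilon > 0$. Then choose $a \le \varepsilon^3/(2d^3)$ and use the fact that $\{\langle M^\alpha \rangle\}$ satisfies condition [C4] to choose $\delta$ so small that
\[
  P^\alpha\bigl[\langle M^\alpha \rangle_{T^\alpha+\delta} - \langle M^\alpha \rangle_{T^\alpha} \ge a\bigr] \le \varepsilon/(2d)
\]
for all $\alpha$ and $\mathbb{F}^\alpha$-stopping times $T^\alpha \le t$. We will now show that condition [C4] holds for $M$. Fix some $\alpha$ and $\mathbb{F}^\alpha$-stopping time $T^\alpha \le t$, and then set $M \triangleq M^\alpha - (M^\alpha)^{T^\alpha}$. Lem. E.9 asserts that $M$ is a local martingale with quadratic variation $C \triangleq \langle M^\alpha \rangle - \langle M^\alpha \rangle^{T^\alpha}$. As $M_0 = 0$, Lem. E.8 asserts that $C^{ii}$ dominates $(M^i)^2$ in the sense of Lenglart for each $i \in \{1, \dots, d\}$. Now fix any $u \in [0, \delta]$. As $T^\alpha + u$ is a stopping time, we may apply Lem. E.7 to conclude that
\[
\begin{aligned}
  P^\alpha\bigl[|M^\alpha_{T^\alpha+u} - M^\alpha_{T^\alpha}| \ge \varepsilon\bigr]
  &\le \sum_{i=1}^d P^\alpha\bigl[|M^{\alpha,i}_{T^\alpha+u} - M^{\alpha,i}_{T^\alpha}| \ge \varepsilon/d\bigr] \\
  &= \sum_{i=1}^d P^\alpha\bigl[(M^i_{T^\alpha+u})^2 \ge \varepsilon^2/d^2\bigr] \\
  &\le a d^3/\varepsilon^2 + \sum_{i=1}^d P^\alpha\bigl[C^{ii}_{T^\alpha+u} \ge a\bigr] \\
  &\le \varepsilon/2 + \sum_{i=1}^d P^\alpha\bigl[\langle M^{\alpha,i} \rangle_{T^\alpha+u} - \langle M^{\alpha,i} \rangle_{T^\alpha} \ge a\bigr] \\
  &\le \varepsilon.
\end{aligned}
\]
As this holds for all $u \in [0, \delta]$, we conclude that $M$ satisfies condition [C4].

E.11 Lemma. If $\{X^\alpha\}$ is a collection of $\mathbb{R}^d$-valued continuous semimartingales, then [C3] implies condition [C2].

Proof. $M^\alpha \triangleq X^\alpha - B^\alpha$ is a local martingale with $\langle M^\alpha \rangle = C^\alpha$. As the collection $\{A^\alpha\}$ is tight, it satisfies [C2], which implies that it also satisfies [C4]. The previous lemma then asserts that the collection $\{M^\alpha\}$ satisfies [C4], which implies that it also satisfies [C2] by Thm. E.5. As $\{B^\alpha\}$ satisfies [C2] by assumption and $X^\alpha = M^\alpha + B^\alpha$, it is clear that $\{X^\alpha\}$ satisfies [C2] as well.

E.12 Corollary. If $\{X^\alpha\}$ is a collection of $\mathbb{R}^d$-valued continuous semimartingales that satisfy conditions [C1] and [C3], then $\{X^\alpha\}$ is tight.

Proof. As $\{X^\alpha\}$ satisfies [C3], the previous lemma asserts that $\{X^\alpha\}$ also satisfies [C2]. But conditions [C1] and [C2] are sufficient to ensure that the collection $\{X^\alpha\}$ is tight.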


Appendix F Convergence of Characteristics

The goal of this subsection is to provide a self-contained development of the following theorem.

F.1 Theorem. Let $X$, $B$, and $C$ be continuous processes where $X$ and $B$ take values in $\mathbb{R}^d$, $C$ takes values in $\mathbb{R}^d \otimes \mathbb{R}^d$, and $B$ is a.s. of finite variation. Let $\{X^n\}$ be a sequence of continuous, $\mathbb{R}^d$-valued processes, and suppose that $X^n$ is a semimartingale with the characteristics $(B^n, C^n)$. If $(X^n, B^n, C^n) \Rightarrow (X, B, C)$, then $X$ is a semimartingale which has the characteristics $(B, C)$ with respect to the filtration generated by $X$, $B$, and $C$ (i.e., $\mathbb{F}^0 \triangleq \{\sigma(X^t, B^t, C^t)\}_{t \in \mathbb{R}_+}$).

This theorem is a continuous version of [JS87] Thm. IX.2.4, and we take advantage of the assumption of continuity to streamline the presentation. To prove this theorem, we will need to show that the weak limit of a local martingale is still a local martingale; however, this is a little delicate as the map which stops a path when it reaches a given level is not continuous. Consider the following example.

F.2 Example. Let $x_n(t) \triangleq t\, 1_{[0,1)}(t) + (2 - t)\, 1_{[1,\infty)}(t) - 1/n$, $y(t) \triangleq t\, 1_{[0,1)}(t) + (2 - t)\, 1_{[1,\infty)}(t)$, and $z(t) \triangleq t\, 1_{[0,1)}(t) + 1_{[1,\infty)}(t)$. Define the stopping time $T : C(\mathbb{R}_+; \mathbb{R}) \to \mathbb{R}_+$ by $T(x) = \inf\{t : x(t) \ge 1\}$. Then $x_n \to y$ uniformly, but
\[
  \lim_{n\to\infty} \nabla\bigl(x_n, T(x_n)\bigr) = \lim_{n\to\infty} x_n = y \ne \nabla\bigl(y, T(y)\bigr) = z.
\]
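The discontinuity in Example F.2 can be reproduced on a grid. The sketch below takes $y$ to be the tent function equal to $t$ on $[0,1)$ and $2 - t$ on $[1,\infty)$, sets $x_n = y - 1/n$, and stops each path at the first grid time where it reaches level 1. The grid, the finite horizon of 4, and the helper names `stop_at_level` and `gaps` are illustrative assumptions.

```python
import numpy as np

def stop_at_level(path, level):
    """Freeze a sampled path at the first grid time where it reaches `level`."""
    hits = np.nonzero(path >= level)[0]
    if hits.size == 0:          # level never reached: the stopped path is the path
        return path.copy()
    stopped = path.copy()
    stopped[hits[0]:] = path[hits[0]]
    return stopped

grid = np.linspace(0.0, 4.0, 513)           # step 1/128, so t = 1 is a grid point
y = np.where(grid < 1.0, grid, 2.0 - grid)  # tent function peaking at level 1
z = stop_at_level(y, 1.0)                   # y stopped when it first hits 1

def gaps(n):
    """Sup-distance between x_n and y, and between their stopped versions."""
    x_n = y - 1.0 / n
    gap_paths = float(np.max(np.abs(x_n - y)))
    gap_stopped = float(np.max(np.abs(stop_at_level(x_n, 1.0) - z)))
    return gap_paths, gap_stopped

for n in (10, 100, 1000):
    print(n, gaps(n))
```

The first distance shrinks like $1/n$ while the second stays above 3 on this horizon, because $x_n$ never reaches level 1 and is therefore never stopped.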

Fortunately, we can avoid the situation just described a.s. by choosing the levels at which we stop a process in a clever way that depends upon the law of that process. This is the content of Lem. F.5. First we will need to give a lemma about counting the jumps of a nondecreasing function.

F.3 Notation. If $f$ is a function that admits left and right limits, then we set $f(x+) \triangleq \lim_{y \downarrow x} f(y)$ and $f(x-) \triangleq \lim_{y \uparrow x} f(y)$.

F.4 Lemma. Let $f : \mathbb{R} \to \mathbb{R}_+$ be a nondecreasing function, fix some $\varepsilon > 0$, and define
\[
  A \triangleq \{a \in \mathbb{R} : f(a+) - f(a-) \ge \varepsilon\}, \qquad
  B^m \triangleq \{q \in \mathbb{Q} : f(q + 1/m) - f(q) \ge \varepsilon\},
\]
\[
  x_0 \triangleq -\infty, \qquad\text{and}\qquad
  x_i \triangleq \lim_{m\to\infty} \inf\bigl(B^m \cap (x_{i-1}, \infty)\bigr) \quad\text{for } i \ge 1.
\]
Then $\inf\{x_i\}_{i \in \mathbb{N}} > -\infty$, $x_i > x_{i-1}$ when $x_{i-1} < \infty$, and $A = \{x_i : x_i < \infty\}$.

Proof. We first show that if $y \in \mathbb{R}$ with $f(y+) - f(y-) < \varepsilon$, then we may choose $\delta = \delta(y) > 0$ and $M = M(y) \in \mathbb{N}$ such that $B^m \cap (y - \delta, y + \delta) = \emptyset$ for all $m \ge M$. Choose $\eta$ so small that $f(y+) - f(y-) < \varepsilon - \eta$, and then choose $\delta$ so small that $f(y-) - f(y - \delta) < \eta/2$ and $f(y + 2\delta) - f(y+) < \eta/2$. Finally, choose $M$ so large that $1/M < \delta$. If $m \ge M$ and $q \in (y - \delta, y + \delta)$, then $\{q, q + 1/m\} \subset (y - \delta, y + 2\delta)$, so $f(q + 1/m) - f(q) \le f(y + 2\delta) - f(y - \delta) < \varepsilon$. In particular, $B^m \cap (y - \delta, y + \delta) = \emptyset$.

As $f$ is bounded from below, the set $A \cap (-\infty, n]$ contains a finite number of points for each $n$. In particular, $A$ contains a least element, and we may linearly order the jumps of size at least $\varepsilon$ as $\{y_n\}_n$

choose $\delta > x_{i-1}$ so close to $x_{i-1}$ that we have $f(z) - f(x_{i-1}+) < \varepsilon/2$ when $z \in (x_{i-1}, \delta)$. For sufficiently large $m$, we have $\{q_m, q_m + 1/m\} \subset (x_{i-1}, \delta)$, but this contradicts the fact that $f(q_m + 1/m) - f(q_m) \ge \varepsilon$. Recall that $f$ is bounded below, so this argument is valid when $x_{i-1} = -\infty$. On the other hand, if $x_i \in (x_{i-1}, y_i)$, then we have $f(x_i+) - f(x_i-) < \varepsilon$, so we may choose $\delta > 0$ and $M$ with $B^m \cap (x_i - \delta, x_i + \delta) = \emptyset$ for all $m \ge M$. This is again a contradiction. We have now shown that $x_i \ge y_i$. Choosing $q_m \in \mathbb{Q} \cap (x_{i-1}, \infty)$ with $y_i \in (q_m, q_m + 1/m)$, we see that $f(q_m) + \varepsilon \le f(y_i-) + \varepsilon \le f(y_i+) \le f(q_m + 1/m)$, so $x_i \le y_i$, and we conclude that $x_i = y_i$. Using induction, we conclude that $x_i = y_i$ for all $i < N$. As the $y_i$ are strictly increasing and finite, so are the $x_i$. If $N = \infty$, we are done, so assume that $N < \infty$, and then further assume that $x_N < \infty$ for the sake of generating a contradiction. We have $f(x_N+) - f(x_N-) < \varepsilon$, so we may choose $\delta > 0$ and $M$ with $B^m \cap (x_N - \delta, x_N + \delta) = \emptyset$ for all $m \ge M$. This contradicts the definition of $x_N$, so we conclude that $x_i = \infty$ for all $i \ge N$.

F.5 Lemma. Let $Z$ be a continuous, real-valued process, and define the stopping times $T^a : C(\mathbb{R}_+; \mathbb{R}) \to \mathbb{R}_+$ by $T^a(z) \triangleq \inf\{t : z(t) \ge a\}$. Then there exists a countable set $A \subset \mathbb{R}$ such that $T^a$ is $\mathcal{L}(Z)$-a.s. continuous when $a \notin A$.

Proof. We first show that the map $a \mapsto T^a(z_0)$ is left-continuous as a function of $a$ for fixed $z_0 \in C(\mathbb{R}_+; \mathbb{R})$. Fix some nondecreasing sequence $\{a_n\}$ with $a_n \to a_\infty$ and $a_n < a_\infty$ for all $n$. Set $t_n = T^{a_n}(z_0)$ and $t_\infty \triangleq \sup_n t_n$. If $t_\infty = \infty$, then we must have $T^{a_\infty}(z_0) = \infty = \lim_n T^{a_n}(z_0)$, as $T^{a_\infty}(z_0) \ge T^{a_n}(z_0)$ for each $n$. Now assume that $t_\infty < \infty$. This implies that $t_n \to t_\infty$, as every bounded nondecreasing sequence converges. As $z_0$ is continuous, we have
\[
  z_0(t_\infty) = \lim_{n\to\infty} z_0(t_n) = \lim_{n\to\infty} a_n = a_\infty,
\]
so $T^{a_\infty}(z_0) \le t_\infty$. On the other hand, if $s < t_\infty$, then there exists some $n$ such that $t_n \in (s, t_\infty)$, and this implies that $z_0(s) \le a_n < a_\infty$ and $T^{a_\infty}(z_0) > s$. In particular, $T^{a_\infty}(z_0) = t_\infty$. We have now shown that the map $a \mapsto T^a(z_0)$ is left-continuous.

We now assume that the map $a \mapsto T^a(z_0)$ is continuous at the point $a = c$, and we show that this implies that the map $z \mapsto T^c(z)$ is continuous at the point $z = z_0$. Notice that this assumption implies that $z_0$ does not have a local max at $t = T^c(z_0)$ and prevents the situation in Example F.2. Let $z_n \to z_0$, fix $\varepsilon > 0$, and choose $b < c < d$ with $T^c(z_0) - T^b(z_0) < \varepsilon$ and $T^d(z_0) - T^c(z_0) < \varepsilon$ using the continuity of the map $a \mapsto T^a(z_0)$. Set $\delta \triangleq \min\{c - b, d - c\}/2$, set $t = T^d(z_0)$, and choose $N$ so large that $\sup_{s \le t} |z_0(s) - z_n(s)| \le \delta$ for all $n \ge N$. Notice that this implies that $z_n(s) \le z_0(s) + \delta \le b + \delta < c$ for $s \in [0, T^b(z_0)]$ and $z_n(t) \ge z_0(t) - \delta = d - \delta > c$. In particular, $T^c(z_n) \in [T^b(z_0), T^d(z_0)]$, so $|T^c(z_0) - T^c(z_n)| \le \varepsilon$.

Recursively define a sequence of functions $\xi_i^n : C(\mathbb{R}_+; \mathbb{R}) \to \mathbb{R}$ by setting $\xi_0^n(x) \triangleq -\infty$, and then defining
\[
  \xi_i^n(x) \triangleq \lim_{m\to\infty} \inf\bigl\{a \in \mathbb{Q} : \xi_{i-1}^n(x) < a \text{ and } T^{a+1/m}(x) - T^a(x) \ge 1/n\bigr\}
\]
for each $i > 0$. For fixed $x$, the map $a \mapsto T^a(x)$ is left-continuous, nondecreasing, and nonnegative, so we may apply Lem. F.4 to conclude that
\[
  \{a \in \mathbb{R} : T^{a+}(x) - T^a(x) \ge 1/n\} = \{\xi_i^n(x) : \xi_i^n(x) < \infty\}.
\]
The map $a \mapsto P[\xi_i^n(Z) \le a]$ is right-continuous and nondecreasing, so it has at most countably many jumps. This implies that the set $\{a : P[\xi_i^n(Z) = a] > 0\}$ is countable for each $n$ and $i$. Putting everything together, we have
\[
\begin{aligned}
  \{\text{the map } z \mapsto T^a(z) \text{ is not continuous at } z = Z\}
  &\subset \{\text{the map } b \mapsto T^b(Z) \text{ is not continuous at } b = a\} \\
  &= \{T^{a+}(Z) \ge T^a(Z) + 1/n \text{ for some } n\} \\
  &= \{a = \xi_i^n(Z) \text{ for some } n \text{ and } i\}.
\end{aligned}
\]
Defining $A \triangleq \bigcup_{i,n} \{a : P[\xi_i^n(Z) = a] > 0\}$, we see that $A$ is countable, and $T^a$ is $\mathcal{L}(Z)$-a.s. continuous when $a \notin A$. In particular, for each $a \notin A$ there exists a set $\Omega^a \subset C(\mathbb{R}_+; \mathbb{R})$ such that $P[Z \in \Omega^a] = 1$ and the map $z \mapsto T^a(z)$ is continuous at each $z \in \Omega^a$.

F.6 Corollary. Let $E$ be a Polish space, and let $\{X^n\}_{n \in \mathbb{N}}$ be a collection of continuous, $E$-valued processes. Suppose that $X^n \Rightarrow X^\infty$, fix some point $e \in E$, define the stopping times $S^a : C(\mathbb{R}_+; E) \to \mathbb{R}_+$ by
\[
  S^a(y) \triangleq \inf\{t : d(y(t), e) \ge a\},
\]
and set $X^{n,a} \triangleq (X^n)^{S^a(X^n)}$. Then there exists an increasing sequence $\{a_m\}$ with $\lim_m a_m = \infty$ such that $X^{n,a_m} \Rightarrow X^{\infty,a_m}$ for each $m$.

Proof. Let $\mathcal{B}^\infty = (\Omega^\infty, P^\infty, \mathbb{F}^\infty, \mathcal{F}^\infty)$ denote the stochastic basis on which $X^\infty$ is defined. Let $\varphi : C(\mathbb{R}_+; E) \to C(\mathbb{R}_+; \mathbb{R}_+)$ denote the map such that $\varphi_t(y) = d(y(t), e)$, and set $Z^n = \varphi \circ X^n$. The map $f \mapsto d(f, e)$ is uniformly continuous, so $\varphi$ is continuous, and $(X^n, Z^n) \Rightarrow (X^\infty, Z^\infty)$. Define the stopping times $T^a$ as in the previous lemma, so $T^a(Z^n) = S^a(X^n)$, and then choose $A$ such that $P^\infty[T^a \text{ is discontinuous at } Z^\infty] = 0$ when $a \notin A$. Let $\psi^a : C(\mathbb{R}_+; E \times \mathbb{R}) \to C(\mathbb{R}_+; E)$ denote the map $(x, z) \mapsto \nabla(x, T^a(z))$, and notice that $\psi^a(X^n, Z^n) = X^{n,a}$. The map $(x, t) \mapsto \nabla(x, t)$ is continuous, so $\psi^a$ is continuous at the point $(x, z)$ when $T^a$ is continuous at the point $z$. In particular, $\psi^a$ is $\mathcal{L}(X^\infty, Z^\infty)$-a.s. continuous when $T^a$ is $\mathcal{L}(Z^\infty)$-a.s. continuous. To conclude, choose an increasing sequence $\{a_m\}$ with $\lim_m a_m = \infty$ and $a_m \notin A$ for all $m$. Then $\psi^{a_m}$ is $\mathcal{L}(X^\infty, Z^\infty)$-a.s. continuous for each $m$, so we have
\[
  X^{n,a_m} = \psi^{a_m}(X^n, Z^n) \Rightarrow \psi^{a_m}(X^\infty, Z^\infty) = X^{\infty,a_m}.
\]

F.7 Remark. Let $E_2$ be a Polish space, and let $\{Y^n\}_{n \in \mathbb{N}}$ be a collection of $E_2$-valued random variables. If we make the stronger assumption that $(X^n, Y^n) \Rightarrow (X^\infty, Y^\infty)$ in the previous corollary, then we may conclude that $(X^{n,a_m}, Y^n) \Rightarrow (X^{\infty,a_m}, Y^\infty)$ for each $m$.

F.8 Lemma. Suppose that $M$ is a $P$-a.s. right-continuous process which is adapted to some filtration $\mathbb{F}^0 = \{\mathcal{F}^0_t\}_{t \in \mathbb{R}_+}$. If $D$ is a dense subset of $\mathbb{R}_+$ and $\bigl(\{M_t\}_{t \in D}, \{\mathcal{F}^0_t\}_{t \in D}, P\bigr)$ is a martingale, then $\bigl(\{M_t\}_{t \in \mathbb{R}_+}, \mathbb{F}^P, P\bigr)$ is a martingale, where $\mathbb{F}^P = \{\mathcal{F}^P_t\}_{t \in \mathbb{R}_+}$ is the smallest filtration which contains $\mathbb{F}^0$ and satisfies the usual conditions with respect to $P$.

Proof. Let $\mathbb{F} = \{\mathcal{F}_t\}_{t \in \mathbb{R}_+}$ denote the smallest right-continuous filtration that contains $\mathbb{F}^0$, so $\mathcal{F}_t = \mathcal{F}^0_{t+}$. Fix $s < t$ and let $Z$ be a bounded, $\mathcal{F}_s$-measurable random variable. Choose strictly decreasing sequences $\{s_n\}$ and $\{t_n\}$ in $D$ with $\lim_n s_n = s$, $\lim_n t_n = t$, and $s_n < t_n$ for all $n$. As $M_u = E[M_{t_0} \mid \mathcal{F}^0_u]$ for $u \in D \cap [0, t_0]$, the collection $\{M_u\}_{u \in D \cap [0, t_0]}$ is uniformly integrable, $\lim_n M_{s_n} = M_s$, $P$-a.s., and $\lim_n M_{t_n} = M_t$, $P$-a.s., so
\[
  E[M_t Z] = \lim_{n\to\infty} E[M_{t_n} Z] = \lim_{n\to\infty} E[M_{s_n} Z] = E[M_s Z].
\]
As this is true for all bounded, $\mathcal{F}_s$-measurable $Z$, we conclude that $M_s$ is a version of $E[M_t \mid \mathcal{F}_s]$, and it is then clear that $M_s$ must also be a version of $E[M_t \mid \mathcal{F}^P_s]$.

F.9 Theorem. Let $E$ be a Polish space, let $\{Y^n\}_{n \in \mathbb{N}}$ be a collection of continuous, $E$-valued processes, and let $\{M^n\}_{n \in \mathbb{N}}$ be a collection of continuous, real-valued processes. If $M^n$ is a local martingale with respect to some filtration to which $Y^n$ and $M^n$ are adapted for each $n < \infty$ and $(Y^n, M^n) \Rightarrow (Y^\infty, M^\infty)$, then $M^\infty$ is a local martingale with respect to the filtration generated by $Y^\infty$ and $M^\infty$.

Proof. First we move the proof onto the canonical space. Let $X = (Y, M)$ denote the canonical process on $\Omega \triangleq C(\mathbb{R}_+; E \times \mathbb{R})$, and set $P^n = \mathcal{L}(Y^n, M^n)$, so $P^n \Rightarrow P^\infty$ by assumption. As we now have everything defined on the canonical space, we throw away the original sequence $(Y^n, M^n)$, and we will reuse the notation $M^m$ to denote stopped versions of $M$ below. Let $\mathbb{C}^0 = \{\mathcal{C}^0_t\}$ denote the filtration on $\Omega$ with $\mathcal{C}^0_t \triangleq \sigma(X^t)$, and let $\mathcal{C} = \sigma(X) = \mathcal{C}^0_\infty$ denote the Borel $\sigma$-field on $\Omega$. Finally, let $\mathbb{F}^n = \{\mathcal{F}^n_t\}$ denote the smallest right-continuous, $P^n$-augmented filtration which contains $\mathbb{C}^0$.

Now define the stopping time $T^a \triangleq \inf\{t : |M_t| \ge a\}$. Using Cor. F.6 and Rem. F.7, we may choose a sequence $\{a_m\}$ with $\lim_m a_m = \infty$ such that $\mathcal{L}(X, M^m \mid P^n) \Rightarrow \mathcal{L}(X, M^m \mid P^\infty)$ for each fixed $m$, where $M^m \triangleq M^{T^{a_m}}$. If we fix any $s < t$ and any bounded, continuous $f : C(\mathbb{R}_+; E \times \mathbb{R}) \to \mathbb{R}$, then the map $x = (y, z) \mapsto f\bigl(\nabla(x, s)\bigr)\bigl(z(t) - z(s)\bigr)$ is continuous from $C(\mathbb{R}_+; E \times \mathbb{R})$ to $\mathbb{R}$. This means that $\mathcal{L}\bigl(f(X^s)(M^m_t - M^m_s) \mid P^n\bigr) \Rightarrow \mathcal{L}\bigl(f(X^s)(M^m_t - M^m_s) \mid P^\infty\bigr)$. But everything here is bounded, so we also have
\[
  E^\infty\bigl[f(X^s)\,(M^m_t - M^m_s)\bigr] = \lim_n E^n\bigl[f(X^s)\,(M^m_t - M^m_s)\bigr] = 0, \tag{F.10}
\]
as $M^m$ is a $P^n$-martingale for each $n < \infty$. Now the class of functions $f : C(\mathbb{R}_+; E \times \mathbb{R}) \to \mathbb{R}$ such that (F.10) holds is a monotone class that contains all the bounded continuous functions, so it must actually contain all bounded $\mathcal{C}/\mathcal{R}$-measurable functions.
In particular, $E^\infty[M^m_t - M^m_s \mid \mathcal{C}^0_s] = 0$ and $M^m$ is a $(\mathbb{C}^0, P^\infty)$-martingale. We may then apply Lem. F.8 to conclude that $M^m$ is actually an $(\mathbb{F}^\infty, P^\infty)$-martingale. As $T^{a_m} \to \infty$ everywhere, we have evidenced a localizing sequence for $M$, and we see that $M$ is an $(\mathbb{F}^\infty, P^\infty)$-local martingale.

F.11 Lemma. Let $D_1 \subset \mathbb{R}_+$ and $D_2 \subset \mathbb{R}^d$ be dense subsets, and let $f \in C(\mathbb{R}_+; \mathbb{R}^d \otimes \mathbb{R}^d)$. Suppose that $f(s) \in S^d$ when $s \in D_1$ and that $(f(s)x, x) \le (f(t)x, x)$ when $s, t \in D_1$, $x \in D_2$, and $s \le t$. Then $f(t) - f(s) \in S^d_+$ for all $s, t \in \mathbb{R}_+$ with $s \le t$.

Proof. For the sake of generating a contradiction, first assume that there exists $s \in \mathbb{R}_+$ such that $f^{ij}(s) \ne f^{ji}(s)$. But then we may choose $s_n \in D_1$ with $\lim_n s_n = s$, so
\[
  f^{ij}(s) = \lim_n f^{ij}(s_n) = \lim_n f^{ji}(s_n) = f^{ji}(s).
\]
As this is a contradiction, we have $f(s) \in S^d$ for all $s \in \mathbb{R}_+$.

Now assume that there exist $s, t \in \mathbb{R}_+$ with $s < t$ such that $f(t) - f(s) \notin S^d_+$. This means that there exists $x \in \mathbb{R}^d$ with $(f(s)x, x) > (f(t)x, x)$. Take $s_n, t_n \in D_1$ and $x_n \in D_2$ with $\lim_n s_n = s$, $\lim_n t_n = t$, $\lim_n x_n = x$, and $s_n < t_n$ for each $n$. The map $(A, x) \mapsto (Ax, x)$ is continuous from $S^d \times \mathbb{R}^d$ to $\mathbb{R}$, so we have
\[
  (f(s)x, x) = \lim_{n\to\infty} (f(s_n)x_n, x_n) \le \lim_{n\to\infty} (f(t_n)x_n, x_n) = (f(t)x, x).
\]
This is again a contradiction, so we conclude that $f(t) - f(s) \in S^d_+$ for all $s, t \in \mathbb{R}_+$ with $s \le t$.

We now have everything that we need to prove the main theorem of this subsection.

F.1 Theorem. Let $X$, $B$, and $C$ be continuous processes where $X$ and $B$ take values in $\mathbb{R}^d$, $C$ takes values in $\mathbb{R}^d \otimes \mathbb{R}^d$, and $B$ is a.s. of finite variation. Let $\{X^n\}$ be a sequence of continuous, $\mathbb{R}^d$-valued processes, and suppose that $X^n$ is a semimartingale with the characteristics $(B^n, C^n)$. If $(X^n, B^n, C^n) \Rightarrow (X, B, C)$, then $X$ is a semimartingale which has the characteristics $(B, C)$ with respect to the filtration generated by $X$, $B$, and $C$ (i.e., $\mathbb{F}^0 \triangleq \{\sigma(X^t, B^t, C^t)\}_{t \in \mathbb{R}_+}$).


Proof. Without loss of generality, we assume that everything is defined on the same space. As the set $\{0\}$ is closed and $C^n = \langle X^n \rangle$, we may apply Portmanteau's Theorem (Thm. B.1) to conclude that
\[
  P[C^{ij}_t - C^{ji}_t = 0] \ge \lim_n P[C^{n,ij}_t - C^{n,ji}_t = 0] = 1.
\]
Letting $D_1$ denote a countable dense subset of $\mathbb{R}_+$, we may then conclude that $P[C_t \in S^d \ \forall t \in D_1] = 1$. Similarly, for each $s < t$ and $x \in \mathbb{R}^d$, we have
\[
  P\bigl[\bigl((C_t - C_s)x, x\bigr) \ge 0\bigr] \ge \lim_n P\bigl[\bigl((C^n_t - C^n_s)x, x\bigr) \ge 0\bigr] = 1.
\]
Letting $D_2$ denote a countable dense subset of $\mathbb{R}^d$, we may then conclude that
\[
  P\bigl[\bigl((C_t - C_s)x, x\bigr) \ge 0 \ \ \forall s < t \in D_1,\ x \in D_2\bigr] = 1.
\]
We may then apply Lem. F.11 to conclude that $P[C_t - C_s \in S^d_+ \ \forall s < t \in \mathbb{R}_+] = 1$, and then Lem. C.6 implies that $C$ is a.s. of finite variation.

If $a$ and $b$ are points in $\mathbb{R}^d$ and $c$ is a point in $\mathbb{R}^d \otimes \mathbb{R}^d$, then the maps $(a, b) \mapsto a - b$ and $(a, b, c) \mapsto (a - b) \otimes (a - b) - c$ are continuous, so we may apply Lem. B.6 to conclude that the functional versions of these maps are continuous. This implies that
\[
  \bigl(X^n, B^n, C^n, X^n - B^n, (X^n - B^n) \otimes (X^n - B^n) - C^n\bigr)
  \Rightarrow \bigl(X, B, C, X - B, (X - B) \otimes (X - B) - C\bigr).
\]
As each coordinate of $X^n - B^n$ and $(X^n - B^n) \otimes (X^n - B^n) - C^n$ is a local martingale, we may apply Thm. F.9 to conclude that each coordinate of $X - B$ and $(X - B) \otimes (X - B) - C$ is a local martingale with respect to the filtration generated by $(X, B, C)$. We have shown above that $C$ is a.s. of finite variation, so $\langle X - B \rangle = C$. As $B$ is a.s. of finite variation by assumption, we conclude that $X$ has the characteristics $(B, C)$.


Bibliography

[ABOBF02] M. Avellaneda, D. Boyer-Olson, J. Busca, and P. Friz. Reconstruction of volatility: Pricing index options using the steepest-descent approximation. Risk, pages 87–91, 2002.

[Ald78] D. Aldous. Stopping times and tightness. The Annals of Probability, 6(2):335–340, 1978.

[AM06] A. Antonov and T. Misirpashaev. Markovian projection onto a displaced diffusion: Generic formulas with applications. available at SSRN: http://ssrn.com/abstract, 937860, 2006.

[AMP07] A. Antonov, T. Misirpashaev, and V. Piterbarg. Markovian projection onto a Heston model. Working Paper, 2007.

[BI66] A. Ben-Israel. A note on an iterative method for generalized inversion of matrices. Math. Comp, 20:439–440, 1966.

[BIG03] A. Ben-Israel and T.N.E. Greville. Generalized Inverses: Theory and Applications. Springer, 2003.

[Bil68] P. Billingsley. Convergence of Probability Measures. John Wiley & Sons, 1968.

[BJN00] M. Britten-Jones and A. Neuberger. Option prices, implied price processes, and stochastic volatility. The Journal of Finance, 55(2):839–866, 2000.

[BL78] D.T. Breeden and R.H. Litzenberger. Prices of state-contingent claims implicit in option prices. The Journal of Business, 51(4):621–651, 1978.

[BM01] D. Brigo and F. Mercurio. Displaced and mixture diffusions for analytically-tractable smile models. Mathematical Finance - Bachelier Congress 2000, pages 151–174, 2001.

[BM02] D. Brigo and F. Mercurio. Lognormal-mixture dynamics and calibration to market volatility smiles. International Journal of Theoretical and Applied Finance, 5(4):427–446, 2002.

[Car07] R. Carmona. HJM: A unified approach to dynamic models for fixed income, credit and equity markets. Lecture Notes in Mathematics, 1919:1, 2007.

[CN07] R. Carmona and S. Nadtochiy. Local volatility dynamic models. Preprint, Princeton University, 2007.

[Con98] D. Constales. A closed formula for the Moore-Penrose generalized inverse of a complex matrix of given rank. Acta Mathematica Hungarica, 80:83–88, 1998.

[Der01] E. Derman. Models and markets. Risk, 14(2):48–50, 2001.

[DFW98] B. Dumas, J. Fleming, and R.E. Whaley. Implied volatility functions: Empirical tests. The Journal of Finance, 53(6):2059–2106, 1998.

[DK94] E. Derman and I. Kani. Riding on a smile. Risk, 7(2):32–39, 1994.

[DK98] E. Derman and I. Kani. Stochastic implied trees: Arbitrage pricing with stochastic term and strike structure of volatility. International Journal of Theoretical and Applied Finance, 1(1):61–110, 1998.

[Dup94] B. Dupire. Pricing with a smile. Risk, 7(1):18–20, 1994.

[Fok13] A.D. Fokker. Die mittlere energie rotierender elektrischer dipole im strahlungsfeld. Annalen der Physik, 348(5):810–820, 1913.

[Gar73] A. Garcia. Martingale Inequalities: Seminar Notes on Recent Progress. W. A. Benjamin, 1973.

[Gat06] J. Gatheral. The Volatility Surface: A Practitioner's Guide. Wiley, 2006.

[Gyö86] I. Gyöngy. Mimicking the one-dimensional marginal distributions of processes having an Itô differential. Probability Theory and Related Fields, 71(4):501–516, 1986.

[Hes93] S.L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6(2):327–43, 1993.

[JM81a] J. Jacod and J. Memin. Existence of weak solutions for stochastic differential equations with driving semimartingales. Stochastics An International Journal of Probability and Stochastic Processes, 4(4):317–337, 1981.

[JM81b] J. Jacod and J. Memin. Weak and strong solutions of stochastic differential equations: existence and stability. Proc. LMS Symp., Lect. Notes in Math, 851:169–212, 1981.

[JM86] A. Joffe and M. Metivier. Weak convergence of sequences of semimartingales with applications to multitype branching processes. Advances in Applied Probability, 18(1):20–65, 1986.

[JS87] J. Jacod and A.N. Shiryaev. Limit theorems for stochastic processes, 2nd Edition. Springer New York, 1987.

[Kec95] A.S. Kechris. Classical Descriptive Set Theory. Springer, 1995.

[Kol31] A. Kolmogoroff. Über die analytischen methoden in der wahrscheinlichkeitsrechnung. Mathematische Annalen, 104(1):415–458, 1931.

[Kry84] N. V. Krylov. Once more about the connection between elliptic operators and Itô's stochastic equations. Statistics and Control of Stochastic Processes, Steklov Seminar, pages 214–229, 1984.

[KS91] I. Karatzas and S.E. Shreve. Brownian motion and stochastic calculus, volume 113 of Graduate Texts in Mathematics, 1991.

[Kun90] H. Kunita. Stochastic Flows and Stochastic Differential Equations. Cambridge University Press, 1990.

[Len77] E. Lenglart. Relation de domination entre deux processus. Ann. Inst. Henri Poincaré, 13:171–179, 1977.

[LS01] R.S. Liptser and A.N. Shiryaev. Statistics of random processes I: General theory, second edition. Applications of Mathematics, Springer, Berlin-New York, 2001.

[MQR07] D. Madan, M. Qian Qian, and Y. Ren. Calibrating and pricing with embedded local volatility models. Risk, 20(9):138–143, 2007.

[Par67] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, 1967.

[Pen55] R. Penrose. A generalized inverse for matrices. Proc. Cambridge Philos. Soc., 51:406–413, 1955.

[Pit03a] V. Piterbarg. Mixture of models: A simple recipe for a... hangover? Working Paper, 2003.

[Pit03b] V.V. Piterbarg. A stochastic volatility forward libor model with a term structure of volatility smiles. Technical report, Working paper, Bank of America, 2003.

[Pit05] V. Piterbarg. Time to smile. Risk, pages 71–75, 2005.

[Pit06] V. Piterbarg. Smiling hybrids. Risk, May, pages 65–71, 2006.

[Pit07] V. Piterbarg. Markovian projection for volatility calibration. Risk, 4, 2007.

[Pla17] M. Planck. Ueber einen satz der statistichen dynamik und eine erweiterung in der quantumtheorie. Sitzungberichte der Preussischen Akademie der Wissenschaften, pages 324–341, 1917.

[Reb79] R. Rebolledo. La méthode des martingales appliquée à l'étude de la convergence en loi de processus. Mémoires de la Société Mathématique de France, 62:1–125, 1979.

[Roy88] H.L. Royden. Real analysis, 3rd Edition. Macmillan New York, 1988.

[RSN90] F. Riesz and B. Szőkefalvi-Nagy. Functional Analysis. Dover Publications, 1990.

[Rub94] M. Rubinstein. Implied binomial trees. Journal of Finance, 49(3):771–818, 1994.

[RY99] D. Revuz and M. Yor. Continuous martingales and Brownian motion, 3rd Edition. Springer-Verlag New York, 1999.

[SV79] D.W. Stroock and S.R.S. Varadhan. Multidimensional Diffusion Processes. Springer-Verlag, 1979.

[Var67] S.R.S. Varadhan. On the behavior of the fundamental solution of the heat equation with variable coefficients. Comm. Pure Appl. Math, 20(2), 1967.