WEAK CONVERGENCE AND ITS APPLICATIONS
WEAK CONVERGENCE AND ITS APPLICATIONS Zhengyan Lin Hanchao Wang Zhejiang University, China
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
WEAK CONVERGENCE AND ITS APPLICATIONS
Copyright © 2014 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 978-981-4447-69-0
Printed in Singapore
April 23, 2014
10:51
World Scientific Book - 9.75in x 6.5in
linwang
Preface
This book is devoted to the theory of weak convergence of probability measures on metric spaces. There are many books on weak convergence, for example Ethier and Kurtz (1986), van der Vaart and Wellner (1996), Billingsley (1999), and Jacod and Shiryaev (2003). The emphases of these books differ. Statistics and econometrics have made great progress in the last two decades. It has become necessary to study the distributions, or the asymptotic distributions, of some complex statistics, so the weak convergence of stochastic processes based on these statistics has become more important. Some models need new techniques in weak convergence. For example:
• In the study of the asymptotic behavior of non-stationary time series, the limiting process (resp. limiting distribution) is usually a process with conditionally independent increments (resp. a mixture of normal distributions).
• The asymptotic errors caused by the discretization of stochastic differential equations (e.g. in the analysis of hedging error) converge in distribution to stochastic integrals.
• In financial risk theory, the returns of assets usually obey heavy-tailed distributions, and statistics based on such data are not asymptotically normal. The processes based on these statistics usually converge weakly to non-Gaussian stable processes.
• Some statistics in goodness-of-fit testing are empirical processes or function-indexed empirical processes. It is quite difficult to establish the weak convergence of such processes.
Our main aim is to give a systematic exposition of the theory of weak convergence, covering a wide range of weak convergence problems, including new developments. In this book, we summarize the development of weak convergence in each of the following aspects: Donsker type invariance principles, convergence of point processes, weak convergence to semimartingales and convergence of empirical processes.
Compared with the well-known monographs mentioned above, we do not cover all details, but the book contains the core content of many branches and as many new developments as possible. This is perhaps one of the highlights of the book.
In Chapter 1, the definition and some properties of weak convergence are presented, including some useful metric spaces, the portmanteau theorem and some important examples. Notably, the general method of proving weak convergence is given there; it is used frequently in the following chapters. Weak convergence to independent increment processes is presented in Chapter 2. We divide this chapter into two main parts: Donsker type invariance principles and point process convergence. Donsker type invariance principles describe weak convergence to Gaussian independent increment processes, and point process convergence concerns weak convergence to compound Poisson type processes. In Chapter 3, we treat weak convergence to semimartingales. First, the general case of convergence to a semimartingale is presented. Then weak convergence to stochastic integrals is given as an example. Finally, some applications are collected. Weak convergence of empirical processes is presented in Chapter 4. The classical results on empirical processes are presented in Section 4.1, and the convergence of function-indexed empirical processes is discussed in Sections 4.3 and 4.4. Sections 4.2 and 4.5 give some applications of weak convergence of empirical processes. More and more people are interested in applications of weak convergence theory. Another aim of this book is to review some recent applications of modern weak convergence theory in time series, statistics and econometrics. Some of the results are due to the authors. Finally, we would like to sincerely thank Prof. V. Ulyanov of Moscow State University, who provided us with valuable comments.
Contents

Preface

1. The Definition and Basic Properties of Weak Convergence
   1.1 Metric space
       1.1.1 C[0, 1] space
       1.1.2 D[0, 1] space
       1.1.3 M+(S) space
   1.2 The definition of weak convergence of stochastic processes and portmanteau theorem
       1.2.1 Prohorov metric
       1.2.2 Weak convergence and portmanteau theorem
   1.3 How to verify the weak convergence?
       1.3.1 Tightness
       1.3.2 Identifying the limit
   1.4 Two examples of applications of weak convergence
       1.4.1 Unit root testing
       1.4.2 Goodness-of-fit testing for volatility

2. Convergence to the Independent Increment Processes
   2.1 The basic conditions of convergence to the Gaussian independent increment processes
   2.2 Donsker invariance principle
       2.2.1 Classical Donsker invariance principle
       2.2.2 Martingale invariance principle
       2.2.3 Extension of martingale invariance principle
   2.3 Convergence of Poisson point processes
       2.3.1 Functional limit theorem for independent heavy-tailed sequence
       2.3.2 Functional limit theorem for dependent heavy-tailed sequence
   2.4 Two examples of applications of point process method
       2.4.1 The maximum of the periodogram for heavy-tailed sequence
       2.4.2 The weak convergence to stable integral

3. Convergence to Semimartingales
   3.1 The conditions of tightness for semimartingale sequence
   3.2 Weak convergence to semimartingale
       3.2.1 Tightness
       3.2.2 Identifying the limit
   3.3 Weak convergence to stochastic integral I: the martingale convergence approach
   3.4 Weak convergence to stochastic integral II: Kurtz and Protter's approach
   3.5 Stable central limit theorem for semimartingales
   3.6 An application to stochastic differential equations
       3.6.1 Euler method for stochastic differential equations
       3.6.2 Milstein method for stochastic differential equations
   3.7 Appendix: the predictable characteristics of semimartingales

4. Convergence of Empirical Processes
   4.1 Classical weak convergence of empirical processes
   4.2 Weak convergence of marked empirical processes
   4.3 Weak convergence of function index empirical processes
       4.3.1 Preliminary
       4.3.2 Glivenko-Cantelli theorems
       4.3.3 Donsker theorem
   4.4 Weak convergence of empirical processes involving time-dependent data
   4.5 Two examples of applications in statistics
       4.5.1 Rank tests in complex survey samples
       4.5.2 M-estimator

Bibliography

Index
Chapter 1
The Definition and Basic Properties of Weak Convergence
1.1 Metric space
The aim of this book is to summarize both the classical theory and the recent developments of weak convergence. The weak convergence of stochastic processes is an important subject of modern probability theory. Stochastic processes can be treated as random elements taking values in a specific space, such as a function space. To study the weak convergence of stochastic processes, the space in which the processes take values should be a Polish space (a complete separable metric space) with a suitable topology, so the choice of topology on this space is of central interest. In this section, we first review the properties of some useful topological spaces and the notions of convergence in these spaces.

The following notation is used throughout the book. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X$ a random element on $(\Omega, \mathcal{F}, P)$ taking values in the space $S$. $(S, \rho)$ stands for a metric space, and $\mathcal{B}(S)$ stands for the σ-field generated by the open subsets of $S$ under the metric $\rho$. In this book, we mainly focus on three kinds of spaces for the study of weak convergence: $C[0, 1]$, $D[0, 1]$ and $M_+(S)$. $R$ stands for the one-dimensional Euclidean space.

1.1.1 C[0, 1] space
$C[0, 1]$ is the space of all continuous functions from $[0, 1]$ to $R$. We wish to introduce a topology on $C[0, 1]$ such that $C[0, 1]$ is a Polish space with this topology. Usually, for $\alpha, \beta \in C[0, 1]$, people use the local uniform topology associated with the metric
\[
\rho(\alpha, \beta) = \sup_{s \in [0,1]} |\alpha(s) - \beta(s)|. \tag{1.1}
\]
By the Arzelà-Ascoli theorem, a set $A \subset C[0, 1]$ is relatively compact if and only if
\[
\sup_{t \in [0,1]} \sup_{x \in A} |x(t)| < \infty
\]
and
\[
\lim_{\delta \downarrow 0} \sup_{x \in A} \sup_{|t-s| \le \delta} |x(t) - x(s)| = 0.
\]
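As a rough numerical illustration of the metric (1.1) and of the equicontinuity condition in the Arzelà-Ascoli theorem, the following Python sketch approximates both quantities on a finite grid. The grid, the function names and the test family $\sin(nt)$ are assumptions made for this illustration only; a grid maximum is merely a lower bound for the true supremum.

```python
import math

def sup_metric(alpha, beta, grid):
    """Grid approximation of rho(alpha, beta) = sup_s |alpha(s) - beta(s)| from (1.1)."""
    return max(abs(alpha(s) - beta(s)) for s in grid)

def modulus_of_continuity(x, delta, grid):
    """Grid approximation of sup_{|t - s| <= delta} |x(t) - x(s)|."""
    values = [x(t) for t in grid]
    worst = 0.0
    for i, t in enumerate(grid):
        j = i
        # Only look to the right of t; the supremum is symmetric in (s, t).
        while j < len(grid) and grid[j] - t <= delta:
            worst = max(worst, abs(values[j] - values[i]))
            j += 1
    return worst

# Hypothetical evaluation grid on [0, 1].
grid = [k / 1000 for k in range(1001)]

rho_sin_cos = sup_metric(math.sin, math.cos, grid)

# The family {sin(n t)} is uniformly bounded but not equicontinuous:
# at a fixed delta its modulus does not shrink uniformly in n.
moduli = {
    n: modulus_of_continuity(lambda t, n=n: math.sin(n * t), 0.01, grid)
    for n in (1, 10, 100)
}
```

In this sketch the modulus at $\delta = 0.01$ grows with $n$, which is consistent with the family $\{\sin(nt) : n \ge 1\}$ failing the second Arzelà-Ascoli condition although it satisfies the first.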
1.1.2 D[0, 1] space
$D[0, 1]$ is the space of all functions from $[0, 1]$ to $R$ which are right-continuous and have left-hand limits. Since the elements of $D[0, 1]$ may have discontinuities, it is more difficult to find a topology under which $D[0, 1]$ is a Polish space. For simplicity and convenience, one might hope that $D[0, 1]$ is a Polish space with the local uniform topology. Unfortunately, $D[0, 1]$ fails to be separable with this metric. For example, the family of functions
\[
\alpha_s(t) = 1_{[s,1)}(t), \quad 0 \le s \le 1,
\]
is uncountable, yet $\rho(\alpha_s, \alpha_{s'}) = 1$ for $s \ne s'$. An uncountable family of points at mutual distance 1 cannot be approximated by a countable set, so $D[0, 1]$ is not separable with the local uniform topology. A different topology is needed. Skorokhod (1956) introduced four topologies to overcome this shortcoming of the local uniform topology. To state the definitions of these four topologies briefly, we define them through the corresponding notions of convergence.

Definition 1.1. Let $x_n$, $n \ge 1$, and $x$ be elements of $D[0, 1]$. If there exists a sequence $\lambda_n(t)$ of continuous one-to-one mappings of $[0, 1]$ onto $[0, 1]$ such that
\[
\lim_{n \to \infty} \sup_{t \in [0,1]} |x_n(\lambda_n(t)) - x(t)| = 0, \qquad \lim_{n \to \infty} \sup_{t \in [0,1]} |\lambda_n(t) - t| = 0, \tag{1.2}
\]
we call $x_n$ $J_1$-convergent to $x$, denoted by $x_n \xrightarrow{J_1} x$.
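The difference between the uniform metric and $J_1$-convergence can be seen on a single jump. The following Python sketch is only an illustration under stated assumptions: the functions $x_n = 1_{[1/2+1/n,1]}$ and $x = 1_{[1/2,1]}$, the piecewise-linear time change `time_change` and the evaluation grid are choices made here, not constructions from the text. It exhibits one admissible $\lambda_n$ in (1.2), which is enough to show $J_1$-convergence, while the uniform distance stays equal to 1.

```python
def indicator_from(a):
    """The cadlag step function t -> 1_{[a, 1]}(t)."""
    return lambda t: 1.0 if t >= a else 0.0

def time_change(n):
    """Piecewise-linear bijection of [0, 1] with 0 -> 0, 1/2 -> 1/2 + 1/n, 1 -> 1 (n >= 3)."""
    a = 0.5 + 1.0 / n
    def lam(t):
        if t <= 0.5:
            return t * (a / 0.5)
        return a + (t - 0.5) * ((1.0 - a) / 0.5)
    return lam

# Hypothetical evaluation grid; it contains the jump time 1/2 exactly.
grid = [k / 10000 for k in range(10001)]
x = indicator_from(0.5)

def distances(n):
    """Return (uniform distance, sup |x_n(lam_n(t)) - x(t)|, sup |lam_n(t) - t|) on the grid."""
    x_n = indicator_from(0.5 + 1.0 / n)
    lam = time_change(n)
    uniform = max(abs(x_n(t) - x(t)) for t in grid)
    j1_gap = max(abs(x_n(lam(t)) - x(t)) for t in grid)
    time_shift = max(abs(lam(t) - t) for t in grid)
    return uniform, j1_gap, time_shift

results = {n: distances(n) for n in (4, 10, 100)}
```

For each $n$ the first entry equals 1, the second equals 0 and the third is $1/n$, so both limits in (1.2) hold along this particular $\lambda_n$ although $\rho(x_n, x) = 1$.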
The $J_1$ topology is the topology induced by $J_1$-convergence. Usually, the Skorokhod topology refers to the $J_1$ topology, but there are many situations in which $J_1$ should be replaced by one of the other topologies.

Definition 1.2. Let $x_n$, $n \ge 1$, and $x$ be elements of $D[0, 1]$. If there exists a sequence $\lambda_n(t)$ of one-to-one mappings of $[0, 1]$ onto $[0, 1]$ (not necessarily continuous) such that (1.2) is satisfied, we call $x_n$ $J_2$-convergent to $x$, denoted by $x_n \xrightarrow{J_2} x$.
Define the graph $\Gamma_{x(t)}$ as the closed set in $R \times [0, 1]$ consisting of all pairs $(z, t)$ such that, for each $t$, the point $z$ belongs to the segment $[x(t - 0), x(t + 0)]$. A pair of functions $(y(s), t(s))$, $s \in [0, 1]$, is called a parametric representation of $\Gamma_{x(t)}$ if $y(s)$ is continuous, $t(s)$ is continuous and monotonically increasing, and those and only those pairs $(z, t)$ belong to $\Gamma_{x(t)}$ for which there is an $s$ such that $z = y(s)$, $t = t(s)$. We use the above notation and definitions to define the $M_1$ and $M_2$ topologies. Let
\[
R[(x_1, t_1); (x_2, t_2)] = |x_1 - x_2| + |t_1 - t_2|.
\]
Definition 1.3. If there exist parametric representations $(y(s), t(s))$ of $\Gamma_{x(t)}$ and $(y_n(s), t_n(s))$ of $\Gamma_{x_n(t)}$ such that
\[
\lim_{n \to \infty} \sup_{s \in [0,1]} R[(y_n(s), t_n(s)); (y(s), t(s))] = 0, \tag{1.3}
\]
we call $x_n$ $M_1$-convergent to $x$, denoted by $x_n \xrightarrow{M_1} x$.
Definition 1.4. If the graphs $\Gamma_{x(t)}$ and $\Gamma_{x_n(t)}$ satisfy
\[
\lim_{n \to \infty} \sup_{(y_1, t_1) \in \Gamma_{x(t)}} \inf_{(y_2, t_2) \in \Gamma_{x_n(t)}} R[(y_1, t_1); (y_2, t_2)] = 0, \tag{1.4}
\]
we call $x_n$ $M_2$-convergent to $x$, denoted by $x_n \xrightarrow{M_2} x$.
The definitions of the $M_1$ and $M_2$ topologies in terms of graphs are rather complicated. We therefore present necessary and sufficient conditions for $M_1$ and $M_2$ convergence. Let $H(x_1, x_2, x_3)$ denote the distance of $x_2$ from the segment $[x_1, x_3]$.

Proposition 1.1. (Skorohod (1956)) The following statements are equivalent:
(i) $x_n \xrightarrow{M_1} x$;
(ii) a) $x_n$ converges to $x$ on a dense subset of $[0, 1]$ that contains 0 and 1;
b)
\[
\lim_{c \to 0} \limsup_{n \to \infty} \sup_{t_2 - c < t_1 < t_2 < t_3 < t_2 + c} H(x_n(t_1), x_n(t_2), x_n(t_3)) = 0.
\]
Proposition 1.2. (Skorohod (1956)) The following statements are equivalent:
(i) $x_n \xrightarrow{M_2} x$;
(ii)
\[
\lim_{c \to 0} \limsup_{n \to \infty} \sup_{t \in [0,1]} H(x(t), x_n(t_c), x(t_c^*)) = 0,
\]
where $t_c = \max\{0, t - c\}$, $t_c^* = \min\{1, t + c\}$.

Next, we present the conditions for compactness in the topologies $J_1$, $J_2$, $M_1$, $M_2$.

Proposition 1.3. (Skorohod (1956)) The set $K$ is compact in a topology $S$, where $S$ stands for one of $J_1$, $J_2$, $M_1$, $M_2$, if
\[
\lim_{c \to 0} \sup_{x \in K} \sup_{t \in [0,1]} \Bigl( \Delta_S(c, x(t)) + \sup_{0 < t < c} |x(t) - x(0)| + \sup_{1-c < t < 1} |x(t) - x(1)| \Bigr) = 0,
\]
where
\[
\Delta_{J_1}(c, x(t)) = \sup_{t-c < t_1 < t < t_2 < t+c} \min\{|x(t_1) - x(t)|, |x(t_2) - x(t)|\},
\]
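\[
\Delta_{J_2}(c, x(t)) = \sup_{\substack{t_c \le t_1 \le t_c + c/2,\ t_c - c/2 \le t_2 \le t_c \\ t, t_1, t_2 \in [0,1]}} \min\{|x(t_1) - x(t)|, |x(t_2) - x(t)|\},
\]
\[
\Delta_{M_1}(c, x(t)) = \sup_{t_2 - c < t_1 < t_2 < t_3 < t_2 + c} H(x(t_1), x(t_2), x(t_3)),
\]
\[
\Delta_{M_2}(c, x(t)) = \sup_{\substack{t_c \le t_1 \le t_c + c/2,\ t_c^* - c/2 \le t_2 \le t_c^* \\ t, t_1, t_2 \in [0,1]}} H(x(t), x(t_1), x(t_2)).
\]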
Usually, when the limiting process takes values in $D[0, 1]$, people study the asymptotic properties under the $J_1$ and $M_1$ topologies. However, the $J_1$ and $M_1$ topologies are not sufficient for convergence in some special cases.

Example 1.1. Let $k^0(t) = 1_{[1/2,1]}(t)$, $k^n(t) = 1_{[1/2-1/n,1]}(t)$, $x^0(t) = x^n(t) = 1_{[1/2,1]}(t)$, where $n \ge 1$. Then $x^n \xrightarrow{J_1} x^0$ and $k^n \xrightarrow{J_1} k^0$; however,
\[
\int_0^1 k^n(s-)\,dx^n(s) \equiv 1
\]
does not converge to
\[
\int_0^1 k^0(s-)\,dx^0(s) \equiv 0
\]
in the $J_1$ topology. This means that the convergence of integrators and integrands in the $J_1$ topology does not imply the convergence of the integrals in the $J_1$ topology.

Example 1.2. Let $k^0(t) = 1_{[1/2,1]}(t)$, $k^n(t) = 1_{[1/2+1/n,1]}(t)$, $x^0(t) = x^n(t) = 1_{[1/2,1]}(t)$, where $n \ge 1$. Then $x^n \xrightarrow{J_1} x^0$, $k^n \xrightarrow{J_1} k^0$ and
\[
0 \equiv \int_0^1 k^n(s-)\,dx^n(s) \xrightarrow{J_1} \int_0^1 k^0(s-)\,dx^0(s) \equiv 0;
\]
however, $(k^n, x^n) \xrightarrow{J_1} (k^0, x^0)$ does not hold. This means that the convergence of the integrals in the $J_1$ topology does not imply the joint convergence of integrators and integrands in the $J_1$ topology.
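The values of the integrals in Examples 1.1 and 1.2 can be checked by direct arithmetic: since the integrator $1_{[1/2,1]}$ has a single unit jump at $s = 1/2$, the integral of $k(s-)$ against it equals the left limit $k(1/2-)$. The Python sketch below carries out this computation with exact rational arithmetic; the helper names are hypothetical and are introduced only for this illustration.

```python
from fractions import Fraction

def indicator_left_limit(a, s):
    """Left limit at s of t -> 1_{[a, 1]}(t): equal to 1 if s > a, and 0 otherwise."""
    return 1 if s > a else 0

def integral_against_unit_jump(a, jump_time=Fraction(1, 2)):
    """Integral over [0, 1] of k(s-) dx(s) for k = 1_{[a, 1]} and x = 1_{[jump_time, 1]}.

    The integrator has one jump of size 1 at jump_time, so the integral
    reduces to k(jump_time-) times the jump size.
    """
    return indicator_left_limit(a, jump_time) * 1

half = Fraction(1, 2)
# Example 1.1: integrands k^n = 1_{[1/2 - 1/n, 1]}.
example_1_1 = [integral_against_unit_jump(half - Fraction(1, n)) for n in range(1, 50)]
# Example 1.2: integrands k^n = 1_{[1/2 + 1/n, 1]}.
example_1_2 = [integral_against_unit_jump(half + Fraction(1, n)) for n in range(1, 50)]
# Limit integrand k^0 = 1_{[1/2, 1]}.
limit_integral = integral_against_unit_jump(half)
```

Every entry of `example_1_1` is 1 and every entry of `example_1_2` is 0, while the limit integral is 0, in agreement with the two examples.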
Although the Skorokhod topology is very common in the study of weak convergence, it is not suitable for studying the convergence of some functionals on $D[0, 1]$. Examples 1.1 and 1.2 show that the convergence of integrals has little connection with the convergence of integrands and integrators under the Skorokhod topology. Jakubowski (1996, 1997) introduced a new topology, the so-called semimartingale topology. Although this topology is not metrizable, it is quite natural and shares many useful properties with $J_1$ and $M_1$. Furthermore, it is advantageous for the convergence of integrals; we will present the details in Chapter 3. Here we first give the definition of this topology.

Definition 1.5. Let $x_n$, $x$ belong to $D[0, 1]$. If for every $\varepsilon > 0$ one can find functions of bounded variation $\upsilon_{n,\varepsilon}$ and $\upsilon_\varepsilon$ such that
\[
\sup_{t \in [0,1]} |x_n(t) - \upsilon_{n,\varepsilon}(t)| \le \varepsilon, \qquad \sup_{t \in [0,1]} |x(t) - \upsilon_\varepsilon(t)| \le \varepsilon,
\]
and for every continuous function $f : [0, 1] \to R$,
\[
\int_{[0,1]} f(t)\,d\upsilon_{n,\varepsilon}(t) \to \int_{[0,1]} f(t)\,d\upsilon_\varepsilon(t),
\]
we call $x_n$ $S$-convergent to $x$, denoted by $x_n \xrightarrow{S} x$.

To characterize the compact sets of $D[0, 1]$ under the $S$ topology, we need the following notation. For a function $k \in D[0, 1]$, let $N_\eta(k)$ be the number of $\eta$-oscillations of $k$ in $[0, 1]$. More precisely, $N_\eta(k) \ge m$ if there are points
\[
0 \le t_1 \le t_2 \le \cdots \le t_{2m-1} \le t_{2m} \le 1
\]
such that
\[
|k(t_{2j}) - k(t_{2j-1})| > \eta, \quad j = 1, 2, \cdots, m.
\]

Proposition 1.4. $K \subset D[0, 1]$ is $S$-compact if and only if for every $\eta > 0$ there exists a constant $C_\eta$ such that
(i) $\sup_{k \in K} \sup_{t \in [0,1]} |k(t)| \le C_\eta$,
(ii) $\sup_{k \in K} N_\eta(k) \le C_\eta$.
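The quantity $N_\eta(k)$ can be computed for a path sampled at finitely many points. The following Python sketch is an illustration under stated assumptions: it counts $\eta$-oscillations of the sampled values with a greedy left-to-right scan (a choice made here, not a procedure from the text), and for a function on $[0, 1]$ the result is only a lower bound for $N_\eta(k)$, because oscillations between grid points are not seen.

```python
import math

def count_oscillations(values, eta):
    """Number of eta-oscillations of a finite sequence of sampled values.

    Greedy scan: track the running minimum and maximum since the last cut;
    as soon as they differ by more than eta, one oscillation is complete and
    the scan restarts from the current point (t_{2j} <= t_{2j+1} is allowed).
    """
    count = 0
    lo = hi = values[0]
    for v in values[1:]:
        lo, hi = min(lo, v), max(hi, v)
        if hi - lo > eta:
            count += 1
            lo = hi = v
    return count

# Hypothetical sampling grid and test family k_n(t) = sin(2 pi n t).
grid = [k / 4000 for k in range(4001)]
counts = [
    count_oscillations([math.sin(2 * math.pi * n * t) for t in grid], 0.5)
    for n in (1, 2, 4)
]
```

The counts for $\sin(2\pi n t)$ grow with $n$, so condition (ii) of Proposition 1.4 fails for the whole family $\{\sin(2\pi n t) : n \ge 1\}$ even though condition (i) holds with $C_\eta = 1$.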
The proof of Proposition 1.4 can be found in Skorohod (1956).

1.1.3 M+(S) space
In some stochastic models, a stochastic point process is defined as a counting function that models the random distribution of points in a locally compact topological space $S$. A stochastic point process can be seen as a random element taking values in a space of measures, so it is necessary to discuss the topology of this space in order to study the weak convergence of point processes. Suppose that $S$ is a locally compact topological space with a countable base. $\mathcal{B}(S)$ stands for the σ-field generated by the open subsets of $S$. A measure $\mu$ on $(S, \mathcal{B}(S))$ is called Radon if $\mu(K) < \infty$ for every compact set $K \subset S$. Define
\[
M_+(S) = \{\mu : \mu \text{ is a Radon measure on } (S, \mathcal{B}(S))\},
\]
\[
C_b^+(S) = \{f : S \to R_+ ;\ f \text{ is continuous with compact support}\}.
\]

Definition 1.6. If $\mu_n \in M_+(S)$, $n \ge 0$, and for every $f \in C_b^+(S)$,
\[
\int_S f\,d\mu_n \to \int_S f\,d\mu_0
\]
as $n \to \infty$, we say that $\mu_n$ converges vaguely to $\mu_0$, denoted by $\mu_n \xrightarrow{v} \mu_0$. The vague topology is the topology induced by vague convergence.
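For point measures, the integral in Definition 1.6 is a finite sum, so vague convergence can be illustrated by direct computation. The Python sketch below is a minimal illustration with hypothetical helper names; it takes $S = [0, \infty)$ and one tent-shaped test function, whereas vague convergence requires the limit for every $f$ with compact support.

```python
def tent(center, half_width):
    """A continuous, nonnegative test function with compact support."""
    def f(x):
        return max(0.0, 1.0 - abs(x - center) / half_width)
    return f

def integrate(f, atoms):
    """Integral of f against the point measure that puts unit mass at each atom."""
    return sum(f(x) for x in atoms)

f = tent(2.0, 1.0)

# mu_n = delta_n: the unit mass escapes to infinity, so the integrals tend to 0,
# the integral of f against the zero measure.
escaping = [integrate(f, [float(n)]) for n in range(1, 8)]

# mu_n = delta_{2 + 1/n}: the integrals tend to f(2) = 1, the integral against delta_2.
shrinking = [integrate(f, [2.0 + 1.0 / n]) for n in range(1, 8)]
```

The first sequence shows that a vague limit of probability measures may have total mass less than one, which is one reason why tightness is needed when weak convergence is studied.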
A very important element of $M_+(S)$ is a point measure of the form
\[
\mu = \sum_i \delta_{x_i},
\]
where $\delta$ denotes the Dirac measure and $\{x_i\}$ is a sequence of points in $S$. Let $M_p(S)$ be the set of all Radon point measures in $M_+(S)$. $M_p(S)$ turns out to be a closed subset of $M_+(S)$. To specify the open sets and the topology of $M_+(S)$, we define a basis set to be a subset of $M_+(S)$ of the form
\[
\Bigl\{\mu \in M_+(S) : \int_S f_i\,d\mu \in (a_i, b_i),\ i = 1, 2, \cdots, d\Bigr\},
\]
where $d$ is any fixed positive integer, $f_i \in C_b^+(S)$ and $0 \le a_i \le b_i$. Unions of basis sets form the class of open sets constituting the vague topology. Furthermore, let $\{f_i\}$ be a countable basis of $C_b^+(S)$; for $\mu_1, \mu_2 \in M_+(S)$, define the metric
\[
d(\mu_1, \mu_2) = \sum_{i=1}^{\infty} \frac{\bigl|\int_S f_i\,d\mu_1 - \int_S f_i\,d\mu_2\bigr| \wedge 1}{2^i}.
\]
Thus the vague topology is metrizable, and $M_+(S)$ becomes a complete separable metric space. The following proposition characterizes the relatively compact sets in $M_+(S)$.

Proposition 1.5. (Resnick (2007)) A set $M \subset M_+(S)$ is vaguely relatively compact if and only if
\[
\sup_{\mu \in M} \int_S f\,d\mu < \infty
\]
for every $f \in C_b^+(S)$.

1.2 The definition of weak convergence of stochastic processes and portmanteau theorem
The weak convergence of stochastic processes is an important subject. In fact, the weak convergence of stochastic processes is the weak convergence of the probability measures that they induce on a function space. So we first give the definition of the weak convergence of probability measures. In this section, $(S, d)$ stands for a complete separable metric space; $\mathcal{G}(S)$, $\mathcal{F}(S)$ and $\mathcal{K}(S)$ stand for the families of open sets, closed sets and compact sets of $S$, respectively. $\mathcal{P}(S)$ stands for the family of probability measures defined on $S$. $C(S)$ stands for the space of real-valued bounded continuous functions on $(S, d)$. In the previous section, we defined the vague topology on $M_+(S)$. On the subset $\mathcal{P}(S)$ of $M_+(S)$, vague convergence is equivalent to weak convergence, so convergence in $\mathcal{P}(S)$ is of particular interest. In this section, we first introduce the Prohorov metric on $\mathcal{P}(S)$. Then we present the connection between the Prohorov metric, the vague topology and weak convergence.
1.2.1 Prohorov metric
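Prohorov (1956) introduced the Prohorov metric as follows:
\[
\rho(P, Q) = \inf\{\varepsilon > 0 : P(F) \le Q(F^\varepsilon) + \varepsilon \text{ for every } F \in \mathcal{F}(S)\}, \tag{1.5}
\]
where $P, Q \in \mathcal{P}(S)$ and $F^\varepsilon = \{x \in S : \inf_{y \in F} d(x, y) < \varepsilon\}$.

Proposition 1.6. $\rho$ is a metric on $\mathcal{P}(S)$.

Proof. Firstly, if $\rho(P, Q) = 0$, then $P(F) \le Q(F^\varepsilon) + \varepsilon$ for every $F \in \mathcal{F}(S)$ and every $\varepsilon > 0$. Letting $\varepsilon \downarrow 0$ gives $P(F) \le Q(F)$, and by the symmetry of $\rho$ proved below, $P(F) = Q(F)$ for every $F \in \mathcal{F}(S)$; hence $P(A) = Q(A)$ for every $A \in \mathcal{B}(S)$. Therefore, $\rho(P, Q) = 0$ if and only if $P = Q$.

Furthermore, suppose that $\alpha, \beta > 0$ satisfy $P(F) \le Q(F^\alpha) + \beta$ for every $F \in \mathcal{F}(S)$. For any $K \in \mathcal{F}(S)$, let $F = S - K^\alpha$; then $F \in \mathcal{F}(S)$ and $K \subset S - F^\alpha$, so
\[
P(K^\alpha) = 1 - P(F) \ge 1 - Q(F^\alpha) - \beta \ge Q(K) - \beta,
\]
which implies $Q(K) \le P(K^\alpha) + \beta$ for every $K \in \mathcal{F}(S)$. We have $\rho(P, Q) = \rho(Q, P)$.

Finally, if $T \in \mathcal{P}(S)$, $\rho(P, Q) < \delta$ and $\rho(Q, T) < \varepsilon$, then for every $F \in \mathcal{F}(S)$,
\[
P(F) \le Q(F^\delta) + \delta \le Q(\overline{F^\delta}) + \delta \le T((\overline{F^\delta})^\varepsilon) + \delta + \varepsilon \le T(F^{\delta+\varepsilon}) + \delta + \varepsilon,
\]
so $\rho(P, T) \le \delta + \varepsilon$. It implies $\rho(P, T) \le \rho(P, Q) + \rho(Q, T)$. The proposition is proved.

The following theorem explains why $S$ is required to be a complete separable space.

Theorem 1.1. $(\mathcal{P}(S), \rho)$ is a complete separable space.

Proof. Let $\{x_n\}$ be a countable dense subset of $S$, and let $\delta_x$ denote the Dirac measure at the point $x$, which obviously belongs to $\mathcal{P}(S)$. By the monotone class theorem and the convergence theorem, the probability measures of the form $\sum_{i=1}^{N} a_i \delta_{x_i}$, with $N$ a finite positive integer and $a_i$ nonnegative rational numbers summing to 1, form a dense subset of $\mathcal{P}(S)$. We get the separability of $(\mathcal{P}(S), \rho)$.

For completeness, we consider a sequence $\{P_n\} \subset \mathcal{P}(S)$ with $\rho(P_n, P_{n-1}) < 2^{-n}$ for each $n \ge 2$. We need to find a $P \in \mathcal{P}(S)$ such that $\rho(P_n, P) \to 0$ as $n \to \infty$. The basic idea of the proof is to construct a probability space $(\Omega, \mathcal{F}, \nu)$ and $S$-valued random variables $X$, $X_n$, $n = 1, 2, \cdots$, with distributions $P$ and $P_n$. Suppose we already have $d(X_n, X) \to 0$ in probability as $n \to \infty$ (the limit $X$ will exist by the completeness of $(S, d)$). Then for every $\varepsilon > 0$,
\[
\lim_{n \to \infty} \mu_n(\{(x, y) : d(x, y) \ge \varepsilon\}) = 0,
\]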
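where $\mu_n$ is the joint distribution of $X_n$ and $X$. Then, for every $F \in \mathcal{F}(S)$,
\[
P_n(F) = \mu_n(F \times S) \le \mu_n((F \times S) \cap \{(x, y) : d(x, y) < \varepsilon\}) + \delta \le \mu_n(S \times F^\varepsilon) + \delta = P(F^\varepsilon) + \delta
\]
for given $\delta > 0$ and large $n$. Hence $\rho(P_n, P) \to 0$ as $n \to \infty$.

Now, we construct the probability space and the random variables. Since $\rho(P_n, P_{n-1}) < 2^{-n}$, we first construct $(\Omega, \mathcal{F}, \nu)$, $X_n$ and $X_{n-1}$ such that
\[
\nu(d(X_n, X_{n-1}) \ge 2^{-n+1}) \le 2^{-n+1}
\]
and $X_{n-1}$, $X_n$ have distributions $P_{n-1}$, $P_n$ respectively.

Firstly, choose disjoint sets $E_1^{(n)}, \cdots, E_{N_n}^{(n)} \in \mathcal{B}(S)$ with diameters less than $2^{-n}$ such that $P_{n-1}(E_0^{(n)}) \le 2^{-n}$, where $E_0^{(n)} = S - \bigcup_{i=1}^{N_n} E_i^{(n)}$. Let $p_{n,i} = P_{n-1}(E_i^{(n)})$ and $A_i^{(n)} = (E_i^{(n)})^{2^{-n}}$, $i = 1, \cdots, N_n$. We have
\[
\sum_{i \in I} p_{n,i} \le P_{n-1}\Bigl(\bigcup_{i \in I} E_i^{(n)}\Bigr) \le P_n\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr) + 2^{-n} \tag{1.6}
\]
for all $I \subset \{1, 2, \cdots, N_n\}$. In the following, we will prove that there exist positive Borel measures $\eta_1^{(n)}, \eta_2^{(n)}, \cdots, \eta_{N_n}^{(n)}$ on $S$ such that
\[
\begin{cases}
\eta_i^{(n)}(A_i^{(n)}) = \eta_i^{(n)}(S) \le p_{n,i} & \text{for } i = 1, 2, \cdots, N_n, \\
\sum_{i=1}^{N_n} \eta_i^{(n)}(S) \ge \sum_{i=1}^{N_n} p_{n,i} - 2^{-n}, \\
\sum_{i=1}^{N_n} \eta_i^{(n)}(A) \le P_n(A) & \text{for } A \in \mathcal{B}(S).
\end{cases} \tag{1.7}
\]
We first suppose that (1.6) is replaced by the stronger condition
\[
\sum_{i \in I} p_{n,i} \le P_{n-1}\Bigl(\bigcup_{i \in I} E_i^{(n)}\Bigr) \le P_n\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr). \tag{1.8}
\]
In this case we will prove that there exist positive Borel measures $\eta_1^{(n)}, \eta_2^{(n)}, \cdots, \eta_{N_n}^{(n)}$ on $S$ such that
\[
\begin{cases}
\eta_i^{(n)}(A_i^{(n)}) = \eta_i^{(n)}(S) = p_{n,i} & \text{for } i = 1, 2, \cdots, N_n, \\
\sum_{i=1}^{N_n} \eta_i^{(n)}(A) \le P_n(A) & \text{for all } A \in \mathcal{B}(S).
\end{cases} \tag{1.9}
\]
In fact, we proceed by induction on $N_n$. For $N_n = 1$, define $\eta_1^{(n)}(A) = p_{n,1} P_n(A \cap A_1^{(n)}) / P_n(A_1^{(n)})$ for $A \in \mathcal{B}(S)$. Obviously, $\eta_1^{(n)}$ satisfies the requirement. Suppose that the argument holds for $m = 1, 2, \cdots, N_n - 1$. Define $\lambda(A) = P_n(A \cap A_{N_n}^{(n)}) / P_n(A_{N_n}^{(n)})$, and let $\delta_0$ be the largest $\delta$ such that
\[
\sum_{i \in I} p_{n,i} \le (P_n - \delta\lambda)\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr) \tag{1.10}
\]
for all $I \subset \{1, 2, \cdots, N_n - 1\}$.

When $\delta_0 \ge p_{n,N_n}$, let $\eta_{N_n}^{(n)} = p_{n,N_n}\lambda$ and $P_n' = P_n - \eta_{N_n}^{(n)}$. By the induction hypothesis, there exist positive Borel measures $\eta_1^{(n)}, \eta_2^{(n)}, \cdots, \eta_{N_n-1}^{(n)}$ on $S$ such that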
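$\eta_i^{(n)}(A_i^{(n)}) = \eta_i^{(n)}(S) = p_{n,i}$ for $i = 1, 2, \cdots, N_n - 1$ and $\sum_{i=1}^{N_n-1} \eta_i^{(n)}(A) \le P_n'(A)$ for all $A \in \mathcal{B}(S)$. Since also $\eta_{N_n}^{(n)}(A_{N_n}^{(n)}) = \eta_{N_n}^{(n)}(S) = p_{n,N_n}$, the argument holds.

When $\delta_0 < p_{n,N_n}$, let $P_n' = P_n - \delta_0\lambda$. By the definition of $\delta_0$, there exists a nonempty $I_0 \subset \{1, 2, \cdots, N_n - 1\}$ for which equality holds in (1.10) with $\delta = \delta_0$, that is, $\sum_{i \in I_0} p_{n,i} = P_n'(\bigcup_{i \in I_0} A_i^{(n)})$, while
\[
\sum_{i \in I} p_{n,i} \le P_n'\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr)
\]
for all $I \subset I_0$. By the induction hypothesis, there exist positive Borel measures $\eta_i^{(n)}$ on $S$ such that $\eta_i^{(n)}(A_i^{(n)}) = \eta_i^{(n)}(S) = p_{n,i}$ for $i \in I_0$, and $\sum_{i \in I_0} \eta_i^{(n)}(A) \le P_n'(A)$ for all $A \in \mathcal{B}(S)$. Let $p_{n,i}' = p_{n,i}$ for $i = 1, 2, \cdots, N_n - 1$, $p_{n,N_n}' = p_{n,N_n} - \delta_0$, $B_0 = \bigcup_{i \in I_0} A_i^{(n)}$ and $I_1 = \{1, 2, \cdots, N_n\} - I_0$. Define $\tilde{P}_n(A) = P_n'(A) - P_n'(A \cap B_0)$. Then for every $I \subset I_1$,
\begin{align*}
\sum_{i \in I} p_{n,i}' + P_n'(B_0) &= \sum_{i \in I \cup I_0} p_{n,i}' \le P_n'\Bigl(\bigcup_{i \in I \cup I_0} A_i^{(n)}\Bigr) \\
&= P_n'\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr) + P_n'(B_0) - P_n'\Bigl(\bigcup_{i \in I} A_i^{(n)} \cap B_0\Bigr) \\
&= \tilde{P}_n\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr) + P_n'(B_0).
\end{align*}
Obviously,
\[
\sum_{i \in I} p_{n,i}' \le \tilde{P}_n\Bigl(\bigcup_{i \in I} A_i^{(n)}\Bigr)
\]
for all $I \subset I_1$. By the induction hypothesis, there exist positive Borel measures $\tilde{\eta}_i^{(n)}$ on $S$ such that $\tilde{\eta}_i^{(n)}(A_i^{(n)}) = \tilde{\eta}_i^{(n)}(S) = p_{n,i}'$ for $i \in I_1$, and $\sum_{i \in I_1} \tilde{\eta}_i^{(n)}(A) \le \tilde{P}_n(A)$ for all $A \in \mathcal{B}(S)$. Finally, let $\eta_i^{(n)} = \tilde{\eta}_i^{(n)}$ for $i \in I_1 - \{N_n\}$ and $\eta_{N_n}^{(n)} = \tilde{\eta}_{N_n}^{(n)} + \delta_0\lambda$. Then (1.9) is obtained.

Let $\hat{S} = S \cup \{\Delta\}$, where $\Delta$ is an isolated point. We extend $P_n$ from $S$ to $\hat{S}$ by defining $P_n(\{\Delta\}) = 2^{-n}$. Considering $\hat{A}_i^{(n)} = A_i^{(n)} \cup \{\Delta\}$ instead of $A_i^{(n)}$ and using the same procedure, we easily get (1.7).

Define $c_1^{(n)}, c_2^{(n)}, \cdots, c_{N_n}^{(n)} \in [0, 1]$ by $c_i^{(n)} = (p_{n,i} - \eta_i^{(n)}(S)) / p_{n,i}$. Obviously,
\[
(1 - c_i^{(n)}) P_{n-1}(E_i^{(n)}) = \eta_i^{(n)}(S), \quad i = 1, 2, \cdots, N_n,
\]
and
\[
P_{n-1}(E_0^{(n)}) + \sum_{i=1}^{N_n} c_i^{(n)} P_{n-1}(E_i^{(n)}) = 1 - \sum_{i=1}^{N_n} \eta_i^{(n)}(S).
\]
Then, there exist $Q_0^{(n)}, Q_1^{(n)}, \cdots, Q_{N_n}^{(n)} \in \mathcal{P}(S)$ such that
\[
Q_i^{(n)}(B)(1 - c_i^{(n)}) P_{n-1}(E_i^{(n)}) = \eta_i^{(n)}(B), \quad i = 1, 2, \cdots, N_n, \tag{1.11}
\]
\[
Q_0^{(n)}(B)\Bigl(P_{n-1}(E_0^{(n)}) + \sum_{i=1}^{N_n} c_i^{(n)} P_{n-1}(E_i^{(n)})\Bigr) = P_n(B) - \sum_{i=1}^{N_n} \eta_i^{(n)}(B) \tag{1.12}
\]
for $B \in \mathcal{B}(S)$.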
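Let $X_{n-1}, Y_0^{(n)}, \cdots, Y_{N_n}^{(n)}$ and $\xi^{(n)}$ be independent random variables on a probability space $(\Omega, \mathcal{F}, \nu)$, with $X_{n-1}, Y_0^{(n)}, \cdots, Y_{N_n}^{(n)}$ having distributions $P_{n-1}, Q_0^{(n)}, \cdots, Q_{N_n}^{(n)}$ and $\xi^{(n)}$ being uniformly distributed on $[0, 1]$. Define
\[
X_n =
\begin{cases}
Y_i^{(n)} & \text{on } \{X_{n-1} \in E_i^{(n)},\ \xi^{(n)} \ge c_i^{(n)}\},\ i = 1, 2, \cdots, N_n, \\
Y_0^{(n)} & \text{on } \{X_{n-1} \in E_0^{(n)}\} \cup \bigcup_{i=1}^{N_n} \{X_{n-1} \in E_i^{(n)},\ \xi^{(n)} < c_i^{(n)}\}.
\end{cases}
\]
By (1.11) and (1.12),
\begin{align*}
\nu(X_n \in B) &= \sum_{i=1}^{N_n} Q_i^{(n)}(B)(1 - c_i^{(n)}) P_{n-1}(E_i^{(n)}) + Q_0^{(n)}(B)\Bigl(P_{n-1}(E_0^{(n)}) + \sum_{i=1}^{N_n} c_i^{(n)} P_{n-1}(E_i^{(n)})\Bigr) \\
&= P_n(B),
\end{align*}
which implies that $X_n$ has distribution $P_n$. Furthermore, since
\[
\{X_{n-1} \in E_i^{(n)},\ \xi^{(n)} \ge c_i^{(n)}\} \subset \{X_{n-1} \in E_i^{(n)},\ X_n \in A_i^{(n)}\} \subset \{d(X_{n-1}, X_n) < 2^{-n+1}\},
\]
we have
\begin{align*}
\nu(d(X_{n-1}, X_n) \ge 2^{-n+1}) &\le \nu(X_{n-1} \in E_0^{(n)}) + \nu\Bigl(\bigcup_{i=1}^{N_n} \{X_{n-1} \in E_i^{(n)},\ \xi^{(n)} < c_i^{(n)}\}\Bigr) \\
&\le P_{n-1}(E_0^{(n)}) + \sum_{i=1}^{N_n} c_i^{(n)} P_{n-1}(E_i^{(n)}) \\
&= P_{n-1}(E_0^{(n)}) + \sum_{i=1}^{N_n} (p_{n,i} - \eta_i^{(n)}(S)) \le 2^{-n+1}.
\end{align*}
By the Borel-Cantelli lemma, we have
\[
\nu\Bigl(\sum_{n=2}^{\infty} d(X_{n-1}, X_n) < \infty\Bigr) = 1.
\]
Thus, by the completeness of $(S, d)$, the limit
\[
\lim_{n \to \infty} X_n = X
\]
exists a.s. Let $P$ be the distribution of $X$; we have $\rho(P_n, P) \to 0$ as $n \to \infty$. The theorem is proved.

1.2.2 Weak convergence and portmanteau theorem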
In this subsection, we introduce the definition and basic properties of weak convergence, the central topic of this book.

Definition 1.7. A sequence $P_n \in \mathcal{P}(S)$ is said to converge weakly to $P \in \mathcal{P}(S)$ if
\[
\lim_{n \to \infty} \int f\,dP_n = \int f\,dP \tag{1.13}
\]
for every $f \in C(S)$, denoted by $P_n \Rightarrow P$.
The distribution of an $S$-valued random element $X$ on $(\Omega, \mathcal{F}, P)$ may be denoted by $PX^{-1}$. Obviously, $PX^{-1} \in \mathcal{P}(S)$.

Definition 1.8. A sequence of $S$-valued random elements $X_n$ is said to converge weakly to an $S$-valued random element $X$ if
\[
\lim_{n \to \infty} Ef(X_n) = Ef(X) \tag{1.14}
\]
for every $f \in C(S)$, denoted by $X_n \Rightarrow X$.

The random elements taking values in $C[0, 1]$ and $D[0, 1]$ are stochastic processes with continuous paths and the so-called càdlàg stochastic processes (which are right-continuous and have left-hand limits), respectively. Weak convergence of stochastic processes is the main topic of this book. From now on, unless otherwise mentioned, the space $D[0, 1]$ is endowed with the $J_1$ topology, an $R$-valued càdlàg process is regarded as a random element taking values in this topological space, and we write $\Delta X_t = X_t - X_{t-}$. In fact, weak convergence is equivalent to convergence in the Prohorov metric.

Theorem 1.2. Let $\{P_n\} \subset \mathcal{P}(S)$ and $P \in \mathcal{P}(S)$. The following statements are equivalent:
(i) $\lim_{n \to \infty} \rho(P_n, P) = 0$;
(ii) $P_n \Rightarrow P$;
(iii) $\lim_{n \to \infty} \int f\,dP_n = \int f\,dP$ for all uniformly continuous bounded $f \in C(S)$;
(iv) $\limsup_{n \to \infty} P_n(F) \le P(F)$ for all $F \in \mathcal{F}(S)$;
(v) $\liminf_{n \to \infty} P_n(G) \ge P(G)$ for all $G \in \mathcal{G}(S)$;
(vi) $\lim_{n \to \infty} P_n(A) = P(A)$ for all $A \in \mathcal{B}(S)$ with $P(\partial A) = 0$.
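Proof. (i) ⇒ (ii). For each $n$, let $\delta_n = \rho(P_n, P) + \frac{1}{n}$. For $f \in C(S)$ with $f \ge 0$, we have
\[
\int f\,dP_n = \int_0^{\|f\|} P_n(\{f > s\})\,ds \le \int_0^{\|f\|} P(\{f > s\}^{\delta_n})\,ds + \delta_n \|f\|,
\]
then
\[
\limsup_{n \to \infty} \int f\,dP_n \le \lim_{n \to \infty} \int_0^{\|f\|} P(\{f > s\}^{\delta_n})\,ds = \int f\,dP. \tag{1.15}
\]
Applying (1.15) to $\|f\| + f$ and $\|f\| - f$, (ii) is obtained.

(ii) ⇒ (iii) is obvious.

(iii) ⇒ (iv). For every $F \in \mathcal{F}(S)$ and $\varepsilon > 0$, define
\[
f_\varepsilon(x) = \Bigl(1 - \frac{d(x, F)}{\varepsilon}\Bigr) \vee 0,
\]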
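where $d(x, F) = \inf_{y \in F} d(x, y)$; then $f_\varepsilon$ is uniformly continuous. By (iii),
$$\limsup_{n\to\infty} P_n(F) \le \lim_{n\to\infty} \int f_\varepsilon\,dP_n = \int f_\varepsilon\,dP \to P(F)$$
as $\varepsilon \downarrow 0$.

(iv)⇒(v). For every $G \in \mathcal{G}(S)$,
$$\liminf_{n\to\infty} P_n(G) \ge 1 - \limsup_{n\to\infty} P_n(G^c) \ge 1 - P(G^c) = P(G)$$
by (iv).

(v)⇒(vi). Let $A \in \mathcal{B}(S)$ with $P(\partial A) = 0$. Then
$$\limsup_{n\to\infty} P_n(A) \le \limsup_{n\to\infty} P_n(\bar{A}) = 1 - \liminf_{n\to\infty} P_n(\bar{A}^c) \le 1 - P(\bar{A}^c) = P(\bar{A}) = P(A)$$
and
$$\liminf_{n\to\infty} P_n(A) \ge \liminf_{n\to\infty} P_n(A - \partial A) \ge P(A - \partial A) = P(A),$$
which imply (vi).

(vi)⇒(ii). Let $f \in C(S)$ and $f \ge 0$. Since $\partial\{f \ge t\} \subset \{f = t\}$, and $P(\{f = t\}) = 0$ for all but at most countably many $t \ge 0$,
$$\lim_{n\to\infty} \int f\,dP_n = \lim_{n\to\infty} \int_0^{\|f\|} P_n(\{f \ge t\})\,dt = \int_0^{\|f\|} P(\{f \ge t\})\,dt = \int f\,dP.$$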
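(v)⇒(i). For any $\varepsilon > 0$, let $E_1, E_2, \cdots \in \mathcal{B}(S)$ be a partition of $S$ into sets with diameters less than $\varepsilon/2$. There exists $n_0$ such that, for $n \ge n_0$,
$$P\Big(\bigcup_{i=1}^{n} E_i\Big) \ge 1 - \varepsilon/2.$$
Let $\mathcal{G}$ be the finite collection of the sets of the form $(\bigcup_{i \in I} E_i)^{\varepsilon/2}$, where $I \subset \{1, 2, \cdots, n_0\}$. Then there exists $n_1$ such that, for every $n \ge n_1$, $P(G) \le P_n(G) + \varepsilon/2$ for every $G \in \mathcal{G}$ by (v). For any $F \in \mathcal{F}(S)$, let
$$F_0 = \bigcup_{i=1}^{n_0} \{E_i : E_i \cap F \ne \emptyset\}.$$
We have
$$P(F) \le P(F_0^{\varepsilon/2}) + \varepsilon/2 \le P_n(F_0^{\varepsilon/2}) + \varepsilon \le P_n(F^{\varepsilon}) + \varepsilon,$$
hence $\rho(P_n, P) \le \varepsilon$ for $n \ge n_1$. (i) is proved.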
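As a small numerical illustration of Definition 1.8 and of item (vi) of Theorem 1.2, the sketch below takes $X_n$ uniform on the grid $\{1/n, 2/n, \dots, 1\}$, which converges weakly to the uniform law on $[0,1]$. The test function $\cos$, the set $A = [0, 1/3]$ and the helper names are illustrative choices, not part of the text.

```python
import math

def expectation_on_grid(f, n):
    """E f(X_n) when X_n is uniform on the grid {1/n, 2/n, ..., 1}."""
    return sum(f(i / n) for i in range(1, n + 1)) / n

def mass_on_grid(indicator, n):
    """P_n(A) for the same X_n, where A is described by its indicator."""
    return sum(1 for i in range(1, n + 1) if indicator(i / n)) / n

# Limit law: X ~ Uniform[0, 1], so E cos(X) = sin(1) and P(X <= 1/3) = 1/3.
# The boundary of A = [0, 1/3] has limit mass zero, as item (vi) requires.
sizes = (10, 100, 1000)
expectation_errors = [abs(expectation_on_grid(math.cos, n) - math.sin(1.0)) for n in sizes]
mass_errors = [abs(mass_on_grid(lambda x: x <= 1 / 3, n) - 1 / 3) for n in sizes]
print(expectation_errors)
print(mass_errors)
```

Both error sequences shrink as $n$ grows, which is what (1.14) and (vi) predict for this particular example.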
Proposition 1.7. If Xn , Yn , n ≥ 1, X are random elements which take values in S, and Xn ⇒ X, d(Xn , Yn ) → 0 as n → ∞, then Yn ⇒ X.
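Proof. Let $F \in \mathcal{F}(S)$ and $F_\varepsilon = \{x : d(x, F) \le \varepsilon\}$. Then
$$P(Y_n \in F) \le P(d(X_n, Y_n) \ge \varepsilon) + P(X_n \in F_\varepsilon).$$
Hence,
$$\limsup_{n\to\infty} P(Y_n \in F) \le \limsup_{n\to\infty} P(X_n \in F_\varepsilon) \le P(X \in F_\varepsilon) \to P(X \in F)$$
as $\varepsilon \downarrow 0$.

Proposition 1.8. Suppose $h$ is a measurable mapping from $S$ to $S'$, and let $D_h$ be the set of discontinuity points of $h$. If $P_n \Rightarrow P$ in $\mathcal{P}(S)$ and $P(D_h) = 0$, then $P_n h^{-1} \Rightarrow P h^{-1}$.

Proof. Let $F \in \mathcal{F}(S')$. Since $P(D_h) = 0$ and $\overline{h^{-1}F} \subset D_h \cup h^{-1}F$,
$$\limsup_{n\to\infty} P_n h^{-1}(F) \le \limsup_{n\to\infty} P_n(\overline{h^{-1}F}) \le P(\overline{h^{-1}F}) = P h^{-1}(F).$$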
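Proposition 1.8 is an extension of the continuous mapping theorem. The following propositions provide some examples of the continuous mapping theorem.

Proposition 1.9. Suppose $X^n$, $n \ge 1$, and $X$ are càdlàg stochastic processes. If $X^n \Rightarrow X$ and $\Delta X_1 = 0$ a.s., then
$$\sup_{0\le t\le 1} |X^n_t| \Rightarrow \sup_{0\le t\le 1} |X_t|, \qquad \sup_{0\le t\le 1} |\Delta X^n_t| \Rightarrow \sup_{0\le t\le 1} |\Delta X_t|.$$

Proof. By Proposition 1.8, we only need to prove that
$$\lim_{n\to\infty} \sup_{0\le t\le 1} |\alpha_n(t)| = \sup_{0\le t\le 1} |\alpha(t)|, \tag{1.16}$$
$$\lim_{n\to\infty} \sup_{0\le t\le 1} |\Delta\alpha_n(t)| = \sup_{0\le t\le 1} |\Delta\alpha(t)|, \tag{1.17}$$
if $\alpha_n, \alpha \in D[0,1]$, $\Delta\alpha(1) = 0$ and $\alpha_n \xrightarrow{J_1} \alpha$ as $n \to \infty$.

Since $\alpha_n \xrightarrow{J_1} \alpha$ as $n \to \infty$, there exists a sequence $\lambda_n$ of continuous one-to-one mappings of $[0,1]$ onto $[0,1]$ such that
$$\lim_{n\to\infty} \sup_{t\in[0,1]} |\alpha_n(\lambda_n(t)) - \alpha(t)| = 0, \qquad \lim_{n\to\infty} \sup_{t\in[0,1]} |\lambda_n(t) - t| = 0.$$
We have
$$\Big| \sup_{0\le t\le 1} |\alpha_n(t)| - \sup_{0\le t\le \lambda_n^{-1}(1)} |\alpha(t)| \Big| = \Big| \sup_{0\le t\le \lambda_n^{-1}(1)} |\alpha_n(\lambda_n(t))| - \sup_{0\le t\le \lambda_n^{-1}(1)} |\alpha(t)| \Big| \le \sup_{0\le t\le \lambda_n^{-1}(1)} |\alpha_n(\lambda_n(t)) - \alpha(t)| \to 0.$$
This shows (1.16), since $\lambda_n^{-1}(1) \to 1$ as $n \to \infty$ and $\Delta\alpha(1) = 0$. Similarly, (1.17) holds true.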
Proposition 1.10. Suppose $X^n$, $n \ge 1$, and $X$ are càdlàg stochastic processes, and $g$ is a continuous function $\mathbb{R} \to \mathbb{R}$ vanishing in a neighborhood of 0. If $X^n \Rightarrow X$, then
$$\Big(X^n, \sum_{s\le\cdot} g(\Delta X^n_s)\Big) \Rightarrow \Big(X, \sum_{s\le\cdot} g(\Delta X_s)\Big).$$
The proof of this proposition is involved and can be found in Jacod and Shiryaev (2003).

1.3 How to verify the weak convergence?

To prove weak convergence of probability measures or of stochastic processes, one first proves the relative compactness of $\{P_n\}$, the sequence of probability measures (or of distributions of the stochastic processes). Relative compactness means that every subsequence $\{P_{n_k}\}$ of $\{P_n\}$ has a further subsequence $\{P_{n_{k_j}}\}$ that converges weakly. The second step is to identify the limit: if the weak limit of $P_{n_{k_j}}$ is the same for every subsequence $\{P_{n_k}\}$, then it is the limit of $\{P_n\}$. In the previous section, we studied the topology of $\mathcal{P}(S)$ and proved that the Prohorov topology is equivalent to the weak topology, that is, the topology generated by weak convergence. Consequently, a characterization of the compact subsets of $\mathcal{P}(S)$ under the Prohorov topology is crucial for the relative compactness of $\{P_n\}$. In this section, we characterize the relative compactness of subsets of $\mathcal{P}(S)$ by the notion of tightness.

1.3.1 Tightness
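Definition 1.9. A subset $\mathcal{M} \subset \mathcal{P}(S)$ is said to be tight if for every $\varepsilon > 0$ there exists a compact set $K \subset S$ such that $P(K) \ge 1 - \varepsilon$ for every $P \in \mathcal{M}$.

Theorem 1.3. (Prohorov Theorem) Let $\mathcal{K} \subset \mathcal{P}(S)$. Then the following are equivalent:
(i) $\mathcal{K}$ is relatively compact.
(ii) $\mathcal{K}$ is tight.

Proof. Since $(\mathcal{P}(S), \rho)$ is complete and separable, the relative compactness of $\mathcal{K}$ is equivalent to its total boundedness.

(i)⇒(ii). Let $\varepsilon > 0$. Since $\mathcal{K}$ is totally bounded, for every $n = 1, 2, \cdots$ there exists a finite subset $T_n \subset \mathcal{K}$ such that $\mathcal{K} \subset \{Q : \rho(Q, P) < \frac{\varepsilon}{2^{n+1}} \text{ for some } P \in T_n\}$. For a single measure $P$ in $\mathcal{P}(S)$, tightness is immediate because $(S, d)$ is complete and separable. In fact, for every $\varepsilon > 0$ there exists $G_n \in \mathcal{G}(S)$ (for instance a finite union of open balls of radius $1/n$) such that $P(G_n) \ge 1 - \frac{\varepsilon}{2^n}$. Let $K$ be the closure of $\bigcap_{n=1}^\infty G_n$; then $P(K) \ge 1 - \sum_{n=1}^\infty \frac{\varepsilon}{2^n} = 1 - \varepsilon$.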
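Then, since $T_n$ is finite, there exists a compact set $K_n$ such that $P(K_n) \ge 1 - \frac{\varepsilon}{2^{n+1}}$ for every $P \in T_n$. Given $Q \in \mathcal{K}$, for each $n = 1, 2, \cdots$ there exists $P_n \in T_n$ such that
$$Q(K_n^{\varepsilon/2^{n+1}}) \ge P_n(K_n) - \frac{\varepsilon}{2^{n+1}} \ge 1 - \frac{\varepsilon}{2^n}.$$
Let $K$ be the closure of $\bigcap_{n=1}^\infty K_n^{\varepsilon/2^{n+1}}$; then $Q(K) \ge 1 - \varepsilon$.

(ii)⇒(i). We only need to show that for given $\delta$ there exists a finite subset $\mathcal{N}$ of $\mathcal{P}(S)$ such that $\mathcal{K} \subset \{Q : \rho(P, Q) < \delta \text{ for some } P \in \mathcal{N}\}$. For $0 < \varepsilon < \delta/2$, there exists a compact set $K$ such that $P(K) \ge 1 - \varepsilon$ for every $P \in \mathcal{K}$. By the compactness of $K$, there exists a subset $\{x_1, x_2, \cdots, x_k\} \subset K$ such that $K^\varepsilon \subset \bigcup_{i=1}^k B_i$, where $B_i$ is the open $\varepsilon$ neighborhood of $x_i$. We construct $\mathcal{N}$ as follows: given $x_0 \in S$ and $n \ge k/\varepsilon$, $\mathcal{N}$ is the collection of the probability measures of the form
$$P = \sum_{i=0}^{k} \Big(\frac{j_i}{n}\Big)\delta_{x_i},$$
where $0 \le j_i \le n$ and $\sum_{i=0}^k j_i = n$.

For fixed $Q \in \mathcal{K}$, let $j_i = [nQ(A_i)]$, where $A_i = B_i - \bigcup_{l=1}^{i-1} B_l$, and $j_0 = n - \sum_{i=1}^k j_i$. For every $F \in \mathcal{F}(S)$, we have
$$Q(F) \le Q\Big(\bigcup_{F\cap A_i \ne \emptyset} A_i\Big) + \varepsilon \le \sum_{F\cap A_i \ne \emptyset} \frac{[nQ(A_i)]}{n} + \frac{k}{n} + \varepsilon \le P(F^{2\varepsilon}) + 2\varepsilon.$$
Hence $\rho(P, Q) \le 2\varepsilon < \delta$.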
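The discretization step in (ii)⇒(i) can be sketched in a few lines. The function below takes the masses $Q(A_1), \dots, Q(A_k)$ and returns the weights $j_0/n, \dots, j_k/n$ of the approximating measure $P$; the function name and the sample masses are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import floor

def discretize(cell_masses, n):
    """Weights (j_0/n, ..., j_k/n) with j_i = [n Q(A_i)] for i >= 1 and
    j_0 = n - sum_i j_i, the leftover mass placed on the extra point x_0."""
    js = [floor(n * q) for q in cell_masses]
    j0 = n - sum(js)
    return [Fraction(j0, n)] + [Fraction(j, n) for j in js]

# Hypothetical masses Q(A_1), Q(A_2), Q(A_3); they sum to 59/60 <= 1.
cell_masses = [Fraction(1, 3), Fraction(1, 4), Fraction(2, 5)]
weights = discretize(cell_masses, n=100)
print(weights)
```

Each weight $j_i/n$ undershoots $Q(A_i)$ by less than $1/n$, so the total discrepancy over the $k$ cells is at most $k/n \le \varepsilon$, which is the term $k/n$ in the display above.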
For the weak convergence of stochastic processes, the tightness of the sequence of distributions is a rather abstract notion. We now give some criteria for the tightness of distributions of càdlàg processes. Let $\alpha \in D[0,1]$ and
$$\omega(\alpha, \theta) = \inf_{r>0}\Big\{\max_{i\le r} \sup_{t_{i-1}\le s,t<t_i} |\alpha(s) - \alpha(t)| : 0 = t_0 < \cdots < t_r = 1,\ \inf_{i\le r}(t_i - t_{i-1}) \ge \theta\Big\}.$$

Lemma 1.1. (Skorohod (1956a)) A subset $A$ of $D[0,1]$ is relatively compact for the $J_1$ topology if and only if
$$\sup_{\alpha\in A} \sup_{s\in[0,1]} |\alpha(s)| < \infty, \qquad \lim_{\theta\downarrow 0} \sup_{\alpha\in A} \omega(\alpha, \theta) = 0. \tag{1.18}$$

Theorem 1.4. The sequence of càdlàg processes $\{X^n\}$ is tight if and only if
(i) for every $\varepsilon > 0$, there exist a positive integer $n_0$ and $K \in \mathbb{R}_+$ such that
$$P\Big(\sup_{t\in[0,1]} |X^n_t| > K\Big) \le \varepsilon$$
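for every $n \ge n_0$;
(ii) for every $\varepsilon > 0$, $\delta > 0$, there exist a positive integer $n_0$ and $\theta > 0$ such that
$$P(\omega(X^n, \theta) \ge \delta) \le \varepsilon$$
for every $n \ge n_0$.

Proof. Sufficiency. Suppose that (i) and (ii) hold. Fix $\varepsilon > 0$ and a positive integer $k$. There exist $K_\varepsilon$ and $\theta_k$ such that
$$P\Big(\sup_{t\in[0,1]} |X^n_t| > K_\varepsilon\Big) \le \frac{\varepsilon}{2}, \qquad P(\omega(X^n, \theta_k) \ge 1/k) \le \frac{\varepsilon}{2^{k+1}}$$
for every $n \ge n_0$. Let $A_\varepsilon = \{\alpha \in D[0,1] : \sup_{t\in[0,1]} |\alpha(t)| \le K_\varepsilon,\ \omega(\alpha, \theta_k) \le 1/k \text{ for all } k \in \mathbb{N}\}$. Obviously,
$$P(X^n \notin A_\varepsilon) \le P\Big(\sup_{t\in[0,1]} |X^n_t| > K_\varepsilon\Big) + \sum_{k\ge 1} P(\omega(X^n, \theta_k) \ge 1/k) \le \varepsilon,$$
and $A_\varepsilon$ is relatively compact in $D[0,1]$ by Lemma 1.1. This implies the tightness of $\{X^n\}$.

Necessity. By Theorem 1.3, for every $\varepsilon > 0$ there exists a compact set $K \subset D[0,1]$ such that $P(X^n \in K) \ge 1 - \varepsilon$ for all $n$. Then for every $\delta > 0$ there exists $\theta > 0$ with $\sup_{\alpha\in K} \omega(\alpha, \theta) \le \delta$, and $T := \sup_{\alpha\in K, t\in[0,1]} |\alpha(t)|$ is finite by Lemma 1.1. Thus,
$$P\Big(\sup_{t\in[0,1]} |X^n_t| > T\Big) \le \varepsilon, \qquad P(\omega(X^n, \theta) \ge \delta) \le \varepsilon$$
for all $n$. These imply (i) and (ii) with $n_0 = 1$.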
In the study of weak convergence for sums of stochastic processes, the following theorem is useful.

Theorem 1.5. Suppose that for all $n, p \in \mathbb{N}$, the sequence of càdlàg processes $\{X^n\}$ has a decomposition
$$X^n = Y^n + Z^{np} + W^{np},$$
such that
(i) the sequence $\{Y^n\}_{n\ge 1}$ is tight;
(ii) the sequence $\{Z^{np}\}_{n\ge 1}$ is tight and there is a sequence $\{a_p\}$ with $\lim_{p\to\infty} a_p = 0$ and
$$\lim_{n\to\infty} P\Big(\sup_{t\in[0,1]} |\Delta Z^{np}_t| \ge a_p\Big) = 0;$$
(iii) for any $\varepsilon > 0$,
$$\lim_{p\to\infty} \limsup_{n\to\infty} P\Big(\sup_{t\in[0,1]} |W^{np}_t| \ge \varepsilon\Big) = 0.$$
Then $\{X^n\}$ is tight.
Proof. It is obvious that condition (i) in Theorem 1.4 is satisfied. Let $\varepsilon, \eta > 0$. There exist $p_0$, $n_0$ and $\theta > 0$ such that $a_p \le \eta$ for $p \ge p_0$, and
$$P(\omega(Y^n, \theta) \ge \eta) \le \varepsilon, \quad P(\omega(Z^{np}, 2\theta) \ge \eta) \le \varepsilon, \quad P\Big(\sup_{t\in[0,1]} |W^{np}_t| \ge \eta\Big) \le \varepsilon, \quad P\Big(\sup_{t\in[0,1]} |\Delta Z^{np}_t| \ge \eta\Big) \le \varepsilon$$
for $n \ge n_0$. Thus $P(\omega(X^n, \theta) > 6\eta) \le 6\varepsilon$, since
$$\omega(X^n, \theta) \le \omega(Y^n + Z^{np}, \theta) + \sup\Big\{\sup_{s\le t,u\le s+\theta} |W^{np}_t - W^{np}_u| : 0 \le s \le s + \theta \le 1\Big\}$$
$$\le \omega(Y^n, \theta) + 2\omega(Z^{np}, \theta) + \sup_{t\in[0,1]} |\Delta Z^{np}_t| + 2\sup_{t\in[0,1]} |W^{np}_t|.$$
Hence (ii) in Theorem 1.4 is satisfied.

1.3.2 Identifying the limit

Suppose that $\{P_n\}$ is relatively compact. Then each subsequence contains a further subsequence converging weakly to some limit $Q$. If all such limits coincide, then $\{P_n\}$ converges weakly to $Q$ as $n \to \infty$. This procedure is called identifying the limit.

Let $X^n = (X^n_t)_{t\in[0,1]}$ and $X = (X_t)_{t\in[0,1]}$ be stochastic processes defined on the probability space $(\Omega, \mathcal{F}, P)$. Let $\pi_{t_1\cdots t_k}$ be the mapping that carries the point $x \in C[0,1]$ or $D[0,1]$ to the point $(x(t_1), \cdots, x(t_k))$ of $\mathbb{R}^k$.

Proposition 1.11. Suppose that $X^n = (X^n_t)_{t\in[0,1]}$, $n \ge 1$, are continuous-path processes on $(\Omega, \mathcal{F}, P)$ and that $\{X^n\}$ is tight. If the finite dimensional distributions of $\{X^n\}$ converge weakly to those of $X$, then $\{X^n\}$ converges weakly to $X$.

Proof. We assume that $P(X^n_{t_1,\cdots,t_k})^{-1}$ converges weakly to $PX^{-1}_{t_1,\cdots,t_k}$ for every $t_1, t_2, \cdots, t_k \in [0,1]$. Tightness implies that each subsequence $\{X^{n'}\}$ contains a further subsequence $\{X^{n''}\}$ converging weakly to some limit $X'$. If we can prove that $P(X'_{t_1,\cdots,t_k})^{-1} = PX^{-1}_{t_1,\cdots,t_k}$ determines the law of $X'$, we will obtain this proposition. It is enough to prove that the finite-dimensional sets form a determining class of $C[0,1]$ (cf. page 19 of Billingsley (1999)). In fact, when $x \in C[0,1]$, $\pi_{t_1\cdots t_k}$ is obviously a continuous mapping. Define the finite dimensional sets as the sets of the form $\pi^{-1}_{t_1\cdots t_k}H$ with $H \in \mathcal{B}(\mathbb{R}^k)$. Then $\pi^{-1}_{t_1\cdots t_k}H \in \mathcal{B}(C[0,1])$, since $\pi_{t_1\cdots t_k}$ is continuous. The closed sphere $\{y : \delta(x, y) \le \varepsilon\}$ is the limit of the finite dimensional sets $\{y : |x(\frac{i}{n}) - y(\frac{i}{n})| \le \varepsilon,\ i = 1, 2, \cdots, n\}$. Since $C[0,1]$ is separable and complete, the finite dimensional sets generate $\mathcal{B}(C[0,1])$.
However, $\pi_t$ is not always continuous on $D[0,1]$, as the following lemma shows.

Lemma 1.2. For $x \in D[0,1]$, $\pi_t$ is continuous at $x$ if and only if $x$ is continuous at $t$.

Proof. If $x_n \xrightarrow{J_1} x$ and $x$ is continuous at $t$, then there exists $\lambda_n$ such that
$$|x_n(t) - x(t)| \le |x_n(t) - x(\lambda_n^{-1}(t))| + |x(\lambda_n^{-1}(t)) - x(t)| \to 0$$
as $n \to \infty$, i.e., $\pi_t$ is continuous at $x$. On the other hand, suppose that $x$ is discontinuous at $t$. Let $\lambda_n(t) = t - 1/n$, and let $\lambda_n$ be linear on $[0, t]$ and on $[t, 1]$. Let $x_n(\cdot) = x(\lambda_n(\cdot))$; then $x_n \xrightarrow{J_1} x$, but $x_n(t)$ does not converge to $x(t)$.

We assume that $X$ is a càdlàg process on $[0,1]$ and that $P(\Delta X_t \ne 0) = 0$ for every $t$. Such an $X$ is called a process without fixed discontinuity points. Arguing as in Proposition 1.11, we obtain the following proposition.

Proposition 1.12. Suppose that $\{X^n = (X^n_t)_{t\in[0,1]}\}$ are càdlàg processes on $(\Omega, \mathcal{F}, P)$ and that $\{X^n\}$ is tight. If the finite dimensional distributions of $\{X^n\}$ converge weakly to those of $X$, then $\{X^n\}$ converges weakly to $X$.

The processes without fixed discontinuity points include many important examples of stochastic processes, such as Lévy processes. The previous two propositions give important rules for identifying the limiting process. Usually, when the limiting process is a Lévy process, one can compute its finite dimensional distributions, and convergence of finite dimensional distributions is then the tool for identifying the limit. However, when the limiting process is a more general diffusion process, its finite dimensional distributions are difficult to compute. They are determined by the generator of the diffusion process, and the convergence of finite dimensional distributions is equivalent to the convergence of the martingale characteristics for the generator. Thus convergence of the martingale characteristics for the generator can identify the limiting process (cf. page 226 of Ethier and Kurtz (1986)). Moreover, when the limiting process is a semimartingale, its predictable characteristics play a role similar to that of the martingale characteristics for the generator. More details can be found in Chapter 3 of this book.
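The counterexample in the proof of Lemma 1.2 can be checked numerically. The sketch below uses the hypothetical path $x = 1_{[1/2, 1]}$, which jumps at $t_0 = 1/2$, and the piecewise-linear time change $\lambda_n$ with $\lambda_n(t_0) = t_0 - 1/n$. Because $x_n(\lambda_n^{-1}(s)) = x(s)$ exactly, the quantity $\sup_s |\lambda_n(s) - s|$ printed below is an upper bound for the $J_1$ distance between $x_n$ and $x$.

```python
def make_time_change(t0, n):
    """Increasing piecewise-linear bijection of [0, 1] with lam(t0) = t0 - 1/n.
    Requires 0 < t0 < 1 and n > 1/t0."""
    def lam(s):
        if s <= t0:
            return s * (t0 - 1.0 / n) / t0
        return (t0 - 1.0 / n) + (s - t0) * (1.0 - t0 + 1.0 / n) / (1.0 - t0)
    return lam

def x(s):
    """A cadlag path with a single jump at t0 = 1/2 (illustrative choice)."""
    return 1.0 if s >= 0.5 else 0.0

t0 = 0.5
grid = [k / 1000 for k in range(1001)]
for n in (4, 10, 100):
    lam = make_time_change(t0, n)
    x_n_at_t0 = x(lam(t0))                     # x_n(t0) = x(t0 - 1/n)
    time_distortion = max(abs(lam(s) - s) for s in grid)
    print(n, x_n_at_t0, x(t0), time_distortion)
```

The time distortion shrinks like $1/n$ while $x_n(t_0)$ stays at 0 and $x(t_0) = 1$, so the projection $\pi_{t_0}$ is not continuous at this $x$.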
1.4 Two examples of applications of weak convergence

In this section, we present two examples of applications of weak convergence.
1.4.1 Unit root testing
Methods for detecting the presence of a unit root in parametric time series models have attracted a good deal of interest. A field of applications where the hypothesis of a unit root has important implications is econometrics. This is because a unit root is often a theoretical implication of models which postulate the rational use of information that is available to economic agents.

Let $Y_i = \alpha Y_{i-1} + X_i$, where $\{X_i\}$ is a mean zero stationary time series. Unit root testing aims to test whether $\alpha = 1$ on the basis of $\{Y_i\}$, and it can also be used to test the stationarity of $\{Y_i\}$. Usually, we use the statistic
$$\hat\alpha = \frac{\sum_{i=1}^n Y_{i-1}Y_i}{\sum_{i=1}^n Y_{i-1}^2}$$
for unit root testing.

Theorem 1.6. Assume $\frac{1}{\sqrt{n}} \sum_{i=1}^{[n\cdot]} X_i \Rightarrow \sigma^2 W(\cdot)$, where $W$ is a standard Brownian motion. We have
$$n(\hat\alpha - 1) \xrightarrow{d} \frac{\lambda + \sigma^2 \int_0^1 W(v)\,dW(v)}{\sigma^2 \int_0^1 W^2(v)\,dv}.$$
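The statistic $\hat\alpha$ is straightforward to compute. The sketch below evaluates it on a simulated random walk; the Gaussian innovations, the sample size, the seed and the function names are assumptions made only for this illustration and are not prescribed by the text.

```python
import random

def alpha_hat(y):
    """Least-squares ratio sum Y_{i-1} Y_i / sum Y_{i-1}^2 for y = (Y_0, ..., Y_n)."""
    numerator = sum(y[i - 1] * y[i] for i in range(1, len(y)))
    denominator = sum(y[i - 1] ** 2 for i in range(1, len(y)))
    return numerator / denominator

def simulate_ar1(alpha, n, rng):
    """Y_i = alpha * Y_{i-1} + X_i with Y_0 = 0 and i.i.d. N(0, 1) innovations X_i."""
    y = [0.0]
    for _ in range(n):
        y.append(alpha * y[-1] + rng.gauss(0.0, 1.0))
    return y

rng = random.Random(0)                 # fixed seed, only for reproducibility
y = simulate_ar1(1.0, 500, rng)        # unit root case alpha = 1
n = len(y) - 1
statistic = n * (alpha_hat(y) - 1.0)   # the normalised quantity in Theorem 1.6
print(alpha_hat(y), statistic)
```

Repeating the simulation over many seeds gives an empirical picture of the non-normal limit law described in Theorem 1.6.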
The proof of this theorem will be given in Chapter 3.

1.4.2 Goodness-of-fit testing for volatility
In general, the diffusion process $(X_t)_{t>0}$ is a solution of the following stochastic differential equation:
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t,$$
where $(W_t)_{t>0}$ is a standard Brownian motion. The specification of a parametric form for the volatility $\sigma(t, X_t)$ is an interesting statistical problem. Consider the null hypothesis
$$H_0 : \sigma(s, X_s) = \sum_{j=1}^d \theta_j\sigma_j(s, X_s),$$
where $\theta = (\theta_1, \theta_2, \cdots, \theta_d)$ are unknown parameters and $\sigma_1, \sigma_2, \cdots, \sigma_d$ are given and known volatility functions. For testing the null hypothesis, we consider the following stochastic process
$$M_t := \int_0^t \Big\{\sigma(s, X_s) - \sum_{j=1}^d \theta_j^{\min}\sigma_j(s, X_s)\Big\}\,ds,$$
where the vector $\theta^{\min} = (\theta_1^{\min}, \cdots, \theta_d^{\min})$ is defined by
$$\theta^{\min} := \arg\min_{\theta\in\mathbb{R}^d} \int_0^1 \Big\{\sigma(s, X_s) - \sum_{j=1}^d \theta_j\sigma_j(s, X_s)\Big\}^2 ds.$$
Based on estimates of certain integrals of the volatility, we can construct the empirical value $\hat{M}_t$ of $M_t$ (cf. Dette and Podolskij (2008)). Dette and Podolskij (2008) have proved
$$A_n := \sqrt{n}\hat{M} \Rightarrow A$$
on $D[0,1]$, where $A$ is a stochastic integral. Based on this weak convergence, in the case of the Kolmogorov-Smirnov statistic,
$$\sqrt{n} \sup_{t\in[0,1]} |\hat{M}_t| \xrightarrow{d} \sup_{t\in[0,1]} |A_t|$$
by Proposition 1.8. Moreover, consider local alternatives of the form
$$H_1^{(n)} : \sigma(t, X_t) = \sum_{j=1}^d \theta_j\sigma_j(t, X_t) + n^{-1/2}h(t, X_t).$$
Under $H_1^{(n)}$, Dette and Podolskij (2008) show
$$\sqrt{n}\hat{M} \Rightarrow \tilde{A} \tag{1.19}$$
on $D[0,1]$, where
$$\tilde{A}_t = A_t + \int_0^t h(s, X_s)\,ds - t\int_0^1 h(s, X_s)\,ds.$$
The goodness-of-fit test based on (1.19) is more powerful with respect to the Pitman alternatives than the test which rejects the null hypothesis of homoscedasticity for large values of the statistic $\hat{M}_t$.
April 23, 2014
10:51
World Scientific Book - 9.75in x 6.5in
linwang
Chapter 2
Convergence to the Independent Increment Processes
In the previous chapter, we introduced the general criteria for the weak convergence of stochastic processes. When the limiting process is an independent increment process, the sufficient conditions for weak convergence can be weakened. In this chapter, we discuss these in detail.

2.1 The basic conditions of convergence to the Gaussian independent increment processes

Brownian motion is a very important example of an independent increment process, and it is the limiting process in many weak convergence theorems. It is therefore of interest to study convergence to Brownian motion. For simplicity, we first study weak convergence of random elements to Brownian motion in the space $C[0,1]$, and then extend the result to $D[0,1]$. In 1.1.1, we defined the local uniform topology in $C[0,1]$ through
$$\rho(\alpha, \beta) = \sup_{s\in[0,1]} |\alpha(s) - \beta(s)|$$
and we now give necessary and sufficient conditions for tightness in $(C[0,1], \mathcal{B}(C[0,1]))$. Let $P_n$ be probability measures on $(C[0,1], \mathcal{B}(C[0,1]))$. We have:

Theorem 2.1. $\{P_n\}_{n\ge 1}$ is tight if and only if (iff)
(i) for every $\varepsilon > 0$, there exist $a > 0$ and $n_0$ such that
$$P_n\{x : |x(0)| \ge a\} \le \varepsilon$$
for $n \ge n_0$;
(ii) for every $\varepsilon > 0$, $\eta > 0$, there exist $\delta$, $0 < \delta < 1$, and $n_0$ such that
$$P_n\Big\{x : \sup_{|s-t|\le\delta} |x(s) - x(t)| \ge \eta\Big\} \le \varepsilon$$
for $n \ge n_0$.

Proof. Sufficiency. Since $(C[0,1], \mathcal{B}(C[0,1]))$ is separable and complete, for fixed $n$, $\{P_i : i \le n\}$ is obviously tight. For convenience, we may assume that $n_0 = 1$.
From (i) and (ii), for given $\varepsilon > 0$ and integer $k > 0$, there exist $a > 0$ and $\delta_k > 0$ such that
$$P_n\{x : |x(0)| \le a\} \ge 1 - \varepsilon, \qquad P_n\Big\{x : \sup_{|s-t|\le\delta_k} |x(s) - x(t)| \le \frac{1}{k}\Big\} \ge 1 - \frac{\varepsilon}{2^k}$$
for all $n$. Let $K$ be the closure of $\{x : |x(0)| \le a\} \cap \bigcap_{k\ge 1} \{x : \sup_{|s-t|\le\delta_k} |x(s) - x(t)| \le \frac{1}{k}\}$; then $P_n(K) \ge 1 - 2\varepsilon$, so $\{P_n\}$ is tight.

Necessity. Suppose that $\{P_n\}$ is tight. For given $\varepsilon$, there is a compact subset $K$ such that $P_n(K) \ge 1 - \varepsilon$ for all $n$. By the Arzelà-Ascoli theorem, we have $K \subset \{x : |x(0)| \le a\}$ for $a$ large enough and, for given $\eta$, $K \subset \{x : \sup_{|s-t|\le\delta} |x(s) - x(t)| \le \eta\}$ for $\delta$ small enough. (i) and (ii) are proved.

In some situations, for example in some statistical applications, we need to study the weak convergence of random elements which take values in $D[0,1]$. First of all, we discuss the tightness of probability measures on $(D[0,1], \mathcal{B}(D[0,1]))$. For $\alpha \in D[0,1]$, recall
$$\omega(\alpha, \delta) = \inf_{r>0}\Big\{\max_{i\le r} \sup_{s,t\in[t_{i-1},t_i)} |\alpha(s) - \alpha(t)| : 0 = t_0 < \cdots < t_r = 1,\ \inf_{i\le r}(t_i - t_{i-1}) \ge \delta\Big\}.$$
Let $P_n$ be probability measures on $(D[0,1], \mathcal{B}(D[0,1]))$. We have:

Theorem 2.2. $\{P_n\}_{n\ge 1}$ is tight iff
(i) for every $\varepsilon > 0$, there exist $a > 0$ and $n_0$ such that
$$P_n\Big\{x : \sup_{t\in[0,1]} |x(t)| \ge a\Big\} \le \varepsilon$$
for $n \ge n_0$;
(ii) for every $\varepsilon > 0$, $\eta > 0$, there exist $\delta$, $0 < \delta < 1$, and $n_0$ such that
$$P_n\{x : \omega(x, \delta) \ge \eta\} \le \varepsilon$$
for $n \ge n_0$.

Proof. Sufficiency. For fixed $n$, $\{P_i : i \le n\}$ is obviously tight. For convenience, we may assume that $n_0 = 1$. Fix $\varepsilon > 0$ and integer $k > 0$, and let $a_\varepsilon < \infty$ and $\delta_{\varepsilon k} > 0$ satisfy
$$\sup_{n\ge 1} P_n\Big\{x : \sup_{t\in[0,1]} |x(t)| \ge a_\varepsilon\Big\} \le \frac{\varepsilon}{2}, \qquad \sup_{n\ge 1} P_n\Big\{x : \omega(x, \delta_{\varepsilon k}) \ge \frac{1}{k}\Big\} \le \frac{\varepsilon}{2^{k+1}}.$$
Let $A_\varepsilon$ be the closure of
$$\Big\{x : \sup_{t\in[0,1]} |x(t)| \le a_\varepsilon,\ \omega(x, \delta_{\varepsilon k}) \le \frac{1}{k} \text{ for } k \ge 1\Big\},$$
then
$$P_n\{x : x \in A_\varepsilon^c\} \le P_n\Big\{x : \sup_{t\in[0,1]} |x(t)| \ge a_\varepsilon\Big\} + \sum_{k=1}^\infty P_n\Big\{x : \omega(x, \delta_{\varepsilon k}) \ge \frac{1}{k}\Big\} \le \varepsilon.$$
Hence $\{P_n\}$ is tight.

Necessity. Suppose that $\{P_n\}$ is tight. For given $\varepsilon$, there is a compact subset $K$ such that $P_n(K) \ge 1 - \varepsilon$ for all $n$. By Lemma 1.1, the counterpart of the Arzelà-Ascoli theorem in $D[0,1]$, we have $K \subset \{x : \sup_{t\in[0,1]} |x(t)| \le a\}$ for $a$ large enough and, for given $\eta$, $K \subset \{x : \omega(x, \delta) \le \eta\}$ for $\delta$ small enough. (i) and (ii) are proved.

Let us come back to the weak convergence of stochastic processes. When the limiting processes are Gaussian processes with independent and stationary increments, which are processes without fixed discontinuity points, the convergence of finite dimensional distributions identifies the limiting processes by Proposition 1.12.

2.2 Donsker invariance principle

In classical probability theory, the central limit theorem is a fundamental result. It tells us, quite generally, what happens when we sum a large number of random variables, each of which contributes a small amount to the total. In this sense the central limit theorem expresses a universal property in classical probability theory. The central limit theorem describes weak convergence of probability measures on $\mathbb{R}$; in other words, it describes weak convergence of $\mathbb{R}$-valued random elements. It is natural to extend the central limit theorem to more general cases. The Donsker invariance principle is an extension of the central limit theorem to random elements of $C[0,1]$.

2.2.1 Classical Donsker invariance principle
Consider a sequence $\{X_i\}_{i\ge 1}$ of independent and identically distributed (i.i.d.) random variables with $EX_i = 0$ and $EX_i^2 = \sigma^2 < \infty$. Let us consider a continuous partial sum process on $[0,1]$:
$$W^n_t = \frac{1}{\sqrt{n}\sigma} \sum_{i=1}^{[nt]} X_i + (nt - [nt])\frac{X_{[nt]+1}}{\sqrt{n}\sigma}, \tag{2.1}$$
where $[nt]$ is the integer part of $nt$. $W^n$ is a random element in $C[0,1]$.

Theorem 2.3. Let $\{X_i\}_{i\ge 1}$ be i.i.d. random variables with mean 0 and variance $\sigma^2$, and let $W^n_t$ be defined as in (2.1). Then $W^n \Rightarrow W$, where $W$ is a Brownian motion.

Proof. By the previous section, we will obtain this theorem by showing
$$(W^n_{t_1}, \cdots, W^n_{t_k}) \xrightarrow{d} (W_{t_1}, \cdots, W_{t_k}) \tag{2.2}$$
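for every $t_1, \cdots, t_k \in [0,1]$ and
$$\lim_{\delta\to 0} \limsup_{n\to\infty} P\Big[\sup_{|t-s|\le\delta} |W^n_t - W^n_s| \ge \varepsilon\Big] = 0 \tag{2.3}$$
for any $\varepsilon > 0$.

For (2.2), we have
$$(nt - [nt])\frac{X_{[nt]+1}}{\sqrt{n}\sigma} \xrightarrow{P} 0$$
by Chebyshev's inequality, and
$$\frac{1}{\sqrt{n}\sigma} \sum_{i=1}^{[nt]} X_i \xrightarrow{d} \sqrt{t}N$$
by the Lindeberg-Lévy central limit theorem, where $N$ is a random variable with the standard normal distribution, and so
$$W^n_t \xrightarrow{d} W_t$$
for any $t \in [0,1]$. For $s < t$,
$$(W^n_s, W^n_t - W^n_s) = \frac{1}{\sqrt{n}\sigma}\Big(\sum_{i=1}^{[ns]} X_i, \sum_{i=[ns]+1}^{[nt]} X_i\Big) + o_P(1) \xrightarrow{d} (N_1, N_2),$$
where $N_1$ and $N_2$ are independent normal random variables with mean 0 and variances $s$ and $t - s$. Then using the continuous mapping theorem, we have
$$(W^n_s, W^n_t) \xrightarrow{d} (W_s, W_t).$$
Similarly, we can show (2.2).

Let $S_n = \sum_{i=1}^n X_i$. By the central limit theorem, for $k \le n$,
$$P[|S_k| \ge \lambda\sigma\sqrt{n}] \le P[|S_k| \ge \lambda\sigma\sqrt{k}] \to P(|N| \ge \lambda) \le \frac{E|N|^3}{\lambda^3}.$$
Set $E_i = \{\max_{1\le j<i} |S_j| < \lambda\sigma\sqrt{n} \le |S_i|\}$. We have
$$\bigcup_{i=1}^n E_i = \Big\{\max_{1\le i\le n} |S_i| \ge \lambda\sigma\sqrt{n}\Big\}$$
and
$$P\Big[\max_{1\le i\le n} |S_i| \ge \lambda\sigma\sqrt{n}\Big] \le P[|S_n| \ge (\lambda - \sqrt{2})\sigma\sqrt{n}] + \sum_{i=1}^{n-1} P[E_i, |S_n| < (\lambda - \sqrt{2})\sigma\sqrt{n}],$$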
April 23, 2014
10:51
World Scientific Book - 9.75in x 6.5in
Convergence to the Independent Increment Processes
linwang
25
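however for large $n$,
$$\sum_{i=1}^{n-1} P[E_i, |S_n| < (\lambda - \sqrt{2})\sigma\sqrt{n}] \le \sum_{i=1}^{n-1} P[E_i, |S_n - S_i| \ge \sigma\sqrt{2n}] = \sum_{i=1}^{n-1} P[E_i]P[|S_n - S_i| \ge \sigma\sqrt{2n}]$$
$$\le \frac{1}{2}\sum_{i=1}^{n-1} P[E_i] \le \frac{1}{2}P\Big[\max_{1\le i\le n} |S_i| \ge \lambda\sigma\sqrt{n}\Big],$$
thus
$$\lim_{\lambda\to\infty} \limsup_{n\to\infty} \lambda^2 P\Big[\max_{k\le n} |S_k| \ge \lambda\sigma\sqrt{n}\Big] \le 2\lim_{\lambda\to\infty} \limsup_{n\to\infty} \lambda^2 P[|S_n| \ge (\lambda - \sqrt{2})\sigma\sqrt{n}] = 0. \tag{2.4}$$

Next, we prove that (2.4) implies (2.3). Let $x \in C[0,1]$ and $0 = t_0 < t_1 < \cdots < t_v = 1$ with $\min_{1<i<v}(t_i - t_{i-1}) \ge \delta > 0$. Let $I_i = [t_{i-1}, t_i]$. For $s, t \in [0,1]$ with $t - \delta \le s \le t$, there are two cases. If $s$ and $t$ lie in the same $I_i$, write
$$|x(s) - x(t)| \le |x(s) - x(t_{i-1})| + |x(t) - x(t_{i-1})|. \tag{2.5}$$
If $s$ and $t$ lie in adjacent intervals $I_i$ and $I_{i+1}$, write
$$|x(s) - x(t)| \le |x(s) - x(t_{i-1})| + |x(t_i) - x(t_{i-1})| + |x(t) - x(t_i)|. \tag{2.6}$$
From (2.5) and (2.6),
$$\sup_{|s-t|\le\delta} |x(s) - x(t)| \le 3\max_{1\le i\le v} \sup_{t_{i-1}\le s\le t_i} |x(s) - x(t_{i-1})|. \tag{2.7}$$
For convenience, we take $t_i = im/n$ with integer $m$, $n\delta \le m \le n\delta + 1$, for $0 \le i \le v := [n/m] + 1$. Then
$$P\Big[\sup_{|s-t|\le\delta} |W^n_t - W^n_s| \ge 3\varepsilon\Big] \le \sum_{i=1}^v P\Big[\sup_{t_{i-1}\le s\le t_i} |W^n_s - W^n_{t_{i-1}}| \ge \varepsilon\Big]$$
$$= \sum_{i=1}^v P\Big[\max_{(i-1)m\le k\le im} \frac{|S_k - S_{(i-1)m}|}{\sigma\sqrt{n}} \ge \varepsilon\Big] = vP\Big[\max_{k\le m} |S_k| \ge \varepsilon\sigma\sqrt{n}\Big] \le \frac{2}{\delta}P\Big[\max_{k\le m} |S_k| \ge \frac{\varepsilon\sigma}{\sqrt{2\delta}}\sqrt{m}\Big].$$
Let $\lambda = \varepsilon/\sqrt{2\delta}$. Then
$$P\Big[\sup_{|s-t|\le\delta} |W^n_t - W^n_s| \ge 3\varepsilon\Big] \le \frac{4\lambda^2}{\varepsilon^2}P\Big[\max_{k\le m} |S_k| \ge \lambda\sigma\sqrt{m}\Big].$$
Then we can obtain (2.3) through (2.4).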
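The interpolated partial-sum process (2.1) is easy to evaluate directly. The sketch below computes $W^n_t$ for a given sample; the function name and the toy increments are hypothetical, and $\sigma$ is passed in as known.

```python
import math

def donsker_path(xs, sigma, t):
    """W^n_t from (2.1) for increments xs = (X_1, ..., X_n) and 0 <= t <= 1."""
    n = len(xs)
    k = min(int(math.floor(n * t)), n)        # [nt], capped at n for t = 1
    partial_sum = sum(xs[:k])                 # X_1 + ... + X_[nt]
    fraction = n * t - k                      # nt - [nt]
    next_increment = xs[k] if k < n else 0.0  # X_{[nt]+1}, unused when t = 1
    return (partial_sum + fraction * next_increment) / (sigma * math.sqrt(n))

increments = [1.0, -1.0, 1.0, 1.0]            # a toy sample with sigma = 1
path = [donsker_path(increments, 1.0, t) for t in (0.0, 0.375, 0.5, 1.0)]
print(path)
```

Between grid points $i/n$ the path is linear, which is why $W^n$ is a random element of $C[0,1]$ rather than of $D[0,1]$.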
2.2.2 Martingale invariance principle
In time series analysis and statistics, dependent data are very common. Martingale approximation is a useful tool for exploring the large sample properties of dependent data, so martingale limit theorems are often needed. The martingale central limit theorem and invariance principle can be regarded as extensions of the corresponding limit theory for sums of independent random variables.

For $n = 1, 2, \cdots$, let $\{X_{n,i}\}_{1\le i\le k_n}$ be a sequence of square-integrable martingale differences with respect to the filtration $\{\mathcal{F}_{n,i}\}_{1\le i\le k_n}$, where the sub-$\sigma$-field $\mathcal{F}_{n,i}$ is generated by $X_{n,1}, \cdots, X_{n,i}$, and let $S_{n,k} = \sum_{i=1}^k X_{n,i}$. Thus, $\{S_{n,i}, \mathcal{F}_{n,i}, 1 \le i \le k_n, n \ge 1\}$ is a martingale array. Lévy introduced the conditional variance
$$V_n^2 = \sum_{i=1}^{k_n} E(X_{n,i}^2 | \mathcal{F}_{n,i-1})$$
as a counterpart of the variance in the study of limit theory for sums of independent random variables. Moreover, from the Doob-Meyer decomposition of $U_n^2 = \sum_{i=1}^{k_n} X_{n,i}^2$, $V_n^2$ can be regarded as the compensator of $U_n^2$. This means that $\{U_{n,i}^2 - V_{n,i}^2\}$ is a uniformly integrable martingale. From Theorem 2.23 in Hall and Heyde (1980),
$$\max_{1\le i\le k_n} |U_{n,i}^2 - V_{n,i}^2| \xrightarrow{P} 0 \tag{2.8}$$
holds, where $U_{n,i}^2 = \sum_{j=1}^i X_{n,j}^2$ and $V_{n,i}^2 = \sum_{j=1}^i E(X_{n,j}^2 | \mathcal{F}_{n,j-1})$. (2.8) means that the conditional variance $V_{n,i}^2$ may be approximated by $U_{n,i}^2$. If $\{X_{n,i}\}_{1\le i\le k_n}$ are independent random variables with zero means and variances summing to 1, and for all $\varepsilon > 0$,
$$\max_{1\le i\le k_n} P(|X_{n,i}| > \varepsilon) \to 0,$$
Raikov (1938) showed that
$$\sum_{i=1}^{k_n} X_{n,i} \xrightarrow{d} N(0, 1)$$
iff
$$U_n^2 \xrightarrow{P} 1. \tag{2.9}$$
However, when $\{X_{n,i}\}_{1\le i\le k_n}$ is a martingale difference sequence, (2.9) does not always hold, in view of (2.8). Usually, the limit of $U_n^2$ in probability is a random variable, which is quite different from the independent case. Thus, the central limit theorem for martingales is of particular interest. Chatterji (1974), Hall (1977), Rootzén (1977) and Aldous and Eagleson (1978) studied central limit theorems for martingales. Under some conditions, Aldous and Eagleson (1978) showed that
$$S_{n,k_n} \xrightarrow{d} N\eta, \tag{2.10}$$
where $\eta$ is a random variable arising as the limit of $U_n^2$ in probability, and $N$ is a standard normal variable independent of $\eta$. Obviously, $N\eta$ has characteristic function $E\exp(-\frac{1}{2}\eta^2t^2)$. (2.10) is equivalent to
$$\frac{S_{n,k_n}}{U_n^2} \xrightarrow{d} N. \tag{2.11}$$
In this subsection, we introduce the martingale invariance principle, also called the martingale functional central limit theorem. For each $t \in [0,1]$, similar to (2.1), we turn to study the weak convergence of random elements in $C[0,1]$:
$$\zeta_n(t) = \frac{1}{U_n^2}\Big(S_{n,i} + \frac{(tU_n^2 - U_{n,i}^2)}{X_{n,i+1}^2}X_{n,i+1}\Big), \qquad \text{if } U_{n,i}^2 \le tU_n^2 < U_{n,i+1}^2.$$
However, $\zeta_n(t)$ is quite complex, so we turn to a simpler case. Let
$$\beta_n(t) = \sum_{i=1}^{k_n} X_{n,i}1_{\{U_{n,i}^2/U_n^2\le t\}} \tag{2.12}$$
and
$$\xi_n(t) = \frac{1}{U_n^2}\beta_n(t). \tag{2.13}$$
$\xi_n$ and $\beta_n$ are random elements in $D[0,1]$.

Theorem 2.4. (Martingale invariance principle) Suppose that
$$\lim_{n\to\infty} E\Big(\max_{1\le i\le k_n} X_{n,i}^2\Big) = 0 \tag{2.14}$$
and there exist variables $\eta_n$ adapted to the $\sigma$-fields $\mathcal{F}_{n,1}$ such that
$$U_n^2 - \eta_n \xrightarrow{P} 0. \tag{2.15}$$
If
$$\lim_{\delta\to\infty} \liminf_{n\to\infty} P(U_n^2 > \delta) = 0 \tag{2.16}$$
and
$$\sum_{i=1}^{k_n} |E(X_{n,i} | \mathcal{F}_{n,i-1})| \xrightarrow{P} 0, \tag{2.17}$$
then $\xi_n \Rightarrow W$, where $W$ is a standard Brownian motion. If
$$\frac{\max_{1\le i\le k_n} X_{n,i}^2}{U_n^2} \xrightarrow{P} 0, \tag{2.18}$$
$$U_n^2 \xrightarrow{d} \eta, \tag{2.19}$$
where $\eta$ is a random variable, and
$$\frac{1}{U_n^2}\sum_{i=1}^{k_n} |E(X_{n,i} | \mathcal{F}_{n,i-1})| \xrightarrow{P} 0, \tag{2.20}$$
then
$$(\beta_n, U_n^2) \Rightarrow ((\eta')^{1/2}W, \eta'), \tag{2.21}$$
where $\eta'$ is a copy of $\eta$, independent of $W$.
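The time-changed process $\beta_n$ can be sketched as follows, reading (2.12) as $\beta_n(t) = \sum_i X_{n,i}1_{\{U_{n,i}^2/U_n^2 \le t\}}$: an increment $X_{n,i}$ enters the sum once the normalised quadratic-variation clock $U_{n,i}^2/U_n^2$ has reached $t$. The function name and the toy row are hypothetical.

```python
def beta_n(xs, t):
    """beta_n(t): sum of the X_{n,i} whose clock U_{n,i}^2 / U_n^2 is at most t."""
    total = sum(x * x for x in xs)          # U_n^2
    if total == 0:
        return 0.0
    running, value = 0.0, 0.0
    for x in xs:
        running += x * x                    # U_{n,i}^2
        if running / total <= t:
            value += x
    return value

row = [1.0, -1.0, 2.0]                      # one toy row X_{n,1}, X_{n,2}, X_{n,3}
values = [beta_n(row, t) for t in (0.1, 0.2, 0.5, 1.0)]
print(values)
```

The path is a step function of $t$ that jumps at the times $U_{n,i}^2/U_n^2$, which is why $\beta_n$ is treated as a random element of $D[0,1]$.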
Proof. We first prove that (2.14), (2.15) and (2.18)–(2.20) are sufficient for (2.21). Note that (2.21) is equivalent to the pair of conditions:
(i) for all sequences $0 = t_0 < t_1 < \cdots < t_p \le 1$ and real numbers $z_1, z_2, \cdots, z_p, s$,
$$\lim_{n\to\infty}\Big|E\exp\Big[i\sum_{k=1}^p z_k(\beta_n(t_k) - \beta_n(t_{k-1})) + isU_n^2\Big] - E\exp\Big[-\frac{1}{2}\eta\sum_{k=1}^p z_k^2(t_k - t_{k-1}) + is\eta\Big]\Big| = 0; \tag{2.22}$$
(ii) the sequence of random elements $\{\beta_n, U_n^2, n \ge 1\}$ is tight.

Suppose first that for some $\lambda > 0$, $P(\eta > \lambda) = 0$. Then $\eta_n$ can be chosen such that $P(\eta_n > 2\lambda) = 0$ and (2.15) is satisfied. Fix $\Delta > 0$, $\delta > 0$, and define
$$\eta_n(\delta) = \delta 1_{\{\eta_n\le\delta\}} + \eta_n 1_{\{\eta_n>\delta\}}$$
and
$$\beta_n(\delta, t) = \sum_{i=1}^{k_n} X_{n,i}1_{\{U_{n,i-1}^2/\eta_n(\delta)\le t\}}.$$
Define the sets
$$F_n = \Big\{\max_{i\le k_n} X_{n,i}^2 \le \frac{\Delta}{2}U_n^2;\ |U_n^2/\eta_n(\delta) - 1| \le \frac{\Delta}{2}\Big\}$$
and
$$G_n = \{|U_n^2 - \eta_n| \le \Delta;\ \eta_n \le \delta\}.$$
We need the following lemma.

Lemma 2.1. For all $\varepsilon > 0$ and $t \in [0,1]$,
$$\lim_{\delta\to 0} \limsup_{n\to\infty} P(|\beta_n(t) - \beta_n(\delta, t)| > \varepsilon) = 0.$$
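Proof of Lemma 2.1. Note that
$$|U_{n,i}^2/\eta_n - U_{n,i-1}^2/\eta_n(\delta)| \le X_{n,i}^2/\eta_n + |\eta_n/\eta_n(\delta) - 1|,$$
and so,
$$1_{\{U_{n,i-1}^2/\eta_n(\delta)\le t-\Delta\}} \le 1_{\{U_{n,i-1}^2/U_n^2\le t\}} \le 1_{\{U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}$$
on the set $F_n$. Hence
$$|\beta_n(t) - \beta_n(\delta, t)| = \Big|\sum_{i=1}^{k_n} X_{n,i}\{1_{\{U_{n,i-1}^2/U_n^2\le t\}} - 1_{\{U_{n,i-1}^2/\eta_n(\delta)\le t\}}\}\Big|$$
$$\le \max_{1\le m<k\le k_n}\Big|\sum_{i=m}^{k} X_{n,i}1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}\Big| \le 2\max_{1\le m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}\Big| \tag{2.23}$$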
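on $F_n$. Furthermore, on the set $G_n$,
$$|\beta_n(t)| \vee |\beta_n(\delta, t)| \le \max_{1\le m\le k_n}\Big|\sum_{i=1}^m X_{n,i}\Big| = \max_{1\le m\le k_n}\Big|\sum_{i=1}^m X_{n,i}1_{\{U_{n,i-1}^2\le\Delta+\delta\}}\Big|.$$
Therefore,
$$P(|\beta_n(t) - \beta_n(\delta, t)| > \varepsilon) \le P\Big(\max_{1\le m\le k_n}\Big|\sum_{i=1}^m X_{n,i}1_{\{U_{n,i-1}^2\le\Delta+\delta\}}\Big| \ge \frac{\varepsilon}{2}\Big)$$
$$+ P\Big(\max_{1\le m\le k_n}\Big|\sum_{i=1}^m X_{n,i}1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}\Big| \ge \frac{\varepsilon}{4}\Big) + P(F_n^c \cap G_n^c).$$
(2.15) and (2.18) imply that the first two terms on the right hand side are $o(1)$. For the last term,
$$P(F_n^c \cap G_n^c) \le P\Big(\max_{i\le k_n} X_{n,i}^2 > \frac{\Delta}{2}U_n^2\Big) + P\Big(|U_n^2/\eta_n(\delta) - 1| > \frac{\Delta}{2};\ \eta_n > \delta\Big),$$
where
$$P\Big(|U_n^2/\eta_n(\delta) - 1| > \frac{\Delta}{2};\ \eta_n > \delta\Big) \le P\Big(|U_n^2 - \eta_n| > \frac{\Delta\delta}{2}\Big) \to 0$$
by (2.15). Hence,
$$P(F_n^c \cap G_n^c) = o(1). \tag{2.24}$$
Set
$$P_{n,m} = \sum_{i=1}^m X_{n,i}1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}, \qquad Q_{n,m} = \sum_{i=1}^m X_{n,i}1_{\{U_{n,i-1}^2\le\Delta+\delta\}}.$$
Obviously, $(P_{n,m}, \mathcal{F}_{n,m})_{m\ge 1}$ and $(Q_{n,m}, \mathcal{F}_{n,m})_{m\ge 1}$ are martingales. Applying Kolmogorov's inequality for martingales, we have
$$P\Big(\max_{1\le m\le k_n}\Big|\sum_{i=1}^m X_{n,i}1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}\Big| \ge \frac{\varepsilon}{4}\Big) \le 16\varepsilon^{-2}E\Big[\sum_{i=1}^{k_n} X_{n,i}^2 1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}\Big]$$
$$\le 32\varepsilon^{-2}E\Big[\frac{\lambda}{\eta_n(\delta)}\sum_{i=1}^{k_n} X_{n,i}^2 1_{\{t-\Delta\le U_{n,i-1}^2/\eta_n(\delta)\le t+\Delta\}}\Big] \le 32\varepsilon^{-2}\lambda E\Big[2\Delta + \frac{\max_{i\le k_n} X_{n,i}^2}{\delta}\Big] = 64\varepsilon^{-2}\lambda\Delta + o(1).$$
Similarly, we can obtain
$$P\Big(\max_{1\le m\le k_n}\Big|\sum_{i=1}^m X_{n,i}1_{\{U_{n,i-1}^2\le\Delta+\delta\}}\Big| \ge \frac{\varepsilon}{2}\Big) \le 4\varepsilon^{-2}(\Delta + \delta) + o(1).$$
Hence
$$P(|\beta_n(t) - \beta_n(\delta, t)| > \varepsilon) \le 64\varepsilon^{-2}\lambda\Delta + 4\varepsilon^{-2}(\Delta + \delta) + o(1).$$
Let $n \to \infty$ and then $\Delta \to 0$, $\delta \to 0$ to establish (2.23).

In view of (2.15) and (2.23), (2.22) will hold if we show that
$$\lim_{n\to\infty}\Big|E\exp\Big[i\sum_{k=1}^p z_k(\beta_n(\delta, t_k) - \beta_n(\delta, t_{k-1})) + is\eta_n\Big] - E\exp\Big[-\frac{1}{2}\eta\sum_{k=1}^p z_k^2(t_k - t_{k-1}) + is\eta\Big]\Big| = 0. \tag{2.25}$$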
Define
$$A_{n,i} = A_{n,i}(\delta) = X_{n,i}\sum_{k=1}^p z_k 1_{\{t_{k-1}<U_{n,i-1}^2/\eta_n(\delta)\le t_k\}}, \qquad 1 \le i \le k_n,$$
$$B_n^2 = B_n^2(\delta) = \sum_{i=1}^{k_n} A_{n,i}^2 = \sum_{i=1}^{k_n} X_{n,i}^2\sum_{k=1}^p z_k^2 1_{\{t_{k-1}<U_{n,i-1}^2/\eta_n(\delta)\le t_k\}}$$
and
$$T_n = T_n(\delta) = \prod_{i=1}^{k_n}(1 + iA_{n,i}), \qquad W_n = W_n(\delta) = \exp\Big(-\frac{1}{2}B_n^2 + \sum_{i=1}^{k_n} r(A_{n,i}) + is\eta_n\Big).$$
Write $e^{ix} = (1 + ix)\exp(-\frac{1}{2}x^2 + r(x))$, where $|r(x)| \le |x|^3$ for $|x| < 1$. Let
$$w_n = \exp\Big(-\frac{1}{2}\sigma^2\eta_n + is\eta_n\Big), \qquad \sigma^2 = \sum_{k=1}^p z_k^2(t_k - t_{k-1})$$
and $\psi = E\exp(-\frac{\eta}{2}\sigma^2 + is\eta)$. Then
$$E\exp\Big[i\sum_{k=1}^p z_k(\beta_n(\delta, t_k) - \beta_n(\delta, t_{k-1})) + is\eta_n\Big] - E\exp\Big[-\frac{1}{2}\eta\sum_{k=1}^p z_k^2(t_k - t_{k-1}) + is\eta\Big]$$
$$= E(T_nW_n) - \psi = ET_n(W_n - w_n) + E(T_n - 1)w_n + E(w_n) - \psi$$
and
$$|E(T_n - 1)w_n| \le E|E(T_n | \mathcal{F}_{n,1}) - 1| = 0.$$
Moreover, $E(w_n) \to \psi$ as $n \to \infty$, by (2.15), (2.18) and the fact that the functions $f(u) = \exp(-\frac{1}{2}\sigma^2u)$ and $g(u) = \exp(isu)$ are uniformly bounded and continuous on $[0, \infty)$. Now, we show
$$\lim_{\delta\to 0} \limsup_{n\to\infty} E|T_n(\delta)(W_n(\delta) - w_n)| = 0. \tag{2.26}$$
First let us prove

Lemma 2.2. For all $\varepsilon > 0$,
$$\lim_{\delta\to 0} \limsup_{n\to\infty} P(|W_n(\delta) - w_n| > \varepsilon) = 0. \tag{2.27}$$
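Proof of Lemma 2.2. Let $z = \max_{1\le k\le p} |z_k|$. We have
$$\Big|\sum_{i=1}^{k_n} r(A_{n,i})\Big| \le \sum_{i=1}^{k_n} |A_{n,i}|^3 \le B_n^2\max_{i\le k_n} |A_{n,i}| \le z^3U_n^2\max_{i\le k_n} |X_{n,i}| \xrightarrow{P} 0$$
as $n \to \infty$. Furthermore,
$$\sum_{k=1}^p z_k^2\sum_{i=1}^{k_n} X_{n,i}^2 1_{\{t_{k-1}+\Delta<U_{n,i}^2/U_n^2\le t_k-\Delta\}} \le B_n^2 \le \sum_{k=1}^p z_k^2\sum_{i=1}^{k_n} X_{n,i}^2 1_{\{t_{k-1}-\Delta<U_{n,i}^2/U_n^2\le t_k+\Delta\}}.$$
In view of (2.18), for each $t \in [0,1]$,
$$D_n^2(t) := \frac{1}{U_n^2}\sum_{i=1}^{k_n} X_{n,i}^2 1_{\{U_{n,i}^2/U_n^2\le t\}} \xrightarrow{P} t.$$
If $t > 1$, then $D_n^2(t) = 1$, and if $t < 0$, $D_n^2(t) = 0$. If $\Delta$ is so small that each $t_{k-1} + \Delta \le t_k - \Delta$, then we have
$$\sum_{k=1}^p z_k^2[D_n^2(t_k - \Delta) - D_n^2(t_{k-1} + \Delta)] \le B_n^2/U_n^2 \le \sum_{k=1}^p z_k^2[D_n^2(t_k + \Delta) - D_n^2(t_{k-1} - \Delta)].$$
Hence on the set $F_n \cap \{\max_{0\le k\le p} |D_n^2(t_k \pm \Delta) - (t_k \pm \Delta)| \le \Delta\} \cap \{U_n^2 \le 2\lambda\}$,
$$U_n^2\sigma^2 - 8pz^2\Delta\lambda \le B_n^2 \le U_n^2\sigma^2 + 8pz^2\Delta\lambda.$$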
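Let $\varepsilon>0$ and choose $\Delta$ so small that $8pz^2\Delta\lambda<\varepsilon$. Then
\[
P(|B_n^2-U_n^2\sigma^2|>\varepsilon;\,F_n) \le P\Big(\max_{0\le k\le p}|D_n^2(t_k\pm\Delta)-(t_k\pm\Delta)|>\Delta\Big) + P(U_n^2>2\lambda),
\]
where $P(U_n^2>2\lambda)\to 0$, since $U_n^2\xrightarrow{d}\eta\le\lambda$ a.s., and so for all sufficiently small $\Delta$ and all $\delta>0$,
\[
\lim_{n\to\infty} P(|B_n^2(\delta)-U_n^2\sigma^2|>\varepsilon;\,F_n(\delta,\Delta)) = 0.
\]
On the set $G_n$, $U_n^2\le\Delta+\delta$ and so $|B_n^2-U_n^2\sigma^2|\le 2z^2U_n^2\le 2z^2(\Delta+\delta)$. Choose $\Delta$ and $\delta$ so small that $2z^2(\Delta+\delta)<\varepsilon$. Then for all $n$,
\[
P(|B_n^2-U_n^2\sigma^2|>\varepsilon;\,G_n(\delta,\Delta)) = 0.
\]
Thus, for all sufficiently small $\Delta$ and $\delta$,
\[
P(|B_n^2-U_n^2\sigma^2|>\varepsilon) = o(1).
\]
Then we have (2.27) by (2.15).

Now we show (2.26). Let
\[
C_n = \sum_{i=1}^{k_n} A_{n,i} = \sum_{k=1}^{p} z_k(\beta_n(\delta,t_k)-\beta_n(\delta,t_{k-1})),\qquad I_n = I_n(\delta) = \exp(iC_n+is\eta_n).
\]
For $\varepsilon>0$,
\[
E|T_n(\delta)(W_n(\delta)-w_n)| \le (E|T_n|^2)^{1/2}\varepsilon + \int_{\{|W_n-w_n|>\varepsilon\}}|T_n(W_n-w_n)|\,dP
\]
\[
\le (E|T_n|^2)^{1/2}\varepsilon + \int_{\{|W_n-w_n|>\varepsilon\}}(|I_n|+|T_nw_n|)\,dP
\]
\[
\le (E|T_n|^2)^{1/2}\varepsilon + P(|W_n-w_n|>\varepsilon) + (E|T_n|^2)^{1/2}(P(|W_n-w_n|>\varepsilon))^{1/2}
\]
\[
= (E|T_n|^2)^{1/2}\big(\varepsilon + (P(|W_n-w_n|>\varepsilon))^{1/2}\big) + P(|W_n-w_n|>\varepsilon).
\]
Let $J_n=\max\{i\le k_n : U_{n,i-1}^2/\eta_n(\delta)\le 1\}$. Since $A_{n,i}^2\le z^2X_{n,i}^2$,
\[
E|T_n|^2 = E\prod_{i=1}^{k_n}(1+A_{n,i}^2) \le E\exp(z^2U_{n,J_n-1}^2)(1+z^2X_{n,J_n}^2)
\]
\[
\le E\exp(z^2\eta_n(\delta))\Big(1+z^2\max_{i\le k_n}X_{n,i}^2\Big) \le \exp(2\lambda z^2)\Big(1+z^2E\Big(\max_{i\le k_n}X_{n,i}^2\Big)\Big) \to \exp(2\lambda z^2)
\]
as $n\to\infty$.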
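The two elementary facts behind the bounds above can be checked numerically. The following is only a sketch: the function names and the grid are illustrative assumptions, not part of the original argument. It evaluates the remainder $r(x)$ in $e^{ix}=(1+ix)\exp(-\frac12x^2+r(x))$ on a grid in $(-1,1)$ and compares $\prod_i(1+a_i^2)$ with $\exp(\sum_i a_i^2)$, which is the inequality used to bound $E|T_n|^2$.

```python
import cmath
import math


def remainder(x: float) -> complex:
    """r(x) defined by exp(ix) = (1 + ix) * exp(-x^2/2 + r(x))."""
    # The principal logarithm is safe because Re(1 + ix) = 1 > 0.
    return 1j * x + 0.5 * x * x - cmath.log(1 + 1j * x)


def worst_remainder_excess() -> float:
    """Largest value of |r(x)| - |x|^3 over the grid x = k/100, |k| <= 99."""
    grid = [k / 100 for k in range(-99, 100)]
    return max(abs(remainder(x)) - abs(x) ** 3 for x in grid)


def product_and_bound(a: list) -> tuple:
    """Return (prod(1 + a_i^2), exp(sum a_i^2)); note |1 + i*a|^2 = 1 + a^2."""
    prod = math.prod(1.0 + t * t for t in a)
    bound = math.exp(sum(t * t for t in a))
    return prod, bound


if __name__ == "__main__":
    print(worst_remainder_excess())           # non-positive up to rounding
    print(product_and_bound([0.3, -0.5, 0.1]))
```

The second function reflects $1+u\le e^u$ applied factor by factor, which is why the product over $i<J_n$ is controlled by $\exp(z^2U_{n,J_n-1}^2)$.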
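Combining this with Lemma 2.2, we get (2.26). Hence, we obtain (2.22) under the conditions (2.14), (2.15), (2.18) and (2.19) for some $\lambda>0$ with $P(\eta>\lambda)=0$.

Now let us consider the case of an arbitrary $\eta$. Let $\lambda>0$ be a continuity point of the distribution function of $\eta$, and define
\[
\tilde X_{n,i} = X_{n,i}\mathbf{1}\{U_{n,i-1}^2\le\lambda\},\qquad \tilde S_{n,j} = \sum_{i=1}^{j}\tilde X_{n,i},\qquad \tilde U_n^2 = \sum_{i=1}^{k_n}\tilde X_{n,i}^2,
\]
\[
\tilde\eta_n = \eta_n\mathbf{1}\{\eta_n\le\lambda\} + \lambda\mathbf{1}\{\eta_n>\lambda\},\qquad \tilde\eta = \eta\mathbf{1}\{\eta\le\lambda\} + \lambda\mathbf{1}\{\eta>\lambda\}.
\]
Obviously, $\tilde X_{n,i}^2\le X_{n,i}^2$ implies
\[
\lim_{n\to\infty} E\Big(\max_{1\le i\le k_n}\tilde X_{n,i}^2\Big) = 0. \tag{2.28}
\]
Moreover, $|\tilde U_n^2-\tilde\eta_n| \le |U_n^2-\eta_n| + \max_{i\le k_n}X_{n,i}^2$ and so
\[
\tilde U_n^2 - \tilde\eta_n \xrightarrow{P} 0. \tag{2.29}
\]
Note that
\[
\frac{\max_{i\le k_n}\tilde X_{n,i}^2}{\tilde U_n^2} \le \max\Big(\frac{\max_{i\le k_n}X_{n,i}^2}{\lambda},\ \frac{\max_{i\le k_n}X_{n,i}^2}{U_n^2}\Big) \xrightarrow{P} 0. \tag{2.30}
\]
Since $\lambda$ is a continuity point of the distribution function of $\eta$,
\[
\bar U_n^2 := U_n^2\mathbf{1}\{U_n^2\le\lambda\} + \lambda\mathbf{1}\{U_n^2>\lambda\} \xrightarrow{d} \tilde\eta.
\]
Furthermore, since
\[
\bar U_n^2 \le \tilde U_n^2 \le \bar U_n^2 + \max_{i\le k_n}X_{n,i}^2,
\]
we have
\[
\tilde U_n^2 \xrightarrow{d} \tilde\eta. \tag{2.31}
\]
Define $\tilde\beta_n$ for the martingale $\{(\tilde S_{n,i},\mathcal{F}_{n,i}), 1\le i\le k_n\}$ in the same way that we defined $\beta_n$ for $\{(S_{n,i},\mathcal{F}_{n,i}), 1\le i\le k_n\}$. From (2.28), (2.29), (2.30) and (2.31), the proof given above implies that
\[
\lim_{n\to\infty}\Big|E\exp\Big[i\sum_{k=1}^{p} z_k(\tilde\beta_n(t_k)-\tilde\beta_n(t_{k-1})) + is\tilde U_n^2\Big] - E\exp\Big[-\frac12\tilde\eta\sum_{k=1}^{p} z_k^2(t_k-t_{k-1}) + is\tilde\eta\Big]\Big| = 0. \tag{2.32}
\]
Since
\[
\limsup_{n\to\infty} P(U_n^2>\lambda) \to 0
\]
as $\lambda\to\infty$, (2.32) implies (2.22) under the conditions (2.14), (2.15), (2.18) and (2.19).

Now we will prove the tightness of $\{\beta_n, n\ge 1\}$ under the conditions (2.14), (2.15), (2.18) and (2.19). Suppose first that for some $\lambda>0$, $P(\eta>\lambda)=0$. Then $\eta_n$ can be chosen such that $P(\eta_n>2\lambda)=0$. We will obtain the tightness of $\{\beta_n, n\ge 1\}$ by showing
\[
\lim_{h\to 0}\limsup_{n\to\infty}\sum_{kh<1} P\Big(\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)|>\varepsilon\Big) = 0 \tag{2.33}
\]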
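for all $\varepsilon>0$. On the set $F_n$,
\[
\max_{i\le k_n}|U_{n,i}^2/U_n^2 - U_{n,i-1}^2/\eta_n(\delta)| \le \Delta
\]
and so
\[
\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)| \le 2\max_{m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}\mathbf{1}\{kh-\Delta < U_{n,i-1}^2/\eta_n(\delta)\le(k+1)h+\Delta\}\Big|
\]
on $F_n$. Moreover,
\[
\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)| \le 2\max_{m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}\mathbf{1}\{U_{n,i-1}^2\le\Delta+\delta\}\Big|
\]
on the set $G_n$. Therefore
\[
P\Big(\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)|>\varepsilon\Big) \tag{2.34}
\]
\[
\le P\Big(\max_{m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}\mathbf{1}\{kh-\Delta < U_{n,i-1}^2/\eta_n(\delta)\le(k+1)h+\Delta\}\Big|>\varepsilon/2\Big) \tag{2.35}
\]
\[
+\ P\Big(\max_{m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}\mathbf{1}\{U_{n,i-1}^2\le\Delta+\delta\}\Big|>\varepsilon/2\Big) + P(F_n^c\cap G_n^c). \tag{2.36}
\]
Consider the term in (2.35). Set $0<\varepsilon_1<\varepsilon/8$ and $C>\varepsilon$,
\[
M_{n,m} = \sum_{i=1}^{m} X_{n,i}\mathbf{1}\{kh-\Delta < U_{n,i-1}^2/\eta_n(\delta)\le(k+1)h+\Delta\},
\]
\[
L_n = \{|\beta_n(kh-\Delta)-\beta_n(\delta,kh-\Delta)|<\varepsilon_1;\ |\beta_n((k+1)h+\Delta)-\beta_n(\delta,(k+1)h+\Delta)|<\varepsilon_1\}
\]
and $M_n = M_{n,k_n}$. Obviously, $\{(M_{n,i},\mathcal{F}_{n,i}), 1\le i\le k_n\}$ is a martingale, and by Kolmogorov's inequality,
\[
P\Big(\max_{m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}\mathbf{1}\{kh-\Delta < U_{n,i-1}^2/\eta_n(\delta)\le(k+1)h+\Delta\}\Big|>\varepsilon/2\Big) \le \frac{4}{\varepsilon}\int_{\{|M_n|>\varepsilon/4\}}|M_n|\,dP
\]
\[
\le \frac{4}{\varepsilon}\int_{\{|M_n|>C\}}|M_n|\,dP + \frac{4}{\varepsilon}\int_{L_n^c}|M_n|\,dP + \frac{4}{\varepsilon}\int_{L_n\cap\{C\ge|M_n|\ge\varepsilon/4\}}|M_n|\,dP.
\]
Furthermore,
\[
\frac{4}{\varepsilon}\int_{\{|M_n|>C\}}|M_n|\,dP \le \frac{4}{\varepsilon C}EM_n^2 = \frac{4}{\varepsilon C}E\Big[\sum_{i=1}^{k_n} X_{n,i}^2\mathbf{1}\{kh-\Delta < U_{n,i-1}^2/\eta_n(\delta)\le(k+1)h+\Delta\}\Big]
\]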
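\[
\le \frac{8\lambda}{\varepsilon C}E\Big[\frac{1}{\eta_n(\delta)}\sum_{i=1}^{k_n} X_{n,i}^2\mathbf{1}\{kh-\Delta < U_{n,i-1}^2/\eta_n(\delta)\le(k+1)h+\Delta\}\Big]
\le \frac{8\lambda}{\varepsilon C}E\Big[C' + \frac{\max_{i\le k_n}X_{n,i}^2}{\delta}\Big] = \frac{8C'\lambda}{\varepsilon C} + o(1),
\]
where $C'=(k+1)h+\Delta$, and
\[
\frac{4}{\varepsilon}\int_{L_n^c}|M_n|\,dP \le \frac{4}{\varepsilon}(P(L_n^c)EM_n^2)^{1/2} + o(1).
\]
On $L_n$,
\[
|M_n| = |\beta_n(\delta,(k+1)h+\Delta)-\beta_n(\delta,kh-\Delta)| \le |\beta_n((k+1)h+\Delta)-\beta_n(kh-\Delta)| + 2\varepsilon_1 =: |M_n'| + 2\varepsilon_1,
\]
then
\[
\frac{4}{\varepsilon}\int_{L_n\cap\{C\ge|M_n|\ge\varepsilon/4\}}|M_n|\,dP \le \frac{4}{\varepsilon}\Big(2\varepsilon_1 + \int_{\{C+2\varepsilon_1\ge|M_n'|\ge\varepsilon/4-2\varepsilon_1\}}|M_n'|\,dP\Big).
\]
The convergence of finite dimensional distributions of $\beta_n$ implies that
\[
\lim_{n\to\infty}\int_{\{C+2\varepsilon_1\ge|M_n'|\ge\varepsilon/4-2\varepsilon_1\}}|M_n'|\,dP = E\Big[\int_{\varepsilon/4-2\varepsilon_1}^{C+2\varepsilon_1} x\,dP(|N(0,\eta(h+2\Delta))|\le x)\Big]. \tag{2.37}
\]
Now consider the first term of (2.36). By Kolmogorov's inequality,
\[
P\Big(\max_{m\le k_n}\Big|\sum_{i=1}^{m} X_{n,i}\mathbf{1}\{U_{n,i-1}^2\le\Delta+\delta\}\Big|>\frac{\varepsilon}{2}\Big) \le \frac{4}{\varepsilon^2}(\Delta+\delta) + o(1).
\]
Combining (2.35), (2.37) with (2.24), we see that
\[
\limsup_{n\to\infty} P\Big(\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)|>\varepsilon\Big)
\le \frac{16\lambda}{\varepsilon C} + \frac{4}{\varepsilon}\Big(\limsup_{n\to\infty}P(L_n^c)\,4\lambda\Big)^{1/2} + \frac{4}{\varepsilon^2}(\Delta+\delta)
+ \frac{4}{\varepsilon}E\Big[\int_{\varepsilon/4-2\varepsilon_1}^{C+2\varepsilon_1} x\,dP(|N(0,\eta(h+2\Delta))|\le x)\Big].
\]
Letting $\delta\to 0$ and then $\varepsilon_1\to 0$, $\Delta\to 0$ and $C\to\infty$, we obtain
\[
\limsup_{n\to\infty} P\Big(\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)|>\varepsilon\Big)
\le \frac{4}{\varepsilon}E\Big[\int_{\varepsilon/4}^{\infty} x\,dP(|N(0,\eta h)|\le x)\Big]
= \frac{4}{\varepsilon}E\Big[\Big(\frac{2}{\pi}\Big)^{1/2}(\eta h)^{1/2}\exp\Big(-\frac12\Big(\frac{\varepsilon}{4}\Big)^2(\eta h)^{-1}\Big)\Big].
\]
Hence
\[
\limsup_{n\to\infty}\sum_{kh\le 1} P\Big(\sup_{kh<t\le(k+1)h}|\beta_n(t)-\beta_n(kh)|>\varepsilon\Big)
\]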
\[
\le \frac{4}{\varepsilon}E\Big[\Big(\frac{2}{\pi}\Big)^{1/2}(\eta h)^{1/2}\exp\Big(-\frac12\Big(\frac{\varepsilon}{4}\Big)^2(\eta h)^{-1}\Big)\Big] + \delta_h,
\]
where $\delta_h\to 0$ as $h\to 0$. The integrand of the expectation on the right converges a.s. to 0 as $h\to 0$ and is dominated by $(4/\varepsilon)T\le(4/\varepsilon)\lambda$. Hence the expectation itself converges to 0, and this establishes (2.33). This proves the tightness of $\beta_n$ in the case where $\eta$ is essentially bounded. A proof in the more general case follows via a truncation argument like that used in the proof of (2.22). Hence, we obtain (2.21) under the conditions (2.14), (2.15), (2.18) and (2.19).

Now we will prove the theorem. Let $Y_{n,i} = X_{n,i} - E(X_{n,i}|\mathcal{F}_{n,i-1})$, $T_{n,i} = \sum_{k=1}^{i} Y_{n,k}$, $V_{n,i}^2 = \sum_{k=1}^{i} Y_{n,k}^2$ and $V_n^2 = V_{n,k_n}^2$. Define
\[
\eta_n(t) = \sum_{i=1}^{[nt]} Y_{n,i}\mathbf{1}\{V_{n,i}^2/V_n^2\le t\}.
\]
(2.14) implies
\[
\lim_{n\to\infty} E\Big(\max_{1\le i\le k_n} Y_{n,i}^2\Big) = 0 \tag{2.38}
\]
and so the martingales $\{(T_{n,i},\mathcal{F}_{n,i}), 1\le i\le k_n\}$, $n\ge 1$, satisfy the analogue of (2.14). Moreover,
\[
\frac{\max_{i\le k_n}|U_{n,i}^2-V_{n,i}^2|}{U_n^2} \le \frac{2\big(\sum_{i=1}^{k_n}X_{n,i}^2\big)^{1/2}\big(\sum_{i=1}^{k_n}|E(X_{n,i}|\mathcal{F}_{n,i-1})|^2\big)^{1/2}}{U_n^2} + \frac{\big(\sum_{i=1}^{k_n}|E(X_{n,i}|\mathcal{F}_{n,i-1})|\big)^2}{U_n^2} = 2\theta_n + \theta_n^2,
\]
where
\[
\theta_n = \frac{\big(\sum_{i=1}^{k_n}|E(X_{n,i}|\mathcal{F}_{n,i-1})|\big)^2}{U_n^2}.
\]
By (2.20), $\theta_n\xrightarrow{P}0$ and so
\[
\frac{\max_{i\le k_n}|U_{n,i}^2-V_{n,i}^2|}{U_n^2} \xrightarrow{P} 0 \tag{2.39}
\]
and furthermore,
\[
|U_n^2-V_n^2| \xrightarrow{P} 0.
\]
Hence, the analogues of (2.15) and (2.19) hold for the martingale $\{(T_{n,i},\mathcal{F}_{n,i}), 1\le i\le k_n\}$. From the above proof, we obtain that (2.21) is equivalent to the following pair of conditions:

(iii) for all sequences $0=t_0<t_1<\cdots<t_p\le 1$ and real numbers $z_1,z_2,\cdots,z_p,s$,
\[
\lim_{n\to\infty}\Big|E\exp\Big[i\sum_{k=1}^{p} z_k(\eta_n(t_k)-\eta_n(t_{k-1})) + isV_n^2\Big] - E\exp\Big[-\frac12\eta\sum_{k=1}^{p} z_k^2(t_k-t_{k-1}) + is\eta\Big]\Big| = 0;
\]
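(iv) the sequence of random elements $\{\eta_n, n\ge 1\}$ is tight.

The theorem will be obtained from
\[
\sup_{t\in[0,1]}|\eta_n(t)-\beta_n(t)| \xrightarrow{P} 0. \tag{2.40}
\]
On the set $\{\sup_{i\le k_n}|U_{n,i}^2/U_n^2 - V_{n,i}^2/V_n^2|\le\delta\}$ we have
\[
\sup_{t\in[0,1]}|\eta_n(t)-\beta_n(t)| \le \sup_{s,w\in[0,1];\,|s-w|\le\delta}|\beta_n(w)-\beta_n(s)| + \frac{\sum_{i=1}^{k_n}|E(X_{n,i}|\mathcal{F}_{n,i-1})|}{U_n^2}
\]
and for any $\varepsilon>0$,
\[
P\Big(\sup_{t\in[0,1]}|\eta_n(t)-\beta_n(t)|>\varepsilon\Big) \le P\Big(\sup_{s,w\in[0,1];\,|s-w|\le\delta}|\beta_n(w)-\beta_n(s)|>\varepsilon/2\Big)
\]
\[
+\ P\Big(\frac{\sum_{i=1}^{k_n}|E(X_{n,i}|\mathcal{F}_{n,i-1})|}{U_n^2}>\varepsilon/2\Big) + P\Big(\sup_{i\le k_n}|U_{n,i}^2/U_n^2 - V_{n,i}^2/V_n^2|>\delta\Big).
\]
By (2.20), (2.39) and the tightness of $\beta_n$, the theorem is proved.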
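The limit in (2.22) has characteristic function $E\exp[-\frac12\eta\sum_k z_k^2(t_k-t_{k-1})+is\eta]$, that is, a mixture of normals with random variance $\eta$. The following toy simulation is a sketch under assumptions that are not taken from the text: the increments are $X_{n,i}=\sqrt{\eta/n}\,\xi_i$ with i.i.d. signs $\xi_i=\pm1$, and $\eta$ is drawn once per path from $\{0.25, 4\}$, so that $U_n^2=\eta$ exactly. It compares the raw sum $S_n$ (heavy-tailed mixture) with the self-normalized sum $S_n/U_n$ (approximately standard normal).

```python
import math
import random


def simulate(n: int, reps: int, seed: int = 0):
    """Return (S_n samples, S_n / U_n samples) for the toy martingale array."""
    rng = random.Random(seed)
    sums, normalized = [], []
    for _ in range(reps):
        eta = 0.25 if rng.random() < 0.5 else 4.0     # random limiting variance
        walk = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        s_n = math.sqrt(eta / n) * walk               # S_n = sum of X_{n,i}
        u_n = math.sqrt(eta)                          # U_n^2 = sum X_{n,i}^2 = eta
        sums.append(s_n)
        normalized.append(s_n / u_n)
    return sums, normalized


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def kurtosis(xs):
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / (m2 * m2)


if __name__ == "__main__":
    s, z = simulate(n=100, reps=6000)
    print("kurtosis of S_n     :", kurtosis(s))   # well above 3 for a variance mixture
    print("kurtosis of S_n/U_n :", kurtosis(z))   # close to 3
```

In this toy model the normalization by $U_n$ removes the randomness of the variance, which is the practical reason for working with random norming when the conditional variance has a non-degenerate limit.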
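Corollary 2.1. Suppose that (2.14) holds,
\[
U_n^2 \xrightarrow{P} \eta, \tag{2.41}
\]
and either
\[
\eta \text{ is measurable in } \bigcap_{n=1}^{\infty}\mathcal{F}_{n,1} \tag{2.42}
\]
or
\[
k_n\uparrow\infty \text{ and } \mathcal{F}_{n,i}\subseteq\mathcal{F}_{n+1,i} \text{ for all } i\le k_n. \tag{2.43}
\]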
Then (2.21) holds. Proof. We need to prove that (2.14), (2.41) and (2.42) or (2.43) are sufficient for (2.21). Taking ηn = η, (2.20) implies (2.18), so (2.21) holds. If (2.43) is true, choose integers ln ≤ kn such that ln ↑ ∞, and maxi≤kn |Xn,i | P − → 0. Un2 ∞ η is measurable in the σ−field generated by ∞ n=1 Fn,kn = n=1 Fn,ln . For each ε > 0, we can find an n and an Fn,ln -measurable variable ζ such that ln
P(|η − ζ| > ε) < ε. Hence, we can choose a sequence ηn adapted to the σ−fields Fn,1 so that (2.15) is satisfied. Furthermore, (2.20) is obviously true, then we can obtain the (2.21) from the theorem.
2.2.3 Extension of martingale invariance principle

As one of the main conventional tools, the classical martingale invariance principle is widely used in statistics, econometrics and other fields. In many applications, however, the condition (2.15) seems to be too restrictive. For example, consider
\[
M_{n,i} = c_n\sum_{m=1}^{i} g(X_{n,m})\varepsilon_{m+1}, \tag{2.44}
\]
where $g$ is a real integrable function on $\mathbb{R}$ and $\varepsilon_m$ is a stationary sequence with mean 0. For every $n>0$, $\{X_{n,m}\}_{m=1}^{k_n}$ is a random walk. Let $\mathcal{F}_{n,m} = \sigma(X_{n,1},\cdots,X_{n,m};\varepsilon_1,\cdots,\varepsilon_m)$ and assume $E(\varepsilon_{m+1}^2|\mathcal{F}_{n,m}) = 1$. Obviously, $\{M_{n,i},\mathcal{F}_{n,i}\}$ is a martingale array with conditional variance
\[
U_n^2 = c_n^2\sum_{m=1}^{k_n} g^2(X_{n,m}).
\]
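To see why the conditional variance in this example need not settle down to a constant, one can simulate it. The sketch below makes several assumptions that are not in the text: $X_{n,m}$ is a simple $\pm1$ random walk, $g(x)=\mathbf{1}\{|x|\le 1\}$, $k_n=n$ and $c_n=n^{-1/4}$. Under these choices $U_n^2$ behaves like a multiple of the local time of the walk at zero, so its coefficient of variation stays away from 0 as $n$ grows.

```python
import math
import random


def conditional_variance(n: int, rng: random.Random) -> float:
    """U_n^2 = c_n^2 * sum_m g(X_m)^2 with c_n = n**(-1/4), g(x) = 1{|x| <= 1}."""
    x = 0
    hits = 0
    for _ in range(n):
        x += 1 if rng.random() < 0.5 else -1   # one step of the random walk
        if abs(x) <= 1:                        # g(x)^2 = g(x) for an indicator
            hits += 1
    return hits / math.sqrt(n)                 # c_n^2 = n**(-1/2)


def coefficient_of_variation(n: int = 2000, reps: int = 300, seed: int = 0) -> float:
    """Sample standard deviation of U_n^2 divided by its sample mean."""
    rng = random.Random(seed)
    values = [conditional_variance(n, rng) for _ in range(reps)]
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return math.sqrt(var) / mean


if __name__ == "__main__":
    print(coefficient_of_variation())   # stays far from 0: U_n^2 remains random
```

A coefficient of variation that does not shrink with $n$ indicates that $U_n^2$ can at best converge in distribution to a non-degenerate limit, which is the situation the extended principle is designed for.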
Unfortunately, in statistics and econometrics, (2.15) usually cannot be verified, and thus the asymptotics of $M_{n,k_n}$ cannot be obtained from the classical martingale invariance principle. However, under certain conditions on $X_{n,m}$ such that $X_{n,[n\cdot]}\Rightarrow W(\cdot)$ on $D[0,1]$, where $W$ is a standard Brownian motion, we may prove that the variance
\[
V_n^2 \xrightarrow{d} \eta, \tag{2.45}
\]
where $\eta$ is a random variable. The new question is whether (2.15) can be replaced by (2.45) in the proof of the martingale invariance principle. To deal with this problem, Wang (2012) extended the classical martingale central limit theorem to a more general case: under the condition of convergence in distribution of the conditional variance, a martingale central limit theorem can still be obtained.

Assume $\{\varepsilon_{n,m},\mathcal{F}_{n,m}\}_{1\le m\le n}$ forms a sequence of martingale differences. Let $\{\eta_{n,m}\}$ and $\{\xi_{n,j}\}$ be two sequences of random variables, and $f_n$ be a real function on $\mathbb{R}^\infty$. Specify
\[
X_{n,m} = f_n(\varepsilon_{n,1},\cdots,\varepsilon_{n,m};\ \eta_{n,1},\cdots,\eta_{n,k};\ \xi_{n,1},\xi_{n,2},\cdots),
\]
and for each $n\ge 1$, $\{\mathcal{F}_{n,k}\}$ is a filtration. Now, we present the extended martingale invariance principle.

Theorem 2.5. (Extended martingale invariance principle) Assume $\{(\eta_{n,m},\varepsilon_{n,m}),\mathcal{F}_{n,m}\}$ forms a sequence of martingale differences satisfying
\[
\max_{k\le m\le n}|E(\eta_{n,m+1}^2|\mathcal{F}_{n,m})-1| \to 0,\qquad \max_{k\le m\le n}|E(\varepsilon_{n,m+1}^2|\mathcal{F}_{n,m})-1| \to 0\quad\text{a.s.} \tag{2.46}
\]
as $n\to\infty$ first and then $k\to\infty$, and as $A\to\infty$,
\[
\max_{1\le m\le n}\big[E(\eta_{n,m+1}^2\mathbf{1}\{|\eta_{n,m+1}|\ge A\}|\mathcal{F}_{n,m}) + E(\varepsilon_{n,m+1}^2\mathbf{1}\{|\varepsilon_{n,m+1}|\ge A\}|\mathcal{F}_{n,m})\big] \to 0\quad\text{a.s.} \tag{2.47}
\]
Furthermore, assume
\[
\max_{1\le m\le n}|X_{n,m}| = o_P(1), \tag{2.48}
\]
\[
\frac{1}{\sqrt n}\sum_{m=1}^{n}|X_{n,m}||E(\eta_{n,m+1}\varepsilon_{n,m+1}|\mathcal{F}_{n,m})| = o_P(1) \tag{2.49}
\]
and
\[
\Big\{\frac{1}{\sqrt n}\sum_{m=1}^{[n\cdot]}\eta_{n,m+1},\ \sum_{m=1}^{n}X_{n,m}^2\Big\} \Rightarrow \{W, g^2(W)\} \tag{2.50}
\]
on $D[0,1]\times\mathbb{R}$, where $W$ is a standard Brownian motion, and $g^2(W)$ is an a.s. finite functional of $W$. Let $S_n(t)=\sum_{m=1}^{[nt]}X_{n,m}\varepsilon_{n,m+1}$ and $G_n^2=\sum_{m=1}^{n}X_{n,m}^2$. Then
\[
\{S_n, G_n^2\} \Rightarrow \{g(W)B, g^2(W)\} \tag{2.51}
\]
where $B$ is a standard Brownian motion independent of $W$.

Before proving Theorem 2.5, we first present two useful lemmas. Choose some $\lambda>0$ such that
\[
P(g^2(W)>\lambda) = 0, \tag{2.52}
\]
and $\lambda$ is a continuity point of the distribution function of $g^2(W)$. Let
\[
X_{n,m}^* = X_{n,m}\mathbf{1}\Big\{\sum_{i=1}^{m}X_{n,i}^2\le 2\lambda\Big\},\qquad S_n^* = \sum_{m=1}^{n}X_{n,m}^*\varepsilon_{n,m+1}\qquad\text{and}\qquad G_n^{*2} = \sum_{m=1}^{n}X_{n,m}^{*2}.
\]
For $\alpha,\beta_k\in\mathbb{R}$, $0=u_0<u_1<\cdots<u_N=1$, define
\[
V_n = \sum_{m=1}^{N}\beta_m[W_n(u_m)-W_n(u_{m-1})],
\]
where $W_n(t)=\frac{1}{\sqrt n}\sum_{m=1}^{[nt]}\eta_{n,m+1}$, and
\[
\Gamma_n = \sum_{m=1}^{n}E\{[\exp(i\beta_m^*\eta_{n,m+1} + i\alpha X_{n,m}^*\varepsilon_{n,m+1})-1]\,|\,\mathcal{F}_{n,m}\},
\]
where $\beta_m^* = \beta_j/\sqrt n$ when $[nu_{j-1}]<m\le[nu_j]$ for $j=1,\cdots,N$.

Lemma 2.3. For any $\alpha,\beta_k\in\mathbb{R}$ and $0=u_0<u_1<\cdots<u_N=1$, $\exp(|\Gamma_n|)$ is uniformly integrable and
\[
(V_n, G_n^{*2}, \Gamma_n) \xrightarrow{d} (V, g^2(W), \Gamma), \tag{2.53}
\]
where $V=\sum_{m=1}^{N}\beta_m(W(u_m)-W(u_{m-1}))$ and $\Gamma = -\frac12\sum_{m=1}^{N}\beta_m^2(u_m-u_{m-1}) - \frac12\alpha^2g^2(W)$.
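Proof. Recall that $\max_{1\le m\le n}|X_{n,m}| = o_P(1)$. There exists a sequence $A_n>0$ such that
\[
A_n\to\infty,\qquad \frac{A_n}{\sqrt n}\to 0,\qquad A_n\max_{1\le m\le n}|X_{n,m}| = o_P(1). \tag{2.54}
\]
Let $Y_{nm} = \beta_m^*\eta_{n,m+1} + \alpha X_{n,m}^*\varepsilon_{n,m+1}$ and $\Omega_{nm} = \{|\eta_{n,m+1}|\le A_n, |\varepsilon_{n,m+1}|\le A_n\}$. Noting $|\beta_m^*|\le C/\sqrt n$, we have
\[
E[|Y_{nm}|^3\mathbf{1}_{\Omega_{nm}}|\mathcal{F}_{n,m}] \le C(n^{-1}+X_{n,m}^{*2})R_{1nm}, \tag{2.55}
\]
\[
E[Y_{nm}^2\mathbf{1}_{\Omega_{nm}^c}|\mathcal{F}_{n,m}] \le C(n^{-1}+X_{n,m}^{*2})R_{2nm}, \tag{2.56}
\]
where
\[
R_{1nm} = \Big(A_nn^{-1/2} + A_n\max_{1\le k\le n}|X_{n,k}|\Big)\big(E[\eta_{n,m+1}^2|\mathcal{F}_{n,m}] + E[\varepsilon_{n,m+1}^2|\mathcal{F}_{n,m}]\big),
\]
\[
R_{2nm} = E[(\eta_{n,m+1}^2+\varepsilon_{n,m+1}^2)\mathbf{1}_{\Omega_{nm}^c}|\mathcal{F}_{n,m}].
\]
By the Taylor expansion,
\[
|E[\exp(iY_{nm})-1-iY_{nm}|\mathcal{F}_{n,m}]|
\]
\[
\le -\frac12E[Y_{nm}^2\mathbf{1}_{\Omega_{nm}}|\mathcal{F}_{n,m}] + |E[(\exp(iY_{nm})-1-iY_{nm})\mathbf{1}_{\Omega_{nm}^c}|\mathcal{F}_{n,m}]| + \Big|E\Big[\Big(\exp(iY_{nm})-1-iY_{nm}+\frac12Y_{nm}^2\Big)\mathbf{1}_{\Omega_{nm}}\Big|\mathcal{F}_{n,m}\Big]\Big|
\]
\[
\le -\frac12E[Y_{nm}^2|\mathcal{F}_{n,m}] + E[Y_{nm}^2\mathbf{1}_{\Omega_{nm}^c}|\mathcal{F}_{n,m}] + E[|Y_{nm}|^3\mathbf{1}_{\Omega_{nm}}|\mathcal{F}_{n,m}]
\]
\[
\le -\frac12E[Y_{nm}^2|\mathcal{F}_{n,m}] + C(n^{-1}+X_{n,m}^{*2})(R_{1nm}+R_{2nm}).
\]
Due to (2.47) and (2.54),
\[
\sum_{m=1}^{n}\big(E[Y_{nm}^2\mathbf{1}_{\Omega_{nm}^c}|\mathcal{F}_{n,m}] + E[|Y_{nm}|^3\mathbf{1}_{\Omega_{nm}}|\mathcal{F}_{n,m}]\big) \le C\max_{1\le m\le n}(R_{1nm}+R_{2nm})\sum_{m=1}^{n}(n^{-1}+X_{n,m}^{*2}) = o_P(1).
\]
Thus
\[
\Gamma_n = \sum_{m=1}^{n}E[\exp(iY_{nm})-1-iY_{nm}|\mathcal{F}_{n,m}]
= -\frac12\sum_{m=1}^{n}E[(\beta_m^*\eta_{n,m+1}+\alpha X_{n,m}^*\varepsilon_{n,m+1})^2|\mathcal{F}_{n,m}] + o_P(1)
\]
\[
= -\frac12\sum_{m=1}^{n}\beta_m^{*2}E[\eta_{n,m+1}^2|\mathcal{F}_{n,m}] - \frac{\alpha^2}{2}\sum_{m=1}^{n}E[X_{n,m}^{*2}\varepsilon_{n,m+1}^2|\mathcal{F}_{n,m}] - \sum_{m=1}^{n}\beta_m^*\alpha X_{n,m}^*E[\eta_{n,m+1}\varepsilon_{n,m+1}|\mathcal{F}_{n,m}] + o_P(1)
\]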
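\[
= -\frac{1}{2n}\sum_{m=1}^{N}\beta_m^2([nu_m]-[nu_{m-1}]) - \frac{\alpha^2}{2}\sum_{m=1}^{n}X_{n,m}^{*2} + o_P(1)
= -\frac12\sum_{m=1}^{N}\beta_m^2(u_m-u_{m-1}) - \frac{\alpha^2}{2}\sum_{m=1}^{n}X_{n,m}^{*2} + o_P(1),
\]
where we have used the fact
\[
\Big|\sum_{m=1}^{n}\beta_m^*\alpha X_{n,m}^*E[\eta_{n,m+1}\varepsilon_{n,m+1}|\mathcal{F}_{n,m}]\Big| \le \frac{C}{\sqrt n}\sum_{m=1}^{n}|X_{n,m}||E(\eta_{n,m+1}\varepsilon_{n,m+1}|\mathcal{F}_{n,m})| = o_P(1)
\]
by (2.49). So, for any $\alpha_i\in\mathbb{R}$, $i=1,2,3$, we have
\[
\alpha_1V_n + \alpha_2G_n^{*2} + \alpha_3\Gamma_n = \alpha_1V_n + \Big(\alpha_2-\frac{\alpha^2\alpha_3}{2}\Big)G_n^{*2} - \frac{\alpha_3}{2}\sum_{m=1}^{N}\beta_m^2(u_m-u_{m-1}) + o_P(1)
\]
\[
\xrightarrow{d} \alpha_1V + \Big(\alpha_2-\frac{\alpha^2\alpha_3}{2}\Big)g^2(W) - \frac{\alpha_3}{2}\sum_{m=1}^{N}\beta_m^2(u_m-u_{m-1}) = \alpha_1V + \alpha_2g^2(W) + \alpha_3\Gamma.
\]
Then (2.53) holds. Furthermore, noting that
\[
E[\eta_{n,m+1}^2+\varepsilon_{n,m+1}^2|\mathcal{F}_{n,m}] \le C
\]
and
\[
|E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}]| \le \frac12E[Y_{nm}^2|\mathcal{F}_{n,m}] \le C(n^{-1}+X_{n,m}^{*2}),
\]
we have
\[
|\Gamma_n| \le C\Big(1+\sum_{m=1}^{n}X_{n,m}^{*2}\Big) \le C(1+2\lambda).
\]
The uniform integrability of $\exp(|\Gamma_n|)$ follows immediately.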
Lemma 2.4. For any $\alpha,\beta_k\in\mathbb{R}$ and $0=u_0<u_1<\cdots<u_N=1$, we have
\[
I_n := E|E[\exp(i\alpha S_n^* + iV_n - \Gamma_n)|\mathcal{F}_{n,1}]-1| = o(1). \tag{2.57}
\]
Proof. Recall that $Y_{nm} = \beta_m^*\eta_{n,m+1} + \alpha X_{n,m}^*\varepsilon_{n,m+1}$, and write $Z_{nm} = E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}]$. Due to the definitions of $W_n(t)$ and $\beta_k^*$, we have
\[
I_n = E\Big|E\Big[\exp\Big(i\sum_{k=1}^{n}Y_{nk}-\sum_{k=1}^{n}Z_{nk}\Big)\Big|\mathcal{F}_{n,1}\Big]-1\Big|
\]
\[
\le E\Big|E\Big[\exp\Big(i\sum_{k=1}^{n-1}Y_{nk}-\sum_{k=1}^{n}Z_{nk}\Big)(\exp(iY_{nn})-\exp(Z_{nn}))\Big|\mathcal{F}_{n,1}\Big]\Big| + E\Big|E\Big[\exp\Big(i\sum_{k=1}^{n-1}Y_{nk}-\sum_{k=1}^{n-1}Z_{nk}\Big)\Big|\mathcal{F}_{n,1}\Big]-1\Big|.
\]
By induction,
\[
I_n \le \sum_{m=2}^{n}E\Big|E\Big[\exp\Big(i\sum_{k=1}^{m-1}Y_{nk}-\sum_{k=1}^{m}Z_{nk}\Big)(\exp(iY_{nm})-\exp(Z_{nm}))\Big|\mathcal{F}_{n,1}\Big]\Big| + E|E[\exp(iY_{n1}-Z_{n1})|\mathcal{F}_{n,1}]-1| =: I_{1n}+I_{2n},
\]
where, by the Jensen inequality,
\[
I_{2n} = |E[\exp(iY_{n1}-Z_{n1})|\mathcal{F}_{n,1}]-1| = |E[\exp(iY_{n1}-E[\exp(iY_{n1})-1|\mathcal{F}_{n,1}])|\mathcal{F}_{n,1}]-1|
\]
\[
\le |\exp(-E[\exp(iY_{n1})-1|\mathcal{F}_{n,1}])-1| + |E[\exp(iY_{n1})-1|\mathcal{F}_{n,1}]| \le C(n^{-1}+X_{n,1}^{*2})\exp(C(n^{-1}+X_{n,1}^{*2})) + o_P(1),
\]
and the uniform integrability of $|E[\exp(iY_{n1}-Z_{n1})|\mathcal{F}_{n,1}]-1|$ implies $I_{2n}\to 0$ as $n\to\infty$. Define
\[
u_{n,m} = \exp\Big(i\sum_{k=1}^{m-1}Y_{nk} - \sum_{k=1}^{m}E[\exp(iY_{nk})-1|\mathcal{F}_{n,k}]\Big),\qquad
v_{n,m} = \exp(iY_{nm}) - \exp(E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}]).
\]
$\sum_{k=1}^{n}X_{n,k}^{*2}\le 2\lambda$ implies
\[
|u_{n,m}| \le \exp\Big(\sum_{k=1}^{m}|E[\exp(iY_{nk})-1|\mathcal{F}_{n,k}]|\Big) \le \exp\Big(C\sum_{k=1}^{m}(n^{-1}+X_{n,k}^{*2})\Big) \le \exp(C(1+2\lambda)). \tag{2.58}
\]
Since $|\exp(x)-1-x|\le|x|^{(2+\delta)/2}\exp(|x|)$ for some $\delta>0$ and any $x\in\mathbb{R}$, we have
\[
|E(v_{n,m}|\mathcal{F}_{n,m})| = |E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}] + 1 - \exp(E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}])|
\]
\[
\le |E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}]|^{(2+\delta)/2}\exp(|E[\exp(iY_{nm})-1|\mathcal{F}_{n,m}]|) \le C\exp(C(1+2\lambda))(n^{-1}+X_{n,m}^{*2})^{(2+\delta)/2}.
\]
By recalling $\mathcal{F}_{n,m}\subseteq\mathcal{F}_{n,m+1}$ for any $n\ge m\ge 1$ and $n\ge 1$, it is readily seen that
\[
I_{1n} \le \sum_{m=2}^{n}E[|u_{n,m}||E[v_{n,m}|\mathcal{F}_{n,m}]|] \le C\exp(2C(1+2\lambda))E\Big[\sum_{m=2}^{n}(n^{-1}+X_{n,m}^{*2})^{(2+\delta)/2}\Big] \to 0,
\]
since
\[
\sum_{m=2}^{n}(n^{-1}+X_{n,m}^{*2})^{(2+\delta)/2} \le \Big(n^{-\delta/2}+\max_{1\le m\le n}|X_{n,m}^*|^{\delta}\Big)\Big(1+\sum_{m=1}^{n}X_{n,m}^{*2}\Big) = o_P(1)
\]
and $\sum_{m=2}^{n}(n^{-1}+X_{n,m}^{*2})^{(2+\delta)/2}$ is uniformly integrable. Thus (2.57) is obtained.
The proof of Theorem 2.5. According to Theorem 2.4, we need to prove convergence of finite dimensional distributions and tightness for $\{S_n, G_n^2\}$. For convergence of finite dimensional distributions, we need to prove that
\[
\lim_{n\to\infty}\Big|E\exp\Big[i\sum_{k=1}^{p}z_k(S_n(t_k)-S_n(t_{k-1})) + isG_n^2\Big] - E\exp\Big[-\frac12g^2(W)\sum_{k=1}^{p}z_k^2(t_k-t_{k-1}) + isg^2(W)\Big]\Big| = 0 \tag{2.59}
\]
for all sequences $0=t_0<t_1<\cdots<t_p\le 1$ and real numbers $z_1,z_2,\cdots,z_p,s$. To simplify the proof, we only prove that
\[
\lim_{n\to\infty}\Big|E\exp[i\alpha S_n+i\beta G_n^2] - E\exp\Big[\Big(-\frac12\alpha^2+i\beta\Big)g^2(W)\Big]\Big| = 0, \tag{2.60}
\]
where $S_n=S_n(1)$, $\alpha,\beta\in\mathbb{R}$. On the set $\{G_n^2\le 2\lambda\}$, $S_n=S_n^*$ and $G_n^2=G_n^{*2}$. By (2.50),
\[
\lim_{n\to\infty}P(G_n^2>2\lambda) = 0.
\]
It is easy to see that, for all $\alpha,\beta\in\mathbb{R}$,
\[
E|\exp(i\alpha S_n^*+i\beta G_n^{*2}) - \exp(i\alpha S_n+i\beta G_n^2)| \le 2P(G_n^2>2\lambda) \to 0. \tag{2.61}
\]
Hence, (2.60) holds true iff
\[
\lim_{n\to\infty}\Big|E\exp[i\alpha S_n^*+i\beta G_n^{*2}] - E\exp\Big[\Big(-\frac12\alpha^2+i\beta\Big)g^2(W)\Big]\Big| = 0. \tag{2.62}
\]
Since $ES_n^{*2}\le C$, $EG_n^{*2}\le C\lambda$ and (2.53) holds, $\{S_n^*, G_n^{*2}, V_n, \Gamma_n\}_{n\ge 1}$ is tight. Hence, for each subsequence $\{n'\}\subset\{n\}$, there exists a further subsequence $\{n''\}\subset\{n'\}$ such that
\[
\{S_{n''}^*, G_{n''}^{*2}, V_{n''}, \Gamma_{n''}\} \xrightarrow{d} \{S, g^2(W), V, \Gamma\},
\]
where $S$ is a limiting random variable of $S_n^*$. By Lemma 2.3, we have
\[
E[\exp(i\alpha S_{n''}^*+iV_{n''}-\Gamma_{n''})] \to E[\exp(i\alpha S+iV-\Gamma)].
\]
This, together with Lemma 2.4, yields that $E[\exp(i\alpha S+iV-\Gamma)] = 1$. Hence, by noting $E[\exp(iV)] = \exp(-\frac12\sum_{k=1}^{N}\beta_k^2(u_k-u_{k-1}))$, we obtain
\[
E\Big[\exp\Big(i\alpha S+iV+\frac12\alpha^2g^2(W)\Big)\Big] = \exp\Big(-\frac12\sum_{k=1}^{N}\beta_k^2(u_k-u_{k-1})\Big).
\]
Furthermore, using the Weierstrass theorem on approximation of continuous functions by trigonometric polynomials, we have
\[
E\Big\{\Big[\exp\Big(i\alpha S+\frac12\alpha^2g^2(W)\Big)-1\Big]G[W(u_0),W(u_1),\cdots,W(u_N)]\Big\} = 0 \tag{2.63}
\]
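for any $0=u_0<u_1<\cdots<u_N=1$, where $G(y_0,\cdots,y_N)$ is an arbitrary continuous function with compact support. Write $\mathcal{G}=\sigma(W(s), 0\le s\le 1)$. (2.63) means that
\[
E\Big[\exp\Big(i\alpha S+\frac12\alpha^2g^2(W)\Big)-1\,\Big|\,\mathcal{G}\Big] = 0.
\]
Consequently, for all $\alpha,\beta\in\mathbb{R}$,
\[
E[\exp(i\alpha S+i\beta g^2(W))|\mathcal{G}] = \exp\Big(\Big(-\frac12\alpha^2+i\beta\Big)g^2(W)\Big).
\]
Hence, for each $\{n'\}\subset\{n\}$, there exists a subsequence $\{n''\}\subset\{n'\}$ such that
\[
\lim_{n''\to\infty}E\exp[i\alpha S_{n''}^*+i\beta G_{n''}^{*2}] = E[\exp(i\alpha S+i\beta g^2(W))] = E\Big[\exp\Big(\Big(-\frac12\alpha^2+i\beta\Big)g^2(W)\Big)\Big].
\]
(2.60) is established since the limit does not depend on the choice of the subsequences, which completes the proof in the special case where (2.52) holds.

It remains to remove the boundedness condition (2.52). To this end, for given $\varepsilon>0$, choose a continuity point $\lambda$ of the distribution function of $g^2(W)$ such that $P(g^2(W)>\lambda)\le\varepsilon$. Let
\[
g_\lambda^2(W) = g^2(W)\mathbf{1}\{g^2(W)\le\lambda\} + \lambda\mathbf{1}\{g^2(W)>\lambda\},
\]
\[
X_{n,k}' = X_{n,k}\mathbf{1}\Big\{\sum_{i=1}^{k}X_{n,i}^2\le\lambda\Big\},\qquad S_n' = \sum_{k=1}^{n}X_{n,k}'\varepsilon_{n,k+1},\qquad G_n'^2 = \sum_{k=1}^{n}X_{n,k}'^2.
\]
Now $\tilde G_n^2 - \max_{1\le k\le n}X_{n,k}^2 \le G_n'^2 \le \tilde G_n^2$, where
\[
\tilde G_n^2 = G_n^2\mathbf{1}\{G_n^2\le\lambda\} + \lambda\mathbf{1}\{G_n^2>\lambda\}.
\]
By (2.49) and (2.50),
\[
\frac{1}{\sqrt n}\sum_{k=1}^{n}|X_{n,k}'||E(\eta_{n,k+1}\varepsilon_{n,k+1}|\mathcal{F}_{n,k})| \le \frac{1}{\sqrt n}\sum_{k=1}^{n}|X_{n,k}||E(\eta_{n,k+1}\varepsilon_{n,k+1}|\mathcal{F}_{n,k})| = o_P(1)
\]
and $G_n'^2\xrightarrow{d}g_\lambda^2(W)$, since $\tilde G_n^2\xrightarrow{d}g_\lambda^2(W)$ by the continuous mapping theorem. Note that $g_\lambda^2(W)$ is a.s. bounded, so
\[
\lim_{n\to\infty}\Big|E[\exp(i\alpha S_n+i\beta G_n^2)] - E\Big[\exp\Big(\Big(-\frac{\alpha^2}{2}+i\beta\Big)g^2(W)\Big)\Big]\Big|
\]
\[
\le \lim_{n\to\infty}\Big|E[\exp(i\alpha S_n'+i\beta G_n'^2)] - E\Big[\exp\Big(\Big(-\frac{\alpha^2}{2}+i\beta\Big)g_\lambda^2(W)\Big)\Big]\Big| + 2\lim_{n\to\infty}P(G_n^2>\lambda) + 2P(g^2(W)>\lambda) \le 4\varepsilon,
\]
which implies (2.60).

Consider the tightness of $\{S_n, G_n^2\}$. We only need to prove the tightness of $\{S_n\}$, since $\{G_n^2\}$ converges weakly by (2.50). In fact, for every $\varepsilon>0$, sup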
0
≤ ε−2
[nt]
P(|
sup
Xn,k εn,k+1 | > ε)
k=[ns]+1
0
[nt]
2 EXn,k+1 ,
k=[ns]+1
which implies tightness of {Sn} in D[0, 1].
2.3 Convergence of Poisson point processes
In the previous section, Donsker type invariance principles were studied under second moment conditions (such as conditions (2.14), (2.46), (2.47)). When variances do not exist, the weak limits of the underlying processes may change. For example, data arising in data networks, financial returns and reinsurance are usually heavy tailed. Roughly speaking, a random variable $X$ is called heavy tailed if there exists a parameter $\alpha>0$ such that
\[
P[X>x] \sim x^{-\alpha}\quad\text{as } x\to\infty.
\]
Here and elsewhere we use $g(x)\sim h(x)$ as $x\to\infty$ as shorthand for
\[
\lim_{x\to\infty}\frac{g(x)}{h(x)} = 1.
\]
Heavy-tail analysis is an important branch of probability theory. Its main mathematical tool is the theory of point processes. Our interest in this section is weak convergence in the context of heavy-tail analysis. We first present some preliminaries which are used in this section.

1. Regular variation

In our study of heavy-tail analysis, we assume that the tail of the distribution of a heavy-tailed random variable is a regularly varying function. Roughly speaking, regularly varying functions are those functions which behave asymptotically like power functions. The precise definition is as follows.

Definition 2.1. A measurable function $U:\mathbb{R}_+\to\mathbb{R}_+$ is regularly varying at $\infty$ with index $\rho\in\mathbb{R}$ (written $U\in RV_\rho$) if for any $x>0$,
\[
\lim_{t\to\infty}\frac{U(tx)}{U(t)} = x^{\rho}.
\]
If $\rho=0$, we call $U$ slowly varying.

Example 2.1. The extreme-value distribution $\Phi_\alpha(x)=\exp(-x^{-\alpha})$, $x>0$, has a tail satisfying $1-\Phi_\alpha(x)\sim x^{-\alpha}$ as $x\to\infty$.

Example 2.2. The Cauchy distribution $F$ has density function $f(x)=\frac{1}{\pi(1+x^2)}$, and its tail satisfies $1-F(x)\sim(\pi x)^{-1}$ as $x\to\infty$.
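The tail equivalences in Examples 2.1 and 2.2, and the defining ratio in Definition 2.1, are easy to check numerically. The sketch below is illustrative only; the function names are assumptions. It uses the closed forms $1-F(x)=\arctan(1/x)/\pi$ for the Cauchy tail (valid for $x>0$) and $1-\Phi_\alpha(x)=1-\exp(-x^{-\alpha})$, both written in a numerically stable way.

```python
import math


def cauchy_tail(x: float) -> float:
    """1 - F(x) for the standard Cauchy law, x > 0 (equals 1/2 - atan(x)/pi)."""
    return math.atan(1.0 / x) / math.pi


def frechet_tail(x: float, alpha: float) -> float:
    """1 - exp(-x**(-alpha)), computed with expm1 to avoid cancellation."""
    return -math.expm1(-(x ** (-alpha)))


def tail_ratio(tail, t: float, x: float) -> float:
    """U(tx) / U(t); for U in RV_rho this approaches x**rho as t grows."""
    return tail(t * x) / tail(t)


if __name__ == "__main__":
    x = 1e4
    print(cauchy_tail(x) * math.pi * x)            # ratio to (pi x)^(-1), near 1
    print(frechet_tail(x, 1.5) / x ** (-1.5))      # ratio to x^(-alpha), near 1
    print(tail_ratio(cauchy_tail, 1e6, 2.0))       # near 2**(-1)
```

Both tails are regularly varying with index $-\alpha$ (with $\alpha=1$ in the Cauchy case), which is what the last ratio illustrates.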
Example 2.3. A random variable $X$ is said to have an $\alpha$-stable distribution if for any positive numbers $A$ and $B$, there are a positive number $C$ and a real number $D$ such that
\[
AX_1 + BX_2 \stackrel{d}{=} CX + D,
\]
where $X_1$ and $X_2$ are independent copies of $X$, and there exists a number $\alpha\in(0,2]$ such that $C^\alpha = A^\alpha + B^\alpha$. When $0<\alpha<2$, the $\alpha$-stable distribution $G$ satisfies $1-G(x)\sim cx^{-\alpha}$ as $x\to\infty$.

Regular variation of distribution tails can be reformulated in terms of vague convergence. This will be useful in the study of weak convergence.

Proposition 2.1. Suppose that $\xi$ is a nonnegative random variable with distribution function $F$. Set $\bar F = 1-F$. The following statements are equivalent:
(i) $\bar F\in RV_{-\alpha}$, $\alpha>0$.
(ii) There exists a sequence $\{b_n\}$ with $b_n\to\infty$ such that
\[
\lim_{n\to\infty}n\bar F(b_nx) = x^{-\alpha},\qquad x>0.
\]
(iii) There exists a sequence $\{b_n\}$ with $b_n\to\infty$ such that
\[
nP\Big[\frac{\xi}{b_n}\in\cdot\Big] \xrightarrow{v} \nu_\alpha(\cdot)
\]
in $M_+(0,\infty]$, where $\nu_\alpha((x,\infty]) = x^{-\alpha}$.

Proof. (i)$\Rightarrow$(ii). Let $b_n$ be a quantile of $F$; then (ii) is easily obtained.

(ii)$\Rightarrow$(i). Define $n(t)=\inf\{n: b_{n+1}>t\}$. Note that $b_{n(t)}\le t<b_{n(t)+1}$. By monotonicity, we have
\[
\frac{n(t)}{n(t)+1}\cdot\frac{(n(t)+1)\bar F(b_{n(t)+1}x)}{n(t)\bar F(b_{n(t)})} \le \frac{\bar F(tx)}{\bar F(t)} \le \frac{n(t)+1}{n(t)}\cdot\frac{n(t)\bar F(b_{n(t)}x)}{(n(t)+1)\bar F(b_{n(t)+1})},
\]
and so
\[
\lim_{t\to\infty}\frac{\bar F(tx)}{\bar F(t)} = x^{-\alpha}
\]
holds.

(ii)$\Rightarrow$(iii). Let $C_K^+((0,\infty])$ be the space of continuous functions which have compact supports. For $f\in C_K^+((0,\infty])$, the support of $f$ is contained in $(\delta,\infty]$ for some $\delta>0$. From (ii),
\[
nP\Big[\frac{\xi}{b_n}>x\Big] \to x^{-\alpha} = \nu_\alpha((x,\infty])\quad\text{for any } x>0.
\]
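On $(\delta,\infty]$, define
\[
P_n(\cdot) = \frac{P[\xi/b_n\in\cdot]}{P[\xi/b_n>\delta]}\qquad\text{and}\qquad P(\cdot) = \frac{\nu_\alpha(\cdot)}{\nu_\alpha((\delta,\infty])},
\]
which are probability measures on $(\delta,\infty]$. Then for $y\in(\delta,\infty]$, $P_n((y,\infty])\to P((y,\infty])$. In $\mathbb{R}$, convergence of distribution functions is equivalent to weak convergence, so for every bounded continuous function $f$ on $(\delta,\infty]$, we have $P_n(f)\to P(f)$, which means that
\[
nEf\Big(\frac{\xi}{b_n}\Big) \to \nu_\alpha(f).
\]
(iii) is obtained.

(iii)$\Rightarrow$(ii). From (iii), since $(x,\infty]$ is a relatively compact set and $\nu_\alpha(\partial(x,\infty])=0$, we have
\[
nP\Big[\frac{\xi}{b_n}>x\Big] \to x^{-\alpha} = \nu_\alpha((x,\infty])\quad\text{for any } x>0.
\]
(ii) is obtained.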
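As a concrete instance of the step (i)$\Rightarrow$(ii), one can take $b_n$ to be the quantile with $\bar F(b_n)=1/n$. The sketch below assumes, purely for illustration, that $\xi=|X|$ with $X$ standard Cauchy, so that $\bar F(x)=\frac{2}{\pi}\arctan(1/x)$ and $\alpha=1$; the function names are hypothetical.

```python
import math


def abs_cauchy_tail(x: float) -> float:
    """P(|X| > x) for a standard Cauchy X and x > 0."""
    return 2.0 * math.atan(1.0 / x) / math.pi


def quantile_b(n: int) -> float:
    """b_n solving P(|X| > b_n) = 1/n, valid for n >= 2."""
    return 1.0 / math.tan(math.pi / (2.0 * n))


def scaled_tail(n: int, x: float) -> float:
    """n * P(|X| > b_n * x); Proposition 2.1 (ii) predicts the limit x**(-1)."""
    return n * abs_cauchy_tail(quantile_b(n) * x)


if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        print(n, scaled_tail(n, 2.0))   # approaches 0.5
```

By construction `scaled_tail(n, 1.0)` equals 1 up to rounding, and the dependence on `x` approaches the power law $x^{-\alpha}$ as $n$ grows.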
2. Point process

Point processes are well-studied objects in probability theory, and the point process method is a powerful tool in statistics. In the study of weak convergence under heavy-tailed assumptions, point processes serve as a tool bridging the gap between heavy-tail analysis and weak convergence. The definition of a point process is based on point measures.

Definition 2.2. A random element $N:(\Omega,\mathcal{F})\to(M_p(S),\mathcal{M}_p(S))$ is called a point process with state space $S$, where $\mathcal{M}_p(S)$ is the Borel $\sigma$-field of $M_p(S)$ generated by the open sets in $M_p(S)$.

The set of Borel subsets of $S$ is denoted by $\mathcal{S}$.

Definition 2.3. $N$ is a Poisson random measure with mean measure $\mu$ ($PRM(\mu)$), if
(1) for $A\in\mathcal{S}$, $P[N(A)=k] = \exp(-\mu(A))\frac{(\mu(A))^k}{k!}$ if $\mu(A)<\infty$, and $P[N(A)=k]=0$ if $\mu(A)=\infty$;
(2) for disjoint sets $A_1,\cdots,A_k$ in $\mathcal{S}$, $N(A_1),\cdots,N(A_k)$ are independent random variables.
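On a set of finite mean measure, a Poisson random measure can be sampled as a Poisson number of i.i.d. points drawn from the normalized mean measure. The sketch below illustrates this for the mean measure $\lambda\times\nu_\alpha$ restricted to $[0,T]\times(\varepsilon,\infty)$, with $\nu_\alpha((x,\infty])=x^{-\alpha}$ as in Proposition 2.1. The function names, the one-sided choice of $\nu_\alpha$ and the simple Poisson sampler are assumptions made for illustration.

```python
import random


def poisson(mean: float, rng: random.Random) -> int:
    """Poisson(mean) count: number of unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_prm(T: float, eps: float, alpha: float, rng: random.Random):
    """Points (u, v) of a PRM on [0, T] x (eps, inf) with mean measure Lebesgue x nu_alpha."""
    total_mass = T * eps ** (-alpha)            # mean measure of the whole set
    points = []
    for _ in range(poisson(total_mass, rng)):
        u = rng.uniform(0.0, T)                 # time coordinate, uniform on [0, T]
        # Inverse transform for P(V > v) = (v / eps)**(-alpha), v > eps.
        v = eps * (1.0 - rng.random()) ** (-1.0 / alpha)
        points.append((u, v))
    return points


def mean_count(reps: int = 4000, seed: int = 0) -> float:
    """Average number of points in A = [0, 0.5] x (1, inf); mu(A) = 0.5 for alpha = 1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        pts = sample_prm(T=1.0, eps=0.5, alpha=1.0, rng=rng)
        total += sum(1 for (u, v) in pts if u <= 0.5 and v > 1.0)
    return total / reps


if __name__ == "__main__":
    print(mean_count())   # close to mu(A) = 0.5
```

Counting the points that fall in a fixed set $A$ and averaging over replications approximates $\mu(A)$, in line with part (1) of Definition 2.3.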
2.3.1 Functional limit theorem for independent heavy-tailed sequence

The limiting process in this section is a pure-jump Lévy process. The distribution of a Lévy process $Z(\cdot)$ is characterized by its characteristic Lévy triple, that is, the characteristic triple of the infinitely divisible distribution of $Z(1)$. The characteristic function of $Z(1)$ and the characteristic Lévy triple $(b,c,\nu)$ of $Z$ are related in the following way:
\[
E[\exp(iuZ(1))] = \exp\Big(ibu - \frac12cu^2 + \int_{\mathbb{R}}(e^{iux}-1-iux\mathbf{1}_{\{|x|\le 1\}}(x))\nu(dx)\Big)
\]
for $u\in\mathbb{R}$, where $b\in\mathbb{R}$, $c\ge 0$, and $\nu$ is a measure on $\mathbb{R}$. When $c=0$, the Lévy process is a pure-jump process.

First, we present a basic convergence lemma.

Lemma 2.5. Suppose that for each $n\ge 1$, $\{X_{n,j}, j\ge 1\}$ is a sequence of i.i.d. random elements of $(S,\mathcal{S})$. Let $\xi$ be $PRM(\mu)$ on $M_p(S)$.
(i) We have
\[
\sum_{j=1}^{n}\delta_{X_{n,j}} \Rightarrow \xi
\]
on $M_+(S)$ iff
\[
nP[X_{n,1}\in\cdot] = E\Big(\sum_{j=1}^{n}\delta_{X_{n,j}}(\cdot)\Big) \xrightarrow{v} \mu \tag{2.64}
\]
in $M_+(S)$.
(ii) For a measure $\mu\in M_+(S)$, we have
\[
\frac{1}{a_n}\sum_{j=1}^{n}\delta_{X_{n,j}} \Rightarrow \mu \tag{2.65}
\]
on $M_+(S)$ for some $a_n$ with $0<a_n\uparrow\infty$ iff
\[
\frac{n}{a_n}P[X_{n,1}\in\cdot] = E\Big(\frac{1}{a_n}\sum_{j=1}^{n}\delta_{X_{n,j}}(\cdot)\Big) \xrightarrow{v} \mu \tag{2.66}
\]
in $M_+(S)$.

Proof. (i) For $f\in C_K^+(S)$,
\[
E\Big[\exp\Big(-\sum_{j=1}^{n}\delta_{X_{n,j}}(f)\Big)\Big] = E\Big[\exp\Big(-\sum_{j=1}^{n}f(X_{n,j})\Big)\Big] = (E\exp(-f(X_{n,1})))^n
= \Big(1-\frac{E(n(1-\exp(-f(X_{n,1}))))}{n}\Big)^n
\]
\[
= \Big(1-\frac{\int_S(1-\exp(-f(x)))\,nP[X_{n,1}\in dx]}{n}\Big)^n,
\]
which converges to
\[
\exp\Big(-\int_S(1-\exp(-f(x)))\mu(dx)\Big),
\]
the Laplace functional of $PRM(\mu)$, iff
\[
\int_S(1-\exp(-f(x)))\,nP[X_{n,1}\in dx] \to \int_S(1-e^{-f(x)})\mu(dx).
\]
This last statement is equivalent to vague convergence in (2.64).

(ii) It is enough to prove that the Laplace functionals of $\frac{1}{a_n}\sum_{i=1}^{n}\delta_{X_{n,i}}$ converge. We compute the Laplace functional for the quantity on the left side of (2.65):
\[
E\exp\Big(-\frac{\sum_{i=1}^{n}\delta_{X_{n,i}}(f)}{a_n}\Big) = \Big(E\exp\Big(-\frac{1}{a_n}f(X_{n,1})\Big)\Big)^n = \Big(1-\frac{\int_S(1-\exp(-\frac{1}{a_n}f(x)))\,nP[X_{n,1}\in dx]}{n}\Big)^n,
\]
which converges to $\exp(-\mu(f))$, the Laplace functional of $\mu$, iff
\[
\int_S\Big(1-\exp\Big(-\frac{1}{a_n}f(x)\Big)\Big)nP[X_{n,1}\in dx] \to \mu(f). \tag{2.67}
\]
We prove that (2.67) is equivalent to (2.66) as follows. Suppose (2.66) holds. First,
\[
\int_S(1-\exp(-f(x)/a_n))\,nP[X_{n,1}\in dx] \le \int_S f(x)\frac{n}{a_n}P[X_{n,1}\in dx] \to \mu(f),
\]
so
\[
\limsup_{n\to\infty}\int_S(1-\exp(-f(x)/a_n))\,nP[X_{n,1}\in dx] \le \mu(f).
\]
Furthermore,
\[
\int_S(1-\exp(-f(x)/a_n))\,nP[X_{n,1}\in dx] \ge \int_S f(x)\frac{n}{a_n}P[X_{n,1}\in dx] - \int_S\frac{f^2(x)}{2a_n}\frac{n}{a_n}P[X_{n,1}\in dx] =: I - II.
\]
Now $I\to\mu(f)$ from (2.66), and
\[
II \sim \frac{\mu(f^2)}{2a_n} \to 0
\]
since $f^2\in C_K^+(S)$ and $a_n\uparrow\infty$. So
\[
\liminf_{n\to\infty}\int_S(1-\exp(-f(x)/a_n))\,nP[X_{n,1}\in dx] \ge \mu(f).
\]
Hence (2.67) holds. Conversely, let $f\in C_K^+(S)$, and suppose that $f\le 1$. Assuming (2.67), we have
\[
f/a_n \ge 1-\exp(-f/a_n),
\]
leading to
\[
\liminf_{n\to\infty}\int_S f(x)\frac{n}{a_n}P[X_{n,1}\in dx] \ge \mu(f),
\]
and
\[
\frac{f}{a_n}-\frac{f^2}{2a_n^2} \le 1-\exp(-f/a_n),
\]
leading to
\[
\limsup_{n\to\infty}\int_S\Big(\frac{f(x)}{a_n}-\frac{f^2(x)}{2a_n^2}\Big)nP[X_{n,1}\in dx] \le \mu(f).
\]
As before, we may show that
\[
\int_S\frac{f^2(x)}{2a_n^2}\,nP[X_{n,1}\in dx] \to 0.
\]
Hence (2.66) holds.

Furthermore, we have
Lemma 2.6. Suppose $\{X, X_1, X_2,\cdots\}$ are i.i.d. random variables. The following statements are equivalent:
(i)
\[
nP\Big[\frac{X}{b_n}\in\cdot\Big] \xrightarrow{v} \nu
\]
in $[-\infty,\infty]\setminus\{0\}$;
(ii)
\[
\sum_{j}\delta_{(\frac{j}{n},\,X_j/b_n)} \Rightarrow PRM(\lambda\times\nu) \tag{2.68}
\]
in $M_p([0,\infty)\times[-\infty,\infty]\setminus\{0\})$, where $\lambda$ is Lebesgue measure.

Proof. It suffices to prove (2.68) in $M_p([0,T]\times[-\infty,\infty]\setminus\{0\})$ for any $T>0$. To see this, observe that for $f\in C_K^+([0,\infty)\times[-\infty,\infty]\setminus\{0\})$ with compact support in $[0,T]\times[-\infty,\infty]\setminus\{0\}$, the Laplace functional of a random measure at $f$ is the same as that of the restriction of the random measure to $[0,T]\times[-\infty,\infty]\setminus\{0\}$ evaluated on the restriction of $f$ to $[0,T]\times[-\infty,\infty]\setminus\{0\}$. For convenience, we restrict our attention to proving convergence in $M_+([0,1]\times[-\infty,\infty]\setminus\{0\})$. Suppose that $U_1,\cdots,U_n$ are i.i.d. $U(0,1)$ random variables, which are independent of $\{X_j\}$, with order statistics
\[
U_{1:n}\le U_{2:n}\le\cdots\le U_{n:n}.
\]
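We need to prove
\[
\sum_{j=1}^{n}\delta_{(U_{j:n},\,X_j/b_n)} \Rightarrow PRM(\lambda\times\nu) \tag{2.69}
\]
in $M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$. First, from the independence of $\{U_j\}$ and $\{X_n\}$,
\[
\sum_{j=1}^{n}\delta_{(U_{j:n},\,X_j/b_n)} \stackrel{d}{=} \sum_{j=1}^{n}\delta_{(U_j,\,X_j/b_n)}
\]
as random elements of $M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$. Thus we need to prove
\[
\sum_{j=1}^{n}\delta_{(U_j,\,X_j/b_n)} \Rightarrow PRM(\lambda\times\nu)
\]
in $M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$. However, because of independence,
\[
nP\Big[\Big(U_1,\frac{X_1}{b_n}\Big)\in\cdot\Big] = \lambda\times nP\Big[\frac{X_1}{b_n}\in\cdot\Big] \xrightarrow{v} \lambda\times\nu,
\]
and therefore, from Lemma 2.5, (2.69) follows.

Let $d(\cdot,\cdot)$ be the vague metric on $M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$. From Slutsky's theorem, it is enough to prove that
\[
d\Big(\sum_{j=1}^{n}\delta_{(\frac{j}{n},\,X_j/b_n)},\ \sum_{j=1}^{n}\delta_{(U_{j:n},\,X_j/b_n)}\Big) \xrightarrow{P} 0 \tag{2.70}
\]
as $n\to\infty$. From the definition of the vague metric, it is enough to prove for $h\in C_K^+([0,1]\times[-\infty,\infty]\setminus\{0\})$ that
\[
\Big|\sum_{j=1}^{n}h\Big(\frac{j}{n},X_j/b_n\Big) - \sum_{j=1}^{n}h(U_{j:n},X_j/b_n)\Big| \xrightarrow{P} 0 \tag{2.71}
\]
in $\mathbb{R}$. Suppose the compact support of $h$ is contained in $[0,1]\times\{x:|x|>\theta\}$ for some $\theta>0$. We have
\[
\sum_{j=1}^{n}\Big|h\Big(\frac{j}{n},X_j/b_n\Big)-h(U_{j:n},X_j/b_n)\Big|\mathbf{1}_{[|X_j|/b_n>\theta]} \le \omega_h\Big(\sup_{j\le n}\Big|\frac{j}{n}-U_{j:n}\Big|\Big)\sum_{j=1}^{n}\mathbf{1}_{[|X_j|/b_n>\theta]},
\]
where, as usual, $\omega_h(\eta)$ is the modulus of continuity of the uniformly continuous function $h$:
\[
\omega_h(\eta) = \sup_{|x-y|\le\eta}|h(x)-h(y)|,
\]
and
\[
\sum_{j=1}^{n}\mathbf{1}_{[|X_j|/b_n>\theta]} = \sum_{j=1}^{n}\delta_{X_j/b_n}(\{x:|x|>\theta\})
\]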
is stochastically bounded. So it is enough to prove that
\[
\sup_{j\le n}\Big|\frac{j}{n}-U_{j:n}\Big| \xrightarrow{P} 0.
\]
However, from the Glivenko-Cantelli theorem,
\[
\sup_{0\le x\le 1}\Big|\frac1n\sum_{j=1}^{n}\mathbf{1}_{[U_j\le x]}-x\Big| \to 0\quad\text{a.s.}
\]
and hence, we obtain this lemma.
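The last step of the proof replaces the deterministic time points $j/n$ by uniform order statistics $U_{j:n}$. A small simulation makes the size of this replacement error visible; it is only a sketch, and the function name and sample sizes are illustrative assumptions.

```python
import random


def max_order_statistic_gap(n: int, rng: random.Random) -> float:
    """sup over j <= n of |j/n - U_{j:n}| for n i.i.d. uniform(0, 1) variables."""
    u = sorted(rng.random() for _ in range(n))
    # u[j] is the (j + 1)-th order statistic, to be compared with (j + 1) / n.
    return max(abs((j + 1) / n - u[j]) for j in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 10_000, 100_000):
        print(n, max_order_statistic_gap(n, rng))   # decreases roughly like n**(-1/2)
```

The gap coincides with the Kolmogorov-Smirnov type distance evaluated at the sample points, so it is of order $n^{-1/2}$, which is more than enough for the modulus of continuity $\omega_h$ to vanish.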
Assume that $\{X_n\}_{n\ge 1}$ is a sequence of i.i.d. random variables which are regularly varying with index $\alpha\in(0,2)$, and let
\[
p = \lim_{x\to\infty}\frac{P(X_1>x)}{P(|X_1|>x)},\qquad q = 1-p = \lim_{x\to\infty}\frac{P(X_1<-x)}{P(|X_1|>x)}.
\]
The following theorem holds.

Theorem 2.6. Suppose that
\[
\lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty}\frac{n}{a_n^2}E(X_1^2\mathbf{1}_{|X_1|\le a_n\varepsilon}) = 0, \tag{2.72}
\]
where $\{a_n\}_{n\ge 1}$ is a sequence of positive real numbers such that $nP(|X_1|>a_n)\to 1$. Define the partial sum stochastic process
\[
Z_n(t) = \frac{1}{a_n}\sum_{k=1}^{[nt]}(X_k-E(X_k\mathbf{1}_{[|X_k|\le a_n]})),\qquad t\ge 0.
\]
Then
\[
Z_n \Rightarrow Z_\alpha
\]
in $D[0,1]$ endowed with the $J_1$ topology, where $(Z_\alpha(t))_{t\in[0,1]}$ is an $\alpha$-stable Lévy process with Lévy triple $(b,0,\nu)$ given by
\[
\nu(dx) = (p\mathbf{1}_{(0,\infty)}(x)+q\mathbf{1}_{(-\infty,0)}(x))\alpha|x|^{-\alpha-1}dx,\qquad b = \int_{\{|x|\le 1\}}x\,\nu(dx).
\]
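Condition (2.72) can be evaluated in closed form for a simple model. The sketch below assumes, for illustration only, that $|X_1|$ has the Pareto tail $P(|X_1|>x)=x^{-\alpha}$ for $x\ge 1$, so that $a_n=n^{1/\alpha}$ satisfies $nP(|X_1|>a_n)=1$ and $E(X_1^2\mathbf{1}_{|X_1|\le c})=\alpha(c^{2-\alpha}-1)/(2-\alpha)$ for $c\ge 1$. The function names are hypothetical.

```python
def truncated_second_moment(n: int, eps: float, alpha: float) -> float:
    """(n / a_n^2) * E[X^2 ; |X| <= a_n * eps] for the Pareto model, a_n = n**(1/alpha)."""
    a_n = n ** (1.0 / alpha)
    c = a_n * eps
    if c <= 1.0:
        return 0.0                       # |X| >= 1 a.s., so the truncated moment vanishes
    second = alpha * (c ** (2.0 - alpha) - 1.0) / (2.0 - alpha)
    return n * second / a_n ** 2


def limit_in_n(eps: float, alpha: float) -> float:
    """Limit as n -> infinity: alpha * eps**(2 - alpha) / (2 - alpha)."""
    return alpha * eps ** (2.0 - alpha) / (2.0 - alpha)


if __name__ == "__main__":
    alpha = 1.5
    for eps in (0.1, 0.01, 0.0001):
        print(eps, truncated_second_moment(10 ** 12, eps, alpha), limit_in_n(eps, alpha))
```

For $\alpha<2$ the limit in $n$ is proportional to $\varepsilon^{2-\alpha}$ and therefore vanishes as $\varepsilon\downarrow 0$, so this model satisfies (2.72).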
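Proof. From Lemma 2.6, we know that
\[
\sum_{k=1}^{\infty}\delta_{(\frac{k}{n},\,\frac{X_k}{a_n})} \Rightarrow \sum_{k=1}^{\infty}\delta_{(t_k,j_k)} = PRM(\lambda\times\nu)
\]
in $M_p([0,\infty)\times[-\infty,\infty]\setminus\{0\})$. For convenience, we assume that 1 is not a jump of the function
\[
\tau(t) = \nu\{x:|x|>t\}.
\]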
Here and in the rest of the proof, we always assume that $\varepsilon$ is chosen so that $\varepsilon$ is not a jump point of $\tau(\cdot)$, and we fix some $T>0$. Define the restriction map
\[
M_p([0,\infty)\times[-\infty,\infty]\setminus\{0\})\to M_p([0,\infty)\times\{x:|x|>\varepsilon\}),\qquad m\mapsto m|_{[0,\infty)\times\{x:|x|>\varepsilon\}}.
\]
It is almost surely continuous with respect to the distribution of $PRM(\lambda\times\nu)$. We claim that the summation functional $\psi^{(\varepsilon)}:M_p([0,\infty)\times\{x:|x|>\varepsilon\})\to D([0,T],\mathbb{R})$ defined by
\[
\psi^{(\varepsilon)}\Big(\sum_{k=1}^{\infty}\delta_{(\tau_k,J_k)}\Big)(t)=\sum_{\tau_k\le t}J_k
\]
is almost surely continuous with respect to the distribution of $PRM(\lambda\times\nu)$. We will show that if $m_1,m_2\in M_p([0,\infty)\times\{x:|x|>\varepsilon\})$ are close, then $\psi^{(\varepsilon)}(m_1)$ and $\psi^{(\varepsilon)}(m_2)$ are close as functions in $D[0,T]$. Define the subset of $M_p([0,\infty)\times\{x:|x|>\varepsilon\})$
\[
\begin{aligned}
\Lambda=\{\,m:\ & m([0,\infty)\times\{\pm\varepsilon\})=0,\ m([0,\infty)\times\{\pm\infty\})=0,\\
& m(\{0\}\times\{x:|x|>\varepsilon\})=m(\{T\}\times\{x:|x|>\varepsilon\})=0,\\
& m([0,T]\times\{x:|x|>\varepsilon\})<\infty,\\
& \text{and no vertical line contains two points of } m(([0,T]\times\{x:|x|>\varepsilon\})\cap\cdot)\,\}.
\end{aligned}
\]
Denote $PRM(\lambda\times\nu)$ by $N$. We write $\Lambda$ as an intersection of several sets and show that each of them has probability 1. First, we have
\[
E\big(N([0,T]\times\{x:|x|>\varepsilon\})\big)=T\nu(\{x:|x|>\varepsilon\})<\infty,
\]
so $P[N([0,T]\times\{x:|x|>\varepsilon\})<\infty]=1$. Second, we have
\[
\lambda\times\nu(\{0\}\times\{x:|x|>\varepsilon\})=\lambda(\{0\})\cdot\nu(\{x:|x|>\varepsilon\})=0,
\]
so $E(N(\{0\}\times\{x:|x|>\varepsilon\}))=0$, and therefore $P[N(\{0\}\times\{x:|x|>\varepsilon\})=0]=1$.
Similarly, $P[N(\{T\}\times\{x:|x|>\varepsilon\})=0]=1$. Next, we show
\[
P\{\text{no vertical line contains two points of } N(([0,T]\times\{x:|x|>\varepsilon\})\cap\cdot)\}=1. \tag{2.73}
\]
We can represent
\[
N(([0,T]\times\{x:|x|>\varepsilon\})\cap\cdot)=\sum_{i=1}^{\xi}\delta_{(U_i,V_i)}(\cdot),
\]
where $\xi$ is a Poisson random variable with parameter $T\nu(\{x:|x|>\varepsilon\})$, $\{U_i,i\ge1\}$ are i.i.d. uniformly distributed on $(0,T)$, and $\{V_i,i\ge1\}$ are i.i.d. with distribution $\nu(\{x:|x|>\varepsilon\}\cap\cdot)/\nu(\{x:|x|>\varepsilon\})$. Then
\[
\begin{aligned}
P\{\text{some vertical line contains two points of } & N(([0,T]\times\{x:|x|>\varepsilon\})\cap\cdot)\}\\
&=P\Big\{\bigcup_{1\le i<j\le\xi}[U_i=U_j]\Big\}\\
&=\sum_{n=0}^{\infty}P\Big\{\bigcup_{1\le i<j\le n}[U_i=U_j]\Big\}P[\xi=n]\\
&\le\sum_{n=0}^{\infty}\binom{n}{2}P[U_1=U_2]\,P[\xi=n]\\
&=\sum_{n=0}^{\infty}\binom{n}{2}\cdot0\cdot P[\xi=n]=0.
\end{aligned}
\]
This gives (2.73). Thus $P[N\in\Lambda]=1$.

For continuity of $\psi^{(\varepsilon)}$, it is enough to show that $\psi^{(\varepsilon)}(m_n)\to\psi^{(\varepsilon)}(m)$ in $D[0,T]$ in the $J_1$ topology whenever $m_n\xrightarrow{v}m$ in $M_p$ for some $m\in\Lambda$. Since $m\in\Lambda$, there exists a nonnegative integer $k=k(m)$ such that
\[
m([0,T]\times\{x:|x|>\varepsilon\})=k<\infty.
\]
By the definition of $\Lambda$, $m$ does not have any atoms on the horizontal lines at $\varepsilon$ or $-\varepsilon$. Thus there exists a positive integer $n_0$ such that for all $n\ge n_0$,
\[
m_n([0,T]\times\{x:|x|>\varepsilon\})=k<\infty.
\]
If $k=0$, there is nothing to prove, so we assume $k\ge1$. Let $(t_i,x_i)$ and $(t_i^{(n)},x_i^{(n)})$ be the atoms of $m$ and $m_n$ in $[0,T]\times\{x:|x|>\varepsilon\}$ respectively. Pick $\gamma$ so small that
\[
t_1-\gamma>\gamma,\qquad t_i+\gamma<t_{i+1}-\gamma,\ i=1,2,\cdots,k-1,\qquad t_k+\gamma<T-\gamma,
\]
and there exists $n_1>n_0$ such that for $n>n_1$,
\[
(t_i^{(n)},x_i^{(n)})\in(t_i-\gamma,t_i+\gamma)\times(x_i-\gamma,x_i+\gamma),\qquad1\le i\le k.
\]
Define a continuous one-to-one mapping $\lambda_n:[0,T]\to[0,T]$ by
\[
\lambda_n(0)=0,\qquad\lambda_n(T)=T,\qquad\lambda_n(t_i^{(n)})=t_i,\quad i=1,\cdots,k,
\]
and between these points let $\lambda_n$ be defined by linear interpolation. We have
\[
\begin{aligned}
\sup_{t\in[0,T]}\big|\psi^{(\varepsilon)}(m_n)(\lambda_n^{-1}(t))-\psi^{(\varepsilon)}(m)(t)\big|
&=\sup_{t\in[0,T]}\Big|\sum_{t_i^{(n)}\le\lambda_n^{-1}(t)}x_i^{(n)}-\sum_{t_i\le t}x_i\Big|\\
&=\sup_{t\in[0,T]}\Big|\sum_{\lambda_n(t_i^{(n)})\le t}x_i^{(n)}-\sum_{t_i\le t}x_i\Big|\\
&=\sup_{t\in[0,T]}\Big|\sum_{t_i\le t}x_i^{(n)}-\sum_{t_i\le t}x_i\Big|\\
&\le\sum_{t_i\le T}\big|x_i^{(n)}-x_i\big|\le k\gamma.
\end{aligned}
\]
On the other hand, with the conventions $t_0=t_0^{(n)}=0$ and $t_{k+1}=t_{k+1}^{(n)}=T$,
\[
\sup_{s\in[0,T]}|\lambda_n(s)-s|=\bigvee_{i=0}^{k}\ \sup_{s\in[t_i^{(n)},t_{i+1}^{(n)}]}|\lambda_n(s)-s|.
\]
On the first interval,
\[
\sup_{s\in[0,t_1^{(n)}]}|\lambda_n(s)-s|=\sup_{s\in[0,t_1^{(n)}]}\Big|\frac{t_1}{t_1^{(n)}}s-s\Big|\le\Big|\frac{t_1}{t_1^{(n)}}-1\Big|\,t_1^{(n)}=|t_1-t_1^{(n)}|\le\gamma.
\]
On $[t_i^{(n)},t_{i+1}^{(n)}]$,
\[
\lambda_n(s)=\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}\,s+t_i-\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}\cdot t_i^{(n)}.
\]
Thus
\[
\begin{aligned}
\sup_{s\in[t_i^{(n)},t_{i+1}^{(n)}]}|\lambda_n(s)-s|
&=\sup_{s\in[t_i^{(n)},t_{i+1}^{(n)}]}\Big|\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}\,s+t_i-\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}\cdot t_i^{(n)}-s\Big|\\
&=\sup_{s\in[0,\,t_{i+1}^{(n)}-t_i^{(n)}]}\Big|\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}\,(s+t_i^{(n)})+t_i-\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}\cdot t_i^{(n)}-(s+t_i^{(n)})\Big|\\
&=\sup_{s\in[0,\,t_{i+1}^{(n)}-t_i^{(n)}]}\Big|\Big(\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}-1\Big)s+t_i-t_i^{(n)}\Big|\\
&\le\Big|\Big(\frac{t_{i+1}-t_i}{t_{i+1}^{(n)}-t_i^{(n)}}-1\Big)\big(t_{i+1}^{(n)}-t_i^{(n)}\big)\Big|+|t_i-t_i^{(n)}|\\
&\le|t_{i+1}-t_{i+1}^{(n)}|+|t_i-t_i^{(n)}|+\gamma\le3\gamma.
\end{aligned}
\]
So
\[
\sup_{s\in[0,T]}|\lambda_n(s)-s|\le3\gamma.
\]
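The time change used above can be illustrated with a short sketch. It builds the piecewise-linear map that fixes $0$ and $T$ and sends each approximating jump time to the corresponding limit jump time, and then evaluates $\sup_s|\lambda_n(s)-s|$ on a grid. The jump times, the value of $\gamma$ and the function name `make_time_change` are illustrative assumptions, not data from the text.

```python
def make_time_change(t_limit, t_approx, horizon):
    """Piecewise-linear map lam on [0, horizon] with lam(0) = 0,
    lam(horizon) = horizon and lam(t_approx[i]) = t_limit[i]."""
    xs = [0.0] + list(t_approx) + [horizon]
    ys = [0.0] + list(t_limit) + [horizon]

    def lam(s):
        for i in range(len(xs) - 1):
            if xs[i] <= s <= xs[i + 1]:
                slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
                return ys[i] + slope * (s - xs[i])
        raise ValueError("s must lie in [0, horizon]")

    return lam


T = 1.0
gamma = 0.01
t_limit = [0.2, 0.5, 0.8]         # jump times t_i of the limit measure (assumed)
t_approx = [0.205, 0.493, 0.809]  # jump times t_i^(n), each within gamma of t_i

lam = make_time_change(t_limit, t_approx, T)
grid = [T * i / 10_000 for i in range(10_001)]
sup_dev = max(abs(lam(s) - s) for s in grid)
print(sup_dev, 3 * gamma)
```

Because $\lambda_n(s)-s$ is piecewise linear, its supremum is attained at a node, so in this example the deviation is in fact at most $\gamma$; the bound $3\gamma$ from the proof is not sharp but is all that the argument needs.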
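Thus $\psi^{(\varepsilon)}(m_n)\to\psi^{(\varepsilon)}(m)$ in $D[0,T]$ in the $J_1$ topology. Hence $\psi^{(\varepsilon)}$ is almost surely continuous with respect to the distribution of $N$. Then we have
\[
\sum_{k=1}^{\infty}1_{[|X_k|>a_n\varepsilon]}\,\delta_{(k/n,\,X_k/a_n)}\Rightarrow\sum_{k=1}^{\infty}1_{[|j_k|>\varepsilon]}\,\delta_{(t_k,\,j_k)}
\]
in $M_p([0,\infty)\times\{x:|x|>\varepsilon\})$, and
\[
\sum_{k=1}^{[n\cdot]}\frac{X_k}{a_n}1_{[|X_k|>a_n\varepsilon]}\Rightarrow\sum_{t_k\le(\cdot)}j_k1_{[|j_k|>\varepsilon]}
\]
in $D([0,T])$. Similarly,
\[
\sum_{k=1}^{[n\cdot]}\frac{X_k}{a_n}1_{[\varepsilon<|X_k/a_n|\le1]}\Rightarrow\sum_{t_k\le(\cdot)}j_k1_{[\varepsilon<|j_k|\le1]}.
\]
Then, taking expectations, we have
\[
[n\cdot]\,E\Big(\frac{X_1}{a_n}1_{[\varepsilon<|X_1/a_n|\le1]}\Big)\to(\cdot)\int_{\{x:\varepsilon<|x|\le1\}}x\,\nu(dx)
\]
in $D([0,T])$. To justify this, observe first that for any $t>0$,
\[
[nt]\,E\Big(\frac{X_1}{a_n}1_{[\varepsilon<|X_1/a_n|\le1]}\Big)=\frac{[nt]}{n}\int_{\{x:|x|\in(\varepsilon,1]\}}x\,nP\Big[\frac{X_1}{a_n}\in dx\Big]\to t\int_{\{x:|x|\in(\varepsilon,1]\}}x\,\nu(dx),
\]
since $nP[X_1/a_n\in\cdot]\xrightarrow{v}\nu(\cdot)$ and $\varepsilon$ and 1 are not jumps of $\tau(\cdot)$. The convergence is locally uniform in $t$, and hence it takes place in $D([0,T])$. Thus
\[
\begin{aligned}
Z_n^{(\varepsilon)}(\cdot)&:=\sum_{k=1}^{[n\cdot]}\frac{X_k}{a_n}1_{[|X_k|>a_n\varepsilon]}-[n\cdot]\,E\Big(\frac{X_1}{a_n}1_{[\varepsilon<|X_1/a_n|\le1]}\Big)\\
&\Rightarrow\sum_{t_k\le(\cdot)}j_k1_{[|j_k|>\varepsilon]}-(\cdot)\int_{\{x:|x|\in(\varepsilon,1]\}}x\,\nu(dx)=:Z_\alpha^{(\varepsilon)}(\cdot).
\end{aligned}
\]
From the Itô representation of a Lévy process, for almost all $\omega$, as $\varepsilon\downarrow0$,
\[
Z_\alpha^{(\varepsilon)}(\cdot)\to Z_\alpha(\cdot)
\]
locally uniformly in $t$. Hence, it suffices to show that
\[
\lim_{\varepsilon\downarrow0}\limsup_{n\to\infty}P\Big(\sup_{0\le t\le T}\big|Z_n^{(\varepsilon)}(t)-Z_n(t)\big|>\delta\Big)=0 \tag{2.74}
\]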
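for any $\delta>0$. Recalling the definition of $Z_n^{(\varepsilon)}$, for $\varepsilon<1$ we have
\[
\big|Z_n^{(\varepsilon)}(t)-Z_n(t)\big|
=\Big|\sum_{k=1}^{[nt]}\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}-[nt]\,E\Big(\frac{X_1}{a_n}1_{[|X_1|\le a_n\varepsilon]}\Big)\Big|
=\Big|\sum_{k=1}^{[nt]}\Big(\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}-E\Big(\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}\Big)\Big)\Big|,
\]
so by Kolmogorov's inequality,
\[
\begin{aligned}
P\Big(\sup_{0\le t\le T}\big|Z_n^{(\varepsilon)}(t)-Z_n(t)\big|>\delta\Big)
&=P\Big[\sup_{0\le t\le T}\Big|\sum_{k=1}^{[nt]}\Big(\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}-E\Big(\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}\Big)\Big)\Big|>\delta\Big]\\
&=P\Big[\max_{0\le j\le nT}\Big|\sum_{k=1}^{j}\Big(\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}-E\Big(\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}\Big)\Big)\Big|>\delta\Big]\\
&\le\frac{1}{\delta^{2}}Var\Big(\sum_{k=1}^{[nT]}\frac{X_k}{a_n}1_{[|X_k|\le a_n\varepsilon]}\Big)\\
&=\frac{[nT]}{\delta^{2}}Var\Big(\frac{X_1}{a_n}1_{[|X_1|\le a_n\varepsilon]}\Big)\\
&\le\frac{[nT]}{\delta^{2}}E\Big(\Big(\frac{X_1}{a_n}\Big)^{2}1_{[|X_1|\le a_n\varepsilon]}\Big)\\
&\to\frac{T}{\delta^{2}}\int_{\{|x|\le\varepsilon\}}x^{2}\,\nu(dx)
\end{aligned}
\]
as $n\to\infty$. Letting $\varepsilon\to0$, we obtain (2.74). The theorem is proved.

2.3.2 Functional limit theorem for dependent heavy-tailed sequence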
In the previous subsection, we discussed the functional limit theorem for an independent heavy-tailed sequence by means of the point process convergence method. In this subsection, we discuss the functional limit theorem for a dependent heavy-tailed sequence that is regularly varying with index $\alpha\in(0,2)$. As in the last subsection, we investigate the asymptotic distributional behavior of the processes
\[
Z_n(t)=\frac{1}{a_n}\Big(\sum_{i=1}^{[nt]}X_i-[nt]b_n\Big),\qquad t\in[0,1],
\]
valued in $D[0,1]$, where $\{X_i\}_{i\ge1}$ is a stationary sequence of random variables and $\{a_n\}_{n\ge1}$ is a sequence of positive real numbers such that
\[
nP(|X_1|>a_n)\to1
\]
as $n\to\infty$ and $b_n=E(X_11_{\{|X_1|\le a_n\}})$.

In the independent case, the weak convergence of $Z_n$ holds in Skorohod's $J_1$ topology. In the dependent case, the asymptotic behavior of $Z_n$ is more complex and depends on the dependence structure. Leadbetter and Rootzén (1988) and Tyran-Kamińska (2010) studied this subject. Essentially, what they showed is that the functional limit theorem holds in Skorohod's $J_1$ topology if and only if certain point processes converge to a Poisson point process, which in turn is equivalent to a kind of nonclustering property for heavy-tailed random variables. However, for many interesting models, convergence in the $J_1$ topology cannot hold. For example, Avram and Taqqu (1992) showed that when $\{X_n\}_{n\in\mathbb{Z}}$ ($\mathbb{Z}$ denotes the set of integers) is a linear process with regularly varying innovations and nonnegative coefficients, $\{Z_n(\cdot)\}$ converges weakly in the $M_1$ topology instead. Basrak, Krizmanić and Segers (2012) showed that for a stationary, regularly varying sequence for which clusters of high-threshold excesses can be broken down into asymptotically independent blocks, $\{Z_n(\cdot)\}$ converges weakly to an $\alpha$-stable Lévy process in $D[0,1]$ with the $M_1$ topology.

The method of proving convergence in the $M_1$ topology is different from that in the $J_1$ topology. In 2.3.1, we have already introduced the point process method in the $J_1$ topology. For completeness, we present the result of Basrak, Krizmanić and Segers in this subsection.

To express the dependence within $\{X_n\}_{n\in\mathbb{Z}}$, it is necessary to introduce the notion of joint regular variation.

Definition 2.4. For a strictly stationary sequence $(X_n)_{n\in\mathbb{Z}}$, if for any positive integer $k$ and any norm $\|\cdot\|$ on $\mathbb{R}^{k}$ there exist a random vector $\Theta$ on the unit sphere $S^{k-1}=\{x\in\mathbb{R}^{k}:\|x\|=1\}$ and $\alpha\in(0,\infty)$ such that for every $u\in(0,\infty)$,
\[
\frac{P(\|X\|>ux,\ X/\|X\|\in\cdot)}{P(\|X\|>x)}\xrightarrow{v}u^{-\alpha}P(\Theta\in\cdot)
\]
as $x\to\infty$, where $X=(X_1,\cdots,X_k)$, we say that $\{X_n\}_{n\in\mathbb{Z}}$ is jointly regularly varying with index $\alpha$.

Basrak and Segers (2009) provide a convenient characterization of joint regular variation: it is necessary and sufficient that there exists a strictly stationary process $\{Y_n\}_{n\in\mathbb{Z}}$ with $P(|Y_1|>y)=y^{-\alpha}$ for $y>1$ such that, as $x\to\infty$,
\[
\big\{(x^{-1}X_n)_{n\in\mathbb{Z}}\ \big|\ |X_1|>x\big\}\xrightarrow{fidi}\{Y_n\}_{n\in\mathbb{Z}},
\]
where $\xrightarrow{fidi}$ denotes convergence of finite-dimensional distributions. The process $\{Y_n\}_{n\in\mathbb{Z}}$ is called the tail process of $\{X_n\}_{n\in\mathbb{Z}}$.

Theorem 2.7. Let $\{X_n\}_{n\in\mathbb{Z}}$ be a strictly stationary sequence of random variables, jointly regularly varying with index $\alpha\in(0,2)$, whose tail process $\{Y_n\}_{n\in\mathbb{Z}}$ almost surely has no two values of opposite sign. Furthermore, suppose that there exists a sequence of positive integers $\{r_n\}_{n\ge1}$ such that $r_n\to\infty$, $r_n/n\to0$ as $n\to\infty$, and for every $u>0$,
\[
\lim_{m\to\infty}\limsup_{n\to\infty}P\Big(\max_{m\le i\le r_n}|X_i|>ua_n\ \Big|\ |X_0|>ua_n\Big)=0, \tag{2.75}
\]
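and for every $f\in C_K^{+}([0,1]\times[-\infty,\infty]\setminus\{0\})$,
\[
E\Big[\exp\Big\{-\sum_{i=1}^{n}f\Big(\frac{i}{n},\frac{X_i}{a_n}\Big)\Big\}\Big]-\prod_{k=1}^{k_n}E\Big[\exp\Big\{-\sum_{i=1}^{r_n}f\Big(\frac{kr_n}{n},\frac{X_i}{a_n}\Big)\Big\}\Big]\to0 \tag{2.76}
\]
as $n\to\infty$, where $k_n=[n/r_n]$. If $1\le\alpha<2$, also suppose that for all $\delta>0$,
\[
\lim_{u\downarrow0}\limsup_{n\to\infty}P\Big(\max_{1\le k\le n}\Big|\sum_{i=1}^{k}\Big(\frac{X_i}{a_n}1_{\{|X_i|\le ua_n\}}-E\Big(\frac{X_i}{a_n}1_{\{|X_i|\le ua_n\}}\Big)\Big)\Big|>\delta\Big)=0. \tag{2.77}
\]
Then $Z_n\Rightarrow Z_\alpha$ in $D[0,1]$ endowed with the $M_1$ topology, where $(Z_\alpha(t))_{t\in[0,1]}$ is an $\alpha$-stable Lévy process with Lévy triple $(b,0,\nu)$ given by the limits
\[
\nu^{(u)}\xrightarrow{v}\nu,\qquad\int_{\{u<|x|\le1\}}x\,\nu^{(u)}(dx)-\int_{\{u<|x|\le1\}}x\,\mu(dx)\to b
\]
as $u\downarrow0$, with
\[
\nu^{(u)}(x,\infty)=u^{-\alpha}P\Big(u\sum_{i\ge0}Y_i1_{\{|Y_i|>1\}}>x,\ \sup_{i\le-1}|Y_i|\le1\Big),
\]
\[
\nu^{(u)}(-\infty,-x)=u^{-\alpha}P\Big(u\sum_{i\ge0}Y_i1_{\{|Y_i|>1\}}<-x,\ \sup_{i\le-1}|Y_i|\le1\Big),
\]
\[
\mu(dx)=\big(p1_{(0,\infty)}(x)+q1_{(-\infty,0)}(x)\big)\alpha|x|^{-\alpha-1}dx,
\]
\[
p=\lim_{x\to\infty}\frac{P(X_1>x)}{P(|X_1|>x)},\qquad q=1-p=\lim_{x\to\infty}\frac{P(X_1<-x)}{P(|X_1|>x)}.
\]
Proof. Define the point process
\[
N_n=\sum_{i=1}^{n}\delta_{(i/n,\,X_i/a_n)},\qquad n\ge1.
\]
We first show that
\[
N_n\big|_{[0,1]\times[-\infty,-u)\cup(u,\infty]}\xrightarrow{d}N^{(u)}:=\sum_{i\ge1}\sum_{j\ge1}\delta_{(T_i^{(u)},\,uZ_{ij})}\Big|_{[0,1]\times[-\infty,-u)\cup(u,\infty]} \tag{2.78}
\]
in $M_p([0,1]\times[-\infty,-u)\cup(u,\infty])$, where
(1) $\sum_i\delta_{T_i^{(u)}}$ is a homogeneous Poisson process on $[0,1]$ with intensity $\theta u^{-\alpha}$;
(2) $(\sum_j\delta_{Z_{ij}})_i$ is an i.i.d. sequence of point processes in $\mathbb{R}$, independent of $\sum_i\delta_{T_i^{(u)}}$, and with common distribution equal to that of $(\sum_{n\ge1}\delta_{Y_n}\mid\sup_{i\le-1}|Y_i|\le1)$.
Let $\{X_{k,j}\}_{j\ge1}$, $k\ge1$, be independent copies of $\{X_j\}_{j\ge1}$, and define
\[
\hat N_n=\sum_{k=1}^{k_n}\hat N_{n,k}\quad\text{with}\quad\hat N_{n,k}=\sum_{j=1}^{r_n}\delta_{(kr_n/n,\,X_{k,j}/a_n)}.
\]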
By (2.76), the weak limits of $N_n$ and $\hat N_n$ must coincide. Hence it is enough to show that the Laplace functionals of $\hat N_n$ converge to those of $N^{(u)}$. Take $f\in C_K^{+}([0,1]\times[-\infty,-u)\cup(u,\infty])$. We extend $f$ to $[0,1]\times[-\infty,\infty]\setminus\{0\}$ by setting $f(t,x)=0$ whenever $|x|\le u$; hence there exists $M>0$ such that $0\le f(t,x)\le M1_{[-u,u]^{c}}(x)$. Then
\[
1\ge E[\exp(-\hat N_{n,k}f)]\ge E\Big[\exp\Big(-M\sum_{i=1}^{r_n}1_{\{|X_i|>a_nu\}}\Big)\Big]\ge1-Mr_nP(|X_0|>a_nu)=1-O(k_n^{-1})
\]
as $n\to\infty$. By $0\le-\log z-(1-z)\le(1-z)^{2}/z$ for $z\in(0,1]$, it follows that
\[
-\log E[\exp(-\hat N_nf)]=-\sum_{k=1}^{k_n}\log E[\exp(-\hat N_{n,k}f)]=\sum_{k=1}^{k_n}\big(1-E[\exp(-\hat N_{n,k}f)]\big)+O(k_n^{-1})
\]
as $n\to\infty$. Furthermore, by Proposition 4.2 in Basrak and Segers (2009),
\[
\theta:=P\Big(\sup_{i\ge1}|Y_i|\le1\Big)=\lim_{n\to\infty}P\Big(\max_{1\le i\le r_n}|X_i|\le a_nu\ \Big|\ |X_0|>a_nu\Big)=\lim_{n\to\infty}\frac{P(\max_{1\le i\le r_n}|X_i|>a_nu)}{r_nP(|X_0|>a_nu)},
\]
and
\[
\lim_{n\to\infty}E\Big[\sum_{i=1}^{r_n}1_{(a_nu,\infty)}(|X_i|)\ \Big|\ \max_{1\le i\le r_n}|X_i|>a_nu\Big]=\frac1\theta. \tag{2.79}
\]
Thus
\[
\begin{aligned}
\sum_{k=1}^{k_n}\big(1-E[\exp(-\hat N_{n,k}f)]\big)
&=k_nP\Big(\max_{1\le i\le r_n}|X_i|>a_nu\Big)\,\frac{1}{k_n}\sum_{k=1}^{k_n}E\Big[1-\exp\Big(-\sum_{j=1}^{r_n}f\Big(\frac{kr_n}{n},\frac{X_j}{a_n}\Big)\Big)\ \Big|\ \max_{1\le i\le r_n}|X_i|>a_nu\Big]\\
&=\theta u^{-\alpha}\,\frac{1}{k_n}\sum_{k=1}^{k_n}E\Big[1-\exp\Big(-\sum_{j=1}^{r_n}f\Big(\frac{kr_n}{n},\frac{X_j}{a_n}\Big)\Big)\ \Big|\ \max_{1\le i\le r_n}|X_i|>a_nu\Big]+o(1).
\end{aligned}
\]
Let $T_n$ be a random variable uniformly distributed on $\{kr_n/n:k=1,2,\cdots,k_n\}$ and independent of $\{X_n\}_{n\in\mathbb{Z}}$. We have
\[
\sum_{k=1}^{k_n}\big(1-E[\exp(-\hat N_{n,k}f)]\big)=\theta u^{-\alpha}E\Big[1-\exp\Big(-\sum_{j=1}^{r_n}f\Big(T_n,\frac{X_j}{a_n}\Big)\Big)\ \Big|\ \max_{1\le i\le r_n}|X_i|>a_nu\Big]+o(1).
\]
Note that $T_n$ converges in law to a random variable $T$ with a uniform distribution on $(0,1)$. By (2.79),
\[
\Big(T_n,\ \sum_{i=1}^{r_n}\delta_{X_i/a_n}\ \Big|\ \max_{1\le i\le r_n}|X_i|>a_nu\Big)\xrightarrow{d}\Big(T,\ \sum_{n=1}^{\infty}\delta_{uZ_n}\Big)
\]
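where $\sum_n\delta_{Z_n}$ is a point process on $[-\infty,\infty]\setminus\{0\}$, independent of $T$ and with distribution equal to that of $(\sum_{n\ge1}\delta_{Y_n}\mid\sup_{i\le-1}|Y_i|\le1)$. Thus
\[
\begin{aligned}
\lim_{n\to\infty}\theta u^{-\alpha}\,\frac{1}{k_n}\sum_{k=1}^{k_n}&E\Big[1-\exp\Big(-\sum_{j=1}^{r_n}f\Big(\frac{kr_n}{n},\frac{X_j}{a_n}\Big)\Big)\ \Big|\ \max_{1\le i\le r_n}|X_i|>a_nu\Big]\\
&=\theta u^{-\alpha}E\Big[1-\exp\Big(-\sum_{j\ge1}f(T,uZ_j)\Big)\Big]
=\theta u^{-\alpha}\int_0^1E\Big[1-\exp\Big(-\sum_{j\ge1}f(t,uZ_j)\Big)\Big]dt.
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
E[\exp(-N^{(u)}f)]&=E\Big[\exp\Big(-\sum_{i\ge1}\sum_{j\ge1}f(T_i^{(u)},uZ_{ij})\Big)\Big]\\
&=E\Big[\prod_{i\ge1}E\Big[\exp\Big(-\sum_{j\ge1}f(T_i^{(u)},uZ_{ij})\Big)\ \Big|\ T_i^{(u)}\Big]\Big]\\
&=E\Big[\exp\Big(\sum_{i\ge1}\log E\Big[\exp\Big(-\sum_{j\ge1}f(T_i^{(u)},uZ_{ij})\Big)\ \Big|\ T_i^{(u)}\Big]\Big)\Big].
\end{aligned}
\]
Since $\sum_i\delta_{T_i^{(u)}}$ is a homogeneous Poisson process on $[0,1]$ with intensity $\theta u^{-\alpha}$, the last expression equals $\exp\big(-\theta u^{-\alpha}\int_0^1E[1-\exp(-\sum_{j\ge1}f(t,uZ_j))]\,dt\big)$, so the Laplace functionals of $\hat N_n$ converge to those of $N^{(u)}$, and (2.78) is obtained.

Note that the random variables
\[
u\sum_{j\ge1}Z_{ij}1_{\{|Z_{ij}|>1\}},\qquad i\ge1,
\]
are i.i.d. and almost surely finite. Define
\[
\tilde N^{(u)}=\sum_{i\ge1}\delta_{(T_i^{(u)},\ u\sum_jZ_{ij}1_{\{|Z_{ij}|>1\}})}.
\]
Then $\tilde N^{(u)}$ is a Poisson random measure with mean measure $\theta u^{-\alpha}\lambda\times F^{(u)}$, where $\lambda$ is the Lebesgue measure and $F^{(u)}$ is the distribution of $u\sum_{j\ge1}Z_{1j}1_{\{|Z_{1j}|>1\}}$. For $0\le s\le t\le1$ and $x>0$, using the fact that the distribution of $\sum_{j\ge1}\delta_{Z_{1j}}$ is equal to that of $(\sum_{n\ge1}\delta_{Y_n}\mid\sup_{i\le-1}|Y_i|\le1)$, together with $P(\sup_{i\le-1}|Y_i|\le1)=\theta$, we get
\[
\begin{aligned}
\theta u^{-\alpha}\lambda\times F^{(u)}([s,t]\times(x,\infty))
&=\theta u^{-\alpha}(t-s)F^{(u)}((x,\infty))\\
&=\theta u^{-\alpha}(t-s)P\Big(u\sum_{j\ge1}Z_{1j}1_{\{|Z_{1j}|>1\}}>x\Big)\\
&=\theta u^{-\alpha}(t-s)P\Big(u\sum_{j\ge1}Y_j1_{\{|Y_j|>1\}}>x\ \Big|\ \sup_{i\le-1}|Y_i|\le1\Big)\\
&=\theta u^{-\alpha}(t-s)\frac{P\big(u\sum_{j\ge1}Y_j1_{\{|Y_j|>1\}}>x,\ \sup_{i\le-1}|Y_i|\le1\big)}{P(\sup_{i\le-1}|Y_i|\le1)}\\
&=u^{-\alpha}(t-s)P\Big(u\sum_{j\ge1}Y_j1_{\{|Y_j|>1\}}>x,\ \sup_{i\le-1}|Y_i|\le1\Big)
\end{aligned}
\]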
\[
=\lambda\times\nu^{(u)}([s,t]\times(x,\infty)).
\]
The same can be done for $[s,t]\times(-\infty,-x)$, so
\[
\theta u^{-\alpha}\lambda\times F^{(u)}=\lambda\times\nu^{(u)}.
\]
Consider the summation functional $\psi^{(u)}:M_p([0,1]\times[-\infty,-u)\cup(u,\infty])\to D[0,1]$ defined by
\[
\psi^{(u)}\Big(\sum_{i\ge1}\delta_{(t_i,x_i)}\Big)(t)=\sum_{t_i\le t}x_i1_{\{u<|x_i|<\infty\}}.
\]
As in the proof in subsection 2.3.1, we need to prove that
\[
P(N^{(u)}\in\Lambda)=1, \tag{2.80}
\]
and that $\psi^{(u)}$ is continuous on the set $\Lambda$ when $D[0,1]$ is endowed with the $M_1$ topology, where $\Lambda=\Lambda_1\cap\Lambda_2$,
\[
\Lambda_1=\{\eta\in M_p([0,1]\times[-\infty,-u)\cup(u,\infty]):\ \eta(\{0,1\}\times[-\infty,-u)\cup(u,\infty])=\eta([0,1]\times\{\pm\infty,\pm u\})=0\},
\]
\[
\Lambda_2=\{\eta\in M_p([0,1]\times[-\infty,-u)\cup(u,\infty]):\ \eta(\{t\}\times(u,\infty])\wedge\eta(\{t\}\times[-\infty,-u))=0\ \text{for all } t\in[0,1]\}.
\]
In fact, by the spectral decomposition $Y_i=|Y_0|\Theta_i$ into independent components $|Y_0|$ and $\Theta_i$, with $|Y_0|$ a Pareto random variable (cf. Basrak and Segers (2009)), $Y_i$ cannot have any atoms except at the origin, so $\sum_{j\ge1}\delta_{uY_j}(\{\pm u\})=0$, and then $\sum_{j\ge1}\delta_{uZ_{ij}}(\{\pm u\})=0$ as well. Together with $P(\sum_{i\ge1}\delta_{T_i^{(u)}}(\{0,1\})=0)=1$, this implies $P(N^{(u)}\in\Lambda_1)=1$. Second, the assumption that with probability one the tail process has no two values of opposite sign implies $P(N^{(u)}\in\Lambda_2)=1$. Thus (2.80) holds.

For the continuity of $\psi^{(u)}$, it is enough to show that $\psi^{(u)}(\eta_n)\to\psi^{(u)}(\eta)$ in $D[0,1]$ in the $M_1$ topology whenever $\eta_n\xrightarrow{v}\eta$ in $M_p$ for some $\eta\in\Lambda$. Since $\eta\in\Lambda$, there exists a nonnegative integer $k=k(\eta)$ such that
\[
\eta([0,1]\times[-\infty,-u)\cup(u,\infty])=k<\infty.
\]
By assumption, $\eta$ does not have any atoms on the horizontal lines at $u$ or $-u$. Thus, there exists a positive integer $n_0$ such that for all $n\ge n_0$,
\[
\eta_n([0,1]\times[-\infty,-u)\cup(u,\infty])=k<\infty.
\]
If $k=0$, there is nothing to prove, so we assume $k\ge1$. Let $(t_i,x_i)$ and $(t_i^{(n)},x_i^{(n)})$ be the atoms of $\eta$ and $\eta_n$ in $[0,1]\times[-\infty,-u)\cup(u,\infty]$ respectively. It is easy to see that, for every $\gamma>0$, we can find a positive integer $n_\gamma>n_0$ such that for all $n\ge n_\gamma$,
\[
|t_i^{(n)}-t_i|\le\gamma,\qquad|x_i^{(n)}-x_i|\le\gamma\qquad\text{for } i=1,2,\cdots,k.
\]
Let $0<\tau_1<\cdots<\tau_p<1$ be a sequence such that the sets $\{\tau_1,\cdots,\tau_p\}$ and $\{t_1,\cdots,t_k\}$ coincide. Since $\eta$ can have several atoms with the same time coordinate, note that $p\le k$. Put $\tau_0=0$, $\tau_{p+1}=1$, and set
\[
0<r<\frac12\min_{0\le i\le p}|\tau_{i+1}-\tau_i|.
\]
For any $t\in[0,1]\setminus\{\tau_1,\cdots,\tau_p\}$, there exists $\gamma\in(0,u)$ such that
\[
\gamma<r,\qquad\gamma\le\min_{1\le i\le p}|t-\tau_i|.
\]
Hence $n\ge n_\gamma$ implies that $t_i^{(n)}\le t$ is equivalent to $t_i\le t$, and
\[
\big|\psi^{(u)}(\eta_n)(t)-\psi^{(u)}(\eta)(t)\big|=\Big|\sum_{t_i^{(n)}\le t}x_i^{(n)}-\sum_{t_i\le t}x_i\Big|\le\sum_{t_i\le t}\big|x_i^{(n)}-x_i\big|\le k\gamma.
\]
Put $v_i=\tau_i+r$, $i\in\{1,2,\cdots,p\}$. For any $\gamma<u\wedge r$, the functions $\psi^{(u)}(\eta_n)$ ($n\ge n_\gamma$) and $\psi^{(u)}(\eta)$ are monotone on each of the intervals $[0,v_1],[v_1,v_2],\cdots,[v_p,1]$. By Proposition 1.1, $\psi^{(u)}(\eta_n)\xrightarrow{M_1}\psi^{(u)}(\eta)$. Thus $\psi^{(u)}$ is continuous on the set $\Lambda$.

Note that
\[
\psi^{(u)}\big(N_n|_{[0,1]\times[-\infty,-u)\cup(u,\infty]}\big)(\cdot)=\sum_{i/n\le\cdot}\frac{X_i}{a_n}1_{\{|X_i|>a_nu\}},
\]
\[
\psi^{(u)}(N^{(u)})(\cdot)=\sum_{T_i^{(u)}\le\cdot}\ \sum_{j\ge1}uZ_{ij}1_{\{|Z_{ij}|>1\}}
\]
and
\[
\psi^{(u)}(N^{(u)})=\psi^{(u)}(\tilde N^{(u)})\overset{d}{=}\psi^{(u)}(\bar N^{(u)}),
\]
where
\[
\bar N^{(u)}=\sum_{i\ge1}\delta_{(T_i,\,K_i^{(u)})}
\]
is a Poisson random measure with mean measure $\lambda\times\nu^{(u)}$. We obtain
\[
\sum_{i=1}^{[n\cdot]}\frac{X_i}{a_n}1_{\{|X_i|>a_nu\}}\Rightarrow\sum_{T_i\le\cdot}K_i^{(u)}
\]
as $n\to\infty$ in $D[0,1]$ under the $M_1$ metric. As proved in Theorem 2.6,
\[
[n\cdot]\,E\Big(\frac{X_1}{a_n}1_{\{u<|X_1/a_n|\le1\}}\Big)\to(\cdot)\int_{\{x:u<|x|\le1\}}x\,\mu(dx).
\]
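Since $(\cdot)\int_{\{x:u<|x|\le1\}}x\,\mu(dx)$ is a continuous function, we obtain
\[
\begin{aligned}
Z_n^{(u)}(\cdot)&=\sum_{i=1}^{[n\cdot]}\frac{X_i}{a_n}1_{\{|X_i|>a_nu\}}-[n\cdot]\,E\Big(\frac{X_1}{a_n}1_{\{u<|X_1/a_n|\le1\}}\Big)\\
&\Rightarrow Z_\alpha^{(u)}(\cdot)=\sum_{T_i\le\cdot}K_i^{(u)}-(\cdot)\int_{\{x:u<|x|\le1\}}x\,\mu(dx)\\
&=\sum_{T_i\le\cdot}K_i^{(u)}-(\cdot)\int_{\{x:u<|x|\le1\}}x\,\nu^{(u)}(dx)+(\cdot)\Big(\int_{\{x:u<|x|\le1\}}x\,\nu^{(u)}(dx)-\int_{\{x:u<|x|\le1\}}x\,\mu(dx)\Big).
\end{aligned}
\]
By the Lévy-Itô representation of a Lévy process, $Z_\alpha^{(u)}(\cdot)\Rightarrow Z_\alpha(\cdot)$ as $u\to0$. To conclude that $Z_n\Rightarrow Z_\alpha$, it therefore suffices to show that the following double limit is zero for every $\delta>0$:
\[
\begin{aligned}
\lim_{u\to0}\limsup_{n\to\infty}&P\Big(\sup_{0\le t\le1}\big|Z_n^{(u)}(t)-Z_n(t)\big|>\delta\Big)\\
&=\lim_{u\to0}\limsup_{n\to\infty}P\Big(\sup_{0\le t\le1}\Big|\sum_{i=1}^{[nt]}\Big(\frac{X_i}{a_n}1_{\{|X_i|\le a_nu\}}-E\Big(\frac{X_i}{a_n}1_{\{|X_i|\le a_nu\}}\Big)\Big)\Big|>\delta\Big)\\
&=\lim_{u\to0}\limsup_{n\to\infty}P\Big(\max_{1\le k\le n}\Big|\sum_{i=1}^{k}\Big(\frac{X_i}{a_n}1_{\{|X_i|\le a_nu\}}-E\Big(\frac{X_i}{a_n}1_{\{|X_i|\le a_nu\}}\Big)\Big)\Big|>\delta\Big).
\end{aligned}
\]
For $\alpha\in(0,1)$, we have
\[
P\Big(\max_{1\le k\le n}\Big|\sum_{i=1}^{k}\Big(\frac{X_i}{a_n}1_{\{|X_i|\le a_nu\}}-E\Big(\frac{X_i}{a_n}1_{\{|X_i|\le a_nu\}}\Big)\Big)\Big|>\delta\Big)\le\frac{2n}{a_n\delta}E\big[|X_1|1_{\{|X_1|\le ua_n\}}\big]
\]
and, by Karamata's theorem,
\[
\limsup_{n\to\infty}\frac{2n}{a_n\delta}E\big[|X_1|1_{\{|X_1|\le ua_n\}}\big]=\frac{2\alpha}{(1-\alpha)\delta}\,u^{1-\alpha},
\]
which tends to $0$ as $u\to0$. For $\alpha\in[1,2)$, the double limit is zero by assumption (2.77). Thus the theorem is proved.

2.4 Two examples of applications of point process method

The point process method is a powerful tool in heavy-tailed theory. To illustrate this, we introduce some applications of the point process method to the study of time series with heavy tails.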
The previous section shows that the asymptotic properties of a regularly varying sequence in the case $\alpha\in[1,2)$ are more complicated than in the case $\alpha\in(0,1)$. For the sake of simplicity, in this section we assume that $\{X_n\}_{n\ge1}$ is an i.i.d. sequence of random variables, regularly varying with index $\alpha\in(0,1)$, and let
\[
p=\lim_{x\to\infty}\frac{P(X_1>x)}{P(|X_1|>x)},\qquad q=1-p=\lim_{x\to\infty}\frac{P(X_1<-x)}{P(|X_1|>x)}.
\]
Let $\{a_n\}_{n\ge1}$ be a sequence of positive real numbers such that
\[
nP(|X_1|>a_n)\to1.
\]
In fact, $a_n=n^{1/\alpha}L(n)$, where $L(n)$ is a slowly varying function.

In what follows, three independent sequences $\{\Gamma_n\}_{n\ge1}$, $\{U_n\}_{n\ge1}$, $\{B_n\}_{n\ge1}$ are defined on the same probability space. $\{\Gamma_n\}_{n\ge1}$ is the arrival sequence of a unit rate Poisson process on $\mathbb{R}_+$, $\{U_n\}_{n\ge1}$ is an i.i.d. sequence of $U(0,1)$ variables, and $\{B_n\}_{n\ge1}$ is an i.i.d. sequence satisfying
\[
P[B_1=1]=p,\qquad P[B_1=-1]=q.
\]

2.4.1 The maximum of the periodogram for heavy-tailed sequence

Recall that $\{X_n\}_{n\ge1}$ is an i.i.d. sequence of random variables, regularly varying with index $\alpha\in(0,1)$. Its periodogram is defined by
\[
I_{n,X}(x)=|J_{n,X}(x)|^{2}=\Big|\frac{1}{a_n}\sum_{j=1}^{n}X_j\exp(-i2\pi xj)\Big|^{2},\qquad x\in\Big[0,\frac12\Big].
\]
In practice, the periodogram is used as an estimate of the spectral density of a signal. Statisticians are interested in the limit behavior of the sequences
\[
M_{n,X}=\max_{x\in[0,\frac12]}I_{n,X}(x)\qquad\text{and}\qquad\tilde M_{n,X}=\max_{j=1,2,\cdots,q}I_{n,X}(\omega_j),
\]
where
\[
\omega_j=\frac{j}{n},\quad j=1,\cdots,q,\qquad q=q_n=\max\Big\{j:1\le j<\frac{n}{2}\Big\}.
\]
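The definitions above translate directly into a few lines of code. The sketch below is illustrative only: it assumes a symmetric Pareto sample with $P(|X_1|>x)=x^{-\alpha}$, $x\ge1$ (so that $a_n=n^{1/\alpha}$), and the helper names `periodogram` and `max_over_fourier_frequencies` are hypothetical. It evaluates $\tilde M_{n,X}$ by brute force and compares it with the elementary triangle-inequality bound $(\sum_j|X_j|/a_n)^2$.

```python
import cmath
import math
import random


def periodogram(xs, a_n, x):
    """I_{n,X}(x) = |(1/a_n) * sum_j X_j exp(-i 2 pi x j)|^2 with j = 1..n."""
    total = sum(value * cmath.exp(-2j * math.pi * x * (idx + 1))
                for idx, value in enumerate(xs))
    return abs(total / a_n) ** 2


def max_over_fourier_frequencies(xs, a_n):
    """Maximum of the periodogram over omega_j = j/n, 1 <= j < n/2."""
    n = len(xs)
    q = max(j for j in range(1, n) if j < n / 2)
    return max(periodogram(xs, a_n, j / n) for j in range(1, q + 1))


rng = random.Random(1)
alpha, n = 0.5, 200
# Assumed model: symmetric Pareto, |X| = V^(-1/alpha) with V uniform on (0, 1].
xs = [rng.choice((-1, 1)) * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
a_n = n ** (1.0 / alpha)

m_tilde = max_over_fourier_frequencies(xs, a_n)
upper_bound = (sum(abs(value) for value in xs) / a_n) ** 2
print(m_tilde, upper_bound)
```

For heavy tails with $\alpha<1$ a few large observations dominate the sum, so the maximum over Fourier frequencies is typically of the same order as the upper bound.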
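Theorem 2.8. (Mikosch, Resnick and Samorodnitsky (2000)) The limit relations
\[
M_{n,X}\Rightarrow Y_\alpha^{2},\qquad\tilde M_{n,X}\Rightarrow Y_\alpha^{2} \tag{2.81}
\]
hold, where $Y_\alpha=\sum_{j=1}^{\infty}\Gamma_j^{-1/\alpha}$.

Proof. Note that
\[
M_{n,X}\le\Big(\sum_{j=1}^{n}\frac{|X_j|}{a_n}\Big)^{2},
\]
and it is well known (cf. Feller (1971)) that
\[
\sum_{j=1}^{n}\frac{|X_j|}{a_n}\Rightarrow Y_\alpha. \tag{2.82}
\]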
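Hence the sequence $\{M_{n,X}\}$ is stochastically bounded, and it remains to establish the lower bound in the limit for the maximum of the periodogram. It suffices to show
\[
\liminf_{n\to\infty}P[M_{n,X}>\gamma]\ge P[Y_\alpha^{2}>\gamma], \tag{2.83}
\]
since then we would have
\[
P[Y_\alpha^{2}>\gamma]\le\liminf_{n\to\infty}P[M_{n,X}>\gamma]\le\limsup_{n\to\infty}P[M_{n,X}>\gamma]\le\limsup_{n\to\infty}P\Big[\Big(\sum_{j=1}^{n}\frac{|X_j|}{a_n}\Big)^{2}>\gamma\Big]=P[Y_\alpha^{2}>\gamma].
\]
For any $T>0$ and $n\ge2T$,
\[
M_{n,X}=\sup_{x\in[0,\frac12]}|J_{n,X}(x)|^{2}=\sup_{x\in[0,\frac n2]}|J_{n,X}(x/n)|^{2}\ge\sup_{x\in[0,T]}|J_{n,X}(x/n)|^{2}.
\]
To show (2.83), we need to study the weak convergence of $J_{n,X}(x/n)$. Define the point processes
\[
N_n=\sum_{j=1}^{n}\delta_{(j/n,\,X_j/a_n)}\qquad\text{and}\qquad N=\sum_{j=1}^{\infty}\delta_{(U_j,\,B_j\Gamma_j^{-1/\alpha})}.
\]
By Lemma 2.6,
\[
N_n\xrightarrow{d}N
\]
in $M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$. Pick $\eta>0$ and define $T_\eta:M_p([0,1]\times[-\infty,\infty]\setminus\{0\})\to C[0,\infty)$ by
\[
T_\eta\Big(\sum_{k\ge1}\delta_{(t_k,x_k)}\Big)(x)=\sum_{k\ge1}x_k1_{\{|x_k|>\eta\}}\exp(-2\pi ixt_k).
\]
If $y_n\to y$ with $y_n,y\in[0,\infty)$, and $m_n\xrightarrow{v}m$ with $m_n,m\in M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$, we first prove
\[
T_\eta(m_n)(y_n)\to T_\eta(m)(y). \tag{2.84}
\]
To do this, denote
\[
m_n=\sum_{k\ge1}\delta_{(t_k^{(n)},x_k^{(n)})},\qquad m=\sum_{k\ge1}\delta_{(t_k,x_k)}.
\]
The set
\[
K_\eta:=[0,1]\times([-\infty,-\eta]\cup[\eta,\infty])
\]
is compact in $[0,1]\times[-\infty,\infty]\setminus\{0\}$ with $m(\partial K_\eta)=0$. There exists $n_0$ such that for $n\ge n_0$,
\[
m_n(K_\eta)=m(K_\eta)=:l,
\]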
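and there is an enumeration of the points in $K_\eta$ such that
\[
\big((t_k^{(n)},x_k^{(n)}),\ 1\le k\le l\big)\to\big((t_k,x_k),\ 1\le k\le l\big),
\]
and without loss of generality we may assume for given $b>0$ that
\[
\sup_{n\ge n_0}|y_n|\vee\max_{1\le k\le l}|x_k^{(n)}|\le b.
\]
Therefore,
\[
\begin{aligned}
|T_\eta(m_n)(y_n)-T_\eta(m)(y)|
&\le\sum_{k=1}^{l}\big|x_k^{(n)}\exp(-2\pi iy_nt_k^{(n)})-x_k\exp(-2\pi iyt_k)\big|\\
&\le\sum_{k=1}^{l}\big|x_k^{(n)}\exp(-2\pi iy_nt_k^{(n)})-x_k\exp(-2\pi iy_nt_k^{(n)})+x_k\exp(-2\pi iy_nt_k^{(n)})-x_k\exp(-2\pi iyt_k)\big|\\
&\le\sum_{k=1}^{l}\big|x_k^{(n)}-x_k\big|+\sum_{k=1}^{l}|x_k|\,\big|\exp(-2\pi iy_nt_k^{(n)})-\exp(-2\pi iyt_k)\big|\\
&\to0,
\end{aligned}
\]
hence (2.84) holds. This means that the map $T_\eta$ is continuous a.s. with respect to the distribution of $N$. Applying the functional $T_\eta$ to $N_n$ and $N$, we have
\[
J_{n,X}^{(\eta)}(x/n):=\sum_{j=1}^{n}\frac{X_j}{a_n}\exp(-2\pi ixj/n)1_{\{|X_j|>\eta a_n\}}\Rightarrow J_\infty^{(\eta)}(x)
\]
in $C[0,\infty)$, where
\[
J_\infty^{(\eta)}(x)=\sum_{j=1}^{\infty}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ixU_j)1_{\{\Gamma_j^{-1/\alpha}>\eta\}}.
\]
Letting $\eta\to0$,
\[
J_\infty^{(\eta)}(x)\Rightarrow J_\infty(x):=\sum_{j=1}^{\infty}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ixU_j).
\]
For any $T>0$ and $\delta>0$ we have
\[
\begin{aligned}
\lim_{\eta\to0}\limsup_{n\to\infty}&P\Big[\sup_{0\le x\le T}\big|J_{n,X}^{(\eta)}(x/n)-J_{n,X}(x/n)\big|>\delta\Big]\\
&\le\lim_{\eta\to0}\limsup_{n\to\infty}P\Big[\sup_{0\le x\le T}\Big|\sum_{j=1}^{n}\frac{X_j}{a_n}\exp(-2\pi ixj/n)1_{\{|X_j|\le\eta a_n\}}\Big|>\delta\Big]\\
&\le\lim_{\eta\to0}\limsup_{n\to\infty}nE\Big[\Big|\frac{X_1}{a_n}\Big|1_{\{|X_1|\le\eta a_n\}}\Big]\Big/\delta=0
\end{aligned}
\tag{2.85}
\]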
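by Karamata's theorem. Hence
\[
J_{n,X}(x/n)\Rightarrow J_\infty(x) \tag{2.86}
\]
holds in $C[0,\infty)$. Therefore,
\[
\liminf_{n\to\infty}P[M_{n,X}>\gamma]\ge\liminf_{n\to\infty}P\Big[\sup_{x\in[0,T]}|J_{n,X}(x/n)|^{2}>\gamma\Big]\ge P\Big[\sup_{x\in[0,T]}|J_\infty(x)|^{2}>\gamma\Big].
\]
This is true for any $T$, so
\[
\liminf_{n\to\infty}P[M_{n,X}>\gamma]\ge P\Big[\sup_{x\in[0,\infty)}|J_\infty(x)|^{2}>\gamma\Big],
\]
where
\[
\sup_{x\in[0,\infty)}|J_\infty(x)|^{2}=\sup_{x\in[0,\infty)}\Big|\sum_{j=1}^{\infty}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ixU_j)\Big|^{2}\le\Big(\sum_{j=1}^{\infty}\Gamma_j^{-1/\alpha}\Big)^{2}.
\]
On the other hand, define
\[
\Upsilon=\Big\{\omega\in\Omega:\ \sum_{j=1}^{\infty}\Gamma_j^{-1/\alpha}(\omega)<\infty\ \text{and for every } m\ge1 \text{ the numbers } (U_1(\omega),\cdots,U_m(\omega)) \text{ are rationally independent}\Big\}.
\]
By the result of Weyl (1916), $P[\Upsilon]=1$, and for $\omega\in\Upsilon$ the set
\[
\{(\{xU_1(\omega)\},\cdots,\{xU_m(\omega)\}),\ x\ge0\}
\]
(where $\{a\}$ denotes the fractional part of $a$) is dense in $[0,1]^{m}$. Fix an element $\omega\in\Upsilon$. For any $\varepsilon>0$, there exists $N\ge1$ such that
\[
\sum_{j=N+1}^{\infty}\Gamma_j^{-1/\alpha}<\varepsilon.
\]
There is an $x_0\in[0,\infty)$ such that
\[
\Re\big(B_j\exp(-2\pi ix_0U_j)\big)\ge1-\frac{\varepsilon}{N\Gamma_j^{-1/\alpha}},\qquad j=1,\cdots,N.
\]
Then
\[
\begin{aligned}
\sup_{x\in[0,\infty)}\Big|\sum_{j=1}^{\infty}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ixU_j)\Big|
&\ge\sup_{x\in[0,\infty)}\Big|\sum_{j=1}^{N}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ixU_j)\Big|-\sum_{j=N+1}^{\infty}\Gamma_j^{-1/\alpha}\\
&\ge\Big|\sum_{j=1}^{N}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ix_0U_j)\Big|-\varepsilon\\
&\ge\Re\Big(\sum_{j=1}^{N}B_j\Gamma_j^{-1/\alpha}\exp(-2\pi ix_0U_j)\Big)-\varepsilon\\
&\ge\sum_{j=1}^{N}\Big(1-\frac{\varepsilon}{N\Gamma_j^{-1/\alpha}}\Big)\Gamma_j^{-1/\alpha}-\varepsilon=\sum_{j=1}^{N}\Gamma_j^{-1/\alpha}-2\varepsilon.
\end{aligned}
\]
Letting first $N\to\infty$ and then $\varepsilon\to0$, we obtain
\[
\sup_{x\in[0,\infty)}|J_\infty(x)|^{2}\ge\Big(\sum_{j=1}^{\infty}\Gamma_j^{-1/\alpha}\Big)^{2}.
\]
Hence
\[
\sup_{x\in[0,\infty)}|J_\infty(x)|^{2}=\Big(\sum_{j=1}^{\infty}\Gamma_j^{-1/\alpha}\Big)^{2}. \tag{2.87}
\]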
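The upper half of (2.87) is just the triangle inequality, and it can be checked on a simulated, truncated version of the series. The sketch below is an illustration under stated assumptions: the series is cut off after a fixed number of terms, the supremum is replaced by a maximum over a finite grid, and the names `lepage_terms` and `j_inf_truncated` are hypothetical.

```python
import cmath
import math
import random


def lepage_terms(n_terms, alpha, p, rng):
    """Return triples (B_j, Gamma_j^(-1/alpha), U_j) for j = 1..n_terms."""
    terms = []
    arrival = 0.0
    for _ in range(n_terms):
        arrival += rng.expovariate(1.0)       # arrivals of a unit-rate Poisson process
        sign = 1 if rng.random() < p else -1  # P[B = 1] = p, P[B = -1] = 1 - p
        terms.append((sign, arrival ** (-1.0 / alpha), rng.random()))
    return terms


def j_inf_truncated(x, terms):
    """Truncated series sum_j B_j Gamma_j^(-1/alpha) exp(-2 pi i x U_j)."""
    return sum(b * w * cmath.exp(-2j * math.pi * x * u) for b, w, u in terms)


rng = random.Random(0)
terms = lepage_terms(200, alpha=0.5, p=0.5, rng=rng)

upper = sum(w for _, w, _ in terms) ** 2
grid_sup = max(abs(j_inf_truncated(0.25 * k, terms)) ** 2 for k in range(2000))
print(grid_sup, upper)
```

A finite grid can only approach the supremum from below; the matching lower bound in (2.87) relies on the density argument from Weyl's theorem, which a grid search does not reproduce.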
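It remains to prove $\tilde M_{n,X}\Rightarrow Y_\alpha^{2}$. In view of $M_{n,X}\Rightarrow Y_\alpha^{2}$ and $\tilde M_{n,X}\le M_{n,X}$, it suffices to show that for all $\gamma>0$,
\[
\liminf_{n\to\infty}P[\tilde M_{n,X}>\gamma]\ge P[Y_\alpha^{2}>\gamma].
\]
In fact, for any fixed integer $K$ and sufficiently large $n$,
\[
P\Big[\max_{1\le j\le q}|J_{n,X}(j/n)|^{2}>\gamma\Big]\ge P\Big[\max_{1\le j\le K}|J_{n,X}(j/n)|^{2}>\gamma\Big],
\]
and from (2.86),
\[
(J_{n,X}(j/n),\ 1\le j\le K)\xrightarrow{d}(J_\infty(j),\ 1\le j\le K)
\]
in $\mathbb{C}^{K}$, hence
\[
\max_{1\le j\le K}|J_{n,X}(j/n)|^{2}\xrightarrow{d}\max_{1\le j\le K}|J_\infty(j)|^{2}
\]
and so, letting $K\to\infty$,
\[
\liminf_{n\to\infty}P\Big[\max_{1\le j\le q}|J_{n,X}(j/n)|^{2}>\gamma\Big]\ge P\Big[\sup_{j\ge1}|J_\infty(j)|^{2}>\gamma\Big].
\]
By (2.87), we finish the proof.

2.4.2 The weak convergence to stable integral

Assume that for $0<M<\infty$, $\beta\in[-M,M]$ and $s\in[0,1]$, the functions $f_n(\beta,s)$ and $f(\beta,s)$ are continuous in $\beta$, $f(\beta,s)$ is differentiable in $\beta$ and $s$, and there exists a constant $C$ (depending on $M$) such that for all $\beta\in[-M,M]$ and $s\in[0,1]$,
\[
|f_n(\beta,s)|\le C,\qquad|f(\beta,s)|\le C,\qquad\Big|\frac{\partial f(\beta,s)}{\partial s}\Big|\le C,\qquad\Big|\frac{\partial f(\beta,s)}{\partial\beta}\Big|\le C,
\]
and
\[
\sup_{\beta\in[-M,M],\,s\in[0,1]}|f_n(\beta,s)-f(\beta,s)|=O\Big(\frac1n\Big).
\]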
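Recall that $\{X_n\}_{n\ge1}$ is a sequence of i.i.d. random variables, regularly varying with index $\alpha\in(0,1)$. We have the following result.

Theorem 2.9. Let $U_n(s)=a_n^{-1}\sum_{i=1}^{[ns]}X_i$, and assume $U_n\Rightarrow Z_\alpha$ on $D[0,1]$. Then
\[
\sum_{i=1}^{n}f_n\Big(\beta,\frac in\Big)\frac{X_i}{a_n}\Rightarrow\int_0^1f(\beta,s)\,dZ_\alpha(s)
\]
holds on $C(\mathbb{R})$.

Proof. We first prove that, under the local uniform metric,
\[
\sum_{i=1}^{n}f_n\Big(\beta,\frac in\Big)\frac{X_i}{a_n}-\sum_{i=1}^{n}f\Big(\beta,\frac in\Big)\frac{X_i}{a_n}\xrightarrow{P}0
\]
on $C(\mathbb{R})$. In fact,
\[
\Big|\sum_{i=1}^{n}f_n\Big(\beta,\frac in\Big)\frac{X_i}{a_n}-\sum_{i=1}^{n}f\Big(\beta,\frac in\Big)\frac{X_i}{a_n}\Big|\le\sum_{i=1}^{n}\Big|f_n\Big(\beta,\frac in\Big)-f\Big(\beta,\frac in\Big)\Big|\,\Big|U_n\Big(\frac in\Big)-U_n\Big(\frac{i-1}n\Big)\Big|.
\]
Let $\delta=\alpha/m$ with $m>1+\frac1\alpha$. Then
\[
\begin{aligned}
E\Big[\Big(\sup_\beta\sum_{i=1}^{n}\Big|f_n\Big(\beta,\frac in\Big)-f\Big(\beta,\frac in\Big)\Big|\,&\Big|U_n\Big(\frac in\Big)-U_n\Big(\frac{i-1}n\Big)\Big|\Big)^{\alpha-\delta}\Big]\\
&\le E\sum_{i=1}^{n}\sup_\beta\Big|f_n\Big(\beta,\frac in\Big)-f\Big(\beta,\frac in\Big)\Big|^{\alpha-\delta}\Big|U_n\Big(\frac in\Big)-U_n\Big(\frac{i-1}n\Big)\Big|^{\alpha-\delta}\\
&=O\Big(\frac{1}{n^{\alpha-\delta}}\sum_{i=1}^{n}E\Big|U_n\Big(\frac in\Big)-U_n\Big(\frac{i-1}n\Big)\Big|^{\alpha-\delta}\Big)\\
&=O\Big(\frac{1}{n^{\alpha-\delta}}\sum_{i=1}^{n}\Big(\frac{1}{n^{1/\alpha}L(n)}\Big)^{\alpha-\delta}\Big)\\
&=O\Big(\frac{n}{n^{\alpha-\delta}\,n^{1-\delta/\alpha}(L(n))^{\alpha-\delta}}\Big)=o(1),
\end{aligned}
\]
where the first inequality is based on the Minkowski adjoint inequality. By Markov's inequality, we easily obtain
\[
\sum_{i=1}^{n}f_n\Big(\beta,\frac in\Big)\frac{X_i}{a_n}-\sum_{i=1}^{n}f\Big(\beta,\frac in\Big)\frac{X_i}{a_n}\xrightarrow{P}0
\]
on $C([-M,M])$.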
Next, we only consider the weak convergence of $\sum_{i=1}^{n}f(\beta,\frac in)\frac{X_i}{a_n}$. We need to prove that the limit relation
\[
\frac{1}{a_n}\sum_{i=1}^{n}X_if\Big(\beta,\frac in\Big)\Rightarrow\sum_{i=1}^{\infty}B_i\Gamma_i^{-1/\alpha}f(\beta,U_i)
\]
holds in $C[0,\infty)$ as $n\to\infty$. Now pick $\eta>0$ and define $T_\eta:M_p([0,1]\times[-\infty,\infty]\setminus\{0\})\to C[0,\infty)$ in the following way. If
\[
m=\sum_j\delta_{(t_j,\nu_j)}\in M_p([0,1]\times[-\infty,\infty]\setminus\{0\})
\]
and all $\nu_j$'s are finite, then
\[
(T_\eta m)(\beta)=\sum_j\nu_j1_{[|\nu_j|>\eta]}f(\beta,t_j).
\]
Otherwise, set $(T_\eta m)(\beta)\equiv0$.

We show that if $\beta_n\to\beta$ and $m_n\xrightarrow{v}m$ in $M_p([0,1]\times[-\infty,\infty]\setminus\{0\})$, where
\[
m\big\{\partial([0,1]\times\{|\nu|\ge\eta\})\cup[0,1]\times\{-\infty,\infty\}\big\}=0,
\]
then
\[
(T_\eta m_n)(\beta_n)\to(T_\eta m)(\beta).
\]
Denote
\[
m_n=\sum_j\delta_{(t_j^{(n)},\nu_j^{(n)})};\qquad m=\sum_j\delta_{(t_j,\nu_j)}.
\]
The set $G_\eta=[0,1]\times\{\nu:|\nu|\ge\eta\}$ is compact in $[0,1]\times[-\infty,\infty]\setminus\{0\}$ with $m(\partial G_\eta)=0$. There exists $n_0$ such that for $n\ge n_0$,
\[
m_n(G_\eta)=m(G_\eta)=:l,
\]
say, and there is an enumeration of the points in $G_\eta$ such that
\[
\big((t_k^{(n)},\nu_k^{(n)}),\ 1\le k\le l\big)\to\big((t_k,\nu_k),\ 1\le k\le l\big),
\]
and in fact, without loss of generality, we may assume for given $\xi>0$ that
\[
\sup_{n\ge n_0}\Big(|\beta_n|\vee\sup_{k=1,\cdots,l}|\nu_k^{(n)}|\Big)\le\xi.
\]
Therefore,
\[
|(T_\eta m_n)(\beta_n)-(T_\eta m)(\beta)|=\Big|\sum_{k=1}^{l}\nu_k^{(n)}f(\beta_n,t_k^{(n)})-\sum_{k=1}^{l}\nu_kf(\beta,t_k)\Big|\le\sum_{k=1}^{l}\big|\nu_k^{(n)}f(\beta_n,t_k^{(n)})-\nu_kf(\beta,t_k)\big|.
\]
We have
\[
\begin{aligned}
\big|\nu_k^{(n)}f(\beta_n,t_k^{(n)})-\nu_kf(\beta,t_k)\big|
&=\big|\nu_k^{(n)}f(\beta_n,t_k^{(n)})-\nu_kf(\beta_n,t_k^{(n)})+\nu_kf(\beta_n,t_k^{(n)})-\nu_kf(\beta,t_k)\big|\\
&\le C\big|\nu_k^{(n)}-\nu_k\big|+|\nu_k|\,\big|f(\beta_n,t_k^{(n)})-f(\beta,t_k)\big|.
\end{aligned}
\]
Thus
\[
\begin{aligned}
|(T_\eta m_n)(\beta_n)-(T_\eta m)(\beta)|
&\le\sum_{k=1}^{l}C\big|\nu_k^{(n)}-\nu_k\big|+\sum_{k=1}^{l}|\nu_k|\,\big|f(\beta_n,t_k^{(n)})-f(\beta,t_k)\big|\\
&=\sum_{k=1}^{l}C\big|\nu_k^{(n)}-\nu_k\big|+\sum_{k=1}^{l}|\nu_k|\,\big|f(\beta_n,t_k^{(n)})-f(\beta,t_k^{(n)})+f(\beta,t_k^{(n)})-f(\beta,t_k)\big|\\
&\le\sum_{k=1}^{l}C\big|\nu_k^{(n)}-\nu_k\big|+\sum_{k=1}^{l}|\nu_k|\,\big|f(\beta_n,t_k^{(n)})-f(\beta,t_k^{(n)})\big|+\sum_{k=1}^{l}|\nu_k|\,\big|f(\beta,t_k^{(n)})-f(\beta,t_k)\big|.
\end{aligned}
\]
So
\[
\lim_{n\to\infty}|(T_\eta m_n)(\beta_n)-(T_\eta m)(\beta)|=0.
\]
This implies that the map $T_\eta:M_p([0,1]\times[-\infty,\infty]\setminus\{0\})\to C[0,\infty)$ is continuous a.s. with respect to the distribution of $N$. Thus we obtain
\[
J_n^{\eta}(\beta):=\sum_{j=1}^{n}\frac{X_j}{a_n}f\Big(\beta,\frac jn\Big)1_{[|X_j|>\eta a_n]}\Rightarrow\sum_{j=1}^{\infty}B_j\Gamma_j^{-1/\alpha}f(\beta,U_j)1_{[\Gamma_j^{-1/\alpha}>\eta]}=:J_\infty^{\eta}(\beta)
\]
in $C[0,\infty)$. Also, as $\eta\to0$ we have
\[
J_\infty^{\eta}(\beta)\Rightarrow J_\infty(\beta):=\sum_{j=1}^{\infty}B_j\Gamma_j^{-1/\alpha}f(\beta,U_j),
\]
so, writing $J_n(\beta):=\sum_{j=1}^{n}\frac{X_j}{a_n}f(\beta,\frac jn)$, it remains to prove for any $\theta>0$ that
\[
\lim_{\eta\to0}\limsup_{n\to\infty}P\big[\|J_n^{\eta}-J_n\|>\theta\big]=0, \tag{2.88}
\]
where $\|x(\cdot)-y(\cdot)\|$ is the $C[0,\infty)$ metric distance between $x,y\in C[0,\infty)$. The method of proof of (2.88) is demonstrated if we show for any $\theta>0$ that
\[
\lim_{\eta\to0}\limsup_{n\to\infty}P\Big[\sup_{\beta\in[-M,M]}\Big|\sum_{j=1}^{n}\frac{X_j}{a_n}f\Big(\beta,\frac jn\Big)1_{[|X_j|\le\eta a_n]}\Big|>\theta\Big]=0. \tag{2.89}
\]
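The expression in (2.89) has the bound
\[
\begin{aligned}
\lim_{\eta\to0}\limsup_{n\to\infty}P\Big[C\sum_{j=1}^{n}\Big|\frac{X_j}{a_n}\Big|1_{[|X_j|\le\eta a_n]}>\theta\Big]
&\le\lim_{\eta\to0}\limsup_{n\to\infty}CnE\Big(\Big|\frac{X_1}{a_n}\Big|1_{[|X_1|\le\eta a_n]}\Big)\Big/\theta\\
&\le C\lim_{\eta\to0}\int_{\{x:|x|\le\eta\}}|x|\,\nu(dx)\Big/\theta=0
\end{aligned}
\]
by Karamata's theorem, where
\[
\nu(dx)=p\alpha x^{-\alpha-1}dx\,1_{[x>0]}+q\alpha|x|^{-\alpha-1}dx\,1_{[x<0]}.
\]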
Chapter 3
Convergence to Semimartingales
In Chapter 2, the limiting processes for weak convergence are L´evy processes or the product of a L´evy process and an independent random variable (c.f., theorems 2.4 and 2.5). The characteristic functions of these type of processes can be computed. So we can identify limiting processes through the convergence of finite dimensional distributions. When the limiting process is general semimartingale, the finite dimensional distributions are difficult to compute. We need to find new approach. This problem is quite pivotal in some application areas. In some study of physics, chemistry and biology, the diffusion approximation for Markov processes may be very important. In this case, the limiting process of weak convergence may be a diffusion process, which is an important example of semimartingale. Semimartingale is an adapted stochastic process, which can be used as integrators in the general theory of stochastic integration. Examples of semimartingales include Brownian motion, all local martingales, finite variation processes and L´evy processes. We employ the predictable characteristics of semimartingale to discuss the weak convergence of semimartingales. The idea of predictable characteristics of semimartingale comes from the characteristic L´evy triple, which is determined by characteristic functions of finite dimensional distributions of a L´evy process. We list the details of predictable characteristics of a semimartingal in Appendix. The presentation follows Jacod and Shiryaev (2003) closely; any unexplained notation can be found in this monograph. 3.1
The conditions of tightness for semimartingale sequence
In this section, we study a sequence $\{X^n\}$ of $\mathbb{R}$-valued semimartingales defined on the space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, P)$. To simplify, we only discuss the case $t \in [0,1]$. We wish to derive criteria for tightness of the sequence $\{X^n\}$ in $D[0,1]$ that can be easily checked. For a sequence $\{X^n\}$ of càdlàg adapted processes, Aldous (1978) presented the following criterion for tightness.
Theorem 3.1. Assume that
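(i) for every $\varepsilon > 0$, there exist $a > 0$ and $n_0$ such that
\[
P\Big\{\sup_{t\in[0,1]} |X^n_t| \ge a\Big\} \le \varepsilon
\]
for $n \ge n_0$;
(ii) for every $\varepsilon > 0$,
\[
\lim_{\theta\downarrow 0}\,\limsup_{n\to\infty}\,\sup_{S,T\in\mathcal{S},\; S\le T\le S+\theta} P(|X^n_T - X^n_S| > \varepsilon) = 0,
\]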
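where $\mathcal{S}$ is the set of $(\mathcal{F}_t)_{0\le t\le 1}$-stopping times that are bounded by 1. Then $\{X^n\}$ is tight in $D[0,1]$.
Proof. It is enough to prove that condition (ii) in Theorem 2.2 holds under condition (ii) of this theorem. Fix $\eta > 0$. Condition (ii) implies that for all $\rho > 0$, there are $\delta(\rho) > 0$ and an integer $n(\rho) > 0$ such that
\[
P(|X^n_T - X^n_S| \ge \eta) \le \rho \tag{3.1}
\]
for $n \ge n(\rho)$, $S, T \in \mathcal{S}$, $S \le T \le S + \delta(\rho)$. Inductively, define the stopping times
\[
S^n_0 = 0, \qquad S^n_{k+1} = \inf\{t > S^n_k : |X^n_t - X^n_{S^n_k}| \ge \eta\}.
\]
For any given $\varepsilon > 0$, there exist $n_1 = n(\varepsilon)$ and $\delta = \delta(\varepsilon)$ such that
\[
P(S^n_{k+1} \le (S^n_k + \delta) \wedge 1) \le \varepsilon
\]
when $n \ge n_1$, by applying (3.1) with $\rho = \varepsilon$, $S = S^n_k \wedge 1$, $T = S^n_{k+1} \wedge (S^n_k + \delta(\rho)) \wedge 1$, and noticing that
\[
|X^n_{S^n_{k+1}} - X^n_{S^n_k}| \ge \eta
\]
if $S^n_{k+1} < \infty$. Then we choose an integer $q$ such that $q\delta > 2$. The same argument as above shows that there exist $n_2 = n(\varepsilon/q) \vee n_1$ and $\theta = \delta(\varepsilon/q)$ such that
\[
P(S^n_{k+1} \le (S^n_k + \theta) \wedge 1) \le \frac{\varepsilon}{q}
\]
when $n \ge n_2$. Since $S^n_q = \sum_{k=1}^q (S^n_k - S^n_{k-1})$, we have
\[
\begin{aligned}
P(S^n_q < 1) &\ge E\Big(\sum_{k=1}^q (S^n_k - S^n_{k-1}) 1_{\{S^n_q \le 1\}}\Big) \\
&\ge \sum_{k=1}^q E\big((S^n_k - S^n_{k-1}) 1_{\{S^n_q \le 1,\; S^n_k - S^n_{k-1} > \delta\}}\big) \\
&\ge \sum_{k=1}^q \delta\,[P(S^n_q \le 1) - P(S^n_q \le 1,\; S^n_k - S^n_{k-1} \le \delta)] \\
&\ge \delta q P(S^n_q \le 1) - \delta q \varepsilon.
\end{aligned}
\]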
We deduce that $P(S^n_q < 1) \le 2\varepsilon$ when $n \ge n_2$. Next set
\[
A^n = \{S^n_q \ge 1\} \cap \Big[\bigcap_{k=1}^{q} \{S^n_{k+1} > (S^n_k + \theta) \wedge 1\}\Big].
\]
We can obtain $P(A^n) \ge 1 - 3\varepsilon$ when $n \ge n_2$. For $\omega \in A^n$, we consider the subdivision $0 = t_0 < \cdots < t_r = 1$ with $t_i = S^n_i(\omega)$ if $i \le r - 1$ and $r = \inf(i : S^n_i(\omega) \ge 1)$. By the construction of the $S^n_j$'s, we have $\omega'(X^n(\omega), \theta) \le 2\eta$. Thus
\[
P(\omega'(X^n, \theta) \ge 2\eta) \le 3\varepsilon
\]
when $n \ge n_2$. The proof is completed.
Next, we consider the case where each $X^n$ is a semimartingale. We present an explicit criterion for tightness of a sequence of semimartingales. We pick a truncation function $h$, which is bounded and satisfies $h(x) = x$ in a neighborhood of 0. We shall heavily use the predictable characteristics of $X^n$, $(B^n = B^n(h), C^n, \nu^n)$, as defined in Appendix A. We also introduce the modified second characteristic of $X^n$ as follows:
\[
\tilde{C}^n = \tilde{C}^n(h) = C^n + h^2 * \nu^n - \sum_{s\le\cdot} (\Delta B^n_s)^2.
\]
Definition 3.1. A sequence $\{X^n\}$ of processes is called C-tight if it is tight and if all limiting points of weak convergence of the sequence are laws of continuous processes.
Definition 3.2. For increasing processes $X$ and $Y$, we say that $Y$ strongly majorizes $X$, denoted by $X \prec Y$, if $Y - X$ is increasing.
Lemma 3.1. The C-tightness of $B^n$ and $\tilde{C}^n$ does not depend on the choice of $h$.
Proof. Let $h'$ be another truncation function. There exist two constants $a > 0$, $b > 0$ such that $|h|, |h'| \le a$ and $h(x) = h'(x) = x$ for $|x| \le b$. Choose a positive integer $p$ such that $2/p \le b$, and denote $g_p(x) = (p|x| - 1)^+ \wedge 1$. Thus
\[
|h(x) - h'(x)| \le 2a\,g_p(x), \qquad |h^2(x) - h'^2(x)| \le 2a^2 g_p(x),
\]
and
\[
(h' - h) * \nu^n \prec 2a\,g_p * \nu^n, \qquad (h'^2 - h^2) * \nu^n \prec 2a^2 g_p * \nu^n.
\]
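Hence $(h' - h) * \nu^n$ and $(h'^2 - h^2) * \nu^n$ are C-tight. Note that
\[
\begin{aligned}
TV\Big(\sum_{s\le\cdot} [\Delta B^n_s(h)^2 - \Delta B^n_s(h')^2]\Big)
&\prec \sum_{s\le\cdot} |\Delta B^n_s(h) - \Delta B^n_s(h')|\,[|\Delta B^n_s(h)| + |\Delta B^n_s(h')|] \\
&\prec 2a\,|h - h'| * \nu^n \prec 4a^2 (g_p * \nu^n),
\end{aligned}
\]
where $TV(A)$ is the total variation process of $A$; thus $\sum_{s\le\cdot} [\Delta B^n_s(h)^2 - \Delta B^n_s(h')^2]$ is C-tight. Since
\[
B^n(h') = B^n(h) + (h' - h) * \nu^n,
\]
\[
\tilde{C}^n(h') = \tilde{C}^n(h) + (h'^2 - h^2) * \nu^n + \sum_{s\le\cdot} [\Delta B^n_s(h)^2 - \Delta B^n_s(h')^2],
\]
we obtain by Theorem 1.5 that $B^n(h')$ and $\tilde{C}^n(h')$ are C-tight whenever $B^n(h)$ and $\tilde{C}^n(h)$ are.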
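The two pointwise bounds used in the proof of Lemma 3.1 can be checked numerically for a concrete pair of truncation functions. The sketch below is only an illustration: the functions `h_clip` and `h_cut`, the constants `a = b = 1`, and the grid are assumptions chosen for the example and do not come from the text.

```python
import numpy as np

def g(p, x):
    """g_p(x) = (p|x| - 1)^+ ∧ 1: zero for |x| <= 1/p, one for |x| >= 2/p."""
    return np.minimum(np.maximum(p * np.abs(x) - 1.0, 0.0), 1.0)

def h_clip(x):
    """Illustrative truncation function: h(x) = x on [-1, 1], clipped outside."""
    return np.clip(x, -1.0, 1.0)

def h_cut(x):
    """Another illustrative truncation function: h'(x) = x 1{|x| <= 1}."""
    return np.where(np.abs(x) <= 1.0, x, 0.0)

a, b = 1.0, 1.0   # |h|, |h'| <= a and h(x) = h'(x) = x for |x| <= b
p = 2             # any positive integer with 2/p <= b
x = np.linspace(-5.0, 5.0, 100001)

# |h - h'| <= 2a g_p and |h^2 - h'^2| <= 2a^2 g_p, checked on the grid
gap = np.abs(h_clip(x) - h_cut(x))
gap_sq = np.abs(h_clip(x) ** 2 - h_cut(x) ** 2)
bound_ok = bool(np.all(gap <= 2 * a * g(p, x) + 1e-12))
bound_sq_ok = bool(np.all(gap_sq <= 2 * a**2 * g(p, x) + 1e-12))
```

Both differences vanish on $|x| \le b$, which is exactly where $g_p$ may be zero; this is why the condition $2/p \le b$ is needed.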
Remark 3.1. If $Y$ strongly majorizes $X$, it is easy to see that the tightness (C-tightness) of $Y$ implies the tightness (C-tightness) of $X$.
Theorem 3.2. Assume that
(i) the sequence $\{X^n_0\}$ is tight in $\mathbb{R}$;
(ii) for all $\varepsilon > 0$,
\[
\lim_{a\uparrow\infty}\,\limsup_{n\to\infty} P(\nu^n([0,1] \times \{x : |x| > a\}) > \varepsilon) = 0; \tag{3.2}
\]
(iii) $\{B^n\}$, $\{\tilde{C}^n\}$, $\{g_p * \nu^n\}$ are C-tight, where $g_p(x) = (p|x| - 1)^+ \wedge 1$ for a positive integer $p$.
Then $\{X^n\}$ is tight in $D[0,1]$.
Proof. Let $h$ be fixed, and set $h_q(x) = q\,h(x/q)$. Consider the following decomposition:
\[
X^n = U^{nq} + V^{nq} + W^{nq},
\]
where $U^{nq} = X^n_0 + M^n(h_q)$, $V^{nq} = B^n(h_q)$, $W^{nq} = \check{X}^n(h_q)$. (The definitions of $M^n(h_q)$, $B^n(h_q)$, $\check{X}^n(h_q)$ can be found in Appendix A, (3.99).) From Lemma 3.1, $\{V^{nq}\}$ and $\{\tilde{C}^n(h_q)\}$ are C-tight due to condition (iii). By Proposition 3.2, for all $a > 0$, $b > 0$,
\[
P\Big(\sup_{t\in[0,1]} |M^n_t(h_q)| \ge a\Big) \le \frac{b}{a^2} + P(\tilde{C}^n_1(h_q) \ge b).
\]
Thus
\[
\begin{aligned}
P\Big(\sup_{t\in[0,1]} |U^{nq}_t| \ge 2a\Big) &\le P(|X^n_0| > a) + P\Big(\sup_{t\in[0,1]} |M^n_t(h_q)| \ge a\Big) \\
&\le P(|X^n_0| > a) + \frac{b}{a^2} + P(\tilde{C}^n_1(h_q) \ge b).
\end{aligned}
\]
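Furthermore, for stopping times $S, T \in \mathcal{S}$ with $S \le T$, and for all $\varepsilon > 0$, $\eta > 0$,
\[
\begin{aligned}
P(|U^{nq}_T - U^{nq}_S| \ge \varepsilon) &\le P\Big(\sup_{t\in[0,T]} |U^{nq}_t - U^{nq}_{t\wedge S}| \ge \varepsilon\Big) \\
&\le \frac{\eta}{\varepsilon^2} + P(\tilde{C}^n_T(h_q) - \tilde{C}^n_S(h_q) \ge \eta)
\end{aligned}
\]
for large enough $n$. Since $\{X^n_0\}$ is tight and $\{\tilde{C}^n(h_q)\}$ is C-tight, we obtain that $\{U^{nq}\}$ is tight by Theorem 2.2. By the definition of $h$, there is a constant $d > 0$ such that $h_q(x) = x$ when $|x| \le dq$; thus, by the definition of $\check{X}^n(h_q)$,
\[
\begin{aligned}
P\Big(\sup_{t\in[0,1]} |W^{nq}_t| > 0\Big) &\le P\Big(\sup_{t\in[0,1]} |\Delta X^n_t| > dq\Big) \\
&\le \varepsilon + P(\nu^n([0,1] \times \{x : |x| > dq\}) > \varepsilon)
\end{aligned}
\]
by Proposition 3.2. Then
\[
\lim_{q\to\infty}\,\limsup_{n\to\infty} P\Big(\sup_{t\in[0,1]} |W^{nq}_t| > 0\Big) = 0
\]
by condition (ii). The theorem now follows from Theorem 1.5.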
3.2
Weak convergence to semimartingale
This section is the heart of this chapter. We consider the following problem: to find conditions under which a given semimartingale sequence $\{X^n\}$ with predictable characteristics $(B^n, C^n, \nu^n)$ and $\tilde{C}^n$ weakly converges. To simplify the problem, we assume that the candidate limiting process is a canonical process, which is defined on the canonical space $D[0,1]$. Recall that the canonical space $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1])$ is built on the space $D[0,1]$ of all càdlàg functions on $[0,1]$; $\mathcal{D}^0_t[0,1]$ denotes the σ-field generated by all maps $\beta \to \beta(s)$ for $0 \le s \le t \le 1$, $\mathcal{D}_t[0,1] = \bigcap_{t\le u\le 1} \mathcal{D}^0_u[0,1]$, $\mathbf{D}[0,1] = (\mathcal{D}_t[0,1])_{0\le t\le 1}$, $\mathcal{D}[0,1] = \bigvee_{0\le t\le 1} \mathcal{D}^0_t[0,1]$, and for $\alpha \in D[0,1]$ the canonical process is $X_t(\alpha) = \alpha(t)$. On $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1])$, the following processes are given:
(i) $B$ is a predictable process with finite variation over finite intervals and $B_0 = 0$;
(ii) $C$ is a continuous adapted process with $C_0 = 0$, and $C_t$ is nondecreasing in $t$;
(iii) $\nu$ is a predictable random measure on $[0,1] \times \mathbb{R}$, which charges neither $[0,1] \times \{0\}$ nor $\{0\} \times \mathbb{R}$, such that $(1 \wedge |x|^2) * \nu < \infty$,
\[
\int_{-\infty}^{+\infty} h(x)\,\nu(\{t\} \times dx) = \Delta B_t
\]
and $\nu(\{t\} \times \mathbb{R}) \le 1$ identically.
We first present the main theorem of this section.
Theorem 3.3. (Liptser and Shiryaev (1983)) Let $D$ be a dense subset of $[0,1]$, and assume the following.
(i) There is a continuous and deterministic increasing function $F_t$ such that $Var(B) \prec F$ and $C + (|x|^2 \wedge 1) * \nu \prec F$.
(ii) For all $t \in [0,1]$,
\[
\lim_{a\uparrow\infty}\,\sup_{\alpha\in D[0,1]} \nu(\alpha; [0,t] \times \{x : |x| > a\}) = 0. \tag{3.3}
\]
(iii) There is a unique probability measure $P$ on $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1])$ such that the canonical process $X$ is a semimartingale on $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1], P)$ with characteristics $(B, C, \nu)$ and initial distribution $\eta$.
(iv) For all $t \in D$ and every bounded continuous function $g$, the functions $B_t$, $\tilde{C}_t$, $g * \nu_t$ are Skorokhod-continuous on $D[0,1]$.
(v)
\[
\eta^n \xrightarrow{d} \eta, \tag{3.4}
\]
where $\eta^n$ is the initial distribution of $X^n$.
(vi)
\[
\sup_{t\in[0,1]} |B^n_t - B_t \circ X^n| \xrightarrow{P} 0, \tag{3.5}
\]
\[
|\tilde{C}^n_t - \tilde{C}_t \circ X^n| \xrightarrow{P} 0 \quad \text{for all } t \in D, \tag{3.6}
\]
and for every bounded continuous function $g$,
\[
|g * \nu^n_t - (g * \nu_t) \circ X^n| \xrightarrow{P} 0 \quad \text{for all } t \in D. \tag{3.7}
\]
Then the laws $\mathcal{L}(X^n)$ weakly converge to $P$.
Here $\mathcal{L}(X^n) \Rightarrow P$ means that $X^n$ weakly converges to $X$. In the following two subsections, we discuss the conditions in Theorem 3.3.
3.2.1
Tightness
In this subsection, our goal is to show the tightness of $\{X^n\}$ if $\{X^n_0\}$ is tight and conditions (i), (ii) and (vi) in Theorem 3.3 hold. We verify the conditions in Theorem 3.2. Let $t \in D$ and $g_b(x) = (b|x| - 1)^+ \wedge 1$. For any $\varepsilon > 0$ and $\eta > 0$, there exists $a > 0$ such that
\[
g_{2/a} * \nu_t \le \frac{\varepsilon}{2}
\]
by condition (ii), and
\[
P\Big(|g_{2/a} * \nu^n_t - (g_{2/a} * \nu_t) \circ X^n| \ge \frac{\varepsilon}{2}\Big) \le \eta
\]
by (3.7). We deduce
\[
P(\nu^n([0,t] \times \{x : |x| > a\}) \ge \varepsilon) \le \eta
\]
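since $\nu^n([0,t] \times \{x : |x| > a\}) \le g_{2/a} * \nu^n_t$. Hence (3.2) holds due to the arbitrariness of $D$. The sequence $\{B \circ X^n\}$ is C-tight, since $Var(B) \prec F$ and $F$ is continuous. From (3.5), $\{B^n\}$ is C-tight. Let $\varepsilon > 0$; there exist $0 = t_0 < t_1 < \cdots < t_q = 1$ with $t_i \in D$ for $i \ge 0$ such that $F_{t_{i+1}} - F_{t_i} \le \varepsilon$. Then, if $t_k \le s \le t_{k+1}$,
\[
\begin{aligned}
|\tilde{C}^n_s - \tilde{C}_s \circ X^n| &\le |\tilde{C}^n_s - \tilde{C}^n_{t_k}| + |\tilde{C}^n_{t_k} - \tilde{C}_{t_k} \circ X^n| + |\tilde{C}_{t_k} \circ X^n - \tilde{C}_s \circ X^n| \\
&\le |\tilde{C}^n_{t_{k+1}} - \tilde{C}^n_{t_k}| + |\tilde{C}^n_{t_k} - \tilde{C}_{t_k} \circ X^n| + \varepsilon \\
&\le |\tilde{C}^n_{t_{k+1}} - \tilde{C}_{t_{k+1}} \circ X^n| + 2|\tilde{C}^n_{t_k} - \tilde{C}_{t_k} \circ X^n| + 2\varepsilon.
\end{aligned}
\]
Thus
\[
\sup_{s\in[0,1]} |\tilde{C}^n_s - \tilde{C}_s \circ X^n| \le 2\varepsilon + \max_k \{3|\tilde{C}^n_{t_k} - \tilde{C}_{t_k} \circ X^n|\}.
\]
Then
\[
\sup_{t\in[0,1]} |\tilde{C}^n_t - \tilde{C}_t \circ X^n| \xrightarrow{P} 0 \tag{3.8}
\]
by (3.6). Similarly,
\[
\sup_{t\in[0,1]} |g * \nu^n_t - (g * \nu_t) \circ X^n| \xrightarrow{P} 0. \tag{3.9}
\]
It is easy to deduce that there exists a constant $c$ such that $\tilde{C} \prec cF$ and $g_b * \nu \prec cF$; thus $\{\tilde{C}^n\}$ and $\{g_b * \nu^n\}$ are C-tight. We obtain the tightness of $\{X^n\}$ by Theorem 3.2.
3.2.2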
Identifying the limit
We first consider the following problem: if a semimartingale sequence $\{X^n\}$ weakly converges to a limiting process $X$, a semimartingale with predictable characteristics $(B, C, \nu)$, are $B$, $C$, $\nu$ the limits of the predictable characteristics of $X^n$, respectively? That is, we need to clarify the relationship between $\{(B^n, C^n, \nu^n)\}$ and $(B, C, \nu)$. The following lemmas are necessary.
Lemma 3.2. Let $H^n$ be a càdlàg adapted process, $M^n$ be a martingale, $H$, $M$ be càdlàg adapted processes defined on the canonical space $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1])$, and $D$ be a dense subset of $[0,1]$. Assume that
(i) the family $\{M^n_t\}_{n>0,\,t\in[0,1]}$ of random variables is uniformly integrable;
(ii) $H^n \Rightarrow H$;
(iii) for all $t \in D$, $M_t$ is $Q$-a.s. continuous on $D[0,1]$, where $Q$ is the distribution of $H$;
(iv) $M^n_t - M_t \circ H^n \xrightarrow{P} 0$ for all $t \in D$.
Then the process $M \circ H$ is a martingale with respect to the filtration generated by $H$.
In general, $H$ can be defined on any given space $(\Omega, \mathcal{F}, P)$. Here, we assume that $H$ is the canonical process on the space $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1], Q)$.
Proof. We first assume that $|M^n| \le b$ identically for some constant $b > 0$. Let $t_1 \le t_2$ with $t_1, t_2 \in D$, and let $f$ be a continuous bounded $\mathcal{D}_{t_1-}[0,1]$-measurable function. We first prove that
\[
E_Q[f(H)(M_{t_2} \circ H - M_{t_1} \circ H)] = 0. \tag{3.10}
\]
The mappings $\alpha \to f(\alpha)[-b \vee M_{t_i}(\alpha) \wedge b]$, $i = 1, 2$, are $Q$-a.s. continuous by (iii), and so
\[
\lim_{n\to\infty} E[f(H^n)(-b \vee (M_{t_i} \circ H^n) \wedge b)] = E_Q[f(H)(-b \vee (M_{t_i} \circ H) \wedge b)].
\]
Furthermore,
\[
\lim_{n\to\infty} E[f(H^n)(-b \vee M^n_{t_i} \wedge b)] = E_Q[f(H)(-b \vee (M_{t_i} \circ H) \wedge b)] \tag{3.11}
\]
since
\[
\lim_{n\to\infty} E[f(H^n)((-b \vee (M_{t_i} \circ H^n) \wedge b) - (-b \vee M^n_{t_i} \wedge b))] = 0
\]
by (iv). For $t \in D$,
\[
M^n_t \xrightarrow{d} M_t \circ H
\]
by (ii), (iii), (iv). We deduce that $|M \circ H| \le b$ identically outside a $Q$-null set, since $D$ is dense in $[0,1]$ and $M$ is càdlàg, so $-b \vee (M_{t_i} \circ H) \wedge b = M_{t_i} \circ H$ $Q$-a.s., and thus
\[
\lim_{n\to\infty} E[f(H^n) M^n_{t_i}] = E_Q[f(H)(M_{t_i} \circ H)]. \tag{3.12}
\]
We have $E[f(H^n)(M^n_{t_2} - M^n_{t_1})] = 0$ since $M^n$ is a martingale. Thus (3.10) follows from (3.12). Finally, a monotone class argument shows that (3.10) holds for all $\mathcal{D}_{t_1-}[0,1]$-measurable bounded $f$. Let $s < t$; there are two sequences $s_n \downarrow s$, $t_n \downarrow t$ such that $s_n, t_n \in D$. So (3.10) holds for every pair $(s_n, t_n)$. By the martingale convergence theorem, we obtain
\[
E_Q[f(H)(M_t \circ H - M_s \circ H)] = 0 \tag{3.13}
\]
for all $s \le t$ and $\mathcal{D}_s[0,1]$-measurable bounded $f$ (note that $\mathcal{D}_s[0,1] \subset \mathcal{D}_{s_n-}[0,1]$). Hence the claim holds when $|M^n| \le b$ identically for some constant $b > 0$. Set
\[
g(b) = \sup_{t\in[0,1],\,n\ge 1} E[|M^n_t| - |M^n_t| \wedge b].
\]
We have $g(b) \to 0$ as $b \to \infty$ by the uniform integrability of the family $\{M^n_t\}_{n>0,\,t\in[0,1]}$. Moreover,
\[
\lim_{n\to\infty} E[|M^n_t| \wedge b] = E_Q[|M_t \circ H| \wedge b] \tag{3.14}
\]
for $t \in D$ by (3.12). Applying (3.14) for $b' > b$, we deduce
\[
E_Q[|M_t \circ H| \wedge b' - |M_t \circ H| \wedge b] \le g(b).
\]
Letting $b' \to \infty$,
\[
E_Q[|M_t \circ H| - |M_t \circ H| \wedge b] \le g(b). \tag{3.15}
\]
On the other hand,
\[
\begin{aligned}
E_Q[|M_t \circ H| 1_{\{|M_t \circ H| > b\}}] &\le E_Q[(|M_t \circ H| \wedge a) 1_{\{|M_t \circ H| > b\}}] + E_Q[|M_t \circ H| - |M_t \circ H| \wedge a] \\
&\le a\,Q[|M_t \circ H| > b] + g(a) \\
&\le \frac{a}{b} \sup_{t\in D} E_Q[|M_t \circ H|] + g(a) \to 0
\end{aligned}
\]
as $b \to \infty$ and then $a \to \infty$. Thus the family $(M_t \circ H)_{t\in D}$ is uniformly integrable. It is easy to deduce from (3.11) and (3.15) that (3.12) still holds under the uniform integrability condition. With (3.12) in hand, the same argument as above yields the lemma in the general case.
Lemma 3.3. For a semimartingale sequence $\{X^n\}$, assume that the distribution of $X^n$ on $D[0,1]$ weakly converges to a probability measure $Q$. Let $D$ be a dense subset of $[0,1]$, which is contained in $[0,1] \setminus \{t \in [0,1] : Q(\Delta X_t \ne 0) > 0\}$. Moreover, assume that
(i)
\[
|B^n_t - B_t \circ X^n| \xrightarrow{P} 0 \quad \text{for all } t \in D, \tag{3.16}
\]
\[
|\tilde{C}^n_t - \tilde{C}_t \circ X^n| \xrightarrow{P} 0 \quad \text{for all } t \in D, \tag{3.17}
\]
and for every bounded continuous function $g$,
\[
|g * \nu^n_t - (g * \nu_t) \circ X^n| \xrightarrow{P} 0 \quad \text{for all } t \in D; \tag{3.18}
\]
(ii) $\sup_{\alpha\in D[0,1]} |\tilde{C}_t(\alpha)| < \infty$ and $\sup_{\alpha\in D[0,1]} |g * \nu_t(\alpha)| < \infty$ for all $t \in [0,1]$ and every continuous bounded function $g$;
(iii) for all $t \in [0,1]$ and every continuous bounded function $g$, the mappings $\alpha \to B_t(\alpha)$, $\alpha \to \tilde{C}_t(\alpha)$, $\alpha \to g * \nu_t(\alpha)$ are $Q$-almost surely Skorokhod continuous.
Then $X$ is a semimartingale on $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1], Q)$ with predictable characteristics $(B, C, \nu)$.
Proof. Set
\[
X'_t = X_t - \sum_{s\le t} [\Delta X_s - h(\Delta X_s)], \qquad V_t = X'_t - B_t - X_0,
\]
\[
Z_t = V_t^2 - \tilde{C}_t, \qquad N^g_t = \sum_{s\le t} g(\Delta X_s) - g * \nu_t,
\]
and
\[
(X^n_t)' = X^n_t - \sum_{s\le t} [\Delta X^n_s - h(\Delta X^n_s)], \qquad V^n_t = (X^n_t)' - B^n_t - X^n_0,
\]
\[
Z^n_t = (V^n_t)^2 - \tilde{C}^n_t, \qquad N^{n,g}_t = \sum_{s\le t} g(\Delta X^n_s) - g * \nu^n_t.
\]
We need to prove that $V$, $Z$, $N^g$ are local martingales on $(D[0,1], \mathcal{D}[0,1], \mathbf{D}[0,1], Q)$. By (ii), we can choose a constant $K$ such that
\[
|\tilde{C}_t(\alpha)| + |g * \nu_t(\alpha)| \le K
\]
for any fixed bounded continuous function $g$, and define
\[
T_n = \inf\{t : \tilde{C}^n_t \ge K + 1 \text{ or } g * \nu^n_t \ge K + 1\}.
\]
We will apply Lemma 3.2 to $H^n = X^n$, $H = X$, $M^{1n}_t = V^n_{t\wedge T\wedge T_n}$, $M^1_t = V_{t\wedge T}$, $M^{2n}_t = Z^n_{t\wedge T\wedge T_n}$, $M^2_t = Z_{t\wedge T}$, $M^{3n}_t = N^{n,g}_{t\wedge T\wedge T_n}$, $M^3_t = N^g_{t\wedge T}$. It is enough to prove that $M^1$, $M^2$, $M^3$ are martingales. Firstly, by Doob's inequality,
\[
E\Big[\sup_t |M^{1n}_t|^2\Big] \le 4E[|M^{1n}_1|^2] = 4E[\tilde{C}^n_{T_n}],
\]
thus $\{M^{1n}_t\}_{t\in[0,1],\,n\ge 1}$ is uniformly integrable by the boundedness of $\tilde{C}^n_{T_n}$. Similarly, we easily deduce that the family $\{M^{2n}_t\}_{t\in[0,1],\,n\ge 1}$ is uniformly integrable as well. Moreover,
\[
E\Big[\sup_t |M^{3n}_t|^2\Big] \le 4E[\langle M^{3n}, M^{3n}\rangle_{t\wedge T_n}] = 4E[g^2 * \nu^n_{t\wedge T_n}] \le 4(K+1)^2
\]
by Doob's inequality. Thus $\{M^{3n}_t\}_{t\in[0,1],\,n\ge 1}$ is uniformly integrable. So condition (i) in Lemma 3.2 is satisfied for $M^{1n}$, $M^{2n}$ and $M^{3n}$. One can easily obtain from (iii) that $M^1_t$, $M^2_t$, $M^3_t$ are $Q$-a.s. continuous on $D[0,1]$ for $t$ in $D$. Note that on $\{T_n \ge T\}$
\[
M^{1n}_t - M^1_t \circ X^n = B_{t\wedge T} \circ X^n - B^n_{t\wedge T},
\]
\[
M^{3n}_t - M^3_t \circ X^n = (g * \nu_{t\wedge T}) \circ X^n - g * \nu^n_{t\wedge T}.
\]
(3.17) and (3.18) imply
\[
|\tilde{C}^n_T - \tilde{C}_T \circ X^n| \xrightarrow{P} 0, \qquad |g * \nu^n_T - (g * \nu_T) \circ X^n| \xrightarrow{P} 0.
\]
Since $\tilde{C}_T \circ X^n \le K$ and $(g * \nu_T) \circ X^n \le K$, we have
\[
P(\tilde{C}^n_T \ge K + 1) \to 0, \qquad P(g * \nu^n_T \ge K + 1) \to 0
\]
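as $n \uparrow \infty$. Thus
\[
P(T_n < T) \to 0
\]
as $n \uparrow \infty$. Then condition (iv) of Lemma 3.2 for $M^1_t$, $M^3_t$ is satisfied by (3.16) and (3.18). On the other hand, we have seen above that $\{V^n_{t\wedge T\wedge T_n}\}$ is uniformly integrable,
\[
|V^n_{t\wedge T\wedge T_n} - V_{t\wedge T} \circ X^n| \xrightarrow{P} 0
\]
and
\[
\begin{aligned}
M^{2n}_t - M^2_t \circ X^n ={}& V^n_{t\wedge T\wedge T_n}(V^n_{t\wedge T\wedge T_n} - V_{t\wedge T} \circ X^n) \\
&+ V_{t\wedge T} \circ X^n\,(V^n_{t\wedge T\wedge T_n} - V_{t\wedge T} \circ X^n) + \tilde{C}_{t\wedge T} \circ X^n - \tilde{C}^n_{t\wedge T\wedge T_n}.
\end{aligned}
\]
Then condition (iv) of Lemma 3.2 for $M^2_t$ is satisfied, and the lemma follows.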
Lemma 3.3 means that the probability measure $Q$ is a solution of the martingale problem $(B, C, \nu)$. When we assume that the limiting process is the canonical process $X$, the distribution of $X$ in $D[0,1]$ is a probability measure on $D[0,1]$. If the martingale problem has a unique solution on $D[0,1]$, then, under the tightness assumption, the distribution of $X$ in $D[0,1]$ is unique, and thus we identify the limiting process.
The proof of Theorem 3.3. According to the previous discussion and condition (iii), it remains to prove that $D' = D \cap \{[0,1] \setminus \{t > 0 : P(\Delta X_t \ne 0) > 0\}\}$ is dense in $[0,1]$. For any $t \in [0,1]$ and any positive rational number $\varepsilon$, there exist $s, s' \in D$ with $s < t < s'$ and
\[
g_{2/\varepsilon} * \nu_{s'} - g_{2/\varepsilon} * \nu_s \le \varepsilon
\]
by condition (i). By Proposition 1.10, there also exist $r, r' \in [0,1] \setminus \{t > 0 : P(\Delta X_t \ne 0) > 0\}$ with $s \le r < t < r' \le s'$ and
\[
\sup_{r<u\le r'} |\Delta X^n_u| \xrightarrow{d} \sup_{r<u\le r'} |\Delta X_u|.
\]
Note that
\[
\begin{aligned}
P(|\Delta X_t| \ge \varepsilon) &\le P\Big(\sup_{r<u\le r'} |\Delta X_u| \ge \varepsilon\Big) \\
&\le \limsup_{n\to\infty} P\Big(\sup_{r<u\le r'} |\Delta X^n_u| \ge \varepsilon\Big) \\
&\le \limsup_{n\to\infty} P\Big(\sup_{s<u\le s'} |\Delta X^n_u| \ge \varepsilon\Big).
\end{aligned}
\]
Then, by Proposition 3.2,
\[
P(|\Delta X_t| \ge \varepsilon) \le 2\varepsilon + \limsup_{n\to\infty} P(g_{2/\varepsilon} * \nu^n_{s'} - g_{2/\varepsilon} * \nu^n_s \ge 2\varepsilon),
\]
since $1_{(s,\infty)} g_{2/\varepsilon} * \nu^n$ is the $P$-compensator of
\[
\sum_{s<u\le\cdot} g_{2/\varepsilon}(\Delta X^n_u).
\]
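At last,
\[
|g_{2/\varepsilon} * \nu^n_s - (g_{2/\varepsilon} * \nu_s) \circ X^n| \xrightarrow{P} 0
\]
implies $P(|\Delta X_t| > \varepsilon) \le 2\varepsilon$ for any $t$. Thus $P(|\Delta X_t| > 0) = 0$ since $\varepsilon$ is arbitrary. We obtain $\{t > 0 : P(\Delta X_t \ne 0) > 0\} = \emptyset$. The proof is complete.
3.3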
Weak convergence to stochastic integral I: the martingale convergence approach
The Itô type stochastic integral is an important example of a semimartingale. Weak convergence to Itô type stochastic integrals is an interesting problem in probability and statistics. In the following two sections, we present some results on convergence to stochastic integrals. In this section, we obtain the result by means of Theorem 3.3; this technique is called the martingale convergence approach. Firstly, we consider the weak convergence of functionals of sums of causal processes. We call $\{X_n, n \ge 1\}$ a causal process if $X_n$ has the form
\[
X_n = g(\cdots, \varepsilon_{n-1}, \varepsilon_n),
\]
where $\{\varepsilon_n; n \in \mathbb{Z}\}$ are mean zero i.i.d. random variables and $g$ is a measurable function. Causal processes are a very important class of stationary processes. They have been widely used in practice and contain many important statistical models, such as linear processes, ARCH models, threshold AR (TAR) models and so on. The asymptotic behavior of the sums of causal processes, $S_n = \sum_{i=1}^n X_i$, is an important subject in both practice and theory. Recall that $Z \in L^p$ ($p > 0$) if $\|Z\|_p = [E(|Z|^p)]^{1/p} < \infty$, and write $\|Z\| = \|Z\|_2$. To study the asymptotic properties of the sums of causal processes, martingale approximation is an effective method. Roughly speaking, martingale approximation is to find a martingale $M_n$ such that the error $\|S_n - M_n\|_p$ is small in some sense. Define
\[
P_k Z = E(Z|\mathcal{F}_k) - E(Z|\mathcal{F}_{k-1}), \quad Z \in L^1,
\]
\[
D_k = \sum_{i=k}^{\infty} P_k X_i, \qquad M_k = \sum_{i=1}^{k} D_i, \qquad R_k = S_k - M_k,
\]
\[
H_k = \sum_{i=1}^{\infty} E(X_{k+i}|\mathcal{F}_k),
\]
\[
\theta_{n,p} = \|P_0 X_n\|_p, \qquad \Lambda_{n,q} = \sum_{i=0}^{n} \theta_{i,q}, \quad n > 0,
\]
\[
\Theta_{m,p} = \sum_{i=m}^{\infty} \theta_{i,p}, \qquad \theta_{n,p} = 0 = \Lambda_{n,p} \ \text{ if } n < 0.
\]
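To make the definitions above concrete, the following sketch evaluates them for an AR(1) process $X_t = \rho X_{t-1} + \varepsilon_t$, a causal process chosen here purely for illustration (the text does not work out this example). For this model one can check by hand that $D_k = \varepsilon_k/(1-\rho)$ and $H_k = \rho X_k/(1-\rho)$, so that $X_t - D_t = H_{t-1} - H_t$ and the error $R_k = S_k - M_k$ telescopes to $H_0 - H_k$. The function name and parameters are hypothetical.

```python
import numpy as np

def ar1_martingale_approximation(n, rho, seed=0):
    """Simulate X_0..X_n from X_t = rho X_{t-1} + eps_t and return
    (X, H, S, M, R) with the closed forms valid for this AR(1) model."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + 1)
    x = np.empty(n + 1)
    x[0] = eps[0] / np.sqrt(1.0 - rho**2)    # start from the stationary law
    for t in range(1, n + 1):
        x[t] = rho * x[t - 1] + eps[t]
    d = eps / (1.0 - rho)                    # D_k = sum_{i>=k} P_k X_i
    h = rho * x / (1.0 - rho)                # H_k = sum_{i>=1} E(X_{k+i} | F_k)
    s = np.cumsum(x[1:])                     # S_1, ..., S_n
    m = np.cumsum(d[1:])                     # M_1, ..., M_n
    r = s - m                                # R_k = S_k - M_k
    return x, h, s, m, r

n = 2000
x, h, s, m, r = ar1_martingale_approximation(n, rho=0.5)
# The approximation error stays of the order of a single observation:
scaled_error = np.max(np.abs(r)) / np.sqrt(n)
```

Since $R_k = H_0 - H_k$ involves only two stationary terms, $\max_k |R_k|/\sqrt{n}$ is small for large $n$, which is the sense in which $M_k$ approximates $S_k$ in this example.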
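It is easy to check that $\{D_k\}$ is a sequence of martingale differences and $M_k$ is a martingale. We will use $M_k$ to approximate $S_k$, and assume that $D_k$ converges almost surely. We present the following theorem.
Theorem 3.4. (Lin and Wang (2010)) Let $f : \mathbb{R} \to \mathbb{R}$ be a twice continuously differentiable function satisfying $|f(x)| \le C(1 + |x|^{\alpha})$ for some positive constants $C$, $\alpha$ and all $x \in \mathbb{R}$. Suppose that $X_t$ is a causal process satisfying
(i) $X_0 \in L^q$, $q > 2$, and $\Theta_{n,q^*} = O(n^{1/q^* - 1/2} (\log n)^{-1})$, where $q^* = \min(q, 4)$;
(ii)
\[
\sum_{k=1}^{\infty} \|E(D_k^2|\mathcal{F}_0) - \sigma^2\|_{q^*/2} < \infty,
\]
where $\|D_k\| = \sigma$;
(iii)
\[
\sum_{k=0}^{\infty} \sum_{i=1}^{\infty} \|E(X_k X_{k+i}|\mathcal{F}_0) - E(X_k X_{k+i}|\mathcal{F}_{-1})\|_4 < \infty
\]
and
\[
\Big\|\sum_{k=0}^{\infty} \sum_{i=1}^{\infty} E(X_k X_{k+i}|\mathcal{F}_0)\Big\|_3 < \infty,
\]
where $\mathcal{F}_k = (\cdots, \varepsilon_{k-1}, \varepsilon_k)$.
Then
\[
\frac{1}{\sqrt{n}} \sum_{t=2}^{[n\cdot]} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) X_t \Rightarrow \lambda \int_0^{\cdot} f'(W(v))\,dv + \sigma \int_0^{\cdot} f(W(v))\,dW(v), \tag{3.19}
\]
where $\lambda = \sum_{j=1}^{\infty} E X_0 X_j$ and $W$ is the standard Brownian motion.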
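The functional on the left-hand side of (3.19) is straightforward to evaluate on data. The sketch below computes its whole path for a given sample and a given $f$; the function name `discrete_stochastic_integral` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np

def discrete_stochastic_integral(x, f):
    """Path k -> n^{-1/2} sum_{t=2}^{k} f(n^{-1/2} sum_{i<t} x_i) x_t, k = 1..n."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # S_{t-1} / sqrt(n) for t = 1, ..., n (with S_0 = 0)
    s_prev = np.concatenate(([0.0], np.cumsum(x)[:-1])) / np.sqrt(n)
    terms = f(s_prev) * x / np.sqrt(n)
    terms[0] = 0.0                     # the sum starts at t = 2
    return np.cumsum(terms)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)           # i.i.d. innovations: a trivial causal process
path = discrete_stochastic_integral(x, lambda u: u)
```

For $f(u) = u$ the path satisfies the exact identity $\frac{1}{n}\sum_{t=2}^{k} S_{t-1} X_t = (S_k^2 - \sum_{i\le k} X_i^2)/(2n)$, a discrete analogue of $\int W\,dW = (W^2 - s)/2$, which is a convenient sanity check for any implementation.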
To prove Theorem 3.4, we verify the conditions in Theorem 3.3. Firstly, we introduce two lemmas.
Lemma 3.4. (Wu (2007)) Assume that $E[X_0] = 0$, $X_0 \in L^q$, $q > 1$, and $\Theta_{0,q} = \sum_{i=0}^{\infty} \theta_{i,q} < \infty$. Then
\[
\Big\|\max_{k\le n} |S_k|\Big\|_q \le \frac{q B_q}{q-1}\, n^{1/q'}\, \Theta_{0,q},
\]
where $q' = \min(2, q)$, $B_q = 18 q^{3/2} (q-1)^{-1/2}$ if $q \in (1,2) \cup (2,\infty)$ and $B_q = 1$ if $q = 2$.
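Lemma 3.5. (Wu (2007)) Under conditions (i) and (ii), there exists a standard Brownian motion $W$ on a richer probability space such that
\[
|S_n - \sigma W(n)| = O_{a.s.}(n^{1/4} (\log n)^{1/2} (\log\log n)^{1/4}).
\]
Set
\[
X_n(s) = \Big(\frac{1}{\sqrt{n}} \sum_{t=2}^{[ns]} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) X_t,\ \frac{1}{\sqrt{n}} \sum_{t=1}^{[ns]} X_t\Big) =: (X^1_n(s), X^2_n(s))
\]
and
\[
X(s) = \Big(\lambda \int_0^s f'(B(v))\,dv + \sigma \int_0^s f(B(v))\,dB(v),\ B(s)\Big) =: (X^1(s), X^2(s)).
\]
We can get the first two predictable characteristics of $X_n$ as follows:
\[
B_n(s) = \Big(\frac{1}{\sqrt{n}} \sum_{t=2}^{[ns]} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(X_t - D_t),\ \frac{1}{\sqrt{n}} \sum_{t=1}^{[ns]} (X_t - D_t)\Big),
\]
\[
C_n(s) = \begin{pmatrix} C^{11}_n(s) & C^{12}_n(s) \\ C^{21}_n(s) & C^{22}_n(s) \end{pmatrix},
\]
\[
C^{11}_n(s) = \frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) E(D_t^2|\mathcal{F}_{t-1}),
\]
\[
C^{22}_n(s) = \frac{1}{n} \sum_{t=1}^{[ns]} E(D_t^2|\mathcal{F}_{t-1}),
\]
\[
C^{12}_n(s) = C^{21}_n(s) = \frac{1}{n} \sum_{t=2}^{[ns]} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) E(D_t^2|\mathcal{F}_{t-1}).
\]
The process $X(s) = (X^1(s), X^2(s))$, a random element in the Skorokhod space $D([0,1] \times [0,1])$, is a solution to the stochastic differential equation
\[
\begin{aligned}
dX^1(t) &= \lambda f'(X^2(t))\,dt + \sigma f(X^2(t))\,dW(t), \\
dX^2(t) &= dW(t).
\end{aligned} \tag{3.20}
\]
The predictable characteristics of $X$ are $(B(X), C(X), 0)$:
\[
B(s, \alpha) = \Big(\lambda \int_0^s f'(\sigma\alpha_2(v))\,dv,\ 0\Big),
\]
\[
C(s, \alpha) = \begin{pmatrix} \sigma^2 \int_0^s f^2(\alpha_2(v))\,dv & \sigma \int_0^s f(\alpha_2(v))\,dv \\ \sigma \int_0^s f(\alpha_2(v))\,dv & \sigma^2 s \end{pmatrix}.
\]
The proof of Theorem 3.4. We can easily obtain
\[
B(s) \circ X_n = \frac{\lambda}{n} \sum_{t=2}^{[ns]} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) + \frac{\lambda}{n} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{[ns]} X_i\Big)(ns - [ns]),
\]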
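\[
C^{11}(s) \circ X_n = \frac{\sigma^2}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) + \frac{\sigma^2}{n} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{[ns]} X_i\Big)(ns - [ns]),
\]
\[
C^{12}(s) \circ X_n = C^{21}(s) \circ X_n = \frac{\sigma^2}{n} \sum_{t=2}^{[ns]} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) + \frac{\sigma^2}{n} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{[ns]} X_i\Big)(ns - [ns]).
\]
Firstly, we need to prove
\[
\sup_{0<s\le 1} |B_n(s) - B(s) \circ X_n| \xrightarrow{P} 0, \tag{3.21}
\]
which will be proved if we show
\[
\sup_{0<s\le 1} \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{[ns]} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(X_t - D_t) - \frac{\lambda}{n} \sum_{t=2}^{[ns]} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big| \xrightarrow{P} 0. \tag{3.22}
\]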
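We have
\[
\begin{aligned}
J_k :={}& \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{k} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(X_t - D_t) - \frac{\lambda}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big| \\
={}& \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{k} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(H_{t-1} - H_t) - \frac{\lambda}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big| \\
={}& \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{k} \Big(f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t} X_i\Big) - f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big) H_t \\
& - \frac{1}{\sqrt{n}} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i\Big) H_k - \frac{\lambda}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big|
\end{aligned}
\]
and
\[
\begin{aligned}
\max_{1\le k\le n} J_k \le{}& \max_{1\le k\le n} \Big|\frac{1}{\sqrt{n}} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i\Big) H_k\Big| + \max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(X_t H_t - \lambda)\Big| \\
& + \max_{1\le k\le n} \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{k} \Big(f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t} X_i\Big) - f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) - f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \frac{X_t}{\sqrt{n}}\Big) H_t\Big|.
\end{aligned}
\]
Firstly, by the definition of $H_k$,
\[
\max_{1\le k\le n} \Big|\frac{1}{\sqrt{n}} f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i\Big) H_k\Big| \xrightarrow{P} 0. \tag{3.23}
\]
Next, we prove
\[
\max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(X_t H_t - \lambda)\Big| \xrightarrow{P} 0. \tag{3.24}
\]
Set $Y_{t,j} = E(X_t X_{t+j}|\mathcal{F}_t) - E(X_t X_{t+j})$; then (3.24) is implied by
\[
\max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big(\sum_{j=1}^{\infty} Y_{t,j}\Big)\Big| \xrightarrow{P} 0. \tag{3.25}
\]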
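We approximate $S_t := \sum_{j=1}^{\infty} Y_{t,j}$ by $\tilde{D}_t := \sum_{k=t}^{\infty} P_t(S_k)$; then we need to prove
\[
\max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \tilde{D}_t\Big| \xrightarrow{P} 0 \tag{3.26}
\]
and
\[
\max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(S_t - \tilde{D}_t)\Big| \xrightarrow{P} 0. \tag{3.27}
\]
For (3.26), we have
\[
E\Big(f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \tilde{D}_t\Big)^2 \le \sqrt{E\Big(f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)^4\Big)\, E(\tilde{D}_t)^4},
\]
\[
\begin{aligned}
[E(\tilde{D}_t)^4]^{1/4} &= \Big[E\Big(\sum_{k=t}^{\infty} P_t(S_k)\Big)^4\Big]^{1/4} \\
&\le \sum_{k=t}^{\infty} \|P_t(S_k)\|_4 = \sum_{k=t}^{\infty} \sum_{i=1}^{\infty} \|E(X_k X_{k+i}|\mathcal{F}_t) - E(X_k X_{k+i}|\mathcal{F}_{t-1})\|_4 < \infty
\end{aligned}
\]
by condition (iii). Since $f'(x) \le C(1 + |x|^{\alpha})$, we have
\[
E\Big(f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \tilde{D}_t\Big)^2 \le L \tag{3.28}
\]
by Lemma 3.4. Then, by the Kolmogorov inequality for martingales, for any $\varepsilon > 0$ we have
\[
\begin{aligned}
P\Big(\max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \tilde{D}_t\Big| > \varepsilon\Big)
&\le \frac{E\big[\sum_{t=2}^{n} f'\big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\big) \tilde{D}_t\big]^2}{n^2 \varepsilon^2} \\
&\le \frac{\max_{2\le t\le n} E\big[f'\big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\big) \tilde{D}_t\big]^2}{n \varepsilon^2} = O\Big(\frac{1}{n}\Big),
\end{aligned}
\]
so (3.26) is proved. For (3.27), we have
\[
S_t - \tilde{D}_t = Z_{t-1} - Z_t, \qquad Z_t = \sum_{i=1}^{\infty} \sum_{k=1}^{\infty} E(X_{t+i+k} X_{t+i}|\mathcal{F}_t)
\]
and
\[
\begin{aligned}
\max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(S_t - \tilde{D}_t)\Big|
={}& \max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(Z_t - Z_{t-1})\Big| \\
\le{}& \max_{1\le k\le n} \Big|\frac{1}{n} f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i\Big) Z_k\Big| \\
& + \max_{1\le k\le n} \Big|\frac{1}{n} \sum_{t=2}^{k} \Big(f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t} X_i\Big) - f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)\Big) Z_t\Big| \\
\le{}& \max_{1\le k\le n} \Big|f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i\Big)\Big| \max_{1\le k\le n} \frac{|Z_k|}{n} \\
& + \max_{1\le k\le n} \frac{|X_k Z_k|}{\sqrt{n}} \sup_{|t| \le \max_{1\le k\le n} |\sum_{i=1}^{k} X_i/\sqrt{n}|} f''(t).
\end{aligned}
\]
Under condition (iii), by the law of large numbers, we have
\[
\max_{1\le k\le n} n^{-1/6} |X_k| \xrightarrow{P} 0, \qquad \max_{1\le k\le n} n^{-1/3} |Z_k| \xrightarrow{P} 0,
\]
so
\[
\max_{1\le k\le n} \frac{1}{\sqrt{n}} |X_k Z_k| \xrightarrow{P} 0.
\]
From Lemma 3.5, we have $\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i = O_P(1)$. By the continuity of $f''(x)$, we obtain (3.27), and thus (3.25) and further (3.24) hold. We have, by the Taylor expansion, that
\[
\begin{aligned}
& \max_{1\le k\le n} \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{k} \Big(f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t} X_i\Big) - f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) - f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \frac{X_t}{\sqrt{n}}\Big) H_t\Big| \\
& \qquad \le \frac{1}{2} \max_{1\le k\le n} \frac{1}{\sqrt{n}} X_k^2 |H_k| \sup_{|t| \le \max_{1\le k\le n} |\sum_{i=1}^{k} X_i/\sqrt{n}|} f''(t).
\end{aligned}
\]
Under condition (iii), by the law of large numbers and an argument similar to the above, we get that
\[
\max_{1\le k\le n} \Big|\frac{1}{\sqrt{n}} \sum_{t=2}^{k} \Big(f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t} X_i\Big) - f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) - f'\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \frac{X_t}{\sqrt{n}}\Big) H_t\Big| \xrightarrow{P} 0.
\]
Then, combining it with (3.23) and (3.24), we obtain
\[
\max_{1\le k\le n} J_k \xrightarrow{P} 0,
\]
which implies (3.22). Next, we prove
\[
\sup_{0<s\le 1} |C^{ij}_n(s) - C^{ij}(s) \circ X_n| \xrightarrow{P} 0. \tag{3.29}
\]
We only consider the case $i = j = 1$, since the proofs for the other cases are similar. Clearly, (3.29) is equivalent to
\[
\sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big)(E(D_t^2|\mathcal{F}_{t-1}) - \sigma^2) - \frac{\sigma^2}{n} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{[ns]} X_i\Big)(ns - [ns])\Big| \xrightarrow{P} 0.
\]
Firstly, we prove
\[
\sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) \sigma^2 - \frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} D_i\Big) \sigma^2\Big| \xrightarrow{P} 0. \tag{3.30}
\]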
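Since
\[
\begin{aligned}
& \sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) - \frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} D_i\Big)\Big| \\
& \qquad \le \max_{1\le t\le n} \Big|f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) - f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} D_i\Big)\Big|,
\end{aligned}
\]
and $f^2$ is a uniformly continuous function, we can get (3.30) from
\[
\max_{1\le t\le n} \Big|\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i - \frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} D_i\Big| \xrightarrow{P} 0. \tag{3.31}
\]
For any $\varepsilon > 0$, by Lemma 3.4, for $2 < q < 4$, we have
\[
P\Big(\frac{1}{\sqrt{n}} \max_{1\le t\le n} \Big|\sum_{i=1}^{t-1} X_i - \sum_{i=1}^{t-1} D_i\Big| > \varepsilon\Big) \le \frac{E[\sum_{t=1}^{n} (X_t - D_t)]^2}{n \varepsilon^2} \le C\,\frac{n^{1/q} (\log n)^{-1}}{n \varepsilon^2},
\]
which implies (3.31), and then we obtain (3.30). By the martingale version of the Skorokhod representation theorem, on a richer probability space, there exist a standard Brownian motion $W$ and nonnegative random variables $\tau_1, \tau_2, \cdots$ with partial sums $T_k = \sum_{i=1}^{k} \tau_i$ such that $T_k - k\sigma^2 = o_{a.s.}(k^{2/q})$ and $M_k = W(T_k)$, $E(\tau_k|\mathcal{F}_{k-1}) = E(D_k^2|\mathcal{F}_{k-1})$ for $k \ge 1$ (cf. Strassen (1964)). For $\frac{T_{k-1}}{n} < s \le \frac{T_k}{n}$, we consider
\[
I_n(s) = \frac{\sigma^2}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} D_i\Big) + \frac{\sigma^2}{n} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{[ns]} D_i\Big)(ns - [ns]).
\]
By the martingale version of the Skorokhod representation theorem, we have
\[
I_n(s) = \frac{\sigma^2}{n} \sum_{t=2}^{k-1} f^2\Big(W\Big(\frac{T_{t-1}}{n}\Big)\Big) + \frac{\sigma^2}{n} f^2\Big(W\Big(\frac{T_{k-1}}{n}\Big)\Big)(ns - [ns]).
\]
Since $T_k - k\sigma^2 = o_{a.s.}(k^{2/q})$, by using the modulus of continuity theorem for the Wiener process,
\[
\max_{t\le k} \Big|W\Big(\frac{T_t}{n}\Big) - W\Big(\frac{\sigma^2 t}{n}\Big)\Big| \le \max_{t\le k}\,\sup_{|x - \sigma^2 t| \le k^{2/q}} \Big|W\Big(\frac{x}{n}\Big) - W\Big(\frac{\sigma^2 t}{n}\Big)\Big| \le o_{a.s.}(k^{1/q} \log k).
\]
By an argument similar to that for (3.30), we have
\[
\sup_{0<s\le 1} \Big|I_n(s) - \Big[\sum_{t=2}^{k-1} f^2\Big(W\Big(\frac{\sigma^2 (t-1)}{n}\Big)\Big) \frac{\sigma^2}{n} + \frac{\sigma^2}{n} f^2\Big(W\Big(\frac{\sigma^2 (k-1)}{n}\Big)\Big)(ns - [ns])\Big]\Big| \xrightarrow{P} 0.
\]
By the Riemann approximation of the stochastic integral and the continuity of the paths of Brownian motion, we have
\[
\sup_{0<s\le 1} \Big|I_n(s) - \int_0^s f^2(W(v))\,dv\Big| \xrightarrow{P} 0. \tag{3.32}
\]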
By noting $M_k = W(T_k)$, we have
\[
\begin{aligned}
& \sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) E(D_t^2|\mathcal{F}_{t-1}) - \frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} W(T_{t-1})\Big) E(D_t^2|\mathcal{F}_{t-1})\Big| \\
& \quad = \sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) E(T_t - T_{t-1}|\mathcal{F}_{t-1}) - \frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} W(T_{t-1})\Big) E(T_t - T_{t-1}|\mathcal{F}_{t-1})\Big| \\
& \quad \xrightarrow{P} 0
\end{aligned}
\]
by Lemma 3.5. By the Riemann approximation of the stochastic integral again and the Approximated Laplacians property (cf. Dellacherie and Meyer (1982)), we obtain
\[
\sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} W(T_{t-1})\Big) E(T_t - T_{t-1}|\mathcal{F}_{t-1}) - \int_0^s f^2(W(v))\,dv\Big| \xrightarrow{P} 0,
\]
thus we have
\[
\sup_{0<s\le 1} \Big|\frac{1}{n} \sum_{t=2}^{[ns]} f^2\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{t-1} X_i\Big) E(T_t - T_{t-1}|\mathcal{F}_{t-1}) - \int_0^s f^2(W(v))\,dv\Big| \xrightarrow{P} 0. \tag{3.33}
\]
By (3.30), (3.32) and (3.33), we obtain (3.29). Furthermore, we prove
\[
\sup_{0<s\le 1} |\Delta X_n(s)| \xrightarrow{P} 0. \tag{3.34}
\]
In fact,
\[
\sup_{0<s\le 1} |\Delta X_n(s)| \le \max_{0\le k\le n} \Big|f\Big(\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i\Big)\Big| \cdot \max_{0\le k\le n} \frac{1}{\sqrt{n}} |X_k| + \max_{0\le k\le n} \frac{1}{\sqrt{n}} |X_k|. \tag{3.35}
\]
From Lemma 3.5, we have $\frac{1}{\sqrt{n}} \sum_{i=1}^{k} X_i = O_P(1)$. Combining it with the assumptions on $f(x)$, we obtain (3.34) from (3.35). (3.34) implies (3.7), since the limiting process is continuous. Under the assumptions of the theorem, the functions $f(x)$ and $f'(x)$ are locally Lipschitz continuous and satisfy the growth condition (i.e. $|f(x)| \le C(1 + |x|^{\alpha})$ for $\alpha > 0$). Thus, the SDE (3.20) has a unique solution. In other words, the martingale problem $\varsigma(\sigma(X_0), X|\mathcal{L}_0, B, C, \nu)$ has a unique solution. Thus, conditions (iii) and (vi) in Theorem 3.3 are satisfied. The remaining conditions are easily verified, since the limiting process is continuous; the details are left to the reader.
Next, we consider the heavy-tailed case. When the distribution of $X_1$ is heavy-tailed, we discuss the weak convergence of the following processes:
\[
\sum_{i=2}^{[nt]} f\Big(\sum_{j=1}^{i-1} (X_{n,j} - E(h(X_{n,j})))\Big)(X_{n,i} - E(h(X_{n,i}))),
\]
where $X_{n,j} = X_j/b_n$ for some $b_n \to \infty$, and $f(x)$, $h(x)$ are continuous functions. This problem is interesting and difficult from the theoretical point of view. If we use the point process method to obtain the weak convergence, the summation functional should be a continuous functional with respect to the Skorokhod topology, and the limiting process should have a compound Poisson representation. However, it is difficult to prove that the summation functional above is continuous with respect to the Skorokhod topology. Moreover, the stochastic integral driven by an $\alpha$-stable Lévy process does not have a compound Poisson representation. Thus, the point process method cannot be used easily. By means of Theorem 3.3, we have the following theorem.
Theorem 3.5. (Lin and Wang (2011)) Let $f : \mathbb{R} \to \mathbb{R}$ be a continuously differentiable function such that
\[
|f(x) - f(y)| \le C|x - y|^a \tag{3.36}
\]
for some constants $C > 0$, $a > 0$ and all $x, y \in \mathbb{R}$. Suppose that $\{X_n\}_{n\ge 1}$ is a sequence of i.i.d. random variables. Set
\[
X_{n,j} = \frac{X_j}{b_n} - E\Big(h\Big(\frac{X_j}{b_n}\Big)\Big) \tag{3.37}
\]
for some $b_n \to \infty$, where $h(x)$ is a continuous function satisfying $h(x) = x$ in a neighborhood of 0 and $|h(x)| \le |x| 1_{|x|\le 1}$. Define the measure $\rho$ by
\[
\rho((x, +\infty]) = p x^{-\alpha}, \qquad \rho([-\infty, -x)) = q x^{-\alpha} \tag{3.38}
\]
for $x > 0$, where $\alpha \in (0,1)$, $0 < p < 1$ and $p + q = 1$. Then
\[
\Big(\sum_{i=1}^{[n\cdot]} X_{n,i},\ \sum_{i=2}^{[n\cdot]} f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big) X_{n,i}\Big) \Rightarrow \Big(Z_{\alpha}(\cdot),\ \int_0^{\cdot} f(Z_{\alpha}(s-))\,dZ_{\alpha}(s)\Big) \tag{3.39}
\]
in $D[0,1]$, where $Z_{\alpha}(s)$ is an $\alpha$-stable Lévy process with Lévy measure $\rho$, if and only if
\[
nP\Big[\frac{X_1}{b_n} \in \cdot\Big] \xrightarrow{v} \rho(\cdot) \tag{3.40}
\]
in $M_p([-\infty, \infty] \setminus \{0\})$.
Set
\[
Y_n(t) = \sum_{i=2}^{[nt]} f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big) X_{n,i}, \qquad Y(t) = \int_0^t f(Z_{\alpha}(s-))\,dZ_{\alpha}(s), \qquad S_n(t) = \sum_{i=1}^{[nt]} X_{n,i}.
\]
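Condition (3.40) can be made concrete with a two-sided Pareto model, which is an assumption made here for illustration and not an example from the text: let $X_1 = \xi U^{-1/\alpha}$ with $U$ uniform on $(0,1)$, and $\xi = \pm 1$ with probabilities $p$ and $q = 1 - p$. With $b_n = n^{1/\alpha}$ the scaled tails match (3.38) exactly once $b_n x \ge 1$. The helper name `scaled_tails` is hypothetical.

```python
def scaled_tails(n, x, alpha, p):
    """Return (n P(X_1/b_n > x), n P(X_1/b_n < -x)) for the two-sided Pareto
    model P(|X_1| > u) = u^{-alpha}, u >= 1, with b_n = n^{1/alpha}."""
    b_n = n ** (1.0 / alpha)
    u = max(b_n * x, 1.0)              # the Pareto tail formula needs u >= 1
    base = n * u ** (-alpha)           # n P(|X_1| > b_n x)
    return p * base, (1.0 - p) * base

# For alpha = 0.5, p = 0.3 the right tail equals p x^{-alpha} as soon as b_n x >= 1.
right, left = scaled_tails(n=100, x=2.0, alpha=0.5, p=0.3)
```

In this model the convergence in (3.40) is an identity on sets bounded away from 0 for all large $n$; for general regularly varying tails the identity holds only in the limit.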
We intend to prove $H_n(t) := (Y_n(t), S_n(t)) \Rightarrow H(t) = (Y(t), Z_{\alpha}(t))$. We first give some lemmas, which are the basis of the proof.
Lemma 3.6. The predictable characteristics of $(Y(t), Z_{\alpha}(t))$ are the triplet $(B, C, \lambda)$ defined as follows:
\[
B^i(t) = \begin{cases} \int_0^t \int_{-\infty}^{\infty} (h(f(Z_{\alpha}(s-))x) - f(Z_{\alpha}(s-))h(x))\,\nu(ds, dx), & i = 1, \\ 0, & i = 2, \end{cases}
\]
\[
C^{ij}(t) = \begin{cases} \int_0^t \int_{-\infty}^{\infty} h^2(f(Z_{\alpha}(s-))x)\,\nu(ds, dx), & i = 1,\ j = 1, \\ \int_0^t \int_{-\infty}^{+\infty} h(f(Z_{\alpha}(s-))x)h(x)\,\nu(ds, dx), & i = 1,\ j = 2 \text{ or } i = 2,\ j = 1, \\ \int_0^t \int_{-\infty}^{+\infty} h^2(x)\,\nu(ds, dx), & i = 2,\ j = 2, \end{cases}
\]
\[
1_G * \lambda(ds, dx) = 1_G(x, f(Z_{\alpha}(s-))x) * \nu(ds, dx) \quad \text{for all } G \in \mathcal{B}^2,
\]
where $\nu(ds, dx)$ is the compensator of the jump measure of $Z_{\alpha}(t)$.
Proof. $B^2(t)$ and $C^{22}(t)$ can be easily obtained from the definition of the predictable characteristics of a semimartingale. Let $\eta(ds, dx)$ be the jump random measure of $Y(t)$ and $\lambda'(ds, dx)$ be the compensator of $\eta(ds, dx)$. If $G$ is a Borel set in $\mathbb{R}$, we have
\[
1_G * \lambda'(ds, dx) = 1_G(f(Z_{\alpha}(s-))x) * \nu(ds, dx).
\]
Set $z = f(Z_{\alpha}(s-))x$, and let $\mu(ds, dx)$ denote the jump measure of $Z_{\alpha}$; then
\[
\begin{aligned}
& Y(t) - \int_0^t \int_{-\infty}^{+\infty} (z - h(z))\,\eta(ds, dz) \\
& \quad = \int_0^t \int_{-\infty}^{+\infty} f(Z_{\alpha}(s-))h(x)\,(\mu(ds, dx) - \nu(ds, dx)) + \int_0^t \int_{-\infty}^{+\infty} f(Z_{\alpha}(s-))(x - h(x))\,\mu(ds, dx) \\
& \qquad - \int_0^t \int_{-\infty}^{+\infty} (f(Z_{\alpha}(s-))x - h(f(Z_{\alpha}(s-))x))\,\mu(ds, dx) \\
& \quad = \int_0^t \int_{-\infty}^{+\infty} f(Z_{\alpha}(s-))h(x)\,(\mu(ds, dx) - \nu(ds, dx)) + \int_0^t \int_{-\infty}^{+\infty} (h(f(Z_{\alpha}(s-))x) - f(Z_{\alpha}(s-))h(x))\,\mu(ds, dx) \\
& \quad = \int_0^t \int_{-\infty}^{+\infty} f(Z_{\alpha}(s-))h(x)\,(\mu(ds, dx) - \nu(ds, dx)) \\
& \qquad + \int_0^t \int_{-\infty}^{+\infty} (h(f(Z_{\alpha}(s-))x) - f(Z_{\alpha}(s-))h(x))\,(\mu(ds, dx) - \nu(ds, dx)) \\
& \qquad + \int_0^t \int_{-\infty}^{+\infty} (h(f(Z_{\alpha}(s-))x) - f(Z_{\alpha}(s-))h(x))\,\nu(ds, dx) \\
& \quad = \int_0^t \int_{-\infty}^{+\infty} h(f(Z_{\alpha}(s-))x)\,(\mu(ds, dx) - \nu(ds, dx)) + \int_0^t \int_{-\infty}^{+\infty} (h(f(Z_{\alpha}(s-))x) - f(Z_{\alpha}(s-))h(x))\,\nu(ds, dx),
\end{aligned}
\]
which implies
\[
B^1_t = \int_0^t \int_{-\infty}^{+\infty} (h(f(Z_{\alpha}(s-))x) - f(Z_{\alpha}(s-))h(x))\,\nu(ds, dx),
\]
and the martingale part of $Y_t$ is
\[
\int_0^t \int_{-\infty}^{+\infty} h(f(Z_{\alpha}(s-))x)\,(\mu(ds, dx) - \nu(ds, dx)). \tag{3.41}
\]
Then we can get $C^{11}$, $C^{12}$ and $C^{21}$. The lemma is proved.
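We set
$$\mu_n(\omega; ds, dx) = \sum_{i=1}^{n} \varepsilon_{(\frac{i}{n}, \frac{X_i(\omega)}{b_n})}(ds, dx),$$
then
$$\nu_n(ds, dx) := \sum_{i=1}^{n} \varepsilon_{(\frac{i}{n})}(ds)\,P\Big(\frac{X_i}{b_n} \in dx\Big)$$
is the compensator of $\mu_n$ by independence of $\{X_i\}_{i\ge 1}$. Set
$$\zeta_n(\omega; ds, dx) = \sum_{i=1}^{n} \varepsilon_{(\frac{i}{n}, \frac{X_i(\omega)}{b_n} - c_n)}(ds, dx),$$
we have that
$$\varphi_n(ds, dx) := \sum_{i=1}^{n} \varepsilon_{(\frac{i}{n})}(ds)\,P\Big(\frac{X_i}{b_n} - c_n \in dx\Big)$$
is the compensator of $\zeta_n$, where $c_n = E[h(\frac{X_1}{b_n})]$.

Firstly, we consider the process $S_n$. Introduce the truncation function $h_n(x) = h(x + c_n)$, we have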
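$$\begin{aligned}
S_n(t) &= \sum_{i=1}^{[nt]} h\Big(\frac{X_i}{b_n}\Big) + \sum_{i=1}^{[nt]} \Big(X_{n,i} - h\Big(\frac{X_i}{b_n}\Big)\Big)\\
&= \sum_{i=1}^{[nt]} \Big(h\Big(\frac{X_i}{b_n}\Big) - c_n\Big) + \sum_{i=1}^{[nt]} \Big(\frac{X_i}{b_n} - h\Big(\frac{X_i}{b_n}\Big)\Big)\\
&= \int_0^t\int_{-\infty}^{+\infty} h(x)\,(\mu_n(ds,dx) - \nu_n(ds,dx)) + \sum_{i=1}^{[nt]} \Big(\frac{X_i}{b_n} - h\Big(\frac{X_i}{b_n}\Big)\Big)\\
&=: \bar{S}_n(t) + \sum_{i=1}^{[nt]} \Big(\frac{X_i}{b_n} - h\Big(\frac{X_i}{b_n}\Big)\Big).
\end{aligned}$$
The predictable characteristics of $\bar{S}_n$ are
$$B_n^2(t) = 0, \qquad C_n^{22}(t) = \int_0^t\int_{-\infty}^{+\infty} h^2(x)\,\nu_n(ds,dx) - \sum_{s\le t}\Big(\int_{-\infty}^{+\infty} h(x)\,\nu_n(\{s\},dx)\Big)^2.$$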
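For $Y_n(t)$, we have
$$\begin{aligned}
Y_n(t) &= \sum_{i=2}^{[nt]} h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big) + \sum_{i=2}^{[nt]} \Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)X_{n,i} - h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big)\Big)\\
&= \sum_{i=2}^{[nt]} \Big(h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big) - E\Big(h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big)\Big|\mathcal{F}_i\Big)\Big)\\
&\quad + \sum_{i=2}^{[nt]} \Big(E\Big(h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big)\Big|\mathcal{F}_i\Big) - f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)E\Big(h\Big(\frac{X_1}{b_n}\Big)\Big)\Big)\\
&\quad + \sum_{i=2}^{[nt]} \Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n} - h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big)\Big)\\
&= \int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\mu_n(ds,dx) - \nu_n(ds,dx))\\
&\quad + \int_0^t\int_{-\infty}^{+\infty} \Big(h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big) - f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)h(x)\Big)\,\nu_n(ds,dx)\\
&\quad + \sum_{i=2}^{[nt]} \Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n} - h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big)\Big)\\
&=: \bar{Y}_n(t) + \sum_{i=2}^{[nt]} \Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n} - h\Big(f\Big(\sum_{j=1}^{i-1} X_{n,j}\Big)\frac{X_i}{b_n}\Big)\Big).
\end{aligned}$$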
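The predictable characteristics of $\bar{Y}_n(t)$ are
$$B_n^1(t) = \int_0^t\int_{-\infty}^{+\infty} \Big(h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big) - f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)h(x)\Big)\,\nu_n(ds,dx),$$
$$C_n^{11}(t) = \int_0^t\int_{-\infty}^{+\infty} h^2\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\nu_n(ds,dx) - \sum_{s\le t}\Big(\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\nu_n(\{s\},dx)\Big)^2,$$
$$\begin{aligned}
C_n^{12}(t) = C_n^{21}(t) &= \int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)h(x)\,\nu_n(ds,dx)\\
&\quad - \sum_{s\le t}\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\nu_n(\{s\},dx)\int_{-\infty}^{+\infty} h(x)\,\nu_n(\{s\},dx).
\end{aligned}$$

Lemma 3.7. Under (3.40),
$$\int_{-\infty}^{+\infty} g(x)\,nF_n(dx) \to \int_{-\infty}^{+\infty} g(x)\,\rho(dx), \qquad n \to \infty,$$
for every continuous $g \in C_b^2(\mathbb{R})$, where $F_n(x) = P(\frac{X_1}{b_n} \le x)$.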
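Proof. From (3.40), we have
$$\int_{-\infty}^{+\infty} h(x)\,nF_n(dx) \to \int_{-\infty}^{+\infty} h(x)\,\rho(dx), \qquad n \to \infty,$$
for every continuous function $h$ with compact support. From (3.38), for any $\varepsilon > 0$ there exists $r > 0$ such that $\rho((r, +\infty)) + \rho((-\infty, -r)) < \varepsilon$. Set $B_r = [-r, r]$; we can find a continuous function $g_r$ with compact support such that $1_{B_r} \le g_r \le 1$. Then
$$\begin{aligned}
\Big|\int_{-\infty}^{+\infty} g(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)\,\rho(dx)\Big|
&\le \Big|\int_{-\infty}^{+\infty} g(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)g_r(x)\,nF_n(dx)\Big|\\
&\quad + \Big|\int_{-\infty}^{+\infty} g(x)g_r(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)g_r(x)\,\rho(dx)\Big|\\
&\quad + \Big|\int_{-\infty}^{+\infty} g(x)g_r(x)\,\rho(dx) - \int_{-\infty}^{+\infty} g(x)\,\rho(dx)\Big|\\
&\le \Big|\int_{-\infty}^{+\infty} g(x)g_r(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)g_r(x)\,\rho(dx)\Big| + \|g\|\big(nF_n(B_r^c) + \rho(B_r^c)\big).
\end{aligned}$$
For $\varepsilon > 0$, there exists $n_0$ such that for $n \ge n_0$,
$$\Big|\int_{-\infty}^{+\infty} g(x)g_r(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)g_r(x)\,\rho(dx)\Big| < \varepsilon.$$
From Theorem 3.2 (ii) in Resnick (2007), there exists $n_1$ such that for $n \ge n_1$, $|nF_n(B_r^c) - \rho(B_r^c)| < \varepsilon$, and hence $nF_n(B_r^c) + \rho(B_r^c) < 3\varepsilon$ because $\rho(B_r^c) < \varepsilon$. Then we have
$$\Big|\int_{-\infty}^{+\infty} g(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)\,\rho(dx)\Big| \le (1 + 3\|g\|)\varepsilon$$
for $n \ge \max\{n_0, n_1\}$, which implies the lemma.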
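As a concrete instance of the vague convergence (3.40) used in Lemma 3.7, the scaled tail can be computed in closed form for a Pareto-type law. The sketch below assumes $P(X_1 > y) = p\,y^{-\alpha}$ for $y \ge 1$ and $b_n = n^{1/\alpha}$; both choices, and the function names `scaled_tail` and `rho_tail`, are illustrative assumptions rather than part of the text.

```python
import numpy as np

def scaled_tail(n, x, alpha, p):
    """n * P(X_1 / b_n > x) for x > 0 when P(X_1 > y) = p * y**(-alpha), y >= 1,
    and b_n = n**(1/alpha). For y < 1 the whole positive mass p lies above y."""
    b_n = n ** (1.0 / alpha)
    y = np.maximum(b_n * x, 1.0)
    return n * p * y ** (-alpha)

def rho_tail(x, alpha, p):
    """rho((x, +infinity]) from (3.38)."""
    return p * x ** (-alpha)

# Once b_n * x >= 1 the two expressions agree up to rounding.
approx = scaled_tail(1000, 0.5, alpha=0.7, p=0.6)
limit = rho_tail(0.5, alpha=0.7, p=0.6)
```

For this law the pre-limit and the limit coincide as soon as $b_n x \ge 1$; for a general law in the domain of attraction the agreement is only asymptotic.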
From (3.40), we can obtain
$$\sum_{i=1}^{[n\cdot]} X_{n,i} \Rightarrow Z_\alpha(\cdot)$$
by Theorem 2.7. So $\sum_{i=1}^{[n\cdot]} X_{n,i}$ is relatively compact, in other words, $\sum_{i=1}^{[n\cdot]} X_{n,i}$ is tight. By this fact, for any $\varepsilon > 0$ there are $n_0 > 0$ and $M > 0$ such that
$$P\Big(\sup_{t\le 1} |S_n(t)| > M\Big) < \varepsilon \quad \text{for } n \ge n_0.$$
Since the convergence $H_n \Rightarrow H$ is a local property, it suffices to prove Theorem 3.5 for $f(S_n(t-))1_{[0,T]}$ and $f(Z_\alpha(t-))1_{[0,T]}$ for any stopping time $T$. We use $S_n^C$ and $S^C$ to replace $T$ in $f(S_n(t-))1_{[0,T]}$ and $f(Z_\alpha(t-))1_{[0,T]}$ respectively, where
$$S_n^C = \inf(s : |S_n(s)| \ge C \text{ or } |S_n(s-)| \ge C), \qquad S^C = \inf(s : |S(s)| \ge C \text{ or } |S(s-)| \ge C).$$
As described in Pagès (1986), we can assume
$$f(S_n(t-)) \le C, \qquad f(Z_\alpha(t-)) \le C \tag{3.42}$$
for some constant $C$ in the following proof. Let $\mathcal{K}$ be a compact subset of $\mathbb{R}$ such that $|u| \le C$ for any $u \in \mathcal{K}$. Set
$$1_G * \lambda_n(ds, dx) = 1_G\Big(x, f\Big(\sum_{i=1}^{[ns]-1} X_{n,i}\Big)x\Big)\,\nu_n(ds, dx) \quad \text{for } G \in \mathcal{B}^2.$$
Lemma 3.8. Under (3.40), we have that for $t > 0$,
$$\mathrm{TV}[K * \lambda_n - (K * \lambda) \circ H_n]_t \xrightarrow{P} 0 \tag{3.43}$$
for every bounded continuous function $K(x, u)$ on $\mathbb{R} \times \mathcal{K}$ satisfying $K(x, u) = 0$ for all $|x| \le \delta$, $u \in \mathcal{K}$, for some $\delta > 0$.

Proof. At first, we show that for every bounded continuous function $g$ on $\mathbb{R}$,
$$\mathrm{TV}[g * \nu_n - g * \nu]_t \xrightarrow{P} 0 \quad \text{for } t > 0. \tag{3.44}$$
In fact,
$$\int_0^t\int_{-\infty}^{+\infty} g(x)\,\nu_n(ds, dx) = [nt]E(g(X_{n,1}))$$
and
$$\int_0^t\int_{-\infty}^{+\infty} g(x)\,\nu(ds, dx) = t\int_{-\infty}^{+\infty} g(x)\,\rho(dx).$$
Hence, we have
$$\mathrm{TV}[g * \nu_n - g * \nu]_t \le \frac{[nt]}{n}\Big|\int_{-\infty}^{+\infty} g(x)\,nF_n(dx) - \int_{-\infty}^{+\infty} g(x)\,\rho(dx)\Big| + \Big|\frac{[nt]}{n} - t\Big|\int_{-\infty}^{+\infty} g(x)\,\rho(dx),$$
and (3.44) is obtained by Lemma 3.7. As proved in Lemma IX.5.22 in Jacod and Shiryaev (2003), we only need to prove (3.43) for $K(x, u) = g_a(x)d(x)R(u)$, where $R(u)$ is a continuous function on $\mathcal{K}$ and $d$ is a bounded continuous function on $\mathbb{R}$. Note that
$$\begin{aligned}
\mathrm{TV}[K * \lambda_n - (K * \lambda) \circ H_n]_t
&\le \Big|R\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big)\Big|\,\mathrm{TV}[dg_a * \nu_n - dg_a * \nu]_t\\
&\quad + \Big|R(f(S_n(t-))) - R\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big)\Big| \cdot (dg_a * \nu)_t\\
&\le \|R\|\,\mathrm{TV}[dg_a * \nu_n - dg_a * \nu]_t + \|d\|\,\Big|R(f(S_n(t-))) - R\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big)\Big| \cdot (g_a * \nu)_t.
\end{aligned}$$
We can get
$$\|R\|\,\mathrm{TV}[dg_a * \nu_n - dg_a * \nu]_t \xrightarrow{P} 0$$
by (3.44). For any $\varepsilon > 0$, there exists $\delta_1 > 0$ such that $|y - y'| < \delta_1 \Rightarrow |R(y) - R(y')| < \varepsilon$, since $R(u)$ is uniformly continuous on $\mathcal{K}$. Then we have
$$\begin{aligned}
P\Big(\|d\|\,\Big|R(f(S_n(t-))) - R\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big)\Big| > \varepsilon\Big)
&\le P\Big(\Big|f(S_n(t-)) - f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big| > \frac{\delta_1}{\|d\|}\Big)\\
&\le P\Big(|X_{n,1}| > \frac{\delta_1}{\|d\|}\Big) \le \frac{2\rho\big((\frac{\delta_1}{\|d\|}, \infty]\big)}{n} \to 0
\end{aligned}$$
by the Lipschitz condition of $f$ and (3.40). Then
$$\|d\|\,\Big|R(f(S_n(t-))) - R\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big)\Big| \xrightarrow{P} 0, \tag{3.45}$$
which implies
$$\|d\|\,\Big|R(f(S_n(t-))) - R\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big)\Big| \cdot (g_a * \nu)_t \xrightarrow{P} 0$$
since $g_a * \nu$ is an increasing deterministic measure. We complete the proof of the lemma.
Lemma 3.9. Under (3.40), we have
$$\mathrm{TV}[B_n^1 - B^1 \circ S_n]_t \xrightarrow{P} 0 \quad \text{for any } t > 0.$$

Proof. Let $K(x, u) = h(ux) - uh(x)$. We obtain the lemma by Lemma 3.8.

Lemma 3.10. Under (3.40), we have
$$\mathrm{TV}[C_n^{ij} - C^{ij} \circ S_n]_t \xrightarrow{P} 0 \quad \text{for any } t > 0,$$
where $i, j = 1, 2$.

Proof. We only prove the case $i = j = 1$. The other cases can be proved similarly. Although this lemma is different from Lemma 3.8, the method of proof is the same as that of Lemma 3.8, by noting
$$\begin{aligned}
&\mathrm{TV}\Big[h^2\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)x\Big) * \nu_n(ds, dx) - h^2(f(Z_\alpha(s-))x) * \nu(ds, dx) \circ S_n\Big]_t\\
&\quad\le \Big|f^2\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big|\,\mathrm{TV}[x^2 * \nu_n - x^2 * \nu]_t + \Big|f^2(S_n(t-)) - f^2\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big| \cdot (x^2 * \nu)_t\\
&\quad\le C\,\mathrm{TV}[x^2 * \nu_n - x^2 * \nu]_t + 2C\Big|f(S_n(t-)) - f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)\Big| \cdot (x^2 * \nu)_t \xrightarrow{P} 0
\end{aligned}$$
by $|h(x)| \le |x|1_{|x|\le 1}$ and Lemma 3.8. It suffices to show
$$\mathrm{TV}\Big[\sum_{s\le t}\Big(\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\nu_n(\{s\}, dx)\Big)^2\Big] \xrightarrow{P} 0, \tag{3.46}$$
which is equivalent to
$$\mathrm{TV}\Big[\sum_{s\le t}\Big(\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\nu_n(\{s\}, dx)\Big)^2 - \sum_{s\le t}\Big(\int_{-\infty}^{+\infty} h(f(Z_\alpha(s-))x)\,\nu(\{s\}, dx)\Big)^2 \circ S_n\Big]_t \xrightarrow{P} 0 \tag{3.47}$$
since $\nu(\{s\}, dx) = 0$. However,
$$\mathrm{TV}\Big[h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big) * \nu_n(ds, dx) - h(f(Z_\alpha(s-))x) * \nu(ds, dx) \circ S_n\Big]_t \xrightarrow{P} 0 \tag{3.48}$$
implies (3.47), and the proof of (3.48) is similar to the argument in the proof of Lemma 3.8. We complete the proof.

Set
$$1_G * \omega_n(ds, dx) = 1_G\Big(x, f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\varphi_n(ds, dx) \quad \text{for } G \in \mathcal{B}^2.$$
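Lemma 3.11. Under (3.40), we have that for $t > 0$,
$$\mathrm{TV}[K * \omega_n - (K * \lambda) \circ S_n]_t \xrightarrow{P} 0$$
for every bounded continuous function $K(x, u)$ on $\mathbb{R} \times \mathcal{K}$ satisfying $K(x, u) = 0$ for all $|x| \le \delta$, $u \in \mathcal{K}$, for some $\delta > 0$.

Proof. Note that
$$|c_n| \le E\Big|\frac{X_1}{b_n}\Big|1_{|X_1|\le b_n} = \int_0^1 \Big(P\Big(\Big|\frac{X_1}{b_n}\Big| > y\Big) - P\Big(\Big|\frac{X_1}{b_n}\Big| > 1\Big)\Big)dy \to 0.$$
For $a \ne 0$,
$$n\Big(P\Big(\frac{X_1}{b_n} - c_n < a\Big) - P\Big(\frac{X_1}{b_n} < a\Big)\Big) \le nP\Big(a - |c_n| \le \frac{X_1}{b_n} \le a + |c_n|\Big) \to 0,$$
which implies
$$nP\Big[\frac{X_1}{b_n} - c_n \in \cdot\Big] \xrightarrow{v} \rho(\cdot) \tag{3.49}$$
by (3.40). From (3.49) and Lemma 3.7, we complete the proof.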
The proof of Theorem 3.5. Assume that (3.39) with $f(x) = x$ holds. From Theorem 2.6, we can get (3.40). Conversely, assume that (3.40) holds; we prove (3.39). The proof is presented in two steps.

(a) We prove the tightness of $\{H_n\}$. The functions $\alpha \mapsto B_t(\alpha), C_t(\alpha), g * \lambda_t(\alpha)$ are Skorokhod-continuous on $D([0, 1])$ since the truncation function is continuous. Then $B_n(t)$, $C_n(t)$, $g * \omega_n(t)$ are C-tight by Lemmas 3.9-3.11. From (3.40), $\mathcal{L}(S_n) \Rightarrow \mathcal{L}(Z_\alpha)$. It means that $\{S_n\}$ is tight. Note that
$$\sum_{i=1}^{[nt]} \frac{X_i}{b_n} = S_n(t) + [nt]c_n,$$
and $[nt]c_n \to \int_0^t\int_{-\infty}^{+\infty} h(x)\,\nu(ds, dx)$. Hence $\{\sum_{i=1}^{[n\cdot]} \frac{X_i}{b_n}\}$ is tight and
$$\lim_{b\uparrow\infty}\limsup_{n\to\infty} P\big(|x^2|1_{\{|x|>b\}} * \varphi_n(t \wedge S_n^a) > \varepsilon\big) = 0 \tag{3.50}$$
for all $t > 0$, $a > 0$, $\varepsilon > 0$. We have
$$\lim_{b\uparrow\infty}\limsup_{n\to\infty} P\big(|x^2|1_{\{|x|>b\}} * \omega_n(t \wedge S_n^a) > \varepsilon\big) = 0$$
by (3.42), and hence $\{H_n\}$ is tight.

(b) Identify the limiting process. We need to prove that every weakly convergent subsequence converges to a common limit $P$; then the limiting process is identified. By (3.36), the martingale problem $\varsigma(\sigma(X_0), X|L_0, B, C, \lambda)$ has a unique solution by Theorem 6.13 in Applebaum (2009). So we need to prove that the limiting process $H$ has predictable characteristics $(B, C, \lambda)$ under $P$, in other words, that
$$h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t),$$
$$\big(h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\big)^2 - C^{11} \circ S_n(t),$$
$$g * \eta \circ S_n(t) - g * \lambda \circ S_n(t)$$
are local martingales under $P$, where $g$ is a bounded continuous function. We have
$$\begin{aligned}
&\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx)) - h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\\
&\quad = h\Big(f\Big(\sum_{i=1}^{[nt]-1} X_{n,i}\Big)x\Big) * \varphi_n(ds, dx) - h(f(Z_\alpha(s-))x) * \nu(ds, dx) \circ S_n(t) \xrightarrow{P} 0
\end{aligned} \tag{3.51}$$
by (3.48) and Lemma 3.11. Set
$$\tilde{C}_n^{11}(t) = \int_0^t\int_{-\infty}^{+\infty} h^2\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\varphi_n(ds, dx) - \sum_{s\le t}\Big(\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,\varphi_n(\{s\}, dx)\Big)^2.$$
Since $\mathcal{L}(S_n) \Rightarrow P$, (3.42) implies that $C^{11} \circ S_n(t) \le C$, and Lemmas 3.10, 3.11 imply that $P(\tilde{C}_n^{11}(1) \ge C + 1) \to 0$ as $n \to \infty$. Set $T_n = \inf\{t : \tilde{C}_n^{11}(t) > C + 1\}$; we have
$$\lim_{n\to\infty} P(T_n < 1) = 0.$$
So
$$E\Big(\sup_{0\le t\le 1}\Big|\int_0^{t\wedge T_n}\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))\Big|^2\Big) \le 4E(\tilde{C}_n^{11}(T_n)) \tag{3.52}$$
by Doob's inequality. (3.51) and (3.52) imply that $h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)$ is a local martingale under $P$, since
$$\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))$$
is a local martingale. It is easy to see that
$$\begin{aligned}
&\Big(\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))\Big)^2\\
&\qquad - \big(h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\big)^2 + C^{11} \circ S_n(t) - \tilde{C}_n^{11}(t)\\
&\quad = \Big(\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))\Big)\\
&\qquad\quad \cdot \Big(\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx)) - h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\Big)\\
&\qquad + \big(h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\big)\\
&\qquad\quad \cdot \Big(\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx)) - h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\Big)\\
&\qquad + C^{11} \circ S_n(t) - \tilde{C}_n^{11}(t).
\end{aligned}$$
Moreover,
$$\int_0^{t\wedge T_n}\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))$$
is uniformly integrable by (3.52), thus
$$\Big(\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))\Big)^2 - \big(h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\big)^2 + C^{11} \circ S_n(t) - \tilde{C}_n^{11}(t) \xrightarrow{P} 0$$
by (3.51) and Lemma 3.10. By Proposition 3.3,
$$E\Big(\sup_{0\le t\le 1}\Big|\int_0^{t\wedge T_n}\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))\Big|^4\Big) \le C\big[E(\tilde{C}_n^{11}(T_n))^2\big]^{1/2} + CE(\tilde{C}_n^{11}(T_n))^2,$$
where the $C$ are constants. Hence $\big(h(f(Z_\alpha(s-))x) * (\mu(ds, dx) - \nu(ds, dx)) \circ S_n(t)\big)^2 - C^{11} \circ S_n(t)$ is a local martingale under $P$, since
$$\Big(\int_0^t\int_{-\infty}^{+\infty} h\Big(f\Big(\sum_{j=1}^{[ns]-1} X_{n,j}\Big)x\Big)\,(\zeta_n(ds, dx) - \varphi_n(ds, dx))\Big)^2 - \tilde{C}_n^{11}(t)$$
is a local martingale. For $g * \eta \circ S_n(t) - g * \lambda \circ S_n(t)$ we can get a similar conclusion by Lemma 3.11. We complete the proof of the theorem.

3.4 Weak convergence to stochastic integral II: Kurtz and Protter's approach
Weak convergence to a stochastic integral is a key step in the study of the error distribution for approximating a stochastic differential equation. However, the method in the previous section may not be suitable when we intend to show that convergence of the integrand and integrator implies convergence of the integral under some suitable conditions. Jakubowski, Mémin and Pagès (1989) and Kurtz and Protter (1991) introduced different approaches to complete this work. Their methods are similar. In this section, we introduce Kurtz and Protter's approach and its applications.

Kurtz and Protter's approach is quite simple when it deals with continuous path processes. In this section, we only discuss continuous path processes. For processes with jumps, the core of the method is similar, but the proof is more complex.

First recall that, for every $\delta > 0$, any semimartingale $X$ can be written as
$$X_t = X_0 + A(\delta)_t + M(\delta)_t + \sum_{s\le t} \Delta X_s 1_{\{|\Delta X_s|>\delta\}}, \tag{3.53}$$
where $A(\delta)$ is a predictable process with finite variation, null at 0, $M(\delta)$ is a local martingale null at 0, and $\Delta X_s$ denotes the jump size of $X$ at time $s$.

Let $\{X^n\}$ be a sequence of $\mathbb{R}^d$-valued semimartingales, with $A(\delta)^n$ and $M(\delta)^n$ associated with $X^n$ as in (3.53). We say that the sequence $\{X^n\}$ satisfies $(*)$ if for some $\delta > 0$ and for each $i$ the sequence
$$\langle M(\delta)^{n,i}, M(\delta)^{n,i}\rangle_1 + \int_0^1 |dA(\delta)^{n,i}_s| + \sum_{0<s\le 1} |\Delta X^{n,i}_s|1_{\{|\Delta X^{n,i}_s|>\delta\}}$$
is tight. It turns out that this property is equivalent to the notion of uniform tightness (UT) as introduced by Jakubowski, Mémin and Pagès (1989).

Recall that, for any (possibly multidimensional) process $V$:
$$V^* = \sup_{t\in[0,1]} \|V_t\|.$$
The following theorem is the basic tool in this section.

Theorem 3.6. (Kurtz and Protter (1991)) Let $\{X^n\}$ and $\{Y^n\}$ be two sequences of $\mathbb{R}^d$-valued semimartingales, relative to the filtrations $(\mathcal{F}^n_t)$.
(a) If both sequences $\{X^n\}$ and $\{Y^n\}$ have $(*)$, then so has the sequence $\{X^n + Y^n\}$.
(b) If each $X^n$ is of finite variation and if the sequence $\{\int_0^1 |dX^n_s|\}$ is tight, then the sequence $\{X^n\}$ has $(*)$.
(c) Let $\{H^n\}$ be a sequence of $(\mathcal{F}^n_t)$-predictable processes such that the sequence $\{H^{n*}\}$ is tight. If the sequence $\{X^n\}$ has $(*)$, so has the sequence $\{H^n \cdot X^n\}$.
(d) Let $\{H^n\}$ and $\{H'^n\}$ be two sequences of $(\mathcal{F}^n_t)$-predictable processes such that the sequence $\{H^{n*}\}$ is tight and that $(H^n - H'^n)^* \xrightarrow{P} 0$. If the sequence $\{X^n\}$ has $(*)$, then $(H^n \cdot X^n - H'^n \cdot X^n)^* \xrightarrow{P} 0$.
(e) Suppose that $\{X^n\}$ weakly converges. Then $(*)$ is necessary and sufficient for the following property: for any sequence $\{H^n\}$ of $(\mathcal{F}^n_t)$-adapted, right-continuous and left-hand limited processes such that the sequence $\{(H^n, X^n)\}$ weakly converges to a limit $(H, X)$, $X$ is a semimartingale with respect to the filtration generated by the process $(H, X)$, and we have
$$(H^n, X^n, H^n_- \cdot X^n) \Rightarrow (H, X, H_- \cdot X).$$
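Part (e) can be illustrated in the simplest continuous-path case. The sketch below is an assumption-laden toy example, not taken from Kurtz and Protter: it takes $H^n = X^n$ to be a Brownian motion $W$ sampled on the grid $i/n$, so that $H^n_- \cdot X^n$ is the left-point Riemann sum $\sum_i W_{(i-1)/n}(W_{i/n} - W_{(i-1)/n})$, and compares it with $\int_0^1 W_s\,dW_s = (W_1^2 - 1)/2$ from Itô's formula. The function name `ito_sum` is hypothetical.

```python
import numpy as np

def ito_sum(n, rng):
    """Left-point sum of W against its own increments on the grid i/n,
    together with the Ito integral (W_1^2 - 1)/2 on the same path."""
    dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)   # Brownian increments
    W = np.concatenate(([0.0], np.cumsum(dW)))       # W at 0, 1/n, ..., 1
    left_point_sum = float(np.sum(W[:-1] * dW))      # integrand evaluated at the left endpoint
    ito_integral = 0.5 * (W[-1] ** 2 - 1.0)
    return left_point_sum, ito_integral

rng = np.random.default_rng(0)
approx, exact = ito_sum(100_000, rng)
```

The difference `approx - exact` equals $(1 - \sum_i (\Delta W_i)^2)/2$, whose variance is $1/(2n)$, so the two values agree closely for large $n$.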
The proof of Theorem 3.6 can be found in Kurtz and Protter (1991). In the previous section, we proved Theorem 3.4 by means of the strong approximation technique. However, in the multivariate case, we cannot find a single stopping time to be embedded into every component of a multivariate Brownian motion, so we cannot obtain the corresponding results. In this section, we use Kurtz and Protter's approach to overcome this difficulty.

Let $\{\varepsilon_i, \eta_i\}_{i\in\mathbb{Z}}$ be i.i.d. random variables,
$$u_k = \sum_{j=0}^{\infty} \varphi_j \varepsilon_{k-j}, \qquad x_k = \sum_{j=0}^{\infty} \delta_j \eta_{k-j},$$
where $\{\varphi_j\}_{j\ge 0}$, $\{\delta_j\}_{j\ge 0}$ are two sequences of numbers.
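The two linear processes can be simulated once the infinite sums are truncated. The following sketch is only illustrative: the truncation of the coefficient sequences to finitely many terms, the Gaussian innovations, the correlation parameter `r` between $\varepsilon_i$ and $\eta_i$, and the helper names are all assumptions made for the example.

```python
import numpy as np

def innovations(size, r, rng):
    """i.i.d. pairs (eps_i, eta_i) with mean 0, variance 1 and correlation r (Gaussian here)."""
    eps = rng.standard_normal(size)
    eta = r * eps + np.sqrt(1.0 - r ** 2) * rng.standard_normal(size)
    return eps, eta

def linear_process(coeffs, noise):
    """Truncated moving average: out[k] = sum_j coeffs[j] * noise[k + J - 1 - j], J = len(coeffs)."""
    return np.convolve(noise, coeffs, mode="valid")

def simulate_u_x(n, phi, delta, r, rng):
    """Return n consecutive values of (u_k, x_k) driven by the same innovation pairs."""
    J = max(len(phi), len(delta))
    phi = np.pad(phi, (0, J - len(phi)))        # pad so both filters share one time index
    delta = np.pad(delta, (0, J - len(delta)))
    eps, eta = innovations(n + J - 1, r, rng)
    return linear_process(phi, eps), linear_process(delta, eta)

rng = np.random.default_rng(1)
phi = [1.0, 0.5, 0.25]                                   # short-memory filter for u_k
delta = [1.0 / (j + 1) ** 0.75 for j in range(200)]      # slowly decaying filter for x_k
u, x = simulate_u_x(1000, phi, delta, r=0.3, rng=rng)
```

Truncating a slowly decaying coefficient sequence changes the long-run behaviour, so a simulation of this kind only approximates the long-memory case.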
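A fractional Brownian motion with Hurst parameter $0 < H < 1$ on $D[0, 1]$ is defined by
$$B^H_t = \frac{1}{A(H)}\Big(\int_{-\infty}^{0} \big[(t - s)^{H-1/2} - (-s)^{H-1/2}\big]\,dW^*_{-s} + \int_0^t (t - s)^{H-1/2}\,dW_s\Big),$$
where
$$A(H) = \Big(\frac{1}{2H} + \int_0^{\infty} \big[(1 + s)^{H-1/2} - s^{H-1/2}\big]^2 ds\Big)^{1/2},$$
$W = (W_s)_{s\ge 0}$ is a standard Brownian motion and $W^* = (W^*_s)_{s\ge 0}$ is an independent copy of $W$.

Theorem 3.7. Let $f : \mathbb{R} \to \mathbb{R}$ be a twice differentiable function such that
$$|f(x)| \le C(1 + |x|^{\alpha}), \qquad |f'(x)| \le C(1 + |x|^{\alpha})$$
for some constants $C > 0$ and $\alpha > 0$ and all $x \in \mathbb{R}$, and $f''(x)$ is locally bounded. Assume
(i) $E(\varepsilon_1) = E(\eta_1) = 0$, $E(\varepsilon_1^2) = E(\eta_1^2) = 1$, $E|\eta_1|^{4\alpha} < \infty$;
(ii) $\sum_{j=0}^{\infty} j|\varphi_j| < \infty$;
(iii) $\delta_j \sim j^{-d}\rho(j)$, where $1/2 < d < 1$ and $\rho(j)$ is a function slowly varying at $\infty$.
Put $d_n^2 = E(\sum_{k=1}^{n} x_k)^2$. We have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} f(y_{n,i-1})u_i \Rightarrow \int_0^{\cdot} f(B_s^{3/2-d})\,dW_s \tag{3.54}$$
in $D[0, 1]$, where $y_{n,k} = \frac{1}{d_n}\sum_{j=1}^{k} x_j$ and $W$ is the Brownian motion. Furthermore, if (iii) is replaced by
(iv) $\sum_{j=0}^{\infty} |\delta_j| < \infty$ and $\delta := \sum_{j=0}^{\infty} \delta_j \ne 0$,
we have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} f(y_{n,i-1})u_i \Rightarrow M\int_0^{\cdot} f'(G_s)\,ds + \int_0^{\cdot} f(B_s)\,dW_s$$
in $D[0, 1]$, where $M = \sum_{j=0}^{\infty}\big(\varphi_j(\sum_{s=0}^{j} \delta_s)\big)E[\varepsilon_1\eta_1]$.

Proof. It is well known that, with $c_d = \frac{1}{(1-d)(3-2d)}\int_0^{\infty} x^{-d}(x + 1)^{-d}\,dx$,
$$d_n^2 \sim \begin{cases} c_d\,n^{3-2d}\rho^2(n) & \text{under (iii)},\\ \delta^2 n & \text{under (iv)}. \end{cases} \tag{3.55}$$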
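The short-memory rate in (3.55) can be checked numerically, because $d_n^2$ is an explicit quadratic form in the coefficients when the innovations have unit variance. The sketch below assumes a finite (truncated) geometric coefficient sequence $\delta_j = 2^{-j}$, which satisfies (iv); the function name `d_n_squared` and the specific numbers are illustrative assumptions.

```python
import numpy as np

def d_n_squared(delta, n):
    """Exact Var(x_1 + ... + x_n) for x_k = sum_j delta[j] * eta_{k-j}
    with i.i.d. unit-variance eta and finitely many coefficients."""
    # Coefficient of each innovation eta_m in the partial sum x_1 + ... + x_n.
    weights = np.convolve(delta, np.ones(n))
    return float(np.sum(weights ** 2))

delta = 0.5 ** np.arange(60)                 # summable coefficients, sum is about 2
ratio = d_n_squared(delta, 5000) / 5000      # compare with (sum of delta)^2
target = float(delta.sum() ** 2)
```

With these choices `ratio` is close to `target`, in line with $d_n^2 \sim \delta^2 n$; the small gap comes from the boundary terms of the convolution and vanishes as $n$ grows.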
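Suppose that $\{X_k\}_{k\ge 1}$ is a tight sequence of random variables and $g(x)$ is a locally bounded function. It is obvious that
$$g\big(\max_{1\le k\le n} X_k\big) = O_P(1).$$
Note that
$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} f(y_{n,i-1})u_i
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]} f(y_{n,i-1})\sum_{j=0}^{\infty} \varphi_j\varepsilon_{i-j}\\
&= \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1+j}^{[nt]+j} f(y_{n,i-1})\varepsilon_{i-j} + \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{j} f(y_{n,i-1})\varepsilon_{i-j}\\
&\quad - \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=[nt]+1}^{[nt]+j} f(y_{n,i-1})\varepsilon_{i-j}\\
&=: \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1+j}^{[nt]+j} f(y_{n,i-1})\varepsilon_{i-j} + R([nt]).
\end{aligned}$$
We first prove that for any $\delta > 0$,
$$\limsup_{n\to\infty} P\Big\{\sup_{0\le t\le 1} |R([nt])| \ge \delta\Big\} = 0. \tag{3.56}$$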
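In fact, we have
$$\begin{aligned}
E\sup_{0\le t\le 1}\Big(\frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=[nt]+1}^{[nt]+j} f(y_{n,i-1})\varepsilon_{i-j}\Big)^2
&\le \frac{1}{n}\sum_{j=0}^{\infty} |\varphi_j|\sum_{j=0}^{\infty} j|\varphi_j|\Big[E\Big(\sup_{[nt]+1\le i\le [nt]+j} f^2(y_{n,i})\Big)\Big]^{1/2}\\
&\le \frac{C}{n}\sum_{j=0}^{\infty} |\varphi_j|\sum_{j=0}^{\infty} j|\varphi_j|\Big[E\Big(\sup_{[nt]+1\le i\le [nt]+j} (1 + |y_{n,i}|^{\alpha})^2\Big)\Big]^{1/2}
\end{aligned}$$
and
$$E[|y_{n,k}|^{2\alpha}] = \frac{1}{d_n^{2\alpha}}E\Big[\Big|\sum_{i=1}^{k} x_i\Big|^{2\alpha}\Big] = \frac{O\big((E[|\sum_{i=1}^{k} x_i|^2])^{\alpha}\big)}{d_n^{2\alpha}} = O(1)$$
by Wang, Lin and Gulati (2003), so we can easily obtain (3.56). Furthermore,
$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1+j}^{[nt]+j} f(y_{n,i-1})\varepsilon_{i-j}
&= \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f(y_{n,i+j-1})\varepsilon_i\\
&= \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} [f(y_{n,i+j-1}) - f(y_{n,i-1})]\varepsilon_i + \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f(y_{n,i-1})\varepsilon_i
\end{aligned}$$
and
$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} [f(y_{n,i+j-1}) - f(y_{n,i-1})]\varepsilon_i
&= \frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})(y_{n,i+j-1} - y_{n,i-1})\varepsilon_i\\
&\quad + \frac{1}{2\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f''(\xi_{n,i-1})(y_{n,i+j-1} - y_{n,i-1})^2\varepsilon_i,
\end{aligned}$$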
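where $\xi_{n,i-1}$ is a random variable between $y_{n,i-1}$ and $y_{n,i+j-1}$. We have
$$\begin{aligned}
\Big|\frac{1}{2\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f''(\xi_{n,i-1})(y_{n,i+j-1} - y_{n,i-1})^2\varepsilon_i\Big|
&= \Big|\frac{1}{2\sqrt{n}\,d_n^2}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f''(\xi_{n,i-1})\Big(\sum_{k=i}^{i+j-1} x_k\Big)^2\varepsilon_i\Big|\\
&\le C\frac{n}{\sqrt{n}\,d_n^2}\max_{1\le h\le n} |\varepsilon_h|\sup_{1\le h<\infty} x_h^2\sum_{j=0}^{\infty} j|\varphi_j| \xrightarrow{P} 0.
\end{aligned}$$
Note that
$$\frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})(y_{n,i+j-1} - y_{n,i-1})\varepsilon_i = \frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})\Big(\sum_{k=i}^{i+j-1}\sum_{s=0}^{\infty} \delta_s\eta_{k-s}\Big)\varepsilon_i,$$
$$\sum_{k=i}^{i+j-1}\sum_{s=0}^{\infty} \delta_s\eta_{k-s} = \Big(\sum_{s=0}^{j-1} \delta_s\Big)\eta_i + \sum_{k=i+1}^{i+j-1}\Big(\sum_{s=0}^{j+i-k} \delta_s\Big)\eta_k + \sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}$$
and
$$\begin{aligned}
E\Big|\sum_{i=1}^{[nt]} f'(y_{n,i-1})\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)\varepsilon_i\Big|^2
&\le C\sum_{i=1}^{[nt]} E\Big[(f'(y_{n,i-1}))^2\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)^2\Big]\\
&\le C\sum_{i=1}^{[nt]} \big(E[(f'(y_{n,i-1}))^4]\big)^{1/2}\Big(E\Big[\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)^4\Big]\Big)^{1/2},
\end{aligned}$$
and furthermore,
$$E[(f'(y_{n,k}))^4] \le C(1 + E[|y_{n,k}|^{4\alpha}]),$$
$$E[|y_{n,k}|^{4\alpha}] = \frac{1}{d_n^{4\alpha}}E\Big[\Big|\sum_{i=1}^{k} x_i\Big|^{4\alpha}\Big] = \frac{O\big((E[|\sum_{i=1}^{k} x_i|^2])^{2\alpha}\big)}{d_n^{4\alpha}} = O(1)$$
by Wang, Lin and Gulati (2003), and
$$E\Big[\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)^4\Big] \le C\Big(j\Big(\sum_{h=0}^{\infty} \delta_h^4\Big)\Big).$$
Then
$$E\Big|\sum_{i=1}^{n} f'(y_{n,i-1})\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)\varepsilon_i\Big|^2 \le C\sqrt{j}\,n. \tag{3.57}$$
By the Kolmogorov inequality for martingales,
$$\begin{aligned}
E\sup_{0\le t\le 1}\Big(\frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)\varepsilon_i\Big)^2
&\le \sum_{j=0}^{\infty} |\varphi_j|\sum_{j=1}^{\infty} |\varphi_j|\frac{1}{nd_n^2}E\Big|\sum_{i=1}^{n} f'(y_{n,i-1})\Big(\sum_{k=1}^{\infty}\Big(\sum_{s=k}^{j+k} \delta_s\Big)\eta_{i-k}\Big)\varepsilon_i\Big|^2\\
&\le \sum_{j=0}^{\infty} |\varphi_j|\sum_{j=1}^{\infty} j|\varphi_j|\frac{1}{nd_n^2} \to 0.
\end{aligned}$$
By a similar method, we can also obtain that
$$E\sup_{0\le t\le 1}\Big(\frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})\Big(\sum_{k=i+1}^{i+j-1}\Big(\sum_{s=0}^{j+i-k} \delta_s\Big)\eta_k\Big)\varepsilon_i\Big)^2 \to 0, \tag{3.58}$$
$$E\sup_{0\le t\le 1}\Big(\frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})\Big(\Big(\sum_{s=0}^{j} \delta_s\Big)\eta_i - E\Big(\sum_{s=0}^{j} \delta_s\Big)\eta_i\Big)\varepsilon_i\Big)^2 \to 0.$$
Then, we obtain
$$\sup_{0\le t\le 1}\Big|\frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})(y_{n,i+j} - y_{n,i-1})\varepsilon_i - \frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})E\Big(\Big(\sum_{s=0}^{j} \delta_s\Big)\eta_i\varepsilon_i\Big)\Big| \xrightarrow{P} 0.$$
Thus we just need to discuss the weak convergence of
$$\frac{1}{\sqrt{n}}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f(y_{n,i-1})\varepsilon_i + \frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})E\Big(\Big(\sum_{s=0}^{j} \delta_s\Big)\eta_i\varepsilon_i\Big).$$
Obviously, we have
$$\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} \varepsilon_i,\ \frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} \eta_i\Big) \Rightarrow (B, W), \tag{3.59}$$
where $(B, W)$ is a 2-dimensional Brownian motion with $\mathrm{Cov}(B_t, W_t) = tE(\varepsilon_0\eta_0)$.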
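By the continuous mapping theorem and Theorem 2 in Sowell (1990),
$$\frac{1}{d_n}\sum_{i=1}^{[n\cdot]} x_i \Rightarrow G,$$
$$\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} u_i,\ f\Big(\frac{1}{d_n}\sum_{i=1}^{[n\cdot]} x_i\Big)\Big) \Rightarrow (W, f(G)), \tag{3.60}$$
$$\frac{1}{n}\sum_{i=2}^{[n\cdot]} f'(y_{n,i}) \Rightarrow \int_0^{\cdot} f'(G_s)\,ds, \tag{3.61}$$
where
$$G_t = \begin{cases} B_t^{3/2-d} & \text{under (iii)},\\ W_t & \text{under (iv)}. \end{cases}$$
By (iii) and (3.61), we have
$$\sup_{0\le t\le 1}\Big|\frac{1}{\sqrt{n}\,d_n}\sum_{j=0}^{\infty} \varphi_j\sum_{i=1}^{[nt]} f'(y_{n,i-1})E\Big(\Big(\sum_{s=0}^{j} \delta_s\Big)\eta_i\varepsilon_i\Big)\Big| \xrightarrow{P} 0.$$
(3.60) implies the C-tightness and finite dimensional convergence of $\big(\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} u_i, f(\frac{1}{d_n}\sum_{i=1}^{[n\cdot]} x_i)\big)$, since $W$ and $f(G)$ are continuous path processes. By Theorem 3.6, we obtain (3.54). Similarly, we can obtain
$$\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^{[n\cdot]} u_i,\ f\Big(\frac{1}{d_n}\sum_{i=1}^{[n\cdot]} x_i\Big),\ f'\Big(\frac{1}{d_n}\sum_{i=1}^{[n\cdot]} x_i\Big)\Big) \Rightarrow (W, f(G), f'(G)).$$
By (iv) and (3.61), with $d_n^2 = \delta^2 n$, we can obtain
$$\frac{1}{d_n^2}\sum_{i=2}^{[n\cdot]} f'(y_{n,i})E\Big(\Big(\sum_{s=0}^{j} \delta_s\Big)\eta_i\varepsilon_i\Big) \Rightarrow K\int_0^{\cdot} f'(G_s)\,ds.$$
By Theorem 3.6, we complete the proof.

3.5 Stable central limit theorem for semimartingales

In the study of limit theorems for semimartingales, we usually need to deal with mixed normal limits. More precisely, we have that
$$Y_n \xrightarrow{d} VN,$$
where $N \sim N(0, 1)$ and $V$ is a positive random variable independent of $N$. Usually, the distribution of $V$ is unknown and thus the weak convergence of $Y_n$ cannot be used for statistical purposes, since confidence intervals are unavailable.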
Furthermore, if $V$ is the weak limit of a sequence $\{V_n\}$, the weak convergence
$$Y_n \xrightarrow{d} VN$$
does not imply
$$(Y_n, V_n) \Rightarrow (VN, V). \tag{3.62}$$
For this reason we need a stronger mode of convergence of $Y_n$, which would imply the joint weak convergence in (3.62) for the random variable $V$. Stable convergence in law is exactly the right type of convergence to guarantee this property.

Firstly, we introduce the conditional Gaussian martingale and the martingale biased conditional Gaussian martingale, which are the usual limiting processes in stable convergence. We start with a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$. The extension of $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$ is another filtered probability space $(\tilde{\Omega}, \tilde{\mathcal{F}}, (\tilde{\mathcal{F}}_t)_{t\in[0,1]}, \tilde{P})$, which is constructed as follows:
$$\tilde{\Omega} = \Omega \times \Omega', \quad \tilde{\mathcal{F}} = \mathcal{F} \otimes \mathcal{F}', \quad \tilde{\mathcal{F}}_t = \bigcap_{s>t} \mathcal{F}_s \otimes \mathcal{F}'_s, \quad \tilde{P}(d\omega, d\omega') = P(d\omega)Q_{\omega}(d\omega'), \tag{3.63}$$
where $(\Omega', \mathcal{F}', (\mathcal{F}'_t)_{t\in[0,1]})$ is an auxiliary space and $Q_{\omega}(d\omega')$ is a transition probability from $(\Omega, \mathcal{F})$ into $(\Omega', \mathcal{F}')$. Let $\mathcal{M}_b$ be the set of all bounded martingales on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$.

Definition 3.3. A continuous process $X$ on the extension is called an $\mathcal{F}$-conditional Gaussian martingale if $X$ is a local martingale on the extension, orthogonal to all elements of $\mathcal{M}_b$, and $\langle X, X\rangle$ is $(\mathcal{F}_t)$-adapted.

Let $M$ be a continuous local martingale, and let $\mathcal{M}_b(M^{\perp})$ be the class of elements of $\mathcal{M}_b$ which are orthogonal to $M$.

Definition 3.4. A continuous process $X$ on the extension is called an $M$-biased $\mathcal{F}$-conditional Gaussian martingale if it can be written as
$$X_t = X'_t + \int_0^t u_s\,dM_s,$$
where $X'$ and $u$ are adapted continuous processes.

Now, we recall some facts about stable convergence. Let $\{X_n\}$ be a sequence of random elements defined on $(\Omega, \mathcal{F}, P)$, taking values in a Polish space $E$, and let $X$ be an $E$-valued random element on the extension space.

Definition 3.5. We say that $\{X_n\}$ stably converges in law to $X$, and write $X_n \xrightarrow{s\text{-}L} X$, if
$$E(Yf(X_n)) \to \tilde{E}(Yf(X))$$
for all bounded continuous functions $f : E \to \mathbb{R}$ and any bounded variable $Y$ on $(\Omega, \mathcal{F})$.
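The role of the random scale $V$ in a mixed normal limit can be seen in a small simulation. The sketch below is a toy example under stated assumptions, not a construction from the text: $V$ takes the values 1 and 3 with equal probability, the summands are independent Rademacher variables, and $Y_n = V n^{-1/2}\sum_{i\le n} \xi_i$. Because $V$ lives on the original space, $Y_n/V$ is again approximately standard normal, which is the kind of joint statement that stable convergence is designed to deliver. The function names are hypothetical.

```python
import numpy as np

def mixed_normal_demo(n, reps, rng):
    """Draw reps copies of (Y_n, V) with Y_n = V * n^{-1/2} * (xi_1 + ... + xi_n)."""
    V = rng.choice([1.0, 3.0], size=reps)            # random scale on the original space
    xi = rng.choice([-1.0, 1.0], size=(reps, n))     # centred unit-variance steps, independent of V
    Y = V * xi.sum(axis=1) / np.sqrt(n)
    return Y, V

def kurtosis(a):
    """Fourth standardized moment (3 for a normal law)."""
    a = a - a.mean()
    return float(np.mean(a ** 4) / np.mean(a ** 2) ** 2)

rng = np.random.default_rng(2)
Y, V = mixed_normal_demo(100, 20_000, rng)
k_mixed = kurtosis(Y)          # clearly above 3: Y_n is not approximately normal
k_student = kurtosis(Y / V)    # close to 3: the studentized quantity is approximately normal
```

In this toy example the kurtosis of $VN$ equals $3E(V^4)/(E(V^2))^2 = 123/25$, which is why `k_mixed` sits well above 3.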
Jacod (1997) gave the following theorem, which is the basic result in stable convergence.

Theorem 3.8. Let $\{S^n\}$ be a sequence of continuous semimartingales on the stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$ with predictable characteristics $\{(B^n, C^n, 0)\}$, where $\mathcal{F}$ is separable. Assume that there are two continuous adapted processes $C$ and $D$, and a continuous bounded variation function $B$ on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$ such that
$$\sup_{t\in[0,1]} |B^n_t - B_t| \xrightarrow{P} 0, \tag{3.64}$$
$$C^n_t - C_t \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1], \tag{3.65}$$
$$\langle M^n, M\rangle_t - D_t \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1], \tag{3.66}$$
$$\langle S^n, N\rangle_t \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1] \text{ and any } N \in \mathcal{M}_b(M^{\perp}), \tag{3.67}$$
where $M^n$ is the local martingale part of $S^n$, and $M$ is a given martingale. Then, there is an extension of $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$ and an $M$-biased continuous $\mathcal{F}$-conditional Gaussian martingale $S$ on this extension with
$$\langle S, S\rangle = C, \qquad \langle S, M\rangle = D,$$
such that
$$S^n \xrightarrow{s\text{-}L} S + B.$$

Proof. It is enough to prove
$$M^n \xrightarrow{s\text{-}L} S \tag{3.68}$$
by (3.64). There is a sequence of bounded variables $\{Y_m\}_{m\ge 1}$ which is dense in $L^1(\Omega, \mathcal{F}, P)$ since $\mathcal{F}$ is separable. Set $N^m_t = E[Y_m|\mathcal{F}_t]$, so $N^m \in \mathcal{M}_b$, and we have two important properties (cf. (4.15) in Jacod (1979)):
(i) Every bounded martingale is the limit in $L^2$, uniformly in time, of a sequence of sums of stochastic integrals with respect to some of the $N^m$'s.
(ii) $(\mathcal{F}_t)_{t\in[0,1]}$ is the smallest filtration such that all the $N^m$'s are adapted.

By (3.64) and (3.65), we obtain that $\{M^n\}$ is tight. Now, we choose any subsequence, indexed by $n'$, such that $\{(M^{n'}, M, N)\}$ converges in law. We can realize the limiting process as follows: consider the canonical space $(\Omega', \mathcal{F}', (\mathcal{F}'_t)_{t\in[0,1]}) = (C[0, 1], \mathcal{C}[0, 1], \mathcal{C}_t[0, 1])$ with the canonical process $S$, and define the extension as in (3.63). Furthermore, there is a probability measure $\tilde{P}$ on the extension, whose $\Omega$-marginal is $P$, such that $\{(M^{n'}, M, N)\}$ converges in law to $(S, M, N)$ under $\tilde{P}$. We obtain that $S$ is an $M$-biased continuous conditional Gaussian martingale by Lemma 3.2 and (3.65)-(3.67). Furthermore, the law of $S$ is determined by the processes $M$, $C$, $D$, and it does not depend on the subsequence $\{n'\}$ above. Note that, in the course of the proof, we need to prove that $SN$ is a local martingale on the extension for any $N \in \mathcal{M}_b(M^{\perp})$, which is implied by (3.67). Hence
$$E(f(M^n)N^m_1) \to \tilde{E}(f(S)N^m_1)$$
for every bounded continuous function $f$. Furthermore,
$$E(f(M^n)Y_m) \to \tilde{E}(f(S)Y_m)$$
since $\tilde{E}(UN^m_1) = \tilde{E}(UY_m)$ for any bounded random variable $U$. Finally, any bounded random variable $Y$ is the $L^1$-limit of some subsequence of $\{Y_m\}$, hence
$$E(f(M^n)Y) \to \tilde{E}(f(S)Y),$$
which means (3.68).
Assume that $\{\Delta_n\}$ is a sequence of constants satisfying $\Delta_n \to 0$ as $n \to \infty$, and that $\{X_{n,i}\}_{i\ge 1}$ is a triangular array of square integrable random variables on the filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$, where $\mathcal{F}$ is separable and the $X_{n,i}$'s are $\mathcal{F}_{i\Delta_n}$-measurable. Theorem 3.8 easily implies the following theorem. In the study of high frequency statistics for a stochastic process $X$, $X_{n,i}$ is usually equal to $X_{i\Delta_n} - X_{(i-1)\Delta_n}$.

Theorem 3.9. Assume that there are two absolutely continuous adapted processes $u$ and $v$, and a continuous bounded variation function $B$ on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$ such that
$$\sup_{t\in[0,1]}\Big|\sum_{i=1}^{[nt]} E[X_{n,i}|\mathcal{F}_{(i-1)\Delta_n}] - B_t\Big| \xrightarrow{P} 0, \tag{3.69}$$
$$\sum_{i=1}^{[nt]} \big(E[X^2_{n,i}|\mathcal{F}_{(i-1)\Delta_n}] - E^2[X_{n,i}|\mathcal{F}_{(i-1)\Delta_n}]\big) - F_t \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1], \tag{3.70}$$
$$\sum_{i=1}^{[nt]} E[X_{n,i}(W_{i\Delta_n} - W_{(i-1)\Delta_n})|\mathcal{F}_{(i-1)\Delta_n}] - G_t \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1], \tag{3.71}$$
$$\sum_{i=1}^{[nt]} E[X_{n,i}(N_{i\Delta_n} - N_{(i-1)\Delta_n})|\mathcal{F}_{(i-1)\Delta_n}] \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1] \text{ and any } N \in \mathcal{M}_b(W^{\perp}), \tag{3.72}$$
$$\sum_{i=1}^{[nt]} E[X^2_{n,i}1_{\{|X_{n,i}|>\varepsilon\}}(N_{i\Delta_n} - N_{(i-1)\Delta_n})|\mathcal{F}_{(i-1)\Delta_n}] \xrightarrow{P} 0 \quad \text{for any } t \in [0, 1] \text{ and any } \varepsilon > 0, \tag{3.73}$$
where $W$ is a standard Brownian motion,
$$F_t = \int_0^t (v_s^2 + u_s^2)\,ds, \qquad G_t = \int_0^t v_s\,ds.$$
Then, there is an extension of $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,1]}, P)$ and a Brownian motion $W'$ on this extension, independent of $\mathcal{F}$, such that
$$\sum_{i=1}^{[\cdot/\Delta_n]} X_{n,i} \xrightarrow{s\text{-}L} B + \int_0^{\cdot} v_s\,dW_s + \int_0^{\cdot} u_s\,dW'_s.$$
This theorem is more suitable than Theorem 3.8 for the study of high frequency statistics.
3.6 An application to stochastic differential equations

Let us consider the following stochastic differential equation (SDE):
$$X_t = x_0 + \int_0^t f(X_{s-})\,dY_s, \tag{3.74}$$
where $f$ denotes a continuous function and $Y$ is an $\mathbb{R}$-valued semimartingale on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0\le t\le 1}, P)$.

Numerical methods for SDEs are an important topic. A numerical method provides a numerical solution of the SDE, say $\breve{X}$. The law of $\breve{X}$ may be easy to obtain. We can then obtain the law of the solution of the SDE through the known law of $\breve{X}$ and the weak convergence of $\breve{X} - X$.

3.6.1 Euler method for stochastic differential equations

In order to study SDE (3.74), we consider the Euler continuous approximation $X^n$ to $X$ given by
$$dX^n_t = f(X^n_{\varphi_n(t)})\,dY_t, \qquad X^n_0 = x_0, \tag{3.75}$$
where $\varphi_n(t) = [nt]/n$ if $nt \notin \mathbb{N}$ and $\varphi_n(t) = t - 1/n$ if $nt \in \mathbb{N}$, and the Euler discontinuous approximation $\bar{X}^n$ given by
$$\bar{X}^n_t = X^n_{[nt]/n}. \tag{3.76}$$
The corresponding error processes are denoted by
$$U^n_t = X^n_t - X_t, \qquad \bar{U}^n_t = \bar{X}^n_t - X_{[nt]/n} = U^n_{[nt]/n}. \tag{3.77}$$
The aim of this section is to find the asymptotic distribution of $U^n$ and $\bar{U}^n$. Weak convergence of stochastic integrals (Theorem 3.6) will play an important role in this section. Note that
$$U^n_t = \int_0^t \big(f(X^n_{\varphi_n(s)}) - f(X_{\varphi_n(s)})\big)\,dY_s - \int_0^t \big(f(X_{s-}) - f(X_{\varphi_n(s)})\big)\,dY_s, \tag{3.78}$$
and set
$$W^n_t = \int_0^t \big(f(X_{s-}) - f(X_{\varphi_n(s)})\big)\,dY_s.$$
We obtain
$$\begin{aligned}
U^n_t &= \int_0^t \big(f(X_{\varphi_n(s)} + U^n_{\varphi_n(s)}) - f(X_{\varphi_n(s)})\big)\,dY_s - W^n_t\\
&= \int_0^t f'(X_{\varphi_n(s)})U^n_{s-}\,dY_s - W^n_t + o_P(1).
\end{aligned} \tag{3.79}$$
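The error process $U^n$ can be observed numerically in a case where the exact solution is available. The sketch below assumes, purely for illustration, that $Y = W$ is a standard Brownian motion, $f(x) = x$ and $x_0 = 1$, so that (3.74) has the explicit Itô solution $X_t = x_0\exp(W_t - t/2)$; the Euler scheme (3.75) at the grid points is then a product of factors $1 + \Delta W_i$. The function name `euler_terminal_error` is hypothetical.

```python
import numpy as np

def euler_terminal_error(n, paths, rng, x0=1.0):
    """Samples of U^n_1 = X^n_1 - X_1 for dX = X dW, computed on common Brownian paths."""
    dW = rng.normal(0.0, np.sqrt(1.0 / n), size=(paths, n))   # increments over steps of length 1/n
    euler = x0 * np.prod(1.0 + dW, axis=1)                    # Euler scheme at time 1
    exact = x0 * np.exp(dW.sum(axis=1) - 0.5)                 # exact solution x0 * exp(W_1 - 1/2)
    return euler - exact

def rms(a):
    return float(np.sqrt(np.mean(a ** 2)))

rng = np.random.default_rng(3)
coarse = rms(euler_terminal_error(25, 4000, rng))
fine = rms(euler_terminal_error(400, 4000, rng))
```

In this Brownian example the root mean square error shrinks roughly like $n^{-1/2}$ as the grid is refined, which suggests looking at the normalized error $\sqrt{n}\,U^n$; for a general driving semimartingale $Y$ the appropriate normalization may differ.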
To study the weak convergence of $U^n$, we consider more generally the convergence of a sequence of SDEs of the form
$$K^n_t = J^n_t + \int_0^t K^n_{s-}H^n_s\,dY_s, \tag{3.80}$$
where $K^n$, $J^n$, $H^n$ are stochastic processes. The following theorem provides a basis for the results in this section.

Theorem 3.10. (Jacod and Protter (1998))
(a) Tightness of both sequences $J^{n*}$ and $H^{n*}$ implies tightness of the sequence $K^{n*}$.
(b) Suppose that we have another equation (3.80) with solution $K'^n$ and coefficients $J'^n$ and $H'^n$. If the sequences $J^{n*}$ and $H^{n*}$ are tight and if $(J^n - J'^n)^* \xrightarrow{P} 0$ and $(H^n - H'^n)^* \xrightarrow{P} 0$, then $(K^n - K'^n)^* \xrightarrow{P} 0$.
(c) Let $V^n_t = \int_0^t H^n_s\,dY_s$. Suppose that the sequence $H^{n*}$ is tight and that the sequence $(J^n, V^n)$ stably converges to a limit $(J, V)$ defined on some extension of the space. Then $V$ is a semimartingale, and
$$(J^n, V^n, K^n) \xrightarrow{s\text{-}L} (J, V, K),$$
where $K$ is the unique solution of
$$K_t = J_t + \int_0^t K_{s-}\,dV_s. \tag{3.81}$$
Proof. Let $Z$ be a càdlàg process on $[0, 1]$ and $T$ a stopping time with respect to $(\mathcal{F}_t)_{0\le t\le 1}$; define the process $Z^{T-}$ by $Z^{T-}_t = Z_t 1_{[0,T)}(t) + Z_{T-} 1_{[T,1]}(t)$. For
$$K_t = J_t + \int_0^t K_{s-} H_s\,dY_s, \tag{3.82}$$
let $K'$ be the solution of another equation (3.82) associated with $J'$ and $H'$, and with the same semimartingale $Y$. By the slicing technique of Doléans-Dade (see Theorem 5 in Chapter V of Protter (2005)), for any semimartingale $Y$ and any $\varepsilon > 0$ there is a stopping time $T$ such that $P(T < 1) \le \varepsilon$ and such that the semimartingale $\bar Y := Y^{T-}$ is sliceable, i.e.,
$$E\Big(\sup_{t\in[0,T]} \Big|\int_0^t H_s\,d\bar Y_s\Big|\Big) \le C_Y E(H^*),$$
where the constant $C_Y$ depends only on $Y$. Finally, if $Y$ is sliceable and if we consider (3.82) with $|H| \le A$ a.s. for some constant $A$, then $E(K^*) \le C_{A,Y} E(J^*)$ for a constant $C_{A,Y}$ depending on $A$ and $Y$. Fix positive $A$, $\varepsilon$, $u$, $\upsilon$ and $\omega$, and set
$$S = \inf(t : |H_t| > A \text{ or } |J_t| > u \text{ or } |H_t - H'_t| > \upsilon \text{ or } |J_t - J'_t| > \omega) \wedge T,$$
and $\bar J = J^{S-}$, $\bar J' = J'^{S-}$; the $i$th component of $\bar H$ is $\bar H^i = -A \vee H^i \wedge A$, and similarly for $\bar H'$. We consider the solutions $\bar K$ and $\bar K'$ of (3.82) associated with $(\bar J, \bar H, \bar Y)$ and $(\bar J', \bar H', \bar Y)$ respectively. Note that
$$\bar K = K, \quad \bar K' = K' \quad \text{on the set } \{S > 1\}. \tag{3.83}$$
Note also that $\hat K = \bar K - \bar K'$ is the solution of (3.82) associated with $(\hat J, \bar H, \bar Y)$, where
$$\hat J_t = \bar J_t - \bar J'_t + \int_0^t (\bar H_s - \bar H'_s)\,\bar K'_{s-}\,d\bar Y_s.$$
Using the properties of sliceable semimartingales, if $\upsilon \le A$ and $\omega \le u$,
$$E(\bar K'^{*}) \le q E(\bar J'^{*}), \quad E(\hat J^{*}) \le \omega + q\upsilon E(\bar K'^{*}), \quad E(\hat K^{*}) \le (\upsilon + \omega) q \le (A + u) q,$$
where $q$ depends on $A$, $\varepsilon$ and $Y$. Moreover,
$$P(S \le 1) \le \varepsilon + P(H^* > A) + P(J^* > u) + P((H - H')^* > \upsilon) + P((J - J')^* > \omega).$$
Hence
$$P((K - K')^* > \eta) \le \varepsilon + P(H^* > A) + P(J^* > u) + P((H - H')^* > \upsilon) + P((J - J')^* > \omega) + \frac{u\upsilon + \omega}{\eta}\,q,$$
which, when $J' = 0$ and $H' = 0$, yields
$$P(K^{n*} > \eta) \le \varepsilon + 2P(H^{n*} \ge A) + 2P(J^{n*} \ge u) + \frac{u}{\eta}\,C_{\varepsilon,A,Y}, \tag{3.84}$$
where $C_{\varepsilon,A,Y}$ is a constant depending on $\varepsilon$, $A$ and $Y$. Then we obtain that $P(K^{n*} > \eta)$ is smaller than $C\varepsilon$, hence the sequence $\{K^{n*}\}$ is tight. Thus (a) and (b) are proved. The assumptions ensure that the sequence $\{V^n\}$ has $(*)$. Since stable convergence is just weak convergence of $(U, J^n, V^n)$ to $(U, J, V)$ for any random variable $U$ on the original probability space, the proof is complete.
For any process $X$ we write
$$\Delta_i^n X = X_{i/n} - X_{(i-1)/n}, \qquad X_t^{(n)} = X_t - X_{[nt]/n}. \tag{3.85}$$
For any two semimartingales $M$ and $N$, we write
$$Z_t^n(M, N) = \int_0^t M^{(n)}_{s-}\,dN_s. \tag{3.86}$$
The fundamental result on the error distribution of the Euler method is presented now.

Theorem 3.11. (Jacod and Protter (1998)) Let $Z^n = Z^n(Y, Y)$, where $Y$ is an $\mathbb{R}$-valued semimartingale on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0\le t\le 1}, P)$, and let $\alpha_n$ be a sequence of positive numbers. The following statements are equivalent:
(a) there exists $Z$ on an extension of the space on which $Y$ is defined, and the sequence $\alpha_n Z^n$ has $(*)$ and $(Y, \alpha_n Z^n) \Rightarrow (Y, Z)$;
(b) for any $x_0$ and any differentiable function $f$ with linear growth, there exists $U$ on an extension of the space on which $Y$ is defined, and the sequence $\alpha_n U^n$ has $(*)$ and $(Y, \alpha_n U^n) \Rightarrow (Y, U)$.
Under (a) or (b), we can realize the limits $Z$ and $U$ above on the same extension space, and
$$dU_t = f'(X_{t-})\,[U_{t-}\,dY_t - f(X_{t-})\,dZ_t], \qquad U_0 = 0, \tag{3.87}$$
and $(Y, \alpha_n Z^n, \alpha_n U^n) \Rightarrow (Y, Z, U)$.

This theorem can be easily obtained from Theorem 3.10. The details can be found in Jacod and Protter (1998). Applying Theorem 3.11 to a continuous local martingale, we have the following theorem.

Theorem 3.12. Let $Y$ be a continuous local martingale such that there exists a continuous adapted process $c$ with
$$C_t = \langle Y, Y\rangle_t = \int_0^t c_s\,ds$$
and assume
$$\int_0^1 c_s^2\,ds < \infty. \tag{3.88}$$
Then the sequence $\sqrt{n} Z^n$, defined in Theorem 3.11, stably converges in law to a process $Z$ given by
$$Z_t = \frac{1}{\sqrt 2}\int_0^t \sigma_s^2\,dW_s,$$
where $\sigma^2 = c$, $W$ is a standard Brownian motion defined on an extension of the space on which $Y$ is defined and independent of $Y$. Moreover, we also have
$$(Y, \sqrt{n} Z^n) \Rightarrow (Y, Z) \quad\text{and}\quad (Y, \sqrt{n} U^n) \Rightarrow (Y, U).$$
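As a rough numerical illustration of Theorem 3.12, consider the simplest case $Y = W$ (so $c \equiv 1$), where the limit $Z_1$ should be centered normal with variance $\frac12\int_0^1 c_s^2\,ds = \frac12$. The sketch below is only an assumption-laden check, not part of the original argument: it uses the fact that on each cell $((i-1)/n, i/n]$ Itô's formula gives $\int (W_s - W_{(i-1)/n})\,dW_s = ((\Delta_i^n W)^2 - 1/n)/2$, and the function name `scaled_euler_error` is hypothetical.

```python
import numpy as np

def scaled_euler_error(n, paths, rng):
    """Samples of sqrt(n) * Z^n_1 for Y = W, with Z^n_1 = int_0^1 (W_s - W_{[ns]/n}) dW_s."""
    # Brownian increments over the n cells of the equidistant grid.
    dW = rng.normal(0.0, np.sqrt(1.0 / n), size=(paths, n))
    # Closed form of the stochastic integral on each cell (Ito's formula).
    z_n = 0.5 * np.sum(dW**2 - 1.0 / n, axis=1)
    return np.sqrt(n) * z_n

rng = np.random.default_rng(1)
samples = scaled_euler_error(n=200, paths=20000, rng=rng)
sample_mean = samples.mean()
sample_var = samples.var()
print(sample_mean, sample_var)  # the variance should be close to 1/2
```

The sample variance staying near $1/2$ as $n$ grows is consistent with the $\sqrt n$ normalization in the theorem.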
Proof. Up to enlarging the space, we can assume that there is a Wiener process $W$ such that
$$Y_t = \int_0^t \sigma_s\,dW_s.$$
By Theorem 3.8, if we prove that for all $t \in (0, 1]$,
$$D_t^n \xrightarrow{P} D_t, \qquad \langle \sqrt{n} Z^n, W\rangle_t \xrightarrow{P} 0, \tag{3.89}$$
where
$$D_t^n = n\langle Z^n, Z^n\rangle_t, \qquad D_t = \frac12\int_0^t c_s^2\,ds,$$
then the processes $\sqrt{n} Z^n$ converge stably in law to the process $Z$, which completes the proof. First, we assume there exists $m \in \mathbb{N}$ such that $\sigma$ has the form
$$\sigma_s = \sum_{i=1}^m A_{i-1} 1_{(t_{i-1}, t_i]}(s), \tag{3.90}$$
where $0 = t_0 < t_1 < \cdots < t_m = 1$ and where $A_i$ is a bounded $\mathcal{F}_{t_i}$-measurable random variable. Set $\tau_n(u) = u - [nu]/n$. By the Burkholder-Gundy inequality we have, for some constant $K$,
$$E\big((Y_t^{(n)})^4\big) \le K/n^2.$$
Recall that $Y_t^{(n)} = Y_t - Y_{[nt]/n}$. Since
$$Y_u^{(n)} = A_r W_u^{(n)} \quad \text{for } t_r \le [nu]/n \le u \le t_{r+1}, \tag{3.91}$$
with $B_r = A_r^2$, and for $t_r \le [nu]/n \le u \le v \le t_{r+1}$,
$$E(Y_u^{(n)} Y_v^{(n)} \mid \mathcal{F}_{t_r}) = B_r \tau_n(u) 1_{\{[nu]=[nv]\}},$$
$$E((Y_u^{(n)})^2 (Y_v^{(n)})^2 \mid \mathcal{F}_{t_r}) = \tau_n(u)\tau_n(v)(B_r)^2 + 2\tau_n(u)^2 (B_r)^2 1_{\{[nu]=[nv]\}}.$$
Fix $r$ and $t$ such that $0 < t \le t_{r+1} - t_r$. We have
$$D^n_{t_r+t} - D^n_{t_r} = n B_r \int_{t_r}^{t_r+t} (Y_u^{(n)})^2\,du,$$
$$D_{t_r+t} - D_{t_r} = \frac12 B_r^2 t,$$
$$\langle \sqrt{n} Z^n, W\rangle_{t_r+t} - \langle \sqrt{n} Z^n, W\rangle_{t_r} = \sqrt{n}\,B_r \int_{t_r}^{t_r+t} Y_u^{(n)}\,du.$$
Set $s(n) = ([nt_r] + 1)/n$; then as $n \to \infty$,
$$E\Big[n\int_{t_r}^{s(n)} (Y_u^{(n)})^2\,du\Big] \to 0 \quad\text{and}\quad E\Big[\sqrt{n}\int_{s(n)}^{t_r+t} Y_u^{(n)}\,du\Big]^2 \to 0$$
by (3.91). So it remains to prove that $E\alpha_n^2 \to 0$, where
$$\alpha_n = n\int_{s(n)}^{t_r+t} (Y_u^{(n)})^2\,du - \frac{t}{2} B_r.$$
We have
$$E\alpha_n^2 = n^2 \int_{[s(n), t_r+t]^2} E\big[(Y_u^{(n)})^2 (Y_v^{(n)})^2\big]\,du\,dv + \frac{t^2}{4} E B_r^2 - nt\int_{s(n)}^{t_r+t} E\big(B_r (Y_u^{(n)})^2\big)\,du.$$
On the one hand,
$$nt\int_{s(n)}^{t_r+t} E\big(B_r (Y_u^{(n)})^2\big)\,du \to \frac{t^2}{2} E B_r^2.$$
On the other hand,
$$\Big|E\big[(Y_u^{(n)})^2 (Y_v^{(n)})^2\big] - \tau_n(u)\tau_n(v) E B_r^2\Big| \le \frac{K}{n^2} 1_{\{[nu]=[nv]\}},$$
and thus
$$n^2 \int_{[s(n), t_r+t]^2} E\big[(Y_u^{(n)})^2 (Y_v^{(n)})^2\big]\,du\,dv \to \frac{t^2}{4} E B_r^2.$$
Thus $E\alpha_n^2 \to 0$, and we have (3.89) for the case of (3.90).

It remains to prove that (3.88) implies (3.89) in the general case. Let $T_p = \inf(t : \int_0^t c_s^2\,ds \ge p)$. Since (3.88) yields $P(T_p < 1) \to 0$ as $p \to \infty$, by localization it is enough to prove the result for the processes stopped at time $T_p$. From now on, we assume that
$$\int_0^1 c_s^2\,ds \le p.$$
There exists a sequence $\sigma(r)$ of processes of the form (3.90), such that
$$\eta_r := \int_0^1 |\sigma_s - \sigma(r)_s|^4\,ds \to 0, \qquad \int_0^1 |\sigma(r)_s|^4\,ds \le \int_0^1 \sigma_s^4\,ds \le p.$$
Let $Y(r)_t = \int_0^t \sigma(r)_s\,dW_s$ with the associated processes $Z(r)^n$, $D(r)^n$ and $D(r)$. By the first step, for each $r$ and all $t$,
$$D(r)^n_t \xrightarrow{P} D(r)_t, \qquad \langle \sqrt{n} Z(r)^n, W\rangle_t \xrightarrow{P} 0.$$
We have, with $c(r) = \sigma(r)^2$,
$$|D(r)^n_t - D^n_t| = n\Big|\int_0^t \big((Y(r)^{(n)}_s)^2 c(r)_s - (Y^{(n)}_s)^2 c_s\big)\,ds\Big|$$
$$\le n\int_0^t (Y(r)^{(n)}_s)^2\,|\sigma(r)_s - \sigma_s|\,(|\sigma(r)_s| + |\sigma_s|)\,ds + n\int_0^t |Y(r)^{(n)}_s - Y^{(n)}_s|\,(|Y(r)^{(n)}_s| + |Y^{(n)}_s|)\,\sigma_s^2\,ds.$$
By the Burkholder-Gundy inequality and the Cauchy-Schwarz inequality,
$$E(|Y_s^{(n)}|^4) \le \frac{K}{n} E\Big(\int_{[ns]/n}^s |\sigma_u|^4\,du\Big)$$
for some constant $K$, thus
$$\int_0^t E(|Y_s^{(n)}|^4)\,ds \le \frac{K}{n^2},$$
and also
$$\int_0^t E(|Y(r)_s^{(n)}|^4)\,ds \le \frac{K}{n^2}.$$
The same argument shows that
$$\int_0^t E(|Y(r)_s^{(n)} - Y_s^{(n)}|^4)\,ds \le \frac{K\eta_r}{n^2}.$$
Thus
$$E(|D(r)^n_t - D^n_t|) \le K\eta_r^{1/4},$$
which implies the first part of (3.89). The second part of (3.89) is proved similarly.
Random grid schemes for SDEs are an interesting subject. Compared with the equidistant deterministic grid, a random grid can be chosen so that the approximation error has desirable properties. Lindberg and Rootzén (2013) considered the random grid scheme based on the following stopping times:
$$\tau_0^n = 0, \qquad \tau_{k+1}^n = \Big(\tau_k^n + \frac{1}{n\theta(\tau_k^n)}\Big) \wedge 1$$
for some adapted stochastic process $\theta$. Let
$$\eta_n(t) = \tau_k^n, \qquad \tau_k^n \le t < \tau_{k+1}^n,$$
for $k = 0, 1, 2, \cdots$.

Theorem 3.13. (Lindberg and Rootzén (2013)) Let the measurable functions $\alpha(\cdot) : \mathbb{R} \to \mathbb{R}$, $\beta(\cdot) : \mathbb{R} \to \mathbb{R}$ satisfy
$$|\alpha(x)| + |\beta(x)| \le C(1 + |x|), \quad x \in \mathbb{R},$$
for some constant $C$ and
$$|\alpha(x) - \alpha(y)| + |\beta(x) - \beta(y)| \le D|x - y|, \quad x, y \in \mathbb{R},$$
for some constant $D$. Let $Y$ be the solution of the SDE
$$dY(t) = \alpha(Y(t))\,dt + \beta(Y(t))\,dW(t), \tag{3.92}$$
where $W$ is a Brownian motion and $Y(0)$ is independent of $W$ and satisfies $EY(0)^2 < \infty$. Furthermore, assume
$$\sup_{t\in[0,1]} \theta(t) < \infty \quad \text{a.s.}$$
and $1/\theta$ is Riemann integrable. The normalized error in the Euler scheme is defined by
$$U_t^n = \sqrt{n}\int_0^t \big(f(Y(u)) - f(Y(\eta_n(u)))\big)\,dY(u),$$
where $f$ is a continuously differentiable function. Then
$$U^n \Rightarrow \tilde U$$
on $C[0, 1]$, where
$$\tilde U_t = \int_0^t \frac{f'(Y(u))\,\beta^2(Y(u))}{\sqrt{2\theta(u)}}\,dB(u)$$
and $B$ is a Brownian motion independent of $W$.

Proof. We first assume $\alpha$ and $\beta$ are uniformly bounded. There exists a unique solution $Y$ of (3.92) by Theorem 5.2.1 in Øksendal (2003). By Theorem 3.11, we first discuss the joint weak convergence of $(Z^n, Y)$, where
$$Z^n(t) = \sqrt{n}\int_0^t \big(Y(s) - Y(\eta_n(s))\big)\,dY(s).$$
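We first show that
$$\sqrt{n}\sup_{t\in[0,1]}\Big|\int_0^t \sum_{i=1}^\infty 1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\int_{\tau_i^n}^s \beta(Y(u))\,dW(u)\,ds\Big| \xrightarrow{P} 0. \tag{3.93}$$
Write
$$\sqrt{n}\int_0^t \sum_{i=1}^\infty 1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\int_{\tau_i^n}^s \beta(Y(u))\,dW(u)\,ds$$
$$= \sqrt{n}\int_0^t \sum_{i=1}^\infty 1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\int_{\tau_i^n}^s \big(\beta(Y(u)) - \beta(Y(\tau_i^n))\big)\,dW(u)\,ds + \sqrt{n}\int_0^t \sum_{i=1}^\infty 1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\int_{\tau_i^n}^s \beta(Y(\tau_i^n))\,dW(u)\,ds$$
$$=: I_1 + I_2.$$
In fact
$$I_2 = \int_0^t \sqrt{n}\sum_{i=1}^\infty 1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\,\beta(Y(\tau_i^n))\,(W(s) - W(\tau_i^n))\,ds.$$
We have that
$$\sqrt{n}\,E\Big[\int_{\tau_i^n}^{\tau_{i+1}^n} \beta(Y(\tau_i^n))(W(s) - W(\tau_i^n))\,ds \,\Big|\, \mathcal{F}_{\tau_i^n}\Big] = \sqrt{n}\,\beta(Y(\tau_i^n))\int_0^{1/(n\theta(\tau_i^n))} EW(s)\,ds = 0,$$
and
$$M_i^n := \int_0^{\tau_i^n} \sqrt{n}\sum_{k=1}^\infty 1_{\{\tau_k^n \le s < \tau_{k+1}^n\}}\,\beta(Y(\tau_k^n))\,(W(s) - W(\tau_k^n))\,ds$$
is a martingale. Using the Cauchy-Schwarz inequality,
$$\sum_{i=1}^\infty E[(M_{i+1}^n - M_i^n)^2 \mid \mathcal{F}_{\tau_i^n}] = \sum_{i=1}^\infty E\Big[\Big(\int_{\tau_i^n}^{\tau_{i+1}^n} \sqrt{n}\sum_{k=1}^\infty 1_{\{\tau_k^n \le s < \tau_{k+1}^n\}}\,\beta(Y(\tau_k^n))(W(s) - W(\tau_k^n))\,ds\Big)^2 \,\Big|\, \mathcal{F}_{\tau_i^n}\Big]$$
$$\le n\sum_{i=1}^\infty \beta^2(Y(\tau_i^n))\,(\tau_{i+1}^n - \tau_i^n)\int_{\tau_i^n}^{\tau_{i+1}^n} E[(W(s) - W(\tau_i^n))^2 \mid \mathcal{F}_{\tau_i^n}]\,ds$$
$$= \frac{n}{2}\sum_{i=1}^\infty \beta^2(Y(\tau_i^n))\,(\tau_{i+1}^n - \tau_i^n)^3$$
$$\le \frac12 \max_i\Big(\frac{\beta(Y(\tau_i^n))}{n\theta^{3/2}(\tau_i^n)}\Big)\sum_{\tau_i^n < 1}\frac{\beta(Y(\tau_i^n))}{n\theta^{3/2}(\tau_i^n)} \to 0.$$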
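Thus
$$\max_i |M_i^n| \xrightarrow{P} 0 \tag{3.94}$$
by the Doob inequality for martingales. Moreover,
$$\max_i \sup_{\tau_i^n \le t < \tau_{i+1}^n}\Big|\int_{\tau_i^n}^t \sqrt{n}\sum_{k=1}^\infty 1_{\{\tau_k^n \le s < \tau_{k+1}^n\}}\,\beta(Y(\tau_k^n))(W(s) - W(\tau_k^n))\,ds\Big|$$
$$\le \max_i \{\tau_{i+1}^n - \tau_i^n\}^{1/2}\Big(\int_0^1 \Big(\sqrt{n}\sum_{k=1}^\infty 1_{\{\tau_k^n \le s < \tau_{k+1}^n\}}\,\beta(Y(\tau_k^n))(W(s) - W(\tau_k^n))\Big)^2 ds\Big)^{1/2} \xrightarrow{P} 0. \tag{3.95}$$
Combining (3.94) with (3.95) implies
$$I_2 \xrightarrow{P} 0. \tag{3.96}$$
Consider $I_1$. By Itô's isometry,
$$E\Big[\Big(1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\int_{\tau_i^n}^s \big(\beta(Y(u)) - \beta(Y(\tau_i^n))\big)\,dW(u)\Big)^2 \,\Big|\, \mathcal{F}_{\tau_i^n}\Big]$$
$$= E\Big[1_{\{\tau_i^n \le s < \tau_{i+1}^n\}}\int_{\tau_i^n}^s \big(\beta(Y(u)) - \beta(Y(\tau_i^n))\big)^2\,du \,\Big|\, \mathcal{F}_{\tau_i^n}\Big]$$
$$\le C\int_{\tau_i^n}^{\tau_{i+1}^n} E[(Y(u) - Y(\tau_i^n))^2 \mid \mathcal{F}_{\tau_i^n}]\,du \le C\int_{\tau_i^n}^{\tau_{i+1}^n} (u - \tau_i^n)\,du \le C(\tau_{i+1}^n - \tau_i^n)^2.$$
Let
$$\Delta_i^n(t) = \sqrt{n}\int_{\tau_i^n}^{\tau_{i+1}^n \wedge t}\int_{\tau_i^n}^{\tau_{i+1}^n \wedge s} \big(\beta(Y(u)) - \beta(Y(\tau_i^n))\big)\,dW(u)\,ds.$$
Using the Doob inequality and the Cauchy-Schwarz inequality, we have
$$E\Big[\sup_{\tau_i^n \le t < \tau_{i+1}^n} |\Delta_i^n(t)| \,\Big|\, \mathcal{F}_{\tau_i^n}\Big] \le \sqrt{n}\,(\tau_{i+1}^n - \tau_i^n)\,E\Big[\sup_{\tau_i^n \le s < \tau_{i+1}^n}\Big|\int_{\tau_i^n}^s \big(\beta(Y(u)) - \beta(Y(\tau_i^n))\big)\,dW(u)\Big| \,\Big|\, \mathcal{F}_{\tau_i^n}\Big]$$
$$\le C\sqrt{n}\,(\tau_{i+1}^n - \tau_i^n)\,E\Big[\int_{\tau_i^n}^{\tau_{i+1}^n} \big(\beta(Y(u)) - \beta(Y(\tau_i^n))\big)^2\,du \,\Big|\, \mathcal{F}_{\tau_i^n}\Big]^{1/2} \le C\sqrt{n}\,(\tau_{i+1}^n - \tau_i^n)^2.$$
Thus
$$\sum_{i=1}^\infty E\Big[\sup_{\tau_i^n \le t < \tau_{i+1}^n} |\Delta_i^n(t)| \,\Big|\, \mathcal{F}_{\tau_i^n}\Big] \le C\sqrt{n}\sum_{i=1}^\infty (\tau_{i+1}^n - \tau_i^n)^2$$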
$$\le \frac{C}{\sqrt{n}}\sup_{0\le t\le 1}\frac{1}{\theta(t)} \xrightarrow{a.s.} 0.$$
Hence
$$I_1 \xrightarrow{a.s.} 0.$$
Combining this with (3.96) implies (3.93). A completely similar, but more involved, computation shows that
$$n\int_0^t \sum_{k=1}^\infty 1_{\{\tau_k^n \le s < \tau_{k+1}^n\}}\int_{\tau_k^n}^s \beta(Y(u))\,dW(u)\cdot\int_{\tau_k^n}^s \beta(Y(z))\,dW(z)\,\beta^2(Y(s))\,ds$$
$$= n\int_0^t \sum_{k=1}^\infty 1_{\{\tau_k^n \le s < \tau_{k+1}^n\}}\,\beta^4(Y(\tau_k^n))\cdot(W(s) - W(\tau_k^n))^2\,ds + o_P(1) \xrightarrow{P} \frac12\int_0^t \beta^4(Y(s))/\theta(s)\,ds.$$
By Theorem 3.8 and Theorem 3.11, we obtain the result under the uniform boundedness of $\alpha$ and $\beta$. To remove this restriction, we can apply a localization procedure, which is similar to that in Theorem 3.12 and is left to the reader.

Remark 3.2. Theorem 3.13 can be extended to the multidimensional case by a similar argument.

3.6.2 Milstein method for stochastic differential equations
The Milstein scheme can improve the rate of convergence of the Euler method from $1/n$ to $1/n^2$. In this subsection, we study the normalized asymptotic error for the Milstein scheme. Let $Y$ be an $\mathbb{R}$-valued continuous semimartingale on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0\le t\le 1}, P)$ with $Y_0 = 0$. Consider the SDE (3.74); here we assume $f$ is a twice differentiable function satisfying $|f(x)| \le C(1 + |x|)$ for some constant $C$. The Milstein scheme is defined by $\hat X_0^n = x_0$, and
$$\hat X_t^n = \hat X^n_{\varphi_n(t)} + f(\hat X^n_{\varphi_n(t)})(Y_t - Y_{\varphi_n(t)}) + \int_{\varphi_n(t)}^t g(\hat X^n_{\varphi_n(s)})(Y_s - Y_{\varphi_n(s)})\,dY_s,$$
where $g(x) = f(x)f'(x)$, $\varphi_n(t) = [nt]/n$ if $nt \notin \mathbb{N}$ and $\varphi_n(t) = t - 1/n$ if $nt \in \mathbb{N}$. As above, our concern is
$$\hat U_t^n := \hat X_t^n - X_t.$$
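The recursion above can be sketched numerically in the special case $Y = W$ (a Brownian motion), which is an assumption made only for this illustration. In that case the iterated integral over one cell has the closed form $\int_{\varphi}^{t}(W_s - W_{\varphi})\,dW_s = ((W_t - W_\varphi)^2 - (t - \varphi))/2$. The test equation $dX = X\,dW$, $X_0 = 1$, with exact solution $X_1 = \exp(W_1 - 1/2)$, and the helper name `euler_and_milstein` are hypothetical choices, not taken from the text.

```python
import numpy as np

def euler_and_milstein(f, df, x0, dW, dt):
    """Terminal values of the Euler and Milstein schemes for dX = f(X) dW on a uniform grid."""
    x_euler = np.full(dW.shape[0], x0, dtype=float)
    x_milstein = np.full(dW.shape[0], x0, dtype=float)
    for k in range(dW.shape[1]):
        inc = dW[:, k]
        x_euler = x_euler + f(x_euler) * inc
        # g = f * f' multiplies the iterated integral ((dW)^2 - dt) / 2 on the cell.
        x_milstein = (x_milstein + f(x_milstein) * inc
                      + f(x_milstein) * df(x_milstein) * 0.5 * (inc**2 - dt))
    return x_euler, x_milstein

rng = np.random.default_rng(0)
n, paths = 100, 4000
dt = 1.0 / n
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))

# Exact terminal value for f(x) = x and x0 = 1.
x_exact = np.exp(dW.sum(axis=1) - 0.5)
x_e, x_m = euler_and_milstein(lambda x: x, lambda x: np.ones_like(x), 1.0, dW, dt)

rmse_euler = np.sqrt(np.mean((x_e - x_exact) ** 2))
rmse_milstein = np.sqrt(np.mean((x_m - x_exact) ** 2))
print(rmse_euler, rmse_milstein)
```

In this example the root mean square error of the Milstein scheme is visibly smaller than that of the Euler scheme on the same Brownian increments, which is the improvement the normalized error $\hat U^n$ quantifies.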
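We first introduce some notation. For any process $V$, we write $\Delta_t^n(V) = V_t^{(n)} := V_t - V_{\varphi_n(t)}$, and
$$Z_t^n = \int_0^t Y_s^{(n)}\,dY_s,$$
$$M_t^n = \int_0^t \int_{\varphi_n(s)}^s Y_r^{(n)}\,dY_r\,dY_s,$$
$$N_t^n = \int_0^t (Y_s^{(n)})^2\,dY_s,$$
$$R_t^n = \int_0^t g(\hat X^n_{\varphi_n(s)})\,Y_s^{(n)}\,dY_s.$$
Obviously,
$$\hat X_t^n = \hat X^n_{\varphi_n(t)} + f(\hat X^n_{\varphi_n(t)})(Y_t - Y_{\varphi_n(t)}) + \Delta_t^n(R^n),$$
and therefore,
$$\Delta_t^n(\hat X^n) = f(\hat X^n_{\varphi_n(t)})\,\Delta_t^n(Y) + \Delta_t^n(R^n).$$
Integrating both sides with respect to $f'(\hat X^n_{\varphi_n(s)})\,dY_s$, we get
$$\int_0^t \Delta_s^n(\hat X^n)\,f'(\hat X^n_{\varphi_n(s)})\,dY_s = \int_0^t \Delta_s^n(R^n)\,f'(\hat X^n_{\varphi_n(s)})\,dY_s + R_t^n. \tag{3.97}$$
Thus
$$\hat U_t^n = \int_0^t \big([f(\hat X_s^n) - f(X_s)] - [f(\hat X_s^n) - f(\hat X^n_{\varphi_n(s)})]\big)\,dY_s + R_t^n.$$
By Taylor's expansion, there exist $\bar\xi_s^n$ between $\hat X_s^n$ and $X_s$, and $\xi_s^n$ between $\hat X_s^n$ and $\hat X^n_{\varphi_n(s)}$ such that
$$f(\hat X_s^n) - f(X_s) = f'(\bar\xi_s^n)\,\hat U_s^n,$$
$$f(\hat X_s^n) - f(\hat X^n_{\varphi_n(s)}) = f'(\hat X^n_{\varphi_n(s)})\,\Delta_s^n(\hat X^n) + \frac12 f''(\xi_s^n)\,(\Delta_s^n(\hat X^n))^2.$$
Therefore,
$$\hat U_t^n = \int_0^t f'(\bar\xi_s^n)\,\hat U_s^n\,dY_s + R_t^n - \int_0^t f'(\hat X^n_{\varphi_n(s)})\,\Delta_s^n(\hat X^n)\,dY_s - \frac12\int_0^t f''(\xi_s^n)\,(\Delta_s^n(\hat X^n))^2\,dY_s.$$
From (3.97),
$$\hat U_t^n = \int_0^t f'(\bar\xi_s^n)\,\hat U_s^n\,dY_s - \int_0^t f'(\hat X^n_{\varphi_n(s)})\,\Delta_s^n(R^n)\,dY_s - \frac12\int_0^t f''(\xi_s^n)\,(\Delta_s^n(\hat X^n))^2\,dY_s.$$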
In fact
$$\int_0^t f'(\hat X^n_{\varphi_n(s)})\,\Delta_s^n(R^n)\,dY_s = \int_0^t h(\hat X^n_{\varphi_n(s)})\,dM_s^n,$$
where $h(x) = f(x)(f'(x))^2$. Hence
$$\hat U_t^n = \int_0^t f'(\bar\xi_s^n)\,\hat U_s^n\,dY_s - \int_0^t h(\hat X^n_{\varphi_n(s)})\,dM_s^n - \frac12\int_0^t f''(\xi_s^n)\,(\Delta_s^n(\hat X^n))^2\,dY_s.$$
Note that
$$\sup_{0\le t\le 1}\Big|\int_0^t f''(\xi_s^n)\,(\Delta_s^n(\hat X^n))^2\,dY_s - \int_0^t f''(\xi_s^n)\,f^2(\hat X^n_{\varphi_n(s)})\,dN_s^n\Big| \xrightarrow{P} 0.$$
By a direct computation, one obtains an analogue of Theorem 3.11 for the Milstein method, which gives the following theorem.

Theorem 3.14. (Yan (2005)) Let $\alpha_n$ be a sequence of positive constants. The following statements are equivalent:
(a) the sequence $(\alpha_n M^n, \alpha_n N^n)$ has $(*)$ and $(Y, \alpha_n M^n, \alpha_n N^n) \Rightarrow (Y, M, N)$;
(b) for any $x_0$ and any twice differentiable function $f$ with linear growth, the sequence $\alpha_n \hat U^n$ has $(*)$ and
$$(Y, \alpha_n \hat U^n) \Rightarrow (Y, \hat U).$$
In this case, we can realize the limits $M$, $N$ and $\hat U$ above on the same extension of the space on which $Y$ is defined, and
$$d\hat U_t = \hat U_t f'(X_t)\,dY_t - f(X_t)(f')^2(X_t)\,dM_t - \frac12 f''(X_t) f^2(X_t)\,dN_t \tag{3.98}$$
and $\hat U_0 = 0$.

3.7 Appendix: the predictable characteristics of semimartingales
The drift, the variance of the Gaussian part and the Lévy measure of a Lévy process play an important role in limit theory. The predictable characteristics of semimartingales are introduced to replace these three terms. In this section, we list the basic concepts and properties of predictable characteristics of semimartingales. More details and the proofs of the propositions can be found in Jacod and Shiryaev (2003). We first introduce the definition of a random measure and its related properties.

Definition 3.6. A random measure on $\mathbb{R}_+ \times \mathbb{R}$ is a family $\mu = (\mu(\omega; dt, dx) : \omega \in \Omega)$ of nonnegative measures on $(\mathbb{R}_+ \times \mathbb{R}, \mathcal{B}_+ \otimes \mathcal{B})$ satisfying $\mu(\omega; \{0\} \times \mathbb{R}) = 0$ identically.
To introduce the compensator of a random measure, we need to put $\tilde\Omega = \Omega \times \mathbb{R}_+ \times \mathbb{R}$ with the $\sigma$-fields $\tilde{\mathcal{O}} = \mathcal{O} \otimes \mathcal{B}$, $\tilde{\mathcal{P}} = \mathcal{P} \otimes \mathcal{B}$, where $\mathcal{O}$ is the optional $\sigma$-field on $\Omega \times \mathbb{R}_+$ and $\mathcal{P}$ is the predictable $\sigma$-field on $\Omega \times \mathbb{R}_+$.

Definition 3.7. An optional (predictable) function on $\tilde\Omega$ is a function which is measurable with respect to $\tilde{\mathcal{O}}$ ($\tilde{\mathcal{P}}$). If $H$ is an optional function on $\tilde\Omega$, then for every $\omega \in \Omega$ the integral process is defined by
$$H * \mu_t(\omega) = \begin{cases} \int_{[0,t]\times\mathbb{R}} H(\omega, s, x)\,\mu(\omega; ds, dx) & \text{if } \int_{[0,t]\times\mathbb{R}} |H(\omega, s, x)|\,\mu(\omega; ds, dx) < \infty; \\ 0, & \text{otherwise.} \end{cases}$$

Definition 3.8. A random measure $\mu$ is called optional (predictable) if $W * \mu$ is optional (predictable) for every optional (predictable) function $W$.

Definition 3.9. An optional random measure $\mu$ is called $\tilde{\mathcal{P}}$-$\sigma$-finite if there exists a strictly positive predictable function $V$ such that $V * \mu_\infty < \infty$.

Now, we introduce the definition of the compensator of a random measure.

Definition 3.10. The compensator of an optional $\tilde{\mathcal{P}}$-$\sigma$-finite random measure $\mu$, denoted by $\mu^p$, is a predictable random measure such that $E(W * \mu_\infty) = E(W * \mu^p_\infty)$ for any positive predictable function $W$.

To characterize the jumps of the underlying semimartingale, we need to study integer valued random measures.

Definition 3.11. An integer valued optional $\tilde{\mathcal{P}}$-$\sigma$-finite random measure $\mu$ with $\mu(\omega; \{t\} \times \mathbb{R}) \le 1$ is called an integer valued random measure.

For a semimartingale $X$,
$$\mu^X(\omega; dt, dx) := \sum_s 1_{\{\Delta X_s(\omega) \ne 0\}}\,\varepsilon_{(s, \Delta X_s(\omega))}(dt, dx)$$
is obviously an integer valued random measure. Now, we introduce the definition of the predictable characteristics of a semimartingale. Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\ge 0}, P)$ be a filtered probability space and $X$ a semimartingale defined on it. Set $h(x) = x 1_{|x|\le 1}$, and
$$\check X(h)_t = \sum_{s\le t}[\Delta X_s - h(\Delta X_s)], \qquad X(h) = X - \check X(h). \tag{3.99}$$
$X(h)$ is a special semimartingale, that is, a semimartingale with predictable finite variation part, and we consider its canonical decomposition
$$X(h) = X_0 + M(h) + B(h) \tag{3.100}$$
where $M(h)$ is its local martingale part and $B(h)$ is its predictable finite variation part. Denote the continuous local martingale part of $X$ by $X^c$.

Definition 3.12. $(B, C, \nu)$ is called the predictable characteristics of $X$ if
(1) $B$ is a predictable finite variation process, namely the process $B = B(h)$ appearing in (3.100);
(2) $C = \langle X^c, X^c\rangle$;
(3) $\nu$ is the compensator of the random measure $\mu^X$ associated to the jumps of $X$.

Usually,
$$\tilde C = \langle M(h), M(h)\rangle$$
is called the modified second characteristic of $X$. It can be obtained by
$$\tilde C = C + (h^2) * \nu - \sum_{s\le\cdot}\Big(\int_{-\infty}^{+\infty} h(x)\,\nu(\{s\} \times dx)\Big)^2 = C + (h^2) * \nu - \sum_{s\le\cdot}(\Delta B_s)^2.$$
The following proposition is critical in the study of semimartingales.

Proposition 3.1. The following statements are equivalent:
(i) $X$ is a semimartingale with predictable characteristics $(B, C, \nu)$.
(ii) The following processes are local martingales:
(a) $M(h) = X(h) - B - X_0$;
(b) $M(h)^2 - \tilde C$;
(c) $g * \mu^X - g * \nu$ for any bounded Borel function $g$.

The following two inequalities are widely used in the study of limit theory of martingales.

Proposition 3.2. (Lenglart inequality) Let $X$ be a càdlàg adapted process and $A$ be the compensator of $X$. For any stopping time $T$ and every $\varepsilon, \eta > 0$,
$$P\Big(\sup_{s\le T}|X_s| \ge \varepsilon\Big) \le \frac{\eta}{\varepsilon} + P(A_T \ge \eta).$$

Proposition 3.3. There exist two constants $K_1$ and $K_2$ such that every locally square-integrable martingale $M$ with $M_0 = 0$ satisfies
$$E\Big(\sup_{s\le t} M_s^4\Big) \le K_1 a^2 E(\langle M, M\rangle_t^2)^{1/2} + K_2 E(\langle M, M\rangle_t^2),$$
where $a = \sup_{t,\omega}|\Delta M_t(\omega)|$.
Chapter 4
Convergence of Empirical Processes
If $X_1, \cdots, X_n$ are i.i.d. real valued random variables with common distribution function $F$, the empirical distribution function is
$$F_n(x) = \frac1n\sum_{i=1}^n 1_{\{X_i \le x\}}$$
and the corresponding empirical process is
$$Y_n(x) = \sqrt{n}(F_n(x) - F(x)).$$
The theory of empirical processes is not only important in probability theory, but also provides a basis for statistics. In this chapter, we are concerned with the weak convergence of empirical processes. In Section 1, we present the classical weak convergence result. As an application of Theorems 3.6 and 4.1, weak convergence of marked empirical processes is given in Section 2. In Sections 3 and 4, function-indexed and set-indexed empirical processes are studied.

4.1 Classical weak convergence of empirical processes
By the classical central limit theorem, for every fixed $x$, $Y_n(x)$ converges in distribution to a normal random variable with mean 0 and variance $F(x)(1 - F(x))$. In 1952, Donsker proved a general extension of the central limit theorem for $\{Y_n(x)\}$. He showed that the weak convergence of $\{Y_n\}$ to the Brownian bridge $B$ holds for uniform $[0, 1]$ samples with respect to the local uniform topology. The Brownian bridge $B$ is a Gaussian process satisfying $EB_t = 0$, $EB_sB_t = s(1 - t)$ $(0 \le s < t \le 1)$. In fact, there exists a Wiener process $W$ such that $B_t = W_t - tW_1$. However, Donsker's formulation was not quite correct because of the problem of measurability of functionals of discontinuous processes.
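The covariance structure $s(1 - t)$ can be checked by simulation. The following is a minimal sketch for uniform samples, where the empirical process is $\sqrt n(F_n(x) - x)$; the helper name `empirical_process`, the sample sizes and the points $s = 0.3$, $t = 0.7$ are arbitrary choices made for the illustration.

```python
import numpy as np

def empirical_process(samples, points):
    """sqrt(n) * (F_n(x) - x) at each x in `points`, one row per replication.

    `samples` has shape (replications, n) with Uniform(0, 1) entries.
    """
    n = samples.shape[1]
    # F_n(x) is the fraction of observations <= x in each replication.
    f_n = (samples[:, :, None] <= points[None, None, :]).mean(axis=1)
    return np.sqrt(n) * (f_n - points)

rng = np.random.default_rng(0)
s, t = 0.3, 0.7
z = empirical_process(rng.random((5000, 200)), np.array([s, t]))

cov_st = np.cov(z[:, 0], z[:, 1])[0, 1]   # compare with s * (1 - t)
var_s = z[:, 0].var()                      # compare with s * (1 - s)
print(cov_st, s * (1 - t), var_s, s * (1 - s))
```

For uniform samples these moments hold exactly for every $n$, so the agreement seen here reflects only Monte Carlo error; the content of Donsker's theorem is the convergence of the whole process, not just of its covariance.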
In 1956, Skorokhod introduced the Skorokhod topology, and proved that weak convergence under the Skorokhod topology to a continuous process is equivalent to convergence under the local uniform topology. In this section, we introduce the classical weak convergence of empirical processes. For simplicity, we assume that $X_1, \cdots, X_n$ are i.i.d. real valued random variables with uniform distribution on $[0, 1]$. For $x \in [0, 1]$,
$$Z_n(x) = \sqrt{n}(F_n(x) - x)$$
is a random element of $D[0, 1]$. In order to discuss the weak convergence of $Z_n$ under the local uniform topology, it is necessary to modify the random elements into $C[0, 1]$. Let $X_1^*, \cdots, X_n^*$ be the order statistics of the sample $X_1, \cdots, X_n$, $X_0^* = 0$, $X_{n+1}^* = 1$, $G_n(X_i^*) = i/(n + 1)$, and let $G_n(x)$, $0 \le x \le 1$, be the linear interpolation of the values $G_n(X_i^*)$. Then
$$G_n(x) = \frac{1}{n+1}\Big(\sum_{i=1}^n 1_{\{X_i \le x\}} + \frac{x - \max(X_i; X_i \le x)}{\min(X_i; X_i > x) - \max(X_i; X_i \le x)}\Big).$$
Define
$$W_n(x) = \sqrt{n}(G_n(x) - x).$$
We can easily obtain
$$|G_n(X_i^*) - F_n(X_i^*)| = \Big|\frac{i}{n+1} - \frac{i}{n}\Big| \le \frac1n,$$
thus
$$\sup_{0\le x\le 1}|G_n(x) - F_n(x)| \le \frac1n$$
and
$$\sup_{0\le x\le 1}|W_n(x) - Z_n(x)| \le \frac{1}{\sqrt n}. \tag{4.1}$$
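The bound (4.1) is easy to check numerically. The sketch below builds $F_n$ and the interpolated $G_n$ for one uniform sample and measures their distance on a fine grid; the helper names `F_n` and `G_n` mirror the notation of the text, while the grid size and the sample size are arbitrary assumptions.

```python
import numpy as np

def F_n(sample, x):
    """Empirical distribution function of `sample` evaluated at the points x."""
    return np.searchsorted(np.sort(sample), x, side="right") / sample.size

def G_n(sample, x):
    """Piecewise-linear version with G_n(X*_i) = i / (n + 1), X*_0 = 0, X*_{n+1} = 1."""
    n = sample.size
    knots = np.concatenate(([0.0], np.sort(sample), [1.0]))
    heights = np.arange(n + 2) / (n + 1)
    return np.interp(x, knots, heights)

rng = np.random.default_rng(0)
n = 50
sample = rng.random(n)
grid = np.linspace(0.0, 1.0, 20001)

# sup |G_n - F_n| on the grid, and the induced distance between W_n and Z_n.
gap = np.max(np.abs(G_n(sample, grid) - F_n(sample, grid)))
process_gap = np.sqrt(n) * gap
print(gap, 1.0 / n, process_gap, 1.0 / np.sqrt(n))
```

Because the two processes differ by at most $1/\sqrt n$ uniformly, tightness and weak limits can be studied for the continuous version $W_n$ and transferred to $Z_n$.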
The classical weak convergence of empirical processes is presented in the following.

Theorem 4.1. If $\{X_n\}_{n\ge 1}$ is a sequence of i.i.d. real valued random variables with uniform distribution on $[0, 1]$, we have
$$Z_n \Rightarrow B \tag{4.2}$$
where $B$ is a Brownian bridge.

Before proving this theorem, we first introduce a lemma.

Lemma 4.1. Let $\{X_n\}_{n\ge 1}$ be a sequence of real valued random variables, $S_0 = 0$, $S_k = \sum_{i=1}^k X_i$, $M_n = \max_{0\le k\le n}|S_k|$, $M'_n = \max_{0\le k\le n}\min\{|S_k|, |S_n - S_k|\}$. If there are nonnegative real numbers $u_1, \cdots, u_n$, $\gamma \ge 0$ and $\alpha > 1/2$ such that
$$P(|S_j - S_i| \ge \lambda, |S_k - S_j| \ge \lambda) \le \frac{1}{\lambda^{2\gamma}}\Big(\sum_{i<l\le k} u_l\Big)^{2\alpha} \tag{4.3}$$
for any $\lambda > 0$ and $0 \le i \le j \le k \le n$, then
$$P(M'_n \ge \lambda) \le \frac{C_{\gamma,\alpha}}{\lambda^{2\gamma}}\Big(\sum_{1\le l\le n} u_l\Big)^{2\alpha}, \tag{4.4}$$
where $C_{\gamma,\alpha}$ is a constant depending on $\gamma$ and $\alpha$.

Proof. Denote $\delta = (2\gamma + 1)^{-1}$. Choose $K > 1$ large enough that
$$2\Big(\frac{1}{K^{\delta}} + \frac{1}{2^{2\alpha\delta}}\Big)^{1/\delta} \le 1. \tag{4.5}$$
For such $K$, let $C_{\gamma,\alpha} = K$. We prove the lemma by induction. When $n = 1$, (4.4) obviously holds. For $n = 2$, $M'_2 = \min\{|S_1|, |S_2 - S_1|\}$, so
$$P(M'_2 \ge \lambda) = P(|S_1| \ge \lambda, |S_2 - S_1| \ge \lambda) \le \frac{1}{\lambda^{2\gamma}}(u_1 + u_2)^{2\alpha} \le \frac{C_{\gamma,\alpha}}{\lambda^{2\gamma}}(u_1 + u_2)^{2\alpha}$$
by (4.3). We assume the lemma holds for all integers less than $n$. Denote $u = \sum_{i=1}^n u_i$; we can assume $u > 0$. Then there exists an integer $h$, $1 \le h \le n$, such that
$$\frac{u_1 + \cdots + u_{h-1}}{u} \le \frac12 \le \frac{u_1 + \cdots + u_h}{u}.$$
Set
$$U_1 = \max_{0\le i\le h-1}\min\{|S_i|, |S_{h-1} - S_i|\},$$
$$U_2 = \max_{h\le j\le n}\min\{|S_j - S_h|, |(S_n - S_h) - (S_j - S_h)|\},$$
$$D_1 = \min\{|S_{h-1}|, |S_n - S_{h-1}|\}, \qquad D_2 = \min\{|S_h|, |S_n - S_h|\}.$$
If $|S_i| \le U_1$, then $\min\{|S_i|, |S_n - S_i|\} \le |S_i| \le U_1 \le U_1 + D_1$. If $|S_{h-1} - S_i| \le U_1$ and $|S_{h-1}| = D_1$, then $\min\{|S_i|, |S_n - S_i|\} \le |S_i| \le |S_{h-1} - S_i| + |S_{h-1}| \le U_1 + D_1$. If $|S_{h-1} - S_i| \le U_1$ and $|S_n - S_{h-1}| = D_1$, then $\min\{|S_i|, |S_n - S_i|\} \le |S_n - S_i| \le |S_{h-1} - S_i| + |S_n - S_{h-1}| \le U_1 + D_1$. Thus
$$\min\{|S_i|, |S_n - S_i|\} \le U_1 + D_1 \tag{4.6}$$
when $0 \le i \le h - 1$. Similarly, we have
$$\min\{|S_i|, |S_n - S_i|\} \le U_2 + D_2 \tag{4.7}$$
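when $h \le i \le n$. Hence
$$M'_n \le \max\{U_1 + D_1, U_2 + D_2\}. \tag{4.8}$$
From (4.8), for any given $\lambda > 0$,
$$P(M'_n \ge \lambda) \le P(U_1 + D_1 \ge \lambda) + P(U_2 + D_2 \ge \lambda).$$
Furthermore, let $\lambda = \lambda_1 + \lambda_2$; then
$$P(M'_n \ge \lambda) \le P(U_1 \ge \lambda_1) + P(U_2 \ge \lambda_1) + P(D_1 \ge \lambda_2) + P(D_2 \ge \lambda_2).$$
By definition, $U_1 = M'_{h-1}$. Then by the induction assumption,
$$P(U_1 \ge \lambda_1) \le \frac{C_{\gamma,\alpha}}{\lambda_1^{2\gamma}}(u_1 + \cdots + u_{h-1})^{2\alpha} \le \frac{C_{\gamma,\alpha}}{\lambda_1^{2\gamma}}\cdot\frac{u^{2\alpha}}{2^{2\alpha}}. \tag{4.9}$$
Similarly,
$$P(U_2 \ge \lambda_1) \le \frac{C_{\gamma,\alpha}}{\lambda_1^{2\gamma}}(u_{h+1} + \cdots + u_n)^{2\alpha} \le \frac{C_{\gamma,\alpha}}{\lambda_1^{2\gamma}}\cdot\frac{u^{2\alpha}}{2^{2\alpha}}. \tag{4.10}$$
Furthermore,
$$P(D_1 \ge \lambda_2) = P(|S_{h-1}| \ge \lambda_2, |S_n - S_{h-1}| \ge \lambda_2) \le \frac{1}{\lambda_2^{2\gamma}}(u_1 + \cdots + u_n)^{2\alpha} = \frac{u^{2\alpha}}{\lambda_2^{2\gamma}}.$$
Similarly,
$$P(D_2 \ge \lambda_2) \le \frac{u^{2\alpha}}{\lambda_2^{2\gamma}}.$$
Thus
$$P(M'_n \ge \lambda) \le 2u^{2\alpha}\Big(\frac{1}{\lambda_2^{2\gamma}} + \frac{C_{\gamma,\alpha}}{2^{2\alpha}\lambda_1^{2\gamma}}\Big).$$
Choosing $\lambda_1, \lambda_2 > 0$ with $\lambda_1 + \lambda_2 = \lambda$ so as to minimize the right-hand side, and then using (4.5), we get
$$P(M'_n \ge \lambda) \le \frac{u^{2\alpha}}{\lambda^{2\gamma}}\cdot 2\big[(C_{\gamma,\alpha}2^{-2\alpha})^{(2\gamma+1)^{-1}} + 1\big]^{(2\gamma+1)} \le \frac{C_{\gamma,\alpha}}{\lambda^{2\gamma}}u^{2\alpha}.$$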
The proof of Theorem 4.1. Firstly, we prove the convergence of the finite dimensional distributions of $\{Z_n\}$. Denote $U_n(t) = nF_n(t)$. For $0 = t_0 < t_1 < \cdots < t_k = 1$, let $p_i = t_i - t_{i-1}$; then $(U_n(t_i) - U_n(t_{i-1}), 1 \le i \le k)$ obeys the multinomial distribution with parameters $n$ and $p_1, \cdots, p_k$. By the central limit theorem for multinomial trials,
$$(Z_n(t_i) - Z_n(t_{i-1}); 1 \le i \le k) \xrightarrow{d} (B(t_i) - B(t_{i-1}); 1 \le i \le k).$$
Secondly, we prove that $\{Z_n\}$ is tight. It is difficult to discuss the tightness of $\{Z_n\}$ on $D[0, 1]$, but it is easy to study the tightness of $\{W_n\}$ on $C[0, 1]$. In fact, we can focus on $\{W_n\}$ on $C[0, 1]$ by (4.1). We need to prove that for every $\varepsilon > 0$, $\eta > 0$, there exist $\delta$, $0 < \delta < 1$, and $n_0$ such that
$$P\Big(\sup_{|s-t|\le\delta}|W_n(t) - W_n(s)| \ge \varepsilon\Big) \le \delta\eta \tag{4.11}$$
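for $n \ge n_0$, which is implied by
$$P\Big(\sup_{0\le s\le\delta}|Z_n(s)| \ge \varepsilon\Big) \le \delta\eta \tag{4.12}$$
for $n \ge n_0$. For fixed $\delta$, let $\gamma = 2$, $\alpha = 1$, $u_i = \sqrt{6}\,\delta/m$, and apply Lemma 4.1 to the increments $Z_n(i\delta/m) - Z_n((i-1)\delta/m)$. If we can prove
$$E\big(|Z_n(s + p_1) - Z_n(s)|^2\,|Z_n(s + p_1 + p_2) - Z_n(s + p_1)|^2\big) \le 6p_1p_2, \tag{4.13}$$
then
$$P\Big(\max_{1\le i\le m}\min\{|Z_n(i\delta/m)|, |Z_n(\delta) - Z_n(i\delta/m)|\} \ge \varepsilon\Big) \le \frac{6C_{2,1}\delta^2}{\varepsilon^4}$$
by Lemma 4.1. Thus, we have
$$P\Big(\max_{1\le i\le m}|Z_n(i\delta/m)| \ge \varepsilon\Big) \le \frac{96C\delta^2}{\varepsilon^4} + P\Big(|Z_n(\delta)| \ge \frac{\varepsilon}{2}\Big).$$
Noticing
$$\lim_{m\to\infty}P\Big(\max_{1\le i\le m}|Z_n(i\delta/m)| \ge \varepsilon\Big) = P\Big(\sup_{0\le s\le\delta}|Z_n(s)| \ge \varepsilon\Big),$$
we have
$$P\Big(\sup_{0\le s\le\delta}|Z_n(s)| \ge \varepsilon\Big) \le \frac{96C\delta^2}{\varepsilon^4} + P\Big(|Z_n(\delta)| \ge \frac{\varepsilon}{2}\Big).$$
Furthermore,
$$P\Big(|Z_n(\delta)| \ge \frac{\varepsilon}{2}\Big) \to P\Big(|N| \ge \frac{\varepsilon}{2\sqrt{\delta(1-\delta)}}\Big) \le \frac{16\delta^2}{\varepsilon^4}EN^4$$
by the central limit theorem for $Z_n(\delta)$, where $N$ is a standard normal random variable. Hence
$$P\Big(\sup_{0\le s\le\delta}|Z_n(s)| \ge \varepsilon\Big) \le (96C + 48)\frac{\delta^2}{\varepsilon^4}.$$
We can choose $\delta$ such that (4.12) holds.

Now we prove (4.13). Let $p = t - s$; we have
$$Z_n(t) - Z_n(s) = \frac{1}{\sqrt n}(U_n(t) - U_n(s) - np) = \frac{1}{\sqrt n}(k - np),$$
where $k$ is the number of $X_1, \cdots, X_n$ which fall into $(s, t]$. Define
$$\alpha_i = \begin{cases} 1 - p_1, & \text{when } X_i \in (s, s + p_1], \\ -p_1, & \text{otherwise;} \end{cases}$$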
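$$\beta_i = \begin{cases} 1 - p_2, & \text{when } X_i \in (s + p_1, s + p_1 + p_2], \\ -p_2, & \text{otherwise.} \end{cases}$$
Then $\{(\alpha_i, \beta_i), i = 1, \cdots, n\}$ is an i.i.d. sequence with mean 0, and
$$E\Big[\Big(\sum_{i=1}^n \alpha_i\Big)^2\Big(\sum_{i=1}^n \beta_i\Big)^2\Big] = nE\alpha_1^2\beta_1^2 + n(n-1)E\alpha_1^2E\beta_1^2 + 2n(n-1)E\alpha_1\beta_1E\alpha_2\beta_2.$$
Noticing
$$E\alpha_1^2\beta_1^2 = p_1(1 - p_1)^2p_2^2 + p_1^2(1 - p_2)^2p_2 + p_1^2p_2^2(1 - p_1 - p_2) \le 3p_1p_2,$$
$$E\alpha_1^2E\beta_1^2 = p_1(1 - p_1)p_2(1 - p_2) \le p_1p_2,$$
$$E\alpha_1\beta_1E\alpha_2\beta_2 = p_1^2p_2^2 \le p_1p_2,$$
we obtain
$$E\Big[\Big(\sum_{i=1}^n \alpha_i\Big)^2\Big(\sum_{i=1}^n \beta_i\Big)^2\Big] \le 6n^2p_1p_2,$$
which implies (4.13). This completes the proof of Theorem 4.1.

4.2 Weak convergence of marked empirical processes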
Consider the autoregressive model
$$X_i = \beta X_{i-1} + \varepsilon_i \tag{4.14}$$
where $X_0$ is given and $\{\varepsilon_i = G(\eta_i, \eta_{i-1}, \cdots)\}$ is a causal process with mean zero. The estimation and inference of $\beta$ is a very interesting problem, and techniques from empirical processes and goodness-of-fit tests are a vibrant research topic in statistics. In this section, we focus on the weak convergence results which can be used in the inference of $\beta$. Let $g(x)$ be a continuous function on $\mathbb{R}$. We assume $\{X_n\}_{n\ge 1}$ is a unit root process in this section, thus there exists a constant sequence $\{a_n\}$ such that
$$\frac{X_{[n\cdot]}}{a_n} \Rightarrow \xi(\cdot)$$
for some stochastic process $\xi$, where $\xi$ is a Lévy process or a Gaussian process. In this section, we study the weak convergence of
$$\alpha_n(x) = \sum_{i=1}^n g\Big(\frac{X_i}{a_n}\Big)(1_{\{\varepsilon_i \le x\}} - F(x)) \tag{4.15}$$
and
$$\hat\alpha_n(x) = \sum_{i=1}^n g\Big(\frac{X_i}{a_n}\Big)(1_{\{\hat\varepsilon_i \le x\}} - F(x)), \tag{4.16}$$
ˆ i−1 , and βˆ is an where F (x) is the common distribution function of εi , εˆi = Xi − βX ˆ n (x) are the so-called marked empirical processes. The estimate of β. αn (x) and α results in this section are from Chan and Zhang (2012). Let S[nt] =
[nt]
εi ,
i=1
and we assume ε1 satisfies lim
x→∞
P(|ε1 | ≥ xy) = y −α P(|ε1 | ≥ x)
(4.17)
for any y > 0, where α ∈ (0, 2), and the normalization constants {an } are given by an = inf{x : P(|ε1 | ≥ x) ≤
1 }. n
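To make the regular variation condition (4.17) and the constants $a_n$ concrete, here is a small sketch for an illustrative Pareto-type law with $P(|\varepsilon_1| \ge x) = x^{-\alpha}$ for $x \ge 1$. This particular law, the function names `tail` and `a_n`, and the value $\alpha = 1.5$ are assumptions made for the example only; for this tail one gets $a_n = n^{1/\alpha}$.

```python
import numpy as np

def tail(x, alpha):
    """P(|eps| >= x) for the illustrative law with tail x**(-alpha) on x >= 1."""
    x = np.maximum(np.asarray(x, dtype=float), 1.0)
    return x ** (-alpha)

def a_n(n, alpha):
    """a_n = inf{x : P(|eps| >= x) <= 1/n}; for this tail it equals n**(1/alpha)."""
    return n ** (1.0 / alpha)

alpha = 1.5
n = 1000

# Regular variation (4.17): the tail ratio at scale y equals y**(-alpha).
y = 3.0
ratio = tail(1e4 * y, alpha) / tail(1e4, alpha)

# At x = a_n the tail probability equals 1/n; slightly below a_n it exceeds 1/n.
p_at_an = tail(a_n(n, alpha), alpha)
p_below = tail(0.999 * a_n(n, alpha), alpha)
print(ratio, y ** (-alpha), p_at_an, p_below)
```

With $\alpha < 2$ the constants $a_n = n^{1/\alpha}$ grow faster than $\sqrt n$, which is why the partial sums need a heavier normalization than in the finite variance case.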
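In fact, condition (4.17) is equivalent to condition (2.66) in Chapter 2. To study the asymptotic behavior of $\alpha_n(x)$ and $\hat\alpha_n(x)$, we utilize the martingale approximation. Let
$$\mathcal{F}_i = \sigma(\eta_i, \eta_{i-1}, \cdots), \qquad \mathcal{F}_i^* = \sigma(\eta_i, \eta_{i-1}, \cdots, \eta_1, \eta'_0, \eta_{-1}, \cdots),$$
$$F_i(x|\mathcal{F}_j) = P(\varepsilon_i \le x|\mathcal{F}_j), \qquad f_i(x|\mathcal{F}_j) = F'_i(x|\mathcal{F}_j),$$
$$\mathcal{P}_j(\cdot) = E(\cdot|\mathcal{F}_j) - E(\cdot|\mathcal{F}_{j-1}),$$
where $\eta'_0$ is an independent copy of $\eta_0$. Let
$$B_n(t, x) = \sum_{i=1}^{[nt]}[1_{\{\varepsilon_i < x\}} - F(x)]$$
and let $B(t, x)$ be a rescaled Brownian bridge for fixed $t$ and a Brownian motion with variance $\mu(x) = E\{\sum_{i=0}^\infty [F_i(x|\mathcal{F}_0) - F_i(x|\mathcal{F}_0^*)]\}^2$ for fixed $x$.

Theorem 4.2. Let $g(\cdot)$ be a Lipschitz continuous function on $\mathbb{R}$, i.e., $|g(x) - g(y)| \le C|x - y|$ for all $x, y \in (-\infty, +\infty)$. Assume that there exist $a_n$ such that
$$\frac{S_{[n\cdot]}}{a_n} \Rightarrow S(\cdot) \tag{4.18}$$
on $D[0, 1]$ with the semimartingale topology, where $S$ is a Lévy process or a Gaussian process, and
$$\sum_{j=1}^\infty \Big\|\sum_{i=j}^\infty [F_i(x|\mathcal{F}_0) - F_i(x|\mathcal{F}_0^*)]\Big\|_2 < \infty. \tag{4.19}$$
Then for any $x \in \mathbb{R}$,
$$\frac{1}{\sqrt n}\alpha_n(x) \xrightarrow{d} \int_0^1 g(S(t-))\,dB(t, x). \tag{4.20}$$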
If (4.19) is replaced by
$$\sum_{j=1}^\infty \sup_x\Big\|\sum_{i=j}^\infty [F_i^{(l)}(x|\mathcal{F}_0) - F_i^{(l)}(x|\mathcal{F}_0^*)]\Big\|_2 < \infty, \tag{4.21}$$
where $l = 0, 1$ and $F_i^{(0)}(x|\mathcal{F}_0) = F_i(x|\mathcal{F}_0)$, then for any constant $A > 0$,
$$\frac{1}{\sqrt n}\alpha_n(\cdot) \Rightarrow \int_0^1 g(S(t-))\,dB(t, \cdot) \tag{4.22}$$
in $D[-A, A]$ under the $J_1$ topology.

Theorem 4.3. If $a_n(\hat\beta - 1) = o_P(1)$, then
$$\frac{1}{\sqrt n}\hat\alpha_n(x) = \frac{1}{\sqrt n}\alpha_n(x) + \frac{1}{\sqrt n}\sum_{i=1}^n g(X_{i-1}/a_n)[F(x + (\hat\beta - \beta)X_{i-1}) - F(x)] + o_P(1). \tag{4.23}$$
Let $\hat\beta$ be the $\tau$-quantile estimate of $\beta$, that is,
$$\hat\beta = \arg\min_\beta \sum_{i=1}^n \rho_\tau(X_i - \beta X_{i-1} - F^{-1}(\tau)),$$
where $\rho_\tau(y) = y(\tau - 1_{\{y\le 0\}})$. When $\beta = 1$, using the argument of Theorem 4 in Knight (1991), we have
$$a_n\sqrt{n}(\hat\beta - \beta) = \frac{\frac{1}{\sqrt n}\sum_{t=1}^n (X_{t-1}/a_n)(\tau - 1_{\{\varepsilon_t \le F^{-1}(\tau)\}})}{\frac1n\sum_{t=1}^n f_t(F^{-1}(\tau)|\mathcal{F}_{t-1})(X_{t-1}/a_n)^2} + o_P(1).$$
By virtue of Theorem 2.2 and this expression, the following corollary concerning the quantile estimate is immediate. Let $f$ denote the density function of $\varepsilon_1$.

Corollary 4.1. Under the conditions in Theorem 4.2, if $E|f_1(F^{-1}(\tau)|\mathcal{F}_0)|^p < \infty$ for some $p > 1$ and $f(F^{-1}(\tau)) > 0$, then
$$a_n\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} -\frac{1}{f(F^{-1}(\tau))}\cdot\frac{\int_0^1 S(t-)\,dB(t, F^{-1}(\tau))}{\int_0^1 S^2(t)\,dt}.$$

Theorem 4.4. Under the conditions in Theorem 4.2 we have
(1) if $\hat\beta$ is the $\tau$-quantile estimate of $\beta$ and $f(F^{-1}(\tau)) > 0$, then
$$\sup_{x\in[-A,A]}\frac{1}{\sqrt n}\hat\alpha_n(x) \xrightarrow{d} \sup_{x\in[-A,A]}\Big[\Big(-\frac{f(x)}{f(F^{-1}(\tau))}\Big)\Big(\frac{\int_0^1 S(t-)\,dB(t, F^{-1}(\tau))}{\int_0^1 S^2(t)\,dt}\Big)\int_0^1 g(S(t))S(t)\,dt + \int_0^1 g(S(t-))\,dB(t, x)\Big];$$
(2) if βˆ is the least square estimation (LSE) of β and (S[nt] ,
n
ε2i /a2n ) − → (S(t), S 2 ), d
i=1
then (i) if an = nϑ l(n) for some 1/2 < ϑ < 1, 1 1 1 1 d → sup f (x) S(t−)dS(t) g(S(t))S(t)dt/ S 2 (t); sup √ α ˆ n (x) − n x∈[−A,A] x∈[−A,A] 0 0 0 √ (ii) if an = n, 1 1 1 1 d ˆ n (x) − → sup [f (x) S(t−)dS(t) g(S(t))S(t)dt/ S 2 (t) sup √ α n x∈[−A,A] x∈[−A,A] 0 0 0 1 g(S(t−))dB(t, x)]. + 0
Before proving these results, it is necessary to introduce some lemmas. Lemma 4.2. (Wu (2003)) Let H be a function with continuous first-order derivatives and a > 0. Then t+a 2 t+a 2 H (u)du + 2a H 2 (u)du sup H 2 (s) ≤ a t t≤s≤t+a t and sup H 2 (s) ≤ 2 s∈R
H 2 (u)du + 2
R
H 2 (u)du,
R
where H is the derivative of H. Lemma 4.3. Under the conditions in Theorem 4.2, for any x, there exists a martingale difference sequence ζi (x) with respect to Fi such that for any δ > 0, 1 1 g(Si−1 /an )(1{εi ≤x} − F (x)) − √ g(Si−1 /an )ζi (x)| > δ} = 0. lim P{| √ n→∞ n i=1 n i=1 n
Proof. When (4.19) holds, by Volny (1993), there exist a sequence of random variables
\[
\xi_i(x) = \sum_{j=-\infty}^{-1}\sum_{l=0}^{\infty} \mathcal{P}_{i+j} 1_{\{\varepsilon_{i+l}\le x\}}
\]
and a martingale difference sequence
\[
\zeta_i(x) = \sum_{j=-i}^{\infty} \mathcal{P}_i 1_{\{\varepsilon_j\le x\}}
\]
such that
\[
1_{\{\varepsilon_i\le x\}} - F(x) = \zeta_i(x) + \xi_i(x) - \xi_{i+1}(x).
\]
This gives that
\[
\begin{aligned}
&\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\bigl(1_{\{\varepsilon_i\le x\}} - F(x)\bigr) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\zeta_i(x) \\
&\quad = \frac{1}{\sqrt{n}}\sum_{i=1}^{n-1} \xi_{i+1}(x)\bigl[g(S_i/a_n) - g(S_{i-1}/a_n)\bigr] - \frac{1}{\sqrt{n}}\, g(S_{n-1}/a_n)\,\xi_{n+1}(x) =: I_1 + I_2.
\end{aligned}
\]
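Since for any $\delta > 0$,
\[
P\Bigl\{\sup_{2\le i\le n+1} |\xi_i(x)| > \delta\sqrt{n}\Bigr\} \le (\sqrt{n}\,\delta)^{-2} \sum_{i=2}^{n+1} E\bigl[\xi_1^2(x)\, I(|\xi_1(x)| > \delta\sqrt{n})\bigr] \to 0,
\]
it follows that $|\xi_{n+1}(x)|/\sqrt{n} = o_P(1)$. On the other hand, by (4.18), $g(S_{n-1}/a_n) = O_P(1)$ holds. Thus, $I_2 = o_P(1)$. It remains to show that $I_1 = o_P(1)$. When $\{\varepsilon_i\}$ has infinite variance, the result $I_1 = o_P(1)$ follows along exactly the lines of the argument of Lemma 2 of Knight (1991). We therefore give the proof in detail only for the finite variance case. When $\{\varepsilon_i\}$ has finite variance, by Theorem 1 of Wu (2007), $E(\sum_{i=1}^{n}\varepsilon_i)^2 = O(n)$. Thus, $a_n = O(\sqrt{n})$. By the assumptions, we have
\[
\begin{aligned}
E|I_1| &\le \frac{C}{\sqrt{n}} \sum_{i=1}^{n-1} E|\xi_{i+1}(x)\,\varepsilon_i/a_n| \\
&\le \frac{C}{n} \sum_{i=1}^{n-1} [E\xi_{i+1}^2(x)]^{1/2}\,[E\varepsilon_i^2]^{1/2} \\
&\le \frac{C}{n} \sum_{i=2}^{n} [E\xi_i^2(x)]^{1/2} \\
&\le \frac{C}{n} \sum_{i=2}^{n} \Bigl\{\sum_{j=-\infty}^{-1} E\Bigl[\sum_{l=0}^{\infty} \bigl[F_{i+l}(x\,|\,\mathcal{F}_j) - F_{i+l}(x\,|\,\mathcal{F}_j^*)\bigr]\Bigr]^2\Bigr\}^{1/2} = o(1).
\end{aligned}
\]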
Then $I_1 = o_P(1)$ and therefore the proof is complete.

Lemma 4.4. Under the conditions in Theorem 4.2, for any constant $A > 0$,
\[
\sup_{x\in[-A,A]} \Bigl|\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\bigl(1_{\{\varepsilon_i\le x\}} - F(x)\bigr) - \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\zeta_i(x)\Bigr|
\]
converges to zero in probability, where $\zeta_i(x)$ is defined as in Lemma 4.3.

Proof. It suffices to show
\[
\frac{1}{n}\sum_{i=1}^{n} E\sup_{x\in[-A,A]} \xi_i^2(x) = O(1).
\]
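Since
\[
\xi_i(x) = \sum_{j=-\infty}^{-1}\sum_{l=0}^{\infty} \mathcal{P}_j 1_{\{\varepsilon_{i+l}\le x\}}
\]
is differentiable, Lemma 4.2 yields
\[
\begin{aligned}
E\sup_{x\in[-A,A]} \xi_i^2(x) &\le \frac{2}{A}\int_{-A}^{A} E\xi_i^2(u)\,du + 2A\int_{-A}^{A} E\xi_i'^2(u)\,du \\
&\le \frac{2}{A}\int_{-A}^{A} \sum_{j=-\infty}^{-1} E\Bigl[\sum_{l=0}^{\infty} \bigl[F_{i+l}(u\,|\,\mathcal{F}_j) - F_{i+l}(u\,|\,\mathcal{F}_j^*)\bigr]\Bigr]^2 du \\
&\quad + 2A\int_{-A}^{A} \sum_{j=-\infty}^{-1} E\Bigl[\sum_{l=0}^{\infty} \bigl[f_{i+l}(u\,|\,\mathcal{F}_j) - f_{i+l}(u\,|\,\mathcal{F}_j^*)\bigr]\Bigr]^2 du \\
&\le 2\sum_{j=-\infty}^{-1} \sup_u E\Bigl[\sum_{l=0}^{\infty} \bigl[F_{i+l}(u\,|\,\mathcal{F}_j) - F_{i+l}(u\,|\,\mathcal{F}_j^*)\bigr]\Bigr]^2 \\
&\quad + 4A^2 \sum_{j=-\infty}^{-1} \sup_u E\Bigl[\sum_{l=0}^{\infty} \bigl[f_{i+l}(u\,|\,\mathcal{F}_j) - f_{i+l}(u\,|\,\mathcal{F}_j^*)\bigr]\Bigr]^2 \\
&\le C
\end{aligned}
\]
for every $i$.

Lemma 4.5. Let
\[
\tilde B_n(t,x) = \sum_{i=1}^{[nt]} \zeta_i(x),
\]
where $\zeta_i(x)$ is the martingale difference defined in Lemma 4.3. Then under condition (4.21),
\[
\frac{1}{\sqrt{n}}\,\tilde B_n(\cdot,\cdot) \Rightarrow B(\cdot,\cdot)
\]
in $D([0,1]\times[-A,A])$.

Proof.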
(4.21) implies
\[
\sum_{j=1}^{\infty} \Bigl\|\sum_{i=j}^{\infty} \bigl[F_i(x\,|\,\mathcal{F}_0) - F_i(x\,|\,\mathcal{F}_0^*)\bigr]\Bigr\|_2 < \infty,
\]
and it follows that
\[
E\zeta_i^2(x) = E\Bigl\{\sum_{i=0}^{\infty} \bigl[F_i(x\,|\,\mathcal{F}_0) - F_i(x\,|\,\mathcal{F}_0^*)\bigr]\Bigr\}^2 = \mu(x) < \infty.
\]
Since $\{\zeta_i(x)\}$ is a martingale difference sequence, by Theorem 18.2 of Billingsley (1999), we have
\[
\frac{1}{\sqrt{n}}\,\tilde B_n(\cdot,\cdot) \Rightarrow B(\cdot,\cdot).
\]
Thus, the finite-dimensional convergence of $\tilde B_n(t,x)$ follows. Let $\zeta_i(x,y) = \zeta_i(y) - \zeta_i(x)$. By Theorem 6 of Bickel and Wichura (1971), to show the tightness of $\{\tilde B_n(t,x)\}$ on $D[0,1]\times D[-A,A]$, it suffices to show that for any $0\le t_1 < t < t_2\le 1$ and $-A\le x_1 < x < x_2\le A$,
\[
n^{-2}\, E\Bigl\{\Bigl[\sum_{i=[nt_1]+1}^{[nt]} \zeta_i(x_1,x_2)\Bigr]^2 \Bigl[\sum_{i=[nt]+1}^{[nt_2]} \zeta_i(x_1,x_2)\Bigr]^2\Bigr\} \le (t-t_1)(t_2-t)(x_2-x_1)^2
\]
and
\[
E\Bigl\{\Bigl|\sum_{i=[nt_1]+1}^{[nt_2]} \zeta_i(x_1,x_2)\Bigr|^2 \Bigl|\sum_{i=[nt_1]+1}^{[nt_2]} \zeta_i(x_1,x_2)\Bigr|^2\Bigr\} \le C(x-x_1)(x_2-x)(t_2-t_1)^2.
\]
These follow easily from condition (4.21) and the fact that $\tilde B_n(t,x)$ is a martingale. Details are omitted.

Lemma 4.6. Under the conditions of Theorem 4.2, there exists a dense set $Q\subset[0,1]$ with $0, 1\in Q$ such that for any finite subset $\{0\le t_1 < t_2 < \cdots < t_m\le 1\}\subset Q$ and for any $x$,
\[
(S_{[nt_i]}, \tilde B_n(t_i,x),\ 1\le i\le m) \xrightarrow{d} (S(t_i), B(t_i,x),\ 1\le i\le m). \tag{4.24}
\]
Proof. There exists a dense set $Q\subset[0,1]$ with $1\in Q$ such that for any finite subset $\{t_1 < t_2 < \cdots < t_m\le 1\}\subset Q$,
\[
(S_{[nt_1]}, S_{[nt_2]}, S_{[nt_3]}) \xrightarrow{d} (S(t_1), S(t_2), S(t_3))
\]
by (4.18). When $\varepsilon_i$ has infinite variance, weak convergence in $D([0,1]\times[-A,A])$ in Lemma 4.5 can also be replaced by that in $C([0,1]\times[-A,A])$, since $B(t,x)$ is a continuous process on $[0,1]\times[-A,A]$. Thus $(S_{[nt]}, \tilde B_n(t,x))$ is uniformly S-tight on $D([0,1]\times[-A,A])$. This implies that for the sequence $(S_{[nt]}, \tilde B_n(t,x))$, $t\in Q$, there exists a subsequence $(S_{[n_k t]}, \tilde B_{n_k}(t,x))$ such that
\[
(S_{[n_k t]}, \tilde B_{n_k}(t,x)) \xrightarrow{d} (S(t), B(t,x)).
\]
Following the argument of Theorem 3 in Resnick and Greenwood (1979), we have that $S(t)$ and $B(t,x)$ are independent and any convergent subsequence has the same limit. Thus, (4.24) holds.

When $\varepsilon_i$ has finite variance, from Corollary 3 of Dedecker and Merlevède (2003), $S_{[n\cdot]}\Rightarrow S(\cdot)$. Thus, by Lemma 4.5, if we can show that for any finite subset $\{t_i,\ 1\le i\le m\}\subset[0,1]$,
\[
(S_{[nt_i]}, \tilde B_n(t_i,x),\ 1\le i\le m) \xrightarrow{d} (S(t_i), B(t_i,x),\ 1\le i\le m),
\]
then
\[
(S_{[n\cdot]}, \tilde B_n(\cdot,x)) \Rightarrow (S(\cdot), B(\cdot,x)) \tag{4.25}
\]
on $D[0,1]$ and (4.24) follows. By Theorem 1 of Wu (2007), there exists a martingale difference sequence $M_i$ with respect to $\mathcal{F}_i$ such that
\[
\Bigl|(S_n(t), \tilde B_n(t,x)) - \Bigl(\sum_{i=1}^{[nt]} M_i,\ \sum_{i=1}^{[nt]} \zeta_i(x)\Bigr)\Bigr| = o_P(1).
\]
On the other hand, from the martingale central limit theorem, it follows that
\[
\sum_{i=1}^{[n\cdot]} (M_i, \zeta_i(x)) \Rightarrow (S(\cdot), B(\cdot,x)).
\]
Thus (4.25) holds, and we obtain the lemma.
The proof of Theorem 4.2. Lemma 4.6 implies that (5) of Jakubowski (1996) holds, i.e., there exists a dense set $Q$ such that for any $x$,
\[
(g(S_{[nt]}), \tilde B_n(t,x)) \xrightarrow{d} (g(S(t)), B(t,x)), \qquad t\in Q.
\]
Further, since $g(\cdot)$ is a Lipschitz continuous function and $S_{[n\cdot]}$ is uniformly S-tight, it follows that $g(S_{[n\cdot]})$ is also uniformly S-tight. Moreover, for any $x\in\mathbb{R}$, $\tilde B_n(t,x)$ is a martingale satisfying the UT condition and is $J_1$-tight with limiting law concentrated on $C([0,1])$. By Remark 4 of Jakubowski (1996), we see that his condition (6) is satisfied. Therefore, for $\{g(S_{[n\cdot]}), \tilde B_n(\cdot,x)\}$, all the conditions of Theorem 3 of Jakubowski (1996) are satisfied, thus
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\zeta_i(x) \xrightarrow{d} \int_0^1 g(S(t-))\,dB(t,x). \tag{4.26}
\]
The left side of (4.20) can be approximated by $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\zeta_i(x)$ through martingale approximation. Thus, (4.20) follows from Lemma 4.3. By Lemma 4.4, for (4.22) it suffices to show that
\[
U_n(\cdot) := \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\zeta_i(\cdot) \Rightarrow \int_0^1 g(S(t-))\,dB(t,\cdot) =: U(\cdot)
\]
on $D[0,1]$. The finite-dimensional convergence follows from (4.26) and the Cramér-Wold device. Next, we show that for any $\varepsilon > 0$, there exists a $\delta > 0$ such that
\[
P\Bigl\{\sup_{|x-y|\le\delta} |U_n(x) - U_n(y)| > \varepsilon\Bigr\} \to 0
\]
as $n\to\infty$. This implies that $\{U_n(\cdot)\}$ is tight, which completes the proof. Obviously,
\[
\sup_{0\le t\le 1} |g(S_{[nt]})| \xrightarrow{d} \sup_{0\le t\le 1} |g(S(t))|
\]
by (4.18). Let
\[
g_\delta(S_i/a_n) = g(S_i/a_n)\, 1_{\{|g(S_i/a_n)|\le\delta^{-1/4}\}} \tag{4.27}
\]
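and
\[
V_n(x) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g_\delta(S_{i-1}/a_n)\bigl(\zeta_i(x) - \zeta_i(y)\bigr).
\]
Then $V_n(x)$ is a martingale and by Lemma 4.2 and condition (4.21),
\[
\begin{aligned}
E\Bigl[\sup_{y\le x\le y+\delta} |V_n(x)|\Bigr]^2
&\le \frac{2}{\delta n}\int_y^{y+\delta} E\Bigl(\sum_{i=1}^{n} \Bigl[g_\delta\Bigl(\frac{S_{i-1}}{a_n}\Bigr)\bigl(\zeta_i(u) - \zeta_i(y)\bigr)\Bigr]\Bigr)^2 du + \frac{2\delta}{n}\int_y^{y+\delta} E\Bigl(\sum_{i=1}^{n} \Bigl[g_\delta\Bigl(\frac{S_{i-1}}{a_n}\Bigr)\zeta_i'(u)\Bigr]\Bigr)^2 du \\
&\le \frac{2}{\sqrt{\delta}\,n}\sum_{i=1}^{n}\int_y^{y+\delta} E\{\zeta_i(u) - \zeta_i(y)\}^2\,du + \frac{2\delta^2}{\sqrt{\delta}\,n}\sum_{i=1}^{n} \sup_{y\le x\le y+\delta} E\{\zeta_i'(x)\}^2 \\
&\le \frac{2}{\sqrt{\delta}\,n}\sum_{i=1}^{n}\int_y^{y+\delta}\int_y^{u} E\{\zeta_i'(v)\}^2\,dv\,du + \frac{2\delta^2}{\sqrt{\delta}\,n} \sup_{x\in[-A,A]} E\{\zeta_i'(x)\}^2 \le C\delta^{3/2}.
\end{aligned}
\]
Note that
\[
P\Bigl\{\sup_{|x-y|\le\delta} |U_n(x) - U_n(y)| > 4\varepsilon\Bigr\} \le (1 + [1/\delta])\, P\Bigl\{\sup_{y\le x\le y+\delta} |V_n(x)| > \varepsilon\Bigr\} + P\Bigl\{\max_{1\le i\le n} |g(S_i/a_n)| > \delta^{-1/4}\Bigr\},
\]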
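which implies (4.27). We complete the proof.

The proof of Theorem 4.3. Let $\{u_{ni}\}_{1\le i\le n}$ be a constant array with $\max_{1\le i\le n}|u_{ni}| = o(1)$. Along the lines of the proof of Lemma 4.3, we have
\[
\begin{aligned}
&\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g\Bigl(\frac{S_{i-1}}{a_n}\Bigr)\bigl[1_{\{\varepsilon_i\le x+u_{ni}\}} - 1_{\{\varepsilon_i\le x\}}\bigr] \\
&\quad = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g\Bigl(\frac{S_{i-1}}{a_n}\Bigr)\bigl(\zeta_i(x+u_{ni}) - \zeta_i(x)\bigr) + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g\Bigl(\frac{S_{i-1}}{a_n}\Bigr)\bigl(F(x+u_{ni}) - F(x)\bigr) + o_P(1) \\
&\quad = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g\Bigl(\frac{S_{i-1}}{a_n}\Bigr)\bigl(F(x+u_{ni}) - F(x)\bigr) + o_P(1).
\end{aligned}
\]
If $a_n(\hat\beta - \beta) = o_P(1)$, then
\[
\max_{1\le i\le n} (\hat\beta - \beta)X_i = o_P(1),
\]
since
\[
\max_{1\le i\le n} |X_i/a_n| = O_P(1).
\]
Thus, we have
\[
\frac{1}{\sqrt{n}}\bigl(\hat\alpha_n(x) - \alpha_n(x)\bigr) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(S_{i-1}/a_n)\bigl(F(x + (\hat\beta - \beta)X_i) - F(x)\bigr) + o_P(1).
\]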
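This completes the proof of Theorem 4.3.

The proof of Theorem 4.4. Let $\{u_{ni}\}_{1\le i\le n}$ be a constant array with $\max_i |u_{ni}| = o(1)$. Then by Lemma 4.4, we have
\[
\begin{aligned}
&\sup_{x\in[-A,A]} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)\, 1_{\{\varepsilon_i\le x+u_{ni}\}} \\
&\quad = \sup_{x\in[-A,A]} \Bigl[\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)\, 1_{\{\varepsilon_i\le x\}} + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)\bigl[\zeta_i(x+u_{ni}) - \zeta_i(x)\bigr] \\
&\qquad\qquad + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)\bigl(F(x+u_{ni}) - F(x)\bigr)\Bigr] + o_P(1) \\
&\quad = \sup_{x\in[-A,A]} \Bigl[\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)\, 1_{\{\varepsilon_i\le x\}} + \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)\bigl(F_i(x+u_{ni}) - F_i(x)\bigr)\Bigr] + o_P(1).
\end{aligned}
\]
As a result, by $\max_{1\le i\le n}(\hat\beta - 1)X_i = o_P(1)$ and Taylor's expansion, we have
\[
\sup_{x\in[-A,A]} \frac{\hat\alpha_n(x)}{\sqrt{n}} = \sup_{x\in[-A,A]} \Bigl[\frac{\alpha_n(x)}{\sqrt{n}} + f(x)(\hat\beta - 1)\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_{i-1}/a_n)X_{i-1}\Bigr] + o_P(1)
\]
in probability. Further, by Theorem 3 of Jakubowski (1996), it follows that
\[
\frac{1}{n}\sum_{i=1}^{n} \bigl[g(X_{i-1}/a_n)\,X_{i-1}/a_n\bigr] \xrightarrow{d} \int_0^1 g(S(t))S(t)\,dt.
\]
Corollary 4.1 yields (1). As for (2), noting that $\hat\beta$ is the LSE of $\beta$, we have
\[
n(\hat\beta - \beta) = \frac{\frac{1}{2}\bigl[X_n^2/a_n^2 - X_0^2/a_n^2 - \sum_{i=1}^{n} \varepsilon_i^2/a_n^2\bigr]}{\frac{1}{n}\sum_{i=1}^{n} X_{i-1}^2/a_n^2}
\xrightarrow{d} \frac{1}{2}\bigl(S^2(1) - S^2\bigr) \Big/ \int_0^1 S^2(t)\,dt =: \int_0^1 S(t-)\,dS(t) \Big/ \int_0^1 S^2(t)\,dt.
\]
The proof of Theorem 4.4 is completed.

4.3 Weak convergence of function index empirical processes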
In Section 4.1, we mentioned that Donsker's formulation was not quite correct: he discussed the weak convergence of the empirical process on $D[0,1]$ under the local uniform topology, but measurability of functionals of discontinuous processes may fail in this setting. Skorokhod (1956) introduced the Skorokhod metric to deal with this problem; however, the Skorokhod metric is quite complex. Dudley (1978), Talagrand (1987) and others reformulated Donsker's result to avoid both the measurability problem and the need for the Skorokhod metric. In this section, we introduce this approach and discuss the weak convergence of function index empirical processes. Function index empirical processes can be seen as an extension of classical empirical processes, but the tools used to study them are quite different.

Throughout this section, let $\{X_n\}_{n\ge 1}$ be a sequence of i.i.d. random variables defined on $(\Omega, \mathcal{F}, \mathbb{P})$, and let $P$ be the common distribution of $\{X_n\}_{n\ge 1}$. The so-called empirical measure of $\{X_n\}_{n\ge 1}$ is defined as
\[
P_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i},
\]
where $\delta_{X_i}$ is the Dirac measure at the observation $X_i$. Given a collection $\mathcal{M}$ of measurable functions $\mathbb{R}\to\mathbb{R}$, we define a map from $\mathcal{M}$ to $\mathbb{R}$ by
\[
g \mapsto P_n(g), \qquad g\in\mathcal{M}.
\]
Here we use the abbreviation $Q(g) = \int g\,dQ$ for a measurable function $g$ and a measure $Q$. Then
\[
P_n(g) = \int g\,dP_n = \frac{1}{n}\sum_{i=1}^{n} \int g\,d\delta_{X_i} = \frac{1}{n}\sum_{i=1}^{n} g(X_i).
\]

4.3.1 Preliminary
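As a concrete illustration of the notation $P_n(g)$ introduced at the start of this section, the following minimal Python sketch evaluates the map $g \mapsto P_n(g)$ on a small class of functions. The sample values, the three functions in the class and the names `empirical_measure`, `sample`, `M` and `Pn` are assumptions made only for this illustration.

```python
def empirical_measure(sample, g):
    """Return P_n(g) = (1/n) * sum_i g(X_i) for the given sample."""
    return sum(g(x) for x in sample) / len(sample)

# Hypothetical data: five observations chosen by hand for illustration.
sample = [0.1, 0.4, 0.4, 0.7, 0.9]

# A small class M of measurable functions R -> R (an assumption of this sketch).
M = {
    "indicator(x <= 0.5)": lambda x: 1.0 if x <= 0.5 else 0.0,
    "identity": lambda x: x,
    "square": lambda x: x * x,
}

# The map g -> P_n(g), evaluated on every member of M.
Pn = {name: empirical_measure(sample, g) for name, g in M.items()}

for name, value in Pn.items():
    print(f"P_n({name}) = {value:.3f}")
```

For an indicator $g = 1_{(-\infty, c]}$ the value $P_n(g)$ is the classical empirical distribution function at $c$, which is why function index empirical processes extend the classical ones.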
The core of this section is to investigate weak convergence in the space $l^\infty(\mathcal{M})$, the set of all uniformly bounded real functions on $\mathcal{M}$ with the norm
\[
\|z\|_{\mathcal{M}} = \sup_{g\in\mathcal{M}} |z(g)|.
\]
The metric space $(l^\infty(\mathcal{M}), \|\cdot\|_{\mathcal{M}})$ is not separable. Thus, if $f$ is a bounded continuous function on $l^\infty(\mathcal{M})$ and $\xi$ is a random element taking values in $l^\infty(\mathcal{M})$, the measurability of $f(\xi)$ may fail. This is quite different from the cases in previous chapters. Hence it is necessary to modify the definition of weak convergence for nonmeasurable maps on $(\Omega, \mathcal{F}, \mathbb{P})$.

Let $Z$ be an arbitrary map from $(\Omega, \mathcal{F}, \mathbb{P})$ to $\mathbb{R}$. The outer expectation of $Z$ is defined as
\[
E^* Z = \inf\{EU : U\ge Z,\ U : \Omega\to\mathbb{R} \text{ measurable and } EU \text{ exists}\}.
\]
Furthermore, the outer probability measure is defined as
\[
\mathbb{P}^*(A) = E^* 1_A
\]
for every set $A\subset\Omega$.

Definition 4.1. Let $\{H_n\}$ be a sequence of arbitrary, possibly nonmeasurable maps on $(\Omega, \mathcal{F}, \mathbb{P})$, and let $H$ be a Borel measurable map. If for every bounded continuous functional $f$,
\[
\lim_{n\to\infty} E^* f(H_n) = Ef(H),
\]
we say that $\{H_n\}$ converges weakly to $H$, and write $H_n\Rightarrow H$.

The function index empirical process is given by
\[
G_n(g) := P_n(g) - P(g) = \frac{1}{n}\sum_{i=1}^{n} \bigl(g(X_i) - P(g)\bigr), \tag{4.28}
\]
where $g\in\mathcal{M}$. The empirical process $G_n$ can be viewed as a random map taking values in $l^\infty(\mathcal{M})$. Consequently, it makes sense to investigate conditions under which
\[
\sqrt{n}\,G_n \Rightarrow G \tag{4.29}
\]
in $l^\infty(\mathcal{M})$, where $G$ is a Gaussian random element in $l^\infty(\mathcal{M})$.

Definition 4.2. A class $\mathcal{M}$ for which (4.29) is true is called a $P$-Donsker class.

To prove (4.29) in the sense of Definition 4.1, the results in Section 1.3 may not be enough, so we present a more general result here.

Theorem 4.5. Let $\{H_n\}$ be a sequence of random elements taking values in $l^\infty(\mathcal{M})$. Suppose that there exists a pseudometric $\|\cdot\|$ on $\mathcal{M}$ such that $(\mathcal{M}, \|\cdot\|)$ is totally bounded and $\{H_n\}$ is asymptotically equicontinuous, i.e.,
\[
\lim_{\delta\to 0}\limsup_{n\to\infty} \mathbb{P}^*\Bigl(\sup_{\|g-h\|\le\delta} |H_n(g) - H_n(h)| > \varepsilon\Bigr) = 0 \tag{4.30}
\]
for all $\varepsilon > 0$. Suppose furthermore that all the finite-dimensional distributions of $H_n$ converge in law to the corresponding finite-dimensional distributions of $H$, which has a tight probability distribution on $l^\infty(\mathcal{M})$. Then $H_n\Rightarrow H$ in $l^\infty(\mathcal{M})$.

Proof. Since $(\mathcal{M}, \|\cdot\|)$ is totally bounded, for every $\delta > 0$ there exist finitely many points $g_1,\cdots,g_{N(\delta)}$ such that
\[
\mathcal{M} \subset \bigcup_{i=1}^{N(\delta)} B(g_i,\delta),
\]
where $B(g,\delta)$ is the open ball with center $g$ and radius $\delta$. Thus, for each $g\in\mathcal{M}$, we can choose $\vartheta_\delta(g)\in\{g_1,\cdots,g_{N(\delta)}\}$ so that $\|\vartheta_\delta(g) - g\| < \delta$. Then define
\[
H_{n,\delta}(g) = H_n(\vartheta_\delta(g)), \qquad H_\delta(g) = H(\vartheta_\delta(g)).
\]
Note that $H_{n,\delta}$ and $H_\delta$ are approximations of $H_n$ and $H$ by (4.30), and $H_{n,\delta}$ and $H_\delta$ are indexed by a finite set. Convergence of the finite-dimensional distributions of $H_n$ to those of $H$ implies that
\[
H_{n,\delta} \Rightarrow H_\delta \tag{4.31}
\]
in $l^\infty(\mathcal{M})$. Let $f : l^\infty(\mathcal{M})\to\mathbb{R}$ be bounded and continuous. Then it follows that
\[
|E^* f(H_n) - Ef(H)| \le |E^* f(H_n) - Ef(H_{n,\delta})| + |Ef(H_{n,\delta}) - Ef(H_\delta)| + |Ef(H_\delta) - Ef(H)|.
\]
From (4.31),
\[
\limsup_{n\to\infty} |Ef(H_{n,\delta}) - Ef(H_\delta)| = 0.
\]
Given $\varepsilon > 0$, let $K\subset l^\infty(\mathcal{M})$ be a compact set such that $\mathbb{P}(H\in K^c) < \varepsilon$. There exists $\tau > 0$ such that $|f(x) - f(y)| < \varepsilon$ whenever $x\in K$ and $\|x - y\|_{\mathcal{M}} < \tau$, and for $\delta$ small enough
\[
\mathbb{P}(\|H_\delta - H\|_{\mathcal{M}}\ge\tau) < \varepsilon.
\]
Then
\[
|Ef(H_\delta) - Ef(H)| \le C\,\mathbb{P}\bigl(\{H\in K^c\}\cup\{\|H_\delta - H\|_{\mathcal{M}}\ge\tau\}\bigr) + \sup\{|f(x) - f(y)| : x\in K,\ \|x - y\|_{\mathcal{M}} < \tau\} \le C_1\varepsilon,
\]
which implies
\[
\lim_{\delta\to 0} |Ef(H_\delta) - Ef(H)| = 0.
\]
Furthermore, we have
\[
|Ef(H_n) - Ef(H_{n,\delta})| \le C\,\mathbb{P}(H_{n,\delta}\in K^c_{\tau/2}) + \mathbb{P}(\|H_n - H_{n,\delta}\|_{\mathcal{M}}\ge\tau/2) + \sup\{|f(x) - f(y)| : x\in K,\ \|x - y\|_{\mathcal{M}} < \tau\},
\]
where $K_{\tau/2}$ is the open $\tau/2$-neighborhood of the set $K$ for the sup norm. For $n$ large enough, if $H_{n,\delta}\in K_{\tau/2}$ and $\|H_{n,\delta} - H_n\|_{\mathcal{M}}\le\tau/2$, then there exists $x\in K$ such that $\|H_{n,\delta} - x\|_{\mathcal{M}}\le\tau/2$ and $\|x - H_n\|_{\mathcal{M}}\le\tau$. Now, (4.30) implies that there is a $\delta_1$ such that
\[
\limsup_{n\to\infty} \mathbb{P}^*(\|H_{n,\delta} - H_n\|_{\mathcal{M}}\ge\tau/2) < C\varepsilon
\]
for all $\delta < \delta_1$, and (4.31) yields
\[
\limsup_{n\to\infty} \mathbb{P}(H_{n,\delta}\in K^c_{\tau/2}) \le \mathbb{P}(H_\delta\in K^c_{\tau/2}) \le C\varepsilon.
\]
Hence
\[
\lim_{\delta\to 0}\limsup_{n\to\infty} |Ef(H_n) - Ef(H_{n,\delta})| = 0.
\]
The proof is complete.
Whether a given class $\mathcal{M}$ is a $P$-Donsker class depends on the size of the class. In fact, a finite class of square integrable functions is always a $P$-Donsker class, whereas the class of all square integrable uniformly bounded functions is almost never a $P$-Donsker class. A relatively convenient way to measure the size of $\mathcal{M}$ is to use the entropy. We give the definition of entropy in the following.

Definition 4.3. Let $(\mathcal{M}, \|\cdot\|)$ be a subset of a metric or pseudo-metric space. The covering number $N(\delta, \mathcal{M}, \|\cdot\|)$ is the minimal number of balls $\{h : \|h - f\| < \delta\}$ of radius $\delta$ needed to cover the set $\mathcal{M}$, where $f$ is the center of the ball.

Note that the centers of the balls need not belong to $\mathcal{M}$, but they should have finite norms. Let $(\mathcal{M}, \|\cdot\|)$ be a subset of a metric or pseudo-metric space of real functions. For two functions $l$ and $m$ with $l\le m$, the bracket $[l,m]$ is the set of all functions $f$ satisfying $l\le f\le m$; a $\delta$-bracket is a bracket $[l,m]$ with $\|m - l\| < \delta$.

Definition 4.4. The bracketing number $N_{[\,]}(\delta, \mathcal{M}, \|\cdot\|)$ is the minimal number of $\delta$-brackets needed to cover the set $\mathcal{M}$.

Definition 4.5. The entropy without bracketing is the logarithm of the covering number. The entropy with bracketing is the logarithm of the bracketing number.

In the study of weak convergence of empirical processes, we usually use the $L_r(Q)$-norm
\[
\|f\|_{L_r(Q)} = \Bigl(\int |f(x)|^r\,Q(dx)\Bigr)^{1/r}, \qquad f\in\mathcal{M}.
\]

Definition 4.6. An envelope function of a class $\mathcal{M}$ is any function $F(x)$ such that $|f(x)|\le F(x)$ for every $x\in\mathbb{R}$ and $f\in\mathcal{M}$.

In the following, we introduce some inequalities, and express them in an abstract setting using the Orlicz norm. Let $\psi$ be a nondecreasing convex function with $\psi(0) = 0$ and $\xi$ a random variable. The Orlicz norm $\|\xi\|_\psi$ is defined as
\[
\|\xi\|_\psi = \inf\Bigl\{C > 0 : E\psi\Bigl(\frac{|\xi|}{C}\Bigr)\le 1\Bigr\}.
\]
The best-known example of the Orlicz norm is the $L_p$-norm
\[
\|\xi\|_p = (E|\xi|^p)^{1/p},
\]
which is the case of $\psi(x) = x^p$ for $p > 1$. Another important family of Orlicz norms is given by $\psi_p(x) = \exp(x^p) - 1$ with $p\ge 1$. It is easy to obtain that
\[
\|\xi\|_{\psi_p} \le \|\xi\|_{\psi_q}(\log 2)^{p/q}, \quad p\le q, \qquad \|\xi\|_{L_p} \le p!\,\|\xi\|_{\psi_1}, \quad p\ge 1.
\]
Next, we present some useful inequalities.

Lemma 4.7. Let $\xi$ be a random variable with $\mathbb{P}(|\xi| > x)\le C_1\exp(-C_2 x^p)$ for every $x$, where $C_1$ and $C_2$ are constants, and $p\ge 1$. Then
\[
\|\xi\|_{\psi_p} \le \Bigl(\frac{1 + C_1}{C_2}\Bigr)^{1/p}.
\]

Proof. By Fubini's theorem,
\[
\begin{aligned}
E\bigl(\exp(D|\xi|^p) - 1\bigr) &= E\int_0^{|\xi|^p} D\exp(Ds)\,ds \\
&= \int_0^{+\infty} \mathbb{P}(|\xi| > s^{1/p})\,D\exp(Ds)\,ds \\
&\le \int_0^{+\infty} C_1 D\exp((D - C_2)s)\,ds = \frac{C_1 D}{C_2 - D}.
\end{aligned}
\]
The quantity $\frac{C_1 D}{C_2 - D}$ is less than or equal to 1 for $D^{-1/p}$ greater than or equal to $\bigl(\frac{1 + C_1}{C_2}\bigr)^{1/p}$, so the lemma follows from the definition of the Orlicz norm.
Lemma 4.8. Let $\psi$ be a convex nondecreasing nonzero function with $\psi(0) = 0$ and
\[
\limsup_{x,y\to\infty} \frac{\psi(x)\psi(y)}{\psi(cxy)} < \infty
\]
for some constant $c > 0$. Then for random variables $\xi_1,\cdots,\xi_m$,
\[
\Bigl\|\max_{1\le i\le m} \xi_i\Bigr\|_\psi \le C\psi^{-1}(m)\max_{1\le i\le m} \|\xi_i\|_\psi
\]
for a constant $C$ depending only on $\psi$.

Proof. We first assume that $\psi(x)\psi(y)\le\psi(cxy)$ for all $x, y\ge 1$. Then, for $y\ge 1$ and any $C > 0$,
\[
\begin{aligned}
\max_{1\le i\le m} \psi\Bigl(\frac{|\xi_i|}{Cy}\Bigr) &\le \max_{1\le i\le m} \Bigl[\frac{\psi(c|\xi_i|/C)}{\psi(y)} + \psi\Bigl(\frac{|\xi_i|}{Cy}\Bigr) 1_{\{\frac{|\xi_i|}{Cy} < 1\}}\Bigr] \\
&\le \sum_{i=1}^{m} \frac{\psi(c|\xi_i|/C)}{\psi(y)} + \psi(1).
\end{aligned}
\]
Obviously,
\[
E\psi\Bigl(\frac{|\xi_i|}{\max_{1\le i\le m}\|\xi_i\|_\psi}\Bigr) \le E\psi\Bigl(\frac{|\xi_i|}{\|\xi_i\|_\psi}\Bigr) \le 1.
\]
Letting $C = c\max_{1\le i\le m}\|\xi_i\|_\psi$, we obtain
\[
E\psi\Bigl(\frac{\max_{1\le i\le m}|\xi_i|}{Cy}\Bigr) \le E\max_{1\le i\le m} \psi\Bigl(\frac{|\xi_i|}{Cy}\Bigr) \le \frac{m}{\psi(y)} + \psi(1).
\]
When $\psi(1)\le 1/2$, the choice $y = \psi^{-1}(2m)$ satisfies $y\ge 1$ and gives
\[
E\psi\Bigl(\frac{\max_{1\le i\le m}|\xi_i|}{Cy}\Bigr) \le 1.
\]
Thus
\[
\Bigl\|\max_{1\le i\le m} \xi_i\Bigr\|_\psi \le \psi^{-1}(2m)\,c\max_{1\le i\le m} \|\xi_i\|_\psi.
\]
By the convexity of $\psi$ and $\psi(0) = 0$, we have $\psi^{-1}(2m)\le 2\psi^{-1}(m)$, and the conclusion is obtained in this special case. For a general $\psi$, there are constants $a\le 1$ and $b > 0$ such that $g(x) = a\psi(bx)$ satisfies the conditions of the previous argument. Observing that
\[
\|\xi_i\|_\psi \le \frac{\|\xi_i\|_g}{ab} \le \frac{\|\xi_i\|_\psi}{a},
\]
we complete the proof.
The previous inequality is useless for a supremum over infinitely many variables, for example, over a stochastic process. Such a case can be handled via a method known as chaining. To obtain a similar inequality for a stochastic process $\{X_t : t\in T\}$, we need to introduce several notions concerning the size of the index set $T$. Let $(T,d)$ be a metric or pseudo-metric space.

Lemma 4.9. Let $(T,d)$ be a pseudo-metric space and $\{X_t, t\in T\}$ a stochastic process indexed by $T$. Let $\psi$ be a convex nondecreasing nonzero function with $\psi(0) = 0$,
\[
\limsup_{x,y\to\infty} \frac{\psi^{-1}(xy)}{\psi^{-1}(x)\psi^{-1}(y)} < \infty, \qquad \limsup_{x\to\infty} \frac{\psi^{-1}(x^2)}{\psi^{-1}(x)} < \infty
\]
and
\[
\|X_t - X_s\|_\psi \le d(s,t), \qquad s, t\in T.
\]
Then for all finite subsets $S\subset T$, $t_0\in T$ and $\varepsilon > 0$,
\[
\Bigl\|\max_{t\in S} |X_t|\Bigr\|_\psi \le \|X_{t_0}\|_\psi + C\int_0^{D} \psi^{-1}(N(x,T,d))\,dx, \tag{4.32}
\]
\[
\Bigl\|\max_{s,t\in S,\ d(s,t)<\varepsilon} |X_t - X_s|\Bigr\|_\psi \le C\int_0^{\varepsilon/2} \psi^{-1}(N(x,T,d))\,dx \tag{4.33}
\]
for a constant $C$ depending only on $\psi$, where $D$ is the diameter of $(T,d)$.
Proof. If $(T,d)$ is not totally bounded, the right sides of (4.32) and (4.33) are infinite. Hence, we assume that $(T,d)$ is totally bounded and has diameter less than 1. For the sake of simplicity, we assume $t_0\in S$ and $X_{t_0} = 0$. For any integer $k > 0$, let $S_k = \{s_1^k,\cdots,s_{N_k}^k\}\subset S$ be the centers of $N_k\equiv N(2^{-k}, S, d)$ open balls of radius at most $2^{-k}$ and centers in $S$ that cover $S$. Set $S_0 = \{t_0\}$. Let $\pi_k : S\to S_k$ be a map such that $d(s,\pi_k(s)) < 2^{-k}$ for all $s\in S$. Since $S$ is finite, there exists an integer $k_S$ such that $d(\pi_k(s), s) = 0$ for $k\ge k_S$ and $s\in S$. Therefore, for $s\in S$,
\[
X_s = \sum_{k=1}^{k_S} \bigl(X_{\pi_k(s)} - X_{\pi_{k-1}(s)}\bigr)
\]
almost surely. Note that
\[
d(\pi_{k-1}(s),\pi_k(s)) \le d(s,\pi_k(s)) + d(s,\pi_{k-1}(s)) < 3\cdot 2^{-k}.
\]
Therefore,
\[
\begin{aligned}
\Bigl\|\max_{s\in S} |X_s|\Bigr\|_\psi &\le \sum_{k=1}^{k_S} \Bigl\|\max_{s\in S} |X_{\pi_k(s)} - X_{\pi_{k-1}(s)}|\Bigr\|_\psi \\
&\le 3C_\psi \sum_{k=1}^{k_S} 2^{-k}\psi^{-1}(N_k N_{k-1}) \\
&\le C\sum_{k=1}^{k_S} 2^{-k}\psi^{-1}(N_k).
\end{aligned}
\]
This implies (4.32), since $N(2x,S,d)\le N(x,T,d)$ for any $x > 0$ and the sum in the last display can be bounded by the integral in (4.32).

To prove (4.33), for $\varepsilon > 0$, let $V = \{(s,t) : s,t\in T,\ d(s,t)\le\varepsilon\}$, and for $v := (t_v,s_v)\in V$ define $Y_v = X_{t_v} - X_{s_v}$. For $u,v\in V$, define the pseudo-metric $\rho(u,v) = \|Y_u - Y_v\|_\psi$. We can assume that $\varepsilon\le\mathrm{diam}(T)$. Note that
\[
\mathrm{diam}_\rho(V) := \sup_{u,v\in V} \rho(u,v) \le 2\max_{v\in V} \|Y_v\|_\psi \le 2\varepsilon,
\]
and furthermore,
\[
\rho(u,v) \le \|X_{t_v} - X_{t_u}\|_\psi + \|X_{s_v} - X_{s_u}\|_\psi \le d(t_v,t_u) + d(s_v,s_u).
\]
If $t_1,\cdots,t_N$ are the centers of a covering of $T$ by open balls of radius at most $x$, then the set of open balls with centers in $\{(t_i,t_j) : 1\le i,j\le N\}$ and $\rho$-radius $2x$ covers $V$. Furthermore, if the $2x$-ball about $(t_i,t_j)$ has a non-empty intersection with $V$, then it is contained in a ball of radius $4x$ centered at a point in $V$. Thus
\[
N(4x,V,\rho) \le N^2(x,T,d).
\]
Thus, applying (4.32) to the process $Y$, we find that
\[
\begin{aligned}
\Bigl\|\max_{s,t\in S,\ d(s,t)<\varepsilon} |X_t - X_s|\Bigr\|_\psi &\le C\int_0^{2\varepsilon} \psi^{-1}(N(x,V,\rho))\,dx \\
&\le C\int_0^{2\varepsilon} \psi^{-1}(N^2(x/4,T,d))\,dx \\
&\le C'\int_0^{\varepsilon/2} \psi^{-1}(N(x,T,d))\,dx.
\end{aligned}
\]
Definition 4.7. A process $\{X_t : t\in T\}$ is separable if there is a countable set $T_0\subset T$ and a subset $\Omega_0\subset\Omega$ with $\mathbb{P}(\Omega_0) = 1$ such that for all $\omega\in\Omega_0$, $t\in T$, and $\varepsilon > 0$, $X(t,\omega)$ is in the closure of $\{X(s,\omega) : s\in T_0\cap B(t,\varepsilon)\}$, where $B(t,\varepsilon)$ is the open $\varepsilon$-ball around $t$.

If $X$ is separable, it is easily seen that
\[
\Bigl\|\sup_{t\in T} |X_t|\Bigr\|_\psi = \sup_{S\subset T,\ S \text{ finite}} \Bigl\|\max_{t\in S} |X_t|\Bigr\|_\psi
\]
and similarly for $\|\sup_{s,t\in T,\ d(s,t)<\varepsilon} |X_t - X_s|\|_\psi$. The following lemma is an easy consequence of the preceding lemma.

Lemma 4.10. Let $(T,d)$ be a pseudo-metric space and $\{X_t, t\in T\}$ a separable stochastic process. Let $\psi$ be a convex nondecreasing nonzero function with $\psi(0) = 0$,
\[
\limsup_{x,y\to\infty} \frac{\psi^{-1}(xy)}{\psi^{-1}(x)\psi^{-1}(y)} < \infty, \qquad \limsup_{x\to\infty} \frac{\psi^{-1}(x^2)}{\psi^{-1}(x)} < \infty,
\]
and
\[
\|X_t - X_s\|_\psi \le d(s,t), \qquad s, t\in T.
\]
Then for all $\varepsilon > 0$,
\[
\Bigl\|\sup_{t\in T} |X_t|\Bigr\|_\psi \le \|X_{t_0}\|_\psi + C\int_0^{D} \psi^{-1}(N(x,T,d))\,dx, \tag{4.34}
\]
\[
\Bigl\|\sup_{s,t\in T,\ d(s,t)<\varepsilon} |X_t - X_s|\Bigr\|_\psi \le C\int_0^{\varepsilon} \psi^{-1}(N(x,T,d))\,dx \tag{4.35}
\]
for a constant $C$ depending only on $\psi$, where $D$ is the diameter of $(T,d)$.

If the stochastic process has multivariate normal finite-dimensional distributions (that is, it is a Gaussian process), we have the following lemma. In particular, we define a pseudo-metric $\rho_X$ by $\rho_X^2(s,t) = E[(X_t - X_s)^2]$, where $X$ is a Gaussian process.
Lemma 4.11. Let $(T,\rho_X)$ be a pseudo-metric space and $\{X_t, t\in T\}$ a separable Gaussian process indexed by $T$. Then for all $0 < \varepsilon < D$,
\[
\Bigl\|\sup_{t\in T} |X_t|\Bigr\|_{\psi_2} \le \|X_{t_0}\|_{\psi_2} + C\int_0^{D} \sqrt{\log N(x,T,\rho_X)}\,dx, \tag{4.36}
\]
\[
\Bigl\|\sup_{s,t\in T,\ \rho_X(s,t)<\varepsilon} |X_t - X_s|\Bigr\|_{\psi_2} \le C\int_0^{\varepsilon} \sqrt{\log N(x,T,\rho_X)}\,dx \tag{4.37}
\]
for a constant $C$ depending only on $\psi_2$, where $D$ is the diameter of $(T,\rho_X)$.

Proof. It is easy to see that if $Z\sim N(0,1)$,
\[
E\exp\Bigl(\frac{Z^2}{c^2}\Bigr) = \frac{1}{\sqrt{1 - 2/c^2}} < \infty
\]
for $c^2 > 2$. Choosing $c = \sqrt{8/3}$, we have
\[
E\exp\Bigl(\frac{Z^2}{c^2}\Bigr) = 2.
\]
Hence $\|Z\|_{\psi_2} = \sqrt{8/3}$. By homogeneity this yields $\|aZ\|_{\psi_2} = |a|\sqrt{8/3}$. Thus
\[
\|X_t - X_s\|_{\psi_2} = \sqrt{8/3}\,\rho_X(s,t).
\]
Furthermore,
\[
\psi_2^{-1}(x) = \sqrt{\log(1 + x)} \le C\sqrt{\log x}, \qquad x\ge 2.
\]
Applying Lemma 4.10, we complete the proof.
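The value $\|Z\|_{\psi_2} = \sqrt{8/3}$ used in the proof above can be checked numerically. The sketch below takes the closed form $E\exp(Z^2/c^2) = (1 - 2/c^2)^{-1/2}$ from the proof as given and solves $E\psi_2(|Z|/c) = 1$ for $c$ by bisection. The function names and the bracketing interval are choices made for this illustration only.

```python
import math

def mgf_of_square(c):
    """E exp(Z^2 / c^2) for Z ~ N(0, 1); the formula is valid only when c^2 > 2."""
    return 1.0 / math.sqrt(1.0 - 2.0 / (c * c))

def psi2_norm_standard_normal(tol=1e-12):
    """Smallest c with E exp(Z^2 / c^2) - 1 <= 1, located by bisection."""
    # The expectation is decreasing in c on (sqrt(2), infinity).
    lo, hi = math.sqrt(2.0) + 1e-9, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mgf_of_square(mid) > 2.0:
            lo = mid   # Orlicz condition violated: c must be larger
        else:
            hi = mid   # condition satisfied: try a smaller c
    return hi

c_star = psi2_norm_standard_normal()
print(c_star, math.sqrt(8.0 / 3.0))
```

Both printed numbers should agree to many digits, which matches the choice $c = \sqrt{8/3}$ in the proof.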
The previous lemma can be extended, virtually without change, to sub-Gaussian processes, that is, to processes $X_t$, $t\in T$, which satisfy, with respect to a pseudo-metric $d$ on $T$,
\[
\mathbb{P}(|X_s - X_t| > x) \le 2\exp\Bigl(-\frac{x^2}{2d^2(s,t)}\Bigr).
\]

Example 4.1. Suppose that $\{\varepsilon_n\}_{n\ge 1}$ is a sequence of independent Rademacher random variables (that is, $\mathbb{P}(\varepsilon_i = \pm 1) = \frac{1}{2}$), and let
\[
X_t = \sum_{i=1}^{k} t_i\varepsilon_i, \qquad t = (t_1,t_2,\cdots,t_k)\in\mathbb{R}^k.
\]
Then by Hoeffding's inequality
\[
\mathbb{P}(|X_s - X_t| > x) \le 2\exp\Bigl(-\frac{x^2}{2\|s - t\|^2}\Bigr),
\]
where $\|\cdot\|$ is the Euclidean norm, i.e., $X_t$, $t\in\mathbb{R}^k$, is a sub-Gaussian process.
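For small $k$ the tail bound of Example 4.1 can be verified exactly, since a Rademacher sum takes only $2^k$ equally likely values. The following sketch enumerates all sign vectors and compares the exact tail probability with the Hoeffding bound. The vector `t` plays the role of the difference $s - t$ in the example; its entries and the grid of thresholds are hypothetical values chosen for illustration.

```python
import itertools
import math

def exact_tail(t, x):
    """P(|sum_i t_i * eps_i| > x), computed by enumerating all 2^k sign vectors."""
    k = len(t)
    hits = sum(
        1
        for signs in itertools.product((-1, 1), repeat=k)
        if abs(sum(ti * si for ti, si in zip(t, signs))) > x
    )
    return hits / 2 ** k

def hoeffding_bound(t, x):
    """The sub-Gaussian bound 2 * exp(-x^2 / (2 * ||t||^2)) with the Euclidean norm."""
    norm_sq = sum(ti * ti for ti in t)
    return 2.0 * math.exp(-x * x / (2.0 * norm_sq))

t = (1.0, 0.5, -2.0, 1.5)                 # hypothetical coefficient vector
grid = [0.5 * j for j in range(1, 11)]    # thresholds x = 0.5, 1.0, ..., 5.0
rows = [(x, exact_tail(t, x), hoeffding_bound(t, x)) for x in grid]

for x, exact, bound in rows:
    print(f"x = {x:3.1f}   exact tail = {exact:.4f}   bound = {bound:.4f}")
```

In every row the exact tail should not exceed the bound, in line with the sub-Gaussian property stated in the example.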
Next, we introduce the symmetrization inequalities, which are the main tools for deriving the weak convergence of function index empirical processes. To prove the weak convergence of
\[
P_n(g) - P(g) = \frac{1}{n}\sum_{i=1}^{n} \bigl(g(X_i) - P(g)\bigr),
\]
the main approach is based on the principle of comparing the empirical process to the symmetrized processes
\[
G_n^0 g = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)
\]
or
\[
G_n^+ g = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i\bigl(g(X_i) - P(g)\bigr),
\]
where $\varepsilon_n$, $n\ge 1$, are i.i.d. Rademacher random variables independent of $\{X_n\}_{n\ge 1}$. It will be shown that the law of large numbers or weak convergence holds for one of these processes if and only if it holds for the other two processes.

It will be convenient to generalize the inequality beyond the empirical process setting. We instead consider sums of independent stochastic processes $\{Z_i(g) : g\in\mathcal{M}\}_{i\ge 1}$. The empirical process corresponds to taking $Z_i(g) = g(X_i) - Pg$. We do not assume any measurability of $Z_i$, but for computing outer expectations $E^*$, it will be understood that the underlying probability space is a product space $(\mathbb{R}^n, \mathcal{B}^n, P^n)\times(\mathcal{Z}, \mathcal{C}, Q)$, where $(\mathbb{R}^n, \mathcal{B}^n, P^n)$ is the $n$-fold product space of $(\mathbb{R}, \mathcal{B}, P)$ and $(\mathcal{Z}, \mathcal{C}, Q)$ is an auxiliary probability space. Each $Z_i$ is a function of the $i$th coordinate of $(z,x) = (z_1,\cdots,z_n,x)$.

Lemma 4.12. Let $Z_1,\cdots,Z_n$ be independent stochastic processes with mean 0. Then for any nondecreasing convex function $\Phi : \mathbb{R}\to\mathbb{R}$ and arbitrary functions $S_i : \mathcal{M}\to\mathbb{R}$,
\[
E^*\Phi\Bigl(\frac{1}{2}\Bigl\|\sum_{i=1}^{n} \varepsilon_i Z_i\Bigr\|_{\mathcal{M}}\Bigr) \le E^*\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} Z_i\Bigr\|_{\mathcal{M}}\Bigr) \le E^*\Phi\Bigl(2\Bigl\|\sum_{i=1}^{n} \varepsilon_i(Z_i - S_i)\Bigr\|_{\mathcal{M}}\Bigr).
\]

Proof. Let $Y_1,\cdots,Y_n$ be an independent copy of $Z_1,\cdots,Z_n$ defined on $(\mathbb{R}^n, \mathcal{B}^n, P^n)\times(\mathcal{Z}, \mathcal{C}, Q)\times(\mathbb{R}^n, \mathcal{B}^n, P^n)$ and depending on the last $n$ coordinates. Since $EY_i = 0$, $E^*\Phi(\frac{1}{2}\|\sum_{i=1}^{n} \varepsilon_i Z_i\|_{\mathcal{M}})$ can be seen as an average of expressions of the type
\[
E_Z^*\Phi\Bigl(\frac{1}{2}\Bigl\|\sum_{i=1}^{n} e_i(Z_i - EY_i)\Bigr\|_{\mathcal{M}}\Bigr),
\]
where $(e_1,\cdots,e_n)$ ranges over $\{-1,1\}^n$ and $E_Z^*$ is the outer expectation with respect to $Z_1, Z_2,\cdots,Z_n$ computed for the given probability measure. The definitions of $E_Y^*$ and $E_{Z,Y}^*$ are similar. By the convexity of $\Phi$ and the definition of the norm $\|\cdot\|_{\mathcal{M}}$,
\[
E_Z^*\Phi\Bigl(\frac{1}{2}\Bigl\|\sum_{i=1}^{n} e_i(Z_i - EY_i)\Bigr\|_{\mathcal{M}}\Bigr) \le E_{Z,Y}^*\Phi\Bigl(\frac{1}{2}\Bigl\|\sum_{i=1}^{n} e_i(Z_i - Y_i)\Bigr\|_{\mathcal{M}}\Bigr) = E_{Z,Y}^*\Phi\Bigl(\frac{1}{2}\Bigl\|\sum_{i=1}^{n} (Z_i - Y_i)\Bigr\|_{\mathcal{M}}\Bigr)
\]
by Jensen's inequality. Using the triangle inequality and the convexity of $\Phi$ yields the left side inequality. To prove the inequality on the right side, note that for fixed values of the $Z_i$'s we have
\[
\Bigl\|\sum_{i=1}^{n} Z_i\Bigr\|_{\mathcal{M}} = \sup_{f\in\mathcal{M}} \Bigl|\sum_{i=1}^{n} \bigl(Z_i(f) - EY_i(f)\bigr)\Bigr| \le E_Y^*\sup_{f\in\mathcal{M}} \Bigl|\sum_{i=1}^{n} \bigl(Z_i(f) - Y_i(f)\bigr)\Bigr|.
\]
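Since $\Phi$ is convex, Jensen's inequality yields
\[
\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} Z_i\Bigr\|_{\mathcal{M}}\Bigr) \le E_Y\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} \bigl(Z_i(f) - Y_i(f)\bigr)\Bigr\|_{\mathcal{M}}^{*Y}\Bigr),
\]
where $*Y$ denotes the minimal measurable majorant of the supremum with respect to $Y_1,\cdots,Y_n$ with $Z_1,\cdots,Z_n$ fixed. Taking the outer expectation with respect to $Z_1,\cdots,Z_n$, we have
\[
E_Z^*\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} Z_i\Bigr\|_{\mathcal{M}}\Bigr) \le E_Z^* E_Y\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} \bigl(Z_i(f) - Y_i(f)\bigr)\Bigr\|_{\mathcal{M}}\Bigr).
\]
Note that adding a minus sign in front of a term $(Z_i(f) - Y_i(f))$ has the effect of exchanging $Z_i$ and $Y_i$. By construction of the underlying probability space,
\[
E^*\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} e_i(Z_i - Y_i)\Bigr\|_{\mathcal{M}}\Bigr)
\]
is the same for every $n$-tuple $(e_1,\cdots,e_n)\in\{-1,1\}^n$, thus
\[
E^*\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} Z_i\Bigr\|_{\mathcal{M}}\Bigr) \le E_\varepsilon^* E_{Z,Y}^*\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} \varepsilon_i(Z_i - Y_i)\Bigr\|_{\mathcal{M}}\Bigr).
\]
Now, adding and subtracting $S_i$ inside the right side and using the triangle inequality and the convexity of $\Phi$, we have
\[
\begin{aligned}
&E_\varepsilon^* E_{Z,Y}^*\Phi\Bigl(\Bigl\|\sum_{i=1}^{n} \varepsilon_i(Z_i - Y_i)\Bigr\|_{\mathcal{M}}\Bigr) \\
&\quad \le \frac{1}{2} E_\varepsilon^* E_{Z,Y}^*\Phi\Bigl(2\Bigl\|\sum_{i=1}^{n} \varepsilon_i(Z_i - S_i)\Bigr\|_{\mathcal{M}}\Bigr) + \frac{1}{2} E_\varepsilon^* E_{Z,Y}^*\Phi\Bigl(2\Bigl\|\sum_{i=1}^{n} \varepsilon_i(S_i - Y_i)\Bigr\|_{\mathcal{M}}\Bigr).
\end{aligned}
\]
Finally, the repeated outer expectations $E_\varepsilon^* E_{Z,Y}^*$ can be replaced by a joint outer expectation $E^* = E_{\varepsilon,Z,Y}^*$. We complete the proof.

By taking $Z_i(g) = g(X_i) - Pg$, we have the following corollary.

Corollary 4.2. If $\Phi$ is a nondecreasing and convex function, then
\[
E^*\Phi\Bigl(\frac{1}{2}\|G_n^+\|_{\mathcal{M}}\Bigr) \le E^*\Phi(\|P_n - P\|_{\mathcal{M}}) \le E^*\Phi(2\|G_n^0\|_{\mathcal{M}}) \wedge E^*\Phi(2\|G_n^+\|_{\mathcal{M}}).
\]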
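The chain $E\Phi(\frac{1}{2}\|G_n^+\|_{\mathcal{M}}) \le E\Phi(\|P_n - P\|_{\mathcal{M}}) \le E\Phi(2\|G_n^0\|_{\mathcal{M}})$ from the corollary can be checked by exact enumeration in a toy case where no measurability issue arises. The sketch below takes $\Phi(x) = x$ and assumes, purely for illustration, that $X$ is uniform on $\{0, 1, 2\}$, that $n = 3$, and that $\mathcal{M}$ consists of two indicator functions. All names in the code are hypothetical.

```python
import itertools

support = (0, 1, 2)     # hypothetical toy law: X uniform on {0, 1, 2}
n = 3                   # sample size, small enough for full enumeration
thresholds = (0, 1)     # class M = {x -> 1{x <= c} : c in thresholds}

def g(c, x):
    """Indicator function 1{x <= c}."""
    return 1.0 if x <= c else 0.0

def P(c):
    """P(g_c) under the uniform law on the support."""
    return sum(g(c, x) for x in support) / len(support)

def sup_norm(values):
    """The norm ||z||_M = sup over the class of |z(g)|."""
    return max(abs(v) for v in values)

samples = list(itertools.product(support, repeat=n))   # all 27 samples
signs = list(itertools.product((-1, 1), repeat=n))     # all 8 sign vectors

left = middle = right = 0.0
for xs in samples:
    px = 1.0 / len(samples)
    # E ||P_n - P||_M
    middle += px * sup_norm(
        [sum(g(c, x) - P(c) for x in xs) / n for c in thresholds])
    for es in signs:
        w = px / len(signs)
        # E (1/2) ||G_n^+||_M, the centered symmetrized process
        left += w * 0.5 * sup_norm(
            [sum(e * (g(c, x) - P(c)) for e, x in zip(es, xs)) / n
             for c in thresholds])
        # E 2 ||G_n^0||_M, the uncentered symmetrized process
        right += w * 2.0 * sup_norm(
            [sum(e * g(c, x) for e, x in zip(es, xs)) / n
             for c in thresholds])

print(left, middle, right)
```

The three printed numbers should be increasing, in agreement with the corollary for this choice of $\Phi$.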
4.3.2 Glivenko-Cantelli theorems
Although our interest is weak convergence, it is necessary to present the Glivenko-Cantelli theorems for completeness of the empirical process theory.

Theorem 4.6. Let $\mathcal{M}$ be a class of measurable functions such that $N_{[\,]}(\varepsilon, \mathcal{M}, L_1(P)) < \infty$ for every $\varepsilon > 0$. Then $\mathcal{M}$ is $P$-Glivenko-Cantelli, that is,
\[
\|P_n - P\|_{\mathcal{M}} = \sup_{g\in\mathcal{M}} |P_n g - Pg| \xrightarrow{a.s.} 0.
\]

Proof. For any $\varepsilon > 0$, choose $m = N_{[\,]}(\varepsilon, \mathcal{M}, L_1(P))$ $\varepsilon$-brackets $\{[l_i,u_i]\}$ whose union contains $\mathcal{M}$ and which satisfy $P(u_i - l_i) < \varepsilon$ for every $i$. For $g$ in the bracket $[l_i,u_i]$, we have
\[
(P_n - P)g \le (P_n - P)u_i + P(u_i - g) \le (P_n - P)u_i + \varepsilon.
\]
Similarly,
\[
(P - P_n)g \le (P - P_n)l_i + P(g - l_i) \le (P - P_n)l_i + \varepsilon.
\]
Thus
\[
\sup_{g\in\mathcal{M}} |(P_n - P)g| \le \max_{1\le i\le m} (P_n - P)u_i \vee \max_{1\le i\le m} (P - P_n)l_i + \varepsilon,
\]
where the right side converges almost surely to $\varepsilon$ by the strong law of large numbers. Then
\[
\limsup_{n\to\infty} \|P_n - P\|_{\mathcal{M}} \le \varepsilon
\]
almost surely for every $\varepsilon > 0$. The next theorem is simpler.
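Before turning to that result, the statement of Theorem 4.6 can be illustrated by simulation in the classical case. The sketch assumes that $P$ is the uniform distribution on $[0,1]$ and that $\mathcal{M}$ is the class of indicators $1_{(-\infty,c]}$, for which $\|P_n - P\|_{\mathcal{M}} = \sup_x |F_n(x) - x|$ can be computed from the order statistics. The seed, the sample sizes and the function name are arbitrary choices for this illustration.

```python
import random

def sup_distance_uniform(sample):
    """sup_x |F_n(x) - x| for a sample from U(0, 1), computed at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    # At the i-th order statistic (0-based) F_n jumps from i/n to (i+1)/n.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(0)          # fixed seed so that the run is reproducible
sizes = [100, 1000, 20000]
distances = [
    sup_distance_uniform([rng.random() for _ in range(n)]) for n in sizes
]

for n, d in zip(sizes, distances):
    print(f"n = {n:6d}   sup|F_n - F| = {d:.4f}")
```

The printed distances should shrink as $n$ grows. The simulation only illustrates the almost sure convergence and is not a substitute for the bracketing argument in the proof.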
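Theorem 4.7. Let $\mathcal{M}$ be a class of $P$-measurable functions that is $L_1(P)$-bounded. Then $\mathcal{M}$ is $P$-Glivenko-Cantelli if

(1) $P^* F < \infty$;

(2)
\[
\lim_{n\to\infty} \frac{E^*\log N(\varepsilon, \mathcal{M}_C, L_2(P_n))}{n} = 0
\]
for all $C < \infty$ and $\varepsilon > 0$,

where $F$ is an envelope function of $\mathcal{M}$ and $\mathcal{M}_C$ is the class of functions $\{g 1_{\{F\le C\}} : g\in\mathcal{M}\}$.

Proof. By Corollary 4.2, the measurability of the class $\mathcal{M}$ and Fubini's theorem,
\[
\begin{aligned}
E^*\|P_n - P\|_{\mathcal{M}} &\le 2E_X E_\varepsilon \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{M}} \\
&\le 2E_X E_\varepsilon \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{M}_C} + 2P^* F 1_{\{F > C\}}.
\end{aligned}
\]
For sufficiently large $C$, the last term is arbitrarily small. It suffices to show that the first term converges to 0 for fixed $C$. If $\mathcal{G}$ is an $\varepsilon$-net over $\mathcal{M}_C$ in $L_2(P_n)$, then it is also an $\varepsilon$-net over $\mathcal{M}_C$ in $L_1(P_n)$. Thus
\[
E_\varepsilon \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{M}_C} \le E_\varepsilon \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{G}} + \varepsilon.
\]
The cardinality of $\mathcal{G}$ can be chosen equal to $N(\varepsilon, \mathcal{M}_C, L_2(P_n))$. We use Lemma 4.8 with $\psi_2(x) = \exp(x^2) - 1$, so the right side of the last display is bounded by a constant multiple of
\[
\sqrt{1 + \log N(\varepsilon, \mathcal{M}_C, L_2(P_n))}\ \sup_{g\in\mathcal{G}} \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\psi_2|X} + \varepsilon,
\]
where $\|\cdot\|_{\psi_2|X}$ is taken over $\varepsilon_1,\cdots,\varepsilon_n$ with $X_1,\cdots,X_n$ fixed. By Example 4.1 and Lemma 4.7,
\[
\Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\psi_2|X} \le \sqrt{6}\,\Bigl(\frac{1}{n^2}\sum_{i=1}^{n} g^2(X_i)\Bigr)^{1/2} = \sqrt{\frac{6}{n}}\,(P_n g^2)^{1/2} \le \sqrt{\frac{6}{n}}\,C.
\]
Thus
\[
\sqrt{1 + \log N(\varepsilon, \mathcal{M}_C, L_2(P_n))}\ \sup_{g\in\mathcal{G}} \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\psi_2|X} + \varepsilon \to \varepsilon
\]
in outer probability. Then
\[
E_\varepsilon \Bigl\|\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i g(X_i)\Bigr\|_{\mathcal{M}_C} \to 0
\]
in probability. This concludes the proof that $E^*\|P_n - P\|_{\mathcal{M}}\to 0$. Furthermore, $\|P_n - P\|_{\mathcal{M}}^*$ is a reverse submartingale with respect to a suitable filtration, and hence almost sure convergence follows from the reverse submartingale convergence theorem.

It is useful to specialize Theorem 4.7 to the case of indicator functions of some class $\mathcal{C}$ of subsets of $\mathbb{R}$. Set
\[
\Delta_n^{\mathcal{C}}(x_1,\cdots,x_n) = \#\{C\cap\{x_1,\cdots,x_n\} : C\in\mathcal{C}\}.
\]
We also use $\mathcal{C}$ to denote the class of indicator functions of the sets in $\mathcal{C}$.

Theorem 4.8. If $\mathcal{C}$ is a $P$-measurable class of sets and
\[
\lim_{n\to\infty} \frac{E\log\Delta_n^{\mathcal{C}}(X_1,\cdots,X_n)}{n} = 0,
\]
then
\[
\|P_n - P\|_{\mathcal{C}}^* \xrightarrow{a.s.} 0.
\]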
April 23, 2014
10:51
World Scientific Book - 9.75in x 6.5in
linwang
159
Convergence of Empirical Processes
Proof.
Note that for any r > 0, N (ε, C, Lr (Pn )) = N (εr
−1
∨1
, C, L∞ (Pn )) ≤ (
2 εr −1 ∨1
)n ,
and ||f − g||Lr (Pn ) = {Pn |f − g|r }1/(r∨1) , ||f − g||L∞ (Pn ) = max |f (Xi ) − g(X1 )|. 1≤i≤n
If we can prove ΔC n (X1 , · · · , Xn ) = N (ε, C, L∞ (Pn ))
(4.38)
we can obtain this theorem by Theorem 4.7. Let C1 , · · · , Ck with k = N (ε, C, L∞(Pn )) form an ε−net for C for the L∞ (Pn )) metric. If C ∈ C satisfies max (1CCj (Xi ) + 1Cj C (Xi )) = max |1C (Xi ) − 1Cj (Xi )| < ε < 1
1≤i≤n
1≤i≤n
for some j ∈ {1, · · · , k}, then the left side must be 0, and hence no Xi is in either CCj or Cj C. Thus k = {{X1 , · · · , Xn } C, C ∈ C};
(4.38) is obtained. 4.3.3
Donsker theorem
Now we come to the central part of this section and develop the Donsker theorem, or equivalently the uniform central limit theorem, for classes of functions and sets. Here is the main result of this section.

Theorem 4.9. Let $\{X_i\}_{i\ge 1}$ be a sequence of i.i.d. random variables defined on the canonical space $(\mathbb{R}, \mathcal{B}, P)$, where $P$ is the common distribution of $\{X_i\}_{i\ge 1}$. Suppose that $\mathcal{M}$ is a class of measurable functions with envelope function $F$ satisfying:

(i)
$$\int_0^\infty \sup_Q \sqrt{\log N(x\|F\|_{Q,2}, \mathcal{M}, L_2(Q))}\,dx < \infty,$$
where the supremum is taken over all finitely discrete measures $Q$ on $(\mathbb{R}, \mathcal{B})$ with $\|F\|^2_{Q,2} = \int_{\mathbb{R}} F^2\,dQ > 0$;

(ii) $P^* F^2 < \infty$;

(iii) for all $\delta > 0$, the classes $\mathcal{M}_\delta = \{f - g : f, g \in \mathcal{M}, \|f - g\|_{P,2} < \delta\}$ and $\mathcal{M}_\infty$ are $P$-measurable.

Then $\mathcal{M}$ is $P$-Donsker.
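As a numerical illustration of what a Donsker theorem delivers, the sketch below takes the class of indicators of half-lines under the Uniform(0,1) law. This class and the claim that it satisfies the conditions of Theorem 4.9 are assumptions made here (it is the classical Donsker setting), not statements from the text. In that setting the limit of $\sup_t |G_n 1_{(-\infty,t]}|$ is the supremum of the absolute value of a Brownian bridge, whose distribution function is the Kolmogorov series used below. All function names, sample sizes and the seed are hypothetical.

```python
import math
import random


def kolmogorov_cdf(x, terms=100):
    """CDF of sup_t |B(t)| for a Brownian bridge B (Kolmogorov distribution)."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


def sup_empirical_process(sample):
    """sqrt(n) * sup_t |P_n((-inf, t]) - t| for a sample assumed Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * d


def simulate_frequency(level, n=400, reps=2000, seed=1):
    """Monte Carlo estimate of P(sup_t |G_n| <= level); n, reps, seed are arbitrary."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        if sup_empirical_process(sample) <= level:
            hits += 1
    return hits / reps


print(round(simulate_frequency(1.0), 3), round(kolmogorov_cdf(1.0), 3))
```

The two printed numbers should be close; the remaining gap reflects Monte Carlo error and the finite sample size rather than a failure of the limit theorem.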
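Proof. For fixed $f \in \mathcal{M}$, the classical central limit theorem holds. By Theorem 4.5, it is enough to prove that $\mathcal{M}$ is totally bounded in $L_2(P)$ and that $G_n$ is asymptotically equicontinuous. The latter is equivalent to the following statement: for every decreasing sequence $\delta_n \downarrow 0$, $\|G_n\|_{\mathcal{M}_{\delta_n}} \to 0$ in outer probability. By Markov's inequality and Lemma 4.2,

$$P^*(\|G_n\|_{\mathcal{M}_\delta} > x) \le \frac{2}{x}\,E^*\Big(\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i g(X_i)\Big\|_{\mathcal{M}_\delta}\Big).$$

By Hoeffding's inequality, the stochastic process

$$g \mapsto \frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i g(X_i)$$

is sub-Gaussian for the $L_2(P_n)$-seminorm

$$\|g\|_{L_2(P_n)} = \sqrt{\frac{1}{n}\sum_{i=1}^n g^2(X_i)}.$$

By Lemma 4.11,

$$E^*\Big(\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i g(X_i)\Big\|_{\mathcal{M}_\delta}\Big) \le C\int_0^\infty \sqrt{\log N(x, \mathcal{M}_\delta, L_2(P_n))}\,dx.$$

For $\varepsilon$ larger than $\theta_n$, the set $\mathcal{M}_\delta$ fits in a single ball of radius $\varepsilon$, so the integrand vanishes there; here $\theta_n$ is given by

$$\theta_n^2 = \Big\|\frac{1}{n}\sum_{i=1}^n g^2(X_i)\Big\|_{\mathcal{M}_\delta}.$$

Since the covering number of $\mathcal{M}_\delta$ is bounded by the covering number of $\mathcal{M}_\infty$, and $N(\varepsilon, \mathcal{M}_\infty, L_2(Q)) \le N^2(\varepsilon, \mathcal{M}, L_2(Q))$, we have, writing $\|F\|_n = \|F\|_{L_2(P_n)}$,

$$\begin{aligned}
E^*\Big(\Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i g(X_i)\Big\|_{\mathcal{M}_\delta}\Big)
&\le C\int_0^{\theta_n} \sqrt{\log N(x, \mathcal{M}_\delta, L_2(P_n))}\,dx \\
&\le \sqrt{2}\,C\,\|F\|_n \int_0^{\theta_n/\|F\|_n} \sqrt{\log N(x\|F\|_n, \mathcal{M}, L_2(P_n))}\,dx \\
&\le \sqrt{2}\,C\,\|F\|_n \int_0^{\theta_n/\|F\|_n} \sup_Q \sqrt{\log N(x\|F\|_{Q,2}, \mathcal{M}, L_2(Q))}\,dx.
\end{aligned}$$

Furthermore, by the Cauchy-Schwarz inequality,

$$E\,\|F\|_n \int_0^{\theta_n/\|F\|_n} \sup_Q \sqrt{\log N(x\|F\|_{Q,2}, \mathcal{M}, L_2(Q))}\,dx \le (E\|F\|_n^2)^{1/2}\Big(E\Big(\int_0^{\theta_n/\|F\|_n} \sup_Q \sqrt{\log N(x\|F\|_{Q,2}, \mathcal{M}, L_2(Q))}\,dx\Big)^2\Big)^{1/2}.$$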
If we can prove that $\theta_n \le \delta + o_P(1)$ in outer probability, we can complete the proof. This will hold if

$$\|P_n g^2 - P g^2\|_{\mathcal{M}_\infty} \xrightarrow{P^*} 0, \tag{4.39}$$

since $\sup\{P g^2 : g \in \mathcal{M}_\delta\} \le \delta^2$. By the assumption, for any $f, g \in \mathcal{M}_\infty$,

$$P_n|f^2 - g^2| \le P_n(|f - g|(2F)) \le \|f - g\|_{L_2(P_n)}\,\|2F\|_{L_2(P_n)}.$$

By the uniform entropy assumption (i), $N(\varepsilon\|F\|_n, \mathcal{M}_\infty, L_2(P_n))$ is bounded by a fixed number, so its logarithm is certainly $o_P^*(n)$. By Theorem 4.7, (4.39) is true. Hence asymptotic equicontinuity holds.

It remains only to prove that $\mathcal{M}$ is totally bounded in $L_2(P)$. By the previous arguments, there exists a sequence of discrete measures $P_n$ with $\|P_n g^2 - P g^2\|_{\mathcal{M}_\infty}$ converging to 0. Choose $n$ sufficiently large so that the supremum is bounded by $\varepsilon^2$. By the assumption, $N(\varepsilon, \mathcal{M}, L_2(P_n))$ is finite. But an $\varepsilon$-net for $\mathcal{M}$ in $L_2(P_n)$ is a $\sqrt{2}\varepsilon$-net for $\mathcal{M}$ in $L_2(P)$; thus $\mathcal{M}$ is totally bounded in $L_2(P)$.

4.4 Weak convergence of empirical processes involving time-dependent data
In this section, we extend the results of Section 4.3 to a more general setting. In (4.28), $\{X_i\}$ is a sequence of $\mathbb{R}$-valued random variables. A more general version replaces the random variables by random elements taking values in some function space $S$, so that the empirical processes may involve the time evolution of a stochastic process $X$. The results in this section are from Kuelbs, Kurtz and Zinn (2013).

Let $(S, \mathcal{S}, P)$ be a probability space, and define $(\Omega, \mathcal{F}, \mathbb{P})$ to be the infinite product probability space $(S^{\mathbb{N}}, \mathcal{S}^{\mathbb{N}}, P^{\mathbb{N}})$, where $S^{\mathbb{N}} = S \times S \times \cdots$. Let $X_i : \Omega \to S$ be the natural projection of $\Omega$ onto the $i$th copy of $S$, and let $\mathcal{M}$ be a subset of $L_2(S, \mathcal{S}, P)$ with

$$\sup_{f\in\mathcal{M}} |f(s)| < \infty, \qquad s \in S.$$

Recall that $l^\infty(\mathcal{M})$ is the set of bounded real-valued functions on $\mathcal{M}$ with the sup-norm. We take i.i.d. copies $\{X_i\}_{i\ge 1}$ of a process $\{X(t) : t \in [0,1]\}$ and consider the stochastic process

$$\Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^n \big[1_{\{X_i(t)\le x\}} - \mathbb{P}(X(t)\le x)\big] : x \in \mathbb{R},\ t \in [0,1]\Big\},$$

which is indexed by the level $x$ and the time $t$. Our goal is to determine when these processes converge in distribution, in some uniform sense, to a mean zero Gaussian process.
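The time-indexed empirical process above can be evaluated on a grid once paths are simulated. The sketch below assumes, purely for illustration, that $X$ is a standard Brownian motion observed at times $k/\text{steps}$, so that $\mathbb{P}(X(t)\le x) = \Phi(x/\sqrt{t})$; the process choice, grid, function names and seed are all hypothetical and not part of the text.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def brownian_path(steps, rng):
    """Brownian motion sampled at t = k/steps for k = 0, ..., steps."""
    dt = 1.0 / steps
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path


def empirical_process(paths, k, x, prob):
    """n^{-1/2} * sum_i [1{X_i(t_k) <= x} - prob], where prob = P(X(t_k) <= x)."""
    n = len(paths)
    centered = sum((1.0 if p[k] <= x else 0.0) - prob for p in paths)
    return centered / math.sqrt(n)


rng = random.Random(2)  # arbitrary fixed seed
steps, n = 10, 500      # arbitrary grid size and number of i.i.d. paths
paths = [brownian_path(steps, rng) for _ in range(n)]
for k in (2, 5, 10):
    t = k / steps
    for x in (-0.5, 0.0, 0.5):
        value = empirical_process(paths, k, x, normal_cdf(x / math.sqrt(t)))
        print(t, x, round(value, 3))
```

Each printed value is one coordinate of the process at a point $(t, x)$; at a fixed point its variance is $F_t(x)(1 - F_t(x)) \le 1/4$, so the values should stay of order one as $n$ grows.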
Let $F_t$ be the distribution function of $X(t)$. The modified distribution function $F_t(x, \lambda)$ is defined by

$$F_t(x, \lambda) = \mathbb{P}(X(t) < x) + \lambda\,\mathbb{P}(X(t) = x).$$

Let $V$ be a uniformly distributed random variable on $[0,1]$, and let $\tilde{F}_t(x) = F_t(x, V)$ be the distributional transform of $F_t$. Rüschendorf (2009) showed that $\tilde{F}_t(X(t))$ is uniform on $[0,1]$. Firstly, a fundamental theorem is presented.

Theorem 4.10. (Andersen (1988)) Let $X$ be a random element with distribution $P$ with respect to $\mathbb{P}$, and $M = \sup_{f\in\mathcal{M}} |f(X)|$. Let $Pf = \int f\,dP$, $\|Pf\|_{\mathcal{M}} = \sup_{f\in\mathcal{M}} |Pf| < \infty$, and assume that

(1) $\lim_{u\to\infty} u^2\,\mathbb{P}^*(M > u) = 0$;

(2) $\mathcal{M}$ is $P$-pre-Gaussian;

(3) there exists a centered Gaussian process $\{G(f) : f \in \mathcal{M}\}$ with $L_2$ distance $d_G$ such that $G$ is sample bounded and uniformly $d_G$-continuous on $\mathcal{M}$, and for some $K > 0$, all $f \in \mathcal{M}$ and all $\varepsilon > 0$,

$$\sup_{u>0} u^2\,\mathbb{P}^*\Big(\sup_{g:\,d_G(g,f)<\varepsilon} |f - g| > u\Big) \le K\varepsilon^2.$$
Then $\mathcal{M}$ is a $P$-Donsker class.

Theorem 4.11. Let $\rho^2(s,t) = E(H(s) - H(t))^2$ for some centered Gaussian process $H$ that is sample bounded and uniformly continuous on $([0,1], \rho)$ with probability one. Furthermore, assume that for some $L < \infty$ and all $\varepsilon > 0$,

$$\sup_{t\in[0,1]} \mathbb{P}^*\Big(\sup_{\{s:\,\rho(s,t)\le\varepsilon\}} |\tilde{F}_t(X(s)) - \tilde{F}_t(X(t))| > \varepsilon^2\Big) \le L\varepsilon^2, \tag{4.40}$$

and that $D([0,1])$ is a collection of real-valued functions on $[0,1]$ such that $\mathbb{P}(X(\cdot) \in D([0,1])) = 1$. Let $\mathcal{D} = \{D_{s,x} : s \in [0,1], x \in \mathbb{R}\}$, where $D_{s,x} = \{z \in D([0,1]) : z(s) \le x\}$ for $s \in [0,1]$, $x \in \mathbb{R}$. Then the weak convergence of

$$\Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^n \big[1_{\{X_i(\cdot)\in D\}} - \mathbb{P}(X(\cdot)\in D)\big] : D \in \mathcal{D}\Big\}$$

in $l^\infty(\mathcal{D})$ holds.
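The distributional transform used in condition (4.40) is easiest to appreciate for a law with atoms, where the ordinary distribution function evaluated at $X$ is not uniform. The sketch below is a minimal check under assumptions made here: a Bernoulli(0.7) law for $X$, and a uniform $V$ drawn independently of $X$. The names `distributional_transform` and `simulate_transform` are hypothetical.

```python
import random


def distributional_transform(x, v, pmf):
    """F(x, v) = P(X < x) + v * P(X = x) for a discrete law given by `pmf`."""
    below = sum(p for value, p in pmf.items() if value < x)
    return below + v * pmf.get(x, 0.0)


def simulate_transform(pmf, size, seed=3):
    """Draw X from `pmf` and an independent uniform V; return the F(X, V) values."""
    rng = random.Random(seed)
    values = list(pmf)
    weights = [pmf[v] for v in values]
    out = []
    for _ in range(size):
        x = rng.choices(values, weights=weights)[0]
        out.append(distributional_transform(x, rng.random(), pmf))
    return out


pmf = {0: 0.3, 1: 0.7}  # Bernoulli(0.7), chosen only for illustration
u = simulate_transform(pmf, 20000)
for q in (0.1, 0.25, 0.5, 0.9):
    # For a uniform variable the empirical frequency of {U <= q} is close to q.
    print(q, round(sum(1 for w in u if w <= q) / len(u), 3))
```

In each printed row the second number should be close to the first, which is the uniformity property attributed to Rüschendorf (2009) in the text.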
Remark 4.1. The relationship between $\{X(t)\}$ and $\rho(s,t)$ is given in (4.40), which allows one to establish the limiting Gaussian process.

To prove Theorem 4.11, we need some lemmas.

Lemma 4.13. Assume that for some $L < \infty$ and all $\varepsilon > 0$,

$$\sup_{\{s,t\in[0,1]:\,\rho(s,t)\le\varepsilon\}} \mathbb{P}^*\big(|\tilde{F}_t(X(s)) - \tilde{F}_t(X(t))| > \varepsilon^2\big) \le L\varepsilon^2. \tag{4.41}$$

Then for all $x \in \mathbb{R}$,

$$\mathbb{P}(X(s) \le x < X(t)) \le (L+1)\rho^2(s,t), \tag{4.42}$$

and by symmetry,

$$E|1_{\{X(t)\le x\}} - 1_{\{X(s)\le x\}}| = \mathbb{P}(X(t) \le x < X(s)) + \mathbb{P}(X(s) \le x < X(t)) \le 2(L+1)\rho^2(s,t).$$

Further, we have

$$\sup_{x\in\mathbb{R}} |F_t(x) - F_s(x)| \le 2(L+1)\rho^2(s,t).$$
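Proof. Since $\tilde{F}_t$ is nondecreasing and $x < y$ implies $F_t(x) \le \tilde{F}_t(y)$, we have

$$\begin{aligned}
\mathbb{P}(X(s) \le x < X(t)) &\le \mathbb{P}\big(F_t(x) \le \tilde{F}_t(X(t)),\ \tilde{F}_t(X(s)) \le \tilde{F}_t(x)\big) \\
&\le \mathbb{P}\big(F_t(x) \le \tilde{F}_t(X(t)) \le F_t(x) + \rho^2(s,t),\ \tilde{F}_t(X(s)) \le \tilde{F}_t(x)\big) \\
&\quad + \mathbb{P}\big(\tilde{F}_t(X(t)) > F_t(x) + \rho^2(s,t),\ \tilde{F}_t(X(s)) \le \tilde{F}_t(x)\big),
\end{aligned}$$

and hence

$$\mathbb{P}(X(s) \le x < X(t)) \le \mathbb{P}\big(F_t(x) \le \tilde{F}_t(X(t)) \le F_t(x) + \rho^2(s,t)\big) + \mathbb{P}\big(|\tilde{F}_t(X(t)) - \tilde{F}_t(X(s))| > \rho^2(s,t)\big).$$

Now (4.41) implies for all $s, t \in [0,1]$ that

$$\mathbb{P}\big(|\tilde{F}_t(X(t)) - \tilde{F}_t(X(s))| > \rho^2(s,t)\big) \le L\rho^2(s,t).$$

Therefore,

$$\mathbb{P}(X(s) \le x < X(t)) \le \rho^2(s,t) + L\rho^2(s,t),$$

since $\tilde{F}_t(X(t))$ is uniform on $[0,1]$; that is, (4.42) holds. The last conclusion follows by moving the absolute values outside the expectation.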
Lemma 4.14. Assume that (4.41) holds. Let

$$\tau((s,x),(t,y)) = \big[E(1_{\{X(s)\le x\}} - 1_{\{X(t)\le y\}})^2\big]^{1/2}, \qquad \mathcal{G} = \{1_{\{X(s)\le x\}} : s \in [0,1],\ x \in \mathbb{R}\}.$$

Then

$$\tau^2((s,x),(t,y)) \le \min_{u\in\{s,t\}} |F_u(y) - F_u(x)| + (2L+2)\rho^2(t,s). \tag{4.43}$$

Moreover, if $\mathbb{Q}$ denotes the rational numbers, there is a countable dense set $E_0$ of $([0,1], \rho)$ such that $\mathcal{G}_0 = \{1_{\{X(s)\le x\}} : (s,x) \in E_0 \times \mathbb{Q}\}$ is dense in $(\mathcal{G}, \tau)$.

Proof. We have

$$\begin{aligned}
\tau^2((s,x),(t,y)) &= E|1_{\{X(s)\le x\}} - 1_{\{X(t)\le y\}}| \\
&\le E|1_{\{X(t)\le y\}} - 1_{\{X(t)\le x\}}| + E|1_{\{X(t)\le x\}} - 1_{\{X(s)\le x\}}| \\
&= |F_t(y) - F_t(x)| + \mathbb{P}(X(s) \le x < X(t)) + \mathbb{P}(X(t) \le x < X(s)) \\
&\le |F_t(y) - F_t(x)| + (2+2L)\rho^2(s,t)
\end{aligned}$$

by using the symmetry in $s$ and $t$ and (4.42). Similarly,

$$\tau^2((s,x),(t,y)) \le |F_s(y) - F_s(x)| + (2+2L)\rho^2(s,t).$$

Combining the above two inequalities for $\tau$ gives (4.43). Since $([0,1], \rho)$ is totally bounded, there is a countable dense set $E_0$ of $([0,1], \rho)$, and the proof is completed by the right continuity of the distribution functions and (4.43).

Lemma 4.15. Assume that $(s,x)$ and $(t,y)$ satisfy

$$\tau((s,x),(t,y)) = \|1_{\{X(s)\le x\}} - 1_{\{X(t)\le y\}}\|_2 \le \varepsilon, \qquad \rho(s,t) \le \varepsilon,$$

and that (4.41) holds. Then, for $c = (2L+2)^{1/2} + 1$,

$$|F_t(x) - F_t(y)| \le (c\varepsilon)^2.$$

Proof. In fact,

$$\begin{aligned}
|F_t(y) - F_t(x)|^{1/2} &= \|1_{\{X(t)\le y\}} - 1_{\{X(t)\le x\}}\|_2 \\
&\le \|1_{\{X(s)\le x\}} - 1_{\{X(t)\le y\}}\|_2 + \|1_{\{X(s)\le x\}} - 1_{\{X(t)\le x\}}\|_2 \\
&\le \varepsilon + \big(\mathbb{P}(X(s) \le x < X(t)) + \mathbb{P}(X(t) \le x < X(s))\big)^{1/2} \\
&\le \varepsilon + (2L\varepsilon^2 + 2\varepsilon^2)^{1/2} = [(2L+2)^{1/2} + 1]\varepsilon = c\varepsilon.
\end{aligned}$$

The lemma is proved.
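Lemma 4.16. If (4.40) holds, $c = (2L+2)^{1/2} + 1$, and $\lambda((s,x),(t,y)) = \max\{\tau((s,x),(t,y)), \rho(s,t)\}$, then for all $(t,y)$ and $\varepsilon > 0$,

$$\mathbb{P}^*\Big(\sup_{\{(s,x):\,\lambda((t,y),(s,x))\le\varepsilon\}} |1_{\{X(t)\le y\}} - 1_{\{X(s)\le x\}}| > 0\Big) \le 2(c^2 + L + 1)\varepsilon^2.$$

Proof. Throughout, every supremum is taken over $\{(s,x) : \lambda((t,y),(s,x)) \le \varepsilon\}$. Firstly,

$$\begin{aligned}
\mathbb{P}^*\Big(\sup |1_{\{X(t)\le y\}} - 1_{\{X(s)\le x\}}| > 0\Big)
&= \mathbb{P}^*\Big(\sup \big(1_{\{X(t)\le y,\,X(s)>x\}} + 1_{\{X(s)\le x,\,X(t)>y\}}\big) > 0\Big) \\
&\le \mathbb{P}^*\Big(\sup 1_{\{\tilde{F}_t(X(t))\le\tilde{F}_t(y),\ F_t(x)\le\tilde{F}_t(X(s))\}} > 0\Big) \\
&\quad + \mathbb{P}^*\Big(\sup 1_{\{\tilde{F}_t(X(s))\le\tilde{F}_t(x),\ F_t(y)\le\tilde{F}_t(X(t))\}} > 0\Big) \\
&=: I + II,
\end{aligned}$$

by using the fact that $x < y$ implies $F_t(x) \le \tilde{F}_t(y)$. By Lemma 4.15 and (4.40),

$$\begin{aligned}
I &\le \mathbb{P}^*\Big(\sup 1_{\{\tilde{F}_t(X(t))\le F_t(y),\ F_t(y)-(c\varepsilon)^2\le\tilde{F}_t(X(s))\}} > 0\Big) \\
&\le \mathbb{P}^*\Big(\sup 1_{\{\tilde{F}_t(X(t))\le F_t(y),\ F_t(y)-(c\varepsilon)^2\le\tilde{F}_t(X(t))+\varepsilon^2\}} > 0\Big) + L\varepsilon^2 \\
&\le \mathbb{P}\big(F_t(y) - (c\varepsilon)^2 - \varepsilon^2 \le \tilde{F}_t(X(t)) \le F_t(y)\big) + L\varepsilon^2 \le (c^2 + L + 1)\varepsilon^2.
\end{aligned}$$

For $II$, since $\tilde{F}_t(x) \le F_t(x)$ for all $x$, Lemma 4.15 and the definition of $L$ give

$$II \le \mathbb{P}\big(\tilde{F}_t(X(t)) - \varepsilon^2 \le F_t(y) + (c\varepsilon)^2,\ F_t(y) \le \tilde{F}_t(X(t))\big) + L\varepsilon^2 \le (c^2 + L + 1)\varepsilon^2.$$

Combining the estimates for $I$ and $II$ yields the lemma.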
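Lemma 4.17. Let $E_0$ be a countable dense subset of $([0,1], \rho)$. Then there exists a sequence of partitions $\{\mathcal{A}_n : n \ge 0\}$ of $E_0 \times \mathbb{R}$ such that

$$\lim_{r\to\infty} \sup_{(t,y)\in E_0\times\mathbb{R}} \sum_{n\ge r} 2^{n/2}\,\Delta_\tau(A_n((t,y))) = 0 \tag{4.44}$$

and

$$\mathrm{Card}(\mathcal{A}_n) \le 2^{2^n},$$

where $\Delta_\tau(A)$ is the diameter of $A$ with respect to $\tau$.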
The proof of this lemma is technically involved, so we omit it here; the reader can find the details in Kuelbs, Kurtz and Zinn (2013).

The proof of Theorem 4.11. To prove this theorem, we verify the conditions in Theorem 4.10. Let $\mathbb{Q}$ denote the set of rational numbers. If we restrict the partitions $\mathcal{A}_n$ of $E_0 \times \mathbb{R}$ in Lemma 4.17 to $E_0 \times \mathbb{Q}$, then

$$\lim_{r\to\infty} \sup_{(t,y)\in E_0\times\mathbb{Q}} \sum_{n\ge r} 2^{n/2}\,\Delta_\tau(A_n((t,y))) = 0, \tag{4.45}$$

and $(E_0 \times \mathbb{Q}, \tau)$ is totally bounded. Let $\{G_{(s,x)} : (s,x) \in E \times \mathbb{R}\}$ be a centered Gaussian process with

$$E(G_{(s,x)} G_{(t,y)}) = \mathbb{P}(X(s) \le x,\ X(t) \le y).$$

Then $G$ has $L_2$ distance $\tau$, and it is uniformly continuous on $(E_0 \times \mathbb{Q}, \tau)$ by (4.45) and the Talagrand continuity theorem (Theorem 1.4.1, Talagrand (2005)). Hence if $\{H_{(s,x)} : (s,x) \in E_0 \times \mathbb{Q}\}$ is a centered Gaussian process with

$$E(H_{(s,x)} H_{(t,y)}) = \mathbb{P}(X(s) \le x,\ X(t) \le y) - \mathbb{P}(X(s) \le x)\,\mathbb{P}(X(t) \le y),$$

then

$$E\big((H_{(s,x)} - H_{(t,y)})^2\big) = \tau^2((s,x),(t,y)) - \big(\mathbb{P}(X(s) \le x) - \mathbb{P}(X(t) \le y)\big)^2.$$

Hence the $L_2$ distance of $H$ is smaller than that of $G$, and therefore the process $H$ is uniformly continuous on $(E_0 \times \mathbb{Q}, d_H)$, where

$$d_H((s,x),(t,y)) = \rho(1_{\{X(s)\le x\}}, 1_{\{X(t)\le y\}}).$$

Since by Lemma 4.14 the set $E_0 \times \mathbb{Q}$ is dense in $([0,1] \times \mathbb{R}, \tau)$, and $d_H((s,x),(t,y)) \le \tau((s,x),(t,y))$, we also have that $E_0 \times \mathbb{Q}$ is dense in $([0,1] \times \mathbb{R}, d_H)$. Thus the Gaussian process $\{H_{(s,x)} : (s,x) \in [0,1] \times \mathbb{R}\}$ has a uniformly continuous version, which we also denote by $H$, and since $(E, d_H)$ is totally bounded, the sample functions are bounded on $[0,1]$ with probability one. By the definition of $d_H((s,x),(t,y))$, the continuity of $H$ on $([0,1] \times \mathbb{R}, d_H)$ implies that condition (2) in Theorem 4.10 is satisfied. Condition (1) in Theorem 4.10 is also satisfied, since $1_{\{X(t)\le y\}}$ is bounded.

Thus, it is enough to verify condition (3) of Theorem 4.10. Let $\mathcal{H} = \{1_{\{X(s)\le x\}} : (s,x) \in E \times \mathbb{R}\}$. Identifying $f = 1_{\{X(s)\le x\}} \in \mathcal{H}$ with the point $(s,x) \in [0,1] \times \mathbb{R}$, for the centered Gaussian process $\{G_f : f \in \mathcal{M}\}$ in (3) of Theorem 4.10 we take, for $(s,x) \in [0,1] \times \mathbb{R}$, the process

$$\tilde{G}_{(s,x)} = G_{(s,x)} + \tilde{H}_s,$$

where $\{\tilde{H}_s : s \in [0,1]\}$ is a Gaussian process whose law is that of the process $\{H_s : s \in [0,1]\}$ given in the theorem. Furthermore, $\{G_{(s,x)} : (s,x) \in [0,1] \times \mathbb{R}\}$ is
a uniformly continuous and sample bounded version of the Gaussian process, also denoted by $G_{(s,x)}$, that was defined above on $E_0 \times \mathbb{Q}$. Since $E_0 \times \mathbb{Q}$ is dense in $(E \times \mathbb{R}, \tau)$, the extension to all of $[0,1] \times \mathbb{R}$ again follows.

Therefore, $\tilde{G}$ is sample bounded and uniformly continuous on $[0,1] \times \mathbb{R}$ with respect to its $L_2$ distance

$$d_{\tilde{G}}((s,x),(t,y)) = \{\tau^2((s,x),(t,y)) + \rho^2(s,t)\}^{1/2}.$$

We have

$$\{(s,x) : \lambda((s,x),(t,y)) \le \varepsilon\} \supseteq \{(s,x) : d_{\tilde{G}}((s,x),(t,y)) \le \varepsilon\}.$$

For a random variable $Z$ bounded by one,

$$\sup_{t>0} t^2\,\mathbb{P}(|Z| > t) \le \mathbb{P}(|Z| > 0),$$

thus

$$\sup_{u>0} u^2\,\mathbb{P}^*\Big(\sup_{g:\,d_H(g,f)<\varepsilon} |f - g| > u\Big) \le \mathbb{P}^*\Big(\sup_{g:\,d_H(g,f)<\varepsilon} |f - g| > 0\Big).$$

Condition (3) of Theorem 4.10 now follows by Lemma 4.16.

4.5 Two examples of applications in statistics
The study of empirical processes is an important subject in statistics. Here we present two simple examples that show how weak convergence of empirical processes is used to obtain the limiting distributions of statistics.

4.5.1 Rank tests in complex survey samples

Data from complex survey samples, such as the National Health and Nutrition Examination Survey series in the US or the British Household Panel Survey in the UK, are increasingly being used for statistical analyses. Observations in these surveys are sampled with known unequal sampling probabilities, and the sampling is typically correlated. Data analysis in complex survey samples is therefore quite different from the analysis of independently sampled data. The classical derivation of the asymptotic distribution for rank tests is based strongly on exchangeability, which is not present in complex survey samples. Thus, one regards the test statistic as a Hadamard-differentiable functional of the estimated cumulative distribution function in each group, and then obtains the asymptotic distribution for rank tests through the central limit theorem for empirical processes. This example is from Lumley (2013).

In a sequence of $N_k$ populations and $n_k$ samples, a real-valued variable $Y$ is observed together with a binary group indicator $G$ that divides the population into groups of size $M_0$ and $M_1$.
A sample, $s$, of $n$ units is drawn from the finite population using some probability sampling method with sampling probabilities $\pi_i$ and corresponding sampling weights $\pi_i^{-1}$, and the values of $Y_i$ and $G_i$ are observed for the sampled units. The estimated population ranks $\hat{R}_i$ are defined by

$$\hat{R}_i = \hat{F}_n(Y_i), \qquad \hat{F}_n(y) = \frac{1}{\hat{N}}\sum_{j\in s}\frac{1}{\pi_j}1_{\{Y_j\le y\}},$$

where $\hat{N} = \sum_{j\in s}\frac{1}{\pi_j}$. A rank test statistic is then defined by

$$\hat{T}_n = \frac{1}{\hat{M}_0}\sum_{i\in s_0}\frac{1}{\pi_i}g(\hat{R}_i) - \frac{1}{\hat{M}_1}\sum_{i\in s_1}\frac{1}{\pi_i}g(\hat{R}_i),$$

where $g$ is a given function, $s_l = \{i \in s : G_i = l\}$ and $\hat{M}_l = \sum_{j\in s_l}\frac{1}{\pi_j}$, $l = 0$ or $1$. Now write

$$D_Y(y) = F_{0Y}(y) - F_{1Y}(y), \qquad \hat{D}_n(y) = \hat{F}_{0n}(y) - \hat{F}_{1n}(y),$$

where $F_{lY}$ denotes the conditional distribution function of $Y$ given $G = l$ in the superpopulation, and

$$\hat{F}_{ln}(y) = \frac{1}{\hat{M}_l}\sum_{i\in s_l}\frac{1}{\pi_i}1_{\{Y_i\le y\}}.$$

Theorem 4.12. (Lumley (2013)) Assume that $g$ is a continuously differentiable function, and that the marginal distribution of $Y$ is absolutely continuous with finite fourth moment. Let

$$\delta_Y = \int_{\mathbb{R}} g(y)\,dD_Y(y).$$

Then $\sqrt{n}(\hat{T}_n - \delta_Y)$ is asymptotically normal.

Proof. The finiteness of the fourth moment of $Y$ implies that

$$\sqrt{n}\,(\hat{D}_n - D_Y,\ \hat{F}_n - F_Y)$$

converges weakly to a Gaussian process $Z_\pi = (Z_{\pi 1}, Z_{\pi 2})$ by Theorem 4.9. The details can be found in Theorem 3 of Lumley (2013). We can write

$$\hat{T}_n = \int_{\mathbb{R}} g(\hat{F}_n)\,d\hat{D}_n.$$

As $g$ is differentiable and bounded, $\phi(D, F) = \int g(F)\,dD$ is Hadamard differentiable with Hadamard derivative

$$\phi'_{D,F}(\alpha, \beta) = \int g(F)\,d\alpha + \int \beta\,g'(F)\,dD.$$

Then

$$\sqrt{n}(\hat{T}_n - \delta_Y) \xrightarrow{d} \int g\,dZ_{\pi 1} + \int Z_{\pi 2}\,g'\,dD_Y,$$

from the Donsker theorem and the functional delta method. The proof is complete.
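The definitions of $\hat{F}_n$, $\hat{R}_i$, $\hat{M}_l$ and $\hat{T}_n$ translate directly into a few lines of code. The sketch below is only an illustration of those formulas under assumptions made here: the function name `weighted_rank_statistic`, its argument layout and the toy data are hypothetical and are not taken from Lumley (2013).

```python
def weighted_rank_statistic(y, group, pi, g):
    """Design-weighted rank statistic T_n for two groups labelled 0 and 1.

    y     : observed values Y_i for the sampled units
    group : group labels G_i (0 or 1)
    pi    : sampling probabilities pi_i (weights are 1 / pi_i)
    g     : score function applied to the estimated population ranks
    """
    w = [1.0 / p for p in pi]
    n_hat = sum(w)  # estimated population size N_hat

    def f_hat(v):
        # Weighted empirical distribution function F_n_hat(v).
        return sum(wj for yj, wj in zip(y, w) if yj <= v) / n_hat

    ranks = [f_hat(v) for v in y]  # estimated population ranks R_i_hat

    def weighted_group_mean(label):
        idx = [i for i, gi in enumerate(group) if gi == label]
        m_hat = sum(w[i] for i in idx)  # estimated group size M_l_hat
        return sum(w[i] * g(ranks[i]) for i in idx) / m_hat

    return weighted_group_mean(0) - weighted_group_mean(1)


# Toy data: equal sampling probabilities and a Wilcoxon-type score g(r) = r.
y = [1.0, 2.0, 3.0, 4.0]
group = [0, 0, 1, 1]
pi = [0.5, 0.5, 0.5, 0.5]
print(weighted_rank_statistic(y, group, pi, lambda r: r))
```

With unequal `pi`, heavily down-sampled units receive larger weights in both the ranks and the group averages, which is the only difference from the classical unweighted rank statistic.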
4.5.2 M-estimator
M-estimators form a broad class of estimators, which are obtained as the minimizers of functions of the data. The least squares estimator (LSE) is an important example of an M-estimator. More generally, an M-estimator may be defined to be a zero of an estimating function, the derivative of another statistical function.

Let $X_1, X_2, \cdots$ be i.i.d. copies of a random variable $X$ taking values in $\mathbb{R}$ and with distribution $P$. Let $\Theta$ be a parameter space and, for $\theta \in \Theta$, let $\gamma_\theta : \mathbb{R} \to \mathbb{R}$ be some loss function. We assume $P(|\gamma_\theta|) < \infty$ for all $\theta \in \Theta$. We estimate the unknown parameter

$$\theta_0 := \arg\min_{\theta\in\Theta} P(\gamma_\theta)$$

by the M-estimator

$$\hat{\theta}_n := \arg\min_{\theta\in\Theta} P_n(\gamma_\theta),$$

where

$$P_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}.$$

We assume that $\theta_0$ exists and is unique. Our interest is to obtain the asymptotic normality of the estimator $\hat{\theta}_n$. Firstly, three conditions are presented.

Condition a. There is a function $\psi_0 : \mathbb{R} \to \mathbb{R}$ with $P(\psi_0^2) < \infty$ such that

$$\lim_{\theta\to\theta_0} \frac{\|\gamma_\theta - \gamma_{\theta_0} - \psi_0(\theta - \theta_0)\|_{2,P}}{|\theta - \theta_0|} = 0.$$

Condition b.

$$P(\gamma_\theta) - P(\gamma_{\theta_0}) = \frac{1}{2}C(\theta - \theta_0)^2 + o(|\theta - \theta_0|) + o(|\theta - \theta_0|^2)$$

as $\theta \to \theta_0$, where $C > 0$.

Condition c. There exists an $\varepsilon > 0$ such that the class $\{g_\theta : |\theta - \theta_0| < \varepsilon\}$ is a $P$-Donsker class with envelope $G$ with $P(G^2) < \infty$, where

$$g_\theta = \frac{\gamma_\theta - \gamma_{\theta_0}}{|\theta - \theta_0|}.$$

Theorem 4.13. Suppose conditions a, b and c are satisfied. Then

$$\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N\Big(0, \frac{J}{C^2}\Big),$$

where $J = \int \psi_0^2\,dP$.
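Proof. We may write

$$\begin{aligned}
0 &\ge P_n(\gamma_{\hat{\theta}_n} - \gamma_{\theta_0}) \\
&= (P_n - P)(\gamma_{\hat{\theta}_n} - \gamma_{\theta_0}) + P(\gamma_{\hat{\theta}_n} - \gamma_{\theta_0}) \\
&= (P_n - P)(g_{\hat{\theta}_n})\,|\hat{\theta}_n - \theta_0| + P(\gamma_{\hat{\theta}_n} - \gamma_{\theta_0}) \\
&= (P_n - P)\psi_0\,(\hat{\theta}_n - \theta_0) + o_P(n^{-1/2})|\hat{\theta}_n - \theta_0| + P(\gamma_{\hat{\theta}_n} - \gamma_{\theta_0}) \\
&= (P_n - P)\psi_0\,(\hat{\theta}_n - \theta_0) + o_P(n^{-1/2})|\hat{\theta}_n - \theta_0| + \frac{1}{2}C(\hat{\theta}_n - \theta_0)^2 + o(|\hat{\theta}_n - \theta_0|^2).
\end{aligned}$$

This implies $|\hat{\theta}_n - \theta_0| = O_P(n^{-1/2})$. Then

$$\big|C^{1/2}(\hat{\theta}_n - \theta_0) + C^{-1/2}(P_n - P)\psi_0 + o_P(n^{-1/2})\big|^2 = o_P(n^{-1}).$$

Therefore,

$$\hat{\theta}_n - \theta_0 = -C^{-1}(P_n - P)\psi_0 + o_P(n^{-1/2}).$$

The result follows by condition c and $P(\psi_0) = 0$.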
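A quick Monte Carlo check of the variance formula $J/C^2$ in Theorem 4.13 can be run for the sample median, which minimizes $\theta \mapsto P_n|x - \theta|$. Everything specific in this sketch is an assumption made for illustration: the loss $\gamma_\theta(x) = |x - \theta|$, standard normal data with $\theta_0 = 0$, the standard identifications $\psi_0(x) = -\mathrm{sign}(x - \theta_0)$ (so $J = 1$) and $C = 2f(0)$ with $f$ the standard normal density, and the hypothetical function names.

```python
import math
import random
import statistics


def empirical_loss(theta, sample):
    """P_n(gamma_theta) for the absolute-error loss gamma_theta(x) = |x - theta|."""
    return sum(abs(x - theta) for x in sample) / len(sample)


def m_estimate(sample):
    """A minimizer of theta -> P_n|x - theta|, namely the sample median."""
    return statistics.median(sample)


def scaled_variance(n=400, reps=2000, seed=4):
    """Monte Carlo variance of sqrt(n) * (theta_hat - 0) for N(0, 1) data.

    n, reps and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(math.sqrt(n) * m_estimate(sample))
    return statistics.pvariance(draws)


C = 2.0 / math.sqrt(2.0 * math.pi)  # assumed: C = 2 * f(0) for the N(0, 1) density
J = 1.0                             # assumed: P(psi_0^2) with psi_0 = -sign(x - theta_0)
print(round(scaled_variance(), 3), round(J / C ** 2, 3))
```

Under these assumptions $J/C^2 = \pi/2$, the classical asymptotic variance of the median for normal data, and the simulated variance should be close to it.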
Bibliography
Aalen, O. (1977). Weak convergence of stochastic integrals related to counting processes. Probability Theory and Related Fields 38, 261-277.
Aldous, D. (1978). Stopping times and tightness. The Annals of Probability 6, 335-340.
Andersen, N., Giné, E., Ossiander, M., Zinn, J. (1988). The central limit theorem and the law of iterated logarithm for empirical processes under local conditions. Probability Theory and Related Fields 77, 271-305.
Aldous, D., Eagleson, G. (1978). On mixing and stability of limit theorems. The Annals of Probability 6, 325-331.
Applebaum (2009). Lévy Processes and Stochastic Calculus, Cambridge Studies in Advanced Mathematics.
Avram, F., Taqqu, M. (1992). Weak convergence of sums of moving averages in the α-stable domain of attraction. The Annals of Probability 20, 483-503.
Basrak, B., Segers, G. (2009). Regularly varying multivariate time series. Stochastic Processes and their Applications 119, 1055-1080.
Basrak, B., Krizmanić, D., Segers, G. (2012). A functional limit theorem for dependent sequences with infinite variance stable limits. The Annals of Probability 40, 2008-2033.
Billingsley, P. (1999). Convergence of Probability Measures, John Wiley and Sons.
Can, S., Mikosch, T., Samorodnitsky, G. (2010). Weak convergence of the function-indexed integrated periodogram for infinite variance processes. Bernoulli 16, 995-1015.
Caner, M. (1997). Weak convergence to a matrix stochastic integral with stable processes. Econometric Theory 13, 506-528.
Chan, N., Zhang, R. (2012). Marked empirical processes for non-stationary time series. Forthcoming in Bernoulli.
Chatterji, S. (1974). A principle of subsequences in probability theory: the central limit theorem. Advances in Mathematics 13, 31-54.
Davis, R., Resnick, S. (1986). Limit theory for the sample covariance and correlation functions of moving averages. The Annals of Statistics 14, 533-538.
Davis, R., Mikosch, T. (1998). The sample autocorrelations of heavy-tailed processes with applications to ARCH. The Annals of Statistics 26, 2049-2080.
Dellacherie, C., Meyer, P. (1982). Probabilities and Potential, volume B, Amsterdam.
Dette, H., Podolskij, M. (2008). Testing the parametric form of the volatility in continuous time diffusion models: a stochastic process approach. Journal of Econometrics 143, 56-73.
Dudley, R. (1978). Central limit theorems for empirical measures. The Annals of Probability 143, 899-929.
Ethier, S., Kurtz, T. (1986). Markov Processes: Characterization and Convergence, John Wiley and Sons.
Feller, W. (1971). An Introduction to Probability and its Applications, John Wiley and Sons.
Fermanian, J., Radulovic, D., Wegkamp, M. (2004). Weak convergence of empirical copula processes. Bernoulli 10, 847-860.
Hall, P. (1977). Martingale invariance principles. The Annals of Probability 5, 876-887.
Hall, P., Heyde, C.C. (1980). Martingale Limit Theory and its Applications, Academic Press.
Hansen, B. (1992). Convergence to stochastic integrals for dependent heterogeneous processes. Econometric Theory 8, 489-500.
Jacod, J. (1975). Multivariate point processes: predictable projection, Radon-Nikodym derivatives, representation of martingales. Probability Theory and Related Fields 31, 235-253.
Jacod, J. (1979). Calcul Stochastique et Problèmes de Martingales. Lecture Notes in Mathematics 714.
Jacod, J. (1997). On continuous conditional Gaussian martingales and stable convergence in law. Séminaire de Probabilités XXXI 232-246.
Jacod, J. (2003). On processes with conditional independent increments and stable convergence in law. Séminaire de Probabilités XXXVI 383-401.
Jacod, J. (2004). The Euler scheme for Lévy driven stochastic differential equations: limit theorems. The Annals of Probability 32, 1830-1872.
Jacod, J. (2008). Asymptotic properties of realized power variations and related functionals of semimartingales. Stochastic Processes and their Applications 118, 517-559.
Jacod, J., Protter, P. (2003). Asymptotic error distributions for the Euler method for stochastic differential equations. The Annals of Probability 26, 267-307.
Jacod, J., Shiryaev, A. (2003). Limit Theorems for Stochastic Processes, Springer.
Jakubowski, A. (1996). Convergence in various topologies for stochastic integrals driven by semimartingales. The Annals of Probability 24, 1-21.
Jakubowski, A. (1997). A non-Skorokhod topology on the Skorokhod space. Electronic Journal of Probability 24, 2141-2153.
Jakubowski, A., Mémin, J., Pagès, G. (1989). Convergence en loi des suites d'intégrales stochastiques sur l'espace D1 de Skorokhod. Probability Theory and Related Fields 81, 111-137.
Knight, K. (1991). Limit theory for M-estimates in an integrated infinite variance processes. Econometric Theory 7, 200-212.
Kuelbs, J., Kurtz, T., Zinn, J. (2013). A CLT for empirical processes involving time-dependent data. The Annals of Probability 41, 785-816.
Kurtz, T., Protter, P. (1991). Weak limit theorems for stochastic integrals and stochastic differential equations. The Annals of Probability 19, 1035-1070.
Leadbetter, M., Rootzén, H. (1988). Extremal theory for stochastic processes. The Annals of Probability 28, 885-908.
Lin, Z. (1985). On a conjecture of an invariance principle for sequences of associated random variables. Acta Mathematica Sinica, English Series 1, 343-347.
Lin, Z. (1999). The invariance principle for some class of Markov chains. Theory of Probability and its Applications 44, 136-140.
Lin, Z., Bai, Z. (2010). Probability Inequalities, Springer.
Lin, Z., Choi, Y. (1999). Some limit theorems for fractional Lévy Brownian fields. Stochastic Processes and their Applications 82, 229-244.
Lin, Z., Lu, C. (1996). Limit Theory for Mixing Dependent Random Variables, Science Press/Kluwer Academic Publisher.
Lin, Z., Qiu, J. (2004). A central limit theorem for strong near-epoch dependent random variables. Chinese Annals of Mathematics 25, 263-274.
Lin, Z., Wang, H. (2010). Empirical likelihood inference for diffusion processes with jumps. Science in China 53, 1805-1816.
Lin, Z., Wang, H. (2010). On convergence to stochastic integrals. arXiv:1006.4693.
Lin, Z., Wang, H. (2011). Weak convergence to stochastic integrals driven by α-stable Lévy processes. arXiv:1104.3402.
Lin, Z., Wang, H. (2012). Strong approximation of locally square-integrable martingales. Acta Mathematica Sinica, English Series 28, 1221-1232.
Lindberg, C., Rootzén, H. (2013). Error distributions for random grid approximations of multidimensional stochastic integrals. The Annals of Applied Probability 23, 834-857.
Liptser, R., Shiryaev, A. (1983). Weak convergence of a sequence of semimartingales to a process of diffusion type. Matematicheskii Sbornik 121, 176-200.
Liu, W., Lin, Z. (2008). Strong approximation for a class of stationary processes. Stochastic Processes and their Applications 119, 249-280.
Lumley, T. (2013). An empirical-process central limit theorem for complex sampling under bounds on the design effect. Forthcoming in Bernoulli.
Marcus, M. (1981). Weak convergence of the empirical characteristic function. The Annals of Probability 9, 194-201.
Mikosch, T., Resnick, S., Samorodnitsky, G. (2000). The maximum of the periodogram for a heavy-tailed sequence. The Annals of Probability 28, 431-478.
Oksendal, B. (2003). Stochastic Differential Equations, Springer.
Politis, D., Romano, J., Wolf, M. (1999). Weak convergence of dependent empirical measures with application to subsampling in function spaces. Journal of Statistical Planning and Inference 79, 179-190.
Phillips, P. (1988). Weak convergence of sample covariance matrices to stochastic integrals via martingale approximations. Econometric Theory 4, 528-533.
Phillips, P. (1990). Time series regression with a unit root and infinite variance errors. Econometric Theory 6, 44-62.
Phillips, P., Perron, P. (1988). Testing for a unit root in time series regression. Biometrika 75, 335-346.
Phillips, P., Solo, V. (1992). Asymptotics for linear processes. The Annals of Statistics 20, 971-1001.
Prohorov, Y.V. (1956). Convergence of random processes and limit theorems in probability theory. Theory of Probability and its Applications 1, 157-214.
Protter, P. (2005). Stochastic Integration and Differential Equations, Springer.
Raikov, D. (1938). On a connection between the central limit-law of the theory of probability and the law of great numbers. Izv. Akad. Nauk SSSR Ser. Mat 2, 323-338.
Rootzén, H. (1977). On the functional central limit theorem for martingales. Probability Theory and Related Fields 38, 199-210.
Resnick, S. (1975). Weak convergence to extremal processes. Advances in Applied Probability 5, 951-960.
Resnick, S. (1986). Point processes, regular variation and weak convergence. Advances in Applied Probability 18, 66-138.
Resnick, S. (2007). Heavy-tail phenomena: Probabilistic and Statistical Modeling, Springer.
Skorokhod, A.V. (1956). Limit theorems for stochastic processes. Theory of Probability and its Applications 1, 261-290.
Song, Y., Lin, Z., Wang, H. (2013). Re-weighted functional estimation of second-order jump-diffusion model. Journal of Statistical Planning and Inference 143, 730-744.
Sowell, T. (1990). The fractional unit root distribution. Econometrica 58, 495-505.
Talagrand, M. (1987). Donsker classes and random geometry. The Annals of Probability 15, 1327-1338.
Taqqu, M. (1975). Weak convergence to fractional Brownian motion and to the Rosenblatt process. Probability Theory and Related Fields 31, 287-302.
Tyran-Kamińska, M. (2010). Convergence to Lévy stable processes under some weak dependence conditions. Stochastic Processes and their Applications 120, 1629-1650.
van der Vaart, A., Wellner, J. (1996). Weak Convergence and Empirical Processes, Springer.
Volný, D. (1993). Approximating martingales and the central limit theorem for strictly stationary processes. Stochastic Processes and their Applications 44, 41-74.
Wang, Q. (2012). Martingale limit theorems revisited and non-linear cointegrating regression. Forthcoming in Econometric Theory.
Wang, Q., Lin, Y., Gulati, C. (2003). Asymptotics for general fractionally integrated processes with applications to unit root tests. Econometric Theory 19, 143-164.
Wang, W. (2003). Weak convergence to fractional Brownian motion in Brownian scenery. Probability Theory and Related Fields 126, 203-220.
Wu, W. (2003). Empirical processes of long-memory sequences. Bernoulli 95, 809-831.
Wu, W. (2007). Strong invariance principles for dependent random variables. The Annals of Probability 35, 2294-2320.
Yan, L. (2005). Asymptotic error for the Milstein scheme for SDEs driven by continuous semimartingales. The Annals of Applied Probability 15, 2706-2738.
Index
α-stable distribution, 46 Aldous (1978), 75 Aldous and Eagleson (1978), 26 Applebaum (2009), 103 Arzelà-Ascoli theorem, 1 asymptotic equicontinuity, 147
Feller (1971), 65 function index empirical process, 147 graph, 2 Hall (1977), 26 Hall and Heyde (1980), 26 heavy tailed, 45
Basrak and Segers (2009), 58 Basrak, Krizmanić and Segers (2012), 58 Billingsley (1999), 17 bracketing number, 149
J1-convergence, 2 J1 topology, 2 Jacod (1997), 113 Jacod and Shiryaev (2003), 14 Jacod and Protter (1998), 116 Jakubowski (1996, 1997), 4 Jakubowski, Mémin and Pagès (1989), 105 jointly regularly varying, 58
C-tight, 77 causal processes, 86 Chan and Zhang (2012), 137 characteristic Lévy triple, 48 Chatterji (1974), 26 conditional Gaussian martingale, 112 covering number, 149
Kuelbs, Kurtz and Zinn (2013), 166 Kurtz and Protter (1991), 105 Kurtz and Protter’s approach, 105
Dellacherie and Meyer (1982), 93 Dette and Podolskij (2008), 20 Donsker invariance principle, 23 Dudley (1978), 146
Laplace functional, 49 least square estimation, 139 Lin and Wang (2010), 87 Lin and Wang (2011), 94 Lindberg and Rootzén (2013), 122 Liptser and Shiryaev (1983), 79 local uniform topology, 1 Lumley (2013), 167
empirical processes, 131 entropy, 149 envelope function, 149 error processes, 115 Ethier and Kurtz (1986), 18 Euler continuous approximation, 115 Euler discontinuous approximation, 115 extended martingale invariance principle, 38
marked empirical processes, 137 martingale approximation, 86
martingale biased conditional Gaussian martingale, 112 Martingale invariance principle, 26 Mikosch, Resnick and Samorodnitsky (2000), 65 Milstein scheme, 125 Oksendal (2003), 122 Orlicz norm, 149 P-Donsker class, 147 P-Glivenko-Cantelli, 157 parametric representation, 2 periodogram, 65 point process, 47 Poisson random measure, 47 Polish space, 1 predictable characteristics, 75 Prohorov metric, 7 Protter (2005), 116 Rüschendorf (2009), 162 Raikov (1938), 26 random grid scheme, 122 regular variation, 45 Resnick (2007), 6 Rootzén (1977), 26
Skorokhod (1956), 2 Skorokhod representation theorem, 92 Skorokhod topology, 2 Sowell (1990), 111 stable convergence, 112 stable integral, 69 Strassen (1964), 92 strongly majorizes, 77 summation functional, 53 Talagrand (1987), 146 tight, 14 unit root testing, 19 vague convergence, 5 Wang (2012), 38 Wang, Lin and Gulati (2003), 108 Wu (2003), 139 Wu (2007), 87 Yan (2005), 127