Empirical Processes Peter Gaenssler Mathematical Institute of the University of Munich
Institute of Mathematical Statistics LECTURE NOTES-MONOGRAPH SERIES Shanti S. Gupta, Series Editor Volume 3
Institute of Mathematical Statistics Hayward, California
Institute of Mathematical Statistics Lecture Notes-Monograph Series Series Editor, Shanti S. Gupta, Purdue University
The production of the IMS Lecture Notes-Monograph Series is managed by the IMS Business Office: Bruce E. Trumbo, Treasurer, and Jose L. Gonzalez, Business Manager.
Library of Congress Catalog Card Number: 83-82637 International Standard Book Number 0-940600-03-X Copyright © 1983 Institute of Mathematical Statistics All rights reserved Printed in the United States of America
CONTENTS
1. Introduction and some structural properties of empirical measures
2. Glivenko-Cantelli convergence: The Vapnik-Chervonenkis theory with some extensions
3. Weak convergence of non-Borel measures on a metric space: Separable and tight measures on ^(S)
   Weak convergence / Portmanteau Theorem
   Identification of limits
   Weak convergence and mappings (Continuous Mapping Theorems)
   Weak convergence criteria and compactness
   Some remarks on product spaces
   Sequential compactness
   Skorokhod-Dudley-Wichura Representation Theorem
   The space D[0,1]
   Random change of time
4. Functional Central Limit Theorems: Functional Central Limit Theorems for empirical C-processes
   Some remarks on other measurability assumptions and further results
   Functional Central Limit Theorems for weighted empirical processes
   Some general remarks on weak convergence of random elements in D = D[0,1] w.r.t. $\rho_q$-metrics
   Functional Central Limit Theorems for weighted empirical processes w.r.t. $\rho_q$-metrics
   Concluding remarks on further results for empirical processes indexed by classes of sets or classes of functions (Functional Laws of the Iterated Logarithm, Donsker classes of functions, Strong approximations)
REFERENCES
NOTATION INDEX
SUBJECT INDEX
PREFACE
The present notes are based on lectures given at the Department of Statistics, University of Washington, in Spring 1982, when the author was on leave from the Mathematical Institute, University of Munich, as a Visiting Professor at the University of Washington, supported by the National Science Foundation under Grant (MCS) 81-02568.
The last section of these lecture notes is supplemented by a course on Functional Central Limit Theorems held at the University of Munich during the Summer term in 1982.
ACKNOWLEDGEMENTS
These lecture notes gained much from the extremely stimulating atmosphere at the Department of Statistics of the University of Washington in Seattle. Special thanks are due to Professors Michael Perlman, Ronald Pyke and Galen Shorack for their great hospitality and for the very helpful conversations. I want to mention here that it was Professor Ronald Pyke who gave me the decisive impulse for studying empirical processes, after I had attended his beautiful lecture at the Mathematical Research Institute Oberwolfach in 1974. I also express my gratitude to Professor R.M. Dudley (MIT) for the privilege of obtaining preprints of his newest results at any time. The present notes are mainly influenced by his work and by the correspondence we occasionally had on the subject. These notes also gained from some remarks made by Professor D. Pollard (Yale), who visited the University of Washington at the same time, as well as from the careful reading of the manuscript by my assistant Dr. Erich Häusler. Finally, the contributions and critical remarks made by my students J. Schattauer and W. Schneemeier in connection with Sections 3 and 4 were of significant importance. I am also very grateful to the referees for suggesting further valuable improvements for the final version. Special thanks are due to Mrs. K. Bischof for typing the final version of the manuscript with admirable skill. Last but not least I would like to thank Professors Michael Perlman and Shanti S. Gupta for encouraging me to publish these notes in the IMS Lecture Notes-Monograph Series.
Munich, August 1982; revised June 1983.
PETER GAENSSLER
TO MY WIFE INGRID
with admiration and thanks for her patience with me
1. Introduction and some structural properties of empirical measures.
Many standard procedures in statistics are based on a random sample $x_1,\dots,x_n$ of i.i.d. observations. That is, it is assumed that observations (or measurements) occur as realizations (or values) $x_i = \xi_i(\omega)$ in some sample space $X$ of a sequence of independent and identically distributed (i.i.d.) random elements $\xi_1,\dots,\xi_n$ defined on some basic probability space (p-space for short) $(\Omega,\mathcal{F},\mathbb{P})$. Here $\xi$ is called a RANDOM ELEMENT in $X$ whenever $\xi\colon\Omega\to X$ is $\mathcal{F},\mathcal{B}$-measurable for an appropriate σ-algebra $\mathcal{B}$ in $X$. In this case the law $\mu\equiv\mathcal{L}\{\xi\}$ of $\xi$ is a well-defined p-measure on $\mathcal{B}$, namely $\mu(B) = \mathbb{P}(\{\omega\in\Omega: \xi(\omega)\in B\}) = \mathbb{P}(\xi\in B)$ for short, $B\in\mathcal{B}$.

In classical situations the sample space $X$ is usually the $k$-dimensional Euclidean space $\mathbb{R}^k$, $k\ge1$, with the Borel σ-algebra $\mathbb{B}_k$. In the present notes, unless stated otherwise, the sample space is always an arbitrary measurable space $(X,\mathcal{B})$. Given i.i.d. random elements $\xi_i$ in $X=(X,\mathcal{B})$ with (common) law $\mu$ on $\mathcal{B}$, we can associate with each sample size $n$ the so-called EMPIRICAL MEASURE

$$\mu_n := n^{-1}(\varepsilon_{\xi_1}+\dots+\varepsilon_{\xi_n}) \text{ on } \mathcal{B}, \qquad \varepsilon_x(B) := \begin{cases}1 & \text{if } x\in B,\\ 0 & \text{if } x\notin B,\end{cases}\quad B\in\mathcal{B}. \tag{1}$$
In other words, given the first $n$ observations $x_i=\xi_i(\omega)$, $i=1,\dots,n$, the quantity $\mu_n(B)\equiv\mu_n(B,\omega)$ is the relative frequency of those among the first $n$ $x_i$'s that fall into $B$. (The notation $\mu_n(\cdot,\omega)$ is meant to emphasize that $\mu_n$ is a random p-measure on $\mathcal{B}$.) $\mu_n$ may be viewed as the statistical picture of $\mu$. We are therefore interested in the connection between $\mu_n$ and $\mu$, especially when $n$ tends to infinity.
In what follows, let $\mathcal{C}$ be some subclass of $\mathcal{B}$. To have at least two specific examples in mind, take $\mathcal{C}=\{(-\infty,t]: t\in\mathbb{R}^k\}$, the class of all lower left orthants in $X=\mathbb{R}^k$, or the class of all closed Euclidean balls in $\mathbb{R}^k$. Denoting by $1_C$ the indicator function of a set $C\in\mathcal{C}$, $\mu_n(C)$ can be rewritten in the form
$$\mu_n(C) = n^{-1}\sum_{i=1}^n 1_C(\xi_i).$$
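To make the formula for $\mu_n(C)$ concrete, here is a minimal Python sketch that evaluates the empirical measure of a lower left orthant and of a closed Euclidean ball for a fixed sample in $\mathbb{R}^2$. The function names are hypothetical and chosen only for this illustration. Passing sets as indicator functions is likewise an assumption of the sketch, not part of the text.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Indicator = Callable[[Point], bool]


def empirical_measure(sample: Sequence[Point], indicator: Indicator) -> float:
    """mu_n(C) = n^{-1} * sum_{i=1}^n 1_C(x_i) for the set C described by `indicator`."""
    if not sample:
        raise ValueError("the empirical measure needs at least one observation")
    return sum(1 for x in sample if indicator(x)) / len(sample)


def lower_left_orthant(t: Point) -> Indicator:
    """Indicator of (-inf, t] = {x : x_j <= t_j for every coordinate j}."""
    return lambda x: all(xj <= tj for xj, tj in zip(x, t))


def closed_ball(center: Point, radius: float) -> Indicator:
    """Indicator of the closed Euclidean ball with the given center and radius."""
    return lambda x: math.dist(x, center) <= radius


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 2.0), (0.5, 0.5), (-1.0, 3.0)]
    print(empirical_measure(sample, lower_left_orthant((0.5, 0.5))))  # 2 of 4 points -> 0.5
    print(empirical_measure(sample, closed_ball((0.0, 0.0), 2.3)))    # 3 of 4 points -> 0.75
```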
Now, the random variables $1_C(\xi_i)$, $i=1,2,\dots$, are again i.i.d. with common mean $\mu(C)$ and variance $\mu(C)(1-\mu(C))$. Classical probability theory therefore yields the following three facts.

(2) (Strong Law of Large Numbers): For each fixed $C\in\mathcal{C}$ one has $\mu_n(C)\to\mu(C)$ $\mathbb{P}$-almost surely ($\mathbb{P}$-a.s.) as $n$ tends to infinity.

(3) (Central Limit Theorem): For each fixed $C\in\mathcal{C}$ one has $n^{1/2}(\mu_n(C)-\mu(C))\xrightarrow{\mathcal{L}}G_\mu(C)$ as $n$ tends to infinity, where $G_\mu(C)$ is a random variable with $\mathcal{L}\{G_\mu(C)\}=N(0,\mu(C)(1-\mu(C)))$.

(4) (Multidimensional Central Limit Theorem): For any finitely many $C_1,\dots,C_k\in\mathcal{C}$ one has
$$\{n^{1/2}(\mu_n(C_j)-\mu(C_j)): j=1,\dots,k\}\xrightarrow{\mathcal{L}}\{G_\mu(C_j): j=1,\dots,k\}$$
as $n$ tends to infinity. Here $G_\mu=(G_\mu(C))_{C\in\mathcal{C}}$ is a mean-zero Gaussian process with covariance structure $\operatorname{cov}(G_\mu(C),G_\mu(D))=\mu(C\cap D)-\mu(C)\mu(D)$, $C,D\in\mathcal{C}$. According to Kolmogorov's theorem (cf. Gaenssler-Stute (1977), 7.1.16), $G_\mu$ is viewed as a random element in $(\mathbb{R}^{\mathcal{C}},\mathbb{B}^{\mathcal{C}})$. In this notation, $\mathbb{B}^{\mathcal{C}}\equiv\bigotimes_{C\in\mathcal{C}}\mathbb{B}$ denotes the product σ-algebra in $\mathbb{R}^{\mathcal{C}}$ of identical components $\mathbb{B}$, and $\mathbb{B}$ is the σ-algebra of Borel sets in $\mathbb{R}$.

In these lectures we are going to present uniform analogues of (2), with the uniformity being in $C\in\mathcal{C}$, known as GLIVENKO-CANTELLI THEOREMS (Section 2). We also present functional versions of (4), the so-called FUNCTIONAL CENTRAL LIMIT THEOREMS (Section 4). An appropriate setting for the latter is developed in Section 3, which might also be of independent interest. First we want to give some insight into some more or less known

STRUCTURAL PROPERTIES OF EMPIRICAL MEASURES: For this, consider instead of $\mu_n$ the counting process
$$N_n(B) := n\mu_n(B),\quad B\in\mathcal{B}.$$
Note that $\mathcal{L}\{N_n(B)\}=\mathrm{Bin}(n,\mu(B))$, i.e.,
$$\mathbb{P}(N_n(B)=j)=\binom{n}{j}\mu(B)^j(1-\mu(B))^{n-j},\quad j=0,1,\dots,n.$$
j=O,l,...,n). The following Markov and Martingale properties associated with empirical measures are well known; since however specific references are not conveniently available, and especially not in the set-indexed context of these lectures, we present detailed derivatives. LEMMA 1 (MARKOV PROPERTY). For any B.eB such that for
D. := B.\B
0 ^ m ^ . . .^m, . ύ ΠL ^ n
Proofs
with
R + 1
= X
with
, μ(D.)>0, i=l,...,k+l, and for any m. E {0,1,... ,n}
one has
3P(Nn(Bk)=mk|Nn(B1)=m1,...,Nn(Bk_1)=mk_1)
"
a
0 = B Q C B^. . .CBJ^C B R C B
•
_ _
-: —
9 sa
y 9 where
=P(Nn(D1)=m1, 1
n 1 (m 2 -m 1 )!...(m k -m k _ 1 )!(n-m k )! m. 1
3
whence
— = b
(
"-\-l)!
(mk-\i>!<
u(D k )
m_-m. . x 2 1
/Γ
,_
m. .-m. x k-1 κ-2
PETER GAENSSLER
proving equality of
the first and third term in the assertion of the lemma; the other equalityfollows in the same way by just taking B 1 9 ...,B J.
out of consideration.
D
COROLLARY. Let $\mathcal{C}$ be a subclass of $\mathcal{B}$ which is linearly ordered by inclusion. Then $(N_n(C))_{C\in\mathcal{C}}$ is a Markov process.
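The Markov property of Lemma 1 can be checked numerically. The sketch below is hypothetical helper code, not taken from the text. It assumes a finite partition $D_1,\dots,D_{k+1}$ of $X$ with given masses, computes the joint law of $(N_n(B_1),\dots,N_n(B_k))$ exactly from the multinomial distribution of the cell counts $N_n(D_i)$, and then compares the three terms of the lemma.

```python
from fractions import Fraction
from math import comb, factorial
from typing import List


def multinomial_pmf(counts: List[int], probs: List[Fraction]) -> Fraction:
    """Exact probability of the cell counts `counts` for sum(counts) multinomial trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # successive division keeps the coefficient an integer
    value = Fraction(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def joint_levels(n: int, probs: List[Fraction], levels: List[int]) -> Fraction:
    """P(N_n(B_1) = m_1, ..., N_n(B_j) = m_j) with B_j = D_1 u ... u D_j;
    probs[i] = mu(D_{i+1}) are the masses of the partition D_1, ..., D_{k+1} of X."""
    j = len(levels)
    increments = [levels[0]] + [levels[i] - levels[i - 1] for i in range(1, j)]
    if min(increments) < 0 or levels[-1] > n:
        return Fraction(0)
    rest = 1 - sum(probs[:j])  # mu(X minus B_j)
    return multinomial_pmf(increments + [n - levels[-1]], probs[:j] + [rest])


def given_all_past(n: int, probs: List[Fraction], levels: List[int]) -> Fraction:
    """First term of Lemma 1: condition on N_n(B_1), ..., N_n(B_{k-1})."""
    return joint_levels(n, probs, levels) / joint_levels(n, probs, levels[:-1])


def given_last(n: int, probs: List[Fraction], levels: List[int]) -> Fraction:
    """Second term of Lemma 1: condition on N_n(B_{k-1}) only (D_1, ..., D_{k-1} merged)."""
    k = len(levels)
    merged = [sum(probs[:k - 1])] + probs[k - 1:]
    pair = [levels[-2], levels[-1]]
    return joint_levels(n, merged, pair) / joint_levels(n, merged, pair[:1])


def lemma1_formula(n: int, probs: List[Fraction], levels: List[int]) -> Fraction:
    """Third term of Lemma 1: binomial with success rate mu(D_k) / mu(X minus B_{k-1})."""
    k = len(levels)
    m_prev, m_k = levels[-2], levels[-1]
    ratio = probs[k - 1] / (1 - sum(probs[:k - 1]))
    return comb(n - m_prev, m_k - m_prev) * ratio ** (m_k - m_prev) * (1 - ratio) ** (n - m_k)


if __name__ == "__main__":
    n = 6
    probs = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 4), Fraction(1, 4)]  # k = 3
    levels = [1, 3, 4]
    # expected: 3/8 3/8 3/8
    print(given_all_past(n, probs, levels), given_last(n, probs, levels), lemma1_formula(n, probs, levels))
```

Exact rational arithmetic from the `fractions` module is used so that the three terms can be compared with `==` rather than up to a tolerance.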
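LEMMA 2. Let $B\in\mathcal{B}$ be arbitrary but fixed such that $0<\mu(B)<1$, and let $\mathcal{C}\equiv\mathcal{C}(B)\subset\mathcal{B}$ be linearly ordered by inclusion with $B$ as its smallest element. Then for $0\le m\le n$
$$\mathcal{L}\{(N_n(\bar B))_{\bar B\in\mathcal{C}}\mid N_n(B)=m\}=\mathcal{L}\{(m+\tilde N_{n-m}(\bar B))_{\bar B\in\mathcal{C}}\},$$
where $\tilde N_{n-m}(D):=(n-m)\tilde\mu_{n-m}(D)$. Here $\tilde\mu_{n-m}$ is the empirical measure pertaining to i.i.d. random elements $\tilde\xi_i$ in $(X,\mathcal{B})$ with $\mathcal{L}\{\tilde\xi_i\}=\tilde\mu$ and
$$\tilde\mu(D):=\frac{\mu(D\cap\complement B)}{\mu(\complement B)}\quad\text{for } D\in\mathcal{B}.$$
(Here the laws $\mathcal{L}\{\dots\}$ are considered to be defined on the product σ-algebra in $\mathbb{R}^{\mathcal{C}}$, and $\complement B$ denotes the complement of $B$ in $X$.)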
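Proof. It suffices to show that the finite-dimensional marginal distributions coincide.

1) As to the one-dimensional marginal distributions, let $\bar B\in\mathcal{C}$ with $B\subset\bar B$ be arbitrary but fixed. It follows from Lemma 1 that for $k\ge m$
$$\mathbb{P}(N_n(\bar B)=k\mid N_n(B)=m)=\binom{n-m}{k-m}\Big(\frac{\mu(\bar B\setminus B)}{\mu(\complement B)}\Big)^{k-m}\Big(1-\frac{\mu(\bar B\setminus B)}{\mu(\complement B)}\Big)^{n-k}.$$
On the other hand, $\tilde\mu(B)=0$, whence $1_{\bar B}(\tilde\xi_i)=1_{\bar B\setminus B}(\tilde\xi_i)$ $\mathbb{P}$-a.s. Therefore $\tilde N_{n-m}(\bar B)\overset{\mathcal{L}}{=}\tilde N_{n-m}(\bar B\setminus B)$ for any $\bar B\in\mathcal{B}$ with $B\subset\bar B$, where $\overset{\mathcal{L}}{=}$ means equality in law. One obtains that
$$\binom{n-m}{k-m}\Big(\frac{\mu(\bar B\setminus B)}{\mu(\complement B)}\Big)^{k-m}\Big(1-\frac{\mu(\bar B\setminus B)}{\mu(\complement B)}\Big)^{n-k}=\mathbb{P}(\tilde N_{n-m}(\bar B\setminus B)=k-m)=\mathbb{P}(\tilde N_{n-m}(\bar B)=k-m)=\mathbb{P}(m+\tilde N_{n-m}(\bar B)=k).$$
This proves the coincidence of the one-dimensional marginal distributions.

2) As to higher-dimensional marginal distributions, let us consider for simplicity the two-dimensional case; the general case runs in the same way. Let $B_i\in\mathcal{C}$, $i=1,2$, with $B\subset B_1\subset B_2$ be arbitrary but fixed. Then for $0\le m\le k_1\le k_2\le n$
$$\mathbb{P}(N_n(B_1)=k_1,N_n(B_2)=k_2\mid N_n(B)=m)=\frac{\mathbb{P}(N_n(B)=m,N_n(B_1)=k_1,N_n(B_2)=k_2)}{\mathbb{P}(N_n(B)=m)}=:\frac{a}{b},\ \text{say},$$
where
$$a=\mathbb{P}(N_n(B)=m,N_n(B_1\setminus B)=k_1-m,N_n(B_2\setminus B_1)=k_2-k_1,N_n(\complement B_2)=n-k_2)$$
$$=\frac{n!}{m!(k_1-m)!(k_2-k_1)!(n-k_2)!}\,\mu(B)^m\mu(B_1\setminus B)^{k_1-m}\mu(B_2\setminus B_1)^{k_2-k_1}\mu(\complement B_2)^{n-k_2}$$
and $b=\binom{n}{m}\mu(B)^m(1-\mu(B))^{n-m}$. Hence
$$\frac{a}{b}=\frac{(n-m)!}{(k_1-m)!(k_2-k_1)!(n-k_2)!}\Big(\frac{\mu(B_1\setminus B)}{\mu(\complement B)}\Big)^{k_1-m}\Big(\frac{\mu(B_2\setminus B_1)}{\mu(\complement B)}\Big)^{k_2-k_1}\Big(\frac{\mu(\complement B_2)}{\mu(\complement B)}\Big)^{n-k_2}$$
$$=\mathbb{P}(\tilde N_{n-m}(B_1)=k_1-m,\tilde N_{n-m}(B_2)=k_2-m)=\mathbb{P}(m+\tilde N_{n-m}(B_1)=k_1,\ m+\tilde N_{n-m}(B_2)=k_2).\ \square$$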
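A similar exact check applies to the one-dimensional case of Lemma 2 and to the conditional-mean identity on which the martingale property (Lemma 3) rests. In this hypothetical sketch, `mu_b` stands for $\mu(B)$ and `mu_ring` for $\mu(\bar B\setminus B)$. Both are assumed to be given rational numbers with $0<\mu(B)$ and $\mu(B)+\mu(\bar B\setminus B)<1$.

```python
from fractions import Fraction
from math import comb


def binom_pmf(n: int, p: Fraction, j: int) -> Fraction:
    """P(Bin(n, p) = j), returning 0 outside {0, ..., n}."""
    if j < 0 or j > n:
        return Fraction(0)
    return comb(n, j) * p ** j * (1 - p) ** (n - j)


def conditional_direct(n: int, mu_b: Fraction, mu_ring: Fraction, m: int, k: int) -> Fraction:
    """P(N_n(Bbar) = k | N_n(B) = m) from the trinomial law of the counts of
    B, Bbar minus B, and X minus Bbar."""
    if not 0 <= m <= k <= n:
        return Fraction(0)
    mu_rest = 1 - mu_b - mu_ring
    joint = comb(n, m) * comb(n - m, k - m) * mu_b ** m * mu_ring ** (k - m) * mu_rest ** (n - k)
    return joint / binom_pmf(n, mu_b, m)


def conditional_lemma2(n: int, mu_b: Fraction, mu_ring: Fraction, m: int, k: int) -> Fraction:
    """P(m + Ntilde_{n-m}(Bbar) = k), where Ntilde_{n-m}(Bbar) ~ Bin(n - m, mu_ring / (1 - mu_b))."""
    return binom_pmf(n - m, mu_ring / (1 - mu_b), k - m)


def conditional_mean_of_martingale(n: int, mu_b: Fraction, mu_ring: Fraction, m: int) -> Fraction:
    """E[(N_n(Bbar) - n mu(Bbar)) / mu(X minus Bbar) | N_n(B) = m], computed via Lemma 2."""
    mu_bbar = mu_b + mu_ring
    return sum(conditional_lemma2(n, mu_b, mu_ring, m, k) * (k - n * mu_bbar) / (1 - mu_bbar)
               for k in range(n + 1))


if __name__ == "__main__":
    n, mu_b, mu_ring = 7, Fraction(1, 3), Fraction(1, 4)
    print(conditional_mean_of_martingale(n, mu_b, mu_ring, 2))  # (2 - 7/3) / (2/3) = -1/2
```

The last function reproduces, for each value $m$ of $N_n(B)$, the value $(m-n\mu(B))/\mu(\complement B)$ obtained in the algebraic computation of the martingale property.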
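LEMMA 3 (MARTINGALE PROPERTY). Let $\mathcal{C}\subset\mathcal{B}$ be linearly ordered by inclusion such that $\mu(\complement B)>0$ for all $B\in\mathcal{C}$. Then, for each fixed $n$,
$$\Big(\frac{N_n(B)-n\mu(B)}{\mu(\complement B)}\Big)_{B\in\mathcal{C}}$$
is a martingale. That is, for each $B,\bar B\in\mathcal{C}$ with $B\subset\bar B$ one has
$$\mathbb{E}\Big(\frac{N_n(\bar B)-n\mu(\bar B)}{\mu(\complement\bar B)}\,\Big|\,N_n(D): D\in\mathcal{C},\ D\subset B\Big)=\frac{N_n(B)-n\mu(B)}{\mu(\complement B)}\quad\mathbb{P}\text{-a.s.}$$

Proof. Since $(N_n(C))_{C\in\mathcal{C}}$ is a Markov process (cf. the Corollary to Lemma 1), it follows that
$$\mathbb{E}\Big(\frac{N_n(\bar B)-n\mu(\bar B)}{\mu(\complement\bar B)}\,\Big|\,N_n(D): D\in\mathcal{C},\ D\subset B\Big)=\mathbb{E}\Big(\frac{N_n(\bar B)-n\mu(\bar B)}{\mu(\complement\bar B)}\,\Big|\,N_n(B)\Big)\quad\mathbb{P}\text{-a.s.}$$
For all $\omega\in\{N_n(B)=m\}$, $m=0,1,\dots,n$, Lemma 2 gives
$$\mathbb{E}\Big(\frac{N_n(\bar B)-n\mu(\bar B)}{\mu(\complement\bar B)}\,\Big|\,N_n(B)\Big)(\omega)=\mathbb{E}\Big(\frac{m+\tilde N_{n-m}(\bar B\setminus B)-n\mu(\bar B)}{\mu(\complement\bar B)}\Big)=\frac{m+(n-m)\frac{\mu(\bar B\setminus B)}{\mu(\complement B)}-n\mu(\bar B)}{\mu(\complement\bar B)}$$
$$=\frac{m\mu(\complement B)+(n-m)(\mu(\bar B)-\mu(B))-n\mu(\bar B)\mu(\complement B)}{\mu(\complement B)\mu(\complement\bar B)}=\frac{m-n\mu(B)-m\mu(\bar B)+n\mu(B)\mu(\bar B)}{\mu(\complement B)\mu(\complement\bar B)}$$
$$=\frac{(1-\mu(\bar B))(m-n\mu(B))}{\mu(\complement B)\mu(\complement\bar B)}=\frac{m-n\mu(B)}{\mu(\complement B)}.$$
Hence the conditional expectation equals $(N_n(B)-n\mu(B))/\mu(\complement B)$ $\mathbb{P}$-a.s. □

Let us make at this point a remark concerning the covariance structure of $(N_n(B))_{B\in\mathcal{B}}$, supplementing the properties (2)-(4) above: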
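It is easy to check that for any $B_i\in\mathcal{B}$, $i=1,2$,
$$\mathbb{E}(N_n(B_1)N_n(B_2))=n\mu(B_1\cap B_2)+n(n-1)\mu(B_1)\mu(B_2),\tag{5}$$
whence $\mathbb{E}(N_n(B_1)N_n(B_2))=n(n-1)\mu(B_1)\mu(B_2)$ if $B_1\cap B_2=\emptyset$. Together with $\mathbb{E}(N_n(B_1))\mathbb{E}(N_n(B_2))=n^2\mu(B_1)\mu(B_2)$ this yields
$$\operatorname{cov}(N_n(B_1),N_n(B_2))=-n\mu(B_1)\mu(B_2)\ne0\quad\text{if } B_1\cap B_2=\emptyset \text{ and } \mu(B_i)>0,\ i=1,2.$$
Therefore $B_1\cap B_2=\emptyset$ does not imply that $N_n(B_1)$ and $N_n(B_2)$ are independent. (For the uniform empirical process, to be considered later, this implies that it is not a process with independent increments.) This situation changes if one considers instead the following

(6) POISSONIZATION: Let $\nu$ be a Poisson random variable with parameter $\lambda$, defined on the same p-space as the $\xi_i$'s. For $B\in\mathcal{B}$ let
$$M(B)\equiv M(B,\omega):=\sum_{i=1}^{\nu(\omega)}1_B(\xi_i(\omega)),\quad\omega\in\Omega.$$
Assume that $\nu$ is independent of the sequence $(\xi_i)_{i\in\mathbb{N}}$. Then, for any pairwise disjoint $B_j\in\mathcal{B}$, $j=1,\dots,s$, the random variables $M(B_j)$, $j=1,\dots,s$, are independent. Furthermore, for any $B\in\mathcal{B}$ and any $k\in\{0,1,2,\dots\}$ one has
$$\mathbb{P}(M(B)=k)=\frac{(\lambda\mu(B))^k}{k!}\exp(-\lambda\mu(B)).$$

Proof. Let us prove the last statement first:
$$\mathbb{P}(M(B)=k)=\mathbb{P}\Big(\bigcup_{\ell\ge k}\Big\{\sum_{i=1}^{\ell}1_B(\xi_i)=k,\ \nu=\ell\Big\}\Big)=\sum_{\ell\ge k}\mathbb{P}\Big(\sum_{i=1}^{\ell}1_B(\xi_i)=k\Big)\mathbb{P}(\nu=\ell)$$
$$=\sum_{\ell\ge k}\binom{\ell}{k}\mu(B)^k(1-\mu(B))^{\ell-k}\frac{\lambda^\ell}{\ell!}\exp(-\lambda)=\frac{(\lambda\mu(B))^k}{k!}\exp(-\lambda)\sum_{m\ge0}\frac{(\lambda(1-\mu(B)))^m}{m!}=\frac{(\lambda\mu(B))^k}{k!}\exp(-\lambda\mu(B)).$$

As to the independence assertion, let $B_{s+1}:=\complement(\bigcup_{j=1}^sB_j)$ and $k:=\sum_{j=1}^sk_j$, and put $k_{s+1}:=\ell-k$ for $\ell\ge k$. Then
$$\mathbb{P}(M(B_j)=k_j,\ j=1,\dots,s)=\mathbb{P}\Big(\bigcup_{\ell\ge k}\Big\{\sum_{i=1}^{\ell}1_{B_j}(\xi_i)=k_j,\ j=1,\dots,s+1;\ \nu=\ell\Big\}\Big)=\sum_{\ell\ge k}\mathbb{P}\Big(\sum_{i=1}^{\ell}1_{B_j}(\xi_i)=k_j,\ j=1,\dots,s+1\Big)\mathbb{P}(\nu=\ell)$$
$$=\sum_{\ell\ge k}\frac{\ell!}{k_1!\cdots k_{s+1}!}\prod_{j=1}^{s+1}\mu(B_j)^{k_j}\frac{\lambda^\ell}{\ell!}\exp(-\lambda)=\prod_{j=1}^s\frac{(\lambda\mu(B_j))^{k_j}}{k_j!}\cdot\exp(-\lambda)\sum_{m\ge0}\frac{(\lambda\mu(B_{s+1}))^m}{m!}=:(*).$$
Here
$$\exp(-\lambda)\sum_{m\ge0}\frac{(\lambda\mu(B_{s+1}))^m}{m!}=\exp(-\lambda)\exp(\lambda\mu(B_{s+1}))=\exp\Big(-\lambda\sum_{j=1}^s\mu(B_j)\Big),$$
since $1-\mu(B_{s+1})=\sum_{j=1}^s\mu(B_j)$. Therefore
$$(*)=\prod_{j=1}^s\frac{(\lambda\mu(B_j))^{k_j}}{k_j!}\exp(-\lambda\mu(B_j)),$$
which, by the first part of the proof, is the product of the marginal probabilities $\mathbb{P}(M(B_j)=k_j)$. □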
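The contrast between (5) and the Poissonization (6) can be illustrated numerically. The following sketch uses hypothetical helper names and assumed masses. It computes $\mathbb{E}(N_n(B_1)N_n(B_2))$ exactly from the multinomial law of the four cells $B_1\cap B_2$, $B_1\setminus B_2$, $B_2\setminus B_1$ and $X\setminus(B_1\cup B_2)$. It then evaluates the series from the proof of (6) after truncating it at a finite number of terms; the truncation is an approximation introduced here, not part of the text.

```python
from fractions import Fraction
from itertools import product
from math import exp, factorial, isclose
from typing import List, Sequence


def multinomial_pmf(counts: List[int], probs: List[Fraction]) -> Fraction:
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    value = Fraction(coef)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value


def second_moment(n: int, p_both: Fraction, p_only1: Fraction, p_only2: Fraction) -> Fraction:
    """E(N_n(B1) N_n(B2)) from the multinomial law of the counts of
    B1 n B2, B1 minus B2, B2 minus B1 and the rest of X."""
    p_rest = 1 - p_both - p_only1 - p_only2
    total = Fraction(0)
    for a, b, c in product(range(n + 1), repeat=3):
        if a + b + c <= n:
            prob = multinomial_pmf([a, b, c, n - a - b - c], [p_both, p_only1, p_only2, p_rest])
            total += (a + b) * (a + c) * prob  # N_n(B1) = a + b, N_n(B2) = a + c
    return total


def poisson_pmf(lam: float, k: int) -> float:
    return exp(-lam) * lam ** k / factorial(k)


def poissonized_joint(lam: float, masses: Sequence[float], ks: Sequence[int], terms: int = 150) -> float:
    """P(M(B_1) = k_1, ..., M(B_s) = k_s) for pairwise disjoint B_j with mu(B_j) = masses[j],
    summing the series of the proof over l < terms (truncated)."""
    k = sum(ks)
    cells = list(masses) + [1.0 - sum(masses)]  # last cell: complement of the union
    total, weight = 0.0, exp(-lam)              # weight = P(nu = l), updated recursively
    for l in range(terms):
        if l > 0:
            weight *= lam / l
        if l < k:
            continue
        counts = list(ks) + [l - k]
        coef = factorial(l)
        for c in counts:
            coef //= factorial(c)
        term = float(coef)
        for c, p in zip(counts, cells):
            term *= p ** c
        total += term * weight
    return total


if __name__ == "__main__":
    p1, p2 = Fraction(1, 4), Fraction(1, 3)
    print(second_moment(5, Fraction(0), p1, p2) - (5 * p1) * (5 * p2))  # cov = -5/12
    print(poissonized_joint(2.5, [0.2, 0.35], [1, 2]), poisson_pmf(0.5, 1) * poisson_pmf(0.875, 2))
```

For disjoint $B_1,B_2$ with masses $1/4$ and $1/3$ and $n=5$ the covariance equals $-5/12=-n\mu(B_1)\mu(B_2)$. The Poissonized joint probabilities, by contrast, factorize into Poisson probabilities up to floating-point error.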
Later we will consider, for a given $\mathcal{C}\subset\mathcal{B}$, the so-called EMPIRICAL $\mathcal{C}$-PROCESS $\beta_n\equiv(\beta_n(C))_{C\in\mathcal{C}}$ defined by
$$\beta_n(C):=n^{1/2}(\mu_n(C)-\mu(C)),\quad C\in\mathcal{C}.$$
Using (5) one obtains
$$\operatorname{cov}(\beta_n(C_1),\beta_n(C_2))=\mu(C_1\cap C_2)-\mu(C_1)\mu(C_2),\quad C_1,C_2\in\mathcal{C}.$$
Furthermore,
$$n^{1/2}(\beta_n(C_1)-\beta_n(C_2))=\sum_{i=1}^n\eta_i(C_1,C_2)$$
with
$$\eta_i\equiv\eta_i(C_1,C_2):=1_{C_1}(\xi_i)-1_{C_2}(\xi_i)-(\mu(C_1)-\mu(C_2)).$$
The $\eta_i$ are independent and identically distributed with $\mathbb{E}(\eta_i)=0$, $|\eta_i|\le2$ and
$$\operatorname{Var}(\eta_i)=\mu(C_1\triangle C_2)-(\mu(C_1)-\mu(C_2))^2\le\mu(C_1\triangle C_2).$$
Hence the following Bernstein-type inequality applies (cf. G. Bennett (1962)):

(7) Let $\eta_1,\eta_2,\dots$ be a sequence of independent random variables with $\mathbb{E}(\eta_i)=0$ and $\operatorname{Var}(\eta_i)=\sigma_i^2$, and suppose that $\sup_i|\eta_i|\le M$ for some constant $0<M<\infty$. Let $S_n:=\sum_{i=1}^n\eta_i$ and $\tau_n^2:=\sum_{i=1}^n\sigma_i^2$. Then for all $n$ and $\varepsilon>0$
$$\mathbb{P}(S_n\ge\varepsilon)\le\exp\Big(-\frac{\varepsilon^2/2}{\tau_n^2+\varepsilon M/3}\Big).$$
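From (7), applied to $\pm S_n$ with $\varepsilon=n^{1/2}a$, we obtain immediately

LEMMA 4. For every $n$ and $a>0$ one has:

(i) for any $C_i\in\mathcal{C}$, $i=1,2$,
$$\mathbb{P}(|\beta_n(C_1)-\beta_n(C_2)|\ge a)\le2\exp\Big(-\frac{na^2}{2n\mu(C_1\triangle C_2)+4n^{1/2}a/3}\Big);$$

(ii) for any $C\in\mathcal{C}$ (take $\eta_i:=1_C(\xi_i)-\mu(C)$, so that $M=1$ and $\tau_n^2\le n\mu(C)$),
$$\mathbb{P}(|\beta_n(C)|\ge a)\le2\exp\Big(-\frac{na^2}{2n\mu(C)+2n^{1/2}a/3}\Big).$$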
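Since $N_n(C)=n\mu_n(C)$ is $\mathrm{Bin}(n,\mu(C))$-distributed, the bound in Lemma 4(ii) can be compared with the exact tail probability. The sketch below assumes $\mu(C)=p$ is known and evaluates both sides of (ii) as displayed above, i.e. the form obtained from (7) with $M=1$ and $\tau_n^2\le n\mu(C)$. The function names are hypothetical.

```python
from math import comb, exp, sqrt


def exact_tail(n: int, p: float, a: float) -> float:
    """P(|beta_n(C)| >= a), where N_n(C) ~ Bin(n, p) and beta_n(C) = (N_n(C) - n p) / sqrt(n)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(n + 1) if abs(j - n * p) >= a * sqrt(n))


def bernstein_bound(n: int, p: float, a: float) -> float:
    """Right-hand side of Lemma 4(ii): 2 exp(-n a^2 / (2 n mu(C) + 2 n^{1/2} a / 3))."""
    return 2 * exp(-n * a ** 2 / (2 * n * p + 2 * sqrt(n) * a / 3))


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        print(a, exact_tail(50, 0.3, a), bernstein_bound(50, 0.3, a))
```

For small $a$ the bound can exceed 1 and is then uninformative. Its value lies in the exponential decay in $a^2$, uniformly over all $C$ with $\mu(C)$ bounded.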
We conclude this section with a further fundamental property concerning the so-called EMPIRICAL $\mathcal{C}$-DISCREPANCY
$$D_n(\mathcal{C},\mu):=\sup_{C\in\mathcal{C}}|\mu_n(C)-\mu(C)|.$$
In what follows we shall write $\|\mu_n-\mu\|_{\mathcal{C}}$ instead of $D_n(\mathcal{C},\mu)$. We assume that $\|\mu_n-\mu\|_{\mathcal{C}}$ is a random variable (i.e., $\mathcal{F},\mathbb{B}$-measurable). Then:

LEMMA 5. $(\|\mu_n-\mu\|_{\mathcal{C}})_{n\in\mathbb{N}}$ is a REVERSED SUBMARTINGALE w.r.t. the sequence of σ-fields
$$\mathcal{G}_n:=\sigma(\{\mu_n(B),\mu_{n+1}(B),\dots: B\in\mathcal{B}\}),\quad n\in\mathbb{N}.$$
This means that for each $m,n$ with $m\le n$ one has
$$\mathbb{E}(\|\mu_m-\mu\|_{\mathcal{C}}\mid\mathcal{G}_n)\ge\|\mu_n-\mu\|_{\mathcal{C}}\quad\mathbb{P}\text{-a.s.}$$

Proof. As shown in Gaenssler-Stute (1977), 6.5.5(c), for each $C\in\mathcal{C}$ the process $(\mu_n(C)-\mu(C))_{n\in\mathbb{N}}$ is a REVERSED MARTINGALE w.r.t. $(\mathcal{G}_n)_{n\in\mathbb{N}}$. That is, for each $m,n$ with $m\le n$ one has $\mathbb{E}((\mu_m(C)-\mu(C))\mid\mathcal{G}_n)=\mu_n(C)-\mu(C)$. Therefore
$$\mathbb{E}\Big(\sup_{C\in\mathcal{C}}|\mu_m(C)-\mu(C)|\,\Big|\,\mathcal{G}_n\Big)\ge\sup_{C\in\mathcal{C}}|\mathbb{E}((\mu_m(C)-\mu(C))\mid\mathcal{G}_n)|=\sup_{C\in\mathcal{C}}|\mu_n(C)-\mu(C)|\quad\mathbb{P}\text{-a.s.}\ \square$$

Now, as in the case of submartingales, there holds an analogous CONVERGENCE THEOREM FOR REVERSED SUBMARTINGALES (cf. Gaenssler-Stute (1977), 6.5.10). It states the following: let $(T_n,\mathcal{G}_n)_{n\in\mathbb{N}}$ be a reversed submartingale w.r.t. a monotone decreasing sequence $(\mathcal{G}_n)_{n\in\mathbb{N}}$ of sub-σ-fields of $\mathcal{F}$ (on some p-space $(\Omega,\mathcal{F},\mathbb{P})$), satisfying the condition $\inf_n\mathbb{E}(T_n)>-\infty$. Then there exists an integrable random variable $T_\infty$ such that $T_n\to T_\infty$ $\mathbb{P}$-a.s. and in the mean. From this and Lemma 5 one obtains a rather simple proof of the following result (cf. D. Pollard (1981)). In a similar form, it was one of the main results in Steele's paper (cf. M. Steele (1978)), where it was proved with different methods based on the ergodic theory of subadditive stochastic processes.

LEMMA 6. Let $(\nu_n)_{n\in\mathbb{N}}$ be an arbitrary sequence of non-negative integer valued random variables on $(\Omega,\mathcal{F},\mathbb{P})$ such that $\nu_n\xrightarrow{\mathbb{P}}\infty$. Here $\xrightarrow{\mathbb{P}}$ denotes convergence in probability; also here and in the following, all statements about convergence are understood to hold as $n$ tends to infinity. Then
$$\|\mu_n-\mu\|_{\mathcal{C}}\to0\ \mathbb{P}\text{-a.s.}\quad\text{iff}\quad\|\mu_{\nu_n}-\mu\|_{\mathcal{C}}\xrightarrow{\mathbb{P}}0;$$
in particular,
$$\|\mu_n-\mu\|_{\mathcal{C}}\to0\ \mathbb{P}\text{-a.s.}\quad\text{iff}\quad\|\mu_n-\mu\|_{\mathcal{C}}\xrightarrow{\mathbb{P}}0.$$
(Note that, according to our measurability assumption on $\|\mu_n-\mu\|_{\mathcal{C}}$, the RANDOMIZED DISCREPANCY $\|\mu_{\nu_n}-\mu\|_{\mathcal{C}}$ is also a random variable; in fact,
$$\{\omega: \|\mu_{\nu_n(\omega)}(\cdot,\omega)-\mu\|_{\mathcal{C}}\le a\}=\bigcup_{j\in\mathbb{Z}_+}\big(\{\nu_n=j\}\cap\{\|\mu_j-\mu\|_{\mathcal{C}}\le a\}\big)\quad\text{for each } a\ge0.)$$

Proof. 1.) Only-if part: $\nu_n\xrightarrow{\mathbb{P}}\infty$ implies that for any subsequence $(\nu_{n'})$ of $(\nu_n)$ there exists a further subsequence $(\nu_{n''})$ such that $\nu_{n''}\to\infty$ $\mathbb{P}$-a.s. Therefore $\|\mu_{\nu_{n''}}-\mu\|_{\mathcal{C}}\to0$ $\mathbb{P}$-a.s. as $n''$ tends to infinity, whence $\|\mu_{\nu_n}-\mu\|_{\mathcal{C}}\xrightarrow{\mathbb{P}}0$.

2.) If part: According to Lemma 5 the process $(\|\mu_n-\mu\|_{\mathcal{C}},\mathcal{G}_n)_{n\in\mathbb{N}}$ is a reversed submartingale. It is uniformly bounded. Therefore, by the convergence theorem for reversed submartingales mentioned before, there exists an integrable random variable $T_\infty$ such that $\|\mu_n-\mu\|_{\mathcal{C}}\to T_\infty$ $\mathbb{P}$-a.s. From this it follows, as in part 1.) of our proof, that $\|\mu_{\nu_n}-\mu\|_{\mathcal{C}}\xrightarrow{\mathbb{P}}T_\infty$. By assumption, it follows that $T_\infty=0$ $\mathbb{P}$-a.s. □
2. GLIVENKO-CANTELLI convergence: The VAPNIK-CHERVONENKIS theory with some extensions.
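Let us start with the simplest case. Assume that $(\xi_i)_{i\in\mathbb{N}}$ is a sequence of i.i.d. random variables on some p-space $(\Omega,\mathcal{F},\mathbb{P})$ with distribution function (df) $F$. Let $F_n$ be the EMPIRICAL df pertaining to $\xi_1,\dots,\xi_n$, i.e.,
$$F_n(t):=\frac1n\sum_{i=1}^n1_{(-\infty,t]}(\xi_i),\quad t\in\mathbb{R}.$$
Then the classical GLIVENKO-CANTELLI Theorem states:
$$D_n^F:=\sup_{t\in\mathbb{R}}|F_n(t)-F(t)|\to0\quad\mathbb{P}\text{-a.s.}\tag{8}$$
(Note that $D_n^F$ is a random variable, since $D_n^F=\sup_{t\in\mathbb{Q}}|F_n(t)-F(t)|$ by the right-continuity of $F_n$ and $F$, where $\mathbb{Q}$ denotes the rationals.) The proof of (8) usually runs as follows:

a) One shows that (8) holds true if the $\xi_i$'s are uniformly distributed on $(0,1)$.

b) Using a) and the QUANTILE TRANSFORMATION $s\mapsto F^{-1}(s):=\inf\{t\in\mathbb{R}: F(t)\ge s\}$, $s\in(0,1)$, one obtains (8) for the SPECIAL VERSIONS $\tilde\xi_i:=F^{-1}(\eta_i)$. Here the $\eta_i$'s are independent and uniformly distributed on $(0,1)$ and defined on the same p-space as the $\xi_i$'s. Note that $\mathcal{L}\{\tilde\xi_i\}=\mathcal{L}\{\xi_i\}$ for each $i$. Even more, by independence, one has $\mathcal{L}\{(\tilde\xi_1,\tilde\xi_2,\dots)\}=\mathcal{L}\{(\xi_1,\xi_2,\dots)\}$ on $\mathbb{B}^{\mathbb{N}}$.

c) The validity of (8) depends only on the joint law $\mathcal{L}\{(\xi_1,\xi_2,\dots)\}$; reasoning on this fact concludes the proof.

In view of the more general situations we shall consider later on in this section, we want to clarify c) a little more:

c*) (8) claims that
$$\mathbb{P}\Big(\Big\{\omega\in\Omega: \lim_{n\to\infty}\Big(\sup_{t\in\mathbb{R}}\Big|\frac1n\sum_{i=1}^n1_{(-\infty,t]}(\xi_i(\omega))-F(t)\Big|\Big)=0\Big\}\Big)=1.$$
Consider $\xi(\omega):=(\xi_1(\omega),\xi_2(\omega),\dots)\in\mathbb{R}^{\mathbb{N}}$ and put
$$g_n(\xi(\omega)):=\sup_{t\in\mathbb{R}}\Big|\frac1n\sum_{i=1}^n1_{(-\infty,t]}(\xi_i(\omega))-F(t)\Big|.$$
Now, note (and remember) the following fact about the present situation:

(9) $g_n\colon\mathbb{R}^{\mathbb{N}}\to\mathbb{R}$ is $\mathbb{B}^{\mathbb{N}},\mathbb{B}$-measurable,

since
$$g_n(x)=\sup_{t\in\mathbb{Q}}\Big|\frac1n\sum_{i=1}^n1_{(-\infty,t]}(x_i)-F(t)\Big|\quad\text{for } x=(x_1,x_2,\dots).$$
Therefore $A:=\{x\in\mathbb{R}^{\mathbb{N}}: \lim_ng_n(x)=0\}\in\mathbb{B}^{\mathbb{N}}$. Putting $\tilde\xi(\omega):=(\tilde\xi_1(\omega),\tilde\xi_2(\omega),\dots)$, one obtains
$$\mathbb{P}\Big(\Big\{\omega: \lim_{n\to\infty}\Big(\sup_{t\in\mathbb{R}}\Big|\frac1n\sum_{i=1}^n1_{(-\infty,t]}(\xi_i(\omega))-F(t)\Big|\Big)=0\Big\}\Big)=\mathbb{P}(\{\omega: \lim_ng_n(\xi(\omega))=0\})=\mathbb{P}(\{\omega: \xi(\omega)\in A\})$$
$$=\mathcal{L}\{\xi\}(A)\overset{\text{b)}}{=}\mathcal{L}\{\tilde\xi\}(A)=\mathbb{P}(\{\omega: \tilde\xi(\omega)\in A\})=\mathbb{P}\Big(\Big\{\omega: \lim_{n\to\infty}\Big(\sup_{t\in\mathbb{R}}\Big|\frac1n\sum_{i=1}^n1_{(-\infty,t]}(\tilde\xi_i(\omega))-F(t)\Big|\Big)=0\Big\}\Big)\overset{\text{b)}}{=}1.$$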
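Two ingredients of the outline above can be sketched in a few lines of Python; the function names are hypothetical. The first is the exact computation of $D_n^F$ for a continuous df $F$ from the order statistics. The second is the quantile transformation $F^{-1}(s)=\inf\{t: F(t)\ge s\}$ for a discrete df. For the latter, the equivalence $F^{-1}(s)\le t\iff s\le F(t)$, which is the fact behind $\mathcal{L}\{\tilde\xi_i\}=\mathcal{L}\{\xi_i\}$, can be verified exactly.

```python
import random
from fractions import Fraction
from typing import Callable, List, Sequence


def ks_discrepancy(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D_n^F = sup_t |F_n(t) - F(t)| for a continuous df F.

    F_n jumps only at the order statistics x_(1) <= ... <= x_(n), so the supremum equals
    max_i max(i/n - F(x_(i)), F(x_(i)) - (i-1)/n)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def quantile(atoms: List[Fraction], masses: List[Fraction], s: Fraction) -> Fraction:
    """F^{-1}(s) = inf{t : F(t) >= s}, 0 < s < 1, for a discrete df with sorted atoms."""
    cumulative = Fraction(0)
    for atom, mass in zip(atoms, masses):
        cumulative += mass
        if cumulative >= s:
            return atom
    return atoms[-1]


def discrete_cdf(atoms: List[Fraction], masses: List[Fraction], t: Fraction) -> Fraction:
    """F(t) = total mass of the atoms <= t."""
    return sum((m for a, m in zip(atoms, masses) if a <= t), Fraction(0))


if __name__ == "__main__":
    uniform_cdf = lambda t: min(max(t, 0.0), 1.0)
    rng = random.Random(1983)  # fixed seed so that the run is reproducible
    for n in (10, 100, 1000, 10000):
        print(n, ks_discrepancy([rng.random() for _ in range(n)], uniform_cdf))
```

The printed discrepancies for growing $n$ illustrate (8) for uniformly distributed observations, i.e. step a) of the outline.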
Take $(X,\mathcal{B})=(\mathbb{R},\mathbb{B})$ and $\mathcal{C}:=\{(-\infty,t]: t\in\mathbb{R}\}$. In the setting of Section 1, the GLIVENKO-CANTELLI convergence (8) then reads as
$$D_n(\mathcal{C},\mu)\equiv\sup_{C\in\mathcal{C}}|\mu_n(C)-\mu(C)|\to0\quad\mathbb{P}\text{-a.s.}\tag{8*}$$
In more general situations, however, it turns out that (8*) may hold for the empirical measures obtained from one sequence $\xi_1,\xi_2,\dots$ of independent random elements in $(X,\mathcal{B})$, each having distribution $\mu$, but fail for the empirical measures obtained from another such sequence, say $\eta_1,\eta_2,\dots$

EXAMPLE (cf. D. Pollard (1981), Example (5.1)).
Let $(X,\mathcal{B},\mu)$ be a nonatomic p-space (i.e., $\{x\}\in\mathcal{B}$ and $\mu(\{x\})=0$ for all $x\in X$). Suppose that there exists a subset $A$ of $X$ with inner measure $\mu_*(A)=0$ and outer measure $\mu^*(A)=1$ (cf. P.R. Halmos (1969), Section 16, for an example). Let $\mathcal{B}_A:=A\cap\mathcal{B}$ be the trace σ-algebra of $\mathcal{B}$ on $A$. Let $\mu_A$ be the p-measure defined on $\mathcal{B}_A$ by $\mu_A(A\cap B):=\mu(B)$, $B\in\mathcal{B}$; note that $\mu_A$ is well defined since $\mu^*(A)=1$. By the definition of $\mathcal{B}_A$, the embedding $\xi_A$ of $A$ into $X$ is $\mathcal{B}_A,\mathcal{B}$-measurable, since $\xi_A^{-1}(B)=\{x\in A: \xi_A(x)=x\in B\}=A\cap B\in\mathcal{B}_A$ for any $B\in\mathcal{B}$. One has
$$\mu_A(\xi_A^{-1}(B))=\mu(B)\quad\text{for all } B\in\mathcal{B}.$$
Consider the p-space $(\Omega_1,\mathcal{F}_1,\mathbb{P}_1):=(A^{\mathbb{N}},\mathcal{B}_A^{\mathbb{N}},\bigotimes_{\mathbb{N}}\mu_A)$. On it define the random elements $\xi_i\colon\Omega_1\to X$ by $\xi_i:=\xi_A\circ\pi_i$, where $\pi_i\colon A^{\mathbb{N}}\to A$ denotes the $i$-th coordinate projection. By construction, the $\xi_i$'s are independent, each having distribution $\mu$:
$$\mathcal{L}\{\xi_i\}(B)=\mathbb{P}_1(\xi_i^{-1}(B))=\mathbb{P}_1(\pi_i^{-1}(\xi_A^{-1}(B)))=\mu_A(\xi_A^{-1}(B))=\mu(B)\quad\text{for each } B\in\mathcal{B}.$$
Now let $\mathcal{C}$ be the class of all finite subsets of $A$. Then $\mathcal{C}\subset\mathcal{B}$, and $\mu(C)=0$ for all $C\in\mathcal{C}$, since $(X,\mathcal{B},\mu)$ was assumed to be nonatomic. But all the $\xi_i$'s take their values in $A$, so for the empirical measures $\mu_n$ pertaining to $\xi_1,\dots,\xi_n$ one has
$$\sup_{C\in\mathcal{C}}|\mu_n(C)-\mu(C)|\equiv1.$$
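Now take instead the p-space $(\Omega_2,\mathcal{F}_2,\mathbb{P}_2):=((\complement A)^{\mathbb{N}},\mathcal{B}_{\complement A}^{\mathbb{N}},\bigotimes_{\mathbb{N}}\mu_{\complement A})$, with $\mathcal{B}_{\complement A}$ and $\mu_{\complement A}$ defined as above but with $\complement A$ in place of $A$. On it define the random elements $\eta_i\colon\Omega_2\to X$ by $\eta_i:=\xi_{\complement A}\circ\pi_i$. Here $\pi_i\colon(\complement A)^{\mathbb{N}}\to\complement A$ is again the $i$-th coordinate projection, and $\xi_{\complement A}$ denotes the embedding of $\complement A$ into $X$. Noticing that $\mu^*(\complement A)=1$, it follows as before that the $\eta_i$'s are i.i.d. with distribution $\mu$, whence $\mathcal{L}\{(\eta_1,\eta_2,\dots)\}=\mathcal{L}\{(\xi_1,\xi_2,\dots)\}$ on $\mathcal{B}^{\mathbb{N}}$. But for the same class $\mathcal{C}$ as before, the empirical measure $\mu_n'$ pertaining to $\eta_1,\dots,\eta_n$ now satisfies $\mu_n'(C)=0$ for all $C\in\mathcal{C}$, since the $\eta_i$'s take their values in $\complement A$. Hence
$$\sup_{C\in\mathcal{C}}|\mu_n'(C)-\mu(C)|=0.$$
(Note that in both cases $D_n(\mathcal{C},\mu)\equiv\sup_{C\in\mathcal{C}}|\mu_n(C)-\mu(C)|$ is measurable.)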
n
Finally, taking as underlying p-space the CANONICAL MODEL (Ω,A,P) = (X^.B.., x μ )
and on it the coordinate projections ξ., i G U , being again i.i.d. with distribution μ, the above example shows that for the very same class C one gets e.g. , sup|μ (C,x)-μ(C)|
cec for
x.= ( X I J X Λ '
1
^e
{xθiΈ
λ x
= sup \iΛC9x) - l.(x-)
~
cec
λ
"
A 1
» whence, since AφB : sup|μ1(C,x)-μ(C)|
cec
λ
"
= l } = A x X x X x ...
i.e., here - in contrast to (9) (10)
ϋ^gtx)
:
= s u P | μ n (C,x)-μ(C)|
cec
n
-
is not B__ ,β-measurable, JN This indicates already the need for appropriate measurability assumptions to be discussed later.
Let us point out at this stage the usefulness of GLIVENKO-CANTELLI convergence in statistics by giving just one example, concerning CHERNOFF-type estimates of the mode (cf. H. Chernoff (1964) and E.J. Wegman (1971)). For other examples, see P. Gaenssler and J.A. Wellner (1981). For the moment we anticipate the following GLIVENKO-CANTELLI theorem, which will be proved later in this section:
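(11) Let $\xi_1,\xi_2,\dots$ be i.i.d. random vectors on some p-space $(\Omega,\mathcal{F},\mathbb{P})$ with values in $X=\mathbb{R}^k$, $k\ge1$, having distribution $\mu$ on $\mathcal{B}=\mathbb{B}_k$. Let $\mathcal{C}\equiv\mathbb{B}_k^c$ be the class of all closed Euclidean balls in $\mathbb{R}^k$. Then
$$\lim_{n\to\infty}\Big(\sup_{C\in\mathcal{C}}|\mu_n(C)-\mu(C)|\Big)=0\quad\mathbb{P}\text{-a.s.}$$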
Now consider $(X,\mathcal{B})=(\mathbb{R}^k,\mathbb{B}_k)$, $k\ge1$, and suppose that $\mu$ is "unimodal" in the following sense:

(*) There exists a $\theta\in\mathbb{R}^k$ such that for some $\delta_0>0$, $\mu(B^c(\theta,\delta_0))>\mu(B^c(x,\delta_0))$ for all $x\in\mathbb{R}^k$, $x\ne\theta$. Here $B^c(x,\delta)$ denotes the closed Euclidean ball with center $x$ and radius $\delta$.

To find a consistent sequence of estimators for the (unknown) $\theta$, one may proceed as follows. Suppose that radii $r_n$ with $0<\delta_0\le r_n$, $n\in\mathbb{N}$, and $\lim_nr_n=\delta_0$ are given. Given i.i.d. observations $x_i=\xi_i(\omega)$, $i=1,\dots,n$, choose as estimate $\theta_n(\omega)=\theta_n(\xi_1(\omega),\dots,\xi_n(\omega))$ a center of a closed Euclidean ball with radius $r_n$ which covers the largest number of observations, i.e., for which
$$\mu_n(B^c(\theta_n(\omega),r_n),\omega)\ge\mu_n(B^c(x,r_n),\omega)\quad\text{for all } x\in\mathbb{R}^k.\tag{**}$$
The claim is then that $\lim_{n\to\infty}\theta_n=\theta$ $\mathbb{P}$-a.s.
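Proof. Choose $M>0$ such that $\mu(\complement B^c(0,M))<\mu(B^c(\theta,\delta_0))$. According to (11) we have $\mathbb{P}(\Omega_0)=1$ for
$$\Omega_0:=\Big\{\omega\in\Omega: \lim_{n\to\infty}\Big(\sup_{C\in\mathbb{B}_k^c}|\mu_n(C)-\mu(C)|\Big)=0\Big\}.$$
Let $\omega\in\Omega_0$ and suppose that $\theta_n(\omega)\not\to\theta$ as $n\to\infty$. Then either

(1) $\limsup_{n\to\infty}|\theta_n(\omega)|=\infty$,

or

(2) there exists an $x\ne\theta$ such that $\lim_{j\to\infty}\theta_{n_j}(\omega)=x$ for some subsequence $(n_j)$ of $\mathbb{N}$.

We will show that both (1) and (2) lead to a contradiction.

ad (1): $\limsup_n|\theta_n(\omega)|=\infty$ implies that there exists some subsequence $(n_j)$ of $\mathbb{N}$ such that $B^c(\theta_{n_j}(\omega),r_{n_j})\subset\complement B^c(0,M)$ for all $j$. Hence
$$\limsup_{j\to\infty}\mu_{n_j}(B^c(\theta_{n_j}(\omega),r_{n_j}),\omega)\le\lim_{j\to\infty}\mu_{n_j}(\complement B^c(0,M),\omega)=1-\lim_{j\to\infty}\mu_{n_j}(B^c(0,M),\omega)=1-\mu(B^c(0,M))=\mu(\complement B^c(0,M))$$
$$<\mu(B^c(\theta,\delta_0))=\lim_{n\to\infty}\mu_n(B^c(\theta,\delta_0),\omega)\le\limsup_{j\to\infty}\mu_{n_j}(B^c(\theta,r_{n_j}),\omega),$$
which contradicts the choice of $\theta_n$ according to (**).

ad (2): $\lim_j\theta_{n_j}(\omega)=x\ne\theta$ implies that
$$\liminf_{j\to\infty}\mu_{n_j}(B^c(\theta_{n_j}(\omega),r_{n_j}),\omega)\le\underbrace{\lim_{j\to\infty}|\mu_{n_j}(B^c(\theta_{n_j}(\omega),r_{n_j}),\omega)-\mu(B^c(\theta_{n_j}(\omega),r_{n_j}))|}_{=0\text{ according to (11)}}+\limsup_{j\to\infty}\mu(B^c(\theta_{n_j}(\omega),r_{n_j}))$$
$$\le\mu(B^c(x,\delta_0))\overset{(*)}{<}\mu(B^c(\theta,\delta_0))=\liminf_{n\to\infty}\mu_n(B^c(\theta,\delta_0),\omega)\le\liminf_{j\to\infty}\mu_{n_j}(B^c(\theta,r_{n_j}),\omega).$$
(The second inequality holds because $\limsup_j1_{B^c(\theta_{n_j}(\omega),r_{n_j})}\le1_{B^c(x,\delta_0)}$ pointwise, the balls being closed; apply Fatou's lemma to the complements.) This again contradicts the choice of $\theta_n$ according to (**). □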
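For $k=1$ the estimator defined by (**) can be computed by a sliding window over the ordered observations. The following is a minimal sketch under assumptions made only for this illustration. The radii are taken as $r_n=\delta_0+n^{-1/2}$, so that $\delta_0\le r_n\to\delta_0$, and the sample is drawn from a normal distribution with mode $\theta=1$. The function name is hypothetical.

```python
import random
from typing import Sequence


def covering_center(sample: Sequence[float], r: float) -> float:
    """Center of a closed interval [c - r, c + r] containing the largest number of
    observations, i.e. a solution of (**) for k = 1. An optimal interval can be shifted
    to the right until its left endpoint hits an observation, so only left endpoints
    taken from the sample need to be tried."""
    xs = sorted(sample)
    best_count, best_center = 0, xs[0] + r
    j = 0  # index of the last observation <= current left endpoint + 2r
    for i, left in enumerate(xs):
        j = max(j, i)
        while j + 1 < len(xs) and xs[j + 1] <= left + 2 * r:
            j += 1
        if j - i + 1 > best_count:
            best_count, best_center = j - i + 1, left + r
    return best_center


if __name__ == "__main__":
    rng = random.Random(0)
    delta0 = 0.5
    for n in (50, 500, 5000):
        sample = [rng.gauss(1.0, 1.0) for _ in range(n)]
        print(n, covering_center(sample, delta0 + n ** -0.5))
```

Because of the two-pointer scan, the running time is dominated by sorting, $O(n\log n)$. In higher dimensions the maximization in (**) is considerably harder and is not attempted here.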
Before starting with the VAPNIK-CHERVONENKIS theory, we add some remarks concerning the $\mathbb{P}$-a.s. limiting behaviour of so-called weighted discrepancies, which are of importance in statistics as well (cf. T.W. Anderson and D.A. Darling (1952), J. Durbin (1953)). For this, let $(X,\mathcal{B})=(\mathbb{R},\mathbb{B})$ and let $(\xi_i)_{i\in\mathbb{N}}$ be a sequence of i.i.d. random variables on some p-space $(\Omega,\mathcal{F},\mathbb{P})$ with df $F$. Let $F_n$ be the empirical df pertaining to $\xi_1,\dots,\xi_n$, and define the WEIGHTED DISCREPANCY by
$$D_n^F(q):=\sup_{t\in\mathbb{R}}\frac{|F_n(t)-F(t)|}{q(F(t))},$$
where $q\colon[0,1]\to\mathbb{R}_+$ is some given WEIGHT FUNCTION. (Note that $D_n^F(q)\equiv D_n^F$ for $q\equiv1$.)

Consider instead of the $\xi_i$'s the special versions $\tilde\xi_i=F^{-1}(\eta_i)$, the $\eta_i$'s being independent and uniformly distributed on $(0,1)$. It follows in the same way as in part c*) of the outline of the proof of (8) that, as far as $\mathbb{P}$-a.s. convergence of $D_n^F(q)$ for continuous $q$'s is concerned, one may w.l.o.g. consider instead of $D_n^F(q)$ its versions
$$\tilde D_n^F(q):=\sup_{t\in\mathbb{R}}\frac{|\tilde F_n(t)-F(t)|}{q(F(t))},$$
where $\tilde F_n$ is the empirical df pertaining to $\tilde\xi_1,\dots,\tilde\xi_n$. Due to the identity $\{\tilde\xi_i\le t\}=\{\eta_i\le F(t)\}$ for all $t\in\mathbb{R}$, one has $\tilde F_n(t)=U_n(F(t))$ for all $t\in\mathbb{R}$, where $U_n$ is the empirical df pertaining to $\eta_1,\dots,\eta_n$. Therefore
$$\tilde D_n^F(q)=\sup_{t\in\mathbb{R}}\frac{|U_n(F(t))-F(t)|}{q(F(t))}\le\sup_{s\in[0,1]}\frac{|U_n(s)-s|}{q(s)}=:D_n(q).$$
For continuous $F$ we even have $\tilde D_n^F(q)=D_n(q)$. Hence, for continuous $q$'s and $F$'s, comparing again with part c*) of the outline of the proof of (8), one has
$$\mathcal{L}\{D_n^F(q)\}=\mathcal{L}\{\tilde D_n^F(q)\}=\mathcal{L}\{D_n(q)\},$$
showing that in this case $D_n^F(q)$ is a DISTRIBUTION-FREE STATISTIC. Moreover, for continuous $q$'s and arbitrary $F$'s one has $\mathcal{L}\{D_n^F(q)\}=\mathcal{L}\{\tilde D_n^F(q)\}$ and $\tilde D_n^F(q)\le D_n(q)$, so in this case we obtain
$$\mathbb{P}(D_n^F(q)\ge d)\le\mathbb{P}(D_n(q)\ge d)\quad\text{for each } d\ge0.$$
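The above remarks also show that, for continuous $q$'s, we may w.l.o.g. restrict ourselves to finding conditions on $q$ such that
$$\lim_{n\to\infty}D_n(q)=0\quad\mathbb{P}\text{-a.s.}\tag{*}$$
in order to get the same GLIVENKO-CANTELLI convergence for $D_n^F(q)$. The following theorem gives, in a certain sense, necessary and sufficient conditions on $q$ for (*) to hold (cf. J.A. Wellner (1977) and (1978)).

THEOREM 1. Let $(\eta_i)_{i\in\mathbb{N}}$ be a sequence of independent random variables on some p-space $(\Omega,\mathcal{F},\mathbb{P})$, each uniformly distributed on $[0,1]$. Let $D_n(q)$ be defined as above with a weight function $q$ belonging to the set
$$\mathcal{Q}:=\{q\colon[0,1]\to\mathbb{R}:\ q\text{ continuous},\ q(0)=q(1)=0,\ q(t)>0\text{ for all }t\in(0,1),\ q\text{ monotone increasing on }[0,\delta_0]\text{ and monotone decreasing on }[1-\delta_1,1]\text{ for appropriate }\delta_i=\delta_i(q),\ i=0,1\}.$$
Put $\Psi(t):=1/q(t)$, $t\in(0,1)$. Then:

(i) For any $q\in\mathcal{Q}$ with $\int_0^1\Psi(t)\,dt<\infty$ it follows that $\lim_nD_n(q)=0$ $\mathbb{P}$-a.s.

(ii) For any $q\in\mathcal{Q}$ with $\int_0^1\Psi(t)\,dt=\infty$ it follows that $\limsup_nD_n(q)=\infty$ $\mathbb{P}$-a.s.

Proof. (i): For any $\varepsilon>0$ there exist $\theta_i>0$ with $\theta_i<\delta_i$, $i=0,1$, such that
$$\int_0^{\theta_0}\Psi(t)\,dt<\varepsilon/4\quad\text{and}\quad\int_{1-\theta_1}^1\Psi(t)\,dt<\varepsilon/4.$$
We have
$$D_n(q)=\sup_{0<t<1}\Psi(t)|U_n(t)-t|\le\sup_{0<t<\theta_0}\Psi(t)U_n(t)+\sup_{0<t<\theta_0}\Psi(t)t+\sup_{\theta_0\le t\le1-\theta_1}\Psi(t)|U_n(t)-t|+\sup_{1-\theta_1<t<1}\Psi(t)|U_n(t)-t|$$
$$=:I_1(n)+I_2+I_3(n)+I_4(n),\ \text{say}.$$
Consider first the summand $I_1(n)$. The $\eta_i$'s lie in $(0,1)$ $\mathbb{P}$-a.s., and $\Psi$ is monotone decreasing on $(0,\theta_0]$. Hence $\Psi(t)1_{[0,t]}(\eta_i)\le\Psi(\eta_i)1_{(0,\theta_0)}(\eta_i)$ for $0<t<\theta_0$, and therefore
$$I_1(n)\le\frac1n\sum_{i=1}^n\Psi(\eta_i)1_{(0,\theta_0)}(\eta_i)\to\int_0^{\theta_0}\Psi(t)\,dt\quad\mathbb{P}\text{-a.s.}$$
by the SLLN, whence $\limsup_nI_1(n)<\varepsilon/4$ $\mathbb{P}$-a.s.

Concerning $I_2$, note that for all $t<\theta_0$
$$\Psi(t)t\le\int_0^t\Psi(s)\,ds\le\int_0^{\theta_0}\Psi(s)\,ds,$$
whence $I_2=\sup_{0<t<\theta_0}\Psi(t)t<\varepsilon/4$.

As to $I_3(n)$, we have
$$I_3(n)=\sup_{\theta_0\le t\le1-\theta_1}\Psi(t)|U_n(t)-t|\le\Big[\min_{\theta_0\le t\le1-\theta_1}q(t)\Big]^{-1}\sup_{\theta_0\le t\le1-\theta_1}|U_n(t)-t|=:c\sup_{\theta_0\le t\le1-\theta_1}|U_n(t)-t|$$
with $0<c<\infty$, so that $\lim_nI_3(n)=0$ $\mathbb{P}$-a.s. according to (8).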
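Finally,
$$I_4(n)=\sup_{1-\theta_1<t<1}\Psi(t)\Big|\frac1n\sum_{i=1}^n1_{(t,1]}(\eta_i)-(1-t)\Big|\le\sup_{1-\theta_1<t<1}\Psi(t)\frac1n\sum_{i=1}^n1_{(t,1]}(\eta_i)+\sup_{1-\theta_1<t<1}\Psi(t)(1-t)$$
$$\le\frac1n\sum_{i=1}^n\Psi(\eta_i)1_{(1-\theta_1,1)}(\eta_i)+\int_{1-\theta_1}^1\Psi(t)\,dt.$$
Again by the SLLN,
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\Psi(\eta_i)1_{(1-\theta_1,1)}(\eta_i)=\int_{1-\theta_1}^1\Psi(t)\,dt\quad\mathbb{P}\text{-a.s.},$$
so $\limsup_nI_4(n)<\varepsilon/2$ $\mathbb{P}$-a.s.

So we have shown that $\limsup_nD_n(q)<\varepsilon$ $\mathbb{P}$-a.s. for any $\varepsilon>0$. This proves the assertion in (i).

(ii): Let $N\in\mathbb{N}$ be arbitrary but fixed. We will show
$$\limsup_{n\to\infty}D_n(q)\ge N\quad\mathbb{P}\text{-a.s.},\tag{+}$$
which gives the assertion in (ii). By assumption, either

(a) $\int_0^{1/2}\Psi(t)\,dt=\infty$

or

(b) $\int_{1/2}^1\Psi(t)\,dt=\infty$.

Let us consider case (a); case (b) can be dealt with in an analogous way. Then, for $n$ sufficiently large, say $n\ge n_0$,
$$a_n:=\max\{t\le\delta_0: \Psi(t)\ge2nN\}$$
is well defined, i.e., $\{\dots\}\ne\emptyset$ (cf. Figure 1).

FIGURE 1

Since $a_n\downarrow0$ and $\Psi(t)<2(n+1)N$ for $t\in(a_{n+1},a_n]$, one has
$$\infty=\int_0^{a_{n_0}}\Psi(t)\,dt=\sum_{n\ge n_0}\int_{a_{n+1}}^{a_n}\Psi(t)\,dt\le\sum_{n\ge n_0}2(n+1)N(a_n-a_{n+1})\le2N\Big((n_0+1)a_{n_0}+\sum_{n>n_0}a_n\Big).$$
In other words,
$$\sum_{n\ge n_0}\mathbb{P}(\eta_n\le a_n)=\sum_{n\ge n_0}a_n=\infty,$$
and by the (second) BOREL-CANTELLI LEMMA, $\mathbb{P}(\limsup_n\{\eta_n\le a_n\})=1$. Looking at the first order statistic $\eta_{n:1}:=\min_{1\le i\le n}\eta_i$, we obtain from this that $\mathbb{P}$-a.s. one has $\eta_{n:1}\le a_n$ for infinitely many $n$. This implies that $\mathbb{P}$-a.s.
$$D_n(q)\ge\frac{1}{2n}\Psi(\eta_{n:1})\ge\frac{1}{2n}\Psi(a_n)\ge\frac{1}{2n}\,2nN=N$$
for infinitely many $n$, which implies (+). □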
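Remark. The first inequality at the end of the proof rests on the following fact, which is easy to prove. Let $F_1,F_2$ be two df's on $\mathbb{R}$, and let $h$ be some strictly positive function on some interval $(a,b)\subset\mathbb{R}$. Then for any $t_0\in(a,b)$ that is a continuity point of both $F_1$ and $h$ one has
$$\sup_{a<t<b}\frac{|F_2(t)-F_1(t)|}{h(t)}\ge\frac{F_2(t_0)-F_2(t_0-0)}{2h(t_0)}.$$
(It is applied with $F_1(t)=t$, $F_2=U_n$, $h=q$ on $(0,1)$ and $t_0=\eta_{n:1}$, where $U_n$ has a jump of size at least $1/n$.)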
THE VAPNIK-CHERVONENKIS THEORY: There are various methods for proving GLIVENKO-CANTELLI theorems (i.e., a.s. convergence of empirical $\mathcal{C}$-discrepancies $D_n(\mathcal{C},\mu)$) in cases where a common geometrical structure of the sets in $\mathcal{C}$ is essentially used. See P. Gaenssler and W. Stute (1979), Section 1.1, for a survey of the results and methods of proof. For arbitrary sample spaces, where geometrical arguments are no longer available, perhaps the most striking method, based on combinatorial arguments, was developed by V.N. Vapnik and A.Ya. Chervonenkis (1971). We present here their main results together with some extensions and applications. In what follows, unless stated otherwise, let $X$ be an arbitrary nonempty set and denote by $\mathcal{P}(X)$ the power set of $X$ (i.e., the class of all subsets of $X$). For any set $A$, $|A|$ denotes its cardinality.

DEFINITION 1. Let $\mathcal{C}$ be an arbitrary subclass of $\mathcal{P}(X)$ and, for any $F\subset X$ with $|F|<\infty$, let
$$\Delta^{\mathcal{C}}(F):=|\{F\cap C: C\in\mathcal{C}\}|$$
be the number of different sets of the form $F\cap C$ for $C\in\mathcal{C}$. Furthermore, for $r=0,1,2,\dots$ let
$$m^{\mathcal{C}}(r):=\max\{\Delta^{\mathcal{C}}(F): F\subset X,\ |F|=r\}$$
and
$$V(\mathcal{C}):=\begin{cases}\inf\{r: m^{\mathcal{C}}(r)<2^r\}, & \text{if } m^{\mathcal{C}}(r)<2^r\text{ for some } r,\\ \infty, & \text{if } m^{\mathcal{C}}(r)=2^r\text{ for all } r.\end{cases}$$
If $m^{\mathcal{C}}(r)<2^r$ for some $r$, i.e., if $V(\mathcal{C})<\infty$, $\mathcal{C}$ will be called a VAPNIK-CHERVONENKIS CLASS (VCC).

REMARKS.
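a) $m^{\mathcal{C}}(\cdot)$ is called the GROWTH FUNCTION pertaining to $\mathcal{C}$. Note that $m^{\mathcal{C}}(r)\le2^r$ for all $r$, and that $m^{\mathcal{C}}(r)=2^r$ iff there exists an $F\subset X$ with $|F|=r$ such that for every $F'\subset F$ there is a $C\in\mathcal{C}$ with $F\cap C=F'$. In other words, $m^{\mathcal{C}}(r)=2^r$ iff $\mathcal{C}$ cuts all subsets of some $F\subset X$ with $|F|=r$; one then says that $F$ is shattered by $\mathcal{C}$. On the other hand, $m^{\mathcal{C}}(r)<2^r$ implies $m^{\mathcal{C}}(n)<2^n$ for all $n\ge r$.

b) EXAMPLES.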
1) If $X=\mathbb R$ and $\mathcal C=\{(-\infty,t]:\ t\in\mathbb R\}$, then $m^{\mathcal C}(r)=r+1$, whence $\mathcal C$ is a VCC with $V(\mathcal C)=2$.

2) (cf. R.S. Wenocur and R.M. Dudley (1981)): More generally, let $X$ be an arbitrary set with $|X|\ge 2$ and suppose that $\mathcal C\subset\mathcal P(X)$ fulfills the following condition:

(*) For every $F\subset X$ with $|F|=2$ there exist $C_i\in\mathcal C$, $i=1,2$, such that $F\cap C_1=\emptyset$ and $F\cap C_2=F$.

(Note that (*) holds if $\{\emptyset,X\}\subset\mathcal C$.) Then $\mathcal C$ is a VCC with $V(\mathcal C)=2$ iff $\mathcal C$ is linearly ordered by inclusion.

PROOF. Assume to the contrary that $\mathcal C$ is not linearly ordered by inclusion. Then there exist $C',C''\in\mathcal C$ such that $C'\not\subset C''$ and $C''\not\subset C'$; choosing $x_1\in C'\setminus C''$ and $x_2\in C''\setminus C'$, it follows, together with (*), that $F:=\{x_1,x_2\}$ is shattered by $\mathcal C$, which implies $V(\mathcal C)>2$. To prove the other direction, assume (cf. (*)) that $\mathcal C$ contains at least two elements and is linearly ordered by inclusion. Then, since $|\mathcal C|\ge 2$ implies $V(\mathcal C)\ge 2$ (note that $|\mathcal C|=1$ iff $V(\mathcal C)=1$), it remains to show that $m^{\mathcal C}(2)<2^2$. For this, consider an $F\subset X$ with $|F|=2$; then, since $\mathcal C$ is linearly ordered by inclusion, there is at most one $F'\subset F$ with $|F'|=1$ and $F'=F\cap C$ for some $C\in\mathcal C$, showing that $F$ is not shattered by $\mathcal C$. □
3) If $X=[0,1]$ and $\mathcal C:=\{C\subset X:\ |C|<\infty\}$, then $m^{\mathcal C}(r)=2^r$ for all $r$, whence $V(\mathcal C)=\infty$.
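As a small computational illustration of Definition 1 (a sketch, not part of the original text), the following Python functions compute $\Delta^{\mathcal C}(F)$, the growth function and $V(\mathcal C)$ by brute force for a finite class on a finite ground set. The function names are hypothetical, and restricting to a finite $X$ (so that $r$ only ranges up to $|X|$) is an assumption of the sketch. Example 1 is mimicked by tracing half-lines on six grid points.

```python
from itertools import combinations

def trace_count(F, C):
    """Delta^C(F): the number of distinct sets F ∩ C with C in the class C."""
    F = frozenset(F)
    return len({F & frozenset(c) for c in C})

def growth(X, C, r):
    """m^C(r): the maximum of Delta^C(F) over all r-element subsets F of the finite set X."""
    return max(trace_count(F, C) for F in combinations(X, r))

def vc_index(X, C):
    """V(C): the smallest r with m^C(r) < 2^r, searching r = 0, ..., |X|.

    Returns None if X itself is shattered (no such r exists within X)."""
    for r in range(len(X) + 1):
        if growth(X, C, r) < 2 ** r:
            return r
    return None

# Example 1 on a finite grid: half-lines (-inf, t] traced on X = {0, ..., 5}
X = list(range(6))
half_lines = [set(range(j)) for j in range(len(X) + 1)]  # {x <= j-1}, j = 0, ..., 6
print([growth(X, half_lines, r) for r in range(7)], vc_index(X, half_lines))
# prints [1, 2, 3, 4, 5, 6, 7] 2
```

The brute force is exponential in $|X|$, which is acceptable here because the point is only to make the definitions concrete.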
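c) Let
$$\varphi(v,r):=\sum_{j=0}^{v}\binom{r}{j},\quad\text{where } \binom{r}{j}:=0 \text{ for } j>r,$$
i.e.,
$$\varphi(v,r)=\begin{cases}\sum_{j=0}^{v}\binom{r}{j}, & \text{if } v<r,\\ 2^r, & \text{if } v\ge r.\end{cases}$$
(Note that $\varphi(v,r)$ is the number of all subsets of an $r$-element set with at most $v$ elements.) Then it is easy to show that the following relations hold true:
$$\varphi(v,r)=\varphi(v,r-1)+\varphi(v-1,r-1)\quad(v,r\ge 1),\quad\text{where } \varphi(0,r)=1 \text{ and } \varphi(v,0)=1; \tag{12}$$
$$\varphi(v,r)\le r^v+1\quad\text{for all } v,r\ge 0. \tag{13}$$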
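The relations (12) and (13) can be spot-checked numerically. The following sketch (with a hypothetical helper name `phi`) uses Python's `math.comb`, which returns 0 when $j>r$, matching the convention $\binom{r}{j}:=0$ for $j>r$.

```python
from math import comb

def phi(v, r):
    """phi(v, r) = sum_{j=0}^{v} binom(r, j); math.comb(r, j) is 0 for j > r."""
    return sum(comb(r, j) for j in range(v + 1))

# (12): phi(v, r) = phi(v, r-1) + phi(v-1, r-1) for v, r >= 1, with phi(0, r) = phi(v, 0) = 1
assert all(phi(v, r) == phi(v, r - 1) + phi(v - 1, r - 1)
           for v in range(1, 15) for r in range(1, 15))
assert all(phi(0, r) == 1 and phi(r, 0) == 1 for r in range(15))
# phi(v, r) = 2^r as soon as v >= r (all subsets of an r-element set)
assert all(phi(v, r) == 2 ** r for r in range(15) for v in range(r, 20))
# (13): phi(v, r) <= r^v + 1 (Python's 0 ** 0 == 1 matches the convention r^0 = 1)
assert all(phi(v, r) <= r ** v + 1 for v in range(15) for r in range(15))
```

The recurrence (12) is just Pascal's rule $\binom{r}{j}=\binom{r-1}{j}+\binom{r-1}{j-1}$ summed over $j\le v$.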
The following remarks d) and e) are taken from R.M. Dudley (1978).

d) Let $\mathcal H_k$ be the collection of all open half spaces in $\mathbb R^k$, $k\ge 1$, i.e., all sets of the form $\{x\in\mathbb R^k:\ \langle x,u\rangle>c\}$ for some $0\ne u\in\mathbb R^k$ and some $c\in\mathbb R$, and let $N_k(r)$ be the maximum number of open regions into which $\mathbb R^k$ is decomposed by $r$ hyperplanes $H_j=\{x\in\mathbb R^k:\ \langle x,u_j\rangle=c_j\}$, $j=1,\dots,r$; the maximum number $N_k(r)$ is attained for $H_1,\dots,H_r$ in "general position", i.e., if any $k$ or fewer of the $u_j$ are linearly independent. L. Schläfli (1901, posth.) showed that
$$N_k(r)=\varphi(k,r). \tag{14}$$
Steiner (1826) had proved this for $k\le 3$.
e) If $F$ is an $r$-element subset of $\mathbb R^k$, then
$$\Delta^{\mathcal H_k}(F)\le 2\varphi(k,r-1), \tag{15}$$
and equality is attained if the points of $F$ are in "general position", i.e., no $k+1$ of them lie in any hyperplane (cf. T.M. Cover (1965); E.F. Harding (1967); D. Watson (1969)). Therefore the growth function $m^{\mathcal H_k}(\cdot)$ pertaining to $\mathcal H_k$ satisfies
$$m^{\mathcal H_k}(r)=2\varphi(k,r-1)\quad(r\ge 1). \tag{16}$$
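For $k=2$ the identity (16) can be checked empirically. The sketch below is only an illustration under stated assumptions: it enumerates the traces of open half planes on a finite set of distinct points by sweeping over directions strictly between consecutive "critical" directions (those at which two points have equal projections). Within such a cell the order of the projections is fixed, and the traces for that direction are exactly the suffixes of that order. The helper names are hypothetical. For random points (in general position with probability one) the count should equal $2\varphi(2,r-1)=r^2-r+2$.

```python
import math
import random
from itertools import combinations
from math import comb

def phi(v, r):
    return sum(comb(r, j) for j in range(v + 1))

def halfplane_traces(points):
    """All distinct traces {i : <points[i], u> > c} of open half planes in R^2.

    Sketch assuming distinct points: between consecutive critical normal angles
    the projection order is constant and the traces are its suffixes."""
    r = len(points)
    if r < 2:
        return {frozenset(), frozenset(range(r))}
    crit = []
    for p, q in combinations(points, 2):
        a = math.atan2(q[1] - p[1], q[0] - p[0])
        # normal directions u with <q - p, u> = 0
        crit += [(a + math.pi / 2) % (2 * math.pi), (a - math.pi / 2) % (2 * math.pi)]
    crit.sort()
    cells = list(zip(crit, crit[1:])) + [(crit[-1], crit[0] + 2 * math.pi)]
    traces = set()
    for lo, hi in cells:
        if hi - lo < 1e-12:          # skip empty cells (e.g. parallel pairs)
            continue
        theta = (lo + hi) / 2
        u = (math.cos(theta), math.sin(theta))
        order = sorted(range(r), key=lambda i: points[i][0] * u[0] + points[i][1] * u[1])
        traces.update(frozenset(order[s:]) for s in range(r + 1))
    return traces

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(6)]
print(len(halfplane_traces(pts)), 2 * phi(2, 5))
```

For the four corners of a square the sketch finds 14 traces: all 16 subsets except the two diagonal pairs.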
Without using (14)-(16), but directly from the definition of $\varphi(v,r)$ and the recurrence relation (12), Vapnik and Chervonenkis ((1971), Lemma 1) proved the following lemma:

Lemma 7. If $X$ is any set and if $\mathcal C\subset\mathcal P(X)$ is a Vapnik-Chervonenkis class (i.e., $V(\mathcal C)\le v<\infty$ for some $v$), then $m^{\mathcal C}(r)<\varphi(v,r)$ for all $r\ge v$.

In view of (16), this implies that for arbitrary $X$ one gets an upper estimate for the growth function $m^{\mathcal C}(\cdot)$ pertaining to a VCC $\mathcal C\subset\mathcal P(X)$ by the growth function $m^{\mathcal H_v}(\cdot)$ pertaining to the class $\mathcal H_v$ of all open half spaces in $\mathbb R^v$, namely
$$m^{\mathcal C}(r)<\tfrac12\,m^{\mathcal H_v}(r+1)\quad\text{for all } r\ge v\ge V(\mathcal C). \tag{17}$$
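Lemma 7, and the sharper bound of Lemma 8, can be tested on random finite classes. The following sketch (hypothetical helper names; brute force, so only feasible for a small ground set $X$) computes $s=V(\mathcal C)$ and checks both $m^{\mathcal C}(r)<\varphi(s,r)$ and $m^{\mathcal C}(r)\le\varphi(s-1,r)$ for $s\le r\le|X|$.

```python
import random
from itertools import combinations
from math import comb

def phi(v, r):
    """phi(v, r) = sum_{j=0}^{v} binom(r, j) (an empty sum, i.e. 0, if v < 0)."""
    return sum(comb(r, j) for j in range(v + 1))

def growth(X, C, r):
    """m^C(r) by brute force over all r-element subsets F of the finite set X."""
    return max(len({frozenset(F) & c for c in C}) for F in combinations(X, r))

def vc_index(X, C):
    """V(C): smallest r <= |X| with m^C(r) < 2^r, or None if X itself is shattered."""
    for r in range(len(X) + 1):
        if growth(X, C, r) < 2 ** r:
            return r
    return None

def check_lemmas_7_and_8(X, C):
    """True if m^C(r) < phi(s, r) and m^C(r) <= phi(s - 1, r) for s = V(C) <= r <= |X|."""
    s = vc_index(X, C)
    if s is None:
        return True  # nothing to check inside X
    return all(growth(X, C, r) < phi(s, r) and growth(X, C, r) <= phi(s - 1, r)
               for r in range(s, len(X) + 1))

rng = random.Random(0)
X = list(range(7))
for _ in range(50):
    C = [frozenset(x for x in X if rng.random() < 0.5) for _ in range(rng.randint(1, 25))]
    assert check_lemmas_7_and_8(X, C)
```

For half-lines ($s=2$) the bound $\varphi(1,r)=r+1$ of Lemma 8 is attained exactly.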
Instead of Lemma 7 we shall show here a slightly sharper result whose nice proof (based on the proof of a more general result in J.M. Steele (1975)) I learned from David Pollard on the occasion of one of his seminar talks in Seattle (1982); cf. also N. Sauer (1972).

Lemma 8 (Vapnik-Chervonenkis Lemma). Let $X$ be any set and $\mathcal C\subset\mathcal P(X)$ be a Vapnik-Chervonenkis class (i.e., $V(\mathcal C)=:s<\infty$); then
$$m^{\mathcal C}(r)\le\varphi(s-1,r)\quad\text{for all } r\ge s.$$
Proof. Let $r\ge s$ be arbitrary but fixed. We have to show that for any $F\subset X$ with $|F|=r$

(a) $\Delta^{\mathcal C}(F):=|\{F\cap C:\ C\in\mathcal C\}|\le\varphi(s-1,r)$.
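Let $\{F_1,\dots,F_p\}$ be the collection of all subsets of $F$ of size at least $s$ (so $p=\binom{r}{s}+\binom{r}{s+1}+\dots+\binom{r}{r}$). Note that (a) is trivially fulfilled if

(b) $F\cap C\ne F_i$ for all $i=1,\dots,p$ and all $C\in\mathcal C$,

since then every trace $F\cap C$ has fewer than $s$ elements.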
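Now, by assumption (no set with $s$ or more elements is shattered by $\mathcal C$), we have:

(c) For each $F_i$ there exists an $F_i'\subset F_i$ such that $F_i'\ne F_i\cap C$ for all $C\in\mathcal C$,

implying

(d) $\{F\cap C:\ C\in\mathcal C\}\subset\mathcal B_1:=\{B\subset F:\ B\cap F_i\ne F_i' \text{ for all } i=1,\dots,p\}$.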
In one special case the result follows readily, namely if $F_i'=F_i$ for all $i=1,\dots,p$, since then $B\cap F_i\ne F_i$, i.e., $F_i\not\subset B$, for all $i=1,\dots,p$ and each $B\in\mathcal B_1$ (which means that $\mathcal B_1$ cannot contain any subset of $F$ of size at least $s$), so that (a) follows from (d). We are going to show that, by a successive modification of the $F_i'$'s, the general case reduces in a finite number of steps to this special case: If $F_i'\ne F_i$ for some $i$, choose any $x_1\in F$ and put
$$F_i^2:=(F_i'\cup\{x_1\})\cap F_i,\quad i=1,\dots,p,$$
and define the corresponding class
$$\mathcal B_2:=\{B\subset F:\ B\cap F_i\ne F_i^2 \text{ for all } i=1,\dots,p\}.$$
We will show below that

(e) $|\mathcal B_1|\le|\mathcal B_2|$.
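If now $F_i^2=F_i$ for all $i=1,\dots,p$, then $\mathcal B_2$ cannot contain any subset of $F$ of size at least $s$, in which case (a) follows from (d) and (e). If $F_i^2\ne F_i$ for some $i$, we go once more through the same argument, i.e., we choose any $x_2\in F$, $x_2\ne x_1$, and put
$$F_i^3:=(F_i^2\cup\{x_2\})\cap F_i,\ i=1,\dots,p,\qquad\mathcal B_3:=\{B\subset F:\ B\cap F_i\ne F_i^3 \text{ for all } i=1,\dots,p\},$$
and show as in (e) that $|\mathcal B_2|\le|\mathcal B_3|$. Since each step uses a new point of $F$ and $|F|=r$, finitely many further repetitions of this argument generate classes $\mathcal B_4,\dots,\mathcal B_n$ ($n\le r+1$) such that
$$|\mathcal B_1|\le|\mathcal B_2|\le|\mathcal B_3|\le\dots\le|\mathcal B_n|\quad\text{with}\quad\mathcal B_n=\{B\subset F:\ B\cap F_i\ne F_i^n \text{ for all } i=1,\dots,p\}$$
and $F_i^n=F_i$ for all $i=1,\dots,p$, which is the special case implying (a). So it remains to prove (e). For this it suffices to show that there exists a one-to-one map, say $T$, from $\mathcal B_1\setminus\mathcal B_2$ into $\mathcal B_2\setminus\mathcal B_1$ (then $|\mathcal B_1|=|\mathcal B_1\cap\mathcal B_2|+|\mathcal B_1\setminus\mathcal B_2|\le|\mathcal B_1\cap\mathcal B_2|+|\mathcal B_2\setminus\mathcal B_1|=|\mathcal B_2|$).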
Our claim is that $T(B):=B\setminus\{x_1\}$ is appropriate: Let $B\in\mathcal B_1\setminus\mathcal B_2$; then, by definition of $\mathcal B_1$ and $\mathcal B_2$,
$$B\cap F_i\ne F_i'\ \text{for all } i=1,\dots,p\quad\text{and}\quad B\cap F_j=F_j^2\ \text{for at least one } j\in\{1,\dots,p\},$$
implying that $F_j'\ne F_j^2$, i.e., $x_1\in F_j$ and $x_1\notin F_j'$, whereas, by construction, $x_1\in F_j^2$ and therefore $x_1\in B$; the last fact (every $B\in\mathcal B_1\setminus\mathcal B_2$ contains $x_1$) makes $T$ one-to-one. It remains to show that $T(B)\in\mathcal B_2\setminus\mathcal B_1$ for all $B\in\mathcal B_1\setminus\mathcal B_2$. So, let $B\in\mathcal B_1\setminus\mathcal B_2$; then
$$(B\setminus\{x_1\})\cap F_j=(B\cap F_j)\setminus\{x_1\}=F_j^2\setminus\{x_1\}=F_j',$$
whence $B\setminus\{x_1\}\notin\mathcal B_1$; so we must finally show that $B\setminus\{x_1\}\in\mathcal B_2$, i.e., that

(+) $(B\setminus\{x_1\})\cap F_i\ne F_i^2$ for all $i=1,\dots,p$.

ad (+): Let $i\in\{1,\dots,p\}$ be arbitrary but fixed; if $x_1\in F_i$, then $x_1\in F_i^2$, but $x_1\notin(B\setminus\{x_1\})\cap F_i$, implying (+) in this case. If $x_1\in F_i^c$, i.e., $\{x_1\}\cap F_i=\emptyset$, then $F_i^2=F_i'$, whence $(B\setminus\{x_1\})\cap F_i=B\cap F_i\ne F_i'=F_i^2$, implying (+) also in this case. This proves Lemma 8. □

The next lemma, being a consequence of Lemma 7 or Lemma 8, respectively, will be one of the key results used below.

Lemma 9. Let $X$ be any set and $\mathcal C\subset\mathcal P(X)$ be a Vapnik-Chervonenkis class (i.e., $V(\mathcal C)=:s<\infty$); then
(i) $m^{\mathcal C}(r)\le r^s$ for all $r\ge 2$, and

(ii) $m^{\mathcal C}(r)\le r^s+1$ for all $r\ge 0$.

Proof. According to Lemma 7 and (13) we have
$$m^{\mathcal C}(r)<\varphi(s,r)\le r^s+1\quad\text{for all } r\ge s,$$
whence (note that $m^{\mathcal C}(\cdot)$ is integer valued) $m^{\mathcal C}(r)\le r^s$ for all $r\ge s$; if $2\le r\le s$, it follows that $m^{\mathcal C}(r)\le 2^r\le 2^s\le r^s$; this proves (i), and hence (ii) for $r\ge 2$. Finally, for $r=0$ we have $m^{\mathcal C}(0)=1=2^0$ (whence $s\ge 1$), so that $m^{\mathcal C}(0)\le 0^s+1$, and for $r=1$ we have $m^{\mathcal C}(1)\le 2=1^s+1$, proving (ii). □

Besides Lemma 9, the following VAPNIK-CHERVONENKIS INEQUALITIES are basic for the whole theory. We are going to present this part in a form strengthening the original bounds obtained by Vapnik and Chervonenkis. This will be done in a similar way as in a recent paper by L. Devroye (1981). For this, let again $(X,\mathcal B)$ be an arbitrary measurable space (Devroye (1981) considers only $(X,\mathcal B)=(\mathbb R^k,\mathcal B^k)$, $k\ge 1$), and let $(\xi_i)_{i\in\mathbb N}$ be a sequence of independent and identically distributed random elements in $(X,\mathcal B)$, defined on some common p-space $(\Omega,\mathcal F,\mathbb P)$, with distribution $\mu\equiv\mathcal L\{\xi_i\}$ on $\mathcal B$. For $n,n'\in\mathbb N$ let $\mu_n$ and $\nu_{n'}$ be the empirical measures based on $\xi_1,\dots,\xi_n$ and $\xi_{n+1},\dots,\xi_{n+n'}$, respectively.
Let $\mathcal C$ be an arbitrary subclass of $\mathcal B$, and let
$$D_n(\mathcal C,\mu):=\sup_{C\in\mathcal C}|\mu_n(C)-\mu(C)|,\qquad D_{n,n'}(\mathcal C):=\sup_{C\in\mathcal C}|\mu_n(C)-\nu_{n'}(C)|,$$
where we assume that both $D_n(\mathcal C,\mu)$ and $D_{n,n'}(\mathcal C)$ are measurable w.r.t. the canonical model (i.e., with $(X^{\mathbb N},\mathcal B^{\mathbb N},\times\mu)$ as basic p-space and with $\xi_i$ being the coordinate projections of $X^{\mathbb N}$ onto $X$). (Note that then $D_n(\mathcal C,\mu)$ and $D_{n,n'}(\mathcal C)$ are also measurable considered as functions on the initially given p-space $(\Omega,\mathcal F,\mathbb P)$, since $\omega\mapsto\xi(\omega):=(\xi_1(\omega),\xi_2(\omega),\dots)\in X^{\mathbb N}$ is $\mathcal F,\mathcal B^{\mathbb N}$-measurable.)
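For the class $\mathcal C=\{(-\infty,t]:\ t\in\mathbb R\}$ in $X=\mathbb R$, the discrepancies $D_n(\mathcal C,\mu)$ and $D_{n,n'}(\mathcal C)$ are the classical one- and two-sample Kolmogorov-Smirnov statistics, and the suprema reduce to maxima over the sample points. A minimal sketch, assuming $\mu$ has a continuous distribution function `cdf` (the function names are hypothetical):

```python
def ks_one_sample(xs, cdf):
    """D_n(C, mu) for C = {(-inf, t]}: sup_t |mu_n((-inf, t]) - F(t)|, F = cdf continuous.

    The supremum is attained at, or just before, one of the sample points."""
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def ks_two_sample(xs, ys):
    """D_{n,n'}(C) for C = {(-inf, t]}: sup_t |mu_n((-inf, t]) - nu_n'((-inf, t])|.

    Both empirical distribution functions jump only at sample points, so it
    suffices to evaluate the difference there."""
    n, m = len(xs), len(ys)
    return max(abs(sum(x <= t for x in xs) / n - sum(y <= t for y in ys) / m)
               for t in set(xs) | set(ys))

# Uniform(0, 1) observations: mu((-inf, t]) = t on [0, 1]
print(ks_one_sample([0.1, 0.4, 0.7], lambda t: t))   # about 0.3
print(ks_two_sample([0.1, 0.4, 0.7], [0.2, 0.3]))     # about 2/3
```

For general classes no such reduction to finitely many candidate sets is available a priori, which is one reason why the measurability assumption above is needed.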
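The proof of the following inequalities is patterned on the proof of Vapnik and Chervonenkis (1971). As a corollary we will obtain both the fundamental Vapnik-Chervonenkis inequality and its improvement by Devroye (1981).

Lemma 10. For any $\varepsilon>0$, any $0<\alpha<1$ and any $n,n'\in\mathbb N$ with $4\alpha^2\varepsilon^2n'>1$, one has

(a) $$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le\Big(1-\frac{1}{4\alpha^2\varepsilon^2n'}\Big)^{-1}\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big),$$

and

(b) $$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)\le m^{\mathcal C}(n+n')\,2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big],$$

where $m^{\mathcal C}(\cdot)$ denotes the growth function pertaining to the class $\mathcal C$.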
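Before proving this lemma, let us point out the following facts: We have that
$$D_{n,n'}(\mathcal C)(\omega)=\sup_{C\in\mathcal C}|\mu_n(C,\omega)-\nu_{n'}(C,\omega)|=h_{n,n'}(\xi(\omega))\quad\text{for } \xi(\omega)=(\xi_1(\omega),\xi_2(\omega),\dots)\in X^{\mathbb N},$$
where $h_{n,n'}:X^{\mathbb N}\to\mathbb R$, defined by
$$h_{n,n'}(\underline x):=\sup_{C\in\mathcal C}\Big|\frac1n\sum_{i=1}^{n}1_C(x_i)-\frac1{n'}\sum_{i=n+1}^{n+n'}1_C(x_i)\Big|\quad\text{for } \underline x=(x_1,x_2,\dots)\in X^{\mathbb N},$$
is, by assumption, $\mathcal B^{\mathbb N},\mathcal B(\mathbb R)$-measurable, whence
$$A:=\{\underline x\in X^{\mathbb N}:\ h_{n,n'}(\underline x)>(1-\alpha)\varepsilon\}\in\mathcal B^{\mathbb N},$$
and therefore
$$\mathbb P(\{\omega\in\Omega:\ D_{n,n'}(\mathcal C)(\omega)>(1-\alpha)\varepsilon\})=\mathcal L\{\xi\}(A)=\int_{X^{\mathbb N}}u\big(h_{n,n'}(\underline x)-(1-\alpha)\varepsilon\big)\,d(\times\mu)(\underline x),$$
where $u(z):=1$ if $z>0$ and $u(z):=0$ if $z\le 0$. Noticing further that $h_{n,n'}$ depends only on the first $n$ and the following $n'$ coordinates of $\underline x$, we obtain, using the notation
$$x^1:=(x_1,\dots,x_n),\quad x^2:=(x_{n+1},\dots,x_{n+n'}),\quad X_1:=\times_{i=1}^{n}X,\quad X_2:=\times_{i=n+1}^{n+n'}X,$$
$$P_1:=\times_{i=1}^{n}\mu,\quad P_2:=\times_{i=n+1}^{n+n'}\mu\quad\text{and}\quad P:=P_1\times P_2,$$
that

(*) $$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)=\int_{X_1\times X_2}u\big(h_{n,n'}(x^1,x^2)-(1-\alpha)\varepsilon\big)\,P(dx^1,dx^2).$$

In the very same way one has

(**) $$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)=P_1\Big(\Big\{x^1=(x_1,\dots,x_n)\in X_1:\ \sup_{C\in\mathcal C}\Big|\frac1n\sum_{i=1}^{n}1_C(x_i)-\mu(C)\Big|>\varepsilon\Big\}\Big).$$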
Proof of inequality (a) in Lemma 10. According to (*) of our remarks made before, one has
$$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)=\int u\big(h_{n,n'}-(1-\alpha)\varepsilon\big)\,d(P_1\times P_2)\overset{\text{(Fubini)}}{=}\int_{X_1}\Big[\int_{X_2}u\big(h_{n,n'}-(1-\alpha)\varepsilon\big)\,dP_2\Big]dP_1\ge\int_{A_1}\Big[\int_{X_2}u\big(h_{n,n'}-(1-\alpha)\varepsilon\big)\,dP_2\Big]dP_1$$
with
$$A_1:=\{x^1\in X_1:\ \sup_{C\in\mathcal C}|\mu_n(C,x^1)-\mu(C)|>\varepsilon\},\quad\text{where } \mu_n(C,x^1)=\frac1n\sum_{i=1}^{n}1_C(x_i) \text{ for } x^1=(x_1,\dots,x_n)\in X_1.$$
But for any $(x^1,x^2)\in X_1\times X_2$ with $x^1\in A_1$ there exists a $C_{x^1}\in\mathcal C$ such that
$$|\mu_n(C_{x^1},x^1)-\mu(C_{x^1})|>\varepsilon,$$
whence (for $\nu_{n'}(C,x^2)=\frac1{n'}\sum_{i=n+1}^{n+n'}1_C(x_i)$ with $x^2=(x_{n+1},\dots,x_{n+n'})\in X_2$) we have
$$h_{n,n'}(x^1,x^2)\ge|\mu_n(C_{x^1},x^1)-\nu_{n'}(C_{x^1},x^2)|>(1-\alpha)\varepsilon\quad\text{if}\quad|\nu_{n'}(C_{x^1},x^2)-\mu(C_{x^1})|\le\alpha\varepsilon;$$
therefore, we obtain for all $x^1\in A_1$ the following estimate for the inner integral above:
$$\int_{X_2}u\big(h_{n,n'}(x^1,x^2)-(1-\alpha)\varepsilon\big)\,P_2(dx^2)\ge P_2\big(\{x^2\in X_2:\ |\nu_{n'}(C_{x^1},x^2)-\mu(C_{x^1})|\le\alpha\varepsilon\}\big).$$
But, by Chebyshev's inequality,
$$P_2\big(\{x^2\in X_2:\ |\nu_{n'}(C_{x^1},x^2)-\mu(C_{x^1})|>\alpha\varepsilon\}\big)\le\frac{\mu(C_{x^1})\,(1-\mu(C_{x^1}))}{\alpha^2\varepsilon^2n'}\le\frac{1}{4\alpha^2\varepsilon^2n'};$$
thus, summarizing, we obtain
$$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)\ge P_1(A_1)\Big(1-\frac{1}{4\alpha^2\varepsilon^2n'}\Big)\overset{(**)}{=}\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\Big(1-\frac{1}{4\alpha^2\varepsilon^2n'}\Big),$$
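The Chebyshev step can be compared with the exact binomial probability, since $n'\nu_{n'}(C,\cdot)$ is Binomial$(n',\mu(C))$ under $P_2$ for a fixed set $C$. The sketch below (hypothetical helper names, with $p=\mu(C)$ and $t$ playing the role of $\alpha\varepsilon$) computes both quantities; it only illustrates the inequality and is not part of the proof.

```python
from math import comb

def binomial_deviation_prob(nprime, p, t):
    """Exact P(|nu_{n'}(C) - mu(C)| > t) when n' * nu_{n'}(C) ~ Binomial(n', p), p = mu(C)."""
    return sum(comb(nprime, k) * p ** k * (1 - p) ** (nprime - k)
               for k in range(nprime + 1) if abs(k / nprime - p) > t)

def chebyshev_bound(nprime, p, t):
    """Chebyshev's bound p(1 - p) / (n' t^2); it never exceeds 1 / (4 n' t^2)."""
    return p * (1 - p) / (nprime * t ** 2)

nprime, p, t = 400, 0.3, 0.05   # t plays the role of alpha * epsilon
print(binomial_deviation_prob(nprime, p, t), chebyshev_bound(nprime, p, t),
      1 / (4 * nprime * t ** 2))
```

The uniform bound $1/(4n't^2)$ is what makes the estimate independent of the (possibly non-measurable) choice of $C_{x^1}$.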
which proves (a).

Proof of inequality (b) in Lemma 10. According to our remarks preceding the proof, we may and will consider $D_{n,n'}(\mathcal C)$ as a function $h_{n,n'}$ of $x=(x^1,x^2)\in X_1\times X_2$ with $x^1=(x_1,\dots,x_n)$ and $x^2=(x_{n+1},\dots,x_{n+n'})$. Due to the symmetry of $P=P_1\times P_2$ (w.r.t. coordinate permutations) one has, for each $f\in\mathcal L^1\big(X_1\times X_2,\bigotimes_{i=1}^{n+n'}\mathcal B,P\big)$,
$$\int f(x)\,P(dx)=\int f(T_ix)\,P(dx)$$
for every permutation $T_ix$ of $x$ (which means that $T_ix$ is the image of $x$ when applying a permutation $T_i$ to the $n+n'$ components of $x$). Therefore,
$$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)=\int u\big(h_{n,n'}(x)-(1-\alpha)\varepsilon\big)\,P(dx)=\int\frac{1}{(n+n')!}\sum_i u\big(h_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)\,P(dx),$$
where the summation w.r.t. $i$ is over all $(n+n')!$ permutations $T_i$. We also remark here, for later use in the proof, that
$$u\big(h_{n,n'}-(1-\alpha)\varepsilon\big)=\sup_{C\in\mathcal C}u\big(h^C_{n,n'}-(1-\alpha)\varepsilon\big),\quad\text{where } h^C_{n,n'}(x):=|\mu_n(C,x^1)-\nu_{n'}(C,x^2)| \text{ for } x=(x^1,x^2)\in X_1\times X_2.$$
Next, let $x=(x^1,x^2)\in X_1\times X_2$ (with $x^1=(x_1,\dots,x_n)$, $x^2=(x_{n+1},\dots,x_{n+n'})$) be arbitrary but fixed and put $F_x:=\{x_1,\dots,x_{n+n'}\}$. Let $\mathcal C_x$ be a subclass of $\mathcal C$ such that $F_x\cap C_1\ne F_x\cap C_2$ for any two distinct $C_1,C_2\in\mathcal C_x$, and such that at the same time for any $C\in\mathcal C$ there exists a $C'\in\mathcal C_x$ with $F_x\cap C=F_x\cap C'$. Since $F_x\cap C=F_x\cap C'$ implies $h^C_{n,n'}(T_ix)=h^{C'}_{n,n'}(T_ix)$ for all $T_i$, we obtain for all $T_i$
$$\sup_{C\in\mathcal C}u\big(h^C_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)=\sup_{C\in\mathcal C_x}u\big(h^C_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)\le\sum_{C\in\mathcal C_x}u\big(h^C_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big).$$
For later use in the proof we note here also that $|\mathcal C_x|=\Delta^{\mathcal C}(F_x)$ (cf. DEFINITION 1) for every $x\in X_1\times X_2$. It follows that
$$\frac{1}{(n+n')!}\sum_i u\big(h_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)=\frac{1}{(n+n')!}\sum_i\sup_{C\in\mathcal C_x}u\big(h^C_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)\le\sum_{C\in\mathcal C_x}\Big[\frac{1}{(n+n')!}\sum_i u\big(h^C_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)\Big].$$
Note that for each fixed $x$ and $C$ the term in brackets is the fraction of all $(n+n')!$ permutations $T_ix$ of $x$ for which
$$|\mu_n(C,(T_ix)^1)-\nu_{n'}(C,(T_ix)^2)|>(1-\alpha)\varepsilon.$$
Now, for $x$ and $C$ being arbitrary but fixed, put
$$\eta_j:=\begin{cases}1, & \text{if } x_j\in C,\\ 0, & \text{if } x_j\in C^c,\end{cases}\qquad j=1,\dots,n+n',$$
and denote by $(\eta_{i1},\dots,\eta_{i,n+n'})$ the vector $T_i\eta$ for $\eta=(\eta_1,\dots,\eta_{n+n'})$. Consider then the p-space $(\Omega_0,\mathcal A_0,\mathbb P_0)$ with $\Omega_0$ being the set of all $(n+n')!$ permutations $T_i$, $\mathcal A_0:=\mathcal P(\Omega_0)$, and
$$\mathbb P_0(A):=\frac{|A|}{(n+n')!},\quad A\in\mathcal A_0.$$
Then, using the random variables $\zeta_j:\Omega_0\to\{0,1\}$ defined by $\zeta_j(T_i):=\eta_{ij}$, $T_i\in\Omega_0$, $j=1,\dots,n+n'$, and writing $\mu_{n+n'}(C,x):=\frac{1}{n+n'}\sum_{j=1}^{n+n'}1_C(x_j)$, we obtain
$$\frac{1}{(n+n')!}\sum_i u\big(h^C_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)=\mathbb P_0\Big(\Big|\frac1n\sum_{j=1}^{n}\zeta_j-\frac1{n'}\Big[(n+n')\mu_{n+n'}(C,x)-\sum_{j=1}^{n}\zeta_j\Big]\Big|>(1-\alpha)\varepsilon\Big)$$
$$=\mathbb P_0\Big(\Big|\frac1n\sum_{j=1}^{n}\zeta_j-\mu_{n+n'}(C,x)\Big|>\frac{n'}{n+n'}(1-\alpha)\varepsilon\Big)\le 2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big],$$
using Hoeffding's inequality for sampling without replacement from $n+n'$ binary-valued random variables with sum $(n+n')\mu_{n+n'}(C,x)$; cf. W. Hoeffding (1963) and R.J. Serfling (1974). Summarizing, we thus obtain for every $x=(x^1,x^2)\in X_1\times X_2$ (note that $|F_x|\le n+n'$ and that $m^{\mathcal C}(\cdot)$ is nondecreasing)
$$\frac{1}{(n+n')!}\sum_i u\big(h_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)\le\Delta^{\mathcal C}(F_x)\,2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big]\le m^{\mathcal C}(n+n')\,2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big],$$
and therefore
$$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)=\int_{X_1\times X_2}\frac{1}{(n+n')!}\sum_i u\big(h_{n,n'}(T_ix)-(1-\alpha)\varepsilon\big)\,P(dx)\le m^{\mathcal C}(n+n')\,2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big],$$
which concludes the proof of (b). □
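The Hoeffding step can be checked exactly for a fixed binary vector $\eta$ with $K$ ones: the fraction of the $(n+n')!$ permutations for which the first $n$ coordinates contain $j$ ones is hypergeometric. The following sketch (hypothetical names; $t$ plays the role of $(1-\alpha)\varepsilon$) compares this exact fraction with the bound $2\exp[-2n(n'/(n+n'))^2t^2]$ used above.

```python
from math import comb, exp

def perm_fraction(n, nprime, K, t):
    """Fraction of all permutations of a 0/1 vector with K ones (length n + n') for which
    |mean of the first n coordinates - mean of the last n'| > t.

    The number of ones among the first n coordinates is hypergeometric."""
    N = n + nprime
    hits = 0
    for j in range(max(0, K - nprime), min(n, K) + 1):
        if abs(j / n - (K - j) / nprime) > t:
            hits += comb(K, j) * comb(N - K, n - j)
    return hits / comb(N, n)

def hoeffding_bound(n, nprime, t):
    """2 exp[-2 n (n'/(n+n'))^2 t^2], as in the proof of (b)."""
    return 2 * exp(-2 * n * (nprime / (n + nprime)) ** 2 * t ** 2)

print(perm_fraction(20, 40, 30, 0.5), hoeffding_bound(20, 40, 0.5))
```

Counting subsets of positions instead of all $(n+n')!$ permutations is legitimate because each $n$-subset of positions arises from the same number $n!\,n'!$ of permutations.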
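NOTE: We have in fact shown, assuming measurability of $\Delta^{\mathcal C}(\{\xi_1,\dots,\xi_n,\xi_{n+1},\dots,\xi_{n+n'}\})$, that
$$\mathbb P\big(D_{n,n'}(\mathcal C)>(1-\alpha)\varepsilon\big)\le 2\exp\Big[-2n\Big(\frac{n'}{n+n'}\Big)^2(1-\alpha)^2\varepsilon^2\Big]\,\mathbb E\big(\Delta^{\mathcal C}(\{\xi_1,\dots,\xi_{n+n'}\})\big);$$
in many cases this bound is considerably smaller than the r.h.s. of (b).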
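COROLLARY. (i) Vapnik-Chervonenkis (1971). Taking $\alpha=\frac12$ and $n'=n$, one gets
$$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le 4\,m^{\mathcal C}(2n)\exp\Big(-\frac{n\varepsilon^2}{8}\Big)\quad\text{for all } n\ge 2/\varepsilon^2.$$
(ii) Devroye (1981). Taking $\alpha=\frac{1}{n\varepsilon}$ and $n'=n^2-n$, one gets
$$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le 4\exp(4\varepsilon+4\varepsilon^2)\,m^{\mathcal C}(n^2)\exp(-2n\varepsilon^2)\quad\text{for all } n>\max(1/\varepsilon,2).$$
Proof. (i): It follows from (a) and (b) in Lemma 10 that in the present case
$$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le\Big(1-\frac{1}{\varepsilon^2n}\Big)^{-1}m^{\mathcal C}(2n)\,2\exp\Big[-2n\cdot\frac14\cdot\frac14\varepsilon^2\Big]\le 4\,m^{\mathcal C}(2n)\exp\Big(-\frac{n\varepsilon^2}{8}\Big)\quad(\text{since } n\varepsilon^2\ge 2).$$
(ii): Here $0<\alpha<1$ (as $n>1/\varepsilon$), $n'/(n+n')=1-\frac1n$ and $(1-\frac1n)^2\ge 1-\frac2n$; again (a) and (b) in Lemma 10 yield in the present case
$$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le\Big(1-\frac{n^2}{4(n^2-n)}\Big)^{-1}m^{\mathcal C}(n^2)\,2\exp\Big[-2n\Big(1-\frac1n\Big)^2(\varepsilon^2-2\alpha\varepsilon^2+\alpha^2\varepsilon^2)\Big]$$
$$\le 2\,m^{\mathcal C}(n^2)\,2\exp\big[-2n(\varepsilon^2-2\alpha\varepsilon^2+\alpha^2\varepsilon^2)+4(\varepsilon^2-2\alpha\varepsilon^2+\alpha^2\varepsilon^2)\big]$$
$$\le 4\,m^{\mathcal C}(n^2)\exp\big[-2n\varepsilon^2+4\alpha n\varepsilon^2-2n\alpha^2\varepsilon^2+4\varepsilon^2+4\alpha^2\varepsilon^2\big]$$
$$\le 4\,m^{\mathcal C}(n^2)\exp\big[-2n\varepsilon^2+4\varepsilon+4\varepsilon^2\big]=4\exp(4\varepsilon+4\varepsilon^2)\,m^{\mathcal C}(n^2)\exp(-2n\varepsilon^2),$$
where we used $\frac{n^2}{4(n^2-n)}\le\frac12$ for $n\ge 2$, $\alpha n\varepsilon^2=\varepsilon$ and $(4-2n)\alpha^2\varepsilon^2\le 0$. □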
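To see what the two bounds of the corollary mean quantitatively, one may combine them with Lemma 9 (i), $m^{\mathcal C}(r)\le r^s$, and search for the smallest $n$ at which the bound drops below a level $\delta$. This is a sketch under that assumption (it uses only the polynomial majorant of the growth function, not the true $m^{\mathcal C}$), with hypothetical function names; logarithms are used to avoid overflow.

```python
import math

def log_vc_bound(n, eps, s):
    """log of 4 (2n)^s exp(-n eps^2 / 8): part (i) with m^C(2n) <= (2n)^s."""
    return math.log(4) + s * math.log(2 * n) - n * eps ** 2 / 8

def log_devroye_bound(n, eps, s):
    """log of 4 exp(4 eps + 4 eps^2) (n^2)^s exp(-2 n eps^2): part (ii) with m^C(n^2) <= n^(2s)."""
    return math.log(4) + 4 * eps + 4 * eps ** 2 + 2 * s * math.log(n) - 2 * n * eps ** 2

def smallest_n(log_bound, n_min, eps, s, delta):
    """First n >= n_min at which the bound is <= delta.

    Each log-bound is concave in n; if it exceeds log(delta) at n_min (as in the
    example below), it stays below log(delta) once it has crossed it."""
    n = n_min
    while log_bound(n, eps, s) > math.log(delta):
        n += 1
    return n

eps, s, delta = 0.1, 2, 0.05
n_vc = smallest_n(log_vc_bound, math.ceil(2 / eps ** 2), eps, s, delta)      # n >= 2/eps^2
n_dev = smallest_n(log_devroye_bound, max(math.floor(1 / eps) + 1, 3), eps, s, delta)  # n > max(1/eps, 2)
print(n_vc, n_dev)
```

The exponent $2n\varepsilon^2$ in (ii) versus $n\varepsilon^2/8$ in (i) dominates the comparison; the larger argument $n^2$ of the growth function only costs a polynomial factor.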
Based on Lemma 9 (i) and on part (i) of the corollary to Lemma 10 we now obtain the main result of Vapnik and Chervonenkis concerning almost sure convergence of empirical $\mathcal C$-discrepancies in arbitrary sample spaces.

THEOREM 2. Let $(X,\mathcal B)$ be an arbitrary measurable space and let $(\xi_i)_{i\in\mathbb N}$ be a sequence of independent and identically distributed random elements in $(X,\mathcal B)$, defined on some common p-space $(\Omega,\mathcal F,\mathbb P)$, with distribution $\mu=\mathcal L\{\xi_i\}$ on $\mathcal B$. For $n\in\mathbb N$ let $\mu_n$ and $\nu_n$ be the empirical measures based on $\xi_1,\dots,\xi_n$ and $\xi_{n+1},\dots,\xi_{2n}$, respectively. Let $\mathcal C\subset\mathcal B$ be a VCC such that both $D_n(\mathcal C,\mu)$ and $D_{n,n}(\mathcal C)$ are measurable w.r.t. the canonical model; then
$$\lim_{n\to\infty}D_n(\mathcal C,\mu)=0\quad\mathbb P\text{-a.s.}$$
Proof. Of course, it suffices to show that

(*) $\limsup_{n\to\infty}D_n(\mathcal C,\mu)\le\varepsilon$ $\mathbb P$-a.s. for every $\varepsilon>0$;

according to the Borel-Cantelli Lemma, (*) is implied by

(**) $\sum_{n\in\mathbb N}\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)<\infty$ for every $\varepsilon>0$,

whence the proof will be concluded by showing that (**) holds true.
ad (**): Given any $\varepsilon>0$, we obtain from part (i) of the corollary to Lemma 10 that for all $n\ge 2/\varepsilon^2$
$$\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le 4\,m^{\mathcal C}(2n)\exp\Big(-\frac{n\varepsilon^2}{8}\Big).$$
Since, by assumption, $\mathcal C$ is a VCC, we have $V(\mathcal C)=:s<\infty$, whence by Lemma 9 (i)
$$\sum_{n\in\mathbb N}\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)\le\sum_{n<2/\varepsilon^2}\mathbb P(D_n(\mathcal C,\mu)>\varepsilon)+\sum_{n\ge 2/\varepsilon^2}4(2n)^s\exp\Big(-\frac{n\varepsilon^2}{8}\Big)<\infty.\qquad\square$$
The proof shows that the assumption of $\mathcal C$ being a VCC was essentially used to the extent that in this case the growth function $m^{\mathcal C}(r)$ is majorized by $r^s$ for $r\ge 2$ (with $s$ being the minimal $r$ for which $m^{\mathcal C}(r)<2^r$); without this assumption, i.e., in case $m^{\mathcal C}(r)=2^r$ for all $r$, we would have arrived at
$$\sum_{n\ge 2/\varepsilon^2}4\cdot 2^{2n}\exp\Big(-\frac{n\varepsilon^2}{8}\Big)=\infty\quad(0<\varepsilon\le 1).$$
Thus, Theorem 2 can be restated as follows:

(18) If for a given $\mathcal C\subset\mathcal B$ there exists an $s<\infty$ such that $\mathcal C$ does not shatter any $F\subset X$ with $|F|=s$ (i.e., for any $F\subset X$ with $|F|=s$ there exists an $F'\subset F$ s.t. $F'\ne F\cap C$ for all $C\in\mathcal C$), then $\mathcal C$ is a GLIVENKO-CANTELLI class (i.e., $\lim_n D_n(\mathcal C,\mu)=0$ $\mathbb P$-a.s.), provided that the measurability assumptions stated in Theorem 2 are fulfilled.
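The contrast between the two series in the proof of Theorem 2 can be made concrete. The sketch below (hypothetical names; $\varepsilon=\frac12$ and $s=2$ are arbitrary choices) accumulates partial sums of $4(2n)^s\exp(-n\varepsilon^2/8)$, which stabilize, and shows that the logarithm of $4\cdot 2^{2n}\exp(-n\varepsilon^2/8)$ increases linearly in $n$, so that series diverges.

```python
import math

def vc_term(n, eps, s):
    """4 (2n)^s exp(-n eps^2 / 8): the summand for a VCC with V(C) = s (via Lemma 9 (i))."""
    return math.exp(math.log(4) + s * math.log(2 * n) - n * eps ** 2 / 8)

def log_shatter_term(n, eps):
    """log of 4 * 2^(2n) * exp(-n eps^2 / 8), the summand if m^C(r) = 2^r for all r."""
    return math.log(4) + 2 * n * math.log(2) - n * eps ** 2 / 8

eps, s = 0.5, 2
n0 = math.ceil(2 / eps ** 2)             # the corollary is used for n >= 2 / eps^2
partial = [0.0]                          # partial[j] = sum of the first j summands
for n in range(n0, 10001):
    partial.append(partial[-1] + vc_term(n, eps, s))
print(partial[5000 - n0 + 1], partial[-1])                     # sums up to n = 5000 and n = 10000
print(log_shatter_term(100, eps), log_shatter_term(200, eps))  # grows linearly in n
```

Numerically this only illustrates the dichotomy; the actual argument is the comparison of a polynomial factor with the geometric factor $2^{2n}$.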
The following example shows that these measurability assumptions cannot be dispensed with, in general.

(19) EXAMPLE (cf. M. Durst and R.M. Dudley (1980)). Let $X=(X,<)$ be an uncountable well-ordered set such that all its initial segments $\{x\in X:\ x<y\}$, $y\in X$, are countable (cf. J. Kelley (1961), p. 29). Then $\mathcal C:=\{\{x\in X:\ x<y\}:\ y\in X\}$ does not shatter any $F\subset X$ with $|F|=2$ (in fact, for any $F=\{x_1,x_2\}\subset X$ with $x_1<x_2$ we have $\{x_2\}\ne F\cap C$ for all $C\in\mathcal C$, since $x_2\in C$ would necessarily imply $x_1\in C$). Note that $\mathcal C$ is linearly ordered by inclusion! To complete the example, let $\mathcal B:=\{A\subset X:\ A \text{ countable or } A^c \text{ countable}\}$, and let $\mu$ on $\mathcal B$ be defined by
$$\mu(A):=\begin{cases}0, & \text{if } A \text{ is countable},\\ 1, & \text{if } A^c \text{ is countable}.\end{cases}$$
Then $\mathcal C\subset\mathcal B$ and $\mu(C)=0$ for all $C\in\mathcal C$; on the other hand, given any observations $x_i$, $i=1,\dots,n$, of i.i.d. random elements $\xi_i$ in $(X,\mathcal B)$ with distribution $\mu$, there exists a $C\in\mathcal C$ s.t. $x_i\in C$ for all $i=1,\dots,n$, whence $D_n(\mathcal C,\mu)=1$.

Note that in the present situation $D_{n,n}(\mathcal C)$ fails to be measurable w.r.t. the canonical model (cf. Theorem 2). In fact, consider for simplicity $n=1$, i.e., $\Omega=X\times X$, $\mathcal F=\mathcal B\otimes\mathcal B$, $\mathbb P=\mu\times\mu$, with $\xi_1$ and $\xi_2$ being the projections of $\Omega$ onto the first and second coordinate, respectively. Then
$$D_{1,1}(\mathcal C)=\sup_{C\in\mathcal C}|\mu_1(C)-\nu_1(C)|=1_{\Delta^c},$$
where $\Delta$ denotes the diagonal in $X\times X$, which is not contained in $\mathcal F$: note that $\Delta\in\mathcal B\otimes\mathcal B$ iff there exists a countable subsystem $\mathcal E$ of $\mathcal B$ which is separating in the sense that

(+) for any $x,y\in X$ with $x\ne y$ there exists an $E\in\mathcal E$ such that $1_E(x)\ne 1_E(y)$;

but in the present situation it can easily be shown that no countable subsystem $\mathcal E$ of $\mathcal B$ has the property (+), which implies that $\Delta\notin\mathcal B\otimes\mathcal B=\mathcal F$. Thus, although $D_n(\mathcal C,\mu)=1$ is measurable w.r.t. the canonical model, $D_{n,n}(\mathcal C)$ is not in the present case.
We will show later (Section 4) that in $X=\mathbb R^k$, $k\ge 1$, the class $\mathcal B_k$ of all closed Euclidean balls fulfills the measurability assumptions made in Theorem 2. As shown by R.M. Dudley (1979), one has $V(\mathcal B_k)=k+2$, implying that $\mathcal B_k$ is a VCC; therefore, by (18) with $s=k+2$, we obtain the GLIVENKO-CANTELLI result (11) stated earlier without proof.
We are going to present here an independent nice proof of (11) which I learned from F. Tops
(20) RADON'S THEOREM (cf. F. Valentine (1964), Theorem 1.26). Any $F\subset\mathbb R^k$, $k\ge 1$, with $|F|\ge k+2$ can be decomposed into two (disjoint) subsets $F_i$, $i=1,2$, such that $\mathrm{co}(F_1)\cap\mathrm{co}(F_2)\ne\emptyset$, where $\mathrm{co}(F_i)$ denotes the convex hull of $F_i$.

(21) (SEPARATION PROPERTY). For any two $C_1,C_2\in\mathcal B_k$ one has $\mathrm{co}(C_1\setminus C_2)\cap\mathrm{co}(C_2\setminus C_1)=\emptyset$.

Now, according to (18), in order to prove (11) it suffices to show that $\mathcal B_k$ does not shatter any $F\subset\mathbb R^k$ with $|F|=s:=k+2$. Suppose to the contrary that there exists an $F\subset\mathbb R^k$ with $|F|=k+2$ such that for every $F'\subset F$ there exists a $C\in\mathcal B_k$ with $F\cap C=F'$. This implies that for the $F_i$'s of (20), which decompose $F$, there exist $C_i\in\mathcal B_k$ such that $F_i=F\cap C_i$, $i=1,2$. Since $F_1\cap F_2=\emptyset$, we have
$$F_1\subset C_1\setminus C_2\quad\text{and}\quad F_2\subset C_2\setminus C_1,$$
and therefore
$$\mathrm{co}(C_1\setminus C_2)\cap\mathrm{co}(C_2\setminus C_1)\supset\mathrm{co}(F_1)\cap\mathrm{co}(F_2)\ne\emptyset\quad(\text{by (20)}),$$
which contradicts (21). □

As the proof has shown, the separation property of the class $\mathcal C=\mathcal B_k$ was essential; at the same time, the proof has shown that in general one has the following result (again under appropriate measurability assumptions as in Theorem 2):

(22) If a given class $\mathcal C\subset\mathcal B^k$ in $X=\mathbb R^k$, $k\ge 1$, fulfills the separation property, then $\mathcal C$ is a VCC and therefore also a GLIVENKO-CANTELLI class.
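Radon's theorem (20) is constructive: among $k+2$ points $p_1,\dots,p_{k+2}$ of $\mathbb R^k$ there is an affine dependence $\sum_i\lambda_ip_i=0$, $\sum_i\lambda_i=0$, $\lambda\ne 0$, and splitting the points by the sign of $\lambda_i$ gives the two sets $F_1,F_2$. The following sketch (hypothetical helper `radon_partition`, exact rational arithmetic via `fractions.Fraction`) finds such a $\lambda$ by Gauss-Jordan elimination. It returns the two index sets together with a common point of the two convex hulls.

```python
from fractions import Fraction

def radon_partition(points):
    """Split k+2 points of R^k into index sets I, J with co(F_I) ∩ co(F_J) nonempty.

    Returns (I, J, z), where z is a common point of the two convex hulls."""
    m, k = len(points), len(points[0])
    # Homogeneous system: sum_i lam_i * p_i = 0 (k rows) and sum_i lam_i = 0 (1 row).
    rows = [[Fraction(p[c]) for p in points] for c in range(k)]
    rows.append([Fraction(1)] * m)
    pivots, r = [], 0
    for c in range(m):                          # Gauss-Jordan elimination
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        lead = rows[r][c]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    # m = k+2 unknowns but only k+1 equations, so a free column exists.
    free = next(c for c in range(m) if c not in pivots)
    lam = [Fraction(0)] * m
    lam[free] = Fraction(1)
    for i, c in enumerate(pivots):
        lam[c] = -rows[i][free]
    I = [i for i in range(m) if lam[i] > 0]
    J = [i for i in range(m) if lam[i] <= 0]
    total = sum(lam[i] for i in I)              # equals -sum of lam over J
    z = tuple(sum(lam[i] * Fraction(points[i][c]) for i in I) / total for c in range(k))
    return I, J, z

print(radon_partition([(0, 0), (1, 0), (1, 1), (0, 1)]))  # the two diagonals of the unit square
```

For the unit square the two diagonals are returned, meeting at $(\frac12,\frac12)$; this is exactly the configuration that, by (21), no pair of balls can separate.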
Let me conclude this section with the following
CONJECTURE: The class of all translates of a fixed convex set in $X=\mathbb R^k$ is, in general, not a VCC; at least it does not fulfill the separation property. In fact, consider the class of all translates of a tetrahedron $C$ in $\mathbb R^3$; then in the situation shown in Figure 2 one has, for $C_z:=C+z$, $C\setminus C_z=C\setminus\{x\}$ and $C_z\setminus C=C_z\setminus\{x\}$, whence
$$\mathrm{co}(C\setminus C_z)\cap\mathrm{co}(C_z\setminus C)=\{x\}.$$
I am grateful to Professors K. Seebach and R. Fritsch (Munich) for pointing out this example to me.
FIGURE 2
Note added in proof: As pointed out by a referee, translates of multiples of a fixed convex set need be neither a GCC nor a VCC: cf. Elker, Pollard and Stute (1979), Adv. Appl. Prob. 11, p. 830.
3. Weak convergence of non-BOREL measures on a metric space.

Let $S=(S,d)$ be a metric space with metric $d$ and let $\mathcal B_b(S)\equiv\mathcal B_b(S,d)$ be the σ-algebra in $S$ generated by the open ($d$-)balls $B_d(x,r)\equiv B(x,r):=\{y\in S:\ d(x,y)<r\}$, $x\in S$, $r>0$. Clearly, $\mathcal B_b(S)$ is a sub-σ-algebra of the Borel σ-algebra $\mathcal B(S)$ in $S$ (generated by all open subsets of $S$).
In this section we will study a mode of weak convergence for nets of finite measures which are defined at least on $\mathcal B_b(S)$. Our formulation is a slight modification of a concept introduced by R.M. Dudley (1966) and further studied and extended by M.J. Wichura (1968); cf. also D. Pollard (1979), where it is shown that some of the key results of that theory can be deduced directly from the better-known weak convergence theory for Borel measures.

As in Wichura, our presentation here follows roughly the lines of Chapter I of Billingsley (1968) (see also P. Billingsley (1971), SIAM No. 5), which treats similar aspects of the theory of weak convergence of probabilities defined on all of $\mathcal B(S)$. The present theory is especially suited to cope with measurability problems arising in the theory of empirical processes and to allow for a proper formulation of functional central limit theorems for empirical $\mathcal C$-processes (cf. Section 4).
To start with, let us first establish some notation and terminology to be used throughout this section. If not stated otherwise, $S=(S,d)$ is always a (possibly non-separable) metric space. Let $\mathcal A$ be a σ-algebra of subsets of $S$ such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$; then the following spaces of real-valued functions on $S$ will be considered:

$\mathbb F_a(S):=\{f:S\to\mathbb R,\ f\ \mathcal A,\mathcal B(\mathbb R)\text{-measurable}\}$
$C^b(S):=\{f:S\to\mathbb R,\ f \text{ bounded and continuous}\}$, $\quad C^b_a(S):=\mathbb F_a(S)\cap C^b(S)$
$U^b(S):=\{f:S\to\mathbb R,\ f \text{ bounded and uniformly continuous}\}$, $\quad U^b_a(S):=\mathbb F_a(S)\cap U^b(S)$.

In case of $\mathcal B_b(S)$ instead of $\mathcal A$ we shall write $\mathbb F_b(S)$, $C^b_b(S)$ and $U^b_b(S)$ instead of $\mathbb F_a(S)$, $C^b_a(S)$ and $U^b_a(S)$, respectively.

*) This section represents and extends parts of a first draft of a "Diplomarbeit" by J. Schattauer, University of Munich, 1981.
The following figure may help to visualize the different spaces, where the largest box represents the class of all $\mathcal B(S),\mathcal B(\mathbb R)$-measurable functions $f:S\to\mathbb R$ and where the smallest class $U^b_b(S)$ is represented by the shaded area:

FIGURE 3

(... for example, is represented as that part of the $\mathbb F_b(S)$-box (marked by the bold arrows) which lies to the left of the dotted line.)
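Furthermore, let $\mathcal M_a(S)$ be the space of all nonnegative finite measures defined on $\mathcal A$ and write $\mathcal M_b(S)$ for the space of all nonnegative finite measures on $\mathcal B_b(S)$. For $f:S\to\mathbb R$, we denote by $D(f)$ the set of discontinuity points of $f$. Finally, given any $\mu\in\mathcal M_a(S)$ and any bounded $f$ or $A\subset S$, respectively, let
$$\int^*f\,d\mu:=\inf\{\textstyle\int g\,d\mu:\ g\ge f,\ g\in\mathbb F_a(S) \text{ and } g \text{ bounded}\},$$
$$\int_*f\,d\mu:=\sup\{\textstyle\int g\,d\mu:\ g\le f,\ g\in\mathbb F_a(S) \text{ and } g \text{ bounded}\},$$
$$\mu^*(A):=\int^*1_A\,d\mu\quad\text{and}\quad\mu_*(A):=\int_*1_A\,d\mu.$$
Note that $\mu^*$ and $\mu_*$ are outer and inner measures, respectively, i.e., one has for every $A\subset S$
$$\mu^*(A)=\inf\{\mu(B):\ B\supset A,\ B\in\mathcal A\}\quad\text{and}\quad\mu_*(A)=\sup\{\mu(B):\ B\subset A,\ B\in\mathcal A\}. \tag{23}$$
(In fact, as to the first equality in (23), "$\le$" is obvious, since for any $B\supset A$, $B\in\mathcal A$, $g:=1_B\ge 1_A$, $g\in\mathbb F_a(S)$ and $g$ bounded; as to the other inequality, given any $g\ge 1_A$, $g\in\mathbb F_a(S)$ and $g$ bounded, choose for each $\varepsilon>0$ $B_\varepsilon:=\{g\ge 1-\varepsilon\}$ to get $B_\varepsilon\in\mathcal A$ with $B_\varepsilon\supset A$ and $\mu(B_\varepsilon)\le\int(g+\varepsilon)\,d\mu\le\int g\,d\mu+\varepsilon\mu(S)$; since $\mu(S)<\infty$, we obtain, taking $\varepsilon=\frac1n$ and letting $n\to\infty$, $B:=\bigcap_{n\in\mathbb N}B_{1/n}\in\mathcal A$ with $B\supset A$ and $\mu(B)\le\int g\,d\mu$, which proves the other inequality.)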
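A finite special case makes (23) and the upper integral concrete. On a finite set $S$ with $\mathcal A$ generated by a finite partition into atoms, every $\mathcal A$-measurable function is constant on atoms. Hence $\mu^*(A)$ is the mass of the atoms meeting $A$, $\mu_*(A)$ is the mass of the atoms contained in $A$, and $\int^*f\,d\mu=\sum_{\text{atoms}}\mu(\text{atom})\max_{\text{atom}}f$. The sketch below (hypothetical names, exact fractions) checks (23) against the definition via $\int^*1_A\,d\mu$ and $\int_*1_A\,d\mu$.

```python
from fractions import Fraction

def outer_inner(atoms, weights, A):
    """mu^*(A) and mu_*(A) when the sigma-algebra is generated by the partition `atoms`."""
    A = set(A)
    outer = sum((w for atom, w in zip(atoms, weights) if atom & A), Fraction(0))
    inner = sum((w for atom, w in zip(atoms, weights) if atom <= A), Fraction(0))
    return outer, inner

def upper_lower_integral(atoms, weights, f):
    """Upper and lower integrals: measurable g are constant on atoms, so take max / min of f there."""
    upper = sum((w * max(f(x) for x in atom) for atom, w in zip(atoms, weights)), Fraction(0))
    lower = sum((w * min(f(x) for x in atom) for atom, w in zip(atoms, weights)), Fraction(0))
    return upper, lower

atoms = [{0, 1}, {2, 3}, {4, 5}]
weights = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
A = {1, 2, 3}

def indicator(x):
    return 1 if x in A else 0

print(outer_inner(atoms, weights, A))   # (Fraction(1, 2), Fraction(3, 10))
print(upper_lower_integral(atoms, weights, indicator))
```

Here $\mu^*(A)>\mu_*(A)$ because the atom $\{0,1\}$ meets $A$ without being contained in it.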
The following lemma comprises some simple but still essential facts to be used later on.

LEMMA 11. (i) $\mathcal B_b(S)=\sigma(\{d(\cdot,x):\ x\in S\})$, where $\sigma(\{d(\cdot,x):\ x\in S\})$ denotes the smallest σ-algebra in $S$ w.r.t. which all of the functions $d(\cdot,x)$, for each fixed $x\in S$, are measurable.
(ii) Let $S_0\subset S$ be such that $S_0=(S_0,d)$ is a separable metric space; then for $d(\cdot,S_0):=\inf\{d(\cdot,x):\ x\in S_0\}$ we have $\min(d(\cdot,S_0),n)\in U^b_b(S)$ for each $n$; in this case also $S_0^\delta:=\{x\in S:\ d(x,S_0)<\delta\}\in\mathcal B_b(S)$ for every $\delta>0$, and $\bar S_0\in\mathcal B_b(S)$, where $\bar S_0$ denotes the closure of $S_0$ in $(S,d)$.
(iii) $\mathcal K(S)\subset\mathcal B_b(S)$, where $\mathcal K(S)$ denotes the class of all compact subsets of $(S,d)$.
(iv) If $(S,d)$ is separable, then $\mathcal B_b(S)=\mathcal B(S)$.

Proof. (i) is an immediate consequence of the identity $B(x,r)=\{y\in S:\ d(y,x)<r\}$, $x\in S$, $r>0$. To verify (ii), since $d(\cdot,A)$ is uniformly continuous for each $A\subset S$, it suffices to show that $d(\cdot,S_0)\in\mathbb F_b(S)$; for this, let $T_0$ be a countable dense subset of $S_0$. Then, since $d(x,S_0)=d(x,T_0)=\inf\{d(x,y):\ y\in T_0\}$, $d(\cdot,S_0)$ is a countable infimum of $\mathcal B_b(S),\mathcal B(\mathbb R)$-measurable functions, hence $d(\cdot,S_0)\in\mathbb F_b(S)$. This also shows that $S_0^\delta\in\mathcal B_b(S)$, implying that $\bar S_0=\bigcap_{n\in\mathbb N}S_0^{1/n}\in\mathcal B_b(S)$. Since each compact subset of $S$ is closed and separable, (iii) is just a particular case of (ii). Finally, if $(S,d)$ is separable, then $(S_0,d)$ is separable for each $S_0\subset S$, especially for all closed subsets $F$ of $S$, whence, by (ii), $F\in\mathcal B_b(S)$ for all closed $F\subset S$ and therefore $\mathcal B(S)\subset\mathcal B_b(S)$, which proves (iv). □

Remark. The converse of (iv) is not true, in general: Talagrand (1978) has constructed an example of a non-separable metric space $S$ for which $\mathcal B_b(S)$ coincides with $\mathcal B(S)$.
Now, our first subsection will be concerned with SEPARABLE AND TIGHT MEASURES ON $\mathcal B_b(S)$:
DEFINITION 2. $\mu\in\mathcal M_b(S)$ is called separable iff there exists a separable subset $S_0$ of $S$ (i.e., an $S_0\subset S$ s.t. $S_0=(S_0,d)$ is a separable metric space) with $\mu(\bar S_0)=\mu(S)$. (Note that the closure $\bar S_0$ of a separable $S_0$ is also separable.)

(24) REMARK. Let $\mu\in\mathcal M_b(S)$ be separable; then there exists a unique extension of $\mu$ to an (even τ-smooth) Borel measure $\tilde\mu$ on $\mathcal B(S)$.

Proof. By assumption there exists a closed and separable $A_0\subset S$ such that $\mu(A_0)=\mu(S)$, where $A_0\in\mathcal B_b(S)$ by Lemma 11 (ii). Let $\mathcal V:=\{B\in\mathcal B(S):\ B\cap A_0\in\mathcal B_b(S)\}$; then $\mathcal V$ is a σ-algebra in $S$. But, since each closed subset belongs to $\mathcal V$ (cf. Lemma 11 (ii) and notice that $F\cap A_0$ is again closed and separable), $\mathcal V$ equals $\mathcal B(S)$, and therefore $\tilde\mu(B):=\mu(B\cap A_0)$ is well defined for all $B\in\mathcal B(S)$. Furthermore, for every $B\in\mathcal B_b(S)$, $\tilde\mu(B)=\mu(B\cap A_0)=\mu(B)-\mu(B\setminus A_0)=\mu(B)$, since $\mu(B\setminus A_0)=0$ according to $\mu(A_0)=\mu(S)$, showing that $\tilde\mu$ is a Borel extension of $\mu$ (being even τ-smooth, since $\tilde\mu$ concentrates on the separable subset $A_0$ of $S$).

As to the uniqueness of $\tilde\mu$, suppose that $\tilde\mu_i$ are finite measures on $\mathcal B(S)$ with $\tilde\mu_i|\mathcal B_b(S)=\mu$, $i=1,2$; then $\tilde\mu_1(A_0)=\tilde\mu_2(A_0)=\mu(A_0)=\mu(S)$, and therefore $\tilde\mu_1(B)=\tilde\mu_1(B\cap A_0)=\mu(B\cap A_0)=\tilde\mu_2(B\cap A_0)=\tilde\mu_2(B)$ for all $B\in\mathcal B(S)$, showing that $\tilde\mu_1=\tilde\mu_2$. □

(Note: It can be shown by examples that the assumption in (24) of $\mu$ being separable cannot be dispensed with, in general.)
DEFINITION 3. $\mu\in\mathcal M_b(S)$ is called tight iff $\sup\{\mu(K):\ K\in\mathcal K(S)\}=\mu(S)$. (Note that $\mathcal K(S)\subset\mathcal B_b(S)$ according to Lemma 11 (iii).)

(25) REMARK. Any tight $\mu\in\mathcal M_b(S)$ is separable.

Proof. Note first that any $K\in\mathcal K(S)$ is separable; now, since $\mu$ is tight, there exists for every $n$ a $K_n\in\mathcal K(S)$ s.t. $\mu(K_n)>\mu(S)-\frac1n$; then $S_0:=\bigcup_nK_n$ is separable and $\mu(\bar S_0)\ge\mu(S_0)\ge\mu(K_n)>\mu(S)-\frac1n$ for all $n$, whence $\mu(\bar S_0)=\mu(S)$. □

As to the converse of (25), one has

(26) REMARK. If $\mu\in\mathcal M_b(S)$ is separable and if $S$ is topologically complete, then $\mu$ is tight.

Proof. Use (24) to get the unique Borel extension $\tilde\mu$ of $\mu$ and apply Theorem 1, Appendix III, p. 234, in Billingsley (1968). □

(Note: As shown by Billingsley (1968), Remark 2, p. 234, the hypothesis of topological completeness cannot be suppressed in (26).)
WEAK CONVERGENCE/PORTMANTEAU-THEOREM:
As before, let $S=(S,d)$ be a (possibly non-separable) metric space, let $\mathcal M_a(S)$ be the space of all nonnegative finite measures on a σ-algebra $\mathcal A$ with $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$, and let $\mathcal M_b(S)$ be the space of all nonnegative finite measures on $\mathcal B_b(S)$. Then, given a net $(\mu_\alpha)$ in $\mathcal M_a(S)$ and a $\mu\in\mathcal M_b(S)$, we define:

DEFINITION 4. $(\mu_\alpha)$ converges weakly to $\mu$ (denoted by $\mu_\alpha\xrightarrow{w}\mu$) if
(i) $\mu$ is separable, and
(ii) $\lim_\alpha\int f\,d\mu_\alpha=\int f\,d\tilde\mu$ for all $f\in C^b_a(S)$
(where again $\tilde\mu$ is the unique Borel extension of $\mu$, according to (24)).

(27) REMARKS. a) If $(S,d)$ is a separable metric space, then Definition 4 coincides with the usual definition of weak convergence of Borel measures (cf. Lemma 11 (iv)). b) If $(\mu_\alpha)$ converges weakly in the sense of Wichura's (1968) definition, then $(\mu_\alpha)$ converges in the sense of our Definition 4, but not vice versa; both definitions are equivalent if $(S,d)$ is topologically complete (cf. (26) and our Portmanteau Theorem below).
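As an illustration in the separable case of Remark (27) a), let $S=[0,1]$ with the usual metric and $\mu_n:=\frac1n\sum_{k=1}^n\delta_{k/n}$. This is a sketch, and the measures are chosen for illustration only. Then $\mu_n\xrightarrow{w}\lambda$, the Lebesgue measure on $[0,1]$, since $\int f\,d\mu_n$ is a Riemann sum for every $f\in C^b(S)$. For the interval $A=[0,\frac12]$, whose boundary $\{0,\frac12\}$ is $\lambda$-null, also $\mu_n(A)\to\frac12$. The function names are hypothetical.

```python
import math

def grid_integral(f, n):
    """Integral of f w.r.t. mu_n = (1/n) * sum_{k=1}^{n} delta_{k/n}: a right Riemann sum."""
    return sum(f(k / n) for k in range(1, n + 1)) / n

def grid_mass(a, b, n):
    """mu_n([a, b]) for the same measure."""
    return sum(1 for k in range(1, n + 1) if a <= k / n <= b) / n

for n in (11, 101, 10001):
    print(n, grid_integral(math.cos, n) - math.sin(1), grid_mass(0.0, 0.5, n) - 0.5)
```

Both printed differences tend to 0, in line with $\int\cos\,d\lambda=\sin 1$ and $\lambda([0,\frac12])=\frac12$.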
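LEMMA 12. Let $f:S\to\mathbb R$ be such that $0\le f\le n$ for some $n\in\mathbb N$; then, for every $\mu\in\mathcal M_a(S)$,
$$\int^*f\,d\mu\le\mu(S)+\sum_{k=1}^{n}\mu^*(\{f\ge k\}).$$
Proof. Since by (23), for every $A\subset S$, $\mu^*(A)=\inf\{\mu(B):\ B\supset A,\ B\in\mathcal A\}$, it follows that for every $\varepsilon>0$ and every $1\le k\le n$ there exists a $B_{\varepsilon,k}\in\mathcal A$ s.t. $B_{\varepsilon,k}\supset\{f\ge k\}$ and $\mu^*(\{f\ge k\})\ge\mu(B_{\varepsilon,k})-\frac{\varepsilon}{n}$. Put
$$f_\varepsilon:=1_S+\sum_{k=1}^{n}1_{B_{\varepsilon,k}}$$
to get a bounded function belonging to $\mathbb F_a(S)$ and dominating $f$ (indeed, $f\le\sum_{k=0}^{n}1_{\{f\ge k\}}\le 1_S+\sum_{k=1}^{n}1_{B_{\varepsilon,k}}=f_\varepsilon$), whence
$$\int^*f\,d\mu=\inf\{\textstyle\int g\,d\mu:\ g\ge f,\ g\in\mathbb F_a(S) \text{ and } g \text{ bounded}\}\le\int f_\varepsilon\,d\mu=\mu(S)+\sum_{k=1}^{n}\mu(B_{\varepsilon,k})\le\mu(S)+\sum_{k=1}^{n}\Big[\mu^*(\{f\ge k\})+\frac{\varepsilon}{n}\Big]=\mu(S)+\sum_{k=1}^{n}\mu^*(\{f\ge k\})+\varepsilon,$$
which implies the assertion, since $\varepsilon>0$ was chosen arbitrarily. □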
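For a measure on a finite set with all subsets measurable, $\mu^*=\mu$ and $\int^*=\int$, and Lemma 12 reduces to the elementary pointwise estimate $f\le 1+\sum_{k=1}^n 1_{\{f\ge k\}}$ integrated against $\mu$. The sketch below (hypothetical names) checks it on random data with $0\le f\le n$.

```python
import random

def integral(values, weights):
    """Integral of f w.r.t. the discrete measure mu = sum_i weights[i] * delta_{s_i}."""
    return sum(v * w for v, w in zip(values, weights))

def lemma12_rhs(values, weights, n):
    """mu(S) + sum_{k=1}^{n} mu({f >= k})."""
    return sum(weights) + sum(sum(w for v, w in zip(values, weights) if v >= k)
                              for k in range(1, n + 1))

rng = random.Random(0)
n = 5
values = [rng.uniform(0, n) for _ in range(20)]    # 0 <= f <= n on 20 points
weights = [rng.random() for _ in range(20)]         # nonnegative masses
print(integral(values, weights) <= lemma12_rhs(values, weights, n))  # True
```

The slack in the estimate is at most $\mu(S)$, which is why, in the proof of the Portmanteau theorem, the lemma is applied to $nf$ and the error $\mu(S)/n$ is then let tend to 0.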
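In what follows, let $\mathcal G(S)$, resp. $\mathcal F(S)$, denote the class of all open, resp. closed, subsets of $S$; also, for $A\subset S$, let $A^\circ$, $\bar A$ and $\partial A$ denote the interior, closure and boundary of $A$, respectively.

(28) PORTMANTEAU-THEOREM. Let $(\mu_\alpha)$ be a net in $\mathcal M_{\mathcal A}(S)$ and let $\mu\in\mathcal M_{\mathcal A}(S)$ be separable with $\bar\mu$ being its unique Borel extension (cf. (24)). Then the following assertions (a)-(h') are all equivalent:

(a) $\lim_\alpha\mu_\alpha(S)=\bar\mu(S)$ and $\liminf_\alpha(\mu_\alpha)_*(G)\ge\bar\mu(G)$ for all $G\in\mathcal G(S)$
(a') $\lim_\alpha\mu_\alpha(S)=\bar\mu(S)$ and $\liminf_\alpha\mu_\alpha(G)\ge\bar\mu(G)$ for all $G\in\mathcal G(S)\cap\mathcal A$
(b) $\lim_\alpha\mu_\alpha(S)=\bar\mu(S)$ and $\limsup_\alpha\mu_\alpha^*(F)\le\bar\mu(F)$ for all $F\in\mathcal F(S)$
(b') $\lim_\alpha\mu_\alpha(S)=\bar\mu(S)$ and $\limsup_\alpha\mu_\alpha(F)\le\bar\mu(F)$ for all $F\in\mathcal F(S)\cap\mathcal A$
(c) $\liminf_\alpha\int_*f\,d\mu_\alpha\ge\int f\,d\bar\mu$ for all bounded lower semicontinuous $f:S\to\mathbb R$
(c') $\liminf_\alpha\int f\,d\mu_\alpha\ge\int f\,d\bar\mu$ for all bounded lower semicontinuous $f\in\mathcal F_{\mathcal A}(S)$
(d) $\limsup_\alpha\int^*f\,d\mu_\alpha\le\int f\,d\bar\mu$ for all bounded upper semicontinuous $f:S\to\mathbb R$
(d') $\limsup_\alpha\int f\,d\mu_\alpha\le\int f\,d\bar\mu$ for all bounded upper semicontinuous $f\in\mathcal F_{\mathcal A}(S)$
(e) $\lim_\alpha\int_*f\,d\mu_\alpha=\lim_\alpha\int^*f\,d\mu_\alpha=\int f\,d\bar\mu$ for all bounded $\mathcal B(S),\mathcal B$-measurable $f:S\to\mathbb R$ which are $\bar\mu$-almost everywhere ($\bar\mu$-a.e.) continuous
(e') $\lim_\alpha\int f\,d\mu_\alpha=\int f\,d\bar\mu$ for all bounded $f\in\mathcal F_{\mathcal A}(S)$ which are $\bar\mu$-a.e. continuous
(f) $\lim_\alpha(\mu_\alpha)_*(A)=\lim_\alpha\mu_\alpha^*(A)=\bar\mu(A)$ for all $A\in\mathcal B(S)$ with $\bar\mu(\partial A)=0$
(f') $\lim_\alpha\mu_\alpha(A)=\bar\mu(A)$ for all $A\in\mathcal A$ with $\bar\mu(\partial A)=0$
(g) $\lim_\alpha\int_*f\,d\mu_\alpha=\lim_\alpha\int^*f\,d\mu_\alpha=\int f\,d\bar\mu$ for all $f\in C_b(S)$
(g') $\lim_\alpha\int f\,d\mu_\alpha=\int f\,d\bar\mu$ for all $f\in C_b^b(S)$ (cf. Definition 4, (ii))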
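(h) $\lim_\alpha\int_*f\,d\mu_\alpha=\lim_\alpha\int^*f\,d\mu_\alpha=\int f\,d\bar\mu$ for all $f\in U_b(S)$
(h') $\lim_\alpha\int f\,d\mu_\alpha=\int f\,d\bar\mu$ for all $f\in U_b^b(S)$.

Proof. The proof may be divided into 4 steps showing that the following implications hold true, where '$\overset{!}{\Rightarrow}$' indicates the non-trivial parts.
STEP 1: (a)
STEP 2: (d) $\Rightarrow$ (d') $\Rightarrow$ (c') $\Rightarrow$ (a') $\Rightarrow$ (b') $\overset{!}{\Rightarrow}$ (b) ($\Rightarrow$ (d) by STEP 1)
STEP 3: (a) and (b) $\overset{!}{\Rightarrow}$ (f) $\Rightarrow$ (f') ($\Rightarrow$ (a) by STEP 1)
STEP 4: (e) $\Rightarrow$ (g) $\Rightarrow$ (h) $\Rightarrow$ (h') ($\Rightarrow$ (e) by STEP 1)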
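We are going to prove the "$\overset{!}{\Rightarrow}$" parts; the others are either immediate or easy to prove.

(b) $\overset{!}{\Rightarrow}$ (d): 1. Let $f:S\to\mathbb R$ be upper semicontinuous and assume for the moment that $0\le f<1$. Then, for every $n\in\mathbb N$, the inequality established above (applied to $nf$, where $0\le nf<n$) yields

$$\limsup_\alpha\int^*nf\,d\mu_\alpha\le\limsup_\alpha\Big[\mu_\alpha(S)+\sum_{k=1}^n\mu_\alpha^*(\{nf\ge k\})\Big]\le\limsup_\alpha\mu_\alpha(S)+\sum_{k=1}^n\limsup_\alpha\mu_\alpha^*(\{nf\ge k\})\overset{(b)}{\le}\bar\mu(S)+\sum_{k=1}^n\bar\mu(\{nf\ge k\})\le\bar\mu(S)+\int nf\,d\bar\mu$$

(note that the sets $\{nf\ge k\}$ are closed, $f$ being upper semicontinuous), whence $\limsup_\alpha\int^*f\,d\mu_\alpha\le\frac{\bar\mu(S)}{n}+\int f\,d\bar\mu$; thus (for $n\to\infty$) we obtain (d) for all upper semicontinuous $f$ with $0\le f<1$.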
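Part 1 rests on the identity $\frac1n\sum_{k=1}^n1_{\{nf\ge k\}}=\lfloor nf\rfloor/n$, which for $0\le f<1$ lies in $(f-\frac1n,f]$; this is what turns (b), applied to the closed sets $\{nf\ge k\}$, into a bound for $\int nf\,d\bar\mu$. A small Python check with exact rational arithmetic (our own illustration, not part of the original argument):

```python
from fractions import Fraction
import math


def staircase(y, n):
    """(1/n) * sum_{k=1}^{n} 1{n*y >= k}, computed exactly with fractions."""
    return Fraction(sum(1 for k in range(1, n + 1) if n * y >= k), n)


def check(y, n):
    """Staircase equals floor(n*y)/n and lies in (y - 1/n, y] for 0 <= y < 1."""
    s = staircase(y, n)
    return s == Fraction(math.floor(n * y), n) and y - Fraction(1, n) < s <= y


samples = [Fraction(0), Fraction(1, 3), Fraction(5, 7), Fraction(99, 100)]
print(all(check(y, n) for y in samples for n in (1, 2, 10, 37)))
```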
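2. Let $f:S\to\mathbb R$ be upper semicontinuous and bounded, say $a<f<b$; then $0<\frac{f-a}{b-a}<1$, and therefore it follows from part 1 that

$$\limsup_\alpha\int^*\frac{f-a}{b-a}\,d\mu_\alpha\le\int\frac{f-a}{b-a}\,d\bar\mu,$$

which implies (d), since $\lim_\alpha\mu_\alpha(S)=\mu(S)=\bar\mu(S)$.

[(a)-(d)] $\overset{!}{\Rightarrow}$ (e): Let $f:S\to\mathbb R$ be bounded, $\mathcal B(S),\mathcal B$-measurable and $\bar\mu$-a.e. continuous. It follows (cf. Gaenssler-Stute (1977), Satz 8.4.3) that $\bar\mu(\{f_\#\neq f^\#\})=0$, where

$$f_\#:=\sup\{g: g\le f,\ g\text{ lower semicontinuous}\}\quad\text{and}\quad f^\#:=\inf\{g: g\ge f,\ g\text{ upper semicontinuous}\};$$

therefore, since $f_\#\le f\le f^\#$, we obtain

(+) $\int f_\#\,d\bar\mu=\int f\,d\bar\mu=\int f^\#\,d\bar\mu$.

Furthermore, since $f_\#$ and $f^\#$ are also bounded, with $f_\#$ being lower semicontinuous and $f^\#$ being upper semicontinuous, we obtain

$$\int f_\#\,d\bar\mu\overset{(c)}{\le}\liminf_\alpha\int_*f_\#\,d\mu_\alpha\le\liminf_\alpha\int_*f\,d\mu_\alpha\le\liminf_\alpha\int^*f\,d\mu_\alpha\le\limsup_\alpha\int^*f\,d\mu_\alpha\le\limsup_\alpha\int^*f^\#\,d\mu_\alpha\overset{(d)}{\le}\int f^\#\,d\bar\mu,$$

whence, by (+), $\lim_\alpha\int^*f\,d\mu_\alpha=\int f\,d\bar\mu$. On the other hand, one obtains in the same way that

$$\int f_\#\,d\bar\mu\le\liminf_\alpha\int_*f_\#\,d\mu_\alpha\le\liminf_\alpha\int_*f\,d\mu_\alpha\le\limsup_\alpha\int_*f\,d\mu_\alpha\le\limsup_\alpha\int^*f\,d\mu_\alpha\le\limsup_\alpha\int^*f^\#\,d\mu_\alpha\le\int f^\#\,d\bar\mu,$$

whence, again by (+), $\lim_\alpha\int_*f\,d\mu_\alpha=\int f\,d\bar\mu$, which proves (e).

(f') $\overset{!}{\Rightarrow}$ (g'): Given $f\in C_b^b(S)$, let $f(\bar\mu)$ $(\equiv\bar\mu\circ f^{-1})$ be the image measure that $f$ induces on $\mathcal B$ in $\mathbb R$ from $\bar\mu$ (i.e., $f(\bar\mu)(B)=\bar\mu(\{f\in B\})$, $B\in\mathcal B$). Since $f$ is bounded, we have $f(\bar\mu)([a,b])=\bar\mu(S)$ for some $-\infty<a<b<\infty$; furthermore, since $\bar\mu(S)<\infty$, we have $f(\bar\mu)(\{t\})>0$ for at most countably many $t\in[a,b]$. Therefore, it follows that for every $\varepsilon>0$ there exist $t_0,t_1,\dots,t_m$ such that

(1) $a=t_0<t_1<\dots<t_m=b$,
(2) $a<f(x)<b$ for all $x\in S$,
(3) $t_j-t_{j-1}<\varepsilon$ for all $j=1,\dots,m$, and
(4) $\bar\mu(\{x\in S: f(x)=t_j\})=0$ for all $j=0,1,\dots,m$.

Now, let $A_j:=\{x\in S: t_{j-1}\le f(x)<t_j\}$, $j=1,\dots,m$; then $A_j\in\mathcal A$, the $A_j$'s being pairwise disjoint with union $S$, and $\partial A_j\subset\{x\in S: f(x)\in\{t_{j-1},t_j\}\}$, whence (by (4)) $\bar\mu(\partial A_j)=0$, $j=1,\dots,m$. Therefore it follows by (f') that

(+) $\lim_\alpha\mu_\alpha(A_j)=\bar\mu(A_j)$ for $j=1,\dots,m$.

Now, put $g:=\sum_{j=1}^mt_{j-1}1_{A_j}$ to get a bounded function $g\in\mathcal F_{\mathcal A}(S)$ for which (by (3))

(++) $\sup_{x\in S}|f(x)-g(x)|<\varepsilon$.

Then it follows that

$$\Big|\int f\,d\mu_\alpha-\int f\,d\bar\mu\Big|=\Big|\int(f-g)\,d\mu_\alpha+\int g\,d\mu_\alpha-\int(f-g)\,d\bar\mu-\int g\,d\bar\mu\Big|\le\int|f-g|\,d\mu_\alpha+\int|f-g|\,d\bar\mu+\Big|\int g\,d\mu_\alpha-\int g\,d\bar\mu\Big|\overset{(++)}{\le}\varepsilon\mu_\alpha(S)+\varepsilon\bar\mu(S)+\sum_{j=1}^m|t_{j-1}|\,|\mu_\alpha(A_j)-\bar\mu(A_j)|,$$

whence, by (+), $\limsup_\alpha\big|\int f\,d\mu_\alpha-\int f\,d\bar\mu\big|\le2\varepsilon\bar\mu(S)$ (note that $S\in\mathcal A$ with $\bar\mu(\partial S)=\bar\mu(\emptyset)=0$, and $\bar\mu(S)=\mu(S)$). Thus (for $\varepsilon\to0$) we have shown (g').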
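When $\bar\mu$ has finitely many atoms, the partition in the proof of (f') $\Rightarrow$ (g') can be computed explicitly: take an equally spaced grid of mesh below $\varepsilon/2$ and push any inner grid point that happens to be an atom of $\bar\mu\circ f^{-1}$ slightly to the right. The Python sketch below does this for hypothetical values of $f$; the nudging rule is our own choice, and any rule that keeps the mesh below $\varepsilon$ and avoids the atoms would serve equally well.

```python
import bisect


def partition(a, b, eps, atoms):
    """Points a = t_0 < ... < t_m = b with t_j - t_{j-1} < eps and no inner t_j in `atoms`.

    Assumes a < b and that `atoms` is a finite set of reals (cf. conditions (1)-(4)).
    """
    m = int(2 * (b - a) / eps) + 1      # ensures h = (b - a)/m < eps/2
    h = (b - a) / m
    ts = [a]
    for j in range(1, m):
        t = a + j * h
        shift = h / 4
        while t in atoms:                # total nudge stays below h/2, so mesh < 1.5*h < eps
            t += shift
            shift /= 2
        ts.append(t)
    ts.append(b)
    return ts


def step_approx(f_vals, ts):
    """g = sum_j t_{j-1} 1_{A_j} with A_j = {t_{j-1} <= f < t_j}, evaluated at given f-values."""
    return [ts[bisect.bisect_right(ts, v) - 1] for v in f_vals]


# Hypothetical example: values of f at the atoms of mu-bar, with a < f < b.
eps = 0.21
f_vals = [0.2, 0.5, 0.5, 0.9]
ts = partition(0.0, 1.0, eps, set(f_vals))
g_vals = step_approx(f_vals, ts)
print(ts)
print(max(v - w for v, w in zip(f_vals, g_vals)))
```

By construction $0\le f-g<\varepsilon$ at every atom, which is the uniform estimate (++) of the proof.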
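(h') $\overset{!}{\Rightarrow}$ (b): Since $f\equiv1\in U_b^b(S)$, we obtain from (h') at once $\lim_\alpha\mu_\alpha(S)=\bar\mu(S)$. Next, given an arbitrary $F\in\mathcal F(S)$ and $\varepsilon>0$, let $F^\varepsilon:=\{x\in S: d(x,F)<\varepsilon\}$; then $F^\varepsilon\downarrow F$ as $\varepsilon\downarrow0$, and therefore, for every $n\in\mathbb N$ there exists an $F^{\varepsilon_n}\in\mathcal G(S)$ such that $\bar\mu(F^{\varepsilon_n})\le\bar\mu(F)+\frac1n$. Now, since by assumption $\mu$ is separable, there exists a separable $S_0\subset S$ with $\mu(\bar S_0)=\mu(S)$; put, for each $x\in S$,

$$f_n(x):=\begin{cases}d(x,\bar S_0\cap\complement F^{\varepsilon_n})/\varepsilon_n, & \text{if }\bar S_0\cap\complement F^{\varepsilon_n}\neq\emptyset,\\ 1, & \text{if }\bar S_0\cap\complement F^{\varepsilon_n}=\emptyset;\end{cases}$$

then the function $g_n:=\min(f_n,1)$ has the following properties for every $n\in\mathbb N$:

(1) $g_n\in U_b^b(S)$ (cf. Lemma 11 (ii), and note that $\bar S_0\cap\complement F^{\varepsilon_n}$ is separable),
(2) $\mathrm{rest}_{\bar S_0\cap\complement F^{\varepsilon_n}}\,g_n\equiv0$, and
(3) $\mathrm{rest}_F\,g_n\equiv1$.

Therefore, for every $n\in\mathbb N$ we obtain (recall that $\bar\mu(\complement\bar S_0)=0$)

$$\limsup_\alpha\mu_\alpha^*(F)=\limsup_\alpha\int^*1_F\,d\mu_\alpha\overset{(3),(1)}{\le}\limsup_\alpha\int g_n\,d\mu_\alpha\overset{(h'),(1)}{=}\int g_n\,d\bar\mu=\int_{\bar S_0\cap F^{\varepsilon_n}}g_n\,d\bar\mu+\int_{\bar S_0\cap\complement F^{\varepsilon_n}}g_n\,d\bar\mu\overset{(2)}{=}\int_{\bar S_0\cap F^{\varepsilon_n}}g_n\,d\bar\mu\le\bar\mu(F^{\varepsilon_n})\le\bar\mu(F)+\frac1n,$$

whence (for $n\to\infty$) we obtain $\limsup_\alpha\mu_\alpha^*(F)\le\bar\mu(F)$, which proves (b).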
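(b') $\overset{!}{\Rightarrow}$ (b): Given an arbitrary $F\in\mathcal F(S)$, we have as before that for every $n\in\mathbb N$ there exists an $F^{\varepsilon_n}\in\mathcal G(S)$ s.t. $\bar\mu(F^{\varepsilon_n})\le\bar\mu(F)+\frac1n$. Let $g_n$ be defined as before and put $F_n:=\{x\in S: g_n(x)\ge1\}$; then $F_n\in\mathcal F(S)\cap\mathcal B_b(S)$, $F_n\supset F$ for all $n\in\mathbb N$, and

(+) $F_n\cap\bar S_0\subset F^{\varepsilon_n}\cap\bar S_0$ for all $n\in\mathbb N$.

(As to (+), let $x\in F_n\cap\bar S_0$; then, if $\bar S_0\cap\complement F^{\varepsilon_n}\neq\emptyset$, we have by construction of $g_n$, $d(x,\bar S_0\cap\complement F^{\varepsilon_n})\ge\varepsilon_n>0$, whence $x\notin\bar S_0\cap\complement F^{\varepsilon_n}$, and therefore $x\in F^{\varepsilon_n}\cap\bar S_0$; if $\bar S_0\cap\complement F^{\varepsilon_n}=\emptyset$ (and therefore $g_n\equiv1$), it follows that $F^{\varepsilon_n}\cap\bar S_0=\bar S_0$ and therefore $F_n\cap\bar S_0\subset F^{\varepsilon_n}\cap\bar S_0$.) We thus obtain

$$\limsup_\alpha\mu_\alpha^*(F)\le\limsup_\alpha\mu_\alpha(F_n)\overset{(b')}{\le}\bar\mu(F_n)\overset{(+)}{\le}\bar\mu(F^{\varepsilon_n})\le\bar\mu(F)+\frac1n,$$

whence (for $n\to\infty$) $\limsup_\alpha\mu_\alpha^*(F)\le\bar\mu(F)$, which proves (b).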
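(a) and (b) $\overset{!}{\Rightarrow}$ (f): Given an $A\in\mathcal B(S)$ with $\bar\mu(\partial A)=0$, we have

$$\bar\mu(A^\circ)\overset{(a)}{\le}\liminf_\alpha(\mu_\alpha)_*(A^\circ)\le\liminf_\alpha(\mu_\alpha)_*(A)\le\liminf_\alpha\mu_\alpha^*(A)\le\limsup_\alpha\mu_\alpha^*(A)\le\limsup_\alpha\mu_\alpha^*(\bar A)\overset{(b)}{\le}\bar\mu(\bar A)=\bar\mu(A^\circ),$$

whence $\lim_\alpha\mu_\alpha^*(A)=\bar\mu(A)$. On the other hand, one obtains in the same way that $\bar\mu(A^\circ)\le\liminf_\alpha(\mu_\alpha)_*(A)\le\limsup_\alpha(\mu_\alpha)_*(A)\le\limsup_\alpha\mu_\alpha^*(\bar A)\le\bar\mu(\bar A)=\bar\mu(A^\circ)$, whence also $\lim_\alpha(\mu_\alpha)_*(A)=\bar\mu(A)$, which proves (f). This concludes the proof of the Portmanteau theorem. □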
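A standard example shows why (a) and (b) are stated as inequalities and why (f) needs $\bar\mu(\partial A)=0$: on $S=\mathbb R$ take $\mu_n:=\delta_{1/n}$ and $\mu:=\delta_0$, so that $\int f\,d\mu_n=f(1/n)\to f(0)=\int f\,d\mu$ for every $f\in C_b(\mathbb R)$. The Python sketch below (our own illustration; since $\mathbb R$ is separable, no measurability questions arise) evaluates late members of the sequence on a few sets.

```python
import math


def point_mass(p):
    """Dirac measure at p, evaluated on sets given by indicator predicates."""
    return lambda indicator: 1.0 if indicator(p) else 0.0


def in_G(x):        # G = (0, 1) is open and mu(G) = 0
    return 0.0 < x < 1.0


def in_F(x):        # F = {0} is closed and mu(F) = 1
    return x == 0.0


def in_A_bad(x):    # A = (0, 1]: its boundary {0, 1} carries mu-mass 1
    return 0.0 < x <= 1.0


def in_A_good(x):   # A = (-1, 1/2): its boundary {-1, 1/2} is mu-null
    return -1.0 < x < 0.5


mu = point_mass(0.0)
late = [point_mass(1.0 / n) for n in range(991, 1001)]  # mu_n = delta_{1/n}, n large

for name, A in [("G", in_G), ("F", in_F), ("A_bad", in_A_bad), ("A_good", in_A_good)]:
    print(name, {m(A) for m in late}, mu(A))

# integrals of the continuous function cos: cos(1/n) -> cos(0)
gap = abs(math.cos(1.0 / 1000) - math.cos(0.0))
print(gap)
```

On $G$ and $F$ the inequalities of (a) and (b) are strict, on $(0,1]$ the convergence in (f) fails, and on the $\mu$-continuity set $(-1,1/2)$ it holds.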
IDENTIFICATION OF LIMITS: Let $\mathcal C$ be the set of all closed balls in $S=(S,d)$ and let $\mathcal C_{\cap f}$ denote the class of all subsets of $S$ which are finite intersections of sets in $\mathcal C$. Then, since $\mathcal C_{\cap f}$ is a $\cap$-closed generator of $\mathcal B_b(S)$, we have for any two $\mu_i\in\mathcal M_b(S)$, $i=1,2$, that $\mu_1=\mu_2$ if $\mu_1(A)=\mu_2(A)$ for all $A\in\mathcal C_{\cap f}$ (cf. Gaenssler-Stute (1977), Satz 1.4.10).

We will show below that for any net $(\mu_\alpha)$ in $\mathcal M_{\mathcal A}(S)$ and any $\mu_i\in\mathcal M_{\mathcal A}(S)$, $\mu_\alpha\to_b\mu_i$, $i=1,2$, implies $\mu_1=\mu_2$. For this we need the following auxiliary result:
(29) For any $A\subset S$, any $\varepsilon>0$, and any separable $S_0\subset S$, there exists an $f\in U_b^b(S)$ such that $0\le f\le1$, $\mathrm{rest}_{\complement(A\cap\bar S_0)^\varepsilon}f\equiv0$ and $\mathrm{rest}_{A\cap\bar S_0}f\equiv1$.

Proof. It follows from Lemma 11 (ii) that

$$f(x):=\max\Big[\Big(1-\frac{d(x,A\cap\bar S_0)}{\varepsilon}\Big),0\Big],\quad x\in S,$$

has the stated properties. □
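The function in the proof of (29) is the usual Lipschitz "tent" around a set. For illustration we take $S=\mathbb R^2$ with the Euclidean metric and a finite set $B$ in the role of $A\cap\bar S_0$ (hypothetical data); the sketch checks the two restriction properties and the Lipschitz bound $|f(x)-f(y)|\le d(x,y)/\varepsilon$, which gives uniform continuity.

```python
import math


def dist_to_set(x, points):
    """d(x, B) = min over p in B of |x - p|, for a finite set B in the Euclidean plane."""
    return min(math.dist(x, p) for p in points)


def bump(x, points, eps):
    """f(x) = max(1 - d(x, B)/eps, 0): equals 1 on B and 0 outside the eps-neighbourhood B^eps."""
    return max(1.0 - dist_to_set(x, points) / eps, 0.0)


B = [(0.0, 0.0), (1.0, 0.0)]   # hypothetical finite stand-in for A ∩ closure(S_0)
eps = 0.5
print(bump((0.0, 0.0), B, eps), bump((0.25, 0.0), B, eps), bump((0.0, 0.6), B, eps))
```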
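LEMMA 13. Let $\mu_i\in\mathcal M_{\mathcal A}(S)$ be separable, $i=1,2$, and suppose that

(+) $\int f\,d\mu_1=\int f\,d\mu_2$ for all $f\in U_b^b(S)$;

then $\mu_1=\mu_2$.

Proof. Let $S_i$ be the separable subsets of $S$ for which $\mu_i(\bar S_i)=\mu_i(S)$, $i=1,2$; put $S_0:=S_1\cup S_2$ to get a separable subset of $S$ for which $\mu_i(\bar S_0)=\mu_i(S)$, $i=1,2$. Now, given an arbitrary $A\in\mathcal C_{\cap f}$ and $n\in\mathbb N$, choose $f_n=f_{1/n}$ according to (29) to get a sequence $(f_n)\subset U_b^b(S)$ for which $\lim_{n\to\infty}f_n=1_{A\cap\bar S_0}$ ($A\cap\bar S_0$ being closed); from this, by Lebesgue's theorem and (+), it follows that

$$\mu_1(A)=\mu_1(A\cap\bar S_0)=\lim_n\int f_n\,d\mu_1=\lim_n\int f_n\,d\mu_2=\mu_2(A\cap\bar S_0)=\mu_2(A).$$
□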
Lemma 13, together with the equivalence of (g') and (h') in (28), implies the result announced above (cf. Definition 4 (i)):

LEMMA 14. For any net $(\mu_\alpha)$ in $\mathcal M_{\mathcal A}(S)$ with $\mu_\alpha\to_b\mu_i$, $i=1,2$, we have $\mu_1=\mu_2$.
WEAK CONVERGENCE AND MAPPINGS (Continuous Mapping Theorems): Let $S=(S,d)$ and $S'=(S',d')$ be two metric spaces and suppose again that $\mathcal A$ is a σ-algebra of subsets of $S$ such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$; let $g:S\to S'$ be $\mathcal A,\mathcal B_b(S')$-measurable and let $\mu_\alpha\in\mathcal M_{\mathcal A}(S)$ and $\mu\in\mathcal M_{\mathcal A}(S)$, respectively, $\mu$ separable. Then $\mu_\alpha$ and $\bar\mu$ induce measures $\nu_\alpha$ and $\nu$ on $\mathcal B_b(S')$, defined by

$$\nu_\alpha(B'):=\mu_\alpha(g^{-1}(B'))\quad\text{and}\quad\nu(B'):=\bar\mu(g^{-1}(B'))$$

for $B'\in\mathcal B_b(S')$, where $g^{-1}(B')=\{x\in S: g(x)\in B'\}$ and where $\bar\mu$ is again the unique Borel extension of $\mu$ (cf. (24)).

We are interested in conditions on $g$ under which $\mu_\alpha\to_b\mu$ implies $\nu_\alpha\equiv\mu_\alpha\circ g^{-1}\to_b\nu\equiv\bar\mu\circ g^{-1}$. It can be shown by examples that measurability of $g$ alone is not sufficient for preserving weak convergence. As we will see, some continuity assumptions on $g$ will be needed. The corresponding theorems are then usually called CONTINUOUS MAPPING THEOREMS.
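THEOREM 3. Let $S=(S,d)$ and $S'=(S',d')$ be metric spaces, let $\mathcal A$ be a σ-algebra of subsets of $S$ such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$, and let $g:S\to S'$ be $\mathcal A,\mathcal B_b(S')$-measurable and continuous. Let $(\mu_\alpha)$ be a net in $\mathcal M_{\mathcal A}(S)$ and let $\mu\in\mathcal M_{\mathcal A}(S)$ be separable such that $\mu_\alpha\to_b\mu$. Then $\nu_\alpha\equiv\mu_\alpha\circ g^{-1}\to_b\nu\equiv\bar\mu\circ g^{-1}$.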
Theorem 3 is a special case of the following result, where the continuity assumption on $g$ is weakened:

THEOREM 4. Let $S=(S,d)$ and $S'=(S',d')$ be metric spaces, let $\mathcal A$ be a σ-algebra of subsets of $S$ such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$, and let $(\mu_\alpha)$ be a net in $\mathcal M_{\mathcal A}(S)$ and $\mu\in\mathcal M_{\mathcal A}(S)$ be separable such that $\mu_\alpha\to_b\mu$; let $g:S\to S'$ be $\mathcal A,\mathcal B_b(S')$-measurable such that $\bar\mu(D(g))=0$, where $D(g)$ denotes the set of discontinuity points of $g$. Then $\nu_\alpha\equiv\mu_\alpha\circ g^{-1}\to_b\nu\equiv\bar\mu\circ g^{-1}$.

(Note that $D(g)\in\mathcal B(S)$; cf. P. Billingsley (1968), p. 225-226.)

Proof. Note that $\nu_\alpha\in\mathcal M_b(S')$ and $\nu\in\mathcal M_b(S')$, whence $\nu_\alpha\to_b\nu$ iff (i) $\nu$ is separable and (ii) $\lim_\alpha\int_{S'}f\,d\nu_\alpha=\int_{S'}f\,d\nu$ for all $f\in C_b^b(S')$, where (ii) is equivalent to any of the conditions (a)-(h') in (28) (with $S$ replaced by $S'$ and $\mathcal A$ replaced by $\mathcal A'=\mathcal B_b(S')$).

1.) $\nu$ is separable: since $\mu$ is separable, there exists a separable $S_0\subset S$ such that $\bar\mu(\bar S_0)=\bar\mu(S)$. Let $T_0\subset S_0$ be countable and dense in $S_0$ (as well as in $\bar S_0$) and let $T':=g(T_0)$; we will show that $S'_0:=g(\bar S_0\setminus D(g))\cup T'$ is a separable subset of $S'$ with $\nu(\bar S'_0)=\nu(S')$. For this we will show that $T'$ (being countable) is dense in $S'_0$: in fact, let (w.l.o.g.) $y\in g(\bar S_0\setminus D(g))$, i.e., $y=g(x)$ for some $x\in\bar S_0\setminus D(g)$. Since $T_0$ is dense in $\bar S_0$, there exists a sequence $(x_n)_{n\in\mathbb N}\subset T_0$ such that $x_n\to x$, and therefore, since $x\notin D(g)$, we have $g(x_n)\to g(x)=y$, where $g(x_n)\in T'$.

Next, since $g^{-1}(\bar S'_0)\supset g^{-1}(g(\bar S_0\setminus D(g)))\supset\bar S_0\setminus D(g)$ and since $\bar\mu(D(g))=0$, we have

$$\nu(\bar S'_0)=\bar\mu(g^{-1}(\bar S'_0))\ge\bar\mu(\bar S_0\setminus D(g))=\bar\mu(\bar S_0)=\mu(\bar S_0)=\mu(S)=\bar\mu(S)=\bar\mu(g^{-1}(S'))=\nu(S').$$

2.) It remains to show (ii): $\lim_\alpha\int_{S'}f\,d\nu_\alpha=\int_{S'}f\,d\nu$ for all $f\in C_b^b(S')$. For this, given any $f\in C_b^b(S')$, we have that $f\circ g:S\to\mathbb R$ is a bounded function belonging to $\mathcal F_{\mathcal A}(S)$ which is $\bar\mu$-a.e. continuous, and therefore it follows from (28) (cf. (e')) that

$$\lim_\alpha\int_{S'}f\,d\nu_\alpha=\lim_\alpha\int_S(f\circ g)\,d\mu_\alpha=\int_S(f\circ g)\,d\bar\mu=\int_{S'}f\,d\nu,$$

which proves (ii). □
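Theorem 4 can be watched at work in a simple separable case (our own illustration): on $S=[0,1]$ the uniform distributions $\mu_n$ on the grid $\{k/n: 1\le k\le n\}$ converge weakly to Lebesgue measure $\lambda$ (a standard Riemann-sum fact, assumed here), and $g:=1_{[1/2,1]}$ is discontinuous only at $1/2$, so $\lambda(D(g))=0$. For a bounded continuous $h$ on $\mathbb R$ the integrals $\int h\,d\nu_n=\int(h\circ g)\,d\mu_n$ should then approach $\int(h\circ g)\,d\lambda=(h(0)+h(1))/2$; the sketch computes them exactly.

```python
from fractions import Fraction


def g(x):
    """g: [0, 1] -> R, continuous except at x = 1/2 (a Lebesgue-null point)."""
    return 1 if x >= Fraction(1, 2) else 0


def h(y):
    """A bounded continuous function on S' = R: 3y + 1 clipped to [0, 4]."""
    return min(max(3 * y + 1, 0), 4)


def image_integral(n):
    """Integral of h∘g w.r.t. mu_n, the uniform distribution on {k/n : k = 1, ..., n}."""
    return Fraction(sum(h(g(Fraction(k, n))) for k in range(1, n + 1)), n)


limit = Fraction(h(0) + h(1), 2)   # integral of h∘g w.r.t. Lebesgue measure on [0, 1]
for n in (10, 101, 1000):
    print(n, image_integral(n), abs(image_integral(n) - limit))
```

Here the error equals $3\,|\mu_n([1/2,1])-1/2|$, which is at most $3/n$.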
The following lemma is in some sense an inverse result:

LEMMA 15. Let $S=(S,d)$ be a metric space, $(\mu_\alpha)$ be a net in $\mathcal M_{\mathcal A}(S)$ and let $\mu\in\mathcal M_{\mathcal A}(S)$ be separable such that $\mu_\alpha\circ f^{-1}\to_b\bar\mu\circ f^{-1}$ for all $f\in C_b^b(S)$. Then $\mu_\alpha\to_b\mu$.

Proof. Note that in the present case $S'=\mathbb R$ (a separable metric space), whence $\nu_\alpha\equiv\mu_\alpha\circ f^{-1}$ and $\nu\equiv\bar\mu\circ f^{-1}$ are separable Borel measures on $\mathcal B=\mathcal B(\mathbb R)$. Now, for any $f\in C_b^b(S)$ and any $g\in C_b^b(\mathbb R)=C_b(\mathbb R)$ we have

$$\lim_\alpha\int_S(g\circ f)\,d\mu_\alpha=\lim_\alpha\int_{\mathbb R}g\,d\nu_\alpha=\int_{\mathbb R}g\,d\nu=\int_S(g\circ f)\,d\bar\mu.$$

Furthermore, for any $f\in C_b^b(S)$ there exists a $c>0$ such that $|f|\le c$, whence for

$$g(t):=\begin{cases}-c, & \text{if }t<-c,\\ t, & \text{if }|t|\le c,\\ c, & \text{if }t>c,\end{cases}\qquad t\in\mathbb R,$$

we have $g\in C_b(\mathbb R)$ and $g\circ f=f$. Therefore it follows that $\lim_\alpha\int f\,d\mu_\alpha=\int f\,d\bar\mu$ for all $f\in C_b^b(S)$, implying the assertion since $\mu$ is, by assumption, separable. □

For the next mapping theorem we need the following auxiliary lemma, the proof of which is left to the reader.
LEMMA 16. Let $S=(S,d)$ and $S'=(S',d')$ be metric spaces; given $g_n,g:S\to S'$, $n\in\mathbb N$, let

$$E=E((g_n),g):=\{x\in S:\ \exists\,(x_n)\subset S\text{ s.t. }x_n\to x\text{ but }g_n(x_n)\not\to g(x)\}.$$

Then $x\notin E$ iff for every $\varepsilon>0$ there exist a $k\in\mathbb N$ and a $\delta>0$ such that $n\ge k$ and $d(x,y)<\delta$ together imply $d'(g(x),g_n(y))<\varepsilon$.

THEOREM 5. Let $S=(S,d)$ and $S'=(S',d')$ be metric spaces, $\mathcal A$ be a σ-algebra of subsets of $S$ such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$, and let $(\mu_n)$ be a sequence in $\mathcal M_{\mathcal A}(S)$ and $\mu\in\mathcal M_{\mathcal A}(S)$ be separable such that $\mu_n\to_b\mu$; let $g_n,g:S\to S'$ be $\mathcal A,\mathcal B_b(S')$-measurable, $n\in\mathbb N$, such that $\mu^*(E)=\inf\{\mu(B): B\supset E,\ B\in\mathcal A\}=0$. Then $\nu_n\equiv\mu_n\circ g_n^{-1}\to_b\nu\equiv\bar\mu\circ g^{-1}$.
Proof (cf. P. Billingsley (1968), Proof of Th. 5.5). 1.) $\nu$ is separable: this is shown as in the proof of Theorem 4, replacing $g(T_0)$ there by $T':=\bigcup_{n\in\mathbb N}g_n(T_0)$.

2.) We are going to show

(+) $\lim_{n\to\infty}\nu_n(S')=\nu(S')$ and
(++) $\liminf_{n\to\infty}\nu_n(G)\ge\nu(G)$ for all $G\in\mathcal G(S')\cap\mathcal B_b(S')$.

(Note that (+) together with (++) imply the assertion according to (28) with $S$ replaced by $S'$ and $\mathcal A$ replaced by $\mathcal A'=\mathcal B_b(S')$.)

ad (+): $\mu_n\to_b\mu$ implies (cf. (28)) $\mu_n(S)\to\bar\mu(S)$ and therefore $\nu_n(S')=\mu_n(g_n^{-1}(S'))=\mu_n(S)\to\bar\mu(S)=\bar\mu(g^{-1}(S'))=\nu(S')$.

ad (++): Given an arbitrary $G\in\mathcal G(S')\cap\mathcal B_b(S')$, we have

(a) $g^{-1}(G)\subset E\cup\bigcup_{k\in\mathbb N}T_k^\circ$, where $T_k:=\bigcap_{n\ge k}g_n^{-1}(G)\in\mathcal A$, and
(b) $\nu(G)=\bar\mu(g^{-1}(G))\le\bar\mu(\bigcup_{k\in\mathbb N}T_k^\circ)$.

ad (a): It suffices to show that $x\in\complement E$ and $g(x)\in G$ together imply $x\in T_k^\circ$ for some $k$. Now, since $G\in\mathcal G(S')$, we have that $B_{d'}(g(x),\varepsilon)\subset G$ for some $\varepsilon>0$; on the other hand, by Lemma 16, $x\in\complement E$ implies that there exist a $k\in\mathbb N$ and a $\delta>0$ such that $d'(g(x),g_n(y))<\varepsilon$ whenever $n\ge k$ and $d(x,y)<\delta$; therefore $g_n(y)\in G$ for all $n\ge k$ and all $y\in S$ with $d(x,y)<\delta$, implying $B_d(x,\delta)\subset g_n^{-1}(G)$ for all $n\ge k$, whence $B_d(x,\delta)\subset T_k$, and therefore $x\in T_k^\circ$.

ad (b):

$$\bar\mu(g^{-1}(G))\le\mu^*(g^{-1}(G))\overset{(a)}{\le}\mu^*\Big(E\cup\bigcup_{k\in\mathbb N}T_k^\circ\Big)\le\mu^*(E)+\mu^*\Big(\bigcup_{k\in\mathbb N}T_k^\circ\Big)=\mu^*\Big(\bigcup_{k\in\mathbb N}T_k^\circ\Big);$$

note that for $A\in\mathcal B(S)$, $\mu^*(A)\ge\bar\mu(A)$ by (23); we will show that even $\mu^*=\bar\mu$ on $\mathcal B(S)$, which proves (b).

For this, let $A\in\mathcal B(S)$; it suffices to show that $\mu^*(A)\le\bar\mu(A)$. Now, $\bar\mu(A)=\bar\mu(\bar S_0\cap A)=\bar\mu(\bar S_0\cap A')$ for some $A'\in\mathcal B_b(S)$ (noticing that for separable $\bar S_0$ one has $\bar S_0\cap A\in\mathcal B(\bar S_0)=\mathcal B_b(\bar S_0)=\bar S_0\cap\mathcal B_b(S)$), and therefore $\bar\mu(\bar S_0\cap A')=\mu(A'\cup\complement\bar S_0)\ge\mu^*(A)$, since $A\subset A'\cup\complement\bar S_0\in\mathcal B_b(S)\subset\mathcal A$.

Now, since $T_k^\circ\subset T_{k+1}^\circ$ and therefore $\bar\mu(T_k^\circ)\uparrow\bar\mu(\bigcup_kT_k^\circ)$, for every $\varepsilon>0$ there exists a $k_0\in\mathbb N$ such that

$$\bar\mu\Big(\bigcup_{k\in\mathbb N}T_k^\circ\Big)\le\bar\mu(T_k^\circ)+\varepsilon\quad\text{for all }k\ge k_0,$$

and therefore, by (b), we obtain $\nu(G)\le\bar\mu(T_k^\circ)+\varepsilon$ for all $k\ge k_0$. But $\mu_n\to_b\mu$ implies (cf. (28)) that for every $k\in\mathbb N$, $\bar\mu(T_k^\circ)\le\liminf_n(\mu_n)_*(T_k^\circ)$, and therefore, noticing that $T_k^\circ\subset g_n^{-1}(G)$ for sufficiently large $n$, we obtain

$$\bar\mu(T_k^\circ)\le\liminf_n\mu_n(g_n^{-1}(G))=\liminf_n\nu_n(G),$$

whence $\nu(G)\le\liminf_n\nu_n(G)+\varepsilon$ for every $\varepsilon>0$, which implies (++). □
WEAK CONVERGENCE CRITERIA AND COMPACTNESS: As before, let $S=(S,d)$ be a (possibly non-separable) metric space, let $\mathcal M^1_{\mathcal A}(S)$ be the space of all p-measures on a σ-algebra $\mathcal A$ with $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$ and let $\mathcal M^1_b(S)$ be the space of all p-measures on $\mathcal B_b(S)$.

DEFINITION 5. Let $(\mu_\alpha)_{\alpha\in\Delta}$ be a net in $\mathcal M^1_{\mathcal A}(S)$; then $(\mu_\alpha)_{\alpha\in\Delta}$ is called δ-tight iff

(30) $\sup_{K\in\mathcal K(S)}\inf_{\delta>0}\liminf_{\alpha\in\Delta}\mu_\alpha(K^\delta)=1$.

(Note that $K^\delta\in\mathcal B_b(S)\subset\mathcal A$ according to Lemma 11 (ii).)

The following two results were proved by M.J. Wichura (1968), Th. 1.3 and Th. 1.4; in view of (27) b) they can be restated as follows (where in Theorem 7 the assumption of $(S,d)$ being topologically complete cannot be dispensed with, in general).

THEOREM 6 (Wichura). Let $(\mu_\alpha)_{\alpha\in\Delta}\subset\mathcal M^1_{\mathcal A}(S)$ be δ-tight. Then there exists a subnet $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ of $(\mu_\alpha)_{\alpha\in\Delta}$ and a separable $\mu\in\mathcal M^1_b(S)$ such that $\mu_{\alpha'}\to_b\mu$.

THEOREM 7 (Wichura). Let $S=(S,d)$ be a topologically complete metric space and $(\mu_\alpha)$ be a net in $\mathcal M^1_{\mathcal A}(S)$; then there exists a separable $\mu\in\mathcal M^1_b(S)$ with $\mu_\alpha\to_b\mu$ iff

(a) $\liminf_\alpha\int f\,d\mu_\alpha=\limsup_\alpha\int f\,d\mu_\alpha$ for all $f\in U_b^b(S)$ and
(b) $(\mu_\alpha)$ is δ-tight. □
We are going to prove here instead the following versions of Theorems 6 and 7 (cf. Remark (31) below):

THEOREM 6*. Let $(\mu_\alpha)_{\alpha\in\Delta}$ be a net in $\mathcal M^1_{\mathcal A}(S)$ fulfilling the following two conditions:

(b$_1$) For every $(f_n)_{n\in\mathbb N}\subset U_b^b(S)$ with $f_n\downarrow0$ one has $\limsup_\alpha\int f_n\,d\mu_\alpha\to0$ as $n\to\infty$.
(b$_2$) There exists a separable $S_0\subset S$ such that $\liminf_\alpha\int f\,d\mu_\alpha\ge1$ for all $f\in U_b^b(S)$ with $f\ge1_{\bar S_0}$.

Then there exists a subnet $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ of $(\mu_\alpha)_{\alpha\in\Delta}$ and a separable $\mu\in\mathcal M^1_b(S)$ with $\mu(\bar S_0)=1$ such that $\mu_{\alpha'}\to_b\mu$.

THEOREM 7*. Let $S=(S,d)$ be an arbitrary metric space and $(\mu_\alpha)$ be a net in $\mathcal M^1_{\mathcal A}(S)$; then there exists a separable $\mu\in\mathcal M^1_b(S)$ with $\mu_\alpha\to_b\mu$ iff the following conditions are fulfilled: (a) as in Theorem 7 and (b$_i$), $i=1,2$, as in Theorem 6*, where in this connection the separable $S_0$ with $\mu(\bar S_0)=1$ and the separable $S_0$ occurring in (b$_2$) coincide.
Proof of Theorem 6*. Let $\mu_\alpha(f):=\int f\,d\mu_\alpha$ for $f\in U_b^b(S)$ and consider the net

$$\alpha\mapsto(\mu_\alpha(f))_{f\in U_b^b(S)}\in\prod_{f\in U_b^b(S)}[-\|f\|,\|f\|],\quad\text{where }\|f\|:=\sup_{x\in S}|f(x)|.$$

Since the product space $\prod_{f\in U_b^b(S)}[-\|f\|,\|f\|]$ is compact in the product topology (Tychonov's theorem), there exists a convergent subnet, say $\alpha'\mapsto(\mu_{\alpha'}(f))_{f\in U_b^b(S)}$, $\alpha'\in\Delta'$. Therefore $\lim_{\alpha'}\mu_{\alpha'}(f)$ exists for each $f\in U_b^b(S)$. Let

$$\mu(f):=\lim_{\alpha'}\mu_{\alpha'}(f)\quad\text{for }f\in U_b^b(S);$$

then $\mu:U_b^b(S)\to\mathbb R$ is positive, linear, and normed. We are going to show that $\mu$ is also σ-smooth on $U_b^b(S)$: for this, let $(f_n)_{n\in\mathbb N}\subset U_b^b(S)$ with $f_n\downarrow0$; then it follows by (b$_1$) that

$$\mu(f_n)=\lim_{\alpha'}\mu_{\alpha'}(f_n)=\limsup_{\alpha'}\int f_n\,d\mu_{\alpha'}\le\limsup_\alpha\int f_n\,d\mu_\alpha\to0\quad\text{as }n\to\infty.$$

Therefore, according to the Daniell-Stone representation theorem, there exists one and only one $\mu\in\mathcal M^1_b(S)$ such that $\mu(f)=\int f\,d\mu$ for all $f\in U_b^b(S)$. Hence, in view of (28) (cf. the equivalence of (g') and (h')), it follows that $\mu_{\alpha'}\to_b\mu$, if we finally show that $\mu(\bar S_0)=1$ (i.e., $\mu$ separable). For this we use (b$_2$), according to which

(+) $\liminf_\alpha\int f\,d\mu_\alpha\ge1$ for all $f\in U_b^b(S)$ with $f\ge1_{\bar S_0}$;

taking $f_n(x):=\max[1-n\,d(x,\bar S_0),0]$, $x\in S$, we obtain a sequence $(f_n)\subset U_b^b(S)$ with $0\le f_n\le1$ and $1_{\bar S_0}\le f_n\le1_{(\bar S_0)^{1/n}}$, whence by the σ-smoothness of $\mu$ (note that $\bar S_0,(\bar S_0)^{1/n}\in\mathcal B_b(S)$ by Lemma 11 (ii))

$$\mu(\bar S_0)=\inf_{n\in\mathbb N}\mu((\bar S_0)^{1/n})\ge\inf_n\int f_n\,d\mu=\inf_n\lim_{\alpha'}\int f_n\,d\mu_{\alpha'}\ge\inf_n\liminf_\alpha\int f_n\,d\mu_\alpha\overset{(+)}{\ge}1,$$

whence $\mu(\bar S_0)=1$. □
Proof of Theorem 7*. Only if-part: Suppose $\mu_\alpha\to_b\mu$; then (a) is a consequence of (28) (cf. the equivalent statements (g') and (h')).

ad (b$_1$): Let $(f_n)_{n\in\mathbb N}\subset U_b^b(S)$ with $f_n\downarrow0$; then (cf. again the equivalence of (g') and (h') in (28))

$$\limsup_\alpha\int f_n\,d\mu_\alpha=\lim_\alpha\int f_n\,d\mu_\alpha=\int f_n\,d\bar\mu\to0\quad\text{as }n\to\infty,$$

according to the σ-smoothness of $\bar\mu$ on $U_b^b(S)$.

ad (b$_2$): Since, by assumption, $\mu$ is separable, there exists a separable $S_0\subset S$ such that $\bar\mu(\bar S_0)=\bar\mu(S)$; therefore, for any $f\in U_b^b(S)$ with $f\ge1_{\bar S_0}$ one has

$$\liminf_\alpha\int f\,d\mu_\alpha=\lim_\alpha\int f\,d\mu_\alpha=\int f\,d\bar\mu\ge\bar\mu(\bar S_0)=\bar\mu(S)=1.$$

If-part: It suffices to show that there exists a $\mu\in\mathcal M^1_b(S)$ with $\mu(\bar S_0)=1$ such that for any subnet $(\mu_{\alpha'})$ of $(\mu_\alpha)$ there exists a further subnet $(\mu_{\alpha''})$ such that $\mu_{\alpha''}\to_b\mu$. For this, let $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ be an arbitrary subnet of $(\mu_\alpha)_{\alpha\in\Delta}$; then it is easy to show that $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ fulfills (b$_1$) and (b$_2$), and therefore, by Theorem 6*, there exist a subnet $(\mu_{\alpha''})_{\alpha''\in\Delta''}$ of $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ and a $\mu_{\Delta',\Delta''}\in\mathcal M^1_b(S)$ with $\mu_{\Delta',\Delta''}(\bar S_0)=1$ such that $\mu_{\alpha''}\to_b\mu_{\Delta',\Delta''}$.

We are going to show that $\mu_{\Delta',\Delta''}$ in fact does not depend on $\Delta'$ or $\Delta''$, whence for $\mu$ being the common value of all the $\mu_{\Delta',\Delta''}$ we get $\mu_\alpha\to_b\mu$, which will conclude the proof. For this, given any $f\in U_b^b(S)$, we have by (a)

$$\liminf_{\alpha\in\Delta}\int f\,d\mu_\alpha\le\liminf_{\alpha''\in\Delta''}\int f\,d\mu_{\alpha''}=\int f\,d\mu_{\Delta',\Delta''}\le\limsup_{\alpha\in\Delta}\int f\,d\mu_\alpha=\liminf_{\alpha\in\Delta}\int f\,d\mu_\alpha,$$

whence $\int f\,d\mu_{\Delta',\Delta''}=\int f\,d\mu_{\tilde\Delta',\tilde\Delta''}$ for all $f\in U_b^b(S)$ and any other subnet $(\mu_{\tilde\alpha''})_{\tilde\alpha''\in\tilde\Delta''}$ of $(\mu_{\tilde\alpha'})_{\tilde\alpha'\in\tilde\Delta'}$ which is a subnet of $(\mu_\alpha)_{\alpha\in\Delta}$ with $\mu_{\tilde\alpha''}\to_b\mu_{\tilde\Delta',\tilde\Delta''}$; therefore Lemma 13 implies the assertion. □
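(31) REMARK. Any δ-tight net $(\mu_\alpha)\subset\mathcal M^1_{\mathcal A}(S)$ fulfills (b$_i$), $i=1,2$, but not vice versa (look at $\mu_\alpha\equiv\mu$ with a separable $\mu\in\mathcal M^1_b(S)$ which is not tight).

Proof. ad (b$_1$): Let $(f_n)_{n\in\mathbb N}\subset U_b^b(S)$ with $f_n\downarrow0$ and assume w.l.o.g. $\sup_nf_n\le1$; then for every $n\in\mathbb N$, every $\delta>0$, and every $K\in\mathcal K(S)$ we have

$$\limsup_\alpha\int f_n\,d\mu_\alpha=\limsup_\alpha\Big(\int_{K^\delta}f_n\,d\mu_\alpha+\int_{\complement K^\delta}f_n\,d\mu_\alpha\Big)\le\limsup_\alpha\int_{K^\delta}f_n\,d\mu_\alpha+\limsup_\alpha\int_{\complement K^\delta}f_n\,d\mu_\alpha$$
$$\le\sup_{x\in K^\delta}f_n(x)+\limsup_\alpha\mu_\alpha(\complement K^\delta)\le\sup_{x\in K^\delta}f_n(x)+\sup_{\delta>0}\limsup_\alpha\mu_\alpha(\complement K^\delta)$$

(the second line since $f_n\le1$). Now, given any $\varepsilon>0$, there exists by assumption (cf. (30) and look at complements) a $K_\varepsilon\in\mathcal K(S)$ such that $\sup_{\delta>0}\limsup_\alpha\mu_\alpha(\complement K_\varepsilon^\delta)\le\varepsilon/3$. Therefore, for any $\varepsilon>0$ there exists a $K_\varepsilon\in\mathcal K(S)$ such that for all $n\in\mathbb N$ and $\delta>0$

$$\limsup_\alpha\int f_n\,d\mu_\alpha\le\sup\{f_n(x): x\in K_\varepsilon^\delta\}+\varepsilon/3.$$

Furthermore, it is easy to show that for any $\varepsilon>0$ and $n\in\mathbb N$ there exists a $\delta(\varepsilon,n)>0$ such that $\sup\{f_n(x): x\in K_\varepsilon^{\delta(\varepsilon,n)}\}\le\sup\{f_n(x): x\in K_\varepsilon\}+\varepsilon/3$. We thus obtain that for any $\varepsilon>0$ there exists a $K_\varepsilon\in\mathcal K(S)$ such that for every $n\in\mathbb N$

$$\limsup_\alpha\int f_n\,d\mu_\alpha\le\sup\{f_n(x): x\in K_\varepsilon\}+\varepsilon/3+\varepsilon/3.$$

But, since $K_\varepsilon$ is compact, $\sup\{f_n(x): x\in K_\varepsilon\}\to0$ as $n\to\infty$, whence $\limsup_\alpha\int f_n\,d\mu_\alpha\le\varepsilon$ for sufficiently large $n$, which implies (b$_1$).

ad (b$_2$): δ-tightness of $(\mu_\alpha)$ implies that for every $n\in\mathbb N$ there exists a $K_n\in\mathcal K(S)$ such that $\inf_{\delta>0}\liminf_\alpha\mu_\alpha(K_n^\delta)\ge1-\frac1n$. Put $S_0:=\bigcup_{n\in\mathbb N}K_n$ to obtain a separable $S_0\subset S$; then, given any $f\in U_b^b(S)$ with $f\ge1_{\bar S_0}$, we must show that

(+) $\liminf_\alpha\int f\,d\mu_\alpha\ge1$.

Since $f\ge1_{K_n}$ for each $n$, it follows (by continuity of $f$) that for every $n\in\mathbb N$ and every $\varepsilon>0$ there exists a $\delta_0=\delta_0(\varepsilon,n)>0$ such that $\inf\{f(x): x\in K_n^{\delta_0}\}\ge1-\varepsilon$, whence $\int f\,d\mu_\alpha\ge(1-\varepsilon)\mu_\alpha(K_n^{\delta_0})$. Therefore, for every $\varepsilon>0$ and every $n\in\mathbb N$ we have

$$\liminf_\alpha\int f\,d\mu_\alpha\ge(1-\varepsilon)\liminf_\alpha\mu_\alpha(K_n^{\delta_0})\ge(1-\varepsilon)\inf_{\delta>0}\liminf_\alpha\mu_\alpha(K_n^\delta)\ge(1-\varepsilon)\Big(1-\frac1n\Big),$$

which implies (+). □

(32) REMARK. The proof of Theorem 7* shows that any net $(\mu_\alpha)_{\alpha\in\Delta}\subset\mathcal M^1_{\mathcal A}(S)$ which fulfills (b$_i$), $i=1,2$, is a compact net in $\mathcal M^1_{\mathcal A}(S)$ (i.e., for any subnet $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ of $(\mu_\alpha)_{\alpha\in\Delta}$ there exist a further subnet $(\mu_{\alpha''})_{\alpha''\in\Delta''}$ of $(\mu_{\alpha'})_{\alpha'\in\Delta'}$ and a separable $\mu\in\mathcal M^1_b(S)$ such that $\mu_{\alpha''}\to_b\mu$).

The following lemma prepares for the next theorem (cf. M.J. Wichura (1968), Theorem 1.2 (a)).

LEMMA 17. Let $(\mu_\alpha)$ be a net in $\mathcal M_{\mathcal A}(S)$ and $\mu\in\mathcal M_{\mathcal A}(S)$ be separable, i.e., $\mu(\bar S_0)=\mu(S)$ for some separable $S_0\subset S$; let $\mathcal C\subset\mathcal A$ be such that

(33) for each $x\in\bar S_0$, $\{C\in\mathcal C: x\in C^\circ\}$ is a neighborhood base at $x$,

and let $\mathcal C_{\cap f}$ denote the class of all finite intersections of members of $\mathcal C$. Suppose that

(+) $\lim_\alpha\mu_\alpha(C)=\bar\mu(C)$ for all $C\in\mathcal C_{\cap f}$.

Then $\liminf_\alpha\mu_\alpha(G)\ge\bar\mu(G)$ for all $G\in\mathcal G(S)\cap\mathcal A$. (Here again $\bar\mu$ denotes the unique Borel extension of $\mu$ and $\mathcal A$ is a σ-algebra of subsets of $S$ with $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$.)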
Proof. Given any $G\in\mathcal G(S)\cap\mathcal A$, it follows by (33) that for every $x\in G\cap\bar S_0$ there exists a $C_x\in\mathcal C$ such that $x\in C_x^\circ\subset C_x\subset G$, whence $G\cap\bar S_0\subset\bigcup_{x\in G\cap\bar S_0}C_x^\circ$, which means that $\{C_x^\circ\cap\bar S_0: x\in G\cap\bar S_0\}$ is an open covering of $G\cap\bar S_0$ in the separable subspace $(\bar S_0,d)$ of $(S,d)$. Therefore (cf. Billingsley (1968), p. 216) there exists a countable subcovering of $G\cap\bar S_0$, i.e., $G\cap\bar S_0\subset\bigcup_{n\in\mathbb N}(C_{x_n}^\circ\cap\bar S_0)$ with $x_n\in G\cap\bar S_0$, $n\in\mathbb N$.

Put $C_n:=C_{x_n}$, $n\in\mathbb N$; then $\bigcup_nC_n\subset\bigcup_{x\in G\cap\bar S_0}C_x\subset G$, whence

$$\bar\mu(G)\ge\bar\mu\Big(\bigcup_nC_n\Big)=\bar\mu\Big(\bigcup_n(C_n\cap\bar S_0)\Big)\ge\bar\mu(G\cap\bar S_0)=\bar\mu(G),$$

i.e., $\bar\mu(G)=\bar\mu(\bigcup_nC_n)$.

Put $C_1':=C_1$ and $C_n':=C_n\setminus\bigcup_{i=1}^{n-1}C_i$, $n\ge2$, to get pairwise disjoint sets $C_n'\in\mathcal A$ with $\bigcup_nC_n'=\bigcup_nC_n$, for which one can easily show (using the assumption (+)) that $\lim_\alpha\mu_\alpha(C_n')=\bar\mu(C_n')$ for all $n\in\mathbb N$. Therefore, for every $n\in\mathbb N$ we have

(++) $\lim_\alpha\mu_\alpha\big(\bigcup_{i=1}^nC_i'\big)=\lim_\alpha\sum_{i=1}^n\mu_\alpha(C_i')=\sum_{i=1}^n\bar\mu(C_i')=\bar\mu\big(\bigcup_{i=1}^nC_i'\big)$.

Since $\bar\mu(G)=\bar\mu(\bigcup_nC_n)=\bar\mu(\bigcup_nC_n')$, there exists for each $\varepsilon>0$ an $n=n(\varepsilon)\in\mathbb N$ such that $\bar\mu(\bigcup_{i=1}^nC_i')\ge\bar\mu(G)-\varepsilon$, and therefore (note also that $G\supset\bigcup_{i=1}^nC_i'$)

$$\liminf_\alpha\mu_\alpha(G)\ge\liminf_\alpha\mu_\alpha\Big(\bigcup_{i=1}^nC_i'\Big)\overset{(++)}{=}\bar\mu\Big(\bigcup_{i=1}^nC_i'\Big)\ge\bar\mu(G)-\varepsilon,$$

which proves the assertion. □

THEOREM 8. Let $(\mu_\alpha)$ be a net in $\mathcal M^1_{\mathcal A}(S)$ and $\mu\in\mathcal M^1_{\mathcal A}(S)$ be separable (i.e., $\mu(\bar S_0)=\mu(S)=1$ for some separable $S_0\subset S$). Suppose that $\mathcal C\subset\{B\in\mathcal B_b(S): \bar\mu(\partial B)=0\}$ fulfills (33). Then the following two assertions are equivalent:

(i) $\lim_\alpha\mu_\alpha(C)=\bar\mu(C)$ for all $C\in\mathcal C_{\cap f}$
(ii) $\mu_\alpha\to_b\mu$.

Proof. (i) $\Rightarrow$ (ii): Follows immediately from Lemma 17 and (28) (cf. the equivalence of (a') and (g') there); note that $\lim_\alpha\mu_\alpha(S)=\bar\mu(S)$ is trivially fulfilled for p-measures $\mu_\alpha$ and $\mu$.

(ii) $\Rightarrow$ (i): Again (28) (cf. the equivalence of (g') and (f')) yields $\lim_\alpha\mu_\alpha(B)=\bar\mu(B)$ for all $B\in\{B\in\mathcal B_b(S): \bar\mu(\partial B)=0\}=:\mathcal R_{\bar\mu}=(\mathcal R_{\bar\mu})_{\cap f}\supset\mathcal C_{\cap f}$. □
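The disjointification step in the proof of Lemma 17 is purely set-theoretic. In the Python sketch below, finite sets stand in for the $C_n$ (hypothetical data); it shows that the $C_n'$ are pairwise disjoint and have the same partial unions as the $C_n$, which is what makes (++) additive.

```python
def disjointify(sets):
    """C'_1 = C_1 and C'_n = C_n minus (C_1 ∪ ... ∪ C_{n-1}), as in the proof of Lemma 17."""
    seen, out = set(), []
    for c in sets:
        out.append(set(c) - seen)
        seen |= set(c)
    return out


C = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 6}]
C_prime = disjointify(C)
print(C_prime)
```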
We will consider next a Cramér-type result which is useful in applications. For this, let again $S=(S,d)$ be a (possibly non-separable) metric space, $\mathcal A$ be a σ-algebra of subsets of $S$ such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$, and let $(\xi_n)_{n\in\mathbb N}$ be a sequence of random elements in $(S,\mathcal A)$ and $\xi$ be a random element in $(S,\mathcal B_b(S))$, being all defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$. Then

(34) $\xi_n$ is said to converge in law to $\xi$ (denoted by $\xi_n\xrightarrow{\mathcal L}_b\xi$) iff $\mathcal L\{\xi_n\}\to_b\mathcal L\{\xi\}$ (in the sense of our Definition 4).

Now, let $(\eta_n)_{n\in\mathbb N}$ be another sequence of random elements in $(S,\mathcal A)$ defined on the same p-space $(\Omega,\mathcal F,\mathbb P)$, and let $d(\xi_n,\eta_n)(\omega):=d(\xi_n(\omega),\eta_n(\omega))$, $\omega\in\Omega$. Note that for non-separable $S$, $d(\xi_n,\eta_n)$ need not be a random variable.

THEOREM 9a. Suppose that in the setting just described

$$\lim_{n\to\infty}\mathbb P^*(d(\xi_n,\eta_n)>\delta)=0\quad\text{for every }\delta>0,$$

where $\mathbb P^*$ denotes the outer p-measure pertaining to $\mathbb P$. Then $\xi_n\xrightarrow{\mathcal L}_b\xi$ iff $\eta_n\xrightarrow{\mathcal L}_b\xi$.
Proof. By symmetry, it suffices to show that $\xi_n\xrightarrow{\mathcal L}_b\xi$ implies $\eta_n\xrightarrow{\mathcal L}_b\xi$. So, assume $\xi_n\xrightarrow{\mathcal L}_b\xi$ and let $f\in U_b^b(S)$ be arbitrary but fixed; then according to (28) (cf. (h')) it suffices to show that

(+) $\lim_n|\mathbb E(f(\xi_n))-\mathbb E(f(\eta_n))|=0$.

(Note that $f(\xi_n)$ and $f(\eta_n)$, as well as $f(\xi)$, are random variables.)

ad (+): Given an arbitrary $\varepsilon>0$ there exists (by uniform continuity of $f$) a $\delta=\delta(\varepsilon)>0$ such that $|f(x)-f(y)|\le\varepsilon$ whenever $d(x,y)\le\delta$; also $\|f\|=\sup_{x\in S}|f(x)|<\infty$. Therefore,

$$|\mathbb E(f(\xi_n))-\mathbb E(f(\eta_n))|\le\int|f(\xi_n)-f(\eta_n)|\,d\mathbb P\le\int^*1_{\{d(\xi_n,\eta_n)>\delta\}}|f(\xi_n)-f(\eta_n)|\,d\mathbb P+\int^*1_{\{d(\xi_n,\eta_n)\le\delta\}}|f(\xi_n)-f(\eta_n)|\,d\mathbb P\le2\|f\|\,\mathbb P^*(d(\xi_n,\eta_n)>\delta)+\varepsilon\to\varepsilon\quad\text{as }n\to\infty,$$

whence $\limsup_n|\mathbb E(f(\xi_n))-\mathbb E(f(\eta_n))|\le\varepsilon$ for every $\varepsilon>0$, which implies (+). □

The following version of Theorem 9a is useful as well.

THEOREM 9a'. Let $(\xi_n)_{n\in\mathbb N}$ and $(\eta_n)_{n\in\mathbb N}$ be sequences of random elements in $(S,\mathcal A)$, defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$, such that

(a) $\lim_{n\to\infty}\mathbb P^*(d(\xi_n,\eta_n)>\delta)=0$ for every $\delta>0$.

Let $S'=(S',d')$ be another metric space and $H:S\to S'$ be $\mathcal A,\mathcal B_b(S')$-measurable and such that

(b) $d'(H(x),H(y))\le L\,d(x,y)$ for all $x,y\in S$ and some constant $0<L<\infty$.

Then, for any random element $\zeta$ in $(S',\mathcal B_b(S'))$, $H(\xi_n)\xrightarrow{\mathcal L}_b\zeta$ iff $H(\eta_n)\xrightarrow{\mathcal L}_b\zeta$.

Proof. $H(\xi_n)$ and $H(\eta_n)$ are random elements in $(S',\mathcal B_b(S'))$ for which by (a) and (b)

$$\mathbb P^*(d'(H(\xi_n),H(\eta_n))>\delta)\le\mathbb P^*(d(\xi_n,\eta_n)>\delta/L)\to0$$

for every $\delta>0$, whence the assertion follows from Theorem 9a. □

REMARK. Instead of (b) it suffices to assume only that $H$ is uniformly continuous.

THEOREM 9b. Let $(\xi_n)_{n\in\mathbb N}$ be a sequence of random elements in $(S,\mathcal A)$ and let $\xi$ be a random element in $(S,\mathcal B_b(S))$, being all defined on some common p-space $(\Omega,\mathcal F,\mathbb P)$. Suppose that $\xi$ is $\mathbb P$-a.s. constant; then $\xi_n\xrightarrow{\mathcal L}_b\xi$ implies

$$\lim_n\mathbb P^*(d(\xi_n,\xi)>\delta)=0\quad\text{for every }\delta>0.$$
Proof. We show first

(+) $\lim_n\mathbb E(|f\circ\xi_n-f\circ\xi|)=0$ for all $f\in U_b^b(S)$.

In fact, we have (cf. Theorem 3) that $f\circ\xi_n\xrightarrow{\mathcal L}_bf\circ\xi$ for each $f\in U_b^b(S)$, $f\circ\xi_n$ and $f\circ\xi$ being real random variables such that $f\circ\xi$ is $\mathbb P$-a.s. constant, whence (by classical probability theory) $f\circ\xi_n\xrightarrow{\mathbb P}f\circ\xi$ (where $\xrightarrow{\mathbb P}$ denotes convergence in probability). Since $f$ is bounded, $\{f\circ\xi_n: n\in\mathbb N\}$ is uniformly integrable and therefore $f\circ\xi_n\xrightarrow{L^1}f\circ\xi$, which proves (+).

We are going to show that (+) implies $\lim_n\mathbb P^*(d(\xi_n,\xi)>\delta)=0$ for every $\delta>0$. For this, let $\delta>0$ be arbitrary; since $\mathcal L\{\xi\}(\bar S_0)=1$ for some separable $S_0\subset S$, there exists a countable and dense subset $\{x_i: i\in\mathbb N\}$ of $S_0$, and we have

(*) $\bar S_0\subset\bigcup_{i\in\mathbb N}B^\circ(x_i,\delta/4)$ and $\mathbb P(\xi\in\bar S_0)=1$.

Then, for each $i\in\mathbb N$, there exists an $f_i\in U_b^b(S)$ such that $0\le f_i\le1$ and

$$f_i(x)=\begin{cases}0, & \text{if }x\in B^\circ(x_i,\delta/4)\cap\bar S_0,\\ 1, & \text{if }x\in\complement B^\circ(x_i,\delta/2),\end{cases}$$

where $B^\circ(x_i,r)$ denotes the open ball with center $x_i$ and radius $r$. In fact, take

$$f_i(x):=1-\max\Big[\Big(1-\frac{d(x,B^\circ(x_i,\delta/4)\cap\bar S_0)}{\delta/4}\Big),0\Big]$$

to get such a function.

Now, let $A_1:=\{\xi\in B^\circ(x_1,\delta/4)\}$ and for $i\ge2$ let $A_i:=\{\xi\in B^\circ(x_i,\delta/4)\setminus\bigcup_{j=1}^{i-1}B^\circ(x_j,\delta/4)\}$; then $A_i\in\mathcal F$, the $A_i$'s being pairwise disjoint and such that $\mathbb P(\bigcup_{i\in\mathbb N}A_i)=1$ according to (*). Therefore,

$$\mathbb P^*(d(\xi_n,\xi)>\delta)\le\sum_{i\in\mathbb N}\mathbb P^*(\{d(\xi_n,\xi)>\delta\}\cap A_i)\le\sum_{i\in\mathbb N}\mathbb P^*\big(\{d(\xi_n,x_i)>\tfrac34\delta\}\cap A_i\big)\le\sum_{i\in\mathbb N}\int_{A_i}|f_i\circ\xi_n-f_i\circ\xi|\,d\mathbb P,$$

where the last inequality follows from the fact that for ($\mathbb P$-almost) all $\omega\in\{d(\xi_n,x_i)>\frac34\delta\}\cap A_i$ one has, by construction of the $f_i$'s, $f_i(\xi_n(\omega))=1$ and $f_i(\xi(\omega))=0$.

If we put $g_n(i):=\int_{A_i}|f_i\circ\xi_n-f_i\circ\xi|\,d\mathbb P$ and $g(i):=\mathbb P(A_i)$ for each $i\in\mathbb N$, we obtain functions $g_n$ and $g$ on $\mathbb N$ for which $0\le g_n\le g$ and

$$\sum_{i\in\mathbb N}g(i)=\sum_{i\in\mathbb N}\mathbb P(A_i)=\mathbb P\Big(\bigcup_{i\in\mathbb N}A_i\Big)=1,$$

i.e., the $g_n$'s are integrable functions on $\mathbb N$ (integrable w.r.t. the counting measure on $\mathbb N$) being dominated by an integrable function $g$; since, by (+), $\lim_{n\to\infty}g_n(i)=0$ for all $i\in\mathbb N$, it follows from Lebesgue's dominated convergence theorem that

$$\limsup_{n\to\infty}\mathbb P^*(d(\xi_n,\xi)>\delta)\le\lim_{n\to\infty}\sum_{i\in\mathbb N}g_n(i)=0.$$
□
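Returning to Theorem 9a, the estimate at the heart of its proof, $|\mathbb E f(\xi_n)-\mathbb E f(\eta_n)|\le2\|f\|\,\mathbb P^*(d(\xi_n,\eta_n)>\delta)+\varepsilon$, can be checked exactly on a finite probability space, where outer and ordinary probabilities coincide. In the Python sketch below (hypothetical data, $S=\mathbb R$), $f$ is 1-Lipschitz with $\|f\|=1$, so $\varepsilon=\delta$ is admissible for every $\delta>0$.

```python
from fractions import Fraction


def f(x):
    """Bounded 1-Lipschitz test function: sup |f| = 1 and |f(x) - f(y)| <= |x - y|."""
    return max(-1, min(1, x))


# Hypothetical finite probability space Omega = {0, ..., 9} with uniform weights.
omega = range(10)
p = Fraction(1, 10)
xi = {w: Fraction(w, 5) - 1 for w in omega}
eta = {w: xi[w] + (Fraction(1, 100) if w < 8 else 2) for w in omega}  # far apart only on {8, 9}


def gap():
    """|E f(xi) - E f(eta)|, computed exactly."""
    return abs(sum(p * (f(xi[w]) - f(eta[w])) for w in omega))


def bound(delta):
    """2 ||f|| P(d(xi, eta) > delta) + eps, with ||f|| = 1 and eps = delta."""
    far = sum(p for w in omega if abs(xi[w] - eta[w]) > delta)
    return 2 * far + delta


for delta in (Fraction(1, 200), Fraction(1, 50), Fraction(1, 2)):
    print(delta, gap(), bound(delta))
```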
Finally, concerning the speed of convergence we have the following result:

THEOREM 10. Let $\xi_n$, $n\in\mathbb N$, and $\eta$ be random elements in $(S,\mathcal A)$ defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$ such that for some sequence $a_n\downarrow0$

(a') $\mathbb P^*(d(\xi_n,\eta)>a_n)=O(a_n)$.

Let $H:S\to\mathbb R$ be $\mathcal A,\mathcal B$-measurable and such that

(b') $|H(x)-H(y)|\le L\,d(x,y)$ for all $x,y\in S$ and some constant $0<L<\infty$.

Assume further that $\mathcal L\{H(\eta)\}$ is absolutely continuous w.r.t. Lebesgue measure $\lambda$, with $\lambda$-density $h$ such that

(c') $\|h\|=\sup_{t\in\mathbb R}|h(t)|=:M<\infty$.

Then $\sup_{t\in\mathbb R}|\mathbb P(H(\xi_n)\le t)-\mathbb P(H(\eta)\le t)|=O(a_n)$.

Proof. Let $t\in\mathbb R$ be arbitrary but fixed; then

$$\begin{aligned}\mathbb P(H(\xi_n)\le t)-\mathbb P(H(\eta)\le t)&\le\mathbb P^*(H(\xi_n)\le t,\ d(\xi_n,\eta)\le a_n)+O(a_n)-\mathbb P(H(\eta)\le t)\\&\le\mathbb P(H(\xi_n)\le t,\ |H(\xi_n)-H(\eta)|\le La_n)+O(a_n)-\mathbb P(H(\eta)\le t)\\&\le\mathbb P(H(\eta)\le t+La_n)+O(a_n)-\mathbb P(H(\eta)\le t)\\&\overset{(c')}{\le}M\,L\,a_n+O(a_n)=O(a_n).\end{aligned}$$

In the same way one obtains that $\mathbb P(H(\xi_n)>t)-\mathbb P(H(\eta)>t)=O(a_n)$, whence also $\mathbb P(H(\eta)\le t)-\mathbb P(H(\xi_n)\le t)=O(a_n)$, so that in summary

$$\sup_{t\in\mathbb R}|\mathbb P(H(\xi_n)\le t)-\mathbb P(H(\eta)\le t)|=O(a_n).$$
□
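A worked instance of Theorem 10 (our own choice of data): take $S=\mathbb R$, $\eta\sim N(0,1)$, $\xi_n:=\eta+a_n$ and $H$ the identity, so that (a') holds trivially ($d(\xi_n,\eta)=a_n$, hence $\mathbb P(d(\xi_n,\eta)>a_n)=0$), (b') holds with $L=1$, and (c') holds with $M=1/\sqrt{2\pi}$. The proof then yields the explicit bound $MLa_n$ for the Kolmogorov distance, whose exact value here is $2\Phi(a_n/2)-1$. The Python sketch evaluates the distance on a grid (which can only underestimate the supremum) and compares it with the bound.

```python
import math


def Phi(t):
    """Standard normal distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def kolmogorov_gap(a, grid):
    """max over the grid of |P(eta + a <= t) - P(eta <= t)| for eta ~ N(0, 1)."""
    return max(abs(Phi(t - a) - Phi(t)) for t in grid)


M = 1.0 / math.sqrt(2.0 * math.pi)   # sup of the N(0, 1) density, cf. (c')
L = 1.0                              # H = identity is 1-Lipschitz, cf. (b')
grid = [i / 1000.0 for i in range(-6000, 6001)]

for n in (1, 10, 100):
    a_n = 1.0 / n
    print(n, kolmogorov_gap(a_n, grid), M * L * a_n)
```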
SOME REMARKS ON PRODUCT SPACES: Let $S'=(S',d')$ and $S''=(S'',d'')$ be two (possibly non-separable) metric spaces. Let $S:=S'\times S''$ be the Cartesian product of $S'$ and $S''$ and let $d:=\max(d',d'')$, i.e.,

$$d((x',x''),(y',y'')):=\max(d'(x',y'),d''(x'',y''))$$

for $(x',x'')\in S$ and $(y',y'')\in S$. Then $S=(S,d)$ is again a (possibly non-separable) metric space.
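The max-metric is easy to experiment with. The Python sketch below (hypothetical spaces: $S'=\mathbb R$ with $|x-y|$, and $S''$ a set of labels with the discrete metric) checks the triangle inequality on a few points, the identity $d((x',x''),(x',y''))=d''(x'',y'')$ used in the proof of Theorem 9c, and the fact that open $d$-balls are products of open balls, which is one way to see inclusion (1) of the following remark.

```python
import itertools


def max_metric(d1, d2):
    """Product metric d((x', x''), (y', y'')) = max(d'(x', y'), d''(x'', y''))."""
    return lambda p, q: max(d1(p[0], q[0]), d2(p[1], q[1]))


def d_abs(x, y):          # S' = R with the usual metric
    return abs(x - y)


def d_discrete(x, y):     # S'' = a set of labels with the discrete metric
    return 0.0 if x == y else 1.0


d = max_metric(d_abs, d_discrete)
pts = [(0.0, "a"), (1.5, "b"), (-2.0, "a"), (0.5, "c"), (0.25, "a")]

# all triples violating the triangle inequality (should be none)
violations = [
    (p, q, r) for p, q, r in itertools.product(pts, repeat=3) if d(p, r) > d(p, q) + d(q, r)
]
print(d(pts[0], pts[1]), d(pts[0], pts[4]), violations)
```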
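REMARK.
(1) $\mathcal B_b(S)\subset\mathcal B_b(S')\otimes\mathcal B_b(S'')$ and
(2) $\mathcal B(S')\otimes\mathcal B(S'')\subset\mathcal B(S)$,
the inclusions being strict in general, as can be shown by examples.

Let $\mathcal A'$ and $\mathcal A''$ be σ-algebras of subsets of $S'$ and $S''$, respectively, such that $\mathcal B_b(S')\subset\mathcal A'\subset\mathcal B(S')$ and $\mathcal B_b(S'')\subset\mathcal A''\subset\mathcal B(S'')$. Then

$$\mathcal B_b(S)\overset{(1)}{\subset}\mathcal B_b(S')\otimes\mathcal B_b(S'')\subset\mathcal A'\otimes\mathcal A''\subset\mathcal B(S')\otimes\mathcal B(S'')\overset{(2)}{\subset}\mathcal B(S),$$

i.e., putting, e.g.,

(a) $\mathcal A:=\mathcal B_b(S')\otimes\mathcal B_b(S'')$ or
(a') $\mathcal A:=\mathcal A'\otimes\mathcal A''$,

we have again $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$ for the product space $S=S'\times S''$.

Now, let $\xi_n$, $n\in\mathbb N$, be random elements in $(S',\mathcal A')$, $\xi$ be a random element in $(S',\mathcal B_b(S'))$, $\eta_n$, $n\in\mathbb N$, be random elements in $(S'',\mathcal A'')$, and let $\eta$ be a random element in $(S'',\mathcal B_b(S''))$; suppose that all these random elements are defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$. Then $(\xi_n,\eta_n)$, $n\in\mathbb N$, are random elements in $(S,\mathcal A)$ (for both choices of $\mathcal A$ as in (a) or (a')), and $(\xi,\eta)$ is a random element in $(S,\mathcal B_b(S')\otimes\mathcal B_b(S''))$ as well as in $(S,\mathcal B_b(S))$ (cf. (1) in the above remark). Thus, considering $(\xi,\eta)$ as a random element in $(S,\mathcal B_b(S))$, $(\xi_n,\eta_n)\xrightarrow{\mathcal L}_b(\xi,\eta)$ is again defined in the sense of (34), i.e., as

$$\mathcal L\{(\xi_n,\eta_n)\}\ (\in\mathcal M^1_{\mathcal A}(S))\ \to_b\ \mathcal L\{(\xi,\eta)\}\ (\in\mathcal M^1_b(S)).$$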
Supplementing the results contained in Theorems 9a and 9b we can prove within the setting just described the following Theorems 9c and 9d:
THEOREM 9c, Suppose that n equals P-a.s. some constant c; L
then
L
b
ξ
L
b
>ξ
and
η
h
* η.
together imply
(ξ ,n )
> (ξ,n).
Proof. According to Theorem 9b,
η
>η
Since
and
d((ξ
n
n = c P-a.s,
5
η ),(ξ n n
,η))
imply
= max
(df(ξ
lim
P (d π (n ,n) > δ) = 0
, ξ ) ,d"(n ,η)) nn n
= d
f !
(n
n
for every δ>0.
,n),
we thus have lim P*(d((ξ ,η ),(ξsη))>fi) = 0 for every <5>0. nχ» Therefore, by Theorem 9a, the assertion of the present theorem will follow if we show
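ad (+): 1.) $\mathcal L\{(\xi,\eta)\}$ is separable: since $\mathcal L\{\xi\}$ is separable, there exists a separable $S'_0 \subset S'$ such that $\mathcal L\{\xi\}(\overline{S'_0}) = 1$. Take $S_0 := \overline{S'_0} \times \{c\}$ to get a separable and closed subset $S_0 = \overline{S_0}$ of $S$ for which
$$\mathcal L\{(\xi,\eta)\}(S_0) = P((\xi,\eta) \in S_0) = P\big((\xi,\eta) \in \overline{S'_0}\times\{c\}\big) = P\big((\xi,c) \in \overline{S'_0}\times\{c\}\big) = P(\xi \in \overline{S'_0}) = \mathcal L\{\xi\}(\overline{S'_0}) = 1.$$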
2.) According to the Portmanteau theorem (cf. (h') there) it remains to show that
$$\int_S f\,d\mu_n \to \int_S f\,d\mu \quad\text{for all } f \in U_b(S),$$
where $\mu_n := \mathcal L\{(\xi_n,\eta)\}$ and $\mu := \mathcal L\{(\xi,\eta)\}$.
PETER GAENSSLER
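Now, given any $f : S = S'\times S'' \to \mathbb R$ that is bounded, $d$-uniformly continuous and $\mathcal B_b(S)$-measurable, it follows from (1) in the remark made at the beginning that $f$ is also $\mathcal B_b(S')\otimes\mathcal B_b(S'')$-measurable, whence $f' : S' \to \mathbb R$, defined by $f'(x') := f(x',c)$, $x' \in S'$, is $\mathcal B_b(S')$-measurable; since $d((x',c),(y',c)) = d'(x',y')$, $f'$ is also bounded and uniformly continuous, and thus $f' \in U_b(S')$. But now, with $\mu'_n := \mathcal L\{\xi_n\}$ and $\mu' := \mathcal L\{\xi\}$, we obtain from $\xi_n \xrightarrow{\mathcal L_b} \xi$ (using again the Portmanteau theorem):
$$\int_S f\,d\mu_n = \int_\Omega f\circ(\xi_n,\eta)\,dP = \int_\Omega f\circ(\xi_n,c)\,dP = \int_\Omega f'\circ\xi_n\,dP = \int_{S'} f'\,d\mu'_n \to \int_{S'} f'\,d\mu' = \int_\Omega f'\circ\xi\,dP = \int_\Omega f\circ(\xi,c)\,dP = \int_\Omega f\circ(\xi,\eta)\,dP = \int_S f\,d\mu.$$
□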
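For sequences of independent random elements one gets
THEOREM 9d. Suppose that $\xi_n$ and $\eta_n$ are independent for each $n \in \mathbb N$, and suppose also that $\xi$ and $\eta$ are independent. Then the following two statements are equivalent:
(i) $\xi_n \xrightarrow{\mathcal L_b} \xi$ and $\eta_n \xrightarrow{\mathcal L_b} \eta$;
(ii) $(\xi_n,\eta_n) \xrightarrow{\mathcal L_b} (\xi,\eta)$.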
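Proof. (i) ⇒ (ii): 1.) $\mathcal L\{(\xi,\eta)\}$ is separable: since both $\mathcal L\{\xi\}$ and $\mathcal L\{\eta\}$ are separable, there exist $S'_0 \subset S'$ and $S''_0 \subset S''$ such that $(S'_0,d')$ and $(S''_0,d'')$ are separable and $\mathcal L\{\xi\}(\overline{S'_0}) = \mathcal L\{\eta\}(\overline{S''_0}) = 1$. Put $S_0 := \overline{S'_0} \times \overline{S''_0}$ to get a separable and closed subspace of $S = (S,d)$ ($S = S'\times S''$, $d = \max(d',d'')$) for which $\mathcal L\{(\xi,\eta)\}(S_0) = P(\xi \in \overline{S'_0},\ \eta \in \overline{S''_0}) = 1$.
2.) According to the Portmanteau theorem (cf. (a') there) it remains to show
(+) $\liminf_{n\to\infty} \mathcal L\{(\xi_n,\eta_n)\}(G) \ge \mathcal L\{(\xi,\eta)\}(G)$ for all $G \in \mathcal G(S) \cap \mathcal A$ (where $\mathcal A = \mathcal A' \otimes \mathcal A''$).
For this, let $\mu' := \mathcal L\{\xi\}$ and $\mu'' := \mathcal L\{\eta\}$ and let
$$\mathcal C := \{A'\times A'' : A' \in \mathcal A',\ \bar\mu'(\partial A') = 0,\ A'' \in \mathcal A'',\ \bar\mu''(\partial A'') = 0\};$$
then $\mathcal C$ is closed under finite intersections, i.e., $\mathcal C = \mathcal C_\cap$, and (33) holds, which means that for each $x \in S$ the class $\{C \in \mathcal C : x \in \mathring C\}$ is a neighborhood base at $x$. Furthermore, by assumption and the Portmanteau theorem (cf. (f') there), we have for $\mu'_n = \mathcal L\{\xi_n\}$ and $\mu''_n = \mathcal L\{\eta_n\}$
$$\lim_{n\to\infty} (\mu'_n\times\mu''_n)(A'\times A'') = \lim_{n\to\infty} \mu'_n(A')\,\mu''_n(A'') = \mu'(A')\,\mu''(A'') = (\mu'\times\mu'')(A'\times A'')$$
for all $A'\times A'' \in \mathcal C = \mathcal C_\cap$. Since, by independence, $\mathcal L\{(\xi_n,\eta_n)\} = \mu'_n\times\mu''_n$ and $\mathcal L\{(\xi,\eta)\} = \mu'\times\mu''$, (+) follows from Lemma 17.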
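(ii) ⇒ (i): 1.) Both $\mathcal L\{\xi\}$ and $\mathcal L\{\eta\}$ are separable: since $\mathcal L\{(\xi,\eta)\}$ is separable, there exists a separable $S_0 \subset S = S'\times S''$ such that $\mathcal L\{(\xi,\eta)\}(\overline{S_0}) = 1$. Put
$$S'_0 := \{x \in S' : \exists\, y \in S'' \text{ such that } (x,y) \in S_0\}$$
to get a separable $S'_0 \subset S'$ for which $\overline{S_0} \subset \pi'^{-1}(\overline{S'_0})$, whence
$$\mathcal L\{\xi\}(\overline{S'_0}) = \mathcal L\{(\xi,\eta)\}\big(\pi'^{-1}(\overline{S'_0})\big) \ge \mathcal L\{(\xi,\eta)\}(\overline{S_0}) = 1;$$
here $\pi'$ denotes the projection of $S = S'\times S''$ onto $S'$. In the same way one shows that $\mathcal L\{\eta\}$ is separable.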
2.) According to the Portmanteau theorem (cf. (a') there) it remains to show that for $\mu'_n = \mathcal L\{\xi_n\}$ and $\mu' = \mathcal L\{\xi\}$
(+) $\liminf_{n\to\infty} \mu'_n(G') \ge \mu'(G')$ for all $G' \in \mathcal G(S') \cap \mathcal A'$,
and that for $\mu''_n = \mathcal L\{\eta_n\}$ and $\mu'' = \mathcal L\{\eta\}$
(++) $\liminf_{n\to\infty} \mu''_n(G'') \ge \mu''(G'')$ for all $G'' \in \mathcal G(S'') \cap \mathcal A''$.
We will show (+); the proof of (++) runs analogously.
ad (+): Let $G' \in \mathcal G(S') \cap \mathcal A'$ be arbitrary but fixed; then $\pi'^{-1}(G') = G' \times S'' \in \mathcal A \cap \mathcal G(S)$ ($\mathcal A = \mathcal A' \otimes \mathcal A''$) and
$$\mu'_n(G') = (\mu'_n\times\mu''_n)\big(\pi'^{-1}(G')\big) = (\mu'_n\times\mu''_n)(G'\times S'') \quad\text{for each } n \in \mathbb N.$$
By assumption and the Portmanteau theorem (cf. (a') there) we therefore obtain
$$\liminf_{n\to\infty} \mu'_n(G') = \liminf_{n\to\infty} (\mu'_n\times\mu''_n)(G'\times S'') \ge (\mu'\times\mu'')(G'\times S'') = \mu'(G')\,\mu''(S'') = \mu'(G').$$
□
Remark. Using the continuous mapping theorem (Theorem 3) one easily gets an alternative proof of "(ii) ⇒ (i)" in Theorem 9d, even without imposing the independence assumptions.
SEQUENTIAL COMPACTNESS: We have shown before (cf. (32)) that any net $(\mu_\alpha) \subset M_a(S)$ which fulfills (b$_i$), $i = 1,2$, is a compact net in $M_a(S)$. At this point we ask whether the same is true for sequences instead of nets, i.e., whether for any sequence $(\mu_n)_{n\in\mathbb N} \subset M_a(S)$ fulfilling (b$_i$), $i = 1,2$, there exist a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ and a separable $\mu \in M_b(S)$ such that $\mu_{n_k} \xrightarrow{w} \mu$ (as $k \to \infty$). (Note that a subnet of a sequence need not be a sequence!) If (b$_i$), $i = 1,2$, is replaced by the (stronger) assumption that $(\mu_n)_{n\in\mathbb N}$ is δ-tight (cf. (31)), then the answer is affirmative; in fact, as shown by Dudley (1966), Theorem 1, the following is true:
(35) For any δ-tight sequence $(\mu_n)_{n\in\mathbb N} \subset M_a(S)$ there exist a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ and a Borel p-measure $\bar\mu$ (on $\mathcal B(S)$) such that
(36) $$\lim_{k\to\infty} \int^* f\,d\mu_{n_k} = \lim_{k\to\infty} \int_* f\,d\mu_{n_k} = \int f\,d\bar\mu \quad\text{for all } f \in C_b(S).$$
Based on this result we obtain in a first step the following theorem:
THEOREM 11. Let $(\mu_n)_{n\in\mathbb N} \subset M_a(S)$ be δ-tight; then there exist a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ and a separable $\mu \in M_b(S)$ such that $\mu_{n_k} \xrightarrow{w} \mu$.
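Proof. Apply (35) to get a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ and a Borel p-measure $\bar\mu$ for which (36) holds true. Then it can be shown as in the "(h') ⇒ (b)" part of the proof of (28) that (36) implies
(+) $\limsup_{k\to\infty} \mu^*_{n_k}(F) \le \bar\mu(F)$ for all $F \in \mathcal F(S)$.
(In fact, given an arbitrary $F \in \mathcal F(S)$, there exists for every $n \in \mathbb N$ an $\varepsilon_n > 0$ such that $\bar\mu(F^{\varepsilon_n}) \le \bar\mu(F) + \frac1n$ (since $F^{\varepsilon} \downarrow F$ as $\varepsilon \downarrow 0$); taking then
$$f_n(x) := \begin{cases} d(x,\complement F^{\varepsilon_n})/\varepsilon_n & \text{if } \complement F^{\varepsilon_n} \ne \emptyset,\\ 1 & \text{if } \complement F^{\varepsilon_n} = \emptyset,\end{cases} \qquad x \in S,$$
and $g_n := \min(f_n,1)$, we obtain a sequence of functions $g_n$ having the following properties for every $n \in \mathbb N$:
(1) $g_n \in C_b(S)$,
(2) $\operatorname{rest}_{\complement F^{\varepsilon_n}} g_n = 0$, and
(3) $\operatorname{rest}_F g_n = 1$.
Therefore, for every $n \in \mathbb N$ we obtain
$$\limsup_{k\to\infty} \mu^*_{n_k}(F) = \limsup_{k\to\infty} \int^* 1_F\,d\mu_{n_k} \overset{(3)}{\le} \limsup_{k\to\infty} \int^* g_n\,d\mu_{n_k} \overset{(36)}{=} \int g_n\,d\bar\mu \overset{(2)}{\le} \bar\mu(F^{\varepsilon_n}) \le \bar\mu(F) + \frac1n,$$
which implies (+).)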
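The cut-off functions $g_n$ used above can be computed explicitly in a simple case. A minimal sketch on the real line, assuming (purely for illustration) that $F$ is a finite union of closed intervals, so that $F^{\varepsilon}$ is a finite union of open intervals and $d(x,\complement F^{\varepsilon})$ can be evaluated exactly; the helper names are hypothetical. It exhibits properties (3) and (2): $g = 1$ on $F$ and $g = 0$ off $F^{\varepsilon}$.

```python
def eps_neighbourhood(intervals, eps):
    """Components of F^eps = {y : d(y, F) < eps} for F = union of closed [a, b].

    Each [a, b] contributes the open interval (a - eps, b + eps); overlapping
    open intervals are merged. Intervals that merely touch stay separate,
    because their common endpoint does not belong to F^eps.
    """
    opened = sorted((a - eps, b + eps) for a, b in intervals)
    merged = [list(opened[0])]
    for lo, hi in opened[1:]:
        if lo < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [(lo, hi) for lo, hi in merged]


def dist_to_complement(x, components):
    """d(x, complement of a union of disjoint open intervals); 0 outside it."""
    for lo, hi in components:
        if lo < x < hi:
            return min(x - lo, hi - x)
    return 0.0


def g(x, intervals, eps):
    """g(x) = min(d(x, complement of F^eps) / eps, 1), as in the proof above."""
    comps = eps_neighbourhood(intervals, eps)
    return min(dist_to_complement(x, comps) / eps, 1.0)


F = [(0.0, 1.0), (1.5, 2.0)]  # hypothetical closed set F in R
for x in (0.5, 1.0, 1.125, 1.25, 1.5, 3.0):
    print(x, g(x, F, 0.25))
```

With $\varepsilon = 0.25$ the two neighbourhoods $(-0.25, 1.25)$ and $(1.25, 2.25)$ touch at $1.25$, which lies outside $F^{\varepsilon}$, so $g(1.25) = 0$ while $g = 1$ at every point of $F$.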
Now, we are going to show that (due to the δ-tightness of $(\mu_n)$)
(++) $\bar\mu$ is necessarily tight,
whence $\mu := \operatorname{rest}_{\mathcal B_b(S)} \bar\mu$ is also tight and therefore separable (cf. (25)), and thus (noticing also (24)) we can apply (28) (cf. the equivalent statements (g) and (g')) to obtain the result, i.e., $\mu_{n_k} \xrightarrow{w} \mu$.
ad (++): Since $(\mu_n)$ is δ-tight, it follows that for every $n \in \mathbb N$ there exists a $K_n \in \mathcal K(S)$ such that
(+++) $\liminf_{k\to\infty} \mu_{n_k}\big(K_n^{1/2m}\big) \ge 1 - \frac1n$ for all $m \in \mathbb N$.
W.l.o.g. we may assume $K_n \uparrow$, and therefore
$$\bar\mu\Big(\bigcup_n K_n\Big) = \lim_{n} \bar\mu(K_n) = \lim_{n}\Big(\lim_{m} \bar\mu\big(K_n^{1/m}\big)\Big) \ge \lim_{n}\Big(\lim_{m} \bar\mu\big(\overline{K_n^{1/2m}}\big)\Big) \overset{(+)}{\ge} \lim_{n}\lim_{m}\limsup_{k\to\infty} \mu^*_{n_k}\big(\overline{K_n^{1/2m}}\big) \ge \lim_{n}\lim_{m}\liminf_{k\to\infty} \mu_{n_k}\big(K_n^{1/2m}\big) \overset{(+++)}{\ge} 1.$$
□
The proof of Theorem 11 also shows that the following result holds true:
THEOREM 11*. If $S_0 = (S_0,d)$ is a separable subspace of $S$ and if $(\mu_n)_{n\in\mathbb N} \subset M_a(S)$ is δ-tight w.r.t. $S_0$ (i.e., if $\sup_{K\in\mathcal K(S_0)} \inf_{\delta>0} \liminf_{n\to\infty} \mu_n(K^\delta) = 1$), then there exist a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ and a $\mu \in M_b(S)$ with $\mu(\overline{S_0}) = 1$ such that $\mu_{n_k} \xrightarrow{w} \mu$.
As to our question raised at the beginning, it was shown by J. Schattauer (1982) that the assertion of Theorem 11 even holds if the assumption of δ-tightness of $(\mu_n)$ is replaced by the (weaker) conditions (b$_i$), $i = 1,2$:
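THEOREM 11a). Let $(\mu_n)_{n\in\mathbb N}$ be a sequence in $M_a(S)$ fulfilling the following two conditions:
(b$_1$) For every sequence $(f_m)_{m\in\mathbb N} \subset U_b(S)$ with $f_m \downarrow 0$ one has
$$\limsup_{n\to\infty} \int f_m\,d\mu_n \downarrow 0 \quad\text{as } m \to \infty.$$
(b$_2$) There exists a separable $S_0 \subset S$ such that $\liminf_{n\to\infty} \int f\,d\mu_n \ge 1$ for all $f \in U_b(S)$ with $f \ge 1_{\overline{S_0}}$.
Then there exist a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ and a $\mu \in M_b(S)$ with $\mu(\overline{S_0}) = 1$ such that $\mu_{n_k} \xrightarrow{w} \mu$ (as $k \to \infty$).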
For the proof of this theorem we need an auxiliary result which is based on the following
DEFINITION. Let $S = (S,d)$ be a metric space and $A_i \subset S$, $i = 1,2$; $A_1$ and $A_2$ are said to be $d$-strictly separated if either $A_1^{\delta_1} \cap A_2 = \emptyset$ for some $\delta_1 > 0$ or $A_1 \cap A_2^{\delta_2} = \emptyset$ for some $\delta_2 > 0$, where $A_i^{\delta_i} := \{x \in S : d(x,A_i) < \delta_i\}$, $i = 1,2$.
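PROPOSITION (cf. E. Hewitt (1947), Theorem 1). Let $S = (S,d)$ be a metric space, and let $\mathcal G$ be a subset of $U_b(S)$ such that for every $d$-strictly separated pair $F_1, F_2 \in \mathcal F(S)$ there exists a function $g \in \mathcal G$ such that
$$\sup_{x\in F_1} g(x) < \inf_{x\in F_2} g(x) \quad\Big(\text{or } \sup_{x\in F_2} g(x) < \inf_{x\in F_1} g(x)\Big).$$
Then $\mathcal G$ is an "analytic generator" of $U_b(S)$, i.e., for every $f \in U_b(S)$ and every positive real number $\varepsilon$ there exist functions $f_1,\dots,f_k \in \mathcal G$ and a polynomial
$$P(z_1,\dots,z_k) \equiv \sum_{\nu_1=0}^{l}\cdots\sum_{\nu_k=0}^{l} \alpha_{\nu_1\cdots\nu_k}\, z_1^{\nu_1}\cdots z_k^{\nu_k}$$
(with real coefficients $\alpha_{\nu_1\cdots\nu_k}$) such that
$$\|f - P(f_1,\dots,f_k)\| := \sup_{x\in S} \big|f(x) - P(f_1,\dots,f_k)(x)\big| < \varepsilon.$$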
Proof of the proposition. This follows along the same lines as in Hewitt (1947), noticing that the functions $\psi, \varphi, h, h_1, \dots, h_n$ and $\varphi$, $(\frac32)^i\big(\varphi - h_1 - \frac23 h_2 - \dots - (\frac23)^{i-1} h_i\big)$, $1 \le i \le n$, respectively, occurring there are uniformly continuous, which implies that for each $f$ among the latter functions the sets
$$F_1 := \{x \in S : f(x) \le -\tfrac13\} \quad\text{and}\quad F_2 := \{x \in S : f(x) \ge \tfrac13\}$$
are $d$-strictly separated. In fact, $f \in U_b(S)$ implies that for $\varepsilon = \frac16$ there exists a $\delta > 0$ such that $|f(x) - f(y)| < \frac16$ whenever $d(x,y) < \delta$; thus, given any $x \in (F_1)^\delta$, we have $d(x,x_0) < \delta$ for some $x_0 \in F_1$, and therefore $|f(x) - f(x_0)| < \frac16$, which implies $f(x) < -\frac16$ for all $x \in (F_1)^\delta$; since $f(x) \ge \frac13$ for all $x \in F_2$, we thus have $(F_1)^\delta \cap F_2 = \emptyset$. □
Proof of Theorem 11a). According to (b$_2$), let $T \subset S_0$ be countable and dense in $S_0$ (as well as in $\overline{S_0}$), say $T = \{x_1,x_2,\dots\}$, and let
$$\mathcal G_1 := \{\min(d(\cdot,x_n),1) : n \in \mathbb N\},$$
$$\mathcal G_2 := \{f : S \to \mathbb R : f = \min_{1\le i\le n} g_i,\ g_i \in \mathcal G_1,\ i = 1,\dots,n,\ n \in \mathbb N\},$$
$$\mathcal G_3 := \{f : S \to \mathbb R : f = g_1\cdots g_k \text{ for some } g_1,\dots,g_k \in \mathcal G_2 \cup \{0\},\ k \in \mathbb N\};$$
then $\mathcal G_1 \subset \mathcal G_2 \subset \mathcal G_3$, with $\mathcal G_3$ being a countable class of functions in $U_b(S)$ (cf. Lemma 11 (ii)). Therefore, by the diagonal method, there exists a subsequence $(\mu_{n_k})_{k\in\mathbb N}$ of $(\mu_n)_{n\in\mathbb N}$ such that
(i) $\lim_{k\to\infty} \int f\,d\mu_{n_k}$ exists for all $f \in \mathcal G_3$.
Let $\mathcal G_4 := \{f : S \to \mathbb R : f = \min(d(\cdot,F^\varepsilon\cap S_0),1) \text{ for some } F \in \mathcal F(\overline{S_0}),\ \varepsilon > 0\}$; then (cf. Lemma 11 (ii)) $\mathcal G_4 \subset U_b(S)$, and
(ii) For any $f \in \mathcal G_4$ there exists a sequence $(f_n)_{n\in\mathbb N} \subset \mathcal G_2$ such that $f_n \downarrow f$.
ad (ii): Let $f \in \mathcal G_4$, i.e., $f = \min(d(\cdot,F^\varepsilon\cap S_0),1)$, $F \in \mathcal F(\overline{S_0})$, $\varepsilon > 0$. It is easy to show that $T \cap F^\varepsilon$ is countable and dense in $F^\varepsilon \cap S_0$; let $T \cap F^\varepsilon = \{z_1,z_2,\dots\}$; then $d(\cdot,F^\varepsilon\cap S_0) = \inf_i d(\cdot,z_i)$, and therefore $g_n := \min_{1\le i\le n} d(\cdot,z_i) \downarrow \inf_i d(\cdot,z_i)$, i.e., $f_n := \min(g_n,1) \downarrow f$, where $f_n = \min_{1\le i\le n}\big(\min(d(\cdot,z_i),1)\big) \in \mathcal G_2$ for each $n$, which proves (ii).
Now, let
$$\mathcal G_5 := \{f : S \to \mathbb R : f = g_1\cdots g_k \text{ for some } g_1,\dots,g_k \in \mathcal G_4 \cup \{0\},\ k \in \mathbb N\};$$
then, since $\mathcal G_4 \subset U_b(S)$, we have also
(iii) $\mathcal G_5 \subset U_b(S)$.
On the other hand, it follows from (ii) that
(iv) For any $f \in \mathcal G_5$ there exists a sequence $(f_n)_{n\in\mathbb N} \subset \mathcal G_3$ such that $f_n \downarrow f$ as $n \to \infty$.
Furthermore,
(v) $\lim_{k\to\infty} \int f\,d\mu_{n_k}$ exists for all $f \in \mathcal G_5$.
ad (v): Let $f \in \mathcal G_5$; then, by (iv), there exists a sequence $(f_m)_{m\in\mathbb N} \subset \mathcal G_3$ such that $f_m \downarrow f$ as $m \to \infty$. Since $f_m - f \downarrow 0$ and $f_m - f \in U_b(S)$, we obtain by (b$_1$) that $\limsup_{k\to\infty} \int (f_m - f)\,d\mu_{n_k} \to 0$ as $m \to \infty$. Since
$$\limsup_{k} \int (f_m - f)\,d\mu_{n_k} \ge \limsup_{k} \int f_m\,d\mu_{n_k} - \limsup_{k} \int f\,d\mu_{n_k} \ge 0,$$
it follows that $\limsup_k \int f_m\,d\mu_{n_k} - \limsup_k \int f\,d\mu_{n_k} \to 0$ as $m \to \infty$, and therefore we obtain by (i)
(a) $\lim_{m\to\infty}\lim_{k\to\infty} \int f_m\,d\mu_{n_k} = \limsup_{k\to\infty} \int f\,d\mu_{n_k}$.
On the other hand, since
$$\liminf_{k} \int (f - f_m)\,d\mu_{n_k} = -\limsup_{k} \int (f_m - f)\,d\mu_{n_k},$$
we have $\liminf_k \int (f - f_m)\,d\mu_{n_k} \to 0$ as $m \to \infty$. Since
$$\liminf_{k} \int (f - f_m)\,d\mu_{n_k} \le \liminf_{k} \int f\,d\mu_{n_k} - \liminf_{k} \int f_m\,d\mu_{n_k} \le 0,$$
we thus obtain in the same way as before, using (i), that $\lim_{m\to\infty}\lim_{k\to\infty} \int f_m\,d\mu_{n_k} = \liminf_{k\to\infty} \int f\,d\mu_{n_k}$, whence together with (a) the assertion in (v) follows.
Finally, let
$$\mathcal G_6 := \{f : S \to \mathbb R : f = P(g_1,\dots,g_k) \text{ for some polynomial } P \text{ and some } g_i \in \mathcal G_5,\ 1 \le i \le k,\ k \in \mathbb N\};$$
then, by (iii), $\mathcal G_6 \subset U_b(S)$, and it can easily be shown that (v) implies
(vi) $\lim_{k\to\infty} \int f\,d\mu_{n_k}$ exists for all $f \in \mathcal G_6$.
Now, let $U_b(\overline{S_0}) := \{f : \overline{S_0} \to \mathbb R : f \text{ bounded and uniformly } (d\text{-})\text{continuous}\}$ and consider $\mathcal G'_4 := \{\operatorname{rest}_{\overline{S_0}} f : f \in \mathcal G_4\} \subset U_b(\overline{S_0})$.
Let $F_1, F_2 \in \mathcal F(\overline{S_0})$ be a $d$-strictly separated pair of closed subsets in the metric space $\overline{S_0} = (\overline{S_0},d)$, i.e., (w.l.o.g.) there exists a $\delta > 0$ such that $(F_1^\delta \cap \overline{S_0}) \cap F_2 = \emptyset$. Put $f := \min(d(\cdot,F_1^{\delta/2}\cap S_0),1)$; then $f \in \mathcal G_4$ and $g := \operatorname{rest}_{\overline{S_0}} f \in \mathcal G'_4$; we will show that
(b) $\sup_{x\in F_1} g(x) < \inf_{x\in F_2} g(x)$.
ad (b): $x \in F_1$ implies $d(x,F_1^{\delta/2}\cap S_0) = 0$ since $F_1 \subset \overline{S_0}$, and therefore $f(x) = 0$ for all $x \in F_1$, whence $\sup_{x\in F_1} g(x) = 0$. On the other hand, $x \in F_2$ together with $(F_1^\delta \cap \overline{S_0}) \cap F_2 = \emptyset$ implies $d(x,F_1) \ge \delta$ and therefore $d(x,F_1^{\delta/2}) \ge \delta/2$; thus $d(x,F_1^{\delta/2}\cap S_0) \ge \delta/2$ for all $x \in F_2$, i.e., $\inf_{x\in F_2} g(x) \ge \min(\delta/2,1)$, which proves (b).
Therefore, by our proposition, $\mathcal G'_4$ is an analytic generator of $U_b(\overline{S_0})$, i.e., for every $h \in U_b(\overline{S_0})$ there exists a sequence $(g_n)_{n\in\mathbb N}$ with $g_n = P_n(g_{n1},\dots,g_{nk_n})$, $g_{ni} \in \mathcal G'_4$, $P_n$ a polynomial, such that
$$\sup_{x\in\overline{S_0}} |h(x) - g_n(x)| \to 0 \quad\text{as } n \to \infty.$$
Since $g_{ni} \in \mathcal G'_4$, $g_{ni} = \operatorname{rest}_{\overline{S_0}} f_{ni}$ for some $f_{ni} \in \mathcal G_4$, whence $f_n := P_n(f_{n1},\dots,f_{nk_n}) \in \mathcal G_6$ with $\operatorname{rest}_{\overline{S_0}} f_n = g_n$ for each $n \in \mathbb N$. We thus obtain that
(vii) For any $f \in U_b(S)$ there exists a sequence $(f_n)_{n\in\mathbb N} \subset \mathcal G_6$ such that $\sup_{x\in\overline{S_0}} |f(x) - f_n(x)| \to 0$ as $n \to \infty$.
Furthermore, we will show that
(viii) $\lim_{k\to\infty} \int f\,d\mu_{n_k}$ exists for all $f \in U_b(S)$.
ad (viii): Let $f \in U_b(S)$; then, by (vii), there exists a sequence $(f_n)_{n\in\mathbb N} \subset \mathcal G_6$ such that $\sup_{x\in\overline{S_0}} |f(x) - f_n(x)| \to 0$ as $n \to \infty$; therefore, given an arbitrary but fixed $m \in \mathbb N$, there exists an $n_0 = n_0(m) \in \mathbb N$ such that
$$\sup_{x\in\overline{S_0}} |f(x) - f_{n_0}(x)| < \tfrac1m.$$
Since $f$ and $f_{n_0}$ are uniformly continuous, there exists a $\delta_m > 0$ such that $|f(x) - f(y)| < \frac1m$ and $|f_{n_0}(x) - f_{n_0}(y)| < \frac1m$ whenever $d(x,y) < \delta_m$. Now, let $\tilde S := (\overline{S_0})^{\delta_m}$; then, for any $x \in \tilde S$ there exists a $y \in \overline{S_0}$ such that $d(x,y) < \delta_m$, and therefore
$$|f_{n_0}(x) - f(x)| \le |f_{n_0}(x) - f_{n_0}(y)| + |f_{n_0}(y) - f(y)| + |f(y) - f(x)| < \tfrac3m$$
for all $x \in \tilde S$, whence $f(x) \le f_{n_0}(x) + \frac3m$ and $f(x) \ge f_{n_0}(x) - \frac3m$ for all $x \in \tilde S$. Since $\tilde S \in \mathcal B_b(S)$ (cf. Lemma 11 (ii)), it follows that
(c) $\int f\,d\mu_{n_k} \le \int_{\tilde S} (f_{n_0} + \frac3m)\,d\mu_{n_k} + \int_{\complement\tilde S} f\,d\mu_{n_k}$ for all $k \in \mathbb N$, and
(d) $\int f\,d\mu_{n_k} \ge \int_{\tilde S} (f_{n_0} - \frac3m)\,d\mu_{n_k} + \int_{\complement\tilde S} f\,d\mu_{n_k}$ for all $k \in \mathbb N$.
Furthermore, it follows from (b$_2$) that
(e) $\limsup_{k\to\infty} \mu_{n_k}(\complement\tilde S) = 0$.
ad (e): $1_{\overline{S_0}} \le \min\big(d(\cdot,\complement\tilde S)/\delta_m,\,1\big) =: f^0 \in U_b(S)$, whence
$$\limsup_{k} \mu_{n_k}(\complement\tilde S) \le \limsup_{k} \int (1 - f^0)\,d\mu_{n_k} = 1 - \liminf_{k} \int f^0\,d\mu_{n_k} \le 0,$$
since $\liminf_k \int f^0\,d\mu_{n_k} \ge 1$ by (b$_2$). This proves (e).
Next, it can easily be shown that (e) implies
(f) $\lim_{k\to\infty} \int_{\complement\tilde S} f\,d\mu_{n_k} = 0$ for all $f \in U_b(S)$.
But then it follows from (c), (d) and (f) that
$$\limsup_{k} \int f\,d\mu_{n_k} \le \limsup_{k} \int_{\tilde S} f_{n_0}\,d\mu_{n_k} + \tfrac3m \quad\text{and}\quad \liminf_{k} \int f\,d\mu_{n_k} \ge \liminf_{k} \int_{\tilde S} f_{n_0}\,d\mu_{n_k} - \tfrac3m.$$
Furthermore, (f) together with (vi) easily implies that
$$\limsup_{k} \int_{\tilde S} f_{n_0}\,d\mu_{n_k} = \liminf_{k} \int_{\tilde S} f_{n_0}\,d\mu_{n_k} = \lim_{k} \int_{\tilde S} f_{n_0}\,d\mu_{n_k},$$
and therefore
$$\limsup_{k} \int f\,d\mu_{n_k} - \liminf_{k} \int f\,d\mu_{n_k} \le \tfrac6m,$$
which implies the assertion in (viii), since we started with an arbitrary $m$. But now, putting
$$\mu(f) := \lim_{k\to\infty} \int f\,d\mu_{n_k} \quad\text{for } f \in U_b(S),$$
the assertion of Theorem 11a) follows as in the proof of Theorem 6 by applying the Daniell-Stone representation theorem (cf. H. Bauer (1978), 3. Auflage, S. 188), noticing that $\mathcal B_b(S)$ coincides with the smallest σ-algebra with respect to which all $f \in U_b(S)$ are measurable. □
This concludes the proof of Theorem 11a). □
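SKOROKHOD-DUDLEY-WICHURA REPRESENTATION THEOREM: Let again $S = (S,d)$ be a (possibly non-separable) metric space and suppose that $\mathcal A$ is a σ-algebra of subsets of $S$ such that $\mathcal B_b(S) \subset \mathcal A \subset \mathcal B(S)$; let $(\xi_n)_{n\in\mathbb N}$ be a sequence of random elements in $(S,\mathcal A)$ and $\xi$ a random element in $(S,\mathcal B_b(S))$ such that $\xi_n \xrightarrow{\mathcal L_b} \xi$ (cf. (34)).
Then the Skorokhod-Dudley-Wichura Representation Theorem states:
THEOREM 12. $\xi_n \xrightarrow{\mathcal L_b} \xi$ implies that there exist a sequence $\tilde\xi_n$, $n \in \mathbb N$, of random elements in $(S,\mathcal A)$ and a random element $\tilde\xi$ in $(S,\mathcal B_b(S))$, all defined on an appropriate p-space $(\tilde\Omega,\tilde{\mathcal F},\tilde P)$, such that $\mathcal L\{\tilde\xi_n\} = \mathcal L\{\xi_n\}$ (on $\mathcal A$) for all $n \in \mathbb N$, $\mathcal L\{\tilde\xi\} = \mathcal L\{\xi\}$ (on $\mathcal B_b(S)$), and $\tilde\xi_n \to \tilde\xi$ $\tilde P$-almost surely as $n \to \infty$ (i.e., there exists an $\tilde\Omega_0 \subset \tilde\Omega$ with $\tilde\Omega_0 \in \tilde{\mathcal F}$ and $\tilde P(\tilde\Omega_0) = 1$ such that $\lim_{n\to\infty} d(\tilde\xi_n(\omega),\tilde\xi(\omega)) = 0$ for all $\omega \in \tilde\Omega_0$).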
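On the real line the representation can be written down directly with quantile functions: if $F_n$ and $F$ are distribution functions and $U$ is uniform on $(0,1)$, then $F_n^{-1}(U)$ and $F^{-1}(U)$ have laws $F_n$ and $F$, and they converge pointwise when $F_n \to F$ weakly. This classical one-dimensional construction is not the one used in the proof below, which has to work in arbitrary metric spaces. A minimal sketch, assuming hypothetical normal laws $\mathcal L\{\xi_n\} = N(1/n, (1+1/n)^2)$ and $\mathcal L\{\xi\} = N(0,1)$; the function name is illustrative only.

```python
from statistics import NormalDist


def coupled_pair(u, n):
    """Return (xi_n(u), xi(u)) on Omega = (0, 1) with Lebesgue measure.

    Illustrative laws (hypothetical): L{xi_n} = N(1/n, (1 + 1/n)^2), L{xi} = N(0, 1).
    Each variable is the quantile function of its law evaluated at the same u,
    so it has exactly that law, and xi_n(u) -> xi(u) for every u in (0, 1).
    """
    xi_n = NormalDist(mu=1 / n, sigma=1 + 1 / n).inv_cdf(u)
    xi = NormalDist(mu=0.0, sigma=1.0).inv_cdf(u)
    return xi_n, xi


us = [k / 100 for k in range(1, 100)]  # a grid of sample points omega = u in (0, 1)
for n in (10, 100, 1000):
    worst = max(abs(a - b) for a, b in (coupled_pair(u, n) for u in us))
    print(n, worst)
```

Here $\xi_n(u) - \xi(u) = (1 + \Phi^{-1}(u))/n$, so the largest deviation on the grid shrinks like $1/n$.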
For complete and separable metric spaces this result was proved by A.V. Skorokhod (1956); it was generalized to arbitrary separable metric spaces by R.M. Dudley (1968), and in its present form (for arbitrary metric spaces) it was first proved by M.J. Wichura (1970); cf. also R.M. Dudley (1976), Lectures 19 and 24. Our proof will be based on the one given by Dudley (1976). For this, we need the following proposition.
PROPOSITION. Let $S = (S,d)$ be a metric space and let $\mu \in M_b(S)$ be separable, i.e., $\mu(\overline{S_0}) = 1$ for some separable $S_0 \subset S$; then, given any $\varepsilon > 0$, there exists a sequence $(A_n)_{n\in\mathbb N}$ of pairwise disjoint subsets $A_n$ of $S$ having the following properties:
(i) $\overline{S_0} \subset \bigcup_{n\in\mathbb N} A_n$;
(ii) $\bar\mu(\partial A_n) = 0$ for all $n \in \mathbb N$ (where $\bar\mu$ denotes the unique Borel extension of $\mu$ (cf. (24)));
(iii) $\operatorname{diam}(A_n) := \sup_{x,y\in A_n} d(x,y) < \varepsilon$ for all $n \in \mathbb N$; and
(iv) $A_n \in \mathcal B_b(S)$ for all $n \in \mathbb N$.
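Proof. Let $\{x_1,x_2,\dots\}$ be dense in $\overline{S_0}$. For each $n \in \mathbb N$, the open ball $B(x_n,\delta)$ is a $\bar\mu$-continuity set (i.e., $\bar\mu(\partial B(x_n,\delta)) = 0$) except for at most countably many values of $\delta$ (the spheres $\{y : d(x_n,y) = \delta\}$, $\delta > 0$, are pairwise disjoint and contain $\partial B(x_n,\delta)$); hence, given any $\varepsilon > 0$, for each $n \in \mathbb N$ there exists an $\varepsilon_n$ such that $\varepsilon/4 < \varepsilon_n < \varepsilon/2$ and $\bar\mu(\partial B(x_n,\varepsilon_n)) = 0$. Now, let $A_1 := B(x_1,\varepsilon_1)$, and recursively for $n > 1$
$$A_n := B(x_n,\varepsilon_n) \setminus \bigcup_{j<n} B(x_j,\varepsilon_j).$$
Then (i)-(iv) are fulfilled. In fact, (iii) and (iv) follow at once by construction; (ii) holds since the class of all $\bar\mu$-continuity sets forms an algebra containing each $B(x_n,\varepsilon_n)$; finally, given any $x \in \overline{S_0}$ there exists an $x_k$ such that $d(x,x_k) < \varepsilon/4$, whence $x \in B(x_k,\varepsilon/4) \subset B(x_k,\varepsilon_k) \subset \bigcup_{n=1}^{k} A_n$, implying (i). □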
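The disjointification $A_n = B(x_n,\varepsilon_n) \setminus \bigcup_{j<n} B(x_j,\varepsilon_j)$ can be traced on a simple example. A minimal sketch on $S = [0,1]$ with the usual metric, assuming (for illustration only) finitely many centres $x_n = k/40$ and a common radius $\varepsilon_n = 3\varepsilon/8 \in (\varepsilon/4, \varepsilon/2)$; for Lebesgue measure every sphere is a null set, so the continuity requirement in (ii) is automatic here. The cells are examined only on a finite grid of exact rationals, and the function names are hypothetical.

```python
from fractions import Fraction


def first_ball(x, centers, radii):
    """Index n of the cell A_n containing x: the first ball B(x_n, eps_n) with x in it."""
    for n, (c, r) in enumerate(zip(centers, radii)):
        if abs(x - c) < r:
            return n
    return None  # x is not covered by any ball


def partition(points, centers, radii):
    """Group points by cell: A_n = B(x_n, eps_n) minus the union of the earlier balls."""
    cells = {}
    for x in points:
        n = first_ball(x, centers, radii)
        if n is not None:
            cells.setdefault(n, []).append(x)
    return cells


eps = Fraction(1, 10)
centers = [Fraction(k, 40) for k in range(41)]   # spacing eps/4, so every point is within eps/8 of a centre
radii = [Fraction(3, 8) * eps] * len(centers)     # eps/4 < eps_n < eps/2
grid = [Fraction(i, 1000) for i in range(1001)]
cells = partition(grid, centers, radii)
```

Every grid point lands in exactly one cell (property (i) plus disjointness), and each cell lies inside a ball of radius $3\varepsilon/8$, so its diameter stays below $\varepsilon$ (property (iii)).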
Proof of Theorem 12. Let us start by describing the basic steps along which the proof will go, postponing some details to its end. For this, let $P := \mathcal L\{\xi\}$ on $\mathcal B_b(S)$, with $P(\overline{S_0}) = 1$ for some separable $S_0 \subset S$, and let $P_n := \mathcal L\{\xi_n\}$ on $\mathcal A$, $n \in \mathbb N$.
STEP 1. For each $k \in \mathbb N$, by the proposition take a sequence $(A_{kj})_{j\in\mathbb N}$ of disjoint $P$-continuity sets $A_{kj} \in \mathcal B_b(S)$ such that $\operatorname{diam}(A_{kj}) < \frac1k$ for all $j \in \mathbb N$. Since $\bigcup_{j\in\mathbb N} A_{kj} \supset \overline{S_0}$ and $P(\overline{S_0}) = 1$, there exists a $J_k < \infty$ such that
(a) $\sum_{j=1}^{J_k} P(A_{kj}) > 1 - 2^{-k}$
(where w.l.o.g. we may assume $P(A_{kj}) > 0$ for all $1 \le j \le J_k$).
Applying (28) (cf. (f') there) we obtain
(b) For each $k \in \mathbb N$ there exists an $n_k \in \mathbb N$ such that
$$|P_n(A_{kj}) - P(A_{kj})| < 2^{-k} \min_{1\le i\le J_k} P(A_{ki}) \quad\text{for } 1 \le j \le J_k \text{ and all } n \ge n_k.$$
We may assume w.l.o.g. $1 < n_1 < n_2 < \cdots$.
STEP 2. For each $n \in \mathbb N$ let $S_n := S$, $I_n := I := [0,1]$, $T_n := S_n \times I_n$, $\mathcal B_n := \mathcal A \otimes \mathcal B(I_n)$ (with $\mathcal B(I_n) := I_n \cap \mathcal B$), and $Q_n := P_n \times \lambda$, $\lambda$ being Lebesgue measure on $\mathcal B(I_n)$; furthermore, let $T_0 := S \times I$, $\mathcal B_0 := \mathcal B_b(S) \otimes \mathcal B(I)$ and $Q_0 := P \times \lambda$.
For each $k \in \mathbb N$, $1 \le j \le J_k$ and $n_k \le n$ let
$$f(n,k,j) := \begin{cases} P(A_{kj})/P_n(A_{kj}) & \text{if } P_n(A_{kj}) > P(A_{kj}),\\ 1 & \text{otherwise,}\end{cases} \qquad g(n,k,j) := \begin{cases} P_n(A_{kj})/P(A_{kj}) & \text{if } P_n(A_{kj}) < P(A_{kj}),\\ 1 & \text{otherwise,}\end{cases}$$
$B_{nkj} := A_{kj} \times [0,f(n,k,j)]$, considered as a subset of $T_n$, and $C_{nkj} := A_{kj} \times [0,g(n,k,j)]$, considered as a subset of $T_0$, i.e., $B_{nkj} \in \mathcal B_n$ and $C_{nkj} \in \mathcal B_0$; then, by the definition of $f$ and $g$, we have
(c) $Q_n(B_{nkj}) = Q_0(C_{nkj}) = \min\big(P_n(A_{kj}),P(A_{kj})\big)$,
and it follows from (b) that
(d) $\min\big(g(n,k,j),f(n,k,j)\big) \ge 1 - 2^{-k}$.
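The bookkeeping behind (c) and (d) can be checked with exact arithmetic. A minimal sketch, assuming hypothetical values of $P(A_{kj})$ and $P_n(A_{kj})$ for $k = 3$ that satisfy (b), i.e. $|P_n(A_{kj}) - P(A_{kj})| < 2^{-3}\min_i P(A_{ki}) = 1/32$; the function names are illustrative only.

```python
from fractions import Fraction


def f_factor(p_n, p):
    """f(n,k,j): length of the I_n-interval attached to A_kj in T_n."""
    return p / p_n if p_n > p else 1


def g_factor(p_n, p):
    """g(n,k,j): length of the I-interval attached to A_kj in T_0."""
    return p_n / p if p_n < p else 1


def matched_masses(p_n, p):
    """(Q_n(B_nkj), Q_0(C_nkj)) with Q_n = P_n x Lebesgue and Q_0 = P x Lebesgue."""
    return p_n * f_factor(p_n, p), p * g_factor(p_n, p)


k = 3
P = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]       # P(A_k1), P(A_k2), P(A_k3)
P_n = [Fraction(17, 64), Fraction(15, 64), Fraction(1, 2)]  # within 1/32 of P, as (b) requires
for p_n, p in zip(P_n, P):
    print(matched_masses(p_n, p), min(f_factor(p_n, p), g_factor(p_n, p)))
```

Whichever of $P_n(A_{kj})$ and $P(A_{kj})$ is larger gets its interval shortened, so both product sets carry the common mass $\min(P_n(A_{kj}), P(A_{kj}))$, and the shrink factors stay above $1 - 2^{-k} = 7/8$.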
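Let $B_{nk0} := T_n \setminus \bigcup_{1\le j\le J_k} B_{nkj}$ and $C_{nk0} := T_0 \setminus \bigcup_{1\le j\le J_k} C_{nkj}$. For $k = 0$ let $J_0 := 0$, $B_{n00} := T_n$ and $C_{n00} := T_0$. Let $n_0 := 1$ and, for each $n \in \mathbb N$, let $k(n) \in \mathbb N \cup \{0\}$ be the unique $k$ such that $n_k \le n < n_{k+1}$; then
(e) $T_n$ is the disjoint union of the sets $D_{nj} := B_{nk(n)j}$, $0 \le j \le J_{k(n)}$, i.e., $T_n = \sum_{j=0}^{J_{k(n)}} D_{nj}$; likewise $T_0 = \sum_{j=0}^{J_{k(n)}} E_{nj}$ with $E_{nj} := C_{nk(n)j}$.
It follows from (c), (d) and (e) that
(f) $Q_n(D_{nj}) = Q_0(E_{nj})$ for $0 \le j \le J_{k(n)}$, and $Q_0(E_{nj}) > 0$ if $j \ge 1$.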
STEP 3. For each $n \in \mathbb N$, given any $x \in T_0$, let $j = j(n,x)$ be the $j$ such that $x \in E_{nj}$ (cf. (e)). Let $A := \{x \in T_0 : Q_0(E_{nj(n,x)}) > 0 \text{ for all } n \in \mathbb N\}$. Then
$$\complement A = \bigcup_{n\in\mathbb N} \{x \in T_0 : Q_0(E_{nj(n,x)}) = 0\} \overset{(f)}{=} \bigcup \{E_{n0} : n \in \mathbb N,\ Q_0(E_{n0}) = 0\}$$
and $Q_0(\complement A) = 0$, whence
(g) $A \in \mathcal B_0$ and $Q_0(A) = 1$.
Therefore, $Q_0(A \cap B) := Q_0(B)$, $B \in \mathcal B_0$, is a well-defined p-measure on $A \cap \mathcal B_0$. For $x \in A$ and any $B \in \mathcal B_n$ let
$$P_{nx}(B) := Q_n\big(B \cap D_{nj(n,x)}\big) \big/ Q_0\big(E_{nj(n,x)}\big).$$
It follows from (f) that
(h) the $P_{nx}$, $x \in A$, are p-measures on $\mathcal B_n$ belonging to a finite set $\{P_{nj} : 0 \le j \le J_{k(n)}\}$, where $P_{nj} := P_{nx}$ if $x \in A \cap E_{nj}$, $0 \le j \le J_{k(n)}$.
For $x \in A$ let
$$\mu_x := \bigotimes_{n\in\mathbb N} P_{nx}$$
be the product measure of the $P_{nx}$ on the product σ-algebra $\mathcal B := \bigotimes_{n\in\mathbb N} \mathcal B_n$ in the product space $T := \prod_{n\in\mathbb N} T_n$. Let $\mu : A \times \mathcal B \to [0,1]$ be defined by $\mu(x,B) := \mu_x(B)$ for $x \in A$ and $B \in \mathcal B$; then $\mu$ is a "transition probability (or Markov kernel) from $(A, A\cap\mathcal B_0)$ into $(T,\mathcal B)$", i.e.,
(i) (1) for each $x \in A$, $\mu(x,\cdot)$ is a p-measure on $\mathcal B$, and
(2) for each $B \in \mathcal B$, $\mu(\cdot,B) : A \to I = [0,1]$ is $A\cap\mathcal B_0$, $\mathcal B(I)$-measurable.
Of course, (1) holds true here, and (2) will be shown later. Therefore (cf. Gaenssler-Stute (1977), Satz 1.8.10) $\tilde P := Q_0 \times \mu$ defines a p-measure on $\tilde{\mathcal F} := (A\cap\mathcal B_0) \otimes \mathcal B$ in $\tilde\Omega := A \times T$, where (cf. Gaenssler-Stute (1977), 1.8.7 and 1.8.9)
(j) $$\tilde P(C) = \int_A \int_T 1_C(x,y)\,\mu(x,dy)\,Q_0(dx) = \int_A \mu(x,C_x)\,Q_0(dx) \quad\text{for } C \in \tilde{\mathcal F};$$
note that $C_x := \{y \in T : (x,y) \in C\} \in \mathcal B$.
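STEP 4. With $(\tilde\Omega,\tilde{\mathcal F},\tilde P)$ as obtained before being the desired p-space, let, for $n \in \mathbb N$, $\tilde\xi_n : \tilde\Omega \to S$ be the natural projection of $\tilde\Omega = A \times \big[\prod_{n\in\mathbb N}(S_n \times I_n)\big]$ onto $S_n = S$; then the $\tilde\xi_n$ are random elements in $(S,\mathcal A)$ and
(k) $\mathcal L\{\tilde\xi_n\} = \mathcal L\{\xi_n\}$ (on $\mathcal A$) for all $n \in \mathbb N$.
In fact, for any $A' \in \mathcal A$,
$$\begin{aligned}
\mathcal L\{\tilde\xi_n\}(A') &= \tilde P\big(\tilde\xi_n^{-1}(A')\big) \overset{(j)}{=} \int_A \mu_x\big(T_1\times\cdots\times T_{n-1}\times(A'\times I_n)\times T_{n+1}\times\cdots\big)\,Q_0(dx)\\
&= \int_A P_{nx}(A'\times I_n)\,Q_0(dx) \overset{(h)}{=} \sum_{0\le j\le J_{k(n)}} P_{nj}(A'\times I_n)\,Q_0(A\cap E_{nj})\\
&= \sum_{j} P_{nj}(A'\times I_n)\,Q_0(E_{nj}) = \sum_{j} Q_n\big((A'\times I_n)\cap D_{nj}\big) \overset{(e)}{=} Q_n(A'\times I_n) = P_n(A') = \mathcal L\{\xi_n\}(A').
\end{aligned}$$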
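Next, let $\tilde\xi := \Pi^{T_0}_S \circ i(A) \circ \Pi_A : \tilde\Omega \to S$, where $\Pi_A$ is the natural projection of $\tilde\Omega = A \times T$ onto $A$, $i(A)$ is the injection of $A$ into $T_0$, and $\Pi^{T_0}_S$ is the natural projection of $T_0 = S \times I$ onto $S$; then $\tilde\xi$ is a random element in $(S,\mathcal B_b(S))$ and
(l) $\mathcal L\{\tilde\xi\} = \mathcal L\{\xi\}$ (on $\mathcal B_b(S)$).
In fact, for any $B \in \mathcal B_b(S)$,
$$\begin{aligned}
\mathcal L\{\tilde\xi\}(B) &= \tilde P\big(\tilde\xi^{-1}(B)\big) = \tilde P\big(\Pi_A^{-1}\circ(i(A))^{-1}\circ(\Pi^{T_0}_S)^{-1}(B)\big) = \tilde P\big(\Pi_A^{-1}\big((i(A))^{-1}(B\times I)\big)\big)\\
&= \tilde P\big(\Pi_A^{-1}(A\cap(B\times I))\big) = \tilde P\big([A\cap(B\times I)]\times T\big) \overset{(j)}{=} \int_{A\cap(B\times I)} \mu_x(T)\,Q_0(dx)\\
&= Q_0\big(A\cap(B\times I)\big) = Q_0(B\times I) = P(B) = \mathcal L\{\xi\}(B).
\end{aligned}$$
Now, let $\tilde\Omega_0 := \liminf_{n\to\infty} \Omega_{0,n}$, where
$$\Omega_{0,n} := \sum_{j=1}^{J_{k(n)}} [A\cap E_{nj}] \times T_1 \times\cdots\times T_{n-1} \times D_{nj} \times T_{n+1} \times\cdots;$$
then $\tilde\Omega_0 \in \tilde{\mathcal F}$ and
(m) $\lim_{n\to\infty} d(\tilde\xi_n(\omega),\tilde\xi(\omega)) = 0$ for all $\omega \in \tilde\Omega_0$.
In fact, for any $\omega \in \tilde\Omega_0$ there exists an $n_0 \in \mathbb N$ such that for all $n \ge n_0$ there exists a $j(n)$, $1 \le j(n) \le J_{k(n)}$, such that
$$\omega \in [A\cap E_{nj(n)}] \times T_1 \times\cdots\times T_{n-1} \times D_{nj(n)} \times T_{n+1} \times\cdots;$$
this implies $i(A)(\Pi_A(\omega)) \in E_{nj(n)}$, whence $\tilde\xi(\omega) = \Pi^{T_0}_S\big(i(A)(\Pi_A(\omega))\big) \in A_{k(n)j(n)}$, and the $T_n$-coordinate of $\omega$ lies in $D_{nj(n)}$, whence $\tilde\xi_n(\omega) \in A_{k(n)j(n)}$; cf. the definition of the sets $D_{nj}$ and $E_{nj}$, respectively. Therefore, for all $n \ge n_0$,
$$d(\tilde\xi_n(\omega),\tilde\xi(\omega)) \le \operatorname{diam}(A_{k(n)j(n)}) \le \frac{1}{k(n)} \to 0 \quad\text{as } n \to \infty$$
(since $k(n) \to \infty$ as $n \to \infty$). Next we will show that $\tilde P(\tilde\Omega_0) = 1$. For this we will prove later that
(n) $Q_0\big(\limsup_{n\to\infty} E_{n0}\big) = 0$.
Now,
$$\complement\Omega_{0,n} = \sum_{j=1}^{J_{k(n)}} \big([A\cap E_{nj}] \times T_1 \times\cdots\times T_{n-1} \times \complement D_{nj} \times T_{n+1} \times\cdots\big) + \big([A\cap E_{n0}] \times T_1 \times\cdots\times T_n \times\cdots\big) = \Omega_{1,n} + \Omega_{2,n},$$
say, where
$$\tilde P(\Omega_{1,n}) \overset{(j)}{=} \sum_j \int_{A\cap E_{nj}} P_{nx}(\complement D_{nj})\,Q_0(dx) = \sum_j P_{nj}(\complement D_{nj})\,Q_0(A\cap E_{nj}) = \sum_j \frac{Q_n\big((\complement D_{nj})\cap D_{nj}\big)}{Q_0(E_{nj})}\,Q_0(E_{nj}) = 0$$
for all $n \in \mathbb N$, and therefore
$$\tilde P\big(\limsup_n \complement\Omega_{0,n}\big) = \tilde P\big(\limsup_n \Omega_{2,n}\big) = \tilde P\big([A\cap\limsup_n E_{n0}] \times T\big) = Q_0\big(A\cap\limsup_n E_{n0}\big) = Q_0\big(\limsup_n E_{n0}\big) = 0$$
by (n). It follows that
$$\tilde P(\tilde\Omega_0) = \tilde P\big(\liminf_n \Omega_{0,n}\big) = 1 - \tilde P\big(\limsup_n \complement\Omega_{0,n}\big) = 1.$$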
It remains to show (i)(2) and (n).
ad (i)(2): We have to show that
(+) $\mathcal D := \{B \in \mathcal B : \mu(\cdot,B) : A \to I = [0,1] \text{ is } A\cap\mathcal B_0,\ \mathcal B(I)\text{-measurable}\} = \mathcal B$.
If $B = T_1 \times\cdots\times T_{n-1} \times B_n \times T_{n+1} \times\cdots$ for some $B_n \in \mathcal B_n$, then for each $t \in I$
$$\{x \in A : \mu(x,B) \le t\} = \{x \in A : \mu_x(B) \le t\} = \{x \in A : P_{nx}(B_n) \le t\} \overset{(h)}{=} \bigcup_{j :\, P_{nj}(B_n) \le t} A \cap E_{nj} \in A\cap\mathcal B_0,$$
whence $B \in \mathcal D$ for all these sets. From this and the product form of $\mu$ it follows that the class $\mathcal C$ of finite intersections of sets $B$ just considered is also contained in $\mathcal D$. Since $\mathcal C$ is a ∩-closed generator of $\mathcal B$, we get (+) as in Gaenssler-Stute (1977), 1.8.5.
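ad (n): It follows from (a) that
$$\sum_k P\Big(S \setminus \bigcup_{1\le j\le J_k} A_{kj}\Big) \le \sum_k 2^{-k} < \infty,$$
whence, by the Borel-Cantelli lemma, $P\big(\limsup_k (S \setminus \bigcup_{1\le j\le J_k} A_{kj})\big) = 0$, i.e., $P\big(\liminf_k \bigcup_{1\le j\le J_k} A_{kj}\big) = 1$, and thus
$$P\Big(\liminf_{n\to\infty} \bigcup_{1\le j\le J_{k(n)}} A_{k(n)j}\Big) = 1,$$
as $k(n) \to \infty$ for $n \to \infty$. Furthermore,
$$\lambda\Big(\liminf_{n\to\infty} \big[0, \min_{1\le j\le J_{k(n)}} g(n,k(n),j)\big]\Big) = 1,$$
since for any $t \in I = [0,1]$ with $t < 1$, $\min_{1\le j\le J_{k(n)}} g(n,k(n),j) \ge 1 - 2^{-k(n)} > t$ for all large enough $n$ (by (d)). Since
$$\liminf_{n} \bigcup_{1\le j\le J_{k(n)}} E_{nj} \supset \Big(\liminf_{n} \bigcup_{1\le j\le J_{k(n)}} A_{k(n)j}\Big) \times \Big(\liminf_{n} \big[0, \min_{1\le j\le J_{k(n)}} g(n,k(n),j)\big]\Big),$$
we thus obtain $Q_0\big(\liminf_n \bigcup_{1\le j\le J_{k(n)}} E_{nj}\big) = 1$, which implies (n), since $\complement\big(\bigcup_{1\le j\le J_{k(n)}} E_{nj}\big) = E_{n0}$.
This concludes the proof of Theorem 12. □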
Following a suggestion of Ron Pyke, let us demonstrate at this place the usefulness of the representation theorem for proving the following version of Theorem 5 (cf. Lemma 16 for the definition of the set $E$).
THEOREM 5'. Let $S = (S,d)$ and $S' = (S',d')$ be metric spaces, and $\mathcal A$ a σ-algebra of subsets of $S$ such that $\mathcal B_b(S) \subset \mathcal A \subset \mathcal B(S)$. For $n \in \mathbb N$ let $g_n : S \to S'$ be $\mathcal A,\mathcal B_b(S')$-measurable, and let $g : S \to S'$ be $\mathcal B_b(S),\mathcal B_b(S')$-measurable. Let $(\xi_n)_{n\in\mathbb N}$ be a sequence of random elements in $(S,\mathcal A)$ and $\xi$ a random element in $(S,\mathcal B_b(S))$ such that $\xi_n \xrightarrow{\mathcal L_b} \xi$ and $\overline{\mathcal L\{\xi\}}(E) = 0$. Then
$$g_n(\xi_n) \xrightarrow{\mathcal L_b} g(\xi).$$
(Note that $g_n(\xi_n)$ and $g(\xi)$ are random elements in $(S',\mathcal B_b(S'))$.)
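Proof. As in the proof of Theorem 4 it is shown that $\mathcal L\{g(\xi)\} = \mathcal L\{\xi\}\circ g^{-1}$ $\big(= \overline{\mathcal L\{\xi\}}\circ g^{-1}\big)$ is separable. Now, according to Theorem 12, there exist a p-space $(\tilde\Omega,\tilde{\mathcal F},\tilde P)$ and on it random elements $\tilde\xi_n$ in $(S,\mathcal A)$ and a random element $\tilde\xi$ in $(S,\mathcal B_b(S))$ such that
$$\mathcal L\{\tilde\xi_n\} = \mathcal L\{\xi_n\} \ (\text{on } \mathcal A) \text{ for all } n \in \mathbb N, \qquad \mathcal L\{\tilde\xi\} = \mathcal L\{\xi\} \ (\text{on } \mathcal B_b(S)),$$
and $\tilde\xi_n(\omega) \to \tilde\xi(\omega)$ (as $n \to \infty$) for all $\omega \in \tilde\Omega_0$, where $\tilde\Omega_0 \in \tilde{\mathcal F}$ with $\tilde P(\tilde\Omega_0) = 1$.
Let $\tilde\Omega_1 := \{\tilde\xi \in \complement E\}$ and $\tilde\Omega_2 := \tilde\Omega_0 \cap \tilde\Omega_1$; then for all $\omega \in \tilde\Omega_2$, $g_n(\tilde\xi_n(\omega)) \to g(\tilde\xi(\omega))$ (as $n \to \infty$). Since $\tilde P_*(\tilde\Omega_1) = 1$ (note that $\overline{\mathcal L\{\tilde\xi\}}(\complement E) = \overline{\mathcal L\{\xi\}}(\complement E) = 1$) and $\tilde P(\tilde\Omega_0) = 1$, we have $\tilde P_*(\tilde\Omega_2) = 1$, whence there exists $\tilde\Omega_3 \in \tilde{\mathcal F}$ such that $\tilde\Omega_3 \subset \tilde\Omega_2$ and $\tilde P(\tilde\Omega_3) = 1$. It follows that for $\omega \in \tilde\Omega_3$ and each $f \in U_b(S')$
$$f\circ g_n\circ\tilde\xi_n(\omega) \to f\circ g\circ\tilde\xi(\omega) \quad (\text{as } n \to \infty),$$
whence, by Lebesgue's theorem, $E(f\circ g_n\circ\tilde\xi_n) \to E(f\circ g\circ\tilde\xi)$, i.e., $\int f\,d\mathcal L\{g_n(\tilde\xi_n)\} \to \int f\,d\mathcal L\{g(\tilde\xi)\}$ (as $n \to \infty$). Since
$$\mathcal L\{g_n(\tilde\xi_n)\} = \mathcal L\{\tilde\xi_n\}\circ g_n^{-1} = \mathcal L\{\xi_n\}\circ g_n^{-1} = \mathcal L\{g_n(\xi_n)\}$$
and
$$\mathcal L\{g(\tilde\xi)\} = \mathcal L\{\tilde\xi\}\circ g^{-1} = \mathcal L\{\xi\}\circ g^{-1} = \mathcal L\{g(\xi)\},$$
the assertion follows by (28) (cf. (h') there). □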
Next, we want to make some specific remarks concerning the special case $S = D[0,1]$, reviewing at the same time some of the key results from Billingsley's (1968) book (cf. Appendix A in G. Shorack (1979)).
THE SPACE D[0,1]: Let $D \equiv D[0,1]$ be the space of all right continuous functions on the unit interval $[0,1]$ that have left hand limits at all points $t \in (0,1]$. Cf. P. Billingsley (1968), Lemma 1, p. 110, and its consequences concerning specific properties of functions $x \in D$; among others,
$$\sup_{t\in[0,1]} |x(t)| < \infty \quad\text{for all } x \in D.$$
If not stated otherwise, the space $D$ will be equipped with the supremum metric $\rho$, i.e.,
$$\rho(x,y) := \sup_{t\in[0,1]} |x(t) - y(t)| \quad\text{for } x,y \in D.$$
(By the way, $(D,\rho)$ is a linear topological space, whereas $(D,s)$, with $s$ being the Skorokhod metric, is not (cf. P. Billingsley (1968), p. 123, 3.).) Note that $(S,d) = (D,\rho)$ is a non-separable metric space (in fact, look at $x_s := 1_{[s,1]}$, $s \in (0,1)$, to obtain an uncountable set of functions in $D$ for which $\rho(x_s,x_{s'}) = 1$ for $s \ne s'$). Also, as pointed out by D.M. Chibisov (1965) (cf. P. Billingsley (1968), Section 18), the empirical df $U_n$ (based on independent random variables, defined on some p-space $(\Omega,\mathcal F,P)$, that are uniformly distributed on $[0,1]$) cannot be considered as a random element in $(D,\mathcal B(D,\rho))$ (i.e., $U_n : \Omega \to D$ is not $\mathcal F,\mathcal B(D,\rho)$-measurable), where $\mathcal B(D,\rho)$ denotes the Borel σ-algebra in $(D,\rho)$. But, considering instead the smaller σ-algebra $\mathcal B_b(D) \equiv \mathcal B_b(D,\rho)$ generated by the open ($\rho$-)balls, we have
(37) $$\mathcal B_b(D) = \sigma(\{\pi_t : t \in [0,1]\}),$$
where $\sigma(\{\pi_t : t \in [0,1]\})$ denotes the σ-algebra generated by the coordinate projections $\pi_t \equiv \pi_t(D)$ from $D$ onto $\mathbb R$, defined by $\pi_t(x) := x(t)$ for $x \in D$.
(Note that (37) implies that $U_n$ is $\mathcal F,\mathcal B_b(D)$-measurable, since $\mathcal F,\sigma(\{\pi_t : t \in [0,1]\})$-measurability of $U_n$ is equivalent to $\mathcal F,\mathcal B$-measurability of $\pi_t(U_n) = U_n(t)$ for each fixed $t \in [0,1]$, where the latter is satisfied since $U_n(t)$ is a random variable.)
Proof of (37). Let $T := \mathbb Q \cap [0,1]$ be the set of rational numbers in $[0,1]$; then, by the right continuity of each $x \in D$, one has
(a) $\rho(x_1,x_2) = \sup_{t\in T} |x_1(t) - x_2(t)|$ for every $x_1,x_2 \in D$.
Therefore, for any $x_0 \in D$ and any $r > 0$,
$$\{x \in D : \rho(x,x_0) \le r\} = \bigcap_{t\in T} \{x \in D : |x(t) - x_0(t)| \le r\} = \bigcap_{t\in T} \pi_t^{-1}\big([x_0(t) - r,\, x_0(t) + r]\big) \in \sigma(\{\pi_t : t \in [0,1]\});$$
thus $\mathcal B_b(D) \subset \sigma(\{\pi_t : t \in [0,1]\})$ (note that $\{x \in D : \rho(x,x_0) < r\} = \bigcup_{m\in\mathbb N} \{x \in D : \rho(x,x_0) \le r - \frac1m\}$).
To verify the other inclusion it suffices to show that for every fixed $t \in [0,1]$ and $r \in \mathbb R$ one has
(b) $\{x \in D : \pi_t(x) < r\} \in \mathcal B_b(D)$.
For this we define, given the fixed $t$ and $r$, for any $n,k \in \mathbb N$ and $s \in [0,1]$
$$\chi_n^k(s) := \begin{cases} 0 & \text{if } s < t,\\ r - \frac1k - n & \text{if } s \in [t, t + \frac1k) \cap [0,1],\\ 0 & \text{if } s \in [t + \frac1k, \infty) \cap [0,1].\end{cases}$$
Then $\chi_n^k \in D$, and it follows that
(c) $$\{x \in D : \pi_t(x) < r\} = \bigcup_{n\in\mathbb N} \bigcup_{k\in\mathbb N} \{x \in D : \rho(x,\chi_n^k) \le n\},$$
which proves (b). As to (c), let $x_0 \in \bigcup_{n\in\mathbb N}\bigcup_{k\in\mathbb N} \{x \in D : \rho(x,\chi_n^k) \le n\}$; then $\rho(x_0,\chi_n^k) \le n$ for some $n$ and $k$, whence
$$n \ge |x_0(t) - \chi_n^k(t)| = \big|x_0(t) - r + \tfrac1k + n\big| \ge x_0(t) - r + \tfrac1k + n,$$
and therefore $x_0(t) \le r - \frac1k < r$, i.e., $\pi_t(x_0) < r$. On the other hand, if $x_0(t) < r$, $x_0 \in D$, choose $n_0 \in \mathbb N$ such that $\sup_{s\in[0,1]} |x_0(s)| \le \min(n_0, n_0 - r)$; then it can easily be shown that
$$x_0 \in \bigcup_{k\in\mathbb N} \{x \in D : \rho(x,\chi_{n_0}^k) \le n_0\},$$
which proves (c). □
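The non-separability of $(D,\rho)$ noted before (37) comes from the family $x_s = 1_{[s,1]}$. A minimal sketch evaluating $\rho(x_s, x_{s'})$ for finitely many hypothetical jump points $s = k/7$; the supremum is taken over a finite grid that contains the jump points (compare (a) in the proof of (37)), which is exact here because $|x_s - x_{s'}| = 1$ at $t = \min(s,s')$. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def x_s(s):
    """The step function x_s = 1_[s,1] in D[0,1] (right continuous, one jump at s)."""
    return lambda t: 1 if s <= t <= 1 else 0


def rho_on_grid(x, y, grid):
    """max |x(t) - y(t)| over a finite grid; a lower bound for the sup-metric rho(x, y)."""
    return max(abs(x(t) - y(t)) for t in grid)


jumps = [Fraction(k, 7) for k in range(1, 7)]   # six of the uncountably many s in (0, 1)
grid = [Fraction(0), *jumps, Fraction(1)]
dists = {(s, r): rho_on_grid(x_s(s), x_s(r), grid) for s, r in combinations(jumps, 2)}
```

All pairwise distances equal 1, so the open $\rho$-balls of radius $\frac12$ around the $x_s$, $s \in (0,1)$, are pairwise disjoint, and no countable subset of $D$ can meet all of them.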
(38) REMARK. Comparing (37) with the known result that the Borel σ-algebra $\mathcal B(D,s)$ in $(D,s)$, equipped with the Skorokhod metric $s$ (cf. P. Billingsley (1968), Chapter 3), also coincides with $\sigma(\{\pi_t : t \in [0,1]\})$, we obtain that $\mathcal B(D,s) = \mathcal B_b(D,\rho)$. It is also known that for any sequence $(x_n)_{n\in\mathbb N} \subset D$ and $x \in D$, $\lim_{n\to\infty} \rho(x_n,x) = 0$ always implies $\lim_{n\to\infty} s(x_n,x) = 0$; on the other hand, if $\lim_{n\to\infty} s(x_n,x) = 0$ for some continuous $x$, then $\lim_{n\to\infty} \rho(x_n,x) = 0$ (hence the Skorokhod topology relativized to the space of all continuous functions on $[0,1]$ coincides with the uniform topology there).
Let $C \equiv C[0,1]$ be the space of all continuous functions on $[0,1]$, and consider again the supremum metric $\rho$ on $C$. Then $(S_0,d) = (C,\rho)$ is a separable metric space, being here a closed subspace of $(D,\rho)$, i.e., we have $S_0 = \overline{S_0}$ in the present situation. (Note that each $x \in C$ is even uniformly continuous.) Therefore, denoting by $\mathcal B(C,\rho)$ the Borel σ-algebra in $(C,\rho)$, we have (cf. Lemma 11)
$$C \cap \mathcal B(D,\rho) = \mathcal B(C,\rho) = \mathcal B_b(C,\rho) \subset C \cap \mathcal B_b(D,\rho) \subset C \cap \mathcal B(D,\rho),$$
and hence
(39) $\mathcal B(C,\rho) = \mathcal B_b(C,\rho) = C \cap \mathcal B_b(D,\rho)$ and $C \in \mathcal B_b(D,\rho)$.
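Remark (38) says that $\rho$-convergence implies $s$-convergence, and that the converse holds when the limit is continuous. The standard example $x_n = 1_{[1/2+1/n,\,1]}$, $x = 1_{[1/2,1]}$ shows that the converse can fail for a discontinuous limit. A minimal sketch with exact rationals, assuming a hypothetical piecewise linear time change $\lambda_n$ that moves $1/2$ to $1/2 + 1/n$; it uses the bound $s(x_n,x) \le \max\big(\sup_t|\lambda_n(t) - t|,\ \sup_t|x_n(\lambda_n(t)) - x(t)|\big)$, evaluated on a finite grid that contains the kink at $1/2$.

```python
from fractions import Fraction


def x_s(s):
    """The step function 1_[s,1] in D[0,1]."""
    return lambda t: 1 if s <= t <= 1 else 0


def time_change(n):
    """Piecewise linear increasing bijection lambda_n of [0,1] with lambda_n(1/2) = 1/2 + 1/n (n >= 3)."""
    half, target = Fraction(1, 2), Fraction(1, 2) + Fraction(1, n)

    def lam(t):
        if t <= half:
            return t * target / half
        return target + (t - half) * (1 - target) / (1 - half)

    return lam


def sup_on_grid(h, grid):
    return max(h(t) for t in grid)


grid = [Fraction(i, 1000) for i in range(1001)]
x = x_s(Fraction(1, 2))
results = []
for n in (4, 10, 50):
    x_n = x_s(Fraction(1, 2) + Fraction(1, n))
    lam = time_change(n)
    uniform = sup_on_grid(lambda t: abs(x_n(t) - x(t)), grid)
    # upper bound for the Skorokhod distance s(x_n, x) obtained from this particular lambda_n
    skorokhod_bound = max(sup_on_grid(lambda t: abs(lam(t) - t), grid),
                          sup_on_grid(lambda t: abs(x_n(lam(t)) - x(t)), grid))
    results.append((n, uniform, skorokhod_bound))
```

The uniform distance stays at 1 for every $n$, while the time change brings $x_n$ exactly onto $x$ at the cost of moving time by at most $1/n$, so $s(x_n,x) \le 1/n \to 0$.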
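In what follows let $\xi_n$, $n \in \mathbb N$, and $\xi$ be random elements in $(D,\mathcal B_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal F,P)$. Following (34) we write $\xi_n \xrightarrow{\mathcal L_b} \xi$ iff $\mathcal L\{\xi_n\} \xrightarrow{w} \mathcal L\{\xi\}$, in which case (by our definition of $\xrightarrow{w}$-convergence) $\mathcal L\{\xi\}$ is assumed to be separable. On the other hand, in view of (38), $\mathcal L\{\xi_n\}$ and $\mathcal L\{\xi\}$ may also be considered as Borel measures on $\mathcal B(D,s)$, whence the usual concept of weak convergence of Borel measures can also be used: $\xi_n \xrightarrow{\mathcal L} \xi$ iff, by definition, $\mathcal L\{\xi_n\}$ on $\mathcal B(D,s)$ converges weakly to $\mathcal L\{\xi\}$ on $\mathcal B(D,s)$ in the sense of Billingsley (1968).
LEMMA 18. If $\xi_n \xrightarrow{\mathcal L_b} \xi$, then $\xi_n \xrightarrow{\mathcal L} \xi$; on the other hand, if $\xi_n \xrightarrow{\mathcal L} \xi$ and $\mathcal L\{\xi\}(C) = 1$, then $\xi_n \xrightarrow{\mathcal L_b} \xi$.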
Proof. Note first that $(D,s)$ is a separable metric space, whence we can use (28) with $\mathcal B_b(D,s) = \mathcal A = \mathcal B(D,s)$, which gives us
(+) $\xi_n \xrightarrow{\mathcal L} \xi \iff \lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded $\mathcal B(D,s),\mathcal B$-measurable functions $f : D \to \mathbb R$ which are $\mathcal L\{\xi\}$-a.e. continuous.
1.) Consider an $f : D \to \mathbb R$; if $f$ is $s$-continuous, it is also $\rho$-continuous and (cf. (38)) $\mathcal B_b(D,\rho),\mathcal B$-measurable. Therefore $\xi_n \xrightarrow{\mathcal L_b} \xi$ implies that $\lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded $s$-continuous $f : D \to \mathbb R$, whence $\xi_n \xrightarrow{\mathcal L} \xi$.
2.) $\xi_n \xrightarrow{\mathcal L} \xi$ implies, according to (+), that $\lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded $\mathcal B(D,s),\mathcal B$-measurable $f : D \to \mathbb R$ which are $\mathcal L\{\xi\}$-a.e. ($s$-)continuous. Since $\mathcal L\{\xi\}(C) = 1$ implies (cf. (38)) that any $\rho$-continuous $f$ is also $\mathcal L\{\xi\}$-a.e. $s$-continuous, we obtain, using again that $\mathcal B(D,s) = \mathcal B_b(D,\rho)$, that $\lim_{n\to\infty} E(f(\xi_n)) = E(f(\xi))$ for all bounded, $\rho$-continuous, and $\mathcal B_b(D,\rho),\mathcal B$-measurable $f : D \to \mathbb R$; furthermore, since $C = (C,\rho)$ is a closed separable subspace of $(D,\rho)$ with $\mathcal L\{\xi\}(C) = 1$, we finally obtain (cf. (28)(h')) that $\xi_n \xrightarrow{\mathcal L_b} \xi$. □
We now continue our review of some of the key results of Billingsley's (1968) book. The following lemma is well known (cf. Yu. V. Prohorov (1956)):
LEMMA 19. Let $F\colon[0,1]\to\mathbb R$ be a continuous function and let $a>1$, $b>0$ be constants such that, for some random element $\xi$ in $(\mathbb R^{[0,1]},\mathbb B^{[0,1]})$ (with $\mathbb B^{[0,1]}:=\bigotimes_{t\in[0,1]}\mathbb B_t$, $\mathbb B_t=\mathbb B$),
(40) $\mathbb E(|\xi(t)-\xi(s)|^b)\le|F(t)-F(s)|^a$ for all $0\le s\le t\le1$.
Then there exists a random element $\tilde\xi$ in $(D,\mathcal B_b(D,\rho))$ such that $\mathcal L\{\tilde\xi\}|\mathbb B^{[0,1]}=\mathcal L\{\xi\}$ and $(\mathcal L\{\tilde\xi\}|\mathcal B_b(D,\rho))(C)=1$. (Note that $D\cap\mathbb B^{[0,1]}=\mathcal B_b(D,\rho)$ (cf. (37)) and $C\in\mathcal B_b(D,\rho)$ (cf. (39)).)
In what follows we write $\xi_n\xrightarrow{f.d.}\xi$ if the finite-dimensional distributions (fidis) of $\xi_n$ converge weakly to the corresponding fidis of $\xi$. (Recall how the fidis are defined. Given a r.e. $\xi$ in $(D,\mathcal B_b(D,\rho))$, the fidis of $\xi$ (or of $\mathcal L\{\xi\}$, respectively) are the image measures that $\pi_{t_1,\dots,t_k}(D)\colon D\to\mathbb R^k$ induce from $\mathcal L\{\xi\}$ on $\mathcal B_b(D,\rho)\ (=\sigma(\{\pi_t(D):t\in[0,1]\}))$. This is for each fixed $t_1,\dots,t_k\in[0,1]$, $k\ge1$, where $\pi_{t_1,\dots,t_k}(D)(x):=(x(t_1),\dots,x(t_k))$ for $x\in D$. Note that $\pi_{t_1,\dots,t_k}(D)$ is $\mathcal B_b(D,\rho),\mathbb B^k$-measurable.)
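A standard illustration of condition (40), not taken from the original text, uses standard Brownian motion $W$ and the Brownian bridge $B^\circ(t)=W(t)-tW(1)$. Their increments are centered Gaussian, with variances $t-s$ and $(t-s)(1-(t-s))$ respectively. A $N(0,\sigma^2)$ variable has fourth moment $3\sigma^4$. So (40) holds with $b=4$, $a=2$ and $F(t)=3^{1/2}t$:

```latex
\begin{aligned}
\mathbb E\big(|W(t)-W(s)|^4\big) &= 3\,(t-s)^2 = |F(t)-F(s)|^2,\\
\mathbb E\big(|B^\circ(t)-B^\circ(s)|^4\big) &= 3\,\big[(t-s)(1-(t-s))\big]^2 \le 3\,(t-s)^2 = |F(t)-F(s)|^2 .
\end{aligned}
```

Since $a=2>1$ and this $F$ is continuous, Lemma 19 applies in both cases.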
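DEFINITION 6. Let $(\xi_n)_{n\in\mathbb N}$ be a sequence of random elements in $(D,\mathcal B_b(D,\rho))=(D,\mathcal B(D,s))$.
(i) $(\xi_n)$ is said to be relatively $\mathcal L$-sequentially compact iff the following holds: for any subsequence $(\mathcal L\{\xi_{n'}\})$ of $(\mathcal L\{\xi_n\})$ there exist a further subsequence $(\mathcal L\{\xi_{n''}\})$ of $(\mathcal L\{\xi_{n'}\})$ and a p-measure $\mu$ on $\mathcal B(D,s)$ such that $\mathcal L\{\xi_{n''}\}$ converges weakly to $\mu$ in the sense of Billingsley (1968).
(ii) $(\xi_n)$ is said to be relatively $\mathcal L_b$-sequentially compact iff the following holds: for any subsequence $(\mathcal L\{\xi_{n'}\})$ of $(\mathcal L\{\xi_n\})$ there exist a further subsequence $(\mathcal L\{\xi_{n''}\})$ of $(\mathcal L\{\xi_{n'}\})$ and a separable p-measure $\mu$ on $\mathcal B_b(D,\rho)$ (in $(D,\rho)$!) such that $\mathcal L\{\xi_{n''}\}\xrightarrow{\mathcal L_b}\mu$.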
The following theorem is well known (cf. P. Billingsley (1968), Th. 15.1).
THEOREM 13. Let $(\xi_n)$ be relatively $\mathcal L$-sequentially compact and suppose that
The next theorem gives sufficient conditions for $(\xi_n)$ to be relatively $\mathcal L$-sequentially compact. For this, given any $x\in D$ and $B\in[0,1]\cap\mathbb B$, let
$$\|x\|:=\sup_{t\in[0,1]}|x(t)|\quad\text{and}\quad\omega_x(B):=\sup_{s,t\in B}|x(t)-x(s)|.$$
THEOREM 14. Let $(\xi_n)$ be a sequence of random elements in $(D,\mathcal B_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$ and satisfying the following set of conditions (A)–(D):
(A): $\limsup_{n\to\infty}\mathbb P(\|\xi_n\|>m)\to0$ as $m\to\infty$.
(B): For every $\varepsilon>0$, $\limsup_{n\to\infty}\mathbb P(\omega_{\xi_n}([0,\delta))\ge\varepsilon)\to0$ as $\delta\to0$ and $\limsup_{n\to\infty}\mathbb P(\omega_{\xi_n}([\delta,1))\ge\varepsilon)\to0$ as $\delta\to1$.
(C): There exist constants $a>1$, $b>0$ and, for every $n\in\mathbb N$, monotone increasing functions $F_n\colon[0,1]\to\mathbb R$ such that for every $\varepsilon>0$ and any $0\le r\le s\le t\le1$
$$\mathbb P(|\xi_n(s)-\xi_n(r)|\ge\varepsilon,\ |\xi_n(t)-\xi_n(s)|\ge\varepsilon)\le\varepsilon^{-b}(F_n(t)-F_n(r))^a.$$
(D): There exists a monotone increasing and continuous function $F\colon[0,1]\to\mathbb R$ such that for the $F_n$'s occurring in (C) and any $0\le s\le t\le1$
$$\limsup_{n\to\infty}(F_n(t)-F_n(s))\le F(t)-F(s).$$
Then $(\xi_n)$ is relatively $\mathcal L$-sequentially compact.
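(41) REMARK. Given any $x\in D$ and $\delta>0$, let
$$\omega''_x(\delta):=\sup_{\substack{0\le r\le s\le t\le1\\ t-r\le\delta}}\min\{|x(s)-x(r)|,|x(t)-x(s)|\}.$$
Then (C) and (D) together imply
(ω''): For every $\varepsilon>0$, $\limsup_{n\to\infty}\mathbb P(\omega''_{\xi_n}(\delta)\ge\varepsilon)\to0$ as $\delta\to0$.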
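As to Theorem 14, it is shown in Billingsley (1968), Theorem 15.3, that (A), (B) and (ω'') together imply the assertion of Theorem 14. So we prove here only the statement made in (41). For notational convenience we write $\xi_n(s,t]$ instead of $\xi_n(t)-\xi_n(s)$ for $0\le s\le t\le1$.
a) Let $\varepsilon>0$, $t\in[0,1)$ and $\delta\le1-t$ be arbitrary. By Theorem 12.5 in Billingsley (1968) together with (C), for every $n\in\mathbb N$ and every $m\in\mathbb N$
$$\mathbb P\Big(\bigcup_{0\le i\le j\le k\le m}\Big\{\min\Big[\big|\xi_n\big(t+\tfrac im\delta,\,t+\tfrac jm\delta\big]\big|,\ \big|\xi_n\big(t+\tfrac jm\delta,\,t+\tfrac km\delta\big]\big|\Big]\ge\varepsilon\Big\}\Big)\le K(a,b)\,\varepsilon^{-b}\,(F_n(t+\delta)-F_n(t))^a,$$
where $K(a,b)$ is a constant depending only on $a$ and $b$. Now put
$$\omega''_x([t,t+\delta]):=\sup_{t\le r\le s\le t'\le t+\delta}\min\{|x(r,s]|,|x(s,t']|\}$$
for $x\in D$, $\delta>0$ and $t\le1-\delta$. Due to the right-continuity of the sample paths of $\xi_n$, it follows that
$$\mathbb P(\omega''_{\xi_n}([t,t+\delta])\ge\varepsilon)\le K(a,b)\,\varepsilon^{-b}\,(F_n(t+\delta)-F_n(t))^a.$$
b) For any $0<\delta\le\frac12$ let $m=m(\delta):=[1/\delta]$, where $[x]$ stands for the integer part of $x$. Since $\delta\le1/m$, every interval $[r,t]\subset[0,1]$ with $t-r\le\delta$ lies in one of the intervals $I_j:=[j/m,(j+2)/m]$, $j=0,\dots,m-2$. Split these into even and odd $j$; within each group the intervals have disjoint interiors. We obtain for every $n\in\mathbb N$
$$\mathbb P(\omega''_{\xi_n}(\delta)\ge\varepsilon)\le\sum_{j\ \text{even}}\mathbb P(\omega''_{\xi_n}(I_j)\ge\varepsilon)+\sum_{j\ \text{odd}}\mathbb P(\omega''_{\xi_n}(I_j)\ge\varepsilon).$$
By a) and (D), this implies
$$\limsup_{n\to\infty}\mathbb P(\omega''_{\xi_n}(\delta)\ge\varepsilon)\le K(a,b)\,\varepsilon^{-b}\cdot2\,(\omega_F(2/m))^{a-1}\,(F(1)-F(0)),$$
where $\omega_F(h):=\sup\{F(t)-F(s):s\le t,\ t-s\le h\}$. Moreover, $\omega_F(2/m(\delta))\to0$ as $\delta\to0$, since $F$ is uniformly continuous. This proves (ω''). □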
(42) REMARK. In Theorem 14, consider instead of (B) and (C) the following conditions (B') and (C'), respectively:
(B'): For every $\varepsilon>0$, $\limsup_{n\to\infty}\mathbb P(|\xi_n(\delta)-\xi_n(0)|\ge\varepsilon)\to0$ as $\delta\to0$ and $\limsup_{n\to\infty}\mathbb P(|\xi_n(1)-\xi_n(\delta)|\ge\varepsilon)\to0$ as $\delta\to1$;
(C'): There exist constants $a_i,b_i>0$, $i=1,2$, with $a_1+a_2>1$ and, for every $n\in\mathbb N$, monotone increasing functions $F_n\colon[0,1]\to\mathbb R$ such that for any $0\le r\le s\le t\le1$
$$\mathbb E\big(|\xi_n(s)-\xi_n(r)|^{b_1}\,|\xi_n(t)-\xi_n(s)|^{b_2}\big)\le(F_n(s)-F_n(r))^{a_1}(F_n(t)-F_n(s))^{a_2}.$$
Then (B') together with (C) and (D) implies (B), and (C') implies (C), with $a=a_1+a_2$ and $b=b_1+b_2$.
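The implication (C') ⇒ (C) stated in (42) can be checked with a Markov-type inequality. The following derivation is our own sketch, not the author's, with $a:=a_1+a_2>1$ and $b:=b_1+b_2$:

```latex
\begin{aligned}
\mathbb P\big(|\xi_n(s)-\xi_n(r)|\ge\varepsilon,\ |\xi_n(t)-\xi_n(s)|\ge\varepsilon\big)
&\le \mathbb E\Big(\frac{|\xi_n(s)-\xi_n(r)|^{b_1}}{\varepsilon^{b_1}}\cdot\frac{|\xi_n(t)-\xi_n(s)|^{b_2}}{\varepsilon^{b_2}}\Big)\\
&\le \varepsilon^{-b}\,(F_n(s)-F_n(r))^{a_1}\,(F_n(t)-F_n(s))^{a_2}\\
&\le \varepsilon^{-b}\,(F_n(t)-F_n(r))^{a}.
\end{aligned}
```

The first step uses $1_{\{|u|\ge\varepsilon\}}\le|u|^\beta/\varepsilon^\beta$ for $\beta>0$. The last step uses that both increments of the monotone $F_n$ are bounded by $F_n(t)-F_n(r)$.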
THEOREM 15. Let $\xi_n$, $n\in\mathbb N$, and $\xi$ be random elements in $(D,\mathcal B_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$. Suppose that (C) (or (C')) and (D) are fulfilled, together with the following conditions (E) and (F):
(E): $\xi_n\xrightarrow{f.d.}\xi$;
(F): $\mathcal L\{\xi\}(\{x\in D:x(1)\ne x(1-0)\})=0$.
Then $\xi_n\xrightarrow{\mathcal L}\xi$.
Proof. As remarked in (42), (C') implies (C). By (41), (C) together with (D) implies (ω''). But (E) together with (F) and (ω'') implies the assertion according to Theorem 15.4 in Billingsley (1968) (cf. also Gaenssler-Stute (1977), Satz 8.5.6). □
In view of Lemma 18 we thus obtain the following $\mathcal L_b$-convergence theorem:
THEOREM 16. Let $\xi_n$, $n\in\mathbb N$, and $\xi$ be random elements in $(D,\mathcal B_b(D,\rho))$, all defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$, and suppose that $\mathcal L\{\xi\}(C)=1$. Then (C) (or (C')) together with (D) and (E) implies $\xi_n\xrightarrow{\mathcal L_b}\xi$.
The following result is used in G. Shorack's (1979) paper concerning $\xi_n$'s of a special nature.
THEOREM 17. For every $n\in\mathbb N$, let $T_n:=\{t_0^n,t_1^n,\dots,t_{m_n}^n\}$ be such that $0=t_0^n\le t_1^n\le\dots\le t_{m_n}^n=1$. Let $(\xi_n)_{n\in\mathbb N}$ be a sequence of random elements in $(D,\mathcal B_b(D,\rho))$ such that, for all $n$ and $i\le m_n$, $\xi_n$ is constant on $[t_{i-1}^n,t_i^n)$, i.e., $\omega_{\xi_n}([t_{i-1}^n,t_i^n))=0$ a.s. Furthermore, assume that the following conditions (i)–(iii) are fulfilled:
(i) $\max_{1\le i\le m_n}(t_i^n-t_{i-1}^n)\to0$ as $n\to\infty$;
(ii) There exists a sequence $(F_n)$ of monotone increasing functions $F_n\colon[0,1]\to\mathbb R$ such that, for some $a>1$ and $b>0$,
$$\mathbb P(|\xi_n(s)-\xi_n(r)|\ge\varepsilon,\ |\xi_n(t)-\xi_n(s)|\ge\varepsilon)\le\varepsilon^{-b}(F_n(t)-F_n(r))^a$$
for every $\varepsilon>0$ and any set $\{r,s,t\}\subset T_n$ with $r\le s\le t$;
(iii) There exists a monotone increasing and continuous function $F\colon[0,1]\to\mathbb R$ such that, for the $F_n$'s occurring in (ii), either
(a) $F_n(t)-F_n(s)\le F(t)-F(s)$ for every $n$ and any $0\le s\le t\le1$, or
(b) $F_n(t)\to F(t)$ as $n\to\infty$ for every $t\in[0,1]$.
Then $(\xi_n)$ satisfies (C) and (D).
Proof. For each $n\in\mathbb N$, let $\varphi_n\colon[0,1]\to T_n$ be defined by $\varphi_n(t):=\max\{r\le t:r\in T_n\}$, $t\in[0,1]$. By (i), $\lim_{n\to\infty}\varphi_n(t)=t$ for every $t\in[0,1]$. Now put $F_n':=F_n\circ\varphi_n$, $n\in\mathbb N$, to get a sequence of monotone increasing functions on $[0,1]$. We show that (C) and (D) are satisfied with $F_n'$ in place of $F_n$.
As to (C), by the assumed nature of the $\xi_n$'s we have, for any $0\le t_1\le t_2\le1$,
$$\xi_n(t_2)-\xi_n(t_1)=\xi_n(\varphi_n(t_2))-\xi_n(\varphi_n(t_1))\quad\text{a.s.}$$
Applying (ii) to $\{\varphi_n(r),\varphi_n(s),\varphi_n(t)\}\subset T_n$, this gives, for every $\varepsilon>0$ and any $0\le r\le s\le t\le1$,
$$\mathbb P(|\xi_n(s)-\xi_n(r)|\ge\varepsilon,\ |\xi_n(t)-\xi_n(s)|\ge\varepsilon)\le\varepsilon^{-b}(F_n'(t)-F_n'(r))^a,$$
which proves (C). As to (D), we have to show that for any $0\le s\le t\le1$
(+) $\limsup_{n\to\infty}(F_n'(t)-F_n'(s))\le F(t)-F(s)$.
This follows easily from (iii). In fact, (iii)(a) implies that for any $0\le s\le t\le1$
$$F_n'(t)-F_n'(s)=F_n(\varphi_n(t))-F_n(\varphi_n(s))\le F(\varphi_n(t))-F(\varphi_n(s))\to F(t)-F(s)\quad\text{as }n\to\infty,$$
which implies (+). On the other hand, by the Pólya–Cantelli theorem, (iii)(b) implies $\sup_{t\in[0,1]}|F_n(t)-F(t)|\to0$ as $n\to\infty$. Therefore, for any $t\in[0,1]$,
$$|F_n'(t)-F(t)|\le|F(t)-F(\varphi_n(t))|+|F(\varphi_n(t))-F_n(\varphi_n(t))|\to0\quad\text{as }n\to\infty,$$
which implies (+). □
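The time map $\varphi_n(t)=\max\{r\le t:r\in T_n\}$ from the proof of Theorem 17 is a right-continuous step function. It can be evaluated by binary search on the sorted grid $T_n$. The sketch below is plain Python with the standard `bisect` module; the helper names, the equidistant grids and the example $F_n(u)=u^2$ are illustrative assumptions. It checks that $0\le t-\varphi_n(t)\le\max_i(t_i^n-t_{i-1}^n)$, which is how condition (i) gives $\varphi_n(t)\to t$. It also checks that $F_n'=F_n\circ\varphi_n$ is again monotone increasing.

```python
import bisect

def phi(grid, t):
    """phi_n(t) = max{r <= t : r in T_n}; grid sorted with grid[0] = 0 and grid[-1] = 1."""
    return grid[bisect.bisect_right(grid, t) - 1]

def equidistant_grid(m):
    """Illustrative choice T_n = {0, 1/m, ..., 1}; any grid with mesh -> 0 would do."""
    return [i / m for i in range(m + 1)]

def check(m, F=lambda u: u ** 2, num_points=998):
    """Return (mesh of T_n, max_t (t - phi_n(t)), whether F o phi_n is nondecreasing)."""
    T = equidistant_grid(m)
    mesh = max(T[i] - T[i - 1] for i in range(1, len(T)))
    ts = [k / (num_points - 1) for k in range(num_points)]  # test points in [0, 1]
    worst = max(t - phi(T, t) for t in ts)
    values = [F(phi(T, t)) for t in ts]  # F'_n = F_n o phi_n for the example F_n
    monotone = all(x <= y for x, y in zip(values, values[1:]))
    return mesh, worst, monotone

for m in (4, 16, 64):
    print(m, check(m))
```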
This concludes our short review of some of the key results in Billingsley's (1968) book. They will be used in Section 4 when proving functional central limit theorems for weighted empirical processes along the lines of Shorack's (1979) paper. Concerning the $\mathcal L_b$-statements there (cf. Theorems 18 and 19 in Section 4), the above-mentioned criteria in Billingsley's book can be modified so that the proofs work entirely within the theory of $\mathcal L_b$-convergence (cf. Remark (73)(b) in Section 4). This will also be the case for the following example, Donsker's functional central limit theorem for the uniform empirical process $\alpha_n=(\alpha_n(t))_{t\in[0,1]}$, defined by
$$\alpha_n(t):=n^{1/2}(U_n(t)-t),\quad t\in[0,1],$$
where $U_n$ is the empirical distribution function based on $n$ independent random variables having uniform distribution on $[0,1]$. According to (37), $\alpha_n$ can be considered as a random element in $(D,\mathcal B_b(D,\rho))$ as well as in $(D,\mathcal B(D,s))$ (cf. (38)). It follows from the multidimensional Central Limit Theorem that
(43) $\alpha_n\xrightarrow{f.d.}B^\circ$,
where $B^\circ=(B^\circ(t))_{t\in[0,1]}$ is the Brownian bridge.
As to $B^\circ$, all its sample paths lie in the separable and closed subspace $C=(C,\rho)$ of $D=(D,\rho)$. It follows from (39) that $\mathcal L\{B^\circ\}$, originally defined on $\mathcal B(C,\rho)$, may as well be considered on $\mathcal B_b(D,\rho)$, with the additional property that $\mathcal L\{B^\circ\}(C)=1$. Therefore $B^\circ$ may also be considered as a random element in $(D,\mathcal B_b(D,\rho))$, with $\mathcal L\{B^\circ\}$ concentrated on $C$. Hence, by Lemma 18,
(44) (i) $\alpha_n\xrightarrow{\mathcal L}B^\circ$ iff (ii) $\alpha_n\xrightarrow{\mathcal L_b}B^\circ$.
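The fidi convergence (43) rests on a covariance identity. For $0\le s\le t\le1$, $\operatorname{cov}(\alpha_n(s),\alpha_n(t))=s(1-t)$ for every $n$, which is also the covariance of the Brownian bridge $B^\circ$. The following Monte Carlo sketch estimates this $2\times2$ covariance matrix. It uses NumPy; the seed, sample sizes and the helper name `alpha_n` are arbitrary choices made for illustration.

```python
import numpy as np

def alpha_n(sample, t):
    """Uniform empirical process alpha_n(t) = sqrt(n) (U_n(t) - t); sample has shape (reps, n)."""
    n = sample.shape[-1]
    t = np.asarray(t)
    U = (sample[..., None] <= t).mean(axis=-2)  # U_n(t) for each replication, shape (reps, len(t))
    return np.sqrt(n) * (U - t)

rng = np.random.default_rng(0)
reps, n = 20_000, 50
s, t = 0.3, 0.7
samples = rng.uniform(size=(reps, n))
vals = alpha_n(samples, [s, t])          # shape (reps, 2)
emp_cov = np.cov(vals, rowvar=False)     # 2 x 2 sample covariance
theory = np.array([[s * (1 - s), s * (1 - t)],
                   [s * (1 - t), t * (1 - t)]])
print(np.round(emp_cov, 3))
print(theory)
```

The identity is exact for each $n$. The multidimensional CLT then adds Gaussianity of the limit, which is what (43) expresses.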
It was conjectured by J. L. Doob (1949) and shown by M. D. Donsker (1952) that (44)(i) holds true. This result is known as Donsker's functional central limit theorem for the uniform empirical process, and there are various ways of proving it. One may, e.g., use Theorem 15 by showing that the hypotheses (C) and (D) are fulfilled (cf. Gaenssler-Stute (1977), Lemma 10.2.2). Alternatively, one may apply Theorem 15.5 in Billingsley's (1968) book; for the latter one has to show that
(45) For each positive $\varepsilon$ and $\eta$ there exist a $\delta$, $0<\delta<1$, and an integer $n_0$ such that for all $n\ge n_0$
$$\mathbb P(\omega_{\alpha_n}(\delta)>\varepsilon)<\eta,$$
where $\omega_x(\delta):=\sup_{|t-s|<\delta,\ t,s\in[0,1]}|x(t)-x(s)|$ for $x\in D$.
(By the way, it follows from Theorem 15.5 in Billingsley (1968) together with Lemma 18 that (45) is a sufficient condition for $(\alpha_n)_{n\in\mathbb N}$ to be relatively $\mathcal L_b$-sequentially compact.)
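Condition (45) concerns the modulus $\omega_{\alpha_n}(\delta)$. The following sketch estimates $\mathbb P(\omega_{\alpha_n}(\delta)>\varepsilon)$ by simulation for a few values of $\delta$. It uses NumPy; the grid resolution, seed, sample sizes and helper names are our assumptions. The supremum is only approximated on the grid $\{k/m\}$, so the computed modulus is a lower bound for the true one. The modulus is pathwise monotone in $\delta$, so the estimated probabilities can only decrease as $\delta$ shrinks.

```python
import numpy as np

def alpha_paths(rng, reps, n, grid):
    """reps independent paths of alpha_n(t) = sqrt(n)(U_n(t) - t), evaluated on grid."""
    samples = np.sort(rng.uniform(size=(reps, n)), axis=1)
    # U_n(t) = #{i : u_i <= t} / n, via binary search in each sorted sample
    counts = np.stack([np.searchsorted(row, grid, side="right") for row in samples])
    return np.sqrt(n) * (counts / n - grid)

def modulus_on_grid(paths, m, delta):
    """Grid version of omega_x(delta): max |x(t) - x(s)| over grid points k/m with |t - s| < delta."""
    max_lag = max(int(np.ceil(delta * m)) - 1, 0)
    out = np.zeros(paths.shape[0])
    for lag in range(1, max_lag + 1):
        out = np.maximum(out, np.abs(paths[:, lag:] - paths[:, :-lag]).max(axis=1))
    return out

rng = np.random.default_rng(1)
m = 500
grid = np.arange(m + 1) / m
paths = alpha_paths(rng, reps=400, n=200, grid=grid)
eps = 0.5
probs = {}
for delta in (0.2, 0.05, 0.01):
    probs[delta] = float(np.mean(modulus_on_grid(paths, m, delta) > eps))
print(probs)
```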
There are two ways to show (45). One is Donsker's invariance principle for partial-sum processes, in the case of independent exponential random variables (cf. L. Breiman (1968), Problem 9, p. 296). The other is a more direct computation using the structural properties of empirical measures presented in Section 1 (cf. W. Stute (1982)). The second way yields at the same time an independent proof of (44)(ii) within the theory of $\mathcal L_b$-convergence in $(D,\rho)$. In fact, it can be shown (cf. Proposition B in Section 4) that (45) implies δ-tightness of $(\mathcal L\{\alpha_n\})$ w.r.t. $S_0=C[0,1]$. Therefore Theorem 11* together with an application of Theorem 3 yields (44)(ii) in view of (43). This also indicates the way to prove Functional Central Limit Theorems for more general empirical processes (empirical $\mathcal C$-processes indexed by classes $\mathcal C$ of sets) in the setting of $\mathcal L_b$-convergence of random elements in appropriately chosen metric spaces. Before doing this in the next section, we supplement the present one by some remarks on random change of time (cf. Billingsley (1968), Chapter 3, Section 17).
RANDOM CHANGE OF TIME: Following Billingsley (1968), we briefly indicate that so-called random change of time arguments are also valid within the context of $\mathcal L_b$-convergence, even with simplified proofs not relying on Skorokhod's topology. In this connection the reader should recall our remarks on product spaces. For this, let $D_0$ consist of those elements $\varphi$ of $D\equiv D[0,1]$ that are increasing and satisfy $0\le\varphi(t)\le1$ for all $t$. Such a $\varphi$ represents a transformation of the time interval $[0,1]$. We topologize $D_0$ by relativizing the uniform topology of $D$. Then (37) implies that $D_0\in\mathcal B_b(D)$, and therefore
$$\mathcal B_b(D_0)\subset\mathcal A_0:=D_0\cap\mathcal B_b(D)=\{B\subset D_0:B\in\mathcal B_b(D)\}\subset\mathcal B(D_0).$$
For $x\in D$ and $\varphi\in D_0$, let $x\circ\varphi\colon[0,1]\to\mathbb R$ be defined by $(x\circ\varphi)(t):=x(\varphi(t))$, $t\in[0,1]$. Then $x\circ\varphi$ lies in $D$. Define $\psi\colon D\times D_0\to D$ by $\psi(x,\varphi):=x\circ\varphi$. Then $\psi$ is $\mathcal B_b(D)\otimes\mathcal A_0,\mathcal B_b(D)$-measurable, i.e., one has
(+) $\psi^{-1}(\mathcal B_b(D))\subset\mathcal A:=\mathcal B_b(D)\otimes\mathcal A_0$.
Here $\mathcal A$ is a σ-algebra in the product space $S=D\times D_0$, equipped with the maximum metric $d$ (cf. our remarks on product spaces), such that $\mathcal B_b(S)\subset\mathcal A\subset\mathcal B(S)$.
ad (+): cf. Billingsley (1968), p. 232, for a proof based on the fact that $\mathcal B_b(D)=\sigma(\{\pi_t:t\in[0,1]\})$ by (37). □
Now let $\xi_n$, $n\in\mathbb N$, and $\xi$ be random elements in $(D,\mathcal B_b(D))$. In addition, let $\eta_n$, $n\in\mathbb N$, and $\eta$ be random elements in $(D_0,\mathcal A_0)$, all defined on a common p-space $(\Omega,\mathcal F,\mathbb P)$. Then $(\xi_n,\eta_n)$, $n\in\mathbb N$, and $(\xi,\eta)$ are random elements in
$$(S,\mathcal A)=(D\times D_0,\ \mathcal B_b(D)\otimes\mathcal A_0).$$
So, by (+), $\xi_n\circ\eta_n=\psi(\xi_n,\eta_n)$, $n\in\mathbb N$, and $\xi\circ\eta=\psi(\xi,\eta)$ are random elements in $(D,\mathcal B_b(D))$. They result from subjecting $\xi_n$ and $\xi$ to the random change of time represented by $\eta_n$ and $\eta$, respectively.
Concerning a "$(\xi_n,\eta_n)\xrightarrow{\mathcal L_b}(\xi,\eta)$"-statement, $(\xi,\eta)$ may be considered as a random element in $(S,\mathcal B_b(S))$, since $\mathcal B_b(S)\subset\mathcal A$. This is in accordance with our definition of $\mathcal L_b$-convergence. We ask for conditions under which
(++) $(\xi_n,\eta_n)\xrightarrow{\mathcal L_b}(\xi,\eta)$ implies $\xi_n\circ\eta_n\xrightarrow{\mathcal L_b}\xi\circ\eta$.
By the continuous mapping theorem (Theorem 4), (++) holds if $\psi$ is $\mathcal A,\mathcal B_b(D)$-measurable and $\mathcal L\{(\xi,\eta)\}$-a.e. $d$-continuous. The required measurability of $\psi$ is guaranteed by (+). It follows as in Billingsley (1968), p. 145, that $\psi$ is also $\mathcal L\{(\xi,\eta)\}$-a.e. $d$-continuous if $\mathcal L\{\xi\}(C)=\mathcal L\{\eta\}(C)=1$ for $C\equiv C[0,1]$. In fact, if $\mathcal L\{\xi\}$ and $\mathcal L\{\eta\}$ concentrate on $C$, then $\mathcal L\{(\xi,\eta)\}(C\times(C\cap D_0))=1$, and it is easy to show that $\psi$ is $d$-continuous on $C\times(C\cap D_0)$.
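The continuity of $\psi$ at points $(x,\varphi)$ with $x\in C$ rests on an elementary bound. With $\omega_x(h):=\sup\{|x(u)-x(v)|:|u-v|\le h\}$ we have $\|x_n\circ\varphi_n-x\circ\varphi\|\le\|x_n-x\|+\omega_x(\|\varphi_n-\varphi\|)$. The right-hand side tends to 0 when $x$ is continuous and $d((x_n,\varphi_n),(x,\varphi))\to0$. The sketch below checks this numerically with NumPy. The specific functions are arbitrary examples (our assumption). We use the Lipschitz bound $\omega_x(h)\le2\pi h$, valid for the chosen $x(u)=\sin(2\pi u)$, and the sup norm is taken on a finite grid.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 2001)

def x(u):
    return np.sin(2 * np.pi * u)   # a continuous path, Lipschitz with constant 2*pi

def phi(u):
    return u ** 2                  # an element of D_0: increasing, values in [0, 1]

def lhs_and_bound(c, d):
    """Grid sup-distance of x_n o phi_n from x o phi, and the bound c + 2*pi*d."""
    x_n = lambda u: x(u) + c * u                    # ||x_n - x|| = c on [0, 1]
    phi_n = lambda u: np.minimum(phi(u) + d, 1.0)   # ||phi_n - phi|| <= d, still in D_0
    lhs = float(np.max(np.abs(x_n(phi_n(t)) - x(phi(t)))))
    return lhs, c + 2 * np.pi * d                   # omega_x(h) <= 2*pi*h for this x

for k in (1, 2, 3, 4):
    print(k, lhs_and_bound(10.0 ** -k, 10.0 ** -k))
```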
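There remains, of course, the question of when $(\xi_n,\eta_n)\xrightarrow{\mathcal L_b}(\xi,\eta)$ holds. Here Theorem 9c can be used, leading to the following result on the stability of $\mathcal L_b$-convergence in $D\equiv D[0,1]$ under random change of time:
THEOREM. Suppose that $\xi_n$, $n\in\mathbb N$, and $\xi$ are random elements in $(D,\mathcal B_b(D))$ such that $\xi_n\xrightarrow{\mathcal L_b}\xi$ and $\mathcal L\{\xi\}(C)=1$. Let $\eta_n$, $n\in\mathbb N$, and $\eta$ be random elements in $(D_0,\mathcal A_0)$ such that $\eta_n\xrightarrow{\mathcal L_b}\eta$ and $\eta$ equals $\mathbb P$-a.s. some function $c$ belonging to $C\equiv C[0,1]$*). Then $\xi_n\circ\eta_n$, $n\in\mathbb N$, and $\xi\circ\eta$ are random elements in $(D,\mathcal B_b(D))$ for which $\xi_n\circ\eta_n\xrightarrow{\mathcal L_b}\xi\circ\eta$.
*) This last assumption may be omitted by considering instead the set $C\times\{c\}$ as separable support of $\mathcal L\{(\xi,\eta)\}$ if $\eta=c$ $\mathbb P$-a.s.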
4. Functional Central Limit Theorems
In the last section we already mentioned Donsker's functional central limit theorem for the uniform empirical process $\alpha_n=(\alpha_n(t))_{t\in[0,1]}$. Here $\alpha_n(t)=n^{1/2}(U_n(t)-t)$, where $U_n(t)$ is the empirical df based on independent random variables $\eta_1,\dots,\eta_n$. These have uniform distribution on the sample space $X=[0,1]$ with its Borel σ-algebra $\mathcal B=[0,1]\cap\mathbb B$. In the setting of an empirical $\mathcal C$-process $\beta_n\equiv(\beta_n(C))_{C\in\mathcal C}$, the uniform empirical process $\alpha_n$ is a very special case. Take $\mathcal C=\{[0,t]:t\in[0,1]\}$ and identify $\alpha_n(t)$ with $\beta_n(C)=n^{1/2}(\mu_n(C)-\mu(C))$ for $C=[0,t]$. Here $\mu_n$ is the empirical measure based on $\eta_1,\dots,\eta_n$ and $\mu$ is the uniform distribution on $[0,1]$; note that $\mu_n(C)=U_n(t)$ and $\mu(C)=t$ for $C=[0,t]$. The present section is concerned with some extensions of Donsker's functional central limit theorem in its form (44)(ii) to more general situations.
FUNCTIONAL CENTRAL LIMIT THEOREMS FOR EMPIRICAL C-PROCESSES: Let $X=(X,\mathcal B)$ be an arbitrary measurable space, considered as a sample space for a given sequence $\xi_1,\xi_2,\dots$ of i.i.d. random elements in $(X,\mathcal B)$. The $\xi_i$'s are defined on some common p-space $(\Omega,\mathcal F,\mathbb P)$ with law $\mu$ on $\mathcal B$. If not stated otherwise, we consider the canonical model $(\Omega,\mathcal F,\mathbb P)=(X^{\mathbb N},\mathcal B^{\mathbb N},\times\mu)$, with the $\xi_i$'s being the coordinate projections of $X^{\mathbb N}$ onto $X$.
Let $\mu_n(B):=\frac1n\sum_{i=1}^n1_B(\xi_i)$, $B\in\mathcal B$, be the empirical measure based on $\xi_1,\dots,\xi_n$.
Now, given some subclass $\mathcal C$ of $\mathcal B$, consider the empirical $\mathcal C$-process $\beta_n\equiv(\beta_n(C))_{C\in\mathcal C}$ defined by
$$\beta_n(C):=n^{1/2}(\mu_n(C)-\mu(C)),\quad C\in\mathcal C,$$
as a stochastic process (on $(\Omega,\mathcal F,\mathbb P)$) indexed by $\mathcal C$. As mentioned in Section 1, its covariance structure is given by
$$\operatorname{cov}(\beta_n(C_1),\beta_n(C_2))=\mu(C_1\cap C_2)-\mu(C_1)\mu(C_2),\quad C_1,C_2\in\mathcal C.$$
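For a concrete class, the covariance formula can be checked by simulation. The sketch below is our own example. It takes $X=\mathbb R^2$, $\mu$ uniform on $[0,1]^2$, and two lower left orthants $(-\infty,c]$, evaluating $\beta_n$ on them. For orthants, $C_1\cap C_2$ is again an orthant with corner equal to the coordinatewise minimum. The example uses NumPy, and the corners, seed, sample sizes and helper name `beta_n` are illustrative choices.

```python
import numpy as np

def beta_n(sample, corners, mu):
    """Empirical C-process at lower-left orthants (-inf, corner]; sample has shape (reps, n, 2)."""
    n = sample.shape[-2]
    inside = np.all(sample[..., :, None, :] <= corners[None, :, :], axis=-1)  # (reps, n, k)
    mu_n = inside.mean(axis=-2)                                               # (reps, k)
    return np.sqrt(n) * (mu_n - mu)

rng = np.random.default_rng(2)
corners = np.array([[0.5, 0.8], [0.7, 0.4]])
mu = np.prod(np.clip(corners, 0.0, 1.0), axis=1)          # mu(C) under the uniform law on [0,1]^2
inter = np.prod(np.clip(corners.min(axis=0), 0.0, 1.0))   # mu(C1 & C2): corner = coordinatewise min
theory = np.array([[mu[0] - mu[0] ** 2, inter - mu[0] * mu[1]],
                   [inter - mu[0] * mu[1], mu[1] - mu[1] ** 2]])
samples = rng.uniform(size=(20_000, 40, 2))
emp = np.cov(beta_n(samples, corners, mu), rowvar=False)
print(np.round(emp, 3))
print(np.round(theory, 3))
```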
So the analogue of (44)(ii) would be the statement that (in the sense of (34))
(46) $\beta_n\xrightarrow{\mathcal L_b}G_\mu$, where $G_\mu=(G_\mu(C))_{C\in\mathcal C}$ is a mean-zero Gaussian process with $\operatorname{cov}(G_\mu(C_1),G_\mu(C_2))=\mu(C_1\cap C_2)-\mu(C_1)\mu(C_2)$, $C_1,C_2\in\mathcal C$.
But this first requires a proper choice of a metric space $S=(S,d)$ together with a suitable separable subspace $S_0$. These serve as sample spaces for $\beta_n$ and for its limiting process $G_\mu$, respectively.
Following Dudley (1978), we propose to choose
$$S_0=U(\mathcal C,d_\mu):=\{\varphi\colon\mathcal C\to\mathbb R:\ \varphi\text{ bounded and uniformly }d_\mu\text{-continuous}\},$$
where $d_\mu$ is the pseudometric defined on $\mathcal C$ by
$$d_\mu(C_1,C_2):=\mu(C_1\,\Delta\,C_2),\quad C_1,C_2\in\mathcal C$$
($C_1\,\Delta\,C_2$ denoting the symmetric difference between $C_1$ and $C_2$). Concerning the $\mu(C)$-part of $\beta_n(C)$, note that $C\mapsto\mu(C)$ is a function belonging to $S_0$, since $|\mu(C_1)-\mu(C_2)|\le d_\mu(C_1,C_2)$. In order to cope also with the $\mu_n(C)$-part of $\beta_n(C)$ (and the factor $n^{1/2}$), let
$$S\equiv D_0(\mathcal C,\mu):=\Big\{\varphi=\varphi_0+\varphi_1:\ \varphi_0\in S_0\text{ and }\varphi_1=\sum_{i=1}^ka_i\varepsilon_{x_i}\text{ for some }a_i\in\mathbb R,\ x_i\in X,\ 1\le i\le k,\ k\in\mathbb N\Big\}.$$
Note that $S$ is a linear space containing $S_0$ as a linear subspace; also $\beta_n(\cdot,\omega)\in S$ for all $\omega\in\Omega$. Finally, let $S$ (and its subspace $S_0$) be metrized by the metric $d:=\rho$, where $\rho$ is the supremum metric, i.e.,
$$\rho(\varphi',\varphi''):=\sup_{C\in\mathcal C}|\varphi'(C)-\varphi''(C)|\quad\text{for }\varphi',\varphi''\in S.$$
Consider the closure $D(\mathcal C,\mu)$ of $D_0(\mathcal C,\mu)$ in the Banach space $\ell^\infty(\mathcal C)=(\ell^\infty(\mathcal C),\rho)$ of all bounded real-valued functions on $\mathcal C$. It can be considered as an extension of $D=D[0,1]$ in the classical case. There $X=[0,1]$, $\mathcal C=\{[0,t]:t\in[0,1]\}$, and $\mu$ is the uniform distribution on $[0,1]$ or any other distribution on $[0,1]$ with a strictly increasing distribution function. Also, in the latter case, $U(\mathcal C,d_\mu)$ equals $C[0,1]$ after identifying $\varphi([0,t])$ with $x(t)$.
Having made this choice for $S_0$, $S$ and $d$, in view of (46) the following problems still remain:
PROBLEM (a) (MEASURABILITY): Find conditions under which the $\beta_n$'s can be viewed as random elements in $(S,\mathcal A)$ for some σ-algebra $\mathcal A$ in $S$ such that one meets the situation of Section 3, i.e.,
(47) $\mathcal B_b(S,\rho)\subset\mathcal A\subset\mathcal B(S,\rho)$
(with $\mathcal B_b(S,\rho)$ being the σ-algebra generated by the open $\rho$-balls in $S$, and $\mathcal B(S,\rho)$ being the Borel σ-algebra in $(S,\rho)$).
Take $\mathcal A:=\sigma(\{\pi_C:C\in\mathcal C\})$, with $\pi_C\colon S\to\mathbb R$ defined by $\pi_C(\varphi):=\varphi(C)$, $C\in\mathcal C$. Then each $\beta_n$ is $\mathcal F,\mathcal A$-measurable. Indeed, $\mathcal F,\sigma(\{\pi_C:C\in\mathcal C\})$-measurability of $\beta_n$ is equivalent to $\mathcal F,\mathbb B$-measurability of $\pi_C(\beta_n)=\beta_n(C)$ for each fixed $C\in\mathcal C$. The latter holds since $\beta_n(C)$ is a random variable (on $(\Omega,\mathcal F,\mathbb P)$) for each fixed $C$. But the first inclusion in (47) fails to hold in general. In fact, looking back to (10) in Section 1, in the example considered there $\beta_n$ is not even $\mathcal F,\mathcal B(S,\rho)$-measurable. So we restrict our consideration to cases where the following measurability condition is fulfilled:
(M): $\mathcal B_b(S,\rho)\subset\mathcal A:=\sigma(\{\pi_C:C\in\mathcal C\})$.
It turns out to be satisfied in important cases of interest. Note that (M) implies (47), since the other inclusion there holds trivially due to the $\rho$-continuity of the $\pi_C$'s for each fixed $C\in\mathcal C$.
LEMMA 20. Suppose that $\mathcal C$ fulfills the following condition
(SE): There exists a countable subclass $\mathcal D$ of $\mathcal C$ such that for any $C\in\mathcal C$ there exists a sequence $(D_n)_{n\in\mathbb N}$ in $\mathcal D$ with $1_{D_n}(x)\to1_C(x)$ for all $x\in X$.
Then (M) holds true.
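Proof. By (SE), for any $C\in\mathcal C$ there exists a sequence $(D_n)_{n\in\mathbb N}$ in $\mathcal D$ such that $\lim_{n\to\infty}d_\mu(D_n,C)=0$. This follows by dominated convergence, since $d_\mu(D_n,C)=\int|1_{D_n}-1_C|\,d\mu$. Consequently $\varphi_0(C)=\lim_{n\to\infty}\varphi_0(D_n)$ for every $\varphi_0\in S_0$. On the other hand, $1_{D_n}(x)\to1_C(x)$ for all $x$ is equivalent to $\lim_{n\to\infty}\varepsilon_x(D_n)=\varepsilon_x(C)$ for all $x$. Hence $\varphi(C)=\lim_{n\to\infty}\varphi(D_n)$ for every $\varphi\in S$. It follows that for any $\varphi^*\in S$ and any $r>0$
$$\{\varphi\in S:\rho(\varphi,\varphi^*)\le r\}=\bigcap_{D\in\mathcal D}\{\varphi\in S:|\varphi(D)-\varphi^*(D)|\le r\}\in\mathcal A,$$
since $\mathcal D$ is countable. This implies (M). □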
(48) EXAMPLES. (a) Let $(X,\mathcal B)=(\mathbb R^k,\mathbb B^k)$, $k\ge1$. Let $\mathcal C$ be either the class $\mathcal J_k$ of all lower left orthants or the class $\mathcal B_k$ of all closed Euclidean balls in $\mathbb R^k$. Then (SE), and therefore (M), holds true for $\mathcal C=\mathcal J_k$ and for $\mathcal C=\mathcal B_k$.
(b) Consider instead, e.g., the class $\mathcal C:=\{C_0+z:z\in\mathbb R^k\}$, where $C_0$ is a fixed closed Euclidean ball in $\mathbb R^k$. Then (SE) fails to hold. In fact, no $\mathcal D=\{C_0+q:q\in R\}$ with countable $R\subset\mathbb R^k$ can serve as a countable subclass of $\mathcal C$ with the property stated in (SE). For any fixed $z\in\mathbb R^k\setminus R$ and any $D\in\mathcal D$ there exists a $y_0\in\mathbb R^k$ such that $1_{C_0+z}(y_0)=1\ne0=1_D(y_0)$ (cf. FIGURE 4).
FIGURE 4
We shall see below how to cope also with examples where (SE) fails to hold. For this, another measurability assumption, weaker than (M), will be needed. Note (cf. the proof of Lemma 20) that in the case of (SE) the process $\beta_n\equiv(\beta_n(C))_{C\in\mathcal C}$ has the property of SEPARABILITY: each sample path of $\beta_n$ is uniquely determined by its values on $\mathcal D$.
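For the classical class $\{[0,t]:t\in[0,1]\}$, this separability is concrete. Each sample path of $\alpha_n$ is determined by its values at countably many (e.g. dyadic) points. Moreover, $\sup_t|\alpha_n(t)|$ can be computed exactly from the order statistics $u_{(1)}\le\dots\le u_{(n)}$ as $n^{1/2}\max_i\max\{i/n-u_{(i)},\,u_{(i)}-(i-1)/n\}$; this is the Kolmogorov–Smirnov statistic. The sketch below compares this exact value with the supremum over finite dyadic grids, which can only be smaller and approaches it as the grid is refined. It uses NumPy; the names, seed and grid levels are illustrative.

```python
import numpy as np

def ks_exact(u):
    """sup_t |alpha_n(t)|, computed from the order statistics of the sample u."""
    n = u.size
    v = np.sort(u)
    i = np.arange(1, n + 1)
    return np.sqrt(n) * max(np.max(i / n - v), np.max(v - (i - 1) / n))

def ks_on_grid(u, level):
    """sup of |alpha_n(t)| over the dyadic grid {k / 2**level : 0 <= k <= 2**level}."""
    n = u.size
    v = np.sort(u)
    t = np.arange(2 ** level + 1) / 2 ** level
    U = np.searchsorted(v, t, side="right") / n   # U_n(t) = #{i : u_i <= t} / n
    return np.sqrt(n) * np.max(np.abs(U - t))

rng = np.random.default_rng(3)
u = rng.uniform(size=50)
exact = ks_exact(u)
approx = [ks_on_grid(u, level) for level in (4, 8, 12, 16)]
print(exact, approx)
```

On a grid of mesh $2^{-L}$, the gap is at most $n^{1/2}2^{-L}$. A grid point just right of $u_{(i)}$, and one just left of it, recovers each of the two one-sided extremes up to the mesh.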
Let us make some further remarks at this point. First, note that (M) implies
(49) $\mathcal B_b(T,\rho)=\sigma(\{\pi_C(T):C\in\mathcal C\})=\mathcal B(T,\rho)$ for any separable subspace $T$ of $S$, with $\pi_C(T)\colon T\to\mathbb R$ being defined by $\pi_C(T)(\varphi):=\varphi(C)$.
In fact, the same reasoning which gave us (39) in Section 3 yields
(50) $\mathcal B(T,\rho)=T\cap\mathcal B_b(S,\rho)$ for any separable subspace $T$ of $S$,
whence (cf. Lemma 11 (iv))
$$\mathcal B_b(T,\rho)=T\cap\mathcal B_b(S,\rho)\overset{(M)}{\subset}T\cap\mathcal A=\sigma(\{\pi_C(T):C\in\mathcal C\})\subset\mathcal B(T,\rho)=\mathcal B_b(T,\rho),$$
which proves (49). Next, concerning $S_0=U(\mathcal C,d_\mu)$, it follows even without imposing (M) that
(49*) $\mathcal B_b(S_0,\rho)=\sigma(\{\pi_C(S_0):C\in\mathcal C\})=\mathcal B(S_0,\rho)$, provided that $\mathcal C$ is totally bounded for $d_\mu$.
In fact, if $\mathcal C$ is totally bounded for $d_\mu$, there exists a countable $d_\mu$-dense subset $\mathcal D$ of $\mathcal C$. Due to the $d_\mu$-continuity of functions belonging to $S_0$, this implies that for any $\varphi_0\in S_0$ and any $r>0$
$$\{\varphi\in S_0:\rho(\varphi,\varphi_0)\le r\}=\bigcap_{D\in\mathcal D}\{\varphi\in S_0:|\varphi(D)-\varphi_0(D)|\le r\}\in\sigma(\{\pi_C(S_0):C\in\mathcal C\}),$$
whence $\mathcal B_b(S_0,\rho)\subset\sigma(\{\pi_C(S_0):C\in\mathcal C\})\subset\mathcal B(S_0,\rho)$. On the other hand, using the Stone–Weierstrass theorem, it can be shown that
(51) $S_0=U(\mathcal C,d_\mu)$ is separable and $\rho$-closed (i.e., $\bar S_0=S_0$), provided that $\mathcal C$ is totally bounded for $d_\mu$.
This proves (49*). For later use it is important to note that (49*) together with (50) and (51) implies
LEMMA 21. Let $\mathcal C$ be totally bounded for $d_\mu$, and suppose that $G_\mu=(G_\mu(C))_{C\in\mathcal C}$ has all its sample paths in $S_0=U(\mathcal C,d_\mu)$. Then $G_\mu$ can be viewed as a random element in $(S,\mathcal B_b(S,\rho))$ with $\mathcal L\{G_\mu\}(S_0)=1$. Furthermore, $\mathcal L\{G_\mu\}$, as well as any other law $\nu$ on $\mathcal B_b(S,\rho)$ with $\nu(S_0)=1$, is uniquely determined by its fidis. (These are the image measures that $\pi_{C_1,\dots,C_k}(S_0)\colon S_0\to\mathbb R^k$ induce on $\mathbb B^k$ from $\nu$, when $\nu$ is viewed as defined on $S_0\cap\mathcal B_b(S,\rho)=\mathcal B_b(S_0,\rho)=\sigma(\{\pi_C(S_0):C\in\mathcal C\})$. Here $\pi_{C_1,\dots,C_k}(S_0)(\varphi):=(\varphi(C_1),\dots,\varphi(C_k))$.)
This leads us to the next
PROBLEM (b) (EXISTENCE OF A VERSION OF $G_\mu$ IN $S_0=U(\mathcal C,d_\mu)$): Let $G_\mu=(G_\mu(C))_{C\in\mathcal C}$ be a mean-zero Gaussian process with covariance structure (cf. Section 1, (4))
$$\operatorname{cov}(G_\mu(C_1),G_\mu(C_2))=\mu(C_1\cap C_2)-\mu(C_1)\mu(C_2),\quad C_1,C_2\in\mathcal C.$$
The fidis of $\beta_n$ are well defined when $\beta_n$ is viewed as a random element in $(S,\mathcal A)$ with $\mathcal A:=\sigma(\{\pi_C:C\in\mathcal C\})$. Hence, according to (4) of Section 1,
(52) $\beta_n\xrightarrow{f.d.}G_\mu$, with $G_\mu$ viewed as the coordinate process on $(\mathbb R^{\mathcal C},\mathbb B^{\mathcal C},\mathcal L\{G_\mu\})$ (where $\mathcal L\{G_\mu\}$ is uniquely determined by the fidis of $G_\mu$).
The problem now is to find suitable conditions under which there exists a version $\tilde G_\mu$ of $G_\mu$ having all its sample paths in $S_0$. Here $\tilde G_\mu$ and $G_\mu$ have the same fidis, denoted by $\tilde G_\mu\overset{f.d.}{=}G_\mu$. In this connection $\tilde G_\mu$ is allowed to be defined on a p-space $(\Omega',\mathcal F',\mathbb P')$ different from $(\Omega,\mathcal F,\mathbb P)$.
It turns out that, in order to get a positive result, $\mathcal C$ must not be too "large"; cf. R. M. Dudley (1979a) and also R. M. Dudley (1982). A proper condition on a class $\mathcal C$ being not too large to allow for a solution of Problem (b) is in terms of the so-called metric entropy. For any $\varepsilon>0$, let $N(\varepsilon,\mathcal C,\mu)$ be the smallest $n\in\mathbb N$ such that $\mathcal C=\bigcup_{j=1}^n\mathcal C_j$ for some classes $\mathcal C_j$ with
$$d_\mu\text{-}\operatorname{diam}(\mathcal C_j):=\sup\{d_\mu(C',C''):C',C''\in\mathcal C_j\}\le2\varepsilon$$
for each $j$. Then $\log N(\varepsilon,\mathcal C,\mu)$ is called a METRIC ENTROPY (of $\mathcal C$ w.r.t. $\mu$).
Obviously, $N(\varepsilon,\mathcal C,\mu)<\infty$ for each $\varepsilon>0$ iff $\mathcal C$ is totally bounded for $d_\mu$. In that case $S_0=U(\mathcal C,d_\mu)$ is separable and $\rho$-closed, by (51).
Now, as shown by R. M. Dudley (1967) and (1973), cf. p. 71,
(53) $G_\mu$ has a version $\tilde G_\mu=(\tilde G_\mu(C))_{C\in\mathcal C}$ having all its sample paths in $S_0\equiv U(\mathcal C,d_\mu)$, provided that
$$(E_0):\quad\int_0^1(\log N(x^2,\mathcal C,\mu))^{1/2}\,dx<\infty.$$
But it turns out that $(E_0)$ is not sufficient to ensure (46). In fact, disregarding measurability questions for the moment, the following example shows that (46), i.e. $\beta_n\xrightarrow{\mathcal L_b}G_\mu$, need not be satisfied. Let $\mathcal C$ be the collection of all finite subsets of $X=[0,1]$, and let $\mu$ be the uniform distribution (Lebesgue measure) on $\mathcal B=[0,1]\cap\mathbb B$. Then $d_\mu(C_1,C_2)=0$ for all $C_1,C_2\in\mathcal C$, whence $N(x,\mathcal C,\mu)\equiv1$, and therefore $(E_0)$ obviously holds true. Also, $\mu(C)=0$ on $\mathcal C$ implies that $G_\mu\equiv0$. Still, (46) fails to hold: for the present $\mathcal C$ it would imply
$$\sup_{C\in\mathcal C}|\beta_n(C)|\xrightarrow{\mathcal L}\sup_{C\in\mathcal C}|G_\mu(C)|=0.$$
This cannot be true, since $\sup_{C\in\mathcal C}\beta_n(C)=n^{1/2}\to\infty$ as $n\to\infty$ (take $C=\{\xi_1,\dots,\xi_n\}$).
A proper strengthening of $(E_0)$, which will yield (46) and hence also a solution to Problem (b), is in terms of the so-called metric entropy with inclusion. For any $\varepsilon>0$, let $N_I(\varepsilon,\mathcal C,\mu)$ be the smallest $n\in\mathbb N$ with the following property: there are $A_1,\dots,A_n\in\mathcal B$ (not necessarily in $\mathcal C$) such that for every $C\in\mathcal C$ there exist $i,j$ with $A_i\subset C\subset A_j$ and $\mu(A_j\setminus A_i)<\varepsilon$. Then $\log N_I(\varepsilon,\mathcal C,\mu)$ is called a METRIC ENTROPY WITH INCLUSION (of $\mathcal C$ w.r.t. $\mu$).
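Compared with $N(\varepsilon,\mathcal C,\mu)$, we have for any $\mathcal C\subset\mathcal B$ and any $\mu$
(54) $N(\varepsilon,\mathcal C,\mu)\le N_I(\varepsilon,\mathcal C,\mu)$ for each $\varepsilon>0$.
For, suppose w.l.o.g. that $n=N_I(\varepsilon,\mathcal C,\mu)<\infty$. Then there exist $A_1,\dots,A_n\in\mathcal B$ such that for any $C\in\mathcal C$ there exist $i,j$ with $A_i\subset C\subset A_j$ and $\mu(A_j\setminus A_i)<\varepsilon$. For $i=1,\dots,n$, put $\mathcal C_i:=\{C\in\mathcal C:A_i\subset C\text{ and }d_\mu(A_i,C)<\varepsilon\}$. By the triangle inequality for $d_\mu$, $d_\mu\text{-}\operatorname{diam}(\mathcal C_i)\le2\varepsilon$. Moreover $\mathcal C=\bigcup_{i=1}^n\mathcal C_i$. Indeed, for each $C\in\mathcal C$ there exist $i,j$ such that $A_i\subset C\subset A_j$ and $\mu(A_j\setminus A_i)<\varepsilon$. This implies $d_\mu(A_i,C)\le\mu(A_j\setminus A_i)<\varepsilon$, i.e., $C\in\mathcal C_i$. Empty classes $\mathcal C_i$ may simply be discarded. This proves (54).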
Now, as shown by R. M. Dudley (1978), Theorem 5.1, the following result holds true:
THEOREM A. (M) together with
$$(E_I):\quad\int_0^1(\log N_I(x^2,\mathcal C,\mu))^{1/2}\,dx<\infty$$
implies that $\beta_n\xrightarrow{\mathcal L_b}G_\mu$.
The proof of Theorem A is based on the following fundamental characterization theorem (cf. R. M. Dudley (1978)):
THEOREM B. Let $(X,\mathcal B)$ be an arbitrary measurable space, considered as sample space for a given sequence $\xi_1,\xi_2,\dots$ of i.i.d. random elements $\xi_i$ in $(X,\mathcal B)$. The $\xi_i$'s are viewed as coordinate projections of $(\Omega,\mathcal F,\mathbb P)=(X^{\mathbb N},\mathcal B^{\mathbb N},\times\mu)$ onto $X$, with law $\mathcal L\{\xi_i\}\equiv\mu$ on $\mathcal B$. Given some subclass $\mathcal C$ of $\mathcal B$ together with the empirical $\mathcal C$-process $\beta_n\equiv(\beta_n(C))_{C\in\mathcal C}$ based on $\xi_1,\dots,\xi_n$, suppose that (M) is fulfilled. Then $\beta_n\xrightarrow{\mathcal L_b}G_\mu$ (in which case $\mathcal C$ will be called a $\mu$-DONSKER CLASS) if and only if both
(a) $\mathcal C$ is totally bounded for $d_\mu$, and
(b) for any $\varepsilon,\eta>0$ there exist a $\delta=\delta(\varepsilon,\eta)$, $0<\delta<1$, and an $n_0=n_0(\varepsilon,\eta,\delta)\in\mathbb N$ such that for $n\ge n_0$
$$\mathbb P^*(\omega^0_{\beta_n}(\delta)>\varepsilon)<\eta,$$
where $\omega^0_\varphi(\delta):=\sup\{|\varphi(C_1)-\varphi(C_2)|:d_\mu(C_1,C_2)<\delta,\ C_1,C_2\in\mathcal C\}$ for $\varphi\in S\equiv D_0(\mathcal C,\mu)$.
(55) REMARK. A comparison with (45) shows the complete analogy with the classical situation. There $X=[0,1]$, $\mathcal B=[0,1]\cap\mathbb B$, $\mu$ is the uniform distribution on $\mathcal B$, $\mathcal C=\{[0,t]:t\in[0,1]\}$, and $\beta_n$ can be identified with the uniform empirical process $\alpha_n$. Note that, due to the compactness of the unit interval, $\mathcal C=\{[0,t]:t\in[0,1]\}$ is totally bounded for $d_\mu$. Indeed, given any $\varepsilon>0$, let $n_0:=\inf\{n:\frac1n\le2\varepsilon\}$ and $\mathcal C_j:=\{[0,t]:\frac{j-1}{n_0}\le t\le\frac j{n_0}\}$, $j=1,\dots,n_0$. Then $d_\mu\text{-}\operatorname{diam}(\mathcal C_j)\le2\varepsilon$ and $\mathcal C=\bigcup_{j=1}^{n_0}\mathcal C_j$.
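In the classical case of Remark (55), with $\mu$ uniform, $d_\mu([0,s],[0,t])=|t-s|$. The cover used there has $n_0=\inf\{n:1/n\le2\varepsilon\}=\lceil1/(2\varepsilon)\rceil$ classes. The sketch below is plain Python and the helper names are illustrative. It builds that cover, checks that each class has $d_\mu$-diameter at most $2\varepsilon$, and evaluates $\int_0^1(\log n_0(x^2))^{1/2}\,dx$ by the midpoint rule. Since $N(\varepsilon,\mathcal C,\mu)\le n_0(\varepsilon)$, a finite value shows that $(E_0)$ holds for this class. For comparison, the substitution $1/(2x^2)=e^v$ gives $\int_0^{2^{-1/2}}(\log(1/(2x^2)))^{1/2}\,dx=\pi^{1/2}/2$, which is a lower bound.

```python
import math

def n0(eps):
    """n_0 = inf{n : 1/n <= 2*eps}, the number of classes in the cover of Remark (55)."""
    return max(1, math.ceil(1.0 / (2.0 * eps)))

def cover(eps):
    """Classes C_j = {[0,t] : (j-1)/n_0 <= t <= j/n_0}, stored as parameter ranges for t."""
    m = n0(eps)
    return [((j - 1) / m, j / m) for j in range(1, m + 1)]

def entropy_integral(num=200_000):
    """Midpoint rule for int_0^1 sqrt(log n_0(x^2)) dx."""
    h = 1.0 / num
    return h * sum(math.sqrt(math.log(n0(((k + 0.5) * h) ** 2))) for k in range(num))

for eps in (0.3, 0.1, 0.01):
    pieces = cover(eps)
    diam = max(b - a for a, b in pieces)          # d_mu-diameter of a class = length of its t-range
    print(eps, len(pieces), diam <= 2 * eps + 1e-12)  # small slack for floating-point rounding
print(entropy_integral())
```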
Before proving Theorem B we will show two auxiliary results:
PROPOSITION B (cf. Problem (b) above). Suppose that (a) and (b) of Theorem B are fulfilled. Then $G_\mu=(G_\mu(C))_{C\in\mathcal C}$ has a version in $S_0=U(\mathcal C,d_\mu)$. That is, there exists a Gaussian process $\tilde G_\mu=(\tilde G_\mu(C))_{C\in\mathcal C}$ having all its sample paths in $S_0$ with $\tilde G_\mu\overset{f.d.}{=}G_\mu$. Thus, by Lemma 21, $\tilde G_\mu$ can be viewed as a random element in $(S,\mathcal B_b(S,\rho))$ with $\mathcal L\{\tilde G_\mu\}(S_0)=1$, where, by (51), $S_0$ is separable and $\rho$-closed.
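Proof of Proposition B. As already remarked in connection with Problem (b) above, the process $G_\mu$ is viewed as the coordinate process on $(\mathbb R^{\mathcal C},\mathbb B^{\mathcal C},\mathcal L\{G_\mu\})$.
According to (a), for every $n\in\mathbb N$ there exist $m_n\in\mathbb N$ and $C_{n,1},\dots,C_{n,m_n}\in\mathcal C$ such that
$$\mathcal C=\bigcup_{i=1}^{m_n}B_{d_\mu}(C_{n,i},\tfrac1n),\quad\text{where }B_{d_\mu}(C_{n,i},\tfrac1n):=\{C\in\mathcal C:d_\mu(C_{n,i},C)\le\tfrac1n\}.$$
Therefore $\mathcal D:=\bigcup_{n\in\mathbb N}\bigcup_{i=1}^{m_n}\{C_{n,i}\}$ is a countable and $d_\mu$-dense subset of $\mathcal C$. Let $U(\mathcal D,d_\mu):=\{\varphi\colon\mathcal D\to\mathbb R:\varphi\text{ uniformly }d_\mu\text{-continuous}\}$. Let $G_{\mu,\mathcal D}=(G_\mu(D))_{D\in\mathcal D}$ be viewed as the coordinate process on $(\Omega_{\mathcal D},\mathcal F_{\mathcal D},\mathbb P_{\mathcal D})=(\mathbb R^{\mathcal D},\mathbb B^{\mathcal D},\mathcal L\{G_{\mu,\mathcal D}\})$. Then it suffices to show
(+) There exists a Gaussian process $\tilde G_{\mu,\mathcal D}=(\tilde G_\mu(D))_{D\in\mathcal D}$ on some p-space $(\Omega',\mathcal F',\mathbb P')$ having all its sample paths in $U(\mathcal D,d_\mu)$ and such that $\tilde G_{\mu,\mathcal D}\overset{f.d.}{=}G_{\mu,\mathcal D}$.
In fact, once (+) is shown, we can define $\tilde G_\mu(\omega')$ for each $\omega'\in\Omega'$ as the uniquely determined uniformly $d_\mu$-continuous extension of $\tilde G_{\mu,\mathcal D}(\omega')$ to $\mathcal C$. That is, $\tilde G_\mu(C,\omega')=\lim_{n\to\infty}\tilde G_\mu(D_n,\omega')$ for each $C\in\mathcal C$, where $(D_n)_{n\in\mathbb N}\subset\mathcal D$ is such that $d_\mu(C,D_n)\to0$ as $n\to\infty$. It follows that
(*) $\tilde G_\mu(\omega')$ is bounded for each $\omega'$, whence $\tilde G_\mu(\omega')\in S_0$ for all $\omega'$, and
(**) $\tilde G_\mu\overset{f.d.}{=}G_\mu$.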
ad (*): By (a), for every $\varepsilon>0$ there exist an $n_0=n_0(\varepsilon)\in\mathbb N$ and classes $\mathcal C_j\subset\mathcal C$, $j=1,\dots,n_0$, such that $d_\mu\text{-}\operatorname{diam}(\mathcal C_j)\le2\varepsilon$ and $\mathcal C=\bigcup_{j=1}^{n_0}\mathcal C_j$. Fix some $C_j^*\in\mathcal C_j$ for each $j$. Let $\omega'\in\Omega'$ be arbitrary but fixed. Since $\tilde G_\mu(\omega')$ is uniformly continuous on $\mathcal C$, for each $\delta>0$ there exists an $\varepsilon=\varepsilon(\delta,\omega')>0$ such that $|\tilde G_\mu(C_1,\omega')-\tilde G_\mu(C_2,\omega')|<\delta$ whenever $d_\mu(C_1,C_2)\le2\varepsilon$ for $C_1,C_2\in\mathcal C$. Now, given an arbitrary $C\in\mathcal C$, there exists a $j\in\{1,\dots,n_0\}$ with $C\in\mathcal C_j$, hence $d_\mu(C,C_j^*)\le2\varepsilon$. Therefore
$$|\tilde G_\mu(C,\omega')|\le|\tilde G_\mu(C,\omega')-\tilde G_\mu(C_j^*,\omega')|+|\tilde G_\mu(C_j^*,\omega')|\le\delta+|\tilde G_\mu(C_j^*,\omega')|,$$
whence $\sup_{C\in\mathcal C}|\tilde G_\mu(C,\omega')|\le\delta+\max_{1\le j\le n_0}|\tilde G_\mu(C_j^*,\omega')|<\infty$.
ad (**): We confine ourselves here to showing that $\mathcal L\{\tilde G_\mu(C)\}=\mathcal L\{G_\mu(C)\}$ for each fixed $C\in\mathcal C$; the proof for the higher-dimensional fidis runs in a similar way. Given any $C\in\mathcal C$, let $(D_n)\subset\mathcal D$ be such that $d_\mu(C,D_n)\to0$ as $n\to\infty$. By construction, $\tilde G_\mu(C,\omega')=\lim_{n\to\infty}\tilde G_\mu(D_n,\omega')$ for all $\omega'\in\Omega'$, implying $\tilde G_\mu(D_n)\xrightarrow{\mathcal L}\tilde G_\mu(C)$. Now, by (+), $\mathcal L\{\tilde G_\mu(D_n)\}=\mathcal L\{G_\mu(D_n)\}=N(0,\mu(D_n)(1-\mu(D_n)))$ (cf. (3) of Section 1) for each $n\in\mathbb N$. Here $\mu(D_n)\to\mu(C)$ as $n\to\infty$, since $\lim_{n\to\infty}d_\mu(C,D_n)=0$. Therefore $\mathcal L\{\tilde G_\mu(C)\}=N(0,\mu(C)(1-\mu(C)))=\mathcal L\{G_\mu(C)\}$.
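So it remains to show (+): According to Lemma 7.2.31 and Satz 7.1.18 in Gaenssler-Stute (1977), (+) is equivalent to
(++) $P_{\mathcal D}(\{\varphi\in\mathbb R^{\mathcal D}:\ \lim_{\delta\downarrow0}w_\varphi(\delta)=0\})=1$,
where $\varphi\in U(\mathcal D,d_\mu)$ iff $\lim_{\delta\downarrow0}w_\varphi(\delta)=0$, with
$$w_\varphi(\delta):=\sup\{|\varphi(D_1)-\varphi(D_2)|:\ d_\mu(D_1,D_2)<\delta,\ D_1,D_2\in\mathcal D\}$$
being $\mathcal B^{\mathcal D},\mathcal B$-measurable as a function of $\varphi$. Note that for any $\varphi\in\mathbb R^{\mathcal D}$ and any $\delta>0$, $w^{\mathcal D_n}_\varphi(\delta)\uparrow w_\varphi(\delta)$ as $\mathcal D_n\uparrow\mathcal D$ (where $w^{\mathcal D_n}_\varphi$ is defined like $w_\varphi$ with $\mathcal D$ replaced by $\mathcal D_n$), whence for any $\varepsilon>0$ we have
(c) $\{\varphi\in\mathbb R^{\mathcal D}: w_\varphi(\delta)>\varepsilon\}\subset\bigcup_{n\in\mathbb N}\{\varphi\in\mathbb R^{\mathcal D}: w^{\mathcal D_n}_\varphi(\delta)>\varepsilon\}$ as $\mathcal D_n\uparrow\mathcal D$.
We are going to show next that (++) is implied by
(R): For any $\varepsilon,\eta>0$ there exists a $\delta=\delta(\varepsilon,\eta)$, $0<\delta<1$, such that $P_{\mathcal D}(\{\varphi\in\mathbb R^{\mathcal D}: w_\varphi(\delta)>\varepsilon\})<\eta$.
In fact, (R) implies that for each fixed $\varepsilon>0$
$$\sum_{n\in\mathbb N}P_{\mathcal D}(\{\varphi\in\mathbb R^{\mathcal D}: w_\varphi(\delta_n)>\varepsilon\})<\infty$$
for some suitable sequence $\delta_n\downarrow0$, whence, by the Borel-Cantelli lemma, for $P_{\mathcal D}$-almost all $\varphi\in\mathbb R^{\mathcal D}$ there exists an $n(\varphi)\in\mathbb N$ such that $w_\varphi(\delta_n)\le\varepsilon$ for all $n\ge n(\varphi)$. Repeating the argument for a sequence of $\varepsilon$'s tending to zero shows that for $P_{\mathcal D}$-almost all $\varphi\in\mathbb R^{\mathcal D}$, $\lim_{\delta\downarrow0}w_\varphi(\delta)=0$, i.e. $\varphi\in U(\mathcal D,d_\mu)$, which proves (++). So far we have only made use of assumption (a); the proof of Proposition B$_1$ will now be concluded by showing that the other assumption (b) implies (R). For this, remembering that $\mathcal D$ is countable, let $\mathcal D_n\subset\mathcal D$, $n\in\mathbb N$, with $|\mathcal D_n|<\infty$ and $\mathcal D_n\uparrow\mathcal D$; then, according to (c), it suffices to show: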
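(d) For any $\varepsilon,\eta>0$ there exists a $\delta=\delta(\varepsilon,\eta)$, $0<\delta<1$, such that for any $\mathcal D_f\subset\mathcal D$ with $|\mathcal D_f|<\infty$
$$P_{\mathcal D}(\{\varphi\in\mathbb R^{\mathcal D}: w^{\mathcal D_f}_\varphi(\delta)>\varepsilon\})<\eta,$$
where, for $\mathcal D_f=\{D_1,\dots,D_\ell\}$, $w^{\mathcal D_f}_\varphi(\delta)>\varepsilon$ iff $(\varphi(D_1),\dots,\varphi(D_\ell))\in G$, with $G=G_{\varepsilon,\delta}$ being some open subset of $\mathbb R^\ell$. Now, given an arbitrary $\varepsilon>0$ and an arbitrary $\eta>0$, choose $\delta=\delta(\varepsilon,\eta)$, $0<\delta<1$, according to (b) such that for all $n\ge n_0(\varepsilon,\eta,\delta)$
(e) $P^*(w_{\beta_n}(\delta)>\varepsilon)<\eta$.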
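Then it follows that for each $\mathcal D_f=\{D_1,\dots,D_\ell\}\subset\mathcal D\ (\subset\mathcal C)$
$$P_{\mathcal D}\circ\pi^{-1}_{\{D_1,\dots,D_\ell\}}(G)=\mathcal L\{\mathbb G_\mu\}\circ\pi^{-1}_{\{D_1,\dots,D_\ell\}}(G)\le\liminf_{n\to\infty}\mathcal L\{\beta_n\}\circ\pi^{-1}_{\{D_1,\dots,D_\ell\}}(G)=\liminf_{n\to\infty}P^*\big((\pi_{\{D_1,\dots,D_\ell\}}\circ\beta_n)^{-1}(G)\big)$$
$$=\liminf_{n\to\infty}P\big(w^{\mathcal D_f}_{\beta_n}(\delta)>\varepsilon\big)\le\liminf_{n\to\infty}P^*\big(w_{\beta_n}(\delta)>\varepsilon\big)\le\eta\quad\text{by (e)},$$
where for the first inequality above we made use of (28) and of the fact that, according to (52) and (**), $\beta_n\xrightarrow{\mathrm{f.d.}}\mathbb G_\mu$. This proves Proposition B$_1$. □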
(56) REMARK. The proof just given of Proposition B$_1$ shows that, in order to get a result like (53), it suffices to show that an entropy condition like (E$_I$) implies (R). This was nicely demonstrated by D. Pollard (1982) in one of his seminar talks at Seattle, using an analogue of the chaining argument of R.M. Dudley ((1978), pp. 915, 924); cf. also D. Pollard (1981), pp. 191-192.
PROPOSITION B$_2$. Suppose that (a) and (b) of Theorem B are fulfilled and also (M); then $(\mathcal L\{\beta_n\})_{n\in\mathbb N}$ is δ-tight w.r.t. $S_0=U_b(\mathcal C,d_\mu)$. (Note again that $\mathcal L\{\beta_n\}\in M^a_1(S)$, $S\equiv D_0(\mathcal C,\mu)$.)
For the proof of Proposition B$_2$ we will make use of the
Kirszbraun-McShane Theorem (cf. M.D. Kirszbraun (1934) and McShane (1934)): Let $S=(S,d)$ be a metric space, $A\subset S$, and let $\varphi$ be a real-valued function defined on $A$ such that $\sup\{|\varphi(x)-\varphi(y)|/d(x,y): x,y\in A,\ x\ne y\}=:K<\infty$; then $\varphi$ can be extended to a function $\psi$ on all of $S$ with $\sup\{|\psi(x)-\psi(y)|/d(x,y): x,y\in S,\ x\ne y\}=K$.
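For real-valued functions the extension in the Kirszbraun-McShane Theorem can be written down explicitly: McShane's formula $\psi(x)=\inf_{a\in A}(\varphi(a)+K\,d(x,a))$ gives one such extension. The following Python sketch applies this formula to a finite set $A$ in an arbitrary metric space and checks on a grid that the Lipschitz constant is preserved. The function names `mcshane_extension` and `lipschitz_constant` are illustrative choices, not taken from the text.

```python
import itertools


def lipschitz_constant(points, values, dist):
    """Smallest K with |f(x) - f(y)| <= K d(x, y) over the given finite set."""
    k = 0.0
    for (x, fx), (y, fy) in itertools.combinations(zip(points, values), 2):
        d = dist(x, y)
        if d > 0:
            k = max(k, abs(fx - fy) / d)
    return k


def mcshane_extension(A, phi_A, dist):
    """Extend phi from the finite set A via psi(x) = min_a (phi(a) + K d(x, a)).

    psi is a minimum of K-Lipschitz functions, hence K-Lipschitz, and it agrees
    with phi on A because phi itself is K-Lipschitz there."""
    K = lipschitz_constant(A, phi_A, dist)

    def psi(x):
        return min(fa + K * dist(x, a) for a, fa in zip(A, phi_A))

    return psi, K
```

Truncating $\psi$ at $\pm M$, as in the "w.l.o.g." step of the proof below, does not increase the Lipschitz constant.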
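Proof of Proposition B$_2$ (cf. R.M. Dudley (1978), Lemma (1.3)). For any $\varepsilon,\delta>0$ let
$$B_{\delta,\varepsilon}:=\{\varphi\in D_0(\mathcal C,\mu):\ \exists\,C_1,C_2\in\mathcal C\ \text{s.t.}\ d_\mu(C_1,C_2)<\delta\ \text{and}\ |\varphi(C_1)-\varphi(C_2)|>\varepsilon\}.$$
Note that $\varphi\in B_{\delta,\varepsilon}$ iff $w_\varphi(\delta)>\varepsilon$.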
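To make the identity $\{\varphi\in B_{\delta,\varepsilon}\}=\{w_\varphi(\delta)>\varepsilon\}$ concrete, here is a small Python sketch for a finite index class. As an illustrative assumption, the sets are intervals $[0,t]\subset[0,1]$, represented by $t$, and $\mu$ is Lebesgue measure, so that $d_\mu([0,s],[0,t])=\mu([0,s]\triangle[0,t])=|s-t|$. The helper names are hypothetical.

```python
from itertools import combinations


def modulus(phi, d, delta):
    """w_phi(delta): the largest |phi(C1) - phi(C2)| over pairs with d(C1, C2) < delta."""
    w = 0.0
    for c1, c2 in combinations(list(phi), 2):
        if d(c1, c2) < delta:
            w = max(w, abs(phi[c1] - phi[c2]))
    return w


def in_B(phi, d, delta, eps):
    """phi belongs to B_{delta, eps} iff w_phi(delta) > eps (strict inequality)."""
    return modulus(phi, d, delta) > eps
```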
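We have to show: for any $0<\varepsilon<1$ there exists a compact set $K\subset U_b(\mathcal C,d_\mu)$ such that for each $\gamma>0$
$$\mathcal L\{\beta_n\}(K^\gamma)>1-\varepsilon\quad\text{for } n \text{ large enough}.$$
(Note that $K^\gamma\in\mathcal B_b(S,\rho)\subset\mathcal A$ by (M).) Let $0<\varepsilon<1$ be given; by (b) take $\delta=\delta(\varepsilon)$, $0<\delta<1$, such that
ⓐ $P^*(\beta_n\in B_{\delta,\varepsilon/2})<\varepsilon/4$ for all $n\ge n_0(\varepsilon,\delta(\varepsilon))$.
According to (a) there exists a finite $\mathcal C_0=\mathcal C_0(\delta)\subset\mathcal C$ such that for all $C\in\mathcal C$, $d_\mu(C,C_0)<\delta$ for some $C_0\in\mathcal C_0$. Let $k:=|\mathcal C_0|$; then $k=k(\delta(\varepsilon))\in\mathbb N$. Take $M=M(\varepsilon)$ large enough so that $(M-1)^{-2}<\varepsilon/k$; then
ⓑ $P(\sup_{C\in\mathcal C}|\beta_n(C)|>M)<\varepsilon/2$ for all $n\ge n_0(\varepsilon)\equiv n_0(\varepsilon,\delta(\varepsilon))$.
ad ⓑ: Note that $\{\omega:\sup_{C\in\mathcal C}|\beta_n(C,\omega)|>M\}\in\mathcal F$ according to (M); now, for each $C_0\in\mathcal C_0$, $P(|\beta_n(C_0)|>M-1)\le\tfrac14(M-1)^{-2}<\varepsilon/(4k)$ by Chebyshev's inequality (and the choice of $M$), whence
ⓒ $P(\sup_{C_0\in\mathcal C_0}|\beta_n(C_0)|>M-1)<\varepsilon/4$.
Next, $\sup_{C\in\mathcal C}|\beta_n(C,\omega)|>M$ and $|\beta_n(C_1,\omega)-\beta_n(C_2,\omega)|\le\varepsilon/2$ for all $C_1,C_2\in\mathcal C$ with $d_\mu(C_1,C_2)<\delta$ together imply (due to the choice of $\mathcal C_0$) that there exists a $C_0\in\mathcal C_0$ such that $|\beta_n(C_0,\omega)|>M-\varepsilon/2>M-1$, whence
$$\{\sup_{C\in\mathcal C}|\beta_n(C)|>M\}\subset\{\beta_n\in B_{\delta,\varepsilon/2}\}\cup\{\sup_{C_0\in\mathcal C_0}|\beta_n(C_0)|>M-1\},$$
which implies ⓑ according to ⓐ and ⓒ.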
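The Chebyshev step in ad ⓑ uses $\operatorname{Var}\beta_n(C)=\mu(C)(1-\mu(C))\le1/4$. The sketch below picks the smallest integer $M$ with $(M-1)^{-2}<\varepsilon/k$ and compares the resulting bound with the exact tail of $\beta_n(C)=n^{-1/2}(n\mu_n(C)-n\mu(C))$, where $n\mu_n(C)$ is binomial. The helper names and the numerical values are illustrative assumptions.

```python
import math


def choose_M(eps, k):
    """Smallest integer M >= 2 with (M - 1)**-2 < eps / k."""
    M = 2
    while (M - 1) ** -2 >= eps / k:
        M += 1
    return M


def tail_prob(n, p, t):
    """Exact P(|beta_n(C)| > t), where n * mu_n(C) ~ Binomial(n, p) and p = mu(C)."""
    total = 0.0
    for s in range(n + 1):
        if abs(s - n * p) / math.sqrt(n) > t:
            total += math.comb(n, s) * p ** s * (1 - p) ** (n - s)
    return total
```

With this choice of $M$, each of the $k$ probabilities is below $\varepsilon/(4k)$, so their sum is below $\varepsilon/4$, as in ⓒ.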
Now, for any $j\in\mathbb N$, let $\varepsilon(j):=\varepsilon\,2^{-j}$; then by (b) there exists a sequence $\delta(j)=\delta(j,\varepsilon)>0$, $j\in\mathbb N$, such that
(i) $\delta(j+1)<\delta(j)/2$, and
(ii) $P^*(\beta_n\in B_{\delta(j),\varepsilon(j)})<\varepsilon(j)$ for all $n\ge n_0(\varepsilon,j)$.
Let $A_j:=B_{\delta(j),\varepsilon(j)}$ and $\delta_j:=\varepsilon(j)\,\delta(j)/(2M)$; then, by (i), we have
(iii) $\delta_{j+1}<\delta_j/4$, and $\varepsilon(j)/\delta_j$ is increasing with $j$.
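The two properties in (iii) follow directly from $\varepsilon(j)=\varepsilon2^{-j}$, (i), and $\delta_j=\varepsilon(j)\delta(j)/(2M)$: we have $\delta_{j+1}/\delta_j=\tfrac12\,\delta(j+1)/\delta(j)<\tfrac14$ and $\varepsilon(j)/\delta_j=2M/\delta(j)$. The sketch below checks this numerically for one illustrative sequence $\delta(j)=\tfrac12\,3^{-j}$, which satisfies (i). The sequence and the values of $\varepsilon$ and $M$ are assumptions.

```python
def build_sequences(eps, M, deltas):
    """eps(j) = eps * 2**-j and delta_j = eps(j) * delta(j) / (2M), j = 1, 2, ..."""
    eps_j = [eps * 2.0 ** -j for j in range(1, len(deltas) + 1)]
    delta_small = [e * d / (2 * M) for e, d in zip(eps_j, deltas)]
    return eps_j, delta_small


eps, M = 0.5, 10
deltas = [0.5 * 3.0 ** -j for j in range(1, 16)]  # satisfies (i): delta(j+1) < delta(j)/2
eps_j, dj = build_sequences(eps, M, deltas)
```

The identity $2M=\varepsilon(j)\delta(j)/\delta_j$ is exactly what is used in ad (c) below.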
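Furthermore, for $m\ge2$, let
$$F_m:=\{\varphi\in D_0(\mathcal C,\mu):\ \sup_{C\in\mathcal C}|\varphi(C)|\le M\ \text{and s.t. for all } C_1,C_2\in\mathcal C,\ |\varphi(C_1)-\varphi(C_2)|\le\varepsilon(j)\max\big(1,d_\mu(C_1,C_2)/\delta_j\big)\ \text{for } j=2,\dots,m\};$$
then
(c) $\sup_{C\in\mathcal C}|\varphi(C)|\le M$ for some $\varphi\in D_0(\mathcal C,\mu)$ and $\varphi\in\complement A_j$ for $j=2,\dots,m$ together imply that $\varphi\in F_m$.
ad (c): $\sup_{C\in\mathcal C}|\varphi(C)|\le M$ implies that for all $C_1,C_2\in\mathcal C$
$$|\varphi(C_1)-\varphi(C_2)|\le2M=\varepsilon(j)\,\frac{\delta(j)}{\delta_j}\le\varepsilon(j)\,\frac{d_\mu(C_1,C_2)}{\delta_j}\quad\text{if } d_\mu(C_1,C_2)\ge\delta(j);$$
on the other hand, $d_\mu(C_1,C_2)<\delta(j)$ for some $C_1,C_2\in\mathcal C$ implies, together with $\varphi\in\complement A_j$, that $|\varphi(C_1)-\varphi(C_2)|\le\varepsilon(j)$, which proves (c).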
We will show next that (ii) together with ⓑ and (c) implies
(d) For each $m\ge2$ there exists an $n_1=n_1(\varepsilon,m)\in\mathbb N$ such that for all $n\ge n_1$ there exists an $E_{nm}\in\mathcal F$ with $P(E_{nm})>1-\varepsilon$ and $\beta_n(\cdot,\omega)\in F_m$ for all $\omega\in E_{nm}$.
ad (d): According to (ii), let $n_0(\varepsilon,m)$ be large enough such that for all $n\ge n_0(\varepsilon,m)$ and each $j=2,\dots,m$ there exist $E'_{nj}\in\mathcal F$ with
$$\{\beta_n\in A_j\}\subset E'_{nj}\quad\text{and}\quad P(E'_{nj})<\varepsilon(j)=\varepsilon\,2^{-j},$$
whence $P\big(\complement\bigcup_{j=2}^mE'_{nj}\big)>1-\varepsilon/2$. Thus, for
$$E_{nm}:=\Big(\complement\bigcup_{j=2}^mE'_{nj}\Big)\cap\{\sup_{C\in\mathcal C}|\beta_n(C)|\le M\}\in\mathcal F,$$
and since $\complement\bigcup_{j=2}^mE'_{nj}\subset\bigcap_{j=2}^m\{\beta_n\in\complement A_j\}$, we obtain together with ⓑ and (c) that for $n\ge n_1:=\max(n_0(\varepsilon,m),n_0(\varepsilon))$
$$P(E_{nm})>1-\varepsilon\quad\text{and}\quad\beta_n(\cdot,\omega)\in F_m\ \text{for all }\omega\in E_{nm}.$$
This proves (d). Now let
$$K:=\{\varphi\in\ell^\infty(\mathcal C):\ \sup_{C\in\mathcal C}|\varphi(C)|\le M\ \text{and s.t. for all } j\in\mathbb N,\ d_\mu(C_1,C_2)<\delta_j/2\ \text{implies}\ |\varphi(C_1)-\varphi(C_2)|\le3\varepsilon(j)\}.$$
Then $K\subset U_b(\mathcal C,d_\mu)$. Now, $(\mathcal C,d_\mu)$ is totally bounded, and $K$ is a uniformly bounded and equicontinuous family of functions which is $\rho$-closed in the Banach space $(\ell^\infty(\mathcal C),\rho)$, whence, by the Arzelà-Ascoli theorem (applied to the completion of $(\mathcal C,d_\mu)$), it follows that $K$ is compact. So it remains to show that for each $\gamma>0$
$$\mathcal L\{\beta_n\}(K^\gamma)>1-\varepsilon\quad\text{for } n \text{ large enough}.$$
For this it suffices to prove
(e) For each $\gamma>0$ there exists an $m=m(\varepsilon,\gamma)$ such that $F_m\subset K^\gamma$.
In fact, (e) together with (d) implies
$$\mathcal L\{\beta_n\}(K^\gamma)\ge P^*(\beta_n\in F_m)\ge P(E_{nm})>1-\varepsilon$$
for $n\ge n_1(\varepsilon,m(\varepsilon,\gamma))$, which concludes the proof of Proposition B$_2$.
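ad (e): Given $\gamma>0$, choose $m=m(\varepsilon,\gamma)$ such that $\varepsilon(m)<\gamma/2$, and take a maximal set $\mathcal C_m\subset\mathcal C$ such that $d_\mu(C_1,C_2)\ge\delta_m$ for all $C_1\ne C_2$ in $\mathcal C_m$. Then $\mathcal C_m$ is finite by (a), and for all $C\in\mathcal C$, $d_\mu(C,C')<\delta_m$ for some $C'\in\mathcal C_m$ (by the maximality of $\mathcal C_m$). Now, if $\varphi\in F_m$ and $C_1,C_2\in\mathcal C_m$, then (since $d_\mu(C_1,C_2)/\delta_m\ge1$ for $C_1\ne C_2$) we obtain (cf. the definition of $F_m$)
$$|\varphi(C_1)-\varphi(C_2)|\le\varepsilon(m)\,\frac{d_\mu(C_1,C_2)}{\delta_m}.$$
Applying the Kirszbraun-McShane Theorem, $\mathrm{rest}_{\mathcal C_m}\varphi$ can be extended to a function $\psi$ on $\mathcal C$ with
(iv) $|\psi(C_1)-\psi(C_2)|\le\varepsilon(m)\,d_\mu(C_1,C_2)/\delta_m$ for all $C_1,C_2\in\mathcal C$.
In addition, w.l.o.g. we may assume that $\sup_{C\in\mathcal C}|\psi(C)|\le M$.
Let us show that $\psi\in K$, i.e., that for all $j\in\mathbb N$, $d_\mu(C_1,C_2)<\delta_j/2$ implies $|\psi(C_1)-\psi(C_2)|\le3\varepsilon(j)$.
For $j\ge m$: since $\varepsilon(m)/\delta_m\le\varepsilon(j)/\delta_j$ by (iii), we obtain from (iv) that $|\psi(C_1)-\psi(C_2)|\le\varepsilon(j)$ if $d_\mu(C_1,C_2)\le\delta_j$.
For $j<m$: given $C_i\in\mathcal C$, $i=1,2$, with $d_\mu(C_1,C_2)<\delta_j/2$, choose $C_i'\in\mathcal C_m$ such that $d_\mu(C_i,C_i')<\delta_m$, $i=1,2$; then
$$d_\mu(C_1',C_2')<2\delta_m+\delta_j/2\le\delta_j\quad\text{(by (iii))},$$
and so, by (iv) (note that $\mathrm{rest}_{\mathcal C_m}\psi=\mathrm{rest}_{\mathcal C_m}\varphi$, $\varphi\in F_m$),
$$|\psi(C_1)-\psi(C_2)|\le|\psi(C_1)-\psi(C_1')|+|\varphi(C_1')-\varphi(C_2')|+|\psi(C_2')-\psi(C_2)|\le\varepsilon(m)+\varepsilon(j)+\varepsilon(m)\le3\varepsilon(j).$$
Thus $\psi\in K$.
Now, we have $\rho(\varphi,\psi)<\gamma$, since for any $C\in\mathcal C$ there exists a $C'\in\mathcal C_m$ such that $d_\mu(C,C')<\delta_m$, whence (since $\varphi\in F_m$ and by (iv))
$$|\varphi(C)-\psi(C)|\le|\varphi(C)-\varphi(C')|+|\psi(C')-\psi(C)|\le2\varepsilon(m)<\gamma.$$
So $F_m\subset K^\gamma$, which concludes the proof of (e). □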
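A maximal $\delta_m$-separated set $\mathcal C_m$ as used in ad (e) can be produced greedily when the class is finite. A candidate is accepted if it is at distance at least $\delta$ from everything already chosen, so every rejected candidate lies within distance $<\delta$ of a chosen one. This is exactly the covering property invoked via maximality. The sketch assumes, for illustration, sets $[0,t]$ with $d_\mu=|s-t|$ (Lebesgue $\mu$).

```python
def maximal_separated(points, dist, delta):
    """Greedy maximal delta-separated subset: pairwise distances >= delta,
    and every point of `points` lies within < delta of some chosen point."""
    chosen = []
    for p in points:
        if all(dist(p, q) >= delta for q in chosen):
            chosen.append(p)
    return chosen
```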
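We are now in a position to give the
Proof of Theorem B. First assume (a) and (b). Then, by Proposition B$_1$, we can view $\mathbb G_\mu\equiv(\mathbb G_\mu(C))_{C\in\mathcal C}$ as a random element in $(S,\mathcal B_b(S,\rho))$ with $\mathcal L\{\mathbb G_\mu\}(S_0)=1$, $S_0\equiv U_b(\mathcal C,d_\mu)$ being $\rho$-closed and separable. Furthermore, as mentioned at the end of the proof of Proposition B$_1$, we have
① $\beta_n\xrightarrow{\mathrm{f.d.}}\mathbb G_\mu$.
Now, by Proposition B$_2$, $(\mathcal L\{\beta_n\})_{n\in\mathbb N}$ is δ-tight w.r.t. $S_0$, whence it follows from Theorem 11 that for every subsequence $(\mathcal L\{\beta_{n'}\})$ of $(\mathcal L\{\beta_n\})$ there exist a further subsequence $(\mathcal L\{\beta_{n''}\})$ of $(\mathcal L\{\beta_{n'}\})$ and a $\nu=\nu_{(n'),(n'')}\in M^b_1(S)$ with $\nu(S_0)=1$ such that
② $\mathcal L\{\beta_{n''}\}\to\nu$ weakly.
Since each projection $\pi_{C_1,\dots,C_k}: S\to\mathbb R^k$ is $\mathcal A,\mathcal B^k$-measurable and $\rho$-continuous, and since (M) is assumed, we obtain from ② by Theorem 3 that
$$\mathcal L\{\beta_{n''}\}\circ\pi^{-1}_{C_1,\dots,C_k}\to\nu\circ\pi^{-1}_{C_1,\dots,C_k}\quad\text{for each } C_1,\dots,C_k\in\mathcal C.$$
Together with ①, this implies that $\nu$ and $\mathcal L\{\mathbb G_\mu\}$ have the same finite-dimensional distributions, whence $\nu=\mathcal L\{\mathbb G_\mu\}$ on $\mathcal B_b(S,\rho)$ (cf. Lemma 21), and therefore $\beta_n\xrightarrow{\mathcal L}\mathbb G_\mu$.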
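Conversely, if $\mathcal C$ is a μ-Donsker class, then (a) holds (cf. Proposition 3.4 in R.M. Dudley (1967)). So it remains to prove (b) (where it suffices to prove the assertion there by taking $\eta=\varepsilon$). Now, by Theorem 12 there exist a sequence $\hat\beta_n$, $n\in\mathbb N$, of random elements in $(S,\mathcal A)$ and a random element $\hat{\mathbb G}_\mu$ in $(S,\mathcal B_b(S,\rho))$, all defined on an appropriate p-space $(\hat\Omega,\hat{\mathcal F},\hat P)$, such that
③ $\mathcal L\{\hat\beta_n\}=\mathcal L\{\beta_n\}$ (on $\mathcal A$) for all $n\in\mathbb N$, and $\mathcal L\{\hat{\mathbb G}_\mu\}=\mathcal L\{\mathbb G_\mu\}$ (on $\mathcal B_b(S,\rho)$),
and
④ $\rho(\hat\beta_n(\hat\omega),\hat{\mathbb G}_\mu(\hat\omega))=\sup_{C\in\mathcal C}|\hat\beta_n(C,\hat\omega)-\hat{\mathbb G}_\mu(C,\hat\omega)|\to0$ as $n\to\infty$ for all $\hat\omega\in\hat\Omega_0\in\hat{\mathcal F}$ with $\hat P(\hat\Omega_0)=1$.
Since $\mathcal L\{\hat{\mathbb G}_\mu\}(S_0)=\mathcal L\{\mathbb G_\mu\}(S_0)=1$, we may assume w.l.o.g. that $\hat{\mathbb G}_\mu(\hat\omega)\in S_0$ for all $\hat\omega\in\hat\Omega$, whence for any $\varepsilon>0$ there exists a $\delta=\delta(\varepsilon)>0$ such that
⑤ $\hat P(w_{\hat{\mathbb G}_\mu}(\delta)>\varepsilon/2)<\varepsilon/2$.
(Note that $\{\hat\omega\in\hat\Omega: w_{\hat{\mathbb G}_\mu(\hat\omega)}(\delta)>\varepsilon/2\}\in\hat{\mathcal F}$ if, as just assumed, $\hat{\mathbb G}_\mu(\hat\omega)\in S_0$ for all $\hat\omega\in\hat\Omega$.) Now, since $S_0$ is separable, take a sequence $\{\varphi_m: m\in\mathbb N\}$ dense in $S_0\cap\complement B_{\delta,\varepsilon/2}$, where
$$B_{\delta,\varepsilon/2}:=\{\varphi\in S:\ \exists\,C_1,C_2\in\mathcal C\ \text{s.t.}\ d_\mu(C_1,C_2)<\delta\ \text{and}\ |\varphi(C_1)-\varphi(C_2)|>\varepsilon/2\}.$$
Let
$$T_0:=\bigcup_{m\in\mathbb N}B_\rho(\varphi_m,\varepsilon/4)$$
($B_\rho(\varphi_m,\varepsilon/4)$ denoting the open ρ-ball with center $\varphi_m$ and radius $\varepsilon/4$); then $T_0\in\mathcal B_b(S,\rho)$, whence, by (M), $\{\beta_n\in T_0\}\in\mathcal F$ as well as $\{\hat\beta_n\in T_0\}\in\hat{\mathcal F}$ for each $n$.
Furthermore we have $T_0\cap B_{\delta,\varepsilon}=\emptyset$. In fact, $\varphi\in T_0$ implies that $\rho(\varphi_m,\varphi)<\varepsilon/4$ for some $m\in\mathbb N$, and since $\varphi_m\in\complement B_{\delta,\varepsilon/2}$, we have for any $C_1,C_2\in\mathcal C$ either $d_\mu(C_1,C_2)\ge\delta$ or $|\varphi_m(C_1)-\varphi_m(C_2)|\le\varepsilon/2$, implying in the latter case that
$$|\varphi(C_1)-\varphi(C_2)|\le|\varphi(C_1)-\varphi_m(C_1)|+|\varphi_m(C_1)-\varphi_m(C_2)|+|\varphi_m(C_2)-\varphi(C_2)|<\varepsilon,$$
whence $\varphi\in\complement B_{\delta,\varepsilon}$. We thus obtain for each $n\in\mathbb N$
$$P^*(w_{\beta_n}(\delta)>\varepsilon)=P^*(\beta_n\in B_{\delta,\varepsilon})\le P^*(\beta_n\notin T_0)=P(\beta_n\notin T_0)=\hat P(\hat\beta_n\notin T_0),$$
and so it remains to show
⑥ $\hat P(\hat\beta_n\notin T_0)<\varepsilon$ for $n$ sufficiently large.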
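This will now follow easily from ④ together with ⑤. At first, ④ implies that there exists an $n_0=n_0(\varepsilon)\in\mathbb N$ such that
$$\hat P^*(\rho(\hat\beta_n,\hat{\mathbb G}_\mu)>\varepsilon/8)<\varepsilon/2\quad\text{for all } n\ge n_0.$$
Next, if $\hat\beta_n(\hat\omega)\notin T_0$, then $\rho(\varphi_m,\hat\beta_n(\hat\omega))\ge\varepsilon/4$ for all $m\in\mathbb N$, whence either
$$\rho(\hat\beta_n(\hat\omega),\hat{\mathbb G}_\mu(\hat\omega))>\varepsilon/8\quad\text{or}\quad\rho(\varphi_m,\hat{\mathbb G}_\mu(\hat\omega))\ge\varepsilon/8\ \text{for all } m\in\mathbb N$$
(note that $\rho(\hat\beta_n(\hat\omega),\hat{\mathbb G}_\mu(\hat\omega))\le\varepsilon/8$ implies $\rho(\varphi_m,\hat{\mathbb G}_\mu(\hat\omega))\ge\rho(\varphi_m,\hat\beta_n(\hat\omega))-\rho(\hat\beta_n(\hat\omega),\hat{\mathbb G}_\mu(\hat\omega))\ge\varepsilon/4-\varepsilon/8=\varepsilon/8$ for all $m\in\mathbb N$). But $\rho(\varphi_m,\hat{\mathbb G}_\mu(\hat\omega))\ge\varepsilon/8$ for all $m\in\mathbb N$ implies $w_{\hat{\mathbb G}_\mu(\hat\omega)}(\delta)>\varepsilon/2$ (note that $w_{\hat{\mathbb G}_\mu(\hat\omega)}(\delta)\le\varepsilon/2$ would imply $\hat{\mathbb G}_\mu(\hat\omega)\in H:=S_0\cap\complement B_{\delta,\varepsilon/2}$, whence $\rho(\varphi_m,\hat{\mathbb G}_\mu(\hat\omega))<\varepsilon/8$ for some $m\in\mathbb N$, since $\{\varphi_m: m\in\mathbb N\}$ is dense in $H$). It therefore follows from ⑤, together with the preceding bound, that ⑥ holds true for all $n\ge n_0$. This concludes the proof of Theorem B. □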
(By the way, if the Portmanteau theorem (cf. (b) there) is used instead of Theorem 12, the last part of the above proof of Theorem B becomes much simpler.)
Having taken great care in proving the fundamental characterization theorem for μ-Donsker classes, we can now confine ourselves to giving
Dudley's Proof of Theorem A.
In view of (E$_I$) and (54), we have $N_I(\varepsilon,\mathcal C,\mu)<\infty$ for each $\varepsilon>0$, i.e. $\mathcal C$ is totally bounded for $d_\mu$. Therefore, by Theorem B, it suffices to prove
(+): (E$_I$) implies that for any $0<\varepsilon<1$ there exist a $\delta_0=\delta_0(\varepsilon)$, $0<\delta_0<1$, and an $n_0=n_0(\varepsilon,\delta_0)$ such that for each $n>n_0$
$$P^*(w_{\beta_n}(\delta_0)>\varepsilon)<\varepsilon.$$
Let $0<\varepsilon<1$ be arbitrary but fixed, and let $N_I(x)=N_I(x,\mathcal C,\mu)$. Suppose that $\delta_k$, $k=0,1,2,\dots$, is a sequence of nonnegative real numbers tending to zero ($\delta_0$ will be specified below). According to the definition of $N_I(\delta_k,\mathcal C,\mu)$, take sets $A_{k1},\dots,A_{k,m(k)}\in\mathcal B$, $m(k):=N_I(\delta_k)$, such that for each $C\in\mathcal C$ and $k=0,1,2,\dots$ there exist $i(k)=i(k,C)$ and $j(k)=j(k,C)$, $i(k),j(k)\in\{1,\dots,m(k)\}$, with
$$A_{k\,i(k)}\subset C\subset A_{k\,j(k)}\quad\text{and}\quad\mu(A_{k\,j(k)}\setminus A_{k\,i(k)})<\delta_k.$$
Since
$$\{w_{\beta_n}(\delta_0)>\varepsilon\}=\{\sup[|\beta_n(C)-\beta_n(D)|:\ C,D\in\mathcal C,\ \mu(C\triangle D)<\delta_0]>\varepsilon\}$$
$$\subset\{\sup_{C\in\mathcal C}|\beta_n(C)-\beta_n(A_{0\,i(0)})|>\varepsilon/2\}\cup\{\sup[|\beta_n(A_{0r})-\beta_n(A_{0s})|:\ \mu(A_{0r}\triangle A_{0s})<3\delta_0,\ r,s\in\{1,\dots,m(0)\}]>\varepsilon/2\}$$
$$=E_1(\varepsilon,\delta_0,n)\cup E_2(\varepsilon,\delta_0,n),\ \text{say},$$
it suffices to show that $P^*(E_i(\varepsilon,\delta_0,n))<\varepsilon/2$, $i=1,2$, for an appropriate $\delta_0=\delta_0(\varepsilon)$ and $n$ sufficiently large.
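STEP 1: Let us consider first $E_2$, replacing (in view of STEP 2 below) $\varepsilon$ by $\varepsilon/2$, i.e. we will show that
$$P_2:=P^*(E_2(\varepsilon/2,\delta_0,n))=P(E_2(\varepsilon/2,\delta_0,n))<\varepsilon/4$$
for a proper choice of $\delta_0=\delta_0(\varepsilon)$ and $n$ sufficiently large. Applying Lemma 4 (i) of Section 1 to each of the at most $[m(0)]^2$ pairs $(A_{0r},A_{0s})$ with $\mu(A_{0r}\triangle A_{0s})<3\delta_0$, we get, for $n>n_0:=\varepsilon^2/(256\,\delta_0^2)$, an exponential bound of the form $P_2\le2[m(0)]^2\exp(-c\,\varepsilon^2/\delta_0)$, where $c>0$ is an absolute constant. Now, as to $m(0)$: it follows from (E$_I$), together with $N_I(x)\uparrow$ as $x\downarrow0$, that $x\log N_I(x)\to0$ as $x\to0$ (since $2x^{1/2}(\log N_I(x))^{1/2}\le\int_0^xy^{-1/2}(\log N_I(y))^{1/2}dy\to0$), whence there is a $\gamma=\gamma(\varepsilon)>0$ such that
⑦ $N_I(x)\le\exp(\varepsilon^2/(800x))$ for all $0<x\le\gamma$.
Thus, for $\delta_0\le\gamma$ and $n>n_0$, $[m(0)]^2\le\exp(\varepsilon^2/(400\,\delta_0))$, and combining this with the bound from Lemma 4 (i) yields $P_2\le2\exp(-c'\varepsilon^2/\delta_0)$ for some absolute constant $c'>0$. But since $2\exp(-c'\varepsilon^2/\alpha)<\varepsilon/8$ for $\alpha$ small enough, we obtain for $\delta_0\le\min(\gamma,\alpha)$ that $P_2<\varepsilon/4$ for all $n>n_0$.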
STEP 2: To cope with the other event $E_1$, a certain chaining argument will be used. For this we note first that the entropy condition (E$_I$) is equivalent to
$$\int_0^1y^{-1/2}(\log N_I(y))^{1/2}\,dy<\infty\quad\text{and to}\quad\sum_{\ell\in\mathbb N}\big(2^{-\ell}\log N_I(2^{-\ell})\big)^{1/2}<\infty;$$
therefore, there exists a $u=u(\varepsilon)$ so that
⑧ $\sum_{\ell\ge u}\big(2^{-\ell}\log N_I(2^{-\ell})\big)^{1/2}<\varepsilon/96$ and $\sum_{\ell\ge u}\exp\big(-2^{\ell+1}\varepsilon^2/(9000(\ell+1)^4)\big)<\varepsilon/32$.
Now, let $\delta_0=\delta_0(\varepsilon):=2^{-r}$ with $r\ge u$ and $r$ large enough so that also $\delta_0\le\min(\gamma,\alpha)$ (cf. STEP 1).
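To see how $u=u(\varepsilon)$ in ⑧ might be found in a concrete case, the sketch below searches for the smallest index from which both (truncated) tail sums fall below their bounds. As an illustrative assumption it takes $N_I(x)=6/x$, which is of the type obtained for lower left orthants in Example (58). The truncation at $L=400$ terms and the value $\varepsilon=0.5$ are also assumptions; the neglected terms are astronomically small.

```python
import math


def tail(term, u, L=400):
    """Truncated tail sum  sum_{l=u}^{L-1} term(l)."""
    return sum(term(l) for l in range(u, L))


def smallest_u(term, bound, L=400):
    """Smallest u with tail(term, u) < bound (terms assumed nonnegative)."""
    u = 0
    while tail(term, u, L) >= bound:
        u += 1
    return u


eps = 0.5


def entropy_term(l):
    # (2^-l log N_I(2^-l))^(1/2) under the illustrative assumption N_I(x) = 6 / x
    return math.sqrt(2.0 ** -l * math.log(6 * 2.0 ** l))


def exp_term(l):
    return math.exp(-(2.0 ** (l + 1)) * eps ** 2 / (9000 * (l + 1) ** 4))


u = max(smallest_u(entropy_term, eps / 96), smallest_u(exp_term, eps / 32))
```

Any $r\ge u$ then gives an admissible $\delta_0=2^{-r}$, subject to the further smallness requirement $\delta_0\le\min(\gamma,\alpha)$.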
For $k=1,2,\dots$ let $\delta_k:=\delta_0\cdot2^{-k}=2^{-(r+k)}$ and $b_k:=(2^{-k}\log m(k))^{1/2}$, i.e.
$$b_k\,\delta_0^{1/2}=\big(2^{-(r+k)}\log N_I(2^{-(r+k)})\big)^{1/2},$$
so that by ⑧ we have $\sum_kb_k\,\delta_0^{1/2}<\varepsilon/96$.
Next, let
$$B_k\equiv B_k(C):=A_{k+1,\,i(k+1)}\setminus A_{k,\,i(k)}\quad\text{and}\quad D_k\equiv D_k(C):=A_{k,\,i(k)}\setminus A_{k+1,\,i(k+1)};$$
then $\mu(B_k)<\delta_k$ and $\mu(D_k)<\delta_{k+1}<\delta_k$.
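As in STEP 1, we choose $n_0:=\varepsilon^2/(256\,\delta_0^2)$. (Note that $\delta_0\le\alpha<\varepsilon^2/1600$, so that $n_0>10\,000/\varepsilon^2\to\infty$ as $\varepsilon\to0$.) Then, for each $n>n_0$ there is a unique $k=k(n)$ such that
⑨ $\tfrac12<8\,\delta_k\,n^{1/2}/\varepsilon\le1$.
Now, for each $n>n_0$ and each $C\in\mathcal C$ we obtain (with $k=k(n)$, $i(k)=i(k,C)$ and $j(k)=j(k,C)$), since $A_{k\,i(k)}\subset C\subset A_{k\,j(k)}$ and $n^{1/2}\delta_k\le\varepsilon/8$ by ⑨,
⑩ $|\beta_n(C)-\beta_n(A_{k\,i(k)})|\le|\beta_n(A_{k\,j(k)}\setminus A_{k\,i(k)})|+n^{1/2}\delta_k\le|\beta_n(A_{k\,j(k)}\setminus A_{k\,i(k)})|+\varepsilon/8$.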
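The index $k(n)$ in ⑨ is determined by halving: for $n>n_0=\varepsilon^2/(256\,\delta_0^2)$ we have $8\delta_0n^{1/2}/\varepsilon>1/2$, and each step $k\mapsto k+1$ halves $8\delta_kn^{1/2}/\varepsilon$. So the first $k$ at which this quantity is $\le1$ is the unique one in ⑨. A short sketch, with $\delta_k=2^{-(r+k)}$ as in the text and illustrative values of $n$, $\varepsilon$ and $r$:

```python
import math


def k_of_n(n, eps, r):
    """Unique k >= 0 with 1/2 < 8 * delta_k * sqrt(n) / eps <= 1, delta_k = 2**-(r + k).

    Requires n > n_0 = eps**2 / (256 * delta_0**2)."""
    delta0 = 2.0 ** -r
    if n <= eps ** 2 / (256 * delta0 ** 2):
        raise ValueError("n must exceed n_0")
    k = 0
    while 8 * 2.0 ** -(r + k) * math.sqrt(n) / eps > 1:
        k += 1
    return k
```

For example, with $\varepsilon=0.5$ and $r=12$ one has $n_0=16384$, and $n=10^6$ gives $k(n)=2$.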
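Also
$$|\beta_n(A_{k\,i(k)})-\beta_n(A_{0\,i(0)})|\le\sum_{0\le\ell<k}\big[|\beta_n(B_\ell)|+|\beta_n(D_\ell)|\big].$$
Let $\mathcal S_\ell$ be the collection of sets $B=A_{\ell+1,m}\setminus A_{\ell j}$ or $B=A_{\ell j}\setminus A_{\ell+1,m}$ with $j\in\{1,\dots,m(\ell)\}$ and $m\in\{1,\dots,m(\ell+1)\}$, respectively, such that $\mu(B)<\delta_\ell$. Then, for each $C\in\mathcal C$, $B_\ell(C)$ and $D_\ell(C)\in\mathcal S_\ell$. The number of sets in $\mathcal S_\ell$ is bounded by
⑪ $|\mathcal S_\ell|\le2\,m(\ell)\,m(\ell+1)$.
For later use, note that (by the definition of $b_\ell$) $m(\ell)=\exp(2^\ell b_\ell^2)$. Let
$$d_\ell:=\max\big((\ell+1)^{-2}\varepsilon/32,\ 6\,b_{\ell+1}\,\delta_0^{1/2}\big);$$
then, by ⑧ (and since $\sum_\ell(\ell+1)^{-2}=\pi^2/6$, so that $\sum_\ell d_\ell<\frac{\pi^2}{6}\cdot\frac{\varepsilon}{32}+\frac{6\varepsilon}{96}$),
⑫ $\sum_\ell d_\ell<\varepsilon/8$.
For each $\ell\le k=k(n)$, $n>n_0$, we have $n^{1/2}\delta_\ell\ge n^{1/2}\delta_k>\varepsilon/16$ by ⑨, and thus, by ⑫, $d_\ell\,n^{-1/2}\le2\delta_\ell$. Now, Lemma 4 (ii) of Section 1 provides an exponential bound for $P_B:=P(|\beta_n(B)|>d_\ell)$, $B\in\mathcal S_\ell$; since $\mu(B)<\delta_\ell$ and $d_\ell\,n^{-1/2}\le2\delta_\ell$, it yields
$$P_B\le2\exp\big(-d_\ell^2/(8\delta_\ell)\big).$$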
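The bound ⑫ only uses $\max(a,b)\le a+b$, $\sum_\ell(\ell+1)^{-2}=\pi^2/6$ and $\sum_kb_k\delta_0^{1/2}<\varepsilon/96$. The following sketch checks this numerically. As an illustrative assumption it again takes $N_I(x)=6/x$, so that $s_k:=b_k\delta_0^{1/2}=(2^{-(r+k)}\log(6\cdot2^{r+k}))^{1/2}$, and it increases $r$ until $\sum_ks_k<\varepsilon/96$.

```python
import math


def d_sequence(eps, s):
    """d_l = max((l+1)**-2 * eps / 32, 6 * s[l+1]), where s[k] = b_k * delta_0**0.5."""
    return [max(eps / (32 * (l + 1) ** 2), 6 * s[l + 1]) for l in range(len(s) - 1)]


def s_term(r, k):
    # Illustrative assumption N_I(x) = 6 / x, i.e. log m(k) = log(6 * 2**(r + k)).
    return math.sqrt(2.0 ** -(r + k) * math.log(6 * 2.0 ** (r + k)))
```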
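Let $P_\ell:=P(|\beta_n(B)|>d_\ell\ \text{for some } B\in\mathcal S_\ell)$. Then, using ⑪ and $m(\ell)\le m(\ell+1)$, we obtain
$$P_\ell\le4[m(\ell+1)]^2\exp\big(-d_\ell^2/(8\delta_\ell)\big)=4\exp\big(2^{\ell+2}b_{\ell+1}^2-d_\ell^2/(8\delta_\ell)\big).$$
Now, by the definition of $d_\ell$, $d_\ell^2\ge36\,b_{\ell+1}^2\,\delta_0$, i.e. $2^{\ell+2}b_{\ell+1}^2\le d_\ell^2/(9\delta_\ell)$, and $d_\ell^2\ge\varepsilon^2/((32)^2(\ell+1)^4)$; hence
$$P_\ell\le4\exp\big(-d_\ell^2/(72\,\delta_\ell)\big),$$
which, by the choice of $d_\ell$ and $\delta_0$, is bounded by $4\exp\big(-2^{\ell+1}\varepsilon^2/(9000(\ell+1)^4)\big)$. Thus, by ⑧,
$$\sum_\ell P_\ell<4\cdot\varepsilon/32=\varepsilon/8.$$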
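Next, again for $k=k(n)$, $n>n_0$, let
$$V_n:=\sup_{C\in\mathcal C}|\beta_n(A_{k\,j(k)}\setminus A_{k\,i(k)})|\quad\text{and}\quad Q_n:=P(V_n>\varepsilon/8).$$
Then, by Lemma 4 (i) of Section 1 and ⑨ (applied to the at most $[m(k)]^2$ sets $A_{kj}\setminus A_{ki}$ of μ-measure $<\delta_k$), and since $[m(k)]^2=\exp(2^{k+1}b_k^2)$, one obtains a bound of the form
$$Q_n\le2\exp\big[2^k\big(2b_k^2-c\,\varepsilon^2/\delta_0\big)\big]$$
with an absolute constant $c>0$. Now, for $s:=k+r$,
$$2b_k^2=2^{1-k}\log m(k)=2^{1-k}\log N_I(2^{-s})\le2^{1-k}\,2^s\,\varepsilon^2/800=\varepsilon^2/(400\,\delta_0)$$
by ⑦ (note that $2^{-s}\le\delta_0\le\gamma$). Thus $Q_n\le2\exp\big[2^k\big(\varepsilon^2/(400\,\delta_0)-c\,\varepsilon^2/\delta_0\big)\big]$, which, for $\delta_0\le\alpha$, gives $Q_n<\varepsilon/4$.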
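Now, if $V_n\le\varepsilon/8$, then by ⑩ $|\beta_n(C)-\beta_n(A_{k\,i(k)})|\le\varepsilon/4$ for all $C\in\mathcal C$, and therefore
$$E_1(\varepsilon,\delta_0,n)=(E_1\cap\{V_n>\varepsilon/8\})\cup(E_1\cap\{V_n\le\varepsilon/8\})\subset\{V_n>\varepsilon/8\}\cup W_n$$
with
$$W_n:=\{\sup_{C\in\mathcal C}|\beta_n(A_{k\,i(k)})-\beta_n(A_{0\,i(0)})|>\varepsilon/4\}.$$
Now
$$W_n\subset W_n':=\Big\{\sup_{C\in\mathcal C}\sum_{0\le\ell<k}|\beta_n(B_\ell(C))|>\varepsilon/8\Big\}\cup\Big\{\sup_{C\in\mathcal C}\sum_{0\le\ell<k}|\beta_n(D_\ell(C))|>\varepsilon/8\Big\},$$
where, according to ⑫ (note that $B_\ell(C),D_\ell(C)\in\mathcal S_\ell$),
$$P(W_n')\le\sum_\ell P_\ell+\sum_\ell P_\ell<\varepsilon/4;$$
thus, together with $P(V_n>\varepsilon/8)=Q_n<\varepsilon/4$, it follows that
$$P^*(E_1(\varepsilon,\delta_0,n))<\varepsilon/2\quad\text{for } n>n_0.$$
This proves (+) and concludes the proof of Theorem A. □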
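In the countable case discussed in Remark (57) below, condition (E$_I$) reduces to $\sum_{x\in X}\mu(\{x\})^{1/2}<\infty$. The sketch compares two illustrative laws on $X=\{1,2,\dots\}$: $\mu(\{x\})=6/(\pi^2x^2)$, for which $\sum\mu(\{x\})^{1/2}=\frac{\sqrt6}{\pi}\sum x^{-1}$ diverges, and $\mu(\{x\})=1/(\zeta(3)x^3)$, for which it converges. The two laws are assumptions chosen for contrast. Numerically, the partial sums of the first keep growing between $N=10^3$ and $N=10^5$, while those of the second barely move.

```python
import math

ZETA3 = 1.2020569031595942  # zeta(3)


def sqrt_mass_sum(prob, N):
    """Partial sum  sum_{x=1}^{N} mu({x})**0.5  for a law on {1, 2, ...}."""
    return sum(math.sqrt(prob(x)) for x in range(1, N + 1))


def p2(x):
    # mu({x}) = 6 / (pi^2 x^2): the square-root masses are not summable
    return 6.0 / (math.pi ** 2 * x ** 2)


def p3(x):
    # mu({x}) = 1 / (zeta(3) x^3): the square-root masses are summable
    return 1.0 / (ZETA3 * x ** 3)
```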
(57) REMARK. The above proof shows that the two conditions (a) and (b) of Theorem B are implied by (E$_I$) without imposing (M). I.S. Borisov (1981) has shown that (E$_I$) cannot be weakened: it is necessary in the case where $\mathcal C$ is the collection of all subsets of a countable set $X$, where (E$_I$) is equivalent to $\sum_{x\in X}(\mu(\{x\}))^{1/2}<\infty$; cf. also M. Durst and R.M. Dudley (1980).
(58) EXAMPLE. As an illustration of the applicability of Theorem A, we will show that in $(X,\mathcal B)=(\mathbb R^k,\mathcal B^k)$, $k\ge1$, the class $\mathcal C=\mathcal J_k$ of all lower left orthants is a μ-Donsker class for any p-measure μ on $\mathcal B^k$ (proved by M.D. Donsker (1952) for $k=1$ and by R.M. Dudley (1966) for $k\ge1$).
As remarked in (48)(a), condition (M) holds true for $\mathcal J_k$; so, by Theorem A, we must show that (E$_I$) is fulfilled.
a) For $k=1$, consider for any $0<\varepsilon\le1$ the partition $-\infty=:t_0<t_1<\dots<t_{m-1}<t_m:=\infty$ of $\mathbb R$, where
$$t_{i+1}:=\sup\{t\in\mathbb R:\ \mu((t_i,t))\le\varepsilon/2\}.$$
Since $\mu((t_i,t_{i+1}])\ge\varepsilon/2$ and $\mu(\mathbb R)=1$, we have $m-1\le2/\varepsilon$. Then, taking as $A_i$'s in the definition of $N_I(\varepsilon,\mathcal J_1,\mu)$ all sets of the form $(-\infty,t_i)$ and $(-\infty,t_i]$, $1\le i\le m-1$, together with $\emptyset$ and $\mathbb R$, we obtain
$$\min\{n\in\mathbb N:\ \exists A_1,\dots,A_n\in\mathcal B\ \text{s.t. for all } C\in\mathcal J_1\ \text{there exist } i,j \text{ with } A_i\subset C\subset A_j\ \text{and}\ \mu(A_j\setminus A_i)<\varepsilon\}\le2(m-1)+2=2m\le4/\varepsilon+2\le6/\varepsilon.$$
This implies that $\log N_I(\varepsilon^2,\mathcal J_1,\mu)\le\log(6/\varepsilon^2)$, showing that (E$_I$) is fulfilled for $k=1$.
b) For $k>1$ the result is an immediate consequence of a) and inequality (59) of the following lemma (formulated in greater generality than needed in the present case).
LEMMA. Let $(X,\mathcal B)$ be a measurable space and let μ be a probability measure on the product σ-algebra $\bigotimes^k\mathcal B$ in $X^k$, $k\ge1$, with marginal laws $\pi_i\mu$ on $\mathcal B$, $i=1,\dots,k$. Let $\mathcal C_i\subset\mathcal B$, $i=1,\dots,k$, be given classes of sets, and let
$$\mathcal C:=\{\textstyle\times_{i=1}^kC_i:\ C_i\in\mathcal C_i,\ i=1,\dots,k\}.$$
Then
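(59) $N_I(\varepsilon,\mathcal C,\mu)\le\prod_{i=1}^kN_I(\varepsilon/k,\mathcal C_i,\pi_i\mu)$.
Proof. We may and do assume that $n_i:=N_I(\varepsilon/k,\mathcal C_i,\pi_i\mu)<\infty$ for each $i=1,\dots,k$. Then there exist $A_{i1},\dots,A_{in_i}\in\mathcal B$ such that for any $C_i\in\mathcal C_i$ there exist $r_i,s_i\in\{1,\dots,n_i\}$ with
$$A_{ir_i}\subset C_i\subset A_{is_i}\quad\text{and}\quad\pi_i\mu(A_{is_i}\setminus A_{ir_i})<\varepsilon/k,\quad i=1,\dots,k.$$
This implies that
$$\textstyle\times_{i=1}^kA_{ir_i}\subset\times_{i=1}^kC_i\subset\times_{i=1}^kA_{is_i}$$
and
$$\mu\Big(\textstyle\times_{i=1}^kA_{is_i}\setminus\times_{i=1}^kA_{ir_i}\Big)\le\sum_{i=1}^k\mu(B_i)=\sum_{i=1}^k\pi_i\mu(A_{is_i}\setminus A_{ir_i})<\varepsilon$$
(with $B_i:=X\times\dots\times X\times(A_{is_i}\setminus A_{ir_i})\times X\times\dots\times X$, the difference standing in the $i$-th coordinate). Since there are at most $\prod_{i=1}^kn_i$ approximating sets of the form $\times_{i=1}^kA_{ir_i}$, (59) follows. □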
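The $k=1$ construction in a) can be sketched for a purely atomic μ given by finitely many (point, mass) pairs; this discreteness is an assumption made so that everything is computable. For such μ, $t_{i+1}=\sup\{t:\mu((t_i,t))\le\varepsilon/2\}$ is the first atom after $t_i$ at which the accumulated mass exceeds $\varepsilon/2$. The orthant $(-\infty,x]$ is then bracketed by $(-\infty,t_i]$ and $(-\infty,t_{i+1})$, whose difference $(t_i,t_{i+1})$ has mass at most $\varepsilon/2$. The function names are hypothetical.

```python
import math


def orthant_partition(atoms, eps):
    """atoms: list of (point, mass), sorted by point, masses summing to 1.
    Returns t_1 < ... < t_{m-1}, where t_{i+1} = sup{t : mu((t_i, t)) <= eps/2}."""
    ts = []
    current = -math.inf
    while True:
        acc, nxt = 0.0, math.inf
        for x, w in atoms:
            if x <= current:
                continue
            acc += w
            if acc > eps / 2:  # mu((t_i, x]) > eps/2 while mu((t_i, x)) <= eps/2
                nxt = x
                break
        if nxt == math.inf:
            return ts
        ts.append(nxt)
        current = nxt


def bracket_gap(atoms, ts, x):
    """mu(outer minus inner) for the bracket (-inf, lower] subset (-inf, x] subset (-inf, upper)."""
    lower = max([t for t in ts if t <= x], default=-math.inf)
    upper = min([t for t in ts if t > x], default=math.inf)
    return sum(w for p, w in atoms if lower < p < upper)
```

The number of bracketing sets is $2(m-1)+2=2m$ with $m-1=$ `len(ts)`.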
SOME REMARKS ON OTHER MEASURABILITY ASSUMPTIONS AND FURTHER RESULTS: Instead of (M), Dudley (1978) used the following measurability assumption (M$_0$) (again w.r.t. the canonical model $(\Omega,\mathcal F,P)=(X^{\mathbb N},\mathcal B^{\mathbb N},\times\mu)$):
(M$_0$): $\beta_n:\Omega\to S\equiv D_0(\mathcal C,\mu)$ is $\bar{\mathcal F},\mathcal B_b(S,\rho)$-measurable, where $\bar{\mathcal F}$ denotes the measure-theoretic completion of $\mathcal F$ w.r.t. $P=\times\mu$.
Imposing (M), it follows that $\beta_n$ is $\mathcal F,\mathcal B_b(S,\rho)$-measurable, whence (M) implies (M$_0$). On the other hand, replacing $\mathcal A=\sigma(\{\pi_C: C\in\mathcal C\})$ by
$$\mathcal A_0:=\sigma(\{\pi_C: C\in\mathcal C\}\cup\{\rho(\cdot,\varphi):\varphi\in S\})$$
and imposing (M$_0$) instead of (M), it follows that $\beta_n$ is $\bar{\mathcal F},\mathcal A_0$-measurable, where $\mathcal B_b(S,\rho)\subset\mathcal A_0\subset\mathcal B(S,\rho)$ (cf. (47)); this means that under (M$_0$), too, one meets the basic model of Section 3. Thus Theorem A and Theorem B hold as well (with the same proof) if (M) is replaced by (M$_0$).
Besides (M$_0$), Dudley (1978) introduced a second measurability assumption (M$_1$) (called a μ-Suslin property for $\mathcal C$), stronger than (M$_0$), which turned out to be verifiable in cases of interest where (M) or (SE) fails to hold (cf. (48)(b)). As shown in Gaenssler (1983), Theorem A (with (M) replaced by (M$_0$)) yields a functional central limit theorem for empirical $\mathcal C$-processes indexed by classes $\mathcal C$ that admit a finite-dimensional parametrization in the sense of the following theorem:
THEOREM C. Let $X$ be a locally compact, separable metric space, $\mathcal B=\mathcal B(X)$ the σ-algebra of Borel sets in $X$, and let $K$ be a compact subset of $\mathbb R^\ell$, $\ell\ge1$. Suppose that $f: X\times K\to\mathbb R$ is a function satisfying the following conditions (i)-(iii) ((iii) with respect to a given probability measure μ on $\mathcal B$):
(i) $f_z:=f(\cdot,z): X\to\mathbb R$ is continuous for each $z\in K$;
(ii) $f_\cdot(x): K\to\mathbb R$ is "uniformly Lipschitz", i.e.,
$$M:=\sup_{x\in X}\sup\{|f_z(x)-f_{z'}(x)|/|z-z'|:\ z\ne z',\ z,z'\in K\}<\infty$$
(where $|z-z'|$ denotes the Euclidean distance between $z$ and $z'$);
(iii) $\mu(\{f_z\in[-\varepsilon,\varepsilon)\})=O(\varepsilon)$ uniformly in $z\in K$.
Let $\mathcal C\subset\mathcal B$ be defined by
$$\mathcal C:=\{\{f_z\ge0\}: z\in K\}.$$
Then $\mathcal C$ is a μ-Donsker class; furthermore, (M$_1$) (and therefore also (M$_0$)) is satisfied for $\mathcal C$ and μ.
(60) EXAMPLES. (a): Let $(X,\mathcal B,\mu)=([0,1]^k,[0,1]^k\cap\mathcal B^k,\lambda_k)$, $k\ge1$, $\lambda_k$ being the $k$-dimensional Lebesgue measure on $[0,1]^k\cap\mathcal B^k$, and let $\mathcal C\subset\mathcal B$ be the class of all closed Euclidean balls in $[0,1]^k$. Then $\mathcal C$ is a $\lambda_k$-Donsker class, and (M$_1$) is satisfied for $\mathcal C$ and $\lambda_k$.
In fact, take
$$K:=\{z=(y,r):\ y\in[0,1]^k,\ 0\le r\le r_y:=\sup\{r: B^c(y,r)\subset[0,1]^k\}\},$$
where $B^c(y,r):=\{x\in\mathbb R^k: e(x,y)\le r\}$ ($e$ denoting the Euclidean distance in $[0,1]^k$), and define $f:[0,1]^k\times K\to\mathbb R$ by
$$f(x,z):=e(x,\complement B^\circ(y,r))-e(x,B^c(y,r))\ \ (=r-e(x,y)),\quad z=(y,r)\in K,$$
where $B^\circ(y,r):=\{x\in\mathbb R^k: e(x,y)<r\}$. Then $\{\{f_z\ge0\}: z\in K\}$ is the class of all closed Euclidean balls in $X=[0,1]^k$, and it is easy to verify (i)-(iii) of Theorem C, giving the result.
(b) (cf. (48)(b)): Consider the same p-space $(X,\mathcal B,\mu)$ as in (a), and let
$$\mathcal C:=\{(C+z)\cap[0,1]^k:\ z\in[0,1]^k\},$$
$C$ being a fixed closed and convex subset of $X=[0,1]^k$, $k\ge1$ (cf. R. Pyke (1979)). As in (a), let
$$f(x,z):=e(x,\complement C_z^\circ)-e(x,C_z),\quad x,z\in[0,1]^k,$$
with $C_z:=C+z$ and $C_z^\circ$ denoting the interior of $C_z$. Then $\mathcal C$ is a $\lambda_k$-Donsker class, and (M$_1$) is satisfied for $\mathcal C$ and $\lambda_k$. This follows again from Theorem C; for this we have to verify conditions (i)-(iii) there and also that
(+) $\mathcal C=\{\{f_z\ge0\}: z\in[0,1]^k\}$, i.e., that $C_z\cap[0,1]^k=\{f_z\ge0\}$ for each $z\in[0,1]^k$.
ad (+): $x\in C_z$ implies that $e(x,C_z)=0$, whence $f_z(x)=e(x,\complement C_z^\circ)\ge0$. On the other hand, if $x\in\complement C_z$, then $e(x,C_z)>0$, since $C_z$ is closed, and $e(x,\complement C_z^\circ)=0$, whence $f_z(x)=-e(x,C_z)<0$; this shows (+).
ad (i): This follows immediately from the fact that for any $\emptyset\ne A\subset X$, $|e(x_1,A)-e(x_2,A)|\le e(x_1,x_2)$ for all $x_1,x_2\in X$.
ad (ii): Let $f_z'(x):=e(x,\complement C_z^\circ)$ and $f_z''(x):=e(x,C_z)$, i.e., $f_z(x)=f_z'(x)-f_z''(x)$ for all $x\in[0,1]^k$. Then it suffices to show that both $f_\cdot'(x)$ and $f_\cdot''(x)$ are uniformly Lipschitz. As to $f_\cdot'(x)$, this follows from
(1) $\forall x\in[0,1]^k$: $|e(x,\complement C_z^\circ)-e(x,\complement C_{z'}^\circ)|\le e(z,z')$ for all $z,z'\in[0,1]^k$.
ad (1): We use the following fact, which is easy to prove:
(†) For any closed $F\subset[0,1]^k$ and any $x\in F^\circ$ there exists a $w\in\partial F$ such that $e(x,w)=e(x,\complement F^\circ)$.
Now, given any $x\in[0,1]^k$, let w.l.o.g. $z$ and $z'$ be such that $x\in C_z^\circ\cap C_{z'}^\circ$. Applying (†) for $F=C_z$ and $F=C_{z'}$, respectively, we obtain
$$e(x,\complement C_z^\circ)=e(x,w_{x,z})\quad\text{and}\quad e(x,\complement C_{z'}^\circ)=e(x,w_{x,z'})$$
for some $w_{x,z}\in\partial C_z$ and $w_{x,z'}\in\partial C_{z'}$, respectively. Furthermore, since $C_z$ and $C_{z'}$ are closed, $w_{x,z}=c_{x,z}+z$ and $w_{x,z'}=c_{x,z'}+z'$ for some $c_{x,z},c_{x,z'}\in\partial C$, respectively, and
(‡) $e(x,c_{x,z}+z)\le e(x,c_{x,z'}+z)$ and $e(x,c_{x,z'}+z')\le e(x,c_{x,z}+z')$, respectively.
Thus
$$e(x,\complement C_z^\circ)-e(x,\complement C_{z'}^\circ)=e(x,c_{x,z}+z)-e(x,c_{x,z'}+z')\le e(x,c_{x,z'}+z)-e(x,c_{x,z'}+z')\le e(c_{x,z'}+z,\,c_{x,z'}+z')=e(z,z'),$$
which, by symmetry, proves (1). That $f_\cdot''$ is also uniformly Lipschitz follows from
(2) $\forall x\in[0,1]^k$: $|e(x,C_z)-e(x,C_{z'})|\le e(z,z')$ for all $z,z'\in[0,1]^k$.
ad (2): Given any $x\in[0,1]^k$ and any $\varepsilon>0$, there exists a $c=c(x,\varepsilon)\in C$ such that $e(x,c+z)\le e(x,C_z)+\varepsilon$, and thus, for all $z,z'\in[0,1]^k$,
$$e(x,C_{z'})\le e(x,c+z')\le e(x,c+z)+e(c+z,c+z')=e(x,c+z)+e(z,z')\le e(x,C_z)+\varepsilon+e(z,z')$$
for any $\varepsilon>0$, whence $e(x,C_{z'})-e(x,C_z)\le e(z,z')$, yielding (2) by symmetry.
Before proving (iii), let us remark that so far we have only used that $C$ is a closed subset of $[0,1]^k$; for proving (iii), some smoothness of the boundary of $C$ is needed in addition. So we will now use that $C$ is convex.
ad (iii): We must show that $\lambda_k(\{f_z\in[-\varepsilon,\varepsilon)\})=O(\varepsilon)$ uniformly in $z\in K$. For this it suffices to prove
(α) $\{f_z\in[-\varepsilon,\varepsilon)\}\subset C_z^\varepsilon\setminus{}_\varepsilon C_z$ for all $z\in[0,1]^k$, and
(β) $\sup_{z\in[0,1]^k}\lambda_k(C_z^\varepsilon\setminus{}_\varepsilon C_z)\le c_k\,\varepsilon$ for $\varepsilon\downarrow0$, with some constant $c_k$ depending only on $k$.
(Here $A^\varepsilon:=\{x: e(x,A)\le\varepsilon\}$ and ${}_\varepsilon A:=\{x: e(x,\complement A)>\varepsilon\}$.)
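ad (α): Note that for $x\in X$
(a) $f_z(x)=-e(x,C_z)$ iff $x\in\complement C_z^\circ$,
(b) $f_z(x)=e(x,\complement C_z^\circ)$ iff $x\in C_z$, and
(c) $f_z(x)=0$ iff $x\in\partial C_z$.
Thus (note that $X=[(\complement C_z^\circ)\setminus\partial C_z]+[C_z\setminus\partial C_z]+\partial C_z$):
for $x\in(\complement C_z^\circ)\setminus\partial C_z$, $-\varepsilon\le f_z(x)<\varepsilon$ implies by (a) that $-\varepsilon<e(x,C_z)\le\varepsilon$, whence $x\in C_z^\varepsilon\setminus{}_\varepsilon C_z$;
for $x\in C_z\setminus\partial C_z$, $-\varepsilon\le f_z(x)<\varepsilon$ implies by (b) that $-\varepsilon\le e(x,\complement C_z^\circ)<\varepsilon$, whence $x\in C_z^\varepsilon\setminus{}_\varepsilon C_z$;
for $x\in\partial C_z$, (c) gives $f_z(x)=0$, and $x\in C_z^\varepsilon\setminus{}_\varepsilon C_z$.
This proves (α).
ad (β): Due to the translation invariance of $\lambda_k$, (β) is equivalent to $\lambda_k(C^\varepsilon\setminus{}_\varepsilon C)\le c_k\,\varepsilon$ as $\varepsilon\downarrow0$.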
Now, as shown in Gaenssler (1981), one has
(61) $\sup_{C\in\mathfrak C_k}\lambda_k(C^\varepsilon\setminus{}_\varepsilon C)\le c_k\,\varepsilon$ as $\varepsilon\downarrow0$,
where $\mathfrak C_k$ denotes the class of all convex Borel sets in $[0,1]^k$, $k\ge1$. This proves the assertion of Example (b). □
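For a disk the set in (β) and (61) is an explicit annulus. If $C$ is the closed disk of radius $R$ centred at the origin and $\varepsilon\le R$, then $C^\varepsilon\setminus{}_\varepsilon C=\{x:\ |\,|x|-R\,|\le\varepsilon\}$, with area $\pi((R+\varepsilon)^2-(R-\varepsilon)^2)=4\pi R\varepsilon$. This set also contains $\{f_z\in[-\varepsilon,\varepsilon)\}$ for the ball function $f(x,z)=r-e(x,y)$ of Example (a). The Monte Carlo sketch below estimates the area with uniform points in $[-1,1]^2$; the radius, the values of $\varepsilon$ and the sample size are illustrative assumptions.

```python
import math
import random


def annulus_area_mc(R, eps, n, seed=0):
    """Monte Carlo estimate of lambda_2(C^eps minus _eps C) for the closed disk C of
    radius R centred at 0, using n uniform points in [-1, 1]^2 (needs R + eps <= 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(math.hypot(x, y) - R) <= eps:  # e(x, C) <= eps and e(x, complement C) <= eps
            hits += 1
    return 4.0 * hits / n  # the square has area 4
```

The estimate scales linearly in $\varepsilon$, in line with the $O(\varepsilon)$ requirement in (iii).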
(62) ADDITIONAL REMARKS. (a): The above considerations show that the set of all translates of a fixed closed, but not necessarily convex, set $C$ is a $\lambda_k$-Donsker class, provided that $C$ has a smooth boundary in the sense that $\lambda_k(C^\varepsilon\setminus{}_\varepsilon C)=O(\varepsilon)$. Based on a result of E.M. Bronštein (1976), it was shown by Dudley (1981a) that for the class $\mathfrak C_k$
(63) $\log N_I(\varepsilon,\mathfrak C_k,\lambda_k)\le M\,\varepsilon^{(1-k)/2}$
for $0<\varepsilon\le1$ and some constant $M<\infty$ depending only on $k$. For $k=2$ this yields that (E$_I$) is fulfilled for $\mathfrak C_2$ and $\lambda_2$; in fact,
$$\int_0^1\big(\log N_I(x^2,\mathfrak C_2,\lambda_2)\big)^{1/2}dx\le\int_0^1M^{1/2}x^{-1/2}\,dx<\infty,$$
implying a result of Bolthausen (1978), according to which $\mathfrak C_2$ is a $\lambda_2$-Donsker class. But for $k\ge3$, (63) does not yield (E$_I$) for $\mathfrak C_k$ and $\lambda_k$, which is in accordance with a result of Dudley (1979a) showing that $\mathfrak C_k$ is not a $\lambda_k$-Donsker class for $k\ge3$.
(b): Let us reconsider the example in (60)(b), according to which, for any fixed closed and convex set $C$ in $X=[0,1]^k$, $k\ge1$, $\mathcal C=\{(C+z)\cap[0,1]^k: z\in[0,1]^k\}$ is a $\lambda_k$-Donsker class, and (M$_1$) (and therefore also (M$_0$)) is satisfied for $\mathcal C$ and $\lambda_k$. The way we derived this result from Theorem C shows that $\lambda_k$ can be replaced by any p-measure μ on $[0,1]^k\cap\mathcal B^k$ having a bounded density w.r.t. $\lambda_k$, whence, by Theorem 3 (using (M$_0$)),
$$n^{1/2}\sup_{C\in\mathcal C}|\mu_n(C)-\mu(C)|\xrightarrow{\mathcal L}\sup_{C\in\mathcal C}|\mathbb G_\mu(C)|,$$
implying (note that w.l.o.g. $\mathbb G_\mu(\cdot,\omega)\in S_0$, and therefore $\sup_{C\in\mathcal C}|\mathbb G_\mu(C,\omega)|<\infty$ for each ω)
$$\sup_{C\in\mathcal C}|\mu_n(C)-\mu(C)|=D_n(\mathcal C,\mu)\xrightarrow{P}0,$$
which is equivalent to $D_n(\mathcal C,\mu)\to0$ P-a.s. according to Lemma 6 in Section 1. (Note that the necessary measurability of $D_n(\mathcal C,\mu)$ is implied by (M$_0$).) Thus $\mathcal C=\{(C+z)\cap[0,1]^k: z\in[0,1]^k\}$ is also a Glivenko-Cantelli class (compare this with our conjecture at the end of Section 2 stating that $\mathcal C$ is not a Vapnik-Chervonenkis class). Of course the above reasoning works in general, i.e. one gets
(64) Any μ-Donsker class $\mathcal C$ satisfying (M$_0$) is also a Glivenko-Cantelli class.
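The Glivenko-Cantelli conclusion in (62)(b) can be illustrated in the simplest case $k=1$, $\mu=\lambda_1$, with $C=[0.2,0.5]$ chosen for illustration. The supremum over translates $z\in[0,1]$ is only approximated on a grid, an assumption of the sketch. A useful deterministic check: each $(C+z)\cap[0,1]$ is an interval, so its discrepancy is at most twice the Kolmogorov-Smirnov distance $\sup_t|F_n(t)-t|$.

```python
import bisect
import random


def translate_discrepancy(xs, a, b, steps=2000):
    """Grid approximation of sup_z |mu_n((C+z) ∩ [0,1]) - lambda((C+z) ∩ [0,1])|
    for C = [a, b] in [0, 1] and mu = Lebesgue measure on [0, 1]."""
    ys = sorted(xs)
    n = len(ys)
    worst = 0.0
    for j in range(steps + 1):
        z = j / steps
        lo, hi = a + z, min(b + z, 1.0)
        if lo > 1.0:
            continue
        emp = (bisect.bisect_right(ys, hi) - bisect.bisect_left(ys, lo)) / n
        worst = max(worst, abs(emp - (hi - lo)))
    return worst


def ks_statistic(xs):
    """sup_t |F_n(t) - t| for a sample from the uniform law on [0, 1]."""
    ys = sorted(xs)
    n = len(ys)
    return max(max((i + 1) / n - y, y - i / n) for i, y in enumerate(ys))
```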
(c): Let $(X,\mathcal B)$ be the Euclidean space $\mathbb R^k$, $k\ge1$, with its Borel σ-algebra $\mathcal B=\mathcal B^k$, and let μ be any p-measure on $\mathcal B^k$. Let $\mathfrak B_k$ be the class of all closed Euclidean balls in $\mathbb R^k$. As shown at the end of Section 2, $\mathfrak B_k$ is a Vapnik-Chervonenkis class (VCC); we also know from (48)(a) that (M) holds true for $\mathfrak B_k$. Furthermore, as pointed out in Gaenssler (1983), (M$_1$) is also satisfied for $\mathcal C=\mathfrak B_k$ and any μ on $\mathcal B^k$. Thus, for any μ, $\mathfrak B_k$ is a μ-Donsker class according to the following general result of R.M. Dudley ((1978), Theorem 7.1), stated here without proof (cf. also D. Pollard (1981)):
THEOREM D. Let $(X,\mathcal B,\mu)$ be an arbitrary sample space and let $\mathcal C\subset\mathcal B$ be a VCC such that (M$_1$) is satisfied for $\mathcal C$ and μ; then $\mathcal C$ is a μ-Donsker class.
(d): A condition like (61) was also basic for the results of R. Pyke (1977 and 1982) on the Haar function construction of Brownian motion indexed by sets and
on functional limit theorems for partial-sum processes indexed by sets (1982a). In fact, Pyke considers classes $\mathcal C$ of closed sets in $X=[0,1]^k$, $k\ge1$, fulfilling (besides an entropy condition) the following two conditions:
A1. There is a constant $c>0$ such that for all $\varepsilon>0$ and $C\in\mathcal C$, $\lambda_k(C^\varepsilon\setminus{}_\varepsilon C)\le c\,\varepsilon$.
A2. $\mathcal C$ is totally bounded with respect to the Hausdorff metric $d_H$ defined by $d_H(C,D):=\inf\{\varepsilon>0:\ C\subset D^\varepsilon\ \text{and}\ D\subset C^\varepsilon\}$ for $C,D\in\mathcal C$ (where $C^\varepsilon:=\{x: e(x,C)\le\varepsilon\}$).
In another very important and original contribution on weak convergence of empirical processes, T.G. Sun and R. Pyke (1982) study a certain index family $\mathcal C$ of closed sets in $[0,1]^k$, $k\ge1$, closely related to one introduced by Dudley (1974), and show in particular that this class fulfills A1.
In contrast to Dudley's (1978) approach to functional central limit theorems for empirical measures (i.e., empirical $\mathcal C$-processes), the paper of Sun and Pyke (based on results of Sun's thesis (1977)) first studies a SMOOTHED VERSION of the empirical processes. It is obtained by replacing the unit point mass assigned to each observation by a uniform distribution of equal mass on a small ball of radius $r$ centered at the observation (in the sample space $(X,\mathcal B,\mu)=([0,1]^k,[0,1]^k\cap\mathcal B^k,\lambda_k)$); i.e., $\beta_n(C,\omega)$ is replaced by
$$\beta_n^r(C,\omega):=n^{-1/2}\sum_{i=1}^n\zeta_i^r(C,\omega)\quad\text{with}\quad\zeta_i^r(C,\omega):=\frac{\lambda_k(C\cap B^c(\xi_i(\omega),r))}{\lambda_k(B^c(\xi_i(\omega),r))}-\lambda_k(C),$$
where $B^c(\xi_i(\omega),r)$ denotes the closed ball of radius $r$ centered at the observation $\xi_i(\omega)$.
This approach has the advantage that the smoothed version has continuous sample paths in the space of all $d$-continuous functions on $\mathcal C$. The remaining steps in the Sun-Pyke approach are then to show the uniform (w.r.t. $\mathcal C$) closeness of the smoothed and unsmoothed versions, and to establish weak sequential compactness, which amounts to verifying a condition like (b) in Theorem B on the uniform (w.r.t. $n$) behaviour of the modulus of continuity.
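The effect of the smoothing can be seen already for $k=1$ and the sets $C=[0,t]$. Here $\zeta_i^r([0,t])=\lambda_1([0,t]\cap[\xi_i-r,\xi_i+r])/(2r)-t$, which is Lipschitz in $t$. It differs from $1_{[0,t]}(\xi_i)-t$ only when $\xi_i$ is within distance $r$ of $t$ (or of the boundary point 0), and then by at most 1. The sketch below checks both facts on a grid; the sample size, the radius and the seed are illustrative assumptions.

```python
import math
import random


def overlap(a, b, c, d):
    """Length of [a, b] ∩ [c, d]."""
    return max(0.0, min(b, d) - max(a, c))


def smoothed_process(xs, r, t):
    """beta_n^r([0, t]) for k = 1: each observation spread uniformly over [x - r, x + r]."""
    n = len(xs)
    return sum(overlap(0.0, t, x - r, x + r) / (2 * r) - t for x in xs) / math.sqrt(n)


def empirical_process(xs, t):
    """beta_n([0, t]) = n**-0.5 * (#{x_i <= t} - n t)."""
    n = len(xs)
    return (sum(1 for x in xs if x <= t) - n * t) / math.sqrt(n)
```

Unlike $t\mapsto\beta_n([0,t])$, which jumps by $n^{-1/2}$ at each observation, the smoothed path is Lipschitz with constant at most $n^{1/2}(1/(2r)+1)$.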
In this context the following mode of weak convergence is used (cf. R. Pyke and G. Shorack (1968)). Suppose $\eta_n$, $n\in\mathbb N$, and $\eta$ are defined on some p-space $(\Omega,\mathcal F,P)$ with values in a metric space $S=(S,d)$ (like, e.g., $D_0(\mathcal C,d_\mu)$), the $\eta_n$'s and $\eta$ not being assumed $\mathcal F,\mathcal A$-measurable for some σ-algebra $\mathcal A$ in $S$ (with $\mathcal B_b(S,d)\subset\mathcal A\subset\mathcal B(S)$). Then $\eta_n$ is said to converge weakly to $\eta$ iff
$$\lim_{n\to\infty}E(f(\eta_n))=E(f(\eta))$$
for all $f\in C_b(S)$ which are, in addition, such that each $f(\eta_n)$, $n\in\mathbb N$, and $f(\eta)$ is a random variable, i.e. $\mathcal F,\mathcal B$-measurable.
This concludes our remarks on other measurability assumptions and further results. For other extensions, the reader is referred to our concluding remarks at the end of Section 4. At this point we prefer to present some of the interesting results obtained by G. Shorack (1979).
FUNCTIONAL CENTRAL LIMIT THEOREMS FOR WEIGHTED EMPIRICAL PROCESSES: This part is concerned with some results on weak convergence of so-called weighted empirical processes. It supplements in another way our earlier remarks in Section 2 on the a.s. behaviour of weighted discrepancies, and at the same time gives a further illustration of the special results for the $D[0,1]$ case summarized at the end of Section 3. We will closely follow the presentation in Shorack's (1979) paper, using some modifications due to W. Schneemeier in a first draft of his Diploma Thesis, University of Munich, 1981/82.
Let $(\xi_{ni})_{1\le i\le n,\, n\in\mathbb{N}}$ be an array of row-wise independent random variables defined on some p-space $(\Omega,\mathcal{F},\mathbb{P})$. Their distribution functions $F_{ni}$, $1\le i\le n$, $n\in\mathbb{N}$, are concentrated on $[0,1]$, i.e., $F_{ni}(0) = 0$ and $F_{ni}(1) = 1$ for all $1\le i\le n$, $n\in\mathbb{N}$.
PETER GAENSSLER
Before introducing some weight functions $q$ as in Section 2, let us start with the following form of a WEIGHTED EMPIRICAL PROCESS $W_n$, based on $(\xi_{ni})$ and on a given array of so-called scores $(c_{ni})_{1\le i\le n,\, n\in\mathbb{N}}$:
(65) $$W_n(t) := n^{-1/2}\sum_{i=1}^n c_{ni}\,[1_{[0,t]}(\xi_{ni}) - F_{ni}(t)], \quad t\in[0,1],$$
where the constant scores $c_{ni}$ are assumed to satisfy
(66) $$n^{-1}\sum_{i=1}^n c_{ni}^2 = 1 \quad\text{for each } n.$$
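Definition (65) together with the normalization (66) can be evaluated directly. The following minimal sketch uses hypothetical helpers, not taken from Shorack (1979). It rescales arbitrary scores so that (66) holds and evaluates $W_n(t)$ for given observations and distribution functions $F_{ni}$. With $c_{ni}\equiv 1$ and uniform observations it reproduces the uniform empirical process $\alpha_n(t) = n^{1/2}(F_n(t) - t)$.

```python
import math
import random
from typing import Callable, Sequence


def normalize_scores(c: Sequence[float]) -> list:
    """Rescale scores so that n^{-1} * sum_i c_i^2 = 1, i.e. condition (66)."""
    n = len(c)
    scale = math.sqrt(sum(ci * ci for ci in c) / n)
    return [ci / scale for ci in c]


def weighted_empirical(t: float, xi: Sequence[float], c: Sequence[float],
                       F: Sequence[Callable[[float], float]]) -> float:
    """W_n(t) = n^{-1/2} * sum_i c_i * (1_{[0,t]}(xi_i) - F_i(t)), as in (65)."""
    n = len(xi)
    total = sum(ci * ((1.0 if x <= t else 0.0) - Fi(t))
                for ci, x, Fi in zip(c, xi, F))
    return total / math.sqrt(n)


def uniform_empirical(t: float, xi: Sequence[float]) -> float:
    """alpha_n(t) = n^{1/2} * (F_n(t) - t) for observations in [0, 1]."""
    n = len(xi)
    return math.sqrt(n) * (sum(1 for x in xi if x <= t) / n - t)


if __name__ == "__main__":
    random.seed(2)
    n = 20
    xi = [random.random() for _ in range(n)]
    c = normalize_scores([random.uniform(-2, 2) for _ in range(n)])
    F = [lambda t: t] * n  # uniform distribution functions
    print([round(weighted_empirical(k / 4, xi, c, F), 4) for k in range(5)])
```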
Note that for $c_{ni}\equiv 1$ and $\xi_{ni}$ uniformly distributed on $[0,1]$, $W_n$ reduces to the uniform empirical process $\alpha_n$ considered at the end of Section 3. As with $\alpha_n$ there, $W_n$ will be considered as a random element in $(D,\mathcal{B}_b(D,\rho))$ as well as in $(D,\mathcal{B}(D,s))$. Generalizing Donsker's functional central limit theorem for $\alpha_n$, we give sufficient conditions under which there exists a certain Gaussian stochastic process $W$ such that $W_n\xrightarrow{\mathcal{L}} W$. Here $W$ is a random element in $(D,\mathcal{B}_b(D,\rho))$ with $\mathcal{L}\{W\}(C) = 1$, and $C = C[0,1]$ is again the space of continuous functions on $[0,1]$.
Before proving one of the main results of Shorack (1979), Theorem 1.1, we will mention some basic facts and preliminary results.
(67) REMARKS. (a) It follows from (66) that $v_n$, defined by
$$v_n(t) := n^{-1}\sum_{i=1}^n c_{ni}^2F_{ni}(t), \quad t\in[0,1],$$
is a distribution function on $[0,1]$ (with $v_n(0) = 0$ and $v_n(1) = 1$).
(b) For each $0\le s,t\le 1$ we have $\mathbb{E}(W_n(t)) = 0$ and
$$K_n(s,t) := \operatorname{cov}(W_n(s),W_n(t)) = n^{-1}\sum_{i=1}^n c_{ni}^2\,[F_{ni}(s\wedge t) - F_{ni}(s)F_{ni}(t)],$$
whence
(68) $$\mathbb{E}(W_n^2(t))\le v_n(t) \quad\text{for all } t\in[0,1].$$
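Remarks (67) and the bound (68) can be checked numerically for any concrete choice of scores and distribution functions. The sketch below assumes, purely for illustration, $F_{ni}(t) = t^{p_i}$ with arbitrary exponents $p_i>0$. These are hypothetical distribution functions concentrated on $[0,1]$. The sketch evaluates $v_n$ and $K_n$ from their defining formulas, and the accompanying checks confirm $v_n(0) = 0$, $v_n(1) = 1$, monotonicity of $v_n$ and $K_n(t,t)\le v_n(t)$.

```python
import math
import random


def make_dfs(exponents):
    """Hypothetical distribution functions F_i(t) = t**p_i, concentrated on [0, 1]."""
    return [lambda t, p=p: t ** p for p in exponents]


def normalize_scores(c):
    """Rescale scores so that (66) holds: n^{-1} * sum_i c_i^2 = 1."""
    s = math.sqrt(sum(ci * ci for ci in c) / len(c))
    return [ci / s for ci in c]


def v_n(t, c, F):
    """v_n(t) = n^{-1} * sum_i c_i^2 F_i(t), cf. (67)(a)."""
    return sum(ci * ci * Fi(t) for ci, Fi in zip(c, F)) / len(c)


def K_n(s, t, c, F):
    """K_n(s, t) = n^{-1} * sum_i c_i^2 [F_i(min(s, t)) - F_i(s) F_i(t)], cf. (67)(b)."""
    u = min(s, t)
    return sum(ci * ci * (Fi(u) - Fi(s) * Fi(t)) for ci, Fi in zip(c, F)) / len(c)


if __name__ == "__main__":
    random.seed(4)
    c = normalize_scores([random.gauss(0, 1) for _ in range(10)])
    F = make_dfs([random.uniform(0.3, 3.0) for _ in range(10)])
    for t in (0.0, 0.25, 0.5, 1.0):
        print(t, v_n(t, c, F), K_n(t, t, c, F))
```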
In the following, let $F_{ni}(s,t] := F_{ni}(t) - F_{ni}(s)$, $v_n(s,t] := v_n(t) - v_n(s)$ and $W_n(s,t] := W_n(t) - W_n(s)$ for $0\le s\le t\le 1$. Then we have
LEMMA 22. (i) $\mathbb{E}(W_n^2(r,s]\,W_n^2(s,t])\le 3\,v_n(r,s]\,v_n(s,t]$, $0\le r\le s\le t\le 1$;
(ii) $\mathbb{E}(W_n^4(s,t])\le 3\,v_n^2(s,t] + \Big(\max_{1\le i\le n}\dfrac{c_{ni}^2}{n}\Big)\,v_n(s,t]$, $0\le s\le t\le 1$.
Proof (cf. G. Shorack (1979), INEQUALITY 1.1). Write $c_i$ for $n^{-1/2}c_{ni}$. For $0\le r\le s\le 1$ we have
(a) $W_n(r,s] = \sum_{i=1}^n c_iX_i(r,s)$ with $X_i(r,s) := 1_{(r,s]}(\xi_{ni}) - F_{ni}(r,s]$.
Furthermore, for $0\le r\le s\le t\le 1$:
(b1) $\mathbb{E}(X_i(r,s)) = 0$;
(b2) $\mathbb{E}(X_i^2(r,s)) = F_{ni}(r,s](1 - F_{ni}(r,s])\le F_{ni}(r,s]$;
(b3) $\mathbb{E}(X_i^4(r,s))\le F_{ni}(r,s]$;
(b4) $\mathbb{E}(X_i(r,s)X_i(s,t)) = -F_{ni}(r,s]\,F_{ni}(s,t]$;
(b5) $\mathbb{E}(X_i^2(r,s)X_i^2(s,t))\le F_{ni}(r,s]\,F_{ni}(s,t]$;
(b6) if one of the indices $i,j,k,\ell\in\{1,\dots,n\}$ differs from the other three (this is always the case if $|\{i,j,k,\ell\}|\ge 3$), say w.l.o.g. $i\notin\{j,k,\ell\}$, then by independence
$$\mathbb{E}(X_i(r,s)X_j(r,s)X_k(s,t)X_\ell(s,t)) = \mathbb{E}(X_i(r,s))\,\mathbb{E}(X_j(r,s)X_k(s,t)X_\ell(s,t)) = 0.$$
Therefore,
$$\mathbb{E}(W_n^2(r,s]\,W_n^2(s,t]) = \mathbb{E}\Big(\Big(\sum_{i=1}^nc_iX_i(r,s)\Big)^2\Big(\sum_{k=1}^nc_kX_k(s,t)\Big)^2\Big) = \sum_{i,j,k,\ell=1}^n c_ic_jc_kc_\ell\,\mathbb{E}(X_i(r,s)X_j(r,s)X_k(s,t)X_\ell(s,t))$$
$$\stackrel{(b6)}{=} \sum_{i=1}^n c_i^4\,\mathbb{E}(X_i^2(r,s)X_i^2(s,t)) + \sum_{\substack{i,k=1\\ i\ne k}}^n c_i^2c_k^2\,\mathbb{E}(X_i^2(r,s))\,\mathbb{E}(X_k^2(s,t)) + 2\sum_{\substack{i,j=1\\ i\ne j}}^n c_i^2c_j^2\,\mathbb{E}(X_i(r,s)X_i(s,t))\,\mathbb{E}(X_j(r,s)X_j(s,t))$$
$$\stackrel{(b5),(b2),(b4)}{\le} \sum_{i=1}^n c_i^4F_{ni}(r,s]F_{ni}(s,t] + 3\sum_{\substack{i,k=1\\ i\ne k}}^n c_i^2c_k^2F_{ni}(r,s]F_{nk}(s,t] \le 3\Big(\sum_{i=1}^nc_i^2F_{ni}(r,s]\Big)\Big(\sum_{k=1}^nc_k^2F_{nk}(s,t]\Big) = 3v_n(r,s]\,v_n(s,t]$$
(using that all increments $F_{ni}(\cdot\,,\cdot]$ are at most 1), proving (i). As to (ii), we have
$$\mathbb{E}(W_n^4(s,t]) = \mathbb{E}\Big(\Big(\sum_{i=1}^nc_iX_i(s,t)\Big)^4\Big) = \sum_{i=1}^nc_i^4\,\mathbb{E}(X_i^4(s,t)) + 3\sum_{\substack{i,k=1\\ i\ne k}}^nc_i^2c_k^2\,\mathbb{E}(X_i^2(s,t))\,\mathbb{E}(X_k^2(s,t))$$
$$= \sum_{i=1}^nc_i^4\,\mathbb{E}(X_i^4(s,t)) + 3\Big(\sum_{i=1}^nc_i^2\,\mathbb{E}(X_i^2(s,t))\Big)^2 - 3\sum_{i=1}^nc_i^4\,(\mathbb{E}(X_i^2(s,t)))^2$$
$$\stackrel{(b2),(b3)}{\le} 3v_n^2(s,t] + \Big(\max_{1\le i\le n}c_i^2\Big)v_n(s,t],$$
proving (ii), since $\max_i c_i^2 = \max_i c_{ni}^2/n$. □
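Since $W_n(s,t]$ is a weighted sum of independent centred Bernoulli variables, its fourth moment can be computed exactly from the expansion used in the proof of (ii). The following sketch does this for arbitrary (hypothetical) scores and cell probabilities $p_i = F_{ni}(s,t]$ and compares the result with the bound of Lemma 22 (ii). It is only a numerical sanity check, not part of the original argument.

```python
import random


def centred_bernoulli_moments(p):
    """Second and fourth moments of X = 1_A - p with P(A) = p."""
    m2 = p * (1.0 - p)
    m4 = p * (1.0 - p) ** 4 + (1.0 - p) * p ** 4
    return m2, m4


def exact_fourth_moment(c, probs):
    """E(W_n^4(s,t]) for W_n(s,t] = sum_i (c_i / sqrt(n)) X_i with independent
    centred X_i: sum a_i^4 m4_i + 3 (sum a_i^2 m2_i)^2 - 3 sum a_i^4 m2_i^2."""
    n = len(c)
    a2 = [ci * ci / n for ci in c]  # a_i^2 = c_i^2 / n
    m = [centred_bernoulli_moments(p) for p in probs]
    s4 = sum(a * a * m4 for a, (_, m4) in zip(a2, m))
    s2 = sum(a * m2 for a, (m2, _) in zip(a2, m))
    s22 = sum(a * a * m2 * m2 for a, (m2, _) in zip(a2, m))
    return s4 + 3.0 * s2 * s2 - 3.0 * s22


def lemma22_bound(c, probs):
    """3 v_n(s,t]^2 + (max_i c_i^2 / n) v_n(s,t], cf. Lemma 22 (ii)."""
    n = len(c)
    v = sum(ci * ci * p for ci, p in zip(c, probs)) / n
    return 3.0 * v * v + max(ci * ci for ci in c) / n * v


if __name__ == "__main__":
    random.seed(0)
    c = [random.gauss(0, 1) for _ in range(10)]
    probs = [0.2 * random.random() for _ in range(10)]
    print(exact_fourth_moment(c, probs), lemma22_bound(c, probs))
```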
THEOREM 18 (G. Shorack (1979), Theorem 1.1). (i) Suppose there exists a monotone increasing and continuous function $G: [0,1]\to\mathbb{R}$ for which either
(a) $v_n(r,s]\le G(r,s] := G(s) - G(r)$ for all $0\le r\le s\le 1$ and all $n\in\mathbb{N}$,
or
(b) $\lim_{n\to\infty}v_n(t) = G(t)$ for all $t\in[0,1]$.
Then $(W_n)$ is relatively $\mathcal{L}$-sequentially compact.
(ii) Suppose further that
$$\max_{1\le i\le n}\frac{c_{ni}^2}{n}\to 0 \quad\text{as } n\to\infty.$$
Then any possible limiting process satisfies $\mathcal{L}\{W\}(C) = 1$. Here a limiting process is any random element $W$ in $(D,\mathcal{B}(D,s)) = (D,\mathcal{B}_b(D,\rho))$ such that $W_{n'}\xrightarrow{\mathcal{L}} W$ for some subsequence $(n')$ of $\mathbb{N}$. Thus (cf. Lemma 18) $(W_n)$ is relatively $\mathcal{L}_b$-sequentially compact.
(iii) Suppose the hypotheses of (i) and (ii) hold. Then there exists a random element $W$ in $(D,\mathcal{B}_b(D,\rho))$ which is a mean-zero Gaussian stochastic process with $\operatorname{cov}(W(s),W(t)) = K(s,t)$ and $\mathcal{L}\{W\}(C) = 1$, and such that
$$W_n\xrightarrow{\mathcal{L}} W \text{ in } (D,\rho) \quad\text{if and only if}\quad \lim_{n\to\infty}K_n(s,t) = K(s,t) \text{ for all } 0\le s\le t\le 1.$$
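Proof. ad (i): We shall apply Theorem 14 in connection with our remarks (41) and (42). We take $\xi_n = W_n$, $F_n := \sqrt{3}\,v_n$ and $F := \sqrt{3}\,G$, and verify the four conditions required there: the condition relating $F_n$ to $F$, the moment condition, the boundary condition at the endpoints, and the boundedness condition.
ad (condition relating $F_n$ and $F$): This follows immediately from assumption (a) or (b), by the choice of $F_n$ and $F$.
ad (moment condition): This follows from Lemma 22 (i), by the choice of $F_n$.
ad (boundary condition): Given any $\varepsilon>0$, we have to show that
(+) $$\limsup_{n\to\infty}\mathbb{P}(|W_n(\delta) - W_n(0)|\ge\varepsilon)\to 0 \quad\text{as } \delta\downarrow 0,$$
and
(++) $$\limsup_{n\to\infty}\mathbb{P}(|W_n(1) - W_n(\delta)|\ge\varepsilon)\to 0 \quad\text{as } \delta\uparrow 1.$$
ad (+): By Chebyshev's inequality, (b1) and (b2) in the proof of Lemma 22, and (67)(a),
$$\mathbb{P}(|W_n(\delta) - W_n(0)|\ge\varepsilon)\le\varepsilon^{-2}\,\mathbb{E}(W_n^2(0,\delta]) = \varepsilon^{-2}n^{-1}\sum_{i=1}^nc_{ni}^2\,[F_{ni}(0,\delta] - F_{ni}^2(0,\delta]]\le\varepsilon^{-2}n^{-1}\sum_{i=1}^nc_{ni}^2F_{ni}(0,\delta] = \varepsilon^{-2}v_n(0,\delta].$$
Hence, by (a) or (b),
$$\limsup_{n\to\infty}\mathbb{P}(|W_n(\delta) - W_n(0)|\ge\varepsilon)\le\varepsilon^{-2}(G(\delta) - G(0))\to 0 \quad\text{as }\delta\downarrow 0,$$
since $G$ was assumed to be continuous. ad (++): This follows in the same way as (+).
ad (boundedness condition): By (42), the moment condition implies (C), which together with the boundary condition implies (C*) by (41). According to (C*), given any $\eta>0$ there exist $\delta = \delta(\eta)>0$ and $n_0 = n_0(\delta(\eta))\in\mathbb{N}$ such that $\mathbb{P}(w''_{W_n}(\delta)\ge 1)\le\eta/2$ for all $n\ge n_0$. Now choose $k\in\mathbb{N}$ and $a>0$ such that $k^{-1}<\delta$ and $a^{-2}\le\eta/2$. By Chebyshev's inequality and (67)(a),
$$\mathbb{P}\Big(\bigcup_{i=1}^k\Big\{\Big|W_n\Big(\tfrac{i-1}{k},\tfrac{i}{k}\Big]\Big|\ge a\Big\}\Big)\le a^{-2}\sum_{i=1}^kv_n\Big(\tfrac{i-1}{k},\tfrac{i}{k}\Big] = a^{-2}\le\eta/2.$$
Furthermore, for $n\ge n_0$ let
$$A_n := \bigcap_{i=1}^k\Big\{\Big|W_n\Big(\tfrac{i-1}{k},\tfrac{i}{k}\Big]\Big|<a\Big\}\cap\{w''_{W_n}(\delta)<1\}\cap\{W_n(0) = 0\}.$$
Then $\mathbb{P}(A_n)\ge 1-\eta$. Let $\omega\in A_n$, $t\in[0,1)$, and let $1\le i\le k$ be such that $t\in[\tfrac{i-1}{k},\tfrac{i}{k})$. Then
$$\min\Big\{\Big|W_n\Big(\tfrac{i-1}{k},\omega\Big) - W_n(t,\omega)\Big|,\ \Big|W_n(t,\omega) - W_n\Big(\tfrac{i}{k},\omega\Big)\Big|\Big\}\le w''_{W_n(\omega)}(\delta)<1.$$
Since also $|W_n(\tfrac{j}{k},\omega)|<ja\le ka$ for $0\le j\le k$, it follows that $|W_n(t,\omega)|<ka+1$ for all $t\in[0,1]$. Therefore $\mathbb{P}(\|W_n\|\le ka+1)\ge 1-\eta$ for all $n\ge n_0$, which proves the boundedness condition. This concludes the proof of (i).
ad (ii): Suppose that $W_{n'}\xrightarrow{\mathcal{L}} W$ for some random element $W$ in $(D,\mathcal{B}(D,s)) = (D,\mathcal{B}_b(D,\rho))$ and some subsequence $(n')$ of $\mathbb{N}$. Let
$$T_W := \{r\in[0,1]: \pi_r \text{ is } \mathcal{L}\{W\}\text{-a.e. } s\text{-continuous}\}.$$
For any $0\le s\le t\le 1$ with $s,t\in T_W$, it follows from Theorem 5.1 in Billingsley (1968) that
(+) $$|W_{n'}(t) - W_{n'}(s)|^3\xrightarrow{\mathcal{L}}|W(t) - W(s)|^3.$$
On the other hand, Lemma 22 (ii) gives
(++) $$\mathbb{E}(W_{n'}^4(s,t])\le 3v_{n'}^2(s,t] + \Big(\max_{1\le i\le n'}\frac{c_{n'i}^2}{n'}\Big)v_{n'}(s,t]\le 4 \quad\text{for all } n'$$
(cf. (66) and (67)(a)). Now (+) together with (++) implies (cf. Gaenssler-Stute (1977), Exercise 1.14.4, p. 114) that
$$\mathbb{E}(|W(s,t]|^3) = \lim_{n'\to\infty}\mathbb{E}(|W_{n'}(s,t]|^3)\le\limsup_{n'\to\infty}(\mathbb{E}(W_{n'}^4(s,t]))^{3/4}\le(3G^2(s,t])^{3/4}\le 3(G(s,t])^{3/2}.$$
The first inequality is Hölder's inequality. The second uses Lemma 22 (ii), the hypothesis of (ii), and (a) or (b). Since $T_W$ contains 0 and 1 and is dense in $[0,1]$ (cf. Billingsley (1968)), it follows that
$$\mathbb{E}(|W(t) - W(s)|^3)\le 3|G(t) - G(s)|^{3/2} \quad\text{for all } 0\le s\le t\le 1.$$
Hence $\mathbb{P}(W\in C) = 1$ by Lemma 19.
ad (iii): Assume first that $\lim_{n\to\infty}K_n(s,t) = K(s,t)$ for all $0\le s\le t\le 1$. We show that for any $\alpha_1,\dots,\alpha_k\in\mathbb{R}$ and any $t_1,\dots,t_k\in[0,1]$
(+++) $$\sum_{j=1}^k\alpha_jW_n(t_j)\xrightarrow{\mathcal{L}} N(0,V) \quad\text{with}\quad V := \sum_{r,s=1}^k\alpha_r\alpha_sK(t_r,t_s).$$
ad (+++): If $V = 0$, then for any $\varepsilon>0$ (cf. (67)(b))
$$\mathbb{P}\Big(\Big|\sum_{j=1}^k\alpha_jW_n(t_j)\Big|\ge\varepsilon\Big)\le\varepsilon^{-2}\,\mathbb{E}\Big(\Big|\sum_{j=1}^k\alpha_jW_n(t_j)\Big|^2\Big) = \varepsilon^{-2}\sum_{r,s=1}^k\alpha_r\alpha_sK_n(t_r,t_s)\to\varepsilon^{-2}V = 0$$
as $n\to\infty$, which proves (+++) in the case $V = 0$. If $V>0$, consider
$$\zeta_{ni} := V^{-1/2}n^{-1/2}c_{ni}\sum_{j=1}^k\alpha_j\,[1_{[0,t_j]}(\xi_{ni}) - F_{ni}(t_j)], \quad 1\le i\le n.$$
The $\zeta_{ni}$'s form a triangular array of row-wise independent random variables with $\mathbb{E}(\zeta_{ni}) = 0$, and
$$\sum_{i=1}^n\mathbb{E}(\zeta_{ni}^2) = \mathbb{E}\Big(\Big(\sum_{i=1}^n\zeta_{ni}\Big)^2\Big) = V^{-1}\,\mathbb{E}\Big(\Big(\sum_{j=1}^k\alpha_jW_n(t_j)\Big)^2\Big) = V^{-1}\sum_{r,s=1}^k\alpha_r\alpha_sK_n(t_r,t_s)\to 1$$
as $n\to\infty$. Furthermore, for any $\delta>0$ we have
$$\sum_{i=1}^n\mathbb{E}(\zeta_{ni}^2\,1_{\{|\zeta_{ni}|>\delta\}})\to 0 \quad\text{as } n\to\infty.$$
To see this, let $\varepsilon>0$, put $p := 1+\varepsilon/2$, and let $q>0$ be such that $1/p + 1/q = 1$. Since $|1_{[0,t_j]}(\xi_{ni}) - F_{ni}(t_j)|\le 1$, we have $|\zeta_{ni}|\le b_n := V^{-1/2}\big(\sum_{j=1}^k|\alpha_j|\big)\max_{1\le i\le n}n^{-1/2}|c_{ni}|$, and $b_n\to 0$ by the hypothesis of (ii). Hence, by Hölder's and Markov's inequalities, for any $\delta>0$
$$\sum_{i=1}^n\mathbb{E}(\zeta_{ni}^2\,1_{\{|\zeta_{ni}|>\delta\}})\le\sum_{i=1}^n(\mathbb{E}(|\zeta_{ni}|^{2+\varepsilon}))^{1/p}(\mathbb{P}(|\zeta_{ni}|>\delta))^{1/q}\le\sum_{i=1}^n(b_n^\varepsilon\,\mathbb{E}(\zeta_{ni}^2))^{1/p}(\delta^{-2}\,\mathbb{E}(\zeta_{ni}^2))^{1/q} = \delta^{-2/q}\,b_n^{\varepsilon/p}\sum_{i=1}^n\mathbb{E}(\zeta_{ni}^2)\to 0$$
as $n\to\infty$. Thus it follows from the Central Limit Theorem (cf. Gaenssler-Stute (1977), 9.2.9) that
$$\sum_{i=1}^n\zeta_{ni}\xrightarrow{\mathcal{L}} N(0,1),$$
which proves (+++).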
Next, let $W$ be a mean-zero Gaussian process with covariance structure given by $K$. Then, again for any $\alpha_1,\dots,\alpha_k\in\mathbb{R}$ and any $t_1,\dots,t_k\in[0,1]$,
$$\mathcal{L}\Big\{\sum_{j=1}^k\alpha_jW(t_j)\Big\} = N(0,V)$$
with $V$ defined as above. Therefore, by the Cramér-Wold device (cf. Gaenssler-Stute (1977), 8.7.6) together with (+++), the finite-dimensional distributions of $W_n$ converge to those of $W$, i.e., $W_n\xrightarrow{\text{fidi}} W$.
Now, by (ii), for any subsequence $(W_{n'})$ of $(W_n)$ there exist a further subsequence $(W_{n''})$ and a random element $W_{(n')(n'')}$ in $(D,\mathcal{B}_b(D,\rho))$ such that
$$\mathcal{L}\{W_{(n')(n'')}\}(C) = 1 \quad\text{and}\quad W_{n''}\xrightarrow{\mathcal{L}} W_{(n')(n'')} \text{ in } (D,\rho).$$
Apply Theorem 3 and use the fact that each $\mathcal{L}\{W_{(n')(n'')}\}$ is uniquely determined by its fidis. It follows that $\mathcal{L}\{W_{(n')(n'')}\} = \mathcal{L}\{W\}$, and therefore $W_n\xrightarrow{\mathcal{L}} W$ in $(D,\rho)$.
To prove the other direction, suppose that $W_n\xrightarrow{\mathcal{L}} W$ in $(D,\rho)$, where $W$ is a random element in $(D,\mathcal{B}_b(D,\rho))$ and a mean-zero Gaussian process with covariance structure given by $K$ (and $\mathcal{L}\{W\}(C) = 1$). Then, by Theorem 3, for any $0\le s\le t\le 1$,
$$W_n(s)\,W_n(t)\xrightarrow{\mathcal{L}} W(s)\,W(t) \quad\text{as } n\to\infty,$$
and
$$\mathbb{E}(W_n^2(s)\,W_n^2(t))\le(\mathbb{E}(W_n^4(s)))^{1/2}(\mathbb{E}(W_n^4(t)))^{1/2}\le 4 \quad\text{for all } n\in\mathbb{N}$$
(cf. (++) in the proof of (ii)). By the same reasoning as in the proof of (ii),
$$\mathbb{E}(W(s)\,W(t)) = \lim_{n\to\infty}\mathbb{E}(W_n(s)\,W_n(t)),\quad\text{i.e.,}\quad\lim_{n\to\infty}K_n(s,t) = K(s,t)$$
for all $0\le s\le t\le 1$. □
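The Lindeberg verification in the proof of (iii) rests on a uniform bound. Since $|1_{[0,t_j]}(\xi_{ni}) - F_{ni}(t_j)|\le 1$, each $\zeta_{ni} = V^{-1/2}n^{-1/2}c_{ni}\sum_j\alpha_j[1_{[0,t_j]}(\xi_{ni}) - F_{ni}(t_j)]$ satisfies $|\zeta_{ni}|\le b_n := V^{-1/2}(\sum_j|\alpha_j|)\max_i n^{-1/2}|c_{ni}|$.

The following sketch illustrates this under hypothetical choices: uniformly distributed $\xi_{ni}$ (so that $K_n = K$ with $K(s,t) = s\wedge t - st$ under (66)) and arbitrary $\alpha_j$, $t_j$. It computes the Lindeberg sum $\sum_i\mathbb{E}(\zeta_{ni}^21_{\{|\zeta_{ni}|>\delta\}})$ exactly. This is possible because each $\zeta_{ni}$ takes only finitely many values, one per cell of the partition generated by the $t_j$. Once $b_n<\delta$ the sum is exactly zero, while $\sum_i\mathbb{E}(\zeta_{ni}^2) = 1$.

```python
import math


def K(s, t):
    """Covariance s ∧ t - s t; with F_ni(t) = t and (66), K_n = K exactly."""
    return min(s, t) - s * t


def zeta_values(c_i, n, alphas, ts, V):
    """Values and cell probabilities of zeta_ni when xi_ni ~ U[0, 1]."""
    cuts = sorted(set([0.0, 1.0] + list(ts)))
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        x = 0.5 * (lo + hi)  # representative point of the cell (lo, hi]
        s = sum(a * ((1.0 if x <= t else 0.0) - t) for a, t in zip(alphas, ts))
        out.append((c_i * s / math.sqrt(V * n), hi - lo))
    return out


def lindeberg(c, alphas, ts, delta):
    """Return (sum_i E(zeta^2 1{|zeta| > delta}), sum_i E(zeta^2), b_n)."""
    n = len(c)
    V = sum(a1 * a2 * K(t1, t2)
            for a1, t1 in zip(alphas, ts) for a2, t2 in zip(alphas, ts))
    tail, second = 0.0, 0.0
    for c_i in c:
        for z, p in zeta_values(c_i, n, alphas, ts, V):
            second += z * z * p
            if abs(z) > delta:
                tail += z * z * p
    b_n = sum(abs(a) for a in alphas) * max(abs(x) for x in c) / math.sqrt(V * n)
    return tail, second, b_n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, lindeberg([1.0] * n, [1.0, -2.0, 0.5], [0.2, 0.5, 0.9], 0.5))
```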
SOME GENERAL REMARKS ON WEAK CONVERGENCE OF RANDOM ELEMENTS IN $D\equiv D[0,1]$ w.r.t. $\rho_q$-METRICS: Let $q$ be any weight function belonging to the set $\mathcal{Q}$ of all continuous functions $q: [0,1]\to\mathbb{R}$ with $q(0) = q(1) = 0$ and $q(t)>0$ for all $t\in(0,1)$ which have the following additional properties (i*)-(iii*). There exists a $\delta^* = \delta^*(q)$, $0<\delta^*\le 1/2$, such that
(i*) $q(t)$ and $q(1-t)$ are monotone increasing on $[0,\delta^*]$;
(ii*) $q(t)/\sqrt{t}$ and $q(1-t)/\sqrt{t}$ are monotone decreasing on $(0,\delta^*]$;
(iii*) $\int_0^{\delta^*}q^{-2}(u)\,du<\infty$ and $\int_{1-\delta^*}^1q^{-2}(u)\,du<\infty$ (i.e., $q^{-1}$ is square integrable).
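Here and in the following we make use of the convention $0/0 := 0$.
REMARK. Let $q\in\mathcal{Q}$. Then for any $t\le\delta^*$, by (i*),
$$\Big(\frac{t}{q^2(t)}\Big)^{1/2} = \Big(\int_0^tq^{-2}(t)\,du\Big)^{1/2}\le\Big(\int_0^tq^{-2}(u)\,du\Big)^{1/2}.$$
Hence, by (iii*),
(iv*) $$\frac{\sqrt{t}}{q(t)}\to 0 \text{ as } t\downarrow 0, \quad\text{and, by symmetry,}\quad \frac{\sqrt{1-t}}{q(t)}\to 0 \text{ as } t\uparrow 1.$$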
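A standard family to keep in mind (our illustrative choice, not taken from the text) is $q(t) = (t(1-t))^a$ with $0<a<1/2$. It is increasing on $[0,1/2]$, $q(t)/\sqrt t = t^{a-1/2}(1-t)^a$ is decreasing, and $\int_0^{1/2}q^{-2}<\infty$ because $2a<1$. So this $q$ belongs to the weight class $\mathcal{Q}$ defined above with $\delta^* = 1/2$. For $a = 1/2$ the integrability condition fails, since $\int_0 t^{-1}\,dt = \infty$.

The sketch below checks the two monotonicity conditions on a grid. It evaluates $\int_0^\delta q^{-2}$ after the substitution $u = v^m$, $m = 1/(1-2a)$, which turns the integrand into the bounded function $m(1-v^m)^{-2a}$.

```python
import math


def q(t: float, a: float = 0.4) -> float:
    """Illustrative weight function q(t) = (t(1-t))^a with 0 < a < 1/2."""
    return (t * (1.0 - t)) ** a


def int_q_inv_sq(delta: float, a: float = 0.4, steps: int = 20000) -> float:
    """Midpoint-rule value of int_0^delta q^{-2}(u) du for delta <= 1/2, after
    substituting u = v^m with m = 1/(1 - 2a); the new integrand
    m (1 - v^m)^{-2a} is bounded on [0, delta^{1/m}]."""
    m = 1.0 / (1.0 - 2.0 * a)
    upper = delta ** (1.0 / m)
    h = upper / steps
    return sum(m * (1.0 - ((k + 0.5) * h) ** m) ** (-2.0 * a) * h
               for k in range(steps))


def check_conditions(a: float = 0.4, points: int = 2000) -> bool:
    """Grid check on (0, 1/2]: q increasing and q(t)/sqrt(t) decreasing."""
    grid = [0.5 * k / points for k in range(1, points + 1)]
    increasing = all(q(s, a) <= q(t, a) for s, t in zip(grid, grid[1:]))
    decreasing = all(q(s, a) / math.sqrt(s) >= q(t, a) / math.sqrt(t)
                     for s, t in zip(grid, grid[1:]))
    return increasing and decreasing


if __name__ == "__main__":
    print(check_conditions(0.4), int_q_inv_sq(0.5, 0.4))
```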
Now let $\xi_n$, $n\ge 0$, be random elements in $(D,\mathcal{B}_b(D,\rho))$, defined on a common p-space $(\Omega,\mathcal{F},\mathbb{P})$, such that $\mathbb{P}(\xi_n(0) = 0,\ \xi_n(1) = 0) = 1$ and $\mathbb{P}(\xi_n/q\in D) = 1$ for all $n\ge 0$. In this case we shall assume w.l.o.g. that $\xi_n(\cdot,\omega)/q(\cdot)\in D$ for each $\omega\in\Omega$ and all $n\ge 0$.
Then the $\xi_n/q$, $n\ge 0$, are also random elements in $(D,\mathcal{B}_b(D,\rho))$ (cf. (37)). Let $q\in\mathcal{Q}$ and define $D_q := \{y = qx: x\in D\}\equiv qD$ and $C_q := qC$ with $C\equiv C[0,1]$. Define the $\rho_q$-METRIC on $D_q$ by
$$\rho_q(y_1,y_2) := \rho(x_1,x_2) \quad\text{if } y_i = qx_i\in D_q,\ i = 1,2,$$
where we tacitly assume that $x(0) = x(1) = 0$ whenever $x\in D$ occurs. Let $\mathcal{B}_b(D_q,\rho_q)$ be the σ-algebra in $D_q$ generated by the open $\rho_q$-balls, and consider the map $T_q: D\to D_q$ defined by $T_q(x) := qx$ for $x\in D$. Then $T_q$ is $\mathcal{B}_b(D,\rho),\mathcal{B}_b(D_q,\rho_q)$-measurable. Indeed, let $B_{\rho_q}(y,\varepsilon)$ be the open $\rho_q$-ball with center $y\in D_q$ and radius $\varepsilon$; then
$$T_q^{-1}(B_{\rho_q}(y,\varepsilon)) = \{x\in D: \rho_q(y,qx)<\varepsilon\} = \{x\in D: \rho(y/q,x)<\varepsilon\}\in\mathcal{B}_b(D,\rho).$$
In the same way, the inverse map $T_q^{-1}: D_q\to D$, defined by $T_q^{-1}(y) := y/q$, is $\mathcal{B}_b(D_q,\rho_q),\mathcal{B}_b(D,\rho)$-measurable. This implies that $\xi/q$ is a random element in $(D,\mathcal{B}_b(D,\rho))$ iff $\xi$ is a random element in $(D_q,\mathcal{B}_b(D_q,\rho_q))$.
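On a finite grid, the $\rho_q$-metric is just a weighted sup-distance. The following sketch is grid-based, so it only approximates the supremum over $[0,1]$. It uses the illustrative weight $q(t) = (t(1-t))^{0.4}$, which is an assumption of the example. It implements $T_q$, its inverse $y\mapsto y/q$ with the convention $0/0 := 0$, and $\rho_q$, and checks the defining identity $\rho_q(qx_1,qx_2) = \rho(x_1,x_2)$ for paths vanishing at 0 and 1.

```python
import random


def weight(t: float, a: float = 0.4) -> float:
    """Illustrative weight q(t) = (t(1-t))^a, an element of Q for 0 < a < 1/2."""
    return (t * (1.0 - t)) ** a


def T_q(x, grid):
    """T_q(x) = q x, applied pointwise on the grid."""
    return [weight(t) * xt for t, xt in zip(grid, x)]


def T_q_inv(y, grid):
    """T_q^{-1}(y) = y / q, with the convention 0/0 := 0."""
    out = []
    for t, yt in zip(grid, y):
        qt = weight(t)
        if qt == 0.0:
            if yt != 0.0:
                raise ValueError("y must vanish where q vanishes (y in D_q)")
            out.append(0.0)
        else:
            out.append(yt / qt)
    return out


def rho(x1, x2):
    """Supremum distance on the grid."""
    return max(abs(a - b) for a, b in zip(x1, x2))


def rho_q(y1, y2, grid):
    """rho_q(y1, y2) := rho(y1 / q, y2 / q)."""
    return rho(T_q_inv(y1, grid), T_q_inv(y2, grid))


if __name__ == "__main__":
    random.seed(0)
    grid = [k / 100 for k in range(101)]
    x1 = [0.0] + [random.gauss(0, 1) for _ in grid[1:-1]] + [0.0]
    x2 = [0.0 for _ in grid]
    print(rho_q(T_q(x1, grid), T_q(x2, grid), grid), rho(x1, x2))
```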
Note also (cf. (39)) that $C_q\in\mathcal{B}_b(D_q,\rho_q)$ and that $(C_q,\rho_q)$ is a separable and closed subspace of $(D_q,\rho_q)$. Furthermore, one has the following
LEMMA 23. In the setting just described, the following two statements are equivalent:
(i) $\xi_n/q\xrightarrow{\mathcal{L}}\xi_0/q$ in $(D,\rho)$ and $\mathcal{L}\{\xi_0/q\}(C) = 1$;
(ii) $\xi_n\xrightarrow{\mathcal{L}}\xi_0$ in $(D_q,\rho_q)$ and $\mathcal{L}\{\xi_0\}(C_q) = 1$.
Proof. (i) ⇒ (ii): Note first that $\mathcal{L}\{\xi_0\}(C_q) = \mathbb{P}(\xi_0\in C_q = qC) = \mathbb{P}(\xi_0/q\in C) = \mathcal{L}\{\xi_0/q\}(C)$. According to (28) (cf. (h') there), it remains to show
(+) $\mathbb{E}(f(\xi_n))\to\mathbb{E}(f(\xi_0))$ for every $f: D_q\to\mathbb{R}$ which is bounded, uniformly $\rho_q$-continuous and $\mathcal{B}_b(D_q,\rho_q),\mathcal{B}$-measurable.
So let $f: D_q\to\mathbb{R}$ be bounded, uniformly $\rho_q$-continuous and $\mathcal{B}_b(D_q,\rho_q),\mathcal{B}$-measurable, and define $g: D\to\mathbb{R}$ by $g(x) := f(qx)$, $x\in D$. Then $g$ is bounded and $\mathcal{B}_b(D,\rho),\mathcal{B}$-measurable, since $g = f\circ T_q$. It is also uniformly $\rho$-continuous, since $\rho(x_1,x_2) = \rho_q(qx_1,qx_2)$ and $|g(x_1) - g(x_2)| = |f(qx_1) - f(qx_2)|$ for any $x_1,x_2\in D$ (i.e., $qx_1,qx_2\in D_q$). Therefore, by (i) and (28), $\mathbb{E}(g(\xi_n/q))\to\mathbb{E}(g(\xi_0/q))$. This implies (+), since $\mathbb{E}(g(\xi_n/q)) = \mathbb{E}(f(\xi_n))$ for all $n\ge 0$.
(ii) ⇒ (i): This follows in the same way. □
FUNCTIONAL CENTRAL LIMIT THEOREMS FOR WEIGHTED EMPIRICAL PROCESSES w.r.t. $\rho_q$-METRICS: As before, let $W_n$ be a weighted empirical process based on an array of given scores $(c_{ni})$ (cf. (65)) and on an array $(\xi_{ni})$ of row-wise independent random variables $\xi_{ni}$, $1\le i\le n$, $n\in\mathbb{N}$, defined on some p-space $(\Omega,\mathcal{F},\mathbb{P})$. We assume again that the distribution functions $F_{ni}$ of the $\xi_{ni}$'s are concentrated on $[0,1]$. Here, in addition, we suppose that
(69) $$\text{for each } n\in\mathbb{N}:\quad n^{-1}\sum_{i=1}^nF_{ni}(t) = t \quad\text{for all } t\in[0,1].$$
Then, for any $q\in\mathcal{Q}$, we have
LEMMA 24. $\mathbb{P}(W_n/q\in D) = 1$ for all $n$. Hence we may and do assume w.l.o.g. that $W_n(\cdot,\omega)/q(\cdot)\in D$ for each $\omega\in\Omega$ and all $n\in\mathbb{N}$.
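Proof. By (69), $F_{ni}(0) = 0$ for all $i$. So for $\mathbb{P}$-a.a. $\omega\in\Omega$ there exists a $t_0 = t_0(\omega)\in(0,\delta^*]$ with $\xi_{ni}(\omega)>t_0$ for all $i$. By the definition of $W_n$, $W_n(t,\omega) = -n^{-1/2}\sum_{i=1}^nc_{ni}F_{ni}(t)$ for $0\le t<t_0$. Hence, by (69) and (iv*),
$$\frac{|W_n(t,\omega)|}{q(t)}\le n^{-1/2}\sum_{i=1}^n|c_{ni}|\frac{F_{ni}(t)}{q(t)}\le n^{1/2}\Big(\max_{1\le i\le n}|c_{ni}|\Big)\frac{t}{q(t)}\to 0 \quad\text{as } t\to 0.$$
Similarly, $|W_n(t,\omega)|/q(t)\to 0$ as $t\to 1$ for $\mathbb{P}$-a.a. $\omega$. This implies the assertion (imposing the convention $0/0 := 0$). □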
Now, for uniformly bounded scores, Shorack (1979) has shown:
THEOREM 19 (Shorack (1979), Theorem 1.2). Suppose that
(70) $$\sup_n\Big(\max_{1\le i\le n}|c_{ni}|\Big)\le M<\infty.$$
Then for all $q\in\mathcal{Q}$ we have:
(i) $(W_n/q)$ is relatively $\mathcal{L}$-sequentially compact (in $(D,s)$).
(ii) Any limiting process satisfies $\mathcal{L}\{W\}(C) = 1$. Here a limiting process is any random element $W$ in $(D,\mathcal{B}(D,s)) = (D,\mathcal{B}_b(D,\rho))$ such that $W_{n'}/q\xrightarrow{\mathcal{L}} W$ for some subsequence $(n')$ of $\mathbb{N}$. Hence, by Lemma 18, $(W_n/q)$ is relatively $\mathcal{L}_b$-sequentially compact in $(D,\rho)$, with $\mathcal{L}\{W\}(C) = 1$ for any limiting process $W$. Therefore, by Lemma 23, $(W_n)$ is relatively $\mathcal{L}_b$-sequentially compact in $(D_q,\rho_q)$, with $\mathcal{L}\{W\}(C_q) = 1$ for any limiting process $W$.
(iii) There exists a random element $W_0$ in $(D_q,\mathcal{B}_b(D_q,\rho_q))$ which is a mean-zero Gaussian stochastic process with $\operatorname{cov}(W_0(s),W_0(t)) = K_0(s,t)$ and $\mathcal{L}\{W_0\}(C_q) = 1$, and such that, with $K_n(s,t) = \operatorname{cov}(W_n(s),W_n(t))$,
$$W_n\xrightarrow{\mathcal{L}} W_0 \text{ in } (D_q,\rho_q) \quad\text{if and only if}\quad \lim_{n\to\infty}K_n(s,t) = K_0(s,t) \text{ for all } 0\le s\le t\le 1.$$
The proof of Theorem 19 (based on Theorem 18 and Theorem 17) can be carried through along the lines presented in Shorack's (1979) paper, with some slight modifications necessitated by our choice of $\mathcal{Q}$. Incidentally, instead of (15) on p. 171 it suffices to impose (iv*), and instead of $\mathbb{P}(A)\le\exp(-1/a)$ one shows $\mathbb{P}(A)\ge 1 - 1/a$ to get (v) on p. 181. We are not going to give further details here.

Instead, we present a completely different proof of the following result. The proof of Theorem 1.2 in Shorack (1979) does not seem suited to carry over to a proof of his Theorem 1.3, as mentioned there on p. 182. Note that, for scores which are not uniformly bounded, it is not possible to estimate $(\max_{1\le i\le n}c_{ni}^2)(t-s)$ by $M^2(t-s)$ for $t-s>n^{-1}$, which was essentially used to get (c) on p. 179.
THEOREM 20 (Shorack (1979), Theorem 1.3). Suppose that all $\xi_{ni}$ are uniformly distributed on $[0,1]$ and that, instead of (70),
(71) $$\max_{1\le i\le n}\frac{c_{ni}^2}{n}\to 0 \quad\text{as } n\to\infty.$$
Then, for any $q\in\mathcal{Q}$, $W_n\xrightarrow{\mathcal{L}} B^0$ in $(D_q,\rho_q)$ as $n\to\infty$ and $\mathcal{L}\{B^0\}(C_q) = 1$, where $B^0$ denotes the Brownian bridge.
The proof of Theorem 20 is based on the following lemmata, which may be of independent interest. The first lemma is concerned with a martingale property of the weighted empirical process $W_n$ based on $\xi_{ni}$ which are uniformly distributed on $[0,1]$.
LEMMA 25. Let $n\in\mathbb{N}$ be arbitrary but fixed, and write $\xi_i$ and $c_i$ instead of $\xi_{ni}$ and $c_{ni}$, respectively. Suppose that $F_{ni}(t) = t$ for all $t\in[0,1]$. Then for any $(c_1,\dots,c_n)\in\mathbb{R}^n$
$$\Big(\frac{n^{1/2}W_n(t)}{1-t}\Big)_{0\le t<1} \text{ is a martingale w.r.t. } \mathcal{F}_t := \sigma\Big(\Big\{\frac{n^{1/2}W_n(s)}{1-s}: s\le t\Big\}\Big),\ 0\le t<1.$$
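Proof. We use the following auxiliary result, which is easy to prove:
(72) Let $(\zeta_t)_{0\le t<T}$ and $(\eta_t)_{0\le t<T}$, $T\le\infty$, be martingales w.r.t. $(\mathcal{F}_t := \sigma(\{\zeta_s: s\le t\}))_{0\le t<T}$ and $(\mathcal{G}_t := \sigma(\{\eta_s: s\le t\}))_{0\le t<T}$, respectively. Assume that $(\zeta_t)_{0\le t<T}$ and $(\eta_t)_{0\le t<T}$ are independent. Then $(\zeta_t+\eta_t)_{0\le t<T}$ is a martingale w.r.t. $(\mathcal{H}_t := \sigma(\{\zeta_s,\eta_s: s\le t\}))_{0\le t<T}$, and therefore also w.r.t. $(\sigma(\{\zeta_s+\eta_s: s\le t\}))_{0\le t<T}$.
Now, given any $(c_1,\dots,c_n)\in\mathbb{R}^n$, put
$$\zeta_t^{(m)} := \sum_{i=1}^m\frac{c_i\,[1_{[0,t]}(\xi_i) - t]}{1-t}, \quad 0\le t<1,\ 1\le m\le n,$$
so that $\zeta_t^{(n)} = n^{1/2}W_n(t)/(1-t)$. Applying (72) gives the assertion of Lemma 25 by induction on $m$. For $m = 1$, cf. Lemma 3 in Section 1, choosing there $(X,\mathcal{B},\mu) = ([0,1],[0,1]\cap\mathcal{B},\lambda)$ with $\lambda$ = Lebesgue measure, and $\mathcal{C} := \{[0,t]: 0\le t<1\}$.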
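LEMMA 26. Suppose that $F_{ni}(t) = t$ for all $t\in[0,1]$. Let $q\in\mathcal{Q}$, and let $\delta^* = \delta^*(q)$, $0<\delta^*\le 1/2$, be as in the definition of $\mathcal{Q}$. Then, for any $n\in\mathbb{N}$, each $\varepsilon>0$ and any $\delta\le\delta^*$, one has
(i) $$\mathbb{P}\Big(\sup_{t\in[0,\delta]}\Big|\frac{W_n(t)}{q(t)}\Big|>\varepsilon\Big)\le 8\varepsilon^{-2}\int_0^\delta q^{-2}(u)\,du, \quad\text{and}$$
(ii) $$\mathbb{P}\Big(\sup_{t\in[1-\delta,1]}\Big|\frac{W_n(t)}{q(t)}\Big|>\varepsilon\Big)\le 8\varepsilon^{-2}\int_{1-\delta}^1q^{-2}(u)\,du.$$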
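Proof. ad (i): Let $n\in\mathbb{N}$ be arbitrary but fixed, and for each $k\in\mathbb{N}$ and $i\in\{0,\dots,2^k\}$ let $t_{ki} := i\delta/2^k$. Due to the path properties of $W_n/q$ (cf. Lemma 24), it suffices to show that for each $\varepsilon>0$ and any $k\in\mathbb{N}$
(+) $$\mathbb{P}\Big(\max_{1\le i\le 2^k}\Big|\frac{W_n(t_{ki})}{q(t_{ki})}\Big|>\varepsilon\Big)\le 8\varepsilon^{-2}\int_0^\delta q^{-2}(u)\,du.$$
For later use note that
(++) $$1\le(1-t_{ki})^{-2}\le 4 \quad\text{for each } k \text{ and } i.$$
ad (+): Let $\varepsilon>0$ and $k\in\mathbb{N}$ be arbitrary but fixed. By Lemma 25, for any fixed $n\in\mathbb{N}$, $(W_n(t)/(1-t))_{0\le t<1}$ is a martingale. We can therefore apply Chow's inequality (cf. Gaenssler-Stute (1977), (6.6.2)) to the submartingale $((W_n(t)/(1-t))^2)_{0\le t<1}$ with the nonincreasing weights $q^{-2}(t_{ki})$ (cf. (i*)). Together with the first inequality in (++) this gives
$$\varepsilon^2\,\mathbb{P}\Big(\max_{1\le i\le 2^k}\Big|\frac{W_n(t_{ki})}{q(t_{ki})}\Big|\ge\varepsilon\Big)\le\varepsilon^2\,\mathbb{P}\Big(\max_{1\le i\le 2^k}q^{-2}(t_{ki})\Big(\frac{W_n(t_{ki})}{1-t_{ki}}\Big)^2\ge\varepsilon^2\Big)$$
$$\le q^{-2}(t_{k1})\,\mathbb{E}\Big(\Big(\frac{W_n(t_{k1})}{1-t_{k1}}\Big)^2\Big) + \sum_{i=2}^{2^k}q^{-2}(t_{ki})\,\mathbb{E}\Big(\Big(\frac{W_n(t_{ki})}{1-t_{ki}}\Big)^2 - \Big(\frac{W_n(t_{k,i-1})}{1-t_{k,i-1}}\Big)^2\Big).$$
By (66),
$$\mathbb{E}(W_n^2(t)) = n^{-1}\sum_{i=1}^nc_{ni}^2\,[F_{ni}(t) - F_{ni}^2(t)] = n^{-1}\sum_{i=1}^nc_{ni}^2\,(t - t^2) = t(1-t)\le t.$$
Hence, by the second inequality in (++),
$$q^{-2}(t_{k1})\,\mathbb{E}\Big(\Big(\frac{W_n(t_{k1})}{1-t_{k1}}\Big)^2\Big)\le 4\,q^{-2}(t_{k1})\,t_{k1}\stackrel{(i^*)}{\le}4\int_0^{t_{k1}}q^{-2}(u)\,du\le 4\int_0^\delta q^{-2}(u)\,du.$$
On the other hand,
$$\sum_{i=2}^{2^k}q^{-2}(t_{ki})\Big(\frac{t_{ki}}{1-t_{ki}} - \frac{t_{k,i-1}}{1-t_{k,i-1}}\Big) = \sum_{i=2}^{2^k}q^{-2}(t_{ki})\,\frac{t_{ki} - t_{k,i-1}}{(1-t_{ki})(1-t_{k,i-1})}\stackrel{(++)}{\le}4\sum_{i=2}^{2^k}q^{-2}(t_{ki})(t_{ki} - t_{k,i-1})\stackrel{(i^*)}{\le}4\int_0^\delta q^{-2}(u)\,du.$$
In summary,
$$\mathbb{P}\Big(\max_{1\le i\le 2^k}\Big|\frac{W_n(t_{ki})}{q(t_{ki})}\Big|>\varepsilon\Big)\le 8\varepsilon^{-2}\int_0^\delta q^{-2}(u)\,du,$$
which proves (+). ad (ii): By symmetry, this follows in the same way. □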
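Two facts drive the proof of Lemma 26: $\mathbb{E}((W_n(t)/(1-t))^2) = t/(1-t)$, and the increment bound $\frac{t}{1-t} - \frac{s}{1-s} = \frac{t-s}{(1-t)(1-s)}\le 4(t-s)$ for $s\le t\le 1/2$. Both can be illustrated by simulation.

The sketch below is only a Monte Carlo illustration under hypothetical choices: the weight $q(t) = (t(1-t))^{0.4}$, Gaussian scores rescaled to satisfy (66), and a finite grid in place of $[0,\delta]$. Because of the grid, the estimate can only under-estimate the true probability. The sketch estimates the tail probability in Lemma 26 (i) and compares it with $8\varepsilon^{-2}\int_0^\delta q^{-2}(u)\,du$.

```python
import math
import random


def q(t, a=0.4):
    """Illustrative weight q(t) = (t(1-t))^a."""
    return (t * (1.0 - t)) ** a


def int_q_inv_sq(delta, a=0.4, steps=20000):
    """Midpoint rule for int_0^delta q^{-2} after substituting u = v^m, m = 1/(1-2a)."""
    m = 1.0 / (1.0 - 2.0 * a)
    upper = delta ** (1.0 / m)
    h = upper / steps
    return sum(m * (1.0 - ((k + 0.5) * h) ** m) ** (-2.0 * a) * h for k in range(steps))


def sup_ratio_on_grid(xi, c, delta, grid_size):
    """Max over t = delta * j / grid_size of |W_n(t)| / q(t), for F_ni(t) = t."""
    n = len(xi)
    pairs = sorted(zip(xi, c))
    c_total = sum(c)
    best, idx, partial = 0.0, 0, 0.0
    for j in range(1, grid_size + 1):
        t = delta * j / grid_size
        while idx < n and pairs[idx][0] <= t:  # add scores of observations <= t
            partial += pairs[idx][1]
            idx += 1
        w = (partial - t * c_total) / math.sqrt(n)
        best = max(best, abs(w) / q(t))
    return best


def tail_estimate(n=50, eps=8.0, delta=0.5, reps=400, grid_size=200, seed=7):
    """Return (Monte Carlo estimate of the tail probability, Lemma 26 bound)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = math.sqrt(sum(r * r for r in raw) / n)
        c = [r / s for r in raw]  # enforce (66)
        xi = [rng.random() for _ in range(n)]
        if sup_ratio_on_grid(xi, c, delta, grid_size) > eps:
            hits += 1
    bound = 8.0 / eps ** 2 * int_q_inv_sq(delta)
    return hits / reps, bound


if __name__ == "__main__":
    print(tail_estimate())
```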
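LEMMA 27. For any $q\in\mathcal{Q}$ we have $\mathbb{P}(B^0/q\in C) = 1$, where $B^0$ denotes the Brownian bridge and $C$ is the space of all continuous functions on $[0,1]$.
Proof. We have to show that $B^0/q$ is $\mathbb{P}$-a.s. continuous at 0 (and also at 1, which is shown similarly). According to Lemma 19, it suffices to find constants $a>1$, $b>0$ and a continuous function $F: [0,\delta^*]\to\mathbb{R}$ such that
(+) $$\mathbb{E}\Big(\Big|\frac{B^0(t)}{q(t)} - \frac{B^0(s)}{q(s)}\Big|^b\Big)\le|F(t) - F(s)|^a \quad\text{for all } 0\le s\le t\le\delta^*.$$
Here we again use the convention $0/0 := 0$, and $\delta^* = \delta^*(q)$ is as in the definition of $\mathcal{Q}$.
ad (+): For any $0\le s\le t\le 1$, $B^0(t) - B^0(s)$ is normally distributed with mean zero and variance $(t-s)(1-(t-s))$. Hence
(a) $$\mathbb{E}\Big(\Big(\frac{B^0(t) - B^0(s)}{q(t)}\Big)^4\Big) = \frac{3(t-s)^2(1-(t-s))^2}{q^4(t)}, \quad 0\le s\le t,\ 0<t<1.$$
On the other hand, for any $0<s\le t\le\delta^*$ we have
$$s^2\Big[\frac{1}{q(s)} - \frac{1}{q(t)}\Big]^4 = \Big[\frac{\sqrt{s}}{q(s)} - \frac{\sqrt{s}}{q(t)}\Big]^4\le\Big[\frac{\sqrt{t}}{q(t)}\Big(1 - \frac{\sqrt{s}}{\sqrt{t}}\Big)\Big]^4 = \Big(\frac{\sqrt{t} - \sqrt{s}}{q(t)}\Big)^4\le\Big(\frac{t-s}{q^2(t)}\Big)^2.$$
The inequality holds since $q(u)/\sqrt{u}$ is decreasing on $(0,\delta^*]$ by (ii*) and $q(s)\le q(t)$ by (i*). Since $\mathbb{E}((B^0(s))^4) = 3s^2(1-s)^2$, it follows that
(b) $$\mathbb{E}\Big(\Big(B^0(s)\Big[\frac{1}{q(s)} - \frac{1}{q(t)}\Big]\Big)^4\Big)\le\frac{3(t-s)^2(1-s)^2}{q^4(t)}, \quad 0<s\le t\le\delta^*.$$
Now it follows from (a) and (b), using $(x+y)^4\le 8(x^4+y^4)$, that for $0<s\le t\le\delta^*$
$$\mathbb{E}\Big(\Big(\frac{B^0(t)}{q(t)} - \frac{B^0(s)}{q(s)}\Big)^4\Big) = \mathbb{E}\Big(\Big(\frac{B^0(t) - B^0(s)}{q(t)} + B^0(s)\Big[\frac{1}{q(t)} - \frac{1}{q(s)}\Big]\Big)^4\Big)$$
$$\le 8\Big[\frac{3(t-s)^2(1-(t-s))^2}{q^4(t)} + \frac{3(t-s)^2(1-s)^2}{q^4(t)}\Big]\le 48\,(q^{-2}(t)(t-s))^2\le 48\Big(\int_s^tq^{-2}(u)\,du\Big)^2.$$
The last inequality holds since $q^{-2}$ is decreasing on $(0,\delta^*]$ by (i*). Thus, taking $F(t) := 4\sqrt{3}\int_0^tq^{-2}(u)\,du$, we get (+) with $b = 4$, $a = 2$ for all $0<s\le t\le\delta^*$. It remains to consider the case $s = 0$ and $0<t\le\delta^*$; by (a),
$$\mathbb{E}\Big(\Big(\frac{B^0(t)}{q(t)}\Big)^4\Big)\le 3\Big(\frac{t}{q^2(t)}\Big)^2\le 3\Big(\int_0^tq^{-2}(u)\,du\Big)^2\le F^2(t) = |F(t) - F(0)|^2.$$
This proves (+). □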
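For the Brownian bridge, the fourth moment in (+) is available in closed form. The difference $B^0(t)/q(t) - B^0(s)/q(s)$ is centred Gaussian, so its fourth moment is $3\sigma^4$, where
$\sigma^2 = \frac{t(1-t)}{q^2(t)} + \frac{s(1-s)}{q^2(s)} - \frac{2s(1-t)}{q(s)q(t)}$ (since $\operatorname{cov}(B^0(s),B^0(t)) = s(1-t)$ for $s\le t$).

The sketch below uses the illustrative weight $q(t) = (t(1-t))^{0.4}$ with $\delta^* = 1/2$; this weight is an assumption, not part of the lemma. On a grid of pairs $0<s<t\le\delta^*$ it checks the intermediate inequality leading to (b), and the final bound (+) with $F(t) = 4\sqrt3\int_0^tq^{-2}(u)\,du$, $b = 4$, $a = 2$.

```python
import math


def q(t, a=0.4):
    """Illustrative weight q(t) = (t(1-t))^a."""
    return (t * (1.0 - t)) ** a


def int_q_inv_sq(x, a=0.4, steps=4000):
    """int_0^x q^{-2}(u) du for 0 <= x <= 1/2 (midpoint rule after u = v^m)."""
    if x <= 0.0:
        return 0.0
    m = 1.0 / (1.0 - 2.0 * a)
    upper = x ** (1.0 / m)
    h = upper / steps
    return sum(m * (1.0 - ((k + 0.5) * h) ** m) ** (-2.0 * a) * h for k in range(steps))


def F(t):
    """F(t) = 4 sqrt(3) int_0^t q^{-2}(u) du, as in the proof of Lemma 27."""
    return 4.0 * math.sqrt(3.0) * int_q_inv_sq(t)


def fourth_moment(s, t):
    """E((B0(t)/q(t) - B0(s)/q(s))^4) = 3 * Var^2 for 0 < s <= t < 1."""
    var = (t * (1 - t) / q(t) ** 2 + s * (1 - s) / q(s) ** 2
           - 2 * s * (1 - t) / (q(s) * q(t)))
    return 3.0 * var * var


if __name__ == "__main__":
    for s, t in [(0.1, 0.2), (0.3, 0.5)]:
        print(s, t, fourth_moment(s, t), (F(t) - F(s)) ** 2)
```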
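Proof of Theorem 20. In the setting of Theorem 18 and the preceding remarks (67), in the present situation (where $F_{ni}(t) = t$ for all $t\in[0,1]$) we have (cf. (66))
$$v_n(t) = t\ (=: G(t)) \quad\text{for all } t\in[0,1]$$
and
$$K_n(s,t) = K(s,t) = s\wedge t - st = \operatorname{cov}(B^0(s),B^0(t)).$$
Therefore, by Theorem 18 (iii), we have
(a) $W_n\xrightarrow{\mathcal{L}} B^0$ in $(D,\rho)$.
Furthermore, by Lemma 24 and Lemma 27, for any $q\in\mathcal{Q}$ we have
(b) $\mathbb{P}(W_n/q\in D) = 1$ for all $n\in\mathbb{N}$, and $\mathbb{P}(B^0/q\in C) = 1$.
We are thus in a situation where our general remarks on weak convergence of random elements in $D = D[0,1]$ w.r.t. $\rho_q$-metrics can be applied. So, by Lemma 23, it remains to show
(c) $W_n/q\xrightarrow{\mathcal{L}} B^0/q$ in $(D,\rho)$.
ad (c): Let $q_1 :\equiv 1$, and for $m>1$ let
$$q_m := q\,1_{[\frac1m,1-\frac1m]} + q\big(\tfrac1m\big)1_{[0,\frac1m)} + q\big(1-\tfrac1m\big)1_{(1-\frac1m,1]}.$$
Since $q_m$ is continuous and $q_m>0$ on $[0,1]$, (a) implies
(d) $W_n/q_m\xrightarrow{\mathcal{L}} B^0/q_m$ in $(D,\rho)$ as $n\to\infty$.
Now, according to (28), (c) holds if we show that
$$\lim_{n\to\infty}\mathbb{E}(f(W_n/q)) = \mathbb{E}(f(B^0/q)) \quad\text{for all } f\in U_b(D,\rho).$$
Again by (28) and (d), we have for each $m$ that
$$\lim_{n\to\infty}\mathbb{E}(f(W_n/q_m)) = \mathbb{E}(f(B^0/q_m)) \quad\text{for all } f\in U_b(D,\rho).$$
Furthermore, by Lebesgue's theorem,
$$\lim_{m\to\infty}\mathbb{E}(f(B^0/q_m)) = \mathbb{E}(f(B^0/q)) \quad\text{for all } f\in U_b(D,\rho),$$
since $\mathbb{P}(B^0/q\in C) = 1$ by Lemma 27, and therefore $\lim_{m\to\infty}\rho(B^0/q_m,B^0/q) = 0$ $\mathbb{P}$-a.s.
Thus, given any $f\in U_b(D,\rho)$, choose for each $m\in\mathbb{N}$ some $k_m>k_{m-1}$ (with $k_0 := 0$) such that $|\mathbb{E}(f(W_n/q_m)) - \mathbb{E}(f(B^0/q_m))|\le 1/m$ for all $n\ge k_m$. Putting, for each $n\in\mathbb{N}$,
$$i_n := m \quad\text{if } n\in\{k_m,\dots,k_{m+1}-1\},$$
we obtain
$$\lim_{n\to\infty}\mathbb{E}(f(W_n/q_{i_n})) = \mathbb{E}(f(B^0/q)).$$
So it remains to show
(e) $$\lim_{n\to\infty}|\mathbb{E}(f(W_n/q_{i_n})) - \mathbb{E}(f(W_n/q))| = 0.$$
For this, let $\varepsilon>0$ be arbitrary, and let $\delta = \delta(\varepsilon)>0$ be such that $\rho(x,y)\le\delta$ implies $|f(x) - f(y)|\le\varepsilon$. (Note also that $\|f\| := \sup_{x\in D}|f(x)|<\infty$.) Let $n$ be sufficiently large, i.e., such that $1/i_n\le\delta^*$. Since $q\le q(1/i_n)$ on $[0,1/i_n]$ and $q\le q(1-1/i_n)$ on $[1-1/i_n,1]$ by (i*),
$$|\mathbb{E}(f(W_n/q_{i_n})) - \mathbb{E}(f(W_n/q))|\le\varepsilon + 2\|f\|\,\mathbb{P}(\rho(W_n/q_{i_n},W_n/q)>\delta)$$
$$\le\varepsilon + 2\|f\|\Big[\mathbb{P}\Big(\sup_{t\in[0,1/i_n]}\Big|\frac{W_n(t)}{q(t)}\Big|>\delta/2\Big) + \mathbb{P}\Big(\sup_{t\in[1-1/i_n,1]}\Big|\frac{W_n(t)}{q(t)}\Big|>\delta/2\Big)\Big].$$
It follows from Lemma 26 that
$$|\mathbb{E}(f(W_n/q_{i_n})) - \mathbb{E}(f(W_n/q))|\le\varepsilon + 2\|f\|\,(\delta/2)^{-2}\cdot 8\Big(\int_0^{1/i_n}q^{-2}(u)\,du + \int_{1-1/i_n}^1q^{-2}(u)\,du\Big).$$
Since $i_n\to\infty$, (iii*) yields
$$\limsup_{n\to\infty}|\mathbb{E}(f(W_n/q_{i_n})) - \mathbb{E}(f(W_n/q))|\le\varepsilon.$$
Since $\varepsilon>0$ was arbitrary, this proves (e), and therefore (c). □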
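The truncation device in the proof of (c) can be illustrated on a grid. The truncated weights are $q_m := q(1/m)$ on $[0,1/m)$, $q_m := q$ on $[1/m,1-1/m]$, and $q_m := q(1-1/m)$ on $(1-1/m,1]$ (with $q_1 :\equiv 1$). The Python sketch below again uses the illustrative weight $q(t) = (t(1-t))^{0.4}$, for which $\delta^* = 1/2$, and a simulated uniform empirical process as the path; both are assumptions of the example. It checks that $q_m>0$ and that
$$\rho(w/q_m, w/q)\le 2\max\Big\{\sup_{[0,1/m]}|w/q|,\ \sup_{[1-1/m,1]}|w/q|\Big\},$$
which is the inequality underlying the estimate via Lemma 26.

```python
import math
import random


def q(t, a=0.4):
    """Illustrative weight q(t) = (t(1-t))^a."""
    return (t * (1.0 - t)) ** a


def q_m(t, m):
    """Truncated weight used in the proof of Theorem 20 (q_1 := 1)."""
    if m == 1:
        return 1.0
    lo, hi = 1.0 / m, 1.0 - 1.0 / m
    if t < lo:
        return q(lo)
    if t > hi:
        return q(hi)
    return q(t)


def ratio(w, t, weight):
    """w(t) / weight(t) with the convention 0/0 := 0."""
    wt = weight(t)
    return 0.0 if wt == 0.0 else w / wt


def uniform_empirical_path(n, grid, rng):
    """alpha_n on the grid for n uniform observations; vanishes at 0 and 1."""
    xi = [rng.random() for _ in range(n)]
    return [math.sqrt(n) * (sum(1 for x in xi if x <= t) / n - t) for t in grid]


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [k / 100 for k in range(101)]
    path = uniform_empirical_path(50, grid, rng)
    print(max(abs(ratio(w, t, lambda u: q_m(u, 10)) - ratio(w, t, q))
              for w, t in zip(path, grid)))
```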
(73) REMARKS. (a) W. Schneemeier (1982) has given an example showing that Theorem 19 fails to hold if the uniform boundedness condition (70) on the scores is replaced by the condition
$$\max_{1\le i\le n}\frac{c_{ni}^2}{n}\to 0 \quad\text{as } n\to\infty,$$
which was imposed in Theorem 18. Thus, the assumption in Theorem 20 that the $\xi_{ni}$ are uniformly distributed on $[0,1]$ cannot be weakened to the assumption that for every $n\in\mathbb{N}$
$$n^{-1}\sum_{i=1}^nF_{ni}(t) = t \quad\text{for all } t\in[0,1]$$
(cf. (69)) without strengthening the condition on the scores.
(b) As to the $\mathcal{L}$-statements in Theorem 18 and Theorem 19, it is possible, by making use of Theorem 11 a) (or Theorem 11 ), to modify the given proofs such that they operate totally within our theory of $\mathcal{L}_b$-convergence. For example, along the same lines as in the proof of Proposition B, together with an application of Theorem 11, one obtains (within the theory of $\mathcal{L}_b$-convergence) the following. Any sequence $(\xi_n)$ of random elements in $(D,\mathcal{B}_b(D,\rho))$ which satisfies the two conditions
(i) $\lim_{\delta\downarrow 0}\limsup_{n\to\infty}\mathbb{P}(w_{\xi_n}(\delta)>\varepsilon) = 0$ for each $\varepsilon>0$, and
(ii) $\lim_{M\to\infty}\limsup_{n\to\infty}\mathbb{P}(\|\xi_n\|>M) = 0$,
is relatively $\mathcal{L}_b$-sequentially compact, and every limiting random element $\xi_0$ satisfies $\mathcal{L}\{\xi_0\}(C) = 1$. Further results in this direction will be contained in a forthcoming paper by P. Gaenssler, E. Haeusler and W. Schneemeier (1983).
CONCLUDING REMARKS ON FURTHER RESULTS FOR EMPIRICAL PROCESSES INDEXED BY CLASSES OF SETS OR CLASSES OF FUNCTIONS: (a) FUNCTIONAL LAWS OF THE ITERATED LOGARITHM (cf. Gaenssler-Stute (1979), Section 1.3, concerning results for the uniform empirical process $\alpha_n$). One of the main theorems in Kuelbs and Dudley (1980) states that for any p-space $(X,\mathcal{B},\mu)$ the following holds true:
(74) If (M^) is satisfied for a class $\mathcal{C}\subset\mathcal{B}$ and $\mu$, and if $\mathcal{C}$ is a $\mu$-Donsker class, then $\mathcal{C}$ is a STRASSEN LOG-LOG CLASS for $\mu$. That is, with probability one the set
$$\Big\{\frac{\beta_n}{(2\log\log n)^{1/2}}: n\ge 3\Big\}$$
is relatively compact (w.r.t. the supremum metric $\rho$ in $D(\mathcal{C},\mu)$) with limit set $\{\varphi_f: f\in B\}$, where $\varphi_f(C) := \int_Cf\,d\mu$, $C\in\mathcal{C}$, and
$$B := \Big\{f\in L^2(X,\mathcal{B},\mu): \int_Xf\,d\mu = 0 \text{ and } \int_X|f|^2\,d\mu\le 1\Big\}.$$
(Note that this limit set is contained in $U(\mathcal{C},d_\mu)\subset D(\mathcal{C},\mu)$.)
Now, as pointed out in Gaenssler (1983), consider $(X,\mathcal{B},\mu) = (\mathbb{R}^k,\mathcal{B}^k,\mu)$, $k\ge 1$. The class $\mathcal{C} = \mathcal{J}_k$ of all lower left orthants satisfies (M ) and is a $\mu$-Donsker class for any $\mu$ by (58). Hence (74) yields the results of Finkelstein (1971) and Richter (1974), namely
(75) $\mathcal{J}_k$ is a Strassen log log class for every p-measure $\mu$ on $\mathcal{B}^k$, $k\ge 1$.
The same holds true for the class $\mathcal{C}$ of all closed Euclidean balls in $\mathbb{R}^k$, $k\ge 1$. This is a consequence of our remarks preceding Theorem D and of Corollary 2.4 in Kuelbs and Dudley (1980), according to which one has
(76) If (M.) is satisfied for $\mu$ and a Vapnik-Chervonenkis class $\mathcal{C}$, then $\mathcal{C}$ is a Strassen log log class for $\mu$.
(b) DONSKER CLASSES OF FUNCTIONS. Let $\alpha_n$ be the uniform empirical process (cf. the end of Section 3), and let $q$ be one of the weight functions considered above in connection with weak convergence of random elements in $D = D[0,1]$ w.r.t. $\rho_q$-metrics. For any $q\in\mathcal{Q}$ we know from Theorem 20 (with $c_{ni}\equiv 1$) that
$$\alpha_n\xrightarrow{\mathcal{L}} B^0 \text{ in } (D_q,\rho_q),$$
or equivalently, by Lemma 23, that
(77) $$\alpha_n/q\xrightarrow{\mathcal{L}} B^0/q \quad(\text{in } (D,\rho)).$$
Now, from a different point of view, for each $t\in[0,1]$ take the function $f_t: [0,1]\to\mathbb{R}$ defined by $f_t(s) := q^{-1}(t)\,1_{[0,t]}(s)$, $s\in[0,1]$. Then $\alpha_n/q$ can be considered as an empirical process indexed by a class of functions. Indeed, let $\mathscr{F}_q := \{f_t: t\in[0,1]\}$; then for each $t\in[0,1]$
$$\alpha_n(t)/q(t) = \int_0^1f_t(s)\,d\alpha_n(s) =: \alpha_n(f_t).$$
The limiting process in (77) can also be viewed as a mean-zero Gaussian process $\mathbb{G}_\mu$ indexed by $\mathscr{F}_q$ ($\mu$ being here Lebesgue measure on $X = [0,1]$), i.e.,
(78) $$\operatorname{cov}(\mathbb{G}_\mu(f_{t_1}),\mathbb{G}_\mu(f_{t_2})) = \int_0^1f_{t_1}f_{t_2}\,d\mu - \int_0^1f_{t_1}\,d\mu\cdot\int_0^1f_{t_2}\,d\mu.$$
Note that
$$\operatorname{cov}(q^{-1}(t_1)B^0(t_1),\,q^{-1}(t_2)B^0(t_2)) = q^{-1}(t_1)q^{-1}(t_2)\,[t_1\wedge t_2 - t_1t_2] = \int_0^1f_{t_1}f_{t_2}\,d\mu - \int_0^1f_{t_1}\,d\mu\cdot\int_0^1f_{t_2}\,d\mu.$$
Hence (77) is equivalent to
(79) $$(\alpha_n(f))_{f\in\mathscr{F}_q}\xrightarrow{\mathcal{L}}(\mathbb{G}_\mu(f))_{f\in\mathscr{F}_q}.$$
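The identification of $\alpha_n/q$ with an empirical process indexed by $\mathscr{F}_q$ is a one-line computation: $\mu_n(f_t) = q^{-1}(t)F_n(t)$ and $\mu(f_t) = q^{-1}(t)\,t$. The Python sketch below uses the illustrative weight $q(t) = (t(1-t))^{0.4}$ and simulated uniform data, both assumptions of the example. It evaluates $n^{1/2}(\mu_n(f_t) - \mu(f_t))$ for a generic function with supplied integral and compares it with $\alpha_n(t)/q(t)$. It also checks the covariance identity behind (78) by numerically integrating the step functions $f_t$.

```python
import math
import random


def q(t, a=0.4):
    """Illustrative weight q(t) = (t(1-t))^a."""
    return (t * (1.0 - t)) ** a


def f_t(t):
    """The indexing function f_t(s) = q^{-1}(t) 1_{[0,t]}(s), 0 < t < 1."""
    c = 1.0 / q(t)
    return lambda s: c if s <= t else 0.0


def beta_n(f, mu_f, xs):
    """n^{1/2} (mu_n(f) - mu(f)); the integral mu(f) is supplied by the caller."""
    n = len(xs)
    return math.sqrt(n) * (sum(f(x) for x in xs) / n - mu_f)


def alpha_n(t, xs):
    """Uniform empirical process n^{1/2} (F_n(t) - t)."""
    n = len(xs)
    return math.sqrt(n) * (sum(1 for x in xs if x <= t) / n - t)


def integrate(g, steps=20000):
    """Midpoint rule on [0, 1] (Lebesgue measure)."""
    h = 1.0 / steps
    return sum(g((k + 0.5) * h) for k in range(steps)) * h


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.random() for _ in range(100)]
    print(beta_n(f_t(0.3), 0.3 / q(0.3), xs), alpha_n(0.3, xs) / q(0.3))
```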
This leads to the problem of generalizing Dudley's central limit theory from empirical $\mathcal{C}$-processes to the case of EMPIRICAL $\mathscr{F}$-PROCESSES $\beta_n = (\beta_n(f))_{f\in\mathscr{F}}$, defined by
$$\beta_n(f) := n^{1/2}(\mu_n(f) - \mu(f)), \quad f\in\mathscr{F}.$$
Here $\mathscr{F}$ is a given class of measurable functions defined on an arbitrary sample space $(X,\mathcal{B},\mu)$,
$$\mu_n(f) := \int f\,d\mu_n, \qquad \mu(f) := \int f\,d\mu, \qquad f\in\mathscr{F},$$
and $\mu_n$ is the empirical measure based on i.i.d. $\xi_i$'s with values in $X$ and distribution $\mu$ on $\mathcal{B}$.
For uniformly bounded classes of functions such an extension is more or less straightforward. It does not, of course, cover the special case mentioned before (note that $q^{-1}$ tends to $\infty$ at the endpoints of $[0,1]$).
For possibly unbounded classes $\mathscr{F}$, the present state of knowledge, due to recent work of R.M. Dudley (1981a), R.M. Dudley (1981b) and D. Pollard (1981a), is as follows. Let $(X,\mathcal{B},\mu)$ be an arbitrary p-space, and let $\beta_n = (\beta_n(f))_{f\in\mathscr{F}}$ be an empirical $\mathscr{F}$-process with $\mathscr{F}\subset L^2(X,\mathcal{B},\mu)$.

It turns out that the spaces $S_0\equiv U(\mathcal{C},d_\mu)$ and $S\equiv D(\mathcal{C},\mu)$ considered at the beginning of Section 4 have proper extensions to the present case. Those spaces correspond to the special situation $\mathscr{F} = 1_{\mathcal{C}} := \{1_C: C\in\mathcal{C}\}$. In the extension, $d_\mu$ on $\mathcal{C}$ is replaced by
$$e_\mu(f_1,f_2) := \Big(\int_X(f_1 - f_2)^2\,d\mu\Big)^{1/2}, \quad f_1,f_2\in\mathscr{F},$$
or better by
$$\rho_\mu(f_1,f_2) := \Big(\int_X\Big(f_1 - f_2 - \int_X(f_1 - f_2)\,d\mu\Big)^2\,d\mu\Big)^{1/2}, \quad f_1,f_2\in\mathscr{F}.$$
This leads to certain spaces $S_0 = S_0(\mathscr{F},e_\mu)$ and $S = S(\mathscr{F},\mu)$ of functions $\varphi: \mathscr{F}\to\mathbb{R}$. They can be chosen such that, under certain conditions on $\mathscr{F}$, $S_0(\mathscr{F},e_\mu)$ becomes a separable subspace of $(S(\mathscr{F},\mu),\rho)$ and $(\beta_n(f))_{f\in\mathscr{F}}$ has all its sample paths in $S(\mathscr{F},\mu)$. Here, as before, $\rho$ denotes the supremum metric, i.e.,
$$\rho(\varphi_1,\varphi_2) := \sup_{f\in\mathscr{F}}|\varphi_1(f) - \varphi_2(f)| \quad\text{for } \varphi_1,\varphi_2\in S(\mathscr{F},\mu).$$
Now, again under certain measurability assumptions (like (M) or (M ) imposed in the case of empirical $\mathcal{C}$-processes), the setting of a functional limit theorem applies to empirical $\mathscr{F}$-processes $\beta_n = (\beta_n(f))_{f\in\mathscr{F}}$, in the sense of $\mathcal{L}_b$-convergence for random elements in $(S(\mathscr{F},\mu),\mathcal{B}_b(S(\mathscr{F},\mu),\rho))$. That is, one can speak of
(80) $$\beta_n\xrightarrow{\mathcal{L}}\mathbb{G}_\mu,$$
where $\mathbb{G}_\mu\equiv(\mathbb{G}_\mu(f))_{f\in\mathscr{F}}$ is a mean-zero Gaussian process with covariance structure (cf. (78))
$$\operatorname{cov}(\mathbb{G}_\mu(f_1),\mathbb{G}_\mu(f_2)) = \int_Xf_1f_2\,d\mu - \int_Xf_1\,d\mu\cdot\int_Xf_2\,d\mu.$$
If (80) holds true, $\mathscr{F}$ is called a $\mu$-DONSKER CLASS OF FUNCTIONS. Generalizing Theorem A from classes of sets to classes of functions, the main result of R.M. Dudley (1981a) is:
(81) Suppose (M ) holds; here this means that $\beta_n: X^{\mathbb{N}}\to S(\mathscr{F},\mu)$ is measurable from the measure-theoretic completion of $(X^{\mathbb{N}},\mathcal{B}^{\mathbb{N}},\mu^{\mathbb{N}})$ to $(S(\mathscr{F},\mu),\mathcal{B}_b(S(\mathscr{F},\mu),\rho))$. Suppose also that $F := \sup\{|f|: f\in\mathscr{F}\}\in L^p(X,\mathcal{B},\mu)$ for some $p>2$. Assume further that for some $\gamma$ with $0<\gamma<1 - 2/p$ and some $M<\infty$
(E ) $$N_I(\varepsilon,\mathscr{F},\mu)\le\exp(M\varepsilon^{-\gamma}) \quad\text{for } \varepsilon \text{ small enough}.$$
Then $\mathscr{F}$ is a $\mu$-Donsker class.
In this connection, $N_I(\varepsilon,\mathscr{F},\mu)$, a natural extension of $N_I(\varepsilon,\mathcal{C},\mu)$, is defined as the smallest $m\in\mathbb{N}$ with the following property: there are $f_1,\dots,f_m\in L^2(X,\mathcal{B},\mu)$ (not necessarily in $\mathscr{F}$) such that for every $f\in\mathscr{F}$ there exist $j,k\le m$ with $f_j(x)\le f(x)\le f_k(x)$ for all $x\in X$ and $\int_X(f_k - f_j)\,d\mu<\varepsilon$. Note that for $\mathscr{F} = 1_{\mathcal{C}}$ with $\mathcal{C}\subset\mathcal{B}$ one has for any $\mu$
(82) $$N_I(\varepsilon,1_{\mathcal{C}},\mu)\le N_I(\varepsilon,\mathcal{C},\mu)\le 2N_I(\varepsilon,1_{\mathcal{C}},\mu).$$
In fact, as to the first inequality, suppose that $n := N_I(\varepsilon,\mathcal{C},\mu)<\infty$. Then there exist $A_1,\dots,A_n\in\mathcal{B}$ such that for every $C\in\mathcal{C}$ there exist $i,j$ with $A_i\subset C\subset A_j$ and $\mu(A_j\setminus A_i)<\varepsilon$. Take $f_i := 1_{A_i}$, $i = 1,\dots,n$. Then $f_i\in L^2(X,\mathcal{B},\mu)$, and for every $f = 1_C\in 1_{\mathcal{C}}$ we have $f_i\le f\le f_j$ and $\int(f_j - f_i)\,d\mu = \mu(A_j\setminus A_i)<\varepsilon$.

To verify the second inequality, let $m := N_I(\varepsilon,1_{\mathcal{C}},\mu)<\infty$. Then there exist $f_1,\dots,f_m\in L^2(X,\mathcal{B},\mu)$ such that for every $f = 1_C\in 1_{\mathcal{C}}$ there exist $j,k\le m$ with $f_j\le 1_C\le f_k$ and $\int(f_k - f_j)\,d\mu<\varepsilon$. Take as $A_1,\dots,A_{2m}$ all sets of the form $\{f_i>0\}$ and $\{f_i\ge 1\}$, $i = 1,\dots,m$. Then for every $C\in\mathcal{C}$ there exist two of these sets, $A'\subset C\subset A''$, with $\mu(A''\setminus A')<\varepsilon$. In fact, $A' := \{f_j>0\}$ and $A'' := \{f_k\ge 1\}$ serve for this, since $f_k - f_j\ge 1$ on $A''\setminus A'$.
(83) (R.M. Dudley (1981a)): As $p\to\infty$, the condition on $\gamma$ in (81) approaches $\gamma<1$. If $\mathscr{F}$ is a collection of indicator functions of sets, i.e., $\mathscr{F} = 1_{\mathcal{C}}$ for some $\mathcal{C}\subset\mathcal{B}$, then (E2) does imply (E ) for $\mathcal{C}$ (cf. (82)). For $\gamma = 1$ it appears that (81) fails. Specifically, it fails when $\mathscr{F}$ is the collection of indicator functions of convex sets in $\mathbb{R}^3$ and $\mu$ is Lebesgue measure on the unit cube (cf. (63) and its consequences).
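The construction behind the second inequality in (82) turns function brackets into set brackets. The following sketch works on a hypothetical finite sample space with point masses, an assumption made to keep everything exactly computable. Given $f_j\le 1_C\le f_k$, it forms $A' = \{f_j>0\}$ and $A'' = \{f_k\ge 1\}$, and the checks verify $A'\subset C\subset A''$ and $\mu(A''\setminus A')\le\int(f_k - f_j)\,d\mu$.

```python
import random


def set_bracket(f_low, f_up):
    """A' = {f_low > 0} and A'' = {f_up >= 1}, for functions given as lists."""
    lower = {x for x, v in enumerate(f_low) if v > 0}
    upper = {x for x, v in enumerate(f_up) if v >= 1}
    return lower, upper


def random_bracket(C, size, rng):
    """Hypothetical functions with f_low <= 1_C <= f_up pointwise."""
    f_low = [(1.0 if x in C else 0.0) - rng.random() * rng.choice([0.0, 0.5, 2.0])
             for x in range(size)]
    f_up = [(1.0 if x in C else 0.0) + rng.random() * rng.choice([0.0, 0.5, 2.0])
            for x in range(size)]
    return f_low, f_up


if __name__ == "__main__":
    rng = random.Random(0)
    C = {1, 3, 4}
    f_low, f_up = random_bracket(C, 6, rng)
    print(set_bracket(f_low, f_up))
```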
Now, if one tried to infer (79) from (81), there would be the problem of verifying $(E_2)$; on the other hand, the condition on the envelope function $F$ imposed in (81) is rather restrictive, since it forces $q^{-1}$ to be in $L^p(X,\mathcal B,\mu)$ for some $p > 2$ (cf. instead the condition (iii) imposed in the definition of $\mathcal Q$). But, from another point of view, the class $\mathcal F = \{q^{-1}(t)\,1_{[0,t]} : t\in[0,1]\}$ considered in (79) is of the following special form:
$$\mathcal F = \{f_0\, g_t : t\in[0,1]\} \quad\text{with}\quad f_0 = q^{-1} \quad\text{and}\quad g_t(s) := \frac{q(s)}{q(t)}\,1_{[0,t]}(s).$$
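A short computation makes the special form and the remark on the envelope explicit. It is a sketch assuming $0 < q(s) < \infty$ at the points considered; the first line checks that $f_0 g_t$ reproduces the class in (79), the second shows why an envelope $F \in L^p$ forces $q^{-1} \in L^p$:

```latex
\begin{aligned}
f_0(s)\,g_t(s) &= \frac{1}{q(s)}\cdot\frac{q(s)}{q(t)}\,1_{[0,t]}(s) = q^{-1}(t)\,1_{[0,t]}(s),\\
F(s) &= \sup_{t\in[0,1]} q^{-1}(t)\,1_{[0,t]}(s) = \sup_{t\in[s,1]} q^{-1}(t) \;\ge\; q^{-1}(s),
\qquad\text{so}\quad \int F^p\,d\mu < \infty \;\Rightarrow\; \int (q^{-1})^p\,d\mu < \infty .
\end{aligned}
```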
Thus, restricting our attention at this point to weight functions q for which $q^{-1}$ is continuous, monotone decreasing on $(0,1/2)$, symmetric around $s = 1/2$ and such that $q^{-1}(s) \ge \delta > 0$ for all $s\in[0,1]$, there exists some $M < \infty$ such that $\sup_{s\in[0,1]}|g_t(s)| \le M$ for each $t\in[0,1]$, and such that $\{g_t^{-1}((a,b]) : t\in[0,1],\ a<b\}$ forms a Vapnik-Chervonenkis class, since for each $a < b$ the set $g_t^{-1}((a,b])$ consists of one or at most two disjoint intervals $(c,d]$ in $[0,1]$ (cf. FIGURE 5).
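The "one or at most two intervals" property can be checked numerically. The sketch below is only an illustration under an assumed weight function $q(s) = (s(1-s))^{1/4}$, chosen by us because $q^{-1}$ is continuous, decreasing on $(0,1/2]$, symmetric around $1/2$ and bounded below by $\sqrt 2$. It evaluates $g_t$ on a grid and counts the maximal runs of grid points lying in $g_t^{-1}((a,b])$; all function names are hypothetical.

```python
import random


def q(s):
    # assumed weight function (not from the text): q(s) = (s(1-s))^(1/4);
    # 1/q is continuous on (0,1), decreasing on (0,1/2], symmetric around 1/2,
    # and 1/q(s) >= 4**0.25 = sqrt(2) > 0
    return (s * (1.0 - s)) ** 0.25


def g(t, s):
    # g_t(s) = q(s)/q(t) * 1_[0,t](s), for 0 < t < 1
    return q(s) / q(t) if s <= t else 0.0


def count_runs(mask):
    # number of maximal blocks of consecutive True entries
    runs, previous = 0, False
    for m in mask:
        if m and not previous:
            runs += 1
        previous = m
    return runs


def max_runs_of_preimages(n_grid=400, trials=2000, seed=1):
    # largest number of grid "intervals" making up g_t^{-1}((a, b])
    rng = random.Random(seed)
    grid = [i / n_grid for i in range(n_grid + 1)]
    worst = 0
    for _ in range(trials):
        t = rng.uniform(0.01, 0.99)
        a, b = sorted(rng.uniform(-0.5, 3.0) for _ in range(2))
        mask = [a < g(t, s) <= b for s in grid]
        worst = max(worst, count_runs(mask))
    return worst


print(max_runs_of_preimages())  # at most 2
```

The reason is that $s \mapsto g_t(s)$ is unimodal on $[0,t]$ and vanishes on $(t,1]$, so each preimage of $(a,b]$ is an interval minus a subinterval, possibly joined with $(t,1]$ when $a < 0 \le b$.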
Thus, the following result of R.M. Dudley (1981b) gives another way to obtain (79) (for proper weight functions q):

(84) Suppose $\mathcal F = \{f_0\, g : g\in\mathcal G\}$ where, for some constant $M < \infty$ and some (suitably measurable) Vapnik-Chervonenkis class $\mathcal C$,
a) $\mathcal G = \{g : X \to [-M,M],\ g^{-1}((a,b]) \in \mathcal C \ \forall\, a < b\}$, and
b) $f_0 \ge 0$, $f_0$ is measurable and $\mu(\{f_0 > t\}) = O(t^{-2}(\log t)^{-\beta})$ as $t \to \infty$ for some $\beta > 4$;
then $\mathcal F$ is a μ-Donsker class.
FIGURE 5
Note that b), even for $\beta > 1$, implies $f_0 \in L^2(X,\mathcal B,\mu)$. Conversely, the condition on $\mu(\{f_0 > t\})$ is implied by $f_0 \in L^p(X,\mathcal B,\mu)$ for some $p > 2$. Note also that by taking $f_0 \equiv 1$ one obtains Theorem D as a corollary of (84).
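Both remarks follow from the tail-integral formula for $\int f_0^2\,d\mu$ and from Markov's inequality. A sketch, where $K$ and $t_0 > 1$ are constants (our notation) with $\mu(\{f_0 > t\}) \le K t^{-2}(\log t)^{-\beta}$ for $t \ge t_0$, and μ is a probability measure:

```latex
\begin{aligned}
\int f_0^2\,d\mu &= \int_0^\infty 2t\,\mu(\{f_0>t\})\,dt
\le t_0^2 + 2K\int_{t_0}^\infty \frac{dt}{t(\log t)^{\beta}}
= t_0^2 + \frac{2K(\log t_0)^{1-\beta}}{\beta-1} < \infty \qquad (\beta > 1),\\[4pt]
\mu(\{f_0>t\}) &\le t^{-p}\int f_0^p\,d\mu
= t^{-2}(\log t)^{-\beta}\cdot\Big(t^{2-p}(\log t)^{\beta}\int f_0^p\,d\mu\Big),
\qquad t^{2-p}(\log t)^{\beta} \to 0 \quad (t\to\infty,\ p>2).
\end{aligned}
```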
In this context the following result of D. Pollard (1981a) extends a special case of (84) to the case where only $f_0 \in L^2(X,\mathcal B,\mu)$ is assumed:

(85) If $f_0 \in L^2(X,\mathcal B,\mu)$ and if $\mathcal C$ is a Vapnik-Chervonenkis class of sets, then (for a separable version of $G_\mu \equiv (G_\mu(f))_{f\in\mathcal F}$) $\mathcal F := \{f_0 1_C : C\in\mathcal C\}$ is a μ-Donsker class.
(c) STRONG APPROXIMATIONS (cf. Gaenssler-Stute (1979), Section 3, concerning results for the uniform empirical process $\alpha_n$). In a recent paper by R.M. Dudley and W. Philipp (1983), almost sure and in-probability invariance principles are established for sums of independent, not necessarily (Borel-)measurable random elements with values in a not necessarily separable Banach space, such as the closure of $D_0(\mathcal C,\mu)$ in $(\ell^\infty(\mathcal C),\rho)$. This fits readily into the theory of empirical $\mathcal C$-processes $\beta_n = (\beta_n(C))_{C\in\mathcal C}$, now viewed as partial-sum processes
$$\beta_n = n^{-1/2}\sum_{i=1}^n \zeta_i,$$
with $\zeta_i \equiv (\zeta_i(C))_{C\in\mathcal C}$ defined by $\zeta_i(C) := 1_C(\xi_i) - \mu(C)$ (for a given sequence $(\xi_i)_{i\in\mathbb N}$ of random elements in $(X,\mathcal B)$ with distribution μ on $\mathcal B$), having its values in $D_0(\mathcal C,\mu)$. The same viewpoint applies analogously to empirical $\mathcal F$-processes. This approach to strong and weak invariance principles has the advantage that one can bypass most of the problems of measurability and topological characteristics which occurred in our theory of $\mathcal L_b$-convergence, where it was essential to choose proper sample spaces for the processes $\beta_n$ and $G_\mu$, respectively, together with suitable σ-algebras on them on which the laws of $\beta_n$ and $G_\mu$ could be defined. On the other hand, we think that the theory of weak convergence of empirical processes presented here is, at the least, necessary to support Dudley's and Philipp's claim that strong approximation results are strengthened versions of functional central limit theorems.
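The partial-sum representation of $\beta_n$ is easy to check numerically. The sketch below assumes, for illustration only, $X = [0,1]$ with μ the uniform distribution and the finite family $\{[0,t] : t = 0, 0.1, \dots, 1\}$ in place of a general $\mathcal C$; it compares $n^{-1/2}\sum_{i=1}^n \zeta_i(C)$ with $n^{1/2}(\mu_n(C) - \mu(C))$, where $\mu_n$ is the empirical measure. Function names are hypothetical.

```python
import math
import random


def zeta(x, t):
    # ζ(C) = 1_C(x) - μ(C) for C = [0, t] and μ = uniform distribution on [0, 1]
    return (1.0 if x <= t else 0.0) - t


def beta_as_partial_sum(sample, ts):
    # β_n(C) = n^{-1/2} Σ_{i=1}^n ζ_i(C), one value per C = [0, t]
    n = len(sample)
    return [sum(zeta(x, t) for x in sample) / math.sqrt(n) for t in ts]


def beta_from_empirical_measure(sample, ts):
    # the same process written as n^{1/2} (μ_n(C) - μ(C))
    n = len(sample)
    return [math.sqrt(n) * (sum(1 for x in sample if x <= t) / n - t) for t in ts]


rng = random.Random(42)
sample = [rng.random() for _ in range(500)]  # ξ_1, ..., ξ_n i.i.d. uniform on [0, 1)
ts = [k / 10 for k in range(11)]
diff = max(abs(u - v) for u, v in zip(beta_as_partial_sum(sample, ts),
                                       beta_from_empirical_measure(sample, ts)))
print(diff)  # agrees up to floating-point rounding
```

Viewing $\beta_n$ as a normalized sum of the i.i.d. centered elements $\zeta_i$ is what lets Banach-space invariance principles be applied directly.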
REFERENCES
ANDERSON, T.W. and DARLING, D.A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Statist. 23 193-212.
BAUER, H. (1978). Wahrscheinlichkeitstheorie und Grundzüge der Maßtheorie, 3rd edition. Walter de Gruyter, Berlin-New York.
BENNETT, G. (1962). Probability inequalities for the sum of independent random variables. J. Amer. Statist. Assoc. 57 33-45.
BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New York.
BILLINGSLEY, P. (1971). Weak Convergence of Measures: Applications in Probability. Regional Conference Series in Appl. Math., No. 5. SIAM, Philadelphia.
BOLTHAUSEN, E. (1978). Weak convergence of an empirical process indexed by the closed convex subsets of $I^2$. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 43 173-181.
BORISOV, I.S. (1981). Some limit theorems for empirical distributions. Abstracts of Reports, Third Vilnius Conference on Probability Theory and Math. Statistics I 71-72.
BREIMAN, L. (1968). Probability. Addison-Wesley, Reading.
BRONŠTEIN, E.M. (1976). ε-entropy of convex sets and functions. Siberian Math. J. (English translation) 17 393-398.
CHERNOFF, H. (1964). Estimation of the mode. Ann. Inst. Statist. Math. 16 31-41.
CHIBISOV, D.M. (1965). An investigation of the asymptotic power of tests of fit. Theor. Probability Appl. 10 421-437.
COVER, T.M. (1965). Geometric and statistical properties of systems of linear inequalities with applications to pattern recognition. IEEE Trans. Elec. Comp. EC-14 326-334.
DEVROYE, L. (1982). Bounds for the uniform deviation of empirical measures. J. Multiv. Anal. 12 72-79.
DONSKER, M.D. (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 23 277-281.
DOOB, J.L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 20 393-403.
DUDLEY, R.M. (1966). Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math. 10 109-126.
DUDLEY, R.M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Analysis 1 290-330.
DUDLEY, R.M. (1968). Distances of probability measures and random variables. Ann. Math. Statist. 39 1563-1572.
DUDLEY, R.M. (1973). Sample functions of the Gaussian process. Ann. Probability 1 66-103.
DUDLEY, R.M. (1974). Metric entropy of some classes of sets with differentiable boundaries. J. Approximation Theory 10 227-236.
DUDLEY, R.M. (1976). Probabilities and Metrics. Convergence of laws on metric spaces, with a view to statistical testing. Aarhus Lecture Notes Series No. 45.
DUDLEY, R.M. (1978). Central limit theorems for empirical measures. Ann. Probability 6 899-929.
DUDLEY, R.M. (1979). Balls in $\mathbb R^k$ do not cut all subsets of $k+2$ points. Adv. in Math. 31 306-308.
DUDLEY, R.M. (1979a). Lower layers in $\mathbb R^2$ and convex sets in $\mathbb R^3$ are not GB classes. Springer Lecture Notes in Math. 709 97-102.
DUDLEY, R.M. (1981a). Donsker classes of functions. Statistics and Related Topics (Proc. Symp. Ottawa, 1980), North-Holland, New York-Amsterdam, 341-352.
DUDLEY, R.M. (1981b). Vapnik-Chervonenkis-Donsker classes of functions. Aspects statistiques et aspects physiques des processus gaussiens (Proc. Colloque C.N.R.S. St. Flour, 1980), C.N.R.S. Paris, 251-269.
DUDLEY, R.M. (1982). Empirical and Poisson processes on classes of sets or functions too large for central limit theorems. Z. Wahrscheinlichkeitstheorie verw. Gebiete 61 355-368.
DUDLEY, R.M. and PHILIPP, W. (1983). Invariance principles for sums of Banach space valued random elements and empirical processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 62 509-552.
DURBIN, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. Regional Conference Series in Appl. Math., No. 9. SIAM, Philadelphia.
DURST, M. and DUDLEY, R.M. (1980). Empirical processes, Vapnik-Chervonenkis classes and Poisson processes. Probability and Mathematical Statistics (Wrocław) 1, Fasc. 2, 109-115.
FINKELSTEIN, H. (1971). The law of the iterated logarithm for empirical distributions. Ann. Math. Statist. 42 607-615.
GAENSSLER, P. and STUTE, W. (1977). Wahrscheinlichkeitstheorie. Springer, Berlin-Heidelberg-New York.
GAENSSLER, P. and STUTE, W. (1979). Empirical processes: a survey of results for independent and identically distributed random variables. Ann. Probability 7 193-243.
GAENSSLER, P. (1981). On certain properties of convex sets in Euclidean spaces with probabilistic implications. Unpublished manuscript.
GAENSSLER, P. and WELLNER, J.A. (1981). Glivenko-Cantelli theorems. To appear in the Encyclopedia of Statistical Sciences, Volume 3.
GAENSSLER, P. (1983). Limit theorems for empirical processes indexed by classes of sets allowing a finite-dimensional parametrization. Probability and Mathematical Statistics (Wrocław), Vol. IV, Fasc. 1.
HARDING, E.F. (1967). The number of partitions of a set of N points in k dimensions induced by hyperplanes. Proc. Edinburgh Math. Soc. (Ser. II) 15 285-298.
HEWITT, E. (1947). Certain generalizations of the Weierstrass approximation theorem. Duke Math. J. 14 419-427.
HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30.
KELLEY, J.L. (1961). General Topology. D. van Nostrand Comp., Inc., Princeton, N.J.
KIRSZBRAUN, M.D. (1934). Über die zusammenziehende und Lipschitzsche Transformationen. Fund. Math. 22 77-108.
KUELBS, J. and DUDLEY, R.M. (1980). Log log laws for empirical measures. Ann. Probability 8 405-418.
McSHANE, E.J. (1934). Extension of range of functions. Bull. Amer. Math. Soc. 40 837-842.
POLLARD, D. (1979). Weak convergence on non-separable metric spaces. J. Austral. Math. Soc. (Ser. A) 28 197-204.
POLLARD, D. (1981). Limit theorems for empirical processes. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 57 181-195.
POLLARD, D. (1981a). A central limit theorem for empirical processes. To appear in J. Austral. Math. Soc.
PROHOROV, Yu.V. (1956). Convergence of random processes and limit theorems in probability theory. Theor. Probability Appl. 1 157-214.
PYKE, R. and SHORACK, G. (1968). Weak convergence of a two-sample empirical process and a new approach to Chernoff-Savage theorems. Ann. Math. Statist. 39 755-771.
PYKE, R. (1977). The Haar-function construction of Brownian motion indexed by sets. Technical Report 35, Dept. of Mathematics, University of Washington, Seattle.
PYKE, R. (1979). Recent developments in empirical processes. Adv. Appl. Prob. 11 267-268.
PYKE, R. (1982). The Haar-function construction of Brownian motion indexed by sets. Technical Report 18, Dept. of Statistics, University of Washington, Seattle.
PYKE, R. (1982a). A uniform central limit theorem for partial-sum processes indexed by sets. Technical Report 17, Dept. of Statistics, University of Washington, Seattle.
RICHTER, H. (1974). Das Gesetz vom iterierten Logarithmus für empirische Verteilungsfunktionen im $\mathbb R^k$. Manuscripta Math. 11 291-303.
SAUER, N. (1972). On the density of families of sets. J. Comb. Theory (A) 13 145-147.
SCHLÄFLI, L. (1901, posth.). Theorie der vielfachen Kontinuität. In Gesammelte Math. Abhandlungen I; Birkhäuser, Basel, 1950.
SERFLING, R.J. (1974). Probability inequalities for the sum in sampling without replacement. Ann. Statist. 2 39-48.
SHORACK, G.R. (1979). The weighted empirical process of row independent random variables with arbitrary distribution functions. Statistica Neerlandica 33 169-189.
SKOROKHOD, A.V. (1956). Limit theorems for stochastic processes. Theor. Probability Appl. 1 261-290.
STEELE, M. (1975). Combinatorial Entropy and Uniform Limit Laws. Ph.D. dissertation, Stanford University.
STEELE, M. (1978). Empirical discrepancies and subadditive processes. Ann. Probability 6 118-127.
STEINER, J. (1826). Einige Gesetze über die Theilung der Ebene und des Raumes. J. Reine Angew. Math. 1 349-364.
STUTE, W. (1982). The oscillation behavior of empirical processes. Ann. Probability 10 86-107.
SUN, T.G. (1977). Ph.D. dissertation, Dept. of Mathematics, University of Washington, Seattle.
SUN, T.G. and PYKE, R. (1982). Weak convergence of empirical processes. Technical Report 9, Dept. of Statistics, University of Washington, Seattle.
TALAGRAND, M. (1978). Les boules peuvent-elles engendrer la tribu borélienne d'un espace métrisable non séparable? Communication au Séminaire Choquet, Paris.
VALENTINE, F.A. (1964). Convex Sets. McGraw-Hill, New York.
VAPNIK, V.N. and CHERVONENKIS, A.Ya. (1971). On uniform convergence of the frequencies of events to their probabilities. Theor. Probability Appl. 16 264-280.
WATSON, D. (1969). On partitions of n points. Proc. Edinburgh Math. Soc. 16 263-264.
WEGMAN, E.J. (1971). A note on the estimation of the mode. Ann. Math. Statist. 42 1909-1915.
WELLNER, J.A. (1977). A Glivenko-Cantelli theorem and strong laws of large numbers for functions of order statistics. Ann. Statist. 5 473-480; Correction, ibid. 6 1394.
WENOCUR, R.S. and DUDLEY, R.M. (1981). Some special Vapnik-Chervonenkis classes. Discrete Math. 33 313-318.
WICHURA, M.J. (1968). On the weak convergence of non-Borel probabilities on a metric space. Ph.D. dissertation, Columbia University.
WICHURA, M.J. (1970). On the construction of almost uniformly convergent random variables with given weakly convergent image laws. Ann. Math. Statist. 41 284-291.
NOTATION INDEX
$|A|$, 21
$A^\circ$, 47
$\partial A$, 47
$\alpha_n \equiv (\alpha_n(t))_{t\in[0,1]}$, 100
$\mathcal B(D,s)$, 92
$\mathcal B(D,\rho)$, 91
$\mathcal B(C,\rho)$, 93
$\mathcal B_b(S) \equiv \mathcal B_b(S,d)$, 41
$\mathcal B(S)$, 41
$\mathcal B_b(S,\rho)$, $\mathcal B(S,\rho)$ ($S = D_0(\mathcal C,\mu)$), 107
$B(x,r)$, 41
$B^\circ(x,r)$, 67
$C \equiv C[0,1]$, 93
$C_b(S)$, 42
$\mathrm{co}(F)$, 38
$D \equiv D[0,1]$, 90
$D_0(\mathcal C,\mu)$, 106
$D_n^F$, 12
$D_n^F(q)$, 17
$D(f)$, 43
$d := \max(d',d'')$, 70
$d(\xi_n,\eta_n)$, 65
$d_\mu$, 106
$\mathrm{diam}(A)$, 83
$(E_0)$:, 111
$\varepsilon_x(B)$, 1
$F(S)$, 47
$G(S)$, 47
$\int^* f\,d\mu$, 43
$\int_* f\,d\mu$, 43
$K_n(s,t)$, 140
$K(s,t)$, 143
$(M)$:, 107
$(M_0)$:, 131
$M(S)$, 43
$\mu^*(A)$, 43
$N(\varepsilon,\mathcal C,\mu)$, 111
$N_I(\varepsilon,\mathcal C,\mu)$, 112
$N_I(\varepsilon,\mathcal F,\mu)$, 162
$N_n(B)$, 3
$P^*$, 65
$\mathcal P(X)$, 21
$\pi_t$, 91
$\pi_C$, 107
$\rho$ = supremum metric, 90
$\rho_q$, 148
$s$ = Skorokhod metric, 92
$(SE)$:, 108
$\sigma(\{d(\cdot,x) : x\in S\})$, 43
$\sigma(\{\pi_t : t\in[0,1]\})$, 91
$U_n$, $U_n(t)$, 18
$\mathcal U_b(\mathcal C,d_\mu)$, 106
$V(\mathcal C)$, 22
$W$, 143
$\|f\| := \sup_{x\in S}|f(x)|$, 59
$\|x\| := \sup_{t\in[0,1]}|x(t)|$ ($x\in D[0,1]$), 95
$(\xi,\eta)$, 71

USED ABBREVIATIONS:
ad( ): = as to the proof of ( ):
a.e. = almost everywhere
CLT = Central Limit Theorem
df = distribution function
fidis = finite dimensional distributions
GCC = Glivenko-Cantelli class
P-a.s. = P-almost surely
rest$_A$ f = restriction of f to A
r.h.s. = right hand side
SLLN = Strong Law of Large Numbers
VCC = Vapnik-Chervonenkis class
w.r.t. = with respect to
SUBJECT INDEX
analytic generator, 77
Bernstein-type inequality, 8, 9
Borel extension of a separable measure μ on $\mathcal B_b(S)$, 44
Brownian bridge, 101
canonical model, 15
characterization theorem for μ-Donsker classes, 113
Chernoff-type estimates of the mode, 15
compactness, 58
compact net in $M_a(S)$, 63
continuous mapping theorems, 54
convergence in law ($\mathcal L_b$-convergence, 65; $\mathcal L$-convergence, 93)
convergence theorem for reversed submartingales, 10
Cramér-type result, 65
Devroye inequality, 34
distribution-free statistic, 18
Donsker classes of functions, 159, 162
Donsker's functional CLT for the uniform empirical process, 101
d-strictly separated, 77
δ-tight, 58
δ-tight w.r.t. S, 76
empirical $\mathcal C$-discrepancy, 9
empirical $\mathcal C$-process, 8, 106
empirical distribution function (empirical df), 12
empirical $\mathcal F$-process, 160
empirical measure, 1
existence of a version of $G_\mu = (G_\mu(C))_{C\in\mathcal C}$ in $S \equiv \mathcal U_b(\mathcal C,d_\mu)$, 110, 111, 114
finite dimensional distributions (fidis), 95, 110
functional CLT's for empirical $\mathcal C$-processes, 105
functional CLT's for weighted empirical processes, 139
functional CLT's for weighted empirical processes w.r.t. $\rho_q$-metrics, 150
functional laws of the iterated logarithm, 158
Gaussian process, 2, 110, 143, 151, 160
Glivenko-Cantelli class (GCC), 36, 39, 137
Glivenko-Cantelli convergence, 13, 15, 18
Glivenko-Cantelli theorem, 12, 15, 21
growth function pertaining to $\mathcal C$, 22
identification of limits, 52
$\mathcal L$-convergence, 93
$\mathcal L_b$-convergence, 65
lower left orthants in $X = \mathbb R^k$, 2, 108, 129, 159
Markov property of empirical measures, 3
martingale property of empirical measures, 6
martingale property of the weighted empirical process, 152
measurability, 107, 131
metric entropy, 111
metric entropy with inclusion, 112
μ-Donsker class, 113
μ-Donsker class of functions, 162
Poissonization, 7
Portmanteau theorem, 47
product spaces, 69
p-space, 1
quantile transformation, 12
Radon's theorem, 38
random change of time, 102
random element in X, 1
randomized discrepancy, 10
relatively $\mathcal L$-sequentially compact, 95
relatively $\mathcal L_b$-sequentially compact, 95
reversed martingale, 9
reversed submartingale, 9, 10
$\rho_q$-metric, 148
scores, 140
separable measures on $\mathcal B_b(S)$, 44
separation property, 38
sequential compactness, 74
shattered by $\mathcal C$, 22
Skorokhod-Dudley-Wichura Representation Theorem, 82
Skorokhod metric s, 92
smoothed version of empirical processes, 138
special versions, 12, 17
Strassen log log class, 159
strong approximation, 164
supremum metric ρ, 90, 106
tight measures on $\mathcal B_b(S)$, 45
trace σ-algebra, 14
uniform empirical process, 100
Vapnik-Chervonenkis class (VCC), 22
Vapnik-Chervonenkis inequalities, 27, 34
Vapnik-Chervonenkis lemma, 24
version, 111
weak convergence and mappings, 53
weak convergence criteria, 58
weak convergence of random elements in $D \equiv D[0,1]$ w.r.t. $\rho_q$-metrics, 147
weighted discrepancy, 17
weighted empirical process, 140
weight function, 17