sup_{f∈𝔉} Q({x : ω_f(x : ε) > δ}) ≤ Q({x : sup_{f∈𝔉} ω_f(x : ε) > δ}).† (2.49)

For every positive δ the right side goes to zero as ε ↓ 0. Therefore, by Theorem 2.4, 𝔉 is a Q-uniformity class. Necessity of (i) also follows immediately from Theorem 2.4. To prove the necessity of (ii), assume that there exist a positive number δ and a point x₀ in S such that

sup_{f∈𝔉} ω_f(x₀ : ε) > δ  for all ε > 0.

This implies the existence of a sequence {x_n} of points in S converging to x₀ and a sequence of functions {f_n} ⊂ 𝔉 such that

|f_n(x_n) − f_n(x₀)| > δ/2  (n = 1, 2, ...).

Let Q = δ_{x₀} and Q_n = δ_{x_n} (n = 1, 2, ...), where δ_x denotes the probability measure degenerate at x. Clearly, {Q_n} converges weakly to Q, but

|∫ f_n dQ_n − ∫ f_n dQ| = |f_n(x_n) − f_n(x₀)| > δ/2  (n = 1, 2, ...).

Hence 𝔉 is not a Q-uniformity class. Q.E.D.
We have previously remarked that the weak topology on the set 𝒫 of all probability measures on a separable metric space is metrizable, and the Prokhorov distance d_P metrizes it. Another interesting metrization is provided by the next corollary. For every pair of positive numbers c, d, define a class L(c, d) of Lipschitzian functions on S by

L(c, d) = {f : ω_f(S) ≤ c, |f(x) − f(y)| ≤ d ρ(x, y) for all x, y ∈ S}. (2.50)

†The set {x : sup{ω_f(x : ε) : f ∈ 𝔉} > δ} = ∪{{x : ω_f(x : ε) > δ} : f ∈ 𝔉} is open (see Section 11) and, therefore, measurable.
Now define the bounded Lipschitzian distance d_BL by

d_BL(Q₁, Q₂) = sup_{f∈L(1,1)} |∫ f dQ₁ − ∫ f dQ₂|  (Q₁, Q₂ ∈ 𝒫). (2.51)
COROLLARY 2.8. If S is a separable metric space, then d_BL metrizes the weak topology on 𝒫.

Proof. By Corollary 2.7, L(1,1) is a Q-uniformity class for every probability measure Q on S. Hence if {Q_n} is a sequence of probability measures converging weakly to Q, then

lim_{n→∞} d_BL(Q_n, Q) = 0. (2.52)
Conversely, suppose (2.52) holds. We shall show that {Q_n} converges weakly to Q. It follows from (2.52) that

lim_{n→∞} |∫ f dQ_n − ∫ f dQ| = 0  for every bounded Lipschitzian function f. (2.53)

For, if |f(x) − f(y)| ≤ d ρ(x, y) for all x, y ∈ S, then

∫ f dQ_n − ∫ f dQ = c (∫ f′ dQ_n − ∫ f′ dQ),

where c = max{ω_f(S), d} and f′ = f/c ∈ L(1,1). Let F be any nonempty closed subset of S. We now prove

lim sup_{n→∞} Q_n(F) ≤ Q(F). (2.54)
For ε > 0 define the real-valued function f_ε on S by

f_ε(x) = φ(ε^{−1} ρ(x, F))  (x ∈ S), (2.55)
where φ is defined on [0, ∞) by

φ(t) = 1 − t  if 0 ≤ t ≤ 1,  φ(t) = 0  if t > 1. (2.56)
Note that f_ε is, for every positive ε, a bounded Lipschitzian function satisfying

ω_{f_ε}(S) ≤ 1,  |f_ε(x) − f_ε(y)| ≤ ε^{−1} |ρ(x, F) − ρ(y, F)| ≤ ε^{−1} ρ(x, y)  (x, y ∈ S),

so that, by (2.53),

lim_{n→∞} ∫ f_ε dQ_n = ∫ f_ε dQ  (ε > 0). (2.57)

Since I_F ≤ f_ε for every positive ε,

lim sup_{n→∞} Q_n(F) ≤ lim_{n→∞} ∫ f_ε dQ_n = ∫ f_ε dQ  (ε > 0). (2.58)
Also, lim_{ε↓0} f_ε(x) = I_F(x) for all x in S. Hence

lim_{ε↓0} ∫ f_ε dQ = Q(F). (2.59)
By Theorem 1.1, {Q_n} converges weakly to Q. Finally, it is easy to check that d_BL is a distance function on 𝒫. Q.E.D.

Remark. The distance d_BL may be defined on the class 𝔐 of all finite signed measures on S by letting Q₁, Q₂ be finite signed measures in (2.51). The function µ → d_BL(µ, 0) is a norm on the vector space 𝔐. The topology induced by this norm is, in general, weaker than the one induced by the variation norm (1.5). It should also be pointed out that the proof of Corollary 2.8 presupposes metrizability of the weak topology on 𝒫 and merely provides a suitable metric as an alternative to the Prokhorov distance d_P defined by (1.16). This justifies the use of sequences (rather than nets) in the proof.
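The truncation φ and the smoothed indicator f_ε of (2.55) and (2.56) are easy to experiment with numerically. A minimal sketch, assuming S = R with ρ(x, y) = |x − y| and F a finite set (illustrative choices only):

```python
def phi(t):
    # phi(t) = 1 - t on [0, 1], 0 for t > 1  (cf. (2.56))
    return max(0.0, 1.0 - t)

def f_eps(x, F, eps):
    # f_eps(x) = phi(rho(x, F) / eps): equals 1 on F, vanishes off F^eps
    dist = min(abs(x - z) for z in F)
    return phi(dist / eps)

F = [0.0, 2.0]          # a closed set consisting of two points
eps = 0.5

print(f_eps(0.0, F, eps))   # on F: 1.0
print(f_eps(3.0, F, eps))   # outside F^eps: 0.0
# Lipschitz constant at most 1/eps, checked on a grid
grid = [i * 0.01 for i in range(-100, 400)]
lip = max(abs(f_eps(a, F, eps) - f_eps(b, F, eps)) / (b - a)
          for a, b in zip(grid, grid[1:]))
print(lip <= 1.0 / eps + 1e-9)  # True
```

The function interpolates linearly between the indicator of F and 0 on F^ε, which is exactly what makes the bound (2.58) work.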
One can construct many interesting examples of uniformity classes beyond those provided by Corollaries 2.7 and 2.8. We give one example now. The rest of this section will be devoted to another example (Theorem 2.11) of considerable interest from our point of view.
Example. Let S = R². Let 𝔇(l) be the class of all Borel-measurable subsets of R², each having a boundary contained in some rectifiable curve† of length not exceeding a given positive number l. We now show that 𝔇(l) is a Q-uniformity class for every probability measure Q that is absolutely continuous with respect to the Lebesgue measure λ₂ on R². Let A ∈ 𝔇(l) and let ∂A ⊂ J, where J is a rectifiable curve of length l. There exist k points z₀, z₁, ..., z_{k−1} on J such that (i) z₀ and z_{k−1} are the end points of J (they may coincide), (ii) k ≤ l/(2ε) + 2, and (iii) every point of J is within a distance ε of some z_i. Hence (∂A)^ε ⊂ ∪_i B(z_i : 2ε) and

λ₂((∂A)^ε) ≤ kπ(2ε)² ≤ (l/(2ε) + 2)π(2ε)² = 2πlε + 8πε²  [A ∈ 𝔇(l)]. (2.60)

Let Q be a probability measure that is absolutely continuous with respect to λ₂. Then (2.60) implies (in view of Theorem A.3.1)

lim_{ε↓0} (sup{Q((∂A)^ε) : A ∈ 𝔇(l)}) = 0. (2.61)
By Corollary 2.6, 𝔇(l) is a Q-uniformity class.

We need some preparation before proving the next theorem. Let S be a metric space. Define the Hausdorff distance Δ between two closed bounded subsets A, B of S by

Δ(A, B) = inf{ε : ε > 0, A ⊂ B^ε, B ⊂ A^ε}. (2.62)
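For finite subsets of R², the infimum in (2.62) is attained and reduces to the familiar max-min formula Δ(A, B) = max(max_a min_b ρ(a, b), max_b min_a ρ(a, b)); a small sketch (not from the text):

```python
import math

def hausdorff(A, B):
    # Delta(A, B) = inf{eps > 0 : A in B^eps and B in A^eps}; for finite
    # point sets this equals the larger of the two directed max-min distances.
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (1.0, 1.0)]
print(hausdorff(A, B))  # 1.0: the point (1, 1) is at distance 1 from A
```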
The class ℋ of all closed bounded subsets of S is a metric space with metric Δ.

LEMMA 2.9. Let 𝔪 be a compact subset of ℋ. Then for every probability measure Q on S one has

lim_{ε↓0} sup{Q(M^ε) : M ∈ 𝔪} = sup{Q(M) : M ∈ 𝔪}. (2.63)
Proof. Let η be a given positive number. For every M ∈ 𝔪 there exists a positive number ε_M such that

Q(M^{ε_M}) ≤ Q(M) + η,

since M^ε ↓ M as ε ↓ 0. By compactness of 𝔪, there exists a finite collection of sets {M₁, ..., M_r} in 𝔪 such that every M in 𝔪 is within a Δ-distance ε_{M_i}/2 of M_i for some i, 1 ≤ i ≤ r. If Δ(M, M_i) < ε_{M_i}/2, then M^ε ⊂ M_i^{ε_{M_i}} for every ε < ε_{M_i}/2. Hence

sup{Q(M^ε) : M ∈ 𝔪} ≤ max{Q(M_i^{ε_{M_i}}) : 1 ≤ i ≤ r} ≤ sup{Q(M) : M ∈ 𝔪} + η

for every ε < min{ε_{M_i}/2 : 1 ≤ i ≤ r}. Since η is arbitrary, (2.63) follows.

†A rectifiable curve in R² is a subset of R² of the form {z(t) : 0 ≤ t ≤ 1}, where t → z(t) = (x(t), y(t)) is a continuous function of bounded variation on [0,1] into R²; z(0), z(1) are called the endpoints of the curve.
Q.E.D.

A subset C of R^k is convex if αx₁ + (1 − α)x₂ ∈ C for all x₁, x₂ ∈ C and all α ∈ [0,1]. A hyperplane H in R^k is a set of the form

H = {x : ⟨u, x⟩ = c}, (2.64)

where c is a real number and u is a unit vector of R^k; that is, ‖u‖ = 1, and ⟨·,·⟩ denotes the euclidean inner product

⟨u, x⟩ = Σ_{i=1}^k u_i x_i,  ‖u‖ = (Σ_{i=1}^k u_i²)^{1/2}  [u = (u₁,...,u_k), x = (x₁,...,x_k) ∈ R^k]. (2.65)

A closed half space E is a set of the form

E = {x : x ∈ R^k, ⟨u, x⟩ ≤ c}  [c ∈ R¹, u ∈ R^k, ‖u‖ = 1]. (2.66)

A hyperplane H = {y : ⟨u, y⟩ = c} is said to be a supporting hyperplane for a set A at x ∈ A if

⟨u, x⟩ = c,  A ⊂ {z : ⟨u, z⟩ ≤ c}; (2.67)

that is, if the hyperplane H passes through the point x of A and has A on one side of it. Note that the sign ≤ in (2.66) and (2.67) may be replaced by ≥
(by merely changing u to −u). It is a well-known fact that if A is a compact convex set, then there exists a supporting hyperplane for A at each x ∈ ∂A.†

LEMMA 2.10. Let A, B be two closed, bounded, convex subsets of R^k. Then

Δ(A, B) = Δ(∂A, ∂B). (2.68)
Proof. Suppose Δ(∂A, ∂B) = ε. Let ε′ be any number larger than ε. Let x ∈ A. There exist x₁, x₂ ∈ ∂A and α ∈ [0,1] such that x = αx₁ + (1 − α)x₂, since the intersection of A with any line through x is a closed line segment whose end points (x₁, x₂, say) lie on ∂A. Let y₁, y₂ ∈ ∂B be such that ‖x_i − y_i‖ < ε′ for i = 1, 2. Then letting y = αy₁ + (1 − α)y₂ yields y ∈ B and

‖x − y‖ ≤ α‖x₁ − y₁‖ + (1 − α)‖x₂ − y₂‖ < ε′.

Thus A ⊂ B^{ε′}. Similarly B ⊂ A^{ε′}. Since this is true for every ε′ > ε,

Δ(A, B) ≤ Δ(∂A, ∂B). (2.69)
To prove the opposite inequality, suppose that ε > Δ(A, B) and let η be any positive number. Let x ∈ ∂A. Let {z : ⟨l, z⟩ = c} be a supporting hyperplane for A at x. Then the half space H = {z : ⟨l, z⟩ ≤ c + ε} contains A^ε: if z ∈ A and ‖z′ − z‖ < ε, then

⟨l, z′⟩ = ⟨l, z⟩ + ⟨l, z′ − z⟩ ≤ c + ‖z′ − z‖ < c + ε.

Hence H ⊃ A^ε ⊃ B. The ball B(x : ε + η) intersects R^k \ H. This is because the point x + (ε + η/2)l of this ball satisfies

⟨l, x + (ε + η/2)l⟩ = ⟨l, x⟩ + (ε + η/2) = c + ε + η/2,

and therefore lies in the complement of H. It follows that B(x : ε + η) intersects R^k \ B. But, since A ⊂ B^ε and x ∈ A, B(x : ε + η) certainly intersects B. It follows that B(x : ε + η) intersects ∂B, so that (∂B)^{ε+η} ⊃ ∂A. Similarly (∂A)^{ε+η} ⊃ ∂B. Therefore

Δ(∂A, ∂B) ≤ ε + η

for every ε > Δ(A, B) and every positive η. Hence

Δ(∂A, ∂B) ≤ Δ(A, B). (2.70)

†Eggleston [1], p. 20.
The inequalities (2.69) and (2.70) together yield (2.68). Q.E.D.

THEOREM 2.11. Let 𝒞 denote the class of all Borel-measurable convex subsets of R^k. Let Q be a probability measure on R^k. The class 𝒞 is a Q-uniformity class if and only if it is a Q-continuity class, that is, if and only if

Q(∂C) = 0  for all C ∈ 𝒞. (2.71)

Proof. If 𝒞 is a Q-uniformity class, then, by relation (2.41) in Corollary 2.6, 𝒞 is a Q-continuity class. Suppose, conversely, that (2.71) holds. Since the characterization (2.41) of Q-uniformity is in terms of boundaries, and since ∂C = ∂(Cl(C)) for every convex set C [note that Cl(C) = Int(C) ∪ ∂C and that ∂C has empty interior for convex C], it follows that 𝒞 is a Q-uniformity class if and only if 𝒞̄ = {Cl(C) : C ∈ 𝒞} is. Given η > 0, let r be so chosen that

Q({x : ‖x‖ > r}) < η/2. (2.72)
Write 𝒞_r = {C : C ∈ 𝒞̄, C ⊂ Cl(B(0 : r))}. By a well-known theorem of Blaschke,† 𝒞_r is compact in the Hausdorff metric Δ defined by (2.62). Lemma 2.10 shows that the compactness of 𝒞_r is equivalent to the compactness of {∂C : C ∈ 𝒞_r}. Lemma 2.9 and the hypothesis (2.71) now yield

lim_{ε↓0} sup{Q((∂C)^ε) : C ∈ 𝒞_r} = 0. (2.73)

By Corollary 2.6, 𝒞_r is a Q-uniformity class. Let {Q_n} be a sequence of probability measures converging weakly to Q. Then

lim sup_{n→∞} (sup{|Q_n(C) − Q(C)| : C ∈ 𝒞̄}) ≤ lim sup_{n→∞} (sup{|Q_n(C) − Q(C)| : C ∈ 𝒞_r}) + lim sup_{n→∞} Q_n({x : ‖x‖ > r}) + Q({x : ‖x‖ > r}) = 2Q({x : ‖x‖ > r}) < η. (2.74)
Note that the last equality in (2.74) follows from the fact that {x : ‖x‖ > r} is a Q-continuity set, since its complement Cl(B(0 : r)) is a closed convex set to which (2.71) applies [although, given any probability measure Q and a positive η, one can always find r such that {x : ‖x‖ > r} is a Q-continuity set and (2.72) holds]. Since η is an arbitrary positive number, it follows from (2.74) that 𝒞̄ is a Q-uniformity class. Consequently, 𝒞 is a Q-uniformity class. Q.E.D.
Remark. It follows from the above theorem that if Q is absolutely continuous with respect to Lebesgue measure on R^k, then 𝒞 is a Q-uniformity class. In particular, 𝒞 is a Φ-uniformity class, Φ being the standard normal distribution in R^k. We shall obtain a refinement of this last statement in the next section. It is easy to see from the proof of Theorem 2.11 that the class 𝒞 in its statement may be replaced by any class 𝒜 of Borel-measurable convex sets with the property that

𝒜_r = {Cl(A) ∩ Cl(B(0 : r)) : A ∈ 𝒜}  is compact in the Hausdorff metric for all positive r.

In particular, by letting

𝒜 = {(−∞, x₁] × (−∞, x₂] × ··· × (−∞, x_k] : x = (x₁, ..., x_k) ∈ R^k},

we get Polya's result: Let P be a probability measure on R^k whose distribution function F, defined by

F(x) = P((−∞, x₁] × ··· × (−∞, x_k])  [x = (x₁, ..., x_k) ∈ R^k], (2.75)

is continuous on R^k. If a sequence of probability measures {P_n} with distribution functions {F_n} converges weakly to P, then

sup{|F_n(x) − F(x)| : x ∈ R^k} → 0 (2.76)

as n → ∞. The left side of (2.76) is sometimes called the Kolmogorov distance between P_n and P. The converse of this result is also true: if (2.76) holds, then {P_n} converges weakly to P. In fact, if {F_n} converges to F at all points of continuity of F, then {P_n} converges weakly to P.†
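For an empirical distribution function F_n and a continuous F on R¹, the supremum in (2.76) is attained at a jump of F_n, so the Kolmogorov distance can be computed exactly; a sketch with F the uniform distribution function on [0, 1] (an illustrative choice):

```python
def kolmogorov_distance(sample, F):
    # sup_x |F_n(x) - F(x)| for the empirical d.f. F_n of `sample`;
    # for continuous F it suffices to check both sides of each jump of F_n.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        d = max(d, abs((i + 1) / n - F(x)), abs(i / n - F(x)))
    return d

F_unif = lambda x: min(1.0, max(0.0, x))  # uniform d.f. on [0, 1]
sample = [0.1, 0.3, 0.5, 0.7, 0.9]
print(kolmogorov_distance(sample, F_unif))  # approximately 0.1 for this grid
```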
3. INEQUALITIES FOR INTEGRALS OVER CONVEX SHELLS
It is not difficult to check that if C is convex, then so are Int(C) and Cl(C). The convex hull c(B) of a subset B of R^k is the intersection of all convex sets containing B. Clearly c(B) is convex. If C is a closed and bounded convex subset of R^k, then it is the convex hull of its boundary; that is, c(∂C) = C. Indeed, clearly c(∂C) ⊂ C; on the other hand, if x ∈ C, x ∉ ∂C, then every line through x intersects ∂C at two points and x is a convex combination of these two points. Thus c(∂C) = C. If C is convex and ε > 0, then C^ε is convex and open, and C^{−ε} is convex and closed.

†See Billingsley [1], pp. 17-18.

The main theorem of this section is the following:

THEOREM 3.1. Let g be a nonnegative differentiable function on [0, ∞) such that

(i) b = ∫₀^∞ |g′(t)| t^{k−1} dt < ∞,
(ii) lim_{t→∞} g(t) = 0.
Then for every convex subset C of R^k and every pair of positive numbers ε, ρ,

∫_{C^ε \ C^{−ρ}} g(‖x‖) dx ≤ b a_k (ε + ρ), (3.1)

where

a_k = kπ^{k/2}/Γ((k+2)/2) = 2π^{k/2}/Γ(k/2) (3.2)

is the surface area of the unit sphere in R^k.
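The two expressions for a_k in (3.2) agree because Γ((k+2)/2) = (k/2)Γ(k/2); a quick numerical check:

```python
import math

def sphere_surface_area(k):
    # a_k = 2 pi^{k/2} / Gamma(k/2) = k pi^{k/2} / Gamma((k+2)/2)  (cf. (3.2))
    return 2 * math.pi ** (k / 2) / math.gamma(k / 2)

for k in (1, 2, 3, 4):
    alt = k * math.pi ** (k / 2) / math.gamma((k + 2) / 2)
    assert abs(sphere_surface_area(k) - alt) < 1e-12

print(sphere_surface_area(2))  # circumference of the unit circle: 2*pi
print(sphere_surface_area(3))  # area of the unit sphere in R^3: 4*pi
```

Note also a₁ = 2, consistent with the one-dimensional estimate (3.5) below.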
COROLLARY 3.2. Let s ≥ 0, k ≥ 1, and

f(x) = (2π)^{−k/2} ‖x‖^s exp{−‖x‖²/2}  (x ∈ R^k).

Then for all convex subsets C of R^k and every pair of positive numbers ε, ρ,

∫_{C^ε \ C^{−ρ}} f(x) dx ≤ 2^{(s−1)/2} (2s + k − 1) [Γ((k+s−1)/2)/Γ(k/2)] (ε + ρ). (3.3)

Proof. Here one takes (in Theorem 3.1)

g(t) = (2π)^{−k/2} t^s exp(−t²/2)  (t ∈ [0, ∞)).

Then

g′(t) = (2π)^{−k/2} (s t^{s−1} − t^{s+1}) exp(−t²/2),

and

b ≤ (2π)^{−k/2} ∫₀^∞ (s t^{k+s−2} + t^{k+s}) exp{−t²/2} dt
  = (2π)^{−k/2} [s 2^{(k+s−3)/2} Γ((k+s−1)/2) + 2^{(k+s−1)/2} Γ((k+s+1)/2)]
  = (2π)^{−k/2} 2^{(k+s−3)/2} (2s + k − 1) Γ((k+s−1)/2)
  = 2^{(s−3)/2} π^{−k/2} (2s + k − 1) Γ((k+s−1)/2), (3.4)
which gives (3.3) on substitution in (3.1). Q.E.D.

The rest of this section is devoted to the development of the material needed for the proof of Theorem 3.1. For k = 1, C^ε \ C^{−ρ} is contained in the union of two disjoint intervals, each of length ε + ρ. Hence

∫_{C^ε \ C^{−ρ}} g(|x|) dx ≤ 2(ε + ρ) sup_{x>0} g(x) ≤ 2(ε + ρ) ∫₀^∞ |g′(t)| dt = 2(ε + ρ) b = b a₁ (ε + ρ). (3.5)
For k > 1 a more intricate argument is needed. For the rest of the section we assume k > 1. A polyhedron is a closed, bounded, convex set with nonempty interior that is the intersection of a finite number of closed half spaces. If P is a polyhedron, a face of P is a set of the form H ∩ ∂P, where H is a hyperplane such that H ∩ ∂P has nonempty interior in H.

LEMMA 3.3. Let a polyhedron P be given by

P = {x : ⟨u_j, x⟩ ≤ d_j, 1 ≤ j ≤ m}, (3.6)

where the u_j's are distinct unit vectors. Let

L_j = {x : x ∈ P, ⟨u_j, x⟩ = d_j}  (1 ≤ j ≤ m). (3.7)

Then ∂P = ∪_{j=1}^m L_j. If F is a face of P, then F = L_j for some j. Moreover,

P = {x : ⟨u_j, x⟩ ≤ d_j for all j for which L_j is a face of P}. (3.8)
Proof. The first assertion is obvious. If F = H ∩ ∂P = ∪_j (H ∩ L_j) is a face of P, then H = H_j = {x : ⟨u_j, x⟩ = d_j} for some j, and
L_j = H_j ∩ ∂P. It is clear that the interior of L_j in H_j is

{x : ⟨u_j, x⟩ = d_j, ⟨u_r, x⟩ < d_r for all r ≠ j},

so that if L_j is not a face of P, then L_j ⊂ ∪_{r≠j} L_r. This implies ∂P ⊂ ∂Q, where Q is defined by

Q = {x : ⟨u_r, x⟩ ≤ d_r for all r for which L_r is a face of P}.

Clearly P ⊂ Q. It is sufficient to show that ∂P ⊂ ∂Q implies P = Q. Let x₁ ∈ Int(P). Assume there exists x₂ ∈ ∂Q \ ∂P. Consider the line segment [x₁, x₂] joining x₁ and x₂. This line segment meets ∂P at x₃, say. Clearly, [x₁, x₂) ⊂ Int(Q) and x₃ ≠ x₂, so that x₃ ∈ Int(Q) ∩ ∂P, which contradicts the fact that ∂P ⊂ ∂Q. Q.E.D.

LEMMA 3.4. A polyhedron P has a finite number of faces. If F₁, ..., F_m are the faces of P, then ∂P = ∪_{j=1}^m F_j. Moreover, there exist unique unit vectors u_j and constants d_j such that F_j = {x : x ∈ P, ⟨u_j, x⟩ = d_j} and P ⊂ {x : ⟨u_j, x⟩ ≤ d_j}. Also, P then has the representation

P = {x : ⟨u_j, x⟩ ≤ d_j, 1 ≤ j ≤ m}. (3.9)

There are two main steps in the proof of Theorem 3.1. The first one (Lemma 3.9) is to express a surface integral as a derivative of volume integrals. The second step (Lemma 3.10) is to get a uniform bound for the surface integral of a fixed function over the boundary of a polyhedron. The ideas involved here belong naturally to the domain of surface area and surface integrals. In the following paragraphs we develop the material needed for the proofs. The development here is self-contained except for the use of Cauchy's formula, which is stated but not proved.

We begin with some notation. Let λ_k denote the Lebesgue measure on R^k normalized by the euclidean distance on R^k; that is, if y₁, ..., y_k is an

†See Eggleston [1], pp. 29-30.
orthonormal basis and A is a cube with respect to them, or A = {Σ_i t_i y_i : a_i ≤ t_i ≤ b_i for all i}, then λ_k(A) = (b₁ − a₁) ··· (b_k − a_k). We also refer to λ_k as the k-dimensional Lebesgue measure on R^k. On any hyperplane of R^k there is a (k−1)-dimensional Lebesgue measure normalized the same way. We denote this measure by λ_{k−1} and call it the (k−1)-dimensional Lebesgue measure. For example, if H is a hyperplane, it can be written in the form H = x₀ + Ry₁ + ··· + Ry_{k−1}, where x₀ ∈ H and y₁, ..., y_{k−1} are (k−1) orthonormal vectors in R^k. If f is a function with compact support in R^k, then

∫_H f dλ_{k−1} = ∫_{R^{k−1}} f(x₀ + t₁y₁ + ··· + t_{k−1}y_{k−1}) dt₁ ··· dt_{k−1}. (3.10)
Next we denote by σ_{k−1} the surface area measure on the unit sphere S_{k−1}. One can write an explicit formula for σ_{k−1} by using Eulerian angles to parametrize points of S_{k−1}. It then follows that σ_{k−1}(S_{k−1}) = a_k, where a_k is given by (3.2). Let P be a polyhedron with faces F_i (1 ≤ i ≤ m). Then the surface area of P is defined as

λ_{k−1}(∂P) = Σ_{i=1}^m λ_{k−1}(F_i). (3.11)

For a bounded Borel-measurable function f on R^k we define the surface integral of f on ∂P by

∫_{∂P} f dλ_{k−1} = Σ_{i=1}^m ∫_{F_i} f dλ_{k−1}, (3.12)

and if A is a Borel subset of ∂P, then the surface area of A is defined by

λ_{k−1}(A) = Σ_{i=1}^m λ_{k−1}(A ∩ F_i). (3.13)
Remark. Let P be given as P = {x : ⟨u_j, x⟩ ≤ d_j, 1 ≤ j ≤ m}. Then

∫_{∂P} f dλ_{k−1} = Σ_{1≤j≤m} ∫_{L_j} f dλ_{k−1}, (3.14)

where L_j = {x : x ∈ P, ⟨u_j, x⟩ = d_j}. This follows from the fact that λ_{k−1}(L_j) = 0 if L_j is not a face of P.
CAUCHY'S FORMULA. Let C be a polyhedron. For each unit vector u ∈ R^k, let λ_{k−1}(C : u) denote the λ_{k−1}-measure of the orthogonal projection of C on the hyperplane {x : ⟨u, x⟩ = 0}. Then

λ_{k−1}(∂C) = β_{k−1}^{−1} ∫_{S_{k−1}} λ_{k−1}(C : u) σ_{k−1}(du), (3.15)

where β_{k−1} = a_{k−1}/(k−1) is the volume (Lebesgue measure in R^{k−1}) of the unit ball in R^{k−1}.
COROLLARY 1. Let P and Q be polyhedra such that P ⊂ Q. Then

λ_{k−1}(∂P) ≤ λ_{k−1}(∂Q). (3.16)

Proof. This follows from Cauchy's formula if it is noted that λ_{k−1}(P : u) ≤ λ_{k−1}(Q : u) for any u. Q.E.D.
COROLLARY 2. If P is a polyhedron and P ⊂ B(0 : t), then

λ_{k−1}(∂P) ≤ a_k t^{k−1}. (3.17)

Proof. The projection of B(0 : t) on the hyperplane {y : ⟨u, y⟩ = 0} is again a ball of radius t in that hyperplane, so that λ_{k−1}(B(0 : t) : u) = t^{k−1} β_{k−1}. The estimate now follows from Cauchy's formula. Q.E.D.
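For k = 2, Cauchy's formula (3.15) says that the perimeter of a convex body equals β₁^{−1} = 1/2 times the integral over all directions of the width of its projection. A numerical sketch for the unit square (an illustrative choice):

```python
import math

def projection_width_unit_square(theta):
    # width of the projection of [0,1]^2 on the line {x : <u, x> = 0},
    # u = (cos theta, sin theta)
    return abs(math.cos(theta)) + abs(math.sin(theta))

n = 100_000
h = 2 * math.pi / n
integral = sum(projection_width_unit_square((i + 0.5) * h) for i in range(n)) * h
perimeter = integral / 2          # beta_1 = a_1 / 1 = 2
print(round(perimeter, 6))        # 4.0, the perimeter of the unit square
```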
These two corollaries of Cauchy's formula are actually used in the proof of Lemma 3.10. Let P be a polyhedron given by

P = {x : ⟨u_j, x⟩ ≤ d_j, 1 ≤ j ≤ m}, (3.18)

where the unit vectors u_j are assumed distinct. This representation is kept fixed throughout. For each real a define the convex set P_a by

P_a = {x : ⟨u_j, x⟩ ≤ d_j + a, 1 ≤ j ≤ m}. (3.19)

Then P_a has nonempty interior for a ≥ 0, since P ⊂ P_a for a ≥ 0. In fact, there exists an a₀ < 0 such that P_a has nonempty interior if and only if a > a₀. Thus P_a is a polyhedron for a > a₀, if we show that it is bounded. A more precise result is given by the following:

LEMMA 3.5. Let x₀ ∈ Int(P) and suppose that P ⊂ B(x₀ : ρ₀). Then P_a ⊂ B(x₀ : cρ₀) for a > 0, where c = 1 + a/d, d = min{d_j − ⟨u_j, x₀⟩ : 1 ≤ j ≤ m}.
Proof. The proof uses the notion of a gauge, or support, function of a convex set. Let C be a closed convex set with nonempty interior and x₀ ∈ Int(C). Then

F_C(x) = inf{a : a > 0, x₀ + (x − x₀)/a ∈ C} (3.20)

is called the gauge function of C with respect to x₀. It is easy to see that if x₀ is an interior point of each of two closed convex sets C₁, C₂, then C₁ ⊂ C₂ if and only if F_{C₂}(x) ≤ F_{C₁}(x) for all x, where both gauge functions are with respect to x₀. Returning to P and defining F_P with respect to x₀, an easy calculation shows that [since x₀ + (x − x₀)/a ∈ ∂P if a = F_P(x)]

F_P(x) = max_{1≤j≤m} ⟨u_j, x − x₀⟩/(d_j − ⟨u_j, x₀⟩),
F_{P_a}(x) = max_{1≤j≤m} ⟨u_j, x − x₀⟩/(d_j + a − ⟨u_j, x₀⟩)  (a > 0).

Clearly, F_{P_a}(x) ≥ c₁ F_P(x) for all x, where

c₁ = min_{1≤j≤m} (d_j − ⟨u_j, x₀⟩)/(d_j + a − ⟨u_j, x₀⟩).

The gauge function of Cl(B(x₀ : ρ₀)) with respect to x₀ is ‖x − x₀‖/ρ₀, so that P ⊂ B(x₀ : ρ₀) implies

F_P(x) ≥ ‖x − x₀‖/ρ₀.

Thus

F_{P_a}(x) ≥ (c₁/ρ₀) ‖x − x₀‖,

which implies that P_a ⊂ B(x₀ : ρ₀/c₁). Now

c₁^{−1} = max_{1≤j≤m} (d_j + a − ⟨u_j, x₀⟩)/(d_j − ⟨u_j, x₀⟩) = max_{1≤j≤m} (1 + a/(d_j − ⟨u_j, x₀⟩)) = 1 + a/d. Q.E.D.
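The closed form for the gauge function used in the proof, F_P(x) = max_j ⟨u_j, x − x₀⟩/(d_j − ⟨u_j, x₀⟩), is easy to evaluate; a sketch for the square [−1, 1]² with x₀ = 0 (an illustrative choice):

```python
def gauge(x, U, d, x0=(0.0, 0.0)):
    # F_P(x) = max_j <u_j, x - x0> / (d_j - <u_j, x0>)
    # for P = {x : <u_j, x> <= d_j} with x0 in Int(P)
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1]
    shifted = (x[0] - x0[0], x[1] - x0[1])
    return max(dot(u, shifted) / (dj - dot(u, x0)) for u, dj in zip(U, d))

# P = [-1, 1]^2: four half spaces with unit normals
U = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
d = [1.0, 1.0, 1.0, 1.0]

print(gauge((1.0, 0.5), U, d))   # 1.0: boundary points have gauge 1
print(gauge((0.5, 0.0), U, d))   # 0.5: interior points have gauge < 1
print(gauge((2.0, 0.0), U, d))   # 2.0: points outside P have gauge > 1
```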
LEMMA 3.6. Let P be a polyhedron defined by (3.18) and let P_a be defined by (3.19) for all a ∈ R¹. Then

P_a ⊃ P^a  if a > 0,  P_a ⊂ P^a  if a < 0.

Proof. Suppose a > 0 and x ∈ P^a. Then there exists y ∈ P such that ‖x − y‖ < a, and

⟨u_j, x⟩ = ⟨u_j, y⟩ + ⟨u_j, x − y⟩ ≤ d_j + ‖u_j‖·‖x − y‖ < d_j + a  (1 ≤ j ≤ m),

so that x ∈ P_a. If a = −b < 0, x ∈ P_a, and ‖y − x‖ < b, then

⟨u_j, y⟩ = ⟨u_j, x⟩ + ⟨u_j, y − x⟩ < d_j − b + b = d_j  (1 ≤ j ≤ m),
implying y ∈ P. Hence x ∈ P^a = P^{−b}. Q.E.D.

LEMMA 3.7. Let v be a unit vector in R^k, and let A = B(x₀ : ρ) ∩ {x : a ≤ ⟨v, x⟩ ≤ a + h}, h > 0, a ∈ R¹. Then

λ_k(A) ≤ h π^{(k−1)/2} ρ^{k−1} / Γ((k+1)/2). (3.21)

Proof. Choose an orthonormal basis v₁, ..., v_k of R^k with v₁ = v. Let U denote the orthogonal transformation taking v_i to e_i, 1 ≤ i ≤ k, where e_i is the ith standard basis vector, and write x₀′ = Ux₀. Then

λ_k(A) = λ_k(UA) ≤ h λ_{k−1}({u ∈ R^{k−1} : ‖u − (x₀₂′, ..., x₀k′)‖ ≤ ρ}) = h π^{(k−1)/2} ρ^{k−1} / Γ((k+1)/2)  (x₀′ = Ux₀). Q.E.D.
LEMMA 3.8. Let v₁, v₂ be two linearly independent unit vectors, h > 0, and a₁, a₂ ∈ R¹. Let A = B(y₀ : ρ) ∩ {x : a_i ≤ ⟨v_i, x⟩ ≤ a_i + h, i = 1, 2}. Then

λ_k(A) ≤ π^{(k−2)/2} ρ^{k−2} h² / [Γ(k/2)(1 − ⟨v₁, v₂⟩²)^{1/2}].

Proof. Choose orthonormal vectors p₁, p₂, ..., p_k such that p₁ = v₁ and

p₂ = (1 − ⟨v₂, v₁⟩²)^{−1/2} (−⟨v₂, v₁⟩ v₁ + v₂).

Writing y_i = ⟨p_i, x⟩, 1 ≤ i ≤ k,

λ_k(A) = λ_k(B(y₀′ : ρ) ∩ {y = (y₁,...,y_k) : a₁ ≤ y₁ ≤ a₁ + h, a₂ ≤ ⟨v₂, v₁⟩ y₁ + δ y₂ ≤ a₂ + h}), (3.22)

where y₀′ = (y₀₁′, ..., y₀k′) is the image of y₀ under the map x = (x₁,...,x_k) → (⟨p₁, x⟩, ..., ⟨p_k, x⟩) = (y₁,...,y_k) = y, and δ = (1 − ⟨v₂, v₁⟩²)^{1/2}. Thus

λ_k(A) ≤ λ_k({y = (y₁,...,y_k) : (y₃ − y₀₃′)² + ··· + (y_k − y₀k′)² ≤ ρ², a₁ ≤ y₁ ≤ a₁ + h, δ^{−1}(a₂ − ⟨v₂, v₁⟩y₁) ≤ y₂ ≤ δ^{−1}(a₂ + h − ⟨v₂, v₁⟩y₁)}) = δ^{−1} h² π^{(k−2)/2} ρ^{k−2} / Γ(k/2). Q.E.D.
Q.E.D. LEMMA 3.9. Let f be a continuous function on R k and let M(a)= f f(x)dx
(aER'),
(3.23)
P,
where P is a polyhedron defined by (3.18), and P, is defined by (3.19). Then
the function M is differentiable on R¹ with a continuous derivative given by

M′(a) = ∫_{∂P_a} f dλ_{k−1}  (a ∈ R¹). (3.24)
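Lemma 3.9 can be checked by hand for f ≡ 1 and P the unit square in R²: here M(a) = λ₂(P_a) = (1 + 2a)², and ∫_{∂P_a} f dλ₁ is the perimeter 4(1 + 2a). A numerical sketch of the difference quotient:

```python
def area_P_a(a):
    # For P = [0,1]^2 written as {x : <u_j, x> <= d_j}, the set P_a of (3.19)
    # inflates each d_j by a, so P_a = [-a, 1+a]^2 and M(a) = (1 + 2a)^2.
    return (1 + 2 * a) ** 2

def perimeter_P_a(a):
    # surface integral of f = 1 over the boundary of P_a
    return 4 * (1 + 2 * a)

a, h = 0.3, 1e-6
diff_quot = (area_P_a(a + h) - area_P_a(a)) / h
print(abs(diff_quot - perimeter_P_a(a)) < 1e-4)  # True: M'(a) matches (3.24)
```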
Proof. Since P_a is a bounded set for each a, we assume (without loss of generality) that f is bounded on R^k. We assume throughout that

0 < h < 1. (3.25)

If there exist u_{j₁}, u_{j₂}, where 1 ≤ j₁ < j₂ ≤ m, such that u_{j₁} + u_{j₂} = 0, then

d_{j₁} + d_{j₂} > 0, (3.26)

in order that P may not be empty or have an empty interior (as is the case if d_{j₁} + d_{j₂} = 0). Similarly, if d_{j₁} + a + d_{j₂} + a is negative, then P_{a+h} is empty for all sufficiently small h, and M vanishes identically on an interval (−∞, a + h₀), say. Thus (3.24) is trivially true there, both sides being identically equal to zero. We therefore assume that

a > −β, (3.27)

where

β = ½ min{d_{j₁} + d_{j₂} : u_{j₁} + u_{j₂} = 0, 1 ≤ j₁ < j₂ ≤ m} (3.28)

if there exist j₁, j₂ such that u_{j₁} + u_{j₂} = 0, and β = ∞ otherwise. Let b be a positive number. By Lemma 3.5 there exists a positive number c₁ such that

P_a ⊂ B(0 : c₁)  for all a ≤ b. (3.29)
The definition of P_a gives

P_{a+h} \ P_a = ∪ Q_{j₁,...,j_r}, (3.30)

where the union is over all r-tuples (j₁, ..., j_r) with 1 ≤ r ≤ m, 1 ≤ j₁ < ··· < j_r ≤ m, and

Q_{j₁,...,j_r} = {x : ⟨u_j, x⟩ ≤ d_j + a for all j ∉ (j₁,...,j_r), and d_{j_i} + a < ⟨u_{j_i}, x⟩ ≤ d_{j_i} + a + h for 1 ≤ i ≤ r}. (3.31)

Then

M(a+h) − M(a) = ∫_{P_{a+h} \ P_a} f dx = Σ ∫_{Q_{j₁,...,j_r}} f dx, (3.32)
where the summation is over all r-tuples (j₁, ..., j_r), 1 ≤ r ≤ m, as specified above. Let

L_j(a) = {x : x ∈ P_a, ⟨u_j, x⟩ = d_j + a}. (3.33)

We shall prove the following inequalities:

λ_k(Q_{j₁,...,j_r}) ≤ c₂ h²  if r ≥ 2, (3.34)

and

|∫_{Q_j} f(x) dx − h ∫_{L_j(a)} f dλ_{k−1}| ≤ c₃ h (V(h) + h)  (1 ≤ j ≤ m), (3.35)

where c₂, c₃ are constants depending only on k, P, and b, and

V(h) = sup{|f(x) − f(x′)| : x, x′ ∈ B(0 : c₁), ‖x − x′‖ ≤ h}. (3.36)
To prove (3.34), note that if any two of the vectors u_{j₁}, ..., u_{j_r} are linearly independent, then (3.34) follows from Lemma 3.8 and (3.29). The only remaining case occurs when r = 2 and u_{j₁} + u_{j₂} = 0. In this case

Q_{j₁,j₂} ⊂ {x : max[−d_{j₂} − a − h, d_{j₁} + a] < ⟨u_{j₁}, x⟩ ≤ min[−d_{j₂} − a, d_{j₁} + a + h]}.

Thus λ_k(Q_{j₁,j₂}) = 0 unless d_{j₁} + a ≤ −d_{j₂} − a, or

a ≤ −½(d_{j₁} + d_{j₂}). (3.37)

But (3.37) is ruled out by (3.27). Hence (3.34) is proved. Now define

α_{ij} = ⟨u_i, u_j⟩,  γ_{ij} = (1 − α_{ij}²)^{1/2},  γ = min{γ_{ij} : γ_{ij} ≠ 0, 1 ≤ i, j ≤ m}.

If u_i + u_j = 0, that is, if γ_{ij} = 0, then, by (3.27),

{x : ⟨u_j, x⟩ > d_j + a} = {x : ⟨u_i, x⟩ < −d_j − a} ⊂ {x : ⟨u_i, x⟩ ≤ d_i + a}, (3.38)
so that one may write

Q_i = {x : ⟨u_j, x⟩ ≤ d_j + a for all j ≠ i with γ_{ji} ≠ 0, d_i + a < ⟨u_i, x⟩ ≤ d_i + a + h}  (1 ≤ i ≤ m). (3.39)

In the following discussion we fix the integer i. Let v₁, ..., v_k be an orthonormal basis such that v₁ = u_i. The transformation x = (x₁,...,x_k) → y = (y₁,...,y_k) given by

y_j = ⟨v_j, x⟩  (1 ≤ j ≤ k) (3.40)

enables one to write x = y₁v₁ + y′, where y′ = Σ_{j=2}^k y_j v_j. For j ≠ i let u_j′ be the unit vector defined by

u_j = α_{ji} v₁ + γ_{ji} u_j′.

Then, since u_j′ is orthogonal to u_i,

⟨u_j, x⟩ = α_{ji} y₁ + γ_{ji} ⟨u_j′, y′⟩.

Thus in terms of the new coordinates Q_i, L_i(a) become Q_i′, L_i′(a), respectively, where

Q_i′ = {y = (y₁,...,y_k) : ⟨u_j′, y′⟩ ≤ c_j + γ_{ji}^{−1} α_{ji} (d_i + a − y₁) for all j with γ_{ji} ≠ 0, and d_i + a < y₁ ≤ d_i + a + h},

L_i′(a) = {(d_i + a, y₂, ..., y_k) : ⟨u_j′, y′⟩ ≤ c_j for all j with γ_{ji} ≠ 0},

where

c_j = γ_{ji}^{−1} [d_j + a − α_{ji}(d_i + a)].
Let Q_i′(y₁) denote the y₁-section of Q_i′. Note that

∫_{Q_i} f(x) dx = ∫_{d_i+a}^{d_i+a+h} dy₁ ∫_{Q_i′(y₁)} f(y₁v₁ + y′) dy₂ ··· dy_k,

∫_{L_i(a)} f dλ_{k−1} = ∫_{Q_i′(d_i+a)} f((d_i + a)v₁ + y′) dy₂ ··· dy_k. (3.41)

Now since Q_i′(y₁) is a section of Q_i′, which is the image (under an orthogonal transformation) of Q_i ⊂ P_{a+h} ⊂ B(0 : c₁), it follows that

∫_{Q_i′(y₁)} |f(y₁v₁ + y′) − f((d_i + a)v₁ + y′)| dy₂ ··· dy_k ≤ V(h) λ_{k−1}(Q_i′(y₁)) ≤ c₅ V(h), (3.42)
say, for all y₁ satisfying d_i + a < y₁ ≤ d_i + a + h. Next, the symmetric difference Q_i′(y₁) Δ Q_i′(d_i + a) is contained in

∪_{j : γ_{ji} ≠ 0} {(y₂,...,y_k) : ‖(y₂,...,y_k)‖ ≤ c₁, c_j − γ^{−1}|α_{ji}| h ≤ ⟨u_j′, y′⟩ ≤ c_j + γ^{−1}|α_{ji}| h}.

It follows from Lemma 3.7 that

λ_{k−1}(Q_i′(y₁) Δ Q_i′(d_i + a)) ≤ c₆ h

for a suitable constant c₆, and so

|∫_{Q_i′(y₁)} f(y₁v₁ + y′) dy₂ ··· dy_k − ∫_{Q_i′(d_i+a)} f(y₁v₁ + y′) dy₂ ··· dy_k| ≤ c₆ ‖f‖_∞ h. (3.43)

The inequality (3.35) now follows from (3.41), (3.42), and (3.43). We thus have

|M(a+h) − M(a) − h ∫_{∂P_a} f dλ_{k−1}| ≤ c₇ h (V(h) + h)

for a constant c₇ depending only on k, P, and b. Since V(h) → 0 as h → 0, (3.24) is proved. Continuity of this derivative
follows from that of the function a → ∫_{L_i(a)} f dλ_{k−1}, which is an immediate consequence of (3.42) and (3.43). Q.E.D.

LEMMA 3.10. Let g be a differentiable function on [0, ∞) such that

lim_{t→∞} g(t) = 0,  b = ∫₀^∞ |g′(t)| t^{k−1} dt < ∞. (3.44)

Then for every polyhedron P

∫_{∂P} g(‖x‖) dλ_{k−1} ≤ a_k · b. (3.45)
Proof. Define the function F on [0, ∞) by

F(t) = λ_{k−1}(∂P ∩ Cl(B(0 : t))). (3.46)

Then F is the distribution function of the measure λ_{k−1} ∘ h^{−1} induced on [0, ∞) by the map h : x → ‖x‖ on ∂P (endowed with the surface area measure λ_{k−1}). Note that F(0) = 0 and

F(t) = λ_{k−1}(∂P)  for all sufficiently large t. (3.47)

By changing variables and integrating by parts,†

∫_{∂P} g(‖x‖) λ_{k−1}(dx) = ∫₀^∞ g(t) λ_{k−1}∘h^{−1}(dt) = ∫₀^∞ g(t) dF(t) = lim_{t→∞}(F(t)g(t)) − F(0)g(0) − ∫₀^∞ g′(t)F(t) dt = −∫₀^∞ g′(t)F(t) dt, (3.48)

from (3.47). Fix t > 0. Let Q be a polyhedron such that Q ⊂ Cl(B(0 : t)).

†See Dieudonné [1], Vol. II, p. 218.
Then Q ∩ ∂P ⊂ ∂(P ∩ Q). Since P ∩ Q is a polyhedron and λ_{k−1} is a measure on the boundary of a polyhedron, it follows that λ_{k−1}(Q ∩ ∂P) ≤ λ_{k−1}(∂(P ∩ Q)). On the other hand, (3.16) implies that λ_{k−1}(∂(P ∩ Q)) ≤ λ_{k−1}(∂Q). Thus by (3.17)

λ_{k−1}(Q ∩ ∂P) ≤ λ_{k−1}(∂Q) ≤ a_k t^{k−1}. (3.49)

Choose a countable dense subset {x₁, x₂, ...} of ∂B(0 : t) and let Q_n be the convex hull of x₁, ..., x_n. Then Q_n ↑ Cl(B(0 : t)), and (3.49) yields

F(t) = lim_{n→∞} λ_{k−1}(∂P ∩ Q_n) ≤ a_k t^{k−1}. (3.50)
Substituting in (3.48) we get the lemma. Q.E.D.

We are now ready to prove Theorem 3.1.

Proof of Theorem 3.1. First suppose that C is bounded and Int(C) ≠ ∅. Given δ > 0, choose x₁, ..., x_n ∈ ∂C such that ∂C ⊂ {x₁, ..., x_n}^δ. Let P be the convex hull of {x₁, ..., x_n}. By taking δ smaller than the radius of a ball contained in C, we ensure that P has a nonempty interior. Also, clearly,

P ⊂ C ⊂ P^δ,

so that C^ε ⊂ P^{ε+δ}. By Lemma 3.6,

P^{ε+δ} ⊂ P_{ε+δ},  P_{−ρ} ⊂ P^{−ρ} ⊂ C^{−ρ}  (ρ > 0),

so that

C^ε \ C^{−ρ} ⊂ P_{ε+δ} \ P_{−ρ}  (ρ > 0),

and

∫_{C^ε \ C^{−ρ}} g(‖x‖) dx ≤ ∫_{P_{ε+δ} \ P_{−ρ}} g(‖x‖) dx = ∫_{−ρ}^{ε+δ} [∫_{∂P_a} g(‖x‖) λ_{k−1}(dx)] da  (ρ > 0)
by Lemma 3.9. From Lemma 3.10 we now have

∫_{C^ε \ C^{−ρ}} g(‖x‖) dx ≤ (ε + δ + ρ) b a_k. (3.51)

Since δ is arbitrary, we are done. If C is bounded and Int(C) = ∅, then C^{−ρ} = ∅ for ρ > 0, and for all δ > 0
∫_{C^ε \ C^{−ρ}} g(‖x‖) dx = ∫_{C^ε} g(‖x‖) dx ≤ ∫_{(C^δ)^ε \ (C^δ)^{−δ}} g(‖x‖) dx ≤ (ε + δ) b a_k, (3.52)

since C^δ has a nonempty interior [note that C^ε ∩ (C^δ)^{−δ} ⊂ Cl(C), a set of Lebesgue measure zero]. Since δ is arbitrary, (3.1) is true for all bounded convex sets C. If C is unbounded, look at C_r = C ∩ B(0 : r) and let r increase to infinity. Since C_r^ε ↑ C^ε and C_r^{−ρ} ↑ C^{−ρ} as r ↑ ∞, and the right side of (3.1) does not depend on the particular convex set, the proof is complete. Q.E.D.
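The approximating polyhedron in the proof above (the convex hull of finitely many points chosen on ∂C) can be computed in R² with Andrew's monotone-chain algorithm; a minimal sketch:

```python
def convex_hull(points):
    # Andrew's monotone chain: returns hull vertices in counterclockwise order.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); <= 0 means a clockwise or straight turn
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# c(boundary corners of the square plus an interior point) recovers the square
B = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(sorted(convex_hull(B)))  # the four corners only
```

This also illustrates c(∂C) = C: adding interior points to B does not change the hull.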
NOTES

Section 1. Detailed accounts of the theory of weak convergence of probability measures may be found in Billingsley [1] and Parthasarathy [1].

Section 2. A systematic study of uniformity classes was initiated by Rao [3], who proved, in particular, Corollary 2.7 and Theorem 2.11. The theory was advanced and completed by Billingsley and Topsøe [1], who proved the main theorem (Theorem 2.4), Corollary 2.6, Lemmas 2.2, 2.3, 2.9, 2.10, and gave the example following Corollary 2.8. These two articles also contain many results and applications not included here. One significant application of the theory is in generalizing and strengthening the classical Glivenko-Cantelli theorem. For this, see Rao [3] and Topsøe [1]. The useful Lemma 2.1 was proved by Scheffé [1]. The bounded Lipschitzian distance has been studied by Dudley [1], and Corollary 2.8 is due to him. Theorem 2.5, which is a convenient variant of Theorem 2.4, is due to Bhattacharya [3].

Section 3. Rao [1] was the first to prove the existence of a constant c(k), depending only on k, such that Φ((∂C)^ε) ≤ c(k)ε for all convex subsets C of R^k. The particular estimate given by Theorem 3.1 is essentially due to von Bahr [3]. It is easy to check that for k = 1 and a nonincreasing g [on (0, ∞)] this estimate cannot be improved upon. The proof presented here provides perhaps the first detailed formal derivation of the estimate. The development (in particular, Lemma 3.9) may be of some independent interest.
CHAPTER 2
Fourier Transforms and Expansions of Characteristic Functions

The main mathematical tool used in this monograph is the Fourier transform and its extension, the Fourier-Stieltjes transform. The Fourier-Stieltjes transform of a probability measure on R^k is better known in the probability literature as its characteristic function. In Sections 4 and 5 we present a summary of some basic facts about these transforms. Section 6 introduces moments and cumulants of probability measures on R^k and presents some inequalities concerning them. In Section 7 the Cramér-Edgeworth polynomials associated with a set of cumulants are studied. The principal results of Chapter 2 are the asymptotic expansions of (derivatives of) characteristic functions of normalized sums of independent random vectors. These expansions, in terms of the Cramér-Edgeworth polynomials, are developed in detail in Sections 8 and 9. In Section 10 some classes of probability measures, which are used as smoothing kernels in Chapters 3 and 4, are introduced.
4. THE FOURIER TRANSFORM
In this section we collect some standard results on Fourier transforms without proofs. Let L^p(R^k), 1 ≤ p < ∞, denote the Banach space of (equivalence classes of) complex-valued Borel-measurable functions f on R^k such that ∫ |f|^p dx < ∞,
with norm

‖f‖_p = (∫ |f|^p dx)^{1/p}, (4.1)
where ∫ g dx denotes the integral of g on R^k with respect to Lebesgue measure on R^k. As usual, two functions f and g on R^k are said to be equivalent if they are equal almost everywhere with respect to Lebesgue measure. The space L²(R^k) is a Hilbert space endowed with the inner product ⟨·,·⟩₂:

⟨f, g⟩₂ = ∫ f ḡ dx. (4.2)
Here ḡ denotes the complex conjugate of g. For nonnegative integral vectors α = (α₁, ..., α_k) we write

x^α = x₁^{α₁} ··· x_k^{α_k}  [x = (x₁, ..., x_k) ∈ R^k],
D^α = D₁^{α₁} ··· D_k^{α_k}  [D_j = ∂/∂x_j, 1 ≤ j ≤ k, α = (α₁, ..., α_k) ∈ (Z₊)^k],
|α| = α₁ + ··· + α_k, (4.3)

and say that D^α f is the αth derivative of the function f. Also, for each complex-valued function f on R^k we define the function f⁻ by

f⁻(x) = f(−x)  (x ∈ R^k). (4.4)

The function f is said to be symmetric if f⁻ = f. For f ∈ L¹(R^k) the Fourier transform f̂ of f is a complex-valued function on R^k defined by

f̂(t) = ∫ e^{i⟨t,x⟩} f(x) dx  (t ∈ R^k), (4.5)
where ⟨·,·⟩ is the usual inner product in C^k:

⟨t, x⟩ = Σ_{j=1}^k t_j x̄_j  [t = (t₁, ..., t_k), x = (x₁, ..., x_k) ∈ C^k], (4.6)

with c̄ the complex conjugate of the complex number c, giving rise to the norm ‖·‖:

‖t‖ = (Σ_{j=1}^k |t_j|²)^{1/2}  (t ∈ C^k). (4.7)
Here |c|² = c c̄ for complex numbers c. We shall sometimes use a different norm |·| in C^k:

|x| = Σ_{j=1}^k |x_j|  (x ∈ C^k). (4.8)
THEOREM 4.1. Suppose f ∈ L¹(R^k).

(i) f̂ is uniformly continuous on R^k.
(ii) |f̂(t)| ≤ ‖f‖₁ for all t ∈ R^k.
(iii) (Riemann-Lebesgue Lemma) |f̂(t)| → 0 as ‖t‖ → ∞.
(iv) (Fourier Inversion Theorem) If (the equivalence class of) f̂ ∈ L¹(R^k), then one can recover (the continuous version of) f from f̂ by the formula

f(x) = (2π)^{−k} ∫ e^{−i⟨t,x⟩} f̂(t) dt  (x ∈ R^k).

Hence

(f̂)^ = (2π)^k f⁻.

(v) Let α = (α₁, ..., α_k) be a nonnegative integral vector. Define g by

g(x) = x^α f(x)  (x ∈ R^k).

If g ∈ L¹(R^k), then D^α f̂ exists and

ĝ = (−i)^{|α|} D^α f̂.

(vi) If f ∈ L¹(R^k) ∩ L²(R^k), then f̂ ∈ L²(R^k) and ‖f‖₂ = (2π)^{−k/2} ‖f̂‖₂.
The last result (vi) says that the map f → (2π)^{−k/2} f̂, regarded as a map on the subset L¹(R^k) ∩ L²(R^k) of L²(R^k) into L²(R^k), is an isometry. Since L¹(R^k) ∩ L²(R^k) is a dense subset of L²(R^k), one can extend the above isometry to all of L²(R^k), and f̂ so extended is still called the Fourier transform of f (f ∈ L²(R^k)). The following theorem is then immediate.

THEOREM 4.2 (Plancherel Theorem). The map f → (2π)^{−k/2} f̂ is a linear isometry on L²(R^k), and

    ⟨f,g⟩₂ = (2π)^{−k} ⟨f̂,ĝ⟩₂    [f, g ∈ L²(R^k)].
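As a quick numerical sketch of the definition (4.5) and the bound in Theorem 4.1(ii), one can approximate the one-dimensional Fourier transform of the indicator of [−1,1] by a Riemann sum; the exact transform 2 sin t / t and the grid parameters below are illustrative choices, not part of the text.

```python
import cmath, math

def fourier_transform(f, t, a=-10.0, b=10.0, n=20000):
    # Midpoint Riemann-sum approximation of  f^(t) = integral e^{itx} f(x) dx  (k = 1).
    h = (b - a) / n
    return sum(cmath.exp(1j * t * (a + (i + 0.5) * h)) * f(a + (i + 0.5) * h)
               for i in range(n)) * h

f = lambda x: 1.0 if abs(x) <= 1.0 else 0.0      # indicator of [-1, 1], ||f||_1 = 2
for t in (0.5, 1.0, 3.0):
    approx = fourier_transform(f, t)
    exact = 2.0 * math.sin(t) / t                # closed form for this f
    assert abs(approx - exact) < 1e-3
    assert abs(approx) <= 2.0 + 1e-9             # Theorem 4.1(ii): |f^(t)| <= ||f||_1
```

The same discretization also illustrates Theorem 4.1(iii): the computed values decay as t grows.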
For f, g ∈ L¹(R^k) one defines the (equivalence class of) function(s) f*g, called the convolution of f and g, by

    (f*g)(x) = ∫ f(x−y) g(y) dy    (x ∈ R^k).    (4.9)

It follows from Fubini's theorem that f*g ∈ L¹(R^k) and that

    ‖f*g‖₁ ≤ ‖f‖₁ ‖g‖₁    [f, g ∈ L¹(R^k)].    (4.10)

The convolution operation is clearly commutative and associative. The n-fold convolution f^{*n} is defined recursively by

    f^{*1} = f,   f^{*n} = f^{*(n−1)} * f    [n > 1, f ∈ L¹(R^k)].    (4.11)
5. THE FOURIER-STIELTJES TRANSFORM
We shall now extend the concept of the Fourier transform to the set 𝔐 of all finite signed measures on the Borel sigma-field 𝔅^k of R^k. Let μ be a finite signed measure. We define the finite signed measure μ˜ by

    μ˜(B) = μ(−B)    (B ∈ 𝔅^k).    (5.1)
A finite signed measure μ is called symmetric if μ˜ = μ. The Fourier–Stieltjes transform μ̂ of a finite signed measure μ is a complex-valued function on R^k defined by

    μ̂(t) = ∫ e^{i⟨t,x⟩} μ(dx)    (t ∈ R^k, μ ∈ 𝔐),    (5.2)

where, as usual, the integral is over the whole space R^k. If μ is a probability measure, μ̂ is also called the characteristic function of μ. Note that if μ is absolutely continuous (as a finite signed measure) with respect to Lebesgue measure, having density (i.e., the Radon–Nikodym derivative) f, then

    μ̂ = f̂.    (5.3)

The convolution μ*ν of two finite signed measures is a finite signed measure defined by

    (μ*ν)(B) = ∫ μ(B−x) ν(dx)    (B ∈ 𝔅^k),    (5.4)
where for A ⊂ R^k, y ∈ R^k, the translate A+y is defined by

    A+y = { z = u+y : u ∈ A }.    (5.5)

It is clear that convolution is commutative and associative. One defines the n-fold convolution μ^{*n} by

    μ^{*1} = μ,   μ^{*n} = μ^{*(n−1)} * μ    (n > 1, μ ∈ 𝔐).    (5.6)

Let μ be a signed measure on R^k. For any measurable map T on the space (R^k, 𝔅^k, μ) into (R^s, 𝔅^s) one defines the induced signed measure μ∘T^{−1} by

    (μ∘T^{−1})(B) = μ(T^{−1}(B))    (B ∈ 𝔅^s).    (5.7)
THEOREM 5.1.
(i) (Uniqueness Theorem). The map μ → μ̂ is one-to-one on 𝔐.
(ii) For μ ∈ 𝔐, μ̂ is uniformly continuous and

    μ̂(0) = μ(R^k),   |μ̂(t)| ≤ |μ|(R^k)    (t ∈ R^k).

(iii) For μ, ν ∈ 𝔐,

    (μ*ν)^ = μ̂ ν̂.

(iv) If μ ∈ 𝔐, then

    (μ˜)^ = the complex conjugate of μ̂,

so that μ is symmetric if and only if μ̂ is real-valued. The symmetrization μ*μ˜ of μ is always symmetric, since (μ*μ˜)^ = |μ̂|².
(v) If T is an affine transformation on R^k into R^s, Tx = a + Bx, x ∈ R^k, then

    (μ∘T^{−1})^(t) = e^{i⟨t,a⟩} μ̂(B′t)    (t ∈ R^s, μ ∈ 𝔐),

where B′ is the adjoint (or transpose) of B.
(vi) If μ ∈ 𝔐 and ∫ |x^α| |μ|(dx) < ∞ for a nonnegative integral vector α, then D^β μ̂ exists for every β ≤ α and

    (D^β μ̂)(t) = ∫ (ix)^β e^{i⟨t,x⟩} μ(dx)    (t ∈ R^k).

(vii) If μ̂ ∈ L¹(R^k), then μ is absolutely continuous with respect to Lebesgue measure, with a uniformly continuous and bounded density.
(viii) (Parseval's Relation). Let μ, ν be two finite signed measures on R^k. Then

    ∫ μ̂ dν = ∫ ν̂ dμ.

The following theorem (due to Cramér and Lévy) gives a useful characterization of weak convergence of probability measures on R^k.

THEOREM 5.2. Let {G_n : n ≥ 1} be a sequence of probability measures on R^k. If the sequence converges weakly to a probability measure G, then {Ĝ_n} converges pointwise to Ĝ. Conversely, if {Ĝ_n} converges pointwise to a continuous limit h, then there exists a probability measure H such that {G_n} converges weakly to H and Ĥ = h.

It should be noted that if {G_n} converges weakly to G, then {Ĝ_n} converges to Ĝ not merely pointwise, but uniformly on compact subsets of R^k. This is because the class of functions

    { e^{i⟨t,·⟩} : t ∈ K },

where K is a compact subset of R^k, is equicontinuous and uniformly bounded; hence, by Corollary 2.7, it is a uniformity class for every probability measure G.
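Theorem 5.2 can be illustrated numerically. If X_j = ±1 with probability 1/2 each, the characteristic function of S_n/√n is (cos(t/√n))^n, which converges pointwise to the standard normal characteristic function exp(−t²/2); the sample points and tolerance below are illustrative choices.

```python
import math

# Characteristic function of S_n / sqrt(n) for X_j = +1 or -1 equally likely:
# E exp(i t S_n / sqrt(n)) = cos(t / sqrt(n))**n -> exp(-t**2 / 2) pointwise.
def chf_normalized_sum(t, n):
    return math.cos(t / math.sqrt(n)) ** n

for t in (0.5, 1.0, 2.0):
    assert abs(chf_normalized_sum(t, 200000) - math.exp(-t * t / 2)) < 1e-3
```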
6. MOMENTS, CUMULANTS, AND NORMAL DISTRIBUTION

Let G be a probability measure on R^k. If ν = (ν₁, ν₂, …, ν_k) is a nonnegative integral vector such that |x^ν| is integrable with respect to G, one defines the νth moment μ_ν of G by

    μ_ν = ∫ x^ν G(dx).    (6.1)

For every nonnegative real s one defines the sth absolute moment ρ_s of G by

    ρ_s = ∫ ‖x‖^s G(dx)    (s ≥ 0).    (6.2)
Note that, by Theorem 5.1(vi), if μ_ν is finite for some nonnegative integral vector ν, then

    i^{|ν|} μ_ν = (D^ν Ĝ)(0)    (6.3)

and, therefore, one has the Taylor expansion

    Ĝ(t) = 1 + Σ_{1 ≤ |ν| ≤ s} (μ_ν/ν!) (it)^ν + o(‖t‖^s)    (t → 0),    (6.4)

if ρ_s is finite for some positive integer s. Here

    ν! = ν₁! ν₂! ⋯ ν_k!    [ν = (ν₁,…,ν_k)].    (6.5)

If k = 1,

    μ_j = ∫ x^j G(dx)    (j = 1, 2, …).    (6.6)
For a positive real number x, log x denotes the natural logarithm. In other words, the function x → log x on (0, ∞) is defined as the inverse of the function x → exp(x) = 1 + x + x²/2! + x³/3! + ⋯ on R¹. For a nonzero complex number z = r exp(iθ) (r > 0, −π < θ ≤ π), we define

    log z = log r + iθ    (r > 0, −π < θ ≤ π).    (6.7)

Thus we always take the so-called principal branch of the logarithm. The characteristic function Ĝ of a probability measure G on R^k is continuous and has the value one at 0. Hence in a neighborhood of zero one has the Taylor expansion†

    log Ĝ(t) = Σ_{1 ≤ |ν| ≤ s} (χ_ν/ν!) (it)^ν + o(‖t‖^s)    (t → 0),    (6.8)

if ρ_s is finite for some positive integer s. Here the summation is over

†Note that |x^ν| ≤ ‖x‖^{|ν|}, so that μ_ν is finite whenever ρ_{|ν|} is. Also, see Corollary 8.3.
nonnegative integral vectors ν, and (see Corollary 8.3)

    χ_ν = i^{−|ν|} (D^ν log Ĝ)(0).    (6.9)

The coefficient χ_ν is called the νth cumulant of G. A simple calculation gives

    χ₀ = 0,   χ_ν = μ_ν  if |ν| = 1.    (6.10)
The following formal identity enables one to express a cumulant χ_ν in terms of moments by equating coefficients of t^ν on both sides:

    Σ_{|ν| ≥ 1} (χ_ν/ν!) t^ν = Σ_{s=1}^∞ ((−1)^{s+1}/s) ( Σ_{|ν| ≥ 1} (μ_ν/ν!) t^ν )^s.    (6.11)

This is obtained by noting that

    log(1+z) = Σ_{s=1}^∞ ((−1)^{s+1}/s) z^s    (z ∈ C, |z| < 1).    (6.12)
For example, in one dimension,

    χ₂ = μ₂ − μ₁²,
    χ₃ = μ₃ − 3μ₂μ₁ + 2μ₁³,
    χ₄ = μ₄ − 4μ₃μ₁ − 3μ₂² + 12μ₂μ₁² − 6μ₁⁴,    (6.13)
    χ₅ = μ₅ − 5μ₄μ₁ − 10μ₃μ₂ + 20μ₃μ₁² + 30μ₂²μ₁ − 60μ₂μ₁³ + 24μ₁⁵.
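The relations (6.13) are easy to spot-check numerically. Below, a uniform distribution on {0, 1, 3} is an arbitrary test case: the formulas should reproduce the variance, the third central moment, and the fourth central moment minus three times the squared variance.

```python
# Check the moment-to-cumulant relations (6.13) on a toy one-dimensional
# distribution: X uniform on {0, 1, 3} (an arbitrary choice for the test).
xs = [0.0, 1.0, 3.0]
mu = lambda j: sum(x ** j for x in xs) / len(xs)        # raw moments mu_j
m1, m2, m3, m4 = mu(1), mu(2), mu(3), mu(4)

chi2 = m2 - m1 ** 2
chi3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
chi4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4

var = sum((x - m1) ** 2 for x in xs) / len(xs)          # central moments
cm3 = sum((x - m1) ** 3 for x in xs) / len(xs)
cm4 = sum((x - m1) ** 4 for x in xs) / len(xs)
assert abs(chi2 - var) < 1e-12                          # chi2 = variance
assert abs(chi3 - cm3) < 1e-12                          # chi3 = 3rd central moment
assert abs(chi4 - (cm4 - 3 * var ** 2)) < 1e-12
```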
In general, in R^k one may write

    χ_ν = Σ c(ν₁,…,ν_s; j₁,…,j_s) μ_{ν₁}^{j₁} ⋯ μ_{ν_s}^{j_s},    (6.14)

where the summation is over s = 1, 2, …, |ν| and, for each s, over all s-tuples of nonnegative integral vectors (ν₁,…,ν_s) and s-tuples of nonnegative integers (j₁,…,j_s) satisfying

    j₁ν₁ + ⋯ + j_sν_s = ν,    (6.15)

and c(ν₁,…,ν_s; j₁,…,j_s) is a constant depending only on (ν₁,…,ν_s) and (j₁,…,j_s).
For a given probability measure G on R^k with cumulants {χ_ν : |ν| ≤ s}, we define the polynomial χ_s(z) in z = (z₁,…,z_k) ∈ C^k by

    χ_s(z) = s! Σ_{|ν| = s} (χ_ν/ν!) z^ν    (z ∈ C^k; s = 1, 2, …).    (6.16)

We point out that for t ∈ R^k, χ_s(t) is the sth cumulant of the probability measure on R¹ induced by the map x → ⟨t,x⟩ defined on the probability space (R^k, 𝔅^k, G) into (R¹, 𝔅¹). This follows because the function u → log Ĝ(ut) [u ∈ (−a,a) for some a > 0] is the logarithm of the characteristic function of this induced probability measure on R¹ and because [by (6.8)]

    (d^s/du^s) log Ĝ(ut) |_{u=0} = i^s s! Σ_{|ν| = s} (χ_ν/ν!) t^ν = i^s χ_s(t).
Often we find it convenient to state results in terms of random variables and random vectors (although for the purposes of this monograph it would be possible to avoid such notions). The νth moment μ_ν(X), the νth cumulant χ_ν(X), and the sth absolute moment ρ_s(X) of a random vector X in R^k are defined to be the corresponding characteristics of the distribution P_X of X. Note that for a random vector X with a finite covariance matrix,

    Cov(TX+a) = Cov(TX) = T Cov(X) T′,    (6.17)

where T is an m×k matrix (1 ≤ m ≤ k), T′ is its transpose, and a ∈ R^m. Also, if X₁,…,X_n are n independent random vectors in R^k with zero means and ρ₂(X_j) < ∞ for each j, then

    ρ₂(X₁ + ⋯ + X_n) = ρ₂(X₁) + ⋯ + ρ₂(X_n).    (6.18)

If X is a random vector in R^k and has a finite νth moment (and cumulant), then for every c ∈ R¹,

    μ_ν(cX) = c^{|ν|} μ_ν(X),   χ_ν(cX) = c^{|ν|} χ_ν(X)    (|ν| ≥ 1),    (6.19)

and for all c ∈ R¹, b ∈ R^k,

    χ_ν(cX+b) = χ_ν(cX) = c^{|ν|} χ_ν(X)    if |ν| ≥ 2,
    χ_ν(cX+b) = μ_ν(cX+b) = c μ_ν(X) + b^ν = c χ_ν(X) + b^ν    if |ν| = 1.    (6.20)
It may also be shown, either by direct computation, as in (6.13), or by looking at (6.14) and (6.15), that

    χ_ν(X) = μ_ν(X)  for |ν| = 2, 3,  if E(X) = 0.    (6.21)

The relations (6.19) and (6.20) follow easily from the definitions; note that in a neighborhood of t = 0,

    log P̂_{cX+b}(t) = i⟨t,b⟩ + log P̂_X(ct).    (6.22)
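The identity (6.21) combines with a basic property of cumulants, their additivity over independent summands, which is developed just below. Both can be verified exactly on small discrete distributions (the supports and probabilities here are arbitrary test data):

```python
# Using chi3 = third central moment for zero-mean variables (6.21), check that
# the third cumulant adds over an independent sum: chi3(X+Y) = chi3(X) + chi3(Y).
X = {0.0: 0.5, 2.0: 0.5}
Y = {-1.0: 0.25, 0.0: 0.5, 3.0: 0.25}

def chi3(dist):
    m = sum(x * p for x, p in dist.items())
    return sum((x - m) ** 3 * p for x, p in dist.items())

S = {}                                  # distribution of the independent sum X + Y
for x, px in X.items():
    for y, py in Y.items():
        S[x + y] = S.get(x + y, 0.0) + px * py

assert abs(chi3(S) - (chi3(X) + chi3(Y))) < 1e-12
```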
Since the cumulants χ_ν are invariant under changes of origin for |ν| ≥ 2, they are sometimes called semi-invariants. The effect on χ_ν (and μ_ν) of a change of scale, as given by (6.19), is also quite simple. However, the most important property of cumulants from our point of view is the following: if X₁,…,X_n are independent random vectors in R^k each having a finite νth cumulant, then

    χ_ν(X₁ + ⋯ + X_n) = χ_ν(X₁) + ⋯ + χ_ν(X_n).    (6.23)

This follows from

    log P̂_{X₁+⋯+X_n}(t) = Σ_{j=1}^n log P̂_{X_j}(t),    (6.24)

which holds in a neighborhood of t = 0.

We now derive some inequalities for moments and cumulants of a random vector X. Henceforth, if there is no possibility of confusion, we simply write μ_ν, ρ_s, χ_ν for μ_ν(X), ρ_s(X), χ_ν(X), and so forth, respectively. The following basic inequality is stated without proof.† It is used fairly often in this book.

LEMMA 6.1 (Generalized Hölder Inequality). Let f_i, 1 ≤ i ≤ m, be measurable functions on a measure space (Ω, 𝒜, μ) into (R¹, 𝔅¹) such that |f_i|^{p_i}, 1 ≤ i ≤ m, are integrable for some m-tuple of positive reals (p₁,…,p_m) satisfying p₁^{−1} + ⋯ + p_m^{−1} = 1. Then

    ∫ |f₁ ⋯ f_m| dμ ≤ Π_{i=1}^m ( ∫ |f_i|^{p_i} dμ )^{1/p_i}.

LEMMA 6.2. Let X be a random vector in R^k having a finite sth absolute moment ρ_s for some positive s. If X is not degenerate at 0,‡ then

†See Hardy, Littlewood, and Pólya [1] for several different proofs of this important inequality.
‡That is, Prob(X = 0) ≠ 1.
(i) r → log ρ_r is a convex function on [0, s];
(ii) r → ρ_r^{1/r} is nondecreasing on (0, s];
(iii) r → (ρ_r/ρ₂^{r/2})^{1/(r−2)} is nondecreasing on (2, s] if s > 2.

Proof. We may assume that X is not degenerate at 0, so that log ρ_r is defined for 0 ≤ r ≤ s.
(i) Let 0 ≤ α ≤ 1, 0 ≤ r₁, r₂ ≤ s. Then

    ρ_{αr₁+(1−α)r₂} = E( ‖X‖^{αr₁+(1−α)r₂} ) = E(YZ),

where Y = ‖X‖^{αr₁}, Z = ‖X‖^{(1−α)r₂}. Since E(Y^{1/α}) and E(Z^{1/(1−α)}) are both finite, by Lemma 6.1 (with m = 2),

    E(YZ) ≤ ( E(Y^{1/α}) )^α ( E(Z^{1/(1−α)}) )^{1−α} = ρ_{r₁}^α ρ_{r₂}^{1−α},

so that log ρ_{αr₁+(1−α)r₂} ≤ α log ρ_{r₁} + (1−α) log ρ_{r₂}.
(ii) Since log ρ₀ = 0, and (1/r) log ρ_r is the slope of the line segment joining (0, log ρ₀) and (r, log ρ_r), it follows by the convexity of r → log ρ_r that the function r → log ρ_r^{1/r} = r^{−1} log ρ_r is nondecreasing.
(iii) If ρ₂ = 1, then the function

    r → log ρ_r^{1/(r−2)} = (r−2)^{−1} log ρ_r = (r−2)^{−1} (log ρ_r − log ρ₂)    (6.25)

is nondecreasing in (2, s], since the expression on the extreme right of (6.25) is the slope of the line segment joining (2, log ρ₂) and (r, log ρ_r), and r → log ρ_r is convex in [0, s] by (i). In the general case, apply this argument to Y = X/ρ₂^{1/2}. Q.E.D.

A simple application of Lemma 6.2(ii) yields

    ( Σ_{i=1}^m |a_i| )^r ≤ m^{r−1} Σ_{i=1}^m |a_i|^r    (a_i ∈ R¹, 1 ≤ i ≤ m)    (6.26)

for every r > 1 and every positive integer m. It also holds, trivially, for r = 1.
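Lemma 6.2(ii) holds for the empirical distribution of any sample, so it can be spot-checked directly; the sample below (shifted absolute Gaussian draws) and the exponents tested are illustrative choices.

```python
import random

# Spot-check Lemma 6.2(ii): r -> (E ||X||^r)^(1/r) is nondecreasing in r,
# applied to the empirical distribution of a random sample of norms.
random.seed(0)
sample = [abs(random.gauss(0.0, 1.0)) + 0.1 for _ in range(5000)]   # ||X|| values

def rho(r):
    return sum(v ** r for v in sample) / len(sample)

norms = [rho(r) ** (1.0 / r) for r in (0.5, 1.0, 2.0, 3.0, 4.0)]
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```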
LEMMA 6.3. Let X be a random vector in R^k having a finite sth absolute moment ρ_s for some positive integer s. Then for nonnegative integral vectors ν satisfying |ν| ≤ s,

    |μ_ν| ≤ E|X^ν| ≤ ρ_{|ν|},    (6.27)

and there exists a constant c₁(ν) depending only on ν such that

    |χ_ν| ≤ c₁(ν) ρ_{|ν|}.    (6.28)

Proof. The inequalities (6.27) follow from the simple inequality

    |x^ν| ≤ ‖x‖^{|ν|}.    (6.29)

The inequality (6.28) follows from (6.14) and (6.15) by noting that

    | μ_{ν₁}^{j₁} ⋯ μ_{ν_s}^{j_s} | ≤ ρ_{|ν₁|}^{j₁} ⋯ ρ_{|ν_s|}^{j_s} ≤ ρ_{|ν|}^{( Σ_i j_i |ν_i| )/|ν|} = ρ_{|ν|}.    (6.30)
The first inequality in (6.30) follows from (6.27), the second from Lemma 6.2(ii). Q.E.D.

Before concluding this section, we mention a few properties of the single most important probability measure on R^k, the normal distribution Φ_{m,V}, whose density φ_{m,V} (with respect to Lebesgue measure) is given by

    φ_{m,V}(x) = (2π)^{−k/2} (Det V)^{−1/2} exp{ −½ ⟨x−m, V^{−1}(x−m)⟩ }    (x ∈ R^k).    (6.31)

Of the two parameters, m ∈ R^k and V is a symmetric positive-definite k×k matrix. The notation Det V stands for the determinant of V, and V^{−1} is, of course, the inverse of V. It is well known that†

    Φ̂_{m,V}(t) = exp{ i⟨t,m⟩ − ½⟨t,Vt⟩ }    (t ∈ R^k).    (6.32)

From this it follows that m is the mean and V is the covariance matrix of Φ_{m,V}. For the computation of cumulants χ_ν for |ν| ≥ 2, it is convenient to take m = 0 (for |ν| ≥ 2 a change of origin does not affect the cumulants χ_ν). This yields

    log Φ̂_{0,V}(t) = −½⟨t,Vt⟩,    (6.33)

†See Cramér [4], pp. 118, 119.
which shows that

    χ_ν = (i,j) element of V  if ν = e_i + e_j,    (6.34)

where e_i is the vector with 1 for the ith coordinate and 0 for the others, 1 ≤ i ≤ k. Also,

    χ_ν = 0  if |ν| > 2.    (6.35)

Another important property of the normal distribution is

    Φ_{m₁,V₁} * Φ_{m₂,V₂} * ⋯ * Φ_{m_n,V_n} = Φ_{m,V},    (6.36)

where

    m = m₁ + m₂ + ⋯ + m_n,   V = V₁ + V₂ + ⋯ + V_n.    (6.37)

This follows from (6.32) and Theorem 5.1(i), (iii). The normal distribution Φ_{0,I}, where I is the k×k identity matrix, is called the standard normal distribution on R^k and is denoted by Φ; the density of Φ is denoted by φ. Lastly, if X = (X₁,…,X_k) is a random vector with distribution Φ_{m,V}, then, for every a ∈ R^k, a ≠ 0, the random variable ⟨a,X⟩ has the one-dimensional normal distribution with mean ⟨a,m⟩ and variance ⟨a,Va⟩ = Σ_{i,j=1}^k a_i a_j v_{ij}, where

    v_{ij} = (i,j) element of V = Cov(X_i, X_j)    (i, j = 1,…,k).
7. THE POLYNOMIALS P_s AND THE SIGNED MEASURES P_s

Throughout this section ν = (ν₁,…,ν_k) is a nonnegative integral vector in R^k. Consider the polynomials

    χ_s(z) = s! Σ_{|ν| = s} (χ_ν/ν!) z^ν    [z^ν = z₁^{ν₁} ⋯ z_k^{ν_k}]    (7.1)

in k variables z₁, z₂,…,z_k (real or complex) for a given set of real constants χ_ν. We define the formal polynomials P_s(z : {χ_ν}) in z₁,…,z_k by means of the following identity between two formal power series (in the real variable u):

    1 + Σ_{s=1}^∞ P_s(z : {χ_ν}) u^s = exp{ Σ_{s=1}^∞ (χ_{s+2}(z)/(s+2)!) u^s }
      = 1 + Σ_{m=1}^∞ (1/m!) ( Σ_{s=1}^∞ (χ_{s+2}(z)/(s+2)!) u^s )^m.    (7.2)
In other words, P_s(z : {χ_ν}) is the coefficient of u^s in the series on the extreme right. Thus

    P_s(z : {χ_ν}) = Σ_{m=1}^s (1/m!) Σ* ( χ_{j₁+2}(z)/(j₁+2)! ) ( χ_{j₂+2}(z)/(j₂+2)! ) ⋯ ( χ_{j_m+2}(z)/(j_m+2)! )    (s = 1, 2, …),    (7.3)

where the summation Σ* is over all m-tuples of positive integers (j₁,…,j_m) satisfying

    j_i = 1, 2, …, s  (1 ≤ i ≤ m),   Σ_{i=1}^m j_i = s.    (7.4)

Expanding each factor by (7.1), one may also write P_s(z : {χ_ν}) = Σ_{m=1}^s (1/m!) Σ* Σ** (χ_{ν₁}/ν₁!) ⋯ (χ_{ν_m}/ν_m!) z^{ν₁+⋯+ν_m}, where Σ** denotes summation over all m-tuples of nonnegative integral vectors (ν₁,…,ν_m) satisfying

    |ν_i| = j_i + 2    (1 ≤ i ≤ m).    (7.5)
In particular,

    P₁(z : {χ_ν}) = χ₃(z)/3! = Σ_{|ν|=3} (χ_ν/ν!) z^ν,

    P₂(z : {χ_ν}) = χ₄(z)/4! + χ₃²(z)/( 2!(3!)² ),

    P₃(z : {χ_ν}) = χ₅(z)/5! + χ₄(z)χ₃(z)/(3!4!) + χ₃³(z)/(3!)⁴.    (7.6)
LEMMA 7.1. The degree of the polynomial P_s(z : {χ_ν}) is 3s, and the smallest order of the terms in the polynomial is s+2. The coefficients of P_s(z : {χ_ν}) involve only χ_ν's with |ν| ≤ s+2.

Proof. This follows immediately from (7.3). Q.E.D.

The notation above has been chosen to suggest that eventually we use cumulants of some probability measure for the χ_ν's in the expression for P_s(z : {χ_ν}). The polynomial P_s can be defined in this sense only if the (s+2)th absolute moment ρ_{s+2} is finite. The role of the polynomials P_s in the theory of normal approximation is briefly indicated now. More precise results are given in Section 9.

Let G be a probability measure on R^k with zero mean, positive-definite covariance matrix V, and finite sth absolute moment ρ_s for some integer s ≥ 3. Then [see (6.8), (6.16), and (6.23)]

    log Ĝⁿ(t/n^{1/2}) = n log Ĝ(t/n^{1/2})
      = −½⟨t,Vt⟩ + Σ_{r=1}^{s−2} ( χ_{r+2}(it)/(r+2)! ) n^{−r/2} + n·o( ‖t‖^s n^{−s/2} ).    (7.7)

Thus for any fixed t ∈ R^k,

    Ĝⁿ(t/n^{1/2}) = exp{−½⟨t,Vt⟩} exp{ Σ_{r=1}^{s−2} ( χ_{r+2}(it)/(r+2)! ) n^{−r/2} + o(n^{−(s−2)/2}) }
      = exp{−½⟨t,Vt⟩} [ 1 + Σ_{r=1}^{s−2} n^{−r/2} P_r(it : {χ_ν}) ] ( 1 + o(n^{−(s−2)/2}) )    (n → ∞),    (7.8)
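The first-order case of (7.8) can be illustrated numerically in one dimension. Below, G is a standardized two-point (Bernoulli-type) distribution; the parameter p = 0.3 and the evaluation point are arbitrary choices, and the correction term is P₁(it) = χ₃(it)³/6.

```python
import cmath, math

# One-dimensional illustration of (7.8) with s = 3: for a standardized
# two-point G (zero mean, unit variance, third cumulant chi3),
# G^(t/sqrt(n))**n is close to exp(-t**2/2) * (1 + n**-0.5 * chi3*(it)**3/6).
p = 0.3
s = math.sqrt(p * (1 - p))
v1, v2 = (1 - p) / s, -p / s                 # support of the standardized variable
chi3 = p * v1 ** 3 + (1 - p) * v2 ** 3       # third cumulant = E X^3 here (6.21)

def G_hat(t):
    return p * cmath.exp(1j * t * v1) + (1 - p) * cmath.exp(1j * t * v2)

n, t = 400, 1.0
exact = G_hat(t / math.sqrt(n)) ** n
plain = cmath.exp(-t * t / 2)
edgeworth = plain * (1 + chi3 * (1j * t) ** 3 / (6 * math.sqrt(n)))
assert abs(exact - edgeworth) < 1e-3
assert abs(exact - edgeworth) < abs(exact - plain)   # the P1 term improves the fit
```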
In the evaluation of P_r(it : {χ_ν}) in (7.8), one uses the cumulants χ_ν of G. Thus, for each t ∈ R^k, one has an asymptotic expansion of Ĝⁿ(t/n^{1/2}) in powers of n^{−1/2}, in the sense that the remainder is of smaller order of magnitude than the last term in the expansion. The first term in the asymptotic expansion is the characteristic function of the normal distribution Φ_{0,V}. The function (of t) that appears as the coefficient of n^{−r/2} in the asymptotic expansion is the Fourier transform of a function that we denote by P_r(−φ_{0,V} : {χ_ν}). The reason for such a notation is supplied by the following lemma.

LEMMA 7.2. The function

    t → P_r(it : {χ_ν}) exp{−½⟨t,Vt⟩}    (t ∈ R^k)    (7.9)

is the Fourier transform of the function P_r(−φ_{0,V} : {χ_ν}) obtained by formally substituting

    (−1)^{|ν|} D^ν φ_{0,V}  for  (it)^ν    (7.10)

for each ν in the polynomial P_r(it : {χ_ν}). Here φ_{0,V} is the normal density in R^k with mean zero and covariance matrix V. Thus one has the formal identity

    P_r(−φ_{0,V} : {χ_ν}) = P_r(−D : {χ_ν}) φ_{0,V},    (7.11)

where −D = (−D₁,…,−D_k).

Proof. The Fourier transform of φ_{0,V} is given by [see (6.32)]

    φ̂_{0,V}(t) = exp{−½⟨t,Vt⟩}    (t ∈ R^k).    (7.12)

Also

    ( (−1)^{|ν|} D^ν φ_{0,V} )^(t) = (it)^ν φ̂_{0,V}(t)    (t ∈ R^k),    (7.13)

which is obtained by taking the νth derivatives with respect to x on both sides of (the Fourier inversion formula)

    φ_{0,V}(x) = (2π)^{−k} ∫ exp{−i⟨t,x⟩} φ̂_{0,V}(t) dt    (x ∈ R^k)    (7.14)
[or, by Theorem 4.1(iv), (v)]. Q.E.D.

We define P_r(−Φ_{0,V} : {χ_ν}) as the finite signed measure on R^k whose density is P_r(−φ_{0,V} : {χ_ν}). For any given finite signed measure μ on R^k, we define μ(·), the distribution function of μ, by

    μ(x) = μ( (−∞, x] )    (x ∈ R^k),    (7.15)

where

    (−∞, x] = (−∞, x₁] × ⋯ × (−∞, x_k]    [x = (x₁,…,x_k) ∈ R^k].    (7.16)

Note that

    D₁ ⋯ D_k P_r(−Φ_{0,V} : {χ_ν})(x) = P_r(−φ_{0,V} : {χ_ν})(x) = P_r(−D : {χ_ν}) φ_{0,V}(x)
      = P_r(−D : {χ_ν}) ( D₁ ⋯ D_k Φ_{0,V} )(x) = D₁ ⋯ D_k ( P_r(−D : {χ_ν}) Φ_{0,V} )(x).    (7.17)
Thus the distribution function of P_r(−Φ_{0,V} : {χ_ν}) is obtained by applying the operator P_r(−D : {χ_ν}) to the normal distribution function Φ_{0,V}(·). The last equality in (7.17) follows from the fact that the differential operators P_r(−D : {χ_ν}) and D₁D₂ ⋯ D_k commute.

Let us write down P₁(−φ_{0,V} : {χ_ν}) explicitly. By (7.6),

    P₁(it : {χ_ν}) = Σ_{|ν|=3} (χ_ν/ν!) (it)^ν    (t ∈ R^k),    (7.18)

so that (by Lemma 7.2)

    P₁(−φ_{0,V} : {χ_ν})(x) = −Σ_{|ν|=3} (χ_ν/ν!) (D^ν φ_{0,V})(x).

Writing V^{−1} = ((v^{ij})) and ξ_i = Σ_{j=1}^k v^{ij} x_j (1 ≤ i ≤ k), this becomes

    P₁(−φ_{0,V} : {χ_ν})(x)
      = { (1/6) [ χ_{(3,0,…,0)} ( ξ₁³ − 3v^{11}ξ₁ ) + ⋯ + χ_{(0,…,0,3)} ( ξ_k³ − 3v^{kk}ξ_k ) ]
      + (1/2) [ χ_{(2,1,0,…,0)} ( ξ₁²ξ₂ − 2v^{12}ξ₁ − v^{11}ξ₂ ) + ⋯
          + χ_{(0,…,0,1,2)} ( ξ_k²ξ_{k−1} − 2v^{k,k−1}ξ_k − v^{kk}ξ_{k−1} ) ]
      + [ χ_{(1,1,1,0,…,0)} ( ξ₁ξ₂ξ₃ − v^{12}ξ₃ − v^{13}ξ₂ − v^{23}ξ₁ ) + ⋯
          + χ_{(0,…,0,1,1,1)} ( ξ_{k−2}ξ_{k−1}ξ_k − v^{k−2,k−1}ξ_k − v^{k−2,k}ξ_{k−1} − v^{k−1,k}ξ_{k−2} ) ] } φ_{0,V}(x)
        [x = (x₁,…,x_k) ∈ R^k].    (7.19)
If one takes V = I, then (7.19) reduces to

    P₁(−φ : {χ_ν})(x) = { (1/6) [ χ_{(3,0,…,0)} (x₁³ − 3x₁) + ⋯ + χ_{(0,…,0,3)} (x_k³ − 3x_k) ]
      + (1/2) [ χ_{(2,1,0,…,0)} (x₁²x₂ − x₂) + ⋯ + χ_{(0,…,0,1,2)} (x_k²x_{k−1} − x_{k−1}) ]
      + [ χ_{(1,1,1,0,…,0)} x₁x₂x₃ + ⋯ + χ_{(0,…,0,1,1,1)} x_{k−2}x_{k−1}x_k ] } φ(x)    (x ∈ R^k),    (7.20)
where φ is the standard normal density in R^k. If k = 1, by letting χ_j be the jth cumulant of a probability measure G on R¹ (j = 1, 2, 3) having zero mean, one gets [using (6.13)]

    P₁(−φ : {χ_ν})(x) = (1/6) μ₃ (x³ − 3x) φ(x)    (x ∈ R¹),    (7.21)

where μ₃ is the third moment of G. Finally, note that whatever the numbers {χ_ν} and the positive-definite symmetric matrix V are,
where µ 3 is the third moment of G. Finally, note that whatever the numbers (X,) and the positive definite symmetric matrix V are,
f P,( — $o v: {^))(x)dx=P,( -4 o,: {x,))(R k ) =0,
(7.22)
Approximation of Characteristic Functions
57
for all s > 1. This follows from
for III 1.
f (Dpo o, v )(x)dx=0
(7.23)
The relation (7.23) is a consequence of the fact that 40 o v and all its derivatives are integrable and vanish at infinity.
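The substitution rule (7.10) behind (7.21) can be checked directly for k = 1, V = 1: replacing (it)³ by (−1)³φ''' in P₁(it) = μ₃(it)³/6 should give exactly (μ₃/6)(x³ − 3x)φ(x). The value of μ₃ and the test points below are arbitrary.

```python
import math

# Check (7.21) against Lemma 7.2 for k = 1, V = 1:
# -(mu3/6) * phi'''(x) should equal (mu3/6) * (x**3 - 3*x) * phi(x).
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
mu3 = 0.8

def third_derivative(f, x, h=1e-3):
    # central finite-difference approximation of f'''(x)
    return (f(x + 2 * h) - 2 * f(x + h) + 2 * f(x - h) - f(x - 2 * h)) / (2 * h ** 3)

for x in (-1.5, 0.3, 2.0):
    closed_form = (mu3 / 6) * (x ** 3 - 3 * x) * phi(x)
    assert abs(closed_form - (-(mu3 / 6) * third_derivative(phi, x))) < 1e-5
```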
8. APPROXIMATION OF CHARACTERISTIC FUNCTIONS OF NORMALIZED SUMS OF INDEPENDENT RANDOM VECTORS

Let X₁,…,X_n be n independent random vectors in R^k, each with zero mean and a finite third (or fourth) absolute moment. In this section we investigate the rate of convergence of P̂_{(X₁+⋯+X_n)/n^{1/2}} to Φ̂_{0,V}, where V is the average of the covariance matrices of X₁,…,X_n. The following form of Taylor's expansion will be useful to us.

LEMMA 8.1.† Let f be a complex-valued function defined on an open interval J of the real line, having continuous derivatives f^{(r)} of orders r = 1,…,s. If x, x+h ∈ J, then

    f(x+h) = f(x) + Σ_{r=1}^{s−1} (h^r/r!) f^{(r)}(x) + ( h^s/(s−1)! ) ∫₀¹ (1−v)^{s−1} f^{(s)}(x+vh) dv.    (8.1)

COROLLARY 8.2. For all real numbers u and positive integers s,

    | exp{iu} − 1 − iu − ⋯ − (iu)^{s−1}/(s−1)! | ≤ |u|^s/s!.    (8.2)

Consequently, if G is a probability measure on R^k having a finite sth absolute moment ρ_s for some positive integer s, then

    | Ĝ(t) − 1 − iμ₁(t) − ⋯ − ( i^{s−1}/(s−1)! ) μ_{s−1}(t) | ≤ ρ_s(t)/s! ≤ ρ_s ‖t‖^s/s!    (t ∈ R^k),    (8.3)

†Hardy [1], p. 327.
where for r = 1,…,s,

    μ_r(t) = ∫ ⟨t,x⟩^r G(dx),   ρ_s(t) = ∫ |⟨t,x⟩|^s G(dx).    (8.4)

Proof. The inequality (8.2) follows immediately from Lemma 8.1 on taking f(u) = exp(iu) (u ∈ R¹) and x = 0, h = u. Inequality (8.3) is obtained on replacing u by ⟨t,x⟩ in (8.2) and integrating with respect to G(dx). Note that

    ρ_s(t) = ∫ |⟨t,x⟩|^s G(dx) ≤ ‖t‖^s ρ_s.  Q.E.D.

COROLLARY 8.3. Let f be a complex-valued function defined on an open subset Ω of R^k, having continuous derivatives D^ν f for |ν| ≤ s (on Ω). If the closed line segment joining x, x+h (∈ R^k) lies in Ω, then
    f(x+h) = f(x) + Σ_{1 ≤ |ν| ≤ s−1} (h^ν/ν!) (D^ν f)(x) + s Σ_{|ν| = s} (h^ν/ν!) ∫₀¹ (1−u)^{s−1} (D^ν f)(x+uh) du,    (8.5)

and therefore

    | f(x+h) − f(x) − Σ_{1 ≤ |ν| ≤ s−1} (h^ν/ν!) (D^ν f)(x) |
      ≤ Σ_{|ν| = s} ( |h^ν|/ν! ) max{ |(D^ν f)(x+uh)| : 0 ≤ u ≤ 1 }.    (8.6)

Proof. Define the function g on (−ε, 1+ε) by

    g(u) = f(x+uh),    (8.7)

ε being a positive number for which the line segment joining x−εh, x+(1+ε)h lies in Ω. Using the formula for differentiation of composite functions, one obtains, by induction on r,

    g^{(r)}(u) = r! Σ_{|ν| = r} (h^ν/ν!) (D^ν f)(x+uh)    (r = 1,…,s).    (8.8)
Hence, by Lemma 8.1,

    g(1) = g(0) + Σ_{r=1}^{s−1} g^{(r)}(0)/r! + ( 1/(s−1)! ) ∫₀¹ (1−u)^{s−1} g^{(s)}(u) du,

which, on substitution from (8.7) and (8.8), yields (8.5). The inequality (8.6) follows immediately from (8.5). Q.E.D.

Let X₁,…,X_n be n independent random vectors in R^k each having a zero mean and a finite sth absolute moment for some s ≥ 2. Assume that the average covariance matrix V, defined by

    V = n^{−1} Σ_{j=1}^n Cov(X_j) = n^{−1} Σ_{j=1}^n V_j,    (8.9)

is nonsingular. Then we define the Liapounov coefficient l_{s,n} by

    l_{s,n} = n^{−(s−2)/2} sup_{‖t‖=1} [ n^{−1} Σ_{j=1}^n E|⟨t,X_j⟩|^s ] / [ n^{−1} Σ_{j=1}^n E⟨t,X_j⟩² ]^{s/2}    (s > 2).    (8.10)

It is simple to check that l_{s,n} is independent of scale. If B is a nonsingular k×k matrix, then BX₁,…,BX_n have the same Liapounov coefficients as X₁,…,X_n have. If one writes

    ρ_{r,j} = E( ‖X_j‖^r )    (1 ≤ j ≤ n),   ρ̄_r = n^{−1} Σ_{j=1}^n ρ_{r,j}    (r > 0),    (8.11)

then it is clear from (8.10) that

    l_{s,n} ≤ n^{−(s−2)/2} sup_{‖t‖=1} ρ̄_s ‖t‖^s / ⟨t,Vt⟩^{s/2} = ρ̄_s λ^{−s/2} n^{−(s−2)/2},    (8.12)

where λ is the smallest eigenvalue of V. In one dimension (i.e., k = 1),

    l_{s,n} = ρ̄_s ρ̄₂^{−s/2} n^{−(s−2)/2}    (s > 2).    (8.13)
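The one-dimensional formula (8.13) makes the two basic features of l_{s,n} easy to check: it is scale invariant and, for a fixed underlying distribution, it shrinks like n^{−(s−2)/2}. The sample values below are arbitrary.

```python
# One-dimensional case (8.13) with s = 3: l_{3,n} = rho3 * rho2**(-3/2) * n**(-1/2)
# for the empirical distribution of a toy sample.
xs = [-2.0, -1.0, 0.5, 2.5]
rho = lambda r: sum(abs(x) ** r for x in xs) / len(xs)

def l3n(n):
    return rho(3) * rho(2) ** -1.5 * n ** -0.5

# Scale invariance: replacing x by c*x leaves l_{3,n} unchanged.
c = 3.7
rho_c = lambda r: sum(abs(c * x) ** r for x in xs) / len(xs)
l3n_scaled = rho_c(3) * rho_c(2) ** -1.5 * 100 ** -0.5
assert abs(l3n(100) - l3n_scaled) < 1e-12
# Decay like n**(-1/2): quadrupling n halves the coefficient.
assert abs(l3n(400) - l3n(100) / 2) < 1e-12
```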
We also use the following simple inequalities in the proofs of some of the theorems below. If V = I, then [by (8.10), since ⟨t,Vt⟩ = ‖t‖²]

    n^{−1} Σ_{j=1}^n E|⟨t,X_j⟩|^s ≤ n^{(s−2)/2} l_{s,n} ‖t‖^s    (s > 2);

in particular,

    n^{−1} Σ_{j=1}^n E|⟨t,X_j⟩|³ ≤ n^{1/2} l_{3,n} ‖t‖³.    (8.14)
In the rest of this section we use the notation (8.9)–(8.11), often without further mention.

THEOREM 8.4. Let X₁,…,X_n be n independent random vectors (with values) in R^k having distributions G₁,…,G_n, respectively. Assume that each X_j has zero mean and a finite third absolute moment. Assume also that the average covariance matrix V is nonsingular. Let B be the symmetric positive-definite matrix satisfying

    B² = V^{−1}.    (8.15)

Define

    a(d) = Σ_{r=2}^∞ r^{−1} (d²/2)^{r−2},   b_n(d) = ½ − d ( d·a(d) + 1/6 ) l_{3,n}^{2/3}.    (8.16)

Then for every d ∈ (0, 2^{1/2}) and for all t satisfying

    ‖t‖ ≤ d l_{3,n}^{−1/3},    (8.17)

one has the inequality

    | Π_{j=1}^n Ĝ_j(Bt/n^{1/2}) − exp{−½‖t‖²} | ≤ ( d·a(d) + 1/6 ) l_{3,n} ‖t‖³ exp{ −b_n(d) ‖t‖² }.    (8.18)
Proof. Assume first that V = I and, consequently, B = I, where I is the identity matrix. In the given range (8.17) one has [see (8.14)]

    | Ĝ_j(t/n^{1/2}) − 1 | ≤ (2n)^{−1} E⟨t,X_j⟩² ≤ (2n)^{−1} ( E|⟨t,X_j⟩|³ )^{2/3}
      ≤ (2n)^{−1} ( n^{3/2} l_{3,n} ‖t‖³ )^{2/3} = ½ l_{3,n}^{2/3} ‖t‖² ≤ d²/2,    (8.19)

so that log Ĝ_j(t/n^{1/2}) is defined for 1 ≤ j ≤ n, and [using (8.3) and (8.14)]

    | log Π_{j=1}^n Ĝ_j(t/n^{1/2}) + ½‖t‖² |
      = | Σ_{j=1}^n log( 1 − (1 − Ĝ_j(t/n^{1/2})) ) + (2n)^{−1} Σ_{j=1}^n E⟨t,X_j⟩² |
      ≤ Σ_{j=1}^n Σ_{r=2}^∞ r^{−1} | 1 − Ĝ_j(t/n^{1/2}) |^r + Σ_{j=1}^n | 1 − Ĝ_j(t/n^{1/2}) − (2n)^{−1} E⟨t,X_j⟩² |
      ≤ a(d) (4n²)^{−1} Σ_{j=1}^n ( E|⟨t,X_j⟩|³ )^{4/3} + (1/6) n^{−3/2} Σ_{j=1}^n E|⟨t,X_j⟩|³
      ≤ a(d) n^{−2} ( Σ_{j=1}^n E|⟨t,X_j⟩|³ )^{4/3} + (1/6) l_{3,n} ‖t‖³
      ≤ d·a(d) l_{3,n} ‖t‖³ + (1/6) l_{3,n} ‖t‖³.    (8.20)

Hence, noting that |e^x − 1| ≤ |x| e^{|x|} for all complex numbers x, one has

    | Π_{j=1}^n Ĝ_j(t/n^{1/2}) − exp{−½‖t‖²} |
      = exp{−½‖t‖²} | exp{ log Π_{j=1}^n Ĝ_j(t/n^{1/2}) + ½‖t‖² } − 1 |
      ≤ ( d·a(d) + 1/6 ) l_{3,n} ‖t‖³ exp{ −½‖t‖² + d ( d·a(d) + 1/6 ) l_{3,n}^{2/3} ‖t‖² }
      = ( d·a(d) + 1/6 ) l_{3,n} ‖t‖³ exp{ −b_n(d) ‖t‖² }.    (8.21)
This proves the theorem when V = I. In the general case, look at the random vectors BX₁,…,BX_n, whose average covariance matrix is I, and recall that the Liapounov coefficient is independent of scale. Q.E.D.

By going through the proof it is easy to see that if the X_j's in Theorem 8.4 are i.i.d. with common distribution G, then

    | Ĝⁿ(Bt/n^{1/2}) − exp{−½‖t‖²} | ≤ ( d·a(d) + 1/6 ) l_{3,n} ‖t‖³ exp{ −[ ½ − ( d²a(d) + d/6 ) l_{3,n}^{2/3} ] ‖t‖² }    (8.22)

for all d ∈ (0, 2^{1/2}) and for all t satisfying

    ‖t‖ ≤ d l_{3,n}^{−1/3}.    (8.23)

The next two theorems sharpen (8.22) and (8.18), under the additional assumption of finiteness of fourth moments.

THEOREM 8.5. Let X be a random vector in R^k with distribution G. Suppose X has zero mean, positive-definite covariance matrix V, and a finite fourth absolute moment ρ₄. For all t satisfying
    ‖t‖ ≤ ¼ l_{4,n}^{−1/2}    (n ≥ 1),    (8.24)

one has the inequality

    | Ĝⁿ(Bt/n^{1/2}) − exp{−½‖t‖²} ( 1 + (i³/6) n^{−1/2} μ₃(t) ) |
      ≤ [ (0.1325) n^{−1} + (1/24) l_{4,n} ] ‖t‖⁴ exp{−½‖t‖²}
      + (0.0272) l²_{3,n} ‖t‖⁶ exp{ −(0.3835) ‖t‖² }    (n ≥ 1),    (8.25)

where μ₃(t) = E⟨t,BX⟩³, and B is defined by (8.15).

Proof. Assume that V = I. In the given range of t,

    | Ĝ(t/n^{1/2}) − 1 | ≤ (2n)^{−1} ‖t‖² < 1.    (8.26)

Hence

    | log Ĝⁿ(t/n^{1/2}) + ½‖t‖² − (i³/6) n^{−1/2} μ₃(t) |
      ≤ n Σ_{r=2}^∞ r^{−1} | 1 − Ĝ(t/n^{1/2}) |^r + n | Ĝ(t/n^{1/2}) − 1 + (2n)^{−1}‖t‖² − (i³/6) n^{−3/2} μ₃(t) |
      ≤ (4n)^{−1} ‖t‖⁴ Σ_{r=2}^∞ r^{−1} ( (2n)^{−1}‖t‖² )^{r−2} + (1/24) l_{4,n} ‖t‖⁴
      ≤ [ (0.1325) n^{−1} + (1/24) l_{4,n} ] ‖t‖⁴.    (8.27)

Similarly,

    | n log Ĝ(t/n^{1/2}) + ½‖t‖² | ≤ (0.1325) n^{−1} ‖t‖⁴ + (1/6) l_{3,n} ‖t‖³
      ≤ (0.0663 + 0.1667) l_{3,n} ‖t‖³ = (0.233) l_{3,n} ‖t‖³ ≤ (0.1165) ‖t‖².    (8.28)

We then have

    | Ĝⁿ(t/n^{1/2}) − exp{−½‖t‖²} ( 1 + (i³/6) n^{−1/2} μ₃(t) ) |
      ≤ exp{−½‖t‖²} | exp{ log Ĝⁿ(t/n^{1/2}) + ½‖t‖² } − 1 − [ log Ĝⁿ(t/n^{1/2}) + ½‖t‖² ] |
      + exp{−½‖t‖²} | log Ĝⁿ(t/n^{1/2}) + ½‖t‖² − (i³/6) n^{−1/2} μ₃(t) |
      ≤ exp{−½‖t‖²} ½ | log Ĝⁿ(t/n^{1/2}) + ½‖t‖² |² exp{ | log Ĝⁿ(t/n^{1/2}) + ½‖t‖² | }
      + [ (0.1325) n^{−1} + (1/24) l_{4,n} ] ‖t‖⁴ exp{−½‖t‖²}
      ≤ (0.0272) l²_{3,n} ‖t‖⁶ exp{ −(0.3835) ‖t‖² }
      + [ (0.1325) n^{−1} + (1/24) l_{4,n} ] ‖t‖⁴ exp{−½‖t‖²}.    (8.29)
We have used the inequality |e^x − 1 − x| ≤ ½|x|² e^{|x|} in (8.29) [as well as the estimates (8.27), (8.28)]. This proves the theorem for V = I. In the general case look at the random vectors BX₁,…,BX_n. Q.E.D.

THEOREM 8.6. Let X₁,…,X_n be n independent random vectors in R^k having distributions G₁,…,G_n, respectively. Suppose that each random vector X_j has zero mean and a finite fourth absolute moment. Assume that the average covariance matrix V is nonsingular. Also assume

    l_{4,n} ≤ 1.    (8.30)

Then for all t satisfying

    ‖t‖ ≤ ½ l_{4,n}^{−1/4},    (8.31)

one has

    | Π_{j=1}^n Ĝ_j(Bt/n^{1/2}) − exp{−½‖t‖²} ( 1 + (i³/6) n^{−1/2} μ̄₃(t) ) |
      ≤ (0.175) l_{4,n} ‖t‖⁴ exp{−½‖t‖²}
      + [ (0.018) l²_{4,n} ‖t‖⁸ + (1/36) l²_{3,n} ‖t‖⁶ ] exp{ −(0.383) ‖t‖² },    (8.32)

where B is the positive-definite symmetric matrix defined by (8.15), and

    μ̄₃(t) = n^{−1} Σ_{j=1}^n E⟨t,BX_j⟩³.

Proof. We may, as before, take V = I. Note that [by (8.14) and (8.31)]

    | Ĝ_j(t/n^{1/2}) − 1 | ≤ (2n)^{−1} E⟨t,X_j⟩² ≤ (2n)^{−1} ( E⟨t,X_j⟩⁴ )^{1/2}
      ≤ (2n)^{−1} ( Σ_{j=1}^n E⟨t,X_j⟩⁴ )^{1/2} ≤ (2n)^{−1} ( n² l_{4,n} ‖t‖⁴ )^{1/2} = ½ l_{4,n}^{1/2} ‖t‖² ≤ ⅛    (8.33)

in the given range of t. Proceeding as in (8.28), one has

    | Σ_{j=1}^n log Ĝ_j(t/n^{1/2}) + ½‖t‖² | ≤ (0.1325) l_{4,n} ‖t‖⁴ + (1/6) l_{3,n} ‖t‖³ ≤ (0.117) ‖t‖²,    (8.34)

using (8.30), (8.31) in the last step. Also, as in (8.27),

    | Σ_{j=1}^n log Ĝ_j(t/n^{1/2}) + ½‖t‖² − (i³/6) n^{−1/2} μ̄₃(t) | ≤ [ (0.1325) + (1/24) ] l_{4,n} ‖t‖⁴ ≤ (0.175) l_{4,n} ‖t‖⁴.    (8.35)

Hence, as in (8.29),

    | Π_{j=1}^n Ĝ_j(t/n^{1/2}) − exp{−½‖t‖²} ( 1 + (i³/6) n^{−1/2} μ̄₃(t) ) |
      ≤ exp{−½‖t‖²} | exp{ Σ_{j=1}^n log Ĝ_j(t/n^{1/2}) + ½‖t‖² } − 1 − [ Σ_{j=1}^n log Ĝ_j(t/n^{1/2}) + ½‖t‖² ] |
      + exp{−½‖t‖²} | Σ_{j=1}^n log Ĝ_j(t/n^{1/2}) + ½‖t‖² − (i³/6) n^{−1/2} μ̄₃(t) |
      ≤ (0.175) l_{4,n} ‖t‖⁴ exp{−½‖t‖²}
      + [ (0.1325)² l²_{4,n} ‖t‖⁸ + (1/36) l²_{3,n} ‖t‖⁶ ] exp{ −½‖t‖² + (0.117) ‖t‖² }.    (8.36)
Q.E.D.

Note that [see (7.6)] if χ_{ν,j} is the νth cumulant of X_j and if χ̄_ν = n^{−1} Σ_{j=1}^n χ_{ν,j}, then

    P₁(it : {χ̄_ν}) = i³ Σ_{|ν|=3} (χ̄_ν/ν!) t^ν = n^{−1} Σ_{j=1}^n i³ Σ_{|ν|=3} (χ_{ν,j}/ν!) t^ν
      = n^{−1} Σ_{j=1}^n (i³/3!) χ_{3,j}(t) = n^{−1} Σ_{j=1}^n (i³/6) μ_{3,j}(t) = (i³/6) μ̄₃(t),

the equality χ_{3,j}(t) = μ_{3,j}(t) being a consequence of the fact that χ_{3,j}(t), μ_{3,j}(t) are, respectively, the third cumulant and third moment of the random variable ⟨t,X_j⟩, which has zero mean [see (6.21)].

THEOREM 8.7. Let G be a probability measure on R^k having zero mean, nonsingular covariance matrix V, and a finite third absolute moment. For all t satisfying

    ‖t‖ ≤ 2^{1/2} l_{3,n}^{−1},    (8.37)

one has

    | Ĝⁿ(Bt/n^{1/2}) | ≤ exp{ −( 2^{−1} − 2^{1/2}/6 ) ‖t‖² },    (8.38)

where B = B′, B² = V^{−1}.

Proof. First take V = I. By Taylor expansion, if X has distribution G,

    Ĝ(t/n^{1/2}) = 1 − (2n)^{−1} ‖t‖² + (θ/6) n^{−3/2} E|⟨t,X⟩|³,    (8.39)

where |θ| ≤ 1. Hence, noting that 1 − (2n)^{−1}‖t‖² ≥ 0 in the given range of t,

    | Ĝ(t/n^{1/2}) | ≤ 1 − (2n)^{−1} ‖t‖² + (1/6) n^{−1} l_{3,n} ‖t‖³
      ≤ exp{ −(2n)^{−1} ‖t‖² + (1/6) n^{−1} l_{3,n} ‖t‖³ }
      ≤ exp{ −n^{−1} ( 2^{−1} − 2^{1/2}/6 ) ‖t‖² },    (8.40)
(8.40)
which proves (8.38), when V= 1. For the general case, look at the distribution of BX, where X has distribution G. Q.E.D. Before proving a similar result for non-i.i.d. random vectors, we need a simple lemma. LEMMA 8.8. Let X and Y be two independent random variables (in R') having the same distribution. If this common distribution has mean zero and a finite third absolute moment, then EIX-YI 3 <4E1XI 3 .
(8.41)
Approximation of Characteristic Functions
67
Proof. Since IX
—
YI 3=(X—Y)'IX—YI <(X2-2XY+Y2)(IXI+IYI)
= IXI 3 + IYI 3 + XZIYI + YZIXI — 2XIXIY — 2YIYIX ,
(8.42)
and E(XIXIY)=(E(XIXI))EY=0=E(YIYIX), one has E IX — Yj 3 ' 2 E IX1 3 + 2(E X 2 )E IYI = 2 E IXI 3 + 2(E X 2 )E IXI < 4EIX1 3 .
(8.43)
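Since (8.41) holds for every zero-mean distribution with a finite third absolute moment, it can be verified exactly on a small discrete example (the support and probabilities below are arbitrary; the distribution is recentred to mean zero first).

```python
import itertools

# Check (8.41): E|X - Y|**3 <= 4 E|X|**3 for X, Y i.i.d. with mean zero.
dist = {-3.0: 0.2, -1.0: 0.2, 1.0: 0.3, 2.2: 0.3}
mean = sum(x * p for x, p in dist.items())
dist = {x - mean: p for x, p in dist.items()}          # recentre to mean zero

e_abs3 = sum(abs(x) ** 3 * p for x, p in dist.items())
e_diff3 = sum(abs(x - y) ** 3 * px * py
              for (x, px), (y, py) in itertools.product(dist.items(), repeat=2))
assert e_diff3 <= 4 * e_abs3 + 1e-12
```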
THEOREM 8.9. Let X₁,…,X_n be n independent random vectors in R^k having distributions G₁,…,G_n, respectively. Suppose that each X_j has zero mean and a finite third absolute moment. Also, assume that V = n^{−1} Σ_j Cov(X_j) is nonsingular. Then for every δ ∈ (0, ½),

    Π_{j=1}^n | Ĝ_j(Bt/n^{1/2}) | ≤ exp{ −δ ‖t‖² }    (8.44)

for all t satisfying

    ‖t‖ ≤ 3 (½ − δ) l_{3,n}^{−1}.    (8.45)

Here B = B′, B² = V^{−1}.

Proof. We may take V = I. In this case, for each j, 1 ≤ j ≤ n,

    | Ĝ_j(t/n^{1/2}) |² ≤ 1 − n^{−1} E⟨t,X_j⟩² + (2/3) n^{−3/2} E|⟨t,X_j⟩|³
      ≤ exp{ −n^{−1} E⟨t,X_j⟩² + (2/3) n^{−3/2} E|⟨t,X_j⟩|³ },    (8.46)

since |Ĝ_j(ut)|² is the characteristic function, evaluated at u, of the random variable ⟨t, X_j − Y_j⟩, where Y_j is a random vector independent of, and having the same distribution as, X_j; also, by Lemma 8.8,

    E⟨t, X_j − Y_j⟩² = 2 E⟨t,X_j⟩²,   E|⟨t, X_j − Y_j⟩|³ ≤ 4 E|⟨t,X_j⟩|³.    (8.47)

Multiplying both sides of (8.46) over j = 1,…,n, one gets

    Π_{j=1}^n | Ĝ_j(t/n^{1/2}) |² ≤ exp{ −‖t‖² + (2/3) l_{3,n} ‖t‖³ }
      ≤ exp{ −‖t‖² + (1 − 2δ) ‖t‖² } = exp{ −2δ ‖t‖² }    (8.48)

in the given range (8.45), and (8.44) follows on taking square roots. Q.E.D.
9. ASYMPTOTIC EXPANSIONS OF DERIVATIVES OF CHARACTERISTIC FUNCTIONS

We first state and prove some preliminary lemmas.

LEMMA 9.1. Let g be a complex-valued function on an open subset Ω of R^k having continuous derivatives D^ν g for |ν| ≤ m, where m is a positive integer. If g has no zero in Ω, then on Ω†

D^ν log g = Σ c({β^1, ..., β^p}) (D^{β^1} g) ··· (D^{β^p} g) g^{−p},

where the summation is over all collections of nonnegative integral vectors {β^1, ..., β^p} satisfying

β^1 + ··· + β^p = ν,   |β^j| ≥ 1   (1 ≤ j ≤ p),   1 ≤ p ≤ |ν|,

and the constant c({β^1, ..., β^p}) depends only on the collection {β^1, ..., β^p}.

Proof. The result is obviously true for |ν| = 1. Assume that it holds for all ν satisfying |ν| ≤ m', where m' is a positive integer. For any such ν, an immediate computation shows that the result then holds for all vectors ν + e_i, 1 ≤ i ≤ k, where e_i is the vector having one in the ith coordinate and zeros elsewhere. Thus the result holds for all ν satisfying |ν| ≤ m' + 1. Q.E.D.

LEMMA 9.2 (Cauchy's Estimate). Let f be a complex-valued function on B(a : R) = {z = (z_1, ..., z_k) ∈ C^k : Σ_i |z_i − a_i|^2 < R^2} [where a = (a_1, ..., a_k) ∈ C^k

†This is a local result and is valid whatever the branch of the logarithm chosen.
and R > 0] given by the power series [absolutely convergent on B(a : R)]

f(z) = Σ_ν c(ν)(z − a)^ν   [z − a = (z_1 − a_1, ..., z_k − a_k)],

where the summation is over all nonnegative integral vectors ν ∈ (Z^+)^k. Then

|(D^ν f)(a)| ≤ ν! r^{−|ν|} M_r

for every r ∈ (0, k^{−1/2} R], where M_r is defined by

M_r = sup {|f(z)| : |z_i − a_i| = r, 1 ≤ i ≤ k}.

Proof. Since the closed cube {z : |z_i − a_i| ≤ r, 1 ≤ i ≤ k} is contained in B(a : R) when r ≤ k^{−1/2} R, the series converges absolutely on it. Clearly, for all ν,

r^{|ν|} c(ν) = (2π)^{−k} ∫_{[0,2π]^k} f(a_1 + r e^{iθ_1}, ..., a_k + r e^{iθ_k}) exp{−i⟨ν, θ⟩} dθ_1 ··· dθ_k.

Noting that the integrand is bounded above by M_r, one has

|(D^ν f)(a)| = ν! |c(ν)| ≤ ν! r^{−|ν|} M_r. Q.E.D.
For x, y in R^k we define

x ≤ y if x_i ≤ y_i for 1 ≤ i ≤ k;   x < y if x_i < y_i for 1 ≤ i ≤ k.   (9.1)

LEMMA 9.3. Let f be a complex-valued function on an open subset Ω of R^k having continuous derivatives D^β f for all nonnegative integral vectors β ≤ ν, where ν is a given nonzero nonnegative integral vector. Then on Ω

D^ν(exp{f}) = exp{f} Σ* c({j_β : 0 < β ≤ ν}) ∏ (D^β f)^{j_β},   (9.2)

where the summation Σ* is over all collections of nonnegative integers {j_β : 0 < β ≤ ν} satisfying (here the β's are integral vectors)

Σ_{β : 0 < β ≤ ν} j_β β = ν,

and c({j_β : 0 < β ≤ ν}) depends only on the collection {j_β : 0 < β ≤ ν}. Also, ∏ denotes the product over all β, 0 < β ≤ ν.
Proof. Let e_1 = (1, 0, ..., 0), ..., e_k = (0, 0, ..., 0, 1). Then

D^{e_i}(exp{f}) = exp{f}(D^{e_i} f)   (1 ≤ i ≤ k).

Thus the assertion is true for all ν with |ν| = 1. Suppose that it is true for all ν with |ν| ≤ m. Given ν with |ν| = m, one has, for all i (1 ≤ i ≤ k),

D^{ν+e_i}(exp{f}) = exp{f} Σ* c({j_β : 0 < β ≤ ν}) Σ_{β' : j_{β'} > 0} j_{β'} (D^{β'} f)^{j_{β'}−1} (D^{β'+e_i} f) ∏_{β ≠ β'} (D^β f)^{j_β}
+ exp{f}(D^{e_i} f) Σ* c({j_β : 0 < β ≤ ν}) ∏ (D^β f)^{j_β},

which is again of the asserted form, with ν replaced by ν + e_i. Q.E.D.

LEMMA 9.4. Let G be the distribution of a random vector X having a finite sth absolute moment ρ_s = E‖X‖^s for some positive integer s. Then there exists a constant c_2(s) depending only on s such that

|(D^ν log Ĝ)(t)| ≤ c_2(s) ρ_s   (9.3)

for all t in R^k satisfying

|Ĝ(t) − 1| ≤ 1/2   (9.4)

and for all nonnegative integral vectors ν satisfying |ν| = s.
Proof. Note that log Ĝ is defined on an open set containing the set of t's satisfying (9.4). Also, if β^1, ..., β^p are nonnegative integral vectors satisfying β^1 + ··· + β^p = ν, |β^j| ≥ 1, then

|(D^{β^1} Ĝ)(t)| ··· |(D^{β^p} Ĝ)(t)| ≤ (E‖X‖^{|β^1|}) ··· (E‖X‖^{|β^p|}) = ρ_{|β^1|} ··· ρ_{|β^p|} ≤ ρ_s^{(|β^1| + ··· + |β^p|)/s} = ρ_s,   (9.5)

by (6.29) and Lemma 6.2(ii). The proof is now completed using Lemma 9.1, since |Ĝ(t)| ≥ 1/2 on the set defined by (9.4). Q.E.D.
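The step ρ_{|β^1|} ··· ρ_{|β^p|} ≤ ρ_s used above rests on the moment inequality ρ_r ≤ ρ_s^{r/s} for r ≤ s, a consequence of Jensen's inequality. A quick numerical check on an arbitrarily chosen discrete law:

```python
# Check of the Lyapunov-type moment inequality rho_r <= rho_s^(r/s),
# r <= s, for one small discrete distribution (chosen arbitrarily).
xs = [-1.5, 0.5, 2.0]
ps = [0.3, 0.5, 0.2]
s = 4
rho = {r: sum(p * abs(x) ** r for x, p in zip(xs, ps)) for r in range(1, s + 1)}

for r in range(1, s + 1):
    assert rho[r] <= rho[s] ** (r / s) + 1e-12
```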
For the remainder of this section we shall consider n independent random vectors X_1, ..., X_n with respective distributions G_1, ..., G_n. We assume that X_j has mean zero, covariance matrix V_j, and a finite sth absolute moment for some integer s ≥ 3. We write

ρ_{r,j} = E‖X_j‖^r,   ρ_r = n^{−1} Σ_{j=1}^{n} ρ_{r,j}   (r ≥ 0),

χ_{ν,j} = νth cumulant of X_j,   χ_ν = n^{−1} Σ_{j=1}^{n} χ_{ν,j}   [ν ∈ (Z^+)^k, 0 ≤ |ν| ≤ s],   (9.6)

V = n^{−1} Σ_{j=1}^{n} V_j,

unless otherwise specified. In case V is nonsingular, let B denote the symmetric positive-definite matrix satisfying

B^2 = V^{−1},   (9.7)

and write

η_r = n^{−1} Σ_{j=1}^{n} E‖BX_j‖^r   (r ≥ 0).   (9.8)
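A minimal computational sketch of (9.7): given a nonsingular average covariance V, the matrix B is the symmetric positive-definite square root of V^{−1}, computed below via the spectral decomposition (NumPy is an assumed dependency, and the matrix V is an arbitrary example).

```python
# B = B', B^2 = V^{-1}: compute B from the eigendecomposition of V^{-1}.
import numpy as np

V = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # an arbitrary s.p.d. example
w, U = np.linalg.eigh(np.linalg.inv(V))
B = U @ np.diag(np.sqrt(w)) @ U.T       # symmetric square root of V^{-1}

assert np.allclose(B, B.T)
assert np.allclose(B @ B, np.linalg.inv(V))
# B normalizes: Cov(BX) = B V B' = I for X with covariance V.
assert np.allclose(B @ V @ B.T, np.eye(2))
```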
The constants c's that appear below depend only on their arguments.

LEMMA 9.5. Assume V = I (so that ρ_2 = k). For every nonnegative integral vector α satisfying 0 ≤ |α| ≤ 3r, and for r = 0, 1, ..., s − 2,

|D^α P_r(z : {χ_ν})| ≤ c_5(r, α, k) ρ_s^{r/(s−2)} (‖z‖^{max(0, r+2−|α|)} + ‖z‖^{3r−|α|})   (z ∈ C^k).   (9.9)

Proof. By (7.3),

D^α P_r(z : {χ_ν}) = Σ_{m=1}^{r} (1/m!) Σ' Σ'' [χ_{ν^1} ··· χ_{ν^m} / (ν^1! ··· ν^m!)] [(ν^1 + ··· + ν^m)! / (ν^1 + ··· + ν^m − α)!] z^{ν^1 + ··· + ν^m − α},   (9.10)

where Σ' denotes summation over all m-tuples of positive integers (j_1, ..., j_m) satisfying

Σ_{i=1}^{m} j_i = r,   (9.11)

and Σ'' denotes summation over all m-tuples of nonnegative integral vectors (ν^1, ..., ν^m) satisfying (7.5). By Lemma 6.3,

|χ_{ν^1} ··· χ_{ν^m}| ≤ c_1(ν^1) ··· c_1(ν^m) ρ_{j_1+2} ··· ρ_{j_m+2}
= c_1(ν^1) ··· c_1(ν^m) ρ_2^{((j_1+2) + ··· + (j_m+2))/2} ∏_{i=1}^{m} (ρ_2^{−(j_i+2)/2} ρ_{j_i+2})
≤ c_1(ν^1) ··· c_1(ν^m) ρ_2^{(2m+r)/2} (ρ_2^{−s/2} ρ_s)^{(j_1 + ··· + j_m)/(s−2)}
= c_1(ν^1) ··· c_1(ν^m) ρ_2^{(2m+r)/2} (ρ_2^{−s/2} ρ_s)^{r/(s−2)}.   (9.12)

We have used Lemma 6.2(iii) and the fact that ρ_r is the rth absolute moment of n^{−1}(G_1 + ··· + G_n), where G_j is the distribution of X_j. On using (9.12) in (9.10) and recalling that ρ_2 = k, the desired inequality (9.9) is obtained if one notes that

‖z‖^m ≤ ‖z‖^{m'} + ‖z‖^{m''}   (0 ≤ m' ≤ m ≤ m'', z ∈ C^k).   (9.13)
Q.E.D.

LEMMA 9.6. Let g be an absolutely convergent power series in z ∈ C^k on B(0 : r) = {‖z‖ < r}, and suppose that

|g(z)| ≤ h(‖z‖)   [z ∈ B(0 : r)],   (9.14)

where h is a nondecreasing function on [0, ∞). Then for every c > 0,

|(D^ν g)(z)| ≤ ν! (c‖z‖)^{−|ν|} h((1 + k^{1/2} c)‖z‖)   (9.15)

for all nonnegative integral vectors ν and for all z satisfying

‖z‖ < r/(1 + k^{1/2} c)   (z ∈ C^k − {0}).   (9.16)

Proof. By Cauchy's estimate (Lemma 9.2),

|(D^ν g)(z)| ≤ ν! (c‖z‖)^{−|ν|} sup {|g(z')| : ‖z'‖ ≤ (1 + k^{1/2} c)‖z‖}
≤ ν! (c‖z‖)^{−|ν|} h((1 + k^{1/2} c)‖z‖)   (9.17)

for all z ≠ 0 satisfying (9.16), if one takes (in Lemma 9.2) a = z, r = c‖z‖, and notes that [for z satisfying (9.16)]

{z' : |z'_i − z_i| = c‖z‖, 1 ≤ i ≤ k} ⊂ {z' : ‖z'‖ ≤ (1 + k^{1/2} c)‖z‖} ⊂ B(0 : r).   (9.18)
Q.E.D.

LEMMA 9.7. Let s be an integer not smaller than 3, and let {χ_ν : ν ∈ (Z^+)^k, |ν| ≤ s} be real numbers. Define

β_s = (max {|χ_ν|^{1/(|ν|−2)} : 3 ≤ |ν| ≤ s})^{s−2},
c(s, k) = Σ_{r=1}^{s−3} Σ_{|ν|=r+2} 1/ν!.   (9.19)

Then for every u ∈ R^1 − {0} and all z satisfying

‖z‖ ≤ (11 c(s, k) |u| β_s^{1/(s−2)})^{−1},   (9.20)

one has, in the notation of Section 7,

|D^α [exp {Σ_{r=1}^{s−3} u^r χ̃_{r+2}(z)/(r+2)!} − Σ_{r=0}^{s−3} u^r P_r(z : {χ_ν})]|
≤ c'(s, k) |u|^{s−2} β_s (‖z‖^{s−|α|} + ‖z‖^{3(s−2)−|α|}) exp {(2/9)‖z‖^2},   (9.21)

where c'(s, k) depends only on s and k, and D^α is the αth derivative with respect to z = (z_1, ..., z_k), α being any nonnegative integral vector satisfying |α| ≤ s.
Proof. Write [see (7.1)]

g(u : z) = Σ_{r=1}^{s−3} u^r χ̃_{r+2}(z)/(r+2)! = Σ_{r=1}^{s−3} u^r Σ_{|ν|=r+2} χ_ν z^ν/ν!,
f(u : z) = exp {g(u : z)},   (9.22)
h(u : z) = f(u : z) − Σ_{r=0}^{s−3} u^r P_r(z : {χ_ν})   (u ∈ R^1, z ∈ C^k).

By definition of the polynomials P_r,

(d^m/du^m) h(u : z)|_{u=0} = 0   for m = 0, 1, ..., s−3,
(d^{s−2}/du^{s−2}) h(u : z) = (d^{s−2}/du^{s−2}) f(u : z),

when h and f are regarded as functions of the first argument u only. By Corollary 8.3,

|h(u : z)| ≤ (|u|^{s−2}/(s−2)!) sup {|(d^{s−2}/du^{s−2}) f(a : z)| : 0 ≤ |a| ≤ |u|}.   (9.23)

By Lemma 9.3, (d^{s−2}/du^{s−2}) f(u : z) is a linear combination of terms

f(u : z) ((d/du) g(u : z))^{j_1} ··· ((d^{s−2}/du^{s−2}) g(u : z))^{j_{s−2}},   (9.24)

where j_1, ..., j_{s−2} are nonnegative integers satisfying

Σ_{m=1}^{s−2} m j_m = s − 2.   (9.25)
If z satisfies

‖z‖ ≤ (8 c(s, k) |u| β_s^{1/(s−2)})^{−1},   (9.26)

then, since [by (9.19)]

|χ_ν| ≤ β_s^{(|ν|−2)/(s−2)},   (9.27)

one has

|(d^m/du^m) g(u : z)| = |Σ_{r=m}^{s−3} r(r−1) ··· (r−m+1) u^{r−m} χ̃_{r+2}(z)/(r+2)!|
≤ Σ_{r=m}^{s−3} c_1(r, m, k) β_s^{r/(s−2)} |u|^{r−m} ‖z‖^{r+2}
= Σ_{r=m}^{s−3} c_1(r, m, k) (β_s^{1/(s−2)} |u| ‖z‖)^{r−m} β_s^{m/(s−2)} ‖z‖^{m+2}
≤ c_2(s, m, k) β_s^{m/(s−2)} ‖z‖^{m+2}   (1 ≤ m ≤ s−2),   (9.28)

so that

|((d/du) g(u : z))^{j_1} ··· ((d^{s−2}/du^{s−2}) g(u : z))^{j_{s−2}}| ≤ c_3(s, k) β_s (‖z‖^s + ‖z‖^{3(s−2)}).   (9.29)

Also, if z satisfies (9.26), then

|g(u : z)| ≤ Σ_{r=1}^{s−3} Σ_{|ν|=r+2} (1/ν!) β_s^{r/(s−2)} ‖z‖^{r+2} |u|^r
= ‖z‖^2 Σ_{r=1}^{s−3} Σ_{|ν|=r+2} (1/ν!) (|u| ‖z‖ β_s^{1/(s−2)})^r
≤ ‖z‖^2 Σ_{r=1}^{s−3} Σ_{|ν|=r+2} (1/ν!) (8 c(s, k))^{−r}
≤ ‖z‖^2/8,

so that

|f(u : z)| ≤ exp {(1/8)‖z‖^2}.   (9.30)
Hence, by (9.23), (9.24), (9.29), and (9.30),

|h(u : z)| ≤ c_4(s, k) |u|^{s−2} β_s (‖z‖^s + ‖z‖^{3(s−2)}) exp {(1/8)‖z‖^2}   (9.31)

for all u ≠ 0 and all z satisfying (9.26). One can now use Lemma 9.6, with r given by the right side of (9.26) and c = (3k^{1/2})^{−1}, to obtain

|D^α h(u : z)| ≤ α! (3k^{1/2})^{|α|} ‖z‖^{−|α|} c_4(s, k) |u|^{s−2} β_s [((4/3)‖z‖)^s + ((4/3)‖z‖)^{3(s−2)}] exp {(2/9)‖z‖^2}   (9.32)
for all z satisfying (9.20), since the right side of (9.20) is smaller than r/(1 + k^{1/2} c) = (3/4)r. Q.E.D.

If V = I, then by Lemmas 6.3 and 6.2(iii) one has (since ρ_2 = k)

|χ_ν|^{1/(|ν|−2)} ≤ (c_1(ν) ρ_{|ν|})^{1/(|ν|−2)} ≤ c_1'(ν, k) ρ_s^{1/(s−2)}   (3 ≤ |ν| ≤ s).   (9.33)

Taking these χ's and u = n^{−1/2} in Lemma 9.7, one has

LEMMA 9.8. Let s be an integer not smaller than 3. Define

c_6(s, k) = (11 c(s, k) max {c_1'(ν, k) : 3 ≤ |ν| ≤ s})^{−1}.   (9.34)

There exists a constant c_7(s, k) such that for all t in R^k satisfying

‖t‖ ≤ c_6(s, k) n^{1/2} ρ_s^{−1/(s−2)},   (9.35)

one has, for every nonnegative integral vector α, 0 ≤ |α| ≤ s,

|D^α [exp {−‖t‖^2/2 + Σ_{r=1}^{s−3} n^{−r/2} χ̃_{r+2}(it)/(r+2)!} − exp {−‖t‖^2/2} Σ_{r=0}^{s−3} n^{−r/2} P_r(it : {χ_ν})]|
≤ c_7(s, k) ρ_s n^{−(s−2)/2} (‖t‖^{s−|α|} + ‖t‖^{3(s−2)+|α|}) exp {−‖t‖^2/4}.
Proof. The assertion follows from Lemma 9.7, inequality (9.33), and the following observations:

(i) D^α (exp {−‖t‖^2/2} h(n^{−1/2} : it)) may be expressed as a linear combination of terms of the form

(D^β h(n^{−1/2} : it)) (D^{α−β} exp {−‖t‖^2/2})   (0 ≤ β ≤ α);

(ii) also

|D^{α−β} exp {−‖t‖^2/2}| ≤ c'(α − β, k)(1 + ‖t‖^{|α−β|}) exp {−‖t‖^2/2}.

Here h is the function defined in (9.22). Q.E.D.

We are now ready to prove the main theorem of this section. Before stating it, we define

d_n = sup {a > 0 : ‖t‖ ≤ a implies |Ĝ_j(t n^{−1/2}) − 1| ≤ 1/2, 1 ≤ j ≤ n}.
(9.36)

THEOREM 9.9. There exist constants c_8(s, k), c_9(s, k) such that for all t in R^k satisfying

‖t‖ ≤ d_n,   ‖t‖ ≤ c_8(s, k) n^{1/2} η_s^{−1/(s−2)},   (9.37)

one has, for all nonnegative integral vectors α, 0 ≤ |α| ≤ s,

|D^α [∏_{j=1}^{n} Ĝ_j(Bt/n^{1/2}) − exp {−‖t‖^2/2} Σ_{r=0}^{s−3} n^{−r/2} P_r(iBt : {χ_ν})]|
≤ c_9(s, k) η_s n^{−(s−2)/2} [‖t‖^{s−|α|} + ‖t‖^{3(s−2)+|α|}] exp {−‖t‖^2/4}.   (9.38)
Proof. First assume that V = B = I. In the given region (9.37) the logarithm of Ĝ_j(t/n^{1/2}) is defined. Write

h_j(t) = log Ĝ_j(t/n^{1/2}) + (2n)^{−1}‖t‖^2 − Σ_{r=1}^{s−3} n^{−(r+2)/2} χ̃_{r+2,j}(it)/(r+2)!,

h(t) = Σ_{j=1}^{n} h_j(t),

ψ(t) = −(1/2)‖t‖^2 + Σ_{r=1}^{s−3} n^{−r/2} χ̃_{r+2}(it)/(r+2)!,

so that

h(t) = Σ_{j=1}^{n} log Ĝ_j(t/n^{1/2}) − ψ(t).   (9.39)
We want to estimate

|D^α [∏_{j=1}^{n} Ĝ_j(t/n^{1/2}) − exp {ψ(t)}]| = |D^α [(exp {h(t)} − 1) exp {ψ(t)}]|
= |Σ_{0 ≤ β ≤ α} c_{10}(α, β)(D^β exp {ψ(t)})(D^{α−β}(exp {h(t)} − 1))|.   (9.40)

By relation (9.30),

|ψ(t)| ≤ (1/2)‖t‖^2 + (1/8)‖t‖^2 = (5/8)‖t‖^2   (9.41)

if t satisfies (9.37) [see (9.27)]. Also, using (9.33), one has for 0 < |β| ≤ s and t ≠ 0 in the range (9.37),

|(D^β ψ)(t)| ≤ ‖t‖^{2−|β|} + |D^β Σ_{r=1}^{s−3} n^{−r/2} Σ_{|ν|=r+2} χ_ν (it)^ν/ν!|
≤ ‖t‖^{2−|β|} + Σ_{r=max(1, |β|−2)}^{s−3} n^{−r/2} Σ_{|ν|=r+2, ν ≥ β} c_{11}(ν, k) ρ_s^{r/(s−2)} ‖t‖^{r+2−|β|}
≤ ‖t‖^{2−|β|} [1 + Σ_{r=1}^{s−3} c_{11}'(r, k)(n^{−1/2} ρ_s^{1/(s−2)} ‖t‖)^r]
≤ c_{12}(s, k) ‖t‖^{2−|β|},   (9.42)

since n^{−1/2} ρ_s^{1/(s−2)} ‖t‖ ≤ c_8(s, k) in this range. Let j_1, ..., j_l be nonnegative integers and β^1, ..., β^l nonnegative integral vectors such that

Σ_{i=1}^{l} j_i β^i = β,   β^i > 0   (1 ≤ i ≤ l).
By (9.42) one has [using (9.13)]

|(D^{β^1} ψ(t))^{j_1} ··· (D^{β^l} ψ(t))^{j_l}| ≤ c_{13}(s, k) ‖t‖^{Σ_i j_i(2−|β^i|)}   (t ≠ 0).   (9.43)

It now follows from Lemma 9.3 and inequality (9.30) that

|D^β exp {ψ(t)}| ≤ c_{14}(s, k)(‖t‖^{2−|β|} + ‖t‖^{|β|}) |exp {ψ(t)}|
≤ c_{14}(s, k)(‖t‖^{2−|β|} + ‖t‖^{|β|}) exp {−(1/2)‖t‖^2 + (1/8)‖t‖^2}
= c_{14}(s, k)(‖t‖^{2−|β|} + ‖t‖^{|β|}) exp {−(3/8)‖t‖^2}   (t ≠ 0)   (9.44)

if t satisfies (9.37). Next note that for any nonnegative integral vector β satisfying 0 ≤ |β| ≤ s,

D^{β'}(D^β h_j)(0) = 0   [0 ≤ |β'| ≤ s − 1 − |β|].

Hence, applying Corollary 8.3 to g ≡ D^β h_j, one gets

|D^β h_j(t)| ≤ Σ_{|β'| = s−|β|} (|t^{β'}|/β'!) sup {|(D^{β'} g)(ut)| : 0 ≤ u ≤ 1}.   (9.45)

But if |β'| = s − |β|, then by Lemma 9.4 [note that (9.4) holds for ‖t‖ ≤ d_n],

|(D^{β'} g)(ut)| = |(D^{β'+β} h_j)(ut)| = n^{−s/2} |(D^{β'+β} log Ĝ_j)(ut n^{−1/2})| ≤ n^{−s/2} c_2(s) ρ_{s,j},

so that summing over j = 1, ..., n in (9.45) one has

|D^β h(t)| ≤ c_2(s) n^{−(s−2)/2} ρ_s Σ_{|β'| = s−|β|} |t^{β'}|/β'! ≤ c_{15}(s, k) n^{−(s−2)/2} ρ_s ‖t‖^{s−|β|}.   (9.46)

In particular, taking β = 0 in (9.46) yields

|h(t)| ≤ c_{15}(s, k) n^{−(s−2)/2} ρ_s ‖t‖^s ≤ (1/8)‖t‖^2   (9.47)
for an appropriate choice of c_8(s, k) [use n^{−(s−2)/2} ρ_s ‖t‖^{s−2} ≤ c_8^{s−2}(s, k) in the range (9.37)]. If α = β, then [using both inequalities in (9.47)]

|D^{α−β}(exp {h(t)} − 1)| = |exp {h(t)} − 1| ≤ |h(t)| exp {|h(t)|}
≤ c_{15}(s, k) n^{−(s−2)/2} ρ_s ‖t‖^s exp {(1/8)‖t‖^2}.   (9.48)

If α > β, then

D^{α−β}(exp {h(t)} − 1) = D^{α−β}(exp {h(t)}),   (9.49)

which is a linear combination of terms of the form

(D^{β^1} h(t))^{j_1} ··· (D^{β^l} h(t))^{j_l} exp {h(t)},

where Σ_i j_i β^i = α − β. By (9.46) [and (9.13)],

|(D^{β^1} h(t))^{j_1} ··· (D^{β^l} h(t))^{j_l}| ≤ c_{16}(s, k)(n^{−(s−2)/2} ρ_s)^{Σ j_i} ‖t‖^{s Σ j_i − |α−β|}
≤ c_{17}(s, k) n^{−(s−2)/2} ρ_s (‖t‖^{s−|α−β|} + ‖t‖^{s+|α−β|−2}),   (9.50)

the factor (n^{−(s−2)/2} ρ_s ‖t‖^{s−2})^{Σ j_i − 1} being bounded by a constant in the range (9.37). Hence if α > β, then

|D^{α−β}(exp {h(t)} − 1)| ≤ c_{18}(s, k) n^{−(s−2)/2} ρ_s (‖t‖^{s−|α−β|} + ‖t‖^{s+|α−β|−2}) exp {(1/8)‖t‖^2}.   (9.51)

Using (9.44), (9.48), and (9.51) in (9.40), one obtains

|D^α [∏_{j=1}^{n} Ĝ_j(t/n^{1/2}) − exp {−‖t‖^2/2 + Σ_{r=1}^{s−3} n^{−r/2} χ̃_{r+2}(it)/(r+2)!}]|
≤ c_{19}(s, k) ρ_s n^{−(s−2)/2} (‖t‖^{s−|α|} + ‖t‖^{s+|α|+2}) exp {−(1/4)‖t‖^2}.   (9.52)
Now use Lemma 9.8 to complete the proof when V = B = I. If V ≠ I, look at the random vectors BX_1, ..., BX_n and observe that

∏_{j=1}^{n} Ĝ_j(Bt/n^{1/2})

is the characteristic function of Z_n = n^{−1/2} Y_n, where Y_n = B(X_1 + ··· + X_n). Also, if the (r+2)th cumulant of the random variable ⟨t, X_j⟩ is denoted by χ̃_{r+2,j}(t), then the corresponding cumulant of ⟨t, BX_j⟩ = ⟨Bt, X_j⟩ is χ̃_{r+2,j}(Bt). Q.E.D.

The following theorems are easy consequences of Theorem 9.9.

THEOREM 9.10. Let G be a probability measure on R^k with zero mean, positive-definite covariance matrix V, and finite sth absolute moment for some integer s not smaller than 3. Then there exist two positive constants c_{20}(s, k), c_{21}(s, k) such that for all t in R^k satisfying

‖t‖ ≤ c_{20}(s, k) n^{1/2} η_s^{−1/(s−2)},

one has, for all nonnegative integral vectors α, 0 ≤ |α| ≤ s,

|D^α [Ĝ^n(Bt/n^{1/2}) − exp {−‖t‖^2/2} Σ_{r=0}^{s−3} n^{−r/2} P_r(iBt : {χ_ν})]|
≤ c_{21}(s, k) η_s n^{−(s−2)/2} [‖t‖^{s−|α|} + ‖t‖^{3(s−2)+|α|}] exp {−‖t‖^2/4}.

Here B is the symmetric positive-definite matrix satisfying (9.7),

η_s = ∫ ‖Bx‖^s G(dx),

and χ_ν is the νth cumulant of G.

Proof. Note that if t satisfies ‖t‖ ≤ n^{1/2} η_s^{−1/(s−2)}, then

|Ĝ(Bt/n^{1/2}) − 1| ≤ ⟨Bt, VBt⟩/(2n) = ‖t‖^2/(2n) ≤ (1/2) η_s^{−2/(s−2)} ≤ 1/2,

because of the relations

∫ ‖Bx‖^2 G(dx) = k ≤ [∫ ‖Bx‖^s G(dx)]^{2/s},

so that η_s ≥ k^{s/2} ≥ 1; hence d_n ≥ n^{1/2} η_s^{−1/(s−2)}, and Theorem 9.9 applies. Q.E.D.
THEOREM 9.11. Let X_1, ..., X_n be n independent random vectors in R^k with zero means, covariance matrices V_1, ..., V_n (at least one of which is nonsingular), and finite sth absolute moments for some integer s not smaller than 3. Then there exist two positive constants c_{22}(s, k), c_{23}(s, k) such that for all t in R^k satisfying

‖t‖ ≤ c_{22}(s, k)(n^{1/2} η_s^{−1/(s−2)})^{(s−2)/s},

one has, for all α, 0 ≤ |α| ≤ s,

|D^α [∏_{j=1}^{n} E(exp {i⟨Bt, X_j⟩/n^{1/2}}) − exp {−‖t‖^2/2} Σ_{r=0}^{s−3} n^{−r/2} P_r(iBt : {χ_ν})]|
≤ c_{23}(s, k) η_s n^{−(s−2)/2} [‖t‖^{s−|α|} + ‖t‖^{3(s−2)+|α|}] exp {−‖t‖^2/4},

where the notation is as in Theorem 9.9.

Proof. As in the proof of Theorem 9.9 (see the concluding observations), it is enough to prove the theorem for B = V = I. In this case, for all t satisfying

‖t‖ ≤ (n^{1/2} η_s^{−1/(s−2)})^{(s−2)/s} = n^{1/2}(n η_s)^{−1/s},

one has

|Ĝ_j(t/n^{1/2}) − 1| ≤ ‖t‖^2 E‖X_j‖^2/(2n) ≤ ‖t‖^2 (E‖X_j‖^s)^{2/s}/(2n) ≤ ‖t‖^2 (n η_s)^{2/s}/(2n) ≤ 1/2,

so that d_n is at least as large as the right side of the displayed range, and Theorem 9.9 applies. Q.E.D.
Before concluding this section, we state another theorem that may be easily proved along the lines of Theorem 9.9, by taking one more term in the Taylor expansion.

THEOREM 9.12. Under the hypothesis of Theorem 9.10,

|D^α [Ĝ^n(Bt/n^{1/2}) − exp {−‖t‖^2/2} Σ_{r=0}^{s−2} n^{−r/2} P_r(iBt : {χ_ν})]|
≤ δ(n) η_s n^{−(s−2)/2} [‖t‖^{s−|α|} + ‖t‖^{3(s−2)+|α|}] exp {−‖t‖^2/4},   (9.53)

where δ(n) → 0 as n → ∞. In fact, (9.53) holds even for s ≥ 2, |α| ≤ s.

Remark. Under the hypothesis of Theorem 9.11, one may also prove that if

‖t‖ ≤ c_{22}(s, k) Λ^{−1/2}(n^{1/2} η_s^{−1/(s−2)})^{(s−2)/s},   (9.54)

where Λ is the largest eigenvalue of V, then

|D^α [∏_{j=1}^{n} E(exp {i⟨t, X_j⟩/n^{1/2}}) − exp {−⟨t, Vt⟩/2} Σ_{r=0}^{s−3} n^{−r/2} P_r(it : {χ_ν})]|
≤ c(s, k) Λ^{|α|/2} η_s n^{−(s−2)/2} [⟨t, Vt⟩^{(s−|α|)/2} + ⟨t, Vt⟩^{(3(s−2)+|α|)/2}] exp {−⟨t, Vt⟩/4}.   (9.55)

For α = 0 this follows by replacing Bt by t in Theorem 9.11. The general case follows by induction on |α|. Note that the derivative D^α is with respect to t. Completely analogous modifications hold in the statements of Theorems 9.10 and 9.12.
10. A CLASS OF KERNELS

For a > 0 let U_{[−a,a]} denote the probability measure on R^1 with density

u_{[−a,a]}(x) = 1/(2a)   for −a ≤ x ≤ a,
             = 0        for |x| > a.   (10.1)

The measure U_{[−a,a]} is called the uniform distribution on [−a, a]. One has

Û_{[−a,a]}(t) = (1/2a) ∫_{−a}^{a} cos tx dx = (sin at)/(at)   (t ∈ R^1).   (10.2)
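A quick numerical confirmation of (10.2), approximating the integral by a midpoint Riemann sum (the values of a and t are arbitrary):

```python
# Check (10.2): (1/2a) * integral_{-a}^{a} cos(tx) dx = sin(at)/(at).
import math

a, t, N = 2.0, 3.0, 200000
h = 2 * a / N
approx = sum(math.cos(t * (-a + (i + 0.5) * h)) for i in range(N)) * h / (2 * a)
exact = math.sin(a * t) / (a * t)
assert abs(approx - exact) < 1e-6
```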
The probability measure

T_a = U_{[−a/2,a/2]} * U_{[−a/2,a/2]}   (10.3)

is called the triangular distribution on [−a, a]. It is easy to show that its density is

t_a(x) = (1/a)(1 − |x|/a)   for |x| ≤ a,
       = 0                 for |x| > a,   (10.4)

and that

T̂_a(t) = ((sin (at/2))/(at/2))^2   (t ∈ R^1).   (10.5)

One can write

c(m) = (∫_{R^1} |sin x/x|^m dx)^{−1}   (m = 2, 3, ...).   (10.6)

For a > 0 and an integer m ≥ 2, let G_{a,m} denote the probability measure on R^1 with density

g_{a,m}(x) = a c(m) |sin ax/(ax)|^m   (x ∈ R^1).   (10.7)
It follows from (10.2) that for even integers m ≥ 2

((sin at)/(at))^m = (Û_{[−a,a]}(t))^m = (U_{[−a,a]}^{*m})^(t)   (t ∈ R^1),   (10.8)

so that by the Fourier inversion theorem [Theorem 4.1(iv)]

Ĝ_{a,m}(t) = 2π a c(m) u_{[−a,a]}^{*m}(t)   (t ∈ R^1),
          = 0   if |t| > ma,   (10.9)

where u_{[−a,a]}^{*m} denotes the density of the m-fold convolution U_{[−a,a]}^{*m}, which vanishes outside [−ma, ma].
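The triangular case m = 2 of the convolution identity can be checked directly: integrating the density (10.4) against cos(tx) reproduces (10.5), as it must since T_a is the convolution square of a uniform law.

```python
# Check (10.5): the c.f. of the triangular density (1/a)(1 - |x|/a) on
# [-a, a] is (sin(at/2)/(at/2))^2.  Midpoint Riemann sum, arbitrary a, t.
import math

a, t, N = 1.5, 2.0, 200000
h = 2 * a / N
approx = sum((1 / a) * (1 - abs(-a + (i + 0.5) * h) / a)
             * math.cos(t * (-a + (i + 0.5) * h)) for i in range(N)) * h
exact = (math.sin(a * t / 2) / (a * t / 2)) ** 2
assert abs(approx - exact) < 1e-6
```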
Let Z_1, ..., Z_k be independent random variables, each with distribution G_{1/(r+2), r+2}, where r is a given positive integer, and let Z = (Z_1, ..., Z_k). Then for each d > 0

Prob(‖Z‖ > d) ≤ Σ_{i=1}^{k} Prob(|Z_i| > d k^{−1/2}).   (10.10)

Thus for any α in (0, 1), there exists a constant d depending only on k, r, and α such that

Prob(‖Z‖ > d) ≤ 1 − α.   (10.11)

Note that the characteristic function of Z vanishes outside [−1, 1]^k. Now let K_1 denote the distribution of Z/d; then

K_1({x : ‖x‖ > 1}) ≤ 1 − α,   K̂_1(t) = 0 if t ∉ [−d, d]^k.   (10.12)

One thus has

THEOREM 10.1. Let r be any given positive integer and let α ∈ (0, 1). There exists a probability measure K_1 on R^k such that

(i) K_1({x : ‖x‖ > 1}) ≤ 1 − α;
(ii) for all nonnegative integral vectors ν with |ν| ≤ r one has

∫ |x^ν| K_1(dx) < ∞;

(iii) K̂_1(t) = 0 for t ∉ [−d, d]^k, where d is a constant depending only on k, r, and α.

We also need as kernels certain probability measures having compact support and fast-decreasing characteristic functions. To this end we prove

THEOREM 10.2. Let u be a real-valued, nonnegative, nonincreasing function on [1, ∞) such that
∫_{1}^{∞} (u(t)/t) dt < ∞.   (10.13)

Then for every l > 0 there exists a probability measure K on R^1 satisfying

(i) support of K ⊂ [−l, l];
(ii) |K̂(t)| = O(exp {−|t| u(|t|)})   (|t| → ∞).
Proof. Define a nonincreasing sequence of nonnegative numbers {a_r : r ≥ 1} by

a_r = e u(r_0)/r_0   for r ≤ r_0,
a_r = e u(r)/r      for r > r_0,

where r_0 is a positive integer chosen to satisfy

Σ_{r=1}^{∞} a_r = e u(r_0) + e Σ_{r=r_0+1}^{∞} u(r)/r ≤ l.

Let the probability measure K be defined as the infinite convolution

K = U_{[−a_1,a_1]} * U_{[−a_2,a_2]} * ···.

Clearly, the support of K is contained in [−Σ_r a_r, Σ_r a_r] ⊂ [−l, l]. Also,

K̂(t) = ∏_{r=1}^{∞} (sin a_r t)/(a_r t)   (t ∈ R^1).

For every positive integer s ≥ r_0, therefore,

|K̂(t)| ≤ ∏_{r=1}^{s} (a_r |t|)^{−1} ≤ (a_s |t|)^{−s} = (s/(e u(s)|t|))^{s}.

Now choose s = [|t| u(|t|)], the integer part of |t| u(|t|). We assume without loss of generality that

|t| u(|t|) → ∞   as |t| → ∞.

[For otherwise one may replace u(t) by u_1(t) = u(t) + (t+1)^{−1/2} and note that (10.13) holds for u if and only if it holds for u_1, and that (ii) holds for u if it holds for u_1 (since u_1 ≥ u).] Thus for sufficiently large |t| one has r_0 ≤ s ≤ |t|, so that (since u is nonincreasing)

|K̂(t)| ≤ (s/(e u(s)|t|))^{s} ≤ (|t| u(|t|)/(e u(|t|)|t|))^{[|t|u(|t|)]} = e^{−[|t|u(|t|)]} ≤ e · e^{−|t|u(|t|)}.

Q.E.D.
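A finite-stage illustration of this construction: with u(t) = t^{−1/2} and the choice of a_r from the proof (r_0 = 5 and 400 factors are arbitrary truncation parameters assumed here), the partial products of sin(a_r t)/(a_r t) already decay rapidly, while the support of the partial convolution stays inside [−Σ a_r, Σ a_r]. The only bound asserted is the elementary |sin x / x| ≤ min(1, 1/|x|).

```python
# Finite truncation of the Ingham-type infinite convolution of uniform
# distributions; parameters are illustrative assumptions.
import math

r0, R = 5, 400
u = lambda r: r ** -0.5
a = [math.e * u(r0) / r0] * r0 + [math.e * u(r) / r for r in range(r0 + 1, R + 1)]

def K_hat(t):
    # c.f. of the convolution of R uniform laws U[-a_r, a_r]
    return math.prod(math.sin(ar * t) / (ar * t) for ar in a)

total_support = sum(a)   # support radius of the partial convolution
for t in [50.0, 200.0, 1000.0]:
    bound = math.prod(min(1.0, 1.0 / (ar * t)) for ar in a)
    assert abs(K_hat(t)) <= bound + 1e-15
```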
Remark. The above result is due to Ingham [1], who has also shown that the theorem provides the best possible rate of decay for the characteristic function of a probability measure with compact support. To be precise, he has shown that if u is a real-valued, nonnegative, nonincreasing function on [1, ∞) such that ∫_1^∞ [u(t)/t] dt = ∞, then there does not exist any probability measure with compact support whose characteristic function is O(exp(−|t|u(|t|))) as |t| → ∞. Note that this also follows from the following result of Paley and Wiener*: if f ∈ L^2(R^1) vanishes outside a compact interval and does not vanish identically, then

∫_{−∞}^{∞} |log |f̂(t)|| / (1 + t^2) dt < ∞.   (10.14)
THEOREM 10.3. Given a nonnegative, nonincreasing function u on [1, ∞) satisfying (10.13), and a nonnegative integer s, there exists a probability measure K on R^k with support contained in the unit ball S̄ = {x : ‖x‖ ≤ 1} such that

|D^α K̂(t)| ≤ c(s, k, u) exp {−Σ_{i=1}^{k} |t_i| u(|t_i|)}   (10.15)

for all nonnegative integral vectors α = (α_1, ..., α_k) satisfying 0 ≤ |α| ≤ s.

Proof. By Theorem 10.2 there exists a probability measure M on R^1 whose support is contained in [−k^{−1/2}(s+1)^{−1}, k^{−1/2}(s+1)^{−1}] and whose characteristic function satisfies |M̂(t)| = O(exp {−|t| u(|t|)}). Define M_1 = M^{*(s+1)}. Then

M_1({x : x ∈ R^1, |x| ≤ k^{−1/2}}) = 1,   M̂_1(t) = (M̂(t))^{s+1}   (t ∈ R^1),

*See Paley, R. E. A. C., and Wiener, N. [1], Theorem XII, pp. 16–17.

so that for 0 ≤ r ≤ s,

|(d^r/dt^r) M̂_1(t)| ≤ (s+1)^r E|X|^r |M̂(t)|^{s+1−r}
≤ (s+1)^r k^{−r/2}(s+1)^{−r} |M̂(t)|
≤ c_2(s, k, u) exp {−|t| u(|t|)},

where X denotes a random variable with distribution M. Now define K on R^k as the product measure M_1 × M_1 × ··· × M_1. Then

K({x : x ∈ R^k, ‖x‖ ≤ 1}) ≥ K({x : x = (x_1, ..., x_k), |x_i| ≤ k^{−1/2}, 1 ≤ i ≤ k}) = 1,

and

|D^α K̂(t)| = |(d^{α_1}/dt_1^{α_1}) M̂_1(t_1)| ··· |(d^{α_k}/dt_k^{α_k}) M̂_1(t_k)| ≤ c(s, k, u) exp {−Σ_{i=1}^{k} |t_i| u(|t_i|)}   (t = (t_1, ..., t_k) ∈ R^k),

for 0 ≤ |α| ≤ s. Q.E.D.

COROLLARY 10.4. For any positive integer s, there exists a probability measure K on R^k satisfying

(i) K({x : ‖x‖ ≤ 1}) = 1;
(ii) for all nonnegative integral vectors α, 0 ≤ |α| ≤ s,

|D^α K̂(t)| ≤ c(s, k) exp {−‖t‖^{1/2}}   (t ∈ R^k).

Proof. In Theorem 10.3 take u(t) = t^{−1/2} on [1, ∞) and note that Σ_{i=1}^{k} |t_i|^{1/2} ≥ ‖t‖^{1/2} for all t = (t_1, ..., t_k) ∈ R^k. Q.E.D.

NOTES

Sections 4 and 5. The material on the Fourier transform and the Fourier–Stieltjes transform reviewed here is fairly standard and may be found in Cramér [3, 4], Chung [1], Feller [3], Katznelson [1], and Stein and Weiss [1].
Section 6. The best reference for the inequalities of this section is Hardy, Littlewood, and Pólya [1].

Sections 7–9. The idea of expanding the distribution function F_n of the normalized sum of n independent random variables as Σ_r n^{−r/2} P_r(−Φ : {χ_ν}) appears for the first time in Chebyshev [1]; later it was investigated independently by Edgeworth [1]. However, the asymptotic expansion of the characteristic function of F_n was obtained by Cramér [1, 3] (Chapter VII), who used it to give the first rigorous derivation of the asymptotic expansion of F_n under the so-called Cramér condition (20.1) (see Chapter IV). Theorems 8.4–8.6 are refinements (and extensions to R^k) of results of Cramér [3] (Chapter VII); analogs of Theorems 8.7 and 8.9 for k = 1 were obtained by Liapounov [2]. There are many such refinements available in the literature, for example, Esseen [1], Gnedenko and Kolmogorov [1], and Petrov [1] in one dimension; Rao [1], Bikjalis [3, 6], and Bhattacharya [3] in multidimension. Theorems 9.9–9.12 are generalizations of analogous results of Bikjalis [6].

Section 10. Kernels such as K_1 (Theorem 10.1) were used in one dimension by Berry [1] and Esseen [1]. Theorem 10.2 is due to Ingham [1].
CHAPTER 3
Bounds for Errors of Normal Approximation
Our goal in the present chapter is to estimate |∫f dQ_n − ∫f dΦ| for a large class of functions f on R^k; here Q_n is the distribution of the normalized sum of n independent random vectors and Φ is the standard normal distribution on R^k. For bounded f, the error bound is computed in terms of the average modulus of oscillation of f with respect to Φ, or the supremum of this average over all translates of f. This is entirely appropriate in view of the characterization of Φ-uniformity classes proved in Chapter 1; also, in most cases of practical importance, these moduli can be estimated efficiently. The main tools used for obtaining these error bounds are expansions of the characteristic function Q̂_n and its derivatives as derived in Chapter 2, together with some smoothing inequalities proved in Section 11. In Section 12 the classical Berry–Esseen bound is obtained with an estimation of the constant involved. Section 13 is devoted to estimations of |∫f dQ_n − ∫f dΦ| for bounded f under the simplifying assumption that fourth moments are finite; the proofs here are not only simpler than those of the general results of Sections 15–17, but also yield numerical values for the constants involved in the bounds. In Section 14 we obtain truncation estimates that enable us to derive the main results of Section 15 on rates of convergence for unbounded f's under the assumption of finiteness of third moments. After a short section showing how to deal with different normalizations of the sum of n independent random vectors, in Section 17 we consider a number of important applications of the theorems of Section 15. A final section deals with rates of convergence under the sole assumption of finiteness of second moments. To facilitate comprehension, we briefly sketch the main ideas underlying the rather long route leading to the main results. For the sake of simplicity,
assume that {X_n : n ≥ 1} is a sequence of independent and identically distributed (i.i.d.) random vectors with common distribution Q_1. Suppose that Q_1 has mean zero, covariance I (identity matrix), and a finite third absolute moment ρ_3. Let Q_n denote the distribution of n^{−1/2}(X_1 + ··· + X_n). If Q_1 has an integrable characteristic function (c.f.) Q̂_1, then the c.f. Q̂_n of Q_n is integrable for all n. One can then use Fourier inversion to estimate the density h_n of the signed measure Q_n − Φ as

‖h_n‖_∞ ≡ sup_{y ∈ R^k} |h_n(y)| ≤ (2π)^{−k} ‖Q̂_n − Φ̂‖_1.   (1)

The fairly precise estimates of Q̂_n − Φ̂ of Chapter 2 (e.g., Theorem 8.4) yield ‖h_n‖_∞ = O(n^{−1/2}). Since such a uniform estimate of h_n cannot be integrated over the unbounded domain R^k, one may estimate the variation norm ‖Q_n − Φ‖ = ‖h_n‖_1 by estimating the integral of |h_n| over a sphere S of radius O(log^{1/2} n) as ‖h_n‖_∞ · vol(S), and the integral over the complement of S by the classical Berry–Esseen theorem and a Chebyshev-type inequality. To avoid the loss of precision arising from the factor vol(S) = O(log^{k/2} n), assume that Q_1 has a finite fourth moment and apply Theorem 8.5, adding one term [n^{−1/2} P_1(−Φ : {χ_ν})] to Φ and later subtracting the contribution from this term, which is of the order n^{−1/2}. In the general case (i.e., when Q̂_1 may not be integrable) we smooth Q_n by convolving it with a smooth kernel K_ε (a probability measure with an integrable c.f., which for small ε assigns most of its mass near zero) and apply the above argument to (Q_n − Φ) * K_ε, with a proper choice of ε (depending on n). The smoothing inequalities of Section 11 enable one to estimate the perturbation due to this convolution with K_ε, and one arrives at Theorems 13.2 and 13.3, which express bounds for |∫f d(Q_n − Φ)| in terms of the average moduli of oscillation ω̄_f or ω*_f and the range ω_f(R^k) of f.

To estimate |∫f d(Q_n − Φ)| for unbounded f, and at the same time relax the assumption of finiteness of fourth moments (in the case of bounded f), we compare the measures ‖x‖^r Q_n(dx) and ‖x‖^r Φ(dx) for nonnegative integers r in the same manner in which Q_n and Φ are compared above. Because for odd integers r the measures ‖x‖^r Q_n(dx) do not have Fourier–Stieltjes transforms as well behaved as those for even r, we replace r by r_0, where r_0 = r + 1 if r is odd and r_0 = r if r is even. As above, we smooth the signed measure ν_{r_0} ≡ ‖x‖^{r_0}(Q_n − Φ)(dx) as ν_{r_0} * K_ε and apply Lemma 11.6 to the density g of ν_{r_0} * K_ε, thus obtaining

‖ν_{r_0} * K_ε‖ = ‖g‖_1 ≤ c(k) max_{|β| = 0, k+1} ‖D^β ĝ‖_1,   (2)

where ĝ is the Fourier transform of g. The Fourier–Stieltjes transform of ν_{r_0} is estimated by Theorems 9.9–9.12, and a sharp estimate of ‖ν_{r_0} * K_ε‖ is
obtained provided ∫‖x‖^{r_0+k+1} Q_1(dx) is finite (which ensures the existence of D^β ĝ for |β| = k+1). To relax this last hypothesis, which is rather restrictive, one resorts to a truncation of the random vectors {X_n : n ≥ 1} and applies the above procedure to these truncated vectors. The various lemmas in Section 14 allow one to take care of the perturbation due to truncation. As in the case r_0 = 0, for the final accounting (i.e., to estimate the effect of smoothing by K_ε) one uses the smoothing inequalities of Section 11. The main theorems of Section 15 are obtained in this manner. A further truncation enables one to obtain corresponding analogs when only the finiteness of absolute moments of order 2 + δ is assumed for some δ, 0 < δ ≤ 1, thus yielding generalizations and refinements of the classical one-dimensional theorems of Liapounov and Lindeberg.
11. SMOOTHING INEQUALITIES

Lemmas 11.1 and 11.4 show how the difference µ − ν between a finite measure µ and a finite signed measure ν is perturbed by convolution with a probability measure K_ε that concentrates (for small ε) most of its mass near zero. Let f be a real-valued, Borel-measurable function on R^k. Recall that in Chapter 1 we defined the following:

ω_f(A) = sup {|f(x) − f(y)| : x, y ∈ A}   (A ⊂ R^k),
ω_f(x : ε) = ω_f(B(x : ε))   (x ∈ R^k, ε > 0).   (11.1)

Also define

M_f(x : ε) = sup {f(y) : y ∈ B(x : ε)},
m_f(x : ε) = inf {f(y) : y ∈ B(x : ε)}   (x ∈ R^k, ε > 0).   (11.2)

Note that

ω_f(x : ε) = M_f(x : ε) − m_f(x : ε)   (x ∈ R^k, ε > 0).   (11.3)

The functions M_f(· : ε), m_f(· : ε) are lower and upper semicontinuous, respectively, for every real-valued function f that is bounded on each compact subset of R^k. Also, ω_f(· : ε) is lower semicontinuous. These follow from

{x : M_f(x : ε) > c} = ∪ {B(x : ε) : f(x) > c}   (c ∈ R^1),
m_f(x : ε) = −M_{−f}(x : ε)   (x ∈ R^k, ε > 0).   (11.4)

In particular, it follows that M_f(· : ε), m_f(· : ε), ω_f(· : ε) are Borel-measurable for every real-valued function f on R^k that is bounded on compacts. Recall that the translate f_y of f by y (∈ R^k) is defined by

f_y(x) = f(x + y)   (x ∈ R^k).   (11.5)
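Definitions (11.1)–(11.3) are easy to realize computationally. The grid-based sketch below, for f(x) = x^2 on R^1 with ε = 1/2, is purely illustrative: M_f and m_f are the sup and inf over the ball B(x : ε), and ω_f is their difference.

```python
# Grid approximation of M_f, m_f, omega_f from (11.1)-(11.3) for f(x)=x^2.
f = lambda x: x * x
eps = 0.5
grid = [i / 100 for i in range(-300, 301)]

def ball(x):
    return [y for y in grid if abs(y - x) < eps]

def M(x): return max(f(y) for y in ball(x))
def m(x): return min(f(y) for y in ball(x))
def omega(x): return M(x) - m(x)          # relation (11.3)

x = 1.0
assert abs(M(x) - (x + eps) ** 2) < 0.05  # sup of y^2 over (0.5, 1.5)
assert abs(m(x) - (x - eps) ** 2) < 0.05  # inf of y^2 over (0.5, 1.5)
assert omega(x) == M(x) - m(x)
```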
LEMMA 11.1. Let µ be a finite measure and ν a finite signed measure on R^k. Let ε be a positive number and K_ε a probability measure on R^k satisfying

K_ε(B(0 : ε)) = 1.   (11.6)

Then for every real-valued, Borel-measurable function f on R^k that is bounded on compacts,

|∫ f d(µ − ν)| ≤ γ(f : ε) + τ(f : 2ε),   (11.7)

where

γ(f : ε) = max {∫ M_f(· : ε) d(µ − ν)*K_ε, −∫ m_f(· : ε) d(µ − ν)*K_ε},
τ(f : 2ε) = max {∫ (M_f(· : 2ε) − f) dν^+, ∫ (f − m_f(· : 2ε)) dν^+},   (11.8)

provided |M_f(· : 2ε)| and |m_f(· : 2ε)| are integrable with respect to µ and |ν|.

Proof. One has

γ(f : ε) ≥ ∫ M_f(· : ε) d(µ − ν)*K_ε = ∫ [∫ M_f(y + x : ε)(µ − ν)(dy)] K_ε(dx)
= ∫_{B(0:ε)} [∫ M_f(y + x : ε) µ(dy) − ∫ f(y) ν(dy) − ∫ (M_f(y + x : ε) − f(y)) ν(dy)] K_ε(dx)
≥ ∫_{B(0:ε)} [∫ f(y) µ(dy) − ∫ f(y) ν(dy) − ∫ (M_f(y + x : ε) − f(y)) ν^+(dy)] K_ε(dx)
≥ ∫ f d(µ − ν) − ∫ (M_f(· : 2ε) − f) dν^+.   (11.9)

Similarly,

−γ(f : ε) ≤ ∫ m_f(· : ε) d(µ − ν)*K_ε
= ∫_{B(0:ε)} [∫ m_f(y + x : ε) µ(dy) − ∫ f(y) ν(dy) + ∫ (f(y) − m_f(y + x : ε)) ν(dy)] K_ε(dx)
≤ ∫ f d(µ − ν) + ∫ (f − m_f(· : 2ε)) dν^+.   (11.10)

From (11.9) and (11.10) one gets

−γ(f : ε) − τ(f : 2ε) ≤ ∫ f d(µ − ν) ≤ γ(f : ε) + τ(f : 2ε).
Q.E.D.

COROLLARY 11.2. Under the hypothesis of Lemma 11.1, one has

|∫ f d(µ − ν)| ≤ γ(f : ε) + ∫ ω_f(· : 2ε) dν^+.   (11.11)

If, in addition, f is bounded and

µ(R^k) = ν(R^k),   (11.12)

then

|∫ f d(µ − ν)| ≤ (1/2) ω_f(R^k) ‖(µ − ν)*K_ε‖ + ∫ ω_f(· : 2ε) dν^+.   (11.13)

Proof. The inequality (11.11) follows from (11.7) and the relation (11.3). To prove (11.13), note that in view of (11.12),

∫ (f − c) d(µ − ν) = ∫ f d(µ − ν),   ω_{f−c}(· : 2ε) = ω_f(· : 2ε),

γ(f − c : ε) = max {∫ M_{f−c}(· : ε) d(µ − ν)*K_ε, −∫ m_{f−c}(· : ε) d(µ − ν)*K_ε}
≤ sup {|f(x) − c| : x ∈ R^k} ‖(µ − ν)*K_ε‖

for all c in R^1. Letting

c = (1/2)(sup {f(x) : x ∈ R^k} + inf {f(x) : x ∈ R^k}),   (11.14)

and observing that for this value of c

ω_{f−c}(R^k) = ω_f(R^k) = sup {f(x) − c : x ∈ R^k} − inf {f(x) − c : x ∈ R^k} = 2 sup {|f(x) − c| : x ∈ R^k},
one obtains (11.13). Q.E.D.

COROLLARY 11.3. For every Borel subset A of R^k one has

µ(A) − ν(A) ≤ ‖(µ − ν)*K_ε‖ + ν^+(A^{2ε} \ A)   (11.15)

if µ, ν, K_ε are as in Lemma 11.1 and if (11.12) holds.

Proof. The inequality (11.15) follows from (11.9) with f = 1_A if one notes that

∫ M_{1_A}(· : ε) d(µ − ν)*K_ε ≤ ‖(µ − ν)*K_ε‖,   M_{1_A}(· : 2ε) − 1_A = 1_{A^{2ε} \ A}.   (11.16)

Q.E.D.

LEMMA 11.4. Let µ be a finite measure and ν a finite signed measure on R^k, and let K_ε be a probability measure on R^k satisfying

α ≡ K_ε(B(0 : ε)) > 1/2.

Then for each real-valued, Borel-measurable, bounded function f on R^k, one has

|∫ f d(µ − ν)| ≤ (2α − 1)^{−1} [γ*(f : ε) + α τ*(f : 2ε) + (1 − α) τ*(f : ε)],   (11.17)

where

γ*(f : ε) = sup {γ(f_y : ε) : y ∈ R^k},   τ*(f : 2ε) = sup {τ(f_y : 2ε) : y ∈ R^k}.
Proof. Let

δ = sup {|∫ f_y d(µ − ν)| : y ∈ R^k}.   (11.18)

Assume first that

δ = sup {∫ f_y d(µ − ν) : y ∈ R^k}.   (11.19)

Then, given any positive number η, there exists y_0 in R^k such that

∫ f_{y_0} d(µ − ν) > δ − η.   (11.20)

In this case

γ*(f : ε) ≥ ∫ M_{f_{y_0}}(· : ε) d(µ − ν)*K_ε = ∫ [∫ M_{f_{y_0}}(y + x : ε)(µ − ν)(dy)] K_ε(dx)
= ∫_{B(0:ε)} [∫ M_{f_{y_0}}(y + x : ε)(µ − ν)(dy)] K_ε(dx) + ∫_{R^k \ B(0:ε)} [∫ M_{f_{y_0}}(y + x : ε)(µ − ν)(dy)] K_ε(dx)
≥ ∫_{B(0:ε)} [∫ f_{y_0}(y) µ(dy) − ∫ f_{y_0}(y) ν(dy) − ∫ (M_{f_{y_0}}(y + x : ε) − f_{y_0}(y)) ν^+(dy)] K_ε(dx)
+ ∫_{R^k \ B(0:ε)} [∫ f_{y_0}(y + x) µ(dy) − ∫ f_{y_0}(y + x) ν(dy) − ∫ (M_{f_{y_0}}(y + x : ε) − f_{y_0}(y + x)) ν^+(dy)] K_ε(dx)
≥ ∫_{B(0:ε)} [δ − η − ∫ (M_{f_{y_0}}(y : 2ε) − f_{y_0}(y)) ν^+(dy)] K_ε(dx) + ∫_{R^k \ B(0:ε)} [−δ − τ*(f : ε)] K_ε(dx)
≥ [δ − η − τ*(f : 2ε)] α − [δ + τ*(f : ε)](1 − α)
= (2α − 1)δ − α τ*(f : 2ε) − (1 − α) τ*(f : ε) − αη.   (11.21)

Since η may be chosen arbitrarily close to zero,

γ*(f : ε) ≥ (2α − 1)δ − α τ*(f : 2ε) − (1 − α) τ*(f : ε),

from which (11.17) follows. If instead of (11.19) one has

δ = −inf {∫ f_y d(µ − ν) : y ∈ R^k},

then, given any η > 0, find y_0 such that

−∫ f_{y_0} d(µ − ν) > δ − η.

Now look at −f_{y_0} (instead of f_{y_0}) and note that

M_{−f}(· : ε) = −m_f(· : ε),   ∫ (f_y − m_{f_y}(· : ε)) dν^+ = ∫ (M_{−f_y}(· : ε) − (−f_y)) dν^+   (y ∈ R^k).   (11.22)

Proceeding exactly as in (11.21), one obtains

γ*(f : ε) ≥ −∫ m_{f_{y_0}}(· : ε) d(µ − ν)*K_ε ≥ (2α − 1)δ − α τ*(f : 2ε) − (1 − α) τ*(f : ε) − αη.
Q.E.D.

We define the average modulus of oscillation ω̄_f(ε : µ) of f with respect to a finite measure µ by

ω̄_f(ε : µ) = ∫ ω_f(x : ε) µ(dx)   (ε > 0).   (11.23)

Here f is a real-valued, Borel-measurable function on R^k. We also define ω*_f(ε : µ) as the supremum of the above average over all translates of f, that is,

ω*_f(ε : µ) = sup {ω̄_{f_y}(ε : µ) : y ∈ R^k}   (ε > 0).   (11.24)

COROLLARY 11.5. Under the hypothesis of Lemma 11.4, one has

|∫ f d(µ − ν)| ≤ (2α − 1)^{−1} [γ*(f : ε) + α ω*_f(2ε : ν^+) + (1 − α) ω*_f(ε : ν^+)]
≤ (2α − 1)^{−1} [γ*(f : ε) + ω*_f(2ε : ν^+)].   (11.25)

If, in addition, µ(R^k) = ν(R^k), then

|∫ f d(µ − ν)| ≤ (2α − 1)^{−1} [(1/2) ω_f(R^k) ‖(µ − ν)*K_ε‖ + α ω*_f(2ε : ν^+) + (1 − α) ω*_f(ε : ν^+)]
≤ (2α − 1)^{−1} [(1/2) ω_f(R^k) ‖(µ − ν)*K_ε‖ + ω*_f(2ε : ν^+)].   (11.26)

Proof. This corollary follows from Lemma 11.4 exactly as Corollary 11.2 follows from Lemma 11.1. Q.E.D.
The final result of this section relates the L 1 -norm of an integrable function g to the L 1 -norms of certain derivatives of its Fourier transform g. LEMMA 11.6 Let g be a real-valued function in L I (Rk ) satisfying f xll k+l jg(x)^dx
Then there exists a positive constant c(k) depending only on k (and not on g) such that f I Dfig(t)I dt. II gill < c(k) IIll max —U,k+l
(11.28)
Proof We assume that Dflg is integrable with respect to Lebesgue measure for 0< 1/30),
and the 2k quadrants E., one for each vector a =(a 1 , ... , ak ) (in R k ) having zeros or ones as coordinates, by
Ea = (x =(x 1 , ... , xk ): x1 >0 if thejth coordinate of a is zero, xj <0 if the jth coordinate of a is one, I <j< k ). Since the Fourier transform of the function k k+l
-1 , x1 g(x) x-1+( 2 ()°
j_ I
is of the form 7, I$1-0,k+I
c(a,$,k)D' s8,
(xER k )
Berry-Esseen Theorem
99
where c (a, /3, k) depends only on its arguments (and not on g), one has g(x)dx ]
Ilgh 1= I If g(x)dx—f a
_
(t+'xlk+')-'(1+lxlk+l) g(x)dx
(f; a
An E,
J
(Rk\A)nE,
An E,
(Rk\A)nE,
k k+1 X
l+( E (—l) a'xj g(x)dx
I.
j-1 2 ^ lk+l + IX ) ( )
1(1r =
( ✓ An E, - ✓ (R k \A)nj (
k
I
c(a,/3,k)DPg(t))dt dx
X f exp(—i)( 1,01-0,k+l
<(f (l+IxI k+l ) - 'dx)c'(k) max f IDfiI(t)Idt. 101-0.k+1
• M 12. BERRY-ESSEEN THEOREM Let P be a finite measure and Q a finite signed measure on R 1 with distribution functions F and G, respectively, F(x)=P((—oo,x]),
G(x)=Q((—oo,x])
(xER'). (12.1)
Let Ks be a probability measure on R' such that
a=KE((—ESE))>i
(12.2)
for a given e > 0. It follows easily from Lemma 11.4 that sup IF(x)—G(x)I <(2a— 1) - 'sup{I(P—Q)+KK ((—oo,x])I: xER'} xER'
+sup{ IG +(x) — G + (y)I : Ix —yj < 2e } ],
Normal Approximation
100
where G + is the distribution function of Q +. However with a more restricted choice of Q and the kernel K f this inequality can be sharpened. LEMMA 12.1 Let P be a finite measure and Q a finite signed measure on R' with distribution functions F and G, respectively. If Q has a density bounded above in magnitude by m, and if K is a probability measure on R' that is symmetric and satisfies (12.2), then f
sup IF(x)—G(x)j<(2a-1) -1 x€R
sup !(P—Q)•K1 ((—oo,x])I+ame. L ERI
J
S= sup IF(x)—G(x)I= —inf(F(x)—G(x)).
(12.3)
,
Proof First assume that xER'
xERI
Given q > 0, there exists x o such that F(xo)— G(x o )< —S+rl.
Then (P — Q)'KK(( — oo , xo — e]) = ([F(xo — e — y) — G(xo — e — y)]K: (dy)
— ✓ B(o:f)[ F(x
o — e—y)— G (xo—r—y)]K1(dy)
+ f[F(xo — e — y) — G(xo — r — y)]K1(dy) R^\B(O:f)
:f) B(o
[F(xo)]K,(dy)+S(1—a)
f
[F(x o )—G(x 0)+m(e+y)]K1 (dy)+S(1—a)
f
[ —S+7i+m(e+y)]KK (dy)+S(1—a)
B(0:f)
B(o:f)
=(—S+71+me)a+m f B(0:f)
—S(2a— 1)+mae+rla.
yK1(dy)+8(1—a)
101
Berry-Esseen Theorem
Hence S(2a-1)< sup I(P—Q)*KK ((—oo,x])I+mat,
(12.4)
xER'
which proves the lemma if (12.3) holds. If, on the other hand, S= sup (F(x)—G(x)), xER'
then given rl > 0, there exists x o such that F(x o )—G(x o )>S—r^.
Then (P— Q)*Ke ((— oo,x o +e])
B(o:e)
[F(xo+e
—
y)
—
G(xo+e
—
Y)]KE(dY)
+ f[F(x o +e—y)—G(x o +e—y)]KE (dy) R \B(O:e)
>f[F(xo)—G(xo)—(e—Y)m]KK(dy)—S(I —a) B(0:)
(S
—
n)IX
—
am€
—
S(1
—
a)=S(2c
—
1)— amt—a,
so that (2a— l)8 sup (P—Q)*K ((—oo,x])+ame. E
xER
1
Q.E.D. LEMMA 12.2 Let P be a finite measure and Q a finite signed measure on R' with distribution functions F and G, respectively. Assume that
f IxIP(dx)
JIx1IQI(dx)
P(R')=Q(R').
If Q has a density bounded above in magnitude by m, and if K E is a symmetric probability measure on R' satisfying amKe ((—e,€))>,
f JKE(t)Idt
Normal Approximation
102
for some e > 0, then
sup IF(x) — G (x)I <(2a— l) ' -
xER'
x [(2 ,ff) - 'f III -l l(p(t)-Q(t))K^(t)ldt+ame]. (12.5) Proof By Fourier inversion, the density of the signed measure (P - Q)
• Kc is (2ir) - ► f exp(-itx)(P (t)-Q(t))K,(t)dt
(x ER t ).
(12.6)
It is simple to check that the function (12.6) is the derivative of the function (27r) ' f exp{-itx}(-ii) '(P(1)-Q(t))K1 (r)dt -
-
(xER5. (12.7)
Note that Ill - ► Ip(t)_Q(1)1=
lit -'(exp{itx)-1)(P-Q)(dx)I
< f IxIiP- QI(dx) < oo.
By the Riemann-Lebesgue lemma [Theorem 4.l(iii)J, the function (12.7) goes to zero as IxI->oo. Thus
1
(P-Q)`K ((-oo,x]) =(27r) - ' f exp{-itx}(-it) - '(P(t)-Q(t))K,(t)dt,
the constant of integration being zero. The inequality (12.5) now follows from Lemma 12.1. Q.E.D. Remark. 1f, in addition to the assumptions on P and Q in Lemma 12.2, one also assumes that they have integrable Fourier-Stieltjes transforms and that flu
-
'l (t)—Q(t)ldt<eo,
(12.8)
then the above argument yields sup IF(x)- G(x)l <(21r) ' f ltl -
-
IIP(t)-Q(t)ldt.
(12.9)
xERl
We shall use Lemma 12.2 to prove the Berry-Esseen theorem below. The kernel K, is the distribution of eZ/3.25, where Z is a random variable
Berry-Esseen Theorem
103
whose distribution has the density g,, 22 given by x-
x sin 21T '[()
x 2
( x ER').
(12.10)
Recall [see (10.9)] that t K(t)=8i/2.z(3.25)=1- 3.25 e
if j t j <3 E 5,
if ItI>5 .2 .
=0
(12.11)
C
By a careful but straightforward numerical integration one also obtains a- K1
(( —
E,E))=
J
(lxIc3.25)
9,,,22(x)dx> 0 . 79 .
( 12.12)
The following lemma will be useful in estimating the constant in the Berry-Esseen theorem. LEMMA 12.3 Let P be a probability measure on R' with zero mean and variance one. Let F denote the distribution function of P and 4 that of the standard normal distribution on R. Then sup F(x) - 4(x)I0.5416. xER
I
Proof. We first prove the so-called one-sided Chebyshev inequality
F(-x)<(l+x 2 ) - ',
1-F(x)<(l+x 2 ) - '
(x>0). (12.13)
Fix x > 0. For every b > 0, one has 1 +b 2 =
f( - b) 2P(dy) > f
(
(y- b)2P(dy)>(x+b)2F(-x), m,
-
xl
so that g(b)=(l +b 2 )(x+b) -2 > F(-x). The minimum of g in [0, oo) occurs at b = x -', and g(x- 1)=(l+xZ)-1,
104
Normal Approximation
which gives the first inequality in (12.13) [note that for x=0, (12.13) is trivial]. The second is obtained similarly (or, by looking at P). This gives F(-x)-(D(-x) 6(1 +x 2 ) -' -(D(-x)mh(x),
say, for x > 0. The supremum of h in [0, oo) is attained at a point x 0 satisfying or
h'(xo)=0,
xo/ 2x 0 (1+xo) -2 = (27r) - / 2e - 2
A numerical computation yields x 0 =0.2135 and h(x 0)=0.5416, thus proving (12.14)
IF(x)-(D(x)i <0.5416
for all x (0 [note that 11(x)- F (x) < .5 for all x (01. The inequality (12.14) for x > 0 follows similarly (or, by looking at P). Q.E.D. THEOREM 12.4 (Berry-Esseen Theorem) Let X 1 ,... , X„ be n independent random variables each with zero mean and a finite absolute third moment. If n
P2-n -1 2 EX^>0, i=1
then sup IF„(x)-c(x)I<(2.75)/ 3 ,,,,
(12.15)
xER'
is the where Fn is the distribution function of (np 2) - " 2(X 1 + • • . + Xn), the Liapounov ratio standard normal distribution function, and l l3.„°( /2 n -1/Z, P3=n- 2 ' P2 p3). .i -
EIXJ I 3 .
If, in addition to the above hypothesis, X 1 ,.. . , X„ are identically distributed, then sup jF„(x)-4?(x)I <(1.6)l 3 .
(12.16)
xER' Proof To prove (12.15) first assume that P 2 =1. For convenience write
( 12.17 ) In view of Lemma 12.3, sup lF„(x)-(t'(x)+<0.5416, xERI
(12.18)
Berry-Esseen Theorem
105
so that we may assume that 0.5416
1n
2.75
<0.l96.
(12.19)
For, if (12.19) is violated, (12.15) reduces to (12.18). Let P„ denote the distribution of n 1 / 2 (X 1 • • • + X„). Take E=;(3.25)1,,,
(12.20)
and let KE be the distribution specified before the statement of Lemma 12.3.
It follows from Lemma 12.2 and inequality (12.12) that
sup IF,,(x)-4?(x)I <(0.58) - ' (2i) ' xER'
( f
(i (t) — a -12/ 2 )
(1:1 (3/2);-i)
t
x (1—;1„ItI)dt+(277) - '/ 2 (3.25)(0.79);1
(12.21)
Write Pt — nO 1= f{III c(3/2)t ^ ') (
e
-
`'/Z
)
t
—
(1-3lnItIdz 1 +1 2 +1 3 +1 4 +1 5 , (12.22)
where i=
f
^i (t)
—
I
13=
J t 1 / 3
? tt)di
,
2t
f
((l/2)1,1
14 = /
dt
t
( I' i < t; "^ }
12
e -1'/2 )
____ (1-31,,ItI)dt,
t
(t)
l
15 =f
( 1 — il,^Itl)dt. -, = /2
e dt.
(12.23)
Normal Approximation
106
Applying Theorem 8.4 with d= 1 and using (12.19), one gets I 1 <(0.36)1„ f t 2 exp( —0.3779t 2 ) dt <(1.3118)/,,. R'
(12.24)
By Theorem 8.9 (with 8 = 1), f 12 < 1,/f
e_t2/3dt < l, 3 f
=3l„/ 3 exp{ — 3/,^
2/3 }
} ^t^e-`2/3dt
/ =3l„(l^ '/ 3 exp( —;l-
2 3 ))
<(l.9320)1.
(12.25)
Again using Theorem 8.8, this time with 8 = Z, we get J3=
P-(t) (l — 2, 11jtD)di
<321„ f (I11>(1/2)t, '}
It)e-12f6dt e-'2/6d1<3(21.)2 f {IiI>(t/2)t, ')
— !^ 2 =161. exp 24 <(1.06)1,,.
(12.26)
Noting now that Theorem 8.9 was derived from the inequality [see (8.48)] Pn(t)I<exp(— 4-
which holds for ItI <21; 1 , one has z exp^— 2 +
14' < 3 lnf =3!„
<
f exp(-3!„t2(21n '—fit()}dt {i '
„ 3
!h
f
{r
<3info
exp — (l
2 ^
'
1—
I
fit+) dt
'}
exp { —( 3 )u } du / l J
= 212 <(0.3920)/,,.
(12.27)
Berry-Esseen Theorem 107
Finally, 15 =
e_12/2
t
< li/3f
Idt
Itle-12dt
= 21,,'/ 3 exp { - 2
2/3
} (12.28)
<(0.8096)1„. The estimates (12.24)-(12.28) are now used in (12.22) to give
(12.29)
I <(5.5054)1,,. Using this in (12.21) one obtains
sup IF„(x)-4(x)I<(0.58) - '[(27r) - '(5.5054)!„+(2ir) - '/ 2 (3.25)(0.79)31„] xER'
<(2.676)1,,.
(12.30)
This proves (12.15) under the assumption p 2 = 1. In the general case, look at random variables Y^=X^/p2/ 2 , 1 <j < n. We now prove (12.16). Again assume that EX= and < 0.54616
In
=0.3385.
(12.31)
As before, we proceed to estimate I and write
IGI^+Iz+I3,
(12.32)
where (Pn(t)—a—rZ/2
1 = j {ltI2'/ 2! -1 }
)
dt,
t
12 f -^ 1/2
I P^I t) I (1-i1^I'I),
I3= f
Ie
— r2/2 {2'/ 21. ' <1r1 <(3/2)1^ }
1
I(I- 2,1ItI)dt.
(12.33)
108
Normal Approximation
Writing
a=E(expS n
1n 11
b=exp{ —2 ? },
X2 I) , 111
(12.34)
one has
I Pn(t)—e-`2/2I=)a"—b"j=ia—bj•lan-I+an-2b+...
+b"-1).
(12.35)
But, for all t, writing p 3 = E IX 1 1 3 yields
la— bi
<exp{ —
(0.26 42)t 2
n
I.
(12.37)
c = 0.2642,
(12.38)
Hence, writing one has, using (12.36), (12.37) in (12.35), If'n (t)—e -= ^ Z I<( 6nI3/Z
+
)exp(— 8n2
n
n— —r
ct 2 -2n t Z } (12.39)
r-0 J
Now n-1
n—l—r C12_ r t 2 1 r—O
p n
2n = exp { ( 2n — c)1 2 }
exp { — (i — c) l2 n }
ll r=1
f1e-(=-`)`'"dx
` (2n —c)t2111I 0
r =nexpj
` (2n
—c)t 2 ^
1— exp{_ 0_0 12 } Z . (12.40) ! ( 2 c)t
Berry-Esseen Theorem
109
The estimate (12.40) is used in (12.39) to yield — r=/2
n(t)—e
(
Ill
p3
1
1—
I
III
( 6n 1 / 2 + 8n/
2 n )t 2 } x (exp{ —(c— I
—exp{ — " 2n 1 t2}).
(12.41)
Recalling from (12.31) that p 3 n - '/ 2 < 0.3385, so that n) 9, one obtains ligL—c)-11 /RI(6+ 1 exp { — t c
_(2—c)-11„
(2 6 1/2 1(2c— n) -1/2
2n) t2 J (
exp[—n
2n l t21 ) dt
" n 1 ) -1/2 1
1 _ I -1 + 24{( c 2n) 2n —1n 1 ) }
(2ir) 1/z <(1—c) -I 1,, 6 {(2c-9) -1/2
-1}+ 4{(c— 8) 1-2)1
<(1.465)1,,.
(12.42)
Next, proceeding as in the estimation of 1 4 above, one has IZ<2-1/2(1— 23/2)In exp{-3l^ f(2'/2/"-'<,,I,(3/2)/"- ') 3
I ^ilh 1
—Itj)}dt
<(0.0404)1,,.
(12.43)
Finally, 13<(2-1/21„)2(1— 23/2 ) 3 J
^tle-`'/2dt
{2'/z! '
<(0.00002)1,,.
(12.44)
Therefore I <(1.506)1,,,.
(12.45)
and, using (12.45) in (12.21), sup IF,,(x)-41(x)I <(0.58) -1 {(2zr) -1 (1.506)+0.6829 1„ xER'
<(1.6)/,,.
Q.E.D.
(12.46)
)
110
Normal Approximation
Some of the computations above may be sharpened to yield somewhat better bounds in (12.15), (12.16). In Chapter 5 we see that in the i.i.d. case, under the hypothesis of the Berry-Esseen theorem, the finite limit sup IF.(x)-fi(x)l
d(P,P), n-rao lim
(12.47)
xER'
exists, where P is the distribution of X,, and that 3
(12.48) P P3 6V 21r where a 2 = EX2, and the supremum is over all P having mean zero, positive sup °—d(P,(D)= 10 +3 =0.409,
variance and finite third absolute moment. Thus (V +3)/(6V ) is the asympototically best constant for the Berry-Esseen theorem. It is also known that in the general non-i.i.d. case one has (see Notes at the end of this chapter) (12.49) sup I F„ (x) - D(x)I <(0.7975)1 3 ,,. xER'
13. RATES OF CONVERGENCE ASSUMING FINITE FOURTH MOMENTS We begin with a lemma that is used in computing error bounds. LEMMA 13.1 Let X 1 ,...,X n be n independent random vectors with values
in R' having finite third absolute moments p 3.^(1 <j (n) and satisfying n
n '
EX^=0,
-
Cov(X^)=I,
(13.1)
j -1
where I is the k x k identity matrix. As usual, let x,,, j = with cumulant of X^
(I < j < n),
n
x.=n -I T, X,,,.
()v)=3),
(13.2)
j -1 n
p 3 = n - ' I P3j. i-1
Their the variation norm of the signed measure P 1 (-ID: (x,.)) defined in Section 7 satisfies 11P1(-1: {Xr})II
3 / 2 -3k'/ 2 +2) p3. (k (.Z 17
(13.3)
Convergence Assuming Finite Fourth Moments
III
In the special case k = 1, -1/2 ( 4 e -3/2 + 1 )1µ3I II P I( -1: {x})II = ( 2 ir)
< 3 (2 i) -1 ' 2 (4e - 3 / 2 + 1)P3. (13.4)
where n µ3—n-' 2 EXf.
Proof We first prove (13.4). By (7.21), IIPI(—t: {X,})Il = 61 µ3I JRJ Ix 3 -3xlo(x)dx = 3iµ3^^1o,31/2)(3x—x3)$(x)dx+
j 3112 00)(x3-3x)41(x)dx l
-I/2 {((x2-1)e-X2/231/'+[ (1—x2)e-x'/2]31,3) µ3 =(2r)
o
= (2r) ' 2 (4e
312 + 1 )Jµ3]
where ¢ is the standard normal density on R'. For arbitrary k, we use (7.20) to get IIPI( — 'D: (X,} )II < 6 [ IX(3.o....,o)I + .. . + IX(0.....0.3)II f. 1x3-3xlo(x)dx + i [
o)I + P(2,0,1,o.....0)j
.. .
+ IX(o.....0,1,2)1] f l x 2( 1—x I)I$( x 1)$( x 2) dx 1 dx 2 R2
+ IX(0....,0,1,1,1)1]fR ,I x i x 2 x 3141( x 1)4, ( x 2)41( x 3)dx i dx 2dx 3 /
<3( 27r ) -'/2 ( 4 e -3/2+ 1 ) 9,, +(ire)/2)-'0.,2+ i7l
)
3/2
9 ,3,
(13.6)
112
Normal Approximation
where, writing Xi _ (Xi , 1 , ... , Xj k ) and using (6.21) give en, I = IX<0)I + ... + IX(0,...,0,3)1 k
n
=- 1 EX^ i ^ i—t
l
j-1 k
n
N3
0",2 = ("(2,1,0....,0)1 + ... + IX 0....,0 1,2)1 '
n
=
E(X iXj.r )
n-1 1
j-I k
k
(
x.
k
± I Xj.i'I —i-1± I Xj.iI3
i'-1
—1 n
k
i-1
< k 1/2P3 — k — ' 12p3 = k l / 2(1 — )P3,
0n,3 = IX(1,1,I,o,....0)I+ ... +IXCO,...,0.1,1,1)1 n = 1
n —' E j-1
i#i',i'#i",i"#t n
k
3
cn - IEE (GIXj"I j—I
k
k
-3^X;i^ i-1
i'—I
k
IXj,'l - IXj.11 - IXj.il 3 i-I
<(k 3 / 2 — 3k I / 2 + 2 )P3.
Q.E.D.
(13.7)
Convergence Assuming Finite Fourth Moments
113
Let us now introduce a kernel probability measure K on R' whose density i is given by k 11 ga.4(xl)
W(x)=
[X=(XI..... X k )ER k J.
(13.8)
i—I
where a >0, and g 4 is the probability density on R' defined by (10.7). In fact, in this case we may compute the constant in the expression for g a and get 3a sin ay
4
(yERI)'
ga.4(Y)= 2ir( ay )
(13.9)
We shall choose )1/3 a=(4 512 k5/6=2,,7—I/3k5/6. 4
(13.10)
By (10.9), k
K(t)= IT ga4 (t 1 )=0
for I
if t [-4a,4a] k =(1:It 1 1 4a
i-I
[t=(iI,...,rk)E R k ].
(13.11)
Note that
( sin ay l
3ak
K({x: IlxII> 1})^— f [k-'12'x) 1 ay < 3k
4
1 dy
U_4dy= k (ak -) / 2 ) [ak-'/x co )
3
= .
(13.12)
IT
Hence
K({x:
JJxll8.
(13.13)
In the proofs of the theorems below we use some theorems of Section 8, substituting n-(s-2)/2p5 for ls „ (s>2). This is justified in view of (8.12).
THEOREM 13.2 Let X 1 ,.. . , X n be n independent and identically distributed random vectors with values in R k satisfying EX,=0,
Cov(XI)=1,
P4=E11X1114
(13.14)
114
Normal Approximation
Let Q„ denote the distribution of n - '/ 2 (X I + - - • + X,) and let 1 be the standard normal distribution on R k . Then for every real-valued, bounded, Bore/-measurable function f on R k ,
f
fd(Q„ -4 >)
<w1(R k )I (3Cp+aI(k))P3n
-I/2
k 2 +;I -1 / 2 kn - '(logn) -1 / 2 +a 2 (k,P 3 ,P 4 )n - '(logn) /
k (k -2)/2 1 5/2 3 logn 3/2 + zr k P3(n (I'( 2 ) + 3 ( 2 k)
+ 3 wf (27/27T-1/3k4/3p3n-i/2: t),
))-26.1
(13.15)
where wf is defined by (11.24), c o is the universal constant appearing in the Berry-Esseen bound (12.49), and -'/2 (4e -3/2 + 1)+ 3(7re' /2 ) -' k' /2 (1 — k - ') al(k)= 2 (27x)
) 3/2 +a
(
(k 3/2- 3k' /2 +2),
a2(k , P3,P4)= ik k / 2 (k+2)2
k+1
(r(
I
2
)/ (0.14+
+(0.19)(2.3) k k k / 2 (k+2)(k S =(logn) k / 2 J 00
24 )
+4)(r(fl) -' , P2
[rk-Ie-o.264r2+rk-Ie-(I/2)r2 1/2)
+ Ip3n-I/2rk+2e-(l/2)r2]dr.
(13.16)
Proof Let Z denote a random vector whose distribution is K [see (13.8)(13.13)]. By Corollary 11.5 [more specifically, inequality (11.26)],
J fd(Q„ -
-
D)I
Convergence Assuming Finite Fourth Moments
115
for all e > 0. Choose e = 4ak 1/2P3(2n) - I
/2.
(13.18)
Then (2n)"2 if Iltll> (13.19) P3
K1 (t)=0
For every Borel set B and r > 0, write B 1 =BnB(0:r),
B 2 =B\B 1 .
(
13.20)
One has
I (Q
—
'':
(
X1)) *KE(BI)I,
(13.21)
where H„ = Q„ — 4 — n I/2P1( — ( D: ()) and X is the with cumulant of X I (only cumulants with I I = 3 enter into the expression for P I ). By Fourier inversion and (13.19), Hn *K,( B I)I <( 2i )
-k
Ak(B1)
f I H (t)K1(t)I dt <(27r)-kAk(BI)(11+I2), (13.22)
where Ak is Lebesgue measure on R k, and (by Theorem 8.5) I
=f
IH.(i)Idt
(II+II< , 1 / 2 /(2p /2 ))
(
0.14 + ? 4 n 24n
+ <
0.03p3 n
)f 1 ji,114 exp(—
zIItII2)dt
f 11111 6 exp{ —0 . 38 311111 2 ) dl
(Q. 14 + 24n )k(k+2)(2ff)k /2
+
OA7p3 7t k j2 n ( 0.383) k(k+2)(k+4),
(13.23)
116
Normal Approximation
and (by Theorem 8.7) ) IHn (t)I dt / ^/\ 2 P^ / ^) < IIIII <(2n) 1/=^ /P
n
< f
[exp{ —0.264(1111 2 )
(II 1 II>n h/2 /( 2P,1 /2 ))
+( 1 +bn -1/2 P3IItII 3 )exp{
—
iII 1 II 2 }]dt
i/z Lrk—le-0.264r2+rk—Ie_0/2)r2 27rk/2(r(
2
j1/2
n^ /( 2 Pe
+
n-1/2rk+2e-0/2)r=ldr
=21rk/2(r( i.))
(13.24)
Thus I Hn*KK(Bi)l < 2-(k-2)/2(I'(2 )) (k+2)I 0.1 4 + 24n
,
+(0.14)(1.532) -k / 2 (I'( 2 )) (k+2)(k+4)p3n -I r k +2-tk-2)k-'(r(2 ))
-2
(13.25)
snrk.
Also, by Lemma 13.1, n-1/2lPI(—c: (X1.))*Kc (B1)I <1n - " 2 IIPi( — : {X„))II ^2 f 3(217) -1 / 2 (4e -3 / 2 +1)+(1re 1 / 2 ) - 'k'/ 2 (1 —k - ^)
+
=
))
\ TT/ 2 3/2
ai(k)P3n
-
(k3/2-3kh/2+2) lp3n—i/2
'/ 2 ,
J
(13.26)
117
Convergence Assuming Finite Fourth Moments
say. Next, (Qn -4')*Kc(B2)I <max{Qn*K,(B2)+ ^*Kc(B2))
< max (Prob(IIn -' / 2 (X I + • • • + X„) II > 2 ),
f
O(x)dx}+Prob(IIeZII>
JJ)
(IIxII>r/ 2 )
2
)• (13.27)
By the Berry-Esseen theorem, Prob(Iln -1 / 2 (X i + ... +X„))I > 2 ) k
<
Prob(In -'/Z (Xi , + ... +X1)> r
2k'/Z
)
/2 < 2c 0p 3 n - '/ 2 + 4k 3 (21T) 1"2r - ' e -?/sk
$(x)dx <4k3/2(21T)-i/2r-'e-,^/sk {IIxll>r/2)
r
512k°p3
Prob(IIEZII> 2)< 52 3 3 2 irr n
(13.28)
where XJ =(XJ , I ,...,XJ.k ), 1 < j
-'/2+4k3/2(2i')-1/2r- ie-r2/8k
" P
+512 ;/Z 4 3 3/2 .
23
(13.29)
2
Now take r=(8klogn)'/2.
(13.30)
118 Normal Approximation
Then from (13.25), (13.26), and (13.29), (Q,, -
<2tk + 1)
(r(2 ))
k k / 2 (k+2)0-14 -1
/2
+ 24n ) (lo g n)k
+(0.14)(2.3)k(r(2))- k k / 2 (k+2)(k+4)p32n - '(logn) k / 2 +2(k +4)/2k(k- 2)/2(I'(2 )/-2d,, (logn)k /2+ai(k)P3n-I/2, (13.31) I(Qn- 4) *K.(B2)I <2c,P3n-
1
/2+kir- 1 /2(nlog'/2n) -I
+81r -I k $ / 2p3(nlogn) -3 / 2
Now since
1i( Q,,- (D) *K,11 =2sup{i(Q,,-(D) *K,(B)I: BEl k ),
(13.32)
by using (13.31) in (13.17) one obtains the desired inequality (13.15). Q.E.D. One can easily deduce from (13.15) that there exists a constant a 3 (k) (depending only on k) such that
I
1 fd(Q., -( D)I ( wj(R k )a3(k)P4n-
1 /2+3w)
(27/2 -1/3k4/3p3n-1/2: t). (13.33)
We prove a generalization of (13.33) to the non-i.i.d. case. THEOREM 13.3 Let X 1 ,.. .,X, be n independent random vectors with values in R k satisfying (1<1(n),
EXO n
n
-
1
2
Cov(Xj ) =1,
(13.34)
j-1 n
P4 =n -I I EIjXj 11 4 <00. j-1
Let Q„ denote the distribution of n - " 2 (X 1
+ • • • +
X„). Then there exists a
Convergence Assuming Finite Fourth Moments
1 I9
constant a3 (k) depending only on k such that for every real-valued, bounded, Bore/-measurable function f on R' one has f fd(Qn —
t)I <wj(Rc)a3(k)P4n-'/ z+3w.i (27/21-"/3k4/3P3n -1 /2: (13.35)
Proof. We use the same notation as in the preceding proof. Then we have [in place of (13.22) -(13.24)] IHn*K. (B1)I <( 2 Tr)
-k Xk(BJ)(J1
+J 2 ).
(13.36)
Theorem 8.6 yields Ih.(t)Idt
J1mf
(13.37)
(hit<.i nh / 4 /p /4 )
if [see (8.30)] Pan -' < 1,
(13.38)
and Theorem 8.9, with & = Z — 2'/ Z leads to the estimate ,
Jz=
f
'12
(In I/4 /Pi 4 <11 1 11<( 2 n) 1p,]
J (hut >f(n /v.)'
")1
IHn(I)Idt
l 3& 11 1 11 2
_I
}
+ (1 + b pan - 1/1 11 , 11 3 )exp { — II 1 II 2
}
J
dt
(13.39)
Note that we used (13.38) in the last step of (13.39). Now choose r= (8klogn)' / 2 and obtain -1 /2, (13.40) Hn*KK(B1)I
and IHn"K,(B2)I
-
'
+8ir-'k5/2ps(nlogn)-3/2 < ag(k)P4n-i/2. (13.41)
120
Normal Approximation
The rest follows exactly as in the preceding proof. It is simple to check that (13.35) holds with a3 (k)=2 if (13.38) is violated. Thus (13.35) is proved for all cases. Q.E.D. One may compute an explicit error bound, with numerical values for the constants involved, in Theorem 13.3 much the same way as in the i.i.d. case (Theorem 13.2). of all Borel-measurable A significant application is to the class convex subsets of R k . By Corollary 3.2 (with s = 0) sup w+ r (2e:4)= sup 4b((8C ) 2 e)
cEe
cEe c
2 5 ' 2 F(( k + 1)/2) C
(13.42)
r(k/2)
Using this in Theorem 13.3 one obtains sup I Q,, (C)
CE2
—
(C)I < a1o(k)P4n-^/2.
(13.43)
In the i.i.d. case one also has Tim sup V I Q,, (C) — P(C )l " -10° CEC
1) / +1(k)+ 3 1 8 'r - ^ /3k 413 r( k+ 1) 2) IP31 (13.44) where a 1 (k) is given by (13.16). This follows from Theorem 13.2 and the estimate (13.42), provided that one replaces the first estimate in (13.28) by Q,,({x:
Jlxii>z( 8 klogn)" 2 })=o(n -1/2 )
(n-*oo),
(13.45)
which holds (see Corollary 17.12) if P 3 < oo.
14. TRUNCATION
We need some truncation estimates for relaxing the moment condition (namely, finiteness of fourth moments) of the last section to obtain rates of convergence for integrals of unbounded functions and (in Chapters 4 and 5) to derive precise asymptotic expansions. Throughout this section X i ,...,X„ are n independent random vectors with
Truncation 121
values in R k having zero means. We write, as usual, µa j = EX,
X = ath cumulant of X^,
1 p5.j =EjlX ll''
(l < j
n
n
E
Xa=n—' I Xa.j,
A'aj'
(14.1)
l —1 J°I n
[aE(Z+ )k,
p,, = n -I Y, ps,j
s>0].
Define truncated random vectors
X^ Y= 0
1
if IX,II
n 1,2
(14.2)
if I^Xf ll > n
Zj =Y1 —EYJ (Ic j
= EZ,j ,
(14.3)
ps.i = E IIZ1 II5, n
n
ps=n E11ZJ II 3 ,
Xa=n —I 2 X•
—I
j —1 l-1
Also introduce n s
Anj,s
= IIXjII
^n.a =n
—
I
11 Xj 11 >n^^ Z }
(
I An,j.a.
(14.4)
j=I
Finally, write Cov(XX )=((vii )),
V=n-I n
D=n -I Y, Cov(Zj )=((dij (14.5) jal
The constants c's depend only on their arguments.
122
Normal Approximation
LEMMA 14.1 Let p, < oo for some s>2. (i) One has
p,,=EIIY11I3+
(14.6)
(ii) If a is a nonnegative integral vector satisfying I < a <s, then EX — EYl I < n (s lal)/2A "j 5 —
—
IEY^ — EZ^ I < Ial( 21a1 + 1)n1/2An,i.5.
(14.7)
+d11 —v1A 42n -( s -z) / 2 p n.: (1
(14.8)
(iii) One has
(iv) If 2 < s' < s, then
p—EIIZ IIS <2fpJ'j ,
EIIY;II
;
S
ps <2 S p,..
(14.9)
(v) If s' > s, then
EIIY IIt <(En'/2)(3
-s)^
IIX^iis
{IIXj IjGen/ Z )
+(s'—s)/2^
I° < n (f — s ) / Z pf.i (0 < e < 1), IIX^I
{ sn/2< II Xji
Gn2 )
ps..i =EIIZf ll s <2'EIIY II'.
(14.10)
Proof The relations (14.6) follow from definition. To prove the first inequality in (14.7), use the inequality (6.29) to obtain EYj! - I= EX°— l
f
f
X < r 11X.111°I
I {^^Xju>n^^ 2 }
j
l
toxilI>n 1/ )
,-(s-la)/2 fIIX.II s =n -(s- l a l ) / 2 0 nJ,s• {IlXill>n1/2) j
(14.11)
Truncation
123
To derive the second inequality in (14.7), first observe that
xa_YaI_Ixr i...Xk —Y1O". k I = Ix '
Xk I ( Xk Yk k )
+x '• ..Xk 2( xk I Yk I)Yk +... + ( x 1 ' YI ' )Y2 =.•. Yk I
xk
Yk)I ak max (Ixkl ak-I, IYkI
+... +a,max(IxI o1-I, IYII a '
1
N-1 )
)IYZ :... yk (XI — YI)I
< IaI(IIxll lal-I +IIYII I°I-1 )max(lxi — y, : I < i 6 k},
(14.12)
k X=(XI,...,Xk)+Y=(YI,...,Yk)ER ].
Since by (14.11) IYi.r — Z^,I=IEYI
-(:-I)/20
n.•
(1
(14.13)
it follows from (14.12) that
EYE —EZ7I
'EIIY'+EIIZJIIIai-1). (14.14)
Now EIIY^III^I-1= /
IlXJIlal-1
(14.15)
IIEY1II <(EIIY;II 2 ) 1/2 , E I Z^ IIIai-1 < 2 1al -'( E 1IYI II Ial -1 + II EYY II I " I-1 ) C 2laI n tlal- It/z . Hence IEY^ — EZJ I < IaI( 21al + 1)n-(s-IQI)/2An.i..,.
=
124 Normal Approximation
Next we have
4 Idir—vtt1
= n -1 2 I E (XX,1 X1. , — YJ,rY!.,) + E (Y)E (Y1.t)I i- 1 n
l
n,j,s +n
) h/2n -(s-U/2^ nJ.s
l
f I —
2n
-(s-2)/2^ A t.
(14.16)
To prove (iv) note that [using (6.26)]
11Z;II J <(IIY;II+IIEY;II) s < 2s- '(IIY;II S +(EIIY; 11 2) s/2 ) < 2s -'(IIY^II' + E IIYYIi s ) Proof of (v) is clear. Q.E.D. Recall that the norm of a k X k matrix T, denoted IITII, is defined by 11 TIJ =
11 Txll.
sup
(14.17)
xER k .IIxII C I
COROLLARY 14.2 Under the hypothesis of Lemma 14.1, one has I— I<2kn -( s -2) / 2411 t II Z
< 2 kn -(s-z)/2 ps11 1 II 2 (t E R k ),
(14.18)
so that
lI D —VII < 2kn
-(s-2)/2E
2kn _(5_2)/2p3 (14.19)
In particular, if V = I and
ks
(14.20)
then D is nonsingular and
a
IID-III
(14.21)
Truncation
125
Proof. By (14.8) and using (6.26) in the last step, we have k
I— I= i t;tr(d,, i,t-i
—
vu) 2
k
< 2 n -cs-2)/2 p'.:(2 It^I) < 2 kn -cs-2)/2 p"SII 1 II 2 .
(14.22)
Since the matrices D, V are symmetric, (ID — VII= sup II<2kn
-(
II+II <
' -2
l/ 2 A- n ,5 .
(14.23)
Also, if V= I and (14.20) holds, then
(14.24)
(t,Dt>> (ItiI2—aIItIIZ=aIItIIZ
(14.25)
IID — III,
2
and
The last inequality implies that D is nonsingular and that II D - ^ II < i Q.E.D. The next lemma concerns the growth of derivatives of the characteristic function of n -1 / 2 (Z 1 +"• + Z„). LEMMA 14.3 Suppose p s < oo for some s> 3, and that V = 1. Let gi denote the characteristic function of ZZ (1 < j < n). Then if 1/2
II t II < 16p 3 '
(14.26)
one has, for every nonnegative integral vector a,
l g' (n ' ) ^ ^fi (
D
n
+6111II 2 ).
(14.27)
Proof By inequality (8.46) one has
( <exp{ gjln^/2 )1 t
2
—
n - 'E2 +3n -3 / 2EI<,,Z^>I 3 )'. (14.28)
126 Normal Approximation
Observe that exp{ n -I E'(t,Z> 2 — in -3/2E11 3 ) <exp{ n - '(EKt , Zj>I 3 ) z/3— in -3/2EI 1 3 } < supexp{a z — Za 3 } =e 1 / 3 .
(14.29)
a>O
Now let Nr be a subset of N = (1, 2, ..., n) containing r elements. Then in the given range (14.26) one has
11 jEN\N,I
Sj
t
)1
z
n /
2
4
II exp{
—n-IE2+
3n-3/2EII3)
jEN\N,
'exp ^
—<1,D:>+ 3n-3f zj-1 1 EI <1,Zi>`3 }
x 11 exp(n -I E
<exp( —+ 3n -1/zp3II 1 11 3 )e' j3
(14.30)
< exp{ — fit, Dt>+ 311 rll z }e`/ } ,
using (14.9) (with s'=3) and (14.26) in the Iast step. It follows that ) <exp( ( g( J 1 2
-
(14.31)
jEN\N,
which proves the lemma for a=0. To prove it for nonzero a, note that
(DMgj)(
1/z)l=n-1/2IE(Zj,n ,expfi))I
n
i
=n-1/2IE[ZJ m(exp{i)
—1)]I
I
[Zj—(Z 1.1,...,Zj,k)],
(14.32)
using (14.9) in the last step (with s' = 2). For a nonnegative integral vector
Truncation
127
/3 satisfying I /31 > 2, (Dagi)( (- )I < n
-1PI/ZEIZBI < n -IfiI/Z P ,
6 21 #I n -'p2j,
(14.33)
by (6.29) and (14.10). Thus
(DPg^)( ^/2)I
(1
where
c2(a,k)=max(4,21al),
uhf = max{ 1 ,II'll).
(14.35)
By Leibniz' rule for differentiation of the product of n functions, (D*Hj_,g1 )(t/n 1 / 2 ) is the sum of nI"I terms, a typical term being
J
[ f^N, 11 gi\n,2))[D4Igi(112)
_j ) [DPgj(_( ... J,
(14.36)
where N, _ {J1,.. . 'Jr)' 11
(1
Q;=a.
(14.37)
r-i
The number of times the expression (14.36) is repeated among the nIaI summands is given by a1! .. ak l
r
(14.38)
k
II H a, i-1 -I
where a=(a,,...,ak),
$i=(Qri •.• Qik) ,
,
(1
(14.39)
In view of (14.31) and (14.34), the expression (14.36) is bounded in magnitude by exp{ 6 — Z+ blltII 2 } II b1 , BEN,
(14.40)
128
Normal Approximation
where bi=n-IC2(a,k)P2j
IltII (1
(14.41)
< j
Therefore
_ n
(D a 11 9)i)(_/2 ) i I
G 2 c 3 (a,r)exp l 6 — i+ 611t11 2 }' j `(j b I
(14.42)
BEN,
where, for each r, I• denotes summation over all choices of r indices from N=(1,...,n}. Hence
n
_
11 b'! <(^ )-I b;) =(c2(a , k)P2 Iltll )r
sl BEN,
(14.43)
=(c2(a , k)k Iltll )'•
The inequality (14.27) follows on using (14.43) in (14.42). Q.E.D. COROLLARY 14.4 If in addition to the hypothesis of Lemma 14.3 the inequality (14.20) holds, then Da gi)( (
i-I
I z )I^ci(a , k)( 1 +IItfl l°l )exp( — i IItII 2 )
(14.44)
n/
for all t satisfying (14.26). Proof. Use (14.25) in (14.27). Q.E.D.
We also need an estimate for the difference between a polynomial expression in the cumulants of n - '/ 2 (X I + • • • +X„) and the corresponding expression in the cumulants of n - I / 2(Z I + • • • LEMMA 14.5 Suppose that V= I and (14.20) holds for some s > 3. Let v i ,...,v m be nonnegative integral vectors satisfying m
^v,I>3
(1
2 (Iv1 l-2)=r,
(14.45)
i-I
I < r < s — 2. Then Ix,: xi. —x1... x,j < c4(s,k)n-('-r-2)12.,,
for some positive integer r,
' -2) / 2 P,.
(14.46)
Truncation
129
Proof First recall [see (6.14), (6.15)] that for each nonnegative integral vector v, X^ i is a linear combination of terms like µa;,i...µ i
(14.47)
where a,,...,ap are nonnegative integral vectors and r,,...,rr are positive integers satisfying P
r; a; =v
(a ; >0,1
(14.48)
To estimate the difference y, — y i it is therefore enough to estimate 1.... µ
µ« i l
r —' r«i ....µ ,r r «r.J l J ay,J
(14.49)
subject to (14.48) and 3 < I vJ < s. By Lemma 14.1(ii), (s I«I ) /Z0^ (I < II < s). µa.i µail < C5(a)n ✓,s
(14.50)
We may use (14.50) and (14.12) to get µayj...N ^—^^'aiIJ.
..µr^.il
P
<
r.0
l µ .,-iI + I µ '.-lI
( a ) n -(s-da)/20
6
nj,s( 0.J
!=I r^
r.
x µa,j
µ„iµ«,.,,1
...
o4J
'rr
µmil
P
<
c a n
—(—kI)/2
m;/s
(14.51)
^n J,sPs,j '
where c 7 depends on i, a's, and r's, and P
m;=
r; Ia; l ,
r=i
,
—
Ia1I= Iv!
—
Jail (14.52) ,
by (14.48). In the last step of (14.51) we have used the inequalities [see (6.27), Lemma 6.2(ii), and (14.9)] i l/s µaiI < plal,i
(14.53)
130
Normal Approximation
In view of (14.20) one has
11 )11'+A n j
PS.)
<
S
n+ORj.s
(11X;11 4.111)
< nJ/ 2 +n0,,, J < nJ/ 2 + 8k2 (14.54) so that n-(s-larl)/Zp j/J=n-(s-Ia,I)/2P;^VI -Ia,I)/s < n-(s-Iarl)/2n(Irl-la;l)/2 1 + (
8k
< c 8 (a„k)n -( s - I'U/ 2 .
(14.55)
It now follows from (14.51) that Xrj—Xrj
I
and, therefore, averaging over j=1,...,n, ix
— xi
< c 9 (Y,k)Zn -(s-1• I ) / 2 (3 < Ivl < s).
(14.57)
Let now v,, ... , vm be nonnegative integral vectors satisfying (14.45). By (14.57) one has xv, ...
w _^... m C9 ( 1,
k)Q n'5 n -(s-Irrl)/ZlXrr..
. ^i-Ix'ra^...
(14.58)
By Lemma 6.3 (and averaging), Lemma 6.2(iii), and Lemma 14.1(iv) one gets (recall that P 2 - k)
< C10(s k)PI. 1 1 ,
...
p1rr - ^IpjP,.1l
.z)/(=- 2 ) r
...
p1.I
(14.59)
Truncation
131
But because of (14.20), one has n
p, = n -'
p5 , j=I
n
=n '
(f
—
IIXjIIS+O„,i.sl
lll xill
j 1 L
n
{11x:11
< kn ( s -2) / 2 +, s <(1 + 8k)n(s-2)/2.
(14.60)
Using (14.60) in (14.59), one gets ..
IX.,... X;-^
.I < c12(s,k)n (r-II+2)/2 .
(14.61)
Substituting this in (14.58) we obtain (14.46). Q.E.D. The above estimates also lead to LEMMA 14.6 Assume that V=I, and that (14.20) holds for some s>3. Then for every integer r, 0 < r < s — 2, one has n -r/2 V ' ( - 4: {X.})(x) — Pr( - 4o,n: {XP'})(x)I < c r k s 0 n - (5 -2 )/ 2 (1 +lixIl 3 T+ 2
2 xexp{ — 11611 +IIXII}
)
(xERk),
(14.62)
and 2IP,(—$: (X,.})(x+a„) < ci4(r,k, ․ )in,5n
—
-
Pr(
: {X.})(x)I
3.+1 ) C, 2)/2 ( 1 + 11x11 -
xexp(-211x112+ 8
where
—
1Ik /Z }
(xER'`),
(14.63)
)J a„=n - '/ 2 EYE . j—1
(14.64)
Normal Approximation
132
Proof We first obtain an auxiliary estimate. For z E C k write k
g(z)= l -(DetD)-1/2exp^ i 2 z?
-
i-1
i
d11z,z1 1
Iz=(z 1 ,...,Z k )}.
Then k
z?-i drrz1z,.
g(z)I=I(DetD )-1/2exp{i
i=1
1
-1-
i,i'k
III(IIZ11 2 )}
+zlID -1- III(IIzII 2 )exp{IID -1- f II(IIzII 2 )} Z=(Zl,...,zk)EC k ],
(14.66)
where Det denotes determinant and d" is the (i,i') element of D -'. By Corollary 14.2 [replacing t by Bt in (14.18), where B 2 = D -1 ],
- II 1 11 2 1 <2kn-(:-2)'2pn,,IID -I II(IIt11 2 ) <3kn -(s-2)2
p^S11 1 11 2, (14.67)
which implies II D
-1 _ I II < 3 kn -(s-i)/2 „ ^ <;68) . (14.
Also, expanding DetD and making use of (14.8) and (14.20), we get I(DetD) -1 / 2 - l l = (DetD) -1 /'^ I -(DetD ) 1 / 2 I <(Det D ) -1 / 2 t1-(Det D ) 1 j 2 I(1 +(Det D ) 1 / Z) = (DetD) -1 / 2 11- DetDj 4 II D -' ll k / 2 11- DetD l < c15(s,k)L3n-(s-2)/Z
(14.69)
The estimates (14.68), (14.69) are now substituted in (14.66) to yield z _ 14311 1 ^g(z)I
(zECk). (14.70)
Truncation
133
Hence max
I8(z)i
(.-, e& , II: •- zIl
(Ilz11+ 1 ) 2
l
3
J
II 311 2
< c17(s,k)On,5n-(s-2)/2(1 + Ilz11 2 )exp j
+ lizII .
J
l (14.71)
By Cauchy's estimate (Lemma 9.2) one then obtains
I
IDg(z)l
From this follows D' ($ — $o,v)(x)1= I E c19(a)(D a $)(x)(D' -a8)(x) 0
/z(l+IIXII2)expl—
I-
(xER k ).
+11x11} (14.73)
Now recall from Section 7 that [using (7.3) and Lemma 7.2] P,(-0: {x)) r
m^ I
m
`
I ^*
(^**
J^,....Jm
^
vi....,v,,,
^
I• •
m•
(-1)r+zm Ll
P,+...+VT
^I (14.74) J
where X* denotes summation over all m-tuples of positive integers (j l ,...,jm ) satisfying m
j
i
=r,
(14.75)
f=I
and X** denotes summation [for fixed (j l ,...,jm )] over all m-tuples of
134
Normal Approximation
nonnegative integral vectors (v 1 ,.
''m)
1v1 1=j; +2
satisfying
(1
(14.76)
The function P,(-4 o,0 : (c)) is similarly defined by replacing X • by X • and 0 by 4 o.o in (14.74). Next observe that
I XYI...X• D•.+...+.^$(x)_ ... .D•l+...+'.^o,o(x)^ II I ... Xy Dr,+...+P-
(O(x)-0o,a(x))1
+J(X•....X• —X.,...x. )D°^
+...+
'"4,(x)I
(xER k ). (14.77)
By Lemma 14.5 ... Xyj < JXP,... XP_ _ X: 1
c4 ( s ,k) n -( s -r-z) / 2 Q
(14.78)
Also, as in (14.61), one has 1Xy^
...
(14.79)
X^j 4 c21(s,k)n"/2.
Using (14.78), (14.79), and (14.73) in (14.77), we have n(x)l I X•I .. . X. DY^+...+ .4(x )—x , ... X;.D•"+...+ .'Po.
II 6112
+IIxII}
(x E R k ). (14.80)
The inequality (14.62) follows from this and the expression P, given by (14.74).
To prove (14.63), first note that by Lemma 14.1(ii) Ila,ll < n -1/2 I IIEY;II < k 2En,5n-c,-2)/2 <(8k" 2 ) - '. (14.81) I=I From the expression (14.74) and the inequality (14.79) (which also holds if the primes are deleted), we get {})(x+a„)
—
P,(
-
4: {X,.})(x)i
Truncation
135
But D'4 =q"çt,, where q" is a polynomial of degree Iv( with coefficients depending only on v and k, so that ID"4(x+an)—D"-O(X)I
But by (14.12) and (14.81),
I(x+an) °— x°I
(xER k ), (14.84)
and I(x+an ) ° [4(x+an )-4(x)]I
= (x+an) ° -O(x)Ijexp{
—
illx+anII 2 +illxIl z } -1 ^
(x+an) a 14(x)IIanlI(I+11x11)exp{IIa.II(l+11x{1)}
-(,-z)/z
(l+IIxII kN1 )exp{ —
8
k'11
2 }• (14.85)
Relations (14.84), (14.85) are used in (14.83) to yield ID"¢(x+an )
1
—
D"4(x)I
-(5-2)/2
Xexp{ — zIIxII2+
(l+llxII'
81k / z }.
1
)
(14.86)
Finally, (14.86) is used in (14.82) to get (14.62). Q.E.D. A bound for absolute moments of sums of independent random vectors is needed to prove the important Lemma 14.8. This is provided by LEMMA 14.7 Suppose that V=I and (14.20) holds for an integer s)2. Then there exists a constant c27 (r, s, k) such that EIIX,+... for I< r < s.
+X n llr
136
Norma! Approximation
Proof. In view of Lemma 6.2(ii) it is enough to prove the lemma for r = s. For r = s = 2, (14.87) is trivially true. Let us then assume that s > 3. Also by (6.26), z
n
k
l/2
EIIX1+••• +XII'=E E l i-I !
—
/
i
k
E l
s
x1 j ,
i- 1 [Xi=(XJ.i,...,XJ•k)],
(14.88)
so that it is enough to prove the lemma for k = 1. If s is an even integer and k= 1, then by Theorem 9.11, s-3
n-,/zE(X1+... +X.) 5 <
1
n -m/2D sgm( 0) +cza(s) n
-
(: z)/2p,, (14.89) -
m—O
where
r z l g(t)=Pm (it: (7t,))exp( 21 }
(tERI).
(14.90)
By Lemma 9.5 (with k = 1, a = s, z = 0) and inequality (14.60), one has D'Sm(0)1 < c29(m, ․
)P;'/(,-z) < c3o(m, ․)n m/z '
(0<m<s-3).
(14.91)
The lemma is proved for even integers s. Next assume that s is an odd integer, s> 3, and k = 1. Clearly, EIX1+••• +X<EIZ,+••• +Z n l' (14.92)
+EI(X 1 +••. +X„)'—(Z 1 +... +Z„)'I.
By Lemma 14.3 [inequality (14.27) with k= 1, a = s + 1, :=0], EIZ,+••• +Z„I°<(E(Z 1 +••• +Z„): +1 ) < c 31 (s)n'/ z .
s
+1
(14.93)
Truncation
137
Next, EI(X1+••• +X,,)s—(Z1+...
+Z,,)uI
n EI(X 1 + • • +Xj +Z11 +••• +Z n )
<
3
j-1
—(X1+... +Xj—I+Zj+••• +Z,) 5 I n
44
s
1 7. (,) E I( XX — Zj)(Xl +... +Xj—I+Zj+I+... •}•Z ) sm I j=1 m—I n
s
s
+... +Xj—I+Zj+I+... = 2 Y. ( ) E I X7 — Zj IEIXI
+Znls—m.
j-1 m-1
(14.94)
By (6.26) and (14.54) we have E IX 1
+•••
+Xj—I+ZZ+I+...
+Znls—m
62s—m—I (EIXI+...
+Xj+Zj+l+... +Z nl s—m+EIXj Is—m)
<2s—m—lE•IX,+... +Xj+Zj+l.^... +Znls—m +2s-m- ► Bn(s-m)/2.
(14.95)
Writing Tr.j=n-I(EIXII'+... +EIXjI'+EIZj+II r +
...
+EIZ,,I'), (14.96)
one has, using Lemma 14.1 and inequality (14.20), n
T 2•j
EZj +1 EYE +; <EX^ +; ,
(14.97)
j-1 Tz.j — n -1 E^Xi+
... X)
—
E(Xj+I — EZ^+I) —... —(Exn—EZn)]
=n -I [n — E (2— Yj+l) —... —E(Xn—Y^) — (EYj+I)'— ... — (EYn)2]
138 Normal Approximation n
> I-n' I n-(s-2)/20
n
n,j,s
-n-I E n -(s-i)Az
n j,s
j_ j-1 )
I
-
20 n s n —(s-2) / 2 ^4^
7rJ
IXII'+... +EIXjI ' + 2 '(EIXj+I I ,+... +E I X nl ' )]
Look at the random variables
Xi j, l
W = ;
T2 (14.98)
the j+l
T2 j
Applying Theorem 9.11 (with k= 1, a=s-1, t=0) to these random variables, we have n s—m
E ^ W ^
5-1 (s —m) / (s —I)
n
< (E (i -I W1 /
'
I)/2)(3-m)/(s-1) <(c32(s)n(s=C 33 (s,m)n/ 2 (1 <m <s).
(14.99)
In deriving this we have proceeded exactly as in the proof for even integers s. More explicitly, we have used Lemma 9.5, inequalities (14.89), (14.91) with W's replacing X's, rs ,j /-r2j 2 replacing ps , and inequality (14.97) and the modified polynomials Pm corresponding to the cumulants of W., 1 < i < n. Hence EJXI+... +Xj+Zj+I+... +Znls—m
_
T
(j
n s—m
m)/ 2 E)
W,)
(C34 (s,m)n(s-m)/2
(I < m < s).
(14.100)
Using this in (14.95), we get EIXI+... +Xj-1+Zj+1+... +Znls—m
< c 35 (s,rn )n (s
-
m)12
(1 < m < s).
(14.101)
Truncation
139
Also, by Lemma 14.1 and inequality (14.60), n
n
I EX; Z;I< Y. (EIX; —
l-1
—
y;I+IEY;I)
l—1
—
J
_ (
Ox 1>n 1 / 2 ) ;
IX'I+IEYJI)
n
44 1 (n-1/2EX2+n-'/2)=2n1/2, J-1 n
n
2 EIX^—ZZI<2 2 EXj=2n, n
n
I EIXj —Z; I< 2 (EIX,I.^+2mEIX,Im)=(l+2m)npm l- 1 l-1 <(1 +2m)npsrn 2)/(5 - 2)
( (1 + 2m)n (Qn(s-2)/2)(m-2)/(s-2) 8 = C36 (s,
m)n'"/ 2 (3< in < s).
(14.102)
Use (14.101), (14.12) in (14.94) to get EI(X 1 + • • • +Xn)5—(Z1+ • • • +Z„) s l < c37(s)n
2 .
(14.103)
Together with (14.93) this implies EIX,
+
...
+Xnl s < c38(s)ns/2,
(14.104)
completing the proof of the lemma. Q.E.D. Before we state the next lemma, we define On.s(E)=n ' ^, -
IIX^IIs
(e>0).
(14.105)
IIX^i>fn( 1/2) l—1
In this notation (14.106)
Let Qn denote the distribution of n - '/ 2 (X 1 + • • • +X n ) and Q„ that of n
- "/2(y l + ... +Y).
Normal Approximation
140
LEMMA 14.8. Let V = I. If p 6 < co for some s > 0, then there exists a positive constant c 39(s, k) such that
II Qn — Qn II < C 39(s
k)tn,3n -t'-2>/2.
,
(14.107)
Also, there exist two positive constants c4o(s,k), c 41 (s,k) such that whenever .3 )
(14.108)
for some integer s> 2, f II xII'I Qn — Q,','I (dx) < c4,(s,k)On.,n-(s-2)/2
(14.109)
for all rE(0, ․). Proof. Let G^ denote the distribution of n - '/ ZXJ , Gd " that of n 1<j
and
I
n
Q.—Q..1'II = G,*...
H !i—'
*G1— ,*(G 1 —G")*G'+,*... J J
aG,'
I
n
n
<
IIGG — Gd"II= 2 Y. P(II XxtI> n1/2 ) n
IlXjll'=2A-nsn-c:-2>/2, (14.111) <22n - s/ 2 f (IIXjII>n112) i-I
which proves (14.107). Now assume that s is an integer, s>2. Since IIxII' < l + Ilxlls for 0< r < s, it is enough to prove (14.109) for r=0 and r = s. The case r = 0 is precisely (14.107). We therefore need only to prove (14.109) for r=s. One has
f IIxl IQ. Q,'I(dx) 5
—
<
f Ilxll'IG.*
..
.
*GG-i*(G1
—
Gi )*Gi+i* g
...
*G,. I(dx)
n * G,, (du ) <j-21 f (f ►I u+U II sIG^— GJ'I(dv))G,*... *Gi-,**... n
<2s"' 2 (ii
—
Gj"II JII uII 3G,* ...
*Gj-
*G;+I*... * G„' (du )
i-1
+ f iivll 'IG; Gj' I(dv)). —
(14.112)
Truncation
141
Now
f IIvlIJIGj—Gj"I(dv)= f
( [Ix; 1 1> n'lz)
IIn
-i / 2 XJ IIs=n - s/ 2 0 n'j. ,.
(14.113)
Also,
f II uII G1* s
...
*G^-,*Gj+,*... * G„' (du )
=EIIn-1/2(X,+... +X1-^+Y;+,+
...
+Yn)II 3
<2s—'( Elfin — /2(X, + ... +X; +Y;+i +... +Yn)Ii 5
+IIEIIn — " 2 X11_, ) < 22cs—u(E IIn —/2(X, + ... +X1 +Z1+ ,+... +Zn)IIS +IIn -1 / 2 (EY^
+i
+... +EY.)II')+25- 'Eli n -1 / 2 X1 II'. (14.114)
By Lemma 14.1,
I I n "/2(E Y^ + , + ... + E Y1) I IS $
O l =(k2n-c:-2)/20 ) . (14.115)
< ( n -1/2ki/2n -c:-u/2 1
nJJ
J
J
l^^
By inequality (14.54), Elln
-
1/2
X,II s
—
s/2 (n s/2
+
< 1+n —(s-2) / 2 0„ s . (14.116)
We next estimate E IIX, + • • • + Z,, Its by using the preceding lemma. For this assume first that k = 1, and define T,.^ and random variables W as in (14.96), (14.98). Note that ;
IW;IJ=TZ fLIwi >n ) J
5/2
112
f
(/
llx,I>T=i'2&/2}
f I W. I S=T—s/2 2.i f{Iw,I>nhI2) , {IZ,I>*2 /
NIS
(1 < i <.j),
IZ I s /2
)
,
< T2 s/2 f
t IY;I> T,14i2n112— IEY;I }
lye's
+IEY1I 5P(Iv11>T2 2n"/ 2 —IEY I)] 1
(j+l
(14.117)
142
Normal Approximation
Taking c40(s,k)< 1/(8k), one has [by (14.97) and Lemma 14.1) T
EY ; I < n -( s -1) / 2 A
"
J .s
Z 2 >(4) l/2,
n' /2 < n -(s-3)/20 " S < g TZJ 2n'/ 2 —IEY 1 I >n" 2 ,
2n'/2 n'/ 2 )'p (IY;I> 3 ) IEY1I sP(IY;I>T24 2n' /2— IEY;I)<( 8
( nl/2)S ( 2n'/2 3 ) < g PIXrI> flh/2
(
)
I
2 fl 1 /2
)
—J
f
IX;Is > 2n' / 2 /3 )
(jXjj
1
S
{IX j j>2n'/ 2 /3) IXiI,
fIY;^J < f {IY I>r1t 2n'/ 2 —IEY I) ;
;
{IX;I>2n'/i/3}
^X;Is.
(14.118)
Therefore, choosing c, a(s,k)=(,1) , / 2 (l6k) - ', one has n -'
"
IW;I s < T2^S/22a n.^(s) < f {IW , I>"' 12 )
n (r — 2)/2
(14.119)
8
so that Lemma 14.7 may be applied to yield EIX,+••• +X^+Z,+••• +Z,I
$
= TZ j 2E I W, +... + W IS .
EIW J +
+W„I 5
(14.120)
For arbitrary k, apply this estimate coordinatewise to get EIIn — './ 2 (X 1 +... +X^+Z^ + ,+... +Z n )11 3
Truncation
143
The inequalities (14.115), (14.116), and (14.121) now show that the left side of (14.114) is bounded by a constant (depending only on s and k). This fact, together with the estimates (14.113) and (14.111), now gives [on substitution in (14.112)]
f II x II s !Q. — Q,'J(dx) < c44( s,k ) t
s
n.
(14.122)
Q.E.D. 15. MAIN THEOREMS
We are finally ready to prove the two major theorems of this chapter. These provide bounds for I f f dQ n — f f d (Dj over (essentially) all 4continuous functions f that are integrable with respect to Q n under given moment conditions. The bound provided by the first theorem, which is much more useful than the second, cannot in general be improved upon, and the hypothesis in it cannot be relaxed any further (at least if we are to have bounds involving moments alone). The utility of the second theorem is explained later. We continue to use the same notation as in the preceding section. For ease of reference, recall that X 1 ,.. .,X are n independent random vectors with values in R k satisfying n
EXj =O
n I Y, Cov(X1 )=I,
(1<j(n),
-
(15.1)
j -1
I being the k x k identity matrix. The moments and cumulants of XD 's are denoted as in (14.1). In particular, n
p3 =n -1 Y, EIIXj IIs j=
(s>0),
1
n
[vE(Z+)k],
X.=n ' ^. XY.i -
(15.2)
j=1
where is the with cumulant of X. The corresponding quantities for the centered truncated random vectors Zj (1 < j < n), defined by (14.2), are denoted by ps, . Recall also the notation n
D = n -'
Cov (Z^). j-1
(15.3)
144
Normal Approximation
We let Q, Q,,, Q„ denote the distributions of n -1 / 2 7-_ 1 Xj , n -1 / Z E^_ 1 Zj , n -1 / 2 _ 1 Y1 , respectively. As usual, '1" denotes the standard normal distribution on R k , and 4V is the normal distribution with mean a and covariance matrix V. For a real-valued function f on R" define M,(f)= sup (1+IIxIr)-llf(x)I if r>O, xER k
M0(f)= sup If(x)—f(y)I=wj(R" ).
(15.4)
x,yER k
If v is a finite (signed) measure on R k , define a new (signed) measure v, by if r>O,
v,(dx)=(1 +IIxII')'(dx) o = v.
v
(15.5)
As in the preceding section, write n
^n,sl E ) —n-1 f
IIXi 3
^ns=^ns(1).
(15.6)
j-1 01Xj11>cn'/2)
Then define n
inf A* s =O
O n ,,(E)+En
-
'
f
(15.7)
i=1 (jjXill
Note that
Ons
On.a—°On,s(1)
(15.8)
If one takes E = n - '/ 4 in the expression within brackets in (15.7), then one gets n s
n.s(n —1 /4) + n —1 / 4 ps .
(15.9)
Thus if X 1 ,.. .,X,, are the first n terms of a sequence of independent and identically distributed random vectors, then n -(s-2)/2
* ., = o ( n -(s-z)/2)
(n—*oo).
(15.10)
145
Main Theorems
Finally recall the average moduli of oscillation wJ and wJ defined by (11.23), (11.24). THEOREM 15.1. There exist positive constants c (i = 1, 2, 3, 4) depending ;
only on k and s such that if T
(15.11)
h.s^ 3) < c1n(s-2)/2
holds for some integer s > 3, then for every real-valued Bore/-measurable function f satisfying Mr(f)
(15.12)
for some integer r, 0< r < s, one has
J
^
- s-2)/2
+2wg (C4P3n
—
'/2
p*.s : (p+)ra)•
(15.13)
where s-2
_ I n - m/ 2 Pm (-(D:(}), m=0
and _Jr
r°
g(
)
if r is even,
r+1 if r is odd,
(I+jjxjj
ra
)
-
i
f(x)
if r>0,
-
x f(x )
if
(15.14 ) r=0.
Proof The constants c's appearing in the proof do not depend on anything other than r, k, s. We write s-2
m/2
m=0 i//" (B)=1^I(B+a n )
(BE^'ilk),
(15.15)
146
Normal Approximation
where n
an =n -1 / 2 E EYj .
(15.16)
Thus'" is the signed measure induced from iji by the map x- *x — an . Now
f fd(Q,,
—
i )I
—
Q.')I+ f fd(Q,^'
—
^G)I.
(15.17)
If r>0, then (xER k ),
If(x)I <Mr(f)(l+11x11')
(15.18)
so that using Lemma 14.8 one gets (15.19)
f fd (Qn — Q,^' )
Since the left side does not change when f is replaced by f— c, where c is the midpoint of the range of f, (15.19) also holds for r=0. Next
I f fd(Q,,' — *p)I =I f fg,d(Q.' p")I
+Ifa,d( 1P' — ip)l+I f fd('G — p")I , (15.20) where f, denotes the translate off by y; that is,
f
y
(x)=f(x+y)
(xER k ).
(15.21)
By Lemma 14.6,
f fg,d(0'-0)
<Mr (f) f (l+llx+anll')I ' - 4I(dx) <M,(f) f ( 1 +2•llanIIr+2'llxlI')Ip'
-
4I(dx)
Mr(J)I110 '- 7'lI+ 2' llanll ' ll4 '— o11+ 2' f 11x11 ' 14' '- 4'I(dx)J
C6Mr
)Q
(f
..sn-(s-2)/2.
(15.22)
Main Theorems
147
In the last step we also used inequality (14.81). Similarly,
ff.d(P-'P")
-(s-2)/2
(15.23)
Therefore (15.17) reduces to [in view of (15.19), (15.20), (15.22), and
(15.23)]
f fd(Qn
-
4')I
To estimate the second term on the right of (15.24) we introduce a kernel probability measure K on R k satisfying
K((x:llxIj3, f IIxII
K(t)=0
k +s+ 2K(dx)
if 11111 c 9 (tER k ).
(15.25)
One construction of such a probability measure is provided by Theorem 10.1. For c >0 define the probability measure KK by K,(B)=K(E
-
'B)
(BE, E 'B=(E 'X:XEB)). (15.26) -
-
Then one has
Kc ((x:IIx)I<E))>4,
Kc (t)=0
if Iltll> E9 .
(15.27)
Now
f f;
d
(Qn
')I=I f (I+Ilx+a.II To)
-!f
(X+an )
X (I + II x+ anll '°)( Qn — 4 ' )( dx )
1 1 (1 + tx + aIl'°) ' f(X+an)(l+11xII'°)(Q,,
—
+M,°(f) f IIlx+a„II'°_11x11'°I(Q,+j,"I)(dx),
^')(dx)I
148 Normal Approximation
f IIX+a,.II'°
—
Ilxll'°I (Qn+I^F'I)(dx)
+ f ((8k'/2)-.°+'+IIxII'°-I)IiI(dx)
+ f (( 8 k 1/2 ) - '°+i +IIxII'° - ')I^Y'—PI(dx)I < c I00n -(s-2)/2 ,
(15.28)
by Lemmas 14.3, 14.6 and inequalities (14.12), (14.81). Hence [see (15.24), (15.28)]
f fd(
Q —,p)I
)Zn,,n-(s-2)/2
+ f 8(x)( 1 +11x1('°)(Q,, — +G')(dx)I
(15.29)
if it is noted that
M,° (I)=
sup
(l+IIx11'°) -I I.f(x)I
XER k
<M,(f) sup ( 1 +Ilxll')(I+IIxII'°) - '<2M,(f).
(15.30)
XER k
By Corollary 11.5 one has for every c >0 f 8 (x)(l + lI xJI'°)(Q — ')(dx) ;
< 2 M,° (f)II(Q.
—
ip'), *K II+ 2 ;( 2 € : (¢'+),0),
(15.31)
where (Q,'—i').°(dx)=(l + II xll'°)(Q.
-
4 ')(dx), ,
( ,G' + ),° (dx) = ( 1 + IIxII'°)4G' + (dx).
(15.32)
149
Main Theorems
Now choose e = l6c 9p 3 n - I / z .
(15.33)
By Lemma 11.6, (Q,H*P'),-*KcII
where the maximum is over all pairs of nonnegative integral vectots a, /3 satisfying 0
(15.35)
and the integration is with respect to Lebesgue measure on R". Since
K (t)=0 for Iltll>c 9 /e=n 1 / 2 /(16p 3 ), and (
( DPR<)( t )I < (elsllxflIK(dx)
(15.36)
one has
f
I " (Qn — ' ')'
D'KK
=
IDO(Qn—')(t)Idt.
(15.37)
(11:11
Note that [see (14.60) with s=3] a is bounded above by a constant because of (15.11) [whatever the constant c one may choose in (15.11)]. This fact
was used in (15.36). Now
I
— ^G )(t)I dt
II^II fl"2
k+s—I
" D
Q^(t)— 2 n - "'/ 2Pm (it:{X))exp(-2(t,Dt)) dt m=0
A.
k+s—I
+ f D"
m/ZPm(it:{ ))exp{—Z(t,Dt>) di m=s-1
+ fA^ ^(D"Q„)(t)Idt+ fA^ I(D°5P')(t)Idt=1,+1 2 +1 3 +1 4 ,
(15.38)
150 Normal Approximation
where, writing A for the largest eigenvalue of D,
Aga
A-1/z1,
11 1 11< C14( ) ^Ik+s+z
i/z
11th < i }\A,
=
n
EIIBZjhI m (m>0),
,lmmn -I
(15.39)
i-1
B being the symmetric positive-definite matrix satisfying B 2 = D - 1 . By Lemma 14.1(v) (with s'=m) and inequality (14.21) (Corollary 14.2), one has
IIB(I<(;)^/2,
A<;,
11,,, <2m (3) m/2 n
(m—s)/2A* r^
n—(m-2)/271m <2m(3)m/2n—(s-2)/2Qn's
(m>s+1).
(15.40)
Hence, putting m = k + s + 2, we •get n (s-2)/2I/(k+., + 2 )
Aj Iltll
.
An.t
)
I
(15.41) .
If we choose c, a to be the constant c 22 (k + s + 2, k) appearing in Theorem 9.11, then we get [using (15.40)] l i < C 7lk+.s+2n -(k+s)/2 < c 17 n -( s -2) / 2 ^s
(15.42)
For this we have used Theorem 9.11 and the remark after Theorem 9.12. Also, by Lemma 9.5 (with r = m, s = m + 2) and Corollary 14.2, one has [still using (15.40)]
f IDa[n
- '°/ 2Pm
(it: {X;})exp{— I}]Idt
< c 1s n —( s -2) / 2 06 s (s— I <m < k+s— 1),
(15.43)
whereas (14.10) yields n -m /2 Pm+2< 2
m+2 p2= 2m+2
k
(1 <m<s-2).
(15.44)
Main Theorems
151
The inequality (15.43) immediately leads to
12 < c19n - (,- z) / z O*. J .
(15.45)
Next use Corollary 14.4 to get [see (15.39), (15.41)] 1 3= fAw ID a QnI
(
„^-:^,:
)I/(*+2) }( 1 +Ilill l ° I )exp{ — illtll z }dt
j II 1 II>cis l
< c21 n zz
^w,i
n,s I f I t .11 ^,s•
k+ ^ + z (
1 + II t II I°I )
exp { - u II t II z } dt (15.46)
A similar calculation using (15.44) gives 14 - fA ID°5P'I < c 23 n -(,-2) / 2A*,,.
(15.47)
The estimation of the right side of (15.34) is complete, and we have
II(Q, - 4, ')' KI
(15.48)
Finally, by Lemma 14.6, IW8 \
2E : (tP')') — Wg (2E : (4 + ),) < sup J W (x : 2c)I(4,'+),o—(4'+)rjj(' ) ,
yER"
< 2 M,(I)ll(4" ). - (4)Il < C25Mr (f )On,fn
-(:-z)/z.
(15.49)
Now use (15.48), (15.49) to get an estimate for the right side of (15.31) and then use this estimate in (15.29) to complete the proof, noting that M,0(f) < 2 M,(.l), 0 , < Q.E.D. Q.E.D. Remark. In view of relation (7.22), JId(Qw
-
P)- f (I-c)d(Q,^- P) ,
for all c ER'. Hence M,(J) in (15.13) (and in all subsequent theorems involving it) may be
replaced by M, (J)= inf M,(f—c). eER1
(15.50)
Normal Approximation
152
COROLLARY 15.2. If p3 < oo, then for all bounded Borel-measurable functions f one has
J f fd(Q.-)I
k
)p 3 n - '/ 2 +2wf (c aP 3 n -1 / 2 :4?), (15.51)
where c 26 depends only on k. Proof. The corollary follows from the above theorem if one takes r = ro =0, s=3, and notes that in the present case the condition (15.11) is unnecessary. For if p3> 0 ,,.3{i)> c1n 1/2,
then
f fd(Q,
-4i )I
<2w1 (R' )< -- w1(R k )P 3 n -1 / 2
Now (15.13) yields [using (15.8)]
If
fd (Qn
- 0 - n -'/2P,(-,t:
{x,)))I
<wf(R k )n -1/2 A.3+ 2wf ( c4P3 n-1/2 : 0+ In -h/2Pi( -0 : {X,•))I) <w^(Rk)n-1/2p3+2wf(Rk )n -x/2 11 P1( - c:
{))II +2w7
(c4 P 3 n -1
/ 2 :4j).
(15.52) Lemma 13.1 does the rest. Q.E.D. COROLLARY 15.3. If p3 < oo, then for all Bore! sets A one has IQ.(A) -4 D(A)I
sup 40((aA)''+y),
(15.53)
yER h
where 71= c4P3n -
2 (15.54)
Proof. This follows immediately from the preceding corollary if one takes f - IA (the indicator function of A) and recalls that [see (2.40)-(2.43)] wf,, (X: 71) =
'A'\A
- I W=ltaA)'(x)
(xERk, ACR k , 71> 0 ),
w^^(rl:I)= sup f wj (x+y:rl)0(dx)= sup 4((aA)+y). (15.55) yER * yERk
Q.E.D.
Main Theorems
153
Remark. Suppose A is a Borel set such that ''(A) -0. With r—r0 -0, (15.13) then yields IQ.(A)-4(A)I
yE R'`
(15.56)
4 + (( aA) +y), "
where n— c4 p3 n -1 / 2 . If, in addition, A is a set having a small diameter, one may use inequalities (11.25) of Corollary 11.5, rather than inequalities (11.26) as used in (15.31), to get, instead of the term Mo(1A)II(Q.- 4')•K.II (in (15.31)), the expression y•(IA.:c)
where A denotes the Lebesgue measure on R k , and h„ is the density of (Q„-4/)•K1 With a-0 in (15.38), (15.42), (15.45)-(15.47), one obtains .
Ih.(y)I
i-])
/2
(yERk ),
p..r
and therefore
IQ. (A) — C(A)I
-( f
-2)
/ A:,Ak(A')
4 *((aA)"+y),
+2 sup
(15.57)
yERk
with c - c9 p3 n -1 / 2 , rl — c4 p3 n -1 / 2 . We make use of (15.56) and (15.57) in Section 17 (Theorems 17.4, 17.5).
Theorem 15.1 provides sharp bounds for a wide class of functions f. This is seen, for qxample, by comparing it with the more specialized asymptotic expansions in Chapters 4 and 5, or with the general expansions for trigonometric polynomials f as provided by Theorems 9.10-9.12. Section 17 contains several applications. However note that the right side of (15.50) (Corollary 15.2) does not go to zero (as n -'oo) for every bounded II'-continuous f when Q„ is the distribution of n '/ 2 (X 1 + • • • + X) and (X,, : n> 1) is a sequence of independent and identically distributed random vectors with zero means, covariance matrices I, and finite third absolute moments p3 . The reason for this is that the bound is in terms of wf [and wf (R k )] rather than wf . Recall that [see Lemma 1.2(iii), Theorem 1.3(iii)] that f fdQ„---. f fd4 for all bounded c1-continuous f and that a bounded f is (D-continuous if and only if -
lim wf (e:1)=0
{wl ( e:0)=
f
wj (x:e)(P(dx)l.
On the other hand, it is not difficult to construct bounded 4>-continuous functions f such that wj (e does not go to zero with e. For example, let A I , the indicator function of the following Borel subset A of R'. f=
: c)
((,-I)/21
A=U
U r-2
i—i
jxeR 1 :r+? <xr+ r !
(2i+1) r
,
Normal Approximation
154
with [(r— 1)/2] denoting the integer part of (r— 1)/2. It is easy to see that w^A (e:4i)= sup 0((8A)`+y)=1, yERk
W1A (E:4))=4D l (aA) ) <(27l)-1/2 ^ rexp t —
r2 11
'
1
,-
I
J E
(t>0).
It would be ideal if one could replace wj in the bound (15.51) by w . Unfortunately we are unable to do this. The situation is partially salvaged by the next theorem, which provides an effective bound for every bounded 0-continuous f. However a price is paid for greater generality. For "nice" functions and sets there is a loss in precision. On the other hand, apart from its greater generality, this theorem is more useful than Theorem 15.1 for estimating tail probabilities (see Section 17). !
THEOREM 15.4. There exist positive constants c; (i = 1, 2, 3) depending only on k and s such that if A,,,,(i) < cin t
s -2) / 2 (15.58)
holds for some integer s> 3, then for every Bore/-measurable function f satisfying Mr(f)<00
(15.59)
for some integer r, 0 < r < s, one has
f fd(Q„
—
'G)I
+),
(15.60)
where s-2
0 = I n (s 2)/2Pm( -
Y
m-0
-
-
4):
(x.))'
and 'q=ciP3([logn]+ 1)n -1 / z ,
(15.61)
where [logn] is the integer part of logn. The quantities 0 (3) , On.s are defined by (15.6) and (15.7), respectively.
Main Theorems
155
Proof We continue to use the same notation as in the proof of Theorem 15.1. Define , ', ¢" as in (15.15). As in (15.24), one has J fd(QQ - 4i)I
- c ,-2) / z +Iffd(Q '—O')I. ,,
(15.62)
Choose a probability measure K' on R k satisfying K'((x:IIxII<1))=1,
IDCK'(t)I
(15.63)
Such a choice is possible by Corollary 10.4. For E >0 and a positive integer p define K' (B)=K'(e - 'B)
(B E J^ k ), (15.64)
and note that KK. P ({x:llxll
By Corollary 11.2, with µ = Q,,, v = 4', KK = KK , P , and f=4' one has f J^,d(Q^—^'')I
(15.66)
where, by definition of y [see (11.8)] and of M,(J) [see (15.4)] y(f,:p€)<Mr(f)( 1 +IIxII +Ilan ,, + PE) r i(Q, - 4i')'Ki.PI(dx)
<M,.(f) f (C3o+ 2'llxll'+P'E')I(Q, - 41')*KK. P I(dx) < C31Mr(f)[
la+^^ m
ax
IID a (Qn
+p'€' max
Ia+flk
—
II D ° ( —
(15.67)
If r=0, then the first inequality in (15.67) holds because 4'(R k )=V(R k ) =1 [see (7..22) and (11.13)]. The last inequality follows from Lemma 11.6. In view of (15.63), one has
IID° (Q' ^F')'D"K.. P II1 -
I-6I
( ID ° (Q, — i')(t)^exp(
— PIIEt11 112 )
dt.
(15.68)
156 Normal Approximation
Now choose e=c33p3n-1/2,
p=[logn]+1,
(15.69)
and write
f
(15.70)
where, as in (15.38)-(15.47),
I1 <
^,:^
34 k+s-1
f ID a [n - '"/ ZPm (it:{x;})exp{-2(t,Dt)}]Idt
I2- m—s-1
< c n—cs-2)/2Q. n.5
3s
13 < c36n -( s - z ) / 2 A*n,s9
la'= f
I(D ° '')(t)Idt
Rk
,-z>/2 ^.
Is =I(D"Qn)(t)Iexp( — pIIE'II 1/2 )dt { II:II > n" 2 /(16p3) )
(1III>n'/ 2 /<^^))
exp(—plfEtlll/2)di.
(15.71)
The last inequality follows from Corollary 14.4 if one notes that 1 ID°Q„(t)I<EIIn- / 2 (Z1+... +Z,,)II 1 ° 1
<(Elln-1/2(Z1+... +Z.n)II"' I-I/mI, (15.72) ,)
for every even integer m'> Ial, and that for an even integer m', E lI n -1 / 2(Z I + • • • + Z„)II"') may be expressed in terms of ordinary (as opposed to absolute) moments that are estimated by putting t = 0 in
Main Theorems
157
Corollary 14.4. Also,
{IIII>n'/ 2 /(16v,)) xp { —
PIIEtII
1/z
} dt
=E ke xp{—PIItII112}dt (II111>c33/ 16 )
=C 39C
—k ^ 00
u k-I exp{—pu 1 / 2 }du
C33/ 16 2 =2C 391/EP)
kf
2k—I
w
expt
P(c33/16)i/2
exp{—v)dv
—v
(15.73)
}dv.
2
Now choose 2
s 2 2 )
C33 =64 (k+
(15.74)
to get (s 2)/2 < c 42 n IS < c4l(n1/2log2n ) kn -
-
-
-
(3 - 2) / 2
0 ,.
(15.75)
The last inequality in (15.75) derives from [see (14.7), (14.10), and (15.58)] n (s+1)/2
n -1/2c43
n ^I f IIY3II -
2
j-1
n
IIljII
s+
^ s0* . .s
(15.76)
J-1
Hence II D ° (Q^—^^)'DQK^.P^I1
c44(PE)II'In-cs-2)/2pn.s'
(15.77)
which leads to Y(fa :pE )
k+r+l
)n
—(s-2) / 2
^^ s .
(15.78)
158 Normal Approximation
Finally, by Lemma 14.6,
w^w (2pE:(¢') + )=w/ (2pe:0')+
f w/^(x:2pe)^(^'^)+-0+)(dx)
+ f (wf. (x : 2pt) — wf (x : 2p))¢ + (dx) <w1 (2pe:i + )+M,(f) f
(I+(IlxI +IIanII + 2pe) r )IO' — PI(dx)
2 + I.1 Q1(x : pc)((4'") — 4')(dx) <(Jf(2pE:1^if)
+C46 Mr(f)( 1 +(pE) r )n -(5-2)/2A*
(15.79) In the last step we also used the inequality w1 (x:2pc)<2M,(f)(1+(IIxII+2pc) r )
(r>0, xER k ). (15.80)
The proof is now complete by (15.78), (15.79), (15.66), and (15.62) if one writes rl = 2pe. Q.E.D. COROLLARY 15.5 Under the hypothesis of Theorem 15.4, one has
f fd(Q.
-1)I
<e47(s , k)Mr(J )( I+.l
k+r+' )(P3n -1/2 +p,n -cs-2)/2
+wf( 71: b),
)
(15.81)
where, as in (15.61),
71=c p 3 ([logn]+ 1)n" 2 Proof Proof. From the expression (14.74) one has IPm( — $: (X.))(x)l
(xER k , 1 <m<s-2).
(15.82)
Main Theorems
159
For, if I vi I > 3, I°_ 1 (I v ; I - 2) = m, then (recalling that p 2 = k) IX,,. .. X ,r ^ < c49(m k)PI ►,I
.
..
,
pk,I (II-2)/m
.
(15.83)
(1<m<s-2).
Now n
n-m/2Pm+2=n _I
n-m/2( f
+ f
11xj1Im+2
n
=pan- ► /2+psn-(s-2)/2
J i1xills) (15.84)
(1 <m<s-2).
Therefore 11 n -'"/2Pm
(- 4:
(x,) )H < c5 ► (s , k)(Psn -
► /2 + psn
-(s-2)/2)
(I <m<s-2).
(15.85)
Also, using (15.80), (15.82), and (15.84), we get wl(fl' I n-m/2Pm( --b : {)C►,})I) < c52(s , k)( 1 + n r )M,, (f)(Psn -
i/z + pan -(s-2)/2) (1 < m <s-2).
(15.86)
Now (15.81) follows from (15.60) if one uses (15.85) and (15.86), as well as the first inequality in (15.8). Q.E.D. Remark. The stipulation (15,58) [which is the same as (15.11)] is redundant when one has r-0 in Theorem 15.4 or Corollary 15.5. The proof of this assertion in the Corollary 15.5 case is essentially contained in the proof of Corollary 15.2. In the Theorem 15.4 case one needs the additional fact [see (7.22)] P.(-0:(y,))(R k )-0
(1<m<s-2).
(15.87)
160
Normal Approximation
16. NORMALIZATION The main theorems in the last section have been stated for independent random vectors X 1 ,...,X. satisfying (standard normalization) n EXj -0
n ' I Cov(X^)=1
(1 <j< n),
-
(16.1)
j -1
where I is the k x k identity matrix. This was done for the sake of simplicity. It is a relatively minor matter to extend all results of the last section to independent random vectors X 1 ,.. .,X„ satisfying EXj =O
(1 < j
n '
Cov(X^)-. V,
-
j-
(16.2)
1
where V is an arbitrary symmetric, positive-definite matrix. One could also take the mean vectors in (16.2) to be arbitrary. But since this is taken care of merely by changing the integrand f to a translate of it, we assume (16.2) throughout this section. Let Q,, denote the distribution of n - '/ 2 (X I + • • • + X,,). If f is integrable with respect to Qn and (D o. V , then by changing variables x-+ Tx one has f fd(Q, -41o.v)= f f°T ^d(G,-Z),
(16.3)
where T is the symmetric positive-definite matrix satisfying T
2 =V - ',
(16.4)
and G. is the distribution of n - '/ 2(TX 1 + • - - + TX.). Note that the random vectors TX 1 ,..., TX,, satisfy standard normalization, that is, E(TX1)=0
(1 'j< n),
n n - ' Y, Cov(TXj)=1.
(16.5)
j-1
Hence one may use the results of the last section to estimate the right side of (16.3) in terms of the moments - n
•r,-n ' I E^ITXj II' -
(r>0),
(16.6)
j-1
and M,(f°T - ') and w8r or wf, T -1, where (I + IIxIrr") - '(fo T - ')(x) Sr(x)= (foT
- ')(x)
if r>O, if r=0.
(16.7)
Normalization
161
Since T is easily computable, we may leave things at this stage. If one would like the bound to involve moments of X D 's and not those of TX's, then the simple inequality (r>0)
Tr
(16.8)
may be used. Here A is the smallest eigenvalue of V. Note that TII=A -1/2 ,
IIT
-I
II_A'/ 2 ,
(16.9)
where A is the largest eigenvalue of V. We now rewrite (or extend) the results of Section 15 in a series of theorems and corollaries, assuming that the n independent random vectors X 1 ,. . . , X,, satisfy (16.2). THEOREM 16.1. There exist positive constants c ; (i= 1,1,2,3,4) depending only on k and s such that if n (16.10) IITX^II'( 2 /3). 1 / 2 )
holds for some integer s > 3, then for every real-valued, Borel-measurable function f satisfying (16.11) M,(f)
J fd(Q,,—(Do.v)I <M,(f°T-1)(
C2T3n-1/2+C3T.,n—(s-2)/2)
g;( c4T3n-1/2: 4,
+2w
a)
G max (1,A r / 2 }Mr (f)(c 2X -3 / 2 p j n -1 / 2 + c 3 A — s/2
+ 2wwr(c4X -3/2
p3n -1/2: 4 r.).
p, n
_(s_2)/2)
(16.12)
Proof. The first inequality in (16.12) follows from (16.3) and Theorem
15.1, as well as the relations (15.82), (15.84), and (15.86). Also, one has M,(f°T ► )= —
sup (l+IIxtI') —I If(T —tx)I
xER 1'
sup (1+IIT-1xIr)—Ilf(T—'x)I)t
sup (1+IIT—IXII')(l+11x11')-1) <( xER k xERk
=M,(f)( sup ( 1 +11T —I xll')(l+11x11') — ') xER k
<M (f)max(1,IIT - 'll'}=M,(f)max{l,A'/ 2 ). '
(16.13)
162
Normal Approximation
The second inequality in (16.12) follows from (16.9) and (16.13). Q.E.D. The constants c's below depend only on s and k.
COROLLARY 16.2. If P3< oo, then for all bounded, Borel-measurable functions f one has
f fd (Qn-0o v)I < cswf(Rk )T3n-1/2 + 2wf r _,(c4 T 3 n - 1/2. D)
< c s w/ (R k )'r 3 n -1 / 2 +2wj (c 4 A h / 2'r 3 n -h / 2 :4,o v ) < c sw1 (R k )A -3/2 p3n -1/2
+2wj ( C4A'/2A - 3/2p3n - '/2:(Do. v)•
(16.14)
Proof The first inequality follows from Theorem 16.1 on taking r = ro = 0, s=3, and recalling Mo(f°T -l )=wf r,_,(R k )=wf(R* ).
(16.15)
To get the second inequality in (16.14) note that wf, T _,(x:e)=suP{I.f(T - ^y)_f(T - 'z)I:y,zEB(x:e)} <sup {If(y')-f(z')I:y',z'EB(T - 'x: IIT - 'IIE))
=wf (T -1x:IIT -1 IIe),
(16.16)
so that c1) sup sup f wi(T—'(x+y):II T-111e)4(dx) yER`
= sup f w (T lx+y :IIT 'IIE)(P(dx) !
-
,
-
y'ER k
= sup fwi(z+y':II T —h IIE)io.v(dz) y'ERk
-wf (II T - 'Il e : t' , v )=wf (A" 2e:Io, v ).
(16.17)
The last inequality in (16.14) now follows from (16.8). Finally, note that
Normalization
163
the condition (16.10) may be replaced by p 3 < oo as in the proof of Corollary 15.2. Q.E.D. COROLLARY 16.3. If P3 < 00, then for all Bore! sets A one has IQ.(A)-4o.v(A)I < c 5 T 3 n -1 / 2 +2
sup
(bo v((aA)n'+y)
yER k
/ 2 p 3 n -i / 2 +2 sup 4i0.v ((aA)''+y), (16.18) yER 4
where 1J=c4T3n-1/2
q,=IIT-'IIr1=A'/1q. (16.19)
Proof. The proof follows from Corollary 16.2 and the relations (15.55). Also note that IA O T -' = I TA . Q.E.D.
Before stating the analog of Theorem 15.4, we note that if we start with the finite signed measure Pm( -40 o,v: {X„}), where Xv is the average of the with cumulants of XD 's and make the transformation x-+Tx on R k , then the induced signed measure is Pm (-(b: {XY )), where j,^, is the average of the with cumulants of TX's, 1 < j < n. To see this, observe that the FourierStieltjes transform of Pm (- 4 o, v : (x)) is Pm (-4 ov :{aG})(t)=Pm (it:{Xv))exp(--L)
(IER k ), (16.20)
and that of the induced signed measure [see Theorem 5.1(v)] is t-+Pm (iTt,(X ))exp{-iIItII2) (tER' ).
(16.21)
Now look at the expression (7.3) to conclude Pm (iTt,(X.))=P.(it:( ))
(tER k ).
(16.22)
THEOREM 16.4. There exist positive constants c 1 , c; (i = 2,3) depending only on k and s such that under the hypothesis of Theorem 16.1 one has s-2
I
fd Qn—m-0 7. n-m/ZP.(-410 v. {X.}) )
< cZ max ( 1, Ar/ 2 )Mr (f)( 1 + rl k+r+ i)X-,/2 pfn -(s-i)/2 I s-2 + +Zif (A'/ 271: (1 n -m/ZPm( - to.v: {X,))) ), m-0
(16.23)
164
Normal Approximation
where ° C31
-
3 /2 P3([logn] + I)n -) /`.
(16.24)
Proof. This is an immediate consequence of Theorem 15.4 and the relation

  ∫ f d(Q_n − Σ_{m=0}^{s−2} n^{−m/2} P_m(−Φ_{0,V}: {χ̄_ν}))
    = ∫ (f∘T^{−1}) d(G_n − Σ_{m=0}^{s−2} n^{−m/2} P_m(−Φ: {χ̄'_ν})),   (16.25)

which follows from the discussion preceding the statement of the present theorem. One of course also uses (16.8), (16.9), (16.13), and (16.16), as in the proof of Theorem 16.1. Q.E.D.

COROLLARY 16.5. Under the hypothesis of Theorem 16.4 one has

  |∫ f d(Q_n − Φ_{0,V})| ≤ c'₂ max(1, Λ^{r/2}) M_r(f)(1 + η^{k+r+1})
      × (λ^{−3/2} ρ₃ n^{−1/2} + λ^{−s/2} ρ_s n^{−(s−2)/2})
      + ω̄_f(2Λ^{1/2} η: Φ_{0,V}),   (16.26)

where η is defined by (16.24).

Proof. This follows from Corollary 15.5 if one uses (16.3), (16.8), (16.9), (16.13), and (16.16). Q.E.D.

Remark. We have already pointed out (see Corollaries 16.2, 16.3) that the stipulation (16.10) is redundant in Theorem 16.1 in the case r = 0. This is also true for Theorem 16.4 and Corollary 16.5, and the proof is essentially the same. Note that (16.10) may be replaced by the slightly more stringent but simpler condition

  τ_s ≤ c₄ n^{(s−2)/2},   (16.27)

or by (16.28).
17. SOME APPLICATIONS

The present section is devoted to some important applications of the main theorems of Sections 15 and 16. We continue to use the same notation as in Section 16. Thus X₁, …, X_n are n independent random vectors with
Some Applications
165
values in R^k satisfying

  EX_j = 0   (1 ≤ j ≤ n),   n^{−1} Σ_{j=1}^n Cov(X_j) = V,   (17.1)

where V is a symmetric, positive-definite matrix. Also, T is the symmetric positive-definite matrix satisfying (16.4), and we write

  ρ_r = n^{−1} Σ_{j=1}^n E||X_j||^r,   τ_r = n^{−1} Σ_{j=1}^n E||TX_j||^r   (r > 0).   (17.2)

Recall that λ, Λ are the smallest and largest eigenvalues of V, respectively, and Q_n is the distribution of n^{−1/2}(X₁ + ⋯ + X_n). We define the class 𝔅_α(d: μ) as the class of all Borel subsets A of R^k satisfying

  sup_{y∈R^k} μ((∂A)^ε + y) ≤ d ε^α   (ε > 0),   (17.3)
for a given pair of positive numbers α, d, and a measure μ.

THEOREM 17.1. There exist positive constants c₁ and c₂ depending only on k such that if ρ₃ < ∞, then

  sup_{A∈𝔅_α(d: Φ_{0,V})} |Q_n(A) − Φ_{0,V}(A)|
    ≤ c₁ λ^{−3/2} ρ₃ n^{−1/2} + 2d (c₂ Λ^{1/2} λ^{−3/2} ρ₃ n^{−1/2})^α.   (17.4)

Proof. This follows immediately from Corollary 16.3. Q.E.D.

COROLLARY 17.2. There exists a constant c₃ depending only on k such that if ρ₃ < ∞, then

  sup_{C∈𝒞} |Q_n(C) − Φ_{0,V}(C)| ≤ c₃ λ^{−3/2} ρ₃ n^{−1/2},   (17.5)

where 𝒞 is the class of all Borel-measurable convex subsets of R^k.
Proof. Let G_n be the distribution of n^{−1/2}(TX₁ + ⋯ + TX_n). Applying Theorem 17.1 to the random vectors TX₁, …, TX_n and using Corollary 3.2 [or inequality (13.42)], one has

  sup_{C∈𝒞} |Q_n(C) − Φ_{0,V}(C)| = sup_{C∈𝒞} |G_n(C) − Φ(C)|
    ≤ (c₁ + 2dc₂) τ₃ n^{−1/2} ≤ (c₁ + 2dc₂) λ^{−3/2} ρ₃ n^{−1/2},   (17.6)

where

  d = 2^{5/2} Γ((k+1)/2) / Γ(k/2).   (17.7)

Note that 𝒞 is invariant under translations, as well as under nonsingular linear transformations. Q.E.D.

COROLLARY 17.3. Let k = 2. Denote by 𝔇(l) the class of all Borel subsets of R² each having a boundary contained in some rectifiable curve of length not exceeding l. There are absolute constants c₄, c₅ such that if ρ₃ < ∞, then

  sup_{D∈𝔇(l)} |Q_n(D) − Φ_{0,V}(D)|
    ≤ c₄ λ^{−3/2} ρ₃ n^{−1/2} + c₅ (λ^{−1} l + λ^{−1/2}) Λ^{1/2} λ^{−3/2} ρ₃ n^{−1/2}.   (17.8)

Proof. From the estimate (2.60) we get

  Φ_{0,V}((∂D)^ε) ≤ (2π)^{−1} (Det V)^{−1/2} (4lε + 8πε²),   (17.9)

since Det V ≥ λ². It is enough to consider small ε.
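Corollary 17.2 asserts an n^{−1/2} rate uniformly over convex sets; in one dimension the convex sets are intervals, so the rate can be checked exactly for standardized symmetric Bernoulli summands, whose partial sums are binomial. The following Python sketch is ours, not the text's: it computes the Kolmogorov distance exactly and checks the n^{−1/2} decay against van Beek's Berry-Esseen constant 0.7975 quoted in the Notes (here ρ₃ = 1, so the bound is 0.7975 n^{−1/2}).

```python
from math import erf, sqrt, comb

def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def kolmogorov_distance_binomial(n):
    # S ~ Binomial(n, 1/2); Z = (S - n/2) / (sqrt(n)/2) is the normalized sum
    # of n i.i.d. centered +-1-type Bernoulli variables with variance 1, rho_3 = 1.
    pmf = [comb(n, j) * 0.5 ** n for j in range(n + 1)]
    cdf, acc = [], 0.0
    for p in pmf:
        acc += p
        cdf.append(acc)
    d = 0.0
    for j in range(n + 1):
        x = (j - n / 2.0) / (sqrt(n) / 2.0)
        phi = std_normal_cdf(x)
        left = cdf[j - 1] if j > 0 else 0.0   # left limit of F_n at the atom
        d = max(d, abs(cdf[j] - phi), abs(left - phi))
    return d

d25 = kolmogorov_distance_binomial(25)
d100 = kolmogorov_distance_binomial(100)
```

Quadrupling n roughly halves the distance, as the n^{−1/2} rate predicts; the computed values are close to (2πn)^{−1/2}, well inside the Berry-Esseen bound.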
There are Borel subsets A of R^k for which

  sup_{y∈R^k} Φ((∂A)^ε + y) ≤ c₆ ε^α   (ε > 0)   (17.10)

for some positive constants c₆ and α, α > 1. Examples of such sets are affine subspaces of dimension k' ≤ k − 1 (and their subsets and complements) and many other manifolds of dimension k' ≤ k − 1, for which α = k − k'. Below we assume that V = I merely to avoid notational complexity.

THEOREM 17.4. Let V = I. Assume that ρ_s < ∞ for some integer s ≥ 3. If A is a Borel set satisfying (17.10) for some α > 1, then

  |Q_n(A) − Φ(A)| ≤ c₇ ρ_s n^{−(s−2)/2}
    + c₈ (ρ₃ n^{−1/2})^α (1 + Σ_{m=1}^{s−2} n^{−m/2} ρ_{m+2} log^{3m/2} n),   (17.11)

where c₇ depends only on s and k, and c₈ depends only on s, k, α, and c₆. In particular, if X₁, …, X_n are identically distributed, then

  |Q_n(A) − Φ(A)| = O(n^{−α/2})   (17.12)

provided that s ≥ α + 2.

Proof. First assume that Φ(A) = 0 or Φ(A) = 1. In this case P_m(−Φ: {χ̄_ν})(A) = 0 for all m, 1 ≤ m ≤ s − 2, because of (7.22). Hence (15.56) holds, and it remains to show that

  n^{−m/2} ∫_{(∂A)^ε + y} |P_m(−Φ: {χ̄_ν})(x)| dx
    ≤ c n^{−m/2} ρ_{m+2} log^{3m/2} n [(ρ₃ n^{−1/2})^α + n^{−α/2}]   (1 ≤ m ≤ s − 2).

This follows from (15.82) and the estimate

  ∫_{(∂A)^ε + y} ||x||^{3m} φ(x) dx ≤ ∫_{(∂A)^ε + y} (2α log n)^{3m/2} φ(x) dx
      + ∫_{{||x|| > (2α log n)^{1/2}}} ||x||^{3m} φ(x) dx
    ≤ c₆ (2α log n)^{3m/2} (ρ₃ n^{−1/2})^α + n^{−α/2}.

We now show that Φ(A) = 0 or 1. For this first observe that for all y, z ∈ R^k (y ≠ z) one has (A + y) \ (A + z) ⊂ Cl((∂A)^{||y−z||}) + y, so that, by (17.10),

  |Φ(A + y) − Φ(A + z)| ≤ c₆ ||y − z||^α.

In other words, the function f(y) = Φ(A + y) is Lipschitzian of order α > 1. It follows that f is differentiable and has a zero differential on all of R^k. Hence f is a constant on R^k; that is,

  Φ(A + y) = Φ(A)   (y ∈ R^k).   (17.13)

Define the measure μ on R^k by

  μ(B) = ∫_B 1_A(x) dx = λ_k(B ∩ A)   (B ∈ ℬ^k),
where 1_A is the indicator function of A and λ_k is Lebesgue measure on R^k. We show that if (17.13) holds, then μ is translation invariant. This would imply that μ = c λ_k for some c, 0 ≤ c ≤ ∞, which is possible only if 1_A = 0 (almost everywhere) or 1_A = 1 (almost everywhere), and the proof of the theorem is complete. If B is a bounded Borel set, then for every ε > 0 there exist c_i ∈ R¹, y_i ∈ R^k, 1 ≤ i ≤ m, say, such that ||1_B − Σ_{i=1}^m c_i φ(· + y_i)||₁ < ε/2, where φ is the standard normal density on R^k. This follows because linear combinations of translates of any function whose Fourier transform does not vanish anywhere on R^k are dense in L¹(R^k).† One then has, using (17.13) in the last step,

  |μ(B − y) − μ(B)| = |∫_{A+y} 1_B(x) dx − ∫_A 1_B(x) dx|
    ≤ |∫_{A+y} 1_B(x) dx − Σ_{i=1}^m ∫_{A+y} c_i φ(x + y_i) dx|
      + |Σ_{i=1}^m ∫_{A+y} c_i φ(x + y_i) dx − Σ_{i=1}^m ∫_A c_i φ(x + y_i) dx|
      + |Σ_{i=1}^m ∫_A c_i φ(x + y_i) dx − ∫_A 1_B(x) dx|
    ≤ ε/2 + 0 + ε/2 = ε

for all y ∈ R^k. Hence μ(B − y) = μ(B) for all y ∈ R^k. Q.E.D.

For the special case (of Theorem 17.4) when A is an affine subspace of dimension k' ≤ k − 1, the hypothesis ρ_s < ∞ may be relaxed. Indeed, one has

THEOREM 17.5. There exists a positive number c₁₀ depending only on k such that if the hypothesis of Theorem 17.4 holds with s = 3, then for every affine subspace A (of R^k) of dimension k' ≤ k − 1 one has

  Q_n(A) ≤ c₁₀ (ρ₃ n^{−1/2})^{k−k'}.   (17.14)

†See Wiener [1], Theorem 9, p. 98.
Proof. An affine subspace A of dimension k' < k has a representation

  A = {x ∈ R^k : ⟨b_i, x⟩ = c_i, 1 ≤ i ≤ k − k'},

where {b_i : 1 ≤ i ≤ k − k'} is an orthonormal set. Let U_j be the vector in R^{k−k'} whose ith coordinate is ⟨b_i, X_j⟩, and note that

  EU_j = 0   (1 ≤ j ≤ n),   n^{−1} Σ_{j=1}^n Cov(U_j) = I,
  n^{−1} Σ_{j=1}^n E||U_j||³ ≤ n^{−1} Σ_{j=1}^n E||X_j||³ = ρ₃.

Let Q̃_n denote the distribution of n^{−1/2} Σ_{j=1}^n U_j and let Φ̃ denote the standard normal distribution on R^{k−k'}. Now let c be the vector in R^{k−k'} whose ith coordinate is c_i (1 ≤ i ≤ k − k'). Then

  Q_n(A) = Q̃_n({c}),   Φ(A) = Φ̃({c}) = 0,

and applying (15.57) to Q̃_n, Φ̃, and {c} (in place of Q_n, Φ, and A, respectively) for s = 3,

  Q_n(A) = |Q_n(A) − Φ(A)| = |Q̃_n({c}) − Φ̃({c})|
    ≤ c₁₀ n^{−1/2} ρ₃ (ρ₃ n^{−1/2})^{k−k'} + 2 sup_{y∈R^{k−k'}} ψ⁺((∂{c + y})^ε)
    ≤ c₁₀ (n^{−1/2} ρ₃)^{k+1−k'} + c'₁₀ (ρ₃ n^{−1/2})^{k−k'},

where ψ = Φ̃ + n^{−1/2} P₁(−Φ̃: {χ̄_ν}), χ̄_ν denoting the average of the νth
cumulants of the U_j's. Q.E.D.

Remark. It is fairly straightforward to extend the assertion (17.12) to a sequence {X_n : n ≥ 1} of independent random vectors for which

  lim inf_{n→∞} λ_n > 0,   sup_n n^{−1} Σ_{j=1}^n E||X_j||^s < ∞.   (17.15)

Here λ_n is the smallest eigenvalue of V_n = n^{−1} Σ_{j=1}^n Cov(X_j). Note that with T_n = V_n^{−1/2} one has

  n^{−1} Σ_{j=1}^n E||T_n X_j||^s ≤ λ_n^{−s/2} (n^{−1} Σ_{j=1}^n E||X_j||^s),   (17.16)

so that applying (17.11) to the random vectors T_n X_j, 1 ≤ j ≤ n, one arrives at (17.12), provided that s ≥ α + 2 and (17.15) holds. Similarly, in Corollaries 17.2 and 17.3 the error bounds are O(n^{−1/2}) if (17.15) holds with s = 3. In the same manner, orders of magnitude of errors for a sequence {X_n : n ≥ 1} satisfying (17.15) may be obtained from the remaining theorems and corollaries of this section.
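The standardization T_n = V_n^{−1/2} used in the remark above can be written in closed form in low dimension. The following sketch is ours (the function name is an invention for illustration): it computes the inverse square root of a symmetric positive-definite 2×2 matrix from its eigendecomposition, which is exactly the matrix T_n used throughout this section.

```python
from math import sqrt, hypot

def inv_sqrt_spd_2x2(a, b, c):
    # V = [[a, b], [b, c]] symmetric positive-definite.
    # Eigendecomposition V = Q diag(l1, l2) Q^T; then V^{-1/2} = Q diag(l^{-1/2}) Q^T.
    tr = a + c
    gap = sqrt(max((a - c) ** 2 + 4.0 * b * b, 0.0))
    l1, l2 = (tr + gap) / 2.0, (tr - gap) / 2.0     # l1 >= l2 > 0
    if abs(b) < 1e-15:                              # already diagonal
        q1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        v = (l1 - c, b)                             # eigenvector for l1
        nv = hypot(*v)
        q1 = (v[0] / nv, v[1] / nv)
    q2 = (-q1[1], q1[0])                            # orthogonal complement
    s1, s2 = 1.0 / sqrt(l1), 1.0 / sqrt(l2)
    # T = s1 * q1 q1^T + s2 * q2 q2^T (symmetric)
    t11 = s1 * q1[0] * q1[0] + s2 * q2[0] * q2[0]
    t12 = s1 * q1[0] * q1[1] + s2 * q2[0] * q2[1]
    t22 = s1 * q1[1] * q1[1] + s2 * q2[1] * q2[1]
    return t11, t12, t22
```

By construction T V T = I, so the vectors T_n X_j have average covariance matrix equal to the identity, as required by Theorems 17.1 and 17.4.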
For an application of a different nature, consider a Borel set A and define the function f by

  f(x) = (1 + d^s(0, ∂A)) 1_{A'}(x)   (x ∈ R^k),   (17.17)

where

  A' = A if 0 ∈ R^k \ A,   A' = R^k \ A if 0 ∈ A,   (17.18)

and d(0, ∂A) is the euclidean distance of ∂A from the origin 0. Note that

  M_s(f) ≤ 1   (s > 0).   (17.19)

Defining g by (15.14) with r = s, one has, for ||z|| ≤ 2ε,

  |g(x + y + z) − g(x + y)|
    ≤ (1 + ||x + y||^{s₀})^{−1} |f(x + y + z) − f(x + y)|
      + |f(x + y + z)| |(1 + ||x + y||^{s₀})^{−1} − (1 + ||x + y + z||^{s₀})^{−1}|
    ≤ (1 + ||x + y||^{s₀})^{−1} (1 + d^s(0, ∂A)) 1_{(∂A)^{2ε}}(x + y) + c₁₂(k, s) ε
    ≤ (1 + (d(0, ∂A) − 2ε)^{s₀})^{−1} (1 + d^s(0, ∂A)) 1_{(∂A)^{2ε}}(x + y) + c₁₂(k, s) ε
    ≤ c₁₃(k, s) 1_{(∂A)^{2ε}}(x + y) + c₁₂(k, s) ε,   (17.20)

where

  c₁₃(k, s) = sup_{b>0} (1 + (b + c₁₁(k, s))^s)(1 + b^{s₀})^{−1}.   (17.21)

Thus in this case

  ω̄_g(2ε: Φ_{s₀}) ≤ c₁₃(k, s) sup_{y∈R^k} Φ((∂A)^{2ε} + y) + c₁₄(k, s) ε,   (17.22)

where

  c₁₄(k, s) = c₁₂(k, s) Φ_{s₀}(R^k).   (17.23)
Specializing to convex sets, we can prove

THEOREM 17.6. There exist positive constants c₁₅, c₁₆, c₁₇ depending only on k and s such that if V = I and

  ρ_s ≤ c₁₅ n^{(s−2)/2}   (17.24)

for some integer s ≥ 3, then

  sup_{C∈𝒞} (1 + d^s(0, ∂C)) |Q_n(C) − Φ(C)| ≤ c₁₆ ρ₃ n^{−1/2} + c₁₇ ρ_s n^{−(s−2)/2}.   (17.25)

Proof. Let C be a Borel-measurable convex set. Replacing A by C in (17.17) and using (17.22) in Theorem 15.1 (with r = s), one has

  (1 + d^s(0, ∂C)) |Q_n(C) − Φ(C)| = |∫ f d(Q_n − Φ)|
    ≤ c₁₈ ρ_s n^{−(s−2)/2} + c₁₉ ω̄_g(2ε: Φ_{s₀})
      + Σ_{m=1}^{s−2} n^{−m/2} |∫ f dP_m(−Φ: {χ̄_ν})|,   (17.26)

where ε = c₂₀ ρ₃ n^{−1/2}. Now, by (15.82) and (15.84),

  ∫ (1 + ||x||^{s₀}) |P_m(−Φ: {χ̄_ν})(x)| dx ≤ c₂₁ ρ_{m+2}   (1 ≤ m ≤ s − 2).   (17.27)

Hence (17.26) reduces to

  (1 + d^s(0, ∂C)) |Q_n(C) − Φ(C)| ≤ c₂₂ (ρ₃ n^{−1/2} + ρ_s n^{−(s−2)/2}) + c₁₈ ω̄_g(2ε: Φ_{s₀}).   (17.28)

After (17.22) and Corollary 3.2 are used in (17.28), the proof is complete. Q.E.D.
Theorem 17.6 leads to the so-called global and mean central limit theorems in R¹.

COROLLARY 17.7. Under the hypothesis of Theorem 17.6 with k = 1 one has

  sup_{x∈R¹} (1 + |x|^s) |F_n(x) − Φ(x)| ≤ c₂₃ ρ₃ n^{−1/2} + c₂₄ ρ_s n^{−(s−2)/2},   (17.29)

where F_n(·) is the distribution function of n^{−1/2}(X₁ + ⋯ + X_n) and Φ(·) is the standard normal distribution function. It follows that

  ||F_n − Φ||_p ≡ (∫_{R¹} |F_n(x) − Φ(x)|^p dx)^{1/p}
    ≤ c₂₅(p, s)(c₂₃ ρ₃ n^{−1/2} + c₂₄ ρ_s n^{−(s−2)/2})   (17.30)

for all p ≥ 1/s.

Proof. The inequality (17.29) follows from Theorem 17.6 on taking k = 1, C = (−∞, x], x ∈ R¹. Inequality (17.30) is immediate from this. Q.E.D.

Remark. The validity of (17.30) for p ≥ 1 may be proved without the assumption (17.24). Let ρ₃ < ∞. Then

  |F_n(x) − Φ(x)| ≤ (1 + |x|³)^{−1} c₂₆ ρ₃ n^{−1/2} ≤ 2(1 + x²)^{−1} c₂₆ ρ₃ n^{−1/2}   (x ∈ R¹)   (17.31)

if (17.24) holds, that is, if

  ρ₃ ≤ c₂₇ n^{1/2}   (17.32)

for a suitable positive number c₂₇. However,

  x² |F_n(x) − Φ(x)| ≤ x² (F_n(x) + Φ(x))
    ≤ ∫_{{|y|>|x|}} y² Q_n(dy) + ∫_{{|y|>|x|}} y² Φ(dy) ≤ 2   (x < 0),

  x² |F_n(x) − Φ(x)| = x² |(1 − F_n(x)) − (1 − Φ(x))|
    ≤ ∫_{{|y|>x}} y² Q_n(dy) + ∫_{{|y|>x}} y² Φ(dy) ≤ 2   (x > 0),   (17.33)

  |F_n(x) − Φ(x)| ≤ 1   (x ∈ R¹),

so that the inequality

  (1 + x²) |F_n(x) − Φ(x)| ≤ 3   (x ∈ R¹)   (17.34)

holds whenever ρ₂ = 1. Now

  |F_n(x) − Φ(x)| ≤ c₂₈ (1 + x²)^{−1} ρ₃ n^{−1/2}   (x ∈ R¹)   (17.35)

holds whenever ρ₃ < ∞ if c₂₈ = max{2c₂₆, 3/c₂₇}. This is true since (17.35) holds because of (17.31), provided that (17.32) holds. If (17.32) is violated, then (17.35) holds by virtue of (17.34).
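The elementary bound (17.34) is easy to check numerically. The Python sketch below is ours: it evaluates the weighted discrepancy (1 + x²)|F_n(x) − Φ(x)| on a grid for standardized ±1 summands, for which ρ₂ = 1 and F_n is an explicit binomial mixture. Every sampled value must lie below 3, and in practice lies far below it.

```python
from math import erf, sqrt, comb

def std_normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def weighted_discrepancy(n, grid_step=0.01):
    # F_n: distribution function of the normalized sum of n i.i.d. variables
    # taking values +-1 with probability 1/2 each (mean 0, variance 1, rho_2 = 1).
    pmf = [comb(n, j) * 0.5 ** n for j in range(n + 1)]
    atoms = [(2 * j - n) / sqrt(n) for j in range(n + 1)]
    def F(x):
        return sum(p for a, p in zip(atoms, pmf) if a <= x)
    best, x = 0.0, -6.0
    while x <= 6.0:
        best = max(best, (1 + x * x) * abs(F(x) - std_normal_cdf(x)))
        x += grid_step
    return best

w16 = weighted_discrepancy(16)
```

The sampled maximum is a lower bound on the true supremum, so the assertion that it stays below 3 is consistent with (17.34); the observed values also shrink as n grows, in line with (17.35).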
The final application of Theorem 15.1 is to Lipschitzian functions.

THEOREM 17.8. Let (17.1) hold with V = I. There exist positive constants c₂₉, c₃₀, c₃₁ depending only on r, k, and s such that if

  ρ_s ≤ c₂₉ n^{(s−2)/2}   (17.36)

for some integer s ≥ 3, then for all f satisfying

  M_r(f) < ∞,   ω_f(ε: R^k) ≤ d ε^α   (ε > 0)   (17.37)

for some d > 0 and some α, 0 < α ≤ 1, one has

  |∫ f d(Q_n − Φ)| ≤ c₃₀ M_r(f)(ρ₃ n^{−1/2} + ρ_s n^{−(s−2)/2}) + c₃₀ d (c₃₁ ρ₃ n^{−1/2})^α.   (17.38)

If r = 0, (17.36) may be replaced by ρ₃ < ∞.

Proof. By Theorem 16.1 (with V = I = T) one has

  |∫ f d(Q_n − Φ)| ≤ c₃₂ M_r(f)(ρ₃ n^{−1/2} + ρ_s n^{−(s−2)/2}) + ω̄_g(c₃₃ ρ₃ n^{−1/2}: Φ_{r₀}).   (17.39)

Now if r = 0, then r₀ = 0, g = f, and ω̄_f(ε: Φ) ≤ d(2ε)^α, and we are done by Corollary 15.2. If r > 0, then

  |g(x) − g(y)| = |(1 + ||x||^{r₀})^{−1} f(x) − (1 + ||y||^{r₀})^{−1} f(y)|
    ≤ (1 + ||x||^{r₀})^{−1} |f(x) − f(y)| + |f(y)| |(1 + ||x||^{r₀})^{−1} − (1 + ||y||^{r₀})^{−1}|
    ≤ d(2ε)^α + 2^{r₀} r₀ M_r(f)(1 + ||x||^{r₀})^{−1} (||x||^{r₀−1} + (2ε)^{r₀−1}) ε   (||x − y|| ≤ 2ε).   (17.40)

Letting ε = c₃₃ ρ₃ n^{−1/2}, one has ε ≤ c₃₄ by (14.60). Hence (17.38) follows from (17.39) and (17.40). Q.E.D.

We now turn to some applications of Theorem 15.4. Specialize Theorem 15.4 to the case r = 0, s = 3, f = 1_A, where A is a Borel subset of R^k. If V = I and ρ₃ < ∞, then one can show that

  Q_n(A) − Φ(A) ≤ c₃₅ (1 + η^{k+1}) n^{−1/2} ρ₃ + Φ(A^η \ A),   (17.41)

where, as in (15.61),

  η = c₃ ρ₃ ([log n] + 1) n^{−1/2}.   (17.42)

The one-sided inequality (17.41) follows if one uses Corollary 11.3 instead of Corollary 11.2 [see (15.66)] in the proof of Theorem 15.4. The rest of the proof need not be altered. Let

  δ_n = max{η, c₃₅ (1 + η^{k+1}) n^{−1/2} ρ₃}.   (17.43)
THEOREM 17.9. Let V = I, ρ₃ < ∞. Then the Prokhorov distance d_P and the bounded Lipschitzian distance d_BL between Q_n and Φ are estimated by

  d_P(Q_n, Φ) ≤ δ_n,   d_BL(Q_n, Φ) ≤ c₃₆ δ_n,   (17.44)

where c₃₆ depends only on k.

Proof. The assertion concerning d_BL follows immediately from Theorem 17.8 with r = 0 [see (2.51) for the definition of this distance]. As to d_P, note that for any two probability measures G₁, G₂ on R^k one has

  d_P(G₁, G₂) = inf{ε > 0 : G₁(A) ≤ G₂(A^ε) + ε and G₂(A) ≤ G₁(A^ε) + ε for all A ∈ ℬ^k}
    = inf{ε > 0 : G₁(F) ≤ G₂(F^ε) + ε, G₂(F) ≤ G₁(F^ε) + ε for all closed F},   (17.45)

since A^ε = (closure of A)^ε for all A ⊂ R^k and all ε > 0. Also,

  d_P(G₁, G₂) = inf{ε > 0 : G₁(F) ≤ G₂(F^ε) + ε for all closed F},   (17.46)

since, given G₁(F) ≤ G₂(F^ε) + ε for all closed F and some ε > 0, one obtains

  1 − G₁(F^ε) = G₁(R^k \ F^ε) ≤ G₂((R^k \ F^ε)^ε) + ε   (17.47)

for all closed sets F, using

  R^k \ F ⊃ (R^k \ F^ε)^ε.   (17.48)

Now (17.41) and (17.43) yield

  Q_n(A) ≤ Φ(A^{δ_n}) + δ_n   (A ∈ ℬ^k).   (17.49)

Together with (17.46) this gives the first inequality in (17.44). Q.E.D.

Recall that

  d₀(G₁, G₂) = sup_{C∈𝒞} |G₁(C) − G₂(C)|,   (17.50)
where 𝒞 is the class of all Borel-measurable convex subsets of R^k, and G₁, G₂ are probability measures on R^k. By Corollary 17.2 one has

  d₀(Q_n, Φ) ≤ c₃₇ ρ₃ n^{−1/2}   (17.51)

under the hypothesis of Theorem 17.9. For two positive numbers d, α, where 0 < α ≤ 1, let 𝔅̃_α(d: Φ_{0,V}) denote the class of all Borel subsets A of R^k satisfying

  Φ_{0,V}((∂A)^ε) ≤ d ε^α   (ε > 0).   (17.52)

THEOREM 17.10. If ρ₃ < ∞, then there exist constants c₃₈, c₃₉ depending only on k such that

  sup_{A∈𝔅̃_α(d: Φ_{0,V})} |Q_n(A) − Φ_{0,V}(A)| ≤ c₃₈ (1 + η^{k+1}) λ^{−3/2} ρ₃ n^{−1/2} + d η^α,   (17.53)

where

  η = c₃₉ λ^{−3/2} ρ₃ ([log n] + 1) n^{−1/2},   (17.54)

and λ, Λ are the smallest and largest eigenvalues, respectively, of the average covariance matrix V of X₁, …, X_n.

Proof. This is an immediate consequence of Corollary 16.5 with s = 3, r = 0, f = 1_A. Q.E.D.
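The proof of Theorem 17.9 rested on the fact that the one-sided condition (17.46) already determines the Prokhorov distance. For finitely supported distributions on R¹ the closed sets may be reduced to subsets of the common support, so the identity (17.45) = (17.46) can be checked by brute force. The sketch below is ours (names hypothetical): it computes both one-sided quantities by bisection and confirms that they coincide.

```python
from itertools import combinations

def prokhorov_one_sided(xs, p, q, tol=1e-9):
    """inf{eps > 0 : p(S) <= q(S^eps) + eps for every subset S of the support xs}."""
    pts = list(range(len(xs)))
    subsets = [c for r in range(1, len(pts) + 1) for c in combinations(pts, r)]
    def ok(eps):
        for s in subsets:
            ps = sum(p[i] for i in s)
            # q(S^eps): q-mass within distance eps of S
            qs = sum(q[j] for j in pts
                     if min(abs(xs[j] - xs[i]) for i in s) <= eps)
            if ps > qs + eps + 1e-12:
                return False
        return True
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi

xs = [0.0, 1.0, 2.0]
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
a = prokhorov_one_sided(xs, p, q)
b = prokhorov_one_sided(xs, q, p)
```

The two one-sided values agree, as (17.46) asserts; their common value is the Prokhorov distance between the two discrete laws.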
The class 𝔅̃_α(d: Φ_{0,V}) may be shown to be larger than the class 𝔅_α(d: Φ_{0,V}) defined by (17.3), for any pair of positive numbers d, α, where 0 < α ≤ 1 (see, e.g., the discussion preceding the statement of Theorem 15.4). The latter is the largest translation invariant subclass of the former.

A final application of Theorem 15.4 provides an estimate for tail probabilities of Q_n.

THEOREM 17.11. Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k and having zero means and finite sth absolute moments for some integer s ≥ 3. Let

  V_n = n^{−1} Σ_{j=1}^n Cov(X_j),   λ_n = smallest eigenvalue of V_n,
  Λ_n = largest eigenvalue of V_n,   ρ_{s,n} = n^{−1} Σ_{j=1}^n E||X_j||^s,

  θ*_{s,n} = inf_{0<ε≤1} [ε λ_n^{−s/2} n^{−1} Σ_{j=1}^n ∫_{{||X_j|| ≤ λ_n^{1/2} ε n^{1/2}}} ||X_j||^s dP
      + λ_n^{−s/2} n^{−1} Σ_{j=1}^n ∫_{{||X_j|| > λ_n^{1/2} ε n^{1/2}}} ||X_j||^s dP].   (17.55)

Assume that

  lim inf_{n→∞} λ_n > 0,   lim_{n→∞} n^{−1/2} ρ_{3,n} log^{3/2} n = 0,   lim_{n→∞} n^{−(s−2)/2} ρ_{s,n} = 0.   (17.56)

Then one has

  sup_{a ≥ ((s−2+δ) log n)^{1/2}} a^s Prob(||n^{−1/2}(X₁ + ⋯ + X_n)|| > Λ_n^{1/2} a)
    ≤ c₄₀(s, k)(1 + Λ_n^{s/2})(1 + (n^{−1/2} λ_n^{−3/2} ρ_{3,n} log n)^{k+s+1}) n^{−(s−2)/2} θ*_{s,n}
      + Δ_n(δ) n^{−(s−2)/2},   (17.57)
where Δ_n(δ) → 0 as n → ∞ for each δ > 0.

Proof. Without loss of generality, assume λ_n > 0 for all n. Let Q'_n denote the distribution of n^{−1/2}(T_n X₁ + ⋯ + T_n X_n), where T_n is the symmetric, positive-definite matrix satisfying

  T_n² = V_n^{−1}   (n ≥ 1).   (17.58)

Write

  τ_{m+2,n} = n^{−1} Σ_{j=1}^n E||T_n X_j||^{m+2}   (0 ≤ m ≤ s − 2),
  χ̄'_{ν,n} = n^{−1} Σ_{j=1}^n (νth cumulant of T_n X_j)   (1 ≤ |ν| ≤ s).   (17.59)

Define the function f by

  f(x) = 0 if ||x|| ≤ a,   f(x) = a^s if ||x|| > a,   (17.60)

and use Theorem 15.4 with this f to get

  a^s |(Q'_n − Σ_{m=0}^{s−2} n^{−m/2} P_m(−Φ: {χ̄'_{ν,n}}))({x : ||x|| > a})|
    ≤ c₄₁(s, k) M_s(f)[1 + (n^{−1/2} τ_{3,n} log n)^{k+s+1}] θ**_n n^{−(s−2)/2}
      + a^s Σ_{m=0}^{s−2} n^{−m/2} |P_m(−Φ: {χ̄'_{ν,n}})|({x : ||x|| > a − c₄₂ n^{−1/2} τ_{3,n} log n}),   (17.61)

where

  θ**_n = inf_{0<ε≤1} [ε n^{−1} Σ_{j=1}^n ∫_{{||T_n X_j|| ≤ ε n^{1/2}}} ||T_n X_j||^s dP
      + n^{−1} Σ_{j=1}^n ∫_{{||T_n X_j|| > ε n^{1/2}}} ||T_n X_j||^s dP]
    ≤ (1 + Λ_n^{s/2}) θ*_{s,n},   (17.62)

the last inequality following by an easy computation using

  Λ_n^{−1/2} ||X_j|| ≤ ||T_n X_j|| ≤ λ_n^{−1/2} ||X_j||   (1 ≤ j ≤ n),
  τ_{m+2,n} ≤ λ_n^{−(m+2)/2} ρ_{m+2,n}   (0 ≤ m ≤ s − 2).   (17.63)

The assumption (17.56) implies [in view of (17.63), (15.84)]

  n^{−m/2} τ_{m+2,n} → 0 as n → ∞   (1 ≤ m ≤ s − 2),   (17.64)

and

  a − c₄₂ n^{−1/2} τ_{3,n} log n ≥ a/2,
  a − c₄₂ n^{−1/2} τ_{3,n} log n ≥ ((s − 2 + δ/2) log n)^{1/2}   (17.65)

for all sufficiently large n if a ≥ ((s − 2 + δ) log n)^{1/2}. Hence

  a^s n^{−m/2} |P_m(−Φ: {χ̄'_{ν,n}})|({x : ||x||² ≥ (s − 2 + δ/2) log n})
    ≤ c₄₃ n^{−m/2} τ_{m+2,n} ∫_{{||x||² ≥ (s−2+δ/2) log n}} ||x||^{3m+s} exp{−||x||²/2} dx
    = Δ_n(δ) n^{−(s−2)/2}   (0 ≤ m ≤ s − 2).   (17.66)

Also note that

  M_s(f) = a^s (1 + a^s)^{−1} < 1.   (17.67)

The estimates (17.62), (17.66) are now used in (17.61) to yield

  a^s Q'_n({x : ||x|| > a})
    ≤ c₄₀(s, k)(1 + Λ_n^{s/2})[1 + (n^{−1/2} λ_n^{−3/2} ρ_{3,n} log n)^{k+s+1}] θ*_{s,n} n^{−(s−2)/2}
      + Δ_n(δ) n^{−(s−2)/2}.   (17.68)

Finally observe that

  Q'_n({x : ||x|| > a}) = Prob(||n^{−1/2}(T_n X₁ + ⋯ + T_n X_n)|| > a)
    ≥ Prob(||n^{−1/2}(X₁ + ⋯ + X_n)|| > a Λ_n^{1/2}),   (17.69)

since ||T_n x|| ≥ Λ_n^{−1/2} ||x||. Q.E.D.

COROLLARY 17.12. Let {X_n : n ≥ 1} be a sequence of independent and identically distributed random vectors having a common mean zero and common covariance matrix V. If ρ_s = E||X₁||^s is finite for some integer s ≥ 3, then

  P(||X₁ + ⋯ + X_n|| > a_n Λ^{1/2} n^{1/2}) = δ_n n^{−(s−2)/2} a_n^{−s},   (17.70)

where δ_n → 0 as n → ∞ uniformly for every sequence {a_n : n ≥ 1} of numbers satisfying

  a_n ≥ (s − 2 + δ)^{1/2} log^{1/2} n   (17.71)

for any fixed δ > 0, and Λ is the largest eigenvalue of V.

Proof. Note that in this case [taking ε = n^{−1/4} in (17.55)]

  θ*_{s,n} ≤ n^{−1/4} λ^{−s/2} ρ_s + λ^{−s/2} ∫_{{||X₁|| > λ^{1/2} n^{1/4}}} ||X₁||^s dP → 0   (17.72)

as n → ∞. Here λ, Λ are the smallest and largest eigenvalues of V, respectively. Q.E.D.

COROLLARY 17.13. Let {X_n : n ≥ 1} be a sequence of independent random vectors having zero means and finite sth absolute moments for some integer s ≥ 3. Assume that
  lim sup_{n→∞} ρ_{s,n} < ∞,   lim inf_{n→∞} λ_n > 0,   (17.73)

where the notation is the same as in Theorem 17.11. Then

  P(||X₁ + ⋯ + X_n|| > a_n Λ_n^{1/2} n^{1/2}) = δ_n n^{−(s−2)/2} a_n^{−s},   (17.74)

where {δ_n : n ≥ 1} remains uniformly bounded for every sequence of numbers {a_n : n ≥ 1} satisfying (17.71) for any fixed δ > 0.

Proof. In view of (17.73) the sequence {Λ_n : n ≥ 1} is bounded since, writing V_n = ((v_{ii'})), one has

  Λ_n = sup_{||x||=1} ⟨x, V_n x⟩ = sup_{||x||=1} |Σ_{i,i'=1}^k v_{ii'} x_i x_{i'}|
    ≤ sup_{||x||=1} (Σ_{i=1}^k |x_i| v_{ii}^{1/2})² ≤ k Σ_{i=1}^k v_{ii} = k ρ_{2,n},
  ρ_{m,n} ≤ (ρ_{s,n})^{m/s}   (1 ≤ m ≤ s).   (17.75)

Also [putting ε = 1 in the expression within square brackets in (17.55)],

  θ*_{s,n} ≤ λ_n^{−s/2} n^{−1} Σ_{j=1}^n E||X_j||^s = λ_n^{−s/2} ρ_{s,n},   (17.76)

so that {θ*_{s,n} : n ≥ 1} is a bounded sequence. The relation (17.74) now follows from (17.57). Q.E.D.

Note that the sequence {δ_n : n ≥ 1} in Corollary 17.13 may be shown to go to zero in the same manner as {δ_n : n ≥ 1} in Corollary 17.12 if

  lim_{n→∞} n^{−1} Σ_{j=1}^n ∫_{{||X_j|| > λ_n^{1/2} ε n^{1/2}}} ||X_j||^s dP = 0   (ε > 0).   (17.77)
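The tail estimates of Corollary 17.12 concern thresholds a_n of order ((s − 2 + δ) log n)^{1/2}. For symmetric ±1 summands the tail probability is an exact binomial computation; the sketch below is ours (for n = 100, s = 3, δ = 1/2) and compares the result with the elementary sub-Gaussian Hoeffding bound 2 exp(−a²/2), which is not from the text but gives an independent sanity check that such tails are indeed small beyond the logarithmic threshold.

```python
from math import comb, log, sqrt, exp

def exact_tail(n, a):
    # P(|S_n| / sqrt(n) > a) for S_n a sum of n i.i.d. symmetric +-1 variables:
    # S_n = 2*B - n with B ~ Binomial(n, 1/2), computed exactly.
    thresh = a * sqrt(n)
    total = sum(comb(n, b) for b in range(n + 1) if abs(2 * b - n) > thresh)
    return total / 2.0 ** n

n = 100
s, delta = 3, 0.5
a_n = sqrt((s - 2 + delta) * log(n))     # the threshold of (17.71)
tail = exact_tail(n, a_n)
hoeffding = 2.0 * exp(-a_n * a_n / 2.0)  # Hoeffding bound for bounded summands
```

The exact tail is strictly below the Hoeffding bound, and both are already below one percent at n = 100.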
18. RATES OF CONVERGENCE UNDER FINITENESS OF SECOND MOMENTS

Most of the main results in Sections 15, 16, and 17 have appropriate analogs when only the second moments are assumed finite. Here we prove an analog of Theorem 15.1 and derive some corollaries. As before, X₁, …, X_n are independent random vectors with values in R^k, and

  EX_j = 0   (1 ≤ j ≤ n),   n^{−1} Σ_{j=1}^n Cov(X_j) = V,   (18.1)

where V is a symmetric, positive-definite matrix, and we write

  ρ_r = n^{−1} Σ_{j=1}^n E||X_j||^r   (r > 0),   (18.2)

and denote by T the symmetric, positive-definite matrix satisfying

  T² = V^{−1}.   (18.3)

We also define

  θ_{n,s}(ε) = n^{−1} Σ_{j=1}^n ∫_{{||X_j|| > ε n^{1/2}}} ||X_j||^s dP   (ε > 0, s ≥ 0),

  δ_{n,s} = inf_{0<ε≤1} [ε^{3−s} n^{−(3−s)/2} n^{−1} Σ_{j=1}^n ∫_{{||X_j|| ≤ ε n^{1/2}}} ||X_j||³ dP + θ_{n,s}(ε)].   (18.4)
Convergence Assuming Finite Second Moments 181
Recalling the definition (15.7) of Δ_{n,s}, we see that

  θ_{n,2} = Δ_{n,2},   Δ_{n,2}(1) ≤ δ_{n,2}.   (18.5)

Also, as before, for a function f on R^k we define

  M_r(f) = sup_{x∈R^k} (1 + ||x||^r)^{−1} |f(x)|   (r > 0),
  M₀(f) = sup_{x,y∈R^k} |f(x) − f(y)| = ω_f(R^k).   (18.6)

Finally, let Q_n denote the distribution of n^{−1/2}(X₁ + ⋯ + X_n) and let Φ_{a,V} denote the normal distribution on R^k with mean a and covariance matrix V. We write Φ = Φ_{0,I}, where I is the identity matrix.

THEOREM 18.1. Let V = I and ρ_s < ∞ for some s, 2 < s < 3. There exist positive constants c₁, c₂, c₃ depending only on k and s such that for every Borel-measurable function f on R^k satisfying

  M_r(f) < ∞   (18.7)

for an integer r, 0 ≤ r ≤ s, one has

  |∫ f d(Q_n − Φ)| ≤ c₁ M_r(f) δ_{n,s} n^{−(s−2)/2} + c₂ ω̄_g(c₃ δ_{n,s} n^{−(s−2)/2}: Φ_{r₀}),   (18.8)

where r₀ = r if r is even, r₀ = r + 1 if r is odd,

  g(x) = (1 + ||x||^{r₀})^{−1} f(x) if r > 0,   g(x) = f(x) if r = 0,   (18.9)

and Φ_{r₀} is the measure

  Φ_{r₀}(dx) = (1 + ||x||^{r₀}) Φ(dx).   (18.10)
Proof. Let the truncated random vectors Y_j, Z_j (1 ≤ j ≤ n) be defined as in (14.2). Let Q'_n, Q''_n be the distributions of n^{−1/2}(Y₁ + ⋯ + Y_n), n^{−1/2}(Z₁ + ⋯ + Z_n), respectively. Let

  a_n = n^{−1/2} Σ_{j=1}^n EY_j,   D = n^{−1} Σ_{j=1}^n Cov(Y_j),   (18.11)

and write Φ'' = Φ_{−a_n,D}, Φ' = Φ_{0,D}. As in the proof of Theorem 15.1,

  |∫ f d(Q_n − Φ)| ≤ |∫ f d(Q_n − Q''_n)| + |∫ f d(Q''_n − Φ'')| + |∫ f d(Φ'' − Φ)|.   (18.12)

Comparing ∫ ||x||^s Q_n(dx) with ∫ ||x||^s Φ(dx)   (18.13)

one may check that (18.8) is trivially true if (15.11) is violated, and one may therefore assume, without loss of generality, that (15.11) holds.

Next apply Theorem 16.1 to the random vectors Z₁, …, Z_n [letting s = 3 in (16.12)] to get

  |∫ g'' d(Q''_n − Φ'')| ≤ c₄ max(1, Λ''^{r/2}) M_r(f) λ''^{−3/2} ρ''₃ n^{−1/2}
      + c₅ ω̄_{g''}(c₆ ρ''₃ n^{−1/2}: Φ''_{r₀}),   (18.14)

where λ'', Λ'' are the smallest and largest eigenvalues, respectively, of D, and

  ρ''₃ = n^{−1} Σ_{j=1}^n E||Z_j||³,
  g''(x) = (1 + ||x||^{r₀})^{−1} (f∘B^{−1})(x) if r > 0,   g''(x) = (f∘B^{−1})(x) if r = 0.   (18.15)

Here B is the symmetric, positive-definite matrix satisfying B² = D^{−1}. By (14.10) one has

  ρ''₃ n^{−1/2} ≤ 2³ n^{−3/2} Σ_{j=1}^n E||Y_j||³ ≤ 2³ δ_{n,s} n^{−(s−2)/2}.   (18.16)

Now, writing

  g''(x) = (1 + ||B^{−1}x + a_n||^{r₀})^{−1} (f∘B^{−1})(x) for r > 0,

it is simple to check that [see (14.12), (14.19), (14.81), and (18.5)]

  |g''(x) − g(x)| ≤ M_{r₀}(f) | ||B^{−1}x + a_n||^{r₀} − ||x||^{r₀} | (1 + ||x||^{r₀})^{−1}
    ≤ c₇ M_r(f) δ_{n,s} n^{−(s−2)/2}.

As a consequence,

  ω̄_{g''}(ε: Φ_{r₀}) ≤ ω̄_g(ε: Φ_{r₀}) + c₈ M_r(f) δ_{n,s} n^{−(s−2)/2}.   (18.17)

Changing variables x → B^{−1}x + a_n and using Lemma 14.6,

  ω̄_{g''}(ε: Φ''_{r₀}) = sup_{y∈R^k} ∫ ω_{g''}(x + y: ε) Φ''_{r₀}(dx)
    ≤ sup_{y∈R^k} ∫ ω_g(B^{−1}(x + y) + a_n: ||B^{−1}|| ε) Φ''_{r₀}(dx)
    ≤ sup_{y∈R^k} ∫ ω_g(z + y: c₉ ε)[1 + ||B(z − a_n)||^{r₀}] Φ_{a_n,D}(dz)
    ≤ c₁₀ ω̄_g(c₉ ε: Φ_{r₀}) + c₁₂ M_r(f) δ_{n,s} n^{−(s−2)/2}.   (18.18)

The proof is now complete by inequalities (18.12), (18.14), (18.16), (18.17), and (18.18). Q.E.D.

COROLLARY 18.2. For each n ≥ 1 let X₁^{(n)}, …, X_{k_n}^{(n)} be independent random vectors with values in R^k, having zero means and an average positive-definite covariance matrix V_n = k_n^{−1} Σ_{j=1}^{k_n} Cov(X_j^{(n)}). Let G_n denote the distribution of k_n^{−1/2} T_n (X₁^{(n)} + ⋯ + X_{k_n}^{(n)}), where T_n is the symmetric, positive-definite matrix satisfying T_n² = V_n^{−1}, n ≥ 1. Suppose k_n → ∞ as n → ∞. If

  θ_n(ε) = k_n^{−1} Σ_{j=1}^{k_n} ∫_{{||T_n X_j^{(n)}|| > ε k_n^{1/2}}} ||T_n X_j^{(n)}||² dP → 0 as n → ∞   (18.19)

for every ε > 0, then {G_n : n ≥ 1} converges weakly to the standard normal distribution Φ.
Proof. Apply Theorem 18.1 (with r = r₀ = 0) to the random vectors T_n X_j^{(n)}, 1 ≤ j ≤ k_n. For every Lipschitzian function f bounded by one (or for the indicator function f of an arbitrary Borel-measurable convex set) one has

  |∫ f d(G_n − Φ)| ≤ (c₁ + c₁₃ d) δ'_n,   (18.20)

where

  δ'_n = inf_{0<ε≤1} [ε k_n^{−1/2} k_n^{−1} Σ_{j=1}^{k_n} ∫_{{||T_n X_j^{(n)}|| ≤ ε k_n^{1/2}}} ||T_n X_j^{(n)}||³ dP + θ_n(ε)],   (18.21)

and d depends on f. Note that

  δ'_n ≤ ε k_n^{−1} Σ_{j=1}^{k_n} E||T_n X_j^{(n)}||² + θ_n(ε) = kε + θ_n(ε)   (0 < ε ≤ 1).

Given η > 0, choose ε = η/(2k) and let n₀(η) be an integer such that for n ≥ n₀(η) one has θ_n(η/(2k)) ≤ η/2. Then δ'_n ≤ η for all n ≥ n₀(η). Q.E.D.

The above corollary is an extension of Lindeberg's central limit theorem to the multidimensional case.
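Condition (18.19) is the Lindeberg condition for the triangular array. When the standardized summands share a common one-dimensional law, it reduces to the truncated second moment E[X²; |X| > ε k_n^{1/2}], which is trivial to evaluate for a finite discrete law. A minimal Python sketch (ours, with a hypothetical function name):

```python
def lindeberg_ratio(law, k_n, eps):
    # law: list of (value, probability) pairs for the common distribution of
    # the standardized summands X_j^(n); returns E[X^2 ; |X| > eps * k_n^{1/2}],
    # the Lindeberg quantity of (18.19) in the one-dimensional i.i.d. case.
    cutoff = eps * k_n ** 0.5
    return sum(v * v * p for v, p in law if abs(v) > cutoff)

rademacher = [(1.0, 0.5), (-1.0, 0.5)]
```

For bounded summands the ratio drops to zero exactly when ε k_n^{1/2} exceeds the bound on |X|, which is why (18.19) holds automatically for uniformly bounded arrays.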
COROLLARY 18.3. If in Corollary 18.2 one replaces (18.19) by

  E||X_j^{(n)}||^s < ∞   (1 ≤ j ≤ k_n, n ≥ 1)   (18.22)

for some s, 2 < s < 3, then

  sup_{C∈𝒞} |G_n(C) − Φ(C)|
    ≤ c₁₄ inf_{0<ε≤1} [ε^{3−s} k_n^{−(3−s)/2} k_n^{−1} Σ_{j=1}^{k_n} ∫_{{||T_n X_j^{(n)}|| ≤ ε k_n^{1/2}}} ||T_n X_j^{(n)}||³ dP
        + k_n^{−1} Σ_{j=1}^{k_n} ∫_{{||T_n X_j^{(n)}|| > ε k_n^{1/2}}} ||T_n X_j^{(n)}||^s dP]
    ≤ c₁₄ k_n^{−(s−2)/2} k_n^{−1} Σ_{j=1}^{k_n} E||T_n X_j^{(n)}||^s,   (18.23)

where 𝒞 is the class of all Borel-measurable convex subsets of R^k, and c₁₄ depends only on k. 

Proof. The first inequality in (18.23) follows from Theorem 18.1 (with r = 0 = r₀) applied to the random vectors T_n X_j^{(n)}, 1 ≤ j ≤ k_n, and from Corollary 3.2 (with s = 0). The second inequality is obtained from the first by letting ε = 1 in the expression within square brackets. Q.E.D.

The above corollary contains a multidimensional extension of Liapounov's central limit theorem: {G_n : n ≥ 1} converges weakly to Φ if

  lim_{n→∞} k_n^{−(s−2)/2} (k_n^{−1} Σ_{j=1}^{k_n} E||T_n X_j^{(n)}||^s) = 0   (18.24)

for some s, 2 < s < 3. The first inequality in (18.23), however, is sharper. For example, if {X_n : n ≥ 1} is a sequence of independent and identically distributed random vectors with common mean zero, common positive-definite covariance matrix V, and finite sth absolute moments for some s, 2 < s < 3, then one has [letting k_n = n, ε = n^{−1/4} in (18.23)]

  sup_{C∈𝒞} |Prob(n^{−1/2}(X₁ + ⋯ + X_n) ∈ C) − Φ_{0,V}(C)| = o(n^{−(s−2)/2})   (n → ∞).   (18.25)

One may in the same manner obtain analogs of Theorem 17.6 and the mean central limit theorem Corollary 17.7. For example, if k = 1, then there exists a constant c₁₅ depending only on p such that under the hypothesis of Theorem 18.1 (with k = 1)

  ||F_n − Φ||_p ≤ c₁₅ δ_{n,s} n^{−(s−2)/2},   (18.26)

where F_n(·) is the distribution function of n^{−1/2}(X₁ + ⋯ + X_n), and Φ(·) is the standard normal distribution function on R¹. If {X_n : n ≥ 1} is an independent and identically distributed sequence of random variables, then the right side is o(n^{−(s−2)/2}) as n → ∞.
NOTES

The first central limit theorem was proved for i.i.d. Bernoulli random variables by DeMoivre [1]; Laplace [1] elucidated and refined it, and also gave a statement (as well as some reasoning for its validity) of a rather general central limit theorem. Chebyshev [1] proved (with a complement due to Markov [1]) the first general central limit theorem by his famous method of moments; however, Chebyshev's moment conditions were very severe. Then came Liapounov's pioneering investigations [1, 2], in which he introduced the characteristic function into probability theory and used it to prove convergence to the normal distribution under the extremely mild hypothesis (18.24) (for k = 1). Finally Lindeberg [1] proved Corollary 18.2 (for k = 1). In the i.i.d. case this reduces to the so-called classical central limit theorem: if {X_n : n ≥ 1} is a sequence of i.i.d. random variables each with mean zero and variance one, then the distribution of n^{−1/2}(X₁ + ⋯ + X_n) converges weakly to the standard normal distribution Φ. This classical central limit theorem was also proved by Lévy [1] (p. 233). Feller [1] proved that the Lindeberg condition (18.19) is also necessary in order that (i) the distribution of k_n^{−1/2} s_n^{−1}(X₁^{(n)} + ⋯ + X_{k_n}^{(n)}) converge weakly to Φ and (ii) m_n/(k_n s_n²) → 0 as n → ∞; here k = 1, and we write s_n² for V_n, s_n^{−1} for T_n, and m_n = max{var(X_j^{(n)}) : 1 ≤ j ≤ k_n}. Many authors have obtained multidimensional extensions of the central limit theorem, for example, Bernstein [1], Khinchin [1, 2]; the Lindeberg-Feller theorem was extended to R^k by Takano [1].

Section 11. Lemma 11.1-Corollary 11.5 are due to Bhattacharya [1-5]. These easily extend to metric groups, and Bhattacharya [6] used them to derive rates of convergence of the n-fold convolution of a probability measure on a compact group to the normalized Haar measure as n → ∞. Lemma 11.6 is perhaps well known to analysts.

Section 12. The first result on the speed of convergence is due to Liapounov [2], who proved

  sup_{x∈R¹} |F_n(x) − Φ(x)| ≤ c ρ₃ n^{−1/2} log n   (n ≥ 2),   (N.1)

where F_n is the distribution function of the normalized sum of n independent random variables with zero means, average variance one (normalization), and average third absolute moment ρ₃. Under an additional hypothesis [Cramér's condition (20.1)] Cramér [1, 3] (Chapter VII) was able to remove the factor log n from (N.1). The best result was obtained independently by Berry [1] and Esseen [1], who eliminated the logarithmic factor in (N.1) under Liapounov's hypothesis. The constants appearing in Theorem 12.4 (the Berry-Esseen theorem) are not the best known. After initial work by Zolotarev [1, 2], the constant 0.7975 [appearing in (12.49)] was obtained by van Beek [1]. Lemmas 12.1 and 12.2 are refinements due to Zolotarev [1] of some inequalities of Berry [1] and Esseen [1]. The multidimensional extension of the Berry-Esseen theorem is due to Bergström [1], who used an ingenious induction argument. Bergström's approach does not require the Fourier-analytic machinery and has been extended in recent years by Bergström [2], Sazonov [1, 2], and Paulauskas [1].

Sections 13-18. Apart from a special (and deep) result of Esseen [1] [see Notes, Chapter 5] the first result going beyond distribution functions was obtained by Rao [1, 2], who proved for the class 𝒞 of all Borel-measurable convex subsets of R^k an estimate (N.2) of |Q_n(C) − Φ(C)|, uniform over 𝒞, of the order n^{−1/2} up to a logarithmic factor; here Q_n is the distribution of the normalized sum of n i.i.d. random vectors. von Bahr [3] and Bhattacharya [1, 2] independently extended it to much more general classes of sets [e.g., the class 𝔅_α(d: Φ) in (17.3)] and at the same time made it precise by eliminating the logarithmic factor on the right side of (N.2). The moment condition in Bhattacharya [1, 2] was ρ_{3+δ} < ∞ for some δ > 0, whereas von Bahr [3] essentially assumed that the random vectors are i.i.d. and that ρ₃ < ∞, ρ_{k+1} < ∞. For the class 𝒞, Sazonov [1] finally relaxed the moment condition to ρ₃ < ∞, proving Corollary 17.2 in the i.i.d. case (Bergström [3] later proved this independently of Sazonov), while Rotar' [1] relaxed it for the general non-i.i.d. case. For more general classes of sets this relaxation of the moment condition is due to Bhattacharya [7]. Paulauskas [1] also has a result that goes somewhat beyond the class 𝒞. The results of Section 13 are due to Bhattacharya [3], although the explicit computation of constants given here is new. The first effective use of truncation in the present context is due to Bikjalis [4]; Lemma 14.1 and Corollary 14.2 are essentially due to him. Lemma 14.3 is due to Rotar' [1]. Lemmas 14.6 and 14.8 were obtained by Bhattacharya [7]; a result analogous to the inequality (14.107) was obtained earlier by Bikjalis [4]. Analogs of Lemma 14.7 were obtained earlier by Doob [1], pp. 225-228, for a stationary Markov chain, by Brillinger [1] for the i.i.d. case, and by von Bahr [1] for the case considered by us; but we are unable to deduce the present explicit form needed by us from their results. Theorems 15.1, 15.4, and Corollary 15.2 are due to Bhattacharya [7], as is the present form of Corollary 15.3; earlier, a version of Corollary 15.3 was independently proved by von Bahr [3] and Bhattacharya [1, 2]. Theorems 17.1, 17.4, 17.8-17.10, and Corollary 17.3 are due to Bhattacharya [4, 5, 7]. Corollaries 17.5 and 17.12 were proved by von Bahr [2, 3] in the i.i.d. case; the corresponding results (Theorems 17.4, 17.11, and Corollary 17.13) in the non-i.i.d. case are new. The first global, or mean, central limit theorems are due to Esseen [1, 3] and Agnew [1]. The fairly precise result Corollary 17.7 was proved for s = 3 by Nagaev [1] in the i.i.d. case (a slightly weaker result was proved earlier by Esseen [1]) and later by Bikjalis [2] in the non-i.i.d. case; afterwards, the much more powerful Theorem 17.6 was proved by Rotar' [1] for s = 3. Rotar' [1] also stated a result which implies Theorem 17.6 for all s > 3; however, we are unable to verify it.

Theorem 18.1 is new, as is perhaps Corollary 18.3; however, Osipov and Petrov [1] and Feller [2] contain fairly general inequalities for the difference between the distribution functions F_n and Φ in the non-i.i.d. case in one dimension. More precise results than (18.25), (18.26) are known in one dimension. Ibragimov [1] has proved the following result. Suppose that {X_n : n ≥ 1} is a sequence of i.i.d. random variables each with mean zero and variance one, and let 0 < δ < 1. Then ||F_n − Φ||_∞ = O(n^{−δ/2}) if and only if

  ∫_{{|X₁| > x}} X₁² dP = O(x^{−δ})   (x → ∞);   (N.3)

also, ||F_n − Φ||_∞ = O(n^{−1/2}) if and only if (N.3) holds with δ = 1 and

  ∫_{{|X₁| ≤ x}} X₁³ dP = O(1)   (x → ∞);   (N.4)

here ||F_n − Φ||_∞ denotes the Kolmogorov distance [see (2.72)]. Under the same hypothesis Heyde [1] has shown that if 0 < δ < 1, then

  Σ_{n=1}^∞ n^{−1+δ/2} ||F_n − Φ||_∞ < ∞   (N.5)

if and only if E|X₁|^{2+δ} < ∞; also (N.5) holds with δ = 0 if and only if E(X₁² log(1 + |X₁|)) < ∞. An extension of Ibragimov's results to R^k has been recently obtained by Bikjalis [6].
CHAPTER 4
Asymptotic ExpansionsNon lattice Distributions
We have seen in Chapter 2 that the characteristic function of Q_n (the distribution of the nth normalized partial sum of a sequence {X_n : n ≥ 1} of independent and identically distributed random vectors) admits an asymptotic expansion in powers of n^{−1/2}, in the sense that the remainder is of a smaller order of magnitude than the last term in the expansion. If the sth absolute moment of X_1 is finite for some integer s ≥ 3, then there exist functions P_r(−φ : {χ_ν}), 0 ≤ r ≤ s − 2, which are polynomial multiples of the standard normal density φ, with P_0 = φ, such that

∫ exp{i⟨t, x⟩} Q_n(dx) = Σ_{r=0}^{s−2} n^{−r/2} ∫ exp{i⟨t, x⟩} P_r(−φ : {χ_ν})(x) dx + o(n^{−(s−2)/2})   (t ∈ R^k, n → ∞).   (1)
In fact, we know from Theorem 9.12 that one may replace the function x ↦ exp{i⟨t, x⟩} by any polynomial multiple of it of degree s or less. Unfortunately, such an expansion of ∫ f dQ_n does not hold for a large enough class of functions (e.g., indicator functions of even those sets that have "smooth" boundaries) unless some further assumption is made on the nature of the distribution of X_1. In Section 19 we assume that the distribution of X_1 has a bounded density (or at least that an r-fold convolution of it has one, for some r), and show that in this case there is an expansion of the density of Q_n as well as of ∫ f dQ_n for every bounded measurable f. In Section 20 the assumption of Section 19 is relaxed to read "the distribution of X_1 satisfies Cramer's condition (20.1)," and an expansion of ∫ f dQ_n is obtained for a very large class of functions f, although not for every
Expansions of Densities
189
bounded measurable f. In particular, one has an expansion of Q_n(A) for every Borel set A satisfying

Φ((∂A)^ε) = o((−log ε)^{−(s−2)/2})   (ε ↓ 0),   (2)

the remainder term in the expansion being o(n^{−(s−2)/2}) uniformly over every class 𝓐 of sets A satisfying (2) uniformly.
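Before turning to densities, the shape of expansion (1) can be illustrated numerically in one dimension (an added sketch; the value μ_3 = 1.5 is a hypothetical third moment, with mean zero and unit variance assumed). For s = 3 and k = 1 the single correction term is P_1(−φ : {χ_ν})(x) = (μ_3/6)(x³ − 3x)φ(x), since −φ'''(x) = (x³ − 3x)φ(x); the check below verifies that this term integrates to zero, so the expansion keeps total mass one, and that it supplies exactly the third moment μ_3 missing from the leading normal term.

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def p1(x, mu3):
    # first Edgeworth correction term: a polynomial multiple of phi
    return (mu3 / 6.0) * (x**3 - 3.0 * x) * phi(x)

def integrate(f, a=-12.0, b=12.0, steps=200000):
    # plain midpoint rule; crude but adequate on [-12, 12]
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

mu3 = 1.5  # hypothetical third moment (mean 0, variance 1 assumed)

total = integrate(lambda x: p1(x, mu3))          # mass of the correction
third = integrate(lambda x: x**3 * p1(x, mu3))   # third moment it adds
```

The second integral uses ∫(x⁶ − 3x⁴)φ(x) dx = 15 − 9 = 6, so the correction term contributes (μ_3/6)·6 = μ_3 to the third moment, matching the n^{−1/2}μ_3 third moment of Q_n.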
19. LOCAL LIMIT THEOREMS AND ASYMPTOTIC EXPANSIONS FOR DENSITIES
Suppose that {X_n : n ≥ 1} is a sequence of independent random vectors such that the distributions {Q_n : n ≥ 1} of their normalized partial sums have densities {q_n : n ≥ 1}, at least for large n, satisfying

lim_{n→∞} q_n(x) = ψ(x)   (x ∈ R^k),   (19.1)

where ψ is the density of the limiting distribution Ψ (e.g., the standard normal distribution on R^k). Then the assertion (19.1) is called a local limit theorem (for densities of {Q_n : n ≥ 1}). We shall always take the continuous version of the density q_n, if there exists one. Recall that by Scheffe's theorem (Lemma 2.1), (19.1) implies

lim_{n→∞} ||Q_n − Ψ|| = 0.   (19.2)

We shall usually consider uniform local limit theorems, that is, assertions of the form

lim_{n→∞} sup_{x∈R^k} |q_n(x) − ψ(x)| = 0.   (19.3)
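The passage from (19.1) to (19.2) via Scheffe's theorem is short, and we record the one-line argument here (an added note; only dominated convergence is used):

```latex
% Scheffe: pointwise convergence of densities implies L^1 convergence.
% Since \int q_n = \int \psi = 1, the positive and negative parts of
% \psi - q_n have equal integrals, so
\lim_{n\to\infty}\|Q_n - \Psi\|
  = \lim_{n\to\infty}\int_{\mathbb{R}^k}\lvert q_n - \psi\rvert\,dx
  = 2\lim_{n\to\infty}\int_{\mathbb{R}^k}(\psi - q_n)^{+}\,dx = 0,
% because 0 \le (\psi - q_n)^{+} \le \psi and (\psi - q_n)^{+} \to 0
% pointwise, so dominated convergence applies.
```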
Since assertions involving densities are made in this section under hypotheses guaranteeing existence of continuous versions of q_n for all large n, there is no scope for ambiguity in interpreting (19.3) or similar statements.

THEOREM 19.1 Let {X_n : n ≥ 1} be a sequence of i.i.d. random vectors with values in R^k with

EX_1 = 0,   Cov(X_1) = V,   (19.4)

where V is a symmetric positive-definite matrix. Let Q_n denote the distribution of n^{−1/2}(X_1 + ⋯ + X_n) (n = 1, 2, …). The following statements are all equivalent.

(i) Q̂_1 ∈ L^p(R^k) for some p ≥ 1.
Expansions—Nonlattice Distributions
190
(ii) For every sufficiently large n, Q_n has a density q_n and

lim_{n→∞} sup_{x∈R^k} |q_n(x) − φ_{0,V}(x)| = 0.   (19.5)
(iii) There exists an integer m such that Q_1^{*m} (or, equivalently, Q_m) has a bounded (almost everywhere) density.

Proof. We first show that (i)⇒(ii). If |Q̂_1|^p is integrable, then so is |Q̂_n| for every n ≥ p. This implies [by the Fourier inversion theorem 4.1(iv)] that, for n ≥ p, Q_n has a bounded continuous density q_n and

sup_{x∈R^k} |q_n(x) − φ_{0,V}(x)| ≤ (2π)^{−k} ∫ |Q̂_n(t) − φ̂_{0,V}(t)| dt   (n ≥ p).   (19.6)

By the classical central limit theorem (see, e.g., Corollary 18.2 specialized to the i.i.d. case)

lim_{n→∞} ∫_{{||t|| ≤ a}} |Q̂_n(t) − φ̂_{0,V}(t)| dt = 0   (19.7)

for each positive a. Also, since

Q̂_1(t) = 1 − ½⟨t, Vt⟩ + o(||t||²)   (t → 0),   (19.8)

there exists a positive number b such that

|Q̂_1(t)| ≤ 1 − ¼⟨t, Vt⟩ ≤ exp{−¼⟨t, Vt⟩}   (||t|| ≤ b).   (19.9)

This implies

|Q̂_n(t)| = |Q̂_1(t n^{−1/2})|^n ≤ exp{−¼⟨t, Vt⟩}   (||t|| ≤ b n^{1/2}).   (19.10)

Let

δ = sup{|Q̂_1(t)| : ||t|| ≥ b}.   (19.11)

Then δ < 1, since δ = 1 implies sup{|Q̂_1(t)|^m : ||t|| ≥ b} = 1 for all m ≥ 1; but the Riemann–Lebesgue lemma [Theorem 4.1(iii)] applies for m ≥ p, so that there must exist t_0 ∈ R^k such that |Q̂_1(t_0)| = 1, which means that X_1 assigns all its mass to a countable set of parallel hyperplanes (see Section 21); this would imply singularity of Q_m with respect to Lebesgue measure for all m ≥ 1, contradicting the fact that Q_m is absolutely continuous for all m ≥ p.
Next, for ||t|| > b n^{1/2},

|Q̂_n(t)| = |Q̂_1(t n^{−1/2})|^n ≤ (sup{|Q̂_1(s)| : ||s|| > b})^{n−p} |Q̂_1(t n^{−1/2})|^p
  = δ^{n−p} |Q̂_1(t n^{−1/2})|^p   (n ≥ p).   (19.12)
Now

lim sup_{n→∞} ∫ |Q̂_n(t) − φ̂_{0,V}(t)| dt ≤ lim_{n→∞} ∫_{{||t|| ≤ a}} |Q̂_n(t) − φ̂_{0,V}(t)| dt
  + 2 ∫_{{||t|| > a}} exp{−¼⟨t, Vt⟩} dt + lim_{n→∞} δ^{n−p} ∫_{{||t|| > b n^{1/2}}} |Q̂_1(t n^{−1/2})|^p dt
  = 2 ∫_{{||t|| > a}} exp{−¼⟨t, Vt⟩} dt   (19.13)

for all a > 0. Letting a → ∞, one gets

lim_{n→∞} sup_{x∈R^k} |q_n(x) − φ_{0,V}(x)| ≤ (2π)^{−k} lim_{n→∞} ∫ |Q̂_n(t) − φ̂_{0,V}(t)| dt = 0.   (19.14)

Next note that (19.5) clearly implies boundedness of each q_n for all sufficiently large n. Hence (ii)⇒(iii). To complete the proof we show that (iii)⇒(i). If q_m is bounded above by c, then q_m² ≤ c q_m, so that q_m ∈ L²(R^k). This implies Q̂_m ∈ L²(R^k) [Theorem 4.1(vi)]; that is, Q̂_1 ∈ L^{2m}(R^k). Q.E.D.

The three statements (i), (ii), and (iii) are each equivalent to (iv): There exist r > 1 and an integer m such that Q_1^{*m} has a density belonging to L^r(R^k). It is clear that (iii)⇒(iv) with r = 2. Conversely, by the so-called Hausdorff–Young theorem,† if f ∈ L¹(R^k) ∩ L^r(R^k) for some r ∈ (1, 2), then f̂ ∈ L^{r'}(R^k), where r' = r/(r − 1), and

||f̂||_{r'} ≤ (2π)^{k/r'} ||f||_r.   (19.15)

†See Katznelson [1], p. 142.
Hence if (iv) holds with r ∈ (1, 2), then Q̂_1^m ∈ L^{r'}(R^k); that is, Q̂_1 ∈ L^{mr'}(R^k). If (iv) holds for some r > 2, then q_m ∈ L²(R^k) (since q_m² ≤ q_m + q_m^r) and, therefore, Q̂_1 ∈ L^{2m}(R^k). As an example of an absolutely continuous probability measure on R¹ having compact support (and mean zero) and density q_1 such that q_1^{*m} (hence q_m) is unbounded for every m, define†

h(x) = ½(log 2)(x log² x)^{−1}   if x ∈ (0, ½),
h(x) = h(−x),   h(x) = 0 otherwise,
q_1 = h * h   (x ∈ R¹).   (19.16)
The following result, which provides asymptotic expansions for densities, is more important from our point of view than Theorem 19.1.

THEOREM 19.2 Let {X_n : n ≥ 1} be a sequence of i.i.d. random vectors with values in R^k, having a (common) mean zero and a positive-definite covariance matrix V. Assume that ρ_s ≡ E||X_1||^s < ∞ for some integer s ≥ 3 and that the characteristic function Q̂_1 of X_1 belongs to L^p(R^k) for some p ≥ 1. Then a bounded continuous density q_n of the distribution Q_n of n^{−1/2}(X_1 + ⋯ + X_n) exists for every n ≥ p, and one has the asymptotic expansion

sup_{x∈R^k} (1 + ||x||^s) |q_n(x) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−φ_{0,V} : {χ_ν})(x)| = o(n^{−(s−2)/2})   (n → ∞),   (19.17)
where χ_ν denotes the νth cumulant of X_1 (3 ≤ |ν| ≤ s).

Proof. Without loss of generality, assume p to be an integer (else, take [p] + 1 for p). For n ≥ p + s, D^α Q̂_n is integrable for 0 ≤ |α| ≤ s. Writing, for n ≥ p + s and |α| ≤ s,

h_n(x) = x^α (q_n(x) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−φ_{0,V} : {χ_ν})(x))   (x ∈ R^k),

ĥ_n(t) = (−i)^{|α|} D^α (Q̂_n(t) − Σ_{r=0}^{s−2} n^{−r/2} P_r(it : {χ_ν}) exp{−½⟨t, Vt⟩}),   (19.18)

†See Gnedenko and Kolmogorov [1], pp. 223, 224 for a proof of the unboundedness of q_m for every m.
one has (by the Fourier inversion theorem)

h_n(x) = (2π)^{−k} ∫ exp{−i⟨t, x⟩} ĥ_n(t) dt   (x ∈ R^k).   (19.19)
Let B be the positive-definite symmetric matrix satisfying B² = V^{−1}. Define

η_s = E||B X_1||^s.   (19.20)

By Theorem 9.12 (and the remark following it),

|ĥ_n(t)| ≤ δ(n) n^{−(s−2)/2} (⟨t, Vt⟩^{(s−|α|)/2} + ⟨t, Vt⟩^{(3(s−2)+|α|)/2}) exp{−¼⟨t, Vt⟩}   (19.21)

for all t satisfying

||t|| ≤ c n^{1/2} / (η_s^{1/(s−2)} Λ^{1/2}) ≡ a n^{1/2},   (19.22)

say, where Λ is the largest eigenvalue of V and δ(n) → 0 as n → ∞. In view of (19.19), (19.21), and (19.22), it is enough to show that

∫_{{||t|| > a n^{1/2}}} |D^α Q̂_n(t)| dt = o(n^{−(s−2)/2})   (n → ∞),

∫_{{||t|| > a n^{1/2}}} |D^α (Σ_{r=0}^{s−2} n^{−r/2} P_r(it : {χ_ν}) exp{−½⟨t, Vt⟩})| dt = o(n^{−(s−2)/2})   (n → ∞).   (19.23)
The second assertion in (19.23) is true because of the presence of the exponential term. The first follows easily from the estimate (obtained by an application of Leibniz' formula for differentiation of a product of functions)

|D^α Q̂_n(t)| ≤ c(s, k) ρ_{|α|} n^{|α|/2} δ^{n−|α|−p} |Q̂_1(t n^{−1/2})|^p   (||t|| > a n^{1/2}),   (19.24)

where n ≥ p + s, |α| ≤ s, and from

δ ≡ sup{|Q̂_1(t)| : ||t|| > a} < 1.   (19.25)

Q.E.D.
Remark. It should be pointed out that Theorem 19.2 holds even with s = 2. This is true because Theorem 9.12 holds with s = 2. Therefore a sharper assertion than (19.5) holds, namely,

lim_{n→∞} sup_{x∈R^k} (1 + ||x||²) |q_n(x) − φ_{0,V}(x)| = 0.   (19.26)
The next theorem deals with the non-i.i.d. case.

THEOREM 19.3 Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k having zero means and average positive-definite covariance matrices V_n for large n. Assume that

lim sup_{n→∞} n^{−1} Σ_{j=1}^{n} E||B_n X_j||^s < ∞   (19.27)

for some integer s ≥ 3, where B_n is the positive-definite symmetric matrix satisfying

B_n² = V_n^{−1},   V_n = n^{−1} Σ_{j=1}^{n} Cov(X_j),   (19.28)

defined for all sufficiently large n. Also, assume that there exists a positive integer p such that the functions

g_{m,n}(t) = Π_{j=m+1}^{m+p} |E(exp{i⟨t, B_n X_j⟩})|   (0 ≤ m ≤ n − p, n ≥ p + 1)

satisfy

γ ≡ sup_{0≤m≤n−p, n≥p+1} ∫ g_{m,n}(t) dt < ∞   (19.29)

and, for all positive numbers b,

δ(b) ≡ sup{g_{m,n}(t) : ||t|| > b, 0 ≤ m ≤ n − p, n ≥ p + 1} < 1.   (19.30)

Then the distribution Q_n of n^{−1/2} B_n(X_1 + ⋯ + X_n) has a density q_n for all sufficiently large n, and

sup_{x∈R^k} (1 + ||x||^s) |q_n(x) − Σ_{r=0}^{s−3} n^{−r/2} P_r(−φ : {χ̄_{ν,n}})(x)| = O(n^{−(s−2)/2})   (n → ∞),   (19.31)
where χ̄_{ν,n} is the average of the νth cumulants (3 ≤ |ν| ≤ s) of B_n X_j (1 ≤ j ≤ n).

Proof. For a given nonnegative integral vector α, |α| ≤ s, write (for all large n)

h_n(x) = x^α (q_n(x) − Σ_{r=0}^{s−3} n^{−r/2} P_r(−φ : {χ̄_{ν,n}})(x))   (x ∈ R^k),

ρ̄_s = n^{−1} Σ_{j=1}^{n} E||B_n X_j||^s.   (19.32)
The statements below are meant to hold for all sufficiently large n. By the Fourier inversion theorem, one has

sup_{x∈R^k} |h_n(x)| ≤ (2π)^{−k} ∫ |ĥ_n(t)| dt.   (19.33)
By Theorem 9.11 and hypothesis (19.27), the inequality

|ĥ_n(t)| ≤ b_1 n^{−(s−2)/2} (1 + ||t||^{3(s−2)+|α|}) exp{−¼||t||²}   (19.34)

holds for all t in R^k satisfying

||t|| ≤ b_2 n^{1/2},   (19.35)

where b_1 and b_2 are two appropriate positive numbers independent of n. Also, mimicking the proof of Lemma 14.3 (in this case, however, |α| ≤ s is a necessary condition), one has, under the present hypothesis,

|D^α Q̂_n(t)| ≤ b_3 (1 + ||t||^{|α|}) exp{−||t||²/6}   (19.36)

for all t satisfying

||t|| ≤ b_4 n^{1/2}.   (19.37)

Again, b_3 and b_4, as well as all b's below, are positive numbers independent of n. Now, by (19.33),

sup_{x∈R^k} |h_n(x)| ≤ (2π)^{−k} (I_1 + I_2 + I_3),   (19.38)
where

I_1 = ∫_{{||t|| ≤ b_2 n^{1/2}}} |ĥ_n(t)| dt,   I_2 = ∫_{{b_2 n^{1/2} < ||t|| ≤ b_4 n^{1/2}}} |ĥ_n(t)| dt,
I_3 = ∫_{{||t|| > b_4 n^{1/2}}} |ĥ_n(t)| dt.   (19.39)
The estimate for I_1 follows from (19.34), (19.35). Write

f(t) = D^α (Σ_{r=0}^{s−3} n^{−r/2} P_r(it : {χ̄_{ν,n}}) exp{−½||t||²})   (t ∈ R^k).   (19.40)

Applying Lemma 9.5 to the random vectors B_n X_1, …, B_n X_n, one obtains, using (19.27),

|f(t)| ≤ b_6 (1 + ||t||^{3s}) exp{−½||t||²}   (t ∈ R^k).   (19.41)

Thus by (19.36), (19.37), and (19.41), one gets

I_2 + I_3 ≤ b_6 ∫_{{||t|| > b_2 n^{1/2}}} (1 + ||t||^{3s}) exp{−||t||²/6} dt + ∫_{{||t|| > b_4 n^{1/2}}} |D^α Q̂_n(t)| dt.   (19.42)
As in (19.24), differentiate Q̂_n using Leibniz' formula, so that D^α Q̂_n is expressed as a sum of n^{|α|} terms, a typical term being

a(t) = (Π_{j ∉ {j_1,…,j_m}} E(exp{i⟨t n^{−1/2}, B_n X_j⟩}))
  × (D^{α_1} E(exp{i⟨t n^{−1/2}, B_n X_{j_1}⟩}))^{r_1} ⋯ (D^{α_m} E(exp{i⟨t n^{−1/2}, B_n X_{j_m}⟩}))^{r_m},   (19.43)

where j_1, …, j_m are distinct indices in {1, …, n}, r_1, …, r_m are positive
integers, and α_1, …, α_m are nonnegative (nonzero) integral vectors satisfying Σ_{i=1}^{m} r_i α_i = α. Now

|(D^{α_1} E(exp{i⟨t n^{−1/2}, B_n X_{j_1}⟩}))^{r_1} ⋯ (D^{α_m} E(exp{i⟨t n^{−1/2}, B_n X_{j_m}⟩}))^{r_m}|
  ≤ n^{−|α|/2} (E||B_n X_{j_1}||^{|α_1|})^{r_1} ⋯ (E||B_n X_{j_m}||^{|α_m|})^{r_m}
  ≤ n^{−|α|/2} Π_i (E||B_n X_{j_i}||^{|α|})^{r_i |α_i|/|α|}
  ≤ n^{−|α|/2} n ρ̄_{|α|,n},   (19.44)

where ρ̄_{|α|,n} = n^{−1} Σ_{j=1}^{n} E||B_n X_j||^{|α|}. Also, since m ≤ |α|, there are at least (n − |α|)/(|α| p) − 1 sets of p consecutive indices in {1, 2, …, n} ∖ {j_1, …, j_m}. Hence

∫_{{||t|| > b_4 n^{1/2}}} |a(t)| dt ≤ n^{−|α|/2 + 1} ρ̄_{|α|,n} (δ(b_4))^{(n−|α|)/(|α|p) − 2} ∫_{R^k} g_{m',n}(t n^{−1/2}) dt
  ≤ b_8 n^{−|α|/2 + k/2 + 1} ρ̄_{|α|,n} (δ(b_4))^{(n−|α|)/(|α|p) − 2}   (19.45)

for some m', 0 ≤ m' ≤ n − p, and, therefore,

∫_{{||t|| > b_4 n^{1/2}}} |D^α Q̂_n(t)| dt = o(n^{−(s−2)/2})   (n → ∞).   (19.46)

The desired conclusion (19.31) now easily follows from (19.33), (19.39), (19.42), and (19.46) on taking α = 0, α = (s, 0, …, 0), …, α = (0, 0, …, 0, s).
Q.E.D.

The following variant of Theorem 19.3 is perhaps easier to use.

COROLLARY 19.4 Let {X_n : n ≥ 1} be a sequence of independent random vectors having zero means and finite sth absolute moments {ρ_{s,n} ≡ E||X_n||^s : n ≥ 1} for some integer s ≥ 3. Let V_n = n^{−1} Σ_{j=1}^{n} Cov(X_j) and let λ_n denote the smallest eigenvalue of V_n (n ≥ 1). Write ρ̄_s = n^{−1} Σ_{j=1}^{n} E||X_j||^s. Suppose that

lim inf_{n} λ_n > 0,   sup_{n} ρ̄_s < ∞.   (19.47)

Also assume that there exists an integer p such that the functions g_m (m ≥ 0) defined by

g_m(t) = Π_{j=m+1}^{m+p} |E(exp{i⟨t, X_j⟩})|   (m = 0, 1, 2, …)   (19.48)

satisfy

sup_{m≥0} ∫ g_m(t) dt < ∞,   sup{g_m(t) : ||t|| > b, m ≥ 0} < 1   (all b > 0).   (19.49)

Then (19.31) holds.

Proof. Note that, with B_n defined by (19.28), one has
Then (19.31) holds. Proof. Note that, with B n defined by (19.28), one has
lin n — ' Z EIIB„Xj lls<( Ti IIBBIIs)1i„m ( m
n-1 j EI(X1II5 )
= ( lim /Z lim ( n-' p,; 1 <00, (19.50) n xn n ) j—I a
by (19.47), since II Bn II = A ' /2 . Also, letting gm n be as in Theorem 19.3, f gm.n (t)dt < sup (Det Vn )
sup 0<m
m>O,
n>p+l
n>p+l
2
f gm (t)dt.
(19.51)
But, writing A n for the largest eigenvalue of Vn , Det( Vn ) < An,
A n
(19.52)
Hence (19.29) holds. Finally, gm, (t)
sup
= gm(Bnt) ,
8m n(t) <
O<m
sup
g(z)<1
(19.53)
m>0
n>p+l,
n>p+l
IIziI>b
IllII>bA;'12
for a sufficiently large p. Q.E.D.

If the integer s in Theorem 19.2 (never smaller than 3) is larger than k, then (19.17) immediately implies

||Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−φ_{0,V} : {χ_ν})|| = o(n^{−(s−2)/2})   (n → ∞),   (19.54)

where || · || denotes the variation norm. Similarly, if s ≥ k + 1, then under the hypothesis of Theorem 19.3 one has

||Q_n − Σ_{r=0}^{s−3} n^{−r/2} P_r(−φ : {χ̄_{ν,n}})|| = O(n^{−(s−2)/2})   (n → ∞).
The following theorem deals with the general (i.i.d.) case.

THEOREM 19.5 Let {X_n : n ≥ 1} be a sequence of independent and identically distributed random vectors with mean zero, nonsingular covariance matrix V, and a nonzero, absolutely continuous component. If ρ_s = E||X_1||^s < ∞ for some integer s ≥ 3, then, writing Q_n for the distribution of n^{−1/2}(X_1 + ⋯ + X_n), one has

∫ (1 + ||x||^s) |Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν})| (dx) = o(n^{−(s−2)/2})   (n → ∞),   (19.55)

where χ_ν is the νth cumulant of X_1.

Proof. By changing variables from x to Tx, where T' = T and T² = V^{−1}, one may immediately check that it is enough to prove the theorem for the case V = I. Hence we assume V = I. Define the truncated random vectors Y_{j,n}, Z_{j,n}:

Y_{j,n} = X_j if ||X_j|| ≤ n^{1/2},   Y_{j,n} = 0 if ||X_j|| > n^{1/2},
Z_{j,n} = Y_{j,n} − E Y_{j,n}   (1 ≤ j ≤ n, n ≥ 1).   (19.56)

These are all the same as Y_j, Z_j defined by (14.2); the additional subscript n is introduced to emphasize that the truncations change with n. Let Q_n', Q_n'' denote the distributions of n^{−1/2} Σ_{j=1}^{n} Y_{j,n} and n^{−1/2} Σ_{j=1}^{n} Z_{j,n}, respectively.
We write

Ψ_n' = Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ : {χ_ν}),   Ψ_n'' = Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}}),   (19.57)

where

D_n = Cov(Y_{1,n}) = Cov(Z_{1,n}),   χ_{ν,n} = νth cumulant of Z_{1,n}.   (19.58)

Also, let Ψ_n''' denote the signed measure whose density at x equals the density of Ψ_n'' at x − a_n, where

a_n = n^{−1/2} Σ_{j=1}^{n} E Y_{j,n} = n^{1/2} E Y_{1,n}.   (19.59)
Now by Lemmas 14.6, 14.8,

∫ (1 + ||x||^s) |Q_n − Ψ_n'|(dx) ≤ ∫ (1 + ||x||^s) |Q_n − Q_n'|(dx)
  + ∫ (1 + ||x||^s) |Q_n' − Ψ_n'''|(dx) + ∫ (1 + ||x||^s) |Ψ_n''' − Ψ_n'|(dx)
  = ∫ (1 + ||x||^s) |Q_n' − Ψ_n'''|(dx) + o(n^{−(s−2)/2})   (n → ∞).   (19.60)

Also, by a change of variables (x → x − a_n),

∫ (1 + ||x||^s) |Q_n' − Ψ_n'''|(dx) = ∫ (1 + ||x + a_n||^s) |Q_n'' − Ψ_n''|(dx)
  = ∫ (1 + ||x||^s) |Q_n'' − Ψ_n''|(dx) + o(n^{−(s−2)/2})   (n → ∞),   (19.61)

using (14.81) and the fact that Ψ_n'' has a bounded (uniformly in n) variation norm (since ||Ψ_n'' − Ψ_n'|| → 0 as n → ∞, by Lemma 14.6). Let q_1 denote the density of the absolutely continuous component of Q_1. There exists a positive number c such that

1 ≥ θ ≡ ∫_B q_1(x) dx > 0,   B ≡ {x ∈ R^k : q_1(x) ≤ c}.   (19.62)

Write

B_n = {x ∈ B : ||x|| ≤ n^{1/2}},   θ_n = ∫_{B_n} q_1(x) dx.   (19.63)
Then there exists n_0 such that

θ ≥ θ_n > θ/2   (n > n_0).   (19.64)

The distribution Q_{1,n}' of Y_{1,n} may then be expressed as

Q_{1,n}' = θ_n G_n' + (1 − θ_n) H_n',   (19.65)

where G_n', H_n' are probability measures, and G_n' is absolutely continuous with density

θ_n^{−1} q_1(x) 1_{B_n}(x)   (x ∈ R^k).   (19.66)

Define the function p_n on R^k by

p_n(x) = θ_n^{−1} q_1(x + E Y_{1,n}) 1_{B_n}(x + E Y_{1,n})   (x ∈ R^k).   (19.67)

Then the distribution Q_{1,n}'' of Z_{1,n} may be expressed as

Q_{1,n}'' = θ_n G_n + (1 − θ_n) H_n,   (19.68)

where G_n, H_n are probability measures on R^k, G_n being absolutely continuous with density p_n. Write G for the probability measure on R^k with density q_1 1_B / θ. Then

sup_{t∈R^k} |Ĝ_n'(t) − Ĝ(t)| = sup_{t∈R^k} |θ_n^{−1} ∫_{B_n} exp{i⟨t, x⟩} q_1(x) dx − θ^{−1} ∫_B exp{i⟨t, x⟩} q_1(x) dx|
  ≤ |θ_n^{−1} − θ^{−1}| θ + θ_n^{−1} ∫_{B∖B_n} q_1(x) dx → 0   (n → ∞).   (19.69)

Also observe that G_n has a bounded and, therefore, square-integrable density p_n, so that Ĝ_n ∈ L²(R^k). Clearly,

Ĝ_n(t) = Ĝ_n'(t) exp{i⟨t, −E Y_{1,n}⟩},   |Ĝ_n(t)| = |Ĝ_n'(t)|   (t ∈ R^k).   (19.70)

Using the expression (19.68), one has

(Q_{1,n}'')^{*n} = Σ_{j=0}^{n} (n choose j) θ_n^j (1 − θ_n)^{n−j} G_n^{*j} * H_n^{*(n−j)},   (19.71)
where G_n^{*0} = H_n^{*0} is the probability measure degenerate at zero. Write

Σ' = Σ over {j : |j − nθ_n| ≤ n^{1/2} log n},   Σ'' = Σ over {j : 0 ≤ j ≤ n, |j − nθ_n| > n^{1/2} log n}.   (19.72)

Applying Theorem 17.11 or Corollary 17.13 to a triangular array whose nth row consists of n independent centered Bernoulli random variables with parameter θ_n (note that such an extension of the quoted results is immediate in view of the fact that they deal with tail probabilities of {Q_n : n ≥ 1}, and one only needs to be able to represent each Q_n as the distribution of the normalized sum of n independent random vectors whose average moments and average variance-covariance matrix satisfy the given hypotheses), one has, using (19.64),

Σ'' (n choose j) θ_n^j (1 − θ_n)^{n−j} = o(n^{−m})   (n → ∞),   (19.73)

for every positive integer m. Define the measures G̃_n, H̃_n, M_{n,j} by

G̃_n(A) = G_n(n^{1/2} A),   H̃_n(A) = H_n(n^{1/2} A)   (A ∈ 𝓑^k, n^{1/2}A ≡ {n^{1/2} x : x ∈ A}),
M_{n,j} = (n choose j) θ_n^j (1 − θ_n)^{n−j} G̃_n^{*j} * H̃_n^{*(n−j)}.   (19.74)

Then write

Q_n'' = Σ_{j=0}^{n} M_{n,j}.   (19.75)

One has

∫ (1 + ||x||^s) |Q_n'' − Ψ_n''|(dx) = ∫ (1 + ||x||^s) |Σ' M_{n,j} − Ψ_n''|(dx) + o(n^{−(s−2)/2}),   (19.76)
since

Σ'' (n choose j) θ_n^j (1 − θ_n)^{n−j} [1 + (n^{1/2} + n^{1/2} ||E Y_{1,n}||)^s] ∫ (1 + ||x||^s)(G̃_n^{*j} * H̃_n^{*(n−j)})(dx) = o(n^{−(s−2)/2})   (n → ∞).   (19.77)

Observe that the inequality

||Z_{1,n}|| ≤ n^{1/2} + ||E Y_{1,n}||   (19.78)

implies

G̃_n({||x|| ≤ 1 + n^{−1/2} ||E Y_{1,n}||}) = 1 = H̃_n({||x|| ≤ 1 + n^{−1/2} ||E Y_{1,n}||}),
M_{n,j}({||x|| > n^{1/2} + n^{1/2} ||E Y_{1,n}||}) = 0.   (19.79)

By Lemma 11.6 we have

∫ (1 + ||x||^s) |Σ' M_{n,j} − Ψ_n''|(dx) ≤ c(s, k) max_{|β|∈{0,s}} ∫ |D^β (Σ' M̂_{n,j} − Ψ̂_n'')(t)| dt.   (19.80)

Use Lemma 9.5, the fact that ||D_n − I|| goes to zero as n → ∞, and the relation [see Lemma 14.1(v)]

n^{−r/2} E||Z_{1,n}||^{r+2} = o(n^{−(s−2)/2})   (r ≥ s − 1)   (19.81)

to get

∫ |D^β (Ψ̂''_{n,k+s+1} − Ψ̂_n'')(t)| dt = o(n^{−(s−2)/2})   (n → ∞),   (19.82)

where Ψ''_{n,k+s+1} denotes the corresponding expansion carried out to terms of order n^{−(k+s+1)/2}. Write (the constant c_20 below is the same as in Theorem 9.10)

T_n = D_n^{−1/2},   η̄_r = E||T_n Z_{1,n}||^r,
A_n = {t ∈ R^k : ||t|| ≤ c_20 n^{1/2} / (Λ_n^{1/2} η̄_{k+s+2}^{1/(k+s)})},
A_n' = {t ∈ R^k : ||t|| ≤ n^{1/2} / (16 ρ_3)}   (ρ_3 = E||X_1||³),   (19.83)
where Λ_n is the largest eigenvalue of D_n. Letting

I_1 = ∫_{A_n} |D^β (Q̂_n'' − Ψ̂''_{n,k+s+1})(t)| dt,   I_2 = ∫_{A_n'∖A_n} |D^β Q̂_n''(t)| dt,
I_3 = ∫_{R^k∖A_n'} |D^β Σ'' M̂_{n,j}(t)| dt,   I_4 = ∫_{R^k∖A_n'} |D^β Σ' M̂_{n,j}(t)| dt,
I_5 = ∫_{R^k∖A_n} |D^β Ψ̂''_{n,k+s+1}(t)| dt,   (19.84)

one has

∫ |D^β (Σ' M̂_{n,j} − Ψ̂''_{n,k+s+1})(t)| dt ≤ I_1 + I_2 + I_3 + I_4 + I_5.   (19.85)
By Theorem 9.10 and relation (19.81) (with r = k + s), one has

I_1 ≤ b_9 η̄_{k+s+2} n^{−(k+s)/2} ≤ b_9 ||T_n||^{k+s+2} E||Z_{1,n}||^{k+s+2} n^{−(k+s)/2} = o(n^{−(s−2)/2})   (n → ∞),   (19.86)

since, by Corollary 14.2,

||T_n − I|| = ||D_n^{−1/2} − I|| → 0,   ||T_n^{−1} − I|| = ||D_n^{1/2} − I|| → 0   (n → ∞).   (19.87)

By Lemma 14.1(v) there exists a positive number b_10 such that

n^{1/2} / η̄_{k+s+2}^{1/(k+s)} ≥ n^{1/2} (||T_n||^{k+s+2} E||Z_{1,n}||^{k+s+2})^{−1/(k+s)}
  ≥ n^{1/2} (||T_n||^{k+s+2} 2^{k+s+2} n^{(k+2)/2} ρ_s)^{−1/(k+s)}
  ≥ b_10 n^{(s−2)/(2(k+s))}

for all sufficiently large n. From this and (19.87) it follows that there exists
a positive constant b_11 such that

A_n ⊃ {t ∈ R^k : ||t|| ≤ b_11 n^{(s−2)/(2(k+s))}},
A_n'∖A_n ⊂ {t : ||t|| > b_11 n^{(s−2)/(2(k+s))}, t ∈ A_n'}.   (19.88)

By (19.88) and Lemma 14.3 one has

I_2 = o(n^{−(s−2)/2})   (n → ∞).   (19.89)

Because of the presence of the exponential factor exp{−½||t||²},

I_5 = o(n^{−(s−2)/2})   (n → ∞).   (19.90)

It follows from (19.79) [and (19.74)] that

|D^β M̂_{n,j}(t)| ≤ (n choose j) θ_n^j (1 − θ_n)^{n−j} (n^{1/2} + n^{1/2} ||E Y_{1,n}||)^{|β|}   (0 ≤ j ≤ n),   (19.91)

so that, in view of (19.73),

I_3 = o(n^{−(s−2)/2}).   (19.92)
It remains to estimate I_4. By virtue of (19.64) there exists an integer n_1 such that for all indices j entering in Σ',

j > nθ/4   (n > n_1).   (19.93)

Therefore, for all such j, using (19.79) and the Leibniz formula for differentiating a product of n functions, one obtains on R^k ∖ A_n'

|D^β M̂_{n,j}(t)| ≤ (n choose j) θ_n^j (1 − θ_n)^{n−j} n^{|β|/2} (1 + ||E Y_{1,n}||)^{|β|} δ_n^{j−|β|−2} |Ĝ_n(t n^{−1/2})|²,

and hence

I_4 ≤ b_12 n^{|β|+k/2} δ_n^{nθ/4−|β|−2} ∫ |Ĝ_n(t)|² dt,   (19.94)
where b_12 is a positive constant and

δ_n ≡ sup_{{||t|| > (16ρ_3)^{−1}}} |Ĝ_n(t)|.   (19.95)

By (19.69) and (19.70) (and remembering that G is absolutely continuous),

lim sup_{n→∞} δ_n = sup_{{||t|| > (16ρ_3)^{−1}}} |Ĝ(t)| < 1,   (19.96)

so that δ_n ≤ δ < 1 for all large n (and some δ < 1). Also, by the Plancherel theorem (Theorem 4.2),

∫ |Ĝ_n(t)|² dt = (2π)^k ∫ p_n²(x) dx ≤ (2π)^k c θ_n^{−1},   (19.97)

which is bounded away from infinity. Thus

I_4 = o(n^{−(s−2)/2})   (n → ∞).   (19.98)
The above estimates of I_1, …, I_5 are now used in (19.85), and the resulting estimate is combined with (19.82) to yield [via (19.60), (19.76), (19.80)] the desired result. Q.E.D.

The following corollary is now immediate.

COROLLARY 19.6 Under the hypothesis of Theorem 19.5,

||Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν})|| = o(n^{−(s−2)/2})   (n → ∞),   (19.99)

where || · || denotes the variation norm.
It is possible to prove analogs of Theorem 19.5 (and the above corollary) for a sequence {X_n : n ≥ 1} of nonidentically distributed random vectors, following the same method of proof as above. For example, it is not difficult to check that under the hypothesis of Theorem 19.3 or Corollary 19.4 one has

∫ (1 + ||x||^s) |Q_n − Σ_{r=0}^{s−3} n^{−r/2} P_r(−φ : {χ̄_{ν,n}})| (dx) = O(n^{−(s−2)/2})   (n → ∞),   (19.100)

where the notation is as in Theorem 19.3. Indeed, the proof is simpler because the sums of p consecutive X_n's have uniformly bounded densities. Finally, we observe that the hypothesis of Theorem 19.5 (even when Q_1 is assumed to be absolutely continuous) is not sufficient to provide the uniform local expansion obtained in Theorem 19.2. The discussion following the proof of Theorem 19.1 [as well as the counterexample displayed by (19.16)] proves this point.

Remark. It is sometimes useful to relax in Theorem 19.5 the requirement that Q_1 have a nonzero absolutely continuous component and require instead that Q_1^{*m} have a nonzero absolutely continuous component for some positive integer m. The proof that (19.55) holds under this relaxation is essentially the same as above. Note that Theorem 19.2 is proved under a similar relaxation of the hypothesis that Q_1 has a bounded density.

Expansions Under Cramer's Condition 207
20. ASYMPTOTIC EXPANSIONS UNDER CRAMER'S CONDITION

A probability measure Q on R^k satisfies Cramer's condition if

lim sup_{||t||→∞} |Q̂(t)| < 1.   (20.1)

A probability measure Q having a nonzero, absolutely continuous component (with respect to Lebesgue measure) satisfies (20.1), as a result of the Riemann–Lebesgue lemma [Theorem 4.1(iii)]. In addition, there are many singular probability measures satisfying (20.1). Any Q having a nonzero singular component of this type also satisfies Cramer's condition. Note that (20.1) implies that Q is nondiscrete; if Q is purely discrete, then Q̂ is almost periodic† and the lim sup in (20.1) equals one. Moreover, any Q satisfying (20.1) is nondegenerate, that is, does not assign all its mass to a hyperplane. In fact, (20.1) implies that Q is strongly nonlattice in the sense that there does not exist t_0 ≠ 0 in R^k such that

|Q̂(t_0)| = 1.   (20.2)

Notice that |Q̂(t_0)| = 1 implies |Q̂(n t_0)| = 1 for all integers n, thus violating (20.1). We shall discuss (20.2) and lattice distributions in greater detail in the next chapter. We point out, however, that (20.2) means that the entire mass of Q is carried by a sequence of hyperplanes orthogonal to t_0. From the above discussion it follows that (20.1) is equivalent to

sup_{{||t|| > b}} |Q̂(t)| < 1   (20.3)

for all positive b (or, equivalently, some positive b).

†See Katznelson [1], Chapter VI.5.
THEOREM 20.1 Let {X_n : n ≥ 1} be an i.i.d. sequence of random vectors with values in R^k whose common distribution Q_1 satisfies Cramer's condition (20.1). Assume that Q_1 has mean zero and a finite sth absolute moment for some integer s ≥ 3. Let V denote the covariance matrix of Q_1 and χ_ν its νth cumulant (3 ≤ |ν| ≤ s). Then for every real-valued, Borel-measurable function f on R^k satisfying

M_{s'}(f) < ∞   (20.4)

for some s', 0 ≤ s' ≤ s, one has

|∫ f d(Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν}))| ≤ M_{s'}(f) δ_1(n) + c(s, k) ω̄_f(2e^{−dn} : Φ_{0,V}),   (20.5)

where Q_n is the distribution of n^{−1/2}(X_1 + ⋯ + X_n), d is a suitable positive constant, and

δ_1(n) = o(n^{−(s−2)/2})   (n → ∞).   (20.6)
Moreover, c(s, k) depends only on s and k, and the quantities d, δ_1(n) do not depend on f.

Proof. Assume that V = I, without essential loss of generality. As in (19.56), introduce the truncated random vectors Y_{j,n}, Z_{j,n}. Recall that

M_{s'}(f) = sup_{x∈R^k} (1 + ||x||^{s'})^{−1} |f(x)|   (s' ≥ 0),
ω_f(R^k) = sup_{x,y∈R^k} |f(x) − f(y)|.   (20.7)

Let Q_n' denote the distribution of n^{−1/2}(Y_{1,n} + ⋯ + Y_{n,n}), and Q_n'' that of n^{−1/2}(Z_{1,n} + ⋯ + Z_{n,n}). By Lemma 14.8, for all large n one has

|∫ f d(Q_n − Q_n')| ≤ M_{s'}(f) ∫ (1 + ||x||^s) |Q_n − Q_n'|(dx) ≤ M_{s'}(f) c'(s', s) n^{−(s−2)/2} τ_{n,s},   (20.8)

where

τ_{n,s} = ∫_{{||X_1|| > n^{1/2}}} ||X_1||^s dP = o(1)   (n → ∞).   (20.9)
Writing

a_n = n^{1/2} E Y_{1,n},   (20.10)

one has, by (14.81),

||a_n|| ≤ k^{1/2} n^{−(s−2)/2} τ_{n,s} = o(n^{−(s−2)/2})   (n → ∞).   (20.11)

Now

∫ f dQ_n' = ∫ f_{a_n} dQ_n'',

where f_{a_n}(x) ≡ f(x + a_n), and

|∫ (f_{a_n} − f) d(Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ : {χ_ν}))|
  = |Σ_{r=0}^{s−2} n^{−r/2} ∫ f(x)(P_r(−φ : {χ_ν})(x) − P_r(−φ : {χ_ν})(x − a_n)) dx|
  ≤ c_1(s', s, k) M_{s'}(f) n^{−(s−2)/2} τ_{n,s}.   (20.12)

The last inequality follows from the second inequality in Lemma 14.6. Next use the first inequality in Lemma 14.6 to get

|∫ f_{a_n} d(Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ : {χ_ν}) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}}))|
  ≤ c_2(s', s, k) M_{s'}(f) n^{−(s−2)/2} τ_{n,s},   (20.13)

where

D_n = Cov(Z_{1,n}),   χ_{ν,n} = νth cumulant of Z_{1,n}.   (20.14)

In view of (20.8), (20.11)-(20.13), it is enough to estimate

∫ f_{a_n} d(Q_n'' − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}})).   (20.15)

Write

H_n = Q_n'' − Σ_{r=0}^{s+k−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}}).   (20.16)
We first estimate ∫ f_{a_n} dH_n. By Corollary 11.2,

|∫ f_{a_n} dH_n| ≤ M_{s'}(f) ∫ [1 + (||x|| + ε + ||a_n||)^{s'}] |H_n * K_ε|(dx)
  + Σ_{r=0}^{s+k−2} n^{−r/2} ω̄_{f_{a_n}}(2ε : |P_r(−Φ_{0,D_n} : {χ_{ν,n}})|)   (ε > 0),   (20.17)

where we choose the probability measure K_ε to satisfy

K_ε({x : ||x|| ≤ ε}) = 1,
|D^α K̂_ε(t)| ≤ ε^{|α|} c_3(s, k) exp{−(ε||t||)^{1/2}}   (t ∈ R^k, |α| ≤ s + k + 1).   (20.18)

This is possible by Corollary 10.4 (with s replaced by s + k + 1), on defining K_ε(B) = K(ε^{−1}B) for all Borel sets B. Since ||x|| ≤ Σ_{i=1}^{k} |x_i| and ε will be chosen to be smaller than one, Lemma 11.6 may be used to obtain

∫ [1 + (||x|| + ||a_n|| + ε)^{s'}] |H_n * K_ε|(dx) ≤ c_4(s, s', k) max_{0≤|β|≤s+k} ∫ |D^β (Ĥ_n K̂_ε)(t)| dt.   (20.19)

One can write

D^β (Ĥ_n K̂_ε) = Σ_{0≤α≤β} c_5(α, β)(D^{β−α} Ĥ_n)(D^α K̂_ε).   (20.20)
1
' D$-a1,(t)] [Da K.(t)]Idt< f
IDP-dHH(t)Idt
11'11
(20.21)
where T. is the symmetric, positive-definite matrix defined, for all n> n o , say, by T2 = and, writing A„ for the largest eigenvalue of D,
s+k+l
A„ _
c 7 (s, k)n 1/2
h/(s+k
(EIIT,Z1.1,II An
)
—
I)
'/Z
(20.22)
By Corollary 14.2, {||T_n|| : n ≥ n_0} is bounded, and [by Lemma 14.1(v)]

E||T_n Z_{1,n}||^{s+k+1} ≤ 2^{s+k+1} ||T_n||^{s+k+1} E||Y_{1,n}||^{s+k+1} = O(n^{(k+1)/2})   (n → ∞),
r_n ≥ c_8(s, k) n^{(1/2)(s−2)/(s+k−1)} / ρ_s^{1/(s+k−1)},   (20.23)

where c_8 is positive. Use the first estimate of (20.23) in (20.21) to get

∫_{{||t|| ≤ r_n}} |[D^{β−α} Ĥ_n(t)][D^α K̂_ε(t)]| dt = o(n^{−(s−2)/2})   (n → ∞).   (20.24)

Write

c_n = n^{1/2} / (16 ρ_3).   (20.25)

By Lemma 14.3,

∫_{{||t|| > r_n}} |[D^{β−α} Ĥ_n(t)][D^α K̂_ε(t)]| dt
  ≤ ∫_{{||t|| > c_n}} |[D^{β−α} Q̂_n''(t)][D^α K̂_ε(t)]| dt
  + ∫_{{||t|| > r_n}} c_9(s, k)(1 + ||t||^{|β−α|}) exp{−⅙||t||²} dt
  + ∫_{{||t|| > r_n}} |D^{β−α} (Σ_{r=0}^{s+k−2} n^{−r/2} P_r(it : {χ_{ν,n}}) exp{−½||t||²})| dt
  ≡ I_1 + I_2 + I_3,   (20.26)

say. By the second inequality in (20.23),

I_2 = o(n^{−(s−2)/2})   (n → ∞).   (20.27)

The same is true of I_3 because of the presence of the exponential term. It remains to estimate I_1. This is where we use Cramer's condition (20.1).
Observe that

|D^{β−α} Q̂_n''(t)| ≤ n^{|β−α|} E||n^{−1/2} Z_{1,n}||^{|β−α|} |g_n(t)|^{n−|β−α|},   (20.28)

where

g_n(t) = E(exp{i⟨t n^{−1/2}, Z_{1,n}⟩}).   (20.29)

Now

|g_n(t)| ≤ |E(exp{i⟨t n^{−1/2}, X_1⟩})| + 2P(||X_1|| > n^{1/2}),

so that, by Cramer's condition [see (20.3)],

sup_{{||t|| > c_n}} |g_n(t)| ≤ θ < 1   (20.30)

for all sufficiently large n. Here θ is a number independent of n. Hence by (20.28), (20.30), and (20.18), we get

I_1 ≤ c_10 ε^{|α|} n^{|β−α|/2} θ^{n−|β−α|} ∫ exp{−(ε||t||)^{1/2}} dt
  ≤ c_10 n^{|β−α|/2} θ^{n−|β−α|} ε^{|α|−k} ∫ exp{−||t||^{1/2}} dt
  ≤ c_11 n^{(s+k+1)/2} θ^n ε^{−k}   (20.31)

for all large n. Now choose

ε = e^{−dn},   (20.32)

where d is any positive number satisfying

d < −k^{−1} log θ,   (20.33)

so that

I_1 = o(n^{−(s−2)/2})   (n → ∞).   (20.34)

Therefore we have shown

|∫ f_{a_n} dH_n| ≤ c(s, k) ω̄_{f_{a_n}}(2e^{−dn} : Σ_{r=0}^{s+k−2} n^{−r/2} |P_r(−Φ_{0,D_n} : {χ_{ν,n}})|)
  + M_{s'}(f) o(n^{−(s−2)/2})   (n → ∞).   (20.35)
Now

ω̄_{f_{a_n}}(2e^{−dn} : Σ_{r=0}^{s+k−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}}))
  ≤ Σ_{r=0}^{s+k−2} ω̄_{f_{a_n}}(2e^{−dn} : n^{−r/2} |P_r(−Φ_{0,D_n} : {χ_{ν,n}})|).   (20.36)

For 0 ≤ r ≤ s − 2, using (14.74), (9.12) in the first step,

ω̄_{f_{a_n}}(2e^{−dn} : n^{−r/2} |P_r(−Φ_{0,D_n} : {χ_{ν,n}})|)
  ≤ c_12 [∫_{{||x|| ≤ n^{1/6}}} ω_{f_{a_n}}(x : 2e^{−dn}) |φ_{0,D_n}(x) − φ(x)| dx
  + ∫_{{||x|| > n^{1/6}}} ω_{f_{a_n}}(x : 2e^{−dn})(1 + ||x||^{3r}) φ_{0,D_n}(x) dx
  + ∫ ω_{f_{a_n}}(x : 2e^{−dn}) φ(x) dx]
  ≤ M_{s'}(f) o(n^{−(s−2)/2}) + c_13 ρ_s ω̄_f(2e^{−dn} : Φ).   (20.37)

These inequalities are based on Lemma 14.6 and

ω_f(x : ε) ≤ 2M_{s'}(f)(1 + (||x|| + ε)^{s'}).   (20.38)

For s − 1 ≤ r ≤ s + k − 2,

ω̄_{f_{a_n}}(2e^{−dn} : n^{−r/2} |P_r(−Φ_{0,D_n} : {χ_{ν,n}})|) = M_{s'}(f) o(n^{−(s−2)/2})   (n → ∞),   (20.39)
since, by Lemma 14.1(v) [or relation (19.81)],

n^{−r/2} E||Z_{1,n}||^{r+2} = o(n^{−(s−2)/2})   (r ≥ s − 1).   (20.40)

Thus we have finally shown that

|∫ f_{a_n} dH_n| ≤ M_{s'}(f) o(n^{−(s−2)/2}) + c_15(s', s, k) ω̄_f(2e^{−dn} : Φ)   (n → ∞).   (20.41)

Since, as in (20.39),

|∫ f_{a_n} d(Σ_{r=s−1}^{s+k−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}}))| = M_{s'}(f) o(n^{−(s−2)/2})   (n → ∞),   (20.42)

one has

|∫ f_{a_n} d(Q_n'' − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,D_n} : {χ_{ν,n}}))|
  ≤ M_{s'}(f) o(n^{−(s−2)/2}) + c_18(s', s, k) ω̄_f(2e^{−dn} : Φ).   (20.43)

The proof is now complete in view of (20.43), (20.13), (20.12), (20.9), and (20.8). Q.E.D.
The following corollaries are immediate.

COROLLARY 20.2 Suppose that f is a bounded, Borel-measurable function. One has, under the hypothesis of Theorem 20.1,

|∫ f d(Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν}))| ≤ ω_f(R^k) δ_1(n) + c(s, k) ω̄_f(2e^{−dn} : Φ_{0,V}).   (20.44)

Let 𝓕 be a class of real-valued, Borel-measurable functions on R^k satisfying

sup_{f∈𝓕} M_s(f) < ∞,   sup_{f∈𝓕} ω̄_f(ε : Φ_{0,V}) = o((−log ε)^{−(s−2)/2})   (ε ↓ 0).   (20.45)
COROLLARY 20.3 Under the hypothesis of Theorem 20.1,

sup_{f∈𝓕} |∫ f d(Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν}))| = o(n^{−(s−2)/2})   (n → ∞).   (20.46)

Thus Theorem 20.1 provides asymptotic expansions (with an error term that is of smaller order of magnitude than the last term in the expansion) for an extremely large class of functions. In particular, for the class

𝓐_α(c : Φ_{0,V}) = {A ∈ 𝓑^k : Φ_{0,V}((∂A)^ε) ≤ c ε^α for all ε > 0}   (0 < α ≤ 1, c > 0)   (20.47)

one has

COROLLARY 20.4 Under the hypothesis of Theorem 20.1,

sup_{A∈𝓐_α(c : Φ_{0,V})} |Q_n(A) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν})(A)| = o(n^{−(s−2)/2})   (n → ∞).   (20.48)

A simple consequence of (20.48) is

sup_{C∈𝓒} |Q_n(C) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν})(C)| = o(n^{−(s−2)/2})   (n → ∞),   (20.49)

where 𝓒 is the class of all Borel-measurable convex subsets of R^k. One may sharpen (20.49) somewhat as

COROLLARY 20.5 Under the hypothesis of Theorem 20.1,

sup_{C∈𝓒} (1 + d^s(0, ∂C)) |Q_n(C) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν})(C)| = o(n^{−(s−2)/2}),   (20.50)

where d(0, ∂C) is the distance between the origin and the boundary ∂C of C.
Proof. Take

f(x) = (1 + d^s(0, ∂C)) 1_{C'}(x)   (x ∈ R^k),   (20.51)

where C' = R^k ∖ C or C according as 0 ∈ C or 0 ∉ C. One only needs to check that

(1 + d^s(0, ∂C)) Φ_{0,V}((∂C)^{2ε}) ≤ ∫_{(∂C)^{2ε}} [1 + (||x|| + 2ε)^s] Φ_{0,V}(dx) = O(ε)   (ε ↓ 0)

uniformly for C ∈ 𝓒.   (20.52)

The inequality in (20.52) is easy to check, and Corollary 3.2 does the rest.

For k = 1, let F_n denote the distribution function of Q_n, and use the same symbol for the signed measure Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν}) and its distribution function, to get

sup_{x∈R¹} (1 + |x|^s) |F_n(x) − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ_{0,V} : {χ_ν})(x)| = o(n^{−(s−2)/2})   (n → ∞).   (20.53)
It is possible to extend Theorem 20.1 to the non-i.i.d. case as follows.

THEOREM 20.6 Let {X_n : n ≥ 1} be a sequence of independent random vectors with values in R^k, having zero means, positive-definite covariance matrices, and finite sth absolute moments for some integer s ≥ 3. Assume that (i) the smallest eigenvalues λ_n of V_n = n^{−1} Σ_{j=1}^{n} Cov(X_j) are bounded away from zero, (ii) the average sth absolute moments n^{−1} Σ_{j=1}^{n} E||X_j||^s are bounded away from infinity and

lim_{n→∞} n^{−1} Σ_{j=1}^{n} ∫_{{||X_j|| > ε n^{1/2}}} ||X_j||^s dP = 0   (20.54)

for every positive ε, and (iii) the characteristic functions g_n of X_n satisfy

lim sup_{n→∞} sup_{{||t|| > b}} |g_n(t)| < 1   (20.55)

for every positive b. Then for every real-valued, Borel-measurable function f on R^k satisfying (20.4) for some s', 0 ≤ s' ≤ s, one has

|∫ f d(Q_n − Σ_{r=0}^{s−2} n^{−r/2} P_r(−Φ : {χ̄_{ν,n}}))| ≤ M_{s'}(f) δ_1(n) + c(s, k) ω̄_f(2e^{−dn} : Φ),   (20.56)

where Q_n is the distribution of n^{−1/2} B_n(X_1 + ⋯ + X_n), with B_n² = V_n^{−1}. Also δ_1(n) and d are as in Theorem 20.1, and χ̄_{ν,n} = average νth cumulant of B_n X_j (1 ≤ j ≤ n).
Expansions Under Cramér's Condition  217
The proof of Theorem 20.6 is entirely analogous to that of Theorem 20.1 and is therefore omitted. As indicated in the introduction to the present chapter, there are special functions f, for example, trigonometric polynomials, for which the expansion of f fdQ„ is valid whatever may be the type of distribution of X,. This follows from Theorems 9.10-9.12. Our next theorem provides a class of functions of this type. For the sake of simplicity we state it for the i.i.d. case. THEOREM 20.7 Let (X,,: n> 1) be an i.i.d. sequence of random vectors with values in Rk . Assume that the common distribution has mean zero, positive-definite covariance matrix I and a finite sth absolute moment p, for some integer s> 3. Let f be a (real or complex-valued) Bore/-measurable function on Rk that is the Fourier-Stielijes transform of a finite signed measure satisfying (20.57)
f Ilxll s-Z I t (dx) < oo.
Then
f
f-2
fd Q„
{7G}) =o(n - cs -2) / 2 )
- n - '/ Z P,(-^:
r=o
(n--*oo).
(20.58)
Here Q„ is the distribution of n " 2 (X 1 + • • • + X„), X, = with cumulant of X 1 . Proof. By Parseval's relation (Theorem 5.1(viii)] and Theorem 9.12, s-z
f
fd Q„ - ^ n-'/2P,(-0: {X})1
= ro
/
s-z
= f Q„(t) - 1 n
- ' /2
P,(it:{X,.})exp(
r=o
f
8
(n)n -(s-2)/2
-
ill'11 2 ) It(dt)
[IItll Z +llt11 10-2)
I
{Iltll
Xexp{ - IIBIZ I I pj(dt) s-i
+
1 +
j.
u=
{11 1 11>Co(s .k)n / P, 1/a-2)}
J
'/ZI Pr(it: {X'))I r—O
x exp{ - 211 1 11 2 } IILI(di) < c'S (n)n -^°-2)/ZII µII
+cn -( s -2) / 2 f {II'II>
c2o(s.k)nU/ 2 p,
Iltlls-'I Al(dt) u/o-a)
(20.59)
218
Expansions—Nonlattice Distributions
where c depends only on the distribution of X 1 , and
c'= supk (Il 1 IIs+11111 3(,-2) )exp{ —Iat112 ) tER
I
.
Note that we have used a Chebyshev-type inequality f
I µI (dt)
<(c2o(s,k)n2ps
v(s-2))-(..-2)
{IIII> c20(s,k)n'/ 2 p^ '/ ( ' -2) )
X
II11Is-2I ttI (di).
f {11111>c20(s, k)n" 2p, t/(-2))
Q.E.D. We point out that if µ is discrete, assigning its entire mass to a finite number of points of R k , then the above theorem applies, and thus f may be taken to be an arbitrary trigonometric polynomial. However the result applies to a much larger class of functions f, including the class of all Schwartz functions (see Section A.2 for a definition of this class). Finally, for strongly nonlattice distributions we have the following result. THEOREM 20.8 Let (X,,: n> 1) be a sequence of i.i.d. strongly nonlattice random vectors with values in R k . If EX 1 = 0, Cov(X,) = I, and p 3 - EIIX 1 11 3 < cc, then for every real-valued, bounded, Bore/-measurable function f on R k one has J fd (Q,,—(D—n-'/2P1(— I:
{X„)))I=wj(R k ) . o(n -I/Z )+ 0 (wl (s.
4 ))
(n-moo), (20.60)
where Q. is the distribution of n -1 / 2 (X I + . • • +X„), X,, = with cumulant of X 1 , and 8„ = o(n - 1 / 2 ); a n does not depend on f. Proof. Given q >0, we show that there exists n(rl) such that for all n > n(ij) the left side of (20.60) is less than wf (R') , o(n -1 / 2 )+c(k)4 (ran -1 / 2 : fi).
(20.61)
Introduce truncated random vectors Y1, (1 <1< n) as in the proof of Theorem 20.6 and recall that Q'„ and Q„ are the distributions of the
Expansions Under Cramer's Condition 219
normalized sums of these random vectors with and without centering, respectively. Then by (20.8), (20.11 x(20.13) (with s = 2, s'— 0)
I
f
-1 / 2 ) J fd(Q,—Qn )I=wf (R k )•o(n
f f dQn = f ff dQ,
(n—.00), (20.62)
(f; —f)d(4+n -I / 2P I (—& {x,}))I=wf(Rk)•o(n-1/2) (n--*oo),
f f d(I1+n ;
-
'/ 2P I (—t: {)G.)) — Do.n — n -1/2P1( -1o.n,: {x3, )))I f (R k )•o(n -
=w
1/
2
)
(n--boo).
Also, by (20.42), k+1
f
fw d 71 n
P,(
—
PO,n,: {X,.n})
r-2
=wf(Rk
)•o(n-
1/2) (n-+oo).
(20.63)
Write k+1
Hn=Q,'. — I n -r/2 p'( —(b: {X}).
(20.64)
r-O
In view of (20.62), (20.63), it is enough to estimate f f^ dH,,. By Corollary
11.5, f fa^dH„I <(2a-1)-1 wf(Rk )IIHn'KKII [
k+l
+cwfa^ 2 E:
- ' /2P,{ -4 o.v,: {X})) r—O
J
(20.65) '
where thekernel probability measure K satisfies
a-K(x: IlxII< 1 })>z , f K(t)=0
IIxII
k+4
K(dx)
if IItII>c1(k)•
(20.66)
220
Expansions—Nonlattice Distributions
Such a choice is possible by Theorem 10.1. Also,
K.(B)=K(e 'B) -
(B E ), (20.67)
r=inn
-i/2
By Lemma 11.6,
ID^(H„K^)(t)Idt. (20.68)
IIHH`K.II
$I< k+ 1 (IltII < 2c i (k),7''n "2 )
Letting cn = n l / 2 /(16p 3 ), one has, by (20.24), (20.26), and (20.27) with s = 3,
ID # (H„K, )(t)t dt = o(n -1 / 2 )
(n—>oo).
(20.69)
(11:11
Also, D" ( Hn &t)( t )I C 3( k, P3) Y, I D 0
Q—a
Hn(t)I
and, since apart from Q„ the remaining terms of 6,, (and their derivatives) contain an exponential factor, it is enough to estimate I = f
(n'/2 /( 16P3) < 11:11 <2ci(k)*1- snu/s
)IDP-aQ„ (t)I dt.
(20.70)
By (20.28), ID$—aQn (
t )p< nIp—align\t)In—Ift—al'
(20.71)
where [see (20.29) and (14.111)] 8n(t)= E(exp{i
I8n(t)I
-i/2
t,Xi>))I+o(n -1/2 )
(n—oco). (20.72)
Since X i is strongly nonlattice, (
^/z
sup(I8n(t)l: 16/3
(20.73)
Expansions Under Cramer's Condition
221
for all n> n('q), say. Using estimates (20.71), (20.73) in (20.70), we have I=o(n - '/ Z )
(20.74)
(n-+oo),
so that IIHH'K,II =o(n-1/2)
(n-*oo)
(20.75)
Finally, as in (20.36)-(20.41), :-2
Wf
w
2e: (
i r-0
n—r/2Pr\ -4 O,D. :
` "^"I )
c7 (2e: 4))+Wf(R k )•o(n —t/2 )
(n—*oo). (20.76)
Theorems 20.7, 20.8 hold with I replaced by an arbitrary symmetric, positive-definite matrix V. We chose I merely to simplify notation. As an immediate consequence of (20.60) one has, in case X, is a strongly nonlattice distribution with mean zero and covariance I, lint sup n 1/2 IF.(x) -0 (x)I= sup IP1( -4 : (X.))(x)I , (20.77) xER'
xERk
where F, is the distribution function of Q. For k =1 (20.77) yields lim sup n t " 2 IF. (x)—co,,2(x)J= n^oo x E R'
Iµ31 (20.78) 6(27r)1 /2 0 3
provided that X t is nonlattice and has zero mean, variance a 2 >0, and third moment µ3 . It may be noted that for k =1 "strongly nonlattice" is the same as "nonlattice." One may also easily write down analogs of (20.77) for more general classes of sets (than rectangles), for example, the class of all Borel-measurable convex subsets of R c, or the class & 1 (d: 1) introduced in (17.3). NOTES Although Chebyshev [11 and Edgeworth [1] had conceived of the formal expansions of this chapter, it was not until Cramer's important work [1,31 (Chapter VII) that a proper foundation was laid and the first important results derived.
Expansions—Nonlattice Distributions
222
Section 19. For an early local limit theorem for densities in the non-i.i.d. case and its significant applications to statistical mechanics the reader is referred to Khinchin [3]. Theorem 19.1 is essentially proved in Gnedenko and Kolmogorov [I], pp. 224-227; in this book (pp. 228-230) one also finds the following result of Gnedenko: in one dimension under the hypothesis of Theorem 19.2 one has s-2
sup q,,(x)- i n -112P/( +: {L))(x) =o(n -
-(,-
2 ' 12 ).
(N.1)
xER 1 !-0
For k- I and s>3 the relation (19.17) in Theorem 19.2 was proved by Petrov [1] assuming boundedness of q„ for some n; however this assumption has been shown to be equivalent to ours in Theorem 19.1. Theorems 19.2, 19.3, and Corollary 19.4 appear here for the first time in their present forms. The assumptions (19.47), (19.49) may be considered too restrictive for the non-i.i.d. case; however it is not difficult to weaken them and get somewhat weaker results; we have avoided this on the ground that the conditions would look messier. Theorem 19.5 is due to Bhattacharya [8]; it strengthens Corollary 19.6, which was proved earlier by Bikjalis (4) for s>3. Section 20. Cramer [1,3] (Chapter VII) proved that
sup
k_
f
-3
I
n J /2P(-4 o ?: {X,))(x)I_O(n- t,-=W/2) (N.2) -
XER 1 J-0
in one dimension under the hypothesis of Theorem 20.1; here F. is the distribution function of n -1 / 2 (X 1 +... +X„) and var(X 1 ) - a 2. This was sharpened by Esseen (1], who obtained a remainder o(n t' -2 )/ 2) by adding one more term to the expansion; Esseen's result is equivalent to (20.49) when specialized to k =1. R. Rao [1,2] was the first to obtain multidimensional expansions under Cramir's condition (20.1) and prove that in the i.i.d. case one can expand probabilities of Borel-measurable convex sets with an error term O(n t' -2>/ 2 (logn)t k- 't/ 2) uniformly over the class Cs, provided that the hypothesis of Theorem 20.1 holds. This was extended to more general classes of sets independently by von Bahr [3] and Bhattacharya [1,2]. Esseen's result on the expansion of the distribution function (mentioned above) was extended to Rk independently by Bikjalis [4] and von Bahr [3]. Corollaries 20.4, 20.5 as well as the relation (20.49), which refine earlier results of Bhattacharya [1,2) and von Bahr [3], were obtained in Bhattacharya [4, 5]. The very general Theorem 20.1 is new; this extends Corollaries 20.2, 20.3 proved earlier by Bhattacharya [5]. Theorems 20.6, 20.7 are due to Bhattacharya [4,5]. There is a result in Osipov [l] that yields o(n_ (1 _ 2)/ 2) in place of o(n t'_ 2)/ 2) as the right side of (20.53). Some analogs of Theorem 20.8 have been obtained independently by Bikjalis [6). Earlier Esseen [1) had proved (20.60) xl x E R t )) in for the distribution function of Q. (i.e., for the class of functions { f I one dimension and derived (20.78). -
-
-
- _,,. : (
CHAPTER 5
Asymptotic Expansions Lattice Distributions
The Cramer-Edgeworth expansions of Chapter 4 are not valid for purely discrete distributions. For example, if (X,,: n> 1) is a sequence of i.i.d. lattice random variables (k= 1), then the distribution Q„ of the nth normalized partial sum is easily shown to have point masses each of order n - I / 2 (if variance of X, is finite and nonzero). Thus the distribution function of Q. cannot possibly be expanded in terms of the absolutely continuous distribution functions of P,(- 4)), 0 < r < s -2, with a remainder term o(n -(,-2)/2 ), when X, has a finite sth moment for some integers not smaller than 3. However the situation may be salvaged in the following manner. The multiple Fourier series Q. is easily inverted to yield the point masses of Q. Making use of the approximation of Q„ by exp( - 2)^;=pn - '' 2P,(it) as provided by Chapter 2, Section 9, one obtains an asymptotic expansion of the point masses of Q„ in terms of j;=an - '12P,(-4). To obtain an expansion of Q„ (B) for a Borel set B, one has to add up the asymptotic expansions of the point masses in B. For B = (- oo, x], x E R k , this sum may be expressed in a simple closed form. A multidimensional extension of the classical Euler-Maclaurin summation formula is used for this purpose. 21. LATTICE DISTRIBUTIONS Consider R k as a group under vector addition. A subgroup L of R I is said to be a discrete subgroup if there is a ball B (0: d), d > 0, around the origin such that L n B (0: d) _ (0). Equivalently, a subgroup L is discrete if every ball in R k has only a finite number of points of L in it. In particular, a 223
224
Expansions—Lattice Distributions
discrete subgroup is a closed subset of R k . The following theorem gives the structure of discrete subgroups. THEOREM 21.1 Let L be a discrete subgroup of Rk and let r be the
number of elements contained in a maximal set of linearly independent vectors in L. Then there exist r linearly independent vectors i , ... , in L such that L=Z•E,+ • • • +Z•t,-(m i t t + • • • +m,t,: m i ,...,m, integers) (Z=(0,±1,±2,...)). (21.1)
Proof First consider the case k = 1. If L is a discrete subgroup, L * (0), then to =min t : t E L, t > O) is positive. If t E L is arbitrary, let n be an integer such that nto < t < (n + 1)t a . Then 0 < t — nt p < to and t — nto E L, so that the minimality of t o implies that t = nt o or L = Z • t o . Now consider the case k> 1. The theorem will be proved if we can construct linearly independent vectors,..., , in L such that Ln(R• +•.. +R• ,)=Z•¢ i + • • • +Z..,, since it follows from the definition of the integer r that L c R• J, + • • • +R•,. Here R is the field of reals. We construct these vectors inductively. Assume that linearly independent vectors ¢, ..., ^ have been found for some s, s < r, such that (i) EE L, j=1,2,...,s and (ii) Ln(R•J,+•••+R•^,)=Z•t 1 +•••+Z. . Now we describe a method for choosing ^ ^. Let a E L\(R• J, + • • • + R.). Then a is linearly independent of ...,ts . Let M={toa: t oa+t i t s + • • • +tEL for some choice of real numbers t1,...,ts }. Since ¢ i ,...,C,EL, it follows that M = (t o a: toa + a i t i + • • • + as E L for some choice of a i with 00 such that M=Z•a oa. Choose constants a^, 1 <1< s, with 0 < a^ < I such that fit+ - ao a + a 11 + • • • + a j, E L. If q = t oa + t i t s + • • • + t,E, EL, then t o = nao for some integer n, so that rl—n¢, +1 ELf1(R-j j +•••+R•E)=Z•J,+•••+Z•J,,. Thus,..., are linearly independent, and Ln(R•¢ 1 + • • • +R•t i) = Z • f i + • • • + Z • E +,. Since the argument works for s = 0, the proof is complete. Q.E.D. The set of vectors {,...,,} in the representation (1.1) is called a basis of L, and the number r is called the rank of L. A discrete subgroup of R k having rank k is called a lattice. The next theorem describes the structure of closed subgroups of R k.
Lattice Distributions
225
THEOREM 21.2 Let L be a closed subgroup of Rk . Let r be the maximum of dimensions of linear subspaces of L, and let r + s be the number of elements contained in a maximal set of linearly independent vectors in L. Then there exist linearly independent vectors r^ 1 , ... , rj,, , ... , J. in L such that
L=R•,q i + • • • +R•rJ,+Z•^ i + • • • +Z•1J .
(21.2)
Proof. Let r(d) denote the number of elements in a maximal set of linearly independent vectors in L n B (0: d). As d 0, r(d )1,. Let r o =limd.o r(d). Clearly, ro > r. We shall show that ro =r. Since r(d) is integer-valued, there exists d0 >0 such that r(d)=ro for 0
LfB(0:do )CR•rl I +••-+R•rl,o .
(21.3)
R•• l +••• +R•7l,0 CL.
(21.4)
We claim
To prove this let a>0 be arbitrary and let d 1 =min(e/k,d0 }. Then r(d 1 ) = do, and there exist linearly independent vectors Q^,...,/3,, in L n B(0: d 1 ). It follows that /3^ E R•rl 1 + • • • + R•rl o and, therefore, R•(3, + • + R•13,0 C R•rl, + • • • + R•17,, which implies R.f3 1 +••• +R•/3. 0 =R•rj 1 +••. +R•rl, o (21.5) Now let EE R•r1 1 + • • • + R•rl, o be arbitrary. Then J= t 1 X13, + • • • + t /3,0 Write t^ = m1 + tj with rn1 integral and ^ t^^ < 1, 1 <j < r o . Thus there exists /3 E L, /6= m, Q,+••• +m of13, 0, such that Ili — a II
1'
. •' k ) be a basis of L. Then R'= U+ L, where U={t I t l +••• +tk 4:0
(21.6)
226 Expansions—Lattice Distributions
Now R k /L=(zmx+L: xER k ) with group operation x+y=(x+y)', and (quotient) topology defined as the strongest topology on R"` /L that makes the map x-+x (on R k into R'^/L) continuous. The restriction of this map on U (into R k /L) is continuous and has R k /L as its image. Since U is compact, its continuous image R k /L is compact. Conversely, suppose that R k /L is compact. If possible, let r-rank L=0 for all EEL, so that the function x-* is well defined and continuous on R k /L. Since R k /L is compact, this function is bounded, implying that x--+ is bounded on R"`. This is impossible since u^O. Q.E.D. A random vector X on a probability space (2, , P) into R k is a lattice random vector if there exist x o E R k and a lattice L such that P (X E xo + L) =1.
(21.7)
The distribution of X is then said to be a lattice distribution. A random vector X (or its distribution) is said to be degenerate if there exists a hyperplane H.(i.e., a set of the form {x: =c) for some nonzero vector a and some real number c) such that P(XEH)=1. (21.8)
LEMMA 21.4 Let X be a lattice random vector. Then there exists a unique discrete subgroup Lo with the following two properties: (i) P(X E x + L o) = 1 for every x such that P(X = x)>0; (ii) if M is any closed subgroup such that P(X E yo + M) =1 for some yo E R k , then La c M. This discrete subgroup Lo is generated by ( j: P(X = x0 + ) > 0), where xo is any given vector satisfying P(X =x0)>0. Proof. Since X is a lattice random vector, there exist x o E R k and a lattice L such that P (X E x 0 + L) = 1. We may, and shall, also assume that
P (X = x0) > 0. Let Lo be the subgroup generated by the set (contained in L) {E: P (X = xo + ¢) > 0 ). Then Lo c L, so L o is a discrete subgroup and P(XExo +La)=1. If P(X=x,)>0, then x 1 Ex0 +L0, so that x 1 +L0 = x0 + Lo and P (X E x i + L o) =1, proving property (i). To prove (ii) suppose P (X E yo + M) =1. Since P (X = x o) > 0, it follows that x o Ey o + M and so x0 +M=yo +M, so that P(XEx0 +M)=1. The definition of L a now implies that Lo c M. Thus L o has all the properties stated in the lemma. The uniqueness is clear from property (ii). Q.E.D. For a lattice random vector X, the unique discrete subgroup L o with properties (i) and (ii) of Lemma 21.4 is called the minimal group associated
Lattice Distributions
227
with X. The rank r of Lo is called the rank of X. When Lo is of rank k, Lo is called the minimal lattice for X.
LEMMA 21.5 Let X be a lattice random vector. Then the following two statements are equivalent. (i) X is nondegenerate. (ii) The minimal subgroup associated with X is a lattice. Proof Fix a .vector x o E R k such that P (X = xo) > 0. Suppose that X is nondegenerate. If the minimal subgroup Lo associated with X is not a lattice, then there exist a basis {E,, ... , Ek } of R k and r < k such that Lo =Z•>e j + • • • +Z. In particular, this implies P(XEH)=1, where H = x0 + R• E 1 + • • • + R• Ek _ i , contradicting (i). To show that (ii) implies (i), suppose that X is degenerate. Then there exists a linear subspace of dimension (k —1), for example, M, such that P (X E x o + M)= 1. Minimality of the subgroup Lo associated with X implies L o c M, showing that L o is not a lattice. Q.E.D.
Let f be the characteristic function of a random vector X. A vector t o e R k is said to be a period of I fl if I f (t + t o)I = I f (t)I for all t E R k .
LEMMA 21.6 Let X be a lattice random vector and L the minimal subgroup associated with X. Let f be the characteristic function of X and L* the set of periods of If I. Then (i) L`={t: If(t)I=1} (ii) L• ={t:E2irZ for all EEL) (iii) L={>E: (t,E>E2orZ for all tEL').
(21.9)
In particular, L* is a closed subgroup of R k . Proof. Let 10 E L*. Then I f (to)I = I f (0)I =1. Conversely, suppose that If (t o)I =1. Then there exists a real number a such that f (1 o) = exp (is ). Therefore
E(1—cos(— a))=0,
(21.10)
or P(cos(— a) =1)=1. Equivalently, P( Ea +2i Z)=1,
(21.11)
so that f(t+to)=E(exp(i))=e°f(t),
(21.12)
and If (t+ to)I = If (t)I for all t, proving (i). To prove (ii), choose x o such
228
Expansions—Lattice Distributions
that P (X = x o) > 0 and let to E L*. Then (21.11) implies that <1 0, xo > E a + 2irZ, so that P(E2IrZ)=1.
(21.13)
If S=(: P (X = x o + E) > 0), then (21.13) is equivalent to <1o, J> E 277Z
for all j' ES.
(21.14)
Since S generates L, (21.13) is equivalent toE27TZ
(21.15)
for all EEL.
Thus t o belongs to the right side of (ii). Conversely, if (21.15) holds for some t o , (21.13) holds and f(t+t o)I=IE(exp{i))I=IE(exp(i — i })I=IE(exp{i +i })J=J f (1)J
(tER"` ). (21.16)
Thus to E L*, and (ii) is proved. It remains to prove (iii). By Theorem 21.1, there exists a basis { ... k } of R k such that L = Z • t 1 + • • • + Z • J„ where r is the rank of L. Let { rl,, ... ,,t k } be the dual basis, that is, %, rl^) = 5 ., 1 < j, j' < k, where 8y is Kronecker's delta. Then (ii) implies L*=27r(Z•r1 1 +... +Z q,)+R'ij. +j +... +R• l k .
(21.17)
The relation (iii) follows immediately from (21.17). The last assertion is an immediate consequence of (i) and (ii) [or (21.17)]. Q.E.D. COROLLARY 21.7 Let X be a lattice random vector with characteristic function f. The set L* of periods of !fI is a lattice if and only if X is nondegenerate. Proof. This follows immediately from representation (21.17) and Lemma 21.5. Q.E.D. Let L be a lattice and {j I ,•••,Jk ) a basis of.L; that is, L=Z•,+ • • • +Z•jk .
(21.18)
Let {rt 1 ,...,71k } be a dual basis, that is, <^^^>=Sy , and let L* be the lattice defined as L*=21T(Z•,q1+ • • • +Z.).
(21.19)
Lattice Distributions
229
Write Det(j l ,...,^k ) for the determinant of the matrix whose jth row is =(^ I ,...,^k ), so that Det(^ I ,...,^k )=Det(^ .). If .,k} is another basis of L, then there exist integral matrices A = (ay,) and B = (b11 ) such that Jj =la^j J.' and ^y =Yby.f^. Then DetA =±1, so that Det(i; I ,...,iak )=±Det(,...,?;k) Thus the quantity defined by det L = IDet(^ I , ..., i k )I
(21.20)
is independent of the basis and depends only on the lattice L. Also, with this definition, (2i)k det L* =(2 a) k IDet(,1 i ,.. .,rlk )I =
(21.21)
detL '
Consider a domain S * defined as follows: 9*={ 1111+"• +101k: I1^ l<1r forallj}.
(21.22)
Then F* has the following properties: (i) S * n L* = {0}; (ii) For any two 71, rl' E L*, rl rl' the sets q+ * rl and * + ri' are disjoint; (iii) For any x E R k there exists q EL* such that x — rt E Cl t *; that is, R k = U er..Cl( 6t
*+^).
(21.23)
In view of these properties, F * is also called a fundamental domain for L. Note that Y *=(x: <,,,x>I < 7r for all j}, so that
vol t * = f dx = f +•
C(1)
a (x l ,...,xk) a(YI ... Yk) ,
dy
,
(2^)k
(21.24)
JDet(i; I , ...,Ek )I a(YI,•••,Yk)
- «.,j x> , and a x where y.^—
( I,.,
x
,the Jacobian of y with respect to x, is
k)
}Det(^ I ,..., Jk )I, and C(7r) is the cube equal to ((y 1 ,..., yk ): I y J <77 for all ;
230
Expansions—Lattice Distributions
j). From (21.21) we get (2Tr)k vol S*=detL`= detL '
( 21.25)
An evaluation similar to (21.24) gives
f
e'dx = 0 ,
for all EEL, E 0.
(21.26)
5
Now let X be a nondegenerate, lattice random vector having characteristic function f. Let L be a lattice and x 0 E R k be such that P (X E x 0 + L) = 1. Then the characteristic function f is given by f(t)=
2
P (X = X o +E)e r <<.X + e>.
(21.27)
,
EEL
Multiplying (21.27) by exp(— i< t,xo +E>) and integrating over from (21.25) and (21.26), for all EEL, ( i<,.x,+F >f(t)dt. P(X=xo +E)= detL e-
we get,
(21.28)
(2 17 ) J •
The above formula is called the inversion formula in the lattice case. If E is finite for some positive integer s, then on differentiating both sides of (21.27), multiplying the result by exp( — i< t,xo +E>), and finally integrating over 9', one obtains, for IP1 < s, ( x o + E)'P(X=x 0 +E)= (2ir)
-
k detL(—i)I'I
f
exp{— i} D"f(t)dt (EEL). (21.29)
22. LOCAL EXPANSIONS
Throughout this section (X1 : j> 1) is a sequence of independent and identically distributed, lattice random vectors defined on a probability space (f2,JS ,P) with values in R k ; the common distribution is assumed to be nondegenerate having a minimal lattice L; also, EX,= t,
Cov(X I )= I,
P(X 1 EL)=1.
(22.1)
231
Local Expansions
Note that for every nondegenerate, lattice random vector X having finite second moments, there exists x 0 E R k and a nonsingular matrix T such that X,= T(X—x 0) satisfies the above hypothesis. For each n define sequences of truncated random vectors (Y,,,,: j> 1 }, (Z: j> l) by Yj„ =
Xi if IIX1 0
if
—
µ II
IIXj µII >n" 2 ,. —
(j> 1, n > 1). We shall assume that write
ps
= E IIX1 µ ll f is finite for some integer s > 2, and
D„ = Cov(Z,,,,),
—
1= det L,
X„ = with cumulant of X 1 ,
= with cumulant of Z 1 ,,, Ya,n_n- 1/2(a
(22.2)
—nµ),
(Iv! < s),
y,,,,=n -1/2 (a — nEY,. ,,)
(a
EL),
Pn(Y.,.)=P(XI+... +Xn= a)= P(n -1/2 1 (Xi — µ)=Y.,n)• j -1 n
P.(Y..n)=P(Y1,rr+... +Y. ,= a) =P( n -1 / 2
1 Zj.n=ya.n),
m-2
qn.m = In —k/2
n —'/ 2P,(
r..0
-
4: (x})
(2<m<s),
,n-2 r-0
n - ' / 2P,( — qIov,: (,,))
(2<m
(22.3)
The following theorem constitutes the main result of this section. THEOREM 22.1 If p, -EIIX I — µlls is finite for some integer s> 2, then sup ( 1+ IIYa.nII 5 ) IP,,( Yn,n) — gn,,(Ya.n)1 = o(n —( k+s-2)/2 ) aEL
( n —.00).
(22.4) Also, a€L
IPn(Y.,n) — q.,,(Ya.n)I = o(n
—( s-2)/2) (n--goo).
(22.5)
Expansions—Lattice Distributions
232
Proof Let g, denote the characteristic function of X, and fn that of n - 1 / 2 (X, + • • • + X n - nµ). Then fn(t)=(gi("
- ' /2t)exp{
-
i<" -V21 , &>}) n (1ER").
(22.6)
The inversion formulas (21.28), (21.29) yield pn(YQ.n)= 1 ( 2, r) -k f
^gi (t)exp{-i)dt
=/(2^r)-kn-k/2
Ya,Pn(Ya.n) =1 ( 27x)
(
fn(t)exp(
-
i}dt , (22.7)
—k n —k/z ( — i) 191 . xp( —i(t,ya.n>)D^fn(t)dt I e 1/I f'
where /3 is a nonnegative integral vector satisfying I /3 I < s, and n'/ 2A={n'/ 2x: xEA}
(ACR").
(22.8)
Also, clearly, i)IQl Ya ngn.s(Ya.n) = 1(27r) —k n —k / 2 ( —
1
rRk exp{ — l}
3_2
2
2
l
XD O 2n - '/ 2Pr (it: (x,))exp{ -I 1(1 } dt
1 1
r-0
(C^(31<s). (22.9)
111
Hence IYun(hn(Ya.n)—q.s(Ya.n ))J
(22.10)
where, writing (the constant c 20 being the same as in Theorem 9.10) E,=(1: 11 1 11
(22.11)
one has, using Theorem 9.12, (s—)/ 22
1 ^=
f„ E1 s(")"
=(-(,-2)/2) on
12= f
f
(^Itll s Iti^ +Iltll 3(s-2)+1BI )exp{
(n— co),
l4tII2^dt
l
I D^fn (t)I dt ,
s-2
13-
—
(
2
2
1
DP ^n - '/ ZP,(it: {X})exp{ -1 11) y dt
R k \nuIZE,
=o(n -( s -2) / 2 )
r^0
(n--*oo).
l
1
(22.12)
Loral Expansions 233
To estimate I 2 , note that S=sup{Ig l (t)j: tE Y*\E,)<1,
(2113)
since the closure of S*\E I contains no period of g l ^. In view of (22.13) and (22.6), we get (using Leibniz' formula for differentiating a product) 2 < PIS1
1
n1#1/2gn-IfiI
(22.14)
s).
(I$1
Thus sup
aE L
IYan(Pn(Ya,n)-9n,s(Ya.n))I=o(n-(k+s-2)/2) (n--.o0),
(22.15)
from which (22.4) easily follows. If s> k + 1, then (22.5) follows from (22.4). For the general case, we shall use truncation. First note that [see (14.111)] n
IPn(Ya,n)—P,(Ya,n)IK2 1 P(Xj }J,n) '
j-I
aEL
= 2 nP(IIXI'µII =
)
(_(5_2)/2) -(s -2) /2 )
n l/2 ) (22.16)
(n—*oo).
Next, by Lemma 14.6, s-2 ^gn,s(Ya,n)
Qn,s(Ya.n)^
(I + IIY0,nII3r+2)
n-k/2sl(n) r=0
xexp{
—II 6 , n ll2
+ I8k '
II6a.nlI 2
1 8012
12I
} (aEL), (22.17)
where 8,(n) = o(n -(s -2) / 2 ). Now n-k/2a. (I+IIYa.n1I3r+2 )exp EL 6
=n -
k/2
I
(1+1In
{
l
—
- 1/2ajI3r+2)exp
aEL-nµ
< n - "`/ 2 sup
I h(n - '/ 2a+x)
IIxII< n-112aEL
.}
I/2
IIn
-
6
l
1
' /2a
1 2 + 11n
-
I/ 2a
1I
8k1/2 }
(=diameter of *). (22.18)
234
Expansions—Lattice Distributions
where
h(y)=(1+IIYII3r+2)exp{
—I^6Y1I2 + 8^k' 1 2 1
(yER k ). (22.19)
Let T be a nonsingular linear transformation on R k such that TL = Zk .
Then n —k/2 sup Y, h(n — ' /2a+x) IIxiIGAn ^/ 2 aEL
sup
2 h(n -1 / 2 T - 'a'+x), f h(T - y)dy
11xII
Rk
(n— co).
(22.20)
(n-moo).
(22.21)
Thus (22.17) yields 2 19n.3(ya,n) - 9n,5(ya.n)I =o(n
-(3-2/2) )
aEL
Let f„ denote the characteristic function of n -2 1% 1 Z1, n . Then writing so =max(s,k) + 1,
(22.22)
one has, for all aE L [since P (Y 1 E L) = 1], (^'a.n j) t pn
(^'a.n) = 1(291) — k
n — k/2( — j) 5'
r s
x J exp( I/26 •
n
— i) (
ate
f, (t)dt+
kn—k/2(— k l I) s I exp( (y)s t/n.so-!(Ya.n )= 1(277)— JR
— 1}
Ssp-3
x (a) at
(
n_r/2i;r(it:
(X..n})exp(
-
i^t,Dn t)})dt
!
(s'= 0,1, ... ; y a , n j = jth coordinate of y a.n ).
(22.23)
For all sufficiently large n, let Bn = D 1 / 2 and write k 1/2
E 1 , n ={t: 110 n
B.
1 < (E II B
il)2.24) Z1J (2 i 2 ' 2)n/ n o In/(:o—
Local Expansions
235
where A. is the largest eigenvalue of D. By Lemmas 14.1, 14.2, c ^( s, k) n (: -2)/2(s o -2)
E^ n EZ = t II tII <
(22.25)
pf /(so-2)
for some positive constant cZ 0(s,k). By Theorem 9.10, for s' = 0, 1,. . . ,s 0, one has
I
E2
3
exP{ - i
all
)J f^ (t) - J I n - ^ /2Pr(it: [,fl}) r-0
x exp( - 2(t,D"i>)
dt
-(So-2)/2=o(n-(s-2)/2),
< cii(so,k)EIIB,,Z1.nII
(22.26)
using Lemma 14.1(v) for the last step. By Lemma 14.3,
f {IIIII< n I/I /( 16 P3))\EZ..
xexP{
-
(a ) " f (t) all
,
dt
i IIt11 2 }dt=o(n
c'(s',s,k)(l+IIII) < fRk\E2.,
-(:-2)/2
)
(s'=0,1,...). (22.27)
Next, because of the presence of an exponential factor, iso-3
I
R k \E2.. (
I
(I n - '/ ZP,(it: { "})exp{ - i} dt
ti ) ,-0 =o(n-(s-2)/2)
(s'=0, 1, ...),
(22.28)
using Lemma 14.2. Finally,
as
1
i/zry*\ (11t11>n1/2/(16P 3 ))
( —) f' n(t) I di ate
< n" 2E IIZI.n Its (S' + 2p5n -(:-2)/2)" s = o(n -( -2) / 2 )
(s'=0, 1,...),
-;
(22.29)
where S' is defined by -
S'msup(Ig1(t)I:
tE
*\ (1':
IIt'll<( 16 p3) '}}<1.
(22.30)
236
Expansions—Lattice Distributions
Note that in (22.29) we have used the inequality IE(exp(i})I=IE(exp(i })I < 18i (t)I+ 2pn
-(s-2)/2 •
(22.31)
Using (22.26){22.29) in (22.23), one obtains k lk+l sup l+ ± lya,n,i ip,,(ya,n)—q.s o (ya.n)I aEL
j-]
= o(n -(k+s-2)/Z)
(22.32)
(n->oo),
from which follows (Lemma A.4.6 may be used to do the summation) I (p;,(yQ.n)-9,,o-i(ya.n)I=o(n-(s-2)/2)
(n--goo). (22.33)
aEL
Using Lemmas 9.5 and 14.1(v) (as well as Lemma A.4.6), I qn,,, — 1(ya, n) — qn, s(ya, n) I
aEL so-3 =yt—k/z I I 2 n—,/2P.( - 0o.D„ {Xr})(Ya.n) aEL ,—s—I
= o(n —(s-2)/2)
(n--boo),
(22.34)
and one has, combining (22.16), (22.21), (22.33), and (22.34), the desired result (22.5). Q.E.D. The following corollary is immediate from (22.5). COROLLARY 22.2 Under the hypothesis of Theorem 22.1, sup k IQn(B) BED
-
2 9n.:(ya.n)I =o(n
-( J -2) / 2 )
(n-moo), (22.35)
aEBnL
where Q„ is the distribution of n
(X^ - µ).
The next section provides a generalized Euler-Maclaurin summation formula for expressing EaEBfL9n,J(ya,n) in a more convenient integral form.
AsyoViotic Expansions of Distribution Functions
237
For notational convenience we later take L = Z k . Again, this does not involve any essential loss of generality provided that one works with a general nonsingular covariance matrix V, instead of the identity matrix 1. For example, if X, satisfies (22.1) and has minimal lattice L, and if T is a nonsingular linear transformation such that TL = Zk , then Y, = TX, is a lattice random vector having minimal lattice Z' and satisfying Cov(Y 1 )= TT'= V.
P(Y 1 EZ k )=1.
(22.36)
We arrive at the following corollary: COROLLARY 22.3 Let (Y^: j> 1) be a sequence of i.i.d. lattice random vectors with values in R k . Assume that EY, = p and that Y, satisfies (22.36) and has the minimal lattice Zk . If ps= EIIY1 - µl1 5 < co for some integer s> 2, then sup (I+IIYn,n1IJ)IPn(Ya,n)—gn,s(Ya.n)I=o(n'tk+s-2)/2),
a EZ4 (22.37) IPn(Ya.n) - 9,J(Yn,n)I_o(n
-(s-2)/Z
)
(n-*cc),
aEzk
where yan =n -1 / 2 (a-nµ),
=P(n -1/2 1 f
—
(Yj
Pn(Ya,n)=P(Y1+... +Y a)
-
(aEZ"),
1.6)=Y. ,n , )
I s-2
qn.s =n—k/2
n —r/2 Pr( — $.v: {L)), r—O
(22.38)
y, being the with cumulant of Y I .
23. ASYMPTOTIC EXPANSIONS OF DISTRIBUTION FUNCTIONS In this section we apply the generalized Euler-Maclaurin expansion obtained in Appendix A.4 to the local expansion of Section 22. As noted toward the end of Section 22, we may assume, without any essential loss of generality, that the sequence of i.i.d. lattice random vectors (X n : n > I) satisfies EX 1 = s,
Cov(X 1 )= V,
P(X,EZ k )=1,
minimal lattice for X, = Z k .
EIIXI - µll s
Expansions—Lattice Distributions
238
Here is an arbitrary vector in R k V is an arbitrary symmetric, positivedefinite matrix, and s is an integer > 2. We also write xa . n =n - '/ 2 (a-nµ)= -n'/ 2µ+n - '/ 2a
(aEZ k ),
A,(xa.n)=Prob(XI+ • • • +X n =a) =Prob(n - '/ 2 (X I + • • • +Xn-nµ}=xa n), s-2
q
5
7.
—n—k/2
n -r/2Pr( - $0.V:
r—O
(23.2)
(x1))•
As before, y, denotes the with cumulant of X,. THEOREM 23.1 Let a sequence of i.i.d. lattice random vectors (X n : n> 1) satisfy (23.1). Let F n denote the distribution function of n' / 2 x (X 1 +••• +X„-nµ). Then
$$\sup_{x \in R^k} \Big| F_n(x) - \sum_{|\alpha| \le s-2} n^{-|\alpha|/2}(-1)^{|\alpha|}\, S_\alpha(n\mu + n^{1/2}x)\,(D^\alpha \Phi_{0,V})(x)$$
$$\qquad - n^{-1/2} \sum_{|\alpha| \le s-3} n^{-|\alpha|/2}(-1)^{|\alpha|}\, S_\alpha(n\mu + n^{1/2}x)\,\big(D^\alpha P_1(-\Phi_{0,V} : \{\bar{\chi}_\nu\})\big)(x)$$
$$\qquad - \cdots - n^{-(s-2)/2}\, P_{s-2}(-\Phi_{0,V} : \{\bar{\chi}_\nu\})(x) \Big| = o(n^{-(s-2)/2}) \qquad (n \to \infty), \eqno(23.3)$$

where the $S_\alpha$ are defined by (A.4.14) and (A.4.2).

Proof. First, by Corollary 22.3,

$$\sum_{a \in Z^k} |p_n(x_{a,n}) - q_{n,s}(x_{a,n})| = o(n^{-(s-2)/2}) \qquad (n \to \infty). \eqno(23.4)$$

Hence it is enough to prove (23.3) with $F_n(x)$ replaced by

$$Q_{n,s}(x) = \sum_{\{a :\, x_{a,n} \le x\}} q_{n,s}(x_{a,n}). \eqno(23.5)$$

But by Theorem A.4.3 (taking $r = s - 1$),

$$\sup_{x \in R^k} \Big| Q_{n,s}(x) - \sum_{j(\alpha) \le s-2} n^{-|\alpha|/2}(-1)^{|\alpha|}\, S_\alpha(n\mu + n^{1/2}x)\, D^\alpha\Big(\sum_{j=0}^{s-2} n^{-j/2}\, P_j(-\Phi_{0,V} : \{\bar{\chi}_\nu\})\Big)(x) \Big|$$
$$= O(n^{-(s-1)/2}) \qquad (n \to \infty). \eqno(23.6)$$

The relation (23.3) is now obtained by omitting from the expansion in (23.6) all terms of order $n^{-j/2}$, $j \ge s - 1$. Q.E.D.

As an immediate consequence one obtains

COROLLARY 23.2. If $s = 3$ in the hypothesis of Theorem 23.1, then

$$\sup_{x \in R^k} \Big| F_n(x) - \Phi_{0,V}(x) + n^{-1/2} \sum_{j=1}^{k} S_j(n\mu_j + n^{1/2}x_j)\,(D_j \Phi_{0,V})(x) - n^{-1/2}\, P_1(-\Phi_{0,V} : \{\bar{\chi}_\nu\})(x) \Big| = o(n^{-1/2}); \eqno(23.7)$$

and if $s \ge 4$, then the left side of (23.7) is $O(n^{-1})$.
Recall that for every nonnegative integral vector $\alpha = (\alpha_1, \ldots, \alpha_k)$ the integer $j(\alpha)$ is defined by (A.4.21). To state the next result it is convenient to write $\Delta_{r,n}(F)$ for the finite signed measure whose distribution function is given by

$$\Delta_{r,n}(F)(x) = \sum_{j(\alpha) \le r} (-1)^{|\alpha|}\, n^{-|\alpha|/2}\, S_\alpha(n\mu + n^{1/2}x)\,(D^\alpha F)(x). \eqno(23.8)$$
THEOREM 23.3. Under the hypothesis of Theorem 23.1, one has

$$\Big\| Q_n - \sum_{j=0}^{s-2} n^{-j/2}\, \Delta_{s-j-2,n}\big(P_j(-\Phi_{0,V} : \{\bar{\chi}_\nu\})\big) \Big\| = o(n^{-(s-2)/2}), \eqno(23.9)$$

where $Q_n$ is the distribution of $n^{-1/2}(X_1 + \cdots + X_n - n\mu)$.

Proof. This theorem is an immediate consequence of Theorem A.4.3 and relation (23.4) if one omits all terms with variation norms $O(n^{-j/2})$, $j \ge s - 1$. Q.E.D.
Specializing Corollary 23.2 to $k = 1$, one has

$$n^{1/2} \sup_{x \in R^1} |F_n(x) - \Phi_{0,\sigma^2}(x)| = \sup_{x \in R^1} \Big| S_1(n\mu_1 + n^{1/2}x)\,\phi_{0,\sigma^2}(x) - \frac{\mu_3}{6\sqrt{2\pi}\,\sigma^5}\,(\sigma^2 - x^2)\, e^{-x^2/2\sigma^2} \Big| + o(1), \eqno(23.10)$$

where $\mu_1 = EX_1$, $\mu_3 = E(X_1 - \mu_1)^3$, and $\sigma^2 = \mathrm{var}(X_1)$. Since the function $x \to S_1(n\mu_1 + n^{1/2}x)$ is periodic with period $n^{-1/2}$, for sufficiently large $n$
the function $x \to S_1(n\mu_1 + n^{1/2}x)\,\phi_{0,\sigma^2}(x)$ takes values arbitrarily close to both $\tfrac{1}{2}\phi_{0,\sigma^2}(0)$ and $-\tfrac{1}{2}\phi_{0,\sigma^2}(0)$ near zero. Also, the supremum

$$\sup_{x \in R^1} \Big| \frac{\mu_3}{6\sqrt{2\pi}\,\sigma^5}\,(\sigma^2 - x^2)\, e^{-x^2/2\sigma^2} \Big| = \frac{|\mu_3|}{6\sqrt{2\pi}\,\sigma^3} \eqno(23.11)$$

is attained at zero. It follows that the left side of (23.10) has a limit (as $n \to \infty$) given by

$$\lim_{n \to \infty} n^{1/2} \sup_{x \in R^1} |F_n(x) - \Phi_{0,\sigma^2}(x)| = \frac{1}{2\sqrt{2\pi}\,\sigma} + \frac{|\mu_3|}{6\sqrt{2\pi}\,\sigma^3} = \frac{3\sigma^2 + |\mu_3|}{6\sqrt{2\pi}\,\sigma^3}. \eqno(23.12)$$
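The limit (23.12) can be observed numerically. The sketch below uses symmetric Bernoulli(1/2) summands, for which $\mu_3 = 0$ and $\sigma = 1/2$, so the limit is $1/(2\sqrt{2\pi}\,\sigma) = 1/\sqrt{2\pi} \approx 0.3989$; the sample size $n = 400$ is an illustrative choice.

```python
# Check of (23.12) for X_i ~ Bernoulli(1/2): n^{1/2} sup_x |F_n(x) - Phi_{0,sigma^2}(x)|
# should be close to 1/sqrt(2 pi) ~ 0.3989 for large n.
import math

def ndtr(x):  # standard Normal CDF via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 400, 0.5
mu, sigma = p, math.sqrt(p * (1 - p))
pmf = [math.exp(math.lgamma(n + 1) - math.lgamma(a + 1) - math.lgamma(n - a + 1)
                + n * math.log(0.5)) for a in range(n + 1)]   # exact binomial pmf
sup, cdf = 0.0, 0.0
for a in range(n + 1):
    x = (a - n * mu) / (sigma * math.sqrt(n))   # standardized jump point
    sup = max(sup, abs(cdf - ndtr(x)))          # left limit F_n(x-)
    cdf += pmf[a]
    sup = max(sup, abs(cdf - ndtr(x)))          # F_n(x)
print(math.sqrt(n) * sup)
```

The dominant contribution is half the largest jump of $F_n$, exactly the $\tfrac{1}{2}\phi_{0,\sigma^2}(0)$ term discussed above.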
It has been shown by Esseen [2] that (if $X_1$ has a lattice distribution)

$$3\sigma^2 + |\mu_3| \le (\sqrt{10} + 3)\,\rho_3, \eqno(23.13)$$

where $\rho_3 = E|X_1 - \mu_1|^3$. Hence

$$\lim_{n \to \infty} n^{1/2} \sup_{x \in R^1} |F_n(x) - \Phi_{0,\sigma^2}(x)| \le \frac{\sqrt{10} + 3}{6\sqrt{2\pi}} \cdot \frac{\rho_3}{\sigma^3}. \eqno(23.14)$$

This bound is actually attained by a Bernoulli random variable $X_1$ with probabilities

$$P\Big(X_1 = \frac{\sqrt{10} - 4}{2}\Big) = \frac{\sqrt{10} - 2}{2}, \qquad P\Big(X_1 = \frac{\sqrt{10} - 2}{2}\Big) = \frac{4 - \sqrt{10}}{2}. \eqno(23.15)$$

Recall also [see (20.78)] that if $X_1$ is nonlattice, then

$$\lim_{n \to \infty} n^{1/2} \sup_{x \in R^1} |F_n(x) - \Phi_{0,\sigma^2}(x)| = \frac{|\mu_3|}{6\sqrt{2\pi}\,\sigma^3}. \eqno(23.16)$$

Therefore the best asymptotic constant in the Berry-Esseen theorem (Theorem 12.4) is $(\sqrt{10} + 3)/(6\sqrt{2\pi})$.

Going through the proof of Theorem 22.1, one can easily obtain sufficient conditions for the validity of analogs of Theorems 22.1, 23.1, and 23.3 for a sequence of nonidentically distributed lattice random vectors.
For example, one may assume that the random vectors have a common minimal lattice, say $Z^k$, that their average covariance matrices and average $s$th absolute moments satisfy (19.47), and that their characteristic functions satisfy (22.13) uniformly, to obtain the above theorems in the non-i.i.d. case with $o$ replaced by $O$ in the remainders. These conditions can be somewhat relaxed.
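The extremal two-point distribution in (23.15) can be verified by direct computation: it has mean zero, lattice span 1, and achieves equality in (23.13), so its Berry-Esseen ratio equals the best asymptotic constant $(\sqrt{10}+3)/(6\sqrt{2\pi})$. The following is a minimal arithmetic check of these facts.

```python
# Check of (23.13)-(23.15): for the two-point distribution with values
# (sqrt(10)-4)/2, (sqrt(10)-2)/2 and probabilities (sqrt(10)-2)/2, (4-sqrt(10))/2,
# the mean is 0 and equality holds in 3*sigma^2 + |mu_3| <= (sqrt(10)+3)*rho_3.
import math

r10 = math.sqrt(10.0)
x1, x2 = (r10 - 4) / 2, (r10 - 2) / 2     # lattice values, one unit apart
p1, p2 = (r10 - 2) / 2, (4 - r10) / 2     # their probabilities (sum to 1)

mean = p1 * x1 + p2 * x2
var = p1 * x1**2 + p2 * x2**2 - mean**2
mu3 = p1 * (x1 - mean)**3 + p2 * (x2 - mean)**3
rho3 = p1 * abs(x1 - mean)**3 + p2 * abs(x2 - mean)**3

lhs = 3 * var + abs(mu3)
rhs = (r10 + 3) * rho3
const = (r10 + 3) / (6 * math.sqrt(2 * math.pi))
print(mean, lhs, rhs, const)   # mean ~ 0, lhs = rhs, const ~ 0.4097
```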
NOTES

Section 21. The structure of lattice random vectors was investigated by Esseen [1]; our treatment is more detailed.

Section 22. The first local limit theorem (for lattice random variables) is that of DeMoivre [1] and Laplace [1] for Bernoulli random variables. A much sharpened form of this appears in Uspensky [1]. Among the earlier local limit theorems for general lattice random variables are those of Esseen [1] and Gnedenko [1]. Esseen [1] also obtained the expansion

$$\sup_{a \in Z^1} |p_n(y_{a,n}) - q_{n,s}(y_{a,n})| = o(n^{-(s-1)/2}) \eqno(N.1)$$

in one dimension under the hypothesis of Theorem 22.1. R. Rao [1, 2] extended this to $R^k$ and proved a version of Theorem 22.1 having a factor $(\log n)^{k/2}$ with the remainder in (22.4). For $k = 1$ and $s \ge 3$ Theorem 22.1 was proved by Petrov [1], whereas the result of Rao was refined in the multidimensional case by Bikjalis [5]. It follows from a result of Gnedenko (see Gnedenko and Kolmogorov [1], pp. 233-235) that for $s = 2$ the hypothesis in Theorem 22.1 is also necessary.

Section 23. Esseen, in his important basic work [1], used the Euler-Maclaurin summation formula to prove Theorem 23.1 in one dimension. Theorem A.4.2, which is a multidimensional extension of the Euler-Maclaurin sum formula, is due to R. Rao [1, 2]; using this, Rao was able to prove versions of Theorems 23.1 and 23.3, the remainders having an additional factor $(\log n)^{k/2}$. Bikjalis [5] removed this factor from the remainders and proved versions of Theorems 23.1, 23.3 (having considerably more terms in the expansion than we have). We would like to note here that Theorem 4 in R. Rao [2] contains an error in its first assertion. Comparing it with our Theorem 23.3 with $s = 4$, we get (assuming $\rho_4 < \infty$)
$$\big\| Q_n - \Delta_{2,n}(\Phi_{0,V}) - n^{-1/2}\,\Delta_{1,n}\big(P_1(-\Phi_{0,V} : \{\bar{\chi}_\nu\})\big) \big\| = O(n^{-1}), \eqno(N.2)$$

while the corresponding expansion in Rao's Theorem 4 omits some of the above terms. This omission is permissible for expanding $F_n$ (see, for example, Corollary 23.2) since the functions $S_\alpha$ are bounded; it is not permissible when one is computing the variation norm. The complicated nature of the signed measures corresponding to the functions $S_\alpha$ and $\Delta_{r,n}$ [see (A.4.19) and (A.4.20)] has compelled us to give a fairly long account of these objects. For example, even the fact that $\Delta_{r,n}(F)$ is of bounded variation turns out to be nontrivial to prove (see Theorem A.2.3 in the Appendix). Also, there remains the outstanding problem of finding computable expressions for $\Delta_{r,n}\big(P_1(-\Phi_{0,V} : \{\bar{\chi}_\nu\})\big)(A)$ for a large enough class of "nice" sets $A$ (not just rectangles properly aligned with the lattice). For some results in this direction see Yarnold [1].

There is one important theorem of Esseen [1] that we have not discussed in this monograph: if $\{X_n : n \ge 1\}$ is a sequence of i.i.d. random vectors with (common) mean zero,
nonsingular covariance matrix $V$, and a finite fourth absolute moment, then

$$\sup_{A \in \mathscr{E}} |Q_n(A) - \Phi_V(A)| = O(n^{-k/(k+1)}), \eqno(N.3)$$

where $\mathscr{E}$ is the class of all ellipsoids of the form

$$A = \{x \in R^k : \langle x, V^{-1}x \rangle \le c\} \qquad (c > 0), \eqno(N.4)$$

and $Q_n$ is the distribution of $n^{-1/2}(X_1 + \cdots + X_n)$. Note that if the common distribution of $\{X_n : n \ge 1\}$ satisfies Cramer's condition (20.1), then Corollary 20.4 yields a smaller error, namely $O(n^{-1})$, in (N.3), since $P_1(-\Phi_{0,V} : \{\bar{\chi}_\nu\})(A) = 0$ for $A \in \mathscr{E}$. The strength of Esseen's result lies in its validity for discrete distributions. Indeed, he showed that when specialized to lattice random vectors (N.3) is equivalent to a deep result of Landau in analytic number theory: the number $B(c)$ of lattice points, that is, points with integer coordinates, in the ellipsoid (N.4) is of the order

$$B(c) = (\text{volume of } A) + O\big(c^{k/2 - k/(k+1)}\big) \qquad (c \to \infty). \eqno(N.5)$$

A significant extension of (N.3) to a class of smooth convex bodies has been recently obtained by Matthes [1]. Since in this case the contribution from $P_1(-\Phi_{0,V} : \{\bar{\chi}_\nu\})$ is nonzero, one must include this term in (N.3).
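The lattice-point estimate (N.5) is easy to observe numerically in the simplest case $k = 2$, $V = I$ (the Gauss circle problem): the count of integer points in the disk $x_1^2 + x_2^2 \le c$ differs from the area $\pi c$ by an amount of order $c^{1/3}$. The value of $c$ below is an illustrative choice.

```python
# Count integer lattice points in the disk of squared radius c and compare
# with its area; by (N.5) with k = 2, the error is O(c^{1/3}).
import math

def lattice_points_in_disk(c):   # c: a positive integer
    r = math.isqrt(c)
    # for each integer i with |i| <= sqrt(c), the admissible j satisfy
    # |j| <= sqrt(c - i^2), giving 2*isqrt(c - i^2) + 1 points
    return sum(2 * math.isqrt(c - i * i) + 1 for i in range(-r, r + 1))

c = 10_000
B = lattice_points_in_disk(c)
print(B, math.pi * c, B - math.pi * c)   # the discrepancy is tiny relative to pi*c
```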
CHAPTER 6

Two Recent Improvements

In this section a sharper smoothing inequality than those provided by Lemmas 11.1 and 11.4 is derived. This inequality is due to Sweeting (1977), and it enables one to remove the factor involving $\log n$ from the expression in Theorem 15.4.

24. ANOTHER SMOOTHING INEQUALITY

We continue to use the notation of Section 11. Let $f$ be a real-valued Borel-measurable function on $R^k$. Recall that $M_f(x : \varepsilon)$, $m_f(x : \varepsilon)$ are the supremum and infimum of $f$ on the open ball of radius $\varepsilon$ centered at $x$. Also $\omega_f(x : \varepsilon) = M_f(x : \varepsilon) - m_f(x : \varepsilon)$, and $f_x(y) = f(x + y)$. Let $\mu$ be a finite measure, $\nu$ a finite signed measure, and $K_\varepsilon$ a probability measure on $R^k$. Assume that $a$ and $\varepsilon$ are two positive numbers such that

$$\alpha \equiv \int_{\{\|x\| < a\varepsilon\}} K_\varepsilon(dx) > 1/2, \eqno(24.1)$$

and let $\varepsilon' > \varepsilon$. Write

$$\delta_0 = \Big| \int f(y)\,(\mu - \nu)(dy) \Big|. \eqno(24.2)$$

Assume first that

$$\delta_0 = \int f(y)\,(\mu - \nu)(dy). \eqno(24.3)$$
Then

$$\int M_f(\cdot : a\varepsilon)\, d\big((\mu - \nu) * K_\varepsilon\big) = \int \Big[\int M_f(x + y : a\varepsilon)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx)$$
$$= \Big[\int_{\{\|x\| < a\varepsilon\}} + \int_{\{a\varepsilon \le \|x\| < a\varepsilon'\}} + \int_{\{\|x\| \ge a\varepsilon'\}}\Big] \Big\{\int M_f(x + y : a\varepsilon)\,(\mu - \nu)(dy)\Big\} K_\varepsilon(dx)$$
$$= I_1 + I_2 + I_3, \eqno(24.4)$$

say. Since $M_f(x + y : a\varepsilon) - f(y) \ge 0$ if $\|x\| < a\varepsilon$, one has

$$I_1 \ge -\int_{\{\|x\| < a\varepsilon\}} \Big[\int \big(M_f(x + y : a\varepsilon) - f(y)\big)\,\nu(dy)\Big] K_\varepsilon(dx) + \int_{\{\|x\| < a\varepsilon\}} \Big[\int f(y)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx)$$
$$\ge -\alpha \int \omega_f(y : 2a\varepsilon)\,\nu^+(dy) + \alpha\,\delta_0, \eqno(24.5)$$
where $\nu = \nu^+ - \nu^-$ is the Jordan decomposition of $\nu$. Also,

$$I_2 = \int_{\{a\varepsilon \le \|x\| < a\varepsilon'\}} \Big[\int \big(M_f(x + y : a\varepsilon) - f(x + y)\big)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx)$$
$$\qquad + \int_{\{a\varepsilon \le \|x\| < a\varepsilon'\}} \Big[\int f(x + y)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx)$$
$$\ge -\int \omega_f(\cdot : a\varepsilon)\, d\,\big|(\mu - \nu) * K_\varepsilon\big| - \delta_1(1 - \alpha), \eqno(24.6)$$

where

$$\delta_1 = \sup_{\{\|x\| < a\varepsilon'\}} \Big| \int f(x + y)\,(\mu - \nu)(dy) \Big|. \eqno(24.7)$$
Next,

$$|I_3| \le \int_{\{\|x\| \ge a\varepsilon'\}} \Big[\int \big(|f(x + y)| + \omega_f(x + y : a\varepsilon)\big)\,|\mu - \nu|(dy)\Big] K_\varepsilon(dx)$$
$$\le \Big(\sup_{x \in R^k} \int \big(|f(x + y)| + \omega_f(x + y : a\varepsilon)\big)\,|\mu - \nu|(dy)\Big) \int_{\{\|x\| \ge a\varepsilon'\}} K_\varepsilon(dx). \eqno(24.8)$$

Combining (24.4)-(24.8), one arrives at

$$\alpha\,\delta_0 \le \int M_f(\cdot : a\varepsilon)\, d\big((\mu - \nu) * K_\varepsilon\big) + \alpha \int \omega_f(\cdot : 2a\varepsilon)\, d\nu^+ + \int \omega_f(\cdot : a\varepsilon)\, d\,\big|(\mu - \nu) * K_\varepsilon\big| + \delta_1(1 - \alpha)$$
$$\qquad + \Big(\sup_{x \in R^k} \int \big(|f(x + y)| + \omega_f(x + y : a\varepsilon)\big)\,|\mu - \nu|(dy)\Big) \int_{\{\|x\| \ge a\varepsilon'\}} K_\varepsilon(dx). \eqno(24.9)$$
In case $\delta_0 = -\int f\, d(\mu - \nu)$, the above computations are applied to $-f$ (in place of $f$). Thus in all cases one obtains

$$\alpha\,\delta_0 \le \theta_1(\varepsilon) + \alpha\,\theta_2(\varepsilon) + c\,\theta_3(\varepsilon) + \delta_1(1 - \alpha), \eqno(24.10)$$

where

$$\theta_1(\varepsilon) = \int \big(|f| + 2\omega_f(\cdot : a\varepsilon)\big)\, d\,\big|(\mu - \nu) * K_\varepsilon\big|, \qquad \theta_2(\varepsilon) = \int \omega_f(\cdot : 2a\varepsilon)\, d\nu^+,$$
$$\theta_3(\varepsilon) = \int_{\{\|x\| \ge a\varepsilon'\}} K_\varepsilon(dx), \qquad c = \sup_{x \in R^k} \int \big(|f(x + y)| + \omega_f(x + y : a\varepsilon)\big)\,|\mu - \nu|(dy). \eqno(24.11)$$
Now define

$$\delta_j = \sup_{\{\|x\| \le j a\varepsilon'\}} \Big| \int f(x + y)\,(\mu - \nu)(dy) \Big| \qquad (j = 1, 2, \ldots). \eqno(24.12)$$

We are going to establish a relation such as (24.10) between $\delta_j$ and $\delta_{j+1}$ for all $j = 1, 2, \ldots$. Fix $j$. Let $\eta$ be a number satisfying $0 < \eta < \delta_j$. (Assume $\delta_j > 0$; else there is nothing to prove.) Once again suppose

$$\delta_j = \sup_{\{\|x\| \le j a\varepsilon'\}} \int f(x + y)\,(\mu - \nu)(dy). \eqno(24.13)$$

There exists $x_j$ such that $\|x_j\| \le j a\varepsilon'$ and

$$\int f(x_j + y)\,(\mu - \nu)(dy) > \delta_j - \eta. \eqno(24.14)$$
Then

$$\int \Big[\int M_f(x_j + y + x : a\varepsilon)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx)$$
$$= \int \Big[\int \big(M_f(x_j + y + x : a\varepsilon) - f(x_j + y)\big)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx) + \int \Big[\int f(x_j + y)\,(\mu - \nu)(dy)\Big] K_\varepsilon(dx)$$
$$= I_{1j} + I_{2j} + I_{3j}, \eqno(24.15)$$

where $I_{1j}$, $I_{2j}$, $I_{3j}$ are the integrals (with respect to $K_\varepsilon(dx)$) over $\{\|x\| < a\varepsilon\}$, $\{a\varepsilon \le \|x\| < a\varepsilon'\}$, $\{\|x\| \ge a\varepsilon'\}$, respectively. As in (24.5)-(24.8), one has

$$I_{1j} \ge -\alpha \int \omega_f(x_j + y : 2a\varepsilon)\,\nu^+(dy) + \alpha\,(\delta_j - \eta),$$
$$I_{2j} \ge -\int \omega_{f_{x_j}}(\cdot : a\varepsilon)\, d\,\big|(\mu - \nu) * K_\varepsilon\big| - \delta_{j+1}(1 - \alpha),$$
$$|I_{3j}| \le c\,\theta_3(\varepsilon). \eqno(24.16)$$

Since $\eta > 0$ may be taken arbitrarily small, one has

$$\alpha\,\delta_j \le \theta_{1j}(\varepsilon) + \alpha\,\theta_{2j}(\varepsilon) + c\,\theta_3(\varepsilon) + \delta_{j+1}(1 - \alpha) \qquad (j = 1, 2, \ldots), \eqno(24.17)$$

where

$$\theta_{1j}(\varepsilon) = \int \big(|f_{x_j}| + 2\omega_{f_{x_j}}(\cdot : a\varepsilon)\big)\, d\,\big|(\mu - \nu) * K_\varepsilon\big|, \qquad \theta_{2j}(\varepsilon) = \int \omega_{f_{x_j}}(\cdot : 2a\varepsilon)\, d\nu^+, \eqno(24.18)$$
248
Two Recent Improvements
and c, 0, are as in (24.11) . Now let m be any positive integer, and define
8 1M =
sup
S(^ f=1 + 2w,= (•: as)) d}(u,
— v).K E 1,
(IxI <— mas' )
sup (UX1 s mas' )
B zM =
,y = M
J w,: (.: 2ae) dv' ,
1. + c03() + a 81,,, a
(24.19)
Then one has
S., <_ (1/a) 8, M + (c/a) 0 3 (e) + 0 2M + ((1 — a) / a) 5^.,
TM + ((I — a)/ a) Sj„
(24.20)
(j = 0, 1, 2, ...) .
Iteration now yields
M-I
:
So
^
1+ 1
a
a
a + 1
a
+ ... + 1
M
+ 1
—
a
a SM
a
a
Smoothing Inequality 1 - a ). 1-( =y- a + ( 1_a 2a
-
1
249
8^
a
(24.21)
The following result is now proved. LEMMA 24.1. Let µ be a finite measure, v a finite signed measure, and K. a probability measure on Rk. Suppose (24.1) holds for some positive numbers a, s, and let E' > F. Then the inequality (24.21) holds with 6o given by (24.2) and -yam, defined by (24.19), (24.20).
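The passage from the one-step recursion (24.20) to the closed-form bound (24.21) is a finite geometric-series argument, which can be checked directly; the numbers below are arbitrary test values, not constants from the text.

```python
# Check the iteration behind (24.21): if delta_j = gamma + r*delta_{j+1} (equality,
# the worst case) with r = (1-alpha)/alpha < 1, then after M steps
# delta_0 = (alpha/(2*alpha-1))*(1 - r^M)*gamma + r^M * delta_M.
alpha, gamma, M = 0.8, 0.05, 12       # arbitrary test values (alpha > 1/2)
r = (1 - alpha) / alpha
delta = [0.0] * (M + 1)
delta[M] = 2.0                        # arbitrary terminal value
for j in range(M - 1, -1, -1):
    delta[j] = gamma + r * delta[j + 1]
closed_form = alpha * (1 - r**M) / (2 * alpha - 1) * gamma + r**M * delta[M]
print(delta[0], closed_form)          # the two agree
```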
As a consequence of this strengthening of Lemma 11.4 one may obtain the following improvement of Theorems 15.1, 15.4. The notation below is that of Section 15, unless stated otherwise. The positive constants $d_i$ depend only on $k$, $s$, $r_0$ and are suitably chosen.

THEOREM 24.2. Under the hypothesis of Theorem 15.1 one has, for each positive integer $j$,

$$\Big| \int f\, d(Q_n - \Psi_n') \Big| \le M_{r_0}(f)\big(d_1\, n^{-(s-2)/2}\,\bar{\Delta}_{s,n}^* + d_{2,j}\,(\bar{\rho}_3/n^{1/2})^j\big) + d_3\,\bar{\omega}_f(d_4\,\bar{\rho}_3\, n^{-1/2} : \Phi_{0,V}), \eqno(24.22)$$

where $r_0$ and the remaining quantities are as defined in the statement of Theorem 15.1.
Proof. The proof of Theorem 15.1 remains unchanged up to (and including) relation (15.30). Now apply Lemma 24.1 with

$$\mu(dy) = (1 + \|y\|^{r_0})\,Q_n'(dy), \qquad \nu(dy) = (1 + \|y\|^{r_0})\,\Psi'(dy), \eqno(24.23)$$

and $K_\varepsilon$ satisfying (15.25) and

$$\int \|y\|^{r_0}\, K_\varepsilon(dy) < \infty. \eqno(24.24)$$

Let (see (15.25), (15.33))

$$a = 1, \qquad \varepsilon = 16 c_9\,\bar{\rho}_3\, n^{-1/2}, \qquad \varepsilon' \text{ as in (15.33)}, \qquad M = [\varepsilon'^{-1/2}], \eqno(24.25)$$

and note that $\alpha \ge 3/4$ (see (15.25)). Instead of (15.31) one then obtains the relation

$$\Big| \int g_n(x)\,(1 + \|x\|^{r_0})\,(Q_n' - \Psi')(dx) \Big| \le 2\gamma_M + \Big(\frac{1 - \alpha}{\alpha}\Big)^M \delta_M. \eqno(24.26)$$

By (15.48),

$$\theta_{1M} \le d_6\, M_{r_0}(f)\,\big\|(\mu - \nu) * K_\varepsilon\big\| \le d_7\, M_{r_0}(f)\,\bar{\Delta}_{s,n}^*\, n^{-(s-2)/2}. \eqno(24.27)$$
To estimate $\theta_{2M}$, write $\xi$ for the density of $\Psi'$ and note that

$$\int \omega_{f_{a_n + x}}(y : 2a\varepsilon)\,\nu^+(dy) = \int \omega_f(a_n + x + y : 2a\varepsilon)\,(1 + \|y\|^{r_0})\,\xi^+(y)\, dy$$
$$= \int \omega_f(z : 2a\varepsilon)\,(1 + \|z - a_n - x\|^{r_0})\,\xi^+(z - a_n - x)\, dz$$
$$= \int \omega_f(z : 2a\varepsilon)\,(1 + \|z\|^{r_0})\,\xi^+(z)\, dz$$
$$\qquad + \int \omega_f(z : 2a\varepsilon)\,\big[(1 + \|z - a_n - x\|^{r_0})\,\xi^+(z - a_n - x) - (1 + \|z\|^{r_0})\,\xi^+(z)\big]\, dz. \eqno(24.28)$$

For $\|x\| \le M a\varepsilon' = M\varepsilon' \le \varepsilon'^{1/2}$ one may show, as in the proof of Lemma 14.6 (especially see (14.81)-(14.86)),

$$\big|(1 + \|z - a_n - x\|^{r_0})\,\xi^+(z - a_n - x) - (1 + \|z\|^{r_0})\,\xi^+(z)\big|$$
$$\le \big|\,\|z - a_n - x\|^{r_0} - \|z\|^{r_0}\big|\,\xi^+(z) + (1 + \|z - a_n - x\|^{r_0})\,\big|\xi^+(z - a_n - x) - \xi^+(z)\big|$$
$$\le d_8\,(\varepsilon'^{1/2} + \|a_n\|)\,(1 + \|z\|^{r_0 - 1})\,\xi^+(z)$$
$$\qquad + d_9\,(\varepsilon'^{1/2} + \|a_n\|)\,(1 + \|z\|^{r_0 + 1})\exp\big\{\tfrac{1}{2}(\varepsilon'^{1/2} + \|a_n\|)^2 + (\varepsilon'^{1/2} + \|a_n\|)\,\|z\| - \tfrac{1}{2}\|z\|^2\big\}$$
$$\le d_{10}\,(1 + \|z\|^{r_0 - 1})\,\xi^+(z) + d_{11}\,(1 + \|z\|^{r_0})\exp\{-\tfrac{1}{2}\|z\|^2\} \qquad \text{if } \|z\| \le d_{12}\,(\varepsilon'^{1/2} + \|a_n\|)^{-1},$$
$$\le d_{10}\,(1 + \|z\|^{r_0 - 1})\,\xi^+(z) + d_{13,j}\,\varepsilon'^j \exp\{-\|z\|^2/4\} \qquad \text{if } \|z\| > d_{12}\,(\varepsilon'^{1/2} + \|a_n\|)^{-1}. \eqno(24.29)$$

Hence

$$\theta_{2M} \le d_{13} \int \omega_f(z : 2\varepsilon)\,(1 + \|z\|^{r_0})\,\xi^+(z)\, dz + d_{14,j}\, M_{r_0}(f)\,\varepsilon'^j. \eqno(24.30)$$
Further, by Lemma 14.3 (or (14.93)), and inequalities (14.12), (14.81) (also see (15.28)), one has

$$\|\mu\| = \int (1 + \|y\|^{r_0})\,Q_n'(dy) \le d_{15}, \qquad \|\nu\| = \int (1 + \|y\|^{r_0})\,|\xi(y)|\, dy \le d_{16}. \eqno(24.31)$$

These lead to the estimate

$$c = \sup_{x \in R^k} \int \big(|f(x + y)| + \omega_f(x + y : a\varepsilon)\big)\,|\mu - \nu|(dy) \le d_{17}\, M_{r_0}(f)\,(\|\mu\| + \|\nu\|) \le d_{18}\, M_{r_0}(f). \eqno(24.32)$$

Next,

$$\theta_3(\varepsilon) = \int_{\{\|x\| \ge \varepsilon'\}} K_\varepsilon(dx) = \int_{\{\|z\| \ge \varepsilon'/\varepsilon\}} K(dz) \le \frac{\varepsilon}{\varepsilon'} \int \|z\|\, K(dz) = d_{19}\,\frac{\varepsilon}{\varepsilon'}. \eqno(24.33)$$
Combining (24.27), (24.30), (24.32), (24.33), one has (see (24.19))

$$\gamma_M \le M_{r_0}(f)\big(d_{20}\, n^{-(s-2)/2}\,\bar{\Delta}_{s,n}^* + d_{21,j}\,\varepsilon'^j\big) + d_{22} \int \omega_f(z : 2\varepsilon)\,(1 + \|z\|^{r_0})\,\Psi'(dz). \eqno(24.34)$$

By an application of Lemma 14.6, (24.34) is reduced to

$$\gamma_M \le M_{r_0}(f)\big(d_{20}\, n^{-(s-2)/2}\,\bar{\Delta}_{s,n}^* + d_{21,j}\,\varepsilon'^j\big) + d_{23} \int \omega_f(z : 2\varepsilon)\,(1 + \|z\|^{r_0})\,\Phi_{0,V}(dz). \eqno(24.35)$$

Since $\alpha \ge 3/4$, one easily shows that

$$\Big(\frac{1 - \alpha}{\alpha}\Big)^M \le \Big(\frac{1}{3}\Big)^M \le d_{24,j}\,\varepsilon'^j \eqno(24.36)$$

for $\varepsilon' \le 1$. In view of (15.11) and (14.60) one may assume $\varepsilon' \le 1$. Using (24.35), (24.36) in (24.26), and in turn using (24.26) in (15.29), the proof of (24.22) is completed. Q.E.D.
Note: In view of the Remark on p. 151, one may replace $M_{r_0}(f)$ in (24.22) by $M_{r_0}^*(f) = \inf\{M_{r_0}(f - c) : c \in R^1\}$.
COROLLARY 24.3. (a) $\bar{\omega}^*$ may be replaced by $\bar{\omega}$ in Corollary 15.2, Theorem 16.1, and Corollary 16.2. (b) Theorem 17.1 holds with $\Delta_n(d : \Phi_{0,V})$ replacing $\Delta_n^*(d : \Phi_{0,V})$ (see (17.52)). (c) Theorem 17.4 holds for all Borel sets $A$ satisfying

$$\Phi_{0,V}\big((\partial A)^\varepsilon\big) \le c_6\,\varepsilon^a \qquad (\varepsilon > 0) \eqno(24.37)$$

for some $a > 1$. (d) Theorem 17.9 for the estimation of the Prokhorov distance $d_P$ (and the bounded Lipschitzian distance $d_{BL}$) holds with $\varepsilon = c_7\,\bar{\rho}_3\, n^{-1/2}$.
25. ASYMPTOTIC EXPANSIONS OF EXPECTATIONS OF SMOOTH FUNCTIONS OF NORMALIZED SUMS

Theorem 20.7 of Chapter 4 provides an asymptotic expansion up to order $o(n^{-(s-2)/2})$ of $\int f\, dQ_n$ for every real- or complex-valued function $f$ on $R^k$ which is the Fourier-Stieltjes transform of a finite (possibly complex-valued) signed measure $\mu_f$ satisfying

$$\int_{\{\|t\| > A\}} |\mu_f|(dt) = o\big(A^{-(s-2)/2}\big) \qquad \text{as } A \to \infty. \eqno(25.1)$$

There is no distributional assumption on the common distribution $Q_1$ of the i.i.d. summands, excepting that of nondegeneracy and finiteness of moments of order $s$. In particular, the result applies to all trigonometric polynomials and all Schwartz functions. (See p. 248 for the definition of Schwartz functions.) This result is refined below by a simple argument involving completion in a Sobolev norm. This section provides an alternative approach, used in Bhattacharya (1985), to a very useful result of Gotze and Hipp (1978).

Let $m$ be a positive integer. The Sobolev space $W^{m,2}$ is the class of all (equivalence classes of) complex-valued functions $g$ on $R^k$ which have square-integrable distributional derivatives $D^\alpha g$, $0 \le |\alpha| \le m$. Recall that $g$ has a square-integrable distributional derivative $h$ of order $\alpha$, denoted $h = D^\alpha g$, if $h \in L^2(R^k)$ and

$$\int h(x)\, v(x)\, dx = (-1)^{|\alpha|} \int g(x)\,(D^\alpha v)(x)\, dx \eqno(25.2)$$

for all infinitely differentiable complex-valued functions $v$ having compact support. In particular, the class $\mathcal{S}$ of all Schwartz functions on $R^k$ is contained in $W^{m,2}$. For $g \in \mathcal{S}$ the distributional derivative $D^\alpha g$ coincides with the classical derivative of $g$ of order $\alpha$. Also one has, by Theorem 4.1 (v), (vi),
$$\|D^\alpha g\|_2 = (2\pi)^{-k/2}\,\|t^\alpha \hat{g}(t)\|_2 \qquad (g \in \mathcal{S}). \eqno(25.3)$$

Define the inner product

$$\langle g_1, g_2 \rangle_{m,2} = \int (1 + \|t\|^2)^m\, \hat{g}_1(t)\,\overline{\hat{g}_2(t)}\, dt \eqno(25.4)$$

on $\mathcal{S}$. This inner product can be extended to the class of all square-integrable functions $g$ on $R^k$ satisfying

$$\int (1 + \|t\|^2)^m\, |\hat{g}(t)|^2\, dt < \infty. \eqno(25.5)$$

Thus the extension is to a linear subset $H$ of $L^2(R^k)$ whose image under the Fourier transform is the Hilbert space $L^2\big(R^k, (1 + \|t\|^2)^m\, dt\big)$. Hence $H$ is a Hilbert space under the inner product (25.4). In view of (25.3), on $\mathcal{S}$ this Hilbert space norm is equivalent (topologically) to the norm

$$\|g\|_{m,(2)} = \Big(\sum_{0 \le |\alpha| \le m} \|D^\alpha g\|_2^2\Big)^{1/2}. \eqno(25.6)$$
This leads to the identification of sets $H = W^{m,2}$. From now on $W^{m,2}$ will be identified with the Hilbert space $H$, not only as a set but also as having the same inner product (25.4).

For the sake of simplicity we consider only the i.i.d. case. Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. $k$-dimensional random vectors having a positive-definite dispersion matrix $V$. Let $Q_n$ denote the distribution of $n^{-1/2} \sum_{j=1}^{n} (X_j - EX_j)$. Write $\Psi_{n,j}$ for the Edgeworth expansion (signed measure) of $Q_n$ of order $j$:

$$\Psi_{n,j} = \sum_{r=0}^{j} n^{-r/2}\, P_r(-\Phi_{0,V} : \{\bar{\chi}_\nu\}), \eqno(25.7)$$

where $\bar{\chi}_\nu$ is the cumulant of $X_1$ of order $\nu$.
THEOREM 25.1. Let $s$ be an integer, $s \ge 3$, such that $\rho_s = E\|X_1 - EX_1\|^s < \infty$. Let $m$ be an integer satisfying $k/2 < m \le s - 2$. If $(1 + \|x\|^2)^{-s/2} f \in W^{m,2}$, then one has

$$\int f\, d(Q_n - \Psi_{n,s-2}) = o\big(n^{-(m/2 - k/4)}\big). \eqno(25.8)$$
Proof. Write $g(x) = (1 + \|x\|^2)^{-s/2} f(x)$. Since $g \in W^{m,2}$ one has

$$\int |\hat{g}(t)|\, dt = \int (1 + \|t\|^2)^{-m/2}\,(1 + \|t\|^2)^{m/2}\,|\hat{g}(t)|\, dt \le \Big(\int (1 + \|t\|^2)^{-m}\, dt\Big)^{1/2} \|g\|_{m,2} < \infty. \eqno(25.9)$$

Also

$$\int_{\{\|t\| > A\}} |\hat{g}(t)|\, dt \le \Big(\int_{\{\|t\| > A\}} (1 + \|t\|^2)^{-m}\, dt\Big)^{1/2} \Big(\int_{\{\|t\| > A\}} (1 + \|t\|^2)^m\, |\hat{g}(t)|^2\, dt\Big)^{1/2}$$
$$= o\big(A^{-m + k/2}\big) \qquad \text{as } A \to \infty. \eqno(25.10)$$

Now assume for the moment that $s$ is an even integer. By Parseval's relation (Theorem 5.1 (vii)) one has (note that $\hat{g} = (2\pi)^k \tilde{g}$, where $\tilde{g}$ is defined by (4.4))

$$\Big| \int f\, d(Q_n - \Psi_{n,s-2}) \Big| = \Big| \int g(x)\,(1 + \|x\|^2)^{s/2}\,\big(Q_n(dx) - \Psi_{n,s-2}(dx)\big) \Big|$$
$$= (2\pi)^{-k} \Big| \int \hat{g}(t)\,(1 - \Delta)^{s/2}\big(\hat{Q}_n(t) - \hat{\Psi}_{n,s-2}(t)\big)\, dt \Big|, \eqno(25.11)$$
where $\Delta = \sum_{i=1}^{k} \partial^2/\partial t_i^2$. Using Theorem 9.12 and the estimate (25.10) with $A = c_1(s,k)\, n^{1/2} \rho_s^{-1/(s-2)}$, one can bound the last integral by

$$\sup_{t \in R^k} |\hat{g}(t)| \int_{\{\|t\| \le A\}} \big|(1 - \Delta)^{s/2}\big(\hat{Q}_n(t) - \hat{\Psi}_{n,s-2}(t)\big)\big|\, dt$$
$$\qquad + \sup_{t \in R^k} \big|(1 - \Delta)^{s/2}\big(\hat{Q}_n(t) - \hat{\Psi}_{n,s-2}(t)\big)\big| \int_{\{\|t\| > A\}} |\hat{g}(t)|\, dt$$
$$= o\big(n^{-(s-2)/2}\big) + o\big(n^{-(m/2 - k/4)}\big). \eqno(25.12)$$

In case $s$ is an odd integer, use truncation and carry out the above computations with $s + 1$ replacing $s$. Q.E.D.
For some purposes the result of Gotze and Hipp (1978) is somewhat better and, therefore, let us state it here.

THEOREM 25.2. Assume that $\rho_s < \infty$ for some integer $s \ge 3$. If (i) $D^\alpha f$ is continuous for $|\alpha| \le s - 2$, (ii) $(1 + \|x\|^2)^{-s/2}\,|f(x)|$ is bounded above, and (iii) $D^\alpha f$ has at most a polynomial growth at infinity for $|\alpha| = s - 2$, then

$$\int f\, d(Q_n - \Psi_{n,s-2}) = o\big(n^{-(s-2)/2}\big). \eqno(25.13)$$

In order to make a simple comparison between the two theorems above, assume $\rho_s < \infty$ for all integers $s > 0$. If $f$ is $m$-times continuously differentiable and its derivatives of order $m$ are polynomially bounded, then Theorem 25.2 provides an asymptotic expansion of $\int f\, dQ_n$ with an error $o(n^{-m/2})$. Theorem 25.1, on the other hand, gives a larger error $o(n^{-(m/2 - k/4)})$. However, there are functions in $W^{m,2}$ which are not $m$-times continuously differentiable. In general, all that can be said is: if $g \in W^{m,2}$ then $g$ has continuous derivatives $D^\alpha g$ for all $\alpha$ satisfying $|\alpha| < m - k/2$.† Thus there are functions $f$ for which Theorem 25.1 provides a sharper result.

Finally, let us mention the recent monographs by Hall (1982) and Sazonov (1981) on the subject matter of this monograph.

† See Reed, M., and Simon, B. [1], Theorem IX.24, p. 52.
CHAPTER 7

An Application of Stein's Method

In this section, we first present a brief outline of a method of approximation due to Stein (1986), which is, in general, not Fourier analytic. This is followed by a detailed derivation of the Berry-Esseen bound for convex sets obtained by Gotze (1991), who used Stein's method.

26. AN EXPOSITION OF GOTZE'S ESTIMATION OF THE RATE OF CONVERGENCE IN THE MULTIVARIATE CENTRAL LIMIT THEOREM

In his article Gotze (1991) used Stein's method to provide an ingenious derivation of the Berry-Esseen-type bound for the class of Borel convex subsets of $R^k$ in the context of the classical multivariate central limit theorem. This approach has proved fruitful in deriving error bounds for the CLT under certain structures of dependence as well (see Rinott and Rotar (1996)). Our view and elaboration of Gotze's proof follow Bhattacharya and Holmes (2010) and were first presented in a seminar at Stanford given in the summer of 2000. The authors wish to thank Persi Diaconis for pointing out the need for a more readable account of Gotze's result than that given in his original work. Raic (2004) has followed essentially the same route as Gotze, but in greater detail, in deriving Gotze's bound. It may be pointed out that we are unable to verify the derivations of the dimensional dependence $O(k)$ in Gotze (1991) and Raic (2004). Our derivation provides a higher order of dependence of the error rate on $k$, namely $O(k^{5/2})$. This rate can be reduced to $O(k^{3/2})$ using an inequality of Ball (1993). The best order of dependence known, namely $O(k^{1/4})$, is given by Bentkus (2003), using a different method, which would be difficult to extend to dependent cases.

As a matter of notation, the constants $c$, with or without subscripts, are absolute constants. The $k$-dimensional standard Normal distribution is denoted by $N(0, I_k)$ as well as $\Phi$, with density $\phi$.
26.1 The Generator of the Ergodic Markov Process as a Stein Operator

Suppose $Q$ and $Q_0$ are two probability measures on a measurable space $(S, \mathcal{S})$ and $h$ is integrable (with respect to $Q$ and $Q_0$). Consider the problem of estimating

$$Eh - E_0 h \equiv \int h\, dQ - \int h\, dQ_0. \eqno(26.1.1)$$

A basic idea of Stein (1986) (developed in some examples in Diaconis and Holmes (2004) and Holmes (2004)) is
Chapter 7. An Application of Stein's Method
262
(i) to find an invertible map L which maps "nice" functions on S into the kernel or null space of Eo, (ii) to find a perturbation of L, say, L a , which maps "nice" functions on S into the kernel or null space of E, and (iii) to estimate (26.1.1) using the identity Eh - Eoh = ELgo = E(Lgo - L a ga ),
(26.1.2)
where go = L -1 (h - Eoh),
ga - L
-
1 (h
- Eh).
In the present application, instead of finding a perturbation L a of L, one obtains a smooth perturbation Tt h, say, of h, and applies the first relation in (26.1.2) to Tt h rather than h. Writing V) t = L -1 (Tt - EoTt h) in place of go above, one then estimates EL b = ETt h - EoTt h. Finally, the extent of perturbation due to smoothing is estimated: (ETt h - EoTt h) - (Eh - Eoh). One way to find L is to consider an ergodic Markov process {Xt : t > 0} on S which has Qo as its invariant distribution and let L be its generator. Lg
t oTtgt
-
g 2 g 6.1.3 E Dc,, ) (
where the limit is in L 2 (S, Qo) , and (Ttg)(x) = E ]g(Xt)IXo = x] , or, in terms of the transitions probability p(t; x, dy) of the Markov process {Xt :t>0}, (Tt g)(x)
= Js
g(y)p(t; x, dy)
(x E S, t > 0).
(26.1.4)
26.1. The generator of the ergodic Markov process.
263
Also, DL is the set of g for which the limit in (26.1.3) exists. By the Markov (or semigroup) property, Tt +s = Tt T3 = T3 Tt , so that d Ttg = lim
dt
Tt+sg - Ttg
40
= lim
s
Tt(Tsg - g) = T t Lg.
(26.1.5)
s
40
Since Tt T3 = T8 Tt , Tt and L commute, (26.1.6)
dt Ttg = LTtg.
Note that invariance of Qo means ETt g(Xo) = Eg(Xo) = f gdQo, if the distribution of X 0 is Qo. This implies that, for every g E DL, ELg(Xo) = 0, or
i
s
Lg(x)dQo(x) = 0,
[ELg(Xo) = E^ t o
Ttg(Xo) - g(X0) __
t
)
urn ETeg(Xo) - Eg(Xo) t
That is, L maps DL into the set 1 1 of mean zero functions in L 2 (S, Qo). It is known that the range of L is dense in 1 -L and if L has a spectral gap, then the range of L is all of 1 1 . In the latter case L -1 is well defined on 1 -'- (kernel of Qo) and is bounded on it (Bhattacharya (1982)). Since Tt converges to the identity operator as t . 0, one may also use Tt for small t > 0 to smooth the target function h = h - f hdQo. For the case of a diffusion {Xt : t > 0}, L is a differential operator and even nonsmooth functions such as h = 1B - Qo(B)(h = 1B) are immediately made smooth by applying T. One may then use the approximation to h given by
Tt h = L(L - 'Tt h) =Li t , with O t = L -1 Tt h,
(26.1.7)
and then estimate the error of this approximation by a "smoothing inequality", especially if Tt h may be represented as a perturbation by convolution.
Chapter 7. An Application of Stein's Method
264
For several perspectives and applications of Stein's method, see Barbour (1988), Diaconis and Holmes (2004), Holmes (2004), and Rinott and Rotar (1996).
26.1(a) The Ornstein-Uhlenbeck Process and Its Gaussian Invariant Distribution

The Ornstein-Uhlenbeck (OU) process is governed by the Langevin equation (see, e.g., Bhattacharya and Waymire (2009), pp. 476, 597, 598)

$$dX_t = -X_t\, dt + \sqrt{2}\, dB_t, \eqno(26.1.8)$$

where $\{B_t : t \ge 0\}$ is a $k$-dimensional standard Brownian motion. Its transition density is

$$p(t; x, y) = \prod_{i=1}^{k} \big[2\pi(1 - e^{-2t})\big]^{-1/2} \exp\Big\{-\frac{(y_i - e^{-t}x_i)^2}{2(1 - e^{-2t})}\Big\},$$
$$x = (x_1, \ldots, x_k), \quad y = (y_1, \ldots, y_k). \eqno(26.1.9)$$

This is the density of a Gaussian (Normal) distribution with mean vector $e^{-t}x$ and dispersion matrix $(1 - e^{-2t})I_k$, where $I_k$ is the $k \times k$ identity matrix. One can check (e.g., by direct differentiation) that the Kolmogorov backward equation holds:

$$\frac{\partial p(t; x, y)}{\partial t} = \sum_{i=1}^{k} \frac{\partial^2 p(t; x, y)}{\partial x_i^2} - \sum_{i=1}^{k} x_i\, \frac{\partial p(t; x, y)}{\partial x_i} = \Delta p - x \cdot \nabla p = Lp, \quad \text{with } L \equiv \Delta - x \cdot \nabla, \eqno(26.1.10)$$
265
where © is the Laplacian and V = grad. Integrating both sides w.r.t. h(y)dy, we see that Tt h(x) = f h(y)p(t; x, y)dy satisfies
a
Tt h(x) = ATt h(x) - x • VTt h(x) = LTt h(x), Vh E L 2 (IP C , (F).
(26.1.11)
Now on the space L 2 (R k , (F) (where (F = N(0, Ilk) is the k-dimensional standard Normal), L is self-adjoint and has a spectral gap, with the eigenvalue 0 corresponding to the invariant distribution 4) (or the constant function 1 on L 2 (R', 4))). This may be deduced from the fact that the Normal density p(t; x, y) (with mean vector e -t x and dispersion matrix (1 - e -2t )IIk) converges to the standard Normal density q(y) exponentially fast as t -* 00, for every initial state x. Else, one can compute the set of eigenvalues of L, namely, {0, -1, -2,.. .}, with eigenfunctions expressed in terms of Hermite polynomials (Bhattacharya and Waymire, 2009, page 487). In particular, L -1 is a bounded operator on 1 1 and is given by
L -1 h = - J
r
T3 h(x)ds,
Vh
h - f hd(F E L 2 (R k ,,D). (26.1.12)
0
To check this, note that by (26.1.11)
(26.1.13)
= -
Tsh(x)ds = -1 00 LTs h(x)ds = L (- ^ Ts h(x)ds) .
Jo
as
0
o
For our purposes $h = 1_C$: the indicator function of a Borel convex subset $C$ of $R^k$. A smooth approximation of $\bar{h}$ is $T_t \bar{h}$ for small $t > 0$ (since $T_t \bar{h}$ is infinitely differentiable). Also, by (26.1.12),

$$\psi_t(x) \equiv L^{-1} T_t \bar{h}(x) = -\int_0^\infty T_s T_t \bar{h}(x)\, ds = -\int_0^\infty T_{s+t} \bar{h}(x)\, ds = -\int_t^\infty T_s \bar{h}(x)\, ds$$
$$= -\int_t^\infty \Big[\int_{R^k} \bar{h}\big(e^{-s}x + \sqrt{1 - e^{-2s}}\, z\big)\,\phi(z)\, dz\Big]\, ds, \eqno(26.1.14)$$

where $\phi$ is the $k$-dimensional standard Normal density. We have expressed $T_s \bar{h}(x) \equiv E[\bar{h}(X_s) \mid X_0 = x]$ in (26.1.14) as

$$E[\bar{h}(X_s) \mid X_0 = x] = E\bar{h}\big(e^{-s}x + \sqrt{1 - e^{-2s}}\, Z\big), \eqno(26.1.15)$$

where $Z$ is a standard Normal $N(0, I_k)$, for $X_s$ has the same distribution as $e^{-s}x + \sqrt{1 - e^{-2s}}\, Z$.

Now note that using (26.1.14), one may write

$$T_t \bar{h}(x) = L\big(L^{-1} T_t \bar{h}(x)\big) = \Delta\big(L^{-1} T_t \bar{h}(x)\big) - x \cdot \nabla\big(L^{-1} T_t \bar{h}(x)\big) = \Delta\psi_t(x) - x \cdot \nabla\psi_t(x). \eqno(26.1.16)$$
= 0 (L - 'Tih(x)) - x . 0 (L - 'Tth(x)) = i i?&t(x) - x - DV)t(x)• For the problem at hand (see 26.1.1), Qo = I and Q = Q( n ) is the distribution of S,,, = *(Y1 + Y2 + . • • + Y.) _ (X1 + X2 + • • + Xn), (X3 = Y^/ / ), where are i.i.d. mean-zero with covariance matrix IIk
and finite absolute third moment k
2
p= EjjYi11 3 = E ^(Y) 2 We want to estimate
Eh(S) = Eh(Sn ) -
J hd4)
for h = 1c, C E C, the class of all Borel convex sets in Rk.
(26.1.17)
26.2. Derivatives of Ii t - L -1 Tt h.
267
For this we first estimate (see (26.1.16)), for small t > 0,
ETth(SS) = E [DOt(S.) — Sn - DV)t(S,)] •
(26.1.18)
This is done in subsection 26.3. The next step is to estimate, for small t > 0,
ETth(S) — Eh(S),
(26.1.19)
which is carried out in subsection 26.4. Combining the estimates of (26.1.18) and (26.1.19), and with a suitable choice oft > 0, one arrives at the desired estimation of (26.1.17). We will write
6n =
sup
J hdQ(,
n
{h=lc:CEC}
26.2 Derivatives of
t
) —
fhdF. (26.1.20)
- L -1 Tt h.
Before we engage in the estimation of (26.1.18) and (26.1.19), it is useful to compute certain derivatives of V)t:
a a2a3 Let Di = —, Dijy = ax Dii, = , axi
axiaxi•
,
,
iaxi, axi„
etc.
Then, using (26.1.14),

$$D_i \psi_t(x) = -\int_t^\infty \Big[\int_{R^k} h(y)\,\big(2\pi(1 - e^{-2s})\big)^{-k/2}\, \frac{e^{-s}(y_i - e^{-s}x_i)}{1 - e^{-2s}}\, \exp\Big\{-\frac{\|y - e^{-s}x\|^2}{2(1 - e^{-2s})}\Big\}\, dy\Big]\, ds$$
$$= -\int_t^\infty \frac{e^{-s}}{\sqrt{1 - e^{-2s}}} \Big[\int_{R^k} h\big(e^{-s}x + \sqrt{1 - e^{-2s}}\, z\big)\, z_i\,\phi(z)\, dz\Big]\, ds, \eqno(26.2.1)$$

using the change of variables

$$z = \frac{y - e^{-s}x}{\sqrt{1 - e^{-2s}}}, \qquad z_i\,\phi(z) = -D_i\phi(z).$$
In the same manner, one has, using $D_{z_i}$, $D_{z_i z_{i'}}$, etc., for the derivatives $\partial/\partial z_i$, $\partial^2/\partial z_i\, \partial z_{i'}$, etc.,

$$D_{ii'}\psi_t(x) = -\int_t^\infty \frac{e^{-2s}}{1 - e^{-2s}} \Big[\int_{R^k} h\big(e^{-s}x + \sqrt{1 - e^{-2s}}\, z\big)\, D_{z_i z_{i'}}\phi(z)\, dz\Big]\, ds,$$
$$D_{ii'i''}\psi_t(x) = -\int_t^\infty \Big(\frac{e^{-s}}{\sqrt{1 - e^{-2s}}}\Big)^3 \Big[\int_{R^k} h\big(e^{-s}x + \sqrt{1 - e^{-2s}}\, z\big)\,\big(-D_{z_i z_{i'} z_{i''}}\phi(z)\big)\, dz\Big]\, ds. \eqno(26.2.2)$$
The following estimate is used in the next section:

$$\sup_{u \in R^k} \Big| \int_{R^k} \int_{R^k} h\Big(\sqrt{\tfrac{n-1}{n}}\, e^{-s}x + e^{-s}u + \sqrt{1 - e^{-2s}}\, z\Big)\,\phi(x)\, D_{ii'i''}\phi(z)\, dx\, dz \Big| \le c_0\, k\, e^{2s}\,(1 - e^{-2s}). \eqno(26.2.3)$$
To prove this, write $a = \sqrt{n/(n-1)}\; e^{s}\sqrt{1 - e^{-2s}}$ and change variables $x \to y = x + az$. Then

$$\phi(x) = \phi(y - az) = \phi(y) - a\,z \cdot \nabla\phi(y) + a^2 \sum_{r,r'=1}^{k} z_r z_{r'} \int_0^1 (1 - v)\, D_{rr'}\phi(y - vaz)\, dv, \eqno(26.2.4)$$

so that

$$h\Big(\sqrt{\tfrac{n-1}{n}}\, e^{-s}x + e^{-s}u + \sqrt{1 - e^{-2s}}\, z\Big) = h\Big(\sqrt{\tfrac{n-1}{n}}\, e^{-s}y + e^{-s}u\Big),$$

and the double integral in (26.2.3) becomes

$$\int_{R^k} \int_{R^k} h\Big(\sqrt{\tfrac{n-1}{n}}\, e^{-s}y + e^{-s}u\Big) \Big[\phi(y) - a\,z \cdot \nabla\phi(y)$$
$$\qquad + a^2 \sum_{r,r'} z_r z_{r'} \int_0^1 (1 - v)\, D_{rr'}\phi(y - vaz)\, dv\Big]\, D_{ii'i''}\phi(z)\, dz\, dy. \eqno(26.2.5)$$

Note that the integrals of $D_{ii'i''}\phi(z)$ and $z_{i_0}\, D_{ii'i''}\phi(z)$ vanish for all $i$, $i'$, $i''$, and $i_0$, so that

$$\int_{R^k} \int_{R^k} h\Big(\sqrt{\tfrac{n-1}{n}}\, e^{-s}y + e^{-s}u\Big)\,\big(\phi(y) - a\,z \cdot \nabla\phi(y)\big)\, D_{ii'i''}\phi(z)\, dz\, dy = 0. \eqno(26.2.6)$$
270
The magnitude of the last term on the right in (26.2.4) is (26.2.7)
f
a 2 (1 — v)
zrzr'(y —vaz) r (y —vaz) r ' —
1
zr O(y — avz)dv r=1
r=1
k
k
zrzr'(y — vaz) r (y — vaz) r ' + E zT O(y — avz)dv,
<_ a 2 1 (1 — v) °
r=1
r=1
since the sum E r r , above is nonnegative. Bounding hj by 1, it follows from (26.2.5)-(26.2.7) that the left side of (26.2.3) is no more than
a2 L
/' 1
(1 (1 — v)
0
{
k
+z
f
f
k
k {(y
r # r ,
fvaz)r(y — vaz) r '(y — vaz)dy ( 1R k
— vaz)T + 1)0(y — avz)dy
jDii'i"O(z)jdz dv
d
r=1
=a2
L
l(I — v
f
2 ^ z2 (Dii'i"O(z)(dz dv,
(26.2.8)
Rk r=1
from which (26.2.3) follows.
26.3 Estimation of $ET_t \bar{h}(S_n)$

By (26.1.16),

$$T_t \bar{h}(S_n) = L\big(L^{-1} T_t \bar{h}\big)(S_n) = L\psi_t(S_n) = \Delta\psi_t(S_n) - S_n \cdot \nabla\psi_t(S_n). \eqno(26.3.1)$$

Consider the Taylor expansions

$$\Delta\psi_t(S_n) = \sum_{i=1}^{k} D_{ii}\psi_t(S_n) = \sum_{i=1}^{k} D_{ii}\psi_t(S_n - X_1) + \sum_{i,i'=1}^{k} X_1^{(i')} \int_0^1 D_{iii'}\psi_t(S_n - X_1 + vX_1)\, dv,$$
26.3. Estimation
ofTt h(Sn ).
271 n k
n
Sn . VV)t(Sn) = E `Yj • VV)t(Sn) = E E n
Xj z)
k
k
DiOt(S. — Xj ) + j=1 i=1
+
(26.3.2)
• DiOt(Sn)
j=1 i=1
j=1
Xj'')Xj( )
Dii'Ot(Sn — Xj)
i,i'=1
XX^i,)XY„)
(1 — v)Dii'i"Ot(Sn — Xj + vXi )dv
Recalling that X_j = Y_j/√n, EY_j = 0, EX_j^{(i)}X_j^{(i')} = n⁻¹EY_j^{(i)}Y_j^{(i')} = n⁻¹δ_{ii'}, and that X_j and S_n - X_j are independent, one obtains

EΔψ_t(S_n) = E Σ_{i=1}^k D_{ii}ψ_t(S_n - X_1) + (1/√n) E Σ_{i,i'=1}^k Y_1^{(i')} ∫_0^1 D_{iii'}ψ_t(S_n - X_1 + vX_1) dv,   (26.3.3)

ES_n·∇ψ_t(S_n) = E Σ_{i=1}^k D_{ii}ψ_t(S_n - X_1) + (1/√n) E Σ_{i,i',i''=1}^k Y_1^{(i)}Y_1^{(i')}Y_1^{(i'')} ∫_0^1 (1-v) D_{ii'i''}ψ_t(S_n - X_1 + vX_1) dv.   (26.3.4)
Hence

ET_t h(S_n) = (1/√n) E[Σ_{i,i'=1}^k Y_1^{(i')} ∫_0^1 D_{iii'}ψ_t(S_n - X_1 + vX_1) dv - Σ_{i,i',i''=1}^k Y_1^{(i)}Y_1^{(i')}Y_1^{(i'')} ∫_0^1 (1-v) D_{ii'i''}ψ_t(S_n - X_1 + vX_1) dv].   (26.3.5)
One may then write

ET_t h(S_n) = E[E(··· | Y_1)],   (26.3.6)

where ··· is the quantity within square brackets in (26.3.5), i.e.,

E(T_t h(S_n) | Y_1) = (1/√n) Σ_{i,i'=1}^k Y_1^{(i')} ∫_0^1 E[D_{iii'}ψ_t(S_n - X_1 + vX_1) | Y_1] dv
- (1/√n) Σ_{i,i',i''=1}^k Y_1^{(i)}Y_1^{(i')}Y_1^{(i'')} ∫_0^1 (1-v) E[D_{ii'i''}ψ_t(S_n - X_1 + vX_1) | Y_1] dv.   (26.3.7)

By (26.2.2), the first term on the right side in (26.3.7) equals

-(1/√n) Σ_{i,i'=1}^k Y_1^{(i')} ∫_0^1 ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ E[∫_{R^k} h(e^{-s}(S_n - X_1) + e^{-s}vX_1 + √(1-e^{-2s})z)(-D_{iii'}φ(z)) dz | Y_1] ds dv

= -(1/√n) Σ_{i,i'=1}^k Y_1^{(i')} ∫_0^1 ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ [∫_{R^k}∫_{R^k} h(√((n-1)/n) e^{-s}x + e^{-s}vX_1 + √(1-e^{-2s})z) dQ_{(n-1)}(x)(-D_{iii'}φ(z)) dz] ds dv,   (26.3.8)

noting that the distribution of S_n - X_1 = (Y_2 + ··· + Y_n)/√n is that of √((n-1)/n) V, where V has distribution Q_{(n-1)}. Therefore, (26.3.8) is equal to

-(1/√n) Σ_{i,i'=1}^k Y_1^{(i')} ∫_0^1 ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ [∫_{R^k}∫_{R^k} h(√((n-1)/n) e^{-s}x + e^{-s}vX_1 + √(1-e^{-2s})z)(d(Q_{(n-1)}(x) - Φ(x)) + dΦ(x))(-D_{iii'}φ(z)) dz] ds dv.   (26.3.9)
Since the class of functions h = 1_C, where C ranges over all Borel convex subsets of R^k, is invariant under translation, and bC is convex if C is convex (bC = {bx : x ∈ C}, b > 0),

|∫_{R^k} h(√((n-1)/n) e^{-s}x + e^{-s}vX_1 + √(1-e^{-2s})z) d(Q_{(n-1)} - Φ)(x)| ≤ δ_{n-1}.   (26.3.10)

Similarly, the second term on the right in (26.3.7) equals

(1/√n) Σ_{i,i',i''=1}^k Y_1^{(i)}Y_1^{(i')}Y_1^{(i'')} ∫_0^1 (1-v) ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ [∫_{R^k}∫_{R^k} h(√((n-1)/n) e^{-s}x + e^{-s}vX_1 + √(1-e^{-2s})z)(d(Q_{(n-1)}(x) - Φ(x)) + dΦ(x))(-D_{ii'i''}φ(z)) dz] ds dv.   (26.3.11)

Again, the inner integral in (26.3.11) with respect to Q_{(n-1)} - Φ is estimated by (26.3.10). Therefore, using (26.2.3) for the remaining integration with respect to Φ in (26.3.8) and (26.3.11),
|ET_t h(S_n)| ≤ (1/√n) {Σ_{i,i'=1}^k E|Y_1^{(i')}| ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ [δ_{n-1} ∫_{R^k}|D_{iii'}φ(z)| dz + c₀ k e^{2s}(1-e^{-2s})] ds

+ Σ_{i,i',i''=1}^k E|Y_1^{(i)}Y_1^{(i')}Y_1^{(i'')}| ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ ∫_0^1 (1-v) dv [δ_{n-1} ∫_{R^k}|D_{ii'i''}φ(z)| dz + c₀ k e^{2s}(1-e^{-2s})] ds}.   (26.3.12)

Next, the first two terms on the right in (26.3.12) may be estimated by using

∫_{R^k}|D_{ii'i''}φ(z)| dz = E|(Z^{(i)})² - 1|·E|Z^{(i'')}| < 1 if i'' = i or i' (i ≠ i'),
∫_{R^k}|D_{iii}φ(z)| dz = E|(Z^{(i)})³ - 3Z^{(i)}| < 2 if i = i' = i'',
∫_{R^k}|D_{ii'i''}φ(z)| dz = E|Z^{(i)}Z^{(i')}Z^{(i'')}| < 1 if i, i', i'' are all distinct.   (26.3.13)
Finally, note that

e^{-s}/√(1-e^{-2s}) ≤ (2s)^{-1/2}   (s > 0),   (26.3.14)

so that

∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ ds ≤ ∫_t^∞ (2s)^{-3/2} ds = (2t)^{-1/2},   (26.3.15)

while ∫_t^∞ (e^{-s}/√(1-e^{-2s}))³ e^{2s}(1-e^{-2s}) ds = ∫_t^∞ e^{-s}(1-e^{-2s})^{-1/2} ds = arcsin(e^{-t}) ≤ π/2. Hence, using (26.3.12)-(26.3.15), together with the estimates

Σ_{i,i',i''=1}^k E|Y_1^{(i)}Y_1^{(i')}Y_1^{(i'')}| ≤ k^{3/2}ρ₃,   Σ_{i,i'=1}^k E|Y_1^{(i')}| ≤ k^{3/2}ρ₃,

one has

|ET_t h(S_n)| ≤ c₁ k^{3/2} ρ₃ δ_{n-1} (nt)^{-1/2} + c₂ k^{5/2} ρ₃ n^{-1/2}.   (26.3.16)
26.4 The smoothing inequality and the estimation of δ_n. Let H = {1_C : C ∈ C}, where C is the class of all Borel convex subsets of R^k. As before, h̄ = h - ∫ h dΦ. We also write G_b for the distribution of bW, if W has distribution G (b > 0). Recall that (see (26.1.15)) T_t h(x) = Eh(e^{-t}x + √(1-e^{-2t})Z), where Z has the standard normal distribution Φ = N(0, I_k), which we take to be independent of S_n. Then

ET_t h(S_n) = Eh(e^{-t}S_n + √(1-e^{-2t})Z)
= ∫_{R^k}∫_{R^k} h(e^{-t}x + √(1-e^{-2t})z) dQ_{(n)}(x) φ(z) dz
= ∫_{R^k} h d((Q_{(n)})_{e^{-t}} * Φ_{√(1-e^{-2t})})
= ∫_{R^k} h d[((Q_{(n)})_{e^{-t}} - Φ_{e^{-t}}) * Φ_{√(1-e^{-2t})}] + ∫_{R^k} h dΦ.   (26.4.1)

The introduction of the extra term Φ_{e^{-t}} * Φ_{√(1-e^{-2t})} = Φ in the last step is compensated by adding back ∫ h dΦ. Since the remaining integration is with respect to the difference between two probability measures, its value is unchanged if we replace h by h̄. Hence (noting T_t h̄ = T_t h - ∫ h dΦ)

ET_t h̄(S_n) = ∫_{R^k} h̄ d[((Q_{(n)})_{e^{-t}} - Φ_{e^{-t}}) * Φ_{√(1-e^{-2t})}].   (26.4.2)
Also, the class C is invariant under the multiplications C → bC, b > 0. Therefore

δ_n = sup_{h∈H} |Eh̄(S_n)| = sup_{h∈H} |∫ h̄ d(Q_{(n)} - Φ)| = sup_{h∈H} |∫ h̄ d[(Q_{(n)})_{e^{-t}} - Φ_{e^{-t}}]|.   (26.4.3)

Thus (26.4.2) is a perturbation (or smoothing) of the integral in (26.4.3) by convolution with Φ_{√(1-e^{-2t})}. If ε > 0 is a constant such that

Φ_{√(1-e^{-2t})}({|z| ≤ ε}) = 7/8,   (26.4.4)

then the smoothing inequality below applies, with μ = (Q_{(n)})_{e^{-t}}, ν = Φ_{e^{-t}}, K = Φ_{√(1-e^{-2t})}, f = h̄ (h = 1_C), α = 7/8, and ε as in (26.4.4).
Smoothing inequality. Let μ, ν be probability measures on R^k, and let K be a probability measure on R^k with K({x : |x| ≤ ε}) = α > 1/2. Then for every bounded measurable f one has

|∫_{R^k} f d(μ - ν)| ≤ (2α - 1)^{-1}[γ*(f : ε) + ω*_f(2ε : ν)],   (26.4.5)

where

f^ε(x) = sup{f(y) : |y - x| ≤ ε},   f_ε(x) = inf{f(y) : |y - x| ≤ ε},

γ(f : ε) = max{|∫_{R^k} f^ε d((μ - ν) * K)|, |∫_{R^k} f_ε d((μ - ν) * K)|},

γ*(f : ε) = sup_{y∈R^k} γ(f_y : ε),   f_y(x) = f(x + y),

ω_f(x : ε) = sup{|f(y) - f(x)| : |y - x| ≤ ε},   ω_f(ε : ν) = ∫ ω_f(x : ε) dν(x),

ω*_f(ε : ν) = sup_{y∈R^k} ω_{f_y}(ε : ν).
For a proof of the inequality (26.4.5), see Lemma 11.4. With h = 1_C one gets h^ε = 1_{C^ε}, h_ε = 1_{C^{-ε}}, where C^ε = {x : dist(x, C) ≤ ε} and C^{-ε} = {x : the open ball of radius ε and center x is contained in C} are both convex, so that

γ(h̄ : ε) ≤ sup_{h∈H} |ET_t h̄(S_n)|,

each of the two integrals defining γ(h̄ : ε) being of the form (26.4.2) with h = 1_{C^ε} or h = 1_{C^{-ε}}. Since C is invariant under translation, one then obtains

γ*(h̄ : ε) ≤ sup_{h∈H} |ET_t h̄(S_n)|.   (26.4.6)

Also, letting Z be standard normal N(0, I_k),

ω_h̄(2ε : Φ_{e^{-t}}) = P(e^{-t}Z ∈ (∂C)^{2ε}) = P(Z ∈ e^t(∂C)^{2ε}) ≤ c₃ k^{1/2} · 2ε e^t,   (26.4.7)
where c₃ > 0 is a constant (see Theorem 3.1). From (26.4.4) one gets

P(√(1-e^{-2t}) |Z| ≤ ε) = 7/8,

so that ε/√(1-e^{-2t}) = a_k, where a_k satisfies P(|Z| ≤ a_k) = 7/8. It is simple to check that a_k = O(√k) as k → ∞, and a_k ≤ c₄√k, so that

ε = a_k √(1-e^{-2t}) ≤ c₄√k √(1-e^{-2t}) ≤ c₄√k √(2t).   (26.4.8)

Using this estimate of ε in (26.4.7), one obtains

ω*_h̄(2ε : Φ_{e^{-t}}) ≤ c₅ k √t e^t.   (26.4.9)

The smoothing inequality now yields (use (26.4.3), (26.4.6), and (26.4.9) in (26.4.5))

δ_n ≤ (4/3) [sup_{h∈H} |ET_t h̄(S_n)| + c₅ k √t e^t].   (26.4.10)

Now use (26.3.16) in (26.4.10) to get

δ_n ≤ c₆ k^{3/2} ρ₃ δ_{n-1} (nt)^{-1/2} + c₇ k^{5/2} ρ₃ n^{-1/2} + c₈ k √t e^t.   (26.4.11)

By comparing the first and third terms on the right, an optimal order of t is obtained as

t = min{1, k^{1/2} ρ₃ δ_{n-1} n^{-1/2}}.

It follows that

δ_n ≤ c₉ k^{5/4} ρ₃^{1/2} δ_{n-1}^{1/2} n^{-1/4} + c₇ k^{5/2} ρ₃ n^{-1/2}.   (26.4.12)
Consider now the induction hypothesis: the inequality

δ_n ≤ c k^{5/2} ρ₃ n^{-1/2}   (26.4.13)
holds for some n ≥ 1 and an absolute constant c > 1 specified below. Note that (26.4.13) clearly holds for n ≤ c²k⁵ρ₃², since always δ_n ≤ 1. Since c²k⁵ρ₃² ≥ k⁸ (recall that ρ₃ ≥ k^{3/2}), assume that (26.4.13) holds for some n = n₀ ≥ k⁸. Then, under the induction hypothesis and using (26.4.12) together with n₀ ≥ (n₀+1)/2, one obtains

δ_{n₀+1} ≤ c₉ k^{5/4} ρ₃^{1/2} δ_{n₀}^{1/2} (n₀+1)^{-1/4} + c₇ k^{5/2} ρ₃ (n₀+1)^{-1/2}
≤ c₉ √c k^{5/2} ρ₃ (n₀(n₀+1))^{-1/4} + c₇ k^{5/2} ρ₃ (n₀+1)^{-1/2}
≤ (c₁₀ √c + c₇) k^{5/2} ρ₃ (n₀+1)^{-1/2}   (c₁₀ = 2^{1/4}c₉ ≤ 2c₉).   (26.4.14)

Now choose c to be the greater of 1 and the positive solution of c = c₁₀√c + c₇, to check that (26.4.13) holds for n = n₀ + 1. Hence (26.4.13) holds for all n. We have proved the following result.

Theorem 1. There exists an absolute constant c > 0 such that

δ_n ≤ c k^{5/2} ρ₃ n^{-1/2}.   (26.4.15)
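For illustration only (this small experiment is ours, not the book's), one can watch the convex-set discrepancy shrink with n in a Monte Carlo study: k = 2, Y with i.i.d. ±1 coordinates (mean zero, covariance I_2), and the test family restricted to a few centered balls, for which Φ({|z| ≤ r}) = 1 - e^{-r²/2}:

```python
import math, random

# Empirical discrepancy sup_r |P(|S_n| <= r) - Phi(|Z| <= r)| over a few radii,
# for S_n = (Y_1 + ... + Y_n)/sqrt(n) in R^2 with i.i.d. +/-1 coordinates.
random.seed(0)

def disc(n, trials=20000):
    radii = [0.5, 1.0, 1.5, 2.0]
    counts = [0] * len(radii)
    for _ in range(trials):
        s1 = sum(random.choice((-1, 1)) for _ in range(n)) / math.sqrt(n)
        s2 = sum(random.choice((-1, 1)) for _ in range(n)) / math.sqrt(n)
        rr = s1 * s1 + s2 * s2
        for i, r in enumerate(radii):
            if rr <= r * r:
                counts[i] += 1
    # P(|Z| <= r) = 1 - exp(-r^2/2) for the bivariate standard normal
    return max(abs(c / trials - (1.0 - math.exp(-r * r / 2.0)))
               for c, r in zip(counts, radii))

# up to Monte Carlo noise, the discrepancy decreases as n grows
assert disc(100) < disc(4) + 0.02
```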
26.5 The Non-Identically Distributed Case. For the general case considered in Götze (1991), the X_j (1 ≤ j ≤ n) are independent with zero means and Σ_{j=1}^n Cov X_j = I_k. Assume

β₃ ≡ Σ_{1≤j≤n} E‖X_j‖³ < ∞.   (26.5.1)
Let {X̃_j : 1 ≤ j ≤ n} be an independent copy of {X_j : 1 ≤ j ≤ n}. Then, writing S_n = Σ_{j=1}^n X_j as before,

E Σ_{i=1}^k D_{ii}ψ_t(S_n) = E Σ_{j=1}^n Σ_{i,i'=1}^k X̃_j^{(i)}X̃_j^{(i')} D_{ii'}ψ_t(S_n - X_j)
+ E Σ_{j=1}^n Σ_{i,i',i''=1}^k X̃_j^{(i)}X̃_j^{(i')}X_j^{(i'')} ∫_0^1 D_{ii'i''}ψ_t(S_n - X_j + vX_j) dv,   (26.5.2)

and

E[S_n·∇ψ_t(S_n)] = E Σ_{j=1}^n X_j·∇ψ_t(S_n)
= E Σ_{j=1}^n [X_j·∇ψ_t(S_n - X_j) + Σ_{i,i'=1}^k X_j^{(i)}X_j^{(i')} D_{ii'}ψ_t(S_n - X_j)
+ Σ_{i,i',i''=1}^k X_j^{(i)}X_j^{(i')}X_j^{(i'')} ∫_0^1 (1-v) D_{ii'i''}ψ_t(S_n - X_j + vX_j) dv].   (26.5.3)

Subtracting (26.5.3) from (26.5.2) and noting that E X_j·∇ψ_t(S_n - X_j) = 0, and that E X̃_j^{(i)}X̃_j^{(i')} D_{ii'}ψ_t(S_n - X_j) = E X_j^{(i)}X_j^{(i')} D_{ii'}ψ_t(S_n - X_j) by the independence of X_j and S_n - X_j, one obtains

ET_t h(S_n) = E Σ_{j=1}^n [Σ_{i,i',i''=1}^k X̃_j^{(i)}X̃_j^{(i')}X_j^{(i'')} ∫_0^1 D_{ii'i''}ψ_t(S_n - X_j + vX_j) dv
- Σ_{i,i',i''=1}^k X_j^{(i)}X_j^{(i')}X_j^{(i'')} ∫_0^1 (1-v) D_{ii'i''}ψ_t(S_n - X_j + vX_j) dv].   (26.5.4)
The estimation of the conditional expectation of the integrals ∫_0^1 in (26.5.4), given X_j, proceeds as in subsection 26.3 (with X_j in place of X_1). The only significant change is in the normalization in the argument of h (see (26.3.8)-(26.3.11)): writing N_j for the positive square root of the inverse of Cov(S_n - X_j),

E[h(e^{-s}(S_n - X_j) + e^{-s}vX_j + √(1-e^{-2s})z) | X_j]
= E[h(e^{-s}N_j⁻¹(N_j(S_n - X_j)) + e^{-s}vX_j + √(1-e^{-2s})z) | X_j]
= ∫_{R^k} h(e^{-s}N_j⁻¹x + e^{-s}vX_j + √(1-e^{-2s})z) dQ_{(n-1),j}(x)
= ∫_{R^k} h(e^{-s}N_j⁻¹(x + N_j e^{s}√(1-e^{-2s}) z) + e^{-s}vX_j) dQ_{(n-1),j}(x),   (26.5.5)

where Q_{(n)} denotes the distribution of S_n = Σ_j X_j, and Q_{(n-1),j} that of N_j(S_n - X_j), which has mean zero and covariance I_k. As in subsection 26.3, the last integration is divided into two parts: d(Q_{(n-1),j} - Φ)(x) + dΦ(x). Since the class of Borel convex sets is invariant under nonsingular affine transformations, the integral with respect to Q_{(n-1),j} - Φ is bounded by δ_{n-1}. For the integral with respect to Φ, we change variables x → y = x + A_j z, where A_j = e^{s}√(1-e^{-2s}) N_j. The estimation of the integral now proceeds as in (26.2.3)-(26.2.8), with the scalar a replaced by the matrix A_j. The effect of this is simply to change the sum a² Σ_{r,r'} z_r z_{r'} D_{rr'}φ(y - vaz) in (26.2.4) to

Σ_{r,r'=1}^k (A_j z)_r (A_j z)_{r'} D_{rr'}φ(y - vA_j z).

Arguing as in (26.2.3)-(26.2.8), one arrives at the upper bound for (26.5.5) given by

c₀ k ‖A_j‖² = c₀ k e^{2s}(1-e^{-2s}) ‖N_j‖² ≤ c₀' k e^{2s}(1-e^{-2s})(1 - β₃^{2/3})⁻¹,
using

‖N_j‖² = ‖(I_k - Cov X_j)^{-1/2}‖² = [λ_min(I_k - Cov X_j)]⁻¹,   (26.5.6)

together with

λ_min(I_k - Cov X_j) = inf_{|u|=1} u·(I_k - Cov X_j)u = inf_{|u|=1} (1 - E(u·X_j)²)
≥ 1 - E|X_j|² ≥ 1 - (E|X_j|³)^{2/3} ≥ 1 - β₃^{2/3},

and assuming

β₃ < 1.   (26.5.7)

Proceeding as in subsection 26.4, one arrives at the bound

δ_n ≤ c k^{5/2} β₃.   (26.5.8)

If one takes the absolute constant c > 1, then β₃ may be assumed to be smaller than or equal to c⁻¹k^{-5/2} (otherwise (26.5.8) holds trivially, since δ_n ≤ 1), so that (1 - β₃^{2/3})⁻¹ ≤ (1 - c^{-2/3}k^{-5/3})⁻¹ ≡ c'. The induction argument is similar.

Remark. If one defines

γ₃ ≡ Σ_{j=1}^n E(Σ_{i=1}^k |X_j^{(i)}|)³,   (26.5.9)

then

Σ_{j=1}^n Σ_{i,i',i''=1}^k E|X_j^{(i)}X_j^{(i')}X_j^{(i'')}| = γ₃.

Since γ₃ now replaces k^{3/2}β₃ in the computations, it follows that

δ_n ≤ c k γ₃.   (26.5.10)

Since γ₃ ≤ k^{3/2}β₃, (26.5.10) provides a better bound than (26.5.8) or (26.4.13).
Bibliography

Ball, K. (1993). The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom., 10(4):411-420.
Barbour, A. D. (1988). Stein's method and Poisson process convergence. In A Celebration of Applied Probability ( Journal of Applied Probability, Volume 25A), pages 175-184.
Bentkus, V. (2003). On the dependence of the Berry-Esseen bound on dimension. J. Statist. Plann. Inference, 113(2):385-402. Bhattacharya, R. (1982). On the functional central limit theorem and the law of the iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete, 60(2):185-201.
Bhattacharya, R. and Holmes, S. (2010). An exposition of Gotze's estimation of the rate of convergence in the multivariate central limit theorem. Technical report, Stanford University, Stanford, CA. http://arxiv.org/abs/1003.4254. Bhattacharya, R. and Waymire, E. C. (2009). Stochastic Processes with Applications. Classics Appl. Math. 61, SIAM, Philadelphia.
Diaconis, P. and Holmes, S., editors (2004). Stein's Method: Expository Lectures and Applications. IMS Lecture Notes Monogr. Ser. 46, Inst. Math. Statist., Beachwood, OH.
Götze, F. (1991). On the rate of convergence in the multivariate CLT. The Annals of Probability, 19:724-739.
Holmes, S. (2004). Stein's method for birth and death chains. In Stein's Method: Expository Lectures and Applications, IMS Lecture Notes Monogr. Ser. 46, Inst. Math. Statist., Beachwood, OH, pp. 45-68.
Raič, M. (2004). A multivariate CLT. Personal communication.
Rinott, Y. and Rotar, V. (1996). A multivariate CLT for local dependence with n^{-1/2} log n rate and applications to multivariate graph related statistics. J. Multivariate Anal., 56(2):333-350.
Stein, C. (1986). Approximate Computation of Expectations. Inst. Math. Statist., Beachwood, OH.
Appendix

A.1 RANDOM VECTORS AND INDEPENDENCE

A measure space is a triple (Ω, ℬ, μ), where Ω is a nonempty set, ℬ is a sigma-field of subsets of Ω, and μ is a measure defined on ℬ. A measure space (Ω, ℬ, P) is called a probability space if the measure P is a probability measure, that is, if P(Ω) = 1. Let (Ω, ℬ, P) be a probability space. A random vector X with values in R^k is a map on Ω into R^k satisfying

X⁻¹(A) ≡ {ω : X(ω) ∈ A} ∈ ℬ   (A.1.1)

for all A ∈ ℬ^k, where ℬ^k is the Borel sigma-field of R^k. When k = 1, such an X is also called a random variable. If X is an integrable random variable, the mean, or expectation, of X, denoted by EX [or E(X)], is defined by

EX = ∫_Ω X dP.   (A.1.2)

If X = (X_1, ..., X_k) is a random vector (with values in R^k) each of whose coordinates is integrable, then the mean, or expectation, EX of X is defined by

EX ≡ (EX_1, ..., EX_k).   (A.1.3)

If X is a square-integrable random variable, then the variance of X, denoted by var X [or var(X)], is defined by

var X = E(X - EX)².   (A.1.4)

Let X, Y be two random variables defined on (Ω, ℬ, P). If X, Y, and XY
are all integrable, one defines the covariance between X and Y, denoted cov(X, Y), by

cov(X, Y) ≡ E(X - EX)(Y - EY) = EXY - (EX)(EY).   (A.1.5)

If X = (X_1, ..., X_k) is a random vector (with values in R^k) such that cov(X_i, X_j) is defined for every pair of coordinates (X_i, X_j), then one defines the covariance matrix Cov(X) of X as the k × k matrix whose (i, j) element is cov(X_i, X_j). The distribution P_X of a random vector X (with values in R^k) is the induced probability measure P∘X⁻¹ on R^k, that is,

P_X(A) ≡ P(X⁻¹(A))   (A ∈ ℬ^k).   (A.1.6)

Since the mean and the covariance matrix of a random vector X depend only on its distribution, one also defines the mean and the covariance matrix of a probability measure Q on R^k as those of a (any) random vector having distribution Q. Random vectors X_1, ..., X_m (with values in R^k) defined on (Ω, ℬ, P) are independent if

P(X_1 ∈ A_1, X_2 ∈ A_2, ..., X_m ∈ A_m) = P(X_1 ∈ A_1) P(X_2 ∈ A_2) ··· P(X_m ∈ A_m)   (A.1.7)

for every m-tuple (A_1, ..., A_m) of Borel subsets of R^k. In other words, X_1, ..., X_m are independent if the induced measure P∘(X_1, ..., X_m)⁻¹ is a product measure. A sequence {X_n : n ≥ 1} of random vectors [defined on (Ω, ℬ, P)] is independent if every finite subfamily is so.
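The definitions (A.1.2)-(A.1.7) translate directly into code. The small sketch below (ours, not part of the text) checks the identity cov(X, Y) = EXY - (EX)(EY) and the near-vanishing of the covariance for independent samples:

```python
import random

# Empirical mean and covariance; cov(X, Y) = E XY - (E X)(E Y).
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(50000)]
ys = [random.gauss(0, 1) for _ in range(50000)]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# the two forms of the covariance agree up to rounding
assert abs(cov(xs, ys)
           - (mean([a * b for a, b in zip(xs, ys)]) - mean(xs) * mean(ys))) < 1e-9
# for independent samples the covariance is near zero (O(n^{-1/2}))
assert abs(cov(xs, ys)) < 0.02
```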
A.2 FUNCTIONS OF BOUNDED VARIATION AND DISTRIBUTION FUNCTIONS

Let μ be a finite signed measure on R^k. The distribution function F_μ of μ is the real-valued function on R^k defined by

F_μ(x) = μ((-∞, x])   (x ∈ R^k),   (A.2.1)

where

(-∞, x] = (-∞, x_1] × (-∞, x_2] × ··· × (-∞, x_k]   [x = (x_1, ..., x_k) ∈ R^k].   (A.2.2)
It is simple to check that F_μ is right continuous. For a random vector X defined on some probability space (Ω, ℬ, P), the distribution function of X is merely the distribution function of its distribution P_X. The distribution function F_μ completely determines the (finite) signed measure μ. To see this, consider the class 𝒞 of all rectangles of the form

(a, b] = (a_1, b_1] × ··· × (a_k, b_k]   (a_i ≤ b_i, 1 ≤ i ≤ k).   (A.2.3)

Also define the difference operators Δ¹_{h_1}, ..., Δᵏ_{h_k} by

Δ¹_{h_1}F(x) = F(x_1 + h_1, x_2, ..., x_k) - F(x_1 - h_1, x_2, ..., x_k),
Δ²_{h_2}F(x) = F(x_1, x_2 + h_2, x_3, ..., x_k) - F(x_1, x_2 - h_2, x_3, ..., x_k),
...
Δᵏ_{h_k}F(x) = F(x_1, x_2, ..., x_{k-1}, x_k + h_k) - F(x_1, x_2, ..., x_{k-1}, x_k - h_k)
[x = (x_1, ..., x_k) ∈ R^k],   (A.2.4)

where h_1, h_2, ..., h_k are positive numbers, and F is an arbitrary real-valued function on R^k. The difference operators are associative and commutative, and one can define the operator Δ_h by

Δ_h = Δ¹_{h_1} ··· Δᵏ_{h_k}   [h = (h_1, ..., h_k), h_i > 0 for i = 1, ..., k].   (A.2.5)

If k = 1, we shall write Δ_h for the difference operator. One can also show† that for every (a, b] ∈ 𝒞,

μ((a, b]) = Δ_h F_μ(x),   (A.2.6)

where

h = ½(b - a),   x = ½(a + b).   (A.2.7)

The class ℛ of all finite disjoint unions of sets in 𝒞 is a ring over which μ is determined by (A.2.6). Since the sigma-ring generated by ℛ is ℬ^k, the uniqueness of the Caratheodory extension implies that μ on ℬ^k is determined by μ on ℛ (and, hence, by the distribution function F_μ). One may also show by an induction argument‡ that

Δ_h F(x) = Σ ± F(x_1 + ε_1 h_1, x_2 + ε_2 h_2, ..., x_k + ε_k h_k),   (A.2.8)

where the summation is over all k-tuples (ε_1, ε_2, ..., ε_k), each ε_i being either +1 or -1. The sign of a summand in (A.2.8) is plus or minus depending on whether the number of negative ε's is even or odd. Now let F be an arbitrary real-valued function on an open set U. Define a set function μ_F on the class 𝒞_U of all those sets in 𝒞 that are contained in U by

μ_F((a, b]) ≡ Δ_h F(x),   (A.2.9)

where x and h are given by (A.2.7). One can check that μ_F is finitely additive on 𝒞_U. The function F is said to be of bounded variation on an open set U if

sup Σ_j |μ_F(I_j)|   (A.2.10)

is finite, where the supremum is over all finite collections {I_1, I_2, ...} of pairwise disjoint sets in 𝒞 such that I_j ⊂ U for all j. The expression (A.2.10) is called the variation of F on U. The following theorem is proved in Saks [1] (Theorem 6.2, p. 68).

THEOREM A.2.1. Let F be a right continuous function of bounded variation on a nonempty open set U. There exists a unique finite signed measure on U that agrees with μ_F on the class 𝒞_U of all sets in 𝒞 contained in U.

It may be checked that the variation on U of a right continuous function F of bounded variation (on U) coincides with the variation norm of the signed measure whose existence is asserted in Theorem A.2.1. A function F is said to be absolutely continuous on an open set U if given ε > 0 there exists δ > 0 such that

Σ_j |μ_F(I_j)| < ε   (A.2.11)

for all finite collections {I_1, ...} of pairwise disjoint rectangles I_j ∈ 𝒞_U satisfying

Σ_j λ_k(I_j) < δ,   (A.2.12)

†See Cramér [4], pp. 78-80.   ‡See Halmos [1], p. 54.
where λ_k denotes the Lebesgue measure on R^k. If F is absolutely continuous on a bounded open set U, then it may be shown that F is of bounded variation on U.†

THEOREM A.2.2. Let F be a right continuous function of bounded variation on an open set U ⊂ R^k. Let μ_F be the (signed) measure on U defined by (A.2.9) (and Theorem A.2.1). Suppose that on U the successive derivatives D_kF, D_{k-1}D_kF, ..., D_1···D_kF exist and are continuous. Then F is absolutely continuous on U and one has

μ_F(A) = ∫_A (D_1···D_kF)(x) dx   (A.2.13)

for every Borel subset A of U. Also,

lim_{h_1↓0, ..., h_k↓0} (2^k h_1···h_k)⁻¹ Δ_h F = D_1 D_2 ··· D_k F   (A.2.14)

on U.

Proof. Let the closed rectangle [a, b] be contained in U. Let h and x be defined by (A.2.7). Then

μ_F((a, b]) = Δ_h F(x) = Δ¹_{h_1} ··· Δᵏ_{h_k} F(x)
= Δ¹_{h_1} ··· Δ^{k-1}_{h_{k-1}} [F(x_1, ..., x_{k-1}, x_k + h_k) - F(x_1, ..., x_{k-1}, x_k - h_k)]
= Δ¹_{h_1} ··· Δ^{k-1}_{h_{k-1}} ∫_{x_k - h_k}^{x_k + h_k} (D_kF)(x_1, ..., x_{k-1}, y_k) dy_k   (A.2.15)

by the fundamental theorem of integral calculus. Since the integrand has a continuous derivative with respect to x_{k-1},

μ_F((a, b]) = Δ¹_{h_1} ··· Δ^{k-2}_{h_{k-2}} ∫_{x_k - h_k}^{x_k + h_k} [(D_kF)(x_1, ..., x_{k-1} + h_{k-1}, y_k) - (D_kF)(x_1, ..., x_{k-1} - h_{k-1}, y_k)] dy_k
= Δ¹_{h_1} ··· Δ^{k-2}_{h_{k-2}} ∫_{x_k - h_k}^{x_k + h_k} ∫_{x_{k-1} - h_{k-1}}^{x_{k-1} + h_{k-1}} (D_{k-1}D_kF)(x_1, ..., y_{k-1}, y_k) dy_{k-1} dy_k.   (A.2.16)

†Saks [1], p. 93.
Proceeding in this manner, we arrive at (A.2.13) for A = (a, b], remembering that by Fubini's theorem the iterated integral as obtained by the above procedure is equal to the integral on (a, b] with respect to Lebesgue measure on R^k. We next show that D_1···D_kF is integrable on U. For if this is false, then for every integer n ≥ 1 there exist an integer m_n and pairwise disjoint rectangles (a¹, b¹], ..., (a^{m_n}, b^{m_n}] such that [aⁱ, bⁱ] ⊂ U, i = 1, ..., m_n, and

Σ_{i=1}^{m_n} |∫_{(aⁱ, bⁱ]} (D_1···D_kF)(x) dx| > n.

By (A.2.13), which we have proved for sets like (aⁱ, bⁱ], one then has

Σ_{i=1}^{m_n} |μ_F((aⁱ, bⁱ])| > n

for all n, contradicting the hypothesis that F is of bounded variation on U. Thus we have two finite signed measures on U, defined by

A → ∫_A (D_1···D_kF)(x) dx,   A → μ_F(A),

that coincide on the class of all rectangles (a, b] such that [a, b] ⊂ U. Therefore the two signed measures on U are equal, and (A.2.13) is established. To prove (A.2.14), let x ∈ U. Choose h = (h_1, ..., h_k) such that h_i > 0 for all i and [x - h, x + h] ⊂ U. Then by (A.2.13) one has

Δ_h F(x) = μ_F((x - h, x + h]) = ∫_{(x-h, x+h]} (D_1···D_kF)(y) dy.

From this and the continuity of D_1···D_kF on U, the relation (A.2.14) follows. Q.E.D.

It follows from the definition that the sum of a finite number of functions of bounded variation, or absolutely continuous, on an open set U is itself of bounded variation, or absolutely continuous, on U. Our next result establishes the bounded variation of a product of a special set of functions of bounded variation. We say that a function g on R^k (into R¹) is Schwartz if it is infinitely differentiable and if for every nonnegative integral vector α and every positive integer m one has

sup_{x∈R^k} ‖x‖^m |(D^α g)(x)| < ∞.   (A.2.17)
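Returning to Theorem A.2.2, the limit relation (A.2.14) is easy to check numerically. The sketch below (ours, not part of the text) takes k = 2 and a smooth F, and watches the normalized difference quotient approach D_1D_2F as h ↓ 0:

```python
import math

# (A.2.14) for F(x1, x2) = sin(x1) * (x2 + x2^3), D_1 D_2 F = cos(x1) * (1 + 3 x2^2).

def F(x1, x2):
    return math.sin(x1) * (x2 + x2 ** 3)

def delta_quotient(x1, x2, h1, h2):
    num = (F(x1 + h1, x2 + h2) - F(x1 + h1, x2 - h2)
           - F(x1 - h1, x2 + h2) + F(x1 - h1, x2 - h2))
    return num / (4.0 * h1 * h2)   # 2^k h_1 h_2 with k = 2

x1, x2 = 0.4, -0.8
exact = math.cos(x1) * (1.0 + 3.0 * x2 ** 2)
errs = [abs(delta_quotient(x1, x2, h, h) - exact) for h in (0.1, 0.01, 0.001)]
assert errs[0] > errs[1] > errs[2]   # O(h^2) convergence
assert errs[2] < 1e-4
```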
THEOREM A.2.3. Let F 1 ,..., F p be right continuous, real-valued functions on R', each periodic with period one; in (0, 1) each F, is differentiable (1 ( i < p) and has a bounded derivative. Suppose that 4, is a real-valued,
Appendix
291
bounded, measurable function on Rk- P and that g is a Schwartz function on R'`, k) p, and let G (x) =
f ... f
^lYp+l^ ^Yk)gl X l , + xp^Yp+l^ ^Yk)^kp+l
X,+1
-
00
00
I x=(xI,...,xk) E R
k ],
(A.2.18)
if k > p, and G = g if k = p. Then the function F(x)=F l (x l )• • • F;(xp )G(x)
(xER k ),
(A.2.19)
is of bounded variation on R k . Proof Consider an arbitrary function Ho on R. We first show that phi... L F .. . F I( x l) p( xp) Ho( x I_ .,xp) i - I F,(x;, - h,,) . .. F: (x, - h i) [ 0!^ . .. , Ho(x') ] X {phi,F. (x. )]... [ Ohj,-,F. (x. )],
(A.2.20)
where X'=(x ,...,Xp), X^^=X...,X^=x x^ , =x+h^ . ..... x, =x1 + and the summation is over all partitions of (1,2,... ' p^ into two disjoint subsets {i 1 ,...,i3 ), {j 1 ,...,jp _ J }, 0<s' p; when one of these subsets is empty, the corresponding factor drops out. If p = 1, then
i F1 (x)Ho (x)=FI (x+h)Ho(x+h)- Fl (x- h)Ho (x-h) =F1 (x-h)[Ho(x+h)-H o (x-h)] +H0(x+h)[FI (x+h)-FI (x-h)] = F I (x - h)O 1Ho (x) + Ho (x + h)PPF1 (x), which proves (A.2.20) for p = 1. Assume, as an induction hypothesis, that (A.2.20) holds for some p. Then ^'i...
Fl(x1)... jCp+l( xp+l) Ho(xl , ... , xp+1)
... F;( .. pp [ F 1( x l) xp)' Ap+11( FF+l( xp+1) Ho( x 1 ,...,xp+1))] =d1.... dp
[Fl(xl)
...
F,(xp){Fp+,(xp+1 - hp+l) 4➢+1Ha(xl , ... , xp+l)
+Ho(xl,...,Xp,Xp+l+hp+Oilh,-Fp+l(xp+l)1 J
_^^^... poFl(xl) .. F. (xp)Ho'(xI — xp+1) .
+
l...
AoF'1(xl)
,
...
,
F.(xp)Ho(xl , ... , xp+1) , (A.2.21)
Bounded Variation
292
where Qx l ,...,xp+l )= Fp+1(xp+1
—
h,,+l)AP+'HO(x, ... xp+I) ,
,
,
Ho(xl,...,xp+l)= Ho(xl, ... ,xp , xp+I +hp+1 )A +'F,+1(xp+l). (A.2.22)
Now apply the induction hypothesis to each of the two summands of the last expression in (A.2.21) and then substitute from (A.2.22) to see that (A.2.20) holds with p replaced by p+ 1. This completes the proof of (A.2.20) for all p. Looking at F given by (A.2.19), one sees that del... dkkF(x)=4h1... AhF,(xI)... F,(xp )Ho (x l ,...,xk ), (A.2.23) where Ho(xI,...,xk)—Ap+11...
dkG(x l ,...,xk ).
(A.2.24)
By (A.2.20) we obtain 2Fxi .—hi .)... Fi, (xi, —h i, )
OF(x)=
X (d^4,. .. A^ ,j^p+1 • dk G ( x' ) J X r^^51F( xi ),... {'-
where
x' = ( 1,..., k),
x. ^,
...,x' ^
'/, _ ,
= i, x. x' p+ = l-x p+
]
_
(A.2.25)
x'k— = kx 1x' _ 1
l,...,
,
x.i , + h^., ... , xJ' , = x^. , + hh. , and the summation is over all partitions of
(1,2,...,p) into two disjoint subsets {i 1 ,...,i}, {j l ,...,jp - f }, 0<s< p. For the sake of simplicity consider the summand on the right side of (A.2.25) corresponding to i 1 = 1, . .. , is = s. Then by the definition (A.2.18) of G, one
has
.. oS P+y ... okG( x ') = rxi+l+h0+l... fXk+Ilk
xo+l
-
hr+l
k
Xk-hk
... Dag(x1 ,. . . xs xf+l+hs+1 ,...,xp ,
-
x1+h1 ...
Jfxl-h, ...
Xk
,
+hp , Yp+11 ..., YJdYk ... dyp+I
/'x,+h, ! xF+l +h,+l
J
x,-h,
+ hk ^G(Yp+1,...,Yk)(D, • •ass}(y1 ...,y^^xJ+ ► ,
xk - h k
+h,+1 , ... , xp+hp , Yp+l^
,Yk )dyk
...
4p+1 43'5
...
dy1.
(A .2.26)
293
Appendix
Let the derivative of F, on (0, 1) be bounded above in magnitude by b,, and let c, denote the magnitude of the jump of F, at 0 (c, may be zero), 1 < i < p. Assume that 2h,
F1(xs+l)]
...
[ AV.( xx)]I
< IIFlllw ... IlFll.(E+c3+1).
x 2 P -9 11fll..
... hP .. (E+cq )bq+1 ... bp h q+l
f
I(D1... Dsg)(Y1,...
(x -h
;
;
;
... yS xs+1 +hs+1 ... XP+hP yp+l.... Yk)l dyk ,
,
,
,
,
...
,
dyp+l va
...
dvl.
(A.2.27)
Here II f II„ denotes the supremum of I fl. The sum of the left side of (A.2.27) over an arbitrary partitioning of the (x,, ... , x s , xp + 1 , ... , xP )-space into rectangles of the form x — h < y1 x + h for i —1, ... , s, p + I , ... , k, is therefore bounded above by ;
IIF11I^
...
IlFsll0(E+c,+,)
.
;
;
;
(E+cq)bq+l •.. bphq+1.•
..
x f,+k-
j(D1
...
D,g)(YI ... ,
,
Ys x,+l+ha+l ,...,xp
Yp+1+• , Yk)I dYk" < C'hq+l...
. hP2P-9II^GII00
,
+hpr
'p+I 's ... dY1
hP( l +Ixs+Ilk+l+ ... +Ix,lk+l)-I
(E < 1),
since g is a Schwartz function. Now the lim sup (as h 1 0) of the sum of the last expression over all pairwise disjoint intervals (x i — hi , xi + h; ] in each variable i = s + 1, ...,p, as prescribed preceding the inequality (A.2.27), is 00
c' n,+300
00
... 2 f (1+ I n s+ll k+l Rv a no--co
+... +Inglk+l+IXq+llk+l+... +lxp l k+l ) -I dx
q+
1
...
dxp
where each summation is over the set of all integers. The variation of F on R k is given by the expression (A.2.10) (with U=R k ) and is also obtained in the limit by computing this sum over successively finer partitions
294
Absolutely Continuous, Singular, and Discrete Measures
> 1) of R k into rectangles in 6S. For a typical rectangle I, µF (I) is (II :j> given by (A.2.23) and (A.2.25). There are a finite (bounded) number of terms on the right side of (A.2.25), and we have shown above that the limit of the sum of the absolute values of a typical term over successively finer partitions is finite. This has been accomplished by further splitting a partition into subpartitions; in a subpartition (xi — h i , xi + hi ] contains one integer point for a specified set of i's, but none for a disjoint set of i's. This was necessary to take care of the possible jumps of the F,'s at integer points. Q.E.D.
A.3 ABSOLUTELY CONTINUOUS, SINGULAR, AND
DISCRETE PROBABILITY MEASURES We recall that a finite signed measure it on a sigma-fieldJl of subsets of a set SZ is said to be absolutely continuous with respect to a sigma finite measure v (also defined on J') if every v-null set is also a u-null set. The Radon—Nikodym theorem asserts that in this case there exists a vintegrable function f, called the Radon—Nikodym derivative of t with respect to v, such that
µ(B)= fB fdv
(B E eJ3).
(A.3.1)
A well-known characterization of absolute continuity is the following: THEOREM A.3.1. A finite signed measure (on) is absolutely con-
tinuous with respect to a sigma-finite measure v (on) if and only if for every positive a there exists a S >0 such that t(B)I <e for every B (in GJ3 ) for which v(B)<6. For a proof see Halmos [1], Theorem B, pp. 125, 126. The opposite of absolute continuity is singularity. A finite signed measure µ is singular with respect to a sigma finite measure v if there exists a set B 0 E 61 such that µ(Bo)=0, v(Sl\B 0)=0. The so-called Lebesgue decomposition theorem says that if µ is a finite signed measure and v a sigma-finite measure (on i ), then there exists a unique decomposition I —µi+µr
(A.3.2)
where It^ is absolutely continuous and z 2 is singular (both with respect to v).
We now specialize to (signed) measures on R k . Let p be a finite (signed) measure on R k . Let (A.3.2) denote the Lebesgue decomposition of with
µ
295
Appendix
respect to Lebesgue measure. It is easy to see that t 2 may be decomposed uniquely as (A.3.3) µ2 - µ s + µa, where µ4 is (purely) discrete and µ3 has no point masses; that is, there exists a countable set A = {x 1 ,x 2 ,...) such that Iµ4l(A)=I,i41(R") µ 3((x })=0 ,
forallxER".
The (signed) measures µ3 and p. are called the singular and discrete components of µ, respectively. Thus µ is decomposed uniquely as =
µi
+
µ3
+
µ 4,
(A.3.4)
the sum of its absolutely continuous, singular, and discrete components. The (signed) measure µ is simply called absolutely continuous or singular or discrete if the other components vanish. When µ is absolutely continuous, its Radon—Nikodym derivative (with respect to Lebesgue measure) is simply referred to as its density. Theorem A.3.1 and the definition of absolute continuity of a function on R k lead to the following proposition: a finite signed measure on R k is absolutely continuous if and only if its distribution function F µ is absolutely continuous. Let P be a probability measure on R"`. If P is absolutely continuous, then it follows from the Riemann—Lebesgue lemma [Theorem 4.1(iii)] that its characteristic function P satisfies lim o IP(t)I=O.
Ilt
(A.3.5)
It follows that if P has a nonzero absolutely continuous component, then lim II+II-
IP (t)I < I.
(A.3.6)
If P is discrete, then P is a uniformly convergent trigonometric series. A fundamental theorem of Bohr (see Bohr [1J, pp. 80, 81) implies that in this case lim I P (t)I =1. (A.3.7) II111-For singular probability measures the situation is a little more complex. It is known (see Esseen [1J, pp. 28, 29) that there are singular probability measures P 1 , P2 such that lim IIfII—oo
IP I (t)I=0,
lirP IP2 (t)I=1. II:II-+oo
(A.3.8)
Euler-Maclaurin Summation Formula
296
By taking convex combinations of P 1 , P2 . one shows that for each a E[0,1] there exists a singular probability measure P such that Ik1lm o
IP(t)1=a.
(A.3.9)
A.4 THE EULER-MACLAURIN SUM FORMULA FOR FUNCTIONS OF SEVERAL VARIABLES
Let f be a function of the real variable x on an interval 0 < x < n (n integral), with continuous derivative f'. Then a closed expression for the sum . 0 f(j) may be given as follows: f(j)= f f(x)dx+z(f(0)+f(n))+ f n (x—[x]—Z)f'(x)dx j=0
where [x] denotes the largest integer less than or equal to x. This is the Euler—Maclaurin sum formula in its simplest form. In this section we obtain a general summation formula for functions of several variables. The results of this section are used in Section 23 to get asymptotic expansions for distribution functions of normalized sums of independent, lattice random vectors.
The Functions Si (j > 0). Let Sj (j = 0, 1,2,...) be a sequence of real-valued periodic functions on R of period one, possessing the following properties: (i) For j > 0, Si is differentiable at all nonintegral points and Si+ ^(x) = S.(x) (at all nonintegral x), (ii) So (x) m 1, S, is right continuous and Sj is continuous for j > 2. (A.4.1) Such a sequence is uniquely determined by the above properties and plays a fundamental role in the summation formula. To see this, write Si (0) = Bj /(j!) and observe that (i) leads to S 1 (x)=x+B 1 , S (x)-
1
ji
S2 (x) = Zx 2 +B 1 x+B 2 /2!,...
B x j + I XX
1
1! (j— 1)!
...+
j B - _ I I B (j)Xl -r ,
j!
j!
r-p r
(0<x(1,j> 1) (A.4.2)
297
Appendix
The constants $B_j$ are determined by the property (A.4.1). In fact, $S_j(0) = S_j(1)$ for $j \ge 2$, which yields
$$1 + \binom{j}{1}B_1 + \binom{j}{2}B_2 + \dots + \binom{j}{j-1}B_{j-1} = 0 \qquad (j = 2, 3, \dots). \tag{A.4.3}$$
The sequence of constants $B_j$ is recursively defined by the relation (A.4.3), thus completely determining the sequence of functions $S_j$ in the interval $0 < x < 1$. The continuity assumption determines their values at integral points. The numbers $B_j$ defined by (A.4.3) are called Bernoulli numbers, and the polynomial
$$B_j(x) = \sum_{r=0}^{j}\binom{j}{r}B_r\,x^{j-r} \tag{A.4.4}$$
is called the $j$th Bernoulli polynomial. Clearly, $S_j(x) = B_j(x)/j!$ for $0 \le x < 1$. Since the sequence $\{(-1)^j S_j(-x) : j \ge 0\}$ has the properties (A.4.1), excepting right continuity of $(-1)S_1(-x)$, it follows from uniqueness that
$$S_j(-x) = (-1)^j S_j(x) \qquad \text{(for all $x$ if $j \ne 1$, for nonintegral $x$ if $j = 1$)}. \tag{A.4.5}$$
The functions $S_j$ are thus even or odd according as $j$ is even or odd. In particular,
$$S_j(0) = \frac{B_j}{j!} = 0 \qquad \text{for $j$ odd, } j \ge 3. \tag{A.4.6}$$
The first few Bernoulli numbers are
$$B_0 = 1, \quad B_1 = -\tfrac{1}{2}, \quad B_2 = \tfrac{1}{6}, \quad B_3 = 0, \quad B_4 = -\tfrac{1}{30}, \quad B_5 = 0. \tag{A.4.7}$$
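The recursion (A.4.3) is easy to run mechanically; the following sketch (an illustration added here, using exact rational arithmetic) reproduces the values just listed:

```python
from fractions import Fraction
from math import comb

# Bernoulli numbers via the recursion (A.4.3):
#   sum_{r=0}^{j-1} C(j, r) B_r = 0   for j = 2, 3, ...,
# solved for B_{j-1} at each step, starting from B_0 = 1.
B = [Fraction(1)]
for j in range(2, 8):
    s = sum(comb(j, r) * B[r] for r in range(j - 1))
    B.append(-s / comb(j, j - 1))

assert B[1] == Fraction(-1, 2)
assert B[2] == Fraction(1, 6)
assert B[3] == 0 and B[5] == 0
assert B[4] == Fraction(-1, 30)
assert B[6] == Fraction(1, 42)
```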
Therefore
$$S_1(x) = x - \tfrac{1}{2}, \qquad S_2(x) = \tfrac{1}{2}\left(x^2 - x + \tfrac{1}{6}\right), \qquad S_3(x) = \tfrac{1}{6}\left(x^3 - \tfrac{3}{2}x^2 + \tfrac{1}{2}x\right), \dots \qquad (0 \le x < 1), \tag{A.4.8}$$
and so on. The periodic functions $S_j$ have the following Fourier series expansions when $x$ is not an integer:
$$S_j(x) = \begin{cases} (-1)^{j/2-1}\,2\displaystyle\sum_{n=1}^{\infty}\frac{\cos(2n\pi x)}{(2n\pi)^j}, & j \text{ even, } j>0,\\[2mm] (-1)^{(j+1)/2}\,2\displaystyle\sum_{n=1}^{\infty}\frac{\sin(2n\pi x)}{(2n\pi)^j}, & j \text{ odd.}\end{cases} \tag{A.4.9}$$
This may be seen as follows. Let $u_j$ denote the function represented by the Fourier series ($j \ge 1$). It can be checked directly that $u_1$ is the Fourier series of $S_1$ and that $u_{j+1}' = u_j$ for $j \ge 1$. Thus $S_j = u_j$ for all $j \ge 2$, and $S_1(x) = u_1(x)$ for all nonintegral $x$.

THEOREM A.4.1. Let $f$ be a real-valued function on $R^1$ having $r$ continuous derivatives, $r \ge 1$, and let
$$\int_{-\infty}^{\infty} |D^j f|\,dx < \infty \tag{A.4.10}$$
for $j \le r$. Then for every Borel set $A$,
$$\sum_{n \in A \cap Z} f(n) = \int_A dF_r, \tag{A.4.11}$$
where
$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad F_r(x) = \sum_{j=0}^{r}(-1)^j S_j(x)(D^j F)(x) + (-1)^{r+1}\int_{-\infty}^{x} S_r(t)(D^{r+1}F)(t)\,dt. \tag{A.4.12}$$
Proof. It is clear from the definition of $F_r$ that it is a function of bounded variation. Also, each of the functions $S_j(x)$ is absolutely continuous, with a continuous derivative, in each interval $n < x < n+1$.
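To make (A.4.11)–(A.4.12) concrete, here is a numerical check (added for illustration) of the case $r=1$, $A=(-\infty,x]$, for which the theorem reads $\sum_{n\le x} f(n) = F(x) - S_1(x)f(x) + \int_{-\infty}^{x} S_1(t)f'(t)\,dt$; the standard normal density is an arbitrary choice of test function.

```python
import math

# Numerical check of (A.4.11)-(A.4.12) with r = 1 and A = (-inf, x]:
#   sum_{n <= x} f(n) = F(x) - S1(x) f(x) + int_{-inf}^x S1(t) f'(t) dt.

def f(t):  # standard normal density (arbitrary test function)
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def F(t):  # its antiderivative
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def S1(t):  # periodic first Bernoulli function, S1(t) = t - [t] - 1/2
    return t - math.floor(t) - 0.5

x = 2.5
lhs = sum(f(n) for n in range(-20, int(math.floor(x)) + 1))

# Midpoint rule on panels aligned with the integers, where S1 is smooth;
# the tail below -20 is negligible for this f.
a, panels_per_unit = -20.0, 2000
h = 1.0 / panels_per_unit
N = int(round((x - a) * panels_per_unit))
integral = 0.0
for i in range(N):
    t = a + (i + 0.5) * h
    integral += S1(t) * (-t * f(t)) * h      # f'(t) = -t f(t)

rhs = F(x) - S1(x) * f(x) + integral
assert abs(lhs - rhs) < 1e-5
```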
Generalization to Functions on $R^k$. Let $\mathcal{S}$ denote the Schwartz space on $R^k$; that is, $f \in \mathcal{S}$ if and only if $f$ is infinitely differentiable and
$$\sup_{x \in R^k} |x^{\beta}(D^{\alpha}f)(x)| < \infty \tag{A.4.13}$$
for all pairs of nonnegative integral vectors $\alpha, \beta$. Our first step is to construct functions on $R^k$ analogous to (A.4.12). Write
$$S_{\alpha}(x) = S_{\alpha_1}(x_1)S_{\alpha_2}(x_2)\cdots S_{\alpha_k}(x_k) \qquad [\alpha=(\alpha_1,\dots,\alpha_k) \text{ a nonnegative integral vector},\ x=(x_1,\dots,x_k)], \tag{A.4.14}$$
and for an integrable $f$ write
$$F(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_k} f(y)\,dy \qquad [f \in \mathcal{S},\ x=(x_1,\dots,x_k) \in R^k]. \tag{A.4.15}$$
For a function $f$ that is Schwartz in the $j$th coordinate $x_j$, let the operator $I_{r,j}$ be defined by
$$I_{r,j}(F)(x) = \int_{-\infty}^{x_j} S_r(y_j)(D_j^{r+1}F)(x_1,\dots,x_{j-1},y_j,x_{j+1},\dots,x_k)\,dy_j = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_k} S_r(y_j)(D_j^{r}f)(y)\,dy \qquad (x \in R^k), \tag{A.4.16}$$
where $D_j = \partial/\partial x_j$. By Fubini's theorem, the operators $I_{r,j_1},\dots,I_{r,j_p}$ commute if $f$ is Schwartz in $x_{j_1},\dots,x_{j_p}$, and
$$I_{r,j_1}\cdots I_{r,j_p}(F)(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_k} S_r(y_{j_1})\cdots S_r(y_{j_p})(D_{j_1}^{r}\cdots D_{j_p}^{r}f)(y)\,dy. \tag{A.4.17}$$
From Lemma A.4.4 it follows that if $f$ is Schwartz in $x_{s_1},\dots,x_{s_q},x_{j_1},\dots,x_{j_p}$, then
$$D_{s_1}\cdots D_{s_q}\,I_{r,j_1}\cdots I_{r,j_p}(F)(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_k} S_r(y_{j_1})\cdots S_r(y_{j_p})(D_{s_1}\cdots D_{s_q}D_{j_1}^{r}\cdots D_{j_p}^{r}f)(y)\,dy, \tag{A.4.18}$$
where the sets $\{s_1,\dots,s_q\}$ and $\{j_1,\dots,j_p\}$ are disjoint.
Theorems A.4.2 and A.4.3 below are the main results of this appendix.

THEOREM A.4.2. Let $f$ be a Schwartz function on $R^k$ and let
$$F_r(x) = \prod_{j=1}^{k}\left\{1 - S_1(x_j)D_j + \dots + (-1)^r S_r(x_j)D_j^{r} + (-1)^{r+1}I_{r,j}\right\}(F)(x), \tag{A.4.19}$$
where $F$ is defined by (A.4.15). Then $F_r$ is a well-defined function of bounded variation, and for any Borel set $A$,
$$\sum_{n \in A \cap Z^k} f(n) = \int_A dF_r.$$
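For a function that factorizes across coordinates, the product of operators in (A.4.19) factorizes into one-dimensional corrections. The sketch below (an added illustration, with $k=2$, $r=1$, and standard normal factors as an arbitrary choice) checks this numerically for $A = (-\infty,x_1]\times(-\infty,x_2]$.

```python
import math

# For f(x1, x2) = phi(x1) phi(x2), the operators in (A.4.19) act
# coordinate-wise, so F_1(x) factorizes as G(x1) G(x2), where G is the
# one-dimensional corrected distribution function from (A.4.12) with r = 1.

def phi(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def S1(t):
    return t - math.floor(t) - 0.5

def G(u, a=-20.0, panels_per_unit=2000):
    # one-dimensional F_1(u) = Phi(u) - S1(u) phi(u) + int_{-inf}^u S1 phi'
    h = 1.0 / panels_per_unit
    N = int(round((u - a) * panels_per_unit))
    integral = sum(S1(a + (i + 0.5) * h) * (-(a + (i + 0.5) * h))
                   * phi(a + (i + 0.5) * h) * h for i in range(N))
    return Phi(u) - S1(u) * phi(u) + integral

x1, x2 = 1.5, 2.5
double_sum = sum(phi(n1) * phi(n2)
                 for n1 in range(-20, int(math.floor(x1)) + 1)
                 for n2 in range(-20, int(math.floor(x2)) + 1))
assert abs(double_sum - G(x1) * G(x2)) < 1e-5
```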
In the next theorem we obtain an error estimate when the product defining $F_r$ is expanded and replaced by a partial sum.

THEOREM A.4.3. Let $f \in \mathcal{S}$, $v \in R^k$, $h > 0$, and let $r$ be a positive integer. Define
$$A_r(x) = \sum_{j(\alpha)\le r-1} (-1)^{|\alpha|}\,h^{|\alpha|}\,S_{\alpha}\!\left(\frac{x-v}{h}\right)(D^{\alpha}F)(x), \tag{A.4.20}$$
where $F$ is defined by (A.4.15), and for any nonnegative integral vector $\alpha=(\alpha_1,\dots,\alpha_k)$,
$$j(\alpha) = \sum_{j\,:\,\alpha_j\ge 2}(\alpha_j - 1), \qquad j(\alpha) = 0 \ \text{ if } \alpha_j < 2 \text{ for all } j. \tag{A.4.21}$$
For every $m > k/2$ there exists a constant $c(r,m,k)$ such that for all Borel sets $A$,
$$\left| h^{k}\sum_{v+hn \in A} f(v+hn) - \int_A dA_r \right| \le c(r,m,k)\sum_{\gamma} h^{|\gamma|}\nu_m(D^{\gamma}f), \tag{A.4.22}$$
where the summation on the right is over a finite set of nonnegative integral vectors $\gamma$ with $|\gamma| \ge r$ (determined by $r$ and $k$), and $\nu_m$ is defined by
$$\nu_m(\phi) = \sup\left\{(1+\|x\|^2)^{m/2}|\phi(x)| : x \in R^k\right\}. \tag{A.4.23}$$
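The bookkeeping function $j(\alpha)$ of (A.4.21) and the seminorm $\nu_m$ of (A.4.23) are straightforward to compute; a small sketch (added illustration, with $\nu_m$ estimated from below on a finite grid):

```python
import math

def j_alpha(alpha):
    # (A.4.21): sum of (alpha_j - 1) over coordinates with alpha_j >= 2;
    # the sum is empty (hence 0) when every component is 0 or 1.
    return sum(a - 1 for a in alpha if a >= 2)

assert j_alpha((0, 1, 0)) == 0
assert j_alpha((1, 1, 1)) == 0
assert j_alpha((2, 3)) == 3
assert j_alpha((4, 0, 2)) == 4

def nu_m(phi, m, grid):
    # (A.4.23) restricted to a finite grid of points in R^k: a lower
    # bound for the true supremum, adequate for a quick illustration.
    return max((1 + sum(x * x for x in p)) ** (m / 2) * abs(phi(p)) for p in grid)

phi = lambda p: math.exp(-sum(x * x for x in p))
grid = [(0.1 * i, 0.1 * j) for i in range(-30, 31) for j in range(-30, 31)]
val = nu_m(phi, 2, grid)
assert val >= 1.0   # the point (0, 0) already contributes (1+0)^1 * 1 = 1
```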
The rest of this section is devoted to proofs of these two theorems.

Proof of Theorem A.4.2. The first step is to show that the various operations occurring in the definition of $F_r$ commute and that $F_r$ is a well-defined function of bounded variation. This is done by using Lemma A.4.4. The proof of the theorem is then carried out by induction on $k$. The following lemma on differentiation under the integral sign is well known,† and its proof is omitted.

LEMMA A.4.4. Let $\phi$ be a function on $R^k$ having continuous partial derivatives of all orders $\alpha=(\alpha_1,\dots,\alpha_p,0,\dots,0)$ with $|\alpha| \le s$. Assume that there exists an integrable function $H$ on $R^{k-p}$ such that
$$|(D^{\alpha}\phi)(x_1,\dots,x_k)| \le H(x_{p+1},\dots,x_k) \qquad (x=(x_1,\dots,x_k) \in R^k)$$
for all such $\alpha$. Then the function
$$\psi = \int \phi\,dx_{p+1}\cdots dx_k$$
has continuous derivatives of all orders $\alpha$, $|\alpha| \le s$, in the variables $x_1,\dots,x_p$, and
$$D^{\alpha}\psi = \int D^{\alpha}\phi\,dx_{p+1}\cdots dx_k$$
for all $(x_1,\dots,x_p) \in R^p$. Write
$$T_{r,j} = 1 - S_1(x_j)D_j + \dots + (-1)^r S_r(x_j)D_j^{r} + (-1)^{r+1}I_{r,j}. \tag{A.4.24}$$
One can regard $T_{r,j}$ as an operator acting on functions $G$ of the form
$$G(x) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_k} g(y)\,dy,$$
where $g$ is an integrable function that is Schwartz in the variable $x_j$. Also note that if $g \in \mathcal{S}$, then $T_{r,j}(G)$, as a function of the remaining variables, is the indefinite integral of a Schwartz function in these variables. Thus if $g \in \mathcal{S}$, it makes sense to apply $T_{r,j_1},\dots,T_{r,j_p}$ to $G$.

†See Loève [1], p. 126.
It is then clear that $F_r = \prod_{j=1}^{k} T_{r,j}(F)$ is a well-defined function, independent of the order of application of the operators $T_{r,j}$. By Theorem A.2.3 of the Appendix, it is a function of bounded variation. Moreover,
$$D_s\,T_{r,j_1}\cdots T_{r,j_p}(F) = T_{r,j_1}\cdots T_{r,j_p}(D_s F) \qquad \text{if } s \notin \{j_1,\dots,j_p\}.$$
We now complete the proof of Theorem A.4.2 by induction on $k$. If $x=(x_1,\dots,x_{k+1})$, write $x'=(x_1,\dots,x_k)$, so that $x=(x',x_{k+1})$. Then by the induction hypothesis,
$$\sum_{n' \le x'} f(n',y) = G_r(x',y), \tag{A.4.25}$$
where
$$G_r(x',y) = \prod_{j=1}^{k} T_{r,j}(G)(x',y)$$
and
$$G(x',y) = G(x_1,\dots,x_k,y) = \int_{-\infty}^{x'} f(z',y)\,dz'.$$
Now
$$\sum_{n \le x} f(n_1,\dots,n_{k+1}) = \sum_{n_{k+1} \le x_{k+1}}\ \sum_{n' \le x'} f(n',n_{k+1}) = \sum_{n_{k+1} \le x_{k+1}} h(x',n_{k+1}),$$
where
$$h(x',y) = \sum_{n' \le x'} f(n',y) = G_r(x',y)$$
by (A.4.25). For each $x'$, $h(x',y)$ is a Schwartz function of $y$. Consequently, from Theorem A.4.1 one has
$$\sum_{n_{k+1} \le x_{k+1}} h(x',n_{k+1}) = T_{r,k+1}(H)(x',x_{k+1}),$$
where
$$H(x',x_{k+1}) = \int_{-\infty}^{x_{k+1}} h(x',y)\,dy = \int_{-\infty}^{x_{k+1}} G_r(x',y)\,dy = \prod_{j=1}^{k} T_{r,j}(F)(x).$$
Thus
$$\sum_{n \le x} f(n) = T_{r,k+1}(H)(x) = F_r(x),$$
which proves the theorem for $k+1$. Since the theorem for $k=1$ follows from Theorem A.4.1, the proof is complete. Q.E.D.

Proof of Theorem A.4.3
We begin with some preparation. The main problem is to estimate the mass that the signed measure $dF_r - dA_r$ places on the various planes defined by the restriction that some of the coordinates are integers.

LEMMA A.4.5. Let $h > 0$, $v \in R^1$. Then for $m > \tfrac{1}{2}$,
$$\sum_{n=-\infty}^{\infty}\left[1+(hn+v)^2+a^2\right]^{-m} \le c_m h^{-1}(1+a^2)^{-m+1/2} + 2(1+a^2)^{-m},$$
where
$$c_m = \int_{-\infty}^{\infty}(1+x^2)^{-m}\,dx = B\!\left(\tfrac{1}{2},\,m-\tfrac{1}{2}\right);$$
here $B(\cdot,\cdot)$ denotes the standard beta function.

Proof. The above summation can be split up as
$$\sum_{n} = \sum_{h(n-1)+v>0} + \sum_{h(n+1)+v<0} + {\sum}' = \Sigma_1 + \Sigma_2 + \Sigma_3, \tag{A.4.26}$$
say, where $\Sigma_3$ collects the remaining terms. Then
$$\Sigma_1 \le \sum_{h(n-1)+v>0}\int_{n-1}^{n}\left[1+(hx+v)^2+a^2\right]^{-m}dx \le \int_{-v/h}^{\infty}\left[1+(hx+v)^2+a^2\right]^{-m}dx = h^{-1}\int_{0}^{\infty}(1+x^2+a^2)^{-m}dx,$$
$$\Sigma_2 \le \sum_{h(n+1)+v<0}\int_{n}^{n+1}\left[1+(hx+v)^2+a^2\right]^{-m}dx \le h^{-1}\int_{-\infty}^{0}(1+x^2+a^2)^{-m}dx, \tag{A.4.27}$$
and $\int_{-\infty}^{\infty}(1+x^2+a^2)^{-m}dx = c_m(1+a^2)^{-m+1/2}$. Also, there can be no more than two integers $n$ for which neither $h(n-1)+v>0$ nor $h(n+1)+v<0$ holds, and for each such $n$ one has $[1+(hn+v)^2+a^2]^{-m} \le (1+a^2)^{-m}$; hence
$$\Sigma_3 \le 2(1+a^2)^{-m}. \tag{A.4.28}$$
The lemma follows from (A.4.27) and (A.4.28). Q.E.D.

LEMMA A.4.6. Assume that $0 < h \le 1$ and $m > p/2$. Then
$$h^{p}\sum_{n}\left[1+\|h n+v\|^2+a^2\right]^{-m} \le (c_m+2)^{p}(1+a^2)^{-m+p/2},$$
where the summation is over all integral vectors $n=(n_1,\dots,n_p)$ in $R^p$.

Proof. Fix $a$ and $h$. Put
$$\phi_p(v:m) = \sum_{n}\left[1+\|hn+v\|^2+a^2\right]^{-m}.$$
Then by Lemma A.4.5,
$$\phi_1(v:m) \le (c_m+2)\,h^{-1}(1+a^2)^{-m+1/2}.$$
So, writing $v=(v_1,\dots,v_p)$ and $v'=(v_1,\dots,v_{p-1})$,
$$\phi_p(v:m) \le (c_m+2)\,h^{-1}\phi_{p-1}\!\left(v' : m-\tfrac{1}{2}\right),$$
and the lemma follows by induction on $p$. Q.E.D.

LEMMA A.4.7. Let $H(x) = S_1(x_1)\cdots S_1(x_s)G(x)$, where $G$ is a function of bounded variation on $R^k$, continuous in the variables $x_1,\dots,x_s$. Let $H$ also denote the signed measure with distribution function $H$. If
$$L(n_1,\dots,n_s) = \{x=(x_1,\dots,x_k) : x_1=n_1,\dots,x_s=n_s\},$$
then the signed measure $H|L(n_1,\dots,n_s)$ [i.e., the signed measure $H$ restricted to $L(n_1,\dots,n_s)$] has distribution function $(-1)^s G(n_1,\dots,n_s,x_{s+1},\dots,x_k)$.

Proof. We shall prove it for the case $s=1$; the general case follows by induction on $s$. Let the difference operator $\Delta_j^{h_j}$ be defined by $\Delta_j^{h_j}\phi(x) = \phi(x+h_j e_j) - \phi(x-h_j e_j)$, where $e_j$ is the vector with $1$ at the $j$th coordinate and $0$ at the others, $h_j > 0$. Then if $h=(h_1,\dots,h_k)$, $h_i>0$ for all $i$,
$$\int_{(x-h,\,x+h]} dH = \Delta_1^{h_1}\cdots\Delta_k^{h_k}(H) = \Delta_1^{h_1}\!\left(S_1\,\Delta_2^{h_2}\cdots\Delta_k^{h_k}G\right). \tag{A.4.29}$$
The result follows by putting $x_1=n_1$ and taking the limit as $h\downarrow 0$. Q.E.D.

LEMMA A.4.8. Let $H(x) = S_1(x_1)\cdots S_1(x_p)\,\psi_1(x_{p+1},\dots,x_q)\,G(x)$, where
$$G(x) = \int_{-\infty}^{x_{q+1}}\cdots\int_{-\infty}^{x_k} \psi_2(y_{q+1},\dots,y_k)\,g(x_1,\dots,x_q,y_{q+1},\dots,y_k)\,dy_{q+1}\cdots dy_k \tag{A.4.30}$$
and $g \in \mathcal{S}$. Suppose that $\psi_1$ is absolutely continuous in its variables and
$$\sup\left|D_{p+1}^{\epsilon_{p+1}}\cdots D_q^{\epsilon_q}\psi_1\right| \le c(\psi), \qquad \sup|\psi_2| \le c(\psi), \tag{A.4.31}$$
where the constant $c(\psi)$ depends only on $\psi=(\psi_1,\psi_2)$, and the first supremum is over all $\epsilon_j=0$ or $1$, $p+1 \le j \le q$. Then for all integers $n_1,\dots,n_s$ ($s \le p$), the absolutely continuous component of $H|L(n_1,\dots,n_s)$ (with respect to Lebesgue measure on $L(n_1,\dots,n_s)$) has a density bounded in absolute value by a constant multiple (depending only on $\psi$) of
$$\sum_{\alpha}\left|(D^{\alpha}g)(n_1,\dots,n_s,x_{s+1},\dots,x_k)\right|, \tag{A.4.32}$$
where the sum is over all $\alpha=(\alpha_{s+1},\dots,\alpha_q)$ with $\alpha_j=0$ or $1$, $s+1 \le j \le q$.

Proof. If $H_{(n_1,\dots,n_s)}$ is the distribution function of $H|L(n_1,\dots,n_s)$ for some integers $n_1,\dots,n_s$, then by Lemma A.4.7,
$$H_{(n_1,\dots,n_s)}(x_{s+1},\dots,x_k) = (-1)^s S_1(x_{s+1})\cdots S_1(x_p)\,\psi_1(x_{p+1},\dots,x_q)\,G(n_1,\dots,n_s,x_{s+1},\dots,x_k).$$
Since the density function of $H_{(n_1,\dots,n_s)}$ is $D_{s+1}\cdots D_k H_{(n_1,\dots,n_s)}$, the estimate (A.4.32) follows if one uses (A.4.31). Q.E.D.

Let $\nu_m$ be the seminorm defined by (A.4.23). We then have
LEMMA A.4.9. With the same notation and assumptions as in Lemma A.4.8, let $g(x) = \phi(v+hx)$, where $v \in R^k$, $h>0$, and $\phi \in \mathcal{S}$. Then for all $m > k/2$,
$$h^{k}\|H\| \le c(m,k,\psi)\sum_{\alpha} h^{|\alpha|}\nu_m(D^{\alpha}\phi), \tag{A.4.33}$$
where $c(m,k,\psi)$ is a constant depending only on its arguments, and the summation is over all $\alpha$ such that $\alpha_j=0$ or $1$ and $\alpha_j=0$ for $j>q$.

Proof. Let $\{j_1,\dots,j_s\} \subset \{1,2,\dots,p\}$ and
$$L(j_1,\dots,j_s : n_1,\dots,n_s) = \{x \in R^k : x_{j_1}=n_1,\dots,x_{j_s}=n_s\}$$
for integers $n_1,\dots,n_s$. Let $\Lambda(j_1,\dots,j_s : n_1,\dots,n_s)$ denote the absolutely continuous component of $H|L(j_1,\dots,j_s : n_1,\dots,n_s)$ with respect to the Lebesgue measure on the $(k-s)$-dimensional "plane" $L$. Then from the nature of the function $H$ it follows that
$$H = \Lambda_0 + \sum_{(j_1,\dots,j_s)}\ \sum_{(n_1,\dots,n_s)} \Lambda(j_1,\dots,j_s : n_1,\dots,n_s), \tag{A.4.34}$$
where $\Lambda_0$ is the absolutely continuous component of the signed measure $H$. We shall estimate
$$\sum_{(n_1,\dots,n_s)}\left\|\Lambda(j_1,\dots,j_s : n_1,\dots,n_s)\right\|,$$
assuming $j_1=1,\dots,j_s=s$. By Lemma A.4.8, this is less than or equal to
$$c(\psi)\sum_{\alpha} h^{|\alpha|}\nu_m(D^{\alpha}\phi)\ \sum_{(n_1,\dots,n_s)}\int_{R^{k-s}}\left[1+\sum_{j=1}^{s}(hn_j+v_j)^2+\sum_{j=s+1}^{k}(hx_j+v_j)^2\right]^{-m} dx_{s+1}\cdots dx_k \le c(\psi)\left\{\sum_{\alpha} h^{|\alpha|}\nu_m(D^{\alpha}\phi)\right\}c(m,k)\,h^{-k}. \tag{A.4.35}$$
We have also used Lemma A.4.6 in this estimation. Lemma A.4.9 now follows. Q.E.D.

We are now ready to prove Theorem A.4.3. Let
$$w(x) = f(v+hx). \tag{A.4.36}$$
Then
$$W(x) = \int_{-\infty}^{x} w(y)\,dy = h^{-k}F(v+hx). \tag{A.4.37}$$
By Theorem A.4.2,
$$\sum_{v+hn \in A} f(v+hn) = \int_{v+hx \in A} dW_r(x), \tag{A.4.38}$$
where
$$W_r = \prod_{j=1}^{k} T_{r,j}(W). \tag{A.4.39}$$
If we now expand the product formally (as we may), then $W_r$ is expressed as a sum of terms of the form
$$\pm\,S_{\gamma_1}(x_{\sigma_1})\cdots S_{\gamma_u}(x_{\sigma_u})\,D_{\sigma_1}^{\gamma_1}\cdots D_{\sigma_u}^{\gamma_u}\,I_{r,j_1}\cdots I_{r,j_p}(W), \tag{A.4.40}$$
where $\{\sigma_1,\dots,\sigma_u, j_1,\dots,j_p\}$ is a permutation of $\{1,2,\dots,k\}$ and each $\gamma_i$ is $0$, $1$, or larger than $1$. We now obtain an estimate of the variation norm of the signed measure whose distribution function is given by (A.4.40). For this purpose the function (A.4.40) may be written in the form
$$\pm\,S_1(x_{k_1})\cdots S_1(x_{k_q})\,S_{\beta_1}(x_{m_1})\cdots S_{\beta_s}(x_{m_s})\,G(x), \tag{A.4.41}$$
where $\beta_j \ge 2$ for $1 \le j \le s$, and
$$G(x) = \int_{-\infty}^{x_{i_1}}\cdots\int_{-\infty}^{x_{j_p}} S_r(y_{j_1})\cdots S_r(y_{j_p})\,g(x')\,dy_{i_1}\cdots dy_{j_p}, \tag{A.4.42}$$
where $(x')_a = y_a$ for $a \in \{i_1,\dots,i_t, j_1,\dots,j_p\}$ and $(x')_a = x_a$ for all other indices $a$. Also, writing $|\beta| = \beta_1+\dots+\beta_s$, we define the function $g$ by
$$g(x) = \left(D_{m_1}^{\beta_1-1}\cdots D_{m_s}^{\beta_s-1}D_{j_1}^{r}\cdots D_{j_p}^{r}\,w\right)(x) = h^{pr+|\beta|-s}\left(D_{m_1}^{\beta_1-1}\cdots D_{m_s}^{\beta_s-1}D_{j_1}^{r}\cdots D_{j_p}^{r}\,f\right)(v+hx) \qquad (x \in R^k). \tag{A.4.43}$$
Here $\{i_1,\dots,i_t, k_1,\dots,k_q, m_1,\dots,m_s, j_1,\dots,j_p\}$ is a permutation of $\{1,2,\dots,k\}$. Since $S_j$ ($j\ge 2$) and its derivative are bounded, it follows that Lemma A.4.9 can be applied to the function (A.4.41), and its variation is less than or equal to
$$c(r,m,k)\,h^{-k+pr+|\beta|-s}\sum_{\alpha} h^{|\alpha|}\nu_m\!\left(D^{\alpha}D_{m_1}^{\beta_1-1}\cdots D_{m_s}^{\beta_s-1}D_{j_1}^{r}\cdots D_{j_p}^{r}f\right), \tag{A.4.44}$$
where the summation is over all $\alpha$ with $\alpha_j = 0$ or $1$ and $\alpha_j=0$ unless $j \in \{k_1,\dots,k_q, m_1,\dots,m_s\}$. It follows that $|\alpha| \le q+s$, and if $D^{\gamma} = D^{\alpha}D_{m_1}^{\beta_1-1}\cdots D_{m_s}^{\beta_s-1}D_{j_1}^{r}\cdots D_{j_p}^{r}$, then $|\gamma| = |\alpha|+|\beta|-s+pr \ge r$ if $p>0$.
Let
$$W_r = W_r' + W_r'' + W_r''', \tag{A.4.45}$$
where
$$W_r'(x) = \sum_{j(\alpha)\le r-1}(-1)^{|\alpha|}S_{\alpha}(x)(D^{\alpha}W)(x), \qquad W_r''(x) = \sum_{j(\alpha)\ge r}(-1)^{|\alpha|}S_{\alpha}(x)(D^{\alpha}W)(x), \tag{A.4.46}$$
and $W_r'''$ is defined by the identity (A.4.45). Then $W_r'''$ is the sum of the terms of the form (A.4.41) in which $p>0$. Thus the variation of $W_r'''$ is not larger than
$$h^{-k}c(r,m,k)\sum_{\gamma} h^{|\gamma|}\nu_m(D^{\gamma}f). \tag{A.4.47}$$
Now consider $W_r''$. If the general term occurring in $W_r''$ is written in the form (A.4.41), then $p=0$ and $j(\alpha) = |\beta|-s \ge r$. Then from (A.4.44) it follows that
$$h^{k}\|W_r''\| \le c(r,m,k)\sum_{\gamma} h^{|\gamma|}\nu_m(D^{\gamma}f), \tag{A.4.48}$$
where $\|W_r''\|$ is the variation norm of the signed measure having distribution function $W_r''$. Finally, it is easy to check that
$$h^{k}A_r(x) = W_r'\!\left(\frac{x-v}{h}\right), \tag{A.4.49}$$
so that
$$\int_{v+hx \in A} dW_r' = h^{-k}\int_A dA_r. \tag{A.4.50}$$
The theorem follows from (A.4.45)–(A.4.50). Q.E.D.
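As an added illustration of Theorem A.4.3 in one dimension with $r=1$ (so that only $\alpha \in \{0,1\}$ survives in (A.4.20)), one has $A_1(x) = F(x) - h\,S_1((x-v)/h)\,f(x)$, and $h\sum_{v+hn\le x} f(v+hn)$ should agree with $A_1(x)$ up to terms of order $h^2$. The Gaussian test function and the particular $x$, $v$, $h$ below are arbitrary choices:

```python
import math

# One-dimensional illustration of Theorem A.4.3 with k = 1, r = 1:
# h * sum_{v+hn <= x} f(v+hn) is compared with
#   A_1(x) = F(x) - h * S1((x - v)/h) * f(x),
# and the discrepancy shrinks rapidly as h decreases.

def f(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def F(t):
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def S1(t):
    return t - math.floor(t) - 0.5

def discrepancy(h, v=0.0, x=math.sqrt(2)):
    lhs = h * sum(f(v + h * n) for n in range(-2000, 2001) if v + h * n <= x)
    rhs = F(x) - h * S1((x - v) / h) * f(x)
    return abs(lhs - rhs)

e1, e2 = discrepancy(0.1), discrepancy(0.05)
assert e1 < 1e-3 and e2 < 1e-3   # both discrepancies are already tiny
assert e2 < e1                   # and they decrease with h
```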
References
Agnew, R. P. [1] Global versions of the central limit theorem. Proc. Natl. Acad. Sci. 40 (1954) 800-804.
Bahr, B. von [1] On the convergence of moments in the central limit theorem. Ann. Math. Stat. 36 (1965) 808-818. [2] On the central limit theorem in R^k. Ark. Mat. 7 (1967) 61-69. [3] Multi-dimensional integral limit theorems. Ark. Mat. 7 (1967) 71-88.
Beek, P. van [1] An application of the Fourier method to the problem of sharpening the Berry-Esseen inequality. Z. Wahrscheinlichkeitstheor. Verw. Geb. 23 (1972) 187-197.
Bergström, H. [1] On the central limit theorem in the space R^k, k > 1. Skand. Aktuarietidskr. 28 (1945) 106-127. [2] On the central limit theorem in the case of not equally distributed random variables. Skand. Aktuarietidskr. 33 (1949) 37-62. [3] On the central limit theorem in R^k. Z. Wahrscheinlichkeitstheor. Verw. Geb. 14 (1969) 113-126.
Bernstein, S. [1] Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 97 (1927) 1-59.
Berry, A. C. [1] The accuracy of the Gaussian approximation to the sum of independent variates. Trans. Am. Math. Soc. 49 (1941) 122-136.
Bhattacharya, R. N.
[1] Berry-Esseen bounds for the multi-dimensional central limit theorem. Ph.D. Dissertation, University of Chicago (1967). [2] Berry-Esseen bounds for the multi-dimensional central limit theorem. Bull. Am. Math. Soc. 75 (1969) 285-287.
[3] Rates of weak convergence for the multi-dimensional central limit theorem. Theory Probab. Appl. 15 (1970) 68-86.
[4] Rates of weak convergence and asymptotic expansions for classical central limit theorems. Ann. Math. Stat. 42 (1971) 241-259. [5] Recent results on refinements of the central limit theorem. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. II, University of California Press (1972), pp. 453-484.
[6] Speed of convergence of the n-fold convolution of a probability measure on a compact group. Z. Wahrscheinlichkeitstheor. Verw. Geb. 23 (1972) 1-10. [7] On errors of normal approximation. Ann. Probab. 3 (1975) 815-828. [8] Some recent results on Cramer-Edgeworth expansions with applications. Multivariate Analysis - VI (Ed. P. R. Krishnaiah), 57-75, North-Holland, New York (1985).
Bickel, P. J. [1] Edgeworth expansions in nonparametric statistics. Ann. Stat. 2 (1974) 1-20.
Bikjalis, A. [1] On the refinement of the remainder term in the multidimensional central limit theorem. Litov. Mat. Sb. 4 (1964) 153-158 (in Russian). [2] Estimates of the remainder term in the central limit theorem. Litov. Mat. Sb. 6 (1966) 321-346 (in Russian). [3] On multivariate characteristic functions. Litov. Mat. Sb. 8 (1968) 21-39 (in Russian). [4] Asymptotic expansions of distribution functions and the density functions of sums of independent identically distributed random vectors. Litov. Mat. Sb. 8 (1968) 405-422 (in Russian). [5] Asymptotic expansions of distributions of sums of identically distributed independent lattice random variables. Theory Probab. Appl. 14 (1969) 481-489. [6] On the central limit theorem in R^k, Parts I, II. Litov. Mat. Sb. 11 (1971) 27-58; 12 (1972) 53-84 (in Russian).
Billingsley, P. [1] Convergence of Probability Measures. Wiley, New York (1968).
Billingsley, P. and Topsøe, F. [1] Uniformity in weak convergence. Z. Wahrscheinlichkeitstheor. Verw. Geb. 7 (1967) 1-16.
Bohr, H. [1] Almost Periodic Functions. Chelsea, New York (1947).
Brillinger, D. [1] A note on the rate of convergence of a mean. Biometrika 49 (1962) 574-576.
Chebyshev, P. L. [1] Sur deux théorèmes relatifs aux probabilités. Acta Math. 14 (1890) 305-315.
Chung, K. L. [1] A Course in Probability Theory, 2nd ed. Academic Press, New York (1974).
Cramér, H. [1] On the composition of elementary errors. Skand. Aktuarietidskr. 11 (1928) 13-74, 141-180.
[2] Sur un nouveau théorème-limite de la théorie des probabilités. Act. Sci. Ind. 736 (1938). [3] Random Variables and Probability Distributions. Cambridge University Press, Cambridge (1937). [4] Mathematical Methods of Statistics. Princeton University Press, Princeton, N. J. (1946).
Dieudonné, J. [1] Treatise on Analysis, Vol. II (1970), Vol. III (1972). English translation by I. G. Macdonald, Academic Press, New York.
Doob, J. L. [1] Stochastic Processes. Wiley, New York (1953).
Dudley, R. M. [1] Convergence of Baire measures. Stud. Math. 27 (1966) 251-268. [2] Distances of probability measures and random variables. Ann. Math. Stat. 39 (1968) 1563-1572.
Edgeworth, F. Y. [1] The law of error. Proc. Camb. Philos. Soc. 20 (1905) 36-65.
Eggleston, H. G. [1] Convexity. Cambridge University Press, Cambridge (1958).
Esseen, C. G. [1] Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math. 77 (1945) 1-125. [2] A moment inequality with an application to the central limit theorem. Skand. Aktuarietidskr. 3-4 (1956) 160-170. [3] On mean central limit theorems. Trans. R. Inst. Technol., Stockh., 121 (1958) 1-30.
Federer, H. [1] Geometric Measure Theory, Die Grundlehren der Mathematischen Wissenschaften, Vol. 153. Springer, New York (1969).
Feller, W. [1] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung. Math. Z. 40 (1935) 521-559. [2] On the Berry-Esseen theorem. Z. Wahrscheinlichkeitstheor. Verw. Geb. 10 (1968) 261-268. [3] An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. Wiley, New York (1971).
Gnedenko, B. V. [1] On a local theorem for stable limit distributions. Ukr. Mat. Zh. 4 (1949) 3-15 (in Russian).
Gnedenko, B. V. and Kolmogorov, A. N. [1] Limit Distributions of Sums of Independent Random Variables. English translation by K. L. Chung, Addison-Wesley, Reading, Mass. (1954).
Götze, F. and Hipp, C. [1] Asymptotic expansions in the central limit theorem under moment conditions. Z. Wahrscheinlichkeitstheor. Verw. Geb. 42 (1978) 67-87.
Hall, P. [1] Rates of Convergence in the Central Limit Theorem. Pitman, London (1982).
Halmos, P. [1] Measure Theory. Van Nostrand, Princeton, N. J. (1950).
Hardy, G. H. [1] Pure Mathematics, 3rd ed. Cambridge University Press (1959).
Hardy, G. H., Littlewood, J. E., and Pólya, G. [1] Inequalities. Cambridge University Press (1934).
Heyde, C. C. [1] On the influence of moments on the rate of convergence to the normal distribution.
Z. Wahrscheinlichkeitstheor. Verw. Geb. 8 (1967) 12-18.
Ibragimov, I. A. [1] On the accuracy of Gaussian approximation to the distribution functions of sums of independent variables. Theory Probab. Appl. 11 (1966) 559-579.
Ibragimov, I. A. and Linnik, Yu. V. [1] Independent and Stationary Sequences of Random Variables. English translation, Wolters-Noordhoff, Groningen (1971).
Ingham, A. E. [1] A note on Fourier transforms. J. Lond. Math. Soc. 9 (1934) 29-32.
Katznelson, Y. [1] An Introduction to Harmonic Analysis. Wiley, New York (1968).
Khinchin, A. Ya. [1] Begründung der Normalkorrelation nach der Lindebergschen Methode. Izv. Assoc. Inst. Mosk. Univ. 1 (1928) 37-45. [2] Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Ergeb. der Mat. Springer, Berlin (1933). [3] Mathematical Foundations of Statistical Mechanics. GTTI, Moscow-Leningrad (1938). English translation by G. Gamow, Dover, New York (1949).
Laplace, P. S. [1] Théorie Analytique des Probabilités, 1st ed. (1812).
Lévy, P. [1] Calcul des Probabilités. Paris (1925).
Liapounov, A. M. [1] Sur une proposition de la théorie des probabilités. Bull. Acad. Imp. Sci. St. Pétersb. (5) 13 (1900) 359-386. [2] Nouvelle forme du théorème sur la limite de probabilité. Mém. Acad. Sci. St. Pétersb. (8) 12 (1901).
Lindeberg, J. W. [1] Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Z. 15 (1922) 211-225.
Loève, M. [1] Probability Theory, 3rd ed. Van Nostrand, Princeton, N. J. (1963).
Markov, A. A. [1] The law of large numbers and the method of least squares. Izv. Fiz. Mat. Soc. Kazan. Univ. (2) 8 (1898) 110-128 (in Russian).
Matthes, T. K. [1] The multivariate central limit theorem for regular convex sets. Ann. Probab. 3 (1975) 503-515.
DeMoivre, A. [1] Miscellanea Analytica Supplementum. London (1730).
Nagaev, S. V. [1] Some limit theorems for large deviations. Theory Probab. Appl. 10 (1965) 214-235.
Osipov, L. V. [1] On asymptotic expansions of distribution functions of sums of independent random variables. Vestn. Leningr. Univ. 24 (1972) 51-59 (in Russian).
Osipov, L. V. and Petrov, V. V. [1] On an estimate of the remainder in the central limit theorem. Theory Probab. Appl. 12 (1967) 281-286.
Paley, R. E. A. C. and Wiener, N. [1] Fourier Transforms in the Complex Domain, Vol. XIX. A. M. S. Colloquium Publications (1934).
Parthasarathy, K. R. [1] Probability Measures on Metric Spaces. Academic Press, New York (1967).
Paulauskas, V. [1] On the multidimensional central limit theorem. Litov. Mat. Sb. 10 (1970) 783-789.
Petrov, V. V. [1] On local limit theorems for sums of independent random variables. Theory Probab. Appl. 9 (1964) 312-320. [2] Sums of Independent Random Variables. Nauka, Moscow (1972) (in Russian).
Rao, R. Ranga [1] Some problems in probability theory. D. Phil. Thesis, Calcutta University (1960). [2] On the central limit theorem in R^k. Bull. Am. Math. Soc. 67 (1961) 359-361. [3] Relations between weak and uniform convergence of measures with applications. Ann. Math. Stat. 33 (1962) 659-680.
Rao, R. Ranga and Varadarajan, V. S. [1] A limit theorem for densities. Sankhya 22 (1960) 261-266.
Reed, M. and Simon, B. [1] Methods of Modern Mathematical Physics II: Fourier Analysis, Self-Adjointness. Academic Press, New York (1975).
Rotar', V. I. [1] A non-uniform estimate for the convergence speed in the multi-dimensional central limit theorem. Theory Probab. Appl. 15 (1970) 630-648.
Sazonov, V. V. [1] On the multi-dimensional central limit theorem. Sankhya, Ser. A 30 (1968) 181-204. [2] On a bound for the rate of convergence in the multidimensional central limit theorem. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. II, University of California Press (1972), pp. 563-581. [3] Normal Approximation - Some Recent Advances. Springer-Verlag, New York (1981).
Scheffé, H. [1] A useful convergence theorem for probability distributions. Ann. Math. Stat. 18 (1947) 434-438.
Stein, E. M. and Weiss, G. [1] Introduction to Fourier Analysis on Euclidean Spaces. Princeton University Press, Princeton, N. J. (1971).
Sweeting, T. J. [1] Speeds of convergence for the multidimensional central limit theorem. Ann. Probab. 5 (1977) 28-41.
Takano, K. [1] Multidimensional central limit criterion in the case of bounded variances. Ann. Inst. Stat. Math.
Tokyo 7 (1956) 81-93.
Topsøe, F. [1] On the Glivenko-Cantelli theorem. Z. Wahrscheinlichkeitstheor. Verw. Geb. 14 (1969) 239-250.
Uspensky, J. V. [1] Introduction to Mathematical Probability. McGraw-Hill, New York (1937).
Wallace, D. L. [1] Asymptotic approximations to distributions. Ann. Math. Stat. 29 (1958) 635-654.
Wiener, N. [1] The Fourier Integral and Certain of Its Applications. Dover Publications, New York (1933).
Yarnold, J. K. [1] Asymptotic approximations for the probability that a sum of lattice random vectors lies in a convex set. Ann. Math. Stat. 43 (1972) 1566-1580.
Zahl, S. [1] Bounds for the central limit theorem error. SIAM J. Appl. Math. 14 (1966) 1225-1245.
Zolotarev, V. M. [1] On the closeness of the distributions of two sums of independent random variables. Theory Probab. Appl. 10 (1965) 472-479. [2] A sharpening of the inequality of Berry-Esseen. Z. Wahrscheinlichkeitstheor. Verw. Geb. 8 (1967) 32-42.
Index
Absolutely continuous, function, 263; signed measure, 269
Asymptotically best constant in Berry-Esseen theorem, 110, 240
Bernoulli numbers, polynomials, 272
Berry-Esseen theorem, 104
Borel sigma-field of a metric space, 2
Boundary of a set, 4
Bounded Lipschitzian distance, 17
Bounded variation of a function, 263
Cauchy's estimate, 68, 69
Cauchy's formula, 28
Central limit theorem, classical, 186; multidimensional, 183, 184
Characteristic function of a probability measure, 42
Chebyshev's inequality, one-sided, 103
Closed half space, 20
Continuity set, 4
Convex set, 20
Convex hull of a set, 23
Convolution, of functions, 42; of finite signed measures, 43
Covariance matrix, 261; average, 59
Cramer's condition, 207
Cramer-Edgeworth polynomials, 51, 52
Cramer-Levy theorem, 44
Cumulants, νth, 46
Degenerate distribution, 226
Difference operators, 262
Distribution of a random vector, 261
Distribution function, of a finite signed measure, 54, 261; of a random vector, 262
Euler-Maclaurin sum formula, in multidimension, 275; in one dimension, 273
Face of a polyhedron, 25
Fourier transform, 40
Fourier-Stieltjes transform, 42
Fourier inversion, theorem, 41; formula in lattice case, 230
Fundamental domain of a lattice, 229
Hausdorff distance, 19
Hölder inequality, generalized, 48
Hyperplane, 20
Indicator function of a set, 4
Induced signed measure, 43
Inner product, in C^k, 40; in L^2(R^k), 40
Lattice, 224
distribution, 226; minimal, 227; random vector, 226
Lattice point problem, 242
Liapounov coefficient, 59
Liapounov's central limit theorem, multidimensional extension, 185
Lindeberg's central limit theorem, multidimensional extension, 183, 184
Local central limit theorem for densities, 189
Local expansion, for densities, 192, 194; for the lattice case, 231
Logarithm, principal branch, 45
Mean of a random vector, 260
Mean central limit theorem, 172
Moment, νth, 44; absolute, 45
Normal distribution, 50; density, 50
Norm, in C^k, 40; in L^p(R^k), 40; of a matrix, 124; of a finite signed measure, 3
Period lattice, 228
Plancherel theorem, 41
Pólya's result, 23
Polyhedron, 25
Prokhorov distance, 5
Q-continuous function, 4
Riemann-Lebesgue lemma, 41
Scheffé's theorem, 6
Schwartz, function, 265; space, 274
Singular measure on R^k, 270
Standard normal distribution, 51
Standard normalization, 160
Strongly non-lattice distribution, 221
Support of a (signed) measure, 2
Surface, area, 27; integral, 27
Uniformity class, of functions, 6; of sets, 6
Uniqueness theorem for the Fourier-Stieltjes transform, 43
Oscillation, of a function, 4; average modulus of, 97; of a function on a set, 7
Variation, of a function on a set, 263; of a finite signed measure - positive, negative, total, 2
Parseval's relation, 44
Period, 227
Weak, topology, 3; convergence, 3