Then (i) φ′(0+) = E(Y); (ii) E(Y e^{bY}) exists, −∞ < E(Y e^{bY}) ≤ ∞, and φ′(b−) = E(Y e^{bY}). Proof. Write
where I = 0 on (−∞, 0) and I = 1 on [0, ∞). The integrands in (2.6) are continuous in t for each y. For t ∈ [0, b], the first integrand is bounded by 0 and exp(by) and the second by 0 and 1. It therefore follows from (2.6) by the dominated convergence theorem that φ is continuous on [0, b]. Let z denote the complex variable t + iu, and let
φ(t) < ∞ for all t > 0, −∞ ≤ E(Y) < 0, and P(Y > 0) > 0. Then F satisfies the standard conditions. Proof. The first two assumptions of the lemma are that b = ∞ and that
φ(t) < ∞ for all t ≥ 0. Now, P(Y > 0) > 0 implies that there exists ε > 0 such that P(Y ≥ ε) > 0. Hence φ(t) ≥ ∫_[ε,∞) e^{ty} dF ≥ e^{tε} P(Y ≥ ε) for t > 0; hence φ(t) → +∞ as t → +∞. For any t ∈ I let G(t) be the left-continuous probability distribution function defined by dG(t|y) = [φ(t)]⁻¹ exp(ty) dF. The distributions {G(t)} associated with a given F appear in many statistical and probabilistic contexts. For example, the G(t) with t ≠ 0 such that φ(t) = 1 is of importance in sequential analysis [W1]; cf. also [B3], [F2], [F3], [S3]. We are interested in the case when F satisfies the standard conditions, and t = τ. LEMMA 2.4. Suppose F satisfies the standard conditions (2.10)–(2.11), and let G = G(τ). Then P(Y > 0) > 0, and hence P(Y = 0) < 1. It now follows from Lemma 2.2 that
φ′(t) > 0 for 0 < t < b, so φ is strictly increasing on (0, b). Since φ is continuous on [0, b], φ is strictly increasing on [0, b]. Since φ is convex, it follows that φ(t) > φ(0) for all t > 0, so ρ = 1. Let k be a positive integer, and let • • •

The assumption P(Y > 0) > 0 is essential to Lemma 3.3. For example, if P(Y = 0) = 1, then f(u) = 0 for u ≤ 0 and f(u) = ∞ for u > 0. This important remark and counterexample are due to M. Wichura. Now let Y₁, Y₂, • • • denote a sequence of independent replicates of Y and for each n = 1, 2, • • • let Pₙ = P(Y₁ + • • • + Yₙ ≥ 0). THEOREM 3.1. n⁻¹ log Pₙ → log ρ as n → ∞. This theorem is in action, so to speak, only when 0 < ρ < 1, but it is a technical convenience to know that it is valid in complete generality. Wichura's counterexample shows that the theorem becomes false if "≥" is replaced with ">" in the definition of Pₙ. Proof. Suppose first that the d.f. of Y satisfies the standard conditions. In this case the theorem is deducible from Theorem 2.2 as follows. For each n let Zₙ = Σᵢ₌₁ⁿ Yᵢ and let Fₙ and • • •

• • • log φ(t) ≥ −K(q, p); hence (4.8) holds. Thus (4.7) is established. We now show that ψ(t) = (1 − √2 t)^(−1/2) exp(−t/√2); ψ(t) is the m.g.f. of a (χ₁² − 1)/√2 variable, so Hₙ converges weakly to the d.f. of such a variable. Hence (2.20) holds for every ε > 0. (5.1) now follows from (2.21), (5.2), and (5.4). The following conclusions can be established by the same method as used above for (5.1). Let • • • and k₁, k₂, • • • be positive integers such that • • • = −K. Suppose next that 0 < K < ∞. In this case there exists r(x), 0 ≤ r(x) < ∞, such that dP_{θ₂} = r(x) dP_{θ₁} on ℬ, and • • • πₙ ≤ 1 − β for n > m. For n > m consider • • •; hence n⁻¹ log αₙ = −∞ = −K for all n > m.

In the remainder of this section we consider a theory of estimation in terms of the framework (X^(n), ℬ^(n)), {P_θ^(n): θ ∈ Θ}, n = 1, 2, • • •, introduced at the outset of this section. Let g be a real-valued functional defined on Θ.
For each n, let Tₙ = Tₙ(x^(n)) be a ℬ^(n)-measurable function on X^(n), to be thought of as a point estimate of g. For any θ and ε > 0, let τₙ = τₙ(ε, θ) be defined by • • •, 0 ≤ τₙ ≤ ∞. Since the right-hand side of (6.13) can be found exactly by entering a standard normal table with ε/τₙ, let us call τₙ(ε, θ) the effective standard deviation of Tₙ when θ obtains and it is required, for some theoretical or practical reason, to compute the left-hand side of (6.13). Note that if Tₙ is exactly normally distributed
Let F^(k), φ^(k), and ρ^(k) be defined by (2.1), (2.2), and (2.3) with Y replaced by
LEMMA 3.2. ρ^(k) → ρ as k → ∞. Proof. It is plain from (3.1) that Y ≥ Y^(k+1) ≥ Y^(k); hence φ(t) ≥ φ^(k+1)(t) ≥ φ^(k)(t) for t ≥ 0; hence
Since Y^(k) ≤ k, E(Y^(k)) exists, say mₖ, and • • • If mₖ ≥ 0 for some k, say k₁, then ρ^(k₁) = 1 by Lemma 3.1; hence ρ^(k) = 1 = ρ for all k ≥ k₁, by (3.2). Suppose henceforth that mₖ < 0 for all k. If P(Y ≤ 0) = 1, then P(Y^(k) ≤ 0) = 1 and ρ^(k) = ρ = P(Y = 0) for all k. Suppose then that P(Y > 0) > 0. It then follows from Lemma 2.3 that F^(k) satisfies the standard conditions, and from Lemma 2.2 that the first derivative of φ^(k), say ψ^(k), is strictly increasing over (0, ∞), k = 1, 2, • • •. For each k, let τ^(k) be the solution of ψ^(k)(t) = 0. We observe next from (3.1) that Y^(k) exp(tY^(k)) ≤ Y^(k+1) exp(tY^(k+1)) for all t ≥ 0; it follows by taking expectations that
SOME LIMIT THEOREMS IN STATISTICS
It follows from (3.4) that
for otherwise 0 = ψ^(k)(τ^(k)) ≤ ψ^(k+1)(τ^(k)) < ψ^(k+1)(τ^(k+1)) = 0. It follows from (3.5) that limₖ→∞ τ^(k) exists, τ say, and 0 ≤ τ < ∞. Since ρ^(k) = φ^(k)(τ^(k)) = E(exp[τ^(k) Y^(k)])
e "V(0 is» °f course, the m.g.f. of Y — u, and /(O) = —log p. It is plain that /(«) is nondecreasing over (— oo, x). LEMMA 3.3. // P(Y > 0) > 0, there exists a 6 > 0 st/c/i r/raf /(u) /s continuous over ( — oo, <3); m particular, f(u) -» — log p as u -> 0. Proof. f(u) = sup{ut — log>(t):0 ^ t < oo, (p(r) < 00} is the supremum of a family of linear functions; / is therefore convex over ( — 0 0 , +00). There exists d > 0 such that P(Y ^ 5) = a > 0. Since
R. R. BAHADUR
by Y₁ + • • • + Yₙ that Pₙ ≤ ρⁿ for all n; hence lim supₙ→∞ n⁻¹ log Pₙ ≤ log ρ. We have to show that
Case 1. P(Y ≤ 0) = 1. In this case ρ = P(Y = 0) and Pₙ = ρⁿ for all n, so (3.7) holds trivially. Case 2. P(Y > 0) > 0. To treat this case, let k be a positive integer and for each
n let Yₙ^(k) be defined by (3.1) with Y replaced by Yₙ. The variables Yₙ^(k), n = 1, 2, • • •, then constitute a sequence of independent and identically distributed variables, and hence
Let ρ^(k) be defined as in the paragraph following (3.1) and let mₖ = E(Y^(k)), −∞ ≤ mₖ < ∞. Now consider the following subcases of the present Case 2: (i) mₖ < 0 for all k, (ii) mₖ > 0 for some k, and (iii) mₖ = 0 for all sufficiently large k. In view of (3.3) these three cases are exhaustive. Suppose Case 2(i) obtains. It then follows from Lemma 2.3 that, for each k, the d.f. of Y^(k) satisfies the standard conditions; hence n⁻¹ log Pₙ^(k) → log ρ^(k) by the first paragraph of this proof with Y replaced by Y^(k). It now follows from (3.8) that the left-hand side of (3.7) is not less than log ρ^(k) for any k; hence (3.7) holds, by Lemma 3.2. Suppose now that Case 2(ii) obtains, and let k be such that mₖ > 0. Then Pₙ^(k) → 1 as n → ∞ by the law of large numbers; hence Pₙ → 1 by (3.8); hence n⁻¹ log Pₙ → 0, so (3.7) holds since ρ ≤ 1. Suppose finally that Case 2(iii) obtains. In this case E(Y) exists and equals 0. It is thus seen that the theorem is established in all cases except possibly in Case 3: P(Y > 0) > 0, E(Y) exists and equals 0. To treat this case let u be a positive constant, and let Yₙ* = Yₙ − u for each n. Then Pₙ ≥ P(Y₁* + • • • + Yₙ* ≥ 0) = Pₙ*, say, for each n. Since E(Y₁*) = −u < 0, Y₁* does not belong in Case 3. Hence n⁻¹ log Pₙ* → −f(u), where f is given by (3.6). Thus the left-hand side of (3.7) is not less than −f(u). Since u is arbitrary, it follows from Lemma 3.3 that (3.7) holds. Notes. Theorem 3.1 is due to Chernoff [C1]. The present proof is a rearrangement and continuation of the proof in [B3] for the case when the standard conditions are satisfied. A different proof, again under the standard conditions, is given in [B11]. Concluding remark. The following partial generalization of Theorem 3.1 is required in certain applications. Let Y be an extended real-valued random variable such that P(−∞ ≤ Y < ∞) = 1, and let
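As a numerical check on Theorem 3.1, Pₙ can be computed exactly when Y is a two-point variable. A sketch in Python (the distribution P(Y = +1) = 0.3, P(Y = −1) = 0.7 and the sample sizes are illustrative assumptions, not from the text); here E(Y) < 0, P(Y > 0) > 0, and ρ = inf over t ≥ 0 of E exp(tY) = 2√(p(1 − p)):

```python
import math

p = 0.3   # illustrative: P(Y = +1) = p, P(Y = -1) = 1 - p, so E(Y) = 2p - 1 < 0
rho = 2.0 * math.sqrt(p * (1.0 - p))   # inf over t >= 0 of E exp(tY)

def log_Pn(n):
    """log P(Y_1 + ... + Y_n >= 0): the sum is >= 0 iff at least ceil(n/2) of the Y_i are +1."""
    terms = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
             + j * math.log(p) + (n - j) * math.log(1 - p)
             for j in range((n + 1) // 2, n + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

for n in (100, 200, 400):
    print(n, log_Pn(n) / n, math.log(rho))   # n^{-1} log P_n approaches log rho
```

The residual gap between n⁻¹ log Pₙ and log ρ is of order n⁻¹ log n, coming from the polynomial factor that the theorem's logarithmic scaling suppresses.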
and Then
and also
To see this let Y* denote a random variable such that the d.f. of Y* is that of Y given that Y > −∞. Then P(Y₁ + • • • + Yₙ ≥ nuₙ) = aⁿP(Y₁* + • • • + Yₙ* ≥ nuₙ) and P(Y₁ + • • • + Yₙ > nuₙ) = aⁿP(Y₁* + • • • + Yₙ* > nuₙ) for all n, and 0 < a ≤ 1 by (3.10). Choose ε > 0. Then u − ε < uₙ < u + ε for all sufficiently large n. Hence, for all sufficiently large n, P(Y₁* + • • • + Yₙ* ≥ nuₙ) and P(Y₁* + • • • + Yₙ* > nuₙ) both lie between P(Y₁* + • • • + Yₙ* ≥ n(u + ε)) and P(Y₁* + • • • + Yₙ* ≥ n(u − ε)). By first applying Theorem 3.1 to these bounds and then letting ε → 0, it follows from (3.10) and Lemma 3.3 that the limits in (3.11) and (3.12) exist and that both limits are equal to log a − f*(u), where f* is defined by (3.6) with φ replaced by φ*, the m.g.f. of Y*. Since
and let
If q is not absolutely continuous with respect to p, let K(q, p) = ∞. THEOREM 4.1. K is well-defined, and 0 ≤ K ≤ ∞; K = 0 if and only if p(B) = q(B) for all B in ℬ. This theorem is a straightforward consequence of the facts that (4.1) implies
and that log t ≤ t − 1 for 0 ≤ t < ∞, with equality if and only if t = 1; we omit the details. It follows from (4.1) and (4.2) that, in case q ≪ p,
with 0 log 0 = 0, is an alternative formula for K. The value of K does not depend on the particular version of r used in (4.2) or (4.4). It should also be noted that K can be +∞ even if q ≪ p and p ≪ q. The number K was introduced by Kullback and Leibler [K5]. It plays an important role in statistical theory [K6], especially in large sample theories of estimation and testing [B9]. The underlying reason is that √(2K(q, p)), √(2K(p, q)) and √(K(q, p) + K(p, q)) are indices of the statistical distance between p and q: the smaller these indices the harder it is to discriminate between p and q when the sample space is (X, ℬ). The following are some more or less heuristic considerations bearing on the nature of K.

(i) Let 𝒞 be a σ-field such that 𝒞 ⊂ ℬ, and let p* and q* be the restrictions of p and q to 𝒞. Then K(q*, p*) ≤ K(q, p); if K(q, p) < ∞, then K(q*, p*) = K(q, p) if and only if 𝒞 is sufficient for the two-point set {p, q} of measures on ℬ (cf. [K5]).

(ii) Suppose for simplicity that p and q are mutually absolutely continuous on ℬ. Then 0 < r(x) < ∞. Let ψ(t) be the m.g.f. of log r(x) under p. Then ψ(0) = ψ(1) = 1, and it follows from Lemma 2.2 that ψ′(0+) = −K(p, q) and ψ′(1−) = K(q, p). The K numbers thus indicate how rapidly the graph of ψ departs from the graph of f(t) ≡ 1 at the endpoints of [0, 1]; f(t) ≡ 1 is, of course, the m.g.f. of log r in case p = q on ℬ.

(iii) Suppose X = Rᵏ, and p and q represent multivariate normal distributions with mean vectors μ₁ and μ₂ respectively and common covariance matrix Σ, where Σ is positive definite. Then K(q, p) = K(p, q) = ½(μ₂ − μ₁)′Σ⁻¹(μ₂ − μ₁).

(iv) Let Θ be an open set in Rᵏ, and for each θ in Θ let p_θ be a probability measure on ℬ. Suppose that dp_θ = f_θ(x) dμ, where μ is a fixed σ-finite measure, and 0 < f_θ(x) < ∞ for all θ in Θ and x in X. Assume that f_θ(x) is continuous in θ for each fixed x, and let θ₀ be a given point in Θ. Let r(x|θ, θ₀) = f_θ(x)/f_{θ₀}(x).
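The quantities above are easy to compute in simple cases. A sketch in Python (the discrete distributions and the univariate normal pair are illustrative assumptions, not from the text); it evaluates formula (4.4) for a finite X, exhibits the asymmetry of K, and checks the normal case of (iii), which for k = 1 and unit variance reduces to K(q, p) = (μ₂ − μ₁)²/2:

```python
import math

def K(q, p):
    """Formula (4.4) for finite X: K(q, p) = sum q_i log(q_i / p_i), with 0 log 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = (0.5, 0.3, 0.2)   # illustrative distributions on a three-point X
p = (0.2, 0.5, 0.3)
print(K(q, p), K(p, q))   # both positive and unequal: K is not symmetric
print(K(p, p))            # K = 0 if and only if the two measures agree (Theorem 4.1)

# Check of (iii) for k = 1, unit variance: q = N(mu2, 1), p = N(mu1, 1) gives
# K(q, p) = (mu2 - mu1)^2 / 2, here 0.5.  Numerical integration of q log(q/p):
mu1, mu2, dx = 0.0, 1.0, 1e-3
total = 0.0
for i in range(22000):
    x = -10.0 + i * dx
    qx = math.exp(-(x - mu2) ** 2 / 2) / math.sqrt(2 * math.pi)
    total += qx * ((x - mu1) ** 2 - (x - mu2) ** 2) / 2 * dx   # q(x) log(q(x)/p(x))
print(total)
```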
Since r log r ~ (r − 1) + (r − 1)²/2 as r → 1, it follows from (4.4), under appropriate regularity conditions, that K(p_θ, p_{θ₀}) ~ E_{θ₀}(r(x|θ, θ₀) − 1)²/2 as θ → θ₀. Similarly, under appropriate regularity conditions, K(p_{θ₀}, p_θ) ~ E_{θ₀}(r(x|θ, θ₀) − 1)²/2 as θ → θ₀. E_{θ₀}(r(x|θ, θ₀) − 1)² is a familiar quantity in the theory of estimation. In particular, if f_θ(x) is continuously differentiable in θ for each x and the partial derivatives are square integrable, then, under appropriate regularity conditions, E_{θ₀}(r(x|θ, θ₀) − 1)² ~ (θ − θ₀)I(θ₀)(θ − θ₀)′ as θ → θ₀, where I(θ₀) is Fisher's information matrix when (X, ℬ) is the sample space and θ₀ obtains.

(v) In the same context as that of the preceding paragraph consider fixed θ₁ and θ₂, and let T(x) = log r(x|θ₁, θ₂). According to the Neyman–Pearson theory, T is the best statistic for discriminating between p_{θ₁} and p_{θ₂}. Assume that x consists of a large number of independent components. Then T is approximately normally distributed under each θ. Let mᵢ be the mean and σᵢ² the variance of T under p_{θᵢ}, i = 1, 2. If σ₁ = σ₂ = σ, say, the separation between p_{θ₁} and p_{θ₂} afforded by the optimal statistic T is approximately the distance between the N(0, 1) and the N(d, 1) distributions, where d = (m₁ − m₂)/σ. (Here m₂ < 0 < m₁.) In the general case, i.e., σ₁ not necessarily equal to σ₂, we may take d₁ = (m₁ − m₂)/σ₁ or d₂ = (m₁ − m₂)/σ₂ or some mixture of the two as the effective distance, in the standard normal scale, between p_{θ₁} and p_{θ₂}. It can be shown, under appropriate
regularity conditions, that d₁ and d₂ are each approximately √(K(p_{θ₁}, p_{θ₂}) + K(p_{θ₂}, p_{θ₁})) (cf. [B7]). Consequently, √(K(p_{θ₁}, p_{θ₂}) + K(p_{θ₂}, p_{θ₁})) is the approximate distance between p_{θ₁} and p_{θ₂} in the standard normal scale if θ₂ is close to θ₁. So much for heuristics. Suppose now that Y = Y(x) is a real-valued ℬ-measurable function on X, and that p is a given probability measure on ℬ. Let
and let
THEOREM 4.2. exp[−K(A, p)] = ρ. Proof. We shall first show that (4.7) holds. If A is empty (e.g., if Y(x) ≡ −1), K(A, p) = ∞ and (4.7) holds trivially. Suppose then that A is nonempty, and choose and fix a measure q in A. We have to show that
If K(q, p) = ∞ or if ρ = 1, (4.8) holds trivially. Suppose then that K < ∞ and ρ < 1. Then there exists r(x), 0 ≤ r(x) < ∞, such that (4.1) holds, and K is given by (4.2). Consider a t > 0 such that φ(t) < ∞. Then
by (4.1) and (4.3). Hence log
Suppose first that Y(x) < 0 [p]. Then ρ = 0 and (4.9) holds trivially. Suppose next that Y(x) ≤ 0 [p] but that, with B = {x: Y(x) = 0}, p(B) > 0. Then ρ = p(B), 0 < ρ ≤ 1. Let r(x) = ρ⁻¹ on B and r(x) = 0 on X − B, and let q be defined by (4.1). Then q ∈ A, and K(q, p) = −log ρ; hence (4.9) holds. Suppose henceforth that
Suppose that the d.f. of Y satisfies the standard conditions. Let τ be defined by (2.12), and let dq = ρ⁻¹ exp[τY(x)] dp. It follows from Lemma 2.4 that q is a probability measure in A, i.e., E_q[Y(x)] = 0. Since log(dq/dp) = −log ρ + τY(x), it follows from (4.2) that K(q, p) = −log ρ; thus (4.9) holds. To treat the general case, let Y^(k)(x) be defined by (3.1) with Y replaced by Y(x). Then, for any q, E_q(Y^(k)(x)) exists, −∞ ≤ E_q(Y^(k)(x)) < ∞. Let Aₖ = {q: E_q(Y^(k)(x)) ≥ 0}. Since Y^(k)(x) ≤ Y(x), Aₖ ⊂ A; hence
Suppose first that E_p(Y^(k)(x)) ≥ 0 for some k. Then p ∈ Aₖ; hence K(Aₖ, p) = 0; hence K(A, p) = 0 by (4.10); hence (4.9) holds, since 0 ≤ ρ ≤ 1. Suppose next that E_p(Y^(k)(x)) < 0 for all k. Since p(Y^(k)(x) > 0) = p(Y(x) > 0) > 0, it follows from Lemma 2.3 that the d.f. of each Y^(k) satisfies the standard conditions. Hence
by the preceding paragraph. It follows from (4.10) and (4.11) by Lemma 3.2 that (4.9) holds.

Theorem 4.2 affords the following interesting reformulation of Theorem 3.1. Suppose that each one-point set {x} ⊂ X is ℬ-measurable. Let Y(x) be a ℬ-measurable function, and p a probability measure such that K(A, p) defined by (4.5) and (4.6) is finite. Suppose p obtains, and let x₁, x₂, • • • denote a sequence of independent replicates of x. For each n, let pₙ = pₙ(·|x₁, • • •, xₙ) be the probability measure on ℬ defined by pₙ(B) = n⁻¹ (the number of indices j with 1 ≤ j ≤ n and xⱼ ∈ B). Then pₙ ∈ A if and only if Σᵢ₌₁ⁿ Y(xᵢ) ≥ 0. In view of Theorem 4.2, Theorem 3.1 is therefore equivalent to

(4.12) n⁻¹ log P(pₙ ∈ A) → −K(A, p)

as n → ∞. The set A defined by (4.5) is of a very special sort. It seems that (4.12) holds for a large class of sets A. To be more precise, let A be a given set of probability measures on ℬ such that the left-hand side of (4.12) is well-defined for each n, and let K(A, p) be defined by (4.6). Then, under certain additional conditions, (4.12) holds. This elegant formulation is due to Sanov [S1], but it seems that the theorems of [S1] are not reliable; just what conditions are needed for (4.12) is still an open question in the general case. The special case when X is a finite set (i.e., the multinomial case) is treated in the following section. We conclude this section with a set of sufficient conditions for (4.12). Suppose that X is the k-dimensional Euclidean space of points x = (z₁, • • •, zₖ), and ℬ is the class of Borel sets of X. Let us say that a sequence p₁, p₂, • • • of probability measures converges to a probability measure p₀ if and only if the d.f. of pₙ converges uniformly to the d.f. of p₀. Let M be a given set of probability measures containing each probability measure which assigns probability 1 to some finite set. Let T(m) be a real-valued functional on M, and let Aₜ = {m: T(m) ≥ t} for
THEOREM 4.3. If p is a nonatomic measure in M, T is uniformly continuous on M, K(Aₜ, p) is continuous in t at t = u, and K(A_u, p) < ∞, then (4.12) holds for A = A_u. For the proof, see [H2]. Notes. A version of Theorem 4.2 was noted by Kullback [K6] in a special case (cf. also [K7], [S1]); the general theorem as stated here is due to Hoeffding [H6]. Theorem 4.3 is a special case of a theorem of Hoadley [H2].

5. Some examples of large deviation probabilities. Example 5.1. For each n let Yₙ be a random variable having the t-distribution with n d.f., and let Pₙ(a) = P(Yₙ ≥ √n a), where 0 < a < ∞. We shall show that • • • as n → ∞. Let U and Vₙ be independent N(0, 1) and χₙ² variables, and let Zₙ = [U² − bVₙ]/2, where b = a². Then
φₙ(t) = the m.g.f. of Zₙ = (1 − t)^(−1/2)(1 + bt)^(−n/2) for −b⁻¹ < t < 1.
Let b be a positive constant, and for each n let V and Wₙ be independent chi-square variables with d.f. j and kₙ respectively. Then
Example 5.2. Let W be a p.d.f. over (0, ∞) such that
For each n = 1, 2, • • • and for 1 ≤ i ≤ n let aᵢ(n) denote the expected value of the ith order statistic in a sample of n independent observations on a random variable distributed according to W. Then 0 ≤ a₁(n) ≤ • • • ≤ aₙ(n) < ∞ and Σᵢ₌₁ⁿ aᵢ(n) = nμ. Let U₁, U₂, • • • be a sequence of independent random variables with P(Uᵢ = +1) = α and P(Uᵢ = −1) = 1 − α for each i, where 0 < α < 1, and let
Consider the hypothesis that α = ½. If W is degenerate at 1, Yₙ is equivalent to the sign-test statistic for testing this hypothesis; if W is uniform over (0, 1), Yₙ is equivalent to the Wilcoxon statistic; and if W is the d.f. of a |N(0, 1)| variable, Yₙ is the normal scores statistic. We are interested in large deviation probabilities of Yₙ in the null case, i.e., when α = ½. Suppose henceforth that α = ½, and let b be a constant. Then • • • where
and τ is the solution of
To establish the stated conclusion we require the following special case of a theorem of Hoeffding [H3]. For each n let Wₙ denote the d.f. which assigns mass 1/n to each of the (not necessarily distinct) points aᵢ(n), i = 1, • • •, n. Let g(y) be a nonnegative and continuous function on (0, ∞), and suppose that there exists a convex function h(y) on (0, ∞) such that g(y) ≤ h(y) for all y and ∫₀^∞ h dW < ∞. Then
as n → ∞. For each n let Zₙ = Yₙ − nb, and let Fₙ, φₙ and ρₙ be defined as in Theorem 2.2. Since Zₙ is a bounded random variable, since E(Zₙ) = −nb < 0, and since
P(Zₙ > 0) ≥ P(U₁ = • • • = Uₙ = 1) = 2⁻ⁿ > 0 by (5.10), Fₙ satisfies the standard conditions, by Lemma 2.3. It is readily seen that
It follows from (5.15) that ρₙ =
Now, the integrals in (5.13) and (5.16) are continuous and strictly increasing in t, varying from 0 to μ as t varies from 0 to ∞. Also, for each fixed t, the integral in (5.16) → the integral in (5.13) as n → ∞, by (5.14). It follows hence that • • • and that 0 < τ < ∞. It follows from (5.15) and (5.17) by another application of (5.14) that n⁻¹ log ρₙ → log ρ, where ρ is given by (5.12) and (5.13). It will now suffice to show that conditions (2.23) and (2.24) are satisfied. Let cₙ(t) be the cumulant generating function of Gₙ, the distribution obtained from Fₙ by exponential centering. Then cₙ(t) = log φₙ(t + τₙ) − log φₙ(τₙ), where log φₙ is given by (5.15). Hence σₙ² = cₙ⁽²⁾(0) = n ∫₀^∞ y² sech²(τₙ y) dWₙ. Hence
by an application of (5.14). Another application of (5.14) shows that
Since τₙ is bounded, it follows from (5.18) that (2.23) holds. It follows from (5.18) and (5.19) that the fourth cumulant of Hₙ → 0 as n → ∞. Hence the fourth moment of Hₙ → 3, so (2.24) holds with c = 2. It can be shown that in the present case Hₙ → Φ, even if y⁴ is replaced by y³ in (5.8). Example 5.3. Let x₁, x₂, • • • be a sequence of independent random variables, with each xᵢ uniformly distributed over [0, 1]. For each n let Fₙ(t) be the empirical d.f. based on x₁, • • •, xₙ, i.e., Fₙ(t) = (the number of indices j with 1 ≤ j ≤ n and xⱼ ≤ t)/n, and let
and let a be a constant, 0 < a < 1, and let Pₙ⁺ = P(Dₙ⁺ ≥ a), Pₙ⁻ = P(Dₙ⁻ ≥ a), and Pₙ = P(Dₙ ≥ a). We shall show that
as n → ∞, where g is defined as follows. Let

(5.24) f(a, t) = (a + t) log((a + t)/t) + (1 − a − t) log((1 − a − t)/(1 − t)), 0 < t < 1 − a.

Then

(5.25) g(a) = inf{f(a, t): 0 < t < 1 − a}.
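The rate function g can be evaluated numerically. A sketch in Python, assuming (5.24) is f(a, t) = (a + t) log((a + t)/t) + (1 − a − t) log((1 − a − t)/(1 − t)) (the binomial large-deviation rate for P(Fₙ(t) ≥ a + t)) and (5.25) is g(a) = inf of f(a, t) over 0 < t < 1 − a:

```python
import math

def f(a, t):
    """Assumed form of (5.24): the rate for the binomial tail P(F_n(t) >= a + t)."""
    return ((a + t) * math.log((a + t) / t)
            + (1 - a - t) * math.log((1 - a - t) / (1 - t)))

def g(a, steps=20000):
    """Assumed form of (5.25): g(a) = inf of f(a, t) over 0 < t < 1 - a, by grid search."""
    return min(f(a, i * (1 - a) / steps) for i in range(1, steps))

for a in (0.05, 0.1, 0.3, 0.5):
    print(a, g(a), 2 * a * a)   # g is increasing, and g(a) is close to 2a^2 for small a
```

By Pinsker's inequality g(a) ≥ 2a² for every a, with near equality as a → 0, in agreement with the expansion g(a) = 2a² + O(a³) quoted below in Lemma 5.1.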
It follows from (5.22) that Pₙ⁺ ≤ Pₙ ≤ Pₙ⁺ + Pₙ⁻ ≤ 2 max{Pₙ⁺, Pₙ⁻}; consequently, (5.23) implies that
as n → ∞. Some properties of g defined by (5.24), (5.25) are described in the following lemma. LEMMA 5.1. g(a) is a strictly increasing and continuously differentiable function of a for 0 < a < 1; g(a) = 2a² + O(a³) as a → 0; g(a) → ∞ as a → 1. The proof of this lemma is omitted. Now consider Pₙ⁺ for given a. It is plain from (5.20) that Pₙ⁺ ≥ P(Fₙ(t) − (a + t) ≥ 0) for each t ∈ [0, 1]. For fixed t, Fₙ(t) − (a + t) is the mean of n independent and identically distributed random variables, say Y₁, • • •, Yₙ, and with f defined by (5.24) we have
It follows hence from Theorem 3.1 that limₙ→∞ n⁻¹ log P(Fₙ(t) − (a + t) ≥ 0) = −f(a, t). Hence lim infₙ→∞ n⁻¹ log Pₙ⁺ ≥ −f(a, t). Since t is arbitrary,
Now let k be a positive integer so large that 0 < a − k⁻¹ < 1. Then
Hence
thus
It follows from Theorem 2.1 that the ith term of the series does not exceed exp[−nf(a − k⁻¹, i/k)]; hence Pₙ⁺ ≤ k exp[−ng(a − k⁻¹)], by (5.25). Since n is arbitrary,
Since g is continuous, it follows from (5.27) by letting k → ∞ in (5.28) that the first part of (5.23) holds. The second part of (5.23) follows from the first since Dₙ⁺ and Dₙ⁻ have the same distribution for each n. Now let
and let Qₙ = P(Tₙ ≥ a). Then, for Qₙ also, • • • Since Tₙ ≥ Dₙ⁺, Qₙ ≥ Pₙ⁺ for all n. In view of (5.23) it will therefore suffice to show that
To this end, note first that
Let k be a positive integer so large that 0 < a − 2k⁻¹ < 1, and let i and j be integers, 1 ≤ i, j ≤ k. If (i − 1)/k ≤ t ≤ i/k and (j − 1)/k ≤ u ≤ j/k, then Fₙ(t) − t + u − Fₙ(u) ≤ Fₙ(i/k) − (i − 1)/k + j/k − Fₙ((j − 1)/k) = Gₙ(i, j), say. Now, i ≥ j − 1 implies P(Gₙ(i, j) ≥ a) ≤ exp[−nf(a − 2k⁻¹, (i − j + 1)/k)], and i ≤ j − 1 implies P(Gₙ(i, j) ≥ a) ≤ exp[−nf(a − 2k⁻¹, 1 − (j − 1 − i)/k)], by applications of Theorem 2.1. Hence P(Gₙ(i, j) ≥ a) ≤ exp[−ng(a − 2k⁻¹)] for all i, j. Since Tₙ ≥ a implies Gₙ(i, j) ≥ a for some i and j, it follows that Qₙ ≤ k² exp[−ng(a − 2k⁻¹)]. Since this holds for each n, the left-hand side of (5.31) does not exceed −g(a − 2k⁻¹). By letting k → ∞ it now follows that (5.31) holds.

Remark 1. All four conclusions (5.23), (5.26), and (5.30) are special cases of (4.12). Remark 2. It is known that P(n^(1/2)Dₙ⁺ ≥ t) → exp[−2t²] for each t > 0. This suggests that if a > 0 is very small the limit of n⁻¹ log Pₙ⁺ is nearly −2a²; verification of this suggestion is provided by (5.23) and Lemma 5.1. The parallel remark applies to Pₙ, and perhaps also to Qₙ.

Example 5.4 (Sanov's theorem in the multinomial case). Let X be a finite set, say X = {a₁, • • •, aₖ}, k ≥ 2. Let Λ denote the set of all v = (v₁, • • •, vₖ) with vᵢ ≥ 0 and Σᵢ₌₁ᵏ vᵢ = 1. Regard Λ as the set of all probability measures on X, i.e., if v = (v₁, • • •, vₖ) obtains, then P(x = aᵢ) = vᵢ for i = 1, • • •, k. For any v = (v₁, • • •, vₖ) and p = (p₁, • • •, pₖ) in Λ let
with 0/0 = 1 (say) and 0 log 0 = 0. Then K as just defined is the specialization to the present case of K as defined in § 4. Now let p = (p₁, • • •, pₖ) be a given point in Λ with pᵢ > 0 for each i, and let A be a subset of Λ. Let K(A, p) = inf{K(v, p): v ∈ A} if A is nonempty, and let K(A, p) = ∞ otherwise. Let x₁, x₂, • • • denote a sequence of independent and identically distributed observations on x. For each n let fᵢₙ = the number of indices j with 1 ≤ j ≤ n and xⱼ = aᵢ, i = 1, • • •, k, and let Vₙ = n⁻¹(f₁ₙ, • • •, fₖₙ). We are interested in P(Vₙ ∈ A|p). The values of Vₙ are restricted, of course, to Λₙ = the set of all v of the form (i₁/n, • • •, iₖ/n), where i₁, • • •, iₖ are nonnegative integers with i₁ + • • • + iₖ = n. Let Aₙ = A ∩ Λₙ. We shall show that there exists a positive constant γ(k), depending only on k, such that
for all n, A ⊂ Λ, and p in the interior of Λ. Let us say that A is p-regular if
In this case (5.33) implies that
as n → ∞. The following lemma gives a useful sufficient condition for p-regularity. Let A° denote the interior of A, and let Ā° be the closure of A°. LEMMA 5.2. If A ⊂ Ā° (e.g., if A is open), then A is p-regular for any p ∈ Λ°. Proof. If A is empty so is Aₙ and K(Aₙ, p) = ∞ = K(A, p) for all n, so (5.34) holds trivially. Suppose then that A is nonempty. Choose ε > 0 and let v be a point in A such that K(v, p) ≤ K(A, p) + ε. Now, p ∈ Λ° implies that K(v, p) is finite-valued and continuous in its first argument. It follows hence from v ∈ A and A ⊂ Ā° that there exists w ∈ A° such that K(w, p) ≤ K(v, p) + ε. Hence K(w, p) ≤ K(A, p) + 2ε. Suppose w = (w₁, • • •, wₖ). For each n let rᵢₙ be the greatest integer ≤ nwᵢ for i = 1, • • •, k − 1, let rₖₙ = n − Σᵢ₌₁^(k−1) rᵢₙ, and let wₙ = n⁻¹(r₁ₙ, • • •, rₖₙ). Then wₙ ∈ Λₙ for each n, and wₙ → w as n → ∞. Hence wₙ ∈ A° for all sufficiently large n, say for n > m. Since Aₙ = A ∩ Λₙ ⊃ A° ∩ Λₙ, it follows that K(Aₙ, p) ≤ K(wₙ, p) for all n > m. Hence lim supₙ→∞ K(Aₙ, p) ≤ K(w, p). Thus lim supₙ→∞ K(Aₙ, p) ≤ K(A, p) + 2ε. Since ε is arbitrary, and since Aₙ ⊂ A implies K(Aₙ, p) ≥ K(A, p) for all n, it follows that (5.34) holds. We proceed now to establish (5.33). Choose and fix n. A straightforward calculation shows that, for any v = n⁻¹(r₁, • • •, rₖ) in Λₙ,
Hence, for any
To establish the lower bound we require the following proposition: there exists γ(k), depending only on k, such that (5.37) holds for all v in Λₙ. Suppose v = n⁻¹(r₁, • • •, rₖ) with each rᵢ ≥ 1. Since r! = √(2πr) rʳ e^(−r+s), where (12r + 1)⁻¹ < s < (12r)⁻¹, it follows by an easy calculation that P(Vₙ = v|v) ≥ β(k) n^(−(k−1)/2), where β(k) is a positive constant depending only on k. Suppose now that v is a point in Λₙ with exactly k₁ positive coordinates, where 1 ≤ k₁ < k. The preceding argument shows that then P(Vₙ = v|v) ≥ β(k₁) n^(−(k₁−1)/2) ≥ β(k₁) n^(−(k−1)/2). Letting γ(k) = min{β(1), • • •, β(k)}, it now follows that (5.37) holds for all v in Λₙ. If Aₙ is empty the upper and lower bounds in (5.33) are zero and so is P(Vₙ ∈ A|p). Suppose then that Aₙ is nonempty. Since Aₙ is a finite set, there exists vₙ ∈ Aₙ such that K(Aₙ, p) = K(vₙ, p). Then P(Vₙ ∈ A|p) = P(Vₙ ∈ Aₙ|p) ≥ P(Vₙ = vₙ|p) ≥ the lower bound in (5.33), by (5.36) and (5.37).
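The two-sided bound (5.33) can be checked against exact enumeration for small n. A sketch in Python (the parameters n = 40, p = (0.3, 0.3, 0.4), and the set A = {v: v₁ ≥ 0.6} are illustrative assumptions; the sandwich below uses the cruder method-of-types constants (n + 1)^(±k) rather than the sharper γ(k)n^(−(k−1)/2) of (5.33)):

```python
import math

def kl(v, p):
    """K(v, p) for the multinomial case: sum v_i log(v_i / p_i), with 0 log 0 = 0."""
    return sum(vi * math.log(vi / pi) for vi, pi in zip(v, p) if vi > 0)

def log_multinomial_pmf(counts, p):
    """Exact log P(V_n = v | p) for v = counts / n."""
    n = sum(counts)
    out = math.lgamma(n + 1)
    for c, pi in zip(counts, p):
        out += c * math.log(pi) - math.lgamma(c + 1)
    return out

def in_A(v):
    """An illustrative set A = {v : v_1 >= 0.6}."""
    return v[0] >= 0.6

n, k = 40, 3
p = (0.3, 0.3, 0.4)

log_terms, K_values = [], []
for r1 in range(n + 1):
    for r2 in range(n - r1 + 1):
        counts = (r1, r2, n - r1 - r2)
        v = tuple(c / n for c in counts)
        if in_A(v):
            log_terms.append(log_multinomial_pmf(counts, p))
            K_values.append(kl(v, p))

K_An = min(K_values)                                            # K(A_n, p)
m = max(log_terms)
logP = m + math.log(sum(math.exp(t - m) for t in log_terms))    # exact log P(V_n in A | p)

# Method-of-types sandwich:
#   -k log(n+1) - n K(A_n, p)  <=  log P(V_n in A | p)  <=  k log(n+1) - n K(A_n, p)
print(logP / n, -K_An)
```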
for all_£ in A n . Suppose v = n Vn • • • > rk) with each r, ^ 1. Since r\ = ^/2nr rr e~r +s, where (12r + I)' 1 < s < (12r) -1 , it follows by an easy l/k calculation using that P(Vn = v\v) ^ 0(fc) 1 ••• rk) (k 1}/2 (fc l l / 2 /13 •n~ ' , where jg(fc) = (27r)' • fc* • e~* . Suppose now that v is a point in An with exactly /c, positive coordinates where 1 ^ ^ < /c. The preceding argument shows that then P(Vn = y|y) ^ P ( k j n ~ ( k l ~ 1)/2 ^ ft(k^n~ {k ~ l) > 2 . Letting y(/c) = min{0(l), • • • , jS(fc)}, it now follows that (5.37) holds for all v in A n . If/!„ is empty the upper and lower bounds in (5.33) are zero and so is P(Vne A\p). Suppose then that An is nonempty. Since An is a finite set, there exists vn e An such that K(An,p) = K ( v n , p ) . Then P(VneA\p) = P(Vne An\p] ^ P(Vn = vu\p) ^ the lower bound in (5.33), by (5.36) and (5.37). Notes. The sources of Example 5.1 are [Al] and [B4], where a different method is used. Example 5.2 is from [K3]. Example 5.3 is based partly on [S2] and partly on [A2]. Example 5.4 is based on the treatment in [H5]. 6. Stein's lemma. Asymptotic effective variances. The statistical framework and notation of this section and all subsequent ones is the following. X is a set of points x, and <% is a er-field of sets of X. 0 is an index set of points 6, and, for each 0 in 0, Pe is a probability measure on ~$. (X, 38) is to be thought of as the sample space of an observation x whose distribution is determined by the unknown parameter 9. xl, x,, • • • represents a sequence of independent replicates of x. For each positive integer «, x (n) = ( x l , • • • , xj, and (X("\ 3#M) is the sample space of x(n). P(Q} is the probability measure on 3#(n) when x is distributed according to P0. In subsequent sections we shall also consider s = (xl, x 2 , • • • ad inf), an infinite sequence of independent replicates of x. 
Then (S, 𝒜) = (X^(∞), ℬ^(∞)) will denote the sample space of s, and P_θ^(∞) will denote the probability measure on 𝒜 when θ obtains. For any n, 1 ≤ n ≤ ∞, P_θ^(n) will often be abbreviated to P_θ. The framework just described (i.e., the independent and identically distributed case) is ostensibly that of the "one-sample case" but in fact is more general. Suppose, for example, that we have two normal distributions on Rᵏ, the first one having mean vector μ₁ and covariance matrix Σ₁, and the second one having mean vector μ₂ and covariance matrix Σ₂. Suppose that it is desired to study the case when we have samples of independent observations from each distribution, the sample from distribution 2 being twice the size of the sample from distribution 1. Then we could take X = R^(3k), and x = (u, v, w), where u, v, w are independent N(μ₁, Σ₁), N(μ₂, Σ₂) and N(μ₂, Σ₂) variables, and take, for example, θ = (μ₁, μ₂, Σ₁, Σ₂). Now for any θ₁ and θ₂ in Θ, let K(P_{θ₂}, P_{θ₁}) be defined as in § 4 with p = P_{θ₁} and q = P_{θ₂} and (X, ℬ) as the sample space. We shall usually abbreviate K(P_{θ₂}, P_{θ₁}) to K(θ₂, θ₁). It is readily seen that if (X, ℬ) is replaced with (X^(n), ℬ^(n)) in § 4 and
p = P_{θ₁}^(n), q = P_{θ₂}^(n), then K(P_{θ₂}^(n), P_{θ₁}^(n)) = nK(P_{θ₂}, P_{θ₁}) = nK(θ₂, θ₁) for all θ₁, θ₂, and 1 ≤ n < ∞. Now choose and fix θ₁ and θ₂ and consider testing the simple hypothesis that θ = θ₁ against the simple alternative that θ = θ₂. Let β be given, 0 < β < 1, and suppose that we require the test to have power 1 − β against θ₂. For each n let αₙ = αₙ(β) be the infimum of all available sizes under the stated power requirement when the sample space is (X^(n), ℬ^(n)). As is well known, this infimum is always attained, i.e., there exists a (possibly randomized) test based on x^(n) which has size αₙ and power 1 − β. LEMMA 6.1. For each β, n⁻¹ log αₙ(β) → −K(θ₂, θ₁). Proof. A test based on x^(n) is defined to be a ℬ^(n)-measurable function φ(x^(n)) with 0 ≤ φ ≤ 1. In using φ, given x^(n) the hypothesis is rejected with probability φ(x^(n)). The size and power of φ are E_{θ₁}(φ) and E_{θ₂}(φ)
We have dP_{θ₂}^(n) = rₙ(x^(n)) dP_{θ₁}^(n) on ℬ^(n), where
It follows from (6.2) and (6.3) that 0 < rₙ(x^(n)) < ∞ [P_{θ₁}]. It follows hence that there exist (uniquely determined) constants cₙ and dₙ, 0 < cₙ < ∞ and 0 ≤ dₙ ≤ 1, such that, with
we have
It follows from (6.4) and (6.5) by the Neyman–Pearson theory (or otherwise directly by verifying that φₙ* minimizes hₙE_{θ₁}(φ) + (1 − hₙ)E_{θ₂}(1 − φ) in the class of all ℬ^(n)-measurable test functions, where
Write Y(x) = log r(x) and for n = 1, 2, • • • . It then follows from (6.4) and (6.5) that
for all n. Since 0 < 1 − β < 1, and since n⁻¹Σᵢ₌₁ⁿ Yᵢ → K in P_{θ₂}-probability by (6.1), it follows that • • •

One method of establishing the desired conclusion is to note that Y is an extended real-valued random variable with P_{θ₁}(−∞ ≤ Y < ∞) = 1, and that
and to apply Chernoff's theorem (cf. the concluding remark of § 3). This method (which requires separate consideration of the case P_{θ₁}(Y ≥ K) = 1) is, however, rather complicated; we shall use some direct evaluations instead. It follows from (6.4) and (6.6) that
for all n. It follows from (6.8) and (6.9) that lim supₙ→∞ n⁻¹ log αₙ ≤ −K.
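The limit asserted by Lemma 6.1 can be seen concretely in the normal-shift case, where αₙ is available in closed form. A sketch in Python (the model P_{θ₁} = N(0, 1), P_{θ₂} = N(Δ, 1) per observation, Δ = 0.5, and the fixed power level are illustrative assumptions, not from the text); here K(θ₂, θ₁) = Δ²/2, the most powerful test rejects for large Σxᵢ, and holding its power at 1 − β = Φ(z) gives the exact size αₙ = 1 − Φ(√n Δ − z):

```python
import math

def log_Q(x):
    """log of the standard normal upper-tail probability 1 - Phi(x)."""
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

# Illustrative model: P_theta1 = N(0, 1), P_theta2 = N(delta, 1) per observation,
# so K(theta2, theta1) = delta^2 / 2.
delta = 0.5
K = delta * delta / 2.0
z = 1.2816                     # power held at 1 - beta = Phi(z), about 0.90

for n in (100, 400, 1600):
    # Exact size of the most powerful test with power 1 - beta at theta2.
    print(n, log_Q(math.sqrt(n) * delta - z) / n, -K)
```

The convergence is slow (the error is of order n^(−1/2), from the √n z cross term in the normal tail), which is consistent with the logarithmic scale on which the lemma operates.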
Now we obtain a lower bound for the size of any test based on x^(n) which has power 1 − β. Let φₙ be such a test, and let dₙ be a positive constant. Then
Let ε > 0 be arbitrary, and take dₙ = exp(nK + nε). Then P_{θ₂}(rₙ > dₙ) → 0 as n → ∞. It follows hence from (6.10) with φₙ = φₙ* that lim infₙ→∞ n⁻¹ log αₙ ≥ −K − ε. Since ε is arbitrary it follows from the conclusion of the preceding paragraph that n⁻¹ log αₙ → −K.

Suppose now that P_{θ₂} ≪ P_{θ₁} on ℬ, say dP_{θ₂} = r(x) dP_{θ₁}, but that K = ∞. In this case P_{θ₁}(r(x) ≥ l) > 0 for all l > 0. Choose and fix an l > 1 and define z = r(x) if 0 ≤ r(x) < l and z = l if r(x) ≥ l. Let Z = [0, l] be the sample space of z, and let P_θ* denote the probability measure on Z when θ obtains. We then have dP_{θ₂}* = ρ(z) dP_{θ₁}*, where ρ(z) = z for 0 ≤ z < l and ρ(l) = ∫_{r(x)≥l} r(x) dP_{θ₁}/P_{θ₁}(r(x) ≥ l). It is plain that ρ is a bounded function on Z; hence K* = E_{θ₂}(log ρ(z)) is finite. Now let zᵢ = z(xᵢ) for i = 1, • • •, n and let αₙ* be the minimum available size when power is fixed at 1 − β and the sample point is (z₁, • • •, zₙ). Then αₙ ≤ αₙ* for all n; hence
by the first two parts of the present proof with x replaced by z. Now, since K* = ∫_Z ρ(z) log ρ(z) dP*_{θ_1} and since z = l implies ρ(z) ≥ l > 1,
Since K = ∫_X r(x) log r(x) dP_{θ_1} = ∞, it follows from (6.12) that K* → ∞ as l → ∞. By letting l → ∞ in (6.11) it follows, as desired, that n^{−1} log α_n → −∞. It remains now to consider the case when K = ∞ and P_{θ_2} is not dominated by P_{θ_1} on ℬ. Then there exists a set B ⊂ X, B ∈ ℬ, such that P_{θ_1}(B) = 0 and P_{θ_2}(B) = γ (say) > 0. Consider
with mean g(θ) when θ obtains, then τ_n(ε, θ) equals the actual standard deviation of T_n for each ε. The sequence {T_n} is said to be a consistent estimate of g if, for each fixed ε and θ, the left-hand side of (6.13) → 0 as n → ∞. It is plain that we have consistency if and only if
for all ε and θ. A special case of consistency occurs when T_n is asymptotically normal with mean g(θ) and variance v(θ)/n, i.e., there exists a positive function v on Θ such that for each θ, n^{1/2}(T_n − g(θ))/[v(θ)]^{1/2} tends in distribution to an N(0, 1) variable when θ obtains. It is readily seen that in this case (6.15) holds for each θ and h > 0. In (6.15), ε → 0 as n → ∞. We now consider the case when ε remains fixed as n → ∞. It will be shown that for consistent estimates there is an asymptotic lower bound for n·τ_n^2(ε, θ) for all sufficiently small ε. This conclusion (Theorem 6.1) is an analogue of Fisher's bound for the asymptotic variance of asymptotically normal estimates (for a fuller description and discussion see §§ 1-3 of [B9]). Assumption 6.1. Θ is an open set in R^k, and g(θ) is a continuously differentiable function of θ = (θ_1, · · · , θ_k). Let
Assumption 6.2. For each θ^0 in Θ there exists a positive definite symmetric k × k matrix I(θ^0) = {I_{ij}(θ^0)} such that
As noted in § 4, under additional assumptions the matrix I(θ^0) coincides with the information matrix when (X, ℬ) is the sample space and θ^0 obtains; these additional assumptions are, however, not required here. Write I^{−1}(θ) = {I^{ij}(θ)} and let
Note that v depends only on the framework (X, ℬ), {P_θ : θ ∈ Θ} and on the function g to be estimated. THEOREM 6.1. Suppose that Assumptions 6.1 and 6.2 hold and that {T_n} is a consistent estimate of g. Then
for every θ.
Proof. Choose and fix θ^0 ∈ Θ. We shall show that (6.19) holds at θ^0. Write v = v(θ^0). If v = 0, then (6.19) holds trivially. Suppose then that v > 0. Write I = I(θ^0) and h = (h_1(θ^0), · · · , h_k(θ^0)), where the h_i are given by (6.16). Then h ≠ 0, so hI^{−1} is a nonzero vector. Choose and fix λ, 0 < λ < 1. For ε > 0 let
It follows from Assumption 6.1 that θ* ∈ Θ for all sufficiently small ε, and that g(θ*) − g(θ^0) = δ + o(δ) as ε → 0. Consequently, by (6.17), (6.21) holds for all sufficiently small ε. Choose and fix ε so small that (6.21) holds, and consider testing θ^0 against θ* by means of tests which have power 1/2 (say) against θ*, and let α*_n be the minimum available size when the sample point is x^{(n)}. It is known (cf. the proof of Lemma 6.1 in case 0 < K < ∞) that if φ_n is a ℬ^{(n)}-measurable test such that E_{θ*}(φ_n) ≥ 1/2, then
Let us write P_{θ^0}(|T_n − g(θ^0)| ≥ λδ) = a_n(λδ). We see from (6.20) that δ decreases continuously to zero as ε decreases to zero. Dividing both sides of (6.22) by λ^2 δ^2 and letting ε → 0 we obtain
It follows from (6.17), (6.18) and (6.20) that K(θ*, θ^0) = ε^2 v/2 + o(ε^2) as ε → 0. It now follows from (6.20) that the right-hand side of (6.23) equals −(2λ^2 v)^{−1}. Since λ is arbitrary, we conclude that
Since 0 < v < ∞ it follows from (6.24) that there exists ε_1 > 0 such that if 0 < ε < ε_1, then a_n(ε) > 0 for all sufficiently large n, say for n > m(ε). Since a_n(ε) equals the left-hand side of (6.13) with θ = θ^0, it follows that 0 < ε < ε_1 and n > m(ε) imply that 0 < τ_n(ε, θ^0) ≤ ∞. Since {T_n} is consistent, τ_n(ε, θ^0) → 0 as n → ∞. It follows hence from (6.13) by Theorem 1.1 that if 0 < ε < ε_1, then
as n → ∞. It follows from (6.24) and (6.25) that (6.19) holds at θ^0. In view of Theorem 6.1 let us say that {T_n} is an efficient estimate of g, in the sense of asymptotic effective variances, if, for each fixed θ, lim_{n→∞} {n·τ_n^2(ε, θ)} exists for all sufficiently small ε, say w(ε, θ), and lim_{ε→0} w(ε, θ) = v(θ). At present it is an
open problem to find estimates which are efficient in this sense. For partial results concerning efficiency of the maximum likelihood estimate in the present sense see [B6] and [B9]. We conclude this section with an example where the regularity assumptions of classical estimation theories are not satisfied but Assumptions 6.1 and 6.2 do hold, and the maximum likelihood estimate is efficient in the sense of asymptotic effective variances. Example 6.1. Suppose that X is the real line and x is distributed in X according to the double exponential distribution with mean θ, i.e., dP_θ = exp(−|x − θ|) dx/2, and Θ = (a, b), where a and b are constants, −∞ ≤ a < b ≤ ∞. Let g(θ) = θ. A straightforward calculation shows that, for any θ_1 and θ_2, (6.26) holds. It follows from (6.26) that (6.17) holds with (6.27) for all θ. It follows from (6.27) that, for the present g, v(θ) = 1 for all θ. Now for each n let k_n be the integer such that n/2 < k_n ≤ n/2 + 1, let y_n(1) ≤ · · · ≤ y_n(n) be the ordered sample values {x_1, · · · , x_n}, and let T_n = y_n(k_n). Then, for each θ and ε, (6.28) holds, where p < 1/2 is given by (6.29). Denote the left-hand side of (6.28) by a_n(ε). It follows easily from (6.28) and the definition of k_n by Example 1.2 that (6.30) holds, where

It is plain that r_n does not depend on θ, that 0 < r_n(ε) < ∞, and that r_n(ε) → 0 as n → ∞. It therefore follows from (6.25) and (6.30) that, for each ε > 0,
It is readily seen from (6.29) and (6.31) that
so {T_n} is asymptotically efficient. Suppose now that g(θ) is a continuously differentiable function over (a, b) and g′(θ) ≠ 0 for a < θ < b. Let U_n = g(T_n). It can be shown by an elaboration of the preceding argument that {U_n} is an asymptotically efficient estimate of g.
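Example 6.1 can be probed by simulation. The following sketch is our own illustration (all names are hypothetical, not from the text): it samples from dP_θ = exp(−|x − θ|) dx/2 by inverse transform, takes the sample median as T_n, and checks that n·E[(T_n − θ)²] is close to v(θ) = 1, the effective-variance bound of Theorem 6.1.

```python
import math
import random
import statistics

def laplace_sample(theta, n, rng):
    """Inverse-CDF draws from the double exponential dP = exp(-|x-theta|) dx/2."""
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(theta + math.log(2 * u) if u < 0.5
                   else theta - math.log(2 * (1 - u)))
    return out

def scaled_mse_of_median(theta=0.7, n=400, reps=2000, seed=0):
    """Monte Carlo estimate of n * E[(T_n - theta)^2] for the sample median.
    (statistics.median averages the two middle order statistics for even n,
    which is asymptotically equivalent to y_n(k_n) in the text.)"""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        t_n = statistics.median(laplace_sample(theta, n, rng))
        total += n * (t_n - theta) ** 2
    return total / reps

print(scaled_mse_of_median())   # close to v(theta) = 1
```

The value 1 reflects the classical formula 1/(4f(θ)²) for the asymptotic variance of the median, with f(θ) = 1/2 the density at its own median.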
Notes. Lemma 6.1 is contained in unpublished work of Stein. The first published statement of the lemma seems to be in [C2]. The present proof of Lemma 6.1 is based on the proof in [B9]. Theorem 6.1 is due to Bahadur [B6]. 7. Exact slopes of test statistics. Let (S, 𝒜) be the sample space of infinitely many independent and identically distributed observations s = (x_1, x_2, · · · ad inf) on an abstract random variable x, the distribution of x being determined by an abstract parameter θ taking values in a set Θ. Let Θ_0 be a given subset of Θ, and consider testing the null hypothesis that some θ in Θ_0 obtains. For each n = 1, 2, · · · , let T_n(s) be an extended real-valued function such that T_n is 𝒜-measurable and depends on s only through (x_1, · · · , x_n); T_n is to be thought of as a test statistic, large values of T_n being significant. Assume for simplicity that T_n has a null distribution, i.e., there exists an F_n(t) such that
and all t, −∞ ≤ t ≤ ∞. Then the level attained by T_n is defined to be
If in a given case the data consists of (x_1, · · · , x_n), then L_n(x_1, · · · , x_n) is the probability of obtaining as large or larger a value of T_n as the observed value T_n(x_1, · · · , x_n) if the null hypothesis is true. In typical cases L_n is asymptotically uniformly distributed over (0, 1) in the null case, and L_n → 0 exponentially fast (with probability one) in the non-null case. We shall say that the sequence {T_n} has exact slope c(θ) when θ obtains if
This definition is motivated in part by the following considerations. Consider the Fisherian transformation V_n(s) = −2 log L_n(s). Then, in typical cases, V_n → χ²_2 in distribution in the null case. Suppose now that a non-null θ obtains and that (7.3) holds, with 0 < c(θ) < ∞. Suppose we plot, for a given s, the sequence of points {(n, V_n(s)) : n = 1, 2, · · ·} in the uv-plane. It then follows from (7.3) that, for almost all s, this sequence of points moves out to infinity in the direction of a ray from the origin, the angle between the ray and the u-axis, on which axis the sample size n is being plotted, being tan^{−1} c(θ). The term "exact" in the above definition serves to distinguish c from another quantity, called the approximate slope of {T_n}, which is defined as follows. Suppose that T_n has an asymptotic null distribution F, i.e., lim_{n→∞} F_n(t) = F(t) for each t. For each n and s let L_n^{(a)} = 1 − F(T_n(s)). Suppose (7.3) holds when L_n is replaced by L_n^{(a)} and c is replaced by c^{(a)}(θ). Then c^{(a)}(θ) is the approximate slope of {T_n} when θ obtains. (For a discussion of approximate slopes c^{(a)}, and of the rather tenuous relations between c and c^{(a)}, see [B9].) In the remainder of this section, and in subsequent sections, we consider only exact slopes. In particular, the assumption that T_n has an asymptotic null distribution is henceforth not in force.
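To make the definition concrete, consider (as in Example 8.1 below) x ~ N(θ, 1) with null hypothesis θ = 0 and T_n = n^{−1/2} Σ x_i, whose null d.f. is the standard normal Φ; then L_n = 1 − Φ(T_n) and, when θ > 0 obtains, n^{−1} log L_n → −θ²/2 with probability one, i.e., the exact slope is c(θ) = θ². A simulation sketch (our own illustration; the helper names are hypothetical):

```python
import math
import random

def log_std_normal_tail(t):
    """log(1 - Phi(t)); erfc underflows past t ~ 37, so switch to the leading
    asymptotic term, whose relative error is O(1/t^2) and harmless for rates."""
    if t < 10.0:
        return math.log(0.5 * math.erfc(t / math.sqrt(2)))
    return -0.5 * t * t - math.log(t * math.sqrt(2 * math.pi))

def attained_level_rate(theta=1.0, n=10_000, seed=1):
    """n^{-1} log L_n along one simulated sample path under N(theta, 1)."""
    rng = random.Random(seed)
    xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
    t_n = math.sqrt(n) * xbar           # T_n = n^{-1/2} * sum of the x_i
    return log_std_normal_tail(t_n) / n

print(attained_level_rate())   # near -c(theta)/2 = -0.5 for theta = 1
```

Equivalently, V_n = −2 log L_n grows along the ray of slope c(θ) = θ² described above.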
Now for given ε, 0 < ε < 1, and given s, let N = N(ε, s) be the smallest integer m such that L_n(s) < ε for all n ≥ m, and let N = ∞ if no such m exists. Then N is the sample size required in order that the sequence {T_n} become significant (and remain significant) at the level ε. The following theorem shows that, for small ε, N is approximately inversely proportional to the exact slope. THEOREM 7.1. If (7.3) holds and 0 < c(θ) < ∞, then
Proof. Choose and fix θ such that 0 < c(θ) < ∞ and choose and fix s such that n^{−1} log L_n(s) → −c(θ)/2. Then L_n > 0 for all sufficiently large n and L_n → 0 as n → ∞. It follows that N < ∞ for every ε > 0 and that N → ∞ through a subsequence of the integers as ε → 0. Hence 2 ≤ N < ∞ for all sufficiently small ε, say
for ε < ε_1. For ε < ε_1 we have L_N < ε ≤ L_{N−1}. Hence
N^{−1} log L_N < N^{−1} log ε ≤ (N − 1)N^{−1} · (N − 1)^{−1} log L_{N−1}. It now follows from the present choice of s that N^{−1} log ε → −c(θ)/2 as ε → 0. Suppose that {T_n^{(1)}} and {T_n^{(2)}} are two sequences of test statistics such that T_n^{(i)} has exact slope c_i(θ), and suppose a non-null θ with 0 < c_i(θ) < ∞ obtains. It then follows from Theorem 7.1 that, with N_i(ε, s) the sample size required to make T_n^{(i)} significant at level ε, N_2(ε, s)/N_1(ε, s) → c_1(θ)/c_2(θ) [P_θ]. Consequently c_1(θ)/c_2(θ) is a measure of the asymptotic efficiency of T_n^{(1)} relative to T_n^{(2)} when θ obtains. The following theorem describes a useful method of finding the exact slope of a given sequence {T_n} for which (7.1) holds. Let Θ_1 = Θ − Θ_0 denote the non-null set of points θ. THEOREM 7.2. Suppose that
for each θ ∈ Θ_1, where −∞ < b(θ) < ∞, and that
for each t in an open interval I, where f is a continuous function on I, and {b(θ) : θ ∈ Θ_1} ⊂ I. Then (7.3) holds with c(θ) = 2f(b(θ)) for each θ ∈ Θ_1. Proof. Choose and fix θ ∈ Θ_1, and choose and fix an s such that n^{−1/2} T_n(s) → b as n → ∞. Let ε > 0 be so small that b + ε and b − ε are in I. Since F_n(t) is nondecreasing in t it follows from (7.2) that n^{1/2}(b − ε) < T_n < n^{1/2}(b + ε) implies 1 − F_n(n^{1/2}(b + ε)) ≤ L_n ≤ 1 − F_n(n^{1/2}(b − ε)); consequently the latter inequalities hold for all sufficiently large n. It now follows from (7.6) that lim sup_{n→∞} n^{−1} log L_n ≤ −f(b − ε) and lim inf_{n→∞} n^{−1} log L_n ≥ −f(b + ε). Since f is continuous and ε is arbitrary we conclude that lim_{n→∞} n^{−1} log L_n = −f(b). Remark 1. Suppose θ is a point in Θ_0. Then, for any {T_n}, (7.3) holds with c(θ) = 0. This is an immediate consequence of Theorem 7.5 below and L_n ≤ 1. Remark 2. If a given {T_n} does not satisfy the two conditions of Theorem 7.2 it may well be the case that {T*_n} does, where, for each n, T*_n is equivalent to T_n in the sense that T*_n = φ_n(T_n), where φ_n is strictly increasing. In that case
the levels attained by T_n and T*_n are the same for all s and n, so Theorem 7.2 applied to {T*_n} yields the common exact slope of both sequences. Remark 3. In many examples it is quite easy to verify that (7.5) holds with some b; verification that (7.6) holds with some f is always nontrivial, and §§ 1-5 discuss exactly this problem. The following partial converse of Theorem 7.2 shows that if (somehow) the exact slope of {T_n} is known, then asymptotic estimates of 1 − F_n(n^{1/2} t) become available. Some examples of this method of estimating large deviation probabilities are given in [B11]. THEOREM 7.3. Suppose that (7.3) and (7.5) hold for each θ ∈ Θ_1, where −∞ ≤ b(θ) ≤ ∞ and 0 ≤ c(θ) ≤ ∞. Then, for any t such that
are well-defined,
Proof. Choose and fix t, −∞ < t < ∞, such that f_1(t) and f_2(t) are well-defined by (7.7). Let θ ∈ Θ_1 be a point such that b(θ) > t; by hypothesis, there are such points θ. Now choose and fix an s such that n^{−1} log L_n(s) → −c(θ)/2 and n^{−1/2} T_n(s) → b(θ); (7.3) and (7.5) imply that there are such sequences s. Since T_n > n^{1/2} t implies L_n ≤ 1 − F_n(n^{1/2} t), it follows that this last inequality holds for all sufficiently large n. Hence lim inf_{n→∞} n^{−1} log [1 − F_n(n^{1/2} t)] ≥ −c(θ)/2. Since θ with b(θ) > t is arbitrary, it follows from (7.7) that the first inequality in (7.8) holds. The last inequality in (7.8) is established similarly. The following theorem describes an interesting and useful nonasymptotic property of L_n in the null case. THEOREM 7.4. For each θ ∈ Θ_0 and each n, P_θ(L_n(s) ≤ u) ≤ u for 0 ≤ u ≤ 1. Proof. Suppose that a particular θ ∈ Θ_0 obtains, and consider a particular statistic T_n. Since θ and n are fixed, they are omitted from the notation. We assume that T is real-valued; this involves no loss of generality since any extended real-valued statistic T^0 is equivalent to the bounded statistic tan^{−1} T^0. If F, the d.f. of T, is continuous, then L is uniformly distributed over [0, 1] and P(L ≤ u) = u for 0 ≤ u ≤ 1. To treat the general case, let U be a random variable distributed uniformly over [0, 1], independent of s, and let T* = T*(s, U) = T(s) + αU, where α > 0 is a constant. Then F*, the d.f. of T*, is continuous; hence F*(T*) is uniformly distributed over [0, 1]. Now, for any t, F*(t) = P(T + αU < t) ≥ P(T < t − α) = F(t − α); hence F*(T*) ≥ F(T* − α) ≥ F(T − α) since T* ≥ T and F is nondecreasing. It follows that P(1 − F(T − α) < t) ≤ t for t ≥ 0. Now let α_1, α_2, · · · be a decreasing sequence of positive constants such that α_k → 0. For t ≥ 0 and k = 1, 2, · · · , let A_k(t) be the event that 1 − F(T − α_k) < t. Then P(A_k(t)) ≤ t for each k. Since F is nondecreasing and left-continuous (cf.
(7.1)), A_k(t) ⊂ A_{k+1}(t) for each k, and ∪_k A_k(t) is the event that 1 − F(T) (= L) < t. Consequently,
P(L < t) = lim_{k→∞} P(A_k(t)) ≤ t. Since t ≥ 0 is arbitrary, it now follows easily that P(L ≤ u) ≤ u for 0 ≤ u ≤ 1. It is worthwhile to note that the preceding Theorems 7.1-7.4 are valid for any sample space (S, 𝒜), any set {P_θ : θ ∈ Θ} of probability measures on 𝒜, and any sequence {T_n : n = 1, 2, · · ·} of extended real-valued 𝒜-measurable functions. We conclude this section with a theorem which depends heavily on the present assumptions that s is a sequence of independent and identically distributed observations on x, and that T_n depends on s only through the first n observations. For θ and θ_0 in Θ, let K(θ, θ_0) be defined as in § 6, and let (7.9) define J(θ) = inf{K(θ, θ_0) : θ_0 ∈ Θ_0}. Then 0 ≤ J(θ) ≤ ∞ for all θ, and J(θ) = 0 for θ ∈ Θ_0. In typical cases 0 < J(θ) ≤ ∞ on Θ_1. The following theorem implies that the exact slope of any sequence {T_n} cannot exceed 2J(θ) when θ obtains. THEOREM 7.5. For each θ ∈ Θ,
Proof. Since (7.10) holds trivially if J = ∞, it will suffice to consider points θ for which J(θ) < ∞. Choose and fix such a θ. Let ε > 0 be a constant. It follows from (7.9) that there exists a θ_0 ∈ Θ_0 such that (7.11) holds. With θ and θ_0 fixed, abbreviate K(θ, θ_0) and J(θ) to K and J respectively. Since K < ∞, P_{θ_0} dominates P_θ on (X, ℬ), say dP_θ = r(x) dP_{θ_0}. Then, with r_n(s)
= Π_{i=1}^{n} r(x_i),
and dP_θ^{(n)} = r_n dP_{θ_0}^{(n)} on (X^{(n)}, ℬ^{(n)}). For each n let A_n be the event that L_n < exp(−n[K + 2ε]) and B_n be the event that r_n < exp(n[K + ε]). Then
by Theorem 7.4. It follows from (7.13) that Σ_n P_θ(A_n ∩ B_n) < ∞. It follows hence from (7.12) and the definitions of A_n and B_n, that if θ obtains, then, with probability
one, L_n(s) ≥ exp[−n(K + 2ε)] for all sufficiently large n. Hence the left-hand side of (7.10) is not less than −K − 2ε [P_θ]. It now follows from (7.11) that
Since ε in (7.14) is arbitrary, it follows that (7.10) holds. Remark 4. If a statistic T_n does not have an exact null distribution (cf. (7.1)), the level attained by it is defined to be L_n(s) = 1 − F_n(T_n(s)), where F_n(t) = inf{P_θ(T_n(s) < t) : θ ∈ Θ_0}. It is readily seen that, with F_n and L_n as defined here, Theorems 7.1 through 7.5 are valid for any sequence {T_n}. Notes. This section is based mainly on [B5], [B8], [B9]. Various versions of Theorem 7.5 are given under various regularity assumptions in [B6], [B8] and [B10]; that no assumption whatsoever is required was shown by Raghavachari [R1]. The present proof of Theorem 7.5 is a simplification, suggested by R. Berk and others, of the proof in [R1]. Certain generalizations and refinements of the content of this section are given in [B10] and [B11]. Certain nonasymptotic treatments of the level attained are given in [D1], [J1]. 8. Some examples of exact slopes. Example 8.1. Suppose that X is the real line, and that x is normally distributed with mean θ and variance 1 when θ obtains. The parameter space Θ is [0, ∞) and the null hypothesis is that θ = 0. Consider T_n^{(1)} = n^{−1/2} Σ_{i=1}^{n} x_i, T_n^{(2)}(s) = n^{−1/2} × (the number of indices j with 1 ≤ j ≤ n and x_j > 0), and for n ≥ 2, T_n^{(3)}(s) = T_n^{(1)}/v_n^{1/2}, where v_n = Σ_{i=1}^{n} (x_i − x̄_n)²/(n − 1). T_n^{(3)} might be used by a forgetful statistician who fails to remember that the underlying variance is one. Then T_n^{(i)} satisfies (7.5) with b = b_i, where
where
Since the f_i are continuous it follows that T_n^{(i)} has exact slope c_i(θ), where
Note that c_3/c_1 < 1 for each non-null θ, and that c_3/c_1 → 1 as θ → 0. Short tables of c_2/c_1 and c_2/c_3 are given in [B4]. Example 8.2. Let X be the real line, let Θ be the set of all continuous probability distribution functions θ(x) on X, and let P_θ(B) denote the probability measure on X determined by the d.f. θ. The null hypothesis is that θ = θ_0, where θ_0 is a given continuous p.d.f. For each n let F_n(t) = F_n(t|x_1, · · · , x_n) be the empirical d.f. based on {x_1, · · · , x_n}, and let T_n^{(1)} be the Kolmogorov statistic, i.e., T_n^{(1)}(s) = n^{1/2} sup{|F_n(t) − θ_0(t)| : −∞ < t < ∞}. It follows from the Glivenko-Cantelli theorem that (7.5) holds for T_n^{(1)}, with b(θ) = δ(θ) = sup{|θ(t) − θ_0(t)| : −∞ < t < ∞}; 0 < δ(θ) < 1 for θ ≠ θ_0. It follows from Example 5.3 that T_n^{(1)} satisfies (7.6) with f(t) = g(t), where g is defined by (5.24) and (5.25). Since g is continuous, the exact slope of T_n^{(1)} is c_1(θ) = 2g(δ(θ)). Now consider Kuiper's statistic T_n^{(2)}(s) = n^{1/2}[sup_t{F_n(t) − θ_0(t)} + sup_t{θ_0(t) − F_n(t)}]. It follows from Example 5.3, exactly as in the preceding paragraph, that T_n^{(2)} has exact slope c_2(θ) = 2g(δ^+(θ) + δ^−(θ)), where δ^+(θ) = sup_t{θ(t) − θ_0(t)} and δ^−(θ) = sup_t{θ_0(t) − θ(t)}. Since δ^+ + δ^− ≥ δ, with equality if and only if δ^+ = 0 or δ^− = 0, and since g is strictly increasing, it follows that c_1(θ)/c_2(θ) ≤ 1, with equality if and only if δ^+ = 0 or δ^− = 0. It follows from Lemma 5.1 that c_1/c_2 =
Then (7.5) holds, with b(θ) = δ(θ), where
The range of δ as θ varies over Θ_1 = Θ − {p} is (0, γ_1), where γ_1 = max{p_i, 1 − p_i : 1 ≤ i ≤ k}. Let Λ and V_n = n^{−1}(f_{1n}, · · · , f_{kn}) be as in Example 5.4, and let K(v, p) be defined for v ∈ Λ as in Example 5.4. Consider a t ∈ (0, γ_1). Then the event T_n^{(1)} ≥ n^{1/2} t is identical with the event V_n ∈ A_t, where

Since δ is continuous in v, A_t is the closure of its interior, so A_t is p-regular, by Lemma 5.2. Consequently (7.6) holds for T_n^{(1)}, with f(t) = K(A_t, p). It is readily seen that f is continuous; it follows that {T_n^{(1)}} has exact slope c_1(θ) = 2K(A_{δ(θ)}, p) when θ obtains.
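The rate function K(v, p) here is the Kullback-Leibler information of the multinomial, and the large-deviation identity underlying (7.6) can be checked exactly in the binomial (k = 2) special case of Example 1.2. The sketch below is our own illustration, not part of the text: it compares the exact tail rate n^{−1} log P(f_{1n} ≥ nt) with −K((t, 1 − t), (p, 1 − p)).

```python
import math

def kl_binomial(t, p):
    """K((t, 1-t), (p, 1-p)) = t*log(t/p) + (1-t)*log((1-t)/(1-p))."""
    return t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))

def tail_rate(n, t, p):
    """n^{-1} log P(Bin(n, p) >= n*t), computed exactly via a log-sum-exp
    over the log binomial probabilities (lgamma avoids huge factorials)."""
    logs = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log(1 - p)
            for j in range(math.ceil(n * t), n + 1)]
    m = max(logs)
    return (m + math.log(sum(math.exp(x - m) for x in logs))) / n

n, t, p = 400, 0.5, 0.3
print(tail_rate(n, t, p), -kl_binomial(t, p))   # the two rates agree as n grows
```

The exact rate always lies below −K (Chernoff's bound P(Bin(n, p) ≥ nt) ≤ exp(−nK) holds for every n), and the gap is of order n^{−1} log n.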
Now for each n let T_n^{(2)}(s) = n^{1/2} K(V_n, p). Since V_n(s) → θ [P_θ] and since K(v, p) is continuous in v, T_n^{(2)} satisfies (7.5) with b(θ) = K(θ, p). The range of this b over Θ_1 is (0, γ_2), where γ_2 = max{log(1/p_i) : 1 ≤ i ≤ k}. Consider a t ∈ (0, γ_2). The event T_n^{(2)} ≥ n^{1/2} t is identical with the event V_n ∈ B_t, where
B_t is p-regular, and K(B_t, p) = t; hence (7.6) holds for T_n^{(2)} with f(t) = t. Thus {T_n^{(2)}} has exact slope c_2(θ) = 2K(θ, p) when θ obtains. This conclusion is available also from Theorem 10.1. Now, it is plain from (8.5) that, for each θ ∈ Θ, A_{δ(θ)} contains θ. Hence K(A_{δ(θ)}, p) ≤ K(θ, p); hence
This conclusion is available also from Theorem 7.5, since J(θ) = K(θ, p) in the present case. Let E be the set of all θ ≠ p such that c_1(θ) = c_2(θ). The set E is not known at present, but it can be shown that θ = (θ_1, · · · , θ_k) ∈ E implies that there exist α and β, 0 < α < 1 < β < ∞, such that θ_i/p_i = α or β for each i. E is therefore one-dimensional (or zero-dimensional), whatever k may be. Note. Examples 8.1, 8.2, and 8.3 are based on [B4], [A2], and [A1] respectively (cf. [A1], [B11], [H1], [H2], [K2]-[K4], [R3], [W3] for other examples). 9. The existence and consistency of maximum likelihood estimates. In this section and the following one it is assumed that X is, or may be taken to be, a finite-dimensional Euclidean space, and that ℬ is the field of Borel sets of X. 𝒫 is a given set of probability measures P on ℬ, to be thought of as the possible distributions of x. It is assumed that 𝒫 is endowed with the topology of weak convergence. s = (x_1, x_2, · · · ad inf) denotes an infinite sequence of independent replicates of x. Suppose for the moment that, as in previous sections, there is given a parametrization of 𝒫, say 𝒫 = {P_θ : θ ∈ Θ}, and that g(θ) is a real-valued functional on Θ. For each n let θ̂_n denote the maximum likelihood (m.l.) estimate of θ when the data is (x_1, · · · , x_n). Then the m.l. estimate of g is g(θ̂_n). In particular, for any B ∈ ℬ, the m.l. estimate of P_θ(B) is P_{θ̂_n}(B) = Q_n(B), say; Q_n is, of course, a probability measure in the set 𝒫. It is thus seen that the m.l. method always estimates the entire underlying distribution from given data. Since successful estimation of the entire underlying distribution is the maximum of objectives attainable by any statistical method, it is of interest to enquire whether the m.l. estimated distribution is consistent, i.e., if some P in 𝒫 obtains, then Q_n → P with probability one.
According to this viewpoint, the consistency of g(θ̂_n) for a given g is a subsidiary issue governed almost entirely by such questions as whether g is identifiable, i.e., a functional on 𝒫, and if so whether this functional is continuous on 𝒫. It seems reasonable not to confound such nonstochastic questions with the consistency problem, so we dispense with parametrization; more precisely, we take P itself to be the unknown parameter and 𝒫 to be the parameter space.
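The finite case treated first below is simple enough to state as code. In this sketch (a toy of ours, not from the text) 𝒫 is three Bernoulli laws on X = {0, 1}; the m.l. estimate maximizes l_n(P_i|s), and since n^{−1} log[l_n(P_j|s)/l_n(P_i|s)] → −K(P_i, P_j) < 0 under P_i, the true member wins for all large n with probability one.

```python
import math
import random

family = [0.5, 0.9, 0.2]    # a finite set {P_1, P_2, P_3}: the values P_i(x = 1)

def mle_index(xs):
    """Index maximizing the log-likelihood l_n(P_i | s) over the finite family."""
    ones = sum(xs)
    zeros = len(xs) - ones
    def loglik(p):
        return ones * math.log(p) + zeros * math.log(1 - p)
    return max(range(len(family)), key=lambda i: loglik(family[i]))

rng = random.Random(3)
true_index = 0              # suppose P_1 obtains
xs = [1 if rng.random() < family[true_index] else 0 for _ in range(500)]
print(mle_index(xs))        # selects the true member once n is large
```

The probability of a wrong selection decays exponentially in n, at a rate governed by the pairwise informations K(P_i, P_j), which is exactly the mechanism of the finite-case argument below.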
It is assumed that 𝒫 is a dominated set, i.e., there exists a σ-finite measure μ, and a family {f_P : P ∈ 𝒫} of ℬ-measurable functions f_P, 0 ≤ f_P < ∞, such that, for each P ∈ 𝒫,
Let there be given a μ and a family {f_P : P ∈ 𝒫} such that (9.1) holds; μ and {f_P} remain fixed throughout this section and the following one. For each n and s let
Suppose for the moment that 𝒫 is a finite set, say {P_1, · · · , P_m}, 1 < m < ∞. For each n and s let Q_n be a measure in 𝒫 such that l_n(Q_n|s) = max{l_n(P_i|s) : 1 ≤ i ≤ m}. Suppose that a particular P_i obtains, 1 ≤ i ≤ m. Then 0 < l_n(P_i|s) < ∞ for all n [P_i]. It follows hence that n^{−1} log[l_n(P_j|s)/l_n(P_i|s)] = r_n(i, j; s), say, is well-defined for each j and n [P_i], and that
Since K(P_i, P_j) > 0 for i ≠ j, it follows from (9.3) that l_n(P_i|s) > max{l_n(P_j|s) : 1 ≤ j ≤ m, j ≠ i} for all sufficiently large n [P_i]; hence
It is thus seen that m.l. estimates always exist and are consistent in the finite case. The basic idea of Wald's famous proof of consistency [W2] is to reduce the general case to the finite case by some compactification device. The following is a somewhat hyperbolic description of Wald's beautiful argument. A compact space is essentially a finite space. If 𝒫 is compact, or can be compactified in a suitable way, and certain integrability conditions are satisfied, 𝒫 is nearly finite; hence Q_n is nearly consistent. But Q_n is either consistent or inconsistent; so Q_n is consistent. We proceed to formulate some sufficient conditions for the existence and consistency of m.l. estimates. Let ℳ denote the set of all measures M on the Borel field ℬ of sets of X such that M(X) ≤ 1. For any sequence {M_j : j = 0, 1, 2, · · ·} in ℳ let us say that M_j → M_0 as j → ∞ if and only if, for each real-valued continuous function h on X with compact support, ∫_X h(x) dM_j → ∫_X h(x) dM_0. It can be shown by standard methods (cf. [B14]) that, with this definition of convergence, ℳ becomes a metrizable and compact topological space. Let d be a distance function on ℳ × ℳ such that, for any sequence {M_j : j = 0, 1, · · ·} in ℳ, M_j → M_0 if and only if d(M_j, M_0) → 0. It should be noted that if M_0, M_1, · · · are all probability measures, then d(M_j, M_0) → 0 if and only if M_j → M_0 weakly. It is not necessary to specify d; indeed, having a specific d on hand is often a handicap in examples. Now let 𝒫̄ be the closure in ℳ of the given set 𝒫 of probability measures P. 𝒫̄ is a compact set. For any M ∈ 𝒫̄ and any constant r, 0 < r < ∞, let
Then g_M is nondecreasing in r for each fixed x. Let
We shall say that 𝒫̄ is a suitable compactification of 𝒫 if, for each M ∈ 𝒫̄, g_M(x, r) defined by (9.5) is ℬ-measurable for all r > 0, and y_M defined by (9.6) satisfies
Condition A. 𝒫̄ is a suitable compactification of 𝒫. It should be noted that this is a condition not only on the set 𝒫 but also on the particular version f_P of dP/dμ which is in force for each P in 𝒫. Some of the additional conditions stated below also involve the given family {f_P : P ∈ 𝒫} of density functions. This is not inappropriate since the very definition of m.l. estimates presupposes that a family of density functions is given. It is readily seen that Condition A is independent of the metric d, i.e., it holds with some choice of d if and only if it holds for every choice, and the same is true of the other conditions of this section. If M is in 𝒫, it follows from (9.5) and (9.6) that y_M(x) ≥ f_M(x); hence y_M(x) = f_M(x) [μ] by (9.7), so y_M is necessarily a version of dM/dμ. However, if M is in 𝒫̄ − 𝒫, y_M is not necessarily a version of dM/dμ; in fact there are simple examples in which Condition A holds but 𝒫̄ is not even dominated by μ or any other σ-finite measure. Let l(𝒫|x) = sup{f_P(x) : P ∈ 𝒫}. Since l(𝒫|x)
Condition C. If M is a measure in 𝒫̄ − 𝒫 and P is a measure in 𝒫, then
Condition D. 𝒫̄ − 𝒫 is a closed set. Condition E. For each P in 𝒫, y_P(x) = f_P(x) for all x in X. Now for each n and s let
where l_n(P|s) is given by (9.2). Then 0 ≤ l_n(𝒫|s) ≤ ∞. It is not necessary to assume that l_n(𝒫|s) is an 𝒜-measurable function. Let c be a constant, 0 < c < 1. A measure Q is an approximate m.l. estimate (in the sense of [W2]) when the sample consists of (x_1, · · · , x_n) if Q ∈ 𝒫 and l_n(Q|s) ≥ c·l_n(𝒫|s). For each n and s let 𝒫*_n be the (possibly empty) set of approximate m.l. estimates based on (x_1, · · · , x_n). Note that 𝒫*_n depends on s and c; so write 𝒫*_n = 𝒫*_n(s; c). Q ∈ 𝒫 is an m.l. estimate based on
(x_1, · · · , x_n) if l_n(Q|s) = l_n(𝒫|s). Let 𝒫̂_n(s) denote the (possibly empty) set of all m.l. estimates based on (x_1, · · · , x_n). It is plain that
for all n and s. Suppose now that a given P ∈ 𝒫 obtains. In the following theorems (and in the following section) the phrase "with probability one" preceding a statement means that there exists an 𝒜-measurable set S_P of sequences s with P(S_P) = 1 such that the statement is true for each s in S_P. If {ℒ_n : n = 1, 2, · · ·} is a sequence of subsets of 𝒫̄, ℒ_n → P means that sup{d(Q, P) : Q ∈ ℒ_n} → 0. THEOREM 9.1. If Conditions A, B, and C are satisfied, then with probability one 𝒫*_n(s; c) is nonempty for every n and 𝒫*_n(s; c) → P as n → ∞. It follows from (9.11) that 𝒫*_n → P implies 𝒫̂_n → P provided that 𝒫̂_n is nonempty for all sufficiently large n. THEOREM 9.2. If Conditions A-E are satisfied, then with probability one 𝒫̂_n(s) is nonempty for all sufficiently large n and 𝒫̂_n(s) → P as n → ∞. The proofs of Theorems 9.1 and 9.2 are along the lines of the proof on pp. 320-321 of [B9], with Θ of the latter proof identified with 𝒫̄. The above theorems assert consistency in the sense of weak convergence. However, in many examples, if {Q_n : n = 0, 1, 2, · · ·} is a sequence in 𝒫 such that d(Q_n, Q_0) → 0, then lim_{n→∞} f_{Q_n}(x) = f_{Q_0}(x) [μ]. It follows from Scheffé's theorem that in such examples weak convergence is equivalent to convergence in the stronger sense of the distance function S(P_1, P_2) = sup{|P_1(B) − P_2(B)| : B ∈ ℬ}. The present regularity assumptions are by no means necessary. For a recent discussion, and for an account of various ways of weakening such conditions, see [P1]. It is unlikely, however, that general theorems can be formulated which will render ad hoc methods for various special cases and examples superfluous. An important special case is treated in [B13]. We conclude this section with examples which show that Conditions A, B, C, D and E are independent conditions and that, in the general case, each one of these is indispensable. Example 9.1. Let X be the interval (0, 1], and let μ be Lebesgue measure.
For any positive integer k, let us call the intervals (i/2^k, (i + 1)/2^k] for i = 0, 1, · · · , 2^k − 1 dyadic intervals of rank k. Let 𝒫 be the set of all probability measures P which are of the form dP = f(x) dμ, where f satisfies the following condition: There exists a positive integer k, and 2^{k−1} dyadic subintervals of X of rank k, such that f = 2 on these subintervals and f = 0 elsewhere on X. Then 𝒫 is a countable set, and 𝒫̂_n(s) is nonempty and everywhere dense in 𝒫 for each s and n. Consequently, given any Q ∈ 𝒫, there exists Q_n(·|s) ∈ 𝒫̂_n(s) such that Q_n(·|s) → Q(·) for every s. It is possible to define Q_n(·|s) such that Q_n(·|s) ∈ 𝒫̂_n(s) for every s and n, and such that if some P in 𝒫 obtains, then Q_n(·|s) → P with probability one. Example 9.2. Let X = {0, 1, 2, · · · ad inf}, let ℬ be the field of all sets, and let μ be counting measure. Let 𝒫 = {P_0, P_1, P_2, · · · ad inf}, where dP_k = f_k(x) dμ, and
f_k is defined as follows. Let a ≥ 27 be a constant, and let
for j = 1, 2, · · · . Next, let
where the constant b is so chosen that
Now let
and for k = 1, 2, • • • let
In this example, 𝒫̂_n(s) is nonempty for every s and n, and no matter what P ∈ 𝒫 obtains, 𝒫̂_n(s) → P_0 with probability one. Incidentally, with probability one, P_0 ∉ 𝒫̂_n(s) for all sufficiently large n, even if P_0 itself obtains. Example 9.3. Suppose X = (0, ∞), let μ be Lebesgue measure on X, and let 𝒫 consist of the uniform measure on (0, 1], U say, and all measures P of the form dP = f(x) dμ, where f satisfies the following condition: There exists a positive integer k, and k dyadic subintervals of (0, 1] of rank k, such that f = 1 on these subintervals and on (k, k + 1 − k·2^{−k}], and f = 0 elsewhere on X. In this example 𝒫 is again countable, and 𝒫̄ consists of 𝒫 and the measure which is identically zero, N say. 𝒫̂_n(s) is nonempty and sup{d(Q, U) : Q ∈ 𝒫̂_n(s)} ≥ d(N, U) > 0 for every n and s = (x_1, x_2, · · ·) with 0 < x_i ≤ 1 for all i, so there is inconsistency when U obtains. Example 9.4. Suppose X = {0, 1}, Θ is the set of all irrationals in (0, 1), and 𝒫 = {P_θ : θ ∈ Θ}, where P_θ(x = 1) = θ, P_θ(x = 0) = 1 − θ. Then 𝒫̂_n(s) is empty for every s and n. Example 9.5. Suppose X = (0, ∞), 𝒫 = {P_θ : 0 < θ < ∞}, dP_θ = f_θ(x) dx, where f_θ(x) = 1/θ for 0 < x < θ and f_θ(x) = 0 for x ≥ θ. Then 𝒫̂_n(s) is empty for every s and n. If we redefine f_θ(x) to be 1/θ for 0 < x ≤ θ and f_θ(x) = 0 for x > θ, then Conditions A-E are satisfied. Note. This section is based mainly on [W2] and partly on [B1], [B8], [B9], [K1], and [R2] (cf. [P1] for further discussion and additional references).
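Example 9.5 is the uniform scale family, and the role of the chosen density version is easy to see computationally: with f_θ(x) = 1/θ on (0, θ] the likelihood θ^{−n}·1{max_i x_i ≤ θ} is maximized at θ̂_n = max_i x_i, whereas with the open interval (0, θ) no maximizer exists, since any admissible θ must strictly exceed max_i x_i. A sketch (our own illustration) of the closed-interval version:

```python
import random

def uniform_mle(xs):
    """m.l. estimate for f_theta(x) = 1/theta on (0, theta] (closed version):
    theta^{-n} * 1{max(xs) <= theta} is maximized at theta = max(xs)."""
    return max(xs)

rng = random.Random(7)
theta = 1.0
for n in (10, 100, 1000):
    xs = [theta * rng.random() for _ in range(n)]
    est = uniform_mle(xs)
    print(n, round(est, 4))   # est <= theta always, and est approaches theta
```

The estimate is biased below θ but consistent, with θ − θ̂_n of order 1/n; this is the standard nonregular example in which the rate is faster than n^{−1/2}.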
SOME LIMIT THEOREMS IN STATISTICS
10. The exact slope of the likelihood ratio statistic. In this section we consider the framework $(X, \mathscr{B}, \mathscr{P}, \mu, \{f_P : P \in \mathscr{P}\})$ described at the beginning of §9. Let $\mathscr{P}_0$ be a given nonempty proper subset of $\mathscr{P}$ and consider the null hypothesis that some $P$ in $\mathscr{P}_0$ obtains. Let $\mathscr{P}_1 = \mathscr{P} - \mathscr{P}_0$, and for each $s$ and $n$ let $l_n(\mathscr{P}_i|s)$ be defined by (9.2) and (9.10) with $\mathscr{P}$ replaced by $\mathscr{P}_i$, $i = 0, 1$. Let

$T_n(s) = n^{-1} \log[\,l_n(\mathscr{P}_1|s)/l_n(\mathscr{P}_0|s)\,]$ (10.1)

with the convention that $0/0 = \infty/\infty = 1$. Then $T_n$ is well-defined, $-\infty \le T_n \le \infty$. It is assumed that, for each $n$, $T_n$ is $\mathscr{B}^{(n)}$-measurable. It is not assumed that $T_n$ has an exact null distribution. Accordingly, the level attained is defined to be

$L_n(s) = 1 - F_n(T_n(s)),$ (10.2)

where

$F_n(t) = \inf\{P^{(n)}(T_n < t) : P \in \mathscr{P}_0\}.$ (10.3)

For any $P$ and $Q$ in $\mathscr{P}$, let

$K(Q, P) = \int_X \log[\,f_Q(x)/f_P(x)\,]\,dQ.$ (10.4)
It is readily seen that the integral in (10.4) is always well-defined and that $K$ equals the $K$ of §4 with $p = P$ and $q = Q$. Let

$J(Q) = \inf\{K(Q, P) : P \in \mathscr{P}_0\}.$ (10.5)

In this section we formulate general conditions under which the exact slope of $\{T_n\}$ is $2J(Q)$, i.e., under which, for each $Q \in \mathscr{P}_1$, $n^{-1} \log L_n(s) \to -J(Q)$ as $n \to \infty$ $[Q]$. According to Theorem 7.5, $2J(Q)$ is the maximum available exact slope when $Q$ obtains. The idea underlying these conditions is the one used in the preceding section; specifically, if $\mathscr{P}_0$ and $\mathscr{P}_1$ are both finite sets, then $\{T_n\}$ does have exact slope $2J(Q)$ against each $Q \in \mathscr{P}_1$ (cf. [B9], pp. 315-316); the general case is reduced to the finite case by a compactification device. Let $\mathscr{M}$ be the set of measures $M$ on $X$ with $M(X) \le 1$, let $\mathscr{M}$ be topologized as in §9, and choose and fix a distance function $d$. Let $\bar{\mathscr{P}}_0$ be the closure of $\mathscr{P}_0$ in $\mathscr{M}$.

Assumption 10.1. $\bar{\mathscr{P}}_0$ is a suitable compactification of $\mathscr{P}_0$. Under this assumption, if $M$ is a measure in $\bar{\mathscr{P}}_0$, then $g_M^0(x, r)$, defined by (10.6), is $\mathscr{B}$-measurable for each $r > 0$, $0 \le g_M^0 \le \infty$, and with (10.7) we have (10.8).
For $Q \in \mathscr{P}_1$ and $M \in \bar{\mathscr{P}}_0$, let $K^*(Q, M)$ be defined by (10.9).
It follows from (10.8) that $K^*$ is well-defined and $0 \le K^* \le \infty$. Since $P \in \mathscr{P}_0$ implies $g_P^0(x) = f_P(x)$ $[\mu]$, it follows from (10.4) and (10.9) that $K^*$ is an extension of the function $K$ on $\mathscr{P}_1 \times \mathscr{P}_0$ to a function on $\mathscr{P}_1 \times \bar{\mathscr{P}}_0$.

Assumption 10.2. For each $Q \in \mathscr{P}_1$, $J(Q) = \inf\{K^*(Q, M) : M \in \bar{\mathscr{P}}_0\}$. Since $K^*$ is an extension of $K$, it is plain from (10.5) that this assumption is automatically satisfied if $\bar{\mathscr{P}}_0 - \mathscr{P}_0$ is empty, or if $K^*(Q, M) = \infty$ for $Q \in \mathscr{P}_1$ and $M \in \bar{\mathscr{P}}_0 - \mathscr{P}_0$.

Let $f(\mathscr{P}_0|x)$ be the supremum of $f_P(x)$ over $\mathscr{P}_0$. Assumption 10.1 implies that this supremum is $\mathscr{B}$-measurable.

Assumption 10.3. For each $Q \in \mathscr{P}_1$, $E_Q(\log^+[\,f(\mathscr{P}_0|x)/f_Q(x)\,]) < \infty$.

Now let $\bar{\mathscr{P}}_1$ be the closure of $\mathscr{P}_1$ in $\mathscr{M}$. Let $M$ be a measure in $\bar{\mathscr{P}}_1$ and for $r > 0$ let $g_M(x, r)$ be defined by (10.10).

Assumption 10.4. Given $\varepsilon$ and $\tau$, $\varepsilon > 0$ and $0 < \tau < 1$, and $M \in \bar{\mathscr{P}}_1$, there exists an $r = r(\varepsilon, \tau, M) > 0$ such that $g_M(x, r)$ is $\mathscr{B}$-measurable and (10.11) holds.
This assumption is perhaps the most troublesome one to verify in examples. A simple sufficient condition for the validity of the assumption is that $\bar{\mathscr{P}}_1$ be a suitable compactification of $\mathscr{P}_1$ and that for each $M \in \bar{\mathscr{P}}_1$ there exist an $r = r(M) > 0$ such that (10.12) holds.
(Cf. [B8], pp. 22-23.)

THEOREM 10.1. Suppose that Assumptions 10.1-10.4 hold. Then (i) $T_n(s) \to J(Q)$ as $n \to \infty$ $[Q]$ for each $Q \in \mathscr{P}_1$, (ii) $n^{-1} \log L_n(s) \to -J(Q)$ as $n \to \infty$ $[Q]$ for each $Q \in \mathscr{P}_1$, and (iii) for each $t$ in the interior of the set $\{J(Q) : Q \in \mathscr{P}_1\}$, $n^{-1} \log[1 - F_n(t)] \to -t$ as $n \to \infty$.

Proof. Choose a $Q \in \mathscr{P}_1$ and suppose that $Q$ obtains. We first show that, whether $J(Q)$ is finite or not,

$\liminf_{n \to \infty} T_n(s) \ge J(Q)$ with probability one. (10.13)
Let $a$ and $b$ be positive constants, and let $H(Q)$ be defined by (10.14). According to (10.1), $T_n(s) \ge n^{-1} \log[\,l_n(Q|s)/l_n(\mathscr{P}_0|s)\,]$. It will therefore suffice to show that, with probability one, (10.15) holds.
Let $M$ be a point in $\bar{\mathscr{P}}_0$, let $g_M^0$ be defined by (10.6), and let $Y(x) = \log[\,g_M^0(x, r)/f_Q(x)\,]$. It follows from Assumption 10.1 that $Y$ is a well-defined $[Q]$ extended real-valued random variable. It follows from Assumption 10.3 that $m = E_Q(Y)$ is well-defined, $-\infty \le m < \infty$, and that $m \to -K^*(Q, M)$ as $r \to 0$, where $K^*$ is given by (10.9). Since $-K^*(Q, M) \le -J(Q) < H(Q)$ by Assumption 10.2 and (10.14), $m < H(Q)$ for all sufficiently small $r$. Now choose $r = r(M, Q, a, b) > 0$ so that $m < H(Q)$, and let $\mathscr{N} = \{N : N \in \mathscr{M}, d(M, N) < r\}$. Then $l_n(\mathscr{N} \cap \mathscr{P}_0|s) \le \prod_{i=1}^n g_M^0(x_i, r)$. Since $0 < l_n(Q|s) < \infty$ $[Q]$, it follows that $n^{-1} \log[\,l_n(\mathscr{N} \cap \mathscr{P}_0|s)/l_n(Q|s)\,] \le n^{-1} \sum_{i=1}^n Y(x_i)$ with probability one. Hence, with probability one, (10.16) holds.
Thus corresponding to each $M \in \bar{\mathscr{P}}_0$ there exists a spherical neighborhood of $M$ in the space $\mathscr{M}$, say $\mathscr{N}(M)$, such that (10.16) holds with probability one. Since
$\bar{\mathscr{P}}_0$ is compact, there exist open sets $\mathscr{N}^1, \cdots, \mathscr{N}^k$ in $\mathscr{M}$ such that $\bar{\mathscr{P}}_0 \subset \bigcup_{j=1}^k \mathscr{N}^j$ and such that, with probability one, (10.16) holds with $\mathscr{N} = \mathscr{N}^j$ for each $j$. Since $\mathscr{P}_0 = \bigcup_{j=1}^k (\mathscr{N}^j \cap \mathscr{P}_0)$, it follows that $l_n(\mathscr{P}_0|s) = \max\{l_n(\mathscr{N}^j \cap \mathscr{P}_0|s) : 1 \le j \le k\}$. It now follows that (10.15) holds with probability one. Thus (10.13) is established.

Now choose $\varepsilon > 0$ and $\tau$, $0 < \tau < 1$. We shall show that there exists a positive integer $k = k(\varepsilon, \tau)$ such that, with $F_n$ defined by (10.3),

$1 - F_n(t) \le k[h(t)]^n$, where $h(t) = (1 + \varepsilon)\,e^{-\tau t}$, (10.17)
for all $n = 1, 2, \cdots$ and all $t$, $-\infty \le t \le \infty$. It follows from the compactness of $\bar{\mathscr{P}}_1$ and Assumption 10.4 that there exists a finite set, $M_1, \cdots, M_k$ say, of points in $\bar{\mathscr{P}}_1$ and spherical neighborhoods of these points, say $\mathscr{N}^j = \{N : N \in \mathscr{M}, d(M_j, N) < r_j\}$ for $j = 1, \cdots, k$, such that $\bar{\mathscr{P}}_1 \subset \bigcup_{j=1}^k \mathscr{N}^j$ and such that (10.11) holds with $M = M_j$ and $r = r_j > 0$ for each $j = 1, \cdots, k$. Consider a $P$ in $\mathscr{P}_0$ and a $t$, $-\infty < t < \infty$, and for
each $j$ let $Y^{(j)}(x) = \log[\,g_{M_j}(x, r_j)/f_P(x)\,] - t$. Then $Y^{(j)}$ is well-defined $[P]$ and $P(-\infty \le Y^{(j)} < \infty) = 1$. Let $\varphi^{(j)}(u) = E_P(\exp(uY^{(j)}))$. For any $n$ write $Z_n^{(j)} = \sum_{i=1}^n Y^{(j)}(x_i)$. It follows from an extension of Theorem 2.1 to extended random variables that $P(Z_n^{(j)} \ge 0) \le [\varphi^{(j)}(\tau)]^n \le [h(t)]^n$, by (10.11) and the definition of $h$ in (10.17). Hence $P(\max\{Z_n^{(j)} : 1 \le j \le k\} \ge 0) \le k[h(t)]^n$. However,
by (10.1) and (10.10). Hence $P(T_n(s) \ge t) \le k[h(t)]^n$. Since $P \in \mathscr{P}_0$ is arbitrary, it follows from (10.3) that (10.17) holds for all $n$ and all finite $t$. That (10.17) holds for $t = -\infty$ is trivially true. Since $1 - F_n(\infty) \le 1 - F_n(t)$ for all finite $t$, and since $h(\infty) = 0$, it follows by letting $t \to \infty$ through finite values that (10.17) holds also for $t = \infty$. Thus (10.17) is established for all $n$ and $t$.
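The step $P(Z_n^{(j)} \ge 0) \le [\varphi^{(j)}(\tau)]^n$ is the exponential (Chernoff-type) bound of Theorem 2.1: for any $0 < \tau < 1$, $P(Z_n \ge 0) \le [E\exp(\tau Y)]^n$ when the $Y_i$ are independent replicates of $Y$ with $E(Y) < 0$. A minimal numerical sketch, using a two-point distribution of our own choosing (not from the text), checks the bound against the exact tail probability:

```python
import math

# Y = +1 with probability p, -1 with probability 1 - p, so E(Y) < 0 for p < 1/2.
# Z_n = Y_1 + ... + Y_n.  The bound: P(Z_n >= 0) <= [phi(tau)]^n for any 0 < tau < 1,
# where phi(tau) = E exp(tau * Y).  Values here are illustrative only.
p, n, tau = 0.3, 20, 0.5
phi = (1 - p) * math.exp(-tau) + p * math.exp(tau)

# Exact tail: Z_n >= 0 iff at least n/2 of the Y_i equal +1.
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
            for j in range(math.ceil(n / 2), n + 1))
bound = phi ** n
assert exact <= bound < 1
```

In the proof above the role of $\varphi^{(j)}(\tau) \le h(t)$ is played by (10.11), which controls $E_P[(g_{M_j}/f_P)^\tau]$ uniformly over the null family; the sketch only illustrates the underlying moment-generating-function inequality.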
It follows from (10.2) and (10.17) that $L_n(s) \le k(1 + \varepsilon)^n \exp[-n\tau T_n(s)]$ for all $n$ and $s$. Hence

$n^{-1} \log L_n(s) \le n^{-1} \log k + \log(1 + \varepsilon) - \tau T_n(s)$ (10.18)

for all $n$, and

$T_n(s) \le [\,n^{-1} \log k + \log(1 + \varepsilon) - n^{-1} \log L_n(s)\,]/\tau$ (10.19)

for every $s$. It follows from (10.13) and (10.18) that

$\limsup_{n \to \infty} n^{-1} \log L_n(s) \le \log(1 + \varepsilon) - \tau J(Q)$ with probability one. (10.20)
Since $\varepsilon$ and $\tau$ are arbitrary, it follows from (10.20) that, whether $J(Q)$ is finite or not, the left-hand side of (10.20) does not exceed $-J(Q)$ $[Q]$. It now follows from Theorem 7.5 applied to $\{T_n\}$ that $n^{-1} \log L_n(s) \to -J(Q)$ $[Q]$. This conclusion and (10.19) imply that, for each $\varepsilon$ and $\tau$, $\limsup_{n \to \infty} T_n(s) \le [\log(1 + \varepsilon) + J(Q)]/\tau$ $[Q]$. Hence $\limsup_{n \to \infty} T_n(s) \le J(Q)$ $[Q]$. We now see from (10.13) that $T_n(s) \to J(Q)$ $[Q]$. This establishes parts (i) and (ii) of Theorem 10.1; part (iii) follows from parts (i) and (ii) by Theorem 7.3.

Remark 1. Suppose $\mathscr{P}_0$ and $\mathscr{P}_1$ are finite sets. Then all four assumptions of this section hold. Note that part (iii) of Theorem 10.1 is vacuous whenever $\mathscr{P}_1$ is finite.

Remark 2. Suppose $X$ is a finite subset of the real line, say $X = \{1, \cdots, k\}$, $k \ge 2$, $\mathscr{B}$ is the class of all subsets of $X$, and $\mathscr{P} = \mathscr{P}_0 \cup \mathscr{P}_1$, where $\mathscr{P}_0$ and $\mathscr{P}_1$ are nonempty disjoint sets of probability measures on $\mathscr{B}$. In the notation of Example 8.3, let us identify any probability $P$ on $\mathscr{B}$ with the point $v = (v_1, \cdots, v_k)$ in $\Delta$ such that $P(x = i) = v_i$ for $i = 1, \cdots, k$. Let $\Delta_0$ and $\Delta_1$ be the subsets of $\Delta$ corresponding to $\mathscr{P}_0$ and $\mathscr{P}_1$. With $V_n$ the vector of relative frequency counts based on $(x_1, \cdots, x_n)$, let $U_n^{(i)}(s) = \inf\{K(V_n, v) : v \in \Delta_i\}$ for $i = 0, 1$. Then $T_n(s) = U_n^{(0)}(s) - U_n^{(1)}(s)$. It seems difficult to find the exact slope of $\{T_n\}$ directly for arbitrary $\Delta_0$ and $\Delta_1$, but it is readily seen that Assumptions 10.1-10.4 are always satisfied.

Remark 3. It can be shown by examples (manufactured from the frameworks of Examples 9.1, 9.2 and 9.3) that Assumptions 10.1-10.4 are independent assumptions and that each one is required in the general case. Cf. [B12].

Remark 4. $T_n$ is sometimes defined by (10.1) with $\mathscr{P}_1$ replaced by $\mathscr{P}$. It is readily seen that Theorem 10.1 remains valid in this case, provided Assumption 10.4 is strengthened by replacing $\bar{\mathscr{P}}_1$ with $\bar{\mathscr{P}}$ and $g_M$ with the $g_M$ of §9 in the statement of Assumption 10.4.

Note. This section is based on [B8].
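In the finite setting of Remark 2 the statistic is directly computable. The sketch below is illustrative: the particular $\Delta_0$, $\Delta_1$, and counts are hypothetical, and both sets are taken finite so that the infima defining $U_n^{(0)}$ and $U_n^{(1)}$ are minima.

```python
import math

def K(v, w):
    # Kullback-Leibler information K(v, w) = sum_i v_i log(v_i / w_i),
    # with the conventions 0 log 0 = 0 and K = +inf if w_i = 0 < v_i.
    if any(wi == 0 and vi > 0 for vi, wi in zip(v, w)):
        return float("inf")
    return sum(vi * math.log(vi / wi) for vi, wi in zip(v, w) if vi > 0)

def T_n(counts, Delta0, Delta1):
    # T_n(s) = U_n^{(0)}(s) - U_n^{(1)}(s), where
    # U_n^{(i)}(s) = inf{K(V_n, v) : v in Delta_i} and V_n is the vector
    # of relative frequencies of the sample.
    n = sum(counts)
    V = [c / n for c in counts]
    U0 = min(K(V, v) for v in Delta0)
    U1 = min(K(V, v) for v in Delta1)
    return U0 - U1

# Hypothetical finite null and alternative sets on a 3-point sample space:
Delta0 = [(1/3, 1/3, 1/3)]
Delta1 = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5)]
counts = (9, 6, 5)          # V_n = (0.45, 0.30, 0.25)
t = T_n(counts, Delta0, Delta1)
assert t > 0  # V_n is closer, in K-information, to Delta_1 than to Delta_0
```

A positive value of $T_n$ indicates that the empirical distribution $V_n$ is better explained by the alternative set, in agreement with the likelihood-ratio interpretation of (10.1).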
REFERENCES

[A1] I. G. ABRAHAMSON, On the stochastic comparison of tests of hypotheses, Doctoral dissertation, University of Chicago, 1965.
[A2] ———, The exact Bahadur efficiencies for the Kolmogorov-Smirnov and Kuiper one- and two-sample statistics, Ann. Math. Statist., 38 (1967), pp. 1475-1490.
[B1] R. R. BAHADUR, Examples of inconsistency of maximum likelihood estimates, Sankhya, 20 (1958), pp. 207-210.
[B2] ———, Some approximations to the binomial distribution function, Ann. Math. Statist., 31 (1960), pp. 43-54.
[B3] R. R. BAHADUR AND R. RANGA RAO, On deviations of the sample mean, Ibid., 31 (1960), pp. 1015-1027.
[B4] R. R. BAHADUR, Simultaneous comparison of the optimum and sign tests of a normal mean, Contributions to Prob. and Statist., Essays in Honor of Harold Hotelling, Stanford University Press, 1960, pp. 79-88.
[B5] ———, Stochastic comparison of tests, Ann. Math. Statist., 31 (1960), pp. 276-295.
[B6] ———, Asymptotic efficiency of tests and estimates, Sankhya, 22 (1960), pp. 229-252.
[B7] ———, On classification based on responses to n dichotomous items, Studies in Item Analysis and Prediction, H. Solomon, ed., Stanford University Press, 1961, pp. 169-176.
[B8] ———, An optimal property of the likelihood ratio statistic, Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1 (1965), pp. 13-26.
[B9] ———, Rates of convergence of estimates and test statistics, Ann. Math. Statist., 38 (1967), pp. 303-324.
[B10] R. R. BAHADUR AND P. J. BICKEL, On conditional test levels in large samples, University of North Carolina Monograph Series No. 3, 1970, pp. 25-34.
[B11] R. R. BAHADUR AND M. RAGHAVACHARI, Some asymptotic properties of likelihood ratios on general sample spaces, Proc. Sixth Berkeley Symp. Math. Statist. Prob., 1 (1970).
[B12] R. R. BAHADUR, Examples of inconsistency of the likelihood ratio statistic, Sankhya, Ser. A, to appear.
[B13] R. H. BERK, Consistency and asymptotic normality of MLEs for exponential models.
[B14] P. BILLINGSLEY, Convergence of Probability Measures, John Wiley, New York, 1968.
[C1] H. CHERNOFF, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Ann. Math. Statist., 23 (1952), pp. 493-507.
[C2] ———, Large sample theory—parametric case, Ibid., 27 (1956), pp. 1-22.
[D1] A. P. DEMPSTER AND M. SCHATZOFF, Expected significance level as a sensitivity index for test statistics, J. Amer. Statist. Assoc., 60 (1965), pp. 420-436.
[F1] W. FELLER, An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed., John Wiley, New York, 1967.
[F2] ———, An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley, New York, 1969.
[F3] ———, Limit theorems for probabilities of large deviations, Z. Wahrscheinlichkeitstheorie verw. Geb., 14 (1969), pp. 1-20.
[H1] J. HAJEK, Asymptotic sufficiency of the vector of ranks in the Bahadur sense, Ann. Math. Statist., to appear.
[H2] A. B. HOADLEY, On the probability of large deviations of functions of several empirical cdfs, Ibid., 38 (1967), pp. 360-382.
[H3] W. HOEFFDING, On the distribution of the expected values of the order statistics, Ibid., 24 (1953), pp. 93-100.
[H4] ———, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), pp. 13-30.
[H5] ———, Asymptotically optimal tests for multinomial distributions, Ann. Math. Statist., 36 (1965), pp. 369-408.
[H6] ———, On probabilities of large deviations, Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1 (1965), pp. 203-219.
[J1] B. L. JOINER, The median significance level and other small sample measures of test efficacy, J. Amer. Statist. Assoc., 64 (1969), pp. 971-985.
[K1] J. KIEFER AND J. WOLFOWITZ, Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters, Ann. Math. Statist., 27 (1956), pp. 887-906.
[K2] T. J. KILLEEN, T. P. HETTMANSPERGER AND G. L. SIEVERS, An elementary theorem on the probability of large deviations, Ibid., 42 (1971).
[K3] J. KLOTZ, Alternative efficiencies for signed rank tests, Ibid., 36 (1965), pp. 1759-1766.
[K4] ———, Asymptotic efficiency of the Kolmogorov-Smirnov test, J. Amer. Statist. Assoc., 62 (1967), pp. 932-938.
[K5] S. KULLBACK AND R. A. LEIBLER, On information and sufficiency, Ann. Math. Statist., 22 (1951), pp. 79-86.
[K6] S. KULLBACK, Information Theory and Statistics, John Wiley, New York, 1959.
[K7] S. KULLBACK AND M. A. KHAIRAT, A note on minimum discrimination information, Ann. Math. Statist., 37 (1966), pp. 279-280.
[L1] E. L. LEHMANN, Testing Statistical Hypotheses, John Wiley, New York, 1959.
[P1] M. PERLMAN, On the strong consistency of approximate maximum likelihood estimators, Proc. Sixth Berkeley Symp. Math. Statist. Prob., 1 (1970).
[R1] M. RAGHAVACHARI, On a theorem of Bahadur on the rate of convergence of test statistics, Ann. Math. Statist., 41 (1970), pp. 1695-1699.
[R2] C. R. RAO, Maximum likelihood estimation for the multinomial distribution, Sankhya, 18 (1958), pp. 139-148.
[R3] J. S. RAO, Bahadur efficiencies of some tests of uniformity on the circle, Ann. Math. Statist., to appear.
[S1] I. N. SANOV, On the probability of large deviations of random variables, Sel. Transl. Math. Statist. Prob., 1 (1957), pp. 213-244.
[S2] J. SETHURAMAN, On the probability of large deviations of families of sample means, Ann. Math. Statist., 35 (1964), pp. 1304-1316.
[S3] G. L. SIEVERS, On the probability of large deviations and exact slopes, Ibid., 40 (1969), pp. 1908-1921.
[W1] A. WALD, Sequential Analysis, John Wiley, New York, 1947.
[W2] ———, Note on the consistency of the maximum likelihood estimator, Ann. Math. Statist., 20 (1949), pp. 595-601.
[W3] G. G. WOODWORTH, Large deviations and Bahadur efficiency of linear rank statistics, Ibid., 41 (1970), pp. 251-284.