Group Invariance in Statistical Inference
Group Invariance in Statistical Inference

Narayan C. Giri
University of Montreal, Canada

World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd., P O Box 128, Farrer Road, Singapore 912805. USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661. UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE.
GROUP INVARIANCE IN STATISTICAL INFERENCE

Copyright © 1996 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-1875-3
Printed in Singapore.
To NILIMA NABANITA NANDIAN
CONTENTS

Chapter 0. GROUP INVARIANCE
  0.0. Introduction
  0.1. Examples

Chapter 1. MATRICES, GROUPS AND JACOBIANS
  1.0. Introduction
  1.1. Matrices
    1.1.1. Characteristic Roots and Vectors
    1.1.2. Factorization of Matrices
    1.1.3. Partitioned Matrices
  1.2. Groups
  1.3. Homomorphism, Isomorphism and Direct Product
  1.4. Topological Transitive Groups
  1.5. Jacobians

Chapter 2. INVARIANCE
  2.0. Introduction
  2.1. Invariance of Distributions
    2.1.1. Transformation of Variable in Abstract Integral
  2.2. Invariance of Testing Problems
  2.3. Invariance of Statistical Tests and Maximal Invariant
  2.4. Some Examples of Maximal Invariants
  2.5. Distribution of Maximal Invariant
    2.5.1. Existence of an Invariant Probability Measure on O(p) (Group of p × p Orthogonal Matrices)
  2.6. Applications
  2.7. The Distribution of a Maximal Invariant in the General Case
  2.8. An Important Multivariate Distribution
  2.9. Almost Invariance, Sufficiency and Invariance
  2.10. Invariance, Type D and E Regions

Chapter 3. EQUIVARIANT ESTIMATION IN CURVED MODELS
  3.1. Best Equivariant Estimation of μ with Σ Known
    3.1.1. Maximum Likelihood Estimators
  3.2. A Particular Case
    3.2.1. An Application
    3.2.2. Maximum Likelihood Estimator
  3.3. Best Equivariant Estimation in Curved Covariance Models
    3.3.1. Characterization of Equivariant Estimators of Σ
    3.3.2. Characterization of Equivariant Estimators of θ

Chapter 4. SOME BEST INVARIANT TESTS IN MULTINORMALS
  4.0. Introduction
  4.1. Tests of Mean Vector
  4.2. The Classification Problem (Two Populations)
  4.3. Tests of Multiple Correlation
  4.4. Tests of Multiple Correlation with Partial Information

Chapter 5. SOME MINIMAX TESTS IN MULTINORMALS
  5.0. Introduction
  5.1. Locally Minimax Tests
  5.2. Asymptotically Minimax Tests
  5.3. Minimax Tests
    5.3.1. Hotelling's T² Test
    5.3.2. R²-Test
    5.3.3. ε-Minimax Test (Linnik, 1966)

Chapter 6. LOCALLY MINIMAX TESTS IN SYMMETRICAL DISTRIBUTIONS
  6.0. Introduction
  6.1. Elliptically Symmetric Distributions
  6.2. Locally Minimax Tests in E_p(μ, Σ)
  6.3. Examples

Chapter 7. TYPE D AND E REGIONS
Chapter 0

GROUP INVARIANCE

0.0. Introduction

One of the unpleasant facts of statistical problems is that they are often too big or too difficult to admit of practical solutions. Statistical decisions are made on the basis of sample observations. Sample observations often contain information which is not relevant to the making of the statistical decision. Some simplification is introduced by characterizing the decision rules in terms of the minimal sufficient statistic, which discards the part of the sample observations that is of no value for any decision making concerning the parameter, thereby reducing the dimension of the sample space to that of the minimal sufficient statistic. This, however, does not reduce the dimension of the parametric space. By introducing the group invariance principle and restricting attention to invariant decision rules, a reduction of the dimension of the parametric space is possible. In view of the fact that sufficiency and group invariance are both successful in reducing the dimension of statistical problems, one is naturally interested in knowing whether both principles can be used simultaneously and, if so, in what order. Hall, Wijsman and Ghosh (1965) have shown that under certain conditions this reduction can be carried out by using both principles simultaneously, and that the order in which they are used is immaterial in such cases. However, one can avoid verifying these conditions by replacing the sample space by the space of the sufficient statistic and then using group invariance on the space of the sufficient statistic. In this monograph we treat only multivariate problems, where the reduction in dimension is very significant.

In what follows we use the term invariance to indicate group invariance. In statistics the term invariance is used in the mathematical sense, to denote a property that remains unchanged (invariant) under a group of transformations. In actual practice many statistical problems possess such a property. As in other branches of the applied sciences, it is a generally accepted principle in statistics that if a problem with a unique solution is invariant under a group of transformations, then the solution should also be invariant under it. This notion has an old origin in the statistical sciences. Apart from this natural justification for the use of invariant decision rules, the unpublished work of Hunt and Stein towards the end of the Second World War has given this principle strong support as to its applicability and meaningfulness in proving various optimum properties, such as minimaxity and admissibility, of statistical decision rules. Although a great deal has been written concerning this principle in statistical inference, no great amount of literature exists concerning the problem of discerning whether or not a given statistical problem is actually invariant under a certain group of transformations. Brillinger (1963) gave necessary and sufficient conditions that a statistical problem must satisfy in order that it be invariant under a fairly large class of groups of transformations including Lie groups. In this monograph we treat invariance in the framework of statistical decision rules only.

De Finetti (1964), in his theory of exchangeability, treats invariance of the distribution of sample observations under finite permutations. It provides a crucial link between his theory of subjective probability and the frequency approach to probability. The classical statistical methods take as basic a family of distributions; the true distribution of the sample observations is an unknown member of this family about which statistical inference is required. According to De Finetti's approach no probability is unknown. If x₁, x₂, … are the outcomes of a sequence of experiments conducted under similar conditions, subjective uncertainty is expressed directly by ascribing to the corresponding random variables X₁, X₂, … a known joint distribution. When some of the X's are observed, predictive inference about the others is made by conditioning the original distribution on the observations. De Finetti has shown that these approaches are equivalent when the subjectivist's joint distribution is invariant under finite permutations.

Two other related principles, known in the literature, are the weak invariance and the strong invariance principles. The weak invariance principle is used
to demonstrate the sufficiency of the classical assumptions associated with the weak convergence to stable laws (Billingsley, 1968). This is popularly known as Donsker's theorem (Donsker, 1951). Let X₁, X₂, … be independently distributed random variables with the same mean zero and the same variance σ², and let

\[ S_j=\sum_{i=1}^{j}X_i,\qquad X_n\!\left(\tfrac{j}{n}\right)=\frac{S_j}{\sigma\sqrt{n}},\quad j=1,\dots,n\,. \]

Donsker proved that {X_n(t)} converges weakly to Brownian motion. The strong invariance principle has been introduced to prove strong convergence results (Tusnady, 1977). Here the term invariance is used in the sense that if X₁, X₂, … are independently distributed random variables with the same mean 0 and the same variance σ², and if h is a continuous function on [0, 1], then the limiting distribution of h(X_i) does not depend on any other property of X_i.

0.1. Examples

We now give an example to show how the solution to a statistical problem can be obtained through direct application of group theoretic results.
Example 0.1.1. Let X_α = (X_{α1}, …, X_{αp})′, α = 1, …, N (> p), be independently and identically distributed p-variate normal vectors with the same mean μ = (μ₁, …, μ_p)′ and the same positive definite covariance matrix Σ. The parametric space Ω is the space of all (μ, Σ). The problem of testing H₀ : μ = 0 against the alternatives H₁ : μ ≠ 0 remains unchanged (invariant) under the full linear group G_l(p) of p × p nonsingular matrices g transforming each X_i → gX_i, i = 1, …, N. Let

\[ \bar X=\frac{1}{N}\sum_{\alpha=1}^{N}X_\alpha,\qquad S=\sum_{\alpha=1}^{N}(X_\alpha-\bar X)(X_\alpha-\bar X)'\,. \tag{0.1} \]

It is well-known (Giri, 1977) that (X̄, S) is sufficient for (μ, Σ). The induced transformation on the space of (X̄, S) is given by

\[ (\bar X,S)\to(g\bar X,\,gSg'),\qquad g\in G_l(p)\,. \tag{0.2} \]

Since this transformation permits arbitrary changes of X̄, S, and any reasonable statistical test procedure should not depend on any such arbitrary change by g, we conclude that a reasonable statistical test procedure should depend on (X̄, S) only through

\[ T^2=N(N-1)\,\bar X'S^{-1}\bar X\,. \tag{0.3} \]
It is well-known (Giri, 1977) that the distribution of T² is given by

\[ f_{T^2}(t^2\mid\delta^2)=\sum_{j=0}^{\infty}e^{-\delta^2/2}\,\frac{(\delta^2/2)^j}{j!}\;\frac{\Gamma\!\left(\tfrac{N}{2}+j\right)}{\Gamma\!\left(\tfrac{N-p}{2}\right)\Gamma\!\left(\tfrac{p}{2}+j\right)}\;\frac{1}{N-1}\left(\frac{t^2}{N-1}\right)^{\frac{p}{2}+j-1}\left(1+\frac{t^2}{N-1}\right)^{-\left(\frac{N}{2}+j\right)},\quad t^2>0\,, \tag{0.4} \]

where δ² = Nμ′Σ⁻¹μ. Under H₀, δ² = 0 and under H₁, δ² > 0.
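The series in (0.4) is rarely needed for computation: under H₀ the statistic T²(N − p)/((N − 1)p) has the central F distribution with (p, N − p) degrees of freedom, which is how the test is carried out in practice. The following Python sketch (our own illustration, not part of the original text; the function name and the use of NumPy/SciPy are our choices) computes T² of (0.3) and its p-value through this relation.

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(X):
    """T^2 of (0.3) for H0: mu = 0, with X an N x p data matrix (N > p)."""
    N, p = X.shape
    xbar = X.mean(axis=0)
    # S is the matrix of corrected sums of squares and products, as in (0.1).
    D = X - xbar
    S = D.T @ D
    t2 = N * (N - 1) * xbar @ np.linalg.solve(S, xbar)
    # Under H0, T^2 (N - p) / ((N - 1) p) ~ F(p, N - p).
    f_stat = t2 * (N - p) / ((N - 1) * p)
    return t2, stats.f.sf(f_stat, p, N - p)

rng = np.random.default_rng(1)
t2, p_value = hotelling_t2_test(rng.normal(size=(20, 3)))
```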
Applying the Neyman-Pearson lemma we conclude from (0.4) that the uniformly most powerful test based on T² of H₀ against H₁ is the well-known Hotelling's T² test, which rejects H₀ for large values of T².

Note. In this problem the dimension of Ω is p + p(p + 1)/2 = p(p + 3)/2, which is also the dimension of the space of (X̄, S). For the distribution of T² the parameter δ² is a scalar quantity.

One main reason for the intuitive appeal that, for an invariant problem with a unique solution, the solution should be invariant, is probably the belief that there should be a unique way of analysing a collection of statistical data. As a word of caution we should point out that in cases where the use of an invariant decision rule conflicts violently with the desire to make a correct decision or to have a smaller risk, it must be abandoned. We give below one such example, which is due to Charles Stein as reported by Lehmann (1959, p. 338).
Example 0.1.2. Let X = (X₁, …, X_p)′, Y = (Y₁, …, Y_p)′ be independently distributed normal p-vectors with the same mean 0 and positive definite covariance matrices Σ, δΣ respectively, where δ is an unknown scalar constant. Consider the problem of testing H₀ : δ = 1 against H₁ : δ > 1. This problem is invariant under G_l(p), transforming X → gX, Y → gY, g ∈ G_l(p). Since this group is transitive (see Chapter 1) on the space of values of (X, Y) with probability one, the uniformly most powerful invariant test of level α under G_l(p) is the trivial test φ(X, Y) = α, which rejects H₀ with constant probability α for all values (x, y) of (X, Y). Hence the maximum power that can be achieved over the alternatives H₁ by any invariant test under G_l(p) is also α. But the test which rejects H₀ whenever

\[ \frac{\sum_{i=1}^{p}Y_i^2}{\sum_{i=1}^{p}X_i^2}\ \ge\ C\,, \tag{0.5} \]

where the constant C depends on the level α, has strictly increasing power β(δ) whose minimum over the set δ ≥ δ₁ > 1 is β(δ₁) > β(1) = α. For more discussions and results refer to Giri (1983a, 1983b).
Exercises

1. Let X₁, …, X_n be independently and identically distributed normal random variables with the same mean θ and the same unknown variance σ², and let H₀ : θ = 0 and H₁ : θ ≠ 0.
(a) Find the largest group of transformations which leaves the problem of testing H₀ against H₁ invariant.
(b) Using the group theoretic notion, show that the two-sided Student t-test is uniformly most powerful among all tests based on t.

2. Univariate General Linear Hypothesis. Let X₁, …, X_n be independently distributed normal random variables with E(X_i) = θ_i, Var(X_i) = σ², i = 1, …, n. Let Ω be the linear coordinate space of dimension n and let Π_{H₀} and Π_{H₁} be two linear subspaces of Ω such that dim Π_{H₀} = k and dim Π_{H₁} = l, l > k. Consider the problem of testing H₀ : θ = (θ₁, …, θ_n)′ ∈ Π_{H₀} against the alternatives H₁ : θ ∈ Π_{H₁}.
(a) Find the largest group of transformations which leaves the problem invariant.
(b) Using the group theoretic notions, show that the usual F-test is uniformly most powerful for testing H₀ against H₁.

3. Let X₁, …, X_n be independently distributed normal random variables with the same mean θ and the same variance σ². For testing H₀ : σ² = σ₀² against H₁ : σ² = σ₁² < σ₀², find the uniformly most powerful invariant test using the group theoretic notions.
References

1. P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
2. D. Brillinger, Necessary and sufficient conditions for a statistical problem to be invariant under a Lie group, Ann. Math. Statist. 34 (1963), 492-500.
3. B. De Finetti, Studies in Subjective Probability, H. E. Kyburg and H. E. Smokler, eds., Wiley, New York, (1964), 93-158.
4. M. Donsker, An invariance principle for certain probability limit theorems, Mem. Amer. Math. Soc. 6 (1951).
5. N. Giri, Hunt-Stein Theorem, Encyclopedia of Statistical Sciences, Vol. 3, Kotz-Johnson, eds., Wiley, New York, 1983a, 686-689.
6. N. Giri, Invariance Concepts in Statistics, Encyclopedia of Statistical Sciences, Vol. 4, Kotz-Johnson, eds., Wiley, New York, 1983b, 219-224.
7. N. Giri, Multivariate Statistical Inference, Academic Press, New York, 1977.
8. W. J. Hall, R. A. Wijsman and J. K. Ghosh, The relationship between sufficiency and invariance with application in sequential analysis, Ann. Math. Statist. 36 (1965), 575-614.
9. E. L. Lehmann, Testing Statistical Hypotheses, Wiley, New York, 1959.
10. G. Tusnady, in Recent Developments in Statistics, J. R. Barra, F. Brodeau, G. Romier and B. Van Cutsem, eds., North-Holland, Amsterdam, (1977), 289-300.
Chapter 1

MATRICES, GROUPS AND JACOBIANS

1.0. Introduction

The study of group invariance requires knowledge of matrix algebra and group theory. We present here some basic results on matrices, groups and Jacobians without proofs. Proofs can be obtained from Giri (1993, 1996) or any textbook on these topics.

1.1. Matrices

A p × q matrix C = (c_ij) is a rectangular array of real numbers c_ij, written as

\[ C=\begin{pmatrix}c_{11}&\cdots&c_{1q}\\ \vdots&&\vdots\\ c_{p1}&\cdots&c_{pq}\end{pmatrix}, \tag{1.1} \]
where c_ij is the element in the ith row and the jth column. The transpose C′ of C is the q × p matrix obtained by interchanging the rows and the columns of C. If q = p, C is called a square matrix of order p. If q = 1, C is a p × 1 column vector, and if p = 1, C is a 1 × q row vector. A square matrix C is symmetric if C = C′. A square matrix C = (c_ij) of order p is a diagonal matrix D(c₁₁, …, c_pp) with diagonal elements c₁₁, …, c_pp if all off-diagonal elements of C are zero.

A diagonal matrix with unit diagonal elements is an identity matrix and is denoted by I. Sometimes it will be necessary to write it as I_k to denote an identity matrix of order k.
A square matrix C = (c_ij) of order p is a lower triangular matrix if c_ij = 0 for j > i. The determinant of a lower triangular matrix is det C = ∏_{i=1}^p c_ii. We shall also write det C = |C| for convenience. A square matrix C = (c_ij) of order p is an upper triangular matrix if c_ij = 0 for i > j, and again det C = ∏_{i=1}^p c_ii.

A square matrix C of order p is nonsingular if det C ≠ 0. If det C = 0 then C is a singular matrix. A nonsingular matrix C of order p is orthogonal if CC′ = C′C = I. The inverse of a nonsingular matrix C of order p is the unique matrix C⁻¹ such that CC⁻¹ = C⁻¹C = I. From this it follows that det C⁻¹ = (det C)⁻¹.

A square matrix C = (c_ij) of order p, or the associated quadratic form x′Cx = Σ_i Σ_j c_ij x_i x_j, is positive definite if x′Cx > 0 for all x = (x₁, …, x_p)′ ≠ 0. If C is positive definite, then C⁻¹ is positive definite and, for any nonsingular matrix A of order p, ACA′ is also positive definite.
1.1.1. Characteristic roots and vectors

The characteristic roots of a square matrix C of order p are given by the roots of the characteristic equation

\[ \det(C-\lambda I)=0\,, \tag{1.2} \]

where λ is real. As det(θCθ′ − λI) = det(C − λI) for any orthogonal matrix θ of order p, the characteristic roots of C remain invariant under the transformation C → θCθ′. A vector x = (x₁, …, x_p)′ ≠ 0 satisfying

\[ (C-\lambda I)x=0 \tag{1.3} \]

is a characteristic vector of C corresponding to its characteristic root λ. If x is a characteristic vector of C corresponding to its characteristic root λ, then any scalar multiple ax, a ≠ 0, is also a characteristic vector of C corresponding to λ.

Some Results on Characteristic Roots and Vectors

1. The characteristic roots of a real symmetric matrix are real.
2. The characteristic vectors corresponding to distinct characteristic roots of a symmetric matrix are orthogonal.
3. The characteristic roots of a symmetric positive definite matrix C are all positive.
4. Given any real square symmetric matrix C of order p, there exists an orthogonal matrix θ of order p such that θCθ′ is a diagonal matrix D(λ₁, …, λ_p), where λ₁, …, λ_p are the characteristic roots of C. Hence det C = ∏_{i=1}^p λ_i and tr C = Σ_{i=1}^p λ_i. Note that tr C is the sum of the diagonal elements of C.
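Result 4 is easy to check numerically. The sketch below is an illustration we add (using NumPy's `eigh` routine for symmetric matrices): it diagonalizes a random symmetric C and confirms det C = ∏ λ_i and tr C = Σ λ_i.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
C = A + A.T                      # a real symmetric matrix of order p = 4

lam, theta = np.linalg.eigh(C)   # columns of theta are orthonormal characteristic vectors
# theta' C theta is diagonal with the characteristic roots on the diagonal.
assert np.allclose(theta.T @ C @ theta, np.diag(lam), atol=1e-10)
assert np.isclose(np.linalg.det(C), np.prod(lam))
assert np.isclose(np.trace(C), lam.sum())
```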
1.1.2. Factorization of matrices

In the sequel we shall frequently use the following factorizations of matrices.

For every positive definite matrix C of order p there exists a nonsingular matrix A of order p such that C = AA′ and, hence, there exists a nonsingular matrix B (B = A⁻¹) of order p such that BCB′ = I.

Given a symmetric nonsingular matrix C of order p, there exists a nonsingular matrix A of order p such that

\[ ACA'=\begin{pmatrix}I&0\\0&-I\end{pmatrix}, \tag{1.4} \]

where the order of I is equal to the number of positive characteristic roots of C and the order of −I is equal to the number of negative characteristic roots of C.

Given a symmetric positive definite matrix C of order p, there exists a nonsingular lower triangular matrix A (an upper triangular matrix B) of the same order p such that

\[ C=AA'=B'B\,. \tag{1.5} \]

Cholesky Decomposition. For every positive definite matrix C there exists a unique lower triangular matrix D with positive diagonal elements such that C = DD′.
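NumPy exposes the Cholesky factor directly. A small check (our own illustration) that the factor is lower triangular with positive diagonal elements and reproduces C:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
C = A @ A.T + 5 * np.eye(5)      # positive definite by construction

D = np.linalg.cholesky(C)        # the unique lower triangular factor
assert np.allclose(D, np.tril(D))          # lower triangular
assert np.all(np.diag(D) > 0)              # positive diagonal elements
assert np.allclose(D @ D.T, C)             # C = D D'
```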
Partitioned matrices
Let C — ( c y ) be a p x q matrix and let C be partitioned into submatrices
Cij as C = (^
n
\Cai where C
u
-
^ C22 /
(cij)(i = l , . . . , m ; j = l,...,n)\C
12
- (ftj)(» = l , . . . , m ; jf =
10
n+
Group Invariance in Statistical
l,...,q);C
Inference
= ( c y ) ( t = m+
2 l
l,...,p; j = l,...,n);C
2 2
=
=
m + l j . . , ,p; 3' = J l + 1 , . . . . , jjj. For any square matrix 1
C — f^ ' \C2i where Cn,C 2
\ C22 /
are square matrices and C 2 is nonsingular
2
2
(1) det ( C ) = det ( C
2 2
) det ( C n
CaC^Cji),
(2) C is positive definite if and only if C ,C U
C22-.Cn — C\ C C \ 2
Let C
-
1
22
22
- C iG^C2
or equivalently
n
are positive definite.
2
= B be similarly partitioned into submatrices B ,i,j i:i
= 1,2.
Then 1
C^, — Bu - B12B22B21, C^C
12
C22 — B22 -
= -B B^.
B iB 'Bi , 2
ll
2
(1.6)
l2
1.2. Groups

A group G is a set with an operation τ satisfying the following axioms.

A₁. For any two elements a, b ∈ G, aτb ∈ G.
A₂. For any three elements a, b, c ∈ G, (aτb)τc = aτ(bτc).
A₃. There exists an element e (the identity element) such that for all a ∈ G, aτe = a.
A₄. For any a ∈ G there exists an element a⁻¹ (the inverse element) such that aτa⁻¹ = e.

In what follows we will write, for convenience of notation, aτb = ab; the reader should not confuse this with the arithmetic product ab. A group G is abelian if for any pair of elements a, b belonging to G, ab = ba. A non-empty subset H of G is a subgroup if the restriction of the group operation τ to H satisfies the axioms A₁, …, A₄.

Examples of Groups
Example 1.1. Let X be a set and G be the set of all one-to-one mappings g : X → X; that is, g(x) = g(y), x, y ∈ X, implies x = y, and for every y ∈ X there exists x ∈ X such that y = g(x). With the group operation defined by g₁g₂(x) = g₁(g₂(x)), g₁, g₂ ∈ G, G forms a permutation group.
Example 1.2. The additive group of real numbers is the set of all reals with the group operation ab = a + b. The multiplicative group of nonzero reals is the set of all nonzero reals with the group operation ab = a multiplied by b.

Example 1.3. Let X be a linear space of dimension n. Define, for x₀ ∈ X, g_{x₀}(x) = x + x₀, x ∈ X. The set of all {g_{x₀}} forms an additive abelian group and is called the translation group.

Example 1.4. Let X be a linear space of dimension n and let G_l(n) be the set of all nonsingular linear transformations of X onto X. G_l(n), with matrix multiplication as the group operation, is called the full linear group.

Example 1.5. The affine group is the set of pairs (g, x), x ∈ X, g ∈ G_l(n), with the group operation defined by (g₁, x₁)(g₂, x₂) = (g₁g₂, g₁x₂ + x₁), with g₁, g₂ ∈ G_l(n) and x₁, x₂ ∈ X. The identity element is (1, 0) and the inverse of (g, x) is (g⁻¹, −g⁻¹x).

Example 1.6. The set of all nonsingular lower triangular matrices of order p, with the usual matrix multiplication as the group operation, forms a group G_T(p) (with the identity matrix as the unit element). The set of all nonsingular upper triangular matrices of order p forms the group G_UT(p), again with the identity matrix as the unit element.

Example 1.7. The set of all orthogonal matrices θ of order p, with matrix multiplication as the group operation, forms a group O(p).
k
0(p).
= X be a strictly increasing
sequence of linear subspaces of X and let G be a subgroup of Gi(p) such that g G Gi{p) if and only if gXi = Xi, i = 1 , . . . , k. T h e group G is the group of nonsingular linear transformations on X to X which leaves Xi invariant. Choose a basis x^\ ... , x $ , t = 1 , . . . , k for Xi-Xi-i
with n, — dim (X,) -
dim (3t,_i) and XQ —
(null set). Then g G G can be written as
9 =
/ 9(U) 9(12) 0 9(22) \
0
0
fl(lJfc) \
9<2fc)
(1.7)
12
Group /nuariance in Statistical
inference
where g/*^ is a block of Bj rows and rij columns for i < j . I f n; = 1 for all =
i,G
GUT-
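A quick computational illustration of Example 1.6 (our own sketch, in Python): products and inverses of nonsingular lower triangular matrices stay lower triangular, which is exactly the closure needed for G_T(p) to be a group.

```python
import numpy as np

rng = np.random.default_rng(6)

def random_lower_triangular(p):
    g = np.tril(rng.normal(size=(p, p)))
    np.fill_diagonal(g, rng.uniform(1.0, 2.0, size=p))  # nonzero diagonal => nonsingular
    return g

g, h = random_lower_triangular(4), random_lower_triangular(4)
assert np.allclose(g @ h, np.tril(g @ h))                        # closure under the operation
assert np.allclose(np.linalg.inv(g), np.tril(np.linalg.inv(g)))  # closure under inversion
assert np.isclose(np.linalg.det(g), np.diag(g).prod())           # det = product of diagonals
```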
1.3. Homomorphism, Isomorphism and Direct Product

Let G, H be groups. A mapping f of G into H is called a homomorphism if, for g₁, g₂ ∈ G,

\[ f(g_1g_2)=f(g_1)*f(g_2)\,, \tag{1.8} \]

where * is the group operation in H. If e is the identity element in G, then f(e) is the identity element in H and f(g⁻¹) = [f(g)]⁻¹. If, in addition, f is a one-to-one mapping, then f is called an isomorphism.

The Cartesian product G × H of groups G and H, with the operation (g₁, h₁)(g₂, h₂) = (g₁g₂, h₁h₂), g₁, g₂ ∈ G, h₁, h₂ ∈ H, is called the direct product of G and H.

A subgroup H of G is a normal subgroup of G if for all h ∈ H and g ∈ G, ghg⁻¹ ∈ H, or equivalently if gHg⁻¹ = H.

Example 1.9. The group G_T⁺(p) of lower triangular matrices of order p with positive diagonal elements is a normal subgroup of G_T(p). Any subgroup of an abelian group is normal.

Let G be a group and let H be a subgroup of G. The set G/H of all sets of the form gH, g ∈ G, is called the quotient group of G modulo H, with group operation defined by: for g₁, g₂ ∈ G, (g₁H)(g₂H) is the set of elements obtained by multiplying all elements of g₁H with all elements of g₂H. The identity element is H and (gH)⁻¹ = g⁻¹H.

1.4. Topological Transitive Groups

Let X be a set and A a collection of subsets of X. (X, A) is called a topological space if A satisfies the following axioms:

TA₁: X ∈ A.
TA₂: The union of an arbitrary subfamily of A belongs to A.
TA₃: The intersection of a finite subfamily of A belongs to A.

The elements of A are called open sets. If, in addition, the topological space (X, A) satisfies: for x, y ∈ X, x ≠ y, there exist two open sets A, B belonging to A such that A ∩ B = ∅, x ∈ A and y ∈ B, then (X, A) is a Hausdorff space.

If a group G has a topology τ defined on it such that under τ the mappings ab from G × G into G and a⁻¹ from G into G (a, b ∈ G) are continuous, then (G, τ) is a topological group. A compact group is a topological group which is a compact space. A locally compact group is a topological group which is a locally compact space. A compact space is a Hausdorff space X such that every open covering of X can be reduced to a finite subcovering. A locally compact space is a Hausdorff space such that every point has at least one compact neighborhood.

Let X be a set. The group G operates from the left on X if there exists a function on G × X into X, whose value at (g, x) is denoted by gx, such that (i) ex = x for all x ∈ X, where e is the identity element of G; (ii) g₂(g₁x) = (g₂g₁)x. This implies that each g ∈ G is one-to-one on X into X.

Let G operate from the left on X. G operates transitively on X if for every x₁, x₂ ∈ X there exists a g ∈ G such that gx₁ = x₂.

Example 1.10. Let X be a linear space. The full linear group operates transitively on X − {0}.

Example 1.11. The group O(n) of n × n orthogonal matrices operates transitively on the set of all n × p (n ≥ p) real matrices x satisfying x′x = I_p.

Example 1.12. The linear group G_l(p) acts transitively on the set of all p × p positive definite matrices s, transferring s to gsg′, g ∈ G_l(p).

1.5. Jacobians

Let X₁, …, X_n be a sequence of continuous random variables with pdf f_{X₁,…,X_n}(x₁, …, x_n) and let Y_i = g_i(X₁, …, X_n), i = 1, …, n, be a set of continuous one-to-one transformations of X₁, …, X_n. Assume that the functions g₁, …, g_n have continuous partial derivatives with respect to x₁, …, x_n. Denote the inverse functions by X_i = h_i(Y₁, …, Y_n), i = 1, …, n. Let J be the determinant of the n × n matrix of partial derivatives

\[ J=\det\begin{pmatrix}\dfrac{\partial X_1}{\partial Y_1}&\cdots&\dfrac{\partial X_1}{\partial Y_n}\\ \vdots&&\vdots\\ \dfrac{\partial X_n}{\partial Y_1}&\cdots&\dfrac{\partial X_n}{\partial Y_n}\end{pmatrix}. \tag{1.9} \]

J is called the Jacobian of the transformation X₁, …, X_n to Y₁, …, Y_n. The pdf of Y₁, …, Y_n is given by

\[ f_{Y_1,\dots,Y_n}(y_1,\dots,y_n)=f_{X_1,\dots,X_n}\big(h_1(y_1,\dots,y_n),\dots,h_n(y_1,\dots,y_n)\big)\,|J|\,, \]

where |J| is the absolute value of J. We now state some results on Jacobians without proof. We refer to Giri (1993), Olkin (1962) and Roy (1957) for further results and proofs.

Some Results on Jacobians

Let X = (X₁, …, X_p)′, Y = (Y₁, …, Y_p)′ ∈ E^p. The Jacobian of the transformation X → Y = gX, g ∈ G_l(p), is (det g)⁻¹.

Let X, Y be p × n matrices. The Jacobian of the transformation X → Y = gX, g ∈ G_l(p), is (det g)⁻ⁿ.

Let X, Y be p × n matrices and let A ∈ G_l(p), B ∈ G_l(n). The Jacobian of the transformation X → Y = AXB⁻¹ is (det A⁻¹)ⁿ (det B)^p.

Let g, h ∈ G_T(p). The Jacobian of the transformation g → hg is ∏_{i=1}^p (h_ii)^{−i}, where h = (h_ij), and the Jacobian of the transformation g → gh is ∏_{i=1}^p (h_ii)^{−(p−i+1)}.

Let g, h ∈ G_UT(p). The Jacobian of the transformation g → hg is ∏_{i=1}^p (h_ii)^{−(p−i+1)}, where h = (h_ij), and the Jacobian of the transformation g → gh is ∏_{i=1}^p (h_ii)^{−i}.

Let G_BT(p) be the multiplicative group of p × p lower triangular nonsingular matrices in the block form, i.e. for g ∈ G_BT(p),

\[ g=\begin{pmatrix} g_{(11)}&0&\cdots&0\\ g_{(21)}&g_{(22)}&\cdots&0\\ \vdots&&&\vdots\\ g_{(k1)}&g_{(k2)}&\cdots&g_{(kk)} \end{pmatrix}, \]

where the g_{(ii)} are submatrices of order p_i × p_i such that Σ_{i=1}^k p_i = p. For g, h ∈ G_BT(p) the Jacobian of the transformation g → hg is ∏_{i=1}^k [det h_{(ii)}]^{−σ_i}, where σ_i = Σ_{j=1}^i p_j, and the Jacobian of the transformation g → gh is ∏_{i=1}^k [det h_{(ii)}]^{−(p−σ_{i−1})}, with σ₀ = 0.

Let G_BUT(p) be the group of nonsingular upper triangular matrices of order p in the block form, i.e. for g ∈ G_BUT(p),

\[ g=\begin{pmatrix} g_{(11)}&g_{(12)}&\cdots&g_{(1k)}\\ 0&g_{(22)}&\cdots&g_{(2k)}\\ \vdots&&&\vdots\\ 0&0&\cdots&g_{(kk)} \end{pmatrix}, \]

where the g_{(ii)} are submatrices of order p_i satisfying Σ_{i=1}^k p_i = p. For g, h ∈ G_BUT(p) the Jacobian of the transformation g → gh is ∏_{i=1}^k [det h_{(ii)}]^{−σ_i} and that of g → hg is ∏_{i=1}^k [det h_{(ii)}]^{−(p−σ_{i−1})}, with σ_i = Σ_{j=1}^i p_j, σ₀ = 0.

Let S be a symmetric positive definite matrix of order p. The Jacobian of the transformation S → gSg′, g ∈ G_l(p), is [det g]^{−(p+1)}, and that of S → S⁻¹ is (det S)^{p+1}.

Let S be a symmetric positive definite matrix of order p and let g be the unique lower triangular matrix such that S = gg′. The Jacobian of the transformation g → S = gg′ is 2^{−p} ∏_{i=1}^p (g_ii)^{−(p−i+1)}, with g = (g_ij).

Exercises

1. Show that for any nonsingular symmetric matrix A of order p and non-null p-vectors x, y:
(a) (A + xy′)⁻¹ = A⁻¹ − (A⁻¹xy′A⁻¹)/(1 + y′A⁻¹x);
(b) x′(A + xx′)⁻¹x = (x′A⁻¹x)/(1 + x′A⁻¹x);
(c) |A + xx′| = |A|(1 + x′A⁻¹x).

2. Let A, B be p × q and q × p matrices respectively. Show that |I_p + AB| = |I_q + BA|.

3. Show that for any lower triangular matrix C the diagonal elements are its characteristic roots.

4. Let X be the set of all n × p real matrices X satisfying X′X = I_p. Show that the group O(n) of orthogonal matrices θ of order n, which transform X → θX, acts transitively on X.

5. Let S be the set of all symmetric positive definite matrices s of order p. Show that G_l(p), which transforms s → gsg′, g ∈ G_l(p), acts transitively on S.

References

1. N. Giri, Multivariate Statistical Inference, Academic Press, New York, 1977.
2. N. Giri, Introduction to Probability and Statistics, 2nd Edition (Expanded), Marcel Dekker, New York, 1993.
3. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, New York, 1996.
4. I. Olkin, Note on the Jacobian of certain transformations useful in multivariate analysis, Biometrika, 40 (1962), 43-46.
5. S. N. Roy, Some Aspects of Multivariate Analysis, Wiley, New York, 1957.
Chapter 2

INVARIANCE

2.0. Introduction

Invariance is a mathematical term for symmetry. In practice many statistical problems involving testing of hypotheses exhibit symmetries, which impose additional restrictions on the choice of appropriate statistical tests: the tests must exhibit the same kind of symmetry as is present in the problem. In this chapter we shall consider the principle of invariance for statistical testing problems only. For additional references the reader is referred to Lehmann (1959), Ferguson (1967) and Nachbin (1965).

2.1. Invariance of Distributions
Let (X, A) be a measure space and Ω the set of points θ, that is, Ω = {θ}. Consider a function P on Ω to the set of probability measures on (X, A) whose value at θ is P_θ. Let G be a group of transformations operating from the left on X such that, for g ∈ G,

\[ g:\mathcal X\to\mathcal X\ \text{is }(\mathcal A,\mathcal A)\text{ measurable}; \tag{2.1} \]

\[ \bar g:\Omega\to\Omega \tag{2.2} \]

is such that if X has distribution P_θ, then gX, X ∈ X, has distribution P_{θ*}, where θ* = ḡθ ∈ Ω. All transformations considered in connection with invariance will be taken for granted as one-to-one from X onto X. An equivalent way of stating (2.2) is as follows:
Invariance
P {gX e
e A) = P- (X g6
€A),Ae
A,
I7
(2.2a)
or 1
P {g- A)
=
9
P {A)
(2.2b)
se
This can also be written as
Pt(B) = P »{gB), s
If all Pe are distinct, that is, 81 / 82 Pe,
Be A PH*
t
n
e
n
9*
S a
homomorphism.
The condition (2.2a) is often known as the condition of invariance of the distribution. Very often in statistical problems there exists a measure A such that Pa is absolutely continuous with respect to A for all 8 S fi so that p$ is the corresponding probability density function with respect to the measure A. I n other words, we can write
(2.3)
Also in great many cases of interest, it is possible to choose the measure A such that it is invariant under G, viz
\(A) = \[gA)
for all A G A,
geG.
(2.4)
D e f i n i t i o n 2 . 1 . 1 . A measure A satisfying (2.4) is left invariant. In such cases the condition of invariance of distribution under G reduces to Pgs(gx) = pe(x)
for alia; e X, g e G.
(2.5)
The general theory of invariant measures in a large class of topological groups was first given by Haar (1933).
However, the results we need were
all known by the end of the nineteenth century. I n the terminology of Haar, Ps(A) is called a positive integral of p«, not necessarily a probability density function. T h e basic result is that for a large class of topological groups there is a left invariant measure, positive on open sets and finite on compact sets and to within multiplication by a positive constant this measure is unique. Haar proved this result for locally compact topological groups.
T h e definition of
right invariant positive integrals, viz. Pe{A) = Pge(Ag) is analagous to that for left invariant positive integrals.
18
Group Invariance in Statistical
Inference
Because of the poineering works of Haar such invariant measures are called invariant (left or right) Haar measures. A rigorous presentation of Haar measure will not be attempted here. T h e reader is referred to Nachbin (1965). It has been shown that such invariant measures do exist, and from a left invariant Haar measure one can construct the right invariant Haar measure. We need also the following measure-theoritic notions for further developments. Let G be a locally compact group and let B be the <7-algebra of compact subsets of G. D e f i n i t i o n 2.1.1. (G,B)
(Relatively left invariant measure). A measure v on
is relatively left invariant with left multiplier x{g) i f f satisfies v{gB) = x(9)v(B),
forallBeB.
(2.5a) +
T h e multiplier x(t») is a continuous homomorphisim from G —» R . is relatively left invariant with left multiplier \(g) -l
x(ff )f(^ff) '
s
a
then \ ( g
- 1
If v
) = l/x(ff) and
e
l ft invariant measure.
D e f i n i t i o n 2.1.2.
A group G acts topologically on the left of x if the
mapping T(g, x) : x x G —» x
1S
continuous.
D e f i n i t i o n 2.1.3. (Proper G-space). Let the group G acts topologically on the left of x
a
n
d
let h be a mapping on G x \ —* X
x
X given by
Hg>x) = {gx,x) for g G G,x C C x
x
€ X-
T h e group G acts properly on \
if for every compact
J
X> / » " { G ) is compact.
T h e space x is called a proper G-space fG acts properly on \ . A n equivalent concept of the proper G-space is the Cartan G-space. D e f i n i t i o n 2.1.4 (Cartan G-space). Let G acts topologically on \ . Then X is called a Cartan G-space if for every x e x there exists a neighborhood V of a such that (V, V ) = {g e G\{gV) n V # 4>} has a compact closure. Wijsman (1967, 1978,1985, 1986) demonstrate the verification of the condition of Cartan G-space and proper actions in a number of multivariate testing problems on covariance structures. E x a m p l e 2.1.1. Let G be a subgroup of the permutation group of a finite s e t * - For anysubset A of x define A( A) = number of points in A. T h e m e a s u r e
Invariance
19
A is an invariant measure under G and is unique upto a positive constant. It is known as counting measure. Example
2.1.2.
L e t x be the Euclidean n space and G the group of
translations defined by i
3
e %g t
e G implying
Xl
g (x)
=x
Xl
Clearly G acts transitively on x-
+
xi.
T h e n-dimensional Lebesgue measure A is
invariant under G. Again A is unique upto a positive multiplicative constant. E x a m p l e 2.1.3. left.
Consider the group Gi(n)
operating on itself from the
Since an element g operating from the left on Gi(n)
Jacobian with respect to this linear operation. 2
space of dimension n . dimension n x n b y
is linear, it has a
Obviously, Gi(n)
is a vector
L e t us make the convention to display a matrix of
column vectors of dimension n x 1, by writing the first
column horizontally. T h e n the second is likewise and so forth. I n this manner a matrix of dimension nxn let g = (gi ),x }
— (x j),y z
n
represents a point in R
— (y ).
. L e t g,x,y
e Gi(n)
and
If we display these matrices by their column
t]
vectors, the relationship y — gx gives us the coordinates of the point
as
Vij =
y^9ikXkj k=l
in terms of the coordinates of {^111 • • ' * ^ l n > " ' • ! X i, n
. . . , Xjin) -
The Jacobian is thus the determinant of the n (9 0
0 g
\0
0
... ...
2
2
x n
0^ 0
9>
where each 0 indicates null matrix of dimension nxn. n
the transformation y = gx is (detg} . dX(y)
matrix
Hence the Jacobian of
Let us now define for y e -
(det T / ) "
Gt{n)
20
Group Invariance in Statistical Inference
Then n
dX(gy) =
(det g) Y[dVij (det(gy))"
(det 3/)"
= dX(y) Hence A is an invariant measure on G ; ( n ) under (?/(n). It is also unique upto a positive multiplicative constant. E x a m p l e 2.1.4. Let G be a group o f n x n
nonsingular lower triangular
matrices with positive diagonal elements and let G operate from the left on itself. We will verify that the Jacobian of the transformation
Q-*hg for
G, is I l i C " ) ' - Thus an invariant measure on G under G which is 1
unique upto a positive multiplicative constant, is
dX{g) =
il>,
n?=i(j?»)' To show that the Jacobian is n r = i ( ^ « ) * > ' kij = 0 for i <j. T h e n
e t
9 -
(fij),^ =
with g
tj
=
This defines a transformation (fti)-(Cy). We can write the Jacobian of this transformation as the determinant of the matrix /dC 3C„ \ dCu dg &921 n
lL
dC
21
dC
2l
dgu
BC \
nn
dgn
dC
21
dgnn
dC
nn
$921
&9nn 1
/nuartonce
21
which is a [n(rc + 1 ) ] / 2 x [ n ( f i - r l ) ] / 2 lower triangular matrix with the diagonal elements -
> ro.
h ,k kk
In other words h%i will appear once as a diagonal element, /122 will appear twice as a diagonal element and finally h 1
will appear n times as a diagonal
nn 1
element. Hence the Jacobian is I I I L i C " ) ' E x a m p l e 2.1.5.
L e t S be the space of all positive definite symmetric
matrices S of dimension n x n . and let Gi(rc) operate on 5 as S^gSg',
SeS,
g 6 G,{n).
(2.6) n+1
To show that the Jacobian of this transformation is (det g) , matrix obtained from the nxn the j t h row; M^C)
let i ? y be the
identity matrix by interchanging the i t h and
be the matrix obtained from the n x n identity matrix by
multiplying the ith row by any non-zero constant C and let A y be the matrix obtained from the nxn
identity matrix by adding the j t h row to the i t h . We
need only show this for the matrices Eij,Mi(C)
and A y . T h i s can be easily
verified by the reader; for example, d e t ( M , ( C ) ) = C and Mi{C)SMi(C) obtained from S = ( 5 y ) by multiplying Sa by C that the Jacobian is C ^ " "
1
= C"
+ 1
2
is
and S y for i ^ j by C so
.
Let us define the measure X on 5 by
d
X
{
b
}
-
(detS)(«W
Clearly, d\(S)
=
dX(gSg'),
so that X is an invariant measure on S under the transformation (2.6). Here also X is unique upto a positive multiplicative constant. T h e group G;(p)
acts transitively on S.
E x a m p l e 2.1.6. L e t H be a subgroup of G. T h e group G acts transitively on the space x = G/H
(the quotient group of H) where group action satisfies
9i(gh)
= 9i9Hi
for
9i e G ,
g€G.
When the group G acts transitively on x, there is a one to one correspondence between x and the quotient space G/H
of any subgroup H of G.
22
Group Invariance in Statistical
Inference
Transformation of variable in abstract integral
2.1.1.
Let (X, A,n)
be a measure space and let (y, C) be a measurable space.
Suppose
Let us define the measure v
(y,C) by 1
v(C) = i{>p- (C)),
CEC.
t
T h e o r e m 2 . 1 . 1 . If f is a real measurable function g[tp(x))
and if f is integrable,
(2.7) on X such that f{x) —
then
I }{x)dti{x)
=
f
Jx
g(
Jx =
f g{y)dv{y) Jy
(2.8)
For a proof of this theorem, the reader is referred to L e h m a n n [(1959), p. 38], E x a m p l e 2.1.6. ( W i s h a r t d i s t r i b u t i o n ) . Let X\,...,
X
n
be n indepen-
dent and identically distributed normal p-vectors with mean 0 and covariance matrix £ (positive definite). Assume n > p so that
is positive definite almost everywhere. T h e joint probability density function of X ...
,X
1}
n
is given by
2
1
2
1
p(x ,...,xJ-(27r)-"P/ (det£- )"/ xexp|-itrS- ^^;| 1
.
(2.9)
Using Theorem 2.1.1, we get for any measurable set A in the space of S
P(S
1
2
e A) — ( 2 J T ) - ' ^ J (det £
s - X^Xixl
= (2^)-^
e A exp
_
1
)
n
/
2
j-itr
(det £
_
1
)
f[dx,, ' i=l
S^sj
n
/
2
exp j - i
(2.10)
tr
dm(s)
Invariance
23
where m is the measure corresponding to the measure v of Theorem 2.1.1. Let us define the measure m' by
To find the distribution of S, which is popularly called Wishart distribution ( W ( S , J I ) ) with parameter E and degrees of freedom n, it is sufficient to find dm'.
T o do this let us first observe the following:
(i) Since E is positive definite symmetric, there exists a € Gi{p)
such that
£ — aa'. (ii) Let
As a
1
,
Xi s
are independently normally distributed with mean vector 0 and
covariance matrix I (identity matrix), 5 is distributed as W(I,n).
P .(S aa
n
6 A) = ( 2 7 r } - ^
2
l
/ JatA
(det(aa')- s))
x exp | - ^ t r ( a a ' )
P (aSa' r
n
€ A) = (27r)- ^
2
/
a a
' ( S e A ) = Pj{aSa
s|
(det((c.a')
x exp j - i t r ( a a ' ) Since P
- 1
l
Thus
n/2
dm'(s);
_1
s))
/a
s | x dm*{a
J
5a
')
€ A ) for all measurable sets A in the space of
5 , we must have for all g £ G J ( J I ) dm'(s)
t
= dm (gsg').
(2.12)
By example 2.1.5 we get
d
m
{ S )
where C is a positive constant.
= (det
'
24
Group Invariance in Statistical
Inference
T h u s the probability density function of the Wishart random variable is given by
1
n
2
W ( E , n ) = C(det £ - } ' ( d e t s )
( n
p
- -
, ) / 2
exp j - ^
tr£
_
1
sj
(2. 13)
where C is the universal constant depending on E . T o evaluate C let us first observe that C is a function of n and p. L e t us write it as C = C , . n p
Since C ^ n
p
is a constant for all values of £ , we can, in particular, take E = J to evaluate C . np
Write Sn
S12
\$2l where S
u
S22
is (p — 1) X (p — 1). Make the transformations
£11 = Tn, S12 — T12, S22 - SuSiiSu
= T22 = Z ( s a y ) ,
which has a Jacobian equal to unity. Using the fact that
\S\ = \Su\ \S22 - S iSi, Si2\
,
l
2
and
J C , |5| n
l n
p
p
1
- - »^exp|-^tr5jds= 1
forallp.n,
we get
1
J
2
C , \ \^-"- >' e^p^~tis^ds n p S
kiil
C
= -.pj
| n
" "
p
p
1 ) / 2
x y"{,)(- -i)/
*
/
e x p
s
{~5 'i
2 s
exp|-itr
2 e x p
1
i"i '
|_I |
5 l 2
z
}
d l l
d 2
»2
S l l
Jd
5 l l
/nuoriance
x /
|sn|
I
C,
m
^
n P
t n
p
" ~
2 j r
l ) / 2
exp
da
j( -iV2 (n- +n/2 P
2
P
r
C
n,(p-1) C
j
n 2
Since C , i = 2- ' /T{n/2),
^
l / 2
2
exp
J da n •
we have
n
1
7 r
( n
11
f n - p + l \ \
",(p- i)|5n|
25
-p(p-l)/22-' /
2 J r
-p(p-l)/2 -np/2 2
2.2. I n v a r i a n c e o f T e s t i n g P r o b l e m s We have seen in the previous section that for the invariance of the statistical distribution, transformation g of g g G must satisfy g9 g fi for 9 g fl.
D e f i n i t i o n 2.2.1. ( I n v a r i a n c e o f p a r a m e t r i c s p a c e ) . T h e parametric space fi remains invariant under a group of transformation G : X (i) for g g G, the induced mapping j on fi satisfies g9e.il
1
A , if
for all # € fi;
(ii) for any 8' £ fi there exists 0 g fi such that g$ = 8'. An equivalent way of writing (i) and (ii) is
fffi = It may be remarked that if P
fi.
(2.14)
for different values of 8 g fi are distinct then
g
g is one-to-one. T h e following theorem will assert that the set of all induced transformations g,g g G also forms a group.
T h e o r e m 2.2.1. Letg g ly
The transformations
g gi 2
2
be two transformations
and g^
1
which leave fi invariant.
defined by
0201 (x) = 02(9i (aO) _. ffi
(2-15) 9iW=x
26
Group /rtuariance in Statistical
Inference
for all x £ X leave SI invariant
and g gi
l
— fftffi, (Si ) — (ffi)
2
1
P r o o f . We know that if the distribution of X is Pg, 0 G fl, the distribution of gX is Pgd.gi € ft. Hence the distribution of g2gi{X) 1
and
leaves fi invariant, g~2g~\6 €
ft.
T h u s g g\ 2
is P g ^ e , as
obviously,
9i gi = 92fli • Similarly the reader may find it instructive to verify the other assertion.
•
L e t us now consider the problem of testing the hypothesis Ho : 6 G Slu
0
against the alternatives Hi : 6 £
ftjj,,
where ft#
0
and Sin,
are two disjoint
subsets of SI. Let G be a group of transformations which operate from the left on X satisfying conditions (i) and (ii) above and (2.14). I n what follows we will assume that g £ G is measurable which ensures that whenever X is a random variable then gX is also a random variable.
Definition 2.2.2. of testing Ho : 8 £ SIH
( I n v a r i a n c e o f s t a t i s t i c a l p r o b l e m ) . T h e problem U
against the alternative H i : 0 <S Sin, remains invariant
under a group of transformations G operating from the left on X if (i) (or g € G, A € A, P (A)
=
e
(ii)
=
ttH ,gSl a
Hl
= Sl
Hl
P- (gA), sa
for all g G G .
2.3. I n v a r i a n c e of S t a t i s t i c a l T e s t s a n d M a x i m a l I n v a r i a n t Let (X,A,
A) be a measure space.
Suppose pg,8
G SI, is the probability
density function of the random variable X G X with respect to the measure A.
Consider a group G operating on X from the left and suppose that the
measure A is invariant with respect to G , that is, \{gA)
= \(A)
for a l l A e
A ? € G .
Let [y,C) be a measurable space and suppose T ( X ) is a measurable mapping from X into X.
Definition 2.3.1.
( I n v a r i a n t f u n c t i o n ) . T ( X ) is an invariant function
on X under G operating from the left on X if T ( X ) = T(gX) G,X€X.
for all g G
Invariance
D e f i n i t i o n 2.3.2.
( M a x i m a l i n v a r i a n t ) . T(X)
function on X under G if T(X) X.
is a maximal invariant
is invariant under G and T(X)
Y £ X implies that there exists g £ G such that Y = If a problem of testing HQ : 8 e f l #
0
27
= T{Y)
for all
gX.
against the alternatives H 8 lt
£ fijjr
remains invariant under the group transformations G, it is natural to restrict attention to statistical tests which are also invariant under G. Let v>(X), X £ X be a statistical test for testing Ho against H i , that is,
a
when x is the observed sample point. If f is invariant under G
then ip must satisfy
xeX.geG
A useful characterization of invariant test f(X) variant T(X)
in terms of the maximal in-
on X is given in the following theorem.
T h e o r e m 2.3.1. Let T(X) ip(X)
(2.16)
is invariant
be a maximal invariant
on X under G. A test
under the group G if and only if there exists a function
for which
h
h(T{X)).
P r o o f . Let
Obviously,
g£G,x£X.
Conversely, if
— T(y);x,y
£ X then there exists
a g £ G such that y — gx and therefore tp(x) — f[y).
•
In general if ip is invariant and measureable, then
but h
may not be a Borel measurable function. However if the range of the maximal invariant T(X)
is Euclidean and T is Borel measurable then Ai — {A : A £
A and gA — A for all g £ G) is the smallest f-field with respect to which T is measurable and ip is invariant if and only if ip(x) — h(T(x))
where h is a Borel
measurable function. T h i s result is essentially due to Blackwell (1956). Let G be the group of induced transformations on f! (induced by the group G). Let v(8),8
€ fi, be a maximal invariant on fi under G.
T h e o r e m 2.3.2.
The distribution
depends on f! only through
of T(X)
[or any function
of
T(X))
i/{9).
P r o o f . Suppose v{9i) = v(8' ),8\,8 2
i
£ fi. T h e n there exists a g £ G such
that f?2 — § # i . Now for any measurable set C
28
Group Invariance in Statistical
Inference
P (T(X)
£ C) = P {T{gX))
tl
G C)
9i
= P- (T(X)
E C )
g9l
• 2.4. S o m e E x a m p l e s of M a x i m a l I n v a r i a n t s E x a m p l e 2 . 4 . 1 . L e t X = {Xu • - • . * n ) ' £ * and let G be the group of translations gc{X)
= (Xi + C,..
.,X
a
+ C)',-oo
A maximal invariant on X under G is T{X) Obviously, it is invariant. Suppose x — (x\,... T(x)
Writing C — y
= T(y).
n
— x
n
< C < oo. = (X\,—X ,..
— X ).
, £ „ ) ' , y = ( j / i , . . . ,y )'
and let
n
n
n
we get yi = Xi + C for i — 1 , . . . , n.
Another maximal invariant is the redundant (X\ — X,...,
X
n
— X) where
X = i 51™ X j . It may be further observed that if n — 1 there is no maximal invariant. T h e whole space forms a single orbit in the sense that given any two points in the space there exists C G G transforming one point into the other. In other words G acts transitively on X. 1
E x a m p l e 2.4.2. Let A be a Euclidean n-space, and let G be the group of scale changes, that is, j? G G, a
9 (X) a
= (aX ,...,aX )', 1
n
a > 0.
E x a m p l e 2.4.3. Let X = X x X where Xi is the p-dimensional Euclidean space and X is the set of all p x p symmetric positive definite matrices. Let XeX S€X . Write Y
2
2
u
2
X -
(X i),...,X ))', 1
( t
where X is d; x 1 and 5 is a\ x d; such that £ i U < * i = P- Let G be the group of all nonsingular lower triangular pxp matrices in the block form, S G G [t)
{ i i )
Invariance (9(H) 3(21)
0 3(22)
\9(ki)
9(k2)
0
29
>
0
9 = •••
9{kk)/
operating as (X,S)->(gX,gSg'). Let
f{X,S) =
{Rl...,Rl),
where £ ^ - % S ® %
*= 1 . - . *
Obviously, R* > 0. L e t us show that Rj,...,
(2-17)
if £ is a maximal invariant on X
under G. We shall prove it for the case k — 2, T h e general case can be proved in a similar fashion. Let us first observe the following: (i) I f X -* gX,
S - » 5g' then
- * 9 ( n ) X i ) , 5(u) -t 9(ii)S(n)0( ). (
U
Thus ( i f j . f i f + R%) is invariant under G. (ii) A " ' 5
_ 1
X - XJ'JJSJ"!,^!) + ( X (5
- 5(ai)5(n)^"(i))' 1
( 2 2 )
- S(21)5 ^ 5(i2))" (X(2) (
Now, suppose that for X,Y (x ' (
( 2 )
1 1
5-! x )
)
e X\\ S,T
1)
.
2
,x'5- x)-(r ' (
1 1
T-; y l
where K, T are similarly partitioned as X, S. exists a.g eG
l )
G A*
1
( 1 )
S(21)S^ \ X( )
such that X = gY, S = gTg'.
1
[ 1 )
,y'r- y),
We must now show that there
L e t us first choose 0
g = ' 3(21)
3(22)
with 3(11) = 3(22) ~ (S(22) ~ 5 ( 2 ) S ^ ) S 2 ) ) " " 1
9(12) = 9(22)5'(21)S " (
11)
(
.
(2.18)
)
( 1
2
30
Group Invariance in Statistical
Inference
T h e n gSg' — I. Similarly choose
h
{
1
1
)
0
h = ( \fy l)
/l(22)
2
such that hTh! = I. Since
implies we conclude that there exists an orthogonal matrix 0\
or order
that Since l
X'S-'X = s-
T
Y'T~ Y,
1
= s's,
1
=
h%
(fl(ii)-^(i))'(»(ii)^(i)) = (fc(u)%))'(tyii)%))> then 2
I|S(21)-^(1) + 9 ( 2 2 | A T ( 2 ) | | = 11/1(211^(1) + ^(22)^(2) IP so that there exists an orthogonal matrix 0
of order d
2
2
x d
2
such that
(9(21)X(i) + 9 ( 2 2 ) X ( ) = 02(/l( i)y"(i) + / l ( 2 2 ) V ( 2 ) ) . 2 )
2
Now let r
=
-
f 0,
0
\
I
o
J
o
2
T h e n , obviously, X = =
g^ThY aY,
where a £ G and gSg' =
rhTh'V,
Invariance 31 so that aTo'
S =
Hence ( R J . f l J + i i j ) and equivalently (J?J, Ji^) *
s
a
maxima! invariant on x
under G. Note that if G is the group of p x p nonsingular upper triangular matrices in the block form, that is, if g € G ( 9{ii)
S(i2)
'•'
9(i*) \
0
9(22)
•'•
9(2*)
0
0
•••
9 = V where
9{kk)/
is di x d{, then a maximal invariant on X under G is (ir,...,n)
defined by / % )
Y^T* =
%,]
• • •
(Xti ,,..,X }' )
-l
(X(i),...
m
,X )i w
= 1 , . . . ,k
2.5. D i s t r i b u t i o n o f M a x i m a l I n v a r i a n t Let (X,A,X)
be a measure space and suppose pg,9 £ 0 , is a probability
density function with respect to A. Consider a group G operating from the left on X and suppose that A is a left invariant measure, \{A) Let [yX)
g £ G,
= X(gA),
A £ A.
be a measurable space and suppose that T{X)
is a measurable
mapping such that T is a maximal invariant on X under G. measure on {y,C)
induced by A, such that X'(C)
1
= X(T- (C)),
CeC
Also let Pi be the probability measure on (y,C)
Pt{C)
=
j JT-i(c)
defined by
Pe(x)dX(x)
!
L e t A* be a
32
Group /n variance in Statistical
so that P
2
Inference
has a density function p" with respect A*. Hence
/
Assumption.
= j
p'{T)d\'{T)
p(x)dX{x),C€C
There exists an invariant probability measure p on the
group G under G. L e m m a 2.5.1.
The density function
p* of the maximal
invariant
T with
respect to A* is given by
P*(T) = J (gx)dp(g).
(2.19)
P
P r o o f . Let / be any measurable function on (y,C}-
j f(T)p'(T)d\-(T)
= J
Then
f(T(x))p(x)d\'x).
However,
j f(T(x))
J
p{g(x))dft(g)dX(x)
= jj = J
HT(x))p(g(x))d\(x)du(g) J
nnx^pigix^dHgix^g)
= j
f(T{y)) {y)d\{y). P
T h e n , since the function /
p(g(x))dfi{g)
JG
is invariant, we have p*{T)
= / p(g(x))M9) Ja
•
Invariance
2.5.1. Existence of an invaraint probability measure (group of p x p orthogonal matrices).
on
33
O(p)
Let X = ( X i , . . . , X ) be a p x p random matrix where the column vecP
tors Xi
(p-vectors) are independent
and are identically distributed normal
(p-vectors) with mean vector 0 and covariance matrix / (identity). Apply the Gram-Schmidt orthogonalization process to the columns of X to obtain the random orthogonal matrix Y = (¥%,....,
Yi-
Define Y = T(X)
Y ),
where
p
g
,
and let G = 0(p).
i =
i ...,p. t
Obviously,
gY =
(gY ...,gY ) u
v
=
T(gX ...,gX )
=
T(X;,...,X;),
u
p
where X;=gXi,
i =
l,.... . P
Since X ; , X " have the same distribution and Xi,...,X*
are independently
normally distributed with mean 0 and covariance matrix / , we obtain Y and gY have the same distribution for each g £ 0(p).
T h u s the probability measure
of Y defined by P[Y is invariant under 2.6.
eC)
= P(X
L
eT~ (C))
0{p),
Applications
Example
2.6.1.
(Non-central Wishart
distribution).
Let X
=
( X ! , . . . , X ) (each X * is a p-vector) be a p x n matrix (n > p) where each n
column Xi is normally distributed with mean vector / i ; and covariance matrix / and is independent of the remaining columns of X . T h e probability density function of X with respect to the pn-dimensional
i&'t$
=
(
2 n
-)np/2
e x p
{~i
t t x
x
Lebesgue measure is
' - | * i W * ' + t*avi'j
(2-19)
34
Group Invariance in Statistical
In/erence
where fi = ( f t , . , . . . , U p ) . It is well known that n
iml
is a maximal invariant under the transformation X -
re
XT,
0{n),
where O ( n ) is the group of orthogonal matrices of dimension n x n . Let p(s, p.) be the probability density function of 5 at the parameter point p. with respect to the induced measure dX'{S)
as defined in L e m m a 2.5.1. T o find the
probability density function of 5 with respect to d we first find dX'{s).
From
(2.13) (
p(s,0)dX'(s) By
p
= C(det s ) " - -
, ) / 2
e x p | - | t r s J da,
(2.20)
(2.18)
(s,o)=
f (2rr,o)d^(r) Join)
P
P
d
-
fH/ „., *
(r)
0
(2ff)»f/
a
rexpi-^trsl , \ 2 P
where / i is the invariant probability measure on O ( n ) . Hence, n
2
dA*(s) = ( 2 T ) " / C ( d e t »f*+-W*it
.
Again by (2.18)
p { a , u ) = exp | -1 tr(s 4- J V ) J jf
= ^ P | - g tr(s
+ W
^ exp j - j t r acT**' J
0j p),
dp{T)
Invariance
35
where h(x,fi)=
j
exp J — - t r a d ? a ' \
Jot,n)
2
I
du(T)
J
It is evident that h(x,u)
=
h(xT,p.
where T, * e 0{n). T h u s we can write h(x,p)
as
The probability density function of S with respect to the Lebesgue measure ds is given by n p
2
p(s,^)dA*(s) = p ( s , ^ ) ( 2 7 r ) / C ( d e t
(
1
/2
s) "-''- > ds.
This is called the probability density function of the noncentral Wishart distribution with non-centrality parameter fia'. E x a m p l e 2.6.2.
( N o n - c e n t r a l d i s t r i b u t i o n of t h e c h a r a c t e r i s t i c
r o o t s ) . Let S of dimension p x p be distributed as central Wishart random variable with parameter £ and degrees of freedom n > p. ••• > R
p
Let Ri
> R
2
>
> 0 be the roots of characteristic equation det(S — XI) — 0, and
let P denote a diagonal matrix with diagonal elements R i , . . . ,R . P
Denote by
p(i?, S ) the probability density function of R at the parameter point E with respect to induce measure dX'(R)
of L e m m a 2.5.1. We know [see, for example,
Giri(1977)] /p
p(R,I)dX-(R)
= C
1
v(n-p-l)/2
{U^J
.
p
.
exp|4X>j
*ll(&-Rj)dR
where dR stands for the Lebesgue measure and C\ is the universal constant. It may be checked that R is a maximal invariant under the transformation
s
r s r ,
r e
o(p).
By (2.18)
p(R,
I) = C
2
(j[
R^j
exp | - \ J £ Ri |
^ dp(T).
36
Group Invariance
in Statistical
Inference
So dX'(R)
=
^-'[[(R -R )dR. 1
]
Again by (2.18) / p 1
p ( R , E ) = (det S " ) " /
2
\
(n-p-l)/2
(n^)
(n-p-l)/2
exp J --tr x / Jotv) I 2 'O(p)
0- TRr'\ l
da(T) J
where t9 is a diagonal matrix whose diagonal elements are the characteristic roots of £ . T h e non-central probability density function of R depends on £ only through its characteristic roots. 2.7. T h e D i s t r i b u t i o n of a M a x i m a l I n v a r i a n t i n t h e General Case Invariant tests depend on the observations only through the maximal invariant. T o find the optimum test we need to find the explicit form of the maximal invariant statistic and its distribution. For many multivariate testing problems it is not always convenient to find the explicit form of the maximal invaraint. Stein (1956) gave the following representation of the ratio of probability densities of a maximal invariant with respect to a group G of transformations g leaving the testing problem invariant. T h e o r e m 2.7.1. Let G be a group operating from the left on a topological space (X, A)
and \
a measure
in x which is left invariant
not necessarily
transitive on X).
densities pi,p
with respect to A such that
2
Pi(A)
Assume
=
under G (G is
that there are two given probability
J P2(x)d\( ), A
x
Invariance
37
for A € A and Pi and P% are absolutely continuous.
Suppose T : X —* X is a
maximat invariant
when X has
Pi.
and p" is the distribution
Then under certain
of T(X}
distribution
conditions
(2.21)
where p, is a left invariant
Haar measure on G. An often used alternative
form
of (2.21) is given by
(2.21a)
where f
is the probability density with respect to a relatively invariant
with multiplier
measure
X(g).
Stein (1956) gave the statements of Theorem 2.7.1. without stating explicitly the conditions under which (2.21) holds. However this representation was used by G i r i (1961, 1964, 1965, 1971) and Schwartz (1967). Schwartz (1967) also gave a set of conditions (rather complicated) which must be satisfied for Stein's representation to be valid. Wijsman (1967, 1969) gave a sufficient condition for (2.21) using the concept of Cartan G-space.
K o e h n (1970) gave
a generalization of the results of W i j s m a n (1967). Bonder (1976) gave some conditions for (2.21) through topological arguments. Anderson (1982) has obtained some results for the validity of (2.21) in terms of the proper action of the group. Wijsman (1985) studied the properness of several groups of transformations used in several multivariate testing problems. Subsequently W i j s m a n (1986) investigated the global cross-section technique for factorization of measures and applied it to the representation of the ratio of densities of a maximal invariant.
2.8. A n I m p o r t a n t M u l t i v a r i a t e D i s t r i b u t i o n Let Xi,...,
XN be N independently, identically distributed p-dimensional
normal random variables with mean £ = ( £ , . . . , £ p ) ' and covariance matrix £ (positive definite). Write N x
=
w E
x
N ' -
s
=
jy x -x%x -xf. i
i
i
38
It
Group Invariance in Statistical
is well-known that
X,S
Inference
are independent
in distribution and X
has
p-dimensional normal distribution with mean £ and covariance matrix (1/jV) E and S has central Wishart distribution with parameter E and degrees of freedom n — N — 1. Throughout this section we will assume that N > p so that S is positive definite almost everywhere. Write for any p-vector b,
where fofl is the d; x 1 subvector of b such that £j
d* — p and for any p x p
matrix A A(n)
•••
A(
l f c
)
N
'An) Am
,A i)
•••
( A
where
A
A
•••
(n)
=
j
(kk)
is a di x dj submatrix of A. L e t us define Ri,...
,R
k
and
ft,...,5*
by t
i
E ^ = ^w2r7,^ , t = l,2,...,fc. W
T h e o r e m 2.8.1. TTte joint probability density function given by (for Ri > 0, £ * = i fl^ < y
m
« w -
r
(
(
*
w
/
2
*
x T j £
n
2
t
)
/ /
of R% . ..,R
k
\
W l - E ^ J
/ J V - <5,_i
dj J L & \
[(W- )/2]-l P
k
is
Invariance where a
— S i = i dj
^ o — 0 and tp(a, b : x) is the confluent
w
a
39
kypergeometric
series given by a +
a{a + 1 ) x X
+
b
2
b(b + 1) 2!
3
a(a + l ) ( a + 2) x +
6(6 + l)(b + 2) 3!"
+
" ' '
Furthermore, the marginal probability density function of Ri,...,Rj,j
< k
can be obtained from (2.22) by replacing A; by j and p by YJ[=I hiproof.
We will first prove the theorem for the case k — 2 and then use
this result for the case k — 3. T h e proof for the general case will follow from these cases. For the case k = 2 consider the random variables W = 5(22) - 5 ( 2 1 ) 5 ^ , 5 ( 1 2 ) ,
f
=
%,).
It is well-known [see, for example, Stein (1956), or G i r i ( l 9 7 7 ) , p. 154] that (i) V is distributed as Wishart W(N
- 1, E ( ) ) ; U
(ii) W is distributed as Wishart W{N
- 1 - di, E 2 ) -
S^ijSmjE^));
[ 2
(hi) the conditional distribution of L given V is multivariate normal with mean £(21)2^,V
- 1
/
2
and covariance matrix (£(22, - E(21)S^ i S(12)}®/ ( , 1
)
(
where ® is the tensor product of two matrices £ ( 2 2 )
—
1
^(21)^(11)^(12) and the
di x d, identity matrix I , • a
(iv) W is independent of Let us now define W\, W
2
W (l 2
+ W ) = N(X L
{2)
(L,V). by
- S
m
S ^
t
i
)
x ( X 2 ) - 5 ( 2 i , 5 ^ , A" (
T ( S ( 1 )
m
- 5
( 3
i 5 )
(
1 )
5(i2 ))
1
(2.24)
).
Since X is independent of 5 , the conditional distribution oi\fNX^2)
given 5 ( j n
and X ( i ) is normal with mean \/~N{£(2) + ^ < 2 i ) ^ ( i i ) £ ( i ) ) and covariance matrix
40
Group Invariance in Statistical
1 1
£{22> ~ E { 8 i > E r i i j E ( i 4 > of y/NXt2)
given 5
( l l
Inference
thereupon follows that the conditional distribution
, and y/NX^f
is independent of the conditional distri-
bution of
given Sni)
and Xay.
5 ( ~ ! ) ^ { i ) give
11
s
Furthermore, the conditional distribution of V W S ( i 2 ) a
(n)
n
(
i
i s
normal with mean V W E ( a i ) S ^ j j ^ i ) and
covariance matrix
Hence the conditional distribution of Viv" ( x <
given 5 ( H )
a n
2 )
-
(i +
W M
_
1
/
2
s
d -^(i) ' normal with mean
(C(2) - S
( 2
1
i)S^i^(i)) (i +
2
Wi)- ?
and covariance matrix £ ( 2 2 ) - ^(21) £ ( i i ) £ ( 1 2 ) -
From Giri (1977) it follows that the conditional distribution of W% given
X^
and S ( u ) is p(W \X ,S ) 2
(1)
=
tn)
xl [Hi
1
+ Wi)" )
/xl- dl
dl
,
(2.25)
where Xn($) denotes the noncentral chisquare random variable with noncentrality parameter 6 and with n degrees of freedom and Xn denotes the central chisquare random variable with n degrees of fredom. distribution depends on Xh-t and Sti\) probability density of WuWz p{W ,Wi) 2
Since this conditional
only through Wi,
we get the joint
as =
2 X 2
l
(6 (l
+ Wi)- )/x .
2
N
d l
.^
(2.26)
and furthermore it is well known that the marginal probability density function of Wi is P(Wi)
2
= X (Si)/x dl
N
ai
•
(2.27)
Invariance
41
Hence, we have
jSI
= 0
fl +
1
r(* + /?)r(V)
i f f t ^ ^ H *
2
JJ3
j !
(2.28)
From L e m m a 2.8.1. below,
W
2
= {(Ri + R ){1 2
- R i - Ra)" 1
x (1 + H j . { l -
Hi)" )"
1
~Hi(l-
1
Ri)~ } (2.29)
1
1
=
fi (l-fli-fi )" . 2
2
From (2.28) and (2.29) the probability density function of Ri
and R
2
is
given by
4I
1
nriW-Jh*
1
x R$ - BI*'- (I-R T
x exp | - - ( f t +6 ) 2
2 x n ^ i ( J V -
f
f
l
+ rfefiji
_ 0 , f ; i i U , )
(2.30)
We now consider the case k = 3. Write W
3
-
1
( N X ' S " : ? - N%S^X{ y)
(l +
3
It is easy to conclude that 5(33, - 5 [ 3 2 j S
[22]
S
l23
j
•1
42
Group /n variance in Statistical
Inference
is distributed as Wishart da,E(33) - E[33]E^jE(23]j ,
w(N-l-d-!-
and is independent of S( 2) and S[ -• 2
Furthermore, the conditional distribution
S2
of \ / i V X ( 3 ) given S[ ]<X[2] is normal with mean 22
( f o ) - E|32]S|2 (-?[2] - £[2])) 2J
and covariance matrix E
S
E(33) - [32]Ep2] [23] . and is independent of the conditional distribution of ^ [M] \~22) W S
S
X
given S[ 2] and X p ] , which is normal with mean Z
and covariance matrix NX
m [~2 2) m S
l
X
S
S
T h u s the conditional distribution of W given (Wi,W ),
S
S
( [33] - (32] [22j [23]) 3
given Sp2j and X [ j or, equivalently 2
is
2
viW^W^Wy)
X%$41
=
+ WiftWi
+ W + 2
WiWsT ) 1
Ixfu-dy-it-di-
(2-31)
Hence the joint probability density function of - — ^
p(Wi,W ,W ) 2
3
,W, W 2
3
is given by
5
'-
XN-di-di-dz
X.N-d,-d?
Aiv-fl,
From L e m m a 2.8.1. below, W =R (1-R )- , 1
1
1
l
1
W
=
W
= {{R, +R
a
3
Ri(l-R -R )- , l
2
2
+ R )(l 3
- R , - R
x (1 - R, - R^-'Xl
- R, -
= R (.1-Rl-R -R )- . 1
3
3
a
2
-
^ p
1
- (R, +
fl ) 2
R) 2
(2.33)
Invariance
From (2.32)-(2.33)
43
the joint probability density function of i f , , i f 2 , # 3 is
given by
i=l
V
i=l
x n ^ i t i V - ^ f ; ^ ) .
(2.34)
Proceeding exactly in the above fashion we get (2.22) for general k. Since the marginal distribution of JEui is normal with mean &a and covariance matrix S[iij it follows that given the joint probability density function of
i f i f ^ ,
the marginal distribution of i f , , . . . , i f j , 1 < j < p can be obtained from (2.22) by replacing k by j .
•
The following lemma will show that there is one-to-one correspondence between ( f i i , . . . , i f t ) and (if J , . . . , i f £ ) as defined earlier in this chapter. L e m m a 2.8.1. 1
NX'(S
+ NXX')- X
NX'S^X
=
1+
NX'S-i-X
P r o o f . Let 1
{S + NXX )-
1
=S-'
+A.
Hence, 1
I + NXX'S-
+(S
+ NXX')A
=
I.
So we get (S + NXX')-
1
= S"
1
1
- N(S +
NXX'^XX'S'
Now NX'(S
+ NXX')-^
= NX'S^X
- NX'(S
+
NXXT'XiX'S^X).
44
Group Invariance
in Statistical
Inference
Therefore, 1
NX' NX>
1+NX s
S~ X
lx
Note that
5
=i
j=i
V
J=I
2.9. A l m o s t I n v a r i a n c e , S u f f i c i e n c y a n d I n v a r i a n c e D e f i n i t i o n 2.9.1. Let ( A " , A , P ) be a measure space. A test ih{X),X
G
A" is said to be equivalent to an invariant test with respect to the group of transformations G, if there exists an invariant test ip(x) with respect to G such that ib(x) =
y{x),
for all x G X except possibly for a subset TV of A" of probability measure zero. D e f i n i t i o n 2.9.2. A l m o s t i n v a r i a n c e . A test ip{x), x G X, is said to be almost invariant with respect to the group of transformations G, if
where N
g
is a subset of x depending on g of probability measure zero.
It is tempting to conjecture that if tp(x) is almost invariant then it is equivalent to an invariant test i>{x). Obviously, any ifi(x) which is equivaleant to an invariant test is almost invariant. For, take N
-1
g
— N U g N.
If x £ N , g
then
x & N, and gx $ N so that tp(x) = y(gx)
= ih(gx) = ii>{x).
Conversely, if G is countable or finite then given an almost invariant test
g
= {x G x '• f{x)
# f{gx)}
1J N
g
,
and AT has probability measure zero.
T h e uncountable G presents difficulties. Such examples are not rare where an almost invariant test can be different from an invariant test.
Invariance
Let G be a group of transformations operating on X and let, A,B,
45
be the
d-field of subsets of X and G respectively such that for any A £ A the set of pairs (x, g) for which gx £ A is measurable Ax
B. Suppose further that there
exists a tr-finite measure v on G such that for all B £ B,v(B) v{Bg)
= 0 implies
= 0 for all g £ G . T h e n any measurable almost invariant test function
with respect to G is equivalent to an invariant test under G. For a proof of this fact the reader is referred to Lehmann (1959). — 0 imply v(Bg)
requirement that for all g £ G and B £ B,v(B) satisfied in particular when f{B)
— f{Bg)
The
= 0 is
for all g £ G and B £ B. Such a
right invariant measure exists for a large number of groups. Let A\ = {A : A £ A and gA — A for all g £ G } and let .4 be the sufficient S
o--field of X.
Let G be a group, leaving a testing problem invariant, be such
that gA„ = A
s
for each g £ G . If we first reduce the problem by sufficiency,
getting the measure space (X, A , P ) , and subsequently by invariance then we s
arrive at the measure space (X,A i,
P) where A i
s
A
s
s
is the class of all invariant
measurable sets. T h e following theorem which is essentially due to Stein
and Cox provides an answer to our questions under certain conditions. T h e proof is given by Hall, Wijsman and Ghosh (1965). T h e o r e m 2.9.1. Let [X, A, P ) be a measure space and let G be a group of measurable transformations
leaving P invariant.
for P on (X, A) such that gA almost invariant
A
3
s
—A
s
Then A
s
s
be a sufficient
subfield
for each g £ G . Suppose that for
measurable function
probability measure zero.
Let A
each
is sufficient for P on
(X,A).
Thus under conditions of the theorem if we reduce by sufficiency and invariance then the order in which it is done is immaterial. 2.10. I n v a r i a n c e , T y p e D a n d E R e g i o n s . The notion of a type D or type E region is due to Issacson (1951). Kiefer (1958) showed that the P-test of the univariate general linear hypothesis possesses this property. Suppose, for a parametric space ft — {(#,i?) : 6 £ ft',n € H] with associated distributions, with ft' a Euclidean set, that every test function
which, for each n, is twice continuously
differentiable in the components of 0 at 0 = 0, an interior point of ft'. Let Q be the class of locally strictly unbiased level a tests of Ho alternatives H\:0^O. similar and that
O u r assumption on
:
a
$ — 0 against the
implies that all tests in Q
a
are
46
Group Invariance in Statistical
for all 4> in Let
Inference
Q. a
be the matrix of second derivates of 0$(9, v) with respect to the
components of $ (which is also popularly known as the Gaussian curvature) at 6 = 0. Define AM
= det(/V"))-
Assumption. A ^ ( T J ) > 0 for all v E H and for at least one d>' 6 Definition 2.10.1.
Q. a
( T y p e E a n d T y p e D T e s t s ) . A test 4>' in Q
a
is
said to be of type E if A^(n)-max^ Q A^(n) e
=
for all V € H. If i f is a single point, d>~ is said to be of type D. Lehmann (1959a) showed that, in finding regions which are of type D, invariance could be invoked in the manner of Hunt-Stein theorem. (Theorem 5.OA); and that this could also be done for type E regions (if they exist) provided that one works with a group which operates on H as the identity. Suppose that our problem is invariant under a group of transformations for G which Hunt-Stein theorem holds and which acts trivially on fi', such that for g e G
If (fig is the test function defined by tpg(x) = <j>(gx), then a trivial computation shows that A* (n) = A ^ o n ) 9
and hence A(n) =
A{gr))
where A(n) = m a x ^ ^ A ^ n ) . Also, if 0 is better than 4>' in the sense of either type D or type E, then tpg is clearly better than
Invariance
47
Exercises 1. Let Xi,...,
X
n
be independently and identically distributed normal random
variables with mean a and variance (unknown) IT . Using Theorem 2.7.1. 2
find the U M P invariant test of Ho : p, = 0 against the alternative Hi : u ^ 0 with respect to the group of transformations which transform each Xi —* cXi,c±Q. 2. Let {Pg,8 e 11} be a family of distributions on (X,A) density p(-j8)
such that P$ has a
with respect to some a-finite measure u. For testing Ho • 0 €
flrj against Hj : 9 G Hi with 12
n
Pi = 0 (
0
n u
'l
s e t
).
t
n
e
likelihood ratio test
rejects Ho whenever X(x) is less than a constant, where, sup 9€.n
p(x\9)
0
X(x) =
sup 9
p(x|S)
e flj u Hi
Let the problem of testing r/o against i i ] be invariant under a group G of transformations g transforming x —* g ( x ) , g 6 G, i G A". Assume that p(x\9)=p(9x\g8)X(9) for some multiplier X. Show that A(x) is an invariant function. 3. Prove (2.8) and (2.25). 4. Prove Theorem 2.9.1.
References 1. S. A, Anderson, Distribution
of maximal
invariants
using
quotient
measures,
Ann. Statist. 10, pp. 955-961 (1982). 2. D. Blacltwell, On a class of probability spaces, Proc. Berkeley Symp. Math. Statist. Probability. 3rd, Univ. of California Press, Berkeley, California, 1956. 3. J . V . Bonder, Borel
cross-sections
pp. 866-877 (1976). 4. T . Ferguson, Mathematical 5. N. Giri, On the likelihood
Statisties,
and maximal
invariants,
Ann. Statist. 4,
Academic Press, 1969.
ratio test of a normal
multivariate
testing
problems;
testing
problems,
Ph.D. Thesis, Stanford University, California, 1961. 6. N. Giri, On the likelihood
ratio test of a normal multivariate
Ann. Math. Statist. 35, pp. 181-189 (1964). 7. N. Giri, On the likelihood ratio test of a normal
multivariate
testing
problem-II,
Ann. Math. Statist. 36, pp. 1061-1065 (1965). 8. N. Giri, On the distribution of a multivariate statistic, Sankhya, 33, pp. 207-210 (1971).
48
Group Invariance in Statistical
9. N. Giri, Multivariate
Inference
Statistical
10. S, L , Isaacson, On the theory specifying
Inference,
Academic Press, N . Y . , 1977.
of unbiased
tests of simple
the values of two or more parameters,
statistical
hypotheses
Ann, Math. Statist. 22, pp. 217¬
234 (1951). 11. A. Haar, Der Massbegriff
in der theorie
der kontinvjierchen
gruppen,
Annals of
between
sufficiency
Mathematics, 34 (1933), pp. 147-169. 12. W. J . Hail, R. A. Wijsman, and J . K . Ghosh, The relationship and invariance
with application
in sequential
analysis,
Ann. Math. Statist. 36,
pp. 575-614 (1965). 13. P. R. Halmos, Finite
Dimensional
Vector Space, D. Van Nostrand Company,
Inc. Princeton, 1958. 14. J . Kiefer, On the nonrandomized
symmetrical
optimatity
and randomized
nonoptimality
of
designs, Ann. Math. Statist. 29, pp. 675-699 (1958).
15. U. Koehn, Global cross sections
and the densities
of maximal
invariants,
Ann.
Math. Statist. 41, pp. 2045-2056 (1970). 16. E . L . Lehmann, Testing of Statistical Hypotheseses, John Wiley, N . Y . , 1959. 17. E . L . Lehmann, Optimum invariant tests, Ann. Math. Statist. 30, pp. 881-884
(1959a). 18. L , Loomis, An Introduction
to Abstract
Hermonic
Analysis,
D. Van Nostrand
Company, Inc., Princeton, 1948. 19. L . Nachbin, The Haar Integral.,
D, Van Nostrand Company, Inc., Princeton,
1965. 20. R. Schwartz, Locally mimimax tests, Ann. Math. Statist., 38, pp 340-359, 1967. 21. C . Stein, Multivariate Analysis-I, Technical Report 42, Department of Statistics,
Standford University, 1956. 22. R. A. Wijsman, Cross-Section of orbits and their application
to densities
of max-
imal invariants, Proc. Fifth. Berk. Symp. Math. Statist. Prob. 1. pp. 389-400 (1967) University of California, Berkeley. 23. R. A. Wijsman, General proof of termination sequential
probability
with probability
ratio test based on multivariate
one of
observation,
invariant
Ann. Math.
Statist. 38, pp. 8-24 (1969). 24. R. A. Wijsman, Distribution
of maximal
invariants,
using global and local cross-
sections, Preprint, Univ. of Illinois at Urban a-Champaign, 1978. 25. R. A. Wijsman, Proper action in steps, with application
imal invariants,
26. R. A. Wijsman, Global cross sections distribution
to density
ratios of max-
Ann. Statist. 13, pp. 395-402 (1985).
of maximal
invariants,
as a tool for factorization
of measures and
Sankhya A, 48, pp. 1-42 (1986).
Chapter 3 EQUIVARIANT ESTIMATION IN C U R V E D MODELS
Let ( x , A ) be a measure space and f) be the set of points 8. {Ps,8
Denote by
£ Si], the set of probability distributions on (x, A ) . L e t G be a group
of transformations operating from the left on \ such that p £ G , g ; x —* X is one to one onto (bijective). Let G be the corresponding group of induced transformations g on ft. Assume (a) For 0i e 0, i = 1,2;9i ? t 9 , P , £ 2
(b) P { A ) - P- e(gA), f l
g
e
P„ B
A £ A,p € G,g e G.
Let A(0) be a maximal invariant on ft under G , and let f!" = {0\9 £ fi.with A() = A }
(3.1)
0
where Ao is known. We assume x to be the space of minimal sufficient statistic for 8. D e f i n i t i o n 3 . 1 . ( E q u i v a r i a n t E s t i m a t o r ) A point estimator 8(X),X X, a mapping from x —' ft, is equivariant if 8(gX) N o t e 1.
= g8(X),g
£
£ G.
For notational convenience we are taking G to the group of
transformations on e(X).
N o t e 2. T h e unique maximum likelihood estimator is equivariant.
50
Group Invariance
L e t T(X),X
in Statistical
Inference
£ X' be a maximal invariant on x under G. Since the distridepends on 0 € Q only through A(0), given X($) =
bution of T(X)
is an ancillary statistic.
A T(X) 0
A n ancillary statistic is defined to be a part of the
minimal sufficient statistic X whose marginal distribution is parameter free. In this chapter we consider the approach of finding the best equivariant estimator in models admitting an ancillary statistic. Such models are assumed to be generated as an orbit under the induced group G on fi. T h e ancillary statistic is realized as the maximal invariant on x under G.
A model which
admits an ancillary statistic is referred to as a curved model. Models of this nature are not uncommon in statistics. Fisher (1925) considered the problem of estimating the mean of a Normal population with variance a
2
when the
coefficient of variation is known. T h e motivation behind this was based on the empirically observed fact that a standard deviation a often becomes large proportionally to a corresponding mean p so that the coefficient of variation
a/p
remains constant. This fact is often present in mutually correlated multivariate data. In the case of multivariate data, no well-accepted measure of variation between a mean vector p and a covariance matrix E is available. K a r i y a , Giri and Perron(1988) suggested the following multivariate version of the coefficent of variation, (i) A =
-1
fl'SS /!
1
(ii) v = S "
^ where E = E ^ E
1
'
2
1
with S /
2
as the unique lower triangular matrix with positive diagonal elements. I n recent years Cox and Hinkley (1977), Efron (1978), A m a r i (1982 a,b) among other have reconsidered the problem of estimating p, when the coefficent of variation is known in the context of curved models. 3.1. B e s t E q u i v a r i a n t E s t i m a t i o n of fi w i t h A K n o w n Let Xi,...,X (n > p) be independently and identically distributed JV (p, E ) . We want to estimate a under the loss function n
L{n,d)
1
= (d-n)'-L- {d-p)
p
(3.2)
l
when A = a'T,~ p, — A (known). L e t 0
n
nx = Y,x>,s T h e n (X,S)
=
- x)(Xi
is minimal sufficient for {u, E ) , JUX
of S as N (y/nu, p
7i
-xy.
is distributed independently
S ) and S is distributed as Wishart Wp(n - 1, E ) with n — 1
degrees of freedom and parameter E . Under the loss function (3.2) this problem remains invariant under the group of transformations G j ( p ) of pxp
nonsingular
Equivariant
matrices g transforming (X,S)
—» (gX ,gSg').
Estimation
in Curved Models
51
T h e corresponding group G of
induced transformations g on the parametric space ft transforms 8 = (p., £) —* g8 — (pp, o £ p ' ) . A maximal invariant invariant on the space of (\/nX, T
2
= n(n-
S) is
lJX'S-'X
(3.3)
and the corresponding maximal invariant on the parametric space ft is A = 1
(i'T,~ u.
Since the distribution of T
A, given A — A o T
2
2
depends on the parameters only through
is an ancillary statistic. Since the loss function L[p,d)
invariant under the group G i ( p ) of transformations g transforming {X,S) {gX,gSg')
and ( « , S ) —• (gp,gT,g')
R(8, l) (
we get for any equivariant estimator p. of
,
=
E (p-p)'^- (p-p,)) e
,
= E (gjl
1
- ga)'{g £,g')~ {gii.
e
-
gp))
= E (p(gX)
- 9${gWT\mm
-
gp)
= Eg (a{X)
- n)'{gZ^y\^{gX)
-
gp)
e
e
g
= *(5M), for g € Gt{p),g
is —»
(3-4)
is the induced transformation on E corresponding to g on x-
Since G | ( p ) acts transitively on E we conclude from (3.4) that the risk
R(8,fi)
of any equivariant estimator ji is a constant for all 8 € ft. Taking A — 1, which we can do without any loss of generality and using 0
the fact that R(8, ii) is a constant we can choose p.. = e = ( 1 , 0 , . • . , Q) , E = I. To find the best equivariant estimator which minimizes R(8,p)
among all
equivariant estimators p. satisfying ji(gX,gSg')=gp(X,S) we need to characterize p.. Let Gr{p)
be the subgroup of G;(p( containing all
p x p lower triangular matrices with positive diagonal elements.
Since 5 is
positive definite with probability one we can write, S — W W ' , W £
Grip)-
Let v = w
lY Q
~'
2
2
where Y = •JnX, \\ V \\ = V'V = T /(n
=
WT\
- 1).
T h e o r e m 3.1. If p, is an equivariant estimator of p, under Gi(p) p(Y,S)
=
k(U)WQ
then (3.5)
52
Group Invariance
in Statistical
Inference
2
where k is a measurable
ofU = T j(n
— 1).
P r o o f . Since p is equivariant under G i ( p ) we get for g € gp(Y,S) _1
Replace Y by W Y,
= p.(gY,gSg').
(3.6)
g by W and S by I in (3.6) to obtain p(Y,S)
Let O be an pxp
Oi(p)
= Wp(V,I).
(3.7)
orthogonal matrix with Q = V/ || V \\ as its first column.
Then fi(V,I)
= p(00'V,00')
=
OH{VUeJ).
Since the columns of O except the first one are arbitrary as far as they are orthogonal to Q it is easy to claim that the components of p.(\fUe,I) the first component p (VU
except
e, / } are zero. Hence
L
p{V,I)
= Qji {y/UeJ).
•
1
T h e following theorem gives the best equivariant estimator (BEE) of p. T h e o r e m 3 . 2 . Under the loss function
(3.2) the unique BEE of p isp = k
(If) WQ where k(U) = E(Q'W'e\U)/E(Q'W'WQ\U).
Proof.
(3.8)
B y Theorem 3.1 the risk function of an equivariant estimator p,
given Ao — 1, is R(6,p)^R((e I),,l) 1
= E(k(U)WQ
- e)'(k(U)WQ
-
e).
Using the fact that U is ancillary, a unique BEE is obtained as p =
k(u)WQ
where k(u) minimizes the conditional risk given U E{((k(U)WQ
- e)(k{U)WQ
T h i s implies k(U) is given by (3.8).
-
e)')\U\. •
Equivariant
Estimation
in Curved Models
53
3.1.1. Maximum likelihood estimators T h e maximum likelihood estimators ft and £ of p and £ respectively under the restriction An = I are obtained by maximizing
-|log
(det £ ) - |
tr 5 S "
1
- | j £ - M J ' E
-
1
^ - M ) -
(3.9)
with respect to p, £ where 7 is the Lagrange multiplier. Maximizing (3.9) we obtain p—~—, n + 7
S - - 5 + 71
7
M ' ( 7 + C
I
.
(310)
- l
Using the condition J t ' E / i = 1 we get .
(U-^U(4
+ 5U)\
^ (3.11)
The maximum likelihood estimator (mle) p is clearly equivariant and hence it is dominated by jl of Theorem 3.2 for any p. I n the univariate case the mle is
(3.12) 71
4
Thus (3.12) is the natural extension of (3.11).
Hinkley (1977) investigated
some properties of this model associated with the Fisher information. A m a r i (1982 a, 6) proposed through a geometric approach what he called the dual mle which is also equivariant.
3.2. A Particular Case
As a particular case of A constant we consider now the case £ = 2
C
is known. L e t Y = y/nX,W
for this problem, and
= tr S. T h e n (Y,W)
J when
is a sufficient statistic
i distributed independently of Y as X ( - i ) s
n
p
We are interested here to estimate /t with respect to the loss function
(3.13)
54
Group Invariance
in Statistical
Inference
T h i s problem remains invariant with respect to the group G = R+ x
0(p),R+
being the multiplicative group of positive reals and O(p) being the multiplicative group of p x p orthogonal matrices transforming Xi —t bTXi , t = 1 , . . .
,n
d -4. bTd,
^gf) where (6,T) € G with b e R+,T
a ~
n,
g O ( p ) . T h e transformation induced by G on
(Y, W) is given by 2
(Y,W) T h e theorem
below
— (!>rY, f> W).
gives the representation of an equivariant estimator
under G . T h e o r e m 3.3.
An estimator d(Y, W)
exists a measurable function
h ;R
d{Y,W)
for all [Y,W)£R? Proof. h (^w~j" Y
x
+
is equivariant
=
h{^p}Y
R. +
If h is a measurable function from if+ t
n
e
n
if and only if there
—* R such that
—* R and d[Y, W) —
clearly d is equivariant under G . Conversely if d is equiv-
ariant under G , then d(Y,W) for all T £ 0{p),Y generality that Y'Y
p
€ R ,b
(VY = b r d { — ,
¥
W\ )
(3.14)
> 0 , W > 0. We may assume without any loss of
> 0.
L e t Y, W be fixed and
-ft) where di is the first component of the p-vector d. p x p matrix such that Y'A = (\\Y
||,0,...,0)
Let A € O(p) be a fixed
Equivariant
Estimation
Let A be partitioned as A = ( A J . A J ) where A, T = {Ai,A B)
with B e 0(p-l)
2
d(Y,W)
=
in Curved Models
1
\\Y\\~ Y.
55
Now choose
and b = \\Y\\. From (3.14) we get
= d
^(l,0...0).^)r.
1
+ \\Y\\A Bd ((l 0...Q) ~) 3
a
t
.
t
(3.15)
Since (3.15) holds for any choice of B € 0(p — 1) we must have then
d(Y,w)
= d ((i,o...o).^)r.
•
l
It may be easily verified that a maximal invariant under G in the space of l
sufficient statistic V = W~ Y'Y
and a corresponding maximal invariant in
the parametric space is
As the group acts transitively on the parametric space the risk function (using (3.13)) fl(M)
=
£„(£(M))
of any equivariant estimator d is constant.
Hence we can take p = po
=
[C, 0 . . . 0)'. T h u s the risk of any equivariant estimator d can be written as
fl(po,d)
=
(L (p ,k
y ) )
0
= Eu (E(L(p ,h(^-} 0
r))
n
V
= *) •
To find a B E E we need to find the function ho satisfying Ep (L(p ,h (V)Y\V Q
for all h : R
+
0
= v)<
0
Ep {L{po,h{v)y)\V 0
= v)
—* R measurable functions and for all values v of V. Since
EpampoM^YW
= v)
2
= h (v)Epo(Y'Y\V
=v)~
2h(v)Epo(Y \V l
= v) + l
where K = ( y . . . , y ) \ we get l t
p
EMY \V Ep (Y'Y\V i
k
o
{
v
)
-
0
= v) = v)-
f
3 ( 3
. '
6
1
1 6 )
56
Group Invariance in Statistical T h e o r e m 3.4.
Inference
The BEE da(x ..
.,X„;C)
lt
= d (Y,W)
is given by
0
d {Y,W) 0
' ™ r f j n y + i + l ) (n&
nC
h i^ ) \
2
r
/
V
^
2
+i+l v
(3.17) t
rti"P+j+i)/ BC V
h
hir+W
\
2
J
.
where t — y ( l + ti) —1 P r o o f . T h e joint probability density function of Y and W under the assumption fi — fi
0
is 2
1
[ exp{-(C /2)(j/y - 2y5gi + n + I K ) } ^ ' " - ' " 2
/(y>) = {
1
2
2"p/ (C )""p/2(r(i))pr((n - l)p/2)
, if w > 0
0, otherwise. Hence the joint probability density function of Y, V is given by (• /
y
„ „ „j (
2
exp{-(C /2[((l+ i Q W i , + 2 ^
=
I
2
( C
2
) - ^ \ r ^ m ( n -
w
]}
l)p/2)
0, otherwise.
Using (3.16) we get (3.17). C o r o l l a r y . If m - p(n - l ) / 2 is an integer, the B E E is given by 2
n <" do(Y,W)= ^h (v) T
=
0
with <j(t) = p(t)/w(t)
g(t)X,
where m+ m+l
u(t) =
1
m + l
y ,
3
m + l \ /'nC \ Id r ( | p + i)
3
...
(3.18)
Equivariant
P r o o f . Let Y
k
Estimation
in Curved Models
57
be distributed as %%• T h e n c E
./
y n
^_o r(fc/2 + ) t t
-k
a
2
^ ~
l
r(fc/2)
f
Q
>
^ " '
Hence with m as integer t E i Y z ^ M - n C H m ^ y t h (v)
= v ^ C
0
=
/ v
2
^ £ 3=0
2
)exp{-nC (/2}(^)
2
SC £(V
m 1
)/£(V
m + 1 2
2
2
noncentral Xp( C t).
4
),
(3.19)
n(
2
where Vj is distributed as noncentral X ,+2( ~' i)^ n
J
2
Hence letting V = ^ ( t f )
a
and ^2 is distributed as
° d taking r as an integer we
get
Prom (3.19) and (3.20) we get (3.18).
•
N o t e 1. g(t) is a continuous function of f, and
2
lirn
S
(f) = TiC /p,
(-.0+ hmg(t)
and g(t) > 0 for all t. T h u s when Y'Y N o t e 2. We can also write d
Q
=
g { l ) < l ,
is large the B E E is less than X.
= (1 - ^ ) X where r(t))/t) = 1 - g(t).
This
form is very popular in the literature. Perron and G i r i (1990) have shown that g{t) is a strictly decreasing function of t and T(V) is strictly increasing in v. T h e result that g(t) is strictly decreasing in f tells what one may intuitively do if he has an idea of the true value of C and observe many large values concentrated. Normally one is suspicious of their effects on the sample mean and they have the tendency to shrink the sample mean towards the origin. T h a t is what our estimator does. T h e result that T(V) is strictly increasing in v relates the B B E of the mean for C known with the class of minimax estimators of the mean for C unknown. Efron and Morris (1977) have shown that a necessary condition
58
Group Invariance
in Statistical
Inference
for an equivariant estimator of the form g{t)X
to be minimax is g(t) —* 1 as
t —* 1, SO our estimator fails to be minimax if we do not know the value of C. O n the other hand Efron and Morris (1977) have shown that an estimator of the form d = (1 — 0 < T(V) <(p-
Z^p-)X
is minimax if (i) r is an increasing function, (ii)
2 ) / ( n - 1) + 2 for all v e (0, oo). T h u s our estimator satisfies
(i) but fails to satisfy (ii). So a truncated version of our estimator could be a compromise solution between the best, when one knows the values of C and the worst, one can do by using an incorrect value of C. 3.2.1.
An application
T h e following interesting application of this model is given by Kent, Briden, and Mardia (1983). T h e natural remanent magnetization ( N R M ) in rocks is known to have, in general, originated in one or more relatively short time intervals during rock-forming or metamorphic events during which N R M is frozen in by falling temperature, grain growth, etc. T h e N R M acquired during each such event is a single vector magnetization parallel to the then-prevailing geometric field and is called a component of N R M . B y thermal, alternating fields or chemical demagnetization in stages these components can be identified. Resistance to these treatments is known as "stability of remanence".
A t each stage of
the demagnetization treatment one measures the remanent magnetization as a vector in 3-dimensional space. These observations are represented by vectors Xi,...,X
3
in fi . T h e y considered the model given by j j j — a , +0i+ei
n
where
c»i denotes the true magnetization at the ith step, 0i represents the model error, and ei represents the measurement error. They assumed that 0, and e< 2
are independent, p\ is distributed as N {0,T {c>i)I),
and e is distributed as
3
z
2
A M 0 > c r ( a , ) / ) . T h e Q are assumed to possess some specific structures, like ;
collinearity etc., which one attemps to determine. Sometimes the magnitude 2
of model error is harder to ascertain and one reasonably assumes r (a) 2
In practice <J {a) is allowed to depend on a ; and plausible model for 2
which fits many data reasonably well is cr (a)
— a{a'o)
Maximum likelihood estimator
T h e likelihood function of xi,...
x
n
with C known is given by
L(xi,...,x \p) n
=
C ^ ) - -
/
2
e x p | - ^ - ( , + j ' ,-2^'M + V (
/
J
2
c (a)
+ b with a > 0, b > 0.
When a'a large, b is essentially 0 and a is unknown.
3.2.2.
= 0.
M
) } •
Equivariant
Estimation
in Curved Models
59
Thus the mle p of p (if it exists) is given by 2
[{np/C )(p'p)
-w-y'y
+ 2 \ / n i / A ] £ = Vnp'py
(3.21)
•
If the E q . (3.21) in ft has a solution it must be colinear with y. L e t p = ky be a solution. From (3.21) we obtain 2
2
k[(np/C )y'y)k
+ Jn~y'yk - {y'y +w)] = 0.
Two nonzero solutions of A; are 1
-1-(1 + ^ ( ^ ) ) 1
_
iJkpIC
/
2
- 1 + (1 +
2
2
'
^ ( ^ ) )
1
/
2
2
~
2^lp/C
To find the value of k which maximizes the likelihood we compute the matrix of mixed derivatives 2
a ( - logL) dp'dp
kHy'y)'
and assert that this matrix should be positive definite. T h e characteristic roots of this matrix are given by + 2npk
2
3
k y'y
'
A
-
2
x/nC
2
k y'y
If k = ki, then Ai < 0 and A < 0. B u t if k = k , then A, > 0, A > 0. 2
Hence the mle p. — d-i(xi,...
2
,x ,C)
2
is given by (corresponding to positive
n
characteristic roots) 2
d {x ,...x ,C) 1
1
n
=
(1 + 4 p / C ( ) 2p
1 / 2
- 1
2
C X.
(3.22)
Since the maximum likelihood estimator is equivariant and it differs from the B E E do the mle d\ is inadmissible. T h e risk function of do depends on C . Perron and Giri (1990) computed the relative efficiency of do when compared with d]_, the James-Stein estimator (d ), 2
the positive part of the James-Stein
estimator (d ), and the sample mean X ( d j ) for different values of C, n, and p. 3
They have concluded that when the sample size n increases for a given p and C the relative efficiency of do when compared with di, i = 1,.., 4 does not change significantly. T h i s phenomenon changes markedly when C varies. When C is small, dp is markedly superior to others. O n the other hand, when C is large
60
Group Invariance in Statistical
Inference
all five estimators are more or less similar. These conclusions are not exact as the risk of d ,di
are evaluated by simulation. Nevertheless, it give us sufficient
0
indication that for small values of C the use of B E E is clearly advantageous. 3.3. B e s t E q u i v a r i a n t E s t i m a t i o n in C u r v e d C o v a r i a n c e M o d e l s Let Xi,...
,X (n
> p > 2) be independently and identically distributed
n
p-dimensional normal vectors with mean vector p and positive definite convariance matrix S . Let E and S =
(X / i
£ — 1 £ll
- X)(X
a
- Xf
a
(
P-IV ^12
be partitioned as
I
.5—1
i Sn
p - i V S\2 I
We are interested to find the B E E of 0 — E ^ ^ i on the basis of n sample observations Xi,...,x„ 2
coefficent p
when one knows the value of the multiple correlation
— E ^ E ^ s E ^ E21.
2
If the value of p
is significant, one would
naturally be interested to estimate 0 for the prediction purpose and also to estimate £22 to ascertain the variability of the prediction variables. L e t Gi(p) 0{p)
be the full linear group of p x p nonsingular matrices and let
be the multiplicative group of p x p orthogonal matrices. L e t Hi be the
subgroup of Gi(p) defined by /
' An
P-1 0
\
0
h22
Hi = {ft € Gi(p): ft =
and let H
2
p
be the additive group in R .
of H i and H .
Define G = Hi © H
2
2
The
T h e corresponding transformation on the sufficient statistic is given by
(X,S)
2
T h e transformation g = (hi,k )
the direct sum
2
€ G transforms H .
transformation g — (fti,fta) € G transforms X{ - t fti^i + h , 3
t = l,...,n,
( M , E ) - * (ftiM + h a . f t i S f t i ) .
- * (fti-A + / i , ftiSftJ). A maximal invariant in the space of (X, S) under G is 2
R? =
S12S22S21 2
and a corresponding maximal invariant in the parametric space of (p, E ) is p .
Equivariant
3.3.1.
Characterization
Let S
Estimation
in Curved Models
61
of equivariant estimators of £
be the space of all p x p positive definite matrices and let G £ ( p ) be
p
the group of pxp lower triangular matrices with positive diagonal elements. A n equivariant estimator d{X, S) of £ with respect to the group of transformations p
G is a measurable function d(X, S) on S xR
to S
p
d(hX for all S £ S ,h
£ Hi,
p
+ £, hSh') = hd(X, p
and X,£
€ R'
satisfying
p
S)ti
From this definition it is easy to
remark that if d is equivariant with respect to G then d(X,S) P
all X £ R ,S
£ S. p
= d(0,S)
for
T h u s without any loss of generality we can assume that d
is a mapping from S
p
to S . p
Furthermore, if u is a function mapping S
p
into
another space Y (say) then d* is an equivariant estimator of u ( £ ) if and only if d* — u • d for some equivariant estimator d of £ . Let
and let G be the group of induced transformations corresponding to G on 0 . P
L e m m a 3 . 1 . G acts transitively on 0 . P
Proof.
It is sufficient to show that there exists a g
— ( A , £ ) £ G with
ft £ Hi, ^ £ RP such that 1
1 p .0
(/i/t + £> A£A') =
If p = 0, i.e., £ p # 0, choose A
1 2
n
= 0, we take ha = E
1
/
2
u
,
A
2 2
1
= £,,
= F £^3
P 0 0 /
p 1 0 1 / 2
2
\
j
/
1 p-2
N (3.23)
/
, f = - f t p to obtain (3.23). I f
, where T is a (p - 1) x (p - 1)
orthogonal matrix such that £n
1 / 2
E
1 2
£ 2
1 / a 2
r = 0 , 0,
0),
and £ = - A / / to obtain (3.23). T h e following theorem gives a characterization of the equivariant estimator
d(S) of S .
62
Group Invariance
in Statistical
Inference
T h e o r e m 3 . 5 . An estimator d ofT. is equivariant the
a
{
if and only if it admits
decomposition
S
,
~
2
{a^WR-iSu
1
R~ a tR)S S,- S 2 22
21
1
1
+
1
C(R){S22-S S- S ) 2l
l2
(3.24) where C{R)
> 0 and /f,\
a A
[
K
)
_ ( a-ii{R) - \a (R)
a (R) a (R) 12
2l
is a 2 x 2 positive definite matrix.
if and only if a Note. aij
l n
ai a 2
2
Furthermore
d i di d 2 Y
22
2
2
2 2
a \ = f> • 2
T h e d;j are submatrices of d as partitioned in (3.24) and aij
=
(R). Proof.
T h e sufficiency part of the proof is computational.
in verifying d(hsh')
— hd(S)h'
1
1
2
It can be obtained in a straightforward way from the compu-
2
tations presented in the necessary part. To prove the necessary part we observe that
P
- ( R
FI
*)•
>°<
and d satisfies Ii
0
o
r
for all T € 0(p - 2). T h i s implies that
o \ _ /MR)
P 0 with C(R)
I t consists
for all h g H, S e Sp and d^, d^d^o d i —
I-) P
{
2
0
o C(R)I _ P
2
> 0.
In general, S has a unique decomposition of the form S =
Sn
S\
Sn
S)
l2
22
_(Ti \ 0
0 \ / l T j \U
V \(T[ V » J U
0 T
2
Equina Hani Estimation
with 7\ G G £ ( 1 ) , T
1
in Curved Models
>
e G+(p - 1}, and U = T - S T,- .
2
2
63
Without any loss of
21
generality we may assume that U ^ 0. Corresponding to U there exists a B G 1
0 ( p - 1) such that U'B = {R, 0, . . .0) with R = \\U\\=
1
( S ^ S^S^
5 )=
>
2 1
l
0. F o r p > 2, 5 is not uniquely determined but its first column is
R~ U.
Using such a B we have the decomposition 1 U
W \ _(\ !„.,) ~ {0
O W P Bj \ 0
0 J_) P
2
\ (l \0
0 B'
and :
« - ( ? i)-(J S)(T £)(! *°)(» i) (
n
Of
an(fi)
R-'autRJl/'
V TJ
0
A o
13
p - i - W )
2
fi-'a^ffi)^!
R- a 2(R}S - S- 'S 2 2
2 l
l
+ C(R){S
l
-S iS^S )
22
2
l2
which proves the necessary part of the theorem. 3.3.2.
•
Characterization of equivariant estimators of 0
T h e following theorem gives a characterization of the equivariant estimator of 0. T h e o r e m 3 . 6 . If d* is an equivariant
estimator
of 0 then d*(S)
has the
form 1
1
d*(S) = R' a(R)S S , 22
where a : R
,
2
—* R.
+
P r o o f . Define u:S
p
1
^ RP'
by
«(£) = £ = £ - % ! . If a" is equivariant then d*(S) = (R-^WSnS^Su
+ C(R)(S
2
= (T (R- a (R)UU' 2
l
+ (1 -
22
1
l
= R- a{R)S - S i 2
2
2
l
-
S^S^S ))~ S^R-'a^R) 12
r
+ CfflHCV! -
22
= R~ (a (R)
22
•
2
v
UU )T^S R- a {R) 2l
1
21
1
R )(C(R))- a (R)S - S 21
2 2
21
•
64
Group Invariance in Statistical
Inference
T h e risk function of an equivariant estimator d" of 0 is given by R(0,d*)
=
Ex(L((3,d')) 1
= E {S^(R- a(R)S S -. L
n
- 0}'S {R~
l i l
- 2R- a{R)S-[ S 0
z
l
a(R)S
22
1
= E {a?{R)
T h e o r e m . 3.7.
2
l
l
S
22
+ S^0'S 0}
12
-
2l
.
22
0)} (3.25)
T h e best equivariant estimator of 0 given p, under the
loss function L is given by l
l
R- a*(R)S - S , 2 2
(3.26)
2
where £
a'(R)
= r p
r
2
(
n ± i
£ r Proof. a*(R)
+
l
r ( ^ - M )
)
( V V ) 7 I T
1
(3.27)
. 2
2
(2=1 +
m
) (2 )
m
p
/ m ! r (2=1 + m)
r
From (3.25), the minimum of R(0,d*)
— E^S^
+
^ _
1
Si20R~ \R).
is attained when a(R)
=
Since the problem is invariant and d" is equiv-
ariant we may assume, without any loss of generality, that
with C(p) = \ \
M
,1)
f
£ j . To evaluate o*(fi) we write 5
2 2
= T T ' , T e G + ( p - l ) , r = (%)
5
2 1
= RTW,
S
u
0 < B < l , B
r
1
6
fl"" ,
= W W .
T h e joint probability density function of (R, W,T)
1
P-i
is ( G i r i (1977))
i
i=2j=l P
1 *
e
x
p
{
~ 2 ( i -
2 P
)
(
W
i
"
+
A straightforward computation gives (3.26).
* » -
2
T
P ^
W
_
l
^ n i=i
fe)
• (3.28) •
Equivariant
Estimation
in Curved Models
65
L e m m a 3 . 2 . Let 0 > 0, 0 < 7 < l , m be an integer. Then ^Vja
+ m + l) T(0 + T{a
1=0
l)^
+1)
Proof.
^ r ( a + m + l)r(^ + l ) j i=0
d" (=1
-0 = (1 - 7 ) ^ ) ^ ( 1
+ „)•+«-» ( l -
^
• T h e o r e m 3.8. / / m —
—p) is an integer then
a ' M ^ - ^ - l p y m, 1 - r
i=0
1 -
V
r V
P r o o f . Follows trivially from L e m m a (3.2).
Note.
If p
2
is such that terms of order ( r p )
2
and higher can be ne2
glected, the best equivariant estimator of 0 is approximately equal to p (n — 1) l
1
(p-ir S S i. 22
2
T h e mle of 0 is
SS. 22
2S
66
Group Invariance
in Statistical
Inference
Exercises 1. Let X
a
= (Xai,-
-,X ,)',a
= 1,...,N(
aj
> p) be independently and iden-
tically distributed p-variate normal random vectors with common mean fi and positive definite common covariance matrix £ and let
show that if & = A V E
- 1
l
/ * is known, then NX'(S
+ NXX')~ X
is an
ancillary statistic. 2. L e t X
a
= (X i,...,X y,a a
= 1 , . . . ,2V( > p) be independently and identi-
ap
cally distributed p-variate normal random vectors with mean 0 and positive definite covariance matrix £ and let E
S
_ (
(12) \
\£(21)
with £
( l l
) : lxl.
Given the value of p
2
-
£
( 1 2
,£ ~ (
2 )
£(
2 1 1
/ £ i i find the
ancillary statistic. 3. (Basu (1955)). If T is a boundedly complete sufficient statistic for the family of distributions {Pg,9
g ft}, then any ancillary statistic is independent of
T. 4. F i n d the conditions under which the maximum likelihood estimator is equivariant.
References 1. S. Amari, Differential
information
geometry
of curved
exponential
families
- curvatures
and
loss, Ann. Statist. 10, 357-385, (1982a).
2. S. Amari, Geometrical
theory of asymptotic
ancillary
and conditional
inference,
Biometrika, 69, 1-17 (1982b). 3. D. Basu, On statistics
independent
of sufficient
statistic,
Sankhya 20, 223-226
(1955). 4. D. R. Cox and D. V, Hinkley, Theoretical Statistics, Champman and Hill, London, (1977). 5. B. Efron, The geometry of exponential families, Ann. Statist., 6, 262-376, (1978). 6. B. Efron and C, Morris, Stein's
estimation
rule and its competitors,
An
empirical
approach, J . Amer. Statist. Assoc. 68, 117-130, (1973). 7. B. Efron and C . Morris, Stein's paradox in statistics, Sci. Ann., 239, 119-127, (1977).
Equivariant
8. R. A. Fisher, Theory
of statistical
estimation,
Estimation in Curved Models
67
Proc. Cambridge Phil. Soc.
22,
700-725, (1925). 9. N. Giri, Multivariate
Statistical
10. D. V. Hinkley, Conditional of variation,
Inference,
inference
Academic Press, N . Y . , 1977.
about a normal mean with known
coefficent
Biometrika, 64, 105-108, (1977).
11. T . Kariya, N. Giri, and F . Perron, Equivariant l
of Npifi, E ) with u"Z~ u
V
estimator
of a mean vector
2
= 1 or £" V
= c or £ = a (u'fi)I,
J.
/i
Multivariate
Anal. 27, 270-283 (1988). 12. J . Kent, J . Briden, and K . Mardia, Linear tivariate
data as applied to progressive
and planer structure
demagnetization
in ordered
of palaemagnetic
mulrema-
nence, Geophys. J . Roy. Astronom. Soc. 75, 593-662, (1983). 13. F . Perron and N. Giri, On the best equivariant normal population,
14. F . Perron and N. Giri, Best equivariant 40, 46-55 (1992).
estimator
of mean of a
multivariate
3. Multivariate Analysis, 32, 1-16 (1990). estimation
in Curved
Covariance
Models,
Chapter 4 SOME B E S T INVARIANT T E S T S IN MULTINORMALS
4.0. I n t r o d u c t i o n In Chapter 3 we have dealt with some applications of invariance in statistical estimation. We discuss in this chapter various testing problems concerning means of multivariate normal distributions. Testing problems concerning discriminant coefficents, as they are somewhat related to mean problems will also be cousidered. This chapter will also include testing problems concerning multiple correlation coefficient and a related problem concerning multiple correlation with partial informations. We will be concerned with invariant tests only and we will take a different approach to derive tests for these problems. Rather than deriving the likelihood ratio tests and studying their optimum properties we will look for a group under which the testing problem remain invariant and then find tests based on the maximal invariant under the group. A justification of this approach is as follows: If a testing problem is invariant under a group, the likelihood ratio test, under a mild regularity condition, depends on the observations only through the maximal invariant in the sample space under the group {Lehmann, 1959 p. 252). We find then the optimum invariant test using the above approach, the likelihood ratio test can be no better, since it is an invariant test.
4.1. T e s t s of M e a n V e c t o r Let X = (Xi,...,X ) be normally distributed with mean £ — E(X) = (£i> — l i p ) and positive definite covariance matrix £ — E(X — u) (X — p). Its pdf is given by p
1
68
Some Best Invariant
fx(x)
2
= {2 r)-"/ (det
1
j-itrE" ^
exp
7
Tests in Mattinormals
- 0 & -
tf
} •
69
(4-1)
In what follows we denote 3: a p-dimensional linear vector space, and X', the dual space of X. T h e uniqueness of nonmal distribution follows from the following two facts: (a) T h e distribution of X is completely determined by the family of distributions olB'X,
9 £ X'.
(b) T h e mean and the variance of a normal variable completely determines its distribution. For relevant results of univariate and multivariate normal distribution we refer to Giri (1996). We shall denote a p-variate normal with pdf (4.1) as JV (f, S ) . p
On the basis of observations x" N {l, p
— (x ,...,
x)
ai
, a — I,. ..,N
ap
from
E ) we are interested in testing the following statistical problems.
P r o b l e m 1, T o test the null hypothesis HQ: £ = 0 against the alternatives 2
Hi: £ j£ 0 when g, S are unknown. T h i s is known as Hotelling's T
problem.
P r o b l e m 2. T o test the null hypothesis HQ: £i — • • — £ , — against the p
alternatives H i : £ # 0 when £, £ are unknown and p > p i . P r o b l e m 3 . T o test the null hypothesis Ha : £i — * * > — £ alternatives H i : & = • - * = ^ Let X
-
jj^X",
P l
against the
= 0 when p, £ are unknown and pi + p
S = E^(X
(minimal) for (£, £ ) and \/NX
a
- X)(X
a
- X)'.
(X,S)
2
< p.
is sufficient
is distributed independently of S as N (y/ntl,
E)
p
and S has Wishart distribution Wj,(JV—1, E ) with JI = J V - 1 degrees of freedom and parameter E . T h e pd/ of S is given by n
W (n, p
2
: i : :
f X{detE)' / (dets) £ } = < if i positive definite, s
1
1
F exp{-|trE- a} (4.2)
s
I 0 otherwise. Problem 1 remains invariant under the general linear group G ; ( p ) transfering (X, S) -* (gX, gSg'), g G Gi(p).
From E x a m p l e 2.4.3 with di = p
R =NX'(S
+
1
nXX')- X
l
NX'S- X
(4.3)
l+NX'S-tX
is a maximal invariant in the space of (X, S) under G ( p ) and 0 < R < 1. (
70
Group Invariance in Statistical
Inference
T h e likelihood ratio test of HQ depends on the statistic ( G i r i (1996)) T
2
= N(N 2
which is known as Hotelling's T T
2
-^X'S^X
(4-4)
statistic. Since R is a one-to-one function of
we will use the test based on R. From Theorem 2.8.1 the pdf of i i depends
only on 6 — i V ^ ' E ' ^ and is given by 1
M
>
1
r(p/2)r((7v- )/2)
'
P
It may be verified that 8i is a maximal invariant under the induced group (5j(p) = Gi(p)
in the parametric space of (£, E ) . Since £ is positive definite
tfi = 0 if and only if p = 0 and 6 > 0 for p ^ 0. F o r invariant tests with respect to G;(p) Problem 1 reduces to testing HQ: 6 = 0 against the alternatives Hi6 > 0. From (4.5) the pdf of R under HQ is given by r ( a
Mn) =
J—rM(i-Tjtl*-*&-*
r ( f ) r ( V ) if 0 < r < 1, 0 otherwise,
( 1
which is a central beta with parameters ( | , ^j^)-
4.
6 }
'
From (4.5) the distribution
of R has monotone likelihood ratio in i i (see Lehmann, 1959). T h u s the test which rejects HQ for large values of R is uniformly most powerful invariant (UMPI)
under the full linear group for testing HQ against H i .
Since R ^ constant, implies that T
2
^ constant we get the following theo-
rem. T h e o r e m 4.1.1. whenever T
2
For Problem
1, Hotelling's
T
2
test which rejects HQ
^ C, the constant C depends on the level a of the test, is
for testing HQ against
UMPI
Hi.
The group Gt{p) induces on the space of {X, S) the transformations (X,S)^(gX,gSg'),
9
€G,(p).
It is easy to conclude that any statistical test
a function of
{X,S}
whose power function depends on ( / i , £ ) only through 6, is almost invariant under G j ( p ) . Since
Some Best Invariant
E ^(X S) ll
=
<
Tests in Multinomials
71
E . _ . . .ct>(X,S) g 1)l
=
g lEg l
E 2
thus = 0
E^((X,S)-
Using the fact that the joint probability density function of
( X , S) is complete, we conclude that
=
1
4'(gX,gSg•}
for all g 6 <3J(P)> except possibly for a set of measure zero. A s there exists a left invariant measure (Example 2.1.3) on G ( p ) , which is also right invariant, (
any almost invariant function is invariant ( L e h m a n n , 1959). Hence we obtain the following Theorem for Problem 1.
T h e o r e m 4.1.2.
Among
pending on 6, Hotelling's
2
T
all tests of HQ: £ — 0 with power function test is
de-
UMP:
P r o b l e m 2. Let Ti be the group of translations such that t i , g T i translates the last p — pi components of each X", a — 1 , . . . , N and let Gi be the subgroup of Gi{p)
such that g 6 G\ has the following form g=
(m
o
\9{21) 9(22) where g^ j U
is the p
x
x p
x
upper left-hand corner submatrix of g. T h i s problem
is invariant under the affine group ( G ^ T i ) such that, for g £ Gi,tj
€ Ii,
(g, ( i ) transform a
a
X ^gX
+ t
u
a =
l,..,,N.
A maximal invariant in the space of (X,S)
under ( 0 i , T i ) is Bi,
R ^ N X ^ S ^ + N X ^ X ' ^ r ' X ^
where (4.7)
as defined in E x a m p l e 2.4.3 with di == Jh- A corresponding maximal invariant in the parametric space of (£, E ) under ( G i . T i ) is #i = N ^ E ^ . ^ J J . variant test this problem reduces to testing H Hi:
&i > 0. From Theorem 2.8.1 the pdf of R
t
X
For in-
: §i = 0 against the alternatives
0
is given by
72
Group Invariance
in Statistical
Inference
From (4.8) it follows that the distribution of R\ possesses a monotone likelihood ratio in i%. Hence we get the following Theorem. T h e o r e m 4 . 1 . 3 . For problem 2 the test which rejects Ho for large values of Ri is UMPI
under (Gi,Ti)
for testing Ho against
H\.
N o t e : A s in Problem 1 this U M P I test is also the likelihood ratio test for Problem 2.
Using the arguments of Theorem 4.1.2 we can prove from
Theorem 4.1.3 that among all tests of Ho with power function depending on Si, the test which rejects Ho for large values of Ri is U M P for Problem 2. P r o b l e m 3.
Let T
be the translation group which translates the last
2
a
p — pi — P2 components of each X , of Gi(p) such that g e G
a = 1 , . . . , N and let G
0 \ 0 1 3(32) ff(33] /
\9(31)
is pi x p and <7( ] is p2 x p . t
affine group (G2,T )
22
a
a
X ^gX
(G ,T ) 2
2
2
and t
2
T h i s problem is invariant under the
transforming
2
where g £ G
be the subgroup
(9iu) 0 9(21) 3(22)
9 =
where
2
has the form
2
2
+ t
2l
a =
l,..,,N
€ T . A maximal invariant in the space of (X, S) under 2
is C f l M b ) ) where
1
R l + R - NXf2](Si22] + N ? [ ] A ' [ 2 ] ) " X [ 2 | 2
J
2
as defined in Example 2.4.3 with di = pi, d = p . 2
2
A corresponding maximal invariant in the parametric space of ( £ , E ) is (61,62), defined by,
^=^(1)^-^(1).
Under the hypothesis H ,$1 6
= 0, 6 = 0 and under the alternatives Hi, fij = 2
0, 6 > 0. From Theorem 2.3.1 the joint pdf of Ri, R 2
2
is given by
Some Best Invariant Tests in Multinomials
m
, _
f l
'
2 >
73
n | N ) r t l p i j r c ^ r t ^ p - p ! - ^ ) )
From G i r i (1977) the likelihood ratio test of this problem rejects Ho whenever
where the constant c depends on the level a of the test and under Ho, Z is distributed as central beta with parameters ( | ( i V — pi — p ) , \p?)2
From (4.9) it follows that the likelihood ratio test is not U M P I . However for fixed p, the likelihood ratio test is approximately optimum as the sample size N is large (Wald, 1943). T h u s if p is not large, it seems likely that the sample size commonly occurring in practice will be fairly large enough for this result to hold. However, if the dimension p is large, it might be that the sample size N must be extremely large for this result to apply. We shall now show that the likelihood ratio test is not locally best invariant ( L B I ) as S - * 0. T h e L B I test rejects H whenever 2
0
Ri +
N
~
P
l
R
> c
2
(4.11)
where the constant c depends on the level a of the test. which rejects Ho whenever R, + R
Hotelling's T
2
test
^ c, the constant c depends on the level O:
2
of the test, does not coincide with the L B I test, and hence it is locally worse for Problem 3. D e f i n i t i o n 4 . 1 . 1 . ( L B I test). For testing H
: 9 € £IH„,
a
an invariant test
4>* of level a is L B I if there exists an open neighborhood fii of fl/jo such that
Eebt?) >
E {4>\ e
0
e&
-fijio,
(4.12)
where 0 is any other invariant test of level a. T h e o r e m 4 . 1 . 4 . For testing H
0
0, the test which rejects HQ whenever
: h*i = 0, tf = 0 against H\ : S i = 0,6 2
N
R\ + ~?'
R
2
depends on the level a of the test, is LBI as 6 — * 0. 2
^ c, where the constant
>
2
c
74
Group Invariance
Proof.
in Statistical
Since (R\,R2)
Inference
is a maximal invariant under the affine group
( G 2 , r ) , the ratio of densities of (R ,R ) 2
1
to that under HQ, is
under Hi,
2
given by, m , ^ ) f(fi,f \0)
=
1
4
+
(
2
' _ 2 V
1
+
f
i
+
i v ^ P2
f
2
> )
+
o
(
,
2
)
as 6 —* 0. Hence the power of any invariant test d> is given by 2
E ($)=a-rE ^ $6 S2
s
D
(-l
2
+ R i + ^ ^ R ^ j
as 62 —* 0, which is maximized by taking $ = ^ c.
+
o(6 ) 2
N
1 whenever Ri + ~
Pl
R n 2
Asymtotically Best Invariant ( A B I ) Test Let # be an invariant test for testing H
0
: $1 = 0, 6 = 0 against Hi : S\ = 0, 2
6 = A with associated critical region fi = { ( f i . f a ) ! U{f\,ft)
^ c „ } satisfying
2
P « ( f i ) = a; 0
P (R)
= 1 - exp{-H(A)(l +o
Hl
2
f!-"- lff? = expWAJlGfAi + +
fifA)^,^)]} fi(f f ;A);
(4.12)
l7 2
where J B ( f i , f ; A ) — o ( H ( A ) ) as A —> 00 and 0 < ci < R(X) < c 2
2
< 00. Then
$ is an A B I test for testing HQ against H1 as A —1 00. T h e o r e m 4 . 1 . 5 . For testing HQ : S\ = 62 = 0 against H i : t>\ — 0 , £ — 2
A > 0, Hotelling's
test which rejects H
0
whenever Ri + R
2
^ c, the constant c
depending on the level a of the test, is ABI as A —* 0 0 . P r o o f . Since _,
,
.
,
a
a(a +1)
= exp{ar(l + o ( l ) }
2
x
as x —< 0 0 ,
we get from (4.9) T^j^j
=
e
x
^
f
" i " M(l
+ B(f f2-,\))}; u
(4.13)
where S ( f i , f ; A ) - o ( l ) as A - » 00. It follows from (4.2) that the pdf of U = R\ + R satisfies 2
2
Some Best Invariant Testa in Multinomials P(U(R ,R ) l
=
2
R
1
+ R
2
< c
a
75
)
= exp { ^ ( c « - 1 ) 0 + o ( l ) ) }
•
(4-14)
Thus, from (4.12) with H(X) = ~{c„ - 1), we conclude that Hotelling's T
2
test
is A B I .
•
4.2. T h e C l a s s i f i c a t i o n P r o b l e m ( T w o P o p u l a t i o n s ) Given two different p-dimensional populations Pi,P probability density functions pi,p
2
• . ., x Y P
characterized by the
2
respectively; and an observation x — [x\,
the classification problem deals with classifying it to one of the two
populations. We assume for certainty that it belongs to one of the two populations. Denote by Oj, t = 1,2 that x comes from Pi and let us denote, for simplicity, the states of nature Pi simply by i. L e t L ( j ,
be the loss incurred
by taking the action a; when j is the true state of nature having property that L(i,a<) - 0, L(i,a,j)
= ij > 0, j j£ i = 1,2,
L e t (x, 1—T) be the a priori probability distribution on the states of nature. The posterior probability distribution ( £ , 1 — £ ) , given the observation x, is given by _ 5
rrp^x)
(
«Pi(*J + {
l
The posterior expected loss of taking action a
l
and for taking action a
•
(4.16)
+ (1 - JT)P2(Z)
is given by
2
irpi{x)
+ (1 - iv)p {x) 2
(4.17)
'
From (4.16)-(4.17) it follows that we take action a
if
2
TC Pl(x)
,
2
x p i ( x ) + (1 - 7 r ) p ( i )
(1-^)C1P (X) 2
npi(x)
2
+ (1 -
w)p (x) 2
and action a j if 7rc pi(a;) 2
npi(x)
'
is given by
(1 - 7T)CiP2(l) Kpi(x)
,
1
>
+ (1 - n)p (x) 2
and randomize when equality holds.
{1 - ir)c (x) lP2
xp^x)
+ (1 - ?r)p2(i)
^
76
Group Invariance
in Statistical
Inference
(4.18) and (4.19) can be simplified as, take action a
if
> .,
take action a ,
if
< JiT
2
and randomize when
.— = K (say), (4.20)
— K.
L e t us now specialize the case of two p-variate normal populations with different means but the same covariance matrix. Assume Pi : W „ ( « , E ) , p :A. (M,S), 2
where a =
...,^ )', £ -
p
(ft,. ,-,£p)' e i i
p
p
and £ is positive definite.
Hence pa(«)
1
= exp - exp
[f> - O'S" ^
(s'E^O
Using (4.20) we take action a
2
-0
HYX-H*
4- | ( { ' E
_ 1
£ -
- f*)]
}
pE'V)
if
and take action a , otherwise. T h e linear functional T — £
- 1
x
( p . — 4) is called
Fisher's discriminant function. In practice E , £, ft are usually unknown, and we consider estimation and testing problems for T — ( r i , . . . , T ) . p
a
Let X , a
let Y ,
r
a — 1 , . . . , Ni be a random sample of size A i from N (£, p
E ) and
r
a — 1 , . . . , A7 be a sample of size A7 from A ( p , E ) . Invariance and 2
2
p
sifficiency considerations lead us to consider the statistic
s = Y, x {
a
-
+ J2( ° - )( - )'. Y
=
ya
y
a=l
a=l
where N,X
Y
a
X , A ^ y = J T ^ Y° for the statistical inferences of V.
We consider two testing problems about T. P r o b l e m 4. To test the hypothesis H alternatives Hi T ^ O .
0
:r
p
i
+
i = ••- —r
p
= 0 against the
Some Best Invariant
P r o b l e m 5 . T o test the hypothesis H
:T
0
alternatives Hi : r
p
i
+
P
2
i = ••• = T
+
Testa in Multinomials
= ••• = T
P 1 + 1
p
77
— 0 against the
= 0.
p
Without any loss of generality we restate our setup in the following canonical form, where X = {Xi,...
,X )
is distributed as J V ( 7 j , E ) and S is dis-
P
p
E ) and T — E J J . Since E is _ 1
tributed, independently of X, as Wishart W (n, p
positive definite, T = 0 if and only if n - 0. T h u s the problem of testing the hypothesis T = 0 against T ^ 0 is equivalent to Problem 1 considered earlier in this chapter. P r o b l e m 4, It remains invariant under the group G of p x p nonsingular matrices g of the form 0
=(m
)
9
\9{21)
(4.21)
9(22) }
where 9{\\) is of order p i , operating as (X S;T S) }
1
^
}
(gX^gSg'^g')- ?^^')
where X, jf are the mean and the sample covariance matrix based on N observations on X.
B y E x a m p l e 2.4.3 a maximal invariant in the space of the with di = pi, d% = P2
sufficient statistic (X, S) under G is given by (Ri^R^), and p — pi + P 2 - T h e joint pdf of ( r ? i , R ) 2
is given by (4.4) where the expres-
sions for 6i, 6 in terms of T, E are given by 2
Si + s
= AT'sr,
2
Nr'» M (x r r 22r
S = 2
with £
2
2
= (E 2) - E ( 2 i ) E j ( 2
_ n )
E(
(2)
1 2 )
)
_ 1
.
(4-22)
l
{2)
Under the hypothesis H 61 > 0, 0
02 — 0 and under the alternatives HiSi > 0, i = 1 , 2 . From ( 4 . 4 ) it follows that the probability density function of Ri under Ho is given by
m\$i)
= r(ip,)r(i(A/-p,)) x(fi)^
p ,
1
x exp | - ^ ! } 4> and Ri is sufficient for
W
" (l-fi)^ "
p l )
"
1
QjT, | M J
Ifi^l
,
(4.23)
t\.
Giri (1964) has shown that the likelihood ratio test for this problem rejects Ho whenever * =
i
T T 7 r ^
( 4
-
2 4 )
78
Group Invariance
in Statistical
Inference
where the constant c depends on the level a of the test and under H Z 0
central beta distribution with parameter {^{N
has a
- p ) , \p2) and is independent
oiRi. To prove that the likelihood ratio test is uniformly most powerful similar we first check that the family of pdf UXH%)-h
^0}
(4.25)
is boundedly complete. D e f i n i t i o n 4.2.1.
(Boundedly complete).
A family of distributions
{Ps(x)
'• 6 E fi} of a random variable X or the corresponding family of pdfs
{ps(x)
: 6 e fi} is boundedly complete if E h(X) 6
= J
k(x)dP (x)
= j
h{x)p {x)dx
s
s
= 0 for all 6 6 fi and for any real valued bounded function h(X), h(X)
implies that
= 0 almost everywhere with respect to each of the measures Pf.
We also say AT is a complete statistic or X is complete. L e m m a 4.2.1. plete.
The family of pdfs { / ( f i ^ i ) : 6\ > 0} is boundedly com-
P r o o f . For any real-valued bounded function h(R]} (4.23)) E- (h{Ri)) Sl
= j
Hnmr^dfi
= ex {-i*- } P
1
x j-
f = exp <
l - 1 ^
Jo
1
I^ )
r
l
of Ri we get (using
Some Best Invariant
Tests in MultxnoTmals
79
where 1
N
h*(r )=h{r\)fi™- (l-r )u -Ky-\ 1
l
g ( £ { P - P i ) , 5 ( * - P i ) + j)
a 3
B ( i ( J V - p,), i
P
+ j ) B ( i ( J V - p), i ( p -
l
and B ( p , g) is the beta function. Hence E g h(Ri)
E
-I
))
= 0 implies that
J
/
P l
^Cn)(n) ^i=0-
(4.26)
Since the right-hand side of (4.26) is a polynomial in 6\, (4.26) implies that all its coefncents must be zero. In other words
/ Jo
h*(f\)f{dfi
=0,
j =0,1,2,....
(4.27)
Let
+
where h"
-
and ft* denote the positive and negative parts of h'. Hence we get
from (4.27)
Jo for j=
Jo
0 , 1 , . . . ; which implies that h*+(f )
=
1
for all ^ h*(Ri)
h-(f ) l
except possibly on a set of measure 0.
= 0 almost everywhere {Pg (Ri) 1
Hence we conclude that
: Si ^ 0} which, in turn, implies that
h ( f i i ) = 0 almost everywhere. Since Ri
•
is complete, it follows; from L e h m a n n (1959), p. 34; that any
invariant test 4>(Ri,R ) 2
of level a has Neyman Structure with respect to
Ri,
i.e., E (
ll
2
i
=
c. l
To find the uniformly most powerful test among all similar invariant tests we need the ratio R R =
fH (fl,f \Rl
= fi)
}H {T\,T \Ri
= f%)
l
a
2
2
80
Group Invariance
where / ^ ( f i f f ^ - f i i
in Statistical =
r
Inference
i ) denotes the conditional pdf of (RuRg)
given Ri =
f i under Hi,i = 0,1. From (4.9) we get
H - exp | - i | ( l - f x ) }
~ P i ) , \(P ~ P i ) ;
2
-
From (4.28), using the fact that Z is independent of R± when H
0
(4-28) is true, we
get the following theorem: T h e o r e m 4.2.1. For Problem 4, the likelihood ratio test given in (4.24) is uniformly
most powerful invariant
similar.
P r o b l e m 5. It remains invariant under the group G of p x p nonsingular matrices g of the form
9 =
/ V > 3(21) \3(31)
0
0
3(22) 3(32)
\ 0 3(33)/
with 3 ( H ) : pi x p2, 9(22) : P2 x p2 submatrices of g. A maximal invariant in the space of (X,S)
(as in Problem 4) under G is (Ri, R , R3) as defined in 2
Example 2.4.3 with d\ = p i , d = p and d$ = p - pi - p2. B y Theorem 2.8.1 2
2
the joint probability density function of Ri, R , Ri is given by 2
1
x
f
i""- ^i»-
l F
i(r^-W-»J-l
x f l - n - ^ - f a j i ^ - p ' "
'=1
l,
J= l
1
>>J
where
"» - E J' p
and
CTQ
=
° ' P3 = p - p i - p a •
J
Some Beat Invariant
&1 = i V f E j i i j l ^ i ) + S s
( 1 2
|r
Tests in Multinomials
81
) + £(18)1(3) ) ' S j ~ , j
i 2
r
* ( ( i o ( i ) + S(i2)r ) + £(i3]r )), f !
s
r
( 3
s
A + 6 = f (n) (i> + ci2)r > + s i3)r V \ ( 2 1 ) ( l ) + £(22)^(2) + £(231^(3) / 2
( 2
S
/
j E(ii)r + £ VE(2i)r ) + £ (
x
( 1 )
, =
Under H , 0
W
r
( 3 )
;
,
1
( E
3 3
(
r -i-£ r ] r j + £(23)r(
( 1 3 |
( 1
S
(
- i ' '
s
r
1
2 2
( 3 )
j l 3 )
( 3 )
( 2
- ( ^ ; ) ' E
r
i
2 2
3 )
\ ^ J
. ( ^ ) , r
(
3
,
61 > 0, 6 = 0, 6 = 0, and under H i , Si > 0, S > 0, 6 = 0. From 2
3
2
(4.29) it follows that Ri is sufficient for S\ under H . 0
likelihood ratio test of this problem rejects H 1
Z =
~
R
1 -
i
;
S
a
0
3
From G i r i (1964) the
whenever $ c
tii
(4.30)
where the constant c depends on the level a of the test and under H distributed independently of H i as central beta with parameter (^(N
0
Z is
—p — x
J>2). \V2)Let
3
be any invariant level a test of H
0
against H i . From
Lemma 4.2.1 H i is boundedly complete. T h u s <j> has Neyman Structure with respect to R . x
Since the conditional distribution of ( H i , R , R3) given Ri does 2
not depend on c\, the condition that the level a test <j> has Neyman structure, that is, H
H D
(0(H ,H2,H )|HI)-Q 1
3
reduces the problem to that of testing the simple hypothesis 6 = 0 against the 2
alternatives 6% > 0 on each surface Ri — f\.
In this conditional situation the
most powerful level a invariant test of S = 0 against the simple alternative 2
S — 6 >0\s 2
2
given by,
rejectH
0
whenever
Q(JV -
P l
), ip
z ;
^f 6%j 2
^ c,
(4.31)
where the constant c depends on a. Since tf. = 2
(l-Hi)(l-Z)
and Z is independent of H i , we get the following theorem. T h e o r e m 4 . 2 . 2 . For Problem 5 the likelihood ratio test given in (4.30) is uniformly most powerful invariant
similar.
82
Group Invariance in Statistical
Inference
4.3. T e s t of M u l t i p l e C o r r e l a t i o n a
Let X ,
a = 1,
,N beN
independent normal p-vectors with common mean
fi and common positive definite covariance matrix E and let NX S = £% (X =1
a
a
-X)(X -X)'.
=
a
E%-,X ,
A s usual we assume JV > p s o that 5 is positive
definite with probability one. Partition E and S as
\E(2i]
E(22) / '
\ 5(21)
S( ) / 2 2
where E ( j , 5 ( ] are of order p— 1. L e t 2 2
2 2
_
2
s
(i3)E
=
"
E
( 3 2 )
( 2 1 )
E (u)
2
the positive square root of p
is called the population multiple correlation
coefficent. 2
P r o b l e m 6. To test H
;p
0
= 0 against the alternative Hi : p
problem is invariant under the affine group (G,T),
2
> 0. T h i s
operating as
(X, 5; p, E ) - » ( g X + t, gSg'; gp +1,
gXg')
where g g G i s a p x p nonsingular matrix of the form g=f
«") 0
9(22)
where p ( ) is of order p - 1 and T is the group of translations t of components 22 a
of each X . (G,T)
A maximal invariant in the space of {X,S)
under the affine group
is r
2
_ ^
2 2
5
> ( 22)5(21) 5
(H)
which is popularly called the square of sample multiple correlation coefficent. 2
D i s t r i b u t i o n of R . To find the distribution of R 2
5
R 1
S
(12)S[! ) (21) 5( ) - (i2)£ S 2
2
R
5
U
( Z 2 )
From Giri (1977) 5(u) -
5(i2)5, L£ 2
( 2 1 )
( 2 1
)
2
we first observe that
Some Best Invariant
is distributed as X N2
w P
t
'
2
Tests in Multinomials
83
N - p degrees of freedom and is independent of
n
2
S(i2). 5( 2). T h u s R /(l
- R ) is distributed as
2
S
1
• ( )5 2 5(22) 12
2
*N-p
E
(11) -
{2
|
S
S
(12) 2] (21) [2
But the conditional distribution of S
( 12) 5(22)
\/E(ii} - S
{ 1 2 )
S
2
( 2 2 )
S(
2 l )
given 5 ( ) , is (p — l)-variate normal with mean 2 2
^{22)^(22)^2) ^ E ( u ) - E(i2)S 2)S(21) (2
2
2
and covariance / . Hence R /(l 1
2
- R ) is distributed as |
/E( )S 5(22)E S ) pl21^( )^22)^ )^(2in 1 2
( 2 2 |
( 2 2 )
2 2
X%-
X
p
v
*
p
E
(
1
1
( 2 1
( 2 2
-EdajE^Efai)
)
where Xj.(A) denotes a noncentral chisquare random variable with noncentrality parameter A and k degrees of freedom. Also s
(i2)E E
2 2 )
S
S
| 2 2 |
S(
1 2
)
S
(12) (22) (21>
Xw—v Since
is distributed as chisquare S
S
S
(12) (~22) (21)
E(u) - E ( 2
5(
( 2 2 )
I 2 )
E (
2 1
E
p ( 2 1
)
2
1 -P
2
'
2
we conclude that R /(l
- R ) is distributed as 1
2
,2 I _P_
2
\
Using the fact that a noncentral chisquare random variable XmW represented as xt +2K> 1
w
n
e
r
e
can be
K is a Poisson random variable with parameter
^, we conclude that ^ ( r r ^ - i )
84
Group Invariance
in Statistical
is distributed as X - p _
1 + 2 K
Inference
- , where the conditional distribution of K given x ? v - i 2
2
is Poisson with parameter \[p l[l
P(K
2
- P )]XN-I-
2
Let A/2 - i p / ( l - p ) . Then
= k) = w
1
25< - >r(i(Ar_i))A;!
_
r(§(iV-i)+*)A* r ( | ( N - l ) + fc) ,
_
k
2
)
i
(
_ i ,
N
{
4
fc!r<±(iv-i)) with lb = 0, 1 , . . . . Thus
X p
is distributed as
-
1 + 2 K
lo U 1 M I 1 U U . c u d o
—T-
(4.34)
t—
2
N-
2
1 - B
X
where K is a negative binomial with pdf given in (4.33). Simple calculations yield
-
(N
2
p)R
2
(p~l)(l-R ) has central F distribution with parameter (p - 1, N - p) when p (4.33)-(4.34) the noncentral distribution of R (j f R
'
{ r
>-
r
2)l(N- -3) P
( r
2
— 0. From
is given by
2 l( -3) )
2
P
( 1
_
ftHtr-l)
r(i(N-i))r(i(iv-p)) , f . W W ( j ( f f - i ) x
fe
^(i( -D P
+
l
+ i)
.....
)-
( 4
'
3 5 )
It may be checked that a corresponding maximal invariant in the parametric 2
space is p . T h e o r e m 4.3.1. For problem 6 the test which rejects H
0
whenever R
the constant c depends on the level a of the test, is uniformly invariant.
most
2
^ C,
powerful
Some Beat Invariant
Tests in Multinomials
85
P r o o f . From (4.36)
2
~
(A )T^(Jv-i^)r(j(p-i)) 2
^
i!r(i(p-i) + i)r (i(7v-i))
2
2
which for a given value of p
is an increasing function of r .
'
Using Neyman-
Pearson L e m m a we get the theorem.
• 2
As in Theorem 4.1.2 we can show that among all tests of Ho : p against Hi : p
2
> 0 with power function depending only on p
2
= 0
2
the fi -test is
UMP. 4.4. T e s t of M u l t i p l e C o r r e l a t i o n w i t h P a r t i a l I n f o r m a t i o n Let X be a normally distributed p-dimensional random column vector with a
mean p and positive definite covariance matrix S , and let X ,
a — 1 , . . . ,7V
[N > p) be a random sample of size N from this distribution. We partition X as X = (Xi,X^,X'^Y
where X\ is one-dimensional, X^
is pi-dimensional
and X ( ) is p2-dimensional and 1 + p i + P 2 = p. Let p\ and p denote the multi2
ple correlatron coefncents of X\ with X[ )
and with (X'^X'^y,
2
respectively.
2
Denote by p\ = p — p\. We consider here the following two testing problems: 2
= 0 against the alternatives Hi\;
p\ — 0,
2
= 0 against the alternatives H \:
p\ = 0,
P r o b l e m 7. To test H m : p p\ = X > 0. P r o b l e m 8. T o test H o-
p
2
pl =
2
X>0.
Let NX
0
= X^,X ,
S = S^^A-"-X)(X°-X)',
6
(fJ
denote the i-vector
consisting of the first i components of a vector b and C[;j denote the i x i upper left submatrix of a matrix c. Partitions 5 and E as
(
5n
5
( 1 2
)
S(
5 ] 5(32 )
5(21]
\
5 ) I , 5(33)/
| 2 2
5(31|
1 3 )
[ 2 3
E
—
/ En
E(i2j
I S(2i) \^(31)
E(
E(
3 2
)
E( 3) ) E( )
E 22) (
1 3
2
3 3
where 5 ( ) and E ( 2 ) are each of dimension pi x p i ; 5 ( ) and E ( j are each 2 2
2
3 3
3 3
of dimension P2 x p . Define Pi = E ( i ) E ^ E( /E , 2
1
2
2
p =^+pI
2
=
2 )
2 1 1
1 1
22
{E, E, 3 )fg ; 12)
1
)
\
(32)
^ ^ " ' ( E d ^ ^ l ' / S n , ^(33) /
86
Group Invariance in Statistical
Rl
- £(i2)S
( 2 2 )
Inference
S< )/\Sii , 2 I
4 +fl = c% ^ ) ( f p jjgj ) 2
3)
_ 1
•
}
It is easy to verify that the translation group transforming (X,S
;
i
i
, E ) ^ ( X + &,S;p + 6 , E )
leaves the present problem invariant and, along with the full linear group G of p x p nonsingular matrices g
o
(gu 9= \ where ffii : 1 X 1, 9 {
2 2 )
0 0
: pi x p
u
o \
ff(23) 0 9(32) 9(33) / g : l33)
p
2
(4.36)
x p , generates a group which 2
leaves the present problem invariant. T h e action of these transformations is to a
a
reduce the problem to that where p, = 0 and S = £% X X '
is sufficient for
=1
E , where N has been reduced by one from what it was originally. We now treat later formulation considering X", a = 1 , . . . , TV ^ p > 2, to have a common zero mean vector.
We therefore consider the group G of transformations g
operating as (S,Z)^(gSg\gU) for the invariance of the problem. A maximal invariant in the sample space under G is {Ri,R )
as defined in (4.11).
2
Since S is positive definite with
probability 1 (as N ^ p ^ 2 by assumption): R, 2
> 0, R
> 0 and Ri + R
2
2
=
the squared sample multiple correlation coefficent between the first and the
remaining p - 1 components of the random vector X. A corresponding maximal invariant in the parametric space under G is (p\,p\). density function of (R^R ) 2
/a(n,f ) - m 2
2
N/2
- P r (l
T h e joint probability
is given by (see G i r i (1979))
N p
~ fi - f )^ - ~V 2
TJfo)**-*
x
" A S S
« w t * + A > —
1
9
,
1
•
< 4
'
3 7 )
Some Best Invariant Tests in Multinormals where
% = 1- E
ff
w
P)
p
a
i = E J'
?
i
t
h
=
=
1
'
~
f)/Wi-i. .2
7.
and i f is the normalizing constant. B y straightforward computations the likelihood ratio test of Hio when fl — { ( « , E ) : E 1 3 — 0 } rejects J?io whenever f ^ c ,
(4.38)
where the constant c depends on the size a of the test and under HM, RI has a central beta distribution with parameter ( i p i , ± ( N - p i ) ) , and the likelihood ratio test of H
when fl = { ( p , S ) : S
20
^
1
1
2
= 0 } rejects H Q whenever 2
' ^ ~ ^ 4 1 - rj
,
C
(4-39)
where the constant c depends on the size a of the test and under H o the 2
corresponding random variable Z is distributed independently of R\ as central beta with parameter (^(JV — p i — ps), \p )2
T h e o r e m 4 . 4 . 1 . For problem 7 the likelihood ratio test given in (4.38) is UMP invariant.
Proof. Under ff
u
Under H
i0
f
t
= 1 , i = 0,1,2.
2
Hence a
P . = A, p\ = 0, % = 1, 71 = 1 - \ a
2
2
2
_
i=0
A
= 0, §i = 0, i = 1,2.
= 1 - A,
81 = 4fiA and t? = 0. T h u s /fln(n,r ) _ ^
2
j-w/2
(801
7
2
- 1 - A, al = 0,
87
88
Group /ntrarinnce in Statistical
Inference
Using the Neyman-Pearson L e m m a and (4.40) we get the theorem.
•
T h e o r e m 4 . 4 . 2 . For Problem 8 the likelihood ratio test is UMP
invariant
b
among all tests <j>(R R.2) ^sed on R\,R\
satisfying
u
Proof.
Under
p\ = 0, p\'=
1, 7a = 1 - A, &{ - 0,
- 1
a\8 = A, h = 0 and 0 = 4 f A ( l 2
A, % = 1 , %
fiA) .
2
Hence f
T
=
f
fn A 2\ l)/fH ( 2\n) 3
20
r
r
fH A l,?2)/fnio( l,r2) 1
0 0
.ra(JV-p )+t)/
/
1
— From (4.41) / j y fi).
a j
(2i)!
4f A 2
\ l - f
( f | r i ) has a monotone likelihood ratio in f 2
1
2
A
/
(
4
4
1
)
= (1 — z)(l —
Now using Lehmann (1959) we get the theorem.
•
Exercises 1. Prove Equations (4.3) and (4.5). 2. Prove (4.10). 3. Show that JrJ^(A) is distributed as X +2K m
w
n
e
r
e
i f is a Poisson random
variable with parameter j A . 4. Prove (4.31). 5. Let 7ri, ?r be two p-variate normal populations with means p\ and / i and 2
2
the same positive definite covariance matrix S . L e t X — {Xi,... distributed according to ir\ or ir and let b = ( & i , . . . ,b )' 2
p
,X )' P
be
be a real vector.
Show that ,
[E lb'X)-E2(b X)\
2
1
var(b'X) is maximum for all b if 6 — E
-
1
(pi — p ) 2
where Ei denote the expectation
under Tij. 6. (Giri, 1994a). Let X
= {X ,...,
a
X )',
al
a = 1 , . . . , N(>
op
p) be indepen-
dently identically distributed p = 2p!-variate normal random vectors with mean p = (/*!..- • tUp)' Let X = jjX^X^
a n <
*
c o m m
o n covariance matrix E (positive definite).
S = ^ , ( X
a
- X)(X
a
- X)'.
Write
Some Best Invariant Tests m Multinomials _ / S ( u )
E
£(ia)\
s =
(5(11)
X = ( X i , . • • Xp) — where S / y j and Sufy (a)
w
2
T
2)
p
= N(X
( 1
where T
2
%2}V
,
are pi x pi for all t, j and X ( i ) = ( X i , • • • , X , ) ' p
Show that the likelihood ratio test of H {fi' ,p[ )\
89
: M(i) =
a
p ( j , with p
=
2
= ( M I I - - I M P I ) ' rejects r Y for large values of
( 1 }
0
) - X(2))'(5
_1
+ 5(22) - 5(i2) - 5(2i)) (-A"[i) - X
( 1 1 )
2
is distributed as x ( S ) / ' x % P 1
P 1
with 6
2
-
( 2 |
/ V ( M ( I ) ~ r*(2])'
_
• ( E ( i i ) + E 22) - E independent. (
( 2 1 1
)
- E 2 ) ) ' ( M ( I ) - "(2)) and X p , ( - ) . X w ( 1
a
r
e
P l
(b) Show that the above test is U M P I . 7. ( G i r i , 19946). In problem 5 let T with r (a)
( 1 )
= (r
1 [
E " V -
(Ti
r
2
p
J' =
(T^j,Tfaf
...,r „)'. j
Show that the likelihood ratio test of HQ : T ( i ) = r ( 2 ) rejects Ho for small values of Z _
,
1
1 + i V ( X ( i ) + X ( ) ) ( S ( n ) + 5(22) + 5(21) + 5 ( i ) ) ~ ( X ( i ) + X(2)) 2
2
l +
NX'S^X
where Z is distributed as central beta with parameter {^{N — pi),
\pi)
under Hn. (b) Show that it is U M P I similar. (c)
F i n d the likelihood ratio test of H
0
: T ^ , = A F ( ) when A is known. 2
8. ( G i r i , 1994c). I n problem 6 write
( where E (
3 1
£(ii)
E(
1 2
E(2i)
E
E(3l)
E(
j
{ 2 2
3 3
E(
1 3
) \
/S(it)
)
E(23) I ,
)
E(
3 2
U
2 2
( 1 2
)
5 = I 5(2i)
5(2 )
\ -S"<31)
5(32)
)/
) and 5 ( ) are 1 x 1, E (
5
S(
S
2
1 3 |
( 2 3 |
5(33)
) and 5( 2) are pi x pi and E 2
( 3 3
j and
5(33) are pi x pi with 2pi = p — 1. (a)
Show that the likelihood ratio test of H
0
:E
( 1 2
) = E(
1 3
) rejects H
for
a
large values of R\
= (5(12)-5(i ))(5( ) + S ( ) - S ( 3 2 ) -5(23)) 3
2 2
3 3
1
(5(21) ~ 5 ( ) ) / 5 ( ) , 3 !
U
90
Group Invariance in Statistical
Inference
the probability density function of R\ is given by _
( 1 - ^ - 0 ( 1 - ^ ( ^ - 3 )
r(i(iv-i))r(i(iv- )) (^) (^?> "- r (^(Jv-i) + j) Pl
J
Pl+J
1)
3
mhpi+j)
j=0 where p
2
= i V ( £ ( 1 2 ) - 2(13)1(2(22) -I- £ { 3 ) - £ 3
x (£(2i) -
£
( 3
i))/£
( 1 1
( 3 2
) - E(23])~
l
).
(b) Show that no optimum invariant test exists for this problem.
References
1. N, Giri, Locally Minimax Tests of Multiple Correlations, Canad. J . Statist. 7, 53-60 (1979).
2. N. Giri, Multivariate Statistical Inference, Academic Press, N.Y., 1977.
3. N. Giri, On the Likelihood Ratio Test of a Normal Multivariate Testing Problem, Ann. Math. Statist. 35, 181-189 (1964). 4. N. Giri, On a Multivariate Testing Problem, Calcutta Stat. Assoc. Bull. 11, 55-60 (1962).
5. N. Giri, On the Likelihood Ratio Test of A Multivariate Testing Problem II, Ann. Math. Statist. 36, 1061-1065 (1965).
6. N. Giri, Admissibility of Some Minimax Tests of Multivariate Mean, Rapport de Research, Mathematics and Statistics, University of Montreal, 1994a. 7. N. Giri, On a Test of Discriminant Coefficients, Rapport de Research, Mathematics and Statistics, University of Montreal, 1994b.
8. N. Giri, On Tests of Multiple Correlations with Partial Informations, Rapport de Research, Mathematics and Statistics, Univerity of Montreal, 1994c.
9. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, N. Y . , 1996. 10. J . K . Ghosh, Invariance in Testing and Estimation, Tech. report no Math-Stat 2/672. Indian Statistical Institute, Calcutta, India, 1967.
11. E . L . Lehmann, Testing Statistical Hypotheses, Wiley, N.Y., 1959.
12. A. Wald, Tests of Statistical Hypotheses Concerning Parameters when the Number of Observations is Large, Trans. Amer. Math. Soc, 54, 426-482 (1959).
Chapter 5 SOME MINIMAX T E S T S I N MULTINORMALES
5.0. I n t r o d u c t i o n T h e invariance principle, restricting its attention to invariant tests only, allows us to consider a subclass of the class of all available tests. Naturally a question arises, under what conditions, an optimum invariant test is also optimum among the class of all tests if such can at all be achieved. A powerful support for this comes from the celebrated unpublished work of Hunt and Stein, popularly known as the Hunt-Stein theorem, who towards the end of Second World War proved that under certain conditions on the transformation group G , there exists an invariant test of level a which is also minimax, i.e. minimizes the maximum error of second kind (1-power) among all tests. Though many proofs of this theorem have now appeared in the literature, the version of this theorem which appeared in Lehmann (1959) is probably close in spirit to that originally developed by Hunt and Stein. P i t t m a n (1939) gave intuitive reasons for the use of best invariant procedure in hypothesis testing problems concerning location and scale parameters. Wald (1939) had the idea that for certain nonsequential location parameter estimation problems under certain restrictions on the group there exists an invariant estimator which is minimax. Peisakoff (1950) in his P h . D . thesis pointed out that there seems to be a locuna in Wald's proof and he gave a general development of the theory of minimax decision procedures invariant under transformation group. Kiefer (1957) proved an analogue of the Hunt-Stein theorem for the continuous and
91
92
Group Invariance
in Statistical
Inference
discrete sequential decision problems and extended this theorem to other decision problems. Wesler (1959) generalized for modified minimax tests based on slices of the parametric space. It is well-known that for statistical inference problems we can, without any loss of generality, characterize statistical tests as functions of sufficient statistic instead of sample observations. Such a characterization introduces considerable simplifications to the sample space without loosing any information concerning the problem at hand. Though such a characterization in terms of maximal invariant is too strong a result to expect, the Hunt-Stein theorem has made considerable contribution towards that direction.
T h e Hunt-Stein theorem
gives conditions on the transformation groups such that given any test for the problem of testing H : 8 € f l / / against the alternatives K : $ € fl/f, with fi// n fi/f a null set, there exists an invariant test sup E (f>> eefi„ S
such that
sup E yj, eer2„
(5.1)
e
inf E d> < inf E il>. fen* ~ eefi e
(5.2)
e
K
In other words, V behaves at least as good as any in the worst possible cases. We shall present only the statements of this theorem. For a detailed discussion and a proof the reader is referred to Lehmann (1959) or Ghosh (1967). L e t V — {Pg,0
e f!} be a dominated family of distributions on the E u -
clidean space (X,A),
dominated by a cr-finite measure p. Let G be the group
of transformations, operating from the left on X, leave fi invariant. T h e o r e m 5.0.1.
(Hunt-Stein
Theorem).
Let B be a a-field of subsets of
G such that for any A E A, the set of pairs (x,g) and for any B € B, g G G , Bg e B. Suppose distribution
functions
v
n
with gx £ A is in A x B
that there exists a sequence of
on ( G , B) which is asymptotically
right invariant
in
the sense that for any g € G , B € B \im^\v (Bg) n
- v»(B)\
= 0.
(5.3)
1
T h e n , given any test <j>, there exists a test V which is almost invariant and satisfies conditions (5.1) and (5.2). It is a remarkable feature of this theorem that its assumptions have nothing to do with the statistical aspects of the problem and they involve only the group G . However, for the problem of admissibility of statistical tests the situation is more complicated. If G is a finite or a locally compact group the best invariant test is admissible. For other groups the nature of V plays a dominant role.
Some Minimax
Test in Muttinormates
93
T h e proof of Theorem 5.0.1 is straightforward if G is a finite group. L e t m denote the number of elements of G. We define
As observed in Chapter 2, invariant measures exist for many groups and they are essentially unique. B u t frequently they are not finite and as a result they cannot be taken as a probability measure. We have shown in Chapter 2 that on the group 0{p)
of orthogonal matrices
of order p an invariant probability measure exists and this group satisfies the conditions of the Hunt-Stein theorem. T h e group GT(P) of nonsingular lower triangular matrices of order p also satisfies the conditions of this theorem (see Lehmann, 1959, p. 345). 5.1. L o c a l l y M i n i m a x T e s t s Let (X,A)
be a measurable space. For each point (5, JJ) in the parametric
space f!, where 6 > 0 and ij is of arbitrary dimension and its range may depend on i5, suppose that p{-; 6,n) is a probability density function on (3Z,A)
with
respect to some u-finite measure p . We are interested in testing at level a (0 < a < 1) the hypothesis HQ : 6 — 0 against the alternative Hi : 6 = A, where A is a positive specified canstant and in giving a sufficient condition for a test to be locally minimax in the sense of (5.7) below. T h i s is a local theory in sense that p(x;A, JJ) is close to p(x;o, 7?) when A is small. Obviously, then, every test of level a would be locally minimax in the sense of trivial criterion obtained by not substracting a in the numerator and the denominator of (5.7). It may be remarked that our method of proof of (5.7) consists merely of considering local power behaviour with sufficient accuracy to obtain an approximate version of the classical result that a Bayes procedure with constant risk is minimax. A result of this type can be proved under various possible types of conditions of which we choose a form which is more convenient in many applications and stating other possible generalizations and simplifications as remarks.
Throughout this section expressions like o(A), o(/i(A)) are to be
interpreted as A —> 0. For fixed a , 0 < a < 1 we shall consider critical regions of the form (5.4) where V is bounded and positive and has a continuous distribution function for each {S,Tj), equicontinuous in (6,7)} for 6 < some 6$ and which satisfies
94
Group Invariance
in Statistical
Inference
P , (R)
= a +
(5.5) x v
where q(X,n) o(l).
= o{h(X))
k(X)+q(X, ) V
uniformly in v with h(X) > 0 for A > 0 and h(X) =
We shall be concerned with a priori
probability density functions £<J,A
and t]\,x on sets 6 = 0, 8 = A respectively, for which
{ f r o i f ^
=
where 0 < Ci < r(A) < c B(x,X)
= o(h(X))
1 +
WW)
+ r(Xmm
+
B( , X
A)
(5.6)
< oo for A sufficiently small and p(A) = o ( l ) ,
2
uniformly in x.
T h e o r e m 5.1.1.
(Locally
Minimax
Tests) If the region R satisfies
and for sufficiently small X there exist £o,i. and fa^ satisfying
(5.5)
(5.6) then R is
locally minimax of level a for testing Ho '• 8 = 0 against Hi : 8 = X as A —* 0, that is to say,
lim i - o Sup0
tu/iere Q
Q
A
e
Q
a
i n f , PUR)-o i n f , Px, {4>\ rejects i f } n
=
t _
« -
o
0
is t/ie class of all level a tests
P r o o f . Let r\ = [2 + h(X){g(X)
+ c
a
r(A)}]
- 1
.
Then —p-
= 1 + h{X) \g(X) + c
r(X)}.
a
(5.8)
Using (5.7) and (5.8) the Bayes critical region relative to the a priori distribution £ \ = (1 - T ) £ O , X + n A
and ( 0 — 1 ) loss is given by
£
M
Some M i n i m a l Teat in Multino-rmales
95
Define for any subset A
Pi,*(A)
=
J
Px,vW)
Vx=R W
—
= B
X
X
6 , A W ,
B\,
- R .
(5.10)
Using the fact that B(x,X) sup
-o(A)
h{\)
and the continuity assumption on the distribution function of U we get Plx(Vx Also for U
x
= V
k
+ W )=o(X).
(5.11)
x
or Pi\x(Ux)
= PS,x(Ux)[l
+ Pim)}]
-
(5-12)
Let r* (A) = {1 x
- n) ^, £4 A
+
n ( l - pfc(jf)).
Using (5.9), (5.10) and (5.11) the integrated Bayes risk relative to $\ is given by r'x(Bx) = rl(R)
+ (1 - n ) [ P ' , ( ^ ) - P * ( V ) ] 0
x
0
A
A
,
+ n[f r,A(^)-F *, (-y- )] 1
A
A
= r ( R ) 4- (1 - 2 r ) [ P * , ( ^ ) - P * ( V ) ] A
+
A
0
Q
A
A
^O%(VX + W A ) O ( M A ) )
= r ( R ) + o(/ (A)). A
(5.13)
l
If (5.7) were false one could by (5.5) find a family of tests { 0 > J of level a such that d> has power function cv + g(X, ij) on the set 6 — A with x
lim sup [inf g(A, JJ) - h(\)]/h(\)
> 0.
T h e integrated risk r' of ^> with respect to £ \ would then satisfy x
lim s u K ( i i ) - r , ] / h ( A ) > 0 , P
A—0
contradicting (5.13).
•
96
Group invariance
in Statistical
Inference
Remarks. (1) Let the set {6 = 0} be a single point and the set {8 = A } be a convex finite dimensional Euclidean set where in each component $
of TJ is
00(A))- If 1
P ? ' ^ = l + h(X) U(x) + T, p(ar;0,Jj) where s
if
6i(*) OiifA) m + B{x, A , TJ)
(5.14)
a y are bounded and s u p , J 3 ( X , A , J J ) = o ( / i ( A ) ) , and if there X i
exists any
satisfying (5.6), then the degenerate measure f ?
A
which
assigns all its measure to the mean of £I,A also satisfies (5.6). (2) T h e assumption on B can be weakened to Px,„{\B(x,
A ) | < e k(X)} - » 0
as A - * 0 uniformly in n for each e > 0. If the
i are independent of
A the uniformity of the last condition is unnecessary. T h e boundedness of U and the equicontinuity of the distribution of U can be similarly weakened. (3) T h e conclusion of Theorem 5.1.1 also holds if Q„ is modified to include every family {<j>x} of tests of level a + o(h{X)). consider the optimality of the family {Ux} replacing R by Rx -
{x : U {x) x
> c ,x}, a
One can similarly
rather than single U by
where P „{Rx} Qt
= a-
qx(v)
with qxiv) — o(h{X)). (4) We can refine (5.7) by including one or more error terms. Specifically one may be interested to know if a level a critical region R which satisfies (5.7) with inf Px, {R) n
= a + ciA + o(A),
as A - > 0
also satisfies mUPxMRl-a-dX
l i m
x
o sup^
e
Q
O
=
inf, Px, {<j> rejects H } v
0
^
- a - c,X
In the setting of (5.14) this involves two moments of the a priori
distribution
£ I , A rather than just one in Theorem 5.1.1. A s further refinements are invoked more moments are brought in. The theory of locally minimax test as developed above and the theory of asymptotically minimax test (far in distance from the null hypothesis) to be developed later in this chapter serve two purposes. F i r s t the obvious point of demonstrating such properties for their own sake. B u t well known valid doubts
Some Minimax
Test in Multinormales
97
have been raised as to meaningfulness of such properties. Secondly, then, and in our opinion more important, these properties can give an indication of what to look for in the way of genuine minimax or admissibility property of certain tests, even though the later do not follow from the local or the asymptotic properties.
2
E x a m p l e 5 . 1 . 1 . ( T - t e s t ) . Consider Problem 1 of Chapter 4. L e t X i , . . . , XN
be independently and identically distributed J V ( p , E ) random vectors. p
Write NX
=
X„ 5 -
For testing H
0
£ f (X; -
- X)'.
Let 6 =
> 0.
: 8 = 0 against the alternatives Hi : 8 > 0 the Hotelling's
test which rejects H
Q
whenever R -
NX'{S
l
+ NXX')~ X
T
2
> c, where c is
chosen to yield level a, is U M P I (Theorem 4.1.1). N o t e . For notational convenience we are writing R\ as R. L e t 6 — A > 0 (specified).
We are concerned here to find the locally
minimax test of Hn against Hj : S = A as A - > 0 in the sense of (5.7).
We
assume that N > p, since it is easily shown that the denominator of (5.7) is zero in the degenerate case N < p. In our search for locally minimax test as A —> 0 we may restrict attention to the space of sufficient statistic ( X , S). T h e general linear group G)(p) of nousingular matrix of order p operating as ( £ , s ; p , E ) — . (gx,gsg';gp,
gT,g')
leaves this problem invariant. However, as discussed earlier (see also James and Stein, 1960, p. 376), the Hunt-Stein theorem cannot be applied to the group G ( p ) , p > 2. However this theorem does apply to the subgroup G j - ( p ) (
of nonsingular lower triangular matrices of order p. T h u s , for each A, there is a level a test which is almost invariant and hence for this problem which is invariant under Grip)
(see Lehmann, 1959, p. 225) and which minimizes,
among all level a tests, the minimum power under H i . From the local point of view, the denominator of (5.7) remains unchanged by the restriction to GT invariant tests and for any level a test d> there is a G j invariant level a test 4>' for which the expression i n f , P x , , [4>' rejects H ) a
is at least as large, so
that a procedure which is locally minimax among GT invariant level a tests, is locally minimax among all level a tests. I n the place of one-dimensional maximal invariant R under G j ( p ) we obtain a p-dimensional maximal invariant (Ri,...,R ) p
with k = p and di — 1 for all i, satisfying
as defined in Theorem 2.8.1
98
Group Invariance
J2
in Statistical
Rj = NX{ {S A
Inference
+ N X ^ r ' X ^
M
, i = 1,.. - ,p
(5.16)
i=] with R, > 0, £ ] ?
= l
flj - J? < 1 and the corresponding maximal invariant on
the parametric space of ( / * , £ ) is ( ^
6 ) (Theorem 2.8.1) where P
(5.17)
with Sf_i
— 5.
(rn,...,%)',
T h e nuisance parameter in this reduced setup is n =
with jji = 6,/6 > 0, V _ ^
TJJ = 1 and under H
= 1
Theorem 2.8.1 the Lebesgue density of Ri,...
i n , . . . , r ) : r , > 0, p
,R
P
0
i) = 0.
From
on the set
JPfl <
1
is given by
/(ri,...,r„) =
x exp
yj , /JV - « + 1
i
nSi
2'
2
(5.18)
We now verify the validity of Theorem 2.8.1 for U — Yl^=i ^ - s i preceding (5.5) for the locally minimax tests are obvious. fi(A) = b\ with 6 a positive constant. Of course P\,„(R)
n c e
those
In (5.5) we take
does not depend on
77. From (5.18) we get with r = ( r i , . . . r ) ' , p
/o,,(r)
+
2
where B(r,\,-n)
— o(A) uniformly in r , n as A —* 0.
satisfied by letting £
B(J,X,V)
(5-19)
T h e equation (5.6) is
give measure one to the single point TJ — 0 while
0 | A
gives measure one to the single point rj = TP* = (TJ", . . . L
T?; - (N - j)~ (N
- j + l r v ^ o v - p),
where j = 1,..., p
Some Minimax
n
so that Y,i>3 * + (N - j the following theorem.
+ l}rj; = £ for all j.
T h e o r e m 5.1.2. For every p,N,a{N 2
T
Tesi in MultinoTmales
99
From Theorem 5.1.1 we get
> p) Hotelling's
T
2
test based on
or equivalently on R is locally minimax for testing HQ : 6 — 0 against the
alternatives Hi : 6 = X as X —* 0. E x a m p l e 5.1.3. Consider Problem 2 of Chapter 4. From E x a m p l e 5.1.2 it follows that the maximal invariant in the sample space is ( J ? i , . . . , R )
with
P1
Ri
a n
=
t
d
n
e
corresponding maximal invariant in the parametric space Under Ho, Si = 0 and under Hi 6i — X.
is ( f i i , . . . ,(J ,) with 8\ = Yii' 6j. P
Now following E x a m p l e 5.1.2 we prove Theorem 5.1.3. T h e o r e m 5.1.3. For every pi,N,a large values
of R\
the UMPi
is locally minimax
test which rejects Ho for
for testing Ho : &i — 0 against
the
alternative Hi : 6\ — X as X —* 0. E x a m p l e 5.1.4. Consider Problem 3 of Chapter 4. We are concerned here to find the locally minimax test of Ho : Si = 0, S = 0 against the alternatives 2
Hi
: 8i = 0, 6
2
— X > 0. A s usual we assume that N > p the determinator
of (5.7) is not zero.
To find the locally minimax test we restrict attention
to the space of sufficient statistic (Xi (X, S) —* (gX + t ,
gSg'), g € G , t
2
2
S).
T h e group (G2,T )
operating as
2
€ T , which leaves the problem invariant,
2
2
does not satisfy the condition of the Hunt-Stein theorem. However this theorem holds for the subgroup (G {p\ T
+ p ) , T ). 2
A maximal invariant in the sample
2
space under this subgroup is Ri....,
R
Pl
+
P
I
such that
l
^R^NX'u
(SM + NXMX )- X , {i]
with Ri >0,Ri=
Rj, Ri+R
2
i = l,...,pi
[{l
= E^Li"
2
maximal invariant in the parametric space is 61,...,
R,
+ P
2
(5.20)
and the corresponding
3
6
Pl
+ P 2
such that
(5.21)
with Si > 0,6
P
= E j L i 6j, h + 6 = 2
5,. Under H
0
Si = 0 for all i and
under Hi Si — 0 , i — 1 , . . . , p i , fij = 6 = \ > 0. T h e nuisance parameter in this
100
Croup Invariance
in Statistical
Inference w
reduced set up is fj = ( r f i , . . . , Vp,+piY
' * h if, = ^ . Under H n - 0 and under 0
Hi, ifc = 0 i = l . . . - , p i , ifc > 0, « = pi + 1 Equation (5.19) in this case reduces to
- l + f,-!/o,„(r)
M
P i + Pa with E f ^ + i * =
L
+ £(r,n,A)
^ r j . + f/V-.j + l ) ^
2 (5.22)
as A —» 0 with B(r, JJ, A) = o(A) uniformly in r, n. T h e set £
2
= 0, 6i = 0 is a single point n = 0, so £O,A assigns measure
one to the single point ij = 0. convex p -dimensional
Since the set {Si = 0, 6
2
— A (fixed)} Is a
Euclidean set wherein each component jj; is o(/i(A)),
2
any probability measure
can be replaced by the degenerate measure £ £ n
which assigns measure one to the mean ( 0 , 0 , * 7 p , + i > - • • < p,+
P2
A
° f &,*)•
Hence from (5.22)
/ I
fx, (r)
£ux(dv)
n
fo.n(r)
=I
+ ^ - i + f i + "E *(l>+Cw-i+i>%J
+ B(r,A,7j)
= 1 + -
\iiAdn)
- l + f , + L
£ J = P 1 +
^ ( ^ i t f + t W - j '
V
,
>
+ l h n
J
/
+B(r,A)
J
(5.23)
where B{r, A) — o(/i(A)} uniformly in r . Let il* be the rejection region defined by Rk = {X : U(X)
= J * , + kR
2
> C }
(5.24)
Q
where fc is chosen such that (5.23) is reduced to yield (5.6) and the constant C
a
depends on the level a of the test for the chosen fc. Choose (jy-j-i)...(/V-
P
l
-p ) 3
7
(A -Pi) P2(iV - p , - P 2 + 1)
Some M i n i m a l Test in Muttinormales
j = pi + 1, . . . , p i + p n
_
2
101
- 1,
(^-Pi)
(
5
2
5
1
so that N - p ,
V M ^ - J + l l ^ ^—^-, J =
P i + 1,
- - ,pl + P 2 •
T h e test 0* with rejection region
JT = j * :
U(X) = % +
— a satisfies (5.6) as A —> 0.
with Pa,\{R*)
structure (5.24) must have k —
> C„|
Furthermore any region R& of
to satisfy (5.6) for some
Theorem 4.1.4, for any invariant region R*, Px (R*) Hotelling's T
2
From
depends only on A and
tV
hence from (5.5) q{\,r))
(5.26)
— 0 and thus tp* is L B I for this problem as A —* 0.
test based on Ri + R
2
does not coincide with 4>' and hence it
is locally worse. It is easy to verify that the power function of Hotelling's test which depends only on 03, 6\ being zero, has positive derivative everywhere, in particular, at 8
2
— 0. T h u s from (5.5), with R — R*, h(A) > 0. Hence we
get the following theorem. Theorem 5.1.4. For Problem 3 the L B I test 0* is locally minimax as A —* 0. R e m a r k s . Hotelling's T
2
test and the likelihood ratio test are also invari-
ant under (G , T ) and therefore thier power functions are functions of 6 only. 2
2
2
From the above theorem it follows that neither of these two tests maximises the derivative of the power function at S = 0. So Hotelling's T 2
2
test and the
likelihood ratio test are not locally minimax for this problem. E x a m p l e 5.1.5.
Consider Problem 6 of Chapter 4.
T h e affine group
( G , T), transforming ( X , S; p, S )
(gX + t, gSg'; ga + t,
where g € G, t € T leaves the problem of testing H^p
gBg') 2
— 0 again H\ : p
2
=
A > 0 (specified) invariant. However the subgroup ( G T ( P ) , T ) , where Gj-(p) is the multiplicative group of nonsingular lower triangular matrices of order p whose first column contains only zeros expect for the first element, satisfies Hunt-Stein conditions. T h e action of the translation group T is to reduce the
102
Group Invariance in Statistical Inference a
mean a to zero and S — E a = i X (X°)'
is sufficient where N has been reduced
by one from what it was originally. We treat the latter formulation considering 1
A T , . . . ,AT
W
to have zero mean and positive definite covariance matrix E and
consider only the subgroup Gx{p) sample space under Gx{p)
for invariance. A maximal invariant in the
is R=
_ S )[i]
(R , 5 ~
[l2
• - -, Rp)', where
2
(
S
2 ) [ i ]
( 2 ] ) [ i
] i = 2,..
.,p
(5.27)
5(H)
i=i
where C[i] denotes the upper left-hand corner submatrix of C of order i and tyi] denotes the i-vector consisting of the first i components of a vector 6 and
A
\ (21)
3(22)7
where 5(22) is a square matrix of order (p— 1). A s usual we assume that N
>p
which implies that 5 is positive definite with probability one. Hence Ri > 0, R
2
= U = ^2*- Rj
S
2
coefficient).
(R
2
is the squared sample multiple correlation
T h e corresponding maximal invariant in the parametric space is
A = (6 ,...,S Y, 2
1-
where
P
£
S
(5.28)
E( ) U
with 6, > 0, p.2 _= EV?P= 2 > 2
S
('2)['] (22)[i1 <31)[i]
5 > i-2 5
L
e
t
T h e joint distribution of J ? , . . . , R 2
p
is (see Giri and Kiefer, 1964) i
( l - X ) ^ ( l - T , ^ r i ) nHr-Dr(l(N-p+
x fi
I)) ( l + E
P =
2
l
N
-
-
l
)
rjiCl - A ) /
+ i)^ - r(l{N-i N
p
7 i
- 1])
+2))
+2]
J=2
ft,=D
/3 =D P
\j=2
r ( i ( j v - i + 2) +
ft;
4T- (l-A)/ J
7 j
(l+7r-
10
3=2 L
(5.29)
Some Minimax Test in Multinormates
where 7_,- = 1 - A J j ^ l f t , ttj = Sjfy.
103
means 0 if
T h e expression 1 / ( 1 + TT^-
6j - 0. From (5.29) we obtain
+ B ( r , A , » ) , (5.30) /
/o,o(r)
2
3=2
\ i>J
where £J(j", A,T?) — o ( A ) uniformly in r, 77. From Theorem 4.3.1 it follows that the assumption of Theorem 5.1.1 are satisfied with U = £ letting £ i j + 2)
- 1
P =
2
-^j = R
2
a
i
m
M-M = bX, b > 0. A s in example 5.1.3
give measure one to n" = ( ) ) § , . ... ,1(2} where jjj — (JV — j + l j ' ( W — -
| A
_ 1
( p - l ) J V ( i V - p + 1), j = 2 , . . . , p - l we get (5.6). Hence we prove
the following theorem. T h e o r e m 5 . 1 . 5 . F o r ever?/ p,N
and a the R? test which rejects H
large values of the squared sample multiple correlation minimax for testing HQ against Hi : p Remarks.
2
coefficient R
2
0
for
is locally
= A —> 0.
It is not a coincidence that (5.19) and (5.29) have the same
form. (5.18) involves the ratio of noncentral to central chisquare distribution while (5.29) involves similar ratio with random noncentrality parameter. T h e first order terms in the expressions which involve only mathematical expectations of these quantities correspond each other.
E x a m p l e 5.1.5.
Consider problems 7 and 8 of Sec. 4.3.
T h e group G
as defined there does not satisfy the condition of the Hunt-Stein theorem. However this theorem does apply to the solvable subgroup GT of p x p lower triangular matrices g of the form (9u 0 9 =
0
0 0
(5.31)
322 V 0
\
Spp/
92p From E x a m p l e 5.1.4 a maximal invariant in the sample space under GT is R — ( i ? , . . . , R ) ' and the corresponding maximal invariant in the parametric 2
p
space is A = (62,...,6 )'. P
From (5.29) R has a single distribution under
HIQ and H o and it has a distribution which depends continuously on the p i 2
dimension parameter Fix under Hi\ under H , 2X
where
and on the p-dimension parameter
r
2A
104
Group Invariance in Statistical
T
IX
Inference
= { A :tf<> 0, t = 2 , . . . , p i + l ,
Si=0,i
= pi
PI+I
+ 2,...,p,
= pf = A } , ;=2
r > = { A : tfi = 0, t = 2 , . . . , p i + l ,
6i>0,
2
+ 2,...,p,
g
ft
= 4
i — pi
= A>.
(5.32)
i= +2 P 1
Let
with
2
tfi/p , 0,
2
if p > 0 if p = 0 . 2
Because of the compactness of the reduced parameter space { 0 } and and the continuity of
r
fx,n( )
r
1A
in n we conclude from Wald (1950) that every
minimax test for the reduced problem in terms of the maximal invariant R is Bayes. Thus any test based on R with constant power on TJX is minimax for problem 7 if and only if it is Bayes. From (5.29) as A ~* 0 we get (for testing H
10
against
Hi\) PI+I
j=2
,i>j
+ B(r,A,ij)
(5.33)
where S ( r , A, 17) — o(A) uniformly in r and r>. It is obvious that the assumption of Theorem 5.1.1 is satisfied with PI+I
U=
J^Rj
= Ri,
h(\)
=
b\
j=2 2
where b > 0. T h e set p — 0 is a single point n = 0, so £rjj assigns measure one to the point T? — 0. T h e set Fix is a convex p\-dimensional component r>i — o(h(\)).
set wherein each
So for a Bayes solution any probability measure t\\,x
can be replaced by the degenerate measure £ * which assigns the probability A
0
0
one t o m e a n r i - (J? ,. . . , J 7 ° , , ) o f £ i * . C h o o s i n g £ * +1
one to the point whose j t h coordinate is
= (N-j
A
which assigns probability
+
l)~ (N-j+2)- p2 (N-
1
1
1
Some Mintmax Test in Multinormates
105
j = 2 , . . . , p i + l we get (5.6) from (5.33). T h e condition (5.5)
p + l){N-pi),
follows from Theorem 4.4.1. Hence we prove the following theorem. 7 the likelihood ratio test of Hit
T h e o r e m 5 . 1 . 5 . For Problem
H\\ as X —> 0.
(4.38), is locally minimax against the alternatives In problem 8 as X —> 0 (for testing H Q against 2
A,„(r)
=
.V A
l
/o,T,(r)
- i + n + Y,
r
defined in
i(S'K+(
H x) 2
i V
- i + H.. 2
1
+5(r,A,7i),
(5.34) where B ( r , A, JJ) — o(/i(A)) uniformly i n r, TJ. Since the set I ^ A is a p2-dimensional Euclidean set wherein each component r/, — o(h(X)),
using the same
argument as in problem 7 we can write
/ I
r
h,n( )^o,x(dv)
-1
+ f, +
YI >(£ r
w
+
(
w
-
3 +
2)%
j=pi+i
=
1 +
- i + f,+ 2 ^(Sf+c^-j'+iM
JVA
+ B(r,A) (5.35)
where B ( r , A ) « + 2
=
o(/i(A)) uniformly in r and £ I A assigns measures one
to
<)'•
L e t iifc be the rejection region, given by, Rk = {x:
U{x) = n + kf
2
> c}
(5.36)
a
where k is chosen such that (5.36) is reduced to yield (5.5) and c
Q
depends on
the level of significance a of the test for the chosen k. Now choosing
o
(N-p + 2)(N-p+l)
p
( N - p + 2)p
2
„
106
Group Invariance in Statistical
Inference
we conclude that the test
B* =
U(x) = ft + ^ — - ' 2
> c
a
j
with P O , A ( W * ) = Q satisfies (5.6) as A - » 0. Moreover any region R
k
form (5.36) must have k =
in order to satisfy (5.36) for some
of the From
(4.37) it is easy to conclude that the test 4>' is L B I for problem 8 as A —» 0. T h e 2
2
iE -test which rejects H20 whenever r
— f i +f
> c
2
a
does not coincide with 0*
and is locally worse. From Sec. 4.3 it is evident that the power function of R? test depends only on p 2
at p
2
and has positive derivatives everywhere in particular
= 0. From (5.5) with R — R*, h{\)
> 0. So we get the following theorem.
T h e o r e m 5.1.6. For Problem 8 the LBI test
N
l
f 1 + ~f
f
2
> c
a
where c
a
against the alternatives
whenever
depends on the size a of the test, is locally minimax as A —* 0.
R e m a r k s . In order for an invariant test of H n against H x to be minimax, 2
it has to be minimax among all invariant tests also.
2
However, since for an 2
invariant test the power function is constant on each contour p
= p\ = A,
"minimax" simply means "most powerful". T h e rejection region of the most powerful test is obtained from (5.34), from which it is easy to conclude that [/A,7j('")//o,n(r)] depends non-trivially on A so that no test can be most powerful for every value of a. 5.2. A s y m p t o t i c a l l y M i n i m a x T e s t s We treat here the setting of (5.1) when A —> 00. o(H(\))
Expressions like o ( l ) ,
are to be interpreted as A - * 0 0 . We shall be concerned here in max-
imizing a probability of error which tends to zero. T h e reader, familiar with the large sample theory, may recall that in this setting it is difficult to compare directly approximations to such small probabilities for different families of tests and one instead compares their logarithms. While our considerations are asymptotic in a sense not involving sample sizes, we encounter the same difficulty which accounts for the form (5.39) below. Assume that the region R={x:U(x)>c } a
satisfies in place of (5.5)
(5.37)
Some M i n i m a l Test in Mu Kin urinates
= = 1 - e x p { - i r ( A ) (1 + o ( l ) ) }
P (R) x>n
107
(5.38)
where # ( A ) — i co as A —» co and o ( l ) term is uniform in TJ. Suppose that p(x,A,7j)^ (d7i) liA
= e x p { f f ( A ) [ G ( A ) + R(\)U{x)}
/
+ B(af,A)}
(5.39)
p{x,0,7j)^ , (dj;) 0
A
where sup^ | B ( a : , A ) | = o{H(X)),
and 0 < c j < R(X) < c
regularity assumption is that c
2
< oo.
One other
is a point of increase from the left of the
Q
distribution function of U when 6 — 0 uniformly in n, i.e. infPo„(E/ > c
a
- e ) > a
(5.40)
for every e > 0. T h e o r e m 5 . 2 . 1 . / / U satisfies (5.38)-(5.40) and for sufficiently there exist £ , A and £ i 0
( A
totically logarithmically
large A
satisfying
(5.39), then the rejection region R is asymp-
minimax
of level a for testing Ho -6 = 0 against the
alternatives Hi : 6 — A as A - » oo, i.e., l
i
inf,[-log(l-P ,,{*})]
m
A
A - » sup^
A
e
Q
=
1
i n f , [ - log(l - P , , ( 0 A rejects /Jo))]
a
a
P r o o f . Assume that (5.41) does not hold. T h e n there exists an e > 0 and an unbounded sequence V of values A with corresponding tests d>\ in Q
whose
a
critical region satisfies P , {R}
> 1 - e x p { - # (A) (1 + 5e)}
X V
(5.42)
for all IJ. There are two possible cases, (5.43) and (5.46). If A € T and - 1 - ( 7 ( A ) < R(X)
c
a
+ 2e,
consider the a priori distribution £A (see Theorem 5.1.1) given by
(5.43) and r
A
satisfying r /(l A
n
) = e x p { i f ( A ) (1 + 4e)} .
Using (5.42) and (5.44) the integrated risk of any Bayes procedure £f
(5.44) A
must
satisfy rl(Bx)
< rl(M
< (1 - r )a
+ r e x p { - J / ( A ) (1 + 5e)}
-
+ exp{-e H(A)}].
x
(1 -
)\a
Tx
A
(5.45)
108
Group Invariance in Statistical
Inference
From (5.39) a Bayes critical region is B
= {x : U(x) + B(x,X)/R(X)H(X)
x
> [-(1 + 4e) - G{X)]/R(X)}
.
Hence, if X is so large that B{x,X) sup
H(X)R(X)
we conclude from (5.43) that B D{x: x
U(x) > c
a
-
e/c } =
T h e assumption (5.40) implies that Po {B' } !V
B' (sa.y).
2
x
> a+t! with e' > 0, contradicting
x
(5.45) for large A. On the other hand if A e V and -l-G(A)>R C A
consider the a priori distribution
Q
+2e,
(5.46)
given by £* j , and T
a
satisfying
r / ( l - r ) = exp{>7(A)(l + e)}. A
(5.47)
A
Using (5.39) a Bayes critical region is B
A
= {x : U(x) + B(x, X)/R(X)H(X)
> [-(1 + e) - G(X))/R(X)}
Thus, if s u p I R(X^BIX) I */2c2 we conclude from (5.46) that B\ that, by (5.37) and (5.47), T
1
r*(Bx)
> nexp{-H(X)[l
+
. C r7, so
o(l)}}
-(l-r )exp{>Y(A)(c- (l))}. A
0
(5.48)
Since r*(B ) x
< r*(4>x) < (1 - T ) Q + r e x p { - r T ( A ) ( l + 5e)} A
x
= ( l - r ) [ o + exp{-4e A
H(X)}],
we contradict (5.48) for sufficiently large A.
•
2
E x a m p l e 5.2.1. ( T - t e s t ) In Problem 1 of Chapter 4 let U(X)
= £ J % .
Since 0 ( a , 6 ; x ) = e x p { x ( l + o ( l ) ) } as x -* co we get, using (5.18),
+ J= l
i>j
J
(5.49)
Some M i n i m a l Test in Multinormo/es
with Sup
\B(r,T), A)I — o ( l ) as A —• 0 0 . From (4.5) putting n
p
109
= 1, 7)i — 0,
i < p, in (5.49) the density of U being independent of n we see that P „{U
< c}
X
as A —• oo. £i
a
= exp{(
C a
Hence (5.38) is satisfied with H(X)
assign measure one to the point Vi =
i A
- 1)[1 + o ( l ) ] } = | ( 1 — c ). a
" — Vp-i
assign measure one to (0,0) we get (5.39).
(5.50) Now letting A
— 0 Vp — 1
N
A
5
Finally (5.40) is trivial.
£O,A From
Theorem 5.2.1 we get Theorem 5.2.2. T h e o r e m 5.2.2. For every p,N,a,
Hotelling's
7
T
test is
asymptotically
minimax for testing HQ .6 — 0 against H\ : 6 — A as A —> co. From Sec. (5.1) we conclude that no critical region of the form £ ai R{ > c other than Hotelling's would have been locally minimax, many regions of this form are asymptotically minimax. T h e o r e m 5 . 2 . 3 . c < 1 and 1 — a
x
< a
< •• • < a ,
2
then the critical
p
region
{ £ tti Ri > c } is asymptotically minimax among tests of same size as A —> 0 0 . P r o o f . T h e maximum of £ V f j 2~2i>j Vi subject to £ at r\
— c, T2 — • • • — r
— 0.
p
;
Ojr< < c is achieved
Hence the integration />,,()") over a small
region near that point yields (5.50) with c„ replaced by c.
Since a ,'s are }
nondecreasing in j it follows from (5.49) that £I,A can be chosen to yield (5.39) with U — Yl{aiRi.
Again (5.40) is trivial.
E x a m p l e 5.2.2.
Consider again Problem 2 of Chapter 4. From E x a m -
ple 5.2.1 with p = pi, R = (R ..., u
=
e x
R )', Pl
r, = (n,,...
-^E^E*
P S~
with sup,.^ \B(r,\,n)\
•
,n )' Pl
we get
(5.51)
( l + B(r,A,>)))
— o ( l ) as A -> 00. Using the same argument of the
above theorem we now prove the following. T h e o r e m 5.2.3. For every p i , N, a the UMPI imax for testing HQ : 61 — 0 against Hi :6 = \as\—* E x a m p l e 5.2.3.
test is asymptotically
min-
00.
In Problem 3 of Chapter 4 we want to find the asymp-
totically minimax test of H
Q
: 6~i — 62 — 0 against the alternatives H
x
: 61 — 0,
110
Croup Invariance in Statistical
62 = A as A —» co.
Inference
In the notations of Examples 5.1.4 and 5.2.1 we get,
i - E 1 J > i +*a = E V v ni = 0, i = 1 , . . . , p i + P 2 under H 1
r
r
f
+
P
2
T
T
= (n, -,Tn+toY* v = ( * h , • • • . * , + » ) ' . and rj, = 0, i = 1 , . . , , p i under H\. Hence
0
PI+PJ r
- l + fi + £
/o,n(r)
(5.52)
i£»K
= o ( l ) as A - t 00. From (4.9) when 5a = A
where sup,.,, \B(r,A,n)|
00,
61 = 0 we get /(r,,f |A) 2
= exp
/Cl.falO) where s u p , . ^ \B(f ,f ,\)\ t
j
^[-l
+ f i + f ] ( l + B(fi , f . A ) ) j
= o ( l ) as A -* 0.
2
(5.53)
2
2
Letting fa in (5.39) assign
measure one to the single point n — 0 we get from (5.52)
/
A,,(rKi.x(dij)
- l + f,+
r, 2 *
£ >=Pi + l
.-?.A))|
i>j
(5.54) Let Rk be the rejection region of size a , based on R\,R R
k
= {x : U(x) = R +kR > t
2
given by
c}
2
(5.55)
a
where k is chosen such that (5.54) is reduced to yield (5.39) and R chosen k satisfies (5.38) and (5.40). Now letting 1/1 = • • • = T ) -7
P l + P )
k
P l + p l
for the _i
= 0,
= 1 we see that (5.54) is reduced to yield (5.39) with U{x) = Ri +
R. 2
From (5.53) P(U(x)
exp j
Hence Hotelling's test which rejects fi
fa 0
- 1)(1 + o ( l ) ) j .
(5.56)
for large values of R\ + R
2
satisfies
(5.38) with H(X) = f (1 - c' ). T h e fact that Hotelling's test satisfies (3.39) is a
trivial. Since the coefficient of r
p i +
i in the expression inside the brackets in the
exponent of (5.54) is one, any rejection region R
k
must have k = 1 to satisfy
(5.39) for some £ \ \ . From Theorem 5.2.1 a region R t
k
with k /
1 and which
Some Minimax Teat in Multinormales
111
satisfies (5.38) and (5.40) cannot be minimax as A —• co. Using Theorem 4.1.5 we now prove the following:
T h e o r e m 5.2.4. Hotelling's
test which is asymptotically
best invariant
minimax for testing HQ I fli = &2 = 0 against H± : 6
asymptotically
T
is
— 0,
oj = A as A —* oo.
R e m a r k s . From Theorem 5.2.3 it is obvious that there are other asymptotically minimax tests, not of the form Rt, for this problem. It is easy to see that P {R Xin
l
1
+
P a
A
'" (
ft
~/l*fi *' 5
1) = l - e x { - ( l o g A - l o g ( ( l - c ) ( l + c ( l ) ) ) } ,
2
i
+
Thus the fact 1
2
(l-c)- R >
c
1
P
^ r
1
*
2
-
C
o
)
=1
e
~
x
p
{ ~
~
C
o
)
(
1
+
0
(
1
)
)
that the likelihood ratio test which rejects HQ whenever
and the locally minimax test satisfy (5.40) and (5.38) is obvious
in both cases. B u t from Theorem 5.2.4 these two tests are not assymptotically minimax for this problem. 5.3. M i n i m a x T e s t s 2
In this section we discuss the genuine minimax character of Hotelling's T
and the test based on the square of the sample multiple correlation coefficient 2
R.
These problems have remained elusive even in the simplest case of p — 2 or 2
3 dimensions. In the case of Hotelling's T character of T
2
test Semika (1940) proved the U M P
test among all level a tests with power functions depending 1
only on & — Np'H~ p.
2
In 1956 Stein proved the admissibility character of T
test by a method which could not be used to prove the admissibility character 2
of R - t e s t . G i r i , Kiefer and Stein (1964a) attacked the problem of minimax 2
property of Hotelling's T - t e s t among all level a tests and proved its minimax propery for the very special case of p — 2, N — 3 by solving a Fredholm integral equation of first kind which is transformed into an "overdetermined" linear differential equation of first order. Linnik, Pliss and Salaeveski (1969) extended this result to N — 4 and p — 2 using a slightly more complicated argument to construct an overdetermined boundary problem with linear differential operator. Later Salaveski (1969) extended this result to the general case of p = 2.
112
Group /nvuriaiice in Statistical
5.3.1. Hotelling's
T
2
Inference
test
In the setting of Examples 5.1.1 and 5.2.1 we give first the details of the proof of the minimax property of Hotelling's T
2
test as developed by
G i r i , Kiefer and Stein (1964a), then give the method of proof by Lunnik, Pliss and Salaeveski (1966). In this setting a maximal invariant under G j f p ) is R =
(R ,... R )' 1
}
with
P
= NX'(S
+ NXXy^X
and the corre-
sponding maximal invariant in the parametric space is A = (6i,..., £
p
St = N p ' £
- 1
S )' with p
u = S. From (5.18) R has a single distribution under H
0
and
its distribution under Hi : 6 = X (fixed) depends continuously on a p - 1 dimensional parameter T = | A
= (6
6 )':
1
P
Thus there is no U M P I test under Gr(p) / (r) A
as the pdf of
«5,>0,
=
as it was under Gt{p).
Let us write
as given in (5.18). Because of the compactness of the r
reduced parametric spaces { 0 } and T and the continuity of f&( )
i n
A we
conclude from Wald (1950) that every minimax test for the reduced problem in terms of R is Bayes.
In particular Hotelling's T
2
test which rejects He,
whenever U = £ J R\ > c, which is also G r - i n variant and has a constant power 1
function on each contour Np'T,~ p H\
= X, maximizes the minimum power over
if and only if there is a probability measure £ on T such that for some
constant Ci
'r / o t »
\m
1 = ]
*t
(5.57)
<
according as = > c
1
K< J
except possibly for a set of measure zero where c depends on the specified level of significance a and c i may depend on c and the specified X. A n examination of the integrand of (5.57) allows us to replace it by its equivalent
r
r Jo\ )
« d A ) = c,
if
Y > - e . ,
(5.58)
Obviously (5.57) implies (5.58). O n the other hand, if there are a £ and a constant C[ for which (5.58) is satisfied and if r* = ( r j , . . . , r ' ) ' is such that £ i r j = c' > c, then writing / = ^
and r"
= ft*
we conclude that
Some Minimax
Test in Multinormales
p
because of the form of / and the fact that c'/c > 1 and £
r
r** = f E i t
113
=
c
-
This and similar argument for the case c' < c show that (5.58) implies (5.57). Of course we do not assert that the left-hand side of (5.58) still depends only on 2%n if £
p
n / c.
The computation is somewhat simplified by the fact that for fixed c and A we can at this point compute the unique value of c i for which (5.58) can possibly be satisfied. Let R = (Ri,...,
Rp-i}'
and let f&{f/u)
be the conditional pdf of R given
p
£ = i Ri — it with respect to the Lebesgue measure, which is continuous in f and u with U = E i
1 r
> 0 and
« < it < 1.
Denote by / " ( u ) the pdf of
which is continuous for 0 < u < 1 and vanishes elsewhere and
which depends on A only through 8. T h e n (5.58) can be written as
j /AtfMtfdA) =
fo(f\c)
(5.59)
1
/;*<<=)
if Ti > 0, E i
_
1
T
i
<
c
an
- The
fi ^
integral in (5.59), being a probability
mixture of probability densities is itself a probability density in f as is
fo(f\c).
Hence the expression inside square brakets of (5.59) is equal to one. From (4.5)
r(f) .
C
r( V )
^ )
(
a
_
c
i ^ -
)
P
- ,
2
0
' ^ , E
;
| ) .
{ 5
.
6 0 )
Using (5.60) we rewrite (5.58) as JV-i + 1 1 r$-\*,. * A
J r
I
1
i>3
2
> >=1
'2' 2 i
M
'
, / J V /j
cA
'12'!'
2 (5.61)
r
c
for all r with r j > 0, E i i = - Write T i for the unit (p - l)-simplex
r! = {(A,...,/3 ) ft>o, X > = i } p
:
114
Group Invariance in Statistical
Inference
where /3; — 6i/X. Let U —
7 — cA and £* be the probability measure on
T i associated with £ on T. Since
J r
>
I )=1
i>j
i
— t]{6A), (5.61) reduces to
v
fe=l
'
= * ( f , f ; i , )
( - >
for all ( i i , . . . , f ) with (h
a
'i = 7
p
i
m
i i > 0 and hence, by analyticity for all
t ) with g j f f t = 7P
From (5.62) it is evident that such £*, if it exists, depends on c and A only through their product 7 . Note that for p — 1, T i is a single point but the dependence on 7 in other cases is genuine. C a s e p = 2, N = 3. S o l u t i o n of G i r i , K i e f e r a n d S t e i n Since 0 ( | , —
(1 + x)exp(^x),
jf [i + ( 7 - -
from (5.62) we obtain
ft)]
= ^ " ^ ( f , i ; |(5.63)
with fi = 7 - f , A = 1 - / 3 . We could now try to solve (5.63) for by using . the theory of Meijer transform with kernal ^ ( 1 , 4121;. 1hx). Instead we expand both 2' sides of (5.63) as power series in f . Let 2
2
2
Jo
be the ith moment of /3 . From (5.63) we obtain 2
(a) (b)
l+t-*m=B, -(2r -
1) p _ ! + r
(2r + 7 W
-
7^+1 = B
T(--)r(i)J
v
y
where B = e~^ d>(^, 1; £ 7 ) . We could now try to show that the sequence { p , } given by (5.64) satisfies the classical necessary and sufficient condition for it to be the moment sequence of a probability measure on [0,1] or, equivalently, that the Laplace transform GO
3
=0
Some Minimal
Test in Muttinormales
115
is completely monotone on [0, oo), but we have been unable to proceed successfully in this way
Instead, we shall obtain a function m (x)
which we
1
prove below to be the Lebesgue density dt]*{x)jdx
of an absolutely continuous
probability measure f* satisfying (5.64) and hence (5.63). T h e generating function CO
of the sequence {fa}
satisfies a differential equation which is obtained by mul-
tiplying (5.64) (b) by f
_ 1
and summing from 1 to oo:
2
2
2t (l-t)V '(t)-(f -7t + J
= B t [ ( l - t r
1
= S ( ( l - t ) "
1
/
2
/
7
M t )
- l ] + 7 [ t ( l - / 0 - l ] 2
- ( - 7 .
(5.65)
T h i s is solved by treatment of the corresponding homogeneous equation and by variation of parameter to yield f J
m n
'
(l-i)
l
/
a
j
0
M
2
zl [3T#-Ift*
2
i 1
2T (l-T) /2
B
]
d
2T(1-T)J
T
'
(5.66) the integration being understood to start from the origin along the negative real axis of the complex plane. T h e constant of integration has been chosen to make VJ continuous at 0 with VJ(0) = I , and (5.66) defines a single-valued function on the complex plane minus a cut along the real axis from 1 to oo. The analyticity of ip on this region can easily be demonstrated by considering the integral of ip on a closed curve about 0 avoiding 0 and the cut, making the inversion w = | , shrinking the path down to the cut 0 < w < 1 and using (5.67) below. Now, if there existed an absolutely continuous £* whose suitably regular derivative m
7
satisfied
j m ( a ; ) / ( l - tx)dx = i>(t), Jo
(5.67)
7
we could obtain m
y
m^(x)
by using the inversion formula = (2irix)
_ 1
limty(x
- 1
+ it) - T / > ( X
-1
- ie)].
(5.68)
116
Group Invariance in Statistical
Inference
However, there is nothing in the theory of the Stieltjes transform which tells us that an m (x)
satisfying (5.68) does satisfy (5.67) and, hence (5.63), so we use
y
(5.68) as a formal device to obtain an m
7
which we shall then prove satisfied
(5.63). From (5.66) and (5.68) we obtain, for 0 < x < 1,
m-,(x) —
B 2
2rr x^ (l
1 2
- x) '
\ J
2
v>l
| l + u
0
+ B 3 2
(1 +
u) '
Ja
1- «
J (5.69)
In order to prove that a%*{x) — m ( x ) dx satisfies (5.63) with £" a probability 7
measure we must prove that
(a)
m ( i ) > 0 for almost all x, 0 <x
(b) f m {x)dx 0
(c)
<1,
T
1
(d) p
r
= l
1
fii = —j
xm-f(x) T
0
x m {x) 1
(5.70) dx satisfies (5.64) (a) dx satisfies (5.64) (b) for all r > 1.
3
2
T h e first condition follows from (5.69) and the fact that B > 1 and u / ( l + 3
2
u) < (1 + w) -' for u > 0. T h e condition (d) follows from the fact that m ( x ) 7
as defined in (5.69) satisfies the differential equation
w! (x) + m ( i ) | 7
+ (l-2x)/2x(l-x)
1/2
3
2
= B/2irx (l-x) < ,
(5.71)
so that an integration by parts yields, for T > 1,
( r + l K - r ^
r
_ , =
/ [(r + l ) x Jo
r
r
l
- rx - \m (x) y
= / Jo = u (l T
dx
dx -
7
/ 2 ) + 7 ii /2 T+l
-
«r_i/2
+ B r ^ + ^/2Tr^r! which is (5.64) (b). T o prove (5.70) (b) and (e) we need the following identities involving confluent hypergeometric function
T h e materials presented here can
Some Minimax
Test in Multinormates
117
be found, for example, in Erdelyi (1953), Chapter 6 or Stater (1960).
The
confluent hypergeometric function has the following integral representation
0 K 6; x) =
T(a + j)T(b)/r{a)T(b
+ j) j \ JS>
5=0
when 6 > a > 0. T h e associated solution ip to the hypergeometric equation has the representation
if a > 0. We shall use the fact the general definition of ip, as used in what follows when a — 0, satisfies iP(0,b;x)
= 1.
(5.74)
T h e functions T/J and satisfy the following differential properties, identities and integrals.
~(a,b;x)=
(j-l)b+l;x)
+ 0(o,6;i),
(5.75)
^ / > ( a , 6; x ) - $ ( a , 6; x) - ^ ( a , 6 + 1; x),
(5.76)
^(a,b;x) = i - ' o K a - H l W a + l . i ; ! ) - ^ , ! ; ! ) ] , (a - b + l ) 0 ( o , 6; x) - a 0 ( a + 1,6; x) + (b - l ) 0 ( a , b~l;x)
= 0,
6 0 ( a , 6 ; i ) - bd>(a - 1 , 6 ; a ; ) - x 0 ( o , 6 + l ; i ) = 0 , ih(a, 6; x) - ai/>(a + 1, b; x) - ip{a, b - 1; x) = 0, ( 6 - a K > ( a , 6 ; x ) -xih(a,b
r
Jo
x
e~ (x
l
+ l;ar) + ^ - ( o - l , f c ; « ) = 0 ,
+ y)- [ - , l ; i
r,l;y
(5.77) (5.78) (5.79) (5.80) (5.81)
for y > 0 , (5.82)
118
Group Invariance
J
in Statistical
Inference
e - * ( s + y)-vQ,l;z)
g
(5.83)
Using (5.73) , in terms of hypergeometric functions, for 0 < x < 1, the m , 7
defined in (5.69), can be written as
=
h[,'^./.{^ / '(^,1,7/2 j
° r r ? r 2TT[X(1 E/2
jf
l
v- e-<"t
2
dv-v^e^^dv
J
„ . ( e " ^ f i i : T / 2 ) ^ i , i ; 7 ( i - » ) / 2 ) ( \2 /
r(|)v-(|,l;7/2)}.
(5.84)
We now prove (5.70) (b) to establish that m-,(x) is an honest probability density function. From (5.73), (5.72) and (5.82) we get
/
VTTt—^*(i.i;7(i-»)/?)#
Jo
2ir x f l - i ) J =
f Jo
VT7T~ r ^ ( l , 1 ; 7 * / 2 ) tte 2ir[x(l - l ) ] a
7o
2 , r [ * ( l - * ) ] » 7o
v
dt 1
J
(5.85)
Some M i n i m a l Test in Mull in orm ales
119
From (5.85), (5.84) and (5.72) we get 4 e„i
fr i
r(|)
h
z
dx
= 2 ^ , l ; ^ ( i , l ; , )
- 0 ( I , l ; ^ ( ^ , l ;
S
where 2z = 7 . We now show that tf'(z)-#(*)=0,
(5.86)
from which it follows that H{z)=ce
l
for some constant c. B y direct evaluation in terms of elementary integrals when 7 — 0 we get (using (5.85)) / m (x) dx = 1; Jo hence, (5.70) (b) follows from (5.S6). T o prove (5.86), we use (5.75) and (5.76), a
which yield H\z)-H(z)
=
+ # / | , l ; » \ ^ Q
>
1
i
5
^
- 2 Ki i; ^G' 2;s ) + ^G' 2; ^G ,M ) 1
1
T o this expression we now add the following four left-hand side expressions, each of which equals zero i> U .
1
1 ~ n H ^2; 2
; *
(N
2* U
^
2
2- ^
T
2 \2
+ 0
-,l;z
1 1 . /3 -,2;* - - t f -,2;* -tf -,l;z 2"V 2 1 . (Z
-01
1 . ^3
- -
T
2 \2
to obtain H' — H — 0, as desired.
120
Group Invariance in Statistical Inference We now verify (5.70) (c). Prom (5.72) and (5.79), with a = §, b = 1 we
obtain 1
(1+7?/) e™'
2
dy = M -,l; /2 7
+70
1
-,2;7/2
/4 (5.88)
Using (5.70) (6), which we have just proved, and (5.85) and (5.88) we rewrite (5.70) (c) as 1-1 AI
jf u + 7(1 - $}] M * ) *
|, mt/aJJ
1-
(l + 7 » ) 2T[»(1- )]I/2
1 ; 7
/
W
2
M
, ?
V
- e ^ r 1 7
U
11; /2
g ^(l,l;7g/2) 2^(1 -
,
y
0(|,l; /2)
I
7
f Jo +
1
7 ^ ( 1 , 1 ; 7J//2) 1
2T[J/(1 — J f ) ] '
l
,
f 7
+
f
r ( | ) f ( f . 1,7/2)
=
)\dy
7
0
1
0(1,1; Tg/g) 2^^(1-2/)]!^
(l + 73/)e^
»
2
2ff[v(l - J/)]V2
dy
2
A
l„/3 1_ 2 r ( - ] ^ ( - , l ; / 2 ) - -2 r [V- 2] ^ [ - , l ; 7 / 2
(5.89)
7
Using (5.81), with a = 1, 6 = 0 and (5.72), (5.73), (5.83) we get f
1
/„
7g^(l,l;7»/2)
, _
2 ^ ( 1 - , ) ] ^ ^ - y
1
f
M0,0;7y/2)-»(l,O;7y/2)]
, d
0
»
1
= 1 -
f _ dy Jo * [3/(1 - » ) ] ^ Jo a
= l - j T ( l + = 1+ r
r V Q ^ 7 * / 2 ) e - ^
(|) ^(I^^A-^d-HT/s)
/2. (5.90)
Some Afinimai Test in Mv.lttnorma.les
121
T h u s (5.89) and (5.90) imply (5.31) and, hence (5.70) ( c ) . C a s e p = 2, N = 4. S o l u t i o n o f L i n n i k , P l i s s a n d S a l a e v s k i From (5.62) the problem consists of showing (with 0i — /J) on the interval 0 < 8 < 1 there exists a probability density function £{/3) for which
= 0(2,l; /2).
(5.91)
7
W i t h a view to solving this problem we now consider the equation | i (7 - t l ) ( l -
J
j *
e
=
"<'-«/ ^2, | ; M / 2 ^ { |
l
8)/2)mW
4l<\M^-0)/2jmdf3.
(5.92)
Using the fact that
we rewrite (5.92) as
16-^0(2,1; -
/
1
- ^ / 2
e
[
1
7 -10
^ ) [ 1 +
+
7
_
7
^
K
(
A
)
(
f
- d d -
/
/3)K(W
J
(
5
9
3
)
Jo From (5.93) we get
2
2
l
/" e - ^ [ ( r - 2r W~ Jo - (7 + irW }t:(0)d0 +1
+ (1 + 7 + - 0,
7
r +
2
r
2r )8
for r = 1 , 2 , . . . .
(5.94)
Let
r=l
be an arbitrary function represented by a series with radius of convergence greater than one. Multiplying the r t h equation in (5.94) by o _ i and taking r
the sum over r from 1 to 00 we get
122
Group Invariance in Statistical
Inference
f1 3
2
2
(2/3 - 2 / 3 ) / " + (-5/3 + 60
s:
! - l +
=
o 0 - ^ + f 0
J LUMP) Jo
2
- f&
2
3
+ /3 )/' 7
) ,
d0 (5.95)
= 0 where L{f)
denotes the differential form inside the square brakets. Applying
the green formula to L(f)
and choosing a small e > 0 we get (5.96)
where the adjoint form L*(£) is given by 2
L * ( 0 = 20 (0
- 1)C + 0{-Z
+ 60 + 7/3 - 7 / 3 K ' 2
(5.97) and the bilinear form L[f, £] is given by
m% = ^ ( ! - ^ ) ( ^ - / ' o - \0+7^ (i - mi n • 2
2
2
(5.98)
Using the thoery of Frobenius (see Ince, 1926} we conclude that CC
CO
(5.99) £=0 with tto / 0, Co / 0; are two fundamental solution of L*(Q — 0 on the interval (0,1)- Similarly fc
£ n = £ 96(1 - /J) ,
jia - (1 - /3)"5 £ fefcd - 0)
1
k=0
(5.100)
k=0
with 3o ^ 0, fto # 0 are also two fundamental solutions of the same equation on the interval (0,1).
Consequently any arbitrary solution of the equation
L*(£) — 0 is integrable on [0,1] and we assert that there also exists a solution of the equation L*(£) — 0 for which h m L [ M ] | ; - < = 0.
(5.101)
Some Minimax
Test in Mullinormales
123
It can also be shown that any linear combination of the solutions in (5.99) (similarly for solutions in (5.100)) satisfies (5.101) provided that £(•) = £ i ( - ) . 2
T h u s £ - £12 is a solution of (5.95) and hence of (5.91). I n the following paragraph we show that £ i does not vanish in the interval 2
(0,1). Write 0 = 1-*,
*(ar)=A5*yf & i ( l - 4 -
(5.102)
The equation L"(t\) — 0 becomes 2
2x{\ - x)8" + ( - 1 + 4x--,x Substituting u(x) = 6'(x)j8(x)
+ -yx )S' + ^(1 + jx)$ z
= 0.
(5.103)
we obtain the Riccati equation
l-4s
+
7
a
-
7
s ' _
3+^7*
2x{l-x)
v
4x(l-x)
1
Let us recall that 8(x) — E t ^ o ^ ) * ' • > 1*1 < !• Assume that 9{x)
'
vanishes
at some point in the interval (0,1). Since 0(0) = 1, among the roots of the equation ${x) = 0 there will be a least root. L e t us denote it by XQ. the function 0(x) £?'(0) —
is analytic on the circle \x\ < x$.
| which implies that u(0) = — | .
Then
From (5.103) we get
Hence u(x) cannot vanish in the
interval (0, Xrj), since at any point where it vanishes we would necessarily have u'(x) > 0 whereas (5.104) shows that at this point u'(x)
< 0. T h e fact that
the function u(x) has a negative value implies that it decreases unboundedly as we approach the point XQ from the left (u' / contradicts (5.104). A s a result £ = £
i 2
0 since
which in turn
does not vanish in the interval (0,1).
L e t us now return to (5.91), which, as we have seen, is satisfied by £ — £
1 2
.
B y the method of Laplace transforms it is easy to obtain the relation
b l
/ x - {l Jo
-xy-^ei
r(t)r(c - b) r(c)
*tt>(a,b;px)
-4>(a,c;p + q),
x))
0
(5.105)
Letting t\ = 7 T , and multiplying both sides of (5.91) by r
5 (1 — r )
' and
integrating with respect to r from 0 to 1 we obtain with a = 2, 6 = | , c = 1,
p = 7£72,? =
7(l-W2,
| y 2 , l ; ^ W
=p ( H ; ^ )
?
( W .
(5.106)
124
Group Invariance in Statistical
Inference
Now to obtain (5.91) it is sufficient to make use of the homogeneity of the above relation and to normalize the function £12. 5.3.2.
R?-test
Consider the setting of Example 5.1.5 for the problem of testing H against the alternatives Hi
: p
2
=
A > 0.
alternatives p
2
> 0 the R -test
:p
2
—0
We have shown in Chapter 4
that among all tests based on sufficient statistic (X,S) 2
0
of HQ against the
is U M P invariant under the affine group ( G , T )
of transformations ( 9 , t ) , g e G , t e T operating as (X,S)
—• (gX + t,
gSg')
where g is a nousingular matrix of order p of the form a
=
0
(%m
with 3 ( H ) of order 1 and t is a p-vector. For p > 2 this result does not imply the minimax property of
2
R -test
as the group ( G , T) does not satisfy the condition of the Hunt-Stein theorem.
As discussed in Example 5.1.5 we assume the mean to be zero and
consider only the subgroup G r ( p ) for invariance. T h i s subgroup satisfies the conditions of the Hunt-Stein theorem. is R =
(Rg,...
,Rp)'
A maximal invariant under GT-(P)
with a single distribution under Ho but with a dis-
tribution which depends continuously on a (p — 2)-dimensional parameter A
— (6 , • • • ,S Y 2
P
r
function f\, { ) 7)
with £ J
= A
6j
—p
2
— A under St-
T h e Lebesgue density
°^ ^ under Hi is given in (5.29). Because of the compactness
of the reduced parameter spaces {0} and
r =
6 y,
6i>o,
p
r
and the continuity of fx, { ) v
m
£ 4 = A]
(5.107)
1" '* follows that every minimax test for the 2
reduced problem in terms of R is Bayes. In particular, the Jf -test which has 2
a constant power on the contour p
= A and which is also G (p) T
invariant,
maximizes the minimum power over H i if and only if there is a probability measure on Y such that for some constant cj
according as
Some Minimax
Test in Multinormales
125
except possibly for a set of measure zero. A n examination of the integrand in (5.108} allows us to replace it by the equivalent
/ |McktfA3««i Obviously (5.108) implies (5.109).
i f X > , = c-
(5-109)
O n the other hand if there is a £ and a
constant c\ for which (5.109) is satisfied and if f = (r~2>---.*p)' f
Y% i
7
= c > c, then writing / =
and r' = cf/c'
/(f) = / ( c V / c ) > / ( r - ) =
6
a
in (5.29) 7 " ( l — A) — 1 — - J2j>i jfai
n
d
t
h
a
t
i
u
c
n
t
n
a
t
we conclude that
C l
=
because of the form of / and the fact that c'/c > 1 and ^ j j r * 1
s
ti > °-
T
n
i
c
s
- Note that ^
similar
argument for the case c' < c show that (5.108) implies (5.109). T h e remaining computations in this section is somewhat simplified by the fact that for fixed c and A we can at this point compute the unique value of C\ for which (5.109) can possibly be satisfied. L e t R = (R2,...,
Rp-i)'
and write /x,Tj(r|i/) for the version of conditional
Lebesgue density of R given that E 2 Ri = U = u which is continuous in f and _
for Ti > 0 and E i
1
r« < w < 1 and zero elsewhere. Also write
Lebesgue density of R
(u) for the
— E 2 Ri = U which is continuous for 0 < u < 1 and
2
vanishes elsewhere and depends on A only through S - £ 2 ^ , - T h e n (5.109) can be written as
/ for fi > 0 and E S P
1
r
/ ,,(r|c)£(dA) A
ci
/c*(0
75(c) J
c
> < - T h e integral in (5.110), being a probability mixture
of probability densities, is itself a probability density in f, as is / o o ( r | c ) . Hence the expression in square brackets equals one. From (4.35) with 0 < c < 1
=
M
)
q - A ) ^ - »
r(^-i»
, _ ( p
_
3 )
r((jv- )/2)r((p-i)/2)
1
P
\(N-l),
1(JV-1);±(P-1)
; C
i { N
_ _ p
2 )
' A)
where F(a, 6; c; x ) is the ordinary 2^1 hypergeometric series given by
(5.111)
126
Group Invariance in Statistical
F(a,b;c;x)
Inference
= 2^ - -—* r(«)r(6)r(c + r) r=0
E
(g)
.
where we write ( a )
r
(fc) _
r
r
(c)
r)/T(a).
= T(a +
r
(5.112)
r
From (5.110) the value of
a
which
satisfies (5.109) is given by (i-A)tt"-'>r
l
"
,,„_ ,
.
3
1
r(|(/v- ))r(i(p-i))
(
N
- _„ P
J
P
FQ(JV
- 1), i ( J V - 1); | ( p -
1);CA)
.
(5.113)
From (5.113) and (5.29) the condition (5.109) becomes
£ [ i
+
£ r ( i - A ) / 7 - t ) i
i
03=o *
r ( i ( J v - j + l) +
i l r ( | ( i v - i
=
FQ(/V-1),
+
l
ft-)
r 4
r j
rci(N-U)
0 =a p
(l-A)/
7 j
( l + 7r-')
ift
i))(2/3 )! U + £ ^ ( ( 1 - * ) / • > ; - i ) J
"
j
-(N-l);
;
(5.114)
i(p-l);cA)
for all T with r, > 0 and £ J ri = c. C a s e p — 3 , N — 4 ( N — 3 if m e a n k n o w n ) . S o l u t i o n of G i r i a n d K i e f e r In this case (5.113) can be written as . + 2n)
>*3 *3
V(l-4)(i-^A)
+
r g a ( l - A ) ( 2 n + l ) ( 2 n + 3) a
(l-«a)(l-r A)l a
Ta 83 (l-fc)(l-r A) 2
(5.115)
One could presumably solve for £ by using the theory Meijer transforms with kernal F ( § , §; \\x).
2
Instead they proceeded as for Hotelling's T
problem,
128
Group Invariance in Statistical
Inference s
measure on [0,1], or, equivalently that the Laplace transform £ ™ ltj(-t) lj\
is
completely monotone on [0, oo}. Unfortunately we have been unable to proceed successfully in this way. We now obtain a function m {x), t
which we then prove
to be the Lebesgue density d £ * ( x ) / d x of an absolutely continuous probability measure C satisfying (5.118) and hence (5.115). T h a t proof does not rely on the somewhat heuristic development which follows, but we nevertheless sketch that development to give an idea of where the 111,(1) of (5.123) comes from. T h e generating function CC
3=0
of the sequence {fij}
satisfies a differential equation, which is obtained from n
l
(5.118) by multiplying it by t ~
and summing with respect to n from 1 to 00, l
2(1 - t)(t - z)
-
2
- 2zt + z)(t) (5.119)
1
tf,(l-t) i
-1-zf" .
Solving (5.119) by treatment of the corresponding homogeneous equation and by variation of parameter, we get
k U l - r ) (r-
m=
W=T)\
Cr-*)l (1-*)*
+
1
dr.
2[T(1-T)(T-Z)]1
(5.120)
T h e constant of integration has been chosen to make
m (x) f
Llo we could obtain m
c
m (x) z
(1
—
z
satisfied
dx = 0 ( f ) ,
(5.121)
tx)
by using the simple inversion formula = -
. lim
- 1
- ie)
(5.122)
27TIX E l O
Since there is nothing in the theory of Stieltjes transforms which tells us that an m
z
satisfying (5.122) does satisfy (5.121), we use (5.122) a formal device to
Some Minimax Test in Multinormales
obtain m
E
129
which we shall prove satisfies (5.115). From (5.120) and (5.122) we
obtain, for 0 < x < 1,
(
«•(«)=
(1-**)* 1
;
M
)
,
f
n
k
f
X
/
(1 + u)(z + u p
du
[u(l + u ) ( z + i*)]*
- 2 (1 + 1 i ) i ( z + « ) { B , Q , ( z ) + c,} (say).
(5.123)
2jr(a:(l - as))i We can evaluate c
c
by making the change of variables V = (1 + u)
1
and using
(5.126) below. We obtain
and
Q , - 2 ( l - z } -
1
+ (l-z)"t
[ l - ( l - ^ ) - ^ log
' ( l - z ^ i + f l - z ) ^
l - ( l - z ) i
. ( 1 - « ) * - ( ! - * ) *
l + (l-z)*.
Now to show that £*(da;) = m (x) I
'
dx satisfies (5.115) with £* a probability
measure we must show that
(a) m , ( i ) > 0 for allmost all x, 0 < x < 1; (b) fi m (x)
dx = l;
z
(c) /ti =
1 1 1 1 , ( 1 ) da; satisfies (5.118a);
(d) u
x
n
=
n
m,(a;) da; satisfies (5.118b) for TI > 1.
(5.124)
130
Group Invariance
in Statistical
Inference
T h e first condition will follow from (5.123) and the positivity of B
and c ,
z
for 0 < z < 1. T h e former is obvious. To prove the positivity of c ,
we first
z
note that
j
this is seen by comparing the two power series, the coefficient of z [(f j j / j f ] I)2/j{j
+
3
being
and (j + 1), and the ratio of the former to the latter being n";=i(» + i ) > i . T h u s we have B
>l-2.
M
Substituting this lower bound into
the expression for c and writing u = 1 — 2 the resulting lower bound for c z
z
has
a power series in u (convergent for \u\ < 1) whose constant term is zero and 1
2
whose coefficent of v> for j > 1 is [(j + i ) " - T (j
+ ±)/r(j)r(j + 2
Using the logarithmic convexity of the T-function, i.e. V (j+^) the coefficient of u> for j > 1 is > (j + | )
-
1
+1)].
< !T(j)r(i + l ) ,
1
- (j + l ) " > 0 Hence c
z
> 0 for
0 < z < 1. T o prove (5.124)(d) we note that m , ( i ) defined by (5.118) satisfies the differential equation 2
1
1 - 2x + zx m' (x) x
+ 7ro,(i)
= B /2nxHl
- x)%{\ - zx),
z
x(l
- x)(l
(5.125)
— zx)
so that an integration by parts yields, n > 1 - z ( n + 2)M-v+I + (1 + z)(n n+1
=
/ {-z(n Jo
=
[ x (l-(l Jo
= - ) -
f
i
n
l
+
2
l
l
n
n
+ ( 1 + z)(n + l)x
-
n
zu
1
-nx ~ }
m {x) z
dx
2
+ z)x +
_
n^ -i n
+ 2)x
n
+ l)p« -
zx )m' (x)dx t
n + 1
+ B
t
—
^
|
,
which is (5.118)(b). T h e proofs of (5.124) (b) and (c) depends on the following identities involving hypergeometric functions F{a, b; c; x) which has the following integral representation when Re(c) > Re(b) > 0: F(a,b;c;x)
=
r
(
&
)
^ _
6
)
f
- t )
M
( \
- tx)~*dt.
(5.126)
We will also use the representation 2 ^ ( i l ; ^ ; x )
=
1
o g ( l ± f )
(5.127)
Some Minimax Test in Multinormales
131
and the identities =
F(a,b;c;x) ( c - o - 1 ) F(a,b;c\x)
+aF(a
- ( c - 1) F{a,b;c lim
(5.128)
F(b,a;c;x),
- l;x)
+
l,b;c;x)
= 0,
(5.129)
[TicT'F^b-c^x)
=
(a)
( f t U i ^ M A *-(„ + „ + i (JI + 1)! +
l
)
6
+
B
+
1
;
„
+
2
;x) (5.130)
for 7i = 0 , 1 , 2 , . . . , r ( l + A + n ) r ( l + y + /Q
= F \ l + A, ~
- v; 1 + X +
- A, 1 + V;1 + J/ + ii; 1 - * j
- i - A, ~ +1/; 1 + « + ./*;! — as
+ F\ \ + X, \ - u; \ + X -t- fi\x \ f [
- j
Q
+ A , | - i / ; l + A + as)
F ( | - A,|+
W
C
+ y +
- ;£.j;
6
F(ffl,6;c;x) = (1 - x ) - " - F ( c - a , c - b;c; x).
(5.131)
We now prove (5.124) (b) and (c). F r o m (5.123), using (5.126) and (5.127) we obtain
-(I-48
— F
F ^ , i ; l ; ^
+
(
3 1 - , - ; 2; 1 - z] x F\ 2 2
l
r / 2 )
-
| ; 1; 1 -
^
1 1 2 2
l + (i-z)5(i-zx) (1 — log 3 X 1 1 i 2ir(l - j ) t i i ( l - i ] l \ 1 — 6
I 1 zi) ? -
(5.132)
132
Group Invariance
in Statistical
Inference
The first expression in square brackets in ( 5 . 1 3 2 ) varishes, as is easily seen from the power series of F in ( 5 . 1 1 1 ) .
Using ( 5 . 1 2 7 ) , the integral in ( 5 . 1 3 2 }
can be written as y.
_ J
^ ( l "
3
) ^
1
zT
(1 -
f
dx
2n + l Jo
(l-x)i(l-zx)"
V
71=0 1
°° °°
z, m( (^l )\ m ^oo m
2 * H
r a
(m!) v
m=0
'
2
(n + m ) ! < l - J t ) "
Z -
B
!(n+|) v
n=0
3'
(i - f )
m=0
=
(
1 1
_
z
)
-
1
+
'
l
(
_
1
2
V
z
)
rl-l r ' _ — i i ^
- i ;
1
= ( 1 - z } " + ( T T / 4 ) (1 -
;
m
+
_
( i - 2 - o *
0
i r * * ( j ,
| ; 2 ; 1 - z ) .
(5.133)
From ( 5 . 1 3 2 ) and ( 5 . 1 3 3 ) we get
|
m,(«)(fc = ( 7 r / 2 J F f-i,i l;4ri " 2 ' 2 " " / T
F ,
!
- P
rii;l
V2'2
1.1,*.-.)]
Hence, from ( 5 . 1 2 9 ) and ( 5 . 1 3 3 ) we obtain from
(5.134)
- F ( - ^ - ^ I ^ ) ] } .
(5.135)
Some Minimax Test in Multinormales
Using (5.129) with a = b =
133
c = 1 + e and (5.130) with n = 0 and
c = e —* 0, (5.132) reduces to 1 1 1 1 3 1 - - , - ; l ; z ] F ( - , - ; 2 ; l - z ) + ( / 2 ) F (-, 2; 1-z 2'2' 2'2'
mF[
)F[
E
1 1 -;2;* 2'2' (5.136)
Now, by (5.131), the expression (5.133) equals one if we have
-/•(
- i , 4 ; l ;
Z
)
+
(
2
/ 2 ) F ( i , i ;
2
;
= 0.
.
(5.134) T h e expression inside the square brackets is easily seen to be zero by computing n
the coefficent of z . T h u s we prove (5.124)(b). We now verify (5.124) (c). T h e integrand in (5.132) is unaltered by multiplication by x, and in place of (5.133) we obtain (1 — Z)~ /2 1
z)i]
— Z~ /3+[K/4Z(1 1
—
F ( - | , | ; 2 ; 1 - z). T h e analogue of (5.134) is
pi
—
I x m (x) Jo
dx
2
^i
;2;*)
+
kin-
2
(7T/4)(1-Z )/Z
-[(l-z)*/3.]F|
'3 (
n ^ l ^ i - z
1 3 ;
2 2
;
3
2'2
:
(5.135)
1
T o verify (5.124) (c) we then have to prove the following identity (by (5.118)
ril +
x F
3
3
2'2'
F
,1:1:1-* 2
<x/4)[(l-*) /*]
1 3 - - , - ; 2 ; l - z 2'2'
.
(5.136)
134
Group Invariance in Statistical Inference
From (5.131) with c = 1, a = b = § , then (5.129) with a = ±, 6 = - J ,
c = 1,
and then (5.130) with A = /i = 0, f = 1 we can rewrite (5.136) as
(4/3ir){l + 2»)
+ (3z/2)F(i,-|;2i *)
| , | ; 2 ; 1 - * ) + (4/3TT)(1 -
i
i
1
3
2'
2'
2'
2
z)
(5.137) Using (5.129) with a = - f , b = § , c = 1 + «r, and then (5.130) with n = 0, c — « - * 0, the expression inside the square brackets in the last term of (5.137) can be simplified to \ z F ( — I , ^ ; 2 ; r ) . T h u s we are faced with the problem of establishing the following identity
4/TT = Fl
1 3 1 1 — - , - ; 2; z j F\ - , - ; 1; 1 — z 2 2 2'2'
(5.138) which finally by (5.126) reduces to
-^ F G'-^ ;2;1 - 2 ) +((1 - £)/2 - 1)F (^ ;2;1 - 3 ) • (5.139) T h e expression inside the square brackets in (5.139) has a power series in 1 - z, the value of which is seen to be zero by computing the coeificents of various power of 1 - z. Hence we prove (5.124) (c).
Some Minimal Test in Mul(inormales 135 5.3.3.
E-Minimax
test
(Linnik, 1966)
Consider the setting of Sec. 5.3.1. 6 = 0 against alternatives Hj : 8
2
for Hotelling's T
2
lest for testing H
> 0 at level Q(T(0,1). T h e Hotelling T
2
0
:
test
0jv is £ - m i n i m a x if sup inf E (
inf
8
E (
(5.139)
N
for all JV > / V (e) where 8 = (ft, E ) and 0 runs through all tests of level < a. 0
For N - * oo, A = ^ , 0
< tp < 2(log N ) * , (5.26) has the approximate
solution
which is a pdf on the simplex I V If we substitute (5.140) in the left side of (5.62) we obtain a discrepancy with its right side which is of the order of 0 ( J V £
- 1 + £
) , for e > 0.
T h u s , if
a = ON satisfies the condition 1
0(exp{-logN)V*)
< a < 1 - Ofexpf-logfV) ^)
(5.141)
and we have £
1
e x p ( - ( l o 7 V } ^ ) < 8 < (logfiV)) /*
(5.142)
g
then sup inf E () e
inf
for e > 0 and hence the Hotelling T
2
1
5
Eg(4> ) = O ^ l / i V " ) N
test is e-minimax for testing H
a
against
i f , : 8 > 0. Exercises 1. Prove in details (5.65). 2. Prove (5.72) and (5.73). 3. Prove (5.119) and (5.120). 4. Prove (5.127) and (5.130). 5. Prove that in Problem 8 of Chapter 4 no invariant test under ( G T - , ! ^ ) is minimax for testing Ho against Hi for every choice of A.
136
Group Invariance in Statistical
Inference
References 1. M. Behara, and N. Giri, Locally and asymptotically minimax test of some multivariate decision problems, Archiv der Mathmatik, 4, 436-441, (1971). 2. A. Erdelyi, Higher Transcedenial Functions, McGraw-Hill, N . Y . , 1953. 3. J . K. Ghosh, fnvariance in Testing and Estimation, Publication no S M 67/2, Indian Statistical Institute, Calculta India, (1967). 4. N. Giri, Multivariate Statistical Inference, Academic Press, N . Y . , (1977). 5. N. Giri, Locally and asymptotically minimax tests of a multivariate problem., Ann. Math. Statist. 39, 171-178 (1968). 6. N. Giri, and J . Kiefer, Local and asymptotic minimax properties of multivariate test, Ann. Math. Statist. 39, 21-35 (1964). 7. N. Giri, and J . Kiefer, Minimal character of R -test in the simplest case, Ann, Math. Statist, 34, 1475-1490 (1964). 8. N. Giri, J . Kiefer, and C . Stein, Minimax properties of T -test in the simplest case, Ann. Math. Statist. 34, 1524-1535 (1963). 9. E . L . Ince, Ordinary Differential Equations, Longmans, Green and Co., London (1926). 10. W, James, and C . Stein, Estimation with quadratic loss., Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 361-379 (1960). 11. E . L . Lehmann, Testing Statistical Hypotheses, Wiley, N . Y . (1959). 12. Ju. V. Linnik, V. A. Pliss and Salaevski, On Hotelling's Test, Doklady, 168, 719-722 (1966). 13. J . Kiefer, Invariance, minimax sequential estimation, and continuous time processes, Ann. Math. Statist. 28, 573-601, (1957). 14. Ju. V. Linnik, Appoximately minimax detection of a vector signal on a Gaussian background, Soviet Math. Doklady, 7, 966-968, (1966). 15. M. P. Peisakoff, Transformation Parameters, Ph. D. thesis, Princeton University (1950). 16. E . J . G . Pittman, Tests of hypotheses concerning location and scale parameters, Biometrika 31, 20G-215 (1939). 17. O. V. Salaevski, Minimal character of Hotelling's T test, Soviet Math. Doklady 3, 733-735 (1968). 18. J . B. Semika, An optimum poroperty of two statistical test, Biometrika 33, 70-80 (1941). 19. L . J . Slater, Confluent Hypergeometric Functions, Cambridge University Press (1960). 20. C . Stein, The admissibility of Hotelling's T test, Ann. Math. Statist, 27, 616-623 (1956). 21. A. Wald, Contributions to the theory of statistical estimation and testing hypotheses, Ann. Math. Statist 10, 299-320, (1939). 22. A. Wald, Statistical Decision Functions, Wiley, N.Y. (1950). 23. O. Wesler, fnvariance theory and a modified principle, Ann. Math. Statist. 30, 1-20 (1959). 2
2
2
2
Chapter 6 L O C A L L Y MINIMAX T E S T S I N S Y M M E T R I C A L DISTRIBUTIONS
6.0.
Introduction
In multivariate analysis the role of multivariate normal distribution is of utmost importance for the obvious reason that many results relating to the univariate normal distribution have been successfully extended to the multivariate normal distribution.
However, in actual practice, the assumption of
multinormality does not always hold and the verification of multinormality in a given set of data is, often, very cumbersome, if not impossible. Very often, the optimum statistical procedures derived under the assumption of multivariate normal remain optimum when the underlying distribution is a member of a family of elliptically symmetric distributions.
6.1. E l l i p t i c a l l y S y m m e t r i c D i s t r i b u t i o n s D e f i n i t i o n 6 . 1 . 1 . (Elliptically Symmetric Distributions (Univariate)). A random vector X = (X\, • • - , X )' ?
with values in E
p
is said to have a distribu-
tion belonging to the family of univariate elliptically symmetric distributions with location parameter (i = (pi,...
,/i )' e E p
p
and scale matrix £
(positive
definite) if its probability density function can be expressed as a function of 1
the quadratic form (x - n)''S~ (x
fx(x)
= |E|-
1 / 2
- p) and is given by
ff((x -
- ft)),
137
i e
p
£ ,
(6.1)
138
Group Invariance
in Statistical
Inference
where q is a function on [0,oo) satisfying J o(y'y)dy
= 1
p
for y e
E.
We shall denote a family of elliptically symmetric distributions by E (n, p
£).
It may be verified that E{X) where a — p the class E {u p
t
family E (ti,
- 1
E{{X
= u, l
- n)'%- (X
cov(X)
= oS
- f*)). In other words all distributions in
S ) have the same mean and the same correlation matrix. T h e S ) contains a class of probability densities whose contours of equal
p
density have the same elliptical shape as the multivariate normal but it contains also long-tailed and short-tailed distributions relative to multivariate normal. T h i s family of distributions satisfies most properties of the multivariate normal. We refer to Giri (1996) for these results. D e f i n i t i o n 6 . 1 . 2 . (Multivariate Elliptically Symmetric Distribution). A n x p random matrix x = (x ) ij
where Xi — (Xn,...,
X )'
=
(x ,---,x y l
n
is said to have a multivariate elliptically symmetric
ip
distribution with the same location parameter p = ( p i , . . . , p } ' and the same p
scale matrix S (positive definite) if its probability density function is given by
f (x) x
p
with Xi e E ,i
2
= \L\-^ q
r g f o
-
tf^fa-tM
,
(6.2)
= 1 , . . . , n and q is a function on [0, oo) of the sum of quadratic
forms {ii — p ) ' S
- 1
( i i — p ) , i — 1,.... ,H.
We have modified the definition 6.1.2. for statistical applications.
Ellip-
tically symmetric distributions are becoming increasingly popular because of their frequent use in filtering and stochastic control, random signal input, stock market data analysis and robustness studies of multivariate normal statistical procedures. T h e family E (u, p
S ) includes, among others, the muiltivariate nor-
mal, the multivariate Cauchy, the Pearsonian type I I and type I V , the Laplance distributions. T h e family of spherically symmetric distributions which includes the multivariate-t distribution, the contaminated normal and the compound normal distributions is a subfamily of i £ ( p , £ ) . p
Locally Minimax Tests in Symmetrical
Write f (x) x
Distributions
139
in (6.2) as (6.3)
where 8 = {p, E ) e Q,x
e x and * is a measurable function from \ m
Let us assume that y is a nonempty open subset of R
o
n
t
o
3^-
and is independent of
0 and q is fixed integrable function from y to [0, oo) independently of 8. L e t G be a group of transformations which leaves the problem of testing H
0
:8 £ i l / / against Bi : 8 £ flff, invariant. 0
Assume that \ is
a
C a r t a n G-space and the function i/j satisfies *(gx\8)
for ail i
=
gV(x\8)
(6.4)
€ x> ff £ G , # G SI and g € G where (5 is the induced group of
transformations on * corresponding to g £ G on x
a r |
d G acts transitively on
the range space y of *£. Using (2.21a) the ratio i i of the probability density function of the maximal invariant T(X)
under G on x; with 8\ 6 QH, , #o £ QH > > given by s
0
where 6(g) is the Jacobian of the inverse transformation x —> gx. T h e o r e m 6 . 1 . 1 . / / G acts transitively is independent Proof.
of
then R
0
Since G acts transitively on y
element k(x,8j
on the range space y
of q for all 8\ 6 tiff,, do £ fln , for any j o £ J there exists an
: y ) of G such that 0
(6.6) Using the invariance property of p and replacing g by gh(x,8i
: j / ) we get 0
140
Group Invariance in Statistical
where A
r
Inference
(-) is the right modular function of ft. Hence
Q{8 )8{h{x,8 0
: y )-i)A (h(x,6>
0
0
r
: y ))
0
0
is independent of g.
•
6.2. L o c a l l y M i n i m a x T e s t s i n E (fi, p
E)
In recent years considerable attentions are being focussed on the study of robustness of commonly used test procedures concerning the parameters of multivariate normal populations in the family of symmetric distributions. For an up-to-date reference we refer to K a r i y a and Sinha (1988).
T h e criterion
mostly used in such studies is the locally best invariant ( L B I ) property of the test procedure. G i r i (1988) gave the following formulation of the locally minimax test in E (ii,
£ ) . Let F be the space of values of the maximal invariant
p
Z — T(X)
and let the distribution of T(X)
with 6 > 0 and 77 g W.
depend on the parameter (<5,77),
For each (u, 77) in the parametric space of the distribu-
tion of Z suppose that f(z
: $,tf) is the probability density function of Z with
respect to some r/-finite measure. Suppose the problem of testing Hg : 8 e f l / /
0
against Hi .8 e f i / / , reduces to that of testing H0 : 8 - 0 against Hi := A > 0, in the presence of the nuisance parameter IJ, in terms of Z. We are concerned with the local theory in the sense that f(z
: A,n) is close to f{z
: 0,17) when A
is small for all q in (6.2). Throughout this chapter notations like o ( l ) ,
o(h(X))
are to be interpreted a A —* 0 for an q in (6.2). For each a , 0 < a < 1 we now consider a critical region of the form R' - [U{x)
= U(T(x))
> C]
(6.7)
a
where U is bounded, positive and has a continous distribution for (8, r/), equicontinuous in (S,ri) with 6 < S for any q (6.2) and R' 0
P , „ ( f l * ) = cv, P ( f T ) = a + h{\) 0
M
satisfies
+ rfA,/,)
(6.8)
for any q in (6.2) where r(A,ri) - o(/i(A)) uniformly in 77 with h{\) > 0 and h(X) = o ( l ) . Without any loss of generality we shall assume h(X) — bX with
£>> 0. Remarks. (1) T h e assumption P „(R*) 0t
of U under H
0
= a for any q implies that the distribution
is independent of the choice of q. T h e test with critical
region R' satisfying the assumption is called null robust.
Locally Minimax Teats in Symmetrical Distributions (2)
T h e assumption P\^(R")
— a + h(X) + r(X,ri)
the distribution of U under Hi
141
for any q implies that
is independent of the choice of q as
X —> 0. T h e test with critical region R" satisfying the assumption is called locally non-null as X —• 0. L e t fo, £x be the probability measures on the sets {S = 0 } , {6 —
X},
respectively, satisfying
/
f(z
/
:
X,T,)Udv) = 1 + h(X)[g{X) + r{X)u] + B{z, X) n
0
for any q in (6.2) where 0 < Ci < r(X) g(X)
(6.9)
f(z:Q, )( (dri)
— o ( l ) and B(z,X)
< c
< oo for X sufficiently small,
2
= o(X) uniformly in z.
Note. If the set 8 = 0 is a single point, £ assigns measure 1 to that point. In 0
this case we obtain
j
=
t
(
Since G acts transitively on the range space y of
6
1
0
)
the right-hand side of
(6.10) is independent of q. T h e o r e m 6.2.1.
If R* satisfies (6.8) and for sufficiently
small X, there
(6.10), then R* is locally minimax for testing HQ :
exist £\ and £o satisfying
5 = 0 against the alternative Hi : 6 = X for any q in (6.2) as X —* 0, i.e. l
i
iaU,Px, (R*)
m
v
A-.osup^
i e Q a
for any q in (6.2), where Q
a
iaf„Px, {4x v
- a
=
rejects
H} 0
j
,
f i
^
- a
is the class of test d>\ of level a.
P r o o f . Let n
1
= (2 + /i(A)[9(A) + C* 7-(A)])- . Q
A Bayes critical region for (0, 1) losses with respect to the a priori T\Z\ + (1 - TX)£O is given by
£J—
142
Group Invariance
B y (6.10) B (z) x
Inference
holds for any q in (6.2). Write B (z)
x
and W
in Statistical
= B,
x
V
x
x
= R* -
B
x
= B\ — R". Since s u p J 5 ( z ) / 7 i ( A ) | = o ( l ) and the distribution of U A
is continuous, letting
KM)
= /
PxMUdv)
we get
Since, with A — Vj, or p&t^)=%f,4!
(i+o(ft(A))
writing r J ( A ) - (1 - r ) P J ( A ) + r ( l A
i A
A
f&fA))
we get the integrated Bayes risk with respect to f j as r J ( B > ) - (1 - r O P o , x ( S , ) + r ( l A
r (R*)
=(1
x
P&pjJ).
- r )P %(J?) + rx(l - P ; , ( P ' ) ) • A
0
A
Hence for any q in (6.2) r ( B ) = rl(R-) A
A
+
+ (1 - n ) t * f o T O "
T (P ' a
= rl(R')
1
I A
(V )-P - ,(W,)) A
1
I
+ (1 - 2 ) ( P * ( H M - P * ( V ) ) n
0
A
0
= ri(il*) + (MA)).
A
A
(6.13)
0
If (6.11) is false for all q in (6.2), then by (6.8) we can find a family of tests { has power function cv + r(A, n) on the set {6 = A} x
for all q in (6.2) satisfying lim
sup (inf[r(A,jj) -h(X)]/h(X) A —> 0 "
Hence, the integrated Bayes risk r lim
A
> 0.
of d>\ with respect to
then satisfies
sup ( r ; ( P ' ) - r ) / M A ) > 0 A— 0
for all q in (6.2) contradicting (6.13)
A
•
Locally Minimax
Tests in Symmetrical
Distributions
143
6.3. E x a m p l e s E x a m p l e s 6.3.1.
L e t X = (Xij)
= (X[,.
.. X' ), t
X[ = (X' ,...
n
,X ),
tl
ip
i — 1 , . . . , n be a n x p (TI > p) matrix with probability density function
f (x)
2
m-^ q(Y(xi
=
x
~
rf'E-^-JK))
(6.14)
i-i where g is from [0, co) to [0, oo), ft = (pi,...
p
,p )'
€ E
p
and E is a p x p positive
definite matrix. We shall assume that q is thrice contunuously differentiable. Write for any b = (h,...,b )'
= {b' ,b' )',
p
(lfa+1,..
{1)
.»6p)',6p] = (bi,...,bi)'
with b i) -
(2)
{
and for any pxp
(f>i,... , 6 ) ' , b ) P l
[2
=
matrix
A=(a )=(f* iS
3
where A{
3
are pi x p
}
\ A
2
A
1
22
submatrices of A with pi -I- p
2
— p- We shall denote by
/an,...,ai,\
\aa,...,
On/
We are interested here in the following three problems. P r o b l e m 1. To test HIQ : p = 0 against H\\ ; p ^ 0 when E is unknown. P r o b l e m 2. T o test H
: 2I)
M(i) = 0 against H \ 2
• P(i) ^ 0 when p ( ) , E 2
are unknown. P r o b l e m 3 . To test H Q : p — 0 against H31 : p ; i j = 0, p ( ) / Z
2
0 when E
is unknown. T h e normal analogues of these problems have been considered in Chapter 4. P r o b l e m 1. Let Gi(p)
be the multiplicative group of p >c p nonsingular matrices g.
Problem 1 remains invariant under G;(p) transforming (X,S;p,-i:)^(gX,gSg';gp,gT g') l
where X = ^
=
l
X i , S = Z? (Xi-X)(Xi-X}'.
space of ( X , S) is T
=l
2
= nX'S^X'
A maximal invariant in the
or equivalently R = nX'(S
1
+ nXX')~ X
=
144
T
2
Group Invariance
/ ( l + T
2
) .
in Statistical
Inference
A corresponding maximal invariant in the parametric space under
Gi(p) is 6 = T t y t ' E ~ V - I f Q is convex and nonincreasing, then the Hotelling
T
2
test which rejects H\o whenever i i > C is uniformly most powerful invariant with respect to G ( p ) for testing i f ;
1 0
against H\i
( K a r i y a (1981}}. A s stated in
Chapter 5, the group Gi(p) does not satisfy the conditions of the H u n t - S t e i n theorm. However this theorem does apply for the subgroup GT(P)
ofpxp
nonsingular lower triangular matrices with p > 2. From E x a m p l e 5.1.1. maximal invariant under GT{P) is [R\,... on A — (6\,...,Sf,)'. under Hu
T h e ratio
: S = A and H
L
R
of probability densities of
: 6 = 0 is given by (with g = (gij) € GT =
0
2
j
the
,i?p)', whose distribution depends
(
(Ri,...,Rp)'
GT{P))
i)/2
Pi(9z)fl(9 i) "- d9
R = ~
—
/
^
(6.15)
Po(sx)I](^)
( n
-
0 / 2
dS
where dg = Pi(x)
Jldgij,
= q(tr x'x - 2nxp + A),
Po(x) = ? ( t r x'x). Since x'x > 0, there exists a g E GT such that gx'xg' = I,
Jn~gx = y = (VR ...,
v^p)'.
U
Hence f R
q(ti{ g'
- 2A'gy +
9
tyf[(fi)f**&*B
=
(6.16)
/
( n
9(tr(ss'))n(^) -
i ) / 2
dff
Under the assumption that q is thrice differentiable, we expand q as q(ti(gg' - 2A'gy + A)) - 1(^(99'))
+ ( - 2 t r ( A ' g y ) + \)q™(ti
1
gg )
2
+ ^ 2 t r ( A ' g y ) + A ) ( t r (gg')) + h-2ti(A'g ) y
3
3
+ X) q (z)
(6.17)
Locally Minimal Tests in Symmetrical
Distributions
where z = ti{gg') + (1 - a ) ( - 2 t r ( A ' y ) + A), 0 < c < 1 and
145
-
9
Since the measures
9 % * < ^ ) > j l ^ ^ ^ d f l , fc = 1.2,3 i=i are invariant under the change of sign of g to -g,
/
tr(AW"(tr(S9'))
/
9iW
t
)
(tr(
fiftfe**-*** (
9
we conclude that
1
'))n(4) ' "
9
i
l
/
2
= °' r
f
^
0
(618)
for i j£ i, j m . Now letting D =
/
g
(tr(
'))n(s?i)
f f 0
( u
-°
/ 2
^,
from (6.15)-(6.18) we get
R=l
+ ±
+ 4
/
+ 4 6
!
g^(gg'))f[(gir-^dg
MA'
/ /
0 i
,))
(-2tr(A'
2
2
9
? s /
?
( >(tr(
)
+
5 f f
A)V
3 1
'))n(^)
3
2
| n
-'
( )n(S ,) i i
T h e first integral in (6.19) is a finite constant Q\.
| n
) / 2
-
d
S
l ) / 3
^ (6.19)
T o evaluate the second
integral in (6.19) we first note that
tr{L\'gy)
=
Y r)
12
(6.20)
d
From (6.17) and (6.19) the second integral in (6.18) reduces to
/
|5>£M-I
^(gg'))f[(gl^-^dg.
(6.21)
146
Group Invariance in Statistical
Inference
T o evaluate the above integral we need to evaluate the following two integrals.
JOT
i=i (6.22)
h =
2
I
( 2
t r
g 3 '( (s3'))f[(s j
2 1
(
l
} ''-
i , / 2
^
Define L = tr(ffff')
- gtJL, i = l,...,pi £
ep i=flfn,i/ '
*=
+
L
fip+p-i+i = 9i+i,if <
i,--.,P-i;
»= 2,...,p-2;
and ST= /
(2,
g (tr(9s'))rfff
Since
is a spherical density of gLtf, L and e = ( e i , . . . , e ( p
p + 1
j y ) ' are independent 2
and e has a Dirichlet distribution £ ( 1 / 2 , . . . , 1 / 2 ) . From K a r i y a and E a t o n (1977) the probability density function of e is given by
P
(e) = r ( f c ^ ) / [ r ( i / 2 ) i ^ ' /
2
p(p+l)/2-l
/
p(p+l)/2-l
i=.
V
i
1/2-1
J
(6.24)
Using (6.22) and (6.23) we get
,
,
Jk -
(n - • + 1 ) — ,
.NM
_ WW J - — 2
(6.25)
Locally Minimax
Teats in Symmetrical Distributions
147
where
M = E[
/ P TJ(
E I
)(«-0/a
\i=l
C = p(p + l ) / 4 + 1/2 £ ( n - i ) .
(6.26)
Let JJ = ( i j i , . . . , % , ) ' , r?i — ^i/A. From (6.18) we get
» =
X-+ A2?
|j>
j&+| J
L + ( n - j +
)%] j j + (6.27)
where B(y,ri,\)
and 77. T h u s , from (6.27), with
— o(A) uniformly in
the equation (6.9) is satisfied by letting £0 give measure one to the single point ?7 = 0 while £
A
gives measure one to the single point 77 — n" — ( j j j , . . • , t j j )
whose j t h corrdinate Wj satisfies V'j = (" " j ) " V
" 3 + i r V ^ f f t " P ) . / = 1, -
(6-28)
so that + l ) n | - njp.
Since G;(p) and G (p) T
(6.29)
satisfy the condition of Theorem 6.1.1., R does not
depend on q. T h e first equation in (6.8) follows from the following lemma. L e m m a 6 . 3 . 1 . Let Y = (Y ... u
annxp
,Y )', n
random matrix with spherically
f (y) Y
Y; = {Yn,..., symmetric
= 9(tr yy') = q ( g ,i=l
a n d Jet
tr
Y )' ip
f
i = 1 , . . . , « be
probability density
] ,
function
(6.30)
148
Group Invariance in Statistical
where Y ) , Y ( L
p>k.
( 2 )
Inference
are of dimensions
The distribution
of Y
(Y ' Y( ))
( 1 |
P r o o f . Since n > p, Y'Y
k x p, (n - k) x v respectively and (
2 )
Y
2
n - k >
does not depend on q.
(1)
is positive definite with probability one. Hence
there exists a p x p symmetric positive definite matrix A such that Y'Y Transform Y to U, given by Y = UA.
— AA.
A s the Jacobian of the transformation
n
Y — U is
\A\ , n
/r/,A(«,«) = «(tr a a ' ) M . T h u s (7 has the uniform distribution and 17 is independent of A.
Partition
U as
where t /
is k x p and (7(2) is (n - k) x p. From Y = C M we get Y
( 1 )
( { )
= t7#j>4,
i = 1, 2. T h u s
Since Y ( ( Y ' Y ( ) ) 1 1
1
2 )
_ 1
2
Y j j is a function of 17 alone, it has a completely specified {
distribution.
•
C o r o l l a r y 6.3.1. If T
2
fc-
is distributed as Hotelling's
(central i.e. 6 = 0). Proof.
-1
Since Y ( j ( Y j ' j Y ( 2 ] ) Y j ' j has a completely specified distribution 1
2
1
for all q in (6.30), we can, without any loss of generality, assume that Y i , . . . , Y„ are independently distributed N (0,I).
T h i s implies that Y / j l ( 3 ) =
p
E J L Y / Y / has Wishart distribution W (n 2
Y
Y
p
Y
lY
h
(i)( (2) m)~ {i)
a
s
Hotelling's T
2
distribution. 2
From this it follows that when 6 = 0, T Yi, 5 =
Y
Y
~ )( i
Y
h
~ Y
a
s
2
- 1,1) independently of Y i . Hence • 1
= nY'S' ?,
Hotelling's T
2
with Y = l/n
£"
=1
distribution for all q in
(6.30). T o establish the second equation in (6.8) we proceed as follows.
Using
(2.21) we get for any g £ t?/(p), (which we will write as Gi) 2
fT*{t \6) 2
fT2(t \0)
J g(tT( g'-2a'gy Gi
9
L
2
+
S))\gg'\^' dg 2
<j(tr gg')\g0«-»)/ dg
(
J
Locally Minimax
Testa in Symmetrical
Distributions
149
where y = ( < / r , 0 \ . . . , 0 ) ' , a - ( \ A , 0 , . . . , 0 ) ' are p-vectors and T = f / ( l + f ) . 2
2
After a Taylor expansion of q, the numerator of (6.31) can be written as iin
p,
{n
/ q(ti99')\99'\ - dg JG,
p)n
+ 6 j <,<»>(tr99')\99'\ - dg JG, ,
2
+ 26r f q^(t gg )gl\gg'\^->'y dg JG,
+ o(6)
T
where g — (jJy). Using (6.18) we get the ratio in (6.13) as l + 6(k + CT) + B(y,t),6) where B(y,r),6) Si/S,i
— o(6) uniformly in y,v
(6.32)
and 77 — ( i j i , . . . ,i7 )', with r/i — p
= 1 , . . . , p and k,c are positive constants. Hence for testing H
against the alternative H'
n
: S — A the Hotelling's T R* = {U(X)
where c
:6 = 0
l0
a
2
test with critical region.
= R>c }
(6.33)
a
is a positive constant so that R* has the size a, satisfies P^tRn^a
+ bX + giKr,)
(6.34)
with b > 0 and <7(A,r/) = o(Z>A). T h u s we get the following theorem. T h e o r e m 6 . 3 . 1 . For testing Hm : n = 0 against the alternative H'
n
A specified > 0 Hotelling's of distribution
T
2
test is locally minimax
:6—
as X —* 0 for the family
(6.14).
R e m a r k 1, Since G j ( p ) satisfies the condition of Theorem 6.1.1., the ratio (6.31) is independent of q. From Corollary 6.3.1. (6.34) it follows that Hotelling's T Hia against
2
and equations (6.31) and
test is locally best invariant for testing
: 6 — X as A —* 0 for the family of distributions (6.14). I f q
is a nonincreasing convex function then Hotelling's T
2
test is uniformly most
powerful invariant.
P r o b l e m 2. Write X = ( X i j , X ) ) with (
p,+P2=
( 2
: n x p\,X( ) 2
: n x P2, and
p.
Since the problem is invariant under translations of the last pa components of each Xj, j — 1 , . . . , n , this can be reduced to that of testing if o : p ^ j — 0 2
against H21 • P(i)
0 in terms of X^
only whose marginal probability density
function is given by /^^(^Di-ISiir^gftrSr/^ij-ep'^y^^-ep^))
150
Group Invariance in Statistical
where E i j is pi
x
P
l
Inference
left-hand cornered submatrix of £ , e
— (1,...,1)
JI-vector and g(tr v) =
/ q(tr(v + JR"F1
ww'))dw,
and q is a function on [0, oo) to [0, oo) satisfying (6.1a). Now, using the results R
of problem 1 with p = p , R = Ri L
T h e o r m 6.3.2.
i
w
e
h
a
v
e
t
h
e
following theorem.
For testing H20 • P(i) = 0 against H'
21
: 6\ — np'^y
S V ( i ) = X {specified) > 0 the test which rejects H20 for large values of n
Rj — nX' (S (1)
1
+r$( y£fa)~ Jt(i)
{n)
l
= nXl S( \ X /{l l)
l )
+
{l)
nXl S-\ X , ) 1)
)
l )
is locally minimax as X —t 0 for the family of distributions
(6.34).
R e m a r k 2. T h e locally best invariant property and the uniformly most powerful invariant property under the assumption of nonincreasing convex property of q follows from the results of problem 1. P r o b l e m 3. T h e invariance property of this problem has been discussed in Chapter 4. T h e problem of testing i f
3 0
against H i 3
remains invariant under
the multiplicative group of transformations G of p x p nonsingular matrices g, 9 m )
9 = ( \9{2i)
° 9(32)
where gin) is p i X P i , transforming (X, S) -» (gX, gSg').
A maximal invariant
in the sample space under G is (Hi,.S3) where pi
p
Ri=2^Ri,
Ri
Ri+R~
fc=l
2
= R =
Y, i=l
A corresponding maximal invariant in the parametric space is S\, 62 where
i=l P i-l
For invariant tests under G the problem is reduced to testing i i against the alternatives H
3l
Ri and R 2
3 0
; 5 = 0 2
: b\ > 0, when it is given that 61 = 0, in terms of
Locally Minimax Tests in Symmetrical
Distributions
151
As stated in Chapter 5 the group G does not satisfy the conditions of the H u n t - S t e i n theorem. However this theorem does apply for the subgroup GT(P)
of p x p nonsingular lower triangular matrices. From problem 1 the
maximal invariant under GT{P) is (Ri,...,Rp)' continuously on A =
whose distribution depends
with 6i > 0 for all i. Under H
, • - •, VS )' P
30
: 6i — 0,
i = l , . . .p and under i/31 : 6i — 0, i — 1 , . . . , pi- T h e ratio R of the probability densities of (Ri,...,
under H'
R )' p
: 6 = A > 0 and under H Q '• $2 = 0, when
31
3
2
it is given that 6\ — 0, is given by f
/
q(tv(g '
- 2A{ (g
9
2j
+
{2im)
R = —
S ( 3 2 ) J
{n
, ) + A)} 1
— /
?(tr
J G
where g e G (p)
(6-35)
gg')\{{gir-^dg 1
T
— GT, y — (T/R~I, • • • ,VRy)'
T
,)l2
]\[gl) - dg
( 2
and
•-*•>-(a a
)
with ( » i i j ff(22) both lower triangular matrices. As before we expand 9 in the numerator of (6.35) as 9(tr Sff') + (~2v + \)qM(ti 9
| 2 )
(tr
gg') + {-2v
+ A)
3 ,
3 0
') + (-2f + A)V (^)
2
(6-36)
where 2 = tr gg' + (1 - a ) ( - 2 f + A), 0 < a < 1,
" = tr A { ( 9 ) 3 / ) + 3(22)2/(2)) 2)
[21
(1
Using (6.18) the integration of the second term in (6.36) with respect to the measure n
^
-
6
^
3 7
( - >
1
gives A Q I where
W
01= /
2
in
iy2
q (K99')f[(9 ,) - dg.
JG
T
1
T o integrate the third term in (6.36) with respect to the measure given in (6.37) we first observe that (by (6.18))
152
Group Invariance
/
in Statistical
2
(tr(A;
= /
Inference
2 ) S ( 2 u y o )
(
2
)%' Htrffs0n(5 J
E
9
( 2 t
(tr
S f f
l n
-
, ) / 2
^
')n^)
( n
"
i ) / a d
f (6.38)
and
/ JG
(tt(A;
2 ) f f ( 2 2 )
i
2
( ( 2 )
v r
)%( >(trS3')n(s i
r
=/
[)=Pi + l
( n
1
} -
0 / 2
rfs
ff
E >Ei 4 + T
2
-
+
V i
I
)
j>i
(6.39) 1
Denote K=
f JG
q^[tigg')dg, T
£ = tr gg', (6.40)
D=
(
f
2MN
9(tr
gg')f[{glY-^dg,
^ ( E i r
n+ D 1>J we get where e-s are defined in (6.23). From (6.35)-(6.40)
+ B{y,r,, A) where
B(y,n,
(6.41)
A) — o(A) uniformly in j / , T> (independently
6.1.1). T h e set {A = 0} is a simple point 17 = 0. So the £ assigns measure 1 to the single point rj = 0.
of 0
q
by Theorem
in Theorem 6.2.1.
T h e set {<5 — A} is a convex 2
P2 = (p — pi) dimensional Euclidean set wherein each component
77; =
0(h(X}).
Locally Minimax
Tests in Symmetrical
Distributions
Any probability measure £ \ can be replaced by degenerate measure
153
which
assigns measure 1 to the mean 77*, i — pi + 1 , . . . , p of £ j . Hence
p
+ B(y,ri,\)
f
where B(y,Tj, A) = o(h{\))
uniformly in y,n.
g
4
2
j
Consider the rejection region
RK = {X : U(X) = fi + Kf
> c)
2
(6.43)
0
where A" is a constant such that (6.42) is reduced to yield (6.9) and C„ depends on the level of significance a of the test for the chosen K, independently of q (by L e m m a 6.3.1.) Now choose
„ _ V j
(n-j-!)••• (n-j
+ l)
(n-p) --(n-p
I
n-pi
\
+ 2) {(n-p+Vpi)
,_ 3
'
P l
so that
Hence we can conclude that the test with rejection region R' = ^x :U(x)=f with P(, (R')
+
1
7
^^-r
2
>C J
(6.45)
0
= a satisfies (6.9) as A —> 0. Furthermore any region RK of the
TTI
form (6.43) must have K =
to satisfy (6.9) as A - * 0 for some
t\.
It can be shown that ( G i r i (1988)) ffi, R (?i, f 1 J?M )//fi, 2
Dl\ where D
l7
ati, a
(fi,
2
2
P2
\
2
\
3a
P2
(6.46)
are positive constants and B{f'1,^2,62) — 0(62) uniformly in
f j , f . From this and T h e o r m 6.1.1 2
f \H )
it follows that for testing H
3D
against J f
3 1
the test with critical region R' is locally best invariant as 6 —> 0. Hotelling's 2
154
Group Invariance in Statistical
Inference
whenever f i + f 2 > constant does not coincide with the
test which rejects H
3Q
L B I test and hence it is locally worse. From (6.32) it follows that Hotelling's test whose power depends only on 6 = 6 , has positive derivative at 6 2
= 0.
2
T h u s , for the critical region of Hotelling's test the value of h(\)
with 6% = A i n
(6.8) is positive. T h e first equation in (6.8) follows from L e m m a 6.3.1. Hence we get the following theorem.
T h e o r e m 6.2.2. rejects H
For testing H
30
whenever fi + ^jf-rt
30
the family of distributions
against H
: £ ) = A the test,
3l
which
( 2
> constant, is locally minimax
as A —* 0 for
(6.2).
E x a m p l e 6 . 3 . 2 . L e t X = (X )
= {X\
I}
X )' n
X[ = (X*
X )',i
=
ip
1,..., TI be an n x p random matrix with probability density function given in (6.2). L e t X = J
Xt, S = J J ( X i - X)(X
- X)'.
t
we shall write
C
-4(ll| Ajai) 4(31)
-4(12) A 2) 4(32) (2
For any p x p matrix A
4(i \ -4(23) I .4(33) / 3 )
(6-47)
with 4 ( n ) : 1 x 1, -4{ 2) I P i x p i , 4 ( , : p2 x p ; so that pi + p a = p — 1. Define 2
S
S
S
3 3
S
Pi = ( 1 2 ) ( 2 2 ) ( 2 1 ) / ( I 1 ) ' p
2
2
2
= p + p -(E,i2),E,i3,)'(^
(E
a a
,,E
( 1 3 )
)'/E
( 1 1 )
.
We shall consider the following three testing problems: (see Sec. 4.4) P r o b l e m 4.
T o test H
0
2
: p
2
= 0 against Hi : p
> 0
when p, E are
unkown. P r o b l e m 5. T o test H
: (? = 0 against H
0
: pi > 0, p\ = 0
2
when p , E
are unknown. P r o b l e m 6. To test H
0
:p
2
= 0 against H
3
: p\ — 0, p\ > 0 when p, E
are unknown. These problems remain invariant under the group of translations transforming ( * , S ; i » , S ) - » ( * + 6 , S ; / £ + t,E).
p
6eR .
Locally Minimax Testa in Symmetrical
Distributions
155
T h e effiect of these transformations is to reduce these problems to corresponding ones where p = Q and 5 = ^27=1 XiX[.
T h u s we reduce n by 1. We treat
now the latter formulation with p = 0 and assume that n > p > 2 to assert that S is positive definite with probability one. Let G be the full linear group of p x p nonsingular matrices g of the form
g = ( g_(11)   0       0      )
    ( 0       g_(22)   0      )   (6.48)
    ( 0       g_(32)   g_(33) )

with g_(11) : 1 × 1, g_(22) : p₁ × p₁, g_(33) : p₂ × p₂. For the first problem p₂ = 0 and p₁ = p − 1; for the other two problems p₁ > 0, p₂ > 0 with p₁ + p₂ = p − 1. The problems above remain invariant under G operating as

(S; Σ) → (gSg′; gΣg′),  g ∈ G.
For Problem 4 a maximal invariant in the space of S under G (with p₂ = 0) is

R² = S_(12) S_(22)^{−1} S_(21) / S_(11),

and a corresponding maximal invariant in the parametric space of Σ under the induced group (with p₂ = 0) is

ρ² = Σ_(12) Σ_(22)^{−1} Σ_(21) / Σ_(11).   (6.49)
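The invariance can be checked numerically. The following sketch (ours) verifies that R² is unchanged when S is replaced by gSg′ for a g of the form (6.48) with p₂ = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5                                         # with p2 = 0, so p1 = p - 1
A = rng.standard_normal((p, p))
S = A @ A.T                                   # a positive definite "S"

def r_sq(S):
    s12 = S[0:1, 1:]                          # S_(12)
    return (s12 @ np.linalg.solve(S[1:, 1:], s12.T)).item() / S[0, 0]

g = np.zeros((p, p))                          # g of the form (6.48), p2 = 0
g[0, 0] = rng.uniform(0.5, 2.0)               # g_(11), a nonzero scalar
g[1:, 1:] = rng.standard_normal((p - 1, p - 1))   # g_(22), nonsingular a.s.

assert abs(r_sq(S) - r_sq(g @ S @ g.T)) < 1e-8    # R^2 unchanged under S -> gSg'
```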
For Problems 5 and 6 a maximal invariant under G is (R₁², R₂²), where

R₁² = S_(12) S_(22)^{−1} S_(21) / S_(11),   (6.50)

R₁² + R₂² = (S_(12), S_(13)) ( S_(22)  S_(23) ; S_(32)  S_(33) )^{−1} (S_(12), S_(13))′ / S_(11).   (6.51)

A corresponding maximal invariant in the parametric space of Σ is (ρ₁², ρ₂²), where

ρ₁² = Σ_(12) Σ_(22)^{−1} Σ_(21) / Σ_(11),
ρ₁² + ρ₂² = (Σ_(12), Σ_(13)) ( Σ_(22)  Σ_(23) ; Σ_(32)  Σ_(33) )^{−1} (Σ_(12), Σ_(13))′ / Σ_(11).
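A short sketch (ours; the function name is hypothetical) computing the sample invariants (6.50) and (6.51) from a positive definite S:

```python
import numpy as np

def sample_invariants(S, p1):
    """(R1^2, R2^2) of (6.50)-(6.51); blocks of sizes 1, p1, p2 as in (6.47)."""
    s11 = S[0, 0]
    s12 = S[0:1, 1:1 + p1]
    r1_sq = (s12 @ np.linalg.solve(S[1:1 + p1, 1:1 + p1], s12.T)).item() / s11
    s1_rest = S[0:1, 1:]
    total = (s1_rest @ np.linalg.solve(S[1:, 1:], s1_rest.T)).item() / s11
    return r1_sq, total - r1_sq       # (6.51) gives R1^2 + R2^2 = total
```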
Problem 4. The invariance of this problem has been discussed in Sec. 4.4. The group G does not satisfy the conditions of the Hunt-Stein theorem. The subgroup G_T of G, given by

G_T = { g = ( g_(11)  0       0      )
            ( 0      g_(22)   0      )
            ( 0      g_(32)   g_(33) ) },

where g_(22) (p₁ × p₁) and g_(33) (p₂ × p₂) are nonsingular lower triangular matrices, satisfies the conditions of the theorem. (It may be remarked that we are using the same notation g for the lower triangular matrices of G_T as for the elements of G.) In place of R², the maximal invariant in the sample space under G_T is the (p − 1)-dimensional vector R = (R₂, ..., R_p)′ defined in Example 5.1.5, with

R₁² = Σ_{j=2}^{p₁+1} R_j,  R₁² + R₂² = Σ_{j=2}^{p} R_j.
The corresponding maximal invariant in the parametric space under G_T is Δ = (δ₂², ..., δ_p²)′, as defined in Example 5.1.5, with

ρ₁² = Σ_{i=2}^{p₁+1} δ_i²,  ρ² = ρ₁² + ρ₂² = Σ_{i=2}^{p} δ_i².

For testing H₀ : ρ² = 0 against the specified alternative H₁λ : ρ² = λ, the ratio R of the probability density function of R under H₁λ to that under H₀ (using Theorem 2.7.1) is
R = f_R(r | H₁λ) / f_R(r | H₀)
  = (1 − λ)^{−n/2} M^{−1} ∫_{G_T} q( tr[ (1 − λ)^{−1} g_(11)² − 2 g_(11) δ y′ g′_(22) + g_(22) C g′_(22) ] ) ν(dg),   (6.53)

where δ = (δ₂, ..., δ_p)′, M = ∫_{G_T} q(tr gg′) ν(dg), y = (√r₂, ..., √r_p)′, and C is the matrix defined below.
Since the (p − 1) × (p − 1) matrix

C = (1 − ρ²)^{−1} (I − δδ′)

is positive definite, there exists a (p − 1) × (p − 1) lower triangular nonsingular matrix T such that TCT′ = I. Define γ_i, a_i, 2 ≤ i ≤ p, by

γ₁ = 1,  γ_i = 1 − Σ_{j=2}^{i} δ_j²,  a_i = δ_i γ_p^{1/2} / (γ_i γ_{i−1})^{1/2}.

Obviously γ_p = 1 − ρ². Let a = (a₂, ..., a_p)′; then a = Tδ. Since Cδ = δ (so also C^{−1}δ = δ) and TCT′ = I, we get

a = Tδ = TCδ = (T′)^{−1}δ,
a′a = δ′(T′T)δ = δ′C^{−1}δ = δ′δ = ρ²,

which agrees with δ′Cδ = (1 − ρ²)^{−1}{δ′δ − (δ′δ)(δ′δ)} = ρ². Since

|C| = (1 − ρ²)^{−(p−2)},

we can write (6.53) as

(1 − λ)^{n/2} M R = ∫_{G_T} q( tr( g_(11)² − 2 g_(11) a y′ g′_(22) + g_(22) g′_(22) ) ) ν(dg).
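The identities |C| = (1 − ρ²)^{−(p−2)} and a′a = ρ² can be confirmed numerically. In the sketch below (ours), T is obtained from the Cholesky factor of C, so that TCT′ = I:

```python
import numpy as np

rng = np.random.default_rng(2)
pm1 = 4                                   # p - 1
delta = rng.uniform(0, 0.4, pm1)          # components delta_2, ..., delta_p
rho_sq = float(delta @ delta)             # rho^2 = delta'delta (kept < 1 here)

C = (np.eye(pm1) - np.outer(delta, delta)) / (1.0 - rho_sq)
# TCT' = I for T = L^{-1}, where C = LL' is the lower triangular Cholesky factor
T = np.linalg.inv(np.linalg.cholesky(C))
a = T @ delta

assert abs(np.linalg.det(C) - (1 - rho_sq) ** (-(pm1 - 1))) < 1e-8  # |C|
assert abs(a @ a - rho_sq) < 1e-8                                   # a'a = rho^2
```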
Expanding the integrand in (6.53) as

q(tr gg′) + q^{(1)}(tr gg′) ( −2 tr( g_(11) a y′ g′_(22) ) ) + ½ q^{(2)}(z) ( −2 tr( g_(11) a y′ g′_(22) ) )²,   (6.54)

where z = tr gg′ + (1 − θ)( −2 tr g_(11) a y′ g′_(22) ) with 0 < θ < 1.
Since, with g = (g_{ij}),

tr g_(11) a y′ g′_(22) = g_(11) { Σ_{j=2}^{p} a_j g_{jj} r_j^{1/2} + Σ_{i>j} a_i g_{ij} r_j^{1/2} },

and the measure q^{(1)}(tr gg′) ν(dg) is invariant under the change of sign of g to −g, we conclude, as in (6.18), that

∫_{G_T} [ tr g_(11) a y′ g′_(22) ] q^{(1)}(tr gg′) ν(dg) = 0,
∫_{G_T} g_{ij} g_{lh} q^{(1)}(tr gg′) ν(dg) = 0  if (i, j) ≠ (l, h).   (6.55)
Let L = tr gg′ and define the moments

e₁ = M^{−1} ∫_{G_T} g_(11)² q(L) ν(dg),  e_i = M^{−1} ∫_{G_T} g_{ii}² q(L) ν(dg),  i = 2, ..., p,

with analogous moments for the off-diagonal elements g_{ij}, i > j. Now proceeding as in Problem 3 we get

R = 1 + nλ { Σ_{j=2}^{p} η_j ( ⋯ ) + Σ_{i>j} ( ⋯ ) } + B(y, λ, η),   (6.56)

where η_j = δ_j²/λ, j = 2, ..., p, and B(y, λ, η) = o(λ) uniformly in y and η. Furthermore, from Theorem 6.1.1, R does not depend on q. Since {λ = 0} is the single point η = 0, ξ₀ assigns measure 1 to the point η = 0. The set {ρ² = λ} is a convex (p − 1)-dimensional Euclidean set wherein each
component is O(λ). Thus any probability measure ξ_λ can be replaced by the degenerate measure which assigns measure 1 to the mean η*_i, i = 2, ..., p, of ξ_λ. Choosing

η*_j = n(n − p + 1)(n − j + 1)^{−1}(n − j + 2)^{−1}(p − 1)^{−1},  j = 2, ..., p,   (6.57)

we see that (6.9) is satisfied with U = Σ_{j=2}^{p} R_j = R².
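The weights (6.57) sum exactly to one: since (n − j + 1)^{−1}(n − j + 2)^{−1} = (n − j + 1)^{−1} − (n − j + 2)^{−1}, the sum over j = 2, ..., p telescopes to (p − 1)/(n(n − p + 1)), cancelling the leading factor. A quick exact check (ours):

```python
from fractions import Fraction

n, p = 12, 5                                   # arbitrary values with n > p
eta = [Fraction(n * (n - p + 1), (n - j + 1) * (n - j + 2) * (p - 1))
       for j in range(2, p + 1)]               # the weights eta_j* of (6.57)
assert sum(eta) == 1                           # they sum to one exactly
```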
Using Theorem 2.7.1 we get, as in Giri (1988), the corresponding integral representation of f_{R²}(r² | H₁)/f_{R²}(r² | H₀) over the group G defined in (6.48) with p₂ = 0; it involves the factor (1 − ρ²)^{n/2}, the matrix (1 − ρ²)(I − δδ′), the constant M = ∫_G q(tr gg′) μ(dg), and a vector u = (u₁, ..., u_{p−1})′ satisfying u′u = r².
If ρ² is small, the power against the alternative H₁ of the test which rejects H₀ whenever R² > C_α is given by α + h(ρ²) + o(ρ²), where h(ρ²) > 0 for ρ² > 0 and h(ρ²) = o(1). The null robustness of the test follows from the fact that T(X) = R² satisfies T(X) = T(Xh) for every h ∈ G_T (Kariya (1981b)). Hence we get the following theorem:
Theorem 6.2.3. For testing H₀ against H₁ : ρ² = λ, the level-α test which rejects H₀ whenever R² > C_α is locally minimax as λ → 0.
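The null robustness can be seen directly in simulation: multiplying a Gaussian sample by a random positive radius produces a heavier-tailed member of the elliptically symmetric family and is a transformation X → Xh with h = sI ∈ G_T, so R² is unchanged realization by realization. A sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 30, 4

def r_sq(X):
    S = X.T @ X                               # mu = 0 formulation: S = sum_i X_i X_i'
    s12 = S[0:1, 1:]
    return (s12 @ np.linalg.solve(S[1:, 1:], s12.T)).item() / S[0, 0]

Z = rng.standard_normal((n, p))               # Gaussian member of the family (6.2)
s = 1.0 / np.sqrt(rng.gamma(3.0, 1.0 / 3.0))  # random radius: heavier-tailed member
assert abs(r_sq(Z) - r_sq(s * Z)) < 1e-9      # same R^2, hence same null distribution
```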
Problem 5. Here η_j = 0, j = p₁ + 2, ..., p. For testing H₀ against H′ : ρ₁² = λ we can write, using (6.55),

R = 1 + nλ { Σ_{j=2}^{p₁+1} η_j ( ⋯ ) + Σ_{i>j} ( ⋯ ) } + B(r, λ, η).   (6.58)
Now letting ξ₀ give measure one to η = 0 and ξ_λ give measure one to the means η*₂, ..., η*_{p₁+1}, where

η*_j = n(n − p + 1)(n − j + 1)^{−1}(n − j + 2)^{−1} p₁^{−1},  j = 2, ..., p₁ + 1,   (6.59)

we conclude that (6.9) is satisfied with U = R₁² = Σ_{j=2}^{p₁+1} R_j, and (6.8) follows as in Theorem 6.2.3.

Theorem 6.2.4. For testing H₀ against H′ : ρ₁² = λ, the level-α test which rejects H₀ whenever R₁² > C_α, where the constant C_α depends on the level α of the test, is locally minimax as λ → 0.

Problem 6. Here η_j = 0, j = 2, ..., p₁ + 1. For testing H₀ against H₃ : ρ₂² = λ we can write, as λ → 0,

R = 1 + nλ { Σ_{j=p₁+2}^{p} η_j ( ⋯ ) } + B(r, λ, η),   (6.60)
where B(r, λ, η) = o(λ) uniformly in r and η. Let us now consider a rejection region of the form

R_K = { y : U(y) = R₁² + K R₂² > c_α },   (6.61)

where K is chosen such that (6.61) is reduced to yield (6.9), and c_α depends on the level α of the test for the chosen K. Let ξ₀ give measure one to η = 0 and ξ_λ give measure one to the single point (0, ..., 0, η*_{p₁+2}, ..., η*_p), where

η*_j = n(n − p + 1)(n − j + 1)^{−1}(n − j + 2)^{−1}(p − 1)^{−1},  j = p₁ + 2, ..., p,

and let

R = { y : U(y) = R₁² + ((n − p)/(n − p₁)) R₂² > C_α },
with p₁ + p₂ = p − 1. Observe that the rejection region R, with P_{ξ₀}(R) = α, satisfies (6.9). Furthermore, for the invariant region R, P_{λ,ξ_λ}(R) depends only on λ; hence, from (6.8), r(λ, R) = 0. Since R is locally best invariant as λ → 0, the test which rejects H₀ whenever R² > C does not coincide with R and hence is locally worse. From Problem 4, the power of the test which rejects H₀ whenever R² > C depends only on λ and has positive derivative at λ = 0. Thus from (6.8) with R* = R we conclude that h(λ) > 0. Hence we have

Theorem 6.2.5. For testing H₀ against H₃′ : ρ₂² = λ > 0, the test with critical region R is locally minimax as λ → 0.
Exercises

1. Let X = (X₁, ..., X_p)′ be a random vector with probability density function (6.1). Show that E(X) = μ and cov(X) = αΣ, with α = p^{−1} E{(X − μ)′ Σ^{−1} (X − μ)}.

2. Let X be an n × p random matrix with probability density function (6.2). Find the maximum likelihood estimators of μ and Σ.

References

1. N. Giri, Locally minimax tests for multiple correlations, Canad. J. Statist., pp. 53–60 (1979).
2. N. Giri, Locally minimax tests in symmetrical distributions, Ann. Inst. Stat. Math., pp. 381–394 (1988).
3. N. Giri, Multivariate Statistical Analysis, Marcel Dekker, N.Y. (1996).
4. T. Kariya and M. Eaton, Robust tests of spherical symmetry, Ann. Statist., pp. 206–215 (1977).
5. T. Kariya, A robustness property of Hotelling's T²-problem, Ann. Statist., pp. 208–215 (1981).
6. T. Kariya, Robustness of multivariate tests, Ann. Statist., pp. 1267–1275 (1981b).
7. T. Kariya and B. Sinha, Non-null and optimality robustness of some tests, Ann. Statist., pp. 1182–1197 (1985).
8. T. Kariya and B. Sinha, Robustness of Statistical Tests, Academic Press, N.Y. (1988).
Chapter 7
TYPE D AND E REGIONS

The notion of a type D or E region is due to Isaacson (1951). Kiefer (1958) showed that the usual F-test of the univariate general linear hypothesis possesses this property. Lehmann (1959) showed that, in finding type D regions, invariance could be invoked in the manner of the Hunt-Stein theorem; and this could also be done for type E regions (if they exist) provided that one works with a group which operates as the identity on the nuisance parameter set H of the testing problem.

Suppose, for a parameter set Ω = {(θ, η) : θ ∈ Θ, η ∈ H} with associated distributions, with Θ a Euclidean set, that every test function φ has a power function β_φ(θ, η) which, for each η, is twice continuously differentiable in the components of θ at θ = 0, an interior point of Θ. Let Q_α be the class of locally strictly unbiased level-α tests of H₀ : θ = 0 against H₁ : θ ≠ 0. Our assumption on β_φ implies that all tests in Q_α are similar and that ∂β_φ/∂θ_i |_{θ=0} = 0 for all φ in Q_α. Let Δ_φ(η) be the determinant of the matrix B_φ(η) of second derivatives of β_φ(θ, η) with respect to the components of θ (the Gaussian curvature) at θ = 0. Suppose the parametrization is such that Δ_{φ′}(η) > 0 for all η for at least one φ′ in Q_α.

Definition 7.1. Type E test. A test φ′ is said to be of type E if φ′ ∈ Q_α and Δ_{φ′}(η) = max_{φ ∈ Q_α} Δ_φ(η) for all η.

Definition 7.2. Type D test. A type E test φ′ is said to be of type D if the nuisance parameter set H is a single point.

In the problems treated earlier in Chapter 4 it seems doubtful that type E regions exist (in terms of Lehmann's development, H is not left invariant by many transformations). We introduce here two possible optimality criteria in the same spirit as the type D and E criteria, which will always be fulfilled by some test under minimum regularity assumptions. Let

Δ(η) = max_{φ ∈ Q_α} Δ_φ(η).   (7.1)

Definition 7.3. Type D_A test. A test φ′ is of type D_A if

max_η [ Δ(η) − Δ_{φ′}(η) ] = min_{φ ∈ Q_α} max_η [ Δ(η) − Δ_φ(η) ].   (7.2)

Definition 7.4. Type D_M test. A test φ″ is of type D_M if

max_η [ Δ(η)/Δ_{φ″}(η) ] = min_{φ ∈ Q_α} max_η [ Δ(η)/Δ_φ(η) ].   (7.3)
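To make the two criteria concrete, here is a toy computation (entirely ours, with made-up curvature functions Δ_φ(η) on a finite nuisance grid):

```python
import numpy as np

etas = np.linspace(0.5, 2.0, 4)                       # a toy nuisance grid H
# hypothetical Gaussian curvatures Delta_phi(eta) for three candidate tests:
deltas = {"phi1": 1.0 + 0.2 * etas,
          "phi2": 1.2 - 0.1 * etas,
          "phi3": np.full_like(etas, 1.1)}

envelope = np.max(np.vstack(list(deltas.values())), axis=0)   # Delta(eta), (7.1)

add_regret = {k: np.max(envelope - d) for k, d in deltas.items()}    # type D_A
mult_regret = {k: np.max(envelope / d) for k, d in deltas.items()}   # type D_M

print(min(add_regret, key=add_regret.get), min(mult_regret, key=mult_regret.get))
```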
These two criteria resemble stringency and regret criteria employed elsewhere in statistics; the subscripts "A" and "M" stand for the "additive" and "multiplicative" regret principles. The possession of these properties is invariant under the product of any transformations on Θ (acting trivially on H) of the same general type as those for which type D regions retain their property, and an arbitrary one-to-one transformation on H (acting trivially on Θ), but, of course, not under more general transformations on Ω. Obviously, a type E test automatically satisfies these weaker criteria.

Let us now assume that the testing problem is invariant under a group G of transformations for which the Hunt-Stein theorem holds and which acts trivially on Θ; that is,

g(θ, η) = (θ, gη),  g ∈ G.   (7.4)

If φg is the test function defined by φg(x) = φ(gx), then a trivial computation shows that

β_{φg}(θ, η) = β_φ(θ, gη).   (7.5)

It is easy to verify that Δ_{φg}(η) = Δ_φ(gη), which implies Δ(η) = Δ(gη). Furthermore, if φ is better than φ′ in the sense of either of the above criteria, then φg is better than φ′g. All of the requirements of Lehmann (1959) are easily seen to be satisfied, so that we can conclude that there is an almost invariant (hence, in our problems, invariant) test which is of type D_A or D_M. This differs from the way in which invariance is used on page 883 of Lehmann (1959), where it is used to reduce Q_α rather than H as here. If the group G is transitive on H, then Δ(η) is constant, as is Δ_φ(η) for an invariant φ; thus, if φ* maximizes Δ_φ over all invariant tests, φ* is of type D_A or D_M among all tests of level α. To verify these optimality properties we need the following lemma.

Lemma 7.1. Let L be a class of non-negative definite symmetric matrices of order m and suppose J is a fixed nonsingular member of L. If tr J^{−1}B is maximized (over B in L) by B = J, then det B is also maximized by J. Conversely, if L is convex and J maximizes det B, then tr J^{−1}B is maximized by B = J.

Proof. Write J^{−1}B = A. If A = I maximizes tr A, we get (det A)^{1/m} ≤ tr A/m ≤ 1 = det I, so that A = I also maximizes det A. Conversely, if J maximizes det A, then B = J maximizes tr A; for if tr A > tr I for some B in L, then det(aA + (1 − a)I) > 1 for a small and positive, contradicting the maximality of the determinant at J.
Remarks. The usefulness of this lemma lies in the fact that the generalized Neyman-Pearson lemma allows us to maximize tr Q B_φ, for fixed Q, more easily than to maximize Δ_φ = det B_φ among similar level-α tests. We can find, for each Q, a φ_Q which maximizes tr Q B_φ; a φ* which maximizes Δ_φ is then obtained by finding a φ_Q for which B_{φ_Q} = Q^{−1}.
In problems of the type which we consider here, the reduction by invariance under a group G which is transitive on H often results in a reduced problem wherein the maximal invariant is a vector R = (R₁, ..., R_m)′ whose distribution depends only on Δ = (δ₁, ..., δ_m)′, where δ_i = θ_i², θ = (θ₁, ..., θ_m)′, and such that the density f_Δ of R with respect to a σ-finite measure μ is of the form

f_Δ(r) = f₀(r) { 1 + Σ_{i=1}^{m} δ_i ( h_i + Σ_j a_{ij} r_j ) + Q(r, Δ) },   (7.6)

where the h_i and a_{ij} are constants and Q(r, Δ) = o(Σ_i δ_i) as Δ → 0, and we can differentiate under the integral sign to obtain, for an invariant test φ,

β_φ(θ, η) = α + Σ_i δ_i { h_i α + Σ_j a_{ij} ∫ r_j φ(r) f₀(r) μ(dr) } + o(Σ_i δ_i)   (7.7)

as Δ → 0. According to Giri and Kiefer (1964) this is called the symmetric reduced regular case (SRR).
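Differentiating (7.7) twice with respect to θ_i shows that the ith diagonal entry of B_φ is 2{h_i α + E₀(Σ_j a_{ij} R_j φ(R))}, which can be estimated by Monte Carlo. The sketch below is ours; the choices of f₀, h_i, a_{ij} and the test φ are artificial stand-ins:

```python
import numpy as np

rng = np.random.default_rng(6)
m = 3
h = -np.ones(m)                                   # hypothetical constants h_i
a = np.triu(np.ones((m, m)))                      # hypothetical constants a_ij

R = rng.exponential(1.0, size=(100_000, m))       # stand-in draws from f_0
phi = (R.sum(axis=1) > np.quantile(R.sum(axis=1), 0.95)).astype(float)
alpha = phi.mean()                                # level of the test under f_0

diag_B = 2 * (h * alpha + (R @ a.T * phi[:, None]).mean(axis=0))
print(diag_B)                                     # estimated diagonal of B_phi
```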
Theorem 7.1. In the SRR case, an invariant test φ* of level α is of type D_A (or D_M) among invariant tests if and only if

φ* = 1 precisely when Σ_i q_i Σ_j a_{ij} r_j > c,   (7.8)

where c is a constant and

q_i^{−1} = const { h_i α + E₀( Σ_j a_{ij} R_j φ*(R) ) }.

Proof. In the SRR case, every invariant φ has a diagonal B_φ whose ith diagonal entry, by (7.7), is 2{ h_i α + E₀( Σ_j a_{ij} R_j φ(R) ) }. By the generalized Neyman-Pearson lemma, tr Q B_φ is maximized over such φ by a φ* of the form (7.8). Hence, by Lemma 7.1, we get the theorem.
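The characterization in (7.8) is a fixed point: the weights q_i are defined through expectations taken under the test that they themselves determine. A crude iterative search (ours, reusing the artificial stand-ins for f₀, h_i and a_{ij} from above) illustrates one way to approximate such a φ*:

```python
import numpy as np

rng = np.random.default_rng(7)
m, alpha = 3, 0.05
h = -np.ones(m)
a = np.triu(np.ones((m, m)))                       # hypothetical SRR constants
R = rng.exponential(1.0, size=(100_000, m))        # stand-in draws from f_0
lin = R @ a.T                                      # lin[k, i] = sum_j a_ij R_kj

q = np.ones(m)
for _ in range(50):                                # fixed-point iteration for q
    stat = lin @ q                                 # sum_i q_i sum_j a_ij r_j
    c = np.quantile(stat, 1 - alpha)               # cut-off keeping level alpha
    phi = (stat > c).astype(float)
    q_new = 1.0 / np.abs(h * alpha + (lin * phi[:, None]).mean(axis=0))
    q_new /= q_new.sum()                           # the constant is immaterial
    if np.allclose(q_new, q, atol=1e-6):
        break
    q = q_new
print(q, c)
```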
Example 7.1. (T²-test.) Consider once again Problem 1 of Chapter 4. Let θ = N^{1/2} F^{−1} μ, where F is the unique member of G_T with positive diagonal elements satisfying F F′ = Σ. Then Θ is Euclidean p-space, G_T operates transitively on H = {positive definite Σ} but trivially on Θ, and we have the SRR case. We thus have (7.6) with h_i = −1 and

a_{ij} = { 1, if i > j;  0, if i < j;  N − j + 1, if i = j }.   (7.9)

Hotelling's T² test has a power function which depends only on Σ_i δ_i, so that, with the above parametrization for θ, B_{T²} is a multiple of the identity. Hotelling's critical region is of the form Σ_j r_j > c. But, when all the q_i are equal, the critical region corresponding to (7.8) is of the form Σ_j (N + p − 1 − 2j) r_j > c, which is not Hotelling's region if p > 1. Hence we get the following theorem.
Theorem 7.2. For 0 < α < 1 < p < N, Hotelling's T² test is not of type D among G_T-invariant tests, and hence is not of type D_A or D_M (nor of type E) among all tests.
Remarks. The actual computation of a φ* of type D appears difficult, due to the fact that we need to evaluate an integral of the form (7.10) for h = 0 or 1 for various choices of the c_i and c. When α is close to zero or 1, we can carry out approximate computations as follows. As α → 1 the complement R̄ of the critical region becomes a simplex with one corner at 0. When p = 2, write ρ = 1 − α and consider critical regions of level α of the form b y₁ + y₂ > c, where 0 < L^{−1} < b < L, with L fixed but large (this keeps R̄ close to the origin). From (7.10) we get

ρ = E₀{1 − φ(R)} = (N − 2) c² / 2b^{1/2} + o(c²)  as c → 0,

and similarly E₀{(1 − φ(R)) R₁} = (N − 2) c³ b^{−3/2}/8 + o(c³), while E(R_i) = 1/N. From (5.18) we obtain, for the power near H₀, an expansion in δ₁ and δ₂ whose coefficients depend on b and ρ, the o(ρ) and o(Σ δ_i) terms being uniform in Δ and ρ, respectively. The product Δ_φ of the coefficients of δ₁ and δ₂ is maximized when b = (N + 1)/(N − 1) + o(1) as ρ → 0; with more care, one can obtain further terms in an expansion in ρ for the type D choice of b. The argument is completed by showing that b < L^{−1} implies that R̄ lies in a strip so close to the r₁-axis as to make E₀{(1 − φ(R)) R₂} too small to yield a φ as good as that with b = (N + 1)/(N − 1), with a similar argument if b > L. When ρ is very close to 0, all choices of b > 0 give substantially the same power, α + (δ₁ + δ₂)/2 + o(ρ²), so that the relative departure from being type D of Hotelling's T² test, or of any other critical region of the form b R₁ + R₂ > c with b fixed and positive, approaches 0 as α → 1. However, we do not know how great the departure of Δ_{T²} from Δ can be for arbitrary α. We can similarly treat the case p > 2 and also the case α → 0. One can treat similarly other problems considered in Chapter 4.

References

1. N. Giri and J. Kiefer, Local and Asymptotic Minimax Properties of Multivariate Tests, Ann. Math. Statist. 35, 21–35 (1964).
2. S. L. Isaacson, On the Theory of Unbiased Tests of Simple Statistical Hypothesis Specifying the Values of Two or More Parameters, Ann. Math. Statist. 22, 217–234 (1951).
3. J. Kiefer, On the Nonrandomized Optimality and Randomized Nonoptimality of Symmetrical Designs, Ann. Math. Statist. 29, 675–699 (1958).
4. E. L. Lehmann, Optimum Invariant Tests, Ann. Math. Statist. 30, 881–884 (1959).